
Releases: apache/druid

Druid 37.0.0

08 May 22:00


Apache Druid 37.0.0 contains over 255 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 29 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 37.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

Important features, changes, and deprecations

This section contains important information about new and existing features.

Hadoop-based ingestion

Support for Hadoop-based ingestion has been removed. The feature was deprecated in Druid 34.

Use one of Druid's other supported ingestion methods, such as SQL-based ingestion or MiddleManager-less ingestion using Kubernetes.

#19109

Query blocklist

You can now use the Broker API (/druid/coordinator/v1/config/broker) to create a query blocklist that dynamically blocks queries by datasource, query type, or query context. The blocklist takes effect without restarting Druid. Block rules use AND logic, which means all criteria must match.

The following example blocks all groupBy queries on the wikipedia datasource with a query context parameter of priority equal to 0:

POST /druid/coordinator/v1/config/broker
  {
    "queryBlocklist": [
      {
        "ruleName": "block-wikipedia-groupbys",
        "dataSources": ["wikipedia"],
        "queryTypes": ["groupBy"],
        "contextMatches": {"priority": "0"}
      }
    ]
  }
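The AND semantics of a block rule can be illustrated with a small client-side sketch. This is not Druid's implementation: `rule_matches` is a hypothetical helper, the query is assumed to be a native query JSON object with a string `dataSource`, and treating an omitted criterion as "matches anything" is an assumption.

```python
def rule_matches(rule, query):
    """Return True only if every criterion present in the rule matches the query."""
    data_sources = rule.get("dataSources")
    if data_sources and query.get("dataSource") not in data_sources:
        return False
    query_types = rule.get("queryTypes")
    if query_types and query.get("queryType") not in query_types:
        return False
    context = query.get("context", {})
    for key, expected in rule.get("contextMatches", {}).items():
        # Compare as strings because the rule stores "0" rather than 0 (assumption).
        if str(context.get(key)) != str(expected):
            return False
    return True


rule = {
    "ruleName": "block-wikipedia-groupbys",
    "dataSources": ["wikipedia"],
    "queryTypes": ["groupBy"],
    "contextMatches": {"priority": "0"},
}
blocked = {"queryType": "groupBy", "dataSource": "wikipedia", "context": {"priority": 0}}
allowed = {"queryType": "timeseries", "dataSource": "wikipedia", "context": {"priority": 0}}
```

In this sketch the `timeseries` query passes because one criterion (the query type) fails, even though the datasource and context both match.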

#19011

Minor compaction for Overlord-based compaction (experimental)

You can now configure minor compaction to compact only newly ingested segments while upgrading existing compacted segments. When Druid upgrades segments, it updates their metadata instead of using resources to compact them again. You can use the native compaction engine or the MSQ task engine.

Use the mostFragmentedFirst compaction policy and set a threshold for minor compaction that is either based on a percentage of rows or byte-based.

#19059 #19205 #19016

Cascading reindexing (experimental)

Using cascading reindexing, you can now define age-based rules to automatically apply different compaction configurations based on the age of your data. While standard auto-compaction applies a single flat configuration across an entire datasource, cascading reindexing lets you tailor your compaction settings to the characteristics of your data.

For example, you can keep recent data in hourly segments while automatically rolling up to daily segments after 90 days to reduce segment count. You can also layer on age-based row deletion (such as dropping bot traffic from older data), change compression settings, or shift to rollup with coarser query granularity as data ages. Rules are defined inline in the supervisor spec.

You must use compaction supervisors with the MSQ task engine to use cascading reindexing.

#18939 #19213 #19106 #19078

Multi-supervisor ingestion

Multi-supervisor ingestion is now generally available. You can run multiple stream supervisors that ingest into the same datasource.

#18983

Read-only authorizer

Added a ReadOnly authorizer to Druid. This is the first global authorizer for Druid. The authorizer enforces a global restriction on all non-READ operations, denying them regardless of individual user permissions. You can use this capability to ensure all users of a specific authorizer are limited to READ access.

There is a known limitation where some endpoints currently require WRITE access despite being READ-only, such as GET /druid/indexer/v1/supervisor. These operations will fail.

#19243

Thrift input format

As part of the Thrift contributor extension, Druid now supports Thrift-encoded data for Kafka and Kinesis streaming ingestion using InputFormat. Previously, Druid supported this through parsers, which have been removed in Druid 37.

#19111

To use this feature, you must add druid-thrift-extensions to your extension load list.

Incremental cache

Incremental segment metadata cache (useIncrementalCache) is now generally available and defaults to ifSynced. Druid blocks reads from the cache until it has synced with the metadata store at least once after becoming leader.

#19252

Kubernetes-based task management

This extension is now generally available.

#19128

Dynamic default query context

You can now add default query context parameters as a dynamic configuration to the Broker. This allows you to override static defaults set in your runtime properties without restarting your deployment or having to update multiple queries individually. Druid applies query context parameters based on the following priority:

  1. The query context included with the query
  2. The query context set as a dynamic configuration on the Broker
  3. The query context parameters set in the runtime properties
  4. The defaults that ship with Druid

Note that like other Broker dynamic configuration, this is best-effort. Settings may not be applied in certain
cases, such as when a Broker has recently started and hasn't received the configuration yet, or if the
Broker can't contact the Coordinator. If a query context parameter is critical for all your queries, set it in the runtime properties.
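The priority order above amounts to a layered lookup in which the first layer that defines a parameter wins. The following is a minimal sketch of that idea, not Druid's code; the function name and all sample values are hypothetical.

```python
from collections import ChainMap


def resolve_query_context(query_ctx, dynamic_ctx, runtime_ctx, shipped_defaults):
    """Merge the four layers; ChainMap looks keys up left to right, so earlier layers win."""
    return dict(ChainMap(query_ctx, dynamic_ctx, runtime_ctx, shipped_defaults))


resolved = resolve_query_context(
    query_ctx={"timeout": 5000},                        # 1. sent with the query
    dynamic_ctx={"timeout": 20000, "useCache": False},  # 2. Broker dynamic configuration
    runtime_ctx={"useCache": True, "priority": 0},      # 3. runtime properties
    shipped_defaults={"timeout": 300000, "populateCache": True},  # 4. illustrative defaults
)
```

Here `timeout` comes from the query itself, `useCache` from the dynamic configuration, `priority` from the runtime properties, and `populateCache` from the lowest layer.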

#19144

sys.queries table (experimental)

The new system queries table provides information about currently running and recently completed queries that use the Dart engine. This table is off by default. To enable the table, set the following:

druid.sql.planner.enableSysQueriesTable = true

As part of this change, the /druid/v2/sql/queries API now supports an includeComplete parameter that shows recently completed queries.

#18923

Auto-compaction with compaction supervisors

Auto-compaction using compaction supervisors has been improved, is now generally available, and is the recommended default. Automatic compaction tasks are now prefixed with auto instead of coordinator-issued.

As part of the improvement, compaction states are now stored in a central location: a new indexingStates table. Individual segments only need to store a unique reference (indexing_state_fingerprint) to their full compaction state.

Since many segments in a single datasource share the same underlying compaction state, this greatly reduces metadata storage requirements for automatic compaction.

For backwards compatibility, Druid continues to persist the detailed compaction state in each segment. This functionality will be removed in a future release.

You can stop storing detailed compaction state by setting storeCompactionStatePerSegment to false in the cluster compaction config. If you turn it off and need to downgrade, Druid needs to re-compact any segments that have been compacted since you changed the config.

This change has upgrade impacts for metadata storage and metadata caching. For more information, see the Metadata storage for auto-compaction with compaction supervisors upgrade note.

#19113 #18844 #19252

Broker tier selection for realtime servers

Added druid.broker.realtime.select.tier and druid.broker.realtime.balancer.type on the Brokers to optionally override the Broker’s tier selection and balancer strategies for realtime servers. If these properties are not set (the default), realtime servers continue to use the existing druid.broker.select and druid.broker.balancer configurations that apply to both historical and realtime servers.

#19062

Manual Broker routing in the web console

You can now configure which Broker the Router uses for queries issued from the web console. You may want to do this if there are Brokers that don't have visibility into certain data tiers, and you know you're querying data available only on a certain tier.

To specify a Broker, add the following config to web-console/console-config.js:

consoleBrokerService: 'druid/BROKER_NAME'

#19069

Consul extension

The contributor extension druid-consul-extensions lets Druid clusters use Consul for service discovery and
Coordinator/Overlord leader election instead of ZooKeeper. The extension supports ACLs, TLS/mTLS, and metrics.

Before you switch to Consul, you need to set
druid.serverview.type=http and druid.indexer.runner.type=httpRemote cluster wide.

#18843

Functional area and related changes

This section contains detailed release notes separated by areas.

Web console

Changed storage column displays

The following improvements have been m...


Druid 36.0.0

09 Feb 14:50


Apache Druid 36.0.0 contains over 189 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 34 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes before you upgrade to Druid 36.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

Important features, changes, and deprecations

This section contains important information about new and existing features.

Functional area and related changes

This section contains detailed release notes separated by areas.

Druid operator

Druid Operator is a Kubernetes controller that manages the lifecycle of your Druid clusters. The operator simplifies the management of Druid clusters with its custom logic that is configurable through
Kubernetes CRDs.

#18435

Cost-based autoscaling for streaming ingestion

Druid now supports cost-based autoscaling for streaming ingestion that optimizes task count by balancing lag reduction against resource efficiency. This autoscaling strategy uses the following formula:

totalCost = lagWeight × lagRecoveryTime + idleWeight × idlenessCost

which accounts for the time to clear the backlog and compute time:

lagRecoveryTime = aggregateLag / (taskCount × avgProcessingRate) — time to clear backlog
idlenessCost = taskCount × taskDuration × predictedIdleRatio — wasted compute time
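As a worked illustration of the formula, the sketch below evaluates the cost for several task counts and picks the cheapest. It is not the autoscaler's implementation: the function name, the weights, the sample numbers, the choice of seconds as the unit, and the use of a constant `predicted_idle_ratio` (Druid presumably predicts it per task count) are all assumptions.

```python
def total_cost(task_count, aggregate_lag, avg_processing_rate, task_duration,
               predicted_idle_ratio, lag_weight=1.0, idle_weight=1.0):
    """Evaluate totalCost = lagWeight * lagRecoveryTime + idleWeight * idlenessCost."""
    # Time to clear the backlog with this many tasks.
    lag_recovery_time = aggregate_lag / (task_count * avg_processing_rate)
    # Compute time expected to be wasted on idle tasks.
    idleness_cost = task_count * task_duration * predicted_idle_ratio
    return lag_weight * lag_recovery_time + idle_weight * idleness_cost


def cheapest_task_count(candidates, **inputs):
    """Return the candidate task count with the lowest total cost."""
    return min(candidates, key=lambda n: total_cost(n, **inputs))


inputs = dict(aggregate_lag=1_200_000, avg_processing_rate=1000,
              task_duration=600, predicted_idle_ratio=0.25)
best = cheapest_task_count(range(1, 9), **inputs)
```

With these sample inputs the lag term is 1200/n and the idleness term is 150n, so the cost falls from 1350 at one task to 850 at three tasks and rises again to 900 at four.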

#18819

Kubernetes client mode (experimental)

In kubernetes-overlord-extensions an experimental Kubernetes client mode was added. The new mode uses the fabric8 SharedInformers to cache k8s metadata. This greatly reduces API traffic between the Overlord and k8s control plane. You can try out this feature using the following config:

druid.indexer.runner.useK8sSharedInformers=true

#18599

cgroup v2 support

cgroup v2 is now supported, and all cgroup metrics now emit cgroupversion to identify which version is being used.

The following metrics automatically switch to v2 if v2 is detected: CgroupCpuMonitor, CgroupCpuSetMonitor, CgroupDiskMonitor, MemoryMonitor. CpuAcctDeltaMonitor fails gracefully if v2 is detected.

Additionally, CgroupV2CpuMonitor now also emits cgroup/cpu/shares and cgroup/cpu/cores_quota.

#18705

Query reports for Dart

Dart now supports query reports for running and recently completed queries. The reports can be fetched from the /druid/v2/sql/queries/<sqlQueryId>/reports endpoint.

The format of the response is a JSON object with two keys, "query" and "report". The "query" key is the same info that is available from the existing /druid/v2/sql/queries endpoint. The "report" key is a report map including an MSQ report.

You can control the retention behavior for reports using the following configs:

  • druid.msq.dart.controller.maxRetainedReportCount: Max number of reports that are retained. The default is 0, meaning no reports are retained
  • druid.msq.dart.controller.maxRetainedReportDuration: How long reports are retained in ISO 8601 duration format. The default is PT0S, meaning time-based expiration is turned off

#18886

New segment format

The new version 10 segment format improves upon version 9. It is off by default and not compatible with older segment format versions.

Set druid.indexer.task.buildV10=true to make segments in the new format.

If you downgrade, you must reindex your data with a supported segment format version.

You can use the bin/dump-segment tool to view segment metadata. The tool outputs serialized JSON.

#18880 #18901

Web console

New info available in the web console

The web console now includes information about the number of available processors and the total memory (in binary bytes).

This information is also available through the sys.servers table.

#18613

Other web console improvements

  • Added tracking for inactive workers for MSQ execution stages #18768
  • Added a refresh button for JSON views and stage viewers #18768
  • You can now define ARRAY type parameters in the query view #18586
  • Changed system table queries to now automatically use the native engine #18857
  • Improved time charts to support multiple measures #18701

Ingestion

  • Added support for AWS InternalError code retries #18720
  • Improved ingestion to be more resilient. Ingestion tasks no longer fail if the task log upload fails with an exception #18748
  • Improved how Druid handles situations where data doesn't match the expected type #18878
  • Improved JSON ingestion so that Druid can compute JSON values directly from dictionary or index structures, allowing ingestion to skip persisting raw JSON data entirely. This reduces on-disk storage size #18589
  • You can now choose between full dictionary-based indexing and nulls-only indexing for long/double fields in a nested column #18722

SQL-based ingestion

Additional ingestion configurations

You can now use the following configs to control how your data gets ingested and stored:

  • maxInputFilesPerWorker: Controls the maximum number of input files or segments per worker.
  • maxPartitions: Controls the maximum number of output partitions for any single stage, which affects how many segments are generated during ingestion.

#18826

Other SQL-based ingestion improvements
  • Added maxRowsInMemory to replace rowsInMemory. rowsInMemory now functions as an alternate way to provide that config and is ignored if maxRowsInMemory is specified. Previously, only rowsInMemory existed #18832
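The precedence between the two names can be sketched as follows. The helper is hypothetical and only mirrors the rule stated above; where these settings live in a real request is not shown here.

```python
def effective_max_rows_in_memory(settings, default=None):
    """Prefer maxRowsInMemory; fall back to the older rowsInMemory name."""
    if "maxRowsInMemory" in settings:
        # rowsInMemory is ignored when the new name is present.
        return settings["maxRowsInMemory"]
    return settings.get("rowsInMemory", default)


both = effective_max_rows_in_memory({"maxRowsInMemory": 50000, "rowsInMemory": 10000})
legacy_only = effective_max_rows_in_memory({"rowsInMemory": 10000})
```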

Streaming ingestion

Record offset and partition

You can now ingest the record offset (offsetColumnName) and partition (partitionColumnName) using the KafkaInputFormat. Their default names are kafka.offset and kafka.partition, respectively.

#18757

Other streaming ingestion improvements
  • Improved supervisors so that they can't kill tasks while the supervisor is stopping #18767
  • Improved the lag-based autoscaler for streaming ingestion #18745
  • Improved the SeekableStream supervisor autoscaler to wait for tasks to complete before attempting subsequent scale operations. This helps prevent duplicate supervisor history entries #18715

Querying

Other querying improvements

  • Improved the user experience for invalid regex_exp queries. An error gets returned now #18762

Cluster management

Dynamic capacity for Kubernetes-based deployments

Druid can now dynamically tune the task runner capacity.

Include the capacity field in a POST API call to /druid/indexer/v1/k8s/taskrunner/executionconfig. Setting a value this way overrides druid.indexer.runner.capacity.

#18591

Server properties table

The sys.server_properties table exposes the runtime properties configured for each Druid server. Each row represents a single property key-value pair associated with a specific server.

#18692

Other cluster management improvements

  • Added quality of service filtering for the Overlord so that health check threads don't get blocked #18033

Data management

Other data management improvements

  • Added the mostFragmentedFirst compaction policy that prioritizes intervals with the most small ...

Druid 35.0.1

15 Dec 20:23


Apache Druid 35.0.1 is a patch release that contains important fixes for historical server segment dropping and the protobuf input format extension.

For information about new features in Druid 35, see the Druid 35 release notes.

Bug fixes

  • Fixed an issue where dropping segments didn't remove the memory mapping of segment files, leaving file descriptors of the deleted files open until the process exits #18782
  • Fixed an issue where the URL path for the protobuf input format wasn't loading #18770

Dependency updates

  • Updated lz4-java to 1.8.1 #18804

Druid 35.0.0

18 Nov 04:34


Apache Druid 35.0.0 contains over 229 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 29 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 35.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# Jetty

Druid 35 uses Jetty 12. This change may impact your deployment. For more information, see the upgrade note for Jetty 12.

# Java support

Druid now supports Java 21. Note that some versions of Java 21 encountered issues during testing, specifically Java 21.05-21.07. If possible, avoid these versions.

Additionally, support for Java 11 has been removed. Upgrade to Java 17 or 21.

#18424 #18624

# Projections

Projections have been improved:

  • Static filters are now supported
  • The granularity in queries can match UTC time zones

Additionally, there have been general improvements to performance and reliability.

#18342 #18535 #18403

# Virtual storage (experimental)

Virtual storage fabric mode enables Historical servers to serve more segments than their physical disk can hold. Instead of loading segments when they're published, Druid loads segments on demand during queries. Unneeded segments get removed from disk and are loaded again when a query requires them.

To enable virtual storage, set druid.segmentCache.virtualStorage to true.

This feature is experimental and has several limitations. For more information, see the pull request:

#18176

# New input format

Druid now supports a lines input format. Druid reads each line from an input as UTF-8 text and creates a single column named line that contains the entire line as a string. Use this for reading line-oriented data in a simple form for later processing.
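The resulting row shape can be pictured with a short emulation. This is not Druid code: the function name is hypothetical, and the handling of line terminators and empty lines shown here is an assumption.

```python
import io


def read_lines_as_rows(stream):
    """Decode a byte stream as UTF-8 and emit one row per line with a single 'line' column."""
    text = io.TextIOWrapper(stream, encoding="utf-8", newline=None)
    # Strip only the trailing newline so the rest of the line is kept verbatim.
    return [{"line": raw.rstrip("\n")} for raw in text]


rows = read_lines_as_rows(io.BytesIO(b"first event\nsecond \xc3\xa9vent\n"))
```

Each row holds the whole line as one string, which leaves parsing to a later step such as a SQL expression.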

#18433

# Multi-stage query task engine

The MSQ task engine is now a core capability of Druid rather than an extension. It has been in the default extension load list for several releases.

Remove druid-multi-stage-query from druid.extensions.loadList in common.runtime.properties before you upgrade.

Druid 35.0.0 ignores the extension if it's in the load list. Future versions of Druid will fail to start because they can't locate the extension.

#18394

# Improved monitor loading

You can now specify monitors in common.runtime.properties and each monitor will be loaded only on the applicable server types. Previously, you needed to define monitors in the specific runtime.properties file for the service a monitor is meant for.

#18321

# Exact count extension

A new contributor extension (druid-exact-count-bitmap) adds support for exact cardinality counting using Roaring Bitmap over a Long column.

#18021

# Improved indexSpec

Users can now specify a format specification for each JSON column individually, which will override the IndexSpec defined in the ingestion job. Additionally, a system-wide default IndexSpec can be set using the druid.indexing.formats.indexSpec property.

#17762 #18638

# Functional area and related changes

This section contains detailed release notes separated by areas.

# Web console

# Time zones

You can now configure whether the web console displays local time or UTC. This setting is stored locally in your browser and doesn't impact other users.

Note that the URL maintains the query parameters in UTC time, but the Druid console automatically converts the filter to local time.

#18455

# Other web console improvements

  • Added better support for MSQ task engine-based compaction tasks. They now use the stages pane to render the compaction report instead of showing the JSON #18545
  • Added a version column to the Services tab so that you can see what version a service is running. This is helpful during rolling upgrades to verify the state of the cluster and upgrade #18542
  • Improved the resiliency of the web console when the supervisor history is extensive #18416

# Ingestion

# Dimension schemas

At ingestion time, dimension schemas in dimensionsSpec are now strictly validated against allowed types. Previously an invalid type would fall back to string dimension. Now, such values are rejected. Users must specify a type that's one of the allowed types. Omitting type still defaults to string, preserving backward compatibility.

#18565

# Other ingestion improvements

  • Added support for session tokens (sessionToken) to the S3 input source #18609
  • Improved performance of task APIs served by the Overlord. Druid now reads the in-memory state of the Overlord before fetching task information from the metadata store #18448
  • Improved task execution so that tasks can complete successfully even if there are problems pushing logs and reports to deep storage #18210

# SQL-based ingestion

# Other SQL-based ingestion improvements
  • Added the ability to configure the maximum frame size. Generally, you don't need to change this unless you have very large rows #18442
  • Added logging for when segment processing fails #18378
  • Improved logging to store the cause of invalid field exceptions [#18517](ht...

Druid 34.0.0

11 Aug 15:12


Apache Druid 34.0.0 contains over 270 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 48 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 34.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

Important features, changes, and deprecations

This section contains important information about new and existing features.

Java 11 support

Java 11 support has been deprecated since Druid 32.0, and official support will be removed as early as Druid 35.0.0.

Hadoop-based ingestion

Hadoop-based ingestion has been deprecated since Druid 32.0 and will be removed as early as Druid 35.0.0.
We recommend one of Druid's other supported ingestion methods, such as SQL-based ingestion or MiddleManager-less ingestion using Kubernetes.

As part of this change, you must now opt-in to using the deprecated index_hadoop task type. If you don't do this, your Hadoop-based ingestion tasks will fail.

To opt-in, set druid.indexer.task.allowHadoopTaskExecution to true in your common.runtime.properties file.

#18239

Use SET statements for query context parameters

You can now use SET statements to define query context parameters for a query through the Druid console or the API.

#17894 #17974

SET statements in the Druid console

The web console now supports using SET statements to specify query context parameters. For example, if you include SET timeout = 20000; in your query, the timeout query context parameter is set:

SET timeout = 20000;
SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2

#17966

SET statements with the API

SQL queries issued to /druid/v2/sql can now include multiple SET statements to build up context for the final statement. For example, the following SQL query includes the timeout, useCache, populateCache, vectorize, and engine query context parameters:

SET timeout = 20000;
SET useCache = false;
SET populateCache = false;
SET vectorize = 'force';
SET engine = 'msq-dart';
SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2

An API call that uses SET statements looks like the following:

curl --location 'http://HOST:PORT/druid/v2/sql' \
--header 'Content-Type: application/json' \
--data '{
  "query": "SET timeout=20000; SET useCache=false; SET populateCache=false; SET engine='\''msq-dart'\'';SELECT  user,  commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia GROUP BY 1, 2 ORDER BY 2 DESC",
  "resultFormat": "array",
  "header": true,
  "typesHeader": true,
  "sqlTypesHeader": true
}'
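Building the request body programmatically avoids the shell quoting seen in the curl example. The following is a sketch under assumptions: `build_sql_payload` is a hypothetical helper, and the literal formatting (unquoted numbers and booleans, single-quoted strings) follows the SET examples above rather than a documented grammar.

```python
import json


def build_sql_payload(context, statement):
    """Prefix a SQL statement with one SET per context parameter and wrap it as JSON."""
    def literal(value):
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    sets = "".join(f"SET {key} = {literal(val)}; " for key, val in context.items())
    return json.dumps({"query": sets + statement, "resultFormat": "array", "header": True})


payload = build_sql_payload(
    {"timeout": 20000, "useCache": False, "engine": "msq-dart"},
    'SELECT "channel", COUNT(*) FROM "wikipedia" GROUP BY 1',
)
```

`json.dumps` takes care of escaping the double quotes inside the SQL, so the statement can be written as it would appear in the web console.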

This improvement also works for INSERT and REPLACE queries using the MSQ task engine. Note that JDBC isn't supported.

Improved HTTP endpoints

You can now use raw SQL in the HTTP body for /druid/v2/sql endpoints. You can set Content-Type to text/plain instead of application/json, so you can provide raw text that isn't escaped.
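A raw-SQL request can be sketched with the Python standard library. The host and port are placeholders, `build_raw_sql_request` is a hypothetical helper, and the request is only constructed, not sent, so the sketch runs without a cluster.

```python
import urllib.request


def build_raw_sql_request(base_url, sql):
    """Build a POST to /druid/v2/sql whose body is unescaped SQL text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_raw_sql_request(
    "http://localhost:8888",
    'SELECT "channel", COUNT(*) FROM "wikipedia" GROUP BY 1',
)
# Passing the request to urllib.request.urlopen would send it to a running cluster.
```

Because the body is plain text, the double-quoted identifiers need no JSON escaping.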

#17937

Cloning Historicals (experimental)

You can now configure clones for Historicals using the dynamic Coordinator configuration cloneServers. Cloned Historicals are useful for situations such as rolling updates where you want to launch a new Historical as a replacement for an existing one.

Set the config to a map from the target Historical server to the source Historical:

  "cloneServers": {"historicalClone":"historicalOriginal"}

The clone doesn't participate in regular segment assignment or balancing. Instead, the Coordinator mirrors any segment assignment made to the original Historical onto the clone, so that the clone becomes an exact copy of the source. Segments on the clone Historical do not count towards replica counts either. If the original Historical disappears, the clone remains in the last known state of the source server until removed from the cloneServers config.

When you query your data using the native query engine, you can prefer (preferClones), exclude (excludeClones), or include (includeClones) clones by setting the query context parameter cloneQueryMode. By default, clones are excluded.

As part of this change, new Coordinator APIs are available. For more information, see Coordinator APIs for clones.

#17863 #17899 #17956

Embedded kill tasks on the Overlord (experimental)

You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits; they:

  • Kill segments as soon as they're eligible
  • Don't take up task slots
  • Finish faster since they use optimized metadata queries and don't launch a new JVM
  • Kill a small number of segments per task, ensuring locks on an interval aren't held for too long
  • Skip locked intervals to avoid head-of-line blocking
  • Require minimal configuration
  • Can keep up with a large number of unused segments in the cluster

This feature is controlled by the following configs:

  • druid.manager.segments.killUnused.enabled - Whether the feature is enabled or not (Defaults to false)
  • druid.manager.segments.killUnused.bufferPeriod - The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused (Defaults to P30D)

To use embedded kill tasks, you need to have segment metadata cache enabled.

As part of this feature, new metrics have been added.

#18028 #18124

Preferred tier selection

You can now configure the Broker service to prefer Historicals on a specific tier. This is useful for deployments that span availability zones (AZs). Brokers in one AZ select Historicals in the same AZ by default but can still select Historicals in another AZ if none are available in the same AZ.

To enable this behavior, set the property druid.broker.select.tier to preferred in the Broker runtime properties. You can then set druid.broker.select.tier.preferred.tier to the tier you want each Broker to prefer. For example, for Brokers in AZ1, you could set this to the tier name of your AZ1 Historical servers.
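The fallback behavior described above can be pictured with a small sketch. It is not the Broker's selection code: the function name and the server records are hypothetical, and real tier selection also involves balancer strategies that are ignored here.

```python
def candidate_servers(servers, preferred_tier):
    """Return servers on the preferred tier, or every server if that tier has none."""
    preferred = [server for server in servers if server["tier"] == preferred_tier]
    return preferred if preferred else list(servers)


servers = [
    {"host": "historical-a1", "tier": "az1"},
    {"host": "historical-b1", "tier": "az2"},
]
same_az = candidate_servers(servers, "az1")
fallback = candidate_servers(servers, "az3")
```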

#18136

Dart improvements

The Dart query engine now uses the /druid/v2/sql endpoint like other SQL query engines. The former Dart specific endpoint is no longer supported. To use Dart for a query, include the engine query context parameter and set it to msq-dart.

#18003

Enabling Dart remains the same: add the following line to your broker/runtime.properties and historical/runtime.properties files:

druid.msq.dart.enabled = true

Additionally, Dart now queries real-time tasks by default. You can control this behavior by setting the query context parameter includeSegmentSource to REALTIME (default) or NONE, in a similar way to MSQ tasks. You can also run synchronous or asynchronous queries.

#18076 #18241

SegmentMetadataCache on the Coordinator

#17996 #17935

Functional area and related changes

This section contains detailed release notes separated by areas.

Web console

  • You can now assign tiered replications to tiers that aren't currently online #18050
  • You can now filter tasks by the error in the Task view #18057
  • Improved SQL autocomplete and added JSON autocomplete #18126
  • Changed how the web console det...

Druid 33.0.0

29 Apr 13:10


Apache Druid 33.0.0 contains over 190 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 44 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes before you upgrade to Druid 33.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# Increase segment load speed

You can now increase the speed at which segments get loaded on a Historical by providing a list of servers for the Coordinator dynamic config turboLoadingNodes. For these servers, the Coordinator ignores druid.coordinator.loadqueuepeon.http.batchSize and uses the value of the respective numLoadingThreads instead. Please note that putting a Historical in turbo-loading mode might affect query performance since more resources would be used by the segment loading threads.

#17775

# Overlord APIs for compaction (experimental)

You can use the following Overlord compaction APIs to manage compaction status and configs. These APIs work seamlessly irrespective of whether compaction supervisors are enabled or not.

For more information, see Compaction APIs.

#17834

# Scheduled batch ingestion (experimental)

You can now schedule batch ingestions with the MSQ task engine by using the scheduled batch supervisor. You can specify the schedule using either the standard Unix cron syntax or Quartz cron syntax by setting the type field to either unix or quartz. Unix also supports macro expressions such as @daily and others.

Submit your supervisor spec to the /druid/v2/sql/task/ endpoint.

The following example scheduled batch supervisor spec submits a REPLACE query every 5 minutes:

{
    "type": "scheduled_batch",
    "schedulerConfig": {
        "type": "unix",
        "schedule": "*/5 * * * *"
    },
    "spec": {
        "query": "REPLACE INTO foo OVERWRITE ALL SELECT * FROM bar PARTITIONED BY DAY"
    },
    "suspended": false
}

#17353

# Improved S3 upload

Druid can now use AWS S3 Transfer Manager for S3 uploads, which can significantly reduce segment upload time. This feature is on by default and controlled with the following configs in common.runtime.properties:

    druid.storage.transfer.useTransferManager=true
    druid.storage.transfer.minimumUploadPartSize=20971520
    druid.storage.transfer.multipartUploadThreshold=20971520

#17674

# Functional area and related changes

This section contains detailed release notes separated by areas.

# Web console

# MERGE INTO

The MERGE INTO keyword is now highlighted in the web console and the query gets treated as an insert query.

#17679

# Other web console improvements

  • Added the ability to multi-select in table filters and added suggestions to the Status field for tasks and supervisors as well as service type #17765
  • The Explore view now supports timezones #17650
  • Data exported from the web console is now normalized to how Druid exports data. Additionally, you can now export results as Markdown tables #17845

# Ingestion

# SQL-based ingestion

# Other SQL-based ingestion improvements

# Streaming ingestion

# Query parameter for restarts

You can now use an optional query parameter called skipRestartIfUnmodified for the /druid/indexer/v1/supervisor endpoint. You can set skipRestartIfUnmodified=true to not restart the supervisor if the spec is unchanged.

For example:

curl -X POST --header "Content-Type: application/json" -d @supervisor.json "localhost:8888/druid/indexer/v1/supervisor?skipRestartIfUnmodified=true"

#17707
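When you build the request URL in code rather than on the command line, the query parameter can be added conditionally. A minimal sketch, using only the Python standard library and a hypothetical helper name:

```python
from urllib.parse import urlencode, urlunsplit

SUPERVISOR_PATH = "/druid/indexer/v1/supervisor"


def supervisor_url(host: str, skip_restart_if_unmodified: bool = False,
                   scheme: str = "http") -> str:
    """Build the supervisor endpoint URL, optionally with skipRestartIfUnmodified."""
    query = ""
    if skip_restart_if_unmodified:
        query = urlencode({"skipRestartIfUnmodified": "true"})
    return urlunsplit((scheme, host, SUPERVISOR_PATH, query, ""))


url = supervisor_url("localhost:8888", skip_restart_if_unmodified=True)
print(url)
```

The sketch only constructs the URL. The supervisor spec is still sent as the JSON body of the POST request.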

# Other streaming ingestion improvements

  • Improved the efficiency of streaming ingestion by fetching active tasks from memory. This reduces the number of calls to the metadata store for active datasource task payloads #16098

# Querying

# Improved the query results API

The query results API (`GE...


Druid 32.0.1

19 Mar 16:07


The Apache Druid team is proud to announce the release of Apache Druid 32.0.1.
Druid is a high performance analytics data store for event-driven data.

Apache Druid 32.0.1 contains security fixes for CVE-2025-27888.

Source and binary distributions can be downloaded from:
https://druid.apache.org/downloads.html

Full Changelog: druid-32.0.0...druid-32.0.1

A big thank you to all the contributors in this milestone release!

Druid 31.0.2

19 Mar 16:07


The Apache Druid team is proud to announce the release of Apache Druid 31.0.2.
Druid is a high performance analytics data store for event-driven data.

Apache Druid 31.0.2 contains security fixes for CVE-2025-27888.

Source and binary distributions can be downloaded from:
https://druid.apache.org/downloads.html

Full Changelog: druid-31.0.1...druid-31.0.2

A big thank you to all the contributors in this milestone release!

Druid 32.0.0

13 Feb 08:00


Apache Druid 32.0.0 contains over 220 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 52 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the incompatible changes before you upgrade to Druid 32.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features

This section contains important information about new and existing features.

# New Overlord APIs

APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service:

  • Mark all segments of a datasource as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}

  • Mark all (non-overshadowed) segments of a datasource as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}

  • Mark multiple (non-overshadowed) segments as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed

  • Mark multiple segments as unused:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused

  • Mark a single segment as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

  • Mark a single segment as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

As part of this change, the corresponding Coordinator APIs have been deprecated and will be removed in a future release:

  • POST /druid/coordinator/v1/datasources/{dataSourceName}
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUsed
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}

The Coordinator now calls the Overlord to serve these requests.

#17545
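The deprecated Coordinator paths and the new Overlord paths listed above differ only in their prefix, so client code can be migrated mechanically. The following is a sketch with a hypothetical helper name. It is meant only for the six deprecated paths listed above, not for other Coordinator datasource APIs.

```python
COORDINATOR_PREFIX = "/druid/coordinator/v1/datasources/"
OVERLORD_PREFIX = "/druid/indexer/v1/datasources/"


def to_overlord_path(path: str) -> str:
    """Rewrite a deprecated Coordinator segment-marking path to its Overlord form."""
    if not path.startswith(COORDINATOR_PREFIX):
        raise ValueError(f"not a Coordinator datasource path: {path!r}")
    # Keep everything after the prefix (datasource name and any suffix) unchanged.
    return OVERLORD_PREFIX + path[len(COORDINATOR_PREFIX):]


new_path = to_overlord_path("/druid/coordinator/v1/datasources/wikipedia/markUnused")
print(new_path)
```

The HTTP method and request body stay the same. Only the path changes, and the request goes to the Overlord instead of the Coordinator.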

# Realtime query processing for multi-value strings

Realtime query processing no longer considers all strings as multi-value strings during expression processing, fixing a number of bugs and unexpected failures. This should also improve realtime query performance of expressions on string columns.

This change impacts topN queries for realtime segments where rows of data are implicitly null, such as from a property missing from a JSON object.

Before this change, these rows were handled as [] instead of null, leading to inconsistency between processing realtime segments and published segments. When processing realtime segments, the value was treated as [], which topN ignores. After publishing, the value became null, which topN does not ignore. As a result, the same query could return different results before and after the data was persisted.

After this change, the topN engine now treats [] as null when processing realtime segments, which is consistent with published segments.

This change doesn't impact actual multi-value string columns, regardless of whether they're realtime.

#17386
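To make the difference concrete, the following toy model counts rows per dimension value. It is an illustration only and not Druid's topN implementation: the function name, the flag, and the sample rows are all hypothetical.

```python
from collections import Counter


def toy_top_n(rows, dimension, n, empty_list_as_null):
    """Count rows per dimension value and return the n most frequent groups.

    When empty_list_as_null is False, rows whose value is [] contribute no
    group (the old realtime behavior). When it is True, [] is counted as
    null (None), matching how published segments behave.
    """
    counts = Counter()
    for row in rows:
        value = row.get(dimension)
        if value == []:
            if not empty_list_as_null:
                continue
            value = None
        counts[value] += 1
    return counts.most_common(n)


# Two rows where the property was missing and was represented as [].
realtime_rows = [{"city": "Oslo"}, {"city": []}, {"city": []}]

before = toy_top_n(realtime_rows, "city", 2, empty_list_as_null=False)
after = toy_top_n(realtime_rows, "city", 2, empty_list_as_null=True)
print(before)
print(after)
```

In this toy data, the old behavior drops the two implicit-null rows entirely, while the new behavior reports them as a null group, which is what a query over the published segment would return.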

# Join hints in MSQ task engine queries

Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries.

#17541

# Changes and deprecations

# ANSI-SQL compatibility and query results

Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant has been removed:

  • druid.generic.useDefaultValueForNull=true
  • druid.expressions.useStrictBooleans=false
  • druid.generic.useThreeValueLogicForNativeFilters=false

They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now.

If the configs are set to the legacy behavior, Druid services will fail to start.

If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade.

For more information about how to update your queries, see the migration guide.

#17568 #17609
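Because services fail to start when these configs are set to the legacy values, it can help to scan your properties files before upgrading. A minimal sketch, assuming simple key=value lines and using a hypothetical function name:

```python
# Legacy values listed above that Druid no longer supports.
LEGACY_SETTINGS = {
    "druid.generic.useDefaultValueForNull": "true",
    "druid.expressions.useStrictBooleans": "false",
    "druid.generic.useThreeValueLogicForNativeFilters": "false",
}


def find_legacy_settings(properties_text: str) -> list[str]:
    """Return the keys in a properties file that are set to a legacy value."""
    found = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without a key=value pair.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if LEGACY_SETTINGS.get(key) == value.lower():
            found.append(key)
    return found


sample = """
# common.runtime.properties (sample)
druid.generic.useDefaultValueForNull=true
druid.expressions.useStrictBooleans=true
"""
print(find_legacy_settings(sample))
```

The sketch does not handle line continuations or settings passed as JVM system properties, so treat a clean result as a first check rather than a guarantee.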

# Java support

Java support in Druid has been updated:

  • Java 8 support has been removed
  • Java 11 support is deprecated

We recommend that you upgrade to Java 17.

#17466

# Hadoop-based ingestion

Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion.

# Join hints in MSQ task engine queries

Join hints for MSQ task engine queries are written as /*+ hint */ comments placed directly after the SELECT keyword. The following query applies the sort_merge hint to the outer join and the broadcast hint to the join in the subquery:

select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
  select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN "wikipedia-set2" w4 ON w3.regionName = w4.regionName
) w1
JOIN "wikipedia-set1" w2 ON w1.cityName = w2.cityName
where w1.cityName='New York';

#17406

# Functional area and related changes

This section contains detailed release notes separated by areas.

# Web console

# Explore view (experimental)

Several improvements have been made to the Explore view in the web console.

#17627

# Segment timeline view

The segment timeline is now more interactive and no longer forces day granularity.

#17521

# Other web conso...


Druid 31.0.1

25 Dec 05:42


Apache Druid 31.0.1 is a patch release that contains important fixes for topN queries using query granularity other than 'ALL' and for the new complex metric column compression feature introduced in Druid 31.0.0. It also contains fixes for the web console, the new projections feature, and a fix for a minor performance regression.

See the complete set of changes for 31.0.1 for additional details.

For information about new features in Druid 31, see the Druid 31 release notes.

# Bug fixes

  • Fixes an issue with topN queries that use a query granularity other than 'ALL', which could cause some query correctness issues #17565
  • Fixes an issue with complex metric compression that caused some data to be read incorrectly, resulting in segment data corruption or system instability due to out-of-memory exceptions. We recommend that you reingest data if you use compression for complex metric columns #17422
  • Fixes an issue with projection segment merging #17460
  • Fixes web console progress indicator #17334
  • Fixes a minor performance regression with query processing #17397

# Credits

@clintropolis
@findingrish
@gianm
@techdocsmith
@vogievetsky