Releases: apache/druid
Druid 37.0.0
Apache Druid 37.0.0 contains over 255 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 29 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes and incompatible changes before you upgrade to Druid 37.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
Important features, changes, and deprecations
This section contains important information about new and existing features.
Hadoop-based ingestion
Support for Hadoop-based ingestion has been removed. The feature was deprecated in Druid 34.
Use one of Druid's other supported ingestion methods, such as SQL-based ingestion or MiddleManager-less ingestion using Kubernetes.
Query blocklist
You can now use the Broker API (/druid/coordinator/v1/config/broker) to create a query blocklist to dynamically block queries by datasource, query type, or query context. The blocklist takes effect without restarting Druid. Block rules use AND logic, which means all criteria must match.
The following example blocks all groupBy queries on the wikipedia datasource with a query context parameter of priority equal to 0:
POST /druid/coordinator/v1/config/broker
{
"queryBlocklist": [
{
"ruleName": "block-wikipedia-groupbys",
"dataSources": ["wikipedia"],
"queryTypes": ["groupBy"],
"contextMatches": {"priority": "0"}
}
]
}
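To make the AND semantics concrete, here is a minimal Python sketch of how a single rule could be evaluated against a native query. It is an illustration only, not Druid's implementation: the function name rule_matches is hypothetical, and the sketch assumes that a criterion left out of a rule matches any query and that context values are compared as strings.

```python
def rule_matches(rule: dict, query: dict) -> bool:
    """Return True only if every criterion present in the rule matches the query."""
    # Assumption: an omitted or empty criterion places no restriction on the query.
    data_sources = rule.get("dataSources")
    if data_sources and query.get("dataSource") not in data_sources:
        return False

    query_types = rule.get("queryTypes")
    if query_types and query.get("queryType") not in query_types:
        return False

    context = query.get("context", {})
    for key, expected in rule.get("contextMatches", {}).items():
        # Assumption: compare as strings, because the example rule stores "0".
        if key not in context or str(context[key]) != str(expected):
            return False

    return True


rule = {
    "ruleName": "block-wikipedia-groupbys",
    "dataSources": ["wikipedia"],
    "queryTypes": ["groupBy"],
    "contextMatches": {"priority": "0"},
}
```

Under these assumptions, a groupBy query on wikipedia with priority 0 matches the rule, while the same query with a different query type or priority does not.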
Minor compaction for Overlord-based compaction (experimental)
You can now configure minor compaction to compact only newly ingested segments while upgrading existing compacted segments. When Druid upgrades segments, it updates the metadata instead of using resources to compact it again. You can use the native compaction engine or the MSQ task engine.
Use the mostFragmentedFirst compaction policy and set either a percentage of rows-based or byte-based threshold for minor compaction.
Cascading reindexing (experimental)
Using cascading reindexing, you can now define age-based rules to automatically apply different compaction configurations based on the age of your data. While standard auto-compaction applies a single flat configuration across an entire datasource, cascading reindexing lets you tailor your compaction settings to the characteristics of your data.
For example, you can keep recent data in hourly segments while automatically rolling up to daily segments after 90 days to reduce segment count. You can also layer on age-based row deletion (such as dropping bot traffic from older data), change compression settings, or shift to rollup with coarser query granularity as data ages. Rules are defined inline in the supervisor spec.
You must use compaction supervisors with the MSQ task engine to use cascading reindexing.
Multi-supervisor ingestion
Multi-supervisor ingestion is now generally available. You can run multiple stream supervisors that ingest into the same datasource.
Read-only authorizer
Added a ReadOnly authorizer to Druid. This is the first global authorizer for Druid. The authorizer enforces a global restriction on all non-READ operations, denying them regardless of individual user permissions. You can use this capability to ensure all users of a specific authorizer are limited to READ access.
There is a known limitation where some endpoints currently require WRITE access despite being READ-only, such as GET /druid/indexer/v1/supervisor. These operations will fail.
Thrift input format
As part of the Thrift contributor extension, Druid now supports Thrift-encoded data for Kafka and Kinesis streaming ingestion using InputFormat. Previously, Druid supported this through parsers, which have been removed in Druid 37.
To use this feature, you must add druid-thrift-extensions to your extension load list.
Incremental cache
Incremental segment metadata cache (useIncrementalCache) is now generally available and defaults to ifSynced. Druid blocks reads from the cache until it has synced with the metadata store at least once after becoming leader.
Kubernetes-based task management
This extension is now generally available.
Dynamic default query context
You can now add default query context parameters as a dynamic configuration to the Broker. This allows you to override static defaults set in your runtime properties without restarting your deployment or having to update multiple queries individually. Druid applies query context parameters based on the following priority:
- The query context included with the query
- The query context set as a dynamic configuration on the Broker
- The query context parameters set in the runtime properties
- The defaults that ship with Druid
Note that like other Broker dynamic configuration, this is best-effort. Settings may not be applied in certain
cases, such as when a Broker has recently started and hasn't received the configuration yet, or if the
Broker can't contact the Coordinator. If a query context parameter is critical for all your queries, set it in the runtime properties.
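The priority order above can be sketched in Python as a layered lookup. This is a rough illustration rather than how the Broker resolves context internally: the function name effective_context and all of the parameter values are made up for the example.

```python
from collections import ChainMap


def effective_context(query_ctx, dynamic_ctx, runtime_ctx, shipped_defaults):
    """Merge the four context sources, highest priority first."""
    # ChainMap searches its mappings left to right, so earlier mappings win.
    return dict(ChainMap(query_ctx, dynamic_ctx, runtime_ctx, shipped_defaults))


# Illustrative values only.
resolved = effective_context(
    query_ctx={"priority": 10},
    dynamic_ctx={"timeout": 30000, "priority": 5},
    runtime_ctx={"timeout": 60000},
    shipped_defaults={"timeout": 300000, "useCache": True},
)
```

Here the query's own priority wins over the dynamic configuration, the dynamic timeout wins over the runtime property, and useCache falls through to the shipped default.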
sys.queries table (experimental)
The new system queries table provides information about currently running and recently completed queries that use the Dart engine. This table is off by default. To enable the table, set the following:
druid.sql.planner.enableSysQueriesTable = true
As part of this change, the /druid/v2/sql/queries API now supports an includeComplete parameter that shows recently completed queries.
Auto-compaction with compaction supervisors
Auto-compaction using compaction supervisors has been improved, is now generally available, and is the recommended default. Automatic compaction tasks are now prefixed with auto instead of coordinator-issued.
As part of the improvement, compaction states are now stored in a central location, a new indexingStates table. Individual segments only need to store a unique reference (indexing_state_fingerprint) to their full compaction state.
Since many segments in a single datasource share the same underlying compaction state, this greatly reduces metadata storage requirements for automatic compaction.
For backwards compatibility, Druid continues to persist the detailed compaction state in each segment. This functionality will be removed in a future release.
You can stop storing detailed compaction state by setting storeCompactionStatePerSegment to false in the cluster compaction config. If you turn it off and need to downgrade, Druid needs to re-compact any segments that have been compacted since you changed the config.
This change has upgrade impacts for metadata storage and metadata caching. For more information, see the Metadata storage for auto-compaction with compaction supervisors upgrade note.
Broker tier selection for realtime servers
Added druid.broker.realtime.select.tier and druid.broker.realtime.balancer.type on the Brokers to optionally override the Broker’s tier selection and balancer strategies for realtime servers. If these properties are not set (the default), realtime servers continue to use the existing druid.broker.select and druid.broker.balancer configurations that apply to both historical and realtime servers.
Manual Broker routing in the web console
You can now configure which Broker the Router uses for queries issued from the web console. You may want to do this if there are Brokers that don't have visibility into certain data tiers, and you know you're querying data available only on a certain tier.
To specify a Broker, add the following config to web-console/console-config.js:
consoleBrokerService: 'druid/BROKER_NAME'
Consul extension
The contributor extension druid-consul-extensions lets Druid clusters use Consul for service discovery and
Coordinator/Overlord leader election instead of ZooKeeper. The extension supports ACLs, TLS/mTLS, and metrics.
Before you switch to Consul, you need to set
druid.serverview.type=http and druid.indexer.runner.type=httpRemote cluster-wide.
Functional area and related changes
This section contains detailed release notes separated by areas.
Web console
Changed storage column displays
The following improvements have been m...
Druid 36.0.0
Apache Druid 36.0.0 contains over 189 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 34 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes before you upgrade to Druid 36.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
Important features, changes, and deprecations
This section contains important information about new and existing features.
Functional area and related changes
This section contains detailed release notes separated by areas.
Druid operator
Druid Operator is a Kubernetes controller that manages the lifecycle of your Druid clusters. The operator simplifies the management of Druid clusters with its custom logic that is configurable through
Kubernetes CRDs.
Cost-based autoscaling for streaming ingestion
Druid now supports cost-based autoscaling for streaming ingestion that optimizes task count by balancing lag reduction against resource efficiency. This autoscaling strategy uses the following formula:
totalCost = lagWeight × lagRecoveryTime + idleWeight × idlenessCost
which accounts for the time to clear the backlog and compute time:
lagRecoveryTime = aggregateLag / (taskCount × avgProcessingRate) — time to clear backlog
idlenessCost = taskCount × taskDuration × predictedIdleRatio — wasted compute time
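As a worked example of the formula, the following Python sketch computes the cost for a few candidate task counts and picks the cheapest. It is not the autoscaler's code: the function names, the weights of 1.0, the candidate task counts, and the predicted idle ratios are all assumptions chosen for illustration.

```python
def total_cost(task_count, aggregate_lag, avg_processing_rate, task_duration,
               predicted_idle_ratio, lag_weight=1.0, idle_weight=1.0):
    """Evaluate totalCost for one candidate task count."""
    # Time to clear the backlog with this many tasks.
    lag_recovery_time = aggregate_lag / (task_count * avg_processing_rate)
    # Compute time expected to be wasted on idle tasks.
    idleness_cost = task_count * task_duration * predicted_idle_ratio
    return lag_weight * lag_recovery_time + idle_weight * idleness_cost


def cheapest_task_count(idle_ratio_by_task_count, **inputs):
    """Return the candidate task count with the lowest total cost."""
    return min(
        idle_ratio_by_task_count,
        key=lambda n: total_cost(n, predicted_idle_ratio=idle_ratio_by_task_count[n], **inputs),
    )


# Hypothetical inputs: 1,000,000 records of lag, 1,000 records/s per task,
# one-hour tasks, and a guessed idle ratio for each candidate task count.
idle_ratios = {1: 0.0, 2: 0.05, 4: 0.4}
best = cheapest_task_count(
    idle_ratios,
    aggregate_lag=1_000_000,
    avg_processing_rate=1_000,
    task_duration=3_600,
)
```

With these numbers, one task costs about 1000, two tasks about 860, and four tasks about 6010, so two tasks is the cheapest option: adding a second task halves the recovery time at a modest idleness cost, while four tasks would mostly sit idle.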
Kubernetes client mode (experimental)
In kubernetes-overlord-extensions an experimental Kubernetes client mode was added. The new mode uses the fabric8 SharedInformers to cache k8s metadata. This greatly reduces API traffic between the Overlord and k8s control plane. You can try out this feature using the following config:
druid.indexer.runner.useK8sSharedInformers=true
cgroup v2 support
cgroup v2 is now supported, and all cgroup metrics now emit cgroupversion to identify which version is being used.
The following monitors automatically switch to v2 if v2 is detected: CgroupCpuMonitor, CgroupCpuSetMonitor, CgroupDiskMonitor, MemoryMonitor. CpuAcctDeltaMonitor fails gracefully if v2 is detected.
Additionally, CgroupV2CpuMonitor now also emits cgroup/cpu/shares and cgroup/cpu/cores_quota.
Query reports for Dart
Dart now supports query reports for running and recently completed queries. The reports can be fetched from the /druid/v2/sql/queries/<sqlQueryId>/reports endpoint.
The format of the response is a JSON object with two keys, "query" and "report". The "query" key is the same info that is available from the existing /druid/v2/sql/queries endpoint. The "report" key is a report map including an MSQ report.
You can control the retention behavior for reports using the following configs:
- druid.msq.dart.controller.maxRetainedReportCount: Max number of reports that are retained. The default is 0, meaning no reports are retained.
- druid.msq.dart.controller.maxRetainedReportDuration: How long reports are retained in ISO 8601 duration format. The default is PT0S, meaning time-based expiration is turned off.
New segment format
The new version 10 segment format improves upon version 9. It is off by default and not compatible with older segment format versions.
Set druid.indexer.task.buildV10=true to make segments in the new format.
If you downgrade, you must reindex your data with a supported segment format version.
You can use the bin/dump-segment tool to view segment metadata. The tool outputs serialized JSON.
Web console
New info available in the web console
The web console now includes information about the number of available processors and the total memory (in binary bytes).
This information is also available through the sys.servers table.
Other web console improvements
- Added tracking for inactive workers for MSQ execution stages #18768
- Added a refresh button for JSON views and stage viewers #18768
- You can now define ARRAY type parameters in the query view #18586
- Changed system table queries to now automatically use the native engine #18857
- Improved time charts to support multiple measures #18701
Ingestion
- Added support for AWS InternalError code retries #18720
- Improved ingestion to be more resilient. Ingestion tasks no longer fail if the task log upload fails with an exception #18748
- Improved how Druid handles situations where data doesn't match the expected type #18878
- Improved JSON ingestion so that Druid can compute JSON values directly from dictionary or index structures, allowing ingestion to skip persisting raw JSON data entirely. This reduces on-disk storage size #18589
- You can now choose between full dictionary-based indexing and nulls-only indexing for long/double fields in a nested column #18722
SQL-based ingestion
Additional ingestion configurations
You can now use the following configs to control how your data gets ingested and stored:
- maxInputFilesPerWorker: Controls the maximum number of input files or segments per worker.
- maxPartitions: Controls the maximum number of output partitions for any single stage, which affects how many segments are generated during ingestion.
Other SQL-based ingestion improvements
- Added maxRowsInMemory to replace rowsInMemory. rowsInMemory now functions as an alternate way to provide that config and is ignored if maxRowsInMemory is specified. Previously, only rowsInMemory existed #18832
Streaming ingestion
Record offset and partition
You can now ingest the record offset (offsetColumnName) and partition (partitionColumnName) using the KafkaInputFormat. Their default names are kafka.offset and kafka.partition, respectively.
Other streaming ingestion improvements
- Improved supervisors so that they can't kill tasks while the supervisor is stopping #18767
- Improved the lag-based autoscaler for streaming ingestion #18745
- Improved the SeekableStream supervisor autoscaler to wait for tasks to complete before attempting subsequent scale operations. This helps prevent duplicate supervisor history entries #18715
Querying
Other querying improvements
- Improved the user experience for invalid regex_exp queries. An error gets returned now #18762
Cluster management
Dynamic capacity for Kubernetes-based deployments
Druid can now dynamically tune the task runner capacity.
Include the capacity field in a POST API call to /druid/indexer/v1/k8s/taskrunner/executionconfig. Setting a value this way overrides druid.indexer.runner.capacity.
Server properties table
The sys.server_properties table exposes the runtime properties configured for each Druid server. Each row represents a single property key-value pair associated with a specific server.
Other cluster management improvements
- Added quality of service filtering for the Overlord so that health check threads don't get blocked #18033
Data management
Other data management improvements
- Added the mostFragmentedFirst compaction policy that prioritizes intervals with the most small ...
Druid 35.0.1
Apache Druid 35.0.1 is a patch release that contains important fixes for historical server segment dropping and the protobuf input format extension.
For information about new features in Druid 35, see the Druid 35 release notes.
Bug fixes
- Fixed an issue where dropping segments didn't remove the memory mapping of segment files, leaving file descriptors of the deleted files open until the process exits #18782
- Fixed an issue where the URL path for the protobuf input format wasn't loading #18770
Dependency updates
- Updated lz4-java to 1.8.1 #18804
Druid 35.0.0
Apache Druid 35.0.0 contains over 229 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 29 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes and incompatible changes before you upgrade to Druid 35.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
# Important features, changes, and deprecations
This section contains important information about new and existing features.
# Jetty
Druid 35 uses Jetty 12. This change may impact your deployment. For more information, see the upgrade note for Jetty 12.
# Java support
Druid now supports Java 21. Note that some versions of Java 21 encountered issues during testing, specifically Java 21.05-21.07. If possible, avoid these versions.
Additionally, support for Java 11 has been removed. Upgrade to Java 17 or 21.
# Projections
Projections have been improved:
- Static filters are now supported
- The granularity in queries can match UTC time zones
Additionally, there have been general improvements to performance and reliability.
# Virtual storage (experimental)
Virtual storage fabric mode enables Historical servers to serve more segments than their physical disk can hold. Instead of loading segments when they're published, Druid loads segments on demand during queries. Unneeded segments get removed from disk and are loaded again when a query requires them.
To enable virtual storage, set druid.segmentCache.virtualStorage to true.
This feature is experimental and has several limitations. For more information, see the pull request:
# New input format
Druid now supports a lines input format. Druid reads each line from an input as UTF-8 text and creates a single column named line that contains the entire line as a string. Use this for reading line-oriented data in a simple form for later processing.
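The row shape this produces can be mimicked with a few lines of Python. This is only a sketch of the behavior described above, not Druid code: the function name rows_from_lines is hypothetical, and the sketch assumes newline-terminated input and does not try to reproduce how Druid treats blank lines or other line terminators.

```python
import io


def rows_from_lines(raw: bytes) -> list:
    """Decode UTF-8 input and emit one row per line with a single "line" column."""
    text = raw.decode("utf-8")
    # io.StringIO iterates line by line, splitting on "\n" only.
    return [{"line": line.rstrip("\n")} for line in io.StringIO(text)]


rows = rows_from_lines("alice,3\nbob,5\n".encode("utf-8"))
# Later processing can then split or parse the single string column.
parsed = [row["line"].split(",") for row in rows]
```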
# Multi-stage query task engine
The MSQ task engine is now a core capability of Druid rather than an extension. It has been in the default extension load list for several releases.
Remove druid-multi-stage-query from druid.extensions.loadList in common.runtime.properties before you upgrade.
Druid 35.0.0 ignores the extension if it's in the load list. Future versions of Druid will fail to start because they can't locate the extension.
# Improved monitor loading
You can now specify monitors in common.runtime.properties and each monitor will be loaded only on the applicable server types. Previously, you needed to define monitors in the specific runtime.properties file for the service a monitor is meant for.
# Exact count extension
A new contributor extension (druid-exact-count-bitmap) adds support for exact cardinality counting using Roaring Bitmap over a Long column.
# Improved indexSpec
Users can now specify a format specification for each JSON column individually, which will override the IndexSpec defined in the ingestion job. Additionally, a system-wide default IndexSpec can be set using the druid.indexing.formats.indexSpec property.
# Functional area and related changes
This section contains detailed release notes separated by areas.
# Web console
# Time zones
You can now configure whether the web console displays local time or UTC. This setting is stored locally in your browser and doesn't impact other users.
Note that the URL maintains the query parameters in UTC time, but the Druid console automatically converts the filter to local time.
# Other web console improvements
- Added better support for MSQ task engine-based compaction tasks. They now use the stages pane to render the compaction report instead of showing the JSON #18545
- Added a version column to the Services tab so that you can see what version a service is running. This is helpful during rolling upgrades to verify the state of the cluster and upgrade #18542
- Improved the resiliency of the web console when the supervisor history is extensive #18416
# Ingestion
# Dimension schemas
At ingestion time, dimension schemas in dimensionsSpec are now strictly validated against allowed types. Previously an invalid type would fall back to string dimension. Now, such values are rejected. Users must specify a type that's one of the allowed types. Omitting type still defaults to string, preserving backward compatibility.
# Other ingestion improvements
- Added support for session tokens (sessionToken) to the S3 input source #18609
- Improved performance of task APIs served by the Overlord. Druid now reads the in-memory state of the Overlord before fetching task information from the metadata store #18448
- Improved task execution so that tasks can successfully complete even if there are problems pushing logs and reports to deep storage #18210
# SQL-based ingestion
# Other SQL-based ingestion improvements
Druid 34.0.0
Apache Druid 34.0.0 contains over 270 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 48 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes and incompatible changes before you upgrade to Druid 34.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
Important features, changes, and deprecations
This section contains important information about new and existing features.
Java 11 support
Java 11 support has been deprecated since Druid 32.0, and official support will be removed as early as Druid 35.0.0.
Hadoop-based ingestion
Hadoop-based ingestion has been deprecated since Druid 32.0 and will be removed as early as Druid 35.0.0.
We recommend one of Druid's other supported ingestion methods, such as SQL-based ingestion or MiddleManager-less ingestion using Kubernetes.
As part of this change, you must now opt in to using the deprecated index_hadoop task type. If you don't do this, your Hadoop-based ingestion tasks will fail.
To opt in, set druid.indexer.task.allowHadoopTaskExecution to true in your common.runtime.properties file.
Use SET statements for query context parameters
You can now use SET statements to define query context parameters for a query through the Druid console or the API.
SET statements in the Druid console
The web console now supports using SET statements to specify query context parameters. For example, if you include SET timeout = 20000; in your query, the timeout query context parameter is set:
SET timeout = 20000;
SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2
SET statements with the API
SQL queries issued to /druid/v2/sql can now include multiple SET statements to build up context for the final statement. For example, the following SQL query includes the timeout, useCache, populateCache, vectorize, and engine query context parameters:
SET timeout = 20000;
SET useCache = false;
SET populateCache = false;
SET vectorize = 'force';
SET engine = 'msq-dart';
SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2
The API call for this query looks like the following:
curl --location 'http://HOST:PORT/druid/v2/sql' \
--header 'Content-Type: application/json' \
--data '{
"query": "SET timeout=20000; SET useCache=false; SET populateCache=false; SET engine='\''msq-dart'\'';SELECT user, commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia GROUP BY 1, 2 ORDER BY 2 DESC",
"resultFormat": "array",
"header": true,
"typesHeader": true,
"sqlTypesHeader": true
}'
This improvement also works for INSERT and REPLACE queries using the MSQ task engine. Note that JDBC isn't supported.
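If you build such requests programmatically, a small helper can prepend the SET statements for you. The following Python sketch is one possible approach, not an official client: the helper names sql_literal and with_set_statements are hypothetical, and escaping single quotes by doubling them is an assumption based on standard SQL string literals.

```python
import json


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal for a SET statement."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: single quotes inside strings are escaped by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def with_set_statements(context: dict, sql: str) -> str:
    """Prefix a SQL statement with one SET statement per context parameter."""
    prefix = "".join(f"SET {key} = {sql_literal(value)}; " for key, value in context.items())
    return prefix + sql


query = with_set_statements(
    {"timeout": 20000, "useCache": False, "engine": "msq-dart"},
    'SELECT "channel", COUNT(*) FROM "wikipedia" GROUP BY 1',
)
# JSON body for a POST to /druid/v2/sql with Content-Type: application/json.
payload = json.dumps({"query": query, "resultFormat": "array", "header": True})
```

Letting json.dumps handle the escaping avoids the manual quote juggling needed in the curl example.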
Improved HTTP endpoints
You can now use raw SQL in the HTTP body for /druid/v2/sql endpoints. You can set Content-Type to text/plain instead of application/json, so you can provide raw text that isn't escaped.
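As a rough illustration, the following Python sketch builds such a request with the standard library. The function name raw_sql_request and the base URL http://localhost:8888 are placeholders, and the request is only constructed here, not sent.

```python
import urllib.request


def raw_sql_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request that carries unescaped SQL as the body."""
    return urllib.request.Request(
        url=base_url + "/druid/v2/sql",
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = raw_sql_request(
    "http://localhost:8888",  # placeholder host and port
    'SELECT "channel", COUNT(*) FROM "wikipedia" GROUP BY 1',
)
# To send it against a running cluster: urllib.request.urlopen(request)
```

Because the body is plain text, the double quotes around identifiers don't need the backslash escaping that a JSON payload requires.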
Cloning Historicals (experimental)
You can now configure clones for Historicals using the dynamic Coordinator configuration cloneServers. Cloned Historicals are useful for situations such as rolling updates where you want to launch a new Historical as a replacement for an existing one.
Set the config to a map from the target Historical server to the source Historical:
"cloneServers": {"historicalClone":"historicalOriginal"}
The clone doesn't participate in regular segment assignment or balancing. Instead, the Coordinator mirrors any segment assignment made to the original Historical onto the clone, so that the clone becomes an exact copy of the source. Segments on the clone Historical do not count towards replica counts either. If the original Historical disappears, the clone remains in the last known state of the source server until removed from the cloneServers config.
When you query your data using the native query engine, you can prefer (preferClones), exclude (excludeClones), or include (includeClones) clones by setting the query context parameter cloneQueryMode. By default, clones are excluded.
As part of this change, new Coordinator APIs are available. For more information, see Coordinator APIs for clones.
Embedded kill tasks on the Overlord (experimental)
You can now run kill tasks directly on the Overlord itself. Embedded kill tasks provide several benefits; they:
- Kill segments as soon as they're eligible
- Don't take up task slots
- Finish faster since they use optimized metadata queries and don't launch a new JVM
- Kill a small number of segments per task, ensuring locks on an interval aren't held for too long
- Skip locked intervals to avoid head-of-line blocking
- Require minimal configuration
- Can keep up with a large number of unused segments in the cluster
This feature is controlled by the following configs:
- druid.manager.segments.killUnused.enabled: Whether the feature is enabled or not. Defaults to false.
- druid.manager.segments.killUnused.bufferPeriod: The amount of time that a segment must be unused before it can be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused. Defaults to P30D.
To use embedded kill tasks, you need to have segment metadata cache enabled.
As part of this feature, new metrics have been added.
Preferred tier selection
You can now configure the Broker service to prefer Historicals on a specific tier. This is useful for deployments that span availability zones (AZs). Brokers in one AZ select Historicals in the same AZ by default but keep the ability to select Historicals in another AZ if none are available in the same AZ.
To enable this behavior, set the druid.broker.select.tier property to preferred in the Broker runtime properties. You can then set druid.broker.select.tier.preferred.tier to the tier you want each Broker to prefer. For example, for Brokers in AZ1, you could set this to the tier name of your AZ1 Historical servers.
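The fallback behavior can be pictured with a short Python sketch. This is a simplified model of the selection described above, not the Broker's actual algorithm: the function name candidate_servers and the tier and server names are invented for the example.

```python
def candidate_servers(servers_by_tier: dict, preferred_tier: str) -> list:
    """Prefer servers in the preferred tier; fall back to other tiers if it is empty."""
    preferred = servers_by_tier.get(preferred_tier, [])
    if preferred:
        return list(preferred)
    # No server is available in the preferred tier, so consider every other tier.
    return [
        server
        for tier, servers in servers_by_tier.items()
        if tier != preferred_tier
        for server in servers
    ]


# Hypothetical tiers, one per availability zone.
normal = candidate_servers({"az1": ["hist-1a", "hist-1b"], "az2": ["hist-2a"]}, "az1")
fallback = candidate_servers({"az1": [], "az2": ["hist-2a"]}, "az1")
```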
Dart improvements
The Dart query engine now uses the /druid/v2/sql endpoint like other SQL query engines. The former Dart specific endpoint is no longer supported. To use Dart for a query, include the engine query context parameter and set it to msq-dart.
Enabling Dart remains the same: add the following line to your broker/runtime.properties and historical/runtime.properties files:
druid.msq.dart.enabled = true
Additionally, Dart now queries real-time tasks by default. You can control this behavior by setting the query context parameter includeSegmentSource to REALTIME (default) or NONE, in a similar way to MSQ tasks. You can also run synchronous or asynchronous queries.
SegmentMetadataCache on the Coordinator
Functional area and related changes
This section contains detailed release notes separated by areas.
Web console
Druid 33.0.0
Apache Druid 33.0.0 contains over 190 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 44 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes before you upgrade to Druid 33.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
# Important features, changes, and deprecations
This section contains important information about new and existing features.
# Increase segment load speed
You can now increase the speed at which segments get loaded on a Historical by providing a list of servers for the Coordinator dynamic config turboLoadingNodes. For these servers, the Coordinator ignores druid.coordinator.loadqueuepeon.http.batchSize and uses the value of the respective numLoadingThreads instead. Please note that putting a Historical in turbo-loading mode might affect query performance since more resources would be used by the segment loading threads.
# Overlord APIs for compaction (experimental)
You can use the following Overlord compaction APIs to manage compaction status and configs. These APIs work seamlessly irrespective of whether compaction supervisors are enabled or not.
For more information, see Compaction APIs.
# Scheduled batch ingestion (experimental)
You can now schedule batch ingestions with the MSQ task engine by using the scheduled batch supervisor. You can specify the schedule using either the standard Unix cron syntax or Quartz cron syntax by setting the type field to either unix or quartz. Unix also supports macro expressions such as @daily and others.
Submit your supervisor spec to the /druid/v2/sql/task/ endpoint.
The following example scheduled batch supervisor spec submits a REPLACE query every 5 minutes:
{
"type": "scheduled_batch",
"schedulerConfig": {
"type": "unix",
"schedule": "*/5 * * * *"
},
"spec": {
"query": "REPLACE INTO foo OVERWRITE ALL SELECT * FROM bar PARTITIONED BY DAY"
},
"suspended": false
}
# Improved S3 upload
Druid can now use AWS S3 Transfer Manager for S3 uploads, which can significantly reduce segment upload time. This feature is on by default and controlled with the following configs in common.runtime.properties:
druid.storage.transfer.useTransferManager=true
druid.storage.transfer.minimumUploadPartSize=20971520
druid.storage.transfer.multipartUploadThreshold=20971520
# Functional area and related changes
This section contains detailed release notes separated by areas.
# Web console
# MERGE INTO
The MERGE INTO keyword is now highlighted in the web console and the query gets treated as an insert query.
# Other web console improvements
- Added the ability to multi-select in table filters and added suggestions to the Status field for tasks and supervisors as well as service type #17765
- The Explore view now supports timezones #17650
- Data exported from the web console is now normalized to how Druid exports data. Additionally, you can now export results as Markdown tables #17845
# Ingestion
# SQL-based ingestion
# Other SQL-based ingestion improvements
# Streaming ingestion
# Query parameter for restarts
You can now use an optional query parameter called skipRestartIfUnmodified for the /druid/indexer/v1/supervisor endpoint. You can set skipRestartIfUnmodified=true to not restart the supervisor if the spec is unchanged.
For example:
curl -X POST --header "Content-Type: application/json" -d @supervisor.json "localhost:8888/druid/indexer/v1/supervisor?skipRestartIfUnmodified=true"
# Other streaming ingestion improvements
- Improved the efficiency of streaming ingestion by fetching active tasks from memory. This reduces the number of calls to the metadata store for active datasource task payloads #16098
# Querying
# Improved the query results API
The query results API (`GE...
Druid 32.0.1
The Apache Druid team is proud to announce the release of Apache Druid 32.0.1.
Druid is a high performance analytics data store for event-driven data.
Apache Druid 32.0.1 contains security fixes for CVE-2025-27888.
Source and binary distributions can be downloaded from:
https://druid.apache.org/downloads.html
Full Changelog: druid-32.0.0...druid-32.0.1
A big thank you to all the contributors in this milestone release!
Druid 31.0.2
The Apache Druid team is proud to announce the release of Apache Druid 31.0.2.
Druid is a high performance analytics data store for event-driven data.
Apache Druid 31.0.2 contains security fixes for CVE-2025-27888.
Source and binary distributions can be downloaded from:
https://druid.apache.org/downloads.html
Full Changelog: druid-31.0.1...druid-31.0.2
A big thank you to all the contributors in this milestone release!
Druid 32.0.0
Apache Druid 32.0.0 contains over 220 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 52 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the incompatible changes before you upgrade to Druid 32.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
# Important features
This section contains important information about new and existing features.
# New Overlord APIs
APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service:
- Mark all segments of a datasource as unused:
  DELETE /druid/indexer/v1/datasources/{dataSourceName}
- Mark all (non-overshadowed) segments of a datasource as used:
  POST /druid/indexer/v1/datasources/{dataSourceName}
- Mark multiple (non-overshadowed) segments as used:
  POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed
- Mark multiple segments as unused:
  POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused
- Mark a single segment as used:
  POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}
- Mark a single segment as unused:
  DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}
As part of this change, the corresponding Coordinator APIs have been deprecated and will be removed in a future release:
- POST /druid/coordinator/v1/datasources/{dataSourceName}
- POST /druid/coordinator/v1/datasources/{dataSourceName}/markUsed
- POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused
- POST /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
- DELETE /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
- DELETE /druid/coordinator/v1/datasources/{dataSourceName}
The Coordinator now calls the Overlord to serve these requests.
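As a rough illustration, a few of the Overlord endpoints above can be collected into a lookup table that builds (but does not send) the matching HTTP request. This is a minimal sketch, not part of Druid: the base URL, the operation labels, and the `build_request` helper are assumptions made for the example, and any request body that the markUsed and markUnused endpoints expect is left out.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed address for the example; substitute your own Overlord or Router.
BASE_URL = "http://localhost:8081"

# Hypothetical operation label -> (HTTP method, path template from the list above).
SEGMENT_ENDPOINTS = {
    "mark_segments_used": ("POST", "/druid/indexer/v1/datasources/{dataSourceName}/markUsed"),
    "mark_segments_unused": ("POST", "/druid/indexer/v1/datasources/{dataSourceName}/markUnused"),
    "mark_segment_used": ("POST", "/druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}"),
    "mark_segment_unused": ("DELETE", "/druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}"),
}


def build_request(operation, data_source, segment_id=""):
    """Build an unsent urllib Request for one of the operations above."""
    method, template = SEGMENT_ENDPOINTS[operation]
    path = template.format(
        dataSourceName=quote(data_source, safe=""),
        segmentId=quote(segment_id, safe=""),
    )
    return Request(BASE_URL + path, method=method)


req = build_request("mark_segment_unused", "wikipedia", "seg_1")
print(req.get_method(), req.full_url)
```

Keeping the method next to the path makes it harder to mix up the POST and DELETE variants that share the same URL.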
# Realtime query processing for multi-value strings
Realtime query processing no longer considers all strings as multi-value strings during expression processing, fixing a number of bugs and unexpected failures. This should also improve realtime query performance of expressions on string columns.
This change impacts topN queries for realtime segments where rows of data are implicitly null, such as from a property missing from a JSON object.
Before this change, these rows were handled as [] instead of null, leading to inconsistency between processing realtime segments and published segments. When processing realtime segments, the value was treated as [], which topN ignores. After publishing, the value became null, which topN does not ignore. The same query could return different results before and after the data was persisted.
After this change, the topN engine now treats [] as null when processing realtime segments, which is consistent with published segments.
This change doesn't impact actual multi-value string columns, regardless of whether they're realtime.
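The inconsistency described above can be pictured with a toy model. The following sketch is not Druid code: `top_n`, the `empty_list_as_null` flag, and the sample rows are all made up for illustration, and it only assumes what the notes state, namely that topN skips [] but counts null.

```python
from collections import Counter


def top_n(values, n, empty_list_as_null):
    """Toy topN: count rows per dimension value and return the n largest."""
    counts = Counter()
    for value in values:
        if value == []:
            if not empty_list_as_null:
                continue  # old realtime behavior: [] is ignored
            value = None  # new behavior: [] is treated as null
        counts[value] += 1
    return counts.most_common(n)


# The same six rows, first as a realtime segment saw them (missing property
# represented as []), then as they look after publishing (null).
realtime_rows = ["SF", [], "SF", [], [], "NYC"]
published_rows = ["SF", None, "SF", None, None, "NYC"]

print(top_n(realtime_rows, 2, empty_list_as_null=False))  # old realtime result
print(top_n(realtime_rows, 2, empty_list_as_null=True))   # new realtime result
print(top_n(published_rows, 2, empty_list_as_null=True))  # published result
```

In this model, the old realtime result omits the null group entirely, while the new realtime result matches the published one.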
# Changes and deprecations
# ANSI-SQL compatibility and query results
Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant has been removed:
- druid.generic.useDefaultValueForNull=true
- druid.expressions.useStrictBooleans=false
- druid.generic.useThreeValueLogicForNativeFilters=false
They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now.
If the configs are set to the legacy behavior, Druid services will fail to start.
If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade.
For more information about how to update your queries, see the migration guide.
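Before upgrading, it can help to check whether any of the three removed settings is still set to its legacy value, since services fail to start in that case. The following is a minimal sketch under stated assumptions: `find_legacy_settings` is a hypothetical helper, the sample text stands in for a properties file you would read yourself, and the parser only handles simple `key=value` lines.

```python
# Removed settings mapped to the legacy value named in the notes above.
LEGACY_SETTINGS = {
    "druid.generic.useDefaultValueForNull": "true",
    "druid.expressions.useStrictBooleans": "false",
    "druid.generic.useThreeValueLogicForNativeFilters": "false",
}


def find_legacy_settings(properties_text):
    """Return the removed properties that are still set to their legacy value."""
    found = []
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if LEGACY_SETTINGS.get(key) == value.lower():
            found.append(key)
    return found


# Hypothetical excerpt of a runtime properties file.
sample = """
# null handling
druid.generic.useDefaultValueForNull=true
druid.expressions.useStrictBooleans=true
"""

print(find_legacy_settings(sample))
```

An empty result only means that none of the three properties carries its legacy value in the text you passed in. It does not tell you whether your queries depend on the old behavior.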
# Java support
Java support in Druid has been updated:
- Java 8 support has been removed
- Java 11 support is deprecated
We recommend that you upgrade to Java 17.
# Hadoop-based ingestion
Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion.
# Join hints in MSQ task engine queries
Druid now supports hints for SQL JOIN queries that use the MSQ task engine. A query can specify the join type to use for each individual join. Join hints apply recursively to subqueries.
select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
  select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName
  from wikipedia w3 LEFT JOIN "wikipedia-set2" w4 ON w3.regionName = w4.regionName
) w1
JOIN "wikipedia-set1" w2 ON w1.cityName = w2.cityName
where w1.cityName='New York';
#17406
# Functional area and related changes
This section contains detailed release notes separated by areas.
# Web console
# Explore view (experimental)
Several improvements have been made to the Explore view in the web console.
# Segment timeline view
The segment timeline is now more interactive and no longer forces day granularity.
# Other web conso...
Druid 31.0.1
Apache Druid 31.0.1 is a patch release that contains important fixes for topN queries using query granularity other than 'ALL' and for the new complex metric column compression feature introduced in Druid 31.0.0. It also contains fixes for the web console, the new projections feature, and a fix for a minor performance regression.
See the complete set of changes for 31.0.1 for additional details.
For information about new features in Druid 31, see the Druid 31 release notes.
# Bug fixes
- Fixes an issue with topN queries that use a query granularity other than 'ALL', which could cause some query correctness issues #17565
- Fixes an issue with complex metric compression that caused some data to be read incorrectly, resulting in segment data corruption or system instability due to out-of-memory exceptions. We recommend that you reingest data if you use compression for complex metric columns #17422
- Fixes an issue with projection segment merging #17460
- Fixes web console progress indicator #17334
- Fixes a minor performance regression with query processing #17397
# Credits
@clintropolis
@findingrish
@gianm
@techdocsmith
@vogievetsky