Word Cloud unconditional secondary sort prevents Druid TopN optimization, causing query timeout #39072
Bug description
Description
When sort_by_metric is enabled on a Word Cloud chart, buildQuery.ts unconditionally appends a secondary ORDER BY [series, ASC] alongside the metric sort. On Apache Druid, any multi-column ORDER BY prevents the native TopN query optimization, forcing a full GroupBy scan over the entire dataset before applying LIMIT. On high-cardinality dimensions this can cause dramatic query slowdowns and timeouts.
Root Cause
In superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts:
```ts
if (sort_by_metric && metric) {
  orderby.push([metric, false]);
}
if (series) {
  orderby.push([series, true]); // ← always added, even when sort_by_metric is true
}
```
When sort_by_metric=true, this generates:
```sql
ORDER BY term_count DESC, search_term ASC
```
Druid's TopN query sorts on a single key (one aggregate metric, or the grouped dimension itself). The secondary dimension sort therefore forces the full GroupBy execution path regardless of dataset size.
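To make the root cause concrete, the sketch below re-creates the current branch logic as a standalone function and renders the resulting orderby tuples as an ORDER BY clause. It is a simplified model, not Superset code: `WordCloudFormData`, `buildOrderbyCurrent`, and `renderOrderBy` are hypothetical names, metrics and columns are plain string labels, and the boolean in each tuple means ascending (true renders as ASC, false as DESC), matching the `[series, true]` / `[metric, false]` convention in buildQuery.ts.

```typescript
// Hypothetical, simplified form data: plain labels stand in for
// Superset's richer column/metric objects.
interface WordCloudFormData {
  series?: string;
  metric?: string;
  sort_by_metric?: boolean;
}

// [label, ascending]: true renders as ASC, false as DESC.
type OrderByClause = [string, boolean];

// Mirrors the current buildQuery.ts branches: the series sort is
// appended whenever a series exists, even if a metric sort was added.
function buildOrderbyCurrent(formData: WordCloudFormData): OrderByClause[] {
  const { series, metric, sort_by_metric } = formData;
  const orderby: OrderByClause[] = [];
  if (sort_by_metric && metric) {
    orderby.push([metric, false]);
  }
  if (series) {
    orderby.push([series, true]);
  }
  return orderby;
}

// Renders the tuples as SQL text, for illustration only.
function renderOrderBy(orderby: OrderByClause[]): string {
  if (orderby.length === 0) return "";
  const terms = orderby.map(([label, asc]) => `${label} ${asc ? "ASC" : "DESC"}`);
  return `ORDER BY ${terms.join(", ")}`;
}

const current = buildOrderbyCurrent({
  series: "search_term",
  metric: "term_count",
  sort_by_metric: true,
});
console.log(renderOrderBy(current));
// ORDER BY term_count DESC, search_term ASC
```

The second term in the output is the unconditional series sort that this issue identifies as blocking TopN.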
Steps to Reproduce
- Create a Word Cloud chart backed by a Druid datasource with a high-cardinality string dimension
- Enable Sort by metric
- Load the chart — on large datasets it will time out with "Unknown error (Unknown)"
Without "Sort by metric", the chart loads correctly (Druid uses TopN). With it enabled, the same query triggers a full GroupBy scan.
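The difference between the two runs can be approximated with a rough eligibility check. Druid's SQL documentation describes TopN as applying to queries that GROUP BY a single expression, have ORDER BY and LIMIT clauses, have no HAVING clause, and are not nested; the sketch below encodes those conditions plus the single ORDER BY term requirement this issue relies on. `QueryShape` and `wouldUseTopN` are hypothetical, and Druid's actual planner (including context settings such as `useApproximateTopN`) has the final say.

```typescript
// Hypothetical summary of a grouped query, for illustration only.
interface QueryShape {
  groupBy: string[];
  orderBy: [string, boolean][]; // [label, ascending]
  limit?: number;
  hasHaving: boolean;
  isNested: boolean;
}

// Rough approximation of when Druid SQL can plan a native TopN query.
// This is not Druid's planner logic; the single-term ORDER BY rule
// reflects the behavior reported in this issue.
function wouldUseTopN(q: QueryShape): boolean {
  return (
    q.groupBy.length === 1 &&
    q.orderBy.length === 1 &&
    q.limit !== undefined &&
    !q.hasHaving &&
    !q.isNested
  );
}

const base = { groupBy: ["search_term"], limit: 100, hasHaving: false, isNested: false };

// "Sort by metric" off: only the series sort is emitted.
console.log(wouldUseTopN({ ...base, orderBy: [["search_term", true]] }));

// "Sort by metric" on: metric sort plus the unconditional series sort.
console.log(
  wouldUseTopN({ ...base, orderBy: [["term_count", false], ["search_term", true]] }),
);
```

Under this model the first call is eligible and the second is not, matching the fast and slow runs described in the steps above.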
Proposed Fix
Make the secondary dimension sort mutually exclusive with sort_by_metric:
```ts
if (sort_by_metric && metric) {
  orderby.push([metric, false]);
} else if (series) {
  orderby.push([series, true]);
}
```
This preserves alphabetical ordering when sort_by_metric is disabled, while allowing Druid to use TopN optimization when metric sorting is requested. The secondary sort is also unnecessary for word cloud rendering, since word size is determined by metric value rather than series order.
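A minimal model of the proposed `else if`, showing that either mode now yields exactly one ORDER BY term. `WordCloudFormData`, `OrderByClause`, and `buildOrderbyProposed` are hypothetical stand-ins that use plain string labels instead of the real buildQuery.ts column and metric types.

```typescript
// Hypothetical, simplified form data.
interface WordCloudFormData {
  series?: string;
  metric?: string;
  sort_by_metric?: boolean;
}

type OrderByClause = [string, boolean]; // [label, ascending]

// Proposed logic: the metric sort and the series sort are mutually
// exclusive, so at most one ORDER BY term is produced.
function buildOrderbyProposed(formData: WordCloudFormData): OrderByClause[] {
  const { series, metric, sort_by_metric } = formData;
  const orderby: OrderByClause[] = [];
  if (sort_by_metric && metric) {
    orderby.push([metric, false]); // metric DESC
  } else if (series) {
    orderby.push([series, true]); // series ASC
  }
  return orderby;
}

const withMetricSort = buildOrderbyProposed({
  series: "search_term",
  metric: "term_count",
  sort_by_metric: true,
});
const withoutMetricSort = buildOrderbyProposed({
  series: "search_term",
  metric: "term_count",
  sort_by_metric: false,
});
console.log(JSON.stringify(withMetricSort)); // [["term_count",false]]
console.log(JSON.stringify(withoutMetricSort)); // [["search_term",true]]
```

One trade-off to keep in mind during review: with metric sorting on, rows that tie on the metric no longer have a deterministic order, so which tied terms survive the row limit may vary between runs.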
Additional Notes
The existing test/buildQuery.test.ts does not cover sort_by_metric or orderby behavior. A PR for this fix should add test coverage for both cases.
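A framework-agnostic sketch of the cases such tests could pin down, written as plain assertions so it runs without Jest. In test/buildQuery.test.ts the same table would instead call the real buildQuery and inspect the `orderby` of the built query; `orderbyFor` here is a hypothetical copy of the proposed branch logic, included only so the sketch is self-contained.

```typescript
type OrderByClause = [string, boolean]; // [label, ascending]

interface Case {
  name: string;
  formData: { series?: string; metric?: string; sort_by_metric?: boolean };
  expected: OrderByClause[];
}

// Hypothetical copy of the proposed logic under test.
function orderbyFor(formData: Case["formData"]): OrderByClause[] {
  const { series, metric, sort_by_metric } = formData;
  if (sort_by_metric && metric) return [[metric, false]];
  if (series) return [[series, true]];
  return [];
}

const cases: Case[] = [
  {
    name: "sort_by_metric on: metric DESC only, no secondary series sort",
    formData: { series: "search_term", metric: "term_count", sort_by_metric: true },
    expected: [["term_count", false]],
  },
  {
    name: "sort_by_metric off: series ASC only",
    formData: { series: "search_term", metric: "term_count", sort_by_metric: false },
    expected: [["search_term", true]],
  },
  {
    name: "sort_by_metric on without a metric: falls back to series ASC",
    formData: { series: "search_term", sort_by_metric: true },
    expected: [["search_term", true]],
  },
];

// Compare serialized results so tuple contents are checked by value.
for (const c of cases) {
  const actual = JSON.stringify(orderbyFor(c.formData));
  const expected = JSON.stringify(c.expected);
  if (actual !== expected) {
    throw new Error(`${c.name}: expected ${expected}, got ${actual}`);
  }
}
console.log(`${cases.length} cases passed`);
```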
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- I have searched Superset docs and Slack and didn't find a solution to my problem.
- I have searched the GitHub issue tracker and didn't find a similar bug report.
- I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.