Add projection pushdown to binary expression by yeya24 · Pull Request #691 · thanos-io/promql-engine

yeya24 · 2026-02-19T00:22:56Z

Fixes #689

The idea is to reuse the existing projection logical optimizer to push projections down into binary expressions. The binary expression vector operator can then use the projection information to skip non-projected labels when materializing labels in the join table.
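As a rough illustration of the idea (not the PR's actual code), the sketch below filters a label set against a projection while building a result metric. The `Projection` fields mirror the `Labels`/`Include` pair that appears in the review diff for `logicalplan/projection.go`; the `Label` type and the `projectLabels` helper are hypothetical names used only for this example.

```go
package main

import "fmt"

// Label is a simplified stand-in for a Prometheus label pair (assumption).
type Label struct {
	Name, Value string
}

// Projection mirrors the Labels/Include fields seen in the review diff.
// Include=true keeps only the listed labels; Include=false drops them.
type Projection struct {
	Labels  []string
	Include bool
}

// projectLabels is a hypothetical helper: it copies only the labels that the
// projection allows, so skipped labels are never materialized.
func projectLabels(in []Label, p *Projection) []Label {
	if p == nil {
		// No projection: keep every label.
		return append([]Label(nil), in...)
	}
	listed := make(map[string]struct{}, len(p.Labels))
	for _, name := range p.Labels {
		listed[name] = struct{}{}
	}
	out := make([]Label, 0, len(in))
	for _, l := range in {
		_, ok := listed[l.Name]
		// Include mode keeps listed labels; exclude mode keeps unlisted ones.
		if ok == p.Include {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	series := []Label{{"job", "api"}, {"instance", "a:9090"}, {"pod", "api-0"}}
	kept := projectLabels(series, &Projection{Labels: []string{"job"}, Include: true})
	fmt.Println(kept) // [{job api}]
}
```

With a `by (job)` aggregation on top of the join, only `job` would need to be copied into the result metric; the other labels are skipped before any allocation happens.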

Added comprehensive unit tests and correctness tests.

My local benchmark showed that this helps mainly when label string interning is disabled. We are still using slicelabels in Cortex, so it helps with our use case. Users can choose whether to enable or disable this functionality.

Here is the AI-generated benchmark report based on the benchmarks I ran locally.

Binary Projection Pushdown - Benchmark Results

Test Configuration

Platform: Darwin arm64
CPU: Apple M1 Pro
Benchmark: Binary operator initialization + Series() call
Build tag: -tags slicelabels (label interning disabled)

Results

Small Dataset (1K series, 10 labels)

Without Projection:

989,740 ns/op, 1,089,628 B/op, 2,178 allocs/op
1000 series with 10 labels each

With Projection:

613,850 ns/op, 418,096 B/op, 2,180 allocs/op
1000 series with 2 labels each (8 labels filtered)

Savings:

✅ Memory: -671 KB (-62%)
✅ Time: -376 μs (-38%)

Large Dataset (10K series, 20 labels)

Without Projection:

17,728,662 ns/op, 22,321,292 B/op, 21,234 allocs/op
10000 series with 20 labels each

With Projection:

10,171,683 ns/op, 4,082,405 B/op, 21,237 allocs/op
10000 series with 2 labels each (18 labels filtered)

Savings:

✅ Memory: -18.2 MB (-82%)
✅ Time: -7.6 ms (-43%)

Key Findings

Scaling Behavior

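The memory savings grow roughly in proportion to the number of labels skipped (series × labels filtered), at about 84 to 101 bytes per skipped label across the two runs: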

| Dataset | Series | Labels | Labels Filtered | Memory Saved | Time Saved |
|---|---|---|---|---|---|
| Small | 1,000 | 10 | 8 (80%) | 671 KB (62%) | 38% |
| Large | 10,000 | 20 | 18 (90%) | 18.2 MB (82%) | 43% |

Why It Works (with slicelabels)

Full string storage: Each label stores the complete string (~100 bytes)
No interning: Duplicate values stored multiple times
Allocation cost: Creating label strings is expensive
Filtering benefit: Skipping labels avoids allocations entirely

Memory Breakdown (Large Dataset)

Without projection (22.3 MB):

Label strings: 10,000 series × 20 labels × ~100 bytes = ~20 MB
Metadata: ~2.3 MB

With projection (4.1 MB):

Label strings: 10,000 series × 2 labels × ~100 bytes = ~2 MB
Metadata: ~2.1 MB

Savings: 18.2 MB (82%)
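The breakdown above is a back-of-the-envelope model, and it can be reproduced with a few lines of Go. The sketch below recomputes the estimate and compares it with the measured B/op figures from the large-dataset benchmark; `estimateLabelBytes` is a hypothetical helper, and the 100-byte figure is the report's own rough per-label estimate, not a measured value.

```go
package main

import "fmt"

// estimateLabelBytes returns series × labels × bytesPerLabel, the same
// simple model used in the memory breakdown.
func estimateLabelBytes(series, labelsPerSeries, bytesPerLabel int) int {
	return series * labelsPerSeries * bytesPerLabel
}

func main() {
	const bytesPerLabel = 100 // the report's rough per-label estimate

	full := estimateLabelBytes(10_000, 20, bytesPerLabel)
	projected := estimateLabelBytes(10_000, 2, bytesPerLabel)
	fmt.Printf("estimated label bytes: full=%d projected=%d saved=%d\n",
		full, projected, full-projected)

	// Measured B/op from the large-dataset benchmark results.
	const measuredFull, measuredProjected = 22_321_292, 4_082_405
	saved := measuredFull - measuredProjected
	fmt.Printf("measured: saved=%d bytes (%.1f%%)\n",
		saved, 100*float64(saved)/float64(measuredFull))
}
```

The estimated 18 MB of skipped label strings is close to the measured difference of about 18.2 MB, which suggests that label strings account for most of the savings.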

Impact of Label Interning

With Default Build (Label Interning Enabled)

The optimization provides minimal benefit because:

Labels are stored as 8-byte pointers, not full strings
Filtering overhead cancels out pointer savings
Result: +20% CPU overhead, ~0% memory savings

With slicelabels Build (Label Interning Disabled)

The optimization provides massive benefit because:

Labels are stored as full strings (~100 bytes each)
Filtering avoids expensive string allocations
Result: -43% CPU time, -82% memory usage

Real-World Implications

For Prometheus (Default Build with Interning)

Not recommended - The optimization adds CPU overhead without meaningful memory savings.

For Systems Without Label Interning

Highly recommended - Provides:

82% memory reduction for high-cardinality joins
43% lower time per operation
Scales linearly with series count and label count

When to Enable

Enable this optimization when:

No label interning: Your system stores labels as full strings
High cardinality: Joins produce 10K+ result series
Many labels: Series have 15+ labels
Selective aggregation: Outer operations use <5 labels
Memory constrained: Every MB matters

Conclusion

The optimization is correctly implemented and provides dramatic benefits for systems without label interning:

✅ 82% memory reduction (18 MB saved for 10K series)
✅ 43% lower time per operation (7.6 ms saved)
✅ Linear scaling with dataset size

Signed-off-by: yeya24 <benye@amazon.com>

Copilot

Pull request overview

This PR extends the existing projection optimizer so it can push projection requirements down into binary expressions (specifically many-to-one and one-to-many joins). This lets the binary vector operator avoid materializing unnecessary labels when building join-table result metrics, which reduces memory usage for high-cardinality joins.

Changes:

Add optional PushDownBinaryProjection mode to ProjectionOptimizer to store projections on Binary nodes and derive/push child projections.
Plumb Binary.Projection through logical nodes, execution planning, and into the binary vector operator to apply label filtering during resultMetric() materialization.
Add/expand unit tests, correctness tests (including Prometheus comparison), and add benchmarks for binary projection pushdown.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| logicalplan/projection.go | Adds `PushDownBinaryProjection` option and stores projection on `Binary` nodes for group_left/right cases. |
| logicalplan/logical_nodes.go | Extends `Binary` node with `Projection` and includes it in cloning + JSON marshal/unmarshal. |
| execution/execution.go | Passes binary projection from logical plan into the execution binary operator. |
| execution/binary/vector.go | Applies projection during binary `resultMetric()` label materialization to reduce label allocations. |
| logicalplan/plan_test.go | Updates plan rendering to display binary projections in test output. |
| logicalplan/projection_test.go | Updates projection expectations and adds targeted tests for binary projection pushdown behavior. |
| engine/projection_test.go | Adds correctness tests for pushdown (baseline vs optimized, and Prometheus comparison) and fuzz-like coverage focused on binaries with projections. |
| engine/projection_binary_bench_test.go | Adds benchmarks for measuring memory/CPU impact of binary projection pushdown. |

Copilot · 2026-04-24T10:42:56Z

+			testutil.Ok(b, err)
+			series, _ := op.Series(context.Background())
+			if i == 0 {
+				b.Logf("Result series count: %d", len(series))


This benchmark ignores the error return from op.Series(...). Even in benchmarks, it’s useful to check and fail fast so we don’t end up measuring behavior under error conditions silently.
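A minimal sketch of the fail-fast pattern the comment asks for, assuming a simplified operator interface. The `seriesOperator` interface, `fakeOperator`, and `countSeries` are hypothetical stand-ins, not the engine's real types; in the PR itself the surrounding code uses `testutil.Ok(b, err)` for the same purpose.

```go
package main

import (
	"context"
	"fmt"
	"testing"
)

// seriesOperator is a hypothetical stand-in for the operator under benchmark.
type seriesOperator interface {
	Series(ctx context.Context) ([]string, error)
}

// fakeOperator returns n empty series names and never fails.
type fakeOperator struct{ n int }

func (f fakeOperator) Series(context.Context) ([]string, error) {
	return make([]string, f.n), nil
}

// countSeries calls Series and surfaces the error instead of discarding it.
func countSeries(op seriesOperator) (int, error) {
	series, err := op.Series(context.Background())
	if err != nil {
		return 0, err
	}
	return len(series), nil
}

// benchSeries stops the benchmark as soon as Series returns an error, so the
// reported numbers never describe an error path.
func benchSeries(b *testing.B, op seriesOperator) {
	for i := 0; i < b.N; i++ {
		n, err := countSeries(op)
		if err != nil {
			b.Fatalf("Series failed: %v", err)
		}
		if i == 0 {
			b.Logf("Result series count: %d", n)
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		benchSeries(b, fakeOperator{n: 1000})
	})
	fmt.Println("iterations:", res.N)
}
```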

Copilot · 2026-04-24T10:42:56Z

+			qBaseline, err := engineBaseline.NewRangeQuery(context.Background(), storage, nil, tc.query, start, end, interval)
+			testutil.Ok(t, err)
+			resultBaseline := qBaseline.Exec(context.Background())
+			testutil.Ok(t, resultBaseline.Err)


Range queries created in this test aren't closed. The engine tests run with goleak verification, so leaving queries open can leak goroutines/resources and make the suite flaky. Please add defer qBaseline.Close() after successful creation (and similarly close the optimized query).
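The pattern being requested can be sketched as follows, using a hypothetical `query` type rather than the engine's real query interface. Each `Close` is deferred immediately after the query is successfully created, so it runs even if a later step returns early.

```go
package main

import (
	"errors"
	"fmt"
)

// query is a hypothetical stand-in for a range query with Exec and Close.
type query struct {
	name   string
	closed *[]string // records the order in which queries are closed
}

func (q *query) Exec() error { return nil }
func (q *query) Close()      { *q.closed = append(*q.closed, q.name) }

// newQuery is a hypothetical constructor; it fails on an empty name.
func newQuery(name string, closed *[]string) (*query, error) {
	if name == "" {
		return nil, errors.New("empty query name")
	}
	return &query{name: name, closed: closed}, nil
}

// runBoth creates two queries and defers Close right after each successful
// creation, so both are released on every return path.
func runBoth(closed *[]string) error {
	qBaseline, err := newQuery("baseline", closed)
	if err != nil {
		return err
	}
	defer qBaseline.Close()

	qOptimized, err := newQuery("optimized", closed)
	if err != nil {
		return err
	}
	defer qOptimized.Close()

	if err := qBaseline.Exec(); err != nil {
		return err
	}
	return qOptimized.Exec()
}

func main() {
	var closed []string
	err := runBoth(&closed)
	fmt.Println(err, closed) // <nil> [optimized baseline]
}
```

Deferred calls run in last-in, first-out order, which is why the optimized query is closed before the baseline one.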

Copilot · 2026-04-24T10:42:56Z

+			qOptimized, err := engineOptimized.NewRangeQuery(context.Background(), storage, nil, tc.query, start, end, interval)
+			testutil.Ok(t, err)
+			resultOptimized := qOptimized.Exec(context.Background())
+			testutil.Ok(t, resultOptimized.Err)


Range query qOptimized is not closed after Exec. Please defer qOptimized.Close() after creation to avoid leaking resources/goroutines under goleak.

Copilot · 2026-04-24T10:42:57Z

+			testutil.Ok(t, err)
+			resultProm := qProm.Exec(context.Background())
+			testutil.Ok(t, resultProm.Err)
+
+			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+			testutil.Ok(t, err)


Prometheus range query qProm is never closed. Please defer qProm.Close() (and likewise for qThanos) to avoid leaking goroutines/resources during the engine_test goleak-verified test suite.

Suggested change

-			testutil.Ok(t, err)
-			resultProm := qProm.Exec(context.Background())
-			testutil.Ok(t, resultProm.Err)
-
-			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
-			testutil.Ok(t, err)
+			testutil.Ok(t, err)
+			defer qProm.Close()
+			resultProm := qProm.Exec(context.Background())
+			testutil.Ok(t, resultProm.Err)
+
+			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+			testutil.Ok(t, err)
+			defer qThanos.Close()

Copilot · 2026-04-24T10:42:57Z

+			testutil.Ok(t, err)
+			resultProm := qProm.Exec(context.Background())
+			testutil.Ok(t, resultProm.Err)
+
+			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+			testutil.Ok(t, err)


Thanos range query qThanos isn't closed. Add defer qThanos.Close() after creation to avoid goroutine/resource leaks under goleak.

Suggested change

-			testutil.Ok(t, err)
-			resultProm := qProm.Exec(context.Background())
-			testutil.Ok(t, resultProm.Err)
-
-			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
-			testutil.Ok(t, err)
+			testutil.Ok(t, err)
+			defer qProm.Close()
+			resultProm := qProm.Exec(context.Background())
+			testutil.Ok(t, resultProm.Err)
+
+			qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+			testutil.Ok(t, err)
+			defer qThanos.Close()

Copilot · 2026-04-24T10:42:57Z

+		expected string
+	}{
+		{
+			name:     "aggregation with binary using on - binary gets projection",


This test case name says the binary gets a projection, but for * on(job) the vector matching cardinality is one-to-one, and the optimizer intentionally does not set Binary.Projection in that case. Consider renaming the case to reflect that only the selectors get projections here (or adjust the expectation if the intent changed).

Suggested change

-			name:     "aggregation with binary using on - binary gets projection",
+			name:     "aggregation with binary using on - selectors get projections",

Copilot · 2026-04-24T10:42:58Z

+		// Store projection on Binary only when the binary has group_left or group_right.
+		// For one-to-one or vector-scalar, projecting the binary's output can collapse distinct
+		// series to the same label set and cause implicit many-to-one in a downstream binary.
+		if p.PushDownBinaryProjection && projection != nil && n.VectorMatching != nil &&
+			(n.VectorMatching.Card == parser.CardManyToOne || n.VectorMatching.Card == parser.CardOneToMany) {
+			n.Projection = &Projection{
+				Labels:  append([]string(nil), projection.Labels...),
+				Include: projection.Include,
+			}
+		}


pushProjection currently stores a Projection on Binary whenever projection != nil and the binary is many-to-one/one-to-many. If the incoming projection is a no-op (e.g. exclude mode with an empty label list, such as sum without ()), this unnecessarily sets Binary.Projection and can add noise/overhead (and affects renderers/tests). Consider only setting n.Projection when it will actually change the output labels (e.g. projection.Include || len(projection.Labels) > 0).
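The suggested guard can be sketched as a small predicate. `isEffective` is a hypothetical helper name; the `Projection` fields mirror the `Labels` and `Include` fields in the diff above, and the condition is the one proposed in the comment.

```go
package main

import "fmt"

// Projection mirrors the Labels/Include fields from the diff above.
type Projection struct {
	Labels  []string
	Include bool
}

// isEffective is a hypothetical helper implementing the suggested check:
// an exclude-mode projection with no labels removes nothing, so it is a no-op.
func isEffective(p *Projection) bool {
	return p != nil && (p.Include || len(p.Labels) > 0)
}

func main() {
	// Exclude mode with no labels, e.g. `sum without ()`: nothing to drop.
	fmt.Println(isEffective(&Projection{Include: false})) // false
	// Exclude mode with a label: drops `pod`.
	fmt.Println(isEffective(&Projection{Include: false, Labels: []string{"pod"}})) // true
	// Include mode with no labels: keeps no labels, so it changes the output.
	fmt.Println(isEffective(&Projection{Include: true})) // true
	// No projection at all.
	fmt.Println(isEffective(nil)) // false
}
```

The same predicate could also gate the parentheses and suffix in the plan renderer, so that no-op projections leave rendered output unchanged.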

Copilot · 2026-04-24T10:42:58Z

+		if t.Projection != nil {
+			b.WriteString("(")
+		}
 		b.WriteString(renderExprTree(t.LHS))
 		b.WriteString(" ")
 		b.WriteString(t.Op.String())


renderExprTree wraps every Binary with Projection != nil in parentheses even if no projection suffix will be printed (e.g. exclude mode with 0 labels). This can make rendered output misleading and can introduce unnecessary diffs in test expectations. Consider only adding parentheses/suffix when the projection is effective (Include=true or len(Labels)>0).

add projection pushdown to binary expression

90d332a

Signed-off-by: yeya24 <benye@amazon.com>

yeya24 force-pushed the projection-pushdown-aggr branch from 96bd215 to 90d332a on February 19, 2026 00:34

yeya24 added 2 commits February 18, 2026 18:07

lint

5bfc7c6

Signed-off-by: yeya24 <benye@amazon.com>

only pushdown when there is group left or right

d0177f4

Signed-off-by: yeya24 <benye@amazon.com>

yeya24 mentioned this pull request Feb 19, 2026

Add maxSamples limit #663

Merged

GiedriusS requested a review from Copilot April 24, 2026 10:34

Copilot started reviewing on behalf of GiedriusS on April 24, 2026 10:34

Copilot AI reviewed Apr 24, 2026

yeya24 wants to merge 3 commits into thanos-io:main from yeya24:projection-pushdown-aggr

2 participants
