Add projection pushdown to binary expression #691

Open
yeya24 wants to merge 3 commits into thanos-io:main from yeya24:projection-pushdown-aggr

@yeya24 yeya24 commented Feb 19, 2026

Contributor

Fixes #689

The idea is to reuse the existing projection logical optimizer to push projections down to binary expressions. The binary expression vector operator can then use the projection information to skip non-projected labels when materializing labels in the join table.
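
A minimal sketch of the label filtering described above, assuming a projection carries a label list plus an include/exclude flag (the `Labels` and `Include` fields appear in this PR's diff). The `Label` type and the `projectLabels` helper are hypothetical names used for illustration only, not the operator's actual code:

```go
package main

import "fmt"

// Label is a simplified stand-in for a Prometheus label pair (assumption).
type Label struct {
	Name, Value string
}

// Projection mirrors the Labels/Include fields quoted in this PR.
// Everything else in this file is illustrative.
type Projection struct {
	Labels  []string
	Include bool
}

// projectLabels returns the labels that survive the projection.
// A nil projection keeps every label.
func projectLabels(in []Label, p *Projection) []Label {
	if p == nil {
		return in
	}
	listed := make(map[string]struct{}, len(p.Labels))
	for _, name := range p.Labels {
		listed[name] = struct{}{}
	}
	out := make([]Label, 0, len(in))
	for _, l := range in {
		_, ok := listed[l.Name]
		// Include mode keeps listed labels; exclude mode keeps unlisted ones.
		if ok == p.Include {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	series := []Label{{"job", "api"}, {"instance", "a:9090"}, {"region", "us-east-1"}}
	kept := projectLabels(series, &Projection{Labels: []string{"job"}, Include: true})
	fmt.Println(len(kept), kept[0].Name)
}
```

Labels that are filtered out are never copied into the result, which is where the allocation savings would come from when label strings are not interned.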

Added comprehensive tests, including correctness tests, to verify the results.

My local benchmark showed that this helps mainly when label string interning is disabled. We are still using slicelabels in Cortex, so it helps with our use case. Users can choose whether to enable or disable this functionality.

Here is the AI-generated benchmark report based on the benchmarks I ran locally.

Binary Projection Pushdown - Benchmark Results

Test Configuration

  • Platform: Darwin arm64
  • CPU: Apple M1 Pro
  • Benchmark: Binary operator initialization + Series() call
  • Build tag: -tags slicelabels (label interning disabled)

Results

Small Dataset (1K series, 10 labels)

Without Projection:

989,740 ns/op, 1,089,628 B/op, 2,178 allocs/op
1000 series with 10 labels each

With Projection:

613,850 ns/op, 418,096 B/op, 2,180 allocs/op
1000 series with 2 labels each (8 labels filtered)

Savings:

  • ✅ Memory: -671 KB (-62%)
  • ✅ Time: -376 μs (-38%)

Large Dataset (10K series, 20 labels)

Without Projection:

17,728,662 ns/op, 22,321,292 B/op, 21,234 allocs/op
10000 series with 20 labels each

With Projection:

10,171,683 ns/op, 4,082,405 B/op, 21,237 allocs/op
10000 series with 2 labels each (18 labels filtered)

Savings:

  • ✅ Memory: -18.2 MB (-82%)
  • ✅ Time: -7.6 ms (-43%)
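
The savings percentages can be re-derived from the raw ns/op and B/op figures quoted above. A small sketch of that arithmetic (`savedPercent` is a hypothetical helper, not part of the PR):

```go
package main

import "fmt"

// savedPercent returns the relative reduction from before to after, in percent.
func savedPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// ns/op and B/op figures copied from the benchmark results in this report.
	fmt.Printf("small: time -%.0f%%, memory -%.0f%%\n",
		savedPercent(989740, 613850), savedPercent(1089628, 418096))
	fmt.Printf("large: time -%.0f%%, memory -%.0f%%\n",
		savedPercent(17728662, 10171683), savedPercent(22321292, 4082405))
}
```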

Key Findings

Scaling Behavior

The optimization's benefits scale linearly with dataset size:

| Dataset | Series | Labels | Labels Filtered | Memory Saved | Time Saved |
| --- | --- | --- | --- | --- | --- |
| Small | 1,000 | 10 | 8 (80%) | 671 KB (62%) | 38% |
| Large | 10,000 | 20 | 18 (90%) | 18.2 MB (82%) | 43% |

Why It Works (with slicelabels)

  1. Full string storage: Each label stores the complete string (~100 bytes)
  2. No interning: Duplicate values stored multiple times
  3. Allocation cost: Creating label strings is expensive
  4. Filtering benefit: Skipping labels avoids allocations entirely

Memory Breakdown (Large Dataset)

Without projection (22.3 MB):

  • Label strings: 10,000 series × 20 labels × ~100 bytes = ~20 MB
  • Metadata: ~2.3 MB

With projection (4.1 MB):

  • Label strings: 10,000 series × 2 labels × ~100 bytes = ~2 MB
  • Metadata: ~2.1 MB

Savings: 18.2 MB (82%)
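
The label-string part of this breakdown follows a simple model: series × labels per series × average bytes per label. A sketch under the report's own assumption of roughly 100 bytes per label (`estimateLabelBytes` is a hypothetical helper):

```go
package main

import "fmt"

// estimateLabelBytes applies the report's rough model:
// series x labels per series x average bytes per label.
func estimateLabelBytes(series, labelsPerSeries, avgBytesPerLabel int) int {
	return series * labelsPerSeries * avgBytesPerLabel
}

func main() {
	const avgBytes = 100 // the report's approximate figure, not a measurement
	full := estimateLabelBytes(10_000, 20, avgBytes)
	projected := estimateLabelBytes(10_000, 2, avgBytes)
	fmt.Printf("full=%d MB projected=%d MB saved=%d MB\n",
		full/1_000_000, projected/1_000_000, (full-projected)/1_000_000)
}
```

The model only covers label strings; the remaining difference in the measured 18.2 MB comes from the metadata figures listed above.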


Impact of Label Interning

With Default Build (Label Interning Enabled)

The optimization provides minimal benefit because:

  • Labels are stored as 8-byte pointers, not full strings
  • Filtering overhead cancels out pointer savings
  • Result: +20% CPU overhead, ~0% memory savings

With slicelabels Build (Label Interning Disabled)

The optimization provides massive benefit because:

  • Labels are stored as full strings (~100 bytes each)
  • Filtering avoids expensive string allocations
  • Result: -43% CPU time, -82% memory usage

Real-World Implications

For Prometheus (Default Build with Interning)

Not recommended - The optimization adds CPU overhead without meaningful memory savings.

For Systems Without Label Interning

Highly recommended - Provides:

  • 82% memory reduction for high-cardinality joins
  • 43% performance improvement
  • Scales linearly with series count and label count

When to Enable

Enable this optimization when:

  1. No label interning: Your system stores labels as full strings
  2. High cardinality: Joins produce 10K+ result series
  3. Many labels: Series have 15+ labels
  4. Selective aggregation: Outer operations use <5 labels
  5. Memory constrained: Every MB matters

Conclusion

The optimization is correctly implemented and provides dramatic benefits for systems without label interning:

  • 82% memory reduction (18 MB saved for 10K series)
  • 43% performance improvement (7.6 ms saved)
  • Linear scaling with dataset size

Signed-off-by: yeya24 <benye@amazon.com>
@yeya24 yeya24 force-pushed the projection-pushdown-aggr branch from 96bd215 to 90d332a Compare February 19, 2026 00:34
Signed-off-by: yeya24 <benye@amazon.com>
Signed-off-by: yeya24 <benye@amazon.com>

Copilot AI left a comment

Pull request overview

This PR extends the existing projection optimizer so it can push projection requirements down into binary expressions (specifically many-to-one / one-to-many joins). The binary vector operator can then avoid materializing unnecessary labels when building join-table result metrics, which reduces memory usage for high-cardinality joins.

Changes:

  • Add optional PushDownBinaryProjection mode to ProjectionOptimizer to store projections on Binary nodes and derive/push child projections.
  • Plumb Binary.Projection through logical nodes, execution planning, and into the binary vector operator to apply label filtering during resultMetric() materialization.
  • Add/expand unit tests, correctness tests (including Prometheus comparison), and add benchmarks for binary projection pushdown.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| logicalplan/projection.go | Adds PushDownBinaryProjection option and stores projection on Binary nodes for group_left/right cases. |
| logicalplan/logical_nodes.go | Extends Binary node with Projection and includes it in cloning + JSON marshal/unmarshal. |
| execution/execution.go | Passes binary projection from logical plan into the execution binary operator. |
| execution/binary/vector.go | Applies projection during binary resultMetric() label materialization to reduce label allocations. |
| logicalplan/plan_test.go | Updates plan rendering to display binary projections in test output. |
| logicalplan/projection_test.go | Updates projection expectations and adds targeted tests for binary projection pushdown behavior. |
| engine/projection_test.go | Adds correctness tests for pushdown (baseline vs optimized, and Prometheus comparison) and fuzz-like coverage focused on binaries with projections. |
| engine/projection_binary_bench_test.go | Adds benchmarks for measuring memory/CPU impact of binary projection pushdown. |

Comment on lines +87 to +90
testutil.Ok(b, err)
series, _ := op.Series(context.Background())
if i == 0 {
b.Logf("Result series count: %d", len(series))

Copilot AI Apr 24, 2026

This benchmark ignores the error return from op.Series(...). Even in benchmarks, it’s useful to check and fail fast so we don’t end up measuring behavior under error conditions silently.
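
A sketch of the fail-fast pattern this comment asks for, using a hypothetical `seriesOperator` stand-in rather than the real binary operator. `testing.Benchmark` is used only so the example runs outside `go test`:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"testing"
)

// seriesOperator is a hypothetical stand-in for the operator under benchmark.
type seriesOperator struct{ fail bool }

func (o seriesOperator) Series(ctx context.Context) ([]string, error) {
	if o.fail {
		return nil, errors.New("series lookup failed")
	}
	return []string{`{job="api"}`}, nil
}

// countSeries calls Series and surfaces its error instead of discarding it.
func countSeries(op seriesOperator) (int, error) {
	series, err := op.Series(context.Background())
	if err != nil {
		return 0, err
	}
	return len(series), nil
}

func benchmarkSeries(b *testing.B, op seriesOperator) {
	for i := 0; i < b.N; i++ {
		n, err := countSeries(op)
		if err != nil {
			// Stop immediately so an error path is never what gets measured.
			b.Fatalf("Series: %v", err)
		}
		if i == 0 {
			b.Logf("Result series count: %d", n)
		}
	}
}

func main() {
	res := testing.Benchmark(func(b *testing.B) { benchmarkSeries(b, seriesOperator{}) })
	fmt.Println("iterations:", res.N)
}
```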

Comment thread engine/projection_test.go
Comment on lines +563 to +566
qBaseline, err := engineBaseline.NewRangeQuery(context.Background(), storage, nil, tc.query, start, end, interval)
testutil.Ok(t, err)
resultBaseline := qBaseline.Exec(context.Background())
testutil.Ok(t, resultBaseline.Err)

Copilot AI Apr 24, 2026

Range queries created in this test aren't closed. The engine tests run with goleak verification, so leaving queries open can leak goroutines/resources and make the suite flaky. Please add defer qBaseline.Close() after successful creation (and similarly close the optimized query).
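
The pattern being requested can be sketched with a hypothetical `fakeQuery` type standing in for the engine's range query; only the ordering of creation, `defer Close()`, and `Exec` is the point here:

```go
package main

import "fmt"

// fakeQuery is a hypothetical stand-in for a range query that holds
// resources until Close is called.
type fakeQuery struct{ closed *bool }

func (q fakeQuery) Exec() error { return nil }
func (q fakeQuery) Close()      { *q.closed = true }

// runQuery defers Close right after successful creation, so every return
// path releases the query.
func runQuery(closed *bool) error {
	q := fakeQuery{closed: closed}
	defer q.Close()
	return q.Exec()
}

func main() {
	var closed bool
	err := runQuery(&closed)
	fmt.Println(err == nil, closed)
}
```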

Comment thread engine/projection_test.go
Comment on lines +579 to +582
qOptimized, err := engineOptimized.NewRangeQuery(context.Background(), storage, nil, tc.query, start, end, interval)
testutil.Ok(t, err)
resultOptimized := qOptimized.Exec(context.Background())
testutil.Ok(t, resultOptimized.Err)

Copilot AI Apr 24, 2026

Range query qOptimized is not closed after Exec. Please defer qOptimized.Close() after creation to avoid leaking resources/goroutines under goleak.

Comment thread engine/projection_test.go
Comment on lines +633 to +638
testutil.Ok(t, err)
resultProm := qProm.Exec(context.Background())
testutil.Ok(t, resultProm.Err)

qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
testutil.Ok(t, err)

Copilot AI Apr 24, 2026

Prometheus range query qProm is never closed. Please defer qProm.Close() (and likewise for qThanos) to avoid leaking goroutines/resources during the engine_test goleak-verified test suite.

Suggested change
- testutil.Ok(t, err)
- resultProm := qProm.Exec(context.Background())
- testutil.Ok(t, resultProm.Err)
- qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
- testutil.Ok(t, err)
+ testutil.Ok(t, err)
+ defer qProm.Close()
+ resultProm := qProm.Exec(context.Background())
+ testutil.Ok(t, resultProm.Err)
+ qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+ testutil.Ok(t, err)
+ defer qThanos.Close()

Comment thread engine/projection_test.go
Comment on lines +633 to +638
testutil.Ok(t, err)
resultProm := qProm.Exec(context.Background())
testutil.Ok(t, resultProm.Err)

qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
testutil.Ok(t, err)

Copilot AI Apr 24, 2026

Thanos range query qThanos isn't closed. Add defer qThanos.Close() after creation to avoid goroutine/resource leaks under goleak.

Suggested change
- testutil.Ok(t, err)
- resultProm := qProm.Exec(context.Background())
- testutil.Ok(t, resultProm.Err)
- qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
- testutil.Ok(t, err)
+ testutil.Ok(t, err)
+ defer qProm.Close()
+ resultProm := qProm.Exec(context.Background())
+ testutil.Ok(t, resultProm.Err)
+ qThanos, err := thanosEngine.NewRangeQuery(context.Background(), storage, nil, query, start, end, interval)
+ testutil.Ok(t, err)
+ defer qThanos.Close()

expected string
}{
{
name: "aggregation with binary using on - binary gets projection",

Copilot AI Apr 24, 2026

This test case name says the binary gets a projection, but for * on(job) the vector matching cardinality is one-to-one, and the optimizer intentionally does not set Binary.Projection in that case. Consider renaming the case to reflect that only the selectors get projections here (or adjust the expectation if the intent changed).

Suggested change
- name: "aggregation with binary using on - binary gets projection",
+ name: "aggregation with binary using on - selectors get projections",

Comment thread logicalplan/projection.go
Comment on lines +66 to +75
// Store projection on Binary only when the binary has group_left or group_right.
// For one-to-one or vector-scalar, projecting the binary's output can collapse distinct
// series to the same label set and cause implicit many-to-one in a downstream binary.
if p.PushDownBinaryProjection && projection != nil && n.VectorMatching != nil &&
(n.VectorMatching.Card == parser.CardManyToOne || n.VectorMatching.Card == parser.CardOneToMany) {
n.Projection = &Projection{
Labels: append([]string(nil), projection.Labels...),
Include: projection.Include,
}
}

Copilot AI Apr 24, 2026

pushProjection currently stores a Projection on Binary whenever projection != nil and the binary is many-to-one/one-to-many. If the incoming projection is a no-op (e.g. exclude mode with an empty label list, such as sum without ()), this unnecessarily sets Binary.Projection and can add noise/overhead (and affects renderers/tests). Consider only setting n.Projection when it will actually change the output labels (e.g. projection.Include || len(projection.Labels) > 0).
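
The suggested guard can be sketched as a small predicate. The `Projection` type below copies only the two fields visible in the diff, and `isEffective` is a hypothetical helper name:

```go
package main

import "fmt"

// Projection copies the two fields shown in the diff under review.
type Projection struct {
	Labels  []string
	Include bool
}

// isEffective reports whether applying p can change a label set.
// Exclude mode with no labels (e.g. from `sum without ()`) drops nothing,
// while include mode always restricts the output, even with an empty list.
func isEffective(p *Projection) bool {
	return p != nil && (p.Include || len(p.Labels) > 0)
}

func main() {
	fmt.Println(isEffective(&Projection{}))                        // exclude nothing
	fmt.Println(isEffective(&Projection{Labels: []string{"pod"}})) // exclude pod
	fmt.Println(isEffective(&Projection{Include: true}))           // include nothing
}
```

With a check like this, the optimizer would set `n.Projection` only when the predicate holds, leaving no-op projections off the Binary node.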

Comment thread logicalplan/plan_test.go
Comment on lines +67 to 72
if t.Projection != nil {
b.WriteString("(")
}
b.WriteString(renderExprTree(t.LHS))
b.WriteString(" ")
b.WriteString(t.Op.String())

Copilot AI Apr 24, 2026

renderExprTree wraps every Binary with Projection != nil in parentheses even if no projection suffix will be printed (e.g. exclude mode with 0 labels). This can make rendered output misleading and can introduce unnecessary diffs in test expectations. Consider only adding parentheses/suffix when the projection is effective (Include=true or len(Labels)>0).
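
One way to apply this, sketched with a hypothetical `renderBinary` helper. The suffix format here is invented for illustration and is not the format used in plan_test.go:

```go
package main

import (
	"fmt"
	"strings"
)

// Projection copies the two fields shown in the diff under review.
type Projection struct {
	Labels  []string
	Include bool
}

// renderBinary wraps expr in parentheses and appends a projection suffix
// only when the projection is effective; otherwise expr is returned as-is.
func renderBinary(expr string, p *Projection) string {
	if p == nil || (!p.Include && len(p.Labels) == 0) {
		return expr
	}
	mode := "exclude"
	if p.Include {
		mode = "include"
	}
	return fmt.Sprintf("(%s)[%s(%s)]", expr, mode, strings.Join(p.Labels, ", "))
}

func main() {
	fmt.Println(renderBinary("a * on(job) group_left b", &Projection{}))
	fmt.Println(renderBinary("a * on(job) group_left b",
		&Projection{Labels: []string{"job"}, Include: true}))
}
```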

Development

Successfully merging this pull request may close these issues.

High cardinality Joins caused OOM kill due to large result labels

2 participants