feat(metrics): add bound instruments behind experimental feature flag #3421

Open
bryantbiggs wants to merge 8 commits into open-telemetry:main from bryantbiggs:feat/bound-instruments

@bryantbiggs
Contributor

Splits from #3392, per maintainer feedback. Depends on #3420 (in-place delta collection). Refs #1374.

Note: This PR is based on #3420 and should be merged after it. The diff will be cleaner once #3420 is merged.

Summary

Adds BoundCounter and BoundHistogram to the public API behind the experimental_metrics_bound_instruments feature flag. These types cache the resolved aggregator reference for a fixed attribute set, allowing subsequent measurements to bypass per-call sort, dedup, hash, and HashMap lookup entirely.

let counter = meter.u64_counter("requests").build();
let bound = counter.bind(&[KeyValue::new("method", "GET")]);
bound.add(1); // ~1.9ns — no attribute lookup

Architecture

TrackerEntry bound_count

bound_count (originally introduced in #3420, then moved into this PR per review feedback) tracks how many live BoundCounter/BoundHistogram handles reference an entry. Entries with bound_count > 0 are never evicted during delta collection.
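
For reviewers who want a concrete picture of the ref-counting, here is a minimal sketch. It assumes a simplified TrackerEntry whose aggregator is a plain u64 sum; BoundHandle and is_evictable are hypothetical names for illustration, not the types or methods in this PR.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Simplified stand-in for TrackerEntry. The field names follow the PR
/// description; the u64 sum replaces the real aggregator (assumption).
pub struct TrackerEntry {
    pub sum: AtomicU64,
    pub has_been_updated: AtomicBool,
    pub bound_count: AtomicUsize,
}

impl TrackerEntry {
    pub fn new() -> Self {
        TrackerEntry {
            sum: AtomicU64::new(0),
            has_been_updated: AtomicBool::new(false),
            bound_count: AtomicUsize::new(0),
        }
    }

    /// Hypothetical helper: an entry may be evicted only when no bound
    /// handle references it and it was not updated since the last collection.
    pub fn is_evictable(&self) -> bool {
        self.bound_count.load(Ordering::Acquire) == 0
            && !self.has_been_updated.load(Ordering::Acquire)
    }
}

/// Hypothetical bound handle: increments bound_count when created and
/// decrements it on drop, so the entry stays pinned while the handle lives.
pub struct BoundHandle {
    entry: Arc<TrackerEntry>,
}

impl BoundHandle {
    pub fn new(entry: Arc<TrackerEntry>) -> Self {
        entry.bound_count.fetch_add(1, Ordering::AcqRel);
        BoundHandle { entry }
    }

    /// Records a measurement directly on the cached entry, with no map lookup.
    pub fn add(&self, value: u64) {
        self.entry.sum.fetch_add(value, Ordering::Relaxed);
        self.entry.has_been_updated.store(true, Ordering::Release);
    }
}

impl Drop for BoundHandle {
    fn drop(&mut self) {
        self.entry.bound_count.fetch_sub(1, Ordering::AcqRel);
    }
}
```

In this sketch, dropping the last handle returns bound_count to zero, after which the entry becomes evictable once a collection cycle has cleared has_been_updated.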

Cardinality overflow handling

ValueMap::bind() returns Option<Arc<TrackerEntry>>, which is None when the map is at the cardinality limit. Bound handle types use a Direct/Fallback enum:

  • Direct: normal case — dedicated tracker, single-digit ns, no map lookup per call
  • Fallback: at overflow — delegates every call to the unbound Measure::call() path
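
The Direct/Fallback split can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: attributes are collapsed to a single string key, the aggregator is a plain atomic sum, the "overflow" key is a placeholder for the SDK's overflow attribute set, and ValueMap::measure stands in for the unbound Measure::call() path.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Placeholder key for the overflow data point (assumption for this sketch).
const OVERFLOW_KEY: &str = "overflow";

/// Simplified value map keyed by a pre-joined attribute string.
pub struct ValueMap {
    trackers: Mutex<HashMap<String, Arc<AtomicU64>>>,
    limit: usize,
}

impl ValueMap {
    pub fn new(limit: usize) -> Self {
        ValueMap { trackers: Mutex::new(HashMap::new()), limit }
    }

    /// Unbound path: a map lookup on every call. New attribute sets that
    /// arrive at the limit are attributed to the overflow key.
    pub fn measure(&self, attrs: &str, value: u64) {
        let mut map = self.trackers.lock().unwrap();
        let key = if map.contains_key(attrs) || map.len() < self.limit {
            attrs
        } else {
            OVERFLOW_KEY
        };
        map.entry(key.to_string())
            .or_insert_with(|| Arc::new(AtomicU64::new(0)))
            .fetch_add(value, Ordering::Relaxed);
    }

    /// Resolves the tracker once; returns None at the cardinality limit.
    pub fn bind(&self, attrs: &str) -> Option<Arc<AtomicU64>> {
        let mut map = self.trackers.lock().unwrap();
        if let Some(tracker) = map.get(attrs) {
            return Some(Arc::clone(tracker));
        }
        if map.len() >= self.limit {
            return None;
        }
        let tracker = Arc::new(AtomicU64::new(0));
        map.insert(attrs.to_string(), Arc::clone(&tracker));
        Some(tracker)
    }

    /// Reads the current sum for an attribute key, if it exists.
    pub fn get(&self, attrs: &str) -> Option<u64> {
        self.trackers.lock().unwrap().get(attrs).map(|t| t.load(Ordering::Relaxed))
    }
}

/// Hypothetical bound counter using the Direct/Fallback shape described above.
pub enum BoundCounter {
    /// Normal case: cached tracker, no map lookup per call.
    Direct(Arc<AtomicU64>),
    /// At overflow: every call goes back through the unbound path.
    Fallback { map: Arc<ValueMap>, attrs: String },
}

impl BoundCounter {
    pub fn bind(map: &Arc<ValueMap>, attrs: &str) -> Self {
        match map.bind(attrs) {
            Some(tracker) => BoundCounter::Direct(tracker),
            None => BoundCounter::Fallback { map: Arc::clone(map), attrs: attrs.to_string() },
        }
    }

    pub fn add(&self, value: u64) {
        match self {
            BoundCounter::Direct(tracker) => {
                tracker.fetch_add(value, Ordering::Relaxed);
            }
            BoundCounter::Fallback { map, attrs } => map.measure(attrs, value),
        }
    }
}
```

Because a Fallback handle re-enters the unbound path on each call, it behaves exactly like an unbound measurement made at that moment. In this sketch, that means it lands on the overflow key while the map is full.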

Feature flag

All public API, traits, noop impls, and internal plumbing are gated behind experimental_metrics_bound_instruments.

Benchmark Results

Apple M4 Max, 16 cores (12 performance + 4 efficiency), macOS 15.4:

| Benchmark | Time | vs Unbound |
|---|---|---|
| Counter Unbound | 53.2 ns | |
| Counter Bound | 1.87 ns | ~28x faster |
| Histogram Unbound | 58.6 ns | |
| Histogram Bound | 6.57 ns | ~8.9x faster |

EC2 Counter Results (from #3392)

| Instance | Unbound | Bound | Speedup |
|---|---|---|---|
| c7i.large | 86.2 ns | 7.47 ns | 11.5x |
| c7i.xlarge | 88.9 ns | 7.62 ns | 11.7x |
| c7i.4xlarge | 90.1 ns | 7.65 ns | 11.8x |
| c7a.large | 77.4 ns | 2.71 ns | 28.6x |
| c7a.xlarge | 75.0 ns | 2.71 ns | 27.7x |
| c7a.4xlarge | 75.0 ns | 2.71 ns | 27.7x |
| c7g.large | 111.8 ns | 6.87 ns | 16.3x |
| c7g.xlarge | 111.4 ns | 6.84 ns | 16.3x |
| c7g.4xlarge | 112.9 ns | 6.73 ns | 16.8x |

EC2 Histogram Results (from #3392)

| Instance | Unbound | Bound | Speedup |
|---|---|---|---|
| c7i.large | 97.1 ns | 17.1 ns | 5.7x |
| c7i.xlarge | 99.0 ns | 17.9 ns | 5.5x |
| c7i.4xlarge | 98.5 ns | 18.4 ns | 5.4x |
| c7a.large | 92.5 ns | 13.7 ns | 6.8x |
| c7a.xlarge | 92.4 ns | 13.7 ns | 6.7x |
| c7a.4xlarge | 92.5 ns | 13.7 ns | 6.8x |
| c7g.large | 133.2 ns | 27.7 ns | 4.8x |
| c7g.xlarge | 133.5 ns | 27.6 ns | 4.8x |
| c7g.4xlarge | 133.8 ns | 27.2 ns | 4.9x |
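
The speedup figures are the unbound time divided by the bound time, rounded to one decimal place. A small helper (the name speedup is hypothetical, used only for this illustration) makes the arithmetic reproducible:

```rust
/// Speedup factor: unbound latency divided by bound latency,
/// rounded to one decimal place.
pub fn speedup(unbound_ns: f64, bound_ns: f64) -> f64 {
    (unbound_ns / bound_ns * 10.0).round() / 10.0
}
```

For example, the c7i.large counter row gives 86.2 / 7.47 ≈ 11.5, and the M4 Max counter gives 53.2 / 1.87 ≈ 28.4, reported above as ~28x.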

Test Coverage

15 tests covering:

  • Cumulative and delta temporality for both Counter and Histogram
  • Bound + unbound sharing the same data point (same attribute set)
  • Idle delta cycles (no update = no export, but handle persists)
  • Cardinality overflow fallback to unbound path (Counter and Histogram)
  • Recovery after delta eviction frees cardinality space
  • Overflow fallback handles working across multiple delta cycles
  • Drop enabling eviction of stale entries
  • Multiple bound handles sharing the same tracker (ref-counting)
  • Binding with empty attributes

Test plan

  • All 164 metrics tests pass (149 existing + 15 new)
  • Compiles with and without the feature flag
  • Zero regression on unbound measurement path

@codecov

codecov bot commented Mar 14, 2026

Codecov Report

❌ Patch coverage is 89.61039% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.4%. Comparing base (9650783) to head (264009b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| opentelemetry-sdk/src/metrics/mod.rs | 93.7% | 27 Missing ⚠️ |
| opentelemetry-sdk/src/metrics/internal/mod.rs | 91.4% | 11 Missing ⚠️ |
| ...pentelemetry-sdk/src/metrics/internal/aggregate.rs | 0.0% | 6 Missing ⚠️ |
| opentelemetry-sdk/src/metrics/noop.rs | 0.0% | 5 Missing ⚠️ |
| opentelemetry/src/metrics/noop.rs | 0.0% | 5 Missing ⚠️ |
| opentelemetry/src/metrics/instruments/histogram.rs | 60.0% | 4 Missing ⚠️ |
| ...-sdk/src/metrics/internal/exponential_histogram.rs | 62.5% | 3 Missing ⚠️ |
| ...entelemetry-sdk/src/metrics/internal/last_value.rs | 25.0% | 3 Missing ⚠️ |
| ...emetry-sdk/src/metrics/internal/precomputed_sum.rs | 40.0% | 3 Missing ⚠️ |
| opentelemetry/src/metrics/instruments/counter.rs | 66.6% | 3 Missing ⚠️ |

... and 1 more

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3421     +/-   ##
=======================================
+ Coverage   83.2%   83.4%   +0.1%     
=======================================
  Files        128     128             
  Lines      25045   25693    +648     
=======================================
+ Hits       20858   21433    +575     
- Misses      4187    4260     +73     


bryantbiggs force-pushed the feat/bound-instruments branch 2 times, most recently from 18a6278 to f8a03dc on March 14, 2026 21:15

…ace iteration

Replace the two-HashMap swap-and-drain pattern in collect_and_reset with
in-place iteration using TrackerEntry status tracking. TrackerEntry wraps
each aggregator with has_been_updated (AtomicBool) and bound_count
(AtomicUsize) fields.

collect_and_reset now:
- Iterates under a read lock (no write lock in steady state)
- Exports only entries updated since last collection
- Evicts stale unbound entries under a write lock with TOCTOU re-check

A new drain_and_reset method preserves the old map-clearing behavior for
Observable/async instruments that need staleness detection.

This eliminates O(n) write-lock acquisitions on the hot path per
collection cycle when attribute sets are reused (the common case).
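
The collection flow described in this commit can be sketched roughly as below. This is a simplified illustration, not the actual implementation: attributes are a single string key, the aggregator is an atomic u64 sum, and the real code must also handle measurements that race with the export step.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::sync::RwLock;

/// Simplified stand-in for TrackerEntry (the u64 sum is an assumption).
pub struct TrackerEntry {
    pub sum: AtomicU64,
    pub has_been_updated: AtomicBool,
    pub bound_count: AtomicUsize,
}

/// Simplified value map keyed by a pre-joined attribute string.
pub struct ValueMap {
    pub trackers: RwLock<HashMap<String, TrackerEntry>>,
}

impl ValueMap {
    /// Delta collection without swapping maps.
    pub fn collect_and_reset(&self) -> Vec<(String, u64)> {
        let mut exported = Vec::new();
        let mut stale = Vec::new();
        {
            // Pass 1: read lock only. Export entries updated since the last
            // collection and note stale, unbound entries as eviction candidates.
            let map = self.trackers.read().unwrap();
            for (attrs, entry) in map.iter() {
                if entry.has_been_updated.swap(false, Ordering::AcqRel) {
                    exported.push((attrs.clone(), entry.sum.swap(0, Ordering::AcqRel)));
                } else if entry.bound_count.load(Ordering::Acquire) == 0 {
                    stale.push(attrs.clone());
                }
            }
        }
        if !stale.is_empty() {
            // Pass 2: write lock, taken only when there is something to evict.
            let mut map = self.trackers.write().unwrap();
            for attrs in stale {
                // TOCTOU re-check: the entry may have been updated or bound
                // between releasing the read lock and taking the write lock.
                let still_stale = map.get(&attrs).map_or(false, |e| {
                    !e.has_been_updated.load(Ordering::Acquire)
                        && e.bound_count.load(Ordering::Acquire) == 0
                });
                if still_stale {
                    map.remove(&attrs);
                }
            }
        }
        exported
    }
}
```

In this sketch, an entry that was updated in one cycle is exported in that cycle and only becomes an eviction candidate in the next one. An entry with bound_count > 0 is never a candidate.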

Splits from open-telemetry#3392. Refs open-telemetry#2328.

Per review feedback, bound_count belongs in the bound instruments PR
since it's not used by the delta collection refactor alone.

Remove redundant has_no_attribute_value field from ValueMap: the
no_attribute_tracker.has_been_updated flag now serves the same purpose.

Rewrite changelog entry to focus on user-facing behavior: cardinality
overflow recovery now requires 2 collect cycles.

Add BoundCounter and BoundHistogram types that cache resolved aggregator
references for a fixed attribute set. Created via Counter::bind() and
Histogram::bind(), bound instruments bypass per-call attribute lookup
for significant performance improvements (~28x for counters, ~9x for
histograms).

Architecture:
- TrackerEntry.bound_count tracks live handles, preventing eviction
- Direct/Fallback enum handles cardinality overflow gracefully
- Unsupported aggregators (ExpoHistogram, LastValue, PrecomputedSum)
  fall back to unbound path instead of panicking

All public API, traits, and internal plumbing are gated behind the
experimental_metrics_bound_instruments feature flag. Includes 15 tests
covering cumulative/delta temporality, overflow fallback, recovery
after eviction, bound+unbound sharing, idle cycles, drop semantics,
and empty attributes.

Splits from open-telemetry#3392. Refs open-telemetry#1374.

cfg-gated imports must sort before non-gated imports of the same module
per rustfmt's stable import ordering rules.

Add bound_count field to TrackerEntry to track live bound instrument
handles. Entries with bound_count > 0 are never evicted during delta
collection, ensuring bound handles always point to a live tracker.

Moved from open-telemetry#3420 per review feedback — bound_count belongs with the
bound instruments feature, not the delta collection refactor.

bryantbiggs force-pushed the feat/bound-instruments branch from f8a03dc to 08649e8 on March 15, 2026 00:42