[target-allocator] Add feature flag for labeled targets_remaining metric with OTel conventions #4640
Open
hkf57 wants to merge 4 commits into open-telemetry:main
Contributor
I don't think what you've done is enough for the feature gate to propagate down. Right now, if you set it on the operator, it won't do anything. It is possible to set it on the allocator directly by passing it as an argument, but that's quite inconvenient. It'd be better if the operator did it if it's enabled.
33122ea to c163f8d
…ning metric
Add namespace and job_name labels to opentelemetry_allocator_targets_remaining
metric to allow distinguishing between intentionally filtered targets and
orphaned targets due to bugs.
Currently, the metric reports a scalar count making it impossible to alert
on orphaned targets without false positives from intentionally filtered
namespaces (e.g., istio-system excluded for cost control).
With this change, alerts can filter specific namespaces:
opentelemetry_allocator_targets_remaining{namespace!="istio-system"} > 10
Changes:
- Modified SetTargets() to group remaining targets by job_name and namespace
- Each unique job_name/namespace combination emits a separate gauge value
- Targets without namespace use "unknown" label value
- All existing tests pass
Fixes open-telemetry#4637
Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
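The grouping described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `target` and `groupKey` types and the `countRemaining` helper are hypothetical, and reading the namespace from the `__meta_kubernetes_namespace` label is an assumption based on the PR description.

```go
package main

import "fmt"

// target is a hypothetical, simplified stand-in for an allocator target.
type target struct {
	JobName string
	Labels  map[string]string
}

// groupKey identifies one job_name/namespace combination.
type groupKey struct {
	JobName   string
	Namespace string
}

// countRemaining groups unassigned targets by job name and namespace.
// Targets without a namespace label fall back to "unknown".
func countRemaining(targets []target) map[groupKey]int {
	counts := make(map[groupKey]int)
	for _, t := range targets {
		// Assumption: the namespace comes from the Kubernetes SD meta label.
		ns := t.Labels["__meta_kubernetes_namespace"]
		if ns == "" {
			ns = "unknown"
		}
		counts[groupKey{JobName: t.JobName, Namespace: ns}]++
	}
	return counts
}

func main() {
	remaining := []target{
		{JobName: "kubelet", Labels: map[string]string{"__meta_kubernetes_namespace": "istio-system"}},
		{JobName: "kubelet", Labels: map[string]string{"__meta_kubernetes_namespace": "istio-system"}},
		{JobName: "static", Labels: nil},
	}
	counts := countRemaining(remaining)
	fmt.Println(counts[groupKey{JobName: "kubelet", Namespace: "istio-system"}]) // prints 2
	fmt.Println(counts[groupKey{JobName: "static", Namespace: "unknown"}])       // prints 1
}
```

Each entry in the returned map would correspond to one gauge value with its own label set, which is what lets an alert exclude a namespace such as istio-system.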
…ric with OTel conventions
- Add operator.targetallocator.labeledmetrics feature gate (alpha)
- Change label names to OTel semantic conventions: job.name and k8s.namespace.name
- Maintain backward compatibility when feature flag is disabled
Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
Move feature gate propagation to follow the existing pattern: operator checks the gate and writes config into the TA configmap, rather than passing CLI flags. The TA reads labeled_metrics from its config file.
Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
c163f8d to dd71375
Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
Description
Adds k8s.namespace.name and job.name labels to the opentelemetry_allocator_targets_remaining metric (when enabled via feature gate) to enable distinguishing between intentionally filtered targets and orphaned targets due to bugs.
Breaking Change Mitigation: This change is behind the operator.targetallocator.labeledmetrics feature gate (alpha) to provide a migration path for existing users.
Motivation and Context
Currently, opentelemetry_allocator_targets_remaining reports a scalar count, making it impossible to create meaningful alerts. In production environments with namespace filtering (e.g., excluding istio-system to control costs), the baseline can be ~1600 targets. This masks genuine issues like orphaned targets from deleted collectors or version skew scenarios.
Problem Example
In production environments:
- Excluded the istio-system namespace (~1600 targets) via allowNamespaces to reduce costs
- targets_remaining baseline was already 1600
Solution
Feature Flag (Alpha)
Added the operator.targetallocator.labeledmetrics feature gate.
Label Names (OTel Semantic Conventions)
Following OpenTelemetry semantic conventions:
- job.name (was job_name)
- k8s.namespace.name (was namespace)
Feature Gate Propagation
Follows the existing operator pattern for feature gate propagation to the target allocator:
- The operator checks the operator.targetallocator.labeledmetrics gate
- It writes labeled_metrics: true into the TA configmap when enabled
- Same pattern as allocation_fallback_strategy and https
Implementation
Modified SetTargets() in allocator.go to:
- Read the labeled_metrics config value
- Take the namespace from the __meta_kubernetes_namespace label
Usage
Enable via the operator's feature gates:
# On the operator
--feature-gates=operator.targetallocator.labeledmetrics
The operator will propagate this to the target allocator's configmap automatically.
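The propagation step could look roughly like the sketch below. It is only an illustration under stated assumptions: `buildTAConfig` and the map-based gate lookup are hypothetical and do not reflect the operator's real types or feature gate API. Only the gate ID and the `labeled_metrics` key come from this PR.

```go
package main

import "fmt"

// labeledMetricsGate is the gate ID named in this PR.
const labeledMetricsGate = "operator.targetallocator.labeledmetrics"

// buildTAConfig is a hypothetical helper: it copies the base target allocator
// config and adds labeled_metrics: true only when the gate is enabled.
func buildTAConfig(base map[string]any, enabledGates map[string]bool) map[string]any {
	cfg := make(map[string]any, len(base)+1)
	for k, v := range base {
		cfg[k] = v
	}
	if enabledGates[labeledMetricsGate] {
		cfg["labeled_metrics"] = true
	}
	return cfg
}

func main() {
	// The base config value here is illustrative only.
	base := map[string]any{"allocation_fallback_strategy": "consistent-hashing"}

	on := buildTAConfig(base, map[string]bool{labeledMetricsGate: true})
	off := buildTAConfig(base, nil)

	fmt.Println(on["labeled_metrics"]) // prints true
	_, present := off["labeled_metrics"]
	fmt.Println(present) // prints false
}
```

Leaving the key out entirely when the gate is off keeps the generated config unchanged for existing users, which matches the backward-compatibility goal stated in the PR.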
After this change (when feature enabled)
Alerts can filter specific namespaces:
Testing
Checklist
Fixes #4637