[target-allocator] Add feature flag for labeled targets_remaining metric with OTel conventions #4640

Open
hkf57 wants to merge 4 commits into open-telemetry:main from hkf57:add-labels-to-targets-remaining

@hkf57 hkf57 commented Jan 20, 2026

Description

Adds k8s.namespace.name and job.name labels to the opentelemetry_allocator_targets_remaining metric (when enabled via a feature gate), so that intentionally filtered targets can be distinguished from targets orphaned by bugs.

Breaking Change Mitigation: This change is behind the operator.targetallocator.labeledmetrics feature gate (alpha) to provide a migration path for existing users.

Motivation and Context

Currently, opentelemetry_allocator_targets_remaining reports a single scalar count with no labels, making it impossible to create meaningful alerts. In production environments with namespace filtering (e.g., excluding istio-system to control costs), the baseline can be ~1600 targets. A baseline that high masks genuine issues such as targets orphaned by deleted collectors or by version skew.

Problem Example

In production environments:

  • The istio-system namespace (~1600 targets) is filtered out via allowNamespaces to reduce costs
  • A recent 14-hour outage occurred when the kube-state-metrics target was orphaned due to version skew during a StatefulSet rollout
  • We could not alert on orphaned targets because the targets_remaining baseline was already 1600

Solution

Feature Flag (Alpha)

Added operator.targetallocator.labeledmetrics feature gate:

  • Default: Disabled (maintains backward compatibility)
  • When enabled: Emits metrics with OTel semantic convention labels
  • Stage: Alpha
  • From version: v0.145.0

Label Names (OTel Semantic Conventions)

Following OpenTelemetry semantic conventions:

  • job.name (was job_name)
  • k8s.namespace.name (was namespace)

Feature Gate Propagation

Follows the existing operator pattern for feature gate propagation to the target allocator:

  • Operator checks operator.targetallocator.labeledmetrics gate
  • Writes labeled_metrics: true into the TA configmap when enabled
  • TA reads from its config file (same pattern as allocation_fallback_strategy and https)
  • No CLI flag forwarding needed; config-based propagation is the established convention
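
The propagation steps above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `buildTAConfig` and the `enabledGates` map are hypothetical stand-ins for the operator's gate lookup and configmap rendering, and omitting the key when the gate is off is an assumption. Only the gate name and the `labeled_metrics` key come from this PR.

```go
package main

import "fmt"

// gateID is the feature gate name introduced by this PR.
const gateID = "operator.targetallocator.labeledmetrics"

// buildTAConfig is a hypothetical helper. It returns the key/value pairs the
// operator would serialize into the target allocator configmap. The real
// operator builds a much larger config; only the new field is shown here.
func buildTAConfig(enabledGates map[string]bool) map[string]any {
	cfg := map[string]any{}
	// Assumption: the key is only written when the gate is enabled, so the
	// TA falls back to its default (unlabeled) behaviour otherwise.
	if enabledGates[gateID] {
		cfg["labeled_metrics"] = true
	}
	return cfg
}

func main() {
	fmt.Println(buildTAConfig(map[string]bool{gateID: true}))
	fmt.Println(buildTAConfig(nil))
}
```

Keeping the decision in the operator means the target allocator only needs to read one boolean from its config file and never has to know about the operator's feature gate registry.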

Implementation

Modified SetTargets() in allocator.go to:

  1. Check labeled_metrics config value
  2. When enabled: Group remaining targets by job and namespace, emit labeled metrics
  3. When disabled: Emit single unlabeled gauge (original behaviour)
  4. Extract namespace from __meta_kubernetes_namespace label
  5. Use "unknown" for targets without namespace metadata
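
The grouping described in the steps above might look something like the sketch below. It assumes a pared-down `Target` type with a job name and a label map; the allocator's real target type and the body of SetTargets() differ, and `countRemaining`, `namespaceOf`, and `groupKey` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Target is a hypothetical, simplified stand-in for the allocator's target type.
type Target struct {
	JobName string
	Labels  map[string]string
}

// groupKey identifies one labeled time series (job.name, k8s.namespace.name).
type groupKey struct {
	Job       string
	Namespace string
}

const namespaceMetaLabel = "__meta_kubernetes_namespace"

// namespaceOf returns the target's namespace, or "unknown" when the
// discovery metadata label is missing or empty.
func namespaceOf(t Target) string {
	if ns := t.Labels[namespaceMetaLabel]; ns != "" {
		return ns
	}
	return "unknown"
}

// countRemaining groups unassigned targets by job and namespace. Each map
// entry corresponds to one gauge value when labeled metrics are enabled.
func countRemaining(remaining []Target) map[groupKey]int {
	counts := make(map[groupKey]int)
	for _, t := range remaining {
		counts[groupKey{Job: t.JobName, Namespace: namespaceOf(t)}]++
	}
	return counts
}

func main() {
	remaining := []Target{
		{JobName: "istio-proxy", Labels: map[string]string{namespaceMetaLabel: "istio-system"}},
		{JobName: "istio-proxy", Labels: map[string]string{namespaceMetaLabel: "istio-system"}},
		{JobName: "kube-state-metrics", Labels: map[string]string{namespaceMetaLabel: "monitoring"}},
		{JobName: "static-job"}, // no namespace metadata
	}

	counts := countRemaining(remaining)

	// Sort keys so the printed order is stable.
	keys := make([]groupKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Job != keys[j].Job {
			return keys[i].Job < keys[j].Job
		}
		return keys[i].Namespace < keys[j].Namespace
	})
	for _, k := range keys {
		fmt.Printf("job.name=%s k8s.namespace.name=%s remaining=%d\n", k.Job, k.Namespace, counts[k])
	}
}
```

In the disabled case the same input would collapse to a single count (here, 4), which is the original scalar behaviour.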

Usage

Enable via the operator's feature gates:

# On the operator
--feature-gates=operator.targetallocator.labeledmetrics

The operator will propagate this to the target allocator's configmap automatically.

After this change (when feature enabled)

Alerts can filter specific namespaces:

# Alert on orphaned targets, excluding known filtered namespaces
opentelemetry_allocator_targets_remaining{k8s_namespace_name!="istio-system"} > 10

# Or alert on specific critical jobs
opentelemetry_allocator_targets_remaining{job_name="kube-state-metrics"} > 0

Testing

  • Code compiles successfully
  • Mock interfaces updated
  • Feature gate registered in global registry
  • Configmap propagation test
  • Collector not ready grace period test updated for new config field
  • Unit tests for labelled/unlabelled behaviour
  • E2E tests with feature flag enabled/disabled

Checklist

  • DCO sign-off included
  • No breaking changes (behind feature flag)
  • Follows OTel semantic conventions
  • Follows existing feature gate propagation pattern (configmap, not CLI flags)
  • Unit tests added
  • E2E tests updated

Fixes #4637

@hkf57 hkf57 requested a review from a team as a code owner January 20, 2026 21:29

linux-foundation-easycla bot commented Jan 20, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@hkf57 hkf57 changed the title from "[target-allocator] Add namespace and job_name labels to targets_remaining metric" to "[target-allocator] Add feature flag for labeled targets_remaining metric with OTel conventions" Jan 28, 2026
@swiatekm
Contributor

I don't think what you've done is enough for the feature gate to propagate down. Right now, if you set it on the operator, it won't do anything. It is possible to set it on the allocator directly by passing it as an argument, but that's quite inconvenient. It'd be better if the operator did it if it's enabled.

@hkf57 hkf57 force-pushed the add-labels-to-targets-remaining branch 2 times, most recently from 33122ea to c163f8d Compare April 2, 2026 06:30
hkf57 added 3 commits April 2, 2026 17:31
…ning metric

Add namespace and job_name labels to opentelemetry_allocator_targets_remaining
metric to allow distinguishing between intentionally filtered targets and
orphaned targets due to bugs.

Currently, the metric reports a scalar count making it impossible to alert
on orphaned targets without false positives from intentionally filtered
namespaces (e.g., istio-system excluded for cost control).

With this change, alerts can filter specific namespaces:
  opentelemetry_allocator_targets_remaining{namespace!="istio-system"} > 10

Changes:
- Modified SetTargets() to group remaining targets by job_name and namespace
- Each unique job_name/namespace combination emits separate gauge value
- Targets without namespace use "unknown" label value
- All existing tests pass

Fixes open-telemetry#4637

Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
…ric with OTel conventions

- Add operator.targetallocator.labeledmetrics feature gate (alpha)
- Change label names to OTel semantic conventions: job.name and k8s.namespace.name
- Maintain backward compatibility when feature flag is disabled

Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
Move feature gate propagation to follow the existing pattern: operator
checks the gate and writes config into the TA configmap, rather than
passing CLI flags. The TA reads labeled_metrics from its config file.

Signed-off-by: Charles Lee <charles.lee@safetyculture.io>
@hkf57 hkf57 force-pushed the add-labels-to-targets-remaining branch from c163f8d to dd71375 Compare April 2, 2026 06:31
Signed-off-by: Charles Lee <charles.lee@safetyculture.io>

Development

Successfully merging this pull request may close these issues.

Add namespace/job_name labels to opentelemetry_allocator_targets_remaining metric
