
feat: implement pod mutation observability metrics#4644

Open
Arpit529Srivastava wants to merge 15 commits into open-telemetry:main from Arpit529Srivastava:pod-mutation-metrics

Conversation

@Arpit529Srivastava
Contributor

Description:

Adds observability to the pod mutation webhook by introducing a new Prometheus counter, opentelemetry_operator_pod_mutations_total. This allows operators to track success, failure, and skip rates for both sidecar injection and auto-instrumentation across the cluster. The metric includes detailed labels such as status, reason, type, language, and namespace, making it easier to distinguish between cases like configuration errors, disabled feature gates, or missing resources.

Link to tracking Issue(s):

Testing:
A new unit test was added at internal/webhook/podmutation/metrics_test.go to verify that the counter increments correctly with the expected label set.

Integration checks confirm that both the sidecar and instrumentation mutators invoke the metrics recorder at all critical decision points.

Existing unit tests were updated to ensure there are no regressions in the mutator logic.

Documentation:

Signed-off-by: arpit529srivastava <arpitsrivastava529@gmail.com>
@Arpit529Srivastava Arpit529Srivastava requested a review from a team as a code owner January 22, 2026 15:44
@Arpit529Srivastava
Contributor Author

/cc @swiatekm @iblancasa PTAL.
thanks.

Contributor

@iblancasa left a comment


Maybe we should add a new option to enable/disable recording these metrics.
I'd also like to see some E2E tests to ensure the feature works properly.

}

func (m *PodMutationMetrics) RecordSidecarMutation(ctx context.Context, status, reason, namespace string) {
	if m == nil {
Contributor


I think this can happen for tests, right? How about creating an interface and have a noopMetrics type for the tests?

attrs := []attribute.KeyValue{
attribute.String("mutation_type", "sidecar"),
attribute.String("status", status),
attribute.String("namespace", namespace),
Contributor


I'm not sure about this... because of cardinality

Contributor Author


I recommend keeping it, since it's critical for debugging and has bounded cardinality. With the addition of the --enable-webhook-metrics flag, this is now opt-in, which minimizes the risk. Happy to discuss other options.


if inst, err = pm.getInstrumentationInstance(ctx, ns, pod, annotationInjectJava); err != nil {
// we still allow the pod to be created, but we log a message to the operator's logs
pm.metrics.RecordInstrumentationMutation(ctx, "error", "lookup_failed", "java", ns.Name)
Contributor


I think we should define some constants:

  const (
      StatusSuccess  = "success"
      StatusSkipped  = "skipped"
      StatusRejected = "rejected"
      StatusError    = "error"

      ReasonAlreadyInstrumented = "already_instrumented"
      ReasonFeatureDisabled     = "feature_disabled"
      ReasonLookupFailed        = "lookup_failed"
  )

attrs := []attribute.KeyValue{
attribute.String("mutation_type", "sidecar"),
attribute.String("status", status),
attribute.String("namespace", namespace),
Contributor


Can we use semantic conventions package for those attributes that are already there? Like k8s.namespace.name

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
Contributor


Can you explain here what metrics this PR is adding, along with any other relevant details?

var _ podmutation.PodMutator = (*instPodMutator)(nil)

func NewMutator(logger logr.Logger, client client.Client, recorder record.EventRecorder, cfg config.Config) podmutation.PodMutator {
func NewMutator(logger logr.Logger, client client.Client, recorder record.EventRecorder, cfg config.Config, metrics *podmutation.PodMutationMetrics) podmutation.PodMutator {
Contributor


Instead of repeating this here for each language, maybe we can do something more table-driven.

var crdMetrics *otelv1beta1.Metrics
meterProvider, metricsErr := otelv1beta1.BootstrapMetrics()
if metricsErr != nil {
setupLog.Error(metricsErr, "Error bootstrapping metrics")
Contributor


Since we continue, meterProvider can end up nil, and that can cause a panic later in methods that use meterProvider. Can you add checks to avoid that?

Contributor Author


Fixed by adding a check for meterProvider != nil before initializing metrics, and by using the no-op metrics pattern to prevent panics in other code paths.

Signed-off-by: arpit529srivastava <arpitsrivastava529@gmail.com>
@github-actions
Contributor

github-actions bot commented Jan 23, 2026

E2E Test Results

 36 files  +2  229 suites  +2   2h 3m 54s ⏱️ +35s
 91 tests +1   90 ✅ ±0  0 💤 ±0  1 ❌ +1 
233 runs  +2  231 ✅ ±0  0 💤 ±0  2 ❌ +2 

For more details on these failures, see this check.

Results for commit d9935fd. ± Comparison against base commit 36d6c75.

♻️ This comment has been updated with latest results.

Signed-off-by: arpit529srivastava <arpitsrivastava529@gmail.com>
… rbac issues

Signed-off-by: arpit529srivastava <arpitsrivastava529@gmail.com>
@Arpit529Srivastava
Contributor Author

@iblancasa
I'm running into issues verifying webhook metrics in the E2E tests due to auth and permission constraints in the test environment. I've tried a few common approaches, but each one hits a blocker:

  1. Service/pod proxy (kubectl get --raw)
    Tried using the API server proxy to fetch metrics (for example, /api/v1/namespaces/.../pods/https:${POD_NAME}:8443/proxy/metrics).
    Result: fails with "error: you must be logged in to the server".
    Reason: the test runner's ServiceAccount appears to be missing permissions for the pods/proxy or services/proxy subresources.

  2. Port-forward (kubectl port-forward + curl)
    Tried bypassing the API server by port-forwarding locally and curling the endpoint.
    Result: returns 401 Unauthorized.
    Reason: the operator runs with --metrics-secure=true by default, so the metrics endpoint rejects unauthenticated requests.

What's the recommended way to verify secured metrics in this E2E environment? Should the test disable secure metrics (--metrics-secure=false), or is there a supported way (token / auth mechanism) to authenticate against the metrics endpoint?

@Arpit529Srivastava
Contributor Author

@iblancasa a gentle ping, thanks :)



Development

Successfully merging this pull request may close these issues.

Improve Operator Pod Mutation Observability

2 participants