feat: add distributed tracing for webhook handling and PipelineRun timing#2605
feat: add distributed tracing for webhook handling and PipelineRun timing#2605ci-operator wants to merge 1 commit intotektoncd:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability of Pipelines-as-Code by integrating OpenTelemetry distributed tracing. This allows operators and developers to gain deeper insights into the performance and flow of webhook event processing and the various stages of PipelineRun execution. By propagating trace context, it facilitates a unified view of operations across PaC and Tekton Pipelines, streamlining debugging and performance analysis. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry distributed tracing to Pipelines-as-Code. Key changes include integrating tracing into the event handling flow, propagating trace context to PipelineRuns via annotations, and emitting timing spans for PipelineRun lifecycle events. The observability configuration has been updated to include new tracing options, but the removal of existing metrics-protocol and metrics-endpoint configurations requires clarification and documentation. Additionally, an improvement opportunity was identified to ensure consistent tracing data by always setting VCS repository and revision attributes, even when empty.
I am having trouble creating individual review comments. Click here to see my feedback.
config/305-config-observability.yaml (24-25)
The metrics-protocol and metrics-endpoint configurations are being removed from the data section. This change was not explicitly mentioned in the pull request description, which focuses on adding tracing. If these metrics were actively used, their removal could be a breaking change or an unintended side effect. Please clarify if this removal is intentional and, if so, document it in the PR description or release notes.
pkg/adapter/adapter.go (212-217)
For better consistency in tracing data, consider always setting the VCSRepositoryKey and VCSRevisionKey attributes, even if l.event.URL or l.event.SHA are empty. This ensures the attribute key is always present in the span, which can simplify querying and analysis in tracing backends. You could set them to an empty string or a placeholder like "unknown" if the values are not available, instead of omitting the attribute entirely.
if l.event.URL != "" {
span.SetAttributes(tracing.VCSRepositoryKey.String(l.event.URL))
} else {
span.SetAttributes(tracing.VCSRepositoryKey.String(""))
}
if l.event.SHA != "" {
span.SetAttributes(tracing.VCSRevisionKey.String(l.event.SHA))
} else {
span.SetAttributes(tracing.VCSRevisionKey.String(""))
}393b9d3 to
3870a7f
Compare
|
/ok-to-test |
3870a7f to
bcbb64e
Compare
|
@zakisk can you have a look pls |
bcbb64e to
cf3108c
Compare
cf3108c to
501e750
Compare
|
@ci-operator for E2E run we're working on permission workaround in this PR #2611 |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements distributed tracing for Pipelines-as-Code using OpenTelemetry. It adds logic to extract trace context from incoming webhook headers, propagate it to PipelineRuns via a new annotation, and emit timing spans for event processing and PipelineRun execution. The PR also includes configuration updates and new documentation. Feedback points out that an error during JSON marshalling of the span context is currently ignored and should be logged to assist with debugging.
|
this conflitcs with recently merged 0faad24 |
|
/ok-to-test |
|
@ci-operator there are still some conflicts there I think, can you please fix them |
|
due to recent changes in #2622, label is not deleted automatically because we've changed the way it should have been deleted. |
d01d86f to
f190d03
Compare
|
/ok-to-test |
| namespace: pipelines-as-code | ||
| data: | ||
| tracing-protocol: grpc | ||
| tracing-endpoint: "http://otel-collector.observability.svc.cluster.local:4317" |
There was a problem hiding this comment.
you should also adds docs for deploying traces collector either jaeger or OTEL collector
There was a problem hiding this comment.
Added a "Deploying a trace collector" section pointing to OTel Collector and Jaeger.
pkg/adapter/adapter.go
Outdated
| if event.URL != "" { | ||
| span.SetAttributes(tracing.VCSRepositoryKey.String(event.URL)) |
There was a problem hiding this comment.
for these two as well, they are not having any value at this point and it will always be empty
There was a problem hiding this comment.
Same fix, moved to sinker.go after ParsePayload.
pkg/tracing/tracing.go
Outdated
| DeliverySuccessKey = attribute.Key("delivery.success") | ||
| DeliveryReasonKey = attribute.Key("delivery.reason") | ||
|
|
||
| TektonPipelineRunNameKey = attribute.Key("tekton.pipelinerun.name") |
There was a problem hiding this comment.
shouldn't this name be "tekton.pac.pipelinerun.name"
There was a problem hiding this comment.
Reworked the full attribute schema to align with OTel semconv. PipelineRun identity now uses bare pipelinerun and pipelinerun.uid to match Tekton's existing reconciler span attributes. VCS keys now use the semconv vcs.* namespace (vcs.provider.name, vcs.repository.url.full, vcs.ref.head.revision). See the reply on chmouel's comment for the full breakdown.
|
/ok-to-test |
I was testing it and we shouldn't add label via |
|
/ok-to-test |
|
/test |
|
@ci-operator can you squash your commits plesae? |
pkg/reconciler/emit_traces_test.go
Outdated
|
|
||
| // Add span context annotation for cases that should emit spans | ||
| // (except the "missing annotation" test case which has empty annotations) | ||
| if tt.name != "missing annotation emits no spans" { |
There was a problem hiding this comment.
using string for comparaison is bad practice, better use a bool
There was a problem hiding this comment.
Replaced with a skipSpanContext bool on the test struct. Cleaner and won't break if the test case gets renamed.
|
|
||
| const TracerName = "pipelines-as-code" | ||
|
|
||
| // Span attribute keys. |
There was a problem hiding this comment.
any reasons of the change of attributes style, The attribute keys mix conventions:
- git.* keys use dotted OTel semantic convention style
- delivery.* and tekton.pac.* are custom namespaces
- git.event_type uses underscore while others use dots
There was a problem hiding this comment.
Reworked the attribute naming to align with OTel semconv where possible, match Tekton's existing conventions for resource identity, and scope our domain-specific attributes with a reverse-domain prefix per OTel naming guidance.
Semconv (development status): k8s.namespace.name, cicd.pipeline.action.name, vcs.provider.name, vcs.repository.url.full, vcs.ref.head.revision
Tekton resource identity: pipelinerun and pipelinerun.uid, matching the bare attribute names Tekton's reconciler already emits so trace queries aren't fragmented across services.
PaC-specific: vcs.event.type for webhook event type (push, pull_request, etc.), extending the vcs.* namespace since there's no semconv equivalent.
Delivery pipeline (domain-specific): delivery.tekton.dev.application, delivery.tekton.dev.component, delivery.tekton.dev.success, delivery.tekton.dev.reason. These have no semconv equivalent, so they use a reverse-domain prefix. The PipelineRun labels they read from are also now delivery.tekton.dev/* instead of appstudio.openshift.io/*, keeping label and attribute namespaces aligned.
On cicd.pipeline.action.name: the semconv well-known values (BUILD, RUN, SYNC) describe what a pipeline is doing. Our values (build, test, release) classify a pipeline's role in a delivery chain, so they fall under "otherwise, a custom value MAY be used" per the CICD registry spec. Lowercase matches the existing annotation values on PipelineRuns.
What do you think? Open to adjustments on any of these.
pkg/reconciler/emit_traces.go
Outdated
| ) | ||
|
|
||
| const ( | ||
| applicationLabel = "appstudio.openshift.io/application" |
There was a problem hiding this comment.
why is it appstudio? why not tekton.dev?
There was a problem hiding this comment.
Good question. These were Konflux-specific labels (appstudio.openshift.io/application, appstudio.openshift.io/component) that don't belong in upstream PaC. Removed them and replaced with delivery.tekton.dev/application and delivery.tekton.dev/component, a Tekton-ecosystem namespace for delivery pipeline concepts that any platform can set on its PipelineRuns. PaC reads them if present, doesn't assume who set them.
pkg/adapter/sinker.go
Outdated
| } | ||
|
|
||
| // Enrich the parent span with VCS attributes now that ParsePayload has populated them | ||
| setVCSSpanAttributes(ctx, s.event) |
There was a problem hiding this comment.
this seems to be done twice at line 80: Inside the isIncoming branch of processEvent, before processEventPayload is called at line 46: Inside processEventPayload, after payload parsing
There was a problem hiding this comment.
The two calls were on mutually exclusive paths (incoming events skip processEventPayload, webhook events go through it), but having them in separate branches made it hard to see that at a glance. Consolidated to a single call after the branch converges.
f7c8cd7 to
4407b33
Compare
|
/ok-to-test |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2605 +/- ##
==========================================
+ Coverage 58.80% 58.91% +0.11%
==========================================
Files 206 207 +1
Lines 20304 20413 +109
==========================================
+ Hits 11940 12027 +87
- Misses 7591 7612 +21
- Partials 773 774 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| semconv.K8SNamespaceName(pr.GetNamespace()), | ||
| tracing.PipelineRunKey.String(pr.GetName()), | ||
| tracing.PipelineRunUIDKey.String(string(pr.GetUID())), | ||
| tracing.CICDPipelineActionName.String("build"), |
There was a problem hiding this comment.
CICDPipelineActionName is constant hard coded string? shouldn't it be configurable or what's the reason for just always having "build"?
4407b33 to
01f5ac2
Compare
|
/ok-to-test |
…ming Add OpenTelemetry distributed tracing to PaC. Spans cover webhook event processing and PipelineRun lifecycle timing. Attribute naming follows OTel semconv where possible (vcs.*, cicd.*, k8s.*), matches Tekton's existing bare attribute names for resource identity, and uses delivery.tekton.dev.* for domain-specific delivery pipeline concepts per OTel reverse-domain naming guidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
01f5ac2 to
81f9ee8
Compare
|
/ok-to-test |
|
@ci-operator: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
📝 Description of the Change
Add OpenTelemetry distributed tracing to Pipelines-as-Code. When tracing is enabled via the
pipelines-as-code-config-observabilityConfigMap, PaC emits trace spans for webhook event processing and PipelineRun lifecycle timing.Controller: Emits a
PipelinesAsCode:ProcessEventspan covering the full webhook event lifecycle — from SCM event receipt through PipelineRun creation. Propagates trace context onto created PipelineRuns via thetekton.dev/pipelinerunSpanContextannotation, enabling end-to-end traces when Tekton Pipelines also has tracing enabled.Watcher: Emits
waitDuration(creation → start) andexecuteDuration(start → completion) timing spans for completed PipelineRuns, using resource timestamps for accurate wall-clock timing.Tracing is configured through the existing observability ConfigMap with three new keys:
tracing-protocol,tracing-endpoint, andtracing-sampling-rate. The controller's Knative observability configurator is pointed at the correct PaC-specific ConfigMap (pipelines-as-code-config-observability).🔗 Linked GitHub Issue
https://issues.redhat.com/browse/SRVKP-8544
🧪 Testing Strategy
Manually tested end-to-end with an OpenTelemetry collector and Tempo backend. Verified traces appear with correct span names, attributes, and parent-child relationships across PaC and Tekton Pipelines reconciler spans.
🤖 AI Assistance
AI (Claude) was used for code generation, debugging, and documentation. All code has been reviewed, tested, and deployed. Co-authored-by trailers are on each commit.
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testlocally.make lintrequiresgolangci-lintwhich is not installed locally; CI will validate.