Skip to content

feat: add distributed tracing for webhook handling and PipelineRun timing#2605

Open
ci-operator wants to merge 1 commit intotektoncd:mainfrom
ci-operator:distributed-tracing
Open

feat: add distributed tracing for webhook handling and PipelineRun timing#2605
ci-operator wants to merge 1 commit intotektoncd:mainfrom
ci-operator:distributed-tracing

Conversation

@ci-operator
Copy link
Copy Markdown

📝 Description of the Change

Add OpenTelemetry distributed tracing to Pipelines-as-Code. When tracing is enabled via the pipelines-as-code-config-observability ConfigMap, PaC emits trace spans for webhook event processing and PipelineRun lifecycle timing.

Controller: Emits a PipelinesAsCode:ProcessEvent span covering the full webhook event lifecycle — from SCM event receipt through PipelineRun creation. Propagates trace context onto created PipelineRuns via the tekton.dev/pipelinerunSpanContext annotation, enabling end-to-end traces when Tekton Pipelines also has tracing enabled.

Watcher: Emits waitDuration (creation → start) and executeDuration (start → completion) timing spans for completed PipelineRuns, using resource timestamps for accurate wall-clock timing.

Tracing is configured through the existing observability ConfigMap with three new keys: tracing-protocol, tracing-endpoint, and tracing-sampling-rate. The controller's Knative observability configurator is pointed at the correct PaC-specific ConfigMap (pipelines-as-code-config-observability).

🔗 Linked GitHub Issue

https://issues.redhat.com/browse/SRVKP-8544

🧪 Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

Manually tested end-to-end with an OpenTelemetry collector and Tempo backend. Verified traces appear with correct span names, attributes, and parent-child relationships across PaC and Tekton Pipelines reconciler spans.

🤖 AI Assistance

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

AI (Claude) was used for code generation, debugging, and documentation. All code has been reviewed, tested, and deployed. Co-authored-by trailers are on each commit.

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's commit message guide.
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test locally. make lint requires golangci-lint which is not installed locally; CI will validate.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of Pipelines-as-Code by integrating OpenTelemetry distributed tracing. This allows operators and developers to gain deeper insights into the performance and flow of webhook event processing and the various stages of PipelineRun execution. By propagating trace context, it facilitates a unified view of operations across PaC and Tekton Pipelines, streamlining debugging and performance analysis.

Highlights

  • Distributed Tracing Implementation: Introduced OpenTelemetry distributed tracing to Pipelines-as-Code, enabling visibility into webhook event processing and PipelineRun lifecycle timing.
  • Webhook Event Tracing: The controller now emits a 'PipelinesAsCode:ProcessEvent' span, covering the full webhook event lifecycle from receipt to PipelineRun creation, with relevant VCS attributes.
  • PipelineRun Timing Spans: The watcher emits 'waitDuration' (creation to start) and 'executeDuration' (start to completion) spans for completed PipelineRuns, using accurate resource timestamps.
  • Trace Context Propagation: Trace context is propagated to created PipelineRuns via the 'tekton.dev/pipelinerunSpanContext' annotation, allowing for end-to-end traces when Tekton Pipelines also has tracing enabled.
  • Configurable Observability: Tracing is configured through the existing 'pipelines-as-code-config-observability' ConfigMap using new keys: 'tracing-protocol', 'tracing-endpoint', and 'tracing-sampling-rate'.
  • New Documentation: Added comprehensive documentation on how to enable and configure distributed tracing for Pipelines-as-Code.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces OpenTelemetry distributed tracing to Pipelines-as-Code. Key changes include integrating tracing into the event handling flow, propagating trace context to PipelineRuns via annotations, and emitting timing spans for PipelineRun lifecycle events. The observability configuration has been updated to include new tracing options, but the removal of existing metrics-protocol and metrics-endpoint configurations requires clarification and documentation. Additionally, an improvement opportunity was identified to ensure consistent tracing data by always setting VCS repository and revision attributes, even when empty.

I am having trouble creating individual review comments. Click here to see my feedback.

config/305-config-observability.yaml (24-25)

high

The metrics-protocol and metrics-endpoint configurations are being removed from the data section. This change was not explicitly mentioned in the pull request description, which focuses on adding tracing. If these metrics were actively used, their removal could be a breaking change or an unintended side effect. Please clarify if this removal is intentional and, if so, document it in the PR description or release notes.

pkg/adapter/adapter.go (212-217)

medium

For better consistency in tracing data, consider always setting the VCSRepositoryKey and VCSRevisionKey attributes, even if l.event.URL or l.event.SHA are empty. This ensures the attribute key is always present in the span, which can simplify querying and analysis in tracing backends. You could set them to an empty string or a placeholder like "unknown" if the values are not available, instead of omitting the attribute entirely.

if l.event.URL != "" {
			span.SetAttributes(tracing.VCSRepositoryKey.String(l.event.URL))
		} else {
			span.SetAttributes(tracing.VCSRepositoryKey.String(""))
		}
		if l.event.SHA != "" {
			span.SetAttributes(tracing.VCSRevisionKey.String(l.event.SHA))
		} else {
			span.SetAttributes(tracing.VCSRevisionKey.String(""))
		}

@ci-operator ci-operator force-pushed the distributed-tracing branch from 393b9d3 to 3870a7f Compare March 25, 2026 15:04
@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 25, 2026

/ok-to-test

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 27, 2026

@zakisk can you have a look pls

@zakisk
Copy link
Copy Markdown
Member

zakisk commented Mar 30, 2026

@ci-operator for E2E run we're working on permission workaround in this PR #2611

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 30, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements distributed tracing for Pipelines-as-Code using OpenTelemetry. It adds logic to extract trace context from incoming webhook headers, propagate it to PipelineRuns via a new annotation, and emit timing spans for event processing and PipelineRun execution. The PR also includes configuration updates and new documentation. Feedback points out that an error during JSON marshalling of the span context is currently ignored and should be logged to assist with debugging.

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 30, 2026

this conflitcs with recently merged 0faad24

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 30, 2026

/ok-to-test

@zakisk
Copy link
Copy Markdown
Member

zakisk commented Mar 30, 2026

@ci-operator there are still some conflicts there I think, can you please fix them

@zakisk zakisk removed the ok-to-test label Mar 30, 2026
@zakisk
Copy link
Copy Markdown
Member

zakisk commented Mar 30, 2026

due to recent changes in #2622, label is not deleted automatically because we've changed the way it should have been deleted.

@ci-operator ci-operator force-pushed the distributed-tracing branch from d01d86f to f190d03 Compare March 30, 2026 16:45
@zakisk
Copy link
Copy Markdown
Member

zakisk commented Mar 30, 2026

/ok-to-test

namespace: pipelines-as-code
data:
tracing-protocol: grpc
tracing-endpoint: "http://otel-collector.observability.svc.cluster.local:4317"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should also adds docs for deploying traces collector either jaeger or OTEL collector

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a "Deploying a trace collector" section pointing to OTel Collector and Jaeger.

Comment on lines +211 to +212
if event.URL != "" {
span.SetAttributes(tracing.VCSRepositoryKey.String(event.URL))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these two as well, they are not having any value at this point and it will always be empty

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fix, moved to sinker.go after ParsePayload.

DeliverySuccessKey = attribute.Key("delivery.success")
DeliveryReasonKey = attribute.Key("delivery.reason")

TektonPipelineRunNameKey = attribute.Key("tekton.pipelinerun.name")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this name be "tekton.pac.pipelinerun.name"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for uid

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked the full attribute schema to align with OTel semconv. PipelineRun identity now uses bare pipelinerun and pipelinerun.uid to match Tekton's existing reconciler span attributes. VCS keys now use the semconv vcs.* namespace (vcs.provider.name, vcs.repository.url.full, vcs.ref.head.revision). See the reply on chmouel's comment for the full breakdown.

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 31, 2026

/ok-to-test

@zakisk
Copy link
Copy Markdown
Member

zakisk commented Apr 2, 2026

✅ Added labels: ok-to-test.

I was testing it and we shouldn't add label via /label ok-to-test comment because pipelines-as-code[bot] doesn't have write permission (it's default behavior of GitHub for bots)

@zakisk
Copy link
Copy Markdown
Member

zakisk commented Apr 2, 2026

/ok-to-test

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Apr 2, 2026

/test

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Apr 2, 2026

@ci-operator can you squash your commits plesae?


// Add span context annotation for cases that should emit spans
// (except the "missing annotation" test case which has empty annotations)
if tt.name != "missing annotation emits no spans" {
Copy link
Copy Markdown
Member

@chmouel chmouel Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using string for comparaison is bad practice, better use a bool

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with a skipSpanContext bool on the test struct. Cleaner and won't break if the test case gets renamed.


const TracerName = "pipelines-as-code"

// Span attribute keys.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reasons of the change of attributes style, The attribute keys mix conventions:

  • git.* keys use dotted OTel semantic convention style
  • delivery.* and tekton.pac.* are custom namespaces
  • git.event_type uses underscore while others use dots

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked the attribute naming to align with OTel semconv where possible, match Tekton's existing conventions for resource identity, and scope our domain-specific attributes with a reverse-domain prefix per OTel naming guidance.

Semconv (development status): k8s.namespace.name, cicd.pipeline.action.name, vcs.provider.name, vcs.repository.url.full, vcs.ref.head.revision

Tekton resource identity: pipelinerun and pipelinerun.uid, matching the bare attribute names Tekton's reconciler already emits so trace queries aren't fragmented across services.

PaC-specific: vcs.event.type for webhook event type (push, pull_request, etc.), extending the vcs.* namespace since there's no semconv equivalent.

Delivery pipeline (domain-specific): delivery.tekton.dev.application, delivery.tekton.dev.component, delivery.tekton.dev.success, delivery.tekton.dev.reason. These have no semconv equivalent, so they use a reverse-domain prefix. The PipelineRun labels they read from are also now delivery.tekton.dev/* instead of appstudio.openshift.io/*, keeping label and attribute namespaces aligned.

On cicd.pipeline.action.name: the semconv well-known values (BUILD, RUN, SYNC) describe what a pipeline is doing. Our values (build, test, release) classify a pipeline's role in a delivery chain, so they fall under "otherwise, a custom value MAY be used" per the CICD registry spec. Lowercase matches the existing annotation values on PipelineRuns.

What do you think? Open to adjustments on any of these.

)

const (
applicationLabel = "appstudio.openshift.io/application"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it appstudio? why not tekton.dev?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. These were Konflux-specific labels (appstudio.openshift.io/application, appstudio.openshift.io/component) that don't belong in upstream PaC. Removed them and replaced with delivery.tekton.dev/application and delivery.tekton.dev/component, a Tekton-ecosystem namespace for delivery pipeline concepts that any platform can set on its PipelineRuns. PaC reads them if present, doesn't assume who set them.

}

// Enrich the parent span with VCS attributes now that ParsePayload has populated them
setVCSSpanAttributes(ctx, s.event)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be done twice at line 80: Inside the isIncoming branch of processEvent, before processEventPayload is called at line 46: Inside processEventPayload, after payload parsing

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two calls were on mutually exclusive paths (incoming events skip processEventPayload, webhook events go through it), but having them in separate branches made it hard to see that at a glance. Consolidated to a single call after the branch converges.

@ci-operator ci-operator force-pushed the distributed-tracing branch 2 times, most recently from f7c8cd7 to 4407b33 Compare April 6, 2026 14:27
@zakisk
Copy link
Copy Markdown
Member

zakisk commented Apr 7, 2026

/ok-to-test

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 7, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 80.18018% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.91%. Comparing base (a387d41) to head (01f5ac2).

Files with missing lines Patch % Lines
pkg/adapter/sinker.go 0.00% 14 Missing ⚠️
pkg/kubeinteraction/labels.go 42.85% 7 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2605      +/-   ##
==========================================
+ Coverage   58.80%   58.91%   +0.11%     
==========================================
  Files         206      207       +1     
  Lines       20304    20413     +109     
==========================================
+ Hits        11940    12027      +87     
- Misses       7591     7612      +21     
- Partials      773      774       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

semconv.K8SNamespaceName(pr.GetNamespace()),
tracing.PipelineRunKey.String(pr.GetName()),
tracing.PipelineRunUIDKey.String(string(pr.GetUID())),
tracing.CICDPipelineActionName.String("build"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CICDPipelineActionName is constant hard coded string? shouldn't it be configurable or what's the reason for just always having "build"?

@zakisk zakisk force-pushed the distributed-tracing branch from 4407b33 to 01f5ac2 Compare April 8, 2026 05:20
@zakisk
Copy link
Copy Markdown
Member

zakisk commented Apr 8, 2026

/ok-to-test

…ming

Add OpenTelemetry distributed tracing to PaC. Spans cover webhook
event processing and PipelineRun lifecycle timing. Attribute naming
follows OTel semconv where possible (vcs.*, cicd.*, k8s.*), matches
Tekton's existing bare attribute names for resource identity, and
uses delivery.tekton.dev.* for domain-specific delivery pipeline
concepts per OTel reverse-domain naming guidance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zakisk zakisk force-pushed the distributed-tracing branch from 01f5ac2 to 81f9ee8 Compare April 8, 2026 05:50
@zakisk
Copy link
Copy Markdown
Member

zakisk commented Apr 8, 2026

/ok-to-test

@tekton-robot
Copy link
Copy Markdown

@ci-operator: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants