Kubernetes Grafana Dashboards to OpenTelemetry Collector (DaemonSet) Metrics#2094

Open
ThashmikaX wants to merge 13 commits into open-telemetry:main from ThashmikaX:k8s-monitoring-dashboards

@ThashmikaX

Port Kubernetes Grafana Dashboards to OpenTelemetry Collector (DaemonSet) Metrics

Summary

This PR ports several Kubernetes Grafana dashboards originally designed for kube-prometheus-stack (Prometheus Node Exporter) to instead use metrics collected by the OpenTelemetry Collector running as a DaemonSet.

Background / Motivation

As part of our observability migration, we transitioned from Prometheus Node Exporter (via kube-prometheus-stack) to the OpenTelemetry Collector for node-level metrics collection. While this provides a more unified and flexible telemetry pipeline, it also results in differences in metric names and semantic conventions.

Because of these differences, the default Grafana dashboards bundled with kube-prometheus-stack no longer function correctly without modification.

To preserve equivalent monitoring visibility and avoid losing operational insight, we have:

  • Reworked the queries in several Kubernetes dashboards
  • Mapped Prometheus-based metrics to their OpenTelemetry equivalents
  • Adjusted panels where semantic or structural metric differences required it

What This PR Includes

  • Updated dashboard JSON files compatible with OpenTelemetry Collector metrics
  • Query rewrites aligned with current OpenTelemetry semantic conventions
  • Preserved layout and intent of the original Kubernetes dashboards where possible

Community Value

These dashboards may be useful for teams:

  • Migrating from kube-prometheus-stack to OpenTelemetry Collector
  • Running the Collector as a DaemonSet for node metrics
  • Looking to maintain similar dashboard functionality without Prometheus Node Exporter

Notes

  • Some metric mappings depend on current OpenTelemetry semantic conventions.
  • Future updates may be required as semantic conventions continue to evolve.
  • Feedback and suggestions for improvement are very welcome.

Hi @cyrille-leclerc, this PR is related to the dashboards mentioned here: Thread

@linux-foundation-easycla

linux-foundation-easycla bot commented Feb 24, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@cyrille-leclerc
Member

Many thanks @ThashmikaX. Can you please share with us the OpenTelemetry Collector configuration you use with these dashboards? Do these dashboards rely on metrics produced by the OTelCollector KubeletStatsReceiver, the k8sclusterreceiver, and the hostmetricsreceiver?

The dashboards of this PR seem to often use Prometheus style instrumentation metrics like container_cpu_usage_seconds_total (e.g. here) rather than their OpenTelemetry equivalent like k8s.pod.cpu.time / k8s_pod_cpu_time_seconds_total.
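The relationship between the two spellings above (`k8s.pod.cpu.time` and `k8s_pod_cpu_time_seconds_total`) can be illustrated with a minimal sketch. It assumes the usual OpenTelemetry-to-Prometheus naming behavior: invalid characters become underscores, a unit suffix is appended, and monotonic sums get `_total`. The function name `otel_to_prometheus_name` and the small `UNIT_SUFFIXES` table are hypothetical and deliberately incomplete; this is not the Collector's actual implementation, and real exporters may be configured to behave differently.

```python
import re

# Simplified unit table (assumption); the real translation covers many more units.
UNIT_SUFFIXES = {"s": "seconds", "By": "bytes", "ms": "milliseconds"}


def otel_to_prometheus_name(name: str, unit: str = "", monotonic_sum: bool = False) -> str:
    """Approximate the Prometheus-style name for an OpenTelemetry metric."""
    # Replace every character that Prometheus does not allow in metric names.
    prom = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Append the spelled-out unit unless the name already ends with it.
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not prom.endswith("_" + suffix):
        prom += "_" + suffix
    # Monotonic sums (counters) conventionally end in "_total".
    if monotonic_sum and not prom.endswith("_total"):
        prom += "_total"
    return prom


print(otel_to_prometheus_name("k8s.pod.cpu.time", unit="s", monotonic_sum=True))
# prints: k8s_pod_cpu_time_seconds_total
```

A helper like this can make it easier to guess which name a dashboard query should use, but the names actually stored in the backend should be checked before relying on it.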

@cyrille-leclerc
Member

FYI, we have started documenting the mapping from the Prometheus metrics used in the kubernetes-mixin to their OTel equivalents in [kubernetes-mixin/docs/otel-mapping.md](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/docs/otel-mapping.md).

In parallel, we are working on porting the Kubernetes Mixin to OTel metrics in grafana/kubernetes-mixin-otel, which is intended to move to the GitHub org https://github.com/kubernetes-monitoring once it becomes more stable. Contributions are welcome.

While waiting for @ThashmikaX's answer on the metrics consumed by the dashboards of this PR, we will continue to work on the mixin repos.

@ThashmikaX
Author

ThashmikaX commented Feb 25, 2026

> Many thanks @ThashmikaX. Can you please share with us the OpenTelemetry Collector configuration you use with these dashboards? Do these dashboards rely on metrics produced by the OTelCollector KubeletStatsReceiver, the k8sclusterreceiver, and the hostmetricsreceiver?
>
> The dashboards of this PR seem to often use Prometheus style instrumentation metrics like container_cpu_usage_seconds_total (e.g. here) rather than their OpenTelemetry equivalent like k8s.pod.cpu.time / k8s_pod_cpu_time_seconds_total.

Thanks for pointing that out.

Yes, the dashboards are generally intended to work with metrics coming from the kubeletstatsreceiver, k8sclusterreceiver, and hostmetricsreceiver.

In a few places I’ve used Prometheus-style metrics (e.g., container_cpu_usage_seconds_total) because those are what we currently have available in our environment. I will align everything with the OTel semantic-convention metrics and push the changes to this PR.

I will also share the OpenTelemetry Collector configuration.
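One way to find the remaining Prometheus-style queries is to scan the dashboard JSON for known legacy metric names. The sketch below assumes the common Grafana layout in which each panel has a `targets` list with an `expr` string, and row panels may nest further `panels`. The helper names and the inline sample dashboard are hypothetical; `LEGACY_METRICS` contains only the one metric named in this thread and would need to be extended.

```python
import json
import re

# Legacy names to flag (assumption: extend this set for a real audit).
LEGACY_METRICS = {"container_cpu_usage_seconds_total"}


def iter_panels(panels):
    """Yield every panel, including panels nested inside row panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def find_legacy_metrics(dashboard, legacy=LEGACY_METRICS):
    """Return (panel title, metric name) pairs for legacy metrics found in queries."""
    hits = []
    for panel in iter_panels(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            expr = target.get("expr", "")
            # Split the query into identifier-like tokens and compare them with the legacy set.
            tokens = set(re.findall(r"[a-zA-Z_:][a-zA-Z0-9_:]*", expr))
            for metric in sorted(tokens & set(legacy)):
                hits.append((panel.get("title", ""), metric))
    return hits


# Hypothetical two-panel dashboard used only to exercise the function.
sample = json.loads("""
{
  "panels": [
    {"title": "CPU Usage",
     "targets": [{"expr": "sum(rate(container_cpu_usage_seconds_total[5m]))"}]},
    {"title": "Pods", "type": "row",
     "panels": [
       {"title": "CPU Time",
        "targets": [{"expr": "sum(rate(k8s_pod_cpu_time_seconds_total[5m]))"}]}
     ]}
  ]
}
""")

print(find_legacy_metrics(sample))
# prints: [('CPU Usage', 'container_cpu_usage_seconds_total')]
```

Matching on whole tokens rather than substrings avoids flagging a longer metric name that merely contains a legacy name.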

@ThashmikaX
Author

ThashmikaX commented Feb 25, 2026

values.yaml @cyrille-leclerc

@cyrille-leclerc
Member

Thanks for the clarification @ThashmikaX !

@ThashmikaX
Author

I will update the dashboards to consistently use the OTel semantic metrics and push the changes to this PR.
I’ll get back with the updated dashboards early next week.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions bot added the Stale label Mar 14, 2026

@cyrille-leclerc
Member

Thanks @ThashmikaX , it's great to see more and more dashboard panels showing metrics!

github-actions bot removed the Stale label Mar 18, 2026

@ThashmikaX
Author

In the kubernetes-networking-namespace-workload dashboard, I removed two existing panels: Rate of Received Packets and Rate of Transmitted Packets. This is because I couldn’t find any pod- or deployment-level network packet metrics from OpenTelemetry. Did I miss anything?

- Updated the Kubernetes Kubelet dashboard to use OTel metrics for node, pod, and container counts, replacing previous kubelet metrics.
- Changed expressions to count nodes, pods, containers, and volumes based on OTel metrics.
- Renamed titles and descriptions for clarity and accuracy.
- Adjusted the Node Exporter dashboard to utilize OTel metrics for CPU, memory, and disk usage, enhancing compatibility with Kubernetes environments.
- Improved variable queries for clusters and instances to align with OTel metrics.
- General cleanup and formatting improvements for better readability and maintainability.
@ThashmikaX
Author

Hi @cyrille-leclerc, can you check back again now?

@cyrille-leclerc
Member

Thanks, I'll test ASAP.
