Add cpu metrics for k8s pods and containers #3539
jmmcorreia wants to merge 4 commits into open-telemetry:main
Conversation
ChrsMark left a comment
Can we split this into smaller standalone PRs?
I would focus on k8s.pod.cpu.limit_utilization and k8s.pod.cpu.request_utilization first so as to unblock open-telemetry/opentelemetry-collector-contrib#46464
Sure, makes sense. I will keep this PR for … EDIT: Note, for now I also kept …
```yaml
brief: "Maximum CPU resource limit set for the pod."
note: |
  Pod-level CPU limits are supported from Kubernetes 1.34+ via the PodSpec resources field.
  When not specified at the pod level, the value may be derived from the sum of container limits.
```
---
Should we emit k8s.pod.cpu.limit if it's not explicitly set on the Pod? I think deriving this as a sum from container limits should only be a concern for the k8s.pod.cpu.limit_utilization metric.
---
That is a good point, I probably should have detailed a little more why I proposed it this way.
K8s itself also starts with an aggregation of container resources (i.e. https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-helpers/resource/helpers.go#L343)
It then overwrites the aggregated values if something is defined at the pod-level resource spec. What this means is that if, for example, a CPU limit is defined at the pod-level resource spec, we might still see a pod-level CPU request (and also memory limit/request) being emitted, since k8s provides those by aggregating the container values. (Or conversely, we could see a CPU limit being emitted despite maybe only the request being present in the pod-level spec.)
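The aggregate-then-override behavior described above can be sketched as follows. This is a minimal, simplified model, not the real k8s helper: the function name is hypothetical, and CPU values are plain scalars instead of the `ResourceList` quantities the actual `component-helpers` code operates on.

```go
package main

import "fmt"

// effectiveCPULimit models (in simplified form) how k8s derives a pod-level
// CPU limit: start from the sum of container limits, then let an explicit
// pod-level spec value override the aggregate.
// A limit of 0 stands for "not set".
func effectiveCPULimit(podLevelLimit float64, podLevelSet bool, containerLimits []float64) (float64, bool) {
	if podLevelSet {
		return podLevelLimit, true // pod-level spec wins over any aggregation
	}
	sum := 0.0
	for _, l := range containerLimits {
		if l == 0 {
			return 0, false // a container with no limit makes the sum meaningless
		}
		sum += l
	}
	return sum, true // fall back to aggregated container limits
}

func main() {
	limit, ok := effectiveCPULimit(0, false, []float64{0.5, 0.25})
	fmt.Println(limit, ok) // aggregated from containers: 0.75 true
	limit, ok = effectiveCPULimit(2.0, true, []float64{0.5, 0.25})
	fmt.Println(limit, ok) // pod-level value overrides: 2 true
}
```

The same shape applies independently per resource (CPU/memory) and per field (limit/request), which is why a pod-level CPU limit in the spec can coexist with an aggregated CPU request in the output.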
Based on this, I preferred this approach due to the following reasons:
- The current proposal covers the k8s behavior where emitted values might actually be related to an aggregation of container values instead of something coming from the spec.
- It gives more flexibility to also emit limit and request aligned with the limit/request utilization calculations. Hence, any user is either able to enable limit/request and use them as kubeletstats receiver does, or they can just get the already computed value instead by enabling utilization.
---
I see, thanks for the context. In that case, should we make the definition more accurate? Something like 'the reported value is the "Effective Limit" following the formulas from Kubernetes', with a link to any helper function and KEP?
I also see 2 open questions here:
- What should we do for the overhead that might also be used at https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-helpers/resource/helpers.go#L365? Should we always include it? If K8s adds this by default then maybe we should too.
- Should we also include the definition from https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-helpers/resource/helpers.go#L379-L380? BTW, does k8s require all containers to have limit/request set for the summary to be used as the fallback? If not then maybe we need to adjust accordingly.
---
You are definitely right, we will need to make the definitions more accurate. I actually realized that there were some things I misunderstood even after having looked at the k8s code, so it could be even worse for someone working only from the current definition.
I will just detail here a little bit what I know and where I'm focusing right now in case someone is also able to chime in on this.
Regarding pod overhead, you are right, I think we probably should also include it. Thankfully, this should be straightforward as it will be included in the /pods API, so no additional API calls will be needed.
Regarding point 2, I think this is where I need to do some more exploration to make sure all gaps are closed. This is what I know so far:
- Sum of container limits would generally match the pod limits, unless the init containers somehow have a higher limit, in which case the init container limit is used.
- Defining pod-level limits but no requests means requests will get populated (https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/v1/defaults.go#L198). However, this does not work the other way around; k8s only needs requests to be present for scheduling. Hence, this also creates an asymmetric behavior that I at first forgot to account for.
- Due to the prior point, the k8s pods API will never return limits as a sum of container resources (this can only ever happen for requests). Initially I wrongly assumed this would go both ways.
- Due to point 1 in https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5419-pod-level-resources-in-place-resize/README.md#risks-and-mitigations, if this resize feature is ever enabled, the pod spec information might not represent the value k8s is actually using.
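The defaulting asymmetry in the second point can be sketched as follows. This is a hypothetical, simplified model of the linked defaulting code, using plain scalars where the real code uses resource quantities; the type and function names are illustrative.

```go
package main

import "fmt"

// resources models a pod-level CPU request/limit pair; 0 means "not set".
type resources struct {
	Request, Limit float64
}

// applyDefaults sketches the asymmetric defaulting behavior: a missing
// request is filled in from the limit, but a missing limit is never
// filled in from the request (k8s only needs requests for scheduling).
func applyDefaults(r resources) resources {
	if r.Request == 0 && r.Limit != 0 {
		r.Request = r.Limit // request defaults to the limit
	}
	// note: the limit is intentionally NOT defaulted from the request
	return r
}

func main() {
	fmt.Println(applyDefaults(resources{Limit: 2}))   // {2 2}: request gets populated
	fmt.Println(applyDefaults(resources{Request: 1})) // {1 0}: limit stays unset
}
```

This is why the pods API can return a request that is really an aggregate or a default, while a returned limit always reflects an explicitly set value.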
Right now, I'm exploring to check if it's possible to get access to the limit and request values as computed by k8s using the PodLimits and PodRequests methods respectively. If that is the case, maybe we could consider using those and aligning the definition with that. However, let me know if you see any reason for us to avoid going this route, in which case we can discuss the remaining alternatives.
```yaml
- k8s.pod
note: >
  The value range is [0.0, 1.0]. A value of 1.0 means the pod is using 100% of its CPU limit.
  If the CPU limit is not set for the pod, or if one of the pod's containers has no CPU limit, this metric SHOULD NOT be emitted.
```
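A minimal sketch of the semantics in that note, under the assumption that the pod limit is derived from container limits when not set on the pod itself (function name is hypothetical, values are plain CPU cores):

```go
package main

import (
	"errors"
	"fmt"
)

// podCPULimitUtilization models the note above: utilization is pod CPU usage
// divided by the pod CPU limit, and no value is emitted (an error here) when
// the limit cannot be determined because a container has no CPU limit.
// A limit of 0 stands for "not set".
func podCPULimitUtilization(usageCores float64, containerLimits []float64) (float64, error) {
	limit := 0.0
	for _, l := range containerLimits {
		if l == 0 {
			return 0, errors.New("a container has no CPU limit: metric SHOULD NOT be emitted")
		}
		limit += l
	}
	if limit == 0 {
		return 0, errors.New("no CPU limit set for the pod: metric SHOULD NOT be emitted")
	}
	return usageCores / limit, nil
}

func main() {
	u, err := podCPULimitUtilization(0.5, []float64{0.5, 0.5})
	fmt.Println(u, err) // 0.5 <nil>
}
```

An error return here corresponds to the metric simply being absent from the scrape, rather than a zero data point.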
---
Should we instead define here that the metric is either taken directly from the Pod spec, or otherwise calculated based on the containers' limits if all of them define one?
---
I detailed in the comment above why that was covered in the limit section (i.e. mostly due to k8s behavior). I can make the change, but I feel this way provides a little more leeway to the limit value and keeps limit and limit_utilization more aligned with each other.
If the sum of the containers' limits only happens for the utilization metric, then imo the two metrics might feel somewhat inconsistent with each other, if that makes sense.
---
I agree! My point is mostly about explicitly defining how these ratios are calculated. We can re-use the definitions from k8s.pod.cpu.limit/request to state how the limits/requests are computed.
---
Got it, I understand it now and that is a great point! I am going to address the comment regarding the limits/requests definitions to make them more specific and clear. At that point, I will also change the definition here to rely more on the other metrics.
This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.
NOTE: PR ON HOLD
This PR is on hold until issue #3558 is resolved and K8s PR kubernetes/kubernetes#136676 is merged. Due to KEP https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5419-pod-level-resources-in-place-resize/README.md, the limit value will have two possible states from k8s v1.36 onwards: desired and actual. Issue #3558 attempts to solve that problem for containers, since the feature is already available for those. Learnings from there will be ported to this PR later on.
Related to #2768
Changes
This PR proposes new k8s CPU metrics for the pod-level resources spec introduced in k8s v1.34, related to issue #2768:

- `k8s.pod.cpu.limit`
- `k8s.pod.cpu.request`
- `k8s.pod.cpu.limit_utilization`
- `k8s.pod.cpu.request_utilization`

With the introduction of pod-level resource limits and requests for CPU and memory, it makes sense to align with the container metrics and introduce these for those who now want to define resource limits/requests at the pod level.
There is a PR out for the kubeletstats receiver to update `k8s.pod.cpu.limit_utilization` and `k8s.pod.cpu.request_utilization` to make use of the new k8s feature. Note that since they already existed in the receiver, the proposal is to extend on top of the existing behavior instead of replacing it (open-telemetry/opentelemetry-collector-contrib#46464).

Assuming they are approved, `k8s.pod.cpu.limit` and `k8s.pod.cpu.request` would later be introduced in the k8scluster receiver alongside their container counterparts.