Update cpu resource metrics to handle resize #3559
jmmcorreia wants to merge 9 commits into open-telemetry:main
Conversation
ChrsMark
left a comment
Thanks for digging into this @jmmcorreia! I left some comments, but overall I think it goes in the right direction!
@open-telemetry/semconv-k8s-approvers PTAL
model/k8s/metrics.yaml (Outdated)
  - id: metric.k8s.container.cpu.limit.actual
    type: metric
-   metric_name: k8s.container.cpu.request
+   metric_name: k8s.container.cpu.limit.actual
No strong preference here, but maybe an alternative could be `k8s.container.cpu.limit.current`. @open-telemetry/semconv-k8s-approvers @dashpole thoughts?
Just for reference, I did not really have any strong preference, so I decided to align the naming scheme with the k8s docs, i.e. the Key Concepts section under https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/
I don't see current or actual used much in other conventions, so I think we are fine to stick with what is closest to the k8s docs.
Thinking about this more, I think I do prefer current. desired and actual make sense when comparing the two (e.g. writing an alert if they differ from each other for a long time), but don't make as much sense when viewing actual in isolation.
E.g. if k8s allowed updating the container image in-place, it would look odd to me to see k8s.container.image.actual, but k8s.container.image.current makes sense.
If we did decide to make the desired request/limit opt-in, I think this would be especially important.
I think your example makes sense. I also took some time to check what existing metrics seem to be doing, and current seems to be the standard. The k8s metrics `k8s.hpa.pod.current` and `k8s.hpa.pod.desired` are also defined this way, which follows the proposal here. Finally, as per the k8s definitions, actual refers to what is currently configured.
I think you make a valid point. Will make the change from actual to current.
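To make the rename concrete, here is a hypothetical sketch of how the `current`/`desired` pair could look in `model/k8s/metrics.yaml`. The field names follow the existing semconv model schema, but the briefs, units, and stability levels are illustrative, not the PR's final wording:

```yaml
groups:
  # Hypothetical sketch; not the PR's final YAML.
  - id: metric.k8s.container.cpu.limit.current
    type: metric
    metric_name: k8s.container.cpu.limit.current
    stability: development
    brief: "The CPU limit currently in effect, from the container status."
    instrument: updowncounter
    unit: "{cpu}"
  - id: metric.k8s.container.cpu.limit.desired
    type: metric
    metric_name: k8s.container.cpu.limit.desired
    stability: development
    brief: "The desired CPU limit, from the container spec."
    instrument: updowncounter
    unit: "{cpu}"
```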
  - k8s.container
    note: |
      The value range is [0.0,1.0]. A value of 1.0 means the container is using 100% of its actual CPU request.
      If the CPU request is not set, this metric SHOULD NOT be emitted for that container.
A question for my own understanding: if the request is not set at the container level by the user but is set at the Pod level, will k8s automatically apply it down to all containers, or otherwise how are container-level resources affected by the Pod-level defined resources?
For a detailed view, you can check the tables under https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md#comprehensive-tabular-view, as those cover all the possible cases.
So if your question is whether, by defining pod limits/requests, k8s will also set the container limits/requests in the pod spec, then the answer is no. In that case, the container values will remain empty. I also checked the API response to see if container actual/current limits would be returned, but that does not seem to be the case.
Pod resource limits can be set as the container cgroup max value when no container limit is set, mostly to ensure runtimes such as jvm can read that value and adapt accordingly.
For all other cases, if a value is missing, then the container cgroup value will use the default available. -> My assumption is that basically containers are allowed to use the pod's available resources freely until they are exhausted. Once that happens, enforcement happens in the pod cgroup, which then impacts the child cgroups.
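A minimal illustrative manifest for the pod-level case discussed above (names and values are made up; the pod-level `spec.resources` field comes from KEP-2837):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resources-demo
spec:
  resources:          # pod-level requests/limits (KEP-2837)
    requests:
      cpu: "1"
    limits:
      cpu: "2"
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      # no `resources` here: k8s does not copy the pod-level values down,
      # so container-level request/limit metrics would not be emitted;
      # enforcement happens in the pod cgroup once resources are exhausted
```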
Co-authored-by: Christos Markou <chrismarkou92@gmail.com>
I have a few hesitations with this:
Did we consider something more like …
Agreed. The utilization metrics do attempt to make that easier on users by doing the comparison on the collector side and providing the presentation metric to them.
That is a valid point, I think it makes sense. Probably a lot of users will not attempt to patch the container resources while they are running.
Before raising the PR I was stuck on deciding between the implemented approach (i.e. …). @dashpole, happy to hear your thoughts on this. @ChrsMark, if you have any other comments or anything to add, I guess that could be helpful.
The point about utilization is a good one. I think maybe switching to …
| `k8s.container.ready` (type: `gauge`) | `k8s.container.ready` (type: `updowncounter`) |
| `k8s.container.cpu_limit_utilization` (type: `gauge`) | `k8s.container.cpu.limit_utilization` (type: `gauge`) |
| `k8s.container.cpu_request_utilization` (type: `gauge`) | `k8s.container.cpu.request_utilization` (type: `gauge`) |
| `k8s.container.cpu_limit` (type: `gauge`) | `k8s.container.cpu.limit.desired`, `k8s.container.cpu.limit.current` (type: `updowncounter`) |
Another option to consider: we keep the existing k8s.container.cpu.limit (it maps to spec.containers[*].resources) and introduce an additive metric like k8s.container.cpu.effective_limit sourced from status.containerStatuses[*].resources. When resize isn't in use or the cluster doesn't support it, implementors can simply not emit the effective metric - there is no migration burden for existing users, and the new metric is only relevant when there's actually a difference to observe.
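For reference, here is a trimmed, illustrative `kubectl get pod -o yaml` excerpt during an in-place resize, showing the two source fields under discussion (values are made up):

```yaml
spec:
  containers:
    - name: app
      resources:
        limits:
          cpu: "2"    # desired value -> would feed k8s.container.cpu.limit.desired
status:
  containerStatuses:
    - name: app
      resources:
        limits:
          cpu: "1"    # currently applied -> would feed k8s.container.cpu.limit.current
```

The two can diverge until the resize completes, or stay apart if the resize is deferred or infeasible.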
@dashpole had a very similar suggestion earlier in the thread, and I had also discussed this with @ChrsMark.
Going with the current approach allows us to neatly group desired, actual, and utilization metrics under limit and request. Moreover, it seems somewhat more aligned with the more recent semantic conventions, where _ is less used.
Also, imo having limit and request as namespaces could make it easier to add more values in the future, if needed, without impacting other metrics (instead of what is happening now).
Finally, I have marked this item as an enhancement instead of a breaking change mostly because none of these metrics seem to be used anywhere, at least not in the latest format. For example, the k8scluster receiver still uses k8s.container.cpu_limit and the kubeletstats receiver uses k8s.container.cpu_limit_utilization. Hence, if there is a moment for making a more drastic but potentially future-proof change, I would say now is the time, since these have not been adopted yet.
Personally I prefer to keep the metrics in the currently proposed format, but happy to hear your thoughts on this take. The same thing will have to be done for container memory and then pod cpu and memory, so it would be nice if we can close on a design we all agree on.
Thanks for the additional context, the desired/current split works for me.
One last concern: I believe the new status field used for the current metric is gated on InPlacePodVerticalScaling and only populated when the feature gate is enabled (beta/on-by-default in 1.33, GA in 1.35). On older clusters the "recommended" current metric can't be emitted while the "opt-in" desired can. I guess implementations can fall back to the spec, but it might be worth adding a note about this and the K8s version requirement in the metric description. I don't think we've had a discussion about how to handle K8s version requirements in semconv before.
Also, just an observation: K8s 1.33+ exposes PodResizePending and PodResizeInProgress as pod conditions. Implementing this open issue for a k8s.pod.condition.status metric would round off the resize observability use case nicely, letting users correlate desired/current divergence with the actual resize state.
That is a great point. I aligned this proposal with the oldest active version, which is 1.33 (see EOL). If that is deemed enough (i.e. only supporting currently active versions), then no special behavior needs to be accounted for at the container level (but it might have to be done for pods, since this feature is more recent there).
We could also document that, for older k8s releases, the only workaround could be to enable the "opt-in" desired metric. It puts more burden on the customer side, but since they would be running a non-active k8s release, I wonder if that could be an acceptable trade-off.
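One way to capture that caveat could be a note on the `current` metrics in the model YAML. This is hypothetical wording, not text from this PR, and the GA version is taken from the discussion above:

```yaml
# Hypothetical note wording for the `current` metrics; not part of this PR.
note: |
  This metric is sourced from `status.containerStatuses[*].resources`, which
  Kubernetes only populates when the `InPlacePodVerticalScaling` feature gate
  is enabled (on by default since 1.33, GA in 1.35). On older clusters,
  instrumentations MAY fall back to the container spec or emit only the
  opt-in `desired` metric.
```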
Co-authored-by: Jina Jain <jjain@splunk.com>
Related to #3558
Changes
Modifies the k8s container limit and request metrics to align with resize of CPU metrics as documented in https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/
It fixes the CPU part related to issue #3558. Memory metrics can be fixed in a follow-up PR based on the comments provided for the CPU section
For more details check issue #3558
Merge requirement checklist
[chore]