Feature/add slo metrics to flowcontrol#2685
Feature/add slo metrics to flowcontrol#2685loicmarchal wants to merge 9 commits intokubernetes-sigs:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: loicmarchal The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @loicmarchal. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
| } | ||
|
|
||
| // SLO class constants label for flow control SLO metrics (bounded buckets for the TTFT SLO header in ms). | ||
| const ( |
There was a problem hiding this comment.
I don't know if we can/should hardcode these buckets. TTFT is pretty heavily reliant on what the prompt looks like, and a customer could want higher range buckets.
| flowKey := req.FlowKey() | ||
| priority := strconv.Itoa(flowKey.Priority) | ||
| reqBytes := req.ByteSize() | ||
| sloClass := metrics.ClassifySLO(extractHeader(req, fwkrequest.TTFTSLOMsHeaderKey)) |
There was a problem hiding this comment.
I'm feeling hesitant about adding this use-case specific bucketing/labeling (see other comment).
We want to let the user define the buckets and associate it that way. A more robust solution would be adding to the InferenceObjective object, and making the inf objective string the label.
Which would probably be worth a longer discussion.
There was a problem hiding this comment.
Thank you @kfswain! I agree.
I was feeling uncomfortable with the choices of interval (as mentioned in the PR description) because I felt it restricts the possibilities. It's a good indication it should be defined by the user.
It's definitely better to add it to the InferenceObjective (as SLO attainment is already a planned expansion). How do you think we shoud add it? One label per InferenceObjective? And as many InferenceObjectives as we need, to represent all the buckets? Would it be correlated to the priority? Thank you!
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds the following metrics:
queue sizeandqueue bytesmetricsThe TTFT SLO is passed with the request in the headers (
x-slo-ttft-ms).These metrics will allow tracking the percentage of requests with a duration below or above their associated SLO class as well as the number of incoming and queued requests per SLO class.
The SLO classes might need to have smaller intervals (currently the interval is 200 ms), but it would increase cardinality of the Prometheus histogram
Which issue(s) this PR fixes:
Fixes #2532
Does this PR introduce a user-facing change?: