-
Notifications
You must be signed in to change notification settings - Fork 277
Feature/add slo metrics to flowcontrol #2685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
loicmarchal
wants to merge
9
commits into
kubernetes-sigs:main
Choose a base branch
from
loicmarchal:feature/add-slo-metrics-to-flowcontrol
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
c226903
add SLO related metrics
loicmarchal 59ab187
add more slo classes to use as histogram label
loicmarchal 23c7728
centralize SLO header name
loicmarchal 6bf02c3
clean up code
loicmarchal 212ff10
add comments for request headers name
loicmarchal 0ae195f
add metrics for number of queued requests per SLO class
loicmarchal 8a01b3a
add counter for incoming request per SLO class
loicmarchal 7a0a67e
Merge branch 'main' into feature/add-slo-metrics-to-flowcontrol
loicmarchal bf548d6
Merge branch 'main' into feature/add-slo-metrics-to-flowcontrol
loicmarchal File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -401,6 +401,27 @@ var ( | |
| append([]string{"fairness_id", "priority", "outcome", "inference_pool"}, modelLabels...), | ||
| ) | ||
|
|
||
| flowControlSLORequestQueueDuration = prometheus.NewHistogramVec( | ||
| prometheus.HistogramOpts{ | ||
| Subsystem: inferenceExtension, | ||
| Name: "flow_control_slo_request_queue_duration_seconds", | ||
| Help: metricsutil.HelpMsgWithStability("Distribution of the total time requests spend in the EPP flow control layer, partitioned by SLO class.", compbasemetrics.ALPHA), | ||
| Buckets: []float64{ | ||
| 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, | ||
| }, | ||
| }, | ||
| []string{"slo_class", "outcome", "inference_pool"}, | ||
| ) | ||
|
|
||
| flowControlSLOIncomingRequestsTotal = prometheus.NewCounterVec( | ||
| prometheus.CounterOpts{ | ||
| Subsystem: inferenceExtension, | ||
| Name: "flow_control_slo_incoming_requests_total", | ||
| Help: metricsutil.HelpMsgWithStability("Total number of requests that entered the EPP flow control layer via EnqueueAndWait.", compbasemetrics.ALPHA), | ||
| }, | ||
| []string{"slo_class", "inference_pool"}, | ||
| ) | ||
|
|
||
| flowControlDispatchCycleDuration = prometheus.NewHistogramVec( | ||
| prometheus.HistogramOpts{ | ||
| Subsystem: inferenceExtension, | ||
|
|
@@ -431,7 +452,7 @@ var ( | |
| Name: "flow_control_queue_size", | ||
| Help: metricsutil.HelpMsgWithStability("Current number of requests actively held in the Flow Control queue.", compbasemetrics.ALPHA), | ||
| }, | ||
| append([]string{"fairness_id", "priority", "inference_pool"}, modelLabels...), | ||
| append([]string{"fairness_id", "priority", "inference_pool", "slo_class"}, modelLabels...), | ||
| ) | ||
|
|
||
| flowControlQueueBytes = prometheus.NewGaugeVec( | ||
|
|
@@ -440,7 +461,7 @@ var ( | |
| Name: "flow_control_queue_bytes", | ||
| Help: metricsutil.HelpMsgWithStability("Current total size in bytes of requests actively held in the Flow Control queue.", compbasemetrics.ALPHA), | ||
| }, | ||
| append([]string{"fairness_id", "priority", "inference_pool"}, modelLabels...), | ||
| append([]string{"fairness_id", "priority", "inference_pool", "slo_class"}, modelLabels...), | ||
| ) | ||
|
|
||
| flowControlPoolSaturation = prometheus.NewGaugeVec( | ||
|
|
@@ -505,6 +526,8 @@ func Register(customCollectors ...prometheus.Collector) { | |
| metrics.Registry.MustRegister(prefixCacheHitRatio) | ||
| metrics.Registry.MustRegister(prefixCacheHitLength) | ||
| metrics.Registry.MustRegister(flowControlRequestQueueDuration) | ||
| metrics.Registry.MustRegister(flowControlSLORequestQueueDuration) | ||
| metrics.Registry.MustRegister(flowControlSLOIncomingRequestsTotal) | ||
| metrics.Registry.MustRegister(flowControlDispatchCycleDuration) | ||
| metrics.Registry.MustRegister(flowControlQueueSize) | ||
| metrics.Registry.MustRegister(flowControlQueueBytes) | ||
|
|
@@ -556,6 +579,8 @@ func Reset() { | |
| prefixCacheHitRatio.Reset() | ||
| prefixCacheHitLength.Reset() | ||
| flowControlRequestQueueDuration.Reset() | ||
| flowControlSLORequestQueueDuration.Reset() | ||
| flowControlSLOIncomingRequestsTotal.Reset() | ||
| flowControlQueueSize.Reset() | ||
| flowControlQueueBytes.Reset() | ||
| flowControlPoolSaturation.Reset() | ||
|
|
@@ -858,6 +883,59 @@ func RecordFlowControlRequestQueueDuration( | |
| ).Observe(duration.Seconds()) | ||
| } | ||
|
|
||
| // SLO class constants label for flow control SLO metrics (bounded buckets for the TTFT SLO header in ms). | ||
| const ( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know if we can/should hardcode these buckets. TTFT is pretty heavily reliant on what the prompt looks like, and a customer could want higher range buckets. |
||
| SLOClassNone = "none" | ||
| SLOClassBelowMS200 = "below_ms_200" | ||
| SLOClassMS200to399 = "ms_200_399" | ||
| SLOClassMS400to599 = "ms_400_599" | ||
| SLOClassMS600to799 = "ms_600_799" | ||
| SLOClassMS800to1000 = "ms_800_1000" | ||
| SLOClassAboveMS1000 = "above_ms_1000" | ||
| ) | ||
|
|
||
| // ClassifySLO maps a raw SLO header value (in milliseconds) to a bounded SLO class label. | ||
| // Returns SLOClassNone when the header is absent or unparseable. | ||
| func ClassifySLO(rawHeaderValue string) string { | ||
| if rawHeaderValue == "" { | ||
| return SLOClassNone | ||
| } | ||
| ms, err := strconv.ParseInt(rawHeaderValue, 10, 64) | ||
| if err != nil || ms < 0 { | ||
| return SLOClassNone | ||
| } | ||
| switch { | ||
| case ms < 200: | ||
| return SLOClassBelowMS200 | ||
| case ms < 400: | ||
| return SLOClassMS200to399 | ||
| case ms < 600: | ||
| return SLOClassMS400to599 | ||
| case ms < 800: | ||
| return SLOClassMS600to799 | ||
| case ms <= 1000: | ||
| return SLOClassMS800to1000 | ||
| default: | ||
| return SLOClassAboveMS1000 | ||
| } | ||
| } | ||
|
|
||
| // RecordFlowControlSLORequestQueueDuration records the queue duration for a request partitioned by its | ||
| // SLO class (derived from the TTFT SLO header "x-slo-ttft-ms"). | ||
| func RecordFlowControlSLORequestQueueDuration( | ||
| sloClass, outcome, inferencePool string, | ||
| duration time.Duration, | ||
| ) { | ||
| flowControlSLORequestQueueDuration.WithLabelValues( | ||
| sloClass, outcome, inferencePool, | ||
| ).Observe(duration.Seconds()) | ||
| } | ||
|
|
||
| // RecordFlowControlSLOIncomingRequest increments the count of requests that entered flow control at EnqueueAndWait. | ||
| func RecordFlowControlSLOIncomingRequest(sloClass, inferencePool string) { | ||
| flowControlSLOIncomingRequestsTotal.WithLabelValues(sloClass, inferencePool).Inc() | ||
| } | ||
|
|
||
| // RecordFlowControlDispatchCycleDuration records the duration of a dispatch cycle in the Flow Control layer. | ||
| func RecordFlowControlDispatchCycleDuration(duration time.Duration) { | ||
| flowControlDispatchCycleDuration.WithLabelValues().Observe(duration.Seconds()) | ||
|
|
@@ -874,23 +952,23 @@ func RecordFlowControlRequestEnqueueDuration( | |
| } | ||
|
|
||
| // IncFlowControlQueueSize increments the Flow Control queue size gauge. | ||
| func IncFlowControlQueueSize(fairnessID, priority, inferencePool, modelName, targetModelName string) { | ||
| flowControlQueueSize.WithLabelValues(fairnessID, priority, inferencePool, modelName, targetModelName).Inc() | ||
| func IncFlowControlQueueSize(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName string) { | ||
| flowControlQueueSize.WithLabelValues(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName).Inc() | ||
| } | ||
|
|
||
| // DecFlowControlQueueSize decrements the Flow Control queue size gauge. | ||
| func DecFlowControlQueueSize(fairnessID, priority, inferencePool, modelName, targetModelName string) { | ||
| flowControlQueueSize.WithLabelValues(fairnessID, priority, inferencePool, modelName, targetModelName).Dec() | ||
| func DecFlowControlQueueSize(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName string) { | ||
| flowControlQueueSize.WithLabelValues(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName).Dec() | ||
| } | ||
|
|
||
| // AddFlowControlQueueBytes increments the Flow Control queue bytes gauge. | ||
| func AddFlowControlQueueBytes(fairnessID, priority, inferencePool, modelName, targetModelName string, bytes uint64) { | ||
| flowControlQueueBytes.WithLabelValues(fairnessID, priority, inferencePool, modelName, targetModelName).Add(float64(bytes)) | ||
| func AddFlowControlQueueBytes(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName string, bytes uint64) { | ||
| flowControlQueueBytes.WithLabelValues(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName).Add(float64(bytes)) | ||
| } | ||
|
|
||
| // SubFlowControlQueueBytes decrements the Flow Control queue bytes gauge. | ||
| func SubFlowControlQueueBytes(fairnessID, priority, inferencePool, modelName, targetModelName string, bytes uint64) { | ||
| flowControlQueueBytes.WithLabelValues(fairnessID, priority, inferencePool, modelName, targetModelName).Sub(float64(bytes)) | ||
| func SubFlowControlQueueBytes(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName string, bytes uint64) { | ||
| flowControlQueueBytes.WithLabelValues(fairnessID, priority, inferencePool, sloClass, modelName, targetModelName).Sub(float64(bytes)) | ||
| } | ||
|
|
||
| // RecordFlowControlPoolSaturation records the current saturation level for an inference pool. | ||
|
|
||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm feeling hesitant about adding this use-case specific bucketing/labeling (see other comment).
We want to let the user define the buckets and associate it that way. A more robust solution would be adding to the InferenceObjective object, and making the inf objective string the label.
Which would probably be worth a longer discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kfswain! I agree.
I was feeling uncomfortable with the choices of interval (as mentioned in the PR description) because I felt it restricts the possibilities. It's a good indication it should be defined by the user.
It's definitely better to add it to the
InferenceObjective(as SLO attainment is already a planned expansion). How do you think we shoud add it? One label perInferenceObjective? And as manyInferenceObjectives as we need, to represent all the buckets? Would it be correlated to the priority? Thank you!