feat: better prometheus buckets for batch_size, isl, and input length#847
Open
michaelfeil wants to merge 1 commit into huggingface:main
What does this PR do?
I realized the Prometheus metrics take some shortcuts where more information could be helpful. Right now the buckets are a bit generic and would benefit from some extra ones. Most of the time batch size and token counts are small; the token limit is often 8k or 16k by default, and also 40k for Qwen models, yet all of these currently land in the same input-sequence-length bucket. People may release longer-context models, so the upper buckets are fine to keep. I think this could help get more visibility into load distribution.
For batch_size:
For tokens (half-steps to 32k):
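The bucket lists themselves are not reproduced above, so here is a minimal sketch of the idea: fine-grained low-end buckets for batch size, and half-steps between powers of two up to 32k for token counts. The exact values are my assumption, not the PR's; `bucket_for` just mimics Prometheus's cumulative `le` bucket semantics to show where a sample would land.

```python
import bisect

# Hypothetical bucket layouts illustrating the proposal; the PR's exact
# values may differ.
BATCH_SIZE_BUCKETS = [1, 2, 4, 8, 16, 32, 64, 128, 256]
TOKEN_BUCKETS = [512, 1024, 1536, 2048, 3072, 4096, 6144,
                 8192, 12288, 16384, 24576, 32768]  # half-steps to 32k

def bucket_for(value, buckets):
    """Return the `le` upper bound of the histogram bucket a value
    falls into ('+Inf', i.e. float inf, if it exceeds all bounds)."""
    i = bisect.bisect_left(buckets, value)
    return buckets[i] if i < len(buckets) else float("inf")

print(bucket_for(12, BATCH_SIZE_BUCKETS))  # a batch of 12 -> le=16
print(bucket_for(7000, TOKEN_BUCKETS))     # 7000 tokens -> le=8192
```

With the old generic buckets, an 8k and a 40k request could fall into the same bucket; with half-steps they are distinguishable without adding many series.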
There is a limit on the total number of metric series (metrics times labels times buckets), which is around 750 for Baseten and similar platforms. I think we are still far below it.
Fixes # (issue)
Before submitting
insta snapshots?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.