
[vLLM] Hoist TT attention batch metadata out of the compiled graph #5485

Open
mmanzoorTT wants to merge 1 commit into main from mmanzoor/vllm-hoist-batch-idxs


@mmanzoorTT

Contributor

Ticket

Closes #5154

Problem description

Batch indices are regenerated on every prefill call instead of being created once at setup time.

What's changed

  • Hoist batch index creation into the runner constructor.
  • Add `num_user` to `attn_metadata` and use it directly instead of deducing it.
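The two changes above follow a common hoisting pattern: build loop-invariant data once at setup time and pass the per-call count explicitly. The plain-Python sketch below is a hypothetical illustration, not the code in this PR. `Runner`, `AttnMetadata`, `prefill_metadata`, and `max_num_users` are assumed names, and a real TT/vLLM runner would likely hold device tensors rather than a Python tuple.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AttnMetadata:
    # Hypothetical container. num_user is carried explicitly so consumers
    # do not have to deduce it from the shape of other inputs.
    num_user: int
    batch_idx: Tuple[int, ...]


class Runner:
    """Hypothetical runner showing batch indices hoisted to construction time."""

    def __init__(self, max_num_users: int) -> None:
        if max_num_users <= 0:
            raise ValueError("max_num_users must be positive")
        self.max_num_users = max_num_users
        # Built once here instead of on every prefill call; a tuple keeps
        # the shared indices immutable across calls.
        self.batch_idx = tuple(range(max_num_users))

    def prefill_metadata(self, num_user: int) -> AttnMetadata:
        if not 0 < num_user <= self.max_num_users:
            raise ValueError(f"num_user must be in [1, {self.max_num_users}]")
        # Reuse the precomputed indices and pass num_user directly.
        return AttnMetadata(num_user=num_user, batch_idx=self.batch_idx)


if __name__ == "__main__":
    runner = Runner(max_num_users=4)
    first = runner.prefill_metadata(num_user=2)
    second = runner.prefill_metadata(num_user=3)
    # Both calls share the same index object; nothing is regenerated.
    print(first.batch_idx is second.batch_idx)  # True
```

Under these assumptions, the per-call path only packages existing data, which matches the intent of the linked issue to make the batch indices a setup-time metadata input.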

Checklist

  • New/Existing tests provide coverage for changes

@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.83%. Comparing base (b621dab) to head (2732f60).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5485      +/-   ##
==========================================
- Coverage   33.85%   33.83%   -0.03%     
==========================================
  Files          37       37              
  Lines        4989     4989              
==========================================
- Hits         1689     1688       -1     
- Misses       3300     3301       +1     



Development

Successfully merging this pull request may close these issues.

[vLLM] Hoist prefill fill_batch_idx to a setup-time metadata input

2 participants