Interference setup and CUDA benchmarks by elvingerpaul · Pull Request #12 · eth-easl/vllm_profile

elvingerpaul · 2025-09-28T12:59:14Z

This PR merges all of our changes to the vllm repo, including benchmarks, into main:

- profiling support with NSYS and NCU for LLM prefill, decode, or both
- modify vllm to run in a dedicated CUDA stream
- add a second stream to run the interference benchmark
- use CUDA green contexts for inter_sm interference to partition SMs between the vllm stream and the interference stream
- paper results and plotting scripts
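
As a rough illustration of the two-stream setup described above (not the code in this PR), the sketch below times a matmul loop on one CUDA stream, first alone and then while a copy loop runs on a second stream. It assumes PyTorch with a CUDA device; `run_colocated`, `main_stream`, `interference_stream`, and the choice of kernels are hypothetical names and workloads picked for the example. It does not partition SMs, so it does not cover the green-context part.

```python
"""Sketch: a timed workload and an interference workload on separate CUDA streams."""


def run_colocated(size=1024, iters=10):
    """Time matmuls on one stream, alone and next to a copy loop on another stream.

    Returns (solo_ms, colocated_ms), or None when PyTorch or a CUDA device
    is unavailable. All names here are hypothetical, not taken from the PR.
    """
    try:
        import torch  # imported lazily so this module loads without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    main_stream = torch.cuda.Stream()          # stands in for the vllm stream
    interference_stream = torch.cuda.Stream()  # stands in for the interference stream

    a = torch.randn(size, size, device="cuda")
    b = torch.randn(size, size, device="cuda")
    src = torch.randn(size * size, device="cuda")
    dst = torch.empty_like(src)
    torch.cuda.synchronize()  # inputs must be ready before other streams read them

    def timed_matmuls(with_interference):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        if with_interference:
            # Enqueue a simple memory-bound copy loop on the second stream.
            with torch.cuda.stream(interference_stream):
                for _ in range(iters):
                    dst.copy_(src)
        with torch.cuda.stream(main_stream):
            start.record()  # recorded on main_stream (the current stream here)
            for _ in range(iters):
                torch.matmul(a, b)
            end.record()
        torch.cuda.synchronize()  # wait for both streams before reading the timer
        return start.elapsed_time(end)

    timed_matmuls(False)  # warm-up run, not reported
    solo_ms = timed_matmuls(False)
    colocated_ms = timed_matmuls(True)
    return solo_ms, colocated_ms
```

A slowdown figure can then be taken as `colocated_ms / solo_ms`. Whether the two streams actually overlap on the device depends on the GPU and the kernel sizes, so the numbers from this sketch are only indicative.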

github-actions · 2025-09-28T12:59:23Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

fotstrt and others added 30 commits May 8, 2025 12:30

basic benchmark file

c58bf46

profile args and processing

51c9f24

basic ncu support for v1 engine and prefill/decode separation

e605fca

basic instructions

168c148

fixes

439fd29

processing scripts

d2fd4ff

fuse plotting scripts and unify plots into vertical subplots

ff854f5

minimal interference setup

b9a2ac0

load .env at very beginning

899a36f

Merge branch 'benchmarking' of github.com:eth-easl/vllm_profile into benchmarking

301286e

ctypes interface to call custom interference kernels

9fdd87a

remove commented line

1167010

add copy interference kernel

8cd1d19

add averages

8326808

profile after warmup

0d265de

update plot captions

a419a85

Merge branch 'benchmarking' of github.com:eth-easl/vllm_profile into benchmarking

8fb077d

slowdown plot

2491316

update plotting script

1773bef

start with inter-sm

5e9dd3c

solve multiproc error

7692c75

mem benchmarking infra

0e7d247

l2 infra

3b4f844

inter_sm scripts

0285897

add gemma3, sleep kernel, scripts for plotting kernel slowdown relative to metrics

879014c

Merge branch 'benchmarking' of github.com:eth-easl/vllm_profile into benchmarking

a72cf39

intra-sm ipc benchmark script

add4512

thread block scheduler results

7983638

update gemma-1b ipc measurements

b8c27b6

minor fixes

1f54563

Paul Elvinger and others added 28 commits July 1, 2025 19:52

specific config files for intra-sm/ipc

7ecbe3c

intra-sm tb scheduler interference

d82dce9

l2 scripts for running and plotting

92a07a1

fix inter_sm scripts and move to /pub/scratch

e87ec33

compiler to make ptx code available in ncu source section

c354e65

create dedicated folder for l2 and membw

d1d0355

fix: stop profiler in interference thread

4a0d4db

first complete version l2 cache evaluation script

c04305a

l2 cache h100 results + updated scripts

da2a1cb

l2 cache txt evaluation results

46eb3ce

remove duplicate txt entry in gitignore

3224996

l2 cache full model results, tbt latency

ed52ec1

L2 2MB single kernel interference

b984c30

uncomment

8b956c6

verify mem bw scripts

9c2c8fa

Merge pull request #4 from eth-easl/benchmarking_inter_fix_scripts

4d0d317

Inter-SM scripts + L2 cache results

shared memory checkpoint for intra-sm interference

23995cb

copy kernel to use 4-way vectorized loads

cf42f80

colocate two benchmarks

6013c59

L2 hit rate interference H100, green contexts, copy kernel without vectorized loads

3436bc0

add loaded byte ratio, l2 hit interference

429e07f

update l2 hit rate plot

81b4a94

add config file for high L2 hit rate, low throughput

ef54376

update tb scheduler plot

22b8a3f

add l2 / mem ratio to kernel trace plot

7641a68

mem bw gather

c401a04

Merge branch 'benchmarking_inter' into benchmarking

98921b1

Merge branch 'benchmarking-shared-mem' into benchmarking

3c55725

elvingerpaul merged commit c5e2819 into main Sep 28, 2025
1 of 4 checks passed

elvingerpaul merged 93 commits into main from benchmarking
