Inter-SM scripts + L2 cache results by elvingerpaul · Pull Request #4 · eth-easl/vllm_profile

elvingerpaul · 2025-07-06T13:22:48Z

Separates the L2 cache and memory bandwidth experiments into dedicated folders, each with its own scripts (this adds a bit of duplication, but each folder should now contain most of the scripts it needs). I also slightly adjusted the scripts to simplify measuring different batch sizes and prompt sizes; otherwise they remain mostly the same.
Removed the profile ranges when profiling with nsys.
Slightly modified the script that extracts kernel latencies from the nsys sqlite file (with the offsets as suggested, indeed much easier :)), since we now have the entire trace rather than only the decode part. Since we're mostly interested in the decode stage, we simply keep the last n kernels, where n is the number of kernels in the decode stage (this changes with batch size).
Created a folder other containing all the scripts I wasn't sure we still need. Happy to delete them, but I wanted to keep them around for now in case you need them.
L2 results (txt and csv files) for batch sizes 1, 8, and 16 and prompt sizes 100 and 1000.
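
For readers unfamiliar with the "keep the last n kernels" step, here is a minimal sketch of the idea, not the script from this PR. It assumes the nsys SQLite export contains a `CUPTI_ACTIVITY_KIND_KERNEL` table with `start` and `end` timestamps in nanoseconds. The function name `last_n_kernel_latencies` is hypothetical, and `n` has to be supplied by the caller because it depends on the batch size.

```python
import sqlite3


def last_n_kernel_latencies(conn: sqlite3.Connection, n: int) -> list[int]:
    """Return the durations (ns) of the last n kernels, in launch order.

    Assumes the trace covers the whole run and the decode stage is
    made up of the final n kernel launches.
    """
    rows = conn.execute(
        # "end" is quoted because END is also an SQL keyword.
        'SELECT "end" - start FROM CUPTI_ACTIVITY_KIND_KERNEL '
        "ORDER BY start DESC LIMIT ?",
        (n,),
    ).fetchall()
    # The query returns newest first; reverse to restore launch order.
    return [row[0] for row in reversed(rows)]
```

With a real trace this would be called as `last_n_kernel_latencies(sqlite3.connect("trace.sqlite"), n)`, where `trace.sqlite` is a placeholder file name.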

github-actions · 2025-07-06T13:22:57Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

elvingerpaul · 2025-07-06T13:27:03Z

+    if sort_lats:
+        # plot the largest latency first, to ensure all bars are visible
+        for i in range(num_kernels):
+            l = [(lats[trace][i], trace) for trace in traces if trace in lats]
+
+            sorted_l = sorted(l, key=lambda x: x[0], reverse=True)
+            if bar:
+                for lat, lo in sorted_l:
+                    ax.bar(i, lat, color=colors[lo])
+
+        ax.legend(
+            handles=[plt.Rectangle((0,0),1,1, color=colors[trace]) for trace in traces if trace in lats],
+            labels=[f"{round(int(trace)*4/(1024*1024), 2)} MB" if trace != "iso" else "iso" for trace in traces if trace in lats],
+        )
+    else:


I added this as a non-default mode. To be discussed: currently we plot the bars in a fixed order, assuming that the highest bar is always plotted first in the background and the lower ones are plotted on top. This is not always the case.

The code above sorts the latencies for each kernel so that the highest bar is always plotted first. To be honest, the result is visually a bit harder to understand, because sometimes the orange bar is on top, sometimes it is in the background, etc.

The current alternative uses slightly transparent bars (by setting alpha), which is the version I like most at this point, though it is not ideal. Let me know if you have a preference.
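
To make the sorted mode concrete, here is a small sketch of the per-kernel ordering and the legend label from the diff, separated from the plotting calls. The helper names `draw_order` and `legend_label` and the sample data are hypothetical. The reading of a trace key as a count of 4-byte elements is an assumption inferred from the `*4/(1024*1024)` conversion.

```python
def draw_order(lats, traces, i):
    """Return (latency, trace) pairs for kernel i, tallest bar first."""
    pairs = [(lats[t][i], t) for t in traces if t in lats]
    return sorted(pairs, key=lambda p: p[0], reverse=True)


def legend_label(trace):
    """Format a trace key as in the diff; assumes 4-byte elements."""
    if trace == "iso":
        return "iso"
    return f"{round(int(trace) * 4 / (1024 * 1024), 2)} MB"


# Made-up latencies for two kernels; "1048576" has no data and is skipped.
lats = {"iso": [3.0, 9.0], "524288": [5.0, 4.0]}
traces = ["iso", "524288", "1048576"]

for i in range(2):
    for lat, trace in draw_order(lats, traces, i):
        print(i, legend_label(trace), lat)
```

Each pair would then go to `ax.bar(i, lat, color=colors[trace])` in that order. The transparent alternative keeps a fixed order and passes `alpha` to `ax.bar` instead, so each color stays at the same layer across kernels.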

Paul Elvinger and others added 12 commits July 3, 2025 15:43

fix inter_sm scripts and move to /pub/scratch

e87ec33

compiler to make ptx code available in ncu source section

c354e65

create dedicated folder for l2 and membw

d1d0355

fix: stop profiler in interference thread

4a0d4db

first complete version l2 cache evaluation script

c04305a

l2 cache h100 results + updated scripts

da2a1cb

l2 cache txt evaluation results

46eb3ce

remove duplicate txt entry in gitignore

3224996

l2 cache full model results, tbt latency

ed52ec1

L2 2MB single kernel interference

b984c30

uncomment

8b956c6

verify mem bw scripäts

9c2c8fa

elvingerpaul requested a review from fotstrt July 6, 2025 13:22

elvingerpaul commented Jul 6, 2025

fotstrt approved these changes Jul 8, 2025

fotstrt merged commit 4d0d317 into benchmarking_inter Jul 8, 2025
1 of 5 checks passed

fotstrt merged 12 commits into benchmarking_inter from benchmarking_inter_fix_scripts
