Add dpa for dsr1 fp4 by faradawn · Pull Request #28954 · sgl-project/sglang

faradawn · 2026-06-22T18:19:24Z

Motivation

Enable DP attention for DeepSeek-R1 FP4 on B200 to improve performance.

Reference: SemiAnalysisAI/InferenceX#1792

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

- Format your code according to "Format code with pre-commit".
- Add unit tests according to "Run and add unit tests".
- Update documentation according to "Write documentations".
- Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
- Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ✅ Run #28060588989
Latest PR Test (Extra): ❌ Run #28060588854

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>


zijiexia · 2026-06-22T21:58:49Z

    }

-    return generateCommandFromConfig(config);
+    const isB200Fp4 = vals.hardware === 'b200' && vals.quantization === 'fp4';


This keys on hardware+quantization only, so DP attention is also force-enabled for the low-latency configs (concurrency 4–8), which the prior doc text scoped to high-throughput. Could you confirm against the referenced InferenceX PR (SemiAnalysisAI/InferenceX#1792) whether this recipe was validated for the low-latency scenario, or whether it should be gated on vals.scenario === 'high-throughput'?
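A minimal sketch of the per-scenario gate suggested above. The helper name `shouldEnableDpAttention` is hypothetical, and the `'low-latency'` scenario value is an assumption; only `vals.hardware`, `vals.quantization`, and `vals.scenario === 'high-throughput'` appear in the diff and the comment.

```javascript
// Hypothetical helper, not part of the PR: decides whether the generator
// should append the DP-attention flags for a given selection.
function shouldEnableDpAttention(vals) {
  // Same predicate as the diff: B200 hardware with FP4 quantization.
  const isB200Fp4 = vals.hardware === 'b200' && vals.quantization === 'fp4';
  // Extra gate proposed in the review: only the high-throughput scenario.
  return isB200Fp4 && vals.scenario === 'high-throughput';
}

// 'low-latency' is an assumed scenario name used only for illustration.
const selections = [
  { hardware: 'b200', quantization: 'fp4', scenario: 'high-throughput' },
  { hardware: 'b200', quantization: 'fp4', scenario: 'low-latency' },
  { hardware: 'h200', quantization: 'fp8', scenario: 'high-throughput' },
];

for (const vals of selections) {
  console.log(JSON.stringify(vals), '->', shouldEnableDpAttention(vals));
}
```

Whether the gate is wanted at all depends on what the referenced InferenceX recipe validated; the sketch only shows that the change is a one-line condition.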

zijiexia · 2026-06-22T21:58:50Z

      continue;
    }

+    if (enableDpAttention && key === 'tensor_parallel_size') {


This special-case diverges from the file's config-driven pattern, where commands render purely from each config's parameters + fieldToFlag. Consider moving these flags into the 4 b200-fp4 config blocks and adding the missing fieldToFlag entries (enable_dp_attention, enable_dp_attention_local_control_broadcast, enable_dp_lm_head) — that also makes per-scenario gating trivial.
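One way the config-driven approach could look, as a sketch under stated assumptions: the shape of `fieldToFlag`, the `renderFlags` helper, and the example config object are all hypothetical, and the `tensor_parallel_size` → `tp` entry is assumed. The flag names and the `data_parallel_size` → `dp` mapping come from this PR's discussion and doc text.

```javascript
// Assumed shape: config field name -> CLI flag name without leading dashes.
const fieldToFlag = {
  tensor_parallel_size: 'tp',
  data_parallel_size: 'dp',
  enable_dp_attention: 'enable-dp-attention',
  enable_dp_attention_local_control_broadcast:
    'enable-dp-attention-local-control-broadcast',
  enable_dp_lm_head: 'enable-dp-lm-head',
};

// Hypothetical renderer: booleans become bare flags, other values become
// "--flag value". Unknown fields and false/null values are skipped.
function renderFlags(parameters) {
  let command = '';
  for (const [key, value] of Object.entries(parameters)) {
    const flag = fieldToFlag[key];
    if (!flag || value === false || value == null) continue;
    command += value === true
      ? ` \\\n  --${flag}`
      : ` \\\n  --${flag} ${value}`;
  }
  return command;
}

// Illustrative config block; the values 8/8 mirror the "--tp 8 --dp 8" hint
// in the docs and are not taken from the actual b200-fp4 configs.
const exampleB200Fp4Parameters = {
  tensor_parallel_size: 8,
  data_parallel_size: 8,
  enable_dp_attention: true,
  enable_dp_attention_local_control_broadcast: true,
  enable_dp_lm_head: true,
};

console.log('<launch command>' + renderFlags(exampleB200Fp4Parameters));
```

With the flags living in each config block, a low-latency config can simply omit the `enable_dp_*` fields and no special case is needed in the render loop.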

zijiexia · 2026-06-22T21:58:50Z


+    if (enableDpAttention && key === 'tensor_parallel_size') {
+      command +=
+        ` \\\n  --tensor-parallel-size ${value}` +


These emit long-form --tensor-parallel-size / --data-parallel-size, while the rest of the generated command uses short forms (--tp, --ep-size) and fieldToFlag already maps data_parallel_size → 'dp'. Suggest --tp / --dp for copy-paste consistency.

zijiexia · 2026-06-22T21:58:50Z


+  if (enableDpAttention) {
+    command +=
+      ' \\\n  --schedule-conservativeness 3.33' +


3.33 is a calibrated value: when dp-attention is on, server_args.py applies schedule_conservativeness *= 0.3, so 3.33 × 0.3 ≈ 1.0 (the default). Worth a short comment so it isn't later "rounded" to an integer and silently dropped to 0.3 effective.
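The arithmetic behind that value can be checked with a short sketch. It assumes the 0.3 multiplier described in the comment above; the constant and function names are hypothetical and do not mirror `server_args.py`.

```javascript
// Assumption from the review comment: with DP attention enabled, the server
// multiplies the configured schedule conservativeness by 0.3.
const DP_ATTENTION_MULTIPLIER = 0.3;

// Hypothetical helper: effective value after the multiplier is applied.
function effectiveConservativeness(configured) {
  return configured * DP_ATTENTION_MULTIPLIER;
}

// 3.33 is the calibrated value from the PR; 3 and 1 show what happens if the
// value is rounded or left at the default of 1.0.
for (const configured of [3.33, 3, 1]) {
  const effective = effectiveConservativeness(configured).toFixed(3);
  console.log(`configured ${configured} -> effective ${effective}`);
}
```

Under that assumption, 3.33 lands at roughly 1.0 effective, 3 at roughly 0.9, and the default 1.0 at 0.3, which is why the fractional value matters.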

zijiexia · 2026-06-22T21:58:50Z

 ```

-**Data Parallelism Attention (`--enable-dp-attention`):** Recommended for high-throughput scenarios. Use `--enable-dp-attention --tp 8 --dp 8` on a single 8-GPU node.
+**Data Parallelism Attention (`--enable-dp-attention`):** Recommended for high-throughput scenarios. For B200 FP4, the command generator enables DP Attention automatically and adds `--data-parallel-size <TP>`, `--enable-dp-attention-local-control-broadcast`, `--enable-dp-lm-head`, `--schedule-conservativeness 3.33`, and `--enable-prefill-delayer`.


This rewrite drops the previous general --enable-dp-attention --tp 8 --dp 8 hint that also applied to other hardware (e.g. H200 high-throughput). If that's still recommended there, consider keeping a one-line general note alongside the B200-FP4-specific text.


github-actions Bot and others added 5 commits June 5, 2026 00:50

- docs: sync LMSYS SGLang blog cards (3c01313)
- docs: sync LMSYS SGLang blog cards (93be728)
- Merge branch 'main' of github.com:faradawn/sglang (bbd8708)
- Merge branch 'main' of github.com:sgl-project/sglang (5c37aa9)
- add dp attention for dsr1 fp4 (5439727)

faradawn requested review from JustinTong0323, sogalin, wisclmy0611 and zijiexia as code owners June 22, 2026 18:19

github-actions Bot added the documentation and deepseek labels Jun 22, 2026

zijiexia reviewed Jun 22, 2026


Klaud-Cold mentioned this pull request Jun 23, 2026

[NV]dsr1-fp4-b200-sglang: add DPA PDL lane SemiAnalysisAI/InferenceX#1792

Merged

faradawn added 2 commits June 23, 2026 11:33

- Update DeepSeek-R1 B200 NVFP4 DP attention docs (a20e3c7)
- Update DeepSeek-R1 B200 FP4 backend config (fda9df7)

faradawn wants to merge 7 commits into sgl-project:main from faradawn:add-dpa-for-dsr1
