feat(server): support KV Flash with same-backend target layer split by weicj · Pull Request #391 · Luce-Org/lucebox-hub

weicj · 2026-06-16T04:52:40Z

Summary

This PR extends KV Flash bounded KV residency to same-backend target layer split, so split targets can use a pool-sized KV cache instead of allocating a full-context KV cache on every shard. The same-backend path is wired for Qwen35, Gemma4, and Laguna.
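To show what a pool-sized KV cache means in practice, here is a minimal sketch of pool-slot residency. Each shard's KV tensors are sized by the pool rather than by the full context. In the smoke tests, the `DFLASH_KVFLASH` value appears to set that pool size in tokens. The class name `HypotheticalSlotPager` and its ring-buffer policy, which keeps the most recent `pool_size` positions resident, are assumptions for illustration only. They are not the PR's `KvFlashPager`, whose residency policy is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a pool-sized KV pager. The ring-buffer policy
// (most recent pool_size positions stay resident) is an ASSUMPTION chosen for
// illustration; it is not the KvFlashPager implementation from this PR.
class HypotheticalSlotPager {
public:
    explicit HypotheticalSlotPager(int32_t pool_size)
        : slot_pos_(static_cast<size_t>(pool_size), -1) {
        assert(pool_size > 0);
    }

    // Assign a pool slot to logical token position `pos` (pos >= 0), evicting
    // whichever position previously held that slot. The returned slot index is
    // what a set-rows style KV write would target.
    int32_t assign(int32_t pos) {
        assert(pos >= 0);
        const int32_t slot = pos % static_cast<int32_t>(slot_pos_.size());
        slot_pos_[static_cast<size_t>(slot)] = pos;
        return slot;
    }

    // Slot currently holding `pos`, or nullopt if it was evicted or never written.
    std::optional<int32_t> find(int32_t pos) const {
        assert(pos >= 0);
        const int32_t slot = pos % static_cast<int32_t>(slot_pos_.size());
        if (slot_pos_[static_cast<size_t>(slot)] == pos) return slot;
        return std::nullopt;
    }

    // Logical position stored in each slot (-1 = empty slot).
    const std::vector<int32_t>& slot_positions() const { return slot_pos_; }

private:
    std::vector<int32_t> slot_pos_;
};
```

With a pool of 4 and positions 0 through 5 written, slots hold positions `{4, 5, 2, 3}`. Positions 0 and 1 are no longer resident, so the KV memory stays fixed at four rows however long the context grows.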

Changes

Add KV Flash pool allocation and pager lifecycle support to the Qwen35, Gemma4, and Laguna same-backend target layer-split paths.
Thread KvFlashPager through same-backend layer-split prefill/decode paths, including Qwen35 DFlash verify.
Build layer-split attention graphs in pool-slot mode when KV Flash is enabled, including set-rows KV writes and slot-space attention masks.
Align Gemma4 split boundaries so KV-sharing layers stay on the same shard as their source layer.
Guard Laguna KV Flash mask uploads so only tensors allocated for the current graph are written.

Notes

Local same-backend runtime smoke test passed on dual Pro VII (HIP) with Qwen3.6-27B Q3, --target-devices hip:0,hip:1 --target-layer-split 1,1, and DFLASH_KVFLASH=1024; the request returned a valid OpenAI-compatible response.
Local same-backend runtime smoke test passed on dual Pro VII (HIP) with Gemma4 E2B Q4, --target-devices hip:0,hip:1 --target-layer-split 1,1, and DFLASH_KVFLASH=512; the server auto-aligned the shards to [0,14)+[14,35) and returned a valid OpenAI-compatible response.
Local same-backend runtime smoke test passed on dual Pro VII (HIP) with Laguna-XS.2 Q4, --target-devices hip:0,hip:1 --target-layer-split 1,1, and DFLASH_KVFLASH=512; the pool was raised to 768 tokens to satisfy the SWA tail requirement, and the request returned a valid OpenAI-compatible response.

cubic-dev-ai: 3 issues found across 20 files


Commit 641708d: feat(server): add same-backend KV Flash target split

cubic-dev-ai Bot reviewed Jun 16, 2026


Comment thread: server/src/qwen35/qwen35_layer_split_adapter.cpp

Comment thread: server/src/gemma4/gemma4_layer_split_adapter.cpp (outdated)

Comment thread: server/src/laguna/laguna_layer_split_adapter.cpp (outdated)

Commit 8914883: fix(server): repair KV Flash split snapshot restore


weicj wants to merge 2 commits into Luce-Org:main from weicj:feat-kvflash-same-backend-layer-split

weicj commented Jun 16, 2026 (edited by cubic-dev-ai Bot)


cubic-dev-ai Bot left a comment (edited)
