Releases · novitalabs/pegaflow

22 Jun 06:54

xiaguan

v0.22.10

1088928

v0.22.10 Latest

Release of the pegaflow workspace / pegaflow-llm 0.22.10 — 12 commits since v0.22.9, centered on MLA KV-cache storage efficiency, model-aware transfer-backend selection, and cross-node redundancy observability.

English

✨ Features

MLA KV page-first storage (#360) — store MLA KV cache page-first so per-block metadata collapses, cutting metadata overhead for MLA models.
Per-layer MLA TP save distribution (#359) — spread MLA tensor-parallel save work across ranks by layer to balance save load.
Model-aware KV transfer backend (#357) — the connector auto-selects the KV transfer backend per model; the server no longer needs a static backend setting.
Metaserver block-redundancy metrics (#361) — new pegaflow_metaserver_block_redundancy{owners="1|2|3|>=4"} distribution plus pegaflow_metaserver_block_redundancy_avg gauge, surfacing the cross-node KV replication factor (how much effective cache capacity shrinks).
P/D handshake wire schema (#345) — seal the prefill/decode handshake wire schema in pegaflow-pd-wire.
Transfer benchmarks (#349, eb69309) — p2p RDMA fetch example plus native D2H/H2D transfer-path measurement.
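
To make the block-redundancy metrics above concrete, here is a minimal sketch of how an owner-count distribution and its average could be derived. Only the bucket labels ("1", "2", "3", ">=4") come from the release note; the function name, the `owners_per_block` input, and the definition of the average as mean owners per block are assumptions for illustration, not the metaserver's actual implementation.

```python
# Bucket labels copied from the `owners` label values in the release note.
BUCKETS = ("1", "2", "3", ">=4")


def redundancy_summary(owners_per_block):
    """Return (bucket -> block count, average owners per block).

    `owners_per_block` is a hypothetical mapping of block id -> number of
    nodes currently holding a copy of that block.
    """
    counts = {label: 0 for label in BUCKETS}
    total_owners = 0
    total_blocks = 0
    for owners in owners_per_block.values():
        if owners < 1:
            continue  # a block with no owner is not cached anywhere
        label = str(owners) if owners < 4 else ">=4"
        counts[label] += 1
        total_owners += owners
        total_blocks += 1
    # Assumed definition: mean number of owners per cached block.
    average = total_owners / total_blocks if total_blocks else 0.0
    return counts, average


counts, average = redundancy_summary({"a": 1, "b": 2, "c": 2, "d": 5})
print(counts, average)
```

Under this assumed definition, an average of 2.5 would mean each distinct block occupies 2.5 slots cluster-wide, so the effective capacity is roughly 1/2.5 of the raw capacity.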

🐛 Fixes

Drop late duplicate saves (#358) — skip late duplicate saves of already-resident blocks, avoiding redundant work.

♻️ Refactors

Restructure pd_connector for maintainability (#355).
SealedBlock owns its RawBlock slots (#352).
Use usize for block_ids, validated at the RPC boundary (#351).

🔧 Chore

Bump version 0.22.9 → 0.22.10 (#362).

⚠️ Strict version handshake: client and server must match on CARGO_PKG_VERSION at registration — upgrade both sides together.
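
The strict handshake amounts to an exact string comparison at registration time. The sketch below illustrates that idea in Python; the real check compares `CARGO_PKG_VERSION` inside pegaflow itself, and the `VersionMismatch` exception and `check_version` function here are hypothetical names, not part of the pegaflow API.

```python
class VersionMismatch(RuntimeError):
    """Hypothetical error raised when the two sides disagree on version."""


def check_version(client_version: str, server_version: str) -> None:
    # Exact match only: no semver-compatible ranges, no prefix matching.
    if client_version != server_version:
        raise VersionMismatch(
            f"client {client_version} != server {server_version}; "
            "upgrade both sides together"
        )


check_version("0.22.10", "0.22.10")  # passes silently
try:
    check_version("0.22.9", "0.22.10")
except VersionMismatch as err:
    print(err)
```

Because the comparison is exact, even a patch-level difference such as 0.22.9 against 0.22.10 is rejected, which is why a rolling upgrade of only one side fails at registration.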

Full Changelog: v0.22.9...v0.22.10

Assets 12

11 Jun 11:25

xiaguan

v0.22.9

53bd5dc

v0.22.9

What's Changed

refactor(core): replace KVCacheRegistration with validated KVCacheLayout by @xiaguan in #340
feat(pd): support mtp split connector by @GentleCold in #341
fix(core)!: seal instance layer topology from registered layers by @xiaguan in #346
feat(connector): register layer-split KV caches from cache config by @feifei-111 in #343
chore: bump version to 0.22.9 by @xiaguan in #347

Full Changelog: v0.22.8...v0.22.9

Contributors

xiaguan, GentleCold, and feifei-111

Assets 12

10 Jun 06:25

github-actions

v0.22.8

cce4946

v0.22.8

chore: bump version to 0.22.8 (#337)

## Summary
- bump workspace and Python package versions to 0.22.8
- refresh Cargo.lock package versions for workspace crates

## Tests
- cargo metadata --locked --no-deps
- uv run --no-project python -c "import pathlib, tomllib; p=tomllib.loads(pathlib.Path('python/pyproject.toml').read_text()); assert p['project']['version']=='0.22.8'; assert p['tool']['commitizen']['version']=='0.22.8'; print('python metadata version ok')"
- pre-commit hooks during commit, including cargo test --release

Assets 12

09 Jun 08:21

github-actions

v0.22.7

521fb66

v0.22.7

chore(release): prepare 0.22.7

Assets 12

05 Jun 10:11

github-actions

v0.22.6

a7a6b8b

v0.22.6

0.22.5 -> 0.22.6 Comparison

0.22.6 is the first stable P/D disaggregation release. It closes the loop on P/D RDMA push performance, correctness guards, cache-layout compatibility, and package completeness. 0.22.5 is the previous release baseline; 0.22.6 is the recommended release for stable P/D disaggregation.

| Area | 0.22.5 | 0.22.6 |
| --- | --- | --- |
| P/D disaggregation | Baseline release state, with the P/D path still converging | First stable release: P/D RDMA push, scheduler/worker protocol, prefill/decode flow, proxy, and native RDMA binding have been stabilized |
| Performance | RDMA push and layout mapping had not received this round of optimization | Adds MLA cache layout support; optimizes layout mapping; adds connector metrics; improves RDMA push sending, completion signaling, wait behavior, and link-utilization visibility |
| Correctness | Edge cases such as version mismatch, zero-hit query probes, and RDMA-only query paths were under-protected | Fails early on vLLM/server version mismatch; avoids incorrect lease release for zero-hit query probes; cfg-gates RDMA-only query behavior; fails faster when vLLM startup dies |
| Packaging | CUDA 13 wheels used `--no-default-features --features cuda-13`, which disabled default `rdma` | CUDA 13 CI/release wheels now explicitly use `--features cuda-13,rdma`; both cu12 and cu13 packages include RDMA |
| Other fixes | Single SSD cache path | Multiple SSD cache paths; unused Rust dependencies removed; test gates and e2e cargo features aligned |

Artifacts

GitHub Release includes 10 wheels: Python 3.10-3.14 for both pegaflow-llm and pegaflow-llm-cu13.
PyPI published pegaflow-llm==0.22.6 and pegaflow-llm-cu13==0.22.6.

Assets 12

02 Jun 09:57

github-actions

v0.22.5

b4f9b38

v0.22.5

chore(release): prepare 0.22.5 (#316)

## Summary
- Bump workspace and Python package versions to 0.22.5
- Refresh Cargo.lock package versions

## Test
- cargo check --workspace --all-targets --locked

Release workflow note: .github/workflows/release.yml publishes on v* tag
push after this PR is merged.

Co-authored-by: root <root@host-192-168-172-86.tail58a9b0.ts.net>

Assets 12

29 May 05:49

xiaguan

v0.22.4

ded49f1

0.22.4

✨ Highlights

Disaggregated P/D over RDMA push (#297) — New PdConnector plus a v2 transfer engine that pushes KV prefill→decode layer-by-layer, overlapping transfer with compute. Added TTFT is 2–4× lower than NIXL on H20/Qwen3-8B.
Query leases (#284, #288) — Pin refcounts replaced by lease-backed query/load/release; query results are Loading/Ready only, with TTL-based reclaim.
Save-only mode (#300) — New pegaflow.mode lets an instance populate the cache without serving reads.
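
As a rough illustration of lease-backed release with TTL-based reclaim, the sketch below keeps an expiry time per lease and rejects a release for an unknown or expired lease. The `LeaseTable` class, its methods, and the `FailedPrecondition` exception are hypothetical stand-ins; the Loading/Ready query states and the actual pegaflow RPC surface are not modeled.

```python
import time
import uuid


class FailedPrecondition(Exception):
    """Stand-in for the RPC status returned for unknown/expired leases."""


class LeaseTable:
    def __init__(self, ttl_secs, clock=time.monotonic):
        self._ttl = ttl_secs
        self._clock = clock  # injectable so expiry can be tested
        self._leases = {}  # lease id -> expiry timestamp

    def acquire(self):
        lease_id = uuid.uuid4().hex
        self._leases[lease_id] = self._clock() + self._ttl
        return lease_id

    def reclaim_expired(self):
        """Drop leases whose TTL has elapsed; return how many were dropped."""
        now = self._clock()
        expired = [lid for lid, exp in self._leases.items() if exp <= now]
        for lid in expired:
            del self._leases[lid]
        return len(expired)

    def release(self, lease_id):
        self.reclaim_expired()
        if lease_id not in self._leases:
            raise FailedPrecondition("unknown or expired lease")
        del self._leases[lease_id]
```

Unlike a pin refcount, a lease that is never released does not hold its blocks forever: once the TTL passes, the next reclaim pass frees it, and a late release from the original holder is rejected rather than silently accepted.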

🚀 Features

Sharded SSD cache across multiple files (#299)
Per-peer N QPs with WQE-level round-robin, --qps-per-peer (default 2) (#291)
Metaserver node-lifecycle fencing with heartbeat UUIDs, --node-stale-secs (#285)
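
The per-peer round-robin over queue pairs can be pictured with a small sketch. Only the default of 2 QPs per peer comes from the release note; the `PeerQueuePairs` class and its `pick` method are hypothetical, and a real implementation would post RDMA work requests rather than return an index.

```python
from itertools import cycle


class PeerQueuePairs:
    """Rotate over a fixed number of queue pairs for one peer."""

    def __init__(self, qps_per_peer=2):
        if qps_per_peer < 1:
            raise ValueError("qps_per_peer must be at least 1")
        self._next = cycle(range(qps_per_peer))

    def pick(self):
        # Each work request goes to the next QP in turn, wrapping around.
        return next(self._next)


peer = PeerQueuePairs()
print([peer.pick() for _ in range(4)])
```

Rotating per work request rather than per transfer spreads a single large transfer across all of the peer's QPs instead of pinning it to one.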

🐛 Fixes

Preserve non-MLA KV layout registration, e.g. GLM-4.7-FP8 (#295)
Allocate pinned pools on GPU-local NUMA nodes (#293)
Handle split physical KV blocks for FlashMLA (#292)
Allow query lease consume once per worker (multi-worker loads) (#288)
Validate --nics; fail on RDMA init error instead of silently disabling P2P (#283)
Remove scheduler save limit (#282); demote cache_lookup_reuse log to debug (#280)

⚡ Performance

CPU-path benchmarks + long-block save optimizations (#290): query 12.3 → 6.1 ms, save 21.3 → 13.1 ms
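
For readers who prefer ratios, the latency figures above can be converted into speedup and percentage reduction. The helper names below are illustrative; the input numbers are the ones quoted in the release note.

```python
def speedup(before_ms, after_ms):
    """How many times faster the new path is."""
    return before_ms / after_ms


def reduction_pct(before_ms, after_ms):
    """Latency reduction as a percentage of the old latency."""
    return 100.0 * (before_ms - after_ms) / before_ms


for name, before, after in (("query", 12.3, 6.1), ("save", 21.3, 13.1)):
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"{reduction_pct(before, after):.1f}% lower latency"
    )
```

This works out to roughly a 2.02x speedup (50.4% lower latency) for query and a 1.63x speedup (38.5% lower latency) for save.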

⚠️ Upgrade notes

Query API is now Loading/Ready only; pin/unpin semantics removed (#284)
Release RPC returns FailedPrecondition for unknown/expired leases (#289)
--nics rejects empty entries and fails on RDMA init error (#283)
New flags: --qps-per-peer, --node-stale-secs; new config pegaflow.mode
TinyLFU admission is now off unless explicitly enabled (#287)

Assets 12

15 May 12:36

github-actions

v0.22.3

40105e5

v0.22.3

v0.22.3

Assets 12

12 May 17:25

github-actions

v0.22.2

20d98ff

v0.22.2

Release v0.22.2

Assets 12

12 May 09:37

github-actions

v0.22.1

9f01c03

v0.22.1

Release v0.22.1

Assets 12
