Releases: novitalabs/pegaflow

v0.22.10

22 Jun 06:54
1088928

Release of the pegaflow workspace / pegaflow-llm 0.22.10 — 12 commits since v0.22.9, centered on MLA KV-cache storage efficiency, model-aware transfer-backend selection, and cross-node redundancy observability.

✨ Features

  • MLA KV page-first storage (#360) — store MLA KV cache page-first so per-block metadata collapses, cutting metadata overhead for MLA models.
  • Per-layer MLA TP save distribution (#359) — spread MLA tensor-parallel save work across ranks by layer to balance save load.
  • Model-aware KV transfer backend (#357) — the connector auto-selects the KV transfer backend per model; the server no longer needs a static backend setting.
  • Metaserver block-redundancy metrics (#361) — new pegaflow_metaserver_block_redundancy{owners="1|2|3|>=4"} distribution plus pegaflow_metaserver_block_redundancy_avg gauge, surfacing the cross-node KV replication factor (how much effective cache capacity shrinks).
  • P/D handshake wire schema (#345) — seal the prefill/decode handshake wire schema in pegaflow-pd-wire.
  • Transfer benchmarks (#349, eb69309) — p2p RDMA fetch example plus native D2H/H2D transfer-path measurement.
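
To make the block-redundancy metric from #361 concrete, here is a minimal sketch of how a distribution over owner counts maps to an average replication factor and an effective-capacity fraction. The sample counts, the function names, and the choice to count the `>=4` bucket as exactly 4 owners are assumptions for illustration, not PegaFlow's implementation.

```python
# Hypothetical sample: number of distinct blocks per owner-count bucket,
# mirroring the owners="1|2|3|>=4" label values from the release note.
OWNERS_PER_BUCKET = {"1": 1, "2": 2, "3": 3, ">=4": 4}


def average_redundancy(buckets: dict[str, int]) -> float:
    """Mean owners per distinct block; '>=4' is counted as 4 (a lower bound)."""
    total_blocks = sum(buckets.values())
    if total_blocks == 0:
        return 0.0
    total_copies = sum(OWNERS_PER_BUCKET[label] * n for label, n in buckets.items())
    return total_copies / total_blocks


def effective_capacity_fraction(avg: float) -> float:
    """Share of cluster cache holding distinct data at `avg` copies per block."""
    return 1.0 / avg if avg > 0 else 1.0


sample = {"1": 600, "2": 300, "3": 80, ">=4": 20}
avg = average_redundancy(sample)
print(f"avg owners per block: {avg:.2f}")                            # 1.52
print(f"effective capacity: {effective_capacity_fraction(avg):.1%}")  # 65.8%
```

Under these assumptions, an average of 1.52 owners per block means roughly a third of the aggregate cache is spent on replicas.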

🐛 Fixes

  • Drop late duplicate saves (#358) — skip late duplicate saves of already-resident blocks, avoiding redundant work.
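
The idea behind #358 can be sketched as a residency check before the write. `BlockStore`, its `save` method, and the byte-string block hashes are hypothetical names for illustration; the real store and its keys may look quite different.

```python
class BlockStore:
    """Toy in-memory store keyed by block hash (hypothetical)."""

    def __init__(self) -> None:
        self._resident: dict[bytes, bytes] = {}
        self.skipped = 0  # number of duplicate saves that were dropped

    def save(self, block_hash: bytes, payload: bytes) -> bool:
        """Store the block; return False if it was already resident."""
        if block_hash in self._resident:
            self.skipped += 1  # late duplicate: do no further work
            return False
        self._resident[block_hash] = payload
        return True


store = BlockStore()
first = store.save(b"h1", b"kv-data")
late = store.save(b"h1", b"kv-data")  # arrives after the block is resident
print(first, late, store.skipped)     # True False 1
```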

♻️ Refactors

  • Restructure pd_connector for maintainability (#355).
  • SealedBlock owns its RawBlock slots (#352).
  • Use usize for block_ids, validated at the RPC boundary (#351).

🔧 Chore

  • Bump version 0.22.9 → 0.22.10 (#362).

⚠️ Strict version handshake: client and server must match on CARGO_PKG_VERSION at registration — upgrade both sides together.
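
A minimal sketch of what a strict handshake implies for operators, assuming the check is a plain equality test on the version string. `check_registration` and `VersionMismatch` are hypothetical names, not PegaFlow APIs.

```python
class VersionMismatch(RuntimeError):
    """Raised when client and server builds differ (hypothetical error type)."""


def check_registration(client_version: str, server_version: str) -> None:
    """Reject registration unless both sides report the same version string."""
    if client_version != server_version:
        raise VersionMismatch(
            f"client {client_version} != server {server_version}; "
            "upgrade both sides together"
        )


check_registration("0.22.10", "0.22.10")  # same build: accepted
try:
    check_registration("0.22.9", "0.22.10")  # mixed builds: rejected
except VersionMismatch as err:
    print(f"registration refused: {err}")
```

With a check like this there is no compatibility window between patch releases, so a rolling upgrade that leaves old clients talking to a new server would fail at registration.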

Full Changelog: v0.22.9...v0.22.10

v0.22.9

11 Jun 11:25
53bd5dc

What's Changed

  • refactor(core): replace KVCacheRegistration with validated KVCacheLayout by @xiaguan in #340
  • feat(pd): support mtp split connector by @GentleCold in #341
  • fix(core)!: seal instance layer topology from registered layers by @xiaguan in #346
  • feat(connector): register layer-split KV caches from cache config by @feifei-111 in #343
  • chore: bump version to 0.22.9 by @xiaguan in #347

Full Changelog: v0.22.8...v0.22.9

v0.22.8

10 Jun 06:25
cce4946

chore: bump version to 0.22.8 (#337)

## Summary
- bump workspace and Python package versions to 0.22.8
- refresh Cargo.lock package versions for workspace crates

## Tests
- cargo metadata --locked --no-deps
- uv run --no-project python -c "import pathlib, tomllib; p=tomllib.loads(pathlib.Path('python/pyproject.toml').read_text()); assert p['project']['version']=='0.22.8'; assert p['tool']['commitizen']['version']=='0.22.8'; print('python metadata version ok')"
- pre-commit hooks during commit, including cargo test --release

v0.22.7

09 Jun 08:21
521fb66

chore(release): prepare 0.22.7

v0.22.6

05 Jun 10:11
a7a6b8b


0.22.5 -> 0.22.6 Comparison

0.22.6 is the first stable P/D disaggregation release. It closes the loop on P/D RDMA push performance, correctness guards, cache-layout compatibility, and package completeness. 0.22.5 is the previous release baseline; 0.22.6 is the recommended release for stable P/D disaggregation.

| Area | 0.22.5 | 0.22.6 |
| --- | --- | --- |
| P/D disaggregation | Baseline release state, with the P/D path still converging | First stable release: P/D RDMA push, scheduler/worker protocol, prefill/decode flow, proxy, and native RDMA binding have been stabilized |
| Performance | RDMA push and layout mapping had not received this round of optimization | Adds MLA cache layout support; optimizes layout mapping; adds connector metrics; improves RDMA push sending, completion signaling, wait behavior, and link-utilization visibility |
| Correctness | Edge cases such as version mismatch, zero-hit query probes, and RDMA-only query paths were under-protected | Fails early on vLLM/server version mismatch; avoids incorrect lease release for zero-hit query probes; cfg-gates RDMA-only query behavior; fails faster when vLLM startup dies |
| Packaging | CUDA 13 wheels used --no-default-features --features cuda-13, which disabled default rdma | CUDA 13 CI/release wheels now explicitly use --features cuda-13,rdma; both cu12 and cu13 packages include RDMA |
| Other fixes | Single SSD cache path | Multiple SSD cache paths; unused Rust dependencies removed; test gates and e2e cargo features aligned |

Artifacts

  • GitHub Release includes 10 wheels: Python 3.10-3.14 for both pegaflow-llm and pegaflow-llm-cu13.
  • PyPI published pegaflow-llm==0.22.6 and pegaflow-llm-cu13==0.22.6.

v0.22.5

02 Jun 09:57
b4f9b38

chore(release): prepare 0.22.5 (#316)

## Summary
- Bump workspace and Python package versions to 0.22.5
- Refresh Cargo.lock package versions

## Test
- cargo check --workspace --all-targets --locked

Release workflow note: .github/workflows/release.yml publishes on v* tag
push after this PR is merged.

Co-authored-by: root <root@host-192-168-172-86.tail58a9b0.ts.net>

0.22.4

29 May 05:49
ded49f1

✨ Highlights

  • Disaggregated P/D over RDMA push (#297) — New PdConnector plus a v2 transfer engine that pushes KV prefill→decode layer-by-layer, overlapping transfer with compute. Added TTFT is 2–4× lower than NIXL on H20/Qwen3-8B.
  • Query leases (#284, #288) — Pin refcounts replaced by lease-backed query/load/release; query results are Loading/Ready only, with TTL-based reclaim.
  • Save-only mode (#300) — New pegaflow.mode lets an instance populate the cache without serving reads.

🚀 Features

  • Sharded SSD cache across multiple files (#299)
  • Per-peer N QPs with WQE-level round-robin, --qps-per-peer (default 2) (#291)
  • Metaserver node-lifecycle fencing with heartbeat UUIDs, --node-stale-secs (#285)
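
For the per-peer QP item above, round-robin selection can be sketched as one rotating index per peer. `QpSelector` and the string peer names are hypothetical; only the default of 2 comes from the `--qps-per-peer` note, and real work-request posting goes through the RDMA verbs layer rather than Python.

```python
from collections import defaultdict


class QpSelector:
    """Picks a queue-pair index per work request, rotating per peer."""

    def __init__(self, qps_per_peer: int = 2) -> None:
        if qps_per_peer < 1:
            raise ValueError("qps_per_peer must be at least 1")
        self._n = qps_per_peer
        self._next: dict[str, int] = defaultdict(int)  # peer -> next QP index

    def pick(self, peer: str) -> int:
        """Return the QP index for the next work request to `peer`."""
        idx = self._next[peer]
        self._next[peer] = (idx + 1) % self._n
        return idx


selector = QpSelector(qps_per_peer=2)
print([selector.pick("peer-a") for _ in range(5)])  # [0, 1, 0, 1, 0]
print(selector.pick("peer-b"))                      # 0 (each peer rotates independently)
```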

🐛 Fixes

  • Preserve non-MLA KV layout registration, e.g. GLM-4.7-FP8 (#295)
  • Allocate pinned pools on GPU-local NUMA nodes (#293)
  • Handle split physical KV blocks for FlashMLA (#292)
  • Allow query lease consume once per worker (multi-worker loads) (#288)
  • Validate --nics; fail on RDMA init error instead of silently disabling P2P (#283)
  • Remove scheduler save limit (#282); demote cache_lookup_reuse log to debug (#280)

⚡ Performance

  • CPU-path benchmarks + long-block save optimizations (#290): query 12.3 → 6.1 ms, save 21.3 → 13.1 ms

⚠️ Upgrade notes

  • Query API is now Loading/Ready only; pin/unpin semantics removed (#284)
  • Release RPC returns FailedPrecondition for unknown/expired leases (#289)
  • --nics rejects empty entries and fails on RDMA init error (#283)
  • New flags: --qps-per-peer, --node-stale-secs; new config pegaflow.mode
  • TinyLFU admission is now off unless explicitly enabled (#287)

v0.22.3

15 May 12:36
40105e5

v0.22.3

v0.22.2

12 May 17:25
20d98ff

Release v0.22.2

v0.22.1

12 May 09:37
9f01c03

Release v0.22.1