Ref blockmgr4 by Kang-Meng · Pull Request #1861 · jd-opensource/xllm

Kang-Meng · 2026-07-01T03:34:21Z

Description

Related Issues

Change Type

Pull Request Checklist

Thank you for contributing to xLLM. Before requesting review, please make sure the following items are complete.

PR Title and Commit Messages

The PR title and each commit message follow the xLLM commit format: <type>: <subject>.

Allowed types: feat, bugfix, docs, test, refactor, chore, style, revert, perf, model, build, release.
The subject should use clear English, start with a verb, include at least 4 words, and end with ..

Pre-commit Checks

I have installed pre-commit by running pip install pre-commit or an equivalent command.
I have installed the hooks with pre-commit install.
I have run pre-commit run --all-files and fixed any reported issues.

If you are unsure how to set up pre-commit, see the pre-commit documentation.

Self Review

I have self-reviewed the code according to .agents/skills/code-review/references/custom-code-style.md, especially code written or assisted by AI.
I have rebased this PR onto the latest main branch.

Build and Test Coverage

Tests have been added or updated as needed.
CUDA: python setup.py build test has passed on a CUDA machine.
NPU: python setup.py build test has passed on an NPU machine.
MLU: python setup.py build test has passed on an MLU machine.

Reviewer Notes

…ork.

… impls.

gemini-code-assist

Code Review

This pull request re-enables and refactors the hierarchical KV cache transfer and block manager pool, which were temporarily disabled during a block-manager refactor. It introduces platform-independent abstractions like BatchMemcpy and LayerSynchronizer (with NPU implementations), updates various KVCacheImpl subclasses to support host cache allocation, and integrates these components back into WorkerImpl and LLMEngine. The code review feedback focuses on enforcing repository style guides, including reserving vector capacity, using fixed-width integers instead of plain int, annotating constant arguments with parameter names, avoiding relative include paths, avoiding auto for primitive types, and adding defensive bounds checks for tensor indexing.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-07-01T03:37:00Z

+  std::vector<HierarchyKVCacheTransfer::LayerBatchRange> ranges;
+  if (num_layers <= 0) {
+    return ranges;
+  }
+
+  uint32_t layers_per_batch =
+      requested_batches == 0
+          ? static_cast<uint32_t>(num_layers)
+          : static_cast<uint32_t>(num_layers) / requested_batches;
+  layers_per_batch = std::max<uint32_t>(layers_per_batch, 1);

-constexpr uint64_t MBUF_SIZE = 128 * 1024 * 1024;
-constexpr uint32_t BATCH_COPY_MAX_SIZE = 4096;
-constexpr uint32_t TIMEOUT_S = 60;      // second
-constexpr uint32_t TIMEOUT_MS = 60000;  // millisecond
+  for (int64_t begin = 0; begin < num_layers; begin += layers_per_batch) {


According to the repository style guide, we should always call reserve() before filling a std::vector when the size is known or can be estimated. In this case, the size of ranges can be estimated as (num_layers + layers_per_batch - 1) / layers_per_batch.

std::vector<HierarchyKVCacheTransfer::LayerBatchRange> ranges; if (num_layers <= 0) { return ranges; } uint32_t layers_per_batch = requested_batches == 0 ? static_cast<uint32_t>(num_layers) : static_cast<uint32_t>(num_layers) / requested_batches; layers_per_batch = std::max<uint32_t>(layers_per_batch, 1); ranges.reserve((static_cast<size_t>(num_layers) + layers_per_batch - 1) / layers_per_batch); for (int64_t begin = 0; begin < num_layers; begin += layers_per_batch) {

References

Always reserve() before filling a std::vector when the size is known or can be estimated. ^(link)

gemini-code-assist · 2026-07-01T03:37:00Z

+  for (int i = 0; i < load_threadpool_->size() + offload_threadpool_->size();
+       ++i) {


According to the repository style guide, we should use fixed-width integers (int32_t, int64_t) or standard unsigned types like size_t instead of plain int. Since size() returns size_t, using size_t for the loop counter avoids signed/unsigned comparison warnings.

Suggested change

for (int i = 0; i < load_threadpool_->size() + offload_threadpool_->size();

++i) {

for (size_t i = 0; i < load_threadpool_->size() + offload_threadpool_->size();

++i) {

References

Use fixed-width integers (int32_t, int64_t) instead of plain int, unless the API you are calling explicitly requires int. ^(link)

gemini-code-assist · 2026-07-01T03:37:00Z

+        .enable_xtensor(false)
+        .enable_raw_device_allocator(false)
+        .host_blocks_factor(options_.host_blocks_factor());
 #if defined(USE_NPU)
-    aclrtHostUnregister(page_aligned_data_);
+    host_opts.enable_kv_cache_huge_page_allocator(false);


According to the repository style guide, constant arguments must be annotated with a comment indicating the parameter name when calling functions or constructors.

Suggested change

.enable_xtensor(false)

.enable_raw_device_allocator(false)

.host_blocks_factor(options_.host_blocks_factor());

#if defined(USE_NPU)

aclrtHostUnregister(page_aligned_data_);

host_opts.enable_kv_cache_huge_page_allocator(false);

.enable_xtensor(/*enable_xtensor=*/false)

.enable_raw_device_allocator(/*enable_raw_device_allocator=*/false)

.host_blocks_factor(options_.host_blocks_factor());

#if defined(USE_NPU)

host_opts.enable_kv_cache_huge_page_allocator(/*enable_kv_cache_huge_page_allocator=*/false);

#endif

References

Annotate constant arguments with a comment indicating the parameter name when calling functions or constructors. ^(link)

gemini-code-assist · 2026-07-01T03:37:00Z

+        CHECK_LT(host_block_id, host_tensor.size(0))
+            << "host block id out of range.";
+        torch::Tensor device_block = device_tensor[device_block_id];
+        torch::Tensor host_block_layer =
+            host_tensor[host_block_id][static_cast<int64_t>(layer_slot)];


For defensive programming and correctness, we should check that device_block_id is non-negative and within the bounds of device_tensor.size(0) before accessing it, similar to how host_block_id is checked.

CHECK_LT(host_block_id, host_tensor.size(0)) << "host block id out of range."; CHECK_GE(device_block_id, 0) << "device block id must be non-negative."; CHECK_LT(device_block_id, device_tensor.size(0)) << "device block id out of range."; torch::Tensor device_block = device_tensor[device_block_id]; torch::Tensor host_block_layer = host_tensor[host_block_id][static_cast<int64_t>(layer_slot)];

References

Ensure appropriate bounds and validity checks exist before accessing array/tensor elements.

gemini-code-assist · 2026-07-01T03:37:00Z

 #include <algorithm>

 #include "block_manager_impl.h"
+#include "composite_block_manager.h"


According to the repository style guide, relative paths in #include are not allowed. Always use project-root-relative paths.

Suggested change

#include "composite_block_manager.h"

#include "core/framework/block/composite_block_manager.h"

References

No relative paths in #include. Always use project-root-relative paths. ^(link)

gemini-code-assist · 2026-07-01T03:37:01Z

+    auto ret = aclrtRecordEvent(*impl_.get_event(layer_index),
+                                stream->get_stream()->stream());


According to the repository style guide, we should not use auto for simple/primitive types. aclError is a simple/primitive type (typedef of uint32_t or enum), so it should be explicitly typed.

Suggested change

auto ret = aclrtRecordEvent(*impl_.get_event(layer_index),

stream->get_stream()->stream());

const aclError ret = aclrtRecordEvent(*impl_.get_event(layer_index),

stream->get_stream()->stream());

References

Do not use auto for simple/primitive types. ^(link)

gemini-code-assist · 2026-07-01T03:37:01Z

+  CHECK(base_ptr != MAP_FAILED)
+      << "Failed to mmap host memory, size=" << total_bytes;
+  if (mlock(base_ptr, total_bytes) != 0) {
+    const int err = errno;


According to the repository style guide, we should use fixed-width integers (int32_t, int64_t) instead of plain int.

Suggested change

const int err = errno;

const int32_t err = errno;

References

Use fixed-width integers (int32_t, int64_t) instead of plain int, unless the API you are calling explicitly requires int. ^(link)

gemini-code-assist · 2026-07-01T03:37:01Z

+                                  count,
+                                  attrs,
+                                  attrs_indexes,
+                                  1,


According to the repository style guide, constant arguments must be annotated with a comment indicating the parameter name when calling functions or constructors.

Suggested change

1,

/*attrsCount=*/1,

References

Annotate constant arguments with a comment indicating the parameter name when calling functions or constructors. ^(link)

gemini-code-assist · 2026-07-01T03:37:01Z

+        .host_blocks_factor(options_.host_blocks_factor())
+        .layers_wise_copy_batchs(options_.layers_wise_copy_batchs())
+        .enable_mla(options_.enable_mla())
+        .enable_kvcache_store(false);


According to the repository style guide, constant arguments must be annotated with a comment indicating the parameter name when calling functions or constructors.

Suggested change

.enable_kvcache_store(false);

.enable_kvcache_store(/*enable_kvcache_store=*/false);

References

Annotate constant arguments with a comment indicating the parameter name when calling functions or constructors. ^(link)

Kang-Meng added 3 commits June 29, 2026 17:07

feat: re-enable HierarchyBlockManagerPool Device-Host transfer framew…

5cc2dfb

…ork.

feat: allocate host KV cache as page-aligned pinned memory in KVCache…

288f3e4

… impls.

feat: restore D2H offload collection.

94180e0

gemini-code-assist Bot reviewed Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ref blockmgr4#1861

Ref blockmgr4#1861
Kang-Meng wants to merge 3 commits into
jd-opensource:mainfrom
Kang-Meng:ref_blockmgr4

Kang-Meng commented Jul 1, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		for (int i = 0; i < load_threadpool_->size() + offload_threadpool_->size();
		++i) {

	#include "composite_block_manager.h"
	#include "core/framework/block/composite_block_manager.h"

		auto ret = aclrtRecordEvent(*impl_.get_event(layer_index),
		stream->get_stream()->stream());

	.enable_kvcache_store(false);
	.enable_kvcache_store(/enable_kvcache_store=/false);

Uh oh!

Conversation

Kang-Meng commented Jul 1, 2026

Description

Related Issues

Change Type

Pull Request Checklist

PR Title and Commit Messages

Pre-commit Checks

Self Review

Build and Test Coverage

Reviewer Notes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant