[Cherry-Pick][Optimization] Support multimodal runner for image/video… #7576
base: release/2.6
@@ -78,6 +78,7 @@
        speculate_schedule_cache,
        set_data_ipc,
        unset_data_ipc,
        update_attn_mask_offsets,
    )

    import zmq
@@ -179,6 +180,9 @@ def __init__(
        else:
            self.encoder_cache = None

        # Note(Zhengshifeng) init video cache for VL model
        self.video_cache = {}
Comment on lines +183 to +185:

    # Note(Zhengshifeng) init video cache for VL model
    self.video_cache = {}
Copilot AI, Apr 23, 2026:

The new pre-encoded feature path (image_feature_urls/image_features plus image_grid_thws) and the prefill-stage attention_mask_offset fill/update are core inference logic, but tests/worker/test_gpu_model_runner.py currently has no coverage for these branches. Consider adding unit tests: construct requests carrying image_feature_urls (with and without image_grid_thws, with mismatched shapes, etc.) and verify the output types and error handling of process_mm_features; also cover the basic shape constraints of update_attn_mask_offsets plus the copy.
Copilot AI, Apr 23, 2026:

This branch only checks that image_feature_urls is non-empty, but then directly accesses inputs["image_grid_thws"]/inputs["image_features"]. ResourceManagerV1._download_features only populates image_features (image_grid_thws is not guaranteed to exist), so this branch can raise a KeyError or hit slicing errors from mismatched lengths. Consider explicitly validating that both fields exist and that their lengths align with image_feature_urls; when they are missing, set a clear error_message/error_code on the request or raise an exception.
Suggested change. Replace:

    multi_vision_inputs["image_grid_thws"].extend(
        inputs["image_grid_thws"][request.image_start : request.image_end]
    )
    image_feature = inputs["image_features"][request.image_start : request.image_end]

with:

    image_feature_urls = inputs["image_feature_urls"]
    image_grid_thws = inputs.get("image_grid_thws")
    image_features = inputs.get("image_features")
    image_start = request.image_start
    image_end = request.image_end
    if image_grid_thws is None or image_features is None:
        raise ValueError(
            "Missing multimodal input fields for image features: "
            f"request_idx={request.idx}, "
            f"has_image_feature_urls={image_feature_urls is not None}, "
            f"has_image_features={image_features is not None}, "
            f"has_image_grid_thws={image_grid_thws is not None}"
        )
    if not (
        len(image_feature_urls) == len(image_features) == len(image_grid_thws)
    ):
        raise ValueError(
            "Mismatched multimodal input lengths: "
            f"request_idx={request.idx}, "
            f"image_feature_urls={len(image_feature_urls)}, "
            f"image_features={len(image_features)}, "
            f"image_grid_thws={len(image_grid_thws)}"
        )
    if not (0 <= image_start <= image_end <= len(image_feature_urls)):
        raise ValueError(
            "Invalid image slice range: "
            f"request_idx={request.idx}, "
            f"image_start={image_start}, "
            f"image_end={image_end}, "
            f"total_images={len(image_feature_urls)}"
        )
    multi_vision_inputs["image_grid_thws"].extend(
        image_grid_thws[image_start:image_end]
    )
    image_feature = image_features[image_start:image_end]
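For illustration, the validation suggested above can be factored into a standalone helper and exercised with plain Python stand-ins. This is a sketch: the helper name `validate_image_inputs`, the dict layout, and the sample values are hypothetical, not code from the PR.

```python
def validate_image_inputs(inputs, image_start, image_end):
    """Validate pre-encoded image feature fields before slicing.

    Mirrors the checks from the review suggestion: required keys present,
    lengths aligned with image_feature_urls, slice range within bounds.
    """
    urls = inputs["image_feature_urls"]
    feats = inputs.get("image_features")
    thws = inputs.get("image_grid_thws")
    if feats is None or thws is None:
        raise ValueError("missing image_features or image_grid_thws")
    if not (len(urls) == len(feats) == len(thws)):
        raise ValueError(
            f"mismatched lengths: urls={len(urls)}, feats={len(feats)}, thws={len(thws)}"
        )
    if not (0 <= image_start <= image_end <= len(urls)):
        raise ValueError(
            f"invalid slice [{image_start}:{image_end}] for {len(urls)} images"
        )
    return feats[image_start:image_end], thws[image_start:image_end]

# A well-formed input passes and returns the sliced fields:
ok = {
    "image_feature_urls": ["u0", "u1"],
    "image_features": ["f0", "f1"],
    "image_grid_thws": [(1, 2, 2), (1, 4, 4)],
}
print(validate_image_inputs(ok, 0, 1))  # (['f0'], [(1, 2, 2)])
```

A unit test along the lines of the first review comment would then feed this helper inputs with image_grid_thws missing, or with mismatched lengths, and assert that a ValueError is raised.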
Copilot AI, Apr 23, 2026:

When a hidden_size mismatch is detected on image_feature_tensor, this code only calls logger.error and keeps going; the subsequent concat/model forward will very likely crash on the shape inconsistency or produce wrong results. Fail fast when the mismatch is found (raise an exception, or set request.error_message/error_code and skip the request), and include both the expected shape (e.g. [*, hidden_size]) and the actual shape in the error message.
Suggested change. Replace:

    for image_feature_tensor in image_feature:
        if image_feature_tensor.shape[1] != self.fd_config.model_config.hidden_size:
            logger.error(
                f"Shape mismatch: expected shape={self.fd_config.model_config.hidden_size}, \
                but got {image_feature_tensor.shape}"
            )

with:

    expected_hidden_size = self.fd_config.model_config.hidden_size
    for image_feature_tensor in image_feature:
        if image_feature_tensor.shape[1] != expected_hidden_size:
            error_message = (
                f"Image feature hidden size mismatch for request idx={request.idx}: "
                f"expected shape [*, {expected_hidden_size}], "
                f"but got {list(image_feature_tensor.shape)}"
            )
            logger.error(error_message)
            raise ValueError(error_message)
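The fail-fast pattern can be demonstrated on plain shape tuples, without any tensor framework. The helper name and sample shapes below are illustrative only:

```python
def check_hidden_size(shapes, expected_hidden_size):
    """Raise as soon as any feature's hidden dim deviates, instead of
    logging and continuing into a concat that would fail anyway."""
    for i, shape in enumerate(shapes):
        if shape[1] != expected_hidden_size:
            raise ValueError(
                f"image feature {i}: expected shape [*, {expected_hidden_size}], "
                f"got {list(shape)}"
            )

# All features match the expected hidden size: no error is raised.
check_hidden_size([(10, 4096), (7, 4096)], 4096)
```

The point of raising early is that the error names the offending feature and both shapes, rather than surfacing later as an opaque concat or forward-pass failure.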
Copilot AI, Apr 23, 2026:

Using vf.cuda() here hard-codes the CUDA device, and migrating tensors one by one may add overhead; other paths in this runner consistently use .to(self.device) to keep device placement uniform. Prefer image_feature_tensor.to(self.device) (or place tensors on the target device once, during to_tensor), and minimize per-tensor copies.
Suggested change. Replace:

    image_features_gpu = [vf.cuda() for vf in image_feature]
    image_embeds = paddle.concat(image_features_gpu, axis=0)

with:

    image_embeds = paddle.concat(image_feature, axis=0).to(self.device)
🔴 Bug: inputs may be None, but inputs.get(...) is called here without a None check

When request.multimodal_inputs returns None (a non-multimodal request entering the prefill branch), inputs.get("attention_mask_offset", None) raises AttributeError: 'NoneType' object has no attribute 'get'.

Suggested fix:

    if self.enable_mm:
        self.share_inputs["decode_states"][idx, 0] = 0
        inputs = request.multimodal_inputs
        if inputs is not None:
            attn_offset_len = prefill_end_index - prefill_start_index
            if inputs.get("attention_mask_offset", None) is None:
                attention_mask_offset_slice = np.arange(...)
                ...
Copilot AI, Apr 23, 2026:

On the GPU side, update_attn_mask_offsets returns attn_mask_offsets of length ids_remove_padding.shape[0] * 2 (see the operator implementation), and copying it directly with copy_ into share_inputs["attn_mask_offsets"] requires the target buffer shape to match exactly. Please confirm that the preallocated length of attn_mask_offsets in InputBatch/ProposerInputBatch is max_token_capacity * 2; otherwise this will fail at runtime with a shape mismatch.
Suggested change. Replace:

    self.share_inputs["attn_mask_offsets"].copy_(attn_mask_offsets, False)

with:

    attn_mask_offsets_buffer = self.share_inputs["attn_mask_offsets"]
    attn_mask_offsets_len = attn_mask_offsets.shape[0]
    if attn_mask_offsets_buffer.numel() < attn_mask_offsets_len:
        raise RuntimeError(
            "attn_mask_offsets buffer capacity is insufficient: "
            f"required={attn_mask_offsets_len}, "
            f"capacity={attn_mask_offsets_buffer.numel()}. "
            "Please ensure the preallocated attn_mask_offsets buffer "
            "has capacity for max_token_capacity * 2."
        )
    attn_mask_offsets_buffer[:attn_mask_offsets_len].copy_(attn_mask_offsets, False)
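The capacity guard plus prefix copy in the suggestion reduces to this pattern, shown here with a plain list standing in for the preallocated paddle tensor. The helper name and buffer sizes are hypothetical:

```python
def copy_into_buffer(buffer, values):
    """Copy values into the prefix of a preallocated flat buffer,
    failing loudly when the capacity is insufficient instead of
    letting a shape-mismatched copy_ crash later."""
    if len(buffer) < len(values):
        raise RuntimeError(
            f"buffer capacity insufficient: required={len(values)}, "
            f"capacity={len(buffer)}"
        )
    buffer[: len(values)] = values

buf = [-1] * 8  # stands in for a buffer preallocated at max_token_capacity * 2
copy_into_buffer(buf, [0, 1, 2, 3])
print(buf)  # [0, 1, 2, 3, -1, -1, -1, -1]
```

Writing into a prefix slice (rather than the whole buffer) is what makes a variable-length per-step result compatible with a fixed-capacity preallocation; the untouched tail keeps its fill value.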
@@ -228,7 +228,11 @@ def init_share_inputs(self):
        )
        if self.is_mm_model:
            self.image_features = None
            self.image_grid_thws = None
            self.image_features_list = None
            self.video_features = None
            self.video_grid_thws = None
            self.video_infinity_scales = None

        # Set block tables
        pre_max_block_num = (
@@ -345,7 +349,26 @@ def init_share_inputs(self):
            dtype="float32",
        )
        self.image_features = None  # Built before the forward
        self.image_grid_thws = None
        self.image_features_list = None
        self.video_features = None
        self.video_grid_thws = None
        self.video_infinity_scales = None

        decode_states_len = self.speculative_config.num_speculative_tokens + 1 if self.speculative_decoding else 1
        self.decode_states = paddle.full(
            [self.scheduler_config.max_num_seqs, decode_states_len],
            -1,
            dtype="int32",
        )
        self.attn_mask_offsets = paddle.full(
            shape=[self.scheduler_config.max_num_seqs * self.model_config.max_model_len],
Comment on lines +364 to +365:

Suggested change. Replace:

    self.attn_mask_offsets = paddle.full(
        shape=[self.scheduler_config.max_num_seqs * self.model_config.max_model_len],

with:

    attn_mask_token_capacity = self.scheduler_config.max_num_seqs * self.model_config.max_model_len
    self.attn_mask_offsets = paddle.full(
        shape=[attn_mask_token_capacity * 2],
🟡 Suggestion: the non-speculative path is missing attn_mask_offsets_decoder initialization

swap_states (line 939) accesses self.attn_mask_offsets_decoder under the self.enable_mm condition, but the newly added multimodal initialization branch in init_share_inputs (non-speculative) only initializes attn_mask_offsets_full and attn_mask_offsets. It is missing:

    self.attn_mask_offsets_decoder = paddle.full([self.scheduler_config.max_num_seqs, 1], -1, dtype="int32")

If the non-speculative path also needs swap_states, this will raise an AttributeError.
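One defensive variant of the fix above is to guard the attribute lazily before swap_states can touch it. This is a plain-Python sketch, not the PR's code: the Runner class is a minimal stand-in, a nested list stands in for the paddle tensor of shape [max_num_seqs, 1], and the helper name is hypothetical.

```python
class Runner:
    """Minimal stand-in for the GPU model runner."""
    pass

def ensure_decoder_offsets(runner, max_num_seqs):
    # Lazily create attn_mask_offsets_decoder so the non-speculative
    # path cannot hit AttributeError inside swap_states. Real code
    # would allocate a paddle tensor [max_num_seqs, 1] filled with -1.
    if not hasattr(runner, "attn_mask_offsets_decoder"):
        runner.attn_mask_offsets_decoder = [[-1] for _ in range(max_num_seqs)]
    return runner.attn_mask_offsets_decoder

r = Runner()
print(ensure_decoder_offsets(r, 2))  # [[-1], [-1]]
```

Initializing the attribute unconditionally in init_share_inputs, as the comment proposes, is the cleaner fix; the lazy guard only illustrates how to make the access safe at the use site.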
🟡 Suggestion: swap_data adds swaps for decode_states and attn_mask_offsets_full, but misses attn_mask_offsets

Compared with the logic in swap_states (line 937), this swap_data block (the init_share_inputs path) adds decode_states and attn_mask_offsets_full but omits attn_mask_offsets. If the two swap paths are inconsistent, attn_mask_offsets data may be corrupted in some preemption scenarios. Consider adding:

    swap_data(self.attn_mask_offsets, i1, i2)
Copilot AI, Apr 23, 2026:

The fill(-1) on attn_mask_offsets in reset_share_inputs is fine, but make sure the preallocated shape of attn_mask_offsets matches the output of update_attn_mask_offsets (token_num * 2). With the smaller shape from the current initialization, the copy in prepare_inputs will fail. Consider maintaining the * 2 shape here as well.
Suggested change. Replace:

    fill_paddle_tensor(self, "attn_mask_offsets", -1)
    fill_paddle_tensor(self, "attn_mask_offsets_full", -1)

with:

    attn_mask_token_num = max_num_seqs * self.model_config.max_model_len
    attn_mask_offsets_shape = [attn_mask_token_num * 2]
    attn_mask_offsets = getattr(self, "attn_mask_offsets", None)
    if attn_mask_offsets is None or list(attn_mask_offsets.shape) != attn_mask_offsets_shape:
        attn_mask_offsets_dtype = attn_mask_offsets.dtype if attn_mask_offsets is not None else "int32"
        self.attn_mask_offsets = paddle.full(
            shape=attn_mask_offsets_shape,
            fill_value=-1,
            dtype=attn_mask_offsets_dtype,
        )
    else:
        fill_paddle_tensor(self, "attn_mask_offsets", -1)
    attn_mask_offsets_full = getattr(self, "attn_mask_offsets_full", None)
    if attn_mask_offsets_full is None or list(attn_mask_offsets_full.shape) != attn_mask_offsets_shape:
        attn_mask_offsets_full_dtype = (
            attn_mask_offsets_full.dtype if attn_mask_offsets_full is not None else "int32"
        )
        self.attn_mask_offsets_full = paddle.full(
            shape=attn_mask_offsets_shape,
            fill_value=-1,
            dtype=attn_mask_offsets_full_dtype,
        )
    else:
        fill_paddle_tensor(self, "attn_mask_offsets_full", -1)
Copilot AI, Apr 23, 2026:

The newly added swap_data(self.attn_mask_offsets, i1, i2) is likely wrong: attn_mask_offsets is a token-flattened one-dimensional buffer (its length tracks the ids_remove_padding token count, and update_attn_mask_offsets outputs token_num * 2), so swapping i1/i2 along the batch dimension only exchanges single elements and cannot stay consistent with cu_seqlens_q/query_start, which easily corrupts the attention mask offsets. Either do not swap this 1-D buffer during reorder/swap and instead recompute it after each pre_process and copy_ it in one piece, or change its storage to a batch-aligned 2-D layout and then implement a correct swap.
Suggested change. Replace:

    swap_data(self.attn_mask_offsets, i1, i2)
    swap_data(self.attn_mask_offsets_full, i1, i2)
    swap_data(self.attn_mask_offsets_decoder, i1, i2)

with:

    # Attention mask offset buffers may be token-flattened derived state
    # rather than batch-aligned storage. Swapping a single element by
    # batch index can break consistency with the flattened token layout.
    # Keep them untouched here and let the later preprocessing stage
    # rebuild them from the current batch layout.
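The layout distinction behind this comment can be seen with plain lists: swapping by batch index is only meaningful for batch-aligned 2-D storage, while on a token-flattened 1-D buffer it merely exchanges two scalars. The toy data and names below are illustrative, not from the PR:

```python
def swap_rows(batch_buf, i1, i2):
    # Batch-aligned layout: each row holds one request's offsets,
    # so a row swap moves the request's entire state atomically.
    batch_buf[i1], batch_buf[i2] = batch_buf[i2], batch_buf[i1]

rows = [[0, 1], [10, 11], [20, 21]]  # one row per request
swap_rows(rows, 0, 2)
print(rows)  # [[20, 21], [10, 11], [0, 1]]

# Token-flattened layout: the same data laid out per token.
flat = [0, 1, 10, 11, 20, 21]
flat[0], flat[2] = flat[2], flat[0]  # batch-index swap only moves scalars
print(flat)  # [10, 1, 0, 11, 20, 21] -- request boundaries are now wrong
```

This is why the review recommends either recomputing the flattened buffer wholesale after each pre_process or moving to a batch-aligned 2-D layout before implementing a swap.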
🟡 Suggestion

self.video_cache = {} is initialized but never read or written anywhere else in this file; video_cache is unused across all changes in this PR (video_features is passed directly through self.share_inputs). If this field is reserved for later work, add a comment explaining its intended use; otherwise remove it to avoid confusion.