fix(bert): propagate multimodal features to Python BertModel forward by parkerpang · Pull Request #1137 · alibaba/rtp-llm

parkerpang · 2026-06-23T17:16:24Z

After the C++ GptModel removal (b6575e4), the Python BertModel.forward path never received computed vision embeddings from the C++ engine. Vision token positions (placeholder IDs -1/-2/-3) went through word embedding lookup producing garbage hidden states, causing ~5% argmax disagreement vs PyTorch baseline on VisionBert.

Fix:

Add multimodal_features and mm_features_locs fields to BertEmbeddingInputs
Expose them via pybind11
Populate them in PyWrappedModel::buildBertEmbeddingInputs
Splice vision embeddings into hidden_states after pre-LN in BertModel.forward

LLLLKKKK · 2026-06-23T18:11:27Z

AI Code Review - PR #1137

Status: BLOCKING

Summary: P0/0 · P1/1 · P2/3 · P3/0

Blocking Issues

P1

Multimodal feature 拼接缺少越界校验和负偏移处理 @ rtp_llm/models_py/model_desc/bert.py:160
- 建议：复用 MultimodalEmbeddingInjector 替代内联代码：在 init 中 self.multimodal_embedding_injector = MultimodalEmbeddingInjector()，forward 中调用 hidden_states = self.multimodal_embedding_injector(hidden_states, mm_features, mm_locs)。参考 multimodal_generic.py:28-30 用法。如必须内联，至少添加 loc<0 截断、start+length 越界 raise、features/locs 长度一致性校验。

Non-blocking Suggestions

P2

未复用 MultimodalEmbeddingInjector，内联拼接代码违反 DRY @ rtp_llm/models_py/model_desc/bert.py:147
- 建议：import MultimodalEmbeddingInjector 并在 init 中实例化，forward 中调用 self.multimodal_embedding_injector(hidden_states, mm_features, mm_locs) 替代内联代码。拼接时机在 pre_decoder_layernorm 之后语义正确，与 GPT 路径在 embed_tokens 之后拼接一致。P1 修复后此问题自动解决。
getattr 字面量访问 pybind11 已注册属性 @ rtp_llm/models_py/model_desc/bert.py:152
- 建议：改为直接属性访问：mm_features = bert_embedding_inputs.multimodal_features；判空条件改为 if mm_features and bert_embedding_inputs.mm_features_locs.numel() > 0。复用 MultimodalEmbeddingInjector 后此问题自动解决。
新增多模态 Bert 路径缺少测试覆盖 @ rtp_llm/models_py/model_desc/bert.py:147
- 建议：添加 Python 单测：构造 mock BertEmbeddingInputs 设置 multimodal_features + mm_features_locs，调用 BertModel.forward 验证 hidden_states 在对应位置被正确覆盖、其他位置不变。如有多模态 Bert 模型权重，同步添加 smoke case。

Checklist Violations (9 fail / 88 total)

General Principles Checklist

[6.1] Software Engineering — DRY：重复非平凡逻辑被抽取或显式复用 → issue 未复用 MultimodalEmbeddingInjector，内联拼接代码违反 DRY
bert.py:152-171 的 splicing 逻辑与 MultimodalEmbeddingInjector（multimodal_embedding.py:30-89）功能重复但校验不完整。multimodal_generic.py 和 qwen3vl.py 已复用该注入器。
[6.1] Architecture — 错误语义：fail-fast/retry/fallback/silent 行为显式 → issue Multimodal feature 拼接缺少越界校验和负偏移处理
bert.py:160-171 缺少越界和长度不匹配校验，越界时 PyTorch 切片静默截断不报错，非法输入行为为 silent corruption。MultimodalEmbeddingInjector 通过 raise IndexError/ValueError 实现 fail-fast。
[6.1] Tests — 新逻辑有聚焦单测 + 相关集成/smoke 测试 → issue 新增多模态 Bert 路径缺少测试覆盖
新增 C++ 传播 + pybind 绑定 + Python forward 拼接三层逻辑，仓库内无任何 multimodal + Bert 测试。
[6.1] Tests — 边界 case 覆盖（空、单元素、最大值） → issue 新增多模态 Bert 路径缺少测试覆盖
缺少边界测试：空 multimodal_features、features/locs 长度不匹配、loc 越界、负 loc 值等场景均未覆盖。

RTP-LLM Checklist

[B] 正确性与逻辑 — 逻辑错误、off-by-one、null/zero 检查 → issue Multimodal feature 拼接缺少越界校验和负偏移处理
bert.py:164 locs_cpu[i] 当 len(mm_features)>mm_locs.numel() 时 IndexError；bert.py:171 hidden_states[start:start+length] 无越界检查，PyTorch 切片越界时静默截断而非报错，导致部分特征丢弃无错误信号。MultimodalEmbeddingInjector:81-84 有显式 IndexError。
[B] 正确性与逻辑 — 边界 case（空输入、单元素、最大值） → issue Multimodal feature 拼接缺少越界校验和负偏移处理
负 loc 值（prefix-cached 图片，MultimodalEmbeddingInjector:72-78 有裁剪处理）、features/locs 长度不匹配、feature 超出 hidden_states 边界等边界 case 均未处理。
[H] 测试与 CI — 测试覆盖充分：大重构等价覆盖，新功能端到端测试 → issue 新增多模态 Bert 路径缺少测试覆盖
Bert 模型新增多模态 feature splicing 功能，涉及 C++ → pybind → Python 三层变更，无单元测试或 smoke 测试覆盖该路径。
[I] 代码质量 — 同一功能用统一工具函数 → issue 未复用 MultimodalEmbeddingInjector，内联拼接代码违反 DRY
多模态 embedding splicing 已有统一工具 MultimodalEmbeddingInjector（multimodal_generic.py/qwen3vl.py 已复用），Bert 路径内联实现了简化版但缺少校验逻辑。

Python Static-First Checklist

[P.A] 静态结构与类型纪律 — 禁止 getattr/setattr literal 访问 → issue getattr 字面量访问 pybind11 已注册属性
bert.py:152-153 使用 getattr(bert_embedding_inputs, "multimodal_features", None)。该字段已在 pybind struct 中通过 def_readwrite 注册，属性始终存在（默认空 vector），getattr fallback 永远不触发，应直接属性访问。

Strengths

pybind11 新增字段有安全 C++ 默认值（空 vector / undefined Tensor），不影响非多模态 Bert 模型的现有行为
C++/pybind/Python 三层改动配合完整，架构方向正确，与现有 PyMultimodalInputs 模式一致

LLLLKKKK · 2026-06-24T03:32:30Z

AI Code Review - PR #1137

Status: BLOCKING

Summary: P0/0 · P1/1 · P2/0 · P3/1

Blocking Issues

P1

multimodal_features 未移至 CUDA，与 buildPyMultimodalInputs 不一致 @ rtp_llm/cpp/models/PyWrappedModel.cc:246
- 建议：参考 buildPyMultimodalInputs 的模式，在循环中对每个 feature 调用 .cuda() 后再赋值：
  std::vectortorch::Tensor mm_feats;
  for (const auto& f : inputs.multimodal_features.value()) {
  mm_feats.emplace_back(f.cuda());
  }
  bert_embedding_inputs.multimodal_features = std::move(mm_feats);

Non-blocking Suggestions

P3

单元测试直接在 CUDA 上构造 features，未覆盖 host→device 路径 @ rtp_llm/models_py/model_desc/test/bert_multimodal_test.py:72
- 建议：可考虑增加一个 test case 将 feature 放在 CPU 上，验证 MultimodalEmbeddingInjector 的 device 迁移能力，以对齐生产路径。

Checklist Violations (2 fail / 88 total)

RTP-LLM Checklist

[B] 正确性与逻辑 — 逻辑错误、off-by-one、null/zero 检查 → issue multimodal_features 未移至 CUDA，与 buildPyMultimodalInputs 不一致
buildBertEmbeddingInputs 对 multimodal_features 仅做 has_value() 检查后直接赋值，未像 buildPyMultimodalInputs 那样调用 .cuda() 搬到 GPU。holdInputsHostBuffers 证实 tensor 在 host pinned memory，Python 侧虽不 crash 但引入隐式 device 迁移开销。
[I] 代码质量 — 同一功能用统一工具函数 → issue multimodal_features 未移至 CUDA，与 buildPyMultimodalInputs 不一致
buildPyMultimodalInputs 对 features 逐个 .cuda()，buildBertEmbeddingInputs 直接拷贝不 .cuda()，同一 multimodal_features 数据源的 CUDA 搬运方式不统一

Strengths

复用已有 MultimodalEmbeddingInjector 模块，避免重复实现，与 decoder 模型的多模态注入路径保持一致
测试覆盖全面：正常注入、空特征、负偏移截断、越界、长度不匹配共 5 个 case，且使用轻量 stub 避免加载真实权重
pybind 暴露正确：新字段有合理默认值（vector 默认空、Tensor 默认 undefined），不破坏已有消费者
变更聚焦且最小化：仅传播必要字段 + 调用 1 个 injector，无多余抽象

LLLLKKKK · 2026-06-24T05:07:46Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P3

测试中 assertRaises 未验证异常消息内容 @ rtp_llm/models_py/model_desc/test/bert_multimodal_test.py:122
- 建议：改用 self.assertRaisesRegex(IndexError, 'cannot be placed at loc') 和 self.assertRaisesRegex(ValueError, 'entries but .* features') 精确匹配 MultimodalEmbeddingInjector 的错误消息。

Checklist Violations (3 fail / 67 total)

General Principles Checklist

[6.1] Tests — 新逻辑有聚焦单测 + 相关集成/smoke 测试 → checklist-only
bert_multimodal_test.py 提供聚焦单测覆盖 Python 层 BertModel.forward 的注入接线（5 个场景），但 C++ buildBertEmbeddingInputs 传播路径和 pybind 绑定无集成/smoke 测试。当首个 BERT 多模态模型就绪后建议补充 smoke 测试。

RTP-LLM Checklist

[H] 测试与 CI — 测试覆盖充分：大重构等价覆盖，新功能端到端测试 → checklist-only
新功能有聚焦单测覆盖 Python 层接线，但 C++ → pybind → Python 的完整集成路径缺少端到端测试。待首个 BERT 多模态模型就绪后补充 smoke 测试即可，当前不升级为 issue。

Python Static-First Checklist

[P.G] 测试规范 — pytest.raises 带 match 参数 → issue 测试中 assertRaises 未验证异常消息内容
test_out_of_range_loc_raises (line 122) 使用 self.assertRaises(IndexError)、test_locs_length_mismatch_raises (line 133) 使用 self.assertRaises(ValueError)，均未验证异常消息内容。若 forward 中其他位置也抛出同类异常，测试可能误通过。

Strengths

复用已有 MultimodalEmbeddingInjector 共享模块，与 qwen3vl、multimodal_generic、qwen3_next 等 6 个模型共享同一注入实现，避免重复代码
测试覆盖 5 个场景：正常多 feature 注入、空输入 early return、负 loc 前缀缓存截断、越界 IndexError、数量不匹配 ValueError，边界覆盖充分
pybind 新字段有 C++ 隐式默认值（空 vector + 未定义 tensor），无多模态的 BERT 模型完全不受影响，向后兼容

LLLLKKKK · 2026-06-24T13:45:29Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/0

lgtm ready to ci

Non-blocking Suggestions

P2

多模态 .cuda() 调用缺少 DevicePerfWrapper 包装 @ rtp_llm/cpp/models/PyWrappedModel.cc:249
- 建议：为循环内的 feature.cuda() 和 mm_features_locs.cuda() 添加 DevicePerfWrapper，标签如 "py model multimodal_features.cuda()" 和 "py model mm_features_locs.cuda()"，与同函数中现有模式保持一致。

Checklist Violations (2 fail / 88 total)

General Principles Checklist

[6.1] Architecture — 可观测性：日志/指标/超时可操作、非噪声 → issue 多模态 .cuda() 调用缺少 DevicePerfWrapper 包装
同函数中现有 .cuda() 调用均有 DevicePerfWrapper 包装用于 profiling，但新增的 multimodal feature .cuda() 传输未包装。视觉特征可能较大，profiling 中不可见会影响性能诊断。

RTP-LLM Checklist

[I] 代码质量 — 同一功能用统一工具函数 → issue 多模态 .cuda() 调用缺少 DevicePerfWrapper 包装
buildBertEmbeddingInputs 中现有 .cuda() 调用均使用 DevicePerfWrapper 包装，但新增多模态 .cuda() 调用未使用，同函数内工具使用不一致。

Strengths

复用已有 MultimodalEmbeddingInjector 共享组件，与 MultimodalGenericModel/Qwen3VL 等模型保持一致的设计模式，避免重复实现
测试覆盖全面：正常注入、空特征、负偏移（前缀缓存）、越界、长度不匹配等边界 case 均有覆盖
C++ struct 新字段有安全默认值（空 vector + 未定义 tensor），pybind 绑定有描述性 docstring，兼容无多模态场景

LLLLKKKK · 2026-06-25T07:19:08Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/2 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P2

多模态特征在 buildBertEmbeddingInputs 和 buildPyMultimodalInputs 中被重复拷贝到 GPU @ rtp_llm/cpp/models/PyWrappedModel.cc:245
- 建议：在 buildBertEmbeddingInputs 中判断是否为 Bert 模型路径再拷贝，或让 buildBertEmbeddingInputs 直接复用 buildPyMultimodalInputs 已完成的 CUDA tensor（通过参数传入），避免冗余 host→device 拷贝
micro-batch 路径下 multimodal_features 和 mm_features_locs 未被切分 @ rtp_llm/cpp/models/PyWrappedModel.cc:825
- 建议：当前 Bert 模型不使用 micro-batch（enable_layer_micro_batch 用于 MoE），风险为潜在而非现实。建议在 buildBertEmbeddingInputs 入口添加防御性断言禁止 micro-batch + multimodal 同时启用，或在 splitInputsIntoMicroBatches 中正确切分 multimodal 相关字段

P3

.pyi 类型桩未同步更新新增 BertEmbeddingInputs 字段 @ rtp_llm/models_py/bindings/OpDefs.cc:146
- 建议：在 init.pyi 中为 BertEmbeddingInputs 添加 multimodal_features: list[torch.Tensor] 和 mm_features_locs: torch.Tensor 属性声明

Checklist Violations (4 fail / 94 total)

General Principles Checklist

[6.1] Software Engineering — DRY：重复非平凡逻辑被抽取或显式复用 → issue 多模态特征在 buildBertEmbeddingInputs 和 buildPyMultimodalInputs 中被重复拷贝到 GPU
buildBertEmbeddingInputs 和 buildPyMultimodalInputs 中 multimodal_features 的 .cuda() 拷贝逻辑完全重复，同一请求生成两份 GPU tensor

RTP-LLM Checklist

[B] 正确性与逻辑 — Bypass/shortcut 路径包含新增变换步骤 → issue micro-batch 路径下 multimodal_features 和 mm_features_locs 未被切分
forwardMicroBatched 路径调用 buildBertEmbeddingInputs(micro_inputs)，但 splitInputsIntoMicroBatches 未切分 multimodal_features/mm_features_locs 字段，micro-batch combo_tokens 被 narrow 后 locs 偏移可能越界
[I] 代码质量 — 同一功能用统一工具函数 → issue 多模态特征在 buildBertEmbeddingInputs 和 buildPyMultimodalInputs 中被重复拷贝到 GPU
buildBertEmbeddingInputs 和 buildPyMultimodalInputs 中 multimodal_features 的 .cuda() 逻辑完全重复，应提取为共用函数或复用一方结果

Python Static-First Checklist

[P.G] 测试规范 — 数据驱动测试用 pytest.mark.parametrize → checklist-only
5 个测试方法各自硬编码 hidden_size/seq_len/dtype，可对 dtype 等参数做 parametrize 扩展覆盖。但各方法测试不同语义场景（正常注入、空特征、负偏移、越界、不匹配），独立方法更清晰，不升级为 issue。

Strengths

复用已有 MultimodalEmbeddingInjector 公共模块，与 MultimodalGenericModel、Qwen3VLModel 等共享注入逻辑，避免代码重复
测试覆盖全面：bert_multimodal_test.py 覆盖正常注入、空特征、负偏移截断（KV 前缀缓存场景）、越界报错、长度不匹配报错 5 个关键场景
C++ BertEmbeddingInputs 新增字段有合理默认初始化（空 vector 和 undefined Tensor），pybind 5 参数构造函数通过 C++17 聚合初始化自动 value-init 新字段，确保向后兼容

LLLLKKKK · 2026-06-25T07:57:25Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P2

mm_features_locs 独立于 multimodal_features 传播可能造成冗余 GPU 搬运 @ rtp_llm/cpp/models/PyWrappedModel.cc:254
- 建议：将 mm_features_locs 的搬运放到 multimodal_features.has_value() 守卫内部，仅在有 features 时才搬运 locs：if (inputs.multimodal_features.has_value()) { ... if (inputs.mm_features_locs.defined()) { ... } }

P3

测试异常断言缺少消息验证 @ rtp_llm/models_py/model_desc/test/bert_multimodal_test.py:123
- 建议：改用 self.assertRaisesRegex(IndexError, 'exceeds') 和 self.assertRaisesRegex(ValueError, 'must equal') 添加消息匹配，确保捕获到的是 MultimodalEmbeddingInjector 的预期错误。

Checklist Violations (1 fail / 67 total)

Python Static-First Checklist

[P.G] 测试规范 — pytest.raises 带 match 参数 → issue 测试异常断言缺少消息验证
bert_multimodal_test.py:123 和 :134 使用 self.assertRaises 仅检查异常类型，未验证消息内容，可能捕获非预期来源的同类型异常。

Strengths

复用已有的 MultimodalEmbeddingInjector 模块，C++ 层 propagation 遵循 buildPyMultimodalInputs 的既有模式，避免重复实现
C++ struct 新字段有默认值（空 vector / undefined Tensor），pybind def_readwrite 注册完整，确保未设置多模态的 Bert 请求向后兼容
单测覆盖全面：正常注入、空特征提前返回、负位置（KV cache 前缀复用截断）、越界、长度不匹配五个场景均有覆盖
多模态注入位置正确——在 pre_decoder_layernorm 之后、transformer layers 之前，与生成式模型路径一致

LLLLKKKK · 2026-06-25T11:36:36Z

+        mm_feats.reserve(inputs.multimodal_features.value().size());
+        for (const auto& feature : inputs.multimodal_features.value()) {
+            DevicePerfWrapper wrapper(enable_device_perf_, "py model multimodal_features.cuda()");
+            mm_feats.emplace_back(feature.cuda());


tensorHoldHostAndToCuda

LLLLKKKK · 2026-06-25T14:50:23Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/0

lgtm ready to ci

Non-blocking Suggestions

P2

Bert 路径多模态特征被冗余拷贝到 GPU @ rtp_llm/cpp/models/PyWrappedModel.cc:249
- 建议：方案 A：buildBertEmbeddingInputs 接收已构建的 PyMultimodalInputs，直接复用其 GPU tensor 而非重新拷贝。方案 B：通过模型类型标志跳过不相关路径的多模态拷贝。方案 C（最小改动）：tensorHoldHostAndToCuda 内部已有 is_cuda() 短路，可将 buildBertEmbeddingInputs 放在 buildPyMultimodalInputs 之后并接收其结果，features 已在 GPU 上时会被短路跳过。

Checklist ✅ (56 items passed)

Strengths

复用已有 MultimodalEmbeddingInjector 共享模块，Bert 和 decoder 模型的多模态注入逻辑统一维护，遵循 DRY 原则
C++ struct 新字段有安全默认值（std::vector 默认空、torch::Tensor 默认 undefined），pybind 5 参数构造器不变，向后兼容
测试覆盖全面：正常注入、空特征、负偏移（prefix cache 截断）、越界、数量不匹配五种场景，端到端验证 BertModel.forward 的多模态路径

LLLLKKKK · 2026-06-26T02:43:08Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/0

lgtm ready to ci

Non-blocking Suggestions

P2

combo_tokens_type_ids 仍用 .cuda() 未升级为 tensorHoldHostAndToCuda @ rtp_llm/cpp/models/PyWrappedModel.cc:224
- 建议：将 combo_tokens_type_ids 也改为 tensorHoldHostAndToCuda(inputs.combo_tokens_type_ids)，保持与 combo_position_ids 一致的 host buffer 生命周期管理。

Checklist Violations (2 fail / 67 total)

General Principles Checklist

[6.1] Quality — 逻辑变更未混入无关格式化 → checklist-only
combo_position_ids 从 .cuda() 改为 tensorHoldHostAndToCuda 是独立的拷贝优化（PyWrappedModel.cc:217），与多模态特性传播功能无直接关联，但改动量极小且在同一函数内，不影响可审查性。

RTP-LLM Checklist

[I] 代码质量 — 同一功能用统一工具函数 → issue combo_tokens_type_ids 仍用 .cuda() 未升级为 tensorHoldHostAndToCuda
buildBertEmbeddingInputs 中 combo_position_ids 已改为 tensorHoldHostAndToCuda（line 217），但 combo_tokens_type_ids 仍使用 .cuda()（line 224），同一函数内 host→device 拷贝方式不一致。

Strengths

复用已有 MultimodalEmbeddingInjector 模块，与 qwen3vl 等模型共享注入逻辑，设计干净
测试覆盖全面：正常路径、空特征、负 loc（prefix caching 场景）、越界、长度不匹配均有用例，设计清晰
pybind 新字段有 C++ 默认值（vector 为空、Tensor 为 undefined），不破坏未传递多模态输入的调用方

LLLLKKKK · 2026-06-26T03:48:31Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/3 · P3/2

lgtm ready to ci

Non-blocking Suggestions

P2

微批次路径未拆分多模态特征，可能导致位置索引越界 @ rtp_llm/cpp/models/PyWrappedModel.cc:350
- 建议：在 splitInputsIntoMicroBatches 中拆分 multimodal_features 和 mm_features_locs，或在 buildBertEmbeddingInputs 中检测微批次场景时跳过多模态注入（附注释说明限制）。
多模态 tensor 经 buildPyMultimodalInputs 已在 CUDA，buildBertEmbeddingInputs 中 tensorHoldHostAndToCuda 冗余 @ rtp_llm/cpp/models/PyWrappedModel.cc:247
- 建议：multimodal_inputs 中的 tensor 已在 CUDA，可直接赋值 bert_embedding_inputs.multimodal_features = multimodal_inputs.multimodal_features，无需再经 tensorHoldHostAndToCuda；或改为直接从 GptModelInputs 读取 host tensor 仅做一次 tensorHoldHostAndToCuda。
CUDA Graph 路径未初始化新增 multimodal 字段 @ rtp_llm/models_py/bindings/OpDefs.h:219
- 建议：在 initCaptureBertEmbeddingInputs 末尾显式注释说明 multimodal_features/mm_features_locs 故意留空（仅 prefill 路径使用，非 CUDA graph 场景），或在 canRun 中添加 multimodal 输入非空时 fallback 到 normal path 的检查。

P3

MAX_FUSED_D2D_COPIES 容量注释未更新 @ rtp_llm/cpp/models/PyWrappedModel.cc:217
- 建议：更新第 318 行注释，将 ~6 改为 ~8 或明确列出 buildBertEmbeddingInputs 贡献的 2 个 copy。
测试 assertRaises 未带 match/regex 验证异常消息 @ rtp_llm/models_py/model_desc/test/bert_multimodal_test.py:123
- 建议：使用 self.assertRaisesRegex(IndexError, r'cannot be placed') 和 self.assertRaisesRegex(ValueError, r'entries .* features') 验证消息来源。

Checklist Violations (1 fail / 67 total)

Python Static-First Checklist

[P.G] 测试规范 — pytest.raises 带 match 参数 → issue 测试 assertRaises 未带 match/regex 验证异常消息
test_out_of_range_loc_raises (L123) 和 test_locs_length_mismatch_raises (L133) 使用 assertRaises 仅验证异常类型，未通过 assertRaisesRegex 验证消息内容。

Strengths

复用已有 MultimodalEmbeddingInjector 模块，与 MultimodalGenericModel 保持一致的注入模式，避免重复实现
多模态特征注入通过 BertEmbeddingInputs 数据结构传递，C++/pybind/Python 三层改动清晰一致，未引入新的数据通路
测试覆盖充分：正常拼接、空特征、负索引截断、越界和长度不匹配五个场景，端到端验证 BertModel.forward 的拼接逻辑
BertEmbeddingInputs 新增字段有 C++ 默认初始化（空 vector + undefined Tensor），pybind 构造函数向后兼容

LLLLKKKK · 2026-06-26T08:14:07Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist Violations (2 fail / 88 total)

General Principles Checklist

[6.1] Tests — 新逻辑有聚焦单测 + 相关集成/smoke 测试 → checklist-only
有聚焦单测（bert_multimodal_test.py 覆盖 5 个场景），但测试通过 new + SimpleNamespace 绕过 C++ 层，未覆盖 buildBertEmbeddingInputs → pybind → Python 的完整路径。2/3 reviewer 认为单测已充分，不升级为 issue。

RTP-LLM Checklist

[H] 测试与 CI — 测试覆盖充分：大重构等价覆盖，新功能端到端测试 → checklist-only
新功能有聚焦单测（bert_multimodal_test.py 覆盖 5 种场景），但通过 stub 绕过 C++ 层，未覆盖 buildBertEmbeddingInputs → pybind → Python 的完整路径。2/3 reviewer 认为单测已充分，不升级为 issue。

Strengths

复用已有 MultimodalEmbeddingInjector 组件，与 Qwen3VL/MultimodalGeneric 等 5+ 个模型保持一致的注入模式，无重复实现
测试覆盖全面：正常注入、空特征、负位置前缀截断、越界报错、长度不匹配报错五种场景，断言使用 torch.testing.assert_close 精确比较
.cuda() → tensorHoldHostAndToCuda 统一化消除不一致的 H2D 拷贝方式，同时获得 pinned memory 断言和 fused D2D copy 优化
BertEmbeddingInputs 新字段放在 struct 末尾且有默认值（空 vector / undefined tensor），pybind 5 参数构造器无需修改，向后兼容

LLLLKKKK · 2026-06-26T10:06:19Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (84 items passed)

Strengths

复用已有 MultimodalEmbeddingInjector 组件，与 Qwen3VL、MultimodalGenericModel 等模型保持一致的多模态注入模式，避免逻辑重复
C++ 侧统一使用 tensorHoldHostAndToCuda 替换原有 .cuda() 调用，同时修复了 combo_position_ids 和 combo_tokens_type_ids 的传输方式
测试覆盖全面：正常注入、空特征、负偏移（前缀缓存场景）、越界、数量不匹配等边界 case 均有覆盖
pybind 新字段有 C++ 默认值（empty vector / undefined tensor），保持向后兼容

wht21 · 2026-06-26T10:09:44Z

internal source has been updated, please review the changes!

After the C++ GptModel removal (b6575e4), the Python BertModel.forward path never received computed vision embeddings from the C++ engine. Vision token positions (placeholder IDs -1/-2/-3) went through word embedding lookup producing garbage hidden states, causing ~5% argmax disagreement vs PyTorch baseline on VisionBert. Fix: - Add multimodal_features and mm_features_locs fields to BertEmbeddingInputs - Expose them via pybind11 - Populate them in PyWrappedModel::buildBertEmbeddingInputs - Splice vision embeddings into hidden_states after pre-LN in BertModel.forward Verified: argmax disagreement drops from 4.90% to 0.18% (full sweep, 62K items), matching mi308 reference (0.15%).

LLLLKKKK · 2026-06-26T18:44:28Z

AI Code Review - PR #1137

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (67 items passed)

Strengths

完全遵循 MultimodalGenericModel 的已有模式（MultimodalEmbeddingInjector + init/forward 接线），代码风格高度一致
单元测试覆盖正路径、空输入、负偏移前缀截断、越界异常、数量不匹配异常等关键场景
C++ 端统一使用 tensorHoldHostAndToCuda 替代 .cuda()，同步改善了 combo_position_ids/combo_tokens_type_ids 的已有代码一致性
pybind 新字段 multimodal_features/mm_features_locs 有 C++ 默认值（空 vector/未定义 tensor），不破坏已有构造路径

wht21 · 2026-06-26T18:47:50Z

internal source has been updated, please review the changes!

parkerpang requested a review from LLLLKKKK as a code owner June 23, 2026 17:16

parkerpang force-pushed the feat/fix-bert-emb branch from eab8702 to 0366816 Compare June 24, 2026 03:07

parkerpang force-pushed the feat/fix-bert-emb branch from 0366816 to c055017 Compare June 24, 2026 03:49

parkerpang force-pushed the feat/fix-bert-emb branch from c055017 to eb4b444 Compare June 24, 2026 08:17

parkerpang force-pushed the feat/fix-bert-emb branch from eb4b444 to 68badd0 Compare June 25, 2026 06:46

parkerpang force-pushed the feat/fix-bert-emb branch from 68badd0 to ac06fd0 Compare June 25, 2026 07:22

LLLLKKKK requested changes Jun 25, 2026

View reviewed changes

parkerpang force-pushed the feat/fix-bert-emb branch 2 times, most recently from 4429fa6 to 6c58893 Compare June 25, 2026 14:26

parkerpang force-pushed the feat/fix-bert-emb branch from 6c58893 to 39f06a2 Compare June 26, 2026 02:06

parkerpang force-pushed the feat/fix-bert-emb branch from 39f06a2 to 46f7dd8 Compare June 26, 2026 03:01

parkerpang force-pushed the feat/fix-bert-emb branch from 46f7dd8 to 5e7766b Compare June 26, 2026 07:06

parkerpang force-pushed the feat/fix-bert-emb branch from 5e7766b to 5d9d5e6 Compare June 26, 2026 08:16

parkerpang force-pushed the feat/fix-bert-emb branch from 5d9d5e6 to c9f128b Compare June 26, 2026 18:13

Uh oh!

Conversation

parkerpang commented Jun 23, 2026

Uh oh!

LLLLKKKK commented Jun 23, 2026

AI Code Review - PR #1137

Blocking Issues

P1

Non-blocking Suggestions

P2

Checklist Violations (9 fail / 88 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 24, 2026

AI Code Review - PR #1137

Blocking Issues

P1

Non-blocking Suggestions

P3

Checklist Violations (2 fail / 88 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 24, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P3

Checklist Violations (3 fail / 67 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 24, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

Checklist Violations (2 fail / 88 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 25, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

P3

Checklist Violations (4 fail / 94 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 25, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

P3

Checklist Violations (1 fail / 67 total)

Strengths

Uh oh!

LLLLKKKK Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

LLLLKKKK commented Jun 25, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

Checklist ✅ (56 items passed)

Strengths

Uh oh!

LLLLKKKK commented Jun 26, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

Checklist Violations (2 fail / 67 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 26, 2026

AI Code Review - PR #1137

Non-blocking Suggestions

P2

P3

Checklist Violations (1 fail / 67 total)

Strengths

Uh oh!

LLLLKKKK commented Jun 26, 2026

AI Code Review - PR #1137

Checklist Violations (2 fail / 88 total)