[T2-1-4] PPPoint-t #31

PPPoint-t · 2025-08-24T09:39:46Z

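Took part in the model adaptation challenge, using the InfiniLM framework to adapt the Qwen3-1.7B model.

Screenshot of the llama model inference test

Screenshot of the 9g4b model inference test

Screenshot of the qwen3 model inference test

Screenshot of the deployed model inference service

Model introduction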

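Qwen3 normalizes Query and Key separately: Q and K each pass through their own RMSNorm. This changes the numerical distribution of the attention scores and therefore the structure of the attention matrix after softmax. To keep inference consistent with the computation used in training, the corresponding normalization weights must be applied after Q and K are projected and before RoPE is applied.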
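The ordering described above (project, normalize Q and K per head, then RoPE) can be sketched in plain Python. This is only an illustration, not the InfiniLM kernel: the function names, the `eps` value, the `theta` base, and the half-split rotation layout are all assumptions chosen for the example.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one head vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def apply_rope(x, pos, theta=10000.0):
    """Rotate the pairs (x[i], x[i + d/2]) by position-dependent angles.

    theta is a placeholder base; a real model reads it from its config.
    """
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / len(x))
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out

def prepare_qk(q_heads, k_heads, q_norm_w, k_norm_w, pos):
    """Normalize each Q/K head first, then apply RoPE (hypothetical helper)."""
    q = [apply_rope(rms_norm(h, q_norm_w), pos) for h in q_heads]
    k = [apply_rope(rms_norm(h, k_norm_w), pos) for h in k_heads]
    return q, k
```

Because RoPE is a pure rotation, the per-head scale set by RMSNorm is preserved afterwards, which is why the norm has to come before the rotation rather than after it if the result is to match a model trained with this ordering.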

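Results

This PR integrates the Qwen3 model and adds Q/K-specific normalization to the existing inference path. The goal is to add independent RMSNorm for Q and K in the attention sublayer without changing the upper-level inference logic, so that the implementation matches the original Qwen3 weight format and improves numerical stability and inference consistency. During device resource construction, the Q/K normalization weights are conditionally loaded and cached for each layer. At inference time, Q and K are each normalized with their own RMSNorm (rather than normalizing only once at the logits input). Backward compatibility is preserved: when a model has no Q/K-specific normalization, the original logic runs unchanged. Model decoding stays compatible, and inference results for other models are unaffected.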
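A minimal sketch of the conditional loading and fallback described above, in plain Python. The weight key names follow the naming commonly seen in Hugging Face Qwen3 checkpoints and are an assumption here; `load_qk_norm` and `normalize_qk` are hypothetical helpers, not functions from InfiniLM.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one head vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def load_qk_norm(state_dict, layer_idx):
    """Return (q_weight, k_weight) for a layer, or None if the model has none.

    The key names below are assumed for illustration.
    """
    prefix = f"model.layers.{layer_idx}.self_attn."
    q_w = state_dict.get(prefix + "q_norm.weight")
    k_w = state_dict.get(prefix + "k_norm.weight")
    if q_w is None or k_w is None:
        return None
    return q_w, k_w

def normalize_qk(q_heads, k_heads, qk_norm):
    """Apply per-head RMSNorm when weights exist; otherwise pass through."""
    if qk_norm is None:
        # Models without Q/K norm keep the original code path untouched.
        return q_heads, k_heads
    q_w, k_w = qk_norm
    return ([rms_norm(h, q_w) for h in q_heads],
            [rms_norm(h, k_w) for h in k_heads])
```

Loading the weights once per layer and carrying a `None` marker for models that lack them keeps the branch out of the hot loop's data handling and leaves results for other models bit-for-bit unchanged.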

PanZezhong1725 · 2025-09-04T02:02:30Z

scripts/jiuge.py

            )
            self.meta = JiugeMetaFromLlama(config, max_tokens=max_tokens)
            self.tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir_path)
+            backend = getattr(self.tokenizer, "backend_tokenizer", None)


This problem is not limited to llama; 9g7b has it too. I suggest making the change regardless of model type: apply it whenever the tokenizer's Sequence normalizer is found to contain Prepend and Strip.

OK
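The model-agnostic check suggested in the review could look roughly like the sketch below. It only covers detection; the actual modification made in `scripts/jiuge.py` is not shown in this thread and is not reproduced here. The function name `needs_tokenizer_fix` is hypothetical, and the dictionary layout (a `normalizer` entry with `type` and a `normalizers` list) is an assumption about how a serialized tokenizer configuration is structured.

```python
def needs_tokenizer_fix(tokenizer_config):
    """Return True if the normalizer is a Sequence holding both Prepend and Strip.

    tokenizer_config is assumed to be the tokenizer's serialized
    configuration parsed into a dict.
    """
    normalizer = tokenizer_config.get("normalizer") or {}
    if normalizer.get("type") != "Sequence":
        return False
    step_types = {step.get("type") for step in normalizer.get("normalizers", [])}
    # Decide from the tokenizer's contents alone, not from the model type.
    return {"Prepend", "Strip"} <= step_types
```

With a fast tokenizer, such a dict could plausibly be obtained with `json.loads(tokenizer.backend_tokenizer.to_str())`; that step is left out of the sketch so it runs without any third-party package.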

PPPoint-t · 2025-09-04T09:43:27Z

Screenshot of the llama model inference test

Screenshot of the 9g7b model inference test

Screenshot of the qwen3 model inference test

[T2-1-4] Support Qwen3-1.7b model

1643aa4

PanZezhong1725 requested a review from wooway777 September 4, 2025 01:37

PanZezhong1725 requested changes Sep 4, 2025

[T2-1-4] Modify jiuge.py

77151a7
