[T2-1-4] PPPoint-t #31
Open
Participated in the model adaptation challenge, using the InfiniLM framework to adapt the Qwen3-1.7B model.
Screenshot of the llama model inference test
Screenshot of the 9g4b model inference test
Screenshot of the qwen3 model inference test
Screenshot of the deployed model inference service
Model Introduction
The Qwen3 model normalizes Query and Key separately, applying an independent RMSNorm to Q and to K. This changes the numerical distribution of the attention scores and hence the structure of the attention matrix after softmax. To keep inference consistent with how the model was trained, the corresponding normalization weights must be applied after the Q and K projections and before RoPE.
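The ordering described above can be sketched as follows. This is a minimal NumPy illustration, not InfiniLM code; the shapes and weight names are assumptions for demonstration.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over the last (head) dimension: x / rms(x) * weight
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

head_dim = 8
# Hypothetical per-token projections: [num_heads, head_dim]
q = np.random.randn(4, head_dim).astype(np.float32)  # 4 query heads
k = np.random.randn(2, head_dim).astype(np.float32)  # 2 key heads (GQA)
q_norm_w = np.ones(head_dim, dtype=np.float32)       # per-layer Q norm weight
k_norm_w = np.ones(head_dim, dtype=np.float32)       # per-layer K norm weight

# Qwen3 order: project -> per-head RMSNorm on Q and K -> RoPE -> attention.
q = rms_norm(q, q_norm_w)
k = rms_norm(k, k_norm_w)
# ... apply RoPE to q and k here, then compute attention as usual.
```

Note that the normalization is applied per head over `head_dim`, which is why it must happen after the projection but before RoPE rotates the head dimensions.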
Summary of Results
Integrated the Qwen3 model into the existing inference path with support for Q/K-specific normalization. The goal is to add independent RMSNorm of Q and K in the attention sublayer without changing the upper-level inference logic, aligning with the original Qwen3 weight format and improving numerical stability and inference consistency. Concretely:

- During device resource construction, conditionally load and cache the Q/K normalization weights for each layer.
- At inference time, apply RMSNorm to Q and K individually (rather than performing a single normalization at the logits input).
- Maintain backward compatibility: when a model has no Q/K-specific normalization, the original code path runs unchanged.
- Keep model decoding compatible, without affecting inference results for other models.
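The conditional loading and backward-compatible dispatch above can be sketched as follows. This is a hypothetical illustration: the tensor names follow the Hugging Face Qwen3 checkpoint convention, and the loader and helper names are assumptions, not InfiniLM APIs.

```python
def load_attn_norms(weights: dict, num_layers: int):
    """Conditionally cache per-layer Q/K RMSNorm weights.

    Returns a list of (q_norm, k_norm) per layer; an entry is None when the
    checkpoint has no such tensor (e.g. llama-style models).
    """
    norms = []
    for i in range(num_layers):
        q_key = f"model.layers.{i}.self_attn.q_norm.weight"
        k_key = f"model.layers.{i}.self_attn.k_norm.weight"
        norms.append((weights.get(q_key), weights.get(k_key)))
    return norms

def maybe_norm(x, w, rms_norm_fn):
    # Apply RMSNorm only when the weight exists, so models without
    # Q/K-specific normalization fall through to the original path.
    return rms_norm_fn(x, w) if w is not None else x
```

Caching the presence check per layer at resource-construction time keeps the per-token inference loop branch-light and leaves other models' behavior untouched.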