fix: cherry-pick combined projection fixes (#1324, #1357) into r0.2.1 by HuiyingLi · Pull Request #1388 · NVIDIA-NeMo/Automodel

HuiyingLi · 2026-02-25T21:57:18Z

Summary

Cherry-pick of #1324: fix combined projection interleaving for TP-correct ColwiseParallel sharding
Cherry-pick of #1357: FSDP pre-shard combined projections on dim 1 for Qwen2.5-7B support

These fixes correct the QKV and gate_up weight layout from naive concatenation to KV-head-grouped interleaving (QKV) and row interleaving (gate_up), ensuring each TP rank receives complete head groups under ColwiseParallel sharding.
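The layout change can be illustrated with a toy sketch. This is not the code from #1324 or #1357: the helper names (`naive_qkv`, `grouped_qkv`, `interleave_gate_up`, `shard_dim0`) are hypothetical, each label stands for one block of rows (one head for QKV, one row for gate_up), and ColwiseParallel is modeled simply as an even, contiguous split along dim 0.

```python
# Toy model of combined-projection layouts. Labels stand in for row blocks;
# all function names here are illustrative, not taken from the Automodel code.

def naive_qkv(num_q_heads, num_kv_heads):
    """Naive concatenation: all Q heads, then all K heads, then all V heads."""
    q = [f"q{i}" for i in range(num_q_heads)]
    k = [f"k{i}" for i in range(num_kv_heads)]
    v = [f"v{i}" for i in range(num_kv_heads)]
    return q + k + v

def grouped_qkv(num_q_heads, num_kv_heads):
    """KV-head-grouped interleaving: [Q heads of group g, K_g, V_g] per KV head g."""
    assert num_q_heads % num_kv_heads == 0
    q_per_group = num_q_heads // num_kv_heads
    layout = []
    for g in range(num_kv_heads):
        layout += [f"q{g * q_per_group + j}" for j in range(q_per_group)]
        layout += [f"k{g}", f"v{g}"]
    return layout

def interleave_gate_up(num_rows):
    """Row interleaving: gate row i is immediately followed by up row i."""
    layout = []
    for i in range(num_rows):
        layout += [f"gate{i}", f"up{i}"]
    return layout

def shard_dim0(layout, tp_size):
    """Model column-wise sharding as an even contiguous split along dim 0."""
    assert len(layout) % tp_size == 0
    chunk = len(layout) // tp_size
    return [layout[r * chunk:(r + 1) * chunk] for r in range(tp_size)]

if __name__ == "__main__":
    # 4 query heads, 2 KV heads, TP=2
    print(shard_dim0(naive_qkv(4, 2), 2))
    # [['q0', 'q1', 'q2', 'q3'], ['k0', 'k1', 'v0', 'v1']]
    print(shard_dim0(grouped_qkv(4, 2), 2))
    # [['q0', 'q1', 'k0', 'v0'], ['q2', 'q3', 'k1', 'v1']]
    print(shard_dim0(interleave_gate_up(4), 2))
    # [['gate0', 'up0', 'gate1', 'up1'], ['gate2', 'up2', 'gate3', 'up3']]
```

With naive concatenation, rank 0 holds only query heads and rank 1 holds only key and value heads, so neither rank can compute attention for a complete head group. With the grouped layout, each rank holds the query heads together with their matching K and V head, and with row interleaving each rank holds matching gate and up rows.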

Test plan

Status: NOT YET VERIFIED — needs to be validated before merge.

Verify Qwen2.5-7B SFT with TP=2 produces correct initial loss (~6) instead of NaN
Verify Llama3.2-3B SFT with TP=2

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: combined projection
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
* lint
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
* address comment
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
* add tp output parity tests
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
---------
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ort (#1357)
* fix: FSDP pre-shard combined projections on dim 1 for Qwen2.5-7B support
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
* revert recipe change
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
* lint
  Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
---------
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

copy-pr-bot · 2026-02-25T21:57:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

HuiyingLi · 2026-02-25T22:01:16Z

Hi @ZhiyuLi-Nvidia, I cherry-picked these two commits. However, the TP=2 cases still show high loss. Do you know if anything else is missing? Thanks!

ZhiyuLi-Nvidia added 2 commits February 25, 2026 13:30

HuiyingLi requested review from adil-a, akoumpa and hemildesai as code owners February 25, 2026 21:57

HuiyingLi wants to merge 2 commits into r0.2.1 from cherry-pick-1324-1357-r0.2.1
