rope_fusion broken with cp for MoEs #1439

@hemildesai

Description

Enabling rope_fusion with cp > 1 produces an incorrect loss for Qwen3 MoE 30b during long-context training. The workaround for now is to disable rope_fusion when cp > 1; the root cause still needs to be investigated for a proper fix.
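
A minimal sketch of how the workaround could be enforced at config time, assuming `cp` refers to the context-parallel size. The `ParallelModelConfig` class, its `rope_fusion` and `cp_size` fields, and the `apply_rope_fusion_workaround` helper are hypothetical names used for illustration only; they are not this repository's actual API.

```python
from dataclasses import dataclass, replace
import warnings


@dataclass(frozen=True)
class ParallelModelConfig:
    """Hypothetical config; field names are illustrative, not a real API."""

    rope_fusion: bool = True
    cp_size: int = 1  # assumed meaning of "cp": context-parallel size


def apply_rope_fusion_workaround(cfg: ParallelModelConfig) -> ParallelModelConfig:
    """Return a copy of cfg with rope_fusion turned off whenever cp_size > 1."""
    if cfg.cp_size > 1 and cfg.rope_fusion:
        warnings.warn(
            "rope_fusion disabled because cp_size > 1 (workaround for issue #1439)",
            stacklevel=2,
        )
        return replace(cfg, rope_fusion=False)
    # cp_size == 1, or rope_fusion already off: nothing to change.
    return cfg
```

Applying the guard once, right after the config is parsed, keeps the workaround in a single place so it can be removed cleanly when the root cause is fixed.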

Metadata

Labels

bug (Something isn't working)
