Commit 476a965
Implicit overlap of shared expert compute and token combine communication (#1741)
This PR moves the shared expert computation ahead of the possible
scoring of the routed expert output. As a result, shared expert compute
implicitly overlaps with token combine communication in MoE models.
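For illustration only, here is a minimal pure-Python sketch of why the reordering helps. It is not the torchtitan code: `token_combine`, `shared_expert`, `moe_forward`, and the `shared_first` flag are hypothetical names, and a thread pool with `time.sleep` stands in for the asynchronous all-to-all and the expert compute. It assumes the combine result is only waited on at its first use (the scoring step).

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # seconds; arbitrary stand-in for comm / compute latency


def token_combine(routed_out):
    """Hypothetical stand-in for the all-to-all token combine (just sleeps)."""
    time.sleep(DELAY)
    return routed_out


def shared_expert(x):
    """Hypothetical stand-in for the shared expert (sleeps, then doubles x)."""
    time.sleep(DELAY)
    return [2.0 * v for v in x]


def moe_forward(x, scores, shared_first, pool):
    # Launch the combine asynchronously; its result is only needed at first use.
    routed_future = pool.submit(token_combine, [v + 1.0 for v in x])
    if shared_first:
        # New order: the shared expert runs while the combine is still in flight.
        shared = shared_expert(x)
        routed = [s * v for s, v in zip(scores, routed_future.result())]
    else:
        # Old order: scoring waits on the combine before the shared expert starts.
        routed = [s * v for s, v in zip(scores, routed_future.result())]
        shared = shared_expert(x)
    return [r + s for r, s in zip(routed, shared)]


def timed_forward(shared_first):
    """Run one forward pass and return (output, elapsed seconds)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        out = moe_forward([1.0, 2.0], [0.5, 0.25], shared_first, pool)
        return out, time.perf_counter() - start


if __name__ == "__main__":
    for flag in (False, True):
        out, elapsed = timed_forward(flag)
        print(f"shared_first={flag}: out={out}, elapsed={elapsed:.3f}s")
```

Both orders return the same values; only the point at which the caller blocks on the combine changes, so in this sketch the reordered path should take roughly one delay instead of two.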
Repro (with the number of layers lowered to 2):
```
CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh --profiling.enable_profiling --profiling.profile_freq 10 --training.steps 10
```
Trace before the change:
<img width="1503" height="625" alt="Screenshot 2025-09-23 at 12 08 31 AM" src="https://github.com/user-attachments/assets/bbcc41cf-6497-482e-972e-d917baf4498e" />
Trace after the change (note that all-to-all comm is now overlapping
shared expert compute):
<img width="1503" height="625" alt="Screenshot 2025-09-23 at 12 04 56 AM" src="https://github.com/user-attachments/assets/3504e77c-aa14-46fd-8e47-e247b88d7b9c" />
cc @tianyu-l @xmfan

1 parent 22d2d44 commit 476a965
1 file changed, +8 -6 lines changed

Diff hunk covering original lines 417-434 (line contents were not preserved):

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 417-419 | 417-419 | unchanged |
| 420-425 | | removed (6 lines) |
| 426 | 420 | unchanged |
| | 421-422 | added (2 lines) |
| 427-431 | 423-427 | unchanged |
| | 428-433 | added (6 lines) |
| 432-434 | 434-436 | unchanged |