Setup Compiler Toolkit experiment folder for dsv3 #1906

Merged

SherlockNoMad merged 4 commits into gh/SherlockNoMad/2/base from gh/SherlockNoMad/2/head on Oct 27, 2025
                
            Conversation
  
    
    [ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
              
yiming0416 reviewed Oct 27, 2025

torchtitan/experiments/compiler_toolkit/deepseek_v3/parallelize.py (comment marked outdated, resolved)
              
fegin reviewed Oct 27, 2025

torchtitan/experiments/compiler_toolkit/deepseek_v3/__init__.py (comment marked outdated, resolved)
      [ghstack-poisoned]
              
fegin approved these changes on Oct 27, 2025

We should verify checkpointing (DCP) in the future to ensure _restore_state_dict works correctly.

@SherlockNoMad Looks like it is not merged into main.
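For context, a minimal sketch of the kind of DCP round-trip check being suggested, assuming a plain `torch.nn.Module` produced by the toolkit; the function name, checkpoint path, and the parameter-perturbation step are illustrative, not part of this PR:

```python
# Hedged sketch of a DCP save/load round trip (illustrative only; the real
# check would go through torchtitan's checkpoint manager and _restore_state_dict).
import torch
import torch.distributed.checkpoint as dcp


def dcp_roundtrip_check(model: torch.nn.Module, ckpt_dir: str = "/tmp/dsv3_dcp_check") -> None:
    # Save the current weights with DCP.
    dcp.save({"model": model.state_dict()}, checkpoint_id=ckpt_dir)

    # Zero the parameters so a successful restore is observable.
    with torch.no_grad():
        for p in model.parameters():
            p.zero_()

    # DCP loads in place into the provided state_dict; re-apply it to be explicit.
    restored = {"model": model.state_dict()}
    dcp.load(restored, checkpoint_id=ckpt_dir)
    model.load_state_dict(restored["model"])
```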
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1937
* #1906

sample output

```
[rank0]: # Annotation: {'EP': 'dispatch'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:485 in all_to_all_single, code: tensor = torch.ops._c10d_functional.all_to_all_single( # type: ignore[attr-defined]
[rank0]: tensor_3: "i64[8]" = torch.ops._c10d_functional.all_to_all_single(num_tokens_per_expert_3, [4, 4], [4, 4], '11')
[rank0]:
[rank0]: # Annotation: {'EP': 'dispatch'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:136 in wait_tensor, code: return torch.ops._c10d_functional.wait_tensor(tensor) # type: ignore[attr-defined]
[rank0]: num_tokens_per_expert_group_2: "i64[8]" = torch.ops._c10d_functional.wait_tensor(tensor_3); tensor_3 = None
```

```
[rank0]: # Annotation: {'EP': 'combine'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:522 in all_to_all_single_autograd, code: tensor = torch.ops._c10d_functional_autograd.all_to_all_single( # type: ignore[attr-defined]
[rank0]: slice_20: "bf16[u18 + u19, 256]" = torch.ops.aten.slice.Tensor(index_put_6, 0, 0, -1); index_put_6 = None
[rank0]: all_to_all_single_14: "bf16[u16 + u17, 256]" = torch.ops._c10d_functional.all_to_all_single.default(slice_20, [_local_scalar_dense_16, _local_scalar_dense_17], [_local_scalar_dense_18, _local_scalar_dense_19], '11'); slice_20 = None
[rank0]:
[rank0]: # Annotation: {'EP': 'combine'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:528 in all_to_all_single_autograd, code: return _FromTorchTensor.apply(tensor)
[rank0]: wait_tensor_136: "bf16[u16 + u17, 256]" = torch.ops._c10d_functional.wait_tensor.default(all_to_all_single_14); all_to_all_single_14 = None
```
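The annotations mark which functional-collective calls belong to the expert-parallel dispatch and combine phases. As a rough, hedged illustration (not torchtitan's actual MoE code), the eager pattern that lowers to the dispatch nodes above looks roughly like this; `ep_group` and the split sizes are placeholders:

```python
# Rough sketch of the eager pattern behind the 'EP: dispatch' nodes above:
# an async all_to_all_single on per-expert token counts, followed by a wait.
# ep_group and the split sizes are illustrative placeholders.
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def exchange_token_counts(num_tokens_per_expert: torch.Tensor, ep_group) -> torch.Tensor:
    world = dist.get_world_size(ep_group)
    # e.g. [4, 4] for 2 EP ranks with 4 local experts each, as in the trace.
    splits = [num_tokens_per_expert.numel() // world] * world
    out = funcol.all_to_all_single(num_tokens_per_expert, splits, splits, ep_group)
    # Functional collectives are asynchronous; wait_tensor materializes the
    # result and shows up as the wait_tensor node in the captured graph.
    return funcol.wait_tensor(out)
```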
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1906
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1906
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025

…1938)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1906

Reland of #1906. The previous diff was created with ghstack, which doesn't work with titan; relanding with a manual push.
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1937
* #1906
    
SherlockNoMad added a commit that referenced this pull request on Oct 27, 2025

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #1937
* #1906
  
  
    
  
    