Conversation


@SherlockNoMad SherlockNoMad commented Oct 17, 2025

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 17, 2025
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
ghstack-source-id: d5493a2
Pull Request resolved: #1906
@SherlockNoMad SherlockNoMad changed the title add joint graph runner deepseek_v3 experiment Setup Compiler Toolkit experiment folder for dsv3 Oct 27, 2025
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
ghstack-source-id: 42dc42c
Pull Request resolved: #1906
@SherlockNoMad SherlockNoMad requested a review from fegin October 27, 2025 15:51
@yiming0416 yiming0416 self-requested a review October 27, 2025 16:26
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
ghstack-source-id: ab9434d
Pull Request resolved: #1906

@fegin fegin left a comment

We should verify checkpointing with DCP in the future to ensure `_restore_state_dict` works correctly.
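For reference, a rough sketch of the kind of DCP round-trip check this could involve (an assumption about a future test, not code in this PR; the model handle and checkpoint path are placeholders):

```python
import torch
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_model_state_dict


def check_dcp_roundtrip(model, checkpoint_dir="/tmp/dsv3_dcp_check"):
    # Capture the (possibly sharded) model state and save it with DCP.
    before = get_model_state_dict(model)
    dcp.save(before, checkpoint_id=checkpoint_dir)

    # Load back into freshly allocated buffers of the same layout and compare;
    # this is the property we would want to hold after _restore_state_dict runs.
    after = {name: torch.empty_like(t) for name, t in before.items()}
    dcp.load(after, checkpoint_id=checkpoint_dir)
    for name, t in before.items():
        # (with DTensor shards, compare the local tensors instead)
        torch.testing.assert_close(after[name], t, msg=f"mismatch in {name}")
```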

@SherlockNoMad SherlockNoMad merged commit 94981a5 into gh/SherlockNoMad/2/base Oct 27, 2025
4 checks passed
@yiming0416

@SherlockNoMad Looks like it is not merged into main

SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #1937
* #1906

Sample output:
```
[rank0]:        # Annotation: {'EP': 'dispatch'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:485 in all_to_all_single, code: tensor = torch.ops._c10d_functional.all_to_all_single(  # type: ignore[attr-defined]
[rank0]:        tensor_3: "i64[8]" = torch.ops._c10d_functional.all_to_all_single(num_tokens_per_expert_3, [4, 4], [4, 4], '11')
[rank0]:        
[rank0]:        # Annotation: {'EP': 'dispatch'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:136 in wait_tensor, code: return torch.ops._c10d_functional.wait_tensor(tensor)  # type: ignore[attr-defined]
[rank0]:        num_tokens_per_expert_group_2: "i64[8]" = torch.ops._c10d_functional.wait_tensor(tensor_3);  tensor_3 = None
```

```
[rank0]:        # Annotation: {'EP': 'combine'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:522 in all_to_all_single_autograd, code: tensor = torch.ops._c10d_functional_autograd.all_to_all_single(  # type: ignore[attr-defined]
[rank0]:        slice_20: "bf16[u18 + u19, 256]" = torch.ops.aten.slice.Tensor(index_put_6, 0, 0, -1);  index_put_6 = None
[rank0]:        all_to_all_single_14: "bf16[u16 + u17, 256]" = torch.ops._c10d_functional.all_to_all_single.default(slice_20, [_local_scalar_dense_16, _local_scalar_dense_17], [_local_scalar_dense_18, _local_scalar_dense_19], '11');  slice_20 = None
[rank0]:        
[rank0]:        # Annotation: {'EP': 'combine'} File: /data/users/bahuang/pytorch/torch/distributed/_functional_collectives.py:528 in all_to_all_single_autograd, code: return _FromTorchTensor.apply(tensor)
[rank0]:        wait_tensor_136: "bf16[u16 + u17, 256]" = torch.ops._c10d_functional.wait_tensor.default(all_to_all_single_14);  all_to_all_single_14 = None
[rank0]:        
```
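For context, the annotated nodes above correspond roughly to the eager-mode pattern below. This is only an illustrative sketch, not code from this PR: `ep_group` and the even split sizes are placeholders, whereas the real dispatch/combine uses the uneven splits computed from the router output (the `_local_scalar_dense_*` values in the trace).

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def ep_dispatch_token_counts(num_tokens_per_expert: torch.Tensor, ep_group) -> torch.Tensor:
    # EP "dispatch": exchange per-expert token counts across expert-parallel
    # ranks (the i64[8] all_to_all_single node in the trace above).
    world = dist.get_world_size(ep_group)
    split = num_tokens_per_expert.numel() // world
    counts = funcol.all_to_all_single(
        num_tokens_per_expert, [split] * world, [split] * world, ep_group
    )
    # Functional collectives are asynchronous; the wait_tensor node in the
    # trace is the synchronization point. In eager mode the async wrapper is
    # waited on explicitly (or implicitly, the first time it is used).
    if isinstance(counts, funcol.AsyncCollectiveTensor):
        counts = counts.wait()
    return counts
```

The "combine" step is the same collective in the reverse direction, carrying the routed bf16 activations back to their source ranks with the transposed split sizes.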
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
…1938)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #1906

Reland of #1906.
The previous diff was created using ghstack, which doesn't work with titan; relanding with a manual push.
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025
SherlockNoMad added a commit that referenced this pull request Oct 27, 2025