Commit 6bccdb6

[TorchComms] update readme (#1877)
N-D parallelism is available after #1876; update the README and fix the font issue.
1 parent 3ff7551 · commit 6bccdb6

1 file changed: 27 additions and 18 deletions
````diff
@@ -1,25 +1,34 @@
 ## TorchTitan & TorchComms Composability Testing
 
-#### Overview
+### Overview
 
-This folder provides a framework for composability testing with TorchComms and distributed training in TorchTitan. The goal is to enable flexible experimentation with distributed communication primitives and parallelism strategies in PyTorch.
-TODO: add more explanation once the torchcomm goes public.
----
-#### Example
+This folder provides a framework for composability testing with TorchComms and distributed training in TorchTitan. It enables flexible experimentation with distributed communication primitives and various parallelism strategies in PyTorch.
+
+> **TODO:** Additional documentation will be provided once TorchComms is publicly released.
+
+### Quick Start
+
+The following command uses Llama 3 as an example:
 
-The command below uses Llama 3 as an example, but should work on all models.
 ```bash
 TEST_BACKEND=nccl TRAIN_FILE=torchtitan.experiments.torchcomms.train CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" ./run_train.sh
 ```
----
-### Available Features
-- **Distributed Training Utilities**
-  - Training with `torchcomms.new_comm`
-  - Device mesh initialization with `torchcomms.init_device_mesh`
-- **Composability Testing**
-  - Integration and testing with `fully_shard` (FSDP)
----
-### To Be Added
-- Integration and testing with additional parallelism strategies (e.g., tensor, pipeline, context parallelism) other than fully_shard
-- Integration and testing with torch.compile
----
+
+### Features
+
+#### Distributed Training Utilities
+- Custom communicator backend initialization via `torchcomms.new_comm`
+- Compose torchcomms with DeviceMesh via the wrapper API `torchcomms.init_device_mesh`
+
+#### Parallelism Support
+Locally tested with:
+- **FSDP** (`fully_shard`) - Fully Sharded Data Parallel
+- **TP** - Tensor Parallelism
+- **PP** - Pipeline Parallelism
+- **CP** - Context Parallelism
+
+### Roadmap
+
+- [ ] Add N-D parallelism E2E perf and convergence tests
+- [ ] Integration and testing with Expert Parallelism
+- [ ] Integration and testing with `torch.compile`
````
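The updated Features section names two TorchComms entry points, `torchcomms.new_comm` and `torchcomms.init_device_mesh`. TorchComms is not yet publicly documented, so the sketch below is only a rough illustration of how the two calls could compose; the argument names, the `"nccl"` backend string, and the mesh shape are all assumptions rather than the experiment's actual code.

```python
# Hypothetical sketch only: torchcomms is not yet public, so every argument
# below (backend string, device/mesh keywords, dim names) is an assumption.
import os

import torch
import torchcomms

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device("cuda", local_rank)

# Assumed: create a communicator for this rank using the NCCL backend,
# mirroring TEST_BACKEND=nccl in the Quick Start command above.
comm = torchcomms.new_comm("nccl", device=device)

# Assumed wrapper around DeviceMesh construction: build a 2-D
# (data-parallel x tensor-parallel) mesh backed by torchcomms.
world_size = int(os.environ.get("WORLD_SIZE", "8"))
mesh = torchcomms.init_device_mesh(
    "cuda",
    (world_size // 2, 2),
    mesh_dim_names=("dp", "tp"),
)
```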

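The FSDP entry under Parallelism Support refers to PyTorch's `fully_shard` (FSDP2) API applied on a DeviceMesh. A minimal sketch of that composition follows; it uses the stock `torch.distributed.device_mesh.init_device_mesh` so it runs on plain PyTorch, whereas the torchcomms experiment would obtain the mesh from `torchcomms.init_device_mesh` instead. The toy model and mesh shape are illustrative only.

```python
# Minimal FSDP2 sketch: shard a toy model over a 1-D data-parallel mesh.
# Launch with torchrun (e.g. `torchrun --nproc-per-node 8 sketch.py`);
# assumes a recent PyTorch where fully_shard is exported from
# torch.distributed.fsdp.
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard

world_size = int(os.environ["WORLD_SIZE"])
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
dp_mesh = init_device_mesh("cuda", (world_size,), mesh_dim_names=("dp",))

model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024)).cuda()

# Typical FSDP2 pattern: shard each submodule first, then the root module.
for layer in model:
    fully_shard(layer, mesh=dp_mesh)
fully_shard(model, mesh=dp_mesh)

out = model(torch.randn(2, 1024, device="cuda"))
```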