Commit 069bae2
authored
add configuration for float8 with rowwise scaling, via recipe lookup (#808)
Summary:
Exposes the float8 config from recipe name API from torchao, so we can
use it to easily configure float8 with rowwise scaling from torchtitan.
Usage:
```
with-proxy CONFIG_FILE="./torchtitan/models/llama/train_configs/llama3_8b.toml" ./run_train.sh \
--model.converters float8 \
--training.compile \
--float8.recipe_name rowwise
```
Example, pretraining LLaMa 3 8B on 8 H100s:
```
// baseline (bf16 + compile)
> with-proxy CONFIG_FILE="./torchtitan/models/llama/train_configs/llama3_8b.toml" ./run_train.sh --training.compile
...
step: 20 loss: 8.4931 memory: 47.65GiB(50.16%) tps: 5,760 mfu: 33.73%
// experiment (rowwise float8 + compile)
> with-proxy CONFIG_FILE="./torchtitan/models/llama/train_configs/llama3_8b.toml" ./run_train.sh --model.converters float8 --training.compile --float8.recipe_name rowwise
...
step: 30 loss: 7.7109 memory: 47.77GiB(50.28%) tps: 6,468 mfu: 37.88%
// for comparison, tensorwise float8 with float8 all-gather (on main branch)
with-proxy CONFIG_FILE="./torchtitan/models/llama/train_configs/llama3_8b.toml" ./run_train.sh --model.converters float8 --training.compile --float8.enable_fsdp_float8_all_gather --float8.precompute_float8_dynamic_scale_for_fsdp
...
step: 20 loss: 8.4258 memory: 47.32GiB(49.81%) tps: 7,186 mfu: 42.08%
```
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:1 parent 0047aa2 commit 069bae2
File tree
4 files changed
+81
-31
lines changed- docs
- torchtitan
- components
- models/llama
4 files changed
+81
-31
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | | - | |
| 10 | + | |
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
| 16 | + | |
16 | 17 | | |
17 | | - | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
18 | 27 | | |
19 | | - | |
| 28 | + | |
20 | 29 | | |
21 | | - | |
| 30 | + | |
22 | 31 | | |
23 | 32 | | |
24 | 33 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
52 | | - | |
53 | | - | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
60 | | - | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
61 | 60 | | |
62 | 61 | | |
63 | 62 | | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
69 | 75 | | |
70 | | - | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
71 | 92 | | |
72 | 93 | | |
73 | 94 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
613 | 613 | | |
614 | 614 | | |
615 | 615 | | |
616 | | - | |
| 616 | + | |
617 | 617 | | |
618 | 618 | | |
619 | 619 | | |
620 | 620 | | |
621 | | - | |
| 621 | + | |
622 | 622 | | |
623 | 623 | | |
624 | 624 | | |
625 | 625 | | |
626 | 626 | | |
627 | 627 | | |
628 | | - | |
629 | | - | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
630 | 641 | | |
631 | 642 | | |
632 | 643 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
| 59 | + | |
59 | 60 | | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
60 | 71 | | |
61 | 72 | | |
62 | 73 | | |
63 | 74 | | |
64 | | - | |
| 75 | + | |
65 | 76 | | |
66 | 77 | | |
67 | 78 | | |
| |||
115 | 126 | | |
116 | 127 | | |
117 | 128 | | |
118 | | - | |
| 129 | + | |
119 | 130 | | |
120 | 131 | | |
121 | 132 | | |
| |||
141 | 152 | | |
142 | 153 | | |
143 | 154 | | |
144 | | - | |
145 | | - | |
146 | | - | |
147 | | - | |
| 155 | + | |
| 156 | + | |
148 | 157 | | |
149 | 158 | | |
150 | 159 | | |
| |||
202 | 211 | | |
203 | 212 | | |
204 | 213 | | |
205 | | - | |
| 214 | + | |
206 | 215 | | |
207 | 216 | | |
208 | 217 | | |
| |||
0 commit comments