Commit e5815fc
rowwise for feature processors (#3606)
Summary:
In this diff we introduce row based sharding (TWRW, RW, GRID) type support for feature processors. Previously, feature processors did not support row based sharding since feature processors are data parallel. This means by splitting up the input for row based shards the accessed feature processor weights were in correct. In column/data sharding based approaches, the data is duplicated ensuring the correct weight is accessed across ranks.
The indices/buckets are calculated post input split/distribution, to make it compatible with row based sharding we calculate this pre input split/distribution. This couples the train pipeline and feature processors. For each feature, we preprocess the input and place the calculated indices in KJT.weights, this propagates the indices correctly and indexs into the right weight to use for the final step in the feature processing.
This applies in both pipelined and non pipelined situations - the input modification is done either at the pipelined forward call or in the input dist of the FPEBC. This is determined by the pipelining flag set through rewrite_model in train pipeline.
**Previous versions of this diff were reverted as this change applied to all feature processors regardless of row wise sharding applied which surfaced errors that are not captured in usual E2E and unit tests. We now gate the change in two ways: 1) row based shardings must be specified by users to be applied for FP sharding and 2) pre processing input in pipeline will ONLY happen when row based sharding is present. This way FP sharding without row based shardings applied will go through the original forward path.**
Differential Revision: D880937631 parent 5cf0f0d commit e5815fc
File tree
8 files changed
+440
-22
lines changed- torchrec
- distributed
- planner
- tests
- train_pipeline
- tests
- modules
8 files changed
+440
-22
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
11 | | - | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
12 | 23 | | |
| 24 | + | |
13 | 25 | | |
14 | 26 | | |
15 | | - | |
16 | 27 | | |
17 | 28 | | |
18 | 29 | | |
| |||
31 | 42 | | |
32 | 43 | | |
33 | 44 | | |
34 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
35 | 50 | | |
36 | 51 | | |
37 | 52 | | |
38 | 53 | | |
39 | 54 | | |
40 | 55 | | |
41 | 56 | | |
| 57 | + | |
| 58 | + | |
42 | 59 | | |
43 | 60 | | |
44 | 61 | | |
| |||
74 | 91 | | |
75 | 92 | | |
76 | 93 | | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
77 | 104 | | |
78 | 105 | | |
79 | 106 | | |
| |||
96 | 123 | | |
97 | 124 | | |
98 | 125 | | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
99 | 132 | | |
100 | 133 | | |
101 | 134 | | |
| |||
166 | 199 | | |
167 | 200 | | |
168 | 201 | | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
169 | 217 | | |
170 | 218 | | |
171 | 219 | | |
| |||
236 | 284 | | |
237 | 285 | | |
238 | 286 | | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
239 | 296 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
57 | 66 | | |
58 | 67 | | |
59 | 68 | | |
| |||
186 | 195 | | |
187 | 196 | | |
188 | 197 | | |
189 | | - | |
| 198 | + | |
190 | 199 | | |
191 | 200 | | |
192 | 201 | | |
| |||
308 | 317 | | |
309 | 318 | | |
310 | 319 | | |
311 | | - | |
| 320 | + | |
312 | 321 | | |
313 | 322 | | |
| 323 | + | |
314 | 324 | | |
315 | | - | |
| 325 | + | |
316 | 326 | | |
317 | 327 | | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
318 | 337 | | |
319 | 338 | | |
320 | | - | |
| 339 | + | |
321 | 340 | | |
322 | 341 | | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
323 | 351 | | |
324 | 352 | | |
325 | 353 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1115 | 1115 | | |
1116 | 1116 | | |
1117 | 1117 | | |
| 1118 | + | |
| 1119 | + | |
| 1120 | + | |
| 1121 | + | |
| 1122 | + | |
| 1123 | + | |
| 1124 | + | |
| 1125 | + | |
| 1126 | + | |
| 1127 | + | |
| 1128 | + | |
| 1129 | + | |
| 1130 | + | |
| 1131 | + | |
| 1132 | + | |
| 1133 | + | |
| 1134 | + | |
| 1135 | + | |
| 1136 | + | |
| 1137 | + | |
| 1138 | + | |
| 1139 | + | |
| 1140 | + | |
| 1141 | + | |
| 1142 | + | |
| 1143 | + | |
| 1144 | + | |
| 1145 | + | |
| 1146 | + | |
| 1147 | + | |
| 1148 | + | |
| 1149 | + | |
| 1150 | + | |
| 1151 | + | |
| 1152 | + | |
| 1153 | + | |
| 1154 | + | |
| 1155 | + | |
| 1156 | + | |
| 1157 | + | |
| 1158 | + | |
| 1159 | + | |
| 1160 | + | |
| 1161 | + | |
| 1162 | + | |
| 1163 | + | |
| 1164 | + | |
| 1165 | + | |
| 1166 | + | |
| 1167 | + | |
| 1168 | + | |
| 1169 | + | |
| 1170 | + | |
| 1171 | + | |
| 1172 | + | |
| 1173 | + | |
| 1174 | + | |
| 1175 | + | |
| 1176 | + | |
| 1177 | + | |
| 1178 | + | |
| 1179 | + | |
| 1180 | + | |
| 1181 | + | |
| 1182 | + | |
| 1183 | + | |
| 1184 | + | |
| 1185 | + | |
| 1186 | + | |
| 1187 | + | |
| 1188 | + | |
| 1189 | + | |
| 1190 | + | |
0 commit comments