rowwise for feature processors #3606

iamzainhuda · 2025-12-10T21:36:53Z

Summary:
In this diff we introduce support for row-based sharding types (TWRW, RW, GRID) in feature processors. Previously, feature processors did not support row-based sharding because they are data parallel: splitting the input across row-based shards meant that the feature processor weights accessed on each rank were incorrect. In column-wise and data-parallel sharding, the input is duplicated, which ensures the correct weight is accessed on every rank.
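
To illustrate the failure mode, consider a position-weighted processor, assumed here to weight each id by its position within its sample (TorchRec's `PositionWeightedModule` is one such processor). The pure-Python sketch below is not TorchRec code: `Batch`, `bag_positions`, and `row_wise_split` are hypothetical helpers, and the row-wise split is assumed to assign contiguous blocks of table rows to ranks. It shows that positions recomputed after the split no longer match the positions in the original batch.

```python
from typing import Dict, List, Tuple

# Hypothetical single-feature batch: lengths[i] ids belong to sample i.
Batch = Tuple[List[int], List[int]]  # (lengths, values)


def bag_positions(lengths: List[int]) -> List[int]:
    """Position of each id within its sample, e.g. [3, 2] -> [0, 1, 2, 0, 1]."""
    return [p for length in lengths for p in range(length)]


def row_wise_split(batch: Batch, num_rows: int, world_size: int) -> Dict[int, Batch]:
    """Route each id to the rank that owns its contiguous block of table rows."""
    lengths, values = batch
    block = (num_rows + world_size - 1) // world_size
    out_lengths = {r: [0] * len(lengths) for r in range(world_size)}
    out_values: Dict[int, List[int]] = {r: [] for r in range(world_size)}
    cursor = 0
    for sample, length in enumerate(lengths):
        for _ in range(length):
            rank = values[cursor] // block
            out_lengths[rank][sample] += 1
            out_values[rank].append(values[cursor])
            cursor += 1
    return {r: (out_lengths[r], out_values[r]) for r in range(world_size)}


batch: Batch = ([3, 2], [1, 7, 2, 6, 0])
print(bag_positions(batch[0]))  # [0, 1, 2, 0, 1]: positions before any split
for rank, (lengths, values) in row_wise_split(batch, num_rows=8, world_size=2).items():
    # Positions recomputed locally after the split restart on every shard.
    print(rank, values, bag_positions(lengths))
# 0 [1, 2, 0] [0, 1, 0]
# 1 [7, 6] [0, 0]
```

Id 2 sits at position 2 of sample 0, yet rank 0 would look up position 1; id 7 sits at position 1, yet rank 1 would look up position 0.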

The indices/buckets are normally calculated after the input split/distribution; to make this compatible with row-based sharding, we calculate them before the split/distribution instead. This couples the train pipeline and feature processors. For each feature, we preprocess the input and place the calculated indices in KJT.weights, which propagates the indices through the distribution so that the final feature processing step indexes into the correct weight.
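
A minimal pure-Python sketch of that idea, again assuming a position-weighted processor whose weight table is replicated on every rank. The weights slot of the hypothetical `Batch` tuple stands in for KJT.weights; `preprocess_positions`, `row_wise_split`, and `apply_position_weights` are illustrative names, not the functions added in this diff.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical single-feature batch: (lengths, values, weights). The weights
# slot plays the role of KJT.weights and travels with each id.
Batch = Tuple[List[int], List[int], Optional[List[float]]]


def preprocess_positions(batch: Batch) -> Batch:
    """Before the split: store each id's in-sample position in the weights slot."""
    lengths, values, _ = batch
    positions = [float(p) for length in lengths for p in range(length)]
    return lengths, values, positions


def row_wise_split(batch: Batch, num_rows: int, world_size: int) -> Dict[int, Batch]:
    """Route each id, together with its weight, to the rank owning its row block."""
    lengths, values, weights = batch
    assert weights is not None, "preprocess before splitting"
    block = (num_rows + world_size - 1) // world_size
    out_lengths = {r: [0] * len(lengths) for r in range(world_size)}
    out_values: Dict[int, List[int]] = {r: [] for r in range(world_size)}
    out_weights: Dict[int, List[float]] = {r: [] for r in range(world_size)}
    cursor = 0
    for sample, length in enumerate(lengths):
        for _ in range(length):
            rank = values[cursor] // block
            out_lengths[rank][sample] += 1
            out_values[rank].append(values[cursor])
            out_weights[rank].append(weights[cursor])
            cursor += 1
    return {r: (out_lengths[r], out_values[r], out_weights[r]) for r in range(world_size)}


def apply_position_weights(batch: Batch, position_weight: List[float]) -> List[float]:
    """After the split: index the replicated processor weights by carried position."""
    _, _, positions = batch
    assert positions is not None
    return [position_weight[int(p)] for p in positions]


position_weight = [1.0, 0.5, 0.25]  # hypothetical learned weight per position
batch = preprocess_positions(([3, 2], [1, 7, 2, 6, 0], None))
for rank, shard in row_wise_split(batch, num_rows=8, world_size=2).items():
    print(rank, shard[1], apply_position_weights(shard, position_weight))
# 0 [1, 2, 0] [1.0, 0.25, 0.5]
# 1 [7, 6] [0.5, 1.0]
```

Because the positions are fixed before the split, id 2 on rank 0 still picks up the weight for position 2, and id 7 on rank 1 picks up the weight for position 1.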

This applies in both pipelined and non-pipelined settings: the input modification is done either at the pipelined forward call or in the input dist of the FPEBC, as determined by the pipelining flag set through rewrite_model in the train pipeline.
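
The sketch below models that dispatch under stated assumptions: `FeatureBatch`, `ShardedFPModuleSketch`, `is_pipelined`, and `pipelined_forward` are hypothetical names rather than the TorchRec classes or the flag this diff adds, and the actual split/all-to-all is elided. The point it shows is that the preprocessing runs exactly once, in whichever location the pipelining flag selects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureBatch:
    """Hypothetical stand-in for a single-feature KJT."""
    lengths: List[int]
    values: List[int]
    weights: Optional[List[float]] = None


def compute_positions(batch: FeatureBatch) -> FeatureBatch:
    """Pre-split preprocessing: store each id's in-sample position in weights."""
    positions = [float(p) for n in batch.lengths for p in range(n)]
    return FeatureBatch(batch.lengths, batch.values, positions)


@dataclass
class ShardedFPModuleSketch:
    """Hypothetical model of where preprocessing runs; not TorchRec's FPEBC."""
    is_pipelined: bool  # assumed to be set when the pipeline rewrites the model
    calls: int = 0  # counts preprocessing calls, to show it runs exactly once

    def preprocess(self, batch: FeatureBatch) -> FeatureBatch:
        self.calls += 1
        return compute_positions(batch)

    def input_dist(self, batch: FeatureBatch) -> FeatureBatch:
        if not self.is_pipelined:
            # Non-pipelined path: preprocess inside the module's input dist.
            batch = self.preprocess(batch)
        return batch  # the real split/all-to-all is elided in this sketch


def pipelined_forward(module: ShardedFPModuleSketch, batch: FeatureBatch) -> FeatureBatch:
    """Pipelined path: preprocess at the pipelined forward call, before input dist."""
    if module.is_pipelined:
        batch = module.preprocess(batch)
    return module.input_dist(batch)


piped = ShardedFPModuleSketch(is_pipelined=True)
out_piped = pipelined_forward(piped, FeatureBatch([3, 2], [1, 7, 2, 6, 0]))
eager = ShardedFPModuleSketch(is_pipelined=False)
out_eager = eager.input_dist(FeatureBatch([3, 2], [1, 7, 2, 6, 0]))
print(piped.calls, eager.calls, out_piped.weights == out_eager.weights)  # 1 1 True
```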

Previous versions of this diff were reverted because the change applied to all feature processors regardless of whether row-wise sharding was used, which surfaced errors not captured by the usual E2E and unit tests. We now gate the change in two ways: 1) row-based shardings must be explicitly specified by users before they are applied to FP sharding, and 2) input preprocessing in the pipeline happens ONLY when row-based sharding is present. This way, FP sharding without row-based shardings goes through the original forward path.
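
The two gates can be sketched as simple predicates. This is an illustration, not the diff's code: `ShardingType` below is a local enum whose string values are assumed to mirror TorchRec's sharding type names, and the default (non-row-based) set, `allowed_fp_sharding_types`, and `needs_fp_preprocessing` are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Set


class ShardingType(Enum):
    """Local stand-in; values assumed to mirror TorchRec's sharding type strings."""
    DATA_PARALLEL = "data_parallel"
    TABLE_WISE = "table_wise"
    COLUMN_WISE = "column_wise"
    TABLE_COLUMN_WISE = "table_column_wise"
    ROW_WISE = "row_wise"
    TABLE_ROW_WISE = "table_row_wise"
    GRID_SHARD = "grid_shard"


# RW, TWRW, and GRID are the row-based types named in the summary.
ROW_BASED = {ShardingType.ROW_WISE, ShardingType.TABLE_ROW_WISE, ShardingType.GRID_SHARD}
# Assumed default set for FP sharding when the user opts into nothing extra.
NON_ROW_BASED = set(ShardingType) - ROW_BASED


def allowed_fp_sharding_types(
    user_row_based: Optional[Set[ShardingType]] = None,
) -> Set[ShardingType]:
    """Gate 1: row-based types are only allowed when the user asks for them."""
    allowed = set(NON_ROW_BASED)
    if user_row_based:
        allowed |= user_row_based & ROW_BASED
    return allowed


def needs_fp_preprocessing(plan: Dict[str, ShardingType]) -> bool:
    """Gate 2: preprocess pipeline input only if some table is row-based sharded."""
    return any(st in ROW_BASED for st in plan.values())


print(ShardingType.ROW_WISE in allowed_fp_sharding_types())  # False
print(needs_fp_preprocessing({"t0": ShardingType.TABLE_WISE}))  # False
print(needs_fp_preprocessing({"t0": ShardingType.GRID_SHARD}))  # True
```

With both predicates false, a plan behaves exactly as before the change, which mirrors the intent that FP sharding without row-based shardings keeps the original forward path.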

Differential Revision: D88093763

meta-codesync · 2025-12-10T21:37:10Z

@iamzainhuda has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88093763.

meta-cla bot added the CLA Signed label Dec 10, 2025

meta-codesync bot added fb-exported meta-exported labels Dec 10, 2025

iamzainhuda force-pushed the export-D88093763 branch from c7557d4 to e5815fc December 11, 2025 01:09

iamzainhuda force-pushed the export-D88093763 branch from e5815fc to 377fba1 December 11, 2025 20:51
