
Conversation

@TroyGarden (Contributor) commented Nov 10, 2025

Summary:

# context

* Previously, in `TrainPipelineBase`, the `cur_batch` (model input) was not released until `loss.backward()` was called.
* However, the `cur_batch` is only needed during the forward pass.
* This diff changes the order of operations so that the current batch is cleared right after the forward pass (see the sketch below).
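A minimal sketch of the reordering, assuming a standard PyTorch model and optimizer; this is not the actual `TrainPipelineBase` code, and `pipeline.cur_batch` simply mirrors the attribute named in the description:

```python
import torch

def progress_before(pipeline, model, optimizer) -> torch.Tensor:
    """Old ordering: the input batch survives until after backward."""
    optimizer.zero_grad()
    loss = model(pipeline.cur_batch)
    loss.backward()            # input still referenced here, so its memory
    pipeline.cur_batch = None  # is held through the backward-pass peak
    optimizer.step()
    return loss

def progress_after(pipeline, model, optimizer) -> torch.Tensor:
    """New ordering: the input batch is dropped right after forward."""
    optimizer.zero_grad()
    loss = model(pipeline.cur_batch)
    pipeline.cur_batch = None  # clear the input right after the forward pass,
    loss.backward()            # so its memory is free before backward starts
    optimizer.step()
    return loss
```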

NOTE: the peak memory usage usually occurs at the beginning of the backward pass, so clearing the no-longer-needed input batch before then reduces that peak.

* Benchmark comparison indicates a saving of roughly 1~1.5x the input batch size (~1 GB):

|name|GPU Peak Memory alloc|GPU Peak Memory reserved|
|--|--|--|
|before|35.94 GB|56.72 GB|
|after|34.33 GB|54.00 GB|
|before-inplace|35.94 GB|53.91 GB|
|after-inplace|34.33 GB|51.35 GB|
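Numbers like these can be collected with PyTorch's CUDA memory statistics; in the sketch below, `train_one_step` is a hypothetical stand-in for one pipeline iteration:

```python
import torch

torch.cuda.reset_peak_memory_stats()
train_one_step()  # hypothetical: one forward/backward/step iteration
torch.cuda.synchronize()

alloc_gb = torch.cuda.max_memory_allocated() / 1024**3
reserved_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"GPU Peak Memory alloc: {alloc_gb:.2f} GB, reserved: {reserved_gb:.2f} GB")
```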

NOTE: copying the batch to the GPU in place does not change the peak GPU memory allocation, but it can reduce the peak memory reservation.
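A sketch of what the in-place variant might look like, assuming a fixed-shape dense batch copied into a preallocated device buffer; the buffer, shapes, and function names are illustrative, not the actual pipeline code:

```python
import torch

device = torch.device("cuda")
gpu_batch = torch.empty(1024, 512, device=device)  # reused across iterations

def copy_batch_fresh(cpu_batch: torch.Tensor) -> torch.Tensor:
    # allocates a new device tensor every step
    return cpu_batch.to(device, non_blocking=True)

def copy_batch_inplace(cpu_batch: torch.Tensor) -> torch.Tensor:
    # reuses the same device buffer, so the caching allocator does not need
    # to keep extra blocks reserved for per-step input allocations
    gpu_batch.copy_(cpu_batch, non_blocking=True)
    return gpu_batch
```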

Differential Revision: D85483966

@meta-cla bot added the CLA Signed label on Nov 10, 2025
@meta-codesync bot commented Nov 10, 2025

@TroyGarden has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85483966.
