
Conversation


@tohskai commented Sep 21, 2025


meta-cla bot commented Sep 21, 2025

Hi @tohskai!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


meta-cla bot commented Sep 21, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla bot added the CLA Signed label Sep 21, 2025
@wwwjn (Contributor) left a comment


I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

@tohskai (Author) commented Sep 22, 2025

> I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)
[benchmark screenshots]

But our model is quite different, so it's totally reasonable to see smaller gains.
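
For anyone who wants to try the idea outside of this PR, here is a minimal sketch of the knob the partitioner uses, assuming a recent PyTorch where `torch._functorch.config` exposes `activation_memory_budget`; the toy MLP is just a stand-in, not the torchtitan integration:

```python
# Minimal sketch of the compile-time memory-budget knob discussed here.
# Assumes a recent PyTorch where torch._functorch.config exposes
# activation_memory_budget; the toy MLP stands in for a real model.
import torch
import torch.nn as nn
import torch._functorch.config as functorch_config

# 1.0 keeps the partitioner's default set of saved activations; 0.0
# approximates recomputing everything. Values in between trade activation
# memory for recompute time.
functorch_config.activation_memory_budget = 0.5

model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)]).cuda()
compiled = torch.compile(model)

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
compiled(x).sum().backward()  # the budget only affects compiled fwd/bwd graphs
```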

@tianyu-l requested a review from @soulitzer September 22, 2025 21:19
@wwwjn (Contributor) commented Sep 22, 2025

Thanks for sharing! We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.)

cc @soulitzer for reviewing

@tianyu-l (Contributor) left a comment

> I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)

I agree with @wwwjn that

> We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.)

Please refer to https://github.com/pytorch/torchtitan/blob/main/CONTRIBUTING.md#proof-of-value

@tohskai (Author) commented Oct 1, 2025

@wwwjn @soulitzer @tianyu-l

Should this support selection of activation_memory_budget_solver and activation_memory_budget_runtime_estimator? What about visualize_memory_budget_pareto? I found them useful, and their discoverability is low, but given that this is an unstable API and potentially feature overload, I would prefer to hear your opinion.

https://github.com/pytorch/pytorch/blob/main/torch/_functorch/config.py#L147-L169
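
For reference, here is a sketch of setting those knobs directly via `torch._functorch.config`; the values are illustrative, and whether or how torchtitan should surface them in its own config is exactly the question above:

```python
# Illustrative settings for the extra knobs (see the linked
# torch/_functorch/config.py for the authoritative defaults and options).
import torch._functorch.config as functorch_config

# Fraction of the default saved-activation memory the partitioner may use.
functorch_config.activation_memory_budget = 0.7
# Knapsack solver used to pick what to recompute: "dp", "greedy", or "ilp".
functorch_config.activation_memory_budget_solver = "dp"
# Runtime estimate used to rank recompute candidates: "flops" or "profile".
functorch_config.activation_memory_budget_runtime_estimator = "flops"
# Dump a pareto curve of estimated runtime vs. activation memory per budget.
functorch_config.visualize_memory_budget_pareto = True
```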

@tohskai (Author) commented Oct 1, 2025

I ran models/llama3/train_configs/llama3_8b.toml on 8xH100:

[screenshots of the run results]

Would that suffice for Proof of Value?
