
Conversation


@tohskai commented Sep 21, 2025


meta-cla bot commented Sep 21, 2025

Hi @tohskai!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


meta-cla bot commented Sep 21, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla bot added the CLA Signed label Sep 21, 2025
@wwwjn (Contributor) left a comment


I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

@tohskai (Author) commented Sep 22, 2025

> I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)
[benchmark screenshots]

But our model is quite different, so it's totally reasonable to see smaller gains.
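
For anyone who wants to try the idea outside of this PR, here is a minimal sketch of the knob the partitioner uses, assuming a recent PyTorch where `torch._functorch.config` exposes `activation_memory_budget`; the toy MLP is just a stand-in, not the torchtitan integration:

```python
# Minimal sketch of the compile-time memory-budget knob discussed here.
# Assumes a recent PyTorch where torch._functorch.config exposes
# activation_memory_budget; the toy MLP stands in for a real model.
import torch
import torch.nn as nn
import torch._functorch.config as functorch_config

# 1.0 keeps the partitioner's default set of saved activations; 0.0
# approximates recomputing everything. Values in between trade activation
# memory for recompute time.
functorch_config.activation_memory_budget = 0.5

model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)]).cuda()
compiled = torch.compile(model)

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
compiled(x).sum().backward()  # the budget only affects compiled fwd/bwd graphs
```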

@tianyu-l requested a review from @soulitzer September 22, 2025 21:19
@wwwjn (Contributor) commented Sep 22, 2025

Thanks for sharing! We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.)

cc @soulitzer for reviewing

@tianyu-l (Contributor) left a comment

> I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)

I agree with @wwwjn that

> We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.)

Please refer to https://github.com/pytorch/torchtitan/blob/main/CONTRIBUTING.md#proof-of-value

@tohskai (Author) commented Oct 1, 2025

@wwwjn @soulitzer @tianyu-l

Should this support selection of activation_memory_budget_solver and activation_memory_budget_runtime_estimator? What about visualize_memory_budget_pareto? I found them useful, and their discoverability is low, but given that this is an unstable API and potentially feature overload, I would prefer to hear your opinion.

https://github.com/pytorch/pytorch/blob/main/torch/_functorch/config.py#L147-L169
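
For reference, here is a sketch of setting those knobs directly via `torch._functorch.config`; the values are illustrative, and whether or how torchtitan should surface them in its own config is exactly the question above:

```python
# Illustrative settings for the extra knobs (see the linked
# torch/_functorch/config.py for the authoritative defaults and options).
import torch._functorch.config as functorch_config

# Fraction of the default saved-activation memory the partitioner may use.
functorch_config.activation_memory_budget = 0.7
# Knapsack solver used to pick what to recompute: "dp", "greedy", or "ilp".
functorch_config.activation_memory_budget_solver = "dp"
# Runtime estimate used to rank recompute candidates: "flops" or "profile".
functorch_config.activation_memory_budget_runtime_estimator = "flops"
# Dump a pareto curve of estimated runtime vs. activation memory per budget.
functorch_config.visualize_memory_budget_pareto = True
```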

@tohskai (Author) commented Oct 1, 2025

I ran models/llama3/train_configs/llama3_8b.toml on 8xH100:

[screenshots of the run results]

Would that suffice for Proof of Value?
