[loading] Re-add and improve disk offloading support #42242
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from c5d1a03 to 937e7e4
ArthurZucker
left a comment
For the weight renaming, IMO we should just use the prepared model state dict instead of the weights.
```python
# Offloading support
if param_device == "disk":
    missing_keys.discard(target_name)
    # If not already offloaded, or if we applied any special Operation, we need to re-save
    if target_name not in disk_offload_index or len(operations) > 0:
        disk_offload_index = offload_weight(
            param, target_name, disk_offload_folder, disk_offload_index
        )
```
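The diff above skips re-saving a weight unless it is new to the index or was transformed by an Operation during loading. A minimal, stdlib-only sketch of that pattern is below; the helper name `offload_weight` mirrors the diff, but the body here (JSON files, plain lists) is hypothetical, while the real implementation writes `safetensors` files:

```python
import json
import os
import tempfile


def offload_weight(param, name, folder, index):
    """Save a weight to disk and record its location in the offload index.

    Sketch only: `param` is stored as a JSON list here, whereas the real
    code serializes tensors to safetensors files. Returns the updated index.
    """
    path = os.path.join(folder, f"{name}.json")
    with open(path, "w") as f:
        json.dump(param, f)
    index[name] = {"path": path}
    return index


# Usage: mirror the diff's "skip re-saving unless needed" logic.
disk_offload_folder = tempfile.mkdtemp()
disk_offload_index = {}
operations = []  # special Operations applied during loading, if any

target_name = "model.layers.0.weight"
param = [0.1, 0.2, 0.3]

if target_name not in disk_offload_index or len(operations) > 0:
    disk_offload_index = offload_weight(
        param, target_name, disk_offload_folder, disk_offload_index
    )
```

Because the index is both the lookup table and the "already offloaded" marker, a second pass over the same weight with no pending operations is a no-op.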
This can be done inside `set_param` for the module, as it's kinda related.
I preferred to keep it outside for now, as it's actually not setting the param! (It just saves it to disk and skips loading.) So it would be a bit weird IMO to do that inside `set_param`, as it does not set it.
Force-pushed from c634b9a to f2cd562
Force-pushed from 49d6b3a to 9a0675a
ArthurZucker
left a comment
Very nice thanks!
I think there were other tests I skipped with a TODO @Cyrilvallez, but feel free to check.
Indeed I missed a few! Nice catch!
[For maintainers] Suggested jobs to run (before merge): run-slow: deepseek_vl_hybrid, glm4_moe
What does this PR do?
This PR re-adds support for weight offloading to disk via the `device_map`, which was temporarily dropped in #41580 to simplify that PR (because weights need to be re-saved correctly after performing custom Ops during dynamic loading). It further improves the offloading mechanism: everything is now offloaded in `safetensors` format, instead of the `numpy` format originally used with `accelerate`, which is much simpler and more efficient. Slow tests are exactly the same as before the big weight-loading refactor (10 failing across all models).
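The claim that a single `safetensors`-style format is simpler and more efficient than per-weight `numpy` files can be illustrated with a stdlib-only sketch: one JSON index maps each weight name to an offset and length inside a single binary blob, so any weight can be read back lazily with one seek, instead of opening one `.npy` file per weight. All names here are hypothetical and the layout only loosely mimics safetensors:

```python
import json
import os
import struct
import tempfile


def write_offloaded(weights, folder):
    """Write all weights into one binary blob plus a JSON index.

    Each index entry records the byte offset and length of a weight
    (stored as packed float32 values) inside the blob.
    """
    blob_path = os.path.join(folder, "weights.bin")
    index = {}
    with open(blob_path, "wb") as f:
        for name, values in weights.items():
            data = struct.pack(f"{len(values)}f", *values)
            index[name] = {"offset": f.tell(), "length": len(data)}
            f.write(data)
    with open(os.path.join(folder, "index.json"), "w") as f:
        json.dump(index, f)
    return blob_path, index


def read_offloaded(blob_path, index, name):
    """Lazily read back a single weight by seeking into the blob."""
    entry = index[name]
    with open(blob_path, "rb") as f:
        f.seek(entry["offset"])
        data = f.read(entry["length"])
    return list(struct.unpack(f"{entry['length'] // 4}f", data))


folder = tempfile.mkdtemp()
weights = {"w1": [1.0, 2.0], "w2": [3.0]}
blob, index = write_offloaded(weights, folder)
```

The single-blob-plus-index design is what makes on-demand loading cheap: fetching one offloaded weight touches only the bytes it owns, not the whole file.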