[Group Partitioner] Optimize Speed #12844
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12844
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit ec7a6b5 with merge base 21c8e67.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 55182c1
ghstack-comment-id: 3115642422
Pull Request resolved: pytorch#12844
@mcr229 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
We apply some optimizations to the group partitioner to improve its speed. Since partitions are already pre-grouped, it should in theory be faster than the capability-based partitioner. For example, a dynamically quantized group may contain 9 nodes; the capability-based partitioner has to run DFS over all 9 of those nodes in order to merge them together. Because the group-based partitioner already knows its groups from the provided hints, it skips these per-node checks and admits all 9 nodes as a unit, saving the traversal time. A sketch of the contrast is given below. Some stats when partitioning the mobile BERT model:
we see a 13x improvement in partitioning time when using the group-based partitioner, while still partitioning around the same number of nodes.
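
To make the contrast concrete, here is a minimal sketch of the two strategies. The names (`graph_nodes`, `is_supported`, `neighbors`, `pre_grouped`) are hypothetical placeholders, not the actual ExecuTorch partitioner API: the capability-based approach must traverse and support-check every node, while the group-based approach accepts each pre-defined group wholesale.

```python
# Hypothetical sketch contrasting the two partitioning strategies.
# All names below are assumptions for illustration, not the real API.

def capability_based_partition(graph_nodes, is_supported, neighbors):
    """Merge supported nodes by running a DFS from every unvisited node,
    support-checking each node encountered along the way."""
    partitions, visited = [], set()
    for node in graph_nodes:
        if node in visited or not is_supported(node):
            continue
        group, stack = [], [node]
        while stack:
            cur = stack.pop()
            if cur in visited or not is_supported(cur):
                continue
            visited.add(cur)
            group.append(cur)
            # Keep exploring: every neighbor gets its own support check.
            stack.extend(neighbors(cur))
        partitions.append(group)
    return partitions

def group_based_partition(pre_grouped):
    """Admit each pre-defined group as a unit: no per-node traversal or
    support checks, since the grouping hints already fix membership."""
    return [list(group) for group in pre_grouped]
```

Under this sketch, a 9-node dynamically quantized group costs the capability-based path 9 DFS visits and support checks, whereas the group-based path emits the group in a single step.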
Differential Revision: D79020720