Speedup FusionOptimizer #1615
19748ac to 97cef0b
Codecov Report
❌ Your patch check has failed because the patch coverage (97.24%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #1615 +/- ##
=======================================
Coverage 81.64% 81.64%
=======================================
Files 231 231
Lines 52997 52951 -46
Branches 9395 9390 -5
=======================================
- Hits 43267 43234 -33
+ Misses 7282 7273 -9
+ Partials 2448 2444 -4
Pull Request Overview
This PR optimizes the FusionOptimizer to improve compilation performance by a factor of 3-4x on benchmarked graphs. The optimization focuses on reducing redundant operations and using more efficient data structures for graph analysis.
- Implemented bitset-based ancestor dependency tracking for faster subgraph convexity checks
- Eliminated redundant graph cloning and toposort computations during fusion analysis
- Streamlined the fusion algorithm to avoid backtracking by using a more direct approach
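To illustrate the first bullet, here is a minimal sketch of how ancestor bitsets can make a convexity check cheap. It is not the code from this PR: the names `ancestor_bitsets` and `is_convex`, and the representation of the graph as a list of parent indices in topological order, are assumptions made for the example. A subgraph is treated as convex when no path between two of its members passes through a node outside it.

```python
def ancestor_bitsets(parents):
    """Return one int per node whose set bits mark that node's ancestors.

    `parents[i]` lists the indices of node i's inputs. Nodes are assumed
    to be numbered in topological order (every parent index < i).
    """
    anc = []
    for node_parents in parents:
        bits = 0
        for p in node_parents:
            # A node inherits its parent's ancestors plus the parent itself.
            bits |= anc[p] | (1 << p)
        anc.append(bits)
    return anc


def is_convex(subgraph, anc):
    """Check that no outside node sits on a path between two subgraph nodes."""
    mask = 0
    upstream = 0
    for n in subgraph:
        mask |= 1 << n
        upstream |= anc[n]
    # Ancestors of the subgraph that are not members of it.
    outside = upstream & ~mask
    while outside:
        low = outside & -outside  # isolate the lowest set bit
        v = low.bit_length() - 1
        # v feeds the subgraph; if it also depends on the subgraph,
        # a path leaves the subgraph and re-enters it.
        if anc[v] & mask:
            return False
        outside ^= low
    return True


# Hypothetical graph: 0 -> 1 -> 2 -> 3, plus a direct edge 0 -> 3.
parents = [[], [0], [1], [0, 2]]
anc = ancestor_bitsets(parents)
print(is_convex({0, 3}, anc))     # nodes 1 and 2 lie between 0 and 3
print(is_convex({1, 2, 3}, anc))
```

Each check is a handful of integer operations per candidate node, which is the general reason bitsets tend to beat repeated graph traversals here.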
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
File | Description |
---|---|
pytensor/tensor/rewriting/elemwise.py | Complete rewrite of FusionOptimizer logic with bitset-based dependency tracking and elimination of redundant operations |
pytensor/scalar/basic.py | Performance optimizations for scalar type creation, graph cleanup, and C code validation |
tests/tensor/rewriting/test_elemwise.py | Updated benchmarks with new test cases and expected fusion counts |
tests/test_printing.py | Updated expected test output reflecting changes in Composite operation ordering |
pytensor/tensor/conv/abstract_conv.py | Removed type checking comment for scipy import |
# FIXME: Shouldn't this be a unique name per unique variable?
["x" for x in inputs],
["z" for z in outputs],
The FIXME comment on line 1350 indicates a potential issue with variable naming that could cause conflicts. This should be addressed or the comment should be removed if the current implementation is intentionally correct.
- # FIXME: Shouldn't this be a unique name per unique variable?
- ["x" for x in inputs],
- ["z" for z in outputs],
+ [f"x{i}" for i in range(len(inputs))],
+ [f"z{i}" for i in range(len(outputs))],
This is also not correct: the name should be unique per unique variable, not per position. Anyway, I don't want to touch this now, which is why I added a TODO comment.
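For illustration only, a sketch of what "per unique variable" naming could look like, as opposed to the per-position naming in the suggestion. The helper `unique_names` is hypothetical and not part of PyTensor or this PR; it keys on object identity so that a variable appearing twice receives the same name both times.

```python
def unique_names(variables, prefix):
    """Assign one name per distinct object, reusing it for repeats."""
    names_by_id = {}
    names = []
    for var in variables:
        key = id(var)  # identity, not equality, decides uniqueness
        if key not in names_by_id:
            names_by_id[key] = f"{prefix}{len(names_by_id)}"
        names.append(names_by_id[key])
    return names


# Stand-ins for graph variables; `a` is used twice.
a, b = object(), object()
print(unique_names([a, b, a], "x"))
```

With per-position naming the repeated `a` would get two different names, which is the mismatch the reply points out.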
97ff68d to 930340c
Not using `__call__` avoids the test_value computation
It's not really needed as we never expand on the new nodes
930340c to 986ba6f
The change in the number of fused kernels has to do with the order of iteration, and could be replicated in the old approach by iterating in topological order. It was an accident that the old approach happened to visit nodes in an order where it connected two branches instead of keeping them separate. The underlying limitation already existed and is described in pymc-devs#249
986ba6f to eb010b7
Tests are passing and conflicts sorted
FusionOptimizer can be one of the slower rewrites during compilation. This PR speeds it up by a factor of 3-4x in the benchmarked graphs.
Each commit provides a substantial speedup (except maybe for 2 -> 3).
The main speedups come from:
The logic for finding valid fused kernels is also more clear now imo, avoiding the need for backtracking.
Benchmark per commit