ricardoV94 (Member) commented Sep 20, 2025

FusionOptimizer can be one of the slower rewrites during compilation. This PR speeds it up by a factor of 3-4x in the benchmarked graphs.

Each commit provides a substantial speedup (except maybe 2 -> 3).

The main speedups come from:

  1. Reducing the number of inner graph clonings when creating a Composite Op
  2. Reducing the number of toposort computations
  3. Using bitsets and bitflags to efficiently compute ancestor dependencies across sets of variables (to ask: do these variables depend on these others?)

The logic for finding valid fused kernels is also clearer now imo, and it avoids the need for backtracking.
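
The bitset idea in point 3 can be illustrated with a small sketch. This is not the code from this PR: the toy graph and the helper names `ancestor_bitsets` and `depends_on` are hypothetical, and plain Python integers stand in for bitsets. Each node gets one bit, and a node's mask is the union of its own bit and the masks of its parents, so a dependency query reduces to a single bitwise AND.

```python
def ancestor_bitsets(toposorted_nodes, parents):
    """Map each node to an int whose set bits mark the node and all its ancestors.

    `toposorted_nodes` must list every parent before its children.
    """
    bit_of = {node: 1 << i for i, node in enumerate(toposorted_nodes)}
    ancestors = {}
    for node in toposorted_nodes:
        mask = bit_of[node]  # in this sketch a node counts as its own ancestor
        for parent in parents.get(node, ()):
            mask |= ancestors[parent]  # parents were already processed
        ancestors[node] = mask
    return bit_of, ancestors


def depends_on(nodes, others, bit_of, ancestors):
    """Return True if any node in `nodes` depends on any node in `others`."""
    others_mask = 0
    for other in others:
        others_mask |= bit_of[other]
    return any(ancestors[node] & others_mask for node in nodes)


# Hypothetical toy graph: a -> b -> d, a -> c, and e is unrelated
order = ["a", "b", "c", "d", "e"]
parents = {"b": ["a"], "c": ["a"], "d": ["b"]}
bit_of, ancestors = ancestor_bitsets(order, parents)

print(depends_on(["d"], ["a"], bit_of, ancestors))
print(depends_on(["c"], ["b"], bit_of, ancestors))
```

The masks are built in one pass over a topological order, after which each query costs a few integer operations instead of a graph traversal.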

Benchmark per commit
HEAD is now at 774792356 Benchmark another FusionOptimizer graph
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        23.0197 (1.0)       54.2629 (1.0)       36.1126 (1.0)      12.1105 (1.99)      41.8545 (1.0)      18.0821 (2.25)          2;0  27.6912 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     485.6301 (21.10)    503.2956 (9.28)     496.1585 (13.74)     6.0741 (1.0)      496.9919 (11.87)     8.0230 (1.0)           2;0   2.0155 (0.07)          7           5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at c7b287f39 Short-circuit `as_scalar` common cases faster
---------------------------------------------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        23.2032 (1.0)       30.3229 (1.0)       27.0646 (1.0)      3.2161 (1.0)       28.9586 (1.0)      5.9681 (1.0)           3;0  36.9486 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     487.7418 (21.02)    499.8394 (16.48)    494.1633 (18.26)    4.6597 (1.45)     494.5370 (17.08)    8.1017 (1.36)          3;0   2.0236 (0.05)          7           5
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at 598d9fcb9 Speedup supports c_code
---------------------------------------------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        22.3723 (1.0)       30.8797 (1.0)       26.4816 (1.0)      3.6947 (1.0)       28.3610 (1.0)      6.6708 (1.0)           3;0  37.7620 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     489.3477 (21.87)    504.9612 (16.35)    495.2749 (18.70)    6.1049 (1.65)     492.2781 (17.36)    9.8775 (1.48)          1;0   2.0191 (0.05)          7           5
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at 41707795b Speedup FusionOptimizer.elemwise_to_scalar
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        19.5081 (1.0)       27.7007 (1.0)       21.9665 (1.0)       3.4888 (1.0)       20.2496 (1.0)      5.1938 (1.0)           2;0  45.5238 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     419.0247 (21.48)    617.6116 (22.30)    461.7518 (21.02)    69.2312 (19.84)    439.2438 (21.69)    9.9747 (1.92)          1;2   2.1657 (0.05)          7           5
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at 34de75ff5 Avoid double cloning of Composite Ops created by FusionOptimizer
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        17.0792 (1.0)       24.1214 (1.0)       19.3538 (1.0)       3.1298 (1.0)       17.6328 (1.0)       5.0325 (1.0)           2;0  51.6693 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     381.0516 (22.31)    455.6597 (18.89)    407.9326 (21.08)    27.3474 (8.74)     398.1366 (22.58)    37.5527 (7.46)          1;0   2.4514 (0.05)          7           5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at 543a8a23a Do not recompute toposort in every iteration of FusionOptimizer
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        14.3812 (1.0)       21.8006 (1.0)       16.3498 (1.0)       3.1517 (1.0)       14.5783 (1.0)       4.2066 (1.0)           2;0  61.1627 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     244.2117 (16.98)    279.3751 (12.82)    261.6797 (16.01)    11.5636 (3.67)     264.0385 (18.11)    14.4312 (3.43)          2;0   3.8215 (0.06)          7           5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at dd75569a1 Cleanup FusionOptimizer code
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        13.8777 (1.0)       21.8236 (1.0)       15.9468 (1.0)       3.5019 (1.0)       13.9371 (1.0)       4.7736 (1.0)           2;0  62.7085 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     243.7067 (17.56)    276.4169 (12.67)    257.8971 (16.17)    12.8118 (3.66)     256.3974 (18.40)    22.7781 (4.77)          3;0   3.8775 (0.06)          7           5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at 68ca3cf36 Copy on write in FusionOptimizer
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]        14.1523 (1.0)       20.7984 (1.0)       16.0264 (1.0)       2.8238 (1.0)       14.4556 (1.0)       3.9848 (1.0)           2;0  62.3970 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     195.4694 (13.81)    264.7950 (12.73)    221.6116 (13.83)    22.3072 (7.90)     213.4827 (14.77)    19.7748 (4.96)          2;1   4.5124 (0.07)          7           5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at bb3e54c57 Use bitset to check ancestors more efficiently
------------------------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean             StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]         8.1186 (1.0)       14.0557 (1.0)        9.9162 (1.0)       2.6646 (1.0)        8.5550 (1.0)       4.1075 (1.0)           2;0  100.8451 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     152.0318 (18.73)    176.6330 (12.57)    163.5286 (16.49)    10.0701 (3.78)     165.0426 (19.29)    18.2174 (4.44)          2;0    6.1151 (0.06)          7           5
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HEAD is now at c392faec9 Avoid backtracking in FusionOptimizer
----------------------------------------------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                         Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rewrite_benchmark[deep_small_kernels-20-expected_n_repl0]         6.9413 (1.0)       14.3935 (1.0)        8.9253 (1.0)      3.2081 (1.0)        7.1780 (1.0)      4.3681 (1.0)           2;0  112.0412 (1.0)           7           5
test_rewrite_benchmark[large_fuseable_graph-25-expected_n_repl1]     140.8090 (20.29)    155.4400 (10.80)    149.6261 (16.76)    5.7609 (1.80)     151.9854 (21.17)    9.8091 (2.25)          3;0    6.6833 (0.06)          7           5
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
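
As a quick check of the overall claim, the snippet below divides the mean times of the first table (commit 774792356) by those of the last table (commit c392faec9). It assumes the first table is the baseline and the last one reflects the final state of the branch; the numbers are copied from the Mean column above.

```python
# Mean times in ms, copied from the first and last benchmark tables above
mean_ms = {
    "deep_small_kernels": (36.1126, 8.9253),
    "large_fuseable_graph": (496.1585, 149.6261),
}

# Speedup = time before / time after
speedups = {name: before / after for name, (before, after) in mean_ms.items()}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

Under that assumption the ratios come out at roughly 4.0x for `deep_small_kernels` and 3.3x for `large_fuseable_graph`, in line with the 3-4x figure in the description.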

@ricardoV94 ricardoV94 force-pushed the faster_fusion_optimizer_based_on_edges branch 4 times, most recently from 19748ac to 97cef0b Compare September 20, 2025 11:10

codecov bot commented Sep 20, 2025

Codecov Report

❌ Patch coverage is 97.24138% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.64%. Comparing base (96122d1) to head (eb010b7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pytensor/tensor/rewriting/elemwise.py | 97.41% | 1 Missing and 2 partials ⚠️ |
| pytensor/scalar/basic.py | 96.55% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (97.24%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1615   +/-   ##
=======================================
  Coverage   81.64%   81.64%           
=======================================
  Files         231      231           
  Lines       52997    52951   -46     
  Branches     9395     9390    -5     
=======================================
- Hits        43267    43234   -33     
+ Misses       7282     7273    -9     
+ Partials     2448     2444    -4     
| Files with missing lines | Coverage Δ |
|---|---|
| pytensor/scalar/basic.py | 80.57% <96.55%> (-0.01%) ⬇️ |
| pytensor/tensor/rewriting/elemwise.py | 93.61% <97.41%> (+1.00%) ⬆️ |

... and 1 file with indirect coverage changes

@ricardoV94 ricardoV94 marked this pull request as ready for review September 20, 2025 14:55

@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes the FusionOptimizer to improve compilation performance by a factor of 3-4x on benchmarked graphs. The optimization focuses on reducing redundant operations and using more efficient data structures for graph analysis.

  • Implemented bitset-based ancestor dependency tracking for faster subgraph convexity checks
  • Eliminated redundant graph cloning and toposort computations during fusion analysis
  • Streamlined the fusion algorithm to avoid backtracking by using a more direct approach
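
A convexity check of the kind mentioned in the first bullet could look like the sketch below. It is an illustration under assumptions, not the PR's implementation: `closure_bitsets` and `is_convex` are hypothetical names, and Python integers stand in for bitsets. A set of nodes is convex if no outside node lies on a path between two of its members; fusing a non-convex set would make the fused node depend on something that depends on it, creating a cycle.

```python
def closure_bitsets(order, parents):
    """Return (bit_of, ancestors, descendants) as int bitsets.

    `order` must be a topological order; each mask includes the node itself.
    """
    bit_of = {n: 1 << i for i, n in enumerate(order)}
    ancestors = {}
    for n in order:
        mask = bit_of[n]
        for p in parents.get(n, ()):
            mask |= ancestors[p]
        ancestors[n] = mask
    descendants = {n: bit_of[n] for n in order}
    # Walk children before parents so each node's mask is complete when pushed up
    for n in reversed(order):
        for p in parents.get(n, ()):
            descendants[p] |= descendants[n]
    return bit_of, ancestors, descendants


def is_convex(subgraph, bit_of, ancestors, descendants):
    """True if no node outside `subgraph` sits between two of its nodes."""
    inside = above = below = 0
    for n in subgraph:
        inside |= bit_of[n]
        above |= ancestors[n]
        below |= descendants[n]
    # Nodes that are both downstream and upstream of the subgraph must be inside it
    return (above & below & ~inside) == 0


# Hypothetical diamond: a -> b -> d and a -> c -> d
order = ["a", "b", "c", "d"]
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
bit_of, ancestors, descendants = closure_bitsets(order, parents)

print(is_convex(["a", "b"], bit_of, ancestors, descendants))
print(is_convex(["a", "b", "d"], bit_of, ancestors, descendants))
```

In the diamond, `{a, b, d}` is not convex because `c` sits between `a` and `d` but is left outside.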

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| pytensor/tensor/rewriting/elemwise.py | Complete rewrite of FusionOptimizer logic with bitset-based dependency tracking and elimination of redundant operations |
| pytensor/scalar/basic.py | Performance optimizations for scalar type creation, graph cleanup, and C code validation |
| tests/tensor/rewriting/test_elemwise.py | Updated benchmarks with new test cases and expected fusion counts |
| tests/test_printing.py | Updated expected test output reflecting changes in Composite operation ordering |
| pytensor/tensor/conv/abstract_conv.py | Removed type checking comment for scipy import |

Comment on lines +1350 to 1352
# FIXME: Shouldn't this be a unique name per unique variable?
["x" for x in inputs],
["z" for z in outputs],

Copilot AI Sep 20, 2025

The FIXME comment on line 1350 indicates a potential issue with variable naming that could cause conflicts. This should be addressed or the comment should be removed if the current implementation is intentionally correct.

Suggested change
- # FIXME: Shouldn't this be a unique name per unique variable?
- ["x" for x in inputs],
- ["z" for z in outputs],
+ [f"x{i}" for i in range(len(inputs))],
+ [f"z{i}" for i in range(len(outputs))],

Member Author

This is also not correct; it should be per unique variable. Anyway, I don't want to touch this now, which is why I added the FIXME comment.
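
One possible reading of "per unique variable" is sketched below. This is only an illustration of the distinction, not a proposed patch: `unique_names` is a hypothetical helper, and plain Python objects stand in for graph variables. Index-based naming gives a repeated variable two different names, while identity-based naming gives it the same name each time it appears.

```python
def unique_names(variables, prefix):
    """Give each distinct object one name; repeated objects reuse their name."""
    names_by_id = {}
    names = []
    for var in variables:
        key = id(var)  # identity, so equal-but-distinct objects stay separate
        if key not in names_by_id:
            names_by_id[key] = f"{prefix}{len(names_by_id)}"
        names.append(names_by_id[key])
    return names


# Stand-ins for graph variables; `a` appears twice in the input list
a, b = object(), object()
inputs = [a, b, a]

index_based = [f"x{i}" for i in range(len(inputs))]
identity_based = unique_names(inputs, "x")

print(index_based)
print(identity_based)
```

Here the index-based list is `['x0', 'x1', 'x2']`, while the identity-based list is `['x0', 'x1', 'x0']`.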

@ricardoV94 ricardoV94 force-pushed the faster_fusion_optimizer_based_on_edges branch 6 times, most recently from 97ff68d to 930340c Compare September 23, 2025 06:05
@ricardoV94 ricardoV94 force-pushed the faster_fusion_optimizer_based_on_edges branch from 930340c to 986ba6f Compare September 23, 2025 06:05
The change in the number of fused kernels has to do with the order of iteration, and could be replicated in the old approach by iterating in topological order. It was an accident that the old approach happened to visit nodes in an order that connected two branches instead of keeping them separate. The underlying limitation already existed and is described in pymc-devs#249.
@ricardoV94
Member Author

Tests are passing and conflicts sorted
