[Autocast] Optimize `_add_cast` runtime #469

aboubezari · 2025-10-27T20:44:39Z

What does this PR do?

Type of change: Bug fix

Overview: Optimize the runtime of the _add_cast function. The current implementation recomputes the producers and consumers from scratch for every tensor it processes. Each recomputation scans the whole graph, so the overall runtime is O(N^2). The new implementation precomputes all producers and consumers of the graph once and then looks them up in constant time while looping through the tensors. Recomputing them on every iteration is unnecessary: each tensor is processed only once, and each iteration affects only the consumers of that tensor.
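The difference between the two approaches can be sketched in plain Python. This is a simplified illustration, not the modelopt implementation. `Node` is a hypothetical stand-in for an ONNX `NodeProto` that keeps only `name`, `input` and `output`. `get_consumers_naive`, `get_producers_naive` and `build_tensor_maps` are illustrative names. The map-building loop mirrors the one quoted in the review below.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical stand-in for an ONNX NodeProto: only the fields used here.
@dataclass
class Node:
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def get_consumers_naive(nodes, tensor):
    # Scans every node on each call: O(N) per query, O(N^2) if called per tensor.
    return [n for n in nodes if tensor in n.input]


def get_producers_naive(nodes, tensor):
    return [n for n in nodes if tensor in n.output]


def build_tensor_maps(nodes):
    # A single pass over the graph builds both lookup tables up front.
    # Note: a node that reads the same tensor twice (e.g. Mul(x, x)) is
    # appended twice to that tensor's consumer list.
    tensor_to_consumers = defaultdict(list)
    tensor_to_producers = defaultdict(list)
    for node in nodes:
        for name in node.input:
            tensor_to_consumers[name].append(node)
        for name in node.output:
            tensor_to_producers[name].append(node)
    return tensor_to_consumers, tensor_to_producers


nodes = [
    Node("a", input=["x"], output=["t0"]),
    Node("b", input=["t0"], output=["t1"]),
    Node("c", input=["t0", "t1"], output=["y"]),
]
consumers, producers = build_tensor_maps(nodes)
print([n.name for n in consumers["t0"]])  # ['b', 'c']
print([n.name for n in producers["t1"]])  # ['b']
```

After the one-time build, each lookup is a dictionary access. The naive helpers rescan the node list on every call.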


Testing

Precision converter unittest

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: Yes
Did you add or update any necessary documentation?: No
Did you update Changelog?: No

Signed-off-by: Ali Boubezari <[email protected]>

copy-pr-bot · 2025-10-27T20:44:42Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

galagam · 2025-10-29T16:14:08Z

modelopt/onnx/autocast/precisionconverter.py

+        tensor_to_consumers = defaultdict(list)
+        tensor_to_producers = defaultdict(list)
+
+        for node in self.model.graph.node:
+            for input in node.input:
+                tensor_to_consumers[input].append(node)
+            for output in node.output:
+                tensor_to_producers[output].append(node)


This is indeed more efficient, but it's a bit risky. It relies on the assumption that the graph is static, meaning the list of consumers and producers for each tensor is constant. However, when we inject cast nodes, we render this data invalid.

In the limited scope where this is applied, it's probably OK, because we call self._remove_preexisting_casts(), which prevents "chains" of cast nodes (cast->cast->cast). But we cannot replace all calls to utils.get_producer_nodes and utils.get_consumer_nodes, nor can we modify all _add_cast call sites (which you probably know, since you avoided that in this PR).
I think this at least warrants a comment to warn unsuspecting developers.
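The hazard described above can be reproduced with a toy graph. This sketch is illustrative only. `Node`, `build_consumer_map` and `insert_cast` are hypothetical simplifications; the real `_add_cast` also deals with precision, graph outputs and naming. The sketch inserts one Cast in front of a tensor's consumers and then compares the precomputed map with a fresh recomputation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for an ONNX NodeProto.
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def build_consumer_map(nodes):
    tensor_to_consumers = defaultdict(list)
    for node in nodes:
        for name in node.input:
            tensor_to_consumers[name].append(node)
    return tensor_to_consumers


def insert_cast(nodes, tensor, consumers):
    # Hypothetical, simplified cast insertion: route every given consumer of
    # `tensor` through a new Cast node that reads `tensor`.
    cast_out = f"{tensor}_cast"
    cast = Node(f"{tensor}_Cast", input=[tensor], output=[cast_out])
    for consumer in consumers:
        consumer.input = [cast_out if n == tensor else n for n in consumer.input]
    nodes.append(cast)
    return cast


nodes = [
    Node("a", input=["x"], output=["t0"]),
    Node("b", input=["t0"], output=["y"]),
]
stale = build_consumer_map(nodes)           # computed once, before any edits
insert_cast(nodes, "t0", stale["t0"])
fresh = build_consumer_map(nodes)           # recomputed after the edit

print([n.name for n in stale["t0"]])        # ['b']  (but b no longer reads t0)
print([n.name for n in fresh["t0"]])        # ['t0_Cast']
print([n.name for n in stale["t0_cast"]])   # []  (the new tensor is missing)
```

After a single insertion, the precomputed map already reports the wrong consumer for `t0` and knows nothing about `t0_cast`. The maps stay trustworthy only if no later iteration reads a consumer list that an earlier iteration rewired. That condition holds in this PR because each tensor is processed once and pre-existing casts are removed first.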

aboubezari · Oct 29, 2025

You're right: we can only do this optimization because there are no cast nodes in the graph and the producers/consumers are guaranteed not to change from iteration to iteration. That's why I made sure to keep the params optional for this specific case.

It's up to you how you want to proceed. We can:

1. Keep it as is. I added a warning for other devs, and I think devs using this function can simply leave those optional params empty to keep the safe behavior.

2. Write a separate function, or move this logic out of _add_cast, to make it explicit that it only applies to this use case.
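Option 1 amounts to an opt-in fast path. Below is a minimal sketch of that pattern under a simplified, assumed signature. `add_cast` and its `tensor_to_consumers` parameter are hypothetical names, not the actual `_add_cast` signature. If the caller passes no map, the function rescans the current graph, which stays correct after earlier edits. If the caller passes a map, the caller is responsible for keeping it valid.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for an ONNX NodeProto.
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def add_cast(nodes, tensor, tensor_to_consumers=None):
    """Route the consumers of `tensor` through a new Cast node (simplified)."""
    if tensor_to_consumers is None:
        # Safe default: rescan the current graph, O(N) per call.
        consumers = [n for n in nodes if tensor in n.input]
    else:
        # Fast path: O(1) lookup. The caller must guarantee the map is still
        # accurate for `tensor`, i.e. no earlier edit rewired its consumers.
        consumers = tensor_to_consumers.get(tensor, [])
    cast_out = f"{tensor}_cast"
    cast = Node(f"{tensor}_Cast", input=[tensor], output=[cast_out])
    for consumer in consumers:
        consumer.input = [cast_out if name == tensor else name for name in consumer.input]
    nodes.append(cast)
    return cast


nodes = [Node("a", input=["x"], output=["t0"]), Node("b", input=["t0"], output=["y"])]
add_cast(nodes, "t0")  # no map passed: takes the safe, rescanning path
print([n.input for n in nodes])  # [['x'], ['t0_cast'], ['t0']]
```

With this design, existing callers keep the safe behavior by default. Only a caller that can guarantee a static graph has to opt in to the precomputed map.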

galagam · Oct 29, 2025

Thanks for adding the comments. Approved.

codecov · 2025-10-29T16:39:40Z

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.39%. Comparing base (41de55f) to head (8b8c735).
⚠️ Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/onnx/autocast/precisionconverter.py	84.61%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #469   +/-   ##
=======================================
  Coverage   73.38%   73.39%           
=======================================
  Files         180      180           
  Lines       18110    18138   +28     
=======================================
+ Hits        13290    13312   +22     
- Misses       4820     4826    +6
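The percentages in the report follow from the counts in the table. A quick sanity check is shown below. The 13-line patch total is an inference, not a reported figure: 2 missing lines at 84.61538% implies 11 of 13 patch lines covered. The report lists no partial lines, so coverage is simply hits divided by lines.

```python
def coverage_pct(hits, lines):
    # Line coverage as covered lines / tracked lines, in percent
    # (no partially covered lines appear in this report).
    return 100.0 * hits / lines


print(f"{coverage_pct(13290, 18110):.2f}")  # 73.38  (main)
print(f"{coverage_pct(13312, 18138):.2f}")  # 73.39  (PR head)
print(f"{coverage_pct(11, 13):.5f}")        # 84.61538  (patch, inferred 11/13)
```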



galagam · 2025-10-29T18:44:42Z

/ok to test 8b8c735


[Autocast] Optimize runtime

0ee36a0

Signed-off-by: Ali Boubezari <[email protected]>

aboubezari requested a review from a team as a code owner October 27, 2025 20:44

aboubezari requested a review from i-riyad October 27, 2025 20:44

galagam self-requested a review October 28, 2025 16:25

galagam reviewed Oct 29, 2025


add warning to devs about precomputed maps

8b8c735

Signed-off-by: Ali Boubezari <[email protected]>

aboubezari requested a review from galagam October 29, 2025 17:31

galagam approved these changes Oct 29, 2025


galagam enabled auto-merge (squash) October 29, 2025 18:47

galagam merged commit f2eb794 into NVIDIA:main Oct 29, 2025
26 checks passed

kevalmorabia97 pushed a commit that referenced this pull request Oct 30, 2025

[Autocast] Optimize _add_cast runtime (#469)

9e827a9

Signed-off-by: Ali Boubezari <[email protected]>
