@aboubezari
Contributor

@aboubezari aboubezari commented Oct 27, 2025

What does this PR do?

Type of change: Bug fix

Overview: Optimize the runtime of the _add_cast function. The previous implementation greedily re-computed the producers and consumers of each node on every call, leading to an O(N^2) runtime. The new implementation pre-computes all producers and consumers of the graph once and looks them up in constant time while looping through the tensors. There is no need to re-compute producers and consumers on each iteration, since we process each tensor only once and only affect the consumers of that tensor during each iteration.
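
To illustrate the difference described in the overview, here is a minimal sketch, not the code from this PR. `Node` is a hypothetical stand-in for an ONNX node, and `consumers_by_scan` and `build_tensor_maps` are illustrative names. It contrasts a per-tensor scan of the graph with a single pre-computation pass.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node (name, input names, output names)."""
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def consumers_by_scan(nodes, tensor_name):
    """Scan every node for one tensor: O(N) per lookup, O(N^2) over all tensors."""
    return [n for n in nodes if tensor_name in n.input]


def build_tensor_maps(nodes):
    """Walk the graph once; later lookups are plain dictionary accesses."""
    tensor_to_consumers = defaultdict(list)
    tensor_to_producers = defaultdict(list)
    for node in nodes:
        for name in node.input:
            tensor_to_consumers[name].append(node)
        for name in node.output:
            tensor_to_producers[name].append(node)
    return tensor_to_consumers, tensor_to_producers


nodes = [
    Node("conv", ["x", "w"], ["y"]),
    Node("relu", ["y"], ["z"]),
    Node("add", ["y", "z"], ["out"]),
]
tensor_to_consumers, tensor_to_producers = build_tensor_maps(nodes)

# Both approaches agree as long as the graph has not changed since the maps were built.
print([n.name for n in tensor_to_consumers["y"]])
print([n.name for n in consumers_by_scan(nodes, "y")])
```

Both lookups return the same nodes here; the pre-computed maps simply avoid repeating the full scan for every tensor.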

Before:
image

After:
image

Testing

Precision converter unittest

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Signed-off-by: Ali Boubezari <[email protected]>
@aboubezari aboubezari requested a review from a team as a code owner October 27, 2025 20:44
@aboubezari aboubezari requested a review from i-riyad October 27, 2025 20:44
@copy-pr-bot
copy-pr-bot bot commented Oct 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@galagam galagam self-requested a review October 28, 2025 16:25
Comment on lines +187 to +194
tensor_to_consumers = defaultdict(list)
tensor_to_producers = defaultdict(list)

for node in self.model.graph.node:
    for input in node.input:
        tensor_to_consumers[input].append(node)
    for output in node.output:
        tensor_to_producers[output].append(node)
Contributor

This is indeed more efficient, but it's a bit risky. It relies on the assumption that the graph is static, meaning the list of consumers and producers for each tensor is constant. However, when we inject cast nodes, we render this data invalid.

In the limited scope where this is applied, it's probably OK, because we call self._remove_preexisting_casts() which prevents "chains" of cast nodes (cast->cast->cast). But we cannot replace all calls to utils.get_consumer_nodes and utils.get_producer_nodes, nor can we modify all _add_cast instances (which you probably know, because you avoided that in this PR).
I think this at least warrants a comment to warn unsuspecting developers.
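
A minimal sketch of the risk described above, using a hypothetical `Node` stand-in rather than the real ONNX graph or the PR's code: once a cast node is injected, a consumer map built beforehand no longer matches the graph.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node (name, input names, output names)."""
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def build_consumer_map(nodes):
    """Map each tensor name to the nodes that read it."""
    consumers = defaultdict(list)
    for node in nodes:
        for name in node.input:
            consumers[name].append(node)
    return consumers


nodes = [Node("conv", ["x"], ["y"]), Node("relu", ["y"], ["z"])]
cached = build_consumer_map(nodes)  # built before the graph is modified

# Inject a cast between conv and relu: y -> cast_y -> y_cast -> relu.
relu = nodes[1]
nodes.insert(1, Node("cast_y", ["y"], ["y_cast"]))
relu.input[0] = "y_cast"

# The cached map still reports relu as the consumer of y and knows nothing about y_cast.
stale = [n.name for n in cached["y"]]
fresh = [n.name for n in build_consumer_map(nodes)["y"]]
print(stale, fresh, cached.get("y_cast", []))
```

The cached map is only safe while each processed tensor's entry is read before any modification that affects it, which is the assumption the review comment asks to document.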

Contributor Author

You're right, we can only do this optimization because there are no cast nodes in the graph and the producers/consumers are guaranteed not to be affected from iteration to iteration. That is why I made sure to keep the params optional for this specific case.

It's up to you how you want to proceed, we can:

  1. Keep as is. I added a warning for other devs, and devs using this function can leave those optional params empty to keep the safe behavior.
  2. Write a separate function or move this logic out of _add_cast to make it explicit for this use case.
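
The optional-parameter pattern from option 1 could look roughly like the sketch below. `get_consumers` and `Node` are hypothetical names for illustration and do not reflect the actual `_add_cast` signature: when no pre-computed map is passed, the function falls back to scanning the current graph.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node (name, input names, output names)."""
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def get_consumers(nodes, tensor_name, tensor_to_consumers=None):
    """Return the consumers of a tensor.

    WARNING: only pass tensor_to_consumers if the graph has not been modified
    since the map was built. Leave it as None for the safe (slower) behavior.
    """
    if tensor_to_consumers is None:
        # Safe path: always reflects the current state of the graph.
        return [n for n in nodes if tensor_name in n.input]
    # Fast path: constant-time lookup in the caller-supplied map.
    return tensor_to_consumers.get(tensor_name, [])


nodes = [Node("conv", ["x"], ["y"]), Node("relu", ["y"], ["z"])]

index = defaultdict(list)
for node in nodes:
    for name in node.input:
        index[name].append(node)

safe = get_consumers(nodes, "y")         # scans the graph
fast = get_consumers(nodes, "y", index)  # uses the pre-computed map
print([n.name for n in safe], [n.name for n in fast])
```

Keeping the default as `None` means existing callers keep the safe behavior without any change.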

Contributor

Thanks for adding the comments. Approved.

@codecov
codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.39%. Comparing base (41de55f) to head (8b8c735).
⚠️ Report is 6 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/onnx/autocast/precisionconverter.py | 84.61% | 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #469   +/-   ##
=======================================
  Coverage   73.38%   73.39%           
=======================================
  Files         180      180           
  Lines       18110    18138   +28     
=======================================
+ Hits        13290    13312   +22     
- Misses       4820     4826    +6     

@aboubezari aboubezari requested a review from galagam October 29, 2025 17:31
@galagam
Contributor

galagam commented Oct 29, 2025

/ok to test 8b8c735

@galagam galagam enabled auto-merge (squash) October 29, 2025 18:47
@galagam galagam merged commit f2eb794 into NVIDIA:main Oct 29, 2025
26 checks passed
kevalmorabia97 pushed a commit that referenced this pull request Oct 30, 2025