          [Autocast] Optimize _add_cast runtime
          #469
        
Conversation
Signed-off-by: Ali Boubezari <[email protected]>
```python
tensor_to_consumers = defaultdict(list)
tensor_to_producers = defaultdict(list)

for node in self.model.graph.node:
    for input in node.input:
        tensor_to_consumers[input].append(node)
    for output in node.output:
        tensor_to_producers[output].append(node)
```
This is indeed more efficient, but it's a bit risky. It relies on the assumption that the graph is static, meaning the list of consumers and producers for each tensor is constant. However, when we inject cast nodes, we render this data invalid.
In the limited scope where this is applied, it's probably OK, because we call self._remove_preexisting_casts(), which prevents "chains" of cast nodes (cast->cast->cast). But we cannot replace all calls to utils.get_consumer_nodes and utils.get_producer_nodes, nor can we modify all _add_cast call sites (which you probably know, because you avoided that in this PR).
I think this at least warrants a comment to warn unsuspecting developers.
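The hazard can be seen with a toy example (plain Python stand-ins for ONNX `NodeProto` objects, not the actual modelopt code): once a cast node is spliced into the graph, a map built beforehand no longer reflects reality.

```python
from collections import defaultdict

# Toy stand-in for an ONNX NodeProto; not the real onnx API.
class Node:
    def __init__(self, name, inputs, outputs):
        self.name, self.input, self.output = name, list(inputs), list(outputs)

nodes = [Node("matmul", ["x"], ["t0"]), Node("relu", ["t0"], ["y"])]

# Precompute consumers once, as the diff above does.
tensor_to_consumers = defaultdict(list)
for node in nodes:
    for inp in node.input:
        tensor_to_consumers[inp].append(node)

# Now inject a cast between matmul and relu, rewiring relu's input.
cast = Node("cast_0", ["t0"], ["t0_cast"])
nodes.insert(1, cast)
nodes[2].input[0] = "t0_cast"

# The precomputed map is stale: it still lists relu as a consumer of t0,
# and has no entry at all for the new tensor t0_cast.
print([n.name for n in tensor_to_consumers["t0"]])  # ['relu'] (actual consumer is now cast_0)
print(tensor_to_consumers["t0_cast"])               # [] (missing the rewired relu)
```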
You're right, we can only do this optimization because there are no cast nodes in the graph yet and the producers/consumers are guaranteed not to change from iteration to iteration; that's why I made sure to keep the params optional for this specific case.
It's up to you how you want to proceed. We can:
- Keep it as is: I added a warning for other devs, and devs using this function can simply leave the optional params empty to keep the safe behavior.
- Write a separate function, or move this logic out of _add_cast, to make it explicit that it is only for this use case.
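The optional-parameter approach described above might look roughly like this sketch (the function and helper names here are hypothetical, not the actual modelopt signatures): when the caller passes no precomputed map, the function falls back to a fresh, always-correct scan.

```python
from collections import defaultdict

def get_consumer_nodes(nodes, tensor):
    # Safe fallback: O(N) scan of the live graph, correct even after mutation.
    return [n for n in nodes if tensor in n["input"]]

def find_consumers(nodes, tensor, tensor_to_consumers=None):
    # Hypothetical helper mirroring the PR's optional parameter:
    # a precomputed map gives O(1) lookups, but is only valid while
    # the graph stays static; omit it to get the safe behavior.
    if tensor_to_consumers is not None:
        return tensor_to_consumers[tensor]
    return get_consumer_nodes(nodes, tensor)

nodes = [
    {"name": "a", "input": ["x"], "output": ["t"]},
    {"name": "b", "input": ["t"], "output": ["y"]},
]
cache = defaultdict(list)
for node in nodes:
    for inp in node["input"]:
        cache[inp].append(node)

# Both paths agree as long as the graph has not been mutated.
assert find_consumers(nodes, "t") == find_consumers(nodes, "t", cache)
```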
Thanks for adding the comments. Approved.
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##             main     #469   +/-   ##
=======================================
  Coverage   73.38%   73.39%
=======================================
  Files         180      180
  Lines       18110    18138   +28
=======================================
+ Hits        13290    13312   +22
- Misses       4820     4826    +6
```
/ok to test 8b8c735
What does this PR do?
Type of change: Bug fix
Overview: Optimize the runtime of the _add_cast function. The old implementation greedily re-computed the producers & consumers of each node, leading to an O(N^2) runtime. The new implementation pre-computes all the producers and consumers of the graph once and looks them up in constant time while looping through the tensors. There is no need to greedily re-compute producers and consumers, since we process each tensor only once and only affect the consumers of that tensor during each iteration.

Before:

After:
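The change can be summarized with a small sketch (dicts stand in for ONNX nodes; the names are illustrative, not the modelopt API): the old path scans the whole node list for every lookup, while the new path builds both maps in a single O(N) pass.

```python
from collections import defaultdict

nodes = [
    {"name": "conv", "input": ["x", "w"], "output": ["t0"]},
    {"name": "relu", "input": ["t0"], "output": ["t1"]},
    {"name": "add",  "input": ["t1", "b"], "output": ["y"]},
]

# Before: an O(N) scan per tensor, so O(N^2) across all tensors.
def consumers_slow(tensor):
    return [n["name"] for n in nodes if tensor in n["input"]]

# After: one O(N) pass builds both maps; each later lookup is O(1).
tensor_to_consumers = defaultdict(list)
tensor_to_producers = defaultdict(list)
for node in nodes:
    for inp in node["input"]:
        tensor_to_consumers[inp].append(node["name"])
    for out in node["output"]:
        tensor_to_producers[out].append(node["name"])

# The two approaches agree on an unmodified graph.
for tensor in ("x", "t0", "t1", "b"):
    assert consumers_slow(tensor) == tensor_to_consumers[tensor]
print(tensor_to_producers["t1"])  # ['relu']
```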

Testing
Precision converter unittest