Assign names for None dims #2619
Conversation
When onnx shape inference is run on symbolic input dims, it does not handle dim name propagation and instead creates a None. As long as we rely on the current version of onnx shape inference, there is no better information we can get. However, since the optimizer also has some custom shape propagators implemented (e.g. for Identity) that will propagate symbolic dims, we should encode the equivalence of those dimensions as much as possible. This PR assigns a string name to all None dims produced by onnx shape inference, so that the names can be propagated by the optimizer when possible.

Signed-off-by: Justin Chu <[email protected]>
Pull Request Overview
This PR addresses an issue where ONNX shape inference creates None dimensions when working with symbolic input dimensions, which prevents proper dimension name propagation in the optimizer. The solution assigns unique string names to all None dimensions produced by ONNX shape inference so they can be properly propagated by custom shape propagators.
- Added a counter to track unknown dimensions and generate unique names
- Modified shape inference to replace None dimensions with named symbolic dimensions
- Enhanced the shape merging process to maintain dimension equivalents
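In essence, the change can be illustrated with a small standalone sketch (this is not the PR's actual code; the `int | str | None` dim representation and the `_UnknownDimNamer` helper below are simplifications for illustration):

```python
from typing import Optional, Sequence, Union

Dim = Union[int, str, None]  # simplified stand-in for the IR's dim types


class _UnknownDimNamer:
    """Assigns fresh symbolic names to None dims so later passes can propagate them."""

    def __init__(self) -> None:
        self._unknown_dim_count = 0

    def name_unknown_dims(self, shape: Optional[Sequence[Dim]]) -> Optional[list[Dim]]:
        """Return a copy of shape where every None dim gets a unique symbolic name."""
        if shape is None:
            return None
        named: list[Dim] = []
        for dim in shape:
            if dim is None:
                dim = f"unknown_{self._unknown_dim_count}"
                self._unknown_dim_count += 1
            named.append(dim)
        return named


# Example: shape inference returned (None, 3, None) for a symbolic input.
namer = _UnknownDimNamer()
print(namer.name_unknown_dims([None, 3, None]))  # ['unknown_0', 3, 'unknown_1']
```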
Codecov Report: All modified and coverable lines are covered by tests.

Coverage diff (main vs. #2619):

|          | main   | #2619  | +/-    |
|----------|--------|--------|--------|
| Coverage | 70.30% | 63.25% | -7.06% |
| Files    | 222    | 222    |        |
| Lines    | 26278  | 26289  | +11    |
| Branches | 2625   | 2627   | +2     |
| Hits     | 18476  | 16628  | -1848  |
| Misses   | 6885   | 8833   | +1948  |
| Partials | 917    | 828    | -89    |

View full report in Codecov by Sentry.
      inferred_type
  )
- output.shape = _merge_shapes(output.shape, inferred_shape)
+ merged_shape = _merge_shapes(output.shape, inferred_shape)
Do you mean _merge_shapes propagating sym_dim? It looks more like filling in shape info if it's a known int.
I think so:
onnxscript/onnxscript/optimizer/_constant_folding.py, lines 915 to 916 in 28a8f56:

    if dim1.value is None:
        return dim2
But in this specific case, I don't see how dim2 can provide a meaningful sym_dim, as it comes from onnx type inference?
I might be wrong.
I thought dim2 was the original output dim, which may be from pytorch?
Oh, dim1 is. Maybe dim1 should be the preferred shape then?
Yeah, you must be right about the dim merging case. However:
If output.shape is None, then we can take inferred_shape; if output.shape is not None, we keep it. We probably called _merge_shapes here for robust logic that is shared.
I think the idea of merging shape information coming from two sources (assuming both to be correct) is a useful one. There is no point in specializing any further with assumptions about which of the two sources will have what information (because it is not needed). In particular, we can't and need not assume that the existing output.shape comes from the pytorch exporter ... it might have been introduced by some optimization rule.
The only special case to be handled is when the two shapes use two different symbolic dims for the same dim. For now, we choose the first one. (Ideally, the underlying system would record that the two symbolic dims are meant to be the same, so that the same name can be used globally for either of them. We don't do such things at this point.)
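As a rough illustration of that policy (hypothetical helper names and plain `int | str | None` dims rather than the real IR types; this is not the optimizer's actual implementation):

```python
from typing import Optional, Union

Dim = Union[int, str, None]


def merge_dims(dim1: Dim, dim2: Dim) -> Dim:
    # Both inputs are assumed to describe the same dimension correctly.
    if dim1 is None:
        return dim2
    if dim2 is None:
        return dim1
    if dim1 == dim2:
        return dim1
    if isinstance(dim1, int):
        return dim1  # a concrete value wins over a symbolic name
    if isinstance(dim2, int):
        return dim2
    return dim1  # two different symbolic names: keep the first one


def merge_shapes(
    shape1: Optional[list[Dim]], shape2: Optional[list[Dim]]
) -> Optional[list[Dim]]:
    if shape1 is None:
        return shape2
    if shape2 is None:
        return shape1
    assert len(shape1) == len(shape2), "ranks are expected to agree"
    return [merge_dims(d1, d2) for d1, d2 in zip(shape1, shape2)]
```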
@gramalingam @titaiwangms this change fails SkipLayerNormFusion. Does it require more information than shape inference can provide on dynamic dims for fusing nodes?
def _new_unknown_dim_name(self) -> str:
    """Generate a new unique name for an unknown (None) symbolic dimension."""
    name = f"unknown_{self._unknown_dim_count}"
    self._unknown_dim_count += 1
    return name
The basic idea is useful. But this doesn't guarantee that the generated dim name will be unique (even though the likelihood of a conflict is low right now). Technically, we would need to identify all symbolic names used in the model first to check for conflicts (e.g., like onnx does here).
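For example, a hedged sketch of that approach (the shape iteration and both helpers below are placeholders for illustration, not onnxscript APIs):

```python
from typing import Iterable, Tuple, Union

Dim = Union[int, str, None]


def collect_symbolic_dim_names(all_shapes: Iterable[Iterable[Dim]]) -> set[str]:
    """Gather every symbolic dim name already used in the model's shapes."""
    names: set[str] = set()
    for shape in all_shapes:
        for dim in shape:
            if isinstance(dim, str):
                names.add(dim)
    return names


def new_unknown_dim_name(existing: set[str], counter: int) -> Tuple[str, int]:
    """Generate a fresh name, skipping any name the model already uses."""
    name = f"unknown_{counter}"
    while name in existing:
        counter += 1
        name = f"unknown_{counter}"
    existing.add(name)
    return name, counter + 1
```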
I can fix that part. Do you have an idea why this would break SkipLayerNormFusion?
> I can fix that part. Do you have an idea why this would break SkipLayerNormFusion?

Will take a look
Strange ... was this working earlier? I noticed failures due to lack of shape information. That makes sense, since the input models don't have shape information. Specifically, these test-cases were generated by converting onnx models into onnxscript models, but the original examples did not carry the shape information from the original models over into the onnxscript models (as here). Later on, the examples were extended so that we also store the value-infos from the onnx models in the corresponding onnxscript test-case (like here) ... but it looks like we still have some older test-cases without shape information.
I added a call to the shape-inference pass at the beginning of the test-case ... that mostly works, but it still fails in some edge-cases where it looks like we are unable to infer that some dim is "batch-size".
I am guessing that if we update the test-cases to include shape information (as produced by exporters), it should work ... need to think about the best workaround.