
Conversation

justinchuby
Collaborator

When onnx shape inference is run on symbolic input dims, it does not propagate the dim names and instead creates a None. As long as we rely on the current version of onnx shape inference, there is no better information we can get.

However, since the optimizer also has some custom shape propagators implemented (e.g. for Identity) that do propagate sym dims, we should encode the equivalents for those dimensions as much as possible.

This PR assigns a string name to all None dims produced by onnx shape inference, so that the names can be propagated when possible by the optimizer.

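For illustration, a minimal sketch of the renaming idea (not the PR's actual code; the helper name and the dim representation are assumptions):

import itertools

# Hypothetical helper: dims are int, str (symbolic name), or None.
_counter = itertools.count()

def name_unknown_dims(inferred_dims):
    """Replace None dims from onnx shape inference with fresh symbolic names.

    A named dim (e.g. "unknown_0") can be propagated by the optimizer's custom
    shape propagators (e.g. for Identity), whereas a bare None cannot.
    """
    return [f"unknown_{next(_counter)}" if dim is None else dim for dim in inferred_dims]

# Example: shape inference produced (None, 3, None) for a symbolic input.
print(name_unknown_dims([None, 3, None]))  # -> ['unknown_0', 3, 'unknown_1']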

Signed-off-by: Justin Chu <[email protected]>
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses an issue where ONNX shape inference creates None dimensions when working with symbolic input dimensions, which prevents proper dimension name propagation in the optimizer. The solution assigns unique string names to all None dimensions produced by ONNX shape inference so they can be properly propagated by custom shape propagators.

  • Added a counter to track unknown dimensions and generate unique names
  • Modified shape inference to replace None dimensions with named symbolic dimensions
  • Enhanced the shape merging process to maintain dimension equivalents


codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.25%. Comparing base (28a8f56) to head (baeb372).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2619      +/-   ##
==========================================
- Coverage   70.30%   63.25%   -7.06%     
==========================================
  Files         222      222              
  Lines       26278    26289      +11     
  Branches     2625     2627       +2     
==========================================
- Hits        18476    16628    -1848     
- Misses       6885     8833    +1948     
+ Partials      917      828      -89     


@justinchuby justinchuby added this to the 0.5.4 milestone Oct 9, 2025
    inferred_type
)
-   output.shape = _merge_shapes(output.shape, inferred_shape)
+   merged_shape = _merge_shapes(output.shape, inferred_shape)
Contributor

Do you mean _merge_shapes propagating sym_dim? It looks more like filling in shape info when it's a known int.

Collaborator Author

I think so:

if dim1.value is None:
    return dim2

Contributor

But in this specific case, I don't see how dim2 can provide a meaningful sym_dim, as it's from onnx type inference?

Contributor

I might be wrong.

Collaborator Author

@justinchuby justinchuby Oct 10, 2025

I thought dim2 was the original output dim, which may be from pytorch?

Collaborator Author

@justinchuby justinchuby Oct 10, 2025

oh dim1 is. Maybe dim1 should be the preferred shape then?

Collaborator Author

Yeah, you must be right in the dim merging case. However:

If output.shape is None, then we can take inferred_shape; if output.shape is not None, we will keep it. We probably call _merge_shapes here for robust logic that is shared.

Collaborator

I think the idea of merging shape information coming from two sources (assuming both to be correct) is a useful one. There is no point in specializing any further with assumptions about which of the two sources will have what information (because it is not needed). In particular, we can't and need not assume that the existing output.shape comes from the pytorch exporter ... it might have been introduced by some optimization rule.

The only special case to be handled is when the two shapes use two different symbolic dims for the same dim. For now, we choose the first one. (Ideally, the underlying system would record that the two symbolic dims are meant to be the same ... so that it could globally substitute one for the other. We don't do such things at this point.)
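As a rough illustration of that merging policy (both shapes assumed correct; a concrete int beats a symbolic name; the first symbolic name wins on a mismatch), here is a hedged sketch; the dim representation and function names are assumptions, not the repo's actual _merge_shapes:

def merge_dim(dim1, dim2):
    """Merge one dimension from two shapes that are both assumed correct."""
    if dim1 is None:
        return dim2  # dim1 carries no information
    if dim2 is None:
        return dim1
    if isinstance(dim1, str) and isinstance(dim2, int):
        return dim2  # a concrete int is more informative than a symbolic name
    # Both symbolic (or dim1 is an int): prefer the first; ideally we would also
    # record that two differing symbolic names are equivalent.
    return dim1

def merge_shapes(shape1, shape2):
    if shape1 is None:
        return shape2
    if shape2 is None:
        return shape1
    assert len(shape1) == len(shape2), "ranks must agree"
    return [merge_dim(d1, d2) for d1, d2 in zip(shape1, shape2)]

# Example: existing output shape vs. shape produced by onnx inference.
print(merge_shapes(["batch", 3, None], [None, 3, "unknown_0"]))
# -> ['batch', 3, 'unknown_0']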

@justinchuby
Collaborator Author

@gramalingam @titaiwangms this change fails SkipLayerNormFusion. Does it require more information than shape inference can provide on dynamic dims for fusing nodes?

def _new_unknown_dim_name(self) -> str:
    """Generate a new unique name for an unknown (None) symbolic dimension."""
    name = f"unknown_{self._unknown_dim_count}"
    self._unknown_dim_count += 1
Collaborator

The basic idea is useful. But this doesn't guarantee that the generated dim name will be unique (even though the likelihood of a conflict is low right now). Technically, we would need to identify all symbolic names used in the model first to check for conflicts (e.g., like onnx does here).
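A hedged sketch of the conflict-free variant being suggested: collect the symbolic dim names already used in the model, then skip past them when generating a fresh name (the function name and the way used names are gathered are illustrative, not this repo's or onnx's API):

def fresh_dim_name(used_names, prefix="unknown"):
    """Return a dim name of the form '<prefix>_<i>' that is not already in used_names."""
    i = 0
    while f"{prefix}_{i}" in used_names:
        i += 1
    name = f"{prefix}_{i}"
    used_names.add(name)  # reserve it so the next call stays unique
    return name

# used_names would be gathered up front by walking all input/output/value_info
# shapes in the model before any renaming begins.
used = {"batch", "unknown_0"}
print(fresh_dim_name(used))  # -> 'unknown_1'
print(fresh_dim_name(used))  # -> 'unknown_2'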

Collaborator Author

I can fix that part. Do you have an idea why this would break SkipLayerNormFusion?

Collaborator

I can fix that part. Do you have an idea why this would break SkipLayerNormFusion?

Will take a look

Collaborator

Strange ... was this working earlier? I noticed failures due to lack of shape information. That makes sense, since the input models don't have shape information. Specifically, these test-cases were generated by converting onnx models into onnxscript models, but the original examples did not carry the shape information from the original models over into the onnxscript models (as here). Later on, the examples were extended so that we also store the value-infos from the onnx models in the corresponding onnxscript test-case (like here) ... but it looks like we still have some older test-cases without shape information.

I added a call to the shape-inference pass at the beginning of the test-case ... that mostly works, but it still fails in some edge cases where it looks like we are unable to infer that some dim is "batch-size".

I am guessing that if we update the test-cases to include shape information (as produced by exporters), it should work ... need to think about the best workaround.
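For reference, a minimal sketch of that kind of workaround, running ONNX shape inference over a test model before attempting fusion; the file path and the follow-on fusion step are placeholders:

import onnx

# Load a test model that lacks value_info shape annotations (path is illustrative).
model = onnx.load("skip_layer_norm_test_model.onnx")

# Populate value_info via onnx shape inference; symbolic input dims such as
# "batch_size" are carried through only where onnx's inference supports it.
inferred = onnx.shape_inference.infer_shapes(model)

# The fusion under test (e.g. SkipLayerNormFusion) would then run on `inferred`;
# dims that inference cannot name may still block the pattern match.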

@justinchuby justinchuby modified the milestones: 0.5.4, 0.5.5 Oct 14, 2025