Fix LSTM conversion for models with rank > 3 inputs from Unsqueeze operations #33023
Problem: Models like silero_vad contain LSTM layers with high-rank input tensors (rank > 3), but OpenVINO's LSTM expects exactly rank 3 [batch, sequence, features]. This causes conversion failures for models with shapes like [1, 1, ?, ?, ?].

Solution: Add reduce_tensor_rank() helper function that processes LSTM inputs (X, initial_h, initial_c) before axis reordering. The function:
- Squeezes leading dimensions equal to 1 when possible
- Uses Reshape to collapse leading dimensions if they aren't all 1
- Reduces rank to target rank 3 before reordering axes

Test: Added onnx_model_lstm_high_rank_input test with rank-5 input [1,1,3,2,4] that gets reduced to [3,2,4]. Reference outputs generated using ONNX Runtime with equivalent rank-3 input.
This test does not reproduce silero_vad.onnx structure. Keep only tests that match actual silero_vad patterns:
- lstm_rank5_squeeze: multi-axis squeeze (with GPU skip)
- lstm_rank4_with_unsqueeze: exact silero_vad structure
    };

    // Helper function to reduce tensor rank to target_rank by squeezing or reshaping
    std::shared_ptr<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {
Suggested change:
    - std::shared_ptr<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {
    + ov::Output<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {
Avoid working with nodes: the node might have many outputs, and returning the node means only its first output gets used.
    if (!input_shape.rank().is_static()) {
        return input.get_node_shared_ptr();
    }
Suggested change:
    - if (!input_shape.rank().is_static()) {
    -     return input.get_node_shared_ptr();
    - }
    + if (input_shape.rank().is_dynamic()) {
    +     return input;
    + }
    const auto input_rank = input_shape.rank().get_length();

    if (input_rank <= target_rank) {
        return input.get_node_shared_ptr();
Suggested change:
    - return input.get_node_shared_ptr();
    + return input;
    auto shape_of_input = std::make_shared<v3::ShapeOf>(input);
    auto start_idx = v0::Constant::create(ov::element::i64, Shape{1}, {input_rank - target_rank});
    auto stop_idx = v0::Constant::create(ov::element::i64, Shape{1}, {input_rank});
    auto step = v0::Constant::create(ov::element::i64, Shape{1}, {1});

    // Get last target_rank dimensions: shape[-target_rank:]
    auto last_dims = std::make_shared<v8::Slice>(shape_of_input, start_idx, stop_idx, step);

    // Reshape to extract last target_rank dimensions
    return std::make_shared<v1::Reshape>(input, last_dims, false);
Can the extra input dimensions be something other than 1? In that case the Reshape will fail.
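The concern in this comment can be illustrated at the shape level. Reshape preserves the element count, so reshaping to shape[-target_rank:] is only valid when the product of the dropped leading dimensions is 1. The following is a minimal standalone sketch over static shapes stored in std::vector; the helper name can_reshape_to_trailing_dims is hypothetical and is not part of OpenVINO or of this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not an OpenVINO API): checks whether a fully static shape
// can be reshaped to its last `target_rank` dimensions without changing the
// total number of elements.
bool can_reshape_to_trailing_dims(const std::vector<int64_t>& shape, std::size_t target_rank) {
    if (shape.size() <= target_rank) {
        return true;  // nothing would be dropped
    }
    const auto split = shape.begin() + static_cast<std::ptrdiff_t>(shape.size() - target_rank);
    // Product of the leading dimensions that the reshape would drop.
    const int64_t dropped = std::accumulate(shape.begin(), split, int64_t{1}, std::multiplies<int64_t>());
    return dropped == 1;
}
```

Under this model, [1, 1, 3, 2, 4] can be reduced to [3, 2, 4], while [2, 3, 3, 2, 4] cannot, because the six-fold difference in element count would make the Reshape invalid.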
The LSTM spec (https://onnx.ai/onnx/operators/onnx__LSTM.html) describes X, initial_c, and initial_h as 3D tensors. Is this broader behavior supported by onnxruntime?
Details:
Subgraph from silero_vad.onnx

Problem
ONNX models exported from PyTorch frequently contain Unsqueeze operations before LSTM nodes. These operations add extra dimensions to tensors, resulting in rank-4 or rank-5 inputs to LSTM nodes. However, the ONNX LSTM specification strictly requires rank-3 inputs with shape [seq_length, batch_size, input_size].

Why this happens:
- PyTorch models use various tensor shapes during training
- During ONNX export, shape mismatches are "fixed" by inserting Unsqueeze nodes
- These Unsqueeze operations add dimensions of size 1 to match expected shapes
- The resulting LSTM inputs have rank > 3, violating the ONNX LSTM specification

Real-world impact:
- Models like silero_vad.onnx contain 4 LSTM nodes, all with Unsqueeze operations before them
- Without this fix, such LSTM models fail to convert to OpenVINO IR
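To make the rank growth concrete, here is a shape-level model of Unsqueeze, written as a standalone sketch. It assumes unique, non-negative axes that index positions in the output shape; unsqueeze_shape is a hypothetical helper for illustration, not an ONNX or OpenVINO function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape-level model of Unsqueeze. Assumes `axes` are unique,
// non-negative, and valid positions in the output shape.
std::vector<int64_t> unsqueeze_shape(const std::vector<int64_t>& shape, std::vector<int64_t> axes) {
    std::sort(axes.begin(), axes.end());
    const std::size_t out_rank = shape.size() + axes.size();
    std::vector<int64_t> result;
    result.reserve(out_rank);
    std::size_t next_axis = 0;   // next entry of `axes` to place
    std::size_t next_input = 0;  // next input dimension to copy
    for (std::size_t pos = 0; pos < out_rank; ++pos) {
        if (next_axis < axes.size() && axes[next_axis] == static_cast<int64_t>(pos)) {
            result.push_back(1);  // inserted dimension of size 1
            ++next_axis;
        } else {
            result.push_back(shape[next_input++]);
        }
    }
    return result;
}
```

For example, unsqueezing a rank-3 shape [3, 2, 4] at axes 0 and 1 gives the rank-5 shape [1, 1, 3, 2, 4], which is the kind of input the LSTM converter previously rejected.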
Solution
This fix adds automatic rank reduction in the ONNX Frontend LSTM converter (src/frontends/onnx/frontend/src/op/lstm.cpp). The implementation uses a two-strategy approach:

Squeeze:
- Used when all extra leading dimensions equal 1
- Example: [1, 1, seq, batch, input] → [seq, batch, input]
- Zero-cost operation that only changes metadata, no data movement
- Applies to most real-world models (including silero_vad.onnx)

Reshape:
- Used when extra dimensions are > 1 or have dynamic shapes
- Example: [2, 3, seq, batch, input] → [6, batch, input] (flattens leading dimensions)
- Handles edge cases and dynamic shapes
- Uses dynamic shape calculation at runtime
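The selection between the two strategies can be sketched at the shape level as follows. This is a standalone illustration of the rule described above, not the code in lstm.cpp; the enum RankReduction, the function choose_rank_reduction, and the use of -1 to mark a dynamic dimension are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not part of the OpenVINO ONNX frontend.
enum class RankReduction { None, Squeeze, Reshape };

constexpr int64_t kDynamic = -1;  // assumed marker for a dynamic dimension

RankReduction choose_rank_reduction(const std::vector<int64_t>& shape, std::size_t target_rank = 3) {
    if (shape.size() <= target_rank) {
        return RankReduction::None;  // rank-3 (or lower) inputs are left untouched
    }
    const std::size_t extra = shape.size() - target_rank;
    for (std::size_t i = 0; i < extra; ++i) {
        if (shape[i] != 1) {
            // Covers both dimensions > 1 and dynamic dimensions (kDynamic).
            return RankReduction::Reshape;
        }
    }
    return RankReduction::Squeeze;  // every extra leading dimension is exactly 1
}
```

With this rule, [1, 1, 3, 2, 4] selects Squeeze, [2, 3, 3, 2, 4] and [1, kDynamic, 3, 2, 4] select Reshape, and [3, 2, 4] needs no reduction.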
Implementation details:
- New function reduce_tensor_rank() analyzes input tensor rank and shape
- Automatically selects optimal strategy based on dimension values
- Applied to all LSTM inputs: X (data), initial_h (hidden state), initial_c (cell state)
- Transparent to users: no model modifications required
Code structure:
Performance:
- Squeeze path has zero runtime overhead (metadata-only operation)
- Reshape path adds minimal overhead only for edge cases
- No impact on models that already have rank-3 inputs
Tickets: