Describe the issue
Thanks for your work on ONNX Runtime. 💯
I'm currently running into issues with the BeamSearch node (https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.BeamSearch) when the subgraphs for the encoder and decoder (T5-style) have been converted to FP16.
According to the internal documentation, the inputs can be either float or float16 tensors ("Data type of input or output is float or float16 if not specified.", onnxruntime/onnxruntime/contrib_ops/cpu/transformers/subgraph_t5_decoder.cc, line 47 at ae6dcc8).
For float16 inputs, however, I get the error:
2025-02-15 20:15:02.999160115 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: bad_function_call
To reproduce
I crafted a minimal reproducer with the CPU execution provider (example adapted from here).
Minimal reproducer
pip freeze:
Example:
Output for DTYPE=TensorProto.FLOAT16:
Output for DTYPE=TensorProto.FLOAT:
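As a rough, hypothetical outline of what the driver portion of the reproducer looks like (the model file name, input names, and shapes below are placeholders for illustration, not taken from the actual script):

```python
# Hypothetical driver sketch; model file and input names are placeholders,
# not from the original reproducer.
import numpy as np
import onnxruntime as ort
from onnx import TensorProto

# Toggle between the failing (FLOAT16) and working (FLOAT) cases.
DTYPE = TensorProto.FLOAT16
NP_DTYPE = np.float16 if DTYPE == TensorProto.FLOAT16 else np.float32

sess = ort.InferenceSession(
    "beam_search_minimal.onnx",  # model containing the BeamSearch node
    providers=["CPUExecutionProvider"],
)

# BeamSearch consumes int32 token ids and scalar search parameters; the
# float-typed parameters follow DTYPE when the graph is built for fp16.
feeds = {
    "input_ids": np.zeros((1, 8), dtype=np.int32),
    "max_length": np.array([16], dtype=np.int32),
    "num_beams": np.array([4], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=NP_DTYPE),
    "repetition_penalty": np.array([1.0], dtype=NP_DTYPE),
}

# With DTYPE == TensorProto.FLOAT16 this run raises the
# "BeamSearch ... bad_function_call" error shown above.
print(sess.run(None, feeds)[0])
```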
I get the same errors when converting larger encoder/decoder models to fp16 with the transformers optimizer's fp16 conversion functionality, m.convert_float_to_float16(keep_io_types=False), i.e. with keep_io_types set to False. If I retain the input types (keep_io_types=True), execution is possible; a sketch of the failing conversion path follows below.

Could you please look into this issue? Maybe @tianleiwu? Please let me know if you need more info. 👍
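For reference, a sketch of roughly the conversion path I mean (the model path and model_type here are placeholders, not the actual models from my setup):

```python
# Sketch of the fp16 conversion that leads to the failing BeamSearch run;
# the model path and model_type are placeholders.
from onnxruntime.transformers import optimizer

m = optimizer.optimize_model("t5_decoder.onnx", model_type="t5")

# keep_io_types=False also casts the graph inputs/outputs to float16;
# running BeamSearch over the resulting subgraphs then fails with
# bad_function_call on CPU. keep_io_types=True keeps float32 I/O and works.
m.convert_float_to_float16(keep_io_types=False)
m.save_model_to_file("t5_decoder_fp16.onnx")
```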
Urgency
Yes, mid-spring would be great.
Platform
Linux
OS Version
Ubuntu in WSL.
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response