Open
Labels
api:Java (issues related to the Java API), performance (issues related to performance regressions), stale (issues that have not been addressed in a while; categorized by a bot)
Description
Describe the issue
When running the Dolphin-1.5 model in Java with ONNX Runtime, both inference and the data copies between Java arrays and tensors are slow, which impacts overall application performance. Please investigate possible optimizations for inference speed and data-transfer efficiency when using the Java API.
To reproduce
decoder sample code:
for (int i = 0; i < maxNewTokens; i++) {
    Map<String, OnnxTensor> inputs = new HashMap<>(2);
    long[] flat = ArrayUtil.flat(inputIds);
    LongBuffer buffer = LongBuffer.wrap(flat);
    OnnxTensor input_ids_tensor = OnnxTensor.createTensor(env, buffer,
            new long[] {inputIds.length, inputIds[0].length});
    inputs.put("input_ids", input_ids_tensor);
    inputs.put("encoder_hidden_states", encoder_hidden_states_tensor);
    // close the per-step Result so native output memory is not leaked each iteration
    try (OrtSession.Result onnxResult = session.run(inputs, outputNames)) {
        Optional<OnnxValue> optionalResult = onnxResult.get(List.copyOf(outputNames).get(0));
        if (optionalResult.isPresent()) {
            // getValue() copies the entire output into nested Java arrays
            float[][][] decoderResultFloats = (float[][][]) optionalResult.get().getValue();
            long[][] nextIds = new long[batchSize][1];
            // ......
        }
    }
    input_ids_tensor.close();
}
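Not part of the original report, but likely relevant to the slow data copies: `LongBuffer.wrap(flat)` produces a heap-backed buffer, which ONNX Runtime's Java API copies into native memory on tensor creation, whereas a direct buffer can be used without that copy (per the `OnnxTensor.createTensor` javadoc). A minimal stdlib-only sketch of the difference; `directLongBuffer` is a hypothetical helper, not part of the ONNX Runtime API:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class DirectBufferDemo {
    // Hypothetical helper: allocate a native-order direct LongBuffer,
    // which ONNX Runtime can hand to the native layer without copying.
    static LongBuffer directLongBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity * Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }

    public static void main(String[] args) {
        LongBuffer wrapped = LongBuffer.wrap(new long[8]);
        LongBuffer direct = directLongBuffer(8);
        System.out.println(wrapped.isDirect()); // false -> contents must be copied to native memory
        System.out.println(direct.isDirect());  // true  -> eligible for zero-copy tensor creation
        // Fill (and reuse) the direct buffer each decode step instead of
        // allocating a fresh long[] and wrapping it every iteration.
        for (int i = 0; i < 8; i++) {
            direct.put(i, i);
        }
        System.out.println(direct.get(3)); // 3
    }
}
```

On the output side, casting `getValue()` to `float[][][]` copies the whole logits tensor into nested Java arrays every step; if the result is an `OnnxTensor`, reading it through `getFloatBuffer()` should avoid building those nested arrays, though whether that is the main cost here would need profiling.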
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
ONNX Runtime API
Java
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.6
Model File
Dolphin-1.5
Is this a quantized model?
No