
[Performance] Inference and data copy are slow when running Dolphin-1.5 in Java #26469

@ningpp

Description


Describe the issue

When running the Dolphin-1.5 model in Java with ONNX Runtime, inference is slow and data copy operations are also slow, which hurts overall application performance. Please investigate possible optimizations for both inference speed and data-transfer efficiency in the Java API.
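On the input side, one likely source of copy overhead is that `OnnxTensor.createTensor` copies a non-direct buffer, such as the `LongBuffer.wrap(flat)` used in the decoder sample code, into a new direct buffer on every call, whereas a direct buffer in native byte order can be used as-is. The sketch below is a minimal, hypothetical helper (`DirectIds`, `allocate` and `pack` are illustrative names, not ONNX Runtime API) that packs a `[batch, seqLen]` id matrix into one reusable direct `LongBuffer`. It assumes each step's tensor is closed before the buffer is refilled, because a tensor created from a direct buffer reads from that memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

/**
 * Hypothetical helper (not part of ONNX Runtime): packs a [batch, seqLen]
 * token-id matrix into a direct, native-order LongBuffer so that it can be
 * handed to OnnxTensor.createTensor without an extra heap-to-direct copy.
 */
public final class DirectIds {

    private DirectIds() {}

    /** Allocates a direct LongBuffer with room for `capacity` longs in native byte order. */
    public static LongBuffer allocate(int capacity) {
        return ByteBuffer.allocateDirect(capacity * Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }

    /**
     * Clears `target`, writes `ids` into it row-major, then flips it so that
     * position..limit covers exactly batch * seqLen elements.
     * Assumes no live tensor still references `target`.
     */
    public static LongBuffer pack(long[][] ids, LongBuffer target) {
        target.clear();
        for (long[] row : ids) {
            target.put(row);
        }
        target.flip();
        return target;
    }

    public static void main(String[] args) {
        long[][] ids = {{1, 2, 3}, {4, 5, 6}};
        // Capacity for batch 2 up to seqLen 8, so the buffer can be reused as the sequence grows.
        LongBuffer buf = pack(ids, allocate(2 * 8));
        System.out.println(buf.isDirect() + " " + buf.remaining() + " " + buf.get(4)); // prints "true 6 5"
    }
}
```

The packed buffer would then be passed as `OnnxTensor.createTensor(env, buf, new long[] {batch, seqLen})`. Whether avoiding this copy is measurable next to the decoder's run time would need profiling.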

To reproduce

Decoder sample code:


        // The output name is fixed, so look it up once instead of copying the set every step.
        String outputName = outputNames.iterator().next();
        for (int i = 0; i < maxNewTokens; i++) {
            Map<String, OnnxTensor> inputs = new HashMap<>(2);
            long[] flat = ArrayUtil.flat(inputIds);
            LongBuffer buffer = LongBuffer.wrap(flat);
            // try-with-resources releases the native memory of the per-step tensor and result.
            try (OnnxTensor input_ids_tensor = OnnxTensor.createTensor(env, buffer,
                    new long[] {inputIds.length, inputIds[0].length})) {
                inputs.put("input_ids", input_ids_tensor);
                inputs.put("encoder_hidden_states", encoder_hidden_states_tensor);

                try (OrtSession.Result onnxResult = session.run(inputs, outputNames)) {
                    Optional<OnnxValue> optionalResult = onnxResult.get(outputName);
                    if (optionalResult.isPresent()) {
                        float[][][] decoderResultFloats = (float[][][]) optionalResult.get().getValue();
                        long[][] nextIds = new long[batchSize][1];
                        ......
                    }
                }
            }
        }
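On the output side, `getValue()` materialises the whole `[batch, seqLen, vocab]` logits tensor as nested `float[][][]` arrays on every step, even though picking the next token only needs the last position. A possibly cheaper route is to cast the result value to `OnnxTensor` and call `getFloatBuffer()`, which returns the data as a single flat `FloatBuffer`, and then index into it directly. The sketch below assumes the elided `......` step performs greedy (argmax) decoding on the last position, which the original code does not show; `GreedyStep` and `nextIds` are hypothetical names.

```java
import java.nio.FloatBuffer;

/**
 * Hypothetical helper: greedy next-token selection that reads decoder logits
 * laid out row-major as [batch, seqLen, vocab] from one flat FloatBuffer
 * (for example the one returned by OnnxTensor.getFloatBuffer()), instead of
 * building a float[][][] via getValue().
 */
public final class GreedyStep {

    private GreedyStep() {}

    /** Returns the argmax over the vocabulary at the last sequence position of each batch row. */
    public static long[] nextIds(FloatBuffer logits, int batch, int seqLen, int vocab) {
        long[] next = new long[batch];
        int start = logits.position();
        for (int b = 0; b < batch; b++) {
            // Flat offset of logits[b][seqLen - 1][0].
            int base = start + (b * seqLen + (seqLen - 1)) * vocab;
            int best = 0;
            float bestScore = logits.get(base);
            for (int v = 1; v < vocab; v++) {
                float score = logits.get(base + v);
                if (score > bestScore) {
                    bestScore = score;
                    best = v;
                }
            }
            next[b] = best;
        }
        return next;
    }

    public static void main(String[] args) {
        // batch = 1, seqLen = 2, vocab = 3; the last position holds {0.1, 0.9, 0.3}.
        FloatBuffer logits = FloatBuffer.wrap(new float[] {5f, 0f, 0f, 0.1f, 0.9f, 0.3f});
        System.out.println(nextIds(logits, 1, 2, 3)[0]); // prints 1
    }
}
```

In the loop above, the dimensions could be read from `((OnnxTensor) optionalResult.get()).getInfo().getShape()`. `getFloatBuffer()` still copies, but as one bulk copy rather than many small nested arrays; the actual gain is an assumption to verify by profiling.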

Urgency

No response

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

f217402

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.6

Model File

Dolphin-1.5

Is this a quantized model?

No

Metadata


Assignees

No one assigned

Labels

api:Java (issues related to the Java API), performance (issues related to performance regressions), stale (issues that have not been addressed in a while; categorized by a bot)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
