Open
Labels
api:Java (issues related to the Java API), performance (issues related to performance regressions), stale (issues that have not been addressed in a while; categorized by a bot)
Description
Describe the issue
When running the Dolphin-1.5 model in Java with ONNX Runtime, both inference and the data copies between Java arrays and tensors are slow, which impacts overall application performance. Please investigate possible optimizations for inference speed and data-transfer efficiency when using the Java API.
To reproduce
decoder sample code:
for (int i = 0; i < maxNewTokens; i++) {
    Map<String, OnnxTensor> inputs = new HashMap<>(2);
    long[] flat = ArrayUtil.flat(inputIds);
    LongBuffer buffer = LongBuffer.wrap(flat);
    OnnxTensor input_ids_tensor = OnnxTensor.createTensor(env, buffer,
            new long[] {inputIds.length, inputIds[0].length});
    inputs.put("input_ids", input_ids_tensor);
    inputs.put("encoder_hidden_states", encoder_hidden_states_tensor);
    // close the per-step Result so native output memory is not leaked each iteration
    try (OrtSession.Result onnxResult = session.run(inputs, outputNames)) {
        Optional<OnnxValue> optionalResult = onnxResult.get(List.copyOf(outputNames).get(0));
        if (optionalResult.isPresent()) {
            // getValue() copies the entire output into nested Java arrays
            float[][][] decoderResultFloats = (float[][][]) optionalResult.get().getValue();
            long[][] nextIds = new long[batchSize][1];
            // ......
        }
    }
    input_ids_tensor.close();
}
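Not part of the original report, but likely relevant to the slow data copies: `LongBuffer.wrap(flat)` produces a heap-backed buffer, which ONNX Runtime's Java API copies into native memory on tensor creation, whereas a direct buffer can be used without that copy (per the `OnnxTensor.createTensor` javadoc). A minimal stdlib-only sketch of the difference; `directLongBuffer` is a hypothetical helper, not part of the ONNX Runtime API:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class DirectBufferDemo {
    // Hypothetical helper: allocate a native-order direct LongBuffer,
    // which ONNX Runtime can hand to the native layer without copying.
    static LongBuffer directLongBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity * Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }

    public static void main(String[] args) {
        LongBuffer wrapped = LongBuffer.wrap(new long[8]);
        LongBuffer direct = directLongBuffer(8);
        System.out.println(wrapped.isDirect()); // false -> contents must be copied to native memory
        System.out.println(direct.isDirect());  // true  -> eligible for zero-copy tensor creation
        // Fill (and reuse) the direct buffer each decode step instead of
        // allocating a fresh long[] and wrapping it every iteration.
        for (int i = 0; i < 8; i++) {
            direct.put(i, i);
        }
        System.out.println(direct.get(3)); // 3
    }
}
```

On the output side, casting `getValue()` to `float[][][]` copies the whole logits tensor into nested Java arrays every step; if the result is an `OnnxTensor`, reading it through `getFloatBuffer()` should avoid building those nested arrays, though whether that is the main cost here would need profiling.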
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
ONNX Runtime API
Java
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.6
Model File
Dolphin-1.5
Is this a quantized model?
No