Describe the issue
Problem Description
ONNX Runtime Android cannot create tensors with a zero-sized dimension, which prevents proper initialization of `past_key_values` in Transformer models (such as Qwen2-VL, LLaMA, etc.) during the first inference step.
Expected Behavior:
According to the ONNX specification and Python implementations, `past_key_values` should be initialized with `seq_len = 0` (an empty cache) for the first inference step:

```python
past_key_values = np.zeros((1, num_heads, 0, head_dim), dtype=np.float16)
# seq_len = 0 ✅ supported by Python/NumPy
```

Actual Behavior:
ONNX Runtime Android throws an exception when trying to create a tensor with a dimension of 0:
```kotlin
val shape = longArrayOf(1, 2, 0, 128) // seq_len = 0
val tensor = OnnxTensor.createTensor(ortEnv!!, FloatBuffer.wrap(data), shape)
// ❌ Exception: cannot create a tensor with a zero dimension
```

Workaround Attempted:
We must use `seq_len = 1` as a placeholder:

```kotlin
val shape = longArrayOf(1, 2, 1, 128) // seq_len = 1 (placeholder)
```

However, this causes sequence-length mismatches during inference, because the model expects:
- Total sequence length = past_seq_len + current_seq_len
- With the `seq_len = 1` placeholder: 1 + N = N + 1
- But `inputs_embeds` only has N tokens, causing shape-mismatch errors
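The mismatch is pure bookkeeping, so it can be illustrated without any ONNX Runtime calls. A minimal Kotlin sketch (the token count 605 is taken from the error example in this report):

```kotlin
// Shows how a placeholder KV cache of length 1 shifts the total
// sequence length the model expects, while inputs_embeds stays at N.
fun totalSeqLen(pastSeqLen: Int, currentSeqLen: Int): Int =
    pastSeqLen + currentSeqLen

fun main() {
    val n = 605                    // tokens in inputs_embeds
    println(totalSeqLen(0, n))     // empty cache: 605, matches inputs_embeds
    println(totalSeqLen(1, n))     // placeholder cache: 606, off by one
}
```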
Impact:
This limitation affects all Transformer models that use past_key_values for KV caching on Android, preventing proper inference.
Error Example:
```
ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION
Shape mismatch attempting to re-use buffer. {1,12,605,128} != {1,12,606,128}
```

The model expects sequence length 606 (past_seq_len = 1 + current_seq_len = 605), but `inputs_embeds` only has 605 tokens.
Questions:
- Is this a known limitation of ONNX Runtime Android? Are there plans to support tensors with zero-sized dimensions?
- Is there a workaround to properly handle `past_key_values` initialization on Android?
- Are there configuration options in `SessionOptions` that could help with this?
To reproduce
Steps to Reproduce
1. Set up ONNX Runtime Android:
   - Add the dependency: `com.microsoft.onnxruntime:onnxruntime-android:1.20.0`
   - Initialize the ONNX Runtime environment in Kotlin/Java
2. Attempt to create a tensor with a zero dimension:

   ```kotlin
   val ortEnv = OrtEnvironment.getEnvironment()
   val shape = longArrayOf(1, 2, 0, 128) // seq_len = 0 (zero dimension)
   val data = FloatArray(0)              // empty backing array
   val tensor = OnnxTensor.createTensor(ortEnv, FloatBuffer.wrap(data), shape)
   ```

3. Expected: the tensor is created successfully (as in Python/NumPy)
4. Actual: an exception is thrown because ONNX Runtime Android rejects the zero dimension
Minimal Code Example
```kotlin
import ai.onnxruntime.*
import java.nio.FloatBuffer

fun main() {
    val ortEnv = OrtEnvironment.getEnvironment()
    // This works in Python/NumPy but fails in ONNX Runtime Android
    try {
        val shape = longArrayOf(1, 2, 0, 128) // seq_len = 0
        val data = FloatArray(0)
        val tensor = OnnxTensor.createTensor(ortEnv, FloatBuffer.wrap(data), shape)
        println("Success: tensor created with zero dimension")
    } catch (e: Exception) {
        // Observed: exception stating the zero dimension is not supported
        println("Error: ${e.message}")
    }
}
```

Model Context
This issue occurs when initializing `past_key_values` for Transformer models (e.g., Qwen2-VL, LLaMA) where the KV cache should be empty (`seq_len = 0`) for the first inference step. The model is exported correctly according to the ONNX specification, with `past_sequence_length = 0`.
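For reference, the cache lifecycle this export assumes can be sketched with stubs; `runPrefill` and `runDecode` are hypothetical placeholders for the actual `OrtSession.run` calls and model only the cache-length bookkeeping:

```kotlin
// Hypothetical stubs: only the seq_len bookkeeping of a KV-cached
// Transformer is modeled here, not real inference.
data class KvCache(val seqLen: Int)

// First step: takes an EMPTY cache (seq_len = 0) plus the full prompt and
// returns a cache covering every prompt token. Creating that empty cache
// tensor is the part that fails on Android.
fun runPrefill(promptLen: Int): KvCache = KvCache(seqLen = promptLen)

// Subsequent steps: one new token is appended to the cache per call.
fun runDecode(cache: KvCache): KvCache = KvCache(seqLen = cache.seqLen + 1)

fun main() {
    var cache = runPrefill(promptLen = 605)
    repeat(3) { cache = runDecode(cache) }
    println(cache.seqLen)  // 608
}
```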
Urgency
This issue is urgent as it blocks critical functionality for our project.
Why it's urgent:
1. Core functionality blocked: We are developing an Android application that uses a Qwen2-VL model for vision-language understanding. The model cannot run inference on Android due to this limitation.
2. Widespread impact: This affects all Transformer models that use `past_key_values` for KV caching on Android, not just our specific use case. It is a fundamental limitation that prevents many modern Transformer models from working on Android.
3. Project deadline: This is blocking our core feature development and preventing us from deploying our application.
4. No workaround available: We've tried multiple approaches (padding inputs, adjusting attention masks, using `seq_len = 1` as a placeholder, etc.), but all fail due to this limitation. We cannot use `seq_len = 0` as required by the model specification, and using `seq_len = 1` causes sequence-length mismatches during inference.
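To illustrate why the mask-adjustment attempt above still fails: even when the placeholder position is masked out, the total length disagrees with `inputs_embeds`. A sketch of that attempt (the mask layout is illustrative, not the model's actual input format):

```kotlin
// Build an attention mask that zeroes out placeholderLen fake KV
// positions and keeps currentLen real token positions.
fun buildMask(placeholderLen: Int, currentLen: Int): IntArray =
    IntArray(placeholderLen) { 0 } + IntArray(currentLen) { 1 }

fun main() {
    val mask = buildMask(placeholderLen = 1, currentLen = 605)
    println(mask.size)  // 606 - still one longer than the 605 inputs_embeds tokens
    println(mask[0])    // 0 - the masked placeholder position
}
```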
Impact: Without a solution, we cannot deploy Transformer models with KV caching on Android using ONNX Runtime, which severely limits the framework's usefulness for mobile applications.
Platform
Android
OS Version
Android 15
ONNX Runtime Installation
Released Package
Compiler Version (if 'Built from Source')
No response
Package Name (if 'Released Package')
onnxruntime-android
ONNX Runtime Version or Commit ID
com.microsoft.onnxruntime:onnxruntime-android:1.20.0
ONNX Runtime API
Java/Kotlin
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
N/A (CPU EP)