CUDA graph capture fails with AssertionError in causal_conv1d_update for Qwen3-Coder-Next


  - vLLM version: 0.15.0
  - PyTorch version: (from vllm/vllm-openai:latest docker image)
  - CUDA version: 12.8
  - GPU: 2x NVIDIA H100 NVL (95830 MiB each)
  - Driver: 570.133.20
  - OS: Linux (Docker)



  Command

```
  vllm serve Qwen/Qwen3-Coder-Next \
    --max-model-len 16000 \
    --tensor-parallel-size 2
```

  CUDA graph capture fails during model initialization with an AssertionError in causal_conv1d_update. The failing check is the assertion num_cache_lines >= batch at line 1160 of causal_conv1d.py:

```

  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1320, in gdn_attention_core
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     self._forward_core(
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 585, in _forward_core
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     mixed_qkv_non_spec = causal_conv1d_update(
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]                          ^^^^^^^^^^^^^^^^^^^^^
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/mamba/ops/causal_conv1d.py", line 1160, in causal_conv1d_update
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     assert num_cache_lines >= batch
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]            ^^^^^^^^^^^^^^^^^^^^^^^^
  ERROR 02-03 13:39:49 [multiproc_executor.py:852] AssertionError

```
  
The error occurs during compile_or_warm_up_model → capture_model → _capture_cudagraphs → _dummy_run.
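
To make the failing condition concrete, here is a minimal sketch of the invariant the assertion enforces. It is not vLLM's code: only num_cache_lines and batch come from the traceback. The function name, the reading that each sequence in the batch needs its own cache line in the conv state, and all numbers are assumptions for illustration.

```python
def check_conv_cache_capacity(num_cache_lines: int, batch: int) -> None:
    """Illustrative stand-in for the failing check (hypothetical helper).

    Assumption: num_cache_lines is the number of slots in the conv state
    cache, and batch is the number of sequences in the current run.
    """
    assert num_cache_lines >= batch, (
        f"conv state has {num_cache_lines} cache lines "
        f"but the batch has {batch} sequences"
    )


# Hypothetical numbers, not taken from the log above.
check_conv_cache_capacity(num_cache_lines=512, batch=256)  # enough slots, passes

try:
    # Fewer cache lines than sequences: same failure mode as in the traceback.
    check_conv_cache_capacity(num_cache_lines=128, batch=256)
except AssertionError as exc:
    print(f"AssertionError: {exc}")
```

If this reading is right, the dummy run used for CUDA graph capture is being issued with a batch larger than the number of cache lines allocated for the conv state, which would explain why the failure only shows up during capture.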


Workaround: adding --enforce-eager, which disables CUDA graph capture, lets the model run, but throughput drops to ~12 tokens/s versus the 20+ tokens/s expected with CUDA graphs.
