CUDA graph capture fails with AssertionError in causal_conv1d_update for Qwen3-Coder-Next #559

@vitalik

Description

  • vLLM version: 0.15.0
  • PyTorch version: (from vllm/vllm-openai:latest docker image)
  • CUDA version: 12.8
  • GPU: 2x NVIDIA H100 NVL (95830 MiB each)
  • Driver: 570.133.20
  • OS: Linux (Docker)

Command

  vllm serve Qwen/Qwen3-Coder-Next \
    --max-model-len 16000 \
    --tensor-parallel-size 2

CUDA graph capture fails during model initialization with an AssertionError in causal_conv1d_update. The error occurs at line 1160 in causal_conv1d.py:


  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1320, in gdn_attention_core
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     self._forward_core(
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 585, in _forward_core
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     mixed_qkv_non_spec = causal_conv1d_update(
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]                          ^^^^^^^^^^^^^^^^^^^^^
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/mamba/ops/causal_conv1d.py", line 1160, in causal_conv1d_update
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]     assert num_cache_lines >= batch
  ERROR 02-03 13:39:49 [multiproc_executor.py:852]            ^^^^^^^^^^^^^^^^^^^^^^^^
  ERROR 02-03 13:39:49 [multiproc_executor.py:852] AssertionError

The error occurs during compile_or_warm_up_model → capture_model → _capture_cudagraphs → _dummy_run.
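
For readers unfamiliar with the check, the failing assertion compares the number of cache lines in the convolution state against the batch size of the input. A minimal sketch of that condition follows. It assumes that num_cache_lines is the leading dimension of the conv state and that batch is the leading dimension of the input; the helper name check_cache_capacity and the shapes are hypothetical and are not taken from vLLM or from this report.

```python
def check_cache_capacity(conv_state_shape, x_shape):
    """Mirror the `assert num_cache_lines >= batch` check from the traceback.

    Assumption: the first dimension of the conv state is the number of cache
    lines, and the first dimension of the input is the batch size.
    Returns the number of spare cache lines when the check passes.
    """
    num_cache_lines = conv_state_shape[0]
    batch = x_shape[0]
    assert num_cache_lines >= batch, (
        f"batch={batch} exceeds num_cache_lines={num_cache_lines}"
    )
    return num_cache_lines - batch


# Illustrative shapes only: a dummy batch larger than the allocated cache.
try:
    check_cache_capacity(conv_state_shape=(256, 8, 3), x_shape=(512, 8, 1))
except AssertionError as exc:
    print("capture would fail:", exc)
```

If this reading is right, the dummy run used for CUDA graph capture is requesting a batch larger than the number of allocated state slots, which would explain why the failure appears only during capture.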

Workaround: Adding --enforce-eager allows the model to run, but with reduced performance (~12 tokens/s vs expected 20+ tokens/s with CUDA graphs).
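
For reproducing both configurations side by side, the two invocations can be assembled as below. This is only a sketch: build_serve_command is a hypothetical helper, while the flags are exactly the ones quoted in this report.

```python
import shlex


def build_serve_command(enforce_eager: bool) -> list[str]:
    """Assemble the `vllm serve` invocation from this report.

    `build_serve_command` is a hypothetical helper; the model name and flags
    are the ones quoted in the issue.
    """
    cmd = [
        "vllm", "serve", "Qwen/Qwen3-Coder-Next",
        "--max-model-len", "16000",
        "--tensor-parallel-size", "2",
    ]
    if enforce_eager:
        # Workaround from the report: run without CUDA graph capture.
        cmd.append("--enforce-eager")
    return cmd


# Print the failing command and the workaround command.
print(shlex.join(build_serve_command(enforce_eager=False)))
print(shlex.join(build_serve_command(enforce_eager=True)))
```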
