
Qwen3-235B inference issue on 8 * Dual B60 #248

@jessie-zhao

Description

vLLM version: intel/llm-scaler-vllm:1.2 (Docker image)
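
For reference, the engine configuration recorded in the dump below maps onto roughly the following arguments. This is a sketch using the offline LLM entrypoint; the actual vllm serve command line is not included in this report, so treat it as a reconstruction rather than the exact repro command:

```python
# Reconstructed from the EngineCore config dump in the log below. The model
# was actually served through the API server, so this offline-LLM form is an
# equivalent sketch, not the command the reporter ran.
from vllm import LLM

llm = LLM(
    model="/llm/models/Qwen3-235B-A22B-Instruct-2507",
    dtype="float16",
    quantization="fp8",
    tensor_parallel_size=8,         # TP8 x PP2 = 16 ranks, i.e. 8 dual-GPU B60 cards
    pipeline_parallel_size=2,
    max_model_len=10000,            # max_seq_len=10000 in the dump
    enforce_eager=True,
    trust_remote_code=True,
    enable_prefix_caching=True,
    disable_custom_all_reduce=True,
    kv_cache_dtype="auto",
    seed=0,
)
```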

Serving fails with the error below:
(APIServer pid=19517) INFO 01-17 07:37:12 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.8%, Prefix cache hit rate: 18.6%
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.3.dev0+g01efc7ef7.d20251125) with config: model='/llm/models/Qwen3-235B-A22B-Instruct-2507', speculative_config=None, tokenizer='/llm/models/Qwen3-235B-A22B-Instruct-2507', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=10000, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=2, data_parallel_size=1, disable_custom_all_reduce=True, quantization=fp8, enforce_eager=True, kv_cache_dtype=auto, device_config=xpu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-235B-A22B-Instruct-2507, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null},
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[], scheduled_cached_reqs=CachedRequestData(req_ids=['cmpl-benchmark-serving3-0', 'cmpl-benchmark-serving9-0', 'cmpl-benchmark-serving12-0', 'cmpl-benchmark-serving13-0'], resumed_from_preemption=[false, false, false, false], new_token_ids=[[15], [15], [13], [101535]], new_block_ids=[null, [[1409]], null, null], num_computed_tokens=[6368, 9728, 7805, 8434]), num_scheduled_tokens={cmpl-benchmark-serving12-0: 1, cmpl-benchmark-serving9-0: 1, cmpl-benchmark-serving13-0: 1, cmpl-benchmark-serving3-0: 1}, total_num_scheduled_tokens=4, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_mm_hashes=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=5, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.2577092511013216, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] Traceback (most recent call last):
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 711, in run_engine_core
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] engine_core.run_busy_loop()
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 738, in run_busy_loop
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] self._process_engine_step()
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 764, in _process_engine_step
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 353, in step_with_batch_queue
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 278, in execute_model_with_error_logging
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] raise err
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 269, in execute_model_with_error_logging
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] return model_fn(scheduler_output)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 354, in <lambda>
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] lambda _: future.result(), scheduler_output)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] return self.__get_result()
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] raise self._exception
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/executor/multiproc_executor.py", line 239, in get_response
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] status, result = w.worker_response_mq.dequeue(
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/distributed/device_communicators/shm_broadcast.py", line 507, in dequeue
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] with self.acquire_read(timeout, cancel) as buf:
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] return next(self.gen)
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/distributed/device_communicators/shm_broadcast.py", line 469, in acquire_read
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] raise TimeoutError
(EngineCore_DP0 pid=19789) ERROR 01-17 07:41:55 [core.py:720] TimeoutError
(Worker_PP0_TP0 pid=19926) INFO 01-17 07:41:55 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_PP0_TP1 pid=19927) INFO 01-17 07:41:55 [multiproc_executor.py:546] Parent process exited, terminating worker
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] AsyncLLM output_handler failed.
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] Traceback (most recent call last):
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/async_llm.py", line 444, in output_handler
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] outputs = await engine_core.get_output_async()
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.3.dev0+g01efc7ef7.d20251125.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core_client.py", line 845, in get_output_async
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] raise self._format_exception(outputs) from None
(APIServer pid=19517) ERROR 01-17 07:41:55 [async_llm.py:485] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_PP0_TP2 pid=19928) INFO 01-17 07:41:55 [multiproc_executor.py:546] Parent process exited, terminating worker
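
From the traceback, the immediate fatal error is the TimeoutError raised in shm_broadcast.acquire_read: EngineCore was waiting on a worker's response message queue for the scheduled decode step (total_num_scheduled_tokens=4), at least one of the TP8 x PP2 workers never delivered a result, the dequeue timed out, EngineCore aborted, and the API server then surfaced EngineDeadError. The cmpl-benchmark-serving*-0 request ids show the hang occurred under concurrent load from a serving benchmark.

To re-drive the same /v1/completions path against the server, here is a minimal client sketch (the base URL and port are assumptions; the model name is served_model_name from the config dump):

```python
# Minimal client hitting the same OpenAI-compatible /v1/completions route the
# benchmark requests used. localhost:8000 is an assumed default serve address;
# adjust it to the actual deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen3-235B-A22B-Instruct-2507",  # served_model_name from the dump
        "prompt": "Hello, world",
        "max_tokens": 64,
    },
    timeout=300,
)
print(resp.status_code, resp.json())
```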
