
Ministral 3 - 2512 vLLM Enabling Issue: Tokenizer error #241

@gilliean

Description


Hardware:

  • CPU: Intel(R) Xeon(R) Platinum 8468V
  • GPU: PVC - Intel(R) Data Center GPU Max 1550

OS

  • Ubuntu 22.04.5 LTS

Models trying to enable:

  • mistralai/Ministral-3-3B-Base-2512, mistralai/Ministral-3-8B-Base-2512, mistralai/Ministral-3-14B-Base-2512

Image tried:

  • intel/llm-scaler-vllm:1.2

The main issue seems to be that Ministral 3 requires vllm >= 0.12.0 (as noted at https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation), while the image ships vllm 0.10.3.dev0+g01efc7ef7.d20251125.xpu. Without upgrading vllm, serving fails with AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?.
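The AttributeError boils down to an interface mismatch: transformers' PixtralProcessor expects an HF-style tokenizer exposing convert_tokens_to_ids, while the MistralTokenizer wrapper in the older vllm does not provide that method. A toy sketch of the failure mode (the classes below are simplified stand-ins, NOT the real transformers/vllm classes):

```python
# Simplified stand-ins illustrating the interface mismatch; these are
# NOT the actual transformers / vllm / mistral-common implementations.

class HFStyleTokenizer:
    """HF tokenizers expose convert_tokens_to_ids."""
    def __init__(self, vocab):
        self.vocab = vocab

    def convert_tokens_to_ids(self, token):
        return self.vocab[token]


class MistralStyleTokenizer:
    """The older MistralTokenizer wrapper lacks that method."""
    def __init__(self, vocab):
        self.vocab = vocab

    def convert_tokens_to_string(self, tokens):
        return "".join(tokens)


def image_token_id(tokenizer, image_token="[IMG]"):
    # Effectively what PixtralProcessor.__init__ does in the traceback below.
    return tokenizer.convert_tokens_to_ids(image_token)


vocab = {"[IMG]": 10}
print(image_token_id(HFStyleTokenizer(vocab)))  # -> 10, works
try:
    image_token_id(MistralStyleTokenizer(vocab))
except AttributeError as e:
    print(e)  # same failure mode as the log in this report
```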

When vllm 0.12.0 is installed (pip install vllm==0.12.0), torch is upgraded to 2.9.0 as a dependency, which is incompatible with IPEX (intel_extension_for_pytorch 2.8.10.post1+xpu) and fails with: ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.

So it seems a new XPU image with vllm >= 0.12.0 is needed to resolve this.
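The IPEX constraint is that the PyTorch major.minor must match the IPEX release. A minimal pre-flight check sketching that rule (a sketch only, not an official Intel tool; the version strings are taken from this report):

```python
import re

def ipex_compatible(torch_version: str, ipex_version: str) -> bool:
    """IPEX 2.x.y requires PyTorch 2.x.* (matching major.minor)."""
    t = re.match(r"(\d+)\.(\d+)", torch_version)
    i = re.match(r"(\d+)\.(\d+)", ipex_version)
    return t.group(1, 2) == i.group(1, 2)

# Versions from the failing environment after `pip install vllm==0.12.0`:
print(ipex_compatible("2.9.0+cu128", "2.8.10.post1+xpu"))  # -> False, mismatch
# What a fixed XPU image would need to pair with IPEX 2.8.x:
print(ipex_compatible("2.8.0+xpu", "2.8.10.post1+xpu"))    # -> True
```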

Recipe tried (without upgrading vllm):

BASE_ARGS=(
  --rm -it
  --ipc=host
  --shm-size 32g
  -v "${PWD}/models:/root/models"
  -v /dev/dri:/dev/dri
  --device-cgroup-rule='c 226:* rmw'
  -e ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
  -e CCL_ZE_IPC_EXCHANGE=sockets
  -e HTTP_PROXY="${HTTP_PROXY}"
  -e HTTPS_PROXY="${HTTPS_PROXY}"
  -e NO_PROXY="${NO_PROXY}"
  -e http_proxy="${http_proxy}"
  -e https_proxy="${https_proxy}"
  -e no_proxy="${no_proxy}"
)


docker run "${BASE_ARGS[@]}"   \
  --device=/dev/dri \
  -e ZE_ENABLE_PCI_ID_DEVICE_ORDER=1 \
  -e ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
  -e ZE_AFFINITY_MASK="0,1" \
  --ipc=host \
  --entrypoint= intel/llm-scaler-vllm:1.2 /bin/bash

# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"

# required per https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#transformers
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade

vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1  --tokenizer-mode "mistral"

Error encountered:

:
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     processor = processor_cls.from_pretrained(
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1414, in from_pretrained
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     processor = cls(*args, **valid_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/pixtral/processing_pixtral.py", line 105, in __init__
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     self.image_token_id = tokenizer.convert_tokens_to_ids(self.image_token)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?
:

Recipe tried with vllm==0.12.0:

BASE_ARGS=(
  --rm -it
  --ipc=host
  --shm-size 32g
  -v "${PWD}/models:/root/models"
  -v /dev/dri:/dev/dri
  --device-cgroup-rule='c 226:* rmw'
  -e ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
  -e CCL_ZE_IPC_EXCHANGE=sockets
  -e HTTP_PROXY="${HTTP_PROXY}"
  -e HTTPS_PROXY="${HTTPS_PROXY}"
  -e NO_PROXY="${NO_PROXY}"
  -e http_proxy="${http_proxy}"
  -e https_proxy="${https_proxy}"
  -e no_proxy="${no_proxy}"
)

docker run "${BASE_ARGS[@]}"   \
  --device=/dev/dri \
  -e ZE_ENABLE_PCI_ID_DEVICE_ORDER=1 \
  -e ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
  -e ZE_AFFINITY_MASK="0,1" \
  --ipc=host \
  --entrypoint= intel/llm-scaler-vllm:1.2 /bin/bash

# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"

# vllm >= 0.12.0; required from https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation
pip install vllm==0.12.0
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade

vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1  --tokenizer-mode "mistral"

Error encountered:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.

root@73c7f3fd73da:/llm# pip list | grep torch
intel_extension_for_pytorch       2.8.10.post1+xpu
pytorch-triton-xpu                3.4.0
torch                             2.9.0
torchaudio                        2.9.0
torchvision                       0.24.0
