
Ministral 3 - 2512 vLLM Enabling Issue: Tokenizer error #241

@gilliean

Description


Hardware:

  • CPU: Intel(R) Xeon(R) Platinum 8468V
  • GPU: PVC - Intel(R) Data Center GPU Max 1550

OS

  • Ubuntu 22.04.5 LTS

Models trying to enable:

  • mistralai/Ministral-3-3B-Base-2512, mistralai/Ministral-3-8B-Base-2512, mistralai/Ministral-3-14B-Base-2512

Image tried:

  • intel/llm-scaler-vllm:1.2

The main issue seems to be that Ministral 3 requires vllm >= 0.12.0 (as noted at https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation), while the image ships vllm 0.10.3.dev0+g01efc7ef7.d20251125.xpu. Without upgrading vllm, serving fails with AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?.
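The AttributeError boils down to an interface mismatch: transformers' PixtralProcessor expects an HF-style tokenizer exposing convert_tokens_to_ids, while the MistralTokenizer wrapper in the older vllm does not provide that method. A toy sketch of the failure mode (the classes below are simplified stand-ins, NOT the real transformers/vllm classes):

```python
# Simplified stand-ins illustrating the interface mismatch; these are
# NOT the actual transformers / vllm / mistral-common implementations.

class HFStyleTokenizer:
    """HF tokenizers expose convert_tokens_to_ids."""
    def __init__(self, vocab):
        self.vocab = vocab

    def convert_tokens_to_ids(self, token):
        return self.vocab[token]


class MistralStyleTokenizer:
    """The older MistralTokenizer wrapper lacks that method."""
    def __init__(self, vocab):
        self.vocab = vocab

    def convert_tokens_to_string(self, tokens):
        return "".join(tokens)


def image_token_id(tokenizer, image_token="[IMG]"):
    # Effectively what PixtralProcessor.__init__ does in the traceback below.
    return tokenizer.convert_tokens_to_ids(image_token)


vocab = {"[IMG]": 10}
print(image_token_id(HFStyleTokenizer(vocab)))  # -> 10, works
try:
    image_token_id(MistralStyleTokenizer(vocab))
except AttributeError as e:
    print(e)  # same failure mode as the log in this report
```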

When vllm 0.12.0 is installed (pip install vllm==0.12.0), torch is upgraded to 2.9.0 as a dependency, which is incompatible with IPEX (intel_extension_for_pytorch 2.8.10.post1+xpu) and fails with: ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.

So it seems a new XPU image with vllm >= 0.12.0 is needed to resolve this.
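The IPEX constraint is that the PyTorch major.minor must match the IPEX release. A minimal pre-flight check sketching that rule (a sketch only, not an official Intel tool; the version strings are taken from this report):

```python
import re

def ipex_compatible(torch_version: str, ipex_version: str) -> bool:
    """IPEX 2.x.y requires PyTorch 2.x.* (matching major.minor)."""
    t = re.match(r"(\d+)\.(\d+)", torch_version)
    i = re.match(r"(\d+)\.(\d+)", ipex_version)
    return t.group(1, 2) == i.group(1, 2)

# Versions from the failing environment after `pip install vllm==0.12.0`:
print(ipex_compatible("2.9.0+cu128", "2.8.10.post1+xpu"))  # -> False, mismatch
# What a fixed XPU image would need to pair with IPEX 2.8.x:
print(ipex_compatible("2.8.0+xpu", "2.8.10.post1+xpu"))    # -> True
```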

Recipe tried (without upgrading vllm):

BASE_ARGS=(
  --rm -it
  --ipc=host
  --shm-size 32g
  -v "${PWD}/models:/root/models"
  -v /dev/dri:/dev/dri
  --device-cgroup-rule='c 226:* rmw'
  -e ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
  -e CCL_ZE_IPC_EXCHANGE=sockets
  -e HTTP_PROXY="${HTTP_PROXY}"
  -e HTTPS_PROXY="${HTTPS_PROXY}"
  -e NO_PROXY="${NO_PROXY}"
  -e http_proxy="${http_proxy}"
  -e https_proxy="${https_proxy}"
  -e no_proxy="${no_proxy}"
)


docker run "${BASE_ARGS[@]}"   \
  --device=/dev/dri \
  -e ZE_ENABLE_PCI_ID_DEVICE_ORDER=1 \
  -e ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
  -e ZE_AFFINITY_MASK="0,1" \
  --ipc=host \
  --entrypoint= intel/llm-scaler-vllm:1.2 /bin/bash

# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"

# required per https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#transformers
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade

vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1  --tokenizer-mode "mistral"

Error encountered:

:
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     processor = processor_cls.from_pretrained(
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1414, in from_pretrained
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     processor = cls(*args, **valid_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/pixtral/processing_pixtral.py", line 105, in __init__
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]     self.image_token_id = tokenizer.convert_tokens_to_ids(self.image_token)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?
:

Recipe tried with vllm==0.12.0:

BASE_ARGS=(
  --rm -it
  --ipc=host
  --shm-size 32g
  -v "${PWD}/models:/root/models"
  -v /dev/dri:/dev/dri
  --device-cgroup-rule='c 226:* rmw'
  -e ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
  -e CCL_ZE_IPC_EXCHANGE=sockets
  -e HTTP_PROXY="${HTTP_PROXY}"
  -e HTTPS_PROXY="${HTTPS_PROXY}"
  -e NO_PROXY="${NO_PROXY}"
  -e http_proxy="${http_proxy}"
  -e https_proxy="${https_proxy}"
  -e no_proxy="${no_proxy}"
)

docker run "${BASE_ARGS[@]}"   \
  --device=/dev/dri \
  -e ZE_ENABLE_PCI_ID_DEVICE_ORDER=1 \
  -e ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
  -e ZE_AFFINITY_MASK="0,1" \
  --ipc=host \
  --entrypoint= intel/llm-scaler-vllm:1.2 /bin/bash

# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"

# vllm >= 0.12.0; required from https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation
pip install vllm==0.12.0
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade

vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1  --tokenizer-mode "mistral"

Error encountered:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.

root@73c7f3fd73da:/llm# pip list | grep torch
intel_extension_for_pytorch       2.8.10.post1+xpu
pytorch-triton-xpu                3.4.0
torch                             2.9.0
torchaudio                        2.9.0
torchvision                       0.24.0
