Description
Hardware:
- CPU: Intel(R) Xeon(R) Platinum 8468V
- GPU: PVC - Intel(R) Data Center GPU Max 1550
OS:
- Ubuntu 22.04.5 LTS
Models to enable:
- mistralai/Ministral-3-3B-Base-2512
- mistralai/Ministral-3-8B-Base-2512
- mistralai/Ministral-3-14B-Base-2512
Image tried:
- intel/llm-scaler-vllm:1.2
The main issue seems to be that Ministral 3 requires vLLM >= 0.12.0 (as mentioned at https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation), while the image ships vLLM 0.10.3.dev0+g01efc7ef7.d20251125.xpu. Without upgrading vLLM, serving fails with AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?.
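A quick way to confirm the shipped version inside the container (the output shown is the version reported above):
python3 -c "import vllm; print(vllm.__version__)"
# -> 0.10.3.dev0+g01efc7ef7.d20251125.xpu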
When vLLM 0.12.0 is installed (pip install vllm==0.12.0), torch is upgraded to 2.9.0 as a dependency, which does not work with IPEX (intel_extension_for_pytorch 2.8.10.post1+xpu) and fails with: ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.
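The mismatch can be confirmed directly inside the container; IPEX runs its version check at import time (inferred from the error message above):
python3 -c "import torch; print(torch.__version__)"    # 2.9.0 after the vLLM upgrade
python3 -c "import intel_extension_for_pytorch"        # fails the PyTorch 2.8.* version check with the ERROR above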
So it seems that a new XPU image with vLLM >= 0.12.0 is needed, which would resolve the issue.
Recipe tried (without upgrading vLLM):
BASE_ARGS=(
--rm -it
--ipc=host
--shm-size 32g
-v "${PWD}/models:/root/models"
-v /dev/dri:/dev/dri
--device-cgroup-rule='c 226:* rmw'
-e ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
-e CCL_ZE_IPC_EXCHANGE=sockets
-e HTTP_PROXY="${HTTP_PROXY}"
-e HTTPS_PROXY="${HTTPS_PROXY}"
-e NO_PROXY="${NO_PROXY}"
-e http_proxy="${http_proxy}"
-e https_proxy="${https_proxy}"
-e no_proxy="${no_proxy}"
)
docker run "${BASE_ARGS[@]}" \
--device=/dev/dri \
-e ZE_ENABLE_PCI_ID_DEVICE_ORDER=1 \
-e ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
-e ZE_AFFINITY_MASK="0,1" \
--ipc=host \
--entrypoint= intel/llm-scaler-vllm:1.2 /bin/bash
# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"
# required per https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#transformers
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade
vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1 --tokenizer-mode "mistral"
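The server never comes up far enough for a basic smoke test to succeed; for reference, the check would be the sketch below (endpoint per vLLM's OpenAI-compatible API, port matching --port above):
curl http://localhost:8000/v1/models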
Error encountered:
:
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] processor = processor_cls.from_pretrained(
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1414, in from_pretrained
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] processor = cls(*args, **valid_kwargs)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/transformers/models/pixtral/processing_pixtral.py", line 105, in __init__
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] self.image_token_id = tokenizer.convert_tokens_to_ids(self.image_token)
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=770) ERROR 01-14 19:57:36 [core.py:718] AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'. Did you mean: 'convert_tokens_to_string'?
:
Recipe tried with vllm==0.12.0:
# Container launched with the same BASE_ARGS and docker run command as above.
# INSIDE Container
export HF_HOME=/home/huggingface
export HUGGING_FACE_HUB_TOKEN="MY_HF_TOKEN"
# vllm >= 0.12.0 required per https://huggingface.co/mistralai/Ministral-3-8B-Base-2512#installation
pip install vllm==0.12.0
pip install transformers==5.0.0rc2
pip install mistral-common --upgrade
vllm serve mistralai/Ministral-3-8B-Base-2512 --dtype=bfloat16 --enforce-eager --port 8000 --host 0.0.0.0 --trust-remote-code --gpu-memory-util=0.95 --no-enable-prefix-caching --max-num-batched-tokens=128 --disable-log-requests --max-model-len=8192 --block-size 64 -tp 1 --tokenizer-mode "mistral"
Error message encountered:
ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.8.*, but PyTorch 2.9.0+cu128 is found. Please switch to the matching version and run again.
root@73c7f3fd73da:/llm# pip list | grep torch
intel_extension_for_pytorch 2.8.10.post1+xpu
pytorch-triton-xpu 3.4.0
torch 2.9.0
torchaudio 2.9.0
torchvision 0.24.0
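For completeness, the dependency graph inside the container is now inconsistent either way: vLLM 0.12.0 pulled in torch 2.9.0 (see pip list above), while the preinstalled IPEX build requires torch 2.8.*. Assuming IPEX declares that torch pin in its package metadata, pip's resolver check should flag it:
pip check   # expected to report intel_extension_for_pytorch's broken torch requirement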