Name and Version
➜ llama.cpp git:(master) ✗ ./build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
version: 5935 (2adf8d8)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX PRO 6000 Blackwell
Models
EXAONE-4.0-32B-Q4_K_M from the LG official repo
Problem description & steps to reproduce
Using the Jinja chat template from the EXAONE-4 repo on Hugging Face: https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B/blob/main/chat_template.jinja
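To reproduce, the template file first has to exist locally, since `llama-server` takes it via `--chat-template` (see the launch script in the log output below). A minimal sketch of fetching it, assuming network access; the Hugging Face `raw/main` path is the standard raw-file form of the `blob/main` URL above, and the output filename `Exaone-4.jinja` is just the illustrative path the script uses:

```shell
# Download the raw chat template from the official EXAONE-4.0-32B repo.
# Falls back to a message instead of failing hard if the download fails.
curl -fsSL -o Exaone-4.jinja \
  "https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B/raw/main/chat_template.jinja" \
  || echo "template download failed (check network access)"
```

The resulting file is what `--jinja --chat-template ./models/templates/Exaone-4.jinja` points at in the launch script.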
First Bad Commit
No response
Relevant log output
➜ llama.cpp git:(master) ✗ cat ./exaone-32b-q4.sh
#!/bin/bash
./build/bin/llama-server -m /thearray/git/ob/text-generation-webui/models/EXAONE-4.0-32B-Q4_K_M.gguf \
--alias "Exaone-4" \
--threads 23 \
-c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
-ngl 99 --mlock --no-mmap --flash-attn --port 9808 --api-key "llamacpp" --jinja --chat-template ./models/templates/Exaone-4.jinja