Exaone-4 gibberish when using jinja template #14761

@sirus20x6

Description

Name and Version

➜ llama.cpp git:(master) ✗ ./build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
version: 5935 (2adf8d8)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX PRO 6000 Blackwell

Models

EXAONE-4.0-32B-Q4_K_M from the LG official repo

Problem description & steps to reproduce

[Screenshot: gibberish model output]

When using the Jinja chat template from the EXAONE-4 repo on Hugging Face (https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B/blob/main/chat_template.jinja), the model produces gibberish output.
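For debugging, one way to check whether the template itself is the culprit is to render it locally with `jinja2` and inspect the exact prompt string the server would build. The template below is a hypothetical minimal stand-in for illustration only (the real EXAONE-4 template should be read from the `chat_template.jinja` file linked above); the `[|role|]` / `[|endofturn|]` markers are assumptions, not confirmed EXAONE-4 tokens.

```python
from jinja2 import Template

# Hypothetical minimal chat template for illustration only --
# replace this string with the contents of chat_template.jinja
# from the Hugging Face repo to inspect the real rendered prompt.
TEMPLATE = (
    "{% for message in messages %}"
    "[|{{ message['role'] }}|]{{ message['content'] }}[|endofturn|]\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}[|assistant|]{% endif %}"
)

def render_prompt(messages, add_generation_prompt=True):
    """Render a chat transcript through the template, roughly as
    llama-server does internally when --jinja is enabled."""
    return Template(TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )

if __name__ == "__main__":
    msgs = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
    print(render_prompt(msgs))
```

If the rendered string looks wrong here (missing special tokens, doubled turns), the template is at fault; if it looks correct but the model still emits gibberish, the problem is more likely in how llama.cpp tokenizes or applies it.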

First Bad Commit

No response

Relevant log output

➜  llama.cpp git:(master) ✗ cat ./exaone-32b-q4.sh
#!/bin/bash
./build/bin/llama-server -m /thearray/git/ob/text-generation-webui/models/EXAONE-4.0-32B-Q4_K_M.gguf \
--alias "Exaone-4" \
--threads 23 \
-c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
 -ngl 99 --mlock --no-mmap --flash-attn --port 9808 --api-key "llamacpp" --jinja --chat-template ./models/templates/Exaone-4.jinja
