
Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196

Merged
CISC merged 16 commits into ggml-org:master from matteoserva:enable_thinking
Jun 29, 2025

matteoserva commented Apr 29, 2025

This PR adds support for passing additional variables to Jinja chat templates.
One example is the enable_thinking variable in the Qwen3 chat template.

Main features:

  • Setting Jinja variables from the command line with --chat-template-kwargs or the LLAMA_CHAT_TEMPLATE_KWARGS environment variable
  • Setting variables per request in the OpenAI-compatible API with the chat_template_kwargs parameter
  • Compatibility with the vLLM API
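As a rough illustration of the per-request option, the bash sketch below builds an OpenAI-style chat request body that carries chat_template_kwargs. The helper name chat_body is hypothetical and not part of llama.cpp; it assumes the prompt contains no double quotes or backslashes, because it splices the text into JSON without escaping. /v1/chat/completions is llama-server's OpenAI-compatible chat endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): prints a request body that
# sets enable_thinking for this single request via chat_template_kwargs.
# Assumption: the prompt has no double quotes or backslashes (no JSON escaping).
chat_body() {
  local prompt="$1"
  local thinking="$2"   # must be the JSON literal true or false
  case "$thinking" in
    true|false) ;;
    *) return 1 ;;      # reject anything that is not a JSON boolean
  esac
  printf '{"messages":[{"role":"user","content":"%s"}],"chat_template_kwargs":{"enable_thinking":%s}}' \
    "$prompt" "$thinking"
}

# Usage with a running server (curl reads the body from stdin via -d @-):
# chat_body "Give me a short introduction to large language models." false |
#   curl -s http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @-
```

printf keeps the sketch dependency-free; for arbitrary user text, a real JSON encoder should build the body instead.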

Other info

The official Qwen3 template is still only partially compatible with llama-server's Jinja support, so I modified it to use only supported features.
It is available here: https://pastebin.com/16ZpCLHk https://pastebin.com/GGuTbFRc
Load it with: llama-server --jinja --chat-template-file {template_file}
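For reference, a launch command that combines the modified template with a server-wide default for enable_thinking might look like the config fragment below. The model and template paths are placeholders, and the --chat-template-kwargs flag and LLAMA_CHAT_TEMPLATE_KWARGS variable spellings are assumed from llama.cpp's server documentation; confirm them with llama-server --help on your build.

```shell
# Placeholder paths: substitute your own GGUF model and the modified template.
# Server-wide default: every request renders with enable_thinking=false.
llama-server -m ./Qwen3-8B.gguf --jinja \
  --chat-template-file ./qwen3-modified.jinja \
  --chat-template-kwargs '{"enable_thinking": false}'

# The same default supplied through the environment instead of the flag:
LLAMA_CHAT_TEMPLATE_KWARGS='{"enable_thinking": false}' \
  llama-server -m ./Qwen3-8B.gguf --jinja --chat-template-file ./qwen3-modified.jinja
```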

Fixes #13160 and #13189.

Test it with:

  • enable_thinking=false. Expected: {"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'
  • enable_thinking=true
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": true}
}'
  • enable_thinking undefined
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5
}'
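To check the three cases automatically, a small bash predicate can look for the empty think block that appears in the expected output for enable_thinking=false; the other two cases presumably lack it. The function name thinking_disabled is hypothetical. Because /apply-template returns raw JSON, the newlines inside the prompt string appear as the two characters backslash and n, so the pattern matches them literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): succeeds if an /apply-template
# JSON response contains the empty think block emitted for enable_thinking=false.
# Inside single quotes, \n stays a literal backslash followed by n,
# matching the JSON-escaped newlines in the response text.
thinking_disabled() {
  local response="$1"
  [[ "$response" == *'<think>\n\n</think>'* ]]
}

# Usage against a running server:
# for t in false true; do
#   resp=$(curl -s http://localhost:8080/apply-template \
#     -H "Content-Type: application/json" \
#     -d '{"messages":[{"role":"user","content":"Hi"}],"chat_template_kwargs":{"enable_thinking":'"$t"'}}')
#   if thinking_disabled "$resp"; then echo "$t: empty think block"; else echo "$t: no think block"; fi
# done
```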

Development

Successfully merging this pull request may close these issues.

Misc. bug: Qwen 3.0 "enable_thinking" parameter not working