Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196
Merged: CISC merged 16 commits into ggml-org:master on Jun 29, 2025
This PR implements support for setting additional jinja template parameters. An example of this is `enable_thinking` in the Qwen3 models' template.

Main features:

- a `--chat_template_kwargs` command-line argument (or the corresponding environment variable)
- a `chat_template_kwargs` request parameter

Notice: since "server: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false)" #13771, the preferred way to disable thinking with a command-line argument is now `--reasoning-budget 0`. Either way, the command-line setting can be overridden by passing `chat_template_kwargs` with the request to the OAI-compatible API.
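As a sketch of how a client might override the template kwargs per request: the `chat_template_kwargs` field below is the request parameter added by this PR, while the model name and message content are placeholder assumptions, not taken from the PR.

```python
import json

# Hypothetical request body for the OAI-compatible chat completions endpoint.
# "chat_template_kwargs" is forwarded as extra kwargs to the jinja chat
# template, e.g. Qwen3's enable_thinking flag; "model" is a placeholder.
payload = {
    "model": "qwen3",
    "messages": [
        {
            "role": "user",
            "content": "Give me a short introduction to large language models.",
        }
    ],
    # Per-request override: disables Qwen3 thinking for this call only.
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

A per-request value like this takes precedence over whatever was set on the server command line.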
Other info:

The official template is still only partially compatible. I modified it to use only supported features. It's here: https://pastebin.com/16ZpCLHk / https://pastebin.com/GGuTbFRc

It should be loaded with:

`llama-server --jinja --chat-template-file {template_file}`

It fixes #13160 and #13189.
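Putting the pieces together, a launch command might look like the following. This is a sketch: the template filename is a placeholder, and the JSON-object value format for the kwargs flag is an assumption, not verified against the merged code.

```shell
# Load the modified Qwen3 template and set extra jinja kwargs at startup.
# qwen3-template.jinja is a placeholder name; the JSON value syntax for
# --chat_template_kwargs is an assumption.
llama-server --jinja \
  --chat-template-file qwen3-template.jinja \
  --chat_template_kwargs '{"enable_thinking": false}'
```

Clients can still override this startup default by sending `chat_template_kwargs` in individual requests.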
Test it with:

```json
{"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}
```
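The test body above can be sanity-checked programmatically; the empty `<think></think>` block pre-filled in the assistant turn is what corresponds to thinking being disabled. The raw JSON here is copied verbatim from the test payload above:

```python
import json

# The test payload from this PR description, verbatim (note the escaped
# newlines inside the JSON string literal).
raw = (
    '{"prompt":"\\n<|im_start|>user\\nGive me a short introduction to large '
    'language models.<|im_end|>\\n<|im_start|>assistant\\n<think>\\n\\n</think>\\n\\n"}'
)

body = json.loads(raw)
# The prompt ends the assistant turn opener with an empty think block,
# i.e. no reasoning content.
print(body["prompt"])
```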