enforce response_format and json_schema for Kimi K2 #18851
This PR fixes an issue I encountered using `response_format` with Kimi K2 Instruct 0905.

Using the `/v1/chat/completions` endpoint in `llama-server`, I noticed that I was receiving responses which were not adhering to the submitted `json_schema`.
Simplest reproduction:
```sh
./build/bin/llama-server \
    --host 127.0.0.1 --port 5840 \
    --model /path/to/Kimi-K2-Instruct-0905-...-00001-of-00013.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 0
```

You are likely to receive a response like the following, backticks and json declaration included:
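(The exact output is not reproduced here; a representative example, with a hypothetical payload:)

````
```json
{"ok": true}
```
````

The request that produces this is an ordinary chat completion carrying a `json_schema` response format, roughly like the following (the schema here is illustrative, not the one from my testing):

```sh
curl http://127.0.0.1:5840/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [{"role": "user", "content": "Reply with JSON."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "schema": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"]
                }
            }
        }
    }'
```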
This should not be possible if grammar is being created and enforced.
The issue is that for chat completions, when the Kimi format is detected, the request routes to the Kimi-based handler (`common_chat_params_init_kimi_k2`). This handler did not follow the same behavior as the generic handler, which generates a grammar for a schema supplied via `response_format`; it only handled tool grammars.
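The fix mirrors the generic handler's behavior. A minimal sketch of the pattern, assuming field and helper names along the lines of `common/chat.cpp` (abbreviated here, not the exact diff):

```cpp
static common_chat_params common_chat_params_init_kimi_k2(
        const common_chat_template & tmpl, const struct templates_params & inputs) {
    common_chat_params data;
    // ... existing prompt and tool-grammar handling ...

    // Previously missing: mirror the generic handler and build a grammar
    // from a json_schema supplied via response_format.
    if (!inputs.json_schema.is_null()) {
        data.grammar = json_schema_to_grammar(inputs.json_schema);
    }
    return data;
}
```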
This revealed a second issue with the Kimi flags, which included an opening bracket in the tool separator and a closing bracket in the tool end marker. Because those characters appear in the template tag definitions, the `trim_suffix` call was removing the final bracket and producing invalid JSON strings, e.g. `{"ok": true`.
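To make the failure mode concrete, here is a self-contained sketch (the `trim_suffix` below is a hypothetical reconstruction of the naive behavior, not the actual code from `common/chat.cpp`):

```cpp
#include <iostream>
#include <string>
#include <string_view>

// Naive suffix trim: drop `suffix` from the end of `s` if present.
// (Hypothetical reconstruction of the problematic behavior.)
static std::string trim_suffix(std::string s, std::string_view suffix) {
    if (s.size() >= suffix.size() &&
        s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0) {
        s.erase(s.size() - suffix.size());
    }
    return s;
}

int main() {
    // Assume the tool end marker contains a closing brace, as described above:
    const std::string_view tool_end = "}";
    // The model emits a valid JSON payload:
    const std::string output = R"({"ok": true})";
    // Trimming the marker also eats the payload's own closing brace:
    std::cout << trim_suffix(output, tool_end) << "\n"; // prints: {"ok": true
}
```

Any marker whose text coincides with the tail of a valid payload gets the same treatment, which is why the JSON loses its closing brace.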
{"ok": true. I have modified the trim_suffix approach, but it is ugly and I'm hoping someone with better intuition will have a better solution. I see there is an Autoparser PR (#18675) but I have tested it and it does not resolve the original issue.AI was used in the following ways for this PR:
As requested, I ran the whole test suite, which passed. Perplexity is obviously not affected.