
Conversation

@akoumjian

This PR aims to fix an issue I encountered using response_format with Kimi K2 Instruct 0905.
Using the /v1/chat/completions endpoint in llama-server, I noticed I was receiving responses that did not adhere to the submitted json_schema.

Simplest reproduction:

  1. Build llama.cpp without this PR's changes
  2. Download a version of https://huggingface.co/unsloth/Kimi-K2-Instruct-0905
  3. Start llama-server. Do not manually specify a chat template file.
  ./build/bin/llama-server \
    --host 127.0.0.1 --port 5840 \
    --model /path/to/Kimi-K2-Instruct-0905-...-00001-of-00013.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 0
  4. Send a request that contains a json_schema as part of response_format
curl -sS http://127.0.0.1:5840/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d ' {
      "model": "any",
      "temperature": 0.1,
      "max_tokens": 64,
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "schema": {
            "type": "object",
            "properties": {"ok": {"type": "boolean"}},
            "required": ["ok"],
            "additionalProperties": false
          }
        }
      },
      "messages": [
        {"role": "system", "content": "Return the JSON wrapped in a ```json code fence```."},
        {"role": "user", "content": "Return ok=true as a JSON struct."}
      ]
    } '

You are likely to receive a response like the following, backticks and json declaration included:

  ```json
      {"ok": true}
  ```

This should not be possible if a grammar is being created and enforced.

The issue is that for chat completions, when the Kimi format is detected, the request is routed to the Kimi-specific handler (common_chat_params_init_kimi_k2). Unlike the generic handler, which generates a grammar for a schema supplied via response_format, this handler only handled tool grammars.
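The fix mirrors the generic path inside the Kimi handler. Below is a minimal sketch of the gist, not the literal patch: it assumes the json_schema_to_grammar() helper from common/json-schema-to-grammar.h (which the generic handler already uses), and the surrounding struct and field names (templates_params, inputs.json_schema, data.grammar) are approximations of the real common_chat_* types in common/chat.cpp.

```cpp
// Sketch only: make the Kimi-specific handler honour response_format the same
// way the generic handler does. Names approximate the real llama.cpp structs.
static common_chat_params common_chat_params_init_kimi_k2(
        const common_chat_template & tmpl,
        const struct templates_params & inputs) {
    common_chat_params data;
    // ... existing Kimi K2 prompt and tool-call grammar setup ...

    // Previously missing: a schema supplied via response_format was ignored,
    // so no grammar constrained the sampled output.
    if (!inputs.json_schema.is_null()) {
        data.grammar = json_schema_to_grammar(inputs.json_schema);
    }
    return data;
}
```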

This revealed a second issue with the Kimi flags, which include an opening bracket in the tool separator and a closing bracket in the tool end marker. Because those characters also appear in the template tag definitions, the trim_suffix call was stripping the final bracket and producing invalid JSON strings, e.g. {"ok": true. I have modified the trim_suffix approach, but it is ugly and I'm hoping someone with better intuition has a cleaner solution. I see there is an Autoparser PR (#18675), but I have tested it and it does not resolve the original issue.
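To illustrate the trimming pitfall, here is a small self-contained sketch. It is not the llama.cpp helper, and the marker spelling is only illustrative: the point is that a trailing brace should only be removed when it is part of a complete end marker, not because it happens to be a character that also appears in the template tags.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Only trim when the whole marker is present at the end of the output.
// Stripping individual trailing characters shared with template tags is what
// turned {"ok": true} into the invalid {"ok": true.
static std::string trim_suffix_exact(std::string s, std::string_view suffix) {
    if (s.size() >= suffix.size() &&
        s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0) {
        s.resize(s.size() - suffix.size());
    }
    return s;
}

int main() {
    // The closing brace belongs to the JSON payload, so nothing is removed.
    assert(trim_suffix_exact("{\"ok\": true}", "<|tool_call_end|>") == "{\"ok\": true}");
    // A complete end marker is removed in full.
    assert(trim_suffix_exact("{\"ok\": true}<|tool_call_end|>", "<|tool_call_end|>") == "{\"ok\": true}");
}
```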

AI was used in the following ways for this PR:

  • Locating and describing the source of the issue, after some iterative debugging sessions
  • Suggesting possible fixes, namely including the grammar creation in the Kimi-specific path and, following that, how I might check which characters to trim.
  • I did have it create the boilerplate for the regression test, but all the test really does is verify that the custom Kimi path was selected and that a grammar was created.

As requested, I ran the whole test suite, which passed. Perplexity is obviously not affected.

@pwilkin
Collaborator

pwilkin commented Jan 15, 2026

Thanks for the feedback regarding the autoparser, I'll make sure to verify the json_schema / grammar generation paths.

@akoumjian force-pushed the fix/kimi-chat-response-format-grammar branch from 53ddc91 to 3829263 on January 27, 2026
