Draft
144 commits
2777ad1
peg-parser: enum-based tags, lambda mappers, and grammar improvements
ochafik Dec 24, 2025
2e932cb
chat: migrate all parsers to modular PEG infrastructure
ochafik Dec 24, 2025
11e9a5a
test: add needle streaming tests and metatest infrastructure
ochafik Dec 24, 2025
f92a78a
server: add --experimental-new-parsers flag for PEG migration
ochafik Dec 24, 2025
c880f4f
test required tool calls w/ new parsers
ochafik Dec 24, 2025
572d572
chat-parsers: enforce no content in tool_choice=required mode
ochafik Dec 24, 2025
df463aa
fix typo in test_tool_call
ochafik Dec 24, 2025
8657d44
peg-parser: add schema_or_raw_string_until helper
ochafik Dec 24, 2025
7c68b9a
test: fix tool_required_allows_content for GLM 4.6, MiniMax M2, Apertus
ochafik Dec 24, 2025
09f460a
fix: remove unused variables and trailing whitespace
ochafik Dec 25, 2025
d72f45e
fix: update SPACE_RULE and test expectations for grammar changes
ochafik Dec 25, 2025
a708406
fix: convert Nemotron V3 template from CRLF to LF
ochafik Dec 25, 2025
deb527e
fix: reject content in tool_choice=required mode, refactor tests
ochafik Dec 26, 2025
6a78eca
test: add format detection test for templates with tools
ochafik Dec 26, 2025
7fd9246
fix: remove README.md from Python test template list
ochafik Dec 26, 2025
ff10fee
fix additionalProperties default back to false
ochafik Dec 26, 2025
20bb892
manual refactoring: more uniform triggers, grammar gen & refs resolution
ochafik Dec 26, 2025
f143f0c
test-chat: make some failures easier to debug
ochafik Dec 26, 2025
7bd8e3a
test-chat: run needle tests first
ochafik Dec 26, 2025
174b439
Update firefunction-v2.cpp
ochafik Dec 26, 2025
30e5a4f
Update apertus.cpp
ochafik Dec 26, 2025
26924ee
Update command-r7b.cpp
ochafik Dec 26, 2025
ab33482
Update deepseek-r1.cpp
ochafik Dec 26, 2025
0c371ee
Update apriel-1-5.cpp
ochafik Dec 26, 2025
ec5535f
Update generic.cpp
ochafik Dec 26, 2025
40b8cda
Update gpt-oss.cpp
ochafik Dec 26, 2025
95106fb
Update granite.cpp
ochafik Dec 26, 2025
71473a0
Update lfm2.cpp
ochafik Dec 26, 2025
8da0c53
Update mistral-nemo.cpp
ochafik Dec 26, 2025
70f5e12
Update nemotron-v2.cpp
ochafik Dec 26, 2025
98fddf3
Update apertus.cpp
ochafik Dec 26, 2025
3001ace
Update minimax-m2.cpp
ochafik Dec 26, 2025
41a578d
Update nemotron-v3.cpp
ochafik Dec 26, 2025
61727f3
Update lfm2.cpp
ochafik Dec 26, 2025
575b0e4
Update qwen3-coder-xml.cpp
ochafik Dec 26, 2025
931c29f
Update glm-4-5.cpp
ochafik Dec 26, 2025
43fd9a3
Update functionary-v3-1-llama-3-1.cpp
ochafik Dec 26, 2025
211ef2e
Update kimi-k2.cpp
ochafik Dec 26, 2025
c9689ed
Update llama-3-x.cpp
ochafik Dec 26, 2025
9df0a65
Update llama-3-x.cpp
ochafik Dec 26, 2025
5cbd855
Update magistral.cpp
ochafik Dec 26, 2025
36a14ce
Update kimi-k2.cpp
ochafik Dec 26, 2025
4191a0d
fix foreach_parameter regression
ochafik Dec 26, 2025
47fb151
fix lints
ochafik Dec 26, 2025
f4eb897
Update test-chat.cpp
ochafik Dec 26, 2025
5a23763
Update nemotron-v3.cpp
ochafik Dec 26, 2025
b935551
Update minimax-m2.cpp
ochafik Dec 26, 2025
9b3c18a
feat: add json_schema support to 4 parsers + needle test scaffolding
ochafik Dec 26, 2025
e2e4352
Fix GPT OSS template parser and test configuration
ochafik Dec 26, 2025
1b8d4e6
Fix GPT OSS tool call parser - add missing TOOL_OPEN tag
ochafik Dec 26, 2025
27d1a3c
Fix GPT OSS tool call parser and test configuration
ochafik Dec 27, 2025
cf85f4c
Fix init_delta to use params_prefix for correct parser configuration
ochafik Dec 27, 2025
88327c7
fix parser dispatch (json_schema cases)
ochafik Dec 27, 2025
4e897fd
drop ToolSupport enum (assume all yes)
ochafik Dec 27, 2025
3b9d368
Update deepseek-v3-1.cpp
ochafik Dec 27, 2025
9fd315e
Simplify GLM 4.5 parser and fix test injection for thinking_forced_open
ochafik Dec 27, 2025
93c26d2
Fix DeepSeek R1 parallel tool calls and add fixed template tests
ochafik Dec 27, 2025
fc46ac5
Update firefunction-v2.cpp
ochafik Dec 27, 2025
77d9189
build_json_args_peg_parser helper
ochafik Dec 27, 2025
634de31
json refactors
ochafik Dec 27, 2025
308660d
Update firefunction-v2.cpp
ochafik Dec 27, 2025
99f7487
Update command-r7b.cpp
ochafik Dec 27, 2025
b82e377
Update lfm2.cpp
ochafik Dec 27, 2025
1650f7d
Update chat-parsers-internal.h
ochafik Dec 27, 2025
a7a372c
switch json grammars to native peg parser (w/ temp hack)
ochafik Dec 27, 2025
4ecc791
remove cruft around / between needles
ochafik Dec 27, 2025
51eb34e
Update llama-3-x.cpp
ochafik Dec 27, 2025
481b257
tool call parser helpers refactor!
ochafik Dec 27, 2025
6c945d9
Update minimax-m2.cpp
ochafik Dec 27, 2025
f990d32
revert change in legacy common_chat_params_init_nemotron_v3
ochafik Dec 27, 2025
75b30d4
Update chat-parsers-internal.h
ochafik Dec 27, 2025
c2d933a
Update apertus.cpp
ochafik Dec 27, 2025
d18c3f0
Update qwen3-coder-xml.cpp
ochafik Dec 27, 2025
dac596f
Update deepseek-r1.cpp
ochafik Dec 27, 2025
22ecdf2
Update seed-oss.cpp
ochafik Dec 27, 2025
a060f7f
Update minimax-m2.cpp
ochafik Dec 27, 2025
3527d78
Update glm-4-5.cpp
ochafik Dec 27, 2025
1a96913
Update deepseek-r1.cpp
ochafik Dec 27, 2025
24bf597
Update chat-parser.cpp
ochafik Dec 27, 2025
475a097
Update apriel-1-5.cpp
ochafik Dec 27, 2025
38d9ac0
pass more parsers to helpers
ochafik Dec 27, 2025
eff4901
peg-constructed: create tool call only when tool name arrives
ochafik Dec 27, 2025
75f976a
space nits
ochafik Dec 27, 2025
4c8cfc9
refactor: introduce format structs for PEG parser helpers
ochafik Dec 27, 2025
cfcff17
json helper: format.tool_call factory
ochafik Dec 27, 2025
bd593cc
json helper: migrate parsers to format.tool_call lambda pattern
ochafik Dec 27, 2025
31d2f7c
fix: add space after colon in tool_call lambdas
ochafik Dec 27, 2025
76a9d61
fix: make PEG mappers lazy to avoid spurious tool calls during backtr…
ochafik Dec 27, 2025
246ce11
refactor: route Command R7B to native_mapper (uses build_json_tool_ca…
ochafik Dec 27, 2025
aae4499
fix: improve PEG mapper content handling and add stricter tag validation
ochafik Dec 27, 2025
9339378
Update kimi-k2.cpp
ochafik Dec 27, 2025
6f27f7f
test-chat: unskip mimo & apriel
ochafik Dec 27, 2025
c8b8581
fix mimo
ochafik Dec 27, 2025
35848b0
fix apriel
ochafik Dec 27, 2025
c083f7e
refactor tests a bit
ochafik Dec 27, 2025
0dd47cb
nits
ochafik Dec 27, 2025
1220f22
fix needle tests
ochafik Dec 27, 2025
787a704
nemotron v2: wire enable_thinking behaviour w/ adhoc user messages
ochafik Dec 27, 2025
3dee66e
refactor tests some more
ochafik Dec 28, 2025
9efb242
end tokens for command r7b in needle tests
ochafik Dec 28, 2025
b1c852c
fix command r7b parsing (accept id before name in common_chat_peg_nat…
ochafik Dec 28, 2025
7c8e94a
tighten template detection of xiaomi mimo
ochafik Dec 28, 2025
ed21c51
minimize diff
ochafik Dec 28, 2025
8f787f8
fix apriel typo, drop nemv3 enum, minimize diffs
ochafik Dec 28, 2025
93570cb
linter nit
ochafik Dec 28, 2025
6ae6f84
test-chat: fix inputs.enable_thinking when reasoning format is none
ochafik Dec 28, 2025
a0353a9
inline build_json_tool_calls_peg_parser
ochafik Dec 28, 2025
76a46d1
test-chat: split out parser tests to their own files
ochafik Dec 28, 2025
148a605
test-chat: green test w/ lots of skipping
ochafik Dec 28, 2025
605a688
fix functionary v3.2 trigger
ochafik Dec 28, 2025
ece088b
test-chat: aggregate failures, warn about skips, filter w/ TEST=name …
ochafik Dec 28, 2025
37e9cca
Update test-chat.cpp
ochafik Dec 28, 2025
375f225
define all template_caps.end_tokens
ochafik Dec 28, 2025
e021c88
peg mappers: ignore structural wrappers
ochafik Dec 28, 2025
b6eeb3b
peg grammar: optional of epsilon is epsilon
ochafik Dec 28, 2025
36f91f4
test-kimi-k2: switch to thinking model!
ochafik Dec 28, 2025
dcdf731
test-ministral-3: tool call do have ids
ochafik Dec 28, 2025
f9fc9aa
fix end_tokens typos, unskip / reskip (+ TEST=skipped syntax)
ochafik Dec 28, 2025
acb4212
fix TEST=llama_3_x_legacy test-chat
ochafik Dec 28, 2025
f5d3d2b
fix TEST=gpt_oss_experimental
ochafik Dec 28, 2025
50b8f69
test-chat: better logging upon failure (dump messages)
ochafik Dec 28, 2025
7c5b46d
rename: test_systematic_needle_streaming -> run_template_test_suite
ochafik Dec 28, 2025
7de1a40
have TEST match substrings
ochafik Dec 28, 2025
636fc51
fix typo in test-nemotron-v2
ochafik Dec 28, 2025
304569c
pass param_ends as array of delimiters
ochafik Dec 28, 2025
93dfce6
qwen3-coder: use additional stops instead of consume_end_block()
ochafik Dec 28, 2025
3085e34
qwen3-coder: add test case that required multiple param_ends
ochafik Dec 28, 2025
40fcc3f
make TOOL_NAME and TOOL_ARG_NAME ~atomic in mappers
ochafik Dec 28, 2025
64ee540
fix streaming diff bugs for Command R7B and Mistral Nemo
ochafik Dec 29, 2025
259a3ca
peg-parser: add unicode-aware trie for GBNF exclusion patterns
ochafik Dec 30, 2025
f9bdd02
chat-peg-parser: fix streaming regressions for tool calls
ochafik Dec 30, 2025
cd209af
chat: fix magistral template detection and parser format
ochafik Dec 30, 2025
3b731a1
chat-parsers: allow leading whitespace before tool calls
ochafik Dec 30, 2025
efc4347
chat-parsers: add content-only fallback when tools provided but not c…
ochafik Dec 30, 2025
4c0c2aa
deepseek-r1: fix tool call parsing with trailing whitespace
ochafik Dec 30, 2025
32d2958
nemotron-v3: fix parameter delimiter parsing
ochafik Dec 30, 2025
82b1f56
test-kimi-k2: adjust for template's message splitting behavior
ochafik Dec 30, 2025
e8dd3f3
kimi-k2: fix p.chars() character class syntax
ochafik Dec 30, 2025
7ed8e54
test: add server test exclusion list for experimental parsers
ochafik Dec 30, 2025
77b5cf1
peg-parser: handle zero-repetition as eps()
ochafik Dec 30, 2025
d6224b6
test-lfm2: skip needle test suite for legacy parser
ochafik Dec 31, 2025
c4ff3e4
generic: allow optional content field in tool call response
ochafik Dec 31, 2025
ee18b3b
rm dead code: common_chat_peg_mapper_func, content_until in qwen3-coder
ochafik Dec 31, 2025
ae718c2
test_peg_parser: pass impl parameter instead of forcing experimental
ochafik Dec 31, 2025
81 changes: 0 additions & 81 deletions AGENTS.md

This file was deleted.

7 changes: 6 additions & 1 deletion common/CMakeLists.txt
@@ -44,18 +44,23 @@ endif()

 set(TARGET common)

+# Glob chat parser files from the chat-parsers directory
+file(GLOB CHAT_SYNTAX_SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/chat-parsers/*.cpp")
+
 add_library(${TARGET} STATIC
     arg.cpp
     arg.h
     base64.hpp
     chat-parser.cpp
     chat-parser.h
-    chat-parser-xml-toolcall.h
     chat-parser-xml-toolcall.cpp
+    chat-parser-xml-toolcall.h
     chat-peg-parser.cpp
     chat-peg-parser.h
+    chat-parsers-internal.h
     chat.cpp
     chat.h
+    ${CHAT_SYNTAX_SOURCES}
     common.cpp
     common.h
     console.cpp
8 changes: 8 additions & 0 deletions common/arg.cpp
@@ -2880,6 +2880,14 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
params.prefill_assistant = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_PREFILL_ASSISTANT"));
add_opt(common_arg(
{"--experimental-new-parsers"},
"use experimental new PEG parsers instead of legacy parsers for chat template output parsing (default: disabled)",
[](common_params & params) {
params.experimental_new_parsers = true;
params.use_jinja = true;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_EXPERIMENTAL_NEW_PARSERS"));
add_opt(common_arg(
{"-sps", "--slot-prompt-similarity"}, "SIMILARITY",
string_format("how much the prompt of a request must match the prompt of a slot in order to use that slot (default: %.2f, 0.0 = disabled)\n", params.slot_prompt_similarity),
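The handler registered for `--experimental-new-parsers` sets two fields at once, so enabling the new parsers also enables Jinja templating. A minimal sketch of that pattern, assuming simplified stand-ins: `demo_params`, `demo_option`, and `apply_flags` are hypothetical names for illustration and are not llama.cpp's `common_arg` API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for common_params
// (only the two fields the diff touches).
struct demo_params {
    bool use_jinja                = false;
    bool experimental_new_parsers = false;
};

// Hypothetical option record: a flag name plus the callback run when it is seen.
struct demo_option {
    std::string                        flag;
    std::function<void(demo_params &)> handler;
};

// Applies every recognized flag in `args` to a fresh demo_params.
static demo_params apply_flags(const std::vector<std::string> & args) {
    const std::vector<demo_option> options = {
        {"--experimental-new-parsers", [](demo_params & params) {
            params.experimental_new_parsers = true;
            params.use_jinja                = true; // mirrors the diff: the flag implies Jinja
        }},
    };

    demo_params params;
    for (const auto & arg : args) {
        assert(!arg.empty()); // illustration only: empty arguments are not expected
        for (const auto & opt : options) {
            if (opt.flag == arg) {
                opt.handler(params);
            }
        }
    }
    return params;
}
```

Calling `apply_flags({"--experimental-new-parsers"})` yields a `demo_params` with both fields set to `true`; unrecognized flags leave the defaults untouched.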
2 changes: 2 additions & 0 deletions common/chat-parser-xml-toolcall.cpp
@@ -1,3 +1,5 @@
// TODO(ochafik): remove once --experimental-new-parsers graduates.
#include "chat-parser-xml-toolcall.h"
#include "chat.h"
#include "chat-parser.h"
#include "common.h"
1 change: 1 addition & 0 deletions common/chat-parser-xml-toolcall.h
@@ -1,3 +1,4 @@
// TODO(ochafik): remove once --experimental-new-parsers graduates.
#pragma once

#include "chat.h"
45 changes: 44 additions & 1 deletion common/chat-parser.cpp
@@ -1,5 +1,6 @@
#include "chat-parser.h"
#include "chat-peg-parser.h"
#include "chat.h"
#include "common.h"
#include "log.h"
#include "peg-parser.h"
@@ -8,6 +9,7 @@
#include <algorithm>
#include <cctype>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <string_view>
@@ -653,6 +655,7 @@ void common_chat_msg_parser::clear_tools() {
* All common_chat_parse_* moved from chat.cpp to chat-parser.cpp below
* to reduce incremental compile time for parser changes.
*/
// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_generic(common_chat_msg_parser & builder) {
if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
@@ -685,6 +688,7 @@ static void common_chat_parse_generic(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_mistral_nemo(common_chat_msg_parser & builder) {
if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
@@ -695,6 +699,7 @@ static void common_chat_parse_mistral_nemo(common_chat_msg_parser & builder) {
parse_prefixed_json_tool_call_array(builder, prefix);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_magistral(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("[THINK]", "[/THINK]");

@@ -707,6 +712,7 @@ static void common_chat_parse_magistral(common_chat_msg_parser & builder) {
parse_prefixed_json_tool_call_array(builder, prefix);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_command_r7b(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<|START_THINKING|>", "<|END_THINKING|>");

@@ -740,6 +746,7 @@ static void common_chat_parse_command_r7b(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_llama_3_1(common_chat_msg_parser & builder, bool with_builtin_tools = false) {
builder.try_parse_reasoning("<think>", "</think>");

@@ -798,6 +805,7 @@ static void common_chat_parse_llama_3_1(common_chat_msg_parser & builder, bool w

}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_deepseek_r1(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");
if (!builder.syntax().parse_tool_calls) {
@@ -819,6 +827,7 @@ static void common_chat_parse_deepseek_r1(common_chat_msg_parser & builder) {
tool_calls_end);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_deepseek_v3_1_content(common_chat_msg_parser & builder) {
static const common_regex function_regex("(?:<｜tool▁call▁begin｜>)?([^\\n<]+)(?:<｜tool▁sep｜>)");

@@ -843,6 +852,7 @@ static void common_chat_parse_deepseek_v3_1_content(common_chat_msg_parser & bui
tool_calls_end);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_deepseek_v3_1(common_chat_msg_parser & builder) {
// DeepSeek V3.1 outputs reasoning content between "<think>" and "</think>" tags, followed by regular content
// First try to parse using the standard reasoning parsing method
@@ -879,6 +889,7 @@ static void common_chat_parse_deepseek_v3_1(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_minimax_m2(common_chat_msg_parser & builder) {
static const xml_tool_call_format form {
/* form.scope_start = */ "<minimax:tool_call>",
@@ -893,6 +904,7 @@ static void common_chat_parse_minimax_m2(common_chat_msg_parser & builder) {
builder.consume_reasoning_with_xml_tool_calls(form, "<think>", "</think>");
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_qwen3_coder_xml(common_chat_msg_parser & builder) {
static const xml_tool_call_format form = ([]() {
xml_tool_call_format form {};
@@ -910,6 +922,7 @@ static void common_chat_parse_qwen3_coder_xml(common_chat_msg_parser & builder)
builder.consume_reasoning_with_xml_tool_calls(form);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_kimi_k2(common_chat_msg_parser & builder) {
static const xml_tool_call_format form = ([]() {
xml_tool_call_format form {};
@@ -929,6 +942,7 @@ static void common_chat_parse_kimi_k2(common_chat_msg_parser & builder) {
builder.consume_reasoning_with_xml_tool_calls(form, "<think>", "</think>");
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_apriel_1_5(common_chat_msg_parser & builder) {
static const xml_tool_call_format form = ([]() {
xml_tool_call_format form {};
@@ -948,6 +962,7 @@ static void common_chat_parse_apriel_1_5(common_chat_msg_parser & builder) {
builder.consume_reasoning_with_xml_tool_calls(form, "<thinking>", "</thinking>");
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_xiaomi_mimo(common_chat_msg_parser & builder) {
static const xml_tool_call_format form = ([]() {
xml_tool_call_format form {};
@@ -966,6 +981,7 @@ static void common_chat_parse_xiaomi_mimo(common_chat_msg_parser & builder) {
builder.consume_reasoning_with_xml_tool_calls(form);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
static const std::string constraint = "(?: (<\\|constrain\\|>)?([a-zA-Z0-9_-]+))";
static const std::string recipient("(?: to=functions\\.([^<\\s]+))");
@@ -1054,6 +1070,7 @@ static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
static const xml_tool_call_format form {
/* form.scope_start = */ "",
@@ -1069,6 +1086,7 @@ static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
builder.consume_reasoning_with_xml_tool_calls(form, "<think>", "</think>");
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_firefunction_v2(common_chat_msg_parser & builder) {
if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
@@ -1078,6 +1096,7 @@ static void common_chat_parse_firefunction_v2(common_chat_msg_parser & builder)
parse_prefixed_json_tool_call_array(builder, prefix, /* rstrip_prefix= */ 1);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_functionary_v3_2(common_chat_msg_parser & builder) {
static const common_regex function_regex_start_only(R"((\w+\n\{|python\n|all\n))");
static const common_regex function_regex(R"(>>>(\w+\n\{|python\n|all\n))");
@@ -1107,6 +1126,7 @@ static void common_chat_parse_functionary_v3_2(common_chat_msg_parser & builder)
});
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_functionary_v3_1_llama_3_1(common_chat_msg_parser & builder) {
if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
@@ -1133,6 +1153,7 @@ static void common_chat_parse_functionary_v3_1_llama_3_1(common_chat_msg_parser
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_hermes_2_pro(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");
if (!builder.syntax().parse_tool_calls) {
@@ -1211,6 +1232,7 @@ static void common_chat_parse_hermes_2_pro(common_chat_msg_parser & builder) {
builder.add_content(builder.consume_rest());
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_granite(common_chat_msg_parser & builder) {
// Parse thinking tags
static const common_regex start_think_regex(regex_escape("<think>"));
@@ -1258,6 +1280,7 @@ static void common_chat_parse_granite(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_nemotron_v2(common_chat_msg_parser & builder) {
// Parse thinking tags
builder.try_parse_reasoning("<think>", "</think>");
@@ -1285,6 +1308,7 @@ static void common_chat_parse_nemotron_v2(common_chat_msg_parser & builder) {
builder.add_content(builder.consume_rest());
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_apertus(common_chat_msg_parser & builder) {
// Parse thinking tags
builder.try_parse_reasoning("<|inner_prefix|>", "<|inner_suffix|>");
@@ -1317,6 +1341,7 @@ static void common_chat_parse_apertus(common_chat_msg_parser & builder) {
}


// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_lfm2(common_chat_msg_parser & builder) {
if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
@@ -1381,6 +1406,7 @@ static void common_chat_parse_lfm2(common_chat_msg_parser & builder) {
}
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
static void common_chat_parse_seed_oss(common_chat_msg_parser & builder) {
static const xml_tool_call_format form {
/* form.scope_start = */ "<seed:tool_call>",
@@ -1486,11 +1512,15 @@ static void common_chat_parse(common_chat_msg_parser & builder) {
}

common_chat_msg common_chat_parse(const std::string & input, bool is_partial, const common_chat_syntax & syntax) {
// Use PEG parser if format explicitly requires it (backward compatibility)
if (syntax.format == COMMON_CHAT_FORMAT_PEG_SIMPLE ||
syntax.format == COMMON_CHAT_FORMAT_PEG_NATIVE ||
syntax.format == COMMON_CHAT_FORMAT_PEG_CONSTRUCTED) {
return common_chat_peg_parse(syntax.parser, input, is_partial, syntax);
}

// TODO(ochafik): remove once --experimental-new-parsers graduates.
// Legacy non-PEG parsing path
common_chat_msg_parser builder(input, is_partial, syntax);
try {
common_chat_parse(builder);
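The dispatch in this hunk comes down to one check: the three PEG formats go to the PEG parser, and every other format falls through to the legacy builder. A minimal sketch of that decision, assuming a hypothetical `demo_format` enum in place of the real `common_chat_format` (only the three `PEG_*` names are taken from the diff; `CONTENT_ONLY`, `GENERIC`, and both function names are illustrative):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for common_chat_format; only the PEG_* names come from the diff.
enum class demo_format { CONTENT_ONLY, GENERIC, PEG_SIMPLE, PEG_NATIVE, PEG_CONSTRUCTED };

// True when the format explicitly requires the PEG parser.
static bool is_peg_format(demo_format format) {
    return format == demo_format::PEG_SIMPLE ||
           format == demo_format::PEG_NATIVE ||
           format == demo_format::PEG_CONSTRUCTED;
}

// Names the path that would handle a message of the given format.
static std::string choose_parse_path(demo_format format) {
    return is_peg_format(format) ? "peg" : "legacy";
}
```

Keeping the check in a single predicate means the legacy branch can later be deleted in one place, which is what the `TODO(ochafik)` comments in this file anticipate.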
@@ -1519,7 +1549,20 @@ common_chat_msg common_chat_peg_parse(const common_peg_arena & parser, const std
     common_peg_parse_context ctx(input, is_partial);
     auto result = parser.parse(ctx);
     if (result.fail()) {
-        throw std::runtime_error(std::string("Failed to parse input at pos ") + std::to_string(result.end));
+        std::ostringstream oss;
+        oss << "Failed to parse input at pos " << result.end;
+        oss << " (format: " << common_chat_format_name(syntax.format) << ")";
+        oss << "\n\nInput (" << input.size() << " chars):\n" << input;
+        if (result.end < input.size()) {
+            oss << "\n\nContext around failure (pos " << result.end << "):\n";
+            size_t start = result.end > 20 ? result.end - 20 : 0;
+            size_t end = std::min(result.end + 20, input.size());
+            if (start > 0) oss << "...";
+            oss << input.substr(start, end - start);
+            if (end < input.size()) oss << "...";
+            oss << "\n" << std::string(start > 0 ? 3 : 0, ' ') << std::string(result.end - start, ' ') << "^";
+        }
+        throw std::runtime_error(oss.str());
     }

common_chat_msg msg;
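The new error message prints a window of up to 20 characters on either side of the failure position and places a caret under the offending character. A minimal standalone sketch of that windowing logic, assuming a hypothetical helper named `failure_context` (the PR inlines this logic rather than defining such a function, and the `radius` parameter is added here for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the PR): renders up to `radius` characters
// on each side of `pos`, followed by a second line with a caret under `pos`.
static std::string failure_context(const std::string & input, size_t pos, size_t radius = 20) {
    assert(pos <= input.size());
    const size_t start = pos > radius ? pos - radius : 0;
    const size_t end   = std::min(pos + radius, input.size());

    std::ostringstream oss;
    if (start > 0) oss << "...";            // window does not begin at the input start
    oss << input.substr(start, end - start);
    if (end < input.size()) oss << "...";   // window does not reach the input end
    // Pad by the leading "..." (if any) plus the offset of pos inside the window.
    oss << "\n" << std::string((start > 0 ? 3 : 0) + (pos - start), ' ') << "^";
    return oss.str();
}
```

For example, `failure_context("abcdef", 2)` returns `"abcdef\n  ^"`, with the caret under `c`. The caret lines up only when the window contains no newlines or multi-byte characters, a limitation this sketch shares with the inlined version.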