fix: Improve model compatibility in parsing and logging #7

abinashkarki · 2026-01-04T14:31:39Z

- Handle None model name in sub-call logging (verbose.py)
- Accept both repl and python code block tags (parsing.py)
- Handle special tokens around FINAL statements (parsing.py)

These changes improve compatibility with various LLM backends that use different output formats (e.g., Gemini, some OpenRouter models).

alexzhang13 · 2026-01-04T17:01:03Z

So I don't enable python tags because 1) it affects the prompting and 2) I wanted it to be distinct from when the model wants to output python code for whatever reason, e.g. in a task that requires outputting Python code.

The None call was handled in a separate PR, thanks though!

I can merge this if you revert the above two; "Handle special tokens around FINAL statements (parsing.py)" is fine. I will think on whether it's necessary, though. Thanks!
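To illustrate the distinction drawn above between repl blocks and ordinary python blocks, here is a minimal sketch of a parser that only treats repl-tagged fences as executable. It is not the project's actual parsing.py; the names `FENCE`, `REPL_BLOCK`, and `extract_repl_blocks` are hypothetical, and the fence string is built programmatically only so the example does not break its own code fence.

```python
import re

# Hypothetical helper, not the real parsing.py implementation.
# The triple-backtick fence is assembled at runtime so this example stays well-formed.
FENCE = "`" * 3

# Match only fences tagged "repl"; a fence tagged "python" is left alone so a model
# can still emit Python source as part of its answer without it being executed.
REPL_BLOCK = re.compile(FENCE + r"repl[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_repl_blocks(text):
    """Return the bodies of all repl-tagged code blocks in text."""
    return [match.group(1) for match in REPL_BLOCK.finditer(text)]
```

Under this sketch, a response containing both a python block and a repl block yields only the repl body, which is the behavior the reply above argues for.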

abinashkarki · 2026-01-05T03:38:55Z

Thanks for the feedback! I've reverted the python code block tag support and the None model handling (didn't realize it was already fixed in #5).

The updated PR now only includes the special token handling for FINAL statements, which helps with models that wrap output in tokens like <|begin_of_box|>FINAL(...)<|end_of_box|>.

alexzhang13 · 2026-01-11T03:43:56Z

@abinashkarki Returning to this, the only problem now is if the model is thinking / doing some kind of CoT and is saying "I will then do FINAL(...)...". We don't want to accept these cases -- I think we should just be stricter about what exactly we accept, e.g. maybe just some trivial special tokens in case this is something people are observing.

- Only accept FINAL at start of line (with optional whitespace)
- Or immediately after special tokens like <|begin_of_box|>
- Rejects mid-sentence usage like 'I will then do FINAL(...)'
- Preserves GLM-4 compatibility which wraps FINAL in special tokens

abinashkarki · 2026-01-12T03:45:57Z

Thanks for the feedback, @alexzhang13! Updated the regex to be stricter.

Why this matters: zai-org/glm-4.6v-flash wraps FINAL in special tokens like <|begin_of_box|>FINAL(4)<|end_of_box|>. Without handling this, benchmarks fail.

What changed: FINAL is now only accepted at:

- Start of line (with optional whitespace)
- After special tokens <|...|>

This rejects CoT false positives like "I will do FINAL(42)" while still accepting valid cases.
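The two acceptance rules above can be sketched as a single anchored regex. This is an illustration under stated assumptions, not the regex from the PR: `FINAL_PATTERN` and `find_final` are hypothetical names, and the lazy argument match stops at the first closing parenthesis, so nested parentheses are not handled.

```python
import re

# Hypothetical pattern, not the PR's actual regex. FINAL or FINAL_VAR is accepted only
# (a) at the start of a line, after optional spaces or tabs, or
# (b) immediately after a special token of the form <|...|>.
FINAL_PATTERN = re.compile(
    r"(?:^[ \t]*|<\|[^|<>\n]+\|>[ \t]*)(FINAL(?:_VAR)?)\((.*?)\)",
    re.MULTILINE,
)


def find_final(text):
    """Return (keyword, argument) for the first accepted FINAL statement, else None."""
    match = FINAL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    print(find_final("<|begin_of_box|>FINAL(4)<|end_of_box|>"))  # accepted via rule (b)
    print(find_final("I will do FINAL(42)"))  # rejected: mid-sentence usage
```

Anchoring on line start or a special token is what separates a deliberate final answer from a model merely mentioning FINAL while reasoning.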

fix: Handle special tokens around FINAL statements in parsing

6a38945

Improves compatibility with models that wrap FINAL/FINAL_VAR in special tokens like <|begin_of_box|>FINAL(...)<|end_of_box|>.

abinashkarki force-pushed the fix/model-compatibility branch from 6b1416a to 6a38945 · January 5, 2026 03:36

Merge branch 'main' into fix/model-compatibility

dd6872f
