fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626
Conversation
…y dedup Closes 666ghj#624, 666ghj#622, 666ghj#601, 666ghj#599, 666ghj#577 ## LLM JSON parsing (666ghj#624 / 666ghj#622 / 666ghj#601) - New `_parse_llm_json()` in llm_client.py with 5-stage fallback: 1. Strip markdown fences (existing) 2. Strict json.loads (fast path) 3. json.JSONDecoder.raw_decode (handles trailing prose after JSON) 4. Balanced-brace extraction (leading prose + embedded JSON) 5. Strip control chars + retry - Replaces strict json.loads in chat_json() that was failing on any LLM appending text after the JSON (common with qwen-plus, ollama, gemma even with response_format=json_object). - Logs which fallback was used so problematic LLMs are visible. - 8 unit-test cases covering each strategy. ## Report section tool_call leak (666ghj#599) - New `_sanitize_section_content()` in report_agent.py detects when a section's "final answer" is actually an unexecuted tool_call JSON (e.g. `{"name":"quick_search","parameters":{...}}`) and replaces it with a clear fallback message instead of writing the raw artifact to the report. - Applied at all 3 places where final_answer is returned in write_section(): the Final Answer path, the no-prefix fallback, and the force-finalize path. ## Chat history duplicate user message (666ghj#577) - In report_agent.py chat(), defensively dedupe chat_history: - Only keep {role, content} from history items - Skip entries that match the current message exactly - This prevents LLM from seeing a duplicate trailing user message and echoing back the previous answer. - Added debug log of constructed messages array for diagnostics.
|
So what do I do to actually make it work? |
|
Thanks for the quick patch in #626! However, please note that the live production website is STILL throwing the exact same error (as shown in my newly uploaded screenshot). It seems the fix has been coded and passed your local unit tests, but it hasn't been merged or deployed to the live production server yet. Paid users are still completely blocked by this Could you please nudge the team to deploy PR #626 to the production server ASAP? Thank you! |
Its 01/06/2026 and the issue is still present is anyone going to do anything with the issue? Paid subscribesr are locked out of the tool. |
Fix 4 open bugs: robust LLM JSON parsing + sanitize report sections + chat history dedup
Closes #624, #622, #601, #599, #577
Summary
This PR addresses four open bugs that all stem from how the backend handles slightly-non-conforming LLM outputs (trailing text, raw
tool_callJSON, duplicate user messages). Net change: backend is more defensive, no behavioral regressions on conforming outputs.Issues fixed
#624 / #622 / #601 —
Unexpected non-whitespace character after JSON at position N+ 500 on/api/graph/ontology/generateRoot cause:
chat_json()inbackend/app/utils/llm_client.pyruns strictjson.loads()after stripping Markdown fences. Many LLMs (qwen-plus, gemma, ollama-served models) append trailing prose after the JSON block even withresponse_format=json_object— strict parsing fails with the reported error.Fix: Extracted parsing into
_parse_llm_json()with a 5-stage strategy:json.loads(fast path)json.JSONDecoder().raw_decode()— parses JSON prefix, ignores trailing text (logs a warning)Falls through to a
ValueErrorwith a clear preview if everything fails. Logs visibility into which strategy succeeded so we can monitor which LLMs misbehave.Tests: 8 unit-test cases covering clean JSON, fenced JSON, trailing text, fences+trailing, leading prose, nested-objects+trailing, control chars, and the exact failure mode at position 10243 — all pass.
#599 — Section content is raw unexecuted
tool_callJSONRoot cause: In
backend/app/services/report_agent.py::write_section(), three code paths can return afinal_answerthat consists entirely of an unexecutedtool_callJSON (e.g.{"name":"quick_search","parameters":{...}}):Final Answer:All three paths now go through new
_sanitize_section_content()which detects the leak pattern (full content parses as JSON andnamematchesVALID_TOOL_NAMES) and replaces with a clear fallback message instead of leaking the raw artifact.#577 — Report-Agent chat repeats first answer regardless of follow-up question
Likely root cause: Frontend can (under some flows) include the just-sent user message inside the
chat_historyarray. Backend then appends the same message again at the end → LLM sees duplicate-trailing user message and ignores the latest one, returning the prior answer style.Fix: In
report_agent.py::chat(), defensively:{role, content}from history items (defensive against extra fields)This is defensive — even if the frontend filters correctly, this prevents regressions from future call sites.
Files changed
Verification
_parse_llm_jsonpass (8/8 cases covering each strategy)uv run python run.py→ all routes return correct validation errors on empty bodies (400/404, no more 500s on validation failures)Notes for maintainers
loggerwhen fallback strategies are used. Monitor these — high frequency from a specific model suggests adjusting that model's system prompt._sanitize_section_content()returns a German-language fallback message ("_(Hinweis: Für diesen Abschnitt..."). If the project wants Chinese/English, localize via the i18n system inapp/utils/locale.py.Step5Interaction.vue::sendToReportAgent) — but the backend defensive fix is independent and prevents regressions.