fix(extract_json): tolerate non-strict model JSON (e.g. DeepSeek) by cnndabbler · Pull Request #325 · VectifyAI/PageIndex

cnndabbler · 2026-06-12T19:27:59Z

Problem

extract_json() assumes the entire model response is JSON and returns {} on any
JSONDecodeError. Callers then index the result directly (e.g.
toc_detector_single_page reads json_content['toc_detected']), so a single
response with stray prose or code fences around the JSON raises KeyError and
aborts the whole index build with "Processing failed".
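To make the failure mode concrete, here is a minimal sketch that assumes extract_json behaves exactly as described above (parse the whole response, return {} on JSONDecodeError). The function strict_extract_json and the sample reply are hypothetical stand-ins, not PageIndex's actual code.

```python
import json


# Hypothetical stand-in for the strict behaviour described above:
# the whole response must be valid JSON, otherwise {} is returned.
def strict_extract_json(response: str):
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        return {}


# A fenced reply of the kind some models produce (sample text is made up).
FENCE = "`" * 3
fenced = f'Sure, here is the result:\n{FENCE}json\n{{"toc_detected": "yes"}}\n{FENCE}'

json_content = strict_extract_json(fenced)
print(json_content)  # {}

# Direct key access, as in toc_detector_single_page, then fails.
try:
    json_content["toc_detected"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'toc_detected'
```

The JSON payload is perfectly valid; it is only the surrounding prose and fence that make the strict parse fail and turn a recoverable answer into an empty dict.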

This reproduces intermittently with models that don't return bare JSON, e.g.
deepseek/deepseek-v4-flash via LiteLLM. OpenAI and GLM responses happened to
parse on the strict path, so the bug wasn't caught.

Fix

extract_json: add a balanced-brace fallback (_extract_balanced_json) that pulls
the first {...} or [...] object out of the raw response when direct parsing
fails. This handles models that wrap JSON in prose or code fences.
toc_detector_single_page: use .get('toc_detected', 'no') so that one
unparseable page can't crash the run.
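The two changes can be sketched roughly as follows. This is an illustrative reimplementation under assumptions, not the code from the PR: only the names extract_json and _extract_balanced_json and the 'toc_detected'/'no' default come from the description above. The bracket-scanning logic, the hypothetical caller toc_detected_answer, its list guard, and the choice to move on to the next candidate when a balanced span doesn't parse are this sketch's own. Brackets inside JSON string literals are ignored, so a value like "a } b" does not end the object early.

```python
import json
from typing import Any, Optional

# Opening bracket -> the closing bracket it expects.
_CLOSERS = {"{": "}", "[": "]"}


def _extract_balanced_json(text: str) -> Optional[Any]:
    """Return the first balanced {...} or [...] span in text that parses as JSON."""
    for start, ch in enumerate(text):
        if ch not in _CLOSERS:
            continue
        stack = [_CLOSERS[ch]]  # closers still owed, innermost last
        in_string = False
        escaped = False
        for end in range(start + 1, len(text)):
            c = text[end]
            if in_string:
                # Inside a JSON string: only an unescaped quote ends it.
                if escaped:
                    escaped = False
                elif c == "\\":
                    escaped = True
                elif c == '"':
                    in_string = False
            elif c == '"':
                in_string = True
            elif c in _CLOSERS:
                stack.append(_CLOSERS[c])
            elif c in "}]":
                if c != stack.pop():
                    break  # mismatched bracket: abandon this candidate
                if not stack:
                    try:
                        return json.loads(text[start:end + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not JSON: try the next opener
    return None


def extract_json(response: str) -> Any:
    # Strict path first, so responses that already parse are unaffected.
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass
    found = _extract_balanced_json(response)
    return found if found is not None else {}


def toc_detected_answer(response: str) -> str:
    # Hypothetical caller mirroring toc_detector_single_page's key access.
    json_content = extract_json(response)
    if not isinstance(json_content, dict):
        return "no"  # guard added in this sketch: a list has no .get()
    return json_content.get("toc_detected", "no")


FENCE = "`" * 3
reply = f'Here you go:\n{FENCE}json\n{{"toc_detected": "yes"}}\n{FENCE}'
print(extract_json(reply))  # {'toc_detected': 'yes'}
print(toc_detected_answer("I could not find a table of contents."))  # no
```

Because the fallback only runs in the except branch, a response that already parses takes the same path as before. The scan is quadratic in the worst case (each opener may scan to the end of the text), which should be acceptable for model replies of typical length.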

No behavior change for responses that already parse. Verified end-to-end on a 39-page
PDF with deepseek/deepseek-v4-flash, which previously failed at TOC detection and now
completes at 100% accuracy.

🤖 Generated with Claude Code

extract_json() assumed the whole response is JSON and returned {} on any parse failure, which then crashed callers (toc_detector_single_page) with KeyError mid-index-build on models that wrap JSON in prose or fences.

Add a balanced-brace fallback that pulls the first {...}/[...] object out of the raw response, and default toc_detector's key access so a single bad page can't abort the run.

Repros on deepseek/deepseek-v4-flash; OpenAI/GLM happened to match the strict path. Fixes intermittent 'Processing failed' on long PDFs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cnndabbler mentioned this pull request Jun 12, 2026 in #326, "Bug/Fix: extract_json crashes index build on non-strict model JSON (e.g. DeepSeek)" (Open).

cnndabbler closed this by deleting the head repository Jun 12, 2026

cnndabbler wants to merge 1 commit into VectifyAI:main from cnndabbler:pr/extract-json-robustness
