_call_claude_p_judge: strip markdown code fences from envelope.result before json.loads

## Bug

`_call_claude_p_judge` in `scripts/extract_and_label_intent_corpus.py` calls `json.loads(envelope["result"].strip())` directly, but Claude often wraps JSON responses in markdown code fences:

\`\`\`
"result": "\`\`\`json\n{\"intent\":\"implement\",\"confidence\":0.95}\n\`\`\`"
\`\`\`

`json.loads` on the fenced string raises `JSONDecodeError`, the function returns `None`, and the prompt is silently dropped as `judge_failure`.

## Reproduction (post-#1064 fix)

\`\`\`bash
$ python3 -c "
import sys; sys.path.insert(0, 'scripts')
from extract_and_label_intent_corpus import _call_claude_p_judge
print(_call_claude_p_judge('implement pagination', model='claude-haiku-4-5-20251001'))
"
None
\`\`\`

Manual subprocess call confirms claude -p works (returncode=0, is_error=false), but the result field contains:

\`\`\`
"result":"\`\`\`json\n{\"intent\":\"implement\",\"confidence\":0.95}\n\`\`\`"
\`\`\`

## Why existing tests didn't catch it

All unit tests provide clean unfenced JSON in their mocked envelopes:
\`\`\`python
fake_envelope = {"result": '{"intent": "implement", "confidence": 0.9}', ...}
\`\`\`
So the markdown-fence behavior was never exercised at the unit-test layer.

## Fix options (1-line each)

**Option A**: Strip markdown fences defensively before parsing:

\`\`\`python
raw = envelope.get("result")
if not isinstance(raw, str):
    return None
raw = raw.strip()
# Strip markdown code fences if present
if raw.startswith("\`\`\`"):
    raw = raw.split("\n", 1)[1] if "\n" in raw else raw
    if raw.endswith("\`\`\`"):
        raw = raw.rsplit("\`\`\`", 1)[0]
    raw = raw.strip()
try:
    parsed = json.loads(raw)
except json.JSONDecodeError:
    return None
\`\`\`

**Option B**: Strengthen the system prompt + use \`--json-schema\`:

\`\`\`python
cmd = [
    "claude", "-p",
    "--output-format", "json",
    "--json-schema", '{"type":"object","properties":{"intent":{"type":"string","enum":[...13 classes]},"confidence":{"type":"number"}},"required":["intent","confidence"]}',
    "--model", model,
    "--max-turns", "1",
    "--system-prompt", _JUDGE_SYSTEM_PROMPT,
]
\`\`\`

Then read \`envelope["structured_output"]\` instead of parsing \`envelope["result"]\`. Eliminates the markdown-fence problem entirely because Anthropic enforces the schema server-side.

## Recommendation

Option B is more robust (server-side schema enforcement, no client-side regex parsing) but adds a longer command line. Option A is the minimal fix.

For now: Option A as a quick fix, possibly migrate to Option B in a follow-up.

## Regression test

Add a test where the mocked envelope has \`"result": "\`\`\`json\n{...}\n\`\`\`"\` and assert \`_call_claude_p_judge\` returns the parsed dict (not None).

## Severity

HIGH — the corpus refresh (the original use case for the refactor) cannot complete without this fix. Combined with #1064, this is the second silent failure in the refactor.

## Related

- #1064: parent fix that made claude -p work from this repo (cwd=Path.home())
- d11556f: original refactor commit
- a7875d5: #1064 fix commit

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

_call_claude_p_judge: strip markdown code fences from envelope.result before json.loads #1065

Bug

Reproduction (post-#1064 fix)

Why existing tests didn't catch it

Fix options (1-line each)

Strip markdown code fences if present

Recommendation

Regression test

Severity

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

_call_claude_p_judge: strip markdown code fences from envelope.result before json.loads #1065

Description

Bug

Reproduction (post-#1064 fix)

Why existing tests didn't catch it

Fix options (1-line each)

Strip markdown code fences if present

Recommendation

Regression test

Severity

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions