feat(eval): user-book RAG eval — grounding validation for uploads #387

Merged: mrviduus merged 1 commit into main from userbook-rag-eval on Jun 22, 2026
@mrviduus

Owner

User books are the priority; the catalog RAG eval (editionId + hardcoded DDIA golden set) didn't cover them. Adds a content-agnostic user-book eval that automates the live grounding validation.

  • UserBookRagEvalRunner: 6 generated grounding probes (synthesise a question from each retrieved chunk → real Ask path → judge citation support via the shared rubric) + a greeting probe (structural: warm, 0 citations, not insufficient, no [n]) + an off-book probe (judge: no invented facts). No recall@k (arbitrary content); no spoiler gate (user owns the doc).
  • POST /admin/rag/userbook/{id}/eval?judge=openai: admin-only; resolves the owner, logs the target userId (privacy), and returns 503 if no key is configured. Persists eval_run tags rag.userbook.{citation,behavior,retrieval}, visible on /ai-quality.
  • Empty/un-embedded book → short-circuit, no LLM call.
  • Extracted shared CitationJudge so the catalog RagEvalRunner + the user-book runner share one rubric copy.
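For reviewers who want the probe flow at a glance, here is a rough Python sketch of the shape described above. It is not the code in this PR: `Answer`, `ProbeResult`, `check_greeting`, `run_user_book_eval`, and the injected `make_question`, `ask`, and `judge_supported` callables are all hypothetical names, and the greeting text is made up. It covers only the short-circuit, the grounding probes, and the mechanical parts of the greeting check; the "warm" check and the off-book probe are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Answer:
    # Hypothetical shape of an Ask response; field names are assumptions.
    text: str
    citations: List[str] = field(default_factory=list)
    insufficient: bool = False


@dataclass
class ProbeResult:
    kind: str  # "citation" or "behavior", loosely mirroring the eval_run tags
    passed: bool
    note: str = ""


# Matches inline citation markers such as [1] or [12].
CITATION_MARKER = re.compile(r"\[\d+\]")


def check_greeting(answer: Answer) -> ProbeResult:
    """Structural greeting check: 0 citations, not insufficient, no [n]."""
    problems = []
    if answer.citations:
        problems.append("has citations")
    if answer.insufficient:
        problems.append("flagged insufficient")
    if CITATION_MARKER.search(answer.text):
        problems.append("contains [n] marker")
    return ProbeResult("behavior", not problems, "; ".join(problems))


def run_user_book_eval(
    chunks: List[str],
    make_question: Callable[[str], str],
    ask: Callable[[str], Answer],
    judge_supported: Callable[[Answer], bool],
    max_probes: int = 6,
) -> List[ProbeResult]:
    # Empty or un-embedded book: short-circuit before any model call.
    if not chunks:
        return []
    results = []
    for chunk in chunks[:max_probes]:
        # Synthesise a question from the chunk, then go through the Ask path.
        answer = ask(make_question(chunk))
        results.append(ProbeResult("citation", judge_supported(answer)))
    # Greeting probe: checked structurally, no judge involved.
    results.append(check_greeting(ask("Hello!")))
    return results
```

Passing the question generator, the Ask call, and the judge in as callables is only a convenience for the sketch: it lets the empty-book path be exercised without any model access.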

61 AiEvals (catalog runner still green after the extraction) and 886 unit tests are green; the build is clean. Follow-up (P2): admin UI trigger button. The behavior was already validated live on the owner's DDIA upload; this PR automates that check.

🤖 Generated with Claude Code

…loads

User books are the priority; the catalog-only RAG eval (editionId + hardcoded
DDIA golden set) didn't cover them. New content-agnostic user-book eval:
- UserBookRagEvalRunner: 6 generated grounding probes (question synthesised from
  each retrieved chunk -> real Ask path -> judge citation support, shared rubric)
  + greeting probe (structural: warm, 0 citations, not insufficient, no [n]) +
  off-book probe (judge: no invented facts). No expected-chapter recall (can't,
  arbitrary content); no spoiler gate (user owns the doc).
- POST /admin/rag/userbook/{id}/eval?judge=openai (admin; resolves owner, logs
  target userId for privacy; 503 no key). Persists eval_run tags
  rag.userbook.{citation,behavior,retrieval} -> visible on /ai-quality.
- Empty/un-embedded book -> short-circuit, NO generator/judge LLM call.
- Extracted shared CitationJudge (rubric + JudgeCitations + MakeRun) so the
  catalog RagEvalRunner and the user-book runner share one copy.

architect-planned. 61 AiEvals (catalog runner still green after extraction) +
886 unit tests green; build clean. P2 = admin UI trigger button. Validated the
behavior live earlier on the owner's DDIA upload (grounds+cites, warm greeting,
graceful off-book, multi-turn) — this automates it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mrviduus mrviduus merged commit 89b23be into main Jun 22, 2026
5 checks passed
@mrviduus mrviduus deleted the userbook-rag-eval branch June 22, 2026 21:26