feat(eval): user-book RAG eval — grounding validation for uploads by mrviduus · Pull Request #387 · mrviduus/textstack

mrviduus · 2026-06-22T21:16:23Z

User books are the priority; the catalog RAG eval (editionId + hardcoded DDIA golden set) didn't cover them. Adds a content-agnostic user-book eval that automates the live grounding validation.

- UserBookRagEvalRunner: 6 generated grounding probes (synthesise a question from each retrieved chunk → real Ask path → judge citation support via the shared rubric) + a greeting probe (structural: warm, 0 citations, not insufficient, no [n]) + an off-book probe (judge: no invented facts). No recall@k (arbitrary content); no spoiler gate (user owns the doc).
- POST /admin/rag/userbook/{id}/eval?judge=openai: admin; resolves owner, logs target userId (privacy), 503 if no key. Persists eval_run tags rag.userbook.{citation,behavior,retrieval} → /ai-quality.
- Empty/un-embedded book → short-circuit, no LLM call.
- Extracted shared CitationJudge so the catalog RagEvalRunner + the user-book runner share one rubric copy.
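
For readers who want a concrete picture of the structural greeting probe and the empty-book short-circuit described above, here is a minimal sketch. It is not the PR's code: `AskResponse`, `greeting_probe_passes`, `run_eval`, and every field name are hypothetical, and Python is used only for brevity. The "warm" criterion is not modeled; the sketch only checks that the reply is non-empty.

```python
import re
from dataclasses import dataclass, field


# Hypothetical response shape; field names are assumptions, not the repo's types.
@dataclass
class AskResponse:
    text: str
    citations: list[str] = field(default_factory=list)
    insufficient: bool = False


# Matches inline citation markers such as [1] or [12].
CITATION_MARKER = re.compile(r"\[\d+\]")


def greeting_probe_passes(resp: AskResponse) -> bool:
    """Structural check only (no judge call): non-empty reply,
    0 citations, not flagged insufficient, no [n] markers in the text."""
    return (
        bool(resp.text.strip())
        and not resp.citations
        and not resp.insufficient
        and CITATION_MARKER.search(resp.text) is None
    )


def run_eval(chunk_count: int, ask) -> dict:
    """Short-circuit before any LLM call when the book has no embedded chunks.
    `ask` is a stand-in callable for the real Ask path."""
    if chunk_count == 0:
        return {"status": "skipped", "reason": "no embedded chunks", "llm_calls": 0}
    resp = ask("Hello!")
    return {"status": "ran", "greeting_ok": greeting_probe_passes(resp), "llm_calls": 1}
```

Keeping the greeting check purely structural means it needs no judge model, and checking `chunk_count` first ensures an empty or un-embedded book never reaches the generator or the judge.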

61 AiEvals (catalog runner still green after the extraction) + 886 unit tests green; build clean. P2 = admin UI trigger. Already validated the behavior live on the owner's DDIA upload — this automates it.

🤖 Generated with Claude Code

…loads

User books are the priority; the catalog-only RAG eval (editionId + hardcoded DDIA golden set) didn't cover them. New content-agnostic user-book eval:

- UserBookRagEvalRunner: 6 generated grounding probes (question synthesised from each retrieved chunk -> real Ask path -> judge citation support, shared rubric) + greeting probe (structural: warm, 0 citations, not insufficient, no [n]) + off-book probe (judge: no invented facts). No expected-chapter recall (can't, arbitrary content); no spoiler gate (user owns the doc).
- POST /admin/rag/userbook/{id}/eval?judge=openai (admin; resolves owner, logs target userId for privacy; 503 no key). Persists eval_run tags rag.userbook.{citation,behavior,retrieval} -> visible on /ai-quality.
- Empty/un-embedded book -> short-circuit, NO generator/judge LLM call.
- Extracted shared CitationJudge (rubric + JudgeCitations + MakeRun) so the catalog RagEvalRunner and the user-book runner share one copy.

architect-planned. 61 AiEvals (catalog runner still green after extraction) + 886 unit tests green; build clean. P2 = admin UI trigger button. Validated the behavior live earlier on the owner's DDIA upload (grounds+cites, warm greeting, graceful off-book, multi-turn); this automates it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mrviduus merged commit 89b23be into main Jun 22, 2026
5 checks passed

mrviduus deleted the userbook-rag-eval branch June 22, 2026 21:26

mrviduus merged 1 commit into main from userbook-rag-eval
