feat(ai): Librarian Agent - tool-using catalog discovery (AI-Agent-3) #394
Merged
Second agent (roadmap Agent 3). Turns an NL catalog request ("books like X about Y
in English under 300p") into ranked, reasoned recommendations via a ReAct loop on
AgentLoop. Tools wrap existing search: search_library (FTS) + search_library_semantic
(AI-057 hybrid = "books like X"); the Open Library tools are reused for external
discovery when the library is thin. Constraints (language/length) are post-filtered
over tool metadata.
Anti-hallucination: every rec must come from a tool_result; library recs fully
re-projected from the retrieved row (model can't rename/fabricate), external recs
re-projected from the harvested OL row (authors/year/pages, not model echo);
symmetric normalized title matching; empty transcript -> zero recs.
Recommend-only (no ingest). Endpoint POST /me/librarian (auth, rate-limited 8/min,
≥2-char query guard). Eval LibrarianEvalRunner: recall@k + precision@k + F1 (anti-flood) +
hallucination-free rate; admin POST /admin/ai-quality/librarian/eval. Route
librarian.agent -> gpt-4.1-mini. No migration (reuses agent_run).
947 unit + 65 AiEvals green; build + format clean. Design: docs/04-dev/agents-roadmap.md.
Deferred: ingest/HITL, SSE, dedicated similarity index, user-library personalization.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
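The grounding rule in the commit message above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the PR's .NET code: the `Row`/`ModelRec` shapes, the helpers `title_key`, `approx_pages` and `ground`, and the exact normalization steps (NFKD, casefold, accent and punctuation stripping) are all hypothetical. Only the 275-words-per-page estimate and the 8-recommendation cap come from the PR. It shows the described properties: a recommendation survives only if its title matches a row from some tool_result, every returned field except `why` is copied from that row, the same normalization runs on both sides of the comparison, and language/length constraints are post-filtered over row metadata (rows with missing metadata pass, which is an assumption).

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass

WORDS_PER_PAGE = 275  # approxPages ≈ wordCount / 275, per the PR
MAX_RECS = 8          # recommendation cap mentioned in the eval notes


@dataclass(frozen=True)
class Row:
    """A catalog row harvested from a tool_result (hypothetical field names)."""
    source: str                      # "library" or "external"
    title: str
    authors: tuple[str, ...]
    language: str | None = None
    word_count: int | None = None    # library rows
    pages: int | None = None         # external (Open Library) rows


@dataclass(frozen=True)
class ModelRec:
    """What the model proposes; only `title` (for lookup) and `why` are used."""
    title: str
    why: str
    authors: tuple[str, ...] = ()    # deliberately ignored: never trusted


def title_key(title: str) -> str:
    """Normalize a title; applied identically to both sides, so matching is symmetric."""
    t = unicodedata.normalize("NFKD", title).casefold()
    t = "".join(c for c in t if not unicodedata.combining(c))  # drop accents
    t = re.sub(r"[^\w\s]", " ", t)                               # punctuation -> space
    return " ".join(t.split())                                   # collapse whitespace


def approx_pages(row: Row) -> int | None:
    if row.pages is not None:
        return row.pages
    if row.word_count is not None:
        return round(row.word_count / WORDS_PER_PAGE)
    return None


def ground(recs: list[ModelRec], tool_rows: list[Row],
           language: str | None = None, max_pages: int | None = None) -> list[dict]:
    """Keep only recs backed by a tool_result row and rebuild them from that row."""
    index: dict[str, Row] = {}
    for row in tool_rows:
        index.setdefault(title_key(row.title), row)  # first retrieval wins

    out: list[dict] = []
    seen: set[str] = set()
    for rec in recs:
        key = title_key(rec.title)
        row = index.get(key)
        if row is None or key in seen:
            continue  # never retrieved by a tool (or a duplicate): drop it
        # Post-filter constraints over row metadata; unknown metadata passes (assumption).
        if language and row.language and row.language != language:
            continue
        pages = approx_pages(row)
        if max_pages is not None and pages is not None and pages > max_pages:
            continue
        seen.add(key)
        out.append({
            "source": row.source,
            "title": row.title,            # from the row, not the model
            "authors": list(row.authors),  # from the row, not the model
            "pages": pages,
            "why": rec.why,                # the only model-authored field
        })
        if len(out) == MAX_RECS:
            break
    return out


rows = [
    Row("library", "Nineteen Eighty-Four", ("George Orwell",), "en", word_count=66_000),
    Row("library", "Long Novel", ("Someone",), "en", word_count=110_000),
    Row("external", "We", ("Yevgeny Zamyatin",), "en", pages=226),
]
recs = [
    ModelRec("nineteen eighty four", "surveillance classic", authors=("Wrong Author",)),
    ModelRec("Long Novel", "too long for the constraint"),
    ModelRec("We", "early dystopia"),
    ModelRec("A Book Nobody Retrieved", "invented by the model"),
]
# prints: Nineteen Eighty-Four ['George Orwell'] 240, then We ['Yevgeny Zamyatin'] 226
for rec in ground(recs, rows, language="en", max_pages=300):
    print(rec["title"], rec["authors"], rec["pages"])
```

Because the lookup index is built only from tool results, an empty transcript yields zero recommendations without any special case.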
Agent 3 of 3 (architect → backend → adversarial QA → fix cycle). Design: docs/04-dev/agents-roadmap.md §4.
Why
Catalog discovery today is keyword search. The Librarian turns a natural-language request — "books like 1984 about surveillance, in English, under 300 pages" — into ranked, reasoned recommendations, reaching outside the library only when it's thin.
What
- ReAct loop on the existing AgentLoop (caps: MaxSteps=6, CostCapUsd=0.04; transcript persisted). No new framework.
- Tools: search_library (FTS provider) + search_library_semantic (AI-057 hybrid, which serves as the "books like X" mechanism; no new index) + the reused Open Library tools for external discovery.
- Constraints (language/length) are post-filtered over tool metadata (approxPages ≈ wordCount/275).
- Anti-hallucination: every rec must come from a tool_result; library recs are fully re-projected from the retrieved row (the model can't rename or fabricate a slug/title/author); external recs are re-projected from the harvested Open Library row (authors/year/pages, not the model's echo); titles are matched on a symmetric normalized key; an empty transcript yields zero recs.
- Endpoint: POST /me/librarian (auth, rate-limited 8/min, ≥2-char query guard before any model call) → { recommendations[{source, editionId?, title, authors, why, …}], reasoning, usedExternal, runId }.
- Runs are recorded in agent_run (agent=librarian, tool_calls_count). Route librarian.agent → gpt-4.1-mini. No migration.
Eval
LibrarianEvalRunner (10 goldens): recall@k + precision@k + F1 (precision/F1 were added in the fix round so that an agent which floods the 8-recommendation cap with irrelevant-but-real books no longer scores green) + hallucination-free rate (DB slug-existence probe). Admin-runnable: POST /admin/ai-quality/librarian/eval.
QA (adversarial): findings applied
0 blockers; the runtime anti-hallucination invariant held under attack. Fixed:
- eval was gameable by flooding (added precision/F1);
- external authors/year/pages were trusted from the model (now re-projected from the OL row);
- endpoint lacked a minimum-query-length guard (now ≥2 chars, checked before any model call);
- external-title matching was brittle and asymmetric (now a symmetric normalized key).
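The flooding problem is easy to see with numbers. The sketch below uses hypothetical names (`prf_at_k`, `hallucination_free_rate`) and made-up slugs, and it assumes precision is computed over the recommendations actually returned, cut to k = 8; the PR does not say which denominator LibrarianEvalRunner uses. Padding two correct picks with six unrelated real books leaves recall@8 at 1.0 but drops precision to 0.25 and F1 to 0.4, which is why recall alone could be gamed.

```python
from __future__ import annotations


def prf_at_k(recommended: list[str], relevant: set[str], k: int = 8) -> tuple[float, float, float]:
    """Precision@k, recall@k and F1 for one golden query.

    Assumption: precision divides by the number of recs actually returned (at most k),
    so a short, accurate list is not penalized for leaving slots empty.
    """
    top = list(dict.fromkeys(recommended))[:k]  # dedupe, keep order, cut to k
    hits = sum(1 for slug in top if slug in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def hallucination_free_rate(runs: list[list[str]], existing_slugs: set[str]) -> float:
    """Share of runs in which every recommended library slug exists in the catalog."""
    if not runs:
        raise ValueError("no eval runs")
    clean = sum(1 for slugs in runs if all(s in existing_slugs for s in slugs))
    return clean / len(runs)


relevant = {"we", "brave-new-world"}                        # hypothetical golden
focused = ["we", "brave-new-world"]
flooded = focused + [f"unrelated-{i}" for i in range(6)]    # fills all 8 slots

print(prf_at_k(focused, relevant))  # (1.0, 1.0, 1.0)
print(prf_at_k(flooded, relevant))  # (0.25, 1.0, 0.4)
```

Here the hallucination-free rate approximates the DB slug-existence probe with an in-memory set of known slugs; a real runner would query the catalog instead.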
Verify
dotnet build green · dotnet format clean · 947 unit + 65 AiEvals tests green (incl. flood-resistance, external re-projection, no-search→empty, keyless-semantic→FTS fallback). Still needed: a real-model run to measure live recall/precision on gpt-4.1-mini against a seeded catalog (via the admin endpoint, with an API key).
Deferred
Ingest/HITL confirmation, SSE streaming, a dedicated book-similarity index, user-library personalization. Agent 2 (Tutor) is the remaining roadmap item.
🤖 Generated with Claude Code