feat(ai): Librarian Agent — tool-using catalog discovery (AI-Agent-3)#394

Merged: mrviduus merged 1 commit into main from feat/librarian-agent on Jun 24, 2026

@mrviduus

Owner

Agent 3 of 3 on the roadmap, built via architect → backend → adversarial QA → fix cycle. Design: docs/04-dev/agents-roadmap.md §4.

Why

Catalog discovery today is keyword search only. The Librarian turns a natural-language request, such as "books like 1984 about surveillance, in English, under 300 pages", into ranked, reasoned recommendations, and reaches outside the library only when the library's own results are thin.

What

  • ReAct loop on the existing AgentLoop (caps MaxSteps=6/CostCapUsd=0.04, persisted transcript). No new framework.
  • Tools wrap existing search: search_library (FTS provider) + search_library_semantic (AI-057 hybrid = the "books like X" mechanism, no new index) + reused Open Library tools for external discovery.
  • Constraints (language, "under N pages") are deterministic post-filters over tool metadata. The catalog has no page column, so page count is approximated as approxPages ≈ wordCount/275.
  • Anti-hallucination (the headline, QA-verified solid): every recommendation must come from a tool_result; library recs are fully re-projected from the retrieved row (model can't rename/fabricate a slug/title/author); external recs are re-projected from the harvested Open Library row (authors/year/pages — not the model's echo); symmetric normalized title matching; empty transcript → zero recs.
  • Recommend-only — external hits are marked suggestions, no ingest (copyright/scope; HITL ingest deferred).
  • Endpoint POST /me/librarian (auth, rate-limited 8/min, ≥2-char guard before any model call) → { recommendations: [{source, editionId?, title, authors, why, …}], reasoning, usedExternal, runId }.
  • Telemetry: agent_run (agent=librarian, tool_calls_count). Route librarian.agent → gpt-4.1-mini. No migration.
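
The constraint post-filter described above can be sketched as follows. This is a minimal Python illustration, not the project's .NET code: the `Candidate` shape, the function names, the rounding rule, and the reading of "under N pages" as strictly fewer than N are all assumptions. Only the 275 words-per-page divisor comes from the description.

```python
from dataclasses import dataclass
from typing import List, Optional

WORDS_PER_PAGE = 275  # from the PR: approxPages ≈ wordCount/275


@dataclass(frozen=True)
class Candidate:
    # Hypothetical shape of a row returned by a search tool.
    slug: str
    title: str
    language: str
    word_count: int


def approx_pages(word_count: int) -> int:
    # Rounding to the nearest page is an assumption; the PR only gives the ratio.
    return round(word_count / WORDS_PER_PAGE)


def post_filter(
    candidates: List[Candidate],
    language: Optional[str] = None,
    max_pages: Optional[int] = None,
) -> List[Candidate]:
    """Deterministically drop rows that violate the requested constraints."""
    kept = []
    for c in candidates:
        if language is not None and c.language.lower() != language.lower():
            continue  # wrong language
        if max_pages is not None and approx_pages(c.word_count) >= max_pages:
            continue  # "under N pages" read as strictly fewer than N
        kept.append(c)
    return kept
```

Because the filter runs over tool metadata rather than model output, the same request always yields the same filtered set, whatever the model says about length or language.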

Eval

LibrarianEvalRunner (10 goldens) reports recall@k, precision@k, and F1, plus a hallucination-free rate (DB slug-existence probe). Precision and F1 were added in the fix round so that an agent that floods the 8-recommendation cap with irrelevant-but-real books no longer scores green. Admin-runnable: POST /admin/ai-quality/librarian/eval.
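
To show why recall alone is gameable by flooding, here is a minimal Python sketch of the three metrics using their standard definitions. The function name and the choice to divide precision by the number of recommendations actually returned (rather than by k) are assumptions, not taken from LibrarianEvalRunner.

```python
from typing import Iterable, List, Tuple


def precision_recall_f1(
    recommended: List[str], relevant: Iterable[str], k: int
) -> Tuple[float, float, float]:
    """Return (precision@k, recall@k, F1) for one golden case."""
    # De-duplicate while preserving rank order, then cut at k.
    top_k = list(dict.fromkeys(recommended))[:k]
    relevant_set = set(relevant)
    hits = sum(1 for slug in top_k if slug in relevant_set)

    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return precision, recall, f1
```

With two relevant books, a focused answer of exactly those two scores 1.0 on all three metrics. An answer that pads the list to eight real but irrelevant titles keeps recall at 1.0 but drops precision to 0.25 and F1 to 0.4, so the flood shows up as a failure.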

QA (adversarial) — applied

0 blockers; the runtime anti-hallucination invariant held under attack. Fixed: eval gameable by flooding (added precision/F1); external authors/year/pages were model-trusted (now re-projected from the OL row); endpoint missing min-query guard; brittle/asymmetric external-title matching (now symmetric normalized key).
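
The re-projection and symmetric title matching mentioned above can be sketched like this. It is a minimal Python illustration under stated assumptions: the dict shapes, `title_key`, and `reproject` are hypothetical names, and the exact normalization steps (accent stripping, case folding, punctuation removal) are a guess at what a "symmetric normalized key" might involve, not the project's implementation.

```python
import re
import unicodedata
from typing import Dict, List


def title_key(title: str) -> str:
    """Normalize a title; the SAME function is applied to both sides of the match."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, drop punctuation, collapse whitespace.
    return " ".join(re.findall(r"\w+", no_marks.casefold()))


def reproject(model_recs: List[Dict], tool_rows: List[Dict]) -> List[Dict]:
    """Keep only recs backed by a tool row, and copy facts from that row."""
    by_key = {title_key(row["title"]): row for row in tool_rows}
    out = []
    for rec in model_recs:
        row = by_key.get(title_key(rec.get("title", "")))
        if row is None:
            continue  # no tool_result backs this rec, so it is dropped
        out.append({
            "title": row["title"],      # from the retrieved row, not the model
            "authors": row["authors"],  # from the retrieved row, not the model
            "year": row.get("year"),
            "why": rec.get("why", ""),  # only the rationale is model-authored
        })
    return out
```

In this sketch an empty `tool_rows` list always yields an empty result, mirroring the "empty transcript → zero recs" rule, and a model that echoes the right title with a wrong author still gets the author from the row.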

Verify

dotnet build green · dotnet format clean · 947 unit + 65 AiEvals tests green (incl. flood-resistance, external re-projection, no-search→empty, keyless-semantic→FTS fallback). Needs a real-model run for live recall/precision on gpt-4.1-mini against a seeded catalog (admin endpoint + key).

Deferred

Ingest/HITL confirmation, SSE streaming, a dedicated book-similarity index, user-library personalization. Agent 2 (Tutor) is the remaining roadmap item.

🤖 Generated with Claude Code

Second agent. NL catalog request ("books like X about Y in English under 300p")
-> ranked, reasoned recommendations via a ReAct loop on AgentLoop. Tools wrap
existing search: search_library (FTS) + search_library_semantic (AI-057 hybrid =
"books like X"), and reuse the Open Library tools for external discovery when the
library is thin. Constraints (language/length) post-filtered over tool metadata.

Anti-hallucination: every rec must come from a tool_result; library recs fully
re-projected from the retrieved row (model can't rename/fabricate), external recs
re-projected from the harvested OL row (authors/year/pages, not model echo);
symmetric normalized title matching; empty transcript -> zero recs.

Recommend-only (no ingest). Endpoint POST /me/librarian (auth, 8/min, >=2-char
guard). Eval LibrarianEvalRunner: recall@k + precision@k + F1 (anti-flood) +
hallucination-free rate; admin POST /admin/ai-quality/librarian/eval. Route
librarian.agent -> gpt-4.1-mini. No migration (reuses agent_run).

947 unit + 65 AiEvals green; build + format clean. Design: docs/04-dev/agents-roadmap.md.
Deferred: ingest/HITL, SSE, dedicated similarity index, user-library personalization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mrviduus mrviduus merged commit d9a93c8 into main Jun 24, 2026
5 checks passed
@mrviduus mrviduus deleted the feat/librarian-agent branch June 24, 2026 04:02