Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## [Unreleased]

### Learning Tutor Agent — plans what to study next over real SRS state (AI-Agent-2) — backend (2026-06-24)

The third and largest agent: a **Tutor** that reasons over the learner's actual vocabulary state and **plans what to study next**, rather than running a fixed review queue. `TutorAgent` runs on the existing `AgentLoop` runtime and calls four thin `ITool`s — `get_due_vocabulary` (due/near-due SRS cards), `get_weak_vocabulary` (lowest-accuracy / earliest-stage words), `get_reading_context` (what they're actually reading — keeps practice tied to reading, the product thesis), and `get_example_sentence` (a real in-context sentence: the learner's saved sentence, else a **spoiler-gated, owner-isolated RAG** pull from their own book) — then emits an **ordered study plan** (`{wordId, word, stage, exerciseType, difficulty, why}` + an overall `rationale` + a `readingNudge`), exercise type/difficulty **recalibrated from the real SRS stage** (recognition→recall→context-cloze). **Server-held `tutor_session`** (new entity/table, jsonb `PlanJson`, status, turn count) persists the plan between turns; **HITL**: `POST /me/tutor/session` starts/resumes and `POST /me/tutor/session/{id}/feedback` re-plans on the learner's results — re-fetching state (so SRS updates are seen), deterministically **dropping cards just answered correctly**, ignoring feedback for ids not in the prior plan, and preserving the session length. **Two hard guarantees, QA-verified**: (1) **anti-hallucination** — every scheduled `wordId` must come from a `get_due`/`get_weak` tool result (harvested ok-only from the transcript), word+stage **re-projected** from the real row, invented ids dropped, empty transcript → empty plan (the model can't fabricate or rename a card); (2) **cross-user isolation** — the example-sentence tool resolves the card with `Id == wordId && UserId == userId` and the RAG path filters on `user_id AND user_book_id`, so no other user's `user_chapter_chunk` content is reachable. 
All inbound book text (example sentences from user uploads, reading titles) is run through `ExternalTextSanitizer` + length-capped before entering the prompt (prompt-injection boundary). Telemetry: each turn persists an `agent_run` (agent=`tutor`, `tool_calls_count`); route `tutor.agent → gpt-4.1-mini`. **Eval**: `TutorEvalRunner` (deterministic structural rubric over synthetic learner states — due-coverage, weak-targeting, difficulty-appropriateness, no-hallucination, thesis-alignment; a golden where weak ∉ due makes weak-targeting discriminating), admin-runnable `POST /admin/ai-quality/tutor/eval`. EF migration `AddTutorSession` (reversible). `dotnet build` green, `dotnet format` clean; 968 unit + 72 AiEvals tests green. **Deferred**: SSE streaming, the tutor UI surface (frontend/mobile slice), generated free-text exercises beyond MC reuse, longitudinal pedagogical-efficacy A/B (offline evals validate planner mechanics, not learning outcomes). Completes the 3-agent roadmap (`docs/04-dev/agents-roadmap.md`); Agent 1 (Enrichment) + Agent 3 (Librarian) already shipped.
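
The anti-hallucination guarantee described in this entry can be pictured with a small sketch. It is written in Python purely for illustration (the backend is .NET), and every name in it, including `harvest_allowed_cards`, `ground_plan`, and the dictionary shapes used for the transcript and plan, is an assumption rather than the actual `TutorAgent` code. It shows the three behaviors the entry names: ids are harvested only from successful due/weak tool results, word and stage are re-projected from the real row, and an empty transcript yields an empty plan.

```python
from typing import Any

# Tool names come from the changelog entry; the transcript shape
# (dicts with "tool", "ok", "cards") is a hypothetical stand-in.
GROUNDING_TOOLS = {"get_due_vocabulary", "get_weak_vocabulary"}


def harvest_allowed_cards(transcript: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Collect real card rows from successful due/weak tool results only."""
    allowed: dict[str, dict[str, Any]] = {}
    for entry in transcript:
        if entry.get("tool") not in GROUNDING_TOOLS or not entry.get("ok"):
            continue  # failed or unrelated tool results contribute nothing
        for card in entry.get("cards", []):
            allowed[card["wordId"]] = card
    return allowed


def ground_plan(model_plan: list[dict[str, Any]],
                transcript: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop invented ids and overwrite word/stage with the retrieved row."""
    allowed = harvest_allowed_cards(transcript)
    grounded = []
    for item in model_plan:
        row = allowed.get(item.get("wordId"))
        if row is None:
            continue  # id never came back from a tool: treat as invented
        # Re-project from the real row so the model cannot rename a card.
        grounded.append({**item, "word": row["word"], "stage": row["stage"]})
    return grounded


transcript = [
    {"tool": "get_due_vocabulary", "ok": True,
     "cards": [{"wordId": "w1", "word": "ostensibly", "stage": 1}]},
    {"tool": "get_weak_vocabulary", "ok": False,  # failed call, ignored
     "cards": [{"wordId": "w2", "word": "ephemeral", "stage": 2}]},
]
model_plan = [
    {"wordId": "w1", "word": "ostensible", "stage": 4, "exerciseType": "recall"},
    {"wordId": "w2", "word": "ephemeral", "stage": 2, "exerciseType": "recall"},
    {"wordId": "w9", "word": "made-up", "stage": 0, "exerciseType": "recognition"},
]
grounded = ground_plan(model_plan, transcript)
print(grounded)
```

In this sketch only `w1` survives, and its misspelled word and wrong stage are replaced by the values from the tool result. The design point is that grounding lives in code that runs after the model, so it does not depend on the prompt being obeyed.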

### Librarian Agent — natural-language catalog discovery via a ReAct tool-use loop (AI-Agent-3) — backend (2026-06-23)

A second true **agent**: turn a natural-language request — *"find books like 1984 about surveillance, in English, under 300 pages"* — into a ranked, **reasoned** list of recommendations. The `LibrarianAgent` runs on the existing `AgentLoop` runtime (plan→act→observe, hard `MaxSteps:6`/`CostCapUsd:0.04` caps, persisted transcript) and **decides** how to search: two new `ITool`s wrap the existing catalog search — `search_library` (keyword, wraps the Postgres FTS provider) and `search_library_semantic` ("books like X"/conceptual, wraps the AI-057 hybrid FTS+embedding RRF) — plus it **reuses Agent 1's** `search_open_library`/`get_open_library_work` to expand externally when the library is thin. Both library tools share one `LibrarySearchService` seam that runs the real search, collapses chapter hits to distinct editions, and **enriches** each with the metadata the agent post-filters on (authors, genres, language, aggregate word count → `approxPages` at ~275 w/page, since catalog editions carry no year/page column). **Constraints (language, length) are deterministic post-filters** over the returned metadata — the agent reasons over real rows, FTS isn't trusted to enforce them. **Anti-hallucination is enforced in code, not the prompt**: a `RetrievedCatalog` is rebuilt from the run's `tool_result` transcript (only `ok:true` results), and `Parse` drops any `library` recommendation whose `editionId` wasn't actually retrieved and any `open_library` suggestion whose title wasn't seen — surviving `library` recs **re-project** their title/slug/authors from the retrieved row, so the model can't even rename a real book. Each result carries **provenance** (`library` vs `open_library`) + a one-line `why`; `usedExternal` is derived from what survived grounding, not the model's flag. **Recommend-only this slice** — external hits are clearly-marked suggestions ("not in your library yet"); **no ingest** (copyright + scope; ingest/HITL deferred). 
All external + user free-text runs through `ExternalTextSanitizer` (untrusted DATA, never instructions). Endpoint `POST /me/librarian` (authenticated, rate-limited `librarian` 8/min, **JSON** — SSE deferred) → `{ recommendations[], reasoning, usedExternal, runId }`; persists an `agent_run` (agent=`librarian`) with `tool_calls_count`. Model route `librarian.agent → openai-explain` (gpt-4.1-mini). **Eval**: `LibrarianEvalRunner` (10 goldens: in-library / constrained / needs-external) scores **recall@k**, **constraint-satisfaction**, **coverage-decision accuracy** (expand externally exactly when thin), and the **hallucination-free rate** (every returned library slug genuinely exists, via a DB probe); admin-runnable `POST /admin/ai-quality/librarian/eval` (503 keyless). `dotnet build textstack.sln` green, `dotnet format` clean; 934 unit tests green (tool schema + page-estimate + shaping/provenance, `RetrievedCatalog` grounding incl. failed-result-contributes-nothing, Parse anti-hallucination/re-projection/de-dup/external-allowlist, loop one-shot/library-then-summarize/external-expansion/invented-book-dropped/injection-sanitized/budget-exhausted, eval recall+coverage+hallucination-probe). **No migration** (reuses `agent_run` + existing `tool_calls_count`). **Deferred**: ingest/HITL confirmation, SSE streaming, dedicated book-similarity index, user-library personalization, catalog year/page-count coverage. Design: `docs/04-dev/agents-roadmap.md` §4. **Needs a real-model run** to read live recall/constraint numbers on gpt-4.1-mini against a seeded catalog.
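
The deterministic post-filter and the `approxPages` estimate described in this entry can be sketched as follows. This is an illustrative Python sketch, not the `LibrarySearchService` code: the `Edition` shape, the `approx_pages` and `post_filter` names, and the choice to round the page estimate up are all assumptions. Only the ~275 words-per-page figure and the idea of filtering language and length over returned metadata come from the entry.

```python
from dataclasses import dataclass
from typing import Optional

WORDS_PER_PAGE = 275  # the estimate stated in the changelog entry


@dataclass(frozen=True)
class Edition:
    # Hypothetical metadata shape for one search hit.
    edition_id: str
    title: str
    language: str
    word_count: int


def approx_pages(word_count: int) -> int:
    """Estimate pages from the aggregate word count (rounding up is assumed)."""
    return -(-word_count // WORDS_PER_PAGE)  # ceiling division


def post_filter(editions: list[Edition],
                language: Optional[str] = None,
                under_pages: Optional[int] = None) -> list[Edition]:
    """Apply constraints over retrieved rows instead of trusting the search."""
    kept = []
    for edition in editions:
        if language is not None and edition.language != language:
            continue
        if under_pages is not None and approx_pages(edition.word_count) >= under_pages:
            continue  # "under N pages" is read as strictly fewer than N
        kept.append(edition)
    return kept


editions = [
    Edition("e1", "Short English Novel", "en", 60_000),
    Edition("e2", "Long English Novel", "en", 120_000),
    Edition("e3", "Short German Novel", "de", 50_000),
]
matches = post_filter(editions, language="en", under_pages=300)
print([e.edition_id for e in matches])
```

With these sample rows, a request for English books under 300 pages keeps only `e1`: `e2` is estimated at well over 300 pages and `e3` fails the language check.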
73 changes: 73 additions & 0 deletions backend/src/Ai/TextStack.Ai.EvalSuite/Datasets/tutor.json
@@ -0,0 +1,73 @@
[
  {
    "Name": "due-and-weak-mix",
    "Cards": [
      { "WordId": "11111111-0000-0000-0000-000000000001", "Word": "ostensibly", "Stage": 1, "ConsecutiveCorrect": 0, "Accuracy": 0.30, "Due": true, "HasSentence": true },
      { "WordId": "11111111-0000-0000-0000-000000000002", "Word": "ephemeral", "Stage": 2, "ConsecutiveCorrect": 1, "Accuracy": 0.45, "Due": true, "HasSentence": true },
      { "WordId": "11111111-0000-0000-0000-000000000003", "Word": "sanguine", "Stage": 0, "ConsecutiveCorrect": 0, "Accuracy": 0.20, "Due": true, "HasSentence": true },
      { "WordId": "11111111-0000-0000-0000-000000000004", "Word": "lucid", "Stage": 4, "ConsecutiveCorrect": 5, "Accuracy": 0.95, "Due": false, "HasSentence": true },
      { "WordId": "11111111-0000-0000-0000-000000000005", "Word": "candid", "Stage": 3, "ConsecutiveCorrect": 2, "Accuracy": 0.80, "Due": false, "HasSentence": true }
    ],
    "ReadingBook": "Nineteen Eighty-Four",
    "ReadingLanguage": "en",
    "ExpectedDueWordIds": [
      "11111111-0000-0000-0000-000000000001",
      "11111111-0000-0000-0000-000000000002",
      "11111111-0000-0000-0000-000000000003"
    ],
    "ExpectedWeakWordIds": [
      "11111111-0000-0000-0000-000000000003",
      "11111111-0000-0000-0000-000000000001"
    ]
  },
  {
    "Name": "all-early-stage",
    "Cards": [
      { "WordId": "22222222-0000-0000-0000-000000000001", "Word": "obfuscate", "Stage": 0, "ConsecutiveCorrect": 0, "Accuracy": 0.10, "Due": true, "HasSentence": false },
      { "WordId": "22222222-0000-0000-0000-000000000002", "Word": "ponderous", "Stage": 1, "ConsecutiveCorrect": 0, "Accuracy": 0.25, "Due": true, "HasSentence": true }
    ],
    "ReadingBook": "Dracula",
    "ReadingLanguage": "en",
    "ExpectedDueWordIds": [
      "22222222-0000-0000-0000-000000000001",
      "22222222-0000-0000-0000-000000000002"
    ],
    "ExpectedWeakWordIds": [
      "22222222-0000-0000-0000-000000000001",
      "22222222-0000-0000-0000-000000000002"
    ]
  },
  {
    "Name": "context-stage-no-sentence-downgrades",
    "Cards": [
      { "WordId": "33333333-0000-0000-0000-000000000001", "Word": "ineffable", "Stage": 4, "ConsecutiveCorrect": 1, "Accuracy": 0.55, "Due": true, "HasSentence": false },
      { "WordId": "33333333-0000-0000-0000-000000000002", "Word": "quixotic", "Stage": 3, "ConsecutiveCorrect": 2, "Accuracy": 0.60, "Due": true, "HasSentence": true }
    ],
    "ReadingBook": "Don Quixote",
    "ReadingLanguage": "en",
    "ExpectedDueWordIds": [
      "33333333-0000-0000-0000-000000000001",
      "33333333-0000-0000-0000-000000000002"
    ],
    "ExpectedWeakWordIds": [
      "33333333-0000-0000-0000-000000000001"
    ]
  },
  {
    "Name": "weak-not-due-vs-due-not-weak",
    "Cards": [
      { "WordId": "44444444-0000-0000-0000-000000000001", "Word": "recalcitrant", "Stage": 1, "ConsecutiveCorrect": 0, "Accuracy": 0.15, "Due": false, "HasSentence": true },
      { "WordId": "44444444-0000-0000-0000-000000000002", "Word": "perfunctory", "Stage": 3, "ConsecutiveCorrect": 4, "Accuracy": 0.90, "Due": true, "HasSentence": true },
      { "WordId": "44444444-0000-0000-0000-000000000003", "Word": "taciturn", "Stage": 4, "ConsecutiveCorrect": 5, "Accuracy": 0.95, "Due": true, "HasSentence": true }
    ],
    "ReadingBook": "Crime and Punishment",
    "ReadingLanguage": "en",
    "ExpectedDueWordIds": [
      "44444444-0000-0000-0000-000000000002",
      "44444444-0000-0000-0000-000000000003"
    ],
    "ExpectedWeakWordIds": [
      "44444444-0000-0000-0000-000000000001"
    ]
  }
]
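
A dataset like this is easy to break by hand, so a small consistency check can help. The Python sketch below is an assumption-laden illustration, not part of `TutorEvalRunner`: the `check_case` helper is hypothetical, and the rule that `ExpectedDueWordIds` equals the set of cards flagged `Due` is only an observation that holds for the four cases shown, not a documented contract. It embeds one case from the file so that it runs on its own.

```python
import json

# One case copied from the dataset above, embedded so the sketch is self-contained.
SAMPLE = """
[
  {
    "Name": "weak-not-due-vs-due-not-weak",
    "Cards": [
      { "WordId": "44444444-0000-0000-0000-000000000001", "Word": "recalcitrant", "Stage": 1, "ConsecutiveCorrect": 0, "Accuracy": 0.15, "Due": false, "HasSentence": true },
      { "WordId": "44444444-0000-0000-0000-000000000002", "Word": "perfunctory", "Stage": 3, "ConsecutiveCorrect": 4, "Accuracy": 0.90, "Due": true, "HasSentence": true },
      { "WordId": "44444444-0000-0000-0000-000000000003", "Word": "taciturn", "Stage": 4, "ConsecutiveCorrect": 5, "Accuracy": 0.95, "Due": true, "HasSentence": true }
    ],
    "ReadingBook": "Crime and Punishment",
    "ReadingLanguage": "en",
    "ExpectedDueWordIds": [
      "44444444-0000-0000-0000-000000000002",
      "44444444-0000-0000-0000-000000000003"
    ],
    "ExpectedWeakWordIds": [
      "44444444-0000-0000-0000-000000000001"
    ]
  }
]
"""


def check_case(case: dict) -> list[str]:
    """Return a list of consistency problems for one golden case (hypothetical helper)."""
    problems = []
    card_ids = {card["WordId"] for card in case["Cards"]}
    due_ids = {card["WordId"] for card in case["Cards"] if card["Due"]}
    expected_due = set(case["ExpectedDueWordIds"])
    expected_weak = set(case["ExpectedWeakWordIds"])

    unknown = (expected_due | expected_weak) - card_ids
    if unknown:
        problems.append(f"expected ids not present in Cards: {sorted(unknown)}")
    # Observed in the cases shown: expected-due ids are exactly the Due cards.
    if expected_due != due_ids:
        problems.append("ExpectedDueWordIds does not match cards flagged Due")
    return problems


cases = json.loads(SAMPLE)
for case in cases:
    print(case["Name"], check_case(case) or "ok")
```

For the embedded case the weak and due sets are disjoint, which is the property the changelog describes as making the weak-targeting check discriminating: a planner that only schedules due cards cannot pass it.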