feat(ai): Learning Tutor Agent — plans next study over real SRS state (AI-Agent-2) by mrviduus · Pull Request #397 · mrviduus/textstack

mrviduus · 2026-06-24T15:40:13Z

Agent 2 of 3 — completes the agent roadmap (architect → backend → adversarial QA → fix cycle). Design: docs/04-dev/agents-roadmap.md §3.

Why

Vocabulary review today is a fixed SRS queue. The Tutor reasons over the learner's real state (what's due, what's weak, what they're reading) and plans what to study next — exercise type + difficulty per card, with reasoning — keeping practice tied to reading (the product thesis), not turning into an endless drill.

What

ReAct on the existing AgentLoop (caps, persisted transcript). Four thin ITools: get_due_vocabulary, get_weak_vocabulary, get_reading_context, get_example_sentence (saved sentence → spoiler-gated, owner-isolated RAG over the learner's own book).
A server-side tutor_session (new entity/table, jsonb plan) holds state between turns. Human-in-the-loop (HITL): POST /me/tutor/session plans; POST /me/tutor/session/{id}/feedback re-plans from the results: it re-fetches state, deterministically drops just-passed cards, ignores feedback for ids not in the prior plan, and preserves session length.
Exercise type/difficulty is recalibrated from the real SRS stage (recognition→recall→context-cloze); the plan is bounded and includes a reading nudge.
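The feedback re-plan rules above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's code: `PlanItem`, `replan`, and the shape of `feedback` (a map from word id to a pass flag) are hypothetical names and shapes chosen for the example. It assumes the candidate list has already been re-fetched from the learner's state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    """Hypothetical plan entry: a card id plus the exercise chosen for it."""
    word_id: int
    exercise: str  # e.g. "recognition", "recall", "context-cloze"


def replan(prior_plan, feedback, candidates, session_length):
    """Build the next plan from freshly re-fetched candidates.

    prior_plan:     list[PlanItem] from the previous turn
    feedback:       dict mapping word_id -> True if the learner passed
    candidates:     list[PlanItem] re-fetched from current state, in priority order
    session_length: length carried over from the existing session
    """
    prior_ids = {item.word_id for item in prior_plan}

    # Feedback for ids that were never in the prior plan is ignored outright.
    trusted = {wid: ok for wid, ok in feedback.items() if wid in prior_ids}
    passed_ids = {wid for wid, ok in trusted.items() if ok}

    next_plan = []
    seen = set()
    for item in candidates:
        # Deterministic guard: a just-passed card is never re-surfaced.
        if item.word_id in passed_ids or item.word_id in seen:
            continue
        seen.add(item.word_id)
        next_plan.append(item)
        if len(next_plan) == session_length:
            break
    return next_plan
```

In this sketch the drop is enforced in code after the model has proposed candidates, so it does not depend on the model following instructions. Feedback for an id outside the prior plan has no effect at all: it neither removes that card nor marks it as passed.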

Two hard guarantees (QA-verified, no holes)

Anti-hallucination — every scheduled wordId must come from a get_due/get_weak tool result (ok-only from the transcript); word+stage re-projected from the real row; invented ids dropped; empty transcript → empty plan. The model can't fabricate or rename a card, nor smuggle one via free-text.
Cross-user isolation — example-sentence card resolved Id == wordId && UserId == userId; RAG path filters user_id AND user_book_id. No other user's user_chapter_chunk content is reachable.
All inbound book text (user-upload sentences, reading titles) → ExternalTextSanitizer + length cap before the prompt (prompt-injection boundary — closed in the fix round).
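As a rough illustration of the anti-hallucination guarantee, the sketch below filters a model-proposed plan against the tool transcript. It is written in Python for brevity and is not the project's implementation: `ToolCall`, `Row`, `allowed_ids`, and `filter_plan` are hypothetical, as are the field names other than `wordId`. Only the tool names `get_due_vocabulary` and `get_weak_vocabulary` come from the description above.

```python
from dataclasses import dataclass

# Only these tools may introduce schedulable card ids.
ID_TOOLS = {"get_due_vocabulary", "get_weak_vocabulary"}


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical transcript entry for one tool invocation."""
    tool: str
    ok: bool
    word_ids: tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical view of the real vocabulary row."""
    word: str
    stage: int


def allowed_ids(transcript):
    """Collect ids returned by successful due/weak tool calls only."""
    return {
        wid
        for call in transcript
        if call.ok and call.tool in ID_TOOLS
        for wid in call.word_ids
    }


def filter_plan(model_plan, transcript, rows):
    """Keep only entries whose wordId was actually returned by a tool.

    model_plan: list of dicts proposed by the model
    rows:       dict mapping word_id -> Row loaded from storage
    """
    allowed = allowed_ids(transcript)
    result = []
    for entry in model_plan:
        wid = entry.get("wordId")
        if not isinstance(wid, int) or wid not in allowed or wid not in rows:
            continue  # invented or unverifiable id: dropped
        row = rows[wid]
        # Re-project word and stage from the real row, overriding model text.
        result.append({**entry, "word": row.word, "stage": row.stage})
    return result
```

Because the allow-list is built from the transcript, an empty transcript yields an empty plan in this sketch, and a renamed card is overwritten with the stored word and stage.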

QA (adversarial) — applied

0 blockers; both invariants held under attack. Fixed: book text was un-sanitized into the prompt (now sanitized in the tools); re-plan session-length reset via a JSON casing bug; re-plan had no code guard against re-surfacing passed cards / trusted arbitrary feedback ids; reading-context UserBook lookup missing owner filter; eval's weak-targeting wasn't discriminating (added a weak-∉-due golden).
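The owner-filter fix and the isolation guarantee both come down to scoping every lookup by the owner as well as the record id. The Python sketch below shows that idea over in-memory lists. The real code presumably queries the database, and `Card`, `Chunk`, `find_card`, and `search_chunks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    id: int
    user_id: int
    sentence: str


@dataclass(frozen=True)
class Chunk:
    user_id: int
    user_book_id: int
    text: str


def find_card(cards, word_id, user_id):
    """Resolve a card only if both the id and the owner match."""
    return next(
        (c for c in cards if c.id == word_id and c.user_id == user_id),
        None,
    )


def search_chunks(chunks, user_id, user_book_id, term):
    """Return matching chunk text, filtered by user AND by that user's book."""
    needle = term.lower()
    return [
        c.text
        for c in chunks
        if c.user_id == user_id
        and c.user_book_id == user_book_id
        and needle in c.text.lower()
    ]
```

A lookup that matches on id alone would return another user's card whenever the id is guessed or leaked. Requiring the owner in the same predicate makes that case return nothing rather than relying on a later check.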

Verify

dotnet build green · dotnet format clean · 968 unit + 72 AiEvals green. EF migration AddTutorSession (reversible, applied against throwaway pgvector). Telemetry agent_run agent=tutor; route tutor.agent → gpt-4.1-mini. Eval admin-runnable at POST /admin/ai-quality/tutor/eval (live numbers need a key).

Deferred

SSE streaming, the tutor UI surface (frontend/mobile — separate slice), generated free-text exercises beyond MC reuse, longitudinal pedagogical-efficacy A/B (offline evals validate planner mechanics, not learning outcomes).

🤖 Generated with Claude Code

… (AI-Agent-2)

Third agent (completes the roadmap). Tutor reasons over the learner's due/weak SRS vocab + reading context via 4 tools and emits an ordered study plan (exercise type/difficulty recalibrated from the real SRS stage), persisted in a server-side tutor_session with a HITL feedback re-plan loop (drops just-passed cards, ignores feedback for ids not in the prior plan, preserves length).

Two QA-verified guarantees: anti-hallucination (every scheduled wordId must come from a get_due/get_weak tool result; word+stage re-projected; invented ids dropped; empty transcript -> empty plan) and cross-user isolation (card resolved Id==wordId && UserId==userId; RAG filters user_id AND user_book_id). All inbound book text sanitized + capped before the prompt.

Endpoints POST /me/tutor/session + /feedback (auth, 8/min). agent_run agent=tutor. Route tutor.agent -> gpt-4.1-mini. Eval TutorEvalRunner (due-coverage/weak-targeting/difficulty/no-hallucination/thesis-alignment), admin endpoint. EF migration AddTutorSession (reversible). 968 unit + 72 AiEvals green; build/format clean.

Deferred: SSE, tutor UI slice, free-text exercises, longitudinal efficacy A/B.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ctions'

Regex only allowed one qualifier word between 'ignore' and 'instructions', so 'ignore all previous instructions' (two: all+previous) slipped through. Now matches 1-3 qualifiers. Caught by the Tutor tool-sanitization tests; also hardens the Open Library path used by Enrichment + Librarian agents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
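The qualifier-count change described in this commit can be illustrated with a pair of patterns. The project's actual regex is not shown in the PR, so `OLD` and `NEW` below are hypothetical reconstructions in Python that only reproduce the described behavior: exactly one qualifier word before, one to three after.

```python
import re

# Assumed shape of the earlier pattern: exactly one word between the two keywords.
OLD = re.compile(r"\bignore\s+\w+\s+instructions\b", re.IGNORECASE)

# Assumed shape of the fix: one to three qualifier words are accepted.
NEW = re.compile(r"\bignore\s+(?:\w+\s+){1,3}instructions\b", re.IGNORECASE)

samples = [
    "ignore previous instructions",
    "Ignore all previous instructions",
    "ignore instructions",
]

for text in samples:
    # Print whether each pattern flags the sample as an injection attempt.
    print(f"{text!r}: old={bool(OLD.search(text))} new={bool(NEW.search(text))}")
```

With these patterns the two-qualifier phrase is flagged only by `NEW`, while the one-qualifier phrase is flagged by both. Bounding the repetition at three keeps the pattern from matching arbitrarily long stretches of ordinary prose between the two keywords.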

mrviduus and others added 3 commits June 24, 2026 11:39

fix(test): set required members in TutorToolSanitizationTests (CI compile fix)

4fcb02e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

mrviduus merged commit cbda68c into main Jun 24, 2026
5 checks passed

mrviduus deleted the feat/tutor-agent branch June 24, 2026 16:05

mrviduus mentioned this pull request Jun 24, 2026

feat(ai): Tutor Agent web UI — Smart session with visible reasoning (AI-Agent-2) #398

Merged
