feat(ai): Learning Tutor Agent — plans next study over real SRS state (AI-Agent-2)#397
Merged
… (AI-Agent-2) Third agent (completes the roadmap). Tutor reasons over the learner's due/weak SRS vocab + reading context via 4 tools and emits an ordered study plan (exercise type/difficulty recalibrated from the real SRS stage), persisted in a server-side tutor_session with a HITL feedback re-plan loop (drops just-passed cards, ignores feedback for ids not in the prior plan, preserves length). Two QA-verified guarantees: anti-hallucination (every scheduled wordId must come from a get_due/get_weak tool result; word+stage re-projected; invented ids dropped; empty transcript -> empty plan) and cross-user isolation (card resolved Id==wordId && UserId==userId; RAG filters user_id AND user_book_id). All inbound book text sanitized + capped before the prompt. Endpoints POST /me/tutor/session + /feedback (auth, 8/min). agent_run agent=tutor. Route tutor.agent -> gpt-4.1-mini. Eval TutorEvalRunner (due-coverage/weak-targeting/difficulty/no-hallucination/thesis-alignment), admin endpoint. EF migration AddTutorSession (reversible). 968 unit + 72 AiEvals green; build/format clean. Deferred: SSE, tutor UI slice, free-text exercises, longitudinal efficacy A/B. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pile fix) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ctions' Regex only allowed one qualifier word between 'ignore' and 'instructions', so 'ignore all previous instructions' (two: all+previous) slipped through. Now matches 1-3 qualifiers. Caught by the Tutor tool-sanitization tests; also hardens the Open Library path used by Enrichment + Librarian agents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
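To illustrate the qualifier fix described in the commit above, here is a minimal Python sketch contrasting a pattern that allows exactly one qualifier word with one that allows one to three. Both patterns, the `\w+` qualifier choice, and the `is_injection` helper are assumptions for illustration only: the actual sanitizer regex is not shown in this PR, and the project itself is .NET.

```python
import re

# Hypothetical patterns; the real sanitizer regex is not shown in this PR.
# Old behaviour: exactly one qualifier word between "ignore" and "instructions".
ONE_QUALIFIER = re.compile(r"\bignore\s+\w+\s+instructions\b", re.IGNORECASE)

# New behaviour: one to three qualifier words.
UP_TO_THREE = re.compile(
    r"\bignore\s+(?:\w+\s+){1,3}instructions\b", re.IGNORECASE
)


def is_injection(text: str, pattern: re.Pattern = UP_TO_THREE) -> bool:
    """Return True if the text contains an 'ignore ... instructions' phrase."""
    return pattern.search(text) is not None
```

With these assumed patterns, "ignore all previous instructions" has two qualifiers, so the one-qualifier pattern misses it while the widened pattern catches it. A phrase with four or more qualifiers would still pass, which is the trade-off of any bounded quantifier.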
Agent 2 of 3, completing the agent roadmap (architect → backend → adversarial QA → fix cycle). Design: docs/04-dev/agents-roadmap.md §3.
Why
Vocabulary review today is a fixed SRS queue. The Tutor reasons over the learner's real state (what's due, what's weak, what they're reading) and plans what to study next — exercise type + difficulty per card, with reasoning — keeping practice tied to reading (the product thesis), not turning into an endless drill.
What
AgentLoop (caps, persisted transcript). Four thin ITools: get_due_vocabulary, get_weak_vocabulary, get_reading_context, get_example_sentence (saved sentence → spoiler-gated, owner-isolated RAG over the learner's own book). tutor_session (new entity/table, jsonb plan) holds state between turns. HITL: POST /me/tutor/session plans; POST /me/tutor/session/{id}/feedback re-plans on results: it re-fetches state, deterministically drops just-passed cards, ignores feedback for ids not in the prior plan, and preserves session length.
Two hard guarantees (QA-verified, no holes)
Anti-hallucination: every scheduled wordId must come from a get_due/get_weak tool result (ok-only from the transcript); word+stage re-projected from the real row; invented ids dropped; empty transcript → empty plan. The model can't fabricate or rename a card, nor smuggle one via free-text.
Cross-user isolation: card resolved by Id == wordId && UserId == userId; RAG path filters user_id AND user_book_id. No other user's user_chapter_chunk content is reachable.
All inbound book text passes through ExternalTextSanitizer + length cap before the prompt (prompt-injection boundary, closed in the fix round).
QA (adversarial): applied
0 blockers; both invariants held under attack. Fixed: book text was un-sanitized into the prompt (now sanitized in the tools); re-plan session-length reset via a JSON casing bug; re-plan had no code guard against re-surfacing passed cards / trusted arbitrary feedback ids; reading-context UserBook lookup missing owner filter; eval's weak-targeting wasn't discriminating (added a weak-∉-due golden).
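The two code guards discussed above (grounding the plan in tool results, and the deterministic re-plan filter) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the PR's .NET implementation: the `Card` shape, the transcript and feedback dictionaries, and the function names `grounded_plan` and `replan` are all hypothetical, and treating the prior plan's length as the session length is an assumption.

```python
from dataclasses import dataclass

# Hypothetical tool names taken from the PR text; data shapes are assumed.
VOCAB_TOOLS = ("get_due_vocabulary", "get_weak_vocabulary")


@dataclass(frozen=True)
class Card:
    id: int
    user_id: int
    word: str
    stage: int


def grounded_plan(proposed, transcript, cards, user_id):
    """Keep only proposed items whose wordId a successful tool call returned."""
    # Ids are trusted only if an ok get_due/get_weak result surfaced them.
    allowed = {
        word_id
        for call in transcript
        if call["ok"] and call["tool"] in VOCAB_TOOLS
        for word_id in call["word_ids"]
    }
    plan = []
    for item in proposed:
        word_id = item.get("wordId")
        # Owner-scoped lookup: both the card id and the user id must match.
        card = next(
            (c for c in cards if c.id == word_id and c.user_id == user_id), None
        )
        if word_id not in allowed or card is None:
            continue  # invented or foreign id: drop it
        # Re-project word and stage from the stored row, not the model output.
        plan.append({"wordId": card.id, "word": card.word,
                     "stage": card.stage, "exercise": item.get("exercise")})
    return plan


def replan(prior_plan, feedback, candidates):
    """Drop just-passed cards, ignore unknown feedback ids, keep the length."""
    prior_ids = {item["wordId"] for item in prior_plan}
    # Feedback for ids that were never in the prior plan is ignored.
    passed = {f["wordId"] for f in feedback
              if f["wordId"] in prior_ids and f["passed"]}
    kept = [item for item in prior_plan if item["wordId"] not in passed]
    seen = {item["wordId"] for item in kept} | passed
    # Refill from fresh candidates until the prior length is restored.
    for item in candidates:
        if len(kept) >= len(prior_plan):
            break
        if item["wordId"] not in seen:
            kept.append(item)
            seen.add(item["wordId"])
    return kept
```

In this sketch an empty transcript yields an empty `allowed` set and therefore an empty plan, and a passed card can never be re-surfaced because its id stays in `seen` during the refill.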
Verify
dotnet build green · dotnet format clean · 968 unit + 72 AiEvals green. EF migration AddTutorSession (reversible, applied against throwaway pgvector). Telemetry agent_run agent=tutor; route tutor.agent → gpt-4.1-mini. Eval admin-runnable at POST /admin/ai-quality/tutor/eval (live numbers need a key).
Deferred
SSE streaming, the tutor UI surface (frontend/mobile — separate slice), generated free-text exercises beyond MC reuse, longitudinal pedagogical-efficacy A/B (offline evals validate planner mechanics, not learning outcomes).
🤖 Generated with Claude Code