Skip to content

feat(ai): Learning Tutor Agent — plans next study over real SRS state (AI-Agent-2)#397

Merged
mrviduus merged 3 commits into
mainfrom
feat/tutor-agent
Jun 24, 2026
Merged

feat(ai): Learning Tutor Agent — plans next study over real SRS state (AI-Agent-2)#397
mrviduus merged 3 commits into
mainfrom
feat/tutor-agent

Conversation

@mrviduus

Copy link
Copy Markdown
Owner

Agent 2 of 3 — completes the agent roadmap (architect → backend → adversarial QA → fix cycle). Design: docs/04-dev/agents-roadmap.md §3.

Why

Vocabulary review today is a fixed SRS queue. The Tutor reasons over the learner's real state (what's due, what's weak, what they're reading) and plans what to study next — exercise type + difficulty per card, with reasoning — keeping practice tied to reading (the product thesis), not turning into an endless drill.

What

  • ReAct on the existing AgentLoop (caps, persisted transcript). Four thin ITools: get_due_vocabulary, get_weak_vocabulary, get_reading_context, get_example_sentence (saved sentence → spoiler-gated, owner-isolated RAG over the learner's own book).
  • Server-side tutor_session (new entity/table, jsonb plan) holds state between turns. HITL: POST /me/tutor/session plans; POST /me/tutor/session/{id}/feedback re-plans on results — re-fetches state, deterministically drops just-passed cards, ignores feedback for ids not in the prior plan, preserves session length.
  • Exercise type/difficulty recalibrated from the real SRS stage (recognition→recall→context-cloze); plan bounded + a reading nudge.

Two hard guarantees (QA-verified, no holes)

  • Anti-hallucination — every scheduled wordId must come from a get_due/get_weak tool result (ok-only from the transcript); word+stage re-projected from the real row; invented ids dropped; empty transcript → empty plan. The model can't fabricate or rename a card, nor smuggle one via free-text.
  • Cross-user isolation — example-sentence card resolved Id == wordId && UserId == userId; RAG path filters user_id AND user_book_id. No other user's user_chapter_chunk content is reachable.
  • All inbound book text (user-upload sentences, reading titles) → ExternalTextSanitizer + length cap before the prompt (prompt-injection boundary — closed in the fix round).

QA (adversarial) — applied

0 blockers; both invariants held under attack. Fixed: book text was un-sanitized into the prompt (now sanitized in the tools); re-plan session-length reset via a JSON casing bug; re-plan had no code guard against re-surfacing passed cards / trusted arbitrary feedback ids; reading-context UserBook lookup missing owner filter; eval's weak-targeting wasn't discriminating (added a weak-∉-due golden).

Verify

dotnet build green · dotnet format clean · 968 unit + 72 AiEvals green. EF migration AddTutorSession (reversible, applied against throwaway pgvector). Telemetry agent_run agent=tutor; route tutor.agent → gpt-4.1-mini. Eval admin-runnable at POST /admin/ai-quality/tutor/eval (live numbers need a key).

Deferred

SSE streaming, the tutor UI surface (frontend/mobile — separate slice), generated free-text exercises beyond MC reuse, longitudinal pedagogical-efficacy A/B (offline evals validate planner mechanics, not learning outcomes).

🤖 Generated with Claude Code

mrviduus and others added 3 commits June 24, 2026 11:39
… (AI-Agent-2)

Third agent (completes the roadmap). Tutor reasons over the learner's due/weak
SRS vocab + reading context via 4 tools and emits an ordered study plan
(exercise type/difficulty recalibrated from the real SRS stage), persisted in a
server-side tutor_session with a HITL feedback re-plan loop (drops just-passed
cards, ignores feedback for ids not in the prior plan, preserves length).

Two QA-verified guarantees: anti-hallucination (every scheduled wordId must come
from a get_due/get_weak tool result; word+stage re-projected; invented ids
dropped; empty transcript -> empty plan) and cross-user isolation (card resolved
Id==wordId && UserId==userId; RAG filters user_id AND user_book_id). All inbound
book text sanitized + capped before the prompt.

Endpoints POST /me/tutor/session + /feedback (auth, 8/min). agent_run agent=tutor.
Route tutor.agent -> gpt-4.1-mini. Eval TutorEvalRunner (due-coverage/weak-
targeting/difficulty/no-hallucination/thesis-alignment), admin endpoint. EF
migration AddTutorSession (reversible). 968 unit + 72 AiEvals green; build/format clean.

Deferred: SSE, tutor UI slice, free-text exercises, longitudinal efficacy A/B.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pile fix)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ctions'

Regex only allowed one qualifier word between 'ignore' and 'instructions', so
'ignore all previous instructions' (two: all+previous) slipped through. Now
matches 1-3 qualifiers. Caught by the Tutor tool-sanitization tests; also
hardens the Open Library path used by Enrichment + Librarian agents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mrviduus mrviduus merged commit cbda68c into main Jun 24, 2026
5 checks passed
@mrviduus mrviduus deleted the feat/tutor-agent branch June 24, 2026 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant