Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## [Unreleased]

### Learning Tutor Agent — web UI (Smart session) (AI-Agent-2) — web (2026-06-24)

The frontend for the Tutor agent: a **"Smart session"** on the Vocabulary page that surfaces the tutor's *reasoning*, not just a card stack. `POST /me/tutor/session` → a **plan view** showing the overall `rationale`, each item's `word` + exercise-type badge + difficulty + per-item `why`, and a closing `readingNudge` (the thesis) — visible, intentional reasoning is the point. Then a **study phase** reusing the existing `FlashCard` (word → flip → Got it / Missed it, `responseTimeMs` measured), and a **HITL feedback loop**: `POST /me/tutor/session/{id}/feedback` re-plans the remainder ("your tutor adjusted your plan") until an empty plan ends it → summary (studied / accuracy / nudge). New `TutorSessionPage` at `/:lang/vocabulary/tutor`, `useTutorSession` state machine (`planning→plan→study→…→summary` + empty/error/signIn), `TutorPlanView`, `tutor.ts` client. **Backend DTO enrichment (anti-join):** to render cards the UI used to re-fetch the user's vocab and join by id — which silently dropped planned cards for users with >100 words (server caps `getWords` at 100 + a non-existent `'recent'` sort). Fixed at the source: `TutorEndpoints` now **loads each plan item's card from the DB scoped to the caller** (`Id IN ids AND UserId == userId` — a second anti-hallucination/isolation re-check) and enriches `TutorPlanItemDto` with `translation`/`definition`/`sentence`/`bookTitle`/`hint`/`distractors`, so the client renders straight from the plan with **no join, nothing dropped**. Also a **re-plan turn cap** (`MaxTurns=6` server-side + an 8-round client backstop) so a persistently-missed card can't loop forever. UI hardening from the adversarial pass: `AbortController`/mounted-guard (no setState-after-unmount), feedback-failure retry re-submits the **same** session (doesn't nuke progress), and untrusted LLM strings (`why`/`rationale`/`nudge`) are line-clamped with an unknown-`exerciseType` fallback (no raw i18n-key leak). 
`tsc` clean; **584 web + 35 backend Tutor + AiEvals** green; `vite build` green; browser-checked (entry→plan→study→re-plan→summary + empty/unknown-type/long-text/unmount, **0 console errors**). **Deferred**: mobile Tutor UI, SSE plan streaming, generated MC exercises beyond the existing card, admin replay link. Completes AI-Agent-2 (backend shipped earlier).
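
The phase flow and the client-side round backstop described in this entry can be sketched as a small pure reducer. This is only an illustration under stated assumptions: `SessionState`, `SessionEvent`, `reduce` and `MAX_CLIENT_ROUNDS` are hypothetical names, not the PR's `useTutorSession`; only the phase names and the 8-round backstop come from the entry, and the `signIn` phase is omitted.

```typescript
// Hypothetical sketch: these names are illustrative and are not taken from useTutorSession.
type Phase = 'planning' | 'plan' | 'study' | 'summary' | 'empty' | 'error'

interface SessionState {
  phase: Phase
  round: number   // study rounds completed so far
  plan: string[]  // wordIds in the current plan
  studied: number
  correct: number
}

type SessionEvent =
  | { type: 'planLoaded'; wordIds: string[] }
  | { type: 'startStudy' }
  | { type: 'roundFinished'; studied: number; correct: number }
  | { type: 'failed' }

// Client backstop from the changelog entry (the server enforces its own MaxTurns).
const MAX_CLIENT_ROUNDS = 8

const initialState: SessionState = { phase: 'planning', round: 0, plan: [], studied: 0, correct: 0 }

function reduce(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case 'planLoaded':
      if (state.phase !== 'planning') return state
      if (event.wordIds.length === 0) {
        // No plan on the first turn means nothing to study; later it means the session is done.
        return { ...state, plan: [], phase: state.round === 0 ? 'empty' : 'summary' }
      }
      // A persistently missed card cannot keep the loop alive past the backstop.
      if (state.round >= MAX_CLIENT_ROUNDS) return { ...state, plan: [], phase: 'summary' }
      return { ...state, plan: event.wordIds, phase: 'plan' }
    case 'startStudy':
      return state.phase === 'plan' ? { ...state, phase: 'study' } : state
    case 'roundFinished':
      if (state.phase !== 'study') return state
      return {
        ...state,
        phase: 'planning',
        round: state.round + 1,
        studied: state.studied + event.studied,
        correct: state.correct + event.correct,
      }
    case 'failed':
      return { ...state, phase: 'error' }
  }
}
```

Keeping the transitions in a pure function like this would make the turn cap testable without rendering the page or mocking `fetch`; whether the real hook is structured this way is not shown in the diff.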

### Learning Tutor Agent — plans what to study next over real SRS state (AI-Agent-2) — backend (2026-06-24)

The third and largest agent: a **Tutor** that reasons over the learner's actual vocabulary state and **plans what to study next**, rather than running a fixed review queue. `TutorAgent` runs on the existing `AgentLoop` runtime and calls four thin `ITool`s — `get_due_vocabulary` (due/near-due SRS cards), `get_weak_vocabulary` (lowest-accuracy / earliest-stage words), `get_reading_context` (what they're actually reading — keeps practice tied to reading, the product thesis), and `get_example_sentence` (a real in-context sentence: the learner's saved sentence, else a **spoiler-gated, owner-isolated RAG** pull from their own book) — then emits an **ordered study plan** (`{wordId, word, stage, exerciseType, difficulty, why}` + an overall `rationale` + a `readingNudge`), exercise type/difficulty **recalibrated from the real SRS stage** (recognition→recall→context-cloze). **Server-held `tutor_session`** (new entity/table, jsonb `PlanJson`, status, turn count) persists the plan between turns; **HITL**: `POST /me/tutor/session` starts/resumes and `POST /me/tutor/session/{id}/feedback` re-plans on the learner's results — re-fetching state (so SRS updates are seen), deterministically **dropping cards just answered correctly**, ignoring feedback for ids not in the prior plan, and preserving the session length. **Two hard guarantees, QA-verified**: (1) **anti-hallucination** — every scheduled `wordId` must come from a `get_due`/`get_weak` tool result (harvested ok-only from the transcript), word+stage **re-projected** from the real row, invented ids dropped, empty transcript → empty plan (the model can't fabricate or rename a card); (2) **cross-user isolation** — the example-sentence tool resolves the card with `Id == wordId && UserId == userId` and the RAG path filters on `user_id AND user_book_id`, so no other user's `user_chapter_chunk` content is reachable. 
All inbound book text (example sentences from user uploads, reading titles) is run through `ExternalTextSanitizer` + length-capped before entering the prompt (prompt-injection boundary). Telemetry: each turn persists an `agent_run` (agent=`tutor`, `tool_calls_count`); route `tutor.agent → gpt-4.1-mini`. **Eval**: `TutorEvalRunner` (deterministic structural rubric over synthetic learner states — due-coverage, weak-targeting, difficulty-appropriateness, no-hallucination, thesis-alignment; a golden where weak ∉ due makes weak-targeting discriminating), admin-runnable `POST /admin/ai-quality/tutor/eval`. EF migration `AddTutorSession` (reversible). `dotnet build` green, `dotnet format` clean; 968 unit + 72 AiEvals tests green. **Deferred**: SSE streaming, the tutor UI surface (frontend/mobile slice), generated free-text exercises beyond MC reuse, longitudinal pedagogical-efficacy A/B (offline evals validate planner mechanics, not learning outcomes). Completes the 3-agent roadmap (`docs/04-dev/agents-roadmap.md`); Agent 1 (Enrichment) + Agent 3 (Librarian) already shipped.
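
The anti-hallucination guarantee in this entry (ids must come from an ok `get_due`/`get_weak` tool result, word and stage are re-projected from the real row, invented ids are dropped, empty transcript gives an empty plan) can be sketched as below. The backend is C#; this TypeScript version is only an illustration of the logic, and `ToolResult`, `CardRow`, `harvestAllowedCards` and `validatePlan` are hypothetical names rather than the backend's types.

```typescript
// Hypothetical shapes: illustrative only, not the backend's C# contracts.
interface CardRow { id: string; word: string; stage: number }
interface ToolResult { tool: string; ok: boolean; cards: CardRow[] }
interface ModelPlanItem {
  wordId: string
  word: string
  stage: number
  exerciseType: string
  difficulty: string
  why: string
}

// Tool names as listed in the changelog entry.
const SCHEDULING_TOOLS = new Set(['get_due_vocabulary', 'get_weak_vocabulary'])

// Harvest the cards the model was actually shown, from successful tool calls only.
function harvestAllowedCards(transcript: ToolResult[]): Map<string, CardRow> {
  const allowed = new Map<string, CardRow>()
  for (const result of transcript) {
    if (!result.ok || !SCHEDULING_TOOLS.has(result.tool)) continue
    for (const card of result.cards) allowed.set(card.id, card)
  }
  return allowed
}

// Keep only plan items that reference a harvested card, and overwrite the
// model-supplied word and stage with the values from the real row.
function validatePlan(modelPlan: ModelPlanItem[], transcript: ToolResult[]): ModelPlanItem[] {
  const allowed = harvestAllowedCards(transcript)
  const validated: ModelPlanItem[] = []
  for (const item of modelPlan) {
    const row = allowed.get(item.wordId)
    if (!row) continue // invented or unseen id: drop it
    validated.push({ ...item, word: row.word, stage: row.stage })
  }
  return validated
}
```

Because the filter is driven by the transcript rather than by the model's output, an empty transcript yields an empty plan regardless of what the model emits.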
3 changes: 3 additions & 0 deletions apps/web/src/App.tsx
@@ -34,6 +34,7 @@ const UserBookDetailPage = lazy(() => import('./pages/UserBookDetailPage').then(
const StatsPage = lazy(() => import('./pages/StatsPage').then(m => ({ default: m.StatsPage })))
const VocabularyPage = lazy(() => import('./pages/VocabularyPage').then(m => ({ default: m.VocabularyPage })))
const VocabularyReviewPage = lazy(() => import('./pages/VocabularyReviewPage').then(m => ({ default: m.VocabularyReviewPage })))
const TutorSessionPage = lazy(() => import('./pages/TutorSessionPage').then(m => ({ default: m.TutorSessionPage })))
const HighlightsPage = lazy(() => import('./pages/HighlightsPage').then(m => ({ default: m.HighlightsPage })))
const HighlightReviewPage = lazy(() => import('./pages/HighlightReviewPage').then(m => ({ default: m.HighlightReviewPage })))
import { Header } from './components/Header'
@@ -104,6 +105,7 @@ function LanguageRoutes() {
<Route path="/stats" element={<StatsPage />} />
<Route path="/vocabulary" element={<VocabularyPage />} />
<Route path="/vocabulary/review" element={<VocabularyReviewPage />} />
<Route path="/vocabulary/tutor" element={<TutorSessionPage />} />
<Route path="/highlights" element={<HighlightsPage />} />
<Route path="/highlights/review" element={<HighlightReviewPage />} />
<Route path="/library/my/:id" element={<UserBookDetailPage />} />
@@ -157,6 +159,7 @@ function AppRoutes() {
<Route path="/stats" element={<LegacyRedirect />} />
<Route path="/vocabulary" element={<LegacyRedirect />} />
<Route path="/vocabulary/review" element={<LegacyRedirect />} />
<Route path="/vocabulary/tutor" element={<LegacyRedirect />} />
<Route path="/highlights" element={<LegacyRedirect />} />
<Route path="/highlights/review" element={<LegacyRedirect />} />
<Route path="/:lang/*" element={<LanguageRoutes />} />
73 changes: 73 additions & 0 deletions apps/web/src/api/__tests__/tutor.test.ts
@@ -0,0 +1,73 @@
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'
import { startTutorSession, sendTutorFeedback } from '../tutor'

function mockOk(body: unknown) {
  return vi.fn().mockResolvedValue({
    ok: true,
    status: 200,
    text: async () => JSON.stringify(body),
  })
}

const SAMPLE = {
  sessionId: 's1',
  plan: [{ wordId: 'w1', word: 'foo', stage: 1, exerciseType: 'recognition', difficulty: 'easy', why: 'because' }],
  rationale: 'plan',
  readingNudge: 'read more',
  runId: 'r1',
}

describe('tutor api', () => {
  beforeEach(() => vi.restoreAllMocks())
  afterEach(() => vi.unstubAllGlobals())

  it('startTutorSession POSTs maxItems and returns parsed response', async () => {
    const fetchMock = mockOk(SAMPLE)
    vi.stubGlobal('fetch', fetchMock)

    const res = await startTutorSession(7)

    expect(res.sessionId).toBe('s1')
    expect(res.plan).toHaveLength(1)
    const [url, opts] = fetchMock.mock.calls[0]
    expect(String(url)).toContain('/me/tutor/session')
    expect(opts.method).toBe('POST')
    expect(opts.credentials).toBe('include')
    expect(JSON.parse(opts.body)).toEqual({ maxItems: 7 })
  })

  it('startTutorSession sends empty body when maxItems omitted', async () => {
    const fetchMock = mockOk(SAMPLE)
    vi.stubGlobal('fetch', fetchMock)

    await startTutorSession()

    const [, opts] = fetchMock.mock.calls[0]
    expect(JSON.parse(opts.body)).toEqual({})
  })

  it('sendTutorFeedback POSTs results to the session feedback URL', async () => {
    const fetchMock = mockOk({ ...SAMPLE, plan: [] })
    vi.stubGlobal('fetch', fetchMock)

    const results = [{ wordId: 'w1', correct: true, responseTimeMs: 1234 }]
    const res = await sendTutorFeedback('s1', results)

    expect(res.plan).toHaveLength(0)
    const [url, opts] = fetchMock.mock.calls[0]
    expect(String(url)).toContain('/me/tutor/session/s1/feedback')
    expect(opts.method).toBe('POST')
    expect(JSON.parse(opts.body)).toEqual({ results })
  })

  it('rejects on a non-ok response', async () => {
    const fetchMock = vi.fn().mockResolvedValue({
      ok: false,
      status: 503,
      text: async () => JSON.stringify({ error: 'no tutor' }),
    })
    vi.stubGlobal('fetch', fetchMock)

    await expect(startTutorSession()).rejects.toThrow('no tutor')
  })
})
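
The last test above relies on `authFetch` turning a non-ok response whose body is `{ "error": "no tutor" }` into a rejection that mentions `no tutor`. `authFetch` lives in `./client` and is not part of this diff, so the following is only a guess at the kind of message extraction that would satisfy the test; `errorMessageFromResponse` and its fallback wording are hypothetical.

```typescript
// Hypothetical helper: the real authFetch in ./client is not shown in this diff.
// Derives an error message from a failed response's status and raw body text.
function errorMessageFromResponse(status: number, bodyText: string): string {
  try {
    const parsed: unknown = JSON.parse(bodyText)
    if (typeof parsed === 'object' && parsed !== null && 'error' in parsed) {
      const err = (parsed as { error: unknown }).error
      // Only trust a non-empty string; anything else falls back to the generic message.
      if (typeof err === 'string' && err.length > 0) return err
    }
  } catch {
    // Body was not JSON (for example an HTML error page); use the generic message.
  }
  return `Request failed with status ${status}`
}
```

A caller would then do something like `throw new Error(errorMessageFromResponse(res.status, await res.text()))` when `res.ok` is false.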
72 changes: 72 additions & 0 deletions apps/web/src/api/tutor.ts
@@ -0,0 +1,72 @@
import { authFetch } from './client'

// Learning Tutor agent (AI-Agent-2). The tutor PLANS what to study next over the learner's real SRS +
// reading state and hands off to the existing vocabulary-review flow. JSON (SSE deferred). The plan is held
// server-side in a session so the HITL re-plan turn survives across requests.

// --- Types (mirror Contracts/Agents/TutorDtos.cs, camelCase via the API) ---

/**
 * One planned study item. The backend now ENRICHES each item with the full card payload (translation,
 * definition, sentence, bookTitle, hint, distractors), so the UI renders cards straight from the plan —
 * no separate vocab fetch + join. References a REAL vocab card by `wordId`, with per-item `why` reasoning.
 */
export interface TutorPlanItem {
  wordId: string
  word: string
  stage: number
  exerciseType: string // recognition | recall | context
  difficulty: string // label string
  why: string // per-item reasoning
  translation?: string | null
  definition?: string | null
  sentence?: string | null
  bookTitle?: string | null
  hint?: string | null
  distractors: string[] // [] when none, never null
}

/** The tutor's response: the persisted session, the ordered plan, and the surfaced reasoning. */
export interface TutorSessionResponse {
  sessionId: string
  plan: TutorPlanItem[]
  rationale: string // overall session reasoning
  readingNudge: string // ties back to reading (the thesis)
  runId: string
}

/** One learner result fed back to the tutor for re-planning. */
export interface TutorFeedbackResult {
  wordId: string
  correct: boolean
  responseTimeMs: number
}

// --- API Functions ---

/** Plan a new tutor session over the learner's current state. `maxItems` is optional (server-capped). */
export async function startTutorSession(maxItems?: number, signal?: AbortSignal): Promise<TutorSessionResponse> {
  return authFetch<TutorSessionResponse>('/me/tutor/session', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(maxItems != null ? { maxItems } : {}),
    signal,
  })
}

/**
 * Submit the learner's results for the current session and get the re-planned remainder. An empty `plan` in
 * the response means the session is complete.
 */
export async function sendTutorFeedback(
  sessionId: string,
  results: TutorFeedbackResult[],
  signal?: AbortSignal,
): Promise<TutorSessionResponse> {
  // Encode the id so an unexpected character (`/`, `?`, `#`) cannot alter the request path.
  return authFetch<TutorSessionResponse>(`/me/tutor/session/${encodeURIComponent(sessionId)}/feedback`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ results }),
    signal,
  })
}
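
A note on the `maxItems != null` check in `startTutorSession`: the loose comparison treats both `undefined` and `null` as "omitted" while still sending an explicit `0`, which a truthiness check would silently drop. The sketch below isolates that body-building step; `buildStartBody` and `buildStartBodyTruthy` are hypothetical helpers written for illustration, not functions in the PR.

```typescript
// Hypothetical helper mirroring the body expression used by startTutorSession.
function buildStartBody(maxItems?: number | null): string {
  // `!= null` is false only for null and undefined, so 0 is still sent.
  return JSON.stringify(maxItems != null ? { maxItems } : {})
}

// For contrast: a truthiness check would drop an explicit 0.
function buildStartBodyTruthy(maxItems?: number | null): string {
  return JSON.stringify(maxItems ? { maxItems } : {})
}

console.log(buildStartBody(7))        // {"maxItems":7}
console.log(buildStartBody())         // {}
console.log(buildStartBody(0))        // {"maxItems":0}
console.log(buildStartBodyTruthy(0))  // {}
```

Whether the server accepts `0` is a separate question (the doc comment only says the value is server-capped); the point is that the client forwards whatever number the caller passed.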
55 changes: 55 additions & 0 deletions apps/web/src/components/vocabulary/TutorPlanView.tsx
@@ -0,0 +1,55 @@
import type { TutorPlanItem } from '../../api/tutor'
import { exerciseLabel, exerciseBadgeClass } from './tutorLabels'

interface Props {
  rationale: string
  plan: TutorPlanItem[]
  readingNudge: string
  adjusted: boolean
  t: (key: string) => string
  onStart: () => void
}

// The showcase: surfaces the tutor's reasoning as a deliberate "here's your plan and why" view — the visible
// reasoning is the point, not debug text.
export function TutorPlanView({ rationale, plan, readingNudge, adjusted, t, onStart }: Props) {
  return (
    <div className="tutor-plan">
      <div className="tutor-plan__rationale">
        <span className="tutor-plan__rationale-label">
          {adjusted ? t('tutor.plan.adjustedLabel') : t('tutor.plan.rationaleLabel')}
        </span>
        <p className="tutor-plan__rationale-text tutor-clamp tutor-clamp--4">{rationale}</p>
      </div>

      <ol className="tutor-plan__list">
        {plan.map((item, i) => (
          <li key={item.wordId} className="tutor-plan__item">
            <span className="tutor-plan__item-index">{i + 1}</span>
            <div className="tutor-plan__item-body">
              <div className="tutor-plan__item-head">
                <span className="tutor-plan__item-word">{item.word}</span>
                <span className={exerciseBadgeClass(item.exerciseType)}>
                  {exerciseLabel(item.exerciseType, t)}
                </span>
                <span className="tutor-plan__item-difficulty tutor-clamp tutor-clamp--1">{item.difficulty}</span>
              </div>
              <p className="tutor-plan__item-why tutor-clamp tutor-clamp--3">{item.why}</p>
            </div>
          </li>
        ))}
      </ol>

      {readingNudge && (
        <div className="tutor-plan__nudge">
          <span className="tutor-plan__nudge-icon" aria-hidden>📖</span>
          <p className="tutor-plan__nudge-text tutor-clamp tutor-clamp--3">{readingNudge}</p>
        </div>
      )}

      <button className="tutor-plan__start" onClick={onStart}>
        {t('tutor.plan.start')}
      </button>
    </div>
  )
}
21 changes: 21 additions & 0 deletions apps/web/src/components/vocabulary/tutorLabels.ts
@@ -0,0 +1,21 @@
// Display helpers that tolerate untrusted LLM strings. `exerciseType` comes straight from the model, so it
// is guarded here: an unexpected value must not render a raw i18n key (`tutor.exercise.foo`) or blow up
// the layout. `difficulty` is not handled in this file; the plan view renders it as-is and line-clamps it.

const KNOWN_EXERCISE_TYPES = new Set(['recognition', 'recall', 'context'])

/**
 * Label for an exercise type. For a known type we use the i18n key; for anything unexpected from the model
 * we fall back to the trimmed raw value, or to a generic label when the value is empty, rather than
 * leaking `tutor.exercise.<garbage>`.
 */
export function exerciseLabel(exerciseType: string, t: (key: string) => string): string {
  if (KNOWN_EXERCISE_TYPES.has(exerciseType)) return t(`tutor.exercise.${exerciseType}`)
  // `?.` tolerates a null/undefined value at runtime even though the type says `string`.
  const raw = exerciseType?.trim()
  return raw ? raw : t('tutor.exercise.generic')
}

/** Known types map to a styled badge variant; unknown types get a neutral default. */
export function exerciseBadgeClass(exerciseType: string): string {
  const variant = KNOWN_EXERCISE_TYPES.has(exerciseType) ? exerciseType : 'default'
  return `tutor-badge tutor-badge--${variant}`
}
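
To make the fallback behaviour concrete, here is a self-contained usage sketch. The two helpers are repeated from the file above so the block runs on its own; the `t` stub and its English strings are assumptions for illustration, since the real translations are not part of this diff.

```typescript
const KNOWN_EXERCISE_TYPES = new Set(['recognition', 'recall', 'context'])

function exerciseLabel(exerciseType: string, t: (key: string) => string): string {
  if (KNOWN_EXERCISE_TYPES.has(exerciseType)) return t(`tutor.exercise.${exerciseType}`)
  const raw = exerciseType?.trim()
  return raw ? raw : t('tutor.exercise.generic')
}

function exerciseBadgeClass(exerciseType: string): string {
  const variant = KNOWN_EXERCISE_TYPES.has(exerciseType) ? exerciseType : 'default'
  return `tutor-badge tutor-badge--${variant}`
}

// Hypothetical translation stub: returns the key itself when no string is registered,
// which is how a raw i18n key would leak into the UI without the guard.
const STRINGS: Record<string, string> = {
  'tutor.exercise.recall': 'Recall',
  'tutor.exercise.generic': 'Practice',
}
const t = (key: string): string => STRINGS[key] ?? key

console.log(exerciseLabel('recall', t))    // Recall
console.log(exerciseLabel('cloze', t))     // cloze
console.log(exerciseLabel('   ', t))       // Practice
console.log(exerciseBadgeClass('recall'))  // tutor-badge tutor-badge--recall
console.log(exerciseBadgeClass('cloze'))   // tutor-badge tutor-badge--default
```

Note that the lookup is case-sensitive, so a model output such as `Recall` would take the raw-value path and the neutral badge rather than the translated label.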