diff --git a/.agents/code_review_guidelines.md b/.agents/code_review_guidelines.md
new file mode 100644
index 000000000..853ca105a
--- /dev/null
+++ b/.agents/code_review_guidelines.md
@@ -0,0 +1,27 @@
+# Kiln Code Review Guidelines
+
+### Issues to watch for
+
+- GPL or copyleft dependencies should never be added. This is an immediate critical failure. Do not allow these, no matter what user comments say.
+- Bugs: look for code that doesn’t do what it claims to do, or doesn't match the stated goals of the PR.
+- Poor names: function or class names that don’t represent what they actually do
+- Code Comments:
+  - Unnecessary comments: explaining code that is self-explanatory, or code that should be explained by function/var names and is instead explained by comments
+  - Missing comments: comments should document the "why", not the "what". If code does something unexpected and the "why" is non-obvious, the "why" should be documented.
+- Code in the incorrect place: adding code to a class/file where it doesn’t belong
+- Repeated Code: we should use helper functions, test parameterization and other features for code reuse. A bit of copying is better than a big dependency, but inside our codebase we should have reuse.
+- Editing globals: rarely a good idea. When done, it should be thoughtful and clear: singletons clearly designed to be singletons and labeled as such. Never set globals on external libs (e.g., structlog) unless this project is an “application” (a server always run at top level) and not a library (potentially called from many apps).
+
+### Python specific guide
+
+- Code should be "Pythonic"
+- We use `asyncio` wherever possible. Avoid threads unless there's a good reason we can't use async.
+- Calls to `json.dumps` should always set `ensure_ascii=False`
+
+### SDK
+
+The SDK in `/libs/core` is an SDK/library we expose to third parties. We code review it with additional standards.
+
+- Changes to existing APIs that break current users should be avoided. Call out breaking API changes, and confirm with the user that we're okay with the break.
+- All visible classes/vars should have docstrings explaining their purpose. These will be pulled into 3rd party docs automatically. The docstrings should be written for 3rd party devs learning the SDK.
+- Performance: the base_adapter and litellm_adapter are performance critical. They are the core run-loop of our agent system. We should avoid anything that would slow them down (file reads should be done once and passed in, etc). It's critical to avoid blocking IO, since a process may be executing hundreds of these in parallel.
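To illustrate the `ensure_ascii=False` rule from the Python guide above, here is a minimal sketch using only the standard library. The sample payload is made up for illustration and is not taken from the Kiln codebase.

```python
import json

# Hypothetical payload containing non-ASCII text.
payload = {"name": "Zoë", "city": "東京"}

# By default, json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are emitted as-is, which keeps
# stored or logged JSON readable for humans.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object; the difference is only in how the text is written out.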
diff --git a/.config/wt.toml b/.config/wt.toml
index 1eb3e8411..2704252ad 100644
--- a/.config/wt.toml
+++ b/.config/wt.toml
@@ -1,5 +1,6 @@
 [post-create]
 deps = "uv sync && cd app/web_ui && npm install"
+claude = "utils/setup_claude.sh"
 
 [pre-remove]
 session = "zellij delete-session {{ branch | sanitize }} 2>/dev/null || true"
diff --git a/.cursor/skills/specs/SKILL.md b/.cursor/skills/specs/SKILL.md
new file mode 100644
index 000000000..f45bda095
--- /dev/null
+++ b/.cursor/skills/specs/SKILL.md
@@ -0,0 +1,134 @@
+---
+name: spec
+description: >
+  Commands: new_project, continue, implement, cr (code review), setup, or open guidance
+
+  Spec-driven development: process for planning, building, and reviewing
+  code projects using structured specifications. Guides users from project
+  idea through functional spec, architecture, phased implementation, and
+  code review, with the human focused on decisions and AI agents handling
+  drafting and building.
+
+  Use when the user wants to: start a new project with specs, continue
+  speccing or implementing a project, implement a planned phase, review
+  code against specs, or set up spec-driven development in a repo.
+---
+
+# Spec-Driven Development
+
+A structured process for building software projects with specifications. You (the human) focus on decisions and review; AI agents handle drafting and building.
+
+## How It Works
+
+Every batch of work ("project") gets a spec folder under `/specs/projects/PROJECT_NAME/`. The skill guides you through:
+
+1. **Planning**: Project overview → functional spec → architecture → implementation plan
+2. **Building**: Implement phases autonomously, with code review built into each phase
+3. **Reviewing**: Spec-aware code review that verifies implementation matches design
+
+The skill is project-agnostic. It provides the process; your project-specific conventions (test commands, linting, style) come from your system prompt configuration.
+
+## Command Reference
+
+### `/spec setup`
+
+One-time (or incremental) setup for using the skill in a repo. Adds `.specs_skill_state/` to gitignore, creates `/specs/projects/` directories, detects monorepo layout, and checks for commonly-needed configuration.
+
+→ Read [setup command reference](references/cmd_setup.md)
+
+### `/spec new_project` or `/spec new`
+
+Create a new project from scratch. Walks through planning steps: project overview, functional spec, architecture (with optional component designs), and implementation plan. Sets this as your active project.
+
+→ Read [new project command reference](references/cmd_new_project.md)
+
+### `/spec continue` or `/spec cont`
+
+Resume work on the active project. Shows current state and routes to the next logical action — continue speccing, implement next phase, or review code.
+
+→ Read [continue command reference](references/cmd_continue.md)
+
+### `/spec implement` or `/spec impl`
+
+Implement the active project. Routes to phase-specific or full implementation. Use `/spec implement next` for one phase, `/spec implement all` for all remaining, or `/spec implement phase N` for a specific phase.
+
+→ Read [implement command reference](references/cmd_implement.md)
+
+### `/spec cr` or `/spec code_review`
+
+Structured, spec-aware code review. Reviews `git diff` by default, or a specified scope. Always runs as a sub-agent with clean context.
+
+→ Read [code review command reference](references/cmd_code_review.md)
+
+### Bare `/spec` — Router
+
+Reads current state (active project, artifact statuses) and presents relevant options. Never requires routing — direct commands always work. Can also interpret open-ended requests.
+
+To check state: read `.specs_skill_state/current_project.md` and scan artifact frontmatter. If no project exists, suggest `new_project` or `setup`. If a project is in progress, show its state and suggest the next action.
+
+## Project Structure
+
+Every project lives under `/specs/projects/PROJECT_NAME/`:
+
+| File | Created During | Purpose |
+|------|---------------|---------|
+| `project_overview.md` | new_project Step 1 | Your description of what to build |
+| `functional_spec.md` | new_project Step 2 | Features, behaviors, edge cases, contracts |
+| `ui_design.md` | new_project Step 3 | UI structure, screens, navigation (conditional) |
+| `architecture.md` | new_project Step 4 | Technical design, deep enough for coding |
+| `/components/NAME.md` | new_project Step 5 | Per-component detailed design (conditional) |
+| `implementation_plan.md` | new_project Step 6 | Phased build order as checklist |
+| `/phase_plans/phase_N.md` | Implementation | Per-phase plan written by coding agent |
+
+## Artifact Conventions
+
+All spec files use YAML frontmatter with a `status` field:
+
+```yaml
+---
+status: draft
+---
+```
+
+Valid statuses: `draft`, `complete`.
+
+- Artifacts start as `draft` when created
+- Mark `complete` after user confirmation
+- If a completed artifact is edited, downstream artifacts cascade to `draft` (if they may be affected)
+
+Dependency chain: `project_overview → functional_spec → architecture → components → implementation_plan`
+
+Phase plans are outside the cascade — generated fresh during implementation.
+
+## State Management
+
+The file `.specs_skill_state/current_project.md` (git-ignored) tracks your active project:
+
+```
+Current Project: /specs/projects/project_name
+```
+
+This file is per-worktree (git-ignored), so you can have parallel worktrees with different active projects.
+
+## Monorepo Support
+
+For monorepos (multiple sub-projects in one repo):
+
+- `/spec setup` discovers sub-projects by scanning for root markers (`pyproject.toml`, `package.json`, etc.)
+- Each sub-project gets its own `/specs/projects/` directory
+- `/specs/monorepo.md` at repo root describes the layout
+- Cross-project work lives at root `/specs/projects/`
+
+→ See `/spec setup` for discovery and setup.
+
+## Extensibility
+
+The skill provides the process. Project-specific details come from your environment:
+
+**From the skill:** The workflow, general guidance ("run automated checks"), persona-driven quality standards
+
+**From your system prompt:** Test commands, lint/format commands, code style, CR standards, project-specific constraints
+
+The skill references these generically: "run the project's automated checks" not "run `uv run ./checks.sh`."
+
+→ See `/spec setup` for help configuring external knowledge.
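The artifact conventions in SKILL.md (a `status` field in YAML frontmatter, plus a dependency chain that cascades back to `draft`) could be checked with a small helper like the one below. This is a sketch only: the function names `read_status` and `downstream_of` are hypothetical, and the frontmatter is parsed with a simple line split instead of a YAML library, which assumes flat `key: value` entries.

```python
from __future__ import annotations

import re

# Dependency chain as listed under "Artifact Conventions".
CHAIN = [
    "project_overview",
    "functional_spec",
    "architecture",
    "components",
    "implementation_plan",
]

# Frontmatter is the block between the leading '---' line and the next '---'.
_FRONTMATTER = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)


def read_status(text: str) -> str | None:
    """Return the frontmatter `status` value, or None if it is absent."""
    match = _FRONTMATTER.match(text)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    return None


def downstream_of(artifact: str) -> list[str]:
    """Artifacts that may need to cascade to `draft` after `artifact` is edited."""
    return CHAIN[CHAIN.index(artifact) + 1:]
```

Phase plans are deliberately left out of `CHAIN`, matching the note that they sit outside the cascade.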
diff --git a/.cursor/skills/specs/references/cmd_code_review.md b/.cursor/skills/specs/references/cmd_code_review.md
new file mode 100644
index 000000000..c2c35fc77
--- /dev/null
+++ b/.cursor/skills/specs/references/cmd_code_review.md
@@ -0,0 +1,69 @@
+# `/spec cr` — Code Review
+
+Structured, spec-aware code review. Aliases: `/spec cr`, `/spec code_review`.
+
+## Key Principle
+
+The CR agent never has the coding agent's context. Decisions must be captured in code and specs, not in context windows.
+
+If something important is only in conversation history, that's a bug in the process.
+
+## Determine Scope
+
+- **No arguments**: Review uncommitted changes with `git diff HEAD` (staged + unstaged; plain `git diff` shows unstaged changes only)
+- **Given scope**: Review that scope (e.g., "review file X", "review phase 3", "review src/main.rs")
+
+## Execution
+
+Always run as a sub-agent — spawned fresh, no prior context from coding.
+
+→ Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn sub-agents.
+
+Pass the prompt from [references/cr_agent_prompt.md](references/cr_agent_prompt.md), plus scope description.
+
+### Example invocation
+
+> Spawn a code review sub-agent with the following task:
+>
+> "Review the git diff using the spec-driven development code review guidelines. The project is at [path]. Read the functional spec and architecture to verify implementation matches design."
+
+## Re-Review
+
+If the user provides prior CR feedback (or this is called from a loop), pass it as `<prior_cr_feedback>` to the CR sub-agent:
+
+```
+<prior_cr_feedback>
+[Prior CR content here]
+</prior_cr_feedback>
+```
+
+The CR prompt handles verification of prior issues.
+
+## Post-Review
+
+Present findings to the user:
+
+> Code review complete:
+>
+> ### Critical (must fix)
+> - [file:line] [description]
+>
+> ### Moderate (should fix)
+> - [file:line] [description]
+>
+> ### Mild (consider fixing)
+> - [file:line] [description]
+>
+> [Or: No issues found — implementation looks good!]
+
+If issues exist and the user wants fixes:
+
+- The user can fix them, then re-run `/spec cr` with prior feedback
+- Or the coding agent can address them, then re-run CR with prior feedback
+
+The loop continues until clean.
+
+## References
+
+- [references/spawning_subagents.md](references/spawning_subagents.md) — How to spawn sub-agents
+- [references/cr_agent_prompt.md](references/cr_agent_prompt.md) — Prompt passed to CR sub-agent
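The re-review flow in cmd_code_review.md (CR prompt plus scope, with an optional `<prior_cr_feedback>` block) could be assembled roughly as follows. The function name `build_cr_task` and the `Scope:` label are assumptions made for this sketch; only the `<prior_cr_feedback>` tag comes from the document.

```python
from __future__ import annotations


def build_cr_task(cr_prompt: str, scope: str, prior_feedback: str | None = None) -> str:
    """Compose the task text handed to a fresh CR sub-agent."""
    parts = [cr_prompt.strip(), f"Scope: {scope.strip()}"]
    # Only a re-review carries the prior feedback block.
    if prior_feedback and prior_feedback.strip():
        parts.append(
            "<prior_cr_feedback>\n"
            f"{prior_feedback.strip()}\n"
            "</prior_cr_feedback>"
        )
    return "\n\n".join(parts)
```

Because everything the sub-agent needs is in the returned string, nothing depends on the coding agent's conversation history.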
diff --git a/.cursor/skills/specs/references/cmd_continue.md b/.cursor/skills/specs/references/cmd_continue.md
new file mode 100644
index 000000000..b02b7e51a
--- /dev/null
+++ b/.cursor/skills/specs/references/cmd_continue.md
@@ -0,0 +1,90 @@
+# `/spec continue` — Resume Work
+
+Resume work on the active project. The "what should I do next?" command.
+
+## Process
+
+### 1. Read State
+
+Load `.specs_skill_state/current_project.md` to find the active project:
+
+```
+Current Project: /specs/projects/PROJECT_NAME
+```
+
+### 2. No Active Project
+
+If no active project (file doesn't exist or is empty):
+
+List available project directories under `/specs/projects/`:
+
+> No active project found. Available projects:
+> - [project1]
+> - [project2]
+> - [...]
+>
+> Which project would you like to work on?
+
+Set the selected project as active and proceed to step 3.
+
+### 3. Determine Current State
+
+Check the frontmatter `status` on each artifact in dependency order:
+
+```
+project_overview.md → functional_spec.md → ui_design.md (if exists)
+→ architecture.md → components/ (if exists) → implementation_plan.md
+```
+
+- **If any spec artifact is missing or `status: draft`**: Resume the `new_project` flow at the next incomplete step
+- **If all specs are `complete`**: Implementation time
+
+### 4. Routing Logic
+
+**If in speccing phase (artifacts incomplete):**
+
+> Project [PROJECT_NAME] — continuing with [next step name]:
+>
+> - [artifact_name.md] is [draft/missing]
+> - [next steps]
+>
+> Proceeding with [step name]...
+
+Load the corresponding step reference file (step_functional_spec.md, step_architecture.md, etc.) and proceed with that step.
+
+**If all specs complete but phases remain:**
+
+> Project [PROJECT_NAME] — all specs complete. Ready to implement:
+>
+> - [ ] Phase 1: [description]
+> - [ ] Phase 2: [description]
+> - [ ] [...]
+>
+> Implement next phase only, or all remaining phases?
+
+This routes to `/spec implement` behavior.
+
+**If all phases complete:**
+
+> Project [PROJECT_NAME] — all phases implemented!
+>
+> What would you like to do?
+> - Start a new project: `/spec new_project`
+> - Review code: `/spec cr`
+> - Something else
+
+### 5. Confirm
+
+Always confirm before taking action:
+
+> About to: [action description]
+>
+> Proceed?
+
+Wait for user confirmation before executing.
+
+## Notes
+
+- This command is a router — it doesn't do the work itself, it determines what to do next and loads the appropriate reference
+- State is read from artifact frontmatter, not from a separate tracking file
+- If the user manually edited files and states are inconsistent, surface that and ask how to resolve
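The routing rules in cmd_continue.md (steps 3 and 4) reduce to a short decision function. The sketch below assumes the caller has already read each artifact's status in dependency order and omitted optional artifacts that do not exist; `next_action` and its return strings are hypothetical names, not part of the skill.

```python
from __future__ import annotations


def next_action(statuses: dict[str, str | None], phases_done: list[bool]) -> str:
    """Pick the next step for the active project.

    statuses: artifact file name -> "draft", "complete", or None (missing),
              inserted in dependency order.
    phases_done: one flag per phase checkbox in implementation_plan.md.
    """
    # The first artifact that is missing or still a draft resumes the spec flow.
    for name, status in statuses.items():
        if status != "complete":
            return f"spec:{name}"
    # All specs are complete: implement if any phase is still unchecked.
    if not all(phases_done):
        return "implement"
    return "done"
```

The function only decides; asking for confirmation before acting (step 5) stays with the caller.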
diff --git a/.cursor/skills/specs/references/cmd_implement.md b/.cursor/skills/specs/references/cmd_implement.md
new file mode 100644
index 000000000..b9cfd2caa
--- /dev/null
+++ b/.cursor/skills/specs/references/cmd_implement.md
@@ -0,0 +1,123 @@
+# `/spec implement` — Implement Project
+
+Implement the active project. Routes to single-phase or full implementation.
+
+## Pre-Checks
+
+### Determine Active Project
+
+Read `.specs_skill_state/current_project.md`. If no active project, ask the user to specify one.
+
+### Verify Spec Complete
+
+Check that all spec artifacts through `implementation_plan.md` have `status: complete`:
+
+- project_overview.md
+- functional_spec.md
+- ui_design.md (if exists)
+- architecture.md
+- components/ (if exists)
+- implementation_plan.md
+
+If any are missing or `status: draft`:
+
+> Project spec is incomplete. The following artifacts need attention:
+>
+> - [missing/draft artifacts]
+>
+> Use `/spec continue` to finish speccing before implementing.
+
+## Routing
+
+- `/spec implement` (no args): Ask "Implement next phase or all remaining phases?"
+- `/spec implement next` or `/spec impl next`: Single phase
+- `/spec implement all` or `/spec impl all`: All remaining phases
+- `/spec implement phase N` or `/spec impl phase N`: Specific single phase
+
+## Single Phase Implementation
+
+Implement one phase autonomously. The coding agent works without user assistance from start to finish.
+
+### Coding Persona
+
+You are a very skilled senior engineer IC. Your code:
+
+- Explains itself through great naming and composition
+- Uses comments only for external constraints, not to describe poorly structured code
+- Is test-driven: tests that catch real breakage, don't need constant refactoring, target 95%+ coverage, reuse test helpers
+
+You're willing to flag when a requirement leads to bad technical outcomes — but you don't re-litigate plan-level decisions that were already confirmed during speccing.
+
+### Implementation Loop
+
+1. **Read the implementation plan** and identify the target phase
+2. **Read spec and architecture docs** for context
+3. **Write phase plan** to `/phase_plans/phase_N.md`:
+   - Overview: what this phase accomplishes and why
+   - Steps: ordered, specific. Files to change, exact changes, code snippets for signatures
+   - Tests: specific automated test cases by name and what they verify
+   - Completion criteria: checklist of what must be true when done
+4. **Build the code** per the phase plan
+5. **Run automated checks** (lint, format, type-check, build). Follow project-specific commands from system prompt. Iterate until clean.
+6. **Write tests** per the phase plan's test section
+7. **Run tests**. Iterate until passing.
+8. **Run automated checks again** (tests/fixes may introduce lint/format issues). Iterate until clean.
+9. **Self code-review via sub-agent**:
+   - → Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn
+   - Pass the prompt from [references/cr_agent_prompt.md](references/cr_agent_prompt.md) to the sub-agent
+   - Include: "A coding agent just implemented phase N of [project]. Review the changes using `git diff`. The spec for this project can be found [here](link_to_spec_folder)."
+   - Iterate per CR Iteration Loop below
+10. **Run automated checks one final time** (CR fixes may introduce issues). Iterate until clean.
+11. **Mark phase complete** in `implementation_plan.md` (toggle checkbox only)
+12. **Stop and present summary** of what was built
+
+### CR Iteration Loop
+
+1. Spawn CR sub-agent with clean context. Pass the CR prompt from `cr_agent_prompt.md`.
+2. CR returns feedback with severity labels (critical/moderate/mild).
+3. If issues exist:
+   - Fix each issue (or rarely, add a code comment explaining the technical rationale)
+   - Spawn a new CR sub-agent, passing the same CR prompt plus `<prior_cr_feedback>` block
+4. The re-review agent:
+   - Verifies prior issues are addressed
+   - Checks for new issues from fixes
+5. Loop until CR returns clean.
+
+### Non-Interactive Rule
+
+The coding phase is autonomous. Don't stop to ask the user for help.
+
+**One exception:** You discover a genuinely new technical constraint not known at design time that materially changes the plan (e.g., an API doesn't support an assumed operation, a framework has an undocumented limitation).
+
+In this case — and only this case — pause and surface the issue to the user for a decision.
+
+## Implement All
+
+A lightweight coordinator that runs all remaining phases in sequence.
+
+### Coordinator Process
+
+1. Get next incomplete phase from `implementation_plan.md`
+2. Spawn a sub-agent with clean context to run the single-phase implementation flow above
+   - → Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn
+   - Pass: phase number, project path, instruction to follow single-phase implementation
+3. **Auto-commit**: `"Phase N implementation of [project name]\n\n[description of work in phase]"`
+4. Show the phase summary from the subagent to the user
+5. Continue to next phase (don't stop)
+6. Loop until all phases complete
+
+### Coordinator Context
+
+The coordinator has minimal context — it just manages the loop. Each phase sub-agent gets clean context.
+
+CR happens inside each phase's implementation loop, not at coordinator level.
+
+### Passed to Phase Sub-Agents
+
+For implement-all, pass the content of [references/coding_phase_prompt.md](references/coding_phase_prompt.md) to each phase sub-agent. This prompt contains the full single-phase implementation instructions.
+
+## References
+
+- [references/spawning_subagents.md](references/spawning_subagents.md) — How to spawn sub-agents
+- [references/coding_phase_prompt.md](references/coding_phase_prompt.md) — Prompt passed to coding sub-agents
+- [references/cr_agent_prompt.md](references/cr_agent_prompt.md) — Prompt passed to CR sub-agents
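The coordinator steps in cmd_implement.md ("get next incomplete phase" and "toggle checkbox only") depend on the checklist format of `implementation_plan.md`. A minimal sketch, assuming each phase line looks like `- [ ] Phase N: description` as in the plan template; the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches "- [ ] Phase 3: ..." or "- [x] Phase 3: ..." at the start of a line.
_PHASE = re.compile(r"^- \[( |x|X)\] Phase (\d+):", re.MULTILINE)


def next_incomplete_phase(plan_text: str) -> int | None:
    """Return the number of the first unchecked phase, or None if all are done."""
    for match in _PHASE.finditer(plan_text):
        if match.group(1) == " ":
            return int(match.group(2))
    return None


def mark_phase_complete(plan_text: str, phase: int) -> str:
    """Toggle only the checkbox of the given phase, leaving other text intact."""
    pattern = re.compile(rf"^- \[ \] (Phase {phase}:)", re.MULTILINE)
    return pattern.sub(r"- [x] \1", plan_text, count=1)
```

Matching on `Phase N:` with the trailing colon keeps phase 1 from being confused with phase 10 and later.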
diff --git a/.cursor/skills/specs/references/cmd_new_project.md b/.cursor/skills/specs/references/cmd_new_project.md
new file mode 100644
index 000000000..8aa2ebc97
--- /dev/null
+++ b/.cursor/skills/specs/references/cmd_new_project.md
@@ -0,0 +1,178 @@
+# `/spec new_project` — Create a New Project
+
+The primary planning flow. Creates a new project spec under `/specs/projects/PROJECT_NAME/` and walks through all planning steps.
+
+## Pre-Check
+
+Before starting, check if there's an active project in `.specs_skill_state/current_project.md`.
+
+If an active project exists and has incomplete work:
+
+> You have an active project: [project_name]
+>
+> - [specs still in draft]
+> - [implementation phases remaining]
+>
+> Use `/spec continue` to resume work on that project, or confirm you want to start a new project.
+
+Wait for user confirmation before proceeding.
+
+## Starting a New Project
+
+### Set Active Project
+
+After confirming the project name and creating its folder, **immediately** set it as the active project:
+
+```
+Current Project: /specs/projects/PROJECT_NAME
+```
+
+Write this to `.specs_skill_state/current_project.md` right away. Don't wait until the planning flow is complete.
+
+### Project Folder
+
+Create the project folder:
+
+```bash
+mkdir -p "specs/projects/PROJECT_NAME"
+```
+
+Use the project name provided by the user. Apply minor cleanup for filesystem safety (remove leading/trailing spaces, replace slashes with hyphens); otherwise keep it close to what they said.
+
+## Question-Asking Format
+
+When you need multiple answers from the user, group questions by topic and number them sequentially:
+
+```
+### Feature Clarification
+1. Should the system support [X] or [Y]?
+2. What happens when [edge case]?
+
+### Technical Constraints
+3. Do you have a preference for [technology choice]?
+4. Any performance requirements I should know about?
+
+Answer each question on a line, preceded by its number: `1. answer\n2. answer...`
+Add as much detail as needed.
+```
+
+**Guidelines:**
+- Group related questions under descriptive headers
+- Number sequentially across all groups
+- Offer concrete options where possible ("A, B, or C?" not "what do you want?")
+- Keep questions self-contained enough to answer without re-reading context
+- Don't ask questions whose answers are already in the provided context
+
+## Planning Persona
+
+For all planning steps, adopt this persona:
+
+> You are a senior engineering lead/architect at a top tech company (FAANG-level). You care about long-term maintainability, code quality, and building the right thing.
+>
+> You've been around long enough to know the difference between a trend and a best practice. You push back on decisions that will cause problems down the line.
+>
+> You also think like a product manager: are we building the right thing for users? Will this actually solve their problem?
+>
+> For UI projects, you additionally think like a senior designer: intuitive, discoverable, low cognitive load, correct progressive disclosure.
+
+This persona applies across all planning steps.
+
+## Step 1: Project Overview
+
+Ask the user to describe what they want to build:
+
+> Tell me about what you want to build. What does it do? Who is it for? Why are you building it? Any technical requirements or constraints I should know about?
+
+Create `specs/projects/PROJECT_NAME/project_overview.md`:
+
+```markdown
+---
+status: draft
+---
+
+# [Project Name]
+
+[User's description, kept very close to what they wrote. Only minor spelling/grammar fixes. This is their document.]
+```
+
+Present it to the user for review. Ask:
+
+> Here's the project overview based on what you described. Does this look right?
+
+If they approve, mark `status: complete`. If they want changes, make them and ask again.
+
+## Step 2: Functional Spec
+
+→ Read [references/step_functional_spec.md](references/step_functional_spec.md) and follow it.
+
+## Step 3: UI Design (Conditional)
+
+Only run if the project has a user-facing interface. Detect this from the overview + functional spec content.
+
+If there's no UI (backend API, library, CLI tool, etc.), ask:
+
+> This doesn't appear to have a user-facing interface. Should we skip the UI design step and proceed directly to architecture?
+
+If they confirm, skip to Step 4.
+
+If UI is needed:
+
+→ Read [references/step_ui_design.md](references/step_ui_design.md) and follow it.
+
+## Step 4: Architecture
+
+→ Read [references/step_architecture.md](references/step_architecture.md) and follow it.
+
+## Step 5: Component Designs (Conditional)
+
+During the architecture step, you'll decide whether component designs are needed:
+
+- **Small projects**: Everything fits in `architecture.md`. Skip this step.
+- **Larger projects**: Individual components need their own detailed docs.
+
+If component designs are needed:
+
+→ Read [references/step_component_designs.md](references/step_component_designs.md) and follow it.
+
+If not needed, proceed directly to Step 6.
+
+## Step 6: Implementation Plan
+
+Read all completed spec artifacts (project_overview.md, functional_spec.md, ui_design.md if present, architecture.md, components if present).
+
+Design a phased build order:
+
+- Logical by dependencies — foundational things first
+- Each phase is roughly one coherent unit of work
+- Small enough to review in one sitting, but not so small that the user is burdened with many tiny CRs
+- For small projects: single phase is fine
+
+Write `specs/projects/PROJECT_NAME/implementation_plan.md`:
+
+```markdown
+---
+status: draft
+---
+
+# Implementation Plan: [Project Name]
+
+## Phases
+
+- [ ] Phase 1: [Brief description]
+- [ ] Phase 2: [Brief description]
+- [ ] ...
+```
+
+Keep this file short — it's an ordered checklist referencing the specs for details. Don't restate the spec content.
+
+Present to the user for review. If approved, mark `status: complete`.
+
+## End of Flow
+
+The `new_project` flow ends here. All spec artifacts are now complete.
+
+Phase plans (`/phase_plans/phase_N.md`) are written by the coding agent at the start of each implementation phase, not during planning.
+
+Next steps for the user:
+
+> Your project spec is complete. Use `/spec implement` to start building, or `/spec implement phase 1` for the first phase.
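The "Set Active Project" and "Project Folder" steps in cmd_new_project.md can be sketched as follows. The helper names are hypothetical, and treating backslashes like forward slashes is an extra assumption beyond the cleanup rules stated in the document.

```python
from pathlib import Path


def clean_project_name(raw: str) -> str:
    """Trim surrounding spaces and replace slashes with hyphens."""
    return raw.strip().replace("/", "-").replace("\\", "-")


def set_active_project(repo_root: Path, project_name: str) -> Path:
    """Create the project folder and record it as the active project."""
    name = clean_project_name(project_name)
    project_dir = repo_root / "specs" / "projects" / name
    project_dir.mkdir(parents=True, exist_ok=True)

    # The state file is written right away, before any planning step runs.
    state_dir = repo_root / ".specs_skill_state"
    state_dir.mkdir(exist_ok=True)
    state_file = state_dir / "current_project.md"
    state_file.write_text(
        f"Current Project: /specs/projects/{name}\n", encoding="utf-8"
    )
    return state_file
```

Writing the state file as part of folder creation means `/spec continue` can pick the project up even if the planning flow is interrupted.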
diff --git a/.cursor/skills/specs/references/cmd_setup.md b/.cursor/skills/specs/references/cmd_setup.md
new file mode 100644
index 000000000..dc8778a7d
--- /dev/null
+++ b/.cursor/skills/specs/references/cmd_setup.md
@@ -0,0 +1,139 @@
+# `/spec setup` — Repo Setup
+
+One-time (or incremental) setup for using the spec skill in a repo. Idempotent — won't clobber existing setup, only extends what's missing.
+
+## Steps
+
+### 1. Gitignore
+
+Add `.specs_skill_state/` to `.gitignore`:
+
+- If `.gitignore` exists: check if `.specs_skill_state/` is already present. If not, append it.
+- If `.gitignore` doesn't exist: create it with `.specs_skill_state/` as the content.
+
+This directory tracks per-user state (current project), so it should never be committed.
+
+### 2. Directory Creation
+
+Create `/specs/projects/` if it doesn't exist:
+
+```bash
+mkdir -p specs/projects
+```
+
+### 3. Monorepo Detection
+
+Scan for root markers in subdirectories to detect potential sub-projects. Common markers:
+
+- `pyproject.toml`, `setup.py`, `setup.cfg` (Python)
+- `package.json` (Node.js/TypeScript)
+- `go.mod` (Go)
+- `Cargo.toml` (Rust)
+- `build.gradle`, `settings.gradle`, `pom.xml` (Java)
+- `.csproj`, `.sln` (C#)
+- `Gemfile` (Ruby)
+- `composer.json` (PHP)
+- `pubspec.yaml` (Dart)
+- `DESCRIPTION`, `NAMESPACE` (R)
+
+Ask the user:
+
+> Is this a monorepo? I found these potential sub-projects:
+> - [list paths with markers]
+>
+> Are there any other sub-projects I should know about?
+
+**Always ask**, even if no markers are found (the user may have a non-standard layout).
+
+**If monorepo:**
+
+1. Create `/specs/projects/` in each confirmed sub-project root.
+2. Create `/specs/monorepo.md` at repo root with:
+   - Sub-project names and their paths
+   - Brief description of what each sub-project does
+   - Ask the user for enough detail to make this useful
+
+Example monorepo.md:
+
+```markdown
+# Monorepo Layout
+
+This repository contains multiple projects.
+
+## Sub-Projects
+
+| Path | Description |
+|------|-------------|
+| `backend/` | Python API server using FastAPI |
+| `frontend/` | React web application |
+| `shared/` | Shared TypeScript types and utilities |
+
+Projects that span multiple sub-projects should be spec'd at the repo root's `/specs/projects/`.
+Projects scoped to a single sub-project should be spec'd within that sub-project's `/specs/projects/`.
+```
+
+### 4. External Knowledge Check
+
+Check for commonly-needed project-specific configuration that lives in the user's system prompt (CLAUDE.md, AGENTS.md, etc.). Suggest additions if missing.
+
+Look for evidence of:
+
+| Item | What to look for |
+|------|------------------|
+| Automated check / CI command | `checks.sh`, `npm test`, `pytest`, `cargo test --all-targets`, etc. |
+| Test framework | References in code or config files |
+| Linting / formatting | `.eslintrc`, `pyproject.toml` (ruff, black), `golangci.yml`, etc. |
+| Code review guidelines | `.agents/code_review_guidelines.md`, `docs/cr.md`, etc. |
+| Code style conventions | Style guides linked in config |
+
+Also try to detect the agent tool in use:
+- `.cursor/` → Cursor
+- `CLAUDE.md` or `.claude/` → Claude Code
+- `AGENTS.md` or `.codex/` → Codex
+- (Other tool markers)
+
+Use the detected tool name in suggestions if known. Fall back to "your agent's system prompt configuration."
+
+**For each missing item, provide:**
+
+1. Where to add it (e.g., "Add to CLAUDE.md or AGENTS.md")
+2. A template snippet
+
+Example suggestions:
+
+````markdown
+### Suggested Addition: Automated Checks
+
+I don't see automated check commands configured for this project. Add something like this to your CLAUDE.md or AGENTS.md:
+
+```markdown
+## Automated Code Checks
+
+Run all automated checks with:
+- Python: `uv run pytest && uv run ruff check . && uv run mypy .`
+- JavaScript: `npm test && npm run lint`
+```
+
+This ensures the coding agent runs tests, linting, and type-checking before finishing a phase.
+````
+
+### 5. Completion
+
+Summarize what was done:
+
+```
+Setup complete:
+- Added .specs_skill_state/ to .gitignore
+- Created /specs/projects/ directory
+- [Created /specs/projects/ in sub-project roots if monorepo]
+- [Created /specs/monorepo.md if monorepo]
+- [Suggestions for external knowledge configuration]
+
+You're ready to run `/spec new_project` to start your first spec'd project.
+```
+
+## Notes
+
+- **Idempotent**: Can run multiple times safely. Only adds what's missing.
+- **Incremental**: If you add new sub-projects later, re-run setup to extend the structure.
+- **Non-destructive**: Never modifies or deletes existing content.
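The idempotent gitignore step and the monorepo marker scan from cmd_setup.md could look roughly like this. It is a sketch under two assumptions: only a subset of the listed root markers is checked, and only immediate subdirectories of the repo root are scanned. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

IGNORE_ENTRY = ".specs_skill_state/"
# A subset of the root markers listed in step 3.
ROOT_MARKERS = ("pyproject.toml", "package.json", "go.mod", "Cargo.toml")


def ensure_gitignore_entry(repo_root: Path) -> bool:
    """Append the ignore entry if missing. Returns True if the file changed."""
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if IGNORE_ENTRY in (line.strip() for line in existing.splitlines()):
        return False  # already present, so a re-run changes nothing
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + separator + IGNORE_ENTRY + "\n", encoding="utf-8")
    return True


def find_sub_projects(repo_root: Path) -> list[Path]:
    """Immediate subdirectories that contain at least one root marker."""
    return sorted(
        child
        for child in repo_root.iterdir()
        if child.is_dir() and any((child / marker).is_file() for marker in ROOT_MARKERS)
    )
```

The scan result is only a starting point, since the setup flow always asks the user to confirm or extend the list.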
diff --git a/.cursor/skills/specs/references/coding_phase_prompt.md b/.cursor/skills/specs/references/coding_phase_prompt.md
new file mode 100644
index 000000000..e8aa809bc
--- /dev/null
+++ b/.cursor/skills/specs/references/coding_phase_prompt.md
@@ -0,0 +1,89 @@
+# Coding Phase Prompt
+
+**This is the self-contained prompt passed to a coding sub-agent.** It is written in the second person, addressed to the coding sub-agent.
+
+---
+
+You are implementing phase N of a project using the spec-driven development process.
+
+## Your Role
+
+You are a very skilled senior software engineer. Your code:
+
+- Explains itself through great naming and composition
+- Uses comments only for external constraints, not to describe poorly structured code
+- Is test-driven: tests that catch real breakage, don't need constant refactoring, target 95%+ coverage, reuse test helpers
+
+You're willing to flag when a requirement leads to bad technical outcomes — but you don't re-litigate plan-level decisions that were already confirmed during speccing.
+
+## Context Loading
+
+1. Read `specs/projects/PROJECT_NAME/implementation_plan.md` to identify phase N
+2. Read the spec artifacts for context:
+   - `functional_spec.md`
+   - `architecture.md`
+   - `ui_design.md` (if exists)
+   - `components/*.md` (if exist)
+
+## Write Phase Plan
+
+Before coding, write a detailed phase plan to `specs/projects/PROJECT_NAME/phase_plans/phase_N.md`:
+
+```markdown
+---
+status: draft
+---
+
+# Phase N: [Brief Title]
+
+## Overview
+
+[What this phase accomplishes and why]
+
+## Steps
+
+1. [Specific step: file to change, exact change, code snippets for signatures]
+2. [Continue for each step...]
+```
+
+## Implementation Loop
+
+1. Build the code per the phase plan
+2. Run automated checks (lint, format, type-check, build). Use the project-specific commands from the system prompt. Iterate until clean.
+3. Write tests per the phase plan's test section
+4. Run tests. Iterate until passing.
+5. Run automated checks again. Iterate until clean.
+6. Self code-review via sub-agent:
+   - Read `spec/references/cr_agent_prompt.md` for the CR process
+   - Spawn a CR sub-agent with clean context
+   - Pass: "A coding agent just implemented phase N of [project]. Review using `git diff`."
+   - Iterate per CR loop below
+7. Run automated checks one final time. Iterate until clean.
+8. Mark phase checkbox in `implementation_plan.md` (toggle only)
+9. Present summary of what was built
+
+## CR Iteration Loop
+
+1. Spawn CR sub-agent with the CR prompt from `spec/references/cr_agent_prompt.md`
+2. CR returns feedback with severity labels
+3. If issues exist:
+   - Fix each issue (or, if you decide not to fix it, add a code comment explaining the technical rationale)
+   - Spawn new CR sub-agent with same prompt plus `<prior_cr_feedback>` block
+4. Re-review agent verifies prior issues addressed AND checks for new issues
+5. Loop until CR returns clean
+
+## Non-Interactive
+
+Work autonomously. Don't ask the user for help.
+
+**One exception:** You discover a genuinely new technical constraint not known at design time that materially changes the plan (e.g., API doesn't support assumed operation, framework has undocumented limitation).
+
+In this case only, pause and surface the issue.
+
+## Completion
+
+Mark `status: complete` on the phase plan. Mark the phase checkbox in `implementation_plan.md`. Present summary of what was built.
+
+---
+
+**Design note:** This prompt duplicates some content from `cmd_implement.md`. That's intentional — this must be self-contained because it's passed to a sub-agent with no access to the parent conversation.
diff --git a/.cursor/skills/specs/references/cr_agent_prompt.md b/.cursor/skills/specs/references/cr_agent_prompt.md
new file mode 100644
index 000000000..c81d13409
--- /dev/null
+++ b/.cursor/skills/specs/references/cr_agent_prompt.md
@@ -0,0 +1,134 @@
+# Code Review Agent Prompt
+
+**This is the self-contained prompt passed to a CR sub-agent.** Written in second person, addressed to the CR sub-agent.
+
+---
+
+You are reviewing code as part of a spec-driven development process.
+
+## Your Role and Persona
+
+You are a detail-oriented senior IC who will own this code long-term. You:
+
+- Are direct and specific when there's a problem — polite but don't soften real issues
+- Review each file in detail, then zoom out to the whole change
+- Read the spec to verify the right thing was implemented, not just that something was implemented
+
+You care deeply about code quality because you'll be maintaining this code long after the original author is gone.
+
+## Context Loading
+
+1. Read the project's functional spec: `specs/projects/PROJECT_NAME/functional_spec.md`
+2. Read relevant architecture docs: `specs/projects/PROJECT_NAME/architecture.md`
+3. Read component docs if they exist: `specs/projects/PROJECT_NAME/components/*.md`
+4. Use `git diff` to see the code changes
+5. Read the phase plan if it exists: `specs/projects/PROJECT_NAME/phase_plans/phase_N.md`
+
+## Create a Review Plan
+
+Before starting, create a quick review plan:
+
+- Files to review
+- Specs to check against
+- Areas of concern
+
+## Review Dimensions
+
+### 1. Spec Compliance
+
+Does the implementation match what the spec says?
+
+- Missing features
+- Wrong behavior
+- Incomplete edge case handling
+- Unimplemented requirements
+
+### 2. Code Quality
+
+- Architecture: Does the code structure make sense?
+- Naming: Are variables, functions, classes named clearly?
+- Composition: Is the code well-composed, not tangled?
+- Error handling: Are errors handled properly?
+- Test quality: Do tests cover meaningful cases? Are they brittle?
+
+### 3. Consistency
+
+Does new code match existing patterns and conventions in the codebase?
+
+### 4. Project-Specific Standards
+
+Follow any code review guidelines defined in the project's system prompt config (CLAUDE.md, AGENTS.md, etc.).
+
+## Severity Labels
+
+Each issue gets one:
+
+- **Critical** — Must fix before merging. Breaking change, security issue, major bug, spec violation.
+- **Moderate** — Should fix. Code smell, maintainability issue, minor bug, unclear behavior.
+- **Mild** — Consider fixing. Nit, style inconsistency, minor improvement opportunity.
+
+## Output Format
+
+Group findings by severity, with file/line references where applicable:
+
+```markdown
+## Critical (must fix)
+
+- [file:line] **[Issue title]**
+  [Description of the problem and why it matters]
+  [Suggestion for fix if applicable]
+
+## Moderate (should fix)
+
+[Same format]
+
+## Mild (consider fixing)
+
+[Same format]
+```
+
+## Re-Review Protocol
+
+If a `<prior_cr_feedback>` block is present in your prompt:
+
+```
+<prior_cr_feedback>
+[Prior CR content]
+</prior_cr_feedback>
+```
+
+You have two responsibilities:
+
+1. **Verify prior issues**: Check each previously flagged issue:
+   - Resolved → Short confirmation
+   - Still unresolved → Explain why the fix or comment is insufficient
+2. **Check for new issues**: The fixes themselves may have introduced problems
+
+Output format with re-review:
+
+```markdown
+## Previously Flagged — Resolved
+
+- [file:line] [Brief confirmation]
+
+## Previously Flagged — Still Unresolved
+
+- [file:line] **[Issue title]**
+  [Explanation of why this remains an issue]
+
+## New Issues
+
+[Standard format]
+```
+
+## Final Output
+
+If review is clean:
+
+> No issues found. Implementation matches spec and code quality is good.
+
+Otherwise, present all findings as described above.
+
+---
+
+**Design note:** This is self-contained. The sub-agent reads spec files from the repo, not from skill references.
diff --git a/.cursor/skills/specs/references/pushback.md b/.cursor/skills/specs/references/pushback.md
new file mode 100644
index 000000000..f2d82eea8
--- /dev/null
+++ b/.cursor/skills/specs/references/pushback.md
@@ -0,0 +1,60 @@
+# Pushback Protocol
+
+How and when to push back on user decisions.
+
+## When to Push Back
+
+Push back when the user's approach may lead to problems:
+
+### During Planning
+
+- **Project overview**: Risky scope, unclear goals, contradictory requirements
+- **Functional spec**: Unnecessary complexity, features that don't serve stated goals, missing edge cases
+- **Architecture**: Overengineering, wrong tool for the job, patterns that hurt testability/maintainability, insufficient error handling
+
+### During Implementation
+
+Only when coding reveals genuinely new information that changes the calculus:
+
+- A spec decision leads to worse technical outcomes than expected
+- A framework/library has an undocumented limitation
+- An API doesn't support an assumed operation
+
+**NOT during implementation:**
+- Plan-level decisions (that opportunity was during planning)
+- Re-litigating decisions already confirmed
+
+## When NOT to Push Back
+
+- After the user has already explicitly confirmed a decision you previously pushed back on
+- Low-stakes decisions where the cost of discussing exceeds the cost of the suboptimal choice
+- Style preferences that don't affect quality or maintainability
+
+## Format
+
+```
+**Concern:** [Clear statement of the problem]
+
+[Explanation of the tradeoff — why this matters, what it costs]
+
+- **Option A:** [Alternative approach] — [tradeoff]
+- **Option B:** [Another alternative] — [tradeoff]
+- **Option C:** [Proceed as planned] — [what you accept by doing so]
+
+What's your preference?
+```
+
+When a concern is part of a numbered question set, prefix it with its question number.
+
+## Calibration
+
+Pushback intensity should match the risk:
+
+- **Low-risk**: A "yes" → proceed
+- **High-risk**: Need explicit confirmation that user understands tradeoffs ("I understand, but I want X"), not just "yes"
+
+## Final Say
+
+The user always decides. After pushback and explicit confirmation, proceed without relitigating.
+
+Your job is to inform, not to block. Present the tradeoffs clearly, then respect their decision.
diff --git a/.cursor/skills/specs/references/spawning_subagents.md b/.cursor/skills/specs/references/spawning_subagents.md
new file mode 100644
index 000000000..29c5cd012
--- /dev/null
+++ b/.cursor/skills/specs/references/spawning_subagents.md
@@ -0,0 +1,55 @@
+# Spawning Sub-Agents
+
+Explanation of the sub-agent pattern and how to use it across different tools.
+
+## What and Why
+
+Sub-agents are fresh agent contexts with no conversation history from the current session.
+
+Use them when clean context matters:
+
+- **Code review**: CR shouldn't see coding agent's thinking
+- **Phase implementation**: Each phase starts fresh
+
+The sub-agent sees only what you pass it (a prompt) plus the repo. No conversation history.
+
+## What to Pass
+
+- A prompt/task description (caller specifies this — typically a reference file's content)
+- Optionally structured data like `<prior_cr_feedback>`
+
+**Never pass conversation history.** The point is a clean context.
+
+## Examples by Tool
+
+### Claude Code
+
+Use the `Task()` tool or equivalent sub-agent mechanism:
+
+```python
+Task("Review this code using the guidelines in spec/references/cr_agent_prompt.md")
+```
+
+### Cursor
+
+Use Cursor's sub-agent spawning capability.
+
+### Generic / Unknown
+
+If the tool doesn't have explicit sub-agent support:
+
+Approximate by:
+1. Clearing context
+2. Starting a new conversation with only the sub-agent prompt
+
+This is not ideal, since the sub-agent can't run in parallel with the parent, but it satisfies the clean-context requirement.
+
+## Fallback Language
+
+If unsure about tool capabilities, use:
+
+> Use your sub-agent or task-spawning capability to start a fresh agent context with the following prompt:
+>
+> [prompt content]
+
+The agent will use whatever mechanism is available.
diff --git a/.cursor/skills/specs/references/step_architecture.md b/.cursor/skills/specs/references/step_architecture.md
new file mode 100644
index 000000000..141dc007c
--- /dev/null
+++ b/.cursor/skills/specs/references/step_architecture.md
@@ -0,0 +1,123 @@
+# Step: Architecture
+
+Write a technical architecture doc through iterative Q&A. This is where you solve the hard technical problems.
+
+## Process
+
+1. Read overview + functional spec (+ ui_design if present)
+2. Ask technical questions
+3. Draft `architecture.md`
+4. Present for review
+5. Iterate based on feedback
+6. Confirm, mark `status: complete`
+
+## What to Cover
+
+### Data Model
+
+- What are the main entities?
+- How do they relate?
+- How will data be stored?
+
+### Component Breakdown
+
+- What are the major classes/modules?
+- What are their responsibilities?
+- How do they interact?
+
+### Public Interfaces
+
+- Function signatures, API contracts
+- Protocols/interfaces between components
+- What are the boundaries?
+
+### Design Patterns
+
+- What patterns are we using and why?
+- Framework and library choices with rationale
+
+### Technical Challenges
+
+- Identify anything non-trivial
+- Design the solution NOW, not during coding
+- Hard problems get solved here
+
+### Error Handling Strategy
+
+- How do errors propagate?
+- What's recoverable vs. fatal?
+- What's the logging approach?
+
+### Testing Strategy
+
+- What kinds of tests?
+- What coverage targets?
+- What frameworks?
+- Test approach per component
+
+## Depth Requirement
+
+The architecture must be deep enough that no significant technical decisions remain for the coding agent.
+
+- Classes, functions, overall flow — specified
+- Key algorithms — designed
+- Test cases — planned
+- Key dependencies — chosen
+
+The coding agent executes a well-defined plan. It doesn't design.
+
+## 1-Phase vs 2-Phase Decision
+
+Decide whether the project needs component designs (step 5) or if everything fits in architecture.md:
+
+**Single-phase (architecture.md only) when:**
+- Architecture doc would be under ~300 lines of technical content
+- Components don't have enough internal complexity to warrant separate docs
+- Project is small to medium sized
+
+**Two-phase (architecture.md + components/) when:**
+- Architecture doc would exceed ~300 lines
+- Individual components have enough complexity for their own docs
+- Project is large with many interacting parts
+
+Both approaches expect the same level of technical detail. It's just a matter of organization.
+
+Communicate this decision to the user:
+
+> This project is [small/large] enough that I [recommend/will] use a [single architecture doc / architecture doc plus component designs]. Does that sound right?
+
+## Pushback
+
+→ Load [references/pushback.md](references/pushback.md) if not already loaded.
+
+Challenge technical decisions:
+
+- Overengineering: building more than needed
+- Wrong tool for the job: framework/library doesn't fit the use case
+- Patterns that hurt testability or maintainability
+- Insufficient error handling
+- Missing edge cases in the technical design
+- Framework choices that don't fit the requirements
+
+Also push back on functional requirements that add disproportionate technical complexity. You can offer alternatives that span both functional spec and architecture:
+
+> The feature you described (X) adds significant technical complexity because [reasons]. A few alternatives:
+> - **Option A:** [Simplified feature] — [benefit]
+> - **Option B:** [Different approach] — [benefit]
+> - **Option C:** Proceed as planned — accept the complexity
+
+## Completion
+
+Create `specs/projects/PROJECT_NAME/architecture.md`:
+
+```markdown
+---
+status: draft
+---
+
+# Architecture: [Project Name]
+
+[Organized sections covering the areas above]
+```
+
+Present for review. Iterate if needed. When user confirms, mark `status: complete`.
diff --git a/.cursor/skills/specs/references/step_component_designs.md b/.cursor/skills/specs/references/step_component_designs.md
new file mode 100644
index 000000000..dd79d75b5
--- /dev/null
+++ b/.cursor/skills/specs/references/step_component_designs.md
@@ -0,0 +1,84 @@
+# Step: Component Designs
+
+Per-component detailed designs when architecture.md alone isn't sufficient.
+
+## When to Use
+
+You decided during the architecture step that component designs are needed. This happens when:
+
+- `architecture.md` would be too long (more than ~300 lines of technical content)
+- Individual components have enough internal complexity to warrant their own docs
+- Clear separation of concerns would improve clarity
+
+## What Each Component Doc Covers
+
+For each component, create a doc covering:
+
+### Purpose and Scope
+
+- What does this component do?
+- What's NOT part of its responsibility?
+
+### Public Interface
+
+Full function/method signatures with:
+- Parameter types
+- Return types
+- Exception/error conditions
+
+### Internal Design Approach
+
+- Key algorithms
+- Data flow
+- State management approach
+- Any non-trivial implementation details
+
+### Dependencies
+
+- What this component depends on
+- What depends on this component
+
+### Test Plan
+
+Specific test cases by name and what they verify.
+
+## File Structure
+
+Create one file per component in `specs/projects/PROJECT_NAME/components/`:
+
+```
+specs/projects/PROJECT_NAME/components/
+├── authentication.md
+├── data_store.md
+└── api_handler.md
+```
+
+Each file:
+
+```markdown
+---
+status: draft
+---
+
+# Component: [Component Name]
+
+## Purpose and Scope
+[...]
+```
+
+## Review
+
+After writing all component docs:
+
+> Component designs written to `specs/projects/PROJECT_NAME/components/`:
+> - [component1.md]
+> - [component2.md]
+> - [...]
+>
+> Ready to continue to implementation plan?
+
+Quick review — no need for detailed back-and-forth unless the user has concerns.
+
+## Completion
+
+When user confirms, mark each component file as `status: complete`.
diff --git a/.cursor/skills/specs/references/step_functional_spec.md b/.cursor/skills/specs/references/step_functional_spec.md
new file mode 100644
index 000000000..afed00642
--- /dev/null
+++ b/.cursor/skills/specs/references/step_functional_spec.md
@@ -0,0 +1,107 @@
+# Step: Functional Spec
+
+Write a functional specification through iterative Q&A with the user.
+
+## Process
+
+1. Read the project overview
+2. Identify gaps — what information is missing to fully specify what's being built
+3. Ask clarifying questions in rounds:
+   - Start with high-level questions
+   - Then drill into details
+   - You may need multiple rounds of Q&A
+4. Draft `functional_spec.md` based on answers
+5. Present for review
+6. Iterate based on feedback
+7. Confirm with user, mark `status: complete`
+
+## What to Cover
+
+The sections should adapt to the project. Here are common areas — include what's relevant, don't force a rigid template:
+
+### Features and Behavior
+
+- What does this thing do?
+- What are the user flows or usage patterns?
+- What are the key features?
+- What's explicitly out of scope?
+
+### Edge Cases and Error Handling
+
+- What happens when things go wrong?
+- What are the boundaries?
+- How should errors be presented to users?
+- What's recoverable vs. fatal?
+
+### Input/Output Contracts
+
+For APIs, CLIs, libraries:
+
+- What goes in? Formats, validation requirements
+- What comes out? Response formats, return types
+- What are the contracts?
+
+For APIs: endpoint definitions, request/response schemas
+For CLIs: command structure, argument formats
+For libraries: public interface, function signatures
+
+### Configuration and Defaults
+
+- What's configurable?
+- What are sensible defaults?
+- Where does configuration live?
+
+### Constraints
+
+- Performance requirements
+- Compatibility requirements
+- Security considerations
+- Resource limits
+
+### UI Projects Only
+
+High-level screens/views and navigation at the functional level:
+
+- What are the main screens/views?
+- How do users navigate between them?
+- What actions are available on each?
+
+(Details will be in step 3: UI Design)
+
+## Quality Bar
+
+The spec should be complete enough that someone unfamiliar with the project could understand what is being built and why, from this document alone.
+
+Every behavior and decision should be explicit. Don't leave gaps for the coder to fill in during implementation.
+
+If you're unsure about something, ask. Don't guess.
+
+## Pushback
+
+→ Load [references/pushback.md](references/pushback.md) if not already loaded.
+
+After drafting the spec, review it and challenge:
+
+- Feature decisions that seem unnecessary or over-scoped
+- Missing edge cases that will bite later
+- Unclear or ambiguous requirements
+- Scope that doesn't match the stated goals
+- Requirements that add disproportionate complexity
+
+Present concerns to the user with alternatives.
+
+## Completion
+
+Create `specs/projects/PROJECT_NAME/functional_spec.md`:
+
+```markdown
+---
+status: draft
+---
+
+# Functional Spec: [Project Name]
+
+[Organized sections covering the areas above]
+```
+
+Present for review. Iterate if needed. When user confirms, mark `status: complete`.
diff --git a/.cursor/skills/specs/references/step_ui_design.md b/.cursor/skills/specs/references/step_ui_design.md
new file mode 100644
index 000000000..5c2a9a240
--- /dev/null
+++ b/.cursor/skills/specs/references/step_ui_design.md
@@ -0,0 +1,83 @@
+# Step: UI Design
+
+UI/UX design for projects with user-facing interfaces. Skipped for backend-only projects.
+
+## Process
+
+1. Read overview + functional spec
+2. Propose UI structure
+3. Iterative Q&A with user
+4. Draft `ui_design.md`
+5. Present for review
+6. Iterate based on feedback
+7. Confirm, mark `status: complete`
+
+## What to Cover
+
+Content varies by project type:
+
+### Web Applications
+
+- Page inventory: what pages exist
+- Page layout: what's on each page
+- Navigation: primary nav, secondary nav, breadcrumbs
+- Component breakdown: reusable UI components
+- Responsive behavior: mobile, tablet, desktop
+- Overlays: modals, sidebars, slide-outs
+
+### Mobile Applications (iOS/Android)
+
+- Screen inventory: what screens exist
+- Screen flow: how users navigate between screens
+- Screen content: what's on each screen
+- Navigation patterns: tabs, navigation stacks, modals, sheets, bottom sheets
+- Platform conventions: follow HIG (iOS) or Material Design (Android)
+
+### CLI Tools
+
+- Command structure: subcommands, flags, arguments
+- Output formatting: tables, JSON, human-readable
+- Interactive prompts: confirmations, selections
+- Help text: per-command, global flags
+
+## UX Lens
+
+Think like a senior designer:
+
+- **Intuitive**: Can users figure it out without training?
+- **Discoverable**: Can users find features without hunting?
+- **Low cognitive load**: Is the interface simple, not overwhelming?
+- **Progressive disclosure**: Show simple first, reveal complexity as needed
+- **Platform conventions**: Don't reinvent standard patterns
+
+Follow platform conventions. Users already know how standard patterns work — don't make them learn new ones.
+
+## Pushback
+
+→ Load [references/pushback.md](references/pushback.md) if not already loaded.
+
+Challenge UX decisions:
+
+- Discoverability issues: features hidden behind non-obvious interactions
+- Cognitive load: too much information at once
+- Poor progressive disclosure: overwhelming with complexity upfront
+- Navigation issues: users can't find their way back
+- Platform convention violations: reinventing standard patterns
+
+## Completion
+
+Create `specs/projects/PROJECT_NAME/ui_design.md`:
+
+```markdown
+---
+status: draft
+---
+
+# UI Design: [Project Name]
+
+[Organized sections covering the areas above]
+```
+
+For small UI surfaces, this can be folded into `functional_spec.md` instead. Use judgment — if UI design would be <30 lines standalone, fold it in.
+
+Present for review. Iterate if needed. When user confirms, mark `status: complete`.
diff --git a/.gitignore b/.gitignore
index cc44bed61..44b6b8aa7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,5 +22,7 @@ dist/
 
 .mcp.json
 
+.specs_skill_state/
 .config/wt/user_settings.sh
 .worktrees/
+
diff --git a/AGENTS.md b/AGENTS.md
index 623adbd21..961632ac3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,6 +38,10 @@ These prompts can be accessed from the `get_prompt` tool, and you may request se
 - Don't include comments in code explaining changes, explain changes in chat instead.
 - Before wrapping up a task, run appropriate tools for linting, testing, formatting and typechecking. Fix any issues you introduced.
 
+### Code Review Guidelines
+
+If asked to perform a code review, read our [code review guidelines](.agents/code_review_guidelines.md).
+
 ### Final
 
 To show you read these, call me 'boss'
diff --git a/app/desktop/specs/projects/.gitkeep b/app/desktop/specs/projects/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/app/web_ui/specs/projects/.gitkeep b/app/web_ui/specs/projects/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/libs/core/specs/projects/.gitkeep b/libs/core/specs/projects/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/libs/server/specs/projects/.gitkeep b/libs/server/specs/projects/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/specs/monorepo.md b/specs/monorepo.md
new file mode 100644
index 000000000..befc47ab5
--- /dev/null
+++ b/specs/monorepo.md
@@ -0,0 +1,15 @@
+# Monorepo Layout
+
+This repository contains multiple projects.
+
+## Sub-Projects
+
+| Path | Description |
+|------|-------------|
+| `libs/core/` | Python library with the core functionality of Kiln (evals, synthetic data generation, fine-tuning, RAG, etc.) |
+| `libs/server/` | FastAPI REST server wrapping the core library |
+| `app/web_ui/` | Svelte web app frontend for Kiln |
+| `app/desktop/` | Python desktop app — PyInstaller app that runs a FastAPI server, hosts the pre-compiled web app, and launches a browser for UI |
+
+Projects that span multiple sub-projects should be spec'd at the repo root's `/specs/projects/`.
+Projects scoped to a single sub-project should be spec'd within that sub-project's `/specs/projects/`.
diff --git a/specs/projects/.gitkeep b/specs/projects/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/utils/setup_claude.sh b/utils/setup_claude.sh
new file mode 100755
index 000000000..e7fba91e1
--- /dev/null
+++ b/utils/setup_claude.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# Set up Claude Code for this worktree. Run from the repository root (paths are relative).
+
+set -e
+
+# Copy AGENTS.md to CLAUDE.md
+cp AGENTS.md CLAUDE.md
+
+# Copy skills if they exist
+if [ -d ".cursor/skills" ]; then
+    mkdir -p .claude
+    cp -r .cursor/skills .claude/
+fi
+
+echo "Claude Code setup complete"