This guide is for the first 20 minutes with Small Harness. It focuses on the top things you can do immediately: try a bundled demo, fix failing tests on your repo, understand a codebase, make a safe edit, and tune Small Harness to the best model available on your machine.
You need Rust and at least one OpenAI-compatible local backend. Ollama is the fastest first path:
brew install ollama
brew services start ollama
ollama pull qwen2.5-coder:7b
Then run Small Harness from the project you want to work in:
cargo run --release
On first run, the setup wizard creates agent.config.json. For the quickest
local setup, choose:
backend: ollama
model override: blank
approval policy: dangerous-only
tool mode: auto
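The quick-setup answers map onto a small typed config. The sketch below is hypothetical: `QuickSetup`, `quick_setup`, and the field names are illustrative. Only the `dangerous-only` and `always` policy strings come from this guide. It shows one reasonable reading of a blank model override: `None`, meaning "use the backend's default model".

```rust
// Hypothetical model of the quick-setup wizard answers; not Small Harness's
// real config schema. Only the policy strings come from this guide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalPolicy {
    DangerousOnly,
    Always,
}

#[derive(Debug, PartialEq)]
struct QuickSetup {
    backend: String,
    model_override: Option<String>, // blank input means "use the backend default"
    approval_policy: ApprovalPolicy,
    tool_mode: String,
}

fn parse_approval(s: &str) -> Option<ApprovalPolicy> {
    match s.trim() {
        "dangerous-only" => Some(ApprovalPolicy::DangerousOnly),
        "always" => Some(ApprovalPolicy::Always),
        _ => None, // other policies are out of scope for this sketch
    }
}

fn quick_setup(backend: &str, model: &str, approval: &str, tools: &str) -> Option<QuickSetup> {
    let model = model.trim();
    Some(QuickSetup {
        backend: backend.trim().to_string(),
        model_override: (!model.is_empty()).then(|| model.to_string()),
        approval_policy: parse_approval(approval)?,
        tool_mode: tools.trim().to_string(),
    })
}

fn main() {
    // The recommended answers from the list above.
    let setup = quick_setup("ollama", "", "dangerous-only", "auto").expect("valid answers");
    println!("{setup:?}");
}
```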
With a 7B coder model pulled, run a bundled demo (no repo setup required):
/play
/play fix-failing-test
Small Harness copies a tiny Rust crate with a failing test into
.sessions/play/, switches to ship mode, and runs the agent live. You approve
edits (or pass --yolo to auto-approve). When it finishes, you get a scorecard
showing whether tests pass.
/play score
/play exit
Then try the same loop on your real project:
/fix
/fix all --attempts 3
/fix --yolo
/fix runs smart-selected tests, loops until they pass (default 5 attempts),
then restores your previous operator mode.
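The /fix behavior described above is a bounded retry loop with mode restoration. Here is a minimal sketch under stated assumptions: `fix_loop`, `Session`, and `Mode` are hypothetical names. The guide does not say which mode /fix switches into, so this sketch assumes a ship-like mode. The tests and agent turns are passed in as closures so the loop logic runs on its own.

```rust
use std::cell::Cell;

// Hypothetical sketch of the /fix loop; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Explore,
    Ship,
}

struct Session {
    mode: Mode,
}

fn fix_loop<T, F>(session: &mut Session, max_attempts: u32, mut tests_pass: T, mut agent_turn: F) -> bool
where
    T: FnMut() -> bool, // runs the smart-selected tests; true when they all pass
    F: FnMut(u32),      // one agent turn that tries to fix the failures
{
    let previous = session.mode;
    session.mode = Mode::Ship; // assumption: the loop runs in a ship-like mode
    let mut passed = tests_pass();
    let mut attempt = 0;
    while !passed && attempt < max_attempts {
        attempt += 1;
        agent_turn(attempt);
        passed = tests_pass();
    }
    session.mode = previous; // restored whether or not the tests pass
    passed
}

fn main() {
    let mut session = Session { mode: Mode::Explore };
    let failures_left = Cell::new(2); // pretend two fixes are needed
    let ok = fix_loop(
        &mut session,
        5, // the default attempt budget mentioned above
        || failures_left.get() == 0,
        |_attempt| failures_left.set(failures_left.get() - 1),
    );
    println!("passed: {ok}, mode: {:?}", session.mode);
}
```

Restoring the mode after the loop, not inside it, guarantees the restore happens on both the success and the give-up paths.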
Compare two local models on the same demo:
/play battle fix-failing-test qwen2.5-coder:7b,deepseek-coder:6.7b
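Model arguments take two forms in this guide: a bare comma-separated list for /play battle, and a `backend:model` form such as `ollama:qwen2.5-coder:7b` for /eval agent. Model tags contain `:` themselves, so splitting on the first colon alone is ambiguous. This hypothetical parser treats a prefix as a backend only when it is a known backend name. `KNOWN_BACKENDS` is an illustrative, incomplete list, not Small Harness's.

```rust
// Illustrative parser for model lists; not Small Harness's implementation.
// KNOWN_BACKENDS is a hypothetical, incomplete list.
const KNOWN_BACKENDS: [&str; 1] = ["ollama"];

#[derive(Debug, PartialEq)]
struct ModelSpec {
    backend: Option<String>,
    model: String,
}

fn parse_spec(raw: &str) -> ModelSpec {
    let raw = raw.trim();
    match raw.split_once(':') {
        // "ollama:qwen2.5-coder:7b" -> backend "ollama", model "qwen2.5-coder:7b"
        Some((prefix, rest)) if KNOWN_BACKENDS.contains(&prefix) => ModelSpec {
            backend: Some(prefix.to_string()),
            model: rest.to_string(),
        },
        // "qwen2.5-coder:7b": the colon belongs to the model tag
        _ => ModelSpec { backend: None, model: raw.to_string() },
    }
}

fn parse_list(list: &str) -> Vec<ModelSpec> {
    list.split(',')
        .filter(|s| !s.trim().is_empty())
        .map(parse_spec)
        .collect()
}

fn main() {
    for spec in parse_list("qwen2.5-coder:7b,deepseek-coder:6.7b") {
        println!("{spec:?}");
    }
    println!("{:?}", parse_spec("ollama:qwen2.5-coder:7b"));
}
```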
Small Harness is most useful when you let it inspect files directly instead of pasting code into chat. Start with a broad map, then ask narrower questions.
Build the local project memory index first:
/index
/index status
/map
Try:
Give me a concise map of this repo. Focus on the entry points, core modules,
and where configuration lives.
Then:
Find the code path for slash commands and explain how a new command should be
added.
Useful commands:
/config show the active backend, model, tools, workspace, and history
/mode explore use a safer read/search preset while learning a repo
/tools show enabled tools and whether adaptive tool selection is on
/context show prompt budget, effective limit, headroom, and auto-guard status
/map show the local project memory repo map
What to look for:
- Small Harness should use read/search/list tools only when needed.
- For repo/code questions, repo_search should help it find likely files fast.
- With toolSelection: "auto", ordinary chat should avoid sending tool schemas.
- The answer should cite concrete files and functions, not just guess.
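The guide does not document how toolSelection "auto" decides when to send tool schemas. To make the idea concrete, here is a toy keyword heuristic. `wants_tools` and `CODE_WORDS` are purely illustrative and are not how Small Harness works. The sketch attaches schemas only when a prompt looks like a repo or code request, so plain chat avoids the extra prompt tokens.

```rust
// Toy heuristic, purely illustrative: not Small Harness's selection logic.
const CODE_WORDS: [&str; 8] = ["file", "files", "repo", "function", "test", "tests", "fix", "error"];

fn wants_tools(prompt: &str) -> bool {
    prompt.split_whitespace().any(|raw| {
        // Path-like tokens such as src/main.rs are a strong signal on their own.
        if raw.contains('/') || raw.contains(".rs") {
            return true;
        }
        // Otherwise compare the bare, lowercased word against the keyword list.
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase();
        CODE_WORDS.contains(&word.as_str())
    })
}

fn main() {
    for prompt in ["Thanks, that helps!", "Fix the failing test in src/lib.rs"] {
        println!("{prompt:?} -> send tool schemas: {}", wants_tools(prompt));
    }
}
```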
Small Harness can edit files, but the best workflow is to ask for a small, reviewable change and let approvals show you exactly what will happen.
Try:
Add a short comment above the function that dispatches slash commands explaining
that new commands should be registered in both COMMANDS and dispatch.
Then inspect what happened:
git diff
cargo test
Useful commands:
/mode edit use edit-focused defaults
/mode ship enable edit + workflow tools; auto-verify tests after edits
/shipcheck show branch drift, dirty files, diff stats, and memory freshness
/ship preview last-mile readiness, blockers, and a commit-message draft
/ship commit --all
/ship push
/ship pr
/ship status
/scorecard show global quality and PRs shipped
/scorecard pr 1 drill into the most recent closed PR audit
/scorecard close "OAuth login PR" --url https://github.com/org/repo/pull/42
/scorecard doctor inspect the local scorecard ledger
/scorecard export copy the raw scorecard JSONL before repair or sharing
/handoff draft commit, changelog, testing, and X-ready release copy
/fusion on switch to OpenRouter Fusion for deliberative coding questions
/fusion tool attach Fusion deliberation to an OpenRouter coding model
/route select choose low/medium/high models for a task from your model stack
/session show current model, approval policy, session file, and tokens
/session title Refactor dispatch command
/sessions search dispatch
/new start a clean conversation
/export current markdown
/export current events copy the session event log sidecar
Transparent mode (see everything the agent did):
/verbose on
/trace on
The event log lives beside each transcript:
.sessions/<session-id>.events.jsonl (tool calls, approvals, compaction,
warmup, per-turn timing summary).
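Because the event log sits beside each transcript with a predictable name, a script can locate it from the transcript path alone. This small helper follows the `.sessions/<session-id>.jsonl` to `.sessions/<session-id>.events.jsonl` naming described above. The function `events_sidecar` itself is a hypothetical convenience, not part of Small Harness.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: derive the event-log sidecar path for a transcript,
// following the naming convention described in this guide.
fn events_sidecar(transcript: &Path) -> Option<PathBuf> {
    if transcript.extension()?.to_str()? != "jsonl" {
        return None;
    }
    let stem = transcript.file_stem()?.to_str()?;
    if stem.ends_with(".events") {
        return None; // already an event log, not a transcript
    }
    Some(transcript.with_file_name(format!("{stem}.events.jsonl")))
}

fn main() {
    let transcript = Path::new(".sessions/abc123.jsonl");
    if let Some(events) = events_sidecar(transcript) {
        println!("{}", events.display());
    }
}
```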
Good habits:
- Ask for one focused change at a time.
- Prefer exact files, functions, or tests when you know them.
- Keep approvalPolicy at dangerous-only or always until you trust a model.
- Use git diff as the source of truth before committing.
Different local models vary a lot. Small Harness can probe model capabilities, cache the results, benchmark latency, and recommend the best cached fit.
Everything lives under /doctor. Start with a hardware-aware recommendation:
/doctor recommend
This reads a safe summary of your Mac, ranks installed/default/cached models for coding-agent use, and shows the top choices. To refresh probes before ranking:
/doctor recommend refresh
Run:
/doctor --deep
/doctor bench
/doctor models
If you have multiple backends running, probe them all:
/doctor models refresh all
Then ask for a recommendation:
/doctor autotune
Apply the recommendation to the current session:
/doctor recommend apply
What Small Harness is checking:
- local chip, architecture, memory, and CPU counts
- model listing
- streaming responses
- usage chunks
- native tool calls
- inline JSON fallback for small models
- first-token latency
- estimated output tokens per second
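First-token latency and throughput come from timestamps on a streamed response. The guide does not say exactly how Small Harness computes them. A common convention, assumed here, is to measure decode throughput as the tokens after the first divided by the time after the first token arrived. `StreamTiming` and `tokens_per_second` are illustrative names.

```rust
use std::time::Duration;

// Illustrative timing record for one streamed benchmark response.
struct StreamTiming {
    first_token: Duration, // request sent -> first streamed token
    total: Duration,       // request sent -> stream finished
    output_tokens: u32,    // e.g. from a usage chunk when the backend sends one
}

/// Decode throughput: tokens after the first, divided by the time after the
/// first token arrived. Returns None when there is too little data.
fn tokens_per_second(t: &StreamTiming) -> Option<f64> {
    let decode_secs = t.total.checked_sub(t.first_token)?.as_secs_f64();
    if decode_secs <= 0.0 || t.output_tokens < 2 {
        return None;
    }
    Some(f64::from(t.output_tokens - 1) / decode_secs)
}

fn main() {
    let run = StreamTiming {
        first_token: Duration::from_millis(400),
        total: Duration::from_millis(2400),
        output_tokens: 101,
    };
    println!("first token: {:?}", run.first_token);
    println!("tokens/s: {:?}", tokens_per_second(&run)); // Some(50.0)
}
```

Separating first-token latency from decode speed matters for agent loops. A model with fast decoding can still feel slow if every turn waits a long time for the first token.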
By default, /doctor recommend prefers local backends. To let OpenRouter
compete with local models, use:
/doctor recommend --cloud
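One simple reading of "prefers local backends" is to exclude cloud candidates unless --cloud is given. This is a hypothetical ranking step; the real /doctor recommend also weighs hardware and other probe results. The sketch drops models whose tool-call probe failed, applies that local-only filter by default, and sorts the survivors by measured throughput. `Candidate`, `rank`, and the sample model names are illustrative.

```rust
// Hypothetical ranking step; not the actual /doctor recommend scoring.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    model: String,
    local: bool,      // e.g. an Ollama model on this machine
    tool_calls: bool, // native tool calls or the inline JSON fallback worked
    tokens_per_sec: f64,
}

fn rank(mut candidates: Vec<Candidate>, allow_cloud: bool) -> Vec<Candidate> {
    candidates.retain(|c| c.tool_calls && (allow_cloud || c.local));
    // Fastest first; total_cmp gives a total order even for NaN.
    candidates.sort_by(|a, b| b.tokens_per_sec.total_cmp(&a.tokens_per_sec));
    candidates
}

fn cand(model: &str, local: bool, tool_calls: bool, tps: f64) -> Candidate {
    Candidate { model: model.to_string(), local, tool_calls, tokens_per_sec: tps }
}

fn main() {
    // Placeholder names and numbers, not real benchmark results.
    let pool = vec![
        cand("local-a", true, true, 40.0),
        cand("local-b", true, false, 55.0),
        cand("cloud-c", false, true, 90.0),
    ];
    for c in rank(pool, false) {
        println!("{} ({} tok/s)", c.model, c.tokens_per_sec);
    }
}
```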
Long coding sessions on small local models can fill the context window quickly.
Small Harness auto-compacts older turns on local backends when usage crosses
~85% of the effective limit (run /context to see headroom). Compaction keeps
complete tool-call rounds intact so the transcript stays valid for the next
request. Use /compact manually if you want to shrink sooner. One-shot
--print mode does not auto-compact.
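Keeping tool-call rounds intact matters because OpenAI-compatible chat APIs expect each tool result to follow the assistant message that requested it. Here is a minimal sketch, assuming the ~85% trigger from this guide. It groups each assistant message with the Tool results after it and drops whole groups, oldest first, keeping the system prompt and the newest group. Dropping stands in for whatever compaction Small Harness actually performs. All names are illustrative.

```rust
// Illustrative compaction sketch; not Small Harness's implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
struct Msg {
    role: Role,
    tokens: usize,
}

fn should_compact(used: usize, effective_limit: usize) -> bool {
    used * 100 >= effective_limit * 85 // integer form of used >= 0.85 * limit
}

/// Group messages so Tool results stay attached to the message before them
/// (the assistant turn that requested the tools).
fn units(msgs: &[Msg]) -> Vec<Vec<Msg>> {
    let mut out: Vec<Vec<Msg>> = Vec::new();
    for m in msgs {
        if m.role == Role::Tool && !out.is_empty() {
            out.last_mut().unwrap().push(m.clone());
        } else {
            out.push(vec![m.clone()]);
        }
    }
    out
}

fn tokens(unit: &[Msg]) -> usize {
    unit.iter().map(|m| m.tokens).sum()
}

/// Drop the oldest units (never the system prompt or the newest unit) until
/// the transcript fits in `target` tokens.
fn compact(msgs: &[Msg], target: usize) -> Vec<Msg> {
    let mut rest = units(msgs);
    let system = if rest.first().map_or(false, |u| u[0].role == Role::System) {
        Some(rest.remove(0))
    } else {
        None
    };
    let mut total = system.as_deref().map_or(0, tokens) + rest.iter().map(|u| tokens(u)).sum::<usize>();
    while total > target && rest.len() > 1 {
        total -= tokens(&rest.remove(0));
    }
    system.into_iter().chain(rest).flatten().collect()
}

fn msg(role: Role, n: usize) -> Msg {
    Msg { role, tokens: n }
}

fn main() {
    let history = vec![
        msg(Role::System, 100),
        msg(Role::User, 50),
        msg(Role::Assistant, 30), // requests a tool call
        msg(Role::Tool, 200),     // result of that call
        msg(Role::Assistant, 40),
        msg(Role::User, 20),
    ];
    let used: usize = history.iter().map(|m| m.tokens).sum();
    if should_compact(used, 500) {
        let kept = compact(&history, 200);
        let roles: Vec<Role> = kept.iter().map(|m| m.role).collect();
        println!("{used} -> {} tokens: {roles:?}", tokens(&kept));
    }
}
```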
Use one-shot mode when you want Small Harness without the interactive TUI:
cargo run --release -- --print "Summarize the repo entry points"
printf 'What changed in this branch?\n' | cargo run --release --
Approval-gated write and shell tools are denied in one-shot mode unless you pass
--allow-tools.
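The one-shot rule above comes down to a small decision table: one-shot mode has no interactive prompt, so a gated tool must be either pre-approved or refused. This hypothetical sketch assumes that write and shell tools are the gated ones and that read/search/list tools run freely. It also simplifies interactive mode to "ask the user", ignoring the difference between the dangerous-only and always policies. `ToolKind`, `Decision`, and `decide` are illustrative names.

```rust
// Hypothetical approval gate; names and the interactive branch are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolKind {
    ReadOnly, // read/search/list
    Write,
    Shell,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Run,
    AskUser,
    Deny,
}

fn decide(kind: ToolKind, one_shot: bool, allow_tools: bool) -> Decision {
    match kind {
        ToolKind::ReadOnly => Decision::Run,
        // No interactive prompt exists in one-shot mode, so gated tools need --allow-tools.
        ToolKind::Write | ToolKind::Shell if one_shot => {
            if allow_tools {
                Decision::Run
            } else {
                Decision::Deny
            }
        }
        ToolKind::Write | ToolKind::Shell => Decision::AskUser,
    }
}

fn main() {
    for kind in [ToolKind::ReadOnly, ToolKind::Write, ToolKind::Shell] {
        println!(
            "{kind:?}: --print -> {:?}, --print --allow-tools -> {:?}",
            decide(kind, true, false),
            decide(kind, true, true)
        );
    }
}
```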
Run a bundled agent eval from the shell (exit code 0 on pass):
cargo run --release -- --eval read-and-explain --model qwen2.5-coder:7b
cargo run --release -- --eval fix-failing-test --json
You can also point --eval at a data-only fixture JSON file. Workspaces are
resolved relative to that fixture file and may not escape that root:
{
"id": "external-readme-check",
"prompt": "Update README.md so it mentions the release version.",
"workspace": "workspace/readme-check",
"checks": [
{ "type": "fileContains", "path": "README.md", "needle": "version" }
]
}
cargo run --release -- --eval ./evals/local/external-readme-check.json --json
Here is a simple sequence that exercises the whole product:
/config
/mode explore
Give me a concise map of this repo.
/index status
/doctor --deep
/doctor bench
/doctor recommend
/doctor models
/doctor autotune
Find one small README improvement and propose the exact diff before editing.
After the edit:
/mode ship
Fix the failing test and get this ready to commit.
In ship mode the harness:
- injects a compact ship-status line into the system prompt each turn
- exposes run_tests, batch_edit, and ship_status as agent tools
- after a successful edit turn, runs smart-selected tests and injects failures into the next turn's context (no automatic re-run loop)
- saves a turn checkpoint when files change; use /undo if the model breaks something
If a small model makes a bad edit:
/undo
/undo list
/checkpoints status
/undo restores file contents from immediately before the last mutating agent turn and removes files the model created. Checkpoints are enabled by default in edit and ship modes.
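The undo semantics above (restore previous contents, delete files the model created) fall out naturally if a checkpoint records `Option<String>` per path, with `None` meaning the file did not exist. This sketch uses an in-memory map instead of the filesystem so it runs on its own. `Checkpoint` and `Files` are hypothetical names. A real tool would likely snapshot each file lazily, right before the first write to it, because the set of touched files is not known in advance.

```rust
use std::collections::HashMap;

// In-memory stand-in for the workspace (path -> contents) so the sketch runs
// without touching disk. The Checkpoint type is hypothetical.
type Files = HashMap<String, String>;

/// What each touched file looked like before a mutating turn.
/// `None` means the file did not exist yet, so undo deletes it.
struct Checkpoint {
    before: HashMap<String, Option<String>>,
}

impl Checkpoint {
    fn capture(files: &Files, touched: &[&str]) -> Self {
        let before = touched
            .iter()
            .map(|path| (path.to_string(), files.get(*path).cloned()))
            .collect();
        Checkpoint { before }
    }

    fn undo(&self, files: &mut Files) {
        for (path, old) in &self.before {
            match old {
                Some(text) => {
                    files.insert(path.clone(), text.clone());
                }
                None => {
                    files.remove(path);
                }
            }
        }
    }
}

fn main() {
    let mut files = Files::new();
    files.insert("src/lib.rs".into(), "pub fn add(a: i32, b: i32) -> i32 { a + b }".into());

    // Snapshot before the agent edits lib.rs and creates scratch.rs.
    let cp = Checkpoint::capture(&files, &["src/lib.rs", "src/scratch.rs"]);
    files.insert("src/lib.rs".into(), "pub fn add(a: i32, b: i32) -> i32 { a - b }".into());
    files.insert("src/scratch.rs".into(), "// created by the model".into());

    cp.undo(&mut files);
    println!("{:?}", files.keys().collect::<Vec<_>>());
}
```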
You can still run the operator commands manually:
/shipcheck
/shipcheck export
/ship --tests
/ship commit --all
/ship push
/ship pr
/ship status
/handoff
/handoff export
/fusion on
/fusion tool anthropic/claude-sonnet-4.5
/fusion off
/route template
/route select --dry-run add OAuth login and tests
/route apply coder high
/session
/test smart
/ship commit is local-only in this release: after an explicit confirmation it
stages and commits, then saves a ship record under .sessions/ship/. It does
not create a PR by itself. /ship push pushes the current branch after
confirmation and sets origin/<branch> as upstream when the branch does not
have one. /ship pr uses GitHub CLI to open a draft PR, or prints the exact
gh pr create command if gh is missing or unauthenticated. /ship status
uses GitHub CLI to summarize the open PR, checks, review state, and next action
for the current branch.
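The push and PR steps map onto ordinary git and GitHub CLI invocations. The sketch below only builds argument lists and a copy-pasteable command string; it never executes anything. `--set-upstream`, `--draft`, `--title`, and `--body` are standard git and gh options, but the exact flags Small Harness passes are an assumption. The builder functions are illustrative names.

```rust
// Illustrative command builders; they never execute anything.
fn git_push_args(branch: &str, has_upstream: bool) -> Vec<String> {
    let mut args = vec!["push".to_string()];
    if !has_upstream {
        // First push of this branch: create origin/<branch> and track it.
        args.extend(["--set-upstream", "origin", branch].map(String::from));
    }
    args
}

fn gh_draft_pr_args(title: &str, body: &str) -> Vec<String> {
    ["pr", "create", "--draft", "--title", title, "--body", body]
        .map(String::from)
        .to_vec()
}

/// POSIX-shell quoting so a printed command can be pasted as-is.
fn shell_quote(arg: &str) -> String {
    let plain = !arg.is_empty()
        && arg.chars().all(|c| c.is_ascii_alphanumeric() || "-_./:=@".contains(c));
    if plain {
        arg.to_string()
    } else {
        format!("'{}'", arg.replace('\'', r"'\''"))
    }
}

fn printable(program: &str, args: &[String]) -> String {
    let quoted: Vec<String> = args.iter().map(|a| shell_quote(a)).collect();
    format!("{program} {}", quoted.join(" "))
}

fn main() {
    println!("{}", printable("git", &git_push_args("feature/oauth", false)));
    println!("{}", printable("gh", &gh_draft_pr_args("Add OAuth login", "Adds login flow and tests.")));
}
```

Quoting matters for the fallback path: when gh is missing or unauthenticated, the printed command has to survive titles and bodies that contain spaces or apostrophes.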
Compare local models on agent-loop coding tasks:
/eval agent fix-failing-test ollama:qwen2.5-coder:7b
/eval agent all
Then run:
git diff
cargo fmt --all -- --check
cargo test
Small Harness keeps local state under .sessions/:
.sessions/
history.jsonl input history
*.jsonl session transcripts
*.events.jsonl per-session structured event logs (tools, timing, approvals)
project-memory/
index.json safe metadata-only repo index
notes.jsonl durable project notes from /remember
doctor/ deep doctor JSON and Markdown reports
evals/ eval suite JSON and Markdown reports
shipcheck/ release-readiness Markdown reports
handoff/ local ship-handoff Markdown drafts
hardware.json safe hardware summary, without serials or UUIDs
capabilities/ per-model capability and benchmark cache
That local cache powers /doctor recommend, /doctor models, /doctor autotune, /map, and
repo_search.