Merged
Quickstart.md: 18 additions, 0 deletions
@@ -265,6 +265,24 @@ cargo run --release -- --eval read-and-explain --model qwen2.5-coder:7b
```bash
cargo run --release -- --eval fix-failing-test --json
```

You can also point `--eval` at a data-only fixture JSON file. The fixture's
`workspace` path is resolved relative to that file and is rejected if it
escapes the fixture root:

```json
{
  "id": "external-readme-check",
  "prompt": "Update README.md so it mentions the release version.",
  "workspace": "workspace/readme-check",
  "checks": [
    { "type": "fileContains", "path": "README.md", "needle": "version" }
  ]
}
```

```bash
cargo run --release -- --eval ./evals/local/external-readme-check.json --json
```
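The workspace rule described above can be sketched as a small path check. This is a minimal sketch, not Small Harness's actual code: it assumes the "fixture root" is the directory that contains the fixture JSON and that containment is checked lexically (no symlink resolution). `resolve_workspace` is a hypothetical name. For the command above, it would map `workspace/readme-check` to `./evals/local/workspace/readme-check`, while `../outside` or `/etc` would be rejected.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not Small Harness's actual code): resolve a fixture's
/// `workspace` field against the directory containing the fixture JSON and
/// reject any path that would leave that directory.
///
/// The check is purely lexical: it never touches the filesystem, so it does
/// not see symlinks and works even if the workspace does not exist yet.
fn resolve_workspace(fixture_file: &Path, workspace: &str) -> Result<PathBuf, String> {
    // Assumed fixture root: the fixture file's parent directory.
    let root = fixture_file.parent().unwrap_or(Path::new(""));
    let mut resolved = root.to_path_buf();
    // Number of components pushed below `root`; `..` may only pop these.
    let mut depth = 0usize;

    for component in Path::new(workspace).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            // `.` does not change the location.
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return Err(format!("workspace {workspace:?} escapes the fixture root"));
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths (and Windows drive prefixes) are not relative to the root.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("workspace {workspace:?} must be a relative path"));
            }
        }
    }
    Ok(resolved)
}
```

A lexical check has the advantage of working before the workspace directory exists. An alternative is to call `std::fs::canonicalize` on both paths and compare them with `Path::starts_with`, which also catches escapes through symlinks but requires both paths to exist.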

## A Good First Session

Here is a simple sequence that exercises the whole product:
README.md: 3 additions, 1 deletion
@@ -837,7 +837,9 @@ runtime.
`printf '…\n' | small-harness` for scripts and CI. Approval-gated tools
are denied by default; pass `--allow-tools` to allow them.
- **Agent eval CLI** — `small-harness --eval fix-failing-test [--model M] [--json]`
runs a bundled eval fixture and exits 0/1 (for CI scripts). `--eval` can
also point at a data-only fixture JSON file; its workspace is resolved
relative to that file and rejected if it escapes the fixture root.
- **Warmup.** Small Harness sends a 1-token request with the full system
prompt + tools at startup so llama.cpp-derived engines have a hot
prompt-eval cache before your first prompt. Disable with `WARMUP=false`.
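The Agent eval CLI entry above says an eval run exits 0/1 for CI scripts. Here is a minimal sketch of how the `fileContains` check from the fixture example could feed that status. It assumes that 0 means every check passed, that a missing or unreadable file counts as a failure, and that `Check`, `run_check`, and `eval_exit_code` are hypothetical names; the real fixture schema may support check types not modeled here.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical mirror of one entry in a fixture's `checks` array.
/// Only the `fileContains` type from the fixture example is modeled.
enum Check {
    FileContains { path: String, needle: String },
}

/// Returns true if a single check passes against the workspace directory.
fn run_check(workspace: &Path, check: &Check) -> bool {
    match check {
        Check::FileContains { path, needle } => fs::read_to_string(workspace.join(path))
            .map(|text| text.contains(needle.as_str()))
            // A missing or unreadable file counts as a failed check.
            .unwrap_or(false),
    }
}

/// Maps the overall result to a CI-friendly exit code (assumed convention):
/// 0 if every check passes, 1 otherwise.
fn eval_exit_code(workspace: &Path, checks: &[Check]) -> i32 {
    if checks.iter().all(|c| run_check(workspace, c)) {
        0
    } else {
        1
    }
}
```

In a `main`, the returned code could be handed to `std::process::exit` so that a CI step fails whenever any check fails.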