Realistic examples of building evals and optimizing agents using Harbor.
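Install Harbor:

```shell
uv tool install harbor
```

Run any task recipe, replacing `<name>` with a recipe name from the table below:

```shell
harbor run -p harbor_cookbook/recipes/<name> -a claude-code -m anthropic/claude-opus-4-6
```

Task recipes:

| Name | Description |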
|---|---|
| `simple-task` | Minimal single-container task. |
| `multi-container` | Docker Compose task where the agent interacts with a locally hosted REST API. |
| `mcp-tools` | Gives the agent custom tools via a locally hosted FastMCP server. |
| `multi-reward` | Multiple independent verifiers, each producing its own score. |
| `simulated-user` | Agent discovers requirements by talking to a simulated user. |
| `computer-use-ubuntu` | Computer-use reference implementation on an Ubuntu virtual desktop. |
| `computer-use-windows` | Computer-use reference implementation on a remote Windows desktop (Daytona). |
| `dns-blacklisting` | Network-level hostname blacklisting with exact, wildcard, and regex rules. |
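
The `<name>` placeholder is the recipe's directory under `harbor_cookbook/recipes/`. As a minimal sketch, assuming the directory names match the recipe names in the table above and reusing only the flags from the run command, a small Bash helper can print (or, opt-in, execute) one `harbor run` per task recipe. The `recipe_cmd` function and the `RUN`, `AGENT`, and `MODEL` variables are hypothetical conveniences for this sketch, not part of Harbor.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Harbor: prints the `harbor run` command
# for each task recipe by default; set RUN=1 to actually execute them.
set -euo pipefail

# Defaults mirror the run command above; override with AGENT=... / MODEL=...
# (hypothetical variables used only by this sketch).
AGENT="${AGENT:-claude-code}"
MODEL="${MODEL:-anthropic/claude-opus-4-6}"

# Print the harbor run command for one recipe directory name.
recipe_cmd() {
  printf 'harbor run -p harbor_cookbook/recipes/%s -a %s -m %s\n' "$1" "$AGENT" "$MODEL"
}

# computer-use-windows is left out because it runs on a remote
# Windows desktop via Daytona.
recipes=(simple-task multi-container mcp-tools multi-reward simulated-user computer-use-ubuntu dns-blacklisting)

for name in "${recipes[@]}"; do
  if [ "${RUN:-0}" = "1" ]; then
    # Execute the recipe with Harbor.
    harbor run -p "harbor_cookbook/recipes/$name" -a "$AGENT" -m "$MODEL"
  else
    # Dry run: only show what would be executed.
    recipe_cmd "$name"
  fi
done
```

Save it as, say, `run_recipes.sh` (a hypothetical name), run `bash run_recipes.sh` to review the commands, then `RUN=1 bash run_recipes.sh` to execute them. Printing by default keeps an accidental invocation from launching every container at once.
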

Agent optimization examples:

| Name | Description |
|---|---|
| `gepa` | Agent harness optimization for MedAgentBench using Harbor and GEPA. |
| `tinker-rl` | RL training on Harbor tasks using the Tinker SDK. |