Hermes Lab

File-first autonomous experiment scaffolding for long-running research controlled by AI agents or human operators.

Run hyperparameter sweeps, code mutation loops, or any iterative research process with built-in scheduling, dispatch, multi-fidelity tiers, and automatic result tracking.

Why Hermes Lab?

Most ML experiment tools assume you're training models on GPUs. Hermes Lab is domain-agnostic: it manages any experiment that has a metric, a search space, and a way to run iterations. The lab is the scheduler and bookkeeper; you bring the executor.

File-first: All state lives in plain files (YAML, JSON, Markdown). No database, no server.
Agent-native: Built to be operated by AI coding agents (Claude Code, Codex, etc.) or humans via CLI.
Append-only history: Every run is an immutable bundle. Derived views (summaries, rankings) are rebuilt from sealed runs.
Multi-fidelity: Run cheap proxy experiments first, promote winners to expensive final validation.
Dispatch protocol: Let remote workers or cloud GPUs claim work packages without direct filesystem access.

Quick Start

# Clone the repo
git clone https://github.com/amenti-labs/hermes-lab.git
cd hermes-lab

# Set your data root (or use the default ./lab-data)
export HERMES_LAB_DATA_ROOT=~/hermes-lab-data

# Initialize the lab
python3 scripts/labctl.py init

# Create an experiment from a template
python3 scripts/labctl.py create templates/autoresearch-generic.yaml

# Run one scheduler cycle
python3 scripts/labctl.py run-once

# Check status
python3 scripts/labctl.py status

Execution Modes

Mode	Description	Use when
Cron	Autonomous background runs on a schedule	Unattended overnight search
Burst	Back-to-back iterations with auto-generated params	Fast exploration, you're watching
Guided	Pauses for approval before each iteration	Early experiment design, steering
Swarm	Multi-strategy rotation with shared blackboard	Broad search across approaches

# Burst: 20 random iterations
python3 scripts/labctl.py burst <exp_id> --strategy random -n 20

# Guided: approve each step
python3 scripts/labctl.py guided <exp_id> --strategy perturb -n 10

# Swarm: rotate strategies
python3 scripts/labctl.py swarm <exp_id> --strategies random perturb bayesian -n 30

Search Strategies

Strategy	When to use
`random`	Initial exploration, no prior knowledge
`perturb`	Refine near a good baseline
`bayesian`	10+ trials done, want smarter suggestions (needs `optuna`)
`evolution`	Large search space, population-based (needs `nevergrad`)
`tree`	AIDE-style tree search: branch new ideas or improve best ones
`llm`	LLM reads summaries and proposes next params

Templates

Start from a template that matches your use case:

Template	Description
`autoresearch-generic.yaml`	Minimal starting point for any experiment
`code-autoresearch.yaml`	Git-backed code mutation loop
`local-agent-autoresearch.yaml`	Provider-agnostic local agent mutation
`multifidelity-autoresearch.yaml`	Proxy/final two-tier experiments
`research-sprint.yaml`	Time-boxed research sprint

Key Concepts

Data root ($HERMES_LAB_DATA_ROOT): Where the lab stores all state -- experiment history, run bundles, metrics, dispatch queue. Default: ./lab-data. Set the env var to put it anywhere.
Workspace (workspace_root in SPEC): Where your experiment code lives -- training scripts, configs, search space. Can be any directory, anywhere on disk. The lab reads from it but doesn't own it.

These are independent. Your workspace might be ~/projects/my-model while the data root is ~/lab-data.

Data Root Layout

$HERMES_LAB_DATA_ROOT/
├── README-FIRST.md
├── LAB-STATUS.md
├── LAB-INDEX.json
├── PROGRAM.md
├── CHANGELOG.md
├── registry/
│   ├── events.jsonl
│   └── locks/
├── experiments/
│   └── <exp-id>/
│       ├── SPEC.yaml
│       ├── STATUS.json
│       ├── RUNBOOK.md
│       ├── SUMMARY.md
│       ├── NEXT.md
│       ├── context.md
│       ├── best.md
│       ├── metrics.jsonl
│       ├── checkpoints/
│       └── runs/
│           └── RUN-<timestamp>-<role>/
│               ├── manifest.json
│               ├── plan.md
│               ├── RESULT.md
│               ├── metrics.json
│               ├── stdout.log
│               ├── stderr.log
│               └── artifacts/
├── digests/{daily,weekly}
├── control/{inbox,approvals,outbox}
├── dispatch/{ready,running,complete}
├── backups/
└── scratch/

Search Space

Workspaces using burst, guided, or swarm modes need a search_space.json:

{
  "weight_decay": {"low": 0.1, "high": 10.0, "log": true, "type": "float"},
  "learning_rate": {"low": 1e-5, "high": 1e-2, "log": true, "type": "float"},
  "hidden_dim": {"low": 64, "high": 512, "type": "int"}
}

Flat params are auto-merged into nested configs.

How It Works

labctl run-once does this for each eligible experiment:

Reclaim stale leases if needed.
Pick the next role from worker_roles.
Acquire a lease.
Create a run bundle and plan.md.
Run the configured executor_command (or built-in stub).
Seal the run bundle with RESULT.md and metrics.json.
Rebuild projections (RUNBOOK.md, SUMMARY.md, LAB-STATUS.md).
Release the lease.

Dispatch Workflow

For workers that should not write directly into experiment folders:

python3 scripts/labctl.py dispatch-ready --max-runs 1
python3 scripts/labctl.py dispatch-claim --max-runs 1 --worker my-gpu
python3 scripts/labctl.py dispatch-work --max-runs 1 --worker my-gpu
python3 scripts/labctl.py dispatch-complete <dispatch_id>
python3 scripts/labctl.py dispatch-ingest

Multi-Fidelity

Run cheap proxy experiments, promote winners to final validation:

fidelity_tiers:
  - proxy
  - final
initial_fidelity_tier: proxy
proxy_executor_class: local-cpu
final_executor_class: cloud-h100

Agent Integration

Hermes Lab is designed for AI agents. The recommended agent is Hermes Agent, but any agent harness that can read files and run shell commands will work.

The AGENTS.md file at the repo root is the agent contract. LAB_MANIFEST.json is the machine-readable version.

Agents get structured context via environment variables:

LAB_DATA_ROOT, LAB_EXPERIMENT_ID, LAB_RUN_DIR
LAB_PRIMARY_METRIC, LAB_METRIC_DIRECTION
LAB_AGENT_PROVIDER, LAB_AGENT_MODEL
LAB_WORKSPACE_ROOT, LAB_FIDELITY_TIER
... and 30+ more (see LAB_MANIFEST.json)

Repo Structure

lab/          Core runtime (scheduler, strategies, runner, recovery)
scripts/      CLI (labctl.py), executors, mutation adapters
templates/    SPEC templates and data-root document templates
config/       launchd plist templates
docs/         Architecture and operations docs
tests/        Test suite
AGENTS.md     Agent contract

Full Command Reference

python3 scripts/labctl.py init                    # Initialize data root
python3 scripts/labctl.py create <template.yaml>  # Create experiment
python3 scripts/labctl.py run-once [--max-runs N]  # Run scheduler cycle
python3 scripts/labctl.py status                   # Lab status
python3 scripts/labctl.py list                     # List experiments
python3 scripts/labctl.py pause <exp_id>           # Pause experiment
python3 scripts/labctl.py resume <exp_id>          # Resume experiment
python3 scripts/labctl.py complete <exp_id>        # Mark complete
python3 scripts/labctl.py set-fidelity <id> <tier> # Change fidelity tier
python3 scripts/labctl.py digest                   # Generate daily digest
python3 scripts/labctl.py weekly-digest            # Generate weekly digest
python3 scripts/labctl.py refresh                  # Rebuild projections
python3 scripts/labctl.py watchdog --repair        # Fix stale state
python3 scripts/labctl.py recover                  # Full recovery
python3 scripts/labctl.py burst <id> -n N          # Burst mode
python3 scripts/labctl.py guided <id> -n N         # Guided mode
python3 scripts/labctl.py swarm <id> -n N          # Swarm mode

Try It Now

export HERMES_LAB_DATA_ROOT=./demo-data
python3 scripts/labctl.py init
python3 scripts/labctl.py create examples/optimize-function/spec.yaml
python3 scripts/labctl.py burst optimize-rosenbrock --strategy random -n 20
python3 scripts/labctl.py status

See examples/optimize-function/ for details.

Requirements

Python 3.10+
No required dependencies for core functionality
Optional: optuna (Bayesian strategy), nevergrad (evolution strategy)
Tree strategy works out of the box (no dependencies)

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Hermes Lab

Why Hermes Lab?

Quick Start

Execution Modes

Search Strategies

Templates

Key Concepts

Data Root Layout

Search Space

How It Works

Dispatch Workflow

Multi-Fidelity

Agent Integration

Repo Structure

Full Command Reference

Try It Now

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
config		config
docs		docs
examples		examples
lab		lab
scripts		scripts
templates		templates
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LAB_MANIFEST.json		LAB_MANIFEST.json
LICENSE		LICENSE
README.md		README.md

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Hermes Lab

Why Hermes Lab?

Quick Start

Execution Modes

Search Strategies

Templates

Key Concepts

Data Root Layout

Search Space

How It Works

Dispatch Workflow

Multi-Fidelity

Agent Integration

Repo Structure

Full Command Reference

Try It Now

Requirements

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages