Operational runbook for cc-manager in shared or production environments.
Recommended baseline:
- Node.js 22 LTS
- 4-10 workers for general workloads
- dedicated repo clone per environment
- persistent storage for .cc-manager.db
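A preflight check against the Node.js baseline can be scripted; a minimal sketch (the helper name is illustrative, the version threshold is from the baseline above):

```shell
# check_node_major VERSION MIN — succeed when the major version meets the
# baseline; VERSION is the `node --version` string, e.g. v22.11.0
check_node_major() {
  major=${1#v}          # strip leading "v"
  major=${major%%.*}    # keep the major component only
  [ "$major" -ge "$2" ]
}

# usage: check_node_major "$(node --version)" 22 || echo "upgrade Node" >&2
```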
Start command example:
cc-manager \
--repo /srv/repos/target-repo \
--workers 8 \
--port 8080 \
--budget 3 \
--total-budget 150

Example systemd unit (/etc/systemd/system/cc-manager.service):
[Unit]
Description=CC-Manager
After=network.target
[Service]
Type=simple
WorkingDirectory=/srv/cc-manager/v1
Environment=ANTHROPIC_API_KEY=sk-ant-...
ExecStart=/usr/local/bin/cc-manager --repo /srv/repos/target-repo --workers 8 --port 8080
Restart=always
RestartSec=3
User=ccmanager
Group=ccmanager
[Install]
WantedBy=multi-user.target

Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable cc-manager
sudo systemctl start cc-manager
sudo systemctl status cc-manager

Update procedure:
- Pull latest code.
- Install dependencies.
- Build TypeScript output.
- Restart service.
- Run health and smoke checks.
git pull
npm ci
npm run build
sudo systemctl restart cc-manager
curl http://localhost:8080/api/health

cc-manager stores task state in SQLite files at the repository root:
- .cc-manager.db
- .cc-manager.db-shm
- .cc-manager.db-wal
Daily backup recommendation (run with the service stopped so the WAL files are consistent):
tar -czf cc-manager-backup-$(date +%F).tgz \
.cc-manager.db .cc-manager.db-shm .cc-manager.db-wal

Recovery:
- Stop the service.
- Restore database files.
- Start the service.
- Verify /api/health and /api/stats.
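The daily backup above accumulates archives over time; a retention sketch that bundles the tar step with pruning (the function name and the 7-archive window are illustrative; the file names are from the runbook):

```shell
#!/bin/sh
set -e

# backup_and_prune DIR KEEP — archive the SQLite files from the current
# directory into DIR, then delete all but the KEEP newest archives
backup_and_prune() {
  dir=$1; keep=$2
  tar -czf "$dir/cc-manager-backup-$(date +%F).tgz" \
    .cc-manager.db .cc-manager.db-shm .cc-manager.db-wal
  ls -1t "$dir"/cc-manager-backup-*.tgz \
    | tail -n +"$((keep + 1))" \
    | xargs -r rm -f
}

# usage (from the repository root): backup_and_prune /srv/backups 7
```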
Monitoring endpoints:
- GET /api/health: liveness and worker summary
- GET /api/stats: queue depth, active workers, budget state
- GET /api/tasks/errors: recent failures
- GET /api/workers: worker saturation
Suggested alerts:
- queue depth above expected threshold for 10+ minutes
- repeated task failures above baseline
- total budget near hard cap
- no successful tasks in a rolling time window
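The queue-depth alert can be expressed as a small helper fed by the value from GET /api/stats (the function name and threshold are illustrative):

```shell
# check_queue_depth DEPTH THRESHOLD — print ok, or an alert line for the
# pager; returns nonzero on alert so it composes with cron/monitoring hooks
check_queue_depth() {
  if [ "$1" -gt "$2" ]; then
    echo "ALERT: queue depth $1 exceeds threshold $2"
    return 1
  fi
  echo "ok: queue depth $1"
}
```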
Symptom: tasks fail immediately with command execution errors.
Actions:
- confirm the agent CLI is installed (claude --version, codex --version)
- confirm the binary is on the service PATH
- set an explicit --agent command
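The PATH check can be done with command -v; a quick preflight might look like this (the function name is illustrative; run it as the service user so it matches the systemd environment):

```shell
# check_agent_cli NAME — succeed if the agent binary resolves on this PATH
check_agent_cli() {
  command -v "$1" > /dev/null 2>&1
}

# usage: check_agent_cli claude && check_agent_cli codex || echo "missing CLI" >&2
```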
Symptom: tasks time out or run far longer than expected.

Actions:
- increase --timeout
- reduce --workers if the host is saturated
- split large prompts into smaller tasks
Symptom: parallel tasks collide in overlapping code areas.

Actions:
- reduce parallelism for high-overlap code areas
- use tags to segment queue by subsystem
- retry failed tasks with narrower prompts
Symptom: spend climbs faster than planned.

Actions:
- lower per-task --budget
- set --total-budget to enforce a global limit
- move lower-priority work to off-peak windows
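As a sanity check, the flags from the start command above bound total work: with --budget 3 and --total-budget 150, at most 50 tasks can each spend their full per-task budget.

```shell
total_budget=150  # --total-budget from the start command
per_task=3        # --budget from the start command
echo "full-budget task capacity: $((total_budget / per_task))"
```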
Security:
- keep API keys in environment variables or a secret manager
- avoid embedding secrets in task prompts
- restrict network exposure of the server port
- route through TLS and auth at an ingress or proxy layer
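One way to realize the last point is a reverse proxy in front of port 8080; an illustrative nginx fragment (the hostname, certificate paths, and htpasswd file are all assumptions, not part of cc-manager):

```nginx
# TLS termination plus basic auth in front of cc-manager
server {
    listen 443 ssl;
    server_name cc.example.internal;            # hypothetical hostname
    ssl_certificate     /etc/ssl/cc.crt;        # hypothetical cert paths
    ssl_certificate_key /etc/ssl/cc.key;

    auth_basic "cc-manager";
    auth_basic_user_file /etc/nginx/htpasswd;   # hypothetical credentials file

    location / {
        proxy_pass http://127.0.0.1:8080;       # server port from the runbook
        proxy_set_header Host $host;
    }
}
```

With this in place, the cc-manager port itself can be bound to localhost or firewalled off so only the proxy reaches it.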