An OpenClaw skill and systemd daemon that automatically detects and repairs gateway disconnections, unresolvable plugin entries, and LLM model mismatches — with a live CLI dashboard to watch it all in real time.
Built and tested on OpenClaw 2026.x running on Ubuntu with systemd.
| Problem | How It's Handled |
|---|---|
| Gateway crashes | Restart=always brings it back via systemd |
Plugin not found (e.g. retell-voice) |
fix-plugins.sh detects via jq + openclaw doctor, removes entry, restarts |
| Deprecated or mismatched model string | check-models.sh validates against known-good list, rewrites if needed |
| Invalid or expired API key | check-models.sh probes each provider's /v1/models endpoint live |
| Broken escalation chain | Chain integrity checked across all fallback models |
| Agent disconnects mid-session | Watchdog polls every 30s, triggers repair if unhealthy |
| Network down at startup | Model probing skipped entirely, plugin fix still runs |
gateway-watchdog/
├── install.sh # One-command installer
├── openclaw-startup-fix.sh # Runs on every gateway start via ExecStartPost
├── openclaw-watchdog.sh # Systemd daemon, polls every 30s
├── openclaw-watchdog.service # Watchdog systemd unit
├── openclaw-gateway.service # Gateway systemd unit (reference / patch target)
├── openclaw-watch.sh # Live CLI dashboard
└── skill/
├── SKILL.md # OpenClaw skill definition
├── fix-plugins.sh # Plugin detection and removal
└── check-models.sh # Model string, API key, and provider health checks
git clone <this repo>
cd gateway-watchdog
bash install.shRequires: jq, bash, systemd, Linux. jq will be installed automatically if missing.
After install, verify the ExecStartPost hook was applied to the gateway service:
systemctl cat openclaw-gateway.service | grep -E "ExecStart|Restart"You should see both ExecStart and ExecStartPost=/usr/local/bin/openclaw-startup-fix.sh.
Every time the gateway starts (whether after a crash or manually), openclaw-startup-fix.sh fires automatically:
Gateway starts
└─ ExecStartPost: openclaw-startup-fix.sh
├─ Backup config (timestamped, pruned after 7 days)
├─ Phase 1: Plugin fix ← always runs, local only
└─ Phase 2: Model health
├─ Network check ← skip probing if offline
├─ Previous state check ← alert-only if last state was healthy
└─ Auto-fix ← only if state was already degraded
The watchdog daemon runs in parallel and catches issues that develop after startup — rate limits, keys expiring mid-session, provider outages.
Three independent gates prevent incorrect config changes:
1. Network gate Before probing any provider endpoint, connectivity is verified with a 3-second ping. If the network is not up, the entire model-probing phase is skipped. The plugin fix still runs because it is purely local.
2. Previous state gate
The last recorded watchdog state is checked before any model auto-fix. If the state was healthy, the models were working before the crash — the script alerts and logs but does not touch the config. A human decision is required to force a model fix in that case.
3. Backup before every write
A timestamped backup is saved to ~/.openclaw/backups/ before any config change. Backups older than 7 days are pruned automatically. The worst case is always one manual restore:
cp ~/.openclaw/backups/openclaw.json.startup.<timestamp> ~/.openclaw/openclaw.json
systemctl restart openclaw-gateway.service# Live refresh every 5s
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --follow
# One-shot snapshot
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh
# Full model health report
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --models
# Run repair with animated progress bar
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --repairIn --follow mode:
- R + Enter — run auto-repair with live progress bar
- M + Enter — full model health report
- Q + Enter — quit
# Plugin fix only
bash ~/.openclaw/skills/gateway-watchdog/fix-plugins.sh
# Model health check (report only)
bash ~/.openclaw/skills/gateway-watchdog/check-models.sh
# Model health check with auto-fix
bash ~/.openclaw/skills/gateway-watchdog/check-models.sh --fix
# Full startup fix (same as what runs automatically)
bash /usr/local/bin/openclaw-startup-fix.sh
# Watchdog service
systemctl status openclaw-watchdog
journalctl -u openclaw-watchdog -f
# Startup fix log
tail -f /var/log/openclaw-startup-fix.log
# Watchdog log
tail -f /var/log/openclaw-watchdog.logOnce installed, type /gateway-watchdog in the OpenClaw TUI to trigger a full health check and repair cycle from within the agent itself.
The script does not call openclaw gateway start to detect bad plugins — that command opens an interactive wizard and hangs. Instead it uses two methods:
- jq + filesystem scan — reads every key from
plugins.entriesinopenclaw.json, checks if a corresponding directory or node module exists on disk. Instant, no CLI calls. openclaw doctorwith a 10-second timeout — supplements method 1 for broken installs where files exist but the plugin still fails to resolve.</dev/nullcloses stdin to prevent interactive prompts.
The State dir migration skipped warning that appears in every openclaw doctor run is filtered out — it is a cosmetic warning, not an error.
check-models.sh checks:
- Primary model string format (
provider/model-name) - Known deprecated model strings with suggested replacements
- Fallback / escalation chain integrity and provider diversity
- API key presence (credentials file → env var → config file)
- Live provider endpoint reachability (HTTP 200 = valid, 401/403 = bad key, 429 = rate limited)
- Subagent and TTS summary model strings
- Local providers (Ollama, LMStudio) via local endpoint ping
Supported providers: Anthropic, OpenAI, Google/Gemini, Moonshot/Kimi, Mistral, Groq, OpenRouter, Ollama, LMStudio.
OpenClaw may overwrite its own systemd service file on update, removing the ExecStartPost hook. If that happens, re-run the installer:
bash install.shIt is idempotent — safe to run multiple times.
MIT. No warranties. Review before running on production systems.