gateway-watchdog

An OpenClaw skill and systemd daemon that automatically detects and repairs gateway disconnections, unresolvable plugin entries, and LLM model mismatches — with a live CLI dashboard to watch it all in real time.

Built and tested on OpenClaw 2026.x running on Ubuntu with systemd.

What It Does

Problem	How It's Handled
Gateway crashes	`Restart=always` brings it back via systemd
Plugin not found (e.g. `retell-voice`)	`fix-plugins.sh` detects via jq + `openclaw doctor`, removes entry, restarts
Deprecated or mismatched model string	`check-models.sh` validates against known-good list, rewrites if needed
Invalid or expired API key	`check-models.sh` probes each provider's `/v1/models` endpoint live
Broken escalation chain	Chain integrity checked across all fallback models
Agent disconnects mid-session	Watchdog polls every 30s, triggers repair if unhealthy
Network down at startup	Model probing skipped entirely, plugin fix still runs

File Structure

gateway-watchdog/
├── install.sh                    # One-command installer
├── openclaw-startup-fix.sh       # Runs on every gateway start via ExecStartPost
├── openclaw-watchdog.sh          # Systemd daemon, polls every 30s
├── openclaw-watchdog.service     # Watchdog systemd unit
├── openclaw-gateway.service      # Gateway systemd unit (reference / patch target)
├── openclaw-watch.sh             # Live CLI dashboard
└── skill/
    ├── SKILL.md                  # OpenClaw skill definition
    ├── fix-plugins.sh            # Plugin detection and removal
    └── check-models.sh           # Model string, API key, and provider health checks

Install

git clone <this repo>
cd gateway-watchdog
bash install.sh

Requires: jq, bash, systemd, Linux. jq will be installed automatically if missing.

After install, verify the ExecStartPost hook was applied to the gateway service:

systemctl cat openclaw-gateway.service | grep -E "ExecStart|Restart"

You should see both ExecStart and ExecStartPost=/usr/local/bin/openclaw-startup-fix.sh.

Auto-Repair Flow

Every time the gateway starts (whether after a crash or manually), openclaw-startup-fix.sh fires automatically:

Gateway starts
  └─ ExecStartPost: openclaw-startup-fix.sh
       ├─ Backup config (timestamped, pruned after 7 days)
       ├─ Phase 1: Plugin fix       ← always runs, local only
       └─ Phase 2: Model health
            ├─ Network check        ← skip probing if offline
            ├─ Previous state check ← alert-only if last state was healthy
            └─ Auto-fix             ← only if state was already degraded

The watchdog daemon runs in parallel and catches issues that develop after startup — rate limits, keys expiring mid-session, provider outages.

Safety Mitigations

Three independent gates prevent incorrect config changes:

1. Network gate Before probing any provider endpoint, connectivity is verified with a 3-second ping. If the network is not up, the entire model-probing phase is skipped. The plugin fix still runs because it is purely local.

2. Previous state gate The last recorded watchdog state is checked before any model auto-fix. If the state was healthy, the models were working before the crash — the script alerts and logs but does not touch the config. A human decision is required to force a model fix in that case.

3. Backup before every write A timestamped backup is saved to ~/.openclaw/backups/ before any config change. Backups older than 7 days are pruned automatically. The worst case is always one manual restore:

cp ~/.openclaw/backups/openclaw.json.startup.<timestamp> ~/.openclaw/openclaw.json
systemctl restart openclaw-gateway.service

Live Dashboard

# Live refresh every 5s
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --follow

# One-shot snapshot
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh

# Full model health report
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --models

# Run repair with animated progress bar
bash ~/.openclaw/skills/gateway-watchdog/openclaw-watch.sh --repair

In --follow mode:

R + Enter — run auto-repair with live progress bar
M + Enter — full model health report
Q + Enter — quit

Manual Commands

# Plugin fix only
bash ~/.openclaw/skills/gateway-watchdog/fix-plugins.sh

# Model health check (report only)
bash ~/.openclaw/skills/gateway-watchdog/check-models.sh

# Model health check with auto-fix
bash ~/.openclaw/skills/gateway-watchdog/check-models.sh --fix

# Full startup fix (same as what runs automatically)
bash /usr/local/bin/openclaw-startup-fix.sh

# Watchdog service
systemctl status openclaw-watchdog
journalctl -u openclaw-watchdog -f

# Startup fix log
tail -f /var/log/openclaw-startup-fix.log

# Watchdog log
tail -f /var/log/openclaw-watchdog.log

Agent Slash Command

Once installed, type /gateway-watchdog in the OpenClaw TUI to trigger a full health check and repair cycle from within the agent itself.

How Plugin Detection Works

The script does not call openclaw gateway start to detect bad plugins — that command opens an interactive wizard and hangs. Instead it uses two methods:

jq + filesystem scan — reads every key from plugins.entries in openclaw.json, checks if a corresponding directory or node module exists on disk. Instant, no CLI calls.
openclaw doctor with a 10-second timeout — supplements method 1 for broken installs where files exist but the plugin still fails to resolve. </dev/null closes stdin to prevent interactive prompts.

The State dir migration skipped warning that appears in every openclaw doctor run is filtered out — it is a cosmetic warning, not an error.

Model Validation

check-models.sh checks:

Primary model string format (provider/model-name)
Known deprecated model strings with suggested replacements
Fallback / escalation chain integrity and provider diversity
API key presence (credentials file → env var → config file)
Live provider endpoint reachability (HTTP 200 = valid, 401/403 = bad key, 429 = rate limited)
Subagent and TTS summary model strings
Local providers (Ollama, LMStudio) via local endpoint ping

Supported providers: Anthropic, OpenAI, Google/Gemini, Moonshot/Kimi, Mistral, Groq, OpenRouter, Ollama, LMStudio.

After an OpenClaw Update

OpenClaw may overwrite its own systemd service file on update, removing the ExecStartPost hook. If that happens, re-run the installer:

bash install.sh

It is idempotent — safe to run multiple times.

License

MIT. No warranties. Review before running on production systems.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

gateway-watchdog

What It Does

File Structure

Install

Auto-Repair Flow

Safety Mitigations

Live Dashboard

Manual Commands

Agent Slash Command

How Plugin Detection Works

Model Validation

After an OpenClaw Update

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
SKILL.md		SKILL.md
check-models.sh		check-models.sh
fix-plugins.sh		fix-plugins.sh
install.sh		install.sh
openclaw-gateway.service		openclaw-gateway.service
openclaw-startup-fix.sh		openclaw-startup-fix.sh
openclaw-watchdog.service		openclaw-watchdog.service

Folders and files

Latest commit

History

Repository files navigation

gateway-watchdog

What It Does

File Structure

Install

Auto-Repair Flow

Safety Mitigations

Live Dashboard

Manual Commands

Agent Slash Command

How Plugin Detection Works

Model Validation

After an OpenClaw Update

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages