feat(rag): split RAG_API_URL into RAG_REST_URL + RAG_MCP_URL #14
Smilez1985 wants to merge 25 commits into
Adds an inline-keyboard model picker so users can switch LLM backends from Telegram without SSH or env edits, and makes the choice survive reboots.

User-facing additions
- New /model command. Without args it shows an inline keyboard with every preset (gemini, glm, ollama). With an argument (`/model glm`) it falls through to the existing /use logic, so the old behaviour still works.
- Tapping the 🦙 ollama row queries the configured Ollama server live (`/api/tags` + `/api/show`), filters by capabilities containing `tools`, and only lists tool-capable models. If none of the installed models advertise tool support, falls back to all of them with a warning. Includes a `◂ Back` button and a graceful unreachable state.
- Selecting a model immediately switches the active LiteLLM model and acknowledges with the new model name on the E-Ink face.

Persistence
- `LiteLLMConnector.set_model()` now writes the choice to `data/active_model.json` (gitignored). On startup the connector restores that selection before falling back to `DEFAULT_LITE_PRESET`, so reboots and `systemctl restart` no longer reset the user's pick.

Reliability
- Increase the application's HTTP timeouts via `Application.builder()` (`read=60`, `write=60`, `connect=30`, `pool=30`). The Pi Zero 2W's WiFi can otherwise time out polling Telegram while a long Ollama reply is streaming, surfacing as `httpx.ReadError` / `Timed out`.

Config
- Add `OLLAMA_MODEL` and `OLLAMA_API_BASE` env-driven defaults plus an `ollama` entry in `LLM_PRESETS`. Default API base is the placeholder `http://ollama-server:11434`; set `OLLAMA_API_BASE` in `.env` to point at your actual Ollama host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
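The tool-capability filter with its fallback can be sketched as a pure function. This is only an illustration of the rule described in the commit above, not the code in the PR: `pick_tool_capable` and the model names in the usage note are hypothetical, and the input is assumed to be a mapping from each model name (from `/api/tags`) to the capabilities list reported for it by `/api/show`.

```python
from typing import Dict, List, Tuple


def pick_tool_capable(
    capabilities_by_model: Dict[str, List[str]],
) -> Tuple[List[str], bool]:
    """Return (models_to_list, used_fallback).

    Hypothetical helper: keeps only models whose capabilities contain
    "tools"; if none do, returns every model and flags the fallback so
    the caller can show a warning.
    """
    tool_models = [
        name for name, caps in capabilities_by_model.items() if "tools" in caps
    ]
    if tool_models:
        return tool_models, False
    # No installed model advertises tool support: list everything.
    return list(capabilities_by_model), True
```

With `{"qwen3:8b": ["completion", "tools"], "gemma3:4b": ["completion"]}` as input, only the first model would be listed; with only the second model installed, the fallback flag is set and the caller can attach the warning.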
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`BOT_LANGUAGE` was defined in `config.py` and exposed via `.env`, but nothing actually injected it into the system prompt. Heartbeat reflections and the SAY: speech bubble would happily drift into Japanese/Chinese on Qwen-family models because no language was pinned.

Add `_language_directive()` and append it from `build_system_context()` so the directive is part of every system prompt path (Telegram replies, heartbeat reflections, SAY: bubble, autonomous output). Codes are mapped to readable names for the LLM (`de` → "German (Deutsch)" etc.) for the common languages; unknown codes pass through verbatim.

Default behaviour stays English (`BOT_LANGUAGE=en`). Users can mirror another language by writing in it: the directive explicitly allows that override, but blocks drifting into a third language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
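A minimal sketch of such a directive builder, assuming a small code-to-name map. Only the `de` entry is taken from the commit message; the other map entries, the function name `language_directive`, and the directive wording are illustrative assumptions.

```python
# Hypothetical subset of the code -> readable-name map; only the "de"
# entry is quoted from the commit message.
LANGUAGE_NAMES = {
    "en": "English",
    "de": "German (Deutsch)",
    "fr": "French (Français)",
}


def language_directive(code: str) -> str:
    """Build a system-prompt line that pins the reply language."""
    code = (code or "en").strip()
    # Unknown codes pass through verbatim, as the commit describes.
    name = LANGUAGE_NAMES.get(code.lower(), code)
    return (
        f"Reply in {name} by default. If the user writes in another "
        "language, mirror it, but never drift into a third language."
    )
```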
`needs_onboarding()` only returned False after the LLM emitted one of
the magic completion phrases ("onboarding complete", "saved to
identity.md", …) inside `check_onboarding_complete`. Several models
(notably the Ollama Qwen family) write IDENTITY.md correctly via the
`write_file` tool but never produce that exact phrase, so BOOTSTRAP.md
stays on disk and the bot retriggers onboarding on every restart.
Add an mtime-based safety net: if `IDENTITY.md` is newer than
`BOOTSTRAP.md`, the LLM has demonstrably already captured the
identity — delete BOOTSTRAP.md and treat onboarding as complete. The
existing magic-phrase path stays intact for cases where the LLM does
say it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
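The mtime check can be sketched as follows. This is a simplified stand-in for the real `needs_onboarding()`: it leaves out the magic-phrase path, takes the workspace directory as a parameter (an assumption), and assumes that a missing `BOOTSTRAP.md` means onboarding already finished.

```python
from pathlib import Path


def needs_onboarding(workspace: Path) -> bool:
    """Sketch of the mtime safety net; not the project's full logic."""
    bootstrap = workspace / "BOOTSTRAP.md"
    identity = workspace / "IDENTITY.md"
    if not bootstrap.exists():
        # Assumption: no bootstrap file means onboarding is done.
        return False
    if identity.exists() and identity.stat().st_mtime > bootstrap.stat().st_mtime:
        # IDENTITY.md was written after BOOTSTRAP.md, so the identity has
        # been captured: clean up and treat onboarding as complete.
        bootstrap.unlink()
        return False
    return True
```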
The error screen had hardcoded Japanese SAY: strings (e.g. `システムエラー発生`, `接続タイムアウト`). For owners who don't read Japanese this just renders as unreadable CJK glyphs on the E-Ink panel and obscures what actually went wrong.

Make the SAY: bubble localizable: a per-language dictionary keyed by the existing `BOT_LANGUAGE` env var, covering the same six error categories (default, ratelimit, timeout, auth, syntax, llm). Ships with `ja`, `en`, `de`, `ru`, `es`, `fr`. Codes that are unknown or missing from the dictionary fall back to English.

Behavior preservation: when `BOT_LANGUAGE` is unset, the language falls back to **Japanese** so existing deployments keep the project's original cyberpunk aesthetic by default. New users who set `BOT_LANGUAGE=en` (or any other supported code) get readable text.

The mood / face mapping is unchanged. The English `short_error` codes in the STATUS: tail are also kept in English on purpose, since they read like status codes (`Rate Limited`, `Bad Syntax`).

Also tighten the network branch so it catches the literal `"timed out"` form (common in `socket.timeout` strings), so transient HTTP errors classify as `timeout` rather than the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
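The two-level fallback (unset → Japanese, unknown code → English) might look like the sketch below. The table is deliberately tiny: the Japanese strings are the ones quoted above, but their assignment to the `default` and `timeout` categories, the English strings, and the names `ERROR_SAY` and `error_say` are assumptions for illustration.

```python
from typing import Optional

# Hypothetical two-language, two-category excerpt of the dictionary.
ERROR_SAY = {
    "ja": {"default": "システムエラー発生", "timeout": "接続タイムアウト"},
    "en": {"default": "System error", "timeout": "Connection timed out"},
}


def error_say(category: str, bot_language: Optional[str]) -> str:
    """Pick the SAY: text for an error category and BOT_LANGUAGE value."""
    if not bot_language:
        lang = "ja"  # unset: keep the original Japanese default
    else:
        lang = bot_language.lower()
        if lang not in ERROR_SAY:
            lang = "en"  # unknown code: fall back to English
    table = ERROR_SAY[lang]
    # Unknown categories use the language's default message.
    return table.get(category, table["default"])
```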
…onboarding) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a self-update path so the owner doesn't have to SSH in to pull new code from upstream and restart the service.

`scripts/auto_update.sh` is the engine:
- Fetches the configured remote/branch, fast-forwards if there are new commits, refreshes venv deps when `requirements.txt` changed, and restarts the systemd service. Idempotent (no-op when up-to-date) and supports `--check` for dry-run. Configurable via env vars `OCG_UPDATE_REMOTE`, `OCG_UPDATE_BRANCH`, `OCG_SERVICE`.
- Pre-update tarball backup: `gotchi.db` + `data/` + `.env` go to `backups/pre-update-<timestamp>-<sha>.tar.gz`. Rolling retention keeps the newest 3 (`OCG_BACKUP_KEEP`, skip with `OCG_NO_BACKUP=1`). `backups/` is gitignored.
- Auto-rollback: if the service fails to come back up after the new code is pulled, the script does `git reset --hard` to the previous HEAD, reinstalls deps if needed, restarts, and exits with code 4 to flag manual review. Disable with `OCG_NO_ROLLBACK=1`.
- Pre-flight check only blocks on dirty TRACKED files; untracked local-only files (drivers, ad-hoc scripts) no longer abort the run.

`/update` Telegram command:
- Owner-only wrapper around the script, so updates can be triggered from chat. `/update check` reports whether new commits exist without applying them. Reports rollback (exit 4) distinctly so the owner sees the upgrade was reverted.

`setup.sh`:
- Adds `/etc/sudoers.d/gotchi-update` so the bot user can `systemctl restart gotchi-bot.service` without a password, which is needed by `/update` and the unattended cron path.

For unattended auto-update users can wire the script into cron, e.g. `0 4 * * 0 /bin/bash /path/to/openclawgotchi/scripts/auto_update.sh >> /path/to/openclawgotchi/logs/update.log 2>&1`.

User state (`.env`, `data/*.json`, `.workspace/`) is gitignored, so `git pull` itself never touches it. The tarball is a second line of defence against schema migrations that could corrupt the DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
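The rolling retention rule can be illustrated in Python, although the actual engine is a shell script. `backups_to_prune` is a hypothetical helper, and ordering by file modification time is an assumption (the script could just as well sort by the timestamp embedded in the name).

```python
from pathlib import Path
from typing import List


def backups_to_prune(backup_dir: Path, keep: int = 3) -> List[Path]:
    """Return the pre-update tarballs that fall outside the newest `keep`."""
    archives = sorted(
        backup_dir.glob("pre-update-*.tar.gz"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    # Everything after the first `keep` entries is eligible for deletion.
    return archives[keep:]
```

A caller would unlink each returned path after the new tarball has been written, so a failed backup never reduces the number of good archives.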
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md
# Conflicts: # .gitignore # CHANGELOG.md # src/bot/handlers.py # src/main.py
Adds optional battery monitoring for the popular Waveshare UPS HAT (C) that turns the Pi Zero 2W into a portable / battery-backed device. Many users run openclawgotchi on this HAT, so first-class support is worthwhile.

- `src/hardware/battery.py`: single-shot reader for the on-board INA219 over I2C. Returns voltage, current, power and a 0–100 percentage based on the 2× 18650 voltage curve (6.0 V empty → 8.4 V full). Auto-detects presence; if I2C is disabled or the HAT is absent, every public function returns `None` instead of raising; callers can use `is_available()` to gate UI. I2C bus / address overridable via env (`OCG_UPS_BUS`, `OCG_UPS_ADDR`) for non-default hardware.
- `/battery` Telegram command in `handlers.py`: shows the live reading (`🔋 87 % — 8.12 V, +120 mA (charging, 974 mW)`) or a friendly "no UPS HAT detected" hint with `i2cdetect` instructions.
- `hardware/system.get_stats_string()` adds a `[BATTERY] …` line when present, so heartbeat reflections and the bot's self-awareness pick up battery state automatically.
- `smbus2>=0.4.0` added to `requirements.txt`. Pure Python, ~30 KB. Removing the line disables battery support entirely (battery.py swallows the ImportError).

Setup notes for users: enable I2C on the host (DietPi/Raspberry Pi OS), add the bot user to the `i2c` group (already part of `setup.sh`'s `usermod -aG gpio,spi,i2c …`), reboot. `i2cdetect -y 1` should show 0x43 once the HAT is connected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # src/bot/handlers.py # src/main.py
…riant
Make the E-Ink display layer driver-aware so users running the
3-color UPS HAT-friendly B-variant of the Waveshare 2.13in V4 panel
get a working display without forking the project.
Selection is opt-in via the new env var:
OCG_DISPLAY_VARIANT=mono (default — current behaviour)
OCG_DISPLAY_VARIANT=b (3-color B-variant)
OCG_DISPLAY_VARIANT=auto (prefer B if its driver is importable)
Changes:
- `src/ui/gotchi_ui.py`: import path picks `epd2in13b_V4` or
`epd2in13_V4` based on OCG_DISPLAY_VARIANT, sets a module-level
`EPD_VARIANT_B` flag. `render_ui()`'s init / Clear / display calls
branch on that flag — B has no partial refresh and `display()`
takes (black, red); the red layer is fed a blank image so existing
drawings render unchanged.
- `src/hardware/display.py`: timing knobs scale to variant. B's full
refresh takes ~15 s, so:
* `_DISPLAY_BUSY_RETRY_WAIT` jumps from 4 s to 20 s.
* `_MIN_UPDATE_INTERVAL` becomes 30 s on B (was 0) — debounces
bursts of identical updates that would block the panel for
most of a minute. Disabled (0) on mono so behaviour is
unchanged for the default install.
* Dedup — skip when (mood, text) match the previous payload.
Universally beneficial; particularly valuable on B.
* `FULL_REFRESH_EVERY` ghosting compensation is now no-op on B
(B always full-refreshes), preserved for mono.
- `src/drivers/epd2in13b_V4.py`: ship the Waveshare reference driver
for the B variant alongside the existing mono driver. Sourced from
the Waveshare e-Paper sample repo, MIT-licensed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
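One way to picture the dedup and minimum-interval rules from the commit above is a small gate object. This is a simplified sketch, not the code in `src/hardware/display.py`: the class name `DisplayGate` is hypothetical, and dropping (rather than queueing) an update that arrives inside the interval is an assumption.

```python
import time
from typing import Optional, Tuple


class DisplayGate:
    """Decides whether a (mood, text) payload should reach the panel."""

    def __init__(self, variant_b: bool) -> None:
        # 30 s on the B variant, disabled (0) on mono, per the commit.
        self.min_interval = 30.0 if variant_b else 0.0
        self._last_payload: Optional[Tuple[str, str]] = None
        self._last_time: Optional[float] = None

    def should_update(self, mood: str, text: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        payload = (mood, text)
        if payload == self._last_payload:
            return False  # dedup: identical to what is already shown
        if self._last_time is not None and now - self._last_time < self.min_interval:
            return False  # too soon after the previous refresh
        self._last_payload = payload
        self._last_time = now
        return True
```

The `now` parameter exists only to make the gate testable without sleeping; production code would omit it and rely on `time.monotonic()`.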
`HEARTBEAT.md` template is English-only by design (it's the system instruction the LLM follows). When BOT_LANGUAGE is set to a non-English locale, the system-level language directive in `build_system_context()` is correctly applied, but the long English user prompt (soul + identity + heartbeat template + recent context) overpowers it and the model writes the reflection in English.

Add a final language reminder right before the LLM is invoked, so the language pin lives at the end of the user prompt where it's hardest to override. Mirrors the names map already used by `_language_directive()` in `llm/prompts.py`. No-op for BOT_LANGUAGE=en or unset.

Telegram replies were already in the correct language because regular chat handlers don't have a comparably long instruction template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bringing the cleaned (server-agnostic, no private-repo refs) version of the RAG REST client into deploy/all-features so the running bot is consistent with the upstream PR turmyshevd#10. Mirrors feat/bot-rag-integration @ b20c55a. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Demonstrates the new B-variant red layer with a real use case: when a UPS HAT (C) is connected and reports < 20 % charge, the " | 🔋NN%/X.XXV" suffix in the header stats line renders RED on the panel instead of black. Healthy batteries / mono panels look identical to before.

Implementation:
- render_ui() builds a parallel red_image (only on B variant) that stays all-white unless an accent is drawn into it.
- A best-effort battery probe (gracefully no-ops when no UPS HAT or smbus2 is missing) supplies the suffix.
- Below 20 %: prefix renders black, battery suffix renders into the red layer only; the panel composites black + red → suffix shows as red text inline.
- Otherwise: single black draw call, red layer stays blank.

Red is treated as an accent, never a background, so the user's "don't paint the whole display red" rule is honoured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The display skill currently tells the LLM "Colors: Black & white only". That's correct for the mono panel but misleading once the B-variant support lands. The bot needs to understand:
- which physical panel is in play (selected via OCG_DISPLAY_VARIANT)
- that there is no `RED:` directive it can emit
- that red rendering is system-initiated (today: low battery <20 %)
- the standing rule: red is an accent, never a background

This avoids the failure mode where the LLM, having been told "you can control colors", asks for or describes red usage that doesn't exist, or worse, instructs the bot to flood the screen red.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Waveshare's UPS HAT (C) ships with a single 18650 cell (1S), not the 2S pack the original code assumed. A fully-charged 4.2 V cell mapped to (4.2 − 6.0) / 2.4 = −0.75 → clamped to 0 %, so users always saw "empty" regardless of actual charge.

Switch the linear voltage→percent map to the 1S range:
empty 3.0 V → 0 %
full 4.2 V → 100 %

Verified on hardware: 4.09 V → 91 % (consistent with a near-full cell with ~5 % discharge tolerance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
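The linear map and the clamping behind the original bug can be reproduced in a few lines. `battery_percent` is a hypothetical name and the rounding mode is an assumption; the voltage endpoints are the ones given in the commit.

```python
def battery_percent(voltage: float, empty: float = 3.0, full: float = 4.2) -> int:
    """Linear voltage -> percent map for a 1S cell, clamped to 0..100."""
    fraction = (voltage - empty) / (full - empty)
    return round(max(0.0, min(1.0, fraction)) * 100)
```

`battery_percent(4.09)` gives 91, matching the hardware reading quoted above, while the old 2S endpoints, `battery_percent(4.2, empty=6.0, full=8.4)`, clamp a full cell to 0.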
…amage

An earlier `git checkout feat/bot-rag-integration -- src/...` to bring the cleaned RAG client onto deploy/all-features replaced the merged deploy versions of bot/handlers.py, main.py, src/config.py with the PR turmyshevd#10 branch's narrower versions, dropping cmd_model / cb_model / cmd_update / cmd_battery and the OLLAMA_* config. The bot crash-looped on startup (ImportError: cannot import name 'cmd_model').

Restore the merged versions (sourced from 288aa3d, the last good deploy commit) and re-add the cmd_rag / RAG_* additions on top so all features coexist.

No upstream PR is affected; these files on the upstream PR branches stay scoped to their feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues seen on B-variant hardware after bot-driven display
updates (FACE: / SAY: from LLM output):
1. Top-left always showed "Gotchi" instead of the configured BOT_NAME
(e.g. "Clotchi"). Cause: `sudo`'s `env_reset` default strips the environment, so the
subprocess's `os.environ.get("BOT_NAME")` fell back to the literal
"Gotchi" default. Fix: also propagate BOT_NAME, OWNER_NAME and
BOT_LANGUAGE through the existing `sudo /usr/bin/env VAR=val ...`
wrapper that already handles OCG_DISPLAY_VARIANT and friends.
2. Battery suffix in the header stats line pushed the bot name on
the top-left off-screen / under other content when the line got
long. Move it into the footer centre instead — the footer has
spare horizontal space between the status text on the left and
the XP indicator on the right. On B variant a low-charge battery
renders into the red layer (red text accent) instead of black;
above the threshold or on mono panels it stays black.
Status text is now truncated to 30 chars (was 35) so it doesn't
overlap the centred battery cell on long messages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
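The `sudo /usr/bin/env VAR=val ...` wrapper from item 1 above could be assembled like this. The helper name `build_display_command`, the tuple of forwarded variables, and the script name in the usage note are illustrative assumptions; the real wrapper may forward additional variables.

```python
import os
from typing import List, Mapping

# Variables named in the commit; the real list may be longer.
FORWARDED_VARS = ("OCG_DISPLAY_VARIANT", "BOT_NAME", "OWNER_NAME", "BOT_LANGUAGE")


def build_display_command(
    script_args: List[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Prefix a command so the selected variables survive sudo's env reset."""
    assignments = [
        f"{name}={environ[name]}" for name in FORWARDED_VARS if name in environ
    ]
    # /usr/bin/env re-exports the assignments inside the sudo session.
    return ["sudo", "/usr/bin/env", *assignments, *script_args]
```

Passing the result to `subprocess.run` as a list (not a shell string) keeps values with spaces, such as an owner name, intact. For example, `build_display_command(["python3", "ui.py"], {"BOT_NAME": "Clotchi"})` yields `["sudo", "/usr/bin/env", "BOT_NAME=Clotchi", "python3", "ui.py"]`.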
`apply_auto_mood()` returns a (mood, text) tuple. The text becomes the
footer status_text on the E-Ink panel, while the header simultaneously
renders the same metrics in its always-on stats line
(`T:51°C | Free:79MB | …`). For low RAM / high temp the auto-mood text
read e.g. "Low RAM: 79MB" or "Hot! 51°C" — exactly what the header
already shows, just one frame older. The two numbers drift between
frames and the duplicate crowds an already tight 250×122 layout.
Drop the metric values from the warning text and keep the warning
itself ("RAM low", "Running hot", "OVERHEATING!", "OOM!"). The header
keeps reporting the live numbers; the footer adds the qualitative
warning beside them. No mood mapping changes, no thresholds change,
no API change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-side pinning (only at the end) wasn't enough: the long English HEARTBEAT.md template still pulled the model into English on a fraction of heartbeats even when BOT_LANGUAGE was set.

Wrap the prompt in the configured language at the start AND at the end so whichever side the model anchors to, the language is set. The front pin uses the user's language directly (e.g. German), making the very first thing the model reads a strong directive in the target language. The end pin restates the requirement just before generation starts. Both reference each other so it's clear they're the same rule.

Behaviour for `BOT_LANGUAGE=en` or unset: no change and no extra prompt text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
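A sketch of the two-sided pin, under stated assumptions: the function name `pin_language`, the `LANGUAGE_PINS` table, and the exact wording of both pins (including the German sentence) are invented for illustration and are not the strings used in the PR.

```python
from typing import Optional

# Hypothetical data: readable name plus a front pin written in that language.
LANGUAGE_PINS = {
    "de": (
        "German (Deutsch)",
        "WICHTIG: Antworte ausschließlich auf Deutsch (siehe Hinweis am Ende).",
    ),
}


def pin_language(prompt: str, code: Optional[str]) -> str:
    """Wrap a heartbeat prompt with a language pin at both ends."""
    if not code or code.lower() == "en":
        return prompt  # English or unset: leave the prompt untouched
    name, front = LANGUAGE_PINS.get(
        code.lower(),
        (code, f"IMPORTANT: reply only in {code} (see the reminder at the end)."),
    )
    back = (
        "REMINDER (same rule as at the top): "
        f"write the reflection in {name}, not in English."
    )
    return f"{front}\n\n{prompt}\n\n{back}"
```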
Lets the bot consume tools advertised by any external MCP server
that speaks the SSE transport, without dragging in the official
`mcp[cli]` Python package (it pulls `cryptography`, `pydantic-settings`,
`starlette`, `uvicorn`, `pyjwt`, `httpx-sse`, `sse-starlette`,
`python-multipart` — non-trivial RAM hit on a 512 MB Pi Zero 2W).
What's added
- src/llm/rag_mcp_client.py — hand-rolled MCP-over-SSE client,
~250 LoC, stdlib-only + `requests` (already in the venv via
litellm). Background thread reads the SSE stream and dispatches
JSON-RPC responses by id; sync `connect()` / `initialize()` /
`list_tools()` / `call_tool(name, args)` API. A module-level
`get_client()` returns a lazy singleton so multiple tool calls
share one SSE connection.
- Two new LLM tools wired into TOOL_MAP:
`mcp_list_tools()` — return advertised tool names + descriptions
`mcp_call_tool(name, arguments)` — invoke by name; arguments is
a JSON object passed as a string
(the LLM emits one).
Both gracefully no-op when the MCP path isn't configured, returning
a clear hint instead of raising.
Activation
- Env var `RAG_TRANSPORT=rest|mcp` (default `rest`). When `mcp`,
`RAG_API_URL` is interpreted as the MCP-SSE base URL (e.g.
`http://your-rag-host:8766`).
- Reuses `RAG_API_KEY` for optional Bearer auth.
- No new top-level dependencies.
Tested against rag-core's MCP-SSE endpoint
(advertised tools: rag_search, rag_persist, rag_status,
rag_list_collections, rag_recall_session, rag_session_announce,
rag_session_forget). `tools/list` returns the catalog; `tools/call`
dispatches and returns the rendered text content correctly.
Out of scope (separate follow-ups)
- Auto-registration of advertised MCP tools as first-class TOOL_MAP
entries (each with its own typed JSON schema). Today the LLM has
to look at `mcp_list_tools` then construct an `mcp_call_tool` call
itself; auto-registration would let it call them as if native.
- Multi-server support (today: single MCP server via RAG_API_URL).
- Async transport / WebSocket fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
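The "background thread dispatches JSON-RPC responses by id" pattern described above can be sketched without any network code. This is not the contents of `rag_mcp_client.py`: the class name `PendingCalls` and its methods are hypothetical, and the SSE reading and HTTP POST steps are left out entirely.

```python
import json
import threading
from typing import Any, Dict


class PendingCalls:
    """Matches JSON-RPC responses (arriving on a reader thread) to callers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_id = 0
        self._waiting: Dict[int, threading.Event] = {}
        self._results: Dict[int, Dict[str, Any]] = {}

    def new_request(self, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """Allocate an id and return the JSON-RPC request body to send."""
        with self._lock:
            self._next_id += 1
            req_id = self._next_id
            self._waiting[req_id] = threading.Event()
        return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

    def dispatch(self, raw: str) -> None:
        """Called by the reader thread for each SSE data payload."""
        message = json.loads(raw)
        with self._lock:
            event = self._waiting.get(message.get("id"))
            if event is None:
                return  # notification or unknown id: ignore
            self._results[message["id"]] = message
        event.set()

    def wait(self, req_id: int, timeout: float = 30.0) -> Any:
        """Block the calling thread until the matching response arrives."""
        with self._lock:
            event = self._waiting[req_id]
        arrived = event.wait(timeout)
        with self._lock:
            self._waiting.pop(req_id, None)
            message = self._results.pop(req_id, None)
        if not arrived or message is None:
            raise TimeoutError(f"no response for request {req_id}")
        if "error" in message:
            raise RuntimeError(str(message["error"]))
        return message.get("result")
```

A synchronous `call_tool(name, args)` would then be: build the body with `new_request("tools/call", {...})`, POST it to the server, and return `wait(body["id"])`.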
…prompt
When RAG_TRANSPORT=mcp and the MCP server is reachable at startup,
discover its advertised tools via tools/list and register each one as
a first-class TOOL_MAP entry with full JSON-Schema. The LLM then calls
e.g. `rag_search(query=..., top_k=3)` directly instead of the two-hop
`mcp_list_tools` → `mcp_call_tool` indirection. Names that collide
with an existing TOOL_MAP entry are skipped. Failures are logged but
never crash the bot.
A new system-prompt section "External Memory (MCP)" lists which tools
were registered and instructs the bot to:
- search RAG BEFORE answering questions about user preferences,
project rules, decisions, or past context
- persist durable lessons via the persist tool
- optionally announce session context once per conversation
Why: the previous PR exposed `mcp_list_tools` / `mcp_call_tool` as
generic glue, but the LLM wouldn't reach for them on its own — and
even when it did, the two-hop indirection wasted turns. Auto-
registration lets the agent use the RAG as its durable memory the
same way it already uses `remember_fact` / `recall_facts` for the
in-process store; the prompt block tells it WHEN.
Both `mcp_list_tools` / `mcp_call_tool` remain available as fallback
for ad-hoc discovery and tools that show up after bot start.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
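Auto-registration with collision skipping might look like the sketch below. It is a simplification under assumptions: `register_mcp_tools` is a hypothetical name, `tool_map` stands in for `TOOL_MAP` as a plain dict of callables, `call_tool(name, args)` stands in for the MCP client call, and the per-tool JSON-Schema that the real code registers is omitted.

```python
from typing import Any, Callable, Dict, List


def _make_proxy(name: str, call_tool: Callable[[str, Dict[str, Any]], Any]):
    """Bind one advertised tool name to the generic MCP call."""
    def proxy(**kwargs: Any) -> Any:
        return call_tool(name, kwargs)
    return proxy


def register_mcp_tools(
    tool_map: Dict[str, Callable[..., Any]],
    advertised: List[Dict[str, Any]],
    call_tool: Callable[[str, Dict[str, Any]], Any],
) -> List[str]:
    """Add each advertised tool to tool_map; return the names registered."""
    registered = []
    for tool in advertised:
        name = tool["name"]
        if name in tool_map:
            continue  # never shadow an existing native tool
        tool_map[name] = _make_proxy(name, call_tool)
        registered.append(name)
    return registered
```

The returned list is what a system-prompt section such as "External Memory (MCP)" could enumerate, so the prompt only mentions tools that were actually registered.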
…ransports side-by-side

A typical rag-core deployment exposes REST on :8765 AND an MCP-SSE gateway on :8766 simultaneously, but the bot's previous config locked itself to one transport at a time because both clients shared a single ``RAG_API_URL`` and an ``RAG_TRANSPORT`` switch. This separates them:
- ``RAG_REST_URL``: REST endpoint, used by ``rag_client.py``, ``query_rag`` / ``persist_to_rag`` LLM tools, and ``/rag`` Telegram command.
- ``RAG_MCP_URL``: MCP-SSE base URL, used by ``rag_mcp_client.py`` and the auto-registered first-class MCP tools (``rag_search`` etc.).

Set one, the other, or both. Empty → degrade gracefully (no-op).

Backwards compat: ``RAG_API_URL`` + ``RAG_TRANSPORT=rest|mcp`` still honored: a small shim in ``config.py`` maps the legacy URL onto whichever transport ``RAG_TRANSPORT`` selects (default ``rest``). Existing single-URL deployments keep working with no env edits. ``RAG_API_URL`` remains importable as an alias of ``RAG_REST_URL`` for any third-party code reading config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
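The compat shim can be sketched as a pure function over the environment. `resolve_rag_urls` is a hypothetical name, and letting the new variables take precedence over the legacy URL when both are set is an assumption the commit message does not spell out.

```python
from typing import Mapping, Tuple


def resolve_rag_urls(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (rest_url, mcp_url); an empty string means 'not configured'."""
    rest = env.get("RAG_REST_URL", "").strip()
    mcp = env.get("RAG_MCP_URL", "").strip()
    legacy = env.get("RAG_API_URL", "").strip()
    transport = env.get("RAG_TRANSPORT", "rest").strip().lower()
    if legacy:
        # Map the legacy single URL onto the transport it was meant for,
        # without overriding an explicitly set new-style variable.
        if transport == "mcp":
            mcp = mcp or legacy
        else:
            rest = rest or legacy
    return rest, mcp
```

In `config.py` this would be called once with `os.environ`, and each client would treat an empty URL as the signal to no-op.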
Quick context on this integration: wanted to be upfront about where it comes from.

The RAG backend I built against is a project I call rag-core: a self-hosted, project-scoped retrieval memory layer with a REST API and an MCP-SSE gateway (Qdrant-backed, multi-collection, frontmatter-aware chunking, designed for AI agents). The repo is currently private while I finish a handful of pre-public-readiness items, but the plan is to open it up before long.

This PR series (#10, #12, #13, #14) is intentionally written against a generic contract: any RAG backend that speaks the documented REST shape, and any MCP-SSE server with

Reason for sending this upstream rather than keeping it on my fork: I use OpenClawGotchi as one of my main development drivers, and the RAG integration has become part of how I work with it day-to-day. Carrying a long-lived branch and re-rebasing on every upstream pull is a fair amount of churn; having it on

No pressure on accepting; happy to iterate to whatever shape fits best, or to wait if you'd rather see the other end of the wire first. Once rag-core is public I'll ping back here; at that point you'd be welcome to clone the repo and stand up your own instance if you want to see how it pairs with the bot from the other side. (Just to be clear: this isn't an invite to my running instance; it's an invite to the source, so you can run your own.)
Summary
Builds on #10 (REST RAG) and #12/#13 (MCP client + auto-registration). A typical `rag-core` deployment exposes REST on `:8765` and an MCP-SSE gateway on `:8766` simultaneously, but the bot's previous config locked itself to one transport at a time because both clients shared a single `RAG_API_URL` and an `RAG_TRANSPORT` switch. This separates them:
- `RAG_REST_URL`: REST endpoint, used by `rag_client.py`, the `query_rag` / `persist_to_rag` LLM tools, and the `/rag` Telegram command.
- `RAG_MCP_URL`: MCP-SSE base URL, used by `rag_mcp_client.py` and the auto-registered first-class MCP tools (`rag_search` etc.).

Set one, the other, or both. Empty → graceful no-op as before.
Backwards compatibility
`RAG_API_URL` + `RAG_TRANSPORT=rest|mcp` still honored. A compat shim in `config.py` maps the legacy URL onto whichever transport `RAG_TRANSPORT` selects (default `rest`). Existing single-URL deployments keep working with no env edits. `RAG_API_URL` is still importable as an alias of `RAG_REST_URL` for any third-party code reading config.
Why
Verified from the `rag-core` project's own architecture notes that REST and MCP-SSE run as two independent services on the same host. With the prior config the bot could either query REST or speak MCP, never both at once. Splitting the env var lets the LLM use `rag_search` (over MCP, where it gets a typed schema) AND a human user use `/rag …` (over REST, where the curated query path lives) on the same deployment.
Test plan
- `RAG_API_URL` + `RAG_TRANSPORT=mcp` → routes to MCP
- `RAG_API_URL` (no transport, default rest) → routes to REST
- `rag_client.health()` returns OK from REST :8765 AND `MCPSSEClient.list_tools()` returns 7 tools from MCP :8766 in the same process
- `/rag …` (REST path) and an LLM-driven `rag_search` call (MCP path) both succeed without restart between them

🤖 Generated with Claude Code