OrbFrontend · Jun 4, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -230,7 +230,7 @@ erDiagram
 
 ## Data Contracts (the model layer)
 
-`backend/database/models.py` is the **model layer**: domain data contracts (a `TypedDict` per table-group row, plus the `PhraseGroup` union — `list[str] | LiteralPhraseGroup | RegexPhraseGroup`, a discriminated union keyed on `kind`) that describe the *shape* of persisted data and depend on nothing else in the codebase. The dependency rule is one-way — every other layer points its dependencies **inward**, toward the data, and `backend/database/` must never import "up" into `passes/` or `orchestrator.py`. (The introducing commit moved `PhraseGroup` *down* into `models.py` from `slop_detector.py` to kill the last upward import; anything in `database/` that reaches up for a shared shape is an architectural inversion — put the shape here instead.)
+`backend/database/models.py` is the **model layer**: domain data contracts (a `TypedDict` per table-group row, plus the `PhraseGroup` union — `list[str] | LiteralPhraseGroup | RegexPhraseGroup`, a discriminated union keyed on `kind`) that describe the *shape* of persisted data and depend on nothing else in the codebase. The dependency rule is one-way — every other layer points its dependencies **inward**, toward the data, and `backend/database/` must never import "up" into `passes/`, `orchestrator.py`, or `workflows/`. (The introducing commit moved `PhraseGroup` *down* into `models.py` from `slop_detector.py` to kill the last upward import; anything in `database/` that reaches up for a shared shape is an architectural inversion — put the shape here instead.) When the database layer genuinely needs higher-layer *behavior* at a fixed seam — `add_message` persisting workflow attachments inside its own write transaction — it declares the contract and the higher layer registers an implementation (dependency inversion): `database/queries/messages.py` owns `register_workflow_attachment_persister`, and `workflows/attachment_cache.py` registers `insert_workflow_attachments` into it at import. Don't reintroduce a lazy `import backend.workflows` inside a `database/` function to dodge the rule — that hides the inversion from the import graph without removing it.
 
 - **The TypedDicts label plain dicts, with zero runtime change.** The query layer still returns ordinary `dict(row)` objects; each query stamps the shape at its boundary with `cast(SomeRow, ...)` (a `TypedDict` is not assignable from a bare `dict`). So `row["col"]` access is checked against the schema without any wrapper object, validation, or runtime cost. Each `queries/*.py` module imports just the contract(s) for its tables (`SettingsRow`, `ConversationRow`/`ConversationListRow`, `MessageRow`/`MessageWithAttachments`, `EndpointRow`, `ModelConfigRow`, `WorldRow`, `LorebookEntryRow`, `CharacterCardRow`, `DirectorStateRow`, `DirectorFragmentRow`, `MoodFragmentRow`, `UserPersonaRow`, `ConversationLogRow`, `PhraseBankRow`, and the attachment rows).
 - **Every row-shaped query return is typed; only free-form blobs stay `dict`.** A query that returns table rows uses a contract. The lone exception is the per-workflow JSON state/config accessors (`get_workflow_state`, `get_workflow_message_state`, `get_workflow_character_state`, `get_workflow_config`) — these decode an arbitrary per-workflow slot with no fixed schema, so they correctly return bare `dict`/`dict | None`. Don't invent a contract for those.
@@ -389,7 +389,7 @@ Because the writer's KV cache now lives on a different server than the agent pas
 
 Orb sends the **full active message path** (leaf to root) every turn — no automatic truncation or rolling window. Inactive sibling branches are not included.
 
-- `updateContextCounter()` calls `GET /api/conversations/{cid}/context-size` which computes a per-component token breakdown (system prompt, persona, scenario, messages, director injection, lorebook, post-history) using `chars / 3.5` per component
+- `updateContextCounter()` calls `GET /api/conversations/{cid}/context-size` which computes a per-component token breakdown (system prompt, persona, scenario, messages, director injection, lorebook, post-history) using `chars / 4` per component
 - **Manual compress flow**: `POST /summarize` → LLM writes narrative summary → user reviews → `POST /compress` → creates new conversation with summary + last N messages
 - No RAG, no background compaction, no automatic summarization
 
@@ -463,4 +463,4 @@ See [docs/architecture/secondary-workflow.md](docs/architecture/secondary-workfl
 
 9. **Lorebook scan depth** — Hard-coded to 6 messages (`LOREBOOK_SCAN_DEPTH` in `prompt_builder.py`). Only the last 6 messages are scanned for lorebook keyword matches.
 
-10. **Macros resolve at different levels** — `resolve_message()` expands everything ({{user}}, {{char}}, inline macros like {{roll}}). `resolve_prompt()` only does {{user}}/{{char}} substitution. Use `resolve_prompt()` for historical messages where inline macros shouldn't fire.
+10. **Macros resolve at different levels** — `resolve_message()` expands everything ({{user}}, {{char}}, inline macros like {{roll}}). `resolve_prompt()` only does {{user}}/{{char}} substitution. Use `resolve_prompt()` for historical messages where inline macros shouldn't fire. `macros.py` is a **dependency-free leaf** (it imports nothing else in the codebase — like `database/models.py` and `llm_types.py`): it transforms strings and message dicts, and knows nothing about the LLM client. The transport-boundary catch-all that scrubs `{{user}}`/`{{char}}` from *every* outgoing message (the director's tool prompt embeds user-authored fragment text that can carry `{{char}}`) is `Macros.resolve_prompt_messages`, wired in as the `CachedBase.resolve` hook in `kv_tracker.py` — applied to `[*prefix, *trailing]` right before the call, so the KV tracker snapshots the exact resolved bytes sent. There is **no** macro-resolving `LLMClient` subclass/wrapper; don't reintroduce one.
diff --git a/backend/database/queries/messages.py b/backend/database/queries/messages.py
@@ -3,7 +3,7 @@
 import json
 import sqlite3
 from datetime import datetime, timezone
-from typing import Any, List, Mapping, Optional, Sequence, cast
+from typing import Any, List, Mapping, Optional, Protocol, Sequence, cast
 
 from ..connection import get_db
 from ..models import (
@@ -15,6 +15,36 @@
 from .conversations import get_conversation
 
 
+class _WorkflowAttachmentPersister(Protocol):
+    """Persists workflow attachments inside ``add_message``'s transaction.
+
+    Implemented by ``backend.workflows.attachment_cache.insert_workflow_attachments``
+    and registered at import time -- see the dependency-inversion note below.
+    """
+
+    async def __call__(self, message_id: int, attachments: list[dict], *, db: Any = None) -> tuple[list[int], list[dict]]: ...
+
+
+# Dependency-inversion seam. ``add_message`` must insert workflow attachments
+# inside its own write transaction (the cache layer's read->evict->insert runs
+# under the same lock as the message INSERT), yet the database layer must never
+# import "up" into ``backend.workflows``. So the workflow layer registers its
+# persister here at import time and ``add_message`` calls through this slot.
+# Left None in DB-only contexts that never produce workflow attachments; in
+# that state a workflow attachment reaching ``add_message`` is a wiring bug, so
+# we fail loudly rather than silently dropping bytes.
+_workflow_attachment_persister: "_WorkflowAttachmentPersister | None" = None
+
+
+def register_workflow_attachment_persister(fn: "_WorkflowAttachmentPersister") -> None:
+    """Wire the workflow-attachment persister into ``add_message``.
+
+    Called once, at import of ``backend.workflows.attachment_cache``.
+    """
+    global _workflow_attachment_persister
+    _workflow_attachment_persister = fn
+
+
 async def get_path_to_leaf(cid: str, leaf_id: int) -> list[MessageWithAttachments]:
     """Walk parent_id chain from leaf to root, return ordered root→leaf."""
     async with get_db() as db:
@@ -243,12 +273,16 @@ async def add_message(
                     now,
                 ),
             )
-        # Lazy import: the database package must not depend on
-        # workflows at import time (would invert the layering).
+        # Persist workflow attachments through the registered persister
+        # (see register_workflow_attachment_persister) so the database layer
+        # never imports up into backend.workflows.
         if workflow_atts:
-            from backend.workflows.attachment_cache import insert_workflow_attachments
-
-            _, rejected_workflow_atts = await insert_workflow_attachments(message_id, workflow_atts, db=db)
+            if _workflow_attachment_persister is None:
+                raise RuntimeError(
+                    "workflow attachments supplied to add_message but no persister is "
+                    "registered -- import backend.workflows before producing them"
+                )
+            _, rejected_workflow_atts = await _workflow_attachment_persister(message_id, workflow_atts, db=db)
         await db.execute("UPDATE conversations SET updated_at = ? WHERE id = ?", (now, cid))
         await db.commit()
 

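Reviewer sketch of the registration seam introduced in `messages.py` above. It is a minimal, self-contained illustration, not the project's code: `AttachmentPersister`, `register_persister`, the trimmed-down `add_message`, and the toy `insert_attachments` policy are all hypothetical stand-ins. The lower layer declares the contract and owns the slot, and the higher layer registers an implementation into it.

```python
import asyncio
from typing import Any, Optional, Protocol


class AttachmentPersister(Protocol):
    """Contract declared by the lower layer (hypothetical name)."""

    async def __call__(
        self, message_id: int, attachments: list[dict], *, db: Any = None
    ) -> tuple[list[int], list[dict]]: ...


# Lower layer: owns the slot and never imports the higher layer.
_persister: Optional[AttachmentPersister] = None


def register_persister(fn: AttachmentPersister) -> None:
    """Store the implementation supplied by the higher layer."""
    global _persister
    _persister = fn


async def add_message(message_id: int, attachments: list[dict]) -> list[dict]:
    """Return the rejected attachments; fail loudly if nothing is registered."""
    if not attachments:
        return []
    if _persister is None:
        raise RuntimeError("attachments supplied but no persister is registered")
    _, rejected = await _persister(message_id, attachments, db=None)
    return rejected


# Higher layer: supplies the behavior and would register it at import time.
async def insert_attachments(
    message_id: int, attachments: list[dict], *, db: Any = None
) -> tuple[list[int], list[dict]]:
    # Toy policy for illustration only: reject attachments without a filename.
    accepted = [a for a in attachments if a.get("filename")]
    rejected = [a for a in attachments if not a.get("filename")]
    return list(range(1, len(accepted) + 1)), rejected
```

In this sketch, calling `register_persister(insert_attachments)` once before the first `add_message` call is the whole wiring step. Without it, a non-empty attachment list raises `RuntimeError` instead of silently dropping data, which mirrors the guard added in the hunk above.
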
diff --git a/backend/database/queries/workflow_attachments.py b/backend/database/queries/workflow_attachments.py
@@ -25,6 +25,13 @@
 
 logger = logging.getLogger(__name__)
 
+# Sentinel string written into ``data_b64`` when an artifact's bytes are
+# evicted (the other columns stay intact so a later rehydrate can recover the
+# bytes from stored parameters). Defined here -- in the database boundary --
+# because it describes the persisted shape of the column, not cache policy.
+# ``backend.workflows.attachment_cache`` re-exports it for the eviction layer.
+EVICTED_MARKER = "[evicted]"
+
 
 def _encode_metadata_field(value: object, field_name: str, workflow_id: str, filename: str) -> str | None:
     """JSON-encode a dict-shaped metadata field, or return None for absent/bad shape.
@@ -133,9 +140,6 @@ async def insert_workflow_attachment_row(
             raise ValueError("attachment data is empty")
 
     if insert_as_evicted:
-        # Lazy import keeps queries module free of attachment_cache cycle.
-        from backend.workflows.attachment_cache import EVICTED_MARKER
-
         data_b64 = EVICTED_MARKER
     elif has_path:
         with open(attachment["path"], "rb") as f:

diff --git a/backend/kv_tracker.py b/backend/kv_tracker.py
@@ -32,7 +32,7 @@
 import json
 import logging
 from dataclasses import dataclass
-from typing import Any, AsyncIterator, Mapping, Sequence
+from typing import Any, AsyncIterator, Callable, Mapping, Sequence
 
 logger = logging.getLogger(__name__)
 
@@ -208,7 +208,11 @@ def log_summary(self) -> None:
             elif stats["source"] in ("unrecognized", "no_cache_fields"):
                 provider_note = f"provider: prompt={stats['prompt_tokens']} tok  cached=N/A [{stats['source']}]"
             else:
-                pt, ct, cw = stats["prompt_tokens"], stats["cached_tokens"], stats["cache_write_tokens"]
+                pt, ct, cw = (
+                    stats["prompt_tokens"],
+                    stats["cached_tokens"],
+                    stats["cache_write_tokens"],
+                )
                 total_cached += ct
                 total_prompt += pt
                 pct = (ct / pt * 100) if pt else 0.0
@@ -291,11 +295,20 @@ class CachedBase:
     tools when it runs on a different server than the agent" — is then just a
     property of how the writer's base is built (empty ``tools``), not a flag
     threaded through the writer pass.
+
+    ``resolve`` is the last step of turning the assembled stack into the literal
+    bytes on the wire: an opaque ``messages -> messages`` transform applied to
+    ``[*prefix, *trailing]`` immediately before the call (in practice
+    ``Macros.resolve_prompt_messages``, scrubbing ``{{user}}``/``{{char}}`` from
+    whatever a pass appended). Keeping it on the base means the tracker snapshot
+    is taken from the *resolved* bytes — the same ones sent — so it cannot drift.
+    ``None`` means send the assembled stack unchanged.
     """
 
     prefix: tuple[Mapping[str, Any], ...]
     tools: tuple[dict, ...]
     model: str
+    resolve: Callable[[Sequence[Mapping[str, Any]]], list[dict]] | None = None
 
     def complete(
         self,
@@ -312,13 +325,17 @@ def complete(
         per-pass top of the stack). The cached bottom — prefix + tools + model —
         comes solely from ``self``; only *trailing* and *tool_choice* vary.
 
-        Delegates to :func:`cached_complete` so the tracker snapshot is taken
-        from the exact bytes sent.
+        The assembled stack is run through ``self.resolve`` (if set) to produce
+        the final wire bytes, then handed to :func:`cached_complete` so the
+        tracker snapshot is taken from the exact bytes sent.
         """
+        messages: Sequence[Mapping[str, Any]] = [*self.prefix, *trailing]
+        if self.resolve is not None:
+            messages = self.resolve(messages)
         return cached_complete(
             client,
             label=label,
-            messages=[*self.prefix, *trailing],
+            messages=messages,
             model=self.model,
             tools=list(self.tools) or None,
             tool_choice=tool_choice,

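Reviewer sketch of the `resolve` hook described in the `CachedBase` docstring above. `Base`, `wire_messages`, and `scrub_placeholders` are hypothetical, simplified stand-ins (the real class also carries `tools` and `model` and delegates to `cached_complete`). The only point illustrated is that the transform runs on `[*prefix, *trailing]` as the last step, so whatever is snapshotted afterwards is the resolved message list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence

Messages = Sequence[Mapping[str, Any]]


def scrub_placeholders(user: str, char: str) -> Callable[[Messages], list[dict]]:
    """Build a messages -> messages transform (stand-in for the real hook)."""

    def resolve(messages: Messages) -> list[dict]:
        out = []
        for m in messages:
            content = m.get("content")
            # Only plain-string content is handled in this sketch.
            if isinstance(content, str):
                content = content.replace("{{user}}", user).replace("{{char}}", char)
            out.append({**m, "content": content})
        return out

    return resolve


@dataclass(frozen=True)
class Base:
    prefix: tuple
    resolve: Optional[Callable[[Messages], list[dict]]] = None

    def wire_messages(self, trailing: Messages) -> Messages:
        """Assemble the stack, then apply ``resolve`` (if set) as the final step."""
        messages: Messages = [*self.prefix, *trailing]
        if self.resolve is not None:
            messages = self.resolve(messages)
        return messages
```

With `resolve=None` the assembled stack is returned unchanged, matching the "send the assembled stack unchanged" default in the docstring.
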
diff --git a/backend/macros.py b/backend/macros.py
@@ -1,6 +1,13 @@
 """
 macros.py — Macro resolution for prompts and messages.
 
+A dependency-free leaf: it turns ``{{user}}``/``{{char}}`` and inline macros
+like ``{{roll}}`` into literal text and imports nothing else in the codebase.
+It knows about *strings and message dicts*, not about the LLM client — the
+pipeline applies :meth:`Macros.resolve_prompt_messages` at the transport
+boundary (the cached-base ``resolve`` hook in ``kv_tracker.py``) rather than
+this module reaching up into the client layer.
+
 Public API:
     resolve_message(text, user_name, char_name) — Full resolution
         ({{user}}/{{char}} + inline macros like {{roll}}).
@@ -15,7 +22,8 @@
     Macros.resolve_message(text)      — instance method, full resolution
     Macros.resolve_prompt(text)       — instance method, substitution only
     Macros.resolve_prompt_messages(msgs) — batch prompt-level res on message list
-    Macros.wrap_client(client)        — wraps LLMClient for prompt-level resolution
+        (the transport-boundary catch-all that guarantees no placeholder
+        reaches the model, whatever a pass assembled)
     Macros.from_settings(...)         — factory from app settings
 """
 
@@ -25,8 +33,6 @@
 import re
 from typing import Any, Mapping, NamedTuple, Sequence
 
-from .llm_client import LLMClient
-
 
 # ---------------------------------------------------------------------------
 # Internal helpers
@@ -120,55 +126,22 @@ def resolve_prompt(self, text: str) -> str:
         """Only {{user}}/{{char}} substitution (no inline macros)."""
         return resolve_prompt(text, self.user, self.char)
 
-    def _resolve_prompt_on_message(self, msg: dict) -> dict:
+    def _resolve_prompt_on_message(self, msg: Mapping[str, Any]) -> dict:
         """Apply prompt-level resolution (substitution only) to a single message dict."""
         return {
             **msg,
             "content": _apply_content(msg.get("content"), lambda t: self.resolve_prompt(t)),
         }
 
-    def resolve_prompt_messages(self, messages: list[dict]) -> list[dict]:
-        """Apply prompt-level resolution to a list of message dicts."""
+    def resolve_prompt_messages(self, messages: Sequence[Mapping[str, Any]]) -> list[dict]:
+        """Apply prompt-level resolution to every message in a list.
+
+        This is the transport-boundary catch-all: passed to a cached base's
+        ``resolve`` hook so the fully-assembled wire messages are scrubbed of
+        ``{{user}}``/``{{char}}`` just before they are sent, no matter which
+        pass built them (e.g. the director's tool prompt embeds user-authored
+        fragment text that can carry ``{{char}}``). Inline macros like
+        ``{{roll}}`` are intentionally *not* fired here — those are resolved on
+        the latest user message and prefix content when it is built.
+        """
         return [self._resolve_prompt_on_message(m) for m in messages]
-
-    def wrap_client(self, client: LLMClient) -> "_PlaceholderClient":
-        return _PlaceholderClient(client, self.user, self.char)
-
-
-class _PlaceholderClient(LLMClient):
-    """Wraps LLMClient to resolve {{user}}/{{char}} on all messages before completion.
-
-    Only applies prompt-level resolution (no inline macros) — inline macros
-    must be resolved on the latest user message before it reaches this client.
-    """
-
-    def __init__(self, inner: LLMClient, user_name: str, char_name: str) -> None:
-        self._inner = inner
-        self._user_name = user_name
-        self._char_name = char_name
-        # Share the inner client's abort token so the inherited abort()/
-        # is_aborted reflect the same turn-wide stop signal — no delegation
-        # overrides needed. Transport config (base_url/profile/…) is left unset
-        # since complete() delegates to the inner client rather than using it.
-        self.abort_token = inner.abort_token
-
-    async def complete(
-        self,
-        messages: Sequence[Mapping[str, Any]],
-        model: str,
-        tools: list[dict] | None = None,
-        tool_choice: dict | str | None = None,
-        **params,
-    ):
-        msgs = [
-            {
-                **msg,
-                "content": _apply_content(
-                    msg.get("content"),
-                    lambda t: resolve_prompt(t, self._user_name, self._char_name),
-                ),
-            }
-            for msg in messages
-        ]
-        async for item in self._inner.complete(msgs, model, tools=tools, tool_choice=tool_choice, **params):
-            yield item
diff --git a/backend/main.py b/backend/main.py
@@ -134,6 +134,7 @@
 from . import card_downloader
 from . import prompt_builder
 from .summarizer import ConversationSummarizer
+from .utils import estimate_tokens
 
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
@@ -1674,9 +1675,6 @@ async def api_get_context_size(cid: str):
     recent_messages = messages[-scan_depth:] if len(messages) >= scan_depth else messages
     lorebook_block = prompt_builder.compute_lorebook_injection_block(recent_messages, lorebook_entries, macros)
 
-    def est(chars):
-        return max(1, round(chars / 3.5))
-
     breakdown = {}
     for label, chars in [
         ("system_prompt", len(sys_text)),
@@ -1689,12 +1687,12 @@ def est(chars):
         ("director_injection", len(inj_block)),
         ("lorebook", len(lorebook_block)),
     ]:
-        breakdown[label] = {"chars": chars, "tokens_est": est(chars)}
+        breakdown[label] = {"chars": chars, "tokens_est": estimate_tokens(chars)}
 
     total_chars = sum(v["chars"] for v in breakdown.values())
     return {
         "total_chars": total_chars,
-        "total_tokens_est": est(total_chars),
+        "total_tokens_est": estimate_tokens(total_chars),
         "breakdown": breakdown,
         "message_count": len(messages),
     }
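
Reviewer sketch of the estimate this endpoint now delegates to. The body of `utils.estimate_tokens` is not part of this diff, so the version below is an assumption pieced together from two things in the patch: the `chars / 4` figure in the AGENTS.md hunk and the `max(1, ...)` floor of the removed local `est`. `context_breakdown` is a hypothetical helper that mimics the response shape.

```python
def estimate_tokens(chars: int) -> int:
    """Assumed heuristic: about four characters per token, never below 1."""
    return max(1, round(chars / 4))


def context_breakdown(components: dict[str, str]) -> dict:
    """Build a per-component breakdown shaped like the endpoint's response."""
    breakdown = {
        label: {"chars": len(text), "tokens_est": estimate_tokens(len(text))}
        for label, text in components.items()
    }
    total_chars = sum(v["chars"] for v in breakdown.values())
    return {
        "total_chars": total_chars,
        # Estimated from the total character count, as in the endpoint,
        # not summed from the per-component estimates.
        "total_tokens_est": estimate_tokens(total_chars),
        "breakdown": breakdown,
    }
```

Under these assumptions the per-component estimates need not add up to `total_tokens_est`: an empty component still reports 1 token because of the floor, while contributing nothing to `total_chars`.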