Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,7 @@ erDiagram

## Data Contracts (the model layer)

`backend/database/models.py` is the **model layer**: domain data contracts (a `TypedDict` per table-group row, plus the `PhraseGroup` union — `list[str] | LiteralPhraseGroup | RegexPhraseGroup`, a discriminated union keyed on `kind`) that describe the *shape* of persisted data and depend on nothing else in the codebase. The dependency rule is one-way — every other layer points its dependencies **inward**, toward the data, and `backend/database/` must never import "up" into `passes/` or `orchestrator.py`. (The introducing commit moved `PhraseGroup` *down* into `models.py` from `slop_detector.py` to kill the last upward import; anything in `database/` that reaches up for a shared shape is an architectural inversion — put the shape here instead.)
`backend/database/models.py` is the **model layer**: domain data contracts (a `TypedDict` per table-group row, plus the `PhraseGroup` union — `list[str] | LiteralPhraseGroup | RegexPhraseGroup`, a discriminated union keyed on `kind`) that describe the *shape* of persisted data and depend on nothing else in the codebase. The dependency rule is one-way — every other layer points its dependencies **inward**, toward the data, and `backend/database/` must never import "up" into `passes/`, `orchestrator.py`, or `workflows/`. (The introducing commit moved `PhraseGroup` *down* into `models.py` from `slop_detector.py` to kill the last upward import; anything in `database/` that reaches up for a shared shape is an architectural inversion — put the shape here instead.) When the database layer genuinely needs higher-layer *behavior* at a fixed seam — `add_message` persisting workflow attachments inside its own write transaction — it declares the contract and the higher layer registers an implementation (dependency inversion): `database/queries/messages.py` owns `register_workflow_attachment_persister`, and `workflows/attachment_cache.py` registers `insert_workflow_attachments` into it at import. Don't reintroduce a lazy `import backend.workflows` inside a `database/` function to dodge the rule — that hides the inversion from the import graph without removing it.

- **The TypedDicts label plain dicts, with zero runtime change.** The query layer still returns ordinary `dict(row)` objects; each query stamps the shape at its boundary with `cast(SomeRow, ...)` (a `TypedDict` is not assignable from a bare `dict`). So `row["col"]` access is checked against the schema without any wrapper object, validation, or runtime cost. Each `queries/*.py` module imports just the contract(s) for its tables (`SettingsRow`, `ConversationRow`/`ConversationListRow`, `MessageRow`/`MessageWithAttachments`, `EndpointRow`, `ModelConfigRow`, `WorldRow`, `LorebookEntryRow`, `CharacterCardRow`, `DirectorStateRow`, `DirectorFragmentRow`, `MoodFragmentRow`, `UserPersonaRow`, `ConversationLogRow`, `PhraseBankRow`, and the attachment rows).
- **Every row-shaped query return is typed; only free-form blobs stay `dict`.** A query that returns table rows uses a contract. The lone exception is the per-workflow JSON state/config accessors (`get_workflow_state`, `get_workflow_message_state`, `get_workflow_character_state`, `get_workflow_config`) — these decode an arbitrary per-workflow slot with no fixed schema, so they correctly return bare `dict`/`dict | None`. Don't invent a contract for those.
Expand Down Expand Up @@ -389,7 +389,7 @@ Because the writer's KV cache now lives on a different server than the agent pas

Orb sends the **full active message path** (leaf to root) every turn — no automatic truncation or rolling window. Inactive sibling branches are not included.

- `updateContextCounter()` calls `GET /api/conversations/{cid}/context-size` which computes a per-component token breakdown (system prompt, persona, scenario, messages, director injection, lorebook, post-history) using `chars / 3.5` per component
- `updateContextCounter()` calls `GET /api/conversations/{cid}/context-size` which computes a per-component token breakdown (system prompt, persona, scenario, messages, director injection, lorebook, post-history) using `chars / 4` per component
- **Manual compress flow**: `POST /summarize` → LLM writes narrative summary → user reviews → `POST /compress` → creates new conversation with summary + last N messages
- No RAG, no background compaction, no automatic summarization

Expand Down Expand Up @@ -463,4 +463,4 @@ See [docs/architecture/secondary-workflow.md](docs/architecture/secondary-workfl

9. **Lorebook scan depth** — Hard-coded to 6 messages (`LOREBOOK_SCAN_DEPTH` in `prompt_builder.py`). Only the last 6 messages are scanned for lorebook keyword matches.

10. **Macros resolve at different levels** — `resolve_message()` expands everything ({{user}}, {{char}}, inline macros like {{roll}}). `resolve_prompt()` only does {{user}}/{{char}} substitution. Use `resolve_prompt()` for historical messages where inline macros shouldn't fire.
10. **Macros resolve at different levels** — `resolve_message()` expands everything ({{user}}, {{char}}, inline macros like {{roll}}). `resolve_prompt()` only does {{user}}/{{char}} substitution. Use `resolve_prompt()` for historical messages where inline macros shouldn't fire. `macros.py` is a **dependency-free leaf** (it imports nothing else in the codebase — like `database/models.py` and `llm_types.py`): it transforms strings and message dicts, and knows nothing about the LLM client. The transport-boundary catch-all that scrubs `{{user}}`/`{{char}}` from *every* outgoing message (the director's tool prompt embeds user-authored fragment text that can carry `{{char}}`) is `Macros.resolve_prompt_messages`, wired in as the `CachedBase.resolve` hook in `kv_tracker.py` — applied to `[*prefix, *trailing]` right before the call, so the KV tracker snapshots the exact resolved bytes sent. There is **no** macro-resolving `LLMClient` subclass/wrapper; don't reintroduce one.
46 changes: 40 additions & 6 deletions backend/database/queries/messages.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, List, Mapping, Optional, Sequence, cast
from typing import Any, List, Mapping, Optional, Protocol, Sequence, cast

from ..connection import get_db
from ..models import (
Expand All @@ -15,6 +15,36 @@
from .conversations import get_conversation


class _WorkflowAttachmentPersister(Protocol):
"""Persists workflow attachments inside ``add_message``'s transaction.

Implemented by ``backend.workflows.attachment_cache.insert_workflow_attachments``
and registered at import time -- see the dependency-inversion note below.
"""

async def __call__(self, message_id: int, attachments: list[dict], *, db: Any = None) -> tuple[list[int], list[dict]]: ...


# Dependency-inversion seam. ``add_message`` must insert workflow attachments
# inside its own write transaction (the cache layer's read->evict->insert runs
# under the same lock as the message INSERT), yet the database layer must never
# import "up" into ``backend.workflows``. So the workflow layer registers its
# persister here at import time and ``add_message`` calls through this slot.
# Left None in DB-only contexts that never produce workflow attachments; in
# that state a workflow attachment reaching ``add_message`` is a wiring bug, so
# we fail loudly rather than silently dropping bytes.
_workflow_attachment_persister: "_WorkflowAttachmentPersister | None" = None


def register_workflow_attachment_persister(fn: "_WorkflowAttachmentPersister") -> None:
"""Wire the workflow-attachment persister into ``add_message``.

Called once, at import of ``backend.workflows.attachment_cache``.
"""
global _workflow_attachment_persister
_workflow_attachment_persister = fn


async def get_path_to_leaf(cid: str, leaf_id: int) -> list[MessageWithAttachments]:
"""Walk parent_id chain from leaf to root, return ordered root→leaf."""
async with get_db() as db:
Expand Down Expand Up @@ -243,12 +273,16 @@ async def add_message(
now,
),
)
# Lazy import: the database package must not depend on
# workflows at import time (would invert the layering).
# Persist workflow attachments through the registered persister
# (see register_workflow_attachment_persister) so the database layer
# never imports up into backend.workflows.
if workflow_atts:
from backend.workflows.attachment_cache import insert_workflow_attachments

_, rejected_workflow_atts = await insert_workflow_attachments(message_id, workflow_atts, db=db)
if _workflow_attachment_persister is None:
raise RuntimeError(
"workflow attachments supplied to add_message but no persister is "
"registered -- import backend.workflows before producing them"
)
_, rejected_workflow_atts = await _workflow_attachment_persister(message_id, workflow_atts, db=db)
await db.execute("UPDATE conversations SET updated_at = ? WHERE id = ?", (now, cid))
await db.commit()

Expand Down
10 changes: 7 additions & 3 deletions backend/database/queries/workflow_attachments.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,13 @@

logger = logging.getLogger(__name__)

# Sentinel string written into ``data_b64`` when an artifact's bytes are
# evicted (the other columns stay intact so a later rehydrate can recover the
# bytes from stored parameters). Defined here -- in the database boundary --
# because it describes the persisted shape of the column, not cache policy.
# ``backend.workflows.attachment_cache`` re-exports it for the eviction layer.
EVICTED_MARKER = "[evicted]"


def _encode_metadata_field(value: object, field_name: str, workflow_id: str, filename: str) -> str | None:
"""JSON-encode a dict-shaped metadata field, or return None for absent/bad shape.
Expand Down Expand Up @@ -133,9 +140,6 @@ async def insert_workflow_attachment_row(
raise ValueError("attachment data is empty")

if insert_as_evicted:
# Lazy import keeps queries module free of attachment_cache cycle.
from backend.workflows.attachment_cache import EVICTED_MARKER

data_b64 = EVICTED_MARKER
elif has_path:
with open(attachment["path"], "rb") as f:
Expand Down
27 changes: 22 additions & 5 deletions backend/kv_tracker.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
import json
import logging
from dataclasses import dataclass
from typing import Any, AsyncIterator, Mapping, Sequence
from typing import Any, AsyncIterator, Callable, Mapping, Sequence

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -208,7 +208,11 @@ def log_summary(self) -> None:
elif stats["source"] in ("unrecognized", "no_cache_fields"):
provider_note = f"provider: prompt={stats['prompt_tokens']} tok cached=N/A [{stats['source']}]"
else:
pt, ct, cw = stats["prompt_tokens"], stats["cached_tokens"], stats["cache_write_tokens"]
pt, ct, cw = (
stats["prompt_tokens"],
stats["cached_tokens"],
stats["cache_write_tokens"],
)
total_cached += ct
total_prompt += pt
pct = (ct / pt * 100) if pt else 0.0
Expand Down Expand Up @@ -291,11 +295,20 @@ class CachedBase:
tools when it runs on a different server than the agent" — is then just a
property of how the writer's base is built (empty ``tools``), not a flag
threaded through the writer pass.

``resolve`` is the last step of turning the assembled stack into the literal
bytes on the wire: an opaque ``messages -> messages`` transform applied to
``[*prefix, *trailing]`` immediately before the call (in practice
``Macros.resolve_prompt_messages``, scrubbing ``{{user}}``/``{{char}}`` from
whatever a pass appended). Keeping it on the base means the tracker snapshot
is taken from the *resolved* bytes — the same ones sent — so it cannot drift.
``None`` means send the assembled stack unchanged.
"""

prefix: tuple[Mapping[str, Any], ...]
tools: tuple[dict, ...]
model: str
resolve: Callable[[Sequence[Mapping[str, Any]]], list[dict]] | None = None

def complete(
self,
Expand All @@ -312,13 +325,17 @@ def complete(
per-pass top of the stack). The cached bottom — prefix + tools + model —
comes solely from ``self``; only *trailing* and *tool_choice* vary.

Delegates to :func:`cached_complete` so the tracker snapshot is taken
from the exact bytes sent.
The assembled stack is run through ``self.resolve`` (if set) to produce
the final wire bytes, then handed to :func:`cached_complete` so the
tracker snapshot is taken from the exact bytes sent.
"""
messages: Sequence[Mapping[str, Any]] = [*self.prefix, *trailing]
if self.resolve is not None:
messages = self.resolve(messages)
return cached_complete(
client,
label=label,
messages=[*self.prefix, *trailing],
messages=messages,
model=self.model,
tools=list(self.tools) or None,
tool_choice=tool_choice,
Expand Down
69 changes: 21 additions & 48 deletions backend/macros.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,13 @@
"""
macros.py — Macro resolution for prompts and messages.

A dependency-free leaf: it turns ``{{user}}``/``{{char}}`` and inline macros
like ``{{roll}}`` into literal text and imports nothing else in the codebase.
It knows about *strings and message dicts*, not about the LLM client — the
pipeline applies :meth:`Macros.resolve_prompt_messages` at the transport
boundary (the cached-base ``resolve`` hook in ``kv_tracker.py``) rather than
this module reaching up into the client layer.

Public API:
resolve_message(text, user_name, char_name) — Full resolution
({{user}}/{{char}} + inline macros like {{roll}}).
Expand All @@ -15,7 +22,8 @@
Macros.resolve_message(text) — instance method, full resolution
Macros.resolve_prompt(text) — instance method, substitution only
Macros.resolve_prompt_messages(msgs) — batch prompt-level res on message list
Macros.wrap_client(client) — wraps LLMClient for prompt-level resolution
(the transport-boundary catch-all that guarantees no placeholder
reaches the model, whatever a pass assembled)
Macros.from_settings(...) — factory from app settings
"""

Expand All @@ -25,8 +33,6 @@
import re
from typing import Any, Mapping, NamedTuple, Sequence

from .llm_client import LLMClient


# ---------------------------------------------------------------------------
# Internal helpers
Expand Down Expand Up @@ -120,55 +126,22 @@ def resolve_prompt(self, text: str) -> str:
"""Only {{user}}/{{char}} substitution (no inline macros)."""
return resolve_prompt(text, self.user, self.char)

def _resolve_prompt_on_message(self, msg: dict) -> dict:
def _resolve_prompt_on_message(self, msg: Mapping[str, Any]) -> dict:
"""Apply prompt-level resolution (substitution only) to a single message dict."""
return {
**msg,
"content": _apply_content(msg.get("content"), lambda t: self.resolve_prompt(t)),
}

def resolve_prompt_messages(self, messages: list[dict]) -> list[dict]:
"""Apply prompt-level resolution to a list of message dicts."""
def resolve_prompt_messages(self, messages: Sequence[Mapping[str, Any]]) -> list[dict]:
"""Apply prompt-level resolution to every message in a list.

This is the transport-boundary catch-all: passed to a cached base's
``resolve`` hook so the fully-assembled wire messages are scrubbed of
``{{user}}``/``{{char}}`` just before they are sent, no matter which
pass built them (e.g. the director's tool prompt embeds user-authored
fragment text that can carry ``{{char}}``). Inline macros like
``{{roll}}`` are intentionally *not* fired here — those are resolved on
the latest user message and prefix content when it is built.
"""
return [self._resolve_prompt_on_message(m) for m in messages]

def wrap_client(self, client: LLMClient) -> "_PlaceholderClient":
return _PlaceholderClient(client, self.user, self.char)


class _PlaceholderClient(LLMClient):
"""Wraps LLMClient to resolve {{user}}/{{char}} on all messages before completion.

Only applies prompt-level resolution (no inline macros) — inline macros
must be resolved on the latest user message before it reaches this client.
"""

def __init__(self, inner: LLMClient, user_name: str, char_name: str) -> None:
self._inner = inner
self._user_name = user_name
self._char_name = char_name
# Share the inner client's abort token so the inherited abort()/
# is_aborted reflect the same turn-wide stop signal — no delegation
# overrides needed. Transport config (base_url/profile/…) is left unset
# since complete() delegates to the inner client rather than using it.
self.abort_token = inner.abort_token

async def complete(
self,
messages: Sequence[Mapping[str, Any]],
model: str,
tools: list[dict] | None = None,
tool_choice: dict | str | None = None,
**params,
):
msgs = [
{
**msg,
"content": _apply_content(
msg.get("content"),
lambda t: resolve_prompt(t, self._user_name, self._char_name),
),
}
for msg in messages
]
async for item in self._inner.complete(msgs, model, tools=tools, tool_choice=tool_choice, **params):
yield item
8 changes: 3 additions & 5 deletions backend/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,7 @@
from . import card_downloader
from . import prompt_builder
from .summarizer import ConversationSummarizer
from .utils import estimate_tokens

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -1674,9 +1675,6 @@ async def api_get_context_size(cid: str):
recent_messages = messages[-scan_depth:] if len(messages) >= scan_depth else messages
lorebook_block = prompt_builder.compute_lorebook_injection_block(recent_messages, lorebook_entries, macros)

def est(chars):
return max(1, round(chars / 3.5))

breakdown = {}
for label, chars in [
("system_prompt", len(sys_text)),
Expand All @@ -1689,12 +1687,12 @@ def est(chars):
("director_injection", len(inj_block)),
("lorebook", len(lorebook_block)),
]:
breakdown[label] = {"chars": chars, "tokens_est": est(chars)}
breakdown[label] = {"chars": chars, "tokens_est": estimate_tokens(chars)}

total_chars = sum(v["chars"] for v in breakdown.values())
return {
"total_chars": total_chars,
"total_tokens_est": est(total_chars),
"total_tokens_est": estimate_tokens(total_chars),
"breakdown": breakdown,
"message_count": len(messages),
}
Expand Down
Loading
Loading