All notable changes to Headroom will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- proxy: measure and surface rolling and current token throughput metrics (active/wall-clock input, compression, effective forward, and streamed generation) in
headroom perfCLI and the dashboard (#959). - vibe: add Mistral Vibe CLI support with
headroom wrap vibe. - proxy: per-project savings breakdown on the dashboard for all wrapped agents — Claude Code, Codex, aider, Copilot, and Cursor (#802).
headroom wrap claude/codextag requests with anX-Headroom-Projectheader (launch-directory name);wrap aider/copilot/cursor— whose clients cannot send custom headers — use a/p/<name>base-URL prefix the proxy strips. Savings are aggregated per project (persisted, schema v3 with transparent v2 migration), exposed assavings.per_projectin/statsandprojectsin/stats-history, and shown in a Per-Project Savings dashboard table. - memory: opt-in Apple-GPU (MPS) embedding offload via
HEADROOM_EMBEDDER_RUNTIME=pytorch_mps. When set (and Apple MPS is available), the memory embedder runs on the torch sentence-transformers backend on the Apple GPU instead of the default ONNX CPU embedder, freeing the CPU under load. If MPS or the dependencies are unavailable, Headroom logs a warning and uses the existing default embedder selection path (ONNX when available, then the pre-existing local fallback). MPS encode calls are serialized internally (torch-MPS is not thread-safe). Adds the new[pytorch-mps]extra (pip install 'headroom-ai[pytorch-mps]'). Default behavior is unchanged.
- proxy: cross-region Bedrock inference-profile detection — geo-prefixed model IDs (
eu./us./apac./global.) are now resolved to their canonical vendor, so Anthropic cross-region profiles (e.g.eu.anthropic.claude-haiku-4-5-20251001-v1:0) receive live-zone compression instead of being silently skipped (#999). - proxy: Converse-body compression on the native Bedrock route — the live-zone dispatcher now recognizes Bedrock Converse content blocks (typeless
{"text": …}, not only Anthropic{"type":"text", …}), so Converse user-message text compresses;run_anthropic_compressionno longer bails to passthrough when the body lacks an InvokeModelanthropic_versionenvelope, and envelope re-emit stays gated on successful parse (#999). - docker: bundle
headroom-proxybinary in publishedruntimeandruntime-slimimages — closes #976 (#999).
- proxy: enable SSO credential resolution in the native Bedrock route via the
aws-configssofeature flag, making the credential chain match whatdocs/bedrock.mdalready documented (#999). - proxy: route native Bedrock
/model/{id}/converserequests to the upstream Converse endpoint instead of the hard-coded/invokeaction — the non-streaming handler now resolves the action from the inbound path, matching the streaming handler (#999). - ccr: make retrieval store TTL configurable with
HEADROOM_CCR_TTL_SECONDS, expose the effective TTL in/v1/retrieve/stats, and distinguish expired retrievals from missing hashes. - proxy: add native Bedrock
/model/{id}/converse-streamroute and forward it through the existing streaming EventStream/SSE pipeline. - wrap (codex): fix
headroom wrap codexproducing aconfig.tomlwith duplicate top-levelmodel_provider/openai_base_urlkeys (TOML-spec error) when the user had already configured their own provider. The injector now rewrites pre-existing top-levelmodel_providerandopenai_base_urllines in place — the previous value is kept in a# was: …trailing comment — instead of unconditionally prepending a duplicate, socodexcan start against the proxy. The pre-wrap snapshot mechanism continues to byte-for-byte restore the original file onheadroom unwrap codex.
0.26.0 (2026-06-16)
- add Copilot BYOK provider wrapper utilities and CLI support (#1041) (e67ee2a)
- add dashboard agent usage stats (#814) (6d3f39f)
- Add support for Mistral Vibe CLI (#935) (0932b8b)
- attribute reread waste to over-compression via marker check (#901) (f928576)
- bedrock: cross-region + Converse compression; bundle proxy binary in images (#999) (0dc2e1c)
- dashboard: surface compression-vs-cache net impact in Prefix Cache panel (#913) (2a4d300)
- evals: adversarial-input robustness grid for compressors (#918) (5939004)
- parser: detect re-issued identical tool calls as reread waste (#909) (7d4ae86)
- policy: batch deep edits through one cache-bust (#856 P3a) (#1015) (c2e52fe)
- policy: consume net-cost mutation gate in ContentRouter (#856 P2) (#905) (553ade4)
- proxy: compress AWS Bedrock InvokeModel requests via configurable upstream (#720) (7edb27a)
- anthropic: strip styled Claude model ids (#651) (0c5c89d)
- anyllm: forward openai api_base/api_key to the any-llm backend (#942) (#954) (a7ee8a6)
- cache: guard None exemplar embeddings in dynamic detector (#950) (1ec9320)
- cache: name the missing piece in semantic detector guard (#1018) (3b0bcee)
- ci: check out repo in PR Governance label job (#1021) (4558bc2)
- ci: make PR governance advisory (#1047) (74dff94)
- codex: compute waste signals on the OpenAI Responses path (#898) (b9e2761)
- codex: poll /wham/usage for subscription limits (handshake no longer sends x-codex-* headers) (#924) (8c00f71)
- codex: PR health label check state (#986) (99c874d)
- codex: retag thread providers so history menu stays whole across the proxy boundary (#1034) (74ae781)
- codex: write canonical hooks feature flag and migrate deprecated codex_hooks (#743) (dff6a19)
- compression: convert tree-sitter byte offsets to char offsets (#892) (b1f700f)
- compression: correct JSON array item counting and entropy gate (#887) (d6f0f0f)
- compression: keep container bodies compressible in code handler (#890) (16ed73b)
- compression: measure short-value threshold on payload, not token (#889) (65b0e8c)
- compression: use thread-local tree-sitter parsers in code handler (#893) (6cdb846)
- gemini: surface functionResponse payloads to waste-signal detection (#897) (9b0c840)
- learn: decode directory names with spaces in Windows project paths (#997) (#1027) (2d3701b)
- learn: scan subagent and workflow transcripts (#1045) (0ddd4ed)
- openclaw: declare headroom_retrieve tool contract (#947) (7c8c909)
- policy: correct warm-cache penalty in net_mutation_gain to (S + dT) (#903) (0632eba)
- proxy: add native Bedrock converse-stream route (#917) (b08ec15)
- proxy: keep codex image-generation WS turns alive through the relay (#1000) (7dbbb40)
- proxy: make budget enforcement actually work (#885) (a14ab45)
- proxy: read RTK gain stats globally by default (#957) (b70fccb)
- route v1internal code assist requests to cloudcode-pa.googleapis… (#821) (e20f16b)
- serena: stop the Serena dashboard popup and make --no-serena actually disable Serena (#1003) (919379a)
- support Copilot Business subscription auth (#641) (0b4a4bd)
- wire HEADROOM_EXCLUDE_TOOLS / HEADROOM_TOOL_PROFILES into Click proxy entrypoint (#943) (9b7b436)
- wrap: avoid duplicate top-level keys when injecting codex provider (#884) (dd22cfd)
0.25.0 (2026-06-12)
- add differential network capture harness (#761) (11ab5f8)
- add light mode for dashboard (#834) (c425893)
- add OAuth2 client-credentials upstream-auth proxy extension (#778) (#784) (eb2e50f)
- add Vertex AI proxy routing (#793) (3c77e52)
- cli: comprehensive help text, validation, and exception handling improvements (#640) (028efab)
- compression safety rails — error-output protection, pipeline circuit breaker, library inflation guard (#851) (c0cadcc)
- dashboard: per-model savings breakdown and expected-vs-actual cost on historical charts (#807) (34dafe6)
- detect re-served tool results as over-compression waste signal (#854) (5f1d88a)
- evals: add zero-cost tool schema compaction integrity eval (#817) (53a08c6)
- gated Markdown-KV compaction formatter (serialization-aware output) (#859) (06b2625)
- kompress: warn on unrecognized HEADROOM_KOMPRESS_BACKEND + document backend selection (#204) (6367d0b)
- memory: add opt-in Apple-GPU (MPS) embedding runtime (#766) (c71592d)
- net-cost cache mutation formula on CompressionPolicy (#856 P1) (#857) (d5f5802)
- plugins: Hermes agent headroom_retrieve plugin (#824) (058bced)
- probe-based retention scoring of recorded compression events (#862) (c2106cb)
- proxy: add CLI opt-outs for CCR injection (compression-only mode) (#823) (693d9d2)
- proxy: attribute savings history rollups per provider (#791) (0b8b8d9)
- proxy: log compressed messages alongside original request (#261) (2269e40)
- proxy: per-project savings breakdown on the dashboard (claude, codex, aider, copilot, cursor) (#803) (914a60a)
- support Python 3.14+ via pyo3 abi3 stable ABI (#516) (19eac8e)
- switch Kompress default to kompress-v2-base with weight-only int8 ONNX (#799) (74392b2)
- transforms: attribute read_lifecycle + smart_crush tags (#249) (8f37426)
- anthropic: CCR exception must re-raise, not silently swallow (#838) (8db5efc)
- ccr: key Rust search/diff/log markers with explicit_hash (#852) (bfcb07d)
- ccr: make retrieval TTL configurable (#715) (2533f77)
- ccr: skip CCR when model calls headroom_retrieve alongside user tools (#839) (30078f8)
- ccr: use shared compression store (#875) (249af6c)
- ci: correct comments, timeouts, and pip reliability in native e2e workflows (#878) (b716c8c)
- ci: pin cosign-installer to v3 (v4 does not exist) (#774) (199d693)
- codex: respect CODEX_HOME for wrap config (#731) (96abf38)
- content_router: guard against empty compression output causing Anthropic 400 (#771) (2f9ff07)
- copilot: use responses API for subscription reasoning models (#647) (84ac332)
- correct preserved-entry index mapping in Gemini content round-trip (#836) (0ffe2b6)
- dashboard: stable 'Proxy $ Saved' hero tile under --workers > 1 (#481) (fd73b88)
- don't inject empty tools:[] when client omitted the tools field (#772) (574bbae)
- harden Copilot API auth token handling (#557) (6b0c09f)
- health: readyz verifies upstream connectivity, not just process liveness (#744) (5dfb446)
- init: guard persistent task startup (#616) (9252d85)
- init: normalize Windows hook paths to forward slashes (#788) (6ea6e31)
- init: suppress hook recovery output (#760) (b439599)
- learn: claude-cli streams output with idle timeout (#373) (9bff575)
- make headroom wrap readiness probe timeout configurable for slow ML imports (#581) (163677b)
- parser: detect waste signals in Anthropic tool_result content blocks (#815) (929698a)
- proxy: F4 — trust X-Forwarded-* only behind allow-listed gateway (d10bd5f)
- proxy: lazy-import server to avoid fastapi crash (#442) (93c6937)
- proxy: make CCR multi-worker warning conditional on backend (#770) (d76a729)
- proxy: make Kompress eager preload cache-only so a cold cache can't block startup (#783) (841663d)
- proxy: restore Codex usage headers on WS and streaming SSE transports (#577) (#794) (0ce68de)
- schema compaction must not drop property names that match DROP_KEYS (#785) (ae2122f)
- security: block DNS-rebinding on /debug/* and /stats/reset via Host-header allowlist (#605) (b4b5025)
- ssl: upstream httpx client inherits SSL_CERT_FILE, REQUESTS_CA_BUNDLE, NODE_EXTRA_CA_CERTS (#745) (e50fbb3)
- suppress LiteLLM provider banner before import (#874) (f9384ef)
- transforms: use thread-local tree-sitter parsers to prevent pyo3 Unsendable panic (#604) (2ad300a)
- wrap: track shared proxy clients with markers (#877) (05bd56b)
- extract litellm model resolution to shared utility (ec7d006)
0.24.0 (2026-06-08)
- perf: add --format {text,json,csv} to
headroom perf(#648) (9fe4886) - proxy: show resolved upstream API targets in startup banner (#586) (8dbe7ad), closes #583
- relevance: weight BM25 score_batch by corpus IDF (#646) (88177bd)
- support CLAUDE_CODE_USE_FOUNDRY and custom upstream gateways (#726) (d90cdce)
- ci: restore green lint gate on main (fe50f9d)
- codex: auto-enable fail-open on compression timeout in headroom wrap codex (#531) (5f5f261)
- copilot: restore generic endpoint for non-subscription OAuth (#610) (#612) (18925b8)
- deps: move gunicorn to [proxy-prod] extra, add Windows guard (#537) (fa558c5)
- proxy: fail-open on corrupt golden bytes instead of RuntimeError (#603) (2170a1b)
- proxy: route Claude Code model metadata to Anthropic (#627) (30c1ac8)
- security: patch loopback guard, retry None raise, async subprocess, and cache race (06d7cb9)
- security: patch loopback guard, retry None raise, blocking subprocess, and cache stats race (78f3a4d)
- startup: move HF/httpx log suppression before sentence_transformers init (#622) (176d4c7)
- startup: suppress proxy startup log noise (#619) (4555901)
- wrap: report unbindable proxy ports (#602) (6dfcaa8)
- kompress: warn when
HEADROOM_KOMPRESS_BACKENDis set to an unrecognized value instead of silently falling back toauto, and document the backend selection env var (auto/onnx/onnx_cpu/onnx_coreml/pytorch/pytorch_mpsplus shorthand aliases) inwiki/configuration.md(issue #202, PR #204). - proxy: per-provider attribution in the savings history rollups. Each
/stats-historybucket (hourly/daily/weekly/monthly) now carries aby_providermap breaking downtokens_saved,compression_savings_usd_delta,total_input_tokens_delta, andtotal_input_cost_usd_deltaper provider, so consumers can show how savings and spend are distributed across providers within a time period. Providers only appear in a bucket where they moved a counter; legacy history checkpoints with no provider collapse into"unknown". Affected files:headroom/proxy/savings_tracker.py,headroom/proxy/prometheus_metrics.py. - cli: startup banner now includes a
Performance Tuningsection that surfaces activeHEADROOM_COMPRESSION_STABLE_AFTER_TURN,HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS, and embedding-server socket values when set; shows a hint to set them when all defaults are in use.
- deps: loosen over-pinned constraints and add upper bounds
litellm==1.82.3->>=1.86.2,<2.0(exact pin blocked security patches; floor stays above the CVE-2026-42271 fix)transformers>=4.30.0->>=4.30.0,<6.0(add upper bound; library already crossed a major version silently)sentence-transformers>=2.2.0->>=2.2.0,<6.0(same; applied inmemory,evals, anddevextras)neo4j>=5.20.0->>=5.20.0,<7.0(client had already crossed the 5.x/6.x boundary)mem0ai>=0.1.100->>=1.0.0,<2.0(floor was pre-1.0; locked package is already 1.0.11)langchain-core>=0.2.0->>=1.3.3,<4.0(floor stays above current high-severity advisory fixes)langchain-openai>=0.1.0->>=1.1.14,<2.0(floor stays above current advisory fixes)qdrant-client>=1.9.0->>=1.9.0,<2.0uvicorn>=0.23.0->>=0.23.0,<1.0(applied inproxyanddevextras)- Same
transformersandlitellmbounds applied consistently acrossml,voice, anddevextras
- docker: bump
neo4jimage indocker-compose.ymlfrom5.15.0to5.26(latest 5.x LTS) - docker: bump
UV_VERSIONinDockerfilefrom0.11.16to0.11.18
- codex: respect
CODEX_HOMEwhenheadroom wrap codexwrites provider, MCP, memory, backup, and globalAGENTS.mdconfig, and warn whenunwrap codexmay be looking at the default Codex home becauseCODEX_HOMEis unset. - proxy: multi-worker CCR warning is now conditional on backend — when
HEADROOM_CCR_BACKENDis unset (defaultInMemoryBackend, per-process), the startup warning includes CCR retrieval failures and suggestsHEADROOM_CCR_BACKEND=sqlite; when a cross-worker backend is already configured, the warning covers only the remaining per-worker stores (compression cache, prefix tracker, TOIN, CostTracker). UpdatedRUST_DEV.mdto accurately document PythonCompressionStoreas per-process by default. - deps: move
gunicornto[proxy-prod]extra withsys_platform != 'win32'guard; removed from[proxy]to avoid forcing a Unix-only package on dev, CI, and Windows users (#537) - startup: suppress proxy startup log noise -- litellm banner, trafilatura parse errors, HuggingFace Hub unauthenticated warnings, tiktoken fallback warning, and httpx INFO lines from sentence_transformers HEAD checks. Affected files:
headroom/providers/litellm.py,headroom/transforms/html_extractor.py,headroom/memory/adapters/embedders.py,headroom/providers/anthropic.py,headroom/providers/registry.py,headroom/image/onnx_router.py,headroom/transforms/kompress_compressor.py.
0.23.0 (2026-06-04)
- copilot: GitHub Copilot subscription mode through Headroom (f4dff9b)
- ccr: scope proactive expansion by workspace (cross-project leak) (197601b)
- ccr: scope proactive expansion by workspace (cross-project leak) (1bc163f)
- codex: keep init model_provider at config root (#260) (304dcc7)
- codex: keep init model_provider at config root (#260) (849b46d)
- copilot: deterministic subscription token handoff to the proxy (72da461)
- copilot: support subscription auth through Headroom (ff4a0c6)
- correct tiktoken encoding for unknown gpt-4 model snapshots (#552) (0e551de)
- decode/encode owned config, state and template assets as UTF-8 (2f1538a)
- decode/encode owned config, state and template assets as UTF-8 (fixes #533) (92075b9)
- docker: upgrade base images to Python 3.13 / debian13 (e6bf7a0)
- docker: upgrade base images to Python 3.13 / debian13, drop digest pinning (08a2197)
- docs: bump next.js to 16.2.6 for GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (a6a09e6)
- docs: mkdocs configuration to build with correct folder (#543) (5557944)
- docs: update brace-expansion to 5.0.6 to remediate GHSA-jxxr-4gwj-5jf2 (CVE-2026-45149) (6eb6fb5)
- docs: update bun.lock to next 16.2.6 for GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (91e0937)
- ignore brackets inside JSON strings when splitting mixed content (#553) (bdcfc32)
- learn: decode Unix home dirs whose username contains '.', '-' or '_' (211daae)
- learn: decode Unix home dirs whose username contains '.', '-' or '_' (491a8b3)
- learn: finish gemini-flash-latest default model sweep (982d01b)
- learn: finish gemini-flash-latest default model sweep (#532) (d797366)
- memory: READ-ONLY framing + fail-closed unresolved-project fallback (a178249)
- memory: READ-ONLY framing + fail-closed unresolved-project fallback (482f80e)
- update dashboard doc link (#544) (378d77e)
- Update Next.js to 16.2.4 in docs/bun.lock to address GHSA-gx5p-jg67-6x7h (CVE-2026-44580) (0b9f11a)
- Update Next.js to 16.2.6 in docs/package.json and package-lock.json to address GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (db5d15f)
- Upgrade litellm to 1.86.2 to remediate CVE-2026-42271 (07581b9)
- cli: factor shared wrap-subcommand scaffolding (8eeb926)
- cli: factor shared wrap-subcommand scaffolding (c74ad11)
0.22.4 (2026-05-26)
- cli: G1 remediation — non-string clobber, per-model systemMessage, openhands gate (ea1976e)
- cli: wrap CLI breadth — cline, continue, goose, openhands (8625f80)
- cli: wrap subcommands for cline, continue, goose, openhands (c375fa1)
- observability: G3 remediation — bound cardinality + wire dead metrics (2a717a9)
- observability: RTK metrics + Rust observability (Phase H blocker) (b36ad9f)
- observability: wire Phase G PR-G3 RTK + proxy metrics (H-blocker) (5f264a5)
- release: tag format vX.Y.Z (drop release-please component prefix) (4a39ef5)
- release: tag format vX.Y.Z (drop release-please component prefix) (0f3e3af)
- subscription: address G2 review findings — phantom delta, multi-worker race, silent fallbacks (f68090c)
- subscription: wire tokens_saved_rtk data plane (c7d1247)
- subscription: wire tokens_saved_rtk from RTK stats endpoint (44c605f)
- tests: drive RTK subprocess failure with real exec, not monkeypatched run (9b6d637)
- tests: mock logger.warning directly instead of relying on caplog (c38dac3)
- tests: patch headroom.rtk.get_rtk_path, not the helpers alias (317dffe)
- tests: tomllib fallback to tomli on python 3.10 (74843d1)
/debug/memoryloopback guard. The endpoint was missing theDepends(_require_loopback)guard that all other/debug/*endpoints carry. External callers can no longer reach it.retry_max_attemptszero guard. Whenretry_enabled=Trueandretry_max_attempts=0the retry loop exited without settinglast_error, causingraise last_errorto raiseTypeError: exceptions must derive from BaseException. ARuntimeErrorwith an actionable message is now raised instead, andProxyConfig.__post_init__rejectsretry_max_attempts < 1at construction time.- Blocking subprocess on async event loop.
_read_rtk_lifetime_statsand_read_lean_ctx_lifetime_statscalledsubprocess.rundirectly on the asyncio thread. Theinitialize_context_tool_session_baselinefunction is nowasyncand offloads the subprocess viaasyncio.to_thread; the stats endpoint usesawait asyncio.to_thread(_get_context_tool_stats). - Hardcoded Neo4j credential in
docker-compose.yml.NEO4J_AUTHnow defaults to${NEO4J_AUTH:-neo4j/devpassword}and is documented in.env.example(excluded from.gitignorevia!.env.example). SemanticCache.get_memory_stats()concurrent iteration. The method iteratesself._cache.values()without holding the async lock. A snapshot is now taken vialist(self._cache.values())before iterating to avoidRuntimeError: dictionary changed size during iterationunder async load.- Default Neo4j password in
ProxyConfig.memory_neo4j_passworddefault changed from"password"to"". The proxy startup path now emits alogger.warningwhenmemory_backend == "qdrant-neo4j"and the password is empty, prompting operators to set a real credential.
-
PyPI install clarity and release gating. Documented
pipx --python python3.13for environments where unsupported Python wheel tags cause older-version resolution, made PyPI publish failures block GitHub Releases unlessPYPI_SKIP=true, and added an sdistLICENSEinvariant. -
headroom learnwith claude-cli no longer fails silently on slow networks or large digests. The CLI backend timeout was a hard 120s wall-clock cap with no liveness signal: a successful long analysis and a hung connection looked identical, and exit 0 with "no recommendations" was the only user-visible signal. Two changes: (1) Streaming + idle timeout for claude-cli: the command now uses--output-format stream-json --verboseand a watchdog thread reads events as they arrive. The process is killed only afterHEADROOM_LEARN_CLI_IDLE_TIMEOUT_SECS(default 60s) of zero output, or afterHEADROOM_LEARN_CLI_TIMEOUT_SECS(default 300s, was 120s) total. Long-but-active analyses run to completion; genuine hangs are caught fast. The finaltype:"result"event carries the assistant response. Drains stdout/stderr via reader threads so the watchdog works on Windows too. (2) Env-var overrides for all CLI backends:HEADROOM_LEARN_CLI_TIMEOUT_SECSis honored by gemini-cli and codex-cli as the wall-clock timeout; idle override applies only to the streaming claude-cli path. -
Learned: error recoverysection in MEMORY.md no longer bloats with stale, one-shot, or contradictory entries. The matchers paired up unrelated tool calls (e.g.state.rsandlib.rsin the same dir becomingFile state.rs does not exist. The correct path is lib.rs.), the dedup key was the literal rendered bullet text so near-duplicates each created their own row, the shutdown flush dropped the evidence gate to 1 so every singleton landed at session end, and there was no TTL or re-validation. Fixed at every layer: (1) Emission: Read recoveries require the failed/successful basenames to be identical or close in edit distance; Bash recoveries require a shared binary (allowingpython↔python3andruff↔.venv/bin/ruffvariants) plus low-edit-distance OR a shared substantive non-flag token. Unrelated pairs are rejected at the source. (2) Dedup: error-recovery rows are hashed on recovery intent — Read on(basename(error_path), basename(success_path)), Bash on the primary command stripped of volatile suffixes (| tail -N,2>&1, etc.). Near-duplicates collapse into one row. (3) Evidence gating: defaultmin_evidenceraised from 2 to 5; shutdown-relaxation removed; new--min-evidenceflag andHEADROOM_MIN_EVIDENCEenvvar so embedded clients can tighten the threshold further. (4) Render-time refinement: drop rows not re-observed in 21 days, re-validate Read success paths against the filesystem, collapse same-error_path-with-multiple-targets into one "use Glob/Grep first" bullet, rank byevidence_count * 0.5 ** (days/5), cap the section at 15. A→B / B→A contradiction pairs are also dropped at flush time. Patterns now stampfirst_seen_at/last_seen_aton every save;_bump_persisted_evidenceupdates them viajson_set. OtherLearned: …categories (environment, preference, architecture) are untouched. -
headroom unwrap codexnow actually undoesheadroom wrap codex— previously there was nounwrap codexsubcommand at all, so the injectedmodel_provider = "headroom"/[model_providers.headroom]block stayed in~/.codex/config.tomlforever and Codex continued routing through the (potentially stopped) proxy, surfacing asMissing environment variable: OPENAI_API_KEY.wrap codexnow snapshots the pre-wrapconfig.tomltoconfig.toml.headroom-backupbefore its first injection, andunwrap codexrestores that snapshot byte-for-byte (or, if the backup is missing, strips only the Headroom-managed block and leaves surrounding user content intact). Safe no-op when run without a prior wrap. Reported by @raenaryl in Discord. -
Image compressors now release shared router models after use and proxy shutdown — the proxy/image compression path no longer keeps global
technique-routerandSigLIPmodel instances pinned in memory after one-off image optimization work. Theget_compressor()helper now returns a fresh, caller-owned compressor instead of a process-lifetime singleton. -
headroom learnno longer clobbers prior recommendations on re-run — the marker block inCLAUDE.md/MEMORY.mdis now merged with the prior block instead of wholesale-replaced. Sections re-surfaced by the new run win; sections not re-surfaced are carried forward so learnings accumulate across runs instead of disappearing. To fully rebuild the block, delete it manually and re-run. (#231) -
headroom learnno longer emits dangling cross-references when a section is re-surfaced — the analyzer now includes the project's current<!-- headroom:learn -->block (fromCLAUDE.mdandMEMORY.md) in the LLM digest as a "Prior Learned Patterns" section, and the system prompt instructs the LLM that re-emitting a section replaces the prior one wholesale. Prevents bullets like "Xis also large — same rule asY,Z" from appearing afterYandZgot dropped during per-section replacement. The writer's section-level carry-forward from #231 remains in place as a safety net for sections the LLM omits entirely. New helperextract_marker_blockadded toheadroom.learn.writer.
turn_idlinking agent-loop API calls to a single user prompt — a newcompute_turn_id(model, system, messages)helper inheadroom/proxy/helpers.pyhashes the message prefix up to and including the last user-text message, yielding an id that is stable across every agent-loop iteration of one prompt but rolls over when the user sends a new prompt (or runs/compact,/clear).RequestLoggained aturn_id: str | Nonefield, which is stamped at every log site (anthropic handler bedrock + direct branches, and the streaming handler) and surfaced asturn_idin/transformations/feed. Lets downstream consumers (e.g. the Headroom Desktop Activity tab) aggregate savings per user prompt rather than per API call.- Live flush of traffic-learned patterns to CLAUDE.md / MEMORY.md — the
TrafficLearnernow writes to agent-native context files continuously during proxy operation, not just at shutdown. A new dirty-flag debounced_flush_worker(10s window,FLUSH_DEBOUNCE_SECONDS) callsflush_to_file()whenever_accumulate()marks the learner dirty, so patterns surface inCLAUDE.md/MEMORY.mdnear real-time. Flushes read both persisted rows (via_load_persisted_patterns_from_sqlite) and the in-memory accumulator, bucket patterns by project via the learn plugin registry (plugin.discover_projects()+ longest-path anchoring in_project_for_pattern), and route byPatternCategoryto the correct file (_patterns_to_recommendations+_CATEGORY_TO_TARGET). Live flushes requireevidence_count >= 2; the shutdown flush accepts single-evidence rows.
- Traffic-learner evidence count stuck at 1; duplicate DB rows across
restarts.
_accumulatequeued patterns with the defaultExtractedPattern.evidence_count = 1regardless of how many times the pattern was actually seen, so every persisted row landed at1and never crossed the live-flush gate (evidence_count >= 2). Worse, once a pattern was in_saved_hashesit was early-returned on every re-sighting, and_saved_hashesreset on process restart — so a second sighting in a later session inserted a duplicate row rather than bumping the existing one. Now:_accumulatewrites the real accumulated count at save time,start()hydrates_saved_hashes+ a new_persisted_idsmap from the DB, and re-sightings bump the persisted row'smetadata.evidence_countvia an atomicjson_setUPDATE(_bump_persisted_evidence)._load_persisted_patterns_from_sqlitenow filters viajson_extract(metadata, '$.source')instead of a LIKE on the raw JSON string, so rows survive metadata rewrites.
HEADROOM_QDRANT_*environment variables for memory Qdrant configuration (#31) —Memory(backend="qdrant-neo4j"),Mem0Config,MemoryConfig, andProxyConfignow resolve their Qdrant connection fromHEADROOM_QDRANT_URL,HEADROOM_QDRANT_HOST,HEADROOM_QDRANT_PORT,HEADROOM_QDRANT_API_KEY,HEADROOM_QDRANT_HTTPS,HEADROOM_QDRANT_PREFER_GRPC, andHEADROOM_QDRANT_GRPC_PORT. Explicit constructor arguments still win; unset env keeps the existinglocalhost:6333defaults. Adds matching--memory-qdrant-{url,host,port,api-key}CLI flags. Enables hosted Qdrant (Qdrant Cloud) and shared/remote Qdrant stacks without code changes. New helper:headroom/memory/qdrant_env.py.- Telemetry stack & install-mode identity fields — anonymous beacon now
reports
headroom_stack(how Headroom is invoked:proxy,wrap_claude,adapter_ts_openai, ...) andinstall_mode(wrapped/persistent/on_demand), plusrequests_by_stackfor proxies that serve multiple integrations. Proxy exposes aby_stackbucket alongsideby_provider/by_modelon/stats, a matchingheadroom_requests_by_stackPrometheus counter, and anX-Headroom-Stackheader honored by the FastAPI middleware.headroom wrap <tool>setsHEADROOM_STACK=wrap_<agent>; the TS SDK and all four adapters (openai,anthropic,gemini,vercel-ai) tag their compress calls. Schema migration:sql/upgrade_telemetry_stack_context.sql. - Canonical filesystem contract (issue #175) — new
HEADROOM_CONFIG_DIR(default~/.headroom/config, read-mostly) andHEADROOM_WORKSPACE_DIR(default~/.headroom, read-write state) env vars recognized by the Python proxy/CLI and the npm SDK. Additive; all existing per-resource env vars (HEADROOM_SAVINGS_PATH,HEADROOM_TOIN_PATH,HEADROOM_SUBSCRIPTION_STATE_PATH,HEADROOM_MODEL_LIMITS) continue to work with identical semantics. Docker install scripts anddocker-compose.native.ymlforward the new vars into containers so savings, logs, and telemetry resolve to the bind-mounted.headroompath. Seewiki/filesystem-contract.md.
/stats-historynow returns compact checkpoint history by default — the JSON response keeps recent checkpoints dense while evenly sampling older checkpoints so long-running installs do not return ever-growing payloads. Addhistory_mode=fullto fetch the full retained checkpoint list, orhistory_mode=noneto skip it entirely while still receiving the derived hourly/daily/weekly/monthly rollups. Responses now include ahistory_summaryblock describing stored versus returned points.
- Streaming Anthropic requests are now visible to
/stats.recent_requestsand/transformations/feed—_finalize_stream_responsedid not callself.logger.log(...), so the entire streaming Anthropic code path (the one Claude Code uses) silently bypassed the request logger. Only the non-streaming Anthropic path and the Bedrock streaming path were logged. As a consequence,--log-messageshad no observable effect on the live transformations feed for typical traffic. The streaming finalizer now emits the sameRequestLogshape the other paths do, includingrequest_messageswhenlog_full_messagesis enabled.
- Cross-agent memory — Claude saves a fact, Codex reads it back. All agents sharing one proxy share one memory store. Project-scoped DB at
.headroom/memory.db, auto user_id from$USER. - Agent provenance tracking — every memory records which agent saved it (
source_agent,source_provider,created_via), with edit history on updates. - LLM-mediated dedup — on
memory_save, enriched response hints similar existing memories to the LLM. Background async dedup auto-removes >92% cosine duplicates. Zero extra LLM calls. - Memory for OpenAI and Gemini handlers — context injection + tool handling wired into all three provider handlers (Anthropic, OpenAI, Gemini).
- Plugin architecture for
headroom learn— each agent (Claude, Codex, Gemini) is a self-contained plugin. External plugins register viaheadroom.learn_pluginentry points.--agentflag for CLI. - GeminiScanner for
headroom learn— reads~/.gemini/tmp/*/chats/session-*.jsonand.jsonl. - Code graph integration —
headroom wrap claude --code-graphauto-indexes the project via codebase-memory-mcp for call-chain traversal, impact analysis, and architectural queries. Opt-in, ~200 token overhead with Claude Code's MCP Tool Search. - OpenAI embedder auto-detection — memory backend uses OpenAI embeddings when
sentence-transformersis unavailable (no torch/2GB dependency needed). - Live traffic learning flush —
headroom wrap <agent> --learnflushes learned patterns to the correct agent-native file (MEMORY.md / AGENTS.md / GEMINI.md) at proxy shutdown.
- CodeCompressor disabled by default — AST-based code compression produced invalid syntax on 40% of real files. Code now passes through uncompressed. Use
--code-graphfor code intelligence instead, or re-enable with--code-aware. - Shared tool name map — consolidated tool normalization across all learn plugins into
_shared.py. - Dynamic CLI agent detection —
headroom learndiscovers agents via plugin registry, no hardcoded choices.
- CodeCompressor statement-based truncation — body truncation now walks AST statements (not lines), never cuts mid-expression. Fixes syntax errors on multi-line dict literals and function calls.
- Docstring FIRST_LINE mode — uses source lines directly instead of reconstructing from byte offsets. Properly handles all quote styles.
- Memory shutdown queue drain — patterns in the save queue were lost on proxy shutdown. Now drained before exit.
- Codex-proxy resilience hardening — reduces event-loop starvation under cold-start reconnect storms
- Stage-timing instrumentation — per-stage durations for both Codex WS accept and Anthropic
/v1/messagespre-upstream phases emitted as a singleSTAGE_TIMINGSstructured log line per request plus Prometheus histograms - Per-pipeline shared warmup — Anthropic + OpenAI pipelines eagerly load compressors/parsers once at startup; status merged into
WarmupRegistryfor/debug/warmupand/readyz - WS session registry — first-class tracking of active Codex WS sessions with deterministic relay-task cancellation and termination-cause classification (
client_disconnect,upstream_error,client_timeout, etc.) - Bounded pre-upstream Anthropic concurrency —
--anthropic-pre-upstream-concurrency/HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCYcaps simultaneous/v1/messagespre-upstream work (body read, deep copy, first compression stage, memory-context lookup, upstream connect) so replay storms cannot starve/livez,/readyz, and new Codex WS opens. Default: automax(2, min(8, cpu_count));0or negative disables (unbounded) - Loopback-only debug endpoints —
/debug/tasks,/debug/ws-sessions,/debug/warmupreturn404(not403) to non-loopback callers so external scanners cannot enumerate them - Reconnect-storm repro harness —
scripts/repro_codex_replay.pydrives concurrent WS + HTTP replay traffic against a local proxy and asserts/livezp99 under threshold;--jsonoutput routes JSON to stdout and the human summary to stderr
- Stage-timing instrumentation — per-stage durations for both Codex WS accept and Anthropic
- Proxy liveness and readiness health checks
- Adds
GET /livezfor process liveness andGET /readyzfor traffic readiness - Keeps
GET /healthbackward compatible while expanding it with readiness details and subsystem checks - Eagerly initializes configured memory backends during proxy startup so readiness reflects real serving capability
- Wires
/readyzinto the Docker imageHEALTHCHECKand the exampledocker-compose.yml
- Adds
- Durable proxy savings history
- Persists proxy compression savings history locally at
~/.headroom/proxy_savings.json - Supports
HEADROOM_SAVINGS_PATHto override the storage location - Adds
/stats-historywith lifetime totals plus hourly/daily/weekly/monthly rollups - Supports JSON and CSV export from
/stats-history - Extends
/statswith apersistent_savingsblock while keepingsavings_historybackward compatible - Adds a historical mode to
/dashboardbacked by/stats-history, including export actions
- Persists proxy compression savings history locally at
- Proxy telemetry SDK override via
HEADROOM_SDK- Downstream apps can override the anonymous telemetry
sdkfield without patching installed files - Blank values fall back to the default
proxylabel
- Downstream apps can override the anonymous telemetry
headroom learn— Offline failure learning for coding agents- Analyzes past conversation history (Claude Code, extensible to Cursor/Codex)
- Success correlation: for each failure, finds what succeeded after and extracts the specific correction
- 5 analyzers: Environment, Structure, Command Patterns, Retry Prevention, Cross-Session
- Writes specific learnings to CLAUDE.md (stable project facts) and MEMORY.md (session patterns)
- Generic architecture: tool-agnostic
ToolCallmodel, pluggable Scanner/Writer adapters - Dry-run by default,
--applyto write,--allfor all projects - Example output: "FirstClassEntity.java is not at axion-formats/ — actually at axion-scala-common/"
- Read Lifecycle Management — Event-driven compression of stale/superseded Read outputs
- Detects when a Read output becomes stale (file was edited after) or superseded (file was re-read)
- Replaces stale/superseded content with compact CCR markers, stores originals for retrieval
- 75% of Read output bytes are provably stale or redundant (from real-world analysis of 66K tool calls)
- Fresh Reads (latest read, no subsequent edit) are never touched — Edit safety preserved
- Opt-in via
ReadLifecycleConfig(enabled=True), disabled by default - Handles both OpenAI and Anthropic message formats
- any-llm backend - Route requests through 38+ LLM providers (OpenAI, Mistral, Groq, Ollama, etc.) via any-llm
- Enable with
--backend anyllm --anyllm-provider <provider> - Install with:
pip install 'headroom-ai[anyllm]'
- Enable with
- Production-ready proxy server with caching, rate limiting, and metrics
- CLI command
headroom proxyto start the proxy server - IntelligentContextManager (semantic-aware context management)
- Multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, forward references, token density
- No hardcoded patterns - all importance signals learned from TOIN or computed from metrics
- TOIN integration for retrieval_rate and field_semantics-based scoring
- Strategy selection: NONE, COMPRESS_FIRST, DROP_BY_SCORE based on budget overage
- Atomic tool unit handling (call + response dropped together)
- Configurable scoring weights via
ScoringWeightsdataclass IntelligentContextConfigfor full configuration control- Backwards compatible with
RollingWindowConfig
- LLMLingua-2 Integration (opt-in ML-based compression)
LLMLinguaCompressortransform using Microsoft's LLMLingua-2 model- Content-aware compression rates (code: 0.4, JSON: 0.35, text: 0.3)
- Memory management utilities:
unload_llmlingua_model(),is_llmlingua_model_loaded() - Proxy integration via
--llmlinguaflag - Device selection:
--llmlingua-device(auto/cuda/cpu/mps) - Custom compression rate:
--llmlingua-rate - Helpful startup hints when llmlingua is available but not enabled
Install with:(thepip install headroom-ai[llmlingua][llmlingua]extra was removed in 0.9.x)
- Code-Aware Compression (AST-based, syntax-preserving)
CodeAwareCompressortransform using tree-sitter for AST parsing- Supports Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
- Preserves imports, function signatures, type annotations, error handlers
- Compresses function bodies while maintaining structural integrity
- Guarantees syntactically valid output (no broken code)
- Automatic language detection from code patterns
- Memory management:
is_tree_sitter_available(),unload_tree_sitter() - Uses
tree-sitter-language-packfor broad language support - Install with:
pip install headroom-ai[code]
- ContentRouter (intelligent compression orchestrator)
- Auto-routes content to optimal compressor based on type detection
- Source hint support for high-confidence routing (file paths, tool names)
- Handles mixed content (e.g., markdown with code blocks)
- Strategies: CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, TEXT, LLMLINGUA
- Configurable strategy preferences and fallbacks
- Routing decision log for transparency and debugging
- Custom Model Configuration
- Support for new models: Claude 4.5 (Opus), Claude 4 (Sonnet, Haiku), o3, o3-mini
- Pattern-based inference for unknown models (opus/sonnet/haiku tiers)
- Custom model config via
HEADROOM_MODEL_LIMITSenvironment variable - Config file support:
~/.headroom/models.json - Graceful fallback for unknown models (no crashes)
- Updated pricing data for all current models
- Event.wait task leak in subscription trackers —
asyncio.shieldpattern prevents cancellation of the outerwait_forfrom leaking the innerEvent.waittask - Python 3.10 compatibility for memory-context fail-open — catches
asyncio.TimeoutError(the 3.10-compatible alias) rather thanTimeoutErrorto preserve behaviour on older runtimes - uvicorn
proxy_headers=False— refusesForwarded/X-Forwarded-Forrewrites so the loopback guard on/debug/*cannot be spoofed by a misconfigured reverse proxy - First-frame timeout for Codex WS accepts — guards against a client that opens a handshake and never sends the first frame; relays cancel deterministically with
client_timeout - Semaphore leak on unexpected exception in Anthropic pre-upstream path — the finalizer now releases the pre-upstream semaphore on every exit path (early 4xx, cache hit, upstream error, streaming handoff)
active_relay_tasksgauge double-decrement —deregister_and_countreturns(handle, released_task_count)atomically so the handler decrements the Prometheus gauge by the exact number it registered, eliminating drift
- IPv6-mapped loopback recognition — the loopback guard parses
::ffff:127.0.0.1and other dual-stack literals throughipaddress.ip_address(...).is_loopback - Lock-free stage-timing accumulators —
record_stage_timingswrites to per-path counters that do not contend with/metricsexport orrecord_request - Narrow
contextlib.suppressin relay classification — onlyCancelledErroris suppressed where we reclassify it; other exceptions propagate so termination cause stays truthful jitter_delay_mshelper — shared exponential-backoff + 50-150% jitter formula inheadroom/proxy/helpers.py; used by three proxy retry sites and mirrored inline in the repro harness
0.2.0 - 2025-01-07
- SmartCrusher: Statistical compression for tool outputs
- Keeps first/last K items, errors, anomalies, and relevance matches
- Variance-based change point detection
- Pattern detection (time series, logs, search results)
- Relevance Scoring Engine: ML-powered item relevance
BM25Scorer: Fast keyword matching (zero dependencies)EmbeddingScorer: Semantic similarity with sentence-transformersHybridScorer: Adaptive combination of both methods
- CacheAligner: Prefix stabilization for better cache hits
- Dynamic date extraction
- Whitespace normalization
- Stable prefix hashing
- RollingWindow: Context management within token limits
- Drops oldest tool units first
- Never orphans tool results
- Preserves recent turns
- Multi-Provider Support:
- Anthropic with official
count_tokensAPI - Google with official
countTokensAPI - Cohere with official
tokenizeAPI - Mistral with official tokenizer
- LiteLLM for unified interface
- Anthropic with official
- Integrations:
- LangChain callback handler (
HeadroomOptimizer) - MCP (Model Context Protocol) utilities
- LangChain callback handler (
- Proxy Server (
headroom.proxy):- Semantic caching with LRU eviction
- Token bucket rate limiting
- Retry with exponential backoff
- Cost tracking with budget enforcement
- Prometheus metrics endpoint
- Request logging (JSONL)
- Pricing Registry: Centralized model pricing with staleness tracking
- Benchmarks: Performance benchmarks for transforms and relevance scoring
- Improved token counting accuracy across all providers
- Enhanced tool output compression with relevance-aware selection
- Mistral tokenizer API compatibility
- Google token counting for multi-turn conversations
0.1.0 - 2025-01-05
- Initial release
HeadroomClient: OpenAI-compatible client wrapperToolCrusher: Basic tool output compression- Audit mode for observation without modification
- Optimize mode for applying transforms
- Simulate mode for previewing changes
- SQLite and JSONL storage backends
- HTML report generation
- Streaming support
- Never removes human content
- Never breaks tool ordering
- Parse failures are no-ops
- Preserves recency (last N turns)
The 0.2.0 release is backward compatible. New features are opt-in:
# Old code still works
from headroom import HeadroomClient, OpenAIProvider
# New SmartCrusher (replaces ToolCrusher for better compression)
from headroom import SmartCrusher, SmartCrusherConfig
config = SmartCrusherConfig(
min_tokens_to_crush=200,
max_items_after_crush=50,
)
crusher = SmartCrusher(config)
# New relevance scoring
from headroom import create_scorer
scorer = create_scorer("hybrid") # or "bm25" for zero depsNew in 0.2.0 - run Headroom as a proxy server:
# Start the proxy
headroom proxy --port 8787
# Use with Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude