Changelog

All notable changes to Headroom will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Features

  • proxy: measure and surface rolling and current token throughput metrics (active/wall-clock input, compression, effective forward, and streamed generation) in headroom perf CLI and the dashboard (#959).
  • vibe: add Mistral Vibe CLI support with headroom wrap vibe.
  • proxy: per-project savings breakdown on the dashboard for all wrapped agents — Claude Code, Codex, aider, Copilot, and Cursor (#802). headroom wrap claude/codex tag requests with an X-Headroom-Project header (launch-directory name); wrap aider/copilot/cursor — whose clients cannot send custom headers — use a /p/<name> base-URL prefix the proxy strips. Savings are aggregated per project (persisted, schema v3 with transparent v2 migration), exposed as savings.per_project in /stats and projects in /stats-history, and shown in a Per-Project Savings dashboard table.
  • memory: opt-in Apple-GPU (MPS) embedding offload via HEADROOM_EMBEDDER_RUNTIME=pytorch_mps. When set (and Apple MPS is available), the memory embedder runs on the torch sentence-transformers backend on the Apple GPU instead of the default ONNX CPU embedder, freeing the CPU under load. If MPS or the dependencies are unavailable, Headroom logs a warning and uses the existing default embedder selection path (ONNX when available, then the pre-existing local fallback). MPS encode calls are serialized internally (torch-MPS is not thread-safe). Adds the new [pytorch-mps] extra (pip install 'headroom-ai[pytorch-mps]'). Default behavior is unchanged.
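The /p/<name> base-URL prefix described in the per-project entry above can be illustrated with a short sketch. This is not the proxy's actual routing code: the function name, the regular expression, and the choice to fall back to "/" when nothing follows the prefix are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a leading "/p/<name>" segment followed by the real path.
_PROJECT_PREFIX = re.compile(r"^/p/([^/]+)(/.*)?$")


def split_project_prefix(path: str) -> Tuple[Optional[str], str]:
    """Return (project, upstream_path); project is None when no prefix is present."""
    match = _PROJECT_PREFIX.match(path)
    if match is None:
        return None, path
    project = match.group(1)
    # Assumption: a bare "/p/<name>" maps to the upstream root.
    upstream_path = match.group(2) or "/"
    return project, upstream_path


print(split_project_prefix("/p/myapp/v1/messages"))
print(split_project_prefix("/v1/messages"))
```

A client that cannot send X-Headroom-Project would point its base URL at the prefixed path, and the stripped remainder is what gets forwarded upstream.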

Features

  • proxy: cross-region Bedrock inference-profile detection — geo-prefixed model IDs (eu./us./apac./global.) are now resolved to their canonical vendor, so Anthropic cross-region profiles (e.g. eu.anthropic.claude-haiku-4-5-20251001-v1:0) receive live-zone compression instead of being silently skipped (#999).
  • proxy: Converse-body compression on the native Bedrock route — the live-zone dispatcher now recognizes Bedrock Converse content blocks (typeless {"text": …}, not only Anthropic {"type":"text", …}), so Converse user-message text compresses; run_anthropic_compression no longer bails to passthrough when the body lacks an InvokeModel anthropic_version envelope, and envelope re-emit stays gated on successful parse (#999).
  • docker: bundle headroom-proxy binary in published runtime and runtime-slim images — closes #976 (#999).
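As a rough illustration of the geo-prefix resolution described in the cross-region entry above, the sketch below strips one of the four listed prefixes and reads the vendor from the first remaining dotted segment. The function name and return shape are hypothetical, and the real detector may handle more cases (ARNs, for example) than this does.

```python
from typing import Optional, Tuple

# Prefixes named in the changelog entry.
GEO_PREFIXES = ("eu.", "us.", "apac.", "global.")


def split_geo_prefix(model_id: str) -> Tuple[Optional[str], str, str]:
    """Return (geo, vendor, canonical_id) for a Bedrock model or profile ID."""
    geo = None
    canonical = model_id
    for prefix in GEO_PREFIXES:
        if model_id.startswith(prefix):
            geo = prefix.rstrip(".")
            canonical = model_id[len(prefix):]
            break
    # Assumption: the vendor is the first dotted segment of the canonical ID.
    vendor = canonical.split(".", 1)[0]
    return geo, vendor, canonical


print(split_geo_prefix("eu.anthropic.claude-haiku-4-5-20251001-v1:0"))
```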

Bug Fixes

  • proxy: enable SSO credential resolution in the native Bedrock route via the aws-config sso feature flag, making the credential chain match what docs/bedrock.md already documented (#999).
  • proxy: route native Bedrock /model/{id}/converse requests to the upstream Converse endpoint instead of the hard-coded /invoke action — the non-streaming handler now resolves the action from the inbound path, matching the streaming handler (#999).
  • ccr: make retrieval store TTL configurable with HEADROOM_CCR_TTL_SECONDS, expose the effective TTL in /v1/retrieve/stats, and distinguish expired retrievals from missing hashes.
  • proxy: add native Bedrock /model/{id}/converse-stream route and forward it through the existing streaming EventStream/SSE pipeline.
  • wrap (codex): fix headroom wrap codex producing a config.toml with duplicate top-level model_provider / openai_base_url keys (TOML-spec error) when the user had already configured their own provider. The injector now rewrites pre-existing top-level model_provider and openai_base_url lines in place — the previous value is kept in a # was: … trailing comment — instead of unconditionally prepending a duplicate, so codex can start against the proxy. The pre-wrap snapshot mechanism continues to byte-for-byte restore the original file on headroom unwrap codex.
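The in-place rewrite described in the wrap (codex) entry above might look roughly like the following. This is a simplified sketch rather than the injector itself: it treats the first line that starts with "[" as the end of the top-level section (so a multi-line top-level array would confuse it), and the function name and the example URL are hypothetical.

```python
import re
from typing import Dict, List

# Matches a bare "key = value" line; commented-out lines do not match.
_KEY_LINE = re.compile(r"^\s*([A-Za-z0-9_-]+)\s*=\s*(.*?)\s*$")


def rewrite_top_level_keys(toml_text: str, overrides: Dict[str, str]) -> str:
    """Rewrite existing top-level keys in place; prepend only the missing ones.

    `overrides` maps a bare key to an already TOML-encoded value.
    """
    remaining = dict(overrides)
    out: List[str] = []
    in_top_level = True
    for line in toml_text.splitlines():
        if line.lstrip().startswith("["):
            in_top_level = False  # first table header ends the top-level section
        match = _KEY_LINE.match(line) if in_top_level else None
        if match and match.group(1) in remaining:
            key, old_value = match.group(1), match.group(2)
            # Keep the user's previous value visible in a trailing comment.
            out.append(f"{key} = {remaining.pop(key)}  # was: {old_value}")
        else:
            out.append(line)
    missing = [f"{key} = {value}" for key, value in remaining.items()]
    return "\n".join(missing + out) + "\n"


original = 'model_provider = "mine"\n\n[model_providers.mine]\nname = "x"\n'
print(rewrite_top_level_keys(
    original,
    {"model_provider": '"headroom"', "openai_base_url": '"http://localhost:8000/v1"'},
))
```

Because each key appears at most once at the top level after the rewrite, the resulting file no longer violates the TOML rule against duplicate keys.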

0.26.0 (2026-06-16)

Features

  • add Copilot BYOK provider wrapper utilities and CLI support (#1041) (e67ee2a)
  • add dashboard agent usage stats (#814) (6d3f39f)
  • Add support for Mistral Vibe CLI (#935) (0932b8b)
  • attribute reread waste to over-compression via marker check (#901) (f928576)
  • bedrock: cross-region + Converse compression; bundle proxy binary in images (#999) (0dc2e1c)
  • dashboard: surface compression-vs-cache net impact in Prefix Cache panel (#913) (2a4d300)
  • evals: adversarial-input robustness grid for compressors (#918) (5939004)
  • parser: detect re-issued identical tool calls as reread waste (#909) (7d4ae86)
  • policy: batch deep edits through one cache-bust (#856 P3a) (#1015) (c2e52fe)
  • policy: consume net-cost mutation gate in ContentRouter (#856 P2) (#905) (553ade4)
  • proxy: compress AWS Bedrock InvokeModel requests via configurable upstream (#720) (7edb27a)

Bug Fixes

  • anthropic: strip styled Claude model ids (#651) (0c5c89d)
  • anyllm: forward openai api_base/api_key to the any-llm backend (#942) (#954) (a7ee8a6)
  • cache: guard None exemplar embeddings in dynamic detector (#950) (1ec9320)
  • cache: name the missing piece in semantic detector guard (#1018) (3b0bcee)
  • ci: check out repo in PR Governance label job (#1021) (4558bc2)
  • ci: make PR governance advisory (#1047) (74dff94)
  • codex: compute waste signals on the OpenAI Responses path (#898) (b9e2761)
  • codex: poll /wham/usage for subscription limits (handshake no longer sends x-codex-* headers) (#924) (8c00f71)
  • codex: PR health label check state (#986) (99c874d)
  • codex: retag thread providers so history menu stays whole across the proxy boundary (#1034) (74ae781)
  • codex: write canonical hooks feature flag and migrate deprecated codex_hooks (#743) (dff6a19)
  • compression: convert tree-sitter byte offsets to char offsets (#892) (b1f700f)
  • compression: correct JSON array item counting and entropy gate (#887) (d6f0f0f)
  • compression: keep container bodies compressible in code handler (#890) (16ed73b)
  • compression: measure short-value threshold on payload, not token (#889) (65b0e8c)
  • compression: use thread-local tree-sitter parsers in code handler (#893) (6cdb846)
  • gemini: surface functionResponse payloads to waste-signal detection (#897) (9b0c840)
  • learn: decode directory names with spaces in Windows project paths (#997) (#1027) (2d3701b)
  • learn: scan subagent and workflow transcripts (#1045) (0ddd4ed)
  • openclaw: declare headroom_retrieve tool contract (#947) (7c8c909)
  • policy: correct warm-cache penalty in net_mutation_gain to (S + dT) (#903) (0632eba)
  • proxy: add native Bedrock converse-stream route (#917) (b08ec15)
  • proxy: keep codex image-generation WS turns alive through the relay (#1000) (7dbbb40)
  • proxy: make budget enforcement actually work (#885) (a14ab45)
  • proxy: read RTK gain stats globally by default (#957) (b70fccb)
  • route v1internal code assist requests to cloudcode-pa.googleapis… (#821) (e20f16b)
  • serena: stop the Serena dashboard popup and make --no-serena actually disable Serena (#1003) (919379a)
  • support Copilot Business subscription auth (#641) (0b4a4bd)
  • wire HEADROOM_EXCLUDE_TOOLS / HEADROOM_TOOL_PROFILES into Click proxy entrypoint (#943) (9b7b436)
  • wrap: avoid duplicate top-level keys when injecting codex provider (#884) (dd22cfd)

Code Refactoring

  • DRY cache logic, add thread safety, fix Bash exclusion (#704) (e36fccd)

0.25.0 (2026-06-12)

Features

  • add differential network capture harness (#761) (11ab5f8)
  • add light mode for dashboard (#834) (c425893)
  • add OAuth2 client-credentials upstream-auth proxy extension (#778) (#784) (eb2e50f)
  • add Vertex AI proxy routing (#793) (3c77e52)
  • cli: comprehensive help text, validation, and exception handling improvements (#640) (028efab)
  • compression safety rails — error-output protection, pipeline circuit breaker, library inflation guard (#851) (c0cadcc)
  • dashboard: per-model savings breakdown and expected-vs-actual cost on historical charts (#807) (34dafe6)
  • detect re-served tool results as over-compression waste signal (#854) (5f1d88a)
  • evals: add zero-cost tool schema compaction integrity eval (#817) (53a08c6)
  • gated Markdown-KV compaction formatter (serialization-aware output) (#859) (06b2625)
  • kompress: warn on unrecognized HEADROOM_KOMPRESS_BACKEND + document backend selection (#204) (6367d0b)
  • memory: add opt-in Apple-GPU (MPS) embedding runtime (#766) (c71592d)
  • net-cost cache mutation formula on CompressionPolicy (#856 P1) (#857) (d5f5802)
  • plugins: Hermes agent headroom_retrieve plugin (#824) (058bced)
  • probe-based retention scoring of recorded compression events (#862) (c2106cb)
  • proxy: add CLI opt-outs for CCR injection (compression-only mode) (#823) (693d9d2)
  • proxy: attribute savings history rollups per provider (#791) (0b8b8d9)
  • proxy: log compressed messages alongside original request (#261) (2269e40)
  • proxy: per-project savings breakdown on the dashboard (claude, codex, aider, copilot, cursor) (#803) (914a60a)
  • support Python 3.14+ via pyo3 abi3 stable ABI (#516) (19eac8e)
  • switch Kompress default to kompress-v2-base with weight-only int8 ONNX (#799) (74392b2)
  • transforms: attribute read_lifecycle + smart_crush tags (#249) (8f37426)

Bug Fixes

  • anthropic: CCR exception must re-raise, not silently swallow (#838) (8db5efc)
  • ccr: key Rust search/diff/log markers with explicit_hash (#852) (bfcb07d)
  • ccr: make retrieval TTL configurable (#715) (2533f77)
  • ccr: skip CCR when model calls headroom_retrieve alongside user tools (#839) (30078f8)
  • ccr: use shared compression store (#875) (249af6c)
  • ci: correct comments, timeouts, and pip reliability in native e2e workflows (#878) (b716c8c)
  • ci: pin cosign-installer to v3 (v4 does not exist) (#774) (199d693)
  • codex: respect CODEX_HOME for wrap config (#731) (96abf38)
  • content_router: guard against empty compression output causing Anthropic 400 (#771) (2f9ff07)
  • copilot: use responses API for subscription reasoning models (#647) (84ac332)
  • correct preserved-entry index mapping in Gemini content round-trip (#836) (0ffe2b6)
  • dashboard: stable 'Proxy $ Saved' hero tile under --workers > 1 (#481) (fd73b88)
  • don't inject empty tools:[] when client omitted the tools field (#772) (574bbae)
  • harden Copilot API auth token handling (#557) (6b0c09f)
  • health: readyz verifies upstream connectivity, not just process liveness (#744) (5dfb446)
  • init: guard persistent task startup (#616) (9252d85)
  • init: normalize Windows hook paths to forward slashes (#788) (6ea6e31)
  • init: suppress hook recovery output (#760) (b439599)
  • learn: claude-cli streams output with idle timeout (#373) (9bff575)
  • make headroom wrap readiness probe timeout configurable for slow ML imports (#581) (163677b)
  • parser: detect waste signals in Anthropic tool_result content blocks (#815) (929698a)
  • proxy: F4 — trust X-Forwarded-* only behind allow-listed gateway (d10bd5f)
  • proxy: lazy-import server to avoid fastapi crash (#442) (93c6937)
  • proxy: make CCR multi-worker warning conditional on backend (#770) (d76a729)
  • proxy: make Kompress eager preload cache-only so a cold cache can't block startup (#783) (841663d)
  • proxy: restore Codex usage headers on WS and streaming SSE transports (#577) (#794) (0ce68de)
  • schema compaction must not drop property names that match DROP_KEYS (#785) (ae2122f)
  • security: block DNS-rebinding on /debug/* and /stats/reset via Host-header allowlist (#605) (b4b5025)
  • ssl: upstream httpx client inherits SSL_CERT_FILE, REQUESTS_CA_BUNDLE, NODE_EXTRA_CA_CERTS (#745) (e50fbb3)
  • suppress LiteLLM provider banner before import (#874) (f9384ef)
  • transforms: use thread-local tree-sitter parsers to prevent pyo3 Unsendable panic (#604) (2ad300a)
  • wrap: track shared proxy clients with markers (#877) (05bd56b)

Code Refactoring

  • extract litellm model resolution to shared utility (ec7d006)

0.24.0 (2026-06-08)

Features

  • perf: add --format {text,json,csv} to headroom perf (#648) (9fe4886)
  • proxy: show resolved upstream API targets in startup banner (#586) (8dbe7ad), closes #583
  • relevance: weight BM25 score_batch by corpus IDF (#646) (88177bd)
  • support CLAUDE_CODE_USE_FOUNDRY and custom upstream gateways (#726) (d90cdce)

Bug Fixes

  • ci: restore green lint gate on main (fe50f9d)
  • codex: auto-enable fail-open on compression timeout in headroom wrap codex (#531) (5f5f261)
  • copilot: restore generic endpoint for non-subscription OAuth (#610) (#612) (18925b8)
  • deps: move gunicorn to [proxy-prod] extra, add Windows guard (#537) (fa558c5)
  • proxy: fail-open on corrupt golden bytes instead of RuntimeError (#603) (2170a1b)
  • proxy: route Claude Code model metadata to Anthropic (#627) (30c1ac8)
  • security: patch loopback guard, retry None raise, async subprocess, and cache race (06d7cb9)
  • security: patch loopback guard, retry None raise, blocking subprocess, and cache stats race (78f3a4d)
  • startup: move HF/httpx log suppression before sentence_transformers init (#622) (176d4c7)
  • startup: suppress proxy startup log noise (#619) (4555901)
  • wrap: report unbindable proxy ports (#602) (6dfcaa8)

Added

  • kompress: warn when HEADROOM_KOMPRESS_BACKEND is set to an unrecognized value instead of silently falling back to auto, and document the backend selection env var (auto / onnx / onnx_cpu / onnx_coreml / pytorch / pytorch_mps plus shorthand aliases) in wiki/configuration.md (issue #202, PR #204).
  • proxy: per-provider attribution in the savings history rollups. Each /stats-history bucket (hourly/daily/weekly/monthly) now carries a by_provider map breaking down tokens_saved, compression_savings_usd_delta, total_input_tokens_delta, and total_input_cost_usd_delta per provider, so consumers can show how savings and spend are distributed across providers within a time period. Providers only appear in a bucket where they moved a counter; legacy history checkpoints with no provider collapse into "unknown". Affected files: headroom/proxy/savings_tracker.py, headroom/proxy/prometheus_metrics.py.
  • cli: startup banner now includes a Performance Tuning section that surfaces active HEADROOM_COMPRESSION_STABLE_AFTER_TURN, HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS, and embedding-server socket values when set; shows a hint to set them when all defaults are in use.

Changed

  • deps: loosen over-pinned constraints and add upper bounds
    • litellm==1.82.3 -> >=1.86.2,<2.0 (exact pin blocked security patches; floor stays above the CVE-2026-42271 fix)
    • transformers>=4.30.0 -> >=4.30.0,<6.0 (add upper bound; library already crossed a major version silently)
    • sentence-transformers>=2.2.0 -> >=2.2.0,<6.0 (same; applied in memory, evals, and dev extras)
    • neo4j>=5.20.0 -> >=5.20.0,<7.0 (client had already crossed the 5.x/6.x boundary)
    • mem0ai>=0.1.100 -> >=1.0.0,<2.0 (floor was pre-1.0; locked package is already 1.0.11)
    • langchain-core>=0.2.0 -> >=1.3.3,<4.0 (floor stays above current high-severity advisory fixes)
    • langchain-openai>=0.1.0 -> >=1.1.14,<2.0 (floor stays above current advisory fixes)
    • qdrant-client>=1.9.0 -> >=1.9.0,<2.0
    • uvicorn>=0.23.0 -> >=0.23.0,<1.0 (applied in proxy and dev extras)
    • Same transformers and litellm bounds applied consistently across ml, voice, and dev extras
  • docker: bump neo4j image in docker-compose.yml from 5.15.0 to 5.26 (latest 5.x LTS)
  • docker: bump UV_VERSION in Dockerfile from 0.11.16 to 0.11.18

Bug Fixes

  • codex: respect CODEX_HOME when headroom wrap codex writes provider, MCP, memory, backup, and global AGENTS.md config, and warn when unwrap codex may be looking at the default Codex home because CODEX_HOME is unset.
  • proxy: multi-worker CCR warning is now conditional on backend — when HEADROOM_CCR_BACKEND is unset (default InMemoryBackend, per-process), the startup warning includes CCR retrieval failures and suggests HEADROOM_CCR_BACKEND=sqlite; when a cross-worker backend is already configured, the warning covers only the remaining per-worker stores (compression cache, prefix tracker, TOIN, CostTracker). Updated RUST_DEV.md to accurately document Python CompressionStore as per-process by default.
  • deps: move gunicorn to [proxy-prod] extra with sys_platform != 'win32' guard; removed from [proxy] to avoid forcing a Unix-only package on dev, CI, and Windows users (#537)
  • startup: suppress proxy startup log noise -- litellm banner, trafilatura parse errors, HuggingFace Hub unauthenticated warnings, tiktoken fallback warning, and httpx INFO lines from sentence_transformers HEAD checks. Affected files: headroom/providers/litellm.py, headroom/transforms/html_extractor.py, headroom/memory/adapters/embedders.py, headroom/providers/anthropic.py, headroom/providers/registry.py, headroom/image/onnx_router.py, headroom/transforms/kompress_compressor.py.

0.23.0 (2026-06-04)

Features

  • copilot: GitHub Copilot subscription mode through Headroom (f4dff9b)

Bug Fixes

  • ccr: scope proactive expansion by workspace (cross-project leak) (197601b)
  • ccr: scope proactive expansion by workspace (cross-project leak) (1bc163f)
  • codex: keep init model_provider at config root (#260) (304dcc7)
  • codex: keep init model_provider at config root (#260) (849b46d)
  • copilot: deterministic subscription token handoff to the proxy (72da461)
  • copilot: support subscription auth through Headroom (ff4a0c6)
  • correct tiktoken encoding for unknown gpt-4 model snapshots (#552) (0e551de)
  • decode/encode owned config, state and template assets as UTF-8 (2f1538a)
  • decode/encode owned config, state and template assets as UTF-8 (fixes #533) (92075b9)
  • docker: upgrade base images to Python 3.13 / debian13 (e6bf7a0)
  • docker: upgrade base images to Python 3.13 / debian13, drop digest pinning (08a2197)
  • docs: bump next.js to 16.2.6 for GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (a6a09e6)
  • docs: mkdocs configuration to build with correct folder (#543) (5557944)
  • docs: update brace-expansion to 5.0.6 to remediate GHSA-jxxr-4gwj-5jf2 (CVE-2026-45149) (6eb6fb5)
  • docs: update bun.lock to next 16.2.6 for GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (91e0937)
  • ignore brackets inside JSON strings when splitting mixed content (#553) (bdcfc32)
  • learn: decode Unix home dirs whose username contains '.', '-' or '_' (211daae)
  • learn: decode Unix home dirs whose username contains '.', '-' or '_' (491a8b3)
  • learn: finish gemini-flash-latest default model sweep (982d01b)
  • learn: finish gemini-flash-latest default model sweep (#532) (d797366)
  • memory: READ-ONLY framing + fail-closed unresolved-project fallback (a178249)
  • memory: READ-ONLY framing + fail-closed unresolved-project fallback (482f80e)
  • update dashboard doc link (#544) (378d77e)
  • Update Next.js to 16.2.4 in docs/bun.lock to address GHSA-gx5p-jg67-6x7h (CVE-2026-44580) (0b9f11a)
  • Update Next.js to 16.2.6 in docs/package.json and package-lock.json to address GHSA-h64f-5h5j-jqjh (CVE-2026-44577) (db5d15f)
  • Upgrade litellm to 1.86.2 to remediate CVE-2026-42271 (07581b9)

Code Refactoring

  • cli: factor shared wrap-subcommand scaffolding (8eeb926)
  • cli: factor shared wrap-subcommand scaffolding (c74ad11)

0.22.4 (2026-05-26)

Bug Fixes

  • cli: G1 remediation — non-string clobber, per-model systemMessage, openhands gate (ea1976e)
  • cli: wrap CLI breadth — cline, continue, goose, openhands (8625f80)
  • cli: wrap subcommands for cline, continue, goose, openhands (c375fa1)
  • observability: G3 remediation — bound cardinality + wire dead metrics (2a717a9)
  • observability: RTK metrics + Rust observability (Phase H blocker) (b36ad9f)
  • observability: wire Phase G PR-G3 RTK + proxy metrics (H-blocker) (5f264a5)
  • release: tag format vX.Y.Z (drop release-please component prefix) (4a39ef5)
  • release: tag format vX.Y.Z (drop release-please component prefix) (0f3e3af)
  • subscription: address G2 review findings — phantom delta, multi-worker race, silent fallbacks (f68090c)
  • subscription: wire tokens_saved_rtk data plane (c7d1247)
  • subscription: wire tokens_saved_rtk from RTK stats endpoint (44c605f)
  • tests: drive RTK subprocess failure with real exec, not monkeypatched run (9b6d637)
  • tests: mock logger.warning directly instead of relying on caplog (c38dac3)
  • tests: patch headroom.rtk.get_rtk_path, not the helpers alias (317dffe)
  • tests: tomllib fallback to tomli on python 3.10 (74843d1)

Security

  • /debug/memory loopback guard. The endpoint was missing the Depends(_require_loopback) guard that all other /debug/* endpoints carry. External callers can no longer reach it.
  • retry_max_attempts zero guard. When retry_enabled=True and retry_max_attempts=0 the retry loop exited without setting last_error, causing raise last_error to raise TypeError: exceptions must derive from BaseException. A RuntimeError with an actionable message is now raised instead, and ProxyConfig.__post_init__ rejects retry_max_attempts < 1 at construction time.
  • Blocking subprocess on async event loop. _read_rtk_lifetime_stats and _read_lean_ctx_lifetime_stats called subprocess.run directly on the asyncio thread. The initialize_context_tool_session_baseline function is now async and offloads the subprocess via asyncio.to_thread; the stats endpoint uses await asyncio.to_thread(_get_context_tool_stats).
  • Hardcoded Neo4j credential in docker-compose.yml. NEO4J_AUTH now defaults to ${NEO4J_AUTH:-neo4j/devpassword} and is documented in .env.example (excluded from .gitignore via !.env.example).
  • SemanticCache.get_memory_stats() concurrent iteration. The method iterates self._cache.values() without holding the async lock. A snapshot is now taken via list(self._cache.values()) before iterating to avoid RuntimeError: dictionary changed size during iteration under async load.
  • Default Neo4j password in ProxyConfig. memory_neo4j_password default changed from "password" to "". The proxy startup path now emits a logger.warning when memory_backend == "qdrant-neo4j" and the password is empty, prompting operators to set a real credential.
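The retry_max_attempts failure mode above is easy to reproduce in isolation. The sketch below is not the proxy's retry loop; RetrySettings is a hypothetical stand-in for the relevant ProxyConfig fields, and the ValueError raised at construction time is an assumption, since the entry does not name the exception type.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetrySettings:
    """Hypothetical stand-in for the retry-related fields of ProxyConfig."""

    retry_enabled: bool = True
    retry_max_attempts: int = 3

    def __post_init__(self) -> None:
        # Reject the degenerate value up front instead of failing mid-request.
        if self.retry_max_attempts < 1:
            raise ValueError("retry_max_attempts must be >= 1")


def call_with_retry(fn: Callable[[], object], max_attempts: int) -> object:
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    if last_error is None:
        # With max_attempts == 0 the loop never runs; a bare `raise last_error`
        # here would fail with "exceptions must derive from BaseException".
        raise RuntimeError("retry loop made no attempts; set retry_max_attempts >= 1")
    raise last_error
```

Validating at construction time means the runtime guard should never fire in practice, but keeping both protects callers that bypass the config object.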

Fixed

  • PyPI install clarity and release gating. Documented pipx --python python3.13 for environments where unsupported Python wheel tags cause older-version resolution, made PyPI publish failures block GitHub Releases unless PYPI_SKIP=true, and added an sdist LICENSE invariant.

  • headroom learn with claude-cli no longer fails silently on slow networks or large digests. The CLI backend timeout was a hard 120s wall-clock cap with no liveness signal: a successful long analysis and a hung connection looked identical, and exit 0 with "no recommendations" was the only user-visible signal. Two changes: (1) Streaming + idle timeout for claude-cli: the command now uses --output-format stream-json --verbose and a watchdog thread reads events as they arrive. The process is killed only after HEADROOM_LEARN_CLI_IDLE_TIMEOUT_SECS (default 60s) of zero output, or after HEADROOM_LEARN_CLI_TIMEOUT_SECS (default 300s, was 120s) total. Long-but-active analyses run to completion; genuine hangs are caught fast. The final type:"result" event carries the assistant response. Drains stdout/stderr via reader threads so the watchdog works on Windows too. (2) Env-var overrides for all CLI backends: HEADROOM_LEARN_CLI_TIMEOUT_SECS is honored by gemini-cli and codex-cli as the wall-clock timeout; idle override applies only to the streaming claude-cli path.

  • Learned: error recovery section in MEMORY.md no longer bloats with stale, one-shot, or contradictory entries. The matchers paired up unrelated tool calls (e.g. state.rs and lib.rs in the same dir becoming "File state.rs does not exist. The correct path is lib.rs."), the dedup key was the literal rendered bullet text so near-duplicates each created their own row, the shutdown flush dropped the evidence gate to 1 so every singleton landed at session end, and there was no TTL or re-validation. Fixed at every layer: (1) Emission: Read recoveries require the failed/successful basenames to be identical or close in edit distance; Bash recoveries require a shared binary (allowing python ↔ python3 and ruff ↔ .venv/bin/ruff variants) plus low-edit-distance OR a shared substantive non-flag token. Unrelated pairs are rejected at the source. (2) Dedup: error-recovery rows are hashed on recovery intent: Read on (basename(error_path), basename(success_path)), Bash on the primary command stripped of volatile suffixes (| tail -N, 2>&1, etc.). Near-duplicates collapse into one row. (3) Evidence gating: default min_evidence raised from 2 to 5; shutdown-relaxation removed; new --min-evidence flag and HEADROOM_MIN_EVIDENCE envvar so embedded clients can tighten the threshold further. (4) Render-time refinement: drop rows not re-observed in 21 days, re-validate Read success paths against the filesystem, collapse same-error_path-with-multiple-targets into one "use Glob/Grep first" bullet, rank by evidence_count * 0.5 ** (days/5), cap the section at 15. A→B / B→A contradiction pairs are also dropped at flush time. Patterns now stamp first_seen_at / last_seen_at on every save; _bump_persisted_evidence updates them via json_set. Other Learned: … categories (environment, preference, architecture) are untouched.

  • headroom unwrap codex now actually undoes headroom wrap codex — previously there was no unwrap codex subcommand at all, so the injected model_provider = "headroom" / [model_providers.headroom] block stayed in ~/.codex/config.toml forever and Codex continued routing through the (potentially stopped) proxy, surfacing as Missing environment variable: OPENAI_API_KEY. wrap codex now snapshots the pre-wrap config.toml to config.toml.headroom-backup before its first injection, and unwrap codex restores that snapshot byte-for-byte (or, if the backup is missing, strips only the Headroom-managed block and leaves surrounding user content intact). Safe no-op when run without a prior wrap. Reported by @raenaryl in Discord.

  • Image compressors now release shared router models after use and proxy shutdown — the proxy/image compression path no longer keeps global technique-router and SigLIP model instances pinned in memory after one-off image optimization work. The get_compressor() helper now returns a fresh, caller-owned compressor instead of a process-lifetime singleton.

  • headroom learn no longer clobbers prior recommendations on re-run — the marker block in CLAUDE.md / MEMORY.md is now merged with the prior block instead of wholesale-replaced. Sections re-surfaced by the new run win; sections not re-surfaced are carried forward so learnings accumulate across runs instead of disappearing. To fully rebuild the block, delete it manually and re-run. (#231)

  • headroom learn no longer emits dangling cross-references when a section is re-surfaced — the analyzer now includes the project's current <!-- headroom:learn --> block (from CLAUDE.md and MEMORY.md) in the LLM digest as a "Prior Learned Patterns" section, and the system prompt instructs the LLM that re-emitting a section replaces the prior one wholesale. Prevents bullets like "X is also large — same rule as Y, Z" from appearing after Y and Z got dropped during per-section replacement. The writer's section-level carry-forward from #231 remains in place as a safety net for sections the LLM omits entirely. New helper extract_marker_block added to headroom.learn.writer.
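The render-time ranking from the Learned: error recovery entry above (drop rows older than 21 days, score by evidence_count * 0.5 ** (days/5), cap at 15) can be sketched as follows. The row type and field names are hypothetical, and "days" is assumed here to mean days since the row was last observed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryRow:
    """Hypothetical row shape; the field names are illustrative only."""

    text: str
    evidence_count: int
    days_since_last_seen: float


def score(row: RecoveryRow) -> float:
    # Evidence decays with a 5-day half-life.
    return row.evidence_count * 0.5 ** (row.days_since_last_seen / 5)


def rank_rows(
    rows: List[RecoveryRow], max_age_days: float = 21, cap: int = 15
) -> List[RecoveryRow]:
    # Assumption: a row seen exactly max_age_days ago is still kept.
    fresh = [r for r in rows if r.days_since_last_seen <= max_age_days]
    return sorted(fresh, key=score, reverse=True)[:cap]
```

With a 5-day half-life, a row with 10 sightings last seen 10 days ago scores 2.5, so a row with 5 sightings seen today (score 5.0) outranks it.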

Added

  • turn_id linking agent-loop API calls to a single user prompt — a new compute_turn_id(model, system, messages) helper in headroom/proxy/helpers.py hashes the message prefix up to and including the last user-text message, yielding an id that is stable across every agent-loop iteration of one prompt but rolls over when the user sends a new prompt (or runs /compact, /clear). RequestLog gained a turn_id: str | None field, which is stamped at every log site (anthropic handler bedrock + direct branches, and the streaming handler) and surfaced as turn_id in /transformations/feed. Lets downstream consumers (e.g. the Headroom Desktop Activity tab) aggregate savings per user prompt rather than per API call.
  • Live flush of traffic-learned patterns to CLAUDE.md / MEMORY.md — the TrafficLearner now writes to agent-native context files continuously during proxy operation, not just at shutdown. A new dirty-flag debounced _flush_worker (10s window, FLUSH_DEBOUNCE_SECONDS) calls flush_to_file() whenever _accumulate() marks the learner dirty, so patterns surface in CLAUDE.md / MEMORY.md near real-time. Flushes read both persisted rows (via _load_persisted_patterns_from_sqlite) and the in-memory accumulator, bucket patterns by project via the learn plugin registry (plugin.discover_projects() + longest-path anchoring in _project_for_pattern), and route by PatternCategory to the correct file (_patterns_to_recommendations + _CATEGORY_TO_TARGET). Live flushes require evidence_count >= 2; the shutdown flush accepts single-evidence rows.
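To make the turn_id idea above concrete, here is a minimal sketch of hashing the message prefix up to and including the last user-text message. It is not the compute_turn_id helper from headroom/proxy/helpers.py: the function names, the SHA-256 choice, the 16-character truncation, and the rule that tool_result-only user messages do not count as user text are all assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def _has_user_text(message: Dict[str, Any]) -> bool:
    if message.get("role") != "user":
        return False
    content = message.get("content")
    if isinstance(content, str):
        return bool(content.strip())
    # Block-list content: only explicit text blocks count as a new prompt.
    return any(
        isinstance(block, dict) and block.get("type") == "text"
        for block in (content or [])
    )


def sketch_turn_id(
    model: str, system: Any, messages: List[Dict[str, Any]]
) -> Optional[str]:
    last = max(
        (i for i, m in enumerate(messages) if _has_user_text(m)), default=None
    )
    if last is None:
        return None
    prefix = {"model": model, "system": system, "messages": messages[: last + 1]}
    blob = json.dumps(prefix, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
```

Agent-loop iterations only append assistant and tool-result messages after the last user prompt, so the hashed prefix (and therefore the id) stays the same until the user sends new text.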

Fixed

  • Traffic-learner evidence count stuck at 1; duplicate DB rows across restarts. _accumulate queued patterns with the default ExtractedPattern.evidence_count = 1 regardless of how many times the pattern was actually seen, so every persisted row landed at 1 and never crossed the live-flush gate (evidence_count >= 2). Worse, once a pattern was in _saved_hashes it was early-returned on every re-sighting, and _saved_hashes reset on process restart — so a second sighting in a later session inserted a duplicate row rather than bumping the existing one. Now: _accumulate writes the real accumulated count at save time, start() hydrates _saved_hashes + a new _persisted_ids map from the DB, and re-sightings bump the persisted row's metadata.evidence_count via an atomic json_set UPDATE (_bump_persisted_evidence). _load_persisted_patterns_from_sqlite now filters via json_extract(metadata, '$.source') instead of a LIKE on the raw JSON string, so rows survive metadata rewrites.
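The atomic json_set bump mentioned above can be demonstrated against an in-memory SQLite database. This is a sketch only: the patterns table, its columns, and the helper names are hypothetical, and it assumes the SQLite build bundled with Python includes the JSON functions.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway table holding one persisted pattern row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patterns (id INTEGER PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO patterns (id, metadata) VALUES (1, ?)",
        (json.dumps({"source": "traffic_learner", "evidence_count": 1}),),
    )
    conn.commit()
    return conn


def bump_evidence(conn: sqlite3.Connection, row_id: int) -> None:
    # Read-modify-write happens inside one UPDATE statement, so concurrent
    # writers cannot interleave between the read and the write.
    conn.execute(
        """
        UPDATE patterns
        SET metadata = json_set(
            metadata,
            '$.evidence_count',
            COALESCE(json_extract(metadata, '$.evidence_count'), 0) + 1
        )
        WHERE id = ?
        """,
        (row_id,),
    )
    conn.commit()


def load_metadata(conn: sqlite3.Connection, row_id: int) -> dict:
    row = conn.execute(
        "SELECT metadata FROM patterns WHERE id = ?", (row_id,)
    ).fetchone()
    return json.loads(row[0])
```

Filtering with json_extract(metadata, '$.source') rather than a LIKE on the raw text matters here because json_set may re-serialize the JSON with different spacing or key order.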

Added

  • HEADROOM_QDRANT_* environment variables for memory Qdrant configuration (#31) — Memory(backend="qdrant-neo4j"), Mem0Config, MemoryConfig, and ProxyConfig now resolve their Qdrant connection from HEADROOM_QDRANT_URL, HEADROOM_QDRANT_HOST, HEADROOM_QDRANT_PORT, HEADROOM_QDRANT_API_KEY, HEADROOM_QDRANT_HTTPS, HEADROOM_QDRANT_PREFER_GRPC, and HEADROOM_QDRANT_GRPC_PORT. Explicit constructor arguments still win; unset env keeps the existing localhost:6333 defaults. Adds matching --memory-qdrant-{url,host,port,api-key} CLI flags. Enables hosted Qdrant (Qdrant Cloud) and shared/remote Qdrant stacks without code changes. New helper: headroom/memory/qdrant_env.py.
  • Telemetry stack & install-mode identity fields — anonymous beacon now reports headroom_stack (how Headroom is invoked: proxy, wrap_claude, adapter_ts_openai, ...) and install_mode (wrapped / persistent / on_demand), plus requests_by_stack for proxies that serve multiple integrations. Proxy exposes a by_stack bucket alongside by_provider / by_model on /stats, a matching headroom_requests_by_stack Prometheus counter, and an X-Headroom-Stack header honored by the FastAPI middleware. headroom wrap <tool> sets HEADROOM_STACK=wrap_<agent>; the TS SDK and all four adapters (openai, anthropic, gemini, vercel-ai) tag their compress calls. Schema migration: sql/upgrade_telemetry_stack_context.sql.
  • Canonical filesystem contract (issue #175) — new HEADROOM_CONFIG_DIR (default ~/.headroom/config, read-mostly) and HEADROOM_WORKSPACE_DIR (default ~/.headroom, read-write state) env vars recognized by the Python proxy/CLI and the npm SDK. Additive; all existing per-resource env vars (HEADROOM_SAVINGS_PATH, HEADROOM_TOIN_PATH, HEADROOM_SUBSCRIPTION_STATE_PATH, HEADROOM_MODEL_LIMITS) continue to work with identical semantics. Docker install scripts and docker-compose.native.yml forward the new vars into containers so savings, logs, and telemetry resolve to the bind-mounted .headroom path. See wiki/filesystem-contract.md.
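The precedence described for the HEADROOM_QDRANT_* variables above (explicit argument, then environment, then the existing localhost:6333 default) can be sketched like this. The helper names are hypothetical and are not the contents of headroom/memory/qdrant_env.py.

```python
import os
from typing import Mapping, Optional


def resolve_qdrant_host(
    explicit: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> str:
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit  # explicit constructor arguments still win
    return env.get("HEADROOM_QDRANT_HOST") or "localhost"


def resolve_qdrant_port(
    explicit: Optional[int] = None, env: Optional[Mapping[str, str]] = None
) -> int:
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    raw = env.get("HEADROOM_QDRANT_PORT")
    # Unset or empty falls back to the pre-existing default port.
    return int(raw) if raw else 6333
```

Passing the environment as a parameter keeps the resolution logic testable without mutating os.environ.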

Changed

  • /stats-history now returns compact checkpoint history by default — the JSON response keeps recent checkpoints dense while evenly sampling older checkpoints so long-running installs do not return ever-growing payloads. Add history_mode=full to fetch the full retained checkpoint list, or history_mode=none to skip it entirely while still receiving the derived hourly/daily/weekly/monthly rollups. Responses now include a history_summary block describing stored versus returned points.
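
The compact mode described above (recent checkpoints dense, older ones evenly sampled) can be sketched roughly as follows. The function name `compact_history` and the `keep_recent` / `older_samples` parameters are hypothetical; the proxy's actual window sizes and sampling rule are not stated in this changelog.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compact_history(
    points: Sequence[T], keep_recent: int = 4, older_samples: int = 3
) -> List[T]:
    """Keep the newest `keep_recent` points; evenly sample the older ones."""
    split = max(0, len(points) - keep_recent)
    older, recent = list(points[:split]), list(points[split:])

    # Nothing to thin out if the older part is already small enough.
    if len(older) <= older_samples:
        return older + recent
    if older_samples <= 0:
        return recent
    if older_samples == 1:
        return [older[0]] + recent

    # Integer arithmetic so the first and last older points are always included.
    last = len(older) - 1
    sampled = [older[i * last // (older_samples - 1)] for i in range(older_samples)]
    return sampled + recent


if __name__ == "__main__":
    stored = list(range(20))
    returned = compact_history(stored)
    print(f"stored={len(stored)} returned={len(returned)} points={returned}")
```

The stored and returned counts printed at the end correspond to the kind of information the history_summary block is described as carrying.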

Fixed

  • Streaming Anthropic requests are now visible to /stats.recent_requests and /transformations/feed: previously, _finalize_stream_response did not call self.logger.log(...), so the entire streaming Anthropic code path (the one Claude Code uses) silently bypassed the request logger. Only the non-streaming Anthropic path and the Bedrock streaming path were logged. As a consequence, --log-messages had no observable effect on the live transformations feed for typical traffic. The streaming finalizer now emits the same RequestLog shape the other paths do, including request_messages when log_full_messages is enabled.

[0.5.22] - 2026-04-11

Added

  • Cross-agent memory — Claude saves a fact, Codex reads it back. All agents sharing one proxy share one memory store. Project-scoped DB at .headroom/memory.db, auto user_id from $USER.
  • Agent provenance tracking — every memory records which agent saved it (source_agent, source_provider, created_via), with edit history on updates.
  • LLM-mediated dedup: on memory_save, the enriched response points the LLM at similar existing memories. A background async pass auto-removes duplicates whose cosine similarity exceeds 92%. Zero extra LLM calls.
  • Memory for OpenAI and Gemini handlers — context injection + tool handling wired into all three provider handlers (Anthropic, OpenAI, Gemini).
  • Plugin architecture for headroom learn — each agent (Claude, Codex, Gemini) is a self-contained plugin. External plugins register via headroom.learn_plugin entry points. --agent flag for CLI.
  • GeminiScanner for headroom learn — reads ~/.gemini/tmp/*/chats/session-*.json and .jsonl.
  • Code graph integration: headroom wrap claude --code-graph auto-indexes the project via codebase-memory-mcp for call-chain traversal, impact analysis, and architectural queries. Opt-in, ~200 token overhead with Claude Code's MCP Tool Search.
  • OpenAI embedder auto-detection — memory backend uses OpenAI embeddings when sentence-transformers is unavailable (no torch/2GB dependency needed).
  • Live traffic learning flush: headroom wrap <agent> --learn flushes learned patterns to the correct agent-native file (MEMORY.md / AGENTS.md / GEMINI.md) at proxy shutdown.
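
The background dedup threshold mentioned above (cosine similarity over 92%) can be illustrated with a small pure-Python sketch. The names `cosine_similarity` and `find_duplicates`, the dict-of-vectors input, and the choice to keep the earliest memory are assumptions made for illustration; Headroom's actual dedup code is not shown in this changelog.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.92
) -> List[str]:
    """Return ids whose embedding is too close to an earlier, kept memory."""
    kept: List[str] = []
    duplicates: List[str] = []
    for memory_id, vector in embeddings.items():
        # Assumption: the earliest memory of a near-identical pair is the one kept.
        if any(cosine_similarity(vector, embeddings[k]) > threshold for k in kept):
            duplicates.append(memory_id)
        else:
            kept.append(memory_id)
    return duplicates


if __name__ == "__main__":
    store = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}
    print(find_duplicates(store))
```

This pairwise scan is quadratic in the number of memories; a production store would more likely query a vector index for nearest neighbours.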

Changed

  • CodeCompressor disabled by default — AST-based code compression produced invalid syntax on 40% of real files. Code now passes through uncompressed. Use --code-graph for code intelligence instead, or re-enable with --code-aware.
  • Shared tool name map — consolidated tool normalization across all learn plugins into _shared.py.
  • Dynamic CLI agent detection: headroom learn discovers agents via plugin registry, no hardcoded choices.

Fixed

  • CodeCompressor statement-based truncation — body truncation now walks AST statements (not lines), never cuts mid-expression. Fixes syntax errors on multi-line dict literals and function calls.
  • Docstring FIRST_LINE mode — uses source lines directly instead of reconstructing from byte offsets. Properly handles all quote styles.
  • Memory shutdown queue drain — patterns in the save queue were lost on proxy shutdown. Now drained before exit.
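
The idea behind statement-based truncation (cut between whole statements, never inside one) can be shown with Python's standard `ast` module. This is a stand-in sketch, not the CodeCompressor implementation: the function name `truncate_function_bodies` and the `max_statements` parameter are hypothetical, and `ast.unparse` (Python 3.9+) reformats the code and drops comments, which a real compressor would want to avoid.

```python
import ast


def truncate_function_bodies(source: str, max_statements: int = 2) -> str:
    """Keep the first `max_statements` statements of each function body."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and len(node.body) > max_statements:
            # Truncate on statement boundaries and mark the cut with `...`,
            # so a multi-line dict literal or call is kept or dropped whole.
            placeholder = ast.Expr(value=ast.Constant(value=...))
            node.body = node.body[:max_statements] + [placeholder]
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    example = (
        "def f(x):\n"
        "    a = {\n"
        "        'k': x,\n"
        "        'j': 2,\n"
        "    }\n"
        "    b = a['k']\n"
        "    c = b + 1\n"
        "    return c\n"
    )
    print(truncate_function_bodies(example))
```

A line-based cut after three lines of the example would stop inside the dict literal and produce a syntax error; the statement-based cut cannot.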

Added

  • Codex-proxy resilience hardening — reduces event-loop starvation under cold-start reconnect storms
    • Stage-timing instrumentation — per-stage durations for both Codex WS accept and Anthropic /v1/messages pre-upstream phases emitted as a single STAGE_TIMINGS structured log line per request plus Prometheus histograms
    • Per-pipeline shared warmup — Anthropic + OpenAI pipelines eagerly load compressors/parsers once at startup; status merged into WarmupRegistry for /debug/warmup and /readyz
    • WS session registry — first-class tracking of active Codex WS sessions with deterministic relay-task cancellation and termination-cause classification (client_disconnect, upstream_error, client_timeout, etc.)
    • Bounded pre-upstream Anthropic concurrency: --anthropic-pre-upstream-concurrency / HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY caps simultaneous /v1/messages pre-upstream work (body read, deep copy, first compression stage, memory-context lookup, upstream connect) so replay storms cannot starve /livez, /readyz, and new Codex WS opens. Default: auto max(2, min(8, cpu_count)); 0 or negative disables (unbounded)
    • Loopback-only debug endpoints: /debug/tasks, /debug/ws-sessions, /debug/warmup return 404 (not 403) to non-loopback callers so external scanners cannot enumerate them
    • Reconnect-storm repro harness: scripts/repro_codex_replay.py drives concurrent WS + HTTP replay traffic against a local proxy and asserts /livez p99 under threshold; --json output routes JSON to stdout and the human summary to stderr
  • Proxy liveness and readiness health checks
    • Adds GET /livez for process liveness and GET /readyz for traffic readiness
    • Keeps GET /health backward compatible while expanding it with readiness details and subsystem checks
    • Eagerly initializes configured memory backends during proxy startup so readiness reflects real serving capability
    • Wires /readyz into the Docker image HEALTHCHECK and the example docker-compose.yml
  • Durable proxy savings history
    • Persists proxy compression savings history locally at ~/.headroom/proxy_savings.json
    • Supports HEADROOM_SAVINGS_PATH to override the storage location
    • Adds /stats-history with lifetime totals plus hourly/daily/weekly/monthly rollups
    • Supports JSON and CSV export from /stats-history
    • Extends /stats with a persistent_savings block while keeping savings_history backward compatible
    • Adds a historical mode to /dashboard backed by /stats-history, including export actions
  • Proxy telemetry SDK override via HEADROOM_SDK
    • Downstream apps can override the anonymous telemetry sdk field without patching installed files
    • Blank values fall back to the default proxy label
  • headroom learn — Offline failure learning for coding agents
    • Analyzes past conversation history (Claude Code, extensible to Cursor/Codex)
    • Success correlation: for each failure, finds what succeeded after and extracts the specific correction
    • 5 analyzers: Environment, Structure, Command Patterns, Retry Prevention, Cross-Session
    • Writes specific learnings to CLAUDE.md (stable project facts) and MEMORY.md (session patterns)
    • Generic architecture: tool-agnostic ToolCall model, pluggable Scanner/Writer adapters
    • Dry-run by default, --apply to write, --all for all projects
    • Example output: "FirstClassEntity.java is not at axion-formats/ — actually at axion-scala-common/"
  • Read Lifecycle Management — Event-driven compression of stale/superseded Read outputs
    • Detects when a Read output becomes stale (file was edited after) or superseded (file was re-read)
    • Replaces stale/superseded content with compact CCR markers, stores originals for retrieval
    • 75% of Read output bytes are provably stale or redundant (from real-world analysis of 66K tool calls)
    • Fresh Reads (latest read, no subsequent edit) are never touched — Edit safety preserved
    • Opt-in via ReadLifecycleConfig(enabled=True), disabled by default
    • Handles both OpenAI and Anthropic message formats
  • any-llm backend - Route requests through 38+ LLM providers (OpenAI, Mistral, Groq, Ollama, etc.) via any-llm
    • Enable with --backend anyllm --anyllm-provider <provider>
    • Install with: pip install 'headroom-ai[anyllm]'
  • Production-ready proxy server with caching, rate limiting, and metrics
  • CLI command headroom proxy to start the proxy server
  • IntelligentContextManager (semantic-aware context management)
    • Multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, forward references, token density
    • No hardcoded patterns - all importance signals learned from TOIN or computed from metrics
    • TOIN integration for retrieval_rate and field_semantics-based scoring
    • Strategy selection: NONE, COMPRESS_FIRST, DROP_BY_SCORE based on budget overage
    • Atomic tool unit handling (call + response dropped together)
    • Configurable scoring weights via ScoringWeights dataclass
    • IntelligentContextConfig for full configuration control
    • Backwards compatible with RollingWindowConfig
  • LLMLingua-2 Integration (opt-in ML-based compression)
    • LLMLinguaCompressor transform using Microsoft's LLMLingua-2 model
    • Content-aware compression rates (code: 0.4, JSON: 0.35, text: 0.3)
    • Memory management utilities: unload_llmlingua_model(), is_llmlingua_model_loaded()
    • Proxy integration via --llmlingua flag
    • Device selection: --llmlingua-device (auto/cuda/cpu/mps)
    • Custom compression rate: --llmlingua-rate
    • Helpful startup hints when llmlingua is available but not enabled
    • Install with: pip install 'headroom-ai[llmlingua]' (the [llmlingua] extra was removed in 0.9.x)
  • Code-Aware Compression (AST-based, syntax-preserving)
    • CodeAwareCompressor transform using tree-sitter for AST parsing
    • Supports Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
    • Preserves imports, function signatures, type annotations, error handlers
    • Compresses function bodies while maintaining structural integrity
    • Guarantees syntactically valid output (no broken code)
    • Automatic language detection from code patterns
    • Memory management: is_tree_sitter_available(), unload_tree_sitter()
    • Uses tree-sitter-language-pack for broad language support
    • Install with: pip install 'headroom-ai[code]'
  • ContentRouter (intelligent compression orchestrator)
    • Auto-routes content to optimal compressor based on type detection
    • Source hint support for high-confidence routing (file paths, tool names)
    • Handles mixed content (e.g., markdown with code blocks)
    • Strategies: CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, TEXT, LLMLINGUA
    • Configurable strategy preferences and fallbacks
    • Routing decision log for transparency and debugging
  • Custom Model Configuration
    • Support for new models: Claude 4.5 (Opus), Claude 4 (Sonnet, Haiku), o3, o3-mini
    • Pattern-based inference for unknown models (opus/sonnet/haiku tiers)
    • Custom model config via HEADROOM_MODEL_LIMITS environment variable
    • Config file support: ~/.headroom/models.json
    • Graceful fallback for unknown models (no crashes)
    • Updated pricing data for all current models
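
The default stated in the bounded pre-upstream concurrency entry above, max(2, min(8, cpu_count)) with 0 or negative meaning unbounded, can be sketched with an `asyncio.Semaphore`. The helper names `resolve_pre_upstream_limit` and `peak_concurrency` are hypothetical, and the sleep merely stands in for the real pre-upstream work.

```python
import asyncio
import os
from typing import Optional


def resolve_pre_upstream_limit(
    configured: Optional[int] = None, cpu_count: Optional[int] = None
) -> Optional[int]:
    """Return the concurrency cap, or None when the cap is disabled."""
    if configured is not None:
        return configured if configured > 0 else None  # 0 or negative: unbounded
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(2, min(8, cpus))


async def peak_concurrency(limit: Optional[int], jobs: int) -> int:
    """Run `jobs` dummy tasks under the cap and report the peak number in flight."""
    semaphore = asyncio.Semaphore(limit) if limit is not None else None
    active = 0
    peak = 0

    async def pre_upstream_work() -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        try:
            await asyncio.sleep(0.01)  # stands in for body read, compression, etc.
        finally:
            active -= 1

    async def guarded() -> None:
        if semaphore is None:
            await pre_upstream_work()
            return
        async with semaphore:  # the slot is released on every exit path
            await pre_upstream_work()

    await asyncio.gather(*(guarded() for _ in range(jobs)))
    return peak


if __name__ == "__main__":
    cap = resolve_pre_upstream_limit()
    print("cap:", cap, "peak:", asyncio.run(peak_concurrency(cap, 32)))
```

Because the cap only wraps the pre-upstream phase, requests queued behind the semaphore wait without holding the event loop, which is what keeps health endpoints responsive in this model.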

Fixed

  • Event.wait task leak in subscription trackers: asyncio.shield pattern prevents cancellation of the outer wait_for from leaking the inner Event.wait task
  • Python 3.10 compatibility for memory-context fail-open: catches asyncio.TimeoutError rather than the builtin TimeoutError (the two are the same class only from Python 3.11 onward) to preserve behaviour on older runtimes
  • uvicorn proxy_headers=False — refuses Forwarded / X-Forwarded-For rewrites so the loopback guard on /debug/* cannot be spoofed by a misconfigured reverse proxy
  • First-frame timeout for Codex WS accepts — guards against a client that opens a handshake and never sends the first frame; relays cancel deterministically with client_timeout
  • Semaphore leak on unexpected exception in Anthropic pre-upstream path — the finalizer now releases the pre-upstream semaphore on every exit path (early 4xx, cache hit, upstream error, streaming handoff)
  • active_relay_tasks gauge double-decrement: deregister_and_count returns (handle, released_task_count) atomically so the handler decrements the Prometheus gauge by the exact number it registered, eliminating drift
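
The double-decrement fix above hinges on removing a session and reporting its task count in one atomic step. A minimal sketch of that idea follows; the `SessionRegistry` class, its storage layout, and the use of the session id as the handle are assumptions, with only the name `deregister_and_count` and its return shape taken from the entry.

```python
import threading
from typing import Dict, Optional, Tuple


class SessionRegistry:
    """Hypothetical registry mapping a session id to its relay-task count."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, int] = {}

    def register(self, session_id: str, relay_tasks: int) -> None:
        with self._lock:
            self._sessions[session_id] = relay_tasks

    def deregister_and_count(self, session_id: str) -> Tuple[Optional[str], int]:
        """Remove the session and return (handle, released_task_count) in one step."""
        with self._lock:
            count = self._sessions.pop(session_id, None)
        if count is None:
            return None, 0  # already deregistered: nothing left to release
        return session_id, count


if __name__ == "__main__":
    registry = SessionRegistry()
    gauge = 0  # stands in for the Prometheus gauge

    registry.register("s1", relay_tasks=2)
    gauge += 2

    # Two teardown paths race to clean up; only the first one releases tasks.
    for _ in range(2):
        _, released = registry.deregister_and_count("s1")
        gauge -= released

    print("gauge after teardown:", gauge)
```

Since the second call returns a count of zero, a handler that always subtracts the returned count cannot push the gauge below its true value.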

Internal

  • IPv6-mapped loopback recognition — the loopback guard parses ::ffff:127.0.0.1 and other dual-stack literals through ipaddress.ip_address(...).is_loopback
  • Lock-free stage-timing accumulators: record_stage_timings writes to per-path counters that do not contend with /metrics export or record_request
  • Narrow contextlib.suppress in relay classification — only CancelledError is suppressed where we reclassify it; other exceptions propagate so termination cause stays truthful
  • jitter_delay_ms helper — shared exponential-backoff + 50-150% jitter formula in headroom/proxy/helpers.py; used by three proxy retry sites and mirrored inline in the repro harness
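
The loopback check described above can be approximated with the standard `ipaddress` module. This is a sketch under assumptions: `is_loopback_client` and `debug_status` are hypothetical names, and the IPv4-mapped form is unwrapped explicitly so the result does not depend on how a given Python version evaluates `is_loopback` for mapped addresses.

```python
import ipaddress


def is_loopback_client(host: str) -> bool:
    """True for 127.0.0.0/8, ::1, and IPv4-mapped loopback such as ::ffff:127.0.0.1."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostnames and malformed input are treated as non-loopback

    # Unwrap dual-stack literals (::ffff:a.b.c.d) to their IPv4 form first.
    mapped = getattr(address, "ipv4_mapped", None)
    if mapped is not None:
        address = mapped
    return address.is_loopback


def debug_status(host: str) -> int:
    """Answer 404 rather than 403 so the endpoint does not reveal that it exists."""
    return 200 if is_loopback_client(host) else 404


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "::ffff:127.0.0.1", "10.0.0.5"):
        print(candidate, debug_status(candidate))
```

A guard like this is only as trustworthy as the client address it receives, which is why the proxy_headers=False fix listed under Fixed matters.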

0.2.0 - 2025-01-07

Added

  • SmartCrusher: Statistical compression for tool outputs
    • Keeps first/last K items, errors, anomalies, and relevance matches
    • Variance-based change point detection
    • Pattern detection (time series, logs, search results)
  • Relevance Scoring Engine: ML-powered item relevance
    • BM25Scorer: Fast keyword matching (zero dependencies)
    • EmbeddingScorer: Semantic similarity with sentence-transformers
    • HybridScorer: Adaptive combination of both methods
  • CacheAligner: Prefix stabilization for better cache hits
    • Dynamic date extraction
    • Whitespace normalization
    • Stable prefix hashing
  • RollingWindow: Context management within token limits
    • Drops oldest tool units first
    • Never orphans tool results
    • Preserves recent turns
  • Multi-Provider Support:
    • Anthropic with official count_tokens API
    • Google with official countTokens API
    • Cohere with official tokenize API
    • Mistral with official tokenizer
    • LiteLLM for unified interface
  • Integrations:
    • LangChain callback handler (HeadroomOptimizer)
    • MCP (Model Context Protocol) utilities
  • Proxy Server (headroom.proxy):
    • Semantic caching with LRU eviction
    • Token bucket rate limiting
    • Retry with exponential backoff
    • Cost tracking with budget enforcement
    • Prometheus metrics endpoint
    • Request logging (JSONL)
  • Pricing Registry: Centralized model pricing with staleness tracking
  • Benchmarks: Performance benchmarks for transforms and relevance scoring
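
The SmartCrusher retention rule listed above (first and last K items plus errors) can be illustrated with a deliberately reduced sketch. It omits anomaly detection, relevance matching, and change-point detection, and both the function name `crush_items` and the `status == "error"` convention are assumptions, not SmartCrusher's API.

```python
from typing import Any, Dict, List


def crush_items(items: List[Dict[str, Any]], k: int = 2) -> List[Dict[str, Any]]:
    """Keep the first k items, the last k items, and every error item, in order."""
    if len(items) <= 2 * k:
        return list(items)  # too short to be worth crushing

    keep = set(range(k)) | set(range(len(items) - k, len(items)))
    # Assumption: an item flags an error via a "status" field.
    keep |= {i for i, item in enumerate(items) if item.get("status") == "error"}
    return [items[i] for i in sorted(keep)]


if __name__ == "__main__":
    rows = [{"id": i, "status": "error" if i == 5 else "ok"} for i in range(10)]
    print([row["id"] for row in crush_items(rows)])
```

Sorting the kept indices preserves the original ordering, so the error item stays in its position relative to the head and tail of the list.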

Changed

  • Improved token counting accuracy across all providers
  • Enhanced tool output compression with relevance-aware selection

Fixed

  • Mistral tokenizer API compatibility
  • Google token counting for multi-turn conversations

0.1.0 - 2025-01-05

Added

  • Initial release
  • HeadroomClient: OpenAI-compatible client wrapper
  • ToolCrusher: Basic tool output compression
  • Audit mode for observation without modification
  • Optimize mode for applying transforms
  • Simulate mode for previewing changes
  • SQLite and JSONL storage backends
  • HTML report generation
  • Streaming support

Safety Guarantees

  • Never removes human content
  • Never breaks tool ordering
  • Parse failures are no-ops
  • Preserves recency (last N turns)

Migration Guide

From 0.1.x to 0.2.x

The 0.2.0 release is backward compatible. New features are opt-in:

# Old code still works
from headroom import HeadroomClient, OpenAIProvider

# New SmartCrusher (replaces ToolCrusher for better compression)
from headroom import SmartCrusher, SmartCrusherConfig

config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=50,
)
crusher = SmartCrusher(config)

# New relevance scoring
from headroom import create_scorer

scorer = create_scorer("hybrid")  # or "bm25" for zero deps

Using the Proxy

New in 0.2.0 - run Headroom as a proxy server:

# Start the proxy
headroom proxy --port 8787

# Use with Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude