Skip to content

Releases: scouzi1966/maclocal-api

afm v0.9.13

21 Jun 19:12

Choose a tag to compare

afm v0.9.13

OpenAI-compatible local LLM inference for Apple Silicon (MLX + Apple Foundation Models).

Highlights since v0.9.12 (73 commits)

New models

  • cohere2_moe — Cohere North-Mini-Code (30B-A3B MoE). Correct across streaming, non-streaming, prefix-cache, and concurrency (#139).

⚡ Speculative decoding (quality-preserving)

  • --mtp — Qwen3.6 self-speculative decoding via the in-model MTP head → ~+52% decode.
  • --eagle3 <drafter> — dense Gemma4-31B EAGLE3 drafter → ~+30% decode.
  • Both work streaming and non-streaming. Bit-exact to greedy on short generations, near-greedy on long ones.

APIs & agent-friendliness

  • /v1/embeddings on the main server (Apple NaturalLanguage) (#132, #133).
  • Mid-stream cancel + /v1/tokenize / /v1/count_tokens + /openapi.json & /docs (#126).
  • vLLM-namespaced /metrics + Grafana dashboard (#122).
  • Apple-native Vision OCR and Speech transcription HTTP APIs (thanks @jesserobbins).

Performance & platform

  • Backported mlx-swift 0.31.3 adaptive-block SDPA~+10% decode @16k (pin stays 0.30.3).
  • Eager <think>-tag streaming + Metal-kernel prewarm. Swift 6 language mode migration.

Fixes

  • --no-think / server-default enable_thinking=false now actually disables thinking on reasoning models (was a silent no-op).
  • MTP reject path retains the committed token in the KV/GDN cache (fixes garbled output on longer generations).

Known limitations

  • --no-think + high --concurrent can corrupt output (#140). Default behavior unaffected; use lower concurrency or omit --no-think.
  • MTP is bit-exact to greedy on short generations; longer ones stay greedy-quality but may differ token-for-token.

Install

brew tap scouzi1966/afm && brew install scouzi1966/afm/afm   # or: brew upgrade afm
pip install macafm

SHA256 (afm-v0.9.13-arm64.tar.gz): 443bf74650fece15f7ce02663f6d5dd14a7b638c937f80262e426903a6abf42b

afm-next (20260621 · 97e6683)

21 Jun 00:33

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: 97e6683
  • Date: 20260621
  • Version: 0.9.13-next.97e6683.20260621

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (a92a0fe)

  • fix(test): repair EmbeddingsControllerTests against resolver-based init (97e6683)
  • feat(model): add cohere2_moe (Cohere North-Mini-Code) with all-mode correctness (#139) (4e46bc5)
  • docs(skill): nightly Step 4b also bumps the pinned pip wheel example in README (d2437d8)
  • docs(README): bump pinned pip nightly example to dev20260614 (2880dd4)
  • Update nightly release link to 20260614-a92a0fe (22e4aff)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ macafm-next   # nightly

afm-next (20260614 · a92a0fe)

14 Jun 16:12
a92a0fe

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: a92a0fe
  • Date: 20260614
  • Version: 0.9.13-next.a92a0fe.20260614

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (5aad36d)

  • feat(server): serve /v1/embeddings on the main server + advertise it (#132, #133) (895d20f)
  • docs(README): point pip wheel index at maclocal-ai.pages.dev (ef94c4e)
  • docs(README): highlight lossless speculative decoding (MTP + EAGLE3) (32ef421)
  • Update nightly release link to 20260613-5aad36d (75f43a9)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm-next (20260613 · 5aad36d)

13 Jun 11:18

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: 5aad36d
  • Date: 20260613
  • Version: 0.9.13-next.5aad36d.20260613

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (4bd0ec62a2cc39e4d69463f5ff8f1c119a3759ed)

  • Merge speculative-decoding (MTP/EAGLE3 + streaming + benchmarks) into main (5aad36d)
  • docs: add "which flag for which model" decision section (4300baf)
  • merge afm-opt: PR #134 build/metallib + whitespace review fixes (0b3d120)
  • fix(build): address PR #134 review — metallib install guard, debug symlink, probe, whitespace (248a8b5)
  • fix(spec-stream): address PR #135 review — think injection, error propagation, cancellation (dea202c)
  • docs(bench): streaming spec-decode retest results (MTP + EAGLE3) (689cb83)
  • feat(spec): streaming support for MTP and EAGLE3 fast paths (e88011d)
  • docs: decode-optimization feature guide + release notes / social copy (e0935fd)
  • perf(eagle3)+bench: lossless bs=2 fast path; afm vs mlx-vlm verify-fidelity (f518a25)
  • bench(eagle3): afm vs mlx-vlm EAGLE3 head-to-head on dense Gemma4-31B (f535e21)
  • bench(qwen36): 2026-06-06 MLX engine re-run (latest) + MTP head-to-head (f5a080a)
  • feat(eagle3): P2/P3 — --eagle3 CLI + service routing, +22% decode on Gemma4-31B (81bd262)
  • feat(eagle3): P1 greedy speculative loop — output identical to greedy AR (7226736)
  • fix(gemma4): proportional RoPE on full-attention layers (was stock RoPE) (5a81b9b)
  • docs: document /v1/embeddings API and list embed/speech in help card (#131) (8f6a999)
  • feat(eagle3): P0 — Swift Gemma4Eagle3Drafter, bit-exact vs Python reference (e3f082c)
  • feat(eagle3): P0 reference-capture for the dense Gemma4-31B EAGLE3 port (4f8c9a4)
  • docs(eagle3): phased afm/Swift port plan for dense Gemma4-31B EAGLE3 (+25% validated) (80ea9b5)
  • bench(gemma4): dense 31B flips it — EAGLE3 +25% (vs MoE all-negative) (551dbb8)
  • bench(gemma4): spec-decode validation — all 3 methods SLOWER than AR on MoE (negative) (09720b0)
  • docs(mtp): record the +52% win; note --mtp-depth now vestigial (ade3ae2)
  • perf(mtp): rewrite loop after mlx-lm PR #990 — +52% decode vs AR (was +6%) (9d86fea)
  • docs(mtp): record final implementation result (runnable, +6.5% at depth 1) (b6dc626)
  • perf(mtp): depth-1 default beats AR (+6.5%); vectorized acceptance + instrumentation (05e288f)
  • feat(mtp): P2 runnable in afm via --mtp — correct, perf WIP (b628326)
  • feat(mtp): P2 — MTP self-speculative generator, output identical to greedy AR (11da477)
  • feat(mtp): P1 — GatedDeltaNet cache rollback, bit-exact (the make-or-break gate) (4474222)
  • chore(mtp): point P0 test + capture at the cache-root model location (9eeaf77)
  • feat(mtp): P0 — Swift Qwen3_5MTPHead, bit-exact vs Python reference (d2ab0af)
  • feat(mtp): P0 reference-capture harness for the Swift MTP head port (ad86bfe)
  • docs(mtp): phased afm/Swift port plan for MTP self-speculative decoding (b7042b8)
  • bench(ollama): use qwen3.6:27b-mlx (MLX tag), not the failing GGUF default (999ccb4)
  • bench(qwen36): rerun full 7-engine cross-engine suite + refresh plots/results (e72db8e)
  • fix(metallib): include random (RNG) kernel — was crashing sampled generation (401484c)
  • wip(bench): SDPA backport report/plot + metallib RNG-guard fix; harness fixes (9ef222f)
  • perf(mlx/sdpa): backport 0.31.3 adaptive-block 2-pass SDPA — decode@16k ~+10% (7d180f8)
  • docs(deps): reconfirm mlx-swift 0.30.3 pin — 0.31.3 still has long-context SDPA regression (ddc2c97)
  • perf(stream): eager think-tag emission — cut reasoning TTFT ~610ms -> ~346ms (f1343a6)
  • perf(mlx): prewarm Metal kernels on server startup (faster cold first-token) (33c247d)
  • test: Qwen3.6-27B local-engine performance benchmark (afm vs 6 engines) (259b5f0)
  • fix(swift6): box command+error across the CFRunLoop Task (Swift 6.3.2) (e569a75)
  • build: migrate to Swift 6 language mode (#130) (1a6ffc1)
  • feat(build): add --install flag + verify binary paths (7144459)
  • docs: prominent one-command build-from-source section in README (ac3d303)
  • build: add root-level build.sh entry point for clone-and-build (781cfb8)
  • fix(toolcall): prevent server crash on scalar JSON arg value (#128) (#129) (c55d54c)
  • test: post-merge validation reports for ab90ac1 (proof for #127) (cfb2676)
  • feat(agent): T1.4-T1.7 — cancel + tokenize + OpenAPI (rerun, stacked-merge correction) (#126) (ab90ac1)
  • feat(metrics): vLLM-namespaced /metrics + Grafana dashboard (#122) (a8dbffa)
  • feat(agent): Tier-0 promotion + Tier-1 quick wins (request id, stream usage, parallel_tool_calls) (#123) (ed68189)
  • docs(claude): require proof before labeling test failures pre-existing (46e2698)
  • Bump version to 0.9.13 for next dev cycle (1cbd60f)
  • README: move Install section above the fold (86429a8)
  • Release v0.9.12: promote nightly to stable (7dfc4f6)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm 0.9.12

08 May 02:21

Choose a tag to compare

afm 0.9.12

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Changes since v0.9.11

  • fix: resolve executable path via _NSGetExecutablePath, not argv[0] (4bd0ec6)
  • Add nightly test results for 2026-05-02 (7af438e) (4cbbe06)
  • Update nightly release link to 20260502-a589c50 (7af438e)
  • feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119) (a589c50)
  • Feature: Expand vision API with barcode, classify, and saliency modes (#114) (b7ffe18)
  • Add on-device speech transcription and TTS (#113) (34001ec)
  • fix: afm -w falls back to ephemeral port when 9999 is busy (#116) (1e9f22c)
  • README: surface "What's new in afm-next" above Install (779e89e)
  • skill(promote-nightly): validate on staging tap before touching production (53e58fa)
  • README: remove staging tap from public docs (87c105e)
  • README: document installing previous versions of afm (3dae2dd)
  • Bump version to 0.9.12 for next dev cycle (7a23874)
  • Release v0.9.11: promote nightly to stable (94fdc35)
  • skill(promote-nightly): verify Apple framework links and bundle id in Step 5h (2717cdd)
  • Credit @jesserobbins for Vision OCR and Speech transcription (36e0194)
  • Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (90fd1d3)
  • Update nightly release link to 20260418-9c3225e (a08f840)

Install / Upgrade via Homebrew

Fresh install:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Install via PyPI

pip install macafm==0.9.12

afm-next (20260502 · a589c50)

02 May 22:52
a589c50

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: a589c50
  • Date: 20260502
  • Version: 0.9.12-next.a589c50.20260502

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (9c3225e)

  • feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119) (a589c50)
  • Feature: Expand vision API with barcode, classify, and saliency modes (#114) (b7ffe18)
  • Add on-device speech transcription and TTS (#113) (34001ec)
  • fix: afm -w falls back to ephemeral port when 9999 is busy (#116) (1e9f22c)
  • README: surface "What's new in afm-next" above Install (779e89e)
  • skill(promote-nightly): validate on staging tap before touching production (53e58fa)
  • README: remove staging tap from public docs (87c105e)
  • README: document installing previous versions of afm (3dae2dd)
  • Bump version to 0.9.12 for next dev cycle (7a23874)
  • Release v0.9.11: promote nightly to stable (94fdc35)
  • skill(promote-nightly): verify Apple framework links and bundle id in Step 5h (2717cdd)
  • Credit @jesserobbins for Vision OCR and Speech transcription (36e0194)
  • Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (90fd1d3)
  • Update nightly release link to 20260418-9c3225e (a08f840)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm 0.9.11

20 Apr 15:50
9c3225e

Choose a tag to compare

afm 0.9.11

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Changes since v0.9.10

  • Bump version baseline to 0.9.11 (post-v0.9.10 stable) (9b3f3bf)
  • Fix macOS 26 Speech Recognition SIGABRT: embed Info.plist via linker (b39cd60)
  • Resolve merge conflicts for PR #107 (e139985)
  • Address second round of Vision OCR review feedback (9039eb5)
  • Close taskRef/onCancel race window (8c10027)
  • Fix recognition task cancellation leak and lock race (4f19247)
  • Fix data race in speech recognition timeout (34406fd)
  • Address speech transcription review feedback (8b3a3f0)
  • Add on-device audio transcription via Apple Speech framework (7e18c90)
  • Address Vision OCR review feedback (38b16c5)
  • Fix Vision OCR in webui: bypass Foundation Model 4096 token limit (0292f3c)
  • Address Vision OCR review feedback (5159ea2)
  • Document Vision OCR API (b1d0f9c)
  • Add Vision OCR API and stabilize tests (7ab80b6)
  • Release v0.9.10: promote nightly to stable (332c8c2)
  • feat: versioned Homebrew formulae for afm and afm-next (#102) (7cff3df)
  • fix: handle 1D logits in TopPSampler (#100) (#101) (36ee874)
  • chore: bump wheel version to 0.9.10.dev20260408 (212d945)
  • Add nightly test results for 2026-04-07 (4f24281) (272cc13)
  • Update nightly release link to 20260408-628c2bb (4f24281)

Install / Upgrade via Homebrew

Fresh install:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Install via PyPI

pip install macafm==0.9.11

afm-next (20260418 · 9c3225e)

18 Apr 15:43
9c3225e

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: 9c3225e
  • Date: 20260418
  • Version: 0.9.11-next.9c3225e.20260418

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

🙏 Acknowledgement

Huge thanks to first-time contributor @jesserobbins — this cycle landed two substantial features from him: the Apple Vision OCR HTTP API (#104) and Apple Speech transcription (#107). Both lift afm's Apple-native capabilities from CLI-only into first-class HTTP APIs compatible with the OpenAI-style surface that third-party clients already speak. Contributions of this size and quality from a new contributor are rare and appreciated.

Highlights

  • Apple Vision OCR HTTP APIPOST /v1/vision/ocr for files, multipart uploads, base64, data URLs, and OpenAI-style image content parts. Multi-page PDF support, structured document/page/block/table output, Foundation chat auto-OCR integration. Contributed by @jesserobbins (#104).
  • Apple Speech transcription — on-device audio transcription via the Speech framework. New afm speech -f <file> CLI, POST /v1/audio/transcriptions API, chat input_audio content parts. Supports WAV/MP3/M4A/CAF/AIFF. Contributed by @jesserobbins (#107).
  • macOS 26 privacy fix — binary now embeds NSSpeechRecognitionUsageDescription via -sectcreate __TEXT __info_plist, so Speech Recognition actually works instead of SIGABRT'ing the process. First invocation from Terminal prompts for permission as expected (no Developer ID required). (#108)
  • Versioned Homebrew formulae — pinned nightly formulae afm-next@<full-version>.rb generated alongside the rolling afm-next.rb so users can brew install a specific nightly build. (#102)
  • TopPSampler 1D-logits crash fix — no longer crashes when concurrent batching meets top_p<1. (#100 / #101)

Changes since last build (628c2bb)

  • Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (90fd1d3)
  • Update nightly release link to 20260418-9c3225e (a08f840)
  • Bump version baseline to 0.9.11 (post-v0.9.10 stable) (9b3f3bf)
  • Fix macOS 26 Speech Recognition SIGABRT: embed Info.plist via linker (b39cd60)
  • Merge PR #107 (b7ecdbd, e139985) — speech transcription
  • Speech transcription hardening (8c10027, 4f19247, 34406fd, 8b3a3f0, 7e18c90)
  • Vision OCR + webui bypass (a3b60a5, 9039eb5, 38b16c5, 0292f3c, a1dda2b, 5159ea2, b1d0f9c, 7ab80b6)
  • feat: versioned Homebrew formulae for afm and afm-next (#102) (7cff3df)
  • fix: handle 1D logits in TopPSampler (#100) (#101) (36ee874)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

# Pinned to this exact nightly:
brew install scouzi1966/afm/afm-next@0.9.11-next.9c3225e.20260418

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm 0.9.10

08 Apr 03:34
628c2bb

Choose a tag to compare

afm 0.9.10

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Highlights

  • Gemma 4 support — text, vision-language, and MoE variants with tool calling
  • Gemma 4 concurrent batch mode — ~10x throughput via new BatchRotatingKVCache for sliding-window attention
  • Server-level --guided-json now actually constrains MLX requests (#97)
  • Concurrent / prefix-cache stability — resolved radix cache SIGTRAP on wrapped RotatingKVCache (#94), batched prefill lazy-graph overflow, and Metal buffer lifecycle issues under long runs (#88)
  • Performance — removed container.perform lock and actor serialization bottlenecks, raised SSE multiplex batch limit to 200, pipeline timing instrumentation via AFM_DEBUG=1
  • Request correlation IDs for end-to-end tracing across the server → scheduler → MLX path

Fixes since v0.9.9

  • --guided-json server flag now applied to every request (fixed in #97)
  • Gemma 4 batch mode, structured tool history, and Metal buffer lifecycle (#88)
  • Gemma 4 streaming + non-streaming tool call type coercion (array/object/int) (#87)
  • Radix cache SIGTRAP for wrapped RotatingKVCache (#94)
  • Root-cause batched prefill crash caused by MLX lazy graph overflow
  • Snapshot prefill state to prevent decode mutation corruption
  • BatchRotatingKVCache mask totalLen after circular buffer wrap
  • Flatten anyOf/oneOf nullable schemas for Jinja template safety
  • Structured output streaming regression
  • Queue requests instead of rejecting with server_busy
  • Homebrew libexec search path for metallib
  • Test harness: timestamped logs, format validation, Codex ARG_MAX, baseline tagging, spec extraction (#98)

Known issues

  • TopPSampler + concurrent mode crash: on this exact commit, requests with 0 < top_p < 1 hitting the BatchScheduler (concurrent mode, llama.cpp WebUI default) abort with [squeeze] axis 0 fatal error. Fix is on main (PR #101) and will ship in v0.9.11 or the next nightly. Workaround: use top_p=1.0 or temperature=0 in concurrent mode for now, or use the WebUI against the non-concurrent single-sequence server path.

Install / Upgrade via Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Pin to this version specifically:

brew install scouzi1966/afm/afm@0.9.10

Install via PyPI

pip install macafm==0.9.10

afm-next (20260408 · 628c2bb)

08 Apr 00:57
628c2bb

Choose a tag to compare

Pre-release

Nightly build from main branch.

  • Commit: 628c2bb
  • Date: 20260408
  • Version: 0.9.10-next.628c2bb.20260408

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (2b647b2)

  • fix: --guided-json server flag and per-test spec extraction (#97, #98) (628c2bb)
  • fix: generate-report.py reads from RESULTS_FILE env var, not hardcoded path (6b36714)
  • fix: timestamp all temp files to prevent overwrites between runs (f10f72b)
  • fix: codex per-test scoring — local outside function, unbound var (5142185)
  • fix: handle unset AFM_BIN with set -u (e48e0f5)
  • fix: mlx-model-test.sh defaults to local build over PATH (2f88460)
  • fix: smart analysis reporting — codex ARG_MAX, format validation, baseline tagging (f582ac5)
  • refactor: address PR #95 review — extract helper, improve readability (2c7d205)
  • fix: skip radix save for wrapped RotatingKVCache to prevent SIGTRAP (#94) (48f88a8)
  • fix: Gemma 4 batch mode, structured tool history, and Metal buffer lifecycle (#88) (d5573d1)
  • Add nightly test results for 2026-04-04 (4 models) (44c048e)
  • fix: Gemma 4 tool call type coercion (array/object/int) (2aa2abd)
  • fix: flatten anyOf/oneOf nullable schemas for Jinja template safety (ce66763)
  • Merge feature/gemma4-batch-kvcache: 10x throughput for Gemma 4 (bc9de80)
  • Fix structured output streaming regression (4afb1ff)
  • fix: address PR review — empty cache merge safety, slot wait cancellation (731ca38)
  • refactor: raise SSE multiplex batch limit to 200, extract as constant (55ea028)
  • fix: root-cause batched prefill crash — MLX lazy graph overflow (329b114)
  • refactor: improve updateConcat alloc pattern, keep individual prefill (3d5f9e1)
  • feat: add request correlation ID for end-to-end tracing (ea12c89)
  • perf: add pipeline timing instrumentation (AFM_DEBUG=1) (65c093c)
  • perf: remove container.perform lock and actor serialization bottlenecks (a289c42)
  • fix: snapshot prefill state to prevent decode mutation corruption (6e16a21)
  • fix: BatchRotatingKVCache mask totalLen after circular buffer wrap (f650d65)
  • fix: updateConcat alloc size, debug prefillBatch B>=3 crash (38159f0)
  • debug: add SDPA shape logging and BatchRotatingKV tracing (22461ac)
  • Bypass adaptive XML for Gemma 4 tool calls (45910d7)
  • feat: BatchRotatingKVCache — Gemma 4 concurrent batch mode working (4006a11)
  • WIP: BatchRotatingKVCache — B=1 works, B>=2 segfaults in SDPA (c04eafc)
  • WIP: BatchRotatingKVCache for Gemma 4 batch mode (0f94cca)
  • refactor: extract magic numbers to named constants, add coding rule (4e9edf3)
  • fix: queue requests instead of rejecting with server_busy (a73b820)
  • fix: patch pin check matched wrong line (swift-docc-plugin) (16cf3da)
  • Fix Gemma 4 handling and consolidate repo skills (f0bd9c7)
  • chore: bump wheel version to 0.9.10.dev20260403, add test reports (7c38a5a)
  • Update nightly release link to 20260403-2b647b2 (9098e31)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly