Releases: scouzi1966/maclocal-api
afm v0.9.13
afm v0.9.13
OpenAI-compatible local LLM inference for Apple Silicon (MLX + Apple Foundation Models).
Highlights since v0.9.12 (73 commits)
New models
- cohere2_moe — Cohere North-Mini-Code (30B-A3B MoE). Correct across streaming, non-streaming, prefix-cache, and concurrency (#139).
⚡ Speculative decoding (quality-preserving)
--mtp— Qwen3.6 self-speculative decoding via the in-model MTP head → ~+52% decode.--eagle3 <drafter>— dense Gemma4-31B EAGLE3 drafter → ~+30% decode.- Both work streaming and non-streaming. Bit-exact to greedy on short generations, near-greedy on long ones.
APIs & agent-friendliness
/v1/embeddingson the main server (Apple NaturalLanguage) (#132, #133).- Mid-stream cancel +
/v1/tokenize//v1/count_tokens+/openapi.json&/docs(#126). - vLLM-namespaced
/metrics+ Grafana dashboard (#122). - Apple-native Vision OCR and Speech transcription HTTP APIs (thanks @jesserobbins).
Performance & platform
- Backported mlx-swift 0.31.3 adaptive-block SDPA → ~+10% decode @16k (pin stays 0.30.3).
- Eager
<think>-tag streaming + Metal-kernel prewarm. Swift 6 language mode migration.
Fixes
--no-think/ server-defaultenable_thinking=falsenow actually disables thinking on reasoning models (was a silent no-op).- MTP reject path retains the committed token in the KV/GDN cache (fixes garbled output on longer generations).
Known limitations
--no-think+ high--concurrentcan corrupt output (#140). Default behavior unaffected; use lower concurrency or omit--no-think.- MTP is bit-exact to greedy on short generations; longer ones stay greedy-quality but may differ token-for-token.
Install
brew tap scouzi1966/afm && brew install scouzi1966/afm/afm # or: brew upgrade afm
pip install macafmSHA256 (afm-v0.9.13-arm64.tar.gz): 443bf74650fece15f7ce02663f6d5dd14a7b638c937f80262e426903a6abf42b
afm-next (20260621 · 97e6683)
Nightly build from main branch.
- Commit: 97e6683
- Date: 20260621
- Version: 0.9.13-next.97e6683.20260621
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (a92a0fe)
- fix(test): repair EmbeddingsControllerTests against resolver-based init (
97e6683) - feat(model): add cohere2_moe (Cohere North-Mini-Code) with all-mode correctness (#139) (
4e46bc5) - docs(skill): nightly Step 4b also bumps the pinned pip wheel example in README (
d2437d8) - docs(README): bump pinned pip nightly example to dev20260614 (
2880dd4) - Update nightly release link to 20260614-a92a0fe (
22e4aff)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)pip
pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://maclocal-ai.pages.dev/afm/wheels/simple/ macafm-next # nightlyafm-next (20260614 · a92a0fe)
Nightly build from main branch.
- Commit: a92a0fe
- Date: 20260614
- Version: 0.9.13-next.a92a0fe.20260614
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (5aad36d)
- feat(server): serve /v1/embeddings on the main server + advertise it (#132, #133) (
895d20f) - docs(README): point pip wheel index at maclocal-ai.pages.dev (
ef94c4e) - docs(README): highlight lossless speculative decoding (MTP + EAGLE3) (
32ef421) - Update nightly release link to 20260613-5aad36d (
75f43a9)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightlyafm-next (20260613 · 5aad36d)
Nightly build from main branch.
- Commit: 5aad36d
- Date: 20260613
- Version: 0.9.13-next.5aad36d.20260613
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (4bd0ec62a2cc39e4d69463f5ff8f1c119a3759ed)
- Merge speculative-decoding (MTP/EAGLE3 + streaming + benchmarks) into main (
5aad36d) - docs: add "which flag for which model" decision section (
4300baf) - merge afm-opt: PR #134 build/metallib + whitespace review fixes (
0b3d120) - fix(build): address PR #134 review — metallib install guard, debug symlink, probe, whitespace (
248a8b5) - fix(spec-stream): address PR #135 review — think injection, error propagation, cancellation (
dea202c) - docs(bench): streaming spec-decode retest results (MTP + EAGLE3) (
689cb83) - feat(spec): streaming support for MTP and EAGLE3 fast paths (
e88011d) - docs: decode-optimization feature guide + release notes / social copy (
e0935fd) - perf(eagle3)+bench: lossless bs=2 fast path; afm vs mlx-vlm verify-fidelity (
f518a25) - bench(eagle3): afm vs mlx-vlm EAGLE3 head-to-head on dense Gemma4-31B (
f535e21) - bench(qwen36): 2026-06-06 MLX engine re-run (latest) + MTP head-to-head (
f5a080a) - feat(eagle3): P2/P3 — --eagle3 CLI + service routing, +22% decode on Gemma4-31B (
81bd262) - feat(eagle3): P1 greedy speculative loop — output identical to greedy AR (
7226736) - fix(gemma4): proportional RoPE on full-attention layers (was stock RoPE) (
5a81b9b) - docs: document /v1/embeddings API and list embed/speech in help card (#131) (
8f6a999) - feat(eagle3): P0 — Swift Gemma4Eagle3Drafter, bit-exact vs Python reference (
e3f082c) - feat(eagle3): P0 reference-capture for the dense Gemma4-31B EAGLE3 port (
4f8c9a4) - docs(eagle3): phased afm/Swift port plan for dense Gemma4-31B EAGLE3 (+25% validated) (
80ea9b5) - bench(gemma4): dense 31B flips it — EAGLE3 +25% (vs MoE all-negative) (
551dbb8) - bench(gemma4): spec-decode validation — all 3 methods SLOWER than AR on MoE (negative) (
09720b0) - docs(mtp): record the +52% win; note --mtp-depth now vestigial (
ade3ae2) - perf(mtp): rewrite loop after mlx-lm PR #990 — +52% decode vs AR (was +6%) (
9d86fea) - docs(mtp): record final implementation result (runnable, +6.5% at depth 1) (
b6dc626) - perf(mtp): depth-1 default beats AR (+6.5%); vectorized acceptance + instrumentation (
05e288f) - feat(mtp): P2 runnable in afm via --mtp — correct, perf WIP (
b628326) - feat(mtp): P2 — MTP self-speculative generator, output identical to greedy AR (
11da477) - feat(mtp): P1 — GatedDeltaNet cache rollback, bit-exact (the make-or-break gate) (
4474222) - chore(mtp): point P0 test + capture at the cache-root model location (
9eeaf77) - feat(mtp): P0 — Swift Qwen3_5MTPHead, bit-exact vs Python reference (
d2ab0af) - feat(mtp): P0 reference-capture harness for the Swift MTP head port (
ad86bfe) - docs(mtp): phased afm/Swift port plan for MTP self-speculative decoding (
b7042b8) - bench(ollama): use qwen3.6:27b-mlx (MLX tag), not the failing GGUF default (
999ccb4) - bench(qwen36): rerun full 7-engine cross-engine suite + refresh plots/results (
e72db8e) - fix(metallib): include random (RNG) kernel — was crashing sampled generation (
401484c) - wip(bench): SDPA backport report/plot + metallib RNG-guard fix; harness fixes (
9ef222f) - perf(mlx/sdpa): backport 0.31.3 adaptive-block 2-pass SDPA — decode@16k ~+10% (
7d180f8) - docs(deps): reconfirm mlx-swift 0.30.3 pin — 0.31.3 still has long-context SDPA regression (
ddc2c97) - perf(stream): eager think-tag emission — cut reasoning TTFT ~610ms -> ~346ms (
f1343a6) - perf(mlx): prewarm Metal kernels on server startup (faster cold first-token) (
33c247d) - test: Qwen3.6-27B local-engine performance benchmark (afm vs 6 engines) (
259b5f0) - fix(swift6): box command+error across the CFRunLoop Task (Swift 6.3.2) (
e569a75) - build: migrate to Swift 6 language mode (#130) (
1a6ffc1) - feat(build): add --install flag + verify binary paths (
7144459) - docs: prominent one-command build-from-source section in README (
ac3d303) - build: add root-level build.sh entry point for clone-and-build (
781cfb8) - fix(toolcall): prevent server crash on scalar JSON arg value (#128) (#129) (
c55d54c) - test: post-merge validation reports for ab90ac1 (proof for #127) (
cfb2676) - feat(agent): T1.4-T1.7 — cancel + tokenize + OpenAPI (rerun, stacked-merge correction) (#126) (
ab90ac1) - feat(metrics): vLLM-namespaced /metrics + Grafana dashboard (#122) (
a8dbffa) - feat(agent): Tier-0 promotion + Tier-1 quick wins (request id, stream usage, parallel_tool_calls) (#123) (
ed68189) - docs(claude): require proof before labeling test failures pre-existing (
46e2698) - Bump version to 0.9.13 for next dev cycle (
1cbd60f) - README: move Install section above the fold (
86429a8) - Release v0.9.12: promote nightly to stable (
7dfc4f6)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightlyafm 0.9.12
afm 0.9.12
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.11
- fix: resolve executable path via _NSGetExecutablePath, not argv[0] (
4bd0ec6) - Add nightly test results for 2026-05-02 (7af438e) (
4cbbe06) - Update nightly release link to 20260502-a589c50 (
7af438e) - feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119) (
a589c50) - Feature: Expand vision API with barcode, classify, and saliency modes (#114) (
b7ffe18) - Add on-device speech transcription and TTS (#113) (
34001ec) - fix: afm -w falls back to ephemeral port when 9999 is busy (#116) (
1e9f22c) - README: surface "What's new in afm-next" above Install (
779e89e) - skill(promote-nightly): validate on staging tap before touching production (
53e58fa) - README: remove staging tap from public docs (
87c105e) - README: document installing previous versions of afm (
3dae2dd) - Bump version to 0.9.12 for next dev cycle (
7a23874) - Release v0.9.11: promote nightly to stable (
94fdc35) - skill(promote-nightly): verify Apple framework links and bundle id in Step 5h (
2717cdd) - Credit @jesserobbins for Vision OCR and Speech transcription (
36e0194) - Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (
90fd1d3) - Update nightly release link to 20260418-9c3225e (
a08f840)
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.12
afm-next (20260502 · a589c50)
Nightly build from main branch.
- Commit: a589c50
- Date: 20260502
- Version: 0.9.12-next.a589c50.20260502
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (9c3225e)
- feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119) (
a589c50) - Feature: Expand vision API with barcode, classify, and saliency modes (#114) (
b7ffe18) - Add on-device speech transcription and TTS (#113) (
34001ec) - fix: afm -w falls back to ephemeral port when 9999 is busy (#116) (
1e9f22c) - README: surface "What's new in afm-next" above Install (
779e89e) - skill(promote-nightly): validate on staging tap before touching production (
53e58fa) - README: remove staging tap from public docs (
87c105e) - README: document installing previous versions of afm (
3dae2dd) - Bump version to 0.9.12 for next dev cycle (
7a23874) - Release v0.9.11: promote nightly to stable (
94fdc35) - skill(promote-nightly): verify Apple framework links and bundle id in Step 5h (
2717cdd) - Credit @jesserobbins for Vision OCR and Speech transcription (
36e0194) - Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (
90fd1d3) - Update nightly release link to 20260418-9c3225e (
a08f840)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightlyafm 0.9.11
afm 0.9.11
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.10
- Bump version baseline to 0.9.11 (post-v0.9.10 stable) (
9b3f3bf) - Fix macOS 26 Speech Recognition SIGABRT: embed Info.plist via linker (
b39cd60) - Resolve merge conflicts for PR #107 (
e139985) - Address second round of Vision OCR review feedback (
9039eb5) - Close taskRef/onCancel race window (
8c10027) - Fix recognition task cancellation leak and lock race (
4f19247) - Fix data race in speech recognition timeout (
34406fd) - Address speech transcription review feedback (
8b3a3f0) - Add on-device audio transcription via Apple Speech framework (
7e18c90) - Address Vision OCR review feedback (
38b16c5) - Fix Vision OCR in webui: bypass Foundation Model 4096 token limit (
0292f3c) - Address Vision OCR review feedback (
5159ea2) - Document Vision OCR API (
b1d0f9c) - Add Vision OCR API and stabilize tests (
7ab80b6) - Release v0.9.10: promote nightly to stable (
332c8c2) - feat: versioned Homebrew formulae for afm and afm-next (#102) (
7cff3df) - fix: handle 1D logits in TopPSampler (#100) (#101) (
36ee874) - chore: bump wheel version to 0.9.10.dev20260408 (
212d945) - Add nightly test results for 2026-04-07 (4f24281) (
272cc13) - Update nightly release link to 20260408-628c2bb (
4f24281)
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.11
afm-next (20260418 · 9c3225e)
Nightly build from main branch.
- Commit: 9c3225e
- Date: 20260418
- Version: 0.9.11-next.9c3225e.20260418
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
🙏 Acknowledgement
Huge thanks to first-time contributor @jesserobbins — this cycle landed two substantial features from him: the Apple Vision OCR HTTP API (#104) and Apple Speech transcription (#107). Both lift afm's Apple-native capabilities from CLI-only into first-class HTTP APIs compatible with the OpenAI-style surface that third-party clients already speak. Contributions of this size and quality from a new contributor are rare and appreciated.
Highlights
- Apple Vision OCR HTTP API —
POST /v1/vision/ocrfor files, multipart uploads, base64, data URLs, and OpenAI-style image content parts. Multi-page PDF support, structured document/page/block/table output, Foundation chat auto-OCR integration. Contributed by @jesserobbins (#104). - Apple Speech transcription — on-device audio transcription via the Speech framework. New
afm speech -f <file>CLI,POST /v1/audio/transcriptionsAPI, chatinput_audiocontent parts. Supports WAV/MP3/M4A/CAF/AIFF. Contributed by @jesserobbins (#107). - macOS 26 privacy fix — binary now embeds
NSSpeechRecognitionUsageDescriptionvia-sectcreate __TEXT __info_plist, so Speech Recognition actually works instead of SIGABRT'ing the process. First invocation from Terminal prompts for permission as expected (no Developer ID required). (#108) - Versioned Homebrew formulae — pinned nightly formulae
afm-next@<full-version>.rbgenerated alongside the rollingafm-next.rbso users canbrew installa specific nightly build. (#102) - TopPSampler 1D-logits crash fix — no longer crashes when concurrent batching meets
top_p<1. (#100 / #101)
Changes since last build (628c2bb)
- Fix publish-next.sh tap-staging: use full ${VERSION} not ${DATE} (
90fd1d3) - Update nightly release link to 20260418-9c3225e (
a08f840) - Bump version baseline to 0.9.11 (post-v0.9.10 stable) (
9b3f3bf) - Fix macOS 26 Speech Recognition SIGABRT: embed Info.plist via linker (
b39cd60) - Merge PR #107 (
b7ecdbd,e139985) — speech transcription - Speech transcription hardening (
8c10027,4f19247,34406fd,8b3a3f0,7e18c90) - Vision OCR + webui bypass (
a3b60a5,9039eb5,38b16c5,0292f3c,a1dda2b,5159ea2,b1d0f9c,7ab80b6) - feat: versioned Homebrew formulae for afm and afm-next (#102) (
7cff3df) - fix: handle 1D logits in TopPSampler (#100) (#101) (
36ee874)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
# Pinned to this exact nightly:
brew install scouzi1966/afm/afm-next@0.9.11-next.9c3225e.20260418pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightlyafm 0.9.10
afm 0.9.10
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Highlights
- Gemma 4 support — text, vision-language, and MoE variants with tool calling
- Gemma 4 concurrent batch mode — ~10x throughput via new
BatchRotatingKVCachefor sliding-window attention - Server-level
--guided-jsonnow actually constrains MLX requests (#97) - Concurrent / prefix-cache stability — resolved radix cache SIGTRAP on wrapped
RotatingKVCache(#94), batched prefill lazy-graph overflow, and Metal buffer lifecycle issues under long runs (#88) - Performance — removed
container.performlock and actor serialization bottlenecks, raised SSE multiplex batch limit to 200, pipeline timing instrumentation viaAFM_DEBUG=1 - Request correlation IDs for end-to-end tracing across the server → scheduler → MLX path
Fixes since v0.9.9
--guided-jsonserver flag now applied to every request (fixed in #97)- Gemma 4 batch mode, structured tool history, and Metal buffer lifecycle (#88)
- Gemma 4 streaming + non-streaming tool call type coercion (array/object/int) (#87)
- Radix cache SIGTRAP for wrapped
RotatingKVCache(#94) - Root-cause batched prefill crash caused by MLX lazy graph overflow
- Snapshot prefill state to prevent decode mutation corruption
BatchRotatingKVCachemasktotalLenafter circular buffer wrap- Flatten
anyOf/oneOfnullable schemas for Jinja template safety - Structured output streaming regression
- Queue requests instead of rejecting with
server_busy - Homebrew libexec search path for metallib
- Test harness: timestamped logs, format validation, Codex ARG_MAX, baseline tagging, spec extraction (#98)
Known issues
- TopPSampler + concurrent mode crash: on this exact commit, requests with
0 < top_p < 1hitting theBatchScheduler(concurrent mode, llama.cpp WebUI default) abort with[squeeze] axis 0fatal error. Fix is on main (PR #101) and will ship in v0.9.11 or the next nightly. Workaround: usetop_p=1.0ortemperature=0in concurrent mode for now, or use the WebUI against the non-concurrent single-sequence server path.
Install / Upgrade via Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Pin to this version specifically:
brew install scouzi1966/afm/afm@0.9.10
Install via PyPI
pip install macafm==0.9.10
afm-next (20260408 · 628c2bb)
Nightly build from main branch.
- Commit: 628c2bb
- Date: 20260408
- Version: 0.9.10-next.628c2bb.20260408
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (2b647b2)
- fix: --guided-json server flag and per-test spec extraction (#97, #98) (
628c2bb) - fix: generate-report.py reads from RESULTS_FILE env var, not hardcoded path (
6b36714) - fix: timestamp all temp files to prevent overwrites between runs (
f10f72b) - fix: codex per-test scoring — local outside function, unbound var (
5142185) - fix: handle unset AFM_BIN with set -u (
e48e0f5) - fix: mlx-model-test.sh defaults to local build over PATH (
2f88460) - fix: smart analysis reporting — codex ARG_MAX, format validation, baseline tagging (
f582ac5) - refactor: address PR #95 review — extract helper, improve readability (
2c7d205) - fix: skip radix save for wrapped RotatingKVCache to prevent SIGTRAP (#94) (
48f88a8) - fix: Gemma 4 batch mode, structured tool history, and Metal buffer lifecycle (#88) (
d5573d1) - Add nightly test results for 2026-04-04 (4 models) (
44c048e) - fix: Gemma 4 tool call type coercion (array/object/int) (
2aa2abd) - fix: flatten anyOf/oneOf nullable schemas for Jinja template safety (
ce66763) - Merge feature/gemma4-batch-kvcache: 10x throughput for Gemma 4 (
bc9de80) - Fix structured output streaming regression (
4afb1ff) - fix: address PR review — empty cache merge safety, slot wait cancellation (
731ca38) - refactor: raise SSE multiplex batch limit to 200, extract as constant (
55ea028) - fix: root-cause batched prefill crash — MLX lazy graph overflow (
329b114) - refactor: improve updateConcat alloc pattern, keep individual prefill (
3d5f9e1) - feat: add request correlation ID for end-to-end tracing (
ea12c89) - perf: add pipeline timing instrumentation (AFM_DEBUG=1) (
65c093c) - perf: remove container.perform lock and actor serialization bottlenecks (
a289c42) - fix: snapshot prefill state to prevent decode mutation corruption (
6e16a21) - fix: BatchRotatingKVCache mask totalLen after circular buffer wrap (
f650d65) - fix: updateConcat alloc size, debug prefillBatch B>=3 crash (
38159f0) - debug: add SDPA shape logging and BatchRotatingKV tracing (
22461ac) - Bypass adaptive XML for Gemma 4 tool calls (
45910d7) - feat: BatchRotatingKVCache — Gemma 4 concurrent batch mode working (
4006a11) - WIP: BatchRotatingKVCache — B=1 works, B>=2 segfaults in SDPA (
c04eafc) - WIP: BatchRotatingKVCache for Gemma 4 batch mode (
0f94cca) - refactor: extract magic numbers to named constants, add coding rule (
4e9edf3) - fix: queue requests instead of rejecting with server_busy (
a73b820) - fix: patch pin check matched wrong line (swift-docc-plugin) (
16cf3da) - Fix Gemma 4 handling and consolidate repo skills (
f0bd9c7) - chore: bump wheel version to 0.9.10.dev20260403, add test reports (
7c38a5a) - Update nightly release link to 20260403-2b647b2 (
9098e31)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-nextSwitching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly