localai 4.4.0 by BrewTestBot · Pull Request #287347 · Homebrew/homebrew-core

BrewTestBot · 2026-06-10T21:17:03Z

Created by brew bump

Created with brew bump-formula-pr.

Details

release notes

# 🎉 LocalAI 4.4.0 Release! 🚀

LocalAI 4.4.0 is out!

This is a big, multimodal-and-distributed release. Two brand-new audio backends land - parakeet.cpp (NVIDIA NeMo Parakeet ASR) and CrispASR (a multi-architecture ASR and TTS engine) - alongside native object detection + segmentation (rfdetr-cpp), video understanding in llama-cpp, and LTX-2 video generation in stablediffusion-ggml. Distributed mode grows up: prefix-cache-aware routing is on by default, and file transfers become resumable. There's a new intelligent middleware layer for request routing, PII filtering and cloud-model proxying, a security hardening pass that closes a credential-leak class across every outbound HTTP client, an interactive local-ai chat CLI, RAG source citations for agents, and a long run of reasoning / tool-call streaming fixes.

📌 TL;DR

Area	Summary
🎙️ Two new ASR backends	`parakeet-cpp` (NeMo FastConformer TDT/CTC/RNNT, streaming, word/segment timestamps) and `crispasr` (many ASR architectures + TTS in one binary).
🧭 Intelligent Middleware	Capability-based model routing, PII detection/redaction, cloud-model proxies + a MITM proxy for subscription-auth Claude Code / Codex.
🛰️ Distributed v4	Prefix-cache-aware routing (on by default), NATS JWT auth + TLS/mTLS, worker registration-token enforcement, resumable HTTP file transfers, boot-time model prefetch, ds4 layer-split inference.
🎥 Video, both ways	Video input (understanding) in `llama-cpp` via mtmd, and video generation via LTX-2 in `stablediffusion-ggml`.
👁️ Detection + Segmentation	New native `rfdetr-cpp` backend (RF-DETR), 32 prebuilt GGUFs, bbox + per-detection PNG masks.
🔐 Outbound HTTP hardening	`pkg/httpclient` refuses cross-host credential-leaking redirects across every outbound client (GHSA-3mj3-57v2-4636).
🗣️ TTS per-request control	`instructions` + a generic `params` map plumbed end to end (Qwen3-TTS VoiceDesign / CustomVoice, Chatterbox).
💻 `local-ai chat`	Interactive terminal chat against a running server, with `/models`, `/model`, `/clear`.
📚 RAG citations	Agent answers now append a clickable `Sources:` block from the Knowledge Base.
🧠 Models	Gemma 4 QAT family + QAT-matched MTP speculative-decoding bundles, Ideogram4, LTX-2.3 22B GGUFs.

🚀 New Features & Major Enhancements

🎙️ Audio Gets Serious: Two New ASR Backends

This release doubles down on speech-to-text with two independent, cgo-less Go backends (purego, CGO_ENABLED=0), each shipping a full CI matrix, gallery importer and docs.

parakeet-cpp - NVIDIA NeMo Parakeet (#10084). Wraps parakeet.cpp, a C++/ggml port of NeMo Parakeet (FastConformer TDT/CTC/RNNT/hybrid) that matches the upstream PyTorch models on CPU. Text transcription, OpenAI-compatible word timestamps, and cache-aware streaming (16 kHz PCM chunks, <EOU>/<EOB> utterance boundaries). GGUFs for all 10 Parakeet models × 5 quants ship in mudler/parakeet-cpp-gguf. Follow-ups in this cycle made it production-grade:

Dynamic batching (#10112) - concurrent transcription requests are batched for throughput.
Real, NeMo-faithful segment timestamps (#10207) - words are grouped into segments exactly like NeMo's get_segment_offsets (sentence-punctuation boundaries by default, opt-in segment_gap_threshold silence splitting in encoder frames). Streaming FinalResult segments now carry start/end when the library exposes the ABI v4 JSON entry points.
nemotron-3.5-asr multilingual streaming (#10199) + per-request language selection.

crispasr - many architectures + TTS in one backend (#10099). Wraps CrispASR (a whisper.cpp/ggml fork, MIT) through its session C-ABI. One backend serves ASR or TTS depending on the loaded model, with the architecture auto-detected from the GGUF (or forced via backend:). The gallery gains 36 -crispasr entries (32 ASR + 4 TTS):

ASR (e2e-verified across Whisper / Parakeet / Moonshine): parakeet, canary, cohere, qwen3, voxtral, granite, fastconformer-ctc, wav2vec2, hubert, data2vec, glm-asr, kyutai-stt, firered-asr, moonshine, mimo-asr, and more.
TTS (all four e2e-verified to valid 24 kHz mono WAV): vibevoice, chatterbox, qwen3-tts CustomVoice, orpheus - via backend: / codec: / speaker: / voice: model options.

🧭 Intelligent Middleware: Routing, PII Filtering & Cloud Proxies

A new middleware layer (#9802) analyzes, routes, filters and transforms chat requests before they hit a model.

Capability-based routing. Requests are classified (e.g. via an ArchRouter-style model) and scored across the capabilities they may require, then routed to the smallest model that satisfies them - easy requests go to small specialized models, hard or uncertain ones to larger general-purpose models. Classified embeddings are reused via cosine similarity so similar requests skip re-classification.
PII filtering. Private information is detected per-pattern and can be redacted, rerouted, or blocked, with a streaming PII filter that preserves a buffered-emit invariant on /v1/chat/completions, Anthropic /v1/messages, and /v1/completions. A per-model PII pattern editor lives in the model config UI.
Cloud model proxies + MITM. Cloud models and a MITM proxy can take part in routing/filtering - send easy requests to local models and hard ones to the cloud, and use Claude Code / Codex subscriptions (OAuth) through the PII filter via the MITM proxy (subject to provider ToS). Emits proxy_connect + proxy_traffic audit events and restores its listener from runtime_settings.json on restart.

Usage stats are recorded end to end and surfaced in REST, the UI, and MCP. Outbound clients used by this path were also the trigger for the security pass below.

🛰️ Distributed Mode v4

Distributed mode keeps maturing across routing, security and resilience.

Prefix-cache-aware routing, on by default (#10071). Routing now biases toward the replica that already holds the relevant KV/prefix cache, as a load-guarded hint that never routes worse than today's round-robin. A generic prefix tree (pkg/radixtree) maps cumulative prompt-prefix hashes to nodes; core/services/nodes/prefixcache turns the rendered prompt into a deterministic xxhash chain and makes a filter-then-score decision (narrow to load-eligible replicas, then prefer the longest-prefix match), feeding a preferredNodeID into the existing atomic SELECT ... FOR UPDATE pick. Observations sync across frontends over NATS. Round-robin is the floor; disable with --distributed-prefix-cache=false.

NATS JWT auth + TLS/mTLS (#10159). Previously anyone with access to the NATS port could publish backend-install messages or agent jobs (an SSRF / accidental-exposure risk). This adds JWT authentication and TLS/mTLS options, with workers acquiring and auto-refreshing their NATS credentials. Complemented by worker file-transfer registration-token enforcement (#10183).

Resumable file transfers (#10109). Large model GGUFs over flaky/throttled links no longer restart from byte 0. The worker's PUT /v1/files/<key> honors Content-Range (308/416 resume semantics, X-Content-SHA256 binding, final-hash verification) and the master-side stager HEAD-probes for the last accepted offset and resumes, switching to an outer time budget (LOCALAI_FILE_TRANSFER_BUDGET, default 1h) with exponential backoff.

ds4 layer-split distributed inference (#10098). Manual layer-split inference for the ds4 backend: a coordinator owns layers 0:K and listens; workers dial in and own higher ranges, each loading only its slice of the GGUF (a new dependency-free ds4-worker binary, driven via local-ai worker ds4-distributed). Fully back-compatible when ds4_role is absent.

Operational glue. Boot-time gallery prefetch via LOCALAI_PREFETCH_MODELS (#10108); a gated X-LocalAI-Node response header for attribution (#9976); plus fixes: self-heal stale "model not loaded" routing (#10181), stage directory-based models to remote nodes (#10175), in-flight tracking for non-LLM methods - VAD, diarize, voice (#10238), reconciler survives frontend restarts (#9981), cross-replica OpCache sync (#9983), and the reinstall/upgrade UI no longer sticks on "reinstalling" (#10214).

🎥 Video, Both Directions

Video input / understanding in llama-cpp (#10216). Video-capable multimodal models (e.g. SmolVLM2-Video) can now be sent a video in a chat request, mirroring the existing image and audio paths. Tracks the upstream mtmd video landing (ggml-org/llama.cpp#24269); grpc-server.cpp forwards request->videos() into the mtmd files vector on both the template and non-template paths, and the React chat UI accepts video/*, renders an inline <video controls> player, and emits video_url content parts. allow_video is auto-gated by whether the loaded mmproj supports it. ffmpeg/ffprobe (already in the runtime image) extract frames.

Video generation via LTX-2 (#9980). stablediffusion-ggml wires audio_vae_path and embeddings_connectors_path through to the upstream LTX-2 fields, with a new gallery/ltx-ggml.yaml template (T2V / I2V / FLF2V recipes) and six LTX-2.3 22B GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0), each bundling the text encoder + video VAE + audio VAE + embeddings connectors. Follow-up fixes wired the diffusion_model flag and vae_decode_only:false for the i2v/flf2v paths (#9986, #9987) and muxed LTX-2 audio into the output MP4 (#9990).

👁️ Native Object Detection + Segmentation: `rfdetr-cpp`

A new Go native gRPC backend (#10028) dlopens librfdetr.so (built from mudler/rf-detr.cpp) and exposes the RF-DETR pipeline through LocalAI's Detect RPC. Supports all 5 detection variants (Nano…Large) and 3 segmentation variants (SegNano/SegSmall/SegMedium) at F32/F16/Q8_0/Q4_K, with 32 prebuilt GGUFs on HuggingFace. Detection returns bbox + class_name + confidence; segmentation adds per-detection PNG-encoded masks. Matches PyTorch on CPU (sub-pixel bbox match, mask IoU 0.99+), with an HF gallery importer that auto-routes GGUF repos to the native backend.

🔗 PR: #10028. Also new: Ideogram4 support in stablediffusion-ggml (#10201).

🗣️ TTS: Per-Request Instructions & Params

The OpenAI-compatible /v1/audio/speech instructions field was silently dropped at the HTTP→gRPC boundary, so style/voice could only come from static YAML. PR #10172 plumbs a generic per-request instructions string and an optional backend-specific params map end to end (proto, schema, core/backend/tts.go), unlocking per-line emotion/style (Qwen3-TTS CustomVoice, Chatterbox) and describe-a-voice (Qwen3-TTS VoiceDesign) from a single model config. Fully backward compatible - empty instructions falls back to YAML.

curl http://localhost:8080/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "qwen-tts-design",
  "input": "Hello world, this is a test.",
  "instructions": "A calm, low-pitched elderly storyteller with a warm tone."
}'

Also: Qwen3-TTS request-language normalization for flexible matching (#10174), and LocalVQE v1.3 with input/output spectrogram views in the Audio Transform UI (#10113).

🧠 Reasoning & Tool-Call Streaming Hardening

A focused run of correctness fixes for reasoning models and streaming tool calls:

reasoning_effort honored per request and forwarded to the backend so jinja models can act on it (#10082, #10184).
<think> parsing: stop <think> leaking into content in pure-content mode (#9991), stop a prefilled <think> from swallowing tag-less answers (#10225), and don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208).
Streaming + tools: stop tool-call double-emission when the autoparser is active (#10055), stop tool-call JSON leaking into content on tokenizer-template models (#10057), validate auto-detected XML tool-call names with a robust glm-4.5/Hermes guard (#10059), and stop healing-marker stubs / prefill-misclassified content from corrupting the stream (#9999, #10000).

💻 `local-ai chat` + 📚 RAG Citations + 🛰️ Realtime

Interactive CLI chat (#10226). A new opt-in local-ai chat command connects to a running server over the OpenAI-compatible API, streams completions, and supports /models, /model <name>, /clear, /exit. Keeps local-ai run focused on the server lifecycle. (Fixes #1535.)
RAG source citations (#10228). When an agent answers from the Knowledge Base, the response now appends a clickable Sources: block listing the original documents - deduplicated per source, with the citation-free version saved to long-term memory. (Closes #9331.)
Configurable WebRTC ICE candidates (#10231). New LOCALAI_WEBRTC_NAT_1TO1_IPS / LOCALAI_WEBRTC_ICE_INTERFACES knobs fix /v1/realtime calls dropping a few seconds in under Docker host networking (unroutable docker0/veth candidates).
"Fits in my GPU" filter (#10017) on the Install Models page, plus a single shared /api/operations poller across UI consumers (#10029) and a React bundle code-split (#10042).

🧩 Backend Capability Registration & Startup Speed

Backend capability registration fixes so the right backend is picked for the right job: register 5 backends missing from BackendCapabilities (#10107), and add face/speaker-recognition constants registering insightface + speaker-recognition (#10110).
Faster startup (#10213): skip vocab arrays and mmap GGUF headers during config parsing.

Click for the full changelog below!

What's Changed

Bug fixes :bug:

fix(config): register 5 backends missing from BackendCapabilities by @Dennisadira in fix(config): register 5 backends missing from BackendCapabilities mudler/LocalAI#10107
fix(config): register parakeet-cpp as a transcript backend (#9718) by @Dennisadira in fix(config): register parakeet-cpp as a transcript backend (#9718) mudler/LocalAI#10106
fix(parakeet-cpp): cublas/hipblas/vulkan builds were silently CPU-only by @localai-bot in fix(parakeet-cpp): cublas/hipblas/vulkan builds were silently CPU-only mudler/LocalAI#10120
fix(nemo): pin texterrors to 1.1.6 for GLIBCXX compatibility by @fqscfqj in fix(nemo): pin texterrors to 1.1.6 for GLIBCXX compatibility mudler/LocalAI#10134
fix(parakeet-cpp): convert audio before the non-batched transcribe path by @localai-bot in fix(parakeet-cpp): convert audio before the non-batched transcribe path mudler/LocalAI#10161
fix(distributed): stage directory-based models to remote nodes by @localai-bot in fix(distributed): stage directory-based models to remote nodes mudler/LocalAI#10175
fix(config): add face/speaker recognition constants and register insightface + speaker-recognition by @Dennisadira in fix(config): add face/speaker recognition constants and register insightface + speaker-recognition mudler/LocalAI#10110
fix(distributed): self-heal stale 'model not loaded' routing by @localai-bot in fix(distributed): self-heal stale 'model not loaded' routing mudler/LocalAI#10181
fix(docs): use relearn notice shortcode instead of unsupported alert by @localai-bot in fix(docs): use relearn notice shortcode instead of unsupported alert mudler/LocalAI#10206
fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs by @localai-bot in fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs mudler/LocalAI#10208
fix(config): skip vocab arrays and mmap GGUF headers to speed up startup by @Dennisadira in fix(config): skip vocab arrays and mmap GGUF headers to speed up startup mudler/LocalAI#10213
fix: distributed backend reinstall/upgrade UI stuck on 'reinstalling' by @localai-bot in fix: distributed backend reinstall/upgrade UI stuck on 'reinstalling' mudler/LocalAI#10214
fix(reasoning): stop prefilled from swallowing tag-less answers by @localai-bot in fix(reasoning): stop prefilled <think> from swallowing tag-less answers mudler/LocalAI#10225
fix(cli): handle chat output errors by @Oceankj in fix(cli): handle chat output errors mudler/LocalAI#10229
fix(distributed): track in-flight for non-LLM inference methods (VAD, diarize, voice, ...) by @localai-bot in fix(distributed): track in-flight for non-LLM inference methods (VAD, diarize, voice, ...) mudler/LocalAI#10238

Exciting New Features 🎉

feat: prefix-cache-aware routing for distributed mode by @localai-bot in feat: prefix-cache-aware routing for distributed mode mudler/LocalAI#10071
feat(ds4): layer-split distributed inference by @localai-bot in feat(ds4): layer-split distributed inference mudler/LocalAI#10098
feat(crispasr): add CrispASR backend — multi-architecture ASR + TTS by @localai-bot in feat(crispasr): add CrispASR backend — multi-architecture ASR + TTS mudler/LocalAI#10099
feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch by @localai-bot in feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch mudler/LocalAI#10108
feat(distributed): resumable file uploads via HTTP Content-Range by @localai-bot in feat(distributed): resumable file uploads via HTTP Content-Range mudler/LocalAI#10109
feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI by @richiejp in feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI mudler/LocalAI#10113
feat(parakeet-cpp): dynamic batching for concurrent transcription requests by @localai-bot in feat(parakeet-cpp): dynamic batching for concurrent transcription requests mudler/LocalAI#10112
feat(distributed): Add NATS JWT authentication and TLS/mTLS options by @richiejp in feat(distributed): Add NATS JWT authentication and TLS/mTLS options mudler/LocalAI#10159
feat(tts): support per-request instructions and params by @localai-bot in feat(tts): support per-request instructions and params mudler/LocalAI#10172
feat(qwen3-tts-cpp): normalize request language for flexible matching by @localai-bot in feat(qwen3-tts-cpp): normalize request language for flexible matching mudler/LocalAI#10174
feat(distributed): enforce registration token for worker file transfer by @richiejp in feat(distributed): enforce registration token for worker file transfer mudler/LocalAI#10183
feat: forward reasoning_effort to the backend so jinja models honor it by @localai-bot in feat: forward reasoning_effort to the backend so jinja models honor it mudler/LocalAI#10184
Harden gallery-agent Hugging Face fetches against transient rate limiting by @Copilot in Harden gallery-agent Hugging Face fetches against transient rate limiting mudler/LocalAI#10187
feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support by @localai-bot in feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support mudler/LocalAI#10199
feat: support Ideogram4 in stablediffusion-ggml backend + gallery by @localai-bot in feat: support Ideogram4 in stablediffusion-ggml backend + gallery mudler/LocalAI#10201
feat(parakeet-cpp): real segment timestamps (NeMo-faithful) by @localai-bot in feat(parakeet-cpp): real segment timestamps (NeMo-faithful) mudler/LocalAI#10207
feat(llama-cpp): video input support (mtmd #24269) by @localai-bot in feat(llama-cpp): video input support (mtmd #24269) mudler/LocalAI#10216
feat(agents): surface KB source citations in RAG responses by @petechentw in feat(agents): surface KB source citations in RAG responses mudler/LocalAI#10228
feat(cli): add interactive chat mode by @Oceankj in feat(cli): add interactive chat mode mudler/LocalAI#10226
feat(realtime): make WebRTC ICE candidates configurable by @localai-bot in feat(realtime): make WebRTC ICE candidates configurable mudler/LocalAI#10231

🧠 Models

chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in chore(model gallery): 🤖 add 1 new models via gallery agent mudler/LocalAI#10163
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in chore(model gallery): 🤖 add 1 new models via gallery agent mudler/LocalAI#10200
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in chore(model gallery): 🤖 add 1 new models via gallery agent mudler/LocalAI#10209
feat(gallery): add Gemma 4 QAT family + MTP speculative-decoding pairs by @localai-bot in feat(gallery): add Gemma 4 QAT family + MTP speculative-decoding pairs mudler/LocalAI#10215

📖 Documentation and examples

docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in docs: ⬆️ update docs version mudler/LocalAI mudler/LocalAI#10091
docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in docs: ⬆️ update docs version mudler/LocalAI mudler/LocalAI#10114
docs: fix documentation typos by @Zhao73 in docs: fix documentation typos mudler/LocalAI#10125
docs(llama.cpp): note tensor split now works with quantized KV cache by @mudler in docs(llama.cpp): note tensor split now works with quantized KV cache mudler/LocalAI#10135
docs: position LocalAI as a composable engine, not a bundle by @localai-bot in docs: position LocalAI as a composable engine, not a bundle mudler/LocalAI#10136
docs: architecture & feature diagrams (blueprint style) by @localai-bot in docs: architecture & feature diagrams (blueprint style) mudler/LocalAI#10137
docs: fix distributed-mode diagram (workers use NATS, not PostgreSQL) by @localai-bot in docs: fix distributed-mode diagram (workers use NATS, not PostgreSQL) mudler/LocalAI#10138

👒 Dependencies

chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 3f40e73c367ad9f0c1b1819f28c7348c26aa340d by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3f40e73c367ad9f0c1b1819f28c7348c26aa340d mudler/LocalAI#10097
chore: :arrow_up: Update antirez/ds4 to ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdc by @localai-bot in chore: ⬆️ Update antirez/ds4 to ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdc mudler/LocalAI#10095
chore: :arrow_up: Update leejet/stable-diffusion.cpp to d2797b86670622b6538123b4aeb5fbb6be2653c5 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to d2797b86670622b6538123b4aeb5fbb6be2653c5 mudler/LocalAI#10094
chore: :arrow_up: Update ggml-org/llama.cpp to d6588daa800058dfa54f1d7ea695b1a810c8ae18 by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to d6588daa800058dfa54f1d7ea695b1a810c8ae18 mudler/LocalAI#10093
chore: :arrow_up: Update mudler/parakeet.cpp to cb45f68068081af01e7092e91b038ee353eb56be by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to cb45f68068081af01e7092e91b038ee353eb56be mudler/LocalAI#10116
chore: :arrow_up: Update ggml-org/whisper.cpp to fe69461618ffc50ba8afa65c25cc6c6e34d4537f by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to fe69461618ffc50ba8afa65c25cc6c6e34d4537f mudler/LocalAI#10117
chore: :arrow_up: Update leejet/stable-diffusion.cpp to be65ac7511b30379b003626c15224798929e33d4 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to be65ac7511b30379b003626c15224798929e33d4 mudler/LocalAI#10118
chore: :arrow_up: Update ggml-org/llama.cpp to 399739d5c5978351f39e3454bfbfbab4f369088f by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 399739d5c5978351f39e3454bfbfbab4f369088f mudler/LocalAI#10119
chore(model-gallery): :arrow_up: update checksum by @localai-bot in chore(model-gallery): ⬆️ update checksum mudler/LocalAI#10131
chore: :arrow_up: Update ggml-org/whisper.cpp to 23ee03506a91ac3d3f0071b40e66a430eebdfa1d by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to 23ee03506a91ac3d3f0071b40e66a430eebdfa1d mudler/LocalAI#10130
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 7948df8ac1070f5f6881b8d34675821893eb97d6 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to 7948df8ac1070f5f6881b8d34675821893eb97d6 mudler/LocalAI#10127
chore: :arrow_up: Update mudler/parakeet.cpp to 8a7c48209d7882a7ce79a6b306270e4703194543 by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to 8a7c48209d7882a7ce79a6b306270e4703194543 mudler/LocalAI#10129
chore: :arrow_up: Update ggml-org/llama.cpp to 5dcb71166686799f0d873eab7386234302d05ecf by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 5dcb71166686799f0d873eab7386234302d05ecf mudler/LocalAI#10128
chore: :arrow_up: Update CrispStrobe/CrispASR to 05e60432bcb5bc2113f8c395a41e86497c11504a by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to 05e60432bcb5bc2113f8c395a41e86497c11504a mudler/LocalAI#10115
chore(deps): bump github.com/mudler/edgevpn from 0.32.2 to 0.34.0 by @dependabot[bot] in chore(deps): bump github.com/mudler/edgevpn from 0.32.2 to 0.34.0 mudler/LocalAI#10153
chore: :arrow_up: Update mudler/parakeet.cpp to 9edf17c3ada66e0f881dcff155492867db7ac4cf by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to 9edf17c3ada66e0f881dcff155492867db7ac4cf mudler/LocalAI#10141
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.65.0 to 0.66.0 by @dependabot[bot] in chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.65.0 to 0.66.0 mudler/LocalAI#10151
chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm by @dependabot[bot] in chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm mudler/LocalAI#10157
chore(deps): bump github.com/google/go-containerregistry from 0.21.5 to 0.21.6 by @dependabot[bot] in chore(deps): bump github.com/google/go-containerregistry from 0.21.5 to 0.21.6 mudler/LocalAI#10149
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to 2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5 mudler/LocalAI#10144
chore(deps): bump securego/gosec from 2.22.9 to 2.27.1 by @dependabot[bot] in chore(deps): bump securego/gosec from 2.22.9 to 2.27.1 mudler/LocalAI#10147
chore: :arrow_up: Update ggml-org/whisper.cpp to 610e664ba7cfe3af46125ed1b5a1184fccb51bcd by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to 610e664ba7cfe3af46125ed1b5a1184fccb51bcd mudler/LocalAI#10140
chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/transformers by @dependabot[bot] in chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/transformers mudler/LocalAI#10158
chore: :arrow_up: Update ggml-org/llama.cpp to 5c394fdc8b564eff6faacc50a139529d875f0e36 by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 5c394fdc8b564eff6faacc50a139529d875f0e36 mudler/LocalAI#10143
chore: :arrow_up: Update antirez/ds4 to 477c0e82e2699b35a65fd0a1ed6fe66b41087dfe by @localai-bot in chore: ⬆️ Update antirez/ds4 to 477c0e82e2699b35a65fd0a1ed6fe66b41087dfe mudler/LocalAI#10142
chore(model-gallery): :arrow_up: update checksum by @localai-bot in chore(model-gallery): ⬆️ update checksum mudler/LocalAI#10169
chore: :arrow_up: Update ggml-org/llama.cpp to 94a220cd6745e6e3f8de62870b66fd5b9bc92700 by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 94a220cd6745e6e3f8de62870b66fd5b9bc92700 mudler/LocalAI#10168
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 1f9ee88e09c258053fa59d5e05e23dfb10fa0b13 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to 1f9ee88e09c258053fa59d5e05e23dfb10fa0b13 mudler/LocalAI#10166
chore: :arrow_up: Update CrispStrobe/CrispASR to 13d54e110e1538e0f0bc3af0680b9ab246cfb48d by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to 13d54e110e1538e0f0bc3af0680b9ab246cfb48d mudler/LocalAI#10145
chore: :arrow_up: Update predict-woo/qwen3-tts.cpp to 136e5d36c17083da0321fd96512dc7b263f94a44 by @localai-bot in chore: ⬆️ Update predict-woo/qwen3-tts.cpp to 136e5d36c17083da0321fd96512dc7b263f94a44 mudler/LocalAI#10165
chore: :arrow_up: Update mudler/parakeet.cpp to b11fe5bca78ad8b342dd559a43d76df3984bb447 by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to b11fe5bca78ad8b342dd559a43d76df3984bb447 mudler/LocalAI#10167
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 1520eda980564241434b791ce2bbbd128c4be9ea by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to 1520eda980564241434b791ce2bbbd128c4be9ea mudler/LocalAI#10180
chore: :arrow_up: Update ggml-org/llama.cpp to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae mudler/LocalAI#10179
chore: :arrow_up: Update ggml-org/whisper.cpp to 99613cb720b65036237d44b52f753b51f75c2797 by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to 99613cb720b65036237d44b52f753b51f75c2797 mudler/LocalAI#10178
chore: :arrow_up: Update vllm-project/vllm cu130 wheel to 0.22.1 by @localai-bot in chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.22.1 mudler/LocalAI#10188
chore: bump LocalAGI + localrecall (fix pgvector hybrid search seqscan, #10186) by @localai-bot in chore: bump LocalAGI + localrecall (fix pgvector hybrid search seqscan, #10186) mudler/LocalAI#10192
chore: :arrow_up: Update mudler/parakeet.cpp to 843600590f96a31467a5199f827c253f34c110f7 by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to 843600590f96a31467a5199f827c253f34c110f7 mudler/LocalAI#10198
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 6b9de3dbaa21ae95ea80638e5ee836795cc48c93 by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to 6b9de3dbaa21ae95ea80638e5ee836795cc48c93 mudler/LocalAI#10190
chore: :arrow_up: Update mudler/parakeet.cpp to abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67 by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67 mudler/LocalAI#10204
chore: :arrow_up: Update ggml-org/whisper.cpp to a8ec021f2750a473ff4a8f3883bc9fdf5feafa84 by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to a8ec021f2750a473ff4a8f3883bc9fdf5feafa84 mudler/LocalAI#10202
chore(turboquant): bump to 7d9715f1 + fix compilation against rebased fork by @localai-bot in chore(turboquant): bump to 7d9715f1 + fix compilation against rebased fork mudler/LocalAI#10205
chore: :arrow_up: Update ggml-org/llama.cpp to 31e82494c0a3913c919c1027fa70500fbf4c07dd by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 31e82494c0a3913c919c1027fa70500fbf4c07dd mudler/LocalAI#10191
chore: :arrow_up: Update mudler/parakeet.cpp to e270af73b94c9a5c37ec516230219ed4580e1db6 by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to e270af73b94c9a5c37ec516230219ed4580e1db6 mudler/LocalAI#10212
chore: :arrow_up: Update leejet/stable-diffusion.cpp to b3d56d0ba1bd437886079e339118e8e75bb79ee7 by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to b3d56d0ba1bd437886079e339118e8e75bb79ee7 mudler/LocalAI#10211
chore: :arrow_up: Update ggml-org/llama.cpp to 9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66 by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66 mudler/LocalAI#10210
chore: :arrow_up: Update antirez/ds4 to c463029c205c2ec8d7ab6c0df4a3f52979091286 by @localai-bot in chore: ⬆️ Update antirez/ds4 to c463029c205c2ec8d7ab6c0df4a3f52979091286 mudler/LocalAI#10189
chore: :arrow_up: Update CrispStrobe/CrispASR to f7838a306687f22c281d29c250f879a4ab3df2d7 by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to f7838a306687f22c281d29c250f879a4ab3df2d7 mudler/LocalAI#10177
chore: :arrow_up: Update antirez/ds4 to 512d07cb08f234b704b5a5959aa9e2d4c466eeb0 by @localai-bot in chore: ⬆️ Update antirez/ds4 to 512d07cb08f234b704b5a5959aa9e2d4c466eeb0 mudler/LocalAI#10224
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 2768b6251548b78b6610e95edad13f888ad95982 by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to 2768b6251548b78b6610e95edad13f888ad95982 mudler/LocalAI#10219
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 19bdfe22d255d5b4dff39d449318b9bc5ea2317f by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to 19bdfe22d255d5b4dff39d449318b9bc5ea2317f mudler/LocalAI#10222
chore: :arrow_up: Update CrispStrobe/CrispASR to 97cad527d247edefc904e6c40c4cf5ee78bed055 by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to 97cad527d247edefc904e6c40c4cf5ee78bed055 mudler/LocalAI#10221
chore: :arrow_up: Update ggml-org/whisper.cpp to df7638d8229a243af8a4b5a8ae557e0d74e0a0ae by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to df7638d8229a243af8a4b5a8ae557e0d74e0a0ae mudler/LocalAI#10220
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to e6f8112f3ba126eed3ff5b30cdd08085414a7516 by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to e6f8112f3ba126eed3ff5b30cdd08085414a7516 mudler/LocalAI#10233
chore: :arrow_up: Update antirez/ds4 to 91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7 by @localai-bot in chore: ⬆️ Update antirez/ds4 to 91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7 mudler/LocalAI#10234
chore: :arrow_up: Update ggml-org/llama.cpp to 039e20a2db9e87b2477c76cc04905f3e1acad77f by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to 039e20a2db9e87b2477c76cc04905f3e1acad77f mudler/LocalAI#10223
chore: :arrow_up: Update CrispStrobe/CrispASR to c29f6653a516a3001d923944dad8892072cc7334 by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to c29f6653a516a3001d923944dad8892072cc7334 mudler/LocalAI#10236

Other Changes

refactor(routing): extract replica picker into pkg/clusterrouting by @localai-bot in refactor(routing): extract replica picker into pkg/clusterrouting mudler/LocalAI#10123
test(react-ui): add page render-smoke specs, reset the coverage gate by @richiejp in test(react-ui): add page render-smoke specs, reset the coverage gate mudler/LocalAI#10122

🙌 New Contributors

@TLoE419 made their first contribution in test(utils): cover path verification, sanitization, and unique naming mudler/LocalAI#9978
@fqscfqj made their first contribution in fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models mudler/LocalAI#10012
@bozhouDev made their first contribution in fix(openai): stop streaming tool-call double-emission when autoparser is active mudler/LocalAI#10055
@Oceankj made their first contribution in test(react-ui): cover models gallery empty-state reset flow mudler/LocalAI#10019
@Zhao73 made their first contribution in docs: fix documentation typos mudler/LocalAI#10125
@petechentw made their first contribution in feat(agents): surface KB source citations in RAG responses mudler/LocalAI#10228

Enjoy!

Full Changelog: mudler/LocalAI@v4.3.0...v4.4.0

View the full release notes at https://github.com/mudler/LocalAI/releases/tag/v4.4.0.

github-actions · 2026-06-10T21:43:58Z

🤖 An automated task has requested bottles to be published to this PR.

Caution

Please do not push to this PR branch before the bottle commits have been pushed, as this results in a state that is difficult to recover from. If you need to resolve a merge conflict, please use a merge commit. Do not force-push to this PR branch.

localai 4.4.0

5969d0a

github-actions Bot added go Go use is a significant feature of the PR or issue nodejs Node or npm use is a significant feature of the PR or issue bump-formula-pr PR was created using `brew bump-formula-pr` labels Jun 10, 2026

botantony approved these changes Jun 10, 2026

View reviewed changes

localai: update 4.4.0 bottle.

e6a1595

github-actions Bot added the CI-published-bottle-commits The commits for the built bottles have been pushed to the PR branch. label Jun 10, 2026

github-actions Bot approved these changes Jun 10, 2026

View reviewed changes

BrewTestBot enabled auto-merge June 10, 2026 21:46

BrewTestBot added this pull request to the merge queue Jun 10, 2026

Merged via the queue into main with commit 8875902 Jun 10, 2026
22 checks passed

BrewTestBot deleted the bump-localai-4.4.0 branch June 10, 2026 21:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

localai 4.4.0#287347

localai 4.4.0#287347
BrewTestBot merged 2 commits into
mainfrom
bump-localai-4.4.0

BrewTestBot commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

BrewTestBot commented Jun 10, 2026

📌 TL;DR

🚀 New Features & Major Enhancements

🎙️ Audio Gets Serious: Two New ASR Backends

🧭 Intelligent Middleware: Routing, PII Filtering & Cloud Proxies

🛰️ Distributed Mode v4

🎥 Video, Both Directions

👁️ Native Object Detection + Segmentation: rfdetr-cpp

🗣️ TTS: Per-Request Instructions & Params

🧠 Reasoning & Tool-Call Streaming Hardening

💻 local-ai chat + 📚 RAG Citations + 🛰️ Realtime

🧩 Backend Capability Registration & Startup Speed

What's Changed

Bug fixes :bug:

Exciting New Features 🎉

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

🙌 New Contributors

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

👁️ Native Object Detection + Segmentation: `rfdetr-cpp`

💻 `local-ai chat` + 📚 RAG Citations + 🛰️ Realtime