localai 4.4.0#287347
Merged
Merged
Conversation
botantony
approved these changes
Jun 10, 2026
Contributor
|
🤖 An automated task has requested bottles to be published to this PR. Caution Please do not push to this PR branch before the bottle commits have been pushed, as this results in a state that is difficult to recover from. If you need to resolve a merge conflict, please use a merge commit. Do not force-push to this PR branch. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Created by
brew bumpCreated with
brew bump-formula-pr.Details
release notes
parakeet-cpp(NeMo FastConformer TDT/CTC/RNNT, streaming, word/segment timestamps) andcrispasr(many ASR architectures + TTS in one binary).llama-cppvia mtmd, and video generation via LTX-2 instablediffusion-ggml.rfdetr-cppbackend (RF-DETR), 32 prebuilt GGUFs, bbox + per-detection PNG masks.pkg/httpclientrefuses cross-host credential-leaking redirects across every outbound client (GHSA-3mj3-57v2-4636).instructions+ a genericparamsmap plumbed end to end (Qwen3-TTS VoiceDesign / CustomVoice, Chatterbox).local-ai chat/models,/model,/clear.Sources:block from the Knowledge Base.🚀 New Features & Major Enhancements
🎙️ Audio Gets Serious: Two New ASR Backends
This release doubles down on speech-to-text with two independent, cgo-less Go backends (purego,
CGO_ENABLED=0), each shipping a full CI matrix, gallery importer and docs.parakeet-cpp- NVIDIA NeMo Parakeet (#10084). Wraps parakeet.cpp, a C++/ggml port of NeMo Parakeet (FastConformer TDT/CTC/RNNT/hybrid) that matches the upstream PyTorch models on CPU. Text transcription, OpenAI-compatible word timestamps, and cache-aware streaming (16 kHz PCM chunks,<EOU>/<EOB>utterance boundaries). GGUFs for all 10 Parakeet models × 5 quants ship inmudler/parakeet-cpp-gguf. Follow-ups in this cycle made it production-grade:get_segment_offsets(sentence-punctuation boundaries by default, opt-insegment_gap_thresholdsilence splitting in encoder frames). StreamingFinalResultsegments now carrystart/endwhen the library exposes the ABI v4 JSON entry points.nemotron-3.5-asrmultilingual streaming (#10199) + per-request language selection.crispasr- many architectures + TTS in one backend (#10099). Wraps CrispASR (a whisper.cpp/ggml fork, MIT) through its session C-ABI. One backend serves ASR or TTS depending on the loaded model, with the architecture auto-detected from the GGUF (or forced viabackend:). The gallery gains 36-crispasrentries (32 ASR + 4 TTS):backend:/codec:/speaker:/voice:model options.🧭 Intelligent Middleware: Routing, PII Filtering & Cloud Proxies
A new middleware layer (#9802) analyzes, routes, filters and transforms chat requests before they hit a model.
/v1/chat/completions, Anthropic/v1/messages, and/v1/completions. A per-model PII pattern editor lives in the model config UI.proxy_connect+proxy_trafficaudit events and restores its listener fromruntime_settings.jsonon restart.Usage stats are recorded end to end and surfaced in REST, the UI, and MCP. Outbound clients used by this path were also the trigger for the security pass below.
🛰️ Distributed Mode v4
Distributed mode keeps maturing across routing, security and resilience.
Prefix-cache-aware routing, on by default (#10071). Routing now biases toward the replica that already holds the relevant KV/prefix cache, as a load-guarded hint that never routes worse than today's round-robin. A generic prefix tree (
pkg/radixtree) maps cumulative prompt-prefix hashes to nodes;core/services/nodes/prefixcacheturns the rendered prompt into a deterministic xxhash chain and makes a filter-then-score decision (narrow to load-eligible replicas, then prefer the longest-prefix match), feeding apreferredNodeIDinto the existing atomicSELECT ... FOR UPDATEpick. Observations sync across frontends over NATS. Round-robin is the floor; disable with--distributed-prefix-cache=false.NATS JWT auth + TLS/mTLS (#10159). Previously anyone with access to the NATS port could publish backend-install messages or agent jobs (an SSRF / accidental-exposure risk). This adds JWT authentication and TLS/mTLS options, with workers acquiring and auto-refreshing their NATS credentials. Complemented by worker file-transfer registration-token enforcement (#10183).
Resumable file transfers (#10109). Large model GGUFs over flaky/throttled links no longer restart from byte 0. The worker's
PUT /v1/files/<key>honorsContent-Range(308/416 resume semantics,X-Content-SHA256binding, final-hash verification) and the master-side stager HEAD-probes for the last accepted offset and resumes, switching to an outer time budget (LOCALAI_FILE_TRANSFER_BUDGET, default 1h) with exponential backoff.ds4 layer-split distributed inference (#10098). Manual layer-split inference for the ds4 backend: a coordinator owns layers
0:Kand listens; workers dial in and own higher ranges, each loading only its slice of the GGUF (a new dependency-freeds4-workerbinary, driven vialocal-ai worker ds4-distributed). Fully back-compatible whends4_roleis absent.Operational glue. Boot-time gallery prefetch via
LOCALAI_PREFETCH_MODELS(#10108); a gatedX-LocalAI-Noderesponse header for attribution (#9976); plus fixes: self-heal stale "model not loaded" routing (#10181), stage directory-based models to remote nodes (#10175), in-flight tracking for non-LLM methods - VAD, diarize, voice (#10238), reconciler survives frontend restarts (#9981), cross-replica OpCache sync (#9983), and the reinstall/upgrade UI no longer sticks on "reinstalling" (#10214).🎥 Video, Both Directions
Video input / understanding in
llama-cpp(#10216). Video-capable multimodal models (e.g. SmolVLM2-Video) can now be sent a video in a chat request, mirroring the existing image and audio paths. Tracks the upstream mtmd video landing (ggml-org/llama.cpp#24269);grpc-server.cppforwardsrequest->videos()into the mtmdfilesvector on both the template and non-template paths, and the React chat UI acceptsvideo/*, renders an inline<video controls>player, and emitsvideo_urlcontent parts.allow_videois auto-gated by whether the loaded mmproj supports it. ffmpeg/ffprobe (already in the runtime image) extract frames.Video generation via LTX-2 (#9980).
stablediffusion-ggmlwiresaudio_vae_pathandembeddings_connectors_paththrough to the upstream LTX-2 fields, with a newgallery/ltx-ggml.yamltemplate (T2V / I2V / FLF2V recipes) and six LTX-2.3 22B GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0), each bundling the text encoder + video VAE + audio VAE + embeddings connectors. Follow-up fixes wired thediffusion_modelflag andvae_decode_only:falsefor the i2v/flf2v paths (#9986, #9987) and muxed LTX-2 audio into the output MP4 (#9990).👁️ Native Object Detection + Segmentation:
rfdetr-cppA new Go native gRPC backend (#10028) dlopens
librfdetr.so(built from mudler/rf-detr.cpp) and exposes the RF-DETR pipeline through LocalAI'sDetectRPC. Supports all 5 detection variants (Nano…Large) and 3 segmentation variants (SegNano/SegSmall/SegMedium) at F32/F16/Q8_0/Q4_K, with 32 prebuilt GGUFs on HuggingFace. Detection returns bbox + class_name + confidence; segmentation adds per-detection PNG-encoded masks. Matches PyTorch on CPU (sub-pixel bbox match, mask IoU 0.99+), with an HF gallery importer that auto-routes GGUF repos to the native backend.🗣️ TTS: Per-Request Instructions & Params
The OpenAI-compatible
/v1/audio/speechinstructionsfield was silently dropped at the HTTP→gRPC boundary, so style/voice could only come from static YAML. PR #10172 plumbs a generic per-requestinstructionsstring and an optional backend-specificparamsmap end to end (proto, schema,core/backend/tts.go), unlocking per-line emotion/style (Qwen3-TTS CustomVoice, Chatterbox) and describe-a-voice (Qwen3-TTS VoiceDesign) from a single model config. Fully backward compatible - emptyinstructionsfalls back to YAML.Also: Qwen3-TTS request-language normalization for flexible matching (#10174), and LocalVQE v1.3 with input/output spectrogram views in the Audio Transform UI (#10113).
🧠 Reasoning & Tool-Call Streaming Hardening
A focused run of correctness fixes for reasoning models and streaming tool calls:
reasoning_efforthonored per request and forwarded to the backend so jinja models can act on it (#10082, #10184).<think>parsing: stop<think>leaking into content in pure-content mode (#9991), stop a prefilled<think>from swallowing tag-less answers (#10225), and don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208).💻
local-ai chat+ 📚 RAG Citations + 🛰️ Realtimelocal-ai chatcommand connects to a running server over the OpenAI-compatible API, streams completions, and supports/models,/model <name>,/clear,/exit. Keepslocal-ai runfocused on the server lifecycle. (Fixes #1535.)Sources:block listing the original documents - deduplicated per source, with the citation-free version saved to long-term memory. (Closes #9331.)LOCALAI_WEBRTC_NAT_1TO1_IPS/LOCALAI_WEBRTC_ICE_INTERFACESknobs fix/v1/realtimecalls dropping a few seconds in under Docker host networking (unroutabledocker0/vethcandidates)./api/operationspoller across UI consumers (#10029) and a React bundle code-split (#10042).🧩 Backend Capability Registration & Startup Speed
BackendCapabilities(#10107), and add face/speaker-recognition constants registeringinsightface+speaker-recognition(#10110).Click for the full changelog below!
What's Changed
Bug fixes :bug:
Exciting New Features 🎉
🧠 Models
📖 Documentation and examples
👒 Dependencies
3f40e73c367ad9f0c1b1819f28c7348c26aa340dby @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to3f40e73c367ad9f0c1b1819f28c7348c26aa340dmudler/LocalAI#10097ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdcby @localai-bot in chore: ⬆️ Update antirez/ds4 toba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdcmudler/LocalAI#10095d2797b86670622b6538123b4aeb5fbb6be2653c5by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp tod2797b86670622b6538123b4aeb5fbb6be2653c5mudler/LocalAI#10094d6588daa800058dfa54f1d7ea695b1a810c8ae18by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp tod6588daa800058dfa54f1d7ea695b1a810c8ae18mudler/LocalAI#10093cb45f68068081af01e7092e91b038ee353eb56beby @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp tocb45f68068081af01e7092e91b038ee353eb56bemudler/LocalAI#10116fe69461618ffc50ba8afa65c25cc6c6e34d4537fby @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp tofe69461618ffc50ba8afa65c25cc6c6e34d4537fmudler/LocalAI#10117be65ac7511b30379b003626c15224798929e33d4by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp tobe65ac7511b30379b003626c15224798929e33d4mudler/LocalAI#10118399739d5c5978351f39e3454bfbfbab4f369088fby @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to399739d5c5978351f39e3454bfbfbab4f369088fmudler/LocalAI#1011923ee03506a91ac3d3f0071b40e66a430eebdfa1dby @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to23ee03506a91ac3d3f0071b40e66a430eebdfa1dmudler/LocalAI#101307948df8ac1070f5f6881b8d34675821893eb97d6by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to7948df8ac1070f5f6881b8d34675821893eb97d6mudler/LocalAI#101278a7c48209d7882a7ce79a6b306270e4703194543by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to8a7c48209d7882a7ce79a6b306270e4703194543mudler/LocalAI#101295dcb71166686799f0d873eab7386234302d05ecfby @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to5dcb71166686799f0d873eab7386234302d05ecfmudler/LocalAI#1012805e60432bcb5bc2113f8c395a41e86497c11504aby @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to05e60432bcb5bc2113f8c395a41e86497c11504amudler/LocalAI#101159edf17c3ada66e0f881dcff155492867db7ac4cfby @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to9edf17c3ada66e0f881dcff155492867db7ac4cfmudler/LocalAI#101412d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5mudler/LocalAI#10144610e664ba7cfe3af46125ed1b5a1184fccb51bcdby @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to610e664ba7cfe3af46125ed1b5a1184fccb51bcdmudler/LocalAI#101405c394fdc8b564eff6faacc50a139529d875f0e36by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to5c394fdc8b564eff6faacc50a139529d875f0e36mudler/LocalAI#10143477c0e82e2699b35a65fd0a1ed6fe66b41087dfeby @localai-bot in chore: ⬆️ Update antirez/ds4 to477c0e82e2699b35a65fd0a1ed6fe66b41087dfemudler/LocalAI#1014294a220cd6745e6e3f8de62870b66fd5b9bc92700by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to94a220cd6745e6e3f8de62870b66fd5b9bc92700mudler/LocalAI#101681f9ee88e09c258053fa59d5e05e23dfb10fa0b13by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to1f9ee88e09c258053fa59d5e05e23dfb10fa0b13mudler/LocalAI#1016613d54e110e1538e0f0bc3af0680b9ab246cfb48dby @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to13d54e110e1538e0f0bc3af0680b9ab246cfb48dmudler/LocalAI#10145136e5d36c17083da0321fd96512dc7b263f94a44by @localai-bot in chore: ⬆️ Update predict-woo/qwen3-tts.cpp to136e5d36c17083da0321fd96512dc7b263f94a44mudler/LocalAI#10165b11fe5bca78ad8b342dd559a43d76df3984bb447by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp tob11fe5bca78ad8b342dd559a43d76df3984bb447mudler/LocalAI#101671520eda980564241434b791ce2bbbd128c4be9eaby @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to1520eda980564241434b791ce2bbbd128c4be9eamudler/LocalAI#101807c158fbb4aec1bdc9c81d6ca0e785139f4826faeby @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to7c158fbb4aec1bdc9c81d6ca0e785139f4826faemudler/LocalAI#1017999613cb720b65036237d44b52f753b51f75c2797by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp to99613cb720b65036237d44b52f753b51f75c2797mudler/LocalAI#101780.22.1by @localai-bot in chore: ⬆️ Update vllm-project/vllm cu130 wheel to0.22.1mudler/LocalAI#10188843600590f96a31467a5199f827c253f34c110f7by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp to843600590f96a31467a5199f827c253f34c110f7mudler/LocalAI#101986b9de3dbaa21ae95ea80638e5ee836795cc48c93by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to6b9de3dbaa21ae95ea80638e5ee836795cc48c93mudler/LocalAI#10190abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp toabd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67mudler/LocalAI#10204a8ec021f2750a473ff4a8f3883bc9fdf5feafa84by @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp toa8ec021f2750a473ff4a8f3883bc9fdf5feafa84mudler/LocalAI#1020231e82494c0a3913c919c1027fa70500fbf4c07ddby @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to31e82494c0a3913c919c1027fa70500fbf4c07ddmudler/LocalAI#10191e270af73b94c9a5c37ec516230219ed4580e1db6by @localai-bot in chore: ⬆️ Update mudler/parakeet.cpp toe270af73b94c9a5c37ec516230219ed4580e1db6mudler/LocalAI#10212b3d56d0ba1bd437886079e339118e8e75bb79ee7by @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp tob3d56d0ba1bd437886079e339118e8e75bb79ee7mudler/LocalAI#102119e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66by @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66mudler/LocalAI#10210c463029c205c2ec8d7ab6c0df4a3f52979091286by @localai-bot in chore: ⬆️ Update antirez/ds4 toc463029c205c2ec8d7ab6c0df4a3f52979091286mudler/LocalAI#10189f7838a306687f22c281d29c250f879a4ab3df2d7by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR tof7838a306687f22c281d29c250f879a4ab3df2d7mudler/LocalAI#10177512d07cb08f234b704b5a5959aa9e2d4c466eeb0by @localai-bot in chore: ⬆️ Update antirez/ds4 to512d07cb08f234b704b5a5959aa9e2d4c466eeb0mudler/LocalAI#102242768b6251548b78b6610e95edad13f888ad95982by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp to2768b6251548b78b6610e95edad13f888ad95982mudler/LocalAI#1021919bdfe22d255d5b4dff39d449318b9bc5ea2317fby @localai-bot in chore: ⬆️ Update leejet/stable-diffusion.cpp to19bdfe22d255d5b4dff39d449318b9bc5ea2317fmudler/LocalAI#1022297cad527d247edefc904e6c40c4cf5ee78bed055by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR to97cad527d247edefc904e6c40c4cf5ee78bed055mudler/LocalAI#10221df7638d8229a243af8a4b5a8ae557e0d74e0a0aeby @localai-bot in chore: ⬆️ Update ggml-org/whisper.cpp todf7638d8229a243af8a4b5a8ae557e0d74e0a0aemudler/LocalAI#10220e6f8112f3ba126eed3ff5b30cdd08085414a7516by @localai-bot in chore: ⬆️ Update ikawrakow/ik_llama.cpp toe6f8112f3ba126eed3ff5b30cdd08085414a7516mudler/LocalAI#1023391bafb5acd5a6cf00b1e55ef68bf40ddd207bee7by @localai-bot in chore: ⬆️ Update antirez/ds4 to91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7mudler/LocalAI#10234039e20a2db9e87b2477c76cc04905f3e1acad77fby @localai-bot in chore: ⬆️ Update ggml-org/llama.cpp to039e20a2db9e87b2477c76cc04905f3e1acad77fmudler/LocalAI#10223c29f6653a516a3001d923944dad8892072cc7334by @localai-bot in chore: ⬆️ Update CrispStrobe/CrispASR toc29f6653a516a3001d923944dad8892072cc7334mudler/LocalAI#10236Other Changes
🙌 New Contributors
Enjoy!
Full Changelog: mudler/LocalAI@v4.3.0...v4.4.0
View the full release notes at https://github.com/mudler/LocalAI/releases/tag/v4.4.0.