Releases · YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.104.yr0-ROCm
koboldcpp-1.104
calm before the storm edition
- NEW: Added `--smartcache`, adapted from @Pento95: This is a 2-in-1 dynamic caching solution that intelligently creates KV state snapshots automatically. Read more here (a launch sketch follows below).
- This will greatly speed up performance when different contexts are swapped back to back (e.g. hosting on AI Horde or shared instances).
- Also allows snapshotting when used with an RNN or hybrid model (e.g. Qwen3Next, RWKV), which avoids having to reprocess everything.
- Reuses the KV save/load states from admin mode. Max number of KV states increased to 6.
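A minimal, hedged launch sketch for the new caching flag; the model path and `--contextsize` value are placeholders, and only `--smartcache` is taken from these notes:

```bash
# Hedged sketch: enable smart KV caching, useful for shared/Horde hosting
# where different contexts are swapped back to back. Paths are placeholders.
./koboldcpp --model /path/to/model.gguf --contextsize 8192 --smartcache
```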
- NEW: Added `--autofit` flag, which utilizes upstream's "automatic GPU fitting (-fit)" behavior from ggml-org#16653. Note that this flag overwrites all your manual layer configs and tensor overrides and is not guaranteed to work. However, it can provide a better automatic fit in some cases. It will not be accurate if you load multiple models, e.g. image gen.
- Pipeline parallelism is no longer the default; instead, it's now a flag you can enable with `--pipelineparallel`. Only affects multi-GPU setups: faster speed at the cost of memory usage (see the sketch below).
- Key Improvement - Vision Bugfix: A bug in mrope position handling has been fixed, which improves vision models like Qwen3-VL. You should now see much better visual accuracy in some multimodal models compared to earlier koboldcpp versions. If you previously had issues with hallucinated text or numbers, it should be much better now.
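A hedged launch sketch pairing the two new flags above; both flags come straight from these notes and the model path is a placeholder:

```bash
# Hedged sketch: let KoboldCpp fit GPU layers automatically and opt in to
# pipeline parallelism on a multi-GPU box. The model path is a placeholder.
./koboldcpp --model /path/to/model.gguf --autofit --pipelineparallel
```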
- Increased default gen amount from 768 to 896.
- Deprecated obsolete `--forceversion` flag.
- Fixed safetensors loading for Z-Image
- Fixed image importer in SDUI
- Capped cfg_scale to max 3.0 for Z-Image to avoid blurry gens. If you want to override this, set `remove_limits` to `1` in your payload or inside `--sdgendefaults` (see the example below).
- Removed cc7.0 as a CUDA build target, Volta (V100) will fall back to PTX from cc6.1
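Regarding the Z-Image cfg_scale cap above, a hedged payload sketch; only the `remove_limits` field comes from these notes, while the `/sdapi/v1/txt2img` route and the other fields are assumptions about the A1111-style image API:

```bash
# Hedged sketch: override the cfg_scale cap for a single Z-Image request.
# Route and most fields are assumptions; "remove_limits" is from these notes.
curl -s http://localhost:5001/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a lighthouse at dusk", "cfg_scale": 4.5, "remove_limits": 1}'
```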
- Tweaked branding in llama.cpp UI to make it clear it's not llama.cpp
- Added indentation to .kcpps configs
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes and improvements from upstream
- GLM4.6V and GLM4.6V Flash are now supported. You can get the model and the mmproj here.
- If you want to test out GLM ASR Nano, I've made quants here. It works best with short audio clips; for longer audio, please stick to Whisper.
- NEW: Added support for Flux2 and Z-Image Turbo! Another big thanks to @leejet for the sd.cpp implementation and @wbruna for the assistance with testing and merging.
- To obtain models for Z-Image (Most recommended, lightweight):
- Get the Z-Image Image model here
- Get the Z-Image VAE here, which is the same vae as FluxOne.
- Get the Z-Image text encoder here (load this as Clip 1)
- Alternative: Load this template to download all 3 automatically
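A hedged command-line sketch for wiring up the three Z-Image files listed above; the `--sdmodel`, `--sdvae`, and `--sdclipl` flag names are assumptions about KoboldCpp's image-gen options (mapping to the GUI's Image Model, VAE, and Clip 1 slots), and the filenames are placeholders:

```bash
# Hedged sketch: load Z-Image alongside a text model. Flag names are
# assumptions about the image-gen CLI options; filenames are placeholders.
./koboldcpp --model /path/to/text-model.gguf \
  --sdmodel z-image-turbo.safetensors \
  --sdvae z-image-vae.safetensors \
  --sdclipl z-image-text-encoder.safetensors
```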
- To obtain models for Flux2 (Not recommended, as this model is huge, so I will link the q2k. Remember to enable CPU offload. Running anything larger requires a very powerful GPU):
- NEW: Mistral and Ministral 3 model support has been merged from upstream.
- Improved "Assistant Continue" in llama.cpp UI mode, now can be used to continue partial turns.
- We have added prefill support to chat completions if you have `/lcpp` in your URL (`/lcpp/v1/chat/completions`); the regular chat completions endpoint is meant to mimic OpenAI and does not do this. Point your frontend to the URL that best fits your use case. We'd like feedback on which of these you prefer and whether the `/lcpp` behavior would break an existing use case (example below).
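A hedged request sketch against the prefill-capable route; the URL comes from these notes, while the OpenAI-style payload shape (a trailing assistant message acting as the prefill to be continued) is an assumption:

```bash
# Hedged sketch: the trailing assistant message is intended as a prefill
# for the model to continue. Only the /lcpp URL is from these notes.
curl -s http://localhost:5001/lcpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "List three uses for a paperclip."},
          {"role": "assistant", "content": "1. Holding papers together\n2."}
        ],
        "max_tokens": 128
      }'
```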
- Minor tool calling fix to avoid passing base64 media strings into the tool call.
- Tweaked resizing behavior of the launcher UI.
- Added a secondary terminal UI to view the console logging (only for Linux), can be used even when not launched from CLI. Launch this auxiliary terminal from the Extras tab.
- AutoGuess Template fixes for GPT-OSS and Kimi
- Fixed a bug with `--showgui` mode being saved into some configs
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes and improvements from upstream
- New: Now bundles the llama.cpp UI into KoboldCpp, as an extra option for those who prefer it. Access it at http://localhost:5001/lcpp
- The llama.cpp UI is designed primarily for assistant use cases and provides a ChatGPT-like interface, with support for importing documents like .pdf files. It can be accessed in parallel to the usual KoboldAI Lite UI (which is recommended for roleplay/story writing) and does not take up any additional resources while not in use.
- New: Massive universal tool calling improvement from @Rose22, with the new format KoboldCpp is now even better at calling tools and using multiple tools in sequence correctly. Works automatically with all tool calling capable frontends (OpenWebUI / SillyTavern etc) in chat completions mode and may work on models that normally do not support tool calling (in the correct format).
- New: Added support for jinja2 templates via `/v1/chat/completions`, for those that have been asking for it. There are 3 modes (see the example below):
- Current Default: Uses KoboldCpp ChatAdapter templates and the KoboldCpp universal toolcalling module (current behavior, most recommended).
- Using `--jinja`: Uses the jinja2 template from the GGUF in chat completions mode for normal messages, and uses the KoboldCpp universal toolcalling module. Use this only if you love jinja. There are GGUF models on Huggingface which explicitly mention that --jinja must be used to get normal results; this does not apply to KoboldCpp, as our regular modes cover these cases.
- Using `--jinja_tools`: Uses the jinja2 template from the GGUF in chat completions mode for all messages and tools. Not recommended in general. In this mode the model and frontend are responsible for compatibility.
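A hedged sketch of opting into the two jinja modes described above; both flags come from these notes and the model path is a placeholder:

```bash
# Default mode needs no extra flag. To use the GGUF's jinja2 template for
# normal messages (tool calling stays on KoboldCpp's universal module):
./koboldcpp --model /path/to/model.gguf --jinja

# To hand both messages and tool calls to the GGUF's jinja2 template:
./koboldcpp --model /path/to/model.gguf --jinja_tools
```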
- Synced and updated Image Generation to latest stable-diffusion.cpp, big thanks to @wbruna. Please report any issues you encounter.
- Updated Google Colab notebook with easier default selectable presets, thanks @henk717
- Allow GUI launcher window to be resized slightly larger horizontally, in case some text gets cut off.
- Fixed a divide by zero error with audio projectors
- Added Vulkan support for whisper.
- Case-insensitive filename search when selecting chat completion adapters
- Fixed an old bug that caused mirostat to swap parameters. To get the same result as before, swap your values for `tau` and `eta` (see the example below).
- Added a debug command `--testmemory` to check what values auto GPU detection retrieves (not needed for most)
- Now serves KoboldAI Lite UI gzipped to browsers that can support it, for faster UI loading.
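Regarding the mirostat parameter swap above, a hedged request sketch showing where those values sit; the `/api/v1/generate` route and field names are assumptions about the KoboldAI-style API, while the tau/eta swap advice is from these notes:

```bash
# Hedged sketch: if you depended on the old (swapped) behavior, exchange the
# tau and eta values below. Route and field names are assumptions.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 64,
       "mirostat": 2, "mirostat_tau": 5.0, "mirostat_eta": 0.1}'
```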
- Added sampler support for smoothing curve
- Updated Kobold Lite, multiple fixes and improvements
- Web Link-sharing now defaults to dpaste.com as dpaste.org is shut down
- Added option to save and load custom scenarios in a Scenario Library (like stories but do not contain most settings)
- Allow single-turn deletion and editing in classic theme instruct mode (click on the icon)
- Better turn chunking and repacking after editing a message
- Merged new model support, fixes and improvements from upstream
Hotfix 1.102.2 - Try to fix some issues with flash attention, fixed media attachments in jinja mode
Hotfix 1.102.3 - Merged Qwen3Next support. Note that you need to use batch size 512 or less.
- Support for Qwen3-VL is merged - For a quick test, get the Qwen3-VL-2B-Instruct model here and the mmproj here. Larger versions exist, but this will work well enough for simple tasks.
- Added Qwen Image and Qwen Image Edit - Support is now officially available for Qwen Image generation models. These have much better prompt adherence than SDXL or even Flux. Here's how to set up Qwen Image Edit:
- Get the Qwen Image Edit 2509 model here and load it as the image gen model
- Get the Qwen Image VAE and load it as VAE
- Get Qwen2.5-VL-7B-Instruct and **load it as Clip-...
KoboldCPP-v1.98.1.yr0-ROCm
workflow
KoboldCPP-v1.97.4.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.96.2.yr1-ROCm
Merge pull request #133 from nissim-c/patch-1 Update README.md for Fedora
KoboldCPP-v1.96.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.95.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.94.2.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.93.2.yr0-ROCm
runner drives moved back to D:\
KoboldCPP-v1.92.1.yr0-ROCm
I'm sorry for the long absence. If the main version gives you an error, try the b2 version. I have not added support for the latest RX 9000 cards yet (I'm sorry). Hopefully I can compile a solid rocBLAS build with all the GPUs, including the new ones, so it's back to one exe again.
If you find your way back here and download this version, thank you for sticking around <3 If you're new here, welcome to AMD AI hell xD jkjk but frfr welcome to the wonderful world of open source AMD GPU AI Generation :)
KoboldCPP-v1.86.2.yr0-ROCm
Update cmake-B-rocm-windows.yml