[draft] Attempt to implement the Flux Klein bonsai model by Juste-Leo2 · Pull Request #1580 · leejet/stable-diffusion.cpp

Juste-Leo2 · 2026-05-29T19:05:59Z

This draft adds the necessary mappings to make the dequantized 4B model work.

Opencode was used, but I am noticing significant differences during inference that I haven't been able to patch using DeepSeek Flash on Opencode.

Here are some avenues to explore:

- The issue could potentially stem from the scheduler behaving differently.
- Re-analyze the VAE for the umpteenth time :)

Note: The text encoder seems to use <think> tags, but it doesn't appear to have much of an impact.

I suspect the problem lies either in the denoising process or in the VAE conversion. However, despite comparing it with the Python implementation, I haven't been able to pinpoint the issue yet. Any help would be greatly appreciated!

Here is an inference example:

git clone --recursive https://github.com/Juste-Leo2/stable-diffusion.cpp
cd stable-diffusion.cpp
git checkout bonsai

# Sync submodules for the checked-out branch
git submodule init
git submodule update

# Configuration (CUDA backend)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DSD_CUDA=ON

# Compilation
cmake --build build -j

# Inference: 4 steps, 512x512, fixed seed, last 30 log lines only
cd /home/leo/stable-diffusion.cpp && timeout 180 ./build/bin/sd-cli \
  --cfg-scale 1 --width 512 --height 512 --steps 4 --seed 42 \
  -p "a cat sitting on a window sill" \
  -o /tmp/vae_fp32_test.png \
  --diffusion-model /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/transformer/diffusion_pytorch_model.safetensors \
  --vae /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/vae/diffusion_pytorch_model.safetensors \
  --llm /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/text_encoder/ \
  2>&1 | tail -30

Here is the result using the Python reference:

Here is the result using the fork:

- Add FLUX.2 tensor name mappings for shared modulation, fused single-block QKV+MLP, and separate double-block Q/K/V
- Add SelfAttention fused_qkv option for separate Q/K/V weights
- Add sharded safetensors loader for LLM text encoder
- Add image-studio workspace

…Bonsai) VAE config has 'force_upcast: true', which diffusers uses to cast the entire VAE to float32 before decode. C++ has no equivalent, causing yellow/chroma noise in output. Workaround: set expected_type = GGML_TYPE_F32 on all first_stage_model tensor entries before ae.init().
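The workaround described in this commit message amounts to: if the VAE config sets `force_upcast`, force every VAE tensor to F32 before the autoencoder is initialized. The following is only a minimal Python model of that selection logic, not sd.cpp code. The function name, the string type labels, and the example tensor names are hypothetical; only the `first_stage_model` prefix and the `force_upcast` key come from the text above.

```python
import json

# Prefix named in the commit message above.
VAE_PREFIX = "first_stage_model."


def plan_tensor_types(vae_config: dict, tensor_types: dict) -> dict:
    """Return a copy of tensor_types with VAE tensors forced to F32
    when the VAE config sets force_upcast (hypothetical helper)."""
    planned = dict(tensor_types)
    if not vae_config.get("force_upcast", False):
        return planned
    for name in planned:
        if name.startswith(VAE_PREFIX):
            planned[name] = "F32"
    return planned


# Illustrative inputs; the tensor names are made up for the example.
config = json.loads('{"force_upcast": true}')
types = {
    "first_stage_model.decoder.conv_in.weight": "BF16",
    "model.diffusion_model.img_in.weight": "BF16",
}
planned = plan_tensor_types(config, types)
```

In this sketch only the VAE entry changes type; the diffusion model tensor keeps its original type, which mirrors the intent of upcasting the VAE alone.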

Juste-Leo2 · 2026-05-29T19:07:20Z

I opened this in the wrong place entirely, so I've closed it.

Green-Sky · 2026-05-29T19:39:17Z

I suggest you use my script on the model first. https://huggingface.co/Green-Sky/bonsai-image-binary-4B-GGUF/blob/main/f2_from_diffusers.py

Literally the only thing that needs implementing is the ternary quant into ggml, which this is not the right place to do.
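For context on what a ternary quant involves, here is a generic sketch, assuming the absmean scheme popularized by BitNet-style models: one scale per group, weights rounded to -1, 0, or +1, and four codes packed per byte. This is not the Bonsai format and not any existing ggml block layout; the function names and the packing order are assumptions made for illustration.

```python
def quantize_ternary(weights):
    """Quantize a group of floats to codes in {-1, 0, 1} plus one scale.

    Uses the mean absolute value as the scale (an assumed scheme)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack_2bit(codes):
    """Pack ternary codes four per byte, storing (code + 1) in 2 bits."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i // 4] |= (code + 1) << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, count):
    """Inverse of pack_2bit for the first `count` codes."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(count)]


def dequantize_ternary(codes, scale):
    """Reconstruct approximate weights from codes and scale."""
    return [code * scale for code in codes]


weights = [0.9, -1.1, 0.05, 1.0]
codes, scale = quantize_ternary(weights)
packed = pack_2bit(codes)
restored = unpack_2bit(packed, len(codes))
```

A real ggml type would additionally fix the block size, the storage of the scale, and the dot-product kernels for each backend, which is the part that belongs in ggml rather than in this PR.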

Juste-Leo2 · 2026-05-29T19:44:13Z

> I suggest you use my script on the model first. https://huggingface.co/Green-Sky/bonsai-image-binary-4B-GGUF/blob/main/f2_from_diffusers.py
>
> Literally the only thing that needs implementing is the ternary quant into ggml, which this is not the right place to do.

Thanks so much for the advice! I'll look into that. My original plan was to build the engine specifically for this model, fully optimized.

Yeah, that was a misclick on my part, it's in the wrong place :)

Juste-Leo2 · 2026-05-31T19:35:58Z

I had the chance to test your model, and it works great :) So I don't really understand the issue with noisy images that I had with the dequantized versions. Maybe I didn't handle it correctly. Thanks again @Green-Sky for implementing this GGUF; it's going to help me make progress on a potential optimization.

Green-Sky · 2026-06-01T05:30:23Z

> Maybe I didn't handle it correctly.

You really should take another look at the Python script I have in the HF repo :)

Juste-Leo2 · 2026-06-01T09:01:52Z

> > Maybe I didn't handle it correctly.
>
> You really should check the python script again I have in the hf repo :)

I ran some more tests based on the code. I adapted the C code and it worked (I should have done that from the start, thanks for insisting 😅). The issue was the final_layer weight swap, which your .py file includes but my C code did not.
If I want to develop exclusively with the bonsai models, would you recommend the adapted C code, or your corrected safetensors with the native code?
I'd lean toward the adapted C code, since it can use the original ternary and binary safetensors directly, but I'm curious to hear your opinion :)
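For readers unfamiliar with this kind of swap: diffusers-to-original Flux converters commonly reorder the two halves of the final modulation weight along the first axis, because the original layout typically chunks the output as (shift, scale) while diffusers chunks it as (scale, shift). A minimal sketch of that reordering follows, assuming the tensor is a plain list of rows. The function name is hypothetical, and the exact keys and behavior should be checked against the f2_from_diffusers.py script itself.

```python
def swap_halves(rows):
    """Swap the first and second half of a tensor along axis 0.

    `rows` is a list of rows (each row a list of floats)."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows")
    half = len(rows) // 2
    return rows[half:] + rows[:half]


# Toy 4x2 weight: rows 0-1 form one half, rows 2-3 the other.
weight = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
swapped = swap_halves(weight)
```

Applying the swap twice returns the original tensor, so skipping it on one side of a conversion silently exchanges the two modulation terms instead of raising an error. That would be consistent with output that runs to completion but looks wrong.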

Green-Sky · 2026-06-01T09:15:40Z

sd.cpp generally uses ComfyUI-style safetensors rather than diffusers-format safetensors; this is not unique to bonsai-image/flux2. So I recommend the Python-converted route if you want to avoid too many differences with upstream and want your project to survive beyond the prototype stage :)

edit: also, regarding optimizations such as the megakernels they used in the paper, you should look at op fusion (in the ggml backends).
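One typical step in a diffusers-to-ComfyUI-style conversion is fusing separate Q, K, and V projection weights into a single QKV tensor by concatenating them along the output axis. The sketch below shows only that step on plain lists. The key names (`to_q`, `to_k`, `to_v`, `qkv`) and the `blk` prefix are illustrative assumptions, not the names used by sd.cpp or by the script in the HF repo.

```python
def fuse_qkv(state, prefix):
    """Concatenate separate Q/K/V weights into one fused QKV weight.

    `state` maps tensor names to lists of rows; key names are hypothetical."""
    parts = [state[f"{prefix}.to_{name}.weight"] for name in ("q", "k", "v")]
    widths = {len(row) for part in parts for row in part}
    if len(widths) != 1:
        raise ValueError("Q, K and V must share the same input width")
    fused = [row for part in parts for row in part]
    return {f"{prefix}.qkv.weight": fused}


# Toy state dict with 1x2 projections.
state = {
    "blk.to_q.weight": [[1.0, 2.0]],
    "blk.to_k.weight": [[3.0, 4.0]],
    "blk.to_v.weight": [[5.0, 6.0]],
}
converted = fuse_qkv(state, "blk")
```

Doing this once in a converter keeps the inference code on a single fused layout, which is the trade-off behind preferring converted files over extra name mappings in the C++ loader.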

Juste-Leo2 · 2026-06-01T09:17:45Z

> sd.cpp generally uses comfyui adopted safetensors instead of the diffusers format safetensors, this is not unique to bonsai-image/flux2. So I recommend the python converted way, if you want to avoid too many differences with upstream and your project to survive beyond the prototype stage :)
>
> edit: also regarding optimizations like megakernels like they did in the paper, you should look at op-fusing (in ggml backends).

It's definitely worth getting used to ComfyUI. I'll grab the Python script you wrote. Thanks again!

Juste-Leo2 added 2 commits May 29, 2026 18:43

Juste-Leo2 closed this May 29, 2026

Juste-Leo2 mentioned this pull request May 29, 2026

[draft] Attempt to implement the Flux Klein bonsai model Juste-Leo2/stable-diffusion.cpp#1

Closed

Juste-Leo2 wants to merge 2 commits into leejet:master from Juste-Leo2:bonsai


2 participants
