
[draft] Attempt to implement the Flux Klein bonsai model #1580

Closed
Juste-Leo2 wants to merge 2 commits into leejet:master from Juste-Leo2:bonsai


@Juste-Leo2

Contributor

This draft adds the mappings needed to make the dequantized 4B model work.

I used Opencode for this, but I am seeing significant differences in the inference output that I haven't been able to fix with DeepSeek Flash in Opencode.

Here are some avenues to explore:

  • The issue could stem from the scheduler behaving differently.
  • Re-analyze the VAE for the umpteenth time :)

Note: The text encoder seems to use <think> tags, but they don't appear to have much of an impact.

I suspect the problem lies either in the denoising process or in the VAE conversion. However, despite comparing it with the Python implementation, I haven't been able to pinpoint the issue yet. Any help would be greatly appreciated!

Here is an inference example:

```shell
git clone --recursive https://github.com/Juste-Leo2/stable-diffusion.cpp
cd stable-diffusion.cpp

git checkout bonsai

git submodule init
git submodule update

cmake -B build -DCMAKE_BUILD_TYPE=Release -DSD_CUDA=ON

# Compilation
cmake --build build -j

cd /home/leo/stable-diffusion.cpp && timeout 180 ./build/bin/sd-cli \
  --cfg-scale 1 --width 512 --height 512 --steps 4 --seed 42 \
  -p "a cat sitting on a window sill" \
  -o /tmp/vae_fp32_test.png \
  --diffusion-model /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/transformer/diffusion_pytorch_model.safetensors \
  --vae /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/vae/diffusion_pytorch_model.safetensors \
  --llm /tmp/hf_cache_bf16/prism-ml_bonsai-image-ternary-4B-unpacked/text_encoder/ \
  2>&1 | tail -30
```
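The `--llm` flag in this command points at a `text_encoder/` directory rather than a single file, and one of this PR's commits adds a sharded safetensors loader for the LLM text encoder. A minimal sketch of the first step such a loader might take, assuming the directory follows the Hugging Face convention of a `model.safetensors.index.json` file whose `weight_map` maps each tensor name to the shard file that holds it. The function name `group_tensors_by_shard` is hypothetical and is not the PR's C++ code.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_tensors_by_shard(index_path):
    """Return {shard_filename: [tensor names]} from a HF-style index file.

    Assumes the index has a "weight_map" object mapping tensor name -> shard file.
    """
    index = json.loads(Path(index_path).read_text())
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort shards and tensor names so the load order is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}
```

Grouping by shard lets a loader open each shard file once and read all of its tensors in a single pass, instead of reopening files per tensor.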

Here is the result using the Python reference:

[image: output of the Python reference]

Here is the result using the fork:

[image: output of the fork]

- Add FLUX.2 tensor name mappings for shared modulation, fused
  single-block QKV+MLP, and separate double-block Q/K/V
- Add SelfAttention fused_qkv option for separate Q/K/V weights
- Add sharded safetensors loader for LLM text encoder
- Add image-studio workspace
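The first commit mentions separate double-block Q/K/V weights and a `fused_qkv` option on `SelfAttention`. A minimal sketch of how the two layouts relate, assuming PyTorch `nn.Linear` weight layout (`[out_features][in_features]`) and a fused tensor stored in Q, K, V order: the fused weight is just the three matrices stacked along the output dimension, so splitting the fused projection's output recovers the three separate projections. Matrices are plain lists of rows here, and all function names are hypothetical.

```python
def matvec(w, x):
    """y = W @ x for a row-major weight matrix W of shape [out][in]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def fuse_qkv(wq, wk, wv):
    """Stack Q, K, V weights along the output dimension -> shape [3*out][in]."""
    return [list(r) for r in wq] + [list(r) for r in wk] + [list(r) for r in wv]


def split_qkv(y, out_dim):
    """Split a fused projection output back into (q, k, v) slices."""
    return y[:out_dim], y[out_dim:2 * out_dim], y[2 * out_dim:]
```

Either layout computes the same result; the fused one does a single larger matmul, which is why converted checkpoints often store `qkv` as one tensor.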
…Bonsai)

VAE config has 'force_upcast: true', which diffusers uses to cast the
entire VAE to float32 before decode. C++ has no equivalent, causing
yellow/chroma noise in output. Workaround: set expected_type = GGML_TYPE_F32
on all first_stage_model tensor entries before ae.init().
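The workaround in this commit message amounts to overriding the expected type of every VAE tensor before initialization. A minimal Python model of that idea, assuming tensor metadata is a `{name: dtype}` mapping and that VAE tensors share the `first_stage_model.` prefix; the string `"F32"` stands in for `GGML_TYPE_F32`, and `force_vae_f32` plus the example tensor names are hypothetical, not sd.cpp's API.

```python
F32 = "F32"  # stand-in for GGML_TYPE_F32 in this sketch


def force_vae_f32(tensor_types, prefix="first_stage_model."):
    """Return a copy of {name: dtype} with every VAE tensor forced to F32.

    Mirrors the effect of diffusers' force_upcast: the whole VAE runs in float32,
    while tensors outside the prefix keep their original type.
    """
    return {
        name: (F32 if name.startswith(prefix) else dtype)
        for name, dtype in tensor_types.items()
    }
```

Returning a new dict rather than mutating in place keeps the original metadata available for logging or for comparing against the checkpoint's stored types.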
Juste-Leo2 closed this on May 29, 2026
@Juste-Leo2

Contributor Author

I opened this in the wrong place entirely; I've closed it.

@Green-Sky

Contributor

I suggest you use my script on the model first. https://huggingface.co/Green-Sky/bonsai-image-binary-4B-GGUF/blob/main/f2_from_diffusers.py

Literally the only thing that needs implementing is the ternary quant in ggml, and this is not the right place to do that.
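For orientation, a rough sketch of what a ternary weight type has to encode. It assumes BitNet b1.58-style absmean quantization (one float scale per tensor, weights rounded into {-1, 0, 1}) and packs four 2-bit codes per byte. Bonsai's actual scheme (block size, scale granularity, bit layout) is not described in this thread, and these helper names are hypothetical, not ggml API.

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization: w ~= scale * q with q in {-1, 0, 1}.

    Assumes a non-empty list. Python's round() rounds halves to even,
    which is acceptable for a sketch.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def ternary_dequantize(q, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [qi * scale for qi in q]


def pack_2bit(q):
    """Pack ternary values four per byte, storing each as q+1 in 0..2."""
    out = bytearray((len(q) + 3) // 4)
    for i, qi in enumerate(q):
        if qi not in (-1, 0, 1):
            raise ValueError("value is not ternary")
        out[i // 4] |= (qi + 1) << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Inverse of pack_2bit for the first n values."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(n)]
```

2 bits per weight wastes one of the four codes; denser layouts exist, but this one keeps unpacking to a shift and a mask.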

@Juste-Leo2

Juste-Leo2 commented May 29, 2026

Contributor Author

Thanks so much for the advice! I'll look into that. My original plan was to build a fully optimized engine specifically for this model.

Yeah, that was a misclick on my part; it's in the wrong place :)

@Juste-Leo2

Contributor Author

I had the chance to test your model, and it works great :) So I don't really understand why I got noisy images with the dequantized versions. Maybe I didn't handle it correctly. Thanks again @Green-Sky for making this GGUF; it's going to help me make progress on a potential optimization.

@Green-Sky

Contributor

You really should check the Python script in my HF repo again :)

@Juste-Leo2

Contributor Author

I ran some more tests based on the code. I adapted the C code and it worked (I should have done that from the start; thanks for insisting 😅). The issue was the final_layer weight swap, which I hadn't included in the C code but which you had included in the .py file.
I'm wondering: if I want to develop exclusively with the bonsai models, would you recommend using the adapted C code or your corrected safetensors with the native code?
I'd lean toward the adapted C code, since it can load the original ternary and binary safetensors directly, but I'm curious to hear your opinion :)
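The missing "final_layer weight swap" is most likely the scale/shift reordering that diffusers-to-original Flux conversions perform: diffusers' `AdaLayerNormContinuous` splits its projection output as `(scale, shift)`, while the original Flux final layer splits it as `(shift, scale)`. A minimal sketch of that swap on a weight (or bias) stored as a list along the output dimension; that this is exactly what `f2_from_diffusers.py` does is an assumption worth checking against the script, and the function name is hypothetical.

```python
def swap_scale_shift(weight_rows):
    """Swap the two halves of a [2*dim, ...] modulation weight along dim 0.

    Works for a 2-D weight (list of rows) or a 1-D bias (list of numbers):
    (scale, shift) becomes (shift, scale), and vice versa.
    """
    if len(weight_rows) % 2:
        raise ValueError("expected an even number of output rows")
    half = len(weight_rows) // 2
    return weight_rows[half:] + weight_rows[:half]
```

Because the swap is its own inverse, the same function converts in either direction; skipping it leaves every output element modulated by the wrong half, which would fit the noisy output seen with the dequantized model.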

@Green-Sky

Green-Sky commented Jun 1, 2026

Contributor

sd.cpp generally uses ComfyUI-format safetensors rather than diffusers-format safetensors; this is not unique to bonsai-image/flux2. So I recommend the Python-conversion route if you want to avoid too many differences from upstream and want your project to survive beyond the prototype stage :)
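Part of the difference between the two formats is tensor naming. A minimal sketch of table-driven key renaming, assuming the Flux-style correspondence of diffusers' `transformer_blocks` / `single_transformer_blocks` to `double_blocks` / `single_blocks`. The table is hypothetical and deliberately partial: a real conversion also renames suffixes, fuses Q/K/V, and swaps scale/shift, so use the actual conversion script rather than this.

```python
# Hypothetical, partial prefix table for illustration only.
PREFIX_MAP = {
    "single_transformer_blocks.": "single_blocks.",
    "transformer_blocks.": "double_blocks.",
}


def rename_key(key, prefix_map=PREFIX_MAP):
    """Rewrite a tensor name's block prefix; unknown keys pass through unchanged."""
    # Try longer prefixes first so the most specific rule wins.
    for src in sorted(prefix_map, key=len, reverse=True):
        if key.startswith(src):
            return prefix_map[src] + key[len(src):]
    return key
```

Keeping the mapping in one table makes it easy to diff against upstream's expected names when a new model variant appears.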

edit: also, regarding optimizations like the megakernels they used in the paper, you should look at op fusion (in ggml backends).
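To illustrate what op fusion buys, here is a Python sketch using Flux-style modulation, `x * (1 + scale) + shift`. The unfused version runs three elementwise ops, each materializing a full intermediate buffer; the fused version computes each output element in one pass. This only shows the idea with the same math in both; in practice the fusion would happen inside a ggml backend kernel, and this is not ggml code.

```python
def modulate_unfused(x, scale, shift):
    """Three separate elementwise ops, each writing a full intermediate buffer."""
    one_plus = [1.0 + s for s in scale]
    scaled = [xi * si for xi, si in zip(x, one_plus)]
    return [yi + bi for yi, bi in zip(scaled, shift)]


def modulate_fused(x, scale, shift):
    """One pass: each output element is computed with no intermediate buffers."""
    return [xi * (1.0 + si) + bi for xi, si, bi in zip(x, scale, shift)]
```

On a GPU the arithmetic is cheap; the savings come from reading and writing memory once instead of three times, plus fewer kernel launches.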

@Juste-Leo2

Contributor Author

It's definitely worth getting used to the ComfyUI format. I'll grab the Python script you wrote; thanks again!
