Commit 616d8d0
Mark Caldwell
feat: PuLID-Flux identity-injection support
This PR adds support for [PuLID-Flux](https://github.com/ToTheBeginning/PuLID)
identity preservation to the Flux denoise loop. Given a single source
portrait, generated images preserve the source person's face across
arbitrary scenes and prompts.
### What's included
- `src/pulid.hpp` — `PuLIDPerceiverAttentionCA`, the cross-attention
module mirroring the PyTorch reference at
[ToTheBeginning/PuLID/.../encoders_transformer.py](https://github.com/ToTheBeginning/PuLID/blob/main/pulid/encoders_transformer.py).
Pure-ggml graph; runs on CPU / CUDA / Vulkan / Metal without
backend-specific code.
- `src/flux.hpp` — adds 20 `pulid_ca.<i>` child blocks to `Flux`
(constructed conditionally when `params.pulid_enabled` is set),
inserts the cross-attention call between transformer blocks at the
intervals the PyTorch reference uses (every 2nd double block, every
4th single block), and threads two new optional parameters
(`pulid_id`, `pulid_id_weight`) through `forward`, `forward_orig`,
`forward_chroma_radiance`, `forward_flux_chroma`, `compute`, and
`build_graph`.
- `src/stable-diffusion.cpp` — loads `pulid_*.safetensors` via
`model_loader.init_from_file` under the existing
`model.diffusion_model.` prefix so PuLID-CA tensors bind to the new
blocks naturally. PuLID-encoder keys (which are used only by the precompute
tool, not by the C++ code) are reported as unknown, as expected. Adds
`load_pulid_id_embedding()` to parse a small `.pulidembd` binary
file and wraps its content as a `sd::Tensor<float>` passed via
`DiffusionParams`.
- `include/stable-diffusion.h` — public API: `sd_pulid_params_t`
(per-generation embedding path + weight), `pulid_weights_path` on
`sd_ctx_params_t`, `pulid_params` on `sd_img_gen_params_t`.
- `examples/common/common.{cpp,h}` — three new CLI flags:
`--pulid-weights <path>`, `--pulid-id-embedding <path>`, and
`--pulid-id-weight <float>`.
- `src/diffusion_model.hpp` — extends `DiffusionParams` to carry the
new identity embedding + weight; `FluxModel::compute` forwards both
through.
- `docs/pulid.md` — usage, binary format spec, supported PuLID weight
versions (v0.9.0 / v0.9.1; v1.1 deferred), memory budget notes, and
a three-way SHA-256 falsification recipe.
- `scripts/pulid_extract_id.py` — reference precompute tool that
produces the `.pulidembd` binary from a source portrait. Lives
outside the C++ build because identity extraction (insightface +
EVA-CLIP-L + IDFormer) is a heavy PyTorch stack that would be
impractical to port to ggml just to run once per source person.
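The injection intervals described for `src/flux.hpp` can be sanity-checked with a little arithmetic. The sketch below is only a counting exercise, not the C++ code. It assumes the usual Flux.1 depth of 19 double blocks and 38 single blocks (not stated in this PR) and that injection starts at block 0 of each stack. The helper name `pulid_schedule` is hypothetical.

```python
def pulid_schedule(num_double=19, num_single=38,
                   double_interval=2, single_interval=4):
    """Map transformer block index -> pulid_ca index.

    Block counts default to the usual Flux.1 depths (an assumption here);
    the intervals are the ones quoted in the PR description.
    """
    schedule = {"double": {}, "single": {}}
    ca_idx = 0
    # Double-stream blocks: inject at every `double_interval`-th block.
    for i in range(num_double):
        if i % double_interval == 0:
            schedule["double"][i] = ca_idx
            ca_idx += 1
    # Single-stream blocks continue the same pulid_ca counter.
    for i in range(num_single):
        if i % single_interval == 0:
            schedule["single"][i] = ca_idx
            ca_idx += 1
    return schedule, ca_idx


schedule, total = pulid_schedule()
print(total)  # 10 double-block + 10 single-block injections = 20
```

Under these assumptions the count comes out to 20, which agrees with the 20 `pulid_ca.<i>` child blocks added to `Flux`.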
### Why split extraction from injection
PuLID-Flux's identity extractor is a stack of three large PyTorch
models (insightface/ArcFace face detection and embedding + EVA-CLIP-L vision encoder + IDFormer
perceiver-resampler). Porting all three to C++/ggml would add ~5000
lines for code that runs once per source person and produces a 131 KB
output. By making sd.cpp consume a precomputed binary file, the C++
surface area is small (~600 lines), the heavy ML stack only needs to
run once per person on any backend that supports PyTorch, and adding
PuLID is decoupled from the active development on insightface /
EVA-CLIP / IDFormer.
### Binary format
```
offset 0 : magic "PULIDV01" (8 bytes ASCII)
offset 8 : num_tokens (uint32 LE)
offset 12 : token_dim (uint32 LE)
offset 16 : dtype (uint8): 0=fp16, 1=bf16, 2=fp32
offset 17 : reserved zeros (15 bytes; header total = 32)
offset 32 : tokens, row-major LE
```
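A typical embedding (32 tokens x 2048 dims, fp16) has a payload of 32 x 2048 x 2 = 131,072 bytes, so the file is about 131 KB including the 32-byte header.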
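As an illustration of the layout above, here is a minimal stdlib-only sketch of a writer and reader. It is not the code in `scripts/pulid_extract_id.py` or `load_pulid_id_embedding()`. The function names are hypothetical, the strict file-length check is an assumption, and bf16 is written by plain truncation of the fp32 bit pattern.

```python
import struct

MAGIC = b"PULIDV01"
HEADER_SIZE = 32
DTYPE_FP16, DTYPE_BF16, DTYPE_FP32 = 0, 1, 2
BYTES_PER_ELEM = {DTYPE_FP16: 2, DTYPE_BF16: 2, DTYPE_FP32: 4}


def pack_pulidembd(rows, dtype=DTYPE_FP16):
    """Serialize a list of equal-length float rows into the layout above."""
    num_tokens, token_dim = len(rows), len(rows[0])
    # 8-byte magic + two uint32 LE + one uint8 + 15 reserved zero bytes = 32.
    header = MAGIC + struct.pack("<IIB", num_tokens, token_dim, dtype) + bytes(15)
    flat = [v for row in rows for v in row]
    if dtype == DTYPE_FP16:
        body = struct.pack(f"<{len(flat)}e", *flat)
    elif dtype == DTYPE_FP32:
        body = struct.pack(f"<{len(flat)}f", *flat)
    elif dtype == DTYPE_BF16:
        # bf16 = top 16 bits of the fp32 pattern (truncation, no rounding).
        body = b"".join(
            struct.pack("<H", struct.unpack("<I", struct.pack("<f", v))[0] >> 16)
            for v in flat
        )
    else:
        raise ValueError(f"unknown dtype code {dtype}")
    return header + body


def parse_pulidembd(blob):
    """Parse the layout above into a list of rows of Python floats."""
    if len(blob) < HEADER_SIZE or blob[:8] != MAGIC:
        raise ValueError("not a PULIDV01 file")
    num_tokens, token_dim, dtype = struct.unpack_from("<IIB", blob, 8)
    if dtype not in BYTES_PER_ELEM:
        raise ValueError(f"unknown dtype code {dtype}")
    count = num_tokens * token_dim
    expected = HEADER_SIZE + count * BYTES_PER_ELEM[dtype]
    if len(blob) != expected:  # strictness is an assumption of this sketch
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    if dtype == DTYPE_FP16:
        flat = struct.unpack_from(f"<{count}e", blob, HEADER_SIZE)
    elif dtype == DTYPE_FP32:
        flat = struct.unpack_from(f"<{count}f", blob, HEADER_SIZE)
    else:
        halves = struct.unpack_from(f"<{count}H", blob, HEADER_SIZE)
        flat = [struct.unpack("<f", struct.pack("<I", h << 16))[0] for h in halves]
    return [list(flat[r * token_dim:(r + 1) * token_dim]) for r in range(num_tokens)]


blob = pack_pulidembd([[0.0] * 2048 for _ in range(32)])
print(len(blob))  # 32 + 32 * 2048 * 2 = 131104
```

The all-zero embedding at the end is only a size check; a real file would carry the output of the precompute tool.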
### Verification
The three-way SHA-256 falsification recipe in docs/pulid.md
distinguishes "the feature is wired but doesn't do anything" from
"the feature is actively altering the diffusion trajectory":
| Run | Expected hash relation |
|-----------------------------------------|--------------------------------------------|
| A: no `--pulid-*` flags | baseline |
| B: PuLID flags, `--pulid-id-weight 0.0` | byte-identical to A |
| C: PuLID flags, `--pulid-id-weight 1.0` | differs, preserves source identity |
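The hash relations in the table can be checked mechanically. The sketch below is an assumption about how one might script it, not the recipe from `docs/pulid.md`. The function names and the output file names in the usage note are hypothetical. Whether run C preserves the source identity still has to be judged visually, since a hash cannot show that.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_three_way(run_a, run_b, run_c):
    """Compare outputs of runs A (baseline), B (weight 0.0) and C (weight 1.0)."""
    a, b, c = sha256_of(run_a), sha256_of(run_b), sha256_of(run_c)
    return {
        # B must be byte-identical to A: weight 0.0 leaves the output untouched.
        "weight0_matches_baseline": a == b,
        # C must differ from A: the injection alters the trajectory.
        "weight1_differs": a != c,
    }
```

A call such as `check_three_way("a.png", "b.png", "c.png")` should return `True` for both keys when the feature is wired correctly. If `weight1_differs` is `False`, the feature is wired but has no effect.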
Verified on three backend/GPU configurations with the same source code:
- **Vulkan-AMD** (RX 6700 XT, `-DSD_VULKAN=ON`): A == B byte-identical,
A != C, C visually preserves source identity.
- **Vulkan-NVIDIA** (RTX 3060, same binary, `--backend "diffusion=vulkan1"`):
A == B, A != C, C visually equivalent to the AMD output at the same
seed (different bytes per the usual cross-backend nondeterminism).
- **CUDA-NVIDIA** (RTX 3060, separate `-DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86`
build against CUDA 13.2): A == B byte-identical, A != C, C visually
preserves source identity.

`PuLIDPerceiverAttentionCA`'s pure-ggml graph code runs unchanged across
all three configurations -- no backend-specific conditionals were needed.

Per-image sampling times at 512x512 / 4 steps / Flux Schnell Q4 + PuLID:
| Backend | Sampling (s) | Notes |
|------------------------|-------------:|--------------------------------|
| AMD 6700 XT (Vulkan) | 22 | 12 GB consumer card |
| NVIDIA 3060 (Vulkan) | 11 | same binary as AMD |
| NVIDIA 3060 (CUDA) | 9.6 | separate `-DSD_CUDA=ON` build |
batch_count=3 was tested separately and confirms the long-lived-worker
amortization story: per-image sampling drops from 19.6 s (cold) to
~11 s (warm) as the model stays resident across batch iterations.
Tested with Flux Schnell Q4_K_S + PuLID v0.9.1 at 512x512 / 4 steps,
and Flux Dev Q4_K_S + PuLID v0.9.1 at 768x768 / 20 steps. Flux Dev +
PuLID at 1024x1024 runs out of memory on a 12 GB card unless the VAE is
routed to the CPU backend via `--backend "vae=cpu"`. `--vae-on-cpu` alone
is not enough, because it offloads only the weights, not the compute
buffer. This is existing stable-diffusion.cpp behavior rather than a
PuLID-specific issue, but it is documented in `docs/pulid.md` because
PuLID users will hit it.
Tested with batch_count > 1 (verified each image gets the same
identity, different composition).
### Not yet supported (called out in docs/pulid.md)
- PuLID v1.1 (`pulid_v1.1.safetensors`) -- has a renamed key layout
(`id_adapter_attn_layers.*` vs `pulid_ca.*`) and potentially
a different module structure. Follow-up PR.
- Multiple ID images fused into one embedding (the reference Python
pipeline supports this; the current precompute tool accepts only
one portrait per run).
- Injection on the `--true-cfg` negative-prompt branch -- the reference
implementation injects PuLID only on the positive conditioning path,
and this PR matches that behavior.
### Backward compatibility
Non-PuLID generations are unaffected. The `params.pulid_enabled` flag
defaults to false and is only set when the model loader sees a
`pulid_ca.*` tensor in the loaded safetensors file. A regression run
of Flux Schnell Q4 without `--pulid-*` flags produces output that is
byte-identical to the pre-patch build.
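The enabling rule can be pictured as a simple prefix check over tensor names. The following is a minimal sketch of that idea, not the loader code in `src/stable-diffusion.cpp`. The function names and the example key names are illustrative assumptions, and prepending `model.diffusion_model.` to raw `pulid_ca.*` keys is one reading of the description above.

```python
DIFFUSION_PREFIX = "model.diffusion_model."


def classify_pulid_keys(tensor_names):
    """Split raw safetensors key names into PuLID-CA keys and everything else."""
    bound, unknown = [], []
    for name in tensor_names:
        if name.startswith("pulid_ca."):
            # Cross-attention weights bind under the diffusion-model prefix.
            bound.append(DIFFUSION_PREFIX + name)
        else:
            # E.g. encoder keys, which only the precompute tool consumes.
            unknown.append(name)
    return bound, unknown


def pulid_enabled(tensor_names):
    """True as soon as at least one `pulid_ca.*` tensor is present."""
    bound, _ = classify_pulid_keys(tensor_names)
    return len(bound) > 0


# Illustrative key names only; real files may differ.
names = ["pulid_ca.0.to_q.weight", "pulid_encoder.example.weight"]
print(pulid_enabled(names))
print(pulid_enabled(["double_blocks.0.img_attn.qkv.weight"]))
```

With a rule like this, a plain Flux checkpoint never flips the flag, which is consistent with non-PuLID generations being unaffected.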
### File summary
```
include/stable-diffusion.h +34 / -0
src/stable-diffusion.cpp +120 / -0
src/diffusion_model.hpp +5 / -1
src/flux.hpp +106 / -10
src/pulid.hpp +127 / -0 (new)
examples/common/common.h +6 / -0
examples/common/common.cpp +19 / -0
docs/pulid.md +220 / -0 (new)
scripts/pulid_extract_id.py +135 / -0 (new)
```
Total ~770 added lines, ~10 changed. No removed functionality.
1 parent 3a8788c commit 616d8d0
9 files changed
Lines changed: 821 additions & 17 deletions