feat: automatic VAE-tiling fallback when an untiled decode exceeds the backend buffer limit
VAE decode can hard-fail on integrated / low-VRAM GPUs because the untiled compute
buffer exceeds the backend's maximum single-buffer allocation (e.g. Vulkan's
suballocation limit) even when total memory is plentiful. sd.cpp already supports
tiling that keeps each compute buffer small, but it had to be requested up front
with --vae-tiling, so users hit a hard failure one flag away from the working path.
Make the fallback automatic and on by default:
- sd_tiling_params_t gains a bool auto_tile (appended, so the C ABI stays
compatible). In AUTO (the default: --vae-tiling off, auto_tile on) VAE::decode
tries the untiled decode and, if its compute buffer can't be allocated, frees it
and retries once with tiling.
- --vae-tiling stays the original boolean flag (force tiling on);
--no-vae-tiling-fallback turns the auto fallback off (hard-fail like before).
- GGMLRunner gets an opt-in probe (set_probe_compute_buffer_fits) so AUTO can
decline a too-large untiled decode before the backend emits its raw allocation
error. On Vulkan it checks each op against the device's real per-buffer limit via
ggml_backend_supports_op (the reported max buffer size, not the smaller
suballocation block); other backends compare the planned compute buffer against
ggml_backend_buft_get_max_size. The reactive path (an empty output triggers a
tiled retry) still backstops a genuine runtime OOM.
- extra_tiling_args gains a max_buffer_size=<bytes> key: in AUTO the fallback also
tiles when the planned untiled compute buffer would exceed it, letting a user cap
VAE VRAM on any backend.
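
The control flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `TilingMode`, `DecodeResult`, `untiled_fits` and `decode_with_fallback` are hypothetical names, the two decode paths are passed in as callbacks, and the probe is reduced to a single size comparison (the real probe is described as per-op on Vulkan).

```cpp
#include <cstddef>
#include <functional>

// Hypothetical names throughout; this is an illustration, not sd.cpp's code.
enum class TilingMode {
    ForceTiled,  // --vae-tiling
    Auto,        // default: --vae-tiling off, auto_tile on
    NoFallback   // --no-vae-tiling-fallback
};

struct DecodeResult {
    bool ok;     // false models an empty output (decode failed)
    bool tiled;  // true if the tiled path produced the result
};

// Proactive probe: does the planned untiled compute buffer fit the limit?
// limit_bytes stands for the backend's per-buffer maximum, optionally
// lowered by a user-supplied max_buffer_size.
inline bool untiled_fits(std::size_t planned_bytes, std::size_t limit_bytes) {
    return planned_bytes <= limit_bytes;
}

inline DecodeResult decode_with_fallback(
    TilingMode mode,
    std::size_t planned_bytes,
    std::size_t limit_bytes,
    const std::function<bool()>& decode_untiled,
    const std::function<bool()>& decode_tiled) {
    if (mode == TilingMode::ForceTiled) {
        return {decode_tiled(), true};
    }
    if (mode == TilingMode::NoFallback) {
        // Old behaviour: a failed untiled decode is a hard failure.
        return {decode_untiled(), false};
    }
    // AUTO: skip the untiled attempt when the probe says it cannot fit,
    // otherwise try it and treat a failure as a runtime OOM.
    if (untiled_fits(planned_bytes, limit_bytes) && decode_untiled()) {
        return {true, false};
    }
    // Retry exactly once with tiling.
    return {decode_tiled(), true};
}
```

In this sketch the probe and the reactive retry share one exit: whether the probe declines or the untiled attempt fails, the tiled decode runs once and its result is final.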
"extra VAE tiling args, key=value list. max_buffer_size (bytes) forces the auto fallback to tile when an untiled VAE compute buffer would exceed it. LTX video VAE supports temporal_tile_frames (default: 4), temporal_tile_overlap (default: 1)",
examples/server/api.md (+1 -1: 1 addition & 1 deletion)
@@ -518,7 +518,7 @@ Shared default fields used by both `img_gen` and `vid_gen`:
 |`output_format`|`string`|
 |`output_compression`|`integer`|

-`vae_tiling_params.extra_tiling_args` accepts a key=value list. For LTX video VAE temporal tiling, `temporal_tile_frames` defaults to `4` and `temporal_tile_overlap` defaults to `1`.
+`vae_tiling_params.extra_tiling_args` accepts a key=value list. `max_buffer_size` (bytes) forces the automatic tiling fallback when an untiled VAE compute buffer would exceed it. For LTX video VAE temporal tiling, `temporal_tile_frames` defaults to `4` and `temporal_tile_overlap` defaults to `1`.
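
As an illustration of how a key=value list such as `extra_tiling_args` might be read, here is a minimal sketch. It assumes the pairs are comma-separated, which the text above does not state, and `parse_kv_list` and `get_u64` are hypothetical helpers rather than functions from the project.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Assumption: pairs are separated by commas, e.g.
// "max_buffer_size=268435456,temporal_tile_frames=4".
inline std::map<std::string, std::string> parse_kv_list(const std::string& args) {
    std::map<std::string, std::string> out;
    std::stringstream ss(args);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos || eq == 0) {
            continue;  // skip entries without a key or without '='
        }
        out[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return out;
}

// Returns the value stored under key as an unsigned integer, or fallback
// when the key is missing or the value is not a plain decimal number.
inline std::uint64_t get_u64(const std::map<std::string, std::string>& kv,
                             const std::string& key,
                             std::uint64_t fallback) {
    const auto it = kv.find(key);
    if (it == kv.end() || it->second.empty() ||
        !std::isdigit(static_cast<unsigned char>(it->second[0]))) {
        return fallback;
    }
    try {
        std::size_t pos = 0;
        const unsigned long long v = std::stoull(it->second, &pos);
        return pos == it->second.size() ? v : fallback;
    } catch (const std::exception&) {
        return fallback;  // out-of-range or otherwise unparsable value
    }
}
```

With these helpers, `get_u64(parse_kv_list("max_buffer_size=268435456"), "max_buffer_size", 0)` yields the byte cap, and a missing `temporal_tile_frames` key can fall back to the documented default of `4`.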