
fix: correct Gemma3 rope settings and vram limit propagation #1583

Merged
leejet merged 1 commit into master from fix/llm-rope-vram-limit on May 30, 2026


leejet (Owner) commented on May 30, 2026

Summary

  • Update Gemma3 12B LLM RoPE handling to use NeoX RoPE with a 131072 context value for query and key tensors.
  • Forward max graph VRAM limits through the LTXAV embedder to both the LLM and projection components.
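For readers unfamiliar with the distinction in the first bullet, here is a minimal, self-contained C++ sketch (not the code from this PR) contrasting NeoX-style RoPE with the adjacent-pair ("normal") layout on a single attention-head vector. The function names `rope_neox` and `rope_normal` are hypothetical, the frequency base is a free parameter rather than Gemma3's actual setting, and no context-length-dependent scaling (such as YaRN) is modeled, so the 131072 context value does not appear in it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: NeoX-style RoPE applied in place to one head vector.
// Element i is rotated together with element i + d/2 (split-half pairing).
// Assumes an even head dimension; no YaRN or other context scaling.
void rope_neox(std::vector<float>& x, int pos, float freq_base) {
    const std::size_t d = x.size();
    const std::size_t half = d / 2;
    for (std::size_t i = 0; i < half; ++i) {
        // theta_i = pos * base^(-2i/d)
        const double inv_freq = std::pow(static_cast<double>(freq_base),
                                         -2.0 * static_cast<double>(i) / static_cast<double>(d));
        const double theta = static_cast<double>(pos) * inv_freq;
        const double c = std::cos(theta);
        const double s = std::sin(theta);
        const double x0 = x[i];
        const double x1 = x[i + half];
        x[i]        = static_cast<float>(x0 * c - x1 * s);
        x[i + half] = static_cast<float>(x0 * s + x1 * c);
    }
}

// Hypothetical helper: the same rotation with adjacent pairs (2i, 2i+1),
// shown only for comparison with the NeoX layout above.
void rope_normal(std::vector<float>& x, int pos, float freq_base) {
    const std::size_t d = x.size();
    for (std::size_t i = 0; i < d / 2; ++i) {
        const double inv_freq = std::pow(static_cast<double>(freq_base),
                                         -2.0 * static_cast<double>(i) / static_cast<double>(d));
        const double theta = static_cast<double>(pos) * inv_freq;
        const double c = std::cos(theta);
        const double s = std::sin(theta);
        const double x0 = x[2 * i];
        const double x1 = x[2 * i + 1];
        x[2 * i]     = static_cast<float>(x0 * c - x1 * s);
        x[2 * i + 1] = static_cast<float>(x0 * s + x1 * c);
    }
}
```

Both variants preserve the vector's norm and leave it unchanged at position 0; they differ only in which dimensions are paired. Using the wrong layout for a model therefore produces incorrect query/key rotations without any shape error, which is why this kind of mismatch tends to show up as degraded output rather than a crash.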
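The second bullet describes a forwarding pattern: a limit set on a wrapper must reach every component that builds its own compute graph. Below is a minimal sketch of that pattern, assuming a hypothetical `GraphRunner` base with a `set_max_graph_vram_bytes` setter and a hypothetical `LTXAVEmbedderSketch` wrapper; these names are illustrative and are not stable-diffusion.cpp's actual API.

```cpp
#include <cstddef>

// Hypothetical interface: anything that builds a compute graph and can be
// told the maximum VRAM that graph may use.
struct GraphRunner {
    std::size_t max_graph_vram_bytes = 0;  // 0 = no limit (assumption)
    virtual ~GraphRunner() = default;
    virtual void set_max_graph_vram_bytes(std::size_t bytes) {
        max_graph_vram_bytes = bytes;
    }
};

// Hypothetical sub-components that each allocate their own graph.
struct LLMRunner : GraphRunner {};
struct ProjectionRunner : GraphRunner {};

// Hypothetical embedder owning both sub-components. Overriding the setter
// ensures the limit is forwarded; without the override, the inner runners
// would silently keep their default (unlimited) budget.
struct LTXAVEmbedderSketch : GraphRunner {
    LLMRunner llm;
    ProjectionRunner projection;

    void set_max_graph_vram_bytes(std::size_t bytes) override {
        GraphRunner::set_max_graph_vram_bytes(bytes);
        llm.set_max_graph_vram_bytes(bytes);
        projection.set_max_graph_vram_bytes(bytes);
    }
};
```

Making the setter virtual means callers holding only a base-class pointer to the embedder still trigger the forwarding.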

Related Issue / Discussion

N/A

Additional Information

N/A

Checklist

leejet merged commit d2797b8 into master on May 30, 2026
14 checks passed
wbruna pushed a commit to wbruna/stable-diffusion.cpp that referenced this pull request May 30, 2026
leejet deleted the fix/llm-rope-vram-limit branch on May 31, 2026 17:45
dbrain added a commit to dbrain/hbd-longcat-avatar.cpp that referenced this pull request Jun 9, 2026
Brings the upstream src-layout reorg (leejet#1615 model/, core/, conditioning/,
runtime/, extensions/), the new offload path (leejet#1601 pinned host buffer,
leejet#1576 --stream-layers), vram-limit propagation (leejet#1583), APG/PiD/ideogram4,
and the photomaker->generation-extension move (leejet#1618).

Conflict resolution (4 files):
- model.h / stable-diffusion.cpp: union the fork's LONGCAT_AVATAR version with
  upstream's PiD/Ideogram4; keep the avatar deferred-DiT-load + per-frame
  timestep zeroing, adopt upstream's alloc error-checks + generation-extensions
  alloc loop (pmid is now an extension); keep whisper-encoder alloc.
- conditioner.hpp: keep both set_keep_params_resident + set_stream_layers_enabled.
- ggml_extend.hpp: keep the fork's coherent offload system (lap-32 pinned alloc,
  lap-32.2 H2D pipelining, partial/all-param restore, umT5 free-then-reload null
  fix, lap-28 F16-KV/mask attention) and fold upstream's persistent_externals
  snapshot + observed_max_effective_budget reset alongside; flash_skip_kv_pad
  opt-out coexists with upstream leejet#1453's unconditional kv-pad removal.
- Repointed fork-only headers (longcat_avatar/audio, nava, nava example) at the
  new nested include paths.