memory : fix broken batch splits for recurrent cache #14575

Merged · 1 commit merged into master on Jul 8, 2025

Conversation

compilade
Collaborator

Splits producing more than one ubatch per batch for recurrent models were broken with #14512.

This could cause segfaults and possibly other problems when using any ubatch size smaller than a processed batch with a recurrent model (e.g. Mamba, Mamba-2, etc.).

(I first noticed this when getting a SEGFAULT with Mamba after updating #14139 to a commit after #14512 was merged)

This fixes it by moving the completeness check after the ubatch split loop.
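To illustrate the fix, here is a minimal sketch (not the actual llama.cpp code; `split_batch`, its parameters, and the returned ubatch-size list are hypothetical) of the pattern described above: a loop splits a batch into ubatches, and the completeness check runs only after the loop, so batches that need more than one ubatch are not rejected mid-split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the split pattern: consume a batch of n_tokens
// in ubatches of at most n_ubatch tokens each, returning the ubatch sizes.
static std::vector<uint32_t> split_batch(uint32_t n_tokens, uint32_t n_ubatch) {
    std::vector<uint32_t> ubatch_sizes;
    uint32_t n_done = 0;
    while (n_done < n_tokens) {
        const uint32_t n = std::min(n_ubatch, n_tokens - n_done);
        ubatch_sizes.push_back(n);
        n_done += n;
    }
    // The completeness check belongs here, after the split loop.
    // Checking it inside the loop would fail on any batch that
    // produces more than one ubatch, since the batch is only
    // partially consumed on the earlier iterations.
    assert(n_done == n_tokens);
    return ubatch_sizes;
}
```

For example, a batch of 10 tokens with a ubatch size of 4 splits into three ubatches of sizes 4, 4, and 2; only after the third iteration is the batch fully consumed.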


Splits producing more than one ubatch per batch for recurrent models
were broken with #14512.

This fixes it by moving the completeness check after the ubatch split loop.
@compilade compilade requested a review from ggerganov July 8, 2025 01:28
@compilade compilade added the bugfix fixes an issue or bug label Jul 8, 2025
@ggerganov ggerganov merged commit bb4f7a9 into master Jul 8, 2025
48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 8, 2025
* origin/master:
model : fix hunyuan moe chat template (ggml-org#14584)
model : add SmolLM3 (ggml-org#14581)
memory : fix broken batch splits for recurrent cache (ggml-org#14575)
vulkan : fix rope with partial rotation and non-cont src (ggml-org#14582)
server: Add ability to mount server at prefix (ggml-org#14544)
model : add hunyuan moe (ggml-org#14425)
vulkan: increase timeout for CI (ggml-org#14574)
cuda : fix rope with partial rotation and non-cont src (ggml-org#14580)
CUDA: add bilinear interpolation for upscale (ggml-org#14563)
musa: fix build warnings (unused variable) (ggml-org#14561)
llama : fix incorrect minicpm3 v_states shape (ggml-org#14571)
llama : remove ggml_cont where possible (ggml-org#14568)
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 10, 2025
Splits producing more than one ubatch per batch for recurrent models
were broken with ggml-org#14512.

This fixes it by moving the completeness check after the ubatch split loop.