
Conversation

@leejet
Contributor

leejet commented Oct 23, 2025

This PR fixes an invalid CUDA kernel launch issue for k_compute_batched_ptrs when ne12 or ne13 is large.

  • The previous launch used dim3 block(ne13, ne12), which can exceed the maximum number of threads per block (1024) and trigger cudaErrorInvalidConfiguration.
  • The launch now uses a fixed block size (e.g., 32×32 threads) and computes the required number of blocks in each dimension, so the thread count per block never exceeds the CUDA limit (see the sketch after this list).
  • No changes were made to the kernel logic itself; only the grid/block configuration was adjusted.
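
A minimal, self-contained sketch of the launch-configuration change described above. The kernel below is an illustrative stand-in for k_compute_batched_ptrs; the real ggml-cuda signature, pointer bookkeeping, and element sizes differ, and the bounds check is shown here only because a grid computed this way can over-cover the (ne13, ne12) range.

// Stand-in for k_compute_batched_ptrs: one thread fills one entry of a pointer table.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

__global__ void compute_batched_ptrs_demo(char * base, char ** ptrs,
                                          int64_t ne12, int64_t ne13,
                                          size_t nb12, size_t nb13) {
    const int64_t i13 = blockIdx.x * blockDim.x + threadIdx.x;
    const int64_t i12 = blockIdx.y * blockDim.y + threadIdx.y;
    if (i13 >= ne13 || i12 >= ne12) {
        return; // the grid may over-cover, so out-of-range threads do nothing
    }
    ptrs[i13*ne12 + i12] = base + i13*nb13 + i12*nb12;
}

int main() {
    const int64_t ne12 = 1536, ne13 = 1;     // batch dims that break the old launch
    const size_t  nb12 = 256, nb13 = ne12*nb12;

    char  * base = nullptr;
    char ** ptrs = nullptr;
    cudaMalloc(&base, ne12*ne13*nb12);
    cudaMalloc(&ptrs, ne12*ne13*sizeof(char *));

    // Old launch: dim3 block(ne13, ne12) packs ne12*ne13 threads into a single block;
    // once ne12*ne13 > 1024 the launch fails with cudaErrorInvalidConfiguration.

    // New launch: fixed 32x32 block, grid sized to cover the ne13 x ne12 range.
    const dim3 block_dims(32, 32);
    const dim3 grid_dims((ne13 + block_dims.x - 1) / block_dims.x,
                         (ne12 + block_dims.y - 1) / block_dims.y);
    compute_batched_ptrs_demo<<<grid_dims, block_dims>>>(base, ptrs, ne12, ne13, nb12, nb13);

    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaDeviceSynchronize();
    cudaFree(ptrs);
    cudaFree(base);
    return 0;
}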

leejet requested a review from slaren as a code owner on October 23, 2025
github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Oct 23, 2025
@jeffbolznv
Collaborator

Same comment for this one. Please add a backend test that hits this case. The bug may exist in other backends, too.

@leejet
Contributor Author

leejet commented Oct 23, 2025

The backend test has been added.

github-actions bot added the testing (Everything test related) label on Oct 23, 2025

// test cases with large batch size
test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 8, 256, {1024, 2}, {1, 1}));
test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 8, 256, {4096, 1}, {1, 1}));
Member

Wouldn't a batch size of 1024 be enough to test this?

Contributor Author

1024 happens to be the maximum number of threads per block, so it won’t trigger the issue. I’ve now changed the batch size to 1536, which is smaller than before.
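
For reference, the updated test case presumably ends up along these lines (1536 exceeds the 1024 threads-per-block limit while keeping the tensor small); the exact shape and placement of the final line in the test file may differ:

test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 8, 256, {1536, 1}, {1, 1}));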

@am17an
Collaborator

am17an commented Oct 26, 2025

@JohannesGaessler merge?

JohannesGaessler merged commit bbac6a2 into ggml-org:master on Oct 26, 2025
71 of 72 checks passed
