LaunchConfig parameters #828

richardhboyd · 2025-08-11T14:01:18Z

richardhboyd
Aug 11, 2025

The launch config takes a grid and a block parameter. The block parameter appears to be the number of threads per block. I think this aligns with how the 'regular' launch syntax works in C++, where you specify <<<number_of_blocks, number_of_threads_per_block>>>. When I pass an int to a parameter named block, my first instinct is that I am telling it how many blocks to use, but I'm actually telling it how 'big' each block should be. Then again, if I were creating a block object that took an int parameter, I would assume that int describes how big the block should be.

Does anybody have a mental model that makes this a bit more intuitive? I'm not criticizing the design, I'm just assuming that there's some way of thinking about this that makes the interface feel more intuitive.

Answered by leofang

leofang · 2025-08-17T23:06:01Z

leofang
Aug 17, 2025
Maintainer

Sorry for the late reply. Starting with the introduction of the Hopper GPU (compute capability 9.0), the CUDA programming model gained a new level in the thread hierarchy called "thread block clusters." The new hierarchy goes like this: a grid can have one or more clusters, a cluster can have one or more blocks, and a block can have one or more threads.
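To make the hierarchy concrete, here is a minimal pure-Python sketch that counts how many clusters, blocks, and threads a launch would contain. The function name `hierarchy_totals` is hypothetical, and the sketch assumes each level is sized in units of the level directly below it (grid in clusters, cluster in blocks, block in threads). It does not call any CUDA library.

```python
from math import prod


def hierarchy_totals(grid, cluster, block):
    """Count the units at each level of a grid > cluster > block > thread hierarchy.

    Assumption of this sketch: every argument is a 3-tuple, and each level is
    sized in units of the level directly beneath it.
    """
    clusters = prod(grid)                # clusters in the grid
    blocks = clusters * prod(cluster)    # blocks across all clusters
    threads = blocks * prod(block)       # threads across all blocks
    return {"clusters": clusters, "blocks": blocks, "threads": threads}


totals = hierarchy_totals(grid=(4, 1, 1), cluster=(2, 1, 1), block=(256, 1, 1))
print(totals)
```

Read this way, each name describes the shape of one unit at that level: `block` is the shape of a single block, measured in threads, just as `cluster` is the shape of a single cluster, measured in blocks.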

This presents a new challenge to the traditional CUDA C++ triple chevron syntax, because it does not allow specifying all hierarchical information at once; that is, <<<grid, cluster, block>>>, where grid, cluster, and block are all dim3 objects with integer overloads (so N means (N, 1, 1)), is not supported due to the ambiguity in overload resolution. To work around this, two new syntaxes as outlined here were introduced to allow using either the old launch syntax <<<grid, block>>> or a new one <<<cluster>>>, with the other information specified as kernel attributes.
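The integer shorthand mentioned above (N means (N, 1, 1)) can be sketched in pure Python. The helper name `as_dim3` is hypothetical and only imitates the dim3 convention; it is not part of any CUDA API.

```python
def as_dim3(value):
    """Normalize an int or a 1- to 3-element sequence into a 3-tuple.

    Mirrors the dim3 convention where missing dimensions default to 1,
    so the bare integer N becomes (N, 1, 1).
    """
    dims = (value,) if isinstance(value, int) else tuple(value)
    if not 1 <= len(dims) <= 3:
        raise ValueError("expected between 1 and 3 dimensions")
    if any(not isinstance(d, int) or d < 1 for d in dims):
        raise ValueError("every dimension must be a positive integer")
    # Pad the missing trailing dimensions with 1.
    return dims + (1,) * (3 - len(dims))


print(as_dim3(8))        # bare integer
print(as_dim3((4, 2)))   # two dimensions given
```

Because every level accepts the same shorthand, a purely positional list of bare integers carries no hint about which level each value belongs to. Named parameters avoid that problem.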

In the new projects cuda.core (Python) and cccl-rt (C++), we have a unique opportunity to express the thread hierarchy without these issues, using the launch-config-based approach. It is also future-proof should another hierarchical level be introduced.
