Why is VEC_SIZE determined by 16 bytes in attention_kernels? #14339

Cordius · 2025-03-06T08:25:48Z

A comment in the kernel function paged_attention_kernel says that the vector size is configured so that the threads in a thread group fetch or compute 16 bytes at a time.

Why was 16 bytes chosen?
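For readers unfamiliar with that comment, here is a minimal sketch of the arithmetic it describes, not the kernel's actual code. It assumes the 16 bytes are split evenly across the threads of a group, so each thread's vector holds 16 / (group size * element size) elements. The names `kBytesPerGroupAccess` and `vec_size` are hypothetical, and the clamp to a minimum of 1 is an assumption for the case where the group already covers more than 16 bytes. For context, 16 bytes is also the size of CUDA's 128-bit built-in vector types such as `uint4` and `float4`, which may be related to the choice, though the post itself does not say.

```cpp
#include <cstddef>

// Hypothetical constant: bytes that one thread group fetches or computes
// per step, as described in the comment quoted in the post.
constexpr std::size_t kBytesPerGroupAccess = 16;

// Hypothetical helper: number of elements each thread's vector holds,
// assuming the 16 bytes are divided evenly among the group's threads.
// The result is clamped to at least 1 (an assumption, see lead-in).
constexpr std::size_t vec_size(std::size_t thread_group_size,
                               std::size_t elem_bytes) {
    return (kBytesPerGroupAccess / (thread_group_size * elem_bytes)) > 0
               ? kBytesPerGroupAccess / (thread_group_size * elem_bytes)
               : 1;
}

// 4 threads per group, 2-byte elements (e.g. half): 16 / (4 * 2) == 2.
static_assert(vec_size(4, 2) == 2, "group of 4, 2-byte elements");
// 1 thread per group, 2-byte elements: the thread covers all 16 bytes.
static_assert(vec_size(1, 2) == 8, "group of 1, 2-byte elements");
// 2 threads per group, 4-byte elements (e.g. float): 16 / (2 * 4) == 2.
static_assert(vec_size(2, 4) == 2, "group of 2, 4-byte elements");
```

In each case, group size times vector size times element size comes back to 16 bytes, which is the invariant the comment describes.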
