It'd be nice to have a KernelAbstractions/CPU/CUDA "rosetta stone" in the documentation so that you can start coding in KernelAbstractions quickly if you already know some of the CUDA API.
I guess it'd be something like this:
KernelAbstractions | CPU | CUDA |
---|---|---|
`@index(Local, Linear)` | `mod(i, g)` | `threadIdx().x` |
`@index(Local, Cartesian)[2]` | | `threadIdx().y` |
`@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` |
`@index(Group, Cartesian)[2]` | | `blockIdx().y` |
`groupsize()[3]` | | `blockDim().z` |
`prod(groupsize())` | `g` | `blockDim().x * blockDim().y * blockDim().z` |
workgroup (group) | | thread block (block) |
`@index(Global, Linear)` | `i` | DIY |
`@index(Global, Cartesian)[2]` | | DIY |
local memory (`@localmem`) | | `@cuStaticSharedMem` |
private memory (`@private`) | private to loop body | DIY? `MArray`? "stack allocation"? |
`@uniform` | loop header | no-op? |
`@synchronize` | delimit the loop | `sync_threads()` |
But making the CPU part concise and clear is hard.
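To make the index rows concrete, here is a minimal sketch (not part of the proposal, just illustration) of the same elementwise kernel written once with KernelAbstractions and once with plain CUDA.jl. The names `scale_ka!` / `scale_cuda!` are made up, and the commented launch lines assume the current KernelAbstractions launch API:

```julia
using KernelAbstractions
using CUDA

# KernelAbstractions: the global linear index is provided by the macro.
@kernel function scale_ka!(y, a)
    i = @index(Global, Linear)
    @inbounds y[i] = a * y[i]
end

# CUDA.jl: the global linear index is assembled by hand ("DIY" in the table above).
function scale_cuda!(y, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * y[i]
    end
    return nothing
end

# Usage sketch (recent KernelAbstractions API; older versions return an event to `wait` on):
# y = CUDA.rand(Float32, 1024)
# scale_ka!(CUDABackend(), 256)(y, 2f0; ndrange = length(y))
# @cuda threads=256 blocks=cld(length(y), 256) scale_cuda!(y, 2f0)
```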
(Note to myself: `@uniform` is for denoting "loop header" code that is run once. It's used for simulating GPU semantics on the CPU; ref: JuliaCon 2020 | How not to write CPU code -- KernelAbstractions.jl | Valentin Churavy (16:28))
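For what it's worth, a small made-up example of how the last few rows fit together: `@uniform` as the once-per-workgroup "loop header", `@localmem` as the CUDA-style shared memory, and `@synchronize` as the point where the CPU backend ends one loop over the work-items. `groupsum!` is a name I invented here, and it assumes `ndrange == length(x)` is a multiple of the workgroup size:

```julia
using KernelAbstractions

# Each workgroup sums its slice of `x` into `out[group]`.
@kernel function groupsum!(out, x)
    i  = @index(Global, Linear)
    li = @index(Local, Linear)
    g  = @index(Group, Linear)

    # Evaluated once per workgroup ("loop header" on the CPU backend),
    # not once per work-item.
    N = @uniform prod(groupsize())

    # Workgroup-shared scratch space (≈ @cuStaticSharedMem in CUDA.jl).
    tmp = @localmem eltype(out) (N,)
    tmp[li] = x[i]
    @synchronize   # ≈ sync_threads(); on the CPU this delimits the loop over work-items

    if li == 1
        acc = zero(eltype(out))
        for k in 1:N
            acc += tmp[k]
        end
        out[g] = acc
    end
end

# Usage sketch (recent launch API; older versions return an event to `wait` on):
# x = rand(Float32, 1024); out = zeros(Float32, 1024 ÷ 32)
# groupsum!(CPU(), 32)(out, x; ndrange = length(x))
```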
By the way, after staring at this table for a while, I wonder if it would have been cleaner if `@localmem` were called `@groupmem` and `@private` were called `@localmem`, so that you don't have to use "private" as a term for "more local than local".