It'd be nice to have a KernelAbstractions/CPU/CUDA "rosetta stone" in the documentation so that you can start coding in KernelAbstractions quickly if you already know some of the CUDA API.
I guess it'd be something like this:
KernelAbstractions | CPU | CUDA |
---|---|---|
`@index(Local, Linear)` | `mod(i, g)` | `threadIdx().x` |
`@index(Local, Cartesian)[2]` | | `threadIdx().y` |
`@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` |
`@index(Group, Cartesian)[2]` | | `blockIdx().y` |
`groupsize()[3]` | | `blockDim().z` |
`prod(groupsize())` | `g` | `blockDim().x * blockDim().y * blockDim().z` |
workgroup (group) | | thread block (block) |
`@index(Global, Linear)` | `i` | DIY |
`@index(Global, Cartesian)[2]` | | DIY |
local memory (`@localmem`) | | `@cuStaticSharedMem` |
private memory (`@private`) | private to loop body | DIY? `MArray`? "stack allocation"? |
`@uniform` | loop header | no-op? |
`@synchronize` | delimit the loop | `sync_threads()` |
But making the CPU part concise and clear is hard.
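To make the index rows concrete, here is a minimal sketch (not part of the proposal, just illustration) of the same elementwise kernel written once with KernelAbstractions and once with plain CUDA.jl. The names `scale_ka!` / `scale_cuda!` are made up, and the commented launch lines assume the current KernelAbstractions launch API:

```julia
using KernelAbstractions
using CUDA

# KernelAbstractions: the global linear index is provided by the macro.
@kernel function scale_ka!(y, a)
    i = @index(Global, Linear)
    @inbounds y[i] = a * y[i]
end

# CUDA.jl: the global linear index is assembled by hand ("DIY" in the table above).
function scale_cuda!(y, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * y[i]
    end
    return nothing
end

# Usage sketch (recent KernelAbstractions API; older versions return an event to `wait` on):
# y = CUDA.rand(Float32, 1024)
# scale_ka!(CUDABackend(), 256)(y, 2f0; ndrange = length(y))
# @cuda threads=256 blocks=cld(length(y), 256) scale_cuda!(y, 2f0)
```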
(Note to myself: `@uniform` is for denoting "loop header" code that is run once. It's used for simulating GPU semantics on the CPU; ref: JuliaCon 2020 | How not to write CPU code -- KernelAbstractions.jl | Valentin Churavy (16:28))
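For what it's worth, a small made-up example of how the last few rows fit together: `@uniform` as the once-per-workgroup "loop header", `@localmem` as the CUDA-style shared memory, and `@synchronize` as the point where the CPU backend ends one loop over the work-items. `groupsum!` is a name I invented here, and it assumes `ndrange == length(x)` is a multiple of the workgroup size:

```julia
using KernelAbstractions

# Each workgroup sums its slice of `x` into `out[group]`.
@kernel function groupsum!(out, x)
    i  = @index(Global, Linear)
    li = @index(Local, Linear)
    g  = @index(Group, Linear)

    # Evaluated once per workgroup ("loop header" on the CPU backend),
    # not once per work-item.
    N = @uniform prod(groupsize())

    # Workgroup-shared scratch space (≈ @cuStaticSharedMem in CUDA.jl).
    tmp = @localmem eltype(out) (N,)
    tmp[li] = x[i]
    @synchronize   # ≈ sync_threads(); on the CPU this delimits the loop over work-items

    if li == 1
        acc = zero(eltype(out))
        for k in 1:N
            acc += tmp[k]
        end
        out[g] = acc
    end
end

# Usage sketch (recent launch API; older versions return an event to `wait` on):
# x = rand(Float32, 1024); out = zeros(Float32, 1024 ÷ 32)
# groupsum!(CPU(), 32)(out, x; ndrange = length(x))
```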
By the way, after staring at this table for a while, I wonder if it would have been cleaner if `@localmem` were called `@groupmem` and `@private` were called `@localmem`, so that you don't have to use "private" as a term for "more local than local".