Reduce atomics #1351

angeloskath · 2024-08-23T19:02:19Z

Added a kernel that uses atomics for a single column reduce. It recovers the performance we lost in the refactor, but I don't like that I don't understand why.

In our reduction benchmark this is significantly slower than the looped version. Moreover, the 8-bit and 16-bit versions are painfully slow (~5x slower than the 32-bit version) and the 64-bit version doesn't work at all, so I am routing only the 32-bit types there.

I am not proposing we merge this, but if anybody feels like playing with it and figuring out how a kernel that is 2x faster according to the micro-benchmark can end up slower in use, I'd be very interested to learn why.
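For anyone who wants to experiment, here is a minimal CPU-side sketch in C++ of the two strategies being compared: a looped column reduce and one where workers combine partial sums through atomic adds. It is not the Metal kernel from this PR. The function names `col_reduce_looped` and `col_reduce_atomic`, the row-major layout, and the `num_workers` parameter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Looped variant (hypothetical name): a single pass over the rows that
// accumulates each column sum directly into the output.
std::vector<int32_t> col_reduce_looped(const std::vector<int32_t>& in,
                                       std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols);
  std::vector<int32_t> out(cols, 0);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c] += in[r * cols + c];
  return out;
}

// Atomic variant (hypothetical name): the rows are split across workers.
// Each worker sums its slice privately, then merges the partial result
// into the shared accumulators with one atomic add per column.
std::vector<int32_t> col_reduce_atomic(const std::vector<int32_t>& in,
                                       std::size_t rows, std::size_t cols,
                                       std::size_t num_workers) {
  assert(in.size() == rows * cols);
  assert(num_workers > 0);

  std::vector<std::atomic<int32_t>> acc(cols);
  for (auto& a : acc) a.store(0, std::memory_order_relaxed);

  const std::size_t chunk = (rows + num_workers - 1) / num_workers;
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < num_workers; ++w) {
    const std::size_t begin = std::min(w * chunk, rows);
    const std::size_t end = std::min(begin + chunk, rows);
    workers.emplace_back([&in, &acc, cols, begin, end] {
      std::vector<int32_t> partial(cols, 0);
      for (std::size_t r = begin; r < end; ++r)
        for (std::size_t c = 0; c < cols; ++c)
          partial[c] += in[r * cols + c];
      // Integer addition is order-independent, so relaxed ordering suffices;
      // join() below makes the final values visible to the caller.
      for (std::size_t c = 0; c < cols; ++c)
        acc[c].fetch_add(partial[c], std::memory_order_relaxed);
    });
  }
  for (auto& t : workers) t.join();

  std::vector<int32_t> out(cols);
  for (std::size_t c = 0; c < cols; ++c)
    out[c] = acc[c].load(std::memory_order_relaxed);
  return out;
}
```

The sketch sticks to a 32-bit integer type, mirroring the choice above to route only 32-bit types to the atomic path. It says nothing about GPU performance; it only shows the shape of the two approaches.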

Kernel with atomics for col reductions

6002d77

angeloskath force-pushed the reduce-atomics branch from 2a32d76 to 6002d77 · August 23, 2024 19:26
