Hello,
I'm working on porting clBLAS to my company's accelerator. Our OpenCL library supports a maximum of only 16 work-items per work-group, so I hit the unimplemented case in the kernel generator (solution_seq_make.cpp:getDefaultStepGranulation()), where maxWorkGroupSize < 64 is not supported. I would like to implement this case, but I don't know how to go about it or what algorithm lies behind it. Can anyone help or explain it to me? Thanks in advance.
Quan