We do not yet have a benchmark framework for comparing operations in Numba kernels against raw CUDA C++. One reason we have held off is that we already expect a performance gap until LTO support lands. However, that shouldn't block us from building the benchmark framework.
There are two aspects of performance that we want the benchmarks to capture:
- Infrastructure to benchmark the performance gap between releases. This shows the performance gain we get from optimizing Numbast, Numba, and CUDA over time.
- Infrastructure to benchmark the gap between native CUDA C++ and Numba kernels. This measures the overhead of the additional wrappers we built with Numbast (extra shims, IRs).
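To make the two comparisons concrete, here is a minimal sketch of a timing harness, not a proposed design. It assumes both sides of a comparison can be wrapped as zero-argument Python callables: a previous release vs. the current one, or a native CUDA C++ launch vs. a Numba kernel launch. The names `time_callable`, `overhead_ratio`, and `compare` are hypothetical and do not exist in Numbast. The harness uses only the standard library and does no GPU work itself.

```python
import statistics
import time
from typing import Callable, Dict


def time_callable(fn: Callable[[], object], *, warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time (seconds) of one call to `fn`.

    Warmup calls are discarded so that one-time costs (for a Numba kernel,
    JIT compilation on first launch) do not skew the samples.
    NOTE: GPU launches are asynchronous, so a real kernel wrapper would need
    to synchronize the device inside `fn` before returning.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def overhead_ratio(baseline_s: float, candidate_s: float) -> float:
    """Ratio of candidate time to baseline time; values above 1.0 mean slower."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return candidate_s / baseline_s


def compare(baseline: Callable[[], object], candidate: Callable[[], object],
            **timing_kwargs) -> Dict[str, float]:
    """Time both callables and report the gap between them."""
    baseline_s = time_callable(baseline, **timing_kwargs)
    candidate_s = time_callable(candidate, **timing_kwargs)
    return {
        "baseline_s": baseline_s,
        "candidate_s": candidate_s,
        "ratio": overhead_ratio(baseline_s, candidate_s),
    }
```

The same `compare` call would cover both aspects: pass the previous release's kernel as `baseline` to track progress over time, or pass the native CUDA C++ launch as `baseline` to measure wrapper overhead.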