Target Architecture Optimizations #66
bigwolfeman started this conversation in General
Replies: 1 comment
- interesting
When it comes to optimization, what matters at the end of the day is which kernel path you compile down to.
This means some subset of optimizations must target specific hardware. For instance, NVIDIA's latest Blackwell GPUs support TF32. We may also want to lean on GEMM and GEMV kernels more heavily, and look at how quantizing before training affects training quality. Chinese labs are showing strong performance with INT4. Sketches of enabling TF32 in PyTorch and of a minimal INT4 quantization scheme follow below.
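On the TF32 point, PyTorch already exposes switches that route float32 matmuls onto TF32 tensor cores (available on Ampere and newer GPUs, Blackwell included); a minimal sketch:

```python
import torch

# Opt float32 matmuls and cuDNN convolutions into TF32 tensor-core
# kernels (supported on Ampere and newer NVIDIA GPUs, incl. Blackwell).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Equivalent matmul control via the precision hint (PyTorch >= 1.12):
torch.set_float32_matmul_precision("high")  # "highest" keeps full FP32
```

For the quantize-before-training idea, here is a minimal sketch of symmetric per-group INT4 weight quantization; `quantize_int4` and `dequantize_int4` are hypothetical helpers for illustration, not anything in this repo:

```python
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group INT4 quantization (hypothetical helper).

    Each group of `group_size` weights shares one float scale and is
    rounded onto the INT4 range [-8, 7]. Assumes w.numel() is a
    multiple of group_size.
    """
    groups = w.reshape(-1, group_size)
    scale = groups.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale  # int8 storage holding 4-bit values, plus per-group scales

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Round-trip example: quantization error should stay small relative to w.
w = torch.randn(1024, 128)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s).reshape(w.shape)
```

Measuring the round-trip error of `w_hat` against `w` on real checkpoints would be a cheap first test of how much precision an INT4 path gives up before committing to it in training.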
We also need to consider targeting TPUs. On average, TPUs are cheaper to train on than NVIDIA hardware (roughly 2.5x lower cost per GB of accelerator memory). If we maintain both a TPU path and a CUDA path in the model, we open up the ability to partner with more groups. A device-selection sketch is below.
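As a starting point for the dual path, the backend choice can be isolated behind one helper so the same model code runs on either target. This is a minimal sketch assuming PyTorch/XLA (`torch_xla`) for the TPU side; a real dual path also needs XLA-aware optimizer stepping and input pipelines, which this omits:

```python
import torch

def get_device() -> torch.device:
    """Prefer a TPU (XLA) device when torch_xla is available, else CUDA, else CPU."""
    try:
        import torch_xla.core.xla_model as xm  # present only where torch_xla is installed
        return xm.xla_device()
    except ImportError:
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = get_device()
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
y = model(x)  # the same model code runs on either the TPU or the CUDA path
```

Keeping the backend decision in one place means kernel-level tricks (TF32 flags, quantized matmuls) can stay per-target while the model definition stays shared.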