Add OneDNN or DirectML support #2303
Comments
+1 for oneDNN
cf @rfsaliev
Update: with OpenCL it now takes 40 s to transcribe 47 s of audio on the same hardware (a normal AMD Ryzen 5 4500U). By the way, there were weird issues with OpenCL that prevented it from working; the solution I found is to set
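The speed figures quoted in this thread can be compared via a real-time factor (processing time divided by audio duration; below 1.0 means faster than real time). A small illustration, using the OpenCL numbers above:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1.0 mean transcription outpaces playback."""
    return processing_seconds / audio_seconds

# The OpenCL result above: 40 s of processing for 47 s of audio.
rtf = real_time_factor(40, 47)
print(round(rtf, 2))  # -> 0.85, i.e. slightly faster than real time
```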
Any updates on this?
Use Vulkan; it works fast with AMD GPUs on all platforms. You can try it with the app Vibe.
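For anyone who wants to try the Vulkan backend in whisper.cpp directly rather than through an app, a rough sketch (the exact flag and binary names have changed across whisper.cpp versions, so verify against the README of your checkout; model and audio paths below are placeholders):

```shell
# Build whisper.cpp with the Vulkan backend enabled (requires the Vulkan SDK).
# Note: the option name (GGML_VULKAN) may differ in older releases.
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release

# Transcribe a file on the GPU (the binary was called "main" in older versions).
./build/bin/whisper-cli -m models/ggml-medium.bin -f samples/audio.wav
```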
@thewh1teagle thanks, that looks great, giving it a try!
@thewh1teagle There is a good minute at the start where it says "Transcribing - 0%"; I thought it wasn't working (no CPU/GPU/IO activity). Maybe adding more detailed logging for whatever happens at the beginning would help here.
You can take a look at the repository docs in debug.md.
DirectML support would indeed be nice. It would basically cover 100% of GPUs and NPUs on the Windows platform with one single backend. If there were DirectML support, I presume it would be the preferred universal Windows backend.
There's already Vulkan support and it works great on Windows and Linux. I assume that it uses DirectML behind the scenes on Windows. |
I was meaning to try it and will give it a go. I was mainly worried that it would not be robust yet except with the absolutely latest GPU drivers from all vendors.
I don't think so; it's an entirely different API. DirectML is based on DX12 compute shaders and the DX12 pipeline, whereas I suppose the Vulkan implementation is based directly on custom Vulkan compute shaders.
Currently the best results we can get with whisper.cpp are with CUDA (NVIDIA) or Core ML (macOS).
On Windows there's only OpenBLAS, and it is slow: maybe 2× the duration of the audio (AMD Ryzen 5 4500U, medium model).
When using ctranslate2 on the same machine, it runs 2-3× faster than the audio duration on CPU only!
Since whisper.cpp recently removed support for OpenCL, I think it's important to have a good alternative for Windows users with Intel / AMD CPUs / TPUs.
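For comparison, the backends mentioned above are enabled at build time in whisper.cpp. A hedged sketch (flag names as of recent whisper.cpp versions; verify against the project README before relying on them):

```shell
# NVIDIA GPUs: build with the CUDA backend.
cmake -B build -DGGML_CUDA=1 && cmake --build build --config Release

# macOS: build with Core ML support for Apple Neural Engine acceleration
# (also requires generating a Core ML model as described in the README).
cmake -B build -DWHISPER_COREML=1 && cmake --build build --config Release
```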
There are a few different options that could be added (from the ONNX Runtime execution provider docs):
oneDNN-ExecutionProvider.html
DirectML-ExecutionProvider.html
In addition, ctranslate2 uses ruy.
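For context on how such execution providers are typically chosen at runtime: the caller passes a preference list and the runtime falls back to the first one that is available. The provider names below are real ONNX Runtime identifiers, but `pick_provider` is a hypothetical helper for illustration, not whisper.cpp or ctranslate2 code:

```python
def pick_provider(preferred: list[str], available: list[str]) -> str:
    """Return the first preferred execution provider that is actually
    available, falling back to the CPU provider."""
    for provider in preferred:
        if provider in available:
            return provider
    return "CPUExecutionProvider"

# Preference order a Windows AMD/Intel user might want:
preferred = ["DmlExecutionProvider", "DnnlExecutionProvider", "CPUExecutionProvider"]

# e.g. what onnxruntime.get_available_providers() might report
# on a machine without DirectML:
available = ["DnnlExecutionProvider", "CPUExecutionProvider"]

print(pick_provider(preferred, available))  # -> DnnlExecutionProvider
```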
Related: ggml-org/ggml#406 (comment)