Replies: 8 comments
-
+1 to this, it would be nice to get performance comparable to TensorRT without having to export models to ONNX etc. first!
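For reference, the extra hop being alluded to is roughly the following (a sketch only, not from any particular project; the toy model, shapes, and file names are placeholders):

```python
# Sketch of today's PyTorch -> ONNX -> TensorRT path that a native backend
# would avoid. The tiny model and shapes below are illustrative placeholders.
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_input = torch.randn(1, 3, 224, 224)

# Step 1: export the PyTorch model to ONNX
torch.onnx.export(
    model,
    (example_input,),
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)

# Step 2: hand model.onnx to TensorRT tooling to build a device-specific
# engine, e.g. `trtexec --onnx=model.onnx --saveEngine=model.plan`.
```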
-
@mindbeast @bionictoucan @hietalajulius Hi, thanks for the comment. Yes, that makes sense in general. Right now we are integrating Vulkan into ExecuTorch, since it is a suitable solution for mobile GPUs; enabling mobile use-cases is our primary goal at the moment. We will revisit CUDA, but perhaps in the second half of the year. Curious, what are your current product needs?
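For anyone wanting to try the Vulkan path in the meantime, lowering follows the usual ExecuTorch export flow, roughly as sketched below. Module paths and entry points have shifted between ExecuTorch releases, so treat this as illustrative rather than exact:

```python
# Illustrative sketch of lowering a model to the ExecuTorch Vulkan backend.
# Import paths are an assumption and may differ across ExecuTorch versions.
import torch
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
sample_inputs = (torch.randn(1, 16),)

exported = export(model, sample_inputs)
et_program = to_edge_transform_and_lower(
    exported,
    partitioner=[VulkanPartitioner()],  # delegate supported ops to Vulkan
).to_executorch()

# The resulting .pte file is what the on-device ExecuTorch runtime loads.
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```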
-
Apologies for opening a similar feature request in #5263.
@mergennachin We want to deploy LLMs in cars, but Python-based inference frameworks like vLLM and SGLang are not suitable for edge devices.
Nearly five months have passed; is there any progress on this?
-
Thank you for following up @DzAvril.
I guess this is using a platform similar to Jetson?
No update on a CUDA backend for ET at the moment. We will get back to you here once we plan something.
-
@digantdesai Yes, Jetson Orin for now, and possibly Thor in the future. Looking forward to your update.
-
For mobile cuda backend, does …
-
@DuinoDu My expectation is that compatibility with torch_tensorrt is poor. I expect a more compliant backend in executorch would help a lot of developers.
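For comparison, the torch_tensorrt route looks roughly like this (a sketch only, assuming a CUDA-capable torch_tensorrt install; the toy model is a placeholder). Unsupported operators fall back to PyTorch, which is where the compatibility concerns tend to show up:

```python
# Rough sketch of compiling a module directly with torch_tensorrt,
# the alternative whose op coverage is being questioned above.
import torch
import torch_tensorrt

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval().cuda()
example = torch.randn(1, 3, 224, 224, device="cuda")

# Ops with TensorRT converters run as TensorRT engines; the rest stay in PyTorch.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[example],                   # example input fixes the (static) shape
    enabled_precisions={torch.float},
)
print(trt_model(example).shape)
```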
-
Does it make sense for executorch to have a mobile CUDA backend? There are many edge devices in NVIDIA's Jetson lineup that have a CUDA GPU but would benefit from not having to link an enormous libtorch dependency.