Hi all,
I've been searching for "optimized sparse tensor contractions" for a week and somehow only just found this library... 😄
I'm curious what the current state of CUDA support is, and how onerous you think it would or wouldn't be to integrate this library with PyTorch. In particular, say I have some single large einsum in a PyTorch model that I want to accelerate, something like:
`torch.einsum("zpui,pqrijk,zpqruvw,zqvj->zrwk")`
where some of the tensors are dense and some have sparse dimensions with fixed sparsity structure.
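For concreteness, here is a minimal dense reference for that contraction, with NumPy's `einsum` standing in for `torch.einsum` (the index sizes are arbitrary assumptions, chosen only to illustrate the index structure):

```python
import numpy as np

# Toy sizes for each index (arbitrary assumptions; a real model would be larger)
z, p, q, r = 2, 3, 3, 3
u, v, w = 4, 4, 4
i, j, k = 5, 5, 5

A = np.random.rand(z, p, u, i)           # "zpui"
B = np.random.rand(p, q, r, i, j, k)     # "pqrijk"
C = np.random.rand(z, p, q, r, u, v, w)  # "zpqruvw"
D = np.random.rand(z, q, v, j)           # "zqvj"

# The same contraction, evaluated densely; a sparse-aware kernel from TACO
# would need to produce this result for the structurally nonzero entries.
out = np.einsum("zpui,pqrijk,zpqruvw,zqvj->zrwk", A, B, C, D)
print(out.shape)  # (2, 3, 4, 5)
```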
I'm not worried about automatic differentiation: it would be straightforward to take the symbolic derivatives of einsums like this and feed them to TACO to generate separate compute kernels for the backward pass. So my questions are:
- Is CUDA support mature enough for this kind of application?
- Is it possible to retrieve the generated C/CUDA source from the Python library, so it can be templated into PyTorch C++ extension code (for loading with https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load_inline)?
- How difficult is it to fill the dense part of a mixed dense-sparse TACO tensor from PyTorch tensors?
- Is there any code already out there that works on any of these problems?
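As an aside, the symbolic backward derivation I mentioned above is mostly string manipulation on the einsum spec. A minimal sketch, assuming no index is repeated within a single operand and no index is private to one input (the usual caveats for einsum gradients):

```python
def backward_einsum_specs(spec):
    """Given a forward einsum spec, return one backward spec per input.

    The gradient w.r.t. input n is an einsum of the output gradient with
    all the other inputs, producing input n's index pattern. Assumes no
    repeated indices within one operand and no operand-private indices.
    """
    lhs, out = spec.split("->")
    inputs = lhs.split(",")
    specs = []
    for n, sub in enumerate(inputs):
        others = [s for m, s in enumerate(inputs) if m != n]
        specs.append(",".join([out] + others) + "->" + sub)
    return specs

# Each resulting spec could be handed to TACO to generate a backward kernel.
print(backward_einsum_specs("zpui,pqrijk,zpqruvw,zqvj->zrwk")[0])
# zrwk,pqrijk,zpqruvw,zqvj->zpui
```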
Thanks very much for your help and for making this tool available!