Prerequisites
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
I would like more documentation and shared knowledge about the fact that llama.cpp can be built for CUDA 11.4 and CUDA arch 3.7.
Motivation
I was able to pick up some Tesla K80s for $20 each on eBay for other projects I will be doing. Given their VRAM and overall CUDA performance, and with GPU prices inflating once again, I thought these cards might still be good.
Possible Implementation
I was able to compile and run llama.cpp for the Tesla K80 by downgrading gcc and g++ from 12 to 10, installing NVIDIA driver version 470.256.02 and CUDA toolkit 11.4. I was then able to build it by running cmake with these arguments:
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc -DCMAKE_CUDA_ARCHITECTURES='52;61;70;75;37'
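For anyone trying to reproduce this, here is a rough sketch of the full sequence on a Debian/Ubuntu-style system. The package names, the CUDA install path, and the explicit gcc-10 compiler flags are assumptions on my part; adjust them for your distro.

# CUDA 11.4's nvcc rejects gcc/g++ 12, so point the build at gcc/g++ 10
sudo apt install gcc-10 g++-10
# Install NVIDIA driver 470.256.02 and the CUDA 11.4 toolkit following NVIDIA's
# instructions for your distro, then configure and build llama.cpp against that toolkit:
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_CXX_COMPILER=g++-10 \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc \
      -DCMAKE_CUDA_ARCHITECTURES='52;61;70;75;37'
cmake --build build --config Release -j "$(nproc)"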
While I understand you already support these older cards through Vulkan (very cool, big fan), I find that a lot of performance is left on the table for these older Tesla cards. Running DeepSeek-R1-Distill-Qwen-7B-F16.gguf with Vulkan I was able to achieve around 3 T/s, but with CUDA I got around 5.5 to 6 T/s with just one Tesla K80 that I bought for just $20. And the best part is, I think it could be even faster: I am currently bottlenecked by my CPU (AMD Opteron 6378), as the one core keeping the GPU fed is pinned at 100%.
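If it helps anyone compare, these numbers could be reproduced roughly like this with the bundled llama-bench tool (the binary and model paths are assumptions on my part; -ngl 99 simply offloads all layers to the GPU):

./build/bin/llama-bench -m DeepSeek-R1-Distill-Qwen-7B-F16.gguf -ngl 99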
Also, please do not scoff at the 6 T/s. I am coming at this from the perspective of using AI rather than training it, and 6 T/s is impressively fast for $20, with the option to add more GPUs if you so desire.
Once again, I am not asking for support of clearly deprecated hardware, but rather for discussion of workarounds and bug reports on these old platforms.