
Feature Request: Enable CUDA 11.4 and CUDA arch 3.7 #12140

Open

ChunkyPanda03 opened this issue Mar 2, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@ChunkyPanda03

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I would like more documentation and shared knowledge about the fact that llama.cpp can be built for CUDA 11.4 and CUDA arch 3.7.

Motivation

I was able to pick up some Tesla K80s for $20 each on eBay for other projects I will be doing. Given their VRAM and overall CUDA performance, and with GPU prices now inflating again, I thought these cards might still be good.

Possible Implementation

I was able to compile and run llama.cpp for the Tesla K80 by downgrading gcc and g++ from 12 to 10, installing NVIDIA driver version 470.256.02 and CUDA toolkit 11.4. I was then able to build it by running cmake with these arguments:

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc -DCMAKE_CUDA_ARCHITECTURES='52;61;70;75;37'
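
For reference, here is a minimal sketch of the surrounding steps on a Debian/Ubuntu-style system. The gcc-10 package names and the update-alternatives pinning are just one way to do the downgrade (an assumption on my part, not the only method), and the driver and toolkit installs are assumed to come from NVIDIA's archive:

# Pin gcc/g++ 10 as the host compiler (CUDA 11.4's nvcc rejects newer gcc).
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 100

# Configure with CUDA 11.4's nvcc, including arch 3.7 (Kepler / K80),
# then build.
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc \
      -DCMAKE_CUDA_ARCHITECTURES='52;61;70;75;37'
cmake --build build --config Release -j "$(nproc)"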

While I understand you already support these older cards through Vulkan (very cool, big fan), I find that a lot of performance is left on the table for these older Tesla cards. Running DeepSeek-R1-Distill-Qwen-7B-F16.gguf with Vulkan I was able to achieve around 3 T/s, but with CUDA I got around 5.5 to 6 T/s with just one Tesla K80 that I bought for just $20. And the best part is, I think it could be faster still: I am currently bottlenecked by my CPU (AMD Opteron 6378), as the one core keeping the GPU fed is pinned at 100%.
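
For a concrete measurement, here is a minimal sketch using llama-bench from the build above (the model path assumes the file sits in the current directory; -ngl 99 offloads all layers to the GPU):

# Benchmark prompt processing and token generation on the K80 build.
./build/bin/llama-bench -m DeepSeek-R1-Distill-Qwen-7B-F16.gguf -ngl 99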

Also, please do not scoff at the 6 T/s. I am coming at this from the perspective of running inference, not training, and 6 T/s is impressively fast for $20, with the option to add more GPUs if you so desire.

Once again, I am not asking for support for clearly deprecated hardware, but rather for discussion of workarounds and for bug reports on these old platforms.
