
GPU with GGFU LLM #2429

Open
Znbne opened this issue Oct 25, 2024 · 1 comment

Znbne commented Oct 25, 2024

Hi,
I am running accuracy tests with quantized models. I run the following commands, but even a small test takes a long time, and the NVIDIA GPU apparently is not utilized even though it is available:

python -c "import torch; print(torch.cuda.is_available())"
True

python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf --host 0.0.0.0
lm_eval --model gguf --tasks arc_challenge --output_path ./model/yi/ --device cuda:0 --model_args model=./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf,base_url=http://localhost:8000
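A common cause of this symptom is that the prebuilt `llama-cpp-python` pip wheel is CPU-only, and that the server defaults to offloading zero layers to the GPU. A possible fix, assuming a CUDA toolkit with `nvcc` is installed, is to rebuild the package with CUDA enabled and start the server with `--n_gpu_layers` (this is a sketch, not a confirmed resolution for this issue):

```shell
# Rebuild llama-cpp-python from source with CUDA support
# (requires the CUDA toolkit; older versions used -DLLAMA_CUBLAS=on instead).
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Start the server with layers offloaded to the GPU
# (-1 offloads all layers; pick a smaller number if VRAM is limited).
python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf \
    --host 0.0.0.0 --n_gpu_layers -1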

Any guidance will be appreciated. Thanks!


Znbne commented Oct 26, 2024

Hi again,
I found an issue addressing the slowness of running GGUF models with CUDA: #1437 (comment)
I tried the same method, but still no GPU is utilized on my system.
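One way to confirm whether offload is actually happening is the server's startup log: when CUDA is active, llama.cpp prints a line of the form "offloaded N/M layers to GPU" (the counts depend on the model). A small sketch that checks a captured log for that line (the log excerpts below are hypothetical examples, not output from this issue):

```python
import re

def gpu_layers_offloaded(log_text: str) -> int:
    """Return the number of layers llama.cpp reports as offloaded to GPU (0 if none)."""
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_text)
    return int(m.group(1)) if m else 0

# Hypothetical log excerpts for illustration:
cpu_log = "llm_load_tensors: CPU buffer size = 900.00 MiB"
gpu_log = "llm_load_tensors: offloaded 33/33 layers to GPU"

print(gpu_layers_offloaded(cpu_log))  # 0  -> model is running on CPU only
print(gpu_layers_offloaded(gpu_log))  # 33 -> offload is working
```

If the count is 0 even after rebuilding with CUDA, the installed wheel is likely still a cached CPU-only build.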

Is there any additional information needed from my side? I would appreciate any guidance on the matter. Thanks for your time and support!
