Hi,
I am running some accuracy tests with quantized models. I run the following commands, but even a small test takes a long time, and the NVIDIA GPU is apparently not utilized even though it is available:
python -c "import torch; print(torch.cuda.is_available())"
True
python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf --host 0.0.0.0
lm_eval --model gguf --tasks arc_challenge --output_path ./model/yi/ --device cuda:0 --model_args model=./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf,base_url=http://localhost:8000
Any guidance will be appreciated. Thanks!
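(For reference only, not what was run above: a minimal sketch of how GPU offload is usually requested from llama_cpp.server, assuming llama-cpp-python was installed with CUDA support. The --n_gpu_layers option defaults to 0, i.e. CPU only, and -1 asks for all layers to be placed on the GPU; the CMake flag name varies by release, so treat the install line as an assumption for recent versions.)

# install llama-cpp-python with CUDA enabled (older releases use -DLLAMA_CUBLAS=on instead)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# launch the server with all layers offloaded to the GPU
python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf --host 0.0.0.0 --n_gpu_layers -1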
Hi again,
I have found this issue, which resolves the slowness of running GGUF models with CUDA: #1437 (comment)
I have tried the same method, but no GPU is utilized on my system.
Is there any additional information needed from my side? I would appreciate any guidance on this. Thanks for your time and support!
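(A hedged sketch of how one might narrow this down: check whether the installed wheel was actually built with GPU support, and watch nvidia-smi while the server is handling a request. The llama_supports_gpu_offload call is a low-level binding that recent llama-cpp-python versions re-export; if your version does not expose it, the nvidia-smi check still applies.)

# prints False if the installed llama-cpp-python wheel was built without GPU/CUDA support
# (assumes this low-level binding is exposed by the installed version)
python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"

# watch GPU utilization and memory once per second while a request is in flight
nvidia-smi -l 1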