Hi,
I am running some accuracy tests with quantized models. I run the following commands, but even a small test takes a long time, and the NVIDIA GPU is apparently not utilized even though it is available:
python -c "import torch; print(torch.cuda.is_available())"
True
python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf --host 0.0.0.0
lm_eval --model gguf --tasks arc_challenge --output_path ./model/yi/ --device cuda:0 --model_args model=./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf,base_url=http://localhost:8000
Any guidance will be appreciated. Thanks!
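(For reference only, not what was run above: a minimal sketch of how GPU offload is usually requested from llama_cpp.server, assuming llama-cpp-python was installed with CUDA support. The --n_gpu_layers option defaults to 0, i.e. CPU only, and -1 asks for all layers to be placed on the GPU; the CMake flag name varies by release, so treat the install line as an assumption for recent versions.)

# install llama-cpp-python with CUDA enabled (older releases use -DLLAMA_CUBLAS=on instead)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# launch the server with all layers offloaded to the GPU
python -m llama_cpp.server --model ./model/Yi-Coder-1.5B-Chat-Q4_0_4_4.gguf --host 0.0.0.0 --n_gpu_layers -1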
Hi again,
I have found this issue, which resolves the slowness of running GGUF models with CUDA: #1437 (comment)
I have tried the same method, but no GPU is utilized on my system.
Is there any additional information needed from my side? I would appreciate any guidance on this. Thanks for your time and support!
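(A hedged sketch of how one might narrow this down: check whether the installed wheel was actually built with GPU support, and watch nvidia-smi while the server is handling a request. The llama_supports_gpu_offload call is a low-level binding that recent llama-cpp-python versions re-export; if your version does not expose it, the nvidia-smi check still applies.)

# prints False if the installed llama-cpp-python wheel was built without GPU/CUDA support
# (assumes this low-level binding is exposed by the installed version)
python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"

# watch GPU utilization and memory once per second while a request is in flight
nvidia-smi -l 1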