Linux GPU usage maxed at 15%? #1401
JustPlaneTastic started this conversation in General
-
So, first off, I love privateGPT. Great work is going on here. Thank you.
I am attempting to make use of it, but things have been quite slow for me. I've read through a few issues, done multiple reinstalls, and tested a few models, but I can't quite get the performance I'm looking for. I'm hoping I'm just missing something rather than hitting a functional issue.
These are the commands I run to set it up:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
PGPT_PROFILES=local make run
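As a sanity check, the cuBLAS build can be confirmed outside privateGPT; a one-liner like the following (the model path is just a placeholder) should print the same BLAS = 1 banner when the model loads:
poetry run python -c "from llama_cpp import Llama; Llama(model_path='models/your-model.gguf', n_gpu_layers=-1)"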
privateGPT loads with BLAS = 1 and shows that it sees the following details for the card:
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 70.42 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 4095.06 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 4000
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 500.00 MiB, K (f16): 250.00 MiB, V (f16): 250.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 284.88 MiB
llama_new_context_with_model: VRAM scratch buffer: 281.82 MiB
llama_new_context_with_model: total VRAM used: 4376.88 MiB (model: 4095.06 MiB, context: 281.82 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
When I test prompts and questions I regularly see the CPU pegged, but nvtop shows at most 15% usage of my GPU, and most of the time it is in the single digits.
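For a second data point alongside nvtop, nvidia-smi can poll GPU utilization and VRAM once per second while a prompt is running:
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1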
I've seen references to .env files and privateGPT.py adjustments, but I am not seeing those files in my installation, so I cannot edit them. Surely there is a config adjustment I am missing.
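From what I can tell, the .env and privateGPT.py files belong to the older codebase; this version is configured through settings.yaml plus profile overrides such as settings-local.yaml, which PGPT_PROFILES=local selects. As a rough sketch only (key names vary between versions, so verify against the settings.yaml in your checkout), a local profile looks something like:
# settings-local.yaml -- illustrative sketch, not authoritative
llm:
  mode: local
local:
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
  embedding_hf_model_name: BAAI/bge-small-en-v1.5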
Any pointers would be appreciated.
Thanks
Before writing this question I read through these, as well as the documentation for privateGPT:
#217
#425
maozdemir#2
-
Replies: 1 comment
-
Hi, I'm on Windows but have the same issue. Any chance you found a solution? Thanks