
I have 2 GPUs, each 24 GB, yet I get "CUDA out of memory" error #27

Open
mmdrahmani opened this issue Feb 6, 2025 · 1 comment

mmdrahmani commented Feb 6, 2025

Hi again

I have 2 GPUs, each with 24 GB. I am using the lit-llama 7B model. I tried to use both GPUs by setting the number of devices, fabric = L.Fabric(accelerator=accelerator, devices=2), but the model only uses one GPU (cuda:0); the other one is not used at all, and I get this error:
CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 23.68 GiB total capacity; 23.03 GiB already allocated; 142.56 MiB free; 23.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Am I doing something wrong? How can I make the model use both GPUs?

Thank you so much for your support.


41xu commented Feb 10, 2025

I'm not the author and haven't tried 24 GB GPUs, but maybe this is helpful for you 👀
Lightning-AI/lit-llama#191
Set bf16-mixed or bfloat16 precision (depending on your script).
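
For reference, a minimal sketch of how this could look with Lightning Fabric. This is not from the lit-llama scripts themselves; the strategy and precision values here are assumptions you may need to adapt, and sharding with FSDP is one way (not the only way) to spread the 7B weights across two cards:

```python
import lightning as L

# Sketch (assumed settings, not the lit-llama defaults):
# shard the model across both 24 GB GPUs and run in bf16,
# so a single card does not have to hold all weights in fp32.
fabric = L.Fabric(
    accelerator="cuda",
    devices=2,
    strategy="fsdp",          # shard parameters across the two GPUs
    precision="bf16-mixed",   # or "bf16-true", depending on your script
)
fabric.launch()

# model = ...                 # build the lit-llama model as in your script
# model = fabric.setup(model) # wraps/shards the model for the chosen strategy
```

Note that devices=2 alone only makes both GPUs visible; without a sharding strategy (or model parallelism), each process still tries to fit the whole model on its own GPU, which is why GPU 0 runs out of memory while GPU 1 sits idle.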
