GPU memory size #9

Open
DZ1204 opened this issue Feb 22, 2025 · 1 comment
Comments

DZ1204 commented Feb 22, 2025

[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 290.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 101.31 MiB is free. Including non-PyTorch memory, this process has 23.57 GiB memory in use. Of the allocated memory 22.14 GiB is allocated by PyTorch, and 510.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Is this training failure caused by not having enough GPU memory?
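
To make the numbers in the error message easier to read, here is a minimal sketch that pulls the sizes out of the message text and does the arithmetic. The function name `summarize_oom` and the dictionary keys are hypothetical, chosen only for illustration. The regular expressions assume the exact wording shown in the traceback above, which may differ between PyTorch versions.

```python
import re

# Excerpt of the error message quoted in this issue.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 290.00 MiB. "
    "GPU 0 has a total capacity of 23.69 GiB of which 101.31 MiB is free. "
    "Of the allocated memory 22.14 GiB is allocated by PyTorch, "
    "and 510.78 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}
SIZE = r"([\d.]+) (MiB|GiB)"  # a number followed by its unit


def _find_mib(pattern: str, message: str) -> float:
    """Return the size matched by `pattern`, converted to MiB."""
    match = re.search(pattern, message)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_MIB[unit]


def summarize_oom(message: str) -> dict:
    """Hypothetical helper: extract the key sizes from an OOM message."""
    requested = _find_mib(r"Tried to allocate " + SIZE, message)
    total = _find_mib(r"total capacity of " + SIZE, message)
    free = _find_mib(SIZE + r" is free", message)
    allocated = _find_mib(SIZE + r" is allocated by PyTorch", message)
    reserved = _find_mib(SIZE + r" is reserved by PyTorch but unallocated", message)
    return {
        "requested_mib": requested,
        "free_mib": free,
        "allocated_mib": allocated,
        "reserved_unallocated_mib": reserved,
        # How much more free memory the failed allocation would have needed.
        "shortfall_mib": requested - free,
        # Share of the card already held by live PyTorch tensors.
        "allocated_fraction": allocated / total,
    }


if __name__ == "__main__":
    for key, value in summarize_oom(MESSAGE).items():
        print(f"{key}: {value:.2f}")
```

Read this way, the request for 290.00 MiB exceeded the 101.31 MiB that was free, and PyTorch already held 22.14 GiB of the 23.69 GiB card in live allocations. The 510.78 MiB of reserved-but-unallocated memory is larger than the request, which is presumably why the message suggests `expandable_segments:True`, but with the card this full that setting alone may not be enough.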

Collaborator

WZDTHU commented Mar 16, 2025

Yes. You can reduce the batch size, or use a smaller model.
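
One way to act on the "reduce the batch size" advice is to halve the batch size until a training step fits. The following is a minimal sketch of that idea, not this project's own training code. `find_fitting_batch_size`, `run_step`, and `fake_step` are hypothetical names, and the built-in `MemoryError` stands in for the GPU out-of-memory exception so the example runs without a GPU.

```python
from typing import Callable, Type


def find_fitting_batch_size(
    run_step: Callable[[int], None],
    start: int,
    oom_error: Type[BaseException] = MemoryError,
    minimum: int = 1,
) -> int:
    """Halve the batch size until `run_step` stops raising `oom_error`.

    `run_step` is a hypothetical callable that runs one training step
    with the given batch size and raises `oom_error` if it does not fit.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except oom_error:
            # The step did not fit: try again with half the batch size.
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")


def fake_step(batch_size: int) -> None:
    """Stand-in for a real training step: pretend only 12 samples fit."""
    if batch_size > 12:
        raise MemoryError("simulated out-of-memory")


if __name__ == "__main__":
    # Tries 64, 32, 16, then succeeds at 8.
    print(find_fitting_batch_size(fake_step, start=64))
```

In a real PyTorch run you would pass `oom_error=torch.OutOfMemoryError` (the exception type shown in the traceback above) and a `run_step` that performs an actual forward and backward pass. If even the minimum batch size does not fit, the remaining option from the reply is a smaller model.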
