Can't get multi-gpu to work anymore #14948
Unanswered
EvanZ asked this question in DDP / multi-GPU / multi-node
-
@EvanZ We don't have a version upgrade guide at this time. For now, I would suggest upgrading PL one minor version at a time. For example, if you're using ... I am also interested in how the degradation happened in your case. Would it be feasible for you to share your code and environment details here so that I (or someone else) can point out possible causes in your code?
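One way to gather the environment details requested above is a short script like the one below. It is only a sketch: the helper name `collect_versions` is hypothetical, and the package names `pytorch-lightning` and `lightning` are assumptions about how Lightning was installed. If torch is installed, `python -m torch.utils.collect_env` should give a more complete report.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "pytorch-lightning", "lightning")):
    """Return the Python/platform info plus installed versions of `packages`."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Reads the version from installed package metadata (no import needed).
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running this before and after each minor-version upgrade makes it easier to report which version first shows the problem.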
-
At one point I was able to run my model on 4 GPUs on a single machine, but since upgrading to the most recent versions of torch and lightning, I am getting shared memory errors like this:
unable to open shared memory object </torch_4121_699393955_8164> in read-write mode: Too many open files (24)
Is there a tutorial or any docs that explain what changes I need to make to my code to bring it up to date?
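A workaround commonly suggested for this "Too many open files" error, not confirmed as the fix in this thread, is to raise the process's open-file limit or to switch PyTorch's tensor sharing strategy to `file_system`. The sketch below assumes a Unix-like system. The helper names `raise_open_file_limit` and `use_file_system_sharing` and the target of 4096 are illustrative, not part of torch or Lightning.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= target:
        return soft, hard  # already high enough, nothing to do
    new_soft = target if hard == unlimited else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # the OS refused the new limit; keep the current one
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing():
    """Switch torch's sharing strategy to 'file_system' if torch is installed."""
    try:
        import torch.multiprocessing as mp
    except ImportError:
        return None  # torch is not available in this environment
    mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy()


if __name__ == "__main__":
    print("open-file limits:", raise_open_file_limit())
    print("sharing strategy:", use_file_system_sharing())
```

Both calls would need to run at the top of the training script, before any DataLoader workers are created. The shell equivalent of the first step is `ulimit -n 4096`.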