Multiple Jobs on Multiple GPUs #18799
Unanswered
tommycwh
asked this question in
DDP / multi-GPU / multi-node
Replies: 0 comments
I need to train many independent models, and I would like a process that trains them one after another automatically. I have multiple GPUs, so I want to train several models at the same time, one per GPU; assume each model occupies exactly one GPU. Right now I do this manually: whenever a GPU becomes idle, I start a training job on it myself using the Lightning CLI.
Since there are many models to train, I am wondering whether it is possible to start a "master" process that automatically launches the next training job whenever a GPU is free. Is this possible with Lightning, or is it not something Lightning is intended to handle?
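To make the idea concrete, here is a minimal sketch of such a "master" process in plain Python. It is not a Lightning feature: the function name `run_jobs` and its parameters are hypothetical, and it assumes that the listed GPUs are used only by jobs it launches itself (it does not detect GPUs occupied by other programs). Each job is pinned to one GPU through the `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os
import subprocess
import time
from collections import deque


def run_jobs(commands, gpu_ids, poll_seconds=1.0):
    """Run each command on its own GPU, starting the next one when a GPU frees up.

    `commands` is a list of argument lists (as accepted by subprocess.Popen).
    Returns a list of (command, gpu_id, return_code) in completion order.
    Assumption: no other program is using the GPUs in `gpu_ids`.
    """
    pending = deque(commands)
    free_gpus = deque(gpu_ids)
    running = {}  # gpu_id -> (command, Popen handle)
    finished = []

    while pending or running:
        # Start jobs while there is both a free GPU and a pending command.
        while pending and free_gpus:
            gpu = free_gpus.popleft()
            cmd = pending.popleft()
            # The child process sees only this one GPU.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            running[gpu] = (cmd, subprocess.Popen(cmd, env=env))

        # Collect jobs that have exited and release their GPUs.
        for gpu, (cmd, proc) in list(running.items()):
            code = proc.poll()
            if code is not None:
                finished.append((cmd, gpu, code))
                del running[gpu]
                free_gpus.append(gpu)

        if running:
            time.sleep(poll_seconds)

    return finished
```

With a Lightning CLI script, the commands could look like `["python", "main.py", "fit", "--config", "model_a.yaml"]`, where `main.py` and the config file names are placeholders for your own files. Calling `run_jobs(commands, gpu_ids=[0, 1, 2, 3])` would then keep four jobs running until the list is exhausted. Because each child sees only one GPU, its trainer configuration would request a single device.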