Run Trainer.fit multiple times under DDP mode #12401
Unanswered
xmlyqing00 asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment 3 replies
-
can you try it with |
-
Hi,
I have a machine learning architecture project that requires modifying the network structure multiple times. I implemented it with PyTorch Lightning. The overall structure is as follows.
In the model definition, I omit `training_step` and `validation_step` for clarity. The following main script shows that I want to update the network structure and retrain the model over 10 iterations.
When `iter == 1`, the model has already been propagated to the different GPUs, so `model.add()` results in different models across processes. So I added a flag to make sure the modification happens only in the main process.
But this time, the program gets stuck when `iter == 1`. My questions are:
Thanks for your time. Any comments or suggestions are welcome.
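One plausible explanation for the hang, offered as an assumption rather than a diagnosis: if only the main process changes the model, the ranks no longer agree on how many parameters to synchronize, and a collective call waits for a peer that never arrives. The toy simulation below uses threads and `threading.Barrier` as stand-ins for DDP processes and all-reduce calls. It is not real DDP, and `run_rank` and `simulate` are hypothetical names.

```python
import threading

WORLD_SIZE = 2  # two threads stand in for two DDP processes


def run_rank(rank, num_layers, barrier, results):
    """Pretend to all-reduce one gradient per layer; every rank must join each call."""
    try:
        for _ in range(num_layers):
            # Stand-in for a collective: blocks until all ranks have arrived.
            barrier.wait(timeout=1.0)
        results[rank] = "finished"
    except threading.BrokenBarrierError:
        # The short timeout exists only for this demo; without one,
        # the rank would simply keep waiting and the job would look stuck.
        results[rank] = "stuck"


def simulate(layers_per_rank):
    """Run one thread per rank; layers_per_rank[i] is rank i's layer count."""
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}
    threads = [
        threading.Thread(target=run_rank, args=(rank, n, barrier, results))
        for rank, n in enumerate(layers_per_rank)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(simulate([2, 2]))  # both ranks have the same structure
print(simulate([3, 2]))  # only rank 0 added a layer
```

With matching structures both ranks finish. When only rank 0 has the extra layer, rank 0 waits on a third collective that rank 1 never joins. Under this assumption, the remedy is to apply the same modification on every rank instead of guarding it with a main-process flag.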