Correct usages of "find unused parameters" with DDP #14763
Unanswered
fstmsn asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi there,
Usually I use the DDP strategy with `find_unused_parameters=False`, because I'm sure that all the parameters of my model are used in computing the loss function.
Now, I'm training a neural network that is composed of multiple modules (take two as an example).
In this case, the second module takes part of its input from the first module.
What I'm trying to do is:
1. Train the first module for x epochs, while keeping the second module frozen.
2. Then freeze the first module and train the second one for x epochs.
3. Start again from step 1.
Each module has its own optimizer/scheduler.
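To make the setup concrete, here is a minimal sketch of the scheme in plain PyTorch + DDP (assuming a two-GPU `torchrun` launch; the module sizes, the data, and the epoch counts are placeholders, and the schedulers are omitted for brevity):

```python
# Sketch only: launched with e.g. `torchrun --nproc_per_node=2 train.py`.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
torch.cuda.set_device(device)

class TwoStageNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.first = nn.Linear(32, 16)   # placeholder for the first module
        self.second = nn.Linear(16, 1)   # placeholder for the second module

    def forward(self, x):
        # The second module consumes the output of the first.
        return self.second(self.first(x))

model = TwoStageNet().to(device)
# All parameters require grad at wrap time, so with find_unused_parameters=False
# DDP expects a gradient for every one of them on every backward pass.
ddp_model = DDP(model, device_ids=[device.index], find_unused_parameters=False)

opt_first = torch.optim.Adam(model.first.parameters(), lr=1e-3)
opt_second = torch.optim.Adam(model.second.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

x_epochs = 3  # placeholder for "x epochs"
for cycle in range(2):
    for active, frozen, opt in [
        (model.first, model.second, opt_first),
        (model.second, model.first, opt_second),
    ]:
        set_requires_grad(active, True)
        set_requires_grad(frozen, False)  # freezing after the DDP wrap is what triggers the error
        for epoch in range(x_epochs):
            x = torch.randn(8, 32, device=device)
            y = torch.randn(8, 1, device=device)
            loss = loss_fn(ddp_model(x), y)
            opt.zero_grad()
            loss.backward()  # the frozen module's parameters receive no gradients here
            opt.step()

dist.destroy_process_group()
```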
The problem is that with DDP, this gives me an error about unused parameters.
One solution is of course to set `find_unused_parameters=True`, but this slows training down a lot.
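For reference, this is how I toggle the flag in my runs (a minimal sketch; Lightning's `DDPStrategy` forwards the keyword argument to `DistributedDataParallel`, and the other `Trainer` arguments are omitted):

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    # Forwarded to torch.nn.parallel.DistributedDataParallel.
    strategy=DDPStrategy(find_unused_parameters=True),  # False in my usual runs
)
```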
I have tried to set `requires_grad=False` for all parameters of the second module, and also to set `module.training = False`, but this does not seem to help.
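Concretely, the freezing attempt looks like this (a sketch, where `model.second` stands in for the second module):

```python
# Freeze the second module's parameters and put it into eval mode.
for p in model.second.parameters():
    p.requires_grad = False
model.second.eval()  # sets module.training = False
```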
Do you have any suggestions on what is the best way to proceed?
Could you explain to me in more detail what happens when I set `find_unused_parameters` to False/True?