Correct usages of "find unused parameters" with DDP #14763
Unanswered
fstmsn asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi there,
Usually I use the DDP strategy with `find_unused_parameters=False`, because I'm sure that all the parameters of my model are used in computing the loss function.
Now, I'm training a neural network that is composed of multiple modules (take two as an example).
In this case, the second module takes part of its input from the first module.
What I'm trying to do is:
1. Train the first module for x epochs, while keeping the second module frozen.
2. Then freeze the first module and train the second one for x epochs.
3. Start again from step 1.
Each module has its own optimizer/scheduler.
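To make the setup concrete, here is a minimal sketch of the scheme in plain PyTorch + DDP (assuming a two-GPU `torchrun` launch; the module sizes, the data, and the epoch counts are placeholders, and the schedulers are omitted for brevity):

```python
# Sketch only: launched with e.g. `torchrun --nproc_per_node=2 train.py`.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
torch.cuda.set_device(device)

class TwoStageNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.first = nn.Linear(32, 16)   # placeholder for the first module
        self.second = nn.Linear(16, 1)   # placeholder for the second module

    def forward(self, x):
        # The second module consumes the output of the first.
        return self.second(self.first(x))

model = TwoStageNet().to(device)
# All parameters require grad at wrap time, so with find_unused_parameters=False
# DDP expects a gradient for every one of them on every backward pass.
ddp_model = DDP(model, device_ids=[device.index], find_unused_parameters=False)

opt_first = torch.optim.Adam(model.first.parameters(), lr=1e-3)
opt_second = torch.optim.Adam(model.second.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

x_epochs = 3  # placeholder for "x epochs"
for cycle in range(2):
    for active, frozen, opt in [
        (model.first, model.second, opt_first),
        (model.second, model.first, opt_second),
    ]:
        set_requires_grad(active, True)
        set_requires_grad(frozen, False)  # freezing after the DDP wrap is what triggers the error
        for epoch in range(x_epochs):
            x = torch.randn(8, 32, device=device)
            y = torch.randn(8, 1, device=device)
            loss = loss_fn(ddp_model(x), y)
            opt.zero_grad()
            loss.backward()  # the frozen module's parameters receive no gradients here
            opt.step()

dist.destroy_process_group()
```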
The problem is that with DDP, this gives me an error about unused parameters.
One solution is of course to set `find_unused_parameters=True`, but this slows training down a lot.
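For reference, this is how I toggle the flag in my runs (a minimal sketch; Lightning's `DDPStrategy` forwards the keyword argument to `DistributedDataParallel`, and the other `Trainer` arguments are omitted):

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    # Forwarded to torch.nn.parallel.DistributedDataParallel.
    strategy=DDPStrategy(find_unused_parameters=True),  # False in my usual runs
)
```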
I have tried to set `requires_grad=False` for all parameters of the second module, and also to set `module.training = False`, but this does not seem to help.
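Concretely, the freezing attempt looks like this (a sketch, where `model.second` stands in for the second module):

```python
# Freeze the second module's parameters and put it into eval mode.
for p in model.second.parameters():
    p.requires_grad = False
model.second.eval()  # sets module.training = False
```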
Do you have any suggestions on what is the best way to proceed?
Could you explain to me in more detail what happens when I set `find_unused_parameters` to False/True?