Using DataParallel wrapper #10396
Unanswered
w2kun asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 1 reply
-
Dear @w2kun, this is more complicated :) Lightning doesn't natively support hybrid mechanisms right now unless you implement them yourself. Would you mind sharing more details on what you want to achieve? If you are trying to scale your batch size, you could use DeepSpeed + precision=16 + activation checkpointing, and this should enable you to scale to a very large batch size. If that is not enough, you could use … Best,
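As a rough sketch of that suggestion, assuming the Lightning 1.5-era Trainer API: `LitModel`, its layer sizes, and the checkpointed block below are placeholders, not taken from this discussion.

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.checkpoint import checkpoint


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.block1 = torch.nn.Linear(1024, 1024)
        self.block2 = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        x = self.block1(x)
        # Activation checkpointing: recompute block2's activations in the
        # backward pass instead of storing them, trading compute for memory.
        return checkpoint(self.block2, x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


trainer = pl.Trainer(
    gpus=2,                        # spread training across GPUs
    strategy="deepspeed_stage_2",  # DeepSpeed ZeRO stage 2
    precision=16,                  # mixed precision to cut activation memory
)
# trainer.fit(LitModel(), train_dataloader)
```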
1 reply
-
In my experimental setup, I need multi-GPU training in a certain training stage to increase the batch size. Is it correct to set the gpus flag in Trainer to 1 (I don't want to use any accelerator) and wrap some submodules with DataParallel when multi-GPU training is needed?
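For illustration, a minimal sketch of the hybrid setup described here; this is not an officially supported Lightning pattern (per the answer above), `HybridModel` and its layers are placeholders, and the DataParallel wrapper would be entirely user-managed.

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class HybridModel(pl.LightningModule):
    def __init__(self, use_multi_gpu: bool = False):
        super().__init__()
        encoder = torch.nn.Linear(512, 512)
        if use_multi_gpu and torch.cuda.device_count() > 1:
            # Replicate only this submodule across all visible GPUs; its input
            # is scattered and its output gathered back on the primary device.
            encoder = torch.nn.DataParallel(encoder)
        self.encoder = encoder
        self.head = torch.nn.Linear(512, 1)

    def forward(self, x):
        return self.head(self.encoder(x))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Lightning itself is told about a single GPU; only the wrapped submodule
# fans out to the other devices during its forward pass.
trainer = pl.Trainer(gpus=1)
# trainer.fit(HybridModel(use_multi_gpu=True), train_dataloader)
```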