Multi-GPU DDP - How is the dataset distributed across the GPUs? #13342
Answered by akihironitta
KevinCrp asked this question in DDP / multi-GPU / multi-node
Hi, I'm using several GPUs with the Distributed-Data-Parallel strategy, and I want to know how the global dataset is split across all the GPUs. Is it split iteratively? Is it distributed randomly? Is it done some other way? I looked at the `DistributedSampler` class, but I didn't find the answer.
Answered by akihironitta on Jun 21, 2022
Replies: 1 comment
I believe this line in the PyTorch code explains it all: `indices = indices[self.rank:self.total_size:self.num_replicas]`
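
For illustration, here is a minimal sketch (not the actual `DistributedSampler` source) of what that strided slice does, assuming 2 replicas and a dataset of 8 samples. Each rank takes every `num_replicas`-th index starting at its own rank, so the samples are dealt out round-robin over the (optionally shuffled) index list:

```python
# Sketch of how indices[rank:total_size:num_replicas] shards a dataset.
# With shuffle=True, DistributedSampler first permutes the indices with a
# seeded generator, so the round-robin split happens over a shuffled order.
num_replicas = 2           # number of GPUs / processes (assumed for this example)
indices = list(range(8))   # [0, 1, 2, 3, 4, 5, 6, 7]; possibly shuffled first
total_size = len(indices)  # padded to be divisible by num_replicas if needed

for rank in range(num_replicas):
    print(rank, indices[rank:total_size:num_replicas])
# 0 [0, 2, 4, 6]
# 1 [1, 3, 5, 7]
```

Note that when you train with the DDP strategy, Lightning wraps your dataloader in a `DistributedSampler` automatically (unless you disable that behaviour), so each GPU only sees its own shard in a given epoch.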
Answer selected by KevinCrp