While training OccWorld with the provided training command for the OccWorld model, the first epoch completes successfully, but then I get the following error:
Traceback (most recent call last):
  File "/scratch/p24cs0005/OccWorld/train.py", line 362, in <module>
    main(0, args)
  File "/scratch/p24cs0005/OccWorld/train.py", line 327, in main
    val_miou, _ = CalMeanIou_sem._after_epoch()
  File "/scratch/p24cs0005/OccWorld/utils/metric_util.py", line 96, in _after_epoch
    dist.all_reduce(self.total_seen)
  File "/csehome/p24cs0005/miniconda3/envs/occworld/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1451, in wrapper
    return func(*args, **kwargs)
  File "/csehome/p24cs0005/miniconda3/envs/occworld/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1699, in all_reduce
    default_pg = _get_default_group()
  File "/csehome/p24cs0005/miniconda3/envs/occworld/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 707, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Why does this error occur, given that it only appears after the first epoch of training has finished and evaluation has run?
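For reference, the traceback shows `dist.all_reduce(self.total_seen)` in `_after_epoch` (utils/metric_util.py) being reached while no default process group exists, which appears consistent with `main(0, args)` running as a single process without a prior `torch.distributed.init_process_group` call. A minimal sketch of a guard, assuming the metric only needs to be reduced when running distributed; `reduce_if_distributed` is a hypothetical helper, not part of the OccWorld code:

```python
import torch
import torch.distributed as dist


def reduce_if_distributed(tensor: torch.Tensor) -> torch.Tensor:
    """Sum `tensor` across ranks in place, but only if a process group exists.

    Hypothetical helper: in a single-process run (no init_process_group call)
    the tensor is returned unchanged instead of raising
    "Default process group has not been initialized".
    """
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(tensor)  # default reduction op is SUM
    return tensor


# Single-process demo: no process group is created, so this is a no-op.
total_seen = torch.tensor([3.0, 5.0])
print(reduce_if_distributed(total_seen))
```

With a guard like this, a single-GPU run would compute the metric from its local counts only; in a multi-process launch the counts would still be summed across ranks.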
Also, the GitHub repo provides a link to pretrained weights, which downloads a file named latest.pth. Is this checkpoint for the OccWorld model or for the VQVAE model?
I ask because when I trained the VQVAE myself, the output directory out/vqvae also contains a latest.pth file. And if I place the downloaded latest.pth in out/occworld, can I evaluate it by running the evaluation command?
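One way to check which model a latest.pth belongs to is to list the parameter names it stores and compare them with those in the out/vqvae/latest.pth from your own VQVAE run. A minimal sketch, assuming the file is either a raw state_dict or a dict nesting one under a key such as "state_dict" or "model" (a guess; the actual layout of the released checkpoint is not confirmed here). `checkpoint_param_names` is a hypothetical helper:

```python
import os
import tempfile

import torch


def checkpoint_param_names(path: str) -> list:
    """Return the parameter names stored in a checkpoint file.

    Assumption: the checkpoint is a raw state_dict, or a dict that nests the
    state_dict under "state_dict" or "model".
    """
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict):
        for key in ("state_dict", "model"):
            if key in ckpt and isinstance(ckpt[key], dict):
                ckpt = ckpt[key]
                break
    return list(ckpt.keys())


# Demo on a toy checkpoint; point `path` at latest.pth to inspect the real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pth")
    torch.save({"state_dict": torch.nn.Linear(2, 1).state_dict(), "epoch": 1}, demo_path)
    print(checkpoint_param_names(demo_path))  # ['weight', 'bias']
```

If the key sets of the downloaded file and your VQVAE checkpoint largely match, the download is likely a VQVAE checkpoint; if they differ, it is more likely the OccWorld model. Newer PyTorch versions default `torch.load` to `weights_only=True`, so a checkpoint containing arbitrary Python objects may need `weights_only=False` (only for trusted files).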