How can I run training_step on GPU (DDP) and validation_step on CPU? #15742
Unanswered
YooSungHyun asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
My data is very large, so I can only train with a batch size of 1 on my GPUs (DDP), and `validation_step` still blows up with a CUDA out-of-memory (OOM) error.

So I want to run `logits = self(input_data)` on the CPU in `validation_step`. I use the torchmetrics kwarg `compute_on_cpu=True` and also `move_metrics_to_cpu=True`, but then `self.log("train_loss", loss, sync_dist=True)` in my `training_step` raises an error, something like "Tensor must be CUDA ..." (I don't have the exact message).

How can I solve this?
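For illustration, here is a minimal sketch of the kind of CPU forward pass described above. It is written against plain PyTorch only, has not been tested under DDP, and `forward_on_cpu` is a hypothetical helper name, not a Lightning or torchmetrics API. It temporarily moves the module's weights to the CPU, runs an inference-only forward pass, and then restores the original device and train/eval mode.

```python
import torch
from torch import nn


def forward_on_cpu(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: run one gradient-free forward pass on the CPU.

    The weights are moved to the CPU in place and moved back afterwards,
    so this trades speed for GPU memory.
    """
    original_device = next(model.parameters()).device
    was_training = model.training
    model.to("cpu").eval()
    try:
        with torch.no_grad():
            logits = model(batch.to("cpu"))
    finally:
        # Restore the device and mode so later GPU steps are unaffected.
        model.to(original_device)
        model.train(was_training)
    return logits


if __name__ == "__main__":
    # Small CPU-only demonstration with a stand-in model.
    demo_model = nn.Linear(4, 2)
    demo_batch = torch.randn(3, 4)
    demo_logits = forward_on_cpu(demo_model, demo_batch)
    print(demo_logits.shape, demo_logits.device)
```

Inside `validation_step` this would be called as `logits = forward_on_cpu(self, input_data)`. One possible explanation for the error, offered as an assumption rather than a diagnosis: the NCCL backend only reduces CUDA tensors, so a value that ends up on the CPU cannot be synced with `sync_dist=True`. If that is the cause, moving the value back first, for example `self.log("val_loss", loss.to(self.device), sync_dist=True)`, may avoid it. Moving the weights on every validation step is slow, so this is a workaround to try, not a recommended pattern.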