Test results different between logging in test_step and logging in test_epoch_end #10517
-
Why do these two test codes result in different test results (both average acc and average loss)?

```python
# Variant 1: log per-step values and let Lightning aggregate over the epoch.
def test_step(self, batch, batch_idx):
    input_ids, labels = batch
    outs = self(input_ids)
    loss = self.loss_fn(outs, labels)
    acc = self.acc_fn(outs, labels)
    self.log_dict({'test_loss': loss, 'test_acc': acc}, on_step=False, on_epoch=True, logger=False)
    return loss, acc
```

```python
# Variant 2: collect step outputs and average them manually in test_epoch_end.
def test_step(self, batch, batch_idx):
    input_ids, labels = batch
    outs = self(input_ids)
    loss = self.loss_fn(outs, labels)
    acc = self.acc_fn(outs, labels)
    return loss, acc

def test_epoch_end(self, step_outputs):
    avg_loss = torch.stack([x[0] for x in step_outputs]).mean()
    avg_acc = torch.stack([x[1] for x in step_outputs]).mean()
    self.log_dict({'test_loss': avg_loss, 'test_acc': avg_acc}, logger=False)
```

I only use one GPU for testing.
Replies: 2 comments
-
It seems to be because the last batch may have a different number of samples.
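To make that concrete, here is a small self-contained sketch (with made-up accuracy values and batch sizes) showing how a plain mean over per-batch accuracies drifts away from the sample-weighted mean when the last batch is smaller:

```python
import torch

# Hypothetical per-batch accuracies: three full batches of 32 samples
# and one final partial batch of 8 samples.
batch_accs = torch.tensor([0.90, 0.80, 0.85, 0.50])
batch_sizes = torch.tensor([32., 32., 32., 8.])

# Plain mean over batches, as in the manual test_epoch_end variant above.
plain_mean = batch_accs.mean()                                         # 0.7625

# Sample-weighted mean, i.e. the true per-sample accuracy.
weighted_mean = (batch_accs * batch_sizes).sum() / batch_sizes.sum()   # ~0.8231

print(plain_mean.item(), weighted_mean.item())
```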
-
Just in case anyone else sees this discussion, adding more context to @Bowen-n's answer: within Lightning, a weighted average is used to accumulate the results at the end of the epoch, where the weights are the batch size of each batch seen inside `test_step`.
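If you want the manual `test_epoch_end` variant to reproduce that behaviour, one option is to carry the batch size through the step outputs and do the weighting yourself. This is only a sketch, assuming the batch size can be read from `labels.size(0)`:

```python
import torch

def test_step(self, batch, batch_idx):
    input_ids, labels = batch
    outs = self(input_ids)
    loss = self.loss_fn(outs, labels)
    acc = self.acc_fn(outs, labels)
    # Also return the batch size so the epoch-end hook can weight correctly.
    return loss, acc, labels.size(0)

def test_epoch_end(self, step_outputs):
    losses = torch.stack([x[0] for x in step_outputs])
    accs = torch.stack([x[1] for x in step_outputs])
    sizes = torch.tensor([x[2] for x in step_outputs],
                         dtype=torch.float, device=losses.device)
    # Weight each batch by its number of samples, mirroring the weighted
    # average that on_epoch=True logging applies inside test_step.
    avg_loss = (losses * sizes).sum() / sizes.sum()
    avg_acc = (accs * sizes).sum() / sizes.sum()
    self.log_dict({'test_loss': avg_loss, 'test_acc': avg_acc}, logger=False)
```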