aggregating confusion matrices to calculate accuracy #17789
Unanswered
taloy42
asked this question in
DDP / multi-GPU / multi-node
Replies: 0 comments
Goal
I want to calculate the confusion matrix $C_g$ on each GPU, sum them into $C=\sum_g C_g$, use $C$ to calculate the accuracy, and log it using
self.log_dict(accuracies_from_confmat(C))
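For reference, here is a minimal sketch of what a helper like `accuracies_from_confmat` might look like. The body is hypothetical: it assumes `C[i, j]` counts samples of true class `i` predicted as class `j`, and the metric key names (`acc/overall`, `acc/class_<i>`) are made up for illustration.

```python
import torch


def accuracies_from_confmat(confmat: torch.Tensor) -> dict:
    """Hypothetical helper: overall and per-class accuracy from a confusion matrix.

    Assumes confmat[i, j] counts samples of true class i predicted as class j.
    """
    confmat = confmat.to(torch.float64)
    correct = confmat.diag()              # correctly classified count per class
    per_class_total = confmat.sum(dim=1)  # number of true samples per class
    total = confmat.sum()                 # total number of samples

    # clamp(min=1) avoids division by zero for an empty matrix or an absent class
    metrics = {"acc/overall": (correct.sum() / total.clamp(min=1)).item()}
    per_class = correct / per_class_total.clamp(min=1)
    for idx, value in enumerate(per_class.tolist()):
        metrics[f"acc/class_{idx}"] = value
    return metrics
```

Because the counts in $C$ are plain sums, computing accuracy from the summed matrix gives the accuracy over the whole dataset, which is not generally the same as averaging per-GPU accuracies.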
Setup
Current Situation
Right now I am using the following code:
Wanted Behaviour
I would like to do something like
so as to sum the matrices from all the GPUs into one, and then log the data only on rank 0.
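A minimal sketch of that behaviour, under some assumptions: the function name `sum_confmat_across_ranks` and the attribute `self.confmat` are hypothetical, and the per-GPU matrix is assumed to have the same shape and dtype on every rank and to live on that rank's own device.

```python
import torch
import torch.distributed as dist


def sum_confmat_across_ranks(local_confmat: torch.Tensor) -> torch.Tensor:
    """Return the element-wise sum of every rank's confusion matrix.

    Falls back to a plain copy when no process group is initialized
    (e.g. single-GPU or CPU runs).
    """
    total = local_confmat.clone()  # keep the per-rank matrix untouched
    if dist.is_available() and dist.is_initialized():
        # all_reduce is a collective, in-place operation: every rank must
        # call it, and afterwards every rank holds the same summed tensor.
        dist.all_reduce(total, op=dist.ReduceOp.SUM)
    return total


# Possible use inside a LightningModule hook such as on_validation_epoch_end
# (self.confmat is a hypothetical attribute holding this rank's matrix):
#
#     C = sum_confmat_across_ranks(self.confmat)
#     self.log_dict(accuracies_from_confmat(C), rank_zero_only=True)
```

Since every rank ends up with the same $C$ after the reduction, the reduction itself must not be guarded by a rank check; only the logging call is restricted to rank 0.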
Attempts
I have tried to use
torch.distributed.all_reduce
but I got a CUDA memory error.