Hi! Why is the default number of frames so low (32, as far as I can see)? The VoxCeleb dataset was not preprocessed to remove silent segments, so many parts of the training data contain only silence. Accuracy improves when I use a larger number of frames (of course, that is not only because of the silent segments). Did you run any experiments with different numbers of frames?