Hi there, I've recently been trying to reproduce this SEGAN model and have run into some questions.
The biggest question is about the discriminator's loss function. The original GAN's discriminator performs binary classification, so it uses a sigmoid at the last output layer and binary cross-entropy as the loss. This model's discriminator instead seems to be doing regression: the loss minimizes the distance between its outputs and 1 (or 0). So I suspect the discriminator contributes nothing to the final performance, and that minimizing the L1 loss between clean speech and generated speech is what makes the whole system work.
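To make the contrast concrete, here is a minimal sketch (assuming PyTorch; `d_real`, `d_fake`, `enhanced`, `clean`, and `lambda_l1` are hypothetical placeholders, not names from this repo) of the vanilla GAN binary cross-entropy objective versus the least-squares (LSGAN-style) objective that SEGAN's discriminator uses:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw discriminator scores on a batch of 8 examples
d_real = torch.randn(8, 1)  # scores for (clean, noisy) pairs
d_fake = torch.randn(8, 1)  # scores for (enhanced, noisy) pairs

# Vanilla GAN: sigmoid + binary cross-entropy (binary classification)
bce_d_loss = (
    F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
)

# SEGAN-style least-squares loss: regress real scores toward 1, fake scores toward 0
ls_d_loss = 0.5 * ((d_real - 1.0).pow(2).mean() + d_fake.pow(2).mean())

# Generator side: push D's score on enhanced speech toward 1, plus the weighted L1 term
# g_loss = 0.5 * (d_fake - 1.0).pow(2).mean() + lambda_l1 * F.l1_loss(enhanced, clean)
```

Both objectives still give the generator a gradient for fooling the discriminator; the least-squares form just replaces the sigmoid/BCE classification target with a regression target.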
So I discarded the discriminator and trained only the generator for speech enhancement, and it gives performance very close to SEGAN's. When only the generator is trained, the model can be seen as a de-noising autoencoder.
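Roughly, that discriminator-free baseline can be sketched like this (toy PyTorch example; the tiny conv generator and random data are placeholders, not the actual SEGAN architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the generator: a 1-D conv encoder/decoder over raw waveform chunks
generator = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15), nn.PReLU(),
    nn.ConvTranspose1d(16, 1, kernel_size=31, stride=2, padding=15, output_padding=1),
)
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)

noisy = torch.randn(4, 1, 16384)   # fake batch of noisy waveform chunks
clean = noisy * 0.5                # fake "clean" targets, just for the sketch

for step in range(10):
    enhanced = generator(noisy)
    loss = F.l1_loss(enhanced, clean)  # pure L1 reconstruction, no adversarial term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```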
Finally, I'm a bit confused about how much the discriminator contributes to the final performance during the adversarial process, because for the speech enhancement task we are not really 'generating' but rather 'mapping' the noisy signal to the clean signal.
Many thanks!
I think the GAN loss contributes the high-frequency band. Without the GAN loss, an MSE or L1 loss doesn't capture enough high-frequency information, because the high frequencies carry comparatively little power.
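To illustrate that intuition, here is a small numpy sketch (synthetic tones, not real speech) showing that an MSE loss barely penalizes dropping a low-power high-frequency component:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
# Synthetic "clean" signal: strong low-frequency tone plus a weak high-frequency tone
clean = 1.0 * np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 6000 * t)

# Output A: drops the high-frequency component entirely
out_drop_high = 1.0 * np.sin(2 * np.pi * 200 * t)
# Output B: keeps both bands but has a 5% gain error on the low-frequency tone
out_low_gain_err = 0.95 * np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 6000 * t)

mse_a = np.mean((clean - out_drop_high) ** 2)
mse_b = np.mean((clean - out_low_gain_err) ** 2)
print(f"MSE when the whole high band is missing: {mse_a:.6f}")
print(f"MSE with a 5% low-band gain error      : {mse_b:.6f}")
# Both come out around 0.00125: losing all the high-frequency detail costs no more
# than a tiny low-band gain error, so the loss gives little incentive to keep it.
```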