
Implementing weight clipping #1

Open
PatrykChrabaszcz opened this issue Feb 2, 2017 · 7 comments

Comments

@PatrykChrabaszcz

In TensorFlow I just do this for weight clipping:

t_vars = tf.trainable_variables()
critic_vars = [var for var in t_vars if 'crit' in var.name]
self.clip_critic = []
for var in critic_vars:
    self.clip_critic.append(tf.assign(var, tf.clip_by_value(var, -0.1, 0.1)))
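For reference, the same operation can be sketched framework-free in NumPy: after each critic update, every weight array is clipped elementwise to [-c, c]. The helper below is illustrative and not part of either repo; c=0.1 mirrors the bound in the TensorFlow snippet above.

```python
import numpy as np

def clip_weights(weights, c=0.1):
    """Elementwise clip of every critic weight array to [-c, c] -- the
    WGAN weight-clipping step applied after each optimizer update.
    (Illustrative helper; c=0.1 mirrors the snippet above.)"""
    return [np.clip(w, -c, c) for w in weights]

# Toy check: values outside [-0.1, 0.1] saturate at the bound.
w = [np.array([-0.5, 0.05, 0.3])]
clipped = clip_weights(w)
```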

Here is my repo where I try to implement WGAN: https://github.com/PatrykChrabaszcz/WGan
Did you get any good results?
This is what I get for mnist:
[image: res_20 — MNIST samples]

@ConnorJL
Owner

ConnorJL commented Feb 3, 2017

Wow, that is a much much nicer implementation! Thanks so much! My code now easily runs 20-50x faster. I've added the fix to the repo.

I haven't trained a good model for MNIST, because honestly I think that MNIST is too simple to really show whether an image generation technique is good or not. I'm still running my code on a self collected dataset of ImageNet-like images. We'll see how it goes.

@PatrykChrabaszcz
Author

I was not able to train WGAN on CelebA resized to 32x32. It gave me worse results than a standard GAN.
Probably experimenting with hyperparameters would help.

[image: res_10 — CelebA samples]

@ConnorJL
Owner

ConnorJL commented Feb 3, 2017

Very interesting! The paper said WGANs should be able to avoid this kind of mode collapse if I remember correctly, so this is definitely worth investigating. I'm going to pause my high res experiment for a while and run some tests on MNIST, CelebA and CIFAR, if I can find the time. Might take me a day or two to get representative results.

@PatrykChrabaszcz
Author

PatrykChrabaszcz commented Feb 3, 2017

I'll run it again with a slightly different architecture and the original hyperparameter settings. The image I posted was from one of my experiments; I think it wasn't using the original settings.

OK, this is what I got so far, and I think it's still training:
[image: res_10 — CelebA samples, original settings]

@ConnorJL
Owner

ConnorJL commented Feb 4, 2017

Ah, those are some nice results! Did you find any culprit hyperparameter, or was the network just undertrained? I was curious about the report that a simple MLP architecture could produce good results using WGAN, so I ran it on MNIST; take a look:

[image: test_490000 — MNIST samples from MLP WGAN]

I'm pretty impressed by the quality: fully connected neural nets have a bad reputation nowadays, and training only took a few hours on my consumer-grade computer. But for some reason it seems dead set on occasionally producing totally black images, and I'm not quite sure why. I'll probably try it on CIFAR next and see what happens.
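For context, the MLP generator discussed here can be sketched as a plain feed-forward net: a few ReLU hidden layers, no batch norm, and a tanh output mapping into pixel range. The layer sizes and initialization below are illustrative assumptions, not the exact architecture from the repo or the paper.

```python
import numpy as np

def init_params(sizes, rng):
    """Small random (W, b) pairs for each consecutive pair of layer
    sizes. (Illustrative sizes and init scale, not from the repo.)"""
    return [(0.05 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_generator(z, params):
    """Forward pass of a simple ReLU MLP generator without batch norm;
    tanh output keeps pixels in [-1, 1]."""
    h = z
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    W, b = params[-1]
    return np.tanh(h @ W + b)

# Sample a batch of 8 "images" from 64-dim noise (28*28 = 784 outputs).
rng = np.random.default_rng(0)
params = init_params([64, 512, 512, 784], rng)
samples = mlp_generator(rng.standard_normal((8, 64)), params)
```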

@PatrykChrabaszcz
Author

I don't remember exactly now, but I think I first ran the network with the originally proposed learning hyperparameters; the network was similar to DCGAN but had 4x fewer features in each layer. It gave me images with something face-like. Then I experimented with different settings and got those strange images you can see in my second post. Then I went back to the original settings but changed the network structure (adding more kernels, changing batch norm), and now I see something like this:
[image: res_10 — CelebA samples, modified architecture]

I got much better faces using an Adversarial Autoencoder + feature matching with a GAN, so to me these look quite bad.
Here are samples from AAE+GAN:

[image: aaegan — CelebA samples from AAE+GAN]

Those black images are strange. Did you try to find which part of latent space produces black images?
Do you "switch batch norm off" during sampling after you train?
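One way to act on that suggestion is a linear sweep through latent space: interpolate between two noise vectors and run each intermediate point through the generator to see where outputs turn black. A minimal NumPy sketch of the interpolation step (the generator call itself is omitted; `steps` and the vector size are illustrative):

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    """Linear interpolation between two latent vectors. Feeding each
    row through the generator helps localize a region of latent space
    that degenerates into black images (if it is contiguous)."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z_a + a * z_b for a in alphas])

# Path from one random latent code to another.
rng = np.random.default_rng(1)
path = interpolate_latents(rng.standard_normal(100), rng.standard_normal(100))
```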

@ConnorJL
Owner

ConnorJL commented Feb 4, 2017

Hmm, yeah, the results aren't bad, but they aren't significantly better than a normal GAN's; AAE+GAN does look much nicer. I'm beginning to wonder where WGAN has larger benefits. From what I understand (and I'm not the greatest expert), it may be useful for stabilizing training in difficult domains, so maybe a test on ImageNet or similar with a larger DCGAN architecture would show its benefits. I was quite surprised it got my tiny MLP model to produce decent results, but as I said before, MNIST is a very simple dataset. More testing tomorrow.

The MLP model as per the original paper actually doesn't use batch norm, so that isn't it. I might look into it a little more tomorrow.
