Monday, May 22, 2017

Using Moments to Improve & Stabilize Generative Adversarial Network (GAN) Learning (Pt 2)

This post continues where we left off previously. In the first part, we had promising results from using the first and second central moments (mean and variance) of the real data to guide the "fake data" from a generator network toward the distribution of the real data, giving it a better chance of producing good results. We will now switch to a convolutional network structure and test on the same MNIST dataset.


In this second test, we use the same convolutional network structure throughout and instead vary the learning rate. A known problem with deep learning in general is the need to tune hyperparameters in order to obtain good results. It is reasonable to expect that minimizing the difference between the distribution of the real data and that of the generated data provides a greater margin of error: by stabilizing the distribution of the generated data, the network should train successfully over a wider range of hyperparameters. We shall experiment and find out. The full implementation for this project can be found at: https://github.com/ogreen8084/moment_stabilization_dconv
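To make the idea concrete, here is a minimal sketch of the moment-matching term in TensorFlow 1.x. It is not taken from the linked repository, and the tensor names are illustrative; in the actual model, fake_images would be the generator's output tensor rather than a placeholder, and the term would be added (with some weight) to the usual generator loss.

import tensorflow as tf

# Batches of flattened 28x28 MNIST images: real data and generated data.
real_images = tf.placeholder(tf.float32, [None, 784], name="real_images")
fake_images = tf.placeholder(tf.float32, [None, 784], name="fake_images")

# Per-pixel mean and variance, computed over the batch dimension.
real_mean, real_var = tf.nn.moments(real_images, axes=[0])
fake_mean, fake_var = tf.nn.moments(fake_images, axes=[0])

# Penalize the squared difference between the first and second moments
# of the real batch and the generated batch.
moment_loss = (tf.reduce_mean(tf.square(real_mean - fake_mean)) +
               tf.reduce_mean(tf.square(real_var - fake_var)))

Adding moment_loss to the generator's adversarial loss pushes each generated batch toward the per-pixel mean and variance of the real batch, which is the stabilization effect we are testing here.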

Dependencies:
Python 3.5.1
Tensorflow 1.0.1
Numpy
Matplotlib
Pickle
Pandas

Tests:

In all tests, we optimize with AdamOptimizer. We create our "fake MNIST dataset" from an initial input of 100 dimensions drawn from a uniform distribution between -1 and 1. The generator DeConvNet has a fully connected layer of 1,024 units followed by a fully connected layer of 7*7*256 = 12,544 units. This layer is reshaped to 7x7x256 and fed to a conv2d_transpose layer with 32 filters and finally to a conv2d_transpose layer with a single filter. Batch normalization is used throughout, padding is "SAME", and the stride is two for each conv2d_transpose layer. We again use a batch size of 100, but we train for only 20 epochs, as opposed to the 100 epochs used for the feed-forward network.
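For reference, here is a rough sketch of a generator with that shape using TensorFlow 1.x layers. The kernel sizes and activation functions are not specified above and are chosen only for illustration, so read this as an approximation of the architecture rather than the repository's exact code.

import tensorflow as tf

def generator(z, is_training=True):
    # Fully connected layer of 1,024 units with batch normalization.
    h1 = tf.layers.dense(z, 1024)
    h1 = tf.nn.relu(tf.layers.batch_normalization(h1, training=is_training))

    # Fully connected layer of 7*7*256 = 12,544 units, reshaped to a 7x7x256 map.
    h2 = tf.layers.dense(h1, 7 * 7 * 256)
    h2 = tf.nn.relu(tf.layers.batch_normalization(h2, training=is_training))
    h2 = tf.reshape(h2, [-1, 7, 7, 256])

    # conv2d_transpose with 32 filters, stride 2, SAME padding -> 14x14x32.
    h3 = tf.layers.conv2d_transpose(h2, 32, kernel_size=5, strides=2, padding="same")
    h3 = tf.nn.relu(tf.layers.batch_normalization(h3, training=is_training))

    # Final conv2d_transpose with a single filter, stride 2 -> 28x28x1 image.
    out = tf.layers.conv2d_transpose(h3, 1, kernel_size=5, strides=2, padding="same")
    return tf.nn.tanh(out)

# Example input: a batch of 100 latent vectors of dimension 100, drawn from U(-1, 1).
z = tf.random_uniform([100, 100], minval=-1.0, maxval=1.0)
fake_images = generator(z)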


Test #1:

Test #2:




Test #3:



Results: 

It's safe to say that moment stabilization speeds up training in each test. In test #3, with the smallest learning rate, 0.0001, the non-stabilized model struggles to train at all, while the moment-stabilized model is able to produce digit-like figures. In test #1, there is also a clear advantage from moment stabilization: the model generates digit-like figures more quickly, and those figures remain crisper than the non-stabilized model's throughout training. In test #2, the performance of the two models is closest, which could support the theory that if the model is already well tuned, moment stabilization has less of an effect.

Conclusion:

We have further evidence that minimizing the difference between the mean and variance of the generated data and the real data can improve GAN performance and can potentially help models train successfully with less fine-tuning of hyperparameters.


Benefits and Concerns:
1. The method appears to work, but we still need to understand mathematically why it works.
2. Can the method also allow us to obtain good results with a smaller, less complex network?

Next Steps:
1. Test on a smaller DeConvNet with MNIST
2. Test on the CelebA dataset
