In this second test, we will use the same convolutional network structure throughout and instead alter the learning rate. A known problem with deep learning in general is the need to tune hyperparameters in order to obtain good results. It would be reasonable to expect that penalizing the difference between the distributions of the real data and the generated data gives the network a greater margin of error. In other words, by stabilizing the distribution of the generated data, it should be easier to train the network over a wider range of hyperparameters. We shall experiment and find out. The full implementation for this project can be found at: https://github.com/ogreen8084/moment_stabilization_dconv
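To make the idea concrete, here is a minimal sketch of what such a moment penalty could look like in TensorFlow. The weighting factor lam and the tensor names real_images and fake_images are placeholders for illustration; the exact formulation used in these tests is in the repository linked above.

import tensorflow as tf

def moment_penalty(real_images, fake_images, lam=1.0):
    # Penalize the gap between the first two moments (mean and variance)
    # of a real batch and a generated batch. `lam` and the tensor names
    # are illustrative assumptions, not the exact values used in the repo.
    real_mean, real_var = tf.nn.moments(real_images, axes=[0])
    fake_mean, fake_var = tf.nn.moments(fake_images, axes=[0])
    mean_loss = tf.reduce_mean(tf.square(real_mean - fake_mean))
    var_loss = tf.reduce_mean(tf.square(real_var - fake_var))
    return lam * (mean_loss + var_loss)

# The penalty is then added to the usual generator loss, for example:
# g_loss = g_gan_loss + moment_penalty(real_images, fake_images)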
Dependencies:
Python 3.5.1
Tensorflow 1.0.1
Numpy
Matplotlib
Pickle
Pandas
Tests:
In all tests, we optimize with AdamOptimizer. We create our “fake MNIST dataset” from an initial 100-dimensional input drawn from a uniform distribution between -1 and 1. The generator DeConvNet has a fully connected layer of 1,024 units followed by a fully connected layer of 7*7*256 = 12,544 units. This layer is then reshaped and fed to a conv2d_transpose layer with 32 filters and finally to a conv2d_transpose layer with a single filter. Batch normalization is used throughout, the padding is "SAME", and the stride is two for each conv2d_transpose layer. We again use a batch size of 100, but we train for only 20 epochs, as opposed to the 100 epochs used for the feed-forward network. A sketch of this generator is shown below.
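For readers who want to follow along, here is a minimal sketch of that generator using the TensorFlow 1.x layers API. The layer widths follow the description above, but the kernel sizes, the tanh output activation, and the variable names are assumptions for illustration; the exact implementation is in the linked repository.

import tensorflow as tf

def generator(z, training=True):
    # z: [batch, 100] noise drawn from U(-1, 1)
    # Kernel sizes, output activation, and scope name are illustrative
    # assumptions; layer widths follow the text above.
    with tf.variable_scope('generator'):
        h = tf.layers.dense(z, 1024)
        h = tf.nn.relu(tf.layers.batch_normalization(h, training=training))
        h = tf.layers.dense(h, 7 * 7 * 256)
        h = tf.nn.relu(tf.layers.batch_normalization(h, training=training))
        h = tf.reshape(h, [-1, 7, 7, 256])
        # 7x7x256 -> 14x14x32, stride 2, "SAME" padding
        h = tf.layers.conv2d_transpose(h, filters=32, kernel_size=5,
                                       strides=2, padding='same')
        h = tf.nn.relu(tf.layers.batch_normalization(h, training=training))
        # 14x14x32 -> 28x28x1
        return tf.layers.conv2d_transpose(h, filters=1, kernel_size=5,
                                          strides=2, padding='same',
                                          activation=tf.nn.tanh)

# Example usage with a batch size of 100:
# z = tf.random_uniform([100, 100], minval=-1.0, maxval=1.0)
# fake_images = generator(z)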
Test #1:
Test #2:
Test #3:
Results:
It's safe to say that moment stabilization speeds up training in each test. In test #3, with the smallest learning rate, 0.0001, the non-stabilized model struggles to train at all, while the moment-stabilized model is able to create digit-like figures. In test #1, there is also a clear advantage from moment stabilization: the model generates digit-like figures more quickly, and those figures remain crisper than the non-stabilized model's throughout training. In test #2, the performance of the two models is closest, which could lend support to the theory that if the model is already well tuned, moment stabilization has less of an effect.
Conclusion:
We have further evidence that penalizing the difference between the mean and variance of the generated dataset and the real dataset can increase the performance of GANs, and can potentially help models train successfully with less fine-tuning of hyperparameters.
Benefits and Concerns:
1. The method appears to work, but we need to understand mathematically why it works.
2. Can the method also allow us to obtain good results with a smaller, less complex network?
Next Steps:
1. Test on smaller DeConvNet with MNIST
2. Test on the CelebA dataset