SUPER-RESOLUTION SRCNN
TensorFlow Tutorial: Part 1
I hate small images!
I do a lot of graphic design stuff in my free time and I'm always pulling images off the net and using them in my work. There's no worse feeling than finding the perfect image only to have its resolution be too small to be of any use. Have you ever tried to use Photoshop, GIMP, or another image editor to resize an image and make it larger? If so, you know firsthand the disappointment that comes with trying to upscale an image. Whether it's bicubic interpolation, spline interpolation, or Lanczos resampling, no matter how fancy the upscaling method sounds, the image still comes out blurry and filled with artifacts, noise, and/or serrated edges. Fed up one day, I decided that I would scour the internet until I found a solution. No matter how long it took, I was determined to find a better way to upscale images. Well, it only took 0.36 seconds. The first Google search result was for a website called Let's Enhance, a free online image upscaling and enhancement service. The results were amazing, and they do it by using deep convolutional neural networks (ConvNets).
SR-CNNs
Well, why didn't I think of that? Just feed a bunch of downscaled images into a neural network, use the original full-resolution images as the targets, and voilà: Super-Resolution. I had recently created a database of shoe images for another project I'm working on, so I figured I would just take those images, downscale them, run them through a Deconvolutional Network (DeconvNet), and have crisp images again. It should have been a piece of cake. I got OK results, but clearly I was missing something. Although my images were less blurry, because I was attempting to upsample a smaller image using deconvolution, my newly upscaled images were covered with checkerboard artifacts. Rather than try to figure it out myself, I decided to do another search online for articles on established methods of Super-Resolution. In the end, I found three successful ways to upscale images with ConvNets and used one of those methods as inspiration to come up with a fourth. This first part of the series covers what is probably the most popular method of Super-Resolution, or at least the one that usually comes up first when you do a Google search.
The first good article I found on my Super-Resolution quest was Learning a Deep Convolutional Network for Image Super-Resolution by Chao Dong, et al. Anyone interested in this topic should read this paper end to end. I won't get into it much here, but it should be a very easy read for anyone who has a bit of exposure to neural networks. They take a rather shallow ConvNet of only three layers and a training set of only 91 images and produce stellar results. I didn't use their framework exactly; most significantly, they work in the YCbCr color space and I just use RGB, but I borrowed heavily from it. Since I use TensorFlow and they implemented their network in MATLAB, I found a TensorFlow reproduction of their work at https://github.com/tegg89/SRCNN-Tensorflow and used it as inspiration for this code.
IMAGE PREP
Before I could create the network, however, I needed a dataset of images. Luckily, I'm working on another project with shoe images, so I already had a personal database of a few thousand shoes at my disposal. It may sound like a lot, but in reality it only took a few hours to download. If you wanted to do something similar you could do so fairly quickly, or just download a pre-made image dataset from the net. The next step is to make a set of low- and high-resolution images with the same dimensions. This method of super-resolution takes the downscaled image and upscales it back to the original size before running it through the network. Downscaling the images and making their dimensions uniform is fairly simple using PIL: take an image of any dimensions, paste it onto a square white background whose sides are as long as the largest side of the image, and then downscale all the images to the same size. I have a GTX 1060, so I decided on a size of 128x128 for my network. Depending on your setup, you could use bigger or smaller images. The code is as follows:
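(The snippet below is a minimal reconstruction of those steps with PIL; the helper name and defaults are my own rather than anything from the repo, where the exact version lives.)

```python
from PIL import Image

def prep_image(path, size=128, downscale=None):
    """Pad an image to a white square, then resize it to size x size.
    If downscale is set (e.g. 2), shrink by that factor and scale back
    up with bicubic interpolation to create the low-res input."""
    img = Image.open(path).convert('RGB')
    w, h = img.size
    side = max(w, h)
    # Center the image on a square white background
    canvas = Image.new('RGB', (side, side), (255, 255, 255))
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    out = canvas.resize((size, size), Image.BICUBIC)
    if downscale:
        small = out.resize((size // downscale, size // downscale), Image.BICUBIC)
        out = small.resize((size, size), Image.BICUBIC)
    return out

# High-res target and its degraded low-res counterpart:
# hi = prep_image('shoe.jpg')
# lo = prep_image('shoe.jpg', downscale=2)
```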
We use the same function to prepare both the high- and low-resolution images. To accomplish the downsampling, we scale the low-resolution images down by a factor of 2 (making them 64x64) and then upscale them back to 128x128 using bicubic interpolation. Upscaling with bicubic interpolation, or with another algorithm such as nearest-neighbor, before feeding the images through the DeconvNet eliminates the checkerboard artifact issue. You could also keep the image small and just perform the resize in TensorFlow with tf.image.resize_images. I used this method with a GAN I'm working on and got good results, but I won't use the image-resize method in this series. I will, however, cover it when I discuss GANs again.
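For reference, the in-graph version looks something like this (a TensorFlow 1.x sketch; the placeholder shape is illustrative):

```python
import tensorflow as tf

# Keep the 64x64 input and upscale it inside the graph instead of with PIL
low_res = tf.placeholder(tf.float32, [None, 64, 64, 3])
upscaled = tf.image.resize_images(low_res, [128, 128],
                                  method=tf.image.ResizeMethod.BICUBIC)
```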
DECONVNET ARCHITECTURE
Our inputs to the network will be the high- and low-resolution images, each 128x128x3. We set up a seven-layer DeconvNet with 32 filters in each layer. I was really hoping to use PReLUs, but since their slopes are optimized along with the model, I didn't have the processing power. I ended up choosing Leaky ReLUs as the activation function for all layers except the last one. The Leaky ReLU helps prevent the "dying ReLU" problem. A really nice, succinct explanation of dying ReLUs can be found in A Practical Guide to ReLU. If a ReLU's input is negative, its output is zero; a unit that only ever receives negative inputs outputs zero for every example, its gradient stops flowing, and that part of the network has basically been turned off. The more dead units you have, the less effective your network becomes.
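Putting this together, a minimal TensorFlow 1.x sketch of the network and loss might look like the following (the kernel size, Leaky ReLU slope, and optimizer are my assumptions; the stride, padding, tanh output, and MSE loss choices are explained next):

```python
import tensorflow as tf

def lrelu(x, alpha=0.2):
    # Leaky ReLU: a small negative slope keeps units from "dying"
    return tf.maximum(alpha * x, x)

def deconv_net(images):
    """images: [batch, 128, 128, 3], scaled to [-1, 1].
    Stride-1, 'same'-padded transposed convolutions with 32 filters each;
    the kernel size of 3 is an illustrative choice."""
    x = images
    for i in range(6):
        x = tf.layers.conv2d_transpose(x, 32, 3, strides=1, padding='same',
                                       activation=lrelu, name='deconv%d' % i)
    # Final layer maps back to 3 channels; tanh keeps the output in [-1, 1]
    return tf.layers.conv2d_transpose(x, 3, 3, strides=1, padding='same',
                                      activation=tf.tanh, name='output')

low_res = tf.placeholder(tf.float32, [None, 128, 128, 3])
high_res = tf.placeholder(tf.float32, [None, 128, 128, 3])  # targets in [-1, 1]
outputs = deconv_net(low_res)
loss = tf.reduce_mean(tf.square(high_res - outputs))  # pixel (MSE) loss
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)  # optimizer is an assumption
```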
Since the image size stays the same, the model uses a stride of 1, "same" padding, and no pooling. Because the stride is 1, technically a "transposed convolution" isn't really happening; it's just a convolution, since we're mapping the image to an output of the same size. However, coding it this way lets us more easily experiment with feeding the original downsampled 64x64 image into the network and actually performing a transposed convolution (or a resize) to upsample the image, so it makes the code more modular. The final activation function is tanh, which has a range of -1 to 1; to keep the targets in line with this output, we transform them to the same range before feeding them into the network. Our loss function is about as simple as possible: it's just the mean squared error. We compare our upsampled image pixel by pixel against the ground-truth original image; some articles refer to this as pixel loss. What follows is a sample of my results on some test images after about 10 hours of training...
RESULTS
I was very excited seeing these results. Some details were lost during the downsampling and can never be recovered (we'll discuss why as this series progresses), but overall the images look pretty good, much crisper than bicubic interpolation. Full results and the full implementation of this project can be found at https://github.com/ogreen8084/srcnns.
SRCNN PROS AND CONS
Pros: No checkerboard artifacts, relatively small network, less computationally intensive, easy to implement.
Cons: The loss function. Is MSE really the best way to compare images?