Sorting image data for GAN training

GANs need a whole lot of image data to output proper results. So one should consider oneself lucky if one comes upon a highly uniform and standardised image dataset for GAN training.

However, if one is not that lucky, a nice way to sort shitty image data from not-so-shitty image data is to train a simple classifier on a hand-labelled subset of the images in the dataset.

So to sort good from bad, I trained a MobileNet model in Keras to distinguish between what I considered good images and bad images. After only 10 epochs, the model reached 85 percent accuracy on the validation images. I figured that could get better, so I retrained a VGG16 to see.
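For the curious, here is a minimal sketch of what such a good-vs-bad classifier can look like in Keras. It is not the exact notebook code: the 224x224 input size, the frozen backbone, the 80/20 split, and the folder layout (a hypothetical `labelled/` directory with `bad/` and `good/` subfolders) are all assumptions made for illustration.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed input size, the MobileNet default


def build_classifier(weights="imagenet"):
    """MobileNet backbone with one sigmoid output: P(image is good)."""
    base = tf.keras.applications.MobileNet(
        input_shape=IMG_SIZE + (3,),
        include_top=False,   # drop the 1000-class ImageNet head
        weights=weights,
        pooling="avg",       # global average pooling -> one feature vector
    )
    base.trainable = False   # assumption: only the new head is trained

    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    # MobileNet expects pixel values in [-1, 1]
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def train(data_dir, epochs=10):
    """Train on data_dir/bad and data_dir/good with a 20% validation split."""
    common = dict(validation_split=0.2, seed=42, image_size=IMG_SIZE,
                  batch_size=32, label_mode="binary")
    # Classes are indexed alphabetically, so bad -> 0 and good -> 1
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common)

    model = build_classifier()
    model.fit(train_ds, validation_data=val_ds, epochs=epochs)
    return model


# Hypothetical usage:
# model = train("labelled", epochs=10)
# model.save("good_vs_bad.keras")
```

Swapping in VGG16 would mostly mean replacing the backbone with `tf.keras.applications.VGG16` and using its own preprocessing instead of the rescaling layer.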

To see how to retrain your own Keras models and save them for later use, check out my quick and easy notebook here. Also check out DeepLizard's Keras playlist on YouTube, amazing tutorial!

I also increased the size of the dataset by flipping every single image vertically using PIL, so I now had a dataset double the original size (omg). Here is the script for that. After retraining the VGG16 on the doubled dataset, I got 99% accuracy on the validation set, and I finally reached satisfaction.
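A minimal sketch of that kind of flipping step, assuming Pillow is installed and all images sit in one flat folder. The function name, the `_flipped` filename suffix, and the list of extensions are made up for this example.

```python
from pathlib import Path

from PIL import Image, ImageOps

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def add_flipped_copies(folder, suffix="_flipped"):
    """Save a vertically flipped copy next to every image in `folder`."""
    folder = Path(folder)
    created = []
    # sorted() takes a snapshot of the listing, so files written
    # during the loop are not visited again
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.stem.endswith(suffix):
            continue  # already a flipped copy
        target = path.with_name(f"{path.stem}{suffix}{path.suffix}")
        if target.exists():
            continue  # flipped copy was made on an earlier run
        with Image.open(path) as img:
            # ImageOps.flip flips top to bottom (mirror would be left to right)
            ImageOps.flip(img).save(target)
        created.append(target)
    return created


# Hypothetical usage:
# new_files = add_flipped_copies("labelled/good")
# print(f"wrote {len(new_files)} flipped images")
```

Running it twice on the same folder does nothing the second time, so the dataset stays at double size instead of quadrupling.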

And so I made a script that ran through the whole dataset, predicted whether each image was good or bad, and sorted the images into two new folders: one with the good ones and one with the bad ones. From there, the model's rough mistakes could quickly be moved into the right folder by hand, and the GAN now has some pretty sweet uniform data to munch on.
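The sorting step can be sketched roughly like this. To keep the example independent of any particular model, the classifier is passed in as `score_fn`, a hypothetical callable that takes an image path and returns the probability that the image is good. The 0.5 threshold and the choice to copy rather than move files are assumptions as well.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def sort_images(src_dir, dst_dir, score_fn, threshold=0.5):
    """Copy every image in src_dir into dst_dir/good or dst_dir/bad."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for label in ("good", "bad"):
        (dst_dir / label).mkdir(parents=True, exist_ok=True)

    counts = {"good": 0, "bad": 0}
    for path in sorted(src_dir.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files and subfolders
        label = "good" if score_fn(path) >= threshold else "bad"
        # Copying leaves the original dataset untouched
        shutil.copy2(path, dst_dir / label / path.name)
        counts[label] += 1
    return counts


# Hypothetical wrapper around a trained Keras model with one sigmoid output
# (assumes `model` is already loaded and takes 224x224 inputs):
#
# import tensorflow as tf
#
# def keras_score(path):
#     img = tf.keras.utils.load_img(path, target_size=(224, 224))
#     batch = tf.keras.utils.img_to_array(img)[None, ...]
#     return float(model.predict(batch, verbose=0)[0, 0])
#
# print(sort_images("dataset", "sorted", keras_score))
```

Predicting one image at a time is slow on a large dataset; batching the predictions would be the obvious next improvement.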

Hare krishna!
