This project combines the DCGAN implementation by soumith with the variational autoencoder by Kaixhin.
The model produces 64x64 images from inputs of any size by center cropping them. You can modify the code fairly easily to produce outputs of a different size (by adding more convolutional layers, for instance) or to rescale images instead of cropping them. Images are randomly flipped horizontally to augment the training data.
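The preprocessing described above can be sketched in a few lines. This is only an illustration in plain Lua on a 2-D table of pixels (`img[row][col]`), not the code from `dcgan_vae.lua`; the function names `centerCropBox`, `centerCrop`, `hflip`, and `preprocess` are hypothetical, and the 0.5 flip probability is an assumption.

```lua
-- Sketch only: center crop to a square, then flip horizontally at random.
-- All names are hypothetical; the flip probability of 0.5 is an assumption.

-- Returns the 1-based, inclusive crop window centered in a width x height image.
local function centerCropBox(width, height, size)
  assert(width >= size and height >= size, "image smaller than crop size")
  local x1 = math.floor((width - size) / 2) + 1
  local y1 = math.floor((height - size) / 2) + 1
  return x1, y1, x1 + size - 1, y1 + size - 1
end

-- Copies the centered size x size window out of img (indexed img[row][col]).
local function centerCrop(img, size)
  local x1, y1 = centerCropBox(#img[1], #img, size)
  local out = {}
  for r = 1, size do
    out[r] = {}
    for c = 1, size do
      out[r][c] = img[y1 + r - 1][x1 + c - 1]
    end
  end
  return out
end

-- Returns a mirrored copy of img (column order reversed in every row).
local function hflip(img)
  local out = {}
  for r = 1, #img do
    local row, n = {}, #img[r]
    for c = 1, n do
      row[c] = img[r][n - c + 1]
    end
    out[r] = row
  end
  return out
end

-- Crops, then flips with probability 0.5. rng defaults to math.random.
local function preprocess(img, size, rng)
  rng = rng or math.random
  local out = centerCrop(img, size)
  if rng() < 0.5 then
    out = hflip(out)
  end
  return out
end
```

With Torch installed, the `image` package provides `image.crop` and `image.hflip` for the same two steps on tensors, and `image.scale` if you prefer rescaling to cropping.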
After reading this post on stabilizing GANs, I added white noise to the original inputs that go through the discriminator. The noise level is annealed over time to help the generator and discriminator converge.
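As a rough illustration of annealed input noise, the sketch below adds zero-mean Gaussian noise to a flat array of pixel values and shrinks its standard deviation each epoch. The linear schedule, the parameter names `sigma0` and `annealEpochs`, and the helper names are all assumptions made for this example; the actual schedule used by the script may differ.

```lua
-- Sketch only: annealed white noise for discriminator inputs.
-- The linear decay and all names below are assumptions, not the script's code.

-- Noise standard deviation for a 1-based epoch: sigma0 at epoch 1,
-- decreasing linearly to 0 once annealEpochs epochs have passed.
local function noiseStd(epoch, sigma0, annealEpochs)
  local frac = math.max(0, 1 - (epoch - 1) / annealEpochs)
  return sigma0 * frac
end

-- One standard-normal sample via the Box-Muller transform.
local function gaussian(rng)
  rng = rng or math.random
  local u1 = 1 - rng()  -- lies in (0, 1], which avoids log(0)
  local u2 = rng()
  return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Returns a noisy copy of a flat array of pixel values.
local function addNoise(pixels, std, rng)
  local out = {}
  for i = 1, #pixels do
    out[i] = pixels[i] + std * gaussian(rng)
  end
  return out
end
```

In a training loop you would compute `noiseStd(epoch, ...)` once per epoch and apply `addNoise` to each batch just before it goes through the discriminator.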
Prerequisites:
- Torch7
- CUDA
- CUDNN
- DPNN
- Lua File System
- optim
- xlua
To run, execute the script using
th dcgan_vae.lua -i [input folder destination] -o [output folder destination] -c [destination for saving model checkpoints] -r [reconstructions folder]
where the input folder is expected to contain color images. The model resamples the training set after every epoch, so that each sample fits on the GPU while all of the data is (eventually) seen. The output folder receives samples generated by the model, and the reconstructions folder stores a few reconstructions from the training set so you can see how the VAE is doing (it will not do particularly well, but that's okay; it's there to assist the GAN).
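The per-epoch resampling can be pictured as drawing a fresh random subset of the image files at the start of each epoch. The sketch below assumes sampling without replacement within an epoch, using a partial Fisher-Yates shuffle; the function name `sampleSubset` and the subset size `k` are hypothetical and not taken from the script.

```lua
-- Sketch only: pick k distinct file paths uniformly at random for one epoch.
-- sampleSubset and its parameters are hypothetical names for illustration.
local function sampleSubset(paths, k, rng)
  rng = rng or math.random
  local n = #paths
  k = math.min(k, n)

  -- Work on an index array so the caller's list is left untouched.
  local idx = {}
  for i = 1, n do
    idx[i] = i
  end

  -- Partial Fisher-Yates shuffle: only the first k positions are settled.
  local subset = {}
  for i = 1, k do
    local j = i + math.floor(rng() * (n - i + 1))  -- uniform integer in [i, n]
    idx[i], idx[j] = idx[j], idx[i]
    subset[i] = paths[idx[i]]
  end
  return subset
end
```

Calling `sampleSubset(allFiles, k)` once per epoch gives every file the same chance of being drawn each time, so over enough epochs the whole folder is likely to be visited even though only `k` images are loaded at once.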