Screen Printing with DCGANs

The idea of using ML techniques to produce art has been the subject of a wide body of research, ranging from neural style transfer (recomposing images in the style of other images) to the generation of original art itself. Some of these techniques rely on a class of generative learning algorithms, which seek to learn the distribution of features (e.g., image pixels) given labels (in our case, capital "A" Art!). I developed a project to study generative adversarial networks (GANs) and apply them to screen printing, a craft I was also getting into at the time.

GANs are a class of generative models originally proposed for unsupervised learning; they were introduced by Ian Goodfellow and his collaborators in a 2014 paper. A GAN pits two neural networks against one another. One of them, the generator, samples noise and attempts to produce output similar to that found in the training set. The other network, the discriminator, is fed both the real data from the training set and the fake data from the generator, and attempts to distinguish between the two. As learning progresses, the generator produces better output and the discriminator gets better at telling real from fake, until the two networks reach convergence (i.e., both networks' losses reach an equilibrium). At this point, the generator is fairly good at generating output that looks like the training set.
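As a concrete illustration of this back-and-forth, here is a minimal sketch of one training step in PyTorch. The names (G, D), the flat latent vector, and the binary cross-entropy setup are illustrative assumptions, not details from the original paper or from my project:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_step(G, D, opt_G, opt_D, real_batch, latent_dim=100, device="cpu"):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # Discriminator step: learn to label real images 1 and generated images 0.
    opt_D.zero_grad()
    noise = torch.randn(batch_size, latent_dim, device=device)  # latent sample
    fake_batch = G(noise)
    d_loss = criterion(D(real_batch), real_labels) \
           + criterion(D(fake_batch.detach()), fake_labels)
    d_loss.backward()
    opt_D.step()

    # Generator step: try to make the discriminator label the fakes as real.
    opt_G.zero_grad()
    g_loss = criterion(D(fake_batch), real_labels)
    g_loss.backward()
    opt_G.step()

    return d_loss.item(), g_loss.item()
```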

To prepare for the project, I read the 2015 paper "Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks" by Alec Radford, Luke Metz, and Soumith Chintala. In it, the authors introduce a class of convolutional neural networks (CNNs) called the deep convolutional generative adversarial network (DCGAN) that bridges the gap from CNNs to unsupervised learning. In particular, they suggest a framework of architectural constraints for DCGANs that they argue makes these convolutional GANs stable to train, even with deeper and higher-resolution models. These constraints are (a code sketch following them appears after the list):

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator). This allows the network to learn its own downsampling/upsampling, as opposed to using a deterministic pooling layer (e.g., a max-pooling layer keeps only the maximum activation in each pooling window and discards the rest).
  • Use batchnorm in both the generator and the discriminator. This stabilizes learning by normalizing the input to each unit so that it has zero mean and unit variance.
  • Remove fully connected hidden layers for deeper architectures. The authors found that alternatives such as global average pooling increased model stability but hurt convergence speed, and that simply removing the fully connected layers was a better compromise.
  • Use ReLU activation in the generator for all layers except the output, which uses Tanh; use LeakyReLU activation in the discriminator for all layers. The authors found that these particular functions worked well for their purposes.
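Taken together, these constraints yield architectures along the lines of the following minimal PyTorch sketch. The 64x64 output size, channel counts, and layer shapes are assumptions for illustration; the point is the overall design: strided/fractional-strided convolutions instead of pooling, batchnorm in both networks, no fully connected hidden layers, and the recommended activations.

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the latent vector to a 4x4 feature map
            # with a fractional-strided convolution (no fully connected layer).
            nn.ConvTranspose2d(latent_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8),
            nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4),
            nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2),
            nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat),
            nn.ReLU(True),
            # 32x32 -> 64x64, Tanh on the output layer
            nn.ConvTranspose2d(feat, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        # Accept a flat latent vector and reshape it to (batch, latent_dim, 1, 1).
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # 64x64 -> 32x32, strided convolution instead of pooling
            nn.Conv2d(channels, feat, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 32x32 -> 16x16
            nn.Conv2d(feat, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 16x16 -> 8x8
            nn.Conv2d(feat * 2, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 8x8 -> 4x4
            nn.Conv2d(feat * 4, feat * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # 4x4 -> 1x1 "realness" score, again no fully connected layer
            nn.Conv2d(feat * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 1)
```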

I built the project with PyTorch; training took several days. The results were pretty neat: I took one of the outputs, ran an exaggerated halftone color separation on it, and printed a few copies.
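For the curious, a halftone color separation can be approximated in a few lines of Pillow. This is a rough sketch of the general idea rather than the exact process I used: convert the image to CMYK and render each ink channel as a coarse grid of dots whose size tracks the local ink coverage.

```python
from PIL import Image, ImageDraw

def halftone_channel(channel, cell=12, scale=4):
    """Render one grayscale ink channel as a grid of dots (larger dot = more ink)."""
    w, h = channel.size
    out = Image.new("L", (w * scale, h * scale), 0)
    draw = ImageDraw.Draw(out)
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            # Average ink coverage in this cell, as a fraction in [0, 1].
            box = channel.crop((x, y, min(x + cell, w), min(y + cell, h)))
            coverage = sum(box.getdata()) / (box.size[0] * box.size[1]) / 255.0
            radius = (cell * scale / 2) * coverage
            cx, cy = (x + cell / 2) * scale, (y + cell / 2) * scale
            draw.ellipse((cx - radius, cy - radius, cx + radius, cy + radius), fill=255)
    return out

def separate(path):
    # One halftoned image per ink channel: cyan, magenta, yellow, black.
    cmyk = Image.open(path).convert("CMYK")
    return [halftone_channel(ch) for ch in cmyk.split()]
```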