Generative neural networks: introduction

Now that we can get an AI to process data, the obvious next step is to get it to produce things -- to be creative: to come up with art, compositions, original thoughts and ideas. We'll now describe the most elementary such networks, which we will call Generative Neural Networks; more sophisticated approaches build on these with some form of transfer learning.

It's not at all absurd to expect a neural network to be able to generate images of horses that don't look like any horse it has actually seen -- because humans can do that! If you imagine a horse, it's probably not a horse whose image you've seen before, but it nonetheless possesses the features you've identified as common to horses.

The idea behind a generative neural network can be motivated from the following two statistical notions:

  • The inverse transform method of generating random variables.

Content generated by a mind can be thought of as a random variable on some fancy space. E.g. if we want our neural network to produce (28, 28) digit characters, we're training it to act as a random variable on the space of (28, 28) images whose support is the set of images we'd identify as valid digit characters.

The way computers typically sample random variables is through the "inverse transform method": start with a uniform random sample and apply $F^{-1}$ to it, where $F$ is the CDF of the random variable $X$ you want to sample. The result is a sample of $X$.

Quick justification of the inverse transform method: if $U$ is uniform on $[0, 1]$, then $P(U \le u) = u$, which by the definition of the CDF equals $P(X \le F^{-1}(u))$. So $F^{-1}(U)$ has the same CDF as $X$, and we map $u \mapsto F^{-1}(u)$.
So we once again need a function approximator -- to approximate $F^{-1}$.

The inputs to the neural network are therefore randomly generated, typically uniformly -- as in the sketch below.
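As a concrete warm-up, here's a minimal sketch of the inverse transform method in Python for a distribution whose inverse CDF we actually know in closed form -- the exponential distribution, where $F^{-1}(u) = -\ln(1 - u)/\lambda$. (The choice of distribution and the numbers are purely illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential distribution with rate lam:
#   CDF  F(x)      = 1 - exp(-lam * x)
#   so   F^{-1}(u) = -ln(1 - u) / lam
lam = 2.0
u = rng.uniform(size=10_000)      # uniform samples on [0, 1)
x = -np.log(1.0 - u) / lam        # exponential samples via F^{-1}

print(x.mean())                   # should be close to 1 / lam = 0.5
```

For "images of horses" there is no closed-form $F^{-1}$, which is exactly why we approximate it with a network.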

  • A nonlinear generalization of principal component analysis

Think, e.g., of eigenfaces. If you've ever tried to use eigenfaces to generate realistic faces, you'll have noticed that the results are terrible. Much of the information in faces is not so linear and nice -- there's no reason to expect it to be. Illumination and angle are pretty much the only properties that can be expected to vary linearly.

For instance, suppose you have some data that varies as follows:


Then PCA might give you the pink line as your first principal component, but sampling from the pink line gives you a lot of unphysical outputs: the regions where the pink line doesn't intersect the data.

Still, using PCA to generate samples from a distribution can be understood as taking some random inputs -- the coefficients of each principal component you want to use -- and feeding them through a function: the principal-component change-of-basis matrix.

Replacing this linear map with something nonlinear lets us deal with nonlinear models -- see the sketch below.
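To make the analogy concrete, here's a small numpy sketch of "generating" with PCA: random coefficients are pushed through the principal-component change-of-basis matrix. The toy data and the number of components are illustrative assumptions, not a real experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": 1000 flattened 28x28 images (random here, just for illustration).
X = rng.normal(size=(1000, 28 * 28))
mean = X.mean(axis=0)

# PCA via SVD of the centered data; keep the top k components.
k = 32
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:k]                      # (k, 784) change-of-basis rows

# "Generating" with PCA: random coefficients -> linear map -> sample.
z = rng.normal(size=k)
linear_sample = mean + z @ components    # purely linear in z

# A generative network replaces this linear map with a learned nonlinear
# function of z (e.g. stacked affine maps with nonlinear activations),
# whose parameters come from training rather than an eigendecomposition.
```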



OK, so it's clear that we need a neural network -- a "generative neural network" -- to construct the inverse CDF. How would one train such a network?

Given an initial random guess for the network parameters, what we have is some guessed distribution of "images of horses". What we really want to do is perturb these parameters until this distribution matches the distribution of our data.

Well, such an approach is certainly possible -- one could measure some notion of distance between our generated sample distribution and the real distribution, and backpropagate this error at each iteration.

Note that we don't actually know the true distribution of our data, only samples from it, so we can't really use something like "the probability of observing this sample given our distribution" as our loss function. Still, there are measures of the distance between two distributions that we could use as our error function, such as the maximum mean discrepancy, and methods involving matching moments.

(This approach, generally, is called a Generative Moment Matching Network.)
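As a rough sketch of what such an error function could look like, here's a biased estimate of the squared maximum mean discrepancy with a Gaussian kernel, written in PyTorch so it can be backpropagated into a generator. The kernel bandwidth and the usage lines are illustrative assumptions rather than a prescription.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd_loss(generated, real, sigma=1.0):
    # Biased estimate of squared MMD between the generated and real samples.
    k_gg = gaussian_kernel(generated, generated, sigma).mean()
    k_rr = gaussian_kernel(real, real, sigma).mean()
    k_gr = gaussian_kernel(generated, real, sigma).mean()
    return k_gg + k_rr - 2 * k_gr

# Hypothetical usage, assuming `generator` maps uniform noise to flattened images:
#   fake = generator(torch.rand(batch_size, noise_dim))
#   loss = mmd_loss(fake, real_batch)
#   loss.backward()   # pushes the discrepancy back into the generator's parameters
```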



But you can probably see why this approach is not a great one, compared to what the human brain can do. For example, these distributions would appear to be quite close to each other:

OK, the chance of the outlier is pretty small. But I bet that if you asked a human to think of a "horse", they would never imagine this:

Source: commons.wikimedia.org/wiki/File:Alphonso_mango.jpg
Or more likely this:

Source: commons.wikimedia.org/wiki/File:White-noise-mv255-240x180.png
OK, maybe we could just use a better distance function, etc. But here's an idea: we could subjectively tell that the outlier didn't belong to the distribution -- we used our human brains. So how about, rather than hand-defining a discrimination function, we train a neural network to tell whether a given data point could belong to the distribution? Then this neural network would train our generative neural network, and vice versa.

And this makes sense, right? When we learn to draw an object, we're also simultaneously learning to identify one.

And one could imagine showing off a generative network's results and having people guess whether they're real (alongside actual real images, of course) -- and using their judgments to train the network. These "people" are precisely what a discriminator network replaces.

In other words, we have two networks: the generator network, which produces horse images from uniform random variables, and the discriminator network, which takes the generator's outputs along with some actual images and decides whether each one is real or generated.

If the classification is incorrect, the discriminator network is punished, while if it is correct, the generator network is punished.

This is known as a Generative Adversarial Network (GAN).
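To tie the pieces together, here's a minimal PyTorch sketch of that two-network training loop. The layer sizes, optimizers and uniform noise input are illustrative choices; a practical GAN would differ in many details (convolutional networks, careful normalization, and so on).

```python
import torch
from torch import nn

noise_dim, img_dim = 64, 28 * 28   # illustrative sizes for flattened 28x28 images

# Generator: uniform noise -> fake image.  Discriminator: image -> P(real).
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    # real_batch: (batch, 784) images scaled to [-1, 1] to match the Tanh output.
    batch = real_batch.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: punished when it misclassifies real vs. generated.
    fake = G(torch.rand(batch, noise_dim)).detach()
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: punished when the discriminator classifies correctly,
    # i.e. rewarded when its output is mistaken for a real image.
    g_loss = bce(D(G(torch.rand(batch, noise_dim))), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each update implements the punishment described above: the discriminator is pushed to classify real and generated images correctly, while the generator is pushed to make the discriminator misclassify its output as real.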
