The Winding Number: The mathematical definition of an image

We have previously noted that convolutional layers are effective in image processing applications. In this article we go a bit further and claim that the definition of an image is that convolutions are effective on it.

The motivation for the convolution operation -- as well as for the use of the term "convolution" to describe it -- is the idea that an image is best understood in its frequency representation. This is the basis of simple compression algorithms like JPEG, as well as of general image processing. The discrete Fourier transform (or the closely related discrete cosine transform) allows an image to be written as a linear combination of sinusoids.
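As a minimal sketch of what "a linear combination of sinusoids" means, the snippet below applies a naive discrete Fourier transform to a single row of pixels and rebuilds the row from its coefficients. The pixel values and the function names `dft` and `inverse_dft` are made up for illustration; a real image would use a 2-D transform from a numerical library.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(coeffs):
    """Rebuild the signal as a weighted sum of complex sinusoids."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Hypothetical 8-pixel row of grayscale intensities (illustrative values only).
row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]

coeffs = dft(row)  # one complex weight per frequency
rebuilt = [value.real for value in inverse_dft(coeffs)]  # matches row up to rounding
```

The coefficient at index 0 is the sum of the pixel values (the constant component), and the remaining coefficients weight sinusoids of increasing frequency.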

The idea is then that the properties of the image can be inferred from this frequency representation through "simpler" functions than from the position representation. This is the defining characteristic of an image (or rather of the distribution of images) -- if we instead dealt with some other data type, which did not exhibit spatial regularities like an image does, this would no longer be true.

Multiplying the frequency representation by an enveloping function to enhance particular frequencies corresponds to convolving the image with the inverse Fourier transform of that envelope (this is the convolution theorem). Any such envelope can be constructed from a sequence of such convolution operations.
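The correspondence between multiplying in frequency space and convolving in position space can be checked numerically. The sketch below is a 1-D illustration under stated assumptions: the signal and the envelope are invented values, the convolution is circular (which is what the discrete Fourier transform pairs with), and the helper names are hypothetical.

```python
import cmath

def dft(values, inverse=False):
    """Naive discrete Fourier transform; set inverse=True for the inverse transform."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolve(signal, kernel):
    """Circular convolution: indices wrap around the ends of the sequence."""
    n = len(signal)
    return [
        sum(signal[m] * kernel[(t - m) % n] for m in range(n))
        for t in range(n)
    ]

# Illustrative data: one row of a signal and a low-pass envelope.
signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
# The envelope is symmetric (envelope[k] == envelope[n - k]) so its kernel is real.
envelope = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.25, 0.5]

# Route 1: multiply the frequency representation by the envelope.
spectrum = dft(signal)
shaped = [s * e for s, e in zip(spectrum, envelope)]
filtered_in_frequency = [v.real for v in dft(shaped, inverse=True)]

# Route 2: convolve the signal with the inverse transform of the envelope.
kernel = [v.real for v in dft(envelope, inverse=True)]
filtered_by_convolution = circular_convolve(signal, kernel)
```

Both routes give the same filtered signal up to floating-point rounding. A convolutional layer learns the kernel directly rather than deriving it from an envelope.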

(Play with frequency representations of images in this Colab notebook.)

Another type of data whose distribution we know something about is time series.

The key thing about time series is that the data representation should handle sequences of varying lengths. This is not a tall order -- in general, entropy encodings are perfectly capable of handling data of arbitrary lengths if we know the distribution of the data. To model the distribution of such sequences means figuring out the probability $p(x_{i+1}\mid x_1,\dots,x_i)$ -- or equivalently $p(h_{i+1}\mid h_i)$ where the $h_i$ are the data representations. So the natural way of representing sequences of arbitrary length is as states of a Markov chain, with each new item in the sequence providing information to determine the next state.
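To make the state-based representation concrete, here is a minimal sketch in which a fixed-size state is updated once per item, so sequences of any length map to a representation of the same size. The update rule (an exponential moving average) and the names `update` and `encode` are assumptions chosen for illustration. The rule is deterministic, which simplifies the probabilistic transition $p(h_{i+1}\mid h_i)$ described above, and a learned model would replace it with a trained transition function.

```python
def update(state, item, decay=0.5):
    """Hypothetical transition h_{i+1} = f(h_i, x_{i+1}).

    The next state depends only on the current state and the new item,
    which is the Markov property the text relies on.
    """
    return decay * state + (1.0 - decay) * item

def encode(sequence, initial_state=0.0):
    """Fold a sequence of any length into a single fixed-size state."""
    state = initial_state
    for item in sequence:
        state = update(state, item)
    return state

short_repr = encode([1.0, 2.0, 3.0])
long_repr = encode([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

Both calls return a single float regardless of input length, and an empty sequence simply returns the initial state.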
