### Sigma fields are Venn diagrams

The starting point for probability theory will be to note the difference between outcomes and events.

An outcome of an experiment is a fundamentally non-empirical notion, about our theoretical understanding of what states a system may be in -- it is, in a sense, analogous to the "microstates" of statistical physics. The set of all outcomes $x$ is called the sample space $X$, and is the fundamental space to which we will give a probabilistic structure (we will see what this means).

Our actual observations, the events, need not be so precise. For example, our measurement device may not record the exact sequence of heads and tails produced by an experiment, but only the total number of heads, or something similar; this is analogous to a "macrostate". But such measurements are still statements about which microstates we know our system could possibly be in, i.e. they correspond to sets of outcomes. These sets of outcomes that we can "talk about" are called events $E$, and the set of all possible events is called a field $\mathcal{F}\subseteq 2^X$.

For instance, if our sample space is $X=\{1,2,3,4,5,6\}$ and our measurement apparatus is a guy who looks at the reading and tells us whether it's even or odd, then the field is $\{\varnothing, \{1,3,5\},\{2,4,6\},X\}$. We simply cannot talk about sets like $\{1,3\}$ or $\{1\}$: our information just doesn't tell us anything about sets like that. When we're told "odd", we're never given a hint as to whether the outcome was 1, 3 or 5, so we cannot even assign probabilities to whether the outcome was a 1 or a 3.
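A small sketch of where such a field comes from: if we model the apparatus as a function from outcomes to readings, the events we can talk about are exactly the sets of outcomes whose reading falls in some set of readings. The helper names `powerset` and `induced_field` are illustrative choices, not from any library.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def induced_field(sample_space, reading):
    """Events we can talk about: the outcomes whose reading lies in some set R of readings."""
    readings = {reading(x) for x in sample_space}
    return {frozenset(x for x in sample_space if reading(x) in R) for R in powerset(readings)}

X = range(1, 7)
parity = lambda x: "even" if x % 2 == 0 else "odd"   # the guy reading the die
F = induced_field(X, parity)
print(sorted(sorted(E) for E in F))   # [[], [1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 4, 6]]
print(frozenset({1, 3}) in F)         # False
```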

Well, what properties characterise a field? There's actually a bit of ambiguity here. It's clear that a field should be closed under complement (negation) and finite unions, with finite intersections following via de Morgan's laws: if you can tell whether $P_1$ and $P_2$ are true, you can check each of them to decide whether $P_1\lor P_2$ is true. (A proposition $P$ corresponds to a set $S$ in the sense that $P$ says "the outcome lies in $S$", so $\lnot$ translates to complement and $\lor$ translates to $\cup$.) But if you have infinitely many $P_i$'s, can you really check every one of them, so as to say without a doubt that a field must be closed under arbitrary unions?
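For a finite family of sets, the two closure conditions can be checked directly. This is a minimal sketch (the function name `is_field` is hypothetical); for a finite family, closure under pairwise unions already gives closure under all finite unions, so the question about infinite unions never arises here.

```python
def is_field(sample_space, family):
    """True iff the finite family contains X and is closed under
    complement ("not P") and pairwise union ("P1 or P2")."""
    X = frozenset(sample_space)
    F = {frozenset(E) for E in family}
    if X not in F:
        return False
    for E in F:
        if X - E not in F:            # the complement of an event must be an event
            return False
        for G in F:
            if E | G not in F:        # the union of two events must be an event
                return False
    return True

X = {1, 2, 3, 4, 5, 6}
print(is_field(X, [set(), {1, 3, 5}, {2, 4, 6}, X]))   # True
print(is_field(X, [set(), {1}, {2, 4, 6}, X]))         # False: not closed under complement
```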

Whether to allow infinite unions is, at this point, really a matter of convention, but we tend to choose the convention where the field is closed under complement and countable unions. Such a field is called a sigma-field. We will see where this convention comes from (and why it is actually important) when we define probability: in fact, it is needed if one is to have a uniform probability distribution on a compact set in $\mathbb{R}^n$.

A beautiful way to understand fields and sigma fields is in terms of Venn diagrams; in fact, as you will see, fields are precisely a formalisation of Venn diagrams. I was pretty amazed when I discovered this (rather simple) connection for myself, and you should be too.

Suppose your experiment is to toss three coins, and make "partial measurements" on the results through three "measurement devices":
• A: Lights up iff the number of heads was at least 2.
• B: Lights up iff the first two coins landed heads.
• C: Lights up iff the third coin landed heads.
What this means is that $A$ gives you the set $\{HHT, HTH, THH, HHH\}$, $B$ gives you the set $\{HHH, HHT\}$, and $C$ gives you the set $\{HHH, HTH, THH, TTH\}$. Based on precisely which devices light up, you can decide the truth values of negations ($\lnot$) and disjunctions ($\lor$) of these statements, i.e. of complements and unions of these sets; this is the point of fields, of course.
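A minimal sketch that models the three devices as predicates on outcome strings such as `"HHT"` (the string encoding and the names `devices`, `events`, `observation` are choices made here for illustration) and recovers the three sets above. It also records, for each outcome, which devices light up, since that tuple is all we actually observe.

```python
from itertools import product

# Sample space: the 8 sequences of three coin tosses, e.g. "HHT".
X = ["".join(t) for t in product("HT", repeat=3)]

# The three devices, modelled as predicates on outcomes.
devices = {
    "A": lambda x: x.count("H") >= 2,  # at least two heads
    "B": lambda x: x[:2] == "HH",      # first two coins heads
    "C": lambda x: x[2] == "H",        # third coin heads
}

# The event each device reports: the outcomes for which it lights up.
events = {name: {x for x in X if lit(x)} for name, lit in devices.items()}
for name, E in events.items():
    print(name, sorted(E))

def observation(x):
    """What we actually see for outcome x: which devices lit up, in order A, B, C."""
    return tuple(lit(x) for lit in devices.values())

print(observation("HTH") == observation("THH"))  # True: these two outcomes look identical
```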

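Or we could visualise things: draw $A$, $B$ and $C$ as overlapping regions of a Venn diagram on $X$, and place each outcome in the region matching the devices it lights up. (Since $B\subseteq A$, the region for $B$ sits entirely inside $A$.)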

Well, the Venn diagram partitions $X$ according to the equivalence relation of "indistinguishability": $x\sim y$ iff every event containing one of them also contains the other. The field consists precisely of the sets one can "mark" on the Venn diagram, i.e. the unions of elements of the partition.
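To check this on the three-coin example, the sketch below (helper names hypothetical) first builds the field generated by $A$, $B$, $C$ by closing under complements and unions until nothing new appears, then groups outcomes by the set of events containing them, and finally confirms that the unions of the resulting blocks are exactly the field.

```python
from itertools import combinations, product

X = frozenset("".join(t) for t in product("HT", repeat=3))
A = frozenset(x for x in X if x.count("H") >= 2)
B = frozenset(x for x in X if x[:2] == "HH")
C = frozenset(x for x in X if x[2] == "H")

def generate_field(X, generators):
    """Smallest family containing the generators, closed under complement and union."""
    F = {frozenset(), X} | set(generators)
    while True:
        new = {X - E for E in F} | {E | G for E in F for G in F}
        if new <= F:          # nothing new: F is closed
            return F
        F |= new

def partition(X, F):
    """Blocks of the relation x ~ y iff every event containing one contains the other."""
    blocks = {}
    for x in X:
        key = frozenset(E for E in F if x in E)   # all events containing x
        blocks.setdefault(key, set()).add(x)
    return [frozenset(b) for b in blocks.values()]

def unions_of_blocks(blocks):
    """Every set one can 'mark' on the Venn diagram."""
    return {frozenset().union(*S)
            for r in range(len(blocks) + 1) for S in combinations(blocks, r)}

F = generate_field(X, [A, B, C])
blocks = partition(X, F)
print(sorted(sorted(b) for b in blocks))
# [['HHH'], ['HHT'], ['HTH', 'THH'], ['HTT', 'THT', 'TTT'], ['TTH']]
print(len(F), unions_of_blocks(blocks) == F)   # 32 True
```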

A consequence of this characterisation becomes immediately obvious:

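Given a field $\mathcal{F}$ corresponding to the partition $\sim$, the following bijection holds whenever $\mathcal{F}$ is finite (or, for a sigma field, whenever $X$ is countable): $\mathcal{F}\leftrightarrow 2^{X/\sim}$, sending each set of blocks to its union.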

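Consequences of this include: the cardinalities of finite sigma fields are precisely the powers of two; and there is no countably infinite sigma field on a countable sample space (in fact there is none on any sample space, though that needs a different argument).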
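The first consequence can be checked by brute force on tiny sample spaces. This sketch (exhaustive and exponential, so only sensible for $n\le 4$; the bitmask encoding is a choice made here) enumerates every family of subsets of an $n$-element set, keeps the fields, and checks that their sizes are powers of two. By the bijection with partitions, the number of fields should also equal the number of partitions of the set, i.e. the Bell numbers 1, 2, 5, 15.

```python
def all_fields(n):
    """Brute force every field on {0, ..., n-1}.
    A subset is an n-bit mask; a family is a 2**n-bit mask whose bit s says
    whether subset s belongs to it."""
    full = (1 << n) - 1
    fields = []
    for fam in range(1 << (1 << n)):
        def has(s):
            return (fam >> s) & 1 == 1
        if not has(full):                                           # must contain X
            continue
        members = [s for s in range(1 << n) if has(s)]
        if not all(has(full ^ s) for s in members):                 # closed under complement
            continue
        if not all(has(s | t) for s in members for t in members):   # closed under union
            continue
        fields.append(members)
    return fields

def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0

# Number of partitions of an n-element set (Bell numbers), for comparison.
BELL = {1: 1, 2: 2, 3: 5, 4: 15}

for n in range(1, 5):
    fields = all_fields(n)
    sizes = sorted({len(F) for F in fields})
    print(n, len(fields), len(fields) == BELL[n], sizes)
# 1 1 True [2]
# 2 2 True [2, 4]
# 3 5 True [2, 4, 8]
# 4 15 True [2, 4, 8, 16]
```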

Often, one may want to process some raw data from an experiment to obtain some processed data. For example, let $X=\{HH,HT,TH,TT\}$ and let the initial measurement be the number of heads, which gives the field:

\begin{align} \mathcal{F}=&\{\varnothing, \{TT\}, \{HT, TH\}, \{HH\},\\ & \{TT, HT, TH\}, \{TT, HH\}, \{HT, TH, HH\}, X \} \end{align}
What properties of the outcome can we talk about with certainty, given the number of heads? For example, we can answer the question "was there at least one head?", which corresponds to the field

$$\mathcal{G}=\{\varnothing, \{TT\}, \{HT, TH, HH\}, X\}$$
There are two ways to understand this "processing" or "re-measuring". One is as a function $f:\frac{X}{\sim_\mathcal{F}}\to \frac{X}{\sim_\mathcal{G}}$ between the two partitions, which here are:

\begin{align} \frac{X}{\sim_\mathcal{F}}&=\{\{TT\},\{HT,TH\},\{HH\}\}\\ \frac{X}{\sim_\mathcal{G}}&=\{\{TT\},\{HT,TH,HH\}\} \end{align}
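The natural choice $f([x]_\mathcal{F})=[x]_\mathcal{G}$ is well-defined, and hence a permissible "measurable function", as long as $\sim_\mathcal{G}$ is at least as coarse a partition as $\sim_\mathcal{F}$ (equivalently, $\mathcal{G}\subseteq\mathcal{F}$). In other words, the quotient map from $X/\sim_1$ to $(X/\sim_1)/\sim_2$ is always measurable.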
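Whether the map $[x]_\mathcal{F}\mapsto[x]_\mathcal{G}$ is well-defined can be checked mechanically: every block of the finer partition must land inside a single block of the coarser one. A minimal sketch on the two-coin example; the helper names are hypothetical, and the third partition `P_H` (by whether the first coin landed heads) is added here purely as a contrast.

```python
def block_of(x, blocks):
    """The block of a partition that contains x."""
    return next(b for b in blocks if x in b)

def lift(fine, coarse):
    """Try to define f([x]_fine) = [x]_coarse; return the map, or None if ill-defined."""
    f = {}
    for b in fine:
        images = {block_of(x, coarse) for x in b}
        if len(images) != 1:        # b straddles two coarse blocks
            return None
        f[b] = images.pop()
    return f

P_F = [frozenset({"TT"}), frozenset({"HT", "TH"}), frozenset({"HH"})]   # number of heads
P_G = [frozenset({"TT"}), frozenset({"HT", "TH", "HH"})]               # at least one head
P_H = [frozenset({"TT", "TH"}), frozenset({"HT", "HH"})]               # first coin heads

print(lift(P_F, P_G) is not None)   # True: G is coarser than F
print(lift(P_F, P_H) is not None)   # False: {HT, TH} would need two images
```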

But there's another, more "natural", less weird and mathematical way to think about a re-measurement: as a function $f:X\to Y$, where in this case $Y=\{0,1\}$ and an outcome maps to 1 if it has at least one head, and to 0 if it does not.

But there's a catch: knowing that an event $E_Y$ in $Y$ occurred is equivalent to knowing that an outcome in $X$ mapping into $E_Y$ occurred, i.e. that the event $\{x\in X\mid f(x)\in E_Y\}$ occurred. Such an event must be in the field on $X$, i.e.

$$\forall E_Y\in\mathcal{F}_Y,\ f^{-1}(E_Y)\in\mathcal{F}_X$$
This is the condition for a measurable function, also known as a random variable.
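The condition translates directly into a check over the events of $\mathcal{F}_Y$. A minimal sketch on the two-coin example, with $\mathcal{F}_Y$ taken as every subset of $Y=\{0,1\}$; the function names are hypothetical, and `first_coin_heads` is a non-measurable contrast added here for illustration.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def is_measurable(f, X, F_X, F_Y):
    """f: X -> Y is measurable iff f^{-1}(E) is in F_X for every event E in F_Y."""
    return all(frozenset(x for x in X if f(x) in E) in F_X for E in F_Y)

X = ["HH", "HT", "TH", "TT"]
# The field on X from counting heads: all unions of {TT}, {HT, TH}, {HH}.
F_X = {frozenset(s) for s in [(), ("TT",), ("HT", "TH"), ("HH",), ("TT", "HT", "TH"),
                              ("TT", "HH"), ("HT", "TH", "HH"), X]}
F_Y = powerset({0, 1})            # on Y = {0, 1}, every subset is an event

at_least_one_head = lambda x: int("H" in x)
first_coin_heads = lambda x: int(x[0] == "H")   # hypothetical contrast

print(is_measurable(at_least_one_head, X, F_X, F_Y))   # True
print(is_measurable(first_coin_heads, X, F_X, F_Y))    # False: {HH, HT} is not in F_X
```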

One may observe certain analogies between the measurable spaces outlined above and topology. In the case of countable sample spaces there actually is a correspondence: a sigma field on a countable set is exactly a topology in which the complement of every open set is also open. The similarity between a Venn diagram and casual drawings of a topological space is not completely superficial.

The key mathematical idea behind fields is a notion of "distinguishability": if all we can measure is the number of heads, $HHTTH$ and $TTHHH$ are identical to us. For all practical purposes, we can replace the sample space by its quotient under this equivalence relation, i.e. by the partition into classes of indistinguishable outcomes; such outcomes are basically the "same point".
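Concretely, for five tosses where only the number of heads is measured, the quotient can be built by grouping outcomes by that number. A minimal sketch, with five tosses chosen to match the example above:

```python
from collections import defaultdict
from itertools import product

# Five tosses; the only measurement is the number of heads.
X = ["".join(t) for t in product("HT", repeat=5)]

classes = defaultdict(list)
for x in X:
    classes[x.count("H")].append(x)   # x ~ y iff same number of heads

print(len(X), len(classes))                              # 32 6
print("HHTTH" in classes[3] and "TTHHH" in classes[3])   # True: the "same point"
print({k: len(v) for k, v in sorted(classes.items())})   # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
```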

It's this notion that a measurable function seeks to encapsulate; it is, in a sense, a generalisation of a function from set theory. A function cannot distinguish indistinguishable points: in set theory, "indistinguishability" is just equality (the discrete partition), so this says only that $x=y$ implies $f(x)=f(y)$. A measurable function likewise cannot distinguish indistinguishable points, but in measurable spaces "indistinguishability" is given by some equivalence relation, so indistinguishable points must be sent to indistinguishable points.

Let's see this more precisely.

Given sets with equivalence relations $(X,\sim)$, $(Y,\sim)$, we want to ensure that some function $f:X\to Y$ "lifts" to a function $f:\frac{X}{\sim}\to\frac{Y}{\sim}$ such that $f([x])=[f(x)]$.

(Exercise: for fields that correspond to partitions, e.g. on a finite sample space, show that this "definition" being well-defined is equivalent to the condition $\forall E\in\mathcal{F}_Y, f^{-1}(E)\in \mathcal{F}_X$. It may help to draw out some examples.)
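Short of a proof, the exercise can be sanity-checked by brute force in a small finite case. This sketch uses the two-coin example with the number-of-heads partition on $X$ and, as an assumption made only for illustration, $Y=\{0,1\}$ with every point distinguishable; it compares the two conditions on all $2^4=16$ functions $X\to Y$.

```python
from itertools import combinations, product

X = ["HH", "HT", "TH", "TT"]
Y = [0, 1]
P_X = [{"TT"}, {"HT", "TH"}, {"HH"}]   # number of heads
P_Y = [{0}, {1}]                       # discrete: every point distinguishable

def unions(blocks):
    """The field corresponding to a partition: all unions of blocks."""
    return {frozenset().union(*S) for r in range(len(blocks) + 1)
            for S in combinations(blocks, r)}

F_X, F_Y = unions(P_X), unions(P_Y)

def block(x, P):
    return next(frozenset(b) for b in P if x in b)

def lift_well_defined(f):
    """x ~ x' implies f(x) ~ f(x'), so f([x]) = [f(x)] makes sense."""
    return all(block(f[x], P_Y) == block(f[y], P_Y)
               for x in X for y in X if block(x, P_X) == block(y, P_X))

def preimages_measurable(f):
    """Every event in F_Y pulls back to an event in F_X."""
    return all(frozenset(x for x in X if f[x] in E) in F_X for E in F_Y)

# Every function X -> Y, as a dict from outcomes to values.
functions = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]
agree = all(lift_well_defined(f) == preimages_measurable(f) for f in functions)
n_measurable = sum(preimages_measurable(f) for f in functions)
print(len(functions), agree, n_measurable)   # 16 True 8
```

Both conditions pick out the same 8 functions, namely those that give $HT$ and $TH$ the same value.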

Well, this expression of the condition as $f([x])=[f(x)]$, even if technically misleading (the two $f$'s aren't really the same thing), gives us the interpretation that a measurable function is one that commutes with the partition, or preserves the partition.

While homomorphisms in settings other than measurable spaces do not precisely follow the "cannot distinguish related points" notion, they do follow a generalisation in which equivalence relations are replaced by other relations, operations, etc.: in topology, a continuous function preserves limits; in group theory, a group homomorphism preserves the group operation; in linear algebra, a linear transformation preserves linear combinations; in order theory, an increasing function preserves order. In each case, a homomorphism is a function that does not "break" relationships by creating a "finer" relationship on the target space.