The Winding Number: quantum mechanics

Mixed states II: decoherence; important measures of purity and entropy

Decoherence

At the end of this section, you should be able to:

appreciate why the density matrix is really a great way of expressing states, even for pure states (it uniquely determines the dynamics of the system, without any "overall phase", etc.)
develop an intuition for measurement, even "inadvertent" measurement
understand on a somewhat high level how classical physics arises as a limit of quantum physics
hang out with Wigner's friend
admit that complex phases matter in quantum mechanics and link them to interference

Let's talk about measurement.

Suppose we have a system that we wish to measure with an operator whose eigenvectors are $|0\rangle_A$ and $|1\rangle_A$. The idea is that we have some measurement apparatus $B$, and the combined state of the system and the apparatus evolves from something like:

$$|\psi\rangle_{AB}=(\lambda|0\rangle_A+\mu|1\rangle_A)\otimes|0\rangle_B$$
To the entangled state:

$$|\psi\rangle_{AB} = \lambda|0\rangle_A\otimes|0\rangle_B+\mu|1\rangle_A\otimes|1\rangle_B$$
Then observing the apparatus is sufficient to observe the system. The idea is that ultimately, the observer himself (or his "knowledge") is the apparatus, and he entangles with the system to measure it.

Well, we know that often, we end up seeing things we didn't really want to. After all, physics does not care about your wants and preferences. In fact, in pretty much any situation, information about the system will leak out into the surroundings in some specific way. For example, Schrodinger's cat leaks information about the life of the cat by making the environment smelly, i.e. the state evolves from:

$$|\psi\rangle_{AB}=(\lambda|\mathrm{alive}\rangle+\mu|\mathrm{dead}\rangle)\otimes|\mathrm{clean}\rangle$$
To the entangled state:

$$|\psi\rangle_{AB}=\lambda|\mathrm{alive}\rangle\otimes|\mathrm{clean}\rangle+\mu|\mathrm{dead}\rangle\otimes|\mathrm{smelly}\rangle$$
What this means is that the density matrix of the cat evolves as:

$$\left[ {\begin{array}{*{20}{c}}{{{\left| \lambda \right|}^2}}&{\lambda \bar \mu }\\{\mu \bar \lambda }&{{{\left| \mu \right|}^2}}\end{array}} \right] \mapsto \left[ {\begin{array}{*{20}{c}}{{{\left| \lambda \right|}^2}}&0\\0&{{{\left| \mu \right|}^2}}\end{array}} \right]$$
(Check that I got the right transpose.) OK, what happened here?

Recall that the probabilities of collapsing to $|0\rangle$ and $|1\rangle$ are determined purely by the elements on the diagonal -- the off-diagonal elements, or the coherences, are only relevant for collapsing on to some combination of $|0\rangle$ and $|1\rangle$. What's going on here is that when the environment entangles with the system, it has "kinda" already observed it -- like your Wigner's friend. It "knows" that the system isn't in $|0\rangle+|1\rangle$, and even though you haven't observed the environment yet (you haven't smelled it), you know how the combined state has evolved, and the probability has become a classical probability, because the quantum stuff has already been observed -- by the environment.
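Here is a minimal numerical sketch of the density-matrix map above, assuming both the cat and the environment are two-level systems. The amplitudes `lam`, `mu` and the helper `reduced_system_state` are made up for illustration; the reduced state is obtained by tracing out the environment.

```python
import numpy as np

# Hypothetical amplitudes, chosen only for illustration (|lam|^2 + |mu|^2 = 1).
lam, mu = 0.6, 0.8j

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def reduced_system_state(psi_ab):
    """Trace out subsystem B from a pure state of two qubits A and B."""
    c = psi_ab.reshape(2, 2)      # c[a, b] = amplitude of |a>_A |b>_B
    return c @ c.conj().T         # rho_A[a, a'] = sum_b c[a, b] conj(c[a', b])

# Before: system in a superposition, environment still in |0>_B.
before = np.kron(lam * ket0 + mu * ket1, ket0)
# After: the environment has recorded which state the system is in.
after = lam * np.kron(ket0, ket0) + mu * np.kron(ket1, ket1)

rho_before = reduced_system_state(before)
rho_after = reduced_system_state(after)
print(np.round(rho_before, 3))   # off-diagonal coherences present
print(np.round(rho_after, 3))    # coherences gone, diagonal unchanged
```

The diagonal entries $|\lambda|^2$ and $|\mu|^2$ are the same in both matrices; only the off-diagonal terms are lost.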

The idea behind decoherence is the same idea that ensures that the Wigner's friend scenario is consistent.

"Eventually", "all" the information about the system will leak into the environment -- i.e. in principle, we should be able to determine anything about the system from measuring the environment, and our uncertainty about the system arises entirely from our completely classical uncertainty about the environment -- so the density matrix becomes a classical one, i.e. a diagonal one (the off-diagonal terms go to zero).

What basis is it diagonal in? In the basis corresponding to the states of the environment -- i.e. if the environment can be in states $|0\rangle_B$ and $|1\rangle_B$, then the states of the system that precisely induce these states of the environment form the preferred basis. This basis is often called the "environmentally selected basis".

This process is called decoherence. You may also hear the terms pointer states (for the preferred basis), einselection (environmentally induced selection of the preferred basis), or Quantum Darwinism (what the heck?) -- but they're really synonymous. We'll just use the fancy words when they're grammatically useful.

Well, the following may not be completely clear, but you should at least be able to appreciate that it is true: the off-diagonal terms approach zero, rather than hit it. Why? Although the system leaks information into the surroundings, we aren't really certain about what we're inferring about the system from the environment -- a live cat may be smelly too, etc. So the pointer states are not exactly orthogonal, either.
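To see the "approach zero, rather than hit it" claim in a toy model, suppose the two environment states are not orthogonal but overlap with $\langle E_{\mathrm{alive}}|E_{\mathrm{dead}}\rangle=\cos\theta$. This parametrisation and the function name `cat_coherence` are assumptions made for this sketch only. The surviving coherence is then $\lambda\bar\mu\cos\theta$.

```python
import numpy as np

def cat_coherence(lam, mu, theta):
    """Off-diagonal element <alive|rho_cat|dead> when the environment states
    overlap with <E_alive|E_dead> = cos(theta) (a toy parametrisation)."""
    alive = np.array([1, 0], dtype=complex)
    dead = np.array([0, 1], dtype=complex)
    e_alive = np.array([1, 0], dtype=complex)
    e_dead = np.array([np.cos(theta), np.sin(theta)], dtype=complex)
    psi = lam * np.kron(alive, e_alive) + mu * np.kron(dead, e_dead)
    c = psi.reshape(2, 2)          # c[cat, environment]
    rho_cat = c @ c.conj().T       # partial trace over the environment
    return rho_cat[0, 1]

lam = mu = 1 / np.sqrt(2)
for theta in (0.0, np.pi / 3, np.pi / 2):
    # theta = 0: environment learned nothing; theta = pi/2: perfect record.
    print(round(theta, 3), round(abs(cat_coherence(lam, mu, theta)), 3))
```

The coherence shrinks smoothly as the environment states become more distinguishable, and vanishes only when they are exactly orthogonal.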

The precise behavior of decoherence depends on the Hamiltonian of the system -- e.g. predicting the generation of the smelliness of the air from the state of the cat based on what's going on microscopically is something that could be done in principle by solving a really complicated Schrodinger equation. You can, given a Hamiltonian, at least make order-of-magnitude estimates of how long it takes, and how macroscopic a scale (i.e. how many degrees of freedom) is needed, for the system to begin behaving in a way that can be described as classical.

Decoherence does not remove the need for wavefunction collapse -- one still needs the observer to note an observation, collapsing the system.

TBC: purity, entropy, correlation functions

Time evolution, Schrodinger and Heisenberg pictures, Noether's theorem

So far, we have discussed quantum mechanics without any reference to changes across time. You might think we could just upgrade $\psi(x)$ to $\psi(x,t)$ and e.g. an observable $Q_t$ measuring a value $q$ at time $t$ would have eigenvectors whose cross-section at $t$ are of the form $\delta(x-a)$. But this would mean the entire $\psi(x,t)$ is the state of the object, rather than there being a state at each value of $t$, and time would be an observable. This is clearly not what we want (right now -- although to be consistent with special relativity we will need to treat space and time on an equal footing later in this series).

Instead, a more appropriate approach is to say that the state is a function of time $|\psi(t)\rangle$ and the evolution of the state is given by some operation $|\psi(t)\rangle=U[|\psi(0)\rangle]$.

How do we know that $U$ is a linear operator? What does it mean for $U$ to be a linear operator anyway? The only sense in which such a linearity can be tested is by looking at a state in a superposition. So suppose $|\psi(0)\rangle=|\psi_1(0)\rangle+|\psi_2(0)\rangle$. Now $|\psi(t)\rangle=U[|\psi_1(0)\rangle+|\psi_2(0)\rangle]$ -- this is from the perspective of some observer Alice.

But if another observer Bob had previously observed and collapsed the system to $|\psi_1(0)\rangle$ at time 0, then according to him, the state should evolve to $U[|\psi_1(0)\rangle]$, and if he had observed the system in $|\psi_2(0)\rangle$, his knowledge of the system would evolve to $U[|\psi_2(0)\rangle]$.

So according to Alice, who doesn't know what Bob has observed (she has not observed him), her knowledge of the system can also be written as $U[|\psi_1(0)\rangle]+U[|\psi_2(0)\rangle]$. Thus

$$U[|\psi_1(0)\rangle+|\psi_2(0)\rangle]=U[|\psi_1(0)\rangle]+U[|\psi_2(0)\rangle]$$
I.e. $U$ is linear, so we can write it as a linear operator as in $U|\psi(0)\rangle$. (The above scenario is called Wigner's friend.)

$U$ is also clearly a unitary operator, as it must preserve all lengths.

We can consider infinitesimal time evolutions $U_t(dt)$ representing evolution of the state from $t$ to $t+dt$. Then:

$$|\psi(t)\rangle=U_{t-dt}(dt)\dots U_0(dt)|\psi(0)\rangle$$
This product integral can be written alternatively as:

$$|\psi(t)\rangle=\mathcal{T}\left\{e^{\int \ln U_t(dt)}|\psi(0)\rangle\right\}$$

$\mathcal{T}$ is the time-ordering operator, which reorders a product like $H(t_1)H(t_2)$ in each term of an expansion so that operators at later times stand to the left (i.e. they act after the earlier ones). Can you see why this is necessary? (Hint: $e^{A}e^{B}\ne e^{A+B}$ for noncommuting $A,B$.)

Oh, and it's not actually an operator -- not even in the math sense, it's a "formal operation", one that takes a form or sentence (rather than its value) -- in this case the $\exp$ Taylor expansion -- and changes it some way.
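A quick numerical illustration of why the ordering matters, assuming units with $\hbar=1$ and a made-up piecewise-constant Hamiltonian (equal to $\sigma_x$ for the first unit of time and $\sigma_z$ for the second). The helper `propagator` is a hypothetical name; it exponentiates a Hermitian matrix by diagonalising it.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def propagator(h, dt):
    """exp(-i h dt) for Hermitian h (hbar = 1), via diagonalisation."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w * dt)) @ v.conj().T

# Hypothetical Hamiltonian: sx for 0 <= t < 1, then sz for 1 <= t < 2.
u_ordered = propagator(sz, 1.0) @ propagator(sx, 1.0)    # later time on the left
u_reversed = propagator(sx, 1.0) @ propagator(sz, 1.0)   # wrong order
u_naive = propagator(sx + sz, 1.0)                       # exp of the integral, no ordering

print(np.allclose(u_ordered, u_reversed))  # False: order matters
print(np.allclose(u_ordered, u_naive))     # False: e^A e^B != e^(A+B) here
```

All three matrices are unitary, so none of them is obviously "wrong" on its face; only the time-ordered product is the actual evolution.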

$\ln U_t(dt)$ is an infinitesimal, and it's easy to see that it is equal to $U_{t}'(0)dt$ -- a member of the Lie algebra. We know, of course, that the Lie algebra of the unitary group consists of anti-Hermitian operators (this can be checked without Lie theory, of course), and so $iU_{t}'(0)$ is Hermitian. From Lie theory, we can tell that this represents a generator of time translations -- and from a little experience of classical mechanics, we want this to represent energy. So for dimensional consistency with energy, we write:

$$H(t)=i\hbar U_{t}'(0)$$
(Why $\hbar$ and not $h$? Because $U_{t}'(0)$ is basically already in "radians per second".) This is called the Hamiltonian operator. It determines $U_t$, and thus describes the time evolution of a state. How exactly? Since:

$$|\psi(t+dt)\rangle=U_t(dt)|\psi(t)\rangle$$
Rearranging, we can write:

$$\frac{\partial |\psi(t)\rangle}{\partial t}=-\frac{i}{\hbar}H(t)|\psi(t)\rangle$$
This is the most general form of the Schrodinger equation. Note that the earlier exponential equation is the "general solution" to this equation -- obviously not very useful, rewritten as what is known as the Dyson series:

$$|\psi(t)\rangle=\mathcal{T}\left\{e^{-i/\hbar\int H(t) dt}|\psi(0)\rangle\right\}$$
It's easy to show from this that the evolution of a density matrix $\rho(t)$ is similarly:

$$\frac{\partial\rho(t)}{\partial t}=-\frac{i}\hbar [H, \rho]$$
Which is the von Neumann equation, whose solution is given by:

$$\rho(t)=\mathcal{T}\left\{e^{-i/\hbar\int H(t) dt}\rho(0)e^{i/\hbar \int H(t) dt}\right\}$$
These should all appear as obvious special cases of Lie theoretic results.
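For a time-independent Hamiltonian the time-ordering is trivial and $U(t)=e^{-iHt/\hbar}$, so both equations can be checked by finite differences. The sketch below assumes $\hbar=1$ and an arbitrary illustrative $2\times2$ Hermitian `H`; none of the names come from a library.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch
H = np.array([[1.0, 0.5], [0.5, -1.0]], dtype=complex)  # hypothetical Hamiltonian

def U(t):
    """exp(-i H t / hbar) for the constant Hermitian H above."""
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * t / hbar)) @ v.conj().T

psi0 = np.array([1, 0], dtype=complex)
rho0 = np.outer(psi0, psi0.conj())

def psi_at(t):
    return U(t) @ psi0

def rho_at(t):
    return U(t) @ rho0 @ U(t).conj().T

t, dt = 0.7, 1e-6
# Central finite differences for the time derivatives.
dpsi = (psi_at(t + dt) - psi_at(t - dt)) / (2 * dt)
drho = (rho_at(t + dt) - rho_at(t - dt)) / (2 * dt)

psi, rho = psi_at(t), rho_at(t)
schrodinger_residual = np.max(np.abs(dpsi - (-1j / hbar) * H @ psi))
von_neumann_residual = np.max(np.abs(drho - (-1j / hbar) * (H @ rho - rho @ H)))
print(schrodinger_residual, von_neumann_residual)  # both tiny
```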

$H(t)$ is not the same as $i\hbar\partial/\partial t$. $H(t)$ is a Hermitian operator, i.e. an observable, while $\partial/\partial t$ does not act on the Hilbert space at all. One could also see what could go wrong by equating the two in the "solution to the Schrodinger equation" above. The Schrodinger equation does not say that $H$ and the time-derivative are equal in general -- rather, it says that they are the same on a valid state vector $|\psi(t)\rangle$ -- you cannot just "factor this out".

So the Hamiltonian is fundamentally what determines the dynamics of a quantum system. Give me a Hamiltonian, and you've given me a theory. The Schrodinger equation (or equivalently the von Neumann equation) above is just an axiom of quantum mechanics/of any quantum mechanical theory.

Can we talk about the velocity and acceleration observables for a moment? Actually, we can't, because they fundamentally have to do with time evolution, and we can't have observables that depend on the time-evolution of the state -- observables must act on the Hilbert space. But we can define observables that predict how the state will evolve (like the Hamiltonian with the Schrodinger equation).

Doing this systematically is where the Heisenberg picture comes in.

What does this mean? Everything we've discussed so far is the Schrodinger picture, where the state evolves on a fixed background basis created by the observables' eigenvectors -- so observables represent active transformations. Instead, we can have a completely different picture of reality, the Heisenberg picture, where we view time-evolution as simply viewing the state in a different basis -- then the observables represent passive transformations.

OK, so how do we do this? Remember how every question in quantum mechanics can fundamentally be asked in terms of expectation values (specifically those of Hermitian projections). The expected value of an observable at time $t$ of course evolves as:

$$\langle A\rangle(t) = \langle\psi|U^*(t)AU(t)|\psi\rangle$$
In the Schrodinger picture, we attach the $U(t)$ to $|\psi(0)\rangle$ to make $\langle A\rangle(t)=\langle\psi(t)|A|\psi(t)\rangle$. In the Heisenberg picture instead, we attach the $U(t)$ to the $A(0)$, writing $\langle A\rangle(t)=\langle\psi|A(t)|\psi\rangle$.
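The two bookkeeping choices give the same number, which is easy to confirm numerically. This is a sketch with $\hbar=1$ and an illustrative two-level `H` and observable `A0`; the function name `evolve` is made up for this example.

```python
import numpy as np

H = np.array([[1.0, 0.5], [0.5, -1.0]], dtype=complex)   # hypothetical Hamiltonian
A0 = np.array([[0, 1], [1, 0]], dtype=complex)           # observable at t = 0
psi0 = np.array([1, 0], dtype=complex)

def evolve(t, hbar=1.0):
    """U(t) = exp(-i H t / hbar) for the constant H above."""
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * t / hbar)) @ v.conj().T

t = 1.3
# Schrodinger picture: the state moves, the observable stays put.
psi_t = evolve(t) @ psi0
schrodinger_value = np.vdot(psi_t, A0 @ psi_t)
# Heisenberg picture: the observable moves, the state stays put.
A_t = evolve(t).conj().T @ A0 @ evolve(t)
heisenberg_value = np.vdot(psi0, A_t @ psi0)

print(schrodinger_value.real, heisenberg_value.real)  # identical
```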
From differentiating conjugation in $A(t)=U^*(t)A(0)U(t)$, we get:

$$\frac{dA}{dt}=\frac{i}\hbar [H, A]$$
This is the Heisenberg equation. Immediately, it yields:

$$\begin{array}{l}\frac{{dX}}{{dt}} = \frac{i}{\hbar }\left[ {H,X} \right]\\\frac{{{d^2}X}}{{d{t^2}}} = - \frac{1}{{{\hbar ^2}}}\left[ {H,\left[ {H,X} \right]} \right]\end{array}$$
Thinking of the evolution of $X$ as a translation of the co-ordinate system, etc., what this does is give us two conditions on what the Hamiltonian should look like for a "Euclidean" system:

$$\begin{array}{l}\left[ {H,X} \right] = - \frac{{i\hbar }}{m}P\\\left[ {H,\left[ {H,X} \right]} \right] = \frac{{{\hbar ^2}}}{m}U'(x)\end{array}$$
This gives us yet another strong reason (besides the fact that the Hamiltonian generates time-translations, that the "eigenvectors" of $\partial/\partial t$ are the energy states by the de Broglie theorem (but not really), etc.) to suspect that the Hamiltonian represents the energy of the system. Indeed if we use:

$$H=\frac1{2m}P^2+U(x)$$
We can confirm those conditions above. Well, this is certainly not the only Hamiltonian compatible with classical mechanics, so at this point, I'll just say that this is confirmed by experiment, and is an axiom of the quantum theory of Euclidean systems.

Exercise: By taking expectation values in the Heisenberg equation, show that $m\frac{d}{dt}\langle x\rangle =\langle p\rangle$ and $\frac{d}{dt}\langle p\rangle = -\langle U'(x)\rangle$ under the Euclidean Hamiltonian. This is called the Ehrenfest theorem.

I'll discuss one final application of the Heisenberg formalism: it makes Noether's theorem completely trivial.

Indeed, $dA/dt=0$ iff $[H,A]=0$ iff $\forall\tau, H=e^{-i/\hbar A\tau}He^{i/\hbar A \tau}$. Then $A$ is a conserved quantity and conjugation with it as an infinitesimal generator represents a symmetry of the Hamiltonian.
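As a toy check of this statement, take two qubits with the made-up Hamiltonian $H=\sigma_x\otimes\sigma_x$ (with $\hbar=1$). Then $\sigma_z\otimes\sigma_z$ commutes with $H$ and should be conserved, while $\sigma_z\otimes 1$ does not commute and should drift. All names below are illustrative.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

H = np.kron(sx, sx)            # hypothetical two-qubit Hamiltonian
conserved = np.kron(sz, sz)    # commutes with H
drifting = np.kron(sz, I2)     # does not commute with H

def evolve(t, hbar=1.0):
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * t / hbar)) @ v.conj().T

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0                  # the state |00>

def expectation(op, t):
    psi_t = evolve(t) @ psi0
    return np.vdot(psi_t, op @ psi_t).real

# The symmetry side: conjugating H by exp(-i A tau) leaves it unchanged.
tau = 0.37
g = np.diag(np.exp(-1j * np.diag(conserved) * tau))   # A is diagonal here
symmetry_holds = np.allclose(g @ H @ g.conj().T, H)

for t in (0.0, 0.4, 0.9):
    print(t, round(expectation(conserved, t), 6), round(expectation(drifting, t), 6))
```

For this choice of state the drifting expectation value works out to $\cos 2t$, while the conserved one stays at 1.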

Mixed states I: density matrix, partial trace, the most general Born rule

In the last article, we saw that sub-systems entangled with other sub-systems did not have well-defined pure states themselves -- just like correlated random variables don't have their own probability distributions. Since pretty much everything you see in the real world is entangled with something -- has correlations with some other thing -- this is a problem. One can't just consider the "state of the entire universe" when you just want to study a single electron or something.

Wait -- why can't we just consider the marginal distributions, like we do in statistics? OK, suppose we start with the system -- with $|\phi\rangle$ and $|\varphi\rangle$ an orthonormal basis:

$$|\psi\rangle = \frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$$
Naively, you may think that the state of the first sub-system $|\psi_1\rangle$ may be given by $|\psi'_1\rangle=\frac1{\sqrt2}|\phi\rangle+\frac1{\sqrt2}|\varphi\rangle$. Certainly, if we're measuring the subsystem with an operator with eigenvectors $|\phi\rangle$ and $|\varphi\rangle$, you have 50% probabilities of each. But to say that two things are in the same state requires that they produce the same outcome for any measurement, not just that one. Does our sub-system behave exactly like $|\psi'_1\rangle$ for all observables? Recall that in the last article, we showed that collapsing the first sub-system onto $|\chi\rangle$ collapses the entire system into the state:

$$|\chi\rangle\otimes\left(
\langle\chi|\varphi\rangle|\phi\rangle+\langle\chi|\phi\rangle|\varphi\rangle\right)$$
To calculate the probability amplitude of this collapse, we may take the inner product of this with the original state -- you can compute this, and see the answer comes down to $1/\sqrt2$, i.e. there's a probability of $1/2$ of the first subsystem collapsing to any such eigenstate $|\chi\rangle$. You can use any observable in this two-dimensional state space, and the sub-system would collapse into either eigenstate with probability exactly $1/2$.

This is a completely different situation from if the state of the first subsystem were simply a pure state like $|\psi'_1\rangle$.
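A small numerical experiment makes the difference concrete. The sketch below draws a few random states $|\chi\rangle$ and compares the collapse probability for the entangled pair with that for the naive pure state $|\psi'_1\rangle$. The helper names are made up, and $|\phi\rangle,|\varphi\rangle$ are taken to be the standard basis vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([1, 0], dtype=complex)
varphi = np.array([0, 1], dtype=complex)

entangled = (np.kron(phi, varphi) + np.kron(varphi, phi)) / np.sqrt(2)
naive_pure = (phi + varphi) / np.sqrt(2)   # the candidate |psi'_1>

def random_state():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def prob_subsystem_1(psi_ab, chi):
    """Probability that the first subsystem collapses onto |chi>."""
    projector = np.kron(np.outer(chi, chi.conj()), np.eye(2))
    return np.vdot(psi_ab, projector @ psi_ab).real

def prob_pure(psi, chi):
    return abs(np.vdot(chi, psi)) ** 2

samples = [random_state() for _ in range(5)]
entangled_probs = [prob_subsystem_1(entangled, chi) for chi in samples]
pure_probs = [prob_pure(naive_pure, chi) for chi in samples]
print(np.round(entangled_probs, 6))  # always 0.5
print(np.round(pure_probs, 6))       # varies with chi
```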

The situation we're dealing with is called a mixed state -- an example of a mixed state, in line with the motivating examples we had at the beginning of the course -- is unpolarised light. In fact, the state we described above models precisely unpolarised light involving two photons (is it obvious why?).

The basic idea behind mixed states is that we have some uncertainty as to what the state of a particle is -- we don't know if the particle has state $|\phi\rangle$ or $|\varphi\rangle$ -- it has a 50% chance of either. This is a classical probability, rather than a quantum one, and different from the state being a superposition of these states, as we just saw above.

Does this sort of thing occur with multivariate distributions in statistics? Suppose we have a multivariate distribution $\psi(x,y)$ and extract the marginal $x$-distribution $\phi(x) = \int_y \psi(x,y) dy$. Certainly this $\phi(x)$ gives us the right probability densities of each $x$-value. But the analog of considering general states like $|\chi\rangle$ is to make a transformation of the domain -- like a Fourier transform -- and consider probability densities in the transformed domain.

As an exercise, write down a multivariate Fourier transform expression for $\hat{\psi}(\omega_1,\omega_2)$ and use it to compute the $\omega_1$-marginal probabilities $\hat{\phi}(\omega_1)$ -- compare this to what you would get if you were to Fourier-transform the $x$-marginal $\phi(x)$ directly.

But saying "it has 1/2 probability of being in $|\phi\rangle$ and 1/2 probability of being in $|\varphi\rangle$" is clearly an overdetermination. As we saw above, this resulting state has a 1/2 probability of collapsing onto any state -- this is a statement that doesn't depend on $|\phi\rangle$ and $|\varphi\rangle$, the behaviour of the state is the same if you describe it instead as having "1/2 probability of being in $\frac1{\sqrt2}(|\phi\rangle+|\varphi\rangle)$ and 1/2 probability of being in $\frac1{\sqrt2}(|\phi\rangle-|\varphi\rangle)$". These two are different statistical ensembles but in the same mixed state.

You can see similarly that "50% horizontally polarised + 50% vertically polarised" is the same mixed state as "50% left-circular + 50% right-circular" -- they're both just unpolarised light.

What's the general condition for two statistical ensembles to produce the same observations?

Given statistical ensembles $\left(\left(p_i,|\psi_i\rangle\right)\right)$ and $\left(\left(p'_i,|\psi'_i\rangle\right)\right)$, they are the same mixed state if for all $|\chi\rangle$, the probability of collapsing onto $|\chi\rangle$ is the same, i.e.

$$\sum_i p_i|\langle\psi_i|\chi\rangle|^2=\sum_i p'_i|\langle\psi'_i|\chi\rangle|^2$$
Well, each side of this equation is just the evaluation of a quadratic form for the vector $|\chi\rangle$ -- and two quadratic forms are identically equal on all vectors if and only if their matrix representations are the same. Well, what's the matrix representation? In the basis of $|\psi_i\rangle$ (when these are orthonormal), it's just the diagonal matrix of probabilities $p_i$. The way to write this in bra-ket notation is to factor out the $|\chi\rangle$s:

$$\left\langle \chi \right|\left( {\sum\limits_i {{p_i}\left| {{\psi _i}} \right\rangle \left\langle {{\psi _i}} \right|} } \right)\left| \chi \right\rangle = \left\langle \chi \right|\left( {\sum\limits_i {{p'_i}\left| {{{\psi '}_i}} \right\rangle \left\langle {{{\psi '}_i}} \right|} } \right)\left| \chi \right\rangle
$$
This quadratic form in between, representing a mixed state, is called the density matrix and can be used to completely specify mixed states. In this sense, it is a generalisation of the state vector, which can only be used to represent pure states.

$$\rho={\sum\limits_i {{p_i}\left| {{\psi _i}} \right\rangle \left\langle {{\psi _i}} \right|} }
$$
You may confirm that indeed:

$$\frac12|\phi\rangle\langle\phi|+\frac12|\varphi\rangle\langle\varphi|=\frac12\left(\frac{|\phi\rangle+|\varphi\rangle}{\sqrt2}\frac{\langle\phi|+\langle\varphi|}{\sqrt2}\right)+\frac12\left(\frac{|\phi\rangle-|\varphi\rangle}{\sqrt2}\frac{\langle\phi|-\langle\varphi|}{\sqrt2}\right)$$
In fact, there is a simpler way to see that those two ensembles are the same: the density matrix is simply the Gram matrix of the ensemble -- you take the states in the ensemble, weighted by $\sqrt{p_i}$ in a matrix $Y$, and $\rho=Y^*Y$. Well, $Y^*Y=Y'^*Y' \iff Y'=UY$ for some unitary $U$, i.e. the ensembles are rotations of each other.
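The identity is quick to confirm numerically. The sketch below builds the density matrix for three different 50/50 ensembles (the helper `density_matrix` is a made-up name, and $|\phi\rangle,|\varphi\rangle$ are taken as the standard basis) and checks that all three are the maximally mixed state.

```python
import numpy as np

phi = np.array([1, 0], dtype=complex)
varphi = np.array([0, 1], dtype=complex)

def density_matrix(ensemble):
    """Sum of p_i |psi_i><psi_i| over (p_i, psi_i) pairs."""
    return sum(p * np.outer(v, v.conj()) for p, v in ensemble)

ensemble_a = [(0.5, phi), (0.5, varphi)]
ensemble_b = [(0.5, (phi + varphi) / np.sqrt(2)),
              (0.5, (phi - varphi) / np.sqrt(2))]
ensemble_c = [(0.5, (phi + 1j * varphi) / np.sqrt(2)),   # "circular" combinations
              (0.5, (phi - 1j * varphi) / np.sqrt(2))]

rho_a = density_matrix(ensemble_a)
rho_b = density_matrix(ensemble_b)
rho_c = density_matrix(ensemble_c)
print(np.round(rho_a, 3))
print(np.allclose(rho_a, rho_b), np.allclose(rho_a, rho_c))
```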

Properties of the density matrix, generalised Born's rule, etc.

Here's something that's obvious: the density matrix is nonnegative-definite ("positive-semidefinite") Hermitian and unit-trace -- and all such matrices can represent density matrices.

Well, so it's a Hermitian operator -- does it represent any interesting observable? Not really. It's an observable, sure, but not an interesting one (you might say it measures something's being in one of the ensemble states -- written in an orthonormal basis -- and whose eigenvalues are the mixing ratios, etc. -- but what if two mixing ratios are the same? Its behaviour is just bizarre and useless, really).

We saw earlier that the probability of a density matrix collapsing into a state $|\chi\rangle$ is given by $\langle\chi|\rho|\chi\rangle$.

This is completely different from the generalised Born's rule we saw earlier which took the form $\langle\psi|L|\psi\rangle$! There, the state was the vector and the information on the projection space was the quadratic form in between. Here, the state is the quadratic form in between while the state being projected onto is the vector. This is just a generalisation of the simple Born rule $\langle\chi|\psi\rangle\langle\psi|\chi\rangle$, as far as I can see. If anyone comes up with a connection between it and the generalised Born rule for pure states, tell me.

This brings the question, though -- what's the most generalised Born's rule we can come up with? What is the probability of a mixed state collapsing into some eigenspace of a Hermitian projection operator?

Well, given the ensemble $((p_i,|\psi_i\rangle))$ (you can start writing ensembles with their density matrices now if you like, like $\sum p_i|\psi_i\rangle\langle\psi_i|$ -- but I just want to reaffirm that our result will indeed be in terms of the density matrix), the probability is:

$$\sum_i p_i\langle\psi_i|L|\psi_i\rangle$$
This is hardly useful -- it's not in terms of the density matrix at all. But look at each term -- what's $p_i\langle\psi_i|L|\psi_i\rangle$? $L|\psi_i\rangle$ is the $i$th column of $L$ in the $(|\psi_i\rangle)$-basis -- the inner product $\langle\psi_i|L|\psi_i\rangle$ is the $i$th entry of this column. Multiplying this by $p_i$ gives us the dot product of the $i$th row of $\rho$ with the $i$th column of $L$. The sum of these for all $i$ gives us the trace of $\rho L$:

$$\ldots = \mathrm{tr}(L\rho)$$
This is the most general form of Born's rule. Note that our derivation could have also applied to finding the expectation value of a general operator $A$ under the density matrix $\rho$ (recall that Hermitian projection operators are basically "indicator variables" whose expectation values represent probabilities), indeed generally:

$$\langle A\rangle_\rho=\mathrm{tr}(A\rho)$$
(note that $\mathrm{tr}(V)=\sum_{i} \langle i | V|i \rangle $ for any basis $(|i\rangle)$, which you should show.)

It is also trivial to show that upon collapse given by Hermitian projection operator $L$, the density matrix collapses to:

$$\rho'=\frac1{\mathrm{tr}\left(L\rho L\right)}{L\rho L}=\frac1{\mathrm{tr}\left(L\rho\right)}{L\rho L}$$
This generalises the pure-state collapse to $|\psi'\rangle=\frac{1}{\sqrt{\langle \psi | L | \psi\rangle}}L|\psi\rangle$. One may check that the above expression reduces to $|\chi\rangle\langle\chi|$ in the case where $L=|\chi\rangle\langle \chi|$.
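Both the trace form of Born's rule and the collapse rule can be checked on a small example. The ensemble below is deliberately non-orthogonal (25% $|0\rangle$, 75% $|+\rangle$) to stress that nothing depends on the ensemble states forming a basis; the numbers and names are illustrative only.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

ensemble = [(0.25, ket0), (0.75, plus)]            # hypothetical mixing ratios
rho = sum(p * np.outer(v, v.conj()) for p, v in ensemble)

L = np.outer(ket0, ket0.conj())                    # projector onto |0>

# Born's rule two ways: ensemble average vs. trace with the density matrix.
born_ensemble = sum(p * np.vdot(v, L @ v).real for p, v in ensemble)
born_trace = np.trace(L @ rho).real

# Collapse of the density matrix under L.
rho_collapsed = L @ rho @ L / np.trace(L @ rho)

print(born_ensemble, born_trace)
print(np.round(rho_collapsed, 3))                  # equals |0><0|
```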

Partial trace, trace

We started our discussion considering the pure state $\frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$ and asking for the mixed state of the first sub-system. We computed the inner product of this state with its projection under the operator $|\chi\rangle\langle\chi|\otimes1$ -- this tells us the evaluation of the quadratic form $\langle\chi|\rho_A|\chi\rangle$ at all vectors $|\chi\rangle$, which determines the quadratic form $\rho_A$ of the first state.

So what exactly did we do -- in general? Starting with a density matrix $\rho$ on $H_1\otimes H_2$, we compute the probability of the first sub-system appearing in state $|\chi\rangle$: it's $\mathrm{tr}((|\chi\rangle\langle\chi|\otimes 1)\rho)$. So we try to find a density matrix $\rho_1$ satisfying, for all states $|\chi\rangle$:

$$\mathrm{tr}((|\chi\rangle\langle\chi|\otimes 1)\rho)=\mathrm{tr}(|\chi\rangle\langle\chi|\rho_1)$$

Exercise: Let $V$ be an operator on $H_1\otimes H_2$. We define its partial trace on $H_2$ as $\mathrm{tr}_2(V)=\sum_{j}\langle j|V|j\rangle $ for basis $(|j\rangle)$ of $H_2$ (where the inner product is done by extending operators by tensoring them with the identity). Show that the density matrix $\rho_1$ is given by:

$$\rho_1=\mathrm{tr}_2(\rho)$$
I.e. show that for operators of the form $A\otimes I$: $\mathrm{tr}[(A\otimes I)\rho]=\mathrm{tr}_1(A\,\mathrm{tr}_2\rho)$.
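This is not a proof, but a numerical sanity check of the identity can be reassuring before attempting the exercise. The sketch below assumes the usual ordering convention of `np.kron` (first factor varies slowest) and uses a made-up helper `partial_trace_2`; the density matrix and observable are random.

```python
import numpy as np

def partial_trace_2(rho, d1, d2):
    """tr_2 of an operator on H_1 (x) H_2, returned as a d1 x d1 matrix."""
    # Index the operator as rho[i, j, k, l] = <i, j| rho |k, l> and sum over j = l.
    return np.einsum('ijkj->ik', rho.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(1)
d1, d2 = 2, 3

# A random density matrix: positive semidefinite, unit trace.
m = rng.normal(size=(d1 * d2, d1 * d2)) + 1j * rng.normal(size=(d1 * d2, d1 * d2))
rho = m @ m.conj().T
rho = rho / np.trace(rho)

# A random Hermitian observable acting on the first subsystem only.
a = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))
a = a + a.conj().T

rho_1 = partial_trace_2(rho, d1, d2)
lhs = np.trace(np.kron(a, np.eye(d2)) @ rho)
rhs = np.trace(a @ rho_1)
print(lhs.real, rhs.real)  # equal
```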

Systems and sub-systems, the tensor product, quantum entanglement

What if we want to describe probabilities relating to multiple objects -- a system of objects?

You might think it's sufficient to just write down the state vector of each individual object -- but this doesn't really tell us the entire picture. Suppose for example we're considering the state of Schrodinger's cat, less popularly known as Schrodinger's cuckoo, where the box contains TNT and the cuckoo bird. The state of the TNT is $|\mathrm{unexploded}\rangle+|\mathrm{exploded}\rangle$ and the state of the cuckoo is $|\mathrm{alive}\rangle+|\mathrm{dead}\rangle$ -- right?

If I'm in charge of the box, the state of the cuckoo will definitely be $|\mathrm{dead}\rangle$ even if the state of the TNT is $|\mathrm{unexploded}\rangle$

Not exactly. There is a correlation between the state of the TNT and the state of the cuckoo that goes missing here -- the cuckoo is dead if and only if the TNT has exploded, and the cuckoo is alive if and only if the TNT is unexploded. In fact, defining the state vectors separately doesn't really make any sense -- we're just assigning the coefficients based on overall probabilities, and as we will see, this is really mixing up quantum and classical probabilities in a certain way whereas the state vector is supposed to only show quantum probabilities.

Well, this is really the same question as what we do when we have multiple correlated random variables in statistics -- we define a "joint" probability function on a "joint" phase space that is the Cartesian product of the original phase spaces.

You may be inclined to claim that similarly, our new Hilbert space should be the Cartesian product of the original Hilbert spaces. But Hilbert spaces are different in a fundamental sense from these phase spaces -- every point on the classical phase space is an independent state in the Hilbert space -- and general vectors are distributions on the classical phase space. So like how the cardinalities of the classical phase spaces are multiplied, the dimensions of the Hilbert spaces are multiplied.

The reason that we sometimes draw an analogy between the classical state space and the quantum state space is that the state vectors are really the "real objects" in quantum mechanics, and the Hilbert space shows the possible configurations of the state in this sense.

The product we want of Hilbert spaces -- which is not the Cartesian product -- is called a tensor product of linear spaces -- given an orthogonal basis $(|\phi_1\rangle,|\phi_2\rangle,\ldots)$ for the first Hilbert space and $(|\psi_1\rangle,|\psi_2\rangle,\ldots)$ for the second, their tensor product is spanned by new vectors which we denote as

$$\left( {\begin{array}{*{20}{c}}{|{\phi_1}\rangle \boxtimes |{\psi_1}\rangle ,|{\phi_1}\rangle \boxtimes |{\psi_2}\rangle ,...,}\\{|{\phi_2}\rangle \boxtimes |{\psi_1}\rangle ,|{\phi_2}\rangle \boxtimes |{\psi_2}\rangle ,...,}\\ \vdots \end{array}} \right)$$

We're using $\boxtimes$ instead of $\otimes$ in the above enumeration of the basis, because we haven't yet defined the tensor product of states. The idea is that $|\phi_i\rangle\boxtimes|\psi_j\rangle$ are just placeholders, and we will shortly state that they are/can be the tensor product $|\phi_i\rangle\otimes|\psi_j\rangle$, which we will define now.

Certainly, this space can represent any possible state that the combined system of two objects can be in. What we need is a way to express the state of a combined system of two independent things in this "larger" Hilbert space -- i.e. a map from $H_1\times H_2\to H_1\otimes H_2$ that takes the (pure) states of two independent objects in $H_1$ and $H_2$ and outputs their state as a combined system in $H_1\otimes H_2$ -- we will call this product the tensor product of vectors, and denote it by the same symbol $\otimes$.

OK, so what's the map? Certainly, $|\phi_i\rangle\otimes|\psi_j\rangle$ must form an orthogonal basis for $H_1\otimes H_2$ (why? think about this for a while -- they're clearly orthogonal, as they are mutually exclusive -- you can't be in "$|\phi_i\rangle$ and $|\psi_j\rangle$" and "$|\phi_{i'}\rangle$ and $|\psi_{j'}\rangle$" unless $(i,j)=(i',j')$; spanning is proven similarly, as considering the $|\phi_i\rangle$s and $|\psi_j\rangle$s as eigenstates of some operators $X$ and $Y$ on $H_1$ and $H_2$, then if one performs the operation of "observing $X$ and $Y$" -- and we can do this because the objects are independent -- then because the objects must be found in one of $|\phi_i\rangle$ and one of $|\psi_j\rangle$, the system must be found in one of $|\phi_i\rangle\otimes|\psi_j\rangle$ -- thus its original state was a linear combination of such states).

OK, so

$$\begin{aligned}
&(p_1|\phi_1\rangle+p_2|\phi_2\rangle+\ldots)\otimes(q_1|\psi_1\rangle+q_2|\psi_2\rangle+\ldots)\\
=\ & r_{11}|\phi_1\rangle\otimes|\psi_1\rangle + r_{12}|\phi_1\rangle\otimes|\psi_2\rangle+\ldots+\\
&r_{21}|\phi_2\rangle\otimes|\psi_1\rangle + r_{22}|\phi_2\rangle\otimes|\psi_2\rangle+\ldots+\\
& \vdots
\end{aligned}$$
What are the coefficients $r_{ij}$?

Well, it's fairly obvious that $|r_{ij}|^2=|p_{i}|^2|q_{j}|^2$ -- that the probabilities are multiplicative, this is tautological given what we want our product to represent -- the probability that the system is found in the state $|\phi_i\rangle\otimes|\psi_j\rangle$ is the probability that the objects are found in states $|\phi_i\rangle$ and $|\psi_j\rangle$, which is the product of the respective probabilities, as they are independent objects.

Is it also true that the probability amplitudes are multiplicative, i.e. $r_{ij}=p_iq_j$?

This may seem hard to prove, but the idea is quite simple: suppose we observe the state with the observables $X$ and $Y$, and find it in the state $|\phi_i\rangle\otimes|\psi_j\rangle$. Well, then if $r_{ij}=u_{ij}p_iq_j$ for some unit complex number $u_{ij}$, then from the right-hand-side, we must have collapsed to $u_{ij}p_iq_j|\phi_i\rangle\otimes|\psi_j\rangle$ (up to normalisation), whereas collapsing each factor on the left-hand-side separately leaves $p_i|\phi_i\rangle\otimes q_j|\psi_j\rangle$. For these to agree, we must have $u_{ij}=1$.

So indeed the product we're looking for is exactly the tensor product from tensor algebra.

Here's a thing worth noting -- we've been referring to "systems" and "objects" as if they are somehow completely distinct things. But are they? The cat's state is itself a tensor product of a massive number of different states belonging to each elementary particle in its body, and lives already in a massive Hilbert space, because the "object" is itself a system. We will use the term subsystem instead of object from now.

Alright: so we now know that elements of the tensored Hilbert space are all states, and only the ones that are factorable into an element of $H_1$ and $H_2$ represent subsystems that are independent. This is precisely how only factorable probability mass/density functions represent independent variables in statistics. Otherwise the variables are correlated -- not necessarily linearly correlated, but correlated.

Such correlations can, of course, exist in our quantum mechanical theory, too -- like the cuckoo-TNT system we mentioned earlier. These are called quantum correlations or quantum entanglement.
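One way to make "factorable" concrete, as a sketch: arrange the coefficients $r_{ij}$ in a matrix. The state is a product $p_iq_j$ exactly when that matrix has rank at most 1, i.e. when every $2\times2$ minor vanishes. The function name `is_factorable` and the example states below are illustrative choices, not anything from a library.

```python
def is_factorable(r, tol=1e-12):
    """Return True if the coefficient matrix r[i][j] has the form p_i * q_j.

    This holds exactly when every 2x2 minor
    r[i][j] * r[k][l] - r[i][l] * r[k][j] vanishes (rank at most 1).
    """
    d1, d2 = len(r), len(r[0])
    for i in range(d1):
        for k in range(i + 1, d1):
            for j in range(d2):
                for l in range(j + 1, d2):
                    if abs(r[i][j] * r[k][l] - r[i][l] * r[k][j]) > tol:
                        return False
    return True


s = 2 ** -0.5
product_state = [[0.5, 0.5], [0.5, 0.5]]   # (|0>+|1>)(|0>+|1>)/2
entangled_state = [[0.0, s], [s, 0.0]]     # (|0>|1> + |1>|0>)/sqrt(2)

print(is_factorable(product_state), is_factorable(entangled_state))
```

This is the same criterion as for independence of two discrete random variables, except applied to amplitudes rather than probabilities.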

Why the fancy name? Because its consequences may seem superficially kinda "surprising". It's also a demonstration of quantum mechanics being different from classical mechanics, because without entangled states, the dimension of $H_1\otimes H_2$ would indeed be the sum of those of $H_1$ and $H_2$ rather than their product, like with phase spaces in classical mechanics.

OK, what kind of surprising consequences?

They're basically all of the following nature: suppose we have a state given by:

$$\frac1{\sqrt2} (|\phi\rangle\otimes|\psi\rangle+|\psi\rangle\otimes|\phi\rangle)$$
I.e. two entangled particles where we know that they are in two distinct states, but we don't know which is which. Such a state can certainly be produced -- how? Just put two identical independent particles in a box then do a "partial" measurement -- a "peek" -- (which can be achieved, e.g. by some logic gates) that checks if they're in the same state or not, and uncovers no other information.

Now separate the particles spatially -- there's nothing wrong with this, they're still a system, which still has a state -- and give one to Alice and the other to Bob. Now if Bob looks at his particle and sees it in $|\psi\rangle$, he immediately knows that Alice could only observe her particle to be in state $|\phi\rangle$ -- there's nothing Alice can do to change this outcome.

(you may worry that spatially separating the particles alters the state in some important way -- but it doesn't: the states $|\phi\rangle$ and $|\psi\rangle$ are each transformed individually, which doesn't change the entangled structure of the combined state -- make sure this makes sense to you. But if it makes you happy, you could imagine the particles were already spatially separated when they were first entangled.)

OK, perhaps you don't find this particularly surprising or unintuitive -- I don't either. But perhaps you do -- perhaps you think there's a violation of locality -- and the reason you do is because you haven't yet fully accepted logical positivism. Let's consider what locality entails for each observer in the set-up, and see if it's violated:

Alice: From Alice's perspective, Bob opening his box is just another way to observe her particle -- or rather, she can observe Bob's brain that contains the information, which collapses the state from her perspective. But this is perfectly local -- it takes time for information to propagate from Bob to her. Alternatively, if she doesn't observe Bob's brain and just observes her own box later, that's when her state collapses to $|\phi\rangle$, and she then learns that Bob had collapsed his state into $|\psi\rangle$ -- but as Bob cannot choose what his state vector collapses to, he can't send her any information through entanglement. Even if there were a large number of systems entangled this way, the distribution of the states Alice can observe is the same whether or not Bob has collapsed his states (you can confirm this -- this is an idea called the no-communication theorem, which we will discuss later in more mathematical detail).
Bob: Certainly, Bob acquires knowledge of something far away, but no information actually propagated from Alice to him -- he just observed his own box.
Another observer: Charlie, who stands somewhere between Alice and Bob, also takes time to observe Bob's brain.

So there really isn't a violation of locality. This isn't surprising at all -- certainly one could have classical correlations too. You could just juggle two distinct particles in a box and give them to each person, and Bob discovering his particle allows him to determine Alice's particle.
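The claim that Alice's statistics don't depend on what Bob does can be checked numerically. The sketch below assumes a pair of two-level particles in the state $\frac1{\sqrt2}(|0\rangle\otimes|1\rangle+|1\rangle\otimes|0\rangle)$ and restricts to real rotated bases so that no complex conjugates are needed. The function names and the angles are arbitrary choices for illustration.

```python
import math

S = 2 ** -0.5
# c[a][b]: amplitude for Alice's particle in basis state a and Bob's in b,
# for the entangled state (|0>|1> + |1>|0>)/sqrt(2).
c = [[0.0, S], [S, 0.0]]


def rotated_basis(angle):
    """Real orthonormal basis obtained by rotating the standard one."""
    return [[math.cos(angle), math.sin(angle)],
            [-math.sin(angle), math.cos(angle)]]


def alice_prob_no_bob(chi):
    """P(Alice finds |chi>) when Bob does nothing."""
    return sum(abs(sum(chi[a] * c[a][b] for a in range(2))) ** 2
               for b in range(2))


def alice_prob_after_bob(chi, bob_basis):
    """P(Alice finds |chi>) after Bob measured in bob_basis,
    summed over Bob's (unknown to Alice) outcomes."""
    total = 0.0
    for beta in bob_basis:
        # Unnormalised state on Alice's side given Bob's outcome beta;
        # its squared norm is the probability of that outcome.
        alice_state = [sum(beta[b] * c[a][b] for b in range(2))
                       for a in range(2)]
        total += abs(sum(chi[a] * alice_state[a] for a in range(2))) ** 2
    return total


chi = rotated_basis(0.3)[0]
before = alice_prob_no_bob(chi)
after = [alice_prob_after_bob(chi, rotated_basis(beta))
         for beta in (0.0, 0.5, 1.2)]
print(before, after)
```

Whatever basis Bob picks, the numbers in `after` agree with `before`, so Bob's choice of measurement carries no signal to Alice.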

The difference between the classical case and the quantum case is that in the classical case you could pretend that there's some hidden truth that is just not known to the observers. Quantum mechanics forbids any such hidden truth (as confirmed by commutator relations), and forces you to accept logical positivism, and there cannot be a "universal observer" as such a notion is inherently non-local. But the fact that correlation isn't non-local doesn't depend on whether you have metaphysical notions of hidden truths in classical physics -- it is a physical question, and is the same in the classical and quantum cases.

Are we done writing down our algebra of tensor products? We still haven't discussed how inner products and projections of tensor products behave. The basic question is "how do we upgrade/combine operators from $H_1$ and $H_2$ to $H_1\otimes H_2$?" Let's start with the simple case of a factorable state in the form $|\phi\rangle\otimes|\varphi\rangle$. Suppose we apply a projection operator $X$ on the first particle. Have we made any observation on the second state? No -- just an identity projection. Or we could make an observation, a projection $Y$. So we can say that for the combined observation $X\otimes Y$,

$$(X\otimes Y)(|\phi\rangle\otimes|\varphi\rangle)=(X|\phi\rangle)\otimes(Y|\varphi\rangle)$$
And an upgrade from $H_1$ to $H_1\otimes H_2$ is just tensoring with the identity $X\otimes 1$.

But the full range of operators on $H_1\otimes H_2$ is a lot more complicated. We could consider entangled states. We could consider operators that are entangled ("partial measurement" operators like we described -- think about what these are). What would measurements on linear combinations of states look like (we know they should apply linearly, but let's show that)?

Suppose we have a state in the form $\frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$. What exactly is this? We had two independent subsystems each in state $\frac1{\sqrt2}|\phi\rangle+\frac1{\sqrt2}|\varphi\rangle$, then we made an observation that showed they were in two distinct states -- we don't know which is in which. Now we make an observation and collapse the first subsystem to $|\chi\rangle$.

How does this alter the state of sub-system 2?

OK, so "was" (quotation marks! quotation marks!) the system in $|\phi\rangle\otimes|\varphi\rangle$ or $|\varphi\rangle\otimes|\phi\rangle$? This is a question for Bayes' theorem.

The above diagram is for illustration only. There is no real hidden truth (as we will see a few articles from now) of whether the state was initially $|\phi\rangle\otimes|\varphi\rangle$ or $|\varphi\rangle\otimes|\phi\rangle$. But the probabilities still obey all the standard laws, such as Bayes' theorem, so tree diagrams make sense to illustrate this.

So the "probability that the system was in $|\phi\rangle\otimes|\varphi\rangle$" (quotation marks! quotation marks!) is:

$$\frac{\frac12|\langle\chi|\phi\rangle|^2}{\frac12|\langle\chi|\phi\rangle|^2 + \frac12|\langle\chi|\varphi\rangle|^2}$$
(which if $\langle\phi|\varphi\rangle=0$ is just $|\langle\chi|\phi\rangle|^2$) And analogously for the other possibility. So the collapse of sub-system 1 to $|\chi\rangle$ collapses the entire state to

$$|\chi\rangle\otimes\left(\frac1{\sqrt2}\langle\chi|\phi\rangle\cdot|\varphi\rangle+\frac1{\sqrt2}\langle\chi|\varphi\rangle\cdot|\phi\rangle\right)$$
Or rather, the normalisation thereof. You can confirm that for orthogonal $|\phi\rangle$ and $|\varphi\rangle$, if $|\chi\rangle=|\phi\rangle$ or $|\chi\rangle=|\varphi\rangle$, this reduces (after normalising) to $|\phi\rangle\otimes|\varphi\rangle$ or $|\varphi\rangle \otimes |\phi\rangle$ respectively, as we expect.

You can check that this is precisely what you get from applying the projection operator $|\chi\rangle\langle\chi|\otimes 1$ as a linear operator to the original state. The above argument can be repeated for a general vector in the tensored space, yielding the linearity of the tensored operator.
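If you'd rather let a computer do the check, here is a small sketch. It assumes a two-dimensional space with $|\phi\rangle$ and $|\varphi\rangle$ as the standard orthonormal basis and an arbitrary real $|\chi\rangle$; the angle `theta` is a made-up value.

```python
import math

S = 2 ** -0.5
phi, varphi = [1.0, 0.0], [0.0, 1.0]   # orthonormal pair (assumption)
theta = 0.7                            # arbitrary angle defining |chi>
chi = [math.cos(theta), math.sin(theta)]


def inner(u, v):
    """<u|v> for real vectors (no conjugation needed here)."""
    return sum(a * b for a, b in zip(u, v))


# c[a][b]: coefficients of (|phi>|varphi> + |varphi>|phi>)/sqrt(2).
c = [[S * (phi[a] * varphi[b] + varphi[a] * phi[b]) for b in range(2)]
     for a in range(2)]

# Apply |chi><chi| (x) 1: contract the first index with <chi|,
# then put |chi> back in the first slot.
overlap = [sum(chi[a] * c[a][b] for a in range(2)) for b in range(2)]
projected = [[chi[a] * overlap[b] for b in range(2)] for a in range(2)]

# Collapsed state from the Bayes argument (unnormalised):
# |chi> (x) ( <chi|phi> |varphi> + <chi|varphi> |phi> ) / sqrt(2).
second = [S * inner(chi, phi) * varphi[b] + S * inner(chi, varphi) * phi[b]
          for b in range(2)]
predicted = [[chi[a] * second[b] for b in range(2)] for a in range(2)]

max_diff = max(abs(projected[a][b] - predicted[a][b])
               for a in range(2) for b in range(2))
print(max_diff)
```

The two coefficient tables agree entry by entry, which is the linearity statement for this particular state.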

There was the other case we mentioned -- we may have operators that are themselves entangled. What does this mean? Suppose we start with the factorable state:

$$\frac12|\phi\rangle\otimes|\phi\rangle+\frac12|\phi\rangle\otimes|\varphi\rangle+\frac12|\varphi\rangle\otimes|\phi\rangle+\frac12|\varphi\rangle\otimes|\varphi\rangle$$
Then perform the observation corresponding to "are the states different from each other?" This is a projection onto the plane spanned by $(|\phi\rangle\otimes|\varphi\rangle,|\varphi\rangle\otimes|\phi\rangle)$, perpendicular to the plane spanned by $(|\phi\rangle\otimes|\phi\rangle,|\varphi\rangle\otimes|\varphi\rangle)$ (confirm that this is true based on our discussion above of applying inner products to tensored states) -- we can write this as:

$$\left(|\phi\rangle\otimes|\varphi\rangle\right)\left(\langle\phi|\otimes\langle\varphi|\right) +
\left(|\varphi\rangle\otimes|\phi\rangle\right)\left(\langle\varphi|\otimes\langle\phi|\right)
$$
(check that this, and the system of "distributing bras" makes sense). Or alternatively:

$$|\phi\rangle\langle\phi|\otimes|\varphi\rangle\langle\varphi|+|\varphi\rangle\langle\varphi|\otimes|\phi\rangle\langle\phi|$$
One can check that applying this operator to the factorable state indeed results in the entangled state (up to normalisation). For better insight, perform the operation on the factored form of the state.
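The same check can be sketched in a few lines of plain Python. The helpers `outer`, `kron`, `add` and `apply` are hand-rolled for illustration, and $|\phi\rangle$, $|\varphi\rangle$ are taken to be the standard basis vectors of a two-dimensional space (an assumption).

```python
def outer(u, v):
    """|u><v| as a nested list (entries of v are conjugated)."""
    return [[a * complex(b).conjugate() for b in v] for a in u]


def kron(A, B):
    """Tensor product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


phi, varphi = [1, 0], [0, 1]

# |phi><phi| (x) |varphi><varphi| + |varphi><varphi| (x) |phi><phi|
projector = add(kron(outer(phi, phi), outer(varphi, varphi)),
                kron(outer(varphi, varphi), outer(phi, phi)))

# The factorable state with all four coefficients equal to 1/2.
factorable = [0.5, 0.5, 0.5, 0.5]

projected = apply(projector, factorable)
norm = sum(abs(x) ** 2 for x in projected) ** 0.5
entangled = [x / norm for x in projected]
print(entangled)
```

The squared norm of `projected` is the probability of getting "yes, they are different", which comes out as one half here.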

In the representation shown in the above diagram, a factorable state is one given by a single square region, while a factorable projection operator is one that selects a square region out of the maximally distributed state. With this notion, it becomes clear that an entangled operator (one that cannot be factored into operators on each Hilbert space) is the only kind of operator that projects a factorable state into an entangled state.

We're not saying that entangled operators always project factorable states onto entangled ones -- you can just measure an irrelevant property of the system (e.g. you know each particle is either in the UK or France, and you check "are they both in India?"). But they are the only operators that can, and for any such operator, there exist factorable states that it entangles (almost by definition).

Note that we're only talking about projection operators above. We could certainly have factorable observables that enforce a partial measurement -- e.g. $X_1\otimes X_2$, which measures the product of the positions of the two particles -- but the projection operators onto each of the eigenstates of this operator are not factorable (check this).

The following rules then determine the action of our operations on the tensored Hilbert space.

$(A\otimes B)(|\phi\rangle\otimes|\varphi\rangle)=(A|\phi\rangle)\otimes(B|\varphi\rangle)$ -- the tensored operators associate with the corresponding states.
The tensor product of two linear operators is linear.
The image of a linear combination of operators is the linear combination of the images, i.e. $(A+B)|\psi\rangle=A|\psi\rangle+B|\psi\rangle$.

Dealing with eigenspaces; noncommuting variables and another postulate

At the end of the last article, you might've wondered how one might talk about 3-dimensional position -- so far, we've only considered an operator representing a one-dimensional position, e.g. to find the $x$-co-ordinate of something. This is obviously insufficient. What we need to measure three-dimensional position is three separate measurements of the different spatial dimensions.

But there's a problem here -- we know that upon an observation, the state vector is modified, in that it is replaced by some eigenstate of the observable in question. So after measuring the $x$-co-ordinate, if we measure the $y$-co-ordinate afterwards, are we "really measuring" the $y$-co-ordinate of the particle as it was in its initial state, or have we shaken it around a bit?

Let's try to think very precisely about what's going on here. The first question to ask is -- what do the eigenstates of the $X$ operator look like?

Well, because it's an observable, it must have eigenstates that produce a full eigenbasis. But if each eigenvalue corresponded to just one eigenstate, then we would only have information about the $x$-positions of particles, which is clearly insufficient to represent the entire state of a particle. So we must have each eigenvalue -- each $x$-position -- correspond to an infinitude of states, an eigenspace, made up of all the positions with the same $x$-position (which, remember, is their eigenvalue), and superpositions thereof. And this makes a lot of sense -- each position is a state, but these positions give us the same values for the $x$-position.

What this means is that each function of the form $g(y,z)\delta(x-x_0)$ is an eigenstate of the $X$ operator, with eigenvalue $x_0$. So we can have something like $g(y,z)=\delta(y-y_0)\delta(z-z_0)$, which would also be an eigenstate of the $Y$ and $Z$ operators (with eigenvalues $y_0$ and $z_0$ respectively), or some other linear combination, which would no longer be an eigenstate of $Y$ and $Z$.

OK. So what happens when we observe $X$ taking the value $x$? You might think that the state just turns into some randomly chosen eigenstate with the observed eigenvalue $x$. But if you think about it, this would be quite unphysical, as this would mean our $X$-observation would magically change our knowledge about the $y$ and $z$ positions too (for example, if the state collapsed into a state that is also an eigenstate of $Y$, we would have accidentally completely measured the $y$ position) -- but we can certainly design experiments in which an observation of an $x$ position does not so radically rattle the particle in the $y$ and $z$ directions.

Another way to think about this is that the eigenvalues of an operator are the measurements we're getting out. If a state is already in an eigenspace corresponding to the eigenvalue $\lambda$, and we "measure" the observable again (i.e. do nothing), the state shouldn't change.

So we don't want the observation to change the $y$ and $z$ probability information in any way -- so what we're looking for is a projection of the state into the eigenspace with eigenvalue $x$. This is in line with our discussion of the generalised Born's rule in the last article -- but it is an additional postulate of quantum mechanics, or rather generalises the existing postulate about states projecting randomly into eigenstates.

Weren't the eigenvalues completely irrelevant? You ask. You can just make the eigenvalues whatever you want, they're just labels to the eigenstates, right? Not really -- the eigenvalues are exactly what you measure. You can choose to measure any function of them, but if you use a function that isn't injective, you are measuring less information about the system, and you're collapsing the state "less" in this sense.

(Unlike in the projection above, the subspace being projected onto upon measurement of $X$ is itself an infinite-dimensional space, spanned by the different values $Y$ and $Z$ can take. Oh, and we have to normalise the projected state.)

Something that we've seen in this discussion above is that there is a common eigenbasis for $X$, $Y$ and $Z$ -- specifically, the position basis, the basis of Dirac delta distributions centered at the different points in three-dimensional space. From linear algebra, this is equivalent to saying that $X$, $Y$ and $Z$ commute.

What this means is that when you then go on to measure $Y$, and then $Z$, you end up in a state that is a common eigenstate of $X$, $Y$ and $Z$ -- so that you have precise values for each co-ordinate of the position. And as the $X$ information is only altered in the $X$ observation, etc., the probability distribution for each variable is the same regardless of the order you measure them in -- so three-dimensional position is indeed well-defined in quantum mechanics.
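To see the "projection into an eigenspace" rule at work, here is a toy sketch on a $2\times2$ grid of positions (two possible $x$ values, two possible $y$ values). The amplitudes in `psi` are made up, and `measure_x` and `measure_y` are hypothetical helper names. Each one returns the probability of the requested outcome together with the normalised projected state.

```python
def measure_x(psi, x0):
    """Project psi[x][y] onto the X-eigenspace x = x0.

    Returns (probability of finding x0, normalised post-measurement state).
    """
    prob = sum(abs(a) ** 2 for a in psi[x0])
    post = [[(a / prob ** 0.5 if x == x0 else 0.0) for a in row]
            for x, row in enumerate(psi)]
    return prob, post


def measure_y(psi, y0):
    """Same as measure_x, but for the Y-eigenspace y = y0."""
    prob = sum(abs(row[y0]) ** 2 for row in psi)
    post = [[(a / prob ** 0.5 if y == y0 else 0.0)
             for y, a in enumerate(row)] for row in psi]
    return prob, post


# Made-up normalised amplitudes psi[x][y] (x and y are correlated).
psi = [[0.1, 0.7],
       [0.5, 0.5]]

# Measure X first (finding x = 0), then Y (finding y = 1) ...
p_x, after_x = measure_x(psi, 0)
p_y_given_x, _ = measure_y(after_x, 1)
joint_xy = p_x * p_y_given_x

# ... or Y first, then X.
p_y, after_y = measure_y(psi, 1)
p_x_given_y, _ = measure_x(after_y, 0)
joint_yx = p_y * p_x_given_y

print(joint_xy, joint_yx)
```

Both orders give the same joint probability, namely $|\psi(x_0,y_0)|^2$, which is what commuting observables ought to do.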

Just to be clear, the fact that $X$, $Y$ and $Z$ commute is a postulate -- equivalent to the physical claim that each position in space is in fact an eigenstate, that we can in fact pinpoint the position of a particle exactly. We cannot do this e.g. for position and momentum -- $(x,p)$ pairs cannot be considered eigenstates, as there is no simultaneous eigenstate of $X$ and $P$. So for example, you can't just construct a spacefilling curve in $(x,p)$ space to measure position and momentum simultaneously, because the parameters of the curve would simply not have any corresponding eigenstates. The $(x,p)$ space does not exist in the Hilbert space, there are no states that precisely put down the values of position and momentum. It is possible to construct quantum mechanical theories -- called non-commutative quantum theories -- in which the $(x,y,z)$ space isn't in the Hilbert space either, so that our perception of three-dimensional positions must necessarily be approximate.

We're assuming here that this is not so, that three-dimensional space does form an eigenbasis for the $X$, $Y$ and $Z$ operators, that the representation of the $Y$ operator in the $X$ basis is indeed $\psi(x,y,z)\mapsto y\psi(x,y,z)$, not something weird and fancy.

A very different picture arises when you have noncommuting variables. Suppose two operators $X$ and $P$ don't commute, i.e. there is no common eigenbasis for them. So once you observe $X$ and put it in some eigenspace of $X$, there is a non-zero probability that the state will have to be projected out of this $X$-eigenspace when $P$ is measured.

So this means that the observables $X$ and $P$ cannot be measured simultaneously. Some specific bounds on the uncertainties will be discussed in the next article. For now, let's demonstrate an example of two noncommuting variables: position and momentum (in the same direction).

NOTE: We will show in the next article the given results about momentum being $-i\hbar \frac{\partial}{\partial x}$, etc. Just intuit them out here from its eigenvectors.

As we've shown before, the position and momentum operators can be given in the position basis as $x$ and $-i\hbar\partial/\partial x$ respectively. What this means is that given a wavefunction $\psi(x)$, it transforms under these operators as $x\psi(x)$ and $-i\hbar\psi'(x)$ respectively (check that this makes sense -- especially for the position case -- and also that one can go the other direction and show that the corresponding eigenvectors of the position operator must be Dirac delta functions).

So do these operators commute? Clearly not -- the eigenbasis of one is Dirac delta functions in $x$, the other's is sinusoids in $x$. But we can also verify this computationally:

$$\begin{align}XP &= -i\hbar x \frac{\partial}{\partial x}\\
PX\{\psi(x)\}&=-i\hbar\frac{\partial}{\partial x}(x\psi(x))\\
&= -i\hbar\left[\psi(x)+x\psi'(x)\right]\\
\Rightarrow PX &= -i\hbar x\frac{\partial}{\partial x} -i\hbar\end{align}$$

So we have the commutator $i[X,P]=-\hbar$ (why do we talk about $i[A,B]$? Because as it is easy to see, for any Hermitian $A$ and $B$, this is Hermitian, while $[A,B]$ is simply anti-Hermitian). This is the "purest" commutator -- a (scaled) Identity operator. Since we didn't use any other properties of position and momentum, this is a property of all observables that are Fourier transforms of each other/canonically conjugate observables (more on this in the next article).
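As a sanity check, one can apply $XP-PX$ to a concrete wavefunction numerically. The sketch below works in units where $\hbar=1$ (an assumption), approximates the derivative by a central finite difference, and uses an arbitrary smooth test function `psi`. The result should be close to $i\hbar\psi(x)$.

```python
import cmath
import math

HBAR = 1.0  # natural units (assumption)


def X(f):
    """Position operator in the position basis: multiply by x."""
    return lambda x: x * f(x)


def P(f, h=1e-5):
    """Momentum operator -i*hbar*d/dx, via a central finite difference."""
    return lambda x: -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)


def commutator_xp(f):
    """(XP - PX) applied to f."""
    return lambda x: X(P(f))(x) - P(X(f))(x)


def psi(x):
    """Arbitrary smooth test wavefunction (not normalised)."""
    return math.exp(-x ** 2) * cmath.exp(2j * x)


x0 = 0.3
lhs = commutator_xp(psi)(x0)
rhs = 1j * HBAR * psi(x0)
print(abs(lhs - rhs))
```

The agreement is only up to finite-difference error, so this is an illustration rather than a proof.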

Exercise: Write down the most generalised form of Born's rule accounting for generalised eigenspaces (the answer is identical to what we've already written, but make sure you understand it). Show, as in the last article, that the probability density of finding a particle somewhere in three-dimensional space is $|\Psi(x,y,z)|^2$ -- make sure you define $\Psi(x,y,z)$ clearly!

Position, momentum bases and operators, Fourier transform, uncertainty

In this article, we'll assume the de Broglie relation for all particles -- i.e. that their momentum is given by $p=hf$. This is actually quite an incredible assumption, even if not surprising -- we've accepted that a particle is a wave in the sense of probability (the wave describes the probability amplitude densities of finding it at some point), but why at all should the spatial frequency of the probability wave relate to its momentum?

Well, it's natural for you to find this assumption unsatisfactory. We've been quite liberal in assuming the de Broglie relation earlier when motivating quantum theory, too -- we'll later produce some motivation for the de Broglie relation for photons, and discuss derivations from quantum mechanics, axiomatising our theory clearly to eliminate circularities. But for now, let's not.

The key point of $p=hf$ is that for a sinusoidal wave $e^{i \cdot 2\pi f \cdot x}$ (so the probability density is uniform, and the standard deviation in the observation of the particle's position is infinite), the momentum takes a specific definite value, $hf$, with zero standard deviation.

Well, what if the wavefunction isn't a simple sinusoid, but some other distribution $\Psi(x)$? If you did all the assigned exercises in the first article, you should know the answer (if not, work it out before reading on). Classically, if you could write that wavefunction as a sum of sinusoids (i.e. use a Fourier transform), then each sinusoid would have its own momentum and there would be some chunk of your matter in each of those momenta, forming a momentum distribution. In quantum mechanics, you can't have chunks of a single quantum, so this distribution is a probability distribution (still a probability amplitude distribution, because we want superposition). We'll use the notation $\Psi(p)$ to represent this "momentum-space wavefunction", and we'll see why soon.

So it's not too hard to see that the frequency distribution is simply the Fourier transform of $\Psi(x)$, while the momentum-space wavefunction is given by:

$$\Psi(p)=\frac1h \mathcal{F}_x^{p/h}(\Psi(x))$$
Where $\mathcal{F}_x^{p/h}(\Psi(x))$ is the Fourier transform of $\Psi(x)$ (which is a function of $f$) written with the variable substitution $f=p/h$. Note that we're considering the non-normalised Fourier transform, in terms of ordinary frequencies.

Well, $\Psi(x)\, dx$ and $\Psi(p)\, dp$ are just the representations of the state vector in the position and momentum bases respectively. So the Fourier transform acts as a change-of-basis matrix $F$ from the position basis to the momentum basis. I.e.

$$|\psi\rangle_P=F|\psi\rangle_X$$
The inverse change-of-basis matrix $F^{-1}$ (from the momentum basis back to the position basis) precisely represents the eigenstates of the momentum operator written in the position basis, and the corresponding eigenvalues are the actual values of the momenta. So we have eigenstates $\frac1h e^{ix \cdot 2\pi p / h} dp$ with corresponding eigenvalues $p$.

Before going any further, let's make sure we know exactly what this means: our change-of-basis matrix $F^{-1}$ is an uncountably infinite-dimensional "matrix" whose "indices" are denoted as $(x,p)$ in the rows-by-columns format. Its general entry is $\frac1h e^{ix \cdot 2\pi p / h} dp$, and -- here's the important bit -- each column holds $p$ constant and varies $x$, i.e. each column (each eigenstate of $P$) is a function of $x$.

Anyway, so we're looking for a linear operator $P$ solving the eigenvalue problem (and we're just ignoring the scalar multiples):

$$P e^{ix \cdot 2\pi p / h} = pe^{ix \cdot 2\pi p / h}$$
It should be quite clear that the operator we're looking for is:

$$\begin{align}P &= \frac{h}{2\pi i}\frac{\partial}{\partial x} \\
&= -i\hbar \frac{\partial}{\partial x} \end{align}$$
We need to be clear that this is the representation of the momentum operator in the position basis -- in the momentum basis, its representation is simply "$p$" (i.e. its action on each eigenstate $|p\rangle$ is to multiply it by $p$). Similarly, it should be easy to show that in the momentum basis,

$$X=i\hbar\frac{\partial}{\partial p}$$
Exercise: make sure you clearly know and understand what the eigenvectors and eigenvalues of $X$ and $P$ are, in both the position and momentum bases. Hint: something about the Dirac delta function.

Derivation of Heisenberg and Robertson-Schrodinger uncertainty principles

We can derive a variety of "uncertainty principles" -- inequalities showing trade-off between the certainties of two observables -- with some basic algebraic manipulation. It is important to note that none of these individual uncertainty principles is really much more fundamental than any of the others (or at least I don't see in what way they can be) -- one can always make stronger bounds for the uncertainty, and many stronger bounds exist than the ones we're showing here -- but the concept of an uncertainty principle is crucial, in that it rigorously demonstrates the difference between quantum mechanics and statistical physics. In general, the noncommutativity of observables (having no shared eigenbasis) is something that has no analog in classical physics.

OK. So we'll show two statements about the product of uncertainties of two observables, $(\langle A^2\rangle - \langle A\rangle^2)^{1/2}(\langle B^2 \rangle - \langle B \rangle^2)^{1/2}$. Once again, there is nothing special about the specific relations we will show -- we can consider other combinations than products, like $\Delta a^2 + \Delta b^2$, and indeed, there exist uncertainty relations for such terms.

Defining $A'=A-\langle A\rangle$ and $B'=B-\langle B\rangle $ for Hermitian (this is important!) $A$ and $B$, we see that:

$$\begin{align}
\langle A'^2\rangle \langle B'^2 \rangle &= \langle \psi | A'^2 | \psi \rangle \langle \psi | B'^2 | \psi \rangle \\
&= \langle A' \psi | A' \psi \rangle \langle B' \psi | B' \psi \rangle \\
&\ge |\langle \psi | A' B' | \psi \rangle| ^ 2 \\
&= \left|\frac12 \langle\psi|A'B'+B'A'|\psi\rangle + \frac12\langle\psi|A'B'-B'A'|\psi\rangle\right|^2 \\
&= \frac14 |\langle\psi|A'B'+B'A'|\psi\rangle|^2 + \frac14|\langle\psi|A'B'-B'A'|\psi\rangle|^2 \\
&= \frac14 |\langle \{A-\langle A\rangle, B-\langle B\rangle\} \rangle| ^2 + \frac14 |\langle [A,B]\rangle|^2\\
&= \frac14 |\langle\{A,B\} \rangle - 2\langle A\rangle \langle B\rangle |^2 + \frac14|\langle[A,B]\rangle|^2\\
\Rightarrow \Delta a\,\Delta b &\ge \frac12 \sqrt{|\langle\{A,B\} \rangle - 2\langle A\rangle \langle B\rangle |^2 + |\langle [A,B]\rangle|^2}
\end{align}$$
This is the Robertson-Schrodinger relation.

(Guide in case you get stuck somewhere -- line 3, Cauchy-Schwarz inequality; line 4, splitting into Hermitian and anti-Hermitian parts; line 5, magnitude of a complex number -- I'm not sure if I can give any better motivation for specifically considering the product of the standard deviations -- like I said, these specific relations are not really that fundamental. I guess we just want to illustrate the point of "the" uncertainty principle, regardless of the specific ways in which it is treated, and would like to get a simple form for it, regardless of how weak or strong it may be.)
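For a concrete instance of the chain of inequalities, here is a sketch using a single qubit with $A=\sigma_x$ and $B=\sigma_y$ (the Pauli matrices). The state `psi` is an arbitrary made-up choice, and the matrix helpers are hand-rolled for a $2\times2$ space only.

```python
import cmath
import math

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def expect(m, psi):
    """<psi|M|psi> for a 2x2 matrix M and a normalised state psi."""
    return sum(psi[i].conjugate() * m[i][j] * psi[j]
               for i in range(2) for j in range(2))


def spread(m, psi):
    """Standard deviation sqrt(<M^2> - <M>^2) of a Hermitian M."""
    variance = (expect(matmul(m, m), psi) - expect(m, psi) ** 2).real
    return math.sqrt(max(variance, 0.0))


# Hypothetical normalised qubit state.
psi = [complex(math.cos(0.4)), cmath.exp(0.9j) * math.sin(0.4)]

ab, ba = matmul(SIGMA_X, SIGMA_Y), matmul(SIGMA_Y, SIGMA_X)
anticomm = expect(ab, psi) + expect(ba, psi)   # <{A, B}>
comm = expect(ab, psi) - expect(ba, psi)       # <[A, B]>

lhs = spread(SIGMA_X, psi) * spread(SIGMA_Y, psi)
schrodinger = 0.5 * math.sqrt(
    abs(anticomm - 2 * expect(SIGMA_X, psi) * expect(SIGMA_Y, psi)) ** 2
    + abs(comm) ** 2
)
heisenberg = 0.5 * abs(comm)
print(lhs, schrodinger, heisenberg)
```

The three printed numbers come out in non-increasing order: the product of spreads, then the Robertson-Schrodinger bound, then the weaker commutator-only bound.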

One may weaken the inequality further, writing (and this is equivalent to having ignored the real part in line 4, saying the magnitude of a complex number is at least that of the imaginary part):

$$\Delta a\,\Delta b \ge \frac12 |\langle [A,B]\rangle|$$
This is the Heisenberg uncertainty relation. In particular, in the last article, we showed that for the position and momentum operators, $[X,P]=i\hbar$. So in this case, we get the celebrated inequality:

$$\Delta x\, \Delta p \ge \frac{\hbar}{2}$$
For canonically conjugate $X$ and $P$.
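A Gaussian wavepacket is the standard example that sits right on this bound. The sketch below checks that numerically in units where $\hbar=1$, for a made-up width `SIGMA`. It uses the fact that for a real wavefunction $\langle P\rangle=0$ and $\langle P^2\rangle=\hbar^2\int|\Psi'(x)|^2\,dx$, and a simple midpoint rule for the integrals.

```python
import math

HBAR = 1.0    # natural units (assumption)
SIGMA = 0.7   # hypothetical width of the wavepacket


def psi(x):
    """Real Gaussian wavefunction with position spread SIGMA."""
    return ((2 * math.pi * SIGMA ** 2) ** -0.25
            * math.exp(-x ** 2 / (4 * SIGMA ** 2)))


def dpsi(x, h=1e-5):
    """Central finite-difference derivative of psi."""
    return (psi(x + h) - psi(x - h)) / (2 * h)


def integrate(f, a=-10.0, b=10.0, n=20000):
    """Midpoint-rule integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


mean_x = integrate(lambda x: x * psi(x) ** 2)
var_x = integrate(lambda x: (x - mean_x) ** 2 * psi(x) ** 2)
# psi is real, so <P> = 0 and <P^2> = hbar^2 * integral of psi'(x)^2.
var_p = HBAR ** 2 * integrate(lambda x: dpsi(x) ** 2)

product = math.sqrt(var_x * var_p)
print(product, HBAR / 2)
```

Other wavefunctions give a larger product; changing `psi` to something non-Gaussian (and keeping it real and normalised) is an easy way to see that.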

As mentioned before, other stronger uncertainty relations exist for general observables. Some examples can be found on the Wikipedia page Stronger uncertainty relations (permalink).

Projection operators, generalised Born's rule, position basis, wavefunction

At the end of the last article, I asked you to investigate Born's rule for continuous variables like position and momentum.

Well, the problem is that if $x$ is continuously distributed (i.e. we have an operator $X$ whose eigenvalues form a continuous spectrum $\Sigma_X$), typically $P(x=\lambda)=0$ -- and this gives us very little information about the actual probability distribution. What we're really interested in is $P(x\in B)$ for $B$ some subset of $\Sigma_X$.

Technically, we need $B$ to be a "Borel subset", or "measurable subset". We will be omitting several such technicalities in the article, such as the need for the spectral theorem to define a "projection-valued measure" or "spectral measure" on an operator with a continuous spectrum -- this is something that will be covered in the MAO1103: Linear Algebra course.

First, let's think about $P(x\in B)$ in the countable case. One can write $B=\{\lambda_1,\ldots\lambda_n\}$, and then simply say that

$$P(x\in B)=\sum_{k=1}^n |\langle\psi|\phi_k\rangle|^2$$
where $|\phi_k\rangle$ is the eigenstate with eigenvalue $\lambda_k$. But the term on the right is a Pythagorean sum -- specifically, it is the length-squared of the vector formed by summing all the projections of $|\psi\rangle$ onto the eigenstates $|\phi_1\rangle\ldots|\phi_n\rangle$. But this is the same as the length-squared of the projection of $|\psi\rangle$ onto the span of these eigenstates.

(Note on notations: From here onwards, we will use the notation $|\lambda\rangle$ to refer to the eigenvector corresponding to the eigenvalue $\lambda$ (if the eigenspace has dimension more than 1, we'll figure something out). We will use the notation $\{|B\rangle\}$ to refer to the span of the eigenvectors corresponding to the eigenvalues in $B$.)

So we could just define a Hermitian projection operator $L_X(B)$ for any subset $B$ of the spectrum of $X$ -- it is an easy exercise to write down an explicit form for $L_X(B)$ in terms of the eigenvectors of $X$.

Then the probability $P(x\in B)$ is simply $|L_X(B)|\psi\rangle|^2$. Recalling that a Hermitian projection operator satisfies $L^*=L=L^2$, we can write the generalised Born's rule as:

$$\begin{align}P(x\in B) &= |L_X(B)|\psi\rangle|^2\\ &= \langle\psi|L_X(B)|\psi\rangle\end{align}$$
Well, this is interesting! In the last article, you proved that the expected value of an observable $X$ given a state $|\psi\rangle$ is given by $\langle\psi|X|\psi\rangle$. But here we have a probability given by the same expression. So we want to interpret our projection operators as some sort of "observable" -- we can omit the "Hermitian", since all observables are Hermitian.

There's another place you might've seen something like this, and that is with indicator variables in probability and statistics -- the expected value of an indicator variable for an event is the probability that the event occurs.

Try to interpret these projection operators as observables that are analogous to "indicators" in some sense. If you think a little about it, you might see exactly what these observables represent: the eigenvalues of $L_X(B)$ are all 1 and 0 -- if the value "1" is realised, the state has been projected into the $\{|B\rangle\}$ -- and if the value "0" is realised, it hasn't.

So projection operators are a special type of observable, measuring the answer to "Yes/No questions" -- if the answer to "is the system in one of the states $\{|B\rangle\}$?" is yes, the observable $L_X(B)$ takes the value 1 -- if the answer is no, then it takes the value 0. So it is precisely an "indicator variable" for $\{|B\rangle\}$.
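Here is a small sketch of a projection operator acting as an indicator. It assumes a made-up four-level observable whose eigenvectors are the standard basis vectors, so that $L_X(B)$ simply keeps the components whose index lies in `B`. The names `project` and `inner` are illustrative helpers.

```python
# Made-up normalised state in the eigenbasis of a four-level observable.
psi = [0.5, 0.5j, -0.5, 0.5]
B = {1, 2}  # indices of the eigenvalues that belong to B


def project(state, subset):
    """Apply L_X(B): keep components along eigenvectors in subset."""
    return [a if k in subset else 0 for k, a in enumerate(state)]


def inner(u, v):
    """<u|v>, conjugating the entries of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


L_psi = project(psi, B)

prob_norm = inner(L_psi, L_psi).real          # |L psi|^2
prob_expect = inner(psi, L_psi).real          # <psi|L|psi>
prob_sum = sum(abs(psi[k]) ** 2 for k in B)   # sum over k of |<phi_k|psi>|^2

# Projecting twice is the same as projecting once (L^2 = L).
idempotent = project(L_psi, B) == L_psi

print(prob_norm, prob_expect, prob_sum, idempotent)
```

All three ways of computing the probability agree, which is the generalised Born's rule in miniature.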

We have seen such projection operators, of course, in the context of polarisation -- where the operator represented whether or not the photon has passed through. Indeed, one may formulate quantum mechanics entirely in terms of projection operators, as any question can be formulated with some number of Yes/No questions (the key reason why this can be done, as we will see -- is that these "yes/no questions" all commute, i.e. the corresponding projection operators share an eigenbasis). Let's not.

Well, this can be generalised in the straightforward way to an operator with a continuous spectrum, resulting in the same expression. We can also calculate probability densities using this result. Let $X$ be an operator with continuous spectrum $\Sigma_X$ -- then we can write the state $|\psi\rangle$ in the eigenbasis of $X$:

$$ |\psi\rangle = \int_{\Sigma_X} |x\rangle\, \Psi(x)\, dx $$
Where $\Psi(x)\, dx=\langle x|\psi\rangle$ are the coefficients of the state in the eigenbasis, i.e. the probability amplitudes -- we call $\Psi(x)$ the wavefunction, and it represents probability amplitude densities. Then for some set $B\subseteq \Sigma_X$ of eigenvalues, $L_X(B)|\psi\rangle$ is the projection:

$$ L_X(B)|\psi\rangle = \int_B |x\rangle\, \Psi(x)\, dx $$
And one may calculate the dot product, noting that complex dot products require taking the complex conjugate:

$$ \langle \psi | L_X(B) | \psi \rangle = \int_B \Psi^*(x)\, \Psi(x)\, dx $$
Which gives us an expression for the probability density function on $\Sigma_X$ as:

$$\begin{align}\rho(x) &=\Psi^*(x)\,\Psi(x) \\
&=|\Psi(x)|^2\end{align}$$
And this applies to any operator with a continuous spectrum, like position and momentum.
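To see the density $\rho(x)=|\Psi(x)|^2$ at work, here is a rough numerical sketch. The Gaussian wavefunction is an assumed example (not one from the text), and the midpoint rule is just one simple way to approximate $\int_B |\Psi(x)|^2\,dx$.

```python
import math


def wavefunction(x):
    """An assumed example: a normalised Gaussian, Psi(x) = pi^(-1/4) exp(-x^2/2)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)


def probability(a, b, steps=10_000):
    """Approximate P(a <= x <= b) = integral of |Psi|^2 using the midpoint rule."""
    dx = (b - a) / steps
    return sum(abs(wavefunction(a + (k + 0.5) * dx)) ** 2 for k in range(steps)) * dx


p_interval = probability(-1.0, 1.0)   # B = [-1, 1]
p_total = probability(-10.0, 10.0)    # effectively the whole spectrum

print(p_interval, p_total)
```

For this particular Gaussian, $|\Psi(x)|^2 = e^{-x^2}/\sqrt{\pi}$, so the first number can be compared against $\operatorname{erf}(1)$ and the second against 1.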

Some texts define the eigenvectors $|x\rangle$ of a continuous-spectrum observable differently from us -- it is often conventional to let $|x\rangle$ be infinitely large so that $\langle x_1|x_2\rangle = \delta(x_1-x_2)$. This is so that the amplitudes $\langle x|\psi\rangle$ are not infinitesimal, but instead $\langle x|\psi\rangle=\Psi(x)$ (without multiplication by $dx$). For consistency with discrete spectra, we do not use this convention.

From polarisation to quantum mechanics: states, observables, Born's law

Like most texts on the theory, I will motivate the mathematics of quantum mechanics from the example of polarisation -- mostly because it's a very accessible example of stuff being wavelike. From this example, we will be able to motivate: the state vector (generalising the polarisation), state vector collapse (the event of polarisation), observables and their eigenvalues (stuff like energy, number of photons, etc.), eigenstates and their orthogonality (polarisation basis), noncommuting operators and uncertainty (the noncommuting of lenses).

The key feature of quantum mechanics -- the fundamentally probabilistic nature -- comes from the following two facts, confirmed by experiments (the famous experiments here are the double-slit experiment and photoelectric effect respectively):

Everything is a wave -- objects behave as waves, following the superposition principle and the waves represent densities of observations at large scales.
Everything is a particle -- which manifests itself in the form of some stuff, like energy and momentum, coming in little quanta.

This is the principle of wave-particle duality. You may realise how this implies a probabilistic description, but the following example should make it quite clear: consider a wave of light, with energy $hf$ (so it's a single photon) polarised at angle $\theta$ to the horizontal -- and it passes through a horizontal polarising filter. Well, then the wave that passes through would be a horizontally polarised wave with energy $hf\cos^2\theta$, right?

But this is impossible, since energy levels in quantum mechanics are quantised -- you can't have $\cos^2\theta$ of a photon, you can only have integer multiples of a photon. But the fact that energy drops as $E\cos^2\theta$ is something that you can verify at home, using sunglasses -- what the heck?

The key point is that the empirical verification of the $\cos^2\theta$ business that you can do at home is on a macroscopic level, when you have a large number of photons $E=Nhf$. So something occurs with the photons on a microscopic level such that when you try it with a large number of photons, $\cos^2\theta$ of the photons pass through.

Well, this is essentially the "definition" of probability! A single photon passes through the filter with a probability of $\cos^2\theta$ so that for a large number of photons, $\cos^2\theta$ of the photons pass through. This is a non-trivial result -- wave-particle duality makes no mention of probability as such, it just tells us that stuff is both a particle and a wave, but this simple condition in itself implies a probabilistic, non-deterministic reality.
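The jump from "each photon passes with probability $\cos^2\theta$" to "a fraction $\cos^2\theta$ of the energy passes" can be sketched as a small Monte Carlo simulation. The function name `transmitted_fraction`, the angle and the photon count are illustrative assumptions.

```python
import math
import random


def transmitted_fraction(theta, n_photons, seed=0):
    """Send n_photons through the filter one at a time.

    Each photon either passes whole or is absorbed whole; it passes with
    probability cos^2(theta). Returns the fraction that passed.
    """
    rng = random.Random(seed)
    p_pass = math.cos(theta) ** 2
    passed = sum(1 for _ in range(n_photons) if rng.random() < p_pass)
    return passed / n_photons


theta = math.radians(30)  # assumed polarisation angle for the demo
fraction = transmitted_fraction(theta, 100_000)

print(fraction, math.cos(theta) ** 2)
```

No individual photon is ever split, yet the transmitted fraction for a large beam lands close to $\cos^2\theta$, which is what you would measure with sunglasses.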

Similar thought experiments can illustrate the probabilistic nature of other things (the "things" in question here will soon be called "eigenstates"): position is easy -- consider a standing wave photon in a box (this can easily be constructed). This is uniformly distributed throughout the box -- so how much of the energy is in some chunk of the box?

Momentum is trickier, but shouldn't be too hard if you're familiar with Fourier transforms -- what's the analog of a "box" in momentum-space? Well, consider a concentrated pulse of light -- this can be written, via a Fourier transform, as the sum of several light waves of different momenta (i.e. frequencies), each wave with some lower energy. Taking "some chunk" of this "box" amounts to filtering some specific frequencies of the light. This can be done easily, e.g. with a colour filter -- so how much of the energy is contained in the waves with these specific momenta?
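The "colour filter" question can be sketched numerically: decompose a pulse into frequency components and ask what share of the energy sits in a band. Everything here is an assumption made for illustration (a sampled Gaussian pulse, a naive discrete Fourier transform, and a band of the 11 lowest frequencies).

```python
import cmath
import math

N = 128  # number of samples (an arbitrary choice)
# A concentrated pulse: a Gaussian centred in the sampling window.
pulse = [math.exp(-((n - N / 2) ** 2) / (2 * 4.0 ** 2)) for n in range(N)]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a list of samples."""
    size = len(signal)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * n / size) for n, s in enumerate(signal))
        for k in range(size)
    ]


spectrum = dft(pulse)
# By Parseval's theorem, sum |x_n|^2 = (1/N) sum |X_k|^2, so each frequency
# component carries an energy of |X_k|^2 / N.
energy_per_mode = [abs(c) ** 2 / N for c in spectrum]
total_energy = sum(abs(s) ** 2 for s in pulse)

# The "filter": keep only components with |frequency index| <= 5.
band = [k for k in range(N) if min(k, N - k) <= 5]
fraction = sum(energy_per_mode[k] for k in band) / total_energy

print(fraction)
```

For a single photon, this fraction cannot be a fraction of the photon's energy, so it has to be read as the probability of finding the photon's momentum in the band.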

In both cases, the key point is that you can't have a fraction of the energy of the photon at these positions/momenta, so you must have a probability of measuring the photon to be in a specific range of positions or a specific range of momenta -- to be in a specific region of position-space or of momentum-space.

The fundamental point here can be made for any quantity $X$: if you can filter out the "part" of a collection of particles that has $X$ in a certain subset of its range, then on a microscopic level, $X$ is probabilistic. The act of "filtering out the parts with a certain $X$", applied to a single particle, is just the act of checking if a particle is in a certain $X$-interval, and is called measurement. Any quantity that you can measure is called an observable.

Something like polarisation is really a form of measurement -- you're finding out whether or not the photon is in a certain polarisation $|\phi_{\parallel}\rangle$. You may have another observable, corresponding to a different polarisation -- even one that is orthogonal to the first polarisation -- $|\phi_{\perp}\rangle$ and still get that the photon is in $|\phi_\perp\rangle$. There is nothing wrong with this, as we just know beforehand that the photon is in $|\phi_\parallel\rangle$ or $|\phi_\perp\rangle$. If you perform the polarisation with $|\phi_\perp\rangle$ after the polarisation with $|\phi_\parallel\rangle$, you will find that the photon doesn't pass through, as you know for sure that the photon is not in both $|\phi_\parallel\rangle$ and $|\phi_{\perp}\rangle$.

Now, you may have certain psychological issues with this, as have many in history -- however, you might want to note that the aim of quantum mechanics is not to fix your psychological problems but to explain nature. You need to accept logical positivism and learn to shut up and calculate to be comfortable with quantum mechanics.

So whatever calculus we invent to describe these probabilistic phenomena, it is going to apply to all observables.

In our first example, the polarisation of the photon can be represented by a unit vector which we will denote as $|\psi\rangle$. The polarising filter has two special axes, represented by unit vectors $|\phi_{\parallel}\rangle$ and $|\phi_\perp\rangle$ -- these are special in the sense that an incoming photon polarised as $|\phi_{\parallel}\rangle$ or $|\phi_\perp\rangle$ will simply be scaled, by factors of 1 and 0 respectively -- so these form an eigenbasis for a certain operator.

Well, we said that the photon passes through (with polarisation $|\phi_{\parallel}\rangle$) with probability $\cos^2\theta$ -- this arises simply from considering the amplitude of $|\psi\rangle$ in the direction of $|\phi_\parallel\rangle$. So we can write the probability that the photon ends up in a state $|\phi\rangle$ as $|\langle\phi|\psi\rangle|^2$ where $\langle\phi|\psi\rangle$ is called the corresponding "probability amplitude".

This expression, $P(x=\lambda)=|\langle\phi_\lambda|\psi\rangle|^2$, is called Born's rule.
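Here is a minimal sketch of Born's rule for linear polarisation, with states represented as real unit vectors in the plane. The angles and the helper names `polarisation` and `born_probability` are assumptions for illustration.

```python
import math


def polarisation(angle):
    """Unit vector for linear polarisation at `angle` to the horizontal."""
    return (math.cos(angle), math.sin(angle))


def born_probability(phi, psi):
    """|<phi|psi>|^2 for real 2-component vectors (no conjugation needed)."""
    amplitude = phi[0] * psi[0] + phi[1] * psi[1]
    return abs(amplitude) ** 2


theta = math.radians(25)  # photon polarisation (assumed)
alpha = math.radians(70)  # filter axis (assumed)

psi = polarisation(theta)
phi_par = polarisation(alpha)                 # passes through
phi_perp = polarisation(alpha + math.pi / 2)  # gets absorbed

p_par = born_probability(phi_par, psi)
p_perp = born_probability(phi_perp, psi)

print(p_par, math.cos(theta - alpha) ** 2, p_par + p_perp)
```

The pass probability reproduces $\cos^2$ of the angle between the photon's polarisation and the filter axis, and the two outcomes together exhaust all the probability.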

Let's get back to the eigenbasis -- what exactly is this an eigenbasis of? We said that the corresponding eigenvalues are 1 and 0, so this gives us a complete description of the operator. Note that this operator depends only on the observable (namely "number of photons in the $|\phi_{\parallel}\rangle$ direction"), not on the state or any other feature of the observation. So we decide to call this operator/matrix the "observable", and its eigenvalues are the values of the observable that can be measured.

To find properties of these observables, the natural way is to note that the only feature we've really required of them is Born's rule, i.e. the probabilistic interpretation -- so we can apply the axioms of probability and see what they imply in the context of these observables.

$P(E)\ge 0$ -- implies that the observables are over either the reals or complexes, so that $|\langle\psi|\phi\rangle|^2\in \mathbb{R}$ in the first place. The nonnegativity then follows.
$P(\Omega)=1$ and $P\left(\bigcup_i E_i\right) = \sum_i P(E_i)$ for disjoint $E_i$ -- together, these imply that $\sum |\langle\phi|\psi\rangle|^2 = 1=|\langle\psi|\psi\rangle|^2$ where the sum is taken over all eigenstates $|\phi\rangle$ of the operator. As this must be true for all states $|\psi\rangle$, the thing on the left must be a Pythagorean sum, so the $|\phi\rangle$s must form an orthogonal basis. This implies that all observables are normal operators.
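The "Pythagorean sum" argument can be checked on a toy example: with an orthonormal set of eigenstates the Born probabilities add up to 1, while with a skewed (non-orthogonal) set of unit vectors they generally do not. The state and both sets of vectors below are arbitrary choices made for this sketch.

```python
import math


def prob(phi, psi):
    """Born probability |<phi|psi>|^2 for complex 2-component vectors."""
    amplitude = sum(a.conjugate() * b for a, b in zip(phi, psi))
    return abs(amplitude) ** 2


psi = (0.6 + 0j, 0.8j)  # a normalised state: 0.36 + 0.64 = 1

# An orthonormal pair of candidate eigenstates.
orthonormal = [(1 + 0j, 0j), (0j, 1 + 0j)]

# A pair of unit vectors at 45 degrees to each other, so not orthogonal.
s = 1 / math.sqrt(2)
skewed = [(1 + 0j, 0j), (s + 0j, s + 0j)]

total_orthonormal = sum(prob(phi, psi) for phi in orthonormal)
total_skewed = sum(prob(phi, psi) for phi in skewed)

print(total_orthonormal, total_skewed)
```

Only the orthonormal set gives a total of 1, so only it can be read consistently as a set of mutually exclusive, exhaustive outcomes.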

The latter fact is very important, and can also be seen in the following way -- if a system is in one eigenstate, it cannot possibly collapse onto another eigenstate (the probabilistic interpretation is: if you know for sure the value of the symbol is a thing, it's that thing) -- so we must have $|\langle \phi_1|\phi_2\rangle|^2=0$ for all distinct eigenstates $|\phi_1\rangle$ and $|\phi_2\rangle$.

Another restriction we add is that the observables be not only normal, but Hermitian operators in particular, so they have real eigenvalues. This may seem an odd choice, but it makes sense, as any normal operator may be uniquely written as $X_H+iX_{AH}$ where $X_H$ and $X_{AH}$ are Hermitian, and $X_H$ and $X_{AH}$ commute, so any complex observation can be done unambiguously as two real observations. So we stick to real eigenvalues.
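The decomposition can be tried out on a small example. This is a sketch under assumptions: the $2\times 2$ matrix `N` below is just one convenient normal operator, and the parts are computed as $X_H=(N+N^*)/2$ and $X_{AH}=(N-N^*)/(2i)$, which is the standard way to obtain such a splitting.

```python
def dagger(m):
    """Conjugate transpose of a square matrix stored as nested lists."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(a, b, ca, cb):
    """Entrywise linear combination ca*a + cb*b."""
    n = len(a)
    return [[ca * a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]


# An assumed normal operator with complex eigenvalues (N N* = N* N).
N = [[1 + 0j, 2 + 3j], [2 + 3j, 1 + 0j]]

X_H = combine(N, dagger(N), 0.5, 0.5)       # (N + N*) / 2
X_AH = combine(N, dagger(N), -0.5j, 0.5j)   # (N - N*) / (2i)

rebuilt = combine(X_H, X_AH, 1, 1j)         # X_H + i X_AH
commutator = combine(matmul(X_H, X_AH), matmul(X_AH, X_H), 1, -1)

print(X_H, X_AH, commutator)
```

Both parts come out Hermitian, they rebuild `N`, and their commutator vanishes, so the two real measurements can be made without disturbing each other.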

This also makes it essential that we allow complex operators rather than just real ones (the two choices were given to us from the first probability axiom), so that this decomposition is possible. Later, we will see concrete examples of this with commutators $[X,Y]$, which must be multiplied by $i$ to turn Hermitian. We will also see more fundamental reasons to choose complex numbers in QM.

Exercise: Show that the expected value of an observable $X$ given a state $\psi$ can be given as $\langle \psi|X|\psi \rangle$ (i.e. $\psi^*X\psi$ in conventional notation).

Exercise: Explain Born's rule with other observables, like position and momentum. Explain why it holds in general.