The Winding Number -- Free undergraduate mathematics and physics courses [not a blog], by Abhimanyu Pallavi Sudhir<br /><br /><b>Mixed states I: density matrix, partial trace, the most general Born rule</b> (2019-07-08)<br /><br />In the <a href="https://thewindingnumber.blogspot.com/2019/07/systems-and-sub-systems-tensor-product.html">last article</a>, we saw that sub-systems entangled with other sub-systems did <b>not have well-defined pure states</b> themselves -- just like correlated random variables don't have their own probability distributions. Since pretty much everything you see in the real world is entangled with <i>something</i> -- has correlations with some other thing -- this is a problem. We can't just consider the "state of the entire universe" when we just want to study a single electron or <i>something</i>.<br /><br />Wait -- why can't we just consider the <b>marginal distributions</b>, like we do in statistics? OK, suppose we start with the system -- with $|\phi\rangle$ and $|\varphi\rangle$ an orthonormal basis:<br /><br />$$|\psi\rangle = \frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$$<br />Naively, you may think that the state of the first sub-system is given by $|\psi'_1\rangle=\frac1{\sqrt2}|\phi\rangle+\frac1{\sqrt2}|\varphi\rangle$. Certainly, if we're measuring the subsystem with an operator with eigenstates $|\phi\rangle$ and $|\varphi\rangle$, you have 50% probabilities of each. But to say that two things are in the same state requires that they produce the same outcome for <b>any</b> measurement, not just that one. Does our sub-system behave exactly like $|\psi'_1\rangle$ <b>for all observables</b>? 
Recall that in the last article, we showed that collapsing the first sub-system onto $|\chi\rangle$ collapses the entire system into the state (up to normalisation):<br /><br />$$|\chi\rangle\otimes\left(\langle\chi|\varphi\rangle|\phi\rangle+\langle\chi|\phi\rangle|\varphi\rangle\right)$$<br />To calculate the probability amplitude of this collapse, we may take the inner product of the normalised collapsed state with the original state -- you can compute this, and see the answer comes down to $1/\sqrt2$, i.e. there's a <b>probability of $1/2$ of the first subsystem collapsing to <i>any</i> such eigenstate $|\chi\rangle$</b>. You can use <i>any</i> observable in this two-dimensional state space, and the sub-system would collapse into <b>either eigenstate with probability exactly $1/2$</b>.<br /><br />This is a <i>completely different situation</i> from if the state of the first subsystem were simply a <b>pure state</b> like $|\psi'_1\rangle$.<br /><br />The situation we're dealing with is called a <b>mixed state</b>. An example of a mixed state, in line with the motivating examples we had at the beginning of the course, is <b>unpolarised light</b>. 
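Here's a quick numerical check of that claim -- a sketch in numpy (all variable names mine), taking $|\phi\rangle,|\varphi\rangle$ to be the standard basis of a two-state system: the collapse probability $\langle\psi|\left(|\chi\rangle\langle\chi|\otimes 1\right)|\psi\rangle$ comes out to exactly $1/2$ for every normalised $|\chi\rangle$.

```python
import numpy as np

# |phi>, |varphi> = standard basis of a two-state system (labels mine)
phi = np.array([1.0, 0.0])
var = np.array([0.0, 1.0])

# |psi> = (|phi> (x) |varphi> + |varphi> (x) |phi>) / sqrt(2)
psi = (np.kron(phi, var) + np.kron(var, phi)) / np.sqrt(2)

rng = np.random.default_rng(0)
for _ in range(5):
    # a random normalised state |chi> in the two-dimensional space
    c = rng.normal(size=2) + 1j * rng.normal(size=2)
    chi = c / np.linalg.norm(c)
    # the projector |chi><chi| (x) 1, acting only on the first sub-system
    P = np.kron(np.outer(chi, chi.conj()), np.eye(2))
    prob = np.vdot(psi, P @ psi).real  # <psi|P|psi> = collapse probability
    assert abs(prob - 0.5) < 1e-12     # always exactly 1/2
```

Whatever $|\chi\rangle$ you pick, the probability never budges from $1/2$ -- which is the point: no pure state of the sub-system behaves like this.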
In fact, the state we described above models precisely <b>unpolarised light involving two photons</b> (is it obvious why?).<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Circular_Polarizer_Creating_Clockwise_circularly_polarized_light.svg/1086px-Circular_Polarizer_Creating_Clockwise_circularly_polarized_light.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="393" data-original-width="800" height="157" src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Circular_Polarizer_Creating_Clockwise_circularly_polarized_light.svg/1086px-Circular_Polarizer_Creating_Clockwise_circularly_polarized_light.svg.png" width="320" /></a></div><br />The basic idea behind mixed states is that we have some uncertainty as to <i>what the state of a particle is</i> -- we don't know if the particle has state $|\phi\rangle$ or $|\varphi\rangle$ -- it has a 50% chance of either. This is a classical probability, rather than a quantum one, and different from the state being a superposition of these states, as we just saw above.<br /><br /><div class="twn-furtherinsight">Does this sort of thing occur with multivariate distributions in statistics? Suppose we have a multivariate distribution $\psi(x,y)$ and extract the marginal $x$-distribution $\phi(x) = \int_y \psi(x,y) dy$. Certainly this $\phi(x)$ gives us the right probability densities of each $x$-value. 
But the analog of considering general states like $|\chi\rangle$ is to make a <b>transformation of the domain</b> -- like a Fourier transform -- and consider probability densities in the transformed domain.<br /><br />As an exercise, write down a multivariate Fourier transform expression for $\hat{\psi}(\omega_1,\omega_2)$ and use it to compute the $\omega_1$-marginal probabilities $\hat{\phi}(\omega_1)$ -- compare this to what you would get if you were to Fourier-transform the $x$-marginal $\phi(x)$ directly.</div><br /><hr /><br />But saying "it has 1/2 probability of being in $|\phi\rangle$ and 1/2 probability of being in $|\varphi\rangle$" is clearly an <b>overdetermination</b>. As we saw above, the resulting state has a 1/2 probability of collapsing onto any state -- this is a statement that doesn't depend on $|\phi\rangle$ and $|\varphi\rangle$ -- the behaviour of the state is the same if you describe it instead as having "1/2 probability of being in $\frac1{\sqrt2}(|\phi\rangle+|\varphi\rangle)$ and 1/2 probability of being in $\frac1{\sqrt2}(|\phi\rangle-|\varphi\rangle)$". 
These two are <b>different statistical ensembles</b> but in the <b>same mixed state</b>.<br /><br />You can see similarly that "50% left-polarised + 50% right-polarised" is the same mixed state as "50% left-circular + 50% right-circular" -- they're both just <i>unpolarised light</i>.<br /><br /><b>What's the general condition for two statistical ensembles to produce the same observations?</b><br /><b><br /></b> Given statistical ensembles $\left(\left(p_i,|\psi_i\rangle\right)\right)$ and $\left(\left(p'_i,|\psi'_i\rangle\right)\right)$, they are the same mixed state if for all $|\chi\rangle$, the probability of collapsing onto $|\chi\rangle$ is the same, i.e.<br /><br />$$\sum_i p_i|\langle\psi_i|\chi\rangle|^2=\sum_i p'_i|\langle\psi'_i|\chi\rangle|^2$$<br />Well, each side of this equation is just the evaluation of a quadratic form for the vector $|\chi\rangle$ -- and <b>two quadratic forms are identically equal on all vectors if and only if their matrix representations are the same</b>. Well, what's the matrix representation? In the basis of the $|\psi_i\rangle$ (if they're orthonormal), it's just the diagonal matrix of probabilities $p_i$. The way to write this in bra-ket notation is to factor out the $|\chi\rangle$s:<br /><br />$$\left\langle \chi \right|\left( {\sum\limits_i {{p_i}\left| {{\psi _i}} \right\rangle \left\langle {{\psi _i}} \right|} } \right)\left| \chi \right\rangle = \left\langle \chi \right|\left( {\sum\limits_i {{p'_i}\left| {{{\psi '}_i}} \right\rangle \left\langle {{{\psi '}_i}} \right|} } \right)\left| \chi \right\rangle$$<br />This quadratic form in between, representing a mixed state, is called the <b>density matrix</b> and can be used to completely specify mixed states. 
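The equivalence of the two ensembles above is also easy to check numerically -- a sketch in numpy (names mine), building $\sum_i p_i|\psi_i\rangle\langle\psi_i|$ for each ensemble and comparing:

```python
import numpy as np

# |phi>, |varphi> = standard basis of a two-state system (labels mine)
phi = np.array([1.0, 0.0])
var = np.array([0.0, 1.0])

def density(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for an ensemble of (p_i, |psi_i>) pairs."""
    return sum(p * np.outer(v, v.conj()) for p, v in ensemble)

plus  = (phi + var) / np.sqrt(2)
minus = (phi - var) / np.sqrt(2)

rho1 = density([(0.5, phi),  (0.5, var)])    # "50% |phi>, 50% |varphi>"
rho2 = density([(0.5, plus), (0.5, minus)])  # "50% |+>, 50% |->"

assert np.allclose(rho1, rho2)           # same mixed state...
assert np.allclose(rho1, np.eye(2) / 2)  # ...the "maximally mixed" one
```

Both ensembles give the same matrix, half the identity -- the two-state analogue of unpolarised light.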
In this sense, it is a <b>generalisation of the state vector</b>, which can only be used to represent pure states.<br /><br />$$\rho={\sum\limits_i {{p_i}\left| {{\psi _i}} \right\rangle \left\langle {{\psi _i}} \right|} }$$<br />You may confirm that indeed:<br /><br />$$\frac12|\phi\rangle\langle\phi|+\frac12|\varphi\rangle\langle\varphi|=\frac12\left(\frac{|\phi\rangle+|\varphi\rangle}{\sqrt2}\frac{\langle\phi|+\langle\varphi|}{\sqrt2}\right)+\frac12\left(\frac{|\phi\rangle-|\varphi\rangle}{\sqrt2}\frac{\langle\phi|-\langle\varphi|}{\sqrt2}\right)$$<br />In fact, there is a simpler way to see that those two ensembles are the same: the density matrix is simply the <b>Gram matrix of the ensemble</b> -- you take the states in the ensemble, weighted by $\sqrt{p_i}$, as the rows of a matrix $Y$, and $\rho=Y^*Y$. Well, $Y^*Y=Y'^*Y' \iff Y'=UY$ for some unitary $U$, i.e. the ensembles are unitary rotations of each other.<br /><br /><hr /><br /><b>Properties of the density matrix, generalised Born's rule, etc.</b><br /><br />Here's something that's obvious: the density matrix is <b>nonnegative-definite ("positive-semidefinite") Hermitian and unit-trace</b> -- and conversely, any such matrix is the density matrix of some ensemble.<br /><br />Well, so it's a Hermitian operator -- does it represent any interesting observable? Not really. It's an observable, sure, but not an interesting one (you might say it measures something's being in one of the ensemble states -- written in an orthonormal basis -- and whose eigenvalues are the mixing ratios, etc. -- but what if two mixing ratios are the same? Its behaviour is just bizarre and useless, really).<br /><br />We saw earlier that the probability of a density matrix collapsing into a state $|\chi\rangle$ is given by $\langle\chi|\rho|\chi\rangle$.<br /><br /><div class="twn-pitfall">This is <b>completely different</b> from the generalised Born's rule we saw earlier which took the form $\langle\psi|L|\psi\rangle$! 
There, the <em>state</em> was the vector and the information on the projection space was the quadratic form in between. Here, the state is the quadratic form in between while the state being projected onto is the vector. This is just a generalisation of the simple Born rule $\langle\chi|\psi\rangle\langle\psi|\chi\rangle$, as far as I can see. If anyone comes up with a connection between it and the generalised Born rule for pure states, tell me.</div><br />This raises the question, though -- what's the <i>most generalised Born's rule</i> we can come up with? What is the probability of a <b>mixed state collapsing into some eigenspace of a Hermitian projection operator</b>?<br /><br />Well, given the ensemble $((p_i,|\psi_i\rangle))$ (you can start writing ensembles with their density matrices now if you like, like $\sum p_i|\psi_i\rangle\langle\psi_i|$ -- but I just want to reaffirm that our result will indeed be in terms of the density matrix), the probability is:<br /><br />$$\sum_i p_i\langle\psi_i|L|\psi_i\rangle$$<br />This is hardly useful -- it's not in terms of the density matrix at all. But look at each term (taking the $(|\psi_i\rangle)$ to be an orthonormal basis for this argument -- the final formula, being basis-independent, holds in general) -- what's $p_i\langle\psi_i|L|\psi_i\rangle$? $L|\psi_i\rangle$ is the $i$th column of $L$ in the $(|\psi_i\rangle)$-basis -- the inner product $\langle\psi_i|L|\psi_i\rangle$ is the $i$th entry of this column. Multiplying this by $p_i$ gives us the <i>dot product of the $i$th row of $\rho$ with the $i$th column of $L$</i>. The <i>sum</i> of these for all $i$ gives us the trace of $\rho L$:<br /><br />$$\ldots = \mathrm{tr}(L\rho)$$<br />This is the <b>most general form of Born's rule</b>. 
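A numerical sanity check of $\mathrm{tr}(L\rho)$ -- a sketch in numpy (the random ensemble and all names are mine), comparing the ensemble sum $\sum_i p_i\langle\psi_i|L|\psi_i\rangle$ against $\mathrm{tr}(L\rho)$ for a random rank-one projection $L$. Note the ensemble states here aren't even orthogonal -- the formula holds anyway, by linearity of the trace:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(n):
    c = rng.normal(size=n) + 1j * rng.normal(size=n)
    return c / np.linalg.norm(c)

# a random ensemble (p_i, |psi_i>) -- the states need not be orthogonal
n = 3
probs = rng.dirichlet(np.ones(4))             # four probabilities summing to 1
states = [random_state(n) for _ in range(4)]
rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, states))

# L: Hermitian projection onto the line spanned by a random |chi>
chi = random_state(n)
L = np.outer(chi, chi.conj())

# sum_i p_i <psi_i|L|psi_i>  vs  tr(L rho)
ensemble_sum = sum(p * np.vdot(v, L @ v).real for p, v in zip(probs, states))
assert abs(ensemble_sum - np.trace(L @ rho).real) < 1e-12
```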
Note that our derivation could have also applied to finding the <b>expectation value</b> of a general operator $A$ under the density matrix $\rho$ (recall that Hermitian projection operators are basically "indicator variables" whose expectation values represent probabilities), indeed generally:<br /><br />$$\langle A\rangle_\rho=\mathrm{tr}(A\rho)$$<br />(note that $\mathrm{tr}(V)=\sum_{i} \langle i | V|i \rangle $ for any orthonormal basis $(|i\rangle)$, which you should show.)<br /><br />It is also trivial to show that upon collapse given by Hermitian projection operator $L$, the density matrix <b>collapses</b> to:<br /><br />$$\rho'=\frac1{\mathrm{tr}\left(L\rho L\right)}{L\rho L}=\frac1{\mathrm{tr}\left(L\rho\right)}{L\rho L}$$<br />This generalises the pure-state collapse $|\psi'\rangle=\frac{1}{\sqrt{\langle \psi | L | \psi\rangle}}L|\psi\rangle$. One may check that the above expression reduces to $|\chi\rangle\langle\chi|$ in the case where $L=|\chi\rangle\langle \chi|$.<br /><br /><hr /><br /><b>Partial trace, trace</b><br /><b><br /></b> We started our discussion considering the pure state $\frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$ and asking for the mixed state of the first sub-system. We computed the inner product of this state with its projection under the operator $|\chi\rangle\langle\chi|\otimes1$ -- this tells us the evaluation of the quadratic form $\langle\chi|\rho_1|\chi\rangle$ at all vectors $|\chi\rangle$, which determines the quadratic form $\rho_1$ of the first state.<br /><br />So what exactly did we do -- in general? Starting with a density matrix $\rho$ on $H_1\otimes H_2$, we compute the probability of the first sub-system appearing in state $|\chi\rangle$: it's $\mathrm{tr}((|\chi\rangle\langle\chi|\otimes 1)\rho)$. 
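Jumping ahead slightly, this can be verified numerically for the entangled state we started with -- a sketch in numpy (names mine), where the second factor is traced out by reshaping $\rho$ into a $2\times2\times2\times2$ array and summing over the paired second indices; the sub-system probability $\mathrm{tr}((|\chi\rangle\langle\chi|\otimes 1)\rho)$ agrees with $\langle\chi|\rho_1|\chi\rangle$ for the reduced matrix $\rho_1$ obtained this way:

```python
import numpy as np

phi = np.array([1.0, 0.0])
var = np.array([0.0, 1.0])
psi = (np.kron(phi, var) + np.kron(var, phi)) / np.sqrt(2)
rho = np.outer(psi, psi.conj())  # density matrix of the pure state on H1 (x) H2

# trace out the second factor: reshape rho to indices (i, j, i', j'),
# then sum the entries with j = j'
rho1 = np.einsum('ijkj->ik', rho.reshape(2, 2, 2, 2))

rng = np.random.default_rng(2)
c = rng.normal(size=2) + 1j * rng.normal(size=2)
chi = c / np.linalg.norm(c)
P = np.kron(np.outer(chi, chi.conj()), np.eye(2))  # |chi><chi| (x) 1

# tr((|chi><chi| (x) 1) rho)  ==  <chi| rho_1 |chi>
assert abs(np.trace(P @ rho).real - (chi.conj() @ rho1 @ chi).real) < 1e-12
assert np.allclose(rho1, np.eye(2) / 2)  # the sub-system is maximally mixed
```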
So we try to find a density matrix $\rho_1$ satisfying, for all states $|\chi\rangle$:<br /><br />$$\mathrm{tr}((|\chi\rangle\langle\chi|\otimes 1)\rho)=\mathrm{tr}(|\chi\rangle\langle\chi|\rho_1)$$<br /><br /><b>Exercise: </b>Let $V$ be an operator on $H_1\otimes H_2$. We define its <b>partial trace</b> on $H_2$ as $\mathrm{tr}_2(V)=\sum_{j}\langle j|V|j\rangle $ for an orthonormal basis $(|j\rangle)$ of $H_2$ (where the inner product is done by extending the bras and kets by tensoring them with the identity). Show that the density matrix $\rho_1$ is given by:<br /><br />$$\rho_1=\mathrm{tr}_2(\rho)$$<br />I.e. show that for operators of the form $A\otimes I$: $\mathrm{tr}[(A\otimes I)\rho]=\mathrm{tr}_1(A\,\mathrm{tr}_2\rho)$.<br /><br /><hr /><br /><b>Systems and sub-systems, the tensor product, quantum entanglement</b> (2019-07-06)<br /><br />What if we want to describe probabilities relating to multiple objects -- a system of objects?<br /><br />You might think it's sufficient to just write down the state vector of each individual object -- but this doesn't really tell us the entire picture. Suppose for example we're considering the state of Schrodinger's cat, less popularly known as <b>Schrodinger's cuckoo</b>, where the box contains TNT and the cuckoo bird. The state of the TNT is $|\mathrm{unexploded}\rangle+|\mathrm{exploded}\rangle$ and the state of the cuckoo is $|\mathrm{alive}\rangle+|\mathrm{dead}\rangle$ -- right?<br /><br /><div class="twn-pitfall">Never trust a cuckoo.</div><br /><div class="twn-furtherinsight">If I'm in charge of the box, the state of the cuckoo will definitely be $|\mathrm{dead}\rangle$ even if the state of the TNT is $|\mathrm{unexploded}\rangle$.</div><br />Not exactly. 
There is a <b>correlation</b> between the state of the TNT and the state of the cuckoo that goes missing here -- the cuckoo is dead if and only if the TNT has exploded, and the cuckoo is alive if and only if the TNT is unexploded. In fact, defining the state vectors separately doesn't really make any sense -- we're just assigning the coefficients based on overall probabilities, and as we will see, this is really mixing up quantum and classical probabilities in a certain way whereas the state vector is supposed to only show quantum probabilities.<br /><br />Well, this is really the same question as what we do when we have multiple correlated random variables in statistics -- we define a <b>"joint" probability function</b> on a <b>"joint" phase space</b> that is the <b>Cartesian product of the original phase spaces</b>.<br /><br />You may be inclined to claim that similarly, our new Hilbert space should be the Cartesian product of the original Hilbert spaces. But Hilbert spaces are different in a fundamental sense from these phase spaces -- every point on the classical phase space is an independent state in the Hilbert space -- and general <b>vectors are <i>distributions</i></b> on the classical phase space. 
So like how the <em>cardinalities</em> of the classical phase spaces are multiplied, the <b><em>dimensions</em> of the Hilbert spaces are multiplied</b>.<br /><br /><div class="twn-furtherinsight">The reason that we sometimes draw an analogy between the classical state space and the quantum state space is that the state vectors are really the "real objects" in quantum mechanics, and the Hilbert space shows the possible configurations of the state in this sense.</div><br />The product we want of Hilbert spaces -- which is <i>not</i> the Cartesian product -- is called a <b>tensor product of linear spaces</b> -- given an orthogonal basis $(|\phi_1\rangle,|\phi_2\rangle,\ldots)$ for the first Hilbert space and $(|\psi_1\rangle,|\psi_2\rangle,\ldots)$ for the second, their tensor product is spanned by new vectors which we denote as<br /><br />$$\left( {\begin{array}{*{20}{c}}{|{\phi_1}\rangle \boxtimes |{\psi_1}\rangle ,|{\phi_1}\rangle \boxtimes |{\psi_2}\rangle ,...,}\\{|{\phi_2}\rangle \boxtimes |{\psi_1}\rangle ,|{\phi_2}\rangle \boxtimes |{\psi_2}\rangle ,...,}\\ \vdots \end{array}} \right)$$<br /><div class="twn-pitfall">We're using $\boxtimes$ instead of $\otimes$ in the above enumeration of the basis, because we haven't yet defined the tensor product of states. The idea is that $|\phi_i\rangle\boxtimes|\psi_j\rangle$ are just placeholders, and we will shortly state that they are/can be the tensor product $|\phi_i\rangle\otimes|\psi_j\rangle$, which we will define now.</div><br />Certainly, this can represent any possible state that the combined system of two objects can be in. What we need is a way to express the state of a combined system of two independent things in this "larger" Hilbert space -- i.e. 
a <b>map</b> from $H_1\times H_2\to H_1\otimes H_2$ that takes the (pure) states of two independent objects in $H_1$ and $H_2$ and outputs their state as a combined system in $H_1\otimes H_2$ -- we will call this product the <b>tensor product of vectors</b>, and denote it by the same symbol $\otimes$.<br /><br />OK, so what's the map? Certainly, $|\phi_i\rangle\otimes|\psi_j\rangle$ must form an <b>orthogonal basis</b> for $H_1\otimes H_2$ (why? think about this for a while -- they're clearly <b>orthogonal, as they are mutually exclusive</b> -- you can't be in "$|\phi_i\rangle$ and $|\psi_j\rangle$" and "$|\phi_{i'}\rangle$ and $|\psi_{j'}\rangle$" unless $(i,j)=(i',j')$; <b>spanning is proven similarly</b>, as considering the $|\phi_i\rangle$s and $|\psi_j\rangle$s as eigenstates of some operators $X$ and $Y$ on $H_1$ and $H_2$, then if one performs the operation of "observing $X$ and $Y$" -- and we can do this because the objects are independent -- then because the objects must be found in one of $|\phi_i\rangle$ and one of $|\psi_j\rangle$, the system must be found in one of $|\phi_i\rangle\otimes|\psi_j\rangle$ -- thus its original state was a linear combination of such states).<br /><br />OK, so<br /><br />$$<br />(p_1|\phi_1\rangle+p_2|\phi_2\rangle+\ldots)\otimes(q_1|\psi_1\rangle+q_2|\psi_2\rangle+\ldots)\\<br />\begin{align}<br />=\ & r_{11}|\phi_1\rangle\otimes|\psi_1\rangle + r_{12}|\phi_1\rangle\otimes|\psi_2\rangle+\ldots+\\<br />&r_{21}|\phi_2\rangle\otimes|\psi_1\rangle + r_{22}|\phi_2\rangle\otimes|\psi_2\rangle+\ldots+\\<br />& \vdots<br />\end{align}$$<br />What are the coefficients $r_{ij}$?<br /><br />Well, it's fairly obvious that $|r_{ij}|^2=|p_{i}|^2|q_{j}|^2$ -- that the <b><i>probabilities</i> are multiplicative</b>, this is tautological given what we want our product to represent -- the probability that the system is found in the state $|\phi_i\rangle\otimes|\psi_j\rangle$ is the probability that the objects are found in states 
$|\phi_i\rangle$ and $|\psi_j\rangle$, which is the product of the respective probabilities, as they are independent objects.<br /><br />Is it also true that the <b>probability amplitudes are multiplicative</b>, i.e. $r_{ij}=p_iq_j$?<br /><br />This may seem hard to prove, but the idea is quite simple: suppose we observe the state with the observables $X$ and $Y$, and find it in the state $|\phi_i\rangle\otimes|\psi_j\rangle$. Well, then if $r_{ij}=u_{ij}|r_{ij}|$ for some unit complex number $u_{ij}$, then from the right-hand-side, we must have collapsed to $u_{ij}|\phi_i\rangle\otimes|\psi_j\rangle$. So we must have $u_{ij}=1$.<br /><br />So indeed the product we're looking for is exactly the <b>tensor product from tensor algebra</b>.<br /><br /><div class="twn-furtherinsight">Here's a thing worth noting -- we've been referring to "systems" and "objects" as if they are somehow completely distinct things. But are they? The cat's state is itself a <b>tensor product</b> of a massive number of different states belonging to each <b>elementary particle</b> in its body, and lives already in a massive Hilbert space, because the "object" is itself a system. We will use the term <b>subsystem</b> instead of <b>object</b> from now.</div><br /><hr /><br />Alright: so we now know that elements of the tensored Hilbert space are all states, and only the ones that are <b>factorable</b> into an element of $H_1$ and $H_2$ represent subsystems that are independent. This is precisely how only <b>factorable probability mass/density functions</b> represent independent variables in statistics. Otherwise the variables are correlated -- not necessarily linearly correlated, but correlated.<br /><br />Such correlations can, of course, exist in our quantum mechanical theory, too -- like the cuckoo-TNT system we mentioned earlier. These are called <b>quantum correlations</b> or <b>quantum entanglement</b>.<br /><br />Why the fancy name? 
Because its consequences may seem superficially kinda "surprising". It's also a demonstration of quantum mechanics being different from classical mechanics, because without entangled states, the dimension of $H_1\otimes H_2$ would indeed be the sum of those of $H_1$ and $H_2$ rather than their product, like with phase spaces in classical mechanics.<br /><br />OK, what kind of surprising consequences?<br /><br />They're basically all of the following nature: suppose we have a state given by:<br /><br />$$\frac1{\sqrt2} (|\phi\rangle\otimes|\psi\rangle+|\psi\rangle\otimes|\phi\rangle)$$<br />I.e. two entangled particles where we know that they are in <b>two distinct states, but we don't know which is which</b>. Such a state can certainly be produced -- how? Just put two identical independent particles in a box then do a <b>"partial" measurement</b> -- a "<b>peek</b>" -- (which can be achieved, e.g. by some logic gates) that checks if they're in the same state or not, and uncovers no other information.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RdQ446MRCZ0/XR-4FMLPe8I/AAAAAAAAFnI/Wdq0C5sq0csOQ7oa3N_WDTclzTPPFqMNACLcBGAs/s1600/alibobeshwar-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="472" data-original-width="1144" height="132" src="https://1.bp.blogspot.com/-RdQ446MRCZ0/XR-4FMLPe8I/AAAAAAAAFnI/Wdq0C5sq0csOQ7oa3N_WDTclzTPPFqMNACLcBGAs/s320/alibobeshwar-1.png" width="320" /></a></div><br />Now separate the particles spatially -- there's nothing wrong with this, they're still a system, which still has a state -- and give one to Alice and the other to Bob. 
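Before continuing with Alice and Bob: the "peek" just described can be sketched in a few lines of numpy (a minimal two-state example, all labels mine) -- project the product state onto the span of $|\phi\rangle\otimes|\varphi\rangle$ and $|\varphi\rangle\otimes|\phi\rangle$ ("the states are distinct") and renormalise:

```python
import numpy as np

phi = np.array([1.0, 0.0])
var = np.array([0.0, 1.0])

# two identical, independent particles, each in (|phi> + |varphi>)/sqrt(2)
plus = (phi + var) / np.sqrt(2)
state = np.kron(plus, plus)

# the "peek": projector onto span{|phi>|varphi>, |varphi>|phi>}
P = np.outer(np.kron(phi, var), np.kron(phi, var)) \
  + np.outer(np.kron(var, phi), np.kron(var, phi))

peeked = P @ state
peeked = peeked / np.linalg.norm(peeked)  # renormalise after the collapse

# the result is exactly the entangled state from the text
target = (np.kron(phi, var) + np.kron(var, phi)) / np.sqrt(2)
assert np.allclose(peeked, target)
```

So a partial measurement on two independent particles leaves them entangled: the peek reveals only "the states differ", and the result is an unfactorable state.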
Now if Bob looks at his particle and sees it in $|\psi\rangle$, he immediately <i><b>knows</b></i> that Alice could only observe her particle to be in state $|\phi\rangle$ -- <b>there's nothing Alice can do to change this outcome</b>.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-IgUTjPAGjsk/XR-4JqV0FJI/AAAAAAAAFnM/K5TDzVtWn_Qco4PtrHfMEkryqkq5iVJMQCLcBGAs/s1600/alibobeshwar-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="540" data-original-width="1144" height="151" src="https://1.bp.blogspot.com/-IgUTjPAGjsk/XR-4JqV0FJI/AAAAAAAAFnM/K5TDzVtWn_Qco4PtrHfMEkryqkq5iVJMQCLcBGAs/s320/alibobeshwar-2.png" width="320" /></a></div><br />(you may worry that spatially separating the particles alters the state in some important way -- but it doesn't: the states $|\phi\rangle$ and $|\psi\rangle$ are both transformed individually in a way that doesn't change the entangled structure of the combined state -- make sure this makes sense to you. But if it makes you happy, you could imagine the particles were already spatially separated when they were first entangled.)<br /><br />OK, perhaps you don't find this particularly surprising or unintuitive -- I don't either. But perhaps you do -- perhaps you think there's a violation of locality -- and the reason you do is because you haven't yet fully accepted <a href="https://thewindingnumber.blogspot.com/2017/08/three-domains-of-knowledge.html">logical positivism</a>. Let's consider what locality entails for each observer in the set-up, and see if it's violated:<br /><ul><li><b>Alice:</b> From Alice's perspective, Bob opening his box is just another way to observe her particle -- or rather, she can <b>observe Bob's brain</b> that contains the information, which collapses the state from her perspective. But this is <b>perfectly local</b> -- it takes time for information to propagate from Bob to her. 
Alternatively, if she doesn't observe Bob's brain and <b>just observes her own box</b> later, that's when her state collapses to $|\phi\rangle$, and she then learns that Bob had collapsed his state into $|\psi\rangle$ -- but as Bob <b>cannot choose what his state vector collapses</b> to, he can't send her any information through entanglement. Even if there were a large number of entangled systems this way, the distribution of the states Alice can observe is the same whether or not Bob has collapsed his states (you can confirm this -- this is an idea called the <b>no-communication theorem</b> which we will discuss later in more mathematical detail).</li><li><b>Bob: </b>Certainly, Bob acquires knowledge of something far away, but no information actually propagated from Alice to him -- he just observed his own box.</li><li><b>another observer: </b>Charlie, who stands somewhere between Alice and Bob, also takes time to observe Bob's brain.</li></ul><div>So there really isn't a violation of locality. This isn't surprising at all -- certainly one could have <b>classical correlations</b> too. You could just juggle two distinct particles in a box and give them to each person, and Bob discovering his particle allows him to determine Alice's particle. </div><div><br /></div><div>The difference between the classical case and the quantum case is that in the classical case you could <i>pretend</i> that there's some <b>hidden truth</b> that is just not known to the observers. Quantum mechanics forbids any such hidden truth (as confirmed by commutator relations), and forces you to accept logical positivism, and there cannot be a "universal observer" as such a notion is inherently non-local. 
But the fact that correlation isn't non-local doesn't depend on whether you have <b>metaphysical notions of hidden truths</b> in classical physics -- it is a physical question, and is the <b>same in the classical and quantum cases</b>.<br /><br /><hr /><br />Are we done writing down our algebra of tensor products? We still haven't discussed how <b>inner products</b> and <b>projections </b>of tensor products behave. The basic question is "how do we <b>upgrade/combine operators</b> from $H_1$ and $H_2$ to $H_1\otimes H_2$?" Let's start with the simple case of a factorable state in the form $|\phi\rangle\otimes|\varphi\rangle$. Suppose we apply a projection operator $X$ on the first particle. Have we made any observation on the second state? No -- just an identity projection. Or we could make an observation, a projection $Y$. So we can say that for the combined observation $X\otimes Y$,<br /><br />$$(X\otimes Y)(|\phi\rangle\otimes|\varphi\rangle)=(X|\phi\rangle)\otimes(Y|\varphi\rangle)$$<br />And an upgrade from $H_1$ to $H_1\otimes H_2$ is just tensoring with the identity $X\otimes 1$.<br /><br />But the full range of operators on $H_1\otimes H_2$ is a lot more complicated. We could consider <b>entangled states</b>. We could consider operators that are entangled (<b>"partial measurement" operators</b> like we described -- think about what these are). What would measurements on linear combinations of states look like (we know they <i>should</i> apply linearly, but let's show that)?<br /><br />Suppose we have a state in the form $\frac1{\sqrt2}|\phi\rangle\otimes|\varphi\rangle+\frac1{\sqrt2}|\varphi\rangle\otimes|\phi\rangle$. What exactly is this? We had two independent subsystems each in state $\frac1{\sqrt2}|\phi\rangle+\frac1{\sqrt2}|\varphi\rangle$, then we made an observation that showed they were in two distinct states -- we don't know which is in which. 
Now we make an observation and collapse the first subsystem to $|\chi\rangle$.<br /><br /><b>How does this alter the state of sub-system 2?</b><br /><br />OK, so "was" (quotation marks! quotation marks!) the system in $|\phi\rangle\otimes|\varphi\rangle$ or $|\varphi\rangle\otimes|\phi\rangle$? This is a question for <b>Bayes' theorem</b>.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-pf_gBpB1fu4/XSGtBgjdpoI/AAAAAAAAFns/zgumWWJQ2k8MIPnK3NiRd72_5fCIZDSGwCLcBGAs/s1600/bayes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="396" data-original-width="560" height="226" src="https://1.bp.blogspot.com/-pf_gBpB1fu4/XSGtBgjdpoI/AAAAAAAAFns/zgumWWJQ2k8MIPnK3NiRd72_5fCIZDSGwCLcBGAs/s320/bayes.png" width="320" /></a></div><br /><div class="twn-pitfall">The above diagram is for <em>illustration only</em>. There is no real hidden truth (as we will see a few articles from now) of whether the state was initially $|\phi\rangle\otimes|\varphi\rangle$ or otherwise. But the probabilities still obey all the standard laws, such as Bayes' theorem, so tree diagrams make sense to illustrate this.</div><br />So the "probability that the system was in $|\phi\rangle\otimes|\varphi\rangle$" (quotation marks! quotation marks!) is:<br /><br />$$\frac{\frac12|\langle\chi|\phi\rangle|^2}{\frac12|\langle\chi|\phi\rangle|^2 + \frac12|\langle\chi|\varphi\rangle|^2}$$<br />(which if $\langle\phi|\varphi\rangle=0$ is just $|\langle\chi|\phi\rangle|^2$) And analogously for the other possibility. So the collapse of sub-system 1 to $|\chi\rangle$ collapses the entire state to<br /><br />$$|\chi\rangle\otimes\left(\frac1{\sqrt2}\langle\chi|\phi\rangle\cdot|\varphi\rangle+\frac1{\sqrt2}\langle\chi|\varphi\rangle\cdot|\phi\rangle\right)$$<br />Or some normalisation thereof if $\langle\phi|\varphi\rangle\ne0$. 
You can confirm that if $|\chi\rangle=|\phi\rangle$ or $|\chi\rangle=|\varphi\rangle$, this reduces to $|\phi\rangle\otimes|\varphi\rangle$ or $|\varphi\rangle \otimes |\phi\rangle$ respectively as we expect.<br /><br />You can check that this is <b>precisely what you get from applying the projection operator</b> $|\chi\rangle\langle\chi|\otimes 1$ as a linear operator to the original state. The above argument can be repeated for a general vector in the tensored space, yielding the linearity of the tensored operator.<br /><br /><div class="twn-exercises">What about <b>inner products of tensored states</b>? Convince yourself that the inner product of $|\phi_1\rangle\otimes|\phi_2\rangle$ and $|\chi_1\rangle \otimes |\chi_2\rangle$ is $\langle\phi_1|\chi_1\rangle\langle\phi_2|\chi_2\rangle$.</div><br />There was the other case we mentioned -- we may have operators that are themselves entangled. What does this mean? Suppose we start with the factorable state:<br /><br />$$\frac12|\phi\rangle\otimes|\phi\rangle+\frac12|\phi\rangle\otimes|\varphi\rangle+\frac12|\varphi\rangle\otimes|\phi\rangle+\frac12|\varphi\rangle\otimes|\varphi\rangle$$<br />Then perform the observation corresponding to "are the states different from each other?" This is a projection onto the plane spanned by $(|\phi\rangle\otimes|\varphi\rangle,|\varphi\rangle\otimes|\phi\rangle)$, perpendicular to the plane spanned by $(|\phi\rangle\otimes|\phi\rangle,|\varphi\rangle\otimes|\varphi\rangle)$ (confirm that this is true based on our discussion above of applying inner products to tensored states) -- we can write this as:<br /><br />$$\left(|\phi\rangle\otimes|\varphi\rangle\right)\left(\langle\phi|\otimes\langle\varphi|\right) +<br />\left(|\varphi\rangle\otimes|\phi\rangle\right)\left(\langle\varphi|\otimes\langle\phi|\right)<br />$$<br />(check that this, and the system of "distributing bras" makes sense). 
Or alternatively:<br /><br />$$|\phi\rangle\langle\phi|\otimes|\varphi\rangle\langle\varphi|+|\varphi\rangle\langle\varphi|\otimes|\phi\rangle\langle\phi|$$<br />One can check that applying this operator to the factorable state indeed results in the entangled state (up to normalisation). For better insight, perform the operation on the factored form of the state.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-_grXpKdhg2s/XSI6bApIi8I/AAAAAAAAFoE/ohtG-MZb51I1g34o5G5fiXQlZ5ch3BQ6wCLcBGAs/s1600/factorable%2Bstates%2Band%2Boperators.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1400" data-original-width="808" height="400" src="https://1.bp.blogspot.com/-_grXpKdhg2s/XSI6bApIi8I/AAAAAAAAFoE/ohtG-MZb51I1g34o5G5fiXQlZ5ch3BQ6wCLcBGAs/s400/factorable%2Bstates%2Band%2Boperators.png" width="230" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">In the representation shown in the above diagram, a <b>factorable state</b> is one given by a <b>single square region</b>, while a <b>factorable projection operator</b> is one that selects a square region out of the maximally distributed state. With this notion, it becomes clear that <b>an entangled operator (one that cannot be factored into operators on each Hilbert space) is the only kind of operator that projects a factorable state into an entangled state</b>. </div><br /><div class="twn-pitfall">We're not saying that entangled operators <em>always</em> project factorable states onto entangled ones -- you can just measure an irrelevant property of the system (e.g. you know each particle is either in the UK or France, and you check "are they both in India?"). 
But they are the only operators that <em>can</em>, and for any such operator, there exist factorable states that it entangles (almost by definition).</div><br /><div class="twn-pitfall">Note that we're only talking about <em>projection operators</em> above. We could certainly have factorable observables that enforce a partial measurement -- e.g. $X_1\otimes X_2$, which measures the product of the positions of the two particles -- but the projection operators onto each of the eigenstates of this operator are not factorable (check this).</div><br /><div class="separator" style="clear: both; text-align: left;">The following rules then determine the action of our operations on the tensored Hilbert space.</div><div class="separator" style="clear: both; text-align: left;"></div><ol><li>$(A\otimes B)(|\phi\rangle\otimes|\varphi\rangle)=(A|\phi\rangle)\otimes(B|\varphi\rangle)$ -- the tensored operators <b>associate</b> with the corresponding states.</li><li>The tensor product of two linear operators is <b>linear</b>.</li><li>The image of a <b>linear</b> combination of operators is the linear combination of the images, i.e. 
$(A+B)|\psi\rangle=A|\psi\rangle+B|\psi\rangle$.</li></ol></div><br /><hr /><br /><b>Derivations and the Jacobi Identity</b><br /><br />Let's consider a new way to think of the Lie algebra of a group -- instead of just considering the tangent vector to be <i>at</i> the identity, we could smear it across the group to form a <b>vector field</b>, resolving questions of whether our tangent space "really needs to be" at the identity (the exponential map in matrix representation only exists in the traditional form if we're talking about tangent vectors at the identity, but we're free to write down the Lie algebra in this way).<br /><br />But not every vector field is a valid element of the Lie algebra. We need the vector field to be "<b>constant</b>" across the manifold in some sense, so that the constant value it takes is the tangent-vector-at-the-identity it corresponds to. But what exactly do we mean by "constant" on a Lie group?<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Wv4qP6_NVOY/XQJ0FqkQ5HI/AAAAAAAAFmE/sImh5ae4NY82cWm_nUlQmc4bWe9zQEqlACLcBGAs/s1600/leftinvariantvectorfield.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="633" data-original-width="633" height="320" src="https://1.bp.blogspot.com/-Wv4qP6_NVOY/XQJ0FqkQ5HI/AAAAAAAAFmE/sImh5ae4NY82cWm_nUlQmc4bWe9zQEqlACLcBGAs/s320/leftinvariantvectorfield.png" width="320" /></a></div>In the case of the unit circle in the complex plane, we have an idea of what we want -- the vector field $T(M)$ is constant over the group if it is determined by the value at the identity as $T(M)=MT(0)$.<br /><br />Is this preserved in the matrix representation of the group? 
Well, yes, because the correspondence between complex numbers and spiral matrices is a homomorphism. We can use this as a motivation to define the condition for a vector field to belong to the Lie algebra of a matrix Lie group -- it needs to be a <b>left-invariant vector field</b>, i.e. we need the vector field to be determined by its value at the identity as $T(M)=MT(0)$.<br /><br /><div class="twn-furtherinsight">Why left-invariant? Why not right-invariant? Why matrix multiplication at all? The choices made here are certainly arbitrary to some extent. When we study <b>abstract Lie algebra</b>, we'll just have "left-multiplication by $M$" being replaced by a group action, and the usage of matrix multiplication is a <b>choice of representation</b>. In the context of abstract Lie algebra, the "left-multiplication by $M$" we're interested in is really the <i>derivative</i> of the left-translation map $g\mapsto Mg$ on $G$, which is a linear map between the tangent spaces at $I$ and $M$. You can show that this map is represented by matrix left-multiplication given a matrix representation (i.e. letting the group be a subgroup of $GL(n,\mathbb{C})$).</div><br /><hr /><br />Ok, why did we just do that? Why did we upgrade our tangent vectors to vector fields? 
If it wasn't obvious already, the <b>noncommutativity</b> of a Lie group is "the" feature of importance in a Lie group, at least in some neighbourhood of the identity (we will later find out exactly the kind of features that aren't determined by just the Lie bracket -- the important keywords here are <b>connected</b> and <b>compact</b>) -- if the Lie group is commutative, then the Lie algebra is just a vector space with no additional structure, and the Lie group is a "basically unique" choice.<br /><br />In our discussions of noncommutativity in the <a href="https://thewindingnumber.blogspot.com/2019/05/an-easy-way-to-see-closure-under-lie.html">last article</a>, we repeatedly referred to <i>flowing along a vector</i> -- the nature of noncommutativity is inherently "dynamical" in this sense. So we need to talk about <i>differentiating along the corresponding vector field to a tangent vector</i>.<br /><br />So let's upgrade our vector fields to derivative operators, or <b>derivations</b> $D$. These are operators on functions $f:G\to \mathbb{R}$ that tell you the derivative of $f$ in the direction of the vector field -- the left-invariant ones are a certain generalisation of the directional derivative operators.<br /><br />Well, what exactly is a derivation? On Euclidean space, directional derivatives can be imagined as stuff of the form $f\mapsto\vec{v}\cdot\nabla f$ -- but this requires the concept of a dot product which is quite weird within the context of matrix groups. 
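Before abstracting anything, it's worth convincing yourself numerically that the ordinary directional derivative $f\mapsto\vec{v}\cdot\nabla f$ on Euclidean space is linear and obeys the product rule -- exactly the two properties we're about to isolate. A minimal finite-difference sketch (the sample functions, point and direction are arbitrary choices made for illustration):

```python
import numpy as np

def directional_derivative(f, x, v, h=1e-6):
    """Central-difference approximation to (v . grad f)(x)."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

# Two smooth functions on R^2, and an arbitrary point and direction
f = lambda x: np.sin(x[0]) * x[1]
g = lambda x: x[0] ** 2 + np.exp(x[1])
x = np.array([0.3, -0.7])
v = np.array([1.2, 0.5])

Df = directional_derivative(f, x, v)
Dg = directional_derivative(g, x, v)
Dfg = directional_derivative(lambda y: f(y) * g(y), x, v)

# Product rule: D(fg) = f Dg + g Df
assert np.isclose(Dfg, f(x) * Dg + g(x) * Df, atol=1e-4)

# Linearity: D(af + bg) = a Df + b Dg
a, b = 2.0, -3.0
Dlin = directional_derivative(lambda y: a * f(y) + b * g(y), x, v)
assert np.isclose(Dlin, a * Df + b * Dg, atol=1e-4)
```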
But if you try to work this out on the unit circle (do it!), you might get an idea: we can define a <b>curve</b> $\gamma:\mathbb{R}\to G$ passing through a point and consider:<br /><br />$$f\mapsto(f\circ \gamma)'(t)$$<br />at that point, and you get precisely the directional derivative in the direction $\gamma'(t)$ (show that this is right in Euclidean space, and make sure you understand why it is right/makes sense -- it's the chain rule, and a certain analogy exists to projecting matrices onto subspaces in linear algebra). And if we just want tangent vectors at the identity, we can just consider the operation $f\mapsto(f\circ \gamma)'(0)$.<br /><br />OK. Let's try to "<b>abstract out</b>" the properties of a derivation $D$, i.e. something that just allows us to define what a derivation is, abstractly, that is equivalent to being an operator of the above form.<br /><br />What makes an operator a directional derivative? Certainly it must be a linear operator -- but not every linear operator is a directional derivative. The key idea behind a directional derivative is that $D(f(x))$ is <b>determined in a specific way by $D(x)$, the rate at which $x$ changes</b> in the specified direction.<br /><br />How do we use this? Well, if you think about it a little bit, we can restrict $f$ to be analytic -- so we need:<br /><ol><li>$D(x)$ predicts $D(x^n)$ in the right way -- this is ensured by the <b>product rule</b> -- $D(fg)=f\ Dg + g\ Df$.</li><li>$D(x^n)$ for all $n$ predicts $D(a_0+a_1x+a_2x^2+\ldots)$ in the right way -- this is ensured by <b>linearity</b>.</li></ol><div class="twn-beg">If anyone can motivate the definition of a derivation without restricting to analytic functions, tell me.</div><br />An operator that satisfies these two properties is called a <b>derivation</b> -- one can prove additional properties from these axioms fairly easily, e.g. 
$D(c)=0$ for constant $c$, etc.<br /><br /><hr /><br />Let's think about why this whole construction above makes sense.<br /><br />Let $G$ be the group of translations of $\mathbb{R}$ -- one can parameterise them by the translated distance as $\Delta(p)$ with composition given by $\Delta(p)\Delta(q)=\Delta(p+q)$. Well, this is isomorphic to the additive group on the reals, and in turn to the multiplicative positive real numbers. We can consider the group to be acting on real analytic functions by translations of the domain: $\Delta_pf(x):=f(x+p)$. The Lie algebra is just spanned by the derivative of $\Delta(p)$ at the identity, that is:<br /><br />$$\Delta '(0) = \lim\limits_{h \to 0} \frac{{\Delta (h) - 1}}{h} = \frac{d}{{dx}}$$<br />And our Lie algebra members are all real multiples of $d/dx$ -- these are precisely the directional derivatives on $\mathbb{R}$. Similar constructions can be made on $\mathbb{R}^n$, or a general automorphism group.<br /><br />So we see that the "derivations" construction of the Lie algebra really does give <b>the tangent vectors on the Lie group identified as the automorphism group of some object</b>. If you've ever done some differential geometry, this gives you the motivation for treating partial derivatives as basis vectors.<br /><br /><div class="twn-pitfall">Our discussion of derivations so far works both for derivations (general vector fields on the manifold) and <b>point-derivations</b> (basically tangent vectors at a specific point). Under the first interpretation, though, we're <b>not actually interested in all derivations</b>, only the left-invariant ones. For example, in the example above, an operation of the form $p(x)\frac{d}{dx}$ is linear and satisfies the product rule:<br /><br />$$p\frac{d(f\cdot g)}{dx}=g\cdot p\frac{df}{dx}+f\cdot p\frac{dg}{dx}$$<br />And why shouldn't it? It corresponds to a vector field all right -- $xe_x$. 
But this is not a <i>left-invariant vector field</i>.</div><br /><div class="twn-furtherinsight">Interpret the <b>Taylor series as the exponential map</b> from the Lie algebra to the Lie group! Make the "similar construction" in the multivariate case ($\mathbb{R}^n$) and interpret the <b>multivariate Taylor series</b> as an exponential map -- i.e. that $\Delta=\exp\nabla$.</div><div><div><br /></div><div><hr /></div><div><br /></div>The first thing that we can do with our formalism of point-derivations is give another proof of closure under the Lie Bracket: </div><div><br /></div><div>$$[D_1,D_2](fg)=f[D_1,D_2]g+g[D_1,D_2]f$$</div><div>I.e. that the Lie Bracket of two derivations is also a derivation. Check that the above is correct by expanding stuff out and using the product rule for $D_1$ and $D_2$ (the cross-terms cancel).</div><div><br /></div><div>There's another way that derivations can be used to show closure under the Lie Bracket, which shows more closely the connection to the product rule for the second derivative discussed in the <a href="https://thewindingnumber.blogspot.com/2019/05/an-easy-way-to-see-closure-under-lie.html">previous article</a>.<br /><br />One might wonder if, like the directional derivative at the identity in the $c'(0)$ direction is given by $(f\circ c)'(0)$, the directional derivative at the identity in the $c''(0)$ direction may be given as $(f\circ c)''(0)$. Well, in general:<br /><br />$$(f\circ c)''(t)=c''(t)\cdot\nabla f(c(t))+c'(t)\cdot\frac{d}{dt}\nabla f(c(t))$$<br />Which, since $c'(0)=0$, at $t=0$ is simply equal to the first term, the directional derivative in the $c''(0)$ direction. So we just need to show that $f\mapsto (f\circ c)''(0)$ is a derivation. 
This follows from the <b>Leibniz rule for the second derivative</b>, and the fact that the first derivative of $c$ is zero.</div><div><br /></div><div><hr /><br /><div>OK, one more thing before we actually do something useful -- something we haven't done before in other ways.<br /><br />This is an <b>extended pitfall prevention</b>, because I fell into this pit myself. When thinking about left-invariance of a vector field $D$, I formulated the idea in my head this way: the idea is that under $D$, we should get the same result if we differentiate (derivate?) $f$ at 0 or if we translate it forward by $x$ and derivate it at $x$. i.e. where $\phi^h$ represents the translation $f(x)\mapsto f(x-h)$, we want:<br /><br />$$D=\phi^{h}D\phi^{-h}$$</div></div><div>(<i>THIS IS WRONG!</i> This is a pitfall prevention, not an actual result!) And I looked at some simple Abelian cases, like the additive real group and the circle group, and thought this was clearly true.<br /><br />But it's wrong. How do we know that? Well, let's consider the group action $\phi^{-h}D\phi^h$ -- certainly at $h=0$, it's just $D$, so let's differentiate it (against $h$) at 0. We get, where $d\phi_0$ is the derivative of $\phi$ at 0:<br /><br />$$[D, d\phi_0]$$<br />Which isn't zero. So my argument must be wrong -- I must have assumed abelian-ness somehow.<br /><br />Here's the problem: the final left-multiplication by $\phi^h$ is fine -- it just brings the derived function back to the origin, but "translating the function forward and then differentiating it" messes things up when the direction you're differentiating in doesn't commute with the direction of translation. 
Draw some pictures of curved surfaces to convince yourselves of this.<br /><br />So left-multiplication determines a sort of "parallel transport" on the Lie Group, while right-multiplication is an "alternative" way to compare vectors in different tangent spaces, and its disagreement with left-multiplication determines the non-commutativity of the group. Well, this choice of left-multiplication vs right-multiplication is really a convention, arising from the choice of representation.<br /><br /><hr /></div><div><br />OK, the useful thing: Suppose we're interested in "nested Lie brackets" $[X,[Y,Z]]$. We're talking about conjugating $[Y,Z]$ as $\phi^p[Y,Z]\phi^{-p}$ where $d\phi_0=X$, so that to first order in $p$:<br /><br />$$\phi^p[Y,Z]\phi^{-p}=[Y,Z]+p[X,[Y,Z]]$$<br />Since conjugation is a homomorphism, we can also write:<br />$$\begin{align}<br />\phi^p[Y,Z]\phi^{-p} &= [\phi^pY\phi^{-p},\phi^pZ\phi^{-p}] \\<br /> &= [Y+p[X,Y],Z+p[X,Z]] \\<br /> &= [Y,Z] + p([Y,[X,Z]]+[[X,Y],Z])\\<br />\Rightarrow [X,[Y,Z]]&=[Y,[X,Z]]+[[X,Y],Z]<br />\end{align}$$<br />Now, couldn't we have just proven this by expanding everything out as commutators? Sure, but this provides more insight as to what's going on -- you might notice the resemblance to the product rule. Indeed, this identity -- <b>the Jacobi identity</b> -- is perhaps best stated as:<br /><br />"<b>A derivation $X$ acts through the Lie Bracket as a derivation on the space of derivations, where "multiplication" is given by the Lie Bracket.</b>"<br /><br />In this sense, it's actually quite expected -- it results from the fact that the Lie Bracket is a bilinear operator obtained from <b>differentiating a group symmetry, conjugation</b> -- this mandates that it is a derivation.<br /><br />As it turns out, the Jacobi identity, along with antisymmetry and bilinearity, determines the Lie Algebra -- it is enough to "abstract out" the properties of a Lie Algebra. Why? 
This is something we will see over several articles, which will then allow us to motivate abstract Lie algebra.</div><br /><hr /><br /><b>Dealing with eigenspaces; noncommuting variables and another postulate</b><br /><br />At the end of the <a href="https://thewindingnumber.blogspot.com/2019/06/projection-operators-generalised-borns.html">last article</a>, you might've wondered how one might talk about 3-dimensional position -- so far, we've only considered an operator representing a one-dimensional position, e.g. to find the $x$-co-ordinate of something. This is obviously insufficient. What we need to measure three-dimensional position is <b>three separate measurements</b> of the different spatial dimensions.<br /><br />But there's a problem here -- we know that upon an observation, the state vector is modified, in that it is replaced by some eigenstate of the observable in question. So after measuring the $x$-co-ordinate, if we measure the $y$-co-ordinate afterwards, are we "really measuring" the $y$-co-ordinate of the particle as it was in its initial state, or have we shaken it around a bit?<br /><br />Let's try to think very precisely about what's going on here. The first question to ask is -- what do the eigenstates of the $X$ operator look like?<br /><br />Well, because it's an observable, it must have eigenstates that produce a full eigenbasis. But if each eigenvalue <b>corresponded to just one eigenstate</b>, then we would only have information about the $x$-positions of particles, which is clearly insufficient to represent the entire state of a particle. 
So we must have each eigenvalue -- each $x$-position -- correspond to <b>an infinitude of states, an eigenspace</b>: the states at each position with that $x$-co-ordinate (which, remember, is their eigenvalue), <b>and all superpositions thereof</b>. And this makes a lot of sense -- each position is a state, and all these positions give us the same value for the $x$-position.<br /><br />What this means is that each function of the form $g(y,z)\delta(x-x_0)$ is an eigenstate of the $X$ operator, with eigenvalue $x_0$. So we can have something like $g(y,z)=\delta(y-y_0)\delta(z-z_0)$, which would also be an eigenstate of the $Y$ and $Z$ operators (with eigenvalues $y_0$ and $z_0$ respectively), or some other linear combination, which would no longer be an eigenstate of $Y$ and $Z$.<br /><br />OK. So what happens when we observe $X$ taking the value $x$? You might think that the state just turns into some <b>randomly chosen eigenstate </b>with the observed eigenvalue $x$. But if you think about it, this would be quite <b>unphysical</b>, as this would mean our $X$-observation would <i><b>magically change</b></i> <b>our knowledge about the $y$ and $z$ positions</b> too (for example, if the state collapsed into a state that is also an eigenstate of $Y$, we would have accidentally completely measured the $y$ position) -- but we can certainly design experiments in which an observation of an $x$ position does not so radically rattle the particle in the $y$ and $z$ directions.<br /><br /><div class="twn-furtherinsight">Another way to think about this is that the eigenvalues of an operator are the measurements we're getting out. If a state is already in an eigenspace corresponding to the eigenvalue $\lambda$, and we "measure" the observable again (i.e. 
do nothing), the state shouldn't change.</div><br />So we don't want the observation to change the $y$ and $z$ probability information in any way -- so what we're looking for is a <b>projection of the state onto the eigenspace</b> with eigenvalue $x$. This is in line with our discussion of the generalised Born's rule in the <a href="https://thewindingnumber.blogspot.com/2019/06/projection-operators-generalised-borns.html">last article</a> -- but it is an <b>additional postulate</b> of quantum mechanics, or rather generalises the existing postulate about states projecting randomly into eigenstates.<br /><br /><div class="twn-furtherinsight">Weren't the eigenvalues completely irrelevant, you ask? Can't you just make the eigenvalues whatever you want, since they're just labels on the eigenstates? Not really -- the eigenvalues <em>are</em> exactly what you measure. You can choose to measure any function of them, but if you use a function that isn't injective, you are measuring <em>less information</em> about the system, and you're collapsing the state "less" in this sense.</div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Et48Tk7LJKk/XPGTZ8d7geI/AAAAAAAAFlY/hfBffc8ndk0iGTaYB3E1f1YlgaoIj1VYwCPcBGAYYCw/s1600/projection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="394" data-original-width="296" height="320" src="https://1.bp.blogspot.com/-Et48Tk7LJKk/XPGTZ8d7geI/AAAAAAAAFlY/hfBffc8ndk0iGTaYB3E1f1YlgaoIj1VYwCPcBGAYYCw/s320/projection.png" width="240" /></a></div><br />(<i>Unlike in the projection above, the subspace being projected onto upon measurement of X is itself an infinite-dimensional space, spanned by the different positions Y and Z can take. 
Oh, and we have to normalise the projected state.</i>)<br /><br />Something that we've seen in this discussion above is that there is a <b>common eigenbasis</b> for $X$, $Y$ and $Z$ -- specifically, the <b>position basis</b>, the basis of Dirac delta distributions centered at the different points in three-dimensional space. From linear algebra, this is equivalent to saying that $X$, $Y$ and $Z$ commute.<br /><br />What this means is that when you then go on to measure $Y$, and then $Z$, you end up in a state that is a common eigenstate of $X$, $Y$ and $Z$ -- so that you have precise values for each co-ordinate of the position. And as the $X$ information is only altered by the $X$ observation (and so on), the probability distribution for each variable is the same <b>regardless of the order</b> you measure them in -- so <b>three-dimensional position is indeed well-defined</b> in quantum mechanics.<br /><br /><hr /><br /><div class="twn-pitfall">Just to be clear, the fact that $X$, $Y$ and $Z$ commute is a <b>postulate</b> -- equivalent to the physical claim that each position in space is in fact an eigenstate, that we can in fact pinpoint the position of a particle exactly. We <em>cannot</em> do this e.g. for position and momentum -- $(x,p)$ pairs cannot be considered eigenstates, as there is no simultaneous eigenstate of $X$ and $P$. So for example, you <b>can't just construct a spacefilling curve</b> in $(x,p)$ space to measure position and momentum simultaneously, because the parameters of the curve would simply not have any corresponding eigenstates. The $(x,p)$ space <b>does not exist in the Hilbert space</b>, there are no states that precisely put down the values of position and momentum. 
It is possible to construct quantum mechanical theories -- called <b>non-commutative quantum theories</b> -- in which the $(x,y,z)$ space isn't in the Hilbert space either, so that our perception of three-dimensional positions must necessarily be approximate.<br /><br />We're assuming here that this is not so, that three-dimensional space does form an eigenbasis for the $X$, $Y$ and $Z$ operators, that the representation of the $Y$ operator in the $X$ basis is indeed $\psi(x,y,z)\mapsto y\psi(x,y,z)$, not something weird and fancy.</div><br /><hr /><br />A very different picture arises when you have noncommuting variables. Suppose two operators $X$ and $P$ don't commute, i.e. there is no common eigenbasis for them. So once you observe $X$ and put it in some eigenspace of $X$, there is a non-zero probability that the state will have to be projected out of this $X$-eigenspace when $P$ is measured.<br /><br />So this means that the observables $X$ and $P$ <b>cannot be measured simultaneously</b>. Some specific bounds on the uncertainties will be discussed in the <a href="https://thewindingnumber.blogspot.com/2019/06/position-momentum-bases-fourier.html">next article</a>. For now, let's demonstrate an example of two noncommuting variables: <b>position</b> and <b>momentum</b> (in the same direction).<br /><br /><b>NOTE: We will show in the next article the given results about momentum being $-i\hbar \frac{\partial}{\partial x}$, etc. Just intuit them out here from their eigenvectors.</b><br /><br />As stated, the position and momentum operators can be given in the position basis as $x$ and $-i\hbar\partial/\partial x$ respectively. 
What this means is that given a wavefunction $\psi(x)$, it transforms under these operators as $x\psi(x)$ and $-i\hbar\psi'(x)$ respectively (check that this makes sense -- especially for the position case -- and also that one can go the other direction and show that the <b>corresponding eigenvectors</b> of the position operator must be <b>Dirac delta functions</b>).<br /><br />So do these operators commute? Clearly not -- the eigenbasis of one is Dirac delta functions in $x$, the other's is sinusoids in $x$. But we can also verify this computationally:<br /><br />$$\begin{align}XP &= -i\hbar x \frac{\partial}{\partial x}\\<br />PX\{\psi(x)\}&=-i\hbar\frac{\partial}{\partial x}(x\psi(x))\\<br />&= -i\hbar\left[\psi(x)+x\psi'(x)\right]\\<br />\Rightarrow PX &= -i\hbar x\frac{\partial}{\partial x} -i\hbar\end{align}$$<br /><br />So we have the commutator $i[X,P]=-\hbar$ (why do we talk about $i[A,B]$? Because as it is easy to see, for any Hermitian $A$ and $B$, this is Hermitian, while $[A,B]$ is simply anti-Hermitian). This is the "purest" commutator -- a (scaled) Identity operator. Since we didn't use any other properties of position and momentum, <b>this is a property of all observables that are Fourier transforms of each other/canonically conjugate observables </b>(more on this in the next article).<br /><br /><hr /><br /><b>Exercise:</b> Write down the most generalised form of Born's rule accounting for generalised eigenspaces (the answer is identical to what we've already written, but make sure you understand it). 
Show, as in <a href="https://thewindingnumber.blogspot.com/2019/06/projection-operators-generalised-borns.html">the last article</a>, that the probability density of finding a particle somewhere in three-dimensional space is $|\Psi(x,y,z)|^2$ -- make sure you define $\Psi(x,y,z)$ clearly!<br /><br /><hr /><br /><b>Position, momentum bases and operators, Fourier transform, uncertainty</b><br /><br />In this article, we'll assume the de Broglie relation for all particles -- i.e. that their momentum is given by $p=hf$, where $f$ is the <i>spatial</i> frequency (the reciprocal of the wavelength). This is actually quite an incredible assumption, even if not surprising -- we've accepted that a particle is a wave in the sense of probability (the wave describes the probability amplitude densities of finding it at some point), but why at all should the spatial frequency of the probability wave relate to its momentum?<br /><br />Well, it's natural for you to find this assumption unsatisfactory. We've been quite liberal in assuming the de Broglie relation earlier when <a href="https://thewindingnumber.blogspot.com/2019/05/from-polarisation-to-quantum-mechanics.html">motivating quantum theory</a>, too -- we'll later produce some motivation for the de Broglie relation for photons, and discuss derivations <i>from</i> quantum mechanics, axiomatising our theory clearly to eliminate circularities. But for now, let's not.<br /><br />The key point of $p=hf$ is that for a sinusoidal wave $e^{i \cdot 2\pi f \cdot x}$ (so the probability density is uniform, and the standard deviation in the observation of the particle's position is infinite), the momentum takes a specific <i>definite</i> value, $hf$, with zero standard deviation.<br /><br />Well, what if the wavefunction isn't a simple sinusoid, but some other distribution $\Psi(x)$? 
If you did all the assigned exercises in the <a href="https://thewindingnumber.blogspot.com/2019/05/from-polarisation-to-quantum-mechanics.html">first article</a>, you should know the answer (if not, work it out before reading on). Classically, if you could write that wavefunction as a sum of sinusoids (i.e. use a Fourier transform), then each sinusoid would have its own momentum and there would be some chunk of your matter in each of those momenta, forming a momentum distribution. In quantum mechanics, you can't have chunks of a single quantum, so this distribution is a probability distribution (still a probability <i>amplitude</i> distribution, because we want superposition). We'll use the notation $\Psi(p)$ to represent this "<b>momentum-space wavefunction</b>", and we'll see why soon.<br /><br />So it's not too hard to see that the frequency distribution is simply the Fourier transform of $\Psi(x)$, while the momentum-space wavefunction is given by:<br /><br />$$\Psi(p)=\frac1h \mathcal{F}_x^{p/h}(\Psi(x))$$<br />Where $\mathcal{F}_x^{p/h}(\Psi(x))$ is the Fourier transform of $\Psi(x)$ (which is a function of $f$) written with the variable substitution $f=p/h$. Note that we're considering the non-normalised Fourier transform, in terms of ordinary frequencies.<br /><br />Well, $\Psi(x)\, dx$ and $\Psi(p)\, dp$ are just the representations of the state vector in the position and momentum bases respectively. So the <b>Fourier transform acts as a <i>change-of-basis matrix</i></b> from the position basis to the momentum basis. I.e.<br /><br />$$|\psi\rangle_P=F|\psi\rangle_X$$<br />The inverse of this change-of-basis matrix, $F^{-1}$, precisely holds the <b>eigenstates of the momentum operator</b> written in the position basis as its columns, and the corresponding eigenvalues are the actual values of the momenta. 
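The discrete analogue of this change of basis can be played with directly: the discrete Fourier transform (numpy's `fft`) is, up to normalisation, exactly such a matrix, sending the sampled "position representation" of a state to its "momentum representation". A rough Python sketch (the grid size, frequency and Gaussian packet are arbitrary choices, and the discrete setup only caricatures the continuum statements above):

```python
import numpy as np

N = 64
n = np.arange(N)

# Discrete "position-basis" samples of a plane wave with integer frequency k
k = 5
plane_wave = np.exp(2j * np.pi * k * n / N)

# The DFT plays the role of the change-of-basis matrix F from the position
# representation to the momentum representation ...
momentum_rep = np.fft.fft(plane_wave)

# ... so a pure plane wave becomes a single "momentum-basis" spike at index k
expected = np.zeros(N, dtype=complex)
expected[k] = N
assert np.allclose(momentum_rep, expected, atol=1e-9)

# A change of basis preserves inner products: the DFT is unitary up to a
# factor of N (Parseval's theorem), so total probability is conserved
psi = np.exp(-0.5 * ((n - N / 2) / 4.0) ** 2).astype(complex)  # Gaussian packet
psi_p = np.fft.fft(psi)
assert np.isclose(np.vdot(psi, psi).real, np.vdot(psi_p, psi_p).real / N)
```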
So we have eigenstates $\frac1h e^{ix \cdot 2\pi p / h} dp$ with corresponding eigenvalues $p$.<br /><br />Before going any further, let's make sure we know exactly what this means: our change-of-basis matrix $F^{-1}$ is an uncountably infinite-dimensional "matrix" whose "indices" are denoted as $(x,p)$ in the rows-by-columns format. Its general entry is $\frac1h e^{ix \cdot 2\pi p / h} dp$, and each column -- here's the important bit -- each column holds <i>p</i> constant and varies <i>x</i>, i.e. each column, i.e. each eigenstate of $P$ is a function of $x$.<br /><br />Anyway, so we're looking for a linear operator $P$ solving the eigenvalue problem (and we're just ignoring the scalar multiples):<br /><br />$$P e^{ix \cdot 2\pi p / h} = pe^{ix \cdot 2\pi p / h}$$<br />It should be quite clear that the operator we're looking for is:<br /><br />$$\begin{align}P &= \frac{h}{2\pi i}\frac{\partial}{\partial x} \\<br />&= -i\hbar \frac{\partial}{\partial x} \end{align}$$<br />We need to be clear that this is the representation of the momentum operator <i>in the position basis</i> -- in the momentum basis, its representation is simply "$p$" (i.e. its action on each eigenstate $|p\rangle$ is to multiply it by $p$). Similarly, it should be easy to show that in the momentum basis,<br /><br />$$X=i\hbar\frac{\partial}{\partial p}$$<br /><b>Exercise:</b> make sure you clearly know and understand what the eigenvectors and eigenvalues of $X$ and $P$ are, in both the position and momentum bases. Hint: something about the Dirac delta function.<br /><br /><hr /><br /><b>Derivation of Heisenberg and Robertson-Schrodinger uncertainty principles</b><br /><b><br /></b>We can derive a variety of "uncertainty principles" -- inequalities showing trade-off between the certainties of two observables -- with some basic algebraic manipulation. 
It is important to note that none of these individual uncertainty principles is really much more fundamental than any of the others (or at least I don't see in what way they can be) -- one can always make stronger bounds for the uncertainty, and many stronger bounds exist than the ones we're showing here -- but the <i>concept</i> of an uncertainty principle is crucial, in that it demonstrates the rigorous difference between <b>quantum mechanics and statistical physics</b>. In general, the noncommutativity of observables (having no shared eigenbasis) is something that has no analog in classical physics.<br /><br />OK. So we'll show two statements about the product of uncertainties of two observables, $(\langle A^2\rangle - \langle A\rangle^2)^{1/2}(\langle B^2 \rangle - \langle B \rangle^2)^{1/2} $. Once again, there is nothing special about the specific relations we will show -- we can consider other combinations than products, like $\Delta a^2 + \Delta b^2$, and indeed, there exist uncertainty relations for such terms.<br /><br />Defining $A'=A-\langle A\rangle$ and $B'=B-\langle B\rangle $ for <b>Hermitian</b> (this is important!) 
$A$ and $B$, we see that:<br /><br />$$\begin{align}<br />\langle A'^2\rangle \langle B'^2 \rangle &= \langle \psi | A'^2 | \psi \rangle \langle \psi | B'^2 | \psi \rangle \\<br />&= \langle A' \psi | A' \psi \rangle \langle B' \psi | B' \psi \rangle \\<br />&\ge |\langle \psi | A' B' | \psi \rangle| ^ 2 \\<br />&= \left|\frac12 \langle\psi|A'B'+B'A'|\psi\rangle + \frac12\langle\psi|A'B'-B'A'|\psi\rangle\right|^2 \\<br />&= \frac14 |\langle\psi|A'B'+B'A'|\psi\rangle|^2 + \frac14|\langle\psi|A'B'-B'A'|\psi\rangle|^2 \\<br />&= \frac14 |\langle \{A-\langle A\rangle, B-\langle B\rangle\} \rangle| ^2 + \frac14 |\langle [A,B]\rangle|^2\\<br />&= \frac14 |\langle\{A,B\} \rangle - 2\langle A\rangle \langle B\rangle |^2 + \frac14|\langle[A,B]\rangle|^2\\<br />\Rightarrow \Delta a\,\Delta b &\ge \frac12 \sqrt{|\langle\{A,B\} \rangle - 2\langle A\rangle \langle B\rangle |^2 + |\langle [A,B]\rangle|^2}<br />\end{align}$$<br />This is the <b>Robertson-Schrodinger relation</b>.<br /><br />(Guide in case you get stuck somewhere -- <i>line 3</i>, Cauchy-Schwarz inequality; <i>line 4</i>, splitting into Hermitian and anti-Hermitian parts; <i>line 5</i>, magnitude of a complex number -- I'm not sure if I can give any better motivation for specifically considering the product of the standard deviations -- like I said, these specific relations are not really that fundamental. I guess we just want to illustrate the <b>point</b> of "the" uncertainty principle, regardless of the specific ways in which it is treated, and would like to get a simple form for it, regardless of how weak or strong it may be.)<br /><br />One may weaken the inequality further, writing (and this is equivalent to having ignored the real part in line 4, saying the magnitude of a complex number is at least that of the imaginary part):<br /><br />$$\Delta a\,\Delta b \ge \frac12 |\langle [A,B]\rangle|$$<br />This is the <b>Heisenberg uncertainty relation</b>. 
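As a sanity check (not part of the derivation above), one can test the Robertson-Schrodinger bound numerically on random Hermitian matrices -- the dimension, seed and state below are arbitrary choices of ours.

```python
import numpy as np

# Numerical sanity check of the Robertson-Schrodinger bound on random
# Hermitian matrices and a random normalised state.
rng = np.random.default_rng(0)
n = 4

def rand_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2  # Hermitian by construction

def ev(op, psi):
    # <psi|op|psi> -- complex in general, real for Hermitian op
    return psi.conj() @ op @ psi

A, B = rand_hermitian(n), rand_hermitian(n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)  # normalise the state

da = np.sqrt(ev(A @ A, psi).real - ev(A, psi).real ** 2)     # Delta a
db = np.sqrt(ev(B @ B, psi).real - ev(B, psi).real ** 2)     # Delta b
anti = ev(A @ B + B @ A, psi) - 2 * ev(A, psi) * ev(B, psi)  # <{A,B}> - 2<A><B>
comm = ev(A @ B - B @ A, psi)                                # <[A,B]>
bound = 0.5 * np.sqrt(abs(anti) ** 2 + abs(comm) ** 2)

print(da * db >= bound - 1e-12)  # True
```

Rerunning with different seeds always keeps the inequality satisfied, as the Cauchy-Schwarz step in the derivation guarantees.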
In particular, in the <a href="https://thewindingnumber.blogspot.com/2019/06/dealing-with-eigenspaces-noncommuting.html">last article</a>, we showed that for the position and momentum operators, $[X,P]=i\hbar$. So in this case, we get the celebrated identity:<br /><br />$$\Delta x\, \Delta p \ge \frac{\hbar}{2}$$<br />For canonically conjugate $X$ and $P$.<br /><br />As mentioned before, other stronger uncertainty relations exist for general observables. Some examples can be found on the Wikipedia page <a href="https://en.wikipedia.org/wiki/Stronger_uncertainty_relations">Stronger uncertainty relations</a> (<a href="https://en.wikipedia.org/w/index.php?title=Stronger_uncertainty_relations&oldid=874251670">permalink</a>).Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-77330539512350933022019-06-01T13:13:00.000+01:002019-06-03T16:12:45.229+01:00Projection operators, generalised Born's rule, position basis, wavefunctionAt the end of the <a href="https://thewindingnumber.blogspot.com/2019/05/from-polarisation-to-quantum-mechanics.html">last article</a>, I asked you to investigate Born's rule for continuous variables like position and momentum.<br /><br />Well, the problem is that if $x$ is continuously distributed (i.e. we have an operator $X$ whose eigenvalues form a continuous spectrum $\Sigma_X$), typically $P(x=\lambda)=0$ -- and this gives us very little information about the actual probability distribution. What we're really interested in is $P(x\in B)$ for $B$ some subset of $\Sigma_X$.<br /><br /><div class="twn-furtherinsight">Technically, we need $B$ to be a "Borel subset", or "measurable subset". 
We will be omitting several such technicalities in the article, such as the need for the spectral theorem to define a "projection-valued measure" or "spectral measure" on an operator with a continuous spectrum -- this is something that will be covered in the <a href="https://thewindingnumber.blogspot.com/p/1103.html">MAO1103: Linear Algebra course</a>.</div><br />First, let's think about $P(x\in B)$ in the countable case. One can write $B=\{\lambda_1,\ldots,\lambda_n\}$, and then simply say that<br /><br />$$P(x\in B)=\sum_k |\langle\psi|\phi_k\rangle|^2$$<br />But the term on the right is a Pythagorean sum -- specifically, it is the length-squared of the vector formed by summing all the projections of $|\psi\rangle$ onto the eigenstates $|\phi_1\rangle,\ldots,|\phi_n\rangle$. And this is the same as the length-squared of the projection of $|\psi\rangle$ onto the span of these eigenstates.<br /><br />(<b>Note on notations:</b> From here onwards, we will use the notation $|\lambda\rangle$ to refer to the eigenvector corresponding to the eigenvalue $\lambda$ (if the eigenspace has dimension more than 1, we'll figure something out). 
We will use the notation $\{|B\rangle\}$ to refer to the span of the eigenvectors corresponding to the eigenvalues in $B$.)<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Et48Tk7LJKk/XPGTZ8d7geI/AAAAAAAAFlQ/_RK3VpoUgvoRp1dfUpK_zzE98L-8mqwfQCLcBGAs/s1600/projection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="394" data-original-width="296" height="320" src="https://1.bp.blogspot.com/-Et48Tk7LJKk/XPGTZ8d7geI/AAAAAAAAFlQ/_RK3VpoUgvoRp1dfUpK_zzE98L-8mqwfQCLcBGAs/s320/projection.png" width="240" /></a></div>So we could just define a Hermitian <b>projection operator</b> $L_X(B)$ for any subset $B$ of the spectrum of $X$ -- it is an easy exercise to write down an explicit form for $L_X(B)$ in terms of the eigenvectors of $X$.<br /><br />Then the probability $P(x\in B)$ is simply $|L_X(B)|\psi\rangle|^2$. Recalling that a Hermitian projection operator satisfies $L^*=L=L^2$, we can write the <b><i>generalised Born's rule</i></b> as:<br /><br />$$\begin{align}P(x\in B) &= |L_X(B)|\psi\rangle|^2\\ &= \langle\psi|L_X(B)|\psi\rangle\end{align}$$<br />Well, this is interesting! In the last article, you proved that the <b>expected value</b> of an observable $X$ given a state $|\psi\rangle$ is given by $\langle\psi|X|\psi\rangle$. But here we have a <i><b>probability</b></i> given by the same expression. So we want to interpret our projection operators as some sort of "observable" -- we can omit the "Hermitian", since all observables are Hermitian.<br /><br />There's another place you might've seen something like this, and that is with <i>indicator variables</i> in probability and statistics -- <i>the expected value of an indicator variable for an event is the probability that the event occurs</i>.<br /><br />Try to interpret these projection operators as observables that are analogous to "indicators" in some sense. 
If you think a little about it, you might see exactly what these observables represent: the eigenvalues of $L_X(B)$ are all 1 and 0 -- if the value "1" is realised, the state has been projected into $\{|B\rangle\}$ -- and if the value "0" is realised, it hasn't.<br /><br />So projection operators are a special type of observable, measuring the answer to "<b>Yes/No questions</b>" -- if the answer to "is the system in one of the states $\{|B\rangle\}$?" is <b>yes</b>, the observable $L_X(B)$ takes the value 1 -- if the answer is <b>no</b>, then it takes the value 0. So it is precisely an "<b>indicator variable</b>" for $\{|B\rangle\}$.<br /><br />We have seen such projection operators, of course, in the context of polarisation -- where the operator represented whether or not the photon has passed through. Indeed, one may formulate quantum mechanics entirely in terms of projection operators, as any question can be formulated with some number of Yes/No questions (the key reason why this can be done, as we will see -- is that these "yes/no questions" all commute, i.e. the corresponding projection operators share an eigenbasis). Let's not.<br /><br /><hr /><br />Well, this can be generalised in the straightforward way to an operator with a continuous spectrum, resulting in the same expression. We can also calculate probability <i>densities</i> using this result. Let $X$ be an operator with continuous spectrum $\Sigma_X$ -- then we can write the state $|\psi\rangle$ in the eigenbasis of $X$:<br /><br />$$ |\psi\rangle = \int_{\Sigma_X} |x\rangle\, \Psi(x)\, dx $$<br />Where $\Psi(x)\, dx=\langle x|\psi\rangle$ are the coefficients of the state in the eigenbasis, i.e. the probability amplitudes -- we call $\Psi(x)$ the <b>wavefunction</b>, and it represents <i>probability amplitude densities</i>. 
Then for some set $B\subseteq \Sigma_X$ of eigenvalues, $L_X(B)|\psi\rangle$ is the projection:<br /><br />$$ L_X(B)|\psi\rangle = \int_B |x\rangle\, \Psi(x)\, dx $$<br />And one may calculate the dot product, noting that complex dot products require taking the complex conjugate:<br /><br />$$ \langle \psi | L_X(B) | \psi \rangle = \int_B \Psi^*(x)\, \Psi(x)\, dx $$<br />Which gives us an expression for the <b>probability density function</b> on $\Sigma_X$ as:<br /><br />$$\begin{align}\rho(x) &=\Psi^*(x)\,\Psi(x) \\<br />&=|\Psi(x)|^2\end{align}$$<br />And this applies to any operator with a continuous spectrum, like position and momentum.<br /><br /><hr /><br /><div class="twn-pitfall">Some texts define the eigenvectors $|x\rangle$ of a continuous-spectrum observable differently from us -- it is often conventional to let $|x\rangle$ be infinitely large so that $\langle x_1|x_2\rangle = \delta(x_1-x_2)$. This is so that the amplitudes $\langle x|\psi\rangle$ are not infinitesimal, but instead $\langle x|\psi\rangle=\Psi(x)$ (without multiplication by $dx$). For consistency with discrete spectra, we do not use this convention.</div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-2410056145919502942019-05-29T23:11:00.000+01:002019-06-01T13:18:04.420+01:00From polarisation to quantum mechanics: states, observables, Born's lawLike most texts on the theory, I will motivate the mathematics of quantum mechanics from the example of polarisation -- mostly because it's a very accessible example of stuff being wavelike. 
From this example, we will be able to motivate: the <b>state vector</b> (generalising the polarisation), <b>state vector collapse</b> (the event of polarisation), <b>observables and their eigenvalues</b> (stuff like energy, number of photons, etc.), <b>eigenstates and their orthogonality </b>(polarisation basis), <b>noncommuting operators and uncertainty</b> (the noncommuting of polarising filters).<br /><br />The key feature of quantum mechanics -- the fundamentally probabilistic nature -- comes from the following two facts, confirmed by experiments (the famous experiments here are the <b>double-slit experiment</b> and <b>photoelectric effect</b> respectively):<br /><br /><ul><li>Everything is a <b>wave</b> -- objects behave as waves, following the superposition principle and the waves represent densities of observations at large scales.</li><li>Everything is a <b>particle </b>-- which manifests itself in the form of some stuff, like energy and momentum, coming in little quanta.</li></ul><div><br /></div><div>This is the principle of <b>wave-particle duality</b>. You may realise how this implies a probabilistic description, but the following example should make it quite clear: consider a wave of light, with energy $hf$ (so it's a single photon) polarised at angle $\theta$ to the horizontal -- and it passes through a horizontal polarising filter. 
Well, then the wave that passes through would be a horizontally polarised wave with energy $hf\cos^2\theta$, right?<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-g0BSGI23_Tw/XO5d5XpI5SI/AAAAAAAAFks/y51gFRd5w1MYIRBI54_95upCVVU-hiKQACLcBGAs/s1600/4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="545" data-original-width="615" height="283" src="https://1.bp.blogspot.com/-g0BSGI23_Tw/XO5d5XpI5SI/AAAAAAAAFks/y51gFRd5w1MYIRBI54_95upCVVU-hiKQACLcBGAs/s320/4.jpg" width="320" /></a></div><br />But this is <i>impossible</i>, since energy levels in quantum mechanics are quantised -- you can't <i>have</i> $\cos^2\theta$ of a photon, you can only have integer multiples of a photon. But the fact that energy drops as $E\cos^2\theta$ is something that you can verify at your home, using sunglasses -- what the heck?<br /><br />The key point is that the empirical verification of the $\cos^2\theta$ business that you can do at home is on a <b><i>macroscopic</i> level</b>, when you have a large number of photons $E=Nhf$. So something occurs with the photons on a <b>microscopic level</b> such that when you try it with a <b>large number of photons</b>, $\cos^2\theta$ of the photons pass through.<br /><br />Well, this is <b>precisely the (frequentist) definition of probability</b>! A single photon passes through the filter with a <i>probability</i> of $\cos^2\theta$ so that for a large number of photons, $\cos^2\theta$ of the photons pass through. 
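This frequentist picture can be simulated directly. Below is a sketch (the angle, photon count and seed are our choices; the only physics input is the pass probability $\cos^2\theta$): each photon independently passes with probability $\cos^2\theta$, and for large $N$ roughly that fraction gets through.

```python
import math
import random

# Simulate N independent photons hitting a horizontal polarising filter,
# each passing with probability cos^2(theta).
theta = math.pi / 6  # polarised at 30 degrees to the horizontal
N = 100_000
random.seed(1)

passed = sum(random.random() < math.cos(theta) ** 2 for _ in range(N))
fraction = passed / N
print(abs(fraction - math.cos(theta) ** 2) < 0.01)  # True: ~75% pass
```

The macroscopic $\cos^2\theta$ law emerges from nothing but independent yes/no events at the single-photon level.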
This is a non-trivial result -- wave-particle duality makes no mention of probability as such, it just tells us that stuff is both a particle and a wave, but this simple condition in itself implies a probabilistic, non-deterministic reality.<br /><br /><hr /><br />Similar thought experiments can illustrate the probabilistic nature of other things (the "things" in question here will soon be called "eigenstates"): <b>position</b> is easy -- consider a standing wave photon in a box (this can easily be constructed). This is uniformly distributed throughout the box -- so how much of the energy is in some chunk of the box?<br /><br /><b>Momentum</b> is trickier, but shouldn't be too hard if you're familiar with Fourier transforms -- what's the analog of a "box" in momentum-space? Well, consider a concentrated pulse of light -- this can be written, via a Fourier transform, as the sum of several light waves of different momenta (i.e. frequencies), each wave with some lower energy. Taking "some chunk" of this "box" amounts to filtering some specific frequencies of the light. This can be done easily, e.g. with a colour filter -- so how much of the energy is contained in the waves with these specific momenta?<br /><br />In both cases, the key point is that you can't have a fraction of the energy of the photon at these positions/momenta, so you must have a <i>probability</i> of measuring the photon to be in a specific range of positions or a specific range of momenta -- to be in a specific region of space or a specific region of momentum-space.<br /><br /><hr /></div><div><br />The fundamental point here can be made for any quantity $X$: if you can filter out the "part" of a collection of particles that has $X$ in a certain subset of its range, then on a microscopic level, this filtering is <i>probabilistic</i>. 
The act of "filtering out the parts with a certain $X$", applied to a single particle, is just the act of checking if a particle is in a certain $X$-interval, and is called <b>measurement</b>. Any quantity that you can measure is called an <b>observable</b>. </div><div><br /></div><div class="twn-furtherinsight">Something like polarisation is really a form of measurement -- you're <i>finding out</i> whether or not the photon is in a certain polarisation $|\phi_{\parallel}\rangle$. You may have another observable, corresponding to a different polarisation -- even one that is orthogonal to the first polarisation -- $|\phi_{\perp}\rangle$ and still get that the photon is in $|\phi_\perp\rangle$. There is nothing wrong with this, as we just know beforehand that the photon is in $|\phi_\parallel\rangle$ <em>or</em> $|\phi_\perp\rangle$. If you perform the polarisation with $|\phi_\perp\rangle$ <em>after</em> the polarisation with $|\phi_\parallel\rangle$, you will find that the photon doesn't pass through, as you know for sure that the photon is not in both $|\phi_\parallel\rangle$ <em>and</em> $|\phi_{\perp}\rangle$.<br /><br />Now, you may have certain psychological issues with this, as have many in history -- however, you might want to note that the aim of quantum mechanics is not to fix your psychological, psychiatric etc. problems but to explain nature. You need to accept logical positivism and learn to <em>shut up and calculate</em> to be comfortable with quantum mechanics -- I recommend reading <a href="https://thewindingnumber.blogspot.com/2017/08/three-domains-of-knowledge.html">Three Domains of Knowledge</a>.</div><div><br /></div><div>So whatever calculus we invent to describe these probabilistic phenomena, it is going to apply to all <i>observables</i>.</div><div><br /></div><div>In our first example, the polarisation of the photon can be represented by a unit vector which we will denote as $|\psi\rangle$. 
The polarising filter has two special axes, represented by unit vectors $|\phi_{\parallel}\rangle$ and $|\phi_\perp\rangle$ -- these are special in the sense that an incoming photon polarised as $|\phi_{\parallel}\rangle$ or $|\phi_\perp\rangle$ will simply be scaled, by factors of 1 and 0 respectively -- so these form an <b>eigenbasis</b> for a certain operator.</div><div><br /></div><div>Well, we said that the photon passes through (with polarisation $|\phi_{\parallel}\rangle$) with probability $\cos^2\theta$ -- this arises simply from considering the <i>amplitude of $|\psi\rangle$ in the direction of $|\phi_\parallel\rangle$</i>. So we can write the <b>probability that the photon ends up in a state</b> $|\phi\rangle$ as $|\langle\psi|\phi\rangle|^2$ where $\langle\psi|\phi\rangle$ is called the corresponding "<b>probability amplitude</b>".<br /><br /><b>This expression, $P(x=\lambda)=|\langle\psi|\phi_\lambda\rangle|^2$ is called Born's rule.</b></div><div><br /></div><div>Let's get back to the eigenbasis -- what exactly is this an eigenbasis of? We said that the corresponding eigenvalues are 1 and 0, so this gives us a complete description of the operator. Note that this operator depends <i>only</i> on the observable (namely the "number of photons in the $|\phi_{\parallel}\rangle$ direction"), not on the state or any other feature of the observation. So we decide to <b><i>call</i> this operator/matrix the "observable"</b>, and its eigenvalues are the values of the observable that can be measured.<br /><br />To find properties of these observables, the natural way is to note that the only feature we've really required of them is Born's rule, i.e. the probabilistic interpretation -- so we can apply the axioms of probability and see what they imply in the context of these observables.<br /><br /><ul><li><b>$P(E)\ge 0$ --</b> this implies that the observables are over either the reals or complexes, so that $|\langle\psi|\phi\rangle|^2\in \mathbb{R}$ in the first place. 
The nonnegativity then follows.</li><li><b>$P(\Omega)=1$ and </b><b>$P\left(\bigcup_i E_i\right) = \sum_i P(E_i)$ for disjoint $E_i$ -- </b>these two together imply that $\sum |\langle\phi|\psi\rangle|^2 = 1=|\langle\psi|\psi\rangle|^2$ where the sum is taken over all eigenstates $|\phi\rangle$ of the operator. As this must be true for all states $|\psi\rangle$, the thing on the left must be a Pythagorean sum, so the $|\phi\rangle$s must form an orthogonal basis. This implies that all <b>observables are normal operators</b>.</li></ul><div><br /></div><div>The latter fact is very important, and can also be seen in the following way -- if a system is in one eigenstate, it cannot possibly collapse onto another eigenstate (the probabilistic interpretation is: if you know for sure that the value of the observable is a thing, it's that thing) -- so we must have $|\langle \phi_1|\phi_2\rangle|^2=0$ for all eigenstates $|\phi_1\rangle$ and $|\phi_2\rangle$ with distinct eigenvalues.</div><div><br /></div><div>Another restriction we add is that the observables be not only normal, but <b>Hermitian operators</b> in particular, so they have real eigenvalues. This may seem an odd choice, but it makes sense, as any normal operator may be uniquely written as $X_H+iX_{AH}$ where $X_H$ and $X_{AH}$ are Hermitian, and $X_H$ and $X_{AH}$ commute, so any complex observation can be done unambiguously as two real observations. So we stick to real eigenvalues.</div><div><br /></div><div>This also makes it essential that we allow complex operators rather than just real ones (the two choices were given to us from the first probability axiom), so that this decomposition is possible. Later, we will see concrete examples of this with commutators $[X,Y]$, which must be multiplied by $i$ to turn Hermitian. 
We will also see more fundamental reasons to choose complex numbers in QM.</div><div><br /></div><div><div><hr /></div><div><br /><b>Exercise:</b> Show that the expected value of an observable $X$ given a state $\psi$ can be given as $\langle \psi|X|\psi \rangle$ (i.e. $\psi^*X\psi$ in conventional notation).<br /><br /><b>Exercise:</b> Explain Born's rule with other observables, like position and momentum. Explain why it holds in general.</div></div></div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-88310947406843342162019-05-22T00:57:00.000+01:002019-05-30T13:59:14.989+01:00What's with e^(-1/x)? On smooth non-analytic functions: part IWhen you first learned about the Taylor series, your intuition probably went something like this: you have $f(x)$, the derivative at this point tells you how $f$ changes from $x$ to $x+dx$ (which tells you $f(x+dx)$), the second derivative tells you how $f'$ changes from $x$ to $x+dx$, which recursively tells you $f(x+2\ dx)$, the third derivative tells you $f(x+3\ dx)$, and so on -- so if you have an <i>infinite </i>number of derivatives, you know how <i>each</i> derivative changes, so you should be able to predict the <i>full global behaviour of the function</i>, assuming it is infinitely differentiable (smooth) throughout.<br /><br />Everything is nice and dandy in this picture. But then you come across two disastrous, life-changing facts that make you cry for those good old days:<br /><ol><li><b>Taylor series have <i>radii of convergence</i> -- </b>If I can predict the behaviour of a function up until a certain point, why can't I predict it a bit afterwards? It makes sense if the function becomes rough at that point, like if it jumps to infinity, but even functions like $1/(1+x^2)$ have this problem. 
Sure, we've heard the explanation involving complex numbers, but why should we care about the complex singularities (here's a question: do we care about quaternion singularities?)? Specifically, a Taylor series may have a zero radius of convergence. Points around which a Taylor series has a zero radius of convergence are called <b>Pringsheim points</b>.</li><li><b>Weird crap --</b> Like $e^{-1/x}$. Here, the Taylor series <i>does</i> converge, but it converges to the wrong thing -- in this case, to zero. Points at which the Taylor series doesn't equal a function on any neighbourhood, despite converging, are called <b>Cauchy points</b>.</li></ol><div>In this article, we'll address the <b>weird crap -- </b>$e^{-1/x}$ (or "$e^{-1/x}$ for $x>0$, 0 for $x= 0$" if you want to be annoyingly formal about it) will be the example we'll use throughout, so if you haven't already seen this, go plot it on Desmos and get a feel for how it looks near the origin.<br /><br /><i>Terminology:</i> We'll refer to <b>smooth non-analytic functions</b> as <b>defective functions</b>.<br /><br /></div><hr /><br /><div>The thing to realise about $e^{-1/x}$ is that the Taylor series -- $0 + 0x + 0x^2 + ...$ -- <i>isn't wrong</i>. The truncated Taylor series of degree $n$ is the <i>best polynomial approximation </i>for the function near zero, and none of the logic here fails for $e^{-1/x}$. There is honestly no other polynomial that better approximates the shape of the function as $x\to 0$.<br /><div><br /></div><div>If you think about it this way, it isn't too surprising that such a function exists -- what we have is a function that <b>goes to zero</b> as $x\to 0$ <b>faster than any polynomial</b> does. I.e. a function $g(x)$ such that</div><div>$$\forall n, \lim\limits_{x\to0}\frac{g(x)}{x^n}=0$$</div><div>This is not fundamentally any weirder than a function that escapes to infinity faster than all polynomials. In fact, such functions are quite directly connected. 
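This "faster than any polynomial" decay is easy to probe numerically. Below is a small sketch (the sample points and exponents are our choices): for each fixed $n$, the ratio $e^{-1/x}/x^n$ shrinks toward $0$ as $x\to0^+$ -- though the decrease only becomes monotone once $x < 1/n$, which is why we sample below $1/20$.

```python
import math

# e^{-1/x} / x^n -> 0 as x -> 0+ for every fixed n: the exponential decay
# eventually beats any polynomial blow-up of 1/x^n.
def g(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

for n in (1, 5, 20):
    ratios = [g(x) / x ** n for x in (0.04, 0.02, 0.01)]
    # each ratio is smaller than the last, heading towards 0
    assert ratios[0] > ratios[1] > ratios[2] > 0
    print(n, ratios[2])
```

For $n=20$ the ratio is still astronomically large at $x=0.04$ before collapsing -- a good illustration of how deceptive the approach to $0$ is.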
Given a function $f(x)$ satisfying:</div><div>$$\forall n, \lim\limits_{x\to\infty} \frac{x^n}{f(x)} = 0$$</div><div>We can make the substitution $x\leftrightarrow 1/x$ to get</div><div>$$\forall n, \lim\limits_{x\to0} \frac{1}{x^n f(1/x)} = 0$$</div><div>So $\frac1{f(1/x)}$ is a valid $g(x)$. Indeed, we can generate plenty of the standard smooth non-analytic functions this way: $f(x)=e^x$ gives $g(x)=e^{-1/x}$, $f(x)=x^x$ gives $g(x)=x^{1/x}$, $f(x)=x!$ gives $g(x)=\frac1{(1/x)!}$ etc.<br /><br /></div><div><hr /><br /><div></div></div><div>To better study what exactly is going on here, consider Taylor expanding $e^{-1/x}$ around some point other than 0, or equivalently, expanding $e^{-1/(x+\varepsilon)}$ around 0. One can see that:</div></div><div>$$\begin{array}{*{20}{c}}{f(0) = {e^{ - 1/\varepsilon }}}\\{f'(0) = \frac{1}{{{\varepsilon ^2}}}{e^{ - 1/\varepsilon }}}\\{f''(0) = \frac{{ - 2\varepsilon + 1}}{{{\varepsilon ^4}}}{e^{ - 1/\varepsilon }}}\\{f'''(0) = \frac{{6{\varepsilon ^2} - 6\varepsilon + 1}}{{{\varepsilon ^6}}}{e^{ - 1/\varepsilon }}}\\ \vdots \end{array}$$</div><div>Or ignoring higher-order terms for our purposes,</div><div>$$f^{(N)}(0)\approx(1/\varepsilon)^{2N}e^{-1/\varepsilon}$$</div><div>Each derivative $\frac{e^{-1/\varepsilon}}{\varepsilon^{2N}}\to0$ as $\varepsilon\to0$, but they each approach zero <i>slower</i> than the previous derivative, and somehow that is enough to give the sequence of derivatives the "kick" that they need in the domino effect that follows -- from somewhere at $N=\infty$ (putting it non-rigorously) -- to make the function grow as $x$ leaves zero, even though all the derivatives were zero at $x=0$.</div><div><br /></div><div><hr /><br /></div><div><i>But</i> we can still make it work -- by letting $N$, the upper limit of the summation approach $\infty$ <i>first</i>, before $\varepsilon\to 0$. 
In other words, instead of directly computing the derivatives $f^{(n)}(0)$, we consider the terms</div><div>$$\begin{array}{*{20}{c}}{f_\varepsilon^{(0)}(0) = f(0)}\\{{{f}_\varepsilon^{(1)} }(0) = \frac{{f(\varepsilon ) - f(0)}}{\varepsilon }}\\{{{f}_\varepsilon^{(2)} }(0) = \frac{{f(2\varepsilon ) - 2f(\varepsilon ) + f(0)}}{{{\varepsilon ^2}}}}\\{{{f}_\varepsilon^{(3)} }(0) = \frac{{f(3\varepsilon ) - 3f(2\varepsilon ) + 3f(\varepsilon ) - f(0)}}{{{\varepsilon ^3}}}}\\ \vdots \end{array}$$</div><div>And write the generalised <b>Hille-Taylor series</b> as:</div><div>$$f(x) = \mathop {\lim }\limits_{\varepsilon \to 0} \sum\limits_{n = 0}^\infty {\frac{{{x^n}}}{{n!}}f_\varepsilon ^{(n)}(0)} $$</div><div>Then $N\to\infty$ before $\varepsilon\to0$ so you "reach" $N\to\infty$ first (or rather, you get large $n$th derivatives for increasing $n$) before $\varepsilon$ gets to 0.</div><div><br /></div><div>Another way of thinking about it is that the "local determines global" stuff makes sense to predict the value of the function at $N\varepsilon$, countable $N$, but it's a stretch to talk about uncountably many $\varepsilon$s away, which is what a finite neighbourhood is. But with these difference operators in the Hille-Taylor series, one ensures that each neighbourhood is a finite multiple of $\varepsilon$ away at any point, so the differences determine $f$.<br /><br /><hr /></div><div><b>Very simple (but fun to plot on Desmos) exercise: </b>use $e^{-1/x}$ or another defective function to construct a "<b>bump function</b>", i.e. a smooth function that is 0 outside $(0, 1)$, but takes non-zero values everywhere in that range.<br /><br />Similarly, construct a "<b>transition function</b>", i.e. a smooth function that is 0 for $x\le0$, 1 for $x\ge1$. 
(hint: think of a transition as going from a state with "none of the fraction" to "all of the fraction")<br /><br />If you're done, play around with this (but no peeking): <a href="https://www.desmos.com/calculator/ccf2goi9bj"><b>desmos.com/calculator/ccf2goi9bj</b></a></div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-37345142881056927582019-05-13T00:35:00.000+01:002019-05-13T00:38:26.043+01:00The Cauchy Riemann Equations: what do they really mean?<b>Question: <a href="https://math.stackexchange.com/a/3197879/78451">Geometrical Interpretation of Cauchy Riemann equations?</a></b><br /><br />One might think that being differentiable on $\mathbb{R}^2$ is sufficient for differentiability on $\mathbb{C}$. But the Jacobian of an arbitrary such function doesn't have a natural complex number representation.<br /><br />$$<br />\left[ {\begin{array}{*{20}{c}}<br />{\partial u/\partial x} & {\partial u/\partial y} \\<br />{\partial v/\partial x} & {\partial v/\partial y}<br />\end{array}} \right]<br />$$<br />Another way of putting this is that no complex-valued derivative (see below for an example) you can define for an arbitrary function fully captures the local behaviour of the function that is represented by the Jacobian.<br /><br />$$<br />\frac{df}{dz} = \left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right) + i\left(\frac{\partial v}{\partial x}-\frac{\partial u}{\partial y}\right)<br />$$<br />The idea is that we should be able to define a complex-valued derivative "purely" for the value $z$, without considering directions, i.e. we want to consider $\mathbb{C}$ one-dimensional in some sense (the sense being "as a vector space"). More precisely, the derivative in some direction in $\mathbb{C}$ should determine the derivative in all other directions in a natural manner -- whereas on $\mathbb{R}^2$, the derivatives in *two* directions (i.e. 
the gradient) determines the directional derivatives in all directions. <br /><br />If you think about it, this is quite a reasonable idea -- it's analogous to how not every linear transformation on $\mathbb{R}^2$ is a linear transformation on $\mathbb{C}$ -- only spiral transformations are.<br /><br />$$<br />\left[ {\begin{array}{*{20}{c}}<br />{a} & {-b} \\<br />{b} & {a}<br />\end{array}} \right]<br />$$<br />How would we generalise differentiability to an arbitrary manifold? Here's an idea: <b>a function is differentiable if it is locally a linear transformation</b>. So on $\mathbb{R}^2$, any Jacobian matrix is a linear transformation. But on $\mathbb{C}$, only Jacobians of the above form are linear transformations -- i.e. the only linear transformation on $\mathbb{C}$ is <b>multiplication by a complex number</b>, i.e. a spiral/amplitwist. So a complex differentiable function is one that is locally an amplitwist (geometrically), which can be stated in terms of the components of the Jacobian as:<br /><br />$$<br />\begin{align}<br />\frac{\partial u}{\partial x} & = \frac{\partial v}{\partial y} \\<br />\frac{\partial u}{\partial y} & = - \frac{\partial v}{\partial x} \\<br />\end{align}<br />$$<br />This is precisely why you shouldn't (and can't) view complex differentiability as some basic first-degree smoothness -- there is a much richer structure to these functions, and it's better to think of them via the transformations they have on grids.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-20494292960509634462019-05-06T22:21:00.001+01:002019-06-13T11:41:37.725+01:00Lie Bracket, closure under the Lie Bracket(If you're just here for the easy way to see closure, skip ahead to <a href="https://www.blogger.com/blogger.g?blogID=3214648607996839529#closure">Closure under the Lie Bracket</a>)<br /><br />In the <a 
href="https://thewindingnumber.blogspot.com/2019/04/introduction-to-lie-groups.html">previous article</a>, I introduced Lie Groups and Lie Algebras by talking about Lie Algebras as a parameterisation for the Lie Group -- we said that the elements of the Lie Group could be written as exponentials of these parameters (not uniquely, sure, but they can be written in this way). Some things to note here:<br /><ul><li>What we've called "Lie Groups" refers only to <b><i>connected</i> Lie Groups</b>, as motivation. In general, the theory of Lie groups considers <b>any group that is also a manifold </b>-- for instance, the non-zero real numbers are also a Lie Group (even though their Lie Algebra is identical to that of the positive real numbers -- can you see why?). We will hereby use this more general definition.</li><li>It's not really true that any Lie group can be parameterised in this fashion by writing each element as an exponential of a Lie Algebra element -- even for connected groups. This shouldn't be surprising -- given a term of the form $\exp X$ and a term $\exp Y$, their product $\exp X\exp Y$ is in the group by closure, but it isn't necessarily equivalent to $\exp(X+Y)$ on a non-Abelian group (could it be the exponential of something else? We'll find out later).</li><li>A <i>parameterisation</i> of this form is not the same as a <i>co-ordinate system</i>.</li></ul>The last point is what we will concentrate on in this article.<br /><br />What is a co-ordinate system on a manifold? Well, the key point is that any element of the manifold can be decomposed in terms of its components along the co-ordinates. On a Lie Group, this means that there should exist a "basis" for the Lie Group $\exp(X_1),\ldots\exp(X_n)$ corresponding to the basis $X_1,\ldots X_n$ for the Lie Algebra vector space such that every element of the Lie Group can be written as products of powers of these elements, and any rearrangement of the terms in the product should leave it invariant (i.e.
the elements should commute with each other).<br /><br /><div class="twn-pitfall">Note that it <em>is</em> possible to decompose elements of a connected Lie Group as a product of <em>some</em> exponentials, but this is different from there being specifically $n$ elements that one can write any Lie group element as products of.</div><br />But clearly, this can only be possible if the group is <i>Abelian</i>, commutative. This is a special case of the more general fact that only a <b>holonomic basis</b> gives rise to a co-ordinate system on a manifold. The idea is -- a closed loop should produce no overall group action. If you <b>flow</b> $\varepsilon$ in the $X$ direction, then flow $\varepsilon$ in the $Y$ direction, then flow $\varepsilon$ back in the $X$ direction and flow $\varepsilon$ back in the $Y$ direction, you should end up back where you started. If you don't, then the resulting difference is the infinitesimal "<b>group commutator</b>" of the Lie Group:<br /><br />$$e^{\varepsilon X}e^{\varepsilon Y}e^{-\varepsilon X}e^{-\varepsilon Y}$$<br />One can check via a Taylor expansion that this is equal, to second order, to:<br /><br />$$1+\varepsilon^2(XY-YX)$$<br />The first thing to note about this is that the $\varepsilon^1$ term is zero -- this may seem like a surprising coincidence, but perhaps it isn't that surprising (I mean, there's nothing else it <i>could</i> be, right? If the commutator were $1+\varepsilon z$ to first order, $\exp z$ would have to equal 1, and so it would give no characterisation at all of the amount of non-commutativity of the flows $X$ and $Y$) -- it's analogous to vector calculus, where the <b>circulation</b> of a vector field around a small closed loop is proportional to $\varepsilon^2$ (i.e. a line integral along the curve is proportional to its area, so you divide it by this area in the definition of curl, etc.).<br /><br />The second-order term, $XY-YX$, is more interesting.
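The second-order expansion above is easy to verify numerically -- here is a minimal sketch of mine (not from the original post), using scipy's matrix exponential and two rotation generators as the non-commuting flows:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

# Two non-commuting flows: rotation generators about the z- and x-axes
X = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 0.]])
Y = np.array([[0.,  0.,  0.],
              [0.,  0., -1.],
              [0.,  1.,  0.]])

eps = 1e-4
# the group commutator e^{eX} e^{eY} e^{-eX} e^{-eY}
c = expm(eps * X) @ expm(eps * Y) @ expm(-eps * X) @ expm(-eps * Y)

# (c - 1)/eps^2 should approach the Lie bracket XY - YX as eps -> 0
residual = (c - np.eye(3)) / eps**2
bracket = X @ Y - Y @ X
print(np.allclose(residual, bracket, atol=1e-3))  # True
```

Note how the first-order term really does cancel: dividing by $\varepsilon^2$ (not $\varepsilon$) is what leaves a finite residual.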
This may seem weird because so far, we've been considering the Lie algebra purely as a <b>vector space</b>, with addition and scalar multiplication being the only things going on. But clearly, this cannot be the entire picture, or a connected Lie group would be characterised entirely by the dimension of its Lie algebra. This operation -- the <b>Lie Bracket</b> or <b>Lie Algebra commutator</b> represented by $[X,Y]$ -- as we will see, gives some additional structure to the Lie Algebra, and in fact characterises it (we'll see what this means).<br /><br />So far, we've obtained no motivation for why this operation $XY-YX$ is actually of any significance. Sure, it appeared in our second-order approximation for the group commutator, but is the group commutator we defined really so great? Surely there could be other ways one could measure the non-commutativity of a group. And the $\varepsilon^2$ business is <i>weird</i>. Things that arise proportional to $\varepsilon$ live in the tangent space, in the Lie Algebra. Where does $[X,Y]$ even live?<br /><br />Two facts will convince us that the Lie Bracket is indeed the "right" measure of non-commutativity of a Lie Algebra:<br /><br /><ul><li><b>The Lie Algebra is closed under the Lie Bracket -- </b>we will see that in fact, $[X,Y]$ lives <i>in the Lie Algebra</i>, so it is in fact a binary operation on the Lie Algebra, and really does add structure to the Lie Algebra.</li><li><b>It characterises the entire Lie Algebra -- </b>not only is it <i>part</i> of the structure of the Lie Algebra, it characterises the entire structure of the Lie Algebra.
What this means is that defining the Lie Bracket on the vector space allows a full characterisation of the part of the group connected to the identity (the "connected part" of the group), so we can say that any Lie Algebras with the same dimension and Lie Bracket are isomorphic.</li></ul><div><br /></div><hr /><br /><a href="https://www.blogger.com/null" id="closure" name="closure"><b>Closure under the Lie Bracket</b></a><br /><br />If you're like me, you might've thought of several analogous situations to our $1+\varepsilon^2(XY-YX)$ expression -- e.g. in (complex) analysis, at a point where the derivative of a function is zero, the function is characterised by its <i>second</i> derivative (consult Needham's <i>Visual Complex Analysis</i>, p. 205-207 for an explanation). Another example is -- if the first derivative of a function is zero, the second derivative satisfies the product rule (this is actually directly related, in a way we won't go into now).<br /><br />Here's an idea you <i>might</i> think of: as we discussed earlier, the infinitesimal group commutator is $e^{\varepsilon X}e^{\varepsilon Y}e^{-\varepsilon X}e^{-\varepsilon Y}= 1+\varepsilon^2 (XY - YX) + O(\varepsilon^3)\in G$. But for a moment let $\varepsilon$ not be infinitesimal. So $\varepsilon (XY - YX) + O(\varepsilon^2)\in \mathfrak{g}$, the Lie Algebra corresponding to Lie Group $G$, so by scaling $XY-YX+O(\varepsilon)\in\mathfrak{g}$ and by connectedness of the vector space $XY-YX\in\mathfrak{g}$.<br /><br />But this argument is <b>incorrect</b> -- this becomes obvious if you try to formally write it down -- in general, $1+\varepsilon T\in G$ does <b>not</b> imply $T\in\mathfrak{g}$ for non-infinitesimal $\varepsilon$. It's close to an element in $\mathfrak{g}$ (for small $\varepsilon$), but how close?
You might get the feeling that it is "sufficiently close", in that the limit $\varepsilon\to0$ of the sequence $\left(c_\varepsilon(X,Y)-1\right)/\varepsilon^2$ (where $c_\varepsilon(X,Y)$ is the group commutator) indeed ends up in the Lie Algebra.<br /><br />To make this feeling formal, consider instead the curve parameterised differently as $\gamma(\varepsilon)=e^{\sqrt\varepsilon X}e^{\sqrt\varepsilon Y}e^{-\sqrt\varepsilon X}e^{-\sqrt\varepsilon Y}$. Then $\gamma'(0)=XY-YX$, and we're done.<br /><br /><div class="twn-furtherinsight">think about the Taylor expansion here of this new curve for a while</div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-66478014476020575272019-04-22T12:57:00.000+01:002019-04-22T13:00:07.702+01:00Trace, Laplacian, the Heat equation, divergence theoremThe aim of this article is to help build an intuition for the trace of a matrix, "the sum of the elements on the diagonal" -- the basic idea is that the trace is an "average" of some sort, an average of the action of an operator or a quadratic form. We'll make this idea clearer with an example from classical physics: the heat equation.<br /><br /><hr /><br />Consider an $n$-dimensional space with some temperature distribution $T(\vec{x},t)$. 
We wish to set up a differential equation for this function.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-WDr_mgo-qEg/XL2NKs0J8iI/AAAAAAAAFfc/RQlvQLKSZDklWwkAOd7jkVaq-XXcwCzcACLcBGAs/s1600/lawofcooling.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="177" data-original-width="361" height="156" src="https://4.bp.blogspot.com/-WDr_mgo-qEg/XL2NKs0J8iI/AAAAAAAAFfc/RQlvQLKSZDklWwkAOd7jkVaq-XXcwCzcACLcBGAs/s320/lawofcooling.png" width="320" /></a></div>In the case that $n = 1$, this differential equation is exceedingly easy to write down, considering the difference $(T(x+dx)-T(x))-(T(x)-T(x-dx))$ as the double-derivative upon division by $dx^2$. More rigorously, what we're doing here is applying a <b>localised version of the fundamental theorem of calculus</b>. I.e. we're writing down:<br /><br />$$\begin{align}<br />\lim_{\Delta x \to 0} \frac{1}{\Delta x}(T'(x + \Delta x) - T'(x)) &= \lim_{\Delta x \to 0} \frac{1}{{\Delta x}}\int_x^{x + \Delta x} {T''(s)\,ds} \\<br />& = T''(x)<br />\end{align}<br />$$<br />More generally, we may consider the $n$-dimensional case.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-HZE4Or8E8tU/XL2YeooEtuI/AAAAAAAAFfw/akx8XXDjp5clCqNjy4LzSmkFV3Fk0-KzACLcBGAs/s1600/laplace.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="522" data-original-width="605" height="276" src="https://1.bp.blogspot.com/-HZE4Or8E8tU/XL2YeooEtuI/AAAAAAAAFfw/akx8XXDjp5clCqNjy4LzSmkFV3Fk0-KzACLcBGAs/s320/laplace.png" width="320" /></a></div>Analogously to before, one may try to look at temperature flows in each direction -- here, we have an <i>integral</i>, done on the boundary of an infinitesimal region $V$ (this symbol will also represent the volume of the region):<br /><br />$$ \frac{{\partial T}}{{\partial t}} = \lim_{V \to 0}
\frac{\alpha }{V}\int_{\partial V} {\hat u\,dS \cdot \vec \nabla T} $$<br />At this point, one may apply the divergence theorem, converting this to:<br /><br />$$\frac{{\partial T}}{{\partial t}} = \mathop {\lim }\limits_{V \to 0} \frac{\alpha }{V}\int\limits_V {\vec \nabla \cdot \vec \nabla T\;dV} = \alpha{\left| {\vec \nabla } \right|^2}T$$<br />In this sense, the divergence theorem is analogous to the fundamental theorem of calculus for manifolds with boundaries that are more than one-dimensional (see the bottom of the page for a link to a formalisation/an abstraction based on this analogy). But there are more ways to intuitively understand this. Note how the Laplacian is the trace of the Hessian matrix (note: we use $\vec{\nabla}^2$ to refer to the Hessian and $\left|\vec\nabla\right|^2$ to refer to the Laplacian):<br /><br />$${\left| {\vec \nabla } \right|^2}T = {\mathop{\rm tr}} \left({\vec{\nabla} ^2}T\right)$$<br />The trace of a matrix is fundamentally linked to some notion of <i>averaging</i> -- the simplest interpretation of this is that the trace is the sum of the eigenvalues, so that ${\mathop{\rm tr}} A/n$ is their mean.
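This averaging interpretation is easy to check numerically -- the following is a minimal numpy sketch of mine (not part of the original post); note the $1/n$ normalisation when averaging the quadratic form over random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2  # symmetric, so its eigenvalues are real

# The trace is the sum of the eigenvalues (so tr(A)/n is their mean)
assert np.isclose(np.trace(A), np.linalg.eigvalsh(A).sum())

# Averaging the quadratic form x^T A x over random unit vectors x
# gives tr(A)/n, the mean of the eigenvalues
x = rng.standard_normal((200000, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)
sphere_average = np.einsum('ij,jk,ik->i', x, A, x).mean()
print(np.isclose(sphere_average, np.trace(A) / n, atol=0.02))  # True
```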
But more relevant to our situation, it can be shown that the trace of a matrix is $n$ times the expected value of the quadratic form defined by the matrix on the unit sphere -- or on a general sphere $S$:<br /><br />$$\frac{{\mathop{\rm tr}} A}{n} = \frac{1}{S}\int_S {\frac{{\Delta {x^T}A\,\Delta x}}{{\Delta {x^T}\Delta x}}\,dS} $$<br />One may check that taking the limit as $\Delta x \to 0$, substituting the Hessian ${\vec{\nabla} ^2}T$ for $A$ and writing ${\vec{\nabla} ^2}T\,\Delta \vec x = \Delta \left( {\vec \nabla T} \right)$, one gets the original "average of directional derivatives" expression.<br /><br /><div class = "twn-furtherinsight">Can you interpret the other coefficients of the characteristic polynomial in terms of statistical ideas?</div><br /><hr /><br /><div><b>Further reading:</b></div><div><ul><li>Using the "infinitesimal region" idea to define divergence, curl and Laplacian rigorously: <a href="https://www.khanacademy.org/math/multivariable-calculus/greens-theorem-and-stokes-theorem/formal-definitions-of-divergence-and-curl/a/formal-definition-of-divergence-in-two-dimensions">Khan Academy</a></li><li>An abstraction based on the "analogy" between FTC, Divergence Theorem, Kelvin-Stokes Theorem, etc. <a href="https://en.wikipedia.org/wiki/Stokes%27_theorem">Stokes' theorem (Wikipedia)</a></li></ul></div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-73011940146266625212019-04-21T10:28:00.000+01:002019-04-22T13:20:48.472+01:00SVD, polar decomposition, normal matrices; a re-look at transposes and FTLABack in <b><a href="https://thewindingnumber.blogspot.com/2017/08/symmetric-matrices-null-row-space-dot-product.html">Null, row spaces, transpose, fundamental theorem of algebra</a></b>, we first introduced some hand-wavy intuition for the transpose and the orthogonality of the row space and the null space (and the following fundamental theorem of linear algebra).
Here, we solidify this intuition a bit more clearly.<br /><br />Consider the "symmetric collapse" discussed in the above article. Our study of the transformation relied specifically on looking at it in a specific basis -- <b>an <i>orthogonal </i>basis</b> -- comprised of the column space and the null space. In this basis, the transformation is a scaling on both axes. In the more general case of an asymmetric collapse -- in which we rotated our space before collapsing, we looked at a basis formed by the row space and the null space -- the basis got <b>rotated and scaled</b> into the new basis, that was the column space and an arbitrary other vector (that could be perpendicular to the column space).<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-aYNoceHiRtc/XLtjjyOB04I/AAAAAAAAFeQ/LzZjyHchbq0ABijtIg5WFdvCc9SwvVU0ACLcBGAs/s1600/collapse.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="520" data-original-width="615" height="270" src="https://3.bp.blogspot.com/-aYNoceHiRtc/XLtjjyOB04I/AAAAAAAAFeQ/LzZjyHchbq0ABijtIg5WFdvCc9SwvVU0ACLcBGAs/s320/collapse.png" width="320" /></a></div>A sensible question to ask is if any transformation can be written in this fashion -- as a transformation of an orthogonal basis into another orthogonal basis. Analogous to how an <b>eigenvalue decomposition of a matrix writes it as scalings in some basis</b>, we're looking to represent the matrix as a <b>spiral (i.e. a scaling combined with a rotation) in some basis</b>. But let's stick with the first formulation of our question -- for any linear transformation $A: \mathbb{R}^n \to \mathbb{R}^m$, can we find an orthogonal basis on $\mathbb{R}^n$ that is mapped to an orthogonal basis $\mathbb{R}^m$?<br /><br />One could, e.g. consider the images under $A$ of the angle of each orthonormal basis in $\mathbb{R}^2$ (i.e. 
look at the function $AU(\theta)\vec{e}_1 \cdot AU(\theta)\vec {e}_2$ for varying $\theta$ where $U(\theta)$ is the rotation matrix by angle $\theta$) and apply the intermediate value theorem, etc. And such a proof could in principle be extended to $\mathbb{R}^n$.<br /><br />(See <a href="http://www.ams.org/publicoutreach/feature-column/fcarc-svd">here</a> for a thorough explanation.)<br /><br />Here's another, more insightful way you might come to prove this -- we've been visualising linear transformations so far by looking at the image of the basis vectors, but another way to visualise these transformations is by looking at the <b>image of the unit circle</b> under the transformation.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/e/e9/Singular_value_decomposition.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="330" data-original-width="400" height="264" src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Singular_value_decomposition.gif" width="320" /></a></div><br /><div class="twn-furtherinsight">Why does this make sense? Well, one can find an ellipse passing through any two vectors centered at the origin. Elaborate on this argument. Is the resulting ellipse unique? (Hint: no, unless you mark points on the circumference)</div><br />Specifically, consider the <b>axes of the image ellipse </b>$\sigma_1 u_1$, $\sigma_2 u_2$ where $u_1$, $u_2$ are unit vectors. In the original unit circle, any pair of orthogonal vectors on the circle can be axes, so consider the <b>pre-image of the axes</b> of the image ellipse $v_1$, $v_2$. So we have:<br />$$ A v_1 = \sigma_1 u_1\\<br />A v_2 = \sigma_2 u_2 $$<br />Or in general:<br />$$AV = U\Sigma\\<br />A = U\Sigma V^*$$<br />Where $\Sigma$ is diagonal with non-negative entries, while $U$ and $V$ are orthogonal/unitary.
This is called the <b>Singular-Value Decomposition (SVD)</b> of $A$. <br /><br />In a sense, one can view this as an alternative to the eigen-decomposition. In the eigendecomposition, one looks for a <i>single</i> basis in which the transformation is a scaling. In the singular value decomposition, one looks at scaling and then "re-interpreting" in another basis, but requires that the bases be orthogonal, and that the diagonal matrix be positive and real-valued.<br /><br />The diagonal entries of $\Sigma$ are called the <b>singular values </b>of $A$; the columns of $U$ and $V$ are the <b>left-singular vectors</b> and the <b>right-singular vectors</b> of $A$ respectively.<br /><br />(<b>Exercise: </b>You know that $\Sigma$ is the scaling of the orthogonal basis, i.e. of the right-singular basis. Convince yourself that the rotation of the basis is given by $UV^*$.)<br /><br />This gives us a much better intuition for the transpose. The SVD of $A^*$ is clearly:<br /><br />$$A^* = V\Sigma U^*$$<br />I.e. the transpose has precisely the opposite rotational effect as $A$ and the same scaling. This is as opposed to the inverse matrix, which has both the opposite rotational and scaling effect as the matrix. For a <b>rotation matrix </b>(or generally an orthogonal matrix $A^*A=I$), the transpose equals the inverse, analogous to how the conjugate equals the inverse for a unit complex number $\bar{z}z=1$. A positive-semidefinite Hermitian matrix ($A=A^*$ with non-negative eigenvalues), by contrast, is one which is irrotational, $UV^*=1$, i.e. one for which the SVD equals the eigendecomposition.<br /><br /><div class="twn-furtherinsight">Use the SVD to get some intuition for transpose identities like $(AB)^*=B^*A^*$</div><br /><hr /><br />It's instructive to consider the SVD in the case of our original motivating example -- an asymmetric matrix representing a collapse of $\mathbb{R}^2$ into a line. What are the singular bases of this transformation?
Well, it maps the orthogonal basis formed by <b>row space and the null space</b> into the orthogonal basis formed by the <b>column space and the left-nullspace</b> (i.e. the orthogonal complement of the column space).<br /><br /><div class="twn-furtherinsight">Think about its transpose.</div><br />Arranging the singular values from largest to smallest, we then have the following relation between the SVD and the items in the fundamental theorem of linear algebra, where $n$ is the dimension of the domain, $m$ is the dimension of the codomain, and $r$ is the dimension of the image/column space:<br /><ul><li>The last $n - r$ <b>singular values are zeroes</b>. </li><li>The first $r$ <b>singular values are positive</b>.</li><li>The last $n - r$ <b>right-singular vectors</b> span the <b>null space</b>.</li><li>The first $r$ <b>right-singular vectors</b> span the <b>row space</b>.</li><li>The last $m-r$ <b>left-singular vectors </b>span the <b>left-null space</b>.</li><li>The first $r$ <b>left-singular vectors</b> span the <b>column space</b>.</li></ul>Note that the terms <b>kernel</b>, <b>coimage</b>, <b>cokernel</b> and <b>image</b> are also used for the <b>null space</b>, <b>row space</b>, <b>left-null space</b> and <b>column space</b>, sometimes in a more general setting.<br /><br />This is the full form of the <b>Fundamental Theorem of Linear Algebra</b>.<br /><br /><div class="twn-furtherinsight">Spend some time thinking about the SVD of non-square matrices, relating them to square collapse matrices. 
Think about their transposes.</div><br /><div class="twn-exercises">Show that the right-singular vectors $V$ of a matrix $A$ are given by the eigenvectors of $A^*A$ (hint: start by considering the two-dimensional case, relating the right-singular vectors to a maximisation/minimisation problem, and extend the idea to more dimensions).<br /><br />From this, it is clear that the left-singular vectors $U$ (which are the right-singular vectors of $A^*$) are given by the eigenvectors of $AA^*$ and the singular values are the square roots of the eigenvalues of $A^*A$. Well, some of them are (which ones?).</div><br />We observed earlier that the rotational effect of the matrix $A = U \Sigma V^*$ can be given by $UV^*$. The scaling effect is given by $\Sigma$ on the basis of $V$. Hence we can write:<br /><br />$$A = (UV^*)(V\Sigma V^*)$$<br />Letting $W=UV^*$ and $R = V\Sigma V^*$, this gives us a representation of $A$ as:<br /><br />$$A=WR$$<br />Where $W$ is orthogonal and $R$ is positive-semidefinite. This is known as the <b>right-polar decomposition</b> of $A$. Analogously, one may consider the <b>left-polar decomposition</b>:<br /><br />$$A = (U\Sigma U^*)(UV^*) = R'W$$<br /><div class="twn-furtherinsight">Interpret the above decomposition like we did the right decomposition. Note how $R' = WRW^*$, and how a right-polar decomposition leads to a left-polar decomposition of the transpose, and vice versa.</div><br /><div class="twn-furtherinsight">When is the polar decomposition unique? Compare the situation to the polar decomposition of complex numbers.</div><br />Here's a question: when is $R=R'$? One way of putting it is that $WRW^* = R$, i.e. $R$ commutes with $W$. This is a bit difficult to work with. Instead, one may show that the matrices $R$ and $R'$ can be given by $(A^*A)^{1/2}$ and $(AA^*)^{1/2}$ respectively (prove it!) -- noting that a principal square root can uniquely be defined for positive semidefinite matrices.
So $R=R'\Leftrightarrow A^*A=AA^*$. These are known as <b>normal matrices</b>. Below is an exercise that provides more intuition into the behaviour of normal matrices.<br /><br /><hr /><br />What does it mean for a matrix to commute with its transpose? Earlier, when discussing commuting matrices, we referred to it as "matrices that do not disturb each other" -- specifically, they preserve each other's (generalised) eigenspaces. In the case of commuting with its transpose, it's easy to show that this means having the <i>same</i> eigenvectors (prove this!).<br /><br />Here's a fact about the eigenvectors of the Hermitian transpose: the eigenvector of $A$ corresponding to the eigenvalue $\lambda$ is orthogonal to all eigenvectors of $A^*$ corresponding to any eigenvalue other than $\lambda^*$ (prove this!).<br /><br />From these two facts, it follows that the following are equivalent (write down the proofs clearly!):<br /><br /><ul><li>$A$ is <b>normal</b>.</li><li>$A$ commutes with $A^*$.</li><li>$R$ commutes with $W$.</li><li>$A$ is <b>unitarily diagonalisable</b>.</li></ul><div>This is known as the <b>spectral theorem.</b></div><hr /><br />(Anyone ever thought about how weird the word "normal" is? Sometimes, it means "perpendicular", sometimes -- as in "orthonormal" and "normalisation" -- it means "unit length", because "norm". What does it mean in this context? It's probably just referring to its eigenvectors being normal/orthogonal, but I like to think it's referring to the fact that the two alternative Hermitian-valued "norms" of the matrix, $A^*A$ and $AA^*$, are equal, so the matrix has a single "norm".)<br /><div><br /></div>Fundamentally, normal matrices are "analogous" to complex numbers, or they "generalise" complex numbers, in the sense that each of its eigenvalues acts as a complex number, transforming, acting as spirals on each of the orthogonal eigenvectors within its own copy of $\mathbb{C}$. 
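To make the polar decomposition concrete, here is a small numpy sketch (my illustration, not from the original post) computing $W$, $R$ and $R'$ from the SVD, and checking that the two "magnitudes" coincide for a normal (here, symmetric) matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

# SVD A = U S V*, from which W = UV*, R = V S V*, R' = U S U*
U, S, Vh = np.linalg.svd(A)
W = U @ Vh                    # orthogonal part ("argument")
R = Vh.T @ np.diag(S) @ Vh    # right magnitude, (A^T A)^{1/2}
Rp = U @ np.diag(S) @ U.T     # left magnitude,  (A A^T)^{1/2}

print(np.allclose(A, W @ R))   # right-polar decomposition: True
print(np.allclose(A, Rp @ W))  # left-polar decomposition: True

# A generic A is not normal, so R != R'; symmetrise it to get a normal matrix
N = A + A.T
Un, Sn, Vhn = np.linalg.svd(N)
print(np.allclose(Vhn.T @ np.diag(Sn) @ Vhn,
                  Un @ np.diag(Sn) @ Un.T))  # True: R = R' for normal N
```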
One may construct the following table of analogies between normal matrices and complex numbers:<br /><br /><div class="twn-analogies"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 18px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 18px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-uys7{border-color:inherit;text-align:center} .tg .tg-l04w{font-weight:bold;background-color:#efefef;border-color:inherit;text-align:center} </style><br /><table class="tg"><tbody><tr> <th class="tg-l04w">Complex numbers</th> <th class="tg-l04w">Normal matrices</th> </tr><tr> <td class="tg-uys7">Zero (sorta)</td> <td class="tg-uys7">Singular</td> </tr><tr> <td class="tg-uys7">Non-zero</td> <td class="tg-uys7">Invertible</td> </tr><tr> <td class="tg-uys7">Real</td> <td class="tg-uys7">Hermitian</td> </tr><tr> <td class="tg-uys7">Positive real</td> <td class="tg-uys7">Positive-definite</td> </tr><tr> <td class="tg-c3ow">Nonnegative real</td> <td class="tg-c3ow">Positive-semidefinite</td> </tr><tr> <td class="tg-uys7">Imaginary</td> <td class="tg-uys7">Anti-Hermitian</td> </tr><tr> <td class="tg-c3ow">Unit</td> <td class="tg-c3ow">Unitary</td> </tr><tr> <td class="tg-c3ow">Conjugate</td> <td class="tg-c3ow">Hermitian transpose</td> </tr><tr> <td class="tg-c3ow">Norm-squared</td> <td class="tg-c3ow">Gram matrix $A^*A$</td> </tr><tr> <td class="tg-c3ow">Magnitude</td> <td class="tg-c3ow">$(A^*A)^{1/2}=R=V\Sigma V^*$</td> </tr><tr> <td class="tg-c3ow">Argument</td> <td class="tg-c3ow">$AR^{-1}=W=UV^*$</td> </tr><tr> <td class="tg-c3ow">Real Part</td> <td class="tg-c3ow">$\frac12(A+A^*)$</td> </tr><tr> <td class="tg-c3ow">Imaginary Part times 
$i$</td> <td class="tg-c3ow">$\frac12(A-A^*)$</td> </tr></tbody></table></div><br /><b>Exercise:</b> an <em>EP matrix</em> or range-Hermitian matrix is a weakened version of a Hermitian matrix -- the row space of the matrix equals the column space. Although this was a bit hard to understand in <a href="https://thewindingnumber.blogspot.com/2017/08/symmetric-matrices-null-row-space-dot-product.html">the first article</a> and was only briefly mentioned towards the end, we now have the intuition to comprehend them. Explain why a matrix is range-Hermitian if and only if it is unitarily similar to a matrix of the form of a block matrix:<br /><br />$$\left[ {\begin{array}{*{20}{c}}C&0\\0&0\end{array}} \right]$$<br />Where $C$ is a non-singular square matrix and the zeroes are zero block matrices. This decomposition is called the <b>core-nilpotent decomposition</b>. Hence, show that being range-Hermitian is a weakened form of being normal.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-66479992669451609272019-04-09T19:12:00.000+01:002019-06-14T23:34:35.809+01:00Geometry, positive definiteness, and Sylvester's law of inertiaSomething I found absurdly dissatisfying when first studying linear algebra was the idea of a <em>positive-definite matrix</em> (or a nonnegative-definite one). Here are some explanations of the idea I found online:<br /><ul><li><b>it's a generalisation of a positive real number </b>-- correct, but why? In what sense? Sure, you could say "all the eigenvalues are positive real numbers in an orthogonal eigenbasis" -- but why is this really important? It just doesn't feel complete. 
This will pretty much be our second motivation, but with more backstory.</li><li><b>it keeps vectors "roughly" near where they started -- </b>this is by far the <i>worst</i> explanation I've found of the idea -- the condition that $x\cdot Ax > 0$ means that any vector remains within a right angle from where it started. Not only is this terrible because it seems absurdly arbitrary (to choose $\pi/2$ as our special angle), but it also fails to make clear why we're interested only in positive-definite <i>symmetric</i> matrices, analogous to positive real numbers. On the reals, certainly, there are plenty of transformations that achieve this with real vectors (rotations of less than $\pi/2$, for instance), but we don't care about them. <i>The most serious problem</i> with this explanation, though, is that it tries to reinterpret the condition $x^TAx>0$ in terms of the conventional Euclidean dot product, while the <em>whole point</em> is to look at generalised dot products, with bilinear forms other than the Euclidean one.</li></ul>In this article, I'll motivate the idea of a positive-definite matrix by considering "generalised geometries" and generalised inner products -- through a series of exercises (well, I'll try to keep them exercises, but maybe I won't be able to stop myself from answering them myself).<br /><br /><hr /><br /><b>"Definite geometries"</b><br /><br /><ol><li>First, we come up with a <b>definition of geometry </b>(no pun intended). Much of the linear algebra we've dealt with -- specifically the dot product -- was with Euclidean geometry in mind, and it's interesting to think about what kind of linear algebra we would come up with if we considered other sorts of geometries.</li><ol type="a"><li>The first image in your head when hearing of geometry is that of a <i>space</i>, or <i>manifold -- </i>perhaps $\mathbb{R}^n$. But a space is just a set of points. 
Most of the geometry you've dealt with concerns properties like <i>length</i> and <i>angles</i> and <i>shapes</i>. These properties don't depend on e.g. where you place an object on the manifold, i.e. translations -- as well as some other transformations. <b>Can you characterise all the linear transformations under which geometric properties (like the ones we mentioned) are invariant?</b> -- i.e. the symmetries of Euclidean geometry.</li><li>These transformations are known as "rigid transformations" and form a group (prove it if you want, but come on -- they're <i>symmetries</i>, of course they form a group). Can you identify this group (discard translations if you prefer)?</li><li>So Euclidean geometry can be defined as <b>the symmetries of $\mathbb{R}^n$ under the group $O(n)$ acting on it</b>. It is then natural to generally define a geometry as <b>the symmetries of a manifold under a group acting on it</b>. </li></ol><li>Now that we have generalised our definition of a geometry, let's specialise to a specific sort of geometry somewhat "analogous" to the traditional orthogonal-group (i.e. Euclidean) geometry. We will let our space be $\mathbb{R}^n$ or $\mathbb{C}^n$ but experiment with our group. By <i>analogous</i>, we mean not that the geometries are identical, but at least that the same notions -- like lengths and angles and shapes -- can be defined for them, that the ideas in the geometry aren't completely foreign to us.</li><ol type="a"><li>Much like the Orthogonal group can be defined by the invariance of the identity bilinear form $\mathrm{diag}(1,1,1,1)$ under a "<b>bilinear form similarity transformation</b>", more commonly known as a <b>congruence</b> (i.e. $A^T I A = I$), we can consider groups that are defined by some general $A^T M A = M$. </li><li>Obviously, not all matrix groups can be written this way -- for example, any proper subgroup of $O(n)$ cannot be.
But groups of this form define in some sense geometries not very different from Euclidean geometry -- why? Because the form preserved by the Orthogonal group -- the identity form -- is the <i>dot product</i> on Euclidean geometry. <b>Preservation of the identity form is equivalent to the preservation of the Euclidean dot product</b> -- prove this -- which also means lengths and angles are preserved. </li><li>As any dot product is necessarily a bilinear form, it can be represented by a bilinear form $M$ called the <b>metric</b> as $v^T M w$, and its preservation is equivalent to the preservation of $M$ under bilinear form conjugation, i.e. $A^T M A = M$ -- prove this! (the proofs are absurdly trivial).</li><li>Examples of such groups include: the "<b>indefinite orthogonal group</b>", which you may know as the <b>Lorentz group</b> $O(1,3)$ from special relativity, the group of linear transformations that preserve the bilinear form $\mathrm{diag}(-1,1,1,1)$ (called the Minkowski metric). Indeed, Minkowskian geometry has notions of length (the spacetime interval) and angles (some combination of rapidity and angles).</li></ol><li>Next, we are interested in thinking about when two geometries are "basically the same".</li><ol type="a"><li>Try to write down some simple two-dimensional geometries -- consider e.g. $M = \left[ {\begin{array}{*{20}{c}}0&{ - 1}\\1&0\end{array}} \right]$. <b>Study some of its properties. </b>(this is "symplectic geometry", by the way) Do you think this is the "same" as Euclidean geometry?</li><li>Think about what kind of properties you looked at to establish the answer as "No". Use them to come up with a simple and snappy definition of two geometries being the same, or isomorphic. </li><li>You should have come to the conclusion -- <b>two geometries on the same manifold are isomorphic iff their groups are isomorphic</b> (If you read this before figuring out the answer for yourself, bleach your brain and try again.) 
(For legal reasons, that's a joke.)</li><li>Let's study some examples of such an isomorphism. The trivial case is where the groups are equal, e.g. if $M = kN$ or $M = N^T$ (prove these). What about some <b>non-trivial isomorphism</b>? Here's an idea: groups defined by <b>congruent metrics</b> are isomorphic. I.e. if $M=P^TNP$ for some change of basis matrix $P$, the groups $\{A^TMA=M\}$ and $\{A^TNA=N\}$ are isomorphic. Prove this (again, the proof is trivial) -- you will see that the isomorphism is a similarity relation $A \leftrightarrow P^{-1}AP$.</li><li><div class="twn-beg">Is this an <i>iff</i> statement? If two bilinear form-preserving groups are isomorphic, is there a way to write them as $\{A^HMA=M\}$, $\{A^HNA=N\}$ such that $M$ and $N$ are congruent? I'm not sure. It would suffice to prove that all isomorphisms of a matrix group are similarity transformations. Perhaps this is implied by <a href="https://groupprops.subwiki.org/wiki/Isomorphic_iff_potentially_conjugate"><b>Isomorphic iff potentially conjugate</b></a>, but how do we know the conjugacy isn't in some weird group that $GL_{\mathbb{R}}(n)$ is homomorphic to?</div></li><li>One can also consider the case of <b>non-invertible</b> $P$ -- do we have an isomorphism between $M$-preservers and $N$-preservers if $M=P^TNP$ for non-invertible $P$? No? <b>What about a homomorphism</b>? In which direction?</li></ol><li>So which geometries are isomorphic to Euclidean geometry? What matrices are congruent to the identity form?</li><ol type="a"><li>No prizes for saying "metric tensors of the form $P^TP$ (or more generally $P^H P$)" for invertible $P$. These matrices -- those that are congruent to the identity matrix -- are called <b>positive-definite </b>matrices. 
Along the lines of part f above, one can also consider the case of non-invertible $P$ -- these are called <b>positive-semidefinite</b> or <b>nonnegative-definite</b> matrices, and Euclidean geometry is homomorphic to positive-semidefinite geometries.</li><li><div class="twn-pitfall">To be fair, these <em>aren't</em> the only geometries isomorphic to Euclidean geometry -- remember the trivial isomorphisms? So for instance, negative-definite geometries are also isomorphic to Euclidean geometry.</div></li><li>A neat way to visualise these "isomorphisms and homomorphisms of geometries" is by looking at the contours of the geometries, i.e. the set $x^TMx = C$. Positive definite (and negative definite) matrices correspond to <b>elliptical contours</b> (while positive semidefinite matrices correspond to the degenerate cases -- <b>a degenerate ellipse has all the symmetries of a non-degenerate one</b>, but not vice versa), which can easily be stretched into a Euclidean circle. Other matrices, on the other hand, may have hyperbolic contours which cannot be similarly deformed into a Euclidean circle.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/-lp2lNn8m3bQ/XK2tBLfnMzI/AAAAAAAAFdc/y6RJTagcuak0TbswEPhYhvW5kEvSk3A0gCLcBGAs/s1600/ellipses.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="414" data-original-width="772" height="213" src="https://2.bp.blogspot.com/-lp2lNn8m3bQ/XK2tBLfnMzI/AAAAAAAAFdc/y6RJTagcuak0TbswEPhYhvW5kEvSk3A0gCLcBGAs/s400/ellipses.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">any symmetry of the ellipse/circle can be mapped homomorphically to a symmetry of the straight lines, but not vice versa.</td></tr></tbody></table><div class="separator" style="clear: both; text-align: 
center;"></div></li><li>Recall that the equation of an ellipse takes the form $\mathop \sum \limits_{i = 1}^n {a_i}{x_i}^2 = C$ for positive $a_i$. So our interpretation above is equivalent to stating that a matrix of the form $P^T P$ or $P^H P$ has <b>positive</b> (for invertible $P$) or <b>non-negative</b> (degenerate ellipse) <b>real eigenvalues</b> (this should be pretty easy to prove).</li><li>More generally, any two congruent matrices have the same numbers of positive, negative and zero eigenvalues (called the positive, <b>negative and zero indices of inertia</b> respectively). This is known as <b>Sylvester's law of inertia</b> (prove it!), and shows that all symmetric (or Hermitian) matrices are <b>congruent to a diagonal matrix</b> with some number of 1's, -1's and 0's (and arrangement doesn't matter) -- see also the <a href="https://en.wikipedia.org/wiki/Metric_signature"><b>metric signature</b></a>. This gives us a condition to tell if <i>any two</i> matrices are congruent, or any two form-preserving geometries are isomorphic/homomorphic.</li></ol><li>Is it really true, though -- that any geometry with elliptical contours is isomorphic to Euclidean geometry? Come up with a counter-example (and how did you come up with it?)</li><ol type="a"><li>You might consider, e.g. $M=\left[\begin{array}{*{20}{c}}1&{ - 1}\\1&1\end{array}\right]$ -- this produces <i>exactly the same contours</i> as Euclidean geometry -- same unit circle, same everything. But it's not symmetric, and <b>all positive-definite matrices are symmetric/Hermitian </b>(proof is trivial). In fact, <b>any $M$ produces the same contours as $\frac12 (M+M^T)$ -- its "symmetric part" </b>(why?). What's going on? 
Does the norm (quadratic form) <i>not</i> completely define the dot product (bilinear form)?</li><li>If you think about this for a while, you might get an idea of what's going on -- the <b>symmetric/Hermitian part of a matrix defines the contours on the <i>real part</i> of the vector space</b>, but the antisymmetric/anti-Hermitian part begins to matter in $\mathbb{C}^n$. <b>The contours of the quadratic form in $\mathbb{C}^n$ completely determine the dot product</b>, i.e. if $v^HMv=v^HNv$ for all complex vectors $v$, then $v^HMw=v^HNw$ for all complex vectors $v$ and $w$, i.e. $M=N$. The proof is trivial.</li></ol><li>Next, let's consider some properties and alternate characterisations of positive-definite matrices.</li><ol type="a"><li>From the ellipse depiction, it's reasonable to wonder if a matrix is positive-definite if the norm it induces is positive for all non-zero real vectors, i.e. $v^TMv>0$ (certainly the forward implication -- only if -- is clear, from the $P^TP$ factorisation). As it turns out, though, there are other matrices -- such as a rotation by less than $\pi/2$ -- that also satisfy this condition. It is certainly clear that the condition $v^TMv>0$ <i>combined with</i> $M$ being symmetric implies a matrix is positive-definite -- by completing the square on a symmetric bilinear form -- but any matrix $M$ for which $(M+M^T)/2$ is positive-definite also satisfies $v^TMv>0$ by the same argument (if this seems anything but completely obvious, think about the corresponding quadratic expressions).</li><li>The reason for this annoyance is that in $\mathbb{R}^n$, you have matrices that are pure rotations, with no real eigenvectors, so the non-positivity of the eigenvalues doesn't have to matter. On the other hand, if you extend our domain to $\mathbb{C}^n$ -- i.e. $v^HMv>0$ for all complex vectors $v$, all the eigenvalues are "accounted for". 
Find a way to write this idea down precisely.</li><li>Here's another way to think about it: if $v^HMv\in\mathbb{R}$ for all complex vectors $v$, the matrix $M$ is Hermitian. The proof is basically just algebraic simplification, considering the imaginary part of $(v+iw)^HM(v+iw)$, which involves $v^HMw-w^HMv$ -- the "symplectic part" of the general complex dot product. See <a href="https://math.stackexchange.com/a/2843380/"><b>this math stackexchange answer</b></a> for a fuller explanation.</li></ol></ol>To be completely honest, I'm being disingenuous in claiming that this is "the" motivation for positive-definite matrices. There is a completely independent motivation arising from optimisation in multivariable calculus (the second-derivative/Hessian test), a completely independent motivation arising from systems of differential equations, a completely independent motivation arising from covariance matrices, and so on.<br /><br />Perhaps a way of thinking of this is that a positive-definite matrix is simply a normal matrix with positive eigenvalues, and much of this article is really a justification for why positive-definite metric tensors are (un-)interesting.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-15900026192677627112019-04-06T17:29:00.000+01:002019-05-30T14:07:19.266+01:00Introduction to Lie groupsWhen you first learned about cyclic groups, the picture in your head was that of the unit circle (complex numbers with norm one). Sure, the unit circle isn't actually a cyclic group, but it really <i>feels</i> like one. When I <a href="https://thewindingnumber.blogspot.com/2018/12/intuition-analogies-and-abstraction.html">first motivate group theory</a>, I even base the motivation on the close similarities between the circle group and the modular addition group $\mathbb{Z}/p\mathbb{Z}$. 
Indeed, the circle group is just the group of real numbers mod $2\pi$.<br /><br />The solution to this problem can be seen from the quickest proof that the unit circle isn't cyclic -- the fact that it isn't countable (while the integers are). Well, what if we <b>discard the centrality of the integers to our definition of a cyclic group and admit real powers on groups</b>?<br /><br />Ok, but how? It's easy to construct integer powers on an arbitrary group -- in terms of repeated application of the group operation (which defines natural powers) and inverses. But the real numbers are a wholly different beast -- they require a nice and connected <b>"smooth" structure, a geometry</b> on the group. We can certainly visualise this geometry on the unit circle or the positive real numbers (which is also "real-power cyclic"), but it's interesting to think about how one might introduce such a geometry on other groups (groups that admit such a geometry are called <b>Lie groups</b>).<br /><br />Well, if you think for a while, you might get the idea of defining a group via a real-number <i>parameterisation</i> $\mathbb{R}\to G$. The unit circle can be parameterised as $g(\theta)=\exp i\theta$, the positive real numbers can be parameterised as $g(\xi)=\exp\xi$, etc. This parameterisation would then give $\exp rt =(\exp t)^r$ for real powers $r$ of elements in the group.<br /><br />But here's the thing -- we could have introduced any sort of ugly and terrible parameterisation for our group. We knew how the parameterisation <i>should</i> look for the unit circle, but we could just as well have created something definitely not smooth -- like mapping $\pi i$ to $-1$ and mapping $(\pi + \varepsilon)i$ to $i$ (sorry, you can't use too much dramatic hyperbole on the unit circle... 
fine, let's map it to 30 gazillion, which isn't on the unit circle, but whatever), and the real-power would look ridiculous, not at all what we want, and we may not even have a "real-power cyclic" structure.<br /><br />What exactly do we <i>want</i> from our parameterisation?<br /><br /><hr /><br />Let's think about what a generator looks like with real powers on the unit circle. Well, really any non-identity element $e^{i\theta_0}$ can generate the group (take it to the power of $\theta/\theta_0$), but if we want to emulate the case of the integers under addition $\{...a^{-2},a^{-1},1,a,a^2,a^3,...\}$, we'd like to call the element really close to 1 the generator. Well, there's no element that's really close to 1, so we're talking about some kind of an infinitesimal thing. This is called an <b>infinitesimal generator</b> of the Lie group.<br /><br />In the first-order approximation, such an element would be of the form $1+i\varepsilon$. By making $\varepsilon$ sufficiently small, the element will be sufficiently close to being "on the unit circle", with an arc length of $\varepsilon$ away from the identity, and its $r$-th real power will have an arc length of $r\varepsilon$ away from the identity. So to generate the element with parameter $\theta$, we need to take $1+i\varepsilon$ to a real power of $r=\theta/\varepsilon$. I.e.<br /><br />$$g(\theta) = \lim\limits_{\varepsilon\to 0} (1+i\varepsilon)^{\theta/\varepsilon}=\lim\limits_{r\to\infty} \left(1+\frac{i\theta}{r}\right)^{r}=\exp i\theta$$<br />If you were studying calculus for the first time, this is really solid intuition for Euler's formula. 
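This limit is easy to sanity-check numerically -- a quick plain-Python sketch (the values of $\theta$ and $r$ are chosen arbitrarily for illustration):

```python
import cmath

theta = 2.0
r = 10**6
# (1 + i*theta/r)^r should approach exp(i*theta) as r grows
approx = (1 + 1j * theta / r) ** r
exact = cmath.exp(1j * theta)
print(abs(approx - exact))  # a small number, shrinking as r grows
```

Increasing $r$ by a factor of 10 shrinks the error by roughly the same factor, consistent with the first-order picture above.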
Conversely, you can go in the other direction and say it's solid intuition for the compound-interest limit.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-tNdLgOkHWM0/XKj6m_2NmYI/AAAAAAAAFcs/9G_0nKihIV0aobMY4vaPBvAaXB54X9w-ACLcBGAs/s1600/llie.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="708" data-original-width="931" height="303" src="https://1.bp.blogspot.com/-tNdLgOkHWM0/XKj6m_2NmYI/AAAAAAAAFcs/9G_0nKihIV0aobMY4vaPBvAaXB54X9w-ACLcBGAs/s400/llie.png" width="400" /></a></div><br />$$\lim\limits_{\varepsilon\to 0} (1+\varepsilon\theta t)^{1/\varepsilon} = \exp(t\theta)$$<br />But here, we can view it in a more general light, and say this is the definition of the <b>exponential map </b>to a Lie group. What exactly is it a map <i>from</i>? I.e. what is the parameterising space? Well, as you can see, it maps an element $i\theta$ to the group parameterised by $\theta$ -- what is $i\theta$? It is<br /><br />$$\lim\limits_{\varepsilon \to 0} \frac{{(1 + i\varepsilon \theta ) - 1}}{\varepsilon }$$<br />I.e. these are the elements that span the <i>tangent line</i> to the group at 1. In general, one may have more dimensions to this group, i.e. more parameters to put in the smooth parameterisation -- in this case we have:<br /><br />$$ g(\theta ) = \lim\limits_{\varepsilon \to 0} {\left( {1 + \varepsilon ({t_1}{\theta _1} + \ldots {t_n}{\theta _n})} \right)^{1/\varepsilon }} = \exp \vec \theta $$<br />where $\vec\theta \in V$, which is a <i>vector space</i> with basis $\langle t_1 \dots t_n \rangle$ -- the tangent space to the group at the identity. This vector space is called the <b>Lie algebra</b> of the Lie group.<br /><br />Take a moment to appreciate the significance of this -- smoothness tells us (sorta) that a function or structure can be determined by the values of all its derivatives at a point. 
But when you add the group structure -- when you require an exponential structure for the parameterisation, i.e. <b>(1)</b> $g(\theta_1+\theta_2) = g(\theta_1)g(\theta_2)$; <b>(2)</b> $g(r\theta)=g(\theta)^r$; <b>(3)</b> $g(0)=1$ -- just the <i>first</i> derivative, the tangent plane, determines the entire parameterisation. This is precisely analogous to how a smooth function known to have an exponential structure $e^{tx}$ can be determined from its first derivative alone. The structure of a Lie group is "fundamentally exponential".<br /><br /><hr /><br />Here's another way to see how the additivity-multiplicativity condition allows the first derivative to determine the entire parameterisation. The Taylor series of the parameterisation is given by:<br /><br />$$g(\theta)=\sum\limits_{k=0}^\infty \frac{g^{(k)}(0)}{k!}\theta^k$$<br />Meanwhile the exponential map is:<br /><br />$$\exp \left(\theta g'(0)\right) =\sum\limits_{k=0}^\infty \frac{\left(\theta g'(0)\right)^k}{k!}$$<br />So a sufficient condition for the two to be equal is, for each $n$:<br /><br />$$g^{(n)}(0)=g'(0)^n$$<br />This is something that is true for exponential functions, of course, but what's the condition for it to be true in general? Writing both sides in limit form (as $h \to 0$) and using the Binomial theorem on the right,<br /><br />$$\frac{1}{h^n}\sum\limits_{k = 0}^n \binom{n}{k}{( - 1)^k}g\left( {(n - k)h} \right) = \frac{1}{h^n}\sum\limits_{k = 0}^n \binom{n}{k}{( - 1)^k}g{{(h)}^{n - k}}g{{(0)}^k} $$<br />Which is true since $g((n-k)h) = g(h)^{n-k}$ and $g(0)=1$.<br /><br /><hr /><br />(something to note: the "official" word for <em>real-power cyclic</em> is "one-parameter group" or "one-dimensional Lie group". Higher dimensional groups have more generators, i.e. 
more dimensions)<br /><br />Show, from the $(1+X/r)^r$ definition of the exponential map, that it can be given by the standard Taylor expansion:<br /><br />$$\exp X = 1 + X + \frac{X^2}{2!} + \ldots $$<br />You can't really assume the Binomial theorem (as it is only true on commutative rings, and the ring of $n$-dimensional matrices -- which is the ring that we embed Lie groups and their Lie algebras in -- isn't commutative), but perhaps a weaker result holds? What kind of elements still commute on general rings?<br /><br /><div class="twn-furtherinsight">The properties of the exponential map we discussed are in fact the properties of a homomorphism. Can you see why?</div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-74023499375655137012019-03-27T18:15:00.002+00:002019-04-12T19:06:52.898+01:00Invariant and generalised eigenspaces; Jordan normal formWe defined "eigenvectors" -- or really "eigenlines" -- in order to understand the behaviour of linear transformations as scalings across certain axes (which may be complex, and the scalings may be complex too). But simply thinking of eigenlines as 1-dimensional spaces that a transformation leaves invariant (the fancy phrase here is: "the 1-dimensional subspaces on which it is an <i>endomorphism</i>"), it is natural to wonder about higher-dimensional invariant spaces -- subspaces on which some transformation $A$ acts as an endomorphism.<br /><br />The problem is that any transformation has all sorts of useless invariant subspaces -- for instance any transformation $A:F^n\to F^m$ (where $m \le n$) has the entirety of $F^n$ as an invariant subspace (for any $F$), and rotations -- although fully described by their eigenvectors and eigenvalues -- have a bunch of real and complex planes as unnecessary invariant subspaces. 
And if $A$ has an eigenvalue with geometric multiplicity $>1$, there are an infinite number of useless invariant subspaces.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-2XqH5nEblpk/XJuRC6SnkdI/AAAAAAAAFaw/eW2lmNYa3hg0xZV_Cc8XXWZNJ3SPlaZjACLcBGAs/s1600/img-jM6pgi.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="524" data-original-width="856" height="195" src="https://1.bp.blogspot.com/-2XqH5nEblpk/XJuRC6SnkdI/AAAAAAAAFaw/eW2lmNYa3hg0xZV_Cc8XXWZNJ3SPlaZjACLcBGAs/s320/img-jM6pgi.jpg" width="320" /></a></div><br />Specifically, if the goal is to find useful representations of <b>defective matrices</b> (non-diagonalisable), invariant subspaces seem completely useless -- they certainly have no hope of giving us any sort of unique representation. Perhaps more on the point, our "eigenlines" have <b>corresponding eigenvalues</b> that tell us <i>how</i> the transformation behaves within an eigenline. Our invariant subspaces currently have nothing of the sort -- the transformation can have <i>any sort of behaviour</i> on the invariant subspace -- rotation, skewing/scaling, shearing, skewering -- and we'd have no idea. We need a convenient way to <b>write down the behaviour of the transformation on an invariant subspace</b>.<br /><br />Here's something we can start to think about: ordinary eigenvectors satisfy $(A-\lambda I)v=0$, which gives us a one-dimensional solution space. 
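That one-dimensional solution space is easy to see concretely -- a small sympy sketch (the matrix here is a hypothetical example, not one from the text):

```python
import sympy as sp

# a hypothetical matrix with eigenvalues 2 and 3
A = sp.Matrix([[2, 1],
               [0, 3]])

# solutions of (A - 2I)v = 0 form the eigenline for eigenvalue 2
eigenline = (A - 2 * sp.eye(2)).nullspace()
print(len(eigenline))  # 1 basis vector, i.e. a one-dimensional solution space
```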
In analogy with solutions to linear differential equations (linear homogeneous if you use the conventional terminology, but I reserve "affine" for linear non-homogeneous), an equation like<br /><br />$$(A-\lambda_1 I)(A-\lambda_2 I)v=0$$<br />(where ${\lambda _1},{\lambda _2}$ are both eigenvalues of $A$) would have a 2-dimensional solution space, etc.<br /><br /><div class="twn-pitfall">Note that when $\lambda_1, \lambda_2$ are not eigenvalues, we <em>don't</em> have a 2-dimensional solution space (what does the solution space look like then?). Why does it work with differential equations for any $\lambda_1, \lambda_2$? (hint: what do the eigenvalues look like?)</div><br />It's sensible to ask: are these solution spaces the same as our invariant subspaces? I.e. is every member of a $k$-dimensional invariant subspace a solution to an equation of the form<br /><br />$$(A - {\lambda _1}I)...(A - {\lambda _k}I)v = 0$$<br />for eigenvalues $\lambda_1,...\lambda_k$?<br /><br />The answer is <i>yes</i>. I encourage you to try and prove it for yourself -- it is instructive to first consider special cases: (i) a quarter-turn rotation in a plane, where indeed $(A-iI)(A+iI)=A^2+I=0$ is the minimal polynomial of $A$ (ii) more generally, $F^n$ is an invariant subspace for all isomorphisms, and indeed for all $v$ in this subspace (i.e. $\forall v \in F^n$), $p(A)v=0$ where $p$ is the characteristic polynomial of $A$ <b>by the Cayley-Hamilton theorem</b>.<br /><br />The key to proving that every invariant subspace is given by solutions to an equation of the form $(A - {\lambda _1}I)...(A - {\lambda _k}I)v = 0$ (and vice versa) lies in recognising that on any $k$-dimensional invariant subspace, $A$ acts as an endomorphism, and therefore the Cayley-Hamilton theorem applies to it, with a degree-$k$ characteristic polynomial.<br /><br /><div class="twn-furtherinsight">I encourage you to spend some time thinking about this -- try relating it to differential equations. 
Come up with another proof of the statement -- an inductive one. See if this results in a better intuition for the Cayley-Hamilton theorem.</div><br /><div class="twn-furtherinsight">Is it true that there are $2^n$ invariant subspaces of any transformation on an $n$-dimensional linear space? What about the identity transformation?</div><br /><hr /><br />Now, let's discard the invariant subspaces we don't want. We already know how to handle cases with distinct eigenvalues -- i.e. we have distinct eigenvalues in $\lambda_1...\lambda_k$ -- we just get an eigenvector for each eigenvalue. So we're really just concerned with subspaces of the form ${(A - \lambda I)^k}v = 0$. This is analogous to linear differential equations with repeated roots being weirder than ones with distinct roots.<br /><br />Note that we still know how to handle ${(A - \lambda I)^k}v = 0$-like equations when the algebraic multiplicity is accounted for by geometric multiplicity -- when this is the case, you can reduce the power from $k$ (by subtracting from it the geometric multiplicity). This nuance doesn't exist with differential equations, because distinct eigenvectors have distinct eigenvalues.<br /><br /><b>Vectors satisfying such an equation are called generalised eigenvectors</b> of order $k$, where $k$ is the minimum value for which the vector satisfies the equation, and the invariant subspaces formed by generalised eigenvectors of the same eigenvalue are called <b>generalised eigenspaces</b>. The dimension of the generalised eigenspace always equals the algebraic multiplicity, unlike the eigenspace, whose dimension equals the geometric multiplicity. <br /><br /><div class="twn-furtherinsight">Check this, and that $k$ is the difference between algebraic and geometric multiplicity.</div><br />What kind of transformations precisely do generalised eigenvectors with degree greater than 1 correspond to? Clearly, skews and rotations are out of the question. 
But some insight can be gained from looking at the nature of skews and rotations on a plane.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-TnN1ZoPfEog/XJvBG5uw8AI/AAAAAAAAFbQ/WDUumo2IU4wsp8YSSpr3JuXINgl-scXvACLcBGAs/s1600/shearskewrot2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1380" data-original-width="1020" height="640" src="https://2.bp.blogspot.com/-TnN1ZoPfEog/XJvBG5uw8AI/AAAAAAAAFbQ/WDUumo2IU4wsp8YSSpr3JuXINgl-scXvACLcBGAs/s640/shearskewrot2.png" width="472" /></a></div><br />In two dimensions, a characteristic polynomial with a positive discriminant yields a skew along some axis, a negative discriminant yields a rotation, and the case we're interested in -- the presence of repeated roots -- corresponds to the point "between" skews and rotations (speaking hand-wavily), shears.<br /><br />In a more general setting, if one has ${(A - \lambda I)^k}v_k = 0$ for a generalised eigenvector $v_k$ of degree $k$, then one can extract generalised eigenvectors of each degree lower:<br /><br />$$\begin{array}{l}{(A - \lambda I)^i}{(A - \lambda I)^{k - i}}{v_k} = 0<br />\\ \Rightarrow {v_i} = {(A - \lambda I)^{k - i}}{v_k}\end{array}$$<br />Implying that the generalised eigenvectors with the same eigenvalue (can) form a basis for the corresponding generalised eigenspace.<br /><br />$$\begin{array}{*{20}{r}}{(A - \lambda I){v_1} = 0 \Leftrightarrow A{v_1} = \lambda {v_1}}\\{{{(A - \lambda I)}^2}{v_2} = 0 \Leftarrow (A - \lambda I){v_2} = {c_1}{v_1} \Leftrightarrow A{v_2} = {c_1}{v_1} + \lambda {v_2}}\\{{{(A - \lambda I)}^3}{v_3} = 0 \Leftarrow (A - \lambda I){v_3} = {c_2}{v_2} \Leftrightarrow A{v_3} = {c_2}{v_2} + \lambda {v_3}}\\{ \vdots }\\{{{(A - \lambda I)}^k}{v_k} = 0 \Leftarrow (A - \lambda I){v_k} = {c_{k - 1}}{v_{k - 1}} \Leftrightarrow A{v_k} = {c_{k - 
1}}{v_{k - 1}} + \lambda {v_k}}\end{array}$$<br />This gives us a very clear picture of how these general "sheary" transformations look in the basis of the generalised eigenvectors -- there is a shear in each plane $\left\langle {{v_{k - 1}},{v_k}} \right\rangle$. Suitable scalings could of course be chosen to make all the $c_i=1$.<br /><br />It's hard to overstate the significance of this -- what we've just found is that <i>any</i> defective matrix can be decomposed into shears on each of its generalised eigenspaces. This <i>completely classifies</i> the diagonalisability of matrices. If it's defective, it's a shear.<br /><br /><div class="twn-furtherinsight">Draw out some of these transformations in three or more dimensions.</div><br /><div class="twn-furtherinsight">Notice the directions of the implication signs -- can we make them double-sided? What if we have some geometric multiplicity? What would our shears look like then? How many dimensions do you need to visualise this?</div><br /><div class="twn-furtherinsight">Think about why this characterisation of defective matrices makes sense. What effect does adding a 1 to the subdiagonal have on the determinant? Why? (hint: area of a parallelogram) What about the other coefficients of the characteristic polynomial? (hint: think of these in terms of traces of some matrix). So these matrices are precisely those which have the same characteristic polynomial as a diagonalisable matrix without actually being similar to one.</div><br /><hr /><br />Clearly, the generalised eigenspaces of a transformation are pairwise disjoint (i.e. intersect only at the origin). Since all eigenvalues are being considered, their sum is all of $F^n$. Thus the union of their bases forms a basis for $F^n$. 
This gives us a representation of the transformation $A$ in this basis.<br /><br />From the last section, it is clear that the effective transformation on a $k$-dimensional generalised eigenspace with eigenvalue $\lambda$ (called a "<b>Jordan block</b>") is given by:<br /><br />$$\left[ {\begin{array}{*{20}{c}}\lambda &0&0& \cdots &0\\1&\lambda &0& \cdots &0\\0&1&\lambda & \cdots &0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\0&0&0& \cdots &\lambda \end{array}} \right]$$<br />The <b>Jordan normal form</b> of a transformation is then the matrix formed by putting all the Jordan blocks along the diagonal -- i.e. the representation of $A$ in its generalised eigenbasis.<br /><br /><div class="twn-furtherinsight">Some writers define the Jordan normal form with the 1's on the <em>super</em>diagonal. It should be clear to you from the work we've done that this form is obtained by taking the basis vectors in reverse order (i.e. changing the basis to $\langle e_n,...,e_1 \rangle$). If this isn't clear to you, go back and work through the previous section once more. Loop.</div><br />(Stay tuned for the <a href="https://thewindingnumber.blogspot.com/2019/02/all-matrices-can-be-diagonalised.html">next article</a> to see how we can -- instead of defining generalised eigenspaces whose dimension is defined by the algebraic, rather than geometric multiplicity -- "force" algebraic multiplicity to equal geometric multiplicity, i.e. 
diagonalise any matrix -- with a ring extension.)Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-76177249319224574642019-02-06T17:25:00.000+00:002019-03-28T05:44:20.889+00:00All matrices can be diagonalised over R[X]/(X^n)This post follows from my answer to the math stackexchange question <a href="https://math.stackexchange.com/questions/472915/what-kind-of-matrices-are-non-diagonalizable/3097881" style="font-weight: bold;">What kind of matrices are non-diagonalisable?</a><br /><br /><hr />Non-diagonalisable 2 by 2 matrices can be diagonalised over the <a href="https://en.wikipedia.org/wiki/Dual_number">dual numbers</a> -- and the "weird cases" like the Galilean transformation are not fundamentally different from the nilpotent matrices.<br /><br />The intuition here is that the Galilean transformation is sort of a "boundary case" between real-diagonalisability (skews) and complex-diagonalisability (rotations) (which you can sort of think in terms of discriminants). In the case of the Galilean transformation $\left[\begin{array}{*{20}{c}}{1}&{v}\\{0}&{1}\end{array}\right]$, it's a small perturbation away from being diagonalisable, i.e. it sort of has "repeated eigenvectors" (you can visualise this with <a href="https://shadanan.github.io/MatVis/">MatVis</a>). So one may imagine that the two eigenvectors are only an "epsilon" away, where $\varepsilon$ is the unit dual satisfying $\varepsilon^2=0$ (called the "soul"). Indeed, its characteristic polynomial is:<br /><br />$$(\lambda-1)^2=0$$<br />Whose solutions among the dual numbers are $\lambda=1+k\varepsilon$ for real $k$. 
So one may "diagonalise" the Galilean transformation over the dual numbers as e.g.:<br /><br />$$\left[\begin{array}{*{20}{c}}{1}&{0}\\{0}&{1+v\varepsilon}\end{array}\right]$$<br />Granted, this is not unique -- this one is formed from the change-of-basis matrix $\left[\begin{array}{*{20}{c}}{1}&{1}\\{0}&{\varepsilon}\end{array}\right]$, but any vector of the form $(1,k\varepsilon)$ is a valid eigenvector. You could, if you like, consider this a canonical or "principal value" of the diagonalisation, and in general each diagonalisation corresponds to a limit you can take of real/complex-diagonalisable transformations. Another way of thinking about this is that there is an entire eigenspace spanned by $(1,0)$ and $(1,\varepsilon)$ in that little gap of multiplicity. In this sense, the geometric multiplicity is forced to be equal to the algebraic multiplicity*.<br /><br />Then a nilpotent matrix with characteristic polynomial $\lambda^2=0$ has solutions $\lambda=k\varepsilon$, and is simply diagonalised as:<br /><br />$$\left[\begin{array}{*{20}{c}}{0}&{0}\\{0}&{\varepsilon}\end{array}\right]$$<br />(Think about this.) Indeed, the resulting matrix has minimal polynomial $\lambda^2=0$, and the eigenvectors are as before.<br /><br /><hr /><br />What about higher dimensional matrices? Consider:<br /><br />$$\left[ {\begin{array}{*{20}{c}}0&v&0\\0&0&w\\0&0&0\end{array}} \right]$$<br />This is a nilpotent matrix $A$ satisfying $A^3=0$ (but not $A^2=0$). The characteristic polynomial is $\lambda^3=0$. Although $\varepsilon$ might seem like a sensible choice, it doesn't really do the trick -- if you try a diagonalisation of the form $\mathrm{diag}(0,v\varepsilon,w\varepsilon)$, it satisfies $A^2=0$ (minimal polynomial $\lambda^2$), which is wrong. 
Indeed, you won't be able to find three linearly independent eigenvectors to diagonalise the matrix this way -- they'll all take the form $(a+b\varepsilon,0,0)$.<br /><br />Instead, you need to consider a generalisation of the dual numbers, sometimes called (in computational mathematics and non-standard analysis) the "hyperdual numbers", with the soul satisfying $\varepsilon^n=0$. Then the diagonalisation takes for instance the form:<br /><br />$$\left[ {\begin{array}{*{20}{c}}0&0&0\\0&{v\varepsilon}&0\\0&0&{w\varepsilon}\end{array}} \right]$$<br /><hr /><br />*Over the reals and complexes, when one defines algebraic multiplicity (as "the multiplicity of the corresponding factor in the characteristic polynomial"), there is a single eigenvalue corresponding to that factor. This is of course no longer true over the hyperdual numbers, because they are not a field, and $ab=0$ no longer implies "$a=0$ or $b=0$".<br /><br />In general, if you want to prove things about these numbers, the way to formalise them is by constructing them as the quotient $\mathbb{R}[X]/(X^n)$, so you actually have something clear to work with.<br /><br />(Perhaps relevant: <a href="https://math.stackexchange.com/questions/46078/grassmann-numbers-as-eigenvalues-of-nilpotent-operators">Grassmann numbers as eigenvalues of nilpotent operators</a> -- the hyperdual numbers are not the same as the Grassmann numbers, and the algebra of the Grassmann numbers is definitely different from that of nilpotent and shear matrices, but go see if you can make sense of it.)<br /><br />Something important to note is that the diagonalisation is not of the form $D=P^{-1}AP$, as the eigenvector matrices are not invertible. However, it is still true that $PD=AP$ -- nonetheless, this limitation prevents this formalism from being any good for e.g. dealing with polynomial-ish differential equations with repeated roots, as far as I can see. 
The infinitesimal-perturbation/"take a limit" approach we talked about in <a href="https://thewindingnumber.blogspot.com/2018/03/repeated-roots-of-differential-equations.html"><b>Limiting Cases II: repeated roots of a differential equation</b></a> is still the right approach for that.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-79165447402416700882019-01-20T00:49:00.001+00:002019-03-28T05:46:33.198+00:00Pi and collisions (the 3blue1brown problem)Unless you've been living under a rock, you've probably heard of this problem -- perhaps from 3blue1brown (<a href="https://www.youtube.com/watch?v=HEfHFsfGXjs">link</a>) -- we have a wall (i.e. a thing with infinite mass), and two rocks of mass $m$ and some large multiple $Nm$. The smaller mass $m$ starts out stationary, while $Nm$ has some velocity $w$ in the direction towards the wall. The collisions are elastic. The question is to count the number of collisions as $N\to\infty$ (i.e. it approaches the rock you were living under) -- as Grant Sanderson (mysteriously, via some fancy thing he calls "digits of a number") tells us in the video, it approaches $\pi\sqrt{N}$.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-AgsqTm57nsM/XEOkNnITTmI/AAAAAAAAFVI/xWyKhdOI0WkGqi_SPGHT6hEF64FuwaQwgCLcBGAs/s1600/mnm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="731" height="208" src="https://4.bp.blogspot.com/-AgsqTm57nsM/XEOkNnITTmI/AAAAAAAAFVI/xWyKhdOI0WkGqi_SPGHT6hEF64FuwaQwgCLcBGAs/s320/mnm.png" width="320" /></a></div>If you haven't watched the linked problem video (the solution isn't revealed in it), you should -- the animations are great. I'll assume in this answer you have a full understanding of what the problem is. 
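Before building up to the proof, it's worth convincing yourself of the claim numerically. Here's a quick position-free simulation (a sketch; the function name is mine, and I use the standard elastic-collision formulas) -- between collisions, the next event is determined by the velocities alone, so we never need to track positions:

```python
import math

def count_collisions(N):
    """Count collisions for mass ratio N:1 (rightward velocities positive,
    wall to the left). The next event depends only on velocities: the balls
    collide iff the big ball is catching up (v2 < v1), and the small ball
    hits the wall iff it is moving left (v1 < 0)."""
    m1, m2 = 1.0, float(N)
    v1, v2 = 0.0, -1.0          # small ball at rest, big ball incoming
    count = 0
    while True:
        if v2 < v1:             # ball-ball elastic collision
            v1, v2 = (((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2),
                      ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2))
        elif v1 < 0:            # small ball bounces off the wall
            v1 = -v1
        else:                   # both moving right, big ball faster: done
            return count
        count += 1

# Simulated count against ⌊π√N⌋ -- they agree for these N:
for N in (1, 100, 10000):
    print(N, count_collisions(N), math.floor(math.pi * math.sqrt(N)))
```

For $N=100$ this prints 31 collisions, and for $N=10000$ it prints 314 -- the famous digits-of-$\pi$ pattern.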
Not a tall order.<br /><br />I could actually give you a picture proof right now -- the solution is that amazing -- but I won't. Let's build our insight up to it, so when you see it, you are ready.<br /><br />The moment I think of $\pi$, I think of circles. Well, where are the circles here (besides my shoddy drawings of the two balls)? Here's another thing to think about: how do you solve for any result of a collision? You consider <b>conservation of momentum and energy</b>, of course. Aha, that should click in your mind -- conservation of energy is sort of like a circle -- well, it's an ellipse! The condition:<br /><br />$$\frac12mv_m^2+\frac12Nmv_{Nm}^2=\frac12Nmw^2$$<br />Is the equation for an ellipse. And conservation of momentum is the equation for a line. But there are <b>two sorts of collisions</b> that can occur in this system: collisions between $m$ and $Nm$, and collisions between $m$ and $\infty$. The former conserves momentum, the latter does not (I mean, the law isn't violated, but the momentum of just the two balls together isn't conserved -- it is <b>transferred to the wall</b>). Respectively under the two collisions, we have:<br /><br />$$mv_m+Nmv_{Nm}=C\\<br />Nmv_{Nm}=C'$$<br />Now, the <i>idea</i> we have is that when we impose conservation of energy and momentum, we are solving for the <b>intersection points</b> of the ellipse and the line -- one intersection point is the pre-collision configuration of velocities, and the other is the one after collision. 
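To make the "intersection points" idea concrete: substituting the momentum line into the energy ellipse gives a quadratic, and its two roots are exactly the before and after states. A small sketch (the function name is mine), with the big ball incoming at unit speed:

```python
import math

def intersections(m, M, p, E):
    """Intersections of the momentum line m*x + M*y = p with the energy
    ellipse (1/2)m*x^2 + (1/2)M*y^2 = E, in velocity space (x = v_m,
    y = v_Nm). Substituting x = (p - M*y)/m yields a quadratic in y."""
    a = M * M / m + M
    b = -2 * p * M / m
    c = p * p / m - 2 * E
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    return [((p - M * y) / m, y) for y in roots]

m, M = 1.0, 16.0                  # N = 16
p, E = -M, 0.5 * M                # big ball incoming at speed 1, small at rest
print(intersections(m, M, p, E))  # one root is the pre-collision state (0, -1);
                                  # the other is the post-collision one
```

Both roots lie on the line and the ellipse by construction -- which is just the statement that both conservation laws hold across the collision.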
So the idea we have in our mind is that of a bunch of lines, each corresponding to a different momentum value (because that is not conserved) intersecting a single ellipse, and we want to <b>count the number of intersections</b>.<br /><br /><div style="text-align: center;"><iframe frameborder="0" height="500px" src="https://www.desmos.com/calculator/hjsowcabsk?embed" style="border: 1px solid #ccc;" width="500px"></iframe><br /></div><br />The key idea here is that the collisions, as they occur <b>in real time</b>, correspond to the bouncing of an object off the ellipse as it moves across the lines -- first across the slanted blue line, bouncing off the ellipse, then across the red line, then bouncing off the ellipse onto the green line, then the other blue line, and then its last collision.<br /><br />However, to say this with confidence, we need to be sure that we can really <b>map every collision onto an intersection in the velocity space</b> above. For that, we need the following lemma:<br /><br /><hr /><b>Lemma: </b>Among the inter-collision periods, any given configuration of velocities occurs no more than once, i.e. the velocity space configurations are unique (so there is a bijection (well, the surjection is obvious) between the intersections and the collisions).<br /><br /><b>Proof:</b><br /><i>Case 1: </i>the number of collisions is finite (why do I make these my cases? Because the first argument I thought of requires this assumption, so let's consider the other possibility separately). If a given velocity configuration occurs again, then the system will have to repeat itself (so there will be an infinite number of collisions). Why?<br /><br />Because the number of collisions from a point in time onwards depends only on the configuration of velocities at that point in time (think about why this is true -- specifically, it does not depend on the distance between the masses, or between the masses and the wall -- is this true when you have more than two masses? 
But don't we actually have three masses here, counting the wall? So it matters if the masses are mobile. Why? What do mobile things do differently that immobile things don't?).<br /><br /><i>Case 2:</i> we don't even need the finiteness assumption. We know the velocity (not speed) of $Nm$ is non-decreasing (with sign convention of right being positive). Suppose it stabilises at 0. Then both balls have stabilised at 0, and the collisions must be finitely many. Suppose it doesn't. Then the velocity is continually changing as it hits the smaller ball (if their velocities become equal -- "worldlines become parallel" -- then there must have been only finitely many collisions), so each configuration is unique.<br /><hr /><br />Ok, so we need to find the number of intersections between the ellipse with radii $w$ and $w\sqrt{N}$, and the lines, with slope $-1/N$. Truth be told, I spent a lot of time staring at the diagram at this point, having no idea how to proceed. And so should you.<br /><br /><div class="twn-furtherinsight">One thing <i>is</i> easy to see here, which is that the answer is <i>something</i> times $\sqrt{N}$, i.e. it goes to infinity at the rate $\sqrt{N}$ as $N$ goes to infinity. Why is this easy to see?</div><br />Here's how I found the realisation: there are two ways to find $\pi$ in something -- by looking at circles' areas and lengths, or by looking at angles. I had exhausted everything I could looking at lengths (and again, you should too), so let's think about <b>angles</b>. Let's read the angles between the lines.<br /><br />Well, first, let's scale the diagram so the ellipse becomes a circle -- let's make it a unit circle. 
This is good, because now the slope of the lines becomes $-1/\sqrt{N}$, which means there's only one crazy diverging infinite term to bother about -- rather than an infinite ellipse going crazy and an infinite number of infinite line segments each going infinitely crazy.<br /><br /><div class="twn-furtherinsight">Why is such a scaling okay? Because the number of intersections is invariant under a scaling. There are other things, like distances, that aren't invariant (which is why finding the perimeter of an ellipse is quite hard). What things are invariant? Why?</div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-LTg-u2t2jSE/XEPCL2xkM0I/AAAAAAAAFVU/S_qXY6cWgmgE-aRyYuiCYPLNecD9kGCNwCLcBGAs/s1600/thing.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="731" height="416" src="https://2.bp.blogspot.com/-LTg-u2t2jSE/XEPCL2xkM0I/AAAAAAAAFVU/S_qXY6cWgmgE-aRyYuiCYPLNecD9kGCNwCLcBGAs/s640/thing.png" width="640" /></a></div>Each of these angles is clearly $\arctan(1/\sqrt{N})$. The key is to think about the sum of these angles. You might see the trick -- I do quite like it when a funny geometric argument comes out in some bizarre space, a velocity space here, in a physics proof -- and it's "angles in the same segment are equal".<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-zfl85jQx6iE/XEPDnSWGYAI/AAAAAAAAFVg/8N2057eFufQFF17KHZ4RAdBZLv8BFqAKACLcBGAs/s1600/ellipsething_blue.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="731" height="416" src="https://4.bp.blogspot.com/-zfl85jQx6iE/XEPDnSWGYAI/AAAAAAAAFVg/8N2057eFufQFF17KHZ4RAdBZLv8BFqAKACLcBGAs/s640/ellipsething_blue.png" width="640" /></a></div>So the sum of those angles approaches the angle subtended by that big chord. What is that big chord? 
I wonder if it has a name. Well, as $N$ approaches infinity, the chord approaches the <b>diameter</b>, and that angle approaches $\pi/2$. There's another $\pi/2$ from the angles on the other side, so the total of the angles at all intersections is $\pi$.<br /><br />(Great, another fact from elementary high-school geometry.)<br /><br />So the total number of angles/intersections/collisions is $\frac{\pi}{\arctan(1/\sqrt{N})}$, which, as $N\to\infty$, is asymptotically $\pi\sqrt{N}$.<br /><br />Which is great. It's just <i>great</i>.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-18135653107196793432019-01-18T23:27:00.000+00:002019-04-23T09:18:39.192+01:00Covectors, conjugates, and the metric tensorThe fact -- as is often introduced in an introductory general relativity or tensor calculus course -- that the gradient is a <i>covector</i> seems rather bizarre to someone who's always seen the gradient as the "steepest ascent vector". Surely, the direction of steepest ascent is, you know, a direction -- an arrow. And what even is a covector, anyway?<br /><div><br /><div><div>Let's think about differentiating with respect to vectors. The idea we have is that $\frac{\partial f}{\partial \vec x}$ needs to contain all the information -- each of the $\frac{\partial f}{\partial x_i}$. And analogously for derivatives with respect to tensors. You might think we could just create an array with the same dimensions containing each derivative -- much like the gradient, Hessian, etc. that we're used to -- i.e.</div></div></div><br />$$\nabla f=\left[ {{\partial ^i}f} \right]$$<br />$$\nabla^2f = \left[ {{\partial ^i}{\partial ^j}f} \right]$$<br />(I'm using $\nabla^2$ for the Hessian -- and will do so in the rest of the article -- but it's too widely used for its trace, the Laplacian, which should be represented as $|\nabla|^2$) etc. 
But you might get the sense that this feels just fundamentally wrong -- like you're giving the "division by tensor" object the structure of the same tensor, but you should somehow be giving it an "inverse" structure. <br /><br />We want to construct a situation to see that the idea above -- of making the gradient ("derivative with respect to a vector") and the Hessian ("derivative with respect to a rank-2 tensor") a vector and a rank-2 tensor respectively -- doesn't work. We know such a situation can arise when we have multiplication between the gradient and a vector, or the Hessian and a rank-2 tensor. For instance, for linear $f$:<br /><br />$$f(\vec{x})-f(0)=\vec{x}\cdot\nabla f$$<br />But this is <i>wrong</i> -- for any non-Euclidean manifold. For instance, if the metric tensor is something like $\rm{diag}(-1,1)$, this dot product gives:<br /><br />$$f(\vec{x})-f(0)= - x\frac{{\partial f}}{{\partial x}} + y\frac{{\partial f}}{{\partial y}}$$<br />Which is just wrong. So instead, the gradient is a <i>covector</i>, which we represent in Einstein notation using subscripts instead of superscripts:<br /><br />$$f(\vec x) - f(0) = {x^i}{\partial _i}f$$<br />(As you can see, I omitted Einstein notation when I was writing the wrong equations -- seeing repeated indices on the same vertical alignment is physically painful.) If we want the <i>vector</i> gradient -- for direction of steepest ascent or whatever -- you need to multiply by the metric tensor.<br /><br /><div class="twn-furtherinsight">This also motivates the picture of seeing covectors as parallel surfaces whose normals are their vector versions -- in Euclidean geometry, it doesn't make a difference, but in a general setting, this normality is a bit weird. 
Think about this.</div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e9/1-form_linear_functional.svg/400px-1-form_linear_functional.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="217" data-original-width="400" height="173" src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e9/1-form_linear_functional.svg/400px-1-form_linear_functional.svg.png" width="320" /></a></div><br />But I haven't really given a motivation for the metric tensor or how it comes up here -- for this, read on.<br /><br /><hr /><br /><div>Let's talk about something completely different -- let's think about the derivative of functions from $\mathbb{C}\to\mathbb{R}$, $df/dz$. I don't know about you, but I like the complex numbers, and prefer them to $\mathbb{R}^2$, because pretty much anything I write with the complex numbers is well-defined, and easily so -- so I don't need to worry about whether $df/d \vec{x}$ makes any sense or not. Well, we can write:</div><div><br /></div><div>$$\frac{df}{dz}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial z}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial z}\\\Rightarrow \frac{df}{dz}=\frac12\left(\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}i\right)$$</div><div>(Here $x=\frac{z+\bar z}{2}$ and $y=\frac{z-\bar z}{2i}$, so $\frac{\partial x}{\partial z}=\frac12$ and $\frac{\partial y}{\partial z}=-\frac{i}{2}$.) This $df/dz$ above is exactly the analog of the gradient for real-valued functions defined on the complex plane -- analogous to scalar multivariable functions.<br /><br /></div><div class="twn-furtherinsight">What's the expression for the complex derivative of a complex function? 
Compute it -- it may look a bit different from the analogous tensor derivative -- think of traces and commutators.</div><br /><div class="twn-pitfall"><b>Note:</b> In actual complex calculus, complex differentiability is defined in a more restrictive way -- specifically one needs to satisfy the <a href = "https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy-Riemann equations</a>, which makes the structure of complex functions fundamentally more special than that of multivariable functions; stuff like $dx/dz$ is even undefined, and the stuff we've written above isn't really relevant in complex analysis. It is, however, the "Wirtinger derivative".</div><br />Something interesting happened here, though -- we got a negative sign on the imaginary component of the derivative. The derivative got conjugated, or something -- and the reason this occurred is that $i^2=-1$ (so $1/i=-i$), and this leaves some sort of signature in our derivative.<br /><div><br /></div>Now let's (<i>non-rigorous alert!</i>) think about how an analogous argument may be written for vectors.<br /><br />$$\frac{{df}}{{d\vec x}} = \frac{{\partial f}}{{\partial x}}\frac{{\partial x}}{{\partial \vec x}} + \frac{{\partial f}}{{\partial y}}\frac{{\partial y}}{{\partial \vec x}}$$<br />What really is $\frac{{\partial x}}{{\partial \vec x}}$, though? We know that $\frac{\partial \vec x}{\partial x}=\vec{e_x}$. But what's the "inverse" of a vector? What does that even mean?<br /><br />So we want to define some sort of a product, or multiplication, with vectors -- we want to define a thing that when multiplied by a vector gives a scalar. It sounds like we're talking about a dot product -- but the dot product lacks an important property we need in order to divide: it's not injective. I.e. $\vec{a}\cdot\vec{b}=c$ for fixed $\vec{a}$ and $c$ defines a whole plane of vectors $\vec{b}$, not a unique one. 
But if we added an additional component to our product, the cross product (or in more than three dimensions, the wedge product), then the "dot product and cross product combined" <i>is</i> injective.<br /><br /><div class="twn-furtherinsight">This combination, of course, is the tensor product. Specifically, when we're talking about something like $1/\vec{e_x}$, we want a thing whose tensor product with $\vec{e_x}$ has trace (dot product) 1 and commutator (wedge/cross product) 0, i.e. $\mathrm{tr}(\vec{e_x}'\vec{e_x})=1$ and $(\vec{e_x}'\vec{e_x})-(\vec{e_x}'\vec{e_x})^T=0$.</div><br />If all you've ever done in your life is Euclidean geometry, you'd probably think the answer to this question is $\vec{e_x}$ itself -- indeed, its dot product with $\vec{e_x}$ is 1 and its cross product with $\vec{e_x}$ is 0. But if you've ever done relativity and dealt with -- forget curved manifolds! -- the Minkowski manifold, you know that this is not necessarily true -- it depends on the metric tensor.<br /><br />Could we <i>define</i> a vector in a general co-ordinate system that is the inverse of $\vec{e_x}$? Yes, we can. But let's not do that (yet*) -- it just seems like there should be something more natural, or elegant, like we had with complex numbers.<br /><br />So we define a space of "covectors", as "scalars divided by vectors" (informally speaking), call their basis $\tilde{e^i}$, which have the required dot and cross products. In Euclidean space -- and only in Euclidean space -- these look exactly the same as vectors, and have exactly the same components. I like to call the conjugation here "metric conjugation", and the gradient is naturally a covector.<br /><br />*As for the question of writing the gradient as a vector instead, this follows naturally using the metric tensor -- as an exercise, show, by considering the required vector corresponding to the covector $\tilde{e^x}$ (i.e. 
that has the right dot and cross products with $\vec{e_x}$) that the vector gradient can be given as the product of the inverse metric tensor and the covector gradient:<br /><br />$${\partial ^\mu }f = {g^{\mu \nu }}{\partial _\nu }f$$<br />(Do this exercise! It is <em>the</em> motivation for the metric tensor, and why it determines your co-ordinate system!)<br /><br /><hr /><br /><div class="twn-furtherinsight">I've been talking about the covector $\tilde{e^x}$ as being equal to the quotient "$1/\vec{e_x}$" but as I mentioned, this isn't really accurate -- the "1" in the quotient is a (1,1) tensor with trace 1 and commutator 0. Think about this tensor. Can you find this tensor in Clifford algebra? Maybe not. Can you find it as a linear transformation? Yes? Find it. And can you think of the covector alternatively as a quotient of a bivector and a trivector? Will you get $(e_y\wedge e_z)/(e_x\wedge e_y\wedge e_z)$?</div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-46717840767668938022018-12-08T14:35:00.000+00:002019-02-12T18:42:53.747+00:00Intuition, analogies and abstraction$$-1=\sqrt{-1}\sqrt{-1}=\sqrt{(-1)(-1)}=\sqrt{1}=1$$<br />I bet you've seen the fake "proof" above that minus one and one are equal. And the standard explanation as to why it's wrong is that the statement $\sqrt{ab}=\sqrt{a}\sqrt{b}$ only applies when $\sqrt{a}$ and $\sqrt{b}$ are real, or something like that (maybe only one of them needs to be real -- something like that -- who cares?).<br /><br />But if you're like me, that isn't a very satisfactory explanation. <i>Why</i> does the identity not hold for complex numbers? For that matter, why does it hold for real numbers? Well, that is a good question, and one way of answering it would be to try and prove the identity for real numbers, and see what properties of the real numbers (or of the real square root, in particular) you use. 
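You can also watch the identity fail concretely: Python's `cmath.sqrt` is exactly the principal square root (it halves the principal argument), and two lines exhibit the failure:

```python
import cmath

a = b = -1 + 0j

# √a·√b versus √(ab):
print(cmath.sqrt(a) * cmath.sqrt(b))    # (-1+0j) -- i.e. 1j * 1j
print(cmath.sqrt(a * b))                # (1+0j)  -- i.e. sqrt of 1

# The culprit: arguments add only mod 2π.
print(cmath.phase(a) + cmath.phase(b))  # 2π ≈ 6.283...
print(cmath.phase(a * b))               # 0.0 -- the full circle is forgotten
```

The last two lines are the whole story in miniature: `phase(a*b)` is the *reduced* argument, and halving a reduced argument is not the same as halving and then reducing.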
And if this article were being filed under "MAR1104: Introduction to formal mathematics", that's how I might explain things -- but that doesn't give us too much insight -- not about square roots and complex numbers, anyway.<br /><br />Let's think about what $\sqrt{ab}=\sqrt{a}\sqrt{b}$ means.<br /><br /><div class="twn-furtherinsight">What does the square root of a real number mean, anyway? It's some property related to multiplying a real number by itself. What does multiplication mean? What does a real number mean? The picture I have in my head of the real numbers is of a line. But what exactly is this line? -- the real numbers are just a set. Why did you put them on this line in this specific way? In doing so, you gave the real numbers a <i>structure</i>, a specific type of structure called an "order", defined by the operation $<$.<br /><br />But there are other ways to think about/structure the real numbers. One way is to think of real numbers as (one-dimensional) scalings. You can scale things like mass, and volume, using real numbers, representing the scalings as real numbers. Scaling a mass by 2 is equivalent to multiplication by 2. So this gives the real numbers a multiplicative structure, defined by the operation $\times$ (or whatever notation -- or lack thereof -- you prefer). And the "real line" then just represents the image of "1" under all scalings.<br /><br />So the way to think about square roots is to think of numbers as linear transformations called scalings, and think about the scaling that, when done twice, gives you the number you're taking the square root of. So what's $\sqrt{-1}$? What's $-1$? $-1$, multiplicative, is a reflection. What's its square root? Try to think of a (linear!) transformation that, when done twice, gives you a reflection. It can't be done in one dimension. And can you think of another such transformation? Can you prove these are the only two? 
Are you sure -- what about if you add a dimension?<br /><br />So the natural way to think about square roots of numbers that may or may not be complex, is with so-called "Argand diagrams", on the complex plane, the image of "1" under all complex numbers multiplicative.</div><br /><center><iframe frameborder="0" height="500px" src="https://www.desmos.com/calculator/lufx5iszcs?embed" style="border: 1px solid #ccc;" width="500px"></iframe></center>Click "edit graph" to play with <em>a</em> and <em>b</em>!<br /><br />To simplify things, consider only unit complex numbers (this is okay, because every complex number can be written as the product of a unit complex number and a real number). The product of complex numbers $a$ and $b$ involves rotating by $a$, then rotating by $b$. The square roots of $a$ and $b$ involve going half as far around the circle as $a$ and $b$ do, and the square root of $ab$ goes half as far around the circle as $ab$.<br /><br />So it seems like the identity should hold, doesn't it? $\sqrt{ab}$ goes half as much as $a$ and $b$ put together -- this seems to be exactly what $\sqrt{a}\sqrt{b}$ does -- go around half as much as $a$, then half as much as $b$. Isn't $\frac{\theta+\phi}2=\frac{\theta}2+\frac{\phi}2$?<br /><br />The problem is that $\sqrt{ab}$ doesn't really go $\frac{\theta+\phi}2$ around the circle, if $\theta+\phi$ is greater than $2\pi$. You can see this in the diagram courtesy of Desmos above -- $ab$ has gone a <i>full circle</i>, and its square root is defined to halve the <i>argument</i> of $ab$, but the argument isn't $\arg (ab)=\arg (a) + \arg (b)$, rather:<br /><br />$$\arg (ab) \equiv \arg (a) + \arg (b) \pmod{2\pi}$$<br />But <i>halving</i> is not an operation that the $\bmod$ equivalence relation respects -- not in general, anyway. 
It is <i>not</i> true that<br /><br />$$\arg (ab)/2 \equiv (\arg (a) + \arg (b))/2 \pmod{2\pi}$$<br />Instead:<br /><br />$$\arg (ab)/2 \equiv (\arg (a) + \arg (b))/2 \pmod{\pi}$$<br />Let's recall from basic number theory -- on integers, the general result regarding multiplication on mods. If $a\equiv b\pmod{m}$, then $na\equiv nb \pmod{nm}$, certainly, and also $na\equiv nb \pmod{m}$ if $n$ is an integer*. But $1/2$ <i>isn't</i> an integer, which is why only the former result is relevant.<br /><br /><div class="twn-furtherinsight">This is also why $(ab)^2=a^2b^2$ <i>does</i> hold for complex numbers.</div><br />*when $n$ isn't an integer, we need $na$, $nb$ to be integers for the statement to even be <i>well-defined</i> in standard number theory, and then you have a result for division on mods involving $\gcd(d,m)$, etc. This isn't a concern for us here because we're dealing with divisibility over the reals -- if you want to be formal, a real number is divisible by another real number if the former can be written as an integer multiple of the latter.<br /><br />So there you have it -- I just demonstrated a very fundamental analogy between two seemingly incredibly unrelated ideas: complex numbers and modular arithmetic -- square roots of complex numbers don't multiply naturally, because <i>mod</i> doesn't respect division. It's almost as if somehow, somewhere, magically, <i>exactly the same kind of math was used to derive results, to prove things, about these unrelated objects</i>.<br /><br />As if they're just two instances of the same thing.<br /><br />I wonder what that thing could be.<br /><br /><hr /><br />Let's talk about something completely unrelated (no, genuinely -- completely unrelated -- I won't tell you this is an instance of the "same thing" too). Let's talk about logical operators, specifically: do $\forall$ and $\exists$ commute? I.e. 
is $\forall t, \exists s, P(s,t)$ equivalent to $\exists s, \forall t, P(s,t)$?<br /><br />You just need to read the statements aloud to realise they don't. To use a classical example, "all men have wives" and "there is a woman who is the wife of all men" are two very different statements (okay, in this case both statements are false, so they're equivalent in that sense, but you get my point).<br /><br />But let's think more deeply about why they don't commute. What do $\forall t, \exists s, P(s,t)$ and $\exists s, \forall t, P(s,t)$ mean, anyway? $\forall$ and $\exists$ are just infinite $\land$ and $\lor$ statements, i.e. $\forall t$ is just an $\land$ statement ranging over all possible values that $t$ can take and $\exists s$ is just an $\lor$ statement ranging over all possible values $s$ can take.<br /><br />So $\forall t, \exists s, P_{st}$ just means (letting $s$ and $t$ be natural numbers for simplicity, but they don't have to be):<br /><br />$$({P_{11}} \lor {P_{21}} \lor ...) \land ({P_{12}} \lor {P_{22}} \lor ...) \land ...$$<br />And $\exists s, \forall t, P(s,t)$ means:<br /><br />$$({P_{11}} \land {P_{12}} \land ...) \lor ({P_{21}} \land {P_{22}} \land ...) \lor ...$$<br />This is a bit complicated, so let's instead look at the simpler case where you have only 2 by 2 statements -- i.e. just construct the analogy between $\forall,\exists$ and actual $\land,\lor$ statements.<br /><br />So the question is if:<br /><br />$$({P_{11}} \lor {P_{21}}) \land ({P_{12}} \lor {P_{22}}) \Leftrightarrow ({P_{11}} \land {P_{12}}) \lor ({P_{21}} \land {P_{22}})$$<br />This is interesting. Maybe you see where this is going. Let me just do a notation change -- I'll use "$\times$" for $\land$, "$+$" for $\lor$, "$=$" for $\Leftrightarrow$, and some new letters for the propositions. Under this new notation, where $\times$ is invisible as always, we're asking if:<br /><br />$$(a + b)(c + d) = ac + bd$$<br /><br />Aha! This is Freshman's dream, isn't it? 
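(Before expanding any brackets, you can brute-force the 2-by-2 case -- a throwaway sketch, function names mine -- to confirm the two sides genuinely differ, and that the implication only goes one way:)

```python
from itertools import product

def forall_exists(P):   # (P11 ∨ P21) ∧ (P12 ∨ P22), i.e. ∀t ∃s, P(s,t)
    return (P[0][0] or P[1][0]) and (P[0][1] or P[1][1])

def exists_forall(P):   # (P11 ∧ P12) ∨ (P21 ∧ P22), i.e. ∃s ∀t, P(s,t)
    return (P[0][0] and P[0][1]) or (P[1][0] and P[1][1])

# ∃∀ implies ∀∃ on all 16 truth assignments...
for bits in product((False, True), repeat=4):
    P = [list(bits[:2]), list(bits[2:])]   # P[s][t]
    if exists_forall(P):
        assert forall_exists(P)

# ...but not conversely: every t has a witness s, yet no single s works for all t.
P = [[True, False], [False, True]]
print(forall_exists(P), exists_forall(P))   # True False
```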
And we know it's not true -- it's a dream, after all, don't be delusional -- and we know <i>why</i> it's not true too.<br /><br />But wait -- we aren't talking about elementary algebra here. I just gave you some silly notation and made it <i>look</i> like Freshman's dream. But here's the thing: the proof (or algebraic proof -- a counter-example is also a proof, but that isn't so interesting... not here, anyway) that these propositions aren't equivalent is <i>exactly</i> the same as in algebra. We expand out the brackets (because we know that $\land$ distributes over $\lor$ -- we also know that $\lor$ distributes over $\land$, incidentally, something that is <i>not</i> true in standard algebra) and point out that there are extra terms, and point out that these extra terms change the value of the expression (they aren't zero).<br /><br />So there's some kind of relationship between the boolean algebra and an elementary algebra. A lot of proofs that can be done in one of these algebras can be written almost identically in the other. Not <i>all</i> these proofs, mind you -- then the algebras would just be isomorphic to each other -- but some of them can. Maybe a lot of important ones can.<br /><br /><div class="twn-pitfall">An abstraction that produces such proofs simultaneously for both elementary algebra and boolean algebra may be more complicated than you think -- there's no real sense in which a statement is "always zero" in boolean algebra. Take for instance, distributivity of $\lor$ over $\land$ -- $a+bc=(a+b)(a+c)$. This is not true in elementary algebra, because the extra term $ab+ac$ is not always equal to zero ($a^2\ne a$ is not really an example, because $a^2=a$ for $a\in\{0,1\}$ -- but $a(b+c)=0$ is not true for all $a,b,c\in\{0,1\}$). 
It's just that it leaves the value of the existing terms unchanged in this specific instance.</div><br /><hr /><br />I've just illustrated two examples here -- the first one is a type of group, by the way, but you've probably seen dozens of other such "connections between different areas of mathematics" yourself. I've made these sorts of analogies fundamental to a lot of the articles I've written here (I think). You might've just thought of them as interesting insights, but in reality, abstract mathematics/abstract algebra -- or really just mathematics in general -- is all about these analogies.<br /><br />In a sense, mathematics is largely about abstraction. I mean, that's not what mathematics fundamentally <i>is</i> -- fundamentally, math is just logic -- but it's how mathematics largely functions. Whenever one talks of axioms, you could think of them as fundamental defining ideas of mathematical objects, and you can also think of them as "interfaces" between mathematics and reality (see my <a href="https://thewindingnumber.blogspot.com/2017/01/introduction-to-linear-transformations.html">introduction to linear transformations</a>). 
There are a massive number of different physical phenomena that we can study, and rather than prove everything from scratch for each one of them, it is much better -- and more insightful in terms of understanding the connections between things -- to show that they satisfy a certain set of axioms that apply to a whole range of things, and then deduce that all the logical consequences of these axioms -- all theorems -- are satisfied by the objects.<br /><br />If we can do that with physical phenomena, we can just as well do it with mathematical phenomena too -- instead of proving something from scratch for every new mathematical object, we prove that it is a group, or a ring, or a field, or a module, or an algebra, or a topology, or a geometry of some sort, by verifying it matches the axioms -- and then use all the abstract knowledge we have about these things and deduce that they must necessarily apply to our new object, because they are logical consequences of our axioms.<br /><br />Abstract mathematics is, in this sense, all about generalising things by finding the "smallest set of axioms" the thing requires.<br /><br />(Well, not really -- the most general statement is "true", and everything else is just a logical deduction from this statement. So in that sense mathematics is all about finding special cases. But in order to know what to take a special case of, and what special case that "what" is of "true", you need to generalise.)<br /><br /><div class="twn-exercises">List some weird analogies you've seen before in math. Something about divisibility sound familiar?</div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-87286097598672980792018-11-25T22:54:00.001+00:002019-01-25T21:40:33.522+00:00Understanding polynomial-ish differential equationsThis is a rather simple idea, perhaps not one you really had too many problems understanding to begin with. 
Given you know that $e^{\lambda x}$ solves first-order polynomial differential equations, it's not too much of a stretch to imagine it solves higher-order polynomial differential equations too. But let's talk about this anyway.<br /><br />So suppose you have a differential equation like:<br /><br />$$y''-3y'+2y=0$$<br />A more interesting way of writing this would be:<br /><br />$$(D-1)(D-2)y=0$$<br /><div class="twn-furtherinsight">The fact that you can do such a factoring is a consequence of the fact that polynomials in $D$ form a <i>commutative ring</i>. The idea behind rings and fields and other such objects is to look for a bunch of properties that a familiar set -- like the integers or the real numbers -- satisfies, then drill those properties down to the basic axioms that imply them, so as to generalise them to objects other than the integers or real numbers. Differentiation operators are a great example of such a ring.</div><br />Now, your first instinct may be to look at the factorisation and claim that $(D-1)y=0$ or $(D-2)y=0$. But this isn't right -- you've assumed, incorrectly, that $(D-1)^{-1}$ and $(D-2)^{-1}$ exist (and that when applied to 0, they give you 0). In fact, we know there are multiple functions that give 0 when you take $(D-1)$ of them. Which functions, specifically? The functions that are in the null space of $D-1$, i.e. the functions which satisfy:<br /><br />$$(D-1)f=0$$<br />And 0 isn't the only such function. Ok, I've been giving you silly tautologies for about three lines now, but the point I'm making is that when you take the inverse operator of $(D-1)$ of both sides, what you really get is:<br /><br />$$(D-2)y=(D-1)^{-1}0=ce^{x}$$<br />For arbitrary $c$.<br /><br /><div class="twn-furtherinsight">The way to think about this kind of a $c$ is that you don't really have an <i>equal to</i> relation, i.e. an equation, you have an <i>equivalence</i> relation -- the "=" sign there is really abuse of notation. 
And you're saying that $(D-2)y$ belongs in an equivalence class where all elements are of the form $ce^{x}$ (and your quotient group's "representative element" can be $e^x$). The same applies, for example, for _____ in calculus -- fill in the blank. Well, fill it in.</div><br />Anyway, what you now have is a first-order differential equation (or really differential <i>equivalence</i>) in $y$.<br /><br />$$(D-2)y=c_1e^x$$<br />But it isn't homogeneous. I don't really know how to motivate a solution for a non-homogeneous differential equation -- all I can say is that because the right-hand side is an exponential, we just <i>know</i> that we can get some hints as to what $(D-2)^{-1}(ce^x)$ is by computing $(D-2)(ce^x)$ -- and if the right-hand side <i>isn't</i> an exponential, then you can make it a sum or integral of exponentials, which is what Laplace and Fourier transforms are all about.<br /><br />In any case, performing $(D-2)$ on $c_1e^x$ gives us $-c_1e^x$, which by linearity immediately gives us an example solution, or a particular solution, $-c_1e^x$ -- and all other solutions can be formed by adding linear combinations of the elements of the null space, i.e. solutions to the homogeneous equation $(D-2)y=0$. These elements we know to take the form $c_2e^{2x}$.<br /><br />$$y=(D-2)^{-1}c_1e^x=-c_1e^x+c_2e^{2x}$$<br />Or transforming arbitrary constants,<br /><br />$$y=c_1e^x+c_2e^{2x}$$<br /><br /><div class="twn-exercises">Use this method to find a general form for the solution to $(D-\alpha_1)(D-\alpha_2)...(D-\alpha_n)y=0$. 
Formalise our method, and prove this general form, by induction.<br /></div>Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-75152393674113471782018-10-27T15:06:00.000+01:002019-01-30T12:37:33.303+00:00Understanding variable substitutions and domain splitting in integralsOften when I'm reading a computation of some weird integral that contains some kind of a "trick" of a variable substitution, I can't help but think "How could I have thought of that?" And even when these are introduced at school, they are usually taught as "tricks", and the strategy to decide which "trick" to use is memorised -- you see $1+x^2$? Well, substitute $x=\tan\theta$ or $x=\cot\theta$. And sure, for such simple ones, that kind of a trick might make sense. You know, you have something that really looks like a trig identity, so let's just make it one...<br /><br />But I often find that these kinds of "tricks" can be motivated and made to make sense, and I think there usually is a way to come up with one from mathematical insight (after all, someone had to actually come up with the tricks in the first place).<br /><br />Here's the Cauchy-Schwarz inequality for functions on $[0, 1]$:<br /><br />\[{\left[ {\int_0^1 {f(t)g(t)dt} } \right]^2} \le \int_0^1 {f{{(t)}^2}dt} \,\int_0^1 {g{{(t)}^2}dt} \]<br />How would we go about proving this?<br /><br />Well, perhaps you recall what the proof of the Cauchy-Schwarz inequality for ordinary vectors in $\mathbb{R}^n$ looks like. Here's a standard proof:<br /><br />\[{\left( {{x_1}{y_1} + {x_2}{y_2} + ... + {x_n}{y_n}} \right)^2} \le \left( {{x_1}^2 + {x_2}^2 + ... + {x_n}^2} \right)\left( {{y_1}^2 + {y_2}^2 + ... + {y_n}^2} \right)\]<br />\[\left( {\begin{array}{*{20}{c}}{{x_1}^2{y_1}^2 + {x_1}{y_1}{x_2}{y_2} + ... + {x_1}{y_1}{x_n}{y_n} + }\\\begin{array}{l}{x_2}{y_2}{x_1}{y_1} + {x_2}^2{y_2}^2 + ... + {x_2}{y_2}{x_n}{y_n} + \\... 
+ \\{x_n}{y_n}{x_1}{y_1} + {x_n}{y_n}{x_2}{y_2} + ... + {x_n}^2{y_n}^2 + \end{array}\end{array}} \right) \le \left( {\begin{array}{*{20}{c}}{{x_1}^2{y_1}^2 + {x_1}^2{y_2}^2 + ... + {x_1}^2{y_n}^2 + }\\\begin{array}{l}{x_2}^2{y_1}^2 + {x_2}^2{y_2}^2 + ... + {x_2}^2{y_n}^2 + \\... + \\{x_n}^2{y_1}^2 + {x_n}^2{y_2}^2 + ... + {x_n}^2{y_n}^2\end{array}\end{array}} \right)\]<br /><br />And now we simply need the fact that $2{x_i}{y_i}{x_j}{y_j} \le {x_i}^2{y_j}^2 + {x_j}^2{y_i}^2$, which is of course true since ${\left( {{x_i}{y_j} - {x_j}{y_i}} \right)^2} \ge 0$.<br /><br />Why on Earth would I walk you through this inane proof, which I'd rather be flogged to death than have to write? Because you might get the idea that the same principle can be applied for functions.<br /><br />What exactly would be the analogy? Well, let's first "expand out" the product of the two integrals, like we expanded out the product of two sums -- this just means rewriting the product as a double-integral.<br /><br />\[\iint_{{[0,1]}^2}{f(s)g(s)f(t)g(t)\,ds\,dt} \leq \iint_{{[0,1]}^2} {{f{{(s)}^2}g{{(t)}^2}\,ds\,dt}}\]<br />This is essentially the same as our double summation on $[1,n]^2$ from earlier -- and like before, the diagonals of the summations are exactly identical (this idea should itself tell you when the inequality becomes an equality) -- and we'd like to prove, as before, that the inequality holds for each sum of corresponding elements across the diagonal.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://i.stack.imgur.com/WfuSV.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="800" height="128" src="https://i.stack.imgur.com/WfuSV.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">(Why does the principal diagonal look oriented differently from that for the vectors in 
$\mathbb{R}^n$?) But how would you actually write down, on paper, this technique of summing up stuff across the principal diagonal? Well, you'll need to split your domain into two, then "reflect" one domain across the principal diagonal so the two integrals can be on the same (new triangular) domain.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">So we start with:</div><div class="separator" style="clear: both; text-align: left;"><br /></div>\[\int\limits_0^1 {\int\limits_0^1 {f(s)g(s)f(t)g(t){\kern 1pt} ds{\kern 1pt} dt} } \le \int\limits_0^1 {\int\limits_0^1 {f{{(s)}^2}g{{(t)}^2}{\kern 1pt} ds{\kern 1pt} dt} } \]<br />Where we're integrating first on $s$ (let's say this is the x-axis) and then on $t$ (the y-axis). To reflect anything, we need to actually be dealing with that thing, so split the domain of $s$ (which we can do, since $t$ is still a variable) into $[0,t]$ and $[t,1]$. This is equivalent to splitting the entire domain into the two triangles (convince yourself that this is the case if you don't see it immediately).<br /><br />\[\int\limits_0^1 {\int\limits_0^t {f(s)g(s)f(t)g(t){\kern 1pt} ds{\kern 1pt} dt} } + \int\limits_0^1 {\int\limits_t^1 {f(s)g(s)f(t)g(t){\kern 1pt} ds{\kern 1pt} dt} } \]\[\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \le \int\limits_0^1 {\int\limits_0^t {f{{(s)}^2}g{{(t)}^2}{\kern 1pt} ds{\kern 1pt} dt} } + \int\limits_0^1 {\int\limits_t^1 {f{{(s)}^2}g{{(t)}^2}{\kern 1pt} ds{\kern 1pt} dt} } \]<br />Where the split integrals represent the top-left and bottom-right triangles respectively. Now how do we "reflect" the second part-integral on each side to match the domain of the first-part integral? 
The reflection is just:<br /><br />\[s' = t\]\[t' = s\]<br />If we transform the second part-integrals under this transformation:<br /><br />\[\int\limits_0^1 {\int\limits_0^t {f(s)g(s)f(t)g(t)\,ds\,} dt} + \int\limits_0^1 {\int\limits_{s'}^1 {f(t')g(t')f(s')g(s')\,dt'\,} ds'} \]\[\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \le \int\limits_0^1 {\int\limits_0^t {f{{(s)}^2}g{{(t)}^2}{\kern 1pt} ds{\kern 1pt} } dt} + \int\limits_0^1 {\int\limits_{s'}^1 {f{{(t')}^2}g{{(s')}^2}{\kern 1pt} dt'{\kern 1pt} } ds'} \]<br />(Don't mind the $x'$ notation for the new co-ordinates -- you should think of $x'$ as matching up with $x$) But our transformation isn't really over. The two part integrals are now integrating over the same <i>domain</i> -- the top-left triangle -- but in different ways. To see this, just consider the "way we were integrating" before the transformation and see how it transforms under our reflection:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://i.stack.imgur.com/k3g5y.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="800" height="128" src="https://i.stack.imgur.com/k3g5y.png" width="320" /></a></div><br />... which are different parameterisations of the same region. 
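As a sanity check on this splitting-and-reflecting business, one can verify numerically that integrating over the whole square equals integrating $h(s,t)+h(t,s)$ over one triangle -- here's a crude midpoint Riemann sum, with $f$ and $g$ chosen arbitrarily by me:

```python
import math

# Check, on a midpoint grid, that the unit-square sum of h(s,t) equals the
# sum of h(s,t) + h(t,s) over the triangle s < t (diagonal cells counted once)
# -- the discrete analogue of reflecting across the principal diagonal.
f = lambda u: math.exp(u)
g = lambda u: math.cos(u)

def h(s, t):
    # the asymmetric right-hand integrand f(s)^2 g(t)^2
    return f(s) ** 2 * g(t) ** 2

n = 200
dx = 1.0 / n
pts = [(i + 0.5) * dx for i in range(n)]  # midpoints of the grid cells

square = sum(h(s, t) for s in pts for t in pts) * dx * dx
triangle = (sum(h(s, t) + h(t, s) for t in pts for s in pts if s < t)
            + sum(h(s, s) for s in pts)) * dx * dx

print(abs(square - triangle))  # zero up to floating-point rounding
```

(The diagonal cells are kept separate here, mirroring the special role the diagonal plays in the proof.)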
So we just reparameterise the second part-integrals (shown in green) to match that of the blue integrals, leaving the integrand the same:<br /><br />\[\int\limits_0^1 {\int\limits_0^t {f(s)g(s)f(t)g(t){\kern 1pt} ds{\kern 1pt} } dt} + \int\limits_0^1 {\int\limits_0^{t'} {f(t')g(t')f(s')g(s'){\kern 1pt} ds'{\kern 1pt} } dt'} \]\[\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \le \int\limits_0^1 {\int\limits_0^t {f{{(s)}^2}g{{(t)}^2}{\kern 1pt} ds{\kern 1pt} } dt} + \int\limits_0^1 {\int\limits_0^{t'} {f{{(t')}^2}g{{(s')}^2}{\kern 1pt} ds'{\kern 1pt} } dt'} \]<br />And then we can add the integrals:<br /><br />\[\int\limits_0^1 {\int\limits_0^t {\left[ {2f(s)g(s)f(t)g(t)} \right]\,{\kern 1pt} ds{\kern 1pt} } dt} \,\, \le \,\,\,\int\limits_0^1 {\int\limits_0^t {\left[ {f{{(s)}^2}g{{(t)}^2} + f{{(t)}^2}g{{(s)}^2}} \right]\,{\kern 1pt} ds{\kern 1pt} } dt} \]<br />Which is true as it is true locally, i.e.<br /><br />\[2f(s)g(s)f(t)g(t) \le f{(s)^2}g{(t)^2} + f{(t)^2}g{(s)^2}\]<br />Which proves our result.<br /><br /><hr /><br />What's the point of going through all of this? Well, the point is that if I'd just thrown the substitutions at you -- or worse, the reparameterisation of the region, or the splitting in the first place -- without any motivation, then it would take about 20 days before there'd be murder charges on you and a tombstone on me. The reason you make them is because you want to unify the integrands -- but this motivation comes at the <i>very beginning</i>, before you start doing any substitutions, because that's why you're doing the substitutions in the first place, <i>that's how you come up with them</i>.<br /><br /><b>Exercise: </b>Motivate the substitutions and changes in the Gaussian integral, $\int_{-\infty}^\infty e^{-x^2}dx=\sqrt{\pi}$. 
Hint: what's the significance of the two-variable normal distribution?<br /><br />Another exercise: consider the integral $\int_\gamma \frac{f(z)}zdz$ ($\gamma$ is a circle) with the substitution $z=re^{i\theta}$ -- what substitution is this? Understand this geometrically with thin triangles and averaging on circles or whatever.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-18444184618033141622018-10-10T22:12:00.000+01:002018-10-10T22:12:13.030+01:00Discovering the Fourier transformConsider a function with period 1 -- computing its Fourier series, you write it as:<br /><br />\[f(x) = \sum\limits_{n = - \infty }^\infty {{a_n}{e^{i2\pi \,\,nx}}} \]<br />Where<br /><br />\[{a_n} = \int_{-1/2}^{1/2} {f(x){e^{ - 2\pi inx}}dx} \]<br />That's all standard and trivial. But suppose you wanted to study a function with a longer period (we will tend this period to infinity) -- what would that look like? Well, consider $g(x)=f(x/L)$, which is this function we're looking for -- then we can rewrite the above identities as:<br /><br />\[g(xL) = \sum\limits_{n = - \infty }^\infty {{a_n}{e^{i2\pi {\kern 1pt} {\kern 1pt} nx}}} \Rightarrow g(x) = \sum\limits_{n = - \infty }^\infty {{a_n}{e^{i2\pi {\kern 1pt} {\kern 1pt} nx/L}}} \]<br />\[{a_n} = \int_{-1/2}^{1/2} {g(xL){e^{ - 2\pi inx}}dx} \Rightarrow {a_n} = \int_{-L/2}^{L/2} {g(x){e^{ - 2\pi inx/L}}dx/L} \]<br /><br />Where we transformed $x\to x/L$ (note that the coefficient integral runs over exactly one period).<br /><br />This seems all too trivial and useless, and maybe you're looking for a little trick to turn this into something interesting. But tricks must typically also arise from some sort of insight. 
Let's assume for a moment that we didn't know anything about variable substitutions or transformations like the kind we did above (and indeed, the idea behind variable substitutions also comes from a geometric understanding of the corresponding transformation) and think about how we may re-think the Fourier transform in this context.<br /><br />Well, if the function's period is $P$, in other words it is stretched out by $P$, the same logic must be used to derive the Fourier series for the new function as for the function with period 1 -- specifically, sines and cosines with <i>longer periods </i>than $P$ don't matter (their coefficient must be zero, because otherwise you've introduced an element into the function that doesn't repeat with that period), but those with <i>shorter, divisible periods </i>matter, because they influence the value of the function within the period, perturbing it by little bits to get to the right function.<br /><br />So when dealing with our new period $L$, one would expect periods that are fractions of $L$, i.e. $L/n$, as opposed to just $1/n$. So $n/L$ is "more important" than $n$, and indeed it seems very easy to transform the summation into one in terms of this new variable, which we will still call $n$ (i.e. transform $n/L\to n$):<br /><br />\[g(x) = \sum\limits_n^{} {{a_n}{e^{i2\pi nx}}} \]<br />\[{a_n} = \frac{1}{L}\int_{-L/2}^{L/2} {g(x){e^{ - 2\pi inx}}dx} \]<br /><br />Where we labeled $a_{nL}$ as just $a_n$, because that's just a subscript, the labeling doesn't matter. Just remember that $n$ is no longer just an integer/multiple of 1, but a multiple of the fraction $1/L$.<br /><br />Now note how a non-periodic function is just a function with infinite period, i.e. $L\to\infty$. So $n$ stops being a discrete integer and starts approaching a continuous variable, which we'll call $s$, writing $a_n$ as $a(s)ds$ (why the $ds$? 
because the increment in $n$ is just $1/L$, which appears in the expression for $a_n$).<br /><br />\[g(x) = \int_{ - \infty }^\infty {ds\,\,a(s){e^{i2\pi sx}}} \]<br />\[a(s) = \int_{ - \infty }^\infty {dx\,\,g(x){e^{ - i2\pi sx}}} \]<br /><br />Which is just a pretty satisfying result.<br /><br /><hr /><br />Recall again the expressions we got for the Fourier transform and its inverse:<br /><br />\[f(t) = \int_{ - \infty }^\infty {ds\,\,\hat f(s){e^{i2\pi ts}}} \]<br />\[\hat f(s) = \int_{ - \infty }^\infty {dt{\kern 1pt} {\kern 1pt} f(t){e^{ - i2\pi st}}} \]<br />(We typically say the Fourier transform maps time-domain functions to frequency-domain ones, so we consider the latter to be the Fourier transform and the first equation to be its inverse.) Note how you can easily turn the first one into an actual Fourier transform, by transforming $s\to -s$:<br /><br />\[f(t) = \int_{ - \infty }^\infty {ds\,\,\hat f( - s){e^{ - i2\pi ts}}} \]<br />In other words:<br /><br />\[{\mathcal{F}^{ - 1}}\left\{ {f(s)} \right\} = \mathcal{F}\left\{ {f( - s)} \right\}\]<br />And of course that means ${\mathcal{F}^4} = I$, the identity operator (kind of like the derivative on complex exponentials/sine and cosine, is it not?).Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-68985631233167824512018-09-21T17:41:00.000+01:002018-10-02T21:58:13.809+01:00Quaternion introduction: Part II generally really like the content produced at <a href="https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw">3blue1brown</a>, but their <a href="https://www.youtube.com/watch?v=d4EgbgTm0Bg">recent video on quaternions</a> was just downright terrible. It entirely lacked Grant Sanderson's signature "discover it for yourself" approach, i.e. 
motivating the idea from the ground-up, and focused too much on an arbitrary formalism (stereographic projections aren't necessary for visualising anything).<br /><br />The right way to motivate quaternions is to start by thinking about generalising complex numbers to higher dimensions. Complex numbers are a remarkable and elegant idea -- if you don't understand why I'm saying this, you could either get off the grid and spend the rest of your life as a circus monkey, or you could read my posts "<a href="https://thewindingnumber.blogspot.com/2017/08/symmetric-matrices-null-row-space-dot-product.html">Null and row spaces, transpose and the dot product</a>" and "<a href="https://thewindingnumber.blogspot.com/2016/11/making-sense-of-eulers-formula.html">Making sense of Euler's formula</a>".<br /><br />The key idea behind complex numbers is that they are an alternate, simple representation of a specific set of linear transformations, namely: two-dimensional spirals (scaling and rotations). Note, similarly, that the real numbers can also be considered an alternate representation of e.g. scaling in one dimension.<br /><br />The natural way to generalise complex numbers to more than two dimensions may seem to be to have an imaginary unit for each possible rotation (or more precisely, each "basis rotation"). In three dimensions, the basis has three planes of rotation, and could be e.g. rotations in the <i>xy</i>-plane, rotations in the <i>yz</i>-plane and rotations in the <i>zx</i> plane (you may have heard these as rotations "around" the <i>z</i>, <i>x</i> and <i>y</i> axes respectively, referring to the axes that remain invariant during the rotation -- however, as it turns out, in a greater number of dimensions $n$, the number of dimensions held invariant is $n-2$, which is only equal to 1 -- i.e. a single axis -- in 3 dimensions. e.g. 
in 4 dimensions, an $xy$-rotation would leave the $zw$ plane invariant.)<br /><br />So let's try out this formalism, because it seems promising. We could write, e.g. <i>i</i> for the <i>yz</i> rotation, <i>j</i> for the <i>zx</i> rotation and <i>k</i> for the <i>xy</i> rotation. Try to work out some of the algebra here for yourself. What does $ij$ equal? What does $jk$ equal? What does $i^2$ equal?<br /><br />As it turns out, none of these transformations result in anything very interesting. It would have certainly been elegant if you'd gotten nice results, like $ij=k$, or something, but you don't. One of the neat things about the complex number system is that not only do all complex numbers together, or all unit complex numbers together, form a group -- even $\{1,i,-1,-i\}$ forms a group under multiplication. But $\{1,-1,i,j,k,-i,-j,-k\}$ <i>do not</i> form a group.<br /><br />How would one solve this problem? Well, the reason $i^2$ doesn't equal minus 1 is that it leaves the $x$-axis fixed -- it only flips the $y$- and $z$-axes, a half-turn of the $yz$-plane rather than of all of space. The matrix representing $i^2$ is:<br /><br />$${\left[ {\begin{array}{*{20}{c}}1&0&0\\0&0&{ - 1}\\0&1&0\end{array}} \right]^2} = \left[ {\begin{array}{*{20}{c}}1&0&0\\0&{ - 1}&0\\0&0&{ - 1}\end{array}} \right]$$<br />(If you can't come up with the matrix for $i$, you should review the linear algebra series -- or the circus monkey thing.) What if you reflected across all three axes, in some order? You'd have:<br /><br />$${i^2}{j^2}{k^2} = \left[ {\begin{array}{*{20}{c}}1&0&0\\0&{ - 1}&0\\0&0&{ - 1}\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{ - 1}&0&0\\0&1&0\\0&0&{ - 1}\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{ - 1}&0&0\\0&{ - 1}&0\\0&0&1\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}1&0&0\\0&1&0\\0&0&1\end{array}} \right]$$<br />In other words, ${i^2}{j^2}{k^2} = 1$. Additionally you may have observed while crunching the numbers above that ${i^2}{j^2} = {k^2}$.<br /><br />This may give you an idea*. 
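If you'd rather not crunch the matrices by hand, the bookkeeping is easy to check mechanically -- a small sketch (Python is my choice here), with $i$, $j$, $k$ being exactly the quarter-turn matrices described above:

```python
def matmul(A, B):
    # 3x3 matrix product -- enough machinery for these checks
    return [[sum(A[r][m] * B[m][c] for m in range(3)) for c in range(3)]
            for r in range(3)]

# quarter-turns of the coordinate planes: i rotates yz, j rotates zx, k rotates xy
i = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
j = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
k = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

i2, j2, k2 = matmul(i, i), matmul(j, j), matmul(k, k)

print(matmul(i, j) == k)           # False -- no nice algebra here
print(i2)                          # diag(1, -1, -1), not minus the identity
print(matmul(i2, j2) == k2)        # True: i^2 j^2 = k^2
print(matmul(matmul(i2, j2), k2))  # the identity matrix: i^2 j^2 k^2 = 1
```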
Here's another thing that may give you an idea: the reason you had $i^2=-1$ with complex numbers was that $i$ rotated <i>all</i> the axes in the plane. By contrast, $i,j,k$ only each rotate two of the three axes in 3-dimensional space.<br /><br />*the idea being that perhaps combinations of two rotations can give us more interesting results<br /><br />Well, how do you solve this problem? How do you create a rotation that "rotates all the axes"? Seemingly, you can't. Sure, you can define a rotation that rotates all three of the x, y and z axes, but that would still leave some other axis invariant, which we call "the axis of rotation". <b>Can we define a rotation that leaves no axis invariant?</b><br /><b><br /></b> In three dimensions, the answer is no. Any rotation leaves one axis invariant, and trying to rotate this axis requires rotating it with another axis, and the resulting product rotation still leaves some, calculable axis invariant.<br /><br /><div class="twn-furtherinsight">Calculate this axis.</div><br />The key is to extend our thinking to <i>four</i> dimensions. Here, you can have pairs of rotations <i>acting simultaneously</i> on two different pairs of axes. Since there are only four dimensions in four dimensions, all four axes are transformed.<br /><br />Now, the obvious thing to do here may be to define an imaginary number for each pair of rotations in four dimensions -- there are $\left( {\begin{array}{*{20}{c}}4\\2\end{array}} \right)=6$ rotations, and $\left( {\begin{array}{*{20}{c}}6\\2\end{array}} \right) = 15$ such pairs. But this would be too many "basis rotations", and the rotations would not be independent of each other, since rotations in 4 dimensions can be described with only 6 basis rotations.<br /><br />So how could we make use of our idea of using pairs of rotations as our basis for describing rotations?<br /><br />The key is to make one of our four axes "special" -- call this axis $t$, and the other three axes $x, y, z$. 
Instead of considering all 15 rotation-pairs, we only consider the following three:<br /><br />$$\begin{array}{l}i = (tx,yz)\\j = (ty,\overline{xz})\\k = (tz,xy)\end{array}$$<br /><div class="twn-furtherinsight">This is not the only possible representation of the quaternions, of course. Even among complex numbers, you have two possible representations -- you could make $i$ a counter-clockwise rotation, as is conventional, or a clockwise one, i.e. there is a symmetry between $i$ and $-i$. For quaternions, it turns out there are 48 different possible representations -- prove this.</div><br />Where $tx$ represents a rotation that sends $t$ to $x$ (i.e. a counter-clockwise rotation on a plane where $t$ is the x-axis and $x$ is the y-axis) and $\overline{xz}$ represents a rotation that sends $z$ to $x$, i.e. the clockwise rotation on a plane where $x$ is the x-axis and $z$ is the y-axis.<br /><br />It turns out that these pairs -- called <i>quaternions</i> -- in fact allow the representation of 3-dimensional rotations, since you need only a $\left( {\begin{array}{*{20}{c}}3\\2\end{array}} \right)=3$-dimensional basis to represent rotations in 3 dimensions.<br /><br /><div class="twn-furtherinsight">Think: Are there any other dimensions that allow such a system to be defined? Can you have, e.g. "hexternions"?</div><br /><div class="twn-pitfall">Note that however tempting it may seem, there is no known natural description of special relativity in terms of quaternions. 
Sorry.</div><br />One may work through the algebra of these new quaternions by tracking the position of each axis through the multiplication, and as it turns out, it is indeed much more elegant than the more obvious representation detailed earlier:<br /><br />$$\begin{array}{l}ij = k,\,jk = i,\,ki = j\\{i^2} = {j^2} = {k^2} = - 1\\ijk = - 1\end{array}$$<br />In the next several articles, we will look at exactly how 3-dimensional rotations can be represented with quaternions, the relation between quaternions and the dot and cross products through the commutative and anti-commutative parts, and further extensions of the quaternions to higher dimensions.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-31644623727272494992018-08-18T20:17:00.001+01:002018-08-19T09:49:14.154+01:00Why are negative temperatures hot?You've probably heard the statement "negative temperatures are <i>hot</i>!", referring of course to negative absolute temperatures.<br /><br />But why are they hot? Well, a common explanation is that it's not really the temperature $T$ that is the fundamental quantity, but rather the statistical beta, or "coldness" $\beta=1/T$. So negative temperatures have <i>negative</i> coldness, which is hotter than any positive temperature, since even the hottest positive temperature is only going to give you a small, but positive coldness. So the fact that negative temperatures are hot is a result of the fact that $1/x$ is not really decreasing everywhere, due to its discontinuity.<br /><br />But why? Why is $\beta$ the fundamental quantity? Why should we arbitrarily consider this to be our metric of hotness and coldness, and not $T$?<br /><br />This is a really interesting example to teach people to think in a positivist way in physics, and to operationalise things. 
What does it mean for something to be hot?<br /><br />Well, you touch it and you say "Ouch!"<br /><br />Seriously, that's all there is -- if you touch something hot, you say "Ouch!", if you touch something cold, you say "Whee!", or something. That's the fundamental, positivist definition of hotness -- "Does it feel hot?"<br /><br />Well, why would something feel hot? <i>Because it transfers heat to you</i>. And this is our operational, positivistic definition of hotness -- if one body transfers heat to another body, it is said to be hotter than the other body.<br /><br />So we need to find out a criterion to decide the direction of heat flow between two bodies. In the past, you've probably taken for granted that heat is transferred from a body with higher temperature to that with lower temperature, but that's just a crappy high school definition. What really causes heat diffusion? Well, when there are a lot of fast-moving particles in one place and slow-moving particles in another, it turns out that a state where the particles are more uniformly spread-out is more likely to happen in future. This is just the requirement that entropy must increase -- it's the second law of thermodynamics.<br /><br />So if we have body 1 with temperature $T_1$ and body 2 with temperature $T_2$, with heat flow of $Q$ from body 1 to body 2, then the second law of thermodynamics is stated as:<br /><br />$$\Delta S_1+\Delta S_2>0$$<br />$$-\Delta Q/T_1+\Delta Q/T_2>0$$<br />$$\Delta Q\left(\frac1{T_2}-\frac1{T_1}\right)>0$$<br />In other words -- if $\Delta Q>0$, i.e. if the heat flow is really from body 1 to body 2, then we require $1/T_2>1/T_1$, and if the heat flow is from body 2 to body 1 ($\Delta Q<0$), we require $1/T_1>1/T_2$.<br /><br />And there you have it! Heat does <i>not</i> flow from the body with higher temperature to the body with lower temperature -- it flows from the body with lower $1/T$ to the body with higher $1/T$. 
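The criterion fits in a two-line function -- a sketch (the function and its name are mine, not standard physics-library code):

```python
def is_hotter(T1, T2):
    """True if body 1 transfers heat to body 2, per the second law.

    The second law requires dQ * (1/T2 - 1/T1) > 0, so heat flows from
    lower 1/T to higher 1/T: body 1 is hotter exactly when its coldness
    beta = 1/T is lower.
    """
    return 1.0 / T1 < 1.0 / T2

print(is_hotter(300.0, 200.0))  # True: the usual hot-to-cold flow
print(is_hotter(-100.0, 1e9))   # True: any negative temperature beats
                                # any positive one
```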
For positive temperatures, these are the same thing -- but negative temperatures have the lowest $1/T$, and are thus hotter.<br /><br /><hr /><br />So those of you who want the U.S. to switch to Celsius, or those who report temperatures in Kelvin for no good reason except intellectual signalling... perhaps start reporting <i>statistical betas</i> in 1/Kelvins instead.<br /><br />...<br />"Hey, Alexa, is it chilly outside?"<br />"The coldness in your area is 0.00375 anti-Kelvin."<br /><br />"...I think I'll just risk freezing to death."Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0tag:blogger.com,1999:blog-3214648607996839529.post-13570052045939868172018-07-29T07:24:00.000+01:002018-07-30T06:01:54.204+01:00A curious infinite sum arising from an elementary geometric argumentA well-known elementary geometric argument for the sum of an infinite geometric progression proceeds as follows: consider a Euclidean triangle $\Delta ABC$ with angles $A=\alpha$, $B=\beta$, $C=2\beta$, and bisect $C$ to create a point $C'$ on $AB$. Then $\Delta ABC \sim \Delta ACC'$. Add the area of $\Delta C'BC$ to a counter. 
Repeat the same bisection with $C'$, $C''$, ad infinitum, each time adding to the counter the area of the piece of the triangle that <i>isn't</i> similar to the parent triangle and bisecting the one that is.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-b9Xt5bxIYFI/W11MmoHSv6I/AAAAAAAAFBc/a9Fv2xTtFYgM7SGtqDZRlV19sVA39ZoCgCLcBGAs/s1600/tribasic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="129" data-original-width="230" src="https://3.bp.blogspot.com/-b9Xt5bxIYFI/W11MmoHSv6I/AAAAAAAAFBc/a9Fv2xTtFYgM7SGtqDZRlV19sVA39ZoCgCLcBGAs/s1600/tribasic.png" /></a></div><br />Suppose the area of the original triangle $\Delta ABC$ is 1, and the piece $ACC'$ has area $x$ (so each succeeding similar copy has area a fraction $x$ of the preceding triangle's). Then the total value of our counter, which approaches 1, is:<br /><br />$$(1-x)+x(1-x)+x^2(1-x)+...=1$$<br />$$1+x+x^2+...=\frac1{1-x}$$<br />Where $x$ depends on the actual angle $\beta$.<br /><br />It is interesting, however, to consider the case of a general scalene triangle $\Delta ABC$ where $C$ is not necessarily twice $B$. Here each successive triangle won't be similar to the last, so we won't be dealing with a geometric series.<br /><br />Let the angles of $\Delta ABC$ be $A=\alpha$, $B=\beta$, $C=\pi-\alpha-\beta$. We bisect angle $C$, as before, adding to our counter the piece that contains the angle $B$. 
The remaining triangle has angles $\alpha$, $\frac{\pi-\alpha-\beta}{2}$ and $\pi-\alpha-\frac{\pi-\alpha-\beta}{2}$.<br /><br />We keep repeating the process, each time bisecting the angle that is neither $\alpha$ nor the angle formed as half the angle that was just bisected, adding to our counter the area of the piece that does not contain the angle $A$, and splitting the piece that does.<br /><br />To keep track of the angles in each successive triangle, we define three sequences:<br /><br />$$\begin{gathered}<br />{\alpha _n} = \alpha\\<br />{\beta _n} = {\gamma _{n - 1}}/2\\<br />{\gamma _n} = \pi - {\alpha _n} - {\beta _n}\\<br />\end{gathered}$$<br />These are defined recursively, of course, so we calculate the explicit form by substituting $\gamma_{n-1}$ into $\beta_n$ to get a recursion within $\beta_n$ -- then, with the simple initial-value conditions $\alpha_0=\alpha$, $\beta_0=\beta$, etc., we get:<br /><br />$$\begin{gathered}<br />{\alpha _n} = \alpha\\<br />{\beta _n} = \frac{{\pi - \alpha }}{3} + {\left( { - \frac{1}{2}} \right)^n}\left( {\beta - \frac{{\pi - \alpha }}{3}} \right)\\<br />{\gamma _n} = \frac{{2(\pi - \alpha )}}{3} - {\left( { - \frac{1}{2}} \right)^n}\left( {\beta - \frac{{\pi - \alpha }}{3}} \right)\\<br />\end{gathered}$$<br />The area ratio of the piece we set aside at each stage (relative to its parent triangle) is $\frac{{\sin {\alpha _n}}}{{\sin {\alpha _n} + \sin {\beta _n}}}$, so the convergence of the sum of their areas to 1 implies:<br /><br />$$\begin{gathered}<br />\frac{{\sin \alpha }}{{\sin \alpha + \sin \beta }} + \frac{{\sin \beta }}{{\sin \alpha + \sin \beta }}\frac{{\sin \alpha }}{{\sin \alpha + \sin {\beta _1}}} \hfill \\<br />\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, + \frac{{\sin \beta }}{{\sin \alpha + \sin \beta }}\frac{{\sin {\beta _1}}}{{\sin \alpha + \sin {\beta _1}}}\frac{{\sin \alpha }}{{\sin \alpha + \sin {\beta _2}}} + ... 
= 1 \hfill \\ <br />\end{gathered} $$<br />Or more compactly:<br /><br />$$\sum\limits_{k = 0}^\infty {\left[ \left(1-x_k(\alpha,\beta)\right)\prod\limits_{j = 0}^{k - 1} {x_j(\alpha,\beta)} \right]} = 1$$<br />Where:<br /><br />$${x_k}(\alpha ,\beta ) = \frac{{\sin \left( {\frac{{\pi - \alpha }}{3} + {{\left( { - \frac{1}{2}} \right)}^k}\left( {\beta - \frac{{\pi - \alpha }}{3}} \right)} \right)}}{{\sin \alpha + \sin \left( {\frac{{\pi - \alpha }}{3} + {{\left( { - \frac{1}{2}} \right)}^k}\left( {\beta - \frac{{\pi - \alpha }}{3}} \right)} \right)}}$$<br />for all values of $\alpha$ and $\beta$.<br /><br /><hr /><br />Well, have we truly discovered something new?<br /><br />Turns out, no. It doesn't even matter what $x_k(\alpha,\beta)$ is, really -- the identity $\sum\limits_{k = 0}^\infty {\left[ \left(1-x_k(\alpha,\beta)\right)\prod\limits_{j = 0}^{k - 1} {x_j(\alpha,\beta)} \right]} = 1$ will always be true. Indeed, it is a telescoping sum:<br /><br />$$\begin{gathered}<br />1 - {x_0} + \hfill \\<br />\left( {1 - {x_1}} \right){x_0} + \hfill \\<br />\left( {1 - {x_2}} \right){x_0}{x_1} + \hfill \\<br />\left( {1 - {x_3}} \right){x_0}{x_1}{x_2} + \hfill \\<br />... = 1 \hfill \\ <br />\end{gathered} $$<br />All that is required is that the final term, $x_0x_1x_2\cdots x_k$, approaches 0 as $k\to\infty$ -- <a href="https://thewindingnumber.blogspot.com/2018/07/intuition-to-convergence.html">this ensures the convergence of the sum</a>. (So I suppose I was not completely right when I said it doesn't matter what $x_k$ is -- but considering renormalisation and stuff, I kinda was.)<br /><br />This raises two interesting questions:<br /><ol><li>How would this "telescoping sum" argument work for the simple geometric series?</li><li>Can we get interesting, incorrect (perhaps "renormalised"?) sums by choosing an $x_k$ sequence whose product doesn't approach zero?</li></ol><br />Well, for the geometric series, we had $\beta = (\pi - \alpha )/3$, so that ${x_k}(\alpha ,\beta ) = x(\alpha,\beta)=\frac{{\sin \beta }}{{\sin \alpha + \sin \beta }}$. Indeed, one may confirm that setting $x_0=x_1=x_2=\ldots=x$ yields the geometric series multiplied by $1-x$, which happens to telescope. This is really just the standard proof of the series, where we multiply the sum by $x$, subtract the result from the original sum, etc.<br /><br />As for the second question -- consider, for example, $x_k=k+1$. It gives you the sum $1!\cdot1+2!\cdot2+3!\cdot3+...=-1$. Of course, this is just the identity $n\cdot n!=(n+1)!-n!$, and the telescope doesn't really cancel out, so you're left with $\infty!-1$.Abhimanyu Pallavi Sudhirhttp://www.blogger.com/profile/17891661511466934614noreply@blogger.com0
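The telescoping identity at the heart of this post is easy to check numerically. Here is a minimal Python sketch (function names are my own, not from the post) that builds $x_k(\alpha,\beta)$ from the closed form for $\beta_k$ and accumulates $(1-x_k)\prod_{j<k}x_j$:

```python
import math

def x_k(alpha, beta, k):
    """Area fraction of the piece that gets split further at step k:
    sin(beta_k) / (sin(alpha) + sin(beta_k)), using the closed form
    beta_k = (pi - alpha)/3 + (-1/2)^k * (beta - (pi - alpha)/3)."""
    beta_k = (math.pi - alpha) / 3 + (-0.5) ** k * (beta - (math.pi - alpha) / 3)
    return math.sin(beta_k) / (math.sin(alpha) + math.sin(beta_k))

def telescoping_sum(alpha, beta, terms=60):
    """Partial sum of (1 - x_k) * x_0 x_1 ... x_{k-1}; should approach 1."""
    total, prod = 0.0, 1.0
    for k in range(terms):
        xk = x_k(alpha, beta, k)
        total += (1 - xk) * prod  # area set aside at step k
        prod *= xk                # area of the triangle still being split
    return total

print(telescoping_sum(1.0, 0.7))  # ≈ 1.0 for any valid triangle angles
```

Since $x_k$ converges to $\frac{\sin\beta^*}{\sin\alpha+\sin\beta^*}<1$ with $\beta^*=(\pi-\alpha)/3$, the running product $x_0x_1\cdots x_{k-1}$ decays geometrically, which is exactly the condition for the telescope to close.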