Skeptic but Jewish: Simplicity of SO(3)

I wanted to make a post about the simplicity of $\text{SO}(3)$, the group of rotations of $\mathbb{R}^3$, that is, the orthogonal linear operators on $\mathbb{R}^3$ with determinant $1$. The proof is really beautiful, and part of the beauty is that it is topological and algebraic at the same time. The proof also touches on a lot of mathematical ideas that are important in their own right. Ironically, the proof that $\text{SO}(3)$ is simple is not simple.

This post will mostly be self-contained. But I cannot explain everything, otherwise it will be too long. So I will assume that the reader is familiar with the following mathematical concepts: group theory, linear algebra, quaternions, and point-set topology.

Group theory, linear algebra and point-set topology are standard mathematical subjects that math students should already be familiar with, so I will not say anything about them. Quaternions are something that some math students have probably never studied, so I will mention a few basic facts about them. I will not prove anything. If you have any questions or want references, please ask in the comment section and I will hopefully expound on it.

The quaternions $\mathbb{H}$ are an extension of $\mathbb{C}$. I will not define them; that information can be found on Wikipedia. What I do want to say is that if $\alpha \in \mathbb{H}$ is a unit quaternion, i.e. $|\alpha| = 1$, then we can write $\alpha = \cos \theta + u \sin \theta$ where $u$ is a unit quaternion in $\mathbb{R}i+\mathbb{R}j+\mathbb{R}k = \{ai+bj+ck \mid a,b,c\in \mathbb{R}\}$. Notice that $u$ satisfies the equation $u^2=-1$, so $u$ plays the role of a square root of $-1$.

The relevance of this representation is that rotations of $\mathbb{R}^3$ can be described by quaternions of the form $\cos \theta + u\sin \theta$ in the following manner. A rotation of $\mathbb{R}^3$ (centered at the origin) is determined by its axis of rotation, called the "pole", and its angle. We can choose a unit vector $(u_1,u_2,u_3)$ to represent this pole, with the direction of rotation around this vector determined by the right-hand rule. We can identify $\mathbb{R}^3$ with $\mathbb{R}i+\mathbb{R}j+\mathbb{R}k$ in a natural manner by the one-to-one correspondence $(x,y,z)\leftrightarrow xi+yj+zk$. Let $\alpha = \cos \theta + u\sin \theta$ where $u=u_1i+u_2j+u_3k$. Then the mapping $w \mapsto \alpha w\alpha^{-1}$ rotates $\mathbb{R}i+\mathbb{R}j+\mathbb{R}k$ around $u$ by angle $2\theta$ (the mapping $w \mapsto \alpha^{-1}w\alpha$ is the inverse rotation, by angle $-2\theta$). Note the doubling of the angle: if we have a pole in $\mathbb{R}^3$ and a desired angle of rotation $\theta$, the corresponding quaternion is $\cos \frac{\theta}{2} + u\sin \frac{\theta}{2}$. This is the geometry that we need to understand about the quaternions. Again, I am sorry for not proving these results. I just want to start somewhere and save time. If you have questions just ask.
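Since I am not proving the rotation formula, here is a small numerical sanity check instead. It is only a sketch, assuming the convention $w \mapsto \alpha w\alpha^{-1}$ with $\alpha = \cos\frac{\theta}{2} + u\sin\frac{\theta}{2}$; the helper names `qmul`, `qconj` and `rotate` are my own and quaternions are stored as 4-tuples $(a,b,c,d) = a+bi+cj+dk$.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Quaternion conjugate; for a unit quaternion this is also the inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def rotate(axis, angle, v):
    """Rotate v in R^3 about the unit vector `axis` by `angle`, via w -> alpha w alpha^{-1}."""
    half = angle / 2.0
    alpha = (math.cos(half),) + tuple(math.sin(half) * x for x in axis)
    w = (0.0,) + tuple(v)  # identify (x, y, z) with xi + yj + zk
    return qmul(qmul(alpha, w), qconj(alpha))[1:]

# Rotating the x-axis by a quarter turn about the z-axis should land on the y-axis.
rotated = rotate((0.0, 0.0, 1.0), math.pi / 2, (1.0, 0.0, 0.0))
print(rotated)
```

Up to rounding error the printed vector should be $(0,1,0)$, which is what the right-hand rule predicts for a quarter turn about the $z$-axis.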

We will prove that $\text{SO}(3)$ is simple by working instead with a very similar group, $\text{SU}(2)$, the special unitary group of $\mathbb{C}^2$. These are the unitary linear operators on the space $\mathbb{C}^2$ with determinant $1$. In matrix form the definition is the following:
$$\text{SU}(2) = \left\{ \begin{bmatrix} \alpha & -\beta \\ \overline{\beta} & \overline{\alpha} \end{bmatrix} : \alpha,\beta \in \mathbb{C},\ |\alpha|^2+|\beta|^2=1 \right\}$$
The condition on $\alpha,\beta$ is just the determinant condition: if $M\in \text{SU}(2)$ then $1 = \det M = \alpha\overline{\alpha}+\beta\overline{\beta} = |\alpha|^2 + |\beta|^2$.

Write $\alpha = x_1+x_4i$ and $\beta = x_2+x_3i$ so that $\begin{bmatrix} \alpha & -\beta \\ \overline{\beta} & \overline{\alpha} \end{bmatrix} = \begin{bmatrix} x_1+x_4i & -x_2-x_3i \\ x_2-x_3i &x_1-x_4i \end{bmatrix}$. Now define $U = \{ \omega \in \mathbb{H} : |\omega| = 1\}$, that is, $U$ is the subgroup of the multiplicative group of non-zero quaternions consisting of all unit quaternions. Notice that the one-to-one correspondence $\begin{bmatrix} x_1+x_4i & -x_2-x_3i \\ x_2-x_3i &x_1-x_4i \end{bmatrix} \leftrightarrow x_1+x_2i+x_3j+x_4k$ is an isomorphism between $\text{SU}(2)$ and $U$. The fact that this correspondence is a bijection is obvious. The only tricky part is to prove that it is a homomorphism, but that is also easy: just multiply out the matrices and notice that the correspondence preserves quaternion multiplication.
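If you do not feel like multiplying out the matrices by hand, the homomorphism property can be spot-checked numerically. The following is a minimal sketch, not a proof; the function names `to_matrix`, `matmul2`, `qmul` and `close` are my own, and the two sample points `p`, `q` are arbitrary unit quaternions.

```python
def to_matrix(q):
    """Send x1 + x2 i + x3 j + x4 k to the matrix [[alpha, -beta], [conj(beta), conj(alpha)]]."""
    x1, x2, x3, x4 = q
    alpha = complex(x1, x4)  # alpha = x1 + x4 i
    beta = complex(x2, x3)   # beta  = x2 + x3 i
    return ((alpha, -beta), (beta.conjugate(), alpha.conjugate()))

def matmul2(A, B):
    """Product of two 2x2 complex matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 complex matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

p = (0.5, 0.5, 0.5, 0.5)  # unit quaternion
q = (0.6, 0.0, 0.8, 0.0)  # unit quaternion

# Multiplying the matrices should agree with multiplying the quaternions first.
homomorphism_ok = close(matmul2(to_matrix(p), to_matrix(q)), to_matrix(qmul(p, q)))
print(homomorphism_ok)
```

The same check on the basis elements $i,j,k$ recovers the defining relations $i^2=j^2=k^2=-1$ and $ij=k$, which is really all that the hand computation amounts to.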

We have discovered that we can identify $\text{SU}(2)$ with $U$, the unit quaternions, in a way that preserves the algebraic structure. But $U$ is just the collection of all quaternions $x_1+x_2i+x_3j+x_4k$ such that $x_1^2+x_2^2+x_3^2+x_4^2=1$. Thus, we can further identify $U$ with $\mathbb{S}^3$, the 3-sphere. We can therefore define a multiplication on $\mathbb{S}^3$ that is compatible with the identification with $U$. That is, if $p,q$ are points on the 3-sphere with $\alpha \leftrightarrow p$ and $\beta \leftrightarrow q$, where $\alpha,\beta \in U$, then we define $pq$ to be the point on $\mathbb{S}^3$ which corresponds to $\alpha\beta$. With this construction we have defined a group structure on the 3-sphere.

The point of all our work is that we can think of $\mathbb{S}^3$ both as a group, isomorphic to $\text{SU}(2)$, and as a topological space, with the usual topology of the 3-sphere. We will determine all the normal subgroups of $\mathbb{S}^3$ by working with its topology. This is a really wonderful and beautiful way to solve an algebraic problem.

The group structure on $\mathbb{S}^3$ that we obtain is compatible with the topology, in the sense that left multiplication is continuous. In other words, if $g\in \mathbb{S}^3$ then the map defined by $w\mapsto gw$ is a continuous function $\mathbb{S}^3\to \mathbb{S}^3$. The reason for this is very simple. For any $w_1,w_2\in\mathbb{S}^3$ we have $|gw_1-gw_2| = |g(w_1-w_2)| = |g||w_1-w_2|=|w_1-w_2|$, where the norm is the standard Euclidean norm. This follows from our identification of $\mathbb{S}^3$ with $U$ and the fact that the norm on $\mathbb{H}$ is multiplicative. Thus, given $\varepsilon>0$, if we choose $\delta = \varepsilon$ then for all $w_1,w_2\in \mathbb{S}^3$ which satisfy $|w_1-w_2|<\delta$ we have $|gw_1-gw_2|<\varepsilon$. This proves that $w\mapsto gw$ is a (uniformly) continuous function $\mathbb{S}^3\to \mathbb{S}^3$. Actually, we have proven something stronger: left multiplication on the 3-sphere is an isometry, and in particular Lipschitz continuous. But that is beside the point, we do not need this result anywhere.

As a corollary we can easily show that $w\mapsto gw$ is a homeomorphism. Note that $w\mapsto g^{-1}w$ is continuous and the composition of both of these maps is the identity map on $\mathbb{S}^3$. Thus, $w\mapsto gw$ is a homeomorphism of $\mathbb{S}^3$. In particular, it is an open map. If $W\subseteq \mathbb{S}^3$ is an open subset then $gW = \{gw:w\in W\}$ will stay an open subset of $\mathbb{S}^3$.

Now it is time to get visual. We will visualize $\mathbb{S}^3$. Of course, we cannot do this because it is a three dimensional sphere embedded in four dimensional space. It is impossible to visualize it, sadly. However, that inconvenience is for mere mortals. We are mathematicians after all, we can visualize anything we imagine. For a point $(x_1,x_2,x_3,x_4)$ let us imagine the vertical axis (or the z-axis if that makes it more comfortable) to represent the axis for the first coordinate $x_1$. Then think of a 2-sphere with the vertical axis (z-axis) being $x_1$. Of course, this 2-sphere is a simplified visualization of the 3-sphere, but for our purposes it will help motivate what is going on in what follows. Every point on this 3-sphere (which we simply think of as a 2-sphere) represents a matrix in $\text{SU}(2)$. The north pole, that is, $(1,0,0,0)$, is identified with $I$, the identity matrix (under the correspondence we defined earlier), and the south pole, that is, $(-1,0,0,0)$, is identified with $-I$.

If we set $x_1=c$ for some constant $-1<c<1$, then the equation $x_1^2+x_2^2+x_3^2+x_4^2=1$ becomes $x_2^2+x_3^2+x_4^2=1-c^2$ and geometrically describes the latitude of $\mathbb{S}^3$ at height $c$ above the origin in the vertical direction. Each such latitude is a 2-sphere of radius $\sqrt{1-c^2}$. As we vary $c$ we get different latitudes. In the extreme cases $c=\pm 1$ the latitudes are not really latitudes, they are the north and south poles.

Let $M=\begin{bmatrix} c+x_4i & -x_2-x_3i \\ x_2-x_3i &c-x_4i \end{bmatrix}$ be a point on the latitude $x_1=c$ (more precisely, the matrix in $\text{SU}(2)$ which is identified with a point on this latitude). The characteristic polynomial of this matrix is $\lambda^2 - 2c\lambda + 1$. If $\lambda_1,\lambda_2$ (counting multiplicities) are the roots of this polynomial then $\text{tr}(M) = \lambda_1+\lambda_2$. But the sum of the roots of a monic quadratic, from high-school algebra, is just the negative of the linear coefficient, which here is $2c$. (Of course this can also be read off directly from the diagonal of $M$.) Thus, all matrices on the latitude $x_1=c$ have trace equal to $2c$. Since the trace of a matrix is preserved under conjugation, it follows that all conjugate matrices have the same trace, that is, they lie on the same latitude.

Now we prove the converse. That is, if $A,B$ are two matrices in $\text{SU}(2)$ lying on the same latitude of $\mathbb{S}^3$, i.e. $\text{tr}(A)=\text{tr}(B)$, then they are conjugate. Let the common trace have value $2c$, where $c$ is the height of the latitude above the origin. Then $A$ and $B$ have the same characteristic polynomial $\lambda^2 - 2c\lambda + 1$. Notice that this is a quadratic polynomial with real coefficients whose discriminant $4c^2-4$ is not positive, because $|c|\leq 1$. Therefore, it has two roots (counting multiplicity) that are complex conjugates of each other. Thus, we can call these roots $\lambda, \overline{\lambda}$. To prove that $A,B$ are conjugate to one another in $\text{SU}(2)$ it suffices to prove that $A$ and $B$ are both conjugate to the same matrix in $\text{SU}(2)$, since conjugacy is an equivalence relation. We will prove that both $A,B$ are conjugate to $\begin{bmatrix}\lambda & 0 \\ 0&\overline{\lambda} \end{bmatrix}$. Since $A$ is a unitary operator it can be unitarily diagonalized by the spectral theorem. Thus, there is a unitary matrix $M$ such that $MAM^*$ is diagonal, where $M^*$ denotes the Hermitian transpose. Multiplying $M$ by a suitable scalar of absolute value $1$ does not change $MAM^*$, so we may assume that $\det M=1$, that is, $M\in \text{SU}(2)$. The diagonal entries of $MAM^*$ are the eigenvalues $\lambda,\overline{\lambda}$. Thus, $MAM^*$ is either the matrix $\begin{bmatrix} \lambda & 0 \\ 0& \overline{\lambda} \end{bmatrix}$ or $\begin{bmatrix} \overline{\lambda} & 0 \\ 0 & \lambda \end{bmatrix}$. In the first case we have proved that $A$ is conjugate to the desired matrix. In the second case we can conjugate this matrix by the matrix $R=\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\in \text{SU}(2)$, which swaps the diagonal entries, so that $(RM)A(RM)^*$ has the desired form. Repeating the same argument for $B$ we can show that it can also be brought to the form $\begin{bmatrix} \lambda & 0 \\ 0 & \overline{\lambda}\end{bmatrix}$ by conjugation by a matrix of $\text{SU}(2)$.
Thus, we have proved that two matrices in $\text{SU}(2)$ are conjugate to one another if and only if they have the same trace (which geometrically means that they lie on the same latitude, and that all matrices on a latitude are conjugate to one another).
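The "same latitude implies conjugate" statement can also be seen concretely in the unit quaternion picture. The sketch below does not follow the spectral theorem argument above; it swaps in a different, explicit construction: for unit pure quaternions $u,v$ with $v\neq -u$, the quaternion $\gamma = (1-vu)/|1-vu|$ satisfies $\gamma u\gamma^{-1}=v$ (because $(1-vu)u = u+v = v(1-vu)$), and so it conjugates $\cos\theta+u\sin\theta$ to $\cos\theta+v\sin\theta$. The names `conjugator`, `a`, `b`, `gamma` are mine, and the sample values are arbitrary.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Quaternion conjugate; for a unit quaternion this is also the inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def conjugator(u, v):
    """Unit quaternion gamma with gamma u gamma^{-1} = v (assumes u, v unit pure and v != -u)."""
    vu = qmul(v, u)
    g = (1.0 - vu[0], -vu[1], -vu[2], -vu[3])  # g = 1 - vu
    n = math.sqrt(sum(x * x for x in g))
    return tuple(x / n for x in g)

theta = 0.7
u = (0.0, 1.0, 0.0, 0.0)  # pole i
v = (0.0, 0.0, 0.6, 0.8)  # another unit pure quaternion

# Two points on the same latitude x1 = cos(theta).
a = (math.cos(theta),) + tuple(math.sin(theta) * x for x in u[1:])
b = (math.cos(theta),) + tuple(math.sin(theta) * x for x in v[1:])

gamma = conjugator(u, v)
conjugated = qmul(qmul(gamma, a), qconj(gamma))  # gamma a gamma^{-1}
print(conjugated)
print(b)
```

The two printed quaternions should agree up to rounding, and both have real part $\cos\theta$, i.e. they sit on the same latitude.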

We now have the machinery to determine all the normal subgroups of $\text{SU}(2)$. Clearly, the center $Z$ of $\text{SU}(2)$ is a normal subgroup. The center is simply $Z = \{ \pm I \}$ (the proof is essentially the same as the one given in an earlier post). Our argument will be that this is the only proper non-trivial normal subgroup of $\text{SU}(2)$. To prove this let $N$ be a normal subgroup containing a matrix different from $\pm I$. If $M\in N$ is such a matrix it has to lie on one of the latitudes, $x_1=c$, $|c|<1$, where $\text{tr}(M)=2c$. Now since $N$ is a normal subgroup it must contain all conjugates of $M$. But we determined that the conjugates of $M$ make up the entire latitude $x_1=c$ on $\mathbb{S}^3$. Pick any matrix $M_0$ on this latitude such that $M_0\not = M$. Let $f:[0,1]\to \mathbb{S}^3$ be a path such that $f(t)$ lies on the latitude with $f(0)=M_0$ and $f(1)=M$. Clearly there is such a path because the latitude is a 2-sphere, which is path-connected. Now consider the function $g:[0,1]\to N$ defined by $g(t)=M_0^{-1}f(t)$. This function is well-defined, i.e. $g(t)\in N$, because $N$ is a subgroup and it contains $M_0^{-1}$ and all $f(t)$, since they lie on the latitude (conjugacy class). Furthermore, $g$ is a continuous function because, as we said, left multiplication in $\mathbb{S}^3$ is continuous. Now define $h= \text{tr}(g)$. This is also a continuous function $h:[0,1]\to \mathbb{R}$, because the trace is just twice the coordinate $x_1$. Notice that $g(0)=I$, therefore $h(0)=2$, while $g(1)\not = I$, so $h(1)<2$. Now by the intermediate value theorem it follows that $h(t)$ takes on all values in the interval $[h(1),2]$. So $N$ contains a matrix with trace $v$ for every value $v$ in $[h(1),2]$. But $N$ is normal, so it contains the entire latitude for each such trace. Since a matrix with trace $v$ lies on the latitude $x_1=v/2$, we have proven that $N$ contains the entire cap $x_1>h(1)/2$ on the sphere $\mathbb{S}^3$. This cap is clearly an open set in $\mathbb{S}^3$ containing $I$. Let us call this open cap $W$. If $g\in \text{SU}(2)$ then by what we said above $gW$ is an open set containing $g$.

Consider the cosets $\text{SU}(2)/N$. Let $\{ g_sN \mid s\in S\}$ be the collection of distinct cosets, where the $g_s\in \text{SU}(2)$ are representatives. Notice that if $g,g_0$ lie in the same coset $g_sN$ for some $s\in S$, then $gW,g_0W\subseteq g_sN$, because $W\subseteq N$. If however $g,g_0$ lie in different cosets then $gW\subseteq gN$ and $g_0W\subseteq g_0N$ but $gN\cap g_0N=\emptyset$, so that $gW\cap g_0W=\emptyset$. For every $s\in S$ define $V_{g_s}$ to be the union of all $gW$ where $g$ ranges over the coset $g_sN$ (so it must be open since it is a union of open sets). It follows from what we said above that the $V_{g_s}$ form a disjoint collection of non-empty open sets. Moreover, every element $g\in \text{SU}(2)$ lies in $gW$, so the sets $V_{g_s}$ cover $\text{SU}(2)$. Therefore, we can write $\text{SU}(2)$ as a disjoint union of the open sets $V_{g_s}$ where $s$ ranges over $S$. But $\text{SU}(2)$ is a connected space, therefore it is impossible to write $\text{SU}(2)$ as a disjoint union of two or more non-empty open sets. This means that $\text{SU}(2)/N$ consists of just a single coset, which means that $N=\text{SU}(2)$.
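To make the intermediate value step less abstract, here is a sketch of the function $h(t)=\text{tr}\big(M_0^{-1}f(t)\big)$ for one concrete choice of path, computed in the unit quaternion picture where the trace is twice the real part. The latitude $c=0.5$, the endpoints and the quarter-circle path `f` are my own illustrative choices, not anything forced by the proof.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Quaternion conjugate; for a unit quaternion this is also the inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)

c = 0.5                      # height of the latitude x1 = c
s = math.sqrt(1.0 - c * c)   # radius of the latitude
M0 = (c, s, 0.0, 0.0)        # starting point on the latitude

def f(t):
    """A path on the latitude from M0 = c + s i (t = 0) to M = c + s j (t = 1)."""
    ang = math.pi * t / 2.0
    return (c, s * math.cos(ang), s * math.sin(ang), 0.0)

def h(t):
    """Trace of M0^{-1} f(t); in the quaternion picture the trace is twice the real part."""
    g = qmul(qconj(M0), f(t))
    return 2.0 * g[0]

samples = [h(k / 10.0) for k in range(11)]
print(samples)
```

For this path $h(t) = 2\big(c^2+(1-c^2)\cos\frac{\pi t}{2}\big)$, so the values slide continuously from $2$ down to $2c^2 = 0.5$, and every trace in between is hit by some element of $N$.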

Since $Z$ is the only proper non-trivial normal subgroup of $\text{SU}(2)$, it follows that $Z$ is a maximal normal subgroup. Therefore, $\text{SU}(2)/Z$ must be a simple group, because a normal subgroup of the quotient would pull back to a normal subgroup of $\text{SU}(2)$ containing $Z$. We will have finally proved that $\text{SO}(3)$ is simple if we can prove that $\text{SO}(3)$ is isomorphic to $\text{SU}(2)/Z$. This is the last obstacle that stands in our way.

We will use the first isomorphism theorem. We define $\phi:\text{SU}(2)\to \text{SO}(3)$ in the following manner. If $\alpha \in \text{SU}(2)$ then we can identify it with a unit quaternion, and so write it as $\cos \theta + u\sin\theta$, where $u=u_1i+u_2j+u_3k$ with $u_1^2+u_2^2+u_3^2=1$. Remember that conjugation by $\cos \theta + u\sin\theta$, meaning the map $w\mapsto \alpha w\alpha^{-1}$ on $\mathbb{R}i+\mathbb{R}j+\mathbb{R}k$, is a rotation around $(u_1,u_2,u_3)$ through the angle $2\theta$. Rotations are exactly the special orthogonal operators of $\mathbb{R}^3$, so this gives us a special orthogonal operator $\phi (\alpha)$. This mapping is a homomorphism because conjugating by $\beta$ and then by $\alpha$ is the same as conjugating by $\alpha\beta$, that is, $\alpha(\beta w\beta^{-1})\alpha^{-1} = (\alpha\beta)w(\alpha\beta)^{-1}$, and so $\phi(\alpha\beta)=\phi(\alpha)\phi(\beta)$. It is onto because every rotation has a pole and an angle, and every pole and angle come from some unit quaternion. But this map is not one-to-one. It is easy to see that $\ker \phi = Z$ because the trivial rotation comes from $\cos \theta + u \sin\theta$ where $2\theta $ is a multiple of $2\pi$, i.e. $\theta = 0,\pi$ up to multiples of $2\pi$, which corresponds to the matrices $\{\pm I\}=Z$. Thus, by the first isomorphism theorem it follows that $\text{SU}(2)/Z \simeq \text{SO}(3)$, and since $\text{SU}(2)/Z$ is simple we have finally proven that $\text{SO}(3)$ is simple!
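The properties of $\phi$ used here can be spot-checked numerically. The following is a rough sketch under the assumption that $\phi(\alpha)$ is the map $w\mapsto\alpha w\alpha^{-1}$ written as a $3\times 3$ matrix in the basis $i,j,k$; the helper names `phi`, `matmul3`, `close3` are mine and the sample quaternions are arbitrary.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Quaternion conjugate; for a unit quaternion this is also the inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def phi(alpha):
    """3x3 matrix of w -> alpha w alpha^{-1} in the basis i, j, k (columns are the images)."""
    basis = [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)]
    images = [qmul(qmul(alpha, e), qconj(alpha))[1:] for e in basis]
    return tuple(tuple(images[j][i] for j in range(3)) for i in range(3))

def matmul3(A, B):
    """Product of two 3x3 real matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def close3(A, B, tol=1e-12):
    """Entrywise comparison of two 3x3 matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

alpha = (0.5, 0.5, 0.5, 0.5)
beta = (0.6, 0.0, 0.8, 0.0)
minus_alpha = tuple(-x for x in alpha)

# phi is multiplicative, and alpha and -alpha give the same rotation.
is_homomorphism = close3(phi(qmul(alpha, beta)), matmul3(phi(alpha), phi(beta)))
same_rotation = close3(phi(alpha), phi(minus_alpha))
print(is_homomorphism, same_rotation)
```

The second check is the two-to-one behaviour behind $\ker\phi = \{\pm I\}$: a unit quaternion and its negative always give the same rotation.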

Remark: If you were careful you should have noticed that we proved that if a connected group has a continuous group operation (as was the case with $\mathbb{S}^3$) then any proper subgroup must have empty interior.

Tuesday, December 21, 2010
