The Structure of 2x2 Unitary Matrices

Henry G. Baker

May, 1996

Copyright (c) 1996 by Henry G. Baker (hbaker@netcom.com).  All rights reserved.

This file can be found as URL:
ftp://ftp.netcom.com/pub/hb/hbaker/quaternion/unitary-2x2.txt

This ASCII note is best read with a _fixed-width font_.

Abstract
--------

Many linear algebra books wax eloquent on the properties of 'unitary'
matrices of complex numbers, which are complex analogues of (real)
'orthogonal' matrices, but don't give much insight into what a unitary
matrix 'looks like'.  In this short memo, we proceed from the
definition of a unitary matrix and show how these constraints force
every 2x2 unitary matrix U into a certain simple form U = D1 O D2,
where D1,D2 are complex diagonal matrices, and O is a real orthogonal
matrix.  Furthermore, the linear function vU of a complex vector v and
the 2x2 unitary matrix U can be represented by a triple quaternion
product Q1 v Q2, where Q1,Q2 are quaternions trivially derived from U.

Notation
--------

Since this is an ascii text file, we are forced to severely curtail
the use of mathematical notations.  About the only symbols that will
be needed are the usual (TeX) symbols "^" for superscript and "_" for
subscript.  However, we will _not_ be making use of the TeX grouping
brackets "{}", but will instead use parentheses.  We use "/=" for _not
equal_. 

Unless otherwise noted, all matrices in this memo will be _square_.

If M is a matrix, then M_ij is the ij'th element of M.  The notation
(M_ij) means the matrix constructed from the elements M_ij, where i,j
range over the number of rows and columns of M, respectively.  Thus,
M=(M_ij).

The _transpose_ of a matrix M is the matrix M^T = (M_ji).

If z is a complex number, then z* is the complex _conjugate_ of z --
i.e., if z=x+iy, then z* = x-iy.

The _conjugate_ of a matrix M is the matrix M* = (M_ij*) -- i.e., the
matrix of the conjugates of the elements of M.

The _tranjugate_ of a matrix M is the conjugate of the transpose of M,
or equivalently, the transpose of the conjugate of M.  We will use the
notation M' for the tranjugate of the matrix M.

For example, the tranjugate of

[W X]    [W* Y*]
[Y Z] is [X* Z*].

We denote the _determinant_ of the matrix M by det(M), or, when
writing out the matrix explicitly, by using "|" brackets instead of
"[]" brackets.

The determinant of

[W X]
[Y Z] is WZ-XY.

The _inverse_ of a matrix M (assuming that it has one, i.e.,
det(M)/=0) is M^-1 -- i.e, M M^-1 = M^-1 M = I, where I is the
_identity_ matrix of the same size as M.

The inverse of

    [W X]               [ Z -X]
M = [Y Z] is (1/det(M)) [-Y  W].

A _diagonal_ matrix is a matrix whose only non-zero entries are on the
main diagonal -- i.e., if M is diagonal, then M_ij=0 for i/=j.  We
will use the notation diag(x,y) for a 2x2 diagonal matrix whose
non-zero elements along the main diagonal are x and y.  The 2x2
identity matrix is

            [1 0]
diag(1,1) = [0 1].

Unitary Matrices
----------------

A _unitary_ matrix U is a nonsingular (det(U)/=0) matrix such that
U^-1 = U' -- i.e., its inverse is equal to its tranjugate.

Trivial properties of unitary matrices:

* the transpose, inverse, conjugate and tranjugate of a unitary matrix
  are all themselves unitary.

* the product of two or more unitary matrices (of the same size) is
  unitary.

* the determinant of a unitary matrix has _absolute value_ of 1.

The proof of this last property is as follows.  If U is unitary, then
UU'=I, hence det(UU')=det(U)det(U')=det(U)det(U*)=det(U)det(U)*=det(I)=1.

Dot Products
------------

An appropriate 'dot product'/'inner product' for (row) vectors v with
complex elements is vv' -- i.e., if v = [z1 z2] is a 2-element row
vector, then

[z1 z2] [z1 z2]' = [z1 z2] [z1*]
                           [z2*] = z1 z1* + z2 z2* = v v'

is an appropriate dot product.  Note that for any complex vector v,
its _norm_ vv' is _real_ and _non-negative_.

A vector v is _normalized_ if vv'=1.  If uu'/=0 for a vector u, then
the vector (1/(uu'))u is normalized.

Two such vectors v,u are _orthogonal_ if vu'=0.

A set of vectors is _orthonormal_ if vv'=1 (i.e., v is normalized) for
all v, and vu'=0 for all v/=u.

We now have a few more trivial properties of unitary matrices:

* the _rows_ of a unitary matrix are orthonormal -- i.e., normalized
  and orthogonal to one another.

* the _columns_ of a unitary matrix are orthonormal -- i.e.,
  normalized and orthogonal to one another.

* A unitary matrix U preserves norms.

The proof of this last property is as follows.

norm(vU)=(vU)(vU)'=vUU'v'=vIv'=vv'=norm(v)


The Structure of 2x2 Unitary Matrices
-------------------------------------

We now have enough machinery to reveal the structure of 2x2 unitary
matrices.

We start with an arbitrary 2x2 matrix of complex numbers, and start
adding the constraints required by unitariness to see what effect
these constraints have on our arbitrary numbers.

Consider the arbitrary 2x2 complex unitary matrix

    [W X]
U = [Y Z].

We wish to utilize the various constraints on U to determine
constraints on the individual entries W,X,Y, and Z.

Our first constraint is that the first _row_ [W X] must be normalized:

[W X] [W X]' = [W X] [W*]
                     [X*] = WW* + XX* = 1.

The form of this constraint tells us that we might be better off
considering our original matrix with its elements in _polar_ form --
i.e., W=|W|exp(iw), X=|X|exp(ix), Y=-|Y|exp(iy), Z=|Z|exp(iz), or

    [ |W|exp(iw) |X|exp(ix)]
U = [-|Y|exp(iy) |Z|exp(iz)]

(The "-" sign in the lower left hand corner of U appears to be
arbitrary right now, but we choose it this way because it makes the
later expressions come out simpler.)

The fact that the first row must be normalized can now be expressed as

WW* + XX* = |W|exp(iw)|W|exp(-iw) + |X|exp(ix)|X|exp(-ix) =
|W|^2+|X|^2 = 1.

In other words, for some real angle a _in the first quadrant_,
|W|=cos(a), |X|=sin(a).

However, by applying the constraint that the first _column_ [W Y]^T
must also be normalized, we learn that |W|^2+|Y|^2=1, but since
|W|^2+|X|^2 also =1, we must have |Y|=|X|=sin(a).  By similar
reasoning, we find that |Z|=|W|=cos(a).

Therefore, by using only the properties that both rows and columns
must be normalized, our unitary matrix must have the following form:

    [ cos(a)exp(iw) sin(a)exp(ix)]
U = [-sin(a)exp(iy) cos(a)exp(iz)]

In the above matrix, we can set c=cos(a) and s=sin(a) to get

    [W X]   [ c exp(iw)  s exp(ix)]
U = [Y Z] = [-s exp(iy)  c exp(iz)]

We now apply the constraint that the _rows_ are orthogonal -- i.e.,
WY* + XZ* = 0.

WY* + XZ* = - c exp(iw) s exp(-iy) + s exp(ix) c exp(-iz)
          = - cs exp(iw-iy) + cs exp(ix-iz)
          = cs (exp(ix-iz) - exp(iw-iy))

In the general case where cs/=0, we must have

exp(ix-iz) - exp(iw-iy) = 0, or
             exp(ix-iz) = exp(iw-iy), or

                    x-z = w-y, modulo 2 pi.

Solving for z, we get z = -w+x+y, so our matrix becomes

    [W X]   [ c exp(iw)  s exp(ix)       ]
U = [Y Z] = [-s exp(iy)  c exp(i(-w+x+y))]

We now apply the requirement that the _columns_ be orthogonal -- i.e.,
WX* + YZ* = 0.

WX* + YZ* = c exp(iw) s exp(-ix) - s exp(iy) c exp(-i(-w+x+y))
          = cs exp(i(w-x)) - cs exp(i(y+w-x-y))

In the most general case, where cs/=0, we have

exp(i(w-x)) - exp(i(y+w-x-y)) =
exp(i(w-x)) - exp(i(w-x))     = 0

So, once the rows are orthogonal, the columns are already orthogonal,
as well.

We now apply the constraint that the absolute value of the determinant
must be 1:

         |W X|   | c exp(iw)  s exp(ix)       |
det(U) = |Y Z| = |-s exp(iy)  c exp(i(-w+x+y))|  =

c exp(iw) c exp(i(-w+x+y)) + s exp(iy) s exp(ix) =
c^2 exp(i(w-w+x+y)) + s^2 exp(i(y+x))            =
c^2 exp(i(x+y)) + s^2 exp(i(y+x))                =
exp(i(x+y)) (c^2 + s^2)                          =
exp(i(x+y))

Since c^2+s^2 = 1, the absolute value of the determinant =1.

So, the unitary determinant constraint is already satisfied, as well.

In a sense, we are already finished, because we have reduced our
unitary matrix to 4 real parameters: w,x,y and the angle
a=acos(c)=asin(s).

However, we can go a good bit further.

Consider now the matrix

    [W exp(ib)  X]
V = [Y exp(ib)  Z]

We note that det(V) = exp(ib) det(U), so that multiplying the first
column by a complex number whose absolute value =1 keeps the absolute
value of the determinant of the matrix U the same.

Furthermore, the first column remains normalized, since

W exp(ib) W* exp(-ib) + Y exp(ib) Y* exp(-ib) = WW* + YY* = 1.

Similarly, the rows are still orthogonal, since

W exp(ib) Y* exp(-ib) + X Z* = WY* + XZ* = 0.

Also, the columns are still orthogonal, since

W exp(ib) X* + Y exp(ib) Z* = exp(ib) (WX* + YZ*) = 0.

Therefore, multiplying any of the rows or any of the columns by an
arbitrary phase factor doesn't change the unitariness of the matrix.
This fact suggests a _factorization_ of our unitary matrix:

[ c exp(iw)  s exp(ix)       ]
[-s exp(iy)  c exp(i(-w+x+y))]             =

[exp(iq)    0   ] [ c s] [exp(is)    0   ]
[   0    exp(ir)] [-s c] [   0    exp(it)] =

                      [ c s]
diag(exp(iq),exp(ir)) [-s c] diag(exp(is),exp(it))

where q,r,s,t are new real parameters to be determined.

If we multiply out the right-hand side, we can equate the exponents,
and attempt to solve the equations.

[ c exp(iw)  s exp(ix)       ]   [ c exp(i(q+s))  s exp(i(q+t))]
[-s exp(iy)  c exp(i(-w+x+y))] = [-s exp(i(r+s))  c exp(i(r+t))]

We then get the following equations, modulo 2 pi:

q+s = w
r+s = y
q+t = x
r+t = -w+x+y

We have one too many parameters, which makes sense, because we can
move an overall phase factor from the left diagonal matrix to the the
right diagonal matrix, and vice versa, without changing the overall
product.  So we can arbitrarily choose t=0, and begin solving our
equations.

q = x
r = -w+x+y
s = y-r = y+w-x-y = w-x

So, the equations are solvable, and we obtain the factorization:

[ c exp(iw)  s exp(ix)       ]
[-s exp(iy)  c exp(i(-w+x+y))]                        =

[exp(ix)       0       ] [ c s] [exp(i(w-x))    0   ]
[   0    exp(i(-w+x+y))] [-s c] [    0       exp(i0)] =

                             [ c s]
diag(exp(ix),exp(i(-w+x+y))) [-s c] diag(exp(i(w-x)),exp(i0))

In other words, within every 2x2 _unitary_ matrix hides a (real)
_orthogonal_ matrix.  Now, this orthogonal matrix

[ c s]
[-s c]

can additionally be factored as follows:

[ c s]             [1 i] [c+is  0  ]           [ 1 -i]
[-s c] = 1/sqrt(2) [i 1] [ 0   c-is] 1/sqrt(2) [-i  1]

where, of course, c+is = exp(ia), c-is = exp(-ia).

So, we have shown that we can factor any 2x2 unitary matrix using 4
independent real parameters (e.g., a,q,r,s) as follows:

[exp(iq)    0   ] [1/n i/n] [exp(ia)    0    ] [ 1/n -i/n] [exp(is)    0   ]
[   0    exp(ir)] [i/n 1/n] [   0    exp(-ia)] [-i/n  1/n] [   0    exp(it)] =

                      [1/n i/n]                        [ 1/n -i/n]
diag(exp(iq),exp(ir)) [i/n 1/n] diag(exp(ia),exp(-ia)) [-i/n  1/n] diag(exp(is),exp(it))

where n=sqrt(2), and the set {q,r,s,t} has one redundant parameter.

Thus far, we have shown how a 2x2 unitary matrix U can be decomposed
into the form U = D1 O D2, where D1,D2 are diagonal matrices, and O is
a real orthogonal matrix.  Furthermore, this real orthogonal matrix O
can be further decomposed into a diagonal matrix with conjugate
eigenvalues.


Quaternions
-----------

For our purposes, we will define _quaternions_ as complex matrices of the form

         [ A   B ]
Q(A,B) = [-B*  A*].

Thus, a quaternion is determined by 4 real parameters -- the real and
imaginary parts of A, and the real and imaginary parts of B.

The set of quaternions is _closed_ under matrix multiplication:

Q(A,B) Q(C,D) = Q(AC-BD*, AD+BC*).

The _conjugate_ Q'(A,B) of a quaternion Q(A,B) is simply the
_tranjugate_ of the corresponding complex matrix -- i.e., also written
Q'(A,B).

          [A* -B]
Q'(A,B) = [B*  A]

The _norm_ of a quaternion Q(A,B) is its _determinant_: det(Q(A,B)) =
AA* + BB*.  Note that norm(Q'(A,B)) = norm(Q(A,B)).  Note also that

                             [1 0]                [1 0]
Q(A,B) Q'(A,B) = (AA* + BB*) [0 1] = norm(Q(A,B)) [0 1]

Note that if A=cos(a) and B=sin(a), then

[ cos(a) sin(a)]   [ A  B ]
[-sin(a) cos(a)] = [-B* A*], so the matrix Q(cos(a),sin(a)) is a quaternion.

Note that if A=exp(ia) and B=0, then

[exp(ia)    0    ]   [ A  B ]
[   0    exp(-ia)] = [-B* A*], so the matrix Q(exp(ia),0) is also a quaternion.


Representing Unitary Matrices by Quaternions
--------------------------------------------

We have already shown above that every 2x2 unitary matrix can be
factored in the following form:

[ c exp(iw)  s exp(ix)       ]
[-s exp(iy)  c exp(i(-w+x+y))]                        =

[exp(ix)       0       ] [ c s] [exp(i(w-x))    0   ]
[   0    exp(i(-w+x+y))] [-s c] [     0      exp(i0)] =

                             [ c s]
diag(exp(ix),exp(i(-w+x+y))) [-s c] diag(exp(i(w-x)),exp(i0))

We now show that with a bit of additional manipulation, we can express
the diagonal matrices as quaternions.

Consider first the left diagonal matrix:

[exp(ix)       0       ]           [exp(id)    0    ]
[   0    exp(i(-w+x+y))] = exp(ib) [   0    exp(-id)] =

[exp(i(b+d))     0      ]
[    0       exp(i(b-d))]

We get the following linear equations, mod 2 pi:

x      = b+d
-w+x+y = b-d

d = (w-y)/2
b = x-d = x-(w-y)/2

Thus, our first diagonal matrix factors as:

[exp(ix)       0       ]
[   0    exp(i(-w+x+y))]                         =

                  [exp(i(w-y)/2)       0       ]
exp(i(x-(w-y)/2)) [     0        exp(-i(w-y)/2)] =

exp(i(x-(w-y)/2)) Q(exp(i(w-y)/2),0)

Similarly, our second diagonal matrix factors as:

[exp(i(w-x))    0   ]
[     0      exp(i0)]                              =

                [exp(i((w-x)/2))        0        ]
exp(i((w-x)/2)) [       0        exp(-i((w-x)/2))] =

exp(i((w-x)/2)) Q(exp(i((w-x)/2)),0)

Now, we can gather together the constant factors from these two
diagonal matrix factorizations to get the overall constant factor:

exp(i((x+y)/2))

Finally,

[ c exp(iw)  s exp(ix)       ]
[-s exp(iy)  c exp(i(-w+x+y))]                                =

[exp(ix)       0       ] [ c s] [exp(i(w-x))    0   ]
[   0    exp(i(-w+x+y))] [-s c] [     0      exp(i0)]         =

                             [ c s]
diag(exp(ix),exp(i(-w+x+y))) [-s c] diag(exp(i(w-x)),exp(i0)) =

exp(i((x+y)/2)) times

[exp(i(w-y)/2)        0      ]
[     0        exp(-i(w-y)/2)] times

[ c s]
[-s c] times

[exp(i((w-x)/2))         0       ]
[       0        exp(-i((w-x)/2))] =

exp(i((x+y)/2)) Q(exp(i(w-y)/2),0) Q(c,s) Q(exp(i((w-x)/2)),0)

Of course, since quaternions are closed under multiplication, we can
collapse the last three factors together into a single quaternion:

Q(exp(i(w-y)/2),0) Q(c,s) Q(exp(i((w-x)/2)),0)           =

Q(c exp(i(w-y)/2), s exp(i(w-y)/2)) Q(exp(i((w-x)/2)),0) =

Q(c exp(i(w-(x+y)/2)), s exp(i(x-y)/2))

Thus, our original unitary matrix

[ c exp(iw)  s exp(ix)       ]
[-s exp(iy)  c exp(i(-w+x+y))] =

exp(i(x+y)/2) Q(c exp(i(w-(x+y)/2)), s exp(i(x-y)/2))

Now we have a problem: the complex constant exp(i(x+y)/2) _cannot_
be expressed as a quaternion, because its representation as a matrix
looks like diag(A,A) instead of the required form of diag(A,A*).

However, if we wish to express the _effect_ of a unitary matrix U on a
complex (row) vector v=[R S], we can premultiply this vector by the complex
constant exp(i(x+y)/2) _before_ multiplying the vector by the
matrix of the above quaternion.  But another more elegant way to
achieve the same effect is to represent the complex vector v=[R S] by
the quaternion Q(R,S), and multiply it _on the left_ by the
quaternion Q(exp(i(x+y)/2),0).  Thus,

Q(exp(i(x+y)/2),0) Q(R,S) = Q(exp(i(x+y)/2) R, exp(i(x+y)/2) S).

In other words, multiplying Q(R,S) on the left by a quaternion Q(C,0)
produces the quaternion Q(C R, C S), which is exactly the effect we
desire.

Thus, the _matrix_ product

      [ c exp(iw)  s exp(ix)       ]
[R S] [-s exp(iy)  c exp(i(-w+x+y))] =

v     U

produces a vector, which when interpreted as a quaternion, is equal to
the _quaternion_ product

Q(exp(i(x+y)/2),0)  Q(R,S)  Q(c exp(i(w-(x+y)/2)), s exp(i(x-y)/2)) =

Q1                  v       Q2.

In other words, we have shown that every linear unitary function v U
of a complex vector v can be computed by the quaternion triple product
Q1 v Q2, where the symbol "v" in the quaternion product means the
quaternion whose respective components are those of the vector v.

We note that we still have 4 real parameters in this representation:
w,x,y and a, where c=cos(a) and s=sin(a).


A simpler reduction of unitary 2x2 matrices to quaternions
----------------------------------------------------------

With our knowledge of the structure of unitary 2x2 matrices, we can
reduce the work necessary to decompose the unitary 2x2 matrix by
noticing that

U = sqrt(det(U)) U/sqrt(det(U)) = exp(i(x+y)/2) Q,

det(Q)=1.

As a corollary to the above structure theorem, we find that Q is
_already_ a quaternion, whose parameters can trivially read off as the
first row of Q!!

Thus, our linear function of the complex vector V can be computed
as the triple product

Q(sqrt(det(U)),0) Q(R,S) (U/sqrt(det(U))) =

Q(sqrt(det(U)),0) Q(R,S) Q

For example, consider the unitary matrix

    [1  0]
U = [0 -1]

det(U) = -1, so sqrt(det(U)) = i.

      [-i 0]
U = i [ 0 i] = i Q(-i,0) = exp(i pi/2) Q(exp(i 3pi/2),0).


Conclusions
-----------

We have shown how _every_ 2x2 complex unitary matrix U can be factored
into a form U = D1 O D2, where D1,D2 are complex diagonal matrices,
and O is an orthogonal real matrix.  Furthermore, this factorization
can then be used to derive a quaternion product representation for
every unitary linear function of complex pairs: v U can be computed by
Q1 v Q2, where the product on the left is a matrix-vector product, and
the triple product Q1 v Q2 is a quaternion triple product.


References
----------

Eves, Howard.  Elementary Matrix Theory.  Dover Publications, Inc.,
New York, 1966.  ISBN 0-486-63946-0.