In this chapter we develop a systematic
method for transforming a matrix A with entries from a field into a
special form which is called the echelon form of A. The transformation consists of a
sequence of multiplications of A from the left by certain “elementary
matrices”. If A is
invertible, then its echelon form is the identity matrix, and the
inverse is the
product of the inverses of the elementary matrices. For a
non-invertible matrix its echelon form is, in some sense, the
“closest possible” matrix to the identity matrix. This form
motivates the concept of the rank of a matrix, which we introduce
in this chapter and will use frequently later on.
5.1 Elementary Matrices
Let R be a commutative ring with unit and let $n \in \mathbb{N}$. Let $I_n = [e_1, \ldots, e_n] \in R^{n,n}$ be the identity matrix and let $e_i$ be its ith column, i.e., $e_i = [\delta_{1i}, \ldots, \delta_{ni}]^T$. We define

$$E_{ij} := e_i e_j^T \in R^{n,n},$$

i.e., the entry (i, j) of $E_{ij}$ is 1, all other entries are 0.
For $i < j$ we define

$$P_{ij} := I_n - E_{ii} - E_{jj} + E_{ij} + E_{ji}.$$

Thus, $P_{ij}$ is a permutation matrix (cp. Definition 4.12) obtained by exchanging the columns i and j of $I_n$. A multiplication of $A \in R^{n,m}$ from the left with $P_{ij}$ means an exchange of the rows i and j of A. For example,
For $\lambda \in R$ we define

$$M_i(\lambda) := I_n + (\lambda - 1) E_{ii}.$$

Thus, $M_i(\lambda)$ is a diagonal matrix obtained by replacing the ith column of $I_n$ by $\lambda e_i$. A multiplication of $A$ from the left with $M_i(\lambda)$ means a multiplication of the ith row of A by $\lambda$. For example,
For $\lambda \in R$ and $i < j$ we define

$$G_{ij}(\lambda) := I_n + \lambda E_{ji}.$$

Thus, the lower triangular matrix $G_{ij}(\lambda)$ is obtained by replacing the ith column of $I_n$ by $e_i + \lambda e_j$. A multiplication of $A$ from the left with $G_{ij}(\lambda)$ means that $\lambda$ times the ith row of A is added to the jth row of A. Similarly, a multiplication of $A$ from the left by the upper triangular matrix $G_{ij}(\lambda)^T$ means that $\lambda$ times the jth row of A is added to the ith row of A. For example,
$$P_{12} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{bmatrix}, \tag{5.1}$$
$$M_2(\lambda) \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ \lambda a_{21} & \lambda a_{22} \end{bmatrix}, \tag{5.2}$$
$$G_{12}(\lambda) \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + \lambda a_{11} & a_{22} + \lambda a_{12} \end{bmatrix}. \tag{5.3}$$
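These effects can be checked directly. The following sketch is my own illustration, not part of the text; it uses Python with 0-based indices (the text counts rows and columns from 1) and builds the three kinds of elementary matrices as nested lists.

```python
# Elementary matrices as explicit lists (my own sketch, 0-based indices).

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def P(n, i, j):
    # permutation matrix: exchange columns i and j of the identity
    M = identity(n)
    M[i][i] = M[j][j] = 0
    M[i][j] = M[j][i] = 1
    return M

def M_(n, i, lam):
    # diagonal matrix: ith diagonal entry of the identity replaced by lam
    D = identity(n)
    D[i][i] = lam
    return D

def G(n, i, j, lam):
    # lower triangular for i > j: lam placed at position (i, j) of the identity;
    # left-multiplication adds lam times row j to row i
    L = identity(n)
    L[i][j] = lam
    return L

A = [[1, 2], [3, 4], [5, 6]]
assert matmul(P(3, 0, 1), A) == [[3, 4], [1, 2], [5, 6]]      # rows 0, 1 exchanged
assert matmul(M_(3, 2, 10), A) == [[1, 2], [3, 4], [50, 60]]  # row 2 scaled by 10
assert matmul(G(3, 2, 0, -5), A) == [[1, 2], [3, 4], [0, -4]] # row 2 += -5 * row 0
```

In the 1-based notation of the text, `G(3, 2, 0, -5)` corresponds to the lower triangular matrix $G_{13}(-5)$.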
Lemma
5.1
For the elementary matrices defined above the following assertions hold:
- (1)
$P_{ij}^{-1} = P_{ij} = P_{ij}^T$, i.e., $P_{ij}$ is invertible and symmetric.
- (2)
If $\lambda \in R$ is invertible, then $M_i(\lambda)^{-1} = M_i(\lambda^{-1})$.
- (3)
$G_{ij}(\lambda)^{-1} = G_{ij}(-\lambda)$.
Proof
- (1)
The invertibility of $P_{ij}$ with $P_{ij}^{-1} = P_{ij}$ was already shown in Theorem 4.16; the symmetry of $P_{ij}$ is easily seen.
- (2)
Since $\lambda$ is invertible, the matrix $M_i(\lambda^{-1})$ is well defined. A straightforward computation now shows that $M_i(\lambda) M_i(\lambda^{-1}) = I_n$.
- (3)
Since $E_{ji} E_{ji} = 0$ for $i \neq j$, we have $(I_n + \lambda E_{ji})(I_n - \lambda E_{ji}) = I_n - \lambda^2 E_{ji} E_{ji} = I_n$, and therefore $G_{ij}(\lambda)^{-1} = G_{ij}(-\lambda)$. $\square$
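The three inversion formulas of Lemma 5.1 can also be verified computationally. A minimal sketch with exact arithmetic (my own check, 0-based indices, not part of the text):

```python
# Verify P^{-1} = P, M_i(lam)^{-1} = M_i(lam^{-1}), G(lam)^{-1} = G(-lam)
# by exact multiplication (my own sketch, 0-based indices).
from fractions import Fraction

def identity(n):
    return [[Fraction(int(a == b)) for b in range(n)] for a in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n, i, j, lam = 3, 2, 0, Fraction(7)

P = identity(n); P[i][i] = P[j][j] = Fraction(0); P[i][j] = P[j][i] = Fraction(1)
M = identity(n); M[i][i] = lam
Minv = identity(n); Minv[i][i] = 1 / lam
G = identity(n); G[i][j] = lam
Ginv = identity(n); Ginv[i][j] = -lam

assert matmul(P, P) == identity(n)      # (1): P is its own inverse
assert matmul(M, Minv) == identity(n)   # (2): M_i(lam) M_i(1/lam) = I
assert matmul(G, Ginv) == identity(n)   # (3): G(lam) G(-lam) = I
```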
5.2 The Echelon Form and Gaussian Elimination
The constructive proof of the following theorem relies on the Gaussian elimination algorithm.1 For a given matrix $A \in K^{n,m}$, where K is a field, this algorithm constructs an invertible matrix $S \in K^{n,n}$ such that $SA$ is quasi-upper triangular. We obtain this special form by left-multiplication of A with elementary matrices $P_{ij}$, $M_i(\lambda)$ and $G_{ij}(\lambda)$. Each of these left-multiplications corresponds to the application of one of the so-called “elementary row operations” to the matrix A:
- $P_{ij}$: exchange two rows of A.
- $M_i(\lambda)$: multiply a row of A with an invertible scalar $\lambda$.
- $G_{ij}(\lambda)$: add a $\lambda$-multiple of one row of A to another row of A.
We assume that the entries of A are in a field (rather than a ring) because in the proof of the theorem we require that nonzero entries of A are invertible. A generalization of the result which holds over certain rings (e.g. the integers $\mathbb{Z}$) is given by the Hermite normal form,2 which plays an important role in Number Theory.
Theorem
5.2
Let K be a field and let $A \in K^{n,m}$. Then there exist invertible matrices $S_1, \ldots, S_t \in K^{n,n}$ (these are products of elementary matrices) such that $C = S_t \cdots S_1 A$ is in echelon form, i.e., either $C = 0$ or

$$C = \left[\begin{array}{ccccccc}
0 \cdots 0 & 1 & \ast \cdots \ast & 0 & \ast \cdots \ast & 0 & \ast \cdots \ast \\
 & & & 1 & \ast \cdots \ast & 0 & \ast \cdots \ast \\
 & & & & & \ddots & \vdots \\
 & & & & & 1 & \ast \cdots \ast \\
 & & & & & &
\end{array}\right].$$

Here $\ast$ denotes an arbitrary (zero or nonzero) entry of C.
More precisely, $C = [c_{ij}]$ is either the zero matrix, or there exists a sequence of natural numbers $j_1 < j_2 < \cdots < j_r$ (these are called the “steps” of the echelon form), where $1 \le r \le \min\{n, m\}$ and $j_r \le m$, such that
- (1)
$c_{ij} = 0$ for $i = r+1, \ldots, n$ and $j = 1, \ldots, m$,
- (2)
$c_{ij} = 0$ for $i = 1, \ldots, r$ and $j = 1, \ldots, j_i - 1$,
- (3)
$c_{i, j_i} = 1$ for $i = 1, \ldots, r$ and all other entries in column $j_i$ are zero.
If $n = m$, then A is invertible if and only if $r = n$. In this case $A^{-1} = S_t \cdots S_1$.
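Conditions (1)-(3) translate into a short mechanical check. The following predicate is my own helper (0-based indices), not part of the text:

```python
# Check whether a matrix (list of rows) is in echelon form in the sense of
# Theorem 5.2 (my own sketch, 0-based indices).

def is_echelon(C):
    pivots = []            # the steps j_1 < j_2 < ... < j_r, 0-based
    seen_zero_row = False
    for row in C:
        nz = [j for j, c in enumerate(row) if c != 0]
        if not nz:
            seen_zero_row = True
            continue
        if seen_zero_row:               # (1): zero rows only at the bottom
            return False
        j = nz[0]
        if row[j] != 1:                 # (3): pivot entries equal 1
            return False
        if pivots and j <= pivots[-1]:  # steps strictly increase
            return False
        pivots.append(j)
    # (3): all other entries in a pivot column are zero
    for i, j in enumerate(pivots):
        if any(C[k][j] != 0 for k in range(len(C)) if k != i):
            return False
    return True

assert is_echelon([[1, 0, 2], [0, 1, 3], [0, 0, 0]])
assert not is_echelon([[1, 2, 0], [0, 2, 1]])   # pivot entry 2, not 1
assert not is_echelon([[0, 1, 5], [1, 0, 0]])   # steps not increasing
```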
Proof
If $A = 0$, then we set $t = 1$, $S_1 = I_n$, and we are done.
Now let $A = [a_{ij}] \neq 0$ and let $j_1$ be the index of the first column of $A$ that does not consist of all zeros. Let $a_{i_1, j_1}$ be the first entry in this column that is nonzero, i.e., A has the form

$$A = \begin{bmatrix}
0 & \cdots & 0 & 0 & \ast & \cdots & \ast \\
\vdots & & \vdots & \vdots & & & \vdots \\
0 & \cdots & 0 & 0 & \ast & \cdots & \ast \\
0 & \cdots & 0 & a_{i_1, j_1} & \ast & \cdots & \ast \\
0 & \cdots & 0 & \ast & \ast & \cdots & \ast \\
\vdots & & \vdots & \vdots & & & \vdots \\
0 & \cdots & 0 & \ast & \ast & \cdots & \ast
\end{bmatrix}, \qquad a_{i_1, j_1} \neq 0.$$

We then proceed as follows: First we permute the rows $i_1$ and 1 (if $i_1 \neq 1$). Then we normalize the new first row, i.e., we multiply it with $a_{i_1, j_1}^{-1}$. Finally we eliminate the nonzero elements below the first entry in column $j_1$. Permuting and normalizing leads to

$$M_1(a_{i_1, j_1}^{-1})\, P_{1 i_1}\, A = \begin{bmatrix}
0 & \cdots & 0 & 1 & \ast & \cdots & \ast \\
0 & \cdots & 0 & \tilde a_{2, j_1} & \ast & \cdots & \ast \\
\vdots & & \vdots & \vdots & & & \vdots \\
0 & \cdots & 0 & \tilde a_{n, j_1} & \ast & \cdots & \ast
\end{bmatrix}.$$

If $i_1 = 1$, then we set $P_{1 i_1} := I_n$. In order to eliminate below the 1 in column $j_1$, we multiply from the left with the matrices $G_{1k}(-\tilde a_{k, j_1})$, $k = 2, \ldots, n$. Then we have

$$S_1 A = \begin{bmatrix}
0 & \cdots & 0 & 1 & \ast \cdots \ast \\
0 & \cdots & 0 & 0 & \\
\vdots & & \vdots & \vdots & \widetilde A \\
0 & \cdots & 0 & 0 &
\end{bmatrix},$$

where

$$S_1 := G_{1n}(-\tilde a_{n, j_1}) \cdots G_{12}(-\tilde a_{2, j_1})\, M_1(a_{i_1, j_1}^{-1})\, P_{1 i_1}$$

and $\widetilde A = [\tilde a_{ij}]$ with $i = 2, \ldots, n$, $j = j_1 + 1, \ldots, m$, i.e., we keep the indices of the larger matrix $S_1 A$ in the smaller matrix $\widetilde A$.
If $\widetilde A$ is empty (i.e., $n = 1$ or $j_1 = m$) or $\widetilde A = 0$, then we are finished, since then $S_1 A$ is in echelon form. In this case $t = 1$.
If at least one of the entries of $\widetilde A$ is nonzero, then we apply the steps described above to the matrix $\widetilde A$. For $\ell = 2, 3, \ldots$ we define the matrices $S_\ell$ recursively as

$$S_\ell := \begin{bmatrix} I_{\ell - 1} & 0 \\ 0 & \widetilde S_\ell \end{bmatrix}.$$

Each matrix $\widetilde S_\ell$ is constructed analogous to $S_1$: First we identify the first column of the current remaining matrix that is not completely zero, as well as the first nonzero entry $\tilde a_{i_\ell, j_\ell}$ in that column. Then permuting and normalizing yields the matrix

$$M_\ell(\tilde a_{i_\ell, j_\ell}^{-1})\, P_{\ell i_\ell}\, (S_{\ell - 1} \cdots S_1 A).$$

If $i_\ell = \ell$, then we set $P_{\ell i_\ell} := I_n$. Now the elimination below the new 1 in column $j_\ell$ only operates on rows $\ell, \ldots, n$, so that $S_\ell$ is indeed a product of elementary matrices of the form

$$\begin{bmatrix} I_{\ell - 1} & 0 \\ 0 & T \end{bmatrix},$$

where T is an elementary matrix of size $(n - \ell + 1) \times (n - \ell + 1)$.
If we continue this procedure inductively, it will end after $r \le \min\{n, m\}$ steps with either $r = n$ or with a remaining matrix that is zero or empty. After r steps we have

$$S_r \cdots S_1 A = \left[\begin{array}{ccccccc}
0 \cdots 0 & 1 & \ast \cdots \ast & \ast & \ast \cdots \ast & \ast & \ast \cdots \ast \\
 & & & 1 & \ast \cdots \ast & \ast & \ast \cdots \ast \\
 & & & & & \ddots & \vdots \\
 & & & & & 1 & \ast \cdots \ast \\
 & & & & & &
\end{array}\right]. \tag{5.4}$$

By construction, the entries 1 in (5.4) are in the positions $(1, j_1), (2, j_2), \ldots, (r, j_r)$.
If $r = 1$, then $S_1 A$ is in echelon form (see the discussion at the beginning of the proof).
If $r \ge 2$, then we still have to eliminate the nonzero entries above the 1 in columns $j_2, \ldots, j_r$. To do this, we denote the matrix in (5.4) by $C_r$ and form for $\ell = r, r-1, \ldots, 2$ recursively

$$C_{\ell - 1} := G_{1 \ell}(-c_{1, j_\ell})^T \cdots G_{\ell - 1, \ell}(-c_{\ell - 1, j_\ell})^T\, C_\ell,$$

where $c_{k, j_\ell}$ denotes the entry of $C_\ell$ in position $(k, j_\ell)$. These upper triangular elementary matrices add suitable multiples of row $\ell$ to the rows above it and thus eliminate the entries above the 1 in column $j_\ell$.
For $\ell = 2$ we have $C := C_1$ in echelon form.
Suppose now that $n = m$ and that $C = S_t \cdots S_1 A$ is in echelon form. If A is invertible, then C is a product of invertible matrices and thus invertible. An invertible matrix cannot have a row containing only zeros, so that $r = n$ and hence $C = I_n$. If, on the other hand, $C = I_n$, then the invertibility of the elementary matrices implies that $A = S_1^{-1} S_2^{-1} \cdots S_t^{-1}$. As a product of invertible matrices, A is invertible and $A^{-1} = S_t \cdots S_1$. $\square$
In the
literature, the echelon form is sometimes called reduced row echelon form.
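The elimination procedure of the proof can be sketched compactly. The following is my own illustration in exact rational arithmetic (not the book's code), combining row exchange, normalization and elimination in exactly the order used in the proof:

```python
# Reduce a matrix to its echelon form (reduced row echelon form) over the
# rationals, using exact arithmetic (my own sketch, 0-based indices).
from fractions import Fraction

def echelon_form(A):
    C = [[Fraction(x) for x in row] for row in A]
    n, m = len(C), len(C[0])
    r = 0                                   # next pivot row
    for j in range(m):                      # scan columns left to right
        p = next((i for i in range(r, n) if C[i][j] != 0), None)
        if p is None:
            continue                        # column contributes no pivot
        C[r], C[p] = C[p], C[r]             # row exchange   (a P_{ij})
        C[r] = [x / C[r][j] for x in C[r]]  # normalization  (an M_i(lam))
        for i in range(n):                  # elimination    (G_{ij}(lam)'s)
            if i != r and C[i][j] != 0:
                lam = C[i][j]
                C[i] = [a - lam * b for a, b in zip(C[i], C[r])]
        r += 1
    return C

A = [[0, 2, 4], [1, 1, 1], [2, 2, 2]]
assert echelon_form(A) == [[1, 0, -1], [0, 1, 2], [0, 0, 0]]
```

Eliminating above and below each pivot in the same pass produces the echelon form directly, rather than in the two stages of the proof; the result is the same matrix.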
Example
5.3
Transformation of a matrix to echelon form via left multiplication with elementary matrices:
MATLAB-Minute.
The echelon form is computed in MATLAB with the command rref (“reduced row echelon form”). Apply rref to [A eye(n+1)] in order to compute the inverse of the matrix A=full(gallery('tridiag',-ones(n,1),2*ones(n+1,1),-ones(n,1))) for n=1,2,3,4,5 (cp. Exercise 5.5).
Formulate a conjecture about the general form of $A^{-1}$. (Can you prove your conjecture?)
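The same experiment can be reproduced without MATLAB. Below is a sketch with my own exact-arithmetic helper, for the case n = 1, where the tridiagonal matrix is $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$:

```python
# Invert A by reducing [A | I] to echelon form: the result is [I | A^{-1}]
# (my own sketch with exact arithmetic; cp. Exercise 5.5).
from fractions import Fraction

def echelon_form(A):
    C = [[Fraction(x) for x in row] for row in A]
    n, m = len(C), len(C[0])
    r = 0
    for j in range(m):
        p = next((i for i in range(r, n) if C[i][j] != 0), None)
        if p is None:
            continue
        C[r], C[p] = C[p], C[r]
        C[r] = [x / C[r][j] for x in C[r]]
        for i in range(n):
            if i != r and C[i][j] != 0:
                lam = C[i][j]
                C[i] = [a - lam * b for a, b in zip(C[i], C[r])]
        r += 1
    return C

A = [[2, -1], [-1, 2]]                       # n = 1: the 2 x 2 case
aug = [row + [int(i == k) for k in range(2)] for i, row in enumerate(A)]
C = echelon_form(aug)                        # reduce [A | I]
A_inv = [row[2:] for row in C]               # right half is A^{-1}
assert A_inv == [[Fraction(2, 3), Fraction(1, 3)],
                 [Fraction(1, 3), Fraction(2, 3)]]
```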
The proof of Theorem 5.2 leads to the
so-called LU-decomposition
of a square matrix.
Theorem
5.4
For every matrix $A \in K^{n,n}$, there exist a permutation matrix $P \in K^{n,n}$, a lower triangular matrix $L \in K^{n,n}$ with ones on the diagonal and an upper triangular matrix $U \in K^{n,n}$, such that $PA = LU$. The matrix U is invertible if and only if A is invertible.
Proof
For $A \in K^{n,n}$ the Eq. (5.4) has the form $S_r \cdots S_1 A = U$, where $U$ is upper triangular. If $A = 0$, then we set $P = L := I_n$ and $U := A$. Since the matrices $S_1, \ldots, S_r$ are invertible, it follows that $U$ is invertible if and only if A is invertible. For $\ell = 1, \ldots, r$ every matrix $S_\ell$ has the form

$$S_\ell = L_\ell P_\ell,$$

where $L_\ell$ is an invertible lower triangular matrix and $P_\ell = P_{\ell, i_\ell}$ with $i_\ell \ge \ell$ (if $i_\ell = \ell$, then no permutation was necessary). Therefore,

$$S_r \cdots S_1 = L_r P_r \cdots L_1 P_1.$$

The form of the permutation matrices $P_\ell$ for $\ell = 2, \ldots, r$ and of the lower triangular matrices $L_k$ with $k < \ell$ implies that

$$P_\ell L_k = \widetilde L_k P_\ell$$

holds for certain invertible lower triangular matrices $\widetilde L_k$. Hence, the permutation matrices can be collected on the right of the product. The invertible lower triangular matrices and the permutation matrices form groups with respect to the matrix multiplication (cp. Theorems 4.13 and 4.16). Thus, $S_r \cdots S_1 = \widetilde L P$, where $\widetilde L$ is invertible and lower triangular, and $P$ is a permutation matrix. Since $\widetilde L$ is invertible, also $\widetilde L^{-1}$ is invertible, and we obtain $PA = LU$ with $L := \widetilde L^{-1}$ and $U := S_r \cdots S_1 A$. By construction, all diagonal entries of L are equal to one. $\square$
Example
5.5
Computation of an LU-decomposition of a matrix:
If $A$ is invertible, then the LU-decomposition yields $A^{-1} = U^{-1} L^{-1} P$. Hence after computing the LU-decomposition, one obtains the inverse of A essentially by inverting the two triangular matrices. Since this can be achieved by the efficient recursive formula (4.4), the LU-decomposition is a popular method in scientific computing applications that require the inversion of matrices or the solution of linear systems of equations (cp. Chap. 6). In this context, however, alternative strategies for the choice of the permutation matrices are used. For example, instead of the first nonzero entry in a column one chooses an entry with large (or largest) absolute value for the row exchange and the subsequent elimination. By this strategy the influence of rounding errors in the computation is reduced.
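The construction behind Theorem 5.4 can be sketched as follows (my own code with exact arithmetic; the pivot is the first nonzero entry, as in the proof, while numerical codes such as MATLAB's lu prefer an entry of largest absolute value):

```python
# LU-decomposition with row exchanges: PA = LU, L unit lower triangular,
# U upper triangular (my own sketch, exact arithmetic, 0-based indices).
from fractions import Fraction

def lu(A):
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    perm = list(range(n))                  # row i of PA is row perm[i] of A
    for k in range(n):
        p = next((i for i in range(k, n) if U[i][k] != 0), None)
        if p is None:
            continue                       # column already zero below the diagonal
        U[k], U[p] = U[p], U[k]            # row exchange (a matrix P_{kp})
        perm[k], perm[p] = perm[p], perm[k]
        for j in range(k):                 # exchange the computed part of L too
            L[k][j], L[p][j] = L[p][j], L[k][j]
        for i in range(k + 1, n):          # eliminate below the pivot
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    return L, U, perm

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1, 1], [2, 4, 2], [1, 1, 1]]
L, U, perm = lu(A)
PA = [[Fraction(x) for x in A[perm[i]]] for i in range(3)]
assert PA == matmul(L, U)                                      # PA = LU
assert all(L[i][i] == 1 for i in range(3))                     # unit diagonal
assert all(U[i][j] == 0 for i in range(3) for j in range(i))   # U upper triangular
```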
MATLAB-Minute.
The Hilbert matrix 3 $H = [h_{ij}] \in \mathbb{R}^{n,n}$ has the entries $h_{ij} = 1/(i + j - 1)$ for $i, j = 1, \ldots, n$. It can be generated in MATLAB with the command hilb(n). Carry out the command [L,U,P]=lu(hilb(4)) in order to compute an LU-decomposition of the matrix hilb(4). What do the matrices P, L and U look like?
Compute also the LU-decomposition of the matrix full(gallery('tridiag',-ones(3,1),2*ones(4,1),-ones(3,1))) and study the corresponding matrices P, L and U.
We will now show that, for a given
matrix A, the matrix
C in
Theorem 5.2 is uniquely determined in a certain sense.
For this we need the following definition.
Definition
5.6
If $C \in K^{n,m}$ is in echelon form (as in Theorem 5.2), then the positions $(1, j_1), \ldots, (r, j_r)$ are called the pivot positions of C.
We also need the following
results.
Lemma
5.7
If $Z \in K^{n,n}$ is invertible and $v \in K^{n,1}$, then $Zv = 0$ if and only if $v = 0$.
Proof
Exercise.
Theorem
5.8
Let $A, B \in K^{n,m}$ both be in echelon form. If $A = ZB$ for an invertible matrix $Z \in K^{n,n}$, then $A = B$.
Proof
If B is the zero matrix, then $A = ZB = 0$, and hence $A = B$.
Let now $B \neq 0$ and let A, B have the respective columns $a_1, \ldots, a_m$, $b_1, \ldots, b_m$. Furthermore, let $(1, j_1), \ldots, (r, j_r)$ be the pivot positions of B. We will show that every invertible matrix $Z \in K^{n,n}$ with $A = ZB$ has the form

$$Z = \begin{bmatrix} I_r & Z_{12} \\ 0 & Z_{22} \end{bmatrix},$$

where $Z_{22} \in K^{n-r, n-r}$ is invertible. Since B is in echelon form and all entries of B below its row r are zero, it then follows that $A = ZB = B$.
Since $(1, j_1)$ is the first pivot position of B, we have $b_j = 0$ for $j = 1, \ldots, j_1 - 1$ and $b_{j_1} = e_1$ (the first column of $I_n$). Then $A = ZB$ implies $a_j = Z b_j = 0$ for $j = 1, \ldots, j_1 - 1$ and $a_{j_1} = Z b_{j_1} = Z e_1$. Since Z is invertible, Lemma 5.7 implies that $a_{j_1} \neq 0$. Since A is in echelon form, $a_{j_1} = e_1$. Furthermore,

$$Z = \begin{bmatrix} 1 & \ast \\ 0 & \widehat Z \end{bmatrix},$$

where $\widehat Z \in K^{n-1, n-1}$ is invertible (cp. Exercise 5.3).
If $r = 1$, then we are done.
If $r \ge 2$, then we proceed with the other pivot positions in an analogous way: Since B is in echelon form, the kth pivot position gives $b_{j_k} = e_k$. From $a_{j_k} = Z b_{j_k} = Z e_k$ and the invertibility of the trailing block of Z we obtain $a_{j_k} = e_k$ and

$$Z = \begin{bmatrix} I_k & \ast \\ 0 & \widehat Z_k \end{bmatrix},$$

where $\widehat Z_k \in K^{n-k, n-k}$ is invertible. For $k = r$ this is the asserted form of Z. $\square$
This result yields the uniqueness of
the echelon form of a matrix and its invariance under
left-multiplication with invertible matrices.
Corollary
5.9
For $A \in K^{n,m}$ the following assertions hold:
- (1)
There is a unique matrix $C \in K^{n,m}$ in echelon form to which A can be transformed by elementary row operations, i.e., by left-multiplication with elementary matrices. This matrix C is called the echelon form of A.
- (2)
If $M \in K^{n,n}$ is invertible, then the matrix C in (1) is also the echelon form of MA, i.e., the echelon form of a matrix is invariant under left-multiplication with invertible matrices.
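Assertion (2) can be illustrated numerically. Below is a sketch with my own exact-arithmetic helper; the matrix M is a hypothetical example of an invertible matrix (a product of elementary matrices):

```python
# The echelon form is invariant under left-multiplication with an invertible
# matrix (Corollary 5.9 (2)); my own sketch with exact arithmetic.
from fractions import Fraction

def echelon_form(A):
    C = [[Fraction(x) for x in row] for row in A]
    n, m = len(C), len(C[0])
    r = 0
    for j in range(m):
        p = next((i for i in range(r, n) if C[i][j] != 0), None)
        if p is None:
            continue
        C[r], C[p] = C[p], C[r]
        C[r] = [x / C[r][j] for x in C[r]]
        for i in range(n):
            if i != r and C[i][j] != 0:
                lam = C[i][j]
                C[i] = [a - lam * b for a, b in zip(C[i], C[r])]
        r += 1
    return C

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]
M = [[1, 1], [0, 1]]                 # invertible: adds row 2 to row 1
MA = matmul(M, A)
assert echelon_form(MA) == echelon_form(A)
assert echelon_form(A) == [[1, 0, -1], [0, 1, 2]]
```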
5.3 Rank and Equivalence of Matrices
As we have seen in Corollary 5.9, the echelon form of $A \in K^{n,m}$ is unique. In particular, for every matrix $A \in K^{n,m}$, there exists a unique number of pivot positions (cp. Definition 5.6) in its echelon form. This justifies the following definition.
Definition
5.10
The number of pivot positions in the echelon form of $A \in K^{n,m}$ is called the rank of A, denoted by $\operatorname{rank}(A)$.
We see immediately that for $A \in K^{n,m}$ always $0 \le \operatorname{rank}(A) \le \min\{n, m\}$, where $\operatorname{rank}(A) = 0$ if and only if $A = 0$. Moreover, Theorem 5.2 shows that $A \in K^{n,n}$ is invertible if and only if $\operatorname{rank}(A) = n$. Further properties of the rank are summarized in the following theorem.
Theorem
5.11
For $A \in K^{n,m}$ with $r = \operatorname{rank}(A)$ the following assertions hold:
- (1)
There exist invertible matrices $Q \in K^{n,n}$ and $Z \in K^{m,m}$ with
$$Q A Z = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.$$
- (2)
If $Q \in K^{n,n}$ and $Z \in K^{m,m}$ are invertible, then $\operatorname{rank}(A) = \operatorname{rank}(QAZ)$.
- (3)
If $A = BC$ with $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$, then
$$\operatorname{rank}(A) \le \min\{\operatorname{rank}(B), \operatorname{rank}(C)\}.$$
- (4)
$\operatorname{rank}(A) = \operatorname{rank}(A^T)$.
- (5)
There exist matrices $B \in K^{n,s}$ and $C \in K^{s,m}$ with $A = BC$ if and only if $\operatorname{rank}(A) \le s$.
Proof
- (3a)
Let $Q \in K^{n,n}$ be invertible and such that QB is in echelon form. Then $\operatorname{rank}(B)$ equals the number of nonzero rows of QB. In the matrix QBC at most the first $\operatorname{rank}(B)$ rows contain nonzero entries. By Corollary 5.9, the echelon form of $QA = QBC$ is equal to the echelon form of A. Thus, in the echelon form of A also at most the first $\operatorname{rank}(B)$ rows will be nonzero, which implies $\operatorname{rank}(A) \le \operatorname{rank}(B)$.
- (1)
If $r = 0$, i.e., $A = 0$, then $QAZ = 0$ and the assertion holds for arbitrary invertible matrices $Q \in K^{n,n}$ and $Z \in K^{m,m}$. If $r \ge 1$, then there exists an invertible matrix $Q \in K^{n,n}$ such that QA is in echelon form with r pivot positions. Then there exists a permutation matrix $Z_1 \in K^{m,m}$, that is a product of elementary permutation matrices $P_{ij}$, with
$$Q A Z_1 = \begin{bmatrix} I_r & A_{12} \\ 0 & 0 \end{bmatrix}. \tag{5.5}$$
Multiplying (5.5) from the right with the invertible matrix $Z_2 := \begin{bmatrix} I_r & -A_{12} \\ 0 & I_{m-r} \end{bmatrix}$ yields the assertion with $Z := Z_1 Z_2$.
- (2)
If $Q \in K^{n,n}$ and $Z \in K^{m,m}$ are invertible, then the invariance of the rank under left-multiplication with invertible matrices can again be used for showing that $\operatorname{rank}(QAZ) = \operatorname{rank}(A)$.
- (4)
If $r = \operatorname{rank}(A)$, then by (1) there exist invertible matrices Q and Z with $QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$. Therefore,
$$Z^T A^T Q^T = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{m,n},$$
and (2) implies $\operatorname{rank}(A^T) = r = \operatorname{rank}(A)$.
- (3b)
Using $A^T = C^T B^T$ and (4), we obtain
$$\operatorname{rank}(A) = \operatorname{rank}(A^T) \le \operatorname{rank}(C^T) = \operatorname{rank}(C),$$
where the inequality follows from (3a).
- (5)
Let $A = BC$ with $B \in K^{n,s}$, $C \in K^{s,m}$. Then by (3), $\operatorname{rank}(A) \le \operatorname{rank}(B) \le s$. If, conversely, $\operatorname{rank}(A) = r \le s$, then (1) yields invertible matrices Q, Z with $A = Q^{-1} \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} Z^{-1}$, and splitting the middle matrix into the product of an $n \times s$ and an $s \times m$ block matrix of the same form gives the desired factorization. $\square$
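Parts (3) and (4) can be checked on a small example, with the rank read off as the number of nonzero rows of the echelon form (my own exact-arithmetic sketch, not from the text):

```python
# rank(BC) <= min(rank(B), rank(C)) and rank(A) = rank(A^T)
# (my own sketch; rank = number of nonzero rows of the echelon form).
from fractions import Fraction

def echelon_form(A):
    M = [[Fraction(x) for x in row] for row in A]
    n, m = len(M), len(M[0])
    r = 0
    for j in range(m):
        p = next((i for i in range(r, n) if M[i][j] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][j] for x in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                lam = M[i][j]
                M[i] = [a - lam * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

def rank(A):
    return sum(1 for row in echelon_form(A) if any(x != 0 for x in row))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

B = [[1, 0], [2, 1], [3, 1]]          # rank 2
C2 = [[1, 2, 0, 1], [2, 4, 0, 2]]     # rank 1
A = matmul(B, C2)
assert rank(A) == 1
assert rank(A) <= min(rank(B), rank(C2))   # Theorem 5.11 (3)
assert rank(A) == rank(transpose(A))       # Theorem 5.11 (4)
```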
Example
5.12
The matrix A from Example 5.3 has the echelon form computed there. Since there are two pivot positions, we have $\operatorname{rank}(A) = 2$. Multiplying A from the right by a suitable invertible matrix yields the equivalence normal form $\begin{bmatrix} I_2 & 0 \\ 0 & 0 \end{bmatrix}$.
Assertion (1) in
Theorem 5.11 motivates the following definition.
Definition
5.13
Two matrices are called equivalent , if there exist matrices
and
with .
As the name suggests, this defines an equivalence relation on the set $K^{n,m}$, since the following properties hold:
-
Reflexivity: $A = I_n A I_m$ with the invertible matrices $I_n$ and $I_m$.
-
Symmetry: If $B = QAZ$, then $A = Q^{-1} B Z^{-1}$.
-
Transitivity: If $B = Q_1 A Z_1$ and $C = Q_2 B Z_2$, then $C = (Q_2 Q_1) A (Z_1 Z_2)$.
The equivalence class of $A \in K^{n,m}$ is given by

$$[A] = \{\, QAZ \mid Q \in K^{n,n} \text{ and } Z \in K^{m,m} \text{ invertible} \,\}.$$

If $r = \operatorname{rank}(A)$, then by (1) in Theorem 5.11 we have

$$\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in [A]$$

and, therefore,

$$[A] = \left[\, \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \,\right].$$

Consequently, the rank of A fully determines the equivalence class [A]. The matrix $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ is called the equivalence normal form of A. We obtain

$$K^{n,m} = \bigcup_{r=0}^{\min\{n,m\}} \left[\, \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \,\right].$$

Hence there are $\min\{n,m\} + 1$ pairwise distinct equivalence classes, and

$$\left\{ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \;\middle|\; r = 0, 1, \ldots, \min\{n,m\} \right\}$$

is a complete set of representatives.
From the proof of Theorem 4.9 we know that $K^{n,n}$ for $n \ge 2$ is a non-commutative ring with unit that contains non-trivial zero divisors. Using the equivalence normal form these can be characterized as follows:
-
If $A \in K^{n,n}$ is invertible, then A cannot be a zero divisor, since $AB = 0$ then implies that $B = A^{-1}(AB) = 0$.
-
If $A \in K^{n,n}$ is a zero divisor, then A cannot be invertible, and hence $r = \operatorname{rank}(A) < n$, so that the equivalence normal form of A is not the identity matrix $I_n$. Let invertible matrices $Q, Z \in K^{n,n}$ be given with
$$QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.$$
Then $B := Z \begin{bmatrix} 0 & 0 \\ 0 & I_{n-r} \end{bmatrix} Q \neq 0$ satisfies $AB = Q^{-1} \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & I_{n-r} \end{bmatrix} Q = 0$, so that every non-invertible $A \neq 0$ indeed is a zero divisor.
Exercises
(In the following exercises K is an arbitrary field.)
- 5.1
Compute the echelon forms of the matrices
- 5.2
Let with . Determine the echelon form of A and a formula for .
- 5.3
Let $A = \begin{bmatrix} a_{11} & z^T \\ 0 & \widehat A \end{bmatrix} \in K^{n,n}$ with $a_{11} \in K \setminus \{0\}$ and $\widehat A \in K^{n-1,n-1}$. Show that A is invertible if and only if $\widehat A$ is invertible.
- 5.4
Consider the matrix
- 5.5
Show that if $A \in K^{n,n}$ is invertible, then the echelon form of $[A, I_n] \in K^{n,2n}$ is given by $[I_n, A^{-1}]$. (The inverse of an invertible matrix A can thus be computed via the transformation of $[A, I_n]$ to its echelon form.)
- 5.6
Two matrices $A, B \in K^{n,m}$ are called left equivalent, if there exists an invertible matrix $Q \in K^{n,n}$ with $B = QA$. Show that this defines an equivalence relation on $K^{n,m}$ and determine a most simple representative for each equivalence class.
- 5.7
Prove Lemma 5.7.
- 5.8
Determine LU-decompositions (cp. Theorem 5.4) of the matrices
- 5.9
- 5.10
Determine the rank of the matrix
- 5.11
Let be given. Show that
- 5.12
Let .
- (a)
Determine .
- (b)
Let . Show the following assertions:
- (i)
and ,
- (ii)
for ,
- (iii)
if and only if there exist with or and ,
- (iv)
.
Footnotes
1
Named after Carl Friedrich Gauß
(1777–1855). A similar method was already described in
Chap. 8, “Rectangular Arrays”, of the “Nine Chapters on the Mathematical Art”. This text, developed in ancient China over several decades BC, stated problems of everyday life and gave practical mathematical solution methods. A detailed commentary and analysis was written by Liu Hui (approx. 220–280 AD) around 260 AD.
4
The concept of the rank was introduced
(in the context of bilinear forms) first in 1879 by Ferdinand Georg
Frobenius (1849–1917).