In this chapter we define matrices with
their most important operations and we study several groups and
rings of matrices. James Joseph Sylvester (1814–1897) coined the
term matrix 1 in 1850 and described matrices as
“an oblong arrangement of terms”. The matrix operations defined in
this chapter were introduced by Arthur Cayley (1821–1895) in 1858.
His article “A memoir on the theory of matrices” was the first to
consider matrices as independent algebraic objects. In our book
matrices form the central approach to the theory of Linear
Algebra.
4.1 Basic Definitions and Operations
We begin with a formal definition of
matrices.
Definition
4.1
Let R be a commutative ring with unit and let $n, m \in \mathbb{N}$. An array of the form
$$A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$
with $a_{ij} \in R$, $i = 1, \dots, n$, $j = 1, \dots, m$, is called a matrix of size $n \times m$ over R. The $a_{ij}$ are called the entries or coefficients of the matrix.
The set of all such matrices is denoted by $R^{n,m}$.
In the following we usually assume (without explicitly mentioning it) that $1 \ne 0$ in R. This excludes the trivial case of the ring that contains only the zero element (cp. Exercise 3.11).
Formally, in Definition 4.1 for $n = 0$ or $m = 0$ we obtain “empty matrices” of the size $0 \times m$, $n \times 0$ or $0 \times 0$. We denote such matrices by $[\;]$. They will be used for technical reasons in some of the proofs below. When we analyze algebraic properties of matrices, however, we always consider $n, m \ge 1$.
The zero matrix in $R^{n,m}$, denoted by $0_{n,m}$ or just 0, is the matrix that has all its entries equal to $0 \in R$.
A matrix of size $n \times n$ is called a square matrix or just square. The entries $a_{ii}$ for $i = 1, \dots, n$ are called the diagonal entries of A. The identity matrix in $R^{n,n}$ is the matrix $I_n := [\delta_{ij}]$, where $\delta_{ij}$ is the Kronecker delta-function.2 If it is clear which n is considered, then we just write I instead of $I_n$. For $i, j \in \mathbb{N}$ we set
$$\delta_{ij} := \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \ne j. \end{cases} \qquad (4.1)$$
The ith row of $A = [a_{ij}] \in R^{n,m}$ is
$$[a_{i1}, a_{i2}, \dots, a_{im}] \in R^{1,m}, \quad i = 1, \dots, n,$$
where we use commas for the optical separation of the entries. The jth column of A is
$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix} \in R^{n,1}, \quad j = 1, \dots, m.$$
Thus, the rows and columns of a matrix are again matrices.
If matrices $a_i \in R^{1,m}$, $i = 1, \dots, n$, are given, then we can combine them to the matrix
$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \in R^{n,m}.$$
We then do not write square brackets around the rows of A. In the same way we can combine the matrices $a_j \in R^{n,1}$, $j = 1, \dots, m$, to the matrix
$$A = [a_1, a_2, \dots, a_m] \in R^{n,m}.$$
If $A_{11} \in R^{n_1,m_1}$, $A_{12} \in R^{n_1,m_2}$, $A_{21} \in R^{n_2,m_1}$ and $A_{22} \in R^{n_2,m_2}$, then we can combine these four matrices to the matrix
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in R^{n_1 + n_2,\, m_1 + m_2}.$$
The matrices $A_{11}, A_{12}, A_{21}, A_{22}$ are then called blocks of the block matrix A.
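In MATLAB, a block matrix is assembled with exactly this bracket notation. The following small example is our own illustration (not one of the book's MATLAB-Minutes):
A11=[1 2; 3 4]; A12=[5; 6];   % 2 x 2 and 2 x 1 blocks
A21=[7 8];      A22=9;        % 1 x 2 and 1 x 1 blocks
A=[A11 A12; A21 A22]          % the resulting 3 x 3 block matrix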
We now introduce four operations for matrices and begin with the addition:
$$+ : R^{n,m} \times R^{n,m} \to R^{n,m}, \quad \big([a_{ij}],\, [b_{ij}]\big) \mapsto [a_{ij}] + [b_{ij}] := [a_{ij} + b_{ij}].$$
The addition in $R^{n,m}$ operates entrywise, based on the addition in R. Note that the addition is only defined for matrices of equal size.
The multiplication of two matrices is defined as follows:
$$* : R^{n,m} \times R^{m,s} \to R^{n,s}, \quad \big([a_{ij}],\, [b_{ij}]\big) \mapsto [a_{ij}] * [b_{ij}] := [c_{ij}] \quad\text{with}\quad c_{ij} := \sum_{k=1}^{m} a_{ik} b_{kj}.$$
Thus, the entry $c_{ij}$ of the product $A * B$ is constructed by successive multiplication and summing up of the entries in the ith row of A and the jth column of B. Clearly, in order to define the product $A * B$, the number of columns of A must be equal to the number of rows of B.
In the definition of the entries $c_{ij}$ of the matrix $A * B$ we have not written the multiplication symbol for the elements in R. This follows the usual convention of omitting the multiplication sign when it is clear which multiplication is considered. Eventually we will also omit the multiplication sign between matrices.
We can illustrate the multiplication rule “$c_{ij}$ equals ith row of A times jth column of B” as follows:
$$c_{ij} = [a_{i1}, a_{i2}, \dots, a_{im}] * \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{mj} \end{bmatrix}.$$
It is important to note that the matrix multiplication in general is not commutative.
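The row-times-column rule is easy to try out in MATLAB; the matrices below are a made-up illustration, not an example from the text:
A=[1 2; 3 4]; B=[0 1; 1 1];
C=A*B;                 % the matrix product
A(1,:)*B(:,2)          % first row of A times second column of B ...
C(1,2)                 % ... gives the (1,2) entry of A*B, here 3
isequal(A*B, B*A)      % returns 0 (false): the product is not commutative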
Example
4.2
For the matrices
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \in \mathbb{Z}^{3,2} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in \mathbb{Z}^{2,3}$$
we have
$$A * B = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 7 \\ 5 & 6 & 11 \end{bmatrix} \in \mathbb{Z}^{3,3}.$$
On the other hand,
$$B * A = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix} \in \mathbb{Z}^{2,2}.$$
Although $A * B$ and $B * A$ are both defined, we obviously have $A * B \ne B * A$. In this case one recognizes the non-commutativity of the matrix multiplication from the fact that $A * B$ and $B * A$ have different sizes. But even if $A * B$ and $B * A$ are both defined and have the same size, in general $A * B \ne B * A$. For example,
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$$
yield the two products
$$A * B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \ne \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = B * A.$$
The matrix multiplication is, however,
associative and distributive with respect to the matrix
addition.
Lemma
4.3
For $A, B \in R^{n,m}$, $C, D \in R^{m,s}$ and $E \in R^{s,t}$ the following assertions hold:
- (1)
$(A * C) * E = A * (C * E)$.
- (2)
$A * (C + D) = A * C + A * D$.
- (3)
$(A + B) * C = A * C + B * C$.
- (4)
$I_n * A = A * I_m = A$.
Proof
We only show property (1); the others are exercises. Let $A = [a_{ij}] \in R^{n,m}$, $C = [c_{jk}] \in R^{m,s}$ and $E = [e_{kl}] \in R^{s,t}$. By the definition of the matrix multiplication and using the associative and distributive laws in R, we get
$$\big[(A * C) * E\big]_{il} = \sum_{k=1}^{s} \Big( \sum_{j=1}^{m} a_{ij} c_{jk} \Big) e_{kl} = \sum_{j=1}^{m} a_{ij} \Big( \sum_{k=1}^{s} c_{jk} e_{kl} \Big) = \big[A * (C * E)\big]_{il}$$
for $i = 1, \dots, n$ and $l = 1, \dots, t$, which implies that $(A * C) * E = A * (C * E)$.
On the right hand sides of (2) and (3)
in Lemma 4.3 we have not written parentheses, since we
will use the common convention that the multiplication of matrices
binds more strongly than the addition.
For $A \in R^{n,n}$ we define the powers
$$A^0 := I_n, \qquad A^k := A * A^{k-1}, \quad k = 1, 2, \dots$$
Another multiplicative operation for matrices is the multiplication with a scalar,3 which is defined as follows:
$$\cdot \, : R \times R^{n,m} \to R^{n,m}, \quad \big(\lambda,\, [a_{ij}]\big) \mapsto \lambda \cdot [a_{ij}] := [\lambda a_{ij}]. \qquad (4.2)$$
We easily see that $1 \cdot A = A$ and $0 \cdot A = 0_{n,m}$ for all $A \in R^{n,m}$. In addition, the scalar multiplication has the following properties.
Lemma
4.4
For $\lambda, \mu \in R$, $A, B \in R^{n,m}$ and $C \in R^{m,s}$ the following assertions hold:
- (1)
$(\lambda \mu) \cdot A = \lambda \cdot (\mu \cdot A)$.
- (2)
$\lambda \cdot (A + B) = \lambda \cdot A + \lambda \cdot B$.
- (3)
$(\lambda + \mu) \cdot A = \lambda \cdot A + \mu \cdot A$.
- (4)
$\lambda \cdot (A * C) = (\lambda \cdot A) * C = A * (\lambda \cdot C)$.
Proof
Exercise.
The fourth matrix operation that we introduce is the transposition:
$${}^T : R^{n,m} \to R^{m,n}, \quad A = [a_{ij}] \mapsto A^T := [b_{ij}] \in R^{m,n} \quad\text{with}\quad b_{ij} := a_{ji}.$$
For example,
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.$$
The matrix $A^T \in R^{m,n}$ is called the transpose of A.
Definition
4.5
If $A \in R^{n,n}$ satisfies $A^T = A$, then A is called symmetric. If $A^T = -A$, then A is called skew-symmetric.
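In MATLAB the transpose of a real matrix A is written A', so both properties of Definition 4.5 can be checked directly; the matrices below are made-up examples:
S=[1 2; 2 3];          % S is symmetric:      S' equals S
isequal(S', S)         % returns 1 (true)
W=[0 2; -2 0];         % W is skew-symmetric: W' equals -W
isequal(W', -W)        % returns 1 (true)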
For the transposition we have the
following properties.
Lemma
4.6
For $\lambda \in R$, $A, B \in R^{n,m}$ and $C \in R^{m,s}$ the following assertions hold:
- (1)
$(A^T)^T = A$.
- (2)
$(A + B)^T = A^T + B^T$.
- (3)
$(\lambda \cdot A)^T = \lambda \cdot A^T$.
- (4)
$(A * C)^T = C^T * A^T$.
Proof
Properties (1)–(3) are exercises. For the proof of (4) let $A = [a_{ij}] \in R^{n,m}$, $C = [c_{jk}] \in R^{m,s}$ and $A * C = [d_{ik}] \in R^{n,s}$. Then the entry in position $(k, i)$ of $(A * C)^T$ is
$$d_{ik} = \sum_{j=1}^{m} a_{ij} c_{jk} = \sum_{j=1}^{m} c_{jk} a_{ij},$$
which is also the entry in position $(k, i)$ of the product $C^T * A^T$, from which we see that $(A * C)^T = C^T * A^T$.
MATLAB-Minute.
Carry out the following commands
in order to get used to the matrix operations of this chapter in
MATLAB notation: A=ones(5,2), A+A, A-3*A, A', A'*A, A*A'.
(In order to see MATLAB’s
output, do not put a semicolon at the end of the command.)
Example
4.7
Consider again the example of car insurance premiums from Chap. 1. Recall that $p_{ij}$ denotes the probability that a customer in class $i$ in this year will move to class $j$ in the following year. Our example consists of four such classes, and the 16 probabilities can be associated with a row-stochastic $4 \times 4$ matrix (cp. (1.2)), which we denote by P. Suppose that the insurance company has the following distribution of customers in the four classes: a fraction $x_1$ in class 1, $x_2$ in class 2, $x_3$ in class 3, and $x_4$ in class 4 (with $x_1 + x_2 + x_3 + x_4 = 1$). Then the matrix
$$x^T = [x_1, x_2, x_3, x_4] \in \mathbb{R}^{1,4}$$
describes the initial customer distribution. Using the matrix multiplication we now compute $x^T * P$. Then $x^T * P \in \mathbb{R}^{1,4}$ contains the distribution of the customers in the next year. As an example, consider the entry of $x^T * P$ in position (1, 4), which is computed by
$$x_1 p_{14} + x_2 p_{24} + x_3 p_{34} + x_4 p_{44}.$$
A customer in the classes 1 or 2 in this year cannot move to class 4 in the next year. Thus, the respective initial fractions are multiplied by the probabilities $p_{14} = 0$ and $p_{24} = 0$. A customer in the class 3 or 4 will be in class 4 in the next year with the probability $p_{34}$ or $p_{44}$, respectively. This yields the two products $x_3 p_{34}$ and $x_4 p_{44}$.
Continuing in the same way we obtain after k years the distribution
$$x^T * P^k.$$
(This formula also holds for $k = 0$, since $P^0 = I_4$.) The insurance company can use this formula to compute the revenue from the payments of premium rates in the coming years. Assume that the full premium rate (class 1) is 500 Euros per year. Then the rates in classes 2, 3 and 4 are 450, 400 and 300 Euros (10, 20 and 40 % discount). If there are 1000 customers initially, then the revenue in the first year (in Euros) is
$$1000 \cdot x^T * \begin{bmatrix} 500 \\ 450 \\ 400 \\ 300 \end{bmatrix}.$$
If no customer cancels the contract, then this model yields the revenue in year $k + 1$ as
$$1000 \cdot \big( x^T * P^k \big) * \begin{bmatrix} 500 \\ 450 \\ 400 \\ 300 \end{bmatrix}.$$
For example, the revenue in the next 4 years is 404500, 372025, 347340 and 341819 (rounded to full Euros). These numbers decrease annually, but the rate of the decrease seems to slow down. Does there exist a “stationary state”, i.e., a state in which the revenue does not change (significantly) any more? Which properties of the model guarantee the existence of such a state? These are important practical questions for the insurance company. Only the existence of a stationary state guarantees significant revenues in the long-time future. Since the formula depends essentially on the entries of the matrix $P^k$, we have reached an interesting problem of Linear Algebra: the analysis of the properties of row-stochastic matrices.
We will analyze these properties in Sect. 8.3.
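The computation in this example is easy to reproduce in MATLAB. Since the transition matrix and the initial distribution from Chap. 1 are not repeated here, the following sketch uses made-up placeholder values; the printed numbers will therefore differ from the ones above, and only the mechanics of $x^T * P^k$ and of the revenue formula are the point:
P=[0.4 0.6 0 0; 0.4 0 0.6 0; 0.4 0 0 0.6; 0.2 0 0 0.8];  % hypothetical row-stochastic matrix
x=[0.4; 0.3; 0.2; 0.1];     % hypothetical initial distribution (fractions, sum 1)
s=[500; 450; 400; 300];     % premium rates in Euros for the classes 1,...,4
for k=0:4
  revenue=1000*(x'*P^k)*s;              % revenue in year k+1
  fprintf('year %d: %.0f Euros\n', k+1, revenue);
end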
4.2 Matrix Groups and Rings
In this section we study algebraic structures that are formed by certain sets of matrices and the matrix operations introduced above. We begin with the addition in $R^{n,m}$.
Theorem
4.8
$(R^{n,m}, +)$ is a commutative group. The neutral element is $0_{n,m}$ (the zero matrix) and for $A = [a_{ij}] \in R^{n,m}$ the inverse element is $-A := [-a_{ij}]$. (We write $A - B$ instead of $A + (-B)$.)
Proof
Using the associativity of the addition in R, for arbitrary $A = [a_{ij}]$, $B = [b_{ij}]$, $C = [c_{ij}] \in R^{n,m}$ we obtain
$$(A + B) + C = [(a_{ij} + b_{ij}) + c_{ij}] = [a_{ij} + (b_{ij} + c_{ij})] = A + (B + C).$$
Thus, the addition in $R^{n,m}$ is associative.
The zero matrix $0_{n,m}$ satisfies $A + 0_{n,m} = A = 0_{n,m} + A$ for all $A \in R^{n,m}$. For a given $A = [a_{ij}]$ and $-A := [-a_{ij}]$ we have $A + (-A) = 0_{n,m}$.
Finally, the commutativity of the addition in R implies that $A + B = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = B + A$.
Note that (2) in Lemma 4.6 implies that the transposition is a homomorphism (even an isomorphism) between the groups $(R^{n,m}, +)$ and $(R^{m,n}, +)$ (cp. Definition 3.6).
Theorem
4.9
$(R^{n,n}, +, *)$ is a ring with unit given by the identity matrix $I_n$. This ring is commutative only for $n = 1$.
Proof
We have already shown that $(R^{n,n}, +)$ is a commutative group (cp. Theorem 4.8). The other properties of a ring (associativity, distributivity and the existence of a unit element) follow from Lemma 4.3. The commutativity for $n = 1$ holds because of the commutativity of the multiplication in the ring R. The example
$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} * \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \ne \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} * \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$
shows that the ring is not commutative for $n = 2$; for $n > 2$ the same effect occurs for matrices that contain these $2 \times 2$ matrices as their upper left blocks and are zero otherwise.
The example in the proof of Theorem 4.9 shows that for $n \ge 2$ the ring $R^{n,n}$ has non-trivial zero-divisors, i.e., there exist matrices $A, B \in R^{n,n} \setminus \{0\}$ with $A * B = 0$. These exist even when R is a field.
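A quick MATLAB check of this zero-divisor phenomenon, using the same 2 x 2 matrices as in the example above:
A=[0 1; 0 0]; B=[0 0; 0 1];   % both matrices are nonzero
B*A                            % the zero matrix: B and A are zero-divisors
A*B                            % equals A, so A*B and B*A also differ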
Let us now consider the invertibility of matrices in the ring $R^{n,n}$ (with respect to the matrix multiplication). For a given matrix $A \in R^{n,n}$, an inverse $\widetilde{A} \in R^{n,n}$ must satisfy the two equations $\widetilde{A} * A = I_n$ and $A * \widetilde{A} = I_n$ (cp. Definition 3.10). If an inverse of A exists, i.e., if A is invertible, then the inverse is unique and denoted by $A^{-1}$ (cp. Theorem 3.11). An invertible matrix is sometimes called non-singular, while a non-invertible matrix is called singular.
We will show in Corollary 7.20 that the existence of the inverse is already implied by one of the two equations $\widetilde{A} * A = I_n$ and $A * \widetilde{A} = I_n$, i.e., if one of them holds, then A is invertible and $A^{-1} = \widetilde{A}$. Until then, to be correct, we will have to check the validity of both equations.
Not all matrices $A \in R^{n,n}$ are invertible. Simple examples are the non-invertible matrices $0_{n,n}$ and, for $n \ge 2$, the non-trivial zero-divisors exhibited above (an invertible matrix cannot be a zero-divisor). Another non-invertible matrix is $A = [2] \in \mathbb{Z}^{1,1}$, since there is no $b \in \mathbb{Z}$ with $2 b = 1$. However, considered as an element of $\mathbb{Q}^{1,1}$, the (unique) inverse of A is given by
$$A^{-1} = [1/2] \in \mathbb{Q}^{1,1}.$$
Lemma
4.10
If $A, B \in R^{n,n}$ are invertible, then the following assertions hold:
- (1)
$A^T$ is invertible with $(A^T)^{-1} = (A^{-1})^T$. (We also write this matrix as $A^{-T}$.)
- (2)
$A * B$ is invertible with $(A * B)^{-1} = B^{-1} * A^{-1}$.
Proof
(1) Using (4) in Lemma 4.6 we obtain $(A^{-1})^T * A^T = (A * A^{-1})^T = I_n^T = I_n$ and $A^T * (A^{-1})^T = (A^{-1} * A)^T = I_n$. Hence $A^T$ is invertible with $(A^T)^{-1} = (A^{-1})^T$.
(2) We have $(B^{-1} * A^{-1}) * (A * B) = B^{-1} * (A^{-1} * A) * B = B^{-1} * B = I_n$ and, analogously, $(A * B) * (B^{-1} * A^{-1}) = I_n$. Hence $A * B$ is invertible with $(A * B)^{-1} = B^{-1} * A^{-1}$.
Our next result shows that the
invertible matrices form a multiplicative group.
Theorem
4.11
The set of invertible $n \times n$ matrices over R forms a group with respect to the matrix multiplication. We denote this group by $GL_n(R)$ (“GL” abbreviates “general linear (group)”).
Proof
The associativity of the multiplication in $GL_n(R)$ is clear, since it already holds in $R^{n,n}$. As shown in (2) in Lemma 4.10, the product of two invertible matrices is an invertible matrix. The neutral element in $GL_n(R)$ is the identity matrix $I_n$, and since every $A \in GL_n(R)$ is assumed to be invertible, $A^{-1} \in GL_n(R)$ exists with $A * A^{-1} = A^{-1} * A = I_n$.
We now introduce some important
classes of matrices.
Definition
4.12
Let $A = [a_{ij}] \in R^{n,n}$.
- (1)
A is called upper triangular, if $a_{ij} = 0$ for all $i > j$. A is called lower triangular, if $a_{ij} = 0$ for all $i < j$ (i.e., $A^T$ is upper triangular).
- (2)
A is called diagonal, if $a_{ij} = 0$ for all $i \ne j$ (i.e., A is upper and lower triangular). We write a diagonal matrix as $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$.
We next investigate these sets of
matrices with respect to their group properties, beginning with the
invertible upper and lower triangular matrices.
Theorem
4.13
The sets of the invertible upper triangular $n \times n$ matrices and of the invertible lower triangular $n \times n$ matrices over R form subgroups of $GL_n(R)$.
Proof
We will only show the result for the upper triangular matrices; the proof for the lower triangular matrices is analogous. In order to establish the subgroup property we will prove the three properties from Theorem 3.5.
Since $I_n$ is an invertible upper triangular matrix, the set of the invertible upper triangular matrices is a nonempty subset of $GL_n(R)$.
Next we show that for two invertible upper triangular matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ the product $C = A * B = [c_{ij}]$ is again an invertible upper triangular matrix. The invertibility of C follows from (2) in Lemma 4.10. For $i > j$ we have
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{i-1} a_{ik} b_{kj} + \sum_{k=i}^{n} a_{ik} b_{kj} = 0,$$
since $a_{ik} = 0$ for $k < i$ and $b_{kj} = 0$ for $k \ge i > j$. Therefore, C is upper triangular.
It remains to prove that the inverse $A^{-1}$ of an invertible upper triangular matrix A is an upper triangular matrix. For $n = 1$ the assertion holds trivially, so we assume that $n \ge 2$. Let $A^{-1} = [b_{ij}]$, then the equation $A * A^{-1} = I_n$ can be written as a system of $n^2$ equations
$$\sum_{k=1}^{n} a_{ik} b_{kj} = \delta_{ij}, \quad i, j = 1, \dots, n. \qquad (4.3)$$
Here, $\delta_{ij}$ is the Kronecker delta-function defined in (4.1).
We will now prove inductively for $i = n, n-1, \dots, 1$ that the diagonal entry $a_{ii}$ of A is invertible with $a_{ii}^{-1} = b_{ii}$, and that
$$b_{ij} = a_{ii}^{-1} \Big( \delta_{ij} - \sum_{k=i+1}^{n} a_{ik} b_{kj} \Big), \quad j = 1, \dots, n. \qquad (4.4)$$
This formula implies, in particular, that $b_{ij} = 0$ for $j < i$.
For $i = n$ the last row of (4.3) is given by
$$a_{nn} b_{nj} = \delta_{nj}, \quad j = 1, \dots, n.$$
For $j = n$ we have $1 = a_{nn} b_{nn} = b_{nn} a_{nn}$, where in the second equation we use the commutativity of the multiplication in R. Therefore, $a_{nn}$ is invertible with $a_{nn}^{-1} = b_{nn}$, and thus
$$b_{nj} = a_{nn}^{-1} \delta_{nj}, \quad j = 1, \dots, n.$$
This is equivalent to (4.4) for $i = n$. (Note that for $i = n$ in (4.4) the sum is empty and thus equal to zero.) In particular, $b_{nj} = 0$ for $j = 1, \dots, n - 1$.
Now assume that our assertion holds for $i = n, n-1, \dots, k+1$, where $1 \le k \le n - 1$. Then, in particular, $b_{ij} = 0$ for $i = k+1, \dots, n$ and $j < i$. In words, the rows $k+1, \dots, n$ of $A^{-1}$ are in “upper triangular form”. In order to prove the assertion for $i = k$, we consider the kth row in (4.3), which is given by
$$a_{kk} b_{kj} + \sum_{l=k+1}^{n} a_{kl} b_{lj} = \delta_{kj}, \quad j = 1, \dots, n. \qquad (4.5)$$
For $j = k$ we obtain
$$a_{kk} b_{kk} + \sum_{l=k+1}^{n} a_{kl} b_{lk} = 1.$$
By the induction hypothesis, we have $b_{lk} = 0$ for $l = k+1, \dots, n$. This implies $1 = a_{kk} b_{kk} = b_{kk} a_{kk}$, where we have used the commutativity of the multiplication in R. Hence $a_{kk}$ is invertible with $a_{kk}^{-1} = b_{kk}$. From (4.5) we get
$$b_{kj} = a_{kk}^{-1} \Big( \delta_{kj} - \sum_{l=k+1}^{n} a_{kl} b_{lj} \Big), \quad j = 1, \dots, n,$$
and hence (4.4) holds for $i = k$. If $j < k$, then $\delta_{kj} = 0$ and $b_{lj} = 0$ for $l = k+1, \dots, n$, which gives $b_{kj} = 0$.
We point out that (4.4) represents a
recursive formula for
computing the entries of the inverse of an invertible upper
triangular matrix. Using this formula the entries are computed
“from bottom to top” and “from right to left”. This process is
sometimes called backward
substitution.
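The following MATLAB function is a sketch of this backward substitution, implementing the recursion (4.4) for a matrix with nonzero diagonal entries; the function name invupper is our own choice and the code is not part of the text. Saved as invupper.m, the call invupper(U) should reproduce inv(U) up to rounding errors for any invertible upper triangular U.
function B = invupper(A)
% Inverse of an invertible upper triangular matrix by backward
% substitution, following the recursive formula (4.4).
n = size(A,1);
B = zeros(n);                      % entries below the diagonal stay 0
for i = n:-1:1                     % from bottom to top
  for j = n:-1:i                   % from right to left
    delta = double(i == j);        % Kronecker delta
    B(i,j) = (delta - A(i,i+1:n)*B(i+1:n,j)) / A(i,i);
  end
end
end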
In the following we will frequently partition matrices into blocks and make use of the block multiplication: For every $A \in R^{n,m}$ with $n = n_1 + n_2$ and $m = m_1 + m_2$, we can write A as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{ij} \in R^{n_i, m_j}, \quad i, j = 1, 2.$$
If $A \in R^{n,m}$ and $B \in R^{m,s}$ are both partitioned like this, where the column partition $m = m_1 + m_2$ of A coincides with the row partition of B, then the product $A * B$ can be evaluated blockwise, i.e.,
$$A * B = \begin{bmatrix} A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\ A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22} \end{bmatrix}. \qquad (4.6)$$
In particular, if
$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$$
with $A_{11} \in GL_{n_1}(R)$ and $A_{22} \in GL_{n_2}(R)$, then $A \in GL_{n_1 + n_2}(R)$ and a direct computation shows that
$$A^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} A_{12} A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}. \qquad (4.7)$$
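Formula (4.7) can be checked numerically in MATLAB; the blocks below are made-up values, not data from the text:
A11=[2 1; 0 3]; A22=[4 0; 1 5]; A12=[1 2; 3 4];               % A11, A22 invertible
A=[A11 A12; zeros(2) A22];                                     % block upper triangular
Ainv=[inv(A11), -inv(A11)*A12*inv(A22); zeros(2), inv(A22)];   % formula (4.7)
norm(A*Ainv-eye(4))                                            % essentially zero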
MATLAB-Minute.
Create block matrices in MATLAB
by carrying out the following commands:
k=5;
A11=gallery('tridiag',-ones(k-1,1),2*ones(k,1),-ones(k-1,1));
A12=zeros(k,2); A12(1,1)=1; A12(2,2)=1;
A22=-eye(2);
A=full([A11 A12; A12' A22])
B=full([A11 A12; zeros(2,k) -A22])
Investigate the meaning of the
command full.
Compute the products A*B and B*A as well as the inverses inv(A) and inv(B).
Compute the inverse of B in MATLAB with the formula
(4.7).
Corollary
4.14
The set of the invertible diagonal
matrices over R forms a
commutative subgroup (with respect to the matrix multiplication) of
the invertible upper (or lower) triangular
matrices over R.
Proof
Since $I_n$ is an invertible diagonal matrix, the invertible diagonal matrices form a nonempty subset of the invertible upper (or lower) triangular matrices. If $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ and $B = \mathrm{diag}(b_{11}, \dots, b_{nn})$ are invertible, then $A * B$ is invertible (cp. (2) in Lemma 4.10) and diagonal, since
$$A * B = \mathrm{diag}(a_{11} b_{11}, \dots, a_{nn} b_{nn}).$$
Moreover, if $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ is invertible, then $a_{ii}$ is invertible for all $i = 1, \dots, n$ (cp. the proof of Theorem 4.13). The inverse is given by the invertible diagonal matrix $A^{-1} = \mathrm{diag}(a_{11}^{-1}, \dots, a_{nn}^{-1})$. Finally, the commutativity property $A * B = B * A$ follows directly from the commutativity in R.
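In MATLAB, diagonal matrices are generated with diag, and the inverse computed in the proof can be checked directly (made-up values):
d=[2 5 10];
D=diag(d);                   % the diagonal matrix diag(2, 5, 10)
norm(inv(D)-diag(1./d))      % essentially zero: inv(diag(d)) equals diag(1./d)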
Definition
4.15
A matrix $P \in R^{n,n}$ is called a permutation matrix, if in every row and every column of P there is exactly one entry equal to 1 (the unit element of R) and all other entries are zero.
The term “permutation” means “exchange”. If a matrix is multiplied with a permutation matrix from the left or from the right, then its rows or columns, respectively, are exchanged (or permuted). For example, if
$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},$$
then
$$P * A = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad A * P = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}.$$
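The exchange of rows and columns can also be tried out in MATLAB; the 3 x 3 example is made up:
P=eye(3); P=P([2 3 1],:);    % permutation matrix: reorders the rows 1,2,3 as 2,3,1
A=[1 2 3; 4 5 6; 7 8 9];
P*A                           % the rows of A are permuted
A*P                           % the columns of A are permuted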
Theorem
4.16
The set of the $n \times n$ permutation matrices over R forms a subgroup of $GL_n(R)$. In particular, if $P \in R^{n,n}$ is a permutation matrix, then P is invertible with $P^{-1} = P^T$.
Proof
Exercise.
From now on we will omit the
multiplication sign in the matrix multiplication and write
AB instead of $A * B$.
Exercises
(In the following exercises
R is a commutative ring
with unit.)
- 4.1
Consider the following matrices over :
- 4.2
Consider the matrices (a) xy, (b) , (c) yx, (d) , (e) xAy, (f) , (g) , (h) , (i) xyA, (j) , (k) Axy, (l) .
- 4.3
Show the following computational rules:
- 4.4
Prove Lemma 4.3 (2)–(4).
- 4.5
Prove Lemma 4.4.
- 4.6
Prove Lemma 4.6 (1)–(3).
- 4.7
Let . Determine for all .
- 4.8
Let $p = \sum_{k=0}^{m} \alpha_k t^k \in R[t]$ be a polynomial (cp. Example 3.17) and $A \in R^{n,n}$. We define $p(A)$ as $p(A) := \sum_{k=0}^{m} \alpha_k A^k \in R^{n,n}$.
- (a)
Determine p(A) for and .
- (b)
For a fixed matrix $A \in R^{n,n}$ consider the map $R[t] \to R^{n,n}$, $p \mapsto p(A)$. Show that $(p + q)(A) = p(A) + q(A)$ and $(p\,q)(A) = p(A) * q(A)$ for all $p, q \in R[t]$. (The map is a ring homomorphism between the rings R[t] and $R^{n,n}$.)
- (c)
Show that $\{\, p(A) \mid p \in R[t] \,\}$ is a commutative subring of $R^{n,n}$, i.e., that it is a subring of $R^{n,n}$ (cp. Exercise 3.14) and that the multiplication in this subring is commutative.
- (d)
Is the map $p \mapsto p(A)$ surjective?
- 4.9
Let K be a field with $1 + 1 \ne 0$. Show that every matrix $A \in K^{n,n}$ can be written as $A = S + W$ with a symmetric matrix $S \in K^{n,n}$ (i.e., $S^T = S$) and a skew-symmetric matrix $W \in K^{n,n}$ (i.e., $W^T = -W$). Does this also hold in a field with $1 + 1 = 0$? Give a proof or a counterexample.
- 4.10
Show the binomial formula for commuting matrices: If $A, B \in R^{n,n}$ with $A * B = B * A$, then $(A + B)^k = \sum_{j=0}^{k} \binom{k}{j} A^j B^{k-j}$ for all $k \in \mathbb{N}$, where $\binom{k}{j} = \frac{k!}{j!\,(k-j)!}$.
- 4.11
Let be a matrix for which is invertible. Show that holds for every .
- 4.12
Let $A \in R^{n,n}$ be a matrix for which an $m \in \mathbb{N}$ with $A^m = I_n$ exists and let m be the smallest natural number with this property.
- (a)
Investigate whether A is invertible, and if so, give a particularly simple representation of the inverse.
- (b)
Determine the cardinality of the set $\{\, A^k \mid k \in \mathbb{N} \,\}$.
- 4.13
Let
- (a)
Show that is a subring of .
- (b)
Show that for all and . (A subring with this property is called a left ideal of .)
- (c)
Determine an analogous subring of , such that for all and . (A subring with this property is called a right ideal of .)
- 4.14
Examine whether with
- 4.15
Generalize the block multiplication (4.6) to matrices and .
- 4.16
Determine all invertible upper triangular matrices with .
- 4.17
Let , , , and
- (a)
Let . Show that A is invertible if and only if is invertible and derive in this case a formula for .
- (b)
Let . Show that A is invertible if and only if is invertible and derive in this case a formula for .
- 4.18
Let , and . Show the following assertions:
- (a)
holds if and only if .
- (b)
If , then
- 4.19
Show that the set of block upper triangular matrices with invertible diagonal blocks, i.e., the set of matrices
$$\left\{ \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \;\middle|\; A_{11} \in GL_{n_1}(R),\ A_{22} \in GL_{n_2}(R),\ A_{12} \in R^{n_1,n_2} \right\},$$
forms a subgroup of $GL_{n_1+n_2}(R)$.
- 4.20
Prove Theorem 4.16. Is the group of permutation matrices commutative?
- 4.21
Show that the following is an equivalence relation on :
- 4.22
A company produces from four raw materials , , , five intermediate products , , , , , and from these three final products , , . The following tables show how many units of and are required for producing one unit of and , respectively:
- (a)
Determine, with the help of matrix operations, a corresponding table which shows how many units of are required for producing one unit of .
- (b)
Determine how many units of the four raw materials are required for producing 100 units of , 200 units of and 300 units of .
Footnotes
1
The Latin word “matrix” means “womb”.
Sylvester considered matrices as objects “out of which we may form
various systems of determinants” (cp. Chap. 5). Interestingly, the English writer
Charles Lutwidge Dodgson (1832–1898), better known by his pen name
Lewis Carroll, objected to Sylvester’s term and wrote in 1867: “I
am aware that the word ‘Matrix’ is already in use to express the
very meaning for which I use the word ‘Block’; but surely the
former word means rather the mould, or form, into which algebraic
quantities may be introduced, than an actual assemblage of such
quantities”. Dodgson also objected to the notation for the
matrix entries: “...most of the space is occupied by a number of
a’s, which are wholly
superfluous, while the only important part of the notation is
reduced to minute subscripts, alike difficult to the writer and the
reader.”
2
The Kronecker delta-function is named after the German mathematician Leopold Kronecker (1823–1891).
3
The term “scalar” was introduced in
1845 by Sir William Rowan Hamilton (1805–1865). It originates from
the Latin word “scala”, which means “ladder”.