MATH101 Lecture Notes, Part 3 (mcs.une.edu.au/~math101/Lectures/Additional Notes)



Contents

Read This First . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Lecture 3.1 Simultaneous Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Lecture 3.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Lecture 3.3 The Inverse of a Square Matrix . . . . . . . . . . . . . . . . . . . . . . . . 14

Lecture 3.4 More on Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Lecture 3.5 Introduction to Determinants . . . . . . . . . . . . . . . . . . . . . . . . . 24

Lecture 3.6 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Lecture 3.7 Determinants and Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Lecture 3.8 An Application, Leslie Matrices . . . . . . . . . . . . . . . . . . . . . . . . 41

Lecture 3.9 Leslie Matrices, continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Lecture 3.10 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Lecture 3.11 The Inner or Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . 59

Lecture 3.12 The Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66


Read This First

The following notes cover, in unrevised form, the rest of the material in the algebra section of MATH101.

The sequence is not precisely the one that we shall follow. For example, the section on Scalar Product and Vector Product is needed for Assignment 10, so you should read ahead to that.

Revised notes will be posted on the unit homepage as soon as possible.

If you have problems, do contact me.



Preface

This is the third of three parts which together constitute the course material for MATH101.

In this part we take an introductory look at some topics from one of the foundational building blocks of modern mathematics, linear algebra.

Remember that references for each section of the course are contained in your study guide. While this course material is fully self-contained, you should try to consult some of the reference material to broaden your perspective on the topic.

I would like to acknowledge the help of Margaret McDonald, Robyn Curry and Meg Vivers in typesetting these notes. All mistakes and errors are, however, mine alone.

I am sure you will find this section stimulating and intellectually rewarding. Good luck.

Chris Radford, UNE.



Lecture 3.1 Simultaneous Equations

Our aim in this section of the course is to give an introduction to one of the fundamental edifices of modern pure mathematics, linear algebra. In developing this introduction we will start with specific problems – such as solving systems of linear equations. In fact, it was just these problems which motivated the development of the subject of linear algebra.

The first known example of the use of linear algebra techniques (specifically, matrix methods) appears in the text “Nine Chapters of the Mathematical Art”, written during the Han Dynasty (see reference [1]; here we are following reference [10]). In this ancient Chinese text the following problem is considered:

There are three types of corn, of which three bundles of the first, two bundles of the second, and one of the third make 39 measures. Two of the first, three of the second, and one of the third make 34 measures. And one of the first, two of the second, and three of the third make 26 measures. How many measures of corn are contained in one bundle of each type?

The author, writing in 200BC, uses a technique, now known as Gaussian elimination, to solve this problem. The really remarkable thing is that this technique would not become widely known until the 19th century. By the end of this lecture we will be able to solve this problem using a related technique, Gauss-Jordan elimination.

Simultaneous Equations.

Let’s start with a straightforward, specific example. We want to solve the following system of simultaneous linear equations for the unknowns x, y and z.

(1) x + y + z = 1

(2) x + y − z = −1

(3) −x − 2y + z = 2.

To solve such a system one usually takes different combinations of the equations until you obtain “x = number”, “y = number” and “z = number”. By combinations we mean linear combinations: we are allowed to take any multiple of one equation and add it to (or subtract it from) any multiple of another equation. The aim in doing this is to eliminate one of the unknowns from the new equation.

So we might try (1) − (3):

(1′) 2x + 3y = −1.

Next we could try (2) + (3):

(2′) −y = 1.

Finally, (1) − (2):

2z = 2.

From (2′) we have y = −1; substituting into (1′) gives x = 1; and 2z = 2 gives z = 1. We have the solution

x = 1

y = −1

z = 1.

There is one other operation we can perform on our equation set which will not alter the solution: we can interchange the order of the equations. That is, the solution of the system is unchanged if we simply relabel our equations – for example, (1) could become (3) and (3) become (1) without altering the outcome.

We now want to “abstract” this process and in so doing get an algorithm for solving such linear systems.

Firstly, we note that we could write our system of equations as a matrix, the augmented matrix – rows indicating the equations, columns the coefficients of the unknowns, and a final column for the


right hand sides.

    [  1   1   1 |  1 ]
    [  1   1  −1 | −1 ]
    [ −1  −2   1 |  2 ]

Of course, we have to remember that column 1 represents the x’s, column 2 the y’s, and column 3 the z’s.

Now our rule for solving linear systems was that we could take any linear combination of equations we wished to form a new equation. Our matrix equivalent of this rule is

Any multiple of a row may be added to (or subtracted from) another. Any row may be multiplied by a non-zero number.

We were also allowed to change the order in which the equations appeared.

Any pair of rows may be interchanged.

How then do we use these two elementary row operations to arrive at the solution? Well, the solution looks like “x = number”, “y = number” and “z = number”; the matrix for these three equations would be

    [ 1  0  0 | “number” ]
    [ 0  1  0 | “number” ]
    [ 0  0  1 | “number” ]

So if we can use our elementary row operations to reduce the augmented matrix to this form, we can read the solution off from the rightmost column.

Let’s solve our problem using this technique. The best way to do this is to be methodical: start with column 1, get a 1 as the first entry then try to get zeros for the rest of the entries; go to column 2 and repeat the process with a 1 as second entry, and so on. We start with

    [  1   1   1 |  1 ]
    [  1   1  −1 | −1 ]
    [ −1  −2   1 |  2 ]

We will indicate the operation performed using Ri to stand for row i, with the first R indicating the row on which the operation is performed. For example, R1 − 2R2 means an operation performed on row 1: the new row 1 is the old row 1 minus twice row 2. Row 2 remains unchanged.

R2 − R1:
    [  1   1   1 |  1 ]
    [  0   0  −2 | −2 ]
    [ −1  −2   1 |  2 ]

R3 + R1:
    [ 1   1   1 |  1 ]
    [ 0   0  −2 | −2 ]
    [ 0  −1   2 |  3 ]

(interchange) R2 ←→ R3:
    [ 1   1   1 |  1 ]
    [ 0  −1   2 |  3 ]
    [ 0   0  −2 | −2 ]

R1 + R2:
    [ 1   0   3 |  4 ]
    [ 0  −1   2 |  3 ]
    [ 0   0  −2 | −2 ]

−(1/2)R3:
    [ 1   0   3 |  4 ]
    [ 0  −1   2 |  3 ]
    [ 0   0   1 |  1 ]

R1 − 3R3:
    [ 1   0   0 |  1 ]
    [ 0  −1   2 |  3 ]
    [ 0   0   1 |  1 ]

−R2:
    [ 1  0   0 |  1 ]
    [ 0  1  −2 | −3 ]
    [ 0  0   1 |  1 ]

R2 + 2R3:
    [ 1  0  0 |  1 ]
    [ 0  1  0 | −1 ]
    [ 0  0  1 |  1 ]


We are done; we read off the answer from the rightmost column: x = 1, y = −1, z = 1.

This technique of Gauss-Jordan elimination can be used on linear systems of any size. The technique will always produce a solution, if such a solution exists (we’ll say more about this below).

■ Example Solve the following system of linear equations

2x1 + x2 − x3 + x4 = 6

x1 + x3 + 3x4 = 4

3x1 − 2x2 + x3 = 2

x1 + x2 − x4 = −2

Solution

The augmented matrix for the system is

    [ 2   1  −1   1 |  6 ]
    [ 1   0   1   3 |  4 ]
    [ 3  −2   1   0 |  2 ]
    [ 1   1   0  −1 | −2 ]

We proceed as before, working column by column.

(1/2)R1:
    [ 1  1/2  −1/2  1/2 |  3 ]
    [ 1   0     1    3  |  4 ]
    [ 3  −2     1    0  |  2 ]
    [ 1   1     0   −1  | −2 ]

R2 − R1, R3 − 3R1, R4 − R1:
    [ 1   1/2  −1/2   1/2 |  3 ]
    [ 0  −1/2   3/2   5/2 |  1 ]
    [ 0  −7/2   5/2  −3/2 | −7 ]
    [ 0   1/2   1/2  −3/2 | −5 ]

−2R2:
    [ 1   1/2  −1/2   1/2 |  3 ]
    [ 0    1    −3    −5  | −2 ]
    [ 0  −7/2   5/2  −3/2 | −7 ]
    [ 0   1/2   1/2  −3/2 | −5 ]

R1 − (1/2)R2, R3 + (7/2)R2, R4 − (1/2)R2:
    [ 1  0   1    3  |   4 ]
    [ 0  1  −3   −5  |  −2 ]
    [ 0  0  −8  −19  | −14 ]
    [ 0  0   2    1  |  −4 ]

−(1/8)R3:
    [ 1  0   1    3   |   4  ]
    [ 0  1  −3   −5   |  −2  ]
    [ 0  0   1  19/8  |  7/4 ]
    [ 0  0   2    1   |  −4  ]

R1 − R3, R2 + 3R3, R4 − 2R3:
    [ 1  0  0    5/8  |   9/4 ]
    [ 0  1  0   17/8  |  13/4 ]
    [ 0  0  1   19/8  |   7/4 ]
    [ 0  0  0  −15/4  | −15/2 ]

−(4/15)R4:
    [ 1  0  0   5/8  |  9/4 ]
    [ 0  1  0  17/8  | 13/4 ]
    [ 0  0  1  19/8  |  7/4 ]
    [ 0  0  0    1   |   2  ]

R1 − (5/8)R4, R2 − (17/8)R4, R3 − (19/8)R4:
    [ 1  0  0  0 |  1 ]
    [ 0  1  0  0 | −1 ]
    [ 0  0  1  0 | −3 ]
    [ 0  0  0  1 |  2 ]

Our solution can be read from the last column:

x1 = 1, x2 = −1, x3 = −3, x4 = 2.

□
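As an aside for readers with access to a computer (nothing like this appears in the original notes), the Gauss-Jordan procedure described above can be sketched in a few lines of Python; the function name `gauss_jordan` and the use of exact fractions are our own illustrative choices.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an augmented matrix [A|b] (a list of rows) to reduced
    row-echelon form using the elementary row operations of the text.
    Exact Fractions avoid floating-point rounding."""
    rows, cols = len(aug), len(aug[0])
    pivot_row = 0
    for col in range(cols - 1):              # last column holds the right-hand sides
        # find a row with a non-zero entry in this column and swap it up
        for r in range(pivot_row, rows):
            if aug[r][col] != 0:
                aug[pivot_row], aug[r] = aug[r], aug[pivot_row]
                break
        else:
            continue                         # no pivot in this column
        # scale the pivot row so the pivot entry becomes 1
        p = aug[pivot_row][col]
        aug[pivot_row] = [x / p for x in aug[pivot_row]]
        # subtract multiples of the pivot row to clear the rest of the column
        for r in range(rows):
            if r != pivot_row and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    return aug

# the 4 x 4 system solved above
A = [[Fraction(x) for x in row] for row in
     [[2, 1, -1, 1, 6],
      [1, 0, 1, 3, 4],
      [3, -2, 1, 0, 2],
      [1, 1, 0, -1, -2]]]
solution = [row[-1] for row in gauss_jordan(A)]
print(solution == [1, -1, -3, 2])   # True -- matches the read-off above
```

Run on the 3 × 3 system of the previous pages, the same function likewise yields x = 1, y = −1, z = 1.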

Existence of Solutions

For two dimensional systems, systems in two independent variables x and y, a pair of linear equations can be represented as a pair of straight lines. For example,

x + y = 2

x + 2y = 5,

represents a pair of lines intersecting in the point (x, y) = (−1, 3). This intersection point is the solution of the system. Of course a pair of straight lines need not intersect – they can be parallel or coincide. If they coincide then all points (x, y) on the line will satisfy the system. For example,

x + 2y = 5

−3x− 6y = −15.

The second equation is simply a multiple of the first; all points (x, (5 − x)/2), for any x, satisfy the

system. Two distinct parallel lines give equations which have no points in common; the equations are inconsistent – there is no solution. For example,

x− 3y = 2

x− 3y = 6.

are inconsistent. The fact that the equations are inconsistent is easily discovered if we are using Gauss-Jordan elimination. Of course, it’s obvious in this case, but for a large system it can be far from obvious.

    [ 1  −3 | 2 ]
    [ 1  −3 | 6 ]

R2 − R1:
    [ 1  −3 | 2 ]
    [ 0   0 | 4 ]  ← inconsistency

For three dimensional systems each linear equation ax + by + cz = d (a, b, c, and d are all constant) represents a plane in R3. Two equations give two planes which may intersect in a line or be parallel. If the planes are parallel the pair of equations will be inconsistent; if they coincide, all points on the plane satisfy the pair of equations. If we have a third plane (third equation) then there are a number of possibilities, for distinct planes,

• all three planes are distinct and parallel – inconsistent system.

• two of the planes are distinct and parallel – two parallel straight-line intersections; inconsistent.

• all planes distinct and non-parallel with three straight line intersections – inconsistent.

• all three planes intersect in a single straight line – we can solve for two of the unknowns in terms of the third.

• the three planes intersect in a single point – a unique solution.


■ Example Solve the following set of linear equations,

x− y − 3z = −3

3x + y − z = −5

x + 2y + 3z = 0.

Solution

    [ 1  −1  −3 | −3 ]
    [ 3   1  −1 | −5 ]
    [ 1   2   3 |  0 ]

R2 − 3R1:
    [ 1  −1  −3 | −3 ]
    [ 0   4   8 |  4 ]
    [ 1   2   3 |  0 ]

R3 − R1:
    [ 1  −1  −3 | −3 ]
    [ 0   4   8 |  4 ]
    [ 0   3   6 |  3 ]

(1/4)R2:
    [ 1  −1  −3 | −3 ]
    [ 0   1   2 |  1 ]
    [ 0   3   6 |  3 ]

R1 + R2:
    [ 1  0  −1 | −2 ]
    [ 0  1   2 |  1 ]
    [ 0  3   6 |  3 ]

R3 − 3R2:
    [ 1  0  −1 | −2 ]
    [ 0  1   2 |  1 ]
    [ 0  0   0 |  0 ]

The last row tells us that the equations are not independent – the planes intersect in a single straight line. From the first two rows we have

    1·x + 0·y − 1·z = −2
    0·x + 1·y + 2·z = 1,

or

x = z − 2

y = 1− 2z.

This is as far as we can go with this system. □
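The one-parameter family x = z − 2, y = 1 − 2z can be spot-checked by substitution. A tiny Python sketch (an aside, not part of the notes; `satisfies` is our own name):

```python
# Spot-check the one-parameter solution x = z - 2, y = 1 - 2z of the
# dependent system above: every choice of z should satisfy all three equations.
def satisfies(x, y, z):
    return (x - y - 3 * z == -3 and
            3 * x + y - z == -5 and
            x + 2 * y + 3 * z == 0)

print(all(satisfies(z - 2, 1 - 2 * z, z) for z in range(-5, 6)))   # True
```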

If we go to higher dimensional systems the situation becomes increasingly complex. Clearly, we need a simple test for consistency of our linear systems. Such a test is available using what is known as the determinant. We will investigate this at a later stage in the course.

♠ Exercises 1

1. Solve the Chinese corn problem described at the beginning of this lecture.

2. Find all solutions, if any, to the following linear systems.

(a) x − 2y + 3z = 11        (b) −2x1 + x2 + 6x3 = 18
    4x + y − z = 4              5x1 + 8x3 = −16
    2x − y + 3z = 10            3x1 + 2x2 − 10x3 = −3

(c) 3x + 6y − 6z = 9        (d) 3u + 6v − 6w = 9
    2x − 5y + 4z = 6            2u − 5v + 4w = 6
    −x + 16y − 14z = −3         5u + 28v − 26w = −8

(e) x + y − z = 7           (f) x + y − z = 7
    4x − y + 5z = 4             4x − y + 5z = 4
    2x + 2y − 3z = 0            6x + y + 3z = 18

3. A traveller just returned from Europe spent $60 a day for housing in Austria, $40 a day in France and $40 a day in Germany. For food the traveller spent $20 a day in Austria, $30 a day in France and $20 a day in Germany. The traveller spent $15 a day in each country on incidentals. The traveller’s records for the trip indicate a total of $680 spent on housing, $320 for food and $210 for incidental expenses. Calculate the number of days the traveller spent in each country or show that the traveller’s records are inconsistent.


Lecture 3.2 Matrices

Matrix Multiplication

In our first lecture we met the augmented matrix of a system of linear equations. At first sight the augmented matrix might appear to be a mere aid to calculation. However, it is much more than this. The augmented matrix is our first glimpse of the underlying algebra of matrices. This algebra of matrices has wide ranging applications in many areas of mathematics, economics and the sciences.

The augmented matrix consists of two parts: the coefficient matrix, the part to the left of the vertical line, and the right hand column. What we want to do now is develop a rule for matrix multiplication which will allow us to reconstruct the system of equations directly from the coefficient matrix and the column.

Let’s start with a simple example

(∗) x + y = 2

2x− y = 1

The coefficient matrix is

    A = [ 1   1 ]
        [ 2  −1 ]

and the column, known as a column vector, is

    C = [ 2 ]
        [ 1 ]

The fact that the right hand side can be written as a column vector suggests we define a column vector of the unknowns,

    X = [ x ]
        [ y ]

We write our system of equations (∗) as AX = C. This gives us the multiplication law for a matrix onto a column vector:

    AX = [ 1   1 ] [ x ]  =  [ x + y  ]
         [ 2  −1 ] [ y ]     [ 2x − y ]

Schematically, the multiplication law is

    [ 1   1 ] ( x )   =   ( x + y  )
    [ 2  −1 ] ( y )       ( 2x − y )

for the top row

    ( 1  1 ) ( x )  = 1·x + 1·y = x + y
             ( y )

and for the second row

    ( 2  −1 ) ( x )  = 2·x + (−1)·y = 2x − y
              ( y )

The general rule is this: we

multiply corresponding elements of the row and column together, then add the result.


For example,

    ( 2  1  3  0  1 ) [  10 ]
                      [ −20 ]
                      [   1 ]   =  ( 2×10 + 1×(−20) + 3×1 + 0×(−1) + 1×5 )  =  ( 8 ).
                      [  −1 ]
                      [   5 ]

■ Example Write the following system of linear equations in matrix form,

5x− y + 3z = 10

x + 5y − 6z = 1

−x + 2y + z = 0.

Solution

We need to write the system in the form AX = C. This is easily done. The coefficient matrix is

    A = [  5  −1   3 ]
        [  1   5  −6 ]
        [ −1   2   1 ]

The column vector for the right hand side is

    C = [ 10 ]
        [  1 ]
        [  0 ]

The column vector of unknowns is

    X = [ x ]
        [ y ]
        [ z ]

The linear system is

    [  5  −1   3 ] [ x ]   [ 10 ]
    [  1   5  −6 ] [ y ] = [  1 ]
    [ −1   2   1 ] [ z ]   [  0 ]

□

■ Example Perform the matrix multiplication AC, where

    A = [ 1   0  −4 ]
        [ 3   2   1 ]
        [ 1  −1   1 ]

and

    C = [ −2 ]
        [  3 ]
        [  1 ]

Solution

    AC = [ 1   0  −4 ] [ −2 ]   [ 1×(−2) + 0×3 + (−4)×1 ]   [ −6 ]
         [ 3   2   1 ] [  3 ] = [ 3×(−2) + 2×3 + 1×1    ] = [  1 ]
         [ 1  −1   1 ] [  1 ]   [ 1×(−2) + (−1)×3 + 1×1 ]   [ −4 ]

□

Notice in our last example that the element −6, the first element in the column resulting from the product AC, comes from multiplying the first row of A onto C. The second element of the


column, 1, comes from multiplying the second row of A onto C, and so on. For this to work the number of columns in A must be equal to the number of elements (rows) in C.

Written in this way the matrix product is easily generalised. Two matrices A and B can be multiplied if

number of columns of A = number of rows of B.

If this is the case, the element in the ith row and jth column of the product AB is given by the product of the ith row of A with the jth column of B.

■ Example Form AB where

    A = [  2  1 ]        B = [ 2  0  −1 ]
        [ −1  3 ]  and       [ 1  1   5 ]

Solution Notice the number of columns of A is 2 and the number of rows of B is 2. So we can form AB.

    AB = [  2  1 ] [ 2  0  −1 ]
         [ −1  3 ] [ 1  1   5 ]

       = [ 2×2 + 1×1       2×0 + 1×1       2×(−1) + 1×5    ]
         [ (−1)×2 + 3×1    (−1)×0 + 3×1    (−1)×(−1) + 3×5 ]

       = [ 5  1   3 ]
         [ 1  3  16 ]

□

To write out this matrix multiplication rule in its full generality we need a little abstract notation.

Consider a general m × n matrix, A, i.e. a matrix with m rows and n columns. We enumerate the elements (or entries) of A using the notation aij, where i gives the row number (so i = 1, 2, . . . , m) and j gives the column number (so j = 1, 2, 3, . . . , n).

    A = (aij) = [ a11  a12  ...  a1j  ...  a1n ]
                [ a21  a22  ...  a2j  ...  a2n ]
                [ ...                          ]
                [ ai1  ai2  ...  aij  ...  ain ]
                [ ...                          ]
                [ am1  am2  ...  amj  ...  amn ]

Perhaps this notation is a little easier to fathom if we write out a general 2× 2 matrix C,

    C = (cij) = [ c11  c12 ]
                [ c21  c22 ]

Now for our general matrix multiplication rule

Let A = (aij) be an m × n matrix and B = (bij) be an r × s matrix. Then the product AB is only defined when n = r. If n = r then the product is

    AB = ( Σ_{k=1}^{n} aik bkj ),

an m × s matrix.
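The boxed rule translates directly into a triple loop. The following short Python sketch (an illustrative aside; `mat_mul` is our own name, not something from the notes) implements it and checks it against the AB example above.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x s matrix B: entry (i, j)
    is the sum over k of A[i][k] * B[k][j] -- the boxed rule above."""
    m, n = len(A), len(A[0])
    r, s = len(B), len(B[0])
    if n != r:
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(s)]
            for i in range(m)]

A = [[2, 1], [-1, 3]]
B = [[2, 0, -1], [1, 1, 5]]
print(mat_mul(A, B))   # [[5, 1, 3], [1, 3, 16]]
```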

■ Example For

    A = [  1  0  1 ]      B = [ −1  2 ]      C = [  1  0  0 ]
        [ −1  5  0 ]          [  3  0 ]          [ −1  2  1 ]
        [  2  1  1 ]          [  1  1 ]          [  0  3  1 ]


state whether the following products exist. If the product exists calculate it.

(a) AB (b) BA (c) AC (d) CA.

Solution

(a) AB: for the product to exist we must have number of columns of A = number of rows of B. This is true in this case.

    AB = [  1  0  1 ] [ −1  2 ]
         [ −1  5  0 ] [  3  0 ]
         [  2  1  1 ] [  1  1 ]

       = [ 1×(−1) + 0×3 + 1×1        1×2 + 0×0 + 1×1    ]
         [ (−1)×(−1) + 5×3 + 0×1     (−1)×2 + 5×0 + 0×1 ]
         [ 2×(−1) + 1×3 + 1×1        2×2 + 1×0 + 1×1    ]

       = [  0   3 ]
         [ 16  −2 ]
         [  2   5 ]

(b) BA: the product does not exist. We have number of columns of B = 2 ≠ number of rows of A = 3.

(c) AC = [  1  0  1 ] [  1  0  0 ]   [  1   3  1 ]
         [ −1  5  0 ] [ −1  2  1 ] = [ −6  10  5 ]
         [  2  1  1 ] [  0  3  1 ]   [  1   5  2 ]

(d) CA = [  1  0  0 ] [  1  0  1 ]   [  1   0  1 ]
         [ −1  2  1 ] [ −1  5  0 ] = [ −1  11  0 ]
         [  0  3  1 ] [  2  1  1 ]   [ −1  16  1 ]

□

Note from parts (c) and (d) of our last example that AC ≠ CA. Of course, for real numbers a and c we have ac = ca. This is not true, in general, for square matrices (number of rows = number of columns). Matrices, in general, do not commute – we cannot interchange the order of multiplication.
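A quick machine check of this non-commutativity (an illustrative aside, not part of the notes), using the matrices A and C of the last example:

```python
# Compute AC and CA with the row-times-column rule and compare them.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 0, 1], [-1, 5, 0], [2, 1, 1]]
C = [[1, 0, 0], [-1, 2, 1], [0, 3, 1]]
print(mat_mul(A, C) == mat_mul(C, A))   # False
```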

The Algebra of Matrices

We now introduce definitions of matrix addition and scalar multiplication. The reason for the form of these definitions is easy to understand if you think of the matrix in terms of the associated linear system of equations. It is not necessary to do this, however; these are abstract definitions giving an algebra of matrices.

The multiplication of a matrix A = (aij) by a scalar (number) λ is defined to be

    λA = Aλ = (λaij).


All this says is that you multiply a matrix by a number simply by multiplying each entry in A by the number.

We now define the addition of two matrices. The idea is simple: we add corresponding elements in the matrices. Note that we can only do this if the matrices are of the same size, i.e. they must both be m × n.

Two matrices of the same size, A = (aij) and B = (bij), are added as follows

A + B = (aij + bij).

■ Example For

    M = [ 1  0  −1 ]        N = [ 2  −1  5 ]
        [ 2  5   0 ]  and       [ 6   0  0 ]
        [ 1  1  −1 ]            [ 3   1  1 ]

calculate the following

(i) 3M (ii) −5N (iii) M + N (iv) 2M − 3N .

Solution

(i) 3M = 3 [ 1  0  −1 ]   [ 3   0  −3 ]
           [ 2  5   0 ] = [ 6  15   0 ]
           [ 1  1  −1 ]   [ 3   3  −3 ]

(ii) −5N = −5 [ 2  −1  5 ]   [ −10   5  −25 ]
              [ 6   0  0 ] = [ −30   0    0 ]
              [ 3   1  1 ]   [ −15  −5   −5 ]

(iii) M + N = [ 1+2  0−1  −1+5 ]   [ 3  −1  4 ]
              [ 2+6  5+0   0+0 ] = [ 8   5  0 ]
              [ 1+3  1+1  −1+1 ]   [ 4   2  0 ]

(iv) 2M − 3N = [ 2   0  −2 ]   [  −6   3  −15 ]   [  −4   3  −17 ]
               [ 4  10   0 ] + [ −18   0    0 ] = [ −14  10    0 ]
               [ 2   2  −2 ]   [  −9  −3   −3 ]   [  −7  −1   −5 ]

□
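The two boxed definitions are entry-by-entry operations, so they are one-liners in Python. A sketch (illustrative only; `scale` and `add` are our own names) that reproduces part (iv) of the example:

```python
def scale(lam, A):
    """lam * A: multiply every entry of A by the scalar lam."""
    return [[lam * x for x in row] for row in A]

def add(A, B):
    """A + B for matrices of the same size: add corresponding entries."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

M = [[1, 0, -1], [2, 5, 0], [1, 1, -1]]
N = [[2, -1, 5], [6, 0, 0], [3, 1, 1]]
print(add(scale(2, M), scale(-3, N)))
# [[-4, 3, -17], [-14, 10, 0], [-7, -1, -5]]
```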

Using our definitions of matrix multiplication and addition, together with scalar multiplication, we can easily prove the following associative and distributive laws.

Theorem

Let A, B and C be matrices of the size required so that all products below exist; let λ ∈ R be a scalar.

Then


(i) A(BC) = (AB)C, associativity of the matrix product.

(ii) λ(AB) = (λA)B = A(λB)

(iii) A(B + C) = AB + AC and (B + C)A = BA + CA, distributivity of matrix multiplication.

(iv) λ(A + B) = λA + λB, distributivity of scalar multiplication.

Proof. The proofs are very easy – write the matrices as A = (aij) and then use the relevant property of the real numbers aij. We’ll prove the first of the distributive laws, (iii).

(iii) B + C = (bij) + (cij) = (bij + cij) = (dij), say. So dij = bij + cij.

    A(B + C) = (apq)(dij)
             = ( Σ_{k=1}^{n} apk dkj )
             = ( Σ_{k=1}^{n} (apk bkj + apk ckj) )
             = ( Σ_{k=1}^{n} apk bkj + Σ_{k=1}^{n} apk ckj )
             = ( Σ_{k=1}^{n} apk bkj ) + ( Σ_{k=1}^{n} apk ckj )
             = AB + AC.  □


♠ Exercises 2

1. For

    A = [ −5  1  1 ]      B = [ 10  −1  0 ]      C = [ 2   3 ]
        [  2  1  0 ]          [  3  −2  1 ]          [ 0  −1 ]
        [ −3  0  1 ]          [  1   0  1 ]          [ 4   2 ]

state whether the following products and/or sums exist. If they exist, calculate them.

(a) AB (b) AC (c) CB (d) A−B (e) A + 3C

(f) AC −BC (g) AB −BA.

2. Consider the matrices

    A = [ 0  1 ]        B = [ −1  −1 ]
        [ 0  1 ]  ,         [  0   0 ]  .

Prove that

    (A + B)² ≠ A² + 2AB + B²

but that

    (A + B)³ = A³ + 3A²B + 3AB² + B³.

Note: A² = AA, A³ = AAA etc.

3. Suppose A and B are square matrices, i.e. n × n matrices. Show that if A and B commute, i.e. AB = BA, then so do Aᵐ and Bᵐ.

4. Contagious Diseases

Suppose a group of m people have a contagious disease. Suppose that this first group has some contact with a second group of n people. We define numbers aij (= 0 or 1) such that

    aij = { 1, if the ith member of group 1 has contacted the jth member of group 2
          { 0, if there was no contact between the ith member of group 1 and the jth member of group 2.

A = (aij) is the direct contact matrix. Now suppose we have a third group of r individuals who have come into contact with some individuals of group 2. We have a direct contact matrix for group 2 to group 3, B = (bkℓ).

The matrix C = AB will represent the indirect contacts between groups 1 and 3. The ℓth column of C will give the indirect contacts of the ℓth individual of the third group with the disease.

Let

    A = [ 1  0  1  0 ]        B = [ 1  0  1  0  0 ]
        [ 0  1  1  0 ]  ,         [ 0  0  0  1  0 ]
        [ 1  0  0  1 ]            [ 1  1  0  0  0 ]
                                  [ 0  0  1  0  1 ]

(a) How many people are in each of the groups?

(b) Find the matrix of indirect contacts for group 1 to group 3.

(c) Are there any individuals in group 3 who have had no contact with the disease?

(d) How many contacts has the 3rd individual of the third group had?


Lecture 3.3 The Inverse of a Square Matrix

Two Special Matrices

The real numbers possess two special elements 0 and 1 which satisfy

0 + a = a + 0 = a and

1 · a = a · 1 = a, for all a ∈ R.

Do there exist corresponding elements in our algebra of matrices? The answer is a qualified yes.

We define the m × n zero matrix Om,n as the matrix all of whose entries are zero. In which case we clearly have, for any m × n matrix A,

Om,n + A = A + Om,n = A

and A−A = Om,n.

To define a multiplicative identity (a one!) for matrices we must confine ourselves to n × n matrices, square matrices.

The n × n identity matrix is the n × n matrix with 1’s down the main diagonal (the diagonal from the top left corner to the bottom right corner) and 0’s elsewhere. That is,

    In = (δij),  where δij = { 1 if i = j
                             { 0 if i ≠ j.

The set of constants δij is known as the Kronecker delta.

For example,

    I2 = [ 1  0 ]        I3 = [ 1  0  0 ]
         [ 0  1 ]  and        [ 0  1  0 ]
                              [ 0  0  1 ]

As a consequence of our definition we have the following theorem.

Theorem Let A be any n× n matrix, then

AIn = InA = A.

Proof This is a very straightforward proof which follows directly from the definition of In.

For A = (aij), we have

    AIn = (aij)(δkℓ)
        = ( Σ_{k=1}^{n} aik δkj ).

But Σ_{k=1}^{n} aik δkj = ai1 δ1j + ai2 δ2j + . . . + ain δnj, and we know from the definition of the Kronecker delta that only one term in this sum can be non-zero – the term with k = j. So,

    Σ_{k=1}^{n} aik δkj = aij δjj = aij


∴ AIn = (aij) = A.

In a similar manner, InA = A. □
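The Kronecker-delta definition and the theorem AIn = InA = A can be checked mechanically. A Python sketch (an aside; `identity` and `mat_mul` are our own illustrative names):

```python
def identity(n):
    """I_n from the Kronecker delta: entry (i, j) is 1 if i == j, else 0."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Row-times-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[5, -1, 3], [1, 5, -6], [-1, 2, 1]]   # 3 x 3 matrix from an earlier example
I3 = identity(3)
print(mat_mul(A, I3) == A and mat_mul(I3, A) == A)   # True
```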

Inverse of a Matrix

A nonzero real number has a multiplicative inverse, i.e. for a ∈ R and a ≠ 0 we have a−1 = 1/a, with a−1a = 1. Can we find an inverse matrix for a given matrix A? The first observation is that A must be square, i.e. an n × n matrix. This is because the analogue of 1 is In.

Let A and B be n × n matrices. Suppose that

    AB = BA = In,

then B is called the inverse of A and we write B = A−1, if such a matrix B exists.

    AA−1 = A−1A = I.

If a square matrix has an inverse it is said to be invertible.

Notice that the definition implies that (A−1)−1 = A, if A is invertible.

In our definition we referred to “the” inverse. As the following theorem shows, the inverse, if it exists, is unique.

Theorem If a square matrix is invertible then the inverse is unique.

Proof Let B and C each be an inverse for A. That is,

AB = BA = I and AC = CA = I.

Then

    B = BI = B(AC) = (BA)C = IC = C,

where we have used the associative law for matrix multiplication. So B = C. The inverse is unique. □

Let us, for the moment, return to linear systems of equations. A system of n equations in n unknowns can be written as

    AX = b,

where A is an n × n matrix and b is an n × 1 matrix or column vector. If A is invertible then we have

A−1AX = A−1b

IX = A−1b

i.e. X = A−1b,

a unique solution. We have proved:

Theorem If the coefficient matrix, A, of a system of n linear equations in n unknowns is invertible then the system has a unique solution. The system AX = b, for A invertible, has unique solution X = A−1b.

Another important fact about the inverse is the following.

Theorem Let A and B be invertible matrices. Then AB is invertible and

(AB)−1 = B−1A−1.


Proof Write C = AB and D = B−1A−1. Then

    DC = (B−1A−1)(AB)
       = B−1(A−1A)B, by associativity,
       = B−1(I)B
       = B−1B
       = I.

Also,

    CD = (AB)(B−1A−1)
       = A(BB−1)A−1
       = AA−1
       = I.

We see from our definition of the inverse that D = B−1A−1 is the inverse of C = AB. □

We have still to answer two basic questions.

• How do we calculate the inverse?

• Which matrices have inverses?

We will answer the first of these questions now. The second question requires the notion of a determinant for a complete answer; we will examine this later.

Calculating the Inverse Matrix

In fact we already effectively have a technique for calculating the inverse. In solving the n × n linear system

    AX = b

we reduce the coefficient matrix to the identity matrix.

A−1AX = A−1b

i.e. IX = A−1b.

If we write the original system as AX = Ib,

we see that elementary row operations can be used to transform the system to

IX = A−1b.

This can be viewed as transforming the matrix

    (A|I)

to the matrix

    (I|A−1).

The matrix (A|I) is referred to as the augmented matrix for A.

We have our technique for computing the inverse of A: form the augmented matrix (A|I) and use elementary row operations to reduce the left half of the augmented matrix to the identity I. The right side of the augmented matrix is then the desired inverse.

■ Example Find the inverse of

    [ 2  1 ]
    [ 1  1 ]  .

Solution

The augmented matrix is

    [ 2  1 | 1  0 ]
    [ 1  1 | 0  1 ]  .


We now row reduce this matrix

12R1

(1 1

212 0

1 1 0 1

)R2 −R1

(1 1

212 0

0 12 − 1

2 1

)2R2

(1 1

212 0

0 1 −1 2

)R1 −

12R2

(1 0 1 −10 1 −1 2

).

We have (2 11 1

)−1

=(

1 −1−1 2

).

2

Next we want to calculate the inverse of any (invertible) 2 × 2 matrix

A = [ a11  a12 ]
    [ a21  a22 ].

Firstly, observe that at least one of a11 and a21 must be nonzero: it is easy to see that a matrix of the form

[ 0  a12 ]
[ 0  a22 ]

is not invertible.

The augmented matrix for A is

[ a11  a12 | 1  0 ]
[ a21  a22 | 0  1 ].

One of a11, a21 must be non-zero; if a11 = 0 we interchange the rows. So we may assume a11 ≠ 0. We now reduce the augmented matrix.

(1/a11)R1:
[ 1    a12/a11 | 1/a11  0 ]
[ a21  a22     | 0      1 ]

R2 − a21R1:
[ 1  a12/a11          | 1/a11     0 ]
[ 0  a22 − a12a21/a11 | −a21/a11  1 ].

Now to proceed further we must have

a22 − a12a21/a11 ≠ 0, i.e. (a22a11 − a12a21)/a11 ≠ 0.

The quantity ∆ = a22a11 − a12a21 is known as the determinant of the 2 × 2 matrix A. So a necessary condition for A to have an inverse is that its determinant, also written as det A, must be nonzero. In fact, as we will see later, the condition

∆ = det A ≠ 0

is both necessary and sufficient for A to have an inverse. Back to our row reduction, assuming ∆ ≠ 0:

[ 1  a12/a11 | 1/a11     0 ]
[ 0  ∆/a11   | −a21/a11  1 ]

(a11/∆)R2:
[ 1  a12/a11 | 1/a11   0     ]
[ 0  1       | −a21/∆  a11/∆ ]

R1 − (a12/a11)R2:
[ 1  0 | 1/a11 + a12a21/(a11∆)  −a12/∆ ]
[ 0  1 | −a21/∆                 a11/∆  ].

Note that

1/a11 + a12a21/(a11∆) = (∆ + a12a21)/(a11∆) = a22/∆.

So we have

[ 1  0 | a22/∆   −a12/∆ ]
[ 0  1 | −a21/∆  a11/∆  ].

We have the following result.

Theorem A 2 × 2 matrix is invertible if and only if ∆ = det A ≠ 0. If

A = [ a11  a12 ]
    [ a21  a22 ]

is invertible then

A−1 = (1/∆) [ a22   −a12 ]
            [ −a21  a11  ].
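As a sanity check, the closed-form inverse can be verified numerically. This is our own illustration, assuming NumPy is available, and reuses the 2 × 2 matrix from the earlier example:

```python
import numpy as np

a11, a12, a21, a22 = 2.0, 1.0, 1.0, 1.0
delta = a11 * a22 - a12 * a21          # the determinant, must be nonzero
A = np.array([[a11, a12], [a21, a22]])
A_inv = (1 / delta) * np.array([[a22, -a12], [-a21, a11]])

# A times its inverse should give the 2x2 identity.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```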

Example Find the inverse, if it exists, of

[ 1  2  0  ]
[ 1  0  −1 ]
[ 0  1  0  ].

Solution

The augmented matrix is

[ 1  2  0  | 1  0  0 ]
[ 1  0  −1 | 0  1  0 ]
[ 0  1  0  | 0  0  1 ].

We now reduce:

R2 − R1:
[ 1  2   0  | 1   0  0 ]
[ 0  −2  −1 | −1  1  0 ]
[ 0  1   0  | 0   0  1 ]

−(1/2)R2:
[ 1  2  0   | 1    0     0 ]
[ 0  1  1/2 | 1/2  −1/2  0 ]
[ 0  1  0   | 0    0     1 ]

R1 − 2R2:
[ 1  0  −1  | 0    1     0 ]
[ 0  1  1/2 | 1/2  −1/2  0 ]
[ 0  1  0   | 0    0     1 ]

R3 − R2:
[ 1  0  −1   | 0     1     0 ]
[ 0  1  1/2  | 1/2   −1/2  0 ]
[ 0  0  −1/2 | −1/2  1/2   1 ]

−2R3:
[ 1  0  −1  | 0    1     0  ]
[ 0  1  1/2 | 1/2  −1/2  0  ]
[ 0  0  1   | 1    −1    −2 ]

R1 + R3:
[ 1  0  0   | 1    0     −2 ]
[ 0  1  1/2 | 1/2  −1/2  0  ]
[ 0  0  1   | 1    −1    −2 ]

R2 − (1/2)R3:
[ 1  0  0 | 1  0   −2 ]
[ 0  1  0 | 0  0   1  ]
[ 0  0  1 | 1  −1  −2 ].

We have

[ 1  2  0  ]^(−1)    [ 1  0   −2 ]
[ 1  0  −1 ]       = [ 0  0   1  ]
[ 0  1  0  ]         [ 1  −1  −2 ]. □

♠ Exercises 3

1. Calculate, if it exists, the inverse for each of the following matrices.

(a) [ 2  1 ]
    [ 3  2 ]

(b) [ 0  1 ]
    [ 1  0 ]

(c) [ 1  1  1 ]
    [ 0  2  3 ]
    [ 5  5  1 ]

(d) [ 3  2  1  ]
    [ 0  2  2  ]
    [ 0  0  −1 ]

(e) [ 1  1  1 ]
    [ 0  1  1 ]
    [ 0  0  1 ]

(f) [ 1   0  2   3 ]
    [ −1  1  0   4 ]
    [ 2   1  −1  3 ]
    [ −1  0  5   7 ]

2. Use induction to prove that if A1, . . . , AN are all invertible matrices then A1A2 . . . AN is invertible and

(A1A2 . . . AN)^(−1) = AN^(−1) A(N−1)^(−1) . . . A2^(−1) A1^(−1).

3. For any real number θ show that the matrix

[ sin θ  cos θ   0 ]
[ cos θ  −sin θ  0 ]
[ 0      0       1 ]

is invertible and find its inverse.

Lecture 3.4 More on Linear Equations

In our first lecture we saw how to solve (if the equations were consistent) a system of n equations in n unknowns. We used elementary row operations to reduce the augmented matrix for the system. In matrix terms we found this was equivalent to inverting the coefficient matrix A:

system: AX = b
solution (A invertible): X = A−1b.

To get a unique solution we require A to be invertible.

But this is not all there is to linear systems.

• What about homogeneous systems, i.e. systems with b = 0?

• What about m linear equations in n unknowns (m ≠ n)?

Homogeneous Systems

An m × n homogeneous system

a11x1 + a12x2 + . . . + a1nxn = 0
a21x1 + a22x2 + . . . + a2nxn = 0
. . .
am1x1 + am2x2 + . . . + amnxn = 0

will always have at least one solution: the trivial solution x1 = x2 = . . . = xn = 0. So in general there are just two possibilities: either there is a unique solution (which must then be the trivial solution) or there are an infinite number of solutions (including the trivial solution).

An n × n homogeneous system

AX = On,1,

with A = (aij) an n × n matrix, will have a unique solution, the trivial solution, if A is invertible. For a non-trivial solution to an n × n homogeneous system, A must be non-invertible, or singular.

For an m × n homogeneous system with n > m there must always be an infinite number of solutions. This is because if there were only one solution, row reduction would have to lead to the unique trivial solution; in this case we would have to end up with at least the n equations

x1 = 0
x2 = 0
x3 = 0
. . .
xn = 0

(there may also be other equations of the form "0 = 0"). But row reduction preserves the number of equations, so there must have been at least n equations to begin with, i.e. m ≥ n. This contradicts the assumption n > m. We have just proved the following theorem.

Theorem A homogeneous system of m equations in n unknowns has an infinite number of solutions if n > m.

Example Solve the system

x + y − z = 0
4x − 2y + 7z = 0

Solution Row reduce:

[ 1  1   −1 | 0 ]
[ 4  −2  7  | 0 ]

R2 − 4R1:
[ 1  1   −1 | 0 ]
[ 0  −6  11 | 0 ]

−(1/6)R2:
[ 1  1  −1    | 0 ]
[ 0  1  −11/6 | 0 ]

R1 − R2:
[ 1  0  5/6   | 0 ]
[ 0  1  −11/6 | 0 ]

There will be an infinite number of solutions, given by

x + (5/6)z = 0, i.e. x = −(5/6)z
y − (11/6)z = 0, i.e. y = (11/6)z.

All points (−(5/6)z, (11/6)z, z), for any z ∈ R, satisfy the system. □

Systems of m Equations in n Unknowns

We consider a system of m equations in n unknowns,

a11x1 + a12x2 + . . . + a1nxn = b1

a21x1 + a22x2 + . . . + a2nxn = b2

. . .

am1x1 + am2x2 + . . . + amnxn = bm.

which can be written in matrix form as

AX = b.

The augmented matrix is (A|b). To solve the system (or to show it to be inconsistent) we would attempt to row reduce the augmented matrix. We notice that in all our examples, even in the inconsistent cases, we could always row reduce the augmented matrix to "stairstep" form

(∗)
[ 0 . . . 0  ∗  x  x  x  x  x ]
[ 0 . . . 0  0  ∗  x  x  x  x ]
[ 0 . . . 0  0  0  0  ∗  x  x ]
[ 0 . . . 0  0  0  0  0  0  ∗ ]

where all the entries under the steps are zero, all corner entries (marked with a ∗) are non-zero, and all other entries (marked x) are arbitrary. Note the steps descend one row at a time, whereas the span of each step may be more than one column. Such a matrix is known as a row-echelon matrix.

Example The following are row-echelon matrices:

[ 1  5 ]
[ 0  0 ],

[ 1  10 ]
[ 0  1  ],

[ 0  1  3 ]
[ 0  0  2 ]
[ 0  0  0 ],

[ 0  0  1  5  6  7 ]
[ 0  0  0  2  0  0 ]
[ 0  0  0  0  0  4 ],

[ 0  15  2 ]
[ 0  0   1 ]
[ 0  0   0 ]
[ 0  0   0 ].

As you might have guessed, every non-zero matrix can be turned into a row-echelon matrix.

Theorem By means of elementary row operations any non-zero matrix can be reduced to row-echelon form.

Proof

Let A = (aij) be a non-zero m × n matrix. Then A must have at least one non-zero column. The first such column from the left must contain at least one non-zero element. By interchanging rows, if necessary, we can ensure that the first (top-most) element of this first non-zero column is non-zero. So A will have been transformed into a matrix of the form

B = [ 0 . . . 0  b11  b12 . . . b1r ]
    [ 0 . . . 0  b21  b22 . . . b2r ]
    [ . . .                         ]
    [ 0 . . . 0  bm1  bm2 . . . bmr ]

with b11 ≠ 0.

Performing Ri − (bi1/b11)R1 for i = 2, . . . , m yields

[ 0 . . . 0  b11  b12 . . . b1r ]
[ 0 . . . 0  0    c22 . . . c2r ]
[ . . .                         ]
[ 0 . . . 0  0    cm2 . . . cmr ]

where cij = bij − (bi1/b11)b1j. Now apply the same process to the submatrix

[ c22 . . . c2r ]
[ . . .         ]
[ cm2 . . . cmr ].

This will extend our steps to another row. So after no more than m steps of this process we will arrive at a row-echelon matrix. □

If you now look a little closer at the row reductions we have performed you will see that they all have two other things in common, aside from being in row-echelon form. Firstly, the non-zero corner entries are all 1's. Secondly, every entry above each corner 1 is zero. A row-echelon matrix with these two additional properties is called a reduced row-echelon matrix or Hermite matrix.

Example The following are reduced row-echelon matrices:

[ 1  0 ]
[ 0  1 ],

[ 0  0  1  0  0  2 ]
[ 0  0  0  0  1  3 ]
[ 0  0  0  0  0  0 ],

[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  1 ].

Using the same method of proof as above we can now easily prove the following theorem.

Theorem Every non-zero matrix can, by means of elementary row operations, be transformed to a reduced row-echelon matrix.

In fact, our solution method, Gauss-Jordan elimination, is just the process of reducing a matrix to reduced row-echelon form.

Once we have reduced the augmented matrix to row-echelon form we can analyse the possible solutions of the system. Upon inspection we can draw one of the following three possible conclusions.

• The last non-zero equation reads xn = c for some constant c. Then there is either a unique solution or an infinite number of solutions.

• The last non-zero equation reads

ckℓxℓ + ck(ℓ+1)x(ℓ+1) + . . . + cknxn = c,

with ℓ < n, where at least two of the coefficients are non-zero. There are an infinite number of solutions.

• The last equation reads 0 = c, where c ≠ 0. There is no solution; the equations are inconsistent.

One point we should make here is that it is straightforward to solve a system of equations once you have it in row-echelon form. The method simply involves systematically "back substituting" from the last equation. This technique is known as Gaussian elimination; on some occasions it may be quicker than Gauss-Jordan elimination.

Example Use Gaussian elimination to solve the following system

x + 2y + 3z = −1

3x + y + 2z = 2

2x + 3y + z = 0

Solution

The augmented matrix is

[ 1  2  3 | −1 ]
[ 3  1  2 | 2  ]
[ 2  3  1 | 0  ].

We now reduce to row-echelon form.

R2 − 3R1:
[ 1  2   3  | −1 ]
[ 0  −5  −7 | 5  ]
[ 2  3   1  | 0  ]

R3 − 2R1:
[ 1  2   3  | −1 ]
[ 0  −5  −7 | 5  ]
[ 0  −1  −5 | 2  ]

−(1/5)R2:
[ 1  2   3   | −1 ]
[ 0  1   7/5 | −1 ]
[ 0  −1  −5  | 2  ]

R3 + R2:
[ 1  2  3     | −1 ]
[ 0  1  7/5   | −1 ]
[ 0  0  −18/5 | 1  ]

The matrix is in row-echelon form; the equations are now

x + 2y + 3z = −1
y + (7/5)z = −1
−(18/5)z = 1.

From the last equation, z = −5/18. Substituting into the second equation gives y = −1 − (7/5) × (−5/18) = −11/18. Finally, substituting these values for y and z into the first equation gives x = −1 − 2y − 3z = 19/18. □

♠ Exercises 4

1. State which of the following matrices is in row-echelon form, reduced row-echelon form or neither.

(a) [ 1  −7  5  2 ]
    [ 0  3   1  1 ]

(b) [ 1  0  3 ]
    [ 0  1  3 ]
    [ 0  1  4 ]

(c) [ 1  0  1  2 ]
    [ 1  0  0  1 ]

(d) [ 0  4  2  0  3 ]
    [ 0  3  0  0  0 ]
    [ 0  0  0  2  4 ]
    [ 0  0  0  0  1 ]

(e) [ 0  0  1  3  0  4  0  0 ]
    [ 0  0  0  0  1  2  0  0 ]
    [ 0  0  0  0  0  0  1  0 ]
    [ 0  0  0  0  0  0  0  1 ]

2. Consider the system

2x1 − 3x2 + 4x3 = 0

−x1 + 7x2 − x3 = 0

4x1 − 11x2 + kx3 = 0

For what values of k will the system have nontrivial solutions?

3. Find all solutions (if any) to the following systems

(a) 3x + 4y = 9
    −2x + 3y = 4

(b) x + y + z = 2
    2x − y + 2z = 4
    −3x + 2y + 3z = 8

(c) x1 + x2 + x3 + x4 = 0
    2x1 − 3x2 − x3 + 4x4 = 0
    5x1 − x2 + 2x3 + x4 = 0

(d) x + y + z + u = 0
    2x − 3y − z + 4u = 0
    −2x + 4y + z − 2u = 0

Lecture 3.5 Introduction to Determinants

In Lecture 3.3 we found that a 2× 2 matrix, A, is invertible (i.e. has an inverse) if and only if

∆ = det A ≠ 0.

Such a matrix is said to be non–singular. If det A = 0 the matrix A is said to be singular.

Our aim in this lecture is to generalise the concept of a determinant to arbitrary square matrices. Unfortunately, there is no intuitively obvious way to do this in an introductory course. If you do further units in linear algebra you will see the determinant introduced in the rather natural setting of determinantal mappings. So in this course we will simply introduce the "mechanical" definition of a determinant and then try for a little more explanation after the fact.

The definition of determinant given below is inductive: we define the determinant of an n × n matrix in terms of determinants of (n − 1) × (n − 1) matrices. The inductive process can be started by defining the determinant of a 1 × 1 matrix (a) to be a.

We need two preliminary definitions. Note, in the second of these definitions, we are assuming the existence of something called the determinant of a matrix.

Let A be an n × n matrix and let Aij be the (n − 1) × (n − 1) matrix obtained from A by deleting the ith row and jth column of A. Then Aij is called the ijth minor of A.

Example Find A12, A32 and A22 if

A = [ 0   2   10 ]
    [ 3   −1  2  ]
    [ −4  1   5  ].

Solution

A12 is A with the first row and second column deleted. That is,

A12 = [ 3   2 ]
      [ −4  5 ].

Similarly,

A32 = [ 0  10 ]
      [ 3  2  ]

and

A22 = [ 0   10 ]
      [ −4  5  ]. □

Let A be an n × n matrix. The ijth cofactor of A, denoted cij, is defined by

cij = (−1)^(i+j) det(Aij).

Example Find the cofactors c12, c32 and c22 for the matrix A of the last example.

Solution

c12 = (−1)^(1+2) det A12 = −det A12 = −(3 × 5 − (−4) × 2) = −23.

Similarly, c32 = −det A32 = −(0 × 2 − 3 × 10) = 30 and c22 = det A22 = 0 × 5 − (−4) × 10 = 40. □

Note we are using the definition of a 2× 2 determinant given in Lecture 3.3.

Now to our definition of a determinant.

Let A = (aij) be an n × n matrix. Then the determinant of A, det A (also written as |A|), is defined to be

|A| = Σ_{j=1}^{n} aij cij

where cij is the ijth cofactor of A.

Notice the calculation of the determinant does not depend on the i (the row number) chosen. The expression for |A| using i = 1 is called an expansion of the determinant using the first row. If we use i = 2 it's an expansion using the second row, and so on.
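The inductive definition transcribes almost line for line into code. Below is a minimal sketch of our own (expansion fixed along the first row; the cost grows like n!, so this is for small matrices only, not a practical algorithm):

```python
def det(a):
    """Determinant by cofactor expansion along the first row (i = 1).

    A direct transcription of the inductive definition; the base case
    is the 1x1 matrix (a), whose determinant is a.
    """
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # The minor A1j: delete row 1 and column j+1.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        cofactor = (-1) ** j * det(minor)   # sign (-1)^(1 + (j+1)) = (-1)^j
        total += a[0][j] * cofactor
    return total

print(det([[1, 0, 2], [-1, 1, 1], [0, 1, 0]]))  # -3
```

The printed value agrees with the hand computation in the next example.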

Example Find det A using expansions by the first row and by the second row, where

A = [ 1   0  2 ]
    [ −1  1  1 ]
    [ 0   1  0 ].

Solution

Expanding by the first row,

det A = c11 · 1 + c12 · 0 + c13 · 2,

where the cij are the cofactors of A (we write |a b; c d| for the 2 × 2 determinant with rows (a, b) and (c, d)):

c11 = (−1)^(1+1)|A11| = |1 1; 1 0| = −1
c12 = (−1)^(1+2)|A12| = −|−1 1; 0 0| = 0
c13 = (−1)^(1+3)|A13| = |−1 1; 0 1| = −1.

Hence,

det A = 1 · (−1) + 2 · (−1) = −3.

Expanding by the second row,

|A| = −c21 + c22 + c23:

c21 = (−1)^(2+1)|0 2; 1 0| = 2
c22 = (−1)^(2+2)|1 2; 0 0| = 0
c23 = (−1)^(2+3)|1 0; 0 1| = −1

∴ |A| = −2 + 0 − 1 = −3. □

Note that the calculation of |A| gives the same answer independent of the choice of row used in the expansion.

As we noted in the solution of the last example, the calculation of |A| is independent of the row chosen for the expansion. In fact, we have implicitly assumed this in our choice of notation |A|, with no reference to the row number. This is rather awkward to show in the general case, although it is "obvious" if one starts with the abstract determinantal mappings mentioned at the beginning of the lecture.

For the case of 3 × 3 determinants we can use the "sledge hammer" approach and simply calculate the determinant and check that the result is the same no matter which row was used in the expansion. In fact,

det(aij) = a11a22a33 − a11a23a32 + a12a23a31 − a12a21a33 + a13a21a32 − a13a22a31,

where A = (aij) is any 3 × 3 matrix.

The pattern, evident in the 3 × 3 case, is this: a determinant consists of a sum of products (with or without minus signs). The products range over all possible ways of multiplying n matrix entries such that no two elements in the product come from the same row or the same column.

Example Calculate

| 1   0   0   1 |
| 2   1   0   3 |
| −1  −2  1   1 |
| 3   0   −1  2 |.

Solution

Expand by the first row:

the determinant = (−1)^(1+1) · 1 · |1 0 3; −2 1 1; 0 −1 2| + 0 + 0 + (−1)^(1+4) · 1 · |2 1 0; −1 −2 1; 3 0 −1|

= |1 0 3; −2 1 1; 0 −1 2| − |2 1 0; −1 −2 1; 3 0 −1|

(expanding each 3 × 3 determinant by its first row)

= (−1)^(1+1) · 1 · |1 1; −1 2| + (−1)^(1+3) · 3 · |−2 1; 0 −1| − [ (−1)^(1+1) · 2 · |−2 1; 0 −1| + (−1)^(1+2) · 1 · |−1 1; 3 −1| ]

= 1 · (2 + 1) + 3 · (2 − 0) − 2 · (2 − 0) + 1 · (1 − 3)

= 3. □

Some Special Determinants

We easily obtain the following

|On,n| = 0 and |In| = 1

In working out |In| you'll notice that it is just a product of the diagonal entries. In fact, this property generalises to other special matrices.

A square matrix is called upper triangular if all its entries below the main diagonal are zero. It is called lower triangular if all entries above the main diagonal are zero. A square matrix is called diagonal if all its elements not on the diagonal are zero. In terms of a general n × n matrix A = (aij) we have:

• A is upper triangular if aij = 0 for i > j.

• A is lower triangular if aij = 0 for i < j.

• A is diagonal if aij = 0 for i ≠ j.

Example
The matrices

A = [ 2  10  −1 ]
    [ 0  5   0  ]
    [ 0  0   20 ]

and

B = [ −10  3  6  5  ]
    [ 0    5  0  −1 ]
    [ 0    0  2  1  ]
    [ 0    0  0  −2 ]

are upper triangular; the matrices

C = [ 6   0  0 ]
    [ 2   3  0 ]
    [ −1  5  4 ]

and

D = [ 0  0 ]
    [ 1  0 ]

are lower triangular; the matrices In and

E = [ 1  0   0  0 ]
    [ 0  −2  0  0 ]
    [ 0  0   3  0 ]
    [ 0  0   0  5 ]

are diagonal. □

It is clear, for A = (aij) a diagonal matrix, that

|A| = a11a22a33 . . . ann,

you should check this. What happens in the general upper triangular (or lower triangular) case?

Theorem Suppose A = (aij) is an n × n lower triangular matrix. Then

|A| = a11a22a33 . . . ann.

That is, |A| is the product of the diagonal elements of A.

Proof. It is pretty clear this must be true – just look at the expansion of |A| by the first row.

To give a rigorous proof we'll use induction. Let P(k) be the statement: if A = (aij) is a k × k lower triangular matrix then |A| = a11a22a33 . . . akk.

The statement P(1) (i.e. k = 1) is clearly true.

Assume P(k) is true; we now prove P(k + 1) is true.

Let A = (aij) be a (k + 1) × (k + 1) lower triangular matrix. Then, expanding by the first row,

|A| = a11c11, as a1j = 0 for j > 1.

Now c11 = (−1)^(1+1)|A11| = |A11|. However, A11 is a k × k lower triangular matrix, so by our inductive hypothesis

|A11| = product of the diagonal entries of A11 = a22a33 . . . a(k+1)(k+1).

∴ |A| = a11a22a33 . . . a(k+1)(k+1),

as required. □
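The diagonal-product rule is easy to confirm numerically. A small check of our own, assuming NumPy is available, using the lower triangular matrix C from the earlier example:

```python
import numpy as np

# Lower triangular: the determinant should be the product of the
# diagonal entries, here 6 * 3 * 4 = 72.
C = np.array([[6.0, 0.0, 0.0],
              [2.0, 3.0, 0.0],
              [-1.0, 5.0, 4.0]])
print(np.isclose(np.linalg.det(C), C.diagonal().prod()))  # True
```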

♠ Exercises 5

1. In each of the following calculate the determinants first by an expansion on the first row and then by an expansion on the third row. Verify, in each case, that you get the same answer from the two expansions.

(a) | 2  0  3 |
    | 0  2  4 |
    | 2  2  0 |

(b) | −2  1  0  |
    | 1   1  3  |
    | 1   3  −1 |

(c) | 3  −1  4 |
    | 4  2   3 |
    | 2  −1  6 |

(d) | −2  3  1 |
    | 0   2  1 |
    | 4   6  5 |

2. Evaluate the following determinants.

(a) | 2   0  1   0 |
    | 3   1  1   1 |
    | −1  0  2   1 |
    | 4   1  −2  1 |

(b) | 1   2  −1  4 |
    | 3   0  −1  5 |
    | −2  0  0   7 |
    | 4   2  3   0 |

(c) | 2  3  10  −1  5  |
    | 0  1  −5  3   1  |
    | 0  0  4   2   −2 |
    | 0  0  0   −1  0  |
    | 0  0  0   0   2  |

3. If A and B are diagonal n × n matrices show that det(AB) = (det A) · (det B).

4*. Show that if A and B are lower triangular n × n matrices then det(AB) = (det A) · (det B).

Lecture 3.6 Properties of Determinants

The Geometry of Determinants

In this subsection we give a very brief introduction to the geometric significance of determinants.

We work in the Cartesian plane R2 with coordinates x and y, which we write as a column vector

X = [ x ]
    [ y ].

If A is a 2 × 2 real matrix we define a map mA by

mA : R2 −→ R2, mA : X ↦ AX.

Note that if A = (aij) then

AX = [ a11x + a12y ]
     [ a21x + a22y ],

so then mA(x, y) = (a11x + a12y, a21x + a22y).

This simply means x gets mapped to a11x + a12y and y gets mapped to a21x + a22y. Note that straight lines are mapped to straight lines.

The remarkable fact (which we will not prove) is that any region of area d gets mapped to a region of area

d · |det A|.

You might like to verify this for the special case of a unit square with one vertex at the origin.

If det A = 0 then either A = O2,2 and mA sends all of R2 to (0, 0), or A ≠ O2,2 and mA sends all of R2 to a straight line through the origin (why?). If however det A ≠ 0 then mA is a bijection from R2 to R2.

These ideas generalise to n× n matrices acting in Rn.

Properties of Determinants

We have seen that the value of a determinant is independent of the row used to evaluate it. A natural question is "what happens if we expand via a column?". The answer is we get the same result!

The matrix whose columns and rows have been interchanged is known as the transposed matrix.

Let A = (aij) be an n × n matrix. Then the transpose of A, denoted At, is the matrix formed by interchanging the rows and columns of A:

At = [ a11  a21  a31 . . . an1 ]
     [ a12  a22  a32 . . . an2 ]
     [ . . .                   ]
     [ a1n  a2n  a3n . . . ann ].

Example

(a) [ 1  2 ]t    [ 1  3 ]
    [ 3  0 ]   = [ 2  0 ]

(b) [ 1  2  3 ]t    [ 1  0  5 ]
    [ 0  1  1 ]   = [ 2  1  6 ]
    [ 5  6  2 ]     [ 3  1  2 ]

□

Theorem det(At) = det A.

Remark: This means that expanding by the jth column in evaluating |A| gives the same answer as expanding by any row. This is because the jth column of A is the jth row of At, so by the usual row expansion formula we get just det(At). But the theorem says det(At) = det A. So we can expand determinants by any row or any column.
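Before the inductive proof, the claim is easy to spot-check numerically (our own illustration, NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 4)).astype(float)

# det(A^t) = det(A) for any square matrix.
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True
```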

Proof

We use an inductive proof. It is easy to verify the claim for n = 1 or n = 2. Assume it is true for n = k; we now prove it for n = k + 1.

Let A = (aij) be a (k + 1) × (k + 1) matrix and write B = (bij) = At. Note that aij = bji. If we let cij be the ijth cofactor of A and dij be the ijth cofactor of B then

dij = cji.

This is because the ith row of A is the ith column of B = At and the jth column of A is the jth row of B; and the dij and cji involve k × k determinants, for which we have assumed that the determinant of a matrix and its transpose are equal.

Now, writing n = k + 1,

|A| = Σ_{j=1}^{n} aij cij,

expanding by the ith row. If we sum the right side from i = 1 to n we get on the left just n copies of |A|:

n|A| = Σ_{i=1}^{n} Σ_{j=1}^{n} aij cij
     = Σ_{j=1}^{n} ( Σ_{i=1}^{n} aij cij )
     = Σ_{j=1}^{n} ( Σ_{i=1}^{n} bji dji ).

The term Σ_{i=1}^{n} bji dji is the expansion of |B| by the jth row. Summing from j = 1 to n simply gives n copies of |B|, so

n|A| = n|B|, i.e. |A| = |B|, as required. □

Calculating determinants can be a cumbersome business. There are, however, a number of properties of determinants which can make such calculations easier. The following theorem is a compilation of eight of the basic properties of determinants.

Theorem Let A be an n× n matrix.

1. If any row or column of A has all elements zero then detA = 0.

2. If each element of a row (or column) of A is multiplied by a constant λ to give a new matrixB then

|B| = λ|A|.

3. Let α be a constant then

|αA| = αn|A|.

4. Let A = (aij) and suppose B is a matrix identical to A except for the jth column, i.e.

B = [ a11  a12 . . . α1j . . . a1n ]
    [ a21  a22 . . . α2j . . . a2n ]
    [ . . .                        ]
    [ an1  an2 . . . αnj . . . ann ].

Suppose also that D is given as

D = [ a11  a12 . . . a1j + α1j . . . a1n ]
    [ a21  a22 . . . a2j + α2j . . . a2n ]
    [ . . .                              ]
    [ an1  an2 . . . anj + αnj . . . ann ].

Then |D| = |A| + |B|.

5. Interchanging any two rows (or columns) of A has the effect of multiplying det A by −1.

6. If A has two equal rows (or columns) then

|A| = 0.

7. If one row (column) of A is a constant multiple of another row (column) then detA = 0.

8. If a multiple of one row (column) of A is added to another row (column) of A then the determinant of A is unchanged.

Proof

1. If the kth row has every element zero then, with A = (aij), we have akj = 0 for j = 1, 2, . . . , n. Expand |A| by the kth row to get |A| = 0.

2. Let A = (aij) and B be the matrix obtained from A by multiplying every element of the ith row by λ. The ith row of B will be

λai1  λai2  . . .  λain.

Expand |B| by its ith row (remember, B is the same as A except for the ith row):

|B| = Σ_{j=1}^{n} (λaij) cij,

where cij is the ijth cofactor of both B and A. We have

|B| = λ Σ_{j=1}^{n} aij cij = λ|A|,

as required.

3. Follows easily from n applications of property 2 above.

4. Notice that A, B, D are the same except for their jth columns. This means the n cofactors for the jth columns of A, B, D will be identical; they will all be cij, i = 1, 2, . . . , n, the jth-column cofactors of A.

Expanding |D| about its jth column gives

|D| = Σ_{i=1}^{n} (aij + αij) cij
    = Σ_{i=1}^{n} aij cij + Σ_{i=1}^{n} αij cij
    = |A| + |B|, as required.

5. We first prove this proposition in the case where the rows are adjacent. Let B be the matrix obtained from A by interchanging the ith and (i + 1)th rows.

Expanding |A| about the ith row,

|A| = Σ_{j=1}^{n} aij cij.

Let dij be the ijth cofactor of B. Expanding |B| about the (i + 1)th row (remember, in B this row has elements ai1, ai2, . . . , ain),

|B| = Σ_{j=1}^{n} aij d(i+1)j.

Now cij = (−1)^(i+j)|Aij|, where Aij is the ijth minor of A. Notice that the (i + 1)jth minor of B is identical to the ijth minor of A. So

d(i+1)j = (−1)^(i+1+j)|Aij| = −cij.

∴ |B| = −Σ_{j=1}^{n} aij cij = −|A|.

So we have the result for adjacent rows. Next we notice that we can always obtain the interchange of any two rows i and k by an odd number of adjacent row interchanges: suppose k > i. First swap rows i and i + 1, then i and i + 2, and so on, until row i is in the kth position and row k is now in the (k − 1)th position. Now move this row k up to the ith position. How many interchanges were made? We required k − i to get row i down to the kth position. To get row k (which starts from the (k − 1)th position) to the ith position requires (k − 1) − i. So altogether there are

2(k − i) − 1

interchanges. This odd number of interchanges means that |A| is multiplied by an odd number of −1's, i.e. by −1.

6. See exercises.

7. See exercises.

8. See exercises. □

Example Evaluate the following determinants:

(a) | 2   −3  5   |
    | −4  6   −10 |
    | 1   7   2   |

(b) | 1  −1  2 |
    | 0  −2  5 |
    | 3  1   4 |

Solution

(a) The determinant vanishes, as the second row is −2 times the first.

(b) Multiply the first row by −3 and add it to the third row; the determinant is unchanged (property 8 above):

| 1  −1  2 |     | 1  −1  2  |
| 0  −2  5 |  =  | 0  −2  5  |
| 3  1   4 |     | 0  4   −2 |

(the new third row being (3 − 3·1, 1 − 3·(−1), 4 − 3·2) = (0, 4, −2)). Next multiply the second row by 2 and add it to the third; the determinant is again unchanged:

| 1  −1  2  |     | 1  −1  2 |
| 0  −2  5  |  =  | 0  −2  5 |
| 0  4   −2 |     | 0  0   8 |

(the new third row being (0 + 2·0, 4 + 2·(−2), −2 + 2·5) = (0, 0, 8)). This last matrix is upper triangular, so its determinant is the product of the diagonal elements, i.e. 1 · (−2) · 8 = −16. □
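The steps in part (b) can be cross-checked numerically. This sketch of our own (NumPy assumed available) verifies both property 8 and the final value:

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0, -2.0, 5.0],
              [3.0, 1.0, 4.0]])

# Property 8: adding a multiple of one row to another leaves det unchanged.
B = A.copy()
B[2] = B[2] - 3 * B[0]   # R3 - 3*R1, as in the example
print(np.isclose(np.linalg.det(B), np.linalg.det(A)))  # True
print(round(np.linalg.det(A)))  # -16
```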

♠ Exercises 6

1. Evaluate the following determinants.

(a) | 3  2   4  8 |
    | 0  −1  3  4 |
    | 1  3   5  2 |
    | 2  1   9  6 |

(b) | −2  3   −2  3  |
    | 1   −1  7   −7 |
    | 0   5   3   2  |
    | 4   2   1   5  |

2. Give proofs for properties 6, 7 and 8 of the last theorem.

3. Use properties 1 to 8 to evaluate the following determinants. State clearly which of the properties you have used.

(a) | 7   6   4   3   −9 |
    | −5  −5  −9  −2  7  |
    | −2  0   7   1   −1 |
    | 1   2   4   3   −5 |

(b) | 1  2   0  0  |
    | 3  −2  0  0  |
    | 0  0   1  −5 |
    | 0  0   7  2  |

4. If

| a11  a12  a13 |
| a21  a22  a23 |  =  2,
| a31  a32  a33 |

find

(a) | a12  a11  a13 |
    | a22  a21  a23 |
    | a32  a31  a33 |

(b) | a11 − a13  a12  a13 |
    | a21 − a23  a22  a23 |
    | a31 − a33  a32  a33 |

(c) | a11   4a12  a13  |
    | 2a21  8a22  2a23 |
    | a31   4a32  a33  |

5*. Let mA be the mapping defined at the beginning of the lecture. Show that the unit square with vertices (0, 0), (1, 0), (0, 1), (1, 1) is mapped to a parallelogram of area |det A|.

Lecture 3.7 Determinants and Inverses

Determinant of a Product.

If A = (aij) and B = (bij) are two square matrices we know that their matrix product is given by

AB = ( Σ_{k=1}^{n} aik bkj ).

Given the definition of det A it is hard to believe that det(AB) could be anything but a mess! It's not.

Theorem If A and B are square matrices then |AB| = |A||B|. That is, the determinant of a product is the product of the determinants.

Proof. The proof in the general n × n case is quite difficult without the more abstract notion of a determinant.

We will settle for a proof in the 2 × 2 case. Let

A = [ a11  a12 ]
    [ a21  a22 ]

and

B = [ b11  b12 ]
    [ b21  b22 ].

Then

C = AB = [ a11b11 + a12b21   a11b12 + a12b22 ]
         [ a21b11 + a22b21   a21b12 + a22b22 ]

and so

det C = (a11b11 + a12b21)(a21b12 + a22b22) − (a11b12 + a12b22)(a21b11 + a22b21)
      = a11b11a22b22 + a12b21a21b12 − a11b12a22b21 − a12b22a21b11
      = (a11a22 − a12a21)(b11b22 − b12b21)
      = det A · det B. □
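The multiplicative property holds for any size, and is cheap to spot-check with NumPy (assumed available) on random integer matrices; our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

# |AB| = |A| |B|, checked numerically.
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True
```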

Determinants and Inverses

In this section we wish to return to something raised in Lecture 3.3: the relationship between the invertibility of a matrix and its determinant. We need to do some preliminary work, however, before we get to this important relationship.

Theorem If A is invertible then det A ≠ 0 and

det(A−1) = 1 / det A.

Proof. If A is invertible it has an inverse A−1 and by definition

I = AA−1.

So, taking determinants,

|I| = |AA−1| = |A| · |A−1|,

by our previous theorem. But |I| = 1, so we have

|A| · |A−1| = 1.

Since the product of these two (finite) real numbers is 1, neither can be zero. So det A ≠ 0 (and det(A−1) ≠ 0), together with

det(A−1) = 1 / det A. □

Before we can use determinants to calculate inverse matrices we need to introduce the concept of the adjoint (or adjugate) of a matrix A.

Let A be an n × n matrix with ijth cofactor cij, and let B = (cij) be the matrix of cofactors. Then the adjoint (or adjugate) of A, written adj A, is the transpose of B:

adj A = Bt = [ c11  c21 . . . cn1 ]
             [ c12  c22 . . . cn2 ]
             [ . . .             ]
             [ c1n  c2n . . . cnn ].

�Example Let A =

2 1 01 3 −10 −1 1

compute adj A.

Solution We need to calculate all the cofactors:

c11 = (−1)¹⁺¹ | 3 −1; −1 1 | = 2,      c12 = (−1)¹⁺² | 1 −1; 0 1 | = −1,
c13 = (−1)¹⁺³ | 1 3; 0 −1 | = −1,      c21 = (−1)²⁺¹ | 1 0; −1 1 | = −1,
c22 = (−1)²⁺² | 2 0; 0 1 | = 2,        c23 = (−1)²⁺³ | 2 1; 0 −1 | = 2,
c31 = (−1)³⁺¹ | 1 0; 3 −1 | = −1,      c32 = (−1)³⁺² | 2 0; 1 −1 | = 2,
and c33 = (−1)³⁺³ | 2 1; 1 3 | = 5,

where the rows of each 2 × 2 determinant are written on one line, separated by a semicolon. We have,

B = (cij) = (  2  −1  −1 )
            ( −1   2   2 )
            ( −1   2   5 )

so that, as A (and hence B) is symmetric,

adj A = Bt = (  2  −1  −1 )
             ( −1   2   2 )
             ( −1   2   5 ).

2
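The cofactor bookkeeping above mechanises directly. A sketch in Python (the helper names `minor`, `det` and `adj` are our own, not from the text) that reproduces the adjugate just computed:

```python
def minor(m, i, j):
    """The matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adj(m):
    """Adjugate: entry (i, j) is the (j, i) cofactor of m (transposed cofactors)."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [1, 3, -1], [0, -1, 1]]
print(adj(A))  # [[2, -1, -1], [-1, 2, 2], [-1, 2, 5]]
```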

�Example If A = ( a11  a12 )
                ( a21  a22 ),
find adj A.

Solution The cofactors are c11 = (−1)¹⁺¹a22 = a22, c12 = −a21, c21 = −a12 and c22 = a11. Then

B = (cij) = (  a22  −a21 )
            ( −a12   a11 ),

so

adj A = Bt = (  a22  −a12 )
             ( −a21   a11 ).

2

This last matrix you should recognise from Lecture 3.3 — it is, except for the factor of 1/|A|, the inverse of A!


This result generalises as follows:

Theorem Let A be an n× n matrix. Then

A · adj A = |A|In.

Proof. Let A = (aij), then

adj A = (cij)t = ( c11  c21  . . .  cn1 )
                 ( c12  c22  . . .  cn2 )
                 (  .    .    .      .  )
                 ( c1n  c2n  . . .  cnn )

where cij is the ijth cofactor of A. Next, let E = (eij) = A · adj A. Then

eij = ( ai1  ai2  . . .  ain ) · ( cj1 )
                                 ( cj2 )
                                 (  .  )
                                 ( cjn )

    = ai1cj1 + ai2cj2 + . . . + aincjn.

If i = j this is just det A, i.e. eii = det A for i = 1, 2, . . . , n. We now show that if i ≠ j then eij = 0. Consider the matrix M obtained from A by replacing the jth row of A by a copy of the ith row,

M = ( a11  . . .  a1n )
    (  .          .  )
    ( ai1  . . .  ain )   ←− ith row
    (  .          .  )
    ( ai1  . . .  ain )   ←− jth row
    (  .          .  )
    ( an1  . . .  ann ).

Then det M = 0 as two rows of M are the same. However, if we expand about the jth row we get

det M = ai1cj1 + ai2cj2 + . . . + aincjn,

remembering that except for the jth row, M is identical to A. So we have our result: ai1cj1 + ai2cj2 + . . . + aincjn = 0 for i ≠ j. Returning to our matrix E we have

eij = { det A,  i = j
      {   0,    i ≠ j

    = (det A)δij.

So E = (det A)In, as required. �

Now for our main result of this section, a straightforward consequence of the last theorem.

Corollary Let A be an n × n matrix. Then A is invertible if and only if det A ≠ 0. If det A ≠ 0 then

A−1 = (1/|A|) adj A.

Proof. The first statement in the corollary is “if and only if” so we need to prove it “in both directions”. From the first theorem of this lecture we know that if A is invertible then det A ≠ 0. We need to prove that if det A ≠ 0 then A is invertible; from our last theorem

A · adj A = |A|In.

Hence, if |A| ≠ 0, then

AD = In, where D = (1/|A|) adj A.

This means D is the inverse to A, so A is invertible. With D = A−1 we have also proved the second statement in the corollary, so we are finished. �

�Example Let A = ( 2   0  3 )
                 ( 4   1  5 )
                 ( 3  −1  7 ).
Is A invertible? If A is invertible calculate the inverse.

Solution

|A| = 2 · | 1 5; −1 7 | − 0 · | 4 5; 3 7 | + 3 · | 4 1; 3 −1 |
    = 2(12) − 0 + 3(−7) = 24 − 21 = 3.

So A is invertible.

Next we need the cofactors of A:

c11 = | 1 5; −1 7 | = 12,     c12 = −| 4 5; 3 7 | = −13,
c13 = | 4 1; 3 −1 | = −7,     c21 = −| 0 3; −1 7 | = −3,
c22 = | 2 3; 3 7 | = 5,       c23 = −| 2 0; 3 −1 | = 2,
c31 = | 0 3; 1 5 | = −3,      c32 = −| 2 3; 4 5 | = 2,
c33 = | 2 0; 4 1 | = 2.

Then

adj A = (cij)t = (  12  −3  −3 )
                 ( −13   5   2 )
                 (  −7   2   2 ).

The inverse is

A−1 = (1/|A|) · adj A = (    4    −1    −1  )
                        ( −13/3   5/3   2/3 )
                        (  −7/3   2/3   2/3 ).

The result should be checked by multiplication with A. 2
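That check is easy to automate with exact rational arithmetic; a sketch using Python's standard `fractions` module, with the adjugate and determinant taken from the worked example above:

```python
from fractions import Fraction

A    = [[2, 0, 3], [4, 1, 5], [3, -1, 7]]
adjA = [[12, -3, -3], [-13, 5, 2], [-7, 2, 2]]
detA = 3

# A^{-1} = (1/|A|) adj A, kept exact with Fractions
A_inv = [[Fraction(adjA[i][j], detA) for j in range(3)] for i in range(3)]

# Check that A * A_inv is the identity matrix
prod = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```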

�Example Calculate A−1 by calculating adj A for

A = ( 1  1  1 )
    ( 0  2  3 )
    ( 5  5  1 ).

Solution

We first need to find |A|,

|A| = | 1 1 1; 0 2 3; 5 5 1 | = | 1 1 1; 0 2 3; 0 0 −4 |,


here we have taken (−5) times the first row and added it to the third row. The last determinant is now evaluated by expansion down the first column,

|A| = | 1 1 1; 0 2 3; 0 0 −4 | = (−1)¹⁺¹ · 1 · | 2 3; 0 −4 | = −8.

To calculate adj A we need the cofactors:

c11 = | 2 3; 5 1 | = −13,     c12 = 15,     c13 = −10,
c21 = 4,     c22 = −4,     c23 = −| 1 1; 5 5 | = 0,
c31 = 1,     c32 = −3,     c33 = 2.

So that,

adj A = (cij)t = ( −13   4   1 )
                 (  15  −4  −3 )
                 ( −10   0   2 ).

Finally, we have (as A−1 = (1/|A|) adj A)

A−1 = (  13/8   −1/2   −1/8 )
      ( −15/8    1/2    3/8 )
      (   5/4     0    −1/4 ).

2
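As always, the result can be checked by multiplying A by the computed inverse; a sketch with exact rationals:

```python
from fractions import Fraction as F

A     = [[1, 1, 1], [0, 2, 3], [5, 5, 1]]
A_inv = [[F(13, 8), F(-1, 2), F(-1, 8)],
         [F(-15, 8), F(1, 2), F(3, 8)],
         [F(5, 4), F(0), F(-1, 4)]]

# A * A_inv should be the 3x3 identity
prod = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```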

Cramer’s Rule

A system of n equations in n unknowns

a11x1 + a12x2 + . . . + a1nxn = b1
  .        .                    .
an1x1 + an2x2 + . . . + annxn = bn

can be written in matrix form as

AX = b.

We suppose det A ≠ 0 so that the system has a unique solution given by

X = A−1b.

Cramer’s rule is a method for solving this system which does not involve row reduction or a direct calculation of A−1. Cramer’s rule is named for the Swiss mathematician Gabriel Cramer (1704–1752) who first published the rule in 1750; however, there is some evidence that this rule was known to the British mathematician Maclaurin as early as 1729.

Define matrices Aj (j = 1, 2, . . . , n) where Aj is obtained from A by replacing the jth column of A by b,

Aj = ( a11  . . .  a1,j−1   b1   a1,j+1  . . .  a1n )
     (  .           .       .      .            .  )
     ( an1  . . .  an,j−1   bn   an,j+1  . . .  ann ).

Then we have Cramer’s rule.

Theorem Let A be an n× n non–singular matrix, then the unique solution of the system

AX = b

Page 42: Contentsmcs.une.edu.au/~math101/Lectures/Additional Notes... · Lecture 3.1 Simultaneous Equations 3 We are done, we read off the answer from the right-most column; x = 1,y = −1,z

Lecture 3.7 Determinants and Inverses 39

is given by

x1 = |A1|/|A|,  . . . ,  xi = |Ai|/|A|,  . . . ,  xn = |An|/|A|.

Proof Exercise.

�Example Use Cramer’s rule to solve

2x + 4y + 6z = 18

4x + 5y + 6z = 24

3x + y − 2z = 4.

Solution

We have

A = ( 2  4   6 )          ( 18 )
    ( 4  5   6 )  and b = ( 24 ).
    ( 3  1  −2 )          (  4 )

Then

A1 = ( 18  4   6 )        ( 2  18   6 )             ( 2  4  18 )
     ( 24  5   6 ),  A2 = ( 4  24   6 )  and  A3 =  ( 4  5  24 ).
     (  4  1  −2 )        ( 3   4  −2 )             ( 3  1   4 )

We also have |A| = 6 (≠ 0), |A1| = 24, |A2| = −12, |A3| = 18. Hence

x = |A1|/|A| = 24/6 = 4,
y = |A2|/|A| = −12/6 = −2,
z = |A3|/|A| = 18/6 = 3.

2
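Cramer’s rule translates almost line for line into code. A sketch in Python (the helper names are our own) applied to the system just solved:

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer(A, b):
    """Solve Ax = b for a 3x3 system with det A != 0 by Cramer's rule."""
    dA = det3(A)
    sol = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]          # replace column j of A by b
        sol.append(det3(Aj) / dA)
    return sol

A = [[2, 4, 6], [4, 5, 6], [3, 1, -2]]
b = [18, 24, 4]
print(cramer(A, b))  # [4.0, -2.0, 3.0]
```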

♠ Exercises 7

1. If A is a square matrix such that

   Aᵐ ≡ A · A · . . . · A  (m times) = 0

   for some positive integer m, prove that |A| = 0.

2. If A is an upper triangular matrix prove that adj A is also upper triangular.

3. For what values of α does

   (   1       2       3   )
   (  −α     α − 1   α + 1 )
   ( 2 − α   α + 3   α + 7 )

   not have an inverse?

4. For each of the following determine if the matrix is invertible. If it is compute the inverse.


(a) ( 5 1 )      (b) ( −4 −8 )      (c) (  0 1 )
    ( 1 2 )          (  3  6 )          ( −1 0 )

(d) ( 0 2 3 )    (e) (  1 2 3 )     (f) ( 1 0 0 )
    ( 4 4 1 )        ( −1 1 0 )         ( 1 1 0 )
    ( 1 1 1 )        (  2 2 0 )         ( 1 1 1 )

(g) (  3 1 1 )   (h) ( 1 −1  2 1 )
    ( −1 1 1 )       ( 1  3  3 2 )
    (  2 2 1 )       ( 1  1  1 1 )
                     ( 1  2 −1 2 ).

5. Use Cramer’s rule to solve the following

   (a) 2x + 3y = −1            (b) x1 + x2 + x3 = 8
       −7x + 4y = 47               4x2 − x3 = −2
                                   3x1 − x2 + 2x3 = 0

   (c) 2x + 3y − z = 5         (d) x − y + z = 7
       −x + 2y + 3z = 0            2x − 5z = 4
       4x − y + z = −1             3y − z = 2.

*6. If A is an n × n invertible matrix prove that

    (a) det(adj A) = (det A)ⁿ⁻¹

    (b) adj(adj A) = (det A)ⁿ⁻²A.

*7. Prove Cramer’s rule.


Lecture 3.8 An Application, Leslie Matrices

In this and the following lecture we want to use our work on matrices and determinants to study certain models of population growth. These models, which allow for an age group structure, are known as Leslie matrix models.

Predicting the Age Structure of a Population.

Most natural populations experience high pre-adult mortality rates; for example, about 65% of grey seal pups survive their first year, whereas about 93% survive their second year. How should we incorporate such “age structure” into our population model?

First, we need to break up our population into age groups. For humans, 20 groups which each cover 5 years are often used:

Age Group    Age Range
1            Individuals aged from 0 to 4 years
2            Individuals aged from 5 to 9 years
3            Individuals aged from 10 to 14 years
. . .        . . .
19           Individuals aged from 90 to 94 years
20           Individuals aged from 95 to 99 years

Next, we need to choose our time steps, that is we need to choose the time interval between each census. At each census the number of individuals in each age group is counted. We can keep things reasonably simple if we choose the time step to have the same duration as the age range for the individual age groups. This is because an individual will move (if it survives) from one age group to the next, at each census. Those of the first age group who survive to the next census will then be counted in the second age group, and so on. All births during the time step will go into the first age group at the end of that time step. Let’s formulate this mathematically.

We define N age groups each with an age range of K years (or some other appropriate unit of time). We write Xj for the population density in the j-th age group; this group will consist of those individuals aged from (j − 1)K years to (jK − 1) years. We will be dealing with large arrays of numbers here, so we will need a neat way of keeping track of what we are doing. Matrix techniques provide a good, in fact essential, way of handling these arrays. We define the population vector to be the column vector with entries being the Xj,

X = ( X1 )
    ( X2 )
    (  . )
    ( XN ).

Our time step is to be of duration K years. Let’s look at what happens to a particular age group, Xj, as we move from time step n to time step n + 1. The number of individuals in the j-th age group at time step n is Xj(n). All those that survive the K years to the next census will move into age group j + 1. This is the only contribution to age group j + 1 at the (n + 1)-th census. That is

Xj+1(n + 1) = survivors from Xj(n).

All births from age group j between the n-th and (n + 1)-th time steps will go into the first age group; in fact this is true for all N age groups, and this is the only contribution to the first age group at the (n + 1)-th time step. We have

X1(n + 1) = sum of all births from age groups 1 to N in the interval n to n + 1.

If we are to make detailed calculations we are going to have to be a lot more specific about the right hand sides of these last two equations. The sort of thing we need is the average number of births per individual and the probability of survival for each age group. We make two definitions.


Survival Rates, denoted sj, are defined by

sj = the probability of an individual in age group j surviving to age group j + 1, K years later.

Fecundity Rates, denoted Fj, are defined by

Fj = the average number of offspring born to each individual in age group j, surviving to the next census.

We will assume that the values of Fj and sj are constant. This is a simplifying assumption; in general we would expect that birth rates and death rates would depend on the population density at that time.

The definition of fecundity rates, above, implies that the population under consideration consists entirely of those individuals capable of producing offspring. For human populations that means we would only be counting females. For certain slug populations we would be counting all individuals – they are hermaphrodites!

Now calculate the change in age group populations as we move from time step n to time step n + 1. We have,

X1(n + 1) = F1X1(n) + F2X2(n) + . . . + FNXN (n),

X2(n + 1) = s1X1(n),

X3(n + 1) = s2X2(n),

. . .

XN (n + 1) = sN−1XN−1(n).

The expressions on the right sides of these equations can be recognised (you should check this) as arising from a matrix multiplication. In fact we have,

Basic Equation, X(n + 1) = PX(n),

where X is the population column vector, defined above, and the square matrix P is the Leslie matrix, defined as follows.

Leslie Matrix, P = ( F1   F2   F3   . . .  FN−1   FN )
                   ( s1   0    0    . . .   0     0  )
                   ( 0    s2   0    . . .   0     0  )
                   ( .    .    .            .     .  )
                   ( 0    0    0    . . .  sN−1   0  ).

�Example Consider the population model with two age groups with P given by

P = (  1   2 )
    ( 0.6  0 ).


Suppose we start with an initial population of ten individuals in age group 1 and no individuals in group 2. What is the age distribution after three time steps?

We have

X(0) = ( 10 )
       (  0 ).

So,

X(1) = PX(0) = (  1   2 ) ( 10 )  =  ( 10 )
               ( 0.6  0 ) (  0 )     (  6 ).

This is just as we would have “guessed”: we started with ten individuals in the first age group; 0.6 of them, or 6, survive to the next age group, and each individual in the first age group gives birth to one new individual, giving ten new individuals for the first age group.

Continuing with our calculations,

X(2) = PX(1) = P²X(0) = ( 22 )
                        (  6 ),

and X(3) = PX(2) = P³X(0) = (  34  )
                            ( 13.2 ).

So at the third time step we have 34 in the first age group and 13.2 in the second age group.
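The iteration X(n + 1) = PX(n) is a one-line loop. A sketch in Python reproducing the numbers above:

```python
def leslie_step(P, X):
    """One census step: X(n+1) = P X(n)."""
    return [sum(P[i][k] * X[k] for k in range(len(X))) for i in range(len(P))]

P = [[1, 2], [0.6, 0]]
X = [10, 0]
for n in range(3):        # three time steps
    X = leslie_step(P, X)
print(X)  # [34.0, 13.2]
```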

Given an initial population vector X(0) and a Leslie matrix P we can, in theory, calculate the population vector at any future time step.

X(0)
X(1) = PX(0)
X(2) = PX(1) = P²X(0)
. . .
X(n) = PX(n − 1) = . . . = PⁿX(0).

We have the general formula for the population vector at the n-th time step.

The Population Vector, at any time step n: X(n) = PⁿX(0).

The larger the number of age groups the more difficult it becomes to use this formula over a significant number of time steps. For human population studies it is common to break the population into 20 groups each covering a 5 year span. If we were to follow one age group from birth to death we would need to do 20 matrix multiplications of a 20 × 20 matrix. More detailed models would involve us in even more complex calculations. What we need is a simple mathematical technique for extracting the important pieces of information from the model. Long-time behaviour of the system is the type of information we would like to have. Depending on our initial value, X(0), the population may change drastically during the first few iterations. In general, we would expect the population to settle down into some sort of stable behaviour after the initial unsettled behaviour. Is this always true for our Leslie matrix models? If not, under what circumstances does it occur?

Two quantities for which we can (as we will see later) get long-term trends are the total population and the fractional population vector; they are defined as follows.


Total Population, T(n) = X1(n) + X2(n) + . . . + XN(n).

Fractional Population Vector, F(n) = (1/T(n)) X(n) = ( X1(n)/T(n) )
                                                     ( X2(n)/T(n) )
                                                     (     .      )
                                                     ( XN(n)/T(n) ).

The total population gives the total number of individuals in the population at the n-th time step. The fractional population vector is a column vector whose entries give the fraction (of the total population) in each age group (at the given time step) – multiply by 100 to get the percentage of the population in each of the age groups. Notice that the vector F(n) is “normalised”: the entries in the column sum to one.
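Both quantities are straightforward to compute; a sketch in Python using the third-time-step population from the two-group example above:

```python
def total_and_fraction(X):
    """Total population T(n) and fractional population vector F(n)."""
    T = sum(X)
    return T, [x / T for x in X]

T, F = total_and_fraction([34.0, 13.2])
print(round(T, 1), [round(f, 2) for f in F])  # 47.2 [0.72, 0.28]
```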

Mathematical Background – Eigenvalues and Eigenvectors.

We begin with the following definitions.

An Eigenvalue and Eigenvector of a square matrix A are (respectively) a number λ and a column vector x satisfying

Ax = λx.

�Example Show that x is an eigenvector for A and find the eigenvalue associated to x, where

A = ( 2  1  1 )            ( 1 )
    ( 2  3  2 )  and  x =  ( 2 )
    ( 1  1  2 )            ( 1 ).

Well, a simple matrix computation gives

Ax = (  5 )       ( 1 )
     ( 10 ) = 5 · ( 2 ) = 5x.
     (  5 )       ( 1 )

So we see that x is indeed an eigenvector and λ = 5 is the associated eigenvalue.
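The verification is a single matrix–vector product; a sketch in Python:

```python
A = [[2, 1, 1], [2, 3, 2], [1, 1, 2]]
x = [1, 2, 1]

# Compute Ax and compare with 5x
Ax = [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]
print(Ax, Ax == [5 * v for v in x])  # [5, 10, 5] True
```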

Suppose A is a square matrix with eigenvalue λ and non-zero eigenvector x. In the case where x is just the zero column vector, the eigenvalue equation Ax = λx is trivially true. Generally we rewrite the eigenvalue equation as

Ax − λx = (A − λI)x = 0,

where 0 is the zero column vector. Note that in taking out the “common factor” x we need to include a unit matrix, I, in the brackets. Now suppose that B = A − λI is a non-singular matrix, i.e. we suppose (for the moment) that B has an inverse. Then we have, from our rewritten eigenvalue equation,

x = B−1 0 = 0, since B−1(A − λI) = I, by definition.

This tells us that x must be the zero column vector. But this is just the case we wanted to exclude! We must conclude that B is singular, i.e. B must have zero determinant. This condition, det B = 0, gives us an equation for the eigenvalues of a matrix A (remember B is defined in terms of A). This equation is called the characteristic equation for the matrix A.

The Characteristic Equation for a square matrix A is det(A − λI) = 0.

�Example Find the eigenvalues of the matrix

A = ( 1  −1 )
    ( 1   1 ).

Solution

We require the determinant of the matrix

A − λI = ( 1  −1 ) − λ ( 1  0 ) = ( 1 − λ    −1   )
         ( 1   1 )     ( 0  1 )   (   1    1 − λ ).

The determinant is det(A − λI) = (1 − λ)² + 1 = λ² − 2λ + 2.

So the characteristic equation for A is,

λ² − 2λ + 2 = 0.

We can solve this quadratic equation in the usual way,

λ = ( 2 ± √((−2)² − 4 × 2) ) / (2 × 1) = ( 2 ± √−4 ) / 2 = ( 2 ± 2i ) / 2 = 1 ± i.

The eigenvalues are 1 + i and 1 − i, a complex conjugate pair. In fact, it is generally true that the complex eigenvalues of a real square matrix come in complex conjugate pairs. 2
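The quadratic formula with a complex square root can be checked with Python's standard `cmath` module:

```python
import cmath

# Solve lambda^2 - 2*lambda + 2 = 0 with the quadratic formula
a, b, c = 1, -2, 2
disc = cmath.sqrt(b * b - 4 * a * c)    # sqrt(-4) = 2i
roots = ((-b + disc) / (2 * a), (-b - disc) / (2 * a))
print(roots)  # ((1+1j), (1-1j))
```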

♠ Exercises 8

1. The table below gives fecundity and survival data for the Grey Seal; ages are given in years, for example age 0 means the age range 0 to 1 year. Write down the Leslie matrix which could be used to model this data. If we started with a population with 1000 individuals in each of age groups 5 and 6+, what would be the populations in each of the age groups in 3 years’ time?


Grey Seal Fecundity and Survival Rates

Age        0      1      2      3      4      5      6+
Fecundity  0      0      0      0      0.08   0.28   0.42
Survival   0.657  0.930  0.930  0.930  0.935  0.935  0

SOURCE: D. Brown and P. Rothery, Models in Biology: Mathematics, Statistics and Computing, John Wiley and Sons Ltd., Chichester, 1993.

2. Consider the following Leslie matrix

   P = (  0    0   6 )
       ( 1/2   0   0 )
       (  0   1/3  0 ).

   (a) Show that P³ = I, where I is the 3 × 3 unit matrix.

   (b) Show that this population model is cyclic with period three. That is, no matter what the initial population X(0), X(n + 3) = X(n): the population numbers are exactly the same after each third time step.

3. Let A be the matrix

   A = ( 1  0   2   1 )
       ( 0  0   2   2 )
       ( 1  0   0  −1 )
       ( 0  2   0   3 )

   and let x1, x2 and x3 be the vectors

   x1 = (  4 )         ( −2 )         ( 1 )
        (  1 )   x2 =  (  3 )   x3 =  ( 1 )
        (  3 )         (  2 )         ( 0 )
        ( −2 ),        ( −2 ),        ( 1 ).

   Show that x1 and x2 are eigenvectors of A and find the corresponding eigenvalues. Show that x3 is not an eigenvector of A.

*4. Suppose we divide a population into just two age groups: the sexually mature group and the sexually immature group – this second group is, of course, incapable of reproducing. What would a general Leslie matrix for this population look like? Solve the characteristic equation for such a model and find the corresponding eigenvectors.


Lecture 3.9 Leslie Matrices, continued.

�Example Find the eigenvalues and associated eigenvectors for the Leslie matrix,

P = (  1   4 )
    ( 0.5  0 ).

Solution

First, we find the eigenvalues. The characteristic equation for P is (you should check this!),

λ² − λ − 2 = 0.

The solutions of this quadratic can be found by direct factorisation or by using the formula; we have λ = −1 or 2. Next, we need to find the eigenvectors associated to each of these eigenvalues. We will do the case λ = −1; the case λ = 2 is left as an exercise. Suppose that X1 is the eigenvector associated with λ = −1; X1 will be a column vector with just two rows. We write

X1 = ( α )
     ( β ),

where α and β are numbers to be determined. We substitute this expression for X1 into the eigenvalue equation PX1 = λX1, with λ = −1. We have,

(  1   4 ) ( α ) = − ( α )
( 0.5  0 ) ( β )     ( β ).

That is,

( α + 4β ) = ( −α )
(  0.5α  )   ( −β ).

The first row of this matrix equation gives α + 4β = −α and the second row gives 0.5α = −β. These two equations are not independent; they both give β = −0.5α. So we have X1,

X1 = (   α    ) = α (   1   )
     ( −0.5α )      ( −0.5 ).

2

Notice that the eigenvector associated to an eigenvalue is only determined up to a multiple, in the case at hand the arbitrary multiple α. This is always the case: if X is an eigenvector for λ, PX = λX, then any multiple of X is also an eigenvector for λ, since P(αX) = λ(αX) for any number α.
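The λ = −1 case is quickly verified; a sketch in Python taking α = 1:

```python
P  = [[1, 4], [0.5, 0]]
x1 = [1, -0.5]    # the eigenvector above with alpha = 1

# P x1 should equal (-1) x1
Px1 = [sum(P[i][k] * x1[k] for k in range(2)) for i in range(2)]
print(Px1 == [-v for v in x1])  # True
```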

The Main Theorem on Leslie Matrices

In this section of the notes we will, after some preliminaries, present a theorem about Leslie matrices. This theorem will allow us, under certain circumstances, to predict the long term behaviour of our population model.

Let us, for the moment, assume that our initial population vector is an eigenvector of the Leslie matrix P. This would mean that PX(0) = λX(0) for some number λ; λ must be real, since both P and X(0) are. Iterating we get,

X(1) = PX(0) = λX(0)
X(2) = PX(1) = λ²X(0)
. . .
X(n) = λⁿX(0).

Of course λⁿX(0) is much easier to calculate than PⁿX(0), so we have some simplification in this case. What happens to the total and fractional populations in this case? We have,


T(n) = X1(n) + X2(n) + . . . + XN(n)
     = λⁿX1(0) + λⁿX2(0) + . . . + λⁿXN(0)
     = λⁿ (X1(0) + X2(0) + . . . + XN(0))
     = λⁿT(0).

That is, T(n) = λⁿT(0). Now, for the fractional population,

F(n) = (1/T(n)) X(n) = (λⁿ/T(n)) X(0)
     = (λⁿ/(λⁿT(0))) X(0) = (1/T(0)) X(0)
     = F(0).

The fractional population remains constant, F(n) = F(0), in this case. This means that the fractions (or percentages) in each age group remain the same – the population in each group will of course be variable, depending on λ. To summarise: if the initial population vector is an eigenvector of the Leslie matrix P, then the fractional population vector remains constant and the total population is given by T(n) = λⁿT(0). What has this got to do with the general situation or the long term behaviour of the model? Well, there is a mathematical sense in which the behaviour we have just been examining is exactly what happens over the longer term for any initial condition, provided there is a “largest” eigenvalue. The mathematical result, the Perron–Frobenius theorem, says (roughly) that if there is a “largest” eigenvalue then, as we iterate the powers of P, the power of this largest eigenvalue will dominate as n gets larger. More precisely, our main theorem is

Theorem If the Leslie matrix P has an eigenvalue λ such that

1. λ is real and positive.

2. λ is greater in absolute value than any other eigenvalue (real or complex).

Then, for large values of n, X(n + 1) ≈ λX(n), for any (non-zero) initial population vector.

Remarks:

• An eigenvalue satisfying conditions 1 and 2 of the theorem is called the dominant eigenvalue or asymptotic growth rate of the Leslie matrix.

• If the Leslie matrix has a dominant eigenvalue λ then we have, for m a large integer,

X(m + 1) ≈ λX(m)
X(m + 2) ≈ λX(m + 1) ≈ λ²X(m)
. . .
X(m + n) ≈ λⁿX(m).

• The total population, at late times, is also easily calculated in this case,

T(n + m) ≈ λⁿT(m).

• The fractional population vector at large times (the asymptotic fractional population vector) can also be approximated when there is a dominant eigenvalue,

F(n + m) = (1/T(n + m)) X(n + m) ≈ F(m).


So if our Leslie matrix has a dominant eigenvalue we know that, in the long term, our population will settle down to steady growth (λ > 1) or steady decay (λ < 1). In either case the fractional populations will tend to fixed constants. Notice that the asymptotic fractional population vector will just be the normalised eigenvector of the dominant eigenvalue; the normalisation condition just requires that the sum of the entries in F(m) should be 1.

�Example For the Leslie matrix,

P = (  1   2 )
    ( 3/8  0 ),

describe the asymptotic behaviour by finding the percentage of the population in each age group, and the doubling time of the total population.

Solution

First we must find the eigenvalues of P.

det(P − λI) = | 1 − λ    2  |
              |  3/8    −λ  |  = λ² − λ − 3/4.

Our characteristic equation is λ² − λ − 3/4 = 0. The solutions of this quadratic are λ = −1/2 and 3/2. The eigenvalue λ = 3/2 satisfies conditions 1 and 2 of the theorem: it is the dominant eigenvalue. To get the asymptotic fractional population vector we need the eigenvector of the eigenvalue λ = 3/2. Let F be this eigenvector, with

F = ( α )
    ( β ).

We have PF = (3/2)F; calculating this explicitly,

PF = ( α + 2β ) = (3/2)F = ( 3α/2 )
     ( (3/8)α )            ( 3β/2 ).

So we find that α = 4β; hence,

F = ( 4β )
    (  β ).

The constant β is now determined by the normalisation condition, 4β + β = 1, giving β = 1/5. Our asymptotic fractional population vector has now been determined,

F = ( 4/5 )
    ( 1/5 ).

So we will, in the long term, end up with 80% of our population in the first age group and 20% of the population in the second age group. Suppose the doubling time for the asymptotic total population is k time steps. Now,

T(m + k) ≈ λᵏT(m),

where m is a large integer and λ = 3/2. We require T(m + k) = 2T(m), i.e. the population has doubled over the k time steps. So, after cancelling the T(m) from our equation, we have

2 ≈ λᵏ.

Taking natural logarithms of both sides and re-arranging,

k ≈ ln 2 / ln λ = ln 2 / ln(3/2) ≈ 1.7095.


So it takes about 1.7 time steps for the population to double, in the asymptotic regime. 2
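The asymptotic analysis above amounts to a few arithmetic steps; a sketch in Python:

```python
import math

# Dominant root of the characteristic equation lambda^2 - lambda - 3/4 = 0
lam = max((1 + math.sqrt(1 + 3)) / 2, (1 - math.sqrt(1 + 3)) / 2)   # = 3/2

# Normalised eigenvector (4*beta, beta) with 4*beta + beta = 1
F = [4 / 5, 1 / 5]

# Doubling time: 2 = lam^k  =>  k = ln 2 / ln lam
k = math.log(2) / math.log(lam)
print(lam, F, round(k, 4))  # 1.5 [0.8, 0.2] 1.7095
```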

�Example Predict the long term behaviour of a population model governed by the Leslie matrix,

P = (  0    2    4   6 )
    ( 0.2   0    0   0 )
    (  0   0.4   0   0 )
    (  0    0   0.6  0 ).

Solution

The first thing we must do is to see if there is a dominant eigenvalue. The following computations were done using MATLAB; you could do them on a programmable calculator.

Eigenvalues          Absolute Values
1.0028               1.0028
−0.7165              0.7165
−0.1429 + 0.6166i    0.6329
−0.1429 − 0.6166i    0.6329

So λ = 1.0028 satisfies conditions 1 and 2 of our main theorem: it is the dominant eigenvalue or asymptotic growth rate. Let’s compute the time it takes the asymptotic population to double. As above, we have T(m + k) = λᵏT(m) = 2T(m), where k is the number of time steps taken for the population to double. Now solve this last equality for k (the method is the same as before),

k = ln 2 / ln λ.

With λ = 1.0028 we have k ≈ 247.90. This is a “long time”, but not unexpected as the population increases slowly: the fractional (asymptotic) growth rate is λ − 1 = 0.0028, that is, an increase of 0.28% per time step. Using MATLAB we can also find the asymptotic fractional population vector, F, as the normalised eigenvector of the dominant eigenvalue. We have,

F = ( 0.7538 )
    ( 0.1503 )
    ( 0.0600 )
    ( 0.0359 ).

This tells us that 75.38% of the population ends up in age group one, 15.03% in group two, 6% in group three and 3.59% in group four. 2
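If MATLAB is not to hand, the dominant eigenvalue and its normalised eigenvector can be estimated by simply iterating P and rescaling. This is power iteration — our own stand-in for MATLAB's eigenvalue routine, not a method used in the text:

```python
def power_iteration(P, x0, iters=500):
    """Estimate the dominant eigenvalue of P and its normalised eigenvector
    by repeated multiplication and rescaling (power iteration)."""
    x = x0[:]
    lam = 0.0
    for _ in range(iters):
        y = [sum(P[i][k] * x[k] for k in range(len(x))) for i in range(len(P))]
        s = sum(abs(v) for v in y)
        lam = s / sum(abs(v) for v in x)   # growth factor at this step
        x = [v / s for v in y]             # rescale so the entries sum to 1
    return lam, x

P = [[0,   2,   4,   6],
     [0.2, 0,   0,   0],
     [0,   0.4, 0,   0],
     [0,   0,   0.6, 0]]
lam, F = power_iteration(P, [1.0, 0, 0, 0])
print(round(lam, 4), [round(v, 4) for v in F])
# approximately 1.0028 and [0.7538, 0.1503, 0.06, 0.0359]
```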

What happens if there is no dominant eigenvalue? In general we would expect the oscillatory behaviour, seen in the early phase of all our Leslie matrix models, to continue through to large times. These oscillations will increase in amplitude if the absolute value of the largest eigenvalue is greater than one; the oscillations will decrease in amplitude if this absolute value is less than one.

♠ Exercises 9

1. Prove the following two theorems.

   (a) Let A be an n × n upper triangular matrix. Then the eigenvalues of A are the entries on its main diagonal.

   (b) If A is an n × n matrix, then A and At have the same eigenvalues.

2. Find the eigenvalues and corresponding eigenvectors for the following:

   (a) ( 10 −9 )    (b) (  7 4 )    (c) ( 7  0 0 )
       (  4 −2 )        ( −3 1 )        ( 8 −4 0 )
                                        ( 1  5 2 ).


3. Consider the general Leslie matrix model of a population with three age groups, the first of which is immature, the population densities in the age groups being X1, X2 and X3. The Leslie matrix is

   P = ( 0   F2  F3 )
       ( s1  0   0  )
       ( 0   s2  0  ).

   (i) Show that the eigenvalues of P are determined by the cubic equation

       λ³ − s1F2λ − s1s2F3 = 0.

   (ii) Verify that, if age group 2 is (like age group 1) sexually immature, the eigenvalues of P are

        λ = α,   −(α/2)(1 + √3 i),   (α/2)(−1 + √3 i),

        where α = (s1s2F3)^(1/3).

   (iii) What are X(j) for j = 1, 2, 3 in the model of part (ii), if

         X(0) = ( 1 )
                ( 0 )?
                ( 0 )

         What do you suspect the long time behaviour may be? Calculate the absolute values of the eigenvalues to check your suspicion.

4*. Two n × n matrices, A and B, are said to be similar if there exists an invertible matrix S such that

    A = S−1BS.

    (a) Prove that A and B have the same eigenvalues.

    (b) How are the eigenvectors of A and B related?

    (c) Suppose A is similar to an n × n diagonal matrix D. How are the eigenvalues of A related to D?


Lecture 3.10 Vector Spaces

Introduction

We have been calling an n × 1 matrix,

( b1 )
( b2 )
(  . )
( bn )

a column vector. The transpose of this matrix,

(b1, b2, . . . , bn),

is a 1 × n matrix or row vector. Row (column) vectors can be added, multiplied by numbers, and so on by virtue of the fact that they are matrices. What we want to do now is ‘abstract’ these properties to define the general concept of a vector. The idea of a vector space has been one of the most fruitful concepts in mathematics. Vector spaces are at the heart of much modern mathematics. Vector spaces also play a critical role in applications in such diverse areas as Quantum Mechanics and Economic Modelling.

Before we begin our abstract definitions let’s look at a ‘toy’ model for vector spaces, the plane R2. Points in R2 can be represented by row vectors (or column vectors) (x, y) where x and y are real numbers. In physics you may have been told that a vector is a quantity with magnitude and direction. How does our row vector (x, y) fit that description?

We think of (x, y) as representing the ‘vector’ given by the directed line segment from (0, 0) to (x, y).

[Figure: the directed line segment from O(0, 0) to P(x, y).]

Such a vector is often written as ~OP. This is illustrated in the diagram below: the vector v can be represented as ~OP or ~QR.


[Figure: the vector v drawn both as ~OP from O(0, 0) to P(3, 3) and as ~QR from Q(1, 3) to R(4, 6).]

You should also notice that when representing a vector by the row vector (x, y) we are implicitly assuming that we have a set of coordinate axes so that we can plot (x, y). If we were to change those axes then the vector itself would not change. However, the description of the vector (x, y) would change. Think of a gravitational force field pointing towards the centre of the Earth. The vector describing the force remains the same no matter what we do with our rulers or measuring apparatus (i.e. axes).

[Figure: the same vector v drawn against the ‘old axes’ x ∼ y and the ‘new axes’ x′ ∼ y′.]

The vector v remains the same, however its description changes as we move from axes x ∼ y to axes x′ ∼ y′.

We can also add two vectors u and v. Suppose we have

u = (a, b) and v = (c, d),

with respect to a fixed set of axes; then

u + v = (a, b) + (c, d) = (a + c, b + d).

This is just our matrix addition rule — remember row vectors are 1 × n matrices.


[Figure: parallelogram addition of u = (a, b) and v = (c, d), giving u + v = (a + c, b + d).]

Remember we can move these vectors around as long as we keep them parallel to the original. So let’s move v so that its base point (originally the origin) is now at the tip of u, i.e. at (a, b). The arrow head will now be at the point (c + a, d + b): we have to add a to the x measurement and b to the y measurement.

[Figure: v moved tip-to-tail onto u; the third side of the triangle is u + v.]

This is the triangle law for vector addition in the plane.

In fact the same rule will also hold in three dimensional space or, indeed, any Rn. In other words the triangle law says that for any two vectors u and v the three vectors u, v, u + v form the sides of a triangle.

[Figure: triangle with sides u, v and u + v.]

Notice the positions of the arrow heads in this pictorial addition law.

We can also multiply vectors by scalars (or numbers). For u = (x, y) we have

λu = λ(x, y) = (λx, λy).


Again this is just a special case of our rule for multiplying matrices by scalars. Notice that the vector λu must be parallel to u; it points in the same direction if λ > 0 or in the opposite direction if λ < 0.

Example A golf ball is hit 180m due east, then 50m south and finally 20m east. Give a vector description of the final displacement from the tee.

Solution We position our axes so that east is the positive x direction and north is the positive y direction.

[Figure: the three legs of the shot plotted against N and E axes: 180 east, 50 south, 20 east.]

The first shot gives a displacement of 180 in the x direction; so the displacement vector is u1 = (180, 0). The second shot has displacement vector u2 = (0, −50), while the final displacement is given by u3 = (20, 0). The total displacement is given by

u = u1 + u2 + u3 = (200, −50).

Note that this idea generalises: any sequence of displacements u1, u2, u3, . . . results in a final displacement u = u1 + u2 + u3 + . . .. 2
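The componentwise addition rule is easy to check numerically. A minimal Python sketch of the golf example (the helper name vec_add is ours, not from the notes):

```python
# A minimal sketch (our own helper name): displacements as tuples,
# added componentwise exactly as the matrix addition rule prescribes.
def vec_add(*vectors):
    """Componentwise sum of equal-length tuples."""
    return tuple(sum(cs) for cs in zip(*vectors))

u1 = (180, 0)   # 180 m east
u2 = (0, -50)   # 50 m south
u3 = (20, 0)    # 20 m east
u = vec_add(u1, u2, u3)
print(u)  # (200, -50)
```

The result agrees with the displacement found above.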

Vector Spaces

We now want to abstract the situation. We forget, for the moment, about row vectors, R2 and so on, and try to think of the rules obeyed by vectors u in the abstract.

In fact from the abstract mathematical point of view these rules or properties are precisely what define a vector space (or set whose elements are vectors).


A vector space is a nonempty set V the elements of which are called vectors and denoted by u, v, w etc. There are two operations defined on V called addition (denoted with +) and multiplication by scalars (real or complex numbers), subject to the following

1. u + v ∈ V for all u,v ∈ V .

2. u + v = v + u for all u,v ∈ V .

3. (u + v) + w = u + (v + w) for all u, v, w ∈ V .

4. There exists 0 ∈ V such that u + 0 = 0 + u = u, for all u ∈ V .

5. For each u ∈ V there is a −u ∈ V such that u + (−u) = 0.

6. λu ∈ V for all u ∈ V and all scalars λ.

7. λ(u + v) = λu + λv for all u, v ∈ V and any scalar λ.

8. (λ + µ)u = λu + µu for all u ∈ V and any scalars λ and µ.

9. λ(µu) = (λµ)u.

10. 1u = u.

If the scalars used are real numbers then we have a real vector space. If the scalars are complex numbers then we have a complex vector space.

The real vector spaces R2, R3 (or Rn) provide the intuition and were the original motivation for our ten axioms defining an abstract vector space. However, there are many other examples of vector spaces.

• The collection of all real continuous functions, f(x), defined on an interval a < x < b is a real vector space under addition. If f and g are real continuous functions on a < x < b then so is f + g, defined by

(f + g)(x) = f(x) + g(x).

• The set V of all real valued infinite sequences {an} is a real vector space: if a = {an} and b = {bn} are elements of V then so is a + b = {an + bn}.

• The set Pn of real polynomials in x of degree at most n is a real vector space. If p = a0 + a1x + . . . + anx^n and q = b0 + b1x + . . . + bnx^n are in Pn then so is

p + q = (a0 + b0) + (a1 + b1)x + . . . + (an + bn)x^n.

It is interesting to note that this last example is really the “same” as Rn+1. Given (a0, a1, . . . , an) ∈ Rn+1 we can define an element of Pn, namely a0 + a1x + . . . + anx^n, and vice versa.

Subspaces

If we fix an x ∼ y plane in R3, then R2 is a vector space in its own right as well as being a subset of R3. We call it a vector subspace or simply a subspace of R3.

A subspace, H, of a vector space V is a subset of V , H ⊂ V , such that H is itself a vector space under the operations of addition and scalar multiplication inherited from V .


Example Let H be the set of all vectors of the form (2a + b, a, b) where a and b are arbitrary scalars. Show that H is a subspace of R3.

Solution It is clear that H is a subset of R3. We have to show that H satisfies the axioms of a vector space. It is easy to test most of the axioms. The two critical ones are those which demand closure of H under the operations of addition and scalar multiplication (axioms 1 and 6).

Let u1 = (2a1 + b1, a1, b1) and u2 = (2a2 + b2, a2, b2), where a1, a2, b1 and b2 are real numbers. Then

u1 + u2 = (2(a1 + a2) + (b1 + b2), a1 + a2, b1 + b2)
        = (2a + b, a, b),

where a = a1 + a2 and b = b1 + b2. Clearly, u1 + u2 ∈ H. We also have

λu1 = (2λa1 + λb1, λa1, λb1)
    = (2a + b, a, b)

if a = λa1, b = λb1. So clearly λu1 is in H. After verification of the other axioms (exercise!) we can conclude H is a subspace of R3. 2
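Closure under addition and scalar multiplication can also be spot-checked numerically. A small sketch, assuming the membership test x = 2y + z (which restates the form (2a + b, a, b) with a = y, b = z); all helper names are ours:

```python
# Hedged sketch: spot-check closure (axioms 1 and 6) for H = {(2a+b, a, b)}.
def in_H(v, tol=1e-9):
    """A triple (x, y, z) lies in H exactly when x = 2y + z (take a = y, b = z)."""
    x, y, z = v
    return abs(x - (2*y + z)) < tol

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(lam, u):
    return tuple(lam * ui for ui in u)

u1 = (2*1 + 3, 1, 3)       # a1 = 1,  b1 = 3
u2 = (2*(-2) + 5, -2, 5)   # a2 = -2, b2 = 5
print(in_H(add(u1, u2)), in_H(scale(-7.5, u1)))  # True True
```

A spot-check is not a proof, of course, but it catches algebra slips quickly.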

Basis Vectors

Consider an arbitrary vector u = (u1, u2, u3) in R3 referred to a fixed set of axes. We can write this as

u = u1i + u2j + u3k,

where i, j, k are fixed vectors (of unit length) given by

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).

The numbers u1, u2, u3 are called the components of u with respect to the basis i, j, k. Any vector in R3 is uniquely specified by giving its components with respect to a given basis.

In terms of the x, y and z coordinate axes we see that

• i is a unit vector in the positive x direction.

• j is a unit vector in the positive y direction.

• k is a unit vector in the positive z direction.

♠ Exercises 10

1. Find the displacement vector corresponding to the following sequence of movements:

– 10m South East

– 20m East

– 10m North West

2. In each case write an expression for the vector u in terms of a, b or c.

[Figure: three diagrams, (a), (b) and (c), each showing u together with some of the vectors a, b and c.]

3. Let H be the set of points inside and on the unit circle in the x ∼ y plane,

H = {(x, y) : x2 + y2 ≤ 1}.

Find a specific example to show that H is not a subspace of R2.

4. In the following determine if the given set is a subspace of Pn (real polynomials of degree at most n), for the given n.

(a) n = 2, all polynomials of the form ax2, a ∈ R.

(b) Arbitrary n, all polynomials p(x) ∈ Pn such that p(0) = 0.

(c) n = 3, all polynomials of degree at most three which have integer coefficients.

(d) n = 3, polynomials of the form

ax3 + (2a + b)x2 + bx,

for a, b ∈ R.

*5. Let A be a non-singular 3 × 3 matrix and suppose we change basis from i, j, k to i′, j′, k′ by

(i′, j′, k′)t = A (i, j, k)t.

So, for example, i′ = a11i + a12j + a13k.

How do the components of a vector u = u1i + u2j + u3k change when we refer to the new basis?

[Hint: u itself does not change.]


Lecture 3.11 The Inner or Scalar Product

In dealing with coordinate axes in R2 and R3 we are used to associating length to line segments and angles to pairs of intersecting lines. How are we to do this in our vector space setting?

In general, lengths and angles between vectors are defined using what is known as an inner product. The inner product is a mapping which associates to each pair of vectors a scalar. We will not pursue things in such generality here. The interested student will meet inner products in the linear algebra course MATH213.

What we require here is an inner product which leads naturally to the Euclidean distance measure — Pythagoras’ theorem.

[Figure: right triangle with legs x and y and hypotenuse √(x² + y²).]

In fact we just about have such an inner product at hand. Take two vectors u1 = (x1, x2, . . . , xn) and u2 = (y1, y2, . . . , yn) in Rn; then we can define a map Rn × Rn −→ R as follows:

u1 · u2 = (x1, x2, . . . , xn)(y1, y2, . . . , yn)t = x1y1 + x2y2 + . . . + xnyn.

This scalar product (in R3 often called the dot product) of two vectors is easy to remember: it’s just the sum of the products of the components of the two vectors.
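The rule is a one-liner in code. A minimal Python sketch (the function name dot is ours):

```python
# Sketch of the scalar (dot) product on R^n:
# the sum of products of corresponding components.
def dot(u, v):
    assert len(u) == len(v), "vectors must live in the same R^n"
    return sum(x * y for x, y in zip(u, v))

print(dot((1, -2), (3, 4)))       # 1*3 + (-2)*4 = -5
print(dot((1, 1, 2), (0, 1, 1)))  # 3
```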

It is important to note that although we have given a definition of u1 · u2 in terms of components, the definition itself can be shown to be independent of the axes used to give the components of the vectors (see exercises).

The length of a vector is now just given as

|u| = √(u · u) = √(x1² + x2² + . . . + xn²).

This is the usual length given by Pythagoras’ theorem. For example in R2


[Figure: the vector u from the origin to (x, y).]

The length of u is just |u| = √(x² + y²).

Example Calculate the following:

(a) i · i (b) j · j (c) k · k (d) i · j (e) i · k (f) j · k

Solution

(a) i · i = (1, 0, 0)(1, 0, 0)t = 1. The length of i is 1.

(b) j · j = 02 + 12 + 02 = 1

(c) k · k = 02 + 02 + 12 = 1

(d) i · j = 1× 0 + 0× 1 + 0× 0 = 0

(e) i · k = 1× 0 + 0× 0 + 0× 1 = 0

(f) j · k = 0× 0 + 1× 0 + 0× 1 = 0

2

Note that the scalar product is symmetric,

u · v = v · u.

Example Calculate the following:

(a) a · b where a = (1,−2), b = (3, 4)

(b) (i + j − k) · (2i + j − k)

Solution

(a)

a · b = 1× 3 + (−2)× 4

= 3− 8

= −5


(b) Either use the earlier example after expanding brackets or think of the vectors in row vector form.

(i + j − k) · (2i + j − k) = 1× 2 + 1× 1 + (−1)× (−1)

= 2 + 1 + 1

= 4.

2

We mentioned earlier that inner products also have something to say about the angle between two vectors. The following theorem shows explicitly how the dot product gives you information on the angle between a pair of vectors.

Theorem Let θ be the acute angle between two vectors u and v in R3. Then

u · v = |u||v| cos θ.

Proof. For simplicity we will only provide a proof in R2. The proof in R3 is set as an exercise. We place the vectors ‘tail to tail’.

[Figure: u = ~OP and v = ~OQ drawn tail to tail at the origin, with angle θ between them.]

Let u = (x1, y1), v = (x2, y2) with respect to the given axes. Then

u · v = x1x2 + y1y2.

Now, |u| = √(x1² + y1²) and |v| = √(x2² + y2²), so

u · v = |u||v| [ (x1/√(x1² + y1²)) · (x2/√(x2² + y2²)) + (y1/√(x1² + y1²)) · (y2/√(x2² + y2²)) ]
      = |u||v| [cos α cos β + sin α sin β],

where α = ∠POX, β = ∠QOX — the angles made by u and v with the positive x axis. Therefore

u · v = |u||v| cos(β − α) = |u||v| cos θ.

We have used cos A cos B + sin A sin B = cos(A − B) = cos(B − A), and the fact that θ = β − α. □

Our theorem gives a nice criterion for determining when two vectors are orthogonal — i.e. perpendicular.


Corollary Two non-zero vectors u and v are orthogonal if and only if u · v = 0.

Proof. The proof is a very simple consequence of the earlier theorem. Note that it is an ‘if and only if’ proof. Firstly, if u and v are orthogonal then the angle between them, θ, is π/2, so

u · v = |u||v| cos θ = |u||v| cos(π/2) = 0.

On the other hand if u · v = 0 then, as |u| ≠ 0 and |v| ≠ 0, we have cos θ = 0. As θ is the acute angle between u and v, θ = π/2. The vectors are orthogonal. □

Example Find the angle between the following two lines:

OP : joining O to (1, 1, 2)
OQ : joining O to (0, 1, 1).

Solution We have

~OP = (1, 1, 2) (= i + j + 2k), ~OQ = (0, 1, 1) (= j + k).

So ~OP · ~OQ = 1 × 0 + 1 × 1 + 2 × 1 = 3.

Also, |~OP| = √(1² + 1² + 2²) = √6 and |~OQ| = √(0² + 1² + 1²) = √2.

If θ is the angle between ~OP and ~OQ then we have

~OP · ~OQ = |~OP||~OQ| cos θ.

So,

3 = √6 · √2 cos θ,

i.e. cos θ = 3/(√6√2) = 3/√12 = 3/(2√3) = √3/2.

Hence the angle θ is π/6 or 30◦. 2
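The theorem turns the dot product into an angle via θ = arccos(u · v / (|u||v|)). A numerical sketch of this example (the helper names dot, norm and angle are ours):

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle theta satisfying u.v = |u||v| cos(theta)."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))

# OP = (1, 1, 2) and OQ = (0, 1, 1) from the worked example above.
theta = angle((1, 1, 2), (0, 1, 1))
print(round(math.degrees(theta), 6))  # 30.0
```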

Example Use vectors to prove that the median drawn from the vertex made by the equal sides of an isosceles triangle is perpendicular to the third side of the triangle.

Solution Let a, b, c and d be as shown.

[Figure: isosceles triangle with equal sides a and b, base 2c, and median d from the apex.]


Notice that the median d bisects the base of the isosceles triangle represented by 2c. We use the vector rule of addition:

a = c + d and d = c + b.

From this pair of equations we deduce that

d = (1/2)(a + b) and c = (1/2)(a − b).

So that

d · c = (1/4)(a · a + a · b − b · a − b · b) = (1/4)(|a|² − |b|²),

since |u| = √(u · u) for any vector u. However, as the triangle is isosceles |a| = |b| — the sides given by a and b have equal length. Thus,

d · c = 0.

We conclude that the median (represented by d) is perpendicular to the base (represented by 2c). 2

Orthogonal Projection

In R3 our basis vectors i, j and k are mutually orthogonal (each one is perpendicular to the other two) unit vectors (they all have length 1).

A general vector u in R3 can be written as

u = u1i + u2j + u3k,

where the ui are the components of u with respect to the basis i, j, k. We can think of u1 as the component of the projection of u onto i — in fact it is the perpendicular or orthogonal projection.

[Figure: u and its orthogonal projection u1 onto the basis vector i.]

In the same sense u2 and u3 are the projections onto j and k respectively.

We now want to use our scalar product to characterise such projections. We note that

u1 = i · u, u2 = j · u and u3 = k · u.


So we find the component u1 of the projection of u onto i by simply taking the dot product.

Let’s generalise. Let e be any vector, and suppose we want to find the component of the projection of u onto e. First, we need to make e into a unit vector; we are interested only in the component of u in the direction of e. The unit vector in the e direction is

ê = e/|e|.

Note that ê · ê = (e · e)/|e|² = |e|²/|e|² = 1, so ê has indeed got unit length.

The required component of projection is now simply

ê · u.

The projection of the vector u onto e is then the vector of length ê · u in the ê direction.

The orthogonal projection of u onto a nonzero vector e is

proj_e u = (ê · u)ê,

a vector of length ê · u in the e direction.

As ê = e/|e| this can also be written as

proj_e u = ((e · u)/|e|²) e.

Note |proj_e u| = |ê · u| = |e · u|/|e|.

[Figure: u decomposed into proj_e u along e and a component v orthogonal to e.]

Notice that the vector labelled v is orthogonal to e (and to proj_e u). It is known as the component of u orthogonal to e. In fact, using the vector addition rule,

v = u − proj_e u.

Example Find the orthogonal projection, and the component orthogonal to it, for

u = i + j + k

in the direction of e = i + j.

Solution The unit vector in the e direction is

ê = e/|e| = (i + j)/√(1² + 1²) = (1/√2)(i + j).


Then ê · u = (1/√2)(1 + 1) = √2, so that

proj_e u = (ê · u)ê = √2 · (1/√2)(i + j) = i + j.

The component of u orthogonal to proj_e u is

u − proj_e u = (i + j + k) − (i + j) = k.

2
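The projection formula proj_e u = ((e · u)/|e|²) e is easy to check numerically for this example. A short sketch (helper names are ours):

```python
# Numerical spot-check of the projection formulae:
# proj_e u = ((e . u)/|e|^2) e, and u - proj_e u is orthogonal to e.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def proj(u, e):
    c = dot(e, u) / dot(e, e)
    return tuple(c * ei for ei in e)

u, e = (1, 1, 1), (1, 1, 0)   # u = i + j + k projected onto e = i + j
p = proj(u, e)
perp = tuple(ui - pi for ui, pi in zip(u, p))
print(p)             # (1.0, 1.0, 0.0), i.e. i + j
print(perp)          # (0.0, 0.0, 1.0), i.e. k
print(dot(perp, e))  # 0.0: orthogonal, as it should be
```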

♠ Exercises 11

1. In each part find the scalar product of the vectors and the cosine of the angle between them.

(a) u = i + j,v = i− j

(b) u = (1,−1),v = (2,−3)

(c) u = 2i− j + k,v = −i + 3j + k.

(d) u = i + j − k,v = 3i− k.

2. Use vectors to show that A(2,−1, 1), B(3, 2,−1) and C(7, 0,−2) are vertices of a right angled triangle.

3. In each part find the orthogonal projection on e = i + j − k and also the vector component orthogonal to e.

(a) u = 4i− j + 7k (b) u = i + j + k

(c) u = i− 2j (d) u = −i + j.

*4. Use vectors to prove that the angle inscribed in a semi-circle is a right angle.


Lecture 3.12 The Cross Product

For our final lecture on vectors and vector spaces we want to examine a notion which is very specific to vectors in R3. This is the cross or vector product. Although there are generalisations of the vector product to higher dimensional vector spaces, they require more technical machinery; only in R3 does the vector product have a natural definition within the vector space itself.

What we want to do is to define a “product” of two (non-parallel) vectors which produces a new vector orthogonal (perpendicular) to the original pair. Here is our definition.

If u = u1i + u2j + u3k and v = v1i + v2j + v3k are two vectors in R3 then the cross product u × v is the vector defined by

u × v = (u2v3 − u3v2)i − (u1v3 − u3v1)j + (u1v2 − u2v1)k.

There are in fact deeper mathematical reasons why we would choose such a bizarre looking definition. We’ll just have to accept it for the time being, at least until you have done some more mathematics. What we want to do is explore some of the consequences of the definition. The cross product became popular initially because of its great utility in applications to fluid mechanics and electromagnetism.

Our definition of the cross product is, as it stands, difficult to use and remember. However, if you look at the three components of u × v, i.e. (u2v3 − u3v2), −(u1v3 − u3v1) and (u1v2 − u2v1), you should be reminded of the determinant! You can verify for yourself the following formula:

u × v = | i   j   k  |
        | u1  u2  u3 |
        | v1  v2  v3 |

In practice this is how one remembers the cross product definition.
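The cofactor expansion of that determinant along its top row can be coded directly. A sketch (helper names are ours) that also verifies the orthogonality property numerically:

```python
# Cross product via the cofactor expansion of | i j k; u1 u2 u3; v1 v2 v3 |.
def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2*v3 - u3*v2, -(u1*v3 - u3*v1), u1*v2 - u2*v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

u, v = (1, -1, 1), (2, 3, -1)   # u = i - j + k, v = 2i + 3j - k
w = cross(u, v)
print(w)                     # (-2, 3, 5)
print(dot(u, w), dot(v, w))  # 0 0: orthogonal to both factors
```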

Example Calculate u × v where u = i − j + k and v = 2i + 3j − k. Verify that u × v is orthogonal to u and v.

Solution

u × v = | i   j   k |
        | 1  −1   1 |
        | 2   3  −1 |

= [(−1) × (−1) − 3 × 1]i − [1 × (−1) − 2 × 1]j + [1 × 3 − 2 × (−1)]k,

i.e. u × v = −2i + 3j + 5k.

To check the orthogonality of u and v with u × v we need to calculate the angle between u and u × v, and between v and u × v. We use the scalar product formula. Let θ be the angle between u and u × v. Then

cos θ = u · (u × v) / (|u||u × v|)
      = [1 × (−2) + (−1) × 3 + 1 × 5] / (√(1² + (−1)² + 1²) √((−2)² + 3² + 5²))
      = 0.

So cos θ = 0 and θ = π/2: u is orthogonal to u × v. The check for v is identical. 2

It is worth noting at this point the differences between the scalar and cross products.


• The scalar (or dot) product is defined on any Rn. The cross product is defined only in R3.

• The scalar product produces a scalar, i.e. u · v is a scalar. The cross product produces a vector, i.e. u × v is a vector.

Properties of the Cross Product

We summarise the main properties of the cross product in the following theorem.

Theorem If u,v and w are any vectors in R3 and λ is any scalar, then

1. u× v = −(v × u)

2. u× u = 0

3. λ(u× v) = (λu)× v = u× (λv)

4. w × (u + v) = w × u + w × v

5. u · (u× v) = 0 and v · (u× v) = 0.

Proof

1. follows from the determinantal formula for u × v — interchange the rows of the determinant to create v × u, but interchanging rows of a determinant multiplies the determinant by −1.

2, 3, and 4 also follow easily from the determinant formula. They are left as an exercise.

5. says that both u and v are perpendicular to u × v. The proof is easy, following from the general formulae for the dot and cross products.

Note that the cross product anti-commutes, i.e. u × v = −v × u; this is quite unlike ordinary multiplication and the scalar product.

Example The vectors i, j and k are mutually orthogonal unit vectors. Show that

i × j = k, j × k = i and k × i = j.

Solution We will show i × j = k. Note

i = 1i + 0j + 0k and j = 0i + 1j + 0k,

so

i × j = | i  j  k |
        | 1  0  0 |
        | 0  1  0 |

= (0 × 0 − 0 × 1)i − (1 × 0 − 0 × 0)j + (1 × 1 − 0 × 0)k = k.

The other formulae follow in a similar manner. 2

You will recall that we were able to calculate the scalar product in terms of the lengths of the vectors and the angle between them. Is a similar type of formula valid for the cross product? The following theorem provides the answer.

Theorem Let u and v be vectors in R3 with θ being the acute angle between them. Then

|u× v| = |u||v| sin θ.


Proof We have

cos θ = u · v / (|u||v|),

so

sin θ = √(1 − cos²θ) = √(1 − (u · v / (|u||v|))²).

Then,

|u||v| sin θ = |u||v| √(1 − (u · v)²/(|u|²|v|²))
             = √(|u|²|v|² − (u · v)²)
             = √((u1² + u2² + u3²)(v1² + v2² + v3²) − (u1v1 + u2v2 + u3v3)²)
             = √((u2v3 − u3v2)² + (u1v3 − u3v1)² + (u1v2 − u2v1)²)
             = |u × v|.

Notice in proving our formula we derived the following interesting identity:

|u × v|² = |u|²|v|² − (u · v)².
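That identity (Lagrange's identity) can be spot-checked numerically for any pair of vectors. A sketch with one arbitrary pair (helper names are ours):

```python
# Spot-check of |u x v|^2 = |u|^2 |v|^2 - (u.v)^2 for integer vectors,
# where both sides are exact integers.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2*v3 - u3*v2, -(u1*v3 - u3*v1), u1*v2 - u2*v1)

u, v = (1, -2, 3), (4, 0, -5)                # an arbitrary test pair
lhs = dot(cross(u, v), cross(u, v))          # |u x v|^2
rhs = dot(u, u) * dot(v, v) - dot(u, v)**2   # |u|^2 |v|^2 - (u.v)^2
print(lhs == rhs)  # True
```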

We have the following easy corollary.

Corollary Let u and v be two non-zero vectors in R3. Then

(a) The area of the parallelogram with sides u, v is |u × v|.

(b) u× v = 0 if and only if u and v are parallel.

Proof

(a) Area of the parallelogram = (base) × (perpendicular height) = |u||v| sin θ = |u × v|.

(b) u and v are assumed nonzero so |u| ≠ 0 and |v| ≠ 0. So we have

u × v = 0 if and only if sin θ = 0.

This is true if and only if θ = 0 or θ = π. So u × v = 0 if and only if u and v are parallel (or anti-parallel).


Example Find the area of the triangle whose vertices are

P1(1, 1, 1), P2(−1, 1, 0) and P3(0, 2, 1).

Solution The area A of the triangle is half the area of the parallelogram determined by the vectors

~P1P2 = (−1 − 1, 1 − 1, 0 − 1) = −2i − k

and ~P3P2 = (−1 − 0, 1 − 2, 0 − 1) = −i − j − k.

So

A = (1/2)|~P1P2 × ~P3P2|.

Now,

~P1P2 × ~P3P2 = | i   j   k |
                | −2   0  −1 |
                | −1  −1  −1 |

= −i − j + 2k.

Then

A = (1/2)|−i − j + 2k| = (1/2)√((−1)² + (−1)² + 2²) = √6/2 = √(3/2).

2
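The area computation can be replayed numerically. A sketch (helper names are ours) reproducing A = √6/2:

```python
import math

# Triangle area as half the norm of a cross product of two edge vectors.
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2*v3 - u3*v2, -(u1*v3 - u3*v1), u1*v2 - u2*v1)

def norm(u):
    return math.sqrt(sum(x * x for x in u))

P1, P2, P3 = (1, 1, 1), (-1, 1, 0), (0, 2, 1)
a = tuple(q - p for p, q in zip(P1, P2))   # P1P2 = (-2, 0, -1)
b = tuple(q - p for p, q in zip(P3, P2))   # P3P2 = (-1, -1, -1)
area = 0.5 * norm(cross(a, b))
print(area, math.sqrt(6) / 2)  # both ~1.2247
```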

The Scalar Triple Product

The fact that the cross product produces a vector means that we can define a product of three vectors using the cross product and the scalar product.

If u, v and w are vectors in R3 we define the scalar triple product of u, v and w as the scalar

u · (v ×w).

We can give a rather nice formula for the triple product in terms of a determinant:

v × w = | i   j   k  |
        | v1  v2  v3 |
        | w1  w2  w3 |

= (v2w3 − v3w2)i − (v1w3 − v3w1)j + (v1w2 − v2w1)k.

So,

u · (v × w) = u1(v2w3 − v3w2) − u2(v1w3 − v3w1) + u3(v1w2 − v2w1)

            = | u1  u2  u3 |
              | v1  v2  v3 |
              | w1  w2  w3 |


u · (v × w) = | u1  u2  u3 |
              | v1  v2  v3 |
              | w1  w2  w3 |

Example Calculate the triple product u · (v × w) if u = i + j − k, v = 2i − j and w = −i + 3k.

Solution

u · (v × w) = |  1   1  −1 |
              |  2  −1   0 |
              | −1   0   3 |

= −8.

2
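The determinant formula for the triple product is straightforward to code. A sketch (the helper names det3 and triple are ours) reproducing the value −8:

```python
# Scalar triple product u . (v x w) as the 3x3 determinant of stacked rows.
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def triple(u, v, w):
    return det3((u, v, w))

print(triple((1, 1, -1), (2, -1, 0), (-1, 0, 3)))  # -8
print(triple((1, 0, 0), (0, 1, 0), (0, 0, 1)))     # 1, the unit cube volume
```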

The scalar triple product has a geometrical interpretation as the volume of the parallelepiped defined by the three vectors u, v and w. This can be seen as follows:

Volume of parallelepiped, V = (Area of base) × (perpendicular height) = |v × w| h,

where h is the perpendicular height,

h = |proj_{v×w} u| = (|u · (v × w)|/|v × w|²) |v × w| = |u · (v × w)|/|v × w|.

So we have

V = |u · (v × w)|, or V = ±u · (v × w).

Example Verify the parallelepiped volume formula by calculating the volume of the unit cube with sides i, j and k.

Solution

Volume = |i · (j × k)| = |i · i| = 1.

2

♠ Exercises 12

1. If u = i + 2j − k,v = −4i + j + 2k calculate the following

(a) u× v

(b) u× (u + v)

(c) the area of the triangle with u and v as two of its sides.

2. Prove property 5 of cross products:

u · (u× v) = 0 and v · (u× v) = 0.

3. Let u = i− j,v = 2i− j + 2k and w = 2j − 3k. Calculate


(a) (u × v) × w (b) u × (v × w) (c) u · (v × w) (d) v × (w × u)

4. Let P1, P2, P3 and P4 be the following four points in R3: P1(−1, 0, 0), P2(0, 1,−1), P3(1, 0, 1), P4(0, 0, 1). Calculate

(a) The area of the triangle formed by P1, P2 and P3

(b) The volume of the parallelepiped with sides given by the three vectors ~P1P2, ~P1P3 and ~P1P4.

5*. Let d be the perpendicular distance from a point P to the line through two points Q and R. Show that

d = |~PQ × ~QR| / |~QR|.