
APPLIED MATHEMATICS 1A (ENG)

Mathematics 132: Vectors and Matrices

University of KwaZulu-Natal, Pietermaritzburg

© C. Zaverdinos, 2012.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the author.


Preface

This course has grown out of lectures going back well over 30 years. Names which come to mind are J. Nevin, R. Watson, G. Parish and J. Raftery. I have been involved with teaching linear algebra to engineers since the early 90s and some of my ideas have been incorporated in the course. What is new in these notes is mainly my approach to the theoretical side of the subject. Several of the numerical examples and exercises come from earlier notes, as do a few theoretical problems.

I would like to thank Dr. Paddy Ewer for showing me how to use the program LatexCad for the diagrams and last, but not least, the Reverend George Parish for proof-reading an earlier version and also for elucidating No.5 of Exercise 2, as well as Professors John and Johann van den Berg for their encouragement in developing these notes.

About this course

The course is meant to be studied as a whole. Many examples and exercises in later chapters refer the reader to earlier chapters. This is especially true of Chapter 3.

Chapter 1 motivates the idea of a vector through geometry and discusses lines and planes and transformations related to such geometric objects. A vector can be thought of as a displacement in space and as an ordered triple of numbers.

Chapter 2 generalizes the idea of a triple to an n-tuple and motivates linear algebra through the problem of finding solutions to simultaneous linear equations in n unknowns. The array of coefficients of such equations is known as a matrix. Simplification of such a matrix by row operations forms the major part of this chapter.

Chapter 3 considers matrices in detail and looks at them dynamically in the converse sense of Chapter 2: A matrix defines a transformation of points in n-dimensional space. Matrix multiplication is introduced in terms of the composition of such transformations and some other key concepts such as linear independence, the rank and inverse of a matrix are discussed. Abstract vector spaces are never mentioned, but we remark that the alternative proof of the basic theorem of linear independence in section 3.6 goes through word-for-word for such spaces and also leads to the well-known Replacement Theorem of Grassmann.

Chapter 4 is about determinants and the cross product (also called the vector product). The theory of determinants predates that of matrices, going back to Leibniz in the 17th Century. One of the founders of linear algebra, the 19th Century mathematician Arthur Cayley, once remarked that many things about determinants should really come after the study of matrices, and this is the modern approach adopted by us. The cross product is used extensively in mechanics, in particular dynamics, which is studied in Mathematics 142. Algebraic properties of the cross product are derived from those of 3 × 3 determinants, while the exercises can serve as an introduction to some of its applications.

Note 1 Some exercises (particularly those marked with an asterisk *) are harder and, at the discretion of the instructor, can be omitted or postponed to a later stage.


Bibliography

The following books can be consulted but they should be used with caution since different authors have a variety of starting points and use different notation. Many books also have a tendency to become too abstract too early. Unless you are a mature reader, books can confuse rather than help you.

1. A.E. Hirst: Vectors in 2 or 3 Dimensions (Arnold). This may be useful for our Chapter 1 since it makes 3-dimensional space its central theme.

2. R.B.J.T. Allenby: Linear Algebra (Arnold). This is a quite elementary text.

3. J.B. Fraleigh & R.A. Beauregard: Linear Algebra (Addison-Wesley).

4. K. Hardy: Linear Algebra for Engineers and Scientists (Prentice-Hall).

5. D. Lay: Linear Algebra and its Applications (3rd Edition, Pearson). This book treats matrix multiplication in much the same way as we do, but its treatment of geometric aspects is less thorough. It has over 560 pages, becomes abstract and advanced after p.230, but will probably be useful in later years of study.

6. H. Anton: Elementary Linear Algebra (6th Edition, John Wiley and Sons).

7. E.M. Landesman & M.R. Hestenes: Linear Algebra for Mathematics, Science and Engineering (Prentice-Hall). This is quite advanced.

I recommend especially (1) for Chapter 1 and (4) and (5) for later chapters of the notes. Because of its elementary nature, (2) is good for Chapter 2. If used with care the above books can be helpful.

C. Zaverdinos
Pietermaritzburg, February 6, 2012

Contents

1 Two and Three-Dimensional Analytic Geometry. Vectors

2 Matrices and the Solution of Simultaneous Linear Equations

3 Linear Transformations and Matrices

4 Determinants and the Cross Product

Chapter 1

Two and Three-Dimensional Analytic Geometry. Vectors

1.1 Points in three-dimensional space

The study of the geometry of lines and planes in space provides a good introduction to Linear Algebra. Geometry is a visual subject, but it also has an analytical aspect which is the study of geometry using algebra: how geometric problems can be expressed and solved algebraically. This geometry is also called Cartesian Geometry, after its founder, the 17th Century philosopher and mathematician René Descartes.

From school you are already familiar with the Cartesian plane. The x-axis and y-axis meet at right-angles at the origin O. Every point A in the x-y plane is uniquely represented by an ordered pair (a, b) of real numbers, where a is the x-coordinate and b is the y-coordinate as in Figure 1.1. We write A = (a, b). Here M is the foot of the perpendicular from A to the x-axis and likewise N is the foot of the perpendicular from A to the y-axis. Notice that M = (a, 0) and N = (0, b).

[Figure 1.1: The point A = (a, b) in the x-y plane with origin O; M = (a, 0) and N = (0, b) are the feet of the perpendiculars from A to the x-axis and the y-axis.]

In order to represent a point A in space we add the z-axis, which is perpendicular to both the x- and y-axes as in Figure 1.2. Here A = (a, b, c) is a typical point and a is the x-coordinate, b is the y-coordinate and c is the z-coordinate of the point A. In the diagram P is the foot of the perpendicular from A to the y-z plane. Similarly, Q and R are the feet of the perpendiculars from A to the z-x and x-y planes respectively.

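[Figure 1.2: The point A = (a, b, c) in space, with the x-, y- and z-axes through the origin O, the auxiliary points L, M, N, P, Q and R, and the edge lengths a, b and c marked.]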

1.1.1 The Corkscrew Rule and right-handed systems

You wish to open a bottle of wine using a corkscrew. You first turn the corkscrew so that it enters the cork. As observed by the opener the turning is clockwise, but observed from the underside of the bottle the sense is reversed: it is anticlockwise. The direction in which a corkscrew moves as it is turned is given by the Corkscrew Rule. In Figure 1.2, rotating from the x-axis to the y-axis and applying this rule produces motion along the (positive) z-axis. Rotating from the y-axis to the z-axis and applying the same rule gives motion along the (positive) x-axis. Finally, rotating from the z-axis to the x-axis and applying the rule produces motion along the (positive) y-axis. The axes x, y, z (in that order) are said to form a right-hand system. This can be represented by the scheme

\[ x \xrightarrow{\;z\;} y \xrightarrow{\;x\;} z \xrightarrow{\;y\;} x \]

Have a look at yourself in the mirror while opening a bottle of wine and observe that what is clockwise for you is anticlockwise for the person in the mirror. Thus a right-hand system of axes observed in a mirror does not follow the corkscrew rule and is called a left-hand system. The usual convention is to use only right-hand systems.

1.1.2 Displacements and vectors

Consider two numbers a and b on the real line ℝ. The difference c = b − a is the displacement from a to b. Given any two numbers from a, b and c, the third is uniquely determined by the equation c = b − a. For example, let a = −2 and b = 3. The displacement from a to b is c = 3 − (−2) = 5. The displacement from 6 to 11 is also 5.

Let A = (a1, a2, a3) and B = (b1, b2, b3) be two points in space. The displacement from A to B is denoted by AB and is given by

AB = (b1 − a1, b2 − a2, b3 − a3) (1.1)

1.1.3 Units and distance

The unit can be any agreed-upon distance, like the meter or kilometer. Units (of length, time) are important in applications but for the most part we won't specify them.

The distance of A = (a, b, c) from the origin O is given by

\[ |OA| = \sqrt{a^2 + b^2 + c^2} \]

To see this, see Figure 1.2 and use the theorem of Pythagoras:

\[ |OA|^2 = |OR|^2 + |RA|^2 = |OL|^2 + |LR|^2 + |RA|^2 = a^2 + b^2 + c^2 \]

The distance from A to B is written as |AB| and is given by

\[ |AB| = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2} \tag{1.2} \]

and is also called the length or magnitude of AB.

Because the displacement AB has both direction and magnitude it is called a vector and pictured as an arrow going from A to B. The tail of the arrow is at A and its head is at B as in Figure 1.3.

As an example, let A = (−1, 2.3, −4.5) m and B = (2.1, 3.6, −2.5) m. Then AB = (2.1 − (−1), 3.6 − 2.3, −2.5 − (−4.5)) = (3.1, 1.3, 2.0) m.

Two displacement vectors AB and CD are equal if they are equal as displacements. Hence if C = (−3.5, 1.1, 3.3) and D = (−0.4, 2.4, 5.3) and A, B are as above then

CD = (−0.4, 2.4, 5.3) − (−3.5, 1.1, 3.3) = (3.1, 1.3, 2.0) = AB.

The magnitude of this vector is $|AB| = |CD| = \sqrt{(3.1)^2 + (1.3)^2 + (2.0)^2} = 3.9115$ m.

Various vectors (displacement, velocity, acceleration, force) play an important role in mechanics and have their own units. For example, if positions are measured in meters and time is measured in seconds, a velocity vector will have units m s⁻¹. So if a particle moves in such a way that each second it displaces (−1, 2, 3) meters, we say its velocity is constantly (−1, 2, 3) m s⁻¹.
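The displacement and magnitude computations above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the helper names `displacement` and `length` are chosen here purely for illustration.

```python
import math

def displacement(a, b):
    """Displacement AB = b - a, componentwise (equation 1.1)."""
    return tuple(bi - ai for ai, bi in zip(a, b))

def length(v):
    """Length |v| = sqrt(v1^2 + v2^2 + v3^2) (equation 1.2)."""
    return math.sqrt(sum(vi * vi for vi in v))

# Points from the worked example (coordinates in meters).
A = (-1.0, 2.3, -4.5)
B = (2.1, 3.6, -2.5)
C = (-3.5, 1.1, 3.3)
D = (-0.4, 2.4, 5.3)

AB = displacement(A, B)
CD = displacement(C, D)

# Floating-point rounding means AB and CD agree only approximately,
# so compare with a tolerance rather than with ==.
same = all(math.isclose(x, y) for x, y in zip(AB, CD))
print(same, round(length(AB), 4))
```

Comparing with `math.isclose` rather than `==` is a deliberate choice: decimal inputs such as 2.3 are not represented exactly in binary floating point.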

Depending on the geometric interpretation, the tail (head) of displacement vectors may be at any point. Vectors are often written as u = (u1, u2, u3), v = (v1, v2, v3), a = (a1, a2, a3), b = (b1, b2, b3) etc.

We now have three ways of expressing the position of a point A in space. If A has coordinates a = (a1, a2, a3), then

A = (a1, a2, a3) = a = OA (1.3)

Although equation (1.3) identifies a point with its position, geometrically speaking we like to distinguish the point A itself from its position vector OA. When we say “a is the position vector of point A” it is understood that the tail of a is at the origin and its head is at the point A.

[Figure 1.3: The displacement vector AB drawn as an arrow from A = (a1, a2, a3) to B = (b1, b2, b3), with the coordinates marked on the x-, y- and z-axes.]

Our convention is that we use capital letters A,B, . . . to denote points in space.

Given the point C, there is a unique point D in space such that CD = AB (see Exercise 2, No.4). There is also a unique point D such that OD = AB and we may write D = AB. In that case the tail of D can only be O. The notation OD brings out the vectorial nature of the displacement from O to D.

1.1.4 More systematic notational convention

Instead of speaking of the x-, y- and z-axis, we also refer to these as the x1-, x2- and x3-axis respectively. Accordingly, given vectors a, b, c, ... it will be convenient to assume (unless otherwise stated) that a = (a1, a2, a3), b = (b1, b2, b3), c = (c1, c2, c3), .... Also by convention, points A, P, Q, ... have position vectors a, p, q, ..., unless otherwise specified.

1.1.5 Addition of vectors

Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors. Their sum u + v is defined by the equation

u + v = (u1 + v1, u2 + v2, u3 + v3)

The vector −v is defined as −v = (−v1,−v2,−v3) and the difference

u− v = u + (−v) = (u1 − v1, u2 − v2, u3 − v3)

The zero vector is 0 = (0, 0, 0).


1.1.6 Basic properties of addition of vectors

For all vectors u, v and w

1. u + (v + w) = (u + v) + w (associative law)

2. u + v = v + u (commutative law)

3. 0 + u = u (the vector 0 = (0, 0, 0) behaves like the number 0)

4. u + (−u) = 0

An important result for addition is the following:

1.1.7 Geometrical interpretation: The triangle law for addition of vectors

AB + BC = AC

To see this, let the position vectors of A, B and C be a, b and c respectively. Then AB = b − a and BC = c − b. Hence by the properties of addition,

AB + BC = (b − a) + (c − b) = (b − b) + (c − a) = 0 + c − a = c − a = AC

Figure 1.4 illustrates this geometrical result in which it is understood that the head B of AB is the tail of BC etc.

The point D is chosen so that AD = BC. It follows that DC = AB (why?). We have a geometric interpretation of the above commutative law (known as the parallelogram law for addition of vectors):

BC + AB = AD + DC = AC = AB + BC

[Figure 1.4: The parallelogram ABCD illustrating the triangle and parallelogram laws: AB = DC, AD = BC and AC = AB + BC.]

1.1.8 Multiplication of vectors by scalars. Direction of vectors. Parallel vectors

In contrast to a vector, a scalar is just a real number, i.e. an element of the real number system ℝ. We use α, β, a, b, r, s, t, ... to denote scalars. Note that an underlined f denotes a vector, while a plain f is a scalar.

Let u = (u1, u2, u3) be a vector and α ∈ ℝ be a scalar. The product αu is defined by

αu = (αu1, αu2, αu3)

The vector αu is said to be a (scalar) multiple of u. We usually omit the qualification ‘scalar’.

1.1.9 Properties of the product αu

For all scalars α, β and vectors u and v,

1. α (u + v) = αu + αv

2. (α + β) u = αu + βu

3. α (βu) = (αβ)u

4. 1u = u

5. αu = 0 if, and only if, α = 0 or u = 0

6. |αu| = |α| |u| (the length of αu is |α| times the length of u)

The proofs of the first five are left to you. We prove the last:

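\[
\begin{aligned}
|\alpha u| &= |(\alpha u_1, \alpha u_2, \alpha u_3)| \\
&= \sqrt{(\alpha u_1)^2 + (\alpha u_2)^2 + (\alpha u_3)^2} \\
&= \sqrt{\alpha^2 u_1^2 + \alpha^2 u_2^2 + \alpha^2 u_3^2} \\
&= \sqrt{\alpha^2 \left(u_1^2 + u_2^2 + u_3^2\right)} \\
&= \sqrt{\alpha^2}\,\sqrt{u_1^2 + u_2^2 + u_3^2} \\
&= |\alpha| \sqrt{u_1^2 + u_2^2 + u_3^2} \\
&= |\alpha|\,|u|
\end{aligned}
\]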
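A quick numerical spot check of property 6 may help fix the idea. This is only an illustrative Python sketch (the names `scale` and `length` are assumptions made here), and a single example is of course no substitute for the proof above.

```python
import math

def scale(alpha, u):
    """Scalar multiple alpha*u = (alpha*u1, alpha*u2, alpha*u3)."""
    return tuple(alpha * ui for ui in u)

def length(u):
    """Length |u| of the vector u."""
    return math.sqrt(sum(ui * ui for ui in u))

u = (1.0, -2.0, 2.0)   # |u| = sqrt(1 + 4 + 4) = 3
alpha = -4.0

lhs = length(scale(alpha, u))   # |alpha * u|
rhs = abs(alpha) * length(u)    # |alpha| * |u|
print(lhs, rhs)
```

Note that the negative sign of α disappears on both sides: the length is multiplied by |α|, not by α.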

1.1.10 Geometric interpretation of αu. Parallel vectors

Since |αu| = |α| |u|, multiplication of u by the scalar α multiplies the length of u by |α|. Let u be non-zero, so that it has a definite direction. If α > 0 the direction of αu is the same as that of u, while if α < 0 the direction of αu is opposite to that of u.

Let u and v be non-zero vectors. We say they have the same direction if u = αv for some scalar α > 0 and opposite directions if u = αv for some α < 0. The vector u is parallel to v if u is a multiple of v, that is, u = αv for some scalar α. Necessarily α ≠ 0 and we write u ‖ v to express this relation. Notice that we only speak of vectors being parallel if they are non-zero.

1.1.11 Properties of parallelism

For all non-zero u, v and w,

1. u ‖ u

2. u ‖ v implies v ‖ u

3. u ‖ v and v ‖ w together imply u ‖ w.

These properties are geometrically evident, while analytic proofs are left as an exercise. For example, to see property (2) analytically, let u = αv. As α ≠ 0, v = (1/α)u and v ‖ u, as expected.

Example 1.1.1 Let u = AB, where A is the point (−1, 7, −4) and B is the point (−4, 11, 1), so that u = (−3, 4, 5). Let v = CD, where C = (2, 9, −11) and D = (−7, 21, 4), so CD = (−9, 12, 15) = 3u. Hence u and v have the same direction while u and −v have opposite directions but are parallel.
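The test for parallelism used in this example can be sketched in Python as follows. The function name `multiple_factor` is hypothetical, integer components are assumed so that exact fractions can be used, and both vectors are assumed non-zero as the definition of parallelism requires.

```python
from fractions import Fraction

def multiple_factor(u, v):
    """Return alpha with v = alpha*u if such a scalar exists, else None.

    Assumes u is non-zero and that all components are integers.
    """
    # Use the first non-zero component of u to fix the only possible alpha.
    i = next(k for k, uk in enumerate(u) if uk != 0)
    alpha = Fraction(v[i], u[i])
    # alpha works only if it reproduces every component of v.
    if all(vk == alpha * uk for uk, vk in zip(u, v)):
        return alpha
    return None

u = (-3, 4, 5)
v = (-9, 12, 15)
minus_v = tuple(-x for x in v)

alpha = multiple_factor(u, v)        # positive: same direction
beta = multiple_factor(u, minus_v)   # negative: opposite directions
print(alpha, beta)
```

A positive factor means the two vectors have the same direction and a negative factor means opposite directions; in both cases they are parallel.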

1.1.12 The dot (scalar) product of two vectors

Let u = (u1, u2, u3) and v = (v1, v2, v3) be any two vectors. Their dot product u · v is defined as

u · v = u1v1 + u2v2 + u3v3 (1.4)

Note that the dot product of two vectors is a scalar and not a vector (that is why it is also called the scalar product).

1.1.13 Properties of the dot product

For all scalars α, β and vectors u, v and w,

1. u · v = v · u

2. α (u · v) = (αu) · v = u · (αv)

3. (αu + βv) · w = (αu · w) + (βv · w)

4. u · u = |u|², so |u| = +√(u · u)

5. u · u ≥ 0, and u · u = 0 if, and only if, u = 0.

The proofs of these properties are left as exercises (Exercise 2 No.8).

1.1.14 Geometric interpretation of the dot product

Let u = AB and v = AC be non-zero vectors. Suppose that AB and AC make an angle θ between them at the vertex A, where 0 ≤ θ ≤ 180 in degrees, or 0 ≤ θ ≤ π in radians (recall that 180 degrees = π radians). Then

u · v = |u| |v| cos θ (1.5)

To see this result, see Figure 1.5 and use AB + BC = AC, so that BC = AC − AB = v − u and

\[ |BC|^2 = (v - u)\cdot(v - u) = v\cdot v + (-u)\cdot(-u) - v\cdot u - u\cdot v = |v|^2 + |u|^2 - 2\,u\cdot v \]

Then by the cosine rule (which applies whether θ is acute or obtuse),

\[ |BC|^2 = |AB|^2 + |AC|^2 - 2\,|AB|\,|AC|\cos\theta \]

\[ |v|^2 + |u|^2 - 2\,u\cdot v = |u|^2 + |v|^2 - 2\,|u|\,|v|\cos\theta \quad \text{(from the previous equation)} \]

Cancelling |v|² + |u|² from both sides leads to the required result (1.5).


[Figure 1.5: The vectors u = AB and v = AC with the angle θ between them at the vertex A, shown for θ acute and for θ obtuse.]

Example 1.1.2 Let A = (1, 0, 1), B = (2, 0, 3), $C = (4, \sqrt{10}, 2)$ and D = (2, 2, −2). Find the angle θ between AB and AC. Decide if the angle φ between AB and AD is acute or obtuse.

Solution: AB = (2, 0, 3) − (1, 0, 1) = (1, 0, 2) and $AC = (4, \sqrt{10}, 2) - (1, 0, 1) = (3, \sqrt{10}, 1)$. Hence

\[ \cos\theta = \frac{AB\cdot AC}{|AB|\,|AC|} = \frac{(1, 0, 2)\cdot(3, \sqrt{10}, 1)}{\sqrt{1^2 + 0^2 + 2^2}\,\sqrt{3^2 + (\sqrt{10})^2 + 1^2}} = \frac{1}{2} \]

and θ = 60 degrees or π/3 radians.

Similarly, AD = (1, 2, −3) and

\[ \cos\phi = \frac{AB\cdot AD}{|AB|\,|AD|} = \frac{(1, 0, 2)\cdot(1, 2, -3)}{\sqrt{1^2 + 0^2 + 2^2}\,\sqrt{1^2 + 2^2 + (-3)^2}} = -\sqrt{\frac{5}{14}} \]

Since the dot product is negative the angle φ is obtuse.
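The same calculation can be carried out numerically. The sketch below is an illustration only; the helper names `sub`, `dot`, `length` and `cos_angle` are assumptions made here and do not come from the notes.

```python
import math

def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(pi - qi for pi, qi in zip(p, q))

def dot(u, v):
    """Dot product u . v (equation 1.4)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def length(u):
    return math.sqrt(dot(u, u))

def cos_angle(u, v):
    """cos(theta) from equation (1.5); u and v must be non-zero."""
    return dot(u, v) / (length(u) * length(v))

A = (1, 0, 1)
B = (2, 0, 3)
C = (4, math.sqrt(10), 2)
D = (2, 2, -2)

AB, AC, AD = sub(B, A), sub(C, A), sub(D, A)

theta = math.degrees(math.acos(cos_angle(AB, AC)))  # angle in degrees
cos_phi = cos_angle(AB, AD)                          # sign decides acute/obtuse
print(round(theta, 6), cos_phi < 0)
```

Only the sign of the dot product is needed to decide whether an angle is acute or obtuse, so the square roots can be avoided entirely for that part of the question.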

1.1.15 Some further properties of the dot product

1. The non-zero vectors AB and AC are at right angles (are perpendicular, AB ⊥ AC) if, and only if, (AB) · (AC) = 0. Such vectors are also called mutually orthogonal.

2. |u · v| ≤ |u| |v| for any vectors u and v (Cauchy-Schwarz inequality).

3. |u + v| ≤ |u| + |v| for any vectors u and v (Cauchy’s inequality).

The first result follows from the fact that AB ⊥ AC if, and only if, cos θ = 0. The second result is left as an exercise (Exercise 2 No.8). To see the third result, use (2) and consider

\[
\begin{aligned}
|u + v|^2 &= (u + v)\cdot(u + v) \\
&= u\cdot u + v\cdot v + 2\,u\cdot v \\
&\le u\cdot u + v\cdot v + 2\,|u|\,|v| \\
&= |u|^2 + |v|^2 + 2\,|u|\,|v| \\
&= (|u| + |v|)^2
\end{aligned}
\]

As |u + v|² ≤ (|u| + |v|)², it follows that |u + v| ≤ |u| + |v|.
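Inequalities (2) and (3) can be spot-checked on random vectors. The Python sketch below is only a sanity check under assumed helper names (`dot`, `length`, `add`) and an arbitrary small tolerance for rounding error; it does not replace the proofs.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def length(u):
    return math.sqrt(dot(u, u))

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

random.seed(0)   # fixed seed so the run is repeatable
tol = 1e-9       # allowance for floating-point rounding
ok = True
for _ in range(1000):
    u = tuple(random.uniform(-10, 10) for _ in range(3))
    v = tuple(random.uniform(-10, 10) for _ in range(3))
    # Inequality (2): |u . v| <= |u| |v|
    holds_2 = abs(dot(u, v)) <= length(u) * length(v) + tol
    # Inequality (3): |u + v| <= |u| + |v|
    holds_3 = length(add(u, v)) <= length(u) + length(v) + tol
    ok = ok and holds_2 and holds_3
print(ok)
```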

Example 1.1.3 Find a non-zero vector perpendicular to both u = (2, 1,−1) and v = (3, 3,−1).

Solution:

Let x = (x1, x2, x3) satisfy x ⊥ u and x ⊥ v. Then

\[
\begin{aligned}
x\cdot u &= 2x_1 + 1x_2 + (-1)x_3 = 0 \\
x\cdot v &= 3x_1 + 3x_2 + (-1)x_3 = 0
\end{aligned}
\]

By subtraction −x1 − 2x2 = 0, or x1 = −2x2. Thus from the first equation, x3 = 2x1 + x2 = −3x2. We may let x2 be any non-zero number, e.g. x2 = 1. Then

x = (−2, 1, −3)

is perpendicular to u and to v.

We will see later why it is always possible to find a vector x ≠ 0 perpendicular to two non-zero vectors. This fact will also come into our study of planes (see 1.3.4).
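The solution of Example 1.1.3 can be verified by evaluating the two dot products for several choices of x2. This is a small Python sketch with assumed helper names (`dot`, `perpendicular`); it checks the result rather than re-deriving it.

```python
def dot(u, v):
    """Dot product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))

u = (2, 1, -1)
v = (3, 3, -1)

def perpendicular(x2):
    """Vector with x1 = -2*x2 and x3 = -3*x2, as derived in Example 1.1.3."""
    return (-2 * x2, x2, -3 * x2)

for x2 in (1, -2, 5):
    x = perpendicular(x2)
    # Both dot products should vanish for every choice of x2.
    print(x, dot(x, u), dot(x, v))
```

Every non-zero choice of x2 gives a valid answer, which reflects the fact that any non-zero multiple of (−2, 1, −3) is also perpendicular to u and v.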

1.1.16 Unit vectors. Components. Projections

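A vector u of length 1 is called a unit vector. For example, $u = \left(\tfrac{3}{5}, 0, -\tfrac{4}{5}\right)$ is a unit vector. A vector c ≠ 0 defines the vector

\[ \hat{c} = \frac{1}{|c|}\,c \]

which is the unique unit vector having the same direction as c. To see that $\hat{c}$ has length 1 we note

\[ |\hat{c}|^2 = \hat{c}\cdot\hat{c} = \left(\frac{1}{|c|}c\right)\cdot\left(\frac{1}{|c|}c\right) = \frac{1}{|c|^2}\,c\cdot c = \frac{|c|^2}{|c|^2} = 1 \]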

The component of the vector f in the direction of (or along) the non-zero vector c is defined as $f\cdot\hat{c}$. Suppose that f ≠ 0 and that the angle between f and c is θ. Then

\[ f\cdot\hat{c} = |f|\cos\theta \tag{1.6} \]

To see equation (1.6), use equation (1.5):

\[ f\cdot\hat{c} = f\cdot\frac{1}{|c|}c = \frac{f\cdot c}{|c|} = \frac{|f|\,|c|\cos\theta}{|c|} = |f|\cos\theta \]

(See Figure 1.6.) Notice that if $\tfrac{\pi}{2} < \theta \le \pi$ then the component (1.6) is negative (as is only natural).

The projection of f on c is the vector

\[ (f\cdot\hat{c})\,\hat{c} = \left(\frac{f\cdot c}{|c|^2}\right) c \tag{1.7} \]

Figure 1.7 shows the geometric interpretations of the projection of the non-zero vector f = PQ on c and on −c. The projection PR is the same in both cases: PR is the projection (in the natural sense) of the vector f on the line parallel to c passing through the tail P of f.

More generally, if d ‖ c, the projection of f on c is the same as the projection of f on d. (See Exercise 2, No.13a.)


[Figure 1.6: The vector f making an angle θ with PS = c, with the lengths |f| cos θ and |f| sin θ marked (f ≠ 0).]

[Figure 1.7: The projection PR of f = PQ on PS = c and on PT = −c; the angle between f and −c is π − θ.]

The situation with components is different. As noted, in Figure 1.7 the component of f along −c is −|f| cos θ. In general, if d and c are parallel with the same direction then the components of f along c and d are equal. If d and c have opposite directions then the component of f along d is the negative of its component along c.

We return to projections in subsections 1.2.4 and 1.2.5.

Example 1.1.4 1. If we imagine that c is pointing due East and f is a displacement, then $(f\cdot\hat{c})\,\hat{c}$ is the Easterly displacement, while |f| cos θ is the Easterly component and |f| sin θ is the Northerly component of f.

You may think of f as a vector representing a force that is dragging a box placed at its tail. Then |f| sin θ is the lifting effect of f while |f| cos θ is the horizontal dragging effect of the force.

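2. Let f = (11, −3, 4) and c = (5, −1, −2). The length of c is $|c| = \sqrt{5^2 + (-1)^2 + (-2)^2} = \sqrt{30}$.

The unit vector $\hat{c}$ in the direction of c is

\[ \hat{c} = \frac{1}{\sqrt{30}}\,(5, -1, -2) \]

The component of f along c is

\[ f\cdot\hat{c} = (11, -3, 4)\cdot\frac{1}{\sqrt{30}}\,(5, -1, -2) = \frac{5}{3}\sqrt{30} \]

and the projection of f on c can be found from equation (1.7):

\[ (f\cdot\hat{c})\,\hat{c} = \frac{5}{3}\sqrt{30}\left(\frac{1}{\sqrt{30}}\,(5, -1, -2)\right) = \frac{5}{3}\,(5, -1, -2) \]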
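Components and projections are straightforward to compute. The following Python sketch is illustrative only: the function names `component` and `projection` are assumptions, and the projection uses exact fractions, which presumes integer components as in this example.

```python
import math
from fractions import Fraction

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def component(f, c):
    """Component of f along the non-zero vector c: (f . c) / |c|."""
    return dot(f, c) / math.sqrt(dot(c, c))

def projection(f, c):
    """Projection of f on c, equation (1.7): ((f . c) / |c|^2) c.

    Exact arithmetic; assumes integer components.
    """
    factor = Fraction(dot(f, c), dot(c, c))
    return tuple(factor * ci for ci in c)

f = (11, -3, 4)
c = (5, -1, -2)

comp = component(f, c)   # approximately (5/3) * sqrt(30)
proj = projection(f, c)  # exactly (5/3) * (5, -1, -2)
minus_c = tuple(-ci for ci in c)
print(comp, proj, projection(f, minus_c) == proj)
```

The last comparison illustrates the remark made earlier: replacing c by −c changes the sign of the component but leaves the projection unchanged.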

1.1.17 The special unit vectors i, j, k

The vectors i, j and k are defined as i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1). They have unit length and are directed along the x-, y- and z-axis respectively. It is clear that they are also mutually orthogonal and that any vector a = (a1, a2, a3) has a unique expression

a = a1i + a2j + a3k (1.8)

The scalars a1, a2 and a3 are the components of a in the directions i, j and k respectively. See also No.17 of Exercise 2 below.

1.1.18 Linear combinations

Let a and b be two vectors and λ and µ two scalars. The vector x = λa + µb is called a linear combination of a and b with coefficients λ and µ. Similarly, for scalars r, s and t, the vector ra + sb + tc is a linear combination of a, b, and c with coefficients r, s and t.

Some concrete examples.

1. Equation (1.8) shows that every vector is a linear combination of i, j and k.

2. Let a = (1,−1, 2) and b = (−3, 2, 1).

4a + 5b = 4 (1,−1, 2) + 5 (−3, 2, 1) = (−11, 6, 13)

is a linear combination of a and b with coefficients 4 and 5.

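3. If a = (−1, 2, 3), b = (5, −7, 1), c = (−4, 8, −1) then

\[ \frac{1}{2}a + \frac{2}{3}b + \left(-\frac{5}{6}\right)c = \left(\frac{37}{6}, -\frac{31}{3}, 3\right) \]

is a linear combination of a, b, and c with coefficients $\tfrac{1}{2}$, $\tfrac{2}{3}$ and $-\tfrac{5}{6}$.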
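The two numerical examples above can be reproduced with a few lines of Python. This is a sketch under the assumption of a helper named `linear_combination` (not from the notes), using exact fractions so that the results match the hand calculation.

```python
from fractions import Fraction

def linear_combination(coeffs, vectors):
    """Return sum of coeffs[k] * vectors[k] for vectors with three components."""
    return tuple(
        sum(k * v[i] for k, v in zip(coeffs, vectors))
        for i in range(3)
    )

# Example 2: 4a + 5b
x = linear_combination((4, 5), ((1, -1, 2), (-3, 2, 1)))

# Example 3: (1/2)a + (2/3)b + (-5/6)c
a, b, c = (-1, 2, 3), (5, -7, 1), (-4, 8, -1)
y = linear_combination(
    (Fraction(1, 2), Fraction(2, 3), Fraction(-5, 6)),
    (a, b, c),
)
print(x, y)
```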

Exercise 2 1. In Figure 1.2 express ON, MP, OM, LP, NR, NM and OP as vectors involving a, b and c. Write down these vectors as linear combinations of i, j and k and find their lengths in terms of a, b and c.

2. Interpret the equation (r − (2,−1, 5)) · (r − (2,−1, 5)) = 49 geometrically.

3. Let u = (1, 3,−2), v = (2, 0, 1) and w = (3,−3, 4).

(a) Find the linear combinations u + 3v − 2w, (−5)u + 2v + 3w and $\tfrac{3}{4}u - \tfrac{1}{2}v + \tfrac{1}{4}w$.

(b) If the tail of u is the point A = (−2, −11, 13), what is the head of u? If the head of w is B = (2, −9, 10), what is the tail of w?

(c) Find coefficients α, β such that w = αu + βv.


(d) Is 3 (3, −1, 3) a linear combination of u, v and w? This means you must look for scalars p, q, r such that pu + qv + rw = 3 (3, −1, 3), and this involves three equations in three unknowns.

4. (Generalization of No.3b). Let u be a given vector. Show that if the tail A of u is given then the head B of u is uniquely determined. State and prove a similar result if the head of u is given. Draw a picture.

5. See Figure 1.8. Imagine that in each case the axes are part of a room in which you are standing and looking at the corner indicated. Decide which of the systems of mutually orthogonal axes are left-handed and which right-handed. If you are outside the room, what would your answer be?

[Figure 1.8: Four systems of mutually orthogonal axes, each labelled x, y and z in a different arrangement.]

6. Let $A_0, \ldots, A_9$ be ten points in space and put $a_m = A_{m-1}A_m$ for m = 1, ..., 9 and $a_{10} = A_9A_0$. Find the sum $a_1 + \cdots + a_{10}$.

7. Complete the proofs in statements 1.1.6, 1.1.9 and 1.1.11 above.

Hint 3 These properties reflect similar properties of numbers. For example, the commutative law of addition of vectors is proved by

\[
\begin{aligned}
u + v &= (u_1 + v_1, u_2 + v_2, u_3 + v_3) \\
&= (v_1 + u_1, v_2 + u_2, v_3 + u_3) \\
&= v + u
\end{aligned}
\]

The proof of the first item of 1.1.9 is

\[
\begin{aligned}
\alpha(u + v) &= \alpha(u_1 + v_1, u_2 + v_2, u_3 + v_3) \\
&= (\alpha(u_1 + v_1), \alpha(u_2 + v_2), \alpha(u_3 + v_3)) \\
&= (\alpha u_1 + \alpha v_1, \alpha u_2 + \alpha v_2, \alpha u_3 + \alpha v_3) \\
&= \alpha u + \alpha v
\end{aligned}
\]

8. Complete the proofs of subsections 1.1.13 and 1.1.15.

Hint 4 For the Cauchy-Schwarz inequality it may be assumed that the vectors are non-zero. For a geometric proof use equation (1.5). For a purely algebraic proof see the next question.


9. (a) Show that a dot product (αr + βs) · (γu + δv) of linear combinations multiplies out just like in ordinary arithmetic.

(b) Show that |αu + βv|² = α² |u|² + β² |v|² + 2αβ u · v.

(c) By putting α = |v| and β = −|u| in 9b prove the Cauchy-Schwarz inequality.

10. Show that if the vectors u and v in equation (1.5) are unit vectors, then cos θ = u · v.

11. Show that the non-zero vectors u and v are parallel if, and only if, |u · v| = |u| |v|.

12. Let a = (3, −1, 2) and b = (−7, 3, −1). Find

(a) The unit vectors $\hat{a}$ and $\hat{b}$.

(b) The component of a along b and the component of b along a.

(c) The projection of a on b and the projection of b on a.

13. (a) If γ ≠ 0, show that the projections of f on c and f on γc are equal.

(b) Show that if f ≠ 0 then the projection of f on itself is f.

(c) Let g be the projection of f on c ≠ 0. Show that the component of f along c is $g\cdot\hat{c}$.

14. A man walks from a point A in the N-E direction for 15 km to a point B. Find the component of the displacement AB in the following directions

(a) East

(b) South

(c) N30°E

15. A wind is blowing at 10 km/h in the direction N30°W. What is its component in the direction S15°W?

16. Let u and v be two vectors. The set of all linear combinations of u and v is called the span of the vectors and is denoted by sp (u, v). (This is discussed more fully in item 2 of subsection 3.1.2 of Chapter 3).

(a) Show that if u = (−1, 2, 3) and v = (2, 4, −5), then sp (u, v) consists of all vectors of the form (−α + 2β, 2α + 4β, 3α − 5β), where α and β are arbitrary scalars.

(b) How do you think the span of three vectors u, v, w should be defined? Describe sp ((−2, 3, 1), (5, −4, 2), (−1, 1, −3)).

(c) * Let u, v, a and b be four vectors. Show that

sp (u, v) ⊆ sp (a, b)

if, and only if, each of the vectors u and v is a linear combination of the vectors a and b.

17. * Let l, m and n be three mutually orthogonal unit vectors. Placing the tails of these vectors at the origin, it is visually clear that we may use these vectors as a frame of reference in place of i, j and k. (An analytic proof of this will follow from work done in Chapter 3. See Exercise 78, No.9). Assuming this result, any vector a can be expressed as a linear combination

a = γ1l + γ2m + γ3n

Show that necessarily the coefficients γ1, γ2, γ3 are the components (in the above sense) of a in the directions l, m and n respectively. In fact, γ1l, γ2m, γ3n are the projections of the vector a on l, m and n respectively. Conclude that

\[ |a|^2 = \gamma_1^2 + \gamma_2^2 + \gamma_3^2 \]

Show that $l = \tfrac{1}{\sqrt{6}}(1, -1, 2)$, $m = \tfrac{1}{\sqrt{3}}(-1, 1, 1)$ and $n = \tfrac{1}{\sqrt{2}}(1, 1, 0)$ is such a system of three mutually orthogonal unit vectors and find the coefficients γ1, γ2, γ3, if a = (−3, 2, 1).

Hint 5 Take the dot product of both sides of the above equation with each of l, m and n. The coefficients for the specific example are the components a · l = γ1 etc. A longer procedure to find the coefficients for the specific example is to use the method of 3d above.

18. What familiar result from Euclidean Geometry does |b− a| ≤ |a|+ |b| represent?

19. Let u and v have the same length: |u| = |v|. Show that (u − v) · (u + v) = 0. In other words, if u − v and u + v are not zero they are at right angles.

20. How are our general results affected (and simplified) when one of the coordinates is restricted to be zero for all vectors under consideration?

1.2 The straight line

Two distinct points determine a line. Points that lie on one and the same line are called collinear.

Let P, Q and R be three distinct points. Then it is geometrically obvious that if R lies on the (infinite) line through P and Q then PQ is parallel to PR. We will see that our definition of a straight line conforms to this criterion. See Exercise 6, No.2.

1.2.1 Generic (parametric) representations of a line ℓ

Let A and B be two distinct points. Then as t ∈ ℝ varies, the point R with position vector

r = r (t) = OA + tAB (1.9)

varies over the entire straight line ℓ through A and B.

As in Figure 1.9, from O jump onto the line at A and then go parallel to AB to R. The point R = A is obviously on ℓ and corresponds with t = 0. If R ≠ A we have AR = tAB for some t ≠ 0, so in general

r = OR = OA + AR = OA + tAB

When t = 1 then we are at R = B.

[Figure 1.9: The line ℓ through A and B, with the position vector r = OR = OA + tAB of a general point R on ℓ.]

As r = r (t) represents a general point on the line ℓ, the equation (1.9) is called a generic or parametric equation of the line through A and B. The variable t is called the parameter.

1.2.2 Some general observations

In what follows a = OA, b = OB, etc.

1. As t varies over the set ℝ of real numbers the point r = OR = a + t (b − a) of equation (1.9) varies continuously over the line ℓ. In this representation of the line ℓ we consider b − a = AB as the direction of ℓ. In Figure 1.9 as t increases R moves to the right, as t decreases R moves to the left.

2. As remarked, t = 0 corresponds to R = A and t = 1 to R = B. If t is between 0 and 1, say $t = \tfrac{1}{2}$, we get

\[ OR = a + \frac{1}{2}(b - a) = \frac{1}{2}(a + b) \]

Thus $\tfrac{1}{2}(a + b)$ is the position vector of the point midway between A and B.

3. More generally, a point

r = r(t) = a + t (b− a) = (1− t) a + tb

where 0 ≤ t ≤ 1 lies between A and B and the set AB of these points is called the line segment joining A and B. If A ≠ B and 0 < t < 1, the point r(t) is said to lie strictly between A and B. The line segment AB is a set of points and it must be distinguished from the vector AB and also from the distance |AB|.

4. Let u ≠ 0 define a direction and suppose a is the position vector of the point A. Then a generic equation of the line ℓ through A with direction u is

r (t) = a + tu (1.10)

A typical point a + tu on the line is called a generic point. As t varies over the real numbers in the generic equation (1.10) we obtain a set of points which we identify with ℓ and (1.10) is its analytic definition.

5. A generic equation is not unique.

(a) If C ≠ D are two points on the line given by equation (1.9), then since these points also determine the line, another generic equation (now with parameter s) is

r (s) = OC + sCD

The parameters t and s are of course related. See Example 1.2.1 of subsection 1.2.3 and No.1b of Exercise 6.

(b) When do two generic equations like (1.10) define the same line? Let u and v be non-zero vectors and suppose that a + tu is a generic equation of line ℓ1 and b + sv a generic equation of line ℓ2. Then ℓ1 = ℓ2 if, and only if, u ‖ v and a − b is a multiple of u. (See No.4 of Exercise 6.)


1.2.3 Some illustrative Examples

Example 1.2.1 Find two generic equations for the line ℓ passing through A = (1, 2, 3) and B = (−1, 1, 4).

Solution: One generic equation is

r = (1, 2, 3) + t ((−1, 1, 4) − (1, 2, 3)) = (1, 2, 3) + t (−2, −1, 1) = (1 − 2t, 2 − t, 3 + t)

Another is

r = (−1, 1, 4) + s ((1, 2, 3) − (−1, 1, 4)) = (−1, 1, 4) + s (2, 1, −1) = (−1 + 2s, 1 + s, 4 − s)

In the second representation we have used r = OB + sBA, with parameter s. The relationship between s and t is s = 1 − t, as can be seen by equating the two expressions for r componentwise.
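The relation s = 1 − t between the two parametrizations can be confirmed for several parameter values. The Python sketch below assumes a helper named `point_on_line` (introduced here for illustration) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def point_on_line(p, q, t):
    """Point with position vector OP + t*PQ, as in equation (1.9)."""
    return tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))

A = (1, 2, 3)
B = (-1, 1, 4)

# r = OA + t AB and r = OB + s BA describe the same point when s = 1 - t.
params = (Fraction(-1, 3), 0, Fraction(1, 2), 1, 2)
agree = all(
    point_on_line(A, B, t) == point_on_line(B, A, 1 - t)
    for t in params
)
midpoint = point_on_line(A, B, Fraction(1, 2))
print(agree, midpoint)
```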

Example 1.2.2 Find the position vector of the point R lying on the line ℓ of Example 1.2.1 that is a distance $\tfrac{1}{3}|AB|$ from A but on the side of A opposite to that of B.

Solution: To find R put $t = -\tfrac{1}{3}$ in OR = OA + tAB to get

\[ OR = OA + \left(-\frac{1}{3}\right)AB = (1, 2, 3) + \left(-\frac{1}{3}\right)(-2, -1, 1) = \left(\frac{5}{3}, \frac{7}{3}, \frac{8}{3}\right) \]

Example 1.2.3 Find the foot Q of the perpendicular from the point P = (2, −1, 4) to the line ℓ of Example 1.2.1. Hence find the shortest distance between P and ℓ.

Solution:

Let Q = (1 − 2t, 2 − t, 3 + t) be the required foot of the perpendicular from P = (2, −1, 4) to the line ℓ. Then PQ = (−1 − 2t, 3 − t, −1 + t) must be perpendicular to the direction (−2, −1, 1) of ℓ:

(−1 − 2t, 3 − t, −1 + t) · (−2, −1, 1) = 0

Expanding the dot product gives 6t − 2 = 0, so t = 1/3 and Q = q = (1/3, 5/3, 10/3). The shortest distance of P from ℓ is thus

\[ |PQ| = \left|\left(-\tfrac{5}{3}, \tfrac{8}{3}, -\tfrac{2}{3}\right)\right| = \tfrac{1}{3}\sqrt{93} \]

(See Figure 1.10).

1.2.4 A general result for the foot of a perpendicular to a line

There is a better way to visualize the position vector q of the foot Q of the perpendicular from P to the line ℓ with generic equation r = a + tu. From Figure 1.11 we see that AQ = q − a is the projection of AP = p − a on the vector u. Using equation (1.7) with f = p − a and c = u we obtain

\[ q = a + \left(\frac{1}{|u|^2}\,(p - a)\cdot u\right) u \tag{1.11} \]

Note that (as in Example 1.2.3) the shortest distance from P to ℓ is |PQ|.


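[Figure 1.10: the point P = (2, −1, 4), the line r = (1 − 2t, 2 − t, 3 + t) and the foot Q of the perpendicular from P.]

[Figure 1.11: the line ℓ through A with direction u; AQ = q − a is the projection of AP = p − a on u.]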

Example 1.2.4 Let us apply the formula (1.11) to Example 1.2.3 above.

Solution:

For the line (1, 2, 3) + t (−2, −1, 1) we have |u|² = 6 and the foot of the perpendicular from P = (2, −1, 4) to the line has position vector

\[ q = (1, 2, 3) + \left(\tfrac{1}{6}\,\big((2, -1, 4) - (1, 2, 3)\big)\cdot(-2, -1, 1)\right)(-2, -1, 1) = \left(\tfrac{1}{3}, \tfrac{5}{3}, \tfrac{10}{3}\right) \]

This is the same result as before.

In Exercise 6, No.9 you are asked to derive the general formula (1.11) using the first method used to solve this problem.
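Formula (1.11) translates directly into a few lines of code. The sketch below is an illustration only; the function names `dot` and `foot_on_line` are assumptions of ours, and exact fractions are used so that the result can be compared with Example 1.2.3.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def foot_on_line(p, a, u):
    """Foot q = a + ((p - a).u / |u|^2) u of the perpendicular from p to the line a + t u."""
    pa = tuple(pi - ai for pi, ai in zip(p, a))
    coeff = Fraction(dot(pa, u), dot(u, u))  # this is the parameter value t at the foot
    return tuple(ai + coeff * ui for ai, ui in zip(a, u))

p = (2, -1, 4)
q = foot_on_line(p, (1, 2, 3), (-2, -1, 1))

# Squared shortest distance |PQ|^2; it should equal 93/9.
pq = tuple(qi - pi for qi, pi in zip(q, p))
dist_sq = dot(pq, pq)
print(q, dist_sq)
```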

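Example 1.2.5 Find the foot Q of the perpendicular from a general point P = (p1, p2, p3) to the line r = a + tu if a = (1, 2, 3) and u = (−2, −1, 1).

Solution:

Using equation (1.11) - effectively equation (1.7) - we obtain

\[ q - a = \frac{\big((p_1, p_2, p_3) - (1, 2, 3)\big)\cdot(-2, -1, 1)}{|(-2, -1, 1)|^2}\,(-2, -1, 1) = \frac{-2p_1 - p_2 + p_3 + 1}{6}\,(-2, -1, 1) \]

\[ = \left(-\tfrac{1}{3} + \tfrac{2}{3}p_1 + \tfrac{1}{3}p_2 - \tfrac{1}{3}p_3,\; -\tfrac{1}{6} + \tfrac{1}{3}p_1 + \tfrac{1}{6}p_2 - \tfrac{1}{6}p_3,\; \tfrac{1}{6} - \tfrac{1}{3}p_1 - \tfrac{1}{6}p_2 + \tfrac{1}{6}p_3\right) \]

and so

\[ q = \left(\tfrac{2}{3} + \tfrac{2}{3}p_1 + \tfrac{1}{3}p_2 - \tfrac{1}{3}p_3,\; \tfrac{11}{6} + \tfrac{1}{3}p_1 + \tfrac{1}{6}p_2 - \tfrac{1}{6}p_3,\; \tfrac{19}{6} - \tfrac{1}{3}p_1 - \tfrac{1}{6}p_2 + \tfrac{1}{6}p_3\right) \]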

1.2.5 Projections on a line through the origin

When a line ℓ passes through the origin, the foot Q of the perpendicular from P to ℓ is called the projection of P on ℓ. The projection of p on the line tu is just the projection of p on the vector u and is the result of putting a = 0 in equation (1.11):

\[ q = \left(\frac{p\cdot u}{|u|^2}\right) u \tag{1.12} \]

Example 1.2.6 Find the projection of P = (p1, p2, p3) on the line t (−2, −1, 1).

Solution:

We can find the required projection by letting a = 0 in Example 1.2.5 or directly from (1.12):

\[ q = \frac{(p_1, p_2, p_3)\cdot(-2, -1, 1)}{|(-2, -1, 1)|^2}\,(-2, -1, 1) = \frac{-2p_1 - p_2 + p_3}{6}\,(-2, -1, 1) \]

\[ = \left(\tfrac{2}{3}p_1 + \tfrac{1}{3}p_2 - \tfrac{1}{3}p_3,\; \tfrac{1}{3}p_1 + \tfrac{1}{6}p_2 - \tfrac{1}{6}p_3,\; -\tfrac{1}{3}p_1 - \tfrac{1}{6}p_2 + \tfrac{1}{6}p_3\right) \tag{1.13} \]

We will return to this example in Chapter 3 (See Example 3.3.1).

Example 1.2.7 A straight line lies in the x-y plane and passes through the origin. The line makes an angle of 30° with the positive x-axis. Find the foot Q of the perpendicular from P = (p1, p2) to the line.

Solution:

Let u = (cos π/6, sin π/6) = (√3/2, 1/2). The parametric equation of the line is r = t (√3/2, 1/2) and the foot Q of the perpendicular from P to the line is the projection of p = (p1, p2) on the unit vector u:

\[ q = \left(\frac{\sqrt{3}\,p_1}{2} + \frac{p_2}{2}\right)\left(\frac{\sqrt{3}}{2}, \frac{1}{2}\right) = \left(\tfrac{1}{4}\left(3p_1 + \sqrt{3}\,p_2\right),\; \tfrac{1}{4}\sqrt{3}\,p_1 + \tfrac{1}{4}p_2\right) \tag{1.14} \]

We shall return to this example in subsection 1.4 below and again in Chapter 3 Example 3.2.1.


[Figure 1.12: A well-known result on parallelograms. Parallelogram ABCD with AB = DC and AD = BC; X is the point on the diagonal AC with AX = XC.]

Example 1.2.8 Show that the diagonals of a parallelogram bisect each other.

Solution:

In Figure 1.12 ABCD is a parallelogram with AB = DC and BC = AD. It is understood that the parallelogram is non-degenerate in that no three of A, B, C, D are collinear.

Let X be the midpoint of AC. Then AB + BX = AX = XC = XD + DC = XD + AB, so BX = XD.

1.2.6 Intersecting, parallel and skew lines

Let ℓ1 and ℓ2, given by generic equations a1 + su1 and a2 + tu2 respectively, be two straight lines. They are parallel if u1 ‖ u2. Exactly one of the following holds (see Exercise 6, No.5* for a rigorous proof):

1. The lines are identical.

2. They are distinct and parallel.

3. They are not parallel and meet in a (unique) point.

4. They are not parallel and do not intersect (they are by definition skew).

A familiar example of skew lines is provided by a road passing under a railway bridge. A train and car can only crash at a level crossing! In the case of skew lines there will be points P on ℓ1 and Q on ℓ2 such that PQ is perpendicular to the directions of both lines and the distance |PQ| is minimum. See Figure 1.13.

[Figure 1.13: skew lines ℓ1 and ℓ2 with P on ℓ1 and Q on ℓ2.]

Example 1.2.9 Let ℓ1 be the line through A = (0, 0, 0) and B = (1, 2, −1) and suppose ℓ2 is the line through C = (1, 1, 1) and D = (2, 4, 3). Are the lines skew? In any case, find the shortest distance between them.

Solution: Generic equations for ℓ1 and ℓ2 are t(1, 2, −1) and (1, 1, 1) + s(1, 3, 2) respectively. Since (1, 2, −1) ∦ (1, 3, 2), the lines are not parallel (and so cannot possibly be identical). They will meet if for some t and s

t(1, 2, −1) = (1, 1, 1) + s(1, 3, 2)

This means that the three equations t = 1 + s, 2t = 1 + 3s and −t = 1 + 2s must hold. From the first and last equation, 1 + s = −(1 + 2s) and thus s = −2/3, t = 1/3. But then 2t = 2/3 ≠ 1 + 3(−2/3) = −1 and the second equation fails. It follows that ℓ1 and ℓ2 are skew.

Let P = t(1, 2, −1) and Q = (1, 1, 1) + s(1, 3, 2) be the points where |PQ| is minimum. Then PQ = (1, 1, 1) + s(1, 3, 2) − t(1, 2, −1) is perpendicular to both directions (1, 2, −1) and (1, 3, 2). Hence:

((1, 1, 1) + s(1, 3, 2) − t(1, 2, −1)) · (1, 2, −1) = 2 + 5s − 6t = 0
((1, 1, 1) + s(1, 3, 2) − t(1, 2, −1)) · (1, 3, 2) = 6 + 14s − 5t = 0

Solving these two equations gives s = −26/59 and t = −2/59. Hence,

\[ PQ = (1, 1, 1) + s(1, 3, 2) - t(1, 2, -1) = \left(\tfrac{35}{59}, -\tfrac{15}{59}, \tfrac{5}{59}\right) \]

The shortest distance between the lines is

\[ \left|\left(\tfrac{35}{59}, -\tfrac{15}{59}, \tfrac{5}{59}\right)\right| = \frac{5}{\sqrt{59}} \]
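The two perpendicularity conditions of Example 1.2.9 form a pair of linear equations in s and t, which can be solved mechanically. The following sketch assumes lines written as a1 + t u1 and a2 + s u2 and solves the pair by Cramer's rule; the function name `closest_points` and the use of Cramer's rule are our own choices for illustration, not a method prescribed in these notes.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def closest_points(a1, u1, a2, u2):
    """Return (s, t, PQ) with P = a1 + t u1, Q = a2 + s u2 and PQ perpendicular to u1 and u2."""
    d = tuple(y - x for x, y in zip(a1, a2))           # d = a2 - a1
    e11, e12, e22 = dot(u1, u1), dot(u1, u2), dot(u2, u2)
    d1, d2 = dot(d, u1), dot(d, u2)
    det = e11 * e22 - e12 * e12                        # zero exactly when u1 and u2 are parallel
    if det == 0:
        raise ValueError("directions are parallel")
    # Solve  d1 + s*e12 - t*e11 = 0  and  d2 + s*e22 - t*e12 = 0  for s and t.
    s = Fraction(d1 * e12 - e11 * d2, det)
    t = Fraction(e22 * d1 - e12 * d2, det)
    pq = tuple(di + s * vi - t * ui for di, ui, vi in zip(d, u1, u2))
    return s, t, pq

s, t, pq = closest_points((0, 0, 0), (1, 2, -1), (1, 1, 1), (1, 3, 2))
dist_sq = dot(pq, pq)   # squared shortest distance, here 25/59
print(s, t, pq, dist_sq)
```

The squared distance 25/59 agrees with the value 5/√59 found in the example.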

Exercise 6 1. Let A = (1, 2, −3), B = (−1, 1, 2), C = (5, 4, −13), D = (−5, −1, 12) and E = (−1, −2, 3). Answer the following questions.

(a) Find a generic equation with parameter t of the straight line ℓ through A and B in the form of equation (1.9).

(b) Is E on ℓ? Show that C and D are on ℓ and find a corresponding generic equation of the line with parameter s. Find the relationship between s and t.

(c) Find two points on ℓ at a distance √24 from E.

(d) Find an equation describing the line segment joining A and B. In particular find the midpoint of the segment and also the three points strictly between A and B dividing the segment into four equal parts.

2. Let P, Q and R be three distinct points on the line of equation (1.10). Show that they are collinear, i.e., PQ ‖ PR.

Hint 7 Let P = a + t1u, Q = a + t2u and R = a + t3u.

3. Let the generic equation r(t) = a + tu represent a line ℓ, where u ≠ 0, as in the previous question. Suppose that P with position-vector p is any point. Show that P lies on ℓ

(a) if, and only if, p − a is a multiple of u.

(b) if, and only if, the equation a + tu = p has a unique solution for t.

(Compare this with No.12 below for another criterion).

4. * Let u and v be non-zero vectors and suppose that a + tu and b + sv are generic equations of lines ℓ1 and ℓ2 respectively. Show that necessary and sufficient conditions for the generic equations to represent the same set of points (i.e. ℓ1 = ℓ2) are

(i) u ‖ v and

(ii) a − b is a multiple of v.

Using this result show that two lines sharing two distinct points are identical.

5. * Use the results of the previous exercise to show that the four conditions in subsection 1.2.6 are in fact mutually exclusive and exhaustive.

6. Using equation (1.11), or otherwise, find the point Q of the line ℓ with parametric equation (1 − 2t, 2 − t, −3 + 5t) which is closest to

(a) the origin,

(b) the point P = (5, 8, 4),

(c) the point P = (−3, 2, 1).

7. Find the projections of the following points P on the line (−2t,−t, 5t).

(a) P = (4, 6, 7)

(b) P = (−1,−2, 3).

8. Find the projection of the point P with position vector p = (p1, p2, p3) on the line (−2t,−t, 5t).

9. * Derive the formula (1.11) that finds the foot Q of the perpendicular from P = (p1, p2, p3) onto the line a + tu using the technique of Example 1.2.3.

Hint 8 Show that the solution to (a + tu − p) · u = 0 is

\[ t = -\frac{(a - p)\cdot u}{|u|^2} \]

10. Show that the closest the line ℓ with parametric equation a + tu comes to the point P is

\[ d = \frac{1}{|u|}\sqrt{\left(|p - a|\,|u|\right)^2 - \left((p - a)\cdot u\right)^2} \tag{1.15} \]

11. Show that |r (t)|² in equation (1.10) is a quadratic in t. How does your knowledge of quadratics tie up with the formula (1.11) and equation (1.15)?

12. Deduce from equation (1.15) that ℓ passes through the point P if, and only if, (p − a) · u = ±|p − a||u|. Note that geometrically this is clear in view of the formula for the angle between p − a and u if p − a ≠ 0.

13. * If t is time in seconds and u is the displacement in meters per second (the constant velocity), then equation (1.10) can be interpreted as the position of (say) an aircraft in meters at time t.

(a) In a situation like this we would be interested in how close the aircraft comes to a certain point P for t ≥ 0. Find a condition for (1.15) to hold at some time t ≥ 0. If this condition fails, what is the closest the aircraft gets to P and when does this occur?

(b) How can your findings be used to find the closest distance two aircraft travelling with constant velocities get to each other, assuming that as time t ≥ 0 increases the distance between the aircraft decreases?

Hint 9 Let a + tu and b + tv describe the positions of the two aircraft at time t. Consider the line b − a + t (v − u). You want its closest distance from the origin. By assumption, why is (b − a) · (v − u) < 0?

14. A rhombus is a (non-degenerate) parallelogram all of whose sides have the same length. Show that the diagonals of a rhombus intersect at right angles.

Hint 10 In Example 1.2.8 (Figure 1.12) use |AB| = |BC| as well as AC = AB + BC and AB + BD = AD, so BD = AD − AB = BC − AB. Now use the result of Exercise 2 No.19.

15. * (Generalization of previous exercise). Let a + tu and b + sv represent two straight lines and suppose that A = a + t1u, B = b + s1v, C = a + t2u, and D = b + s2v are points on these lines with A ≠ C and B ≠ D. Show that |AB|² + |CD|² = |BC|² + |DA|² if, and only if, the lines are perpendicular.

Hint 11 Expand |AB|² = (b + s1v − (a + t1u)) · (b + s1v − (a + t1u)) etc. and show that |AB|² + |CD|² − |BC|² − |DA|² = 2u · v(t1 − t2)(s2 − s1).

16. Let A, B and C be three points not on a line and so forming the distinct vertices of a triangle. The median from A is the line joining A to the midpoint of the opposite side BC. Show that all three medians intersect in a single point M. This point is called the centroid of the triangle and is the centre of mass of three equal masses placed at the vertices of the triangle.

Hint 12 Let a, b, and c be the position vectors of A, B and C respectively. Then the midpoint of BC has position vector (1/2)(b + c). A generic equation of the line through A and the midpoint of BC is

\[ a + t\left(\tfrac{1}{2}(b + c) - a\right) \]

Put t = 2/3 and simplify the result. You should conclude that the centroid lies two thirds of the way from any vertex along its median toward the opposite side.

17. (Cartesian equations for a straight line). Let equation (1.10) define a straight line, where u = (u1, u2, u3) and each ui ≠ 0. Show that r = (x, y, z) is on the line if, and only if,

\[ \frac{x - a_1}{u_1} = \frac{y - a_2}{u_2} = \frac{z - a_3}{u_3} \]

What happens if one or two of the ui = 0? Find Cartesian equations for the line of Question 1a above. Find generic and Cartesian equations for the x-, y- and z-axis.

18. Let ℓ1 be the line through A = (−1, −2, 6) and B = (3, 2, −2) and ℓ2 the line through C = (1, 3, 1) and D = (1, 1, 5). Solve the following problems.

(a) Show that ℓ1 and ℓ2 are skew.


(b) Find points P on ℓ1 and Q on ℓ2 such that PQ is perpendicular to the directions of both lines. Hence find the shortest distance between ℓ1 and ℓ2.

19. Let ℓ1 be the line through A = (2, −1, 4) and B = (3, 0, 5) and ℓ2 the line through C = (−1, 0, 1) and D = (−2, 1, 0). Show that ℓ1 and ℓ2 meet and find the point of intersection.

20. How are our general results affected (and simplified) when one of the coordinates (say the z-coordinate) is restricted to be zero? In particular what does equation (1.11) reduce to?

1.3 Planes

1.3.1 Generic Equation for a plane

Our first object is to find a generic equation for the plane passing through three non-collinear points A = (a1, a2, a3), B = (b1, b2, b3) and C = (c1, c2, c3).

If four points lie on a plane, they are called coplanar. So if R is a point, our first question is "when are A, B, C and R coplanar?" The answer is suggested by Figure 1.14:

[Figure 1.14: points A, B, C and R in a plane, with P on the line AB and Q on the line AC such that sAB = AP = QR and tAC = AQ = PR.]

Suppose that R lies in the same plane as A, B and C. Let the line through R parallel to AC meet the line through A and B at the point P. Then AP = sAB for some scalar s. Similarly, let the line through R parallel to AB meet the line through A and C at the point Q, so that AQ = tAC for some scalar t. The figure APRQ forms a parallelogram and

OR = OA + AR = OA + (AP + PR) = OA + (AP + AQ) = OA + sAB + tAC (1.16)

This equation describes the position vector r = OR of a general point on the plane through A, B and C and is called a generic equation of the plane through A, B and C.

Example 1.3.1 Let A = (1, −1, 2), B = (1, 2, 3) and C = (−1, 0, 1) be three given points. They are not collinear since the vectors AB = (0, 3, 1) and AC = (−2, 1, −1) are not parallel. Hence a generic equation of the plane through A, B and C is

r = (1, −1, 2) + s(0, 3, 1) + t(−2, 1, −1)

In equation (1.16) the vectors u = AB and v = AC are non-zero and non-parallel and are thought of as parallel to the plane described. This suggests the following definition:

1.3.2 Analytic definition of a plane

Let u and v be two non-zero, non-parallel vectors and A = a a given point. Then the set Π of points R = r satisfying the generic (parametric) equation

r = r(s, t) = a + su + tv (1.17)

describes a plane. The parameters are s and t and we call a + su + tv a generic point of the plane.

1.3.3 Equality of planes

As with lines, a generic equation for a plane is not unique. Let

a1 + s1u1 + t1v1 and a2 + s2u2 + t2v2

represent planes Π1 and Π2 respectively. Then Π1 = Π2 if, and only if,

(i) sp (u1, v1) = sp (u2, v2)

and

(ii) a1 − a2 is in this common span.

For the meaning of 'span' see No.16 of Exercise 2 and compare No.4 of Exercise 6.

The proof of the above statement is left as an exercise. See No.13 of Exercise 15 below.

1.3.4 Cartesian Equation of a plane

A vector n = (n1, n2, n3) ≠ 0 is called a normal of the plane defined by equation (1.17) if it is perpendicular to every line lying in the plane. So for a variable point R on the plane we must have n · AR = 0. Thus as in Figure 1.15, all points on the plane with position vector r = (x1, x2, x3) must satisfy

n · (r − a) = 0

This is a Cartesian equation of the plane. More fully this reads

n1x1 + n2x2 + n3x3 − (n1a1 + n2a2 + n3a3) = n1x1 + n2x2 + n3x3 + c = 0 (1.18)

where c = − (n1a1 + n2a2 + n3a3). A Cartesian equation is unique up to non-zero constant multiples of (n1, n2, n3, c), since evidently multiplying n1, n2, n3 and c by β ≠ 0 cannot change a solution (x1, x2, x3).

1.3.5 Finding a normal n

If n is a normal to the plane given by equation (1.17), then in particular n ⊥ u and n ⊥ v. In other words, simultaneously

n1u1 + n2u2 + n3u3 = 0
n1v1 + n2v2 + n3v3 = 0 (1.19)


[Figure 1.15: the plane (r − a) · n = 0 through A with normal n; R = (x1, x2, x3) is a variable point and AR lies in the plane.]

Conversely, if equation (1.19) holds then n must be a normal (see Exercise 15, No.5 below). Because equation (1.19) involves two equations for three unknowns, a solution n ≠ 0 can always be found. See Chapter 2, 2.2.15 for a proof of this.

Example 1.3.2 Find a normal n ≠ 0 to the plane of Example 1.3.1 and hence its Cartesian equation.

Solution:

We solve

(n1, n2, n3) · (0, 3, 1) = 3n2 + n3 = 0
(n1, n2, n3) · (−2, 1, −1) = −2n1 + n2 − n3 = 0

simultaneously for n1, n2 and n3. (We did much the same thing in Example 1.1.3). From the first equation, n3 = −3n2. Substituting this into the second equation, we get n1 = (1/2)(n2 − n3) = 2n2. Thus, provided n2 ≠ 0, the vector

n = (2n2, n2, −3n2) = n2 (2, 1, −3)

is perpendicular (normal) to the plane. We may take n2 = 1 and n = (2, 1, −3). With A = (1, −1, 2) we get a Cartesian equation for the plane:

(2, 1, −3) · ((x1, x2, x3) − (1, −1, 2)) = 2x1 + x2 − 3x3 + 5 = 0
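As a cross-check on Example 1.3.2, one can compute a normal in code. The sketch below uses the cross product u × v as a shortcut; this is an assumption on our part and is not the elimination method used above. Any non-zero solution of (1.19) is a multiple of (2, 1, −3), so the result should differ from the text only by a constant factor.

```python
def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(u, v):
    """Cross product u x v of two 3-vectors; it is perpendicular to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

a = (1, -1, 2)          # the point A of Example 1.3.1
u = (0, 3, 1)           # AB
v = (-2, 1, -1)         # AC

n = cross(u, v)         # a normal satisfying both equations of (1.19)
c = -dot(n, a)          # constant term in n1*x1 + n2*x2 + n3*x3 + c = 0
print(n, c)             # (-4, -2, 6) and -10, i.e. -2 times (2, 1, -3) and 5
```

Dividing through by −2 recovers the equation 2x1 + x2 − 3x3 + 5 = 0 of the example.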

Alternate way to find the Cartesian equation of a plane

Consider again Example 1.3.1 with generic equation r = (1, −1, 2) + s(0, 3, 1) + t(−2, 1, −1). Writing x = 1 − 2t, y = −1 + 3s + t and z = 2 + s − t, we can eliminate s and t as follows.

Substituting t = (1 − x)/2 into y = −1 + 3s + t and z = 2 + s − t gives y = −1 + 3s + (1 − x)/2 and z = 2 + s − (1 − x)/2. Hence

\[ y - 3z = -1 + 3s + \tfrac{1 - x}{2} - 3\left(2 + s - \tfrac{1 - x}{2}\right) = -2x - 5 \]

and the Cartesian equation of the plane is 2x + y − 3z + 5 = 0.


1.3.6 Some remarks on planes

Remark 13 The following statements are geometrically obvious, but analytic proofs can be given after we have studied the subject more deeply, in particular after we have done Chapter 3. See No.10 in Exercise 78.

1. Provided n = (n1, n2, n3) ≠ 0, an equation n1x1 + n2x2 + n3x3 + c = 0 always represents a plane with normal n. (If n = 0 the equation represents nothing if c ≠ 0 and all of space if c = 0).

2. By definition, two distinct planes are parallel if their normals are parallel. Such planes are a constant positive distance apart and have no points in common.

3. Two non-parallel planes meet in a line.

4. By definition, a line is parallel to a plane if it is perpendicular to its normal (draw a picture to see that this is correct). A line not parallel to a plane meets the plane in exactly one point.

5. A line and a point not on it determine a unique plane.

6. There are an infinity of planes containing a given line.

Example 1.3.3 Find a generic equation for the line of intersection of the planes

x1 + x2 + x3 + 1 = 0
2x1 + x2 − x3 + 2 = 0

Solution:

Note that (1, 1, 1) ∦ (2, 1, −1) so the planes are not parallel. Eliminating x1 gives x2 + 3x3 = 0 or x2 = −3x3. Substituting x2 = −3x3 into the first (or second) equation gives x1 = 2x3 − 1. Hence, with t = x3 the parameter,

r = (x1, x2, x3) = (2t − 1, −3t, t)

is the line of intersection of the two planes.
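A quick way to gain confidence in such an answer is to substitute the generic point back into both Cartesian equations. The following is a minimal sketch of that check; the helper names `on_plane` and `line` are hypothetical.

```python
def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def on_plane(n, c, r):
    """True if the point r satisfies n . r + c = 0."""
    return dot(n, r) + c == 0

def line(t):
    """Generic point (2t - 1, -3t, t) of the line found in Example 1.3.3."""
    return (2 * t - 1, -3 * t, t)

# Test a handful of integer parameter values against both planes.
checks = [on_plane((1, 1, 1), 1, line(t)) and on_plane((2, 1, -1), 2, line(t))
          for t in range(-3, 4)]
print(all(checks))
```

Two distinct points already determine the line, so agreement at several parameter values is more than enough to confirm the result.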

Example 1.3.4 Find a generic equation of the plane with Cartesian equation

x1 − 2x2 + 5x3 = 10 (1.20)

Solution:

By far the easiest solution is

r = (10 + 2x2 − 5x3, x2, x3)

Here the independent parameters are x2 and x3 and they can vary freely over ℝ. We could of course let s = x2 and t = x3, but this is only an artificial device and changes nothing.

A more long-winded approach is to find three non-collinear points A, B and C on the plane and then proceed as in Example 1.3.1.

Letting x2 = x3 = 0 gives x1 = 10 and A = (10, 0, 0) as one point. Putting x1 = x2 = 0 gives x3 = 2 and B = (0, 0, 2). With x1 = x3 = 0 we have x2 = −5 and C = (0, −5, 0). The points A, B and C do not lie on a line and another generic equation of the plane is

r = OA + sAB + tAC = (10 − 10s − 10t, −5t, 2s)


Remark 14 Instead of A, B and C we could have chosen any three non-collinear points satisfying equation (1.20). The parametric equation will be different but will represent the same plane.

Example 1.3.5 Find two non-parallel vectors u and v both perpendicular to n = (1,−2, 5).

Solution:

From the previous example two such vectors are u = AB = (−10, 0, 2) and v = AC = (−10, −5, 0). Indeed u · n = −10 + 0 + 10 = 0 and v · n = −10 + 10 + 0 = 0.

Example 1.3.6 Find the point of intersection of the line through (1, −1, 0) and (0, 1, 2) and the plane

3x1 + 2x2 + x3 + 1 = 0

Solution:

For some t the point

r = (1, −1, 0) + t ((0, 1, 2) − (1, −1, 0)) = (1, −1, 0) + t (−1, 2, 2) = (1 − t, −1 + 2t, 2t)

must be on the given plane, i.e.

3 (1 − t) + 2 (−1 + 2t) + (2t) + 1 = 0

Thus t = −2/3 and the point of intersection is (5/3, −7/3, −4/3).
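The substitution in Example 1.3.6 works for any line a + tu and plane n · r + c = 0 with n · u ≠ 0: solving n · (a + tu) + c = 0 gives t = −(n · a + c)/(n · u). A minimal sketch of this, with a function name of our own choosing, is:

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def line_plane_intersection(a, u, n, c):
    """Return (t, r) where r = a + t u lies on the plane n . r + c = 0."""
    denom = dot(n, u)
    if denom == 0:
        # The line is parallel to the plane: no point or infinitely many.
        raise ValueError("line is parallel to the plane")
    t = Fraction(-(dot(n, a) + c), denom)
    return t, tuple(ai + t * ui for ai, ui in zip(a, u))

a = (1, -1, 0)
b = (0, 1, 2)
u = tuple(bi - ai for ai, bi in zip(a, b))   # direction (-1, 2, 2)
t, r = line_plane_intersection(a, u, (3, 2, 1), 1)
print(t, r)
```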

1.3.7 Perpendicular from a Point to a Plane

If Π is a plane and P a point, the problem in this section is to find the foot Q of the perpendicular from P to the plane. See Figure 1.16.

[Figure 1.16: Foot Q of the perpendicular from P to the plane Π.]

If the plane passes through the origin, the foot Q is called the projection of P on the plane.


Example 1.3.7 Find the foot Q of the perpendicular from the point P = (4, −1, 3) to the plane x + 2y + z = −4. Hence find the shortest distance from P to the given plane.

Solution:

The vector (1, 2, 1) is normal to the plane, so the line ℓ defined by (4, −1, 3) + t (1, 2, 1) = (4 + t, −1 + 2t, 3 + t) passes through P and has direction normal to the plane. Geometry tells us that the line ℓ hits the plane at the required foot Q of the perpendicular. Then |PQ| will be the shortest distance of P from the plane. To find t, substitute (4 + t, −1 + 2t, 3 + t) into the Cartesian equation of the plane

(4 + t) + 2 (−1 + 2t) + (3 + t) = −4

and solve for t. This gives t = −3/2 and the point Q therefore has position vector

\[ q = \left(4 + \left(-\tfrac{3}{2}\right),\; -1 + 2\left(-\tfrac{3}{2}\right),\; 3 + \left(-\tfrac{3}{2}\right)\right) = \left(\tfrac{5}{2}, -4, \tfrac{3}{2}\right) \]

The required shortest distance of P from the plane is

\[ |PQ| = \left|\left(\tfrac{5}{2}, -4, \tfrac{3}{2}\right) - (4, -1, 3)\right| = \left|\left(-\tfrac{3}{2}, -3, -\tfrac{3}{2}\right)\right| = \sqrt{\tfrac{27}{2}} \]

Example 1.3.8 Find the projection of the point P = (4, −1, 3) onto the plane x + 2y + z = 0.

Solution:

The plane passes through the origin (that is why we call the foot Q of the perpendicular the projection). As before, substitute (4 + t, −1 + 2t, 3 + t) into the Cartesian equation x + 2y + z = 0:

(4 + t) + 2 (−1 + 2t) + (3 + t) = 0

This gives t = −5/6 and the projection of P onto the plane x + 2y + z = 0 is the point

\[ Q = \left(4 - \tfrac{5}{6},\; -1 + 2\left(-\tfrac{5}{6}\right),\; 3 - \tfrac{5}{6}\right) = \left(\tfrac{19}{6}, -\tfrac{8}{3}, \tfrac{13}{6}\right) \]

1.3.8 A general way to see the foot Q of the perpendicular

Figure 1.17 represents a plane Π with Cartesian equation n1x1 + n2x2 + n3x3 + c = 0 and normal n = (n1, n2, n3). The point X = (x1, x2, x3) is any point on Π and Q is the foot of the perpendicular from P = (p1, p2, p3) to the plane.

The thing to notice is that PQ is the projection of PX = (x1 − p1, x2 − p2, x3 − p3) on the normal n (i.e. on the line through P and Q). Since X lies on Π we have n · x = −c, so PX · n = −(c + n · p) and

\[ PQ = \left(\frac{1}{|n|^2}\, PX\cdot n\right) n = \left(\frac{1}{n_1^2 + n_2^2 + n_3^2}\,(x_1 - p_1, x_2 - p_2, x_3 - p_3)\cdot(n_1, n_2, n_3)\right)(n_1, n_2, n_3) = -\left(\frac{c + n\cdot p}{|n|^2}\right) n \]


[Figure 1.17: Foot Q of perpendicular from P to Π, showing a point X on the plane Π and the normal n.]

Hence Q has position-vector OQ = OP + PQ:

\[ q = p - \left(\frac{c + n\cdot p}{|n|^2}\right) n \tag{1.21} \]

The distance of P from the plane Π is

\[ |PQ| = \frac{|n\cdot p + c|}{|n|} \tag{1.22} \]

Notice that these formulas are independent of the choice of X. See Exercise 15, No.(8) for an alternative proof of equation (1.21).
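Formulas (1.21) and (1.22) are easy to evaluate mechanically. The sketch below applies them to the data of Example 1.3.7, writing the plane x + 2y + z = −4 as n · r + c = 0 with n = (1, 2, 1) and c = 4; the function names are assumptions made for illustration.

```python
import math
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def foot_on_plane(p, n, c):
    """Foot q = p - ((c + n.p) / |n|^2) n of the perpendicular from p, as in (1.21)."""
    k = Fraction(dot(n, p) + c, dot(n, n))
    return tuple(pi - k * ni for pi, ni in zip(p, n))

def distance_to_plane(p, n, c):
    """Distance |n.p + c| / |n| of p from the plane, as in (1.22)."""
    return abs(dot(n, p) + c) / math.sqrt(dot(n, n))

p, n, c = (4, -1, 3), (1, 2, 1), 4
q = foot_on_plane(p, n, c)
d = distance_to_plane(p, n, c)
print(q, d)   # q is (5/2, -4, 3/2); d is sqrt(27/2) up to rounding
```

The foot is computed exactly with fractions, while the distance involves a square root and is therefore a floating-point value.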

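Example 1.3.9 Find the foot Q of the perpendicular from a general point P = (p1, p2, p3) to

1. the above plane x + 2y + z = −4, and

2. the parallel plane x + 2y + z = 0 which passes through the origin.

Solution

1. The foot has position vector

\[ q = (p_1, p_2, p_3) - \left(\frac{4 + (1, 2, 1)\cdot(p_1, p_2, p_3)}{|(1, 2, 1)|^2}\right)(1, 2, 1) = (p_1, p_2, p_3) - \frac{4 + p_1 + 2p_2 + p_3}{6}\,(1, 2, 1) \]

\[ = \left(\frac{5p_1}{6} - \frac{p_2}{3} - \frac{p_3}{6} - \frac{2}{3},\; -\frac{p_1}{3} + \frac{p_2}{3} - \frac{p_3}{3} - \frac{4}{3},\; -\frac{p_1}{6} - \frac{p_2}{3} + \frac{5p_3}{6} - \frac{2}{3}\right) \tag{1.23} \]

2. The projection of P on x + 2y + z = 0 is

\[ q = (p_1, p_2, p_3) - \left(\frac{(1, 2, 1)\cdot(p_1, p_2, p_3)}{|(1, 2, 1)|^2}\right)(1, 2, 1) = (p_1, p_2, p_3) - \frac{p_1 + 2p_2 + p_3}{6}\,(1, 2, 1) \]

\[ = \left(\frac{5p_1}{6} - \frac{p_2}{3} - \frac{p_3}{6},\; -\frac{p_1}{3} + \frac{p_2}{3} - \frac{p_3}{3},\; -\frac{p_1}{6} - \frac{p_2}{3} + \frac{5p_3}{6}\right) \tag{1.24} \]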

Exercise 15 1. Show that A = (1, 2, 1), B = (0, 1, 3) and C = (−1, 2, 5) are not collinear and so define a unique plane Π. Find:

(a) A generic equation for Π

(b) A normal to Π and hence a Cartesian equation for Π

(c) The foot Q of the perpendicular from P = (1, 2, 3) to Π and so the shortest distance from P to the plane Π. (Use two methods, the formula (1.21) and the method of Example 1.3.7).

(d) In two ways the foot Q of the perpendicular from P = (p1, p2, p3) to Π. Answer:

\[ q = (p_1, p_2, p_3) - \frac{-3 + 2p_1 + p_3}{5}\,(2, 0, 1) \tag{1.25} \]

(e) The intersection of the line ℓ with parametric form (1 + t, 3 + 2t, 4 − t) and Π, as well as the acute angle between the line and the normal to Π.

(f) A plane containing the point (4, 5, −3) and the line (1 + t, 3 + 2t, 4 − t).

2. Find a generic equation for the plane 2x + 3y − 4z + 5 = 0.

3. Show that the plane r = r(s, t) = a + su + tv passes through the origin if, and only if a isa linear combination of u and v.

4. Find generic and Cartesian equations for the x− y and y − z planes.

5. Prove that if n satisfies equation (1.19) then it is normal to the plane given by the generic equation (1.17). In other words, show that n will be perpendicular to every vector parallel to the plane and so to each line lying in the plane.

6. Use equation (1.21) to deduce the shortest distance formula equation (1.22).

7. Follow the method of Example 1.3.7 to do Example 1.3.9.

8. Follow the method of Example 1.3.7 to show that the foot Q of the perpendicular from the point P to the plane Π with Cartesian equation n · r + c = 0 is given by equation (1.21). Consider also the case c = 0. (Then Q is the projection of P on the plane).

Hint 16 The method of Example 1.3.7 leads to the equation

n · (p + tn) + c = 0

for t.

9. Write the equation of a plane n · p + c = 0 in the form m · p + d = 0 where m is a unitvector.


10. Let n · r + c1 = 0 and n · r + c2 = 0 represent planes Π1 and Π2 respectively. Show algebraically that Π1 and Π2 are either identical or have no point in common. In the latter case write down a formula for the shortest distance between Π1 and Π2 if P = (p1, p2, p3) is a given point on the first plane.

11. Find a generic equation of the line of intersection of the planes 2x + 3y + z + 1 = 0 and x − 4y + 5z − 2 = 0.

12. (Hirst, p.36). In 12a - 12c find the intersection of the three given planes. In each case give a brief geometric description.

Note: If you have difficulty in solving three equations for three unknowns, you may postpone this question to Exercise 33 No.(2) in Chapter 2.

(a)

3x − 2y + 2z = 5
4x + y − z = 3
x + y + z = 6

(b)

x + 2y − z = 3
2x + 4y − 2z = 5
x − y + z = 1

(c)

x + 2y − z = 2
x − y + z = 1
3x + 3y − z = 5

Solution to 12a: Let (i) 3x − 2y + 2z = 5, (ii) 4x + y − z = 3, (iii) x + y + z = 6.

(i) + 2(ii) gives 11x = 11, so x = 1. (ii) + (iii) gives 5x + 2y = 9, so y = 2, and then z = 3. The planes intersect in the unique point (1, 2, 3).

Solution to 12b: Let (i) x + 2y − z = 3, (ii) 2x + 4y − 2z = 5, (iii) x − y + z = 1.

2(i) gives 2x + 4y − 2z = 6. Since this contradicts (ii), the first two equations represent non-intersecting parallel planes. The three planes have no point in common.

Solution to 12c: Let (i) x + 2y − z = 2, (ii) x − y + z = 1, (iii) 3x + 3y − z = 5.

2(i) + (ii) gives 3x + 3y − z = 5, which is (iii). So we only have (i) and (ii).

(i) − (ii) gives x + 2y − z − (x − y + z) = 3y − 2z = 1. Take y for parameter. Then z = (3/2)y − 1/2 and x + 2y − ((3/2)y − 1/2) = 2, so x = −(1/2)y + 3/2. The planes intersect in the line with generic equation

\[ r = \left(-\tfrac{1}{2}y + \tfrac{3}{2},\; y,\; \tfrac{3}{2}y - \tfrac{1}{2}\right) \]

13. * Prove the statement of subsection 1.3.3.

Hint 17 Imitate the procedure of No.4 of Exercise 6.


14. (See Figure 1.18) Let n · r + c = 0 and m · r + d = 0 represent non-parallel planes Π1 and Π2 respectively. Show that the Cartesian equation of a plane Π that contains the line of intersection of Π1 and Π2 has the form

λ (n · r + c) + µ (m · r + d) = (λn + µm) · r + λc + µd = 0 (1.26)

where λ and µ are two scalars not both of which are zero.

Hint 18 First note that because n ≠ 0 and m ≠ 0 are not parallel, the linear combination λn + µm is also non-zero if one or both of λ, µ are non-zero. In that case equation (1.26) represents a plane Π. Note that if (e.g.) λ ≠ 0 and µ = 0 then Π is just Π1. Now check that a point on Π1 and Π2 is also on Π.

15. Find the form of a Cartesian equation of the plane Π that contains the intersection of the planes

2x1 + 3x2 + x3 + 1 = 0 and x1 − 4x2 + 5x3 − 2 = 0

If the plane Π also contains the point (1, −1, 2), find its Cartesian equation.

[Figure 1.17: the planes Π1 and Π2 and a plane Π containing their line of intersection.]

1.4 Reflections in a line or a plane

A line or plane can serve as a mirror so that a point P is reflected in a point P′ in such a way that P and P′ are symmetric with respect to the line or plane, as in Figure 1.18. From the figure the point P′ is given by

p′ = OP′ = OP + 2PQ = p + 2(q − p) = 2q − p (1.27)

where Q is the foot of the perpendicular from P to the line or plane and we have written p′ for the position vector of P′.


[Figure 1.18: Reflection in a line and reflection in a plane. In each panel P is reflected to P′ through the foot Q of the perpendicular from P to the line ℓ or the plane Π.]

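Example 1.4.1 Consider Example 1.2.7 of a line in the x-y plane. Using equation (1.14) and equation (1.27), the reflection of p = (p1, p2) in the line is

\[ p' = 2q - p = 2\left(\tfrac{1}{4}\left(3p_1 + \sqrt{3}\,p_2\right),\; \tfrac{1}{4}\sqrt{3}\,p_1 + \tfrac{1}{4}p_2\right) - (p_1, p_2) = \left(\tfrac{1}{2}p_1 + \tfrac{1}{2}\sqrt{3}\,p_2,\; \tfrac{1}{2}\sqrt{3}\,p_1 - \tfrac{1}{2}p_2\right) \tag{1.28} \]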

Example 1.4.2 Consider (1) of Example 1.3.9 where we found the foot of the perpendicular from p to the plane x + 2y + z = −4 given by equation (1.23):

\[ q = (p_1, p_2, p_3) - \frac{4 + p_1 + 2p_2 + p_3}{6}\,(1, 2, 1) \]

From this it follows that the reflection P′ of P in the plane is the point with position vector

\[ p' = 2q - p = p - \frac{4 + p_1 + 2p_2 + p_3}{3}\,(1, 2, 1) \]
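Combining (1.21) with (1.27) gives p′ = p − 2((c + n · p)/|n|²) n for reflection in the plane n · r + c = 0. The sketch below evaluates this for the plane of Example 1.4.2 and the point P = (4, −1, 3) of Example 1.3.7, and checks that reflecting twice returns the original point; the function name `reflect_in_plane` is our own.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))

def reflect_in_plane(p, n, c):
    """Reflection p' = 2q - p of p in the plane n . r + c = 0, with q given by (1.21)."""
    k = Fraction(2 * (dot(n, p) + c), dot(n, n))
    return tuple(pi - k * ni for pi, ni in zip(p, n))

n, c = (1, 2, 1), 4           # the plane x + 2y + z = -4
p = (4, -1, 3)
p_ref = reflect_in_plane(p, n, c)
p_back = reflect_in_plane(p_ref, n, c)   # reflecting again should undo the reflection
print(p_ref, p_back)
```

The midpoint of P and P′ is the foot Q = (5/2, −4, 3/2) found in Example 1.3.7, as the symmetry of the reflection requires.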

Exercise 19 1. Using equation (1.11) - which finds the foot Q of the perpendicular from P to the line ℓ with parametric equation r = a + tu - find a formula for the reflection P′ of P in the line ℓ.

2. Using equation (1.21) - which finds the foot Q of the perpendicular from P to the plane with Cartesian equation n · r + c = 0 - find a formula for the reflection P′ of P in the given plane.

3. Specialize your formulae when a = 0 and c = 0.

Solution: For Question 1 the specialized formula is

\[ p' = \frac{2}{|u|^2}\,(p\cdot u)\, u - p \]

For Question 2 it is

\[ p' = p - \frac{2\, n\cdot p}{|n|^2}\, n \]

4. Using the answers to Questions (2) and (3), or otherwise, find formulae for reflections in the planes x + 2y + z = 0, 3x − y − 2z = 1 and 3x − y − 2z = 0.

5. In the x-y plane find a formula for reflection in the line passing through the origin and making an angle of 60° with the positive x-axis.

6. * In the x-y plane find a formula for reflection in the line passing through the origin and making an angle of θ radians with the positive x-axis.

Hint 20 All these questions use equation (1.27).


1.5 Summary of Chapter 1

1.5.1 Points in Space, Vectors

(See section 1.1).

A point P = (p1, p2, p3) in space is identified with its position vector OP = p. The displacement from A to B is AB = b − a.

Vectors u = (u1, u2, u3) are underlined. Scalars α, β, ..., s, t, ... are real numbers. The product of a vector a by a scalar γ is γa = (γa1, γa2, γa3). Non-zero vectors a and b are parallel if b = sa for a non-zero scalar s. Vectors a and b are added:

a + b = (a1 + b1, a2 + b2, a3 + b3)

Addition satisfies the triangle law AB + BC = AC. The dot product of vectors a and b is

a · b = a1b1 + a2b2 + a3b3

The length or magnitude of a is |a| = √(a · a). The angle θ between a and b is given by a · b = |a| |b| cos θ. The unit vector in the direction of a ≠ 0 is

\[ \hat{a} = \frac{1}{|a|}\, a \]

The component of a in the direction b ≠ 0 and the projection of a on b are respectively

\[ a\cdot\hat{b} \quad\text{and}\quad \left(a\cdot\hat{b}\right)\hat{b} \]

1.5.2 Straight lines

(See section 1.2).

A parametric equation of the straight line ℓ through a with direction u ≠ 0 is

r = a + tu (t ∈ ℝ)

The foot Q of the perpendicular from P to ℓ is found by solving

(a + tu − p) · u = 0

for t and then substituting this value back into a + tu.

Lines ℓ1 and ℓ2 are parallel if they have parallel directions and are skew if they are not parallel and do not intersect.

1.5.3 Planes

(See section 1.3).

A generic equation of the plane containing two non-parallel vectors u and v and passing through the point A is

r = a + su + tv (s, t ∈ ℝ)

A vector n ≠ 0 is normal to the plane if it is perpendicular to every line lying in the plane.

A Cartesian equation of the plane has the form

(r − a) · n + c = (x1 − a1)n1 + (x2 − a2)n2 + (x3 − a3)n3 + c = 0

where c is a constant.

The foot Q of the perpendicular from a point P to the plane is given by equation (1.21):

\[ q = p - \left( \frac{c + n \cdot p}{|n|^2} \right) n \]

It can also be found by solving

n · (p + tn) + c = 0

for t and then substituting this into the line p + tn.

1.5.4 Projections and reflections

(See equation 1.7, subsection ??, example 1.3.9 and subsection 1.4).

When a line or plane goes through the origin, the foot Q of the perpendicular from P to the line or plane is called the projection of P on the line or plane.

The reflection P ′ of a point P in a line or plane is given by

p′ = 2q − p

where Q is the foot of the perpendicular from P to the line or plane.
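The summary formulas for the foot of a perpendicular and for the reflection p′ = 2q − p can be tried out numerically. The following is a minimal Python sketch; the function names (`foot_on_plane`, `foot_on_line`, `reflect`) and the sample point are illustrative assumptions, not part of the notes. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product a . b of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def foot_on_plane(p, n, c):
    """Foot q of the perpendicular from p to the plane n . r + c = 0."""
    k = Fraction(c + dot(n, p), dot(n, n))
    return tuple(pi - k * ni for pi, ni in zip(p, n))

def foot_on_line(p, a, u):
    """Foot q of the perpendicular from p to the line r = a + t u."""
    # Solve (a + t u - p) . u = 0 for t, then substitute back.
    t = Fraction(dot([pi - ai for pi, ai in zip(p, a)], u), dot(u, u))
    return tuple(ai + t * ui for ai, ui in zip(a, u))

def reflect(p, q):
    """Reflection p' = 2q - p of p, given the foot q."""
    return tuple(2 * qi - pi for pi, qi in zip(p, q))

p = (1, 2, 3)                  # sample point (an arbitrary choice)
n, c = (1, 2, 1), 0            # the plane x + 2y + z = 0
q = foot_on_plane(p, n, c)
p_ref = reflect(p, q)
```

Reflecting `p_ref` again in the same plane should return the original point, which is a quick sanity check on any reflection formula.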

Chapter 2

Matrices and the Solution of Simultaneous Linear Equations

Linear algebra is used in almost every branch of pure and applied mathematics. It is indispensable for all branches of engineering, the physical sciences (including physics, chemistry and biology), statistics and for operations research (mathematics applied to industry and commerce).

Although most results in this course are valid for various number systems (e.g. rational, real or complex numbers), it is assumed that all numbers and number-variables dealt with are real, i.e. range over the real number system ℝ. Numbers will also be referred to as scalars.

2.1 The Solution of Linear Equations

At school you learned how to solve up to three simultaneous linear equations for three unknowns, say x1, x2, x3. In Chapter 1 we saw how such equations arise in analytic geometry and used vector notation such as a row x = (x1, x2, x3) to represent the variables x1, x2 and x3. In this chapter we will learn some systematic techniques for solving m equations for n unknowns x1, x2, ..., xn, and we will see the need to carefully distinguish rows from columns. While geometry often serves as an inspiration for various algebraic results, it is not possible to draw an actual picture representing a system involving n variables when n ≥ 4.

2.1.1 Some examples

Example 2.1.1 Find all solutions that satisfy simultaneously

\[
\begin{array}{rcrl}
2x_1 + 3x_2 & = & -1 & (a) \\
x_1 - 4x_2 & = & 5 & (b)
\end{array}
\]

Solution:

1. One way: Make x1 “the subject of the formula” in (b), i.e. x1 = 5 + 4x2 and substitute5 + 4x2 for x1 in (a) to give 2(5 + 4x2) + 3x2 = −1, so x2 = −1 and x1 = 1.

2. We prefer to replace the above system by a sequence of equivalent systems (i.e. systems that have exactly the same solutions), a process known as Gauss reduction.

i (First equivalent system). Multiply (b) by 2 and leave (a) unchanged:

\[
\begin{array}{rcrll}
2x_1 + 3x_2 & = & -1 & \text{leave unchanged} & (a) \\
2x_1 - 8x_2 & = & 10 & 2 \times (b) & (b)
\end{array}
\]


ii (Second equivalent system). In the new system subtract (a) from (b), to obtain:

\[
\begin{array}{rcrll}
2x_1 + 3x_2 & = & -1 & \text{leave unchanged} & (a) \\
-11x_2 & = & 11 & [(-1) \times (a)] + (b) & (b)
\end{array}
\]

It is now obvious from (b) that x2 = −1. By back-substitution in (a) we get x1 = 1. For reasons that will become clear later we write this unique solution as a column vector

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]

Example 2.1.2 Consider

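\[
\begin{array}{rcrl}
x_1 + x_2 + 3x_3 & = & -1 & (a) \\
x_1 + x_2 + 4x_3 & = & -3 & (b) \\
-x_1 + x_2 - 2x_3 & = & 1 & (c)
\end{array}
\tag{2.1}
\]

Solution:

\[
\begin{array}{rcrll}
x_1 + x_2 + 3x_3 & = & -1 & \text{leave unchanged} & (a) \\
x_3 & = & -2 & [(-1) \times (a)] + (b) & (b) \\
-x_1 + x_2 - 2x_3 & = & 1 & \text{leave unchanged} & (c)
\end{array}
\]

\[
\begin{array}{rcrll}
x_1 + x_2 + 3x_3 & = & -1 & \text{leave unchanged} & (a) \\
x_3 & = & -2 & \text{leave unchanged} & (b) \\
2x_2 + x_3 & = & 0 & (a) + (c) & (c)
\end{array}
\]

Finally, interchange the equations (b) and (c):

\[
\begin{array}{rcrl}
x_1 + x_2 + 3x_3 & = & -1 & (a) \\
2x_2 + x_3 & = & 0 & (b) \\
x_3 & = & -2 & (c)
\end{array}
\tag{2.2}
\]

Because the coefficients of x1 in (b) and (c) are zero and also because the coefficient of x2 in (c) is zero, we say the system is in (row) echelon or (upper) triangular form. From (c), x3 = −2. Back-substitution into (b) gives x2 = 1. Finally, back-substitution of x3 = −2 and x2 = 1 into (a) gives x1 = 4. The solution is

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ -2 \end{bmatrix}
\tag{2.3}
\]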

Example 2.1.3 Find all solutions x to the system

\[
\begin{array}{rcrl}
x_1 + x_2 + 3x_3 + 2x_4 & = & -1 & (a) \\
2x_2 + x_3 + x_4 & = & 0 & (b) \\
x_3 - x_4 & = & -2 & (c)
\end{array}
\tag{2.4}
\]

Solution:

There are only three equations for four unknowns and the equations are already in echelon form. One of the unknowns (say x4) can be any number and back-substitution gives in turn

\[ x_3 = x_4 - 2, \qquad x_2 = -\tfrac{1}{2}(x_3 + x_4) = 1 - x_4, \qquad x_1 = -1 - x_2 - 3x_3 - 2x_4 = 4 - 4x_4 . \]

It is customary to let x4 = t, so

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 4 - 4t \\ 1 - t \\ t - 2 \\ t \end{bmatrix}
\tag{2.5}
\]

is the general solution. The parameter t ∈ ℝ can be any real number.
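A general solution with a parameter can be checked by substituting it back into every equation for several values of t. The sketch below does this for the system (2.4); the helper names `general_solution` and `residual` are assumptions made for illustration only.

```python
# Coefficients and right-hand side of the system (2.4).
A = [[1, 1, 3, 2],
     [0, 2, 1, 1],
     [0, 0, 1, -1]]
b = [-1, 0, -2]

def general_solution(t):
    """x(t) as in equation (2.5), with x4 = t as the free parameter."""
    return [4 - 4 * t, 1 - t, t - 2, t]

def residual(A, x, b):
    """Left-hand side minus right-hand side, one entry per equation."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi
            for row, bi in zip(A, b)]

# Every choice of t should give a zero residual in all three equations.
checks = [residual(A, general_solution(t), b) for t in (-2, 0, 1, 3)]
```

A zero residual for a handful of values of t does not prove the formula, but it catches most slips in back-substitution.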

Remark 21 Here we have 3 equations for 4 unknowns and the solution shows that we have an infinity of solutions. In Example 1.3.2 of Chapter 1, we found a vector n ≠ 0 normal to a plane. For β ≠ 0, βn is also a normal, so again we have an infinite number of solutions, this time of two equations in three unknowns. Example 1.3.4 of Chapter 1 involved solving one equation for three unknowns and this involved two parameters.

These examples and also Example 2.1.3 illustrate a general result: If a system of equations has more unknowns than equations and there is a solution, then the system has infinitely many solutions. See, for example, 2.2.15 below. In Exercise 55, No.4 you will get a more complete picture of this theorem.

Example 2.1.4 For which values of the variables does the following system have a solution?

\[
\begin{array}{rcrl}
x_1 + x_2 + 3x_3 & = & -1 & (a) \\
x_1 + 3x_2 + 4x_3 & = & -1 & (b) \\
2x_1 + 4x_2 + 7x_3 & = & 5 & (c)
\end{array}
\]

Solution:

\[
\begin{array}{rcrll}
x_1 + x_2 + 3x_3 & = & -1 & \text{leave unchanged} & (a) \\
2x_2 + x_3 & = & 0 & [(-1) \times (a)] + (b) & (b) \\
2x_2 + x_3 & = & 7 & [(-2) \times (a)] + (c) & (c)
\end{array}
\]

The system cannot have a solution since (b) and (c) are contradictory. The system is inconsistent (or the equations are incompatible) and the solution set is empty.

2.2 Matrices

A rectangular array A of numbers with m rows and n columns is called an m × n (“m by n”) matrix. Matrices will form a new kind of algebra and it is customary to place brackets (we will use square ones) around such arrays.

2.2.1 Examples of matrices

5 −1 13 −1−1 1 −12 1

31 0 40 −3

is a 3× 4 matrix,

− 12 21 −3

−1 5 00 −2 71 −2 6

is a 4× 3 matrix.

\[ \begin{bmatrix} 2 & \frac{3}{4} & -1 \\ 1 & -6 & -\frac{2}{5} \end{bmatrix} \]

is a 2 × 3 matrix;

3 2 12

−1 0 57 8 −2

is a 3 × 3 matrix. The column

\[ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

is a 3 × 1 matrix, while the row

\[ y = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix} \]

is a 1 × 4 matrix.

One speaks of the row entries of a column and the column entries of a row.


2.2.2 Standard notation for an m× n matrix A

Given that A is an m × n matrix, the standard notation is to write A = [aij].

Here aij is the entry of A in the ith row and jth column. One calls aij the (i, j) entry of A. The first index i labels the rows and has the range i = 1, ..., m, while the second index j labels the columns and has the range j = 1, ..., n.

For example, if m = 4 and n = 5,

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}
\end{bmatrix}
\tag{2.6}
\]

is a 4 × 5 matrix. One reads a12 as “a one-two”, not as “a twelve”. Likewise, a34 is “a three-four”, not “a thirty four”. For example, consider the 2 × 3 matrix

\[ A = \begin{bmatrix} 2 & 3 & -1 \\ 1 & -4 & 5 \end{bmatrix} \]

The (1, 1) entry is a11 = 2. It is the first entry in column 1 and also the first entry in row 1. The (2, 1) entry a21 = 1 is the second row entry of column 1 and the first entry in row 2. The numbers a12 = 3, a22 = −4 are the first and second entries of column 2 respectively. Row 2 has entries a21 = 1, a22 = −4, a23 = 5 respectively. The (1, 3) entry of A is a13 = −1 and the (2, 3) entry is a23 = 5.

It is important to keep in mind that aij is in row i and in column j of A.

Sometimes we write A = Am×n to indicate that A is m × n. Thus if A has two rows and three columns, we may write A = A2×3.
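The index convention can be mirrored in a short program. In the sketch below a matrix is stored as a list of rows; because Python counts from 0 while the notes count from 1, the hypothetical helpers `entry` and `column` shift the indices. This is only one possible representation, chosen for illustration.

```python
# The 2 x 3 matrix used in the text, stored as a list of rows.
A = [[2, 3, -1],
     [1, -4, 5]]

def entry(A, i, j):
    """The (i, j) entry a_ij, with i and j counted from 1 as in the text."""
    return A[i - 1][j - 1]

def shape(A):
    """(m, n): the number of rows and the number of columns of A."""
    return len(A), len(A[0])

def column(A, j):
    """Column j of A (counted from 1), listed from top to bottom."""
    return [row[j - 1] for row in A]
```

With these helpers `entry(A, 2, 3)` picks out the entry in row 2 and column 3, in that order, which is the point the text stresses.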

Exercise 22 1. For each matrix A in 2.2.1 write down

(a) each of its entries in the form aij = ...

(b) each of its rows and columns. What notation do you think would be suitable for denoting row i of a matrix A? What notation do you think would be suitable for denoting column j of a matrix A? In Chapter 3, 3.1.1, we will develop a systematic notation for these; here you are only asked to use your imagination.

2. Find a formula for the entries aij in the following 4× 4 matrices:

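(a)

\[ A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix} \]

(b)

\[ A = \begin{bmatrix} 2 & -3 & 4 & -5 \\ -3 & 4 & -5 & 6 \\ 4 & -5 & 6 & -7 \\ -5 & 6 & -7 & 8 \end{bmatrix} \]

(c) The n × n Hilbert matrix Hn = [hij] is defined by

\[ h_{ij} = \frac{1}{i + j - 1} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, n) \]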

Write down the matrices H1, H2, H3, H4.


3. In each case describe the essential property of A in as compact a way as possible.

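(a)

\[ A = \begin{bmatrix} 0 & -3 & -3 & -3 \\ -3 & 0 & -3 & -3 \\ -3 & -3 & 0 & -3 \\ -3 & -3 & -3 & 0 \end{bmatrix} \]

(b)

\[ A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

(This matrix is known as the 4 × 4 identity matrix).

(c)

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{21} & a_{31} & a_{41} \\
a_{12} & a_{22} & a_{32} & a_{42} \\
a_{13} & a_{23} & a_{33} & a_{43} \\
a_{14} & a_{24} & a_{34} & a_{44}
\end{bmatrix}
\]

(A matrix A with this property is called symmetric).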

2.2.3 Expressing m equations in n unknowns as a matrix equation Ax = b

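A system of linear equations has a matrix of coefficients, or coefficient matrix. For example, the coefficient matrix of Example 2.1.4 is

\[ \begin{bmatrix} 1 & 1 & 3 \\ 1 & 3 & 4 \\ 2 & 4 & 7 \end{bmatrix} \]

If we are given m simultaneous equations in n unknowns x1, x2, ..., xn, there is a standard way to express the equations as a single matrix equation.

In the case of Example 2.1.4 this expression is

\[
\begin{bmatrix} 1 & 1 & 3 \\ 1 & 3 & 4 \\ 2 & 4 & 7 \end{bmatrix} x = \begin{bmatrix} -1 \\ -1 \\ 5 \end{bmatrix}
\tag{2.7}
\]

The system of Example 2.1.2 (equations (2.1)) has for coefficient matrix

\[ A = \begin{bmatrix} 1 & 1 & 3 \\ 1 & 1 & 4 \\ -1 & 1 & -2 \end{bmatrix} \]

The standard corresponding matrix equation is

\[
\begin{bmatrix} 1 & 1 & 3 \\ 1 & 1 & 4 \\ -1 & 1 & -2 \end{bmatrix} x = \begin{bmatrix} -1 \\ -3 \\ 1 \end{bmatrix}
\tag{2.8}
\]

What this means is the equality of two column matrices:

\[
\begin{bmatrix} x_1 + x_2 + 3x_3 \\ x_1 + x_2 + 4x_3 \\ -x_1 + x_2 - 2x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ -3 \\ 1 \end{bmatrix}
\tag{2.9}
\]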


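The system of equations of Example 2.1.3 has the matrix expression

\[
\begin{bmatrix} 1 & 1 & 3 & 2 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix}
\tag{2.10}
\]

Using the general 4 × 5 matrix of (2.6), the following matrix equation expresses a general system of 4 equations in 5 unknowns x1, x2, ..., x5:

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\tag{2.11}
\]

Written out, we again have the equality of two column matrices:

\[
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + a_{15}x_5 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + a_{25}x_5 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + a_{35}x_5 \\
a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 + a_{45}x_5
\end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\tag{2.12}
\]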

In general, let A be the matrix of coefficients of m simultaneous equations in n unknowns x1, x2, ..., xn. If the right-hand side of equation i is bi, the standard way to express the equations in matrix form is

Ax = b (2.13)

Here the unknowns xj are written as an n × 1 column matrix (a column vector) x and b is the m × 1 column of the bi.

Observe that for the matrix equation (2.13) to be meaningful the number of rows of A must be the same as the number of rows of b and the number of columns of A must equal the number of rows of x.
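To make the meaning of Ax = b concrete, the column Ax can be computed entry by entry, each entry being the left-hand side of one equation. The sketch below is illustrative only: the function name `matvec` is an assumption, and the size check mirrors the observation that the number of columns of A must equal the number of rows of x.

```python
def matvec(A, x):
    """Column Ax: entry i is a_i1*x_1 + ... + a_in*x_n (row i of A against x)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("columns of A must match rows of x")
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Coefficient matrix and right-hand side of Example 2.1.2, equation (2.8).
A = [[1, 1, 3],
     [1, 1, 4],
     [-1, 1, -2]]
b = [-1, -3, 1]

x = [4, 1, -2]          # the solution (2.3)
lhs = matvec(A, x)      # should reproduce b, row by row
```

Passing a column of the wrong length raises an error, which corresponds to a matrix equation that is not meaningful.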

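Exercise 23 1. Write the following systems of equations in matrix form Ax = b

(a)
\[
\begin{array}{rcr}
-2x_1 + 4x_2 & = & -1 \\
5x_1 + 3x_2 & = & 6
\end{array}
\]

(b)
\[
\begin{array}{rcr}
7x_1 - 3x_2 + 2x_3 & = & 8 \\
-3x_1 + 9x_2 - 15x_3 & = & 1
\end{array}
\]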

(c)−11x1 + 13x2 − x3 = −221x1 − 5

2x2 + 13x3 = 292

23x1 − 1

3x2 − 59x3 = −2

(d)5x + 2y = −2−7x + 3y = 1

2− 3

2x + 6y = − 16

(e)− 13

3 x1 − 12x2 + x3 − 7x4 = −21

5x1 + 14x2 − 1

7x3 + 9x4 = 0−4x1 + 2

7x2 + 19x3 + 101x4 = 2

18x1 − 7

2x2 − 5x3 + 17x4 = 5


2. Write out in full the simultaneous equations represented by the following matrix equations

(a) [9 0 2−1 3 4

]x =

[817

]

(b)
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} x = \begin{bmatrix} -5 \\ 6 \end{bmatrix} \]

(c) −4 3 12−6 −2 −511 − 1

325

x =

pqr

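(d)
\[
\begin{bmatrix} -8 & 9 & 17 \\ 5 & 3 & -23 \\ 2 & -1 & 13 \\ -11 & 3 & -5 \end{bmatrix} x
= \begin{bmatrix} 3 \\ -1 \\ 1 \\ -2 \end{bmatrix}
\]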

3. Let

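\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix},
\qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
\]

Which of the following are correct?

(a)
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
a_{11}x_1 & a_{12}x_2 & a_{13}x_3 & a_{14}x_4 \\
a_{21}x_1 & a_{22}x_2 & a_{23}x_3 & a_{24}x_4 \\
a_{31}x_1 & a_{32}x_2 & a_{33}x_3 & a_{34}x_4 \\
a_{41}x_1 & a_{42}x_2 & a_{43}x_3 & a_{44}x_4
\end{bmatrix}
= Ax
\]

(b)
\[
Ax = \begin{bmatrix}
x_1a_{11} & x_2a_{12} & x_3a_{13} & x_4a_{14} \\
x_1a_{21} & x_2a_{22} & x_3a_{23} & x_4a_{24} \\
x_1a_{31} & x_2a_{32} & x_3a_{33} & x_4a_{34} \\
x_1a_{41} & x_2a_{42} & x_3a_{43} & x_4a_{44}
\end{bmatrix}
\]

(c)
\[ \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix} A = Ax \]

(d)
\[
x \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
=
\begin{bmatrix}
x_1a_{11} & x_1a_{12} & x_1a_{13} & x_1a_{14} \\
x_2a_{21} & x_2a_{22} & x_2a_{23} & x_2a_{24} \\
x_3a_{31} & x_3a_{32} & x_3a_{33} & x_3a_{34} \\
x_4a_{41} & x_4a_{42} & x_4a_{43} & x_4a_{44}
\end{bmatrix}
\]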


2.2.4 The dot product generalized

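Consider the left-hand sides of the equations of Example 2.1.2, i.e. the entries of the left-hand column in equation (2.9). Each entry is a dot product, e.g. the third entry is (−1, 1, −2) · (x1, x2, x3) = −x1 + x2 − 2x3. In the next chapter we will emphasize this as the product of a row and a column:

\[ \begin{bmatrix} -1 & 1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = -x_1 + x_2 - 2x_3 \]

Likewise, the left-hand sides of the first and second equations are respectively

\[ \begin{bmatrix} 1 & 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 + x_2 + 3x_3 \]

\[ \begin{bmatrix} 1 & 1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 + x_2 + 4x_3 \]

Similarly, the left-hand side of the first equation of Example 2.1.3 is the product

\[ \begin{bmatrix} 1 & 1 & 3 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_1 + x_2 + 3x_3 + 2x_4 \]

The left-hand side of equation (2.10) in Example 2.1.3 is the column with three products

\[
\begin{bmatrix}
\begin{bmatrix} 1 & 1 & 3 & 2 \end{bmatrix} x \\
\begin{bmatrix} 0 & 2 & 1 & 1 \end{bmatrix} x \\
\begin{bmatrix} 0 & 0 & 1 & -1 \end{bmatrix} x
\end{bmatrix}
=
\begin{bmatrix}
x_1 + x_2 + 3x_3 + 2x_4 \\
2x_2 + x_3 + x_4 \\
x_3 - x_4
\end{bmatrix}
\]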

Remark 24 The last examples show that we are dealing with a product which is a generalization of the dot product of Chapter 1 (subsection 1.1.12). As remarked earlier, we will later develop special notations for row i and column j of A. What is more, we will emphasize Ax as the product of the matrices A and x. (See subsection 3.1.6 in Chapter 3).

2.2.5 Detached Coefficients

We can also express the above problems compactly by considering the array formed by attaching the right-hand side b to the coefficient matrix A in the form [A, b]. Corresponding to the problem of solving the matrix equation (2.13) we call R = [A, b] the array of detached coefficients or the matrix A augmented by b. Operations done on the equations have their exact counterpart when done on the rows of R.

1. Example 2.1.1 above can be expressed as the matrix equation

\[
\begin{bmatrix} 2 & 3 \\ 1 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \end{bmatrix}
\tag{2.14}
\]

or as the array of detached coefficients:

\[
\begin{array}{rrr}
2 & 3 & -1 \\
1 & -4 & 5
\end{array}
\tag{2.15}
\]

We have omitted brackets around the array. Here x1 and x2 can be thought of as labels of the first two columns of the array. The third column of (2.15) is the right-hand side of (2.14). It is clear that two equations in two unknowns determine and are determined by such an array. Instead of performing operations on the given equations, we perform the same operations on this array.

In the case of Example 2.1.1 these steps are:

\[
\begin{array}{rrrl}
2 & 3 & -1 & \\
2 & -8 & 10 & 2R_2 \ \text{(multiply Row 2 by 2)}
\end{array}
\]

\[
\begin{array}{rrrl}
2 & 3 & -1 & \\
0 & -11 & 11 & -R_1 + R_2 \ \text{(add } (-1) \times \text{Row 1 to Row 2)}
\end{array}
\]

From the last array we read off −11x2 = 11, so x2 = −1. The first row reads 2x1 + 3x2 = −1 and x1 = 1. This is the same solution we found before.

2. Example 2.1.2 has an expression as the matrix equation (2.8) and as an array of detached coefficients the equations are:

\[
\begin{array}{rrrr}
1 & 1 & 3 & -1 \\
1 & 1 & 4 & -3 \\
-1 & 1 & -2 & 1
\end{array}
\]

Here x1, x2, and x3 label the first three columns. The steps leading to the solution are:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 0 & 1 & -2 & -R_1 + R_2 \\
-1 & 1 & -2 & 1 &
\end{array}
\]

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 0 & 1 & -2 & \\
0 & 2 & 1 & 0 & R_1 + R_3
\end{array}
\]

Finally, we interchange rows 2 and 3:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 2 & 1 & 0 & R_2 \leftrightarrow R_3 \\
0 & 0 & 1 & -2 &
\end{array}
\tag{2.16}
\]

The equations are in echelon form and we can read off the solution: x3 = −2, 2x2 + x3 = 0, so x2 = 1 and finally from x1 + x2 + 3x3 = −1 we get x1 = 4. We have

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ -2 \end{bmatrix} \]

This is equation (2.3), found earlier.

Observation: It is important to observe that the right-hand side of the original set of equations is a linear combination of the columns of the coefficient matrix with coefficients x1, x2 and x3:

\[
\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} 4 + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 3 \\ 4 \\ -2 \end{bmatrix} (-2) = \begin{bmatrix} -1 \\ -3 \\ 1 \end{bmatrix}
\tag{2.17}
\]

Linear combinations were introduced for row vectors in Chapter 1, subsection 1.1.18 and will be systematically developed in Chapter 3. See subsection 3.1.2 and especially 3.1.9.

Note that it is natural to write the coefficients on the right of the column vectors since x appears on the right of A when we express the equations in the form Ax = b.


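3. Example 2.1.3 (equations (2.4)) has the matrix expression (2.10). The corresponding array of detached coefficients is already in echelon form:

\[
\begin{array}{rrrrr}
1 & 1 & 3 & 2 & -1 \\
0 & 2 & 1 & 1 & 0 \\
0 & 0 & 1 & -1 & -2
\end{array}
\tag{2.18}
\]

The general solution can be read off as before by putting x4 = t and solving in turn for x3, x2 and x1:

\[ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 4 - 4t \\ 1 - t \\ t - 2 \\ t \end{bmatrix} \]

This is just the solution (2.5) found earlier. A particular solution is found by giving the parameter t some value, e.g. t = 3. In that case x1 = −8, x2 = −2, x3 = 1 and x4 = 3. As a linear combination we have

\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} (-8) + \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} (-2) + \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} 3 = \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix}
\tag{2.19}
\]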

4. Example 2.1.4 has two expressions, one as equation (2.7), and also as the array of detached coefficients (augmented matrix)

\[
\begin{array}{rrrr}
1 & 1 & 3 & -1 \\
1 & 3 & 4 & -1 \\
2 & 4 & 7 & 5
\end{array}
\]

The steps leading to the solution are:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 2 & 1 & 0 & -R_1 + R_2 \\
0 & 2 & 1 & 7 & -2R_1 + R_3
\end{array}
\]

Taking it a step further, we obtain the echelon (triangular) form:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 2 & 1 & 0 & \\
0 & 0 & 0 & 7 & -R_2 + R_3
\end{array}
\tag{2.20}
\]

The last row reads 0x1 + 0x2 + 0x3 = 7, an impossibility. The given equations are inconsistent (incompatible); they do not have a solution.

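5. As a detached array, the equations (2.11) appear as

\[
\begin{array}{cccccc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & b_1 \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & b_2 \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & b_3 \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & b_4
\end{array}
\tag{2.21}
\]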

Remark 25 Our convention is that an operation performed on row i is written next to row i after the operation has been done.
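The observation in items 2 and 3 above, that b is a linear combination of the columns of A with the solution entries as coefficients, can be checked directly. In this sketch the helper names `columns` and `linear_combination` are illustrative assumptions; the data are taken from equations (2.17) and (2.19).

```python
def columns(A):
    """The columns of A, each as a list read from top to bottom."""
    return [list(col) for col in zip(*A)]

def linear_combination(cols, coeffs):
    """Sum of the columns, column j being multiplied by coeffs[j]."""
    m = len(cols[0])
    return [sum(col[i] * c for col, c in zip(cols, coeffs)) for i in range(m)]

# Example 2.1.2: coefficients 4, 1, -2 as in (2.17).
A1 = [[1, 1, 3], [1, 1, 4], [-1, 1, -2]]
b1 = linear_combination(columns(A1), [4, 1, -2])

# Example 2.1.3 with t = 3: coefficients -8, -2, 1, 3 as in (2.19).
A2 = [[1, 1, 3, 2], [0, 2, 1, 1], [0, 0, 1, -1]]
b2 = linear_combination(columns(A2), [-8, -2, 1, 3])
```

Both computations give the right-hand sides of the original systems, which is the content of (2.17) and (2.19).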

Exercise 26 1. Write each expression as a product of a row and a column (this can be done in more than one way):

(a) 5x1 + 2x2 − 6x3


(b) ab + cd

(c) −8x1 − x2 + x3 − 2x4 + 3x5

(d) x1y1 − 7x2y2 + 9x3y3

(e) 2a1b1 + 4a2b2 − 17a3b3 + a4b4 − 13a5b5

2. Evaluate the products

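(a)
\[ \begin{bmatrix} -7 & 5 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \\ -3 \end{bmatrix} \]

(b)
\[ \begin{bmatrix} 8 & -1 & 3 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \]

(c)
\[ \begin{bmatrix} 3 & -2 & 5 \end{bmatrix} \begin{bmatrix} 2x_1 \\ 7x_2 \\ -x_3 \end{bmatrix} \]

(d)
\[ \begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} 2a \\ -3b \\ -4c \end{bmatrix} \]

3. Evaluate as a single column

(a)
\[ \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} 5 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} 2 \]

(b)
\[ \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} 5 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} 2 + \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} (-4) \]

(c)
\[ \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} x_1 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} x_2 + \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} x_3 \]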

(d)

−234

x1 +

6−7−2

x2 +

1−35

x3 +

−9113

x3

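4. In the following express (i) the column b as a single matrix expression in the form b = Ax, and (ii) each row i of Ax as a product of row i of A and x. (See Chapter 3, subsection 3.1.6, where we look at this more thoroughly.)

(a)
\[ b = \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} 5 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} 2 \]

(b)
\[ b = \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} 5 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} 2 + \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} (-4) \]

(c)
\[ b = \begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix} x_1 + \begin{bmatrix} 6 \\ -7 \\ -2 \end{bmatrix} x_2 + \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} x_3 \]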


2.2.6 Permissible operations on equations: Elementary row operations. Equivalent row operations on the augmented matrix R = [A, b]

The row operations which do not alter the solution set of a system Ax = b of m simultaneous linear equations for n unknowns are:

1. Interchange equations i and j (interchange rows Ri and Rj of the corresponding array of detached coefficients). This is indicated symbolically by Ri ↔ Rj.

2. Multiply equation i (multiply row Ri of the corresponding array) by a scalar β ≠ 0.

3. For any scalar γ and i ≠ j, add γ times equation i to equation j (add γ times row Ri of the corresponding array to row Rj).

The above three operations done on an array are called elementary row operations (abbreviated as eros). Note that forming βRi is just like multiplying a vector in 3-space by a scalar, except that now the vector has n + 1 entries. The type (3) operation replaces Rj with γRi + Rj, where addition of rows is addition of vectors.

What is obvious (and assumed at school) is that if x is a simultaneous solution of the m given equations before any one of the above eros is performed, then x remains a solution after the operation is performed.

To see that the reverse is true, suppose we have just performed the elementary row operation of type (3) and that x is a solution to the new system of equations. Adding −γ times equation i (−γ times row Ri of the augmented array) to equation j (row Rj of the array) in the new system of equations (array) brings us back to the system before the type (3) operation was performed. This shows that x is also a solution to the old system.

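For example, consider Example 2.1.2 again and the first step toward a solution:

\[
\begin{array}{rrrr}
1 & 1 & 3 & -1 \\
1 & 1 & 4 & -3 \\
-1 & 1 & -2 & 1
\end{array}
\]

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 0 & 1 & -2 & -R_1 + R_2 \\
-1 & 1 & -2 & 1 &
\end{array}
\]

Adding row 1 to row 2 of this array restores the original array:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
1 & 1 & 4 & -3 & R_1 + R_2 \\
-1 & 1 & -2 & 1 &
\end{array}
\]

The following more general setting refers to the array (2.21). Suppose we add γ times Row 2 to Row 3, obtaining

\[
\begin{array}{ccccccl}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & b_1 & \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & b_2 & \\
\gamma a_{21} + a_{31} & \gamma a_{22} + a_{32} & \gamma a_{23} + a_{33} & \gamma a_{24} + a_{34} & \gamma a_{25} + a_{35} & \gamma b_2 + b_3 & \gamma R_2 + R_3 \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & b_4 &
\end{array}
\]

Then,

\[
\begin{array}{ccccccl}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & b_1 & \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & b_2 & \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & b_3 & -\gamma R_2 + R_3 \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & b_4 &
\end{array}
\]

brings us back to the original array.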
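The three elementary row operations and the way each one is undone can be sketched in a few lines of Python. The function names `swap`, `scale` and `add_multiple` are illustrative assumptions; rows are counted from 1 as in the notes, and each function returns a new array rather than changing its argument.

```python
from fractions import Fraction

def swap(R, i, j):
    """Type 1: interchange rows R_i and R_j."""
    R = [row[:] for row in R]
    R[i - 1], R[j - 1] = R[j - 1], R[i - 1]
    return R

def scale(R, i, beta):
    """Type 2: replace R_i by beta * R_i, where beta must be non-zero."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    R = [row[:] for row in R]
    R[i - 1] = [beta * v for v in R[i - 1]]
    return R

def add_multiple(R, gamma, i, j):
    """Type 3: replace R_j by gamma * R_i + R_j, where i != j."""
    if i == j:
        raise ValueError("i and j must differ")
    R = [row[:] for row in R]
    R[j - 1] = [gamma * u + v for u, v in zip(R[i - 1], R[j - 1])]
    return R

# Array of detached coefficients of Example 2.1.2.
R = [[1, 1, 3, -1],
     [1, 1, 4, -3],
     [-1, 1, -2, 1]]

step = add_multiple(R, -1, 1, 2)        # -R1 + R2
restored = add_multiple(step, 1, 1, 2)  # R1 + R2 undoes the step
```

Each operation has an inverse of the same type: a swap undoes itself, scaling by β is undone by scaling by 1/β, and adding γRi is undone by adding −γRi.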


Remark 27 For convenience we have been using Ri for row i of an array R. In the next chapter we will use a slightly different notation for row i of a matrix.

2.2.7 Arrays in row echelon form (row EF): Proper Definition

An array is in row echelon form if

1. All zero rows come after the non-zero rows.

2. For non-zero rows, the first non-zero entry in row i + 1 comes later than the first non-zero entry in row i.

Remark 28 Row echelon forms are also called upper triangular.

2.2.8 How an array gets reduced to row echelon form

At the start, and at any later stage, we arrange the rows so that

(a) all zero rows come after non-zero rows and

(b) for the non-zero rows, the first non-zero entry in row i + 1 does not come earlier than the first non-zero entry in row i.

Suppose that only the first r rows are non-zero and in row echelon form. The first non-zero entry in row r is called the pivot element and row r is the pivot or reference row. The reference row will be used in the next step.

We assume that (a) and (b) hold. If the remaining rows are zero we are finished. Otherwise, multiples of row r are added to rows below it in such a way that all entries below the pivot element become zeros. The first r + 1 rows are then in echelon form and this remains the case after arranging rows r + 1, r + 2, ... so that (a) and (b) hold.

The process ends when the whole array is in row echelon form and then each non-zero row has a pivot element.

The columns of the array containing a pivot element are called pivot columns. (This will be used in 2.2.15 below and also in subsection 3.4.13).
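The reduction just described can be sketched as a short routine. This is one possible implementation under stated assumptions, not the notes' own algorithm: instead of rearranging all rows so that (a) and (b) hold, it simply interchanges the reference position with the first row below it that has a non-zero entry in the current column, which has the same effect. The name `row_echelon` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def row_echelon(M):
    """Return (R, pivots): a row echelon form of M and its pivot columns (0-based)."""
    R = [[Fraction(v) for v in row] for row in M]
    m, n = len(R), len(R[0])
    pivots = []
    r = 0                                    # position of the next reference row
    for c in range(n):
        # first row at or below position r with a non-zero entry in column c
        k = next((i for i in range(r, m) if R[i][c] != 0), None)
        if k is None:
            continue                         # no pivot in this column
        R[r], R[k] = R[k], R[r]              # type 1 operation: interchange
        for i in range(r + 1, m):            # type 3: clear entries below the pivot
            gamma = -R[i][c] / R[r][c]
            R[i] = [gamma * u + v for u, v in zip(R[r], R[i])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

# Array of detached coefficients of Example 2.1.2.
R, pivots = row_echelon([[1, 1, 3, -1],
                         [1, 1, 4, -3],
                         [-1, 1, -2, 1]])
```

For this input the routine reproduces the array (2.16). Since a row echelon form is not unique, other inputs may give a result that differs from a hand computation by the choice of reference rows.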

2.2.9 Further examples

Example 2.2.1 Reduce to row echelon form

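\[
\begin{array}{rrr}
-3 & -10 & 2 \\
7 & 5 & 6
\end{array}
\]

Solution (method 1)

\[
\begin{array}{rrrl}
1 & \frac{10}{3} & -\frac{2}{3} & \left( -\tfrac{1}{3} R_1 \right) \\
7 & 5 & 6 &
\end{array}
\]

\[
\begin{array}{rrrl}
1 & \frac{10}{3} & -\frac{2}{3} & \text{ref row, pivot} = 1 \\
0 & -\frac{55}{3} & \frac{32}{3} & -7R_1 + R_2
\end{array}
\]

Remark 29 This illustrates how the pivot element can always be made 1.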


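Method 2 (Fraction-free solution)

\[
\begin{array}{rrrl}
-21 & -70 & 14 & 7R_1 \ \text{ref row} \\
21 & 15 & 18 & 3R_2
\end{array}
\]

\[
\begin{array}{rrrl}
-21 & -70 & 14 & \\
0 & -55 & 32 & R_1 + R_2
\end{array}
\]

Remark 30 This example illustrates how it is possible to obtain a fraction-free answer when all entries are integers or even fractions.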

Example 2.2.2 Reduce the array of detached coefficients to row echelon form and hence solve

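\[
\begin{bmatrix} 1 & 0 & 2 \\ 3 & 0 & 5 \\ -5 & 1 & -10 \end{bmatrix} x = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
\]

Solution:

The augmented matrix (without brackets) is

\[
\begin{array}{rrrrl}
1 & 0 & 2 & -1 & \text{ref row} \\
3 & 0 & 5 & 1 & \\
-5 & 1 & -10 & 2 &
\end{array}
\]

Elementary row operations (eros) bringing the detached array into echelon form are:

\[
\begin{array}{rrrrl}
1 & 0 & 2 & -1 & \\
0 & 0 & -1 & 4 & -3R_1 + R_2 \\
0 & 1 & 0 & -3 & 5R_1 + R_3
\end{array}
\]

then

\[
\begin{array}{rrrrl}
1 & 0 & 2 & -1 & \\
0 & 1 & 0 & -3 & R_2 \leftrightarrow R_3 \\
0 & 0 & -1 & 4 &
\end{array}
\]

From this row echelon form we find the unique solution

\[ x = \begin{bmatrix} 7 \\ -3 \\ -4 \end{bmatrix} \]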
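Reading the solution off a triangular array is the back-substitution already used in Section 2.1. The sketch below covers only the simplest case, assumed here for illustration: n equations in n unknowns whose echelon form has a non-zero pivot in every diagonal position, so that the solution is unique. The function name `back_substitute` is hypothetical.

```python
from fractions import Fraction

def back_substitute(R):
    """Solve a triangular augmented array with n rows, n unknowns and
    non-zero pivots on the diagonal; the last column is the right-hand side."""
    n = len(R)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):           # last equation first
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(R[i][n] - known) / R[i][i]
    return x

# Echelon form found in Example 2.2.2.
x = back_substitute([[1, 0, 2, -1],
                     [0, 1, 0, -3],
                     [0, 0, -1, 4]])
```

When some column has no pivot, as in Example 2.1.3, the corresponding unknown becomes a parameter and this simple routine no longer applies.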

Example 2.2.3 Triangulate and hence solve

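\[
\begin{bmatrix} 3 & -3 & 1 & 10 \\ 2 & -3 & -1 & 3 \\ -1 & 2 & 0 & 3 \\ 1 & -1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ -6 \\ 4 \end{bmatrix}
\tag{2.22}
\]

Solution:

The initial array of detached coefficients is

\[
\begin{array}{rrrrr}
3 & -3 & 1 & 10 & 2 \\
2 & -3 & -1 & 3 & 0 \\
-1 & 2 & 0 & 3 & -6 \\
1 & -1 & 1 & 2 & 4
\end{array}
\]

The steps giving equivalent systems are:

Step 1:

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & R_1 \leftrightarrow R_4 \ \text{ref row} \\
2 & -3 & -1 & 3 & 0 & \\
-1 & 2 & 0 & 3 & -6 & \\
3 & -3 & 1 & 10 & 2 &
\end{array}
\]

We have interchanged rows 1 and 4 as it is convenient to have a 1 as pivot element.

Step 2:

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & \\
0 & -1 & -3 & -1 & -8 & -2R_1 + R_2 \ \text{new ref row} \\
0 & 1 & 1 & 5 & -2 & R_1 + R_3 \\
0 & 0 & -2 & 4 & -10 & (-3)R_1 + R_4
\end{array}
\]

At this point the first two rows are in row echelon form (row EF).

Step 3. Use row 2 for reference row to obtain

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & \\
0 & -1 & -3 & -1 & -8 & \\
0 & 0 & -2 & 4 & -10 & R_2 + R_3 \ \text{new ref row} \\
0 & 0 & -2 & 4 & -10 &
\end{array}
\]

Step 4. Row 3 is the new reference row:

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & \\
0 & -1 & -3 & -1 & -8 & \\
0 & 0 & -2 & 4 & -10 & \\
0 & 0 & 0 & 0 & 0 & -R_3 + R_4
\end{array}
\]

This is in echelon form but we can simplify still further:

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & \\
0 & 1 & 3 & 1 & 8 & -R_2 \\
0 & 0 & 1 & -2 & 5 & \frac{1}{-2} R_3 \\
0 & 0 & 0 & 0 & 0 &
\end{array}
\tag{2.23}
\]

With x4 = t as the parameter, the required general solution to Example 2.2.3 is x3 = 2t + 5, x2 = −7t − 7 and x1 = −11t − 8. Or,

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -11t - 8 \\ -7t - 7 \\ 2t + 5 \\ t \end{bmatrix}
\tag{2.24}
\]

As a linear combination of the columns of the coefficient matrix,

\[
\begin{bmatrix} 3 \\ 2 \\ -1 \\ 1 \end{bmatrix} (-11t - 8) + \begin{bmatrix} -3 \\ -3 \\ 2 \\ -1 \end{bmatrix} (-7t - 7) + \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix} (2t + 5) + \begin{bmatrix} 10 \\ 3 \\ 3 \\ 2 \end{bmatrix} t = \begin{bmatrix} 2 \\ 0 \\ -6 \\ 4 \end{bmatrix}
\tag{2.25}
\]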

Remark 31 Each column vector has 4 entries instead of 3 and so we cannot visualize these vectors in space. Nevertheless, this should not be a problem. In Chapter 3 we will consider n-dimensional vectors. See subsections 3.1.2 and 3.1.9.

Remark 32 Although we associate arrays with simultaneous equations, it is important to realize that such operations can be done on any arrays, i.e. matrices, without any reference to solving equations in the usual sense. Note however the following: Reducing a matrix A to row echelon form amounts to reducing the augmented matrix [A, 0] to row echelon form. This in turn amounts to solving Ax = 0.

Example 2.2.4 Reduce the following matrix to row echelon form:


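\[
A = \begin{bmatrix}
0 & 0 & -3 & -3 & -1 \\
2 & -2 & -2 & -3 & -2 \\
0 & 0 & 3 & -2 & -2 \\
2 & -3 & -4 & 3 & -3
\end{bmatrix}
\]

Solution:

In order for (b) in 2.2.8 to hold, first interchange rows 1 and 4 of A:

\[
\begin{array}{rrrrrl}
2 & -3 & -4 & 3 & -3 & R_1 \leftrightarrow R_4 \ \text{ref row} \\
2 & -2 & -2 & -3 & -2 & \\
0 & 0 & 3 & -2 & -2 & \\
0 & 0 & -3 & -3 & -1 &
\end{array}
\]

Next:

\[
\begin{array}{rrrrrl}
2 & -3 & -4 & 3 & -3 & \\
0 & 1 & 2 & -6 & 1 & -R_1 + R_2 \\
0 & 0 & 3 & -2 & -2 & \text{new ref row} \\
0 & 0 & -3 & -3 & -1 &
\end{array}
\]

Finally,

\[
\begin{array}{rrrrrl}
2 & -3 & -4 & 3 & -3 & \\
0 & 1 & 2 & -6 & 1 & \\
0 & 0 & 3 & -2 & -2 & \\
0 & 0 & 0 & -5 & -3 & R_3 + R_4
\end{array}
\tag{2.26}
\]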

The matrix is now in row echelon form.

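Exercise 33 1. Express the following simultaneous equations in the form Ax = b and use Gauss-reduction and detached coefficients to find solutions. Display the final array in row echelon form. Also, express b as a linear combination of the columns of A whenever this is possible.

(a) i.
\[ \begin{array}{rcr} x_1 - 2x_2 & = & -1 \\ 3x_1 - x_2 & = & 1 \end{array} \]

ii.
\[ \begin{array}{rcr} x - 2y & = & -3 \\ 2x - 4y & = & -1 \end{array} \]

iii.
\[ \begin{array}{rcr} x - 2y & = & -3 \\ 2x - 4y & = & -6 \end{array} \]

iv.
\[ \begin{array}{rcr} x_1 - 2x_2 & = & -3 \\ 3x_1 - x_2 & = & 1 \\ 4x_1 - 3x_2 & = & 5 \end{array} \]

v.
\[ \begin{array}{rcr} x - 2y & = & -3 \\ 2x + y & = & 4 \\ 3x - y & = & 1 \end{array} \]

vi. 0x + 0y = 0

vii. 3x + 0y = −3

(b) i.
\[ \begin{array}{rcr} x_1 - 2x_2 - x_3 & = & 0 \\ 3x_2 + x_3 & = & 1 \end{array} \]

ii.
\[ \begin{array}{rcr} x_1 + x_2 + x_3 - x_4 & = & 1 \\ 2x_2 - x_3 + 4x_4 & = & -1 \\ x_4 & = & -2 \end{array} \]

iii. x1 + 2x2 + x3 = −1

iv. x1 + 2x2 + x3 − 5x4 = 0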

2. Solve the equations of No.(12) in Exercise 15.

3. (a) In solving simultaneous equations, if we drop an equation the resulting solution set contains and is usually larger than the original solution set. Illustrate this with an example.

(b) Rule 3 of permissible row operations in 2.2.6 says: For any scalar γ and i ≠ j, add γ times row Ri of the array to row Rj. Why must we have i ≠ j?

(c) In an array with rows R1, ..., Rm, let i ≠ j. Suppose that at least one of the numbers α, β is not zero and form the row αRi + βRj. Under what conditions can this new row replace row i or row j? Describe what is going on in terms of the strict rules of 2.2.6.

4. * (Simplified Elementary Row Operations). Consider the following simplified set of two elementary row operations that can be applied to an array. Show that each row operation in 2.2.6 can be obtained by applying a sequence of these two operations. In other words, the two row operations are equivalent to those in 2.2.6.

(a) Multiply row Ri of the array by a scalar β ≠ 0.

(b) For i ≠ j add row Ri to row Rj.

Hint 34 In order to add αRi to Rj when i ≠ j, we may obviously assume α ≠ 0. Perform (a) with β = α then do (b), i.e. add the resulting row i to row j. What must be done next? It still remains to show that we can interchange rows Ri and Rj of the array using operations like (a) and (b). Try adding row i to row j, then changing the sign of row i.

2.2.10 Homogeneous Equations

A system Ax = b of m equations in n unknowns is called homogeneous if in the right-hand side all bi = 0. Such a system always has at least one solution, namely x = 0, the column vector with all n entries xj = 0. This is known as the trivial solution. The existence of non-trivial solutions is the subject of subsection 2.2.15.

Every system defines a corresponding homogeneous system.


Example 2.2.5 Consider the following homogeneous system of equations:

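\[
\begin{bmatrix} 3 & -3 & 1 & 10 \\ 2 & -3 & -1 & 3 \\ -1 & 2 & 0 & 3 \\ 1 & -1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\tag{2.27}
\]

This is the homogeneous system corresponding to Example 2.2.3. We found the row echelon form of the detached coefficients for that example as (2.23). The row EF for the homogeneous system (2.27) is therefore

\[
\begin{array}{rrrrr}
1 & -1 & 1 & 2 & 0 \\
0 & 1 & 3 & 1 & 0 \\
0 & 0 & 1 & -2 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}
\]

With x4 = t ∈ ℝ any real number, the general solution of (2.27) is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -11t \\ -7t \\ 2t \\ t \end{bmatrix}
\tag{2.28}
\]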

Remark 35 As remarked earlier, when solving a homogeneous system Ax = 0, the column 0 can simply be ignored in the array of detached coefficients.

Exercise 36 In Question 1 we also give the answers to the row-reduced echelon forms asked for in No.1 of Exercise 41.

1. Find a row echelon form of the following matrices:

(a)
\[ A = \begin{bmatrix} 1 & 2 & -3 & -1 \\ -3 & 4 & 1 & 2 \\ -2 & 6 & -2 & 1 \\ 4 & -2 & -4 & -3 \end{bmatrix} \]

(b)
\[ B = \begin{bmatrix} 1 & -3 & -2 & 4 \\ 2 & 4 & 6 & -2 \\ -3 & 1 & -2 & -4 \\ -1 & 2 & 1 & -3 \end{bmatrix} \]

(c) How are the matrices A and B related?

2. Let:

\[ A = \begin{bmatrix} 1 & 1 & 2 & 1 & -3 \\ 2 & 1 & 5 & 2 & -7 \\ 0 & -1 & 3 & 5 & -4 \\ -2 & 1 & -3 & 9 & 3 \end{bmatrix} \]

(a) Reduce A to echelon form.

(b) Solve the homogeneous system Ax = 0. Express 0 as a linear combination of the columns of A in a non-trivial way.

Hint 37 For coefficients take the values of xi in a homogeneous solution that are not all zero. (The trivial linear combination is the one with all coefficients 0.)


Answer:

\[ x = \begin{bmatrix} t \\ -t \\ -3t \\ 0 \\ -2t \end{bmatrix} \]

Anticipating the notation of Chapter 3, let A•j represent column j of A. Then letting t = 1,

A•1 (1) + A•2 (−1) + A•3 (−3) + A•5 (−2) = 0

The columns of the above A are said to be linearly dependent since one column can be expressed as a linear combination of the others.

Linear independence and dependence will be studied formally in Chapter 3, section 3.4. See also the theorem 2.2.15 below.

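3. (a) Reduce the array of detached coefficients to echelon form and solve

\[ \begin{bmatrix} 1 & -1 & 1 & 2 \\ 2 & -2 & -1 & 3 \\ -1 & 1 & 0 & 3 \\ 1 & -2 & 2 & 5 \end{bmatrix} x = \begin{bmatrix} 4 \\ 0 \\ -6 \\ 5 \end{bmatrix} \]

(b) Show that whatever the RHS b may be, the system always has a unique solution.

4. (a) Solve for x:

\[ \begin{bmatrix} 1 & 1 & 2 & 1 \\ 2 & 1 & 5 & 2 \\ 0 & -1 & 3 & 5 \\ 3 & 1 & 10 & 8 \end{bmatrix} x = \begin{bmatrix} -3 \\ -7 \\ -4 \\ -14 \end{bmatrix} \]

(b) Solve the corresponding homogeneous system for x:

\[ \begin{bmatrix} 1 & 1 & 2 & 1 \\ 2 & 1 & 5 & 2 \\ 0 & -1 & 3 & 5 \\ 3 & 1 & 10 & 8 \end{bmatrix} x = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]

5. (a) Solve for x:

\[ \begin{bmatrix} 1 & -1 & 1 & 2 \\ 2 & -2 & -1 & 3 \end{bmatrix} x = \begin{bmatrix} 4 \\ 0 \end{bmatrix} \]

(This consists of the first two equations of No.(3).)

(b) Solve the homogeneous system for x.

6. (a) What are the solutions x to

\[ \begin{bmatrix} 3 & 1 & 1 & 4 \\ 2 & 5 & 1 & 7 \\ 1 & 9 & 1 & 10 \end{bmatrix} x = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} \ ? \]

(b) Solve the corresponding homogeneous system.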

7. If you have one solution to 4a and the general solution of 4b, show that you can write down the general solution of 4a.


8. The same for 5a and 5b.

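9. Solve the following system for x and relate your solution to that of its homogeneous counterpart.

\[ \begin{bmatrix} 2 & 1 & 3 & 5 \\ -2 & -1 & -1 & -2 \\ 4 & 2 & 7 & 11 \\ 6 & 3 & 10 & 16 \end{bmatrix} x = \begin{bmatrix} -1 \\ 0 \\ -3 \\ -4 \end{bmatrix} \]

10. Can you state a general theorem of which Nos.(7), (8) and (9) are special cases? Suppose that you know (any) one particular solution to Example 2.2.3 (matrix equation (2.22)) and add to this the general solution (2.28) of the homogeneous Example 2.2.5. What is the result? Consider equation (2.24). We will return to this in Chapter 3. (See Exercise 55, No.4).

11. (a) Solve the system for x, y, z:

\[ \begin{bmatrix} 1 & 1 & -1 \\ -2 & 3 & 4 \\ -1 & 4 & 1 \\ 4 & -1 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -3 \\ 7 \\ -2 \\ 2 \end{bmatrix} \]

(b) Find the homogeneous solution to (a).

12. Reduce to row EF

\[ \begin{bmatrix} 2 & 1 & 3 & 5 \\ -2 & -1 & -1 & -2 \\ 4 & 2 & 7 & 11 \\ 6 & 3 & 10 & 16 \end{bmatrix} \]

13. * (Allenby, p.30) For any a, b, c and d discuss the possible solutions x of

\[ \begin{bmatrix} 1 & 2 & -1 & 3 & -1 \\ 3 & 4 & 2 & 5 & -2 \\ 1 & 0 & 4 & -1 & 0 \\ 3 & 2 & 7 & 1 & -1 \end{bmatrix} x = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \]

Find also the corresponding homogeneous solution.

14. For which scalars α and β is the system

\[ \begin{bmatrix} 1 & 2 & -3 \\ 3 & -1 & 2 \\ 1 & -5 & 8 \end{bmatrix} x = \begin{bmatrix} \alpha \\ \beta \\ -\alpha \end{bmatrix} \]

consistent? (Recall that a system of equations is consistent (or compatible) if it has at least one solution). If so, find the general solution x.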

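15. Show that the system

\[ \begin{bmatrix} 1 & 1 & \lambda \\ 3 & 4 & 2 \\ 2 & 3 & -1 \end{bmatrix} x = \begin{bmatrix} 2 \\ \lambda \\ 1 \end{bmatrix} \]

is consistent for any scalar λ. Find those values of λ for which

i the system has a unique solution x and

ii more than one solution x.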


16. Let Ax = b represent a system of m equations for n unknowns. Which of the following is a true statement? Give reasons, in particular, a counterexample if a statement is false. (Usually the simplest counterexample is the best one.)

(a) The system always has a solution x if m < n.

(b) The system always has a solution x if m = n.

(c) The system never has a solution x if m > n (more equations than unknowns).

(d) The system always has a solution if it is homogeneous.

(e) The system always has a non-trivial solution if it is homogeneous.

2.2.11 Row Reduced Echelon Form (row REF)

A matrix A = [aij ] is said to be in row reduced echelon form (row REF) if

1. It is in row echelon form.

2. If aij (the pivot entry) is the first non-zero entry in row i, then

(a) aij = 1 and

(b) all other entries in column j are zeros.

2.2.12 Some examples in row reduced echelon form

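\[
\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 3 & -5 & 0 & 0 & 7 \\ 0 & 0 & 0 & 1 & 0 & -2 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{bmatrix}
\]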

2.2.13 Any matrix can be brought into row reduced echelon form by suitable elementary row operations

Let A be in row echelon form. Suppose that A has r non-zero rows. We first divide each non-zero row by its leading (first non-zero) entry. So we may suppose that the leading entry of row r is $a_{rj} = 1$. Use row r as reference row and $a_{rj}$ as a pivot element to reduce all entries above it to zeros. Now repeat the process with row r − 1 etc., the last reference row being row 2. The resulting array is in row reduced echelon form (row REF).

Note that in converting a row EF to a row REF all the pivot elements remain in the same positions but become 1s. The process is known as the Gauss-Jordan method.

Remark 38 Although a matrix does not in general have a unique row echelon form (which ones do?), it can be shown that the row reduced echelon form is unique.

2.2.14 Some examples of Gauss-Jordan reduction

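Example 2.2.6 For our first example let us reduce the array of Example 2.2.1 to row reduced echelon form. We found the row EF

\[
\begin{array}{rrr}
1 & \tfrac{10}{3} & -\tfrac{2}{3} \\
0 & -\tfrac{55}{3} & \tfrac{32}{3}
\end{array}
\]

Step 1:

\[
\begin{array}{rrrl}
1 & \tfrac{10}{3} & -\tfrac{2}{3} & \\
0 & 1 & -\tfrac{32}{55} & \left(-\tfrac{3}{55}\right) R_2 \quad \text{ref row}
\end{array}
\]

Step 2:

\[
\begin{array}{rrrl}
1 & 0 & \tfrac{14}{11} & -\tfrac{10}{3} R_2 + R_1 \\
0 & 1 & -\tfrac{32}{55} &
\end{array}
\]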

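Example 2.2.7 As a second example consider Example 2.1.2. Its row echelon form was found in (2.16):

\[
\begin{array}{rrrr}
1 & 1 & 3 & -1 \\
0 & 2 & 1 & 0 \\
0 & 0 & 1 & -2
\end{array}
\]

The following steps bring it into row REF.

Step 1:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 1 & \tfrac{1}{2} & 0 & \tfrac{1}{2} R_2 \\
0 & 0 & 1 & -2 & \text{ref row, pivot } a_{33} = 1
\end{array}
\]

Step 2:

\[
\begin{array}{rrrrl}
1 & 1 & 0 & 5 & -3R_3 + R_1 \\
0 & 1 & 0 & 1 & -\tfrac{1}{2} R_3 + R_2 \quad \text{new ref row, pivot } a_{22} = 1 \\
0 & 0 & 1 & -2 &
\end{array}
\]

Step 3:

\[
\begin{array}{rrrrl}
1 & 0 & 0 & 4 & -R_2 + R_1 \\
0 & 1 & 0 & 1 & \\
0 & 0 & 1 & -2 &
\end{array}
\]

We read off the same solution (2.3) as before, but now it can be read directly from the array.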

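Example 2.2.8 Consider Example 2.1.3 again. From (2.18) we already have the row EF of detached coefficients:

\[
\begin{array}{rrrrr}
1 & 1 & 3 & 2 & -1 \\
0 & 2 & 1 & 1 & 0 \\
0 & 0 & 1 & -1 & -2
\end{array}
\]

The following steps bring it into row REF.

Step 1:

\[
\begin{array}{rrrrrl}
1 & 1 & 3 & 2 & -1 & \\
0 & 1 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & \tfrac{1}{2} R_2 \\
0 & 0 & 1 & -1 & -2 & \text{ref row, pivot } a_{33} = 1
\end{array}
\]

Step 2:

\[
\begin{array}{rrrrrl}
1 & 1 & 0 & 5 & 5 & -3R_3 + R_1 \\
0 & 1 & 0 & 1 & 1 & -\tfrac{1}{2} R_3 + R_2 \quad \text{new ref row, pivot } a_{22} = 1 \\
0 & 0 & 1 & -1 & -2 &
\end{array}
\]

Step 3:

\[
\begin{array}{rrrrrl}
1 & 0 & 0 & 4 & 4 & -R_2 + R_1 \\
0 & 1 & 0 & 1 & 1 & \\
0 & 0 & 1 & -1 & -2 &
\end{array}
\]

Reading off the solution gives (unsurprisingly) the same solution (2.5) that we found before.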


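Example 2.2.9 Example 2.1.4 (equation (2.7)) has a row echelon form (2.20):

\[
\begin{array}{rrrr}
1 & 1 & 3 & -1 \\
0 & 2 & 1 & 0 \\
0 & 0 & 0 & 7
\end{array}
\]

The following steps bring it into row REF:

\[
\begin{array}{rrrrl}
1 & 1 & 3 & -1 & \\
0 & 1 & \tfrac{1}{2} & 0 & \tfrac{1}{2} R_2 \\
0 & 0 & 0 & 1 & \tfrac{1}{7} R_3
\end{array}
\]

\[
\begin{array}{rrrrl}
1 & 1 & 3 & 0 & R_3 + R_1 \\
0 & 1 & \tfrac{1}{2} & 0 & \\
0 & 0 & 0 & 1 &
\end{array}
\]

\[
\begin{array}{rrrrl}
1 & 0 & \tfrac{5}{2} & 0 & -R_2 + R_1 \\
0 & 1 & \tfrac{1}{2} & 0 & \\
0 & 0 & 0 & 1 &
\end{array}
\tag{2.29}
\]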

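Example 2.2.10 As a final example, consider Example 2.2.3. We found the row echelon form (2.23):

\[
\begin{array}{rrrrrl}
1 & -1 & 1 & 2 & 4 & \\
0 & 1 & 3 & 1 & 8 & \\
0 & 0 & 1 & -2 & 5 & \text{ref row} \\
0 & 0 & 0 & 0 & 0 &
\end{array}
\]

It is now two small steps to getting the reduced form.

Step 1:

\[
\begin{array}{rrrrrl}
1 & -1 & 0 & 4 & -1 & -R_3 + R_1 \\
0 & 1 & 0 & 7 & -7 & -3R_3 + R_2 \quad \text{ref row} \\
0 & 0 & 1 & -2 & 5 & \\
0 & 0 & 0 & 0 & 0 &
\end{array}
\]

Step 2:

\[
\begin{array}{rrrrrl}
1 & 0 & 0 & 11 & -8 & R_2 + R_1 \\
0 & 1 & 0 & 7 & -7 & \\
0 & 0 & 1 & -2 & 5 & \\
0 & 0 & 0 & 0 & 0 &
\end{array}
\]

The array of detached coefficients of Example 2.2.3 therefore has row REF

\[
\begin{array}{rrrrr}
1 & 0 & 0 & 11 & -8 \\
0 & 1 & 0 & 7 & -7 \\
0 & 0 & 1 & -2 & 5 \\
0 & 0 & 0 & 0 & 0
\end{array}
\]

The required solution to Example 2.2.3 (matrix equation (2.22)) is (unsurprisingly) as before. We note that the row REF of the coefficient matrix of the same example is

\[
\begin{bmatrix} 1 & 0 & 0 & 11 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\tag{2.30}
\]

The homogeneous solution (2.28) to Example 2.2.5 can again be read off this array.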
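The reductions in these examples can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the function name `row_reduced_echelon_form` is our own, and the routine works column by column from the left, clearing entries above and below each pivot, rather than bottom-up from a row EF as described in 2.2.13. Since the row REF is unique (Remark 38), both routes should arrive at the same array. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def row_reduced_echelon_form(rows):
    """Return the row REF of a matrix given as a list of rows (exact arithmetic)."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    m = len(M)
    n = len(M[0]) if m else 0
    pivot_row = 0
    for col in range(n):
        # Look for a row at or below pivot_row with a non-zero entry in this column.
        candidate = next((i for i in range(pivot_row, m) if M[i][col] != 0), None)
        if candidate is None:
            continue  # this is a non-pivot column
        M[pivot_row], M[candidate] = M[candidate], M[pivot_row]
        # Divide the reference row by its leading entry so that the pivot becomes 1.
        lead = M[pivot_row][col]
        M[pivot_row] = [entry / lead for entry in M[pivot_row]]
        # Use the reference row to clear every other entry in the pivot column.
        for i in range(m):
            if i != pivot_row and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return M


# Row EF of Example 2.2.7, array (2.16).
echelon = [[1, 1, 3, -1],
           [0, 2, 1, 0],
           [0, 0, 1, -2]]
reduced = row_reduced_echelon_form(echelon)
for row in reduced:
    print([str(entry) for entry in row])
```

Applied to the row EF of Example 2.2.9, the same function should reproduce the array (2.29).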


2.2.15 A homogeneous system with more unknowns than equations always has a non-trivial solution

More precisely, we will prove the following

Theorem: Let Ax = 0 be a homogeneous system of m equations in n unknowns and suppose that the row echelon form of A has r < n non-zero rows. Then there are solutions x in which the $x_j$ corresponding to the k = n − r non-pivot columns can have arbitrary values. In other words, the solution set has k parameters, each of which can be arbitrary.

Proof: Consider the row REF A′′ of A. The matrix A′′ has r non-zero rows and r pivot columns. Thus there are k = n − r non-pivot columns. By the nature of the row REF, the k variables $x_j$ corresponding to these columns can be given any values, and we can then solve uniquely for the remaining r variables. (See the illustrative examples below.)

Corollary: A homogeneous system with more unknowns than equations always has a non-trivial solution.

Proof: With the above notation, we have m < n and since the number r of non-zero rows in a row echelon form satisfies r ≤ m, we have r < n.

Note 39 In Corollary ?? of section ?? in chapter 3 there is an alternative approach to the topic.

Some illustrative examples

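1. Suppose that the row REF of a matrix A is

\[
A'' = \begin{bmatrix} 1 & 3 & -5 & 0 & 0 & 7 \\ 0 & 0 & 0 & 1 & 0 & -2 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{bmatrix}
\]

Here the non-pivot columns are columns 2, 3 and 6. The general solution to Ax = 0 is

\[
x = \begin{bmatrix} -3x_2 + 5x_3 - 7x_6 \\ x_2 \\ x_3 \\ 2x_6 \\ -3x_6 \\ x_6 \end{bmatrix}
= \begin{bmatrix} -3t_1 + 5t_2 - 7t_3 \\ t_1 \\ t_2 \\ 2t_3 \\ -3t_3 \\ t_3 \end{bmatrix}
\]

We have renamed the parameters as $t_1 = x_2$, $t_2 = x_3$ and $t_3 = x_6$.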

2. As a second example consider Example 2.2.5 (equation (2.27)) which in REF reads (see array (2.30)):

\[
\begin{bmatrix} 1 & 0 & 0 & 11 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix} x = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]

This shows that we are actually dealing with 3 equations for 4 unknowns. Column 4 of the coefficient matrix is the only non-pivot column. Hence $x_4 = t$ can have any value and we get the same solution as before (equation (2.28)).
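The way the general solution is read off a row REF in these two examples can be sketched in Python. This is only an illustration under stated assumptions: the helper `homogeneous_solution` is a hypothetical name of ours, and it expects a matrix that is already in row REF together with one value for each non-pivot column.

```python
from fractions import Fraction


def homogeneous_solution(rref_rows, parameters):
    """Solve A''x = 0 for a matrix A'' that is already in row REF.

    `parameters` supplies one value per non-pivot column, in column order.
    """
    n = len(rref_rows[0])
    # The pivot column of a non-zero row is the position of its leading 1.
    pivots = []
    for row in rref_rows:
        leading = next((j for j, entry in enumerate(row) if entry != 0), None)
        if leading is not None:
            pivots.append(leading)
    free_columns = [j for j in range(n) if j not in pivots]
    if len(parameters) != len(free_columns):
        raise ValueError("need exactly one parameter per non-pivot column")
    x = [Fraction(0)] * n
    for j, value in zip(free_columns, parameters):
        x[j] = Fraction(value)
    # Non-zero row i reads: x_pivot + (sum of a_ij * x_j over free columns j) = 0.
    for row, p in zip(rref_rows, pivots):
        x[p] = -sum((Fraction(row[j]) * x[j] for j in free_columns), Fraction(0))
    return x


# First illustrative example: non-pivot columns 2, 3 and 6.
A_reduced = [[1, 3, -5, 0, 0, 7],
             [0, 0, 0, 1, 0, -2],
             [0, 0, 0, 0, 1, 3]]
x = homogeneous_solution(A_reduced, [1, 1, 1])   # t1 = t2 = t3 = 1
print([str(entry) for entry in x])
```

With $t_1 = t_2 = t_3 = 1$ the formula above gives x = (−5, 1, 1, 2, −3, 1), which is what the sketch should print.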


2.3 Existence Theorem for solutions to Linear Equations

All our examples, including Example 2.1.4, have led to the following result:

Basic Existence Theorem for solutions to Ax = b

Let the system Ax = b of m equations for n unknowns have echelon form A′x = b′, where the augmented array [A′, b′] has r non-zero rows. Then the system has at least one solution if, and only if, row r of [A′, b′] is not of type $[\,0 \;\ldots\; 0 \;\; b'_r\,]$, where $b'_r \neq 0$.

Proof: Obviously, in the case that row r of [A′, b′] has the form $[\,0 \;\ldots\; 0 \;\; b'_r\,]$, where $b'_r \neq 0$, the system A′x = b′ (and so Ax = b) can have no solution x.

Conversely, if this is not the case, all the pivot columns of the REF [A′′, b′′] are columns of A′′. So, if pivot column s of A′′ has the pivot entry 1 in row i, let $x_s = b''_i$ and put $x_j = 0$ for non-pivot columns j of A′′. The resulting column vector x is then a solution to A′′x = b′′ and so to Ax = b.
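The construction in the proof can be followed step by step in a short Python sketch. It is an illustration only: the function name `particular_solution` is hypothetical, and the input is assumed to be an augmented array [A′′, b′′] that is already in row REF.

```python
from fractions import Fraction


def particular_solution(reduced_augmented):
    """Apply the existence theorem to an augmented array [A'', b''] in row REF.

    Returns one solution (non-pivot unknowns set to 0), or None if some row
    has the form [0 ... 0 | c] with c != 0.
    """
    n = len(reduced_augmented[0]) - 1  # number of unknowns
    x = [Fraction(0)] * n
    for row in reduced_augmented:
        leading = next((j for j, entry in enumerate(row) if entry != 0), None)
        if leading is None:
            continue              # zero row: imposes no condition
        if leading == n:
            return None           # pivot in the column of constants: no solution
        x[leading] = Fraction(row[n])  # x_s = b''_i for pivot column s
    return x


# Row REF of Example 2.2.10.
consistent = [[1, 0, 0, 11, -8],
              [0, 1, 0, 7, -7],
              [0, 0, 1, -2, 5],
              [0, 0, 0, 0, 0]]
# Row REF (2.29) of Example 2.2.9.
inconsistent = [[1, 0, Fraction(5, 2), 0],
                [0, 1, Fraction(1, 2), 0],
                [0, 0, 0, 1]]

print(particular_solution(consistent))
print(particular_solution(inconsistent))
```

For the first array the sketch sets $x_4 = 0$ and reads off $x_1 = -8$, $x_2 = -7$, $x_3 = 5$; for the second it reports that no solution exists.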

Remark 40 This theorem will be used to prove the basic theorem on linear independence inChapter 3.

Exercise 41 1. Find the row reduced echelon form of the matrices and arrays of detached coefficients in Exercise (36), numbers 1a, 1b, 2, 3, 4, 9, 11, 12 and 13. Solve (once more) the problems involving systems of equations Ax = b using the row REF. Write down the row REF of the corresponding coefficient matrix A. Illustrate the above theorem 2.2.15 in the homogeneous cases.

2. Let matrices A and B have the same number of rows. Let [A, B] be the matrix formed by attaching the columns of B to A. Suppose that [A, B] has been reduced to row (reduced) echelon form [A′, B′]. Show that A′ is then also in row (reduced) echelon form.

Hint 42 This should be clear from the observation that a zero row of A′ comes after a non-zero row of A′.

3. Consider the general problem of solving two linear equations for two unknowns $x_1$ and $x_2$:

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\tag{2.31}
\]

(a) In (2.31) all the entries except $x_1$ and $x_2$ are supposed known. Using formal Gauss reduction find formulae for the unknowns $x_1$ and $x_2$ in terms of the other symbols. For the purposes of reduction you may assume any number you like is non-zero. In fact, conclude that if only $d = a_{11}a_{22} - a_{21}a_{12} \neq 0$, then you have a formula for x. Can you see that the solution must be unique?

Remark 43 The number $a_{11}a_{22} - a_{21}a_{12}$ is called the determinant of the matrix $A = [a_{ij}]$ and is denoted by |A|. For example,

\[
\begin{vmatrix} 3 & -2 \\ 7 & 4 \end{vmatrix} = (3)(4) - (7)(-2) = 26
\]

We will study determinants in some detail in Chapter 4.

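Hint 44 Consider the array of detached coefficients and, assuming $a_{11} \neq 0$, the step

\[
\begin{array}{cccl}
a_{11} & a_{12} & b_1 & \\
0 & -\dfrac{a_{21}a_{12}}{a_{11}} + a_{22} & -\dfrac{a_{21}b_1}{a_{11}} + b_2 & -\dfrac{a_{21}}{a_{11}} R_1 + R_2
\end{array}
\tag{2.32}
\]

Now assume

\[
-\frac{a_{21}a_{12}}{a_{11}} + a_{22} \neq 0
\]

and solve for $x_2$. Next, find $x_1$.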

(b) Use your formula to solve for x:[ −5 11

3 −7

]x =

[ −213

]



2.4.1 Matrix equation of simultaneous equations

Given a system of m equations for n unknowns, this is expressed compactly by a matrix equation (see section 2.2)

Ax = b

where A is the m × n coefficient matrix, b is a column with m entries and x is a column with n entries.

Entry i of the column matrix Ax is the ‘dot’ product of row i of A and the column x (see subsection 2.2.3 and equation (2.13)):

\[
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} a_{i1}x_1 + \cdots + a_{in}x_n \end{bmatrix}
\]

2.4.2 Detached Coefficients

(See subsection 2.2.5).
The other way to express the system of equations Ax = b is as the augmented matrix, or array of detached coefficients

C = [A, b]

where the last column of C is understood to be the column b of constants.

2.4.3 Elementary row operations (eros). Row echelon and row reduced echelon form

(See subsections 2.2.6, 2.2.8 and 2.2.11).

Elementary row operations (eros), applied in the process known as Gauss reduction, are done on the array C, reducing it to row echelon form (row EF):

C′ = [A′, b′]

Further eros (Gauss-Jordan) done on C′ reduce it to row reduced echelon form (row REF):

C′′ = [A′′, b′′]

The important fact about elementary row operations on a system of equations (or its detached array of coefficients) is that they leave the solution set unchanged. For any particular x the equation Ax = b is satisfied if, and only if, the equation A′x = b′ is satisfied if, and only if, the equation A′′x = b′′ is satisfied.

2.4.4 Homogeneous equations

(See subsection 2.2.10).

A homogeneous system of equations has b = 0 and consequently b′ = b′′ = 0, since an elementary row operation done on 0 leaves it as 0.

A′′x = 0 holds.

A system of homogeneous

Chapter 3

Linear Transformations and Matrices

In this chapter we will learn how the solution to a set of linear equations is the inverse operation of the interpretation of a matrix A as a transformation or mapping. In this sense A transforms a vector x into another vector y = Ax. Examples in ordinary space are projections on planes and lines through the origin, reflections in such objects and rotations about an axis through O (important for mechanics).

Next, we deal with the difficult idea of linear independence, and it is followed by a section on subspaces that can be regarded as an introduction to the idea of a general ‘vector space’. Finally, there is a section on inverses of matrices.

3.1 The Algebra of Vectors and Matrices

We begin by developing some systematic notation for m × n matrices, their rows and columns and for column and row vectors in general.

3.1.1 Systematic Notation For Matrices

1. Recall from section 2.2 that an m × n matrix is a rectangular array A with m rows and n columns. In standard notation, $A = [a_{ij}]$ where the i−j entry $a_{ij} \in \mathbb{R}$ is at the intersection of row i and column j.
The set of m × n matrices is denoted by $\mathbb{R}^{m\times n}$. Hence $A \in \mathbb{R}^{m\times n}$ means that A has m rows and n columns and that the size of A is m × n. Some authors write $A = A_{m\times n}$ to indicate the size of A.

Alternative notation If $A = [a_{ij}]$, we will also find it most convenient to use the notation

\[
[A]_{ij} = a_{ij} \tag{3.1}
\]

to describe the i−j entry $a_{ij}$ of A.

An n × n matrix is called square of size n. Such matrices deserve to be studied in their own right (see section 3.7).

2. From the definitions it follows that $\mathbb{R}^{n\times 1}$ is the set of columns with n entries. We call these column vectors and write

\[
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \tag{3.2}
\]

$\mathbb{R}^{1\times n}$ denotes the set of row vectors with n entries:

\[
a = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix} \tag{3.3}
\]

For typographical reasons this row vector is sometimes written $[a_1, \ldots, a_n]$. It is traditional to use $\mathbb{R}^n$ for both $\mathbb{R}^{n\times 1}$ (column vectors) and $\mathbb{R}^{1\times n}$ (row vectors). We will only use the notation $\mathbb{R}^n$ if the context makes it quite clear whether rows or columns are meant.

3. The m × n zero matrix has 0 for all its entries and is written $O_{m\times n}$, or simply as O when the size is understood.

4. Addition of matrices
Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be of the same m × n size. Their sum is A + B = C where the (i, j)-entry of C is $c_{ij} = a_{ij} + b_{ij}$. In brief, using the notation of equation (3.1),

\[
[A + B]_{ij} = [A]_{ij} + [B]_{ij} \tag{3.4}
\]

It is essential to note that the addition A + B makes sense only if A and B have the same size.

5. Multiplication of a matrix by a scalar.
Let $A = [a_{ij}]$ and $s \in \mathbb{R}$. The product sA = As has the number $sa_{ij}$ in the (i, j)-position:

\[
[sA]_{ij} = [As]_{ij} = s\,[A]_{ij} = [A]_{ij}\,s
\]

Remark 45 Notice that it makes no difference if we write the scalar s on the left or on the right of A. The matrix −A is defined as (−1)A and has for (i, j) entry the number $-a_{ij}$, i.e., $[(-1)A]_{ij} = -[A]_{ij}$. The difference A − B of two matrices of the same size means the same thing as A + (−B).

We will see that if a is a row vector (3.3) then it is more natural to write $sa = \begin{bmatrix} sa_1 & sa_2 & \ldots & sa_n \end{bmatrix}$ rather than as.
On the other hand, for a column b, as in equation (3.2), it is better to write bs. See, for example, Equation (3.23) below.

6. Row i of the m × n matrix $A = [a_{ij}]$ is denoted by $A_{i\bullet}$. Hence,

\[
A_{i\bullet} = \begin{bmatrix} a_{i1} & a_{i2} & \ldots & a_{in} \end{bmatrix} \quad (i = 1, \cdots, m) \tag{3.5}
\]

7. Column j of the m × n matrix $A = [a_{ij}]$ is denoted by $A_{\bullet j}$. Hence,

\[
A_{\bullet j} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \quad (j = 1, \cdots, n) \tag{3.6}
\]

Remark 46 The symbols for row i and column j are consistent with our other notation. Thus $[A]_{ij} = a_{ij}$ is at the intersection of row $A_{i\bullet}$ and column $A_{\bullet j}$.

Other notations in use for the ith row of A are $A_i$ and $[A]_i$, and for the jth column of A, $A^{(j)}$ or $[A]^j$. However, we will not use these.


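8. The unit column vectors $e_j \in \mathbb{R}^{n\times 1}$ (j = 1, ..., n) have a 1 in position j and zeros elsewhere:

\[
e_j = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \leftarrow \text{row } j \tag{3.7}
\]

9. The n × n Identity matrix is the n × n matrix with columns $e_1, \ldots, e_n$ (in that order). It is denoted by $I_n$ (or simply I if n is understood):

\[
I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \tag{3.8}
\]

As we will see, $I_n$ behaves like the number 1 when we multiply matrices. (See Exercise 65 No.6). We note that, writing $I = I_n$, by definition $I_{\bullet j} = e_j$ for j = 1, ..., n. The unit row vectors are

\[
I_{i\bullet} = \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \quad (1 \text{ in position } i;\; i = 1, \cdots, n)
\]

The identity $I = I_n$ of (3.8) is equivalently defined by

\[
[I]_{ij} = 1 \text{ if } i = j \quad\text{and}\quad [I]_{ij} = 0 \text{ if } i \neq j \quad (i, j = 1, \cdots, n)
\]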

10. The transpose of the m × n matrix $A = [a_{ij}]$ is denoted by $A^T$ (“A transpose”). The transpose $A^T$ has for its (i, j) entry the number $a_{ji}$. Thus $A^T$ is an n × m matrix and can also be loosely described as ‘the matrix A with its rows and columns interchanged’. In fact, using the notation (3.1),

\[
[A^T]_{ji} = [A]_{ij} \quad (i = 1, \ldots, m;\; j = 1, 2, \ldots, n) \tag{3.9}
\]

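11. The product a b of the row a with n entries and the column b with n entries is

\[
a\,b = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
= a_1b_1 + a_2b_2 + \cdots + a_nb_n \tag{3.10}
\]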

Remark 47 This was introduced in subsection 2.2.3 of Chapter 2. You can, if you wish, call this a ‘dot product’ since it is a generalization of the dot product Equation (1.4) of Chapter 1, but there is no need here for the ‘dot’. Note the order in a b: first a, then b. This is important as ba will turn out to be something quite different: in fact, an n × n matrix. (See No.4 of Exercise 65).
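The difference between a b and b a can be made concrete with a small Python sketch. The function names are our own, and the formula used for b a (entry $b_i a_j$ in position (i, j)) anticipates the matrix product, which is only defined later in the chapter, so treat it here as an assumption about what Exercise 65, No.4 will establish.

```python
def row_times_column(a, b):
    """Product a b of a row a and a column b with the same number of entries (3.10)."""
    if len(a) != len(b):
        raise ValueError("row and column must have the same number of entries")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def column_times_row(b, a):
    """Product b a of a column b and a row a: entry (i, j) is b_i * a_j."""
    return [[b_i * a_j for a_j in a] for b_i in b]


a = [-3, 5, -7, 4]   # a row with 4 entries
b = [2, 3, -2, 9]    # a column with 4 entries

print(row_times_column(a, b))    # a single number
for row in column_times_row(b, a):
    print(row)                   # the rows of a 4 x 4 matrix
```

Here a b is the single number (−3)(2) + (5)(3) + (−7)(−2) + (4)(9) = 59, whereas b a has 16 entries.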

3.1.2 Linear combinations of vectors, linear dependency, Span

1. The column vector $v \in \mathbb{R}^{n\times 1}$ is a linear combination of the vectors $u_1, u_2, \ldots, u_k$ with coefficients $s_1, s_2, \ldots, s_k$ if

\[
v = u_1s_1 + u_2s_2 + \cdots + u_ks_k \tag{3.11}
\]

We then say that v depends linearly on $u_1, u_2, \ldots, u_k$. Of course, we can also write $v = s_1u_1 + s_2u_2 + \cdots + s_ku_k$, with the coefficients on the left, but, as remarked, we prefer to write the coefficients of columns on the right. If v and $u_1, u_2, \ldots, u_k$ are row vectors all of the same size our preference would be

\[
v = s_1u_1 + s_2u_2 + \cdots + s_ku_k
\]

2. The idea of ‘span’ was introduced in Exercise 2, No.16.
The span of (column or row) vectors $u_1, u_2, \ldots, u_k$ is the set of all linear combinations of these vectors. It is also called the space spanned by the vectors and is denoted by $\mathrm{sp}(u_1, \ldots, u_k)$. Thus,

\[
v \in \mathrm{sp}(u_1, u_2, \ldots, u_k)
\]

if, and only if, (3.11) holds for certain coefficients $s_1, \ldots, s_k$.

The column space of a matrix A is the span of its columns; its row space is the space spanned by its rows.

3. Linear combinations of matrices

We wish to extend the definition of ‘linear combination’ to matrices: Let $A_1, A_2, \ldots, A_k$ be k matrices, all of the same size and suppose that $s_1, s_2, \ldots, s_k$ are k scalars. The linear combination of $A_1, A_2, \ldots, A_k$ with coefficients $s_1, s_2, \ldots, s_k$ is

\[
s_1A_1 + s_2A_2 + \cdots + s_kA_k
\]

3.1.3 Some examples

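1.
\[
\begin{bmatrix} -1 & \tfrac{5}{4} & 13 \\ 2 & -4 & -7 \end{bmatrix} \in \mathbb{R}^{2\times 3}, \quad
\begin{bmatrix} -4 & 5 \\ 5 & \tfrac{2}{3} \\ 11 & \tfrac{7}{2} \end{bmatrix} \in \mathbb{R}^{3\times 2}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{2\times 2}
\]

2.
\[
\begin{bmatrix} -\tfrac{3}{7} \\ \tfrac{2}{5} \end{bmatrix} \in \mathbb{R}^{2\times 1}, \quad
\begin{bmatrix} \tfrac{5}{3} \\ -\tfrac{2}{9} \\ 6 \end{bmatrix} \in \mathbb{R}^{3\times 1}
\]

3.
\[
O_{2\times 3} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
O_{2\times 2} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

4.
\[
\begin{bmatrix} -1 & \tfrac{5}{4} & 13 \\ 2 & -4 & -7 \end{bmatrix}
+ \begin{bmatrix} 2 & -\tfrac{7}{4} & -7 \\ -3 & \tfrac{5}{2} & -7 \end{bmatrix}
= \begin{bmatrix} 1 & -\tfrac{1}{2} & 6 \\ -1 & -\tfrac{3}{2} & -14 \end{bmatrix}
\]

5.
\[
\begin{bmatrix} -1 & \tfrac{5}{4} & 13 \\ 2 & -4 & -7 \end{bmatrix}(100)
= (100)\begin{bmatrix} -1 & \tfrac{5}{4} & 13 \\ 2 & -4 & -7 \end{bmatrix}
= \begin{bmatrix} -100 & 125 & 1300 \\ 200 & -400 & -700 \end{bmatrix}
\]
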

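6. (a) Let

\[
A = \begin{bmatrix} -1 & 2 & -\tfrac{2}{3} \\ \tfrac{5}{4} & -4 & 5 \\ 13 & -7 & \tfrac{3}{8} \end{bmatrix} \tag{3.12}
\]

Then
\[
A_{1\bullet} = \begin{bmatrix} -1 & 2 & -\tfrac{2}{3} \end{bmatrix}, \qquad
A_{3\bullet} = \begin{bmatrix} 13 & -7 & \tfrac{3}{8} \end{bmatrix}
\]

(b) Let

\[
X = \begin{bmatrix} \tfrac{7}{2} \\ -\tfrac{3}{2} \\ 9 \\ -\tfrac{5}{6} \end{bmatrix}
\]

then
\[
X_{2\bullet} = \begin{bmatrix} -\tfrac{3}{2} \end{bmatrix}, \qquad
X_{4\bullet} = \begin{bmatrix} -\tfrac{5}{6} \end{bmatrix}
\]

Remark 48 We usually use an ‘underline’ notation like (3.2) for column vectors but will occasionally write $X_2 = X_{2\bullet} = -\tfrac{3}{2}$ if it is understood that X is a column matrix (i.e. a column vector). Similar remarks apply to row vectors (i.e. row matrices).

7. (a) If A is the matrix of equation (3.12) above then

\[
A_{\bullet 2} = \begin{bmatrix} 2 \\ -4 \\ -7 \end{bmatrix}, \qquad
A_{\bullet 1} = \begin{bmatrix} -1 \\ \tfrac{5}{4} \\ 13 \end{bmatrix}
\]

(b) If
\[
Y = \begin{bmatrix} 5 & -\tfrac{7}{2} & -9 & 10 & -23 \end{bmatrix}
\]
then
\[
Y_{\bullet 2} = \begin{bmatrix} -\tfrac{7}{2} \end{bmatrix}, \qquad
Y_{\bullet 5} = \begin{bmatrix} -23 \end{bmatrix}
\]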

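8. (a) The vectors (3 × 1 column matrices) $e_1$, $e_2$ and $e_3$ in $\mathbb{R}^{3\times 1}$ are

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]

We met these as rows in Chapter 1, 1.1.17, calling them i, j, and k respectively.

(b) In $\mathbb{R}^{4\times 1}$

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]

9. We have

\[
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]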

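10. (a) Let

\[
A = \begin{bmatrix} -5 & 0 & 2 \\ 7 & 6 & -1 \\ \tfrac{1}{2} & -1 & 3 \\ -\tfrac{11}{3} & 9 & 0 \end{bmatrix}
\]

then $A \in \mathbb{R}^{4\times 3}$, while the transpose $A^T \in \mathbb{R}^{3\times 4}$ and

\[
A^T = \begin{bmatrix} -5 & 7 & \tfrac{1}{2} & -\tfrac{11}{3} \\ 0 & 6 & -1 & 9 \\ 2 & -1 & 3 & 0 \end{bmatrix}
\]

(b) The transpose of the column vector (3.2) is the row

\[
b^T = \begin{bmatrix} b_1 & b_2 & \ldots & b_n \end{bmatrix}
\]

and the transpose of the row vector (3.5) of A is a column vector.

(c) $[I_{i\bullet}]^T = I_{\bullet i}$ and $[I_{\bullet i}]^T = I_{i\bullet}$ and $I^T = I$, where $I = I_n$.

11. (a) An example of the product a b of a row a and a column b is

\[
\begin{bmatrix} -3 & 5 & -7 & 4 \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \\ -2 \\ 9 \end{bmatrix}
= (-3)(2) + (5)(3) + (-7)(-2) + (4)(9) = 59
\]

(b) Another, using a general 2 × 3 matrix $A = [a_{ij}]$ and a general 3 × 4 matrix $B = [b_{ij}]$, is

\[
A_{i\bullet}B_{\bullet j} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} \quad (i = 1, 2;\; j = 1, 2, 3, 4)
\]

Note that such a product is only possible when the number of columns of A equals the number of rows of B. The significance of such products will be seen in subsection 3.3.4 below.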

Referring to linear combinations in 3.1.2 we have:

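1. (a) In $\mathbb{R}^{4\times 1}$

\[
\begin{bmatrix} -5 \\ \tfrac{1}{2} \\ -3 \\ \tfrac{2}{3} \end{bmatrix}(-6)
+ \begin{bmatrix} \tfrac{7}{2} \\ -\tfrac{3}{2} \\ 9 \\ -\tfrac{5}{6} \end{bmatrix}(3)
+ \begin{bmatrix} 11 \\ 2 \\ -4 \\ -1 \end{bmatrix}\left(-\tfrac{1}{2}\right)
= \begin{bmatrix} 35 \\ -\tfrac{17}{2} \\ 47 \\ -6 \end{bmatrix}
\]

(b) In $\mathbb{R}^{1\times 5}$

\[
(-2)\begin{bmatrix} p & q & \tfrac{1}{2} & -3 & s \end{bmatrix}
+ 4\begin{bmatrix} -1 & r + 2 & -\tfrac{1}{4} & \tfrac{5}{2} & 1 \end{bmatrix}
= \begin{bmatrix} -2p - 4 & -2q + 4r + 8 & -2 & 16 & -2s + 4 \end{bmatrix}
\]

2. The span of the rows of $I_3$ is clearly $\mathbb{R}^{1\times 3}$.
Consider the column space of the matrix

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

The span of the columns of A is the whole of $\mathbb{R}^{4\times 1}$, since for any choices of $b_1$, $b_2$, $b_3$ and $b_4$,

\[
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}(b_1 - b_2)
+ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}(b_2 - b_3)
+ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}(b_3 - b_4)
+ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} b_4
\]

See also subsections 3.1.6 and 3.1.9 below.

3. In $\mathbb{R}^{2\times 2}$

\[
(7)\begin{bmatrix} -1 & 2 \\ 3 & 6 \end{bmatrix} + (-3)\begin{bmatrix} 0 & 8 \\ -2 & 5 \end{bmatrix}
= \begin{bmatrix} -1 & 2 \\ 3 & 6 \end{bmatrix}(7) + \begin{bmatrix} 0 & 8 \\ -2 & 5 \end{bmatrix}(-3)
= \begin{bmatrix} -7 & -10 \\ 27 & 27 \end{bmatrix}
\]

4. The row space of $\begin{bmatrix} -1 & 3 & 1 \\ 2 & -4 & 3 \end{bmatrix}$ consists of all row vectors of the form $\alpha\begin{bmatrix} -1 & 3 & 1 \end{bmatrix} + \beta\begin{bmatrix} 2 & -4 & 3 \end{bmatrix}$ where α and β are arbitrary scalars.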
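Linear combinations such as those in items 1 and 2 above are easy to check by machine. The sketch below is only an illustration: the helper `linear_combination` is a name of our own, the vectors are those of item 1(a), and the vector b used for the span identity of item 2 is an arbitrary choice. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction as F


def linear_combination(vectors, coefficients):
    """Return u_1 s_1 + ... + u_k s_k for vectors with equally many entries."""
    length = len(vectors[0])
    if any(len(u) != length for u in vectors):
        raise ValueError("all vectors must have the same number of entries")
    return [sum(u[i] * s for u, s in zip(vectors, coefficients))
            for i in range(length)]


# Item 1(a): three columns with 4 entries and coefficients -6, 3, -1/2.
u1 = [F(-5), F(1, 2), F(-3), F(2, 3)]
u2 = [F(7, 2), F(-3, 2), F(9), F(-5, 6)]
u3 = [F(11), F(2), F(-4), F(-1)]
v = linear_combination([u1, u2, u3], [F(-6), F(3), F(-1, 2)])
print([str(entry) for entry in v])

# Item 2: any b is a combination of the columns of the upper triangular A.
columns = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
b = [4, -1, 7, 2]   # an arbitrary choice of b1, b2, b3, b4
coefficients = [b[0] - b[1], b[1] - b[2], b[2] - b[3], b[3]]
print(linear_combination(columns, coefficients) == b)
```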

3.1.4 Some basic algebraic properties

For all m× n matrices A, B and C and scalars s and t,

1. A + (B + C) = (A + B) + C (associative law for addition).

2. A + B = B + A (commutative law for addition).

3. O + A = A (O = Om×n behaves like the number 0).

4. A + (−A) = O.

5. sA = As (by definition).

6. s (A + B) = sA + sB.

7. (s + t)A = sA + tA.

8. s (tA) = (st)A.

9. 1A = A.

10. Every vector $x \in \mathbb{R}^{n\times 1}$ can be uniquely written as a linear combination

\[
x = e_1x_1 + e_2x_2 + \cdots + e_nx_n
\]

(Compare 1.1.17 in Chapter 1). Hence it is trivial that every x depends linearly on the unit vectors $e_1, e_2, \ldots, e_n$, in other words that these unit vectors span $\mathbb{R}^{n\times 1}$. However, there are many other sets of vectors that have a similar property, as we shall see. In Chapter 1, No.17 of Exercise 2 we indicated that any three mutually orthogonal vectors in $\mathbb{R}^3$ span the whole of space. You are asked to show this rigorously in Exercise 78, No.9.

11. $(A^T)^T = A$ for any matrix A.

12. Referring to the product (3.10),
\[
(a\,b)^T = b^T a^T \tag{3.13}
\]

13. (a) Let u and v be column vectors with n entries and a a row with n entries. Then

\[
a\,(u + v) = a\,u + a\,v \tag{3.14}
\]

(b) If s is a scalar,
\[
a\,(us) = (a\,u)\,s = (sa)\,u \tag{3.15}
\]

(c) Equations (3.14) and (3.15) combine so that for all scalars s and t:

\[
a\,(us + vt) = (a\,u)\,s + (a\,v)\,t \tag{3.16}
\]

(d) Equation (3.16) can be generalized as follows: Suppose $u_1, u_2, \ldots, u_k$ are column vectors in $\mathbb{R}^{n\times 1}$, $s_1, s_2, \ldots, s_k$ are scalars and $a \in \mathbb{R}^{1\times n}$ is a row vector. Then

\[
a\,(u_1s_1 + u_2s_2 + \cdots + u_ks_k) = (a\,u_1)\,s_1 + (a\,u_2)\,s_2 + \cdots + (a\,u_k)\,s_k \tag{3.17}
\]

14. (a) Quite similarly, if a and b are row vectors with n entries and u is a column with n entries,

\[
(a + b)\,u = a\,u + b\,u \tag{3.18}
\]

(b) and for scalars t,
\[
(ta)\,u = t\,(a\,u) \tag{3.19}
\]

(c) Equations (3.18) and (3.19) combine so that for all scalars s and t:

\[
(sa + tb)\,u = s\,(a\,u) + t\,(b\,u) \tag{3.20}
\]

(d) The result (3.20) can be generalized as follows: Suppose $a_1, a_2, \ldots, a_k$ are rows in $\mathbb{R}^{1\times n}$, $t_1, t_2, \ldots, t_k$ are scalars and $u \in \mathbb{R}^{n\times 1}$ is a column. Then

\[
(t_1a_1 + t_2a_2 + \cdots + t_ka_k)\,u = t_1\,(a_1u) + t_2\,(a_2u) + \cdots + t_k\,(a_ku) \tag{3.21}
\]

3.1.5 Some proofs

The above properties and their proofs are similar to those for vectors discussed in Chapter 1, subsections 1.1.6 and 1.1.9, equation (1.8). See No.7 of Exercise 49 below.
We prove a few properties as illustrations.

Proof of property (6) that s(A + B) = sA + sB.

The (i, j) entry in s(A + B) is $s(a_{ij} + b_{ij})$. But $s(a_{ij} + b_{ij}) = sa_{ij} + sb_{ij}$, which is the (i, j) entry in sA + sB. Since s(A + B) and sA + sB have the same entries, they are equal.

Using the alternative notation (equation (3.1)),

\[
[s(A + B)]_{ij} = s\,[A + B]_{ij} = s\left([A]_{ij} + [B]_{ij}\right)
= s\,[A]_{ij} + s\,[B]_{ij} = [sA]_{ij} + [sB]_{ij} = [sA + sB]_{ij}
\]

Proof of property (11)

\[
\left[(A^T)^T\right]_{ij} = \left[A^T\right]_{ji} = [A]_{ij}
\]

Proof of equation (3.14). With obvious notation, entry i in the column u + v is $u_i + v_i$. Hence,

\begin{align*}
a\,(u + v) &= a_1(u_1 + v_1) + \cdots + a_n(u_n + v_n) \\
&= (a_1u_1 + a_1v_1) + \cdots + (a_nu_n + a_nv_n) \\
&= (a_1u_1 + \cdots + a_nu_n) + (a_1v_1 + \cdots + a_nv_n) \\
&= a\,u + a\,v
\end{align*}

Or, using Σ-notation,

\begin{align*}
a\,(u + v) &= \sum_{i=1}^{n} a_i(u_i + v_i) = \sum_{i=1}^{n} (a_iu_i + a_iv_i) \\
&= \sum_{i=1}^{n} a_iu_i + \sum_{i=1}^{n} a_iv_i \\
&= a\,u + a\,v
\end{align*}

Proof of equation (3.15). Again with obvious notation, the jth entry in the vector us is $u_js$. Hence

\begin{align*}
a\,(us) &= a_1(u_1s) + \cdots + a_n(u_ns) \\
&= (a_1u_1 + \cdots + a_nu_n)\,s \\
&= (a\,u)\,s
\end{align*}

Proof of equation (3.16). By equation (3.14), a (us + vt) = a (us) + a (vt). Using equation (3.15), a (us) + a (vt) = (a u) s + (a v) t.

Exercise 49 1. Consider 22 in Chapter 2 and the matrices in 2.2.1 referred to there. For each of these as well as for the following $A \in \mathbb{R}^{m\times n}$ find m and n and, using our notation in equation (3.5) and equation (3.6), write down all rows and columns of A.

(a) A =

−2 53 − 4

7−1 1

2

(b) A =

4 7 0 25

0 9 −1 34

− 53 1 2

3 0

(c) A =

p f cb q pd f r

(d) A =

−3 −1 8 02 −7 9 1

20 2

3 −2 01 −4 5 −6

2. Write down C if it is given that

(a) C has three columns C•1 =[

4−5

], C•2 =

[27

], C•3 =

[023

]

(b) C has three rows and C1• =[

q 5 −z], C2• =

[0 a b

], C3• =

[ −11 x 23

].

(c) C = kI5 (k a scalar)

3. Find:

(a)

−13220

2 +

121− 2

3− 1

6

(−6) +

1− 2

5110−1

5 +

03−14

(−1)

(b) 4

6 −2−1 87 2

+ 3

11 02 1−4 5

− 2

0 69 4−3 7


(c)(

12 + p

) [2 −4 6 3

]− 23 (2p + q)

[ −3 9 18 −6]+ 1

4 (p− q)[

0 −24 8 6]

4. Verify equation (3.16) in case a =[ −3 5

2 2 −1]

, u =[

12 −1 −3 1

4

]T , v =[ − 56

13 − 1

2 1]T , s = 2 and t = −6.

5. Find the transposes of the matrices− 5

6 −2 12

1 8 −9−7 2 1

3

,

[x −u c d

]and

[v −3 7 a−9 w 0 z

].

6. For the matrices A in No.5 compare $A_{i\bullet}$ and $[A^T]_{\bullet i}$ as well as $A_{\bullet j}$ and $[A^T]_{j\bullet}$. Can we say that $[A_{i\bullet}]^T = [A^T]_{\bullet i}$ and $[A_{\bullet j}]^T = [A^T]_{j\bullet}$? Are these true for any matrix A?

Answer: Yes.

7. Prove the above basic properties in subsection 3.1.4. For equation (3.17) and equation (3.21) you may assume k = 3, since the proofs of the general cases are quite similar.

Hint 50 Study the above proofs in 3.1.5 as well as your own proofs of the statements in 1.1.6 and 1.1.9 of Chapter 1. Apart from the fact that Chapter 1 concentrated on vectors in $\mathbb{R}^{1\times 3}$ or $\mathbb{R}^{1\times 2}$, most of the proofs for general vectors go through almost word-for-word.

8. (a) i. In $\mathbb{R}^m = \mathbb{R}^{m\times 1}$ let x be a linear combination of the vectors $a_1$, $a_2$ and $a_3$. Suppose that each of $a_1$, $a_2$ and $a_3$ is a linear combination of e, f and g. Show that x is a linear combination of e, f and g.

ii. * Express this result in terms of the ‘span’ concept.

Solution: If $x \in \mathrm{sp}(a_1, a_2, a_3)$ and each $a_j \in \mathrm{sp}(e, f, g)$, then $x \in \mathrm{sp}(e, f, g)$.

(b) Is the above result 8(a)i true for p × q matrices X, $A_1$, $A_2$, $A_3$, E, F and G (in place of the vectors x, ..., g)? Can you prove your answer?

Solution: Yes. The same proof goes through.

(c) Find a statement that generalizes No.8a: Suppose that $x \in \mathbb{R}^m$ depends linearly on $a_1, a_2, \ldots, a_k$ and each vector $a_j$ depends linearly on $b_1, b_2, \ldots, b_n$, then ...? (You need not prove your statement yet, but see Exercise 65, No.14).

9. Show that, provided A and B are of the same size, $(A + B)^T = A^T + B^T$. (Use the notation of equation (3.1)).

10. For the following pairs A, B of matrices find all possible products $A_{i\bullet}B_{\bullet j}$ and arrange them in matrix form in what you think is a nice way.

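(a)
\[
A = \begin{bmatrix} 2 & -9 \\ 3 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 8 & 2 & -1 \\ -5 & 4 & 7 \end{bmatrix}
\]

(b)
\[
A = \begin{bmatrix} -1 & 5 \\ 3 & 0 \\ 2 & -4 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 2 & 1 & 0 \\ 3 & 0 & -4 & 7 \end{bmatrix}
\]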

11. In this question $\mathbb{R}^n$ stands for $\mathbb{R}^{n\times 1}$ or $\mathbb{R}^{1\times n}$, say $\mathbb{R}^n = \mathbb{R}^{n\times 1}$.
Show that with appropriate definitions in $\mathbb{R}^n$ properties 1 - 5 of subsection 1.1.13 and 2 - 3 of 1.1.15 in Chapter 1 are valid.

Hint 51 For u and v in $\mathbb{R}^n$, the dot product is $u \cdot v = u^Tv = u_1v_1 + \cdots + u_nv_n$. The length of x is $|x| = \sqrt{x \cdot x}$. (This in turn gives a definition of the distance |a − b| between a and b). Note that |x| = 0 only if x = 0.

For the proofs of the Cauchy and Cauchy-Schwarz inequalities see Hint 4 in Chapter 1. Note that, as in the three-dimensional case, Cauchy’s inequality follows from that of Cauchy-Schwarz.

3.1.6 Ax viewed in two equivalent ways

1. In terms of the products $A_{i\bullet}x$:

Let $A = [a_{ij}]$ be m × n and $x = [x_j]$ be n × 1. In Chapter 2 we considered the system of m equations for n unknowns symbolized by Ax = b. In 2.2.4 of the same chapter and in other examples we saw that Ax is a column with m entries. Its ith entry is (compare equation (3.10)):

\[
A_{i\bullet}x = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n
\]

Hence,

\[
Ax = \begin{bmatrix} A_{1\bullet}x \\ A_{2\bullet}x \\ \vdots \\ A_{m\bullet}x \end{bmatrix} \tag{3.22}
\]

We met such a product in 2.2.4 of Chapter 2. The entry in the ith row of Ax is the ‘dot product’ $A_{i\bullet}x$ of row $A_{i\bullet}$ and column x.

2. Ax as a linear combination of the columns of A with coefficients $x_1, \ldots, x_n$:

\[
Ax = A_{\bullet 1}x_1 + A_{\bullet 2}x_2 + \cdots + A_{\bullet n}x_n \tag{3.23}
\]

This view was anticipated in Chapter 2, for example in equations (2.17), (2.19) and (2.25). Like the previous view, it can immediately be seen from the definitions. The following examples should make this clear.

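Example 3.1.1 Let

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\]

and $x \in \mathbb{R}^{3\times 1}$. Then

\[
Ax = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{bmatrix}
= \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} x_1
+ \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} x_2
+ \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} x_3
\]

So

\[
\begin{bmatrix} -3 & 6 & 5 \\ 2 & -4 & 7 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -3x_1 + 6x_2 + 5x_3 \\ 2x_1 - 4x_2 + 7x_3 \end{bmatrix}
= \begin{bmatrix} -3 \\ 2 \end{bmatrix} x_1
+ \begin{bmatrix} 6 \\ -4 \end{bmatrix} x_2
+ \begin{bmatrix} 5 \\ 7 \end{bmatrix} x_3
\]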


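Example 3.1.2 As a more general example, consider the general system of 4 equations for five unknowns, equation (2.11) of the previous chapter. We found there equation (2.12):

\[
Ax = \begin{bmatrix} A_{1\bullet}x \\ A_{2\bullet}x \\ A_{3\bullet}x \\ A_{4\bullet}x \end{bmatrix}
= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + a_{15}x_5 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + a_{25}x_5 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + a_{35}x_5 \\
a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 + a_{45}x_5
\end{bmatrix}
\]

\[
= \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{41} \end{bmatrix} x_1
+ \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \end{bmatrix} x_2
+ \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ a_{43} \end{bmatrix} x_3
+ \begin{bmatrix} a_{14} \\ a_{24} \\ a_{34} \\ a_{44} \end{bmatrix} x_4
+ \begin{bmatrix} a_{15} \\ a_{25} \\ a_{35} \\ a_{45} \end{bmatrix} x_5
\]

Example 3.1.3 In the previous example let $x = e_j$ be the unit vector of equation (3.7) with n = 5 entries. Then $Ae_2 = A_{\bullet 2}$ (the second column of A) since $x_2 = 1$ and $x_1 = x_3 = x_4 = x_5 = 0$. Similarly, for j = 1, ..., 5, we have $Ae_j = A_{\bullet j}$.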

3.1.7 Aej = A•j in general

From the above example it is clear that if A is an m × n matrix and ej is the jth unit vector as in equation (3.7), then

Aej = A•j (j = 1, · · · , n) (3.24)

Equivalently, if I is the n× n identity matrix,

AI•j = A•j (j = 1, · · · , n)
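The two views of Ax in 3.1.6 and the identity (3.24) can be compared numerically. The following Python sketch is an illustration only: the function names are ours, the matrix is the numerical one from Example 3.1.1, and the column x is an arbitrary choice.

```python
def product_by_rows(A, x):
    """Entry i of Ax as the product of row i of A with the column x, as in (3.22)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def product_by_columns(A, x):
    """Ax as the combination of the columns of A with coefficients x_j, as in (3.23)."""
    m, n = len(A), len(x)
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]
        result = [r + c * x[j] for r, c in zip(result, column_j)]
    return result


A = [[-3, 6, 5],
     [2, -4, 7]]
x = [2, -1, 3]            # arbitrary column with 3 entries

print(product_by_rows(A, x))
print(product_by_columns(A, x))

e2 = [0, 1, 0]            # unit vector e_2
print(product_by_rows(A, e2))   # should be the second column of A
```

Both functions should return the same column, and multiplying by $e_2$ should pick out the second column of A.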

3.1.8 Ax as a product of A and x

Henceforth we will regard Ax as the product of the m × n matrix A by the n × 1 column matrix x.

3.1.9 Connection between linear combinations and solving equations, the column space

This was also anticipated in Chapter 2 (for example, in Exercise 33 and in No.2b from Exercise 36).

Solving a system of equations Ax = b for x is exactly the same as looking for coefficients $x_j$ so that b is a linear combination of the columns of A:

\[
Ax = b \quad\text{if, and only if,}\quad b = A_{\bullet 1}x_1 + A_{\bullet 2}x_2 + \cdots + A_{\bullet n}x_n
\]

In terms of the concept of ‘span’ (see item 2 in subsection 3.1.2), this is saying that b is in the span of the columns of A:

b ∈ sp(A•1, A•2, . . . , A•n)

Example 3.1.4 Is a = [1, 5, 2, 7] a linear combination of u = [1, 3, 0, 5] and v = [0, 1, 1, 1]? Another way to put it: ‘Does a depend linearly on u and v?’ In other terminology we have been using, ‘is it true that a ∈ sp(u, v)?’

Solution:

To answer this, we change the vectors into column vectors (i.e. we use $u^T$, $v^T$, $a^T$) and try to solve the following matrix equation for x:

\[
\begin{bmatrix} 1 & 0 \\ 3 & 1 \\ 0 & 1 \\ 5 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 5 \\ 2 \\ 7 \end{bmatrix}
\]

In other words, we try to express the column on the right as a linear combination of the columns of the 4 × 2 coefficient matrix. Use the array of detached coefficients:

\[
\begin{array}{rrr}
1 & 0 & 1 \\
3 & 1 & 5 \\
0 & 1 & 2 \\
5 & 1 & 7
\end{array}
\qquad
\begin{array}{rrrl}
1 & 0 & 1 & \\
0 & 1 & 2 & -3R_1 + R_2 \\
0 & 1 & 2 & \\
0 & 1 & 2 & -5R_1 + R_4
\end{array}
\]

This is equivalent to

\[
\begin{array}{rrrl}
1 & 0 & 1 & \\
0 & 1 & 2 & \\
0 & 0 & 0 & -R_2 + R_3 \\
0 & 0 & 0 & -R_2 + R_4
\end{array}
\]

Hence $x_2 = 2$ and $x_1 = 1$, and $a^T = u^T + v^T 2$, so a = u + 2v and a depends linearly on u and v.
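The row operations of this example can be replayed literally in a few lines of Python. This is a sketch for checking the arithmetic, not part of the original notes; the helper `add_multiple` is a hypothetical name, and rows are numbered from 1 as in the text.

```python
def add_multiple(array, factor, source, target):
    """Row operation (factor) R_source + R_target, with rows numbered from 1."""
    array[target - 1] = [factor * s + t
                         for s, t in zip(array[source - 1], array[target - 1])]


u = [1, 3, 0, 5]
v = [0, 1, 1, 1]
a = [1, 5, 2, 7]

# Array of detached coefficients: columns u^T, v^T and right-hand side a^T.
array = [[u_i, v_i, a_i] for u_i, v_i, a_i in zip(u, v, a)]

add_multiple(array, -3, 1, 2)   # -3R1 + R2
add_multiple(array, -5, 1, 4)   # -5R1 + R4
add_multiple(array, -1, 2, 3)   # -R2 + R3
add_multiple(array, -1, 2, 4)   # -R2 + R4
print(array)

# Rows 1 and 2 now read x1 = 1 and x2 = 2; check the combination directly.
x1, x2 = array[0][2], array[1][2]
combination = [u_i * x1 + v_i * x2 for u_i, v_i in zip(u, v)]
print(combination == a)
```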

Remark 52 We are in the habit of solving equations in the form Ax = b (so we are taking linear combinations of columns), but we could just as well have left the vectors as rows: Then we would be taking linear combinations of rows and using elementary column operations.

Remark 53 Any result about columns has a corresponding result about rows and vice-versa. This is a fact you will appreciate more and more as we go ahead.

Example 3.1.5 Consider from Chapter 2, Example 2.1.4, equation (2.7):

\[
\begin{bmatrix} 1 & 1 & 3 \\ 1 & 3 & 4 \\ 2 & 4 & 7 \end{bmatrix} x = \begin{bmatrix} -1 \\ -1 \\ 5 \end{bmatrix}
\]

We found from the detached array in echelon form (2.20) in Chapter 2 that there is no solution. Thus the right-hand side $\begin{bmatrix} -1 & -1 & 5 \end{bmatrix}^T$ is not a linear combination of the columns of the coefficient matrix A.

Example 3.1.6 Let

A =

2 5 −11−3 −6 124 7 −13

(3.25)

What sort of solutions does the homogeneous system Ax = 0 have? Is some column of A alinear combination of the other columns of A?

Solution:

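Solve the homogeneous linear system

\[
\begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

x = 0 is in any case the (trivial) solution. As usual, to find the general solution, use Gauss-reduction to find a row echelon form of the coefficient matrix A:

\[
\begin{bmatrix} 2 & 5 & -11 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix} \tag{3.26}
\]

and the general solution with parameter t is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} -2t \\ 3t \\ t \end{bmatrix} \tag{3.27}
\]

(Check this). If we put t = 1, we get

\[
\begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
\begin{bmatrix} -2 \\ 3 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix}(-2)
+ \begin{bmatrix} 5 \\ -6 \\ 7 \end{bmatrix} 3
+ \begin{bmatrix} -11 \\ 12 \\ -13 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{3.28}
\]

Since each coefficient −2, 3 and 1 is non-zero, every column of A is linearly dependent on the other two columns. For example,

\[
A_{\bullet 1} = \begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix}
= \begin{bmatrix} 5 \\ -6 \\ 7 \end{bmatrix}\frac{3}{2}
+ \begin{bmatrix} -11 \\ 12 \\ -13 \end{bmatrix}\frac{1}{2}
\]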
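The two claims of Example 3.1.6 can be checked by direct computation. The following is only a sketch in plain Python; the helper names mat_vec and column are ours and not part of the notes.

```python
from fractions import Fraction

# Matrix A of Example 3.1.6, stored as a list of rows.
A = [[2, 5, -11],
     [-3, -6, 12],
     [4, 7, -13]]

def mat_vec(M, x):
    """Return Mx: entry i is (row i of M) times the column x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def column(M, j):
    """Return column j of M (counting from 0) as a list."""
    return [row[j] for row in M]

# The solution (3.27) with t = 1 should give the zero column.
x = [-2, 3, 1]
residual = mat_vec(A, x)

# Column 1 of A as (3/2) * column 2 + (1/2) * column 3.
half = Fraction(1, 2)
combo = [3 * half * b + half * c
         for b, c in zip(column(A, 1), column(A, 2))]

print(residual)
print(combo == column(A, 0))
```

Exact fractions are used so that the comparison with the first column does not suffer from rounding.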

Example 3.1.7 Let

\[
A = \begin{bmatrix} 2 & 5 & 1 \\ -3 & -6 & 0 \\ 4 & 7 & -2 \end{bmatrix} \tag{3.29}
\]

Can we find a non-trivial solution x to Ax = 0? Is any column of A linearly dependent on the other two?

Solution:

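Reduce A to row-echelon form:

\[
\begin{array}{rrr@{\qquad}l}
2 & 5 & 1 & \\
0 & \tfrac{3}{2} & \tfrac{3}{2} & \tfrac{3}{2}R_1 + R_2 \\
0 & -3 & -4 & (-2)R_1 + R_3
\end{array}
\]

\[
\begin{array}{rrr@{\qquad}l}
2 & 5 & 1 & \\
0 & \tfrac{3}{2} & \tfrac{3}{2} & \\
0 & 0 & -1 & 2R_2 + R_3
\end{array}
\]

\[
\begin{array}{rrr@{\qquad}l}
2 & 5 & 1 & \\
0 & 1 & 1 & \tfrac{2}{3}R_2 \\
0 & 0 & 1 & (-1)R_3
\end{array}
\]

So a row EF for A is

\[
C = \begin{bmatrix} 2 & 5 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

Thus Ax = 0 has only the trivial solution

\[
x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

Hence no column of A can depend linearly on the other two. For example, should we have A•1 = A•2β + A•3γ, then

\[
x = \begin{bmatrix} 1 \\ -\beta \\ -\gamma \end{bmatrix}
\]

would be a non-trivial solution to Ax = 0 because 1 ≠ 0.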


Remark 54 Questions of the sort “which columns of A are linearly dependent on other columns of A?” are further explored below in 3.4.11.
When Ax = 0 has only the trivial solution x = 0 we will call the columns of A linearly independent. See subsection 3.4.1.

3.1.10 Fundamental properties of the product Ax

Let A be an m × n matrix and suppose u1, u2 are vectors in ℝ^n = ℝ^{n×1} and s1, s2 are scalars. Then

\[
A(u_1 s_1 + u_2 s_2) = (Au_1)s_1 + (Au_2)s_2 \tag{3.30}
\]

Proof

Consider the ith entry of the left-hand side of (3.30). By equation (3.16) this is

\[
A_{i\bullet}(u_1 s_1 + u_2 s_2) = (A_{i\bullet}u_1)s_1 + (A_{i\bullet}u_2)s_2
\]

As (Ai•u1) s1 + (Ai•u2) s2 is by definition the ith entry of the right-hand side of equation (3.30), the result follows.

Equation (3.30) easily extends to the following general result:

Let A be an m × n matrix and suppose u1, u2, ..., uk are vectors in ℝ^n = ℝ^{n×1} and s1, s2, ..., sk are scalars. Then

\[
A(u_1 s_1 + u_2 s_2 + \cdots + u_k s_k) = (Au_1)s_1 + (Au_2)s_2 + \cdots + (Au_k)s_k \tag{3.31}
\]
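Equation (3.31) can be tried out numerically. The sketch below uses an arbitrary 2 × 3 matrix and three arbitrary vectors chosen only for illustration; the function names mat_vec and combine are our own. A numerical check on one example is of course no substitute for the proof above.

```python
def mat_vec(A, x):
    """Return Ax, where A is a list of rows and x a list of entries."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combine(vectors, scalars):
    """Return the linear combination u1*s1 + ... + uk*sk."""
    n = len(vectors[0])
    return [sum(u[i] * s for u, s in zip(vectors, scalars))
            for i in range(n)]

# Illustrative data (not taken from the notes).
A = [[1, 0, 2],
     [-1, 3, 1]]
us = [[1, 2, 3], [0, -1, 4], [5, 0, -2]]
ss = [2, -3, 7]

# Left-hand side of (3.31): A applied to the combination.
lhs = mat_vec(A, combine(us, ss))
# Right-hand side of (3.31): the same combination of the images A u_j.
rhs = combine([mat_vec(A, u) for u in us], ss)

print(lhs, rhs)
```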

Exercise 55 1. Find Ix if I = In is the n × n identity matrix as in equation (3.8) and x ∈ ℝ^{n×1}.

2. Suppose that v1, v2, ..., vk are solutions to a system Ax = b of m equations in n unknowns x1, ..., xn. Let w = v1α1 + v2α2 + ··· + vkαk be a linear combination of these vectors.

(a) Show that if the system is homogeneous, i.e. b = 0, then w is also a solution, i.e. Aw = 0.

(b) Find a simple example to show that (a) may fail if b ≠ 0.

(c) If α1 + α2 + ··· + αk = 1 show that Aw = b.

3. (a) Consider Exercise 36 of Chapter 2, Numbers 3, 4a, 5a and 6, 11a and 11b. These require solutions to a matrix equation Ax = b. For which of these is b a linear combination of the columns of A?
Consider those for which b is a linear combination of the columns of A. For which of these are the coefficients in the linear combination unique? For those which are not, express b in two ways as linear combinations of the columns of A.

Answer: See solutions to this exercise in Chapter 2.

(b) * In Problem 13 from Exercise 36 of Chapter 2, for which values of a, b, c, d is [a b c d]^T a linear combination of the columns of A?

Answer: See solutions to this exercise in Chapter 2.

4. Prove the following theorem, which was anticipated in No.10 of Exercise 36 in Chapter 2.

Let Ax = b represent m equations in n unknowns. Suppose that the system has at least one particular solution, say p. Then if Av = 0, x = p + v is also a solution, and all solutions to Ax = b have this form.
Conclude that if m < n and Ax = b has at least one solution, then the system has infinitely many solutions. In fact, if there are k non-pivot columns then the general solution contains k independent parameters. (Use 2.2.15 from Chapter 2).


3.2 Matrices as Mappings

Let A be an m × n matrix. In what follows ℝ^n stands for the set of column vectors with n entries, i.e. ℝ^n = ℝ^{n×1}. The matrix A converts or transforms the vector x ∈ ℝ^n into the vector Ax ∈ ℝ^m. In other words, letting y = Ax, we have a function of the independent variable x, with y = Ax being the dependent variable. In the language of set theory, A is a function or transformation or mapping with domain ℝ^n and range in the set ℝ^m:

\[
A : \mathbb{R}^n \to \mathbb{R}^m \tag{3.32}
\]

A maps the vector x to the vector y = Ax. Symbolically,

\[
x \xrightarrow{\;A\;} Ax \quad\text{or}\quad x \to Ax
\]

3.2.1 The range

The range of x → Ax is the set of vectors y such that Ax = y for some x. In other words, the range is just the set of all linear combinations of the columns of A. The range is therefore the column space of A (see item 2 in subsection 3.1.2 and subsection 3.1.9).
See also section ??, where some important properties of subspaces are developed.

Remark 56 If c is a fixed column vector with m entries then x → c + Ax is also a mapping from ℝ^n to ℝ^m. However, we will concentrate mainly on the case c = 0 (linear transformations).

3.2.2 Solving equations is the inverse process of the mapping x → Ax

Given the mapping (3.32) and a vector y ∈ ℝ^m, we can ask: Is there an x ∈ ℝ^n such that y = Ax? This is precisely the problem of searching for a solution x to the system of equations y = Ax, as discussed in subsection 3.1.9. In Chapter 2 we found that some systems have a solution, others not.

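Example 3.2.1 Consider the Cartesian plane ℝ^2. In equation (1.28) of Chapter 1 we found the reflection of the point (p1, p2) in the line ℓ lying in the x−y plane and which passes through the origin making an angle of 30° with the positive x-axis:

\[
p' = \left( \tfrac{1}{2}p_1 + \tfrac{1}{2}\sqrt{3}\,p_2,\ \tfrac{1}{2}\sqrt{3}\,p_1 - \tfrac{1}{2}p_2 \right)
\]

Hence considered as a mapping, the matrix

\[
A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ \tfrac{\sqrt{3}}{2} & -\tfrac{1}{2} \end{bmatrix}
\]

transforms the point x = [x1, x2]^T into its reflection in the above line:

\[
y = A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2}x_1 + \tfrac{1}{2}\sqrt{3}\,x_2 \\ \tfrac{1}{2}\sqrt{3}\,x_1 - \tfrac{1}{2}x_2 \end{bmatrix}
\]

(We have written p and p′ as columns x and y respectively). See Figure 3.1. Geometrically, it is clear that the range of this reflection is ℝ^2. This can be seen algebraically by showing that, whatever the values of y1 and y2, the matrix equation

\[
\begin{bmatrix} \tfrac{1}{2}x_1 + \tfrac{1}{2}\sqrt{3}\,x_2 \\ \tfrac{1}{2}\sqrt{3}\,x_1 - \tfrac{1}{2}x_2 \end{bmatrix}
= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\]

always has a solution for x1 and x2. (Show this).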


[Figure 3.1: the point x and its reflection y = Ax in the line ℓ through O, which makes an angle of 30° with the x1-axis.]

Example 3.2.2 The simpler matrix

\[
A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}
\]

represents a reflection in the x2-axis. The range is again ℝ^2.

Example 3.2.3 The matrix

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]

represents the mapping that projects the point x = [x1, x2, x3]^T ∈ ℝ^3 in space onto the point y = Ax = [x1, x2]^T ∈ ℝ^2 in the x1−x2 plane. The range is obviously ℝ^2.

Example 3.2.4 Consider a fixed column vector a with three entries and a non-zero column matrix

\[
A = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
\]

Since A is 3 × 1, it is a mapping A : ℝ = ℝ^1 → ℝ^3, and t → a + At represents a parametric equation of a straight line in space passing through the point a, as in Chapter 1, 1.2. The only difference is that here we are using columns instead of rows.

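Example 3.2.5 Consider the matrix

\[
A = \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \end{bmatrix}
\]

in which the columns are non-zero and non-parallel. The mapping

\[
\begin{bmatrix} s \\ t \end{bmatrix} \to A\begin{bmatrix} s \\ t \end{bmatrix}
\]

now represents a generic equation of a plane going through the origin.
See Chapter 1, subsection 1.3.1.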


Example 3.2.6 The matrix

\[
A = \begin{bmatrix} -4 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -4 \end{bmatrix}
\]

transforms the point x ∈ ℝ^3 into the point y = Ax = −4x ∈ ℝ^3. This represents a stretching x → 4x of x by a factor of 4 followed by an inversion x → −x in the origin O.

Example 3.2.7 The matrix

\[
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\]

as a mapping of ℝ^2 to ℝ^2 is

\[
y = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}
= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]

and represents a rotation of the point with position vector x through 90° (anticlockwise looking down on the plane). Convince yourself of this by looking at a few values of x; we will return to this below in 3.3.11.

Example 3.2.8 The matrix

\[
A = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix},
\]

interpreted as a mapping, leaves x2 fixed and moves x1 to x1 + αx2. It is known as a shear transformation.

\[
Ax = \begin{bmatrix} x_1 + \alpha x_2 \\ x_2 \end{bmatrix}
\]

See Figure 3.2 where α > 0. The box OBCD is transformed into the quadrilateral OEFD. The range is again ℝ^2, as can very easily be seen. (Exercise 59 No.3).

[Figure 3.2: the shear x → Ax with α > 0; the point with coordinates (x1, x2) moves horizontally by αx2 to (x1 + αx2, x2), and the box OBCD is carried to the quadrilateral OEFD.]

3.2.3 Ax = Bx for all x if, and only if, A = B

It is obvious that if A = B then Ax = Bx for all x.
Conversely, if Ax = Bx for all x ∈ ℝ^n then A and B must have the same number n of columns and also the same number m of rows (the number of rows in Ax = Bx). Furthermore, if ej ∈ ℝ^n is the jth unit vector, then by equation (3.24), A•j = Aej = Bej = B•j for j = 1, ···, n. Hence A and B have the same columns and so A = B.


3.3 Linear Transformations

Because of equation (3.31), the mapping (3.32) is said to be linear and is called a linear transformation.

In abstract terms, a mapping

\[
T : \mathbb{R}^n \to \mathbb{R}^m \tag{3.33}
\]

is called a linear transformation if for all u1 and u2 in ℝ^n and all scalars s1 and s2 it follows that

\[
T(u_1 s_1 + u_2 s_2) = T(u_1)s_1 + T(u_2)s_2 \tag{3.34}
\]

An equivalent definition is the following.
The mapping (3.33) is a linear transformation if, and only if, for all u and v in ℝ^n and all scalars s,

\[
T(u + v) = T(u) + T(v) \tag{3.35}
\]

\[
T(us) = T(u)s \tag{3.36}
\]

For if (3.34) holds then so will (3.35) (take s1 = s2 = 1) and (3.36) (take s2 = 0). On the other hand, suppose that these last two conditions hold for the mapping T. Then for any u1 and u2 and scalars s1 and s2,

\[
T(u_1 s_1 + u_2 s_2) = T(u_1 s_1) + T(u_2 s_2) = T(u_1)s_1 + T(u_2)s_2
\]

Remark 57 By putting s = 0 in (3.36) we see that a necessary condition for T to be linear is that it maps the zero vector of ℝ^n to the zero vector of ℝ^m. Most transformations are decidedly non-linear. Consider, for example, T : ℝ^1 → ℝ^1 given by T(x) = x^2. Here T(0) = 0 but T fails (3.35), since (e.g.) T(1 + 1) = 2^2 = 4 ≠ T(1) + T(1) = 2.
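The counterexample of Remark 57 is easy to try out. In the sketch below the function names are ours; note that such a test can only refute linearity at particular inputs, it can never prove it.

```python
def additive_at(T, u, v):
    """Test condition (3.35) for the single pair u, v."""
    return T(u + v) == T(u) + T(v)

def homogeneous_at(T, u, s):
    """Test condition (3.36) for the single pair u, s."""
    return T(u * s) == T(u) * s

def square(x):
    """The non-linear map T(x) = x^2 of Remark 57."""
    return x * x

def triple(x):
    """The linear map x -> 3x, i.e. the 1 x 1 matrix [3]."""
    return 3 * x

# square sends 0 to 0 but still fails (3.35) at u = v = 1.
print(square(0), additive_at(square, 1, 1))
# triple passes both tests at these sample inputs.
print(additive_at(triple, 1, 1), homogeneous_at(triple, 2, 5))
```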

3.3.1 An extension of equation (3.34)

The result expressed by equation (3.34) easily extends to the following:
Let T be a linear transformation and suppose vectors u1, u2, ..., uk in ℝ^n are given as well as scalars s1, s2, ..., sk. Then

\[
T(u_1 s_1 + u_2 s_2 + \cdots + u_k s_k) = T(u_1)s_1 + T(u_2)s_2 + \cdots + T(u_k)s_k \tag{3.37}
\]

We use this result to show that

3.3.2 A linear transformation T : ℝ^n → ℝ^m is defined by a unique matrix mapping x → Ax

Proof :

We have seen that the mapping x → Ax is a linear transformation.
Conversely, let T : ℝ^n → ℝ^m be a linear transformation. We show that there is a unique matrix A such that T(x) = Ax for all x ∈ ℝ^n.

Let ej be one of the n unit vectors (3.7). By assumption, T(ej) is a column vector with m entries. Since x ∈ ℝ^n has the expression x = e1x1 + e2x2 + ... + enxn, we deduce from equation (3.37) that

\[
T(x) = T(e_1)x_1 + T(e_2)x_2 + \cdots + T(e_n)x_n \tag{3.38}
\]

In other words, by equation (3.23),

\[
T(x) = Ax \quad (x \in \mathbb{R}^n) \tag{3.39}
\]

Here the matrix A is defined by A•j = T(ej) for j = 1, ···, n. This is saying that A is the m × n matrix

\[
A = \begin{bmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{bmatrix}
\]

The matrix A in (3.39) is unique: by subsection 3.2.3, two matrices that give the same result on every x are equal.

3.3.3 A linear transformation T is completely determined by its effect T(ej) on the unit vectors ej

This is just a restatement of equation (3.38).

Remark 58 Since a linear transformation is just a mapping determined by a matrix, why bother with the concept ‘linear transformation’? One reason is that our intuition often suggests that a transformation is linear.

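Example 3.3.1 In equation (1.13) of Chapter 1 we found the projection Q of a point P = (p1, p2, p3) on the line t(−2, −1, 1):

\[
q = \left( \tfrac{2}{3}p_1 + \tfrac{1}{3}p_2 - \tfrac{1}{3}p_3,\ \tfrac{1}{3}p_1 + \tfrac{1}{6}p_2 - \tfrac{1}{6}p_3,\ -\tfrac{1}{3}p_1 - \tfrac{1}{6}p_2 + \tfrac{1}{6}p_3 \right)
\]

Our geometric intuition strongly suggests that this should represent a linear transformation. Prove this by finding the matrix determining it.

Solution:

Writing x1 = p1, x2 = p2, x3 = p3 and using column vectors, the above projection formula becomes

\[
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
=
\begin{bmatrix}
\tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{1}{3} \\
\tfrac{1}{3} & \tfrac{1}{6} & -\tfrac{1}{6} \\
-\tfrac{1}{3} & -\tfrac{1}{6} & \tfrac{1}{6}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

This is the mapping x → q = Ax and is indeed a linear transformation.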
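The recipe of 3.3.2 (column j of A is T(ej)) can be carried out mechanically for this projection. The sketch below assumes the formula q = (p·u/|u|^2)u for the projection on the line tu, as quoted in Exercise 59 No.2a; the function names project_on_line and matrix_of are ours.

```python
from fractions import Fraction

def project_on_line(p, u=(-2, -1, 1)):
    """Projection of p on the line t*u through the origin: (p.u / u.u) u."""
    pu = sum(Fraction(a) * b for a, b in zip(p, u))
    uu = sum(b * b for b in u)
    return [pu / uu * b for b in u]

def matrix_of(T, n):
    """Matrix (list of rows) whose j-th column is T(e_j), as in 3.3.2."""
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

A = matrix_of(project_on_line, 3)
for row in A:
    print([str(entry) for entry in row])
```

The rows printed agree with the matrix found in the solution above. This only recovers the matrix from the unit vectors; that T really is linear is what the example establishes.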

Example 3.3.2 Find a matrix representation of the mapping which sends the point P to the foot of the perpendicular from P to the plane x1 + 2x2 + x3 = −4. Show that this mapping is not linear but that the projection of P on the plane x1 + 2x2 + x3 = 0 is a linear transformation. Find the matrix that reflects in this plane.

Solution:

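From equation (1.23) in Chapter 1, with P = (p1, p2, p3) we found for the foot Q

\[
q = (p_1, p_2, p_3) - \frac{4 + p_1 + 2p_2 + p_3}{6}\,(1, 2, 1)
\]

Letting x1 = p1, x2 = p2, x3 = p3, and again using column vectors, this becomes

\[
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
=
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
-
\begin{bmatrix} \tfrac{2}{3} \\ \tfrac{4}{3} \\ \tfrac{2}{3} \end{bmatrix}
+
\begin{bmatrix}
-\frac{x_1 + 2x_2 + x_3}{6} \\
-2\,\frac{x_1 + 2x_2 + x_3}{6} \\
-\frac{x_1 + 2x_2 + x_3}{6}
\end{bmatrix}
\]

This cannot be linear since x = 0 does not map to q = 0.
However, from equation (1.24) we found the formula for the projection Q of P = (p1, p2, p3) on the plane x1 + 2x2 + x3 = 0 to be

\[
q = (p_1, p_2, p_3) - \frac{p_1 + 2p_2 + p_3}{6}\,(1, 2, 1)
\]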


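In matrix form with P = x, this reads

\[
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
=
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+
\begin{bmatrix}
-\frac{x_1 + 2x_2 + x_3}{6} \\
-2\,\frac{x_1 + 2x_2 + x_3}{6} \\
-\frac{x_1 + 2x_2 + x_3}{6}
\end{bmatrix}
=
\begin{bmatrix}
\tfrac{5}{6}x_1 - \tfrac{1}{3}x_2 - \tfrac{1}{6}x_3 \\
-\tfrac{1}{3}x_1 + \tfrac{1}{3}x_2 - \tfrac{1}{3}x_3 \\
-\tfrac{1}{6}x_1 - \tfrac{1}{3}x_2 + \tfrac{5}{6}x_3
\end{bmatrix}
=
\begin{bmatrix}
\tfrac{5}{6} & -\tfrac{1}{3} & -\tfrac{1}{6} \\
-\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} \\
-\tfrac{1}{6} & -\tfrac{1}{3} & \tfrac{5}{6}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{3.40}
\]

The matrix that reflects in the plane x1 + 2x2 + x3 = 0 is

\[
2\begin{bmatrix}
\tfrac{5}{6} & -\tfrac{1}{3} & -\tfrac{1}{6} \\
-\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} \\
-\tfrac{1}{6} & -\tfrac{1}{3} & \tfrac{5}{6}
\end{bmatrix}
-
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix}
\tfrac{2}{3} & -\tfrac{2}{3} & -\tfrac{1}{3} \\
-\tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{2}{3} \\
-\tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{2}{3}
\end{bmatrix}
\]

This also defines a linear transformation.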
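As a sanity check on the two matrices of Example 3.3.2, one can verify by machine that projecting twice is the same as projecting once and that reflecting twice gives the identity. This is a sketch only; the names P, M, I3 and mat_mul are ours, and mat_mul forms row-times-column products by hand.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Projection matrix (3.40) on the plane x1 + 2*x2 + x3 = 0.
P = [[F(5, 6), F(-1, 3), F(-1, 6)],
     [F(-1, 3), F(1, 3), F(-1, 3)],
     [F(-1, 6), F(-1, 3), F(5, 6)]]

I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]

# Reflection matrix 2P - I3, as computed in the example.
M = [[2 * P[i][j] - I3[i][j] for j in range(3)] for i in range(3)]

print(mat_mul(P, P) == P)     # projecting twice changes nothing more
print(mat_mul(M, M) == I3)    # reflecting twice restores every point
```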

Exercise 59 1. Find a matrix formula for the foot Q of the perpendicular from P = x ∈ ℝ^3 to the plane 2x + z − 3 = 0. Do the same for the plane 2x + z = 0, showing in the latter case that we get a linear transformation. Write down the matrix A of this transformation, as well as the matrix that reflects in the plane 2x + z = 0.

Hint 60 Refer to 1d, equation (1.25) from Exercise 15 in Chapter 1. There you found the foot Q of the perpendicular from (p1, p2, p3) to the plane 2x + z − 3 = 0:

\[
q = \left( \tfrac{1}{5}p_1 - \tfrac{2}{5}p_3 + \tfrac{6}{5},\ p_2,\ \tfrac{4}{5}p_3 - \tfrac{2}{5}p_1 + \tfrac{3}{5} \right)
\]

Now use appropriate columns in place of row vectors.

2. (a) Consider equation (1.12) in subsection 1.2.5 of Chapter 1 for the projection of a point on a line ℓ that passes through the origin. Using column vectors find a matrix A such that the foot Q of the perpendicular from the point x is given by q = Ax.

Solution:
From Chapter 1,

\[
q = \left( \frac{1}{|u|^2}\, p \cdot u \right) u = \frac{1}{|u|^2}\,(p_1u_1 + p_2u_2 + p_3u_3)\,(u_1, u_2, u_3).
\]

The first coordinate is (1/|u|^2)(p1u1 + p2u2 + p3u3)u1 and the others are similar. The matrix is

\[
A = \begin{bmatrix}
\frac{u_1^2}{|u|^2} & \frac{u_1u_2}{|u|^2} & \frac{u_1u_3}{|u|^2} \\
\frac{u_2u_1}{|u|^2} & \frac{u_2^2}{|u|^2} & \frac{u_2u_3}{|u|^2} \\
\frac{u_3u_1}{|u|^2} & \frac{u_3u_2}{|u|^2} & \frac{u_3^2}{|u|^2}
\end{bmatrix}
\]

(b) Consider equation (1.21) from Exercise 15 in Chapter 1. Assume that the plane goes through the origin. Using column vectors find a matrix equation q = Ax for the foot Q of the perpendicular from the point x onto the plane (the projection of P onto the plane).

The matrices in 2a and 2b are called projection matrices.


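(c) Specialize (a) in case the transformation is restricted to the Cartesian plane, i.e. is from ℝ^2 to ℝ^2.

Solution:

\[
A = \begin{bmatrix}
\frac{u_1^2}{|u|^2} & \frac{u_1u_2}{|u|^2} \\
\frac{u_2u_1}{|u|^2} & \frac{u_2^2}{|u|^2}
\end{bmatrix}
= \frac{1}{|u|^2}
\begin{bmatrix} u_1^2 & u_1u_2 \\ u_2u_1 & u_2^2 \end{bmatrix}
\]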

3. Find the range of the shear transformation (Example 3.2.8) and prove that your statement is correct.

4. Use Exercise 19 from Chapter 1 to find linear transformations describing the following:

(a) The reflection in a line that passes through the origin.

Remark 61 Observe that this matrix actually represents a rotation through π radians about the given line. Contrast this with the situation when we restrict ourselves to the Cartesian plane in Exercise 65, No.9, where a reflection cannot be a rotation.

(b) The reflection in a plane that passes through the origin.

Solution:
This is done in much the same way as the previous exercise. Let A be the matrix of the projection of No.2b. The matrix of the reflection is 2A − I3:

\[
\begin{bmatrix}
1 - \frac{2n_1^2}{|n|^2} & -\frac{2n_1n_2}{|n|^2} & -\frac{2n_1n_3}{|n|^2} \\
-\frac{2n_2n_1}{|n|^2} & 1 - \frac{2n_2^2}{|n|^2} & -\frac{2n_2n_3}{|n|^2} \\
-\frac{2n_3n_1}{|n|^2} & -\frac{2n_3n_2}{|n|^2} & 1 - \frac{2n_3^2}{|n|^2}
\end{bmatrix}
\]

(c) Specialize (a) in case the transformation is restricted to the Cartesian plane, i.e. is from ℝ^2 to ℝ^2.

(d) What do you think the ranges of these mappings are?

5. In the x1−x2 plane let A be the rotation matrix of Example 3.2.7 and suppose B is the 2 × 2 matrix that reflects in the x1-axis.

(a) Find matrices C and D such that for all vectors x ∈ ℝ^2 we have Cx = B(Ax) and Dx = A(Bx).

(b) Show that C and D are reflection matrices and describe them geometrically.

Remark 62 In the next section, we will regard C as the product of B and A, in that order: C = BA. Similarly, D = AB. See Example ??

6. Let A be one of the projection matrices from the previous exercises. Show geometrically why you expect that A(Ax) = Ax for all points x. Now suppose that A is one of the reflection matrices. What should A(Ax) be? See also Exercise 65, No.11 below.

3.3.4 The product AB of two matrices

Let A be an m × k matrix and B a k × n matrix. As linear transformations,

\[
B : \mathbb{R}^n \to \mathbb{R}^k \quad\text{and}\quad A : \mathbb{R}^k \to \mathbb{R}^m.
\]

Consequently, for x ∈ ℝ^n

\[
x \to A(Bx) \tag{3.41}
\]

defines a transformation from ℝ^n to ℝ^m. See Figure 3.3. Symbolically

\[
x \xrightarrow{\;B\;} Bx \xrightarrow{\;A\;} A(Bx)
\]

[Figure 3.3: the vector x in ℝ^n is sent by B to Bx in ℝ^k, which is then sent by A to A(Bx) in ℝ^m.]

It is essential to observe that (3.41) only makes sense if the number of columns of A equals the number of rows of B (here both are k).

Remark 63 We have to read this as: first apply B to x, then apply A to the result Bx. This “backward” reading is due to our traditional functional notation: for ordinary functions f and g we write the composition of f and g as (f ◦ g)(x) = f(g(x)).

We will define the product P = AB in such a way that

\[
Px = A(Bx) \quad (x \in \mathbb{R}^n) \tag{3.42}
\]

In other words, so that AB defines the composition of the transformations A and B.

3.3.5 There is one, and only one, matrix P satisfying equation (3.42)

Proof

Suppose (3.42) holds. Put x = ej, one of the n unit vectors (3.7). Then

\[
P_{\bullet j} = Pe_j = A(Be_j) = AB_{\bullet j} \quad (j = 1, \cdots, n) \tag{3.43}
\]

In other words, for (3.42) to hold, (3.43) is the only choice for P. On the other hand, (3.41) is a linear transformation (see No.13 in Exercise 65). Therefore its matrix can only have columns (3.43). We therefore define the matrix AB as follows:

3.3.6 Definition

If A is m × k and B is k × n, then

\[
AB = \begin{bmatrix} AB_{\bullet 1} & AB_{\bullet 2} & \cdots & AB_{\bullet n} \end{bmatrix} \tag{3.44}
\]

Equivalently,

\[
[AB]_{\bullet j} = AB_{\bullet j} \quad (j = 1, \cdots, n) \tag{3.45}
\]

With this notation, writing AB for P, we rewrite (3.42) as

\[
(AB)x = A(Bx) \quad (x \in \mathbb{R}^n)
\]
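Definition (3.44) translates directly into a procedure: apply A to each column of B and place the results side by side. The sketch below does exactly this and then checks (AB)x = A(Bx) for one vector. The matrices are chosen only for illustration and the function names are ours.

```python
def mat_vec(A, x):
    """Return Ax for a matrix A (list of rows) and a column x (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column(B, j):
    """Return column j of B (counting from 0)."""
    return [row[j] for row in B]

def product_by_columns(A, B):
    """Build AB column by column: column j of AB is A times column j of B."""
    cols = [mat_vec(A, column(B, j)) for j in range(len(B[0]))]
    # Turn the list of columns back into a list of rows.
    return [[col[i] for col in cols] for i in range(len(A))]

A = [[2, -9],
     [3, 1]]
B = [[8, 2, -1],
     [-5, 4, 7]]
x = [1, -2, 3]

AB = product_by_columns(A, B)
left = mat_vec(AB, x)               # (AB)x
right = mat_vec(A, mat_vec(B, x))   # A(Bx)
print(AB, left, right)
```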

3.3.7 Alternative more symmetric definition of AB

Equation (3.44) seems biased towards columns, but that this is not the case can be seen by looking at the entries of (AB)•j. Using (3.22) with x = B•j and the notation of equation (3.1),

\[
[AB]_{ij} = A_{i\bullet}B_{\bullet j} \quad (i = 1, \cdots, m;\ j = 1, \cdots, n) \tag{3.46}
\]

The entry in row i and column j of AB is the product of row Ai• and column B•j. Equation (3.46) can just as well be taken as the definition of AB and in fact is the standard definition. This equation was anticipated in Exercise 49, No.10a above.

Whatever is true for columns (rows) has its exact counterpart for rows (columns). From (3.46) we see that

\[
[AB]_{i\bullet} = A_{i\bullet}B \quad (i = 1, \cdots, m) \tag{3.47}
\]

or more fully,

\[
AB = \begin{bmatrix} A_{1\bullet}B \\ A_{2\bullet}B \\ \vdots \\ A_{m\bullet}B \end{bmatrix}
\]

Expanding (3.46), we get, using the usual convention for matrix entries and Σ-notation,

\[
[AB]_{ij} = \sum_{r=1}^{k} a_{ir}b_{rj} = a_{i1}b_{1j} + \cdots + a_{ik}b_{kj}
\]

As already remarked, this equation only makes sense if the number of columns of A equals the number of rows of B. We may write

\[
(AB)_{m \times n} = A_{m \times k}B_{k \times n}
\]

Example 3.3.3 Let B be the matrix that reflects in the x1-axis and A the rotation matrix of Example 3.2.7. Find the matrix of (i) first rotating then reflecting and (ii) the other way round, first reflecting then rotating.
See No.5 in Exercise 59.

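Solution (i)
If we first rotate through 90° then reflect in the x1-axis we get the product

\[
BA = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
=
\begin{bmatrix}
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} &
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} -1 \\ 0 \end{bmatrix} \\[2ex]
\begin{bmatrix} 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} &
\begin{bmatrix} 0 & -1 \end{bmatrix}\begin{bmatrix} -1 \\ 0 \end{bmatrix}
\end{bmatrix}
=
\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}
\]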

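Solution (ii)
If we first reflect and then rotate we obtain

\[
AB = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
=
\begin{bmatrix}
\begin{bmatrix} 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} &
\begin{bmatrix} 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ -1 \end{bmatrix} \\[2ex]
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} &
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ -1 \end{bmatrix}
\end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]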


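Example 3.3.4 In Exercise 49, No.10a,

\[
A = \begin{bmatrix} 2 & -9 \\ 3 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 8 & 2 & -1 \\ -5 & 4 & 7 \end{bmatrix}
\]

Find AB.

Solution:

From (3.46),

\[
AB =
\begin{bmatrix}
\begin{bmatrix} 2 & -9 \end{bmatrix}\begin{bmatrix} 8 \\ -5 \end{bmatrix} &
\begin{bmatrix} 2 & -9 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \end{bmatrix} &
\begin{bmatrix} 2 & -9 \end{bmatrix}\begin{bmatrix} -1 \\ 7 \end{bmatrix} \\[2ex]
\begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 8 \\ -5 \end{bmatrix} &
\begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \end{bmatrix} &
\begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 7 \end{bmatrix}
\end{bmatrix}
=
\begin{bmatrix} 61 & -32 & -65 \\ 19 & 10 & 4 \end{bmatrix}
\]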

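Example 3.3.5 Find yB, where y = [y1 y2] and B is as in the previous example:

\[
\begin{bmatrix} y_1 & y_2 \end{bmatrix}
\begin{bmatrix} 8 & 2 & -1 \\ -5 & 4 & 7 \end{bmatrix} \tag{3.48}
\]

Solution:

\[
\begin{aligned}
yB &= \begin{bmatrix} y_1 & y_2 \end{bmatrix}
\begin{bmatrix} 8 & 2 & -1 \\ -5 & 4 & 7 \end{bmatrix} \\
&= \begin{bmatrix}
\begin{bmatrix} y_1 & y_2 \end{bmatrix}\begin{bmatrix} 8 \\ -5 \end{bmatrix} &
\begin{bmatrix} y_1 & y_2 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \end{bmatrix} &
\begin{bmatrix} y_1 & y_2 \end{bmatrix}\begin{bmatrix} -1 \\ 7 \end{bmatrix}
\end{bmatrix} \\
&= \begin{bmatrix} 8y_1 - 5y_2 & 2y_1 + 4y_2 & -y_1 + 7y_2 \end{bmatrix}
\end{aligned}
\]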

Example 3.3.6 Express equation (3.48) as a linear combination of rows of B.

Solution:

In (3.23) we found that the product Ax is a linear combination of the columns of A with coefficients xk. In exactly the same way we find the product (3.48) to be a linear combination of the rows of B with coefficients y1 and y2:

\[
\begin{aligned}
y_1B_{1\bullet} + y_2B_{2\bullet}
&= y_1\begin{bmatrix} 8 & 2 & -1 \end{bmatrix} + y_2\begin{bmatrix} -5 & 4 & 7 \end{bmatrix} \\
&= \begin{bmatrix} 8y_1 - 5y_2 & 2y_1 + 4y_2 & -y_1 + 7y_2 \end{bmatrix} \\
&= yB
\end{aligned}
\]

3.3.8 For a row y, the product yB is a linear combination of the rows of B with coefficients yi

Let B be a k × n matrix and y = [y1, ···, yk] a row vector with k entries. Then

\[
yB = y_1B_{1\bullet} + \cdots + y_kB_{k\bullet} \tag{3.49}
\]

The reasoning is just as in the case of equation (3.23) in subsection 3.1.6 except that now we are multiplying on the left of B by a row y and taking a linear combination of rows of B. Like equation (3.22) we have

\[
yB = \begin{bmatrix} yB_{\bullet 1} & yB_{\bullet 2} & \cdots & yB_{\bullet n} \end{bmatrix}
\]

which is just a special case of equation (3.44).


3.3.9 The rows of AB as linear combinations of rows of B and the columns of AB as linear combinations of columns of A

Let A = [air] be m × k and B = [brj] be k × n. In the product (3.46), we have

1.
\[
[AB]_{i\bullet} = a_{i1}B_{1\bullet} + a_{i2}B_{2\bullet} + \cdots + a_{ik}B_{k\bullet} \quad (i = 1, \cdots, m) \tag{3.50}
\]

2.
\[
[AB]_{\bullet j} = A_{\bullet 1}b_{1j} + A_{\bullet 2}b_{2j} + \cdots + A_{\bullet k}b_{kj} \quad (j = 1, \cdots, n) \tag{3.51}
\]

To see equation (3.50), use equations (3.47) and (3.49) with y = Ai•.
In words: “Row i of AB is a linear combination of the rows of B with coefficients taken from row i of A”.
To see equation (3.51), use equation (3.45) and our earlier result, equation (3.23), with x = B•j.
In words: “Column j of AB is a linear combination of the columns of A with coefficients taken from column j of B”.
The above examples illustrate these statements, as do the following.

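Example 3.3.7 Find AB in two ways: (i) in terms of linear combinations of rows of B and (ii) in terms of linear combinations of columns of A, given that

\[
A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 5 & 4 \\ -7 & 0 \end{bmatrix}
\]

Solution:

(i) In terms of the rows of AB:

\[
\begin{aligned}
AB &= \begin{bmatrix} A_{1\bullet}B \\ A_{2\bullet}B \end{bmatrix}
= \begin{bmatrix}
\begin{bmatrix} 2 & 1 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ -7 & 0 \end{bmatrix} \\[2ex]
\begin{bmatrix} -1 & 3 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ -7 & 0 \end{bmatrix}
\end{bmatrix} \\
&= \begin{bmatrix}
2\begin{bmatrix} 5 & 4 \end{bmatrix} + 1\begin{bmatrix} -7 & 0 \end{bmatrix} \\
(-1)\begin{bmatrix} 5 & 4 \end{bmatrix} + 3\begin{bmatrix} -7 & 0 \end{bmatrix}
\end{bmatrix}
= \begin{bmatrix} 3 & 8 \\ -26 & -4 \end{bmatrix}
\end{aligned}
\]

(ii) In terms of the columns of AB:

\[
\begin{aligned}
AB &= \begin{bmatrix} AB_{\bullet 1} & AB_{\bullet 2} \end{bmatrix}
= \begin{bmatrix}
\begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} 5 \\ -7 \end{bmatrix} &
\begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} 4 \\ 0 \end{bmatrix}
\end{bmatrix} \\
&= \begin{bmatrix}
\begin{bmatrix} 2 \\ -1 \end{bmatrix}5 + \begin{bmatrix} 1 \\ 3 \end{bmatrix}(-7) &
\begin{bmatrix} 2 \\ -1 \end{bmatrix}4 + \begin{bmatrix} 1 \\ 3 \end{bmatrix}0
\end{bmatrix}
= \begin{bmatrix} 3 & 8 \\ -26 & -4 \end{bmatrix}
\end{aligned}
\]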
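The two ways of building AB in Example 3.3.7 follow equations (3.50) and (3.51) step by step, so they can be written out as a short program. The helper names scale, add and column below are ours; this is meant as an illustration of the two formulas, not as an efficient way to multiply matrices.

```python
def scale(v, s):
    """Multiply every entry of the vector v by the scalar s."""
    return [s * x for x in v]

def add(*vectors):
    """Add any number of vectors of equal length entry by entry."""
    return [sum(entries) for entries in zip(*vectors)]

def column(M, j):
    """Return column j of M (counting from 0)."""
    return [row[j] for row in M]

A = [[2, 1], [-1, 3]]
B = [[5, 4], [-7, 0]]
k = len(B)  # columns of A = rows of B

# (3.50): row i of AB combines the rows of B with coefficients from row i of A.
by_rows = [add(*[scale(B[r], A[i][r]) for r in range(k)])
           for i in range(len(A))]

# (3.51): column j of AB combines the columns of A with coefficients
# from column j of B.
cols = [add(*[scale(column(A, r), B[r][j]) for r in range(k)])
        for j in range(len(B[0]))]
by_columns = [[col[i] for col in cols] for i in range(len(A))]

print(by_rows)
print(by_columns)
```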

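Example 3.3.8 If A is a 3 × 3 matrix, find matrices B and C such that

\[
BA = \begin{bmatrix}
-5A_{2\bullet} + 67A_{1\bullet} + 92A_{3\bullet} \\
9A_{3\bullet} - 4A_{1\bullet} \\
8A_{2\bullet} + 21A_{1\bullet}
\end{bmatrix}
\]

and

\[
AC = \begin{bmatrix} 7A_{\bullet 1} - 37A_{\bullet 2} & 19A_{\bullet 2} + 17A_{\bullet 1} - 7A_{\bullet 3} \end{bmatrix}
\]

(AC has two columns).

Solution:

\[
B = \begin{bmatrix} 67 & -5 & 92 \\ -4 & 0 & 9 \\ 21 & 8 & 0 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} 7 & 17 \\ -37 & 19 \\ 0 & -7 \end{bmatrix}
\]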
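Example 3.3.9 What is the matrix of the transformation T : ℝ^3 → ℝ^2 that first projects onto the plane x1 + 2x2 + x3 = 0 as in (3.40) and then projects onto the x1−x2 plane as in Example 3.2.3?

Solution:

The required matrix is the product

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix}
\tfrac{5}{6} & -\tfrac{1}{3} & -\tfrac{1}{6} \\
-\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} \\
-\tfrac{1}{6} & -\tfrac{1}{3} & \tfrac{5}{6}
\end{bmatrix}
=
\begin{bmatrix}
\tfrac{5}{6} & -\tfrac{1}{3} & -\tfrac{1}{6} \\
-\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3}
\end{bmatrix}
\]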

3.3.10 The Associative law: (AB) C = A (BC)

If A is m × k and B is k × n, our whole inspiration (3.42) for the definition of the product P = AB is that (AB)x = A(Bx) should hold for all x ∈ ℝ^n. This is a special case of the associative law for multiplication of matrices, but in fact gives us the general law directly.

Statement and proof of the associative law

Let A be an m × k matrix, B a k × n matrix and C an n × p matrix. Then

\[
(AB)C = A(BC) \tag{3.52}
\]

Considered as mappings, both sides of (3.52) have the same meaning: first apply C, then B, then A. Therefore both sides are equal as matrix products.

Powers of A if A is square. Let A be m × m. Then (AA)A and A(AA) are both defined and are equal by (3.52), so we can write A^3 for this product. Similarly, if n > 1 is an integer we have A^n = AA···A (n times), where the bracketing does not matter. By convention, A^0 = Im (the m × m identity matrix).

Alternative proof of the associative law

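We have given a motivated and especially simple proof of the associative law (3.52). The usual proof given in textbooks is the following:

\[
\begin{aligned}
[A(BC)]_{ij}
&= \sum_{r=1}^{k} [A]_{ir}[BC]_{rj}
= \sum_{r=1}^{k} [A]_{ir}\left( \sum_{s=1}^{n} [B]_{rs}[C]_{sj} \right) \\
&= \sum_{r=1}^{k}\sum_{s=1}^{n} [A]_{ir}[B]_{rs}[C]_{sj}
= \sum_{s=1}^{n}\left( \sum_{r=1}^{k} [A]_{ir}[B]_{rs} \right)[C]_{sj} \\
&= \sum_{s=1}^{n} [AB]_{is}[C]_{sj} = [(AB)C]_{ij}
\end{aligned}
\]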
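The associative law (3.52) and the convention for powers can be tried on small matrices. In the sketch below the matrices are arbitrary integer examples, mat_mul implements the entry formula (3.46), and mat_pow is our own name for repeated multiplication starting from the identity.

```python
def mat_mul(A, B):
    """Entry (i, j) of AB is (row i of A) times (column j of B)."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(m):
    """The m x m identity matrix."""
    return [[int(i == j) for j in range(m)] for i in range(m)]

def mat_pow(A, n):
    """A^n for a square matrix A and an integer n >= 0, with A^0 = I."""
    result = identity(len(A))
    for _ in range(n):
        result = mat_mul(result, A)
    return result

A = [[1, 2], [0, -1]]            # 2 x 2
B = [[3, 0, 1], [2, -2, 4]]      # 2 x 3
C = [[1, 1], [0, 5], [-3, 2]]    # 3 x 2

left = mat_mul(mat_mul(A, B), C)    # (AB)C
right = mat_mul(A, mat_mul(B, C))   # A(BC)
print(left == right)
print(mat_pow(A, 3) == mat_mul(A, mat_mul(A, A)))
```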


3.3.11 Rotations are linear transformations

Let X be a point in the Cartesian plane ℝ^2 = ℝ^{2×1} and suppose that R rotates OX = x through the angle θ about the origin anticlockwise looking down the x3-axis towards O. This is a clockwise rotation of θ looking from the origin O along the positive x3-axis. Let R(x) be the effect of R on x.

[Figure 3.4: the parallelogram OACB with sides a and b and diagonal a + b, together with its image OA′C′B′ under the rotation R, with sides R(a) and R(b) and diagonal R(a + b).]

We will now see that R : ℝ^2 → ℝ^2 is a linear transformation.
In Figure 3.4, A and B are any two points in the plane with position vectors a and b. The point C has position vector c = a + b and the quadrilateral OACB is a parallelogram.
The points A′, B′ and C′ are the respective results of rotating about O the points A, B and C through the angle θ. In other words, A′, B′ and C′ have position vectors R(a), R(b) and R(a + b) respectively. The parallelogram OA′C′B′ is congruent to OACB and this shows that

\[
R(a + b) = R(a) + R(b)
\]

Thus the transformation R satisfies the first condition for linearity, equation (3.35). It is pretty obvious that if we multiply a by a scalar s then R(a) will be multiplied by the same factor, i.e. that R(as) = R(a)s, so that the second condition for linearity as expressed in equation (3.36) also holds. We have shown that R : ℝ^2 → ℝ^2 is a linear transformation.

3.3.12 The matrix Rθ of a rotation through θ

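Let Rθ be the matrix of the above rotation transformation R that rotates through the angle θ. Then

\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{3.53}
\]

To see this, note that from Figure 3.5,

\[
R_\theta\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\quad\text{and}\quad
R_\theta\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}
\]

Since these are the first and second columns of Rθ respectively, this proves (3.53).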

[Figure 3.5: the unit vectors [1, 0]^T and [0, 1]^T and their images [cos θ, sin θ]^T and [−sin θ, cos θ]^T under the rotation through θ.]

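Example 3.3.10 Write down the linear transformations that rotate π/3 and 2π/3 radians about the origin in the Cartesian plane.

Solution:

\[
R_{\pi/3} = \begin{bmatrix} \cos\frac{\pi}{3} & -\sin\frac{\pi}{3} \\ \sin\frac{\pi}{3} & \cos\frac{\pi}{3} \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ \tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{bmatrix}
\]

\[
R_{2\pi/3} = \begin{bmatrix} \cos\frac{2\pi}{3} & -\sin\frac{2\pi}{3} \\ \sin\frac{2\pi}{3} & \cos\frac{2\pi}{3} \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{bmatrix}
\]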

3.3.13 The addition formulae for sine and cosine

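We note that rotation through φ followed by the rotation through θ is equivalent to the rotation through θ + φ, that is,

\[
R_\theta R_\phi = R_{\theta + \phi}
\]

This means

\[
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}
=
\begin{bmatrix} \cos(\theta + \phi) & -\sin(\theta + \phi) \\ \sin(\theta + \phi) & \cos(\theta + \phi) \end{bmatrix} \tag{3.54}
\]

Multiplying the two matrices on the left gives the addition formulae for sine and cosine:

\[
\begin{bmatrix}
\cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\
\sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi
\end{bmatrix}
=
\begin{bmatrix} \cos(\theta + \phi) & -\sin(\theta + \phi) \\ \sin(\theta + \phi) & \cos(\theta + \phi) \end{bmatrix}
\]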
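Equation (3.54) can be checked numerically for particular angles. The sketch below uses floating-point arithmetic, so the two sides are compared up to a small tolerance rather than exactly; the angles and the names rotation, mat_mul and max_difference are our own choices for illustration.

```python
import math

def rotation(theta):
    """The matrix (3.53) of the rotation through theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def max_difference(A, B):
    """Largest absolute difference between corresponding entries."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

theta, phi = math.pi / 3, 0.4
composed = mat_mul(rotation(theta), rotation(phi))
direct = rotation(theta + phi)
error = max_difference(composed, direct)
print(error < 1e-12)
```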

Remark 64 Let X ≠ O be a point in three-dimensional space. Looking along OX we can rotate clockwise through θ and this defines a transformation T : ℝ^3 → ℝ^3. Much the same argument shows that T is linear. However, the formula for its matrix is more complicated.

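Exercise 65 1. Work out the following:

(a)
\[
\begin{bmatrix} -1 & -2 & 3 \\ 2 & 7 & 4 \\ 1 & -5 & -2 \end{bmatrix}
\begin{bmatrix} 3 & 6 & 3 \\ -1 & -2 & 4 \\ -4 & -3 & 1 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 0 & 3 & -4 \\ -5 & -2 & -1 \\ -3 & 5 & 1 \end{bmatrix}
\begin{bmatrix} 4 & -3 & -2 \\ -1 & 7 & 5 \\ 0 & 4 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} -3 & 1 & 1 \\ -3 & -3 & -2 \end{bmatrix}
\begin{bmatrix} 3 & 0 & 1 & 3 \\ 1 & -1 & 2 & 0 \\ -1 & 1 & 0 & -2 \end{bmatrix}
\]

(d) (The result of No.16 will help to do this one).
\[
\begin{aligned}
&3\begin{bmatrix} -4 & 4 & 0 & 4 \\ -3 & 2 & -3 & 2 \\ 3 & -1 & 1 & -1 \\ 0 & 4 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 3 & -3 & 1 \\ 3 & -3 & 1 & 1 \\ -1 & -2 & 3 & -1 \end{bmatrix} \\
&\quad - 2\begin{bmatrix} -4 & 4 & 0 & 4 \\ -3 & 2 & -3 & 2 \\ 3 & -1 & 1 & -1 \\ 0 & 4 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 3 & -1 & -1 & -3 \\ 2 & 3 & -1 & -2 \\ 1 & -3 & 2 & 3 \\ -2 & -2 & 0 & 0 \end{bmatrix} \\
&\quad - \begin{bmatrix} -4 & 4 & 0 & 4 \\ -3 & 2 & -3 & 2 \\ 3 & -1 & 1 & -1 \\ 0 & 4 & 1 & 0 \end{bmatrix}
\begin{bmatrix} -3 & 2 & -1 & -3 \\ 0 & 2 & 0 & -1 \\ -2 & 1 & -3 & 1 \\ 0 & 0 & -2 & 2 \end{bmatrix}
\end{aligned}
\]

(e)
\[
\begin{bmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \lambda \end{bmatrix}^{n}
\]
(n is a positive integer).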

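2. Find all possible products of the following matrices, whenever the product is defined, and when this is the case, express the rows and columns of the product as linear combinations, as in Example 3.3.7.

\[
A = \begin{bmatrix} -1 & 3 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -2 & 5 \\ 4 & -7 \\ 0 & 3 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 0 & -5 \\ -4 & 3 & 11 \end{bmatrix},
\]

\[
D = \begin{bmatrix} -3 & 4 & -1 \end{bmatrix}, \quad
E = \begin{bmatrix} 2 \\ -1 \\ 6 \end{bmatrix}
\]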

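3. Given that B is a 3 × 3 matrix, find by inspection matrices A and C such that

\[
AB = \begin{bmatrix}
B_{1\bullet} - 2B_{3\bullet} \\
2B_{2\bullet} + 5B_{3\bullet} \\
B_{1\bullet} + 4B_{2\bullet} + 7B_{3\bullet} \\
-3B_{1\bullet}
\end{bmatrix}
\]

\[
BC = \begin{bmatrix} B_{\bullet 1}(-2) + B_{\bullet 2}(3) + B_{\bullet 3} & B_{\bullet 2} - B_{\bullet 1} \end{bmatrix}
\]

Note that BC has two columns.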

4. If a ∈ ℝ^{1×n} is a row and b ∈ ℝ^{m×1} is a column, is the product X = ba defined? If so, describe the entries [X]ij.

5. If AB = C and two of A, B, C are square then so is the other. True or false?

6. Let A be an m × n matrix and Im and In identity matrices (see equation (3.8)). Show that

ImA = A = AIn

Conclude that if A is n × n and I = In that IA = AI = A.

7. In No.8 of Exercise 6 of Chapter 1 you found the formula for the projection of a point (p1, p2, p3) on the line (−2t, −t, 5t). Find its matrix B as a linear transformation. In No.1 of Exercise 59 above you found the matrix A of the projection on the plane 2x + z = 0. Find

(a) the matrix of the transformation that first projects on the plane 2x + z = 0 then on the line (−2t, −t, 5t).

(b) the matrix of the transformation that first projects on the line (−2t, −t, 5t) then on the plane 2x + z = 0.

8. (a) Complete the proof of the addition formulae for sine and cosine by multiplying out equation (3.54).

(b) Check by direct multiplication that RθR−θ = I2.

9. *

(a) Let ℓ be a line in the Cartesian plane making an angle θ with the positive x1-axis. Find the projection matrix Pθ on ℓ and deduce that the matrix Mθ which reflects in ℓ is given by

\[
M_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\]

Observe that, in spite of superficial appearances, Mθ is not a rotation matrix. Why?
We offer two proofs that a reflection in a line in the Cartesian plane cannot be a rotation.

Geometric proof:
Let x = OX and Mθx = OX′. Then the arm OX′ is OX rotated through a certain angle, say 2γ. But this angle depends on the distance of X from the line through O making an angle of θ with the x1-axis and so Mθ cannot represent a rotation. (See also the next question.)

Algebraic proof:
Algebraically, for a rotation matrix R, we have [R]11[R]22 − [R]12[R]21 = 1. For the reflection M = Mθ this number is [M]11[M]22 − [M]12[M]21 = −1.

(b) Let φ and θ be two angles and α = φ− θ. Prove that MφMθ = R2α in two ways:

i. geometrically by referring to Figure 3.6;

[Figure 3.6: the point x, its reflection Mθx in the line through O at angle θ to the x1-axis, and the point MφMθx obtained by reflecting again in the line at angle φ; the angle between the two lines is α = φ − θ.]


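ii. algebraically.

Hint 66 For the geometric solution, let Mθ rotate x through 2γ. Then Mφ rotates Mθx through 2(α − γ). For the algebraic solution use the expression for Mθ found in the previous exercise.

Algebraic solution

\[
\begin{aligned}
M_\phi M_\theta
&= \begin{bmatrix} \cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi \end{bmatrix}
\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \\
&= \begin{bmatrix}
\cos 2\phi\cos 2\theta + \sin 2\phi\sin 2\theta & \cos 2\phi\sin 2\theta - \sin 2\phi\cos 2\theta \\
\sin 2\phi\cos 2\theta - \cos 2\phi\sin 2\theta & \cos 2\phi\cos 2\theta + \sin 2\phi\sin 2\theta
\end{bmatrix} \\
&= \begin{bmatrix}
\cos(2\phi - 2\theta) & -\sin(2\phi - 2\theta) \\
\sin(2\phi - 2\theta) & \cos(2\phi - 2\theta)
\end{bmatrix} \\
&= R_{2(\phi - \theta)}
\end{aligned}
\]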
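The identity MφMθ = R2(φ−θ), and the numbers ±1 used in the algebraic proof of (a), can also be checked numerically for sample angles. The sketch below is only a spot check with floating-point arithmetic and arbitrarily chosen angles; the function names are ours.

```python
import math

def reflection(theta):
    """Reflection in the line through O at angle theta to the x1-axis."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return [[c, s],
            [s, -c]]

def rotation(theta):
    """Rotation through theta, equation (3.53)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def mat_mul(A, B):
    """Product of two 2 x 2 (or larger) matrices given as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def number_11_22_minus_12_21(M):
    """The number [M]11[M]22 - [M]12[M]21 used in the algebraic proof."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

phi, theta = 1.1, 0.3
product = mat_mul(reflection(phi), reflection(theta))
expected = rotation(2 * (phi - theta))
error = max(abs(p - e) for rp, re in zip(product, expected)
            for p, e in zip(rp, re))

print(error < 1e-12)
print(round(number_11_22_minus_12_21(reflection(theta)), 12),
      round(number_11_22_minus_12_21(rotation(theta)), 12))
```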

10. Following the corkscrew rule, find matrices that represent rotations through θ about the x-, y- and z-axes.

Hint 67 Recall the corkscrew rule: looking along the specified axis, rotate clockwise through θ. The required rotation about the z-axis is

\[
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

11. (Compare Exercise 59, No.6 above). Let M be the matrix of a reflection in a plane going through the origin and P the matrix of a projection on the same plane. Explain geometrically why you expect the following to be true. Draw diagrams!

(a) M^2 = I3

(b) P^2 = P

(c) MP = P = PM

Verify these results for the plane 2x + z = 0 of No.1 of Exercise 59 above.
For analytical proofs of these results see No.17 below.

12. If the matrix product AB is defined, show that so is B^T A^T and that (AB)^T = B^T A^T.

Hint 68 [(AB)^T]ji = [AB]ij = Ai•B•j. Now use equation (3.13) and Exercise 49 No.6.

13. Prove that the composite transformation x → A(Bx) of (3.41) is indeed linear. Can you see why a similar result holds for general linear transformations?

Hint 69 Consider A[B(us + vt)] = A[(Bu)s + (Bv)t] = ···

14. (“Linear combinations of linear combinations” theorem)

(a) In Exercise 49, No.8c you anticipated the following result: Let x be a column vector with m entries that depends linearly on a1, . . . , ak and suppose that each vector aj depends linearly on b1, b2, . . . , bn. Then x depends linearly on b1, b2, . . . , bn. Using matrix multiplication give a short proof of this statement. It should be clear that the same result must be true if the vectors are rows, but give a modified matrix proof.

(b) Interpret this result in terms of the span concept.

Hint 70 Let the m × k matrix A have columns a1, . . . , ak and let the m × n matrix B have columns b1, . . . , bn. By assumption, there is an n × k matrix C such that A = BC. Since x = Au for some column vector u with k entries, the result follows. Fill in the remaining details. For the row case, think in terms of the transposes of the above matrices.

15. Show that the column case of No.14 can also be proved using Σ-notation as follows: Write $x = \sum_{j=1}^{k} a_j u_j$ and $a_j = \sum_{i=1}^{n} b_i c_{ij}$ for j = 1, . . . , k. Then expand and rearrange the sum

$$
x = \sum_{j=1}^{k} \left( \sum_{i=1}^{n} b_i c_{ij} \right) u_j
$$

16. (a) If the matrix sum A + B and the matrix product C(A + B) are defined, show that C(A + B) = CA + CB.

(b) If the matrix sum A + B and the matrix product (A + B)C are defined, show that (A + B)C = AC + BC.

(c) If AB is defined, and s is a scalar, show s(AB) = (sA)B and (AB)s = A(Bs) and that both are equal. Conclude with statements and proofs of more general results than (a) and (b).

Hint 71 To show (a) use the definition of matrix multiplication and equation (3.14):

$$
[C(A+B)]_{ij} = C_{i\bullet}[A+B]_{\bullet j} = C_{i\bullet}(A_{\bullet j} + B_{\bullet j}) = C_{i\bullet}A_{\bullet j} + C_{i\bullet}B_{\bullet j} = \cdots
$$

If C is m × k and A and B are k × n, you may prefer to proceed directly:

$$
[C(A+B)]_{ij} = \sum_{r=1}^{k} c_{ir}[A+B]_{rj} = \sum_{r=1}^{k} c_{ir}(a_{rj} + b_{rj}) = \cdots
$$

17. *

(a) Let C be an n × n matrix satisfying $C^2 = C$, and let $I = I_n$.

Show that $(I - C)^2 = I - C$ and that $(2C - I)^2 = I$.

The rest of the question refers to No.11 above and to Exercise 59, No.2a, No.2b and No.4.

(b) Let P be the projection matrix of Exercise 59, No.2a and write the vector u as a column vector $u = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}^T$.

Show that

i. $P = \frac{1}{|u|^2}\, u u^T$,

ii. $P^2 = P$ using matrix multiplication,

iii. the matrix M = 2P − I reflecting in the line satisfies $M^2 = I$.

(c) Let P be the projection matrix of Exercise 59, No.2b onto a plane through the origin with normal $n = \begin{bmatrix} n_1 & n_2 & n_3 \end{bmatrix}^T$.

Show that

i. $P = I - \frac{1}{|n|^2}\, n n^T$,

ii. $P^2 = P$ using matrix multiplication,

iii. the matrix M = 2P − I reflecting in the plane satisfies $M^2 = I$.

(d) Complete analytical proofs of No.11 above.

Hint 72 Note that we are forming matrix products of a column and a row as in No.4. Also make use of No.17a.
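Before attempting the proofs in No.17 it may help to see the formulas at work on numbers. The sketch below is only a numerical illustration under stated assumptions: the vector u = (1, 2, 2) is an arbitrary choice, the normal n = (2, 0, 1) is read off from the plane 2x + z = 0 of No.11, and all function names are invented here. Exact fractions are used so that equalities can be tested without rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    """The n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def outer_over_norm(v):
    """The matrix v v^T / |v|^2, a column times a row divided by |v|^2."""
    norm_sq = sum(Fraction(vi) * vi for vi in v)
    return [[Fraction(vi) * vj / norm_sq for vj in v] for vi in v]

def combine(a, X, b, Y):
    """The matrix aX + bY."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I3 = identity(3)

# Projection on the line along u and the reflection in that line.
u = [1, 2, 2]                       # sample direction (assumption)
P_line = outer_over_norm(u)
M_line = combine(2, P_line, -1, I3)

# Projection on the plane 2x + z = 0 and the reflection in that plane.
n = [2, 0, 1]                       # normal read off from 2x + z = 0
P_plane = combine(1, I3, -1, outer_over_norm(n))
M_plane = combine(2, P_plane, -1, I3)

print(matmul(P_line, P_line) == P_line, matmul(M_line, M_line) == I3)
print(matmul(P_plane, P_plane) == P_plane, matmul(M_plane, M_plane) == I3)
print(matmul(M_plane, P_plane) == P_plane == matmul(P_plane, M_plane))
```

A check on one sample vector is of course not a proof; it only shows what the identities of No.11 and No.17 assert for concrete matrices.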


3.4 Linear independence

In ℝ² let a and b be two column vectors. If one depends linearly on the other, say a = bs for a scalar s, we say they are linearly dependent. Otherwise the vectors are linearly independent: neither depends on the other. In that case, if some linear combination as + bt = 0, then both s and t have to be zero. For if (say) t ≠ 0, then b = a(−s/t), and b would depend on a.

Similarly, in ℝ³ three column vectors a, b and c are linearly dependent if one of them is a linear combination of the other two. This will be the case if there are numbers s, t, r not all zero such that as + bt + cr = 0. For example, if s ≠ 0, then a will depend linearly on b and c. They are linearly independent if no vector depends linearly on the other two. In that case we can only have as + bt + cr = 0 if s = t = r = 0.

Geometrically, let a, b and c be linearly independent and let the tails of these vectors be placed at the origin O. Then O and the heads of a, b and c do not lie on a plane. For more details on the geometric meaning of linear independence see subsection 3.4.10 below.

We now generalize and formalize these ideas as follows:

3.4.1 Column independence and column dependence

Definition

Let a1, a2, · · · , an be a list of n column vectors each with m entries. Form the m × n matrix A with these vectors as columns:

$$
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
$$

If there is a non-trivial solution x ≠ 0 to the homogeneous system

Ax = 0 (3.55)

the vectors a1, a2, · · · , an are said to be linearly dependent, otherwise they are linearly independent.

To put it another way, the columns of the m × n matrix A are linearly independent if, and only if, for all column vectors x with n entries

Ax = 0 implies x = 0 (3.56)

To say the same thing once more, if we can find a linear combination such that

a1x1 + a2x2 + · · · + anxn = 0 (3.57)

in which one or more of the coefficients xj are not zero, then the vectors a1, a2, ..., an are linearly dependent (there is a non-trivial linear relationship among them), otherwise they are independent.
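The two ways of writing a dependency, Ax = 0 in (3.55) and a1x1 + · · · + anxn = 0 in (3.57), can be compared on a concrete matrix. This is a small sketch only: the 3 × 3 matrix is the one the notes use in Example 3.4.1, while the vector x = (−2, 3, 1) was worked out by hand for this illustration and the function names are invented here.

```python
def matvec(A, x):
    """Compute Ax for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def column_combination(columns, x):
    """Compute a1*x1 + ... + an*xn for a list of column vectors, as in (3.57)."""
    m = len(columns[0])
    return [sum(col[i] * xj for col, xj in zip(columns, x)) for i in range(m)]

A = [[2, 5, -11],
     [-3, -6, 12],
     [4, 7, -13]]
columns = [list(col) for col in zip(*A)]  # a1, a2, a3

x = [-2, 3, 1]  # candidate non-trivial solution (assumption, found by hand)

print(matvec(A, x))                    # [0, 0, 0]
print(column_combination(columns, x))  # [0, 0, 0]
```

Both computations give the zero vector although x ≠ 0, so the three columns are linearly dependent in the sense of the definition.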

3.4.2 An equivalent way to see independence/dependence

Let a1, a2, · · · , an be column vectors each with m entries.

If there is only one vector, i.e. n = 1, then a1 is dependent if, and only if, it is the zero vector.

If n ≥ 2, the vectors are linearly dependent if, and only if, one of them depends linearly on the others.

Proof: The proof of the case n = 1 is left to you. See Exercise 78, No.1a.


Let n ≥ 2 and suppose the vectors are linearly dependent. Then (say) x1 ≠ 0 in equation (3.57). In that case a1 depends linearly on a2, ..., an:

$$
a_1 = a_2\left(-\frac{x_2}{x_1}\right) + \cdots + a_n\left(-\frac{x_n}{x_1}\right)
$$

Similarly, if xj ≠ 0, then aj depends linearly on the vectors ai for i ≠ j.

Conversely, suppose (say) there are scalars s2, . . . , sn such that

$$
a_1 = a_2 s_2 + \cdots + a_n s_n
$$

Then

$$
a_1 + a_2(-s_2) + \cdots + a_n(-s_n) = 0
$$

Since the coefficient of a1 is 1 ≠ 0, the vectors are linearly dependent, as required.

For a slightly refined version of this theorem, see Exercise 78, No.1(c)v.

Independence/dependence of row vectors

The definition of linear independence for a list of row vectors is almost identical to that for column vectors. See 3.4.14 below.

3.4.3 Some examples of independence/dependence

The most elementary example of linear independence

The simplest example of linearly independent column vectors is afforded by one or more columns of the n × n identity matrix I = In. Since Ix = x, we can only have Ix = 0 if x = 0. Slightly more complicated is

Example 3.4.1 Consider the matrix (3.25) of example 3.1.6:

$$
A = \begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
$$

A row echelon form was found in equation (3.26):

$$
A' = \begin{bmatrix} 2 & 5 & -11 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix}
$$

The columns of A are linearly dependent, as can be seen from (3.27) by letting t ≠ 0, say putting t = 1.

The columns of the matrix (3.29) in example 3.1.7 are linearly independent since Ax = 0 has only the trivial solution x = 0.

The columns of A in Exercise 36 No.2b are linearly independent.

3.4.4 If a matrix has more columns than rows, its columns are linearly dependent; if more rows than columns then its rows are linearly dependent

This is just a rephrasing of the result of subsection 2.2.15 in Chapter 2: A homogeneous system with more unknowns than equations has a non-trivial solution.

The row case is quite similar.


3.4.5 Testing for linear independence of some columns of A

Suppose that A has seven columns and we wish to test if the columns of [A•2, A•5, A•6] are linearly independent. This is the same as considering all vectors x with x1 = x3 = x4 = x7 = 0 and testing the implication (3.56). The same principle obviously applies to any selection of columns of A.

3.4.6 Column-rank and row-rank

If a1, a2, . . . , an are column vectors, their column-rank is the maximum number of vectors aj that are linearly independent. This is also the column-rank of the matrix $A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$.

Row-rank The row-rank of a matrix A is defined similarly, except that ‘rows’ replace ‘columns’. See 3.4.14 below for more details.

Given a list of vectors of the same size (either rows or columns), it should be clear what is meant by their rank: it is the maximum number of linearly independent vectors among them. Below we show that the column- and row-ranks of a matrix are equal. (See subsection 3.4.17).

3.4.7 Elementary row operations preserve column independence and dependence

Consider a system Ax = b of m equations for n unknowns. Let [A′, b′] be the array after performing elementary row operations. We saw in Chapter 2 that Ax = b holds if, and only if, A′x = b′ holds. In the homogeneous case this reads

0 = A•1x1 + · · · + A•nxn if, and only if, 0 = A′•1x1 + · · · + A′•nxn

Any dependency relation among the columns of A also holds for A′ and vice-versa:

A•1x1 + · · · + A•nxn = 0 if, and only if, A′•1x1 + · · · + A′•nxn = 0

In particular, certain columns of A are linearly independent if, and only if, the same columns of A′ are linearly independent. It follows that the column-ranks of A and A′ are the same. Also, column A•j is a linear combination of other columns of A if, and only if, A′•j is a linear combination of the corresponding columns of A′ with the same coefficients.

For these reasons, questions regarding dependence or independence of the columns of A are best answered by considering the row reduced echelon form (row REF) of A.

3.4.8 The column-rank of a matrix is the number of non-zero rows in a row echelon form

Proof:

Let k be the number of non-zero rows in an echelon form of the m × n matrix A and suppose that A′′ is its row REF. Recall that the pivot columns (see 2.2.8) of A′′ are columns of the identity matrix Im, which are certainly linearly independent. Hence the column-rank of A′′ is at least k. The k × n matrix of the non-zero rows of A′′ cannot have column-rank more than k by 3.4.4. Since the other rows of A′′ are zero, the column-rank of A′′ is exactly k. This is also the column-rank of A.


3.4.9 Uniqueness of the solution to Ax = b. One-to-one linear transformations

If the columns of the m × n matrix A are linearly independent then for any m × 1 column vector b the system of equations Ax = b has at most one solution.

To see this, consider the following:

Suppose that u and v are two solutions, i.e. Au = b and Av = b. Then Au = Av and so Au − Av = A(u − v) = 0. As the columns of A are linearly independent, u − v = 0 and u = v.

Consider the matrix mapping x → Ax of (3.32). To say that this mapping is one-to-one means that for any two column vectors u and v with n entries, we can have Au = Av only if u = v. In other words, two linear combinations of the columns of A are equal only if they have identical coefficients. The mapping x → Ax is one-to-one if, and only if, the column-rank of A is n.

Example 3.4.2 A row echelon form (2.2) of the coefficient matrix A of Example 2.1.2 in Chapter 2 is

$$
\begin{bmatrix} 1 & 1 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}
$$

The column-rank of A is 3 and the mapping

$$
x \to \begin{bmatrix} 1 & 1 & 3 \\ 1 & 1 & 4 \\ -1 & 1 & -2 \end{bmatrix} x
$$

from ℝ³ to ℝ³ is one-to-one.

All this becomes even clearer if we look at the row-reduced echelon form in Example 2.2.7 of Chapter 2. This is the 3 × 3 identity matrix and so Ax = b has a solution for any b ∈ ℝ³: The mapping x → Ax has range the whole of ℝ³.

The example illustrates the following remark.

3.4.10 Geometric interpretation of column-rank: dimension

A proper definition of ‘dimension’ is not given in this course, but we can accept that a line in ordinary space through O has dimension 1, that a plane through O has dimension 2 and that the dimension of the whole of ordinary space is 3. The column-rank of a matrix A can be thought of as the dimension of its range when A is considered as a linear mapping. To better appreciate this, see Exercise 78, No.19 as well as No.?? in Exercise ??.

Consider the matrix of Example 3.3.1 that projects on a line. Each column is a multiple of any other and the column-rank is 1, as expected.

Again, consider the coefficient matrix A of equation (3.40). Because A•1 + A•2(2) + A•3 = 0, the column-rank of A is not 3. It is easy to see that any two columns are linearly independent, so r(A) = 2, in agreement with our comment.

The range of a reflection matrix is the whole of space, which has dimension 3. So we expect its rank to be 3, which is true.

3.4.11 The fundamental Theorem on linear independence

Let a1, a2, . . . , an, b be any list of n + 1 column vectors, each with m entries. Then the rank of a1, a2, . . . , an, b is not more than the rank of a1, a2, . . . , an if, and only if, b depends linearly on a1, a2, . . . , an.


Put slightly differently, the theorem says that

column-rank [A, b] = column-rank A

if, and only if, Ax = b has a solution. Here, A is the m × n matrix having the vectors aj as columns.

Proof: The proof depends on the existence theorem regarding solutions to Ax = b (section 2.3 in Chapter 2).

Suppose that the column-rank of [A, b] is the same as that of A. Then the same applies to the REF: The column-ranks of [A′′, b′′] and A′′ are equal. Hence the last non-zero row of [A′′, b′′] cannot have the form [0 . . . 0 b′r] where b′r ≠ 0, since otherwise [A′′, b′′] would have more non-zero rows than A′′ and so, by 3.4.8, a larger column-rank. It follows that Ax = b has a solution.

Conversely, let Ax = b have a solution. Then the REFs [A′′, b′′] and A′′ have the same number of non-zero rows and so the same column-rank.
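The criterion column-rank [A, b] = column-rank A lends itself to a mechanical test. The following is a minimal sketch under stated assumptions: the helper names `rank` and `has_solution` are invented here, the rank is obtained by counting pivots during forward elimination with exact fractions (as in 3.4.8), and the two right-hand sides are sample vectors chosen for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Number of non-zero rows left after reduction to a row echelon form."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # index of the row that will receive the next pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r

def has_solution(A, b):
    """Ax = b is solvable exactly when rank [A, b] equals rank A."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

A = [[2, 5, -11],
     [-3, -6, 12],
     [4, 7, -13]]

b_in = [7, -9, 11]  # first column plus second column of A
b_out = [1, 0, 0]   # sample vector chosen for this sketch (assumption)

print(rank(A), has_solution(A, b_in), has_solution(A, b_out))  # 2 True False
```

Adjoining b_in leaves the rank at 2, while adjoining b_out raises it to 3, so only the first system has a solution.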

Remark 73 We emphasize that the above theorem 3.4.11 about column vectors is equally valid for row vectors. The statement 3.4.11 goes through word-for-word.

For those proceeding to a more advanced study of linear algebra an independent proof of the fundamental theorem is provided in section 3.6.

3.4.12 Lemma: A special case

A special case concerning linear independence/dependence is of interest and useful in itself and we state and give a separate proof.

Let a1, a2, . . . , an be a list of n linearly independent column vectors, each with m entries and b a column vector with m entries. Then the rank of a1, a2, . . . , an, b equals the rank of a1, a2, . . . , an if, and only if, b depends linearly on a1, a2, . . . , an.

Proof: The rank of a1, a2, . . . , an, b is either n or n + 1. If it is n, then these vectors are linearly dependent, and there exist constants x1, ..., xn, β not all zero such that

$$
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b\beta = 0
$$

We must have β ≠ 0, otherwise one or more of the scalars xj would have to be non-zero, contradicting the linear independence of the vectors a1, a2, . . . , an. Therefore,

$$
b = a_1\left(-x_1\beta^{-1}\right) + a_2\left(-x_2\beta^{-1}\right) + \cdots + a_n\left(-x_n\beta^{-1}\right)
$$

and b is a linear combination of a1, a2, . . . , an.

Conversely, if b is a linear combination of a1, a2, . . . , an, then a1, a2, . . . , an, b cannot be linearly independent, and the lemma has been proved.

3.4.13 Every column of A is a linear combination of the linearly independent pivot columns of A

This is because the pivot columns of A′′, when restricted to the non-zero rows, are the columns of the identity matrix Ik.

The statement says that the pivot columns span the column space and in Exercise 78 No.2 you are asked to show it is a consequence of the above basic Lemma.

Example 3.4.3 Consider example 3.1.6 once more. Can we find a set of linearly independent columns of A in (3.25) such that every column of A is a linear combination of these independent columns? What is the column-rank of A? Illustrate the statements of 3.4.8 and 3.4.13.


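Solution:

The original matrix (3.25) is

$$
A = \begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
$$

The echelon form (3.26)

$$
\begin{bmatrix} 2 & 5 & -11 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix}
$$

shows that the first two columns of A are linearly independent. The row REF shows things even more clearly:

$$
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix} \tag{3.58}
$$

Every column of A is a linear combination of the first two, which are pivot columns. The column-rank is 2.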

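Example 3.4.4 Let

$$
A = \begin{bmatrix} -1 & -4 & -3 & 2 \\ 3 & 4 & 1 & 2 \\ 2 & 9 & 7 & -5 \\ -5 & -1 & 4 & -9 \end{bmatrix}
$$

Illustrate subsections 3.4.7, 3.4.8 and 3.4.13.

Solution:

We first reduce the matrix A to row REF. The operation written next to a row is the one that produced it:

$$
\begin{array}{rrrrl}
-1 & -4 & -3 & 2 & \\
0 & -8 & -8 & 8 & \quad 3R_1 + R_2 \\
0 & 1 & 1 & -1 & \quad 2R_1 + R_3 \\
0 & 19 & 19 & -19 & \quad -5R_1 + R_4
\end{array}
$$

$$
\begin{array}{rrrrl}
-1 & -4 & -3 & 2 & \\
0 & -1 & -1 & 1 & \quad \tfrac{1}{8}R_2 \\
0 & 1 & 1 & -1 & \\
0 & 1 & 1 & -1 & \quad \tfrac{1}{19}R_4
\end{array}
$$

$$
\begin{array}{rrrrl}
-1 & -4 & -3 & 2 & \\
0 & -1 & -1 & 1 & \\
0 & 0 & 0 & 0 & \quad R_2 + R_3 \\
0 & 0 & 0 & 0 & \quad R_2 + R_4
\end{array}
$$

$$
\begin{array}{rrrrl}
-1 & -4 & -3 & 2 & \\
0 & 1 & 1 & -1 & \quad -R_2 \\
0 & 0 & 0 & 0 & \\
0 & 0 & 0 & 0 &
\end{array}
$$

$$
\begin{array}{rrrrl}
-1 & 0 & 1 & -2 & \quad 4R_2 + R_1 \\
0 & 1 & 1 & -1 & \\
0 & 0 & 0 & 0 & \\
0 & 0 & 0 & 0 &
\end{array}
$$

After multiplying the first row by −1, this gives the row REF A′′ of A as

$$
E = \begin{bmatrix} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$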

The pivot columns of E are E•1 and E•2 and they are linearly independent, and therefore so are A•1 and A•2. The column-rank of A is 2. Clearly, every column of E is a linear combination of E•1 and E•2. Therefore, every column of A depends linearly on A•1 and A•2. For example, E•4 = E•1(2) + E•2(−1), and so A•4 = A•1(2) + A•2(−1), much as in the previous example.
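The reduction in Example 3.4.4 can be reproduced mechanically. The sketch below is one possible way to do it, not the procedure of the notes: the function `rref` is written for this illustration, it may apply the row operations in a different order from the hand computation, and it relies on the fact that the row reduced echelon form does not depend on that order. Column indices in the code start at 0.

```python
from fractions import Fraction

def rref(rows):
    """Return the row reduced echelon form and the list of pivot column indices."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots = []
    r = 0  # index of the row that will receive the next pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][c]
        M[r] = [v / p for v in M[r]]          # make the pivot equal to 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:       # clear the rest of the column
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots

def column(A, j):
    """Column j of A (0-based)."""
    return [row[j] for row in A]

A = [[-1, -4, -3, 2],
     [3, 4, 1, 2],
     [2, 9, 7, -5],
     [-5, -1, 4, -9]]

E, pivots = rref(A)
print([[int(v) for v in row] for row in E])
print(pivots)  # 0-based indices of the pivot columns

# The coefficients read off from the fourth column of E also work for A (3.4.7).
combo = [2 * a - b for a, b in zip(column(A, 0), column(A, 1))]
print(combo == column(A, 3))
```

The coefficients 2 and −1 in the last check are exactly the entries of E•4 in the pivot rows, which is the content of 3.4.7 and 3.4.13.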

3.4.14 Row Independence and row-rank

Everything that can be said about column independence has its counterpart for rows. Hence, as indicated earlier, the rows of the m × n matrix A are linearly independent if, and only if, for all row vectors y with m entries,

yA = 0 implies y = 0

In other words, if we can find a row vector y ≠ 0 such that the linear combination

y1A1• + y2A2• + · · · + ymAm• = 0

then the rows of A are linearly dependent, otherwise they are linearly independent.

The row-rank of A is the maximum number of linearly independent rows of A.

3.4.15 The non-zero rows of a matrix in row echelon form are linearly independent

To see this, let C be an m × n matrix in row echelon form, and suppose that C1•, C2•, ..., Ck• are its non-zero rows. Let

y1C1• + y2C2• + · · · + ykCk• = 0

where 0 is the zero row with n entries. The first non-zero entry in C1• has only zeros below it. Therefore, y1 = 0 and y2C2• + · · · + ykCk• = 0. As the first non-zero entry in C2• has only zeros below it, y2 = 0. Carrying on the argument, y1 = y2 = · · · = yk = 0. The claim is proved.

For example, the first two rows of

$$
\begin{bmatrix} -1 & -4 & -3 & 2 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$

are linearly independent.

For more examples see Chapter 2, including Exercises 36 and 41.

3.4.16 The row-rank of a matrix is unaltered by elementary row operations

We will use the fundamental theorem of subsection 3.4.11 to show directly that elementary row operations on an m × n matrix A do not change its row-rank.

It is clear that interchanging two rows cannot change the row-rank. It is almost as obvious that multiplication of a row by a non-zero constant does not alter the row-rank (see No.1(c)iv in Exercise 78). It remains to show that replacing (say) A1• by βA2• + A1• to form A′ does not change the row-rank.

Let the last m − 1 rows A2•, · · · , Am• of A have rank k. Since βA2• + A1• is a linear combination of these m − 1 rows if, and only if, A1• is (see No.?? in Exercise 55), it follows that the row-rank of A′ is k + 1 if, and only if, the row-rank of A is k + 1.


3.4.17 The column-rank and row-rank of a matrix A are equal

Proof: Let an echelon form of A be A′. If A′ has r non-zero rows its row-rank is clearly r. By what has just been proved, the row-rank of A is r and the required result follows since r is also the column-rank of A (see subsection 3.4.8).

3.4.18 Definition of rank r(A)

The common row- and column-rank of A is called the rank of A and is denoted by r(A). By 3.4.8, the rank of A is the number of non-zero rows in a row echelon form of A.

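Example 3.4.5 Find the rank of the matrix

$$
A = \begin{bmatrix} 0 & 0 & -3 & -3 & -1 \\ 2 & -2 & -2 & -3 & -2 \\ 0 & 0 & 3 & -2 & -2 \\ 2 & -3 & -4 & 3 & -3 \end{bmatrix}
$$

Solution:

This is the matrix of Example 2.2.4 in Chapter 2. We found its row echelon form in equation (2.26):

$$
\begin{bmatrix} 2 & -3 & -4 & 3 & -3 \\ 0 & 1 & 2 & -6 & 1 \\ 0 & 0 & 3 & -2 & -2 \\ 0 & 0 & 0 & -5 & -3 \end{bmatrix}
$$

Therefore, r(A) = 4.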

3.5 Ranks of matrix products

Note We know that the row- and column-ranks of a matrix are equal, but to see just which one plays the important role, we distinguish them in this section.

3.5.1 A Column Lemma

Let A and B have the same number n of columns. If for all column vectors x with n entries,

Ax = 0 implies Bx = 0 (3.59)

then (column-rank A) ≥ (column-rank B).

Proof: We may suppose that the first k columns of B are independent, where k is the column-rank of B. Let x be any vector such that xk+1 = · · · = xn = 0 and Ax = 0. By assumption, Bx = 0. As the first k columns of B are independent, x = 0. Hence the first k columns of A are independent.

Remark 74 We have actually shown a bit more: if the condition (3.59) holds, and certain columns of B are linearly independent, then the corresponding columns of A are also linearly independent.

3.5.2 Corollary: column-rank of a matrix product

Let the matrix product CA = B be defined. Then (column-rank A) ≥ (column-rank B) and if C has independent columns, the two ranks are equal.

For, if Ax = 0, then certainly Bx = 0 and so (column-rank A) ≥ (column-rank B) by the Lemma 3.5.1.

Now let C have independent columns and suppose Bx = 0. Then C(Ax) = 0 and therefore Ax = 0 as the columns of C are independent. It follows, again by the Lemma, that (column-rank B) ≥ (column-rank A) and equality holds.

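Example 3.5.1 Let C be the 3 × 3 matrix of Example 3.4.2 of rank 3 and let A be the 3 × 3 matrix of example 3.4.3 with column-rank 2. From the corollary we can conclude that the column-rank of the product

$$
CA = \begin{bmatrix} 1 & 1 & 3 \\ -1 & 1 & -2 \\ 1 & 1 & 4 \end{bmatrix}
\begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
= \begin{bmatrix} 11 & 20 & -38 \\ -13 & -25 & 49 \\ 15 & 27 & -51 \end{bmatrix}
$$

is also 2.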
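The mechanism behind the Column Lemma, namely that Ax = 0 forces (CA)x = 0, can be seen directly on the matrices of Example 3.5.1. This is a short sketch with assumptions stated: the function names are invented here, and the vector x = (−2, 3, 1) is a non-trivial solution of Ax = 0 found by hand for this illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(A, x):
    """Compute Ax for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

C = [[1, 1, 3],
     [-1, 1, -2],
     [1, 1, 4]]
A = [[2, 5, -11],
     [-3, -6, 12],
     [4, 7, -13]]
B = matmul(C, A)

x = [-2, 3, 1]  # non-trivial solution of Ax = 0 (assumption, found by hand)

print(B)
print(matvec(A, x), matvec(B, x))  # the same dependency holds for A and for CA

# A non-zero 2 x 2 minor in the first two columns of B shows that
# these two columns are linearly independent.
minor = B[0][0] * B[1][1] - B[0][1] * B[1][0]
print(minor)
```

Since a non-trivial x satisfies Bx = 0, the column-rank of B = CA is less than 3, and the non-zero minor shows it is at least 2, in agreement with the corollary.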

The counterpart for rows of the Column Lemma is

3.5.3 A Row Lemma

Let A and B have the same number m of rows. If for all row vectors y with m entries,

yA = 0 implies yB = 0

then row-rank A ≥ row-rank B.

Remark 75 As with the Column Lemma, we have shown that if the condition (3.5.3) holds and certain rows of B are linearly independent, the corresponding rows of A are also linearly independent.

3.5.4 Corollary: row-rank of a matrix product

Let AD = B be a matrix product. Then row-rank A ≥ row-rank B and equality holds if D has independent rows.

The proofs of this Lemma and its corollary are left as exercises. See No.13 in Exercise 78.

3.6 Theorem*: The fundamental property of linear independence revisited

Note 76 To understand what follows you should have a thorough grasp of the content of the “linear combinations of linear combinations theorem” contained in No.8c of Exercise 49 and No.14 of Exercise 65.

Let a1, a2, . . . , an, b be any list of n + 1 column vectors, each with m entries. Then the rank of a1, a2, . . . , an, b is not more than the rank of a1, a2, . . . , an if, and only if, b depends linearly on a1, a2, . . . , an.

Put slightly differently, the theorem says that

column-rank [A, b] = column-rank A

if, and only if, Ax = b has a solution. Here, A is the m × n matrix having the vectors aj as columns.

Proof :

Let the maximum number of linearly independent vectors among a1, a2, . . . , an be s.


1. Suppose that the rank of a1, a2, . . . , an, b is s. Then, by the lemma 3.4.12, b must be a linear combination of any s linearly independent vectors among a1, a2, . . . , an.

2. Conversely (and this is the only non-trivial case), suppose that the rank of a1, a2, . . . , an, b is s + 1. Then we may assume (without loss of generality) that a1, . . . , as, b are linearly independent. By lemma 3.4.12, every vector aj is a linear combination of a1, . . . , as. Then if b were a linear combination of a1, a2, . . . , an, it would also be a linear combination of a1, · · · , as, which is a contradiction, because a1, . . . , as, b are assumed to be linearly independent.

Corollary 77 Let the matrix B consist of k linearly independent columns of a matrix A which cannot be enlarged by columns of A without destroying independence. Then

1. Every column of A is a linear combination of columns of B (this follows from the Lemma).

2. k is the column-rank of A.

Exercise 78 1. In most of the elementary questions that follow, you may assume the given vectors are either rows or columns, each with the same number of entries. Prove the given statements.

(a) A single vector a is linearly independent if, and only if, a ≠ 0.

(b) Show that two vectors a, b are linearly dependent if, and only if, one is a multiple of the other. This follows from the alternate definition 3.4.2 but prove the result directly from the definition 3.4.1. Compare the idea of parallel vectors in subsection 1.1.10, Chapter 1.

(c) Let a1, a2, · · · , an be a given list of vectors. Show

i. If some ai = 0, then a1, a2, · · · , an are linearly dependent.

ii. If n ≥ 2 and some subset, say a2, · · · , am, are linearly dependent, so are a1, a2, · · · , an.

iii. If a1, a2, · · · , an are linearly independent, what can be said of any subset of a1, a2, · · · , an?

iv. The rank of a1, a2, · · · , an is unaltered if one of the vectors aj is multiplied by a non-zero scalar k.

v. * Let n ≥ 2 and suppose that a1, a2, ..., an are non-zero vectors. Prove that the vectors are linearly dependent if and only if some aj (j ≥ 2) depends linearly on a1, ..., aj−1. Translate this statement into one about linear independence.

Hint 79 Suppose the vectors a1, a2, ..., an are linearly dependent. Since a1 ≠ 0, there must be a first index j ≥ 2 such that the vectors a1, ..., aj are linearly dependent.

2. How does Lemma 3.4.12 show why 3.4.13 follows directly from 3.4.8?

3. Let a, b, c be the position vectors of the points A, B and C respectively. Show that thethree vectors are linearly independent if, and only if, A, B, C and O do not lie on a plane.(This was mentioned in the preamble 3.4).

4. Let A = [C, D] where C and D are matrices with the same number of rows. Show that if the rows of C (or of D) are independent, so are those of A. Conclude that in any case r(A) ≥ r(C) and r(A) ≥ r(D).

5. For any matrix product AB, say why r(AB) ≤ r(A) and r(AB) ≤ r(B).

6. Let C be m × n. If A is m × m of rank m, then r(AC) = r(C). Let B be n × n of rank n, then r(CB) = r(C). Consequently, r(ACB) = r(C). Prove these statements.


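7. Without multiplying out, find the rank of

$$
\begin{bmatrix} 2 & 5 & -11 \\ -3 & -6 & 12 \\ 4 & 7 & -13 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 3 \\ -1 & 1 & -2 \\ 1 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} -14 & -4 & -48 \\ 15 & 3 & 51 \\ -16 & -2 & -54 \end{bmatrix}
$$

Hint 80 Consider Example 3.5.1.

Answer: The rank of the product is 2.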

8. The following questions refer you to the matrices in Exercise 36 of Chapter 2. To answer them you should consult the row reduced echelon forms (row REF) found in Exercise 41. Occasionally you may need to find the new row reduced echelon forms.

(a) Consider the matrix A of No.1a. Show that the first two columns of A form a maximum independent set of columns of A and that every column of A is a linear combination of A•1 and A•2. Can you find three linearly independent columns of A? Find the rank r(A). Show that the first two rows of A form a maximum number of independent rows of A.

Hint 81 Either show directly that A1• and A2• are linearly independent or use the matrix B of No.1b in Exercise 36.

(b) Let C be the row REF of matrix A of No.2 of Exercise 36. Show that A•5 depends linearly on A•1, A•2, A•3 and that every maximum set of linearly independent columns must include A•4. Find a maximum set of linearly independent columns of A and the rank r(A).

Which of the following sets are linearly independent?

i. A•2, A•3, A•4, A•5
ii. A•1, A•3, A•4, A•5
iii. A•1, A•2, A•4, A•5

Find a maximum number of linearly independent solutions to the homogeneous equation.

(c) Consider the coefficient matrix A in No.3 of Exercise 36.

i. Does any row vector c = [c1 c2 c3 c4] depend linearly on the rows of A?
ii. Are the columns of the coefficient matrix A linearly independent?
iii. Does Ax = b have a unique solution for every column vector b with 4 entries?
iv. Let C be the augmented matrix of No.3a in Exercise 36. Does every column vector b with 4 entries depend linearly on C•1, C•2, C•3, C•5?

9. Prove that in 3-space mutually orthogonal vectors are linearly independent. Hence complete a rigorous proof of No.17 of Exercise 2 in Chapter 1.

10. * Give rigorous proofs of the statements about planes in subsection 1.3.6.

11. * Let I be the m × m identity matrix. The matrix J resulting from I after performing an elementary row operation on I is called an m × m elementary matrix. Hence J is the result of doing one of the following: (i) multiplying row r of I by β ≠ 0, (ii) interchanging rows r and s of I, and (iii) changing row Is• to βIr• + Is• if s ≠ r.


(a) Show that an elementary matrix J has rank m.

Solution: I = Im has rank m, while an elementary row operation on I does not change its (column)-rank.

(b) Show furthermore that if A is an m × n matrix, then JA is the result of doing the corresponding elementary row operation on A.

Hint 82 Use [JA]i• = Ji•A. So, for example, let J be the result of a type (iii) operation. Then for i ≠ s, [JA]i• = Ai•, while if i = s,

[JA]s• = Js•A = (βIr• + Is•)A = (βIr•)A + Is•A = βAr• + As•

(c) Define the notion of an “elementary column operation”. In the above what corresponds to elementary column operations?

(d) Give a description of “column echelon form” and “column reduced echelon form” of a matrix A.

12. Show that A, $A^T$, $A^T A$ and $A A^T$ all have the same rank.

Hint 83 First of all, recall that if a is a column vector and $a^T a = 0$, then a = 0 (see Exercise 49 No.11). Now suppose $A^T A x = 0$. Then $x^T A^T A x = (Ax)^T Ax = 0$.

13. Prove the Lemma 3.5.3 and its corollary 3.5.4.

Hint 84 One way: Use $A^T$ and $B^T$ in place of A and B in the previous Lemma 3.5.1. For a more pleasant direct proof, we may suppose that the first k = r(B) rows of B are independent. Let y be any row vector such that yk+1 = · · · = ym = 0 and yA = 0. Now proceed as in the Column Lemma 3.5.1, except use rows in place of columns.

14. Complete the following alternative proof of the corollary to the Column Lemma (Corollary 3.5.2) which says r(BC) ≤ r(C). If k columns (say [BC]•1, · · · , [BC]•k) of BC are linearly independent then so are C•1, · · · , C•k. Do the same for the row version.

15. In the following assume the relevant matrix products are defined. Prove the statements.

(a) If the columns of A are independent and AB = AC, then B = C

(b) If the rows of A are independent and BA = CA then B = C.

16. Let A be n × n and A ≠ I, A ≠ O and $A^2 = I$, where I = In. Find all integers k ≥ 1 such that $A^k = I$.

17. * Using (among other things) the Column Lemma 3.5.1, show that if the condition (3.59) holds then B = CA for some C. (This is essentially the converse of the corollary 3.5.2). State and prove the corresponding result for the Row Lemma 3.5.3.

Hint 85 Let d = Bi• be a row of B and let A′ be the augmented matrix formed by adding row d to A. Then A′x = 0 if, and only if, Ax = 0. Hence A′ and A have the same column-ranks and so the same row-ranks. By the row version of the basic theorem 3.4.11 on linear independence, d is a linear combination of the rows of A.

State and prove a similar result corresponding to the Row Lemma 3.5.3.

18. Find the ranks of the following matrices and prove your results correct.


(a) A is the projection matrix on the plane 2x + z = 0 of Exercise 59, No.1.

(b) A is the reflection matrix in the plane 2x + z = 0.

Partial Answer: Study these matrices and the result should become clear: the rank of the projection matrix is 2, that of the reflection matrix is 3. Both confirm the comment in 3.4.10.

19. Let A be a 3 × 3 matrix. In each of the following cases A is considered as a linear transformation. Describe the rank of A with a brief geometrical proof. Illustrate the comment 3.4.10.

(a) A is a projection matrix onto a line that passes through the origin.

(b) A is a projection matrix onto a plane that passes through the origin.

(c) A is a matrix that reflects in a plane or line that passes through the origin.

20. * Use the results of Exercise 65, No.17 to give analytic proofs for No.19 above.

21. If Rθ is the 2× 2 rotation matrix of equation (3.53), prove algebraically that r(Rθ) = 2.

22. Let A be an m× n matrix.

(a) Describe the row REF A′′ of A if r (A) = n? What is A′′ if r (A) = m = n?

(b) Describe in general terms what A′′ looks like if r(A) = m.

(c) Let Ax = b be a system of m equations for n unknowns. If the rows of A are independent then the system always has a solution. Show that this is a consequence of the Fundamental Theorem 3.4.11.

(d) Let yA = c be a system of n equations for m unknowns y1, · · · , ym. If the columns of A are independent then the system always has a solution. Show that this is also a consequence of the Fundamental Theorem 3.4.11.

(e) What can be said of the solutions in 22c and 22d if m = n?

3.7 Square matrices

In this section we consider square matrices A of size n × n. Unless otherwise stated, I stands for the identity matrix In. Observe that r(I) = n.

3.7.1 Invertible matrices

3.7.2 Definition of invertibility

An n × n matrix A is said to be invertible (or non-singular or to have an inverse) if there is a matrix B such that

AB = BA = I (3.60)

It is clear that such a matrix B must also be n × n. From the definition it is also clear that if B is an inverse of A then A is an inverse of B.

From Exercise 78 No.5 we see that a necessary condition for A to have an inverse is that r(A) = n. The converse will be shown in 3.7.7.


3.7.3 Uniqueness of the inverse, definition of A−1

Let X and Y be n × n matrices such that AX = I and YA = I. Then X = Y and this matrix is the unique inverse of A.

Proof: We have

Y = YI = Y(AX) = (YA)X = IX = X (3.61)

Hence X = Y is an inverse of A. But two inverses X and Y of A must satisfy (3.61), and so A can have only one inverse. It is denoted by $A^{-1}$.

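Example 3.7.1 Show directly that

\[
\begin{bmatrix} 3 & 7 \\ 2 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 5 & -7 \\ -2 & 3 \end{bmatrix}
\]

Solution:

By direct multiplication,

\[
\begin{bmatrix} 3 & 7 \\ 2 & 5 \end{bmatrix}
\begin{bmatrix} 5 & -7 \\ -2 & 3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & -7 \\ -2 & 3 \end{bmatrix}
\begin{bmatrix} 3 & 7 \\ 2 & 5 \end{bmatrix}
\]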
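The direct multiplication in Example 3.7.1 can also be checked mechanically. The following is only a minimal sketch; the helper name `matmul` is chosen here for illustration and does not come from the notes.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, 7], [2, 5]]
B = [[5, -7], [-2, 3]]
I = [[1, 0], [0, 1]]

# Both products must equal the identity for B to be the inverse of A.
print(matmul(A, B) == I)
print(matmul(B, A) == I)
```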

3.7.4 Use of the inverse

3.7.5 Solving Ax = b

Consider a system

\[
Ax = b \tag{3.62}
\]

of n equations for n unknowns. If A has an inverse, then the unique solution is

\[
x = A^{-1}b
\]

In the first place, we show that $x = A^{-1}b$ is a solution to (3.62):

\[
A\left(A^{-1}b\right) = \left(AA^{-1}\right)b = Ib = b
\]

Conversely, let x = u satisfy the equation (3.62), i.e. suppose Au = b. Multiply both sides of this equation on the left by $A^{-1}$:

\[
A^{-1}(Au) = \left(A^{-1}A\right)u = Iu = u = A^{-1}b
\]

Thus $x = A^{-1}b$ is the unique solution to (3.62).

3.7.6 Solving yA = c

If A has an inverse, then for any row vector c with n entries, the unique solution to

\[
yA = c \tag{3.63}
\]

is

\[
y = cA^{-1}
\]

The proof can be derived from 3.7.5 using transposes (see equation 3.9). Alternatively, the proof parallels that of the previous section. The unique solution is found by multiplying both sides of equation (3.63) on the right by $A^{-1}$.


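Example 3.7.2 Solve

\[
\begin{bmatrix} 3 & 7 \\ 2 & 5 \end{bmatrix} x = \begin{bmatrix} 9 \\ -11 \end{bmatrix}
\]

Solution:

Use the result of Example 3.7.1:

\[
x = \begin{bmatrix} 3 & 7 \\ 2 & 5 \end{bmatrix}^{-1} \begin{bmatrix} 9 \\ -11 \end{bmatrix}
= \begin{bmatrix} 5 & -7 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ -11 \end{bmatrix}
= \begin{bmatrix} 122 \\ -51 \end{bmatrix}
\]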
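As a check on Example 3.7.2, the product $A^{-1}b$ can be computed and substituted back into Ax = b. This is a small sketch only; `matvec` is an illustrative helper name, not notation from the notes.

```python
def matvec(M, v):
    """Multiply the matrix M (list of rows) by the column vector v (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[3, 7], [2, 5]]
A_inv = [[5, -7], [-2, 3]]   # inverse taken from Example 3.7.1
b = [9, -11]

x = matvec(A_inv, b)         # x = A^{-1} b
print(x)
print(matvec(A, x) == b)     # substituting back recovers b
```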

3.7.7 Necessary and sufficient conditions for A to be invertible

The n×n matrix A is invertible if, and only if, any of the following equivalent conditions hold.

If (14) holds, necessarily X = A−1. If (15) holds, necessarily Y = A−1.

1. r(A) = n.

2. The columns of A are independent.

3. The rows of A are independent.

4. The homogeneous equation Ax = 0 has only the trivial solution.

5. The homogeneous equation yA = 0 has only the trivial solution.

6. The row REF of A is I.

7. The column REF of A is I (the row REF of AT is I).

8. For every column vector b with n entries Ax = b has a solution.

9. For every column vector b with n entries Ax = b has a unique solution.

10. There is a column vector b with n entries such that Ax = b has a unique solution.

11. For every row vector c with n entries yA = c has a solution.

12. For every row vector c with n entries yA = c has a unique solution.

13. There is a row vector c with n entries such that yA = c has a unique solution.

14. There is a matrix X such that AX = I.

15. There is a matrix Y such that Y A = I.

Proof

The first seven properties are equivalent from the definition of linear independence, the fact that row-rank is equal to column-rank, and because the row (column) REF of A is I if, and only if, the columns (rows) of A are independent.

Now assume r(A) = n. Since the addition of column b (row c) to A does not change the column (row) rank, Ax = b has a solution by Lemma 3.4.12, and similarly so does yA = c. These solutions are necessarily unique on account of the independence of the columns (rows) of A.

Assume next that (8) holds. Then we can solve the n matrix equations $AX_{\bullet j} = I_{\bullet j}$ for the columns $X_{\bullet j}$ and obtain an n × n matrix X such that AX = I. Similarly, if (11) holds we can find an n × n matrix Y such that YA = I.

By No.5 of Exercise 78, if either AX = I for some X or Y A = I for some Y , necessarily


r(A) = n.

Thus all the above properties (1) - (13) are equivalent.

Finally, suppose (14) holds. Then, as the properties are equivalent, YA = I for some Y. By 3.7.3, X = Y and X is the inverse of A. A similar argument shows that if (15) holds, necessarily $Y = A^{-1}$.

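Example 3.7.3 Show that none of the matrices

\[
\begin{bmatrix} 2 & -3 \\ 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}
\]

can have inverses.

Solution:

None of these has rank 2.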

3.7.8 Reduction method finds A−1, or decides it does not exist

Reduce the augmented array [A, I] to row REF [A′′, I′′]. Then A has an inverse if, and only if, A′′ = I, in which case $I'' = A^{-1}$. We are using property (14) of 3.7.7.

Note that if at any stage, say [A′, I′], of the reduction A′ has a zero row, then $A^{-1}$ cannot exist.

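Example 3.7.4 Consider the matrix of example 3.1.7:

\[
A = \begin{bmatrix} 2 & 5 & 1 \\ -3 & -6 & 0 \\ 4 & 7 & -2 \end{bmatrix}
\]

We found there that an echelon form has three non-zero rows. Therefore, r(A) = 3 and $A^{-1}$ exists. Find $A^{-1}$ by row reduction.

Solution:

Start with

\[
\begin{array}{rrr|rrr}
2 & 5 & 1 & 1 & 0 & 0 \\
-3 & -6 & 0 & 0 & 1 & 0 \\
4 & 7 & -2 & 0 & 0 & 1
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
2 & 5 & 1 & 1 & 0 & 0 & \\
0 & \tfrac{3}{2} & \tfrac{3}{2} & \tfrac{3}{2} & 1 & 0 & \tfrac{3}{2}R_1 + R_2 \\
0 & -3 & -4 & -2 & 0 & 1 & (-2)R_1 + R_3
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
2 & 5 & 1 & 1 & 0 & 0 & \\
0 & \tfrac{3}{2} & \tfrac{3}{2} & \tfrac{3}{2} & 1 & 0 & \\
0 & 0 & -1 & 1 & 2 & 1 & 2R_2 + R_3
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
2 & 5 & 1 & 1 & 0 & 0 & \\
0 & 1 & 1 & 1 & \tfrac{2}{3} & 0 & \tfrac{2}{3}R_2 \\
0 & 0 & 1 & -1 & -2 & -1 & (-1)R_3
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
2 & 5 & 0 & 2 & 2 & 1 & -R_3 + R_1 \\
0 & 1 & 0 & 2 & \tfrac{8}{3} & 1 & -R_3 + R_2 \\
0 & 0 & 1 & -1 & -2 & -1 &
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
2 & 0 & 0 & -8 & -\tfrac{34}{3} & -4 & -5R_2 + R_1 \\
0 & 1 & 0 & 2 & \tfrac{8}{3} & 1 & \\
0 & 0 & 1 & -1 & -2 & -1 &
\end{array}
\]

\[
\begin{array}{rrr|rrrl}
1 & 0 & 0 & -4 & -\tfrac{17}{3} & -2 & \tfrac{1}{2}R_1 \\
0 & 1 & 0 & 2 & \tfrac{8}{3} & 1 & \\
0 & 0 & 1 & -1 & -2 & -1 &
\end{array}
\]

Hence

\[
A^{-1} = \begin{bmatrix} -4 & -\tfrac{17}{3} & -2 \\ 2 & \tfrac{8}{3} & 1 \\ -1 & -2 & -1 \end{bmatrix}
\]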
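The reduction method of 3.7.8 can be sketched in a few lines of code. The version below is an illustration under stated assumptions, not the procedure of the notes verbatim: it clears each pivot column completely as it goes (so the order of row operations differs from Example 3.7.4), it uses exact fractions to avoid rounding, and the function name `inverse_by_reduction` is invented here.

```python
from fractions import Fraction

def inverse_by_reduction(A):
    """Reduce [A, I] to [I, A^{-1}]; return None if A has no inverse."""
    n = len(A)
    # Build the augmented array [A, I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a non-zero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # r(A) is less than n: no inverse
        M[col], M[pivot] = M[pivot], M[col]  # interchange rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # scale the pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # Add (-f) times the pivot row to row r.
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 5, 1], [-3, -6, 0], [4, 7, -2]]
for row in inverse_by_reduction(A):
    print([str(x) for x in row])
```

Applied to the last matrix of Example 3.7.3, the same function returns `None`, in line with the remark that a zero row in A′ rules out an inverse.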

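Exercise 86 1. Find the inverses of the following matrices (where possible), by reduction to I, and solve the related problems.

(a)
\[
A = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}
\]

(b) i.
\[
B = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & -1 \end{bmatrix}
\]

ii. Find the linear transformation T that maps column $B_{\bullet j}$ of B to the corresponding column $Y_{\bullet j}$ of Y for j = 1, 2, 3, where
\[
Y = \begin{bmatrix} 1 & 0 & 4 \\ -1 & 1 & -2 \\ 2 & 3 & 1 \end{bmatrix}
\]

Hint 87 You want a 3 × 3 matrix T such that $Y_{\bullet j} = TB_{\bullet j}$. In fact, $T = YB^{-1}$.

(c)
\[
C = \begin{bmatrix} 1 & 6 & 2 \\ -2 & 3 & 5 \\ 7 & 12 & -4 \end{bmatrix}
\]

Answer: C does not have an inverse.

(d) i.
\[
D = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 0 & -3 \\ -1 & 4 & 2 \end{bmatrix}
\]

ii. Use your inverse to solve
\[
\begin{aligned}
x_1 + 3x_2 - x_3 &= 3 \\
2x_1 - 3x_3 &= -1 \\
-x_1 + 4x_2 + 2x_3 &= 2
\end{aligned}
\]

iii. Use your inverse to solve
\[
\begin{aligned}
y_1 + 2y_2 - y_3 &= 4 \\
3y_1 + 4y_3 &= 5 \\
-y_1 - 3y_2 + 2y_3 &= -1
\end{aligned}
\]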

2. Prove that an invertible matrix maps linearly independent vectors to linearly independent vectors and state and prove the converse.

3. Let A and B be n × n matrices. Show that AB is invertible if, and only if, both A and B are, in which case $(AB)^{-1} = B^{-1}A^{-1}$. Generalize this result to products of three or more matrices.

4. Let the n × n matrix A be invertible. Show that $A^T$ is also invertible and that $(A^T)^{-1} = (A^{-1})^T$.

5. Let A be n × n. As a linear transformation, the mapping $A : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one if, and only if, A maps $\mathbb{R}^n$ onto the whole of $\mathbb{R}^n$. Prove the statement. (Compare Example 3.4.2).

6. If M is a matrix that reflects in a plane or line passing through the origin, show algebraically that $M^{-1} = M$.

Hint 88 Use $M^2 = I$ from Exercise 65, No.17.

7. * Let A be an m × n matrix of rank k. Show that by applying elementary row operations and elementary column operations to the matrix A we can find invertible matrices P and Q such that

\[
PAQ = \begin{bmatrix} I_k & O_{k \times (n-k)} \\ O_{(m-k) \times k} & O_{(m-k) \times (n-k)} \end{bmatrix}
\]

Hint 89 First reduce A to row REF. Now use elementary column operations and the pivot columns to reduce the non-pivot columns to zero columns. Apply suitable permutations to the columns and finally use the results of Exercise 78, No.11. (Compare No.6 of the same exercise.)

8. * In Chapter 2, Exercise 41, No.3 we considered the system Ax = b of two equations for two unknowns (2.32). Show directly that, provided the determinant defined by $|A| = a_{11}a_{22} - a_{21}a_{12}$ is not zero,

\[
B = \frac{1}{|A|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\]

satisfies AB = BA = I, so that $B = A^{-1}$. Hence derive Cramer’s solution of the equations, as found in Chapter 2. This theory will be further developed in Chapter 4. See Chapter 4, subsection 4.1.1 and equation (4.1.3).

9. Let A be a 3 × 3 matrix such that $2I - 3A + 5A^2 - 7A^4 = 0$, where 0 is the 3 × 3 zero matrix. Show that A has an inverse and write it in terms of A.

Hint 90 $A(7A^3 - 5A + 3I) = 2I$

10. Find all n × n invertible A such that $A^2 = A$.



3.8.1 Matrix notation; columns and rows

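(See subsection 3.1.1) An m × n matrix $A = [a_{ij}]$ is also described by $[A]_{ij} = a_{ij}$.

$\mathbb{R}^{m \times n}$ denotes the set of m × n matrices with real number entries.

Row i of the m × n matrix A is written

\[
A_{i\bullet} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \quad (i = 1, \cdots, m)
\]

Column j is written

\[
A_{\bullet j} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \quad (j = 1, \cdots, n)
\]

Addition of two matrices of the same size is defined naturally, as is multiplication by a scalar α:

\[
[\alpha A]_{ij} = \alpha [A]_{ij}
\]

If A is m × n and x a column vector with n entries $x_i$, we have

\[
Ax = \begin{bmatrix} A_{1\bullet}x \\ A_{2\bullet}x \\ \vdots \\ A_{m\bullet}x \end{bmatrix}
= A_{\bullet 1}x_1 + A_{\bullet 2}x_2 + \cdots + A_{\bullet n}x_n
\]

Here

\[
A_{i\bullet}x = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} a_{i1}x_1 + \cdots + a_{in}x_n \end{bmatrix}
\]

and

\[
A_{\bullet j}x_j = \begin{bmatrix} a_{1j}x_j \\ a_{2j}x_j \\ \vdots \\ a_{mj}x_j \end{bmatrix}
\]

Similarly, if y is a row vector with m entries $y_j$,

\[
yA = \begin{bmatrix} yA_{\bullet 1} & yA_{\bullet 2} & \cdots & yA_{\bullet n} \end{bmatrix}
= y_1A_{1\bullet} + y_2A_{2\bullet} + \cdots + y_mA_{m\bullet}
\]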

3.8.2 Span, row space, column space

(See item 2 in subsection 3.1.1)

The span of the vectors a1, . . . , am (all of the same size) is the set of linear combinations of a1, . . . , am. It is denoted by sp(a1, . . . , am).

The row space of a matrix A is the span of its rows (see also exercise ??).

The column space of a matrix A is the span of its columns.


3.8.3 Matrices as mappings

(See section 3.2.)

Let A be a fixed m × n matrix. If x is a variable column vector with n entries, then

y = Ax

expresses y as a function of x; A transforms the vector x into the vector y, so the function is also called a transformation or mapping x → y = Ax. The transformation is linear since for all vectors x1 and x2 and scalars α1 and α2,

A (x1α1 + x2α2) = (Ax1)α1 + (Ax2)α2

3.8.4 The product of two matrices

(See subsection 3.3.4)

Let A be an m × k matrix and B a k × n matrix. Then the product AB is given by

\[
[AB]_{ij} = A_{i\bullet}B_{\bullet j} \quad (i = 1, \ldots, m;\ j = 1, \ldots, n)
\]

Note that AB is only defined when the number of columns of A equals the number of rows of B.

Then AB is m × n and

\[
\begin{aligned}
[AB]_{i\bullet} &= A_{i\bullet}B \quad (i = 1, \ldots, m) \\
[AB]_{\bullet j} &= AB_{\bullet j} \quad (j = 1, \ldots, n)
\end{aligned}
\]

Furthermore,

\[
\begin{aligned}
A_{i\bullet}B &= a_{i1}B_{1\bullet} + a_{i2}B_{2\bullet} + \cdots + a_{ik}B_{k\bullet} \\
AB_{\bullet j} &= A_{\bullet 1}b_{1j} + A_{\bullet 2}b_{2j} + \cdots + A_{\bullet k}b_{kj}
\end{aligned}
\]

If x is a variable vector with n entries, then the composite transformation x → A(Bx) is given by

A(Bx) = (AB)x

More generally, we have the associative law for multiplication: if AB and BC are defined, then so are (AB)C and A(BC) and

(AB)C = A(BC)

3.8.5 Linear independence

For this see section 3.4.

The columns of a matrix A are linearly independent if the equation

Ax = 0

has only the trivial solution x = 0.

The rows of a matrix A are linearly independent if the equation

yA = 0

has only the trivial solution y = 0.

Elementary row operations preserve linear relations among the columns of A. See 3.4.7.

The column rank of any m × n matrix A is the maximum number of linearly independent columns. This is the number of pivot columns in a row EF of A. (See subsection 3.4.8).

The row and column ranks of A are equal. (See 3.4.18).

The main theorem on linear independence is 3.4.11: If we add a column b with m entries to an m × n matrix A to form [A, b], then A and [A, b] have the same column rank if, and only if, Ax = b has a solution, i.e., if, and only if, b is linearly dependent on the columns of A.


3.8.6 Invertible square matrices

(See subsection 3.7.3.)

An n × n matrix A is invertible if there is a matrix B such that AB = BA = I. When this is the case, B is unique and $B = A^{-1}$ is called the inverse of A.

A has an inverse if, and only if, r(A) = n and there are a number of equivalent conditions for this to be the case. See 3.7.7.

Chapter 4

Determinants and the Cross Product

4.1 Determinants

The determinant of a 1× 1 matrix A = [a] is simply |A| = a (not |a|).

4.1.1 Determinants of 2× 2 matrices

Starting in Chapter 2, Exercise 41 No.3 and continuing in Chapter 3, Exercise 86, No.8 we found that

\[
A^{-1} = \frac{1}{|A|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\]

provided that the determinant of A,

\[
|A| = a_{11}a_{22} - a_{21}a_{12} \tag{4.1}
\]

is not zero. We will now see that for 2 × 2 matrices A,

4.1.2 A has an inverse if, and only if, |A| ≠ 0.

Proof

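Let d = |A| and consider the matrix

\[
C = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{4.2}
\]

Then, by direct multiplication,

\[
CA = AC = \begin{bmatrix} |A| & 0 \\ 0 & |A| \end{bmatrix} = dI \tag{4.3}
\]

where $I = I_2$ is the 2 × 2 identity matrix. Therefore, if |A| ≠ 0, the matrix

\[
B = \frac{1}{|A|}C = \frac{1}{d} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\]

satisfies

BA = AB = I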


and so B is the inverse of A.

Conversely, suppose that $A^{-1}$ exists. Multiply both sides of equation (4.3) by $A^{-1}$:

\[
A^{-1}(AC) = \left(A^{-1}A\right)C = IC = C = dA^{-1}
\]

Now if d = 0, then C = O, and therefore A = O. Obviously, the zero matrix cannot have an inverse, so d ≠ 0.
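The result 4.1.2 translates directly into a small routine: compute d = |A| from (4.1), and if it is non-zero divide the matrix C of (4.2) by d. The following is a sketch assuming integer entries; the name `inverse_2x2` is illustrative only.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Return the inverse of a 2 x 2 integer matrix, or None if |A| = 0."""
    (a11, a12), (a21, a22) = A
    d = a11 * a22 - a21 * a12          # the determinant, equation (4.1)
    if d == 0:
        return None                    # no inverse exists
    # (1/d) times the matrix C of equation (4.2)
    return [[Fraction(a22, d), Fraction(-a12, d)],
            [Fraction(-a21, d), Fraction(a11, d)]]

print(inverse_2x2([[3, 7], [2, 5]]))
print(inverse_2x2([[2, 2], [1, 1]]))
```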

4.1.3 Cramer’s Rule for two equations in two unknowns

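Let us use

\[
A^{-1} = \frac{1}{d} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\]

to solve the system of two equations for two unknowns:

\[
A \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\]

We have $A^{-1}(Ax) = (A^{-1}A)x = Ix = x = A^{-1}b$, so

\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \frac{1}{d} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
= \frac{1}{d} \begin{bmatrix} a_{22}b_1 - a_{12}b_2 \\ -a_{21}b_1 + a_{11}b_2 \end{bmatrix}
\]

Or,

\[
x_1 = \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix} \Big/ |A|,
\qquad
x_2 = \begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix} \Big/ |A|
\]

Notice how column b replaces column $A_{\bullet j}$ in the determinant of A for $x_j$.

This is known as Cramer’s Rule, after the 18th Century Swiss mathematician G. Cramer.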
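Cramer's Rule for two unknowns is short enough to state as code. The sketch below assumes integer data and uses the illustrative names `det2` and `cramer_2x2`; it replaces column j of A by b exactly as described above.

```python
from fractions import Fraction

def det2(M):
    """2 x 2 determinant, equation (4.1)."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]

def cramer_2x2(A, b):
    """Solve Ax = b for a 2 x 2 integer matrix A with non-zero determinant."""
    d = det2(A)
    if d == 0:
        raise ValueError("Cramer's Rule needs |A| to be non-zero")
    x1 = Fraction(det2([[b[0], A[0][1]], [b[1], A[1][1]]]), d)  # b replaces column 1
    x2 = Fraction(det2([[A[0][0], b[0]], [A[1][0], b[1]]]), d)  # b replaces column 2
    return [x1, x2]

# The system of Example 3.7.2
print(cramer_2x2([[3, 7], [2, 5]], [9, -11]))
```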

4.1.4 Properties of determinants of 2× 2 matrices

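Although a determinant is a number, equation (4.1) is also called a 2 × 2 determinant. The following properties hold for 2 × 2 determinants.

1. The determinant of A is the same as the determinant of its transpose:

\[
\left|A^T\right| = |A|
\]

Proof: If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, then $A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}$, so $\left|A^T\right| = a_{11}a_{22} - a_{12}a_{21} = |A|$.

2. Interchanging rows changes the sign of the determinant:

\[
\begin{vmatrix} A_{2\bullet} \\ A_{1\bullet} \end{vmatrix} = -\begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \end{vmatrix} = -|A|
\]

Proof:

\[
\begin{vmatrix} A_{2\bullet} \\ A_{1\bullet} \end{vmatrix}
= \begin{vmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{vmatrix}
= a_{21}a_{12} - a_{11}a_{22} = -|A|
\]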


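3. Multiplying a row by a scalar λ multiplies the determinant by λ:

\[
\begin{vmatrix} \lambda A_{1\bullet} \\ A_{2\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ \lambda A_{2\bullet} \end{vmatrix}
= \lambda |A|
\]

Proof:

\[
\begin{vmatrix} \lambda A_{1\bullet} \\ A_{2\bullet} \end{vmatrix}
= \begin{vmatrix} \lambda a_{11} & \lambda a_{12} \\ a_{21} & a_{22} \end{vmatrix}
= \lambda a_{11}a_{22} - a_{21}\lambda a_{12} = \lambda |A|
\]

Similarly, $\begin{vmatrix} A_{1\bullet} \\ \lambda A_{2\bullet} \end{vmatrix} = \lambda |A|$.

4. Let $C = \begin{bmatrix} c_1 & c_2 \end{bmatrix}$, then we have the expansion

\[
\begin{vmatrix} A_{1\bullet} + C \\ A_{2\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \end{vmatrix}
+ \begin{vmatrix} C \\ A_{2\bullet} \end{vmatrix}
\]

Proof:

\[
\begin{aligned}
\begin{vmatrix} A_{1\bullet} + C \\ A_{2\bullet} \end{vmatrix}
&= \begin{vmatrix} a_{11} + c_1 & a_{12} + c_2 \\ a_{21} & a_{22} \end{vmatrix}
= (a_{11} + c_1)a_{22} - a_{21}(a_{12} + c_2) \\
&= a_{11}a_{22} - a_{21}a_{12} + c_1a_{22} - a_{21}c_2
= \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
+ \begin{vmatrix} c_1 & c_2 \\ a_{21} & a_{22} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \end{vmatrix}
+ \begin{vmatrix} C \\ A_{2\bullet} \end{vmatrix}
\end{aligned}
\]

5. If one row is a multiple of another the determinant vanishes:

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} \end{vmatrix} = 0
\]

Proof:

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} \\ \lambda a_{11} & \lambda a_{12} \end{vmatrix}
= a_{11}\lambda a_{12} - \lambda a_{11}a_{12} = 0
\]

6. Adding a multiple of one row to another row leaves the determinant invariant:

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} + A_{2\bullet} \end{vmatrix} = |A|
\]

Proof: By (4) and (5),

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} + A_{2\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} \end{vmatrix}
+ \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \end{vmatrix}
= 0 + |A| = |A|
\]

7. If A and B are 2 × 2 matrices, then

\[
|AB| = |A|\,|B|
\]

Proof: To better see the argument let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then using properties (2) - (5) we have:

\[
\begin{aligned}
|AB| &= \left| \begin{bmatrix} a & b \\ c & d \end{bmatrix} B \right|
= \begin{vmatrix} aB_{1\bullet} + bB_{2\bullet} \\ cB_{1\bullet} + dB_{2\bullet} \end{vmatrix} \\
&= \begin{vmatrix} aB_{1\bullet} \\ cB_{1\bullet} + dB_{2\bullet} \end{vmatrix}
+ \begin{vmatrix} bB_{2\bullet} \\ cB_{1\bullet} + dB_{2\bullet} \end{vmatrix} \\
&= \begin{vmatrix} aB_{1\bullet} \\ cB_{1\bullet} \end{vmatrix}
+ \begin{vmatrix} aB_{1\bullet} \\ dB_{2\bullet} \end{vmatrix}
+ \begin{vmatrix} bB_{2\bullet} \\ cB_{1\bullet} \end{vmatrix}
+ \begin{vmatrix} bB_{2\bullet} \\ dB_{2\bullet} \end{vmatrix} \\
&= 0 + \begin{vmatrix} aB_{1\bullet} \\ dB_{2\bullet} \end{vmatrix}
- \begin{vmatrix} cB_{1\bullet} \\ bB_{2\bullet} \end{vmatrix} + 0 \\
&= (ad - bc)\,|B| = |A|\,|B|
\end{aligned}
\]

8. The above properties hold for columns in place of rows. In No.1 of Exercise 93 you are asked to formally state and prove this.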
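A numerical spot-check is no substitute for the proofs above, but it can catch slips when working examples by hand. The sketch below tests properties (2), (6) and (7) on two arbitrarily chosen integer matrices; the matrices and the helper names `det2` and `matmul` are assumptions made for illustration.

```python
def det2(M):
    """2 x 2 determinant, equation (4.1)."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]

def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[0, 5], [-1, 2]]

swapped = [A[1], A[0]]                                   # property (2)
sheared = [A[0], [3 * A[0][0] + A[1][0],                 # property (6): 3*R1 + R2
                  3 * A[0][1] + A[1][1]]]

print(det2(swapped) == -det2(A))
print(det2(sheared) == det2(A))
print(det2(matmul(A, B)) == det2(A) * det2(B))           # property (7)
```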

4.1.5 3× 3 determinants

4.1.6 The search for a formula solving simultaneous equations

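Let us try to solve formally

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\]

The detached array for this system is

\[
\begin{array}{cccc}
a_{11} & a_{12} & a_{13} & b_1 \\
a_{21} & a_{22} & a_{23} & b_2 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array}
\]

Assuming $a_{11} \neq 0$ this is equivalent to

\[
\begin{array}{ccccl}
a_{11} & a_{12} & a_{13} & b_1 & \\
0 & -\frac{a_{21}}{a_{11}}a_{12} + a_{22} & -\frac{a_{21}}{a_{11}}a_{13} + a_{23} & -\frac{a_{21}}{a_{11}}b_1 + b_2 & \left(-\frac{a_{21}}{a_{11}}\right)R_1 + R_2 \\
0 & -\frac{a_{31}}{a_{11}}a_{12} + a_{32} & -\frac{a_{31}}{a_{11}}a_{13} + a_{33} & -\frac{a_{31}}{a_{11}}b_1 + b_3 & \left(-\frac{a_{31}}{a_{11}}\right)R_1 + R_3
\end{array}
\]

In particular, this means

\[
\begin{bmatrix}
-\frac{a_{21}}{a_{11}}a_{12} + a_{22} & -\frac{a_{21}}{a_{11}}a_{13} + a_{23} \\
-\frac{a_{31}}{a_{11}}a_{12} + a_{32} & -\frac{a_{31}}{a_{11}}a_{13} + a_{33}
\end{bmatrix}
\begin{bmatrix} x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -\frac{a_{21}}{a_{11}}b_1 + b_2 \\ -\frac{a_{31}}{a_{11}}b_1 + b_3 \end{bmatrix}
\]

Assuming that

\[
\begin{vmatrix}
-\frac{a_{21}}{a_{11}}a_{12} + a_{22} & -\frac{a_{21}}{a_{11}}a_{13} + a_{23} \\
-\frac{a_{31}}{a_{11}}a_{12} + a_{32} & -\frac{a_{31}}{a_{11}}a_{13} + a_{33}
\end{vmatrix} \neq 0
\]

we can use Cramer’s rule to find

\[
x_2 = \frac{
\begin{vmatrix}
-\frac{a_{21}}{a_{11}}b_1 + b_2 & -\frac{a_{21}}{a_{11}}a_{13} + a_{23} \\
-\frac{a_{31}}{a_{11}}b_1 + b_3 & -\frac{a_{31}}{a_{11}}a_{13} + a_{33}
\end{vmatrix}
}{
\begin{vmatrix}
-\frac{a_{21}}{a_{11}}a_{12} + a_{22} & -\frac{a_{21}}{a_{11}}a_{13} + a_{23} \\
-\frac{a_{31}}{a_{11}}a_{12} + a_{32} & -\frac{a_{31}}{a_{11}}a_{13} + a_{33}
\end{vmatrix}
} \tag{4.4}
\]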

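Multiply numerator and denominator of (4.4) by $a_{11}^2$ and use property 3 of subsection 4.1.4 above:

\[
x_2 = \frac{
\begin{vmatrix}
a_{11}b_2 - a_{21}b_1 & a_{23}a_{11} - a_{21}a_{13} \\
a_{11}b_3 - a_{31}b_1 & a_{33}a_{11} - a_{31}a_{13}
\end{vmatrix}
}{
\begin{vmatrix}
a_{11}a_{22} - a_{21}a_{12} & a_{23}a_{11} - a_{21}a_{13} \\
a_{11}a_{32} - a_{31}a_{12} & a_{33}a_{11} - a_{31}a_{13}
\end{vmatrix}
}
\]

Consider the denominator:

\[
\begin{aligned}
&(a_{11}a_{22} - a_{21}a_{12})(a_{33}a_{11} - a_{31}a_{13}) - (a_{11}a_{32} - a_{31}a_{12})(a_{23}a_{11} - a_{21}a_{13}) \\
&\quad = a_{11}a_{11}a_{22}a_{33} - a_{11}a_{13}a_{22}a_{31} - a_{11}a_{12}a_{21}a_{33} + a_{12}a_{21}a_{31}a_{13} \\
&\qquad - a_{11}a_{11}a_{23}a_{32} + a_{11}a_{13}a_{21}a_{32} + a_{11}a_{12}a_{23}a_{31} - a_{12}a_{13}a_{21}a_{31} \\
&\quad = a_{11}(a_{11}a_{22}a_{33} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31})
\end{aligned}
\]

The numerator for $x_2$ is found by substituting $b_1$ for $a_{12}$, $b_2$ for $a_{22}$ and $b_3$ for $a_{32}$:

\[
a_{11}(a_{11}b_2a_{33} - a_{13}b_2a_{31} - b_1a_{21}a_{33} - a_{11}a_{23}b_3 + a_{13}a_{21}b_3 + b_1a_{23}a_{31})
\]

Finally, we find

\[
x_2 = \frac{a_{11}b_2a_{33} - a_{13}b_2a_{31} - b_1a_{21}a_{33} - a_{11}a_{23}b_3 + a_{13}a_{21}b_3 + b_1a_{23}a_{31}}{a_{11}a_{22}a_{33} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31}} \tag{4.5}
\]

Definition:

The denominator of equation (4.5) is defined as the determinant |A| of the 3 × 3 matrix A:

\[
|A| = a_{11}a_{22}a_{33} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31} \tag{4.6}
\]

|A| is called a 3 × 3 determinant and it is sometimes written as det A.

Equation (4.5) can be written

\[
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}}{|A|}
\]

Can you guess what $x_1$ and $x_3$ are, if |A| ≠ 0? This is again Cramer’s rule. (See exercise No.7 below).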

4.1.7 Discussion

Each term of (4.6) is of the form

\[
\pm a_{1i}a_{2j}a_{3k}
\]

The column indices i, j, and k form an arrangement (permutation) ijk of the numbers 1, 2, and 3. In fact all six permutations occur and each has its own sign according to the scheme:

+123 −321 −213 −132 +312 +231

4.1.8 The rule for finding the sign of ijk

Start with 123 as +. Each time an interchange of two numbers occurs, change the sign as follows:

+123 −→ −132 (interchange 2 and 3)

−132 −→ +231 (interchange 1 and 2)

−213 −→ +312 (interchange 2 and 3)



4.1.9 Formal definition of |A|

Let σ = (σ1, σ2, σ3) denote a typical permutation of 1, 2, 3 and let sgn(σ) be its sign. Then we can write

\[
|A| = \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{1\sigma_1}a_{2\sigma_2}a_{3\sigma_3} \tag{4.7}
\]

Here the sum extends over all 3! = 6 permutations σ of 1, 2, 3.
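Formula (4.7) can be evaluated literally by looping over all permutations. In the sketch below the sign is found by counting inversions (pairs that appear out of order); that the parity of this count agrees with the parity of the number of interchanges is a standard fact assumed here rather than proved in these notes. Indices run from 0 in the code, and the function names are illustrative.

```python
from itertools import permutations

def sgn(perm):
    """Sign of a permutation: +1 if the number of inversions is even, else -1."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Sum of sgn(sigma) * a[0][sigma0] * a[1][sigma1] * ... over all sigma."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for row in range(n):
            term *= A[row][sigma[row]]
        total += term
    return total

# The six signs of 4.1.7, written with 0-based entries (123 becomes (0, 1, 2)).
for p in permutations(range(3)):
    print([k + 1 for k in p], sgn(p))
```

The same function works for any n, but the number of terms is n!, so it is only practical for small matrices.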

4.1.10 The cofactor Ai|j of aij

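The cofactor of $a_{ij}$ is defined as

\[
A_{i|j} = (-1)^{i+j} \times (\text{determinant formed by crossing out row } i \text{ and column } j)
\]

Example 4.1.1

\[
\begin{aligned}
A_{1|1} &= (-1)^{1+1} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \\
A_{1|2} &= (-1)^{1+2} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \\
A_{2|2} &= (-1)^{2+2} \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, \\
A_{2|3} &= (-1)^{2+3} \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix} = -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}, \\
A_{3|1} &= (-1)^{3+1} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
\end{aligned}
\]

4.1.11 The sign pattern for the cofactors Ai|j

Each time we move by one row or by one column we get a change in sign:

\[
\begin{matrix} + & - & + \\ - & + & - \\ + & - & + \end{matrix}
\]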

4.1.12 Expansion of |A| by a row

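Expansion by row 1:

\[
\begin{aligned}
|A| &= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} \\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\
&= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\end{aligned}
\]

Hence

\[
|A| = a_{11}A_{1|1} + a_{12}A_{1|2} + a_{13}A_{1|3}
\]

Expansion by row 2:

\[
\begin{aligned}
|A| &= -a_{21}(a_{12}a_{33} - a_{13}a_{32}) + a_{22}(a_{11}a_{33} - a_{13}a_{31}) - a_{23}(a_{11}a_{32} - a_{12}a_{31}) \\
&= -a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{22} \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}
- a_{23} \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}
\end{aligned}
\]

Hence

\[
|A| = a_{21}A_{2|1} + a_{22}A_{2|2} + a_{23}A_{2|3}
\]

Expansion by row 3:

\[
\begin{aligned}
|A| &= a_{31}(a_{12}a_{23} - a_{13}a_{22}) - a_{32}(a_{11}a_{23} - a_{13}a_{21}) + a_{33}(a_{11}a_{22} - a_{12}a_{21}) \\
&= a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
- a_{32} \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}
+ a_{33} \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
\end{aligned}
\]

Hence:

\[
|A| = a_{31}A_{3|1} + a_{32}A_{3|2} + a_{33}A_{3|3}
\]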

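Example 4.1.2 Let

\[
A = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 4 & 0 \\ -2 & 5 & -3 \end{bmatrix}
\]

Find |A| by expanding by the second row.

Solution:

\[
\begin{aligned}
|A| &= a_{21}A_{2|1} + a_{22}A_{2|2} + a_{23}A_{2|3} \\
&= (2)(-1)^{2+1} \begin{vmatrix} 3 & -1 \\ 5 & -3 \end{vmatrix} + (4)(-1)^{2+2} \begin{vmatrix} 1 & -1 \\ -2 & -3 \end{vmatrix} \\
&= (-2)(-4) + (4)(-5) = -12
\end{aligned}
\]

4.1.13 Expansion by row i

The general formula is

\[
|A| = a_{i1}A_{i|1} + a_{i2}A_{i|2} + a_{i3}A_{i|3} \tag{4.8}
\]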
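Formula (4.8) says that every row gives the same value. A short recursive sketch makes this easy to try on examples; rows and columns are numbered from 0 in the code, and the names `minor`, `cofactor`, `det` and `expand_by_row` are chosen here for illustration.

```python
def minor(A, i, j):
    """The matrix left after crossing out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row (1 x 1 case: the entry)."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * cofactor(A, 0, j) for j in range(len(A)))

def cofactor(A, i, j):
    """(-1)^(i+j) times the determinant of the minor."""
    return (-1) ** (i + j) * det(minor(A, i, j))

def expand_by_row(A, i):
    """Right-hand side of (4.8) for row i."""
    return sum(A[i][j] * cofactor(A, i, j) for j in range(len(A)))

A = [[1, 3, -1], [2, 4, 0], [-2, 5, -3]]   # the matrix of Example 4.1.2
print([expand_by_row(A, i) for i in range(3)])
```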

4.1.14 Properties of 3× 3 determinants

1. The determinant of A and its transpose are equal:

\[
|A| = \left|A^T\right|
\]

Proof:

\[
\begin{aligned}
|A| &= a_{11}a_{22}a_{33} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31} \\
&= a_{11}a_{22}a_{33} - a_{31}a_{22}a_{13} - a_{21}a_{12}a_{33} - a_{11}a_{32}a_{23} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23}
\end{aligned}
\]

If $B = A^T$ then $b_{ij} = a_{ji}$ by definition, so that |A| is equal to

\[
b_{11}b_{22}b_{33} - b_{13}b_{22}b_{31} - b_{12}b_{21}b_{33} - b_{11}b_{23}b_{32} + b_{12}b_{23}b_{31} + b_{13}b_{21}b_{32} = |B|
\]

This proves the result.

Note 91 The idea of the proof can be seen as follows: Consider some term of the full expansion of |A|, say $-a_{12}a_{21}a_{33}$, or $+a_{13}a_{21}a_{32}$. In the first case, $-a_{12}a_{21}a_{33} = -a_{21}a_{12}a_{33} = -b_{12}b_{21}b_{33}$. In the second case, $+a_{13}a_{21}a_{32} = +a_{21}a_{32}a_{13} = +b_{12}b_{23}b_{31}$. In both cases we get a typical term of |B| with its correct sign.

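2. Interchanging two rows changes the sign of the determinant.

Proof:

Consider, for example, interchanging rows 2 and 3. Expand |A| by row 1:

\[
\begin{aligned}
|A| &= a_{11}A_{1|1} + a_{12}A_{1|2} + a_{13}A_{1|3} \\
&= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\end{aligned}
\]

Hence, if we interchange $A_{2\bullet}$ and $A_{3\bullet}$, the 2 × 2 determinants will change sign and so will |A|.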

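3. Multiplying a row of A by a scalar multiplies the determinant by the same scalar:

\[
\begin{vmatrix} \lambda A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ \lambda A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ \lambda A_{3\bullet} \end{vmatrix}
= \lambda |A|
\]

Proof:

Consider for example multiplying row 2 by λ. Expand the result by row 2:

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= (\lambda a_{21})A_{2|1} + (\lambda a_{22})A_{2|2} + (\lambda a_{23})A_{2|3}
= \lambda \left(a_{21}A_{2|1} + a_{22}A_{2|2} + a_{23}A_{2|3}\right) = \lambda |A|
\]

Less elegantly, this can also be seen by expanding by row 1:

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= a_{11} \begin{vmatrix} \lambda a_{22} & \lambda a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} \lambda a_{21} & \lambda a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} \lambda a_{21} & \lambda a_{22} \\ a_{31} & a_{32} \end{vmatrix}
= \lambda |A|
\]

We have used the fact that each cofactor gets multiplied by λ.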

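4. Let $B = \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}$, then

\[
\begin{vmatrix} A_{1\bullet} + B \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
+ \begin{vmatrix} B \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= |A| + \begin{vmatrix} B \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
\]

Proof:

\[
\begin{aligned}
\begin{vmatrix} A_{1\bullet} + B \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
&= (a_{11} + b_1)A_{1|1} + (a_{12} + b_2)A_{1|2} + (a_{13} + b_3)A_{1|3} \\
&= a_{11}A_{1|1} + a_{12}A_{1|2} + a_{13}A_{1|3} + b_1A_{1|1} + b_2A_{1|2} + b_3A_{1|3} \\
&= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
+ \begin{vmatrix} B \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
\end{aligned}
\]

Similar results apply if the row B is added to another row, e.g.

\[
\begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} + B \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
+ \begin{vmatrix} A_{1\bullet} \\ B \\ A_{3\bullet} \end{vmatrix}
\]

Using property (2) this can be proved directly from the first result as follows:

\[
\begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} + B \\ A_{3\bullet} \end{vmatrix}
= -\begin{vmatrix} A_{2\bullet} + B \\ A_{1\bullet} \\ A_{3\bullet} \end{vmatrix}
= -\begin{vmatrix} A_{2\bullet} \\ A_{1\bullet} \\ A_{3\bullet} \end{vmatrix}
- \begin{vmatrix} B \\ A_{1\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
+ \begin{vmatrix} A_{1\bullet} \\ B \\ A_{3\bullet} \end{vmatrix}
\]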

5. If one row of A is a multiple of another, then |A| = 0.

Proof:

First suppose that two rows are equal, say $A_{1\bullet} = A_{2\bullet}$. Then interchanging rows 1 and 2 of A changes the sign of the determinant by property (2). Hence |A| = −|A| and so |A| = 0. Now let $A_{1\bullet} = \lambda A_{2\bullet}$. By property (3), |A| is λ times a determinant with equal rows 1 and 2 and so vanishes.

6. Adding a multiple of one row to a different row leaves |A| unchanged, e.g.

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} + A_{2\bullet} \\ A_{3\bullet} \end{vmatrix} = |A|
\]

Proof:

Use properties (4) and (5):

\[
\begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} + A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
+ \begin{vmatrix} A_{1\bullet} \\ \lambda A_{1\bullet} \\ A_{3\bullet} \end{vmatrix}
= \begin{vmatrix} A_{1\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix} + 0 = |A|
\]

7. If A and B are 3 × 3 matrices,

\[
|AB| = |A|\,|B| \tag{4.9}
\]

We will not prove this but note that the proof follows that for 2 × 2 determinants.

8. Any of the above properties (2) - (6) are true for columns in place of rows.

Proof:

This follows from property (1), namely that $|A| = \left|A^T\right|$. In particular, we have:

(a) The expansion of |A| by column j:

\[
|A| = A_{1|j}a_{1j} + A_{2|j}a_{2j} + A_{3|j}a_{3j}
\]

(b) Interchanging two columns changes the sign of the determinant.

(c) Multiplying a column of A by a scalar multiplies the determinant by the same scalar.

(d) If one column of A is a multiple of another, then |A| = 0.

(e) Adding a multiple of one column to another leaves |A| unchanged.


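Example 4.1.3 Let

\[
A = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 4 & 0 \\ -2 & 5 & -3 \end{bmatrix}
\]

Find |A| by expanding by the third column.

Solution:

\[
\begin{aligned}
|A| &= A_{1|3}a_{13} + A_{2|3}a_{23} + A_{3|3}a_{33} \\
&= (-1)^{1+3} \begin{vmatrix} 2 & 4 \\ -2 & 5 \end{vmatrix} (-1) + A_{2|3}(0) + (-1)^{3+3} \begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix} (-3) \\
&= 18(-1) + (-2)(-3) = -12
\end{aligned}
\]

Simplification:

We may also simplify the evaluation of the determinant by adding a multiple of a row to another:

\[
|A| = \begin{vmatrix} 1 & 3 & -1 \\ 2 & 4 & 0 \\ -5 & -4 & 0 \end{vmatrix} \qquad (-3)R_1 + R_3
\]

Expanding by the third column gives

\[
|A| = (-1)^{1+3} \begin{vmatrix} 2 & 4 \\ -5 & -4 \end{vmatrix} (-1) = -12
\]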

4.1.15 Fundamental properties of cofactors

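For rows:

\[
a_{i1}A_{j|1} + a_{i2}A_{j|2} + a_{i3}A_{j|3} =
\begin{cases} |A| & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{4.10}
\]

For columns:

\[
A_{1|i}a_{1j} + A_{2|i}a_{2j} + A_{3|i}a_{3j} =
\begin{cases} |A| & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{4.11}
\]

Proof

We know that if i = j in (4.10) the left-hand side is just the expansion of |A| by row i. Now let i ≠ j, say i = 2 and j = 1. Then

\[
a_{21}A_{1|1} + a_{22}A_{1|2} + a_{23}A_{1|3}
= a_{21} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{22} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{23} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\]

This is the expansion of

\[
\begin{vmatrix} A_{2\bullet} \\ A_{2\bullet} \\ A_{3\bullet} \end{vmatrix}
\]

by its first row and so vanishes (two of its rows are equal).

The result for columns can be derived from that for rows.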


4.1.16 The adjoint adj(A) of A and the inverse A−1

The adjoint of the n× n matrix A is the n× n matrix adj(A), where

[adj (A)]ij = Aj|i

This means that we can first find the matrix B with [B]ij = Ai|j and then adj(A) = BT .

4.1.17 Properties (4.10) and (4.11) as matrix products

Properties (4.10) and (4.11) just say

A adj (A) = |A| I = adj (A) A (4.12)

where I = I3 is the 3× 3 identity matrix.

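Remark 92 For a 2 × 2 matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, the adjoint is $\operatorname{adj}(A) = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$.

This was found in equation (4.2) and the property (4.12) was found in equation (4.3).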

4.1.18 Corollary: condition for A to be invertible. Formula for the inverse

The 3 × 3 matrix A is invertible if, and only if, |A| ≠ 0, in which case

\[
A^{-1} = \frac{1}{|A|} \operatorname{adj}(A) \tag{4.13}
\]

(For the definition of ‘invertible’, see Chapter 3, subsection 3.7.2).

Proof

If |A| ≠ 0, then equation (4.12) shows that

\[
A \left( \frac{1}{|A|} \operatorname{adj}(A) \right) = I = \left( \frac{1}{|A|} \operatorname{adj}(A) \right) A
\]

In other words, (4.13) holds.

Conversely, suppose that $A^{-1}$ exists. Let an elementary row operation be done on A to obtain A′. By properties (2), (3) and (6) we see that |A| ≠ 0 if, and only if, |A′| ≠ 0. As $A^{-1}$ exists, the row REF of A is I. Since |I| = 1, it follows that |A| ≠ 0.

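Example 4.1.4 Let

\[
A = \begin{bmatrix} 2 & 4 & 3 \\ 0 & 1 & -1 \\ 3 & 5 & 7 \end{bmatrix}
\]

If $A^{-1}$ exists, find it.

Solution:

\[
\begin{aligned}
|A| &= a_{21}A_{2|1} + a_{22}A_{2|2} + a_{23}A_{2|3} \\
&= 0 + \begin{vmatrix} 2 & 3 \\ 3 & 7 \end{vmatrix} - (-1) \begin{vmatrix} 2 & 4 \\ 3 & 5 \end{vmatrix} = 3
\end{aligned}
\]

As |A| ≠ 0, $A^{-1}$ exists. We find

\[
\begin{aligned}
A_{1|1} &= \begin{vmatrix} 1 & -1 \\ 5 & 7 \end{vmatrix} = 12, &
A_{1|2} &= -\begin{vmatrix} 0 & -1 \\ 3 & 7 \end{vmatrix} = -3, &
A_{1|3} &= \begin{vmatrix} 0 & 1 \\ 3 & 5 \end{vmatrix} = -3 \\
A_{2|1} &= -\begin{vmatrix} 4 & 3 \\ 5 & 7 \end{vmatrix} = -13, &
A_{2|2} &= \begin{vmatrix} 2 & 3 \\ 3 & 7 \end{vmatrix} = 5, &
A_{2|3} &= -\begin{vmatrix} 2 & 4 \\ 3 & 5 \end{vmatrix} = 2 \\
A_{3|1} &= \begin{vmatrix} 4 & 3 \\ 1 & -1 \end{vmatrix} = -7, &
A_{3|2} &= -(-2) = 2, &
A_{3|3} &= \begin{vmatrix} 2 & 4 \\ 0 & 1 \end{vmatrix} = 2
\end{aligned}
\]

Hence

\[
A^{-1} = \frac{1}{|A|} \operatorname{adj} A
= \frac{1}{|A|} \begin{bmatrix} A_{1|1} & A_{1|2} & A_{1|3} \\ A_{2|1} & A_{2|2} & A_{2|3} \\ A_{3|1} & A_{3|2} & A_{3|3} \end{bmatrix}^T
= \frac{1}{3} \begin{bmatrix} 12 & -3 & -3 \\ -13 & 5 & 2 \\ -7 & 2 & 2 \end{bmatrix}^T
= \frac{1}{3} \begin{bmatrix} 12 & -13 & -7 \\ -3 & 5 & 2 \\ -3 & 2 & 2 \end{bmatrix}
\]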
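The adjoint method of Example 4.1.4 can be sketched as follows. The code builds the matrix of cofactors, transposes it to get adj(A), checks the identity (4.12), and divides by |A|. It assumes integer entries and 0-based indices; the function names are illustrative and not part of the notes.

```python
from fractions import Fraction

def minor(A, i, j):
    """The matrix left after crossing out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(len(A)))

def adjoint(A):
    """[adj(A)]_ij = A_{j|i}: the transpose of the matrix of cofactors."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse_by_adjoint(A):
    """(1/|A|) adj(A), or None when |A| = 0."""
    d = det(A)
    if d == 0:
        return None
    return [[Fraction(x, d) for x in row] for row in adjoint(A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 4, 3], [0, 1, -1], [3, 5, 7]]
print(det(A), adjoint(A))
print(matmul(A, adjoint(A)))   # should be |A| times the identity, as in (4.12)
```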

4.2 Higher Order Determinants

Let A be an n × n matrix. As with three numbers (see subsection 4.1.7), a permutation σ = (σ1, σ2, ..., σn) of 1, 2, · · · , n is called even if it can be obtained from 1, 2, · · · , n by an even number of interchanges, otherwise odd. We put sgn(σ) = +1 if σ is even, and sgn(σ) = −1 if σ is odd.

Following the formal definition (4.7) for 3 × 3 determinants, the determinant |A| of a general n × n matrix A is given by

\[
|A| = \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{1\sigma_1}a_{2\sigma_2} \cdots a_{n\sigma_n}
\]

where the sum extends over all n! permutations σ of the numbers 1, 2, · · · , n. Recall that n! = (1)(2)(3) · · · (n − 1)(n) and is called n factorial.

As with 2 × 2 and 3 × 3 determinants (4.1.10 above), if A is an n × n matrix, the cofactor $A_{i|j}$ of $a_{ij}$ is given by

\[
A_{i|j} = (-1)^{i+j} \times (\text{determinant formed by crossing out row } i \text{ and column } j)
\]

For a 4 × 4 determinant the sign pattern for the cofactors is

\[
\begin{matrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{matrix}
\]

4.2.1 All the properties of 2 × 2 and 3 × 3 determinants hold for n × n determinants

So the properties in 4.1.14 and 4.1.13 (equation 4.8) carry over to n × n determinants. In particular, the formula |AB| = |A| |B| and equation (4.12) hold as well as the statement on the inverse in Corollary 4.1.18 and equation (4.13).


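Example 4.2.1 Find |A| if

\[
A = \begin{bmatrix} 3 & -2 & -1 & 2 \\ 1 & -2 & 4 & 0 \\ -1 & 3 & 1 & -3 \\ 0 & 1 & -1 & 2 \end{bmatrix}
\]

Solution:

\[
|A| = \begin{vmatrix} 3 & -2 & -3 & 6 \\ 1 & -2 & 2 & 4 \\ -1 & 3 & 4 & -9 \\ 0 & 1 & 0 & 0 \end{vmatrix}
\]

Here we have added column 2 to column 3 and (−2)(column 2) to column 4. Expanding by row 4:

\[
|A| = (-1)^{4+2} \begin{vmatrix} 3 & -3 & 6 \\ 1 & 2 & 4 \\ -1 & 4 & -9 \end{vmatrix}
= \begin{vmatrix} 0 & -3 & 0 \\ 3 & 2 & 8 \\ 3 & 4 & -1 \end{vmatrix}
= (-1)^{1+2}(-3) \begin{vmatrix} 3 & 8 \\ 3 & -1 \end{vmatrix} = -81
\]

We have added column 2 to column 1 and 2(column 2) to column 3, then expanded by row 1.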
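For larger matrices, the properties carried over in 4.2.1 suggest a systematic alternative to hand-picked column operations: use property (6) to clear entries below each pivot, track sign changes from row interchanges by property (2), and multiply the diagonal entries at the end. That the determinant of a triangular matrix is the product of its diagonal entries follows from repeated expansion by the first column; it is used here as an assumption not stated explicitly in the notes. The function name is illustrative.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via row operations, using exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        # Find a non-zero pivot at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)             # no pivot: the determinant vanishes
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                   # property (2): interchange flips sign
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            # Property (6): adding a multiple of a row leaves |A| unchanged.
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for k in range(n):
        result *= M[k][k]                  # product of the diagonal entries
    return result

A = [[3, -2, -1, 2], [1, -2, 4, 0], [-1, 3, 1, -3], [0, 1, -1, 2]]
print(det_by_elimination(A))
```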

Exercise 93 1. Using property (1) for 2 × 2 determinants, state and prove similar results for columns in (2)-(6).

2. Evaluate the determinants of the matrices in Chapter 3, Exercise 86, No. 1. Find the inverses of the matrices by the adjoint method if they exist.

3. Evaluate the determinants

(a)
\[
\begin{vmatrix} 1 & -3 & 0 & -2 \\ 3 & -12 & -2 & -6 \\ -2 & 10 & 2 & 5 \\ -1 & 6 & 1 & 3 \end{vmatrix}
\]

Solution: The value of the determinant is −1.

(b)
\[
\begin{vmatrix} -3 & 7 & 8 & 4 \\ 2 & 1 & 0 & 3 \\ 6 & -1 & -2 & -1 \\ -3 & 4 & 2 & 11 \end{vmatrix}
\]

(c)

∣∣∣∣∣∣∣∣∣∣

−3 7 8 4 −72 1 0 3 −16 −1 −2 −1 1−3 4 2 11 −4

−3175 − 252 17 59 12

∣∣∣∣∣∣∣∣∣∣

4. Verify that the adjoint of a 2× 2 matrix is given by equation (4.2).

5. Let A be a 3 × 3 matrix and k a scalar. Show that $|kA| = k^3 |A|$. What do you think the general result is?

6. * Suppose the n × n matrix A is not invertible. Show that A(adj A) is the zero matrix.

Solution: Since A is not invertible, |A| = 0 by 4.1.18. The result follows from equation (4.12).

7. State and prove Cramer’s rule for 3 equations in 3 unknowns. Do the same for n equations in n unknowns using the fact that higher order determinants have the same properties as 2 × 2 and 3 × 3 determinants.

8. (Jacobi) Let the n × n matrix A have a non-zero determinant. Show that $a_{ij} = |A|\,[A^{-1}]_{j|i}$.

9. (a) Let $A = [a_{ij}]$ be a 3 × 3 matrix and β a scalar. Replace the entry $a_{23}$ with β and let the resulting matrix be B. Show that $|B| = |A| + (\beta - a_{23})A_{2|3}$, and find a criterion for B to be invertible.

(b) Generalize the previous result.

10. Let A be a 3 × 3 matrix. A scalar λ such that A − λI is not invertible is called an eigenvalue of A. This means that |A − λI| = 0. Show that λ must satisfy a polynomial equation f(λ) = 0 of degree 3, where f is called the characteristic polynomial of A. Find this polynomial for the matrix

\[
A = \begin{bmatrix} 10 & -12 & -1 \\ 5 & -6 & -1 \\ -11 & 12 & 0 \end{bmatrix}
\]

as well as the eigenvalues. Since |A − λI| = 0 it follows that there are non-zero vectors v (called eigenvectors) such that (A − λI)v = 0. Find eigenvectors corresponding to the eigenvalues.

11. Let |A| be a 3 × 3 determinant. We can permute the rows $A_{1\bullet}$, $A_{2\bullet}$ and $A_{3\bullet}$ in six ways. For which of these does the determinant keep its sign and for which does it change its sign? State and prove a similar result for the columns $A_{\bullet 1}$, $A_{\bullet 2}$ and $A_{\bullet 3}$ of A.

4.3 The Cross Product a× b

In this section all vectors have three entries and, as in the notation of Chapter 1, will normally be row vectors. The unit vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) were introduced in subsection 1.1.17.

Let a = (a1, a2, a3) and b = (b1, b2, b3) be two vectors. Their cross product is defined as

\[
\begin{aligned}
a \times b &= \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} i
- \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} j
+ \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} k \\
&= \left( \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix},\ -\begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix},\ \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \right)
\end{aligned} \tag{4.14}
\]

This is sometimes called the vector product and is used constantly in mechanics.

Treating i, j and k as though they are scalars (which they are not), the cross product can be remembered by the formula

\[
a \times b = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
\]

Expanding this ‘determinant’ by its first row gives (4.14).

If a and b are written as columns the convention is to write

\[
a \times b = \begin{vmatrix} i & a_1 & b_1 \\ j & a_2 & b_2 \\ k & a_3 & b_3 \end{vmatrix}
\]

but we get the same result as before.


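Example 4.3.1

\[
\begin{aligned}
(-1, 2, 4) \times (2, -3, 1) &= \begin{vmatrix} i & j & k \\ -1 & 2 & 4 \\ 2 & -3 & 1 \end{vmatrix} \\
&= \begin{vmatrix} 2 & 4 \\ -3 & 1 \end{vmatrix} i - \begin{vmatrix} -1 & 4 \\ 2 & 1 \end{vmatrix} j + \begin{vmatrix} -1 & 2 \\ 2 & -3 \end{vmatrix} k \\
&= (14, 9, -1)
\end{aligned}
\]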
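Definition (4.14) is easy to turn into a small function, which is handy for checking hand calculations such as Example 4.3.1. The sketch below represents vectors as tuples; the name `cross` is an illustrative choice.

```python
def cross(a, b):
    """Cross product of two 3-entry vectors, following definition (4.14)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,        # coefficient of i
            -(a1 * b3 - a3 * b1),     # coefficient of j (note the minus sign)
            a1 * b2 - a2 * b1)        # coefficient of k

print(cross((-1, 2, 4), (2, -3, 1)))
```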

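Example 4.3.2 A characteristic property of i, j and k

i × j = k, j × k = i, k × i = j

To see these results, consider

\[
i \times j = \begin{vmatrix} i & j & k \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix}
= \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix} i - \begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} j + \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} k = k
\]

\[
j \times k = \begin{vmatrix} i & j & k \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} i - \begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} j + \begin{vmatrix} 0 & 1 \\ 0 & 0 \end{vmatrix} k = i
\]

\[
k \times i = \begin{vmatrix} i & j & k \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{vmatrix}
= \begin{vmatrix} 0 & 1 \\ 0 & 0 \end{vmatrix} i - \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} j + \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix} k = j
\]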

4.3.1 Properties of the cross product

The cross product (4.14) is quite different from the dot product a · b in that it is a vector while the dot product is a scalar. Nevertheless, some properties are shared with the dot product.

For all vectors a, b and c and scalars α, β and γ:

1. b × a = −(a × b)

Proof:

In b × a we interchange the rows in the cofactors defining a × b. Since these all change sign, the result follows.

2. (The triple scalar product)

c · (a× b) = b · (c× a) = a · (b× c) (4.15)

Proof :


First note that

\[
\begin{aligned}
c \cdot (a \times b)
&= c_1 \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix}
 - c_2 \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}
 + c_3 \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \\
&= \begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
\end{aligned}
\tag{4.16}
\]

Here we have expanded the determinant by its first row. The other properties follow by rearranging the rows of the above determinant according to the permutations bca and abc of cab. These are all even and so the other determinants are all equal. (See Exercise 93, No. 11).

3. a · (a × b) = b · (a × b) = 0

Proof:

The statement a · (a × b) = 0 follows by putting c = a in the determinant of (4.16). The resulting determinant is zero as two of its rows are the same. The result b · (a × b) = 0 is similar. Geometrically, this means that if a × b is non-zero it is perpendicular to both a and b.

4. β(a × b) = (βa) × b = a × (βb)

5. (β + γ)(a × b) = β(a × b) + γ(a × b)

6. (αa + βb) × c = (αa × c) + (βb × c)

7. If either a = 0 or b = 0 then a × b = 0. Otherwise, a × b = 0 if, and only if, a and b are parallel. (See Exercise 94, No. 8)
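A numerical spot check is no substitute for the proofs, but it is a quick way to catch a sign error when working with these properties by hand. The sketch below, in Python, tests properties 1, 2, 3 and 6 on one arbitrarily chosen set of integer vectors; the helper names (`dot`, `cross`, `det3`, `check_properties`) and the sample data are assumptions made for illustration only.

```python
def dot(a, b):
    """Dot product of two 3-entry vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product by the cofactor formula (4.14)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        -(a[0] * b[2] - a[2] * b[0]),
        a[0] * b[1] - a[1] * b[0],
    )


def det3(r1, r2, r3):
    """3x3 determinant with rows r1, r2, r3, expanded by its first row."""
    return (
        r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
        - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
        + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0])
    )


def check_properties(a, b, c, alpha, beta):
    """Report whether properties 1, 2, 3 and 6 hold for the given data."""
    comb = tuple(alpha * x + beta * y for x, y in zip(a, b))  # alpha*a + beta*b
    rhs6 = tuple(alpha * x + beta * y
                 for x, y in zip(cross(a, c), cross(b, c)))
    triple = dot(c, cross(a, b))
    return {
        "1": cross(b, a) == tuple(-x for x in cross(a, b)),
        "2": triple == dot(b, cross(c, a)) == dot(a, cross(b, c)) == det3(c, a, b),
        "3": dot(a, cross(a, b)) == 0 and dot(b, cross(a, b)) == 0,
        "6": cross(comb, c) == rhs6,
    }


# Integer data keeps every comparison exact (no rounding error).
print(check_properties((1, -2, 3), (4, 0, -1), (2, 5, 7), 2, -3))
```

Using integers (or fractions) rather than floating-point numbers means the equalities can be tested exactly.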

4.3.2 Geometric meaning of the cross product a× b

Figure 4.1 (the vectors a and b, the angle θ between them, and a × b = (|a||b| sin θ) c)

Suppose that 0 < θ < π is the angle between the non-zero and non-parallel vectors a and b. Let c be the unit vector perpendicular to a and b with direction found by using the corkscrew rule: rotate from a to b through θ, then the direction of c goes the way a corkscrew would go.

See Chapter 1, subsection 1.1.1 and Figure 4.1.

Geometric Interpretation

\[
a \times b = (|a||b| \sin\theta)\, c \tag{4.17}
\]

Proof of equation (4.17):

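We first show that

\[
|a \times b|^2 + (a \cdot b)^2 = |a|^2 |b|^2
\]

We have

\[
\begin{aligned}
|a \times b|^2 + (a \cdot b)^2
&= \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix}^2
 + \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}^2
 + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}^2
 + (a_1 b_1 + a_2 b_2 + a_3 b_3)^2 \\
&= (a_2 b_3 - a_3 b_2)^2 + (a_1 b_3 - a_3 b_1)^2 + (a_1 b_2 - a_2 b_1)^2 + (a_1 b_1 + a_2 b_2 + a_3 b_3)^2 \\
&= a_2^2 b_3^2 + a_3^2 b_2^2 + a_1^2 b_3^2 + a_3^2 b_1^2 + a_1^2 b_2^2 + a_2^2 b_1^2 + a_1^2 b_1^2 + a_2^2 b_2^2 + a_3^2 b_3^2 \\
&= (a_3^2 + a_2^2 + a_1^2)(b_3^2 + b_1^2 + b_2^2) \\
&= |a|^2 |b|^2
\end{aligned}
\]

It follows that

\[
\frac{|a \times b|^2}{|a|^2 |b|^2} = 1 - \frac{(a \cdot b)^2}{|a|^2 |b|^2} = 1 - \cos^2\theta = \sin^2\theta
\]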

Hence |a × b| = |a||b| sin θ, since sin θ > 0 for 0 < θ < π. Since a × b is parallel to c (by property 3 of subsection 4.3.1), we have a × b = ±(|a||b| sin θ) c.
If a = i and b = j, then the corkscrew rule gives c = k, and from Example 4.3.2 we have i × j = +k. This suggests that for any cross product a × b the sign is +, that is, a × b has the same direction as the vector c given by the corkscrew rule, and we will be satisfied with this. Equation (4.17) now follows.

From Example 4.3.2 we see how the geometric interpretation gives, besides i × j = k, also the other products j × k = i and k × i = j.
Compare this with 1.1.1 on right-handed systems in Chapter 1.
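The identity used in the proof, and the resulting length formula, can be checked numerically. The following Python sketch does so for one pair of vectors chosen for illustration; the helper names and the sample vectors are assumptions, and the second comparison is made only up to floating-point rounding.

```python
import math


def dot(a, b):
    """Dot product of two 3-entry vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product by the cofactor formula (4.14)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        -(a[0] * b[2] - a[2] * b[0]),
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    """Length |a| of a vector."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle theta between non-zero vectors, from a . b = |a||b| cos(theta)."""
    return math.acos(dot(a, b) / (norm(a) * norm(b)))


a, b = (1, 2, 2), (3, 0, 4)
n = cross(a, b)

# Identity from the proof: |a x b|^2 + (a . b)^2 = |a|^2 |b|^2 (exact for integers).
print(dot(n, n) + dot(a, b) ** 2 == dot(a, a) * dot(b, b))

# Length formula |a x b| = |a||b| sin(theta), compared up to rounding error.
theta = angle_between(a, b)
print(math.isclose(norm(n), norm(a) * norm(b) * math.sin(theta)))
```

For these vectors |a| = 3, |b| = 5 and a · b = 11, so both sides of the identity equal 225.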

Exercise 94 1. Find (−1, 5, 7)× (2,−3, 4) and (−2, 4,−3)× (6,−1, 2).

2. Find formulae for i× a, j × a and k × a.

3. Consider three points A, B and C in space. If they are not collinear they form a triangle. Show that its area is ½ |AB × AC|.

4. Find the area of the triangle with vertices A = (3, 4, 5), B = (4, 4, 6), C = (5, 5, 4).

5. Show that the non-zero vectors a and b are at right angles if, and only if, |a × b| = |a||b|.

6. Let l and m be unit vectors and put n = l × m. Using the geometric interpretation of the cross product show that l, m and n form a right-handed system (see 1.1.1). In fact,

l × m = n        m × n = l        n × l = m

Compare this with Example 4.3.2.

7. Let b ≠ 0 be a vector in space. Write x → x × b as a linear transformation, showing that the rank of the coefficient matrix is 2. Hence show that the general solution to x × b = 0 is x = tb where t is a parameter, and describe the general solution to x × b = a × b.

8. Complete the proofs of the properties 4.3.1 of the cross product.

Hint 95 For property 7, consider the case when a and b are both non-zero. If they are parallel, use properties of 2 × 2 determinants in 4.1.4 to show that the cross product is zero. For the converse use the geometric interpretation (4.17) to show that a × b ≠ 0 when the vectors are not parallel.

9. Let a, b and c be three vectors in ℜ³. Show that they are linearly independent if, and only if, the triple scalar product a · (b × c) is not zero.

10. * Consider the parallelogram ABCD in Figure 4.2. The line EF passing through Q is parallel to AB and the line GH passing through Q is parallel to AD. Using vector methods show that the parallelograms EQHD and GBFQ have equal areas if, and only if, Q lies on the diagonal AC. Conclude that this is the case if, and only if, the parallelograms AGQE and QFCH are similar. (A theorem going back to Euclid).

Figure 4.2 (the parallelogram ABCD with the lines EF and GH through the point Q)

Hint 96 Consider the cross product AQ×QC

11. * A parallelepiped is a ‘squashed box’: it has six faces with opposite faces congruent parallelograms. Let u = AD, v = AB, and w = AE be three of its linearly independent edges. Show that its volume is |(u × v) · w|.

Figure 4.2 Parallelepiped (edges u = AD, v = AB and w = AE; height h = w · (u × v) / |u × v|)

Hint 97 Let its base be ABCD. This has area α = |AD × AB|. The height h of the parallelepiped is the absolute value of the component of AE along AD × AB. The volume is then hα.

12. * (The triple vector product). Let a, b and c be three given vectors. Show that

a× (b× c) = (a · c) b− (a · b) c


The result is not altogether surprising since a× (b× c) is a linear combination of b and c.

Hint 98 Show the result is true if a is i, j or k. Now use the result of No.2.

13. * (Moment or torque of a force about a point Q).
Let F be a force vector that is applied to a point P in space. The moment (or torque) of the force about the point Q is defined as

M = QP × F (4.18)

(a) The line of action of the force is the straight line parallel to F that passes through P. Show that the torques about Q of F applied at any two points on its line of action are equal.

For this reason, once the line of action is specified, we can simply refer to the moment of F about Q without specifying the particular point at which the force is applied. However, to get a better picture, one may wish to distinguish such a point as in the following.

(b) (Connection with the elementary definition of moments learnt in physics).

Let H be the point on the line of action of F that is closest to Q, so that QH (if nonzero) is perpendicular to F. Show that

|M| = |QH| |F|

and that M is fully described as the usual clockwise turning effect of the force about Q when looking along the axis through Q with direction QP × F. (The right-hand rule again!)

14. * (Moment about a line) Consider an axis ξ passing through Q and having the direction given by the unit vector l. Let D and E be points on the line of action of F and the axis ξ respectively such that the distance |ED| is minimum. (Thus ED, if not zero, is perpendicular to both lines.) Let m be the unit vector with direction ED, so that ED = µm, where µ = |ED|. Put n = l × m. In the sense of the right-hand rule, the turning effect of F about the axis ξ is the component F · n times the distance µ = |ED| (draw a picture to see this).
Show that this turning effect is the triple scalar product M · l = (QP × F) · l.
Show further that if P′ and Q′ are any points on the line of action of F and the axis ξ respectively, then

(Q′P′ × F) · l = (QP × F) · l

This shows that the turning effect of a force about an axis is purely a property of the force with its line of action and the axis and does not depend on selected points on these lines.

Hint 99 Express QD as a linear combination λl + µm and F as a linear combinationαl + γn. Now multiply out the expression M = (λl + µm)× (αl + γn) and find M · l.

This leads to the following

Definition: Let a force F together with any point P on its line of action be given, as well as any point Q on a given straight line ℓ.
If ℓ has generic equation a + tl, the moment of the force about the line ℓ is the projection of the moment M = QP × F on l.
This vector does not depend on the representation of ℓ nor on the choices of P and Q.

15. (Couples) Let ℓ be the line of action of F and ℓ′ the line of action of −F, where ℓ and ℓ′ are parallel. For any point Q we can calculate the sum of the moments of F and −F about Q, known as a couple. Show that this couple is the same for any point Q.

Hint 100 Let P and P′ be points on ℓ and ℓ′ respectively. Show that the value of the couple is PP′ × (−F).

Couples play an important part in mechanics.
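The moment (4.18) and the couple of No. 15 are easy to experiment with numerically. The Python sketch below is an illustration under assumed data, not a proof: the points, the force and the helper names `moment` and `couple` are all chosen here for the purpose of the example. It checks, for one force, that sliding the point of application along the line of action leaves the moment unchanged, and that the couple does not depend on Q.

```python
def sub(p, q):
    """Vector from q to p, i.e. p - q."""
    return tuple(x - y for x, y in zip(p, q))


def add(p, q):
    """Componentwise sum of two vectors."""
    return tuple(x + y for x, y in zip(p, q))


def scale(t, v):
    """Scalar multiple t*v."""
    return tuple(t * x for x in v)


def cross(a, b):
    """Cross product by the cofactor formula (4.14)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        -(a[0] * b[2] - a[2] * b[0]),
        a[0] * b[1] - a[1] * b[0],
    )


def moment(Q, P, F):
    """Moment M = QP x F of the force F applied at P about the point Q (4.18)."""
    return cross(sub(P, Q), F)


def couple(Q, P, P_prime, F):
    """Sum of the moments about Q of F applied at P and -F applied at P_prime."""
    return add(moment(Q, P, F), moment(Q, P_prime, scale(-1, F)))


F = (1, 2, 0)
P = (1, 0, 0)
P_slid = add(P, scale(3, F))        # another point on the line of action of F
P_prime = (0, 3, 1)                 # a point on the line of action of -F

# Same moment about the origin from either point on the line of action.
print(moment((0, 0, 0), P, F) == moment((0, 0, 0), P_slid, F))

# The couple is the same about two different points Q.
print(couple((0, 0, 0), P, P_prime, F) == couple((5, -2, 7), P, P_prime, F))
```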



4.4.1 Determinants

Determinants are inspired by the search for a formula to solve n equations for n unknowns. (See, for example, subsection 4.1.6).

In Exercise 3a of Chapter 2 we introduced the idea of the determinant: |A| = a₁₁a₂₂ − a₂₁a₁₂.
In subsection 4.1.4 of 4.1.1 we develop some basic properties of 2 × 2 determinants.

For a 3 × 3 matrix A, its determinant is given by equation (4.7):

\[
|A| = \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{1\sigma_1} a_{2\sigma_2} a_{3\sigma_3}
\]

Here the sum extends over all 3! = 6 permutations σ = (σ₁, σ₂, σ₃) of 1, 2, 3 and sgn(σ) is the sign of the permutation σ.
A similar definition holds for an n × n matrix A.
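The permutation formula can be evaluated directly by a short program. The following Python sketch assumes that the sign of a permutation is computed by counting inversions (one standard way of doing so; the notes may define it differently) and uses 0-based indices; the names `sign` and `det` and the sample matrix are illustrative.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1: +1 if the number of inversions is even, else -1."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant of a square matrix (list of rows) by the permutation formula."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for row in range(n):
            term *= A[row][sigma[row]]   # one entry from each row and each column
        total += term
    return total


print(det([[1, 2], [3, 4]]))                      # -2 = 1*4 - 3*2
print(det([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))    # 39
```

The sum has n! terms, so this is a way of understanding the definition rather than a practical method for large n.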

4.4.4 Cofactors and properties of 3× 3 determinants

(See subsections 4.1.10, 4.1.13, 4.1.14)
The cofactor of $a_{ij}$ is $(-1)^{i+j}$ times the determinant formed by deleting row i and column j.
A determinant can be expanded by any row:

\[
|A| = a_{i1} A_{i|1} + a_{i2} A_{i|2} + a_{i3} A_{i|3} \tag{4.19}
\]

(See equation (4.8).)
Multiplying a row of A by a number k multiplies the determinant by k.
Interchanging two rows (columns) changes the sign of the determinant.
Adding a row (column) of A to another row (column) does not change the determinant.
To each property of rows there is a corresponding property of columns.

4.4.5 The adjoint

(See subsection 4.1.16)
The adjoint of A is given by

\[
[\operatorname{adj}(A)]_{ij} = A_{j|i}
\]

This means that we can first find the matrix B with $[B]_{ij} = A_{i|j}$ and then adj(A) = Bᵀ.
The fundamental property is

\[
A \operatorname{adj}(A) = |A|\, I = \operatorname{adj}(A)\, A
\]

Consequently, A has an inverse if, and only if, |A| ≠ 0, in which case

\[
A^{-1} = \frac{1}{|A|} \operatorname{adj}(A)
\]
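As a sketch of how the adjoint gives the inverse, the Python code below builds adj(A) from cofactors and checks the fundamental property on one sample matrix. The function names and the matrix are assumptions made for illustration; exact fractions are used so that the product of A with its inverse can be compared with I exactly. The code assumes A is square of size at least 2.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor(A, i, j):
    """Cofactor: (-1)^(i+j) times the determinant with row i and column j deleted."""
    return (-1) ** (i + j) * det(minor(A, i, j))


def adjoint(A):
    """Adjoint: entry (i, j) is the cofactor of position (j, i), i.e. the transposed cofactor matrix."""
    n = len(A)
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse(A):
    """Inverse as adj(A) divided by |A|; raises if |A| = 0."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible: determinant is zero")
    return [[Fraction(x, d) for x in row] for row in adjoint(A)]


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det(A))                   # 39
print(matmul(A, adjoint(A)))    # 39 times the identity matrix
```

For hand calculation with 3 × 3 matrices this mirrors the steps in the text: find the matrix of cofactors, transpose it, and divide by the determinant.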

4.4.6 General determinants

All properties of 2 × 2 and 3 × 3 determinants carry over to n × n determinants. (See section 4.2).


4.4.7 The Cross Product

(See section 4.3).

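The cross product is defined in equation (4.14):

\[
a \times b = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} i
 - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} j
 + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} k
\]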

4.4.8 The Triple Scalar Product of a, b and c

This is

a · (b × c)

Basic properties are given in equations (4.15) and (4.16).

4.4.9 Geometric interpretation

a × b = (|a||b| sin θ) c

Here 0 < θ < π is the angle between the non-zero and non-parallel vectors a and b, and c is the unit vector perpendicular to a and b with direction found by using the corkscrew rule: rotate from a to b through θ.
The cross product can also be seen geometrically as an area (Exercise 94, No. 3); the triple scalar product as a volume (Exercise 94, No. 11).
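The area and volume interpretations can be tried out on simple shapes whose measurements are known in advance. The Python sketch below does this for a right-angled triangle and a slanted box; the function names and the sample points are assumptions chosen for illustration.

```python
import math


def sub(p, q):
    """Vector from q to p, i.e. p - q."""
    return tuple(x - y for x, y in zip(p, q))


def dot(a, b):
    """Dot product of two 3-entry vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product by the cofactor formula (4.14)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        -(a[0] * b[2] - a[2] * b[0]),
        a[0] * b[1] - a[1] * b[0],
    )


def triangle_area(A, B, C):
    """Area of triangle ABC as half the length of AB x AC."""
    n = cross(sub(B, A), sub(C, A))
    return 0.5 * math.sqrt(dot(n, n))


def parallelepiped_volume(u, v, w):
    """Volume of the parallelepiped with edges u, v, w as |(u x v) . w|."""
    return abs(dot(cross(u, v), w))


# Right-angled triangle with legs 3 and 4: area should be (3 * 4) / 2.
print(triangle_area((0, 0, 0), (3, 0, 0), (0, 4, 0)))

# Base of area 1 * 2 in the xy-plane and height 3: volume should be 6.
print(parallelepiped_volume((1, 0, 0), (0, 2, 0), (1, 1, 3)))
```

Note that the third edge (1, 1, 3) is slanted; only its component perpendicular to the base, here 3, contributes to the volume.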
