
Introductory Linear Algebra

Lecture Notes

Sudipta Mallik

Updated on March 25, 2020


Contents

1 Introduction
1.1 Matrix Operations

2 Solving a Linear System
2.1 Systems of Linear Equations
2.2 Row Operations
2.3 Echelon Forms
2.4 Geometry of Solution Sets

3 Fundamental Linear Algebraic Concepts on Rn
3.1 Linear Span and Subspaces
3.2 Linear Independence
3.3 Basis and Dimensions
3.4 Linear Transformations

4 Inverse and Determinant of a Matrix
4.1 Inverse of a Matrix
4.2 Invertible Matrix Theorem
4.3 Determinant of a Matrix
4.4 Properties of Determinants

5 Eigenvalues and Eigenvectors
5.1 Basics of Eigenvalues and Eigenvectors
5.2 Similar and Diagonalizable Matrices
5.3 Similarity of Matrix Transformations
5.4 Application to Differential Equations

6 Inner-product and Orthogonality
6.1 Orthogonal Vectors in Rn
6.2 Orthogonal Bases and Matrices
6.3 Orthogonal Projections
6.4 Gram-Schmidt Process

7 Vector Spaces and Inner Product Spaces
7.1 Basics of Vector Spaces
7.2 Linear Span and Subspaces
7.3 Linear Independence
7.4 Basis and Dimensions
7.5 Linear Transformations
7.6 Inner Product Spaces


1 Introduction

1.1 Matrix Operations

Matrix: An m × n matrix A is an m-by-n array of scalars from a field (for example, real numbers) of the form

\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.
\]

The order (or size) of A is m × n (read as m by n) if A has m rows and n columns. The (i, j)-entry of A = [a_{ij}] is a_{ij}.

For example,
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 0 & -1 \end{bmatrix}
\]
is a 2 × 3 real matrix. The (2, 3)-entry of A is −1.

Equality: Two matrices A and B are equal, i.e., A = B, if A and B have the same order and the corresponding entries of A and B are the same.

Useful Matrices:

• A zero matrix, denoted by O or O_{m,n}, is an m × n matrix all of whose entries are zero.

• A square matrix is a matrix whose number of rows equals its number of columns.

• A diagonal matrix is a square n× n matrix whose nondiagonal entries are zero.

• The identity matrix of order n, denoted by In, is the n × n diagonal matrix whose diagonal entries are 1. For example,
\[
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
is the 3 × 3 identity matrix.

• An n × 1 matrix is called a column matrix or an n-dimensional (column) vector, denoted by lowercase letters such as x, x, or −→x. For example,
\[
\vec{x} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}
\]
is a 3-dimensional vector which represents the position vector of the point (0, 1, 2) in the 3-space R3 (i.e., a directed line segment from the origin (0, 0, 0) to the point (0, 1, 2)).

Matrix Operations:

• Transpose: The transpose of an m × n matrix A, denoted by A^T, is the n × m matrix whose columns are the corresponding rows of A, i.e., (A^T)_{ij} = A_{ji}.


[Figure: the position vector [2, 1]^T of the point (2, 1) in the 2-space R2, drawn in the x1x2-plane.]

Example. If
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 0 & -1 \end{bmatrix},
\quad\text{then}\quad
A^T = \begin{bmatrix} 1 & -3 \\ 2 & 0 \\ 0 & -1 \end{bmatrix}.
\]

Properties: Let A and B be two matrices with appropriate orders. Then

1. (AT )T = A

2. (A+B)T = AT +BT

3. (cA)T = cAT for any scalar c

4. (AB)T = BTAT
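These properties are easy to spot-check numerically. A minimal sketch with NumPy (NumPy is not part of the notes and is assumed available; the matrices are the running examples of this section, plus a 3 × 2 matrix C so that the product AC is defined):

```python
import numpy as np

A = np.array([[1, 2, 0], [-3, 0, -1]])   # 2 x 3
B = np.array([[0, -2, 0], [3, 0, 2]])    # 2 x 3
C = np.array([[2, -2], [0, 0], [1, 1]])  # 3 x 2, so AC is defined

assert np.array_equal(A.T.T, A)              # (A^T)^T = A
assert np.array_equal((A + B).T, A.T + B.T)  # (A+B)^T = A^T + B^T
assert np.array_equal((3 * A).T, 3 * A.T)    # (cA)^T = c A^T
assert np.array_equal((A @ C).T, C.T @ A.T)  # (AC)^T = C^T A^T
```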

• Scalar Multiplication: Let A be a matrix and c be a scalar. The scalar multiple,denoted by cA, is the matrix whose entries are c times the corresponding entries of A.

Example. If
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 0 & -1 \end{bmatrix},
\quad\text{then}\quad
-2A = \begin{bmatrix} -2 & -4 & 0 \\ 6 & 0 & 2 \end{bmatrix}.
\]

Properties: Let A and B be two matrices of the same order and c and d be scalars. Then

1. c(A+B) = cA+ cB

2. (c+ d)A = cA+ dA

3. c(dA) = (cd)A

• Sum: If A and B are m× n matrices, then the sum A+B is the m× n matrix whoseentries are the sum of the corresponding entries of A and B, i.e., (A+B)ij = Aij +Bij.

Example. If
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 0 & -1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 0 & -2 & 0 \\ 3 & 0 & 2 \end{bmatrix},
\quad\text{then}\quad
A + B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Exercise. Find 2A − B.

Properties: Let A,B, and C be three matrices of the same order. Then


1. A+B = B + A (commutative)

2. (A+B) + C = A+ (B + C) (associative)

3. A+O = A (additive identity O)

• Multiplication:
Matrix-vector multiplication: If A is an m × n matrix and −→x is an n-dimensional vector, then their product A−→x is an m-dimensional vector whose (i, 1)-entry is a_{i1}x_1 + a_{i2}x_2 + · · · + a_{in}x_n, the dot product of row i of A and −→x. Note that

\[
A\vec{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}
= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}.
\]

Example. If
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 0 & -1 \end{bmatrix}
\quad\text{and}\quad
\vec{x} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},
\quad\text{then}\quad
A\vec{x} = \begin{bmatrix} -1 \\ -3 \end{bmatrix},
\]
which is a linear combination of the first and second columns of A with weights 1 and −1 respectively.
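A minimal NumPy sketch of the same computation, showing that A−→x is exactly the weighted sum of the columns of A (NumPy assumed available):

```python
import numpy as np

A = np.array([[1, 2, 0], [-3, 0, -1]])
x = np.array([1, -1, 0])

# A @ x equals the linear combination x1*col1 + x2*col2 + x3*col3
combo = 1 * A[:, 0] + (-1) * A[:, 1] + 0 * A[:, 2]
assert np.array_equal(A @ x, combo)   # both give [-1, -3]
```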

Matrix-matrix multiplication: If A is an m × n matrix and B is an n × p matrix, then their product AB is an m × p matrix whose (i, j)-entry is the dot product of row i of A and column j of B:

\[
(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.
\]

Example. For
\[
A = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 2 & -2 \\ 0 & 0 \\ 1 & 1 \end{bmatrix},
\quad\text{we have}\quad
AB = \begin{bmatrix} 4 & 0 \\ 2 & 2 \end{bmatrix}.
\]

Properties: Let A,B, and C be three matrices of appropriate orders. Then

1. A(BC) = (AB)C (associative)

2. A(B + C) = AB + AC (left-distributive)

3. (B + C)A = BA+ CA (right-distributive)

4. k(AB) = (kA)B = A(kB) for any scalar k

5. ImA = A = AIn for any m× n matrix A (multiplicative identity I)

Remark.

(1) The column i of AB is A(column i of B).


Example. For
\[
A = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 2 & -2 \\ 0 & 0 \\ 1 & 1 \end{bmatrix},
\quad\text{we have}\quad
AB = \begin{bmatrix} 4 & 0 \\ 2 & 2 \end{bmatrix}
= \begin{bmatrix} A\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} & A\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \end{bmatrix}.
\]

(2) AB ≠ BA in general.

Example.
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 0 & 3 \end{bmatrix}
\neq \begin{bmatrix} 3 & 4 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.
\]

(3) AB = AC does not imply B = C in general.

Example.
\[
\begin{bmatrix} -2 & 1 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} -2 & -2 \\ -2 & -2 \end{bmatrix}
= \begin{bmatrix} -2 & 1 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ -2 & -2 \end{bmatrix}.
\]

(4) AB = O does not imply A = O or B = O in general.

Example.
\[
\begin{bmatrix} -2 & 1 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]
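These three cautionary remarks can be checked directly; a minimal NumPy sketch using the matrices from the examples above (NumPy assumed available):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
E = np.array([[0, 1], [0, 0]])
assert not np.array_equal(A @ E, E @ A)         # AB != BA in general

M = np.array([[-2, 1], [-2, 1]])
B = np.array([[1, 1], [0, 0]])
C = np.array([[0, 0], [-2, -2]])
assert np.array_equal(M @ B, M @ C)             # AB = AC yet B != C

N = np.array([[1, 1], [2, 2]])
assert np.array_equal(M @ N, np.zeros((2, 2)))  # AB = O with A, B nonzero
```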

• Powers of a matrix: If A is an n × n matrix and k is a positive integer, then the k-th power of A, denoted by A^k, is the product of k copies of A. We use the convention A^0 = In.

Example.
\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\implies A^2 = AA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},\quad
A^{100} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]

Symmetric and Skew-symmetric Matrices:
A square matrix A is symmetric if A^T = A, and A is skew-symmetric if A^T = −A. A square matrix A can be written uniquely as a sum of a symmetric and a skew-symmetric matrix:

\[
A = \frac{1}{2}\left(A + A^T\right) + \frac{1}{2}\left(A - A^T\right).
\]

Example.
\[
\begin{bmatrix} 1 & 4 \\ 2 & 5 \end{bmatrix}
= \frac{1}{2}\left( \begin{bmatrix} 1 & 4 \\ 2 & 5 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} \right)
+ \frac{1}{2}\left( \begin{bmatrix} 1 & 4 \\ 2 & 5 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} \right)
= \begin{bmatrix} 1 & 3 \\ 3 & 5 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.
\]
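The decomposition is a one-line computation; a minimal NumPy sketch (NumPy assumed available):

```python
import numpy as np

A = np.array([[1., 4.], [2., 5.]])
S = (A + A.T) / 2          # symmetric part
K = (A - A.T) / 2          # skew-symmetric part

assert np.array_equal(S, S.T)    # S is symmetric
assert np.array_equal(K, -K.T)   # K is skew-symmetric
assert np.array_equal(S + K, A)  # they sum back to A
```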


2 Solving a Linear System

2.1 Systems of Linear Equations

A system of linear equations with n variables x1, . . . , xn and m equations can be written as follows:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\ \,\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\tag{1}
\]

A solution is an n-tuple (s1, s2, . . . , sn) that satisfies each equation when we substitute x1 = s1, x2 = s2, . . . , xn = sn. The solution set is the set of all solutions.

Example.
\[
\begin{aligned}
x_1 \phantom{{}+x_2} + x_3 &= 3\\
x_2 - 2x_3 &= -1
\end{aligned}
\]
The solution set (on R) is {(−s + 3, 2s − 1, s) | s ∈ R}. There are infinitely many solutions because of the free variable x3.

Possibilities of solutions of a linear system:

• System has no solution (Inconsistent)

• System has a solution (Consistent)

(a) Unique solution

(b) Infinitely many solutions

[Figures: three pictures in the x1x2-plane. The parallel lines 2x1 − x2 = 0 and 2x1 − x2 = 4 give no solution; the intersecting lines 2x1 − x2 = 0 and x1 − x2 = −1 give a unique solution; the coincident lines 2x1 − x2 = 0 and 4x1 − 2x2 = 0 give infinitely many solutions.]

Definition. The system (1) is called an underdetermined system if m < n, i.e., fewer equations than variables. The system (1) is called an overdetermined system if m > n, i.e., more equations than variables.


The system (1) of linear equations can be written as a matrix equation and as a vector equation:

The matrix equation: A−→x = −→b, where

\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},\quad
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},\quad\text{and}\quad
\vec{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]

A is the coefficient matrix. The augmented matrix is

\[
[A\ \vec{b}\,] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}.
\]

The vector equation: x1−→a1 + x2−→a2 + · · · + xn−→an = −→b, where A = [−→a1 −→a2 · · · −→an].

Example.
\[
\begin{aligned}
2x_2 - 8x_3 &= 8\\
x_1 - 2x_2 + x_3 &= 0\\
-4x_1 + 5x_2 + 9x_3 &= -9
\end{aligned}
\]

The matrix equation is A−→x = −→b where

\[
A = \begin{bmatrix} 0 & 2 & -8 \\ 1 & -2 & 1 \\ -4 & 5 & 9 \end{bmatrix},\quad
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad\text{and}\quad
\vec{b} = \begin{bmatrix} 8 \\ 0 \\ -9 \end{bmatrix}.
\]

The augmented matrix is

\[
[A\ \vec{b}\,] = \begin{bmatrix} 0 & 2 & -8 & 8 \\ 1 & -2 & 1 & 0 \\ -4 & 5 & 9 & -9 \end{bmatrix}.
\]

The vector equation is

\[
x_1 \begin{bmatrix} 0 \\ 1 \\ -4 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ -2 \\ 5 \end{bmatrix} + x_3 \begin{bmatrix} -8 \\ 1 \\ 9 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \\ -9 \end{bmatrix}.
\]

You may verify that one solution is (x1, x2, x3) = (29, 16, 3). Is it the only solution?
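This square system can be solved numerically in one call; a minimal NumPy sketch (NumPy assumed available; np.linalg.solve requires a square, invertible coefficient matrix):

```python
import numpy as np

A = np.array([[0, 2, -8],
              [1, -2, 1],
              [-4, 5, 9]], dtype=float)
b = np.array([8, 0, -9], dtype=float)

x = np.linalg.solve(A, b)
print(x)   # [29. 16.  3.]
```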


2.2 Row Operations

There are three elementary row operations we perform on a matrix:

1. Interchanging two rows (Ri ↔ Rj)

2. Multiplying a row by a nonzero scalar (cRi, c ≠ 0)

3. Adding a scalar multiple of row i to row j (cRi +Rj)

The steps of solving a linear system A−→x = −→b are equivalent to elementary row operations on the augmented matrix [A −→b ], as illustrated by the following example.

Example.
\[
\begin{aligned}
2x_2 - 8x_3 &= 8 & (2.1)\\
x_1 - 2x_2 + x_3 &= 0 & (2.2)\\
-4x_1 + 5x_2 + 9x_3 &= -9 & (2.3)
\end{aligned}
\]

We do the following steps to solve the above system:

1. Interchange (2.1) and (2.2):
\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 & (3.1)\\
2x_2 - 8x_3 &= 8 & (3.2)\\
-4x_1 + 5x_2 + 9x_3 &= -9 & (3.3)
\end{aligned}
\]
The corresponding row operation is
\[
[A\ \vec{b}\,] = \begin{bmatrix} 0 & 2 & -8 & 8 \\ 1 & -2 & 1 & 0 \\ -4 & 5 & 9 & -9 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ -4 & 5 & 9 & -9 \end{bmatrix}.
\]

2. Replace (3.3) by 4(3.1) + (3.3):
\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 & (4.1)\\
2x_2 - 8x_3 &= 8 & (4.2)\\
-3x_2 + 13x_3 &= -9 & (4.3)
\end{aligned}
\]
The corresponding row operation is
\[
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ -4 & 5 & 9 & -9 \end{bmatrix}
\xrightarrow{4R_1 + R_3}
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 0 & -3 & 13 & -9 \end{bmatrix}.
\]


3. Scale (4.2) by 1/2:
\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 & (5.1)\\
x_2 - 4x_3 &= 4 & (5.2)\\
-3x_2 + 13x_3 &= -9 & (5.3)
\end{aligned}
\]
The corresponding row operation is
\[
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 0 & -3 & 13 & -9 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_2}
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & -3 & 13 & -9 \end{bmatrix}.
\]

4. Replace (5.3) by 3(5.2) + (5.3):
\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 & (6.1)\\
x_2 - 4x_3 &= 4 & (6.2)\\
x_3 &= 3 & (6.3)
\end{aligned}
\]
The corresponding row operation is
\[
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & -3 & 13 & -9 \end{bmatrix}
\xrightarrow{3R_2 + R_3}
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 0 & 1 & 3 \end{bmatrix}.
\]

5. Back substitutions:
\[
\begin{aligned}
(6.3) &\implies x_3 = 3\\
(6.2) &\implies x_2 = 4 + 4x_3 = 4 + 4 \cdot 3 = 16\\
(6.1) &\implies x_1 = 0 + 2x_2 - x_3 = 2 \cdot 16 - 3 = 29
\end{aligned}
\]

So the solution set is {(29, 16, 3)}.

Remark.

1. Two matrices are row equivalent if we can transform one matrix to the other by elementary row operations. If two linear systems have row equivalent augmented matrices, then they have the same solution set.

2. To solve A−→x = −→b, using row operations we transform the augmented matrix [A −→b ] into an "upper-triangular" form called echelon form and then use back substitutions.


2.3 Echelon Forms

The leading entry of a row of a matrix is the left-most nonzero entry of the row.

Definition. An m × n matrix A is in echelon form (or REF = row echelon form) if

1. all zero rows are at the bottom,

2. in the column of each leading entry, all entries below the leading entry are zero, and

3. the leading entry of each row is to the right of all leading entries in the rows above it.

A is in reduced echelon form (or RREF = reduced row echelon form) if it satisfies two additional conditions:

4. the leading entry of each row is 1, and

5. each leading 1 is the only nonzero entry in its column.

Example.

1. The following matrices are in REF:
\[
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 5 \end{bmatrix}
\]

2. The following matrices are in RREF:
\[
\begin{bmatrix} 1 & -2 & 0 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
\begin{bmatrix} 1 & -2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Definition. A pivot position in a matrix A is a position of a leading 1 in the RREF of A, and the corresponding column is a pivot column. A pivot is a nonzero number in a pivot position of A that is used to create zeros below it in Gaussian elimination.

Example. Pivot positions of the last matrix are (1, 1), (2, 3), and (3, 4).

The Gaussian elimination or row reduction algorithm to get the REF of a matrix is explained by the following example:

Example.
\[
A = \begin{bmatrix} 0 & 3 & -6 & 5 & -5 \\ 3 & -7 & 8 & -7 & 9 \\ 3 & -9 & 12 & -9 & 15 \end{bmatrix}
\]


1. Start with the left-most nonzero column (first pivot column) and make its top entry nonzero by interchanging rows if needed. This top nonzero entry is the pivot of the pivot column.
\[
\begin{bmatrix} 0 & 3 & -6 & 5 & -5 \\ 3 & -7 & 8 & -7 & 9 \\ 3 & -9 & 12 & -9 & 15 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 3 & -7 & 8 & -7 & 9 \\ 0 & 3 & -6 & 5 & -5 \end{bmatrix}
\]

2. Create zeros below the pivot by row replacements.
\[
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 3 & -7 & 8 & -7 & 9 \\ 0 & 3 & -6 & 5 & -5 \end{bmatrix}
\xrightarrow{-R_1 + R_2}
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 0 & 2 & -4 & 2 & -6 \\ 0 & 3 & -6 & 5 & -5 \end{bmatrix}
\]

3. Ignore the column and row of the current pivot and repeat the preceding steps for the remaining submatrix.
\[
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 0 & 2 & -4 & 2 & -6 \\ 0 & 3 & -6 & 5 & -5 \end{bmatrix}
\xrightarrow{-\frac{3}{2}R_2 + R_3}
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 0 & 2 & -4 & 2 & -6 \\ 0 & 0 & 0 & 2 & 4 \end{bmatrix}
\quad (\text{REF})
\]

To get the RREF, start with the right-most pivot, make it 1 by scaling, and then create zeros above it by row replacements. Repeat for the rest of the pivots.
\[
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 0 & 2 & -4 & 2 & -6 \\ 0 & 0 & 0 & 2 & 4 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_3}
\begin{bmatrix} 3 & -9 & 12 & -9 & 15 \\ 0 & 2 & -4 & 2 & -6 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\xrightarrow[-2R_3 + R_2]{9R_3 + R_1}
\begin{bmatrix} 3 & -9 & 12 & 0 & 33 \\ 0 & 2 & -4 & 0 & -10 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]
\[
\xrightarrow{\frac{1}{2}R_2}
\begin{bmatrix} 3 & -9 & 12 & 0 & 33 \\ 0 & 1 & -2 & 0 & -5 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\xrightarrow{9R_2 + R_1}
\begin{bmatrix} 3 & 0 & -6 & 0 & -12 \\ 0 & 1 & -2 & 0 & -5 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\xrightarrow{\frac{1}{3}R_1}
\begin{bmatrix} 1 & 0 & -2 & 0 & -4 \\ 0 & 1 & -2 & 0 & -5 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\quad (\text{RREF})
\]

Remark. The above algorithm to get the RREF is called Gauss-Jordan elimination. The RREF of A is unique, as it does not depend on the elementary row operations applied to A.
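The same reduction can be reproduced symbolically; a minimal sketch with SymPy (assumed available), whose rref method returns the RREF together with the indices of the pivot columns:

```python
import sympy as sp

A = sp.Matrix([[0, 3, -6, 5, -5],
               [3, -7, 8, -7, 9],
               [3, -9, 12, -9, 15]])

R, pivot_cols = A.rref()
print(R)           # Matrix([[1, 0, -2, 0, -4], [0, 1, -2, 0, -5], [0, 0, 0, 1, 2]])
print(pivot_cols)  # (0, 1, 3)
```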

Steps to solve a linear system A−→x = −→b (Gaussian elimination):

1. Find the RREF of the augmented matrix [A −→b ].

2. Write the system of linear equations corresponding to the RREF.

3. If the new system is inconsistent, there is no solution of the original system. Otherwise write the basic variables (variables corresponding to pivot columns) in terms of constants and free variables (non-basic variables, which correspond to non-pivot columns).


Example.
\[
\begin{aligned}
x_1 - 3x_2 \phantom{{}+ x_3} + 2x_4 &= 1\\
2x_1 - 6x_2 + x_3 + 10x_4 &= 0\\
-x_1 + 3x_2 + x_3 + 4x_4 &= -3
\end{aligned}
\]

We find the RREF of the augmented matrix:
\[
\begin{bmatrix} 1 & -3 & 0 & 2 & 1 \\ 2 & -6 & 1 & 10 & 0 \\ -1 & 3 & 1 & 4 & -3 \end{bmatrix}
\xrightarrow[R_1 + R_3]{-2R_1 + R_2}
\begin{bmatrix} 1 & -3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 6 & -2 \\ 0 & 0 & 1 & 6 & -2 \end{bmatrix}
\xrightarrow{-R_2 + R_3}
\begin{bmatrix} 1 & -3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 6 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad (\text{RREF})
\]

The corresponding system is
\[
\begin{aligned}
x_1 - 3x_2 + 2x_4 &= 1\\
x_3 + 6x_4 &= -2\\
0 &= 0
\end{aligned}
\]
where x1 and x3 are basic variables (for pivot columns) and x2 and x4 are free variables (for non-pivot columns).
\[
\begin{aligned}
x_1 &= 1 + 3x_2 - 2x_4\\
x_2 &= \text{free}\\
x_3 &= -2 - 6x_4\\
x_4 &= \text{free}
\end{aligned}
\]

The solution set is {(1 + 3s − 2t, s, −2 − 6t, t) | s, t ∈ R}. If we solve the corresponding matrix equation A−→x = −→b, the solution set is

\[
\left\{ \begin{bmatrix} 1 + 3s - 2t \\ s \\ -2 - 6t \\ t \end{bmatrix} \,\middle|\, s, t \in \mathbb{R} \right\}
= \left\{ \begin{bmatrix} 1 \\ 0 \\ -2 \\ 0 \end{bmatrix}
+ s \begin{bmatrix} 3 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -2 \\ 0 \\ -6 \\ 1 \end{bmatrix} \,\middle|\, s, t \in \mathbb{R} \right\}.
\]
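The same parametric solution can be produced symbolically; a minimal SymPy sketch (assumed available), where linsolve expresses the basic variables in terms of the free ones:

```python
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')
system = [sp.Eq(x1 - 3*x2 + 2*x4, 1),
          sp.Eq(2*x1 - 6*x2 + x3 + 10*x4, 0),
          sp.Eq(-x1 + 3*x2 + x3 + 4*x4, -3)]

sol = sp.linsolve(system, [x1, x2, x3, x4])
print(sol)   # {(3*x2 - 2*x4 + 1, x2, -6*x4 - 2, x4)}, with x2 and x4 free
```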

Possibilities of solutions of A−→x = −→b from the RREF:

• The system has no solution (inconsistent) iff the RREF of [A −→b ] has a row of the form [0, . . . , 0, c], c ≠ 0.

• The system has a solution (consistent) iff the RREF of [A −→b ] has no row of the form [0, . . . , 0, c], c ≠ 0.

(a) Infinitely many solutions if the RREF of [A −→b ] has a non-pivot column that is not the last column (there is a free variable).

(b) Unique solution if all but the last column of the RREF of [A −→b ] are pivot columns (there is no free variable).


2.4 Geometry of Solution Sets

Homogeneous linear system: A system of linear equations is homogeneous if its matrix equation is A−→x = −→0. Note that −→0 is always a solution, called the trivial solution. Any nonzero solution is called a nontrivial solution.

Example.

1.
\[
\begin{aligned}
x_1 + x_2 - x_3 &= 0\\
3x_2 - 2x_3 &= 0
\end{aligned}
\]
The corresponding matrix equation A−→x = −→0 has the solution set
\[
\left\{ s \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \,\middle|\, s \in \mathbb{R} \right\},
\]
which is also denoted by Span{[1, 2, 3]^T}. This solution set corresponds to the points on the line in the 3-space R3 passing through the point (1, 2, 3) and the origin (0, 0, 0). Recall that the vector [1, 2, 3]^T is the position vector of the point (1, 2, 3), which is a directed line segment from the origin (0, 0, 0) to the point (1, 2, 3).

2.
\[
x_1 - x_2 - 2x_3 = 0
\]
The corresponding matrix equation A−→x = −→0 has the solution set
\[
\left\{ s \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \,\middle|\, s, t \in \mathbb{R} \right\}
= \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \right\}.
\]
This solution set corresponds to the points on the plane in the 3-space R3 passing through the points (1, 1, 0), (2, 0, 1), and the origin (0, 0, 0).

Remark. If A−→x = −→0 has k free variables, then its solution set is the span of k vectors: the solution set of A−→x = −→0 is Span{−→v1, . . . , −→vk} for some vectors −→v1, . . . , −→vk.

The solution set of A−→x = −→b is {−→p + −→v | A−→v = −→0 }, where A−→p = −→b. So a nonhomogeneous solution is a sum of a particular solution and a homogeneous solution. To justify it, let −→y be a solution of A−→x = −→b, i.e., A−→y = −→b. Then

\[
A(\vec{y} - \vec{p}) = \vec{b} - \vec{b} = \vec{0}.
\]


[Figure: the solution set of A−→x = −→b is a translation of the solution set of A−→x = −→0 along the vector −→p.]

Then −→y − −→p = −→v where A−→v = −→0. Thus −→y = −→p + −→v.

Geometrically we get the solution set of A−→x = −→b by shifting the solution set of A−→x = −→0 to the point whose position vector is −→p, along the vector −→p.

Example. The nonhomogeneous system x1 − x2 − 2x3 = −2 has a particular solution −→p = [1, 1, 1]^T. The corresponding homogeneous system x1 − x2 − 2x3 = 0 has the solution set

\[
\left\{ s \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \,\middle|\, s, t \in \mathbb{R} \right\}.
\]

Thus the solution set of the nonhomogeneous system x1 − x2 − 2x3 = −2 is

\[
\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + s \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \,\middle|\, s, t \in \mathbb{R} \right\}.
\]
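The "particular plus homogeneous" structure can be checked symbolically; a minimal SymPy sketch (assumed available), using the pseudoinverse only as one convenient way to produce some particular solution:

```python
import sympy as sp

A = sp.Matrix([[1, -1, -2]])
b = sp.Matrix([-2])

p = A.pinv() * b     # one particular solution of A x = b
N = A.nullspace()    # basis of the homogeneous solution set (two vectors here)

# any p + s*N[0] + t*N[1] solves A x = b
assert A * (p + 2*N[0] - 3*N[1]) == b
```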


3 Fundamental Linear Algebraic Concepts on Rn

3.1 Linear Span and Subspaces

Definition. A linear combination of vectors −→v1, −→v2, . . . , −→vk of Rn is a sum of their scalar multiples, i.e.,

\[
c_1\vec{v_1} + c_2\vec{v_2} + \cdots + c_k\vec{v_k}
\]

for some scalars c1, c2, . . . , ck. The set of all linear combinations of a nonempty set S of vectors of Rn is called the linear span or span of S, denoted by Span(S) or Span S, i.e.,

\[
\mathrm{Span}\{\vec{v_1}, \vec{v_2}, \ldots, \vec{v_k}\} = \{c_1\vec{v_1} + c_2\vec{v_2} + \cdots + c_k\vec{v_k} \mid c_1, c_2, \ldots, c_k \in \mathbb{R}\}.
\]

We define Span ∅ = {−→0 }. When Span{−→v1, . . . , −→vk} = Rn, we say {−→v1, . . . , −→vk} spans Rn.

Example. For S = {[1, 1, 0]^T, [1, 2, 0]^T},

\[
\mathrm{Span}(S) = \left\{ c_1 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \,\middle|\, c_1, c_2 \in \mathbb{R} \right\}.
\]

Note that [0, 0, 1]^T is not in Span(S) because there are no c1, c2 for which

\[
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = c_1 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}.
\]

Thus S does not span R3. But any vector of the form [a, b, 0]^T is in Span(S) because

\[
x_1 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} a \\ b \\ 0 \end{bmatrix}
\implies x_1 = 2a - b,\ x_2 = -a + b,
\]

i.e.,

\[
\begin{bmatrix} a \\ b \\ 0 \end{bmatrix} = (2a - b) \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + (-a + b) \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \in \mathrm{Span}(S).
\]

Thus S spans the set

\[
\mathrm{Span}(S) = \left\{ \begin{bmatrix} a \\ b \\ 0 \end{bmatrix} \,\middle|\, a, b \in \mathbb{R} \right\},
\]

which is the xy-plane in R3.
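Membership in a span reduces to a consistency (rank) test; a minimal NumPy sketch (NumPy assumed available; in_span is an illustrative helper, not part of the notes):

```python
import numpy as np

S = np.column_stack(([1, 1, 0], [1, 2, 0]))   # columns generate Span(S)

def in_span(v, S):
    # v is in the column span of S iff appending v does not raise the rank
    return np.linalg.matrix_rank(np.column_stack((S, v))) == np.linalg.matrix_rank(S)

print(in_span(np.array([0, 0, 1]), S))   # False
print(in_span(np.array([5, 7, 0]), S))   # True
```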

Definition. A subspace of Rn is a nonempty subset S of Rn that satisfies three properties:


(a) −→0 is in S.

(b) −→u + −→v is in S for all −→u, −→v in S.

(c) c−→u is in S for all −→u in S and all scalars c.

In short, a subspace of Rn is a nonempty subset S of Rn that is closed under linear combinations of vectors, i.e., c−→u + d−→v is in S for all −→u, −→v in S and all scalars c, d. When S is a subspace of Rn, we sometimes denote it by S ≤ Rn.

Example.

1. {−→0 }, Rn ≤ Rn, i.e., {−→0 } and Rn are subspaces of Rn.

2. Show that S = {[x, y]^T | x, y ∈ R, 2x − y = 0} is a subspace of R2.

Solution.

(a) [0, 0]^T ∈ S because 2 · 0 − 0 = 0.

(b) Let −→u, −→v ∈ S. Then
\[
\vec{u} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \quad\text{and}\quad \vec{v} = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix},
\]
for some x1, x2, y1, y2 ∈ R such that 2x1 − y1 = 0 and 2x2 − y2 = 0. Then
\[
\vec{u} + \vec{v} = \begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix} \in S
\]
because 2(x1 + x2) − (y1 + y2) = (2x1 − y1) + (2x2 − y2) = 0.

(c) For any scalar c,
\[
c\vec{u} = c\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} cx_1 \\ cy_1 \end{bmatrix} \in S
\]
because 2(cx1) − (cy1) = c(2x1 − y1) = 0.

Thus S (which is the line y = 2x) is a subspace of R2.

3. Let S = {[1, 1, 0]^T, [1, 2, 0]^T}. Then Span(S) is a subspace of R3.

First note that
\[
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \in \mathrm{Span}(S).
\]
Thus Span(S) ≠ ∅.


Let −→u, −→v ∈ Span(S) and c, d ∈ R. Then
\[
\vec{u} = c_1\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\vec{v} = d_1\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + d_2\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix},
\]
for some c1, c2, d1, d2 ∈ R. Then
\[
c\vec{u} + d\vec{v}
= c\left( c_1\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \right)
+ d\left( d_1\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + d_2\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \right)
= (cc_1 + dd_1)\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + (cc_2 + dd_2)\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \in \mathrm{Span}(S).
\]

Thus Span(S) (which is the xy-plane) is a subspace of R3.

Theorem 3.1. Let −→v1, −→v2, . . . , −→vk ∈ Rn. Then Span{−→v1, −→v2, . . . , −→vk} is a subspace of Rn.

Proof. Since −→v1 ∈ Span{−→v1, −→v2, . . . , −→vk}, Span{−→v1, −→v2, . . . , −→vk} ≠ ∅. Let −→u, −→v ∈ Span{−→v1, −→v2, . . . , −→vk} and c, d ∈ R. Then −→u = c1−→v1 + c2−→v2 + · · · + ck−→vk and −→v = d1−→v1 + d2−→v2 + · · · + dk−→vk for some c1, . . . , ck, d1, . . . , dk ∈ R. Then

\[
c\vec{u} + d\vec{v} = c(c_1\vec{v_1} + \cdots + c_k\vec{v_k}) + d(d_1\vec{v_1} + \cdots + d_k\vec{v_k})
= (cc_1 + dd_1)\vec{v_1} + \cdots + (cc_k + dd_k)\vec{v_k} \in \mathrm{Span}\{\vec{v_1}, \ldots, \vec{v_k}\}.
\]

For a given matrix we have two important subspaces: the column space and the null space.

Definition. The column space of an m × n matrix A = [−→a1 −→a2 · · · −→an], denoted by CS(A) or Col A, is the span of its column vectors:

\[
CS(A) = \mathrm{Span}\{\vec{a_1}, \vec{a_2}, \ldots, \vec{a_n}\}.
\]

Remark. Since each column is an m-dimensional vector, CS(A) is a subspace of Rm.

Example. For
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \end{bmatrix},\quad
CS(A) = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \end{bmatrix} \right\} \le \mathbb{R}^2.
\]

Example. Let

\[
A = \begin{bmatrix} 1 & -3 & -4 \\ -4 & 6 & -2 \\ -3 & 7 & 6 \end{bmatrix}
\quad\text{and}\quad
\vec{b} = \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}.
\]

Determine if −→b is in CS(A).

Note that −→b ∈ CS(A) if and only if −→b is a linear combination of the columns of A, if and only if A−→x = −→b has a solution.

\[
\begin{bmatrix} 1 & -3 & -4 & 3 \\ -4 & 6 & -2 & 3 \\ -3 & 7 & 6 & -4 \end{bmatrix}
\xrightarrow[3R_1 + R_3]{4R_1 + R_2}
\begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & -2 & -6 & 5 \end{bmatrix}
\xrightarrow{-\frac{1}{3}R_2 + R_3}
\begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\quad (\text{REF})
\]

Since the REF of [A −→b ] has no row of the form [0, 0, 0, c], c ≠ 0, A−→x = −→b is consistent, and consequently −→b is in CS(A).
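The same consistency test can be phrased with ranks; a minimal NumPy sketch (NumPy assumed available):

```python
import numpy as np

A = np.array([[1, -3, -4], [-4, 6, -2], [-3, 7, 6]])
b = np.array([3, 3, -4])

# b is in CS(A) iff rank([A | b]) == rank(A)
Ab = np.column_stack((A, b))
print(np.linalg.matrix_rank(Ab) == np.linalg.matrix_rank(A))   # True
```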


Theorem 3.2. An m × n matrix A has a pivot position in every row if and only if A−→x = −→b is consistent for any −→b ∈ Rm, if and only if CS(A) = Rm.

Example. Since
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \end{bmatrix}
\]
has a pivot position in each row, CS(A) = R2.

Definition. The null space of an m × n matrix A, denoted by NS(A) or Nul A, is the solution set of A−→x = −→0:

\[
NS(A) = \{\vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0}\}.
\]

Theorem 3.3. Let A be an m × n matrix. Then NS(A) is a subspace of Rn.

Proof. Since A−→0 = −→0, −→0 ∈ NS(A). Thus NS(A) ≠ ∅. Let −→u, −→v ∈ NS(A) and c, d ∈ R. Then A−→u = −→0 and A−→v = −→0. Then

\[
A(c\vec{u} + d\vec{v}) = c(A\vec{u}) + d(A\vec{v}) = c\vec{0} + d\vec{0} = \vec{0}.
\]

Thus c−→u + d−→v ∈ NS(A).

Example. Let
\[
A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 3 & -2 \end{bmatrix}.
\]
Find NS(A).

We find the solution set of A−→x = −→0.

\[
[A\ \vec{0}] = \begin{bmatrix} 1 & 1 & -1 & 0 \\ 0 & 3 & -2 & 0 \end{bmatrix}
\xrightarrow{\frac{1}{3}R_2}
\begin{bmatrix} 1 & 1 & -1 & 0 \\ 0 & 1 & -2/3 & 0 \end{bmatrix}
\xrightarrow{-R_2 + R_1}
\begin{bmatrix} 1 & 0 & -1/3 & 0 \\ 0 & 1 & -2/3 & 0 \end{bmatrix}
\quad (\text{RREF})
\]

The corresponding system is
\[
\begin{aligned}
x_1 - \tfrac{x_3}{3} &= 0\\
x_2 - \tfrac{2x_3}{3} &= 0
\end{aligned}
\]
where x1 and x2 are basic variables (for pivot columns) and x3 is a free variable (for the non-pivot column).
\[
\begin{aligned}
x_1 &= \tfrac{x_3}{3}\\
x_2 &= \tfrac{2x_3}{3}\\
x_3 &= \text{free}
\end{aligned}
\]

\[
NS(A) = \left\{ \begin{bmatrix} x_3/3 \\ 2x_3/3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{R} \right\}
= \left\{ \frac{x_3}{3}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{R} \right\}
= \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}
\]
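A minimal SymPy sketch of the same computation (SymPy assumed available); note it returns a scalar multiple of [1, 2, 3]^T, which spans the same line:

```python
import sympy as sp

A = sp.Matrix([[1, 1, -1], [0, 3, -2]])
print(A.nullspace())   # [Matrix([[1/3], [2/3], [1]])], i.e. Span{[1, 2, 3]^T}
```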

Remark. If an m × n matrix A has k non-pivot columns (i.e., k free variables for A−→x = −→0), then NS(A) is the span of k vectors in Rn. For a proof see Theorem 3.9.


3.2 Linear Independence

Definition. A set S = {−→v1 ,−→v2 , . . . ,−→vk} of vectors of Rn is linearly independent if the only

linear combination of vectors in S that produces−→0 is a trivial linear combination., i.e.,

c1−→v1 + c2

−→v2 + · · ·+ ck−→vk =

−→0 =⇒ c1 = c2 = · · · = ck = 0.

S = {−→v1 ,−→v2 , . . . ,−→vk} is linearly dependent if S is not linearly independent, i.e., there arescalars c1, c2, . . . , ck, not all zero, such that

c1−→v1 + c2

−→v2 + · · ·+ ck−→vk =

−→0 .

Remark.

1. {−→0 } is linearly dependent, as 2·−→0 = −→0.

2. {−→v } is linearly independent if and only if −→v ≠ −→0.

3. Let S = {−→v1, −→v2, . . . , −→vk} and A = [−→v1 −→v2 · · · −→vk]. Then S is linearly independent if and only if −→0 is the only solution of A−→x = −→0, if and only if NS(A) = {−→0 }.

Example.

1. Determine if the following vectors are linearly independent:
\[
\vec{v_1} = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\quad \vec{v_2} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]
We investigate whether c1−→v1 + c2−→v2 = −→0 =⇒ c1 = c2 = 0.
\[
[A\ \vec{0}] = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 3 & 0 \end{bmatrix}
\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} 1 & 2 & 0 \\ 0 & -1 & 0 \end{bmatrix}
\quad (\text{REF})
\]
Each column of A is a pivot column, giving no free variables. So there is a unique solution of A−→x = −→0, which is −→0. Thus −→v1 and −→v2 are linearly independent. Note that neither of −→v1 and −→v2 is a multiple of the other.

2. Determine if the columns of A are linearly independent for
\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}.
\]

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}
\xrightarrow[-R_1 + R_3]{-R_1 + R_2}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\quad (\text{REF})
\]

A has a non-pivot column, giving a free variable. So there are infinitely many solutions of A−→x = −→0. Thus the columns of A are linearly dependent. Verify that one solution is (x1, x2, x3, x4) = (1, 2, −3, 1).


So we get the following linear dependence relation among the columns of A:

\[
1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix} - 3\begin{bmatrix} 3 \\ 5 \\ 4 \end{bmatrix} + 1\begin{bmatrix} 4 \\ 8 \\ 7 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
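Linear independence of columns reduces to a rank test; a minimal NumPy sketch (NumPy assumed available):

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [1, 3, 5, 8],
              [1, 2, 4, 7]])

# columns are linearly independent iff rank(A) equals the number of columns
print(np.linalg.matrix_rank(A) == A.shape[1])   # False: the columns are dependent
```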

Remark. The columns of an m × n matrix are linearly dependent when m < n, because then A has a non-pivot column, giving a free variable for solutions of the system A−→x = −→0.

Theorem 3.4. A set S = {−→v1, −→v2, . . . , −→vk} of k ≥ 2 vectors in Rn is linearly dependent if and only if there exists a vector in S that is a linear combination of the other vectors in S.

Proof. Let S = {−→v1, −→v2, . . . , −→vk} be a set of k ≥ 2 vectors in Rn. First suppose S is linearly dependent. Then there are scalars c1, c2, . . . , ck, not all zero, such that

\[
c_1\vec{v_1} + c_2\vec{v_2} + \cdots + c_k\vec{v_k} = \vec{0}.
\]

Choose i ∈ {1, 2, . . . , k} such that ci ≠ 0. Then

\[
c_1\vec{v_1} + \cdots + c_k\vec{v_k} = \vec{0}
\implies -c_i\vec{v_i} = c_1\vec{v_1} + \cdots + c_{i-1}\vec{v}_{i-1} + c_{i+1}\vec{v}_{i+1} + \cdots + c_k\vec{v_k}
\]
\[
\implies \vec{v_i} = -\frac{c_1}{c_i}\vec{v_1} - \cdots - \frac{c_{i-1}}{c_i}\vec{v}_{i-1} - \frac{c_{i+1}}{c_i}\vec{v}_{i+1} - \cdots - \frac{c_k}{c_i}\vec{v_k}.
\]

Conversely suppose there is i ∈ {1, 2, . . . , k} such that

\[
\vec{v_i} = d_1\vec{v_1} + \cdots + d_{i-1}\vec{v}_{i-1} + d_{i+1}\vec{v}_{i+1} + \cdots + d_k\vec{v_k},
\]

for some scalars d1, . . . , di−1, di+1, . . . , dk. Then we have a nontrivial linear combination producing −→0:

\[
d_1\vec{v_1} + \cdots + d_{i-1}\vec{v}_{i-1} - \vec{v_i} + d_{i+1}\vec{v}_{i+1} + \cdots + d_k\vec{v_k} = \vec{0}.
\]

Thus S = {−→v1, −→v2, . . . , −→vk} is linearly dependent in Rn.

Example. For

\[
A = [\vec{a_1}\ \vec{a_2}\ \vec{a_3}\ \vec{a_4}] = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix},
\]

we have shown that the columns are linearly dependent and −→a1 + 2−→a2 − 3−→a3 + −→a4 = −→0. We can write the first column in terms of the other columns: −→a1 = −2−→a2 + 3−→a3 − −→a4. In fact we can write any column in terms of the others (which may not be the case for any given linearly dependent set of vectors).


3.3 Basis and Dimensions

Definition. A basis of a nontrivial subspace S of Rn is a subset B of S such that

(a) Span(B) = S and

(b) B is a linearly independent set.

We define the basis of the trivial subspace {−→0 } to be B = ∅. The number of vectors in a basis B is the dimension of S, denoted by dim(S) or dim S.

Example.

1. For the subspace S = {[x, y]^T | x, y ∈ R, 2x − y = 0} of R2,

\[
S = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}.
\]

Also {[1, 2]^T} is linearly independent. Thus B = {[1, 2]^T} is a basis of S and dim(S) = |B| = 1. Note that there are infinitely many bases of S.

2. Among infinitely many bases of Rn,

\[
B = \{\vec{e_1}, \vec{e_2}, \ldots, \vec{e_n}\}
= \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \ldots,
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right\}
\]

is called the standard basis of Rn. For any −→x = [x1, x2, . . . , xn]^T ∈ Rn,

\[
\vec{x} = x_1\vec{e_1} + x_2\vec{e_2} + \cdots + x_n\vec{e_n} \in \mathrm{Span}(B).
\]

Thus Span(B) = Rn. To show linear independence, let x1−→e1 + x2−→e2 + · · · + xn−→en = −→0, i.e.,

\[
x_1\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\implies x_1 = x_2 = \cdots = x_n = 0.
\]

So B is linearly independent. Thus B is a basis of Rn and dim(Rn) = |B| = n.


Now we present some important theorems regarding bases of a subspace of Rn.

Theorem 3.5 (Unique Representation Theorem). Let S be a subspace of Rn. Then B = {−→b1, −→b2, . . . , −→bk} is a basis of S if and only if each vector −→v of S is a unique linear combination of −→b1, −→b2, . . . , −→bk, i.e., −→v = c1−→b1 + c2−→b2 + · · · + ck−→bk for unique scalars c1, c2, . . . , ck.

Proof. Let B = {−→b1, −→b2, . . . , −→bk} be a basis of S. Consider a vector −→v of S. Since S = Span B, −→v = c1−→b1 + c2−→b2 + · · · + ck−→bk for some scalars c1, c2, . . . , ck. To show these scalars are unique, let −→v = d1−→b1 + d2−→b2 + · · · + dk−→bk for some scalars d1, d2, . . . , dk. Then

\[
\vec{v} - \vec{v} = (c_1\vec{b_1} + c_2\vec{b_2} + \cdots + c_k\vec{b_k}) - (d_1\vec{b_1} + d_2\vec{b_2} + \cdots + d_k\vec{b_k})
\]
\[
\vec{0} = (c_1 - d_1)\vec{b_1} + (c_2 - d_2)\vec{b_2} + \cdots + (c_k - d_k)\vec{b_k}.
\]

Since B = {−→b1, −→b2, . . . , −→bk} is linearly independent, (c1 − d1) = (c2 − d2) = · · · = (ck − dk) = 0, which implies d1 = c1, d2 = c2, . . . , dk = ck. The converse follows similarly (exercise).

Theorem 3.6 (Reduction Theorem). Let S be a subspace of Rn. If a set B = {−→b1, −→b2, . . . , −→bk} of vectors of S spans S, then either B is a basis of S or a subset of B is a basis of S.

Proof. Suppose B = {−→b1, −→b2, . . . , −→bk} spans S. If B is linearly independent, then B is a basis of S. Otherwise there is a vector, say −→b1, which is a linear combination of the other vectors in B. Let B1 = B \ {−→b1} = {−→b2, . . . , −→bk}. We can verify that Span B1 = Span B = S. If B1 is linearly independent, then B1 is a basis of S. Otherwise there is a vector, say −→b2, which is a linear combination of the other vectors in B1. Let B2 = B1 \ {−→b2} = {−→b3, . . . , −→bk}. We can verify that Span B2 = Span B1 = S. Proceeding this way we end up with a subset Bm of B for some m ≤ k such that Bm is linearly independent and Span Bm = S, which means Bm is a basis of S.

Similarly we can prove the following:

Theorem 3.7 (Extension Theorem). Let S be a subspace of Rn. If a set B = {−→b1, −→b2, . . . , −→bk} of vectors of S is linearly independent, then either B is a basis of S or a superset of B is a basis of S.

Example. Use the Reduction Theorem to find a basis of CS(A) for

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}.
\]

Write A = [−→a1 −→a2 −→a3 −→a4] and B = {−→a1, −→a2, −→a3, −→a4}. Then CS(A) = Span B. Verify that −→a4 = −−→a1 − 2−→a2 + 3−→a3 (exercise). Then B is not linearly independent and

\[
CS(A) = \mathrm{Span}\,B = \mathrm{Span}\{\vec{a_1}, \vec{a_2}, \vec{a_3}, \vec{a_4}\} = \mathrm{Span}\{\vec{a_1}, \vec{a_2}, \vec{a_3}\}.
\]

Verify that {−→a1, −→a2, −→a3} is linearly independent. Thus {−→a1, −→a2, −→a3} is a basis of CS(A).

Definition. The rank of a matrix A, denoted by rank(A), is the dimension of its column space, i.e., rank(A) = dim(CS(A)).


Theorem 3.8. The pivot columns of a matrix A form a basis for CS(A), and rank(A) is the number of pivot columns of A.

Proof. (Sketch) Suppose R is the RREF of A. Then A−→x = −→0 if and only if R−→x = −→0, i.e., any linear dependence relation among the columns of A is the same as that among the columns of R. Since the pivot columns of R are linearly independent, so are the pivot columns of A. By the Reduction Theorem we can show that the pivot columns of R span CS(R). Then the pivot columns of A span CS(A). Thus the pivot columns of A form a basis for CS(A), and rank(A) = dim(CS(A)) is the number of pivot columns of A.

Remark. If R is the RREF of A, then CS(A) ≠ CS(R) in general. Consider
\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}.
\quad\text{Then}\quad
R = \mathrm{RREF}(A) = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
CS(A) = \mathrm{Span}\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\}
\neq \mathrm{Span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right\} = CS(R).
\]

Example. Find rank(A) and a basis of CS(A) for

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}.
\]

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}
\xrightarrow[-R_1 + R_3]{-R_1 + R_2}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\quad (\text{REF})
\]

Since A has 3 pivot columns −→a1, −→a2, and −→a3, rank(A) = 3 and a basis of CS(A) is {−→a1, −→a2, −→a3}, i.e.,

\[
\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \\ 4 \end{bmatrix} \right\}.
\]
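A minimal SymPy sketch of the same computation (SymPy assumed available); columnspace returns the pivot columns of A:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, 4],
               [1, 3, 5, 8],
               [1, 2, 4, 7]])

print(A.rank())         # 3
print(A.columnspace())  # [Matrix([[1],[1],[1]]), Matrix([[2],[3],[2]]), Matrix([[3],[5],[4]])]
```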

Definition. The nullity of a matrix A, denoted by nullity(A), is the dimension of its null space, i.e., nullity(A) = dim(NS(A)).

Theorem 3.9. nullity(A) is the number of non-pivot columns of A.

Proof. (Sketch) Suppose B = [−→b1 −→b2 · · · −→bn] is the RREF of an m × n matrix A. Then A−→x = −→0 if and only if B−→x = −→0, i.e., NS(A) = NS(B). Suppose −→b1, −→b2, . . . , −→bk are the pivot columns of B and the rest are non-pivot columns. Then for i = k + 1, . . . , n,

\[
\vec{b_i} = c_{i1}\vec{b_1} + c_{i2}\vec{b_2} + \cdots + c_{ik}\vec{b_k} = \sum_{j=1}^{k} c_{ij}\vec{b_j} \quad\text{for some } c_{ij} \in \mathbb{R}.
\]

\[
B\vec{x} = \vec{0} \implies x_1\vec{b_1} + x_2\vec{b_2} + \cdots + x_n\vec{b_n} = \vec{0}
\]
\[
\implies x_1\vec{b_1} + \cdots + x_k\vec{b_k} + x_{k+1}\left(\sum_{j=1}^{k} c_{k+1,j}\vec{b_j}\right) + \cdots + x_n\left(\sum_{j=1}^{k} c_{n,j}\vec{b_j}\right) = \vec{0}
\]
\[
\implies \left(x_1 + \sum_{j=k+1}^{n} x_j c_{j,1}\right)\vec{b_1} + \cdots + \left(x_k + \sum_{j=k+1}^{n} x_j c_{j,k}\right)\vec{b_k} = \vec{0}
\]


Since {−→b1, −→b2, . . . , −→bk} is linearly independent, x_i = −∑_{j=k+1}^{n} x_j c_{j,i} for i = 1, . . . , k. Then we can write −→x as a linear combination of n − k linearly independent vectors that span NS(B) (exercise). Thus dim(NS(A)) = dim(NS(B)) = n − k.

Remark. The non-pivot columns of A do not form a basis for NS (A).

Example. Find nullity(A) and a basis of NS(A) for

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}.
\]

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\quad (\text{RREF})
\]

Since A has one non-pivot column, nullity(A) = 1. To find a basis of NS(A), we solve A−→x = −→0, which becomes

\[
\begin{aligned}
x_1 - x_4 &= 0\\
x_2 - 2x_4 &= 0\\
x_3 + 3x_4 &= 0
\end{aligned}
\]

where x1, x2, and x3 are basic variables and x4 is a free variable.

\[
\begin{aligned}
x_1 &= x_4\\
x_2 &= 2x_4\\
x_3 &= -3x_4\\
x_4 &= \text{free}
\end{aligned}
\]

\[
NS(A) = \left\{ \begin{bmatrix} x_4 \\ 2x_4 \\ -3x_4 \\ x_4 \end{bmatrix} \,\middle|\, x_4 \in \mathbb{R} \right\}
= \left\{ x_4\begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix} \,\middle|\, x_4 \in \mathbb{R} \right\}
= \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix} \right\}.
\]

Thus a basis of NS(A) is

\[
\left\{ \begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix} \right\}.
\]

Theorem 3.10 (Rank-Nullity Theorem). For an m × n matrix A,

\[
\mathrm{rank}(A) + \mathrm{nullity}(A) = n.
\]

Proof. rank(A) + nullity(A) is the sum of the numbers of pivot and non-pivot columns of A, which is n.

Example. If A is a 4 × 5 matrix with rank 3, then by the Rank-Nullity Theorem

\[
\mathrm{nullity}(A) = n - \mathrm{rank}(A) = 5 - 3 = 2.
\]
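The theorem is easy to confirm on the running example; a minimal SymPy sketch (SymPy assumed available):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, 4],
               [1, 3, 5, 8],
               [1, 2, 4, 7]])

rank = A.rank()                 # 3 pivot columns
nullity = len(A.nullspace())    # 1 non-pivot column
assert rank + nullity == A.cols # 3 + 1 == 4
```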


Now we investigate the relation of rank(A) with the dimension of the row space of A.

Definition. Each row of an m × n matrix A is called a row vector, which can be identified with a (column) vector in Rn. The row space of an m × n matrix

\[
A = \begin{bmatrix} \vec{r_1} \\ \vec{r_2} \\ \vdots \\ \vec{r_m} \end{bmatrix},
\]

denoted by RS(A) or Row A, is the span of its row vectors:

\[
RS(A) = \mathrm{Span}\{\vec{r_1}, \vec{r_2}, \ldots, \vec{r_m}\}.
\]

Remark.

1. Since each row is an n-dimensional vector, RS(A) is a subspace of Rn.

2. Row i of A is column i of A^T. Then RS(A) = CS(A^T).

3. Elementary row operations may change the linear dependence relations among rows (unlike columns), but they do not change the row space. For example,

\[
RS(A) = RS(\text{RREF of } A).
\]

Example. Consider

\[
A = \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 1 & 0 & 1 \end{bmatrix},
\]

with rows −→r1 = [2, 0, 1, 0], −→r2 = [0, 1, −1, 1], −→r3 = [2, 1, 0, 1]. Then RS(A) = CS(A^T) = Span{−→r1, −→r2, −→r3} is a subspace of R4.

\[
A = \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 1 & 0 & 1 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
= R \quad (\text{REF})
\]

Note that −→r3 = −→r1 + −→r2 in A, but not in R. Since row 3 of R is −−→r1 − −→r2 + −→r3 (in terms of the rows of A), the span of the rows of R is the same as that of A, i.e., RS(R) = RS(A). Note that the nonzero rows of R are linearly independent and span RS(R) = RS(A), i.e., they form a basis of RS(R) = RS(A).

Definition. The row rank of a matrix A is the dimension of its row space.

Theorem 3.11. Let A be an m × n matrix with REF R. Then the nonzero rows of R form a basis for RS(R) = RS(A), and the row rank of A = the (column) rank of A = the number of pivot positions of A.

Proof. Each nonzero row of R is not a linear combination of the other nonzero rows. Thus the nonzero rows of R are linearly independent and span RS(R) = RS(A), i.e., they form a basis of RS(R) = RS(A). Recall that the rank of A is the number of pivot columns (hence pivot positions) of R. The number of pivot positions of R equals the number of nonzero rows of R, which is the row rank of R and consequently the row rank of A.


Remark. For an m × n matrix A, 0 ≤ rank(A) ≤ min{m, n}.

Example.

1. For the 3 × 4 matrix A in the preceding example, rank(A) ≤ min{3, 4} = 3. Since it has two nonzero rows in its REF, the row rank of A = rank(A) = 2.

2. What is the smallest and largest possible nullity of a 5 × 7 matrix A?

First note 0 ≤ rank(A) ≤ min{5, 7} = 5. Now by the Rank-Nullity Theorem, nullity(A) = 7 − rank(A) ≥ 7 − 5 = 2. So the smallest possible nullity of A is 2. In that case the row rank of A = rank(A) = 5. Similarly nullity(A) = 7 − rank(A) ≤ 7. So the largest possible nullity of A is 7. In that case the row rank of A = rank(A) = 0.

3.4 Linear Transformations

Definition. A function T : V → W from a subspace V of Rn to a subspace W of Rm is called a linear transformation if

(a) T(−→u + −→v ) = T(−→u ) + T(−→v ) for all −→u, −→v ∈ V, and

(b) T(c−→v ) = cT(−→v ) for all −→v ∈ V and all scalars c ∈ R.

In short, a function T : V → W is a linear transformation if it preserves linearity among vectors: T(c−→u + d−→v ) = cT(−→u ) + dT(−→v ) for all −→u, −→v ∈ V and all scalars c, d ∈ R.

Example.

1. The projection T : R3 → R3 of R3 onto the xy-plane in R3 is defined by

\[
T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix}
\quad\text{for all } \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3.
\]

Sometimes it is simply denoted by T(x1, x2, x3) = (x1, x2, 0) in terms of row vectors. To show it is a linear transformation, let −→x = (x1, x2, x3) and −→y = (y1, y2, y3) in R3 and c, d ∈ R. Then

\[
\begin{aligned}
T(c\vec{x} + d\vec{y}) &= T(cx_1 + dy_1,\ cx_2 + dy_2,\ cx_3 + dy_3)\\
&= (cx_1 + dy_1,\ cx_2 + dy_2,\ 0)\\
&= (cx_1, cx_2, 0) + (dy_1, dy_2, 0)\\
&= cT(\vec{x}) + dT(\vec{y}).
\end{aligned}
\]

2. For the matrix
\[
A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix},
\]
define the shear transformation T : R2 → R2 by T(−→x ) = A−→x. Let −→x, −→y ∈ R2 and c, d ∈ R. Then

\[
T(c\vec{x} + d\vec{y}) = A(c\vec{x} + d\vec{y}) = cA\vec{x} + dA\vec{y} = cT(\vec{x}) + dT(\vec{y}).
\]

Thus T is a linear transformation, which transforms the square formed by (0, 0), (1, 0), (1, 1), (0, 1) to the parallelogram formed by (0, 0), (1, 0), (3, 1), (2, 1).


Definition. A matrix transformation is the linear transformation T : Rn → Rm defined by T(−→x ) = A−→x for some m × n matrix A. It is denoted by −→x 7→ A−→x.

From the definition of a linear transformation we have the following properties.

Proposition. For a linear transformation T : V → W where V ≤ Rn and W ≤ Rm,

(a) T(−→0n) = −→0m, and

(b) for all −→v1, . . . , −→vk ∈ V and all c1, . . . , ck ∈ R,

\[
T(c_1\vec{v_1} + c_2\vec{v_2} + \cdots + c_k\vec{v_k}) = c_1 T(\vec{v_1}) + c_2 T(\vec{v_2}) + \cdots + c_k T(\vec{v_k}).
\]

Example. Consider the function T : R3 → R3 defined by T(x1, x2, x3) = (x1, x2, 5). Since T(0, 0, 0) = (0, 0, 5) ≠ (0, 0, 0), T is not a linear transformation.

Theorem 3.12. For a linear transformation T : Rn → Rm, there exists a unique m × n matrix A, called the standard matrix of T, for which

\[
T(\vec{x}) = A\vec{x} \quad\text{for all } \vec{x} \in \mathbb{R}^n.
\]

Moreover, A = [T(−→e1) T(−→e2) · · · T(−→en)], where −→ei is the ith column of In.

Proof. Let −→x = [x1, x2, . . . , xn]^T ∈ Rn. We can write −→x = x1−→e1 + x2−→e2 + · · · + xn−→en. Then

\[
T(\vec{x}) = T(x_1\vec{e_1} + x_2\vec{e_2} + \cdots + x_n\vec{e_n}) = x_1 T(\vec{e_1}) + x_2 T(\vec{e_2}) + \cdots + x_n T(\vec{e_n})
= [T(\vec{e_1})\ T(\vec{e_2})\ \cdots\ T(\vec{e_n})] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = A\vec{x}.
\]

Example.

1. Use the standard matrix to find the rotation transformation T : R2 → R2 that rotates each point of R2 about the origin through an angle θ counterclockwise.

By trigonometry we have

\[
T(\vec{e_1}) = T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\quad\text{and}\quad
T(\vec{e_2}) = T\left( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.
\]

Then the standard matrix is

\[
A = [T(\vec{e_1})\ T(\vec{e_2})] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\]

Thus T(−→x ) = A−→x, i.e.,

\[
T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1\cos\theta - x_2\sin\theta \\ x_1\sin\theta + x_2\cos\theta \end{bmatrix}
\quad\text{for all } \vec{x} \in \mathbb{R}^2.
\]
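A minimal NumPy sketch of this rotation (NumPy assumed available; rotation_matrix is an illustrative helper, not part of the notes):

```python
import numpy as np

def rotation_matrix(theta):
    # standard matrix [T(e1) T(e2)] of the counterclockwise rotation by theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A = rotation_matrix(np.pi / 2)            # 90-degree rotation
print(np.round(A @ np.array([1, 0])))     # [0. 1.]: e1 rotates to e2
```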


2. Consider the linear transformation T : R2 → R3 defined by

\[
T(x_1, x_2) = (x_1 - x_2,\ 2x_1 + 3x_2,\ 4x_2).
\]

Note that T(−→e1) = T(1, 0) = (1, 2, 0) and T(−→e2) = T(0, 1) = (−1, 3, 4). The standard matrix of T is

\[
A = [T(\vec{e_1})\ T(\vec{e_2})] = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 0 & 4 \end{bmatrix}.
\]

For any given linear transformation T : Rn → Rm, the domain space is Rn and the codomain space is Rm. We study a subspace of the domain space, called the kernel or null space, and a subspace of the codomain space, called the image space or range.

Definition. The kernel or null space of a linear transformation T : Rn → Rm, denoted by ker(T) or ker T, is the following subspace of Rn:

\[
\ker T = \{\vec{x} \in \mathbb{R}^n \mid T(\vec{x}) = \vec{0}_m\}.
\]

The nullity of T, denoted by nullity(T), is the dimension of ker T, i.e., nullity(T) = dim(ker T).

Remark. If A is the standard matrix of a linear transformation T : Rn → Rm, then ker T = NS(A) and nullity(T) = nullity(A).

Example. The linear transformation T : R3 → R2 defined by T(x1, x2, x3) = (x1, x2) has the standard matrix

\[
A = [T(\vec{e_1})\ T(\vec{e_2})\ T(\vec{e_3})] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
\]

Note that

\[
\ker T = NS(A) = \mathrm{Span}\left\{ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},
\]

and nullity(T) = nullity(A) = 1.

Definition. The image space or range of a linear transformation T : Rn → Rm, denoted by im(T) or im T or T(Rn), is the following subspace of Rm:

\[
\mathrm{im}\,T = \{T(\vec{x}) \mid \vec{x} \in \mathbb{R}^n\}.
\]

The rank of T, denoted by rank(T), is the dimension of im T, i.e., rank(T) = dim(im T).

Remark. If A is the standard matrix of a linear transformation T : Rn → Rm, then im T = CS(A) and rank(T) = rank(A).


Example. The linear transformation T : R2 → R3 defined by T(x1, x2) = (x1, x2, 0) has the standard matrix

\[
A = [T(\vec{e_1})\ T(\vec{e_2})] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]

Note that

\[
\mathrm{im}\,T = CS(A) = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\},
\]

and rank(T) = rank(A) = 2.

Theorem 3.13 (Rank-Nullity Theorem). For a linear transformation T : Rn → Rm,

\[
\mathrm{rank}(T) + \mathrm{nullity}(T) = n.
\]

Proof. Let A be the m × n standard matrix of T. Then by the Rank-Nullity Theorem on A,

\[
\mathrm{rank}(T) + \mathrm{nullity}(T) = \mathrm{rank}(A) + \mathrm{nullity}(A) = n.
\]

Example. The linear transformation T : R3 → R2 defined by T(x1, x2, x3) = (x1, x2) has nullity(T) = 1 (see the examples above). Then by the Rank-Nullity Theorem, rank(T) = 3 − nullity(T) = 2.

Now we discuss two important types of linear transformations T : Rn → Rm.

Definition. Let T : Rn → Rm be a linear transformation. T is onto if each −→b ∈ Rm has a pre-image −→x in Rn under T, i.e., T(−→x ) = −→b. T is one-to-one if each −→b ∈ Rm has at most one pre-image in Rn under T.

Example.

1. The linear transformation T : R3 → R2 defined by T(x1, x2, x3) = (x1, x2) is onto because each (x1, x2) ∈ R2 has a pre-image (x1, x2, 0) ∈ R3 under T. But T is not one-to-one because T(0, 0, 0) = T(0, 0, 1) = (0, 0), i.e., (0, 0) has two distinct pre-images (0, 0, 0) and (0, 0, 1) under T.

2. The linear transformation T : R2 → R3 defined by T(x1, x2) = (x1, x2, 0) is one-to-one because T(x1, x2) = T(y1, y2) =⇒ (x1, x2, 0) = (y1, y2, 0) =⇒ (x1, x2) = (y1, y2). But T is not onto because (0, 0, 1) ∈ R3 has no pre-image (x1, x2) ∈ R2 under T.

3. The linear transformation T : R2 → R2 defined by T(x1, x2) = (x1 + x2, x1 − x2) is one-to-one and onto (exercise).

Theorem 3.14. Let T : Rn → Rm be a linear transformation with standard matrix A. Then the following are equivalent.

(a) T (i.e., −→x 7→ A−→x ) is one-to-one.


(b) kerT = NS (A) = {−→0n}.

(c) nullity (T ) = nullity (A) = 0.

(d) The columns of A are linearly independent.

Proof. (b), (c), and (d) are equivalent by the definitions.

(a) =⇒ (b): Suppose T (i.e., −→x 7→ A−→x ) is one-to-one. Let −→x ∈ ker T = NS(A). Then A−→x = −→0m. Also −→0n 7→ A−→0n = −→0m. Since −→x 7→ A−→x is one-to-one, −→x = −→0n. Thus NS(A) = {−→0n}.

(b) =⇒ (a): Suppose ker T = NS(A) = {−→0n}. Let −→x, −→y ∈ Rn such that A−→x = A−→y. Then A(−→x − −→y ) = −→0m. Then −→x − −→y ∈ NS(A) = {−→0n}, which implies −→x − −→y = −→0n, i.e., −→x = −→y. Thus −→x 7→ A−→x is one-to-one.

Example. The linear transformation T : R2 → R3 defined by T(x1, x2) = (x1, x2, 0) has the standard matrix

\[
A = [T(\vec{e_1})\ T(\vec{e_2})] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]

Note that the columns of A are linearly independent, ker T = NS(A) = {−→02}, and nullity(T) = nullity(A) = 0. Thus T (i.e., −→x 7→ A−→x ) is one-to-one.

Theorem 3.15. Let T : Rn → Rm be a linear transformation with standard matrix A. Then the following are equivalent.

(a) T (i.e., −→x 7→ A−→x ) is onto.

(b) imT = CS (A) = Rm.

(c) rank (T ) = rank (A) = m.

(d) Each row of A has a pivot position.

Proof. (b), (c), and (d) are equivalent by the definitions.

(a) =⇒ (b): Suppose T (i.e., −→x 7→ A−→x ) is onto. Let −→b ∈ Rm. Since −→x 7→ A−→x is onto, −→b = A−→x for some −→x ∈ Rn. Then −→b = A−→x ∈ CS(A). Thus im T = CS(A) = Rm.

(b) =⇒ (a): Suppose im T = CS(A) = Rm. Let −→b ∈ Rm. Since −→b ∈ CS(A) = Rm, −→b = A−→x for some −→x ∈ Rn. Thus −→x 7→ A−→x is onto.

Example. The linear transformation T : R3 → R2 defined by T(x1, x2, x3) = (x1, x2) has the standard matrix

\[
A = [T(\vec{e_1})\ T(\vec{e_2})\ T(\vec{e_3})] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
\]

Note that each row of A has a pivot position, im T = CS(A) = R2, and rank(T) = rank(A) = 2. Thus T (i.e., −→x 7→ A−→x ) is onto.
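Both criteria reduce to rank conditions on the standard matrix; a minimal NumPy sketch (NumPy assumed available; the helper names are illustrative):

```python
import numpy as np

def is_one_to_one(A):
    # columns linearly independent <=> rank equals the number of columns
    return np.linalg.matrix_rank(A) == A.shape[1]

def is_onto(A):
    # a pivot in every row <=> rank equals the number of rows
    return np.linalg.matrix_rank(A) == A.shape[0]

A = np.array([[1, 0, 0], [0, 1, 0]])     # standard matrix of (x1,x2,x3) -> (x1,x2)
print(is_one_to_one(A), is_onto(A))      # False True

B = np.array([[1, 0], [0, 1], [0, 0]])   # standard matrix of (x1,x2) -> (x1,x2,0)
print(is_one_to_one(B), is_onto(B))      # True False
```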

Definition. A linear transformation T : Rn → Rn is an isomorphism if it is one-to-one andonto.


Example. The linear transformation T : R2 → R2 defined by T(x1, x2) = (x1 + x2, x1 − x2) is one-to-one and onto, and consequently an isomorphism. Showing T is one-to-one is enough to show T is an isomorphism, by the following theorem.

Theorem 3.16. Let T : Rn → Rn be a linear transformation with the n × n standard matrix A. Then the following are equivalent.

(a) T (i.e., −→x 7→ A−→x ) is an isomorphism.

(b) T (i.e., −→x 7→ A−→x ) is one-to-one.

(c) kerT = NS (A) = {−→0n}.

(d) nullity (T ) = nullity (A) = 0.

(e) The columns of A are linearly independent.

(f) T (i.e., −→x 7→ A−→x ) is onto.

(g) imT = CS (A) = Rn.

(h) rank (T ) = rank (A) = n.

(i) Each row and column of A has a pivot position.

Proof. (b), (c), (d), and (e) are equivalent by Theorem 3.14. (f), (g), (h), and (i) are equivalent by Theorem 3.15. Now for the n × n standard matrix A, rank(A) + nullity(A) = n. Thus nullity(A) = 0 if and only if rank(A) = n, i.e., (d) and (h) are equivalent. Since (b) and (f) are equivalent, they are equivalent to (a).

Example. What can we say about CS(A), NS(A), rank(A), nullity(A), and the pivot positions of a 3 × 3 matrix with three linearly independent columns? What about −→x 7→ A−→x?

By the preceding theorem, CS(A) = R3, NS(A) = {−→03}, rank(A) = 3, nullity(A) = 0, A has 3 pivot positions, and −→x 7→ A−→x is a one-to-one linear transformation from R3 onto R3.


4 Inverse and Determinant of a Matrix

4.1 Inverse of a Matrix

Definition. An n × n matrix A is invertible if there is an n × n matrix B such that

\[
AB = BA = I_n.
\]

This B is called the inverse of A, denoted by A−1, for which AA−1 = A−1A = In. An invertible matrix is also called a nonsingular matrix. A square matrix that is not invertible is called a singular matrix.

Example. For
\[
A = \begin{bmatrix} 1 & 2 \\ 4 & 6 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -3 & 1 \\ 2 & -0.5 \end{bmatrix},\quad
AB = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = BA.
\]
So B = A−1.

Theorem 4.1. Let A and B be two n × n invertible matrices. Then the following hold.

(a) A−1 is invertible and (A−1)−1 = A.

(b) A^T is invertible and (A^T)−1 = (A−1)^T.

(c) For c ≠ 0, cA is invertible and (cA)−1 = (1/c)A−1.

(d) AB is invertible and (AB)−1 = B−1A−1.

Proof. (a) and (c) are exercises. For (b) note that

\[
A^T(A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n
\quad\text{and}\quad
(A^{-1})^T A^T = (AA^{-1})^T = I_n^T = I_n.
\]

For (d) note that

\[
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n
\]
\[
\text{and}\quad (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}I_nB = B^{-1}B = I_n.
\]

Example. For
\[
A = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix},\quad
A^{-1} = \begin{bmatrix} 4 & -1 \\ -3 & 1 \end{bmatrix}
\quad\text{and}\quad
B^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix}.
\]

Verify that

\[
(A^T)^{-1} = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} = (A^{-1})^T,\qquad
(5A)^{-1} = \frac{1}{5}\begin{bmatrix} 4 & -1 \\ -3 & 1 \end{bmatrix} = \frac{1}{5}A^{-1},
\]
\[
\text{and}\quad
(AB)^{-1} = \begin{bmatrix} 3 & 7 \\ 11 & 26 \end{bmatrix}^{-1} = \begin{bmatrix} 26 & -7 \\ -11 & 3 \end{bmatrix} = B^{-1}A^{-1}.
\]

How do we know a given square matrix A is invertible? How do we find A−1?

Theorem 4.2. Let A be an n× n matrix. Then the following are equivalent.

(a) A is invertible.

(b) A−→x = −→b has a unique solution for each −→b ∈ Rn.


(c) The RREF of A is In.

Proof. (b) ⇐⇒ (c): A−→x = −→b has a unique solution for each −→b ∈ Rn if and only if each column of the RREF of A has a leading 1, if and only if the RREF of A is In.

(a) =⇒ (b): Suppose A is invertible. Let −→b ∈ Rn. Then A−→x = −→b =⇒ −→x = A−1−→b.

(b) =⇒ (a): Suppose A−→x = −→b has a unique solution for each −→b ∈ Rn. Let A−→vi = −→ei for i = 1, 2, . . . , n. Then

\[
A[\vec{v_1}\ \vec{v_2}\ \cdots\ \vec{v_n}] = [A\vec{v_1}\ A\vec{v_2}\ \cdots\ A\vec{v_n}] = [\vec{e_1}\ \vec{e_2}\ \cdots\ \vec{e_n}] = I_n.
\]

To show A−1 = [−→v1 −→v2 · · · −→vn], it suffices to show [−→v1 −→v2 · · · −→vn]A = In. Since A[−→v1 −→v2 · · · −→vn] = In,

\[
A[\vec{v_1}\ \vec{v_2}\ \cdots\ \vec{v_n}]A = I_nA = A.
\]

Let −→bi be the ith column of [−→v1 −→v2 · · · −→vn]A for i = 1, 2, . . . , n. Then A−→bi = −→ai. But A−→ei = −→ai. By the uniqueness of the solution of A−→x = −→ai, −→bi = −→ei for i = 1, 2, . . . , n. Thus

\[
[\vec{v_1}\ \vec{v_2}\ \cdots\ \vec{v_n}]A = [\vec{e_1}\ \vec{e_2}\ \cdots\ \vec{e_n}] = I_n.
\]

To find A−1 for an invertible matrix A, we investigate how row operations on A are obtained from premultiplying A by elementary matrices.

Definition. An n × n elementary matrix is obtained by applying an elementary row operation to In.

Example.

1. Eij is obtained by Ri ↔ Rj on In. Note that EijA is obtained by Ri ↔ Rj on A.

\[
A = \begin{bmatrix} 0 & 2 & 4 \\ 1 & -3 & 0 \\ -1 & 3 & 1 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 2 & 4 \\ 1 & -3 & 0 \\ -1 & 3 & 1 \end{bmatrix}
= E_{12}A.
\]

2. For c ≠ 0, Ei(c) is obtained by cRi on In. Note that Ei(c)A is obtained by cRi on A.

\[
E_{12}A = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_2}
\begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix}
= E_2\!\left(\tfrac{1}{2}\right)E_{12}A.
\]

3. Eij(c) is obtained by cRi + Rj on In. Note that Eij(c)A is obtained by cRi + Rj on A.

\[
E_2\!\left(\tfrac{1}{2}\right)E_{12}A = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix}
\xrightarrow{R_1 + R_3}
\begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix}
= E_{13}(1)E_2\!\left(\tfrac{1}{2}\right)E_{12}A.
\]


Remark. Elementary matrices are invertible. Moreover, $E_{ij}^{-1} = E_{ij}$, $E_i(c)^{-1} = E_i\!\left(\frac{1}{c}\right)$ for $c \neq 0$, and $E_{ij}(c)^{-1} = E_{ij}(-c)$.

Theorem 4.3. Let A be an n×n invertible matrix. A sequence of elementary row operations that reduces A to $I_n$ also reduces $I_n$ to $A^{-1}$.

Proof. Since A is invertible, the RREF of A is $I_n$. Suppose $I_n$ is obtained from A by successively premultiplying by elementary matrices $E_1, E_2, \dots, E_k$, i.e.,
$$E_kE_{k-1}\cdots E_1A = I_n.$$
Postmultiplying by $A^{-1}$, we get
$$E_kE_{k-1}\cdots E_1AA^{-1} = I_nA^{-1} \implies E_kE_{k-1}\cdots E_1I_n = A^{-1}.$$

Gauss-Jordan elimination: Find the RREF of $[A \mid I_n]$. If the RREF of A is $I_n$, then A is invertible and the RREF of $[A \mid I_n]$ is $[I_n \mid A^{-1}]$. Otherwise A is not invertible.

Example.
$$[A \mid I_3] = \left[\begin{array}{ccc|ccc} 0&2&4&1&0&0\\ 1&-3&0&0&1&0\\ -1&3&1&0&0&1 \end{array}\right] \xrightarrow{R_1\leftrightarrow R_2} \left[\begin{array}{ccc|ccc} 1&-3&0&0&1&0\\ 0&2&4&1&0&0\\ -1&3&1&0&0&1 \end{array}\right] \xrightarrow{R_1+R_3} \left[\begin{array}{ccc|ccc} 1&-3&0&0&1&0\\ 0&2&4&1&0&0\\ 0&0&1&0&1&1 \end{array}\right]$$
$$\xrightarrow{-4R_3+R_2} \left[\begin{array}{ccc|ccc} 1&-3&0&0&1&0\\ 0&2&0&1&-4&-4\\ 0&0&1&0&1&1 \end{array}\right] \xrightarrow{\frac12 R_2} \left[\begin{array}{ccc|ccc} 1&-3&0&0&1&0\\ 0&1&0&\frac12&-2&-2\\ 0&0&1&0&1&1 \end{array}\right] \xrightarrow{3R_2+R_1} \left[\begin{array}{ccc|ccc} 1&0&0&\frac32&-5&-6\\ 0&1&0&\frac12&-2&-2\\ 0&0&1&0&1&1 \end{array}\right] = [I_3 \mid A^{-1}]$$
Thus $A^{-1} = \begin{bmatrix} \frac32&-5&-6\\ \frac12&-2&-2\\ 0&1&1 \end{bmatrix}$. Notice how the elementary matrices $E_{12}$, $E_{13}(1)$, $E_{32}(-4)$, $E_2\!\left(\frac12\right)$, $E_{21}(3)$ are successively applied to A to get $I_3$:
$$E_{21}(3)\,E_2\!\left(\tfrac12\right)E_{32}(-4)\,E_{13}(1)\,E_{12}\,A = I_3.$$
Verify that the product of those elementary matrices is $A^{-1}$:
$$A^{-1} = E_{21}(3)\,E_2\!\left(\tfrac12\right)E_{32}(-4)\,E_{13}(1)\,E_{12}.$$
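The $[A \mid I]$ reduction can be carried out mechanically. The following is a minimal numpy sketch of Gauss-Jordan inversion with partial pivoting; the function name `gauss_jordan_inverse` is our own invention, and in practice one would simply call `np.linalg.inv`:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce [A | I] to [I | A^{-1}]; raise if A is singular."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # augmented matrix [A | I]
    for col in range(n):
        # partial pivoting: bring up the row with the largest pivot candidate
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]   # R_col <-> R_pivot
        M[col] /= M[col, col]               # (1/pivot) R_col
        for r in range(n):                  # clear the rest of the column
            if r != col:
                M[r] -= M[r, col] * M[col]  # -M[r,col] R_col + R_r
    return M[:, n:]

A = np.array([[0, 2, 4], [1, -3, 0], [-1, 3, 1]])
print(gauss_jordan_inverse(A))  # ≈ [[1.5, -5, -6], [0.5, -2, -2], [0, 1, 1]]
print(np.linalg.inv(A))         # agrees
```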

Remark. For an m×n matrix A there is a generalized inverse called the Moore-Penrose inverse, denoted by $A^{+}$, which can be found using the singular-value decomposition of A.


4.2 Invertible Matrix Theorem

Theorem 4.4 (Invertible Matrix Theorem). Let A be an n×n matrix. Then the following are equivalent.

(a) A is invertible.

(b) $A\vec x = \vec b$ has a unique solution for each $\vec b \in \mathbb{R}^n$.

(c) The RREF of A is In.

(d) T (i.e., −→x 7→ A−→x ) is an isomorphism.

(e) T (i.e., −→x 7→ A−→x ) is one-to-one.

(f) kerT = NS (A) = {−→0n}.

(g) nullity (T ) = nullity (A) = 0.

(h) The columns of A are linearly independent.

(i) T (i.e., −→x 7→ A−→x ) is onto.

(j) imT = CS (A) = Rn.

(k) rank (T ) = rank (A) = n.

(l) Each row and column of A has a pivot position.

Proof. (a), (b), and (c) are equivalent by Theorem 4.2. Also (d)-(l) are equivalent by Theorem 7.8. Since A is a square matrix, (c) and (l) are equivalent.

Example. What can we say about CS(A), NS(A), rank(A), nullity(A), and the pivot positions of a 3×3 invertible matrix A? What about $\vec x \mapsto A\vec x$?
By the IMT, CS(A) = $\mathbb{R}^3$, NS(A) = $\{\vec 0_3\}$, rank(A) = 3, nullity(A) = 0, A has 3 pivot positions, and $\vec x \mapsto A\vec x$ is an isomorphism, i.e., a one-to-one linear transformation from $\mathbb{R}^3$ onto $\mathbb{R}^3$. Also $A\vec x = \vec b$ has a unique solution $A^{-1}\vec b$ for each $\vec b \in \mathbb{R}^3$.

Remark. In general the conditions in the IMT are not equivalent for a non-square matrix.

Example.

1. The linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x_1, x_2, x_3) = (x_1, x_2)$ has 2×3 standard matrix $A = \begin{bmatrix} 1&0&0\\ 0&1&0 \end{bmatrix}$. Note that T is onto but not one-to-one. Equivalently, the columns of A span $\mathbb{R}^2$ but they are not linearly independent.

2. The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(x_1, x_2) = (x_1, x_2, 0)$ has 3×2 standard matrix $A = \begin{bmatrix} 1&0\\ 0&1\\ 0&0 \end{bmatrix}$. Note that T is one-to-one but not onto. Equivalently, the columns of A are linearly independent but they do not span $\mathbb{R}^3$.


Definition. A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is invertible if there is another linear transformation $S : \mathbb{R}^n \to \mathbb{R}^n$ such that
$$T(S(\vec x)) = S(T(\vec x)) = \vec x \text{ for all } \vec x \in \mathbb{R}^n.$$
This S is called the inverse of T, denoted by $T^{-1}$, for which $T \circ T^{-1} = T^{-1} \circ T = I$, the identity function on $\mathbb{R}^n$.

Remark. It is well known that a function is invertible if and only if it is one-to-one and onto. So a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is an isomorphism if and only if it is invertible.

Example. The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x_1, x_2) = (x_1 + 2x_2,\ 3x_1 + 5x_2)$ is one-to-one and onto, and consequently invertible. How do we find $T^{-1} : \mathbb{R}^2 \to \mathbb{R}^2$?

Theorem 4.5. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation with standard matrix A. Then T is invertible if and only if A is invertible, and $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is given by
$$T^{-1}(\vec x) = A^{-1}\vec x.$$

Proof. T is invertible (i.e., an isomorphism) if and only if A is invertible, by the IMT. Let $S : \mathbb{R}^n \to \mathbb{R}^n$ be the linear transformation defined by $S(\vec x) = A^{-1}\vec x$. Then for all $\vec x \in \mathbb{R}^n$,
$$T(S(\vec x)) = T(A^{-1}\vec x) = A(A^{-1}\vec x) = I_n\vec x = \vec x \quad\text{and}\quad S(T(\vec x)) = S(A\vec x) = A^{-1}(A\vec x) = I_n\vec x = \vec x.$$
Thus $S = T^{-1}$.

Example. The isomorphism $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x_1, x_2) = (x_1 + 2x_2,\ 3x_1 + 5x_2)$ has standard matrix $A = [T(\vec e_1)\ T(\vec e_2)] = \begin{bmatrix} 1 & 2\\ 3 & 5 \end{bmatrix}$. Since $A^{-1} = \begin{bmatrix} -5 & 2\\ 3 & -1 \end{bmatrix}$, $T^{-1} : \mathbb{R}^2 \to \mathbb{R}^2$ is given by $T^{-1}(\vec x) = A^{-1}\vec x$, i.e., $T^{-1}(x_1, x_2) = (-5x_1 + 2x_2,\ 3x_1 - x_2)$. Verify that for all $[x_1, x_2]^T \in \mathbb{R}^2$,
$$T(T^{-1}(x_1, x_2)) = T(-5x_1 + 2x_2,\ 3x_1 - x_2) = (x_1, x_2) \quad\text{and}\quad T^{-1}(T(x_1, x_2)) = T^{-1}(x_1 + 2x_2,\ 3x_1 + 5x_2) = (x_1, x_2).$$

4.3 Determinant of a Matrix

In this section we study the determinant of an n×n matrix $A = [a_{ij}]$, denoted by det(A), det A, |A|, or
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$
To define det(A) recursively, we write A(i, j) for the matrix obtained from A by deleting row i and column j of A.


Definition. If $A = [a_{11}]$, then $\det(A) = a_{11}$. If $A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}$, then $\det(A) = a_{11}a_{22} - a_{12}a_{21}$. For an n×n matrix $A = [a_{ij}]$ where $n \geq 3$,
$$\det(A) = \sum_{i=1}^{n}(-1)^{1+i}a_{1i}\det A(1,i) = a_{11}\det A(1,1) - a_{12}\det A(1,2) + \cdots + (-1)^{n+1}a_{1n}\det A(1,n).$$

Example. We find det(A) for $A = \begin{bmatrix} 1&2&3\\ 1&3&5\\ 1&4&2 \end{bmatrix}$.
$$\det(A) = a_{11}\det A(1,1) - a_{12}\det A(1,2) + a_{13}\det A(1,3) = 1\begin{vmatrix} 3&5\\ 4&2 \end{vmatrix} - 2\begin{vmatrix} 1&5\\ 1&2 \end{vmatrix} + 3\begin{vmatrix} 1&3\\ 1&4 \end{vmatrix} = 1(3\cdot 2 - 5\cdot 4) - 2(1\cdot 2 - 5\cdot 1) + 3(1\cdot 4 - 3\cdot 1) = -5$$
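The recursive definition translates directly into code. Here is a minimal Python sketch (assuming numpy); cofactor expansion takes exponential time, so it is only for small matrices, and practical code uses the LU-based `np.linalg.det`:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (recursive)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), i, axis=1)  # A(1, i+1)
        total += (-1) ** i * A[0, i] * det_cofactor(minor)     # sign (-1)^{1+(i+1)}
    return total

A = np.array([[1., 2., 3.], [1., 3., 5.], [1., 4., 2.]])
print(det_cofactor(A))   # -5.0
print(np.linalg.det(A))  # ≈ -5.0, agrees
```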

Definition. For an n×n matrix $A = [a_{ij}]$ where $n \geq 2$, the (i, j) minor, denoted by $m_{ij}$, is $m_{ij} = \det A(i,j)$, and the (i, j) cofactor, denoted by $c_{ij}$, is
$$c_{ij} = (-1)^{i+j}m_{ij} = (-1)^{i+j}\det A(i,j).$$

Remark. We defined det(A) as the cofactor expansion along the first row of A:
$$\det(A) = \sum_{i=1}^{n}(-1)^{1+i}a_{1i}\det A(1,i) = \sum_{i=1}^{n}a_{1i}c_{1i}.$$
But it can be proved that det(A) equals the cofactor expansion along any row or column of A.

Theorem 4.6. Let A be an n×n matrix. Then for each i, j = 1, 2, ..., n,
$$\det(A) = \sum_{j=1}^{n}a_{ij}c_{ij} = \sum_{i=1}^{n}a_{ij}c_{ij}.$$

The preceding theorem can be proved using the following equivalent definition of the determinant:
$$\det(A) = \sum_{\sigma\in S_n}\left(\operatorname{sgn}(\sigma)\prod_{i=1}^{n}a_{i\sigma(i)}\right),$$
where σ runs over all n! permutations σ of {1, 2, ..., n}. (This requires a study of permutations.)

Corollary 4.7. Let A = [aij] be an n× n matrix.

(a) det(AT ) = det(A).

(b) If A is a triangular matrix, then det(A) = a11a22 · · · ann.


Proof. (Sketch) (a) Note that the (i, j) cofactor of A is the (j, i) cofactor of $A^T$. The cofactor expansion along the first row to get det(A) is therefore the same as the cofactor expansion along the first column to get $\det(A^T)$.
(b) If A is an upper-triangular matrix, then by cofactor expansions along the first columns we get $\det(A) = a_{11}a_{22}\cdots a_{nn}$. Similarly, if A is a lower-triangular matrix, then by cofactor expansions along the first rows we get $\det(A) = a_{11}a_{22}\cdots a_{nn}$.

Example. $A = \begin{bmatrix} 1&2&3&4&5\\ 3&0&1&3&2\\ 0&0&4&3&0\\ 0&0&0&2&1\\ 2&0&0&0&3 \end{bmatrix}$. We compute det(A) by choosing, at each step, a row or column with the maximum number of zeros. So first we choose column 2 and do the cofactor expansion along it:
$$\det(A) = -2\begin{vmatrix} 3&1&3&2\\ 0&4&3&0\\ 0&0&2&1\\ 2&0&0&3 \end{vmatrix}$$
Now we have 5 choices: rows 2, 3, 4 and columns 1, 2. We do the cofactor expansion along row 4:
$$\det(A) = -2\left(-2\begin{vmatrix} 1&3&2\\ 4&3&0\\ 0&2&1 \end{vmatrix} + 3\begin{vmatrix} 3&1&3\\ 0&4&3\\ 0&0&2 \end{vmatrix}\right)$$
Since the second determinant is the determinant of an upper-triangular matrix, it equals $3\cdot 4\cdot 2 = 24$. We do the cofactor expansion along column 3 for the first determinant:
$$\det(A) = -2\left(-2\left(2\begin{vmatrix} 4&3\\ 0&2 \end{vmatrix} + 1\begin{vmatrix} 1&3\\ 4&3 \end{vmatrix}\right) + 3\cdot 24\right) = -2\left(-2\big(2(4\cdot 2 - 0) + 1(1\cdot 3 - 3\cdot 4)\big) + 72\right) = -116$$

Some applications of determinants:

1. Determinant as volume: Suppose a hypersolid S in $\mathbb{R}^n$ is given by n concurrent edges that are represented by the column vectors of an n×n matrix A. Then the volume of S is $|\det(A)|$.

Let $\vec r_1 = [a_1, b_1, c_1]^T$, $\vec r_2 = [a_2, b_2, c_2]^T$, $\vec r_3 = [a_3, b_3, c_3]^T$. Then
$$A = [\vec r_1\ \vec r_2\ \vec r_3] = \begin{bmatrix} a_1&a_2&a_3\\ b_1&b_2&b_3\\ c_1&c_2&c_3 \end{bmatrix}$$
and the volume of the parallelepiped with concurrent edges $\vec r_1, \vec r_2, \vec r_3$ is
$$|\det(A)| = |a_1(b_2c_3 - b_3c_2) - a_2(b_1c_3 - b_3c_1) + a_3(b_1c_2 - b_2c_1)|.$$


2. Equation of a plane: Consider the plane passing through three distinct points $P_1(x_1, y_1, z_1)$, $P_2(x_2, y_2, z_2)$, and $P_3(x_3, y_3, z_3)$. Let $P(x, y, z)$ be a point on the plane. Then the volume of the parallelepiped with concurrent edges $\overrightarrow{P_1P}$, $\overrightarrow{P_2P}$, and $\overrightarrow{P_3P}$ is zero:
$$\begin{vmatrix} x-x_1 & x-x_2 & x-x_3\\ y-y_1 & y-y_2 & y-y_3\\ z-z_1 & z-z_2 & z-z_3 \end{vmatrix} = 0.$$

3. Volume after transformation: Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation with standard matrix A, and let S be a bounded hypersolid in $\mathbb{R}^n$. Then the volume of T(S) is $|\det(A)|$ times the volume of S.

[Figure: the unit circle $x^2 + y^2 = 1$ is mapped to the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$.]

Example. Let $A = \begin{bmatrix} a & 0\\ 0 & b \end{bmatrix}$ (with $a, b > 0$) and $D = \{(x, y) \mid x^2 + y^2 \leq 1\}$. Consider $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T([x, y]^T) = A[x, y]^T$. Note $T(D) = \{(x, y) \mid \frac{x^2}{a^2} + \frac{y^2}{b^2} \leq 1\}$. So the area of the ellipse = the area of T(D) = $|\det(A)|\cdot\text{area}(D) = ab\cdot\pi\cdot 1^2 = \pi ab$.

4. Change of variables: Suppose variables $x_1, \dots, x_n$ are changed to $v_1, \dots, v_n$ by n differentiable functions $f_1, \dots, f_n$ so that
$$v_1 = f_1(x_1, \dots, x_n),\quad v_2 = f_2(x_1, \dots, x_n),\quad \dots,\quad v_n = f_n(x_1, \dots, x_n).$$
So we have a function $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by
$$F(x_1, \dots, x_n) = (f_1(x_1, \dots, x_n), \dots, f_n(x_1, \dots, x_n)).$$
The Jacobian matrix of $F : \mathbb{R}^n \to \mathbb{R}^n$ is
$$\frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}.$$


The change of variables formula for integrals is
$$\int_{F(U)} G(\vec v)\,d\vec v = \int_{U} G(\vec x)\left|\frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}\right|d\vec x.$$

Example. Let $(x, y) = F(r, \theta) = (ar\cos\theta,\ br\sin\theta)$; then $F([0,1]\times[0,2\pi])$ is the region enclosed by the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. The Jacobian matrix is
$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial\theta}\\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial\theta} \end{bmatrix} = \begin{bmatrix} a\cos\theta & -ar\sin\theta\\ b\sin\theta & br\cos\theta \end{bmatrix} \quad\text{and}\quad \left|\frac{\partial(x, y)}{\partial(r, \theta)}\right| = abr.$$
By the change of variables formula,
$$\int_{F([0,1]\times[0,2\pi])} 1\,d\vec v = \int_{\theta=0}^{2\pi}\int_{r=0}^{1} 1\cdot\left|\frac{\partial(x, y)}{\partial(r, \theta)}\right|dr\,d\theta = \int_{0}^{2\pi}\int_{0}^{1} abr\,dr\,d\theta = ab\pi.$$

5. Wronskian: The Wronskian of n real-valued, (n−1)-times differentiable functions $f_1, \dots, f_n$ is
$$W(f_1, \dots, f_n)(x) = \begin{vmatrix} f_1(x) & \cdots & f_n(x)\\ f_1'(x) & \cdots & f_n'(x)\\ \vdots & \ddots & \vdots\\ f_1^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}.$$
If $W(f_1, \dots, f_n)$ is not identically zero, then $f_1, \dots, f_n$ are linearly independent. (The converse fails for general differentiable functions, though it holds, e.g., for solutions of a common linear ODE.)

4.4 Properties of Determinants

Theorem 4.8. For an n×n matrix A and n×n elementary matrices $E_{ij}$, $E_i(c)$, $E_{ij}(c)$, we have $\det E_{ij} = -1$, $\det E_i(c) = c$, $\det E_{ij}(c) = 1$, and
$$\det(E_{ij}A) = -\det A = (\det E_{ij})(\det A),$$
$$\det(E_i(c)A) = c\det A = (\det E_i(c))(\det A),$$
$$\det(E_{ij}(c)A) = \det A = (\det E_{ij}(c))(\det A).$$

Proof. Use cofactor expansion and induction on n.

Theorem 4.9. Let A be an n×n matrix. Then A is invertible if and only if $\det(A) \neq 0$.

Proof. Suppose A is invertible. Then $A^{-1}$ is invertible and there are elementary matrices $E_1, E_2, \dots, E_k$ such that $E_kE_{k-1}\cdots E_1A^{-1} = I_n$. Postmultiplying by A, we get
$$E_kE_{k-1}\cdots E_1 = A \implies \det(E_kE_{k-1}\cdots E_1) = \det(A).$$


By successively applying Theorem 4.8, we get
$$\det(A) = \det(E_kE_{k-1}\cdots E_1) = \det(E_k)\det(E_{k-1})\cdots\det(E_1) \neq 0.$$
For the converse, suppose that A is not invertible. Then the RREF R of A is not $I_n$. So R is an upper-triangular matrix whose last row is a zero row, and consequently $\det(R) = 0$. Suppose $E_1', E_2', \dots, E_t'$ are elementary matrices for which $E_t'E_{t-1}'\cdots E_1'A = R$. Then
$$\det(E_t'E_{t-1}'\cdots E_1'A) = \det(R) = 0 \implies \det(E_t')\det(E_{t-1}')\cdots\det(E_1')\det(A) = 0,$$
by Theorem 4.8. Since $\det(E_i') \neq 0$ for $i = 1, 2, \dots, t$, $\det(A) = 0$.

Remark. We extend the IMT by adding one more equivalent condition:

(a) A is invertible.

(m) $\det(A) \neq 0$.

Theorem 4.10. Let A and B be two n× n matrices. Then det(AB) = det(A) det(B).

Proof. Case 1: A is not invertible.
By the IMT, rank(A) < n. Since CS(AB) ⊆ CS(A), rank(AB) ≤ rank(A) < n, and consequently AB is also not invertible. By the IMT, $\det(A) = 0$ and $\det(AB) = 0$. Thus
$$\det(AB) = 0 = \det(A)\det(B).$$
Case 2: A is invertible.
There are elementary matrices $E_1, E_2, \dots, E_k$ such that $E_kE_{k-1}\cdots E_1 = A$. Postmultiplying by B, we get $AB = E_kE_{k-1}\cdots E_1B$. By successively applying Theorem 4.8, we get
$$\det(AB) = \det(E_kE_{k-1}\cdots E_1B) = \det(E_k)\det(E_{k-1})\cdots\det(E_1)\det(B) = \det(E_kE_{k-1}\cdots E_1)\det(B) = \det(A)\det(B).$$

Corollary 4.11. Let A be an n× n matrix.

(a) For all scalars c, $\det(cA) = \det(cI_nA) = \det(cI_n)\det(A) = c^n\det(A)$.

(b) If A is invertible, then det(A) det(A−1) = det(AA−1) = det(In) = 1.

Example. $A = \begin{bmatrix} 1&2&3\\ 3&5&1\\ 0&0&2 \end{bmatrix}$. Is A invertible? Compute $\det(A^T)$, $\det(4A^5)$, and $\det(A^{-1})$.
Since $\det(A) = -2 \neq 0$, A is invertible and we have $\det(A^T) = \det(A) = -2$, $\det(4A^5) = 4^3(\det A)^5 = 64\cdot(-32) = -2048$, and $\det(A^{-1}) = (\det A)^{-1} = (-2)^{-1} = -1/2$.


Theorem 4.12 (Cramer's Rule). Let A be an n×n invertible matrix and $\vec b \in \mathbb{R}^n$. The unique solution $\vec x = [x_1, x_2, \dots, x_n]^T$ of $A\vec x = \vec b$ is given by
$$x_i = \frac{\det(A_i(\vec b))}{\det(A)},\quad i = 1, 2, \dots, n,$$
where $A_i(\vec b)$ is the matrix obtained from A by replacing its ith column by $\vec b$.

Proof. Let $i \in \{1, 2, \dots, n\}$. Note that
$$A[\vec e_1 \cdots \vec e_{i-1}\ \vec x\ \vec e_{i+1} \cdots \vec e_n] = [A\vec e_1 \cdots A\vec e_{i-1}\ A\vec x\ A\vec e_{i+1} \cdots A\vec e_n] = [\vec A_1 \cdots \vec A_{i-1}\ \vec b\ \vec A_{i+1} \cdots \vec A_n] = A_i(\vec b),$$
where $\vec A_j$ is the jth column of A. Since $\det([\vec e_1 \cdots \vec e_{i-1}\ \vec x\ \vec e_{i+1} \cdots \vec e_n]) = x_i$,
$$\det(A_i(\vec b)) = \det(A[\vec e_1 \cdots \vec e_{i-1}\ \vec x\ \vec e_{i+1} \cdots \vec e_n]) = \det(A)\det([\vec e_1 \cdots \vec e_{i-1}\ \vec x\ \vec e_{i+1} \cdots \vec e_n]) = \det(A)\,x_i.$$
Thus $x_i = \dfrac{\det(A_i(\vec b))}{\det(A)}$.

Example. We solve $A\vec x = \vec b$ by Cramer's Rule, where
$$A = \begin{bmatrix} 1&0&2\\ 3&2&5\\ 1&1&4 \end{bmatrix},\quad \vec x = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix},\quad \vec b = \begin{bmatrix} 1\\ 8\\ 1 \end{bmatrix}.$$
Since $\det(A) = 5 \neq 0$, there is a unique solution $[x_1, x_2, x_3]^T$, and by Cramer's Rule,
$$x_1 = \frac{\det(A_1(\vec b))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1&0&2\\ 8&2&5\\ 1&1&4 \end{vmatrix} = \frac{15}{5} = 3,\qquad x_2 = \frac{\det(A_2(\vec b))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1&1&2\\ 3&8&5\\ 1&1&4 \end{vmatrix} = \frac{10}{5} = 2,$$
$$x_3 = \frac{\det(A_3(\vec b))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1&0&1\\ 3&2&8\\ 1&1&1 \end{vmatrix} = \frac{-5}{5} = -1.$$
Thus the unique solution is $[x_1, x_2, x_3]^T = [3, 2, -1]^T$.
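Cramer's Rule is a few lines of numpy (illustrative only; for anything beyond small systems, `np.linalg.solve` is far more efficient and numerically stable):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule; assumes det(A) != 0."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # A_i(b): replace column i by b
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[1., 0., 2.], [3., 2., 5.], [1., 1., 4.]])
b = np.array([1., 8., 1.])
print(cramer(A, b))           # ≈ [ 3.  2. -1.]
print(np.linalg.solve(A, b))  # agrees
```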


Definition. Let A be an n×n matrix. The cofactor matrix of A, denoted by $C = [c_{ij}]$, is the n×n matrix where $c_{ij}$ is the (i, j) cofactor of A. The adjoint or adjugate of A, denoted by adj A or adj(A), is the transpose of the cofactor matrix of A, i.e., $\operatorname{adj}A = C^T$.

Theorem 4.13. Let A be an n×n invertible matrix. Then
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}A.$$

Proof. Since $AA^{-1} = I_n$, $A\cdot(\text{column } j \text{ of } A^{-1}) = \vec e_j$. By Cramer's Rule, the (i, j)-entry of $A^{-1}$, i.e., the ith entry of column j of $A^{-1}$, is
$$\frac{\det(A_i(\vec e_j))}{\det(A)} = \frac{(-1)^{i+j}\det(A(j,i))}{\det(A)} = \frac{c_{ji}}{\det(A)} = \frac{(C^T)_{ij}}{\det(A)} = \frac{(\operatorname{adj}A)_{ij}}{\det(A)}.$$

Example.

1. For invertible $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$,
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}A = \frac{1}{ad-bc}\begin{bmatrix} c_{11} & c_{12}\\ c_{21} & c_{22} \end{bmatrix}^T = \frac{1}{ad-bc}\begin{bmatrix} d & -c\\ -b & a \end{bmatrix}^T = \frac{1}{ad-bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.$$

2. For invertible $A = \begin{bmatrix} 1&0&2\\ 3&2&5\\ 1&1&4 \end{bmatrix}$,
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}A = \frac{1}{5}\begin{bmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & c_{32} & c_{33} \end{bmatrix}^T = \frac{1}{5}\begin{bmatrix} 3 & -7 & 1\\ 2 & 2 & -1\\ -4 & 1 & 2 \end{bmatrix}^T = \frac{1}{5}\begin{bmatrix} 3 & 2 & -4\\ -7 & 2 & 1\\ 1 & -1 & 2 \end{bmatrix}.$$

We end with the following useful multilinearity property of the determinant:

Theorem 4.14. Let $A = [\vec a_1\ \vec a_2 \cdots \vec a_n]$ be an n×n matrix. Then for all $\vec x, \vec y \in \mathbb{R}^n$ and for all scalars c, d,
$$\det[\vec a_1 \cdots \vec a_{i-1}\ (c\vec x + d\vec y)\ \vec a_{i+1} \cdots \vec a_n] = c\det[\vec a_1 \cdots \vec a_{i-1}\ \vec x\ \vec a_{i+1} \cdots \vec a_n] + d\det[\vec a_1 \cdots \vec a_{i-1}\ \vec y\ \vec a_{i+1} \cdots \vec a_n].$$

Proof. (Sketch) Compute the determinants by cofactor expansion along the ith column.

Example.
$$\begin{vmatrix} 3a+4s & 3b+4t\\ c & d \end{vmatrix} = \begin{vmatrix} 3a+4s & c\\ 3b+4t & d \end{vmatrix} \text{ (by transposing)} = \begin{vmatrix} 3a & c\\ 3b & d \end{vmatrix} + \begin{vmatrix} 4s & c\\ 4t & d \end{vmatrix} = 3\begin{vmatrix} a & c\\ b & d \end{vmatrix} + 4\begin{vmatrix} s & c\\ t & d \end{vmatrix} \text{ (by multilinearity)} = 3(ad - cb) + 4(sd - ct).$$


5 Eigenvalues and Eigenvectors

5.1 Basics of Eigenvalues and Eigenvectors

Definition. Let A be an n×n matrix. If $A\vec x = \lambda\vec x$ for some nonzero vector $\vec x$ and some scalar λ, then λ is an eigenvalue of A and $\vec x$ is an eigenvector of A corresponding to λ.

Example. Consider $A = \begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix}$, $\lambda = 3$, $\vec v = \begin{bmatrix} 1\\ 1 \end{bmatrix}$, $\vec u = \begin{bmatrix} -2\\ 1 \end{bmatrix}$.

Since $A\vec v = \begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1\\ 1 \end{bmatrix} = \begin{bmatrix} 3\\ 3 \end{bmatrix} = 3\begin{bmatrix} 1\\ 1 \end{bmatrix} = \lambda\vec v$, 3 is an eigenvalue of A and $\vec v$ is an eigenvector of A corresponding to the eigenvalue 3.

Since $A\vec u = \begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix}\begin{bmatrix} -2\\ 1 \end{bmatrix} = \begin{bmatrix} 0\\ 3 \end{bmatrix} \neq \lambda\begin{bmatrix} -2\\ 1 \end{bmatrix} = \lambda\vec u$ for all scalars λ, $\vec u$ is not an eigenvector of A.
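Numerically, eigenpairs can be checked with `np.linalg.eig` (a numpy sketch under our assumption of numpy; the returned eigenvectors are normalized, so they may differ from $[1,1]^T$ by a scalar multiple):

```python
import numpy as np

A = np.array([[1., 2.], [0., 3.]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors
print(eigvals)                        # eigenvalues 1 and 3

i = np.argmin(np.abs(eigvals - 3))    # pick the eigenpair for lambda = 3
v = eigvecs[:, i]                     # a scalar multiple of [1, 1]^T
print(np.allclose(A @ v, 3 * v))      # True: Av = 3v
```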

Remark. For a real matrix, an eigenvalue can be a complex number and an eigenvector can be a complex vector.

Example. Consider $A = \begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}$. Since $\begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1\\ i \end{bmatrix} = \begin{bmatrix} i\\ -1 \end{bmatrix} = i\begin{bmatrix} 1\\ i \end{bmatrix}$, i is an eigenvalue of A and $[1, i]^T$ is an eigenvector of A corresponding to the eigenvalue i.

Remark. An eigenvector must be a nonzero vector by definition. So the following are equivalent:

1. λ is an eigenvalue of A

2. $A\vec x = \lambda\vec x$ for some nonzero vector $\vec x$

3. $(A - \lambda I)\vec x = \vec 0$ for some nonzero vector $\vec x$

4. $(A - \lambda I)\vec x = \vec 0$ has a nontrivial solution $\vec x$

5. $A - \lambda I$ is not invertible (by the IMT)

6. $\det(A - \lambda I) = 0$.

Definition. $\det(\lambda I - A)$ is a polynomial in λ, called the characteristic polynomial of A; $\det(\lambda I - A) = 0$ is the characteristic equation of A.

Remark. Since the roots of the characteristic polynomial are the eigenvalues of the n×n matrix A, A has n eigenvalues, not necessarily distinct.

Definition. The multiplicity of a root λ in det(λI − A) is the algebraic multiplicity of theeigenvalue λ of A.


Remark. If λ is an eigenvalue of A, then $\mathrm{NS}(A - \lambda I)$ is the union of $\{\vec 0\}$ and the set of all eigenvectors of A corresponding to the eigenvalue λ.

Definition. Suppose λ is an eigenvalue of the matrix A. Then
$$\mathrm{NS}(A - \lambda I) = \{\vec x \mid (A - \lambda I)\vec x = \vec 0\}$$
is the eigenspace of A corresponding to the eigenvalue λ, and $\dim(\mathrm{NS}(A - \lambda I))$ is the geometric multiplicity of the eigenvalue λ.

Example. Let $A = \begin{bmatrix} 3&0&0\\ 0&4&1\\ 0&-2&1 \end{bmatrix}$.

(a) Find the characteristic polynomial of A.

(b) Find the eigenvalues of A with their algebraic multiplicities.

(c) Find the eigenspaces of A and the geometric multiplicities of the eigenvalues of A.

Solution. (a) The characteristic polynomial of A is
$$\det(\lambda I - A) = \begin{vmatrix} \lambda-3 & 0 & 0\\ 0 & \lambda-4 & -1\\ 0 & 2 & \lambda-1 \end{vmatrix} = (\lambda-3)\begin{vmatrix} \lambda-4 & -1\\ 2 & \lambda-1 \end{vmatrix} - 0 + 0 = (\lambda-3)(\lambda^2 - 5\lambda + 6) = (\lambda-3)(\lambda-3)(\lambda-2).$$

(b) $\det(\lambda I - A) = (\lambda-2)(\lambda-3)^2 = 0 \implies \lambda = 2, 3, 3$.
So 2 and 3 are eigenvalues of A with algebraic multiplicities 1 and 2 respectively.

(c) The eigenspace of A corresponding to the eigenvalue 3 is
$$\mathrm{NS}(A - 3I) = \{\vec x \mid (A - 3I)\vec x = \vec 0\}.$$
$$[A - 3I \mid \vec 0] = \left[\begin{array}{ccc|c} 0&0&0&0\\ 0&1&1&0\\ 0&-2&-2&0 \end{array}\right] \xrightarrow{2R_2+R_3} \left[\begin{array}{ccc|c} 0&0&0&0\\ 0&1&1&0\\ 0&0&0&0 \end{array}\right] \xrightarrow{R_1\leftrightarrow R_2} \left[\begin{array}{ccc|c} 0&1&1&0\\ 0&0&0&0\\ 0&0&0&0 \end{array}\right]$$
So we get $x_2 + x_3 = 0$, where $x_1$ and $x_3$ are free variables. Thus
$$\vec x = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} x_1\\ -x_3\\ x_3 \end{bmatrix} = x_1\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} \in \operatorname{Span}\left\{\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix}\right\}.$$


Thus the eigenspace of A corresponding to the eigenvalue 3 is
$$\mathrm{NS}(A - 3I) = \operatorname{Span}\left\{\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix}\right\},$$
and the geometric multiplicity of the eigenvalue 3 is $\dim(\mathrm{NS}(A - 3I)) = 2$.

The eigenspace of A corresponding to the eigenvalue 2 is
$$\mathrm{NS}(A - 2I) = \{\vec x \mid (A - 2I)\vec x = \vec 0\}.$$
$$[A - 2I \mid \vec 0] = \left[\begin{array}{ccc|c} 1&0&0&0\\ 0&2&1&0\\ 0&-2&-1&0 \end{array}\right] \xrightarrow{R_2+R_3} \left[\begin{array}{ccc|c} 1&0&0&0\\ 0&2&1&0\\ 0&0&0&0 \end{array}\right] \xrightarrow{\frac{R_2}{2}} \left[\begin{array}{ccc|c} 1&0&0&0\\ 0&1&\frac12&0\\ 0&0&0&0 \end{array}\right]$$
So we get $x_1 = 0$ and $x_2 + \frac{x_3}{2} = 0$, where $x_3$ is a free variable. Thus
$$\vec x = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ -\frac{x_3}{2}\\ x_3 \end{bmatrix} = \frac{x_3}{2}\begin{bmatrix} 0\\ -1\\ 2 \end{bmatrix} \in \operatorname{Span}\left\{\begin{bmatrix} 0\\ -1\\ 2 \end{bmatrix}\right\}.$$
Thus the eigenspace of A corresponding to the eigenvalue 2 is
$$\mathrm{NS}(A - 2I) = \operatorname{Span}\left\{\begin{bmatrix} 0\\ -1\\ 2 \end{bmatrix}\right\},$$
and the geometric multiplicity of the eigenvalue 2 is $\dim(\mathrm{NS}(A - 2I)) = 1$.
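A quick numerical cross-check of this example (a numpy sketch; the geometric multiplicity is read off as $n - \mathrm{rank}(A - \lambda I)$):

```python
import numpy as np

A = np.array([[3., 0., 0.], [0., 4., 1.], [0., -2., 1.]])
print(np.linalg.eigvals(A))     # eigenvalues 3, 3, 2 (up to ordering)

for lam in (3.0, 2.0):
    # geometric multiplicity = n - rank(A - lam I)
    gm = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, gm)              # 3.0 -> 2,  2.0 -> 1
```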

Remark. Recall that $\vec x \mapsto A\vec x$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. This linear transformation is invariant on the eigenspaces of A:

If λ is an eigenvalue of A and $\vec x \in \mathrm{NS}(A - \lambda I)$, then $A\vec x \in \mathrm{NS}(A - \lambda I)$.

Example. In the preceding example, $\vec x = [4, -5, 5]^T \in \mathrm{NS}(A - 3I) = \operatorname{Span}\{[1,0,0]^T, [0,-1,1]^T\}$, and also
$$A\vec x = A\begin{bmatrix} 4\\ -5\\ 5 \end{bmatrix} = \begin{bmatrix} 12\\ -15\\ 15 \end{bmatrix} = 3\begin{bmatrix} 4\\ -5\\ 5 \end{bmatrix} \in \mathrm{NS}(A - 3I) = \operatorname{Span}\{[1,0,0]^T, [0,-1,1]^T\}.$$

Theorem 5.1 (IMT contd.). Let A be an n×n matrix. Then the following are equivalent:

(a) A is invertible.

(o) 0 is not an eigenvalue of A.

Proof. (Contrapositive) 0 is an eigenvalue of A iff $A\vec x = \vec 0$ has a nontrivial solution. By the IMT, $A\vec x = \vec 0$ has a nontrivial solution iff A is not invertible.


Some useful results:

Theorem 5.2. Let A be an n×n matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$. Then $\det A = \lambda_1\lambda_2\cdots\lambda_n$.

Proof. Note that $\det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$. Plugging in $\lambda = 0$, we get $(-1)^n\det A = (-1)^n\lambda_1\lambda_2\cdots\lambda_n \implies \det A = \lambda_1\lambda_2\cdots\lambda_n$.

Theorem 5.3. The eigenvalues of a triangular matrix (e.g., a diagonal matrix) are the entries on its main diagonal.

Proof. Consider an upper-triangular matrix
$$A = \begin{bmatrix} d_1 & & \ast\\ & \ddots & \\ 0 & & d_n \end{bmatrix}.$$
Its characteristic polynomial is $\det(\lambda I - A) = (\lambda - d_1)(\lambda - d_2)\cdots(\lambda - d_n)$. So $\det(\lambda I - A) = 0 \implies \lambda = d_1, \dots, d_n$.

Theorem 5.4. Let A be a square matrix. If λ is an eigenvalue of A, then $\lambda^k$ is an eigenvalue of $A^k$.

Proof. Suppose $A\vec v = \lambda\vec v$, $\vec v \neq \vec 0$. Then $A(A\vec v) = A(\lambda\vec v)$, so
$$A^2\vec v = \lambda(A\vec v) = \lambda(\lambda\vec v) = \lambda^2\vec v.$$
Continuing this process, we get $A^k\vec v = \lambda^k\vec v$.

Theorem 5.5. Let A be an invertible matrix. Then λ is an eigenvalue of A if and only if $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$.

Proof. Suppose $A\vec v = \lambda\vec v$, $\vec v \neq \vec 0$. Since A is invertible, $\lambda \neq 0$. Then
$$A^{-1}(A\vec v) = A^{-1}(\lambda\vec v) \implies \vec v = \lambda(A^{-1}\vec v) \implies \frac{1}{\lambda}\vec v = A^{-1}\vec v.$$
So $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$. The converse follows by a similar argument.

Theorem 5.6. Let A be a square matrix. If $\vec v_1, \vec v_2, \dots, \vec v_k$ are eigenvectors of A corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ of A respectively, then $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ is linearly independent.

Proof. Let $\lambda_1, \lambda_2, \dots, \lambda_k$ be distinct and $A\vec v_i = \lambda_i\vec v_i$, $\vec v_i \neq \vec 0$, for $i = 1, \dots, k$. Suppose $\{\vec v_1, \dots, \vec v_k\}$ is linearly dependent. WLOG let $\{\vec v_1, \dots, \vec v_p\}$ be a maximal linearly independent subset of $\{\vec v_1, \dots, \vec v_k\}$ for some $p < k$. Then $\{\vec v_1, \dots, \vec v_p, \vec v_{p+1}\}$ is linearly dependent and consequently
$$\vec v_{p+1} = c_1\vec v_1 + \cdots + c_p\vec v_p, \tag{2}$$


for some scalars $c_1, \dots, c_p$, not all zero (since $\vec v_{p+1} \neq \vec 0$). Applying A to (2),
$$A\vec v_{p+1} = A(c_1\vec v_1 + \cdots + c_p\vec v_p) \implies \lambda_{p+1}\vec v_{p+1} = c_1\lambda_1\vec v_1 + \cdots + c_p\lambda_p\vec v_p. \tag{3}$$
Subtracting (3) from $\lambda_{p+1}$ times (2) gives
$$\vec 0 = c_1(\lambda_{p+1} - \lambda_1)\vec v_1 + \cdots + c_p(\lambda_{p+1} - \lambda_p)\vec v_p. \tag{4}$$
Since $\lambda_{p+1} - \lambda_i \neq 0$ for $i = 1, \dots, p$ and $c_1, \dots, c_p$ are not all zero, the coefficients $c_1(\lambda_{p+1} - \lambda_1), \dots, c_p(\lambda_{p+1} - \lambda_p)$ are not all zero. So (4) implies $\{\vec v_1, \dots, \vec v_p\}$ is linearly dependent, a contradiction.

Remark. The converse of the preceding theorem is not true. Consider $A = \begin{bmatrix} 3&0&0\\ 0&4&1\\ 0&-2&1 \end{bmatrix}$ from the last example. $\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix}$ are linearly independent eigenvectors of A, yet they are eigenvectors corresponding to the same eigenvalue 3 of A.

5.2 Similar and Diagonalizable Matrices

Definition. Let A and B be n×n matrices. A is similar to B if $A = PBP^{-1}$ for some invertible matrix P.

Remark. If A is similar to B, then B is similar to A because $B = P^{-1}A(P^{-1})^{-1}$. So we simply say A and B are similar.

Example. Consider $A = \begin{bmatrix} 6 & -1\\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 5 & 0\\ 0 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 5 & 4\\ 0 & 0 \end{bmatrix}$.

A and B are similar because $A = PBP^{-1}$ where $P = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}$ and $P^{-1} = \begin{bmatrix} 2 & -1\\ -1 & 1 \end{bmatrix}$. It can be verified that there is no invertible matrix P such that $A = PCP^{-1}$, so A and C are not similar.

Theorem 5.7. If n×n matrices A and B are similar, then they have the same characteristic polynomial, and consequently the same eigenvalues, counting multiplicities.

Proof. Let $A = PBP^{-1}$ for some invertible matrix P. Then
$$\det(\lambda I - A) = \det(\lambda I - PBP^{-1}) = \det(\lambda PP^{-1} - PBP^{-1}) = \det(P(\lambda I - B)P^{-1}) = \det P\,\det(\lambda I - B)\,\det(P^{-1}) = \det(\lambda I - B)\det(PP^{-1}) = \det(\lambda I - B)\cdot 1.$$


Remark.

1. In the preceding example A and B are similar, so they have the same eigenvalues. Since the eigenvalues of C and B are different, C is not similar to B, and hence not to A.

2. If A and B are similar, they have the same eigenvalues. But the converse is not true: for example, $\begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1\\ 0 & 0 \end{bmatrix}$ have the same eigenvalues but are not similar.

Theorem 5.8. Let A be an n×n matrix with eigenvalue λ. Then the geometric multiplicity of λ is less than or equal to the algebraic multiplicity of λ.

Proof. Let k be the geometric multiplicity of λ, and suppose $\vec x_1, \dots, \vec x_k$ are k linearly independent eigenvectors of A corresponding to λ. Extend them to an n×n invertible matrix $P = [\vec x_1 \cdots \vec x_k\ \ast \cdots \ast]$. Then
$$P^{-1}AP = \begin{bmatrix} \lambda I_k & \ast\\ 0 & \ast \end{bmatrix},$$
and λ is an eigenvalue of $P^{-1}AP$ with algebraic multiplicity at least k. Since A and $P^{-1}AP$, being similar, have the same eigenvalues, the algebraic multiplicity of λ is at least k.

Definition. A square matrix A is diagonalizable if A is similar to a diagonal matrix, i.e., $A = PDP^{-1}$ for some invertible matrix P and some diagonal matrix D.

Example. In the first example of this section, A is diagonalizable as $A = PBP^{-1}$ where B is a diagonal matrix.

Theorem 5.9. Let A be an n× n matrix. Then TFAE.

(a) A is diagonalizable,

(b) There are n linearly independent eigenvectors of A,

(c) The sum of the geometric multiplicities of the distinct eigenvalues of A is n, and

(d) Geometric multiplicity and algebraic multiplicity are the same for all eigenvalues of A.

Proof. First note that (b), (c), and (d) are equivalent. So we prove (a)⟺(b).

(a)⟹(b). There is an invertible matrix $P = [\vec p_1, \dots, \vec p_n]$ such that $A = P\,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\,P^{-1}$, i.e., $AP = P\,\mathrm{diag}(\lambda_1, \dots, \lambda_n)$. So $A\vec p_i = \lambda_i\vec p_i$ for $i = 1, \dots, n$, and hence $\vec p_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i$ for $i = 1, \dots, n$. Since P is invertible, its columns $\vec p_1, \dots, \vec p_n$ are linearly independent by the IMT.

(b)⟹(a). Suppose $\vec x_1, \dots, \vec x_n$ are n linearly independent eigenvectors of A corresponding to the eigenvalues $\lambda_1, \dots, \lambda_n$ respectively. Then $P = [\vec x_1, \dots, \vec x_n]$ is invertible by the IMT. Since $A\vec x_i = \lambda_i\vec x_i$ for $i = 1, \dots, n$, we get
$$[A\vec x_1, \dots, A\vec x_n] = [\lambda_1\vec x_1, \dots, \lambda_n\vec x_n] \implies A[\vec x_1, \dots, \vec x_n] = [\vec x_1, \dots, \vec x_n]\,\mathrm{diag}(\lambda_1, \dots, \lambda_n) \implies AP = PD,$$
where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Thus $A = PDP^{-1}$.


Corollary 5.10. Let A be an n×n matrix.

1. If A has n distinct eigenvalues, then A is diagonalizable.

2. Suppose that A has k distinct eigenvalues $\lambda_1, \dots, \lambda_k$ with eigenbases $B_1, \dots, B_k$ respectively. Then A is diagonalizable if and only if $B_1 \cup \cdots \cup B_k$ is a basis of $\mathbb{R}^n$.

A formula for $A^k$: Suppose A is diagonalizable and $A = PDP^{-1}$ for some diagonal matrix D. Then
$$A^k = PD^kP^{-1}.$$
It is easy to see that
$$AA\cdots A = (PDP^{-1})(PDP^{-1})\cdots(PDP^{-1}) = PDD\cdots DP^{-1} = PD^kP^{-1}.$$
Note that $D^k$ is obtained from D by raising each diagonal entry of D to the power k.

Example. Let $A = \begin{bmatrix} 2&0&0\\ 1&2&1\\ -1&0&1 \end{bmatrix}$.

(a) Diagonalize A, if possible.

(b) Find $A^k$, if A is diagonalizable.

Solution. $\det(\lambda I - A) = \begin{vmatrix} \lambda-2 & 0 & 0\\ -1 & \lambda-2 & -1\\ 1 & 0 & \lambda-1 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2 = 0 \implies \lambda = 1, 2, 2$.

Verify the following:
$$\mathrm{NS}(A - 1I) = \operatorname{Span}\left\{\begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix}\right\},\qquad \mathrm{NS}(A - 2I) = \operatorname{Span}\left\{\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}, \begin{bmatrix} -1\\ 0\\ 1 \end{bmatrix}\right\}.$$

(a) Since the 3×3 matrix A has 3 linearly independent eigenvectors, A is diagonalizable and $A = PDP^{-1}$ where
$$D = \begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&2 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0&0&-1\\ -1&1&0\\ 1&0&1 \end{bmatrix}.$$
You may verify this by showing $AP = PD = \begin{bmatrix} 0&0&-2\\ -1&2&0\\ 1&0&2 \end{bmatrix}$.


(b) Since $A = PDP^{-1}$,
$$A^k = PD^kP^{-1} = \begin{bmatrix} 0&0&-1\\ -1&1&0\\ 1&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&2 \end{bmatrix}^k\begin{bmatrix} 1&0&1\\ 1&1&1\\ -1&0&0 \end{bmatrix} = \begin{bmatrix} 0&0&-1\\ -1&1&0\\ 1&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&2^k&0\\ 0&0&2^k \end{bmatrix}\begin{bmatrix} 1&0&1\\ 1&1&1\\ -1&0&0 \end{bmatrix} = \begin{bmatrix} 2^k&0&0\\ -1+2^k&2^k&-1+2^k\\ 1-2^k&0&1 \end{bmatrix}.$$

An interesting fact: If A and B are diagonalizable and they have the same eigenvectors, then AB = BA.
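The $A^k$ formula is easy to verify numerically; a minimal numpy sketch using this P and D:

```python
import numpy as np

P = np.array([[0., 0., -1.], [-1., 1., 0.], [1., 0., 1.]])
D = np.diag([1., 2., 2.])
A = P @ D @ np.linalg.inv(P)   # reconstructs A = [[2,0,0],[1,2,1],[-1,0,1]]

k = 5
Ak = P @ np.diag([1., 2.**k, 2.**k]) @ np.linalg.inv(P)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
print(Ak)  # matches [[2^k,0,0],[-1+2^k,2^k,-1+2^k],[1-2^k,0,1]] with k = 5
```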

5.3 Similarity of Matrix Transformations

Suppose $B = (\vec b_1, \dots, \vec b_n)$ is an ordered basis of $\mathbb{R}^n$. Then any vector $\vec x \in \mathbb{R}^n$ can be written as $\vec x = c_1\vec b_1 + c_2\vec b_2 + \cdots + c_n\vec b_n$ for some unique scalars $c_1, c_2, \dots, c_n$. The coordinate vector of $\vec x$ relative to B, or the B-coordinate vector of $\vec x$, denoted by $[\vec x]_B$, is $[\vec x]_B = [c_1, c_2, \dots, c_n]^T$.

Example.

1. For $E_2 = (\vec e_1, \vec e_2)$ and $\vec x = [3, 2]^T = 3\vec e_1 + 2\vec e_2$, we have $[\vec x]_{E_2} = [3, 2]^T$.

2. For $B = (\vec e_1, \vec e_1 + 2\vec e_2)$ and $\vec x = [3, 2]^T = 2\vec e_1 + 1(\vec e_1 + 2\vec e_2)$, we have $[\vec x]_B = [2, 1]^T$.

Remark. $[\ ]_B$ is an isomorphism on $\mathbb{R}^n$.

For two ordered bases $B = (\vec b_1, \dots, \vec b_n)$ and $C = (\vec c_1, \dots, \vec c_n)$ of $\mathbb{R}^n$, what is the relationship between $[\vec x]_B$ and $[\vec x]_C$? The change of basis matrix from B to C, denoted by $M_{C\leftarrow B}$, is the n×n invertible matrix for which $[\vec x]_C = M_{C\leftarrow B}[\vec x]_B$ for all $\vec x \in \mathbb{R}^n$. How do we find $M_{C\leftarrow B}$?

Let A be an n×n matrix, and consider the linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(\vec x) = A\vec x$. So T is the matrix transformation $\vec x \mapsto A\vec x$. Consider two ordered bases $B = (\vec b_1, \dots, \vec b_n)$ and $C = (\vec c_1, \dots, \vec c_n)$ of $\mathbb{R}^n$. What is the relationship between $[\vec x]_B$ and $[T(\vec x)]_C$? The matrix of T from B to C, denoted by $[T]_{C\leftarrow B}$ or $_C[T]_B$, is the n×n matrix for which $[T(\vec x)]_C = {_C[T]_B}[\vec x]_B$ for all $\vec x \in \mathbb{R}^n$. How do we find $_C[T]_B$?


[Commutative diagram: on top, $T : \mathbb{R}^n \to \mathbb{R}^n$ sends $\vec x$ to $T(\vec x)$; on the bottom, $_C[T]_B$ sends $[\vec x]_B$ to $[T(\vec x)]_C$; the vertical maps are the coordinate isomorphisms $[\ ]_B$ and $[\ ]_C$.]

For a vector $\vec x \in \mathbb{R}^n$, suppose $[\vec x]_B = [r_1, r_2, \dots, r_n]^T$, i.e., $\vec x = r_1\vec b_1 + \cdots + r_n\vec b_n$. Then
$$T(\vec x) = A\vec x = A(r_1\vec b_1 + \cdots + r_n\vec b_n) = r_1A\vec b_1 + \cdots + r_nA\vec b_n,$$
so
$$[A\vec x]_C = [r_1A\vec b_1 + \cdots + r_nA\vec b_n]_C = r_1[A\vec b_1]_C + \cdots + r_n[A\vec b_n]_C = \big[[A\vec b_1]_C \cdots [A\vec b_n]_C\big]\begin{bmatrix} r_1\\ \vdots\\ r_n \end{bmatrix} = \big[[A\vec b_1]_C \cdots [A\vec b_n]_C\big][\vec x]_B.$$
Thus
$$_C[T]_B = \big[[A\vec b_1]_C \cdots [A\vec b_n]_C\big].$$

Remark.

1. If C = B, then we simply write $[T]_B$ for $_C[T]_B$, called the B-matrix of $T : \vec x \mapsto A\vec x$.

2. If $B = (\vec e_1, \dots, \vec e_n)$, then $[T]_B = A$, the standard matrix of $T : \vec x \mapsto A\vec x$.

3. If $A = I_n$, then $T = I$ and $_C[I]_B = M_{C\leftarrow B}$, the change of basis matrix from B to C.

Example. Let $B = \left(\begin{bmatrix} 1\\ 0 \end{bmatrix}, \begin{bmatrix} 1\\ 1 \end{bmatrix}\right)$, $C = \left(\begin{bmatrix} 1\\ 2 \end{bmatrix}, \begin{bmatrix} 3\\ 1 \end{bmatrix}\right)$, and $A = \begin{bmatrix} -2 & 7\\ 1 & 4 \end{bmatrix}$.

(a) Find $_C[T]_B$, the matrix of $T : \vec x \mapsto A\vec x$ from B to C, and use it to find $[T(\vec x)]_C$ where $[\vec x]_B = [13, -1]^T$.

(b) Find $[T]_B$, the B-matrix of $T : \vec x \mapsto A\vec x$.

(c) Find $M_{C\leftarrow B}$, the change of basis matrix from B to C, and use it to find $[\vec x]_C$ where $[\vec x]_B = [13, -1]^T$.


Solution. (a) $A\vec b_1 = \begin{bmatrix} -2\\ 1 \end{bmatrix} = 1\begin{bmatrix} 1\\ 2 \end{bmatrix} - 1\begin{bmatrix} 3\\ 1 \end{bmatrix} = 1\vec c_1 - 1\vec c_2 \implies [A\vec b_1]_C = \begin{bmatrix} 1\\ -1 \end{bmatrix}$

$A\vec b_2 = \begin{bmatrix} 5\\ 5 \end{bmatrix} = 2\begin{bmatrix} 1\\ 2 \end{bmatrix} + 1\begin{bmatrix} 3\\ 1 \end{bmatrix} = 2\vec c_1 + 1\vec c_2 \implies [A\vec b_2]_C = \begin{bmatrix} 2\\ 1 \end{bmatrix}$

So the matrix of $T : \vec x \mapsto A\vec x$ from B to C is
$$_C[T]_B = \big[[A\vec b_1]_C\ [A\vec b_2]_C\big] = \begin{bmatrix} 1 & 2\\ -1 & 1 \end{bmatrix}.$$
$$[T(\vec x)]_C = {_C[T]_B}[\vec x]_B = \begin{bmatrix} 1 & 2\\ -1 & 1 \end{bmatrix}\begin{bmatrix} 13\\ -1 \end{bmatrix} = \begin{bmatrix} 11\\ -14 \end{bmatrix}.$$

(b) $A\vec b_1 = \begin{bmatrix} -2\\ 1 \end{bmatrix} = -3\begin{bmatrix} 1\\ 0 \end{bmatrix} + 1\begin{bmatrix} 1\\ 1 \end{bmatrix} = -3\vec b_1 + 1\vec b_2 \implies [A\vec b_1]_B = \begin{bmatrix} -3\\ 1 \end{bmatrix}$

$A\vec b_2 = \begin{bmatrix} 5\\ 5 \end{bmatrix} = 0\begin{bmatrix} 1\\ 0 \end{bmatrix} + 5\begin{bmatrix} 1\\ 1 \end{bmatrix} = 0\vec b_1 + 5\vec b_2 \implies [A\vec b_2]_B = \begin{bmatrix} 0\\ 5 \end{bmatrix}$

So the B-matrix of $T : \vec x \mapsto A\vec x$ is
$$[T]_B = \big[[A\vec b_1]_B\ [A\vec b_2]_B\big] = \begin{bmatrix} -3 & 0\\ 1 & 5 \end{bmatrix}.$$

(c) $\vec b_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix} = -\frac{1}{5}\begin{bmatrix} 1\\ 2 \end{bmatrix} + \frac{2}{5}\begin{bmatrix} 3\\ 1 \end{bmatrix} = -\frac{1}{5}\vec c_1 + \frac{2}{5}\vec c_2 \implies [\vec b_1]_C = \begin{bmatrix} -\frac15\\ \frac25 \end{bmatrix}$

$\vec b_2 = \begin{bmatrix} 1\\ 1 \end{bmatrix} = \frac{2}{5}\begin{bmatrix} 1\\ 2 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} 3\\ 1 \end{bmatrix} = \frac{2}{5}\vec c_1 + \frac{1}{5}\vec c_2 \implies [\vec b_2]_C = \begin{bmatrix} \frac25\\ \frac15 \end{bmatrix}$

So the change of basis matrix from B to C is
$$M_{C\leftarrow B} = {_C[I]_B} = \big[[\vec b_1]_C\ [\vec b_2]_C\big] = \begin{bmatrix} -\frac15 & \frac25\\ \frac25 & \frac15 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} -1 & 2\\ 2 & 1 \end{bmatrix}.$$
$$[\vec x]_C = M_{C\leftarrow B}[\vec x]_B = \frac{1}{5}\begin{bmatrix} -1 & 2\\ 2 & 1 \end{bmatrix}\begin{bmatrix} 13\\ -1 \end{bmatrix} = \begin{bmatrix} -3\\ 5 \end{bmatrix}.$$
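Each coordinate vector above is the solution of a small linear system (e.g., $[\vec x]_C$ solves $[\vec c_1\ \vec c_2]\,[\vec x]_C = \vec x$), so the whole computation is a few lines of numpy. A sketch of parts (a) and (c):

```python
import numpy as np

B = np.array([[1., 1.], [0., 1.]])   # columns b1, b2
C = np.array([[1., 3.], [2., 1.]])   # columns c1, c2
A = np.array([[-2., 7.], [1., 4.]])

# C[T]B: its columns are [A b_i]_C, i.e., solutions of C y = A b_i
T_CB = np.linalg.solve(C, A @ B)
print(T_CB)                          # [[1, 2], [-1, 1]]

# change of basis M_{C<-B}: its columns are [b_i]_C
M_CB = np.linalg.solve(C, B)
print(M_CB)                          # [[-0.2, 0.4], [0.4, 0.2]]
print(M_CB @ np.array([13., -1.]))   # [-3, 5]
```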

Theorem 5.11. Let A and D be two n×n matrices such that $A = PDP^{-1}$. If B is the basis of $\mathbb{R}^n$ formed from the columns of P, then the B-matrix of $\vec x \mapsto A\vec x$ is $D = P^{-1}AP$.

Proof. Let $P = [\vec b_1 \cdots \vec b_n]$ and $B = (\vec b_1, \dots, \vec b_n)$. Since $[\vec b_1 \cdots \vec b_n][\vec x]_B = \vec x$, we have $P[\vec x]_B = \vec x$. So $[\vec x]_B = P^{-1}\vec x$ for all $\vec x \in \mathbb{R}^n$.
$$[T]_B = \big[[A\vec b_1]_B \cdots [A\vec b_n]_B\big] = \big[P^{-1}A\vec b_1 \cdots P^{-1}A\vec b_n\big] = P^{-1}A[\vec b_1 \cdots \vec b_n] = P^{-1}AP = D \quad\text{since } A = PDP^{-1}.$$


Remark.

1. Suppose A is diagonalizable and $A = PDP^{-1}$ where D is a diagonal matrix. If B is the basis of $\mathbb{R}^n$ formed from the columns of P (linearly independent eigenvectors of A), then the B-matrix of $\vec x \mapsto A\vec x$ is the diagonal matrix D whose main diagonal entries are the corresponding eigenvalues of A.

2. The set of all matrix representations (i.e., B-matrices) of $\vec x \mapsto A\vec x$ is the set of all matrices similar to A.

Example. $A = \begin{bmatrix} -1 & 3\\ -3 & 5 \end{bmatrix}$. The eigenvalues of A are $\lambda = 2, 2$ with only one linearly independent eigenvector direction, spanned by $[1, 1]^T$. So A is not diagonalizable and no B-matrix of $\vec x \mapsto A\vec x$ is diagonal. We therefore find a vector $\vec w$ such that $(A - \lambda I)^2\vec w = \vec 0$ and $(A - \lambda I)\vec w \neq \vec 0$. Such a $\vec w$ is called a generalized eigenvector of A corresponding to the eigenvalue $\lambda = 2$. One such $\vec w$ is $\vec w = [1, 2]^T$; note $(A - 2I)\vec w = [3, 3]^T$, so we take the eigenvector $\vec v = [3, 3]^T$, making $(A - 2I)\vec w = \vec v$.

Now consider the basis $B = \left(\begin{bmatrix} 3\\ 3 \end{bmatrix}, \begin{bmatrix} 1\\ 2 \end{bmatrix}\right)$ of $\mathbb{R}^2$ consisting of an eigenvector and a generalized eigenvector of A. Then the B-matrix of $\vec x \mapsto A\vec x$ is the upper-triangular matrix
$$\begin{bmatrix} 2 & 1\\ 0 & 2 \end{bmatrix} = P^{-1}AP \quad\text{where } P = \begin{bmatrix} 3 & 1\\ 3 & 2 \end{bmatrix}.$$
This upper-triangular matrix $J = P^{-1}AP = \begin{bmatrix} 2 & 1\\ 0 & 2 \end{bmatrix}$ is called the Jordan form of A.

Theorem 5.12. Any n×n matrix is similar to an n×n matrix in Jordan form, i.e., $A = PJP^{-1}$ where
$$J = \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{bmatrix},\qquad J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1\\ & & & \lambda_i \end{bmatrix},$$
and $\lambda_1, \dots, \lambda_k$ are eigenvalues of A (not necessarily distinct).

5.4 Application to Differential Equations

Suppose $x_1, x_2, \dots, x_n$ are n functions of t. Consider the following system of n linear ODEs:
$$\begin{aligned} x_1' &= a_{11}(t)x_1 + a_{12}(t)x_2 + \cdots + a_{1n}(t)x_n + g_1(t)\\ x_2' &= a_{21}(t)x_1 + a_{22}(t)x_2 + \cdots + a_{2n}(t)x_n + g_2(t)\\ &\ \ \vdots\\ x_n' &= a_{n1}(t)x_1 + a_{n2}(t)x_2 + \cdots + a_{nn}(t)x_n + g_n(t). \end{aligned}$$


It can be written compactly in the matrix form
$$\vec x\,' = A\vec x + \vec g, \tag{5}$$
where
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},\quad \vec x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix},\quad \vec g = \begin{bmatrix} g_1\\ g_2\\ \vdots\\ g_n \end{bmatrix}.$$
A is called the coefficient matrix of (5). When $\vec g = \vec 0$, (5) is a homogeneous system; when $\vec g \neq \vec 0$, (5) is a nonhomogeneous system.

Theorem 5.13. If $\vec v_1, \dots, \vec v_n$ are n linearly independent eigenvectors of A corresponding to eigenvalues $\lambda_1, \dots, \lambda_n$ respectively, then $e^{\lambda_1 t}\vec v_1, \dots, e^{\lambda_n t}\vec v_n$ are n linearly independent solutions of
$$\vec x\,' = A\vec x,$$
and the general solution is
$$\vec x = c_1e^{\lambda_1 t}\vec v_1 + \cdots + c_ne^{\lambda_n t}\vec v_n,$$
for arbitrary scalars $c_1, \dots, c_n$.

Verify:
$$\vec x\,' = c_1e^{\lambda_1 t}\lambda_1\vec v_1 + \cdots + c_ne^{\lambda_n t}\lambda_n\vec v_n,$$
$$A\vec x = c_1e^{\lambda_1 t}A\vec v_1 + \cdots + c_ne^{\lambda_n t}A\vec v_n = c_1e^{\lambda_1 t}\lambda_1\vec v_1 + \cdots + c_ne^{\lambda_n t}\lambda_n\vec v_n.$$
Thus $\vec x\,' = A\vec x$.

Example. Suppose a particle is moving in a planar force field and its position vector $\vec x$ satisfies the IVP
$$\vec x\,' = A\vec x,\quad \vec x(0) = [5, 6]^T,$$
where $A = \begin{bmatrix} 2 & 0\\ 4 & -3 \end{bmatrix}$. Solve the IVP and sketch the trajectory of the particle in $\mathbb{R}^2$.

Solution. The eigenvalues of A are 2 and −3 with corresponding eigenvectors $\begin{bmatrix} 5\\ 4 \end{bmatrix}$ and $\begin{bmatrix} 0\\ 1 \end{bmatrix}$ respectively (show all the steps). So the general solution is


$$\vec x(t) = c_1e^{2t}\begin{bmatrix} 5\\ 4 \end{bmatrix} + c_2e^{-3t}\begin{bmatrix} 0\\ 1 \end{bmatrix}.$$
$$\vec x(0) = \begin{bmatrix} 5\\ 6 \end{bmatrix} \implies c_1\begin{bmatrix} 5\\ 4 \end{bmatrix} + c_2\begin{bmatrix} 0\\ 1 \end{bmatrix} = \begin{bmatrix} 5\\ 6 \end{bmatrix} \implies 5c_1 = 5,\ 4c_1 + c_2 = 6 \implies c_1 = 1,\ c_2 = 2.$$
So the solution is
$$\vec x(t) = e^{2t}\begin{bmatrix} 5\\ 4 \end{bmatrix} + 2e^{-3t}\begin{bmatrix} 0\\ 1 \end{bmatrix}.$$

[Figure: the trajectory of the particle in the $x_1x_2$-plane, passing through the point (5, 6).]

Geometric view:
$$\vec x(t) = e^{2t}\begin{bmatrix} 5\\ 4 \end{bmatrix} + 2e^{-3t}\begin{bmatrix} 0\\ 1 \end{bmatrix} = \begin{bmatrix} 5e^{2t}\\ 4e^{2t} + 2e^{-3t} \end{bmatrix} \implies x_1 = 5e^{2t},\ x_2 = 4e^{2t} + 2e^{-3t}.$$
Eliminating t by using $e^t = \sqrt{x_1/5}$, we get $x_1^3(4x_1 - 5x_2)^2 = 12500$, the trajectory of the particle whose planar motion is described by the given IVP.
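The eigenvector method is also how one would solve such a system numerically; here is a minimal numpy sketch of this IVP (a production code would instead use, e.g., an ODE integrator or a matrix exponential):

```python
import numpy as np

A = np.array([[2., 0.], [4., -3.]])
x0 = np.array([5., 6.])

lam, V = np.linalg.eig(A)     # eigenvalues and eigenvector columns
c = np.linalg.solve(V, x0)    # coefficients from the initial condition V c = x(0)

def x(t):
    # general solution: sum_i c_i e^{lam_i t} v_i
    return V @ (c * np.exp(lam * t))

print(x(0.0))                 # [5, 6]
print(x(1.0))                 # ≈ e^2 [5, 4] + 2 e^{-3} [0, 1]
```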


6 Inner-product and Orthogonality

6.1 Orthogonal Vectors in Rn

Definition. The inner product or dot product of two vectors $\vec u$ and $\vec v$ in $\mathbb{R}^n$, denoted by $\vec u\cdot\vec v$, is defined by $\vec u\cdot\vec v = \vec u^T\vec v$.

Example. For $\vec u = \begin{bmatrix} 1\\ -2\\ 3 \end{bmatrix}$ and $\vec v = \begin{bmatrix} 2\\ 1\\ -1 \end{bmatrix}$, $\vec u\cdot\vec v = \vec u^T\vec v = 1\cdot 2 - 2\cdot 1 + 3\cdot(-1) = -3$.

Theorem 6.1. The following are true for all −→u , −→v , −→w in Rn and for all scalars c, d in R.

(a) −→u · −→v = −→v · −→u . (symmetry)

(b) $(c\vec u + d\vec v)\cdot\vec w = c(\vec u\cdot\vec w) + d(\vec v\cdot\vec w)$. (linearity)

(c) −→u · −→u ≥ 0 where −→u · −→u = 0 if and only if −→u =−→0 . (nonnegativity)

Definition. The length or norm of $\vec v = [v_1, v_2, \dots, v_n]^T$ in $\mathbb{R}^n$, denoted by $\|\vec v\|$, is defined by $\|\vec v\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$. A vector $\vec v \in \mathbb{R}^n$ is a unit vector if $\|\vec v\| = 1$.

Remark. The following are true for all $\vec v$ in $\mathbb{R}^n$ and for all scalars c in $\mathbb{R}$.

(a) $\|\vec v\|^2 = \vec v\cdot\vec v$.

(b) $\|c\vec v\| = |c|\,\|\vec v\|$.

(c) The unit vector in the direction of $\vec v \neq \vec 0$ is $\frac{1}{\|\vec v\|}\vec v$.

Example. The unit vector in the opposite direction of $\vec v = \begin{bmatrix} 1\\ -2\\ 3 \end{bmatrix}$ is $-\frac{1}{\|\vec v\|}\vec v = \frac{1}{\sqrt{14}}\begin{bmatrix} -1\\ 2\\ -3 \end{bmatrix}$.

Definition. The distance between $\vec u, \vec v$ in $\mathbb{R}^n$, denoted by $d(\vec u, \vec v)$, is defined by
$$d(\vec u, \vec v) = \|\vec u - \vec v\|.$$
Note that $d(\vec u, \vec v)^2 = \|\vec u - \vec v\|^2 = \|\vec u\|^2 + \|\vec v\|^2 - 2\vec u\cdot\vec v$ and $d(\vec u, -\vec v)^2 = \|\vec u + \vec v\|^2 = \|\vec u\|^2 + \|\vec v\|^2 + 2\vec u\cdot\vec v$. So $\vec u$ and $\vec v$ are perpendicular if and only if $d(\vec u, \vec v) = d(\vec u, -\vec v)$, if and only if $\vec u\cdot\vec v = 0$.

Definition. Two vectors −→u and −→v in Rn are orthogonal if −→u · −→v = 0.

Example. Let −→u = [3, 2,−5, 0]T and −→v = [−4, 1,−2, 1]T .

(a) Determine if −→u and −→v are orthogonal.

(b) Find d(−→u ,−→v ).


Solution. (a) Since $\vec u\cdot\vec v = 3(-4) + 2\cdot 1 - 5(-2) + 0\cdot 1 = 0$, $\vec u$ and $\vec v$ are orthogonal.
(b)
$$d(\vec u, \vec v) = \|\vec u - \vec v\| = \sqrt{\|\vec u\|^2 + \|\vec v\|^2 - 2\vec u\cdot\vec v} = \sqrt{\|\vec u\|^2 + \|\vec v\|^2}\ \text{(since } \vec u\cdot\vec v = 0\text{)} = \sqrt{38 + 22} = \sqrt{60}.$$

Theorem 6.2 (Pythagorean Theorem). Two vectors $\vec u$ and $\vec v$ in $\mathbb{R}^n$ are orthogonal if and only if $\|\vec u + \vec v\|^2 = \|\vec u\|^2 + \|\vec v\|^2$.

Definition. The angle θ between two nonzero vectors $\vec u$ and $\vec v$ in $\mathbb{R}^n$ is the angle in [0, π] satisfying
$$\vec u\cdot\vec v = \|\vec u\|\,\|\vec v\|\cos\theta.$$

Definition. Let W be a subspace of $\mathbb{R}^n$. A vector $\vec v \in \mathbb{R}^n$ is orthogonal to W if $\vec v\cdot\vec w = 0$ for all $\vec w \in W$. The orthogonal complement of W, denoted by $W^\perp$, is the set of all vectors in $\mathbb{R}^n$ that are orthogonal to W, i.e.,
$$W^\perp = \{\vec v \in \mathbb{R}^n \mid \vec v\cdot\vec w = 0 \text{ for all } \vec w \in W\}.$$

Example.

1. If L is a line in $\mathbb{R}^2$ through the origin, then $L^\perp$ is the line through the origin that is perpendicular to L.

2. If L is a line in $\mathbb{R}^3$ through the origin, then $L^\perp$ is the plane through the origin that is perpendicular to L. Note that $(L^\perp)^\perp = L$.

Theorem 6.3. Let W be a subspace of Rn and W = Span{−→w1,−→w2, . . . ,

−→wk}. Then

(a) −→v ∈ W⊥ if and only if −→v · −→wi = 0 for i = 1, 2, . . . , k.

(b) W⊥ is a subspace of Rn.

(c) (W⊥)⊥ = W .

(d) W ∩W⊥ = {−→0 }.

Proof.

(a) Let $\vec v \in W^\perp$. Then $\vec v\cdot\vec w = 0$ for all $\vec w \in W$. Since $\vec w_i \in W$ for $i = 1, 2, \dots, k$, $\vec v\cdot\vec w_i = 0$ for $i = 1, 2, \dots, k$.
Conversely, suppose that $\vec v\cdot\vec w_i = 0$ for $i = 1, 2, \dots, k$. Let $\vec w \in W = \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_k\}$. Then $\vec w = c_1\vec w_1 + c_2\vec w_2 + \cdots + c_k\vec w_k$ for some scalars $c_1, c_2, \dots, c_k$, and
$$\vec v\cdot\vec w = \vec v\cdot(c_1\vec w_1 + c_2\vec w_2 + \cdots + c_k\vec w_k) = c_1(\vec v\cdot\vec w_1) + c_2(\vec v\cdot\vec w_2) + \cdots + c_k(\vec v\cdot\vec w_k) = 0.$$
Thus $\vec v\cdot\vec w = 0$ for all $\vec w \in W$ and consequently $\vec v \in W^\perp$.


(b) Since $\vec 0\cdot\vec w = 0$ for all $\vec w \in W$, $\vec 0 \in W^\perp$ and $W^\perp \neq \emptyset$. Let $\vec u, \vec v \in W^\perp$ and $c, d \in \mathbb{R}$. Then for all $\vec w \in W$,
$$(c\vec u + d\vec v)\cdot\vec w = c(\vec u\cdot\vec w) + d(\vec v\cdot\vec w) = c\cdot 0 + d\cdot 0 = 0.$$
Thus $c\vec u + d\vec v \in W^\perp$. Therefore $W^\perp$ is a subspace of $\mathbb{R}^n$.

(c) Exercise.

(d) First note that $\{\vec 0\} \subseteq W \cap W^\perp$. Let $\vec v \in W \cap W^\perp$. Then $\vec v \in W$ and $\vec v \in W^\perp$. Thus $\|\vec v\|^2 = \vec v\cdot\vec v = 0$, which implies $\vec v = \vec 0$. Therefore $W \cap W^\perp = \{\vec 0\}$.

Theorem 6.4. Let A be an m×n real matrix. Then $\mathrm{RS}(A)^\perp = \mathrm{NS}(A)$ and $\mathrm{CS}(A)^\perp = \mathrm{NS}(A^T)$.

Proof. To show $\mathrm{NS}(A) \subseteq \mathrm{RS}(A)^\perp$, let $\vec x \in \mathrm{NS}(A) = \{\vec x \in \mathbb{R}^n \mid A\vec x = \vec 0\}$. Then each row of A is orthogonal to $\vec x$. Since RS(A) is the span of the rows of A, $\vec x$ is orthogonal to each vector of RS(A). Then $\vec x \in \mathrm{RS}(A)^\perp$. Thus $\mathrm{NS}(A) \subseteq \mathrm{RS}(A)^\perp$. To show $\mathrm{RS}(A)^\perp = \mathrm{NS}(A)$, it suffices to show $\mathrm{RS}(A)^\perp \subseteq \mathrm{NS}(A)$. Let $\vec x \in \mathrm{RS}(A)^\perp$. Since the rows of A are in RS(A), $\vec x$ is orthogonal to each row of A. Then $A\vec x = \vec 0$ and $\vec x \in \mathrm{NS}(A)$. Thus $\mathrm{RS}(A)^\perp \subseteq \mathrm{NS}(A)$.

Finally $\mathrm{NS}(A^T) = \mathrm{RS}(A^T)^\perp = \mathrm{CS}(A)^\perp$ because $\mathrm{RS}(A^T) = \mathrm{CS}(A)$.

6.2 Orthogonal Bases and Matrices

Definition. A set $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal set if $\vec v_i\cdot\vec v_j = 0$ for all distinct $i, j = 1, 2, \dots, k$. Also $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ is called an orthonormal set if it is an orthogonal set of unit vectors.

Example. Let $\vec v_1 = \begin{bmatrix} 2\\ 0\\ -1 \end{bmatrix}$, $\vec v_2 = \begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix}$, and $\vec v_3 = \begin{bmatrix} 1\\ 0\\ 2 \end{bmatrix}$. Verify that $\vec v_1\cdot\vec v_2 = 0$, $\vec v_1\cdot\vec v_3 = 0$, $\vec v_2\cdot\vec v_3 = 0$. Then $\{\vec v_1, \vec v_2, \vec v_3\}$ is an orthogonal set in $\mathbb{R}^3$, but not orthonormal. The following is an orthonormal set:
$$\left\{\frac{\vec v_1}{\|\vec v_1\|}, \frac{\vec v_2}{\|\vec v_2\|}, \frac{\vec v_3}{\|\vec v_3\|}\right\} = \left\{\frac{1}{\sqrt5}\begin{bmatrix} 2\\ 0\\ -1 \end{bmatrix}, \frac{1}{2}\begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix}, \frac{1}{\sqrt5}\begin{bmatrix} 1\\ 0\\ 2 \end{bmatrix}\right\}.$$

Theorem 6.5. If $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ is linearly independent and consequently it forms a basis of $\operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_k\}$.

Proof. Let $c_1\vec v_1 + c_2\vec v_2 + \cdots + c_k\vec v_k = \vec 0$ for some scalars $c_1, c_2, \dots, c_k$. Then
$$\vec 0\cdot\vec v_1 = (c_1\vec v_1 + c_2\vec v_2 + \cdots + c_k\vec v_k)\cdot\vec v_1 \implies 0 = c_1(\vec v_1\cdot\vec v_1) + c_2(\vec v_2\cdot\vec v_1) + \cdots + c_k(\vec v_k\cdot\vec v_1) \implies 0 = c_1\|\vec v_1\|^2 + 0 + \cdots + 0 \implies c_1 = 0$$
(since $\|\vec v_1\| \neq 0$ as $\vec v_1 \neq \vec 0$).


Similarly we can prove $c_2 = c_3 = \cdots = c_k = 0$. Thus $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ is linearly independent and consequently it forms a basis of $\operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_k\}$.

Definition. Let W be a subspace of $\mathbb{R}^n$. An orthogonal basis of W is a basis of W that is an orthogonal set. Similarly, an orthonormal basis of W is a basis of W that is an orthonormal set.

Example. Let $\vec v_1 = \begin{bmatrix} 2\\ 0\\ -1 \end{bmatrix}$, $\vec v_2 = \begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix}$, and $\vec v_3 = \begin{bmatrix} 1\\ 0\\ 2 \end{bmatrix}$. Then $\{\vec v_1, \vec v_2, \vec v_3\}$ is an orthogonal basis of $\mathbb{R}^3$.

Theorem 6.6. Let W be a subspace of $\mathbb{R}^n$ and let $\{\vec w_1, \vec w_2, \dots, \vec w_k\}$ be an orthogonal basis of W. If $\vec v \in W$, then
$$\vec v = \frac{\vec v\cdot\vec w_1}{\vec w_1\cdot\vec w_1}\vec w_1 + \frac{\vec v\cdot\vec w_2}{\vec w_2\cdot\vec w_2}\vec w_2 + \cdots + \frac{\vec v\cdot\vec w_k}{\vec w_k\cdot\vec w_k}\vec w_k.$$

Proof. Let $\vec v \in W = \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_k\}$. Then $\vec v = c_1\vec w_1 + c_2\vec w_2 + \cdots + c_k\vec w_k$ for some scalars $c_1, c_2, \dots, c_k$. Then
$$\vec v\cdot\vec w_1 = (c_1\vec w_1 + c_2\vec w_2 + \cdots + c_k\vec w_k)\cdot\vec w_1 = c_1(\vec w_1\cdot\vec w_1) + 0 + \cdots + 0 \implies c_1 = \frac{\vec v\cdot\vec w_1}{\vec w_1\cdot\vec w_1}$$
(since $\vec w_1\cdot\vec w_1 = \|\vec w_1\|^2 \neq 0$ as $\vec w_1 \neq \vec 0$). Similarly we can prove that $c_i = \frac{\vec v\cdot\vec w_i}{\vec w_i\cdot\vec w_i}$ for $i = 2, 3, \dots, k$.

Example. Let $\vec v_1 = \begin{bmatrix} 2\\ 0\\ -1 \end{bmatrix}$, $\vec v_2 = \begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix}$, and $\vec v_3 = \begin{bmatrix} 1\\ 0\\ 2 \end{bmatrix}$, which form an orthogonal basis of $\mathbb{R}^3$. Write $\vec v = \begin{bmatrix} -1\\ 4\\ 3 \end{bmatrix}$ as the unique linear combination of $\vec v_1, \vec v_2, \vec v_3$:
$$\vec v = \begin{bmatrix} -1\\ 4\\ 3 \end{bmatrix} = \frac{\vec v\cdot\vec v_1}{\vec v_1\cdot\vec v_1}\vec v_1 + \frac{\vec v\cdot\vec v_2}{\vec v_2\cdot\vec v_2}\vec v_2 + \frac{\vec v\cdot\vec v_3}{\vec v_3\cdot\vec v_3}\vec v_3 = \frac{-5}{5}\vec v_1 + \frac{8}{4}\vec v_2 + \frac{5}{5}\vec v_3 = -\vec v_1 + 2\vec v_2 + \vec v_3.$$

Theorem 6.7. An m×n real matrix U has orthonormal columns if and only if $U^TU = I_n$.

Proof. Let $U = [\vec u_1\ \vec u_2 \cdots \vec u_n]$ be an m×n real matrix. Then
$$U^TU = \begin{bmatrix} \vec u_1^{\,T}\\ \vec u_2^{\,T}\\ \vdots\\ \vec u_n^{\,T} \end{bmatrix}[\vec u_1\ \vec u_2 \cdots \vec u_n] = \begin{bmatrix} \vec u_1\cdot\vec u_1 & \vec u_1\cdot\vec u_2 & \cdots & \vec u_1\cdot\vec u_n\\ \vec u_2\cdot\vec u_1 & \vec u_2\cdot\vec u_2 & \cdots & \vec u_2\cdot\vec u_n\\ \vdots & \vdots & \ddots & \vdots\\ \vec u_n\cdot\vec u_1 & \vec u_n\cdot\vec u_2 & \cdots & \vec u_n\cdot\vec u_n \end{bmatrix}.$$
Thus U has orthonormal columns if and only if $U^TU = I_n$.


Definition. A square real matrix U is called an orthogonal matrix if U has orthonormal columns, equivalently if $U^TU = I$.

Theorem 6.8. The following are equivalent for an n× n real matrix U .

(a) U is an orthogonal matrix.

(b) U has orthonormal columns.

(c) UTU = In.

(d) UUT = In.

(e) U has orthonormal rows.

(f) U−1 = UT .

Example. $U = \begin{bmatrix} \frac{2}{\sqrt5} & 0 & \frac{1}{\sqrt5}\\ 0 & 1 & 0\\ \frac{-1}{\sqrt5} & 0 & \frac{2}{\sqrt5} \end{bmatrix}$ is an orthogonal matrix and $U^{-1} = U^T = \begin{bmatrix} \frac{2}{\sqrt5} & 0 & \frac{-1}{\sqrt5}\\ 0 & 1 & 0\\ \frac{1}{\sqrt5} & 0 & \frac{2}{\sqrt5} \end{bmatrix}$.

Theorem 6.9. Let U be an m×n real matrix with orthonormal columns. Then

(a) $(U\vec x)\cdot(U\vec y) = \vec x\cdot\vec y$ for all $\vec x, \vec y \in \mathbb{R}^n$.

(b) $(U\vec x)\cdot(U\vec y) = 0$ if and only if $\vec x\cdot\vec y = 0$, for all $\vec x, \vec y \in \mathbb{R}^n$ (i.e., the map $\vec x \mapsto U\vec x$ preserves orthogonality between vectors).

(c) $\|U\vec x\| = \|\vec x\|$ for all $\vec x \in \mathbb{R}^n$ (i.e., the map $\vec x \mapsto U\vec x$ preserves the length of vectors).

Proof. Since the m×n real matrix U has orthonormal columns, $U^TU = I_n$.

(a) $(U\vec x)\cdot(U\vec y) = (U\vec x)^T(U\vec y) = \vec x^TU^TU\vec y = \vec x^TI_n\vec y = \vec x\cdot\vec y$ for all $\vec x, \vec y \in \mathbb{R}^n$.

(b) Follows from (a).

(c) By (a), $\|U\vec x\|^2 = (U\vec x)\cdot(U\vec x) = \vec x\cdot\vec x = \|\vec x\|^2 \implies \|U\vec x\| = \|\vec x\|$.

Corollary 6.10. An n×n real matrix U is orthogonal if and only if $\|U\vec x\| = \|\vec x\|$ for all $\vec x \in \mathbb{R}^n$.

Proof. Let U be an n×n real matrix.
(⟹) It follows from (c) of Theorem 6.9.
(⟸) Suppose $\|U\vec x\| = \|\vec x\|$ for all $\vec x \in \mathbb{R}^n$. Let $U^TU = [a_{ij}]$. Since $U^TU$ is symmetric, $a_{ij} = a_{ji}$. For $i = 1, 2, \dots, n$, $a_{ii} = (U\vec e_i)^T(U\vec e_i) = \|U\vec e_i\|^2 = \|\vec e_i\|^2 = 1$. For $i \neq j$,
$$a_{ii} - a_{ji} - a_{ij} + a_{jj} = (U(\vec e_i - \vec e_j))^T(U(\vec e_i - \vec e_j)) \implies 2 - 2a_{ij} = \|U(\vec e_i - \vec e_j)\|^2 = \|\vec e_i - \vec e_j\|^2 = 2 \implies a_{ij} = 0.$$
Thus $U^TU = I_n$ and U is orthogonal.


6.3 Orthogonal Projections

Theorem 6.11 (Orthogonal Decomposition Theorem). Let W be a subspace of $\mathbb{R}^n$ and $\vec y \in \mathbb{R}^n$. Then
$$\vec y = \vec w + \vec z$$
for unique vectors $\vec w \in W$ and $\vec z \in W^\perp$. Moreover, if $\{\vec w_1, \vec w_2, \dots, \vec w_k\}$ is an orthogonal basis of W, then
$$\vec w = \frac{\vec y\cdot\vec w_1}{\vec w_1\cdot\vec w_1}\vec w_1 + \frac{\vec y\cdot\vec w_2}{\vec w_2\cdot\vec w_2}\vec w_2 + \cdots + \frac{\vec y\cdot\vec w_k}{\vec w_k\cdot\vec w_k}\vec w_k \quad\text{and}\quad \vec z = \vec y - \vec w.$$

Proof. Suppose $\{\vec w_1, \vec w_2, \dots, \vec w_k\}$ is an orthogonal basis of W. Then
$$\vec w = \frac{\vec y\cdot\vec w_1}{\vec w_1\cdot\vec w_1}\vec w_1 + \frac{\vec y\cdot\vec w_2}{\vec w_2\cdot\vec w_2}\vec w_2 + \cdots + \frac{\vec y\cdot\vec w_k}{\vec w_k\cdot\vec w_k}\vec w_k \in \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_k\} = W.$$
Let $\vec z = \vec y - \vec w$. We show that $\vec z = \vec y - \vec w \in W^\perp$. For $i = 1, 2, \dots, k$,
$$\vec z\cdot\vec w_i = (\vec y - \vec w)\cdot\vec w_i = \vec y\cdot\vec w_i - \left(\frac{\vec y\cdot\vec w_1}{\vec w_1\cdot\vec w_1}\vec w_1 + \cdots + \frac{\vec y\cdot\vec w_k}{\vec w_k\cdot\vec w_k}\vec w_k\right)\cdot\vec w_i = \vec y\cdot\vec w_i - \left(0 + \cdots + 0 + \frac{\vec y\cdot\vec w_i}{\vec w_i\cdot\vec w_i}\,\vec w_i\cdot\vec w_i + 0 + \cdots + 0\right) = 0.$$
Since $\vec z\cdot\vec w_i = 0$ for $i = 1, 2, \dots, k$, $\vec z\cdot\vec w = 0$ for all $\vec w \in W = \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_k\}$, and consequently $\vec z \in W^\perp$.
To show the uniqueness of the decomposition $\vec y = \vec w + \vec z$, let $\vec y = \vec w\,' + \vec z\,'$ for some $\vec w\,' \in W$ and $\vec z\,' \in W^\perp$. Then
$$\vec 0 = \vec y - \vec y = (\vec w + \vec z) - (\vec w\,' + \vec z\,') \implies \vec w\,' - \vec w = \vec z - \vec z\,' \in W\cap W^\perp = \{\vec 0\} \implies \vec w\,' = \vec w,\ \vec z\,' = \vec z.$$

Definition. Let W be a subspace of $\mathbb{R}^n$. Each vector $\vec y \in \mathbb{R}^n$ can be uniquely written as $\vec y = \vec w + \vec z$ where $\vec w \in W$ and $\vec z \in W^\perp$. The unique vector $\vec w \in W$ is called the orthogonal projection of $\vec y$ onto W, and it is denoted by $\operatorname{proj}_W\vec y$.

[Figure: $\vec y$ decomposed as $\operatorname{proj}_W\vec y$ lying in W plus the perpendicular component $\vec y - \operatorname{proj}_W\vec y$.]


Example.

1. Let $\vec w = [2, 1]^T$ and $W = \operatorname{Span}\{\vec w\}$. For $\vec y = [2, 3]^T$, find $\operatorname{proj}_W\vec y$ and the orthogonal decomposition of $\vec y$ with respect to W.
$\operatorname{proj}_W\vec y = \frac{\vec y\cdot\vec w}{\vec w\cdot\vec w}\vec w = \frac{7}{5}[2, 1]^T \in W$ and $\vec y - \operatorname{proj}_W\vec y = \frac{1}{5}[-4, 8]^T \in W^\perp$. The orthogonal decomposition of $\vec y$ with respect to W is
$$\vec y = [2, 3]^T = \frac{7}{5}[2, 1]^T + \frac{1}{5}[-4, 8]^T.$$

2. Let $\vec w_1 = \begin{bmatrix} 2\\ 3\\ 0 \end{bmatrix}$, $\vec w_2 = \begin{bmatrix} 0\\ 0\\ 2 \end{bmatrix}$, and $W = \operatorname{Span}\{\vec w_1, \vec w_2\}$. For $\vec y = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}$, find $\operatorname{proj}_W\vec y$ and the orthogonal decomposition of $\vec y$ with respect to W.
$$\operatorname{proj}_W\vec y = \frac{\vec y\cdot\vec w_1}{\vec w_1\cdot\vec w_1}\vec w_1 + \frac{\vec y\cdot\vec w_2}{\vec w_2\cdot\vec w_2}\vec w_2 = \frac{2}{13}\begin{bmatrix} 2\\ 3\\ 0 \end{bmatrix} + \frac{2}{4}\begin{bmatrix} 0\\ 0\\ 2 \end{bmatrix} = \frac{1}{13}\begin{bmatrix} 4\\ 6\\ 13 \end{bmatrix} \in W,\qquad \vec y - \operatorname{proj}_W\vec y = \frac{1}{13}\begin{bmatrix} 9\\ -6\\ 0 \end{bmatrix} \in W^\perp.$$
The orthogonal decomposition of $\vec y$ with respect to W is
$$\vec y = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} = \frac{1}{13}\begin{bmatrix} 4\\ 6\\ 13 \end{bmatrix} + \frac{1}{13}\begin{bmatrix} 9\\ -6\\ 0 \end{bmatrix}.$$

Corollary 6.12. Let W be a subspace of $\mathbb{R}^n$ with an orthonormal basis $\{\vec w_1, \vec w_2, \dots, \vec w_k\}$, and let $U = [\vec w_1\ \vec w_2 \cdots \vec w_k]$. Then for each $\vec y \in \mathbb{R}^n$,
$$\operatorname{proj}_W\vec y = UU^T\vec y = (\vec y\cdot\vec w_1)\vec w_1 + (\vec y\cdot\vec w_2)\vec w_2 + \cdots + (\vec y\cdot\vec w_k)\vec w_k.$$

Proof.
$$U^T\vec y = \begin{bmatrix} \vec w_1^{\,T}\\ \vec w_2^{\,T}\\ \vdots\\ \vec w_k^{\,T} \end{bmatrix}\vec y = \begin{bmatrix} \vec w_1\cdot\vec y\\ \vec w_2\cdot\vec y\\ \vdots\\ \vec w_k\cdot\vec y \end{bmatrix},\qquad UU^T\vec y = [\vec w_1\ \vec w_2 \cdots \vec w_k]\begin{bmatrix} \vec w_1\cdot\vec y\\ \vec w_2\cdot\vec y\\ \vdots\\ \vec w_k\cdot\vec y \end{bmatrix} = (\vec y\cdot\vec w_1)\vec w_1 + (\vec y\cdot\vec w_2)\vec w_2 + \cdots + (\vec y\cdot\vec w_k)\vec w_k = \operatorname{proj}_W\vec y.$$
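Corollary 6.12 gives a one-line projection formula once an orthonormal basis is in hand. A small numpy sketch using the subspace from the preceding example:

```python
import numpy as np

w1 = np.array([2., 3., 0.])
w2 = np.array([0., 0., 2.])
# orthonormal basis as the columns of U
U = np.column_stack([w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)])

y = np.array([1., 0., 1.])
proj = U @ (U.T @ y)                    # proj_W y
print(proj)                             # [4/13, 6/13, 1]
print(np.allclose(U.T @ U, np.eye(2)))  # True: U has orthonormal columns
```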

Remark. Recall that for an m×n real matrix A, $A\vec x = \vec b$ has a solution if and only if $\vec b \in \mathrm{CS}(A)$. So $A\vec x = \vec b$ has no solution if and only if $\vec b \notin \mathrm{CS}(A)$. In that case we find the $\vec w \in \mathrm{CS}(A)$ that is closest to $\vec b$, i.e., the best approximation to $\vec b$ by a vector $\vec w \in \mathrm{CS}(A)$.

Theorem 6.13 (Best Approximation Theorem). Let W be a subspace of $\mathbb{R}^n$ and $\vec b \in \mathbb{R}^n$. Then
$$\min_{\vec w\in W}\|\vec b - \vec w\| = \|\vec b - \operatorname{proj}_W\vec b\|.$$

Proof. It suffices to show that $\|\vec b - \operatorname{proj}_W\vec b\| < \|\vec b - \vec w\|$ for all $\vec w \in W$ with $\vec w \neq \operatorname{proj}_W\vec b$. Let $\vec w \in W$ and $\vec w \neq \operatorname{proj}_W\vec b$. Then $\vec 0 \neq \operatorname{proj}_W\vec b - \vec w \in W$. Since $\operatorname{proj}_W\vec b \in W$, $\vec b - \operatorname{proj}_W\vec b \in W^\perp$ by the orthogonal decomposition. Then
$$(\vec b - \operatorname{proj}_W\vec b)\cdot(\operatorname{proj}_W\vec b - \vec w) = 0.$$
By the Pythagorean theorem,
$$\|(\vec b - \operatorname{proj}_W\vec b) + (\operatorname{proj}_W\vec b - \vec w)\|^2 = \|\vec b - \operatorname{proj}_W\vec b\|^2 + \|\operatorname{proj}_W\vec b - \vec w\|^2$$
$$\implies \|\vec b - \vec w\|^2 = \|\vec b - \operatorname{proj}_W\vec b\|^2 + \|\operatorname{proj}_W\vec b - \vec w\|^2 > \|\vec b - \operatorname{proj}_W\vec b\|^2$$
because $\operatorname{proj}_W\vec b - \vec w \neq \vec 0$. Thus $\|\vec b - \vec w\| > \|\vec b - \operatorname{proj}_W\vec b\|$.

Example. Let $\vec u = [2, 3, 0]^T$, $\vec v = [0, 0, 2]^T$, $W = \operatorname{Span}\{\vec u, \vec v\}$, and $\vec y = [1, 0, 1]^T$. Find the point on W closest to $\vec y$ (the best approximation to $\vec y$ by a vector of W) and find the distance between $\vec y$ and W.
The point on W closest to $\vec y$ is $\operatorname{proj}_W\vec y = \frac{1}{13}[4, 6, 13]^T \in W$ (show steps). The distance between $\vec y$ and W is $\|\vec y - \operatorname{proj}_W\vec y\| = \left\|\frac{1}{13}[9, -6, 0]^T\right\| = \frac{\sqrt{117}}{13}$.

To find $\operatorname{proj}_W\vec y$ in an alternative way, note that $\left\{\frac{\vec u}{\|\vec u\|}, \frac{\vec v}{\|\vec v\|}\right\}$ is an orthonormal basis of W. Let
$$U = \left[\frac{\vec u}{\|\vec u\|}\ \frac{\vec v}{\|\vec v\|}\right] = \begin{bmatrix} \frac{2}{\sqrt{13}} & 0\\ \frac{3}{\sqrt{13}} & 0\\ 0 & 1 \end{bmatrix}.$$
Then
$$\operatorname{proj}_W\vec y = UU^T\vec y = \begin{bmatrix} \frac{2}{\sqrt{13}} & 0\\ \frac{3}{\sqrt{13}} & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{13}} & \frac{3}{\sqrt{13}} & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{13}} & 0\\ \frac{3}{\sqrt{13}} & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{13}}\\ 1 \end{bmatrix} = \frac{1}{13}\begin{bmatrix} 4\\ 6\\ 13 \end{bmatrix}.$$


6.4 Gram-Schmidt Process

Theorem 6.14 (Gram-Schmidt Process). Let W be a subspace of $\mathbb{R}^n$ with a basis $\{\vec w_1, \vec w_2, \dots, \vec w_k\}$. There is an orthogonal basis $\{\vec v_1, \vec v_2, \dots, \vec v_k\}$ of W where
$$\vec v_1 = \vec w_1 \quad\text{and}\quad \vec v_i = \vec w_i - \sum_{j=1}^{i-1}\frac{\vec w_i\cdot\vec v_j}{\vec v_j\cdot\vec v_j}\vec v_j,\quad i = 2, 3, \dots, k.$$
Moreover, $\operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_i\} = \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_i\}$ for $i = 1, 2, \dots, k$.

Proof. Let $W_i = \operatorname{Span}\{\vec w_1, \vec w_2, \dots, \vec w_i\}$ for $i = 1, 2, \dots, k$. By finite induction, we prove that $W_i = \operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_i\}$ and $\{\vec v_1, \vec v_2, \dots, \vec v_i\}$ is an orthogonal set for each $i = 1, 2, \dots, k$.

Since $\vec v_1 = \vec w_1$, $W_1 = \operatorname{Span}\{\vec w_1\} = \operatorname{Span}\{\vec v_1\}$ and $\{\vec v_1\} = \{\vec w_1\}$ is an orthogonal set. So the statement is true for $i = 1$. Suppose the statement is true for some $j < k$, i.e., $W_j = \operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_j\}$ and $\{\vec v_1, \vec v_2, \dots, \vec v_j\}$ is an orthogonal set. We prove the statement is true for $i = j+1$. Note that
$$\vec v_{j+1} = \vec w_{j+1} - \sum_{t=1}^{j}\frac{\vec w_{j+1}\cdot\vec v_t}{\vec v_t\cdot\vec v_t}\vec v_t = \vec w_{j+1} - \operatorname{proj}_{W_j}\vec w_{j+1}.$$
Since $\vec w_{j+1} \notin W_j = \operatorname{Span}\{\vec v_1, \vec v_2, \dots, \vec v_j\}$, by the orthogonal decomposition we have
$$\vec v_{j+1} = \vec w_{j+1} - \operatorname{proj}_{W_j}\vec w_{j+1} \in W_j^\perp,\quad \vec v_{j+1} \neq \vec 0.$$
Then $\vec v_{j+1}$ is orthogonal to each of $\vec v_1, \vec v_2, \dots, \vec v_j$, which lie in $W_j$. Thus $\{\vec v_1, \vec v_2, \dots, \vec v_{j+1}\}$ is an orthogonal set of $j+1$ nonzero vectors in the $(j+1)$-dimensional subspace $W_{j+1}$, and consequently it spans $W_{j+1}$ (in fact it forms an orthogonal basis of $W_{j+1}$).

Remark. To find an orthonormal basis, normalize each vector of an orthogonal basis, i.e., divide each vector by its norm to make it a unit vector.

Example. Find an orthogonal basis of $\operatorname{CS}(A)$ for $A = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & -1 \\ 0 & 2 & 0 \end{bmatrix}$.

Let $\vec{w}_1 = [3, 4, 0]^T$, $\vec{w}_2 = [1, 0, 2]^T$, and $\vec{w}_3 = [0, -1, 0]^T$. Since the columns $\vec{w}_1, \vec{w}_2, \vec{w}_3$ of $A$ are linearly independent, they form a basis of $\operatorname{CS}(A)$. Let $\vec{v}_1 = \vec{w}_1$ and $W_1 = \operatorname{Span}\{\vec{v}_1\}$. Let
$$\vec{v}_2 = \vec{w}_2 - \operatorname{proj}_{W_1}\vec{w}_2 = \vec{w}_2 - \frac{\vec{w}_2 \cdot \vec{v}_1}{\vec{v}_1 \cdot \vec{v}_1}\vec{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} - \frac{3}{25}\begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix} = \frac{1}{25}\begin{bmatrix} 16 \\ -12 \\ 50 \end{bmatrix}$$
and $W_2 = \operatorname{Span}\{\vec{v}_1, \vec{v}_2\}$.


Let
$$\vec{v}_3 = \vec{w}_3 - \operatorname{proj}_{W_2}\vec{w}_3 = \vec{w}_3 - \frac{\vec{w}_3 \cdot \vec{v}_1}{\vec{v}_1 \cdot \vec{v}_1}\vec{v}_1 - \frac{\vec{w}_3 \cdot \vec{v}_2}{\vec{v}_2 \cdot \vec{v}_2}\vec{v}_2 = \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} - \frac{-4}{25}\begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix} - \frac{12/25}{2900/25^2} \cdot \frac{1}{25}\begin{bmatrix} 16 \\ -12 \\ 50 \end{bmatrix} = \frac{1}{29}\begin{bmatrix} 12 \\ -9 \\ -6 \end{bmatrix}.$$

Thus an orthogonal basis of $\operatorname{CS}(A)$ is
$$\{\vec{v}_1, \vec{v}_2, \vec{v}_3\} = \left\{ \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}, \frac{1}{25}\begin{bmatrix} 16 \\ -12 \\ 50 \end{bmatrix}, \frac{1}{29}\begin{bmatrix} 12 \\ -9 \\ -6 \end{bmatrix} \right\} \quad \text{or simply} \quad \left\{ \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}, \begin{bmatrix} 16 \\ -12 \\ 50 \end{bmatrix}, \begin{bmatrix} 12 \\ -9 \\ -6 \end{bmatrix} \right\}.$$

An orthonormal basis of $\operatorname{CS}(A)$ is
$$\left\{ \frac{\vec{v}_1}{\|\vec{v}_1\|}, \frac{\vec{v}_2}{\|\vec{v}_2\|}, \frac{\vec{v}_3}{\|\vec{v}_3\|} \right\} = \left\{ \frac{1}{5}\begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}, \frac{1}{10\sqrt{29}}\begin{bmatrix} 16 \\ -12 \\ 50 \end{bmatrix}, \frac{1}{3\sqrt{29}}\begin{bmatrix} 12 \\ -9 \\ -6 \end{bmatrix} \right\}.$$
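The computation above can be reproduced with a short Gram-Schmidt routine (a sketch in Python with numpy; the function name and layout are mine, not from the notes):

```python
import numpy as np

def gram_schmidt(ws):
    """Orthogonal basis from a basis, following the recursion in Theorem 6.14."""
    vs = []
    for w in ws:
        v = w - sum(((w @ u) / (u @ u)) * u for u in vs)
        vs.append(v)
    return vs

A = np.array([[3.0, 1.0,  0.0],
              [4.0, 0.0, -1.0],
              [0.0, 2.0,  0.0]])
v1, v2, v3 = gram_schmidt(list(A.T))   # the columns w1, w2, w3 of A
print(v1)        # [3, 4, 0]
print(25 * v2)   # [16, -12, 50]
print(29 * v3)   # [12, -9, -6]
```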

Theorem 6.15 (QR-factorization). If an $m \times n$ real matrix $A$ has linearly independent columns, then $A$ can be factored as $A = QR$ where $Q$ is an $m \times n$ real matrix whose columns form an orthonormal basis of $\operatorname{CS}(A)$ and $R$ is an $n \times n$ upper-triangular real matrix.

Proof. (Sketch) By the Gram-Schmidt process, find an orthonormal basis $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ of $\operatorname{CS}(A)$ and let $Q = [\vec{v}_1\ \vec{v}_2\ \cdots\ \vec{v}_n]$. Since the $j$th column of $A$ lies in $\operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_j\}$, we have $A = QR$ for an upper-triangular matrix $R$. Since the columns of $Q$ are orthonormal, $Q^TQ = I_n$ and consequently $Q^TA = Q^TQR = I_nR = R$.
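For instance, numpy computes such a factorization directly (a sketch; note that np.linalg.qr may negate some columns of $Q$ relative to the Gram-Schmidt construction):

```python
import numpy as np

A = np.array([[3.0, 1.0,  0.0],
              [4.0, 0.0, -1.0],
              [0.0, 2.0,  0.0]])
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))             # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: columns of Q are orthonormal
print(np.allclose(R, Q.T @ A))           # True: R = Q^T A, as in the proof
print(np.allclose(R, np.triu(R)))        # True: R is upper-triangular
```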


7 Vector Spaces and Inner Product Spaces

7.1 Basics of Vector Spaces

Definition. A real vector space is a nonempty set $V$ of objects, called vectors, with two operations, viz. addition and scalar multiplication, that satisfy the following properties for all vectors $\vec{u}, \vec{v}, \vec{w}$ in $V$ and all scalars (real numbers) $c$ and $d$.

1. $\vec{u} + \vec{v}$ is in $V$.

2. $\vec{u} + \vec{v} = \vec{v} + \vec{u}$.

3. $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$.

4. There is a zero vector $\vec{0}$ such that $\vec{u} + \vec{0} = \vec{u}$.

5. There is a vector $-\vec{u}$ such that $\vec{u} + (-\vec{u}) = \vec{0}$.

6. $c\vec{u}$ is in $V$.

7. $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$.

8. $(c + d)\vec{u} = c\vec{u} + d\vec{u}$.

9. $c(d\vec{u}) = (cd)\vec{u}$.

10. $1\vec{u} = \vec{u}$.

Remark.

1. Scalars are elements of a field such as the set of real numbers or the set of complex numbers. If the scalars are complex numbers, then $V$ is called a complex vector space.

2. From the definition we have the following:

(a) $0\vec{u} = \vec{0}$

(b) $c\vec{0} = \vec{0}$

(c) $-\vec{u} = (-1)\vec{u}$

Example. The following are real vector spaces.

1. $V_n$, the set of all vectors (directed line segments) in $\mathbb{R}^n$.

• Addition: Usual vector addition by the triangle/parallelogram law.

• Scalar multiplication: Usual scalar multiplication of vectors.

2. $\mathbb{R}^n$ and $\mathbb{C}^n$.

• Addition: Entrywise addition.

• Scalar multiplication: Entrywise scalar multiplication.

3. $\mathbb{R}^\infty$, the set of all real sequences $(a_n) = (a_1, a_2, a_3, \ldots)$.

• Addition: Entrywise addition.

• Scalar multiplication: Entrywise scalar multiplication.

4. $P_n$, the set of all real polynomials of degree at most $n$ (see the sketch after this list).

• Addition: If $\vec{p}(t) = a_0 + a_1t + \cdots + a_nt^n$ and $\vec{q}(t) = b_0 + b_1t + \cdots + b_nt^n$, then
$$(\vec{p} + \vec{q})(t) = (a_0 + b_0) + (a_1 + b_1)t + \cdots + (a_n + b_n)t^n.$$

• Scalar multiplication: If $\vec{p}(t) = a_0 + a_1t + \cdots + a_nt^n$ and $c \in \mathbb{R}$, then
$$(c\vec{p})(t) = ca_0 + ca_1t + \cdots + ca_nt^n.$$

5. $F$, the set of all real-valued functions on a set $D$.

• Addition: $(\vec{p} + \vec{q})(x) = \vec{p}(x) + \vec{q}(x)$ for all $\vec{p}, \vec{q} \in F$.

• Scalar multiplication: $(c\vec{p})(x) = c\vec{p}(x)$ for all $\vec{p} \in F$ and $c \in \mathbb{R}$.

6. $L(V, W)$, the set of all linear transformations $T : V \to W$ where $V$ and $W$ are real vector spaces.

• Addition: $(T + S)(\vec{v}) = T(\vec{v}) + S(\vec{v})$ for all $\vec{v} \in V$.

• Scalar multiplication: $(cT)(\vec{v}) = cT(\vec{v})$ for all $\vec{v} \in V$ and $c \in \mathbb{R}$.

7. $M_{m,n}(\mathbb{R})$, the set of all $m \times n$ real matrices.

• Addition: Entrywise addition.

• Scalar multiplication: Entrywise scalar multiplication.
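As referenced in example 4, polynomials in $P_n$ can be modeled concretely by their coefficient vectors, and the two vector-space operations then become entrywise. A minimal sketch (Python with numpy; my own illustration, not part of the notes):

```python
import numpy as np

# Represent p in P_2 by its coefficient vector [a0, a1, a2].
p = np.array([1.0, 0.0, 2.0])   # p(t) = 1 + 2t^2
q = np.array([2.0, 2.0, 0.0])   # q(t) = 2 + 2t
print(p + q)                    # [3, 2, 2]: (p + q)(t) = 3 + 2t + 2t^2
print(3 * p)                    # [3, 0, 6]: (3p)(t) = 3 + 6t^2
```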

7.2 Linear Span and Subspaces

Definition. A linear combination of vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$ of a vector space $V$ is a sum of their scalar multiples, i.e.,
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k$$
for some scalars $c_1, c_2, \ldots, c_k$. The set of all linear combinations of a nonempty set $S$ of vectors of $V$ is called the linear span or span of $S$, denoted by $\operatorname{Span}(S)$ or $\operatorname{Span} S$, i.e.,
$$\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} = \{c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k \mid c_1, c_2, \ldots, c_k \text{ are scalars}\}.$$
We define $\operatorname{Span}\emptyset = \{\vec{0}\}$. When $\operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\} = V$, we say $\{\vec{v}_1, \ldots, \vec{v}_k\}$ spans $V$.


Example.

1. $\operatorname{Span}\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\} = \mathbb{R}^n$.

2. $\operatorname{Span}\{1, t, t^2, \ldots, t^n\} = P_n$.

3. $\operatorname{Span}\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n, \ldots\} = \mathbb{R}^\infty$ where $\vec{e}_i$ is the infinite sequence with 1 in the $i$th place and 0 elsewhere.

4. $\operatorname{Span}(B) = M_{m,n}(\mathbb{R})$ for $B = \{E_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n\}$ where $E_{i,j}$ is the $m \times n$ matrix with the $(i,j)$-entry 1 and 0 elsewhere.

Definition. A subspace of a vector space $V$ is a nonempty subset $S$ of $V$ that satisfies three properties:

(a) $\vec{0}$ is in $S$.

(b) $\vec{u} + \vec{v}$ is in $S$ for all $\vec{u}$, $\vec{v}$ in $S$.

(c) $c\vec{u}$ is in $S$ for all $\vec{u}$ in $S$ and all scalars $c$.

In short, a subspace of $V$ is a nonempty subset $S$ of $V$ that is closed under linear combinations of vectors, i.e., $c\vec{u} + d\vec{v}$ is in $S$ for all $\vec{u}$, $\vec{v}$ in $S$ and all scalars $c, d$. When $S$ is a subspace of $V$, we sometimes denote it by $S \le V$.

Example.

1. $\{\vec{0}_V\} \le V$ and $V \le V$, i.e., $\{\vec{0}_V\}$ and $V$ are subspaces of the vector space $V$.

2. If $F$ is the vector space of all real-valued functions, then $P_n$ is a subspace of the vector space $F$.

3. Let $H$ be the set of all polynomials $\vec{p}$ in $P_n$ such that $\vec{p}(0) = 0$. Note that
$$H = \{\vec{p} \in P_n \mid \vec{p}(0) = 0\} = \{a_1t + a_2t^2 + \cdots + a_nt^n \mid a_1, \ldots, a_n \in \mathbb{R}\}.$$
Then $H$ is a subspace of the vector space $P_n$ and consequently a subspace of the vector space $F$.

4. Let $H = \left\{ [x_1, x_2, 0]^T \mid x_1, x_2 \in \mathbb{R} \right\}$. $H$ is not a subspace of the vector space $\mathbb{R}^2$, but $H$ is a subspace of the vector space $\mathbb{R}^3$.

5. If $\vec{v}_1, \ldots, \vec{v}_k$ are vectors of a real vector space $V$, then
$$\operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\} = \{c_1\vec{v}_1 + \cdots + c_k\vec{v}_k \mid c_1, \ldots, c_k \in \mathbb{R}\}$$
is a subspace of $V$.


7.3 Linear Independence

Definition. A set $S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of vectors of a vector space $V$ is linearly independent if the only linear combination of vectors in $S$ that produces $\vec{0}$ is a trivial linear combination, i.e.,
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0} \implies c_1 = c_2 = \cdots = c_k = 0.$$
$S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly dependent if $S$ is not linearly independent, i.e., there are scalars $c_1, c_2, \ldots, c_k$, not all zero, such that
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}.$$

Example.

1. $\{\vec{v}\}$ is linearly independent in $V$ if and only if $\vec{v} \neq \vec{0}_V$.

2. $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is a linearly independent set of vectors in $\mathbb{R}^n$.

3. $\{1, t, t^2, \ldots, t^n\}$ is a linearly independent set of vectors in $P_n$.

4. $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n, \ldots\}$ is a linearly independent set of vectors in $\mathbb{R}^\infty$ where $\vec{e}_i$ is the infinite sequence with 1 in the $i$th place and 0 elsewhere.

5. $B = \{E_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n\}$ is a linearly independent set of vectors in $M_{m,n}(\mathbb{R})$ where $E_{i,j}$ is the $m \times n$ matrix with $(i,j)$-entry 1 and 0 elsewhere.

6. Consider the following three polynomials in $P_2$: $\vec{p}_1(t) = t + 2t^2$, $\vec{p}_2(t) = 2 + 2t^2$, and $\vec{p}_3(t) = 1 - t - t^2$. Show that $\{\vec{p}_1, \vec{p}_2, \vec{p}_3\}$ is a linearly dependent set in $P_2$ (see the sketch after this example).

Suppose $c_1\vec{p}_1 + c_2\vec{p}_2 + c_3\vec{p}_3 = \vec{0}$ for some scalars $c_1, c_2, c_3$. Then for all $t$,
$$(c_1\vec{p}_1 + c_2\vec{p}_2 + c_3\vec{p}_3)(t) = 0$$
$$c_1\vec{p}_1(t) + c_2\vec{p}_2(t) + c_3\vec{p}_3(t) = 0$$
$$c_1(t + 2t^2) + c_2(2 + 2t^2) + c_3(1 - t - t^2) = 0$$
$$(2c_2 + c_3) + (c_1 - c_3)t + (2c_1 + 2c_2 - c_3)t^2 = 0.$$
Thus $2c_2 + c_3 = 0$, $c_1 - c_3 = 0$, $2c_1 + 2c_2 - c_3 = 0$. One solution is $(c_1, c_2, c_3) = (2, -1, 2)$. So $2\vec{p}_1 - \vec{p}_2 + 2\vec{p}_3 = \vec{0}$ and $\{\vec{p}_1, \vec{p}_2, \vec{p}_3\}$ is a linearly dependent set in $P_2$.
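The three coefficient equations form a homogeneous linear system, so a null-space computation recovers the dependence relation (a sketch assuming Python with scipy; not part of the notes):

```python
import numpy as np
from scipy.linalg import null_space

# Columns are the coefficient vectors of p1, p2, p3 in the basis (1, t, t^2).
M = np.array([[0.0, 2.0,  1.0],   # constant terms
              [1.0, 0.0, -1.0],   # coefficients of t
              [2.0, 2.0, -1.0]])  # coefficients of t^2
ns = null_space(M)                # nonzero null space <=> linear dependence
print(2 * ns[:, 0] / ns[0, 0])    # proportional to (2, -1, 2)
```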

Theorem 7.1. A set $S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of $k \ge 2$ vectors in a vector space $V$ is linearly dependent if and only if there exists a vector in $S$ that is a linear combination of the other vectors in $S$.


7.4 Basis and Dimensions

Definition. A basis of a nontrivial subspace $S$ of a vector space $V$ is a subset $B$ of $S$ such that

(a) $\operatorname{Span}(B) = S$ and

(b) $B$ is a linearly independent set.

We define the basis of the trivial subspace $\{\vec{0}_V\}$ to be $B = \emptyset$. The number of vectors in a basis $B$ is the dimension of $S$, denoted by $\dim(S)$ or $\dim S$.

Remark. If a basis of $V$ consists of $n$ vectors, then each basis of $V$ has exactly $n$ vectors and $\dim(V) = n$. If $\dim(V)$ is a positive integer, $V$ is called a finite-dimensional vector space. Otherwise $V$ is called an infinite-dimensional vector space. If $H$ is a subspace of a finite-dimensional vector space $V$, then $\dim(H) \le \dim(V)$ (see the Extension Theorem below).

Example.

1. $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is a basis of $\mathbb{R}^n$. So $\dim(\mathbb{R}^n) = n$.

2. $\{1, t, t^2, \ldots, t^n\}$ is a basis of $P_n$. So $\dim(P_n) = n + 1$.

3. $B = \{E_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n\}$ is a basis of $M_{m,n}(\mathbb{R})$ where $E_{i,j}$ is the $m \times n$ matrix with $(i,j)$-entry 1 and 0 elsewhere. So $\dim(M_{m,n}(\mathbb{R})) = mn$.

4. $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n, \ldots\}$ is a basis of $\mathbb{R}^\infty$ where $\vec{e}_i$ is the infinite sequence with 1 in the $i$th place and 0 elsewhere. So $\mathbb{R}^\infty$ is an infinite-dimensional vector space.

Now we present some important theorems regarding bases of a subspace of a vector space.

Theorem 7.2 (Unique Representation Theorem). Let $S$ be a subspace of a vector space $V$. Then $B = \{\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k\}$ is a basis of $S$ if and only if each vector $\vec{v}$ of $S$ is a unique linear combination of $\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k$, i.e., $\vec{v} = c_1\vec{b}_1 + c_2\vec{b}_2 + \cdots + c_k\vec{b}_k$ for unique scalars $c_1, c_2, \ldots, c_k$.

Theorem 7.3 (Reduction Theorem). Let $S$ be a subspace of a vector space $V$. If a set $B = \{\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k\}$ of vectors of $S$ spans $S$, then either $B$ is a basis of $S$ or a subset of $B$ is a basis of $S$.

Theorem 7.4 (Extension Theorem). Let $S$ be a subspace of a vector space $V$. If a set $B = \{\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k\}$ of vectors of $S$ is linearly independent, then either $B$ is a basis of $S$ or a superset of $B$ is a basis of $S$.

Example. For $\vec{p}_1(t) = t + 2t^2$, $\vec{p}_2(t) = 2 + 2t^2$, and $\vec{p}_3(t) = 1 - t - t^2$ in $P_2$, $\vec{p}_2 = 2\vec{p}_1 + 2\vec{p}_3$. Then $\operatorname{Span}\{\vec{p}_1, \vec{p}_2, \vec{p}_3\} = \operatorname{Span}\{\vec{p}_1, \vec{p}_3\}$ and $\{\vec{p}_1, \vec{p}_3\}$ is a basis of the subspace $\operatorname{Span}\{\vec{p}_1, \vec{p}_2, \vec{p}_3\}$ of $P_2$.


7.5 Linear Transformations

Definition. A function $T : V \to W$ from a vector space $V$ to a vector space $W$ (over the same field) is called a linear transformation if

(a) $T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v})$ for all $\vec{u}, \vec{v} \in V$ and

(b) $T(c\vec{v}) = cT(\vec{v})$ for all $\vec{v} \in V$ and all scalars $c$.

In short, a function $T : V \to W$ is a linear transformation if it preserves the linearity among vectors: $T(c\vec{u} + d\vec{v}) = cT(\vec{u}) + dT(\vec{v})$ for all $\vec{u}, \vec{v} \in V$ and all scalars $c, d$.

Definition. The set of all linear transformations from a vector space $V$ to a vector space $W$ (over the same field) is denoted by $L(V, W)$.

Example.

1. For an $m \times n$ matrix $A$, $T : \mathbb{R}^n \to \mathbb{R}^m$ defined by $T(\vec{x}) = A\vec{x}$ is a linear transformation.

2. $T : P_n \to P_{n-1}$ defined by $T(a_0 + a_1t + a_2t^2 + \cdots + a_nt^n) = a_1 + 2a_2t + \cdots + na_nt^{n-1}$ is a linear transformation.

3. The trace function $T : M_n(\mathbb{R}) \to \mathbb{R}$ defined by $T(A) = \operatorname{tr}(A)$ is a linear transformation.

4. The right shift operator $T : \mathbb{R}^\infty \to \mathbb{R}^\infty$ defined by $T(a_1, a_2, a_3, \ldots) = (0, a_1, a_2, a_3, \ldots)$ is a linear transformation.

From the definition of a linear transformation we have the following properties.

Proposition. For a linear transformation $T : V \to W$,

(a) $T(\vec{0}_V) = \vec{0}_W$ and

(b) for all $\vec{v}_1, \ldots, \vec{v}_k \in V$ and all scalars $c_1, \ldots, c_k$,
$$T(c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k) = c_1T(\vec{v}_1) + c_2T(\vec{v}_2) + \cdots + c_kT(\vec{v}_k).$$

Example. Consider the function $T : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T(x_1, x_2, x_3) = (x_1, x_2, 5)$. Since $T(0, 0, 0) = (0, 0, 5) \neq (0, 0, 0)$, $T$ is not a linear transformation.

For any given linear transformation $T : V \to W$, the domain space is $V$ and the codomain space is $W$. We study a subspace of the domain space called the kernel or null space, and a subspace of the codomain space called the image space or range.

Definition. The kernel or null space of a linear transformation $T : V \to W$, denoted by $\ker(T)$ or $\ker T$, is the following subspace of $V$:
$$\ker T = \{\vec{x} \in V \mid T(\vec{x}) = \vec{0}_W\}.$$
The nullity of $T$, denoted by $\operatorname{nullity}(T)$, is the dimension of $\ker T$, i.e.,
$$\operatorname{nullity}(T) = \dim(\ker T).$$


Example. For the linear transformation $T : M_n(\mathbb{R}) \to M_n(\mathbb{R})$ defined by $T(A) = A - A^T$,
$$\ker T = \{A \in M_n(\mathbb{R}) \mid T(A) = A - A^T = O\} = \{A \in M_n(\mathbb{R}) \mid A^T = A\},$$
the set of all $n \times n$ real symmetric matrices. Then $\operatorname{nullity}(T) = \dim(\ker T) = n(n+1)/2$.

Definition. The image space or range of a linear transformation $T : V \to W$, denoted by $\operatorname{im}(T)$ or $\operatorname{im} T$ or $T(V)$, is the following subspace of $W$:
$$\operatorname{im} T = \{T(\vec{x}) \mid \vec{x} \in V\}.$$
The rank of $T$, denoted by $\operatorname{rank}(T)$, is the dimension of $\operatorname{im} T$, i.e.,
$$\operatorname{rank}(T) = \dim(\operatorname{im} T).$$

Example. For the linear transformation $T : M_n(\mathbb{R}) \to M_n(\mathbb{R})$ defined by $T(A) = A - A^T$,
$$\operatorname{im} T = \{A - A^T \mid A \in M_n(\mathbb{R})\},$$
the set of all $n \times n$ real skew-symmetric matrices. Then
$$\operatorname{rank}(T) = \dim(\operatorname{im} T) = n(n-1)/2.$$

Theorem 7.5. Let $T : V \to W$ be a linear transformation. If $V$ has finite dimension, then
$$\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim(V).$$

Proof. (Sketch) Let $\dim(V) = n$. Start with a basis $\{\vec{v}_1, \ldots, \vec{v}_k\}$ of $\ker T$ and, by the Extension Theorem, extend it to a basis $\{\vec{v}_1, \ldots, \vec{v}_k, \vec{u}_1, \ldots, \vec{u}_{n-k}\}$ of $V$. Now show that $\{T(\vec{u}_1), \ldots, T(\vec{u}_{n-k})\}$ is a basis of $\operatorname{im} T$.

Example. For the linear transformation $T : M_n(\mathbb{R}) \to M_n(\mathbb{R})$ defined by $T(A) = A - A^T$,
$$\operatorname{rank}(T) + \operatorname{nullity}(T) = \frac{n(n-1)}{2} + \frac{n(n+1)}{2} = n^2 = \dim(M_n(\mathbb{R})).$$
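This can be verified computationally by writing out the matrix of $T$ in the basis $\{E_{i,j}\}$ of $M_n(\mathbb{R})$ (a sketch in Python with numpy; the vectorization scheme is my own choice):

```python
import numpy as np

n = 3
T = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        T[:, i * n + j] = (E - E.T).flatten()   # column = T(E_ij), vectorized

rank = np.linalg.matrix_rank(T)
print(rank, n * n - rank)   # 3 and 6: n(n-1)/2 + n(n+1)/2 = n^2 = 9
```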

Remark. When $\dim(V) = \infty$, the Rank-Nullity Theorem still holds assuming the following algebra:
$$n + \infty = \infty, \quad \infty + n = \infty, \quad \infty + \infty = \infty.$$

Now we discuss two important types of linear transformations $T : V \to W$.

Definition. Let $T : V \to W$ be a linear transformation. $T$ is onto if each $\vec{b} \in W$ has a pre-image $\vec{x}$ in $V$ under $T$, i.e., $T(\vec{x}) = \vec{b}$. $T$ is one-to-one if each $\vec{b} \in W$ has at most one pre-image in $V$ under $T$.

Example.


1. The linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x_1, x_2, x_3) = (x_1, x_2)$ is onto because each $(x_1, x_2) \in \mathbb{R}^2$ has a pre-image $(x_1, x_2, 0) \in \mathbb{R}^3$ under $T$. But $T$ is not one-to-one because $T(0, 0, 0) = T(0, 0, 1) = (0, 0)$, i.e., $(0, 0)$ has two distinct pre-images $(0, 0, 0)$ and $(0, 0, 1)$ under $T$.

2. The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(x_1, x_2) = (x_1, x_2, 0)$ is one-to-one because $T(x_1, x_2) = T(y_1, y_2) \implies (x_1, x_2, 0) = (y_1, y_2, 0) \implies (x_1, x_2) = (y_1, y_2)$. But $T$ is not onto because $(0, 0, 1) \in \mathbb{R}^3$ has no pre-image $(x_1, x_2) \in \mathbb{R}^2$ under $T$.

3. The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$ is one-to-one and onto (exercise).

Theorem 7.6. Let $T : V \to W$ be a linear transformation. Then the following are equivalent.

(a) $T$ is one-to-one.

(b) $\ker T = \{\vec{0}_V\}$.

(c) $\operatorname{nullity}(T) = 0$.

Example. The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(x_1, x_2) = (x_1, x_2, 0)$ has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$. Note that the columns of $A$ are linearly independent, $\ker T = \operatorname{NS}(A) = \{\vec{0}_2\}$, and $\operatorname{nullity}(T) = \operatorname{nullity}(A) = 0$. Thus $T$ (i.e., $\vec{x} \mapsto A\vec{x}$) is one-to-one.

Theorem 7.7. Let $T : V \to W$ be a linear transformation. Then the following are equivalent.

(a) $T$ is onto.

(b) $\operatorname{im} T = W$.

(c) $\operatorname{rank}(T) = \dim(W)$.

Example. The linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x_1, x_2, x_3) = (x_1, x_2)$ has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)\ T(\vec{e}_3)] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Note that each row of $A$ has a pivot position, $\operatorname{im} T = \operatorname{CS}(A) = \mathbb{R}^2$, and $\operatorname{rank}(T) = \operatorname{rank}(A) = 2$. Thus $T$ (i.e., $\vec{x} \mapsto A\vec{x}$) is onto.

Definition. A linear transformation $T : V \to W$ is an isomorphism if it is one-to-one and onto. When $T : V \to W$ is an isomorphism, $V$ and $W$ are called isomorphic, denoted by $V \cong W$.

Example.


1. Define $T : P_n \to \mathbb{R}^{n+1}$ by $T(a_0 + a_1t + \cdots + a_nt^n) = [a_0, a_1, \ldots, a_n]^T$. Verify that $\ker T = \{\vec{0}\}$, which implies $T : P_n \to \mathbb{R}^{n+1}$ is one-to-one. Also $T : P_n \to \mathbb{R}^{n+1}$ is onto since each $[a_0\ a_1\ \cdots\ a_n]^T \in \mathbb{R}^{n+1}$ has a pre-image $a_0 + a_1t + \cdots + a_nt^n \in P_n$ under $T$. Thus $T : P_n \to \mathbb{R}^{n+1}$ is an isomorphism and consequently $P_n$ and $\mathbb{R}^{n+1}$ are isomorphic.

2. The left shift operator $T : \mathbb{R}^\infty \to \mathbb{R}^\infty$ defined by $T(a_1, a_2, a_3, \ldots) = (a_2, a_3, a_4, \ldots)$ is a linear transformation. Since each $(a_2, a_3, a_4, \ldots) \in \mathbb{R}^\infty$ has a pre-image $(0, a_2, a_3, \ldots) \in \mathbb{R}^\infty$ under $T$, $T$ is onto, equivalently $\operatorname{im} T = \mathbb{R}^\infty$. Verify that
$$\ker T = \{(a_1, 0, 0, \ldots) \mid a_1 \in \mathbb{R}\} = \operatorname{Span}\{(1, 0, 0, \ldots)\}.$$
Thus $T$ is not one-to-one and hence not an isomorphism. Note that $T$ being onto does not imply that $T$ is one-to-one; this can happen when the domain and codomain spaces are infinite-dimensional.

Theorem 7.8. Let $T : V \to W$ be a linear transformation. If $V$ and $W$ have the same finite dimension, then the following are equivalent. (By the Rank-Nullity Theorem, $\operatorname{nullity}(T) = 0$ if and only if $\operatorname{rank}(T) = \dim(V) = \dim(W)$, which is why the dimensions must agree.)

(a) $T$ is an isomorphism.

(b) $T$ is one-to-one.

(c) $\ker T = \{\vec{0}_V\}$.

(d) $\operatorname{nullity}(T) = 0$.

(e) $T$ is onto.

(f) $\operatorname{im} T = W$.

(g) $\operatorname{rank}(T) = \dim(W)$.

Theorem 7.9. If $V$ and $W$ are isomorphic via an isomorphism $T : V \to W$, then $V$ and $W$ share linear algebraic properties such as the following.

1. $H$ is a subspace of $V$ if and only if $T(H)$ is a subspace of $W$.

2. $\{\vec{v}_1, \ldots, \vec{v}_n\}$ is linearly independent in $V$ if and only if $\{T(\vec{v}_1), \ldots, T(\vec{v}_n)\}$ is linearly independent in $W$.

3. $\{\vec{v}_1, \ldots, \vec{v}_n\}$ spans $V$ if and only if $\{T(\vec{v}_1), \ldots, T(\vec{v}_n)\}$ spans $W$.

4. $\{\vec{v}_1, \ldots, \vec{v}_n\}$ is a basis of $V$ if and only if $\{T(\vec{v}_1), \ldots, T(\vec{v}_n)\}$ is a basis of $W$.

5. $\dim(V) = \dim(W)$.

Problem. Consider the following three polynomials of $P_2$:
$$\vec{p}_1(t) = 1 + t^2, \quad \vec{p}_2(t) = -1 + 2t - t^2, \quad \text{and} \quad \vec{p}_3(t) = -1 + 4t.$$
Show that $\{\vec{p}_1, \vec{p}_2, \vec{p}_3\}$ is a basis of $P_2$.


Solution. First recall that $T : P_2 \to \mathbb{R}^3$ defined by $T(a_0 + a_1t + a_2t^2) = [a_0, a_1, a_2]^T$ is an isomorphism.
$$T(\vec{p}_1) = T(1 + t^2) = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad T(\vec{p}_2) = T(-1 + 2t - t^2) = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}, \quad T(\vec{p}_3) = T(-1 + 4t) = \begin{bmatrix} -1 \\ 4 \\ 0 \end{bmatrix}.$$
Now
$$A = [T(\vec{p}_1)\ T(\vec{p}_2)\ T(\vec{p}_3)] = \begin{bmatrix} 1 & -1 & -1 \\ 0 & 2 & 4 \\ 1 & -1 & 0 \end{bmatrix} \xrightarrow{-R_1 + R_3} \begin{bmatrix} 1 & -1 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 1 \end{bmatrix}.$$
Since the $3 \times 3$ matrix $A$ has 3 pivot positions, by the IMT the columns of $A$ are linearly independent and span $\mathbb{R}^3$. Thus $\{T(\vec{p}_1), T(\vec{p}_2), T(\vec{p}_3)\}$ is a basis of $\mathbb{R}^3$. Since $T : P_2 \to \mathbb{R}^3$ is an isomorphism, $\{\vec{p}_1, \vec{p}_2, \vec{p}_3\}$ is a basis of $P_2$.
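A numerical counterpart of this row reduction (a sketch assuming Python with numpy; not part of the notes):

```python
import numpy as np

# Columns are [p1]_B, [p2]_B, [p3]_B in the ordered basis B = (1, t, t^2).
A = np.array([[1.0, -1.0, -1.0],
              [0.0,  2.0,  4.0],
              [1.0, -1.0,  0.0]])
print(np.linalg.matrix_rank(A))   # 3: the columns are linearly independent and span R^3
```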

Definition. Suppose $B = (\vec{b}_1, \ldots, \vec{b}_n)$ is an ordered basis of a real vector space $V$. Then any vector $\vec{x} \in V$ can be written as $\vec{x} = c_1\vec{b}_1 + c_2\vec{b}_2 + \cdots + c_n\vec{b}_n$ for some unique scalars $c_1, c_2, \ldots, c_n$. The coordinate vector of $\vec{x}$ relative to $B$, or the $B$-coordinate of $\vec{x}$, denoted by $[\vec{x}]_B$, is $[\vec{x}]_B = [c_1, c_2, \ldots, c_n]^T$.

Remark. $[\ ]_B : V \to \mathbb{R}^n$ is an isomorphism.

Theorem 7.10. If $V$ is a real vector space of dimension $n$, then $V$ is isomorphic to $\mathbb{R}^n$.

Proof. Let $B$ be an ordered basis of $V$. Define the coordinate map $T : V \to \mathbb{R}^n$ by $T(\vec{x}) = [\vec{x}]_B$. It can be verified that $T$ is an isomorphism. Thus $V \cong \mathbb{R}^n$.

Definition. Let $V$ and $W$ be real vector spaces with ordered bases $B = (\vec{b}_1, \ldots, \vec{b}_n)$ and $C = (\vec{c}_1, \ldots, \vec{c}_m)$ respectively. Let $T : V \to W$ be a linear transformation. The matrix of $T$ from $B$ to $C$, denoted by $[T]_{C \leftarrow B}$ or ${}_C[T]_B$, is the following $m \times n$ matrix:
$${}_C[T]_B = \left[ [T(\vec{b}_1)]_C\ \cdots\ [T(\vec{b}_n)]_C \right].$$
Note that for all $\vec{x} \in V$,
$$[T(\vec{x})]_C = {}_C[T]_B\,[\vec{x}]_B.$$

Example. $P_n$ and $P_{n-1}$ are real vector spaces with ordered bases $B = (1, x, \ldots, x^n)$ and $C = (1, x, \ldots, x^{n-1})$ respectively. For the linear transformation $T : P_n \to P_{n-1}$ defined by


$T(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = a_1 + 2a_2x + \cdots + na_nx^{n-1}$, we have
$${}_C[T]_B = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & n \end{bmatrix}.$$
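A sketch of this matrix in action for $n = 3$ (Python with numpy; the illustration is mine, not from the notes):

```python
import numpy as np

n = 3
M = np.zeros((n, n + 1))            # the n x (n+1) matrix C[T]B
for j in range(1, n + 1):
    M[j - 1, j] = j                 # T(x^j) = j x^(j-1)

p = np.array([5.0, 1.0, 0.0, 2.0])  # [p]_B for p(x) = 5 + x + 2x^3
print(M @ p)                        # [1, 0, 6] = [p']_C, i.e., p'(x) = 1 + 6x^2
```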

[Commutative diagram: the top row sends $\vec{x} \in V$ to $T(\vec{x}) \in W$ via $T$; the coordinate maps $[\ ]_B : V \to \mathbb{R}^n$ and $[\ ]_C : W \to \mathbb{R}^m$ go down the sides; the bottom row sends $[\vec{x}]_B$ to $[T(\vec{x})]_C$ via multiplication by ${}_C[T]_B$.]

Theorem 7.11. Let $V$ be a real vector space with ordered bases $B$ and $B'$. Let $W$ be a real vector space with ordered bases $C$ and $C'$. For a linear transformation $T : V \to W$,
$${}_{C'}[T]_{B'} = {}_{C'}[I]_C\ {}_C[T]_B\ {}_B[I]_{B'}.$$

7.6 Inner Product Spaces

Definition. Let $V$ be a real vector space. An inner product on $V$, denoted by $\langle \cdot, \cdot \rangle$, is a function from $V \times V$ to $\mathbb{R}$ for which the following hold for all $\vec{u}, \vec{v}, \vec{w} \in V$ and $c, d \in \mathbb{R}$:

(a) $\langle \vec{u}, \vec{v} \rangle = \langle \vec{v}, \vec{u} \rangle$. (symmetry)

(b) $\langle c\vec{u} + d\vec{v}, \vec{w} \rangle = c\langle \vec{u}, \vec{w} \rangle + d\langle \vec{v}, \vec{w} \rangle$. (linearity)

(c) $\langle \vec{u}, \vec{u} \rangle \ge 0$ where $\langle \vec{u}, \vec{u} \rangle = 0$ if and only if $\vec{u} = \vec{0}$. (nonnegativity)

A real vector space with an inner product defined on it is called a real inner product space.

Example.

1. The real vector space $\mathbb{R}^n$ is a real inner product space with the standard inner product or the dot product:
$$\langle \vec{u}, \vec{v} \rangle = \vec{u} \cdot \vec{v} = \vec{u}^T\vec{v}.$$
We call $\mathbb{R}^n$ the $n$-dimensional Euclidean space.


2. Consider the set $\ell^2(\mathbb{R})$ of square-summable real sequences:
$$\ell^2(\mathbb{R}) = \left\{ \vec{a} = (a_1, a_2, a_3, \ldots) \in \mathbb{R}^\infty \ \Big|\ \sum_{n=1}^{\infty} a_n^2 < \infty \right\}.$$
$\ell^2(\mathbb{R})$ is a real inner product space with the following inner product:
$$\langle \vec{a}, \vec{b} \rangle = \sum_{n=1}^{\infty} a_nb_n.$$

3. The set $C[0, 1]$ of all continuous real-valued functions on $[0, 1]$ is a real inner product space with the following inner product:
$$\langle f, g \rangle = \int_0^1 f(x)g(x)\,dx.$$
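For instance, this integral inner product can be evaluated numerically (a sketch assuming Python with scipy; not part of the notes):

```python
from scipy.integrate import quad

# <f, g> on C[0, 1] for f(x) = x and g(x) = x^2.
f = lambda x: x
g = lambda x: x ** 2
val, _ = quad(lambda x: f(x) * g(x), 0.0, 1.0)
print(val)   # ∫_0^1 x^3 dx = 0.25
```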

Definition. Let $\vec{u}$ and $\vec{v}$ be in a real inner product space $V$. The length or norm of $\vec{v}$, denoted by $\|\vec{v}\|$, is defined by
$$\|\vec{v}\| = \sqrt{\langle \vec{v}, \vec{v} \rangle}.$$
$\vec{v} \in V$ is a unit vector if $\|\vec{v}\| = 1$. The distance between $\vec{u}$ and $\vec{v}$, denoted by $d(\vec{u}, \vec{v})$, is defined by
$$d(\vec{u}, \vec{v}) = \|\vec{u} - \vec{v}\|.$$

Theorem 7.12. The following are true for all vectors $\vec{u}$ and $\vec{v}$ of a real inner product space $V$ and for all scalars $c$ in $\mathbb{R}$.

(a) $\|\vec{v}\|^2 = \langle \vec{v}, \vec{v} \rangle$.

(b) $\|c\vec{v}\| = |c|\,\|\vec{v}\|$.

(c) Triangle inequality: $\|\vec{u} + \vec{v}\| \le \|\vec{u}\| + \|\vec{v}\|$.

(d) Parallelogram law: $\|\vec{u} + \vec{v}\|^2 + \|\vec{u} - \vec{v}\|^2 = 2\|\vec{u}\|^2 + 2\|\vec{v}\|^2$.

(e) Cauchy-Schwarz inequality: $|\langle \vec{u}, \vec{v} \rangle| \le \|\vec{u}\|\,\|\vec{v}\|$ where the equality holds if and only if $\{\vec{u}, \vec{v}\}$ is linearly dependent.

Definition. Two vectors $\vec{u}$ and $\vec{v}$ of a real inner product space $V$ are orthogonal if
$$\langle \vec{u}, \vec{v} \rangle = 0.$$

Theorem 7.13 (Pythagorean Theorem). Two vectors $\vec{u}$ and $\vec{v}$ of a real inner product space $V$ are orthogonal if and only if $\|\vec{u} + \vec{v}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2$.

Definition. The angle $\theta$ between two vectors $\vec{u}$ and $\vec{v}$ of a real inner product space $V$ is the angle in $[0, \pi]$ satisfying
$$\langle \vec{u}, \vec{v} \rangle = \|\vec{u}\|\,\|\vec{v}\| \cos\theta.$$


Definition. Let $W$ be a subspace of a real inner product space $V$. A vector $\vec{v} \in V$ is orthogonal to $W$ if $\langle \vec{v}, \vec{w} \rangle = 0$ for all $\vec{w} \in W$. The orthogonal complement of $W$, denoted by $W^\perp$, is the set of all vectors in $V$ that are orthogonal to $W$, i.e.,
$$W^\perp = \{\vec{v} \in V \mid \langle \vec{v}, \vec{w} \rangle = 0 \text{ for all } \vec{w} \in W\}.$$

Theorem 7.14. Let $W$ be a subspace of a real inner product space $V$. Then

(a) $\vec{v} \in W^\perp$ if and only if $\vec{v}$ is orthogonal to each vector $\vec{w}$ of a basis of $W$.

(b) $W^\perp$ is a subspace of $V$.

(c) $W \subseteq (W^\perp)^\perp$ where the equality holds for finite-dimensional $W$.

(d) $W \cap W^\perp = \{\vec{0}_V\}$.

Definition. A subset $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of a real inner product space $V$ is called an orthogonal set if $\langle \vec{v}_i, \vec{v}_j \rangle = 0$ for all distinct $i, j = 1, 2, \ldots, k$. Also $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is called an orthonormal set if it is an orthogonal set of unit vectors.

Theorem 7.15. If $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is an orthogonal set of nonzero vectors in a real inner product space $V$, then $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent and consequently forms a basis of $\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$.

Definition. Let $W$ be a subspace of a real inner product space $V$. An orthogonal basis of $W$ is a basis of $W$ that is an orthogonal set. Similarly, an orthonormal basis of $W$ is a basis of $W$ that is an orthonormal set.

Theorem 7.16. Let $W$ be a subspace of a real inner product space $V$ and $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ an orthogonal basis of $W$. If $\vec{v} \in W$, then
$$\vec{v} = \frac{\langle \vec{v}, \vec{w}_1 \rangle}{\langle \vec{w}_1, \vec{w}_1 \rangle}\vec{w}_1 + \frac{\langle \vec{v}, \vec{w}_2 \rangle}{\langle \vec{w}_2, \vec{w}_2 \rangle}\vec{w}_2 + \cdots + \frac{\langle \vec{v}, \vec{w}_k \rangle}{\langle \vec{w}_k, \vec{w}_k \rangle}\vec{w}_k.$$

Theorem 7.17 (Orthogonal Decomposition Theorem). Let $W$ be a subspace of a real inner product space $V$ and $\vec{y} \in V$. Then
$$\vec{y} = \vec{w} + \vec{z}$$
for unique vectors $\vec{w} \in W$ and $\vec{z} \in W^\perp$. Moreover, if $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ is an orthogonal basis of $W$, then
$$\vec{w} = \frac{\langle \vec{y}, \vec{w}_1 \rangle}{\langle \vec{w}_1, \vec{w}_1 \rangle}\vec{w}_1 + \frac{\langle \vec{y}, \vec{w}_2 \rangle}{\langle \vec{w}_2, \vec{w}_2 \rangle}\vec{w}_2 + \cdots + \frac{\langle \vec{y}, \vec{w}_k \rangle}{\langle \vec{w}_k, \vec{w}_k \rangle}\vec{w}_k \quad \text{and} \quad \vec{z} = \vec{y} - \vec{w}.$$

Definition. Let $W$ be a subspace of a real inner product space $V$. Each vector $\vec{y} \in V$ can be uniquely written as $\vec{y} = \vec{w} + \vec{z}$ where $\vec{w} \in W$ and $\vec{z} \in W^\perp$. The unique vector $\vec{w} \in W$ is called the orthogonal projection of $\vec{y}$ onto $W$ and it is denoted by $\operatorname{proj}_W\vec{y}$.


Corollary 7.18. Let $W$ be a subspace of a real inner product space $V$ with an orthonormal basis $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. Then for each $\vec{y} \in V$,
$$\operatorname{proj}_W\vec{y} = \langle \vec{y}, \vec{w}_1 \rangle\vec{w}_1 + \langle \vec{y}, \vec{w}_2 \rangle\vec{w}_2 + \cdots + \langle \vec{y}, \vec{w}_k \rangle\vec{w}_k.$$

Theorem 7.19 (Best Approximation Theorem). Let $W$ be a subspace of a real inner product space $V$ and $\vec{b} \in V$. Then
$$\min_{\vec{w} \in W} \left\| \vec{b} - \vec{w} \right\| = \left\| \vec{b} - \operatorname{proj}_W\vec{b} \right\|.$$

Theorem 7.20 (Gram-Schmidt Process). Let $W$ be a subspace of a real inner product space $V$ with a basis $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. There is an orthogonal basis $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of $W$ where
$$\vec{v}_1 = \vec{w}_1 \quad \text{and} \quad \vec{v}_i = \vec{w}_i - \sum_{j=1}^{i-1} \frac{\langle \vec{w}_i, \vec{v}_j \rangle}{\langle \vec{v}_j, \vec{v}_j \rangle}\vec{v}_j, \quad i = 2, 3, \ldots, k.$$
Moreover, $\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_i\} = \operatorname{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_i\}$ for $i = 1, 2, \ldots, k$.
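In an inner product space the same recursion applies with $\langle \cdot, \cdot \rangle$ in place of the dot product. A sketch (Python with numpy; the coefficient representation and exact integration rule are my own choices): Gram-Schmidt applied to the basis $\{1, t, t^2\}$ of $P_2 \subseteq C[0, 1]$ with $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$.

```python
import numpy as np

def ip(p, q):
    """<p, q> = ∫_0^1 p(t) q(t) dt for polynomial coefficient vectors p, q."""
    r = np.convolve(p, q)                              # coefficients of p*q
    return sum(c / (k + 1) for k, c in enumerate(r))   # ∫_0^1 t^k dt = 1/(k+1)

def gram_schmidt(ws):
    vs = []
    for w in ws:
        v = w - sum((ip(w, u) / ip(u, u)) * u for u in vs)
        vs.append(v)
    return vs

basis = [np.array([1.0, 0.0, 0.0]),   # 1
         np.array([0.0, 1.0, 0.0]),   # t
         np.array([0.0, 0.0, 1.0])]   # t^2
for v in gram_schmidt(basis):
    print(v)   # 1, t - 1/2, t^2 - t + 1/6 (shifted Legendre polynomials)
```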
