Chapter 7
Dimension and Structure
7.1 Basis and Dimensions
Bases for Subspaces
Definition 7.1.1. A set of vectors in a subspace V of Rn is said to be a basis
for V if it is linearly independent and spans V. The set {e1, e2, · · · , en} is called the standard basis for Rn.
Theorem 7.1.2. If S = {v1, · · · ,vk} is a set of two or more nonzero vectors
in Rn, then S is linearly dependent if and only if some vector in S is a linear
combination of its predecessors.
Example 7.1.3. The vectors
v1 = (0, 1, 0), v2 = (1, 1, 0), v3 = (0, 1, 3)
are linearly independent, since none of the vectors is a linear combination of its predecessors.
Example 7.1.4. The nonzero row vectors in a row echelon form are linearly
independent.
\begin{bmatrix} 1 & * & * & * & * \\ 0 & 1 & * & * & * \\ 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1 & * & * & * & * & * \\ 0 & 0 & 1 & * & * & * \\ 0 & 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
Theorem 7.1.5 (Existence of a Basis). If V is a nonzero subspace of Rn,
then there exists a basis for V that has at most n vectors.
Theorem 7.1.6. All bases of a nonzero subspace of Rn have the same number of vectors.
Proof. Let V be a nonzero subspace of Rn, and suppose B1 = {v1, · · · ,vk} and B2 = {w1, · · · ,wm} are bases for V. We have to show m = k. Suppose k < m. Since B1 spans V, we can express each wi (i = 1, 2, · · · , m) in terms of v1, · · · ,vk:

w1 = a11v1 + a21v2 + · · · + ak1vk
w2 = a12v1 + a22v2 + · · · + ak2vk
⋮
wm = a1mv1 + a2mv2 + · · · + akmvk     (7.1)
Consider the homogeneous system

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{km} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}

of k equations in m unknowns. Since k < m, it has a nontrivial solution. Thus there exist numbers c1, c2, · · · , cm, not all zero, such that

c1a11 + c2a12 + · · · + cma1m = 0
c1a21 + c2a22 + · · · + cma2m = 0
⋮
c1ak1 + c2ak2 + · · · + cmakm = 0     (7.2)
Now c1w1 + · · · + cmwm equals

c1(a11v1 + a21v2 + · · · + ak1vk)
+ c2(a12v1 + a22v2 + · · · + ak2vk)
⋮
+ cm(a1mv1 + a2mv2 + · · · + akmvk)     (7.3)
Collecting the coefficient of each vj and using (7.2), we see the coefficients of v1, v2, · · · , vk are all zero. So we have c1w1 + · · · + cmwm = 0, which is a contradiction since B2 = {w1, · · · ,wm} is a basis and hence linearly independent.
Definition 7.1.7. If V is a nonzero subspace of Rn, then the dimension of V, written as dim(V), is the number of vectors in a basis for V.
Dimension of a Solution Space
The solution of a homogeneous linear system Ax = 0 is of the form (arising from Gauss-Jordan elimination)
x = t1v1 + · · · + tsvs,
where v1, · · · ,vs are linearly independent (see Section 3.5). These vectors are called canonical solutions, and the set of vectors {v1, · · · ,vs} is called a canonical basis for the solution space.
Example 7.1.8. Find the canonical basis for the solution space of the homogeneous linear system

x1 + 3x2 − 2x3 + 2x5 = 0
2x1 + 6x2 − 5x3 − 2x4 + 4x5 − 3x6 = 0
2x1 + 6x2 + 8x4 + 4x5 + 18x6 = 0
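The canonical basis can also be computed mechanically. The following is a small sketch in SymPy (not part of the original notes; it assumes the SymPy library is available); its nullspace() method returns exactly the canonical solutions produced by Gauss-Jordan elimination:

from sympy import Matrix

# coefficient matrix of the homogeneous system in Example 7.1.8
A = Matrix([
    [1, 3, -2,  0, 2,  0],
    [2, 6, -5, -2, 4, -3],
    [2, 6,  0,  8, 4, 18],
])

# each vector printed below is one canonical solution v_i; together
# they form a canonical basis of the solution space
for v in A.nullspace():
    print(v.T)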
Dimension of a Hyperplane
Example 7.1.9. If a = (a1, · · · , an) is a nonzero vector in Rn, then the hyperplane a⊥ is defined by the equation
a1x1 + · · · + anxn = 0
Theorem 7.1.10. If a is a nonzero vector in Rn, then dim(a⊥) = n− 1.
7.2 Properties of Bases
Properties of Bases
Theorem 7.2.1. If S = {v1, · · · ,vk} is a basis for a subspace V of Rn, then every vector in V can be expressed in exactly one way as a linear combination of the vectors in S.
Theorem 7.2.2. Let S = {v1, · · · ,vk} be a finite set of vectors in a nonzero
subspace V of Rn.
(1) If S spans V but is not a basis for V , then a basis for V can be obtained by removing appropriate vectors from S.
(2) If S is linearly independent but is not a basis for V , then a basis for V can be obtained by adding appropriate vectors to S.
Theorem 7.2.3. If V is a nonzero subspace of Rn, then dim(V ) is the maximum number of linearly independent vectors in V .
Subspaces of Subspaces
Theorem 7.2.4. If V and W are subspaces of Rn and if V is a subspace of W ,
then:
(1) 0 ≤ dim(V ) ≤ dim(W ) ≤ n.
(2) V = W if and only if dim(V ) = dim(W ).
Theorem 7.2.5. Let S = {v1, · · · ,vk} be a nonempty set of vectors in Rn, and let S′ be the set that results from adding additional vectors of Rn to S.
(1) If the additional vectors are in span(S) then span(S′) = span(S).
(2) If span(S′) = span(S), then the additional vectors are in span(S).
(3) If span(S′) and span(S) have the same dimension, then the additional
vectors are in span(S) and span(S′) = span(S).
Spanning and Linear Independence
Theorem 7.2.6. (1) A set of k linearly independent vectors in a k-dimensional subspace of Rn is a basis for that subspace.
(2) A set of k vectors that spans a k-dimensional subspace of Rn is a basis for that subspace.
(3) A set of fewer than k vectors in a k-dimensional subspace of Rn cannot span that subspace.
(4) A set with more than k vectors in a k-dimensional subspace of Rn is linearly dependent.
Unifying Theorem
Theorem 7.2.7. If A is an n × n matrix and TA is the linear operator on Rn with standard matrix A, then the following statements are equivalent.
(1) The reduced row echelon form of A is In.
(2) A is expressible as a product of elementary matrices.
(3) A is invertible.
(4) Ax = 0 has only the trivial solution.
(5) Ax = b is consistent for every b ∈ Rn.
(6) Ax = b has exactly one solution for every b ∈ Rn.
(7) det(A) ≠ 0.
(8) λ = 0 is not an eigenvalue of A.
(9) TA is one-to-one.
(10) The column vectors of A are linearly independent.
(11) The row vectors of A are linearly independent.
(12) The column vectors of A span Rn.
(13) The row vectors of A span Rn.
(14) The column vectors of A form a basis for Rn.
(15) The row vectors of A form a basis for Rn.
7.3 Fundamental Spaces of a Matrix
Rank of a Matrix
If A is an m × n matrix, then there are three important spaces associated with A.
(1) The row space of A, denoted by row(A), is the subspace of Rn spanned by the rows of A.
(2) The column space of A, denoted by col(A), is the subspace of Rm spanned by the columns of A.
(3) The null space of A, denoted by null(A), is the subspace of Rn consisting of the solutions of Ax = 0.
Considering AT, we obtain one more space, null(AT). These four subspaces are called the fundamental spaces of A.
Definition 7.3.1. The dimension of the row space of a matrix A is called the
rank of A, and the dimension of the null space of A is called the nullity of A
and is denoted by nullity(A).
Orthogonal Complements
Definition 7.3.2. If S is a nonempty set in Rn, then the orthogonal com-
plement of S, denoted by S⊥ is defined as the set of all vectors in Rn that
are orthogonal to every vector in S.
Example 7.3.3. (1) If L is a line through the origin of R3, then L⊥ is the
plane through the origin that is perpendicular to L.
(2) If S is the set of row vectors of an m × n matrix A, then S⊥ is the
solution space of Ax = 0.
Theorem 7.3.4. If S is a nonempty set in Rn, then S⊥ is a subspace of Rn.
Example 7.3.5. (1) Find the orthogonal complement of the following vec-
tors in R3.
v1 = (1, 1, 0), v2 = (0, 1, 3).
(2) Find the orthogonal complement of the same vectors in R4.
Properties of Orthogonal Complements
Theorem 7.3.6. (1) If W is a subspace of Rn, then W ∩ W⊥ = {0}.
(2) If S is a nonempty set in Rn, then S⊥ = span(S)⊥.
(3) If W is a subspace of Rn, then (W⊥)⊥ = W.
Theorem 7.3.7. If A is an m × n matrix, then the row space of A and the null space of A are orthogonal complements of each other.
Proof. If x is in the null space of A, then
Ax = 0.
In other words, the vector x is orthogonal to every row of A, hence to the row space of A. The converse also holds.
If we apply this theorem to AT we obtain the following.
Theorem 7.3.8. If A is an m × n matrix, then the column space of A and the null space of AT are orthogonal complements.
The results of two theorems can be summarized as follows:
row(A)⊥ = null(A),  null(A)⊥ = row(A)
col(A)⊥ = null(AT),  null(AT)⊥ = col(A)     (7.4)
Theorem 7.3.9. (1) Elementary row operations do not change the row space
of a matrix.
(2) Elementary row operations do not change the null space of a matrix.
(3) The nonzero row vectors in any row echelon form of a matrix form a
basis for the row space of the matrix.
Theorem 7.3.10. Let A and B be matrices with the same number of columns. Then the following statements are equivalent.
(1) A and B have the same row space.
(2) A and B have the same null space.
(3) The row vectors of A are linear combinations of the row vectors of B,
and conversely.
Proof. (1) ⇔ (2). The row space and null space of a matrix are orthogonal complements of each other. Hence if A and B have the same row space, they must have the same null space, and conversely.
Finding Basis by Row Reduction
Find a basis for a subspace W of Rn that is spanned by the vectors
S = {v1, · · · ,vk}
Example 7.3.11. (1) Find a basis for W spanned by the vectors
(1, 0, 0, 0, 2), (−2, 1,−3,−2,−4), (0, 5,−14,−9, 0), (2, 10,−28,−18, 4)
(2) Find a basis for W⊥.
sol. (1) Let
A =
1 0 0 0 2
−2 1 −3 −2 −4
0 5 −14 −9 0
2 10 −28 −18 4
(7.5)
Reducing to echelon form
U =
1 0 0 0 2
0 1 −3 −2 0
0 0 1 1 0
0 0 0 0 0
(7.6)
Extracting nonzero rows we obtain the following vectors
w1 = (1, 0, 0, 0, 2), w2 = (0, 1,−3,−2, 0), w3 = (0, 0, 1, 1, 0)
Or continuing, we get reduced row echelon form:
R =
1 0 0 0 2
0 1 0 1 0
0 0 1 1 0
0 0 0 0 0
(7.7)
We obtain another basis:
w′1 = (1, 0, 0, 0, 2), w′2 = (0, 1, 0, 1, 0), w′3 = (0, 0, 1, 1, 0)
(2) Note that row(A) = W. Hence W⊥ = row(A)⊥ = null(A). Thus we need to compute the null space of A. But Ax = 0 is equivalent to Rx = 0, where R is given in (7.7). Thus
x1 + 2x5 = 0, x2 + x4 = 0, x3 + x4 = 0,
from which we set two free variables, s = x5, t = x4. So
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -2s \\ -t \\ -t \\ t \\ s \end{bmatrix} = s \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix}     (7.8)
Thus the following vectors form a basis for W⊥.
v1 = (−2, 0, 0, 0, 1),v2 = (0,−1,−1, 1, 0)
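Both computations can be reproduced with SymPy (a sketch, not from the notes; SymPy's rref() and nullspace() are the standard calls assumed here):

from sympy import Matrix

A = Matrix([
    [ 1,  0,   0,   0,  2],
    [-2,  1,  -3,  -2, -4],
    [ 0,  5, -14,  -9,  0],
    [ 2, 10, -28, -18,  4],
])

R, pivots = A.rref()                 # reduced row echelon form (7.7)
basis_W = [R.row(i) for i in range(len(pivots))]   # nonzero rows of R
basis_W_perp = A.nullspace()         # W-perp = row(A)-perp = null(A)
print(basis_W)
print([v.T for v in basis_W_perp])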
Determining Whether a Vector is in a Given Space
We consider the following problems:
(1) Given a set of vectors S = {v1, · · · ,vn} in Rm, find conditions under
which the vector b = (b1, b2, · · · , bm) will lie in the span of S.
(2) Given an m × n matrix A, find conditions under which the vector b =
(b1, b2, · · · , bm) will lie in col(A).
(3) Given a linear transformation T : Rn → Rm, find conditions under which
the vector b = (b1, b2, · · · , bm) will lie in ran(T ).
You can check that these problems are equivalent!
Example 7.3.12. Find conditions under which the vector b = (b1, b2, · · · , b5) will lie in the span of the vectors v1, · · · ,v4 in Example 7.3.11.
sol. A direct way is to see when b can be written as a linear combination of v1, · · · ,v4, i.e., when we can find numbers x1, · · · , x4 such that the following holds:
x1v1 + · · ·+ x4v4 = b. (7.9)
This is a system of the form Cx = b where the successive columns of C are
v1, · · · ,v4. Thus the augmented system is
1 −2 0 2 b1
0 1 5 10 b2
0 −3 −14 −28 b3
0 −2 −9 −18 b4
2 −4 0 4 b5
(7.10)
Elimination gives
1 −2 0 2 b1
0 1 5 10 b2
0 0 1 2 b3 + 3b2
0 0 0 0 b4 − b3 − b2
0 0 0 0 b5 − 2b1
The consistency conditions are
b4 − b3 − b2 = 0 and b5 − 2b1 = 0.
Solution 2. (Focusing on rows rather than columns) Recall Theorem 7.2.5. The vector b lies in span{v1, · · · ,v4} iff this space has the same dimension as span{v1, · · · ,v4, b}, that is, if and only if the matrix A with row vectors v1, · · · ,v4 has the same rank as the matrix with row vectors v1, · · · ,v4, b. Thus adjoining the vector b to A yields
\begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ -2 & 1 & -3 & -2 & -4 \\ 0 & 5 & -14 & -9 & 0 \\ 2 & 10 & -28 & -18 & 4 \\ b_1 & b_2 & b_3 & b_4 & b_5 \end{bmatrix}     (7.11)
Reducing this up to the fourth row,

\begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ b_1 & b_2 & b_3 & b_4 & b_5 \end{bmatrix} → \begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & b_4-b_3-b_2 & b_5-2b_1 \end{bmatrix}     (7.12)

For this matrix to have rank 3 we must have b4 − b3 − b2 = 0 and b5 − 2b1 = 0, which is the same condition as before.
Solution 3. Note that b lies in the subspace W = span{v1, · · · ,v4} if and only if b is orthogonal to every vector in W⊥. A basis for W⊥ was shown in part (2) of Example 7.3.11 to be
u1 = (−2, 0, 0, 0, 1) and u2 = (0,−1,−1, 1, 0)
Since b is orthogonal to u1 and u2, we have b · u1 = 0, b · u2 = 0, hence
−2b1 + b5 = 0, and − b2 − b3 + b4 = 0
which is the same condition as before.
Example 7.3.13. Determine which of the vectors b1 = (7,−2, 5, 3, 14), b2 =
(7,−2, 5, 3, 6) and b3 = (0,−1, 3,−2, 0) lie in the subspace of R5 spanned by
the vectors v1, · · · ,v4 in Example 7.3.11.
Method 1. One way is to check the conditions found earlier
−2b1 + b5 = 0, and − b2 − b3 + b4 = 0.
Method 2. We form the systems Cx = b1, Cx = b2, Cx = b3 and see if these systems have solutions. Consider the augmented system [C|b1|b2|b3]:
1 −2 0 2 7 7 0
0 1 5 10 −2 −2 −1
0 −3 −14 −28 5 5 3
0 −2 −9 −18 3 3 −2
2 −4 0 4 14 6 0
(7.13)
Elimination(row echelon form) gives
1 −2 0 2 7 7 0
0 1 5 10 −2 −2 −1
0 0 1 2 −1 −1 0
0 0 0 0 0 0 −4
0 0 0 0 0 −8 0
We see that only the vector b1 = (7,−2, 5, 3, 14) lies in the subspace spanned
by the vectors v1, · · · ,v4.
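Membership can also be tested numerically by comparing ranks, as in Solution 2 above. A sketch with NumPy (not from the notes; matrix_rank is the assumed standard NumPy call):

import numpy as np

A = np.array([[1, 0, 0, 0, 2],
              [-2, 1, -3, -2, -4],
              [0, 5, -14, -9, 0],
              [2, 10, -28, -18, 4]])

def in_span(b):
    # b is in row(A) iff adjoining it as a row does not increase the rank
    return np.linalg.matrix_rank(np.vstack([A, b])) == np.linalg.matrix_rank(A)

for b in ([7, -2, 5, 3, 14], [7, -2, 5, 3, 6], [0, -1, 3, -2, 0]):
    print(b, in_span(np.array(b)))   # only b1 = (7,-2,5,3,14) prints True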
7.4 Dimension Theorem and its Implications
Dimension Theorem for Matrices
Let us recall Theorem 2.2.2: if Ax = 0 is a homogeneous linear system with n unknowns, and if the reduced row echelon form of the augmented matrix has r nonzero rows, then the system has n − r free variables. This is called the dimension theorem for homogeneous linear systems. Since, for a homogeneous system, the augmented matrix (augmented with a zero right-hand side) and the coefficient matrix have the same number of nonzero rows in reduced row echelon form, we can restate the dimension theorem as
number of free variables = n− rank(A)
or
rank(A) + number of free variables = number of columns (7.14)
But the number of free variables is the same as the nullity of A. Hence we
have
Theorem 7.4.1 (Dimension theorem for Matrices). If A is an m × n matrix, then
rank(A) + nullity(A) = n     (7.15)
Example 7.4.2.
A = \begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ -2 & 1 & -3 & -2 & -4 \\ 0 & 5 & -14 & -9 & 0 \\ 2 & 10 & -28 & -18 & 4 \end{bmatrix}     (7.16)
rank(A) + nullity(A) = 3 + 2 = 5.
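This identity is easy to verify in SymPy (a sketch, not from the notes; rank() and nullspace() are the assumed SymPy methods):

from sympy import Matrix

A = Matrix([[1, 0, 0, 0, 2],
            [-2, 1, -3, -2, -4],
            [0, 5, -14, -9, 0],
            [2, 10, -28, -18, 4]])

rank = A.rank()
nullity = len(A.nullspace())          # dimension of null(A)
print(rank, nullity, rank + nullity == A.cols)   # 3 2 True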
Extending a Linearly Independent Set to a Basis
Given an independent set of vectors {v1,v2, · · · ,vk}, we would like to ex-
tend it to a basis. One way is to form a matrix A having v1,v2, · · · ,vk
7.4. DIMENSION THEOREM AND ITS IMPLICATIONS 109
as rows and consider the system Ax = 0. Solving this system, we can find
the null space of A(the dimension of null(A) is n − k) whose basis we may
put wk+1,wk+2, · · · ,wn. Each of wi is orthogonal to vj , since null(A) and
row(A) are orthogonal. Hence the set {v1,v2, · · · ,vk,wk+1,wk+2, · · · ,wn} is
a linearly independent set and hence form a basis of Rn.
Example 7.4.3. Given the linearly independent vectors
v1 = (2, 0, 4, 0) and v2 = (1,−2,−1, 0),
extend them to a basis for R4. Form the matrix having these vectors as rows:
A = \begin{bmatrix} 2 & 0 & 4 & 0 \\ 1 & -2 & -1 & 0 \end{bmatrix}     (7.17)
Find the null space of A by solving Ax = 0. A row echelon form is
R = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & -2 & -3 & 0 \end{bmatrix}
Thus
x1 + 2x3 = 0,  −2x2 − 3x3 = 0,
from which we get
x = (−2s, −(3/2)s, s, t) = s(−2, −3/2, 1, 0) + t(0, 0, 0, 1).
Thus the vectors
v1 = (2, 0, 4, 0), v2 = (1,−2,−1, 0), w3 = (−2, −3/2, 1, 0), w4 = (0, 0, 0, 1)
form a basis for R4.
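The extension procedure is mechanical; a SymPy sketch of this example (not from the notes):

from sympy import Matrix

A = Matrix([[2, 0, 4, 0],
            [1, -2, -1, 0]])

extension = A.nullspace()      # vectors orthogonal to row(A), hence
                               # independent of v1, v2
basis = [A.row(0).T, A.row(1).T] + extension
print([v.T for v in basis])    # v1, v2, (-2, -3/2, 1, 0), (0, 0, 0, 1)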
Consequences of Dimension Theorem for Matrices
Theorem 7.4.4 (Dimension theorem for Matrices). If an m×n matrix A has
rank k, then
(1) A has nullity n− k.
(2) Every row echelon form of A has k nonzero rows.
(3) Every row echelon form of A has m− k zero rows.
(4) The homogeneous linear system Ax = 0 has k pivot variables(leading
variables) and n− k free variables.
Theorem 7.4.5 (Dimension theorem for Subspaces). If W is a subspace of
Rn, then
dim(W ) + dim(W⊥) = n (7.18)
Proof. We may assume W ≠ {0}. Choose a basis for W and let A be the
matrix having these vectors as rows. Obviously, the matrix A has n columns.
The row space is W and its null space is W⊥, so from dimension theorem
Theorem 7.4.1, we see
dim(W ) + dim(W⊥) = rank(A) + nullity(A) = n
Theorem 7.4.6. If A is an n × n matrix and TA is the linear operator on Rn with standard matrix A, then the following statements are equivalent.
(1) The reduced row echelon form of A is In.
(2) A is expressible as a product of elementary matrices.
(3) A is invertible.
(4) Ax = 0 has only trivial solution.
(5) Ax = b is consistent for any b ∈ Rn.
(6) Ax = b has exactly one solution for any b ∈ Rn.
(7) det(A) ≠ 0.
(8) λ = 0 is not an eigenvalue of A.
(9) TA is one-to-one.
(10) TA is onto.
(11) The column vectors of A are linearly independent.
(12) The row vectors of A are linearly independent.
(13) The column vectors of A span Rn.
(14) The row vectors of A span Rn.
(15) The column vectors of A form a basis for Rn.
(16) The row vectors of A form a basis for Rn.
(17) rank(A) = n
(18) nullity(A) = 0
More on Hyperplane
Theorem 7.4.7. If W is a subspace of Rn with dimension n − 1, then there
is a nonzero vector a for which W = a⊥; that is W is a hyperplane through
the origin in Rn.
Proof. From the dimension theorem, it follows that dim(W⊥) = 1; thus W⊥ is the span of some nonzero vector, say a, so that W⊥ = span{a}. Also, we see
W = (W⊥)⊥ = span{a}⊥ = a⊥.
Theorem 7.4.8. The orthogonal complement of a hyperplane through the origin in Rn is a line through the origin in Rn, and the orthogonal complement of a line through the origin in Rn is a hyperplane through the origin in Rn.
Specifically, if a is a nonzero vector in Rn, then the line span{a} and the hyperplane a⊥ are orthogonal complements of one another.
Rank one Matrices
Fact about rank one matrices.
• If rank(A) = 1, then the row space of A is spanned by some nonzero
vector a, all the row vectors are scalar multiples of a and the null space
of A is a⊥.
An example of a rank one matrix is the outer product of two vectors u and v:

uv^T = \begin{bmatrix} u_1v_1 & u_1v_2 & u_1v_3 & \cdots & u_1v_n \\ u_2v_1 & u_2v_2 & u_2v_3 & \cdots & u_2v_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_mv_1 & u_mv_2 & u_mv_3 & \cdots & u_mv_n \end{bmatrix}
All the rows of a rank one matrix are multiples of a single vector, and all the
columns of a rank one matrix are multiples of a single vector.
\begin{bmatrix} 2 & -4 & -6 & 0 \\ -3 & 6 & 9 & 0 \end{bmatrix}, \quad \begin{bmatrix} 2 & -2 & 1 \\ -3 & 3 & -3/2 \\ 4 & -4 & 2 \end{bmatrix}
Theorem 7.4.9. If u is an m × 1 vector and v is an n × 1 vector, then the outer product
A = uvT
is a rank one matrix. Conversely, if A is an m × n rank one matrix, then A can be written as an outer product of two vectors.
Proof. (⇐) Let A be an m × n rank one matrix; then all the rows of A are multiples of a single row, say vT. Then

A = \begin{bmatrix} u_1v^T \\ u_2v^T \\ \vdots \\ u_mv^T \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} v^T = uv^T.
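A quick numerical check (a sketch, not from the notes; the vectors are made up) that an outer product has rank one:

import numpy as np

u = np.array([1.0, 2.0, -1.0])          # m-vector
v = np.array([2.0, 0.0, 3.0, 1.0])      # n-vector
A = np.outer(u, v)                      # m x n outer product u v^T
print(np.linalg.matrix_rank(A))         # 1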
A Symmetric Rank one Matrix
An example of a symmetric rank one matrix:

uu^T = \begin{bmatrix} u_1^2 & u_1u_2 & u_1u_3 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & u_2u_3 & \cdots & u_2u_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_nu_1 & u_nu_2 & u_nu_3 & \cdots & u_n^2 \end{bmatrix}
Theorem 7.4.10. If u is an n × 1 column vector, then the outer product
uuT is a symmetric rank one matrix. Conversely, if A is an n× n symmetric
matrix of rank one, then A can be written as uuT or −uuT for some column
vector u.
Proof. (⇐) Let A be an n × n symmetric matrix of rank one; then by the above theorem A = uvT for some vectors u and v. Since A is symmetric, we have
(uvT)T = vuT = uvT.
Hence every row of A is a multiple of the vector uT as well as a multiple of the vector vT. Thus u = ±k²v for some number k, and we see A = ±k²vvT = ±(kv)(kv)T. Hence A is of the form uuT or −uuT.
7.5 The Rank Theorem and Its Implications
The Rank Theorem
First we recall the following theorem.
Theorem 7.5.1. The row space and the column space of a matrix have the same dimension.
Example 7.5.2.
A =
1 0 2 0 1
0 1 1 0 0
2 1 5 0 1
Reducing we get
1 0 2 0 1
0 1 1 0 0
0 1 1 0 −1
→
1 0 2 0 1
0 1 1 0 0
0 0 0 0 −1
The row rank is 3. Meanwhile we transpose it and compute the column rank.
1 0 2
0 1 1
2 1 5
0 0 0
1 0 1
→
1 0 2
0 1 1
0 1 1
0 0 0
0 0 −1
→
1 0 2
0 1 1
0 0 0
0 0 0
0 0 −1
Theorem 7.5.3. If A is m× n matrix, then
rank(A) = rank(AT ) (7.19)
Recall Theorem 7.4.1.
Theorem 7.5.4 (Dimension theorem for Matrices). If A is an m×n is matrix,
then
rank(A) + nullity(A) = n.
Applying this Theorem to AT , we obtain
rank(AT ) + nullity(AT ) = m.
Since rank(AT ) = rank(A), we can rewrite it as
rank(A) + nullity(AT ) = m.
If A is m×n matrix of rank k, then the dimensions of four fundamental spaces
satisfy
dim(row(A)) = k,  dim(null(A)) = n − k
dim(col(A)) = k,  dim(null(AT)) = m − k.     (7.20)
Example 7.5.5. Find the dimensions of the fundamental spaces of the following matrix.
A =
1 0 2 0 1
0 1 1 0 0
2 1 5 0 1
dim(row(A)) = dim(col(A)) = 3, dim(null(A)) = 5 − 3 = 2, dim(null(AT)) = 3 − 3 = 0.
Consistency and Rank
Theorem 7.5.6 (Consistency Theorem). Let Ax = b be an m× n system of
linear equations. Then the following statements are equivalent.
(1) Ax = b is consistent.
(2) b is in the column space of A.
[Figure 7.1: Rank and nullity of A — A maps Rn to Rm; row(A) and null(A), of dimensions k and n − k, sit in Rn; col(A) and null(AT), of dimensions k and m − k, sit in Rm.]
(3) The coefficient matrix A and its augmented matrix [A|b] have the same
rank.
Example 7.5.7. Find (if any) the solutions of the following equation.
Ax = b, where A =
1 0 2
1 1 1
2 1 3
and b =
1
0
1
Definition 7.5.8. An m × n matrix A is said to have full column rank if
its column vectors are linearly independent, and it is said to have full row
rank if its row vectors are linearly independent.
Theorem 7.5.9. Let A be an m× n matrix.
(1) A has full column rank if and only if the column vectors of A form a
basis for the column space, i.e., rank(A) = n.
(2) A has full row rank if and only if the row vectors of A form a basis for
the row space, i.e., rank(A) = m.
Theorem 7.5.10. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) Ax = 0 has only the trivial solution.
(2) Ax = b has at most one solution for every b ∈ Rm.
(3) A has full column rank.
Proof. The equivalence of (1) and (2) is the content of Theorem 3.5.3. (1) ⇔ (3):
Let a1, · · · ,an be the column vectors of A, and write Ax = 0 in the form
x1a1 + · · ·+ xnan = 0. (7.21)
Then Ax = 0 has only the trivial solution iff the vectors a1, · · · ,an are linearly
independent.
Over-determined and Under-determined Linear Systems
Theorem 7.5.11. Let A be an m× n matrix.
(1) (Over-determined) If m > n, then the system Ax = b is inconsistent for some vector b ∈ Rm.
(2) (Under-determined) If m < n, then for every vector b ∈ Rm the system Ax = b is either inconsistent or has infinitely many solutions.
Matrices of the form ATA and AAT
Let A be a matrix with column vectors a1, a2, · · · , an. Then

A^TA = \begin{bmatrix} a_1 \cdot a_1 & a_1 \cdot a_2 & \cdots & a_1 \cdot a_n \\ a_2 \cdot a_1 & a_2 \cdot a_2 & \cdots & a_2 \cdot a_n \\ \vdots & \vdots & & \vdots \\ a_n \cdot a_1 & a_n \cdot a_2 & \cdots & a_n \cdot a_n \end{bmatrix}     (7.22)

On the other hand, if r1, r2, · · · , rm are the row vectors of A, then

AA^T = \begin{bmatrix} r_1 \cdot r_1 & r_1 \cdot r_2 & \cdots & r_1 \cdot r_m \\ r_2 \cdot r_1 & r_2 \cdot r_2 & \cdots & r_2 \cdot r_m \\ \vdots & \vdots & & \vdots \\ r_m \cdot r_1 & r_m \cdot r_2 & \cdots & r_m \cdot r_m \end{bmatrix}     (7.23)
Theorem 7.5.12. Let A be an m× n matrix.
(1) A and ATA have the same null space.
(2) A and ATA have the same row space.
(3) AT and ATA have the same column space.
(4) A and ATA have the same rank.
Proof. Let A be an m × n matrix. If Ax = 0 then ATAx = 0; conversely, if ATAx = 0, then xTATAx = ‖Ax‖² = 0, so Ax = 0. Hence A and ATA have the same null space. Taking orthogonal complements, A and ATA have the same row space, and the remaining statements follow.
Theorem 7.5.13. Let A be an m× n matrix.
(1) AT and AAT have the same null space.
(2) AT and AAT have the same row space.
(3) A and AAT have the same column space.
(4) A and AAT have the same rank.
Some Unifying Theorem
Theorem 7.5.14. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) Ax = 0 has only the trivial solution.
(2) Ax = b has at most one solution for every b ∈ Rm.
(3) A has full column rank, i.e., rank(A) = n.
(4) ATA (an n × n matrix) is invertible.
Theorem 7.5.15. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) ATx = 0 has only the trivial solution.
(2) ATx = b has at most one solution for every b ∈ Rn.
(3) A has full row rank.
(4) AAT is invertible.
Example 7.5.16.
A =
1 0
2 1
−3 1
7.6 Pivot Theorem and Its Implications
Basis Problem Revisited
Consider finding a basis for a subspace W spanned by a set of vectors S =
{v1, · · · ,vs}. Two possibilities:
(1) Find any basis for W
(2) Find a basis for W consisting of vectors in S.
The first basis problem may be solved by forming a matrix whose rows are the vectors in S: one may reduce it to a row echelon form and extract the nonzero row vectors.
One way to solve the second basis problem is to form a matrix A whose columns are the vectors from S and find a basis for the column space of A.
Some remarks: We know that the row operations do not change the row
space. However, the row operations do change the column space.
Example 7.6.1.
A =
1 2 −1
2 0 2
3 2 1
E31(−3)E21(−2)⇒ B =
1 2 −1
0 −4 4
0 −4 4
If we let A = [c1, c2, c3] and B = [c′1, c′2, c′3], then we see c1 − c2 − c3 = 0 and c′1 − c′2 − c′3 = 0. In general, if B is row equivalent to A (i.e., EA = B for some product of elementary matrices E), then the solutions of Ax = 0 and Bx = 0 are the same. Hence the following relation holds:
x1c1 + x2c2 + · · · + xncn = 0
if and only if
x1c′1 + x2c′2 + · · · + xnc′n = 0.
Theorem 7.6.2. Let A and B be row equivalent matrices.
(1) If some subset of column vectors of A is linearly independent, then the
corresponding column vectors of B are linearly independent, and vice
versa.
(2) If some subset of column vectors of A is linearly dependent, then the cor-
responding column vectors of B are linearly dependent, and vice versa.
Moreover, the column vectors in the two matrices have the same depen-
dency relationships.
Example 7.6.3. Find a basis for the column space consisting of column vec-
tors of
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
sol. A row echelon form is
U =
1 −3 4 −2 5 4
0 0 1 3 −2 −6
0 0 0 0 1 5
0 0 0 0 0 0
We see rank(A) = 3. Hence it suffices to choose three linearly independent columns. Since linear independence of columns does not change under row operations, we may choose the columns corresponding to the leading 1's in the echelon form. Thus columns 1, 3 and 5 of A suffice.
Definition 7.6.4. The columns chosen above (corresponding to the leading 1's in a row echelon form) are called pivot columns.
Theorem 7.6.5 (Pivot Theorem). The pivot columns of a nonzero matrix A
form a basis for the column space of A.
Algorithm 1 -Finding Basis
If W is a subspace spanned by the vectors S = {v1, · · · ,vs}, then the following procedure produces a basis for W from S. Steps 4 and 5 give a way to express the vectors in S not in the basis as linear combinations of basis vectors.
Step 1. Form the matrix A that has v1, · · · ,vs as successive column vectors.
Step 2. Reduce A to row echelon form U and identify the pivot columns.
Step 3. Extract the pivot columns of A as a basis for W .
Step 4. If it is desired to express the vectors that are not in the basis as linear combinations of basis vectors, then continue reducing U to the reduced row echelon form R.
Step 5. Find the relations between the columns of R by inspection; the same relations hold between the columns of A.
Example 7.6.6. (a) Find a basis for the column space consisting of column
vectors of
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
and (b) express those column vectors of A that are not in the basis as linear combinations of basis vectors.
sol. (a) Row reduction gives
U =
1 −3 4 −2 5 4
0 0 1 3 −2 −6
0 0 0 0 1 5
0 0 0 0 0 0
We see columns 1, 3 and 5 from A form a basis.
(b) We need the relationships between columns. Continuing the reduction, the reduced row echelon form is

\begin{bmatrix} 1 & -3 & 0 & -14 & 13 & 28 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} ⇒ R = \begin{bmatrix} 1 & -3 & 0 & -14 & 0 & -37 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}

If we denote the columns of the reduced row echelon form by c′1, c′2, · · · , c′6, we see by inspection that
c′2 = −3c′1, c′4 = −14c′1 + 3c′3, c′6 = −37c′1 + 4c′3 + 5c′5.
Since the relations between columns do not change under row operations (Theorem 7.6.2), the same relations must hold for the columns of A, i.e.,
c2 = −3c1, c4 = −14c1 + 3c3, c6 = −37c1 + 4c3 + 5c5.
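Algorithm 1 translates directly into SymPy (a sketch, not from the notes; rref() returns the reduced form together with the pivot column indices):

from sympy import Matrix

A = Matrix([[1, -3, 4, -2, 5, 4],
            [2, -6, 9, -1, 8, 2],
            [2, -6, 9, -1, 9, 7],
            [-1, 3, -4, 2, -5, -4]])

R, pivots = A.rref()
print(pivots)                          # (0, 2, 4): columns 1, 3, 5 of A
basis = [A.col(j) for j in pivots]     # pivot columns of A: a basis for col(A)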
Bases for Fundamental Spaces
Let U be the row echelon form of a matrix A. We have seen how to find bases
of three fundamental spaces of A by reducing to row echelon form. We have
(1) The nonzero rows of U form a basis for row(A).
(2) The columns of U with leading 1's identify the pivot columns of A, and these form a basis for col(A).
(3) The canonical solutions of Ax = 0 form a basis for null(A).
How do we find a basis for null(AT)? An obvious answer is to apply row reduction to AT and find the solutions of ATx = 0. However, it would be desirable to find the basis by applying row reduction to A. But how? First note that the dimension of null(AT) is m − k, and that ATx = 0 is the same as xTA = 0.
Algorithm 2
If A is an m × n matrix with rank k, and if k < m, then we can find a basis for null(AT) by the following procedure.
Step 1. Adjoin the m × m identity matrix Im to the right-hand side of A to form [A | Im].
Step 2. Apply row operations to [A | Im] until we obtain a row echelon form; denote it by [U | E].
Step 3. Repartition [U | E] by separating the zero rows of U:
[U | E] = \begin{bmatrix} V & | & E_1 \\ 0 & | & E_2 \end{bmatrix}
where V (the nonzero rows of U) is k × n, E1 is k × m, and E2 is (m − k) × m.
Step 4. The row vectors of E2 form a basis for null(AT )
Optional proof. A vector y ∈ Rm lies in null(AT) if and only if yTA = 0. Applying elementary row operations to [A | Im] we get [EA | E] = [U | E].
Now
[U | E] = \begin{bmatrix} V & | & E_1 \\ 0 & | & E_2 \end{bmatrix}
where V is a k × n matrix. Hence
\begin{bmatrix} V \\ 0 \end{bmatrix} = U = EA = \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} A = \begin{bmatrix} E_1A \\ E_2A \end{bmatrix}
From this we see E2A = 0, the zero matrix of size (m − k) × n. Thus the row vectors of E2 are orthogonal to col(A) and hence belong to null(AT). It remains to show that the row vectors of E2 form a basis. Write the relation E2A = 0 in the form
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m-k} \end{bmatrix} A = 0.
The m − k rows of E2 are linearly independent (E is invertible, so its rows are). Since the dimension of null(AT) is m − k, we conclude that the m − k rows of E2 span the null space of AT.
Example 7.6.7. Find a basis for null(AT ) using the procedure above.
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
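A sketch of Algorithm 2 in SymPy (not from the notes; row_join, eye, rref and rank are the assumed SymPy calls). Row-reducing [A | I] is still a sequence of row operations on the whole matrix, so the rows whose A-part is zero end up orthogonal to col(A):

from sympy import Matrix, eye

A = Matrix([[1, -3, 4, -2, 5, 4],
            [2, -6, 9, -1, 8, 2],
            [2, -6, 9, -1, 9, 7],
            [-1, 3, -4, 2, -5, -4]])

m = A.rows
aug, _ = A.row_join(eye(m)).rref()   # [U | E] with U the rref of A
E = aug[:, A.cols:]
k = A.rank()
E2 = E[k:, :]                        # rows paired with the zero rows of U
print(E2)                            # each row is a basis vector of null(A^T)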
Column Row Factorization
Theorem 7.6.8 (Column Row Factorization). If A is m×n matrix with rank
k, then A can be factored as
A = CR, (7.24)
where C is the m × k matrix whose column vectors are the pivot columns of A and R is the k × n matrix whose row vectors are the nonzero rows in the reduced row echelon form of A.
Proof. Applying elementary row operations to [A | Im] we get the reduced row echelon form [EA | E] = [R0 | E]. Now partition R0 and E−1 as
R_0 = \begin{bmatrix} R \\ 0 \end{bmatrix} and E^{-1} = [C | D],
where R consists of the nonzero rows of R0, C consists of the first k columns of E−1, and D consists of the last m − k columns of E−1. Hence
A = E^{-1}R_0 = [C | D] \begin{bmatrix} R \\ 0 \end{bmatrix} = CR + D\,0 = CR.     (7.25)
Here we can see that the successive columns of C are the pivot columns of A: the columns of R in the pivot positions are the standard basis vectors e1, e2, · · · , ek, so the j-th pivot column of A is Cej, which is the j-th column of C.
Example 7.6.9. (a) Find a Column Row Factorization for the following ma-
trix using the reduced row echelon form.
A =
1 2 8
−1 −1 −5
2 5 19
→
1 0 2
0 1 3
0 0 0
Hence the first two columns of A are pivot columns of A and the corresponding
rows from the reduced row echelon form are (1, 0, 2) and (0, 1, 3). Hence we
have
A = \begin{bmatrix} 1 & 2 \\ -1 & -1 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \end{bmatrix} = CR.
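The CR factorization falls straight out of the rref; a SymPy sketch of this example (not from the notes; extract() selects submatrices by row/column lists):

from sympy import Matrix

A = Matrix([[1, 2, 8],
            [-1, -1, -5],
            [2, 5, 19]])

R0, pivots = A.rref()
k = len(pivots)
C = A.extract(list(range(A.rows)), list(pivots))   # pivot columns of A
R = R0[:k, :]                                      # nonzero rows of the rref
assert C * R == A
print(C, R)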
Column Row Expansion
We have seen in chapter 3 that a matrix can be expressed as a sum of outer
products of columns from the first factor and rows from the second factor.
Thus the previous factorization has the following interpretation.
Theorem 7.6.10 (Column Row Expansion). If A is an m × n matrix with rank k, then A can be factored as
A = c1r1 + c2r2 + · · · + ckrk,     (7.26)
where c1, c2, · · · , ck are the successive pivot columns of A and r1, r2, · · · , rk are the successive nonzero rows in the reduced row echelon form of A.
Example 7.6.11. k = 2 in the above example, so we have A = c1r1 + c2r2, i.e.,

\begin{bmatrix} 1 & 2 & 8 \\ -1 & -1 & -5 \\ 2 & 5 & 19 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \begin{bmatrix} 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 \\ -1 & 0 & -2 \\ 2 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 2 & 6 \\ 0 & -1 & -3 \\ 0 & 5 & 15 \end{bmatrix}.
7.7 Projection Theorem and its Implications
Orthogonal Projection onto Lines
Example 7.7.1. Let a be any nonzero vector. Consider the orthogonal pro-
jection of any vector x onto the line W = span{a}. If x1 is the orthogonal
projection of x, then we have
x = x1 + x2, x1 = ka for some scalar k and x2 ⊥ a.
Since
0 = x2 · a = (x − ka) · a,
we see k = (x · a)/‖a‖². Thus the orthogonal projection of any vector x onto the line W = span{a} is given by
Proj_a x = ((x · a)/‖a‖²) a.     (7.27)
Here x − Proj_a x is the component of x orthogonal to a.
The orthogonal projection onto the line through the origin making angle θ with the x-axis is given by the matrix representation
P_θ = \begin{bmatrix} \frac{1}{2}(1+\cos 2θ) & \frac{1}{2}\sin 2θ \\ \frac{1}{2}\sin 2θ & \frac{1}{2}(1-\cos 2θ) \end{bmatrix} = \begin{bmatrix} \cos^2 θ & \sin θ \cos θ \\ \sin θ \cos θ & \sin^2 θ \end{bmatrix}     (7.28)
Deriving the matrix representation (7.28) from (7.27)
Let us derive the matrix representation (7.28) again. Let
u = a/‖a‖ = (cos θ, sin θ).
Compute
Proj_u e1 = (e1 · u)u = (cos θ)u,  Proj_u e2 = (e2 · u)u = (sin θ)u.
Thus we obtain
P_θ = [Proj_u e1  Proj_u e2] = \begin{bmatrix} \cos^2 θ & \sin θ \cos θ \\ \sin θ \cos θ & \sin^2 θ \end{bmatrix}
Projection Operators on Rn
If we use the concept of operator, then the orthogonal projection operator T : Rn → Rn is defined by
T(x) = Proj_a x = ((x · a)/‖a‖²) a.     (7.29)
Theorem 7.7.2. The standard matrix for the operator T(x) = Proj_a x is
P = (1/(a^Ta)) aa^T.     (7.30)
Normalizing a to u = a/‖a‖, we get
P = uu^T.     (7.31)
Proof. We will be done if we compute the columns of the standard matrix:
T(e_j) = ((e_j · a)/‖a‖²) a = (a_j/‖a‖²) a.
Here aj is the j-th entry of a. Hence
P = [(a_1/‖a‖²)a, (a_2/‖a‖²)a, · · · , (a_n/‖a‖²)a] = (1/‖a‖²) a [a_1, a_2, · · · , a_n] = (1/(a^Ta)) aa^T.     (7.32)
Example 7.7.3. Find the standard matrix P when a = [2,−1, 1].
sol. a^Ta = 4 + 1 + 1 = 6 and
aa^T = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{bmatrix}
Hence
P = \frac{1}{6} \begin{bmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{bmatrix}
[Figure 7.2: Projection of x onto W = span{a}.]
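A NumPy sketch of Theorem 7.7.2 and Example 7.7.3 (not from the notes; the test vector x is made up):

import numpy as np

a = np.array([2.0, -1.0, 1.0])
P = np.outer(a, a) / (a @ a)       # (1/6) aa^T, the standard matrix above
x = np.array([1.0, 2.0, 3.0])
print(P @ x)                       # projection of x onto the line span{a}
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent, symmetric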
Orthogonal Projection onto General Subspaces
Theorem 7.7.4 (Projection Theorem for Subspaces). If W is a subspace of Rn, then every vector x in Rn can be expressed in exactly one way as
x = x1 + x2,     (7.33)
where x1 is in W and x2 is in W⊥.
Proof. We may assume W ≠ {0}, and let {w1, · · · ,wk} be a basis of W (note that k ≤ n). Let M be the n × k matrix having the wi as columns. Then W is the column space of M and the orthogonal complement of W is the null space of MT.
If x can be written in the form
x = x1 + x2, (7.34)
where x1 is in W and x2 is in W⊥, then
x1 = Mv for some v ∈ Rk and MTx2 = 0.
But then we have
MT (x− x1) = 0 or MT (x−Mv) = 0.
Now consider
MT (x−Mv) = 0 or MTMv = MTx. (7.35)
The matrix M has full column rank, hence MTM is invertible, so we have the unique solution
v = (MTM)−1MTx.     (7.36)
In the special case where W is a line through the origin, the vectors x1 and x2 are the same as in the previous example. So we have actually shown that the expression in (7.34) is possible in exactly one way. The vector x1 is called the orthogonal projection of x onto W (written ProjWx) and x2 is called the orthogonal projection of x onto W⊥ (written ProjW⊥x).
x = ProjWx+ProjW⊥x. (7.37)
The relations ProjWx = x1 = Mv and (7.36) are rewritten in the following
theorem.
Theorem 7.7.5. If W is a nonzero subspace of Rn, and if M is any matrix
whose column vectors form a basis for W , then
ProjWx = M(MTM)−1MTx, (7.38)
for x ∈ Rn.
The standard matrix corresponding to the projection is
P = M(MTM)−1MT . (7.39)
The action of MT is to eliminate the component orthogonal to col(M), and the M on the left maps the result back into col(M). One way to check this formula
is to verify that
Px1 = PMv = M(MTM)−1MTMv = Mv and Px2 = M(MTM)−1MTx2 = 0.
Example 7.7.6. Find the standard matrix P for the orthogonal projection
of R3 onto the plane
(1) x− 3y − 4z = 0.
(2) Use the matrix P to find the orthogonal projection of the vector (1, 2,−1).
First we find a basis for the plane and then form a matrix M to find P . From
the row echelon form we see that the vectors in the plane are given by
[x, y, z] = s[3, 1, 0] + t[4, 0, 1].
Thus the set {(3, 1, 0), (4, 0, 1)} is a basis. Hence
M = \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} ⇒ M^TM = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 12 \\ 12 & 17 \end{bmatrix},
and
(M^TM)^{-1} = \frac{1}{26} \begin{bmatrix} 17 & -12 \\ -12 & 10 \end{bmatrix}
Therefore the projection matrix P = M(MTM)−1MT is given by
\frac{1}{26} \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 17 & -12 \\ -12 & 10 \end{bmatrix} \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} = \frac{1}{26} \begin{bmatrix} 25 & 3 & 4 \\ 3 & 17 & -12 \\ 4 & -12 & 10 \end{bmatrix}
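A NumPy sketch of this example, including part (2), the projection of (1, 2, −1) (not from the notes):

import numpy as np

M = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 1.0]])             # columns: a basis for the plane

P = M @ np.linalg.inv(M.T @ M) @ M.T   # P = M (M^T M)^{-1} M^T
print(P * 26)                          # [[25,3,4],[3,17,-12],[4,-12,10]]
print(P @ np.array([1.0, 2.0, -1.0]))  # orthogonal projection of (1, 2, -1)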
When Does a Matrix Represent an Orthogonal Projection?
From the previous discussion we know that P = M(MTM)−1MT is the standard matrix for the projection operator onto the space W spanned by the columns of M. We observe:
(1) P T = P .
(2) P 2 = P .
Suppose we have an orthogonal projection P onto a k-dimensional subspace
W of Rn. Then
(1) The column space of P must be k-dimensional.
(2) P is symmetric.
(3) Moreover P 2 = P.(Idempotent)
These properties exactly characterize an orthogonal projection. In fact, we
have
Theorem 7.7.7 (Projection Matrix). An n × n matrix P is the standard matrix for an orthogonal projection of Rn onto a k-dimensional subspace of Rn if and only if P is symmetric and idempotent, with rank k. The subspace W is then the column space of P.
Example 7.7.8. Show that A is the standard matrix for an orthogonal projection of R3 onto a line through the origin.
A = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}
We see that A is symmetric, idempotent and has rank 1. Hence it is an orthogonal projection onto a line. The first column shows that (1, 2, 2) spans the image space W.
Strang Diagram
Consider the system Ax = b, where A is an m × n matrix. Let W = row(A) and W⊥ = null(A). Recall
x = ProjW x + ProjW⊥ x.     (7.40)
Applying this to W = row(A) and W⊥ = null(A), we get
x = xrow(A) + xnull(A). (7.41)
Similarly, we apply this to W = col(A) and W⊥ = null(AT ). For any vector
b ∈ Rm, we can decompose it as
b = bcol(A) + bnull(AT ). (7.42)
Also note the following relations:
dim(row(A)) + dim(null(A)) = n, (7.43)
dim(col(A)) + dim(null(AT )) = m. (7.44)
The system Ax = b is consistent if and only if b is in the column space of
A, if and only if bnull(AT ) = 0.
Full Column Rank and Consistency of A Linear System
Theorem 7.7.9. Let A be an m × n matrix and let b be in the column space of A.
(1) If A has full column rank, then the system Ax = b has a unique solution,
and that solution is in the row space of A.
(2) If A does not have full column rank, then the system Ax = b has in-
finitely many solutions, but there is a unique solution in the row space
of A. Moreover, among all the solutions the solution in the row space of
A has the smallest norm.
Proof. (1) If A has full column rank, then by Theorem 7.5.10 (7.5.6 of the book), the system Ax = b is either inconsistent or has a unique solution. But since b is in the column space of A, it must be consistent, and hence there exists a unique solution.
(2) Since A does not have full column rank, the system Ax = 0 has in-
finitely many solutions, and hence Ax = b has infinitely many solutions. We
recall the following.
x = xrow(A) + xnull(A) (7.45)
b = A(xrow(A) + xnull(A)) = Axrow(A). (7.46)
So there is at least one solution in the row space of A. (This also proves the
second part of (1).) To see the uniqueness of the solution in the row space for case (2), suppose xr and x′r are two solutions. Then
A(xr − x′r) = 0,
so that xr − x′r ∈ null(A). However, xr − x′r is also in the row space of A, and since row(A) = null(A)⊥, we must have xr − x′r ∈ null(A)⊥ ∩ null(A) = {0}. Finally, any solution satisfies
‖x‖ = √(‖x_{row(A)}‖² + ‖x_{null(A)}‖²) ≥ ‖x_{row(A)}‖.
Theorem 7.7.10. If W is a subspace of Rn, then (W⊥)⊥ = W .
Orthogonal Projection onto W⊥
If W is a nonzero subspace of Rn, and if M is any matrix whose column vectors
form a basis for W , then
ProjW⊥x = x−M(MTM)−1MTx = (I −M(MTM)−1MT )x, (7.47)
for x ∈ Rn.
The standard matrix corresponding to the orthogonal projection is
I − P = I −M(MTM)−1MT (7.48)
Example 7.7.11. Find the standard matrix corresponding to the orthogonal projection onto the orthogonal complement of the plane x − 3y − 4z = 0. Since
P = M(MTM)−1MT = \frac{1}{26} \begin{bmatrix} 25 & 3 & 4 \\ 3 & 17 & -12 \\ 4 & -12 & 10 \end{bmatrix},
we get
I − P = \frac{1}{26} \begin{bmatrix} 1 & -3 & -4 \\ -3 & 9 & 12 \\ -4 & 12 & 16 \end{bmatrix}
7.8 Best Approximation and Least Squares
Minimum Distance Problems
Given a subspace W and a vector b ∈ Rn, consider the problem of finding a vector ŵ ∈ W that is closest to b, i.e., find ŵ ∈ W such that
‖b − ŵ‖ ≤ ‖b − w‖, ∀w ∈ W.
Such a vector, if it exists, is called a best approximation to b from W. Recall that
b = ProjWb + ProjW⊥b.
[Figure 7.3: Projection of b onto W.]
Theorem 7.8.1 (Best Approximation Theorem). If W is a subspace and b is a vector in Rn, there is a unique best approximation to b from W, namely ŵ = ProjWb.
Proof. For every vector w ∈ W we have
b − w = (b − ProjWb) + (ProjWb − w).
Since the two terms are orthogonal (the first vector is in W⊥ and the second vector is in W), we have
‖b−w‖2 = ‖b− ProjWb‖2 + ‖ProjWb−w‖2
and hence
‖b− ProjWb‖2 ≤ ‖b−w‖2.
Here we see that
d ≡ ‖b − ProjWb‖ = ‖ProjW⊥b‖     (7.49)
is the distance from b to W.
Example 7.8.2. Find the distance from a point b = (b1, · · · , bn) to the hyperplane a1x1 + · · · + anxn = 0.
Denote the hyperplane by W. Then W⊥ = span{a}. We see the distance to the space W is
‖ProjW⊥b‖ = ‖Proj_a b‖ = |a · b|/‖a‖ = |a1b1 + · · · + anbn| / √(a1² + · · · + an²)
Definition 7.8.3. If A is an m × n matrix and b is a vector in Rm, then a vector x̂ in Rn is the best approximation solution or least square solution of Ax = b if
‖b − Ax̂‖ ≤ ‖b − Ax‖     (7.50)
for all x in Rn. The quantity ‖b − Ax̂‖ is called the least square error.
Finding Least Square Solutions
How to find the least square solutions of Ax = b ?
Noting that Ax is in the column space of A, we decompose b as
b = Projcol(A)b+ Projcol(A)⊥b.
Then the following is an orthogonal decomposition:
Ax− b = (Ax− Projcol(A)b)− Projcol(A)⊥b ∈ col(A) + col(A)⊥.
The minimum is attained when we can find an x such that
Ax = Projcol(A)b,     (7.51)
and then
min_{x∈Rn} ‖b − Ax‖ = ‖Projcol(A)⊥b‖.
In practice, one rarely solves (7.51) to compute the least square solution.
Instead, rewriting (7.51) as
b−Ax = b− Projcol(A)b (7.52)
and multiplying by AT, we see (since col(A)⊥ is equal to the null space of AT) that
AT(b − Ax) = AT(b − Projcol(A)b) = 0.
This is equivalent to
ATAx = ATb.     (7.53)
This is called the normal equation associated with Ax = b.
Theorem 7.8.4. (1) The least square solutions of Ax = b are the solutions
of the normal equation
ATAx = ATb. (7.54)
(2) If A has full column rank, the normal equation has a unique solution,
namely
x = (ATA)−1ATb. (7.55)
(3) If A does not have full column rank, the normal equation has infinitely
many solutions, but there is a unique solution in the row space of A.
Moreover, among all the solutions of the normal equation, the solution
in the row space of A has the smallest norm.
Example 7.8.5. Find the least square solution of the system
x1 − x2 = 4
3x1 + 2x2 = 1
−2x1 + 4x2 = 3.
sol. Compute
x = (ATA)−1ATb. (7.56)
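A NumPy sketch of the computation (not from the notes; lstsq is NumPy's built-in least squares solver, used here only as a cross-check):

import numpy as np

A = np.array([[1.0, -1.0],
              [3.0, 2.0],
              [-2.0, 4.0]])
b = np.array([4.0, 1.0, 3.0])

x = np.linalg.solve(A.T @ A, A.T @ b)        # solve the normal equation
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # same answer via lstsq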
Orthogonality of Least Square Error
Note the following is an orthogonal decomposition:
Ax− b = (Ax− Projcol(A)b)− Projcol(A)⊥b ∈ col(A) + col(A)⊥.
The least square solution x satisfies
Projcol(A)b−Ax = 0. (7.57)
Hence x is a least square solution if and only if
b−Ax = Projnull(AT )b.
Thus
least square error vector = b−Ax = Projnull(AT )b. (7.58)
Theorem 7.8.6. A vector x is the least square solution of Ax = b if and only
if the error b−Ax is orthogonal to the column space of A.
More Applications of Least Square Solutions: Curve Fitting
Given a set of data
(x1, y1), (x2, y2), · · · , (xn, yn)
in the xy-plane, one would like to find a line (or a curve) that fits these data best in some sense.
Assume we use the line y = a + bx. Then we must have
y1 = a + bx1
y2 = a + bx2
⋮
yn = a + bxn.     (7.59)
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}     (7.60)
So
Mv = y. (7.61)
Its least square solution is obtained if we find the solution of
MTMv = MTy. (7.62)
[Figure 7.4: Fitting data by a linear function using the least squares method.]

Least Square Solution: Higher Degree Polynomials
Given data
(x1, y1), (x2, y2), · · · , (xn, yn)
one would like to find a curve (or a line) that fits best in some sense.
Assume we try a polynomial y = a0 + a1x + · · · + amx^m. Then we must have
y1 = a0 + a1x1 + · · · + am x1^m
y2 = a0 + a1x2 + · · · + am x2^m
⋮
yn = a0 + a1xn + · · · + am xn^m     (7.63)

\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} → Mv = y.     (7.64)
Its least square solution is
v = (MTM)−1MTy.     (7.65)
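A NumPy sketch of the line fit (7.59)-(7.62) (not from the notes; the data points are made up for illustration):

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.3, 1.1, 1.6, 2.1, 2.3, 3.1])

M = np.column_stack([np.ones_like(xs), xs])    # design matrix: columns 1, x
a, b = np.linalg.solve(M.T @ M, M.T @ ys)      # normal equation for y = a + bx
print(a, b)

# For a degree-m polynomial fit (7.64), the design matrix would be
# M = np.vander(xs, m + 1, increasing=True).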
7.9 Orthonormal Bases and Gram-Schmidt
Orthogonal and Orthonormal Bases
Orthogonal Projection Using Orthonormal Bases
Recall the following.
If W is a nonzero subspace of Rn, and if M is any matrix whose column
vectors form a basis for W , then
ProjWx = M(MTM)−1MTx, (7.66)
for x ∈ Rn. If the column vectors of M are orthonormal, then MTM = I, and hence
ProjWx = MMTx.     (7.67)
The standard matrix corresponding to the projection is
P = MMT . (7.68)
Equation (7.67) can be restated in the following form.
Theorem 7.9.1. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, then the orthogonal projection of x in Rn onto W is given by
ProjWx = (x · v1)v1 + · · ·+ (x · vk)vk. (7.69)
Example 7.9.2. Find the orthogonal projection of x onto the plane W
spanned by orthonormal vectors v1,v2 .....
ProjWx = .....
2 1 4
2 0 1
4 1 3
Theorem 7.9.3. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, then orthogonal projection onto W can be expressed as
ProjWx = (x · v1)v1 + · · ·+ (x · vk)vk. (7.70)
Proof. Let M = [v1 v2 · · · vk]. Then
ProjWx = MMTx,
so (7.70) is just a restatement of this.
Trace and Orthogonal Projections
Theorem 7.9.4. If P is the standard matrix for an orthogonal projection of
Rn onto a subspace W , then trace(P ) = rank(P ).
Proof. First note that
P = MM^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_k^T \end{bmatrix} = v_1v_1^T + v_2v_2^T + \cdots + v_kv_k^T.     (7.71)
Since trace(v_iv_i^T) = ‖v_i‖² = 1 for each i, direct computation shows that trace(P) = 1 + 1 + · · · + 1 = k = rank(P).
Linear Combinations of Orthonormal Basis Vectors
Theorem 7.9.5. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, and if w is a vector in W , then
w = (w · v1)v1 + · · ·+ (w · vk)vk. (7.72)
Finding Orthonormal Bases
Theorem 7.9.6. Every nonzero subspace of Rn has an orthonormal basis.
Proof. Let W be a nonzero subspace of Rn, and let {w1, · · · ,wk} be any basis for W. It suffices to show that we can construct an orthogonal basis (we then normalize it to get an orthonormal basis). Let Wi = span{w1, · · · ,wi}, i = 1, 2, · · · , k, and proceed as follows:
Step 1. Let v1 = w1.
Step 2. Construct a vector orthogonal to v1 by computing the orthogonal projection of w2 onto W1 and subtracting it from w2. That is,
v2 = w2 − Proj_{W1}w2 = w2 − ((w2 · v1)/‖v1‖²) v1
Step 3. v3 = w3 − Proj_{W2}w3 = w3 − ((w3 · v1)/‖v1‖²) v1 − ((w3 · v2)/‖v2‖²) v2
Step 4. In general,
vj = wj − Proj_{Wj−1}wj = wj − Σ_{i=1}^{j−1} ((wj · vi)/‖vi‖²) vi, for j = 2, · · · , k.
This process is called the Gram-Schmidt process.
Example 7.9.7. Use the Gram-Schmidt process to construct an orthonormal
basis for the plane x+ y + z = 0 in R3.
sol. We need any two linearly independent vectors from the plane. Writing the equation of the plane in parametric form, we introduce y = t1, z = t2, so that
x = −t1 − t2, y = t1, z = t2.
Choosing t1 = 1, t2 = 0 and t1 = 0, t2 = 1, the resulting vectors are
w1 = (−1, 1, 0) and w2 = (−1, 0, 1).
Now use the Gram-Schmidt process:
v1 = w1 = (−1, 1, 0)
v2 = w2 − ((w2 · v1)/‖v1‖²) v1 = (−1, 0, 1) − (1/2)(−1, 1, 0) = (−1/2, −1/2, 1).
Normalizing,
q1 = (−1/√2, 1/√2, 0), q2 = (−1/√6, −1/√6, 2/√6).
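The process is easy to code; a NumPy sketch (not from the notes) applied to this example:

import numpy as np

def gram_schmidt(ws):
    # return an orthonormal basis for span(ws); ws must be independent
    vs = []
    for w in ws:
        v = w - sum((w @ u) * u for u in vs)   # subtract projections
        vs.append(v / np.linalg.norm(v))       # normalize
    return vs

q1, q2 = gram_schmidt([np.array([-1.0, 1.0, 0.0]),
                       np.array([-1.0, 0.0, 1.0])])
print(q1, q2)    # (-1/√2, 1/√2, 0) and (-1/√6, -1/√6, 2/√6)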
A Property of the Gram-Schmidt Process
Theorem 7.9.8. If S = {w1, · · · ,wk} is a basis for a nonzero subspace of
Rn, and if S′ = {v1, · · · ,vk} is the corresponding orthogonal basis produced
by Gram-Schmidt process, then
(1) {v1, · · · ,vj} is an orthogonal basis for span{w1, · · · ,wj} at the j-th
step.
(2) vj is orthogonal to span{w1, · · · ,wj−1} at the j-th step(j ≥ 2).
Extending Orthonormal Sets to Orthonormal Bases
Theorem 7.9.9. If W is a nonzero subspace of Rn, then
(1) Every orthogonal set of nonzero vectors in W can be extended to an
orthogonal basis for W .
(2) Every orthonormal set in W can be extended to an orthonormal basis for
W .
7.10 QR-Decomposition; Householder Transformation
QR-Decomposition
Suppose A is an m × k matrix with full column rank (this requires m ≥ k) whose successive column vectors are {w1, · · · ,wk}. If the Gram-Schmidt process is applied to these vectors to produce an orthonormal basis {q1, · · · ,qk} for the column space of A, and Q is the matrix whose column vectors are {q1, · · · ,qk} in order, what is the relationship between A and Q?
Let A and Q be the matrices having wi and qi as columns, i.e.,
A = [w1,w2, · · · ,wk], Q = [q1,q2, · · · ,qk].
We can express each wi in terms of the orthonormal column vectors of Q as
w_i = Σ_{j=1}^{k} c_{ij} q_j.
By the orthonormality of the qj's, we see cij = wi · qj, and hence
w1 = (w1 · q1)q1 + (w1 · q2)q2 + · · · + (w1 · qk)qk
w2 = (w2 · q1)q1 + (w2 · q2)q2 + · · · + (w2 · qk)qk
⋮
wk = (wk · q1)q1 + (wk · q2)q2 + · · · + (wk · qk)qk.
By Theorem 7.9.8, qj is orthogonal to wi when i < j. Hence we have
w1 = (w1 · q1)q1
w2 = (w2 · q1)q1 + (w2 · q2)q2
⋮
wk = (wk · q1)q1 + (wk · q2)q2 + · · · + (wk · qk)qk.
Let us form the upper triangular matrix
R = \begin{bmatrix} w_1 \cdot q_1 & w_2 \cdot q_1 & \cdots & w_k \cdot q_1 \\ 0 & w_2 \cdot q_2 & \cdots & w_k \cdot q_2 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & w_k \cdot q_k \end{bmatrix}     (7.73)
Then we can see that A = QR, i.e.,
[w_1, w_2, \cdots, w_k] = [q_1, q_2, \cdots, q_k] R.     (7.74)
Theorem 7.10.1. If A is an m × k(m ≥ k) matrix with full column rank,
then A can be factored as
A = QR, (7.75)
where Q is m× k matrix whose column vectors form an orthonormal basis for
the column space of A, and R is a k × k invertible upper triangular matrix.
In general a matrix factorization of the form A = QR, where the column vectors of Q are orthonormal and R is invertible and upper triangular, is called a QR-decomposition. The QR-decomposition is not unique! (It is unique if we require rii > 0.)
Note that the entries of R in (7.73) are generated columnwise. If we generate them row-wise instead, we get the Modified Gram-Schmidt process.
Other Ways to Obtain a QR-decomposition: (Modified) Gram-Schmidt, Householder and Rotation
One method of finding a QR-decomposition for a matrix with full column rank is to apply the Gram-Schmidt process to the column vectors of A, where R is given by (7.73). Unfortunately, it produces large round-off errors numerically, so it is not recommended for numerical purposes. There are other methods. One is to rearrange the order of orthogonalization (called Modified Gram-Schmidt). Another method is to use Householder transformations. Still another method is to use Givens rotations.
Example 7.10.2. Find a QR-decomposition of the following matrix using the
Gram-Schmidt process.
A =
1 −1 0
0 1 1
1 1 1
sol.
w1 = [1, 0, 1]T, w2 = [−1, 1, 1]T, w3 = [0, 1, 1]T
v1 = w1 = [1, 0, 1]T
v2 = w2 − ((w2 · v1)/‖v1‖²) v1 = [−1, 1, 1]T − 0 = [−1, 1, 1]T
v3 = w3 − ((w3 · v1)/‖v1‖²) v1 − ((w3 · v2)/‖v2‖²) v2 = [0, 1, 1]T − (1/2)[1, 0, 1]T − (2/3)[−1, 1, 1]T = [1/6, 1/3, −1/6]T
q1 = (1/√2)[1, 0, 1]T, q2 = (1/√3)[−1, 1, 1]T, q3 = (1/√6)[1, 2, −1]T
R = \begin{bmatrix} w_1 \cdot q_1 & w_2 \cdot q_1 & w_3 \cdot q_1 \\ 0 & w_2 \cdot q_2 & w_3 \cdot q_2 \\ 0 & 0 & w_3 \cdot q_3 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & 1/\sqrt{6} \end{bmatrix}
Example 7.10.3. Find a QR-decomposition of the matrix
1 −1 4
1 4 −2
1 4 2
1 −1 0
Solution.
‖w1‖ = 2, q1 = w1/‖w1‖ = (1/2)(1, 1, 1, 1)
w′2 = w2 − (w2 · q1)q1 = (−1, 4, 4, −1) − 3 · (1/2)(1, 1, 1, 1) = (1/2)(−5, 5, 5, −5)
‖w′2‖ = 5, q2 = w′2/‖w′2‖ = (1/2)(−1, 1, 1, −1)
w′3 = w3 − (w3 · q1)q1 − (w3 · q2)q2 = (4, −2, 2, 0) − 2 · (1/2)(1, 1, 1, 1) − (−2) · (1/2)(−1, 1, 1, −1) = (2, −2, 2, −2)
‖w′3‖ = 4, q3 = w′3/‖w′3‖ = (1/2)(1, −1, 1, −1)
Therefore
A = QR = \frac{1}{2} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 3 & 2 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{bmatrix}
Here the computation of R was columnwise as in (7.74). If we compute row-wise, we get the modified Gram-Schmidt process.
The Role of QR-decomposition in Least Square Problems
Recall that the least square solutions of Ax = b are the exact solutions of the normal equation ATAx = ATb, and if A has full column rank, then the unique solution is given by
x = (ATA)−1ATb. (7.76)
But solving this system by a conventional method such as LU-decomposition is not desirable because of instability.
An alternative method is to use the decomposition A = QR. With this we rewrite the equation (ATA)x = ATb as
(RTQT)QRx = RTQTb.     (7.77)
This becomes
RTRx = RTQTb. (7.78)
Since R is invertible upper triangular matrix, we can easily solve it.
Theorem 7.10.4. If A is an m × k matrix with full column rank, and if
A = QR is a QR-decomposition, then the normal equation ATAx = ATb can
be expressed as
Rx = QTb, (7.79)
and the least square solution is given by
x = R−1QTb. (7.80)
Example 7.10.5. Use a QR-decomposition to find the least square solution of
x1 + 2x2 + 4x3 = −1
x1 + x2 + x3 = 2
x1 + 2x2 − x3 = 1
x1 − 2x2 + x3 = 2.
The matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
has full column rank 3. Perform the QR-decomposition:
q1 = (1/2)(1, 1, 1, 1)
w′2 = w2 − (w2 · q1)q1 = (2, 1, 1, 0) − 2 · (1/2)(1, 1, 1, 1) = (1, 0, 0, −1)
q2 = w′2/‖w′2‖ = (1/√2)(1, 0, 0, −1)
w′3 = w3 − (w3 · q1)q1 − (w3 · q2)q2 = (3, 1, 1, 1) − 3 · (1/2)(1, 1, 1, 1) − √2 · (1/√2)(1, 0, 0, −1) = (1/2)(1, −1, −1, 1)
q3 = (1/2)(1, −1, −1, 1)
[Figure 7.5: Projection Proj_a and reflection refl_{a⊥} = I − 2Proj_a.]
Thus
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/\sqrt{2} & 1/2 \\ 1/2 & 0 & -1/2 \\ 1/2 & 0 & -1/2 \\ 1/2 & -1/\sqrt{2} & 1/2 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & \sqrt{2} & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}
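Theorem 7.10.4 can be exercised numerically; a NumPy sketch (not from the notes) for the system as stated at the start of Example 7.10.5:

import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [1.0, -2.0, 1.0]])
b = np.array([-1.0, 2.0, 1.0, 2.0])

Q, R = np.linalg.qr(A)               # reduced QR: Q is 4x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)      # solve the triangular system R x = Q^T b
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # agrees with lstsq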
Householder Reflections
Reflection about the hyperplane a⊥ (Householder reflection, elementary reflector) satisfies
x − refl_{a⊥}x = 2 Proj_a x, or refl_{a⊥}x = x − 2 Proj_a x.
Definition 7.10.6. If a is a nonzero vector in Rn and x is any vector in Rn, then the reflection of x about the hyperplane a⊥ is defined as
refl_{a⊥}x = x − 2 Proj_a x.     (7.81)
The operator T : Rn → Rn defined by T(x) = refl_{a⊥}x is the reflection of Rn about the hyperplane a⊥.
The reflection operator about a⊥ can be expressed by the matrix
H_{a⊥} = I − (2/(a^Ta)) aa^T.
It is called a Householder reflection. And if u = a/‖a‖, then
H_{u⊥} = I − 2uu^T.
This is symmetric and orthogonal.
Theorem 7.10.7. If v and w are two vectors in Rn having the same length,
then the Householder Reflection about the hyperplane (v −w)⊥ maps v into
w.
Example 7.10.8. Let v = (3, 4, 0) and w = (5, 0, 0). Find a Householder
Reflection that maps v into w.
sol. Let a = v − w = (−2, 4, 0), so ‖a‖ = √20. The Householder reflection is
H_{a⊥} = I − (2/(a^Ta)) aa^T = I − \frac{2}{20} \begin{bmatrix} -2 \\ 4 \\ 0 \end{bmatrix} \begin{bmatrix} -2 & 4 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} − \frac{1}{10} \begin{bmatrix} 4 & -8 & 0 \\ -8 & 16 & 0 \\ 0 & 0 & 0 \end{bmatrix}
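A NumPy sketch of this example (not from the notes), confirming that the reflection maps v to w:

import numpy as np

v = np.array([3.0, 4.0, 0.0])
w = np.array([5.0, 0.0, 0.0])        # same length as v, so Theorem 7.10.7 applies
a = v - w
H = np.eye(3) - 2.0 * np.outer(a, a) / (a @ a)
print(H @ v)                          # [5. 0. 0.]
print(np.allclose(H @ H, np.eye(3)))  # H is symmetric and orthogonal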
Example 7.10.9. Find a QR-decomposition of the following matrix using
Householder Reflections.
A =
1 −1 0
0 1 1
1 1 1
sol. Recall the answer by the Gram-Schmidt process (Example 7.10.2):

A = QR = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{3} & 1/\sqrt{6} \\ 0 & 1/\sqrt{3} & 2/\sqrt{6} \\ 1/\sqrt{2} & 1/\sqrt{3} & -1/\sqrt{6} \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & 1/\sqrt{6} \end{bmatrix}

Let us try Householder reflections. The columns of A are
a1 = [1, 0, 1]T, a2 = [−1, 1, 1]T, a3 = [0, 1, 1]T.
Since ‖a1‖ = √2, take α1 = √2 and
u1 = (α1e1 − a1)/‖α1e1 − a1‖ = [√2 − 1, 0, −1]T/√(4 − 2√2).
Then
Q_1 = I − 2u_1u_1^T = I − \frac{1}{2-\sqrt{2}} \begin{bmatrix} 3-2\sqrt{2} & 0 & 1-\sqrt{2} \\ 0 & 0 & 0 \\ 1-\sqrt{2} & 0 & 1 \end{bmatrix}
and
Q_1A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} − \frac{1}{2-\sqrt{2}} \begin{bmatrix} 4-3\sqrt{2} & -2+\sqrt{2} & 1-\sqrt{2} \\ 0 & 0 & 0 \\ 2-\sqrt{2} & \sqrt{2} & 1 \end{bmatrix} = \frac{1}{2-\sqrt{2}} \begin{bmatrix} -2+2\sqrt{2} & 0 & -1+\sqrt{2} \\ 0 & 2-\sqrt{2} & 2-\sqrt{2} \\ 0 & 2-2\sqrt{2} & 1-\sqrt{2} \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 1 \\ 0 & -\sqrt{2} & -1/\sqrt{2} \end{bmatrix}

For the lower-right 2 × 2 block A2 = \begin{bmatrix} 1 & 1 \\ -\sqrt{2} & -1/\sqrt{2} \end{bmatrix}, let a2 = [1, −√2]T be its first column. Then ‖a2‖ = √3 = α2, so α2e1 − a2 = (√3 − 1, √2), and
u_2 = (α2e1 − a2)/‖α2e1 − a2‖ = [√3 − 1, √2]T/√(6 − 2√3),
Q'_2 = I − 2u_2u_2^T = I − \frac{1}{3-\sqrt{3}} \begin{bmatrix} 4-2\sqrt{3} & \sqrt{6}-\sqrt{2} \\ \sqrt{6}-\sqrt{2} & 2 \end{bmatrix} = \frac{1}{3-\sqrt{3}} \begin{bmatrix} -1+\sqrt{3} & -\sqrt{6}+\sqrt{2} \\ -\sqrt{6}+\sqrt{2} & 1-\sqrt{3} \end{bmatrix} = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & -\sqrt{2} \\ -\sqrt{2} & -1 \end{bmatrix}

In full size,
Q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & -1 \end{bmatrix}, \quad Q_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{3} & -\sqrt{2}/\sqrt{3} \\ 0 & -\sqrt{2}/\sqrt{3} & -1/\sqrt{3} \end{bmatrix},

Q_1Q_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -\sqrt{2}/\sqrt{3} & -1/\sqrt{3} \\ 0 & \sqrt{2}/\sqrt{3} & -2/\sqrt{3} \\ 1 & \sqrt{2}/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}

Applying Q′2 to the block A2,
Q'_2A_2 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & -\sqrt{2} \\ -\sqrt{2} & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -\sqrt{2} & -1/\sqrt{2} \end{bmatrix} = \frac{1}{\sqrt{6}} \begin{bmatrix} 3\sqrt{2} & 2\sqrt{2} \\ 0 & -1 \end{bmatrix}

So
Q_2Q_1A = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & -1/\sqrt{6} \end{bmatrix} = R'

This coincides with the earlier result (by the Gram-Schmidt process) except for the sign of the (3,3) entry.
QR-Decomposition using Householder Reflections
Let A be a 4 × 4 matrix
$$A = \begin{bmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{bmatrix}$$
If we can find orthogonal matrices Q1, Q2, Q3 such that
$$Q_3Q_2Q_1A = R$$
is upper triangular, then we would have the QR-decomposition
$$A = Q_1^{-1}Q_2^{-1}Q_3^{-1}R = Q_1^TQ_2^TQ_3^TR = QR. \qquad (7.82)$$
Now the first step of QR is to reduce the first column to a multiple of e1.
Let A = [a1, A2]. Then we can find an orthogonal matrix Q1 = H_{u⊥} which
maps the first column of A onto α1e1. Since the vectors a1 and α1e1 have the
same length, we have α1 = ±‖a1‖ (to avoid dividing by a small number, choose
the sign that makes ‖α1e1 − a1‖ larger). We see u is given by
$$u = \frac{\alpha_1 e_1 - a_1}{\|\alpha_1 e_1 - a_1\|}. \qquad (7.83)$$
and
$$Q_1A = \begin{bmatrix}\alpha_1 & b_2 & \cdots & b_n\\ 0 & & & \\ \vdots & & A_{12} & \\ 0 & & & \end{bmatrix}.$$
We repeat the same process. If the (n−1) × (n−1) matrix Q2′ is the elementary
reflector used to map the first column of A12 to a vector of the form α2e1,
then the matrix
$$Q_2 = \begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & & & \\ \vdots & & Q_2' & \\ 0 & & & \end{bmatrix}$$
makes
$$Q_2Q_1A = \begin{bmatrix}\alpha_1 & b_2 & \cdots & b_n\\ 0 & \alpha_2 & * & *\\ \vdots & 0 & A_{23}' & \\ 0 & 0 & & \end{bmatrix}$$
Repeat the same process to obtain
$$Q_{n-1}\cdots Q_2Q_1A = R \ \text{(upper triangular)}.$$
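The procedure above translates directly into code. The following is a minimal sketch (assuming numpy; the function name householder_qr is ours), including the sign choice for α discussed above:

import numpy as np

def householder_qr(A):
    # QR-decomposition by successive Householder reflections:
    # Q_{n-1}...Q_2 Q_1 A = R, so A = Q R with Q = Q_1 Q_2 ... (each Q_j = Q_j^T)
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(min(m - 1, n)):
        a = R[j:, j].copy()
        # alpha = +-||a||, sign chosen so that ||alpha*e1 - a|| is larger
        alpha = -np.copysign(np.linalg.norm(a), a[0])
        a[0] -= alpha                       # a <- a - alpha*e1 (same line as (7.83))
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:                   # column already reduced; nothing to do
            continue
        u = a / norm_a
        H = np.eye(m)
        H[j:, j:] -= 2.0 * np.outer(u, u)   # embed Q_j' as in the block matrix above
        R = H @ R
        Q = Q @ H                           # H is symmetric orthogonal, so H^T = H
    return Q, R

A = np.array([[1., -1., 0.], [0., 1., 1.], [1., 1., 1.]])
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(3))
# Up to signs, Q and R match the hand computation of Example 7.10.9.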
7.11 Coordinates with Respect to a Basis
Nonrectangular Coordinate Systems in Rn
Example 7.11.1. If x = (x1, x2) we can write x = x1e1 + x2e2. But given
two linearly independent vectors v1 and v2 we may instead find numbers c1, c2 such that
$$x = c_1v_1 + c_2v_2.$$
The numbers (c1, c2) are the coordinates of x w.r.t. the coordinate system v1
and v2.
If v1 = (1, 0) and v2 = (1/2, √3/2), then the point (4, √3) = 3(1, 0) + 2(1/2, √3/2)
has coordinates (3, 2).
[Figure 7.6: Coordinates of (4, √3) w.r.t. v1 and v2]
Definition 7.11.2. If B = {v1,v2, · · · ,vk} is an ordered basis for a subspace
W of Rn and if
w = a1v1 + a2v2 + · · · + akvk,
then we call
$$a_1, a_2, \cdots, a_k$$
the coordinates of w with respect to the coordinate system B; here
aj is called the vj-coordinate of w. We denote it by
$$(w)_B = (a_1, a_2, \cdots, a_k)$$
and call it the coordinate vector for w with respect to B. The column
vector
$$[w]_B = \begin{bmatrix}a_1\\a_2\\\vdots\\a_k\end{bmatrix}$$
is called the coordinate matrix for w with respect to B.
Example 7.11.3. Let
$$v_1 = (1,-2,5),\quad v_2 = (0,-1,3),\quad v_3 = (0,-1,1).$$
(1) Express the vector b1 = (3, −2, 5) with respect to the basis B = {v1, v2, v3}.
(2) Find the vector w in R3 having (−2, 1, 4) as coordinate vector (w)B .
sol. (1) Find the solution of the relation
$$b_1 = (3,-2,5) = a_1v_1 + a_2v_2 + a_3v_3.$$
Thus solving
$$\begin{bmatrix}1&0&0\\-2&-1&-1\\5&3&1\end{bmatrix}\begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix} = \begin{bmatrix}3\\-2\\5\end{bmatrix}$$
(the coefficient matrix is [v1 v2 v3]), we see a1 = 3, a2 = −3, a3 = −1.
Actually, in vector notation we have a = [B]^{-1}b1.
(2)
w = −2v1 + v2 + 4v3 = (−2,−1,−3).
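A numpy sketch of both parts (an illustration, not part of the notes; B below has v1, v2, v3 as its columns):

import numpy as np

B = np.array([[1., 0., 0.], [-2., -1., -1.], [5., 3., 1.]])  # columns v1, v2, v3
a = np.linalg.solve(B, np.array([3., -2., 5.]))
print(a)                        # [ 3. -3. -1.], the coordinates of b1 w.r.t. B
w = B @ np.array([-2., 1., 4.])
print(w)                        # [-2. -1. -3.], the vector w of part (2)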
Coordinates w.r.t. an Orthonormal Basis
If B = {v1, v2, · · · , vk} is an orthonormal basis for a subspace W of Rn and
w is any vector in W, then
$$w = (w\cdot v_1)v_1 + (w\cdot v_2)v_2 + \cdots + (w\cdot v_k)v_k. \qquad (7.84)$$
Hence the coordinates of w with respect to B are
$$(w)_B = ((w\cdot v_1), (w\cdot v_2), \cdots, (w\cdot v_k)). \qquad (7.85)$$
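A short numpy illustration (the basis v1, v2 below is an assumed example, not from the notes): for a vector known to lie in W, the coordinates come from dot products alone, with no linear system to solve.

import numpy as np

v1 = np.array([1., 0., 1.]) / np.sqrt(2)   # orthonormal basis of a plane W in R^3
v2 = np.array([0., 1., 0.])
w = 3 * v1 - 2 * v2                        # a vector known to lie in W
coords = np.array([w @ v1, w @ v2])        # (7.85): coordinates by dot products
print(coords)                              # [ 3. -2.]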
Change of Basis
If w is any vector in Rn and if we change the basis B = {v1, v2, · · · , vn} to
another basis B′ = {v1′, v2′, · · · , vn′}, what happens to the coordinates?
For simplicity, assume n = 2. Let
$$B = \{v_1, v_2\} \quad\text{and}\quad B' = \{v_1', v_2'\}.$$
If
$$[v_1]_{B'} = \begin{bmatrix}a\\b\end{bmatrix} \quad\text{and}\quad [v_2]_{B'} = \begin{bmatrix}c\\d\end{bmatrix} \qquad (7.86)$$
then
$$v_1 = av_1' + bv_2', \qquad v_2 = cv_1' + dv_2'. \qquad (7.87)$$
Now if w is any vector with
$$[w]_B = \begin{bmatrix}k_1\\k_2\end{bmatrix}, \qquad (7.88)$$
then w = k1v1 + k2v2.
To express w in the new coordinate system, we use (7.87) to see
$$w = k_1(av_1' + bv_2') + k_2(cv_1' + dv_2') = (k_1a + k_2c)v_1' + (k_1b + k_2d)v_2'.$$
Thus in the new coordinate system,
$$[w]_{B'} = \begin{bmatrix}k_1a + k_2c\\ k_1b + k_2d\end{bmatrix}.$$
This can be written as
$$[w]_{B'} = \begin{bmatrix}k_1a + k_2c\\ k_1b + k_2d\end{bmatrix} = \begin{bmatrix}a&c\\b&d\end{bmatrix}\begin{bmatrix}k_1\\k_2\end{bmatrix} = \begin{bmatrix}a&c\\b&d\end{bmatrix}[w]_B. \qquad (7.89)$$
Thus the new coordinates can be obtained by multiplying the old coordinates
by
$$\begin{bmatrix}a&c\\b&d\end{bmatrix} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'}\end{bmatrix}. \qquad (7.90)$$
Remark 7.11.4. How do we find the transformation matrix? Let a = [a, b]^T and
b = [c, d]^T. Recall Example 7.11.3 and let [B] be the matrix whose columns are
v1, v2, and similarly for [B′]. Then the columns of the transformation matrix [a b] are
given by a = [B′]^{-1}v1 and b = [B′]^{-1}v2. So the transformation matrix is
$$[B']^{-1}[B] = \begin{bmatrix}v_1' & v_2'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2\end{bmatrix}.$$
In general, this result is summarized as
Theorem 7.11.5 (Change of Basis). If w is a vector in Rn and if B =
{v1, v2, · · · , vn} and B′ = {v1′, v2′, · · · , vn′} are bases for Rn, then the coordinates
of w w.r.t. the two bases are related by
$$[w]_{B'} = P_{B\to B'}[w]_B,$$
where
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} = \begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.91)$$
is the transition matrix or the coordinate change matrix.
Example 7.11.6. The bases are B1 = {e1, e2} and B2 = {v1,v2} where
e1 = (1, 0), e2 = (0, 1), v1 = (2, 1), v2 = (−1, 2).
(1) Find the transition matrix from B1 to B2.
(2) Find [w]B2 when [w]B1 = [2,−5].
$$P_{B_1\to B_2} = \begin{bmatrix}[e_1]_{B_2} & [e_2]_{B_2}\end{bmatrix} \qquad (7.92)$$
First express e1 = (1, 0) and e2 in terms of v1 = (2, 1), v2 = (−1, 2); that is,
solve
$$\begin{bmatrix}2&-1\\1&2\end{bmatrix}\begin{bmatrix}c_1&d_1\\c_2&d_2\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} \qquad (7.93)$$
then we get
$$\begin{bmatrix}c_1\\c_2\end{bmatrix} = \begin{bmatrix}\tfrac25\\-\tfrac15\end{bmatrix}, \qquad \begin{bmatrix}d_1\\d_2\end{bmatrix} = \begin{bmatrix}\tfrac15\\\tfrac25\end{bmatrix}$$
So
$$e_1 = \tfrac25 v_1 - \tfrac15 v_2, \qquad e_2 = \tfrac15 v_1 + \tfrac25 v_2,$$
from which we see
$$[e_1]_{B_2} = \begin{bmatrix}\tfrac25\\-\tfrac15\end{bmatrix}, \qquad [e_2]_{B_2} = \begin{bmatrix}\tfrac15\\\tfrac25\end{bmatrix}$$
Finally the transition matrix is
$$P_{B_1\to B_2} = \begin{bmatrix}[e_1]_{B_2} & [e_2]_{B_2}\end{bmatrix} = \begin{bmatrix}\tfrac25&\tfrac15\\-\tfrac15&\tfrac25\end{bmatrix} = \begin{bmatrix}v_1 & v_2\end{bmatrix}^{-1}\begin{bmatrix}e_1 & e_2\end{bmatrix} \qquad (7.94)$$
For (2), $[w]_{B_2} = P_{B_1\to B_2}[w]_{B_1} = \begin{bmatrix}\tfrac25&\tfrac15\\-\tfrac15&\tfrac25\end{bmatrix}\begin{bmatrix}2\\-5\end{bmatrix} = \begin{bmatrix}-\tfrac15\\-\tfrac{12}{5}\end{bmatrix}.$
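A numpy check of this example (a sketch, not part of the notes):

import numpy as np

B2 = np.array([[2., -1.], [1., 2.]])     # columns v1, v2
P = np.linalg.solve(B2, np.eye(2))       # [v1 v2]^{-1} [e1 e2], as in (7.94)
print(P)                                 # [[ 0.4  0.2] [-0.2  0.4]]
print(P @ np.array([2., -5.]))           # [-0.2 -2.4] = [w]_{B2} for part (2)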
We note that the columns of the transition matrix were obtained by row
operations on the augmented matrix of (7.93). This gives a technique to find a
transition matrix; see below.
Theorem 7.11.7 (Inverse of a transition matrix). If B and B′ are bases for
Rn, then the transition matrices P_{B′→B} and P_{B→B′} are inverses of each other;
that is,
$$(P_{B'\to B})^{-1} = P_{B\to B'},$$
where
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} \qquad (7.95)$$
is the transition matrix or the coordinate change matrix.
A Technique to Find a Transition Matrix
Let us show how to compute the transition matrix between B and B′:
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} \qquad (7.96)$$
The entries of [vj]B′ are the coefficients required to express vj as a
linear combination of v1′, v2′, · · · , vn′, hence can be obtained by solving
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}x = v_j, \qquad (7.97)$$
for j = 1, 2, · · · , n. These correspond to reducing the augmented matrix
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n' \mid v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.98)$$
to
$$\begin{bmatrix}I \mid [v_1]_{B'}\ [v_2]_{B'}\ \cdots\ [v_n]_{B'}\end{bmatrix} = [\,I \mid P_{B\to B'}\,], \quad\text{where } P_{B\to B'} = [B']^{-1}[B].$$
In other words,
$$[\,\text{new basis} \mid \text{old basis}\,] \;\xrightarrow{\text{row operations}}\; [\,I \mid \text{transition matrix}\,] \qquad (7.99)$$
This may be viewed as
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix}$$
In summary, a procedure to compute P_{B→B′} is
Step 1. Start from the augmented matrix [B′|B].
Step 2. Use elementary row operations to reduce it to reduced row echelon form.
Step 3. Obtain [I|P_{B→B′}].
Step 4. Extract P_{B→B′}.
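A minimal sketch of this procedure (assuming numpy; the name transition_matrix is ours, and the partial pivoting is an extra safeguard not mentioned in the steps):

import numpy as np

def transition_matrix(B_new, B_old):
    # Row-reduce [B' | B] to [I | P], per the four steps above.
    # Assumes the columns of B_new form a basis, so the left block is invertible.
    M = np.hstack([B_new.astype(float), B_old.astype(float)])
    n = B_new.shape[0]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))   # partial pivoting (a safeguard)
        M[[i, p]] = M[[p, i]]
        M[i] /= M[i, i]                       # scale the pivot row
        for r in range(n):
            if r != i:
                M[r] -= M[r, i] * M[i]        # eliminate the other rows
    return M[:, n:]                           # the right block is P_{B -> B'}

B2 = np.array([[2., -1.], [1., 2.]])
print(transition_matrix(B2, np.eye(2)))       # recovers P from Example 7.11.6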
As a particular case, we obtain the following result.
Proposition 7.11.8. If we change the basis from B to the standard basis S, then
the augmented matrix
$$\begin{bmatrix}e_1 & e_2 & \cdots & e_n \mid v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.100)$$
itself reveals the transition matrix as
$$P_{B\to S} = \begin{bmatrix}v_1 \mid v_2 \mid \cdots \mid v_n\end{bmatrix} \qquad (7.101)$$
Its converse is the following.
Theorem 7.11.9 (Invertible matrices as transition matrices). Conversely, if P is any
invertible n × n matrix with column vectors p1, p2, · · · , pn, then P is the transition
matrix from the basis B = {p1, p2, · · · , pn} to the standard basis.
New Way to Think about Matrices
If
$$A = \begin{bmatrix}1&2\\5&7\end{bmatrix}$$
is any invertible matrix, we may view it as a transition matrix from the basis
$$B = \left\{\begin{bmatrix}1\\5\end{bmatrix}, \begin{bmatrix}2\\7\end{bmatrix}\right\}$$
to the standard basis. This is exactly what the above theorem says.
Coordinate Maps
If B is a basis for Rn, then the transformation
$$x \mapsto (x)_B \quad\text{or, in column notation,}\quad x \mapsto [x]_B$$
is called the coordinate map.
Theorem 7.11.10 (Coordinate maps). If B is a basis for Rn,
then the coordinate map x → [x]B is a 1-1 linear operator on Rn. Moreover,
if B is an orthonormal basis for Rn, then it is an orthogonal operator.
Transition between Orthonormal Bases
Theorem 7.11.11. If B and B′ are two orthonormal bases for Rn, then the
transition matrices PB→B′ and PB′→B are orthogonal.
[Figure 7.7: Rotation of e1 and e2 by θ]
Proof. Consider the transition matrix
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} = \begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.102)$$
Both [B′] and [B] are orthogonal matrices, since their columns are orthonormal. So is
the inverse of an orthogonal matrix, and the product of orthogonal matrices is orthogonal.
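A quick numerical check (a sketch assuming numpy; the random orthonormal bases are obtained from QR-decompositions, tying back to Section 7.10):

import numpy as np

rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # columns: an orthonormal basis
Bp, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a second orthonormal basis
P = np.linalg.solve(Bp, B)                 # P_{B->B'} = [B']^{-1}[B], as in (7.102)
assert np.allclose(P.T @ P, np.eye(3))     # the transition matrix is orthogonal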
Example 7.11.12. Let S be the standard basis for R2 and let B = {v1, v2} be the basis obtained by rotating the standard vectors about the origin by an
angle θ. Then the transition matrix from B to S is
$$P_{B\to S} = [S]^{-1}[B] = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}$$
The transition matrix from S to B is
$$P_{S\to B} = P_{B\to S}^{-1} = \begin{bmatrix}\cos\theta&\sin\theta\\ -\sin\theta&\cos\theta\end{bmatrix}$$
Both of them are orthogonal.
Application to Rotation of Coordinates
If B = {v1, v2} where v1, v2 are obtained by rotating the standard vectors
e1 and e2 by an angle of θ, then the transition matrix is
$$P_{B\to S} = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}$$
Thus if (x, y) are the coordinates in the xy-axes and (x′, y′) are the coordinates in the
x′y′-axes (rotated), then
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}x'\\y'\end{bmatrix} \qquad (7.103)$$
Rotation
Rotate the xy-coordinates by θ and call the new coordinates x′y′. Then the point P = (x, y) is
represented by (x′, y′) in the x′y′-coordinates.
[Figure 7.8: Rotation of axes. The point P has coordinates (x, y) and (x′, y′); M and M′ are the projections of P onto the x- and x′-axes.]
From Figure 7.8 we see
$$x = OM = OP\cos(\alpha + \theta) = OP\cos\alpha\cos\theta - OP\sin\alpha\sin\theta$$
$$y = MP = OP\sin(\alpha + \theta) = OP\cos\alpha\sin\theta + OP\sin\alpha\cos\theta.$$
On the other hand,
$$OP\cos\alpha = OM' = x', \qquad OP\sin\alpha = M'P = y'.$$
Proposition 7.11.13. If (x′, y′) is the new coordinate of the point P = (x, y)
in the standard xy-coordinate, then we have
x = x′ cos θ − y′ sin θ
y = x′ sin θ + y′ cos θ.
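A small numpy sketch of this change of coordinates (an illustration, not from the notes; θ and the point are arbitrary choices):

import numpy as np

theta = np.pi / 6
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # P_{B->S}, rotation by theta
xy_new = np.array([2., 1.])                       # (x', y') in the rotated axes
xy = P @ xy_new                                   # (7.103): recover (x, y)
assert np.allclose(np.linalg.solve(P, xy), xy_new)   # inverting gives (x', y') back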