Chapter 7
Dimension and Structure
7.1 Basis and Dimensions
Bases for Subspaces
Definition 7.1.1. A set of vectors in a subspace V of Rn is said to be a basis
for V if it is linearly independent and spans V. The set {e1, e2, · · · , en} is called the standard basis for Rn.
Theorem 7.1.2. If S = {v1, · · · ,vk} is a set of two or more nonzero vectors
in Rn, then S is linearly dependent if and only if some vector in S is a linear
combination of its predecessors.
Example 7.1.3. The vectors
v1 = (0, 1, 0), v2 = (1, 1, 0), v3 = (0, 1, 3)
are linearly independent, since none of the vectors is a linear combination of its predecessors.
Example 7.1.4. The nonzero row vectors in a row echelon form are linearly
independent.
\begin{bmatrix} 1 & * & * & * & * \\ 0 & 1 & * & * & * \\ 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1 & * & * & * & * & * \\ 0 & 0 & 1 & * & * & * \\ 0 & 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
Theorem 7.1.5 (Existence of a Basis). If V is a nonzero subspace of Rn,
then there exists a basis for V that has at most n vectors.
Theorem 7.1.6. All bases of a nonzero subspace of Rn have the same number of vectors.
Proof. Let V be a nonzero subspace of Rn, and suppose B1 = {v1, · · · ,vk} and B2 = {w1, · · · ,wm} are bases for V. We have to show m = k. Suppose k < m. Since B1 spans V, we can express each wi (i = 1, 2, · · · , m) in terms of v1, · · · ,vk:

w1 = a11v1 + a21v2 + · · · + ak1vk
w2 = a12v1 + a22v2 + · · · + ak2vk
⋮
wm = a1mv1 + a2mv2 + · · · + akmvk     (7.1)
Consider the homogeneous system

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{km} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}

of k equations in m unknowns. Since k < m, it has a nontrivial solution. Thus there exist numbers c1, c2, · · · , cm, not all zero, such that

c1a11 + c2a12 + · · · + cma1m = 0
c1a21 + c2a22 + · · · + cma2m = 0
⋮
c1ak1 + c2ak2 + · · · + cmakm = 0     (7.2)
Now c1w1 + · · · + cmwm equals

c1(a11v1 + a21v2 + · · · + ak1vk)
+ c2(a12v1 + a22v2 + · · · + ak2vk)
⋮
+ cm(a1mv1 + a2mv2 + · · · + akmvk)     (7.3)
Collecting the coefficient of each vj and using (7.2), we see the coefficients of v1, v2, · · · , vk are all zero. So we have c1w1 + · · · + cmwm = 0, which is a contradiction since B2 = {w1, · · · ,wm} is a basis and hence linearly independent.
Definition 7.1.7. If V is a nonzero subspace of Rn, then the dimension of V, written as dim(V), is the number of vectors in a basis for V.
Dimension of a Solution Space
The solution of a homogeneous linear system Ax = 0 is of the form (arising from Gauss-Jordan elimination)
x = t1v1 + · · · + tsvs,
where v1, · · · ,vs are linearly independent (see Section 3.5). These vectors are called canonical solutions, and the set of vectors {v1, · · · ,vs} is called a canonical basis for the solution space.
Example 7.1.8. Find the canonical basis for the solution space of the homogeneous linear system

x1 + 3x2 − 2x3 + 2x5 = 0
2x1 + 6x2 − 5x3 − 2x4 + 4x5 − 3x6 = 0
2x1 + 6x2 + 8x4 + 4x5 + 18x6 = 0
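The canonical basis can also be computed mechanically. The following is a small sketch in SymPy (not part of the original notes; it assumes the SymPy library is available); its nullspace() method returns exactly the canonical solutions produced by Gauss-Jordan elimination:

from sympy import Matrix

# coefficient matrix of the homogeneous system in Example 7.1.8
A = Matrix([
    [1, 3, -2,  0, 2,  0],
    [2, 6, -5, -2, 4, -3],
    [2, 6,  0,  8, 4, 18],
])

# each vector printed below is one canonical solution v_i; together
# they form a canonical basis of the solution space
for v in A.nullspace():
    print(v.T)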
Dimension of a Hyperplane
Example 7.1.9. If a = (a1, · · · , an) is a nonzero vector in Rn, then the hyperplane a⊥ is defined by the equation
a1x1 + · · · + anxn = 0
Theorem 7.1.10. If a is a nonzero vector in Rn, then dim(a⊥) = n− 1.
7.2 Properties of Bases
Properties of Bases
Theorem 7.2.1. If S = {v1, · · · ,vk} is a basis for a subspace V of Rn, then every vector in V can be expressed in exactly one way as a linear combination of the vectors in S.
Theorem 7.2.2. Let S = {v1, · · · ,vk} be a finite set of vectors in a nonzero
subspace V of Rn.
(1) If S spans V but is not a basis for V , then a basis for V can be obtained by removing appropriate vectors from S.
(2) If S is linearly independent but is not a basis for V , then a basis for V can be obtained by adding appropriate vectors to S.
Theorem 7.2.3. If V is a nonzero subspace of Rn, then dim(V ) is the maximum number of linearly independent vectors in V .
Subspaces of Subspaces
Theorem 7.2.4. If V and W are subspaces of Rn and if V is a subspace of W ,
then:
(1) 0 ≤ dim(V ) ≤ dim(W ) ≤ n.
(2) V = W if and only if dim(V ) = dim(W ).
Theorem 7.2.5. Let S = {v1, · · · ,vk} be a nonempty set of vectors in Rn, and let S′ be the set that results from adding additional vectors of Rn to S.
(1) If the additional vectors are in span(S) then span(S′) = span(S).
(2) If span(S′) = span(S), then the additional vectors are in span(S).
(3) If span(S′) and span(S) have the same dimension, then the additional
vectors are in span(S) and span(S′) = span(S).
Spanning and Linear Independence
Theorem 7.2.6. (1) A set of k linearly independent vectors in a k-dimensional subspace of Rn is a basis for that subspace.
(2) A set of k vectors that spans a k-dimensional subspace of Rn is a basis for that subspace.
(3) A set of fewer than k vectors in a k-dimensional subspace of Rn cannot span that subspace.
(4) A set with more than k vectors in a k-dimensional subspace of Rn is linearly dependent.
Unifying Theorem
Theorem 7.2.7. If A is an n × n matrix and TA is the linear operator on Rn with standard matrix A, then the following statements are equivalent.
(1) The reduced row echelon form of A is In.
(2) A is expressible as a product of elementary matrices.
(3) A is invertible.
(4) Ax = 0 has only the trivial solution.
(5) Ax = b is consistent for every b ∈ Rn.
(6) Ax = b has exactly one solution for every b ∈ Rn.
(7) det(A) ≠ 0.
(8) λ = 0 is not an eigenvalue of A.
(9) TA is one-to-one.
(10) The column vectors of A are linearly independent.
(11) The row vectors of A are linearly independent.
(12) The column vectors of A span Rn.
(13) The row vectors of A span Rn.
(14) The column vectors of A form a basis for Rn.
(15) The row vectors of A form a basis for Rn.
7.3 Fundamental Spaces of a Matrix
Rank of a Matrix
If A is an m × n matrix, then there are three important spaces associated with A.
(1) The row space of A, denoted by row(A), is the subspace of Rn spanned by the rows of A.
(2) The column space of A, denoted by col(A), is the subspace of Rm spanned by the columns of A.
(3) The null space of A, denoted by null(A), is the subspace of Rn consisting of the solutions of Ax = 0.
Considering AT, we obtain one more space, null(AT). These four subspaces are called the fundamental spaces of A.
Definition 7.3.1. The dimension of the row space of a matrix A is called the
rank of A, and the dimension of the null space of A is called the nullity of A
and is denoted by nullity(A).
Orthogonal Complements
Definition 7.3.2. If S is a nonempty set in Rn, then the orthogonal com-
plement of S, denoted by S⊥ is defined as the set of all vectors in Rn that
are orthogonal to every vector in S.
Example 7.3.3. (1) If L is a line through the origin of R3, then L⊥ is the
plane through the origin that is perpendicular to L.
(2) If S is the set of row vectors of an m × n matrix A, then S⊥ is the
solution space of Ax = 0.
Theorem 7.3.4. If S is a nonempty set in Rn, then S⊥ is a subspace of Rn.
Example 7.3.5. (1) Find the orthogonal complement of the following vec-
tors in R3.
v1 = (1, 1, 0), v2 = (0, 1, 3).
(2) Find the orthogonal complement of the same vectors in R4.
Properties of Orthogonal Complements
Theorem 7.3.6. (1) If W is a subspace of Rn, then W ∩ W⊥ = {0}.
(2) If S is a nonempty set in Rn, then S⊥ = span(S)⊥.
(3) If W is a subspace of Rn, then (W⊥)⊥ = W.
Theorem 7.3.7. If A is an m × n matrix, then the row space of A and the null space of A are orthogonal complements of each other.
Proof. If x is in the null space of A, then
Ax = 0.
In other words, the vector x is orthogonal to every row of A, hence to the row space of A. The converse also holds.
If we apply this theorem to AT we obtain the following.
Theorem 7.3.8. If A is an m × n matrix, then the column space of A and the null space of AT are orthogonal complements.
The results of two theorems can be summarized as follows:
row(A)⊥ = null(A),  null(A)⊥ = row(A)
col(A)⊥ = null(AT),  null(AT)⊥ = col(A)     (7.4)
Theorem 7.3.9. (1) Elementary row operations do not change the row space
of a matrix.
(2) Elementary row operations do not change the null space of a matrix.
(3) The nonzero row vectors in any row echelon form of a matrix form a
basis for the row space of the matrix.
Theorem 7.3.10. Let A and B be matrices with the same number of columns. Then the following statements are equivalent.
(1) A and B have the same row space.
(2) A and B have the same null space.
(3) The row vectors of A are linear combinations of the row vectors of B,
and conversely.
Proof. (1) ⇔ (2). The row space and null space of a matrix are orthogonal complements of each other. Hence if A and B have the same row space, they must have the same null space, and conversely.
Finding Basis by Row Reduction
Find a basis for a subspace W of Rn that is spanned by the vectors
S = {v1, · · · ,vk}
Example 7.3.11. (1) Find a basis for W spanned by the vectors
(1, 0, 0, 0, 2), (−2, 1,−3,−2,−4), (0, 5,−14,−9, 0), (2, 10,−28,−18, 4)
(2) Find a basis for W⊥.
sol. (1) Let
A =
1 0 0 0 2
−2 1 −3 −2 −4
0 5 −14 −9 0
2 10 −28 −18 4
(7.5)
Reducing to echelon form
U =
1 0 0 0 2
0 1 −3 −2 0
0 0 1 1 0
0 0 0 0 0
(7.6)
Extracting nonzero rows we obtain the following vectors
w1 = (1, 0, 0, 0, 2), w2 = (0, 1,−3,−2, 0), w3 = (0, 0, 1, 1, 0)
Or continuing, we get reduced row echelon form:
R =
1 0 0 0 2
0 1 0 1 0
0 0 1 1 0
0 0 0 0 0
(7.7)
We obtain another basis:
w′1 = (1, 0, 0, 0, 2), w′2 = (0, 1, 0, 1, 0), w′3 = (0, 0, 1, 1, 0)
(2) Note that row(A) = W. Hence W⊥ = row(A)⊥ = null(A). Thus we need to compute the null space of A. But Ax = 0 is equivalent to Rx = 0, where R is given in (7.7). Thus
x1 + 2x5 = 0, x2 + x4 = 0, x3 + x4 = 0,
from which we set two free variables, s = x5, t = x4. So
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -2s \\ -t \\ -t \\ t \\ s \end{bmatrix} = s \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix}     (7.8)
Thus the following vectors form a basis for W⊥.
v1 = (−2, 0, 0, 0, 1),v2 = (0,−1,−1, 1, 0)
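Both computations can be reproduced with SymPy (a sketch, not from the notes; SymPy's rref() and nullspace() are the standard calls assumed here):

from sympy import Matrix

A = Matrix([
    [ 1,  0,   0,   0,  2],
    [-2,  1,  -3,  -2, -4],
    [ 0,  5, -14,  -9,  0],
    [ 2, 10, -28, -18,  4],
])

R, pivots = A.rref()                 # reduced row echelon form (7.7)
basis_W = [R.row(i) for i in range(len(pivots))]   # nonzero rows of R
basis_W_perp = A.nullspace()         # W-perp = row(A)-perp = null(A)
print(basis_W)
print([v.T for v in basis_W_perp])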
Determining Whether a Vector is in a Given Space
We consider the following problems:
(1) Given a set of vectors S = {v1, · · · ,vn} in Rm, find conditions under
which the vector b = (b1, b2, · · · , bm) will lie in the span of S.
(2) Given an m × n matrix A, find conditions under which the vector b =
(b1, b2, · · · , bm) will lie in col(A).
(3) Given a linear transformation T : Rn → Rm, find conditions under which
the vector b = (b1, b2, · · · , bm) will lie in ran(T ).
You can check that these problems are equivalent!
Example 7.3.12. Find conditions under which the vector b = (b1, b2, · · · , b5) will lie in the span of the vectors v1, · · · ,v4 in Example 7.3.11.
sol. A direct way is to see when b can be written as a linear combination of v1, · · · ,v4, i.e., when we can find numbers x1, · · · , x4 such that the following holds:
x1v1 + · · ·+ x4v4 = b. (7.9)
This is a system of the form Cx = b where the successive columns of C are
v1, · · · ,v4. Thus the augmented system is
1 −2 0 2 b1
0 1 5 10 b2
0 −3 −14 −28 b3
0 −2 −9 −18 b4
2 −4 0 4 b5
(7.10)
Elimination gives
1 −2 0 2 b1
0 1 5 10 b2
0 0 1 2 b3 + 3b2
0 0 0 0 b4 − b3 − b2
0 0 0 0 b5 − 2b1
The consistency conditions are
b4 − b3 − b2 = 0 and b5 − 2b1 = 0.
Solution 2. (Focusing on rows rather than columns) Recall Theorem 7.2.5. The vector b lies in span{v1, · · · ,v4} iff this space has the same dimension as span{v1, · · · ,v4, b}, that is, if and only if the matrix A with row vectors v1, · · · ,v4 has the same rank as the matrix with row vectors v1, · · · ,v4, b. Thus adjoining the vector b to A yields
\begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ -2 & 1 & -3 & -2 & -4 \\ 0 & 5 & -14 & -9 & 0 \\ 2 & 10 & -28 & -18 & 4 \\ b_1 & b_2 & b_3 & b_4 & b_5 \end{bmatrix}     (7.11)
Reducing this up to the fourth row,

\begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ b_1 & b_2 & b_3 & b_4 & b_5 \end{bmatrix} → \begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & b_4-b_3-b_2 & b_5-2b_1 \end{bmatrix}     (7.12)

For this matrix to have rank 3 we must have b4 − b3 − b2 = 0 and b5 − 2b1 = 0, which is the same condition as before.
Solution 3. Note that b lies in the subspace W = span{v1, · · · ,v4} if and only if b is orthogonal to every vector in W⊥. A basis for W⊥ was shown in part (2) of Example 7.3.11 to be
u1 = (−2, 0, 0, 0, 1) and u2 = (0,−1,−1, 1, 0)
Since b is orthogonal to u1 and u2, we have b · u1 = 0, b · u2 = 0, hence
−2b1 + b5 = 0, and − b2 − b3 + b4 = 0
which is the same condition as before.
Example 7.3.13. Determine which of the vectors b1 = (7,−2, 5, 3, 14), b2 =
(7,−2, 5, 3, 6) and b3 = (0,−1, 3,−2, 0) lie in the subspace of R5 spanned by
the vectors v1, · · · ,v4 in Example 7.3.11.
Method 1. One way is to check the conditions found earlier
−2b1 + b5 = 0, and − b2 − b3 + b4 = 0.
Method 2. We form the systems Cx = b1, Cx = b2, Cx = b3 and see if these systems have solutions. Consider the augmented system [C|b1|b2|b3]:
1 −2 0 2 7 7 0
0 1 5 10 −2 −2 −1
0 −3 −14 −28 5 5 3
0 −2 −9 −18 3 3 −2
2 −4 0 4 14 6 0
(7.13)
Elimination(row echelon form) gives
1 −2 0 2 7 7 0
0 1 5 10 −2 −2 −1
0 0 1 2 −1 −1 0
0 0 0 0 0 0 −4
0 0 0 0 0 −8 0
We see that only the vector b1 = (7,−2, 5, 3, 14) lies in the subspace spanned
by the vectors v1, · · · ,v4.
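Membership can also be tested numerically by comparing ranks, as in Solution 2 above. A sketch with NumPy (not from the notes; matrix_rank is the assumed standard NumPy call):

import numpy as np

A = np.array([[1, 0, 0, 0, 2],
              [-2, 1, -3, -2, -4],
              [0, 5, -14, -9, 0],
              [2, 10, -28, -18, 4]])

def in_span(b):
    # b is in row(A) iff adjoining it as a row does not increase the rank
    return np.linalg.matrix_rank(np.vstack([A, b])) == np.linalg.matrix_rank(A)

for b in ([7, -2, 5, 3, 14], [7, -2, 5, 3, 6], [0, -1, 3, -2, 0]):
    print(b, in_span(np.array(b)))   # only b1 = (7,-2,5,3,14) prints True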
7.4 Dimension Theorem and its Implications
Dimension Theorem for Matrices
Let us recall Theorem 2.2.2: if Ax = 0 is a homogeneous linear system with n unknowns, and if the reduced row echelon form of the augmented matrix has r nonzero rows, then the system has n − r free variables. This is called the dimension theorem for homogeneous linear systems. Since, for a homogeneous system, the augmented matrix (augmented with a zero right-hand side) and the coefficient matrix have the same number of nonzero rows in reduced row echelon form, we can restate the dimension theorem as
number of free variables = n− rank(A)
or
rank(A) + number of free variables = number of columns (7.14)
But the number of free variables is the same as the nullity of A. Hence we
have
Theorem 7.4.1 (Dimension theorem for Matrices). If A is an m × n matrix, then
rank(A) + nullity(A) = n     (7.15)
Example 7.4.2.
A = \begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ -2 & 1 & -3 & -2 & -4 \\ 0 & 5 & -14 & -9 & 0 \\ 2 & 10 & -28 & -18 & 4 \end{bmatrix}     (7.16)
rank(A) + nullity(A) = 3 + 2 = 5.
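This identity is easy to verify in SymPy (a sketch, not from the notes; rank() and nullspace() are the assumed SymPy methods):

from sympy import Matrix

A = Matrix([[1, 0, 0, 0, 2],
            [-2, 1, -3, -2, -4],
            [0, 5, -14, -9, 0],
            [2, 10, -28, -18, 4]])

rank = A.rank()
nullity = len(A.nullspace())          # dimension of null(A)
print(rank, nullity, rank + nullity == A.cols)   # 3 2 True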
Extending a Linearly Independent Set to a Basis
Given an independent set of vectors {v1,v2, · · · ,vk}, we would like to ex-
tend it to a basis. One way is to form a matrix A having v1,v2, · · · ,vk
7.4. DIMENSION THEOREM AND ITS IMPLICATIONS 109
as rows and consider the system Ax = 0. Solving this system, we can find
the null space of A(the dimension of null(A) is n − k) whose basis we may
put wk+1,wk+2, · · · ,wn. Each of wi is orthogonal to vj , since null(A) and
row(A) are orthogonal. Hence the set {v1,v2, · · · ,vk,wk+1,wk+2, · · · ,wn} is
a linearly independent set and hence form a basis of Rn.
Example 7.4.3. Given the linearly independent vectors
v1 = (2, 0, 4, 0) and v2 = (1,−2,−1, 0),
extend them to a basis for R4. Form the matrix having these vectors as rows:
A = \begin{bmatrix} 2 & 0 & 4 & 0 \\ 1 & -2 & -1 & 0 \end{bmatrix}     (7.17)
Find the null space of A by solving Ax = 0. A row echelon form is
R = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & -2 & -3 & 0 \end{bmatrix}
Thus
x1 + 2x3 = 0,  −2x2 − 3x3 = 0,
from which we get
x = (−2s, −(3/2)s, s, t) = s(−2, −3/2, 1, 0) + t(0, 0, 0, 1).
Thus the vectors
v1 = (2, 0, 4, 0), v2 = (1,−2,−1, 0), w3 = (−2, −3/2, 1, 0), w4 = (0, 0, 0, 1)
form a basis for R4.
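The extension procedure is mechanical; a SymPy sketch of this example (not from the notes):

from sympy import Matrix

A = Matrix([[2, 0, 4, 0],
            [1, -2, -1, 0]])

extension = A.nullspace()      # vectors orthogonal to row(A), hence
                               # independent of v1, v2
basis = [A.row(0).T, A.row(1).T] + extension
print([v.T for v in basis])    # v1, v2, (-2, -3/2, 1, 0), (0, 0, 0, 1)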
Consequences of Dimension Theorem for Matrices
Theorem 7.4.4 (Dimension theorem for Matrices). If an m×n matrix A has
rank k, then
(1) A has nullity n− k.
(2) Every row echelon form of A has k nonzero rows.
(3) Every row echelon form of A has m− k zero rows.
(4) The homogeneous linear system Ax = 0 has k pivot variables(leading
variables) and n− k free variables.
Theorem 7.4.5 (Dimension theorem for Subspaces). If W is a subspace of
Rn, then
dim(W ) + dim(W⊥) = n (7.18)
Proof. We may assume W ≠ {0}. Choose a basis for W and let A be the
matrix having these vectors as rows. Obviously, the matrix A has n columns.
The row space is W and its null space is W⊥, so from dimension theorem
Theorem 7.4.1, we see
dim(W ) + dim(W⊥) = rank(A) + nullity(A) = n
Theorem 7.4.6. If A is an n × n matrix and TA is the linear operator on Rn with standard matrix A, then the following statements are equivalent.
(1) The reduced row echelon form of A is In.
(2) A is expressible as a product of elementary matrices.
(3) A is invertible.
(4) Ax = 0 has only trivial solution.
(5) Ax = b is consistent for any b ∈ Rn.
(6) Ax = b has exactly one solution for any b ∈ Rn.
(7) det(A) ≠ 0.
(8) λ = 0 is not an eigenvalue of A.
(9) TA is one-to-one.
(10) TA is onto.
(11) The column vectors of A are linearly independent.
(12) The row vectors of A are linearly independent.
(13) The column vectors of A span Rn.
(14) The row vectors of A span Rn.
(15) The column vectors of A form a basis for Rn.
(16) The row vectors of A form a basis for Rn.
(17) rank(A) = n
(18) nullity(A) = 0
More on Hyperplane
Theorem 7.4.7. If W is a subspace of Rn with dimension n − 1, then there
is a nonzero vector a for which W = a⊥; that is W is a hyperplane through
the origin in Rn.
Proof. From the dimension theorem, it follows that dim(W⊥) = 1; thus W⊥ is the span of some nonzero vector, say a, so that W⊥ = span{a}. Also, we see
W = (W⊥)⊥ = span{a}⊥ = a⊥.
Theorem 7.4.8. The orthogonal complement of a hyperplane through the origin in Rn is a line through the origin in Rn, and the orthogonal complement of a line through the origin in Rn is a hyperplane through the origin in Rn.
Specifically, if a is a nonzero vector in Rn, then the line span{a} and the hyperplane a⊥ are orthogonal complements of one another.
Rank one Matrices
Fact about rank one matrices.
• If rank(A) = 1, then the row space of A is spanned by some nonzero
vector a, all the row vectors are scalar multiples of a and the null space
of A is a⊥.
An example of a rank one matrix is the outer product of two vectors u and v:

uv^T = \begin{bmatrix} u_1v_1 & u_1v_2 & u_1v_3 & \cdots & u_1v_n \\ u_2v_1 & u_2v_2 & u_2v_3 & \cdots & u_2v_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_mv_1 & u_mv_2 & u_mv_3 & \cdots & u_mv_n \end{bmatrix}
All the rows of a rank one matrix are multiples of a single vector, and all the
columns of a rank one matrix are multiples of a single vector.
\begin{bmatrix} 2 & -4 & -6 & 0 \\ -3 & 6 & 9 & 0 \end{bmatrix}, \quad \begin{bmatrix} 2 & -2 & 1 \\ -3 & 3 & -3/2 \\ 4 & -4 & 2 \end{bmatrix}
Theorem 7.4.9. If u is an m × 1 vector and v is an n × 1 vector, then the outer product
A = uvT
is a rank one matrix. Conversely, if A is an m × n rank one matrix, then A can be written as an outer product of two vectors.
Proof. (⇐) Let A be an m × n rank one matrix; then all the rows of A are multiples of a single row, say vT. Then

A = \begin{bmatrix} u_1v^T \\ u_2v^T \\ \vdots \\ u_mv^T \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} v^T = uv^T.
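A quick numerical check (a sketch, not from the notes; the vectors are made up) that an outer product has rank one:

import numpy as np

u = np.array([1.0, 2.0, -1.0])          # m-vector
v = np.array([2.0, 0.0, 3.0, 1.0])      # n-vector
A = np.outer(u, v)                      # m x n outer product u v^T
print(np.linalg.matrix_rank(A))         # 1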
A Symmetric Rank one Matrix
An example of a symmetric rank one matrix:

uu^T = \begin{bmatrix} u_1^2 & u_1u_2 & u_1u_3 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & u_2u_3 & \cdots & u_2u_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_nu_1 & u_nu_2 & u_nu_3 & \cdots & u_n^2 \end{bmatrix}
Theorem 7.4.10. If u is an n × 1 column vector, then the outer product
uuT is a symmetric rank one matrix. Conversely, if A is an n× n symmetric
matrix of rank one, then A can be written as uuT or −uuT for some column
vector u.
Proof. (⇐) Let A be an n × n symmetric matrix of rank one; then by the above theorem A = uvT for some vectors u and v. Since A is symmetric, we have
(uvT)T = vuT = uvT.
Hence every row of A is a multiple of the vector uT as well as a multiple of the vector vT. Thus u = ±k²v for some number k, and we see A = ±k²vvT = ±(kv)(kv)T. Hence A is of the form uuT or −uuT.
7.5 The Rank Theorem and Its Implications
The Rank Theorem
First we recall the following theorem.
Theorem 7.5.1. The row space and the column space of a matrix have the same dimension.
Example 7.5.2.
A =
1 0 2 0 1
0 1 1 0 0
2 1 5 0 1
Reducing we get
1 0 2 0 1
0 1 1 0 0
0 1 1 0 −1
→
1 0 2 0 1
0 1 1 0 0
0 0 0 0 −1
The row rank is 3. Meanwhile we transpose it and compute the column rank.
1 0 2
0 1 1
2 1 5
0 0 0
1 0 1
→
1 0 2
0 1 1
0 1 1
0 0 0
0 0 −1
→
1 0 2
0 1 1
0 0 0
0 0 0
0 0 −1
Theorem 7.5.3. If A is m× n matrix, then
rank(A) = rank(AT ) (7.19)
Recall Theorem 7.4.1.
Theorem 7.5.4 (Dimension theorem for Matrices). If A is an m×n is matrix,
then
rank(A) + nullity(A) = n.
Applying this Theorem to AT , we obtain
rank(AT ) + nullity(AT ) = m.
Since rank(AT ) = rank(A), we can rewrite it as
rank(A) + nullity(AT ) = m.
If A is m×n matrix of rank k, then the dimensions of four fundamental spaces
satisfy
dim(row(A)) = k,  dim(null(A)) = n − k
dim(col(A)) = k,  dim(null(AT)) = m − k.     (7.20)
Example 7.5.5. Find the dimensions of the fundamental spaces of the following matrix.
A =
1 0 2 0 1
0 1 1 0 0
2 1 5 0 1
dim(row(A)) = dim(col(A)) = 3, dim(null(A)) = 5 − 3 = 2, dim(null(AT)) = 3 − 3 = 0.
Consistency and Rank
Theorem 7.5.6 (Consistency Theorem). Let Ax = b be an m× n system of
linear equations. Then the following statements are equivalent.
(1) Ax = b is consistent.
(2) b is in the column space of A.
[Figure 7.1: Rank and nullity of A — A maps Rn to Rm; row(A) and null(A), of dimensions k and n − k, sit in Rn; col(A) and null(AT), of dimensions k and m − k, sit in Rm.]
(3) The coefficient matrix A and its augmented matrix [A|b] have the same
rank.
Example 7.5.7. Find (if any) the solutions of the following equation.
Ax = b, where A =
1 0 2
1 1 1
2 1 3
and b =
1
0
1
Definition 7.5.8. An m × n matrix A is said to have full column rank if
its column vectors are linearly independent, and it is said to have full row
rank if its row vectors are linearly independent.
Theorem 7.5.9. Let A be an m× n matrix.
(1) A has full column rank if and only if the column vectors of A form a
basis for the column space, i.e., rank(A) = n.
(2) A has full row rank if and only if the row vectors of A form a basis for
the row space, i.e., rank(A) = m.
Theorem 7.5.10. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) Ax = 0 has only the trivial solution.
(2) Ax = b has at most one solution for every b ∈ Rm.
(3) A has full column rank.
Proof. The equivalence of (1) and (2) is the content of Theorem 3.5.3. (1) ⇔ (3):
Let a1, · · · ,an be the column vectors of A, and write Ax = 0 in the form
x1a1 + · · ·+ xnan = 0. (7.21)
Then Ax = 0 has only the trivial solution iff the vectors a1, · · · ,an are linearly
independent.
Over-determined and Under-determined Linear Systems
Theorem 7.5.11. Let A be an m× n matrix.
(1) (Over-determined) If m > n, then the system Ax = b is inconsistent for some vector b ∈ Rm.
(2) (Under-determined) If m < n, then for every vector b ∈ Rm the system Ax = b is either inconsistent or has infinitely many solutions.
Matrices of the form ATA and AAT
Let A be a matrix with column vectors a1, a2, · · · , an. Then

A^TA = \begin{bmatrix} a_1 \cdot a_1 & a_1 \cdot a_2 & \cdots & a_1 \cdot a_n \\ a_2 \cdot a_1 & a_2 \cdot a_2 & \cdots & a_2 \cdot a_n \\ \vdots & \vdots & & \vdots \\ a_n \cdot a_1 & a_n \cdot a_2 & \cdots & a_n \cdot a_n \end{bmatrix}     (7.22)

On the other hand, if r1, r2, · · · , rm are the row vectors of A, then

AA^T = \begin{bmatrix} r_1 \cdot r_1 & r_1 \cdot r_2 & \cdots & r_1 \cdot r_m \\ r_2 \cdot r_1 & r_2 \cdot r_2 & \cdots & r_2 \cdot r_m \\ \vdots & \vdots & & \vdots \\ r_m \cdot r_1 & r_m \cdot r_2 & \cdots & r_m \cdot r_m \end{bmatrix}     (7.23)
Theorem 7.5.12. Let A be an m× n matrix.
(1) A and ATA have the same null space.
(2) A and ATA have the same row space.
(3) AT and ATA have the same column space.
(4) A and ATA have the same rank.
Proof. Let A be an m × n matrix. If Ax = 0 then ATAx = 0; conversely, if ATAx = 0, then xTATAx = ‖Ax‖² = 0, so Ax = 0. Hence A and ATA have the same null space. Taking orthogonal complements, A and ATA have the same row space, and the remaining statements follow.
Theorem 7.5.13. Let A be an m× n matrix.
(1) AT and AAT have the same null space.
(2) AT and AAT have the same row space.
(3) A and AAT have the same column space.
(4) A and AAT have the same rank.
Some Unifying Theorem
Theorem 7.5.14. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) Ax = 0 has only the trivial solution.
(2) Ax = b has at most one solution for every b ∈ Rm.
(3) A has full column rank, i.e., rank(A) = n.
(4) ATA (an n × n matrix) is invertible.
Theorem 7.5.15. Let A be an m× n matrix. Then the following statements
are equivalent.
(1) ATx = 0 has only the trivial solution.
(2) ATx = b has at most one solution for every b ∈ Rn.
(3) A has full row rank.
(4) AAT is invertible.
Example 7.5.16.
A =
1 0
2 1
−3 1
7.6 Pivot Theorem and Its Implications
Basis Problem Revisited
Consider finding a basis for a subspace W spanned by a set of vectors S =
{v1, · · · ,vs}. Two possibilities:
(1) Find any basis for W
(2) Find a basis for W consisting of vectors in S.
The first basis problem may be solved by forming a matrix whose rows are the vectors in S: one may reduce it to a row echelon form and extract the nonzero row vectors.
One way to solve the second basis problem is to form a matrix A whose columns are the vectors from S and find a basis for the column space of A.
Some remarks: We know that the row operations do not change the row
space. However, the row operations do change the column space.
Example 7.6.1.
A =
1 2 −1
2 0 2
3 2 1
E31(−3)E21(−2)⇒ B =
1 2 −1
0 −4 4
0 −4 4
If we let A = [c1, c2, c3] and B = [c′1, c′2, c′3], then we see c1 − c2 − c3 = 0 and c′1 − c′2 − c′3 = 0. In general, if B is row equivalent to A (i.e., EA = B for some product of elementary matrices E), then the solutions of Ax = 0 and Bx = 0 are the same. Hence the following relation holds:
x1c1 + x2c2 + · · · + xncn = 0
if and only if
x1c′1 + x2c′2 + · · · + xnc′n = 0.
Theorem 7.6.2. Let A and B be row equivalent matrices.
(1) If some subset of column vectors of A is linearly independent, then the
corresponding column vectors of B are linearly independent, and vice
versa.
(2) If some subset of column vectors of A is linearly dependent, then the cor-
responding column vectors of B are linearly dependent, and vice versa.
Moreover, the column vectors in the two matrices have the same depen-
dency relationships.
Example 7.6.3. Find a basis for the column space consisting of column vec-
tors of
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
sol. A row echelon form is
U =
1 −3 4 −2 5 4
0 0 1 3 −2 −6
0 0 0 0 1 5
0 0 0 0 0 0
We see rank(A) = 3. Hence it suffices to choose three linearly independent columns. Since linear independence of columns does not change under row operations, we may choose the columns corresponding to the leading 1's in the echelon form. Thus columns 1, 3 and 5 of A suffice.
Definition 7.6.4. The columns chosen above (corresponding to the leading 1's in a row echelon form) are called pivot columns.
Theorem 7.6.5 (Pivot Theorem). The pivot columns of a nonzero matrix A
form a basis for the column space of A.
Algorithm 1 -Finding Basis
If W is a subspace spanned by the vectors S = {v1, · · · ,vs}, then the following procedure produces a basis for W from S. Steps 4 and 5 give a way to express the vectors in S not in the basis as linear combinations of basis vectors.
Step 1. Form the matrix A that has v1, · · · ,vs as successive column vectors.
Step 2. Reduce A to row echelon form U and identify the pivot columns.
Step 3. Extract the pivot columns of A as a basis for W .
Step 4. If it is desired to express the vectors that are not in the basis as linear combinations of basis vectors, then continue reducing U to the reduced row echelon form R.
Step 5. Find the relations between the columns of R by inspection; the same relations hold between the columns of A.
Example 7.6.6. (a) Find a basis for the column space consisting of column
vectors of
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
and (b) express those column vectors of A that are not in the basis as linear combinations of basis vectors.
sol. (a) Row reduction gives
U =
1 −3 4 −2 5 4
0 0 1 3 −2 −6
0 0 0 0 1 5
0 0 0 0 0 0
We see columns 1, 3 and 5 from A form a basis.
(b) We need the relationships between columns. Continuing the reduction, the reduced row echelon form is

\begin{bmatrix} 1 & -3 & 0 & -14 & 13 & 28 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} ⇒ R = \begin{bmatrix} 1 & -3 & 0 & -14 & 0 & -37 \\ 0 & 0 & 1 & 3 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}

If we denote the columns of the reduced row echelon form by c′1, c′2, · · · , c′6, we see by inspection that
c′2 = −3c′1, c′4 = −14c′1 + 3c′3, c′6 = −37c′1 + 4c′3 + 5c′5.
Since the relations between columns do not change under row operations (Theorem 7.6.2), the same relations must hold for the columns of A, i.e.,
c2 = −3c1, c4 = −14c1 + 3c3, c6 = −37c1 + 4c3 + 5c5.
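Algorithm 1 translates directly into SymPy (a sketch, not from the notes; rref() returns the reduced form together with the pivot column indices):

from sympy import Matrix

A = Matrix([[1, -3, 4, -2, 5, 4],
            [2, -6, 9, -1, 8, 2],
            [2, -6, 9, -1, 9, 7],
            [-1, 3, -4, 2, -5, -4]])

R, pivots = A.rref()
print(pivots)                          # (0, 2, 4): columns 1, 3, 5 of A
basis = [A.col(j) for j in pivots]     # pivot columns of A: a basis for col(A)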
Bases for Fundamental Spaces
Let U be the row echelon form of a matrix A. We have seen how to find bases
of three fundamental spaces of A by reducing to row echelon form. We have
(1) The nonzero rows of U form a basis for row(A).
(2) The columns of U with leading 1's identify the pivot columns of A, and these form a basis for col(A).
(3) The canonical solutions of Ax = 0 form a basis for null(A).
How do we find a basis for null(AT)? An obvious answer is to apply row reduction to AT and find the solutions of ATx = 0. However, it would be desirable to find the basis by applying row reduction to A. But how? First note that the dimension of null(AT) is m − k, and that ATx = 0 is the same as xTA = 0.
Algorithm 2
If A is an m × n matrix with rank k, and if k < m, then we can find a basis for null(AT) by the following procedure.
Step 1. Adjoin the m × m identity matrix Im to the right-hand side of A to form [A | Im].
Step 2. Apply row operations to [A | Im] until we obtain a row echelon form; denote it by [U | E].
Step 3. Repartition [U | E] by separating the zero rows of U:
[U | E] = \begin{bmatrix} V & | & E_1 \\ 0 & | & E_2 \end{bmatrix}
where V (the nonzero rows of U) is k × n, E1 is k × m, and E2 is (m − k) × m.
Step 4. The row vectors of E2 form a basis for null(AT )
Optional proof. A vector y ∈ Rm lies in null(AT) if and only if yTA = 0. Applying elementary row operations to [A | Im] we get [EA | E] = [U | E].
Now
[U | E] = \begin{bmatrix} V & | & E_1 \\ 0 & | & E_2 \end{bmatrix}
where V is a k × n matrix. Hence
\begin{bmatrix} V \\ 0 \end{bmatrix} = U = EA = \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} A = \begin{bmatrix} E_1A \\ E_2A \end{bmatrix}
From this we see E2A = 0, the zero matrix of size (m − k) × n. Thus the row vectors of E2 are orthogonal to col(A) and hence belong to null(AT). It remains to show that the row vectors of E2 form a basis. Write the relation E2A = 0 in the form
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m-k} \end{bmatrix} A = 0.
The m − k rows of E2 are linearly independent (E is invertible, so its rows are). Since the dimension of null(AT) is m − k, we conclude that the m − k rows of E2 span the null space of AT.
Example 7.6.7. Find a basis for null(AT ) using the procedure above.
A =
1 −3 4 −2 5 4
2 −6 9 −1 8 2
2 −6 9 −1 9 7
−1 3 −4 2 −5 −4
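A sketch of Algorithm 2 in SymPy (not from the notes; row_join, eye, rref and rank are the assumed SymPy calls). Row-reducing [A | I] is still a sequence of row operations on the whole matrix, so the rows whose A-part is zero end up orthogonal to col(A):

from sympy import Matrix, eye

A = Matrix([[1, -3, 4, -2, 5, 4],
            [2, -6, 9, -1, 8, 2],
            [2, -6, 9, -1, 9, 7],
            [-1, 3, -4, 2, -5, -4]])

m = A.rows
aug, _ = A.row_join(eye(m)).rref()   # [U | E] with U the rref of A
E = aug[:, A.cols:]
k = A.rank()
E2 = E[k:, :]                        # rows paired with the zero rows of U
print(E2)                            # each row is a basis vector of null(A^T)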
Column Row Factorization
Theorem 7.6.8 (Column Row Factorization). If A is m×n matrix with rank
k, then A can be factored as
A = CR, (7.24)
where C is the m × k matrix whose column vectors are the pivot columns of A and R is the k × n matrix whose row vectors are the nonzero rows in the reduced row echelon form of A.
Proof. Applying elementary row operations to [A | Im] we get the reduced row echelon form [EA | E] = [R0 | E]. Now partition R0 and E−1 as
R_0 = \begin{bmatrix} R \\ 0 \end{bmatrix} and E^{-1} = [C | D],
where R consists of the nonzero rows of R0, C consists of the first k columns of E−1, and D consists of the last m − k columns of E−1. Hence
A = E^{-1}R_0 = [C | D] \begin{bmatrix} R \\ 0 \end{bmatrix} = CR + D\,0 = CR.     (7.25)
Here we can see that the successive columns of C are the pivot columns of A: the columns of R in the pivot positions are the standard basis vectors e1, e2, · · · , ek, so the j-th pivot column of A is Cej, which is the j-th column of C.
Example 7.6.9. (a) Find a Column Row Factorization for the following ma-
trix using the reduced row echelon form.
A =
1 2 8
−1 −1 −5
2 5 19
→
1 0 2
0 1 3
0 0 0
Hence the first two columns of A are pivot columns of A and the corresponding
rows from the reduced row echelon form are (1, 0, 2) and (0, 1, 3). Hence we
have
A = \begin{bmatrix} 1 & 2 \\ -1 & -1 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \end{bmatrix} = CR.
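The CR factorization falls straight out of the rref; a SymPy sketch of this example (not from the notes; extract() selects submatrices by row/column lists):

from sympy import Matrix

A = Matrix([[1, 2, 8],
            [-1, -1, -5],
            [2, 5, 19]])

R0, pivots = A.rref()
k = len(pivots)
C = A.extract(list(range(A.rows)), list(pivots))   # pivot columns of A
R = R0[:k, :]                                      # nonzero rows of the rref
assert C * R == A
print(C, R)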
Column Row Expansion
We have seen in chapter 3 that a matrix can be expressed as a sum of outer
products of columns from the first factor and rows from the second factor.
Thus the previous factorization has the following interpretation.
Theorem 7.6.10 (Column Row Expansion). If A is an m × n matrix with rank k, then A can be factored as
A = c1r1 + c2r2 + · · · + ckrk,     (7.26)
where c1, c2, · · · , ck are the successive pivot columns of A and r1, r2, · · · , rk are the successive nonzero rows in the reduced row echelon form of A.
Example 7.6.11. k = 2 in the above example, so we have A = c1r1 + c2r2, i.e.,

\begin{bmatrix} 1 & 2 & 8 \\ -1 & -1 & -5 \\ 2 & 5 & 19 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \begin{bmatrix} 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 \\ -1 & 0 & -2 \\ 2 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 2 & 6 \\ 0 & -1 & -3 \\ 0 & 5 & 15 \end{bmatrix}.
7.7 Projection Theorem and its Implications
Orthogonal Projection onto Lines
Example 7.7.1. Let a be any nonzero vector. Consider the orthogonal pro-
jection of any vector x onto the line W = span{a}. If x1 is the orthogonal
projection of x, then we have
x = x1 + x2, x1 = ka for some scalar k and x2 ⊥ a.
Since
0 = x2 · a = (x − ka) · a,
we see k = (x · a)/‖a‖². Thus the orthogonal projection of any vector x onto the line W = span{a} is given by
Proj_a x = ((x · a)/‖a‖²) a.     (7.27)
Here x − Proj_a x is the component of x orthogonal to a.
The orthogonal projection onto the line through the origin making angle θ with the x-axis is given by the matrix representation
P_θ = \begin{bmatrix} \frac{1}{2}(1+\cos 2θ) & \frac{1}{2}\sin 2θ \\ \frac{1}{2}\sin 2θ & \frac{1}{2}(1-\cos 2θ) \end{bmatrix} = \begin{bmatrix} \cos^2 θ & \sin θ \cos θ \\ \sin θ \cos θ & \sin^2 θ \end{bmatrix}     (7.28)
Deriving the matrix representation (7.28) from (7.27)
Let us derive the matrix representation (7.28) again. Let
u = a/‖a‖ = (cos θ, sin θ).
Compute
Proj_u e1 = (e1 · u)u = (cos θ)u,  Proj_u e2 = (e2 · u)u = (sin θ)u.
Thus we obtain
P_θ = [Proj_u e1  Proj_u e2] = \begin{bmatrix} \cos^2 θ & \sin θ \cos θ \\ \sin θ \cos θ & \sin^2 θ \end{bmatrix}
Projection Operators on Rn
If we use the concept of operator, then the orthogonal projection operator T : Rn → Rn is defined by
T(x) = Proj_a x = ((x · a)/‖a‖²) a.     (7.29)
Theorem 7.7.2. The standard matrix for the operator T(x) = Proj_a x is
P = (1/(a^Ta)) aa^T.     (7.30)
Normalizing a to u = a/‖a‖, we get
P = uu^T.     (7.31)
Proof. We will be done if we compute the columns of the standard matrix:
T(e_j) = ((e_j · a)/‖a‖²) a = (a_j/‖a‖²) a.
Here aj is the j-th entry of a. Hence
P = [(a_1/‖a‖²)a, (a_2/‖a‖²)a, · · · , (a_n/‖a‖²)a] = (1/‖a‖²) a [a_1, a_2, · · · , a_n] = (1/(a^Ta)) aa^T.     (7.32)
Example 7.7.3. Find the standard matrix P when a = [2,−1, 1].
sol. a^Ta = 4 + 1 + 1 = 6 and
aa^T = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{bmatrix}
Hence
P = \frac{1}{6} \begin{bmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{bmatrix}
[Figure 7.2: Projection of x onto W = span{a}.]
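A NumPy sketch of Theorem 7.7.2 and Example 7.7.3 (not from the notes; the test vector x is made up):

import numpy as np

a = np.array([2.0, -1.0, 1.0])
P = np.outer(a, a) / (a @ a)       # (1/6) aa^T, the standard matrix above
x = np.array([1.0, 2.0, 3.0])
print(P @ x)                       # projection of x onto the line span{a}
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent, symmetric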
Orthogonal Projection onto General Subspaces
Theorem 7.7.4 (Projection Theorem for Subspaces). If W is a subspace of Rn, then every vector x in Rn can be expressed in exactly one way as
x = x1 + x2,     (7.33)
where x1 is in W and x2 is in W⊥.
Proof. We may assume W ≠ {0}, and let {w1, · · · ,wk} be a basis of W (note that k ≤ n). Let M be the n × k matrix having the wi as columns. Then W is the column space of M and the orthogonal complement of W is the null space of MT.
If x can be written in the form
x = x1 + x2, (7.34)
where x1 is in W and x2 is in W⊥, then
x1 = Mv for some v ∈ Rk and MTx2 = 0.
But then we have
MT (x− x1) = 0 or MT (x−Mv) = 0.
Now consider
MT (x−Mv) = 0 or MTMv = MTx. (7.35)
The matrix M has full column rank, hence MTM is invertible, so we have the unique solution
v = (MTM)−1MTx.     (7.36)
In the special case where W is a line through the origin, the vectors x1 and x2 are the same as in the previous example. So we have actually shown that the expression in (7.34) is possible in exactly one way. The vector x1 is called the orthogonal projection of x onto W (written ProjWx) and x2 is called the orthogonal projection of x onto W⊥ (written ProjW⊥x).
x = ProjWx+ProjW⊥x. (7.37)
The relations ProjWx = x1 = Mv and (7.36) are rewritten in the following
theorem.
Theorem 7.7.5. If W is a nonzero subspace of Rn, and if M is any matrix
whose column vectors form a basis for W , then
ProjWx = M(MTM)−1MTx, (7.38)
for x ∈ Rn.
The standard matrix corresponding to the projection is
P = M(MTM)−1MT . (7.39)
The action of MT is to eliminate the component orthogonal to col(M), and the M on the left maps the result back into col(M). One way to check this formula
is to verify that
Px1 = PMv = M(MTM)−1MTMv = Mv and Px2 = M(MTM)−1MTx2 = 0.
Example 7.7.6. Find the standard matrix P for the orthogonal projection
of R3 onto the plane
(1) x− 3y − 4z = 0.
(2) Use the matrix P to find the orthogonal projection of the vector (1, 2,−1).
First we find a basis for the plane and then form a matrix M to find P . From
the row echelon form we see that the vectors in the plane are given by
[x, y, z] = s[3, 1, 0] + t[4, 0, 1].
Thus the set {(3, 1, 0), (4, 0, 1)} is a basis. Hence
M = \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} ⇒ M^TM = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 12 \\ 12 & 17 \end{bmatrix},
and
(M^TM)^{-1} = \frac{1}{26} \begin{bmatrix} 17 & -12 \\ -12 & 10 \end{bmatrix}
Therefore the projection matrix P = M(MTM)−1MT is given by
\frac{1}{26} \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 17 & -12 \\ -12 & 10 \end{bmatrix} \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} = \frac{1}{26} \begin{bmatrix} 25 & 3 & 4 \\ 3 & 17 & -12 \\ 4 & -12 & 10 \end{bmatrix}
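A NumPy sketch of this example, including part (2), the projection of (1, 2, −1) (not from the notes):

import numpy as np

M = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 1.0]])             # columns: a basis for the plane

P = M @ np.linalg.inv(M.T @ M) @ M.T   # P = M (M^T M)^{-1} M^T
print(P * 26)                          # [[25,3,4],[3,17,-12],[4,-12,10]]
print(P @ np.array([1.0, 2.0, -1.0]))  # orthogonal projection of (1, 2, -1)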
When Does a Matrix Represent an Orthogonal Projection?
From the previous discussion we know that P = M(MTM)−1MT is the standard matrix for the projection operator onto the space W spanned by the columns of M. We observe:
(1) P T = P .
(2) P 2 = P .
Suppose we have an orthogonal projection P onto a k-dimensional subspace
W of Rn. Then
(1) The column space of P must be k-dimensional.
(2) P is symmetric.
(3) Moreover P 2 = P.(Idempotent)
These properties exactly characterize an orthogonal projection. In fact, we
have
Theorem 7.7.7 (Projection Matrix). An n × n matrix P is the standard matrix for an orthogonal projection of Rn onto a k-dimensional subspace of Rn if and only if P is symmetric and idempotent, with rank k. The subspace W is then the column space of P.
Example 7.7.8. Show that A is the standard matrix for an orthogonal projection of R3 onto a line through the origin.
A = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}
We see that A is symmetric, idempotent and has rank 1. Hence it is an orthogonal projection onto a line. The first column shows that (1, 2, 2) spans the image space W.
Strang Diagram
Consider the system Ax = b, where A is an m × n matrix. Let W = row(A) and W⊥ = null(A). Recall
x = ProjW x + ProjW⊥ x.     (7.40)
Applying this to W = row(A) and W⊥ = null(A), we get
x = xrow(A) + xnull(A). (7.41)
Similarly, we apply this to W = col(A) and W⊥ = null(AT ). For any vector
b ∈ Rm, we can decompose it as
b = bcol(A) + bnull(AT ). (7.42)
Also note the following relations:
dim(row(A)) + dim(null(A)) = n, (7.43)
dim(col(A)) + dim(null(AT )) = m. (7.44)
The system Ax = b is consistent if and only if b is in the column space of
A, if and only if bnull(AT ) = 0.
Full Column Rank and Consistency of A Linear System
Theorem 7.7.9. Let A be an m × n matrix and let b be in the column space of A.
(1) If A has full column rank, then the system Ax = b has a unique solution,
and that solution is in the row space of A.
(2) If A does not have full column rank, then the system Ax = b has in-
finitely many solutions, but there is a unique solution in the row space
of A. Moreover, among all the solutions the solution in the row space of
A has the smallest norm.
Proof. (1) If A has full column rank, then by Theorem 7.5.10 (7.5.6 of the book), the system Ax = b is either inconsistent or has a unique solution. But since b is in the column space of A, it must be consistent, and hence there exists a unique solution.
(2) Since A does not have full column rank, the system Ax = 0 has in-
finitely many solutions, and hence Ax = b has infinitely many solutions. We
recall the following.
x = xrow(A) + xnull(A) (7.45)
b = A(xrow(A) + xnull(A)) = Axrow(A). (7.46)
So there is at least one solution in the row space of A. (This also proves the
second part of (1).) To see the uniqueness of the solution in the row space for case (2), suppose xr and x′r are two solutions. Then
A(xr − x′r) = 0,
so that xr − x′r ∈ null(A). However, xr − x′r is also in the row space of A, and since row(A) = null(A)⊥, we must have xr − x′r ∈ null(A)⊥ ∩ null(A) = {0}. Finally, any solution satisfies
‖x‖ = √(‖x_{row(A)}‖² + ‖x_{null(A)}‖²) ≥ ‖x_{row(A)}‖.
Theorem 7.7.10. If W is a subspace of Rn, then (W⊥)⊥ = W .
Orthogonal Projection onto W⊥
If W is a nonzero subspace of Rn, and if M is any matrix whose column vectors
form a basis for W , then
ProjW⊥x = x−M(MTM)−1MTx = (I −M(MTM)−1MT )x, (7.47)
for x ∈ Rn.
The standard matrix corresponding to the orthogonal projection is
I − P = I −M(MTM)−1MT (7.48)
Example 7.7.11. Find the standard matrix corresponding to the orthogonal projection onto the orthogonal complement of the plane x − 3y − 4z = 0. Since
P = M(MTM)−1MT = \frac{1}{26} \begin{bmatrix} 25 & 3 & 4 \\ 3 & 17 & -12 \\ 4 & -12 & 10 \end{bmatrix},
we get
I − P = \frac{1}{26} \begin{bmatrix} 1 & -3 & -4 \\ -3 & 9 & 12 \\ -4 & 12 & 16 \end{bmatrix}
7.8 Best Approximation and Least Squares
Minimum Distance Problems
Given a subspace W and a vector b ∈ Rn, consider the problem of finding a vector ŵ ∈ W that is closest to b, i.e., find ŵ ∈ W such that
‖b − ŵ‖ ≤ ‖b − w‖, ∀w ∈ W.
Such a vector, if it exists, is called a best approximation to b from W. Recall that
b = ProjWb + ProjW⊥b.
[Figure 7.3: Projection of b onto W.]
Theorem 7.8.1 (Best Approximation Theorem). If W is a subspace and b is a vector in Rn, there is a unique best approximation to b from W, namely ŵ = ProjWb.
Proof. For every vector w ∈ W we have
b − w = (b − ProjWb) + (ProjWb − w).
Since the two terms are orthogonal (the first vector is in W⊥ and the second vector is in W), we have
‖b−w‖2 = ‖b− ProjWb‖2 + ‖ProjWb−w‖2
and hence
‖b− ProjWb‖2 ≤ ‖b−w‖2.
Here we see that
d ≡ ‖b − ProjWb‖ = ‖ProjW⊥b‖     (7.49)
is the distance from b to W.
Example 7.8.2. Find the distance from a point b = (b1, · · · , bn) to the hyperplane a1x1 + · · · + anxn = 0.
Denote the hyperplane by W. Then W⊥ = span{a}. We see the distance to the space W is
‖ProjW⊥b‖ = ‖Proj_a b‖ = |a · b|/‖a‖ = |a1b1 + · · · + anbn| / √(a1² + · · · + an²)
Definition 7.8.3. If A is an m × n matrix and b is a vector in Rm, then a vector x̂ in Rn is the best approximation solution or least square solution of Ax = b if
‖b − Ax̂‖ ≤ ‖b − Ax‖     (7.50)
for all x in Rn. The quantity ‖b − Ax̂‖ is called the least square error.
Finding Least Square Solutions
How to find the least square solutions of Ax = b ?
Noting that Ax is in the column space of A, we decompose b as
b = Projcol(A)b+ Projcol(A)⊥b.
Then the following is an orthogonal decomposition:
Ax− b = (Ax− Projcol(A)b)− Projcol(A)⊥b ∈ col(A) + col(A)⊥.
The minimum is attained when we can find an x such that
Ax = Projcol(A)b,     (7.51)
and then
min_{x∈Rn} ‖b − Ax‖ = ‖Projcol(A)⊥b‖.
In practice, one rarely solves (7.51) to compute the least square solution.
Instead, rewriting (7.51) as
b−Ax = b− Projcol(A)b (7.52)
and multiplying by AT, we see (since col(A)⊥ is equal to the null space of AT) that
AT(b − Ax) = AT(b − Projcol(A)b) = 0.
This is equivalent to
ATAx = ATb.     (7.53)
This is called the normal equation associated with Ax = b.
Theorem 7.8.4. (1) The least square solutions of Ax = b are the solutions
of the normal equation
ATAx = ATb. (7.54)
(2) If A has full column rank, the normal equation has a unique solution,
namely
x = (ATA)−1ATb. (7.55)
(3) If A does not have full column rank, the normal equation has infinitely
many solutions, but there is a unique solution in the row space of A.
Moreover, among all the solutions of the normal equation, the solution
in the row space of A has the smallest norm.
Example 7.8.5. Find the least square solution of the system
x1 − x2 = 4
3x1 + 2x2 = 1
−2x1 + 4x2 = 3.
sol. Compute
x = (ATA)−1ATb. (7.56)
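A NumPy sketch of the computation (not from the notes; lstsq is NumPy's built-in least squares solver, used here only as a cross-check):

import numpy as np

A = np.array([[1.0, -1.0],
              [3.0, 2.0],
              [-2.0, 4.0]])
b = np.array([4.0, 1.0, 3.0])

x = np.linalg.solve(A.T @ A, A.T @ b)        # solve the normal equation
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # same answer via lstsq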
Orthogonality of Least Square Error
Note the following is an orthogonal decomposition:
Ax− b = (Ax− Projcol(A)b)− Projcol(A)⊥b ∈ col(A) + col(A)⊥.
The least square solution x satisfies
Projcol(A)b−Ax = 0. (7.57)
Hence x is a least square solution if and only if
b−Ax = Projnull(AT )b.
Thus
least square error vector = b−Ax = Projnull(AT )b. (7.58)
Theorem 7.8.6. A vector x is the least square solution of Ax = b if and only
if the error b−Ax is orthogonal to the column space of A.
More Applications of Least Square Solutions: Curve Fitting
Given a set of data
(x1, y1), (x2, y2), · · · , (xn, yn)
in the xy-plane, one would like to find a line (or a curve) that fits these data best in some sense.
Assume we use the line y = a + bx. Then we must have
y1 = a + bx1
y2 = a + bx2
⋮
yn = a + bxn.     (7.59)
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}     (7.60)
So
Mv = y. (7.61)
Its least square solution is obtained if we find the solution of
MTMv = MTy. (7.62)
[Figure 7.4: Fitting data by a linear function using the least squares method.]

Least Square Solution: Higher Degree Polynomials
Given data
(x1, y1), (x2, y2), · · · , (xn, yn)
one would like to find a curve (or a line) that fits best in some sense.
Assume we try a polynomial y = a0 + a1x + · · · + amx^m. Then we must have
y1 = a0 + a1x1 + · · · + am x1^m
y2 = a0 + a1x2 + · · · + am x2^m
⋮
yn = a0 + a1xn + · · · + am xn^m     (7.63)

\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} → Mv = y.     (7.64)
Its least square solution is
v = (MTM)−1MTy.     (7.65)
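A NumPy sketch of the line fit (7.59)-(7.62) (not from the notes; the data points are made up for illustration):

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.3, 1.1, 1.6, 2.1, 2.3, 3.1])

M = np.column_stack([np.ones_like(xs), xs])    # design matrix: columns 1, x
a, b = np.linalg.solve(M.T @ M, M.T @ ys)      # normal equation for y = a + bx
print(a, b)

# For a degree-m polynomial fit (7.64), the design matrix would be
# M = np.vander(xs, m + 1, increasing=True).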
7.9 Orthonormal Bases and Gram-Schmidt
Orthogonal and Orthonormal Bases
Orthogonal Projection Using Orthonormal Bases
Recall the following.
If W is a nonzero subspace of Rn, and if M is any matrix whose column
vectors form a basis for W , then
ProjWx = M(MTM)−1MTx, (7.66)
for x ∈ Rn. If the column vectors of M are orthonormal, then MTM = I, and hence
ProjWx = MMTx.     (7.67)
The standard matrix corresponding to the projection is
P = MMT . (7.68)
Equation (7.67) can be restated in the following form.
Theorem 7.9.1. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, then the orthogonal projection of x in Rn onto W is given by
ProjWx = (x · v1)v1 + · · ·+ (x · vk)vk. (7.69)
Example 7.9.2. Find the orthogonal projection of x onto the plane W
spanned by orthonormal vectors v1,v2 .....
ProjWx = .....
2 1 4
2 0 1
4 1 3
Theorem 7.9.3. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, then orthogonal projection onto W can be expressed as
ProjWx = (x · v1)v1 + · · ·+ (x · vk)vk. (7.70)
Proof. Let M = [v1 v2 · · · vk]. Then
ProjWx = MMTx,
so (7.70) is just a restatement of this.
Trace and Orthogonal Projections
Theorem 7.9.4. If P is the standard matrix for an orthogonal projection of
Rn onto a subspace W , then trace(P ) = rank(P ).
Proof. First note that
P = MM^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_k^T \end{bmatrix} = v_1v_1^T + v_2v_2^T + \cdots + v_kv_k^T.     (7.71)
Since trace(v_iv_i^T) = ‖v_i‖² = 1 for each i, direct computation shows that trace(P) = 1 + 1 + · · · + 1 = k = rank(P).
Linear Combinations of Orthonormal Basis Vectors
Theorem 7.9.5. If {v1, · · · ,vk} is an orthonormal basis for a subspace W
of Rn, and if w is a vector in W , then
w = (w · v1)v1 + · · ·+ (w · vk)vk. (7.72)
Finding Orthonormal Bases
Theorem 7.9.6. Every nonzero subspace of Rn has an orthonormal basis.
Proof. Let W be a nonzero subspace of Rn, and let {w1, · · · ,wk} be any basis for W. It suffices to show that we can construct an orthogonal basis (we then normalize it to get an orthonormal basis). Let Wi = span{w1, · · · ,wi}, i = 1, 2, · · · , k, and proceed as follows:
Step 1. Let v1 = w1.
Step 2. Construct a vector orthogonal to v1 by computing the orthogonal projection of w2 onto W1 and subtracting it from w2. That is,
v2 = w2 − Proj_{W1}w2 = w2 − ((w2 · v1)/‖v1‖²) v1
Step 3. v3 = w3 − Proj_{W2}w3 = w3 − ((w3 · v1)/‖v1‖²) v1 − ((w3 · v2)/‖v2‖²) v2
Step 4. In general,
vj = wj − Proj_{Wj−1}wj = wj − Σ_{i=1}^{j−1} ((wj · vi)/‖vi‖²) vi, for j = 2, · · · , k.
This process is called the Gram-Schmidt process.
Example 7.9.7. Use the Gram-Schmidt process to construct an orthonormal
basis for the plane x+ y + z = 0 in R3.
sol. We need any two linearly independent vectors from the plane. Writing the equation of the plane in parametric form, we introduce y = t1, z = t2, so that
x = −t1 − t2, y = t1, z = t2.
Choosing t1 = 1, t2 = 0 and t1 = 0, t2 = 1, the resulting vectors are
w1 = (−1, 1, 0) and w2 = (−1, 0, 1).
Now use the Gram-Schmidt process:
v1 = w1 = (−1, 1, 0)
v2 = w2 − ((w2 · v1)/‖v1‖²) v1 = (−1, 0, 1) − (1/2)(−1, 1, 0) = (−1/2, −1/2, 1).
Normalizing,
q1 = (−1/√2, 1/√2, 0), q2 = (−1/√6, −1/√6, 2/√6).
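The process is easy to code; a NumPy sketch (not from the notes) applied to this example:

import numpy as np

def gram_schmidt(ws):
    # return an orthonormal basis for span(ws); ws must be independent
    vs = []
    for w in ws:
        v = w - sum((w @ u) * u for u in vs)   # subtract projections
        vs.append(v / np.linalg.norm(v))       # normalize
    return vs

q1, q2 = gram_schmidt([np.array([-1.0, 1.0, 0.0]),
                       np.array([-1.0, 0.0, 1.0])])
print(q1, q2)    # (-1/√2, 1/√2, 0) and (-1/√6, -1/√6, 2/√6)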
A Property of the Gram-Schmidt Process
Theorem 7.9.8. If S = {w1, · · · ,wk} is a basis for a nonzero subspace of
Rn, and if S′ = {v1, · · · ,vk} is the corresponding orthogonal basis produced
by Gram-Schmidt process, then
(1) {v1, · · · ,vj} is an orthogonal basis for span{w1, · · · ,wj} at the j-th
step.
(2) vj is orthogonal to span{w1, · · · ,wj−1} at the j-th step(j ≥ 2).
Extending Orthonormal Sets to Orthonormal Bases
Theorem 7.9.9. If W is a nonzero subspace of Rn, then
(1) Every orthogonal set of nonzero vectors in W can be extended to an
orthogonal basis for W .
(2) Every orthonormal set in W can be extended to an orthonormal basis for
W .
7.10 QR-Decomposition; Householder Transformation
QR-Decomposition
Suppose A is an m × k matrix with full column rank (this requires m ≥ k) whose successive column vectors are {w1, · · · ,wk}. If the Gram-Schmidt process is applied to these vectors to produce an orthonormal basis {q1, · · · ,qk} for the column space of A, and Q is the matrix whose column vectors are {q1, · · · ,qk} in order, what is the relationship between A and Q?
Let A and Q be the matrices having wi and qi as columns, i.e.,
A = [w1,w2, · · · ,wk], Q = [q1,q2, · · · ,qk].
We can express each wi in terms of the orthonormal column vectors of Q as
w_i = Σ_{j=1}^{k} c_{ij} q_j.
By the orthonormality of the qj's, we see cij = wi · qj, and hence
w1 = (w1 · q1)q1 + (w1 · q2)q2 + · · · + (w1 · qk)qk
w2 = (w2 · q1)q1 + (w2 · q2)q2 + · · · + (w2 · qk)qk
⋮
wk = (wk · q1)q1 + (wk · q2)q2 + · · · + (wk · qk)qk.
By Theorem 7.9.8, qj is orthogonal to wi when i < j. Hence we have
w1 = (w1 · q1)q1
w2 = (w2 · q1)q1 + (w2 · q2)q2
⋮
wk = (wk · q1)q1 + (wk · q2)q2 + · · · + (wk · qk)qk.
Let us form the upper triangular matrix
R = \begin{bmatrix} w_1 \cdot q_1 & w_2 \cdot q_1 & \cdots & w_k \cdot q_1 \\ 0 & w_2 \cdot q_2 & \cdots & w_k \cdot q_2 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & w_k \cdot q_k \end{bmatrix}     (7.73)
Then we can see that A = QR, i.e.,
[w_1, w_2, \cdots, w_k] = [q_1, q_2, \cdots, q_k] R.     (7.74)
Theorem 7.10.1. If A is an m × k(m ≥ k) matrix with full column rank,
then A can be factored as
A = QR, (7.75)
where Q is m× k matrix whose column vectors form an orthonormal basis for
the column space of A, and R is a k × k invertible upper triangular matrix.
In general a matrix factorization of the form A = QR, where the column vectors of Q are orthonormal and R is invertible and upper triangular, is called a QR-decomposition. The QR-decomposition is not unique! (It is unique if we require rii > 0.)
Note that the entries of R in (7.73) are generated columnwise. If we generate them row-wise instead, we get the Modified Gram-Schmidt process.
Other Ways to Obtain a QR-decomposition: (Modified) Gram-Schmidt, Householder and Rotation
One method of finding a QR-decomposition for a matrix with full column rank is to apply the Gram-Schmidt process to the column vectors of A, where R is given by (7.73). Unfortunately, it produces large round-off errors numerically, so it is not recommended for numerical purposes. There are other methods. One is to rearrange the order of orthogonalization (called Modified Gram-Schmidt). Another method is to use Householder transformations. Still another method is to use Givens rotations.
Example 7.10.2. Find a QR-decomposition of the following matrix using the
Gram-Schmidt process.
A =
1 −1 0
0 1 1
1 1 1
sol.
w1 = [1, 0, 1]T, w2 = [−1, 1, 1]T, w3 = [0, 1, 1]T
v1 = w1 = [1, 0, 1]T
v2 = w2 − ((w2 · v1)/‖v1‖²) v1 = [−1, 1, 1]T − 0 = [−1, 1, 1]T
v3 = w3 − ((w3 · v1)/‖v1‖²) v1 − ((w3 · v2)/‖v2‖²) v2 = [0, 1, 1]T − (1/2)[1, 0, 1]T − (2/3)[−1, 1, 1]T = [1/6, 1/3, −1/6]T
q1 = (1/√2)[1, 0, 1]T, q2 = (1/√3)[−1, 1, 1]T, q3 = (1/√6)[1, 2, −1]T
R = \begin{bmatrix} w_1 \cdot q_1 & w_2 \cdot q_1 & w_3 \cdot q_1 \\ 0 & w_2 \cdot q_2 & w_3 \cdot q_2 \\ 0 & 0 & w_3 \cdot q_3 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & 1/\sqrt{6} \end{bmatrix}
Example 7.10.3. Find a QR-decomposition of the matrix
1 −1 4
1 4 −2
1 4 2
1 −1 0
Solution.
‖w1‖ = 2, q1 = w1/‖w1‖ = (1/2)(1, 1, 1, 1)
w′2 = w2 − (w2 · q1)q1 = (−1, 4, 4, −1) − 3 · (1/2)(1, 1, 1, 1) = (1/2)(−5, 5, 5, −5)
‖w′2‖ = 5, q2 = w′2/‖w′2‖ = (1/2)(−1, 1, 1, −1)
w′3 = w3 − (w3 · q1)q1 − (w3 · q2)q2 = (4, −2, 2, 0) − 2 · (1/2)(1, 1, 1, 1) − (−2) · (1/2)(−1, 1, 1, −1) = (2, −2, 2, −2)
‖w′3‖ = 4, q3 = w′3/‖w′3‖ = (1/2)(1, −1, 1, −1)
Therefore
A = QR = \frac{1}{2} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 3 & 2 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{bmatrix}
Here the computation of R was columnwise as in (7.74). If we compute row-wise, we get the modified Gram-Schmidt process.
The Role of QR-decomposition in Least Square Problems
Recall that the least square solutions of Ax = b are the exact solutions of the normal equation ATAx = ATb, and if A has full column rank, then the unique solution is given by
x = (ATA)−1ATb. (7.76)
But solving this system by a conventional method such as LU-decomposition is not desirable because of instability.
An alternative method is to use the decomposition A = QR. With this we rewrite the equation (ATA)x = ATb as
(RTQT)QRx = RTQTb.     (7.77)
This becomes
RTRx = RTQTb. (7.78)
Since R is invertible upper triangular matrix, we can easily solve it.
Theorem 7.10.4. If A is an m × k matrix with full column rank, and if
A = QR is a QR-decomposition, then the normal equation ATAx = ATb can
be expressed as
Rx = QTb, (7.79)
and the least square solution is given by
x = R−1QTb. (7.80)
Example 7.10.5. Use a QR-decomposition to find the least square solution of
x1 + 2x2 + 4x3 = −1
x1 + x2 + x3 = 2
x1 + 2x2 − x3 = 1
x1 − 2x2 + x3 = 2.
The matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
has full column rank 3. Perform the QR-decomposition:
q1 = (1/2)(1, 1, 1, 1)
w′2 = w2 − (w2 · q1)q1 = (2, 1, 1, 0) − 2 · (1/2)(1, 1, 1, 1) = (1, 0, 0, −1)
q2 = w′2/‖w′2‖ = (1/√2)(1, 0, 0, −1)
w′3 = w3 − (w3 · q1)q1 − (w3 · q2)q2 = (3, 1, 1, 1) − 3 · (1/2)(1, 1, 1, 1) − √2 · (1/√2)(1, 0, 0, −1) = (1/2)(1, −1, −1, 1)
q3 = (1/2)(1, −1, −1, 1)
[Figure 7.5: Projection Proj_a and reflection refl_{a⊥} = I − 2Proj_a.]
Thus
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/\sqrt{2} & 1/2 \\ 1/2 & 0 & -1/2 \\ 1/2 & 0 & -1/2 \\ 1/2 & -1/\sqrt{2} & 1/2 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & \sqrt{2} & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}
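Theorem 7.10.4 can be exercised numerically; a NumPy sketch (not from the notes) for the system as stated at the start of Example 7.10.5:

import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [1.0, -2.0, 1.0]])
b = np.array([-1.0, 2.0, 1.0, 2.0])

Q, R = np.linalg.qr(A)               # reduced QR: Q is 4x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)      # solve the triangular system R x = Q^T b
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # agrees with lstsq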
Householder Reflections
Reflection about the hyperplane a⊥ (Householder reflection, elementary reflector) satisfies
x − refl_{a⊥}x = 2 Proj_a x, or refl_{a⊥}x = x − 2 Proj_a x.
Definition 7.10.6. If a is a nonzero vector in Rn and x is any vector in Rn, then the reflection of x about the hyperplane a⊥ is defined as
refl_{a⊥}x = x − 2 Proj_a x.     (7.81)
The operator T : Rn → Rn defined by T(x) = refl_{a⊥}x is the reflection of Rn about the hyperplane a⊥.
The reflection operator about a⊥ can be expressed by the matrix
H_{a⊥} = I − (2/(a^Ta)) aa^T.
It is called a Householder reflection. And if u = a/‖a‖, then
H_{u⊥} = I − 2uu^T.
This is symmetric and orthogonal.
Theorem 7.10.7. If v and w are two vectors in Rn having the same length,
then the Householder Reflection about the hyperplane (v −w)⊥ maps v into
w.
Example 7.10.8. Let v = (3, 4, 0) and w = (5, 0, 0). Find a Householder
Reflection that maps v into w.
sol. Let a = v − w = (−2, 4, 0), so ‖a‖ = √20. The Householder reflection is
H_{a⊥} = I − (2/(a^Ta)) aa^T = I − \frac{2}{20} \begin{bmatrix} -2 \\ 4 \\ 0 \end{bmatrix} \begin{bmatrix} -2 & 4 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} − \frac{1}{10} \begin{bmatrix} 4 & -8 & 0 \\ -8 & 16 & 0 \\ 0 & 0 & 0 \end{bmatrix}
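A NumPy sketch of this example (not from the notes), confirming that the reflection maps v to w:

import numpy as np

v = np.array([3.0, 4.0, 0.0])
w = np.array([5.0, 0.0, 0.0])        # same length as v, so Theorem 7.10.7 applies
a = v - w
H = np.eye(3) - 2.0 * np.outer(a, a) / (a @ a)
print(H @ v)                          # [5. 0. 0.]
print(np.allclose(H @ H, np.eye(3)))  # H is symmetric and orthogonal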
Example 7.10.9. Find a QR-decomposition of the following matrix using
Householder Reflections.
A =
1 −1 0
0 1 1
1 1 1
sol. Recall the answer by the Gram-Schmidt process (Example 7.10.2):

A = QR = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{3} & 1/\sqrt{6} \\ 0 & 1/\sqrt{3} & 2/\sqrt{6} \\ 1/\sqrt{2} & 1/\sqrt{3} & -1/\sqrt{6} \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & 1/\sqrt{6} \end{bmatrix}

Let us try Householder reflections. The columns of A are
a1 = [1, 0, 1]T, a2 = [−1, 1, 1]T, a3 = [0, 1, 1]T.
Since ‖a1‖ = √2, take α1 = √2 and
u1 = (α1e1 − a1)/‖α1e1 − a1‖ = [√2 − 1, 0, −1]T/√(4 − 2√2).
Then
Q_1 = I − 2u_1u_1^T = I − \frac{1}{2-\sqrt{2}} \begin{bmatrix} 3-2\sqrt{2} & 0 & 1-\sqrt{2} \\ 0 & 0 & 0 \\ 1-\sqrt{2} & 0 & 1 \end{bmatrix}
and
Q_1A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} − \frac{1}{2-\sqrt{2}} \begin{bmatrix} 4-3\sqrt{2} & -2+\sqrt{2} & 1-\sqrt{2} \\ 0 & 0 & 0 \\ 2-\sqrt{2} & \sqrt{2} & 1 \end{bmatrix} = \frac{1}{2-\sqrt{2}} \begin{bmatrix} -2+2\sqrt{2} & 0 & -1+\sqrt{2} \\ 0 & 2-\sqrt{2} & 2-\sqrt{2} \\ 0 & 2-2\sqrt{2} & 1-\sqrt{2} \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 1 \\ 0 & -\sqrt{2} & -1/\sqrt{2} \end{bmatrix}

For the lower-right 2 × 2 block A2 = \begin{bmatrix} 1 & 1 \\ -\sqrt{2} & -1/\sqrt{2} \end{bmatrix}, let a2 = [1, −√2]T be its first column. Then ‖a2‖ = √3 = α2, so α2e1 − a2 = (√3 − 1, √2), and
u_2 = (α2e1 − a2)/‖α2e1 − a2‖ = [√3 − 1, √2]T/√(6 − 2√3),
Q'_2 = I − 2u_2u_2^T = I − \frac{1}{3-\sqrt{3}} \begin{bmatrix} 4-2\sqrt{3} & \sqrt{6}-\sqrt{2} \\ \sqrt{6}-\sqrt{2} & 2 \end{bmatrix} = \frac{1}{3-\sqrt{3}} \begin{bmatrix} -1+\sqrt{3} & -\sqrt{6}+\sqrt{2} \\ -\sqrt{6}+\sqrt{2} & 1-\sqrt{3} \end{bmatrix} = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & -\sqrt{2} \\ -\sqrt{2} & -1 \end{bmatrix}

In full size,
Q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & -1 \end{bmatrix}, \quad Q_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{3} & -\sqrt{2}/\sqrt{3} \\ 0 & -\sqrt{2}/\sqrt{3} & -1/\sqrt{3} \end{bmatrix},

Q_1Q_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -\sqrt{2}/\sqrt{3} & -1/\sqrt{3} \\ 0 & \sqrt{2}/\sqrt{3} & -2/\sqrt{3} \\ 1 & \sqrt{2}/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}

Applying Q′2 to the block A2,
Q'_2A_2 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & -\sqrt{2} \\ -\sqrt{2} & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -\sqrt{2} & -1/\sqrt{2} \end{bmatrix} = \frac{1}{\sqrt{6}} \begin{bmatrix} 3\sqrt{2} & 2\sqrt{2} \\ 0 & -1 \end{bmatrix}

So
Q_2Q_1A = \begin{bmatrix} \sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & \sqrt{3} & 2/\sqrt{3} \\ 0 & 0 & -1/\sqrt{6} \end{bmatrix} = R'

This coincides with the earlier result (by the Gram-Schmidt process) except for the sign of the (3,3) entry.
QR-Decomposition using Householder Reflections
Let A be a 4 × 4 matrix
$$A = \begin{bmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{bmatrix}$$
If we can find orthogonal matrices Q1, Q2, Q3 such that
$$Q_3Q_2Q_1A = R$$
is upper triangular, then we would have the QR-decomposition
$$A = Q_1^{-1}Q_2^{-1}Q_3^{-1}R = Q_1^TQ_2^TQ_3^TR = QR. \qquad (7.82)$$
Now the first step of QR is to reduce the first column to a multiple of e1.
Let A = [a1, A2]. Then we can find an orthogonal matrix Q1 = H_{u⊥} which
maps the first column of A onto α1e1. Since the vectors a1 and α1e1 have the
same length, we have α1 = ±‖a1‖ (to avoid dividing by a small number, choose
the sign that makes ‖α1e1 − a1‖ larger). We see u is given by
$$u = \frac{\alpha_1 e_1 - a_1}{\|\alpha_1 e_1 - a_1\|}. \qquad (7.83)$$
and
$$Q_1A = \begin{bmatrix}\alpha_1 & b_2 & \cdots & b_n\\ 0 & & & \\ \vdots & & A_{12} & \\ 0 & & & \end{bmatrix}.$$
We repeat the same process. If the (n−1) × (n−1) matrix Q2′ is the elementary
reflector used to map the first column of A12 to a vector of the form α2e1,
then the matrix
$$Q_2 = \begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & & & \\ \vdots & & Q_2' & \\ 0 & & & \end{bmatrix}$$
makes
$$Q_2Q_1A = \begin{bmatrix}\alpha_1 & b_2 & \cdots & b_n\\ 0 & \alpha_2 & * & *\\ \vdots & 0 & A_{23}' & \\ 0 & 0 & & \end{bmatrix}$$
Repeat the same process to obtain
$$Q_{n-1}\cdots Q_2Q_1A = R \ \text{(upper triangular)}.$$
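The procedure above translates directly into code. The following is a minimal sketch (assuming numpy; the function name householder_qr is ours), including the sign choice for α discussed above:

import numpy as np

def householder_qr(A):
    # QR-decomposition by successive Householder reflections:
    # Q_{n-1}...Q_2 Q_1 A = R, so A = Q R with Q = Q_1 Q_2 ... (each Q_j = Q_j^T)
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(min(m - 1, n)):
        a = R[j:, j].copy()
        # alpha = +-||a||, sign chosen so that ||alpha*e1 - a|| is larger
        alpha = -np.copysign(np.linalg.norm(a), a[0])
        a[0] -= alpha                       # a <- a - alpha*e1 (same line as (7.83))
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:                   # column already reduced; nothing to do
            continue
        u = a / norm_a
        H = np.eye(m)
        H[j:, j:] -= 2.0 * np.outer(u, u)   # embed Q_j' as in the block matrix above
        R = H @ R
        Q = Q @ H                           # H is symmetric orthogonal, so H^T = H
    return Q, R

A = np.array([[1., -1., 0.], [0., 1., 1.], [1., 1., 1.]])
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(3))
# Up to signs, Q and R match the hand computation of Example 7.10.9.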
7.11 Coordinates with Respect to a Basis
Nonrectangular Coordinate Systems in Rn
Example 7.11.1. If x = (x1, x2) we can write x = x1e1 + x2e2. But given
two linearly independent vectors v1 and v2 we may instead find numbers c1, c2 such that
$$x = c_1v_1 + c_2v_2.$$
The numbers (c1, c2) are the coordinates of x w.r.t. the coordinate system v1
and v2.
If v1 = (1, 0) and v2 = (1/2, √3/2), then the point (4, √3) = 3(1, 0) + 2(1/2, √3/2)
has coordinates (3, 2).
[Figure 7.6: Coordinates of (4, √3) w.r.t. v1 and v2]
Definition 7.11.2. If B = {v1,v2, · · · ,vk} is an ordered basis for a subspace
W of Rn and if
w = a1v1 + a2v2 + · · · + akvk,
then we call
$$a_1, a_2, \cdots, a_k$$
the coordinates of w with respect to the coordinate system B; here
aj is called the vj-coordinate of w. We denote it by
$$(w)_B = (a_1, a_2, \cdots, a_k)$$
and call it the coordinate vector for w with respect to B. The column
vector
$$[w]_B = \begin{bmatrix}a_1\\a_2\\\vdots\\a_k\end{bmatrix}$$
is called the coordinate matrix for w with respect to B.
Example 7.11.3. Let
$$v_1 = (1,-2,5),\quad v_2 = (0,-1,3),\quad v_3 = (0,-1,1).$$
(1) Express the vector b1 = (3, −2, 5) with respect to the basis B = {v1, v2, v3}.
(2) Find the vector w in R3 having (−2, 1, 4) as coordinate vector (w)B .
sol. (1) Find the solution of the relation
$$b_1 = (3,-2,5) = a_1v_1 + a_2v_2 + a_3v_3.$$
Thus solving
$$\begin{bmatrix}1&0&0\\-2&-1&-1\\5&3&1\end{bmatrix}\begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix} = \begin{bmatrix}3\\-2\\5\end{bmatrix}$$
(the coefficient matrix is [v1 v2 v3]), we see a1 = 3, a2 = −3, a3 = −1.
Actually, in vector notation we have a = [B]^{-1}b1.
(2)
w = −2v1 + v2 + 4v3 = (−2,−1,−3).
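A numpy sketch of both parts (an illustration, not part of the notes; B below has v1, v2, v3 as its columns):

import numpy as np

B = np.array([[1., 0., 0.], [-2., -1., -1.], [5., 3., 1.]])  # columns v1, v2, v3
a = np.linalg.solve(B, np.array([3., -2., 5.]))
print(a)                        # [ 3. -3. -1.], the coordinates of b1 w.r.t. B
w = B @ np.array([-2., 1., 4.])
print(w)                        # [-2. -1. -3.], the vector w of part (2)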
Coordinates w.r.t. an Orthonormal Basis
If B = {v1, v2, · · · , vk} is an orthonormal basis for a subspace W of Rn and
w is any vector in W, then
$$w = (w\cdot v_1)v_1 + (w\cdot v_2)v_2 + \cdots + (w\cdot v_k)v_k. \qquad (7.84)$$
Hence the coordinates of w with respect to B are
$$(w)_B = ((w\cdot v_1), (w\cdot v_2), \cdots, (w\cdot v_k)). \qquad (7.85)$$
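A short numpy illustration (the basis v1, v2 below is an assumed example, not from the notes): for a vector known to lie in W, the coordinates come from dot products alone, with no linear system to solve.

import numpy as np

v1 = np.array([1., 0., 1.]) / np.sqrt(2)   # orthonormal basis of a plane W in R^3
v2 = np.array([0., 1., 0.])
w = 3 * v1 - 2 * v2                        # a vector known to lie in W
coords = np.array([w @ v1, w @ v2])        # (7.85): coordinates by dot products
print(coords)                              # [ 3. -2.]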
Change of Basis
If w is any vector in Rn and if we change the basis B = {v1, v2, · · · , vn} to
another basis B′ = {v1′, v2′, · · · , vn′}, what happens to the coordinates?
For simplicity, assume n = 2. Let
$$B = \{v_1, v_2\} \quad\text{and}\quad B' = \{v_1', v_2'\}.$$
If
$$[v_1]_{B'} = \begin{bmatrix}a\\b\end{bmatrix} \quad\text{and}\quad [v_2]_{B'} = \begin{bmatrix}c\\d\end{bmatrix} \qquad (7.86)$$
then
$$v_1 = av_1' + bv_2', \qquad v_2 = cv_1' + dv_2'. \qquad (7.87)$$
Now if w is any vector with
$$[w]_B = \begin{bmatrix}k_1\\k_2\end{bmatrix}, \qquad (7.88)$$
then w = k1v1 + k2v2.
To express w in the new coordinate system, we use (7.87) to see
$$w = k_1(av_1' + bv_2') + k_2(cv_1' + dv_2') = (k_1a + k_2c)v_1' + (k_1b + k_2d)v_2'.$$
Thus in the new coordinate system,
$$[w]_{B'} = \begin{bmatrix}k_1a + k_2c\\ k_1b + k_2d\end{bmatrix}.$$
This can be written as
$$[w]_{B'} = \begin{bmatrix}k_1a + k_2c\\ k_1b + k_2d\end{bmatrix} = \begin{bmatrix}a&c\\b&d\end{bmatrix}\begin{bmatrix}k_1\\k_2\end{bmatrix} = \begin{bmatrix}a&c\\b&d\end{bmatrix}[w]_B. \qquad (7.89)$$
Thus the new coordinates can be obtained by multiplying the old coordinates
by
$$\begin{bmatrix}a&c\\b&d\end{bmatrix} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'}\end{bmatrix}. \qquad (7.90)$$
Remark 7.11.4. How do we find the transformation matrix? Let a = [a, b]^T and
b = [c, d]^T. Recall Example 7.11.3 and let [B] be the matrix whose columns are
v1, v2, and similarly for [B′]. Then the columns of the transformation matrix [a b] are
given by a = [B′]^{-1}v1 and b = [B′]^{-1}v2. So the transformation matrix is
$$[B']^{-1}[B] = \begin{bmatrix}v_1' & v_2'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2\end{bmatrix}.$$
In general, this result is summarized as
Theorem 7.11.5 (Change of Basis). If w is a vector in Rn and if B =
{v1, v2, · · · , vn} and B′ = {v1′, v2′, · · · , vn′} are bases for Rn, then the coordinates
of w w.r.t. the two bases are related by
$$[w]_{B'} = P_{B\to B'}[w]_B,$$
where
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} = \begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.91)$$
is the transition matrix or the coordinate change matrix.
Example 7.11.6. The bases are B1 = {e1, e2} and B2 = {v1,v2} where
e1 = (1, 0), e2 = (0, 1), v1 = (2, 1), v2 = (−1, 2).
(1) Find the transition matrix from B1 to B2.
(2) Find [w]B2 when [w]B1 = [2,−5].
$$P_{B_1\to B_2} = \begin{bmatrix}[e_1]_{B_2} & [e_2]_{B_2}\end{bmatrix} \qquad (7.92)$$
First express e1 = (1, 0) and e2 in terms of v1 = (2, 1), v2 = (−1, 2); that is,
solve
$$\begin{bmatrix}2&-1\\1&2\end{bmatrix}\begin{bmatrix}c_1&d_1\\c_2&d_2\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} \qquad (7.93)$$
then we get
$$\begin{bmatrix}c_1\\c_2\end{bmatrix} = \begin{bmatrix}\tfrac25\\-\tfrac15\end{bmatrix}, \qquad \begin{bmatrix}d_1\\d_2\end{bmatrix} = \begin{bmatrix}\tfrac15\\\tfrac25\end{bmatrix}$$
So
$$e_1 = \tfrac25 v_1 - \tfrac15 v_2, \qquad e_2 = \tfrac15 v_1 + \tfrac25 v_2,$$
from which we see
$$[e_1]_{B_2} = \begin{bmatrix}\tfrac25\\-\tfrac15\end{bmatrix}, \qquad [e_2]_{B_2} = \begin{bmatrix}\tfrac15\\\tfrac25\end{bmatrix}$$
Finally the transition matrix is
$$P_{B_1\to B_2} = \begin{bmatrix}[e_1]_{B_2} & [e_2]_{B_2}\end{bmatrix} = \begin{bmatrix}\tfrac25&\tfrac15\\-\tfrac15&\tfrac25\end{bmatrix} = \begin{bmatrix}v_1 & v_2\end{bmatrix}^{-1}\begin{bmatrix}e_1 & e_2\end{bmatrix} \qquad (7.94)$$
For (2), $[w]_{B_2} = P_{B_1\to B_2}[w]_{B_1} = \begin{bmatrix}\tfrac25&\tfrac15\\-\tfrac15&\tfrac25\end{bmatrix}\begin{bmatrix}2\\-5\end{bmatrix} = \begin{bmatrix}-\tfrac15\\-\tfrac{12}{5}\end{bmatrix}.$
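A numpy check of this example (a sketch, not part of the notes):

import numpy as np

B2 = np.array([[2., -1.], [1., 2.]])     # columns v1, v2
P = np.linalg.solve(B2, np.eye(2))       # [v1 v2]^{-1} [e1 e2], as in (7.94)
print(P)                                 # [[ 0.4  0.2] [-0.2  0.4]]
print(P @ np.array([2., -5.]))           # [-0.2 -2.4] = [w]_{B2} for part (2)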
We note that the columns of the transition matrix were obtained by row
operations on the augmented matrix of (7.93). This gives a technique to find a
transition matrix; see below.
Theorem 7.11.7 (Inverse of a transition matrix). If B and B′ are bases for
Rn, then the transition matrices P_{B′→B} and P_{B→B′} are inverses of each other;
that is,
$$(P_{B'\to B})^{-1} = P_{B\to B'},$$
where
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} \qquad (7.95)$$
is the transition matrix or the coordinate change matrix.
A Technique to Find a Transition Matrix
Let us show how to compute the transition matrix between B and B′:
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} \qquad (7.96)$$
The entries of [vj]B′ are the coefficients required to express vj as a
linear combination of v1′, v2′, · · · , vn′, hence can be obtained by solving
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}x = v_j, \qquad (7.97)$$
for j = 1, 2, · · · , n. These correspond to reducing the augmented matrix
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n' \mid v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.98)$$
to
$$\begin{bmatrix}I \mid [v_1]_{B'}\ [v_2]_{B'}\ \cdots\ [v_n]_{B'}\end{bmatrix} = [\,I \mid P_{B\to B'}\,], \quad\text{where } P_{B\to B'} = [B']^{-1}[B].$$
In other words,
$$[\,\text{new basis} \mid \text{old basis}\,] \;\xrightarrow{\text{row operations}}\; [\,I \mid \text{transition matrix}\,] \qquad (7.99)$$
This may be viewed as
$$\begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix}$$
In summary, a procedure to compute P_{B→B′} is
Step 1. Start from the augmented matrix [B′|B].
Step 2. Use elementary row operations to reduce it to reduced row echelon form.
Step 3. Obtain [I|P_{B→B′}].
Step 4. Extract P_{B→B′}.
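A minimal sketch of this procedure (assuming numpy; the name transition_matrix is ours, and the partial pivoting is an extra safeguard not mentioned in the steps):

import numpy as np

def transition_matrix(B_new, B_old):
    # Row-reduce [B' | B] to [I | P], per the four steps above.
    # Assumes the columns of B_new form a basis, so the left block is invertible.
    M = np.hstack([B_new.astype(float), B_old.astype(float)])
    n = B_new.shape[0]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))   # partial pivoting (a safeguard)
        M[[i, p]] = M[[p, i]]
        M[i] /= M[i, i]                       # scale the pivot row
        for r in range(n):
            if r != i:
                M[r] -= M[r, i] * M[i]        # eliminate the other rows
    return M[:, n:]                           # the right block is P_{B -> B'}

B2 = np.array([[2., -1.], [1., 2.]])
print(transition_matrix(B2, np.eye(2)))       # recovers P from Example 7.11.6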
As a particular case, we obtain the following result.
Proposition 7.11.8. If we change the basis from B to the standard basis S, then
the augmented matrix
$$\begin{bmatrix}e_1 & e_2 & \cdots & e_n \mid v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.100)$$
itself reveals the transition matrix as
$$P_{B\to S} = \begin{bmatrix}v_1 \mid v_2 \mid \cdots \mid v_n\end{bmatrix} \qquad (7.101)$$
Its converse is the following.
Theorem 7.11.9 (Invertible matrices as transition matrices). Conversely, if P is any
invertible n × n matrix with column vectors p1, p2, · · · , pn, then P is the transition
matrix from the basis B = {p1, p2, · · · , pn} to the standard basis.
New Way to Think about Matrices
If
$$A = \begin{bmatrix}1&2\\5&7\end{bmatrix}$$
is any invertible matrix, we may view it as a transition matrix from the basis
$$B = \left\{\begin{bmatrix}1\\5\end{bmatrix}, \begin{bmatrix}2\\7\end{bmatrix}\right\}$$
to the standard basis. This is exactly what the above theorem says.
Coordinate Maps
If B is a basis for Rn, then the transformation
$$x \mapsto (x)_B \quad\text{or, in column notation,}\quad x \mapsto [x]_B$$
is called the coordinate map.
Theorem 7.11.10 (Coordinate maps). If B is a basis for Rn,
then the coordinate map x → [x]B is a 1-1 linear operator on Rn. Moreover,
if B is an orthonormal basis for Rn, then it is an orthogonal operator.
Transition between Orthonormal Bases
Theorem 7.11.11. If B and B′ are two orthonormal bases for Rn, then the
transition matrices PB→B′ and PB′→B are orthogonal.
[Figure 7.7: Rotation of e1 and e2 by θ]
Proof. Consider the transition matrix
$$P_{B\to B'} = \begin{bmatrix}[v_1]_{B'} & [v_2]_{B'} & \cdots & [v_n]_{B'}\end{bmatrix} = \begin{bmatrix}v_1' & v_2' & \cdots & v_n'\end{bmatrix}^{-1}\begin{bmatrix}v_1 & v_2 & \cdots & v_n\end{bmatrix} \qquad (7.102)$$
Both [B′] and [B] are orthogonal matrices, since their columns are orthonormal. So is
the inverse of an orthogonal matrix, and the product of orthogonal matrices is orthogonal.
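A quick numerical check (a sketch assuming numpy; the random orthonormal bases are obtained from QR-decompositions, tying back to Section 7.10):

import numpy as np

rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # columns: an orthonormal basis
Bp, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a second orthonormal basis
P = np.linalg.solve(Bp, B)                 # P_{B->B'} = [B']^{-1}[B], as in (7.102)
assert np.allclose(P.T @ P, np.eye(3))     # the transition matrix is orthogonal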
Example 7.11.12. Let S be the standard basis for R2 and let B = {v1, v2} be the basis obtained by rotating the standard vectors about the origin by an
angle θ. Then the transition matrix from B to S is
$$P_{B\to S} = [S]^{-1}[B] = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}$$
The transition matrix from S to B is
$$P_{S\to B} = P_{B\to S}^{-1} = \begin{bmatrix}\cos\theta&\sin\theta\\ -\sin\theta&\cos\theta\end{bmatrix}$$
Both of them are orthogonal.
Application to Rotation of Coordinates
If B = {v1, v2} where v1, v2 are obtained by rotating the standard vectors
e1 and e2 by an angle of θ, then the transition matrix is
$$P_{B\to S} = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}$$
Thus if (x, y) are the coordinates in the xy-axes and (x′, y′) are the coordinates in the
x′y′-axes (rotated), then
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}x'\\y'\end{bmatrix} \qquad (7.103)$$
Rotation
Rotate the xy-coordinates by θ and call the new coordinates x′y′. Then the point P = (x, y) is
represented by (x′, y′) in the x′y′-coordinates.
[Figure 7.8: Rotation of axes. The point P has coordinates (x, y) and (x′, y′); M and M′ are the projections of P onto the x- and x′-axes.]
From Figure 7.8 we see
$$x = OM = OP\cos(\alpha + \theta) = OP\cos\alpha\cos\theta - OP\sin\alpha\sin\theta$$
$$y = MP = OP\sin(\alpha + \theta) = OP\cos\alpha\sin\theta + OP\sin\alpha\cos\theta.$$
On the other hand,
$$OP\cos\alpha = OM' = x', \qquad OP\sin\alpha = M'P = y'.$$
Proposition 7.11.13. If (x′, y′) is the new coordinate of the point P = (x, y)
in the standard xy-coordinate, then we have
x = x′ cos θ − y′ sin θ
y = x′ sin θ + y′ cos θ.
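A small numpy sketch of this change of coordinates (an illustration, not from the notes; θ and the point are arbitrary choices):

import numpy as np

theta = np.pi / 6
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # P_{B->S}, rotation by theta
xy_new = np.array([2., 1.])                       # (x', y') in the rotated axes
xy = P @ xy_new                                   # (7.103): recover (x, y)
assert np.allclose(np.linalg.solve(P, xy), xy_new)   # inverting gives (x', y') back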