TRANSCRIPT
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
5.1 The Scalar Product in Rn
Thus far we have restricted ourselves to vector spaces and the operations of addition and scalar multiplication: how else might we combine vectors?
You should know about the scalar (dot) and cross products of vectors in R3
The scalar product extends nicely to other vector spaces, while the cross product is another story [1]
The basic purpose of scalar products is to define and analyze the lengths of and angles between vectors [2]
[1] But not for this class! You may see ‘wedge’ products in later classes. . .
[2] It is important to note that everything in this chapter, until mentioned otherwise, only applies to real vector spaces (where F = R)
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Euclidean Space
Definition 5.1.1
Suppose x, y ∈ Rn are written with respect to the standard basis {e1, . . . , en}
1. The scalar product of x, y is the real number [a]
   (x, y) := xTy = x1y1 + x2y2 + · · · + xnyn
2. x, y are orthogonal or perpendicular if (x, y) = 0
3. n-dimensional Euclidean Space Rn is the vector space of column vectors Rn×1 together with the scalar product
[a] Other notations include x · y and 〈x, y〉
Euclidean Space is more than just a collection of co-ordinate vectors: it implicitly comes with notions of angle and length [3]
Important Fact: (y, x) = yTx = (xTy)T = xTy = (x, y), so the scalar product is symmetric
[3] To be seen in R2 and R3 shortly
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Angles and Lengths
Definition 5.1.2
The length of a vector x ∈ Rn is its norm ||x|| = √(x, x)
The distance between two vectors x, y is given by ||y − x||
Theorem 5.1.3
The angle θ ∈ [0, π] between two vectors x, y in R2 or R3 satisfies the equation
(x, y) = ||x|| ||y|| cos θ
[Figure: vectors x = (x1, x2), y = (y1, y2) and y − x forming a triangle, with angle θ between x and y]
Definition 5.1.4
We define the angle θ between x, y ∈ Rn to be the number
θ = cos−1( (x, y) / (||x|| ||y||) )
θ is the smaller of the two possible angles, since cos−1 has range [0, π]
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Proof of Theorem.
If x, y are parallel then θ = 0 or π and the Theorem is trivial
Otherwise, in R2 (or in the plane Span(x, y) ≤ R3), the cosine rule holds:
||x − y||2 = ||x||2 + ||y||2 − 2 ||x|| ||y|| cos θ
Applying the definition of norm and scalar product, we obtain
2 ||x|| ||y|| cos θ = ||x||2 + ||y||2 − ||x − y||2
                   = xTx + yTy − (x − y)T(x − y)
                   = xTx + yTy − (xTx + yTy − xTy − yTx)
                   = xTy + yTx = 2(x, y)
as required
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Basic results & inequalities
Several results that you will have used without thinking in elementary geometry follow directly from the definitions
Theorem 5.1.5 (Cauchy–Schwarz inequality)
If x, y are vectors in Rn then
|(x, y)| ≤ ||x|| ||y||
with equality iff x, y are parallel
Proof.
|(x, y)| = | ||x|| ||y|| cos θ | = ||x|| ||y|| |cos θ| ≤ ||x|| ||y||
Equality is satisfied precisely when cos θ = ±1: that is when θ = 0, π, and so x, y are parallel
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Theorem 5.1.6 (Triangle inequality)
If x, y ∈ Rn then ||x + y|| ≤ ||x||+ ||y||
I.e. any side of a triangle is at most as long as the sum of the other two
Proof.
||x + y||2 = (x + y, x + y) = (x + y)T(x + y)
           = ||x||2 + 2(x, y) + ||y||2
           ≤ ||x||2 + 2 |(x, y)| + ||y||2
           ≤ ||x||2 + 2 ||x|| ||y|| + ||y||2
           = (||x|| + ||y||)2
[Figure: triangle with sides x, y and x + y]
If (x, y) = 0, the second line in the proof of the triangle inequality immediately yields
Theorem 5.1.7 (Pythagoras’)
If x, y ∈ Rn are orthogonal then ||x + y||2 = ||x||2 + ||y||2
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Example
Let x = (1, 2, −1)T and y = (−3, 1, 3)T, then
||x|| = √(x, x) = √(1 + 4 + 1) = √6
||y|| = √(9 + 1 + 9) = √19
(x, y) = −3 + 2 − 3 = −4
θ = cos−1( (x, y) / (||x|| ||y||) ) = cos−1( −4 / (√6 √19) ) ≈ 1.955 rad ≈ 112◦
||y − x|| = ||(−4, −1, 4)T|| = √33 ≤ √6 + √19
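A quick numerical check of this worked example (a sketch using NumPy; the variable names are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([-3.0, 1.0, 3.0])

norm_x = np.linalg.norm(x)              # sqrt(6)
norm_y = np.linalg.norm(y)              # sqrt(19)
dot = x @ y                             # -4

theta = np.arccos(dot / (norm_x * norm_y))
print(theta, np.degrees(theta))         # ~1.955 rad, ~112 degrees
print(np.linalg.norm(y - x))            # sqrt(33) <= sqrt(6) + sqrt(19)
```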
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Projections
Scalar products are useful for calculating how much of onevector points in the direction of another
Definition 5.1.8
The unit vector in the direction of v ∈ Rn is the vector (1/||v||) v
The scalar projection of x onto v ≠ 0 in Rn is the scalar product
αv(x) = ( (1/||v||) v, x ) = (v, x) / ||v||
The orthogonal (or vector) projection of x onto v ≠ 0 in Rn is
πv(x) = αv(x) (1/||v||) v = ( (v, x) / ||v||2 ) v
[Figure: x, v, and the orthogonal projection πv(x) of x onto v]
Note: αv(x) ≠ ||πv(x)||: if αv(x) < 0 then the projection of x onto v points in the opposite direction to v
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Orthogonal projection means several things:
1. πv ∈ L(Rn)   (πv is a linear map)
2. πv(Rn) = Span(v)   (Projection onto Span(v))
3. πv(v) = v   (Identity on Span(v))
4. ker πv = v⊥ = {y ∈ Rn : (y, v) = 0}   (Orthogonality)
1, 2, 3 say that πv is a projection
4 makes the projection orthogonal: anything orthogonal to v is mapped to zero
Similarly αv ∈ L(Rn, R)
[Figure: v, x, πv(x), and ker πv]
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The matrix of a projection
Since πv is a linear map, it has a standard matrix representation
Indeed
πv(x) = ( (v, x) / ||v||2 ) v = v ( (v, x) / ||v||2 ) = v ( vTx / ||v||2 ) = ( vvT / ||v||2 ) x
whence the matrix of πv is the n × n matrix vvT / ||v||2
Example
In R2, orthogonal projection onto v = (x, y)T has matrix
Av = ( 1 / (x2 + y2) ) (x, y)T (x  y) = ( 1 / (x2 + y2) ) [ x2  xy ; xy  y2 ]
Projection onto the x-axis is therefore Ai = [ 1 0 ; 0 0 ], while projection onto the line y = x is Ai+j = (1/2) [ 1 1 ; 1 1 ]
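A small NumPy sketch of this formula (the function name is my own): it builds the rank-one matrix vvT/||v||2 and reproduces the two cases above.

```python
import numpy as np

def projection_matrix(v):
    """Orthogonal projection onto Span(v): the n x n matrix v v^T / ||v||^2."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)   # column vector
    return (v @ v.T) / (v.T @ v).item()

print(projection_matrix([1.0, 0.0]))    # [[1, 0], [0, 0]]        projection onto the x-axis
print(projection_matrix([1.0, 1.0]))    # 0.5 * [[1, 1], [1, 1]]  projection onto the line y = x
```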
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Planes in R3
Projections are useful for describing planes [4] in R3
Let P be the plane normal to n = (a, b, c)T ∈ R3 and which passes through the point with position vector x0 = (x0, y0, z0)T
The distance d of the plane from the origin is the scalar projection of any vector in the plane onto n: thus
d = αn(x0) = (x0, n) / ||n|| = (ax0 + by0 + cz0) / √(a2 + b2 + c2)
[4] And more general affine spaces in arbitrary dimension
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The plane P is the set of points whose scalar projection onto n is d: otherwise said
P = {x : αn(x) = d = αn(x0)}
However
αn(x) = αn(x0) ⇐⇒ αn(x − x0) = 0 ⇐⇒ (x − x0, n) = 0 ⇐⇒ a(x − x0) + b(y − y0) + c(z − z0) = 0
which is an alternative description of the plane
Example
If x0 = (7, 9, 3)T and n = (0, 3, 1)T, then P has equation
3(y − 9) + (z − 3) = 0, or 3y + z = 30
The distance to P is
d = αn(x0) = (x0, n) / ||n|| = 30 / √10
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The distance of a vector y from P is the scalar projection of y onto the normal n with d subtracted:
dist(y, P) = αn(y) − d = (n, y) / ||n|| − αn(x0) = (n, y − x0) / ||n||
Example
If y = (3, 2, 1)T, x0 = (7, 9, 3)T, and n = (0, 3, 1)T, then
dist(y, P) = −23 / √10
The negative sign means that y is ‘below’ P (on the opposite side of P to the direction of n)
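A numerical check of both plane examples (a NumPy sketch; variable names are my own):

```python
import numpy as np

n  = np.array([0.0, 3.0, 1.0])     # plane normal
x0 = np.array([7.0, 9.0, 3.0])     # point on the plane
y  = np.array([3.0, 2.0, 1.0])

d = (x0 @ n) / np.linalg.norm(n)              # distance of P from the origin: 30/sqrt(10)
dist_y = ((y - x0) @ n) / np.linalg.norm(n)   # signed distance of y from P: -23/sqrt(10)
print(d, dist_y)
```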
-
5. Orthogonality 5.2. Orthogonal subspaces
5.2 Orthogonal subspaces
Recall that x, y ∈ Rn are orthogonal if (x, y) = xTy = 0
Definition 5.2.1
Two subspaces U, V ≤ Rn are orthogonal, written U ⊥ V, iff
(u, v) = 0 for all u ∈ U, v ∈ V
The orthogonal complement to U in Rn is the subspace
U⊥ := {x ∈ Rn : (x, u) = 0, ∀u ∈ U}
E.g. a plane and its normal line intersecting at the origin are orthogonal complements
-
5. Orthogonality 5.2. Orthogonal subspaces
The previous example suggests the following
Lemma 5.2.2
1. U⊥ as defined really is a subspace of Rn
2. U ∩ U⊥ = {0}
Proof.
1. (0, u) = 0 for all u ∈ U, hence 0 ∈ U⊥ and U⊥ is non-empty
Now let u ∈ U, x, y ∈ U⊥, and α, β ∈ R, then
(αx, u) = α(x, u) = 0 =⇒ αx ∈ U⊥
(x + y, u) = (x, u) + (y, u) = 0 =⇒ x + y ∈ U⊥
hence U⊥ is closed under scalar multiplication and addition and is therefore a subspace
2. Let x ∈ U ∩ U⊥, then (x, x) = 0, whence x = 0
-
5. Orthogonality 5.2. Orthogonal subspaces
Examples
1. Suppose U = Span(u1, u2) = Span( (1, 3, −2)T, (−1, 0, 1)T ) ≤ R3
U⊥ is spanned by all vectors orthogonal to u1 & u2
Multiples of the cross-product u1 × u2 are the only such vectors, whence
U⊥ = Span( (1, 3, −2)T × (−1, 0, 1)T ) = Span( (3, 1, 3)T )
In general the orthogonal complement U⊥ to a plane U ≤ R3 is spanned by the cross-product of any two spanning vectors in U: hence U⊥ is always a line
-
5. Orthogonality 5.2. Orthogonal subspaces
Examples
2. Suppose U = Span(u) = Span( (−2, 2, 5)T )
Then (x, u) = 0 ⇐⇒ uTx = 0 ⇐⇒ (−2 2 5) x = 0, whence we find the nullspace:
U⊥ = N(−2 2 5) = Span( (1, 1, 0)T, (5, 0, 2)T )
In general, the orthogonal complement to a line U ≤ R3 is the nullspace of a rank 1 matrix uT ∈ R1×3
uT has nullity 3 − 1 = 2 and so dim U⊥ = 2: hence U⊥ is always a plane
We will see shortly that orthogonal complements are naturally thought of as nullspaces of particular matrices
-
5. Orthogonality 5.2. Orthogonal subspaces
Non-degeneracy
The scalar product is said to be non-degenerate in the sense that
(x, y) = 0, ∀y ∈ Rn =⇒ x = 0
Alternatively said, the only vector which is orthogonal to everything is the zero-vector 0:
(Rn)⊥ = {0}
We can check this: if x is orthogonal to all y ∈ Rn, then
(x, ei) = 0 for every standard basis vector e1, . . . , en
But (x, ei) = xi =⇒ xi = 0 for all i and so x = 0
Similarly {0}⊥ = Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Orthogonality and matrices
For a general matrix A, we consider how N(A) and C(A) are related to orthogonality
First we need to see how matrix multiplication interacts with the scalar product
Lemma 5.2.3
If x ∈ Rn, y ∈ Rm, and A ∈ Rm×n, then
(Ax, y) = (x, ATy)
Proof.
(Ax, y) = (Ax)Ty = xTATy = (x, ATy)
Note that the scalar product on the left is of vectors in Rm, while the product on the right is of vectors in Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.4 (Fundamental subspaces)
If A ∈ Rm×n then [a]
N(A) = C(AT)⊥ and N(AT) = C(A)⊥
[a] Warning: the book uses the strange notation R(A) = Range(A) for the column space of A here, rather than our C(A)
Proof.
Using the definition we see that
C(AT)⊥ = {x ∈ Rn : (x, z) = 0, ∀z ∈ C(AT)}
       = {x ∈ Rn : (x, ATy) = 0, ∀y ∈ Rm}
       = {x ∈ Rn : (Ax, y) = 0, ∀y ∈ Rm}   (Lemma 5.2.3)
       = {x ∈ Rn : Ax = 0}   (Non-degeneracy)
       = N(A)
The second formula comes from replacing A ↔ AT
-
5. Orthogonality 5.2. Orthogonal subspaces
The Theorem tells us how to find the orthogonal complement to a general subspace U ≤ Rn:
1. Take a basis {u1, . . . , ur} of U
2. Build the rank r matrix A ∈ Rn×r with columns u1, . . . , ur
3. U = C(A) =⇒ U⊥ = N(AT)
Example
If U = Span( (1, 0, −1, 0)T, (5, −2, 0, 1)T ) ≤ R4, then
A = [ 1 5 ; 0 −2 ; −1 0 ; 0 1 ]  =⇒  AT = [ 1 0 −1 0 ; 5 −2 0 1 ]
from which we find U⊥ as the nullspace
U⊥ = N(AT) = Span( (0, 1, 0, 2)T, (1, 0, 1, −5)T )
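The same computation in SciPy (a sketch; null_space returns an orthonormal basis of N(AT), so its columns differ from the hand-picked spanning vectors above but span the same subspace):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1.0,  5.0],
              [ 0.0, -2.0],
              [-1.0,  0.0],
              [ 0.0,  1.0]])          # columns span U

U_perp = null_space(A.T)              # columns form a basis of U_perp = N(A^T)
print(U_perp)
print(np.round(A.T @ U_perp, 10))     # zero matrix: U_perp is orthogonal to U
```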
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.5
1. If S ≤ Rn then dim S + dim S⊥ = n
2. If B = {s1, . . . , sr} is a basis of S, then we may form a basis C = {sr+1, . . . , sn} of S⊥ such that B ∪ C is a basis of Rn
The Theorem clears up what we’ve already seen: e.g. the orthogonal complement to a line in R3 is always a plane, etc.
Proof.
Suppose S ≠ {0}, otherwise S⊥ = Rn and the Theorem is trivial
Otherwise let A = ( s1 | · · · | sr ) ∈ Rn×r be the matrix with columns s1, . . . , sr
Since B is a basis we have S = C(A), whence Theorem 5.2.4 yields
S⊥ = C(A)⊥ = N(AT)
The Rank–Nullity Theorem gives us 1 :
dim S⊥ = null AT = n− rank AT = n− r = n− dim S
-
5. Orthogonality 5.2. Orthogonal subspaces
Proof (cont).
Now choose a basis C = {sr+1, . . . , sn} of S⊥ and suppose that
(α1s1 + · · · + αrsr) + (αr+1sr+1 + · · · + αnsn) = 0
where the first bracketed sum is a vector s ∈ S and the second is a vector s⊥ ∈ S⊥
Lemma 5.2.2 =⇒ s = −s⊥ ∈ S ∩ S⊥ = {0} =⇒ s = s⊥ = 0
Since B, C are bases it follows that all αi = 0, whence s1, . . . , sn are linearly independent
Since dim Rn = n we necessarily have a basis of Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Direct Sums of Subspaces
Definition 5.2.6
Suppose that U, V are subspaces of W
Moreover suppose that each w ∈ W can be written uniquely as a sum
w = u + v
for some u ∈ U, v ∈ V
Then W is the direct sum of U and V and we write W = U ⊕ V
W = U ⊕ V is equivalent to both of the following holding simultaneously:
1. W = U + V; everything in W can be written as a combination u + v
2. U ∩ V = {0}; the linear combination is unique
-
5. Orthogonality 5.2. Orthogonal subspaces
Orthogonal complements are always direct sums
Theorem 5.2.7
If S is a subspace of Rn then S⊕ S⊥ = Rn
Proof.
We must prove S ∩ S⊥ = {0} and Rn = S + S⊥
The first is Lemma 5.2.2, part 2
For the second we use Theorem 5.2.5 and the homework:
dim(S + S⊥) = dim S + dim S⊥ − dim(S∩ S⊥) = n
from which S + S⊥ = Rn
Thus S⊕ S⊥ = Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.8
If S is a subspace of Rn then (S⊥)⊥ = S
Proof.
If s ∈ S then
(s, y) = 0 for all y ∈ S⊥
Thus s ∈ (S⊥)⊥ hence S ≤ (S⊥)⊥
Conversely, let z ∈ (S⊥)⊥
Since Rn = S ⊕ S⊥ there exist unique s ∈ S, s⊥ ∈ S⊥ such that
z = s + s⊥
Now take scalar products with s⊥:
0 = (z, s⊥) = (s, s⊥) + (s⊥, s⊥) = ||s⊥||2 =⇒ s⊥ = 0
Hence z = s ∈ S and we have (S⊥)⊥ ≤ S
Putting both halves together gives the Theorem
-
5. Orthogonality 5.2. Orthogonal subspaces
The Fundamental Subspaces Theorem has a bearing on whether linear systems have solutions
Corollary 5.2.9
Let A ∈ Rm×n and b ∈ Rm. Then exactly one of the following holds:
1. There is a vector x ∈ Rn such that Ax = b, or
2. There exists some y ∈ N(AT) ≤ Rm such that (y, b) ≠ 0
The corollary is illustrated for m = n = 3, and rank A = 2: a suitable, but unnecessary, choice satisfying 2 is y = πN(AT)(b)
Proof.
N(AT) = C(A)⊥ =⇒ Rm = C(A) ⊕ N(AT)
Write b = p + y according to the direct sum, then (b, y) = ||y||2
This is zero iff b ∈ C(A), iff Ax = b has a solution
-
5. Orthogonality 5.3. Least squares problems
5.3 Least squares problems
In applications, one often has more equations than unknowns and cannot find a solution to all of them simultaneously: what do we do?
Idea: find a combination of variables that comes as close as possible to solving all the equations
Many methods exist: they depend on the type of problem, the definition of ‘close as possible’, etc. [5]
We consider a method for approaching overdetermined linear systems, first championed by Gauss
[5] Take a Numerical Analysis class for more!
-
5. Orthogonality 5.3. Least squares problems
Suppose Ax = b is an overdetermined system: i.e.
A ∈ Rm×n with m > n (more rows than columns)
b ∈ Rm is given
x = (x1, . . . , xn)T ∈ Rn is the column vector of variables
The picture from Corollary 5.2.9 gives us an approach:
In general b ∉ C(A) and there is no solution
The closest we can get to a solution x would be to choose x̂ so that Ax̂ is as close as possible to b
Since Rm = C(A) ⊕ N(AT), we decompose b = p + y and instead solve Ax̂ = p
-
5. Orthogonality 5.3. Least squares problems
Least Squares?
Suppose Ax = b is our m × n overdetermined system
Any vector x ∈ Rn creates a residual r(x) = Ax − b ∈ Rm: either
1. We can solve Ax = b and thus make r(x) = 0, or
2. We want to minimize the residual; equivalent to minimizing the length ||r(x)||
Definition 5.3.1
If x̂ ∈ Rn is such that ||Ax̂ − b|| ≤ ||Ax − b|| for all x ∈ Rn then we say that x̂ is a least squares solution to the system Ax = b
Minimizing ||r(x)|| is equivalent to minimizing ||r(x)||2, a sum of squares: no square-roots!
In general there will be many least squares solutions to a given system: if x̂ is such, then x̂ + n is another for any n ∈ N(A)
-
5. Orthogonality 5.3. Least squares problems
Theorem 5.3.2
Let S ≤ Rm and b ∈ Rm, then:
1. There exists a unique p ∈ S which is closest to b
2. p ∈ S is closest to b iff p − b ∈ S⊥
Proof.
Since Rm = S ⊕ S⊥ we may write b = p + s⊥ for some p ∈ S and s⊥ ∈ S⊥. Let s ∈ S, then
||b − s||2 = ||b − p + p − s||2
           = ||b − p||2 + ||p − s||2   (Pythagoras’)
           ≥ ||b − p||2
with equality iff p = s
The closest point in S to b is therefore the orthogonal projection of b onto S
-
5. Orthogonality 5.3. Least squares problems
By Theorem 5.3.2, it follows that x̂ is a least squares solution to Ax = b iff Ax̂ = p = πC(A)(b)
We don’t yet have a formula for calculating the orthogonal projection πS for a general subspace S, but we can calculate when S is 1-dimensional
Example
Find the vector p ∈ S = Span( (1, 3, 2)T ) which is closest to b = (−1, 0, 1)T
We want the projection onto S = Span(s):
p = πS(b) = ( (s, b) / ||s||2 ) s = (1/14) (1, 3, 2)T
-
5. Orthogonality 5.3. Least squares problems
Unique Least Squares Solutions
We address the simplest situation of least squares solutions x̂ to Ax = b: when the solution x̂ is unique
Theorem 5.3.3
If A ∈ Rm×n has rank A = n, then the equations ATAx = ATb
have a unique solution
x̂ = (ATA)−1ATb
which is the unique least squares solution to the system Ax = b
Proof.
We must prove three things:
1. ATA is invertible
2. x̂ = (ATA)−1ATb is a least squares solution to Ax = b
3. x̂ is the only least squares solution
-
5. Orthogonality 5.3. Least squares problems
Proof (cont).
1. Suppose that z ∈ Rn solves ATAz = 0
Then Az ∈ N(AT) = C(A)⊥   (Fundamental Subspaces)
But Az ∈ C(A), whence Az ∈ C(A) ∩ C(A)⊥ = {0} =⇒ Az = 0
To finish, null A = n − rank A = 0, from which Az = 0 has only the solution z = 0
Hence ATAz = 0 =⇒ z = 0, whence ATA is invertible
2. x̂ = (ATA)−1ATb certainly solves ATAx = ATb
However, for any y ∈ Rn,
(Ax̂ − b, Ay) = (AT(Ax̂ − b), y) = (ATA(ATA)−1ATb − ATb, y) = 0
hence Ax̂ − b ∈ C(A)⊥, and x̂ is therefore a least squares solution to Ax = b
-
5. Orthogonality 5.3. Least squares problems
Proof (cont).
3. Now suppose that ŷ is another least squares solution
Then A(ŷ − x̂) = (Aŷ − b) − (Ax̂ − b), where the left side lies in C(A) and the right side lies in C(A)⊥ (both Aŷ − b and Ax̂ − b do, by Theorem 5.3.2)
Since C(A) ∩ C(A)⊥ = {0} we have A(ŷ − x̂) = 0
Since rank A = n we necessarily have ŷ − x̂ = 0 and so the least squares solution is unique
Note how often the fact that rank A = n is required: the Theorem is false without it! Example to come. . .
-
5. Orthogonality 5.3. Least squares problems
General Orthogonal Projections (non-examinable)
Corollary 5.3.4
Suppose S ≤ Rm is a subspace with dim S = n
Let A ∈ Rm×n be any matrix [a] with C(A) = S
Then
πS = A(ATA)−1AT
is the orthogonal projection onto S
[a] Necessarily the columns of A form a basis of S
It is easy to see that if A = v is a column vector, then we recover the original definition of orthogonal projection onto a vector
πv = v(vTv)−1vT = (1/||v||2) vvT
-
5. Orthogonality 5.3. Least squares problems
Example
Find the unique least-squares solution to the system of equations
x1 + 2x2 = 0
3x1 + 3x2 = 1
x2 = 4
We have Ax = b where A = [ 1 2 ; 3 3 ; 0 1 ] and b = (0, 1, 4)T
Since rank A = 2, the Theorem says that the unique solution is
x̂ = (ATA)−1ATb = ( [ 1 3 0 ; 2 3 1 ] [ 1 2 ; 3 3 ; 0 1 ] )−1 [ 1 3 0 ; 2 3 1 ] (0, 1, 4)T
  = [ 10 11 ; 11 14 ]−1 (3, 7)T = (1/19) [ 14 −11 ; −11 10 ] (3, 7)T = (1/19) (−35, 37)T
x̂ ∈ R2 is closest to a solution to Ax = b in the sense that Ax̂ ∈ R3 is as close as possible to b: we are minimizing distance in R3, not in R2
Should check using multivariable calc that f(x1, x2) = (x1 + 2x2)2 + (3x1 + 3x2 − 1)2 + (x2 − 4)2 has an absolute minimum at (x1, x2) = (−35/19, 37/19)
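A NumPy check of this example (a sketch; np.linalg.lstsq solves the same least-squares problem directly):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [0.0, 1.0]])
b = np.array([0.0, 1.0, 4.0])

# Via the normal equations A^T A x = A^T b (valid here since rank A = 2)
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                 # [-35/19, 37/19] ~ [-1.842, 1.947]

# Same answer from the built-in least-squares solver
print(np.linalg.lstsq(A, b, rcond=None)[0])
```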
-
5. Orthogonality 5.3. Least squares problems
Example
Find all the least-squares solutions x̂ when A = [ 3 −6 ; 1 −2 ; −1 2 ] and b = (−2, 1, 4)T
rank A = 1 < 2 and so ATA = [ 11 −22 ; −22 44 ] is non-invertible and we are obliged to solve ATAx̂ = ATb directly
This reads
[ 11 −22 ; −22 44 ] x̂ = (−9, 18)T
whence x̂ = (−9/11, 0)T + λ (2, 1)T, where λ is any scalar
There is a one-parameter set of least-squares solutions
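For rank-deficient problems like this, NumPy's lstsq still returns one particular least-squares solution (the minimum-norm one); the full one-parameter family is that solution plus anything in N(A). A sketch:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 3.0, -6.0],
              [ 1.0, -2.0],
              [-1.0,  2.0]])
b = np.array([-2.0, 1.0, 4.0])

x_min, *_ = np.linalg.lstsq(A, b, rcond=None)   # one least-squares solution (minimum norm)
print(x_min)
print(null_space(A))    # direction proportional to (2, 1): add any multiple to x_min
```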
-
5. Orthogonality 5.3. Least squares problems
Best-fitting curves in Statistics
Least-squares solutions are often used in statistics when one wants to find a best fitting polynomial to a set of data points
Example
Find the equation of the line y = α0 + α1t which minimizes the sum of the squares of the vertical distances to the data points (1, 3), (2, 6), and (3, 7)
Observe how different choices of line affect the sum of the squared distances d1² + d2² + d3²
-
5. Orthogonality 5.3. Least squares problems
Example (cont)
The sum of the squared errors, as a function of α0, α1, is
(y(1) − 3)2 + (y(2) − 6)2 + (y(3) − 7)2 = || (α0 + α1, α0 + 2α1, α0 + 3α1)T − (3, 6, 7)T ||2
  = || [ 1 1 ; 1 2 ; 1 3 ] (α0, α1)T − (3, 6, 7)T ||2 = ||Aα − b||2
Therefore (α0, α1)T is the least-squares solution
(α0, α1)T = (ATA)−1ATb = [ 3 6 ; 6 14 ]−1 [ 1 1 1 ; 1 2 3 ] (3, 6, 7)T
  = (1/6) [ 14 −6 ; −6 3 ] (16, 36)T = (4/3, 2)T
We therefore get the line y = 4/3 + 2t
This is the “best-fitting least-squares” line to the data
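The same fit in NumPy (a sketch; np.polyfit performs the identical least-squares fit and returns coefficients highest power first):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 6.0, 7.0])

A = np.column_stack([np.ones_like(t), t])       # design matrix with columns [1, t]
alpha = np.linalg.solve(A.T @ A, A.T @ b)
print(alpha)                                    # [4/3, 2]: intercept, slope

print(np.polyfit(t, b, 1))                      # [2, 4/3]: slope, intercept
```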
-
5. Orthogonality 5.3. Least squares problems
Best-fitting least-squares polynomials
Suppose {(ti, bi) : i = 0, . . . , n} is a set of data points where the ti are distinct [6]
Question: If t is given, what do we expect b to be?
We look for a polynomial p(t) of degree k < n which minimizes the squares of the errors in the dependent variable b
p(t) is then a prediction of the value b if t is given
Example
Try plugging in the data “1 1; 2 2; 3 1; 4 3; 5 7; 6 2; 7 3;” to the applet for degrees 1–5
[6] The ti are often time-values and the bi the values of some output at time ti
http://www.shodor.org/chemviz/tools/regressionjava/index.html
-
5. Orthogonality 5.3. Least squares problems
Let p(t) = α0 + α1t + · · · + αk t^k be a polynomial of degree k < n
The predictive error [7] at t = ti is the distance |p(ti) − bi|
Choose coefficients α0, . . . , αk to minimize the sum of the squared errors
∑_{i=1}^{n} (p(ti) − bi)2
Sum squares of errors for three reasons:
1. Positive and negative errors are treated the same (both positive)
2. Large errors are penalized much more than small ones
3. The calculations are much easier than other methods!
[7] If k = n then there is a unique polynomial through the n + 1 data points, so we have a formula b = f(t) and thus no predictive error for any ti
-
5. Orthogonality 5.3. Least squares problems
Since we have the coefficients for p(t), we can write
( p(t1), . . . , p(tn) )T = [ 1 t1 t1² · · · t1^k ; 1 t2 t2² · · · t2^k ; . . . ; 1 tn tn² · · · tn^k ] (α0, α1, . . . , αk)T =: Pa
defining the matrix P ∈ Rn×(k+1). Setting b = (b1, . . . , bn)T, we are trying to minimize
∑_{i=1}^{n} (p(ti) − bi)2 = ||Pa − b||2
This is a least squares problem
Moreover, rank P = k + 1 is maximal iff the ti are distinct. The unique least squares solution is therefore
â = (PTP)−1PTb,
which returns us the coefficients α0, . . . , αk of the best-fitting least-squares polynomial of degree ≤ k
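A sketch of this recipe in NumPy (function name is my own; np.vander with increasing=True builds exactly the matrix P above):

```python
import numpy as np

def fit_poly(t, b, k):
    """Best-fitting least-squares polynomial of degree <= k: coefficients alpha_0, ..., alpha_k."""
    P = np.vander(t, k + 1, increasing=True)     # rows [1, t_i, t_i^2, ..., t_i^k]
    return np.linalg.solve(P.T @ P, P.T @ b)     # a_hat = (P^T P)^{-1} P^T b

t = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 1.0, 3.0])
print(fit_poly(t, b, 1))    # [0.5, 0.5]          the best line of the next example
print(fit_poly(t, b, 2))    # [1.75, -0.75, 0.25] the best quadratic of the next example
```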
-
5. Orthogonality 5.3. Least squares problems
Example
Find the best-fitting line and quadratic to the data
ti : 1 2 3 4
bi : 1 2 1 3
For the straight line we have P = [ 1 1 ; 1 2 ; 1 3 ; 1 4 ] and b = (1, 2, 1, 3)T, thus
â = (PTP)−1PTb = [ 4 10 ; 10 30 ]−1 (7, 20)T = (1/2) (1, 1)T
hence p(t) = (1/2)(1 + t) is the best-fitting straight line
[Figure: the four data points and the line y = 0.5t + 0.5, for which ∑ di² = 1.5]
-
5. Orthogonality 5.3. Least squares problems
Example (cont)
For the data ti : 1 2 3 4, bi : 1 2 1 3, the best-fitting quadratic requires
P = [ 1 1 1 ; 1 2 4 ; 1 3 9 ; 1 4 16 ], thus
â = (PTP)−1PTb = [ 4 10 30 ; 10 30 100 ; 30 100 354 ]−1 (7, 20, 66)T = (1/4) (7, −3, 1)T
The best-fitting quadratic polynomial is therefore p(t) = (1/4)(7 − 3t + t²)
Note that p fits the data better than the straight line, otherwise the best-fitting quadratic would be a straight line!
[Figure: the four data points and the quadratic y = 0.25t² − 0.75t + 1.75, for which ∑ di² = 1.25]
-
5. Orthogonality 5.4. Inner product spaces
5.4 Inner Product Spaces
Inner products generalize the scalar product on Rn
Definition 5.4.1
An inner product ( , ) on a real vector space V is a function ( , ) : V × V → R which satisfies the following axioms:
I  (x, x) ≥ 0, ∀x ∈ V, with equality iff x = 0
II (x, y) = (y, x), ∀x, y ∈ V
III (αx + βy, z) = α(x, z) + β(y, z), ∀x, y, z ∈ V, ∀α, β ∈ R
(V, ( , )) is an inner product space
( , ) is also called a positive definite (I), symmetric (II), (bi)linear [8] (III) form
III says that each map Lz : V → R defined by Lz(x) = (x, z) is linear [9]
[8] Linear in both arguments
[9] When dim V < ∞ it is a fact (beyond this course) that all linear maps V → R are of the form Lz for some z ∈ V
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on Rn
If w1, . . . , wn > 0, then
(x, y) := ∑_{i=1}^{n} wi xi yi
is an inner product [10]: the wi are called weights
Indeed if A ∈ Rn×n is any symmetric (AT = A), positive-definite (xTAx > 0, ∀x ≠ 0) matrix, then
(x, y) := xTAy
is an inner product [11] on Rn
Two examples on R3 are
(x, y) = xT [ 1 0 0 ; 0 3 0 ; 0 0 4 ] y    and    (x, y) = xT [ 3 0 0 ; 0 1 1 ; 0 1 2 ] y
[10] If w1 = w2 = · · · = wn = 1 we get the standard scalar product
[11] Check each of I, II, III
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on Pn
The standard basis {1, x, x2, . . . , xn−1} identifies Pn with Rn and we can use any of the inner products on the previous slide, e.g.
(a1 + b1x + c1x2, a2 + b2x + c2x2) = a1a2 + b1b2 + c1c2 in P3
Alternatively, let x1, . . . , xn be distinct real numbers and define
(p, q) := ∑_{i=1}^{n} p(xi)q(xi)
Conditions II and III clearly hold, but I needs a little work:
(p, p) = ∑_{i=1}^{n} p(xi)2 = 0 ⇐⇒ p(xi) = 0, ∀i = 1, . . . , n
This says that p(x) has at least n distinct roots
However a polynomial of degree ≤ n − 1 has at most n − 1 roots, unless it is identically zero: hence I holds
Can also have weights: if w(x) is a positive function then
(p, q) := ∑_{i=1}^{n} w(xi)p(xi)q(xi) is an inner product
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on C[a, b]
Undoubtedly the most important example for future courses is the L2 inner product on C[a, b]
(f, g) := ∫_a^b f(x)g(x) dx
We check
I  (f, f) = ∫_a^b f(x)2 dx ≥ 0 with equality iff f(x) ≡ 0 (since f is continuous)
II (f, g) = ∫_a^b f(x)g(x) dx = ∫_a^b g(x)f(x) dx = (g, f)
III (αf + βg, h) = ∫_a^b (αf(x) + βg(x))h(x) dx = α ∫_a^b f(x)h(x) dx + β ∫_a^b g(x)h(x) dx = α(f, h) + β(g, h)
Can similarly define a weighted inner product
(f, g) = ∫_a^b w(x)f(x)g(x) dx
where w(x) is any positive function
-
5. Orthogonality 5.4. Inner product spaces
Basic Properties
Definition 5.4.2
If (V, ( , )) is an inner product space then the norm or length of a vector v ∈ V is
||v|| := √(v, v)
v, w ∈ V are orthogonal iff (v, w) = 0
Observe: ||v|| = 0 ⇐⇒ (v, v) = 0 ⇐⇒ v = 0, by property I
Theorem 5.4.3 (Pythagoras’)
If v, w are orthogonal then ||v + w||2 = ||v||2 + ||w||2
The proof is identical to that given in Rn
-
5. Orthogonality 5.4. Inner product spaces
Example
Find the norms and inner products of the three vectors {1, sin x, cos x} with respect to the L2 inner product on C[0, 2π]
||1|| = √( ∫_0^{2π} 12 dx ) = √(2π)
||sin x|| = √( ∫_0^{2π} sin2 x dx ) = √( ∫_0^{2π} (1/2)(1 − cos 2x) dx ) = √π
||cos x|| = √π
(1, sin x) = ∫_0^{2π} 1 · sin x dx = 0
(1, cos x) = 0
(sin x, cos x) = ∫_0^{2π} sin x cos x dx = 0
1, sin x, cos x are therefore orthogonal vectors in C[0, 2π]
Dividing by the norms we see that 1/√(2π), (1/√π) sin x, (1/√π) cos x are orthonormal vectors [a] in C[0, 2π]
[a] Important for Fourier Series
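A numerical check of these integrals (a sketch using scipy.integrate.quad; the helper name ip is my own):

```python
import numpy as np
from scipy.integrate import quad

# L2 inner product on C[0, 2*pi]
ip = lambda f, g: quad(lambda x: f(x) * g(x), 0.0, 2.0 * np.pi)[0]

one = lambda x: 1.0
print(np.sqrt(ip(one, one)))                                 # sqrt(2*pi)
print(np.sqrt(ip(np.sin, np.sin)))                           # sqrt(pi)
print(ip(one, np.sin), ip(one, np.cos), ip(np.sin, np.cos))  # all ~0
```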
-
5. Orthogonality 5.4. Inner product spaces
Orthogonal Projections
Can define orthogonal projections exactly as in Rn
Definition 5.4.4
If v ≠ 0 in an inner product space (V, ( , )) then the orthogonal projection of x ∈ V onto v is
πv(x) = ( (v, x) / ||v||2 ) v
In particular
(πv(x), x − πv(x)) = ( ((v, x)/||v||2) v , x − ((v, x)/||v||2) v )
                   = ((v, x)/||v||2) ( (v, x) − ((v, x)/||v||2) ||v||2 ) = 0
whence πv(x) and x − πv(x) are orthogonal
[Figure: v, x, πv(x), and the orthogonal component x − πv(x)]
-
5. Orthogonality 5.4. Inner product spaces
Example
Calculate the orthogonal projection of sin x onto Span{x} ≤ C[0, 2π] with the L2-inner product
πx(sin x) = ( (x, sin x) / ||x||2 ) x = ( ∫_0^{2π} x sin x dx / ∫_0^{2π} x2 dx ) x
          = ( ( −x cos x |_0^{2π} + ∫_0^{2π} cos x dx ) / ( (1/3) x3 |_0^{2π} ) ) x
          = ( −2π / ((8/3)π3) ) x = −( 3 / (4π2) ) x
[Figure: sin x and its projection −(3/(4π2)) x over [0, 2π]]
-
5. Orthogonality 5.4. Inner product spaces
Theorem 5.4.5 (Cauchy–Schwarz inequality)
|(v, w)| ≤ ||v|| ||w||, with equality iff v, w are parallel
Can’t rely on the cosine rule like in Rn, as we currently have no notion of angle
Proof.
Suppose v ≠ 0, otherwise the Theorem is trivial
πv(w) and w − πv(w) are orthogonal, so by Pythagoras’
||w||2 = ||πv(w)||2 + ||w − πv(w)||2 ≥ ||πv(w)||2 = (v, w)2 / ||v||2
Rearranging gives the Theorem: equality holds iff w = πv(w), and so iff v, w are parallel
-
5. Orthogonality 5.4. Inner product spaces
Angles in Inner Product Spaces
Cauchy–Schwarz allows us to define the notion of angle
Definition 5.4.6
The angle θ between two non-zero vectors v, w in an inner product space is given by
cos θ = (v, w) / (||v|| ||w||)
Can now check that the Cosine rule holds:
||v − w||2 = ||v||2 + ||w||2 − 2 ||v|| ||w|| cos θ
and, more painfully, that the Sine rule holds also!
-
5. Orthogonality 5.4. Inner product spaces
Norms
Definition 5.4.7
A norm on a real vector space V is a function || || : V → R which satisfies the following axioms:
I  ||v|| ≥ 0, ∀v ∈ V, with equality iff v = 0
II ||αv|| = |α| ||v||, ∀α ∈ R, ∀v ∈ V
III ||v + w|| ≤ ||v|| + ||w||, ∀v, w ∈ V
We call (V, || ||) a normed linear space
Condition III is the triangle inequality: the length of one side of a triangle is at most the sum of the lengths of the other two sides
-
5. Orthogonality 5.4. Inner product spaces
Theorem 5.4.8
If (V, ( , )) is an inner product space, then ||v|| = √(v, v) is a norm
Proof.
I is the identical condition for an inner product
For II, ||αv|| = √(αv, αv) = √(α2(v, v)) = |α| ||v||
For III we need the Cauchy–Schwarz inequality:
||v + w||2 = ||v||2 + 2(v, w) + ||w||2
≤ ||v||2 + 2 ||v|| ||w||+ ||w||2
= (||v||+ ||w||)2
-
5. Orthogonality 5.4. Inner product spaces
The p-norms
These generalize the standard norm on Rn
Definition 5.4.9
Given p ≥ 1, the p-norm on Rn is the norm
||x||p := ( ∑_{i=1}^{n} |xi|^p )^{1/p}
The uniform or ∞-norm on Rn is the norm
||x||∞ := max_{i=1,...,n} |xi|
The 2-norm is the usual notion of length in Rn
Only the 2-norm comes from an inner product on Rn: a normed linear space in general has no idea of what the angle between vectors means, only their lengths
-
5. Orthogonality 5.4. Inner product spaces
The three most common norms are the 1-, 2-, and ∞-norms
Example
If x = (1, 3, −1)T then
||x||1 = |1| + |3| + |−1| = 5
||x||2 = √(12 + 32 + (−1)2) = √11
||x||∞ = max{|1|, |3|, |−1|} = 3
Note that ||x||1 ≥ ||x||2 ≥ ||x||∞: this is true in general [a]
[a] See the homework. . .
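The same norms via NumPy (a sketch; the second argument of np.linalg.norm selects the p-norm):

```python
import numpy as np

x = np.array([1.0, 3.0, -1.0])
print(np.linalg.norm(x, 1))         # 5.0
print(np.linalg.norm(x, 2))         # sqrt(11)
print(np.linalg.norm(x, np.inf))    # 3.0
```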
-
5. Orthogonality 5.4. Inner product spaces
Lp norms on C[a, b] (non-examinable)
There are also analogues of the p-norms on function spaces
Definition 5.4.10
On C[a, b], the Lp-norm (p ≥ 1) is given by
||f||p := ( ∫_a^b |f(x)|^p dx )^{1/p}
The uniform or ∞-norm is defined by
||f||∞ := max_{x∈[a,b]} |f(x)|
Again only the L2 norm comes from an inner product, in this case the L2 inner product defined earlier
-
5. Orthogonality 5.5. Orthonormal sets
5.5 Orthonormal sets
Definition 5.5.1
v1, . . . , vn in an inner product space V are orthogonal iff (vi, vj) = 0, ∀i ≠ j
v1, . . . , vn are orthonormal iff
(vi, vj) = δij = { 1 if i = j, 0 if i ≠ j }
Can turn an orthogonal set into an orthonormal set by dividing by the norms:
{v1, . . . , vn} ↦ { v1/||v1||, . . . , vn/||vn|| }
Example
Recall 1/√(2π), (1/√π) sin x, (1/√π) cos x are orthonormal in (C[0, 2π], L2)
-
5. Orthogonality 5.5. Orthonormal sets
Theorem 5.5.2
An orthogonal set {v1, . . . , vn} of non-zero vectors is linearly independent
Proof.
Suppose that α1v1 + · · · + αnvn = 0
Then, for each i,
0 = (vi, 0) = (vi, α1v1 + · · · + αnvn) = α1(vi, v1) + · · · + αi(vi, vi) + · · · + αn(vi, vn) = αi ||vi||2
whence αi = 0, since vi ≠ 0. Since all αi = 0 we have linear independence
-
5. Orthogonality 5.5. Orthonormal sets
Calculating in orthonormal bases
Theorem 5.5.3
Let U = {u1, . . . , un} be an orthonormal basis of an inner product space (V, ( , )). Then:
1. v = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V: i.e. [v]U = ( (v, u1), . . . , (v, un) )T
2. ( ∑_{i=1}^{n} ai ui , ∑_{i=1}^{n} bi ui ) = ∑_{i=1}^{n} ai bi
3. || ∑_{i=1}^{n} ci ui ||2 = ∑_{i=1}^{n} ci²   (Parseval’s formula)
Everything [12] works as if you are in Rn with the basis e1, . . . , en!
[12] Essentially. . .
-
5. Orthogonality 5.5. Orthonormal sets
Proof.
Since {u1, . . . , un} is a basis there exist unique αi ∈ R such that
v = α1u1 + · · · + αnun
∴ (v, ui) = αi
which proves 1
2 and 3 are straightforward by linearity from 1
With careful caveats, the above formulæ are valid when dim V = ∞: which leads to the example. . .
-
5. Orthogonality 5.5. Orthonormal sets
Theorem 5.5.4
In C[−π, π] with the scaled L2 inner product (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx, the following infinite set is orthonormal:
{ 1/√2, sin x, cos x, sin 2x, cos 2x, . . . }
Proof.
Just compute integrals: use identities such as 2 sin nx sin mx = cos(n − m)x − cos(n + m)x, i.e.
(sin(nx), sin(mx)) = (1/π) ∫_{−π}^{π} sin(nx) sin(mx) dx
                   = (1/(2π)) ∫_{−π}^{π} cos(n − m)x − cos(n + m)x dx
                   = (1/(2π)) ∫_{−π}^{π} cos(n − m)x dx = δmn
-
5. Orthogonality 5.5. Orthonormal sets
Parseval’s formula makes some calculations extremely easy
Example
1/√2 and cos 2x are orthonormal with respect to the previous inner product, and
cos2 x = (1/2)(1 + cos 2x) = (1/√2) · (1/√2) + (1/2) cos 2x
Hence
(1/π) ∫_{−π}^{π} cos4 x dx = ||cos2 x||2 = (1/√2)2 + (1/2)2 = 3/4
∴ ∫_{−π}^{π} cos4 x dx = 3π/4
-
5. Orthogonality 5.5. Orthonormal sets
Least squares approximations
Orthogonal projections are least squares approximations
Theorem 5.5.5
Let u1, . . . , un be orthonormal in V and let S = Span(u1, . . . , un)
Then the orthogonal projection πS : V → S onto S is
πS(v) = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V
Proof.
πS is certainly linear (property III of the inner product)
Moreover, for each i,
(v − πS(v), ui) = (v, ui) − (v, ui) = 0 =⇒ v − πS(v) ∈ S⊥
∴ πS(v) + (v − πS(v)) is the unique decomposition of v into S, S⊥ parts
-
5. Orthogonality 5.5. Orthonormal sets
Corollary 5.5.6
πS(v) is the closest element of S to v
The proof is exactly the same as Theorem 5.3.2
Definition 5.5.7
Given v ∈ V we call πS(v) the least-squares approximation of v by S
Least squares approximations are often used to find approximations to complicated functions by simpler ones. . .
-
5. Orthogonality 5.5. Orthonormal sets
Example
1/√2 and √(3/2) x are orthonormal in (C[−1, 1], L2)
The least-squares approximation to f(x) = e^x by a linear polynomial on the interval [−1, 1] is therefore
e^x ≈ (e^x, 1/√2) · (1/√2) + (e^x, √(3/2) x) · √(3/2) x
    = (1/2) ∫_{−1}^{1} e^x dx + (3/2) ∫_{−1}^{1} x e^x dx · x
    = (1/2)(e − e^{−1}) + 3e^{−1} x
[Figure: e^x and its best-fitting line on [−1, 1]]
A different interval gives a different linear approximation. . .
-
5. Orthogonality 5.5. Orthonormal sets
Fourier Series
Least-squares mostly works for projection onto infinite sets
Recall: U = { 1/√2, sin x, cos x, sin 2x, cos 2x, . . . } is orthonormal with respect to (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx
Definition 5.5.8
Suppose f has period 2π. The Fourier Series of f is its orthogonal projection onto Span U
F(f)(x) = (1/√2, f(x)) (1/√2) + ∑_{n=1}^{∞} (sin nx, f(x)) sin nx + ∑_{n=1}^{∞} (cos nx, f(x)) cos nx
if the infinite sum converges [a]
[a] Beyond this course
-
5. Orthogonality 5.5. Orthonormal sets
Example
Let f(x) = x on [−π, π] extended periodically
Then (1/√2, x) = 0 = (cos nx, x) for all n, since x is odd
Moreover
(sin nx, x) = (1/π) ∫_{−π}^{π} x sin nx dx = −(1/(nπ)) [x cos nx]_{−π}^{π} = (2/n)(−1)^{n+1}
Thus
F(f)(x) = ∑_{n=1}^{∞} (2/n)(−1)^{n+1} sin nx = 2 sin x − sin 2x + (2/3) sin 3x − · · ·
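A numerical sketch of this projection (computing the coefficients (sin nx, f) with scipy.integrate.quad and evaluating a partial sum; names are my own):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x                          # the function being expanded on [-pi, pi]

def b(n):
    """Coefficient (sin nx, f) in the scaled L2 inner product (1/pi) * integral."""
    return quad(lambda x: np.sin(n * x) * f(x), -np.pi, np.pi)[0] / np.pi

print([round(b(n), 4) for n in range(1, 5)])     # 2, -1, 2/3, -1/2

def partial_sum(x, N=25):
    return sum(b(n) * np.sin(n * x) for n in range(1, N + 1))

print(partial_sum(1.0), f(1.0))          # the partial sum approximates f away from the jump at +/- pi
```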
-
5. Orthogonality 5.5. Orthonormal sets
Example
Similarly the Fourier series of f(x) = x2 on [−π, π] is
F(f)(x) = π2/3 + 4 ∑_{n=1}^{∞} ( (−1)^n / n2 ) cos nx
        = π2/3 − 4 cos x + cos 2x − (4/9) cos 3x + · · ·
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
5.6 Gram–Schmidt Orthogonalisation
Orthogonal [13] bases are useful: how to find them?
Answer: Use projections
Example
Let {x1, x2} be a basis of R2
Linear independence =⇒ x2 ≠ πx1(x2)
Moreover
x2 − πx1(x2) ⊥ x1
∴ {x1, x2 − πx1(x2)} is an orthogonal basis of R2
We have orthogonalized the basis {x1, x2}
[Figure: x1, x2, πx1(x2), and x2 − πx1(x2)]
The Gram–Schmidt algorithm does this in general, in any inner product space
[13] And orthonormal
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Theorem 5.6.1 (Gram–Schmidt)
Let {x1, . . . , xn} be a basis of an inner product space (V, ( , ))
Define vectors vi recursively by v1 = x1 and
vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1)
Then {v1, . . . , vn} is an orthogonal basis of V
If desired, can easily form an orthonormal basis
{u1, . . . , un} = { v1/||v1||, . . . , vn/||vn|| }
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Proof.
Fix k < n and suppose that {v1, . . . , vk} is an orthogonal basis of Span(x1, . . . , xk)
Observe:
vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1) ∈ Span(x1, . . . , xk+1)
If i ≤ k, then (vk+1, vi) = 0, since (πvj(xk+1), vi) = 0 for j ≠ i and (xk+1 − πvi(xk+1), vi) = 0
Hence {v1, . . . , vk+1} is an orthogonal (hence linearly independent) spanning set of Span(x1, . . . , xk+1)
I.e. {v1, . . . , vk+1} is an orthogonal basis of Span(x1, . . . , xk+1)
The result follows by induction
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Example
Orthonormalize the basis { (1, 0, 1)T, (3, 0, 1)T, (0, 2, 1)T } of R3
Label the vectors in order x1, x2, x3 and apply Gram–Schmidt:
v1 = x1 = (1, 0, 1)T
v2 = x2 − πv1(x2) = (3, 0, 1)T − ( ((1, 0, 1)T, (3, 0, 1)T) / ||(1, 0, 1)T||2 ) (1, 0, 1)T = (1, 0, −1)T
v3 = x3 − πv1(x3) − πv2(x3) = (0, 2, 1)T − (1/2)(1, 0, 1)T − (−1/2)(1, 0, −1)T = (0, 2, 0)T
{v1, v2, v3} is an orthogonal basis, whence
{ (1/√2)(1, 0, 1)T, (1/√2)(1, 0, −1)T, (0, 1, 0)T }
is an orthonormal basis
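A small NumPy implementation of the algorithm (a sketch of classical Gram–Schmidt as in Theorem 5.6.1, applied to this example; function name is my own):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent)."""
    V = []
    for x in X.T:
        v = x - sum((vi @ x) * vi for vi in V)   # subtract projections onto earlier (unit) v_i
        V.append(v / np.linalg.norm(v))
    return np.column_stack(V)

X = np.array([[1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])         # columns x1, x2, x3 from the example
Q = gram_schmidt(X)
print(Q)                                # columns (1,0,1)/sqrt(2), (1,0,-1)/sqrt(2), (0,1,0)
print(np.round(Q.T @ Q, 10))            # identity matrix: the columns are orthonormal
```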
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Gram–Schmidt is an algorithm which depends on the order of the inputs x1, . . . , xn
Example
The Gram–Schmidt orthonormalization of { (0, 2, 1)T, (1, 0, 1)T, (3, 0, 1)T } is
{ (1/√5)(0, 2, 1)T, (1/(3√5))(5, −2, 4)T, (1/3)(2, 1, −2)T }
completely different from the previous example
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Example
Find an orthonormal basis of Span(1, x, x2) in (C[−1, 1], L2)
v1 = 1
v2 = x − ( (1, x) / ||1||2 ) · 1 = x − ( ∫_{−1}^{1} x dx / ∫_{−1}^{1} 12 dx ) = x
v3 = x2 − ( (1, x2) / ||1||2 ) · 1 − ( (x, x2) / ||x||2 ) · x
   = x2 − ( ∫_{−1}^{1} x2 dx / ∫_{−1}^{1} 12 dx ) − ( ∫_{−1}^{1} x3 dx / ∫_{−1}^{1} x2 dx ) x = x2 − 1/3
To normalize, divide through by norms:
{ 1/√2, √(3/2) x, √(45/8) (x2 − 1/3) }
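A numerical sanity check of this orthonormal set (a sketch with scipy.integrate.quad on [−1, 1]; the helper name ip is my own):

```python
import numpy as np
from scipy.integrate import quad

# L2 inner product on C[-1, 1]
ip = lambda f, g: quad(lambda x: f(x) * g(x), -1.0, 1.0)[0]

u = [lambda x: 1 / np.sqrt(2),
     lambda x: np.sqrt(3 / 2) * x,
     lambda x: np.sqrt(45 / 8) * (x**2 - 1 / 3)]

# Gram matrix of pairwise inner products: should be (numerically) the 3x3 identity
print(np.round([[ip(f, g) for g in u] for f in u], 10))
```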