TRANSCRIPT
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
5.1 The Scalar Product in Rn
Thus far we have restricted ourselves to vector spaces and the operations of addition and scalar multiplication: how else might we combine vectors?
You should know about the scalar (dot) and cross products of vectors in R3
The scalar product extends nicely to other vector spaces, while the cross product is another story [1]
The basic purpose of scalar products is to define and analyze the lengths of and angles between vectors [2]
[1] But not for this class! You may see ‘wedge’ products in later classes. . .
[2] It is important to note that everything in this chapter, until mentioned otherwise, only applies to real vector spaces (where F = R)
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Euclidean Space
Definition 5.1.1
Suppose x, y ∈ Rn are written with respect to the standard basis {e1, . . . , en}
1. The scalar product of x, y is the real number [a]
   (x, y) := xTy = x1y1 + x2y2 + · · · + xnyn
2. x, y are orthogonal or perpendicular if (x, y) = 0
3. n-dimensional Euclidean Space Rn is the vector space of column vectors Rn×1 together with the scalar product
[a] Other notations include x · y and 〈x, y〉
Euclidean Space is more than just a collection of co-ordinate vectors: it implicitly comes with notions of angle and length [3]
Important Fact: (y, x) = yTx = (xTy)T = xTy = (x, y), so the scalar product is symmetric
[3] To be seen in R2 and R3 shortly
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Angles and Lengths
Definition 5.1.2
The length of a vector x ∈ Rn is its norm ||x|| = √(x, x)
The distance between two vectors x, y is given by ||y − x||
Theorem 5.1.3
The angle θ ∈ [0, π] between two vectors x, y in R2 or R3 satisfies the equation
(x, y) = ||x|| ||y|| cos θ
[Figure: vectors x = (x1, x2), y = (y1, y2) and y − x forming a triangle, with angle θ between x and y]
Definition 5.1.4
We define the angle θ between x, y ∈ Rn to be the number
θ = cos−1( (x, y) / (||x|| ||y||) )
θ is the smaller of the two possible angles, since cos−1 has range [0, π]
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Proof of Theorem.
If x, y are parallel then θ = 0 or π and the Theorem is trivial
Otherwise, in R2 (or in the plane Span(x, y) ≤ R3), the cosine rule holds:
||x − y||2 = ||x||2 + ||y||2 − 2 ||x|| ||y|| cos θ
Applying the definition of norm and scalar product, we obtain
2 ||x|| ||y|| cos θ = ||x||2 + ||y||2 − ||x − y||2
                   = xTx + yTy − (x − y)T(x − y)
                   = xTx + yTy − (xTx + yTy − xTy − yTx)
                   = xTy + yTx = 2(x, y)
as required
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Basic results & inequalities
Several results that you will have used without thinking in elementary geometry follow directly from the definitions
Theorem 5.1.5 (Cauchy–Schwarz inequality)
If x, y are vectors in Rn then
|(x, y)| ≤ ||x|| ||y||
with equality iff x, y are parallel
Proof.
|(x, y)| = | ||x|| ||y|| cos θ | = ||x|| ||y|| |cos θ| ≤ ||x|| ||y||
Equality is satisfied precisely when cos θ = ±1: that is when θ = 0, π, and so x, y are parallel
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Theorem 5.1.6 (Triangle inequality)
If x, y ∈ Rn then ||x + y|| ≤ ||x||+ ||y||
I.e. any side of a triangle is at most as long as the sum of the other two
Proof.
||x + y||2 = (x + y, x + y) = (x + y)T(x + y)
           = ||x||2 + 2(x, y) + ||y||2
           ≤ ||x||2 + 2 |(x, y)| + ||y||2
           ≤ ||x||2 + 2 ||x|| ||y|| + ||y||2
           = (||x|| + ||y||)2
[Figure: triangle with sides x, y and x + y]
If (x, y) = 0, the second line in the proof of the triangle inequality immediately yields
Theorem 5.1.7 (Pythagoras’)
If x, y ∈ Rn are orthogonal then ||x + y||2 = ||x||2 + ||y||2
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Example
Let x = (1, 2, −1)T and y = (−3, 1, 3)T, then
||x|| = √(x, x) = √(1 + 4 + 1) = √6
||y|| = √(9 + 1 + 9) = √19
(x, y) = −3 + 2 − 3 = −4
θ = cos−1( (x, y) / (||x|| ||y||) ) = cos−1( −4 / (√6 √19) ) ≈ 1.955 rad ≈ 112◦
||y − x|| = ||(−4, −1, 4)T|| = √33 ≤ √6 + √19
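A quick numerical check of this worked example (a sketch using NumPy; the variable names are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([-3.0, 1.0, 3.0])

norm_x = np.linalg.norm(x)              # sqrt(6)
norm_y = np.linalg.norm(y)              # sqrt(19)
dot = x @ y                             # -4

theta = np.arccos(dot / (norm_x * norm_y))
print(theta, np.degrees(theta))         # ~1.955 rad, ~112 degrees
print(np.linalg.norm(y - x))            # sqrt(33) <= sqrt(6) + sqrt(19)
```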
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Projections
Scalar products are useful for calculating how much of onevector points in the direction of another
Definition 5.1.8
The unit vector in the direction of v ∈ Rn is the vector (1/||v||) v
The scalar projection of x onto v ≠ 0 in Rn is the scalar product
αv(x) = ( (1/||v||) v, x ) = (v, x) / ||v||
The orthogonal (or vector) projection of x onto v ≠ 0 in Rn is
πv(x) = αv(x) (1/||v||) v = ( (v, x) / ||v||2 ) v
[Figure: x, v, and the orthogonal projection πv(x) of x onto v]
Note: αv(x) ≠ ||πv(x)||: if αv(x) < 0 then the projection of x onto v points in the opposite direction to v
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Orthogonal projection means several things:
1. πv ∈ L(Rn)   (πv is a linear map)
2. πv(Rn) = Span(v)   (Projection onto Span(v))
3. πv(v) = v   (Identity on Span(v))
4. ker πv = v⊥ = {y ∈ Rn : (y, v) = 0}   (Orthogonality)
1, 2, 3 say that πv is a projection
4 makes the projection orthogonal: anything orthogonal to v is mapped to zero
Similarly αv ∈ L(Rn, R)
[Figure: v, x, πv(x), and ker πv]
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The matrix of a projection
Since πv is a linear map, it has a standard matrix representation
Indeed
πv(x) = ( (v, x) / ||v||2 ) v = v ( (v, x) / ||v||2 ) = v ( vTx / ||v||2 ) = ( vvT / ||v||2 ) x
whence the matrix of πv is the n × n matrix vvT / ||v||2
Example
In R2, orthogonal projection onto v = (x, y)T has matrix
Av = ( 1 / (x2 + y2) ) (x, y)T (x  y) = ( 1 / (x2 + y2) ) [ x2  xy ; xy  y2 ]
Projection onto the x-axis is therefore Ai = [ 1 0 ; 0 0 ], while projection onto the line y = x is Ai+j = (1/2) [ 1 1 ; 1 1 ]
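A small NumPy sketch of this formula (the function name is my own): it builds the rank-one matrix vvT/||v||2 and reproduces the two cases above.

```python
import numpy as np

def projection_matrix(v):
    """Orthogonal projection onto Span(v): the n x n matrix v v^T / ||v||^2."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)   # column vector
    return (v @ v.T) / (v.T @ v).item()

print(projection_matrix([1.0, 0.0]))    # [[1, 0], [0, 0]]        projection onto the x-axis
print(projection_matrix([1.0, 1.0]))    # 0.5 * [[1, 1], [1, 1]]  projection onto the line y = x
```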
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
Planes in R3
Projections are useful for describing planes [4] in R3
Let P be the plane normal to n = (a, b, c)T ∈ R3 and which passes through the point with position vector x0 = (x0, y0, z0)T
The distance d of the plane from the origin is the scalar projection of any vector in the plane onto n: thus
d = αn(x0) = (x0, n) / ||n|| = (ax0 + by0 + cz0) / √(a2 + b2 + c2)
[4] And more general affine spaces in arbitrary dimension
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The plane P is the set of points whose scalar projection onto n is d: otherwise said
P = {x : αn(x) = d = αn(x0)}
However
αn(x) = αn(x0) ⇐⇒ αn(x − x0) = 0 ⇐⇒ (x − x0, n) = 0 ⇐⇒ a(x − x0) + b(y − y0) + c(z − z0) = 0
which is an alternative description of the plane
Example
If x0 = (7, 9, 3)T and n = (0, 3, 1)T, then P has equation
3(y − 9) + (z − 3) = 0, or 3y + z = 30
The distance to P is
d = αn(x0) = (x0, n) / ||n|| = 30 / √10
-
5. Orthogonality 5.1. The Scalar Product in Euclidean Space
The distance of a vector y from P is the scalar projection of y onto the normal n with d subtracted:
dist(y, P) = αn(y) − d = (n, y) / ||n|| − αn(x0) = (n, y − x0) / ||n||
Example
If y = (3, 2, 1)T, x0 = (7, 9, 3)T, and n = (0, 3, 1)T, then
dist(y, P) = −23 / √10
The negative sign means that y is ‘below’ P (on the opposite side of P to the direction of n)
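A numerical check of both plane examples (a NumPy sketch; variable names are my own):

```python
import numpy as np

n  = np.array([0.0, 3.0, 1.0])     # plane normal
x0 = np.array([7.0, 9.0, 3.0])     # point on the plane
y  = np.array([3.0, 2.0, 1.0])

d = (x0 @ n) / np.linalg.norm(n)              # distance of P from the origin: 30/sqrt(10)
dist_y = ((y - x0) @ n) / np.linalg.norm(n)   # signed distance of y from P: -23/sqrt(10)
print(d, dist_y)
```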
-
5. Orthogonality 5.2. Orthogonal subspaces
5.2 Orthogonal subspaces
Recall that x, y ∈ Rn are orthogonal if (x, y) = xTy = 0
Definition 5.2.1
Two subspaces U, V ≤ Rn are orthogonal, written U ⊥ V, iff
(u, v) = 0 for all u ∈ U, v ∈ V
The orthogonal complement to U in Rn is the subspace
U⊥ := {x ∈ Rn : (x, u) = 0, ∀u ∈ U}
E.g. a plane and its normal line intersecting at the origin are orthogonal complements
-
5. Orthogonality 5.2. Orthogonal subspaces
The previous example suggests the following
Lemma 5.2.2
1. U⊥ as defined really is a subspace of Rn
2. U ∩ U⊥ = {0}
Proof.
1. (0, u) = 0 for all u ∈ U, hence 0 ∈ U⊥ and U⊥ is non-empty
Now let u ∈ U, x, y ∈ U⊥, and α, β ∈ R, then
(αx, u) = α(x, u) = 0 =⇒ αx ∈ U⊥
(x + y, u) = (x, u) + (y, u) = 0 =⇒ x + y ∈ U⊥
hence U⊥ is closed under scalar multiplication and addition and is therefore a subspace
2. Let x ∈ U ∩ U⊥, then (x, x) = 0, whence x = 0
-
5. Orthogonality 5.2. Orthogonal subspaces
Examples
1. Suppose U = Span(u1, u2) = Span( (1, 3, −2)T, (−1, 0, 1)T ) ≤ R3
U⊥ is spanned by all vectors orthogonal to u1 & u2
Multiples of the cross-product u1 × u2 are the only such vectors, whence
U⊥ = Span( (1, 3, −2)T × (−1, 0, 1)T ) = Span( (3, 1, 3)T )
In general the orthogonal complement U⊥ to a plane U ≤ R3 is spanned by the cross-product of any two spanning vectors in U: hence U⊥ is always a line
-
5. Orthogonality 5.2. Orthogonal subspaces
Examples
2. Suppose U = Span(u) = Span( (−2, 2, 5)T )
Then (x, u) = 0 ⇐⇒ uTx = 0 ⇐⇒ (−2 2 5) x = 0, whence we find the nullspace:
U⊥ = N(−2 2 5) = Span( (1, 1, 0)T, (5, 0, 2)T )
In general, the orthogonal complement to a line U ≤ R3 is the nullspace of a rank 1 matrix uT ∈ R1×3
uT has nullity 3 − 1 = 2 and so dim U⊥ = 2: hence U⊥ is always a plane
We will see shortly that orthogonal complements are naturally thought of as nullspaces of particular matrices
-
5. Orthogonality 5.2. Orthogonal subspaces
Non-degeneracy
The scalar product is said to be non-degenerate in the sense that
(x, y) = 0, ∀y ∈ Rn =⇒ x = 0
Alternatively said, the only vector which is orthogonal to everything is the zero-vector 0:
(Rn)⊥ = {0}
We can check this: if x is orthogonal to all y ∈ Rn, then
(x, ei) = 0 for every standard basis vector e1, . . . , en
But (x, ei) = xi =⇒ xi = 0 for all i and so x = 0
Similarly {0}⊥ = Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Orthogonality and matrices
For a general matrix A, we consider how N(A) and C(A) are related to orthogonality
First we need to see how matrix multiplication interacts with the scalar product
Lemma 5.2.3
If x ∈ Rn, y ∈ Rm, and A ∈ Rm×n, then
(Ax, y) = (x, ATy)
Proof.
(Ax, y) = (Ax)Ty = xTATy = (x, ATy)
Note that the scalar product on the left is of vectors in Rm, while the product on the right is of vectors in Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.4 (Fundamental subspaces)
If A ∈ Rm×n then [a]
N(A) = C(AT)⊥ and N(AT) = C(A)⊥
[a] Warning: the book uses the strange notation R(A) = Range(A) for the column space of A here, rather than our C(A)
Proof.
Using the definition we see that
C(AT)⊥ = {x ∈ Rn : (x, z) = 0, ∀z ∈ C(AT)}
       = {x ∈ Rn : (x, ATy) = 0, ∀y ∈ Rm}
       = {x ∈ Rn : (Ax, y) = 0, ∀y ∈ Rm}   (Lemma 5.2.3)
       = {x ∈ Rn : Ax = 0}   (Non-degeneracy)
       = N(A)
The second formula comes from replacing A ↔ AT
-
5. Orthogonality 5.2. Orthogonal subspaces
The Theorem tells us how to find the orthogonal complement to a general subspace U ≤ Rn:
1. Take a basis {u1, . . . , ur} of U
2. Build the rank r matrix A ∈ Rn×r with columns u1, . . . , ur
3. U = C(A) =⇒ U⊥ = N(AT)
Example
If U = Span( (1, 0, −1, 0)T, (5, −2, 0, 1)T ) ≤ R4, then
A = [ 1 5 ; 0 −2 ; −1 0 ; 0 1 ]  =⇒  AT = [ 1 0 −1 0 ; 5 −2 0 1 ]
from which we find U⊥ as the nullspace
U⊥ = N(AT) = Span( (0, 1, 0, 2)T, (1, 0, 1, −5)T )
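The same computation in SciPy (a sketch; null_space returns an orthonormal basis of N(AT), so its columns differ from the hand-picked spanning vectors above but span the same subspace):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1.0,  5.0],
              [ 0.0, -2.0],
              [-1.0,  0.0],
              [ 0.0,  1.0]])          # columns span U

U_perp = null_space(A.T)              # columns form a basis of U_perp = N(A^T)
print(U_perp)
print(np.round(A.T @ U_perp, 10))     # zero matrix: U_perp is orthogonal to U
```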
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.5
1. If S ≤ Rn then dim S + dim S⊥ = n
2. If B = {s1, . . . , sr} is a basis of S, then we may form a basis C = {sr+1, . . . , sn} of S⊥ such that B ∪ C is a basis of Rn
The Theorem clears up what we’ve already seen: e.g. the orthogonal complement to a line in R3 is always a plane, etc.
Proof.
Suppose S ≠ {0}, otherwise S⊥ = Rn and the Theorem is trivial
Otherwise let A = ( s1 | · · · | sr ) ∈ Rn×r be the matrix with columns s1, . . . , sr
Since B is a basis we have S = C(A), whence Theorem 5.2.4 yields
S⊥ = C(A)⊥ = N(AT)
The Rank–Nullity Theorem gives us 1 :
dim S⊥ = null AT = n− rank AT = n− r = n− dim S
-
5. Orthogonality 5.2. Orthogonal subspaces
Proof (cont).
Now choose a basis C = {sr+1, . . . , sn} of S⊥ and suppose that
(α1s1 + · · · + αrsr) + (αr+1sr+1 + · · · + αnsn) = 0
where the first bracketed sum is a vector s ∈ S and the second is a vector s⊥ ∈ S⊥
Lemma 5.2.2 =⇒ s = −s⊥ ∈ S ∩ S⊥ = {0} =⇒ s = s⊥ = 0
Since B, C are bases it follows that all αi = 0, whence s1, . . . , sn are linearly independent
Since dim Rn = n we necessarily have a basis of Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Direct Sums of Subspaces
Definition 5.2.6
Suppose that U, V are subspaces of W
Moreover suppose that each w ∈ W can be written uniquely as a sum
w = u + v
for some u ∈ U, v ∈ V
Then W is the direct sum of U and V and we write W = U ⊕ V
W = U ⊕ V is equivalent to both of the following holding simultaneously:
1. W = U + V; everything in W can be written as a combination u + v
2. U ∩ V = {0}; the linear combination is unique
-
5. Orthogonality 5.2. Orthogonal subspaces
Orthogonal complements are always direct sums
Theorem 5.2.7
If S is a subspace of Rn then S⊕ S⊥ = Rn
Proof.
We must prove S ∩ S⊥ = {0} and Rn = S + S⊥
The first is Lemma 5.2.2, part 2
For the second we use Theorem 5.2.5 and the homework:
dim(S + S⊥) = dim S + dim S⊥ − dim(S∩ S⊥) = n
from which S + S⊥ = Rn
Thus S⊕ S⊥ = Rn
-
5. Orthogonality 5.2. Orthogonal subspaces
Theorem 5.2.8
If S is a subspace of Rn then (S⊥)⊥ = S
Proof.
If s ∈ S then
(s, y) = 0 for all y ∈ S⊥
Thus s ∈ (S⊥)⊥ hence S ≤ (S⊥)⊥
Conversely, let z ∈ (S⊥)⊥
Since Rn = S ⊕ S⊥ there exist unique s ∈ S, s⊥ ∈ S⊥ such that
z = s + s⊥
Now take scalar products with s⊥:
0 = (z, s⊥) = (s, s⊥) + (s⊥, s⊥) = ||s⊥||2 =⇒ s⊥ = 0
Hence z = s ∈ S and we have (S⊥)⊥ ≤ S
Putting both halves together gives the Theorem
-
5. Orthogonality 5.2. Orthogonal subspaces
The Fundamental Subspaces Theorem has a bearing on whether linear systems have solutions
Corollary 5.2.9
Let A ∈ Rm×n and b ∈ Rm. Then exactly one of the following holds:
1. There is a vector x ∈ Rn such that Ax = b, or
2. There exists some y ∈ N(AT) ≤ Rm such that (y, b) ≠ 0
The corollary is illustrated for m = n = 3, and rank A = 2: a suitable, but unnecessary, choice satisfying 2 is y = πN(AT)(b)
Proof.
N(AT) = C(A)⊥ =⇒ Rm = C(A) ⊕ N(AT)
Write b = p + y according to the direct sum, then (b, y) = ||y||2
This is zero iff b ∈ C(A), iff Ax = b has a solution
-
5. Orthogonality 5.3. Least squares problems
5.3 Least squares problems
In applications, one often has more equations than unknowns and cannot find a solution to all of them simultaneously: what do we do?
Idea: find a combination of variables that comes as close as possible to solving all the equations
Many methods exist: they depend on the type of problem, the definition of ‘close as possible’, etc. [5]
We consider a method for approaching overdetermined linear systems, first championed by Gauss
[5] Take a Numerical Analysis class for more!
-
5. Orthogonality 5.3. Least squares problems
Suppose Ax = b is an overdetermined system: i.e.
A ∈ Rm×n with m > n (more rows than columns)
b ∈ Rm is given
x = (x1, . . . , xn)T ∈ Rn is the column vector of variables
The picture from Corollary 5.2.9 gives us an approach:
In general b ∉ C(A) and there is no solution
The closest we can get to a solution x would be to choose x̂ so that Ax̂ is as close as possible to b
Since Rm = C(A) ⊕ N(AT), we decompose b = p + y and instead solve Ax̂ = p
-
5. Orthogonality 5.3. Least squares problems
Least Squares?
Suppose Ax = b is our m × n overdetermined system
Any vector x ∈ Rn creates a residual r(x) = Ax − b ∈ Rm: either
1. We can solve Ax = b and thus make r(x) = 0, or
2. We want to minimize the residual; equivalent to minimizing the length ||r(x)||
Definition 5.3.1
If x̂ ∈ Rn is such that ||Ax̂ − b|| ≤ ||Ax − b|| for all x ∈ Rn then we say that x̂ is a least squares solution to the system Ax = b
Minimizing ||r(x)|| is equivalent to minimizing ||r(x)||2, a sum of squares: no square-roots!
In general there will be many least squares solutions to a given system: if x̂ is such, then x̂ + n is another for any n ∈ N(A)
-
5. Orthogonality 5.3. Least squares problems
Theorem 5.3.2
Let S ≤ Rm and b ∈ Rm, then:
1. There exists a unique p ∈ S which is closest to b
2. p ∈ S is closest to b iff p − b ∈ S⊥
Proof.
Since Rm = S ⊕ S⊥ we may write b = p + s⊥ for some p ∈ S and s⊥ ∈ S⊥. Let s ∈ S, then
||b − s||2 = ||b − p + p − s||2
           = ||b − p||2 + ||p − s||2   (Pythagoras’)
           ≥ ||b − p||2
with equality iff p = s
The closest point in S to b is therefore the orthogonal projection of b onto S
-
5. Orthogonality 5.3. Least squares problems
By Theorem 5.3.2, it follows that x̂ is a least squares solution to Ax = b iff Ax̂ = p = πC(A)(b)
We don’t yet have a formula for calculating the orthogonal projection πS for a general subspace S, but we can calculate when S is 1-dimensional
Example
Find the vector p ∈ S = Span( (1, 3, 2)T ) which is closest to b = (−1, 0, 1)T
We want the projection onto S = Span(s):
p = πS(b) = ( (s, b) / ||s||2 ) s = (1/14) (1, 3, 2)T
-
5. Orthogonality 5.3. Least squares problems
Unique Least Squares Solutions
We address the simplest situation of least squares solutions x̂ to Ax = b: when the solution x̂ is unique
Theorem 5.3.3
If A ∈ Rm×n has rank A = n, then the equations ATAx = ATb
have a unique solution
x̂ = (ATA)−1ATb
which is the unique least squares solution to the system Ax = b
Proof.
We must prove three things:
1. ATA is invertible
2. x̂ = (ATA)−1ATb is a least squares solution to Ax = b
3. x̂ is the only least squares solution
-
5. Orthogonality 5.3. Least squares problems
Proof (cont).
1. Suppose that z ∈ Rn solves ATAz = 0
Then Az ∈ N(AT) = C(A)⊥   (Fundamental Subspaces)
But Az ∈ C(A), whence Az ∈ C(A) ∩ C(A)⊥ = {0} =⇒ Az = 0
To finish, null A = n − rank A = 0, from which Az = 0 has only the solution z = 0
Hence ATAz = 0 =⇒ z = 0, whence ATA is invertible
2. x̂ = (ATA)−1ATb certainly solves ATAx = ATb
However, for any y ∈ Rn,
(Ax̂ − b, Ay) = (AT(Ax̂ − b), y) = (ATA(ATA)−1ATb − ATb, y) = 0
hence Ax̂ − b ∈ C(A)⊥, and x̂ is therefore a least squares solution to Ax = b
-
5. Orthogonality 5.3. Least squares problems
Proof (cont).
3. Now suppose that ŷ is another least squares solution
Then A(ŷ − x̂) = (Aŷ − b) − (Ax̂ − b), where the left side lies in C(A) and the right side lies in C(A)⊥ (both Aŷ − b and Ax̂ − b do, by Theorem 5.3.2)
Since C(A) ∩ C(A)⊥ = {0} we have A(ŷ − x̂) = 0
Since rank A = n we necessarily have ŷ − x̂ = 0 and so the least squares solution is unique
Note how often the fact that rank A = n is required: the Theorem is false without it! Example to come. . .
-
5. Orthogonality 5.3. Least squares problems
General Orthogonal Projections (non-examinable)
Corollary 5.3.4
Suppose S ≤ Rm is a subspace with dim S = n
Let A ∈ Rm×n be any matrix [a] with C(A) = S
Then
πS = A(ATA)−1AT
is the orthogonal projection onto S
[a] Necessarily the columns of A form a basis of S
It is easy to see that if A = v is a column vector, then we recover the original definition of orthogonal projection onto a vector
πv = v(vTv)−1vT = (1/||v||2) vvT
-
5. Orthogonality 5.3. Least squares problems
Example
Find the unique least-squares solution to the system of equations
x1 + 2x2 = 0
3x1 + 3x2 = 1
x2 = 4
We have Ax = b where A = [ 1 2 ; 3 3 ; 0 1 ] and b = (0, 1, 4)T
Since rank A = 2, the Theorem says that the unique solution is
x̂ = (ATA)−1ATb = ( [ 1 3 0 ; 2 3 1 ] [ 1 2 ; 3 3 ; 0 1 ] )−1 [ 1 3 0 ; 2 3 1 ] (0, 1, 4)T
  = [ 10 11 ; 11 14 ]−1 (3, 7)T = (1/19) [ 14 −11 ; −11 10 ] (3, 7)T = (1/19) (−35, 37)T
x̂ ∈ R2 is closest to a solution to Ax = b in the sense that Ax̂ ∈ R3 is as close as possible to b: we are minimizing distance in R3, not in R2
Should check using multivariable calc that f(x1, x2) = (x1 + 2x2)2 + (3x1 + 3x2 − 1)2 + (x2 − 4)2 has an absolute minimum at (x1, x2) = (−35/19, 37/19)
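A NumPy check of this example (a sketch; np.linalg.lstsq solves the same least-squares problem directly):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [0.0, 1.0]])
b = np.array([0.0, 1.0, 4.0])

# Via the normal equations A^T A x = A^T b (valid here since rank A = 2)
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                 # [-35/19, 37/19] ~ [-1.842, 1.947]

# Same answer from the built-in least-squares solver
print(np.linalg.lstsq(A, b, rcond=None)[0])
```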
-
5. Orthogonality 5.3. Least squares problems
Example
Find all the least-squares solutions x̂ when A = [ 3 −6 ; 1 −2 ; −1 2 ] and b = (−2, 1, 4)T
rank A = 1 < 2 and so ATA = [ 11 −22 ; −22 44 ] is non-invertible and we are obliged to solve ATAx̂ = ATb directly
This reads
[ 11 −22 ; −22 44 ] x̂ = (−9, 18)T
whence x̂ = (−9/11, 0)T + λ (2, 1)T, where λ is any scalar
There is a one-parameter set of least-squares solutions
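For rank-deficient problems like this, NumPy's lstsq still returns one particular least-squares solution (the minimum-norm one); the full one-parameter family is that solution plus anything in N(A). A sketch:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 3.0, -6.0],
              [ 1.0, -2.0],
              [-1.0,  2.0]])
b = np.array([-2.0, 1.0, 4.0])

x_min, *_ = np.linalg.lstsq(A, b, rcond=None)   # one least-squares solution (minimum norm)
print(x_min)
print(null_space(A))    # direction proportional to (2, 1): add any multiple to x_min
```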
-
5. Orthogonality 5.3. Least squares problems
Best-fitting curves in Statistics
Least-squares solutions are often used in statistics when one wants to find a best fitting polynomial to a set of data points
Example
Find the equation of the line y = α0 + α1t which minimizes the sum of the squares of the vertical distances to the data points (1, 3), (2, 6), and (3, 7)
Observe how different choices of line affect the sum of the squared distances d1² + d2² + d3²
-
5. Orthogonality 5.3. Least squares problems
Example (cont)
The sum of the squared errors, as a function of α0, α1, is
(y(1) − 3)2 + (y(2) − 6)2 + (y(3) − 7)2 = || (α0 + α1, α0 + 2α1, α0 + 3α1)T − (3, 6, 7)T ||2
  = || [ 1 1 ; 1 2 ; 1 3 ] (α0, α1)T − (3, 6, 7)T ||2 = ||Aα − b||2
Therefore (α0, α1)T is the least-squares solution
(α0, α1)T = (ATA)−1ATb = [ 3 6 ; 6 14 ]−1 [ 1 1 1 ; 1 2 3 ] (3, 6, 7)T
  = (1/6) [ 14 −6 ; −6 3 ] (16, 36)T = (4/3, 2)T
We therefore get the line y = 4/3 + 2t
This is the “best-fitting least-squares” line to the data
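The same fit in NumPy (a sketch; np.polyfit performs the identical least-squares fit and returns coefficients highest power first):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 6.0, 7.0])

A = np.column_stack([np.ones_like(t), t])       # design matrix with columns [1, t]
alpha = np.linalg.solve(A.T @ A, A.T @ b)
print(alpha)                                    # [4/3, 2]: intercept, slope

print(np.polyfit(t, b, 1))                      # [2, 4/3]: slope, intercept
```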
-
5. Orthogonality 5.3. Least squares problems
Best-fitting least-squares polynomials
Suppose {(ti, bi) : i = 0, . . . , n} is a set of data points where the ti are distinct [6]
Question: If t is given, what do we expect b to be?
We look for a polynomial p(t) of degree k < n which minimizes the squares of the errors in the dependent variable b
p(t) is then a prediction of the value b if t is given
Example
Try plugging in the data “1 1; 2 2; 3 1; 4 3; 5 7; 6 2; 7 3;” to the applet for degrees 1–5
[6] The ti are often time-values and the bi the values of some output at time ti
http://www.shodor.org/chemviz/tools/regressionjava/index.html
-
5. Orthogonality 5.3. Least squares problems
Let p(t) = α0 + α1t + · · · + αk t^k be a polynomial of degree k < n
The predictive error [7] at t = ti is the distance |p(ti) − bi|
Choose coefficients α0, . . . , αk to minimize the sum of the squared errors
∑_{i=1}^{n} (p(ti) − bi)2
Sum squares of errors for three reasons:
1. Positive and negative errors are treated the same (both positive)
2. Large errors are penalized much more than small ones
3. The calculations are much easier than other methods!
[7] If k = n then there is a unique polynomial through the n + 1 data points, so we have a formula b = f(t) and thus no predictive error for any ti
-
5. Orthogonality 5.3. Least squares problems
Since we have the coefficients for p(t), we can write
( p(t1), . . . , p(tn) )T = [ 1 t1 t1² · · · t1^k ; 1 t2 t2² · · · t2^k ; . . . ; 1 tn tn² · · · tn^k ] (α0, α1, . . . , αk)T =: Pa
defining the matrix P ∈ Rn×(k+1). Setting b = (b1, . . . , bn)T, we are trying to minimize
∑_{i=1}^{n} (p(ti) − bi)2 = ||Pa − b||2
This is a least squares problem
Moreover, rank P = k + 1 is maximal iff the ti are distinct. The unique least squares solution is therefore
â = (PTP)−1PTb,
which returns us the coefficients α0, . . . , αk of the best-fitting least-squares polynomial of degree ≤ k
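A sketch of this recipe in NumPy (function name is my own; np.vander with increasing=True builds exactly the matrix P above):

```python
import numpy as np

def fit_poly(t, b, k):
    """Best-fitting least-squares polynomial of degree <= k: coefficients alpha_0, ..., alpha_k."""
    P = np.vander(t, k + 1, increasing=True)     # rows [1, t_i, t_i^2, ..., t_i^k]
    return np.linalg.solve(P.T @ P, P.T @ b)     # a_hat = (P^T P)^{-1} P^T b

t = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 1.0, 3.0])
print(fit_poly(t, b, 1))    # [0.5, 0.5]          the best line of the next example
print(fit_poly(t, b, 2))    # [1.75, -0.75, 0.25] the best quadratic of the next example
```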
-
5. Orthogonality 5.3. Least squares problems
Example
Find the best-fitting line and quadratic to the data
ti : 1 2 3 4
bi : 1 2 1 3
For the straight line we have P = [ 1 1 ; 1 2 ; 1 3 ; 1 4 ] and b = (1, 2, 1, 3)T, thus
â = (PTP)−1PTb = [ 4 10 ; 10 30 ]−1 (7, 20)T = (1/2) (1, 1)T
hence p(t) = (1/2)(1 + t) is the best-fitting straight line
[Figure: the four data points and the line y = 0.5t + 0.5, for which ∑ di² = 1.5]
-
5. Orthogonality 5.3. Least squares problems
Example (cont)
For the data ti : 1 2 3 4, bi : 1 2 1 3, the best-fitting quadratic requires
P = [ 1 1 1 ; 1 2 4 ; 1 3 9 ; 1 4 16 ], thus
â = (PTP)−1PTb = [ 4 10 30 ; 10 30 100 ; 30 100 354 ]−1 (7, 20, 66)T = (1/4) (7, −3, 1)T
The best-fitting quadratic polynomial is therefore p(t) = (1/4)(7 − 3t + t²)
Note that p fits the data better than the straight line, otherwise the best-fitting quadratic would be a straight line!
[Figure: the four data points and the quadratic y = 0.25t² − 0.75t + 1.75, for which ∑ di² = 1.25]
-
5. Orthogonality 5.4. Inner product spaces
5.4 Inner Product Spaces
Inner products generalize the scalar product on Rn
Definition 5.4.1
An inner product ( , ) on a real vector space V is a function ( , ) : V × V → R which satisfies the following axioms:
I  (x, x) ≥ 0, ∀x ∈ V, with equality iff x = 0
II (x, y) = (y, x), ∀x, y ∈ V
III (αx + βy, z) = α(x, z) + β(y, z), ∀x, y, z ∈ V, ∀α, β ∈ R
(V, ( , )) is an inner product space
( , ) is also called a positive definite (I), symmetric (II), (bi)linear [8] (III) form
III says that each map Lz : V → R defined by Lz(x) = (x, z) is linear [9]
[8] Linear in both arguments
[9] When dim V < ∞ it is a fact (beyond this course) that all linear maps V → R are of the form Lz for some z ∈ V
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on Rn
If w1, . . . , wn > 0, then
(x, y) := ∑_{i=1}^{n} wi xi yi
is an inner product [10]: the wi are called weights
Indeed if A ∈ Rn×n is any symmetric (AT = A), positive-definite (xTAx > 0, ∀x ≠ 0) matrix, then
(x, y) := xTAy
is an inner product [11] on Rn
Two examples on R3 are
(x, y) = xT [ 1 0 0 ; 0 3 0 ; 0 0 4 ] y    and    (x, y) = xT [ 3 0 0 ; 0 1 1 ; 0 1 2 ] y
[10] If w1 = w2 = · · · = wn = 1 we get the standard scalar product
[11] Check each of I, II, III
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on Pn
The standard basis {1, x, x2, . . . , xn−1} identifies Pn with Rn and we can use any of the inner products on the previous slide, e.g.
(a1 + b1x + c1x2, a2 + b2x + c2x2) = a1a2 + b1b2 + c1c2 in P3
Alternatively, let x1, . . . , xn be distinct real numbers and define
(p, q) := ∑_{i=1}^{n} p(xi)q(xi)
Conditions II and III clearly hold, but I needs a little work:
(p, p) = ∑_{i=1}^{n} p(xi)2 = 0 ⇐⇒ p(xi) = 0, ∀i = 1, . . . , n
This says that p(x) has at least n distinct roots
However a polynomial of degree ≤ n − 1 has at most n − 1 roots, unless it is identically zero: hence I holds
Can also have weights: if w(x) is a positive function then
(p, q) := ∑_{i=1}^{n} w(xi)p(xi)q(xi) is an inner product
-
5. Orthogonality 5.4. Inner product spaces
Inner Products on C[a, b]
Undoubtedly the most important example for future courses is the L2 inner product on C[a, b]
(f, g) := ∫_a^b f(x)g(x) dx
We check
I  (f, f) = ∫_a^b f(x)2 dx ≥ 0 with equality iff f(x) ≡ 0 (since f is continuous)
II (f, g) = ∫_a^b f(x)g(x) dx = ∫_a^b g(x)f(x) dx = (g, f)
III (αf + βg, h) = ∫_a^b (αf(x) + βg(x))h(x) dx = α ∫_a^b f(x)h(x) dx + β ∫_a^b g(x)h(x) dx = α(f, h) + β(g, h)
Can similarly define a weighted inner product
(f, g) = ∫_a^b w(x)f(x)g(x) dx
where w(x) is any positive function
-
5. Orthogonality 5.4. Inner product spaces
Basic Properties
Definition 5.4.2
If (V, ( , )) is an inner product space then the norm or length of a vector v ∈ V is
||v|| := √(v, v)
v, w ∈ V are orthogonal iff (v, w) = 0
Observe: ||v|| = 0 ⇐⇒ (v, v) = 0 ⇐⇒ v = 0, by property I
Theorem 5.4.3 (Pythagoras’)
If v, w are orthogonal then ||v + w||2 = ||v||2 + ||w||2
The proof is identical to that given in Rn
-
5. Orthogonality 5.4. Inner product spaces
Example
Find the norms and inner products of the three vectors {1, sin x, cos x} with respect to the L2 inner product on C[0, 2π]
||1|| = √( ∫_0^{2π} 12 dx ) = √(2π)
||sin x|| = √( ∫_0^{2π} sin2 x dx ) = √( ∫_0^{2π} (1/2)(1 − cos 2x) dx ) = √π
||cos x|| = √π
(1, sin x) = ∫_0^{2π} 1 · sin x dx = 0
(1, cos x) = 0
(sin x, cos x) = ∫_0^{2π} sin x cos x dx = 0
1, sin x, cos x are therefore orthogonal vectors in C[0, 2π]
Dividing by the norms we see that 1/√(2π), (1/√π) sin x, (1/√π) cos x are orthonormal vectors [a] in C[0, 2π]
[a] Important for Fourier Series
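A numerical check of these integrals (a sketch using scipy.integrate.quad; the helper name ip is my own):

```python
import numpy as np
from scipy.integrate import quad

# L2 inner product on C[0, 2*pi]
ip = lambda f, g: quad(lambda x: f(x) * g(x), 0.0, 2.0 * np.pi)[0]

one = lambda x: 1.0
print(np.sqrt(ip(one, one)))                                 # sqrt(2*pi)
print(np.sqrt(ip(np.sin, np.sin)))                           # sqrt(pi)
print(ip(one, np.sin), ip(one, np.cos), ip(np.sin, np.cos))  # all ~0
```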
-
5. Orthogonality 5.4. Inner product spaces
Orthogonal Projections
Can define orthogonal projections exactly as in Rn
Definition 5.4.4
If v ≠ 0 in an inner product space (V, ( , )) then the orthogonal projection of x ∈ V onto v is
πv(x) = ( (v, x) / ||v||2 ) v
In particular
(πv(x), x − πv(x)) = ( ((v, x)/||v||2) v , x − ((v, x)/||v||2) v )
                   = ((v, x)/||v||2) ( (v, x) − ((v, x)/||v||2) ||v||2 ) = 0
whence πv(x) and x − πv(x) are orthogonal
[Figure: v, x, πv(x), and the orthogonal component x − πv(x)]
-
5. Orthogonality 5.4. Inner product spaces
Example
Calculate the orthogonal projection of sin x onto Span{x} ≤ C[0, 2π] with the L2-inner product
πx(sin x) = ( (x, sin x) / ||x||2 ) x = ( ∫_0^{2π} x sin x dx / ∫_0^{2π} x2 dx ) x
          = ( ( −x cos x |_0^{2π} + ∫_0^{2π} cos x dx ) / ( (1/3) x3 |_0^{2π} ) ) x
          = ( −2π / ((8/3)π3) ) x = −( 3 / (4π2) ) x
[Figure: sin x and its projection −(3/(4π2)) x over [0, 2π]]
-
5. Orthogonality 5.4. Inner product spaces
Theorem 5.4.5 (Cauchy–Schwarz inequality)
|(v, w)| ≤ ||v|| ||w||, with equality iff v, w are parallel
Can’t rely on the cosine rule like in Rn, as we currently have no notion of angle
Proof.
Suppose v ≠ 0, otherwise the Theorem is trivial
πv(w) and w − πv(w) are orthogonal, so by Pythagoras’
||w||2 = ||πv(w)||2 + ||w − πv(w)||2 ≥ ||πv(w)||2 = (v, w)2 / ||v||2
Rearranging gives the Theorem: equality holds iff w = πv(w), and so iff v, w are parallel
-
5. Orthogonality 5.4. Inner product spaces
Angles in Inner Product Spaces
Cauchy–Schwarz allows us to define the notion of angle
Definition 5.4.6
The angle θ between two non-zero vectors v, w in an inner product space is given by
cos θ = (v, w) / (||v|| ||w||)
Can now check that the Cosine rule holds:
||v − w||2 = ||v||2 + ||w||2 − 2 ||v|| ||w|| cos θ
and, more painfully, that the Sine rule holds also!
-
5. Orthogonality 5.4. Inner product spaces
Norms
Definition 5.4.7
A norm on a real vector space V is a function || || : V → R which satisfies the following axioms:
I  ||v|| ≥ 0, ∀v ∈ V, with equality iff v = 0
II ||αv|| = |α| ||v||, ∀α ∈ R, ∀v ∈ V
III ||v + w|| ≤ ||v|| + ||w||, ∀v, w ∈ V
We call (V, || ||) a normed linear space
Condition III is the triangle inequality: the length of one side of a triangle is at most the sum of the lengths of the other two sides
-
5. Orthogonality 5.4. Inner product spaces
Theorem 5.4.8
If (V, ( , )) is an inner product space, then ||v|| = √(v, v) is a norm
Proof.
I is the identical condition for an inner product
For II, ||αv|| = √(αv, αv) = √(α2(v, v)) = |α| ||v||
For III we need the Cauchy–Schwarz inequality:
||v + w||2 = ||v||2 + 2(v, w) + ||w||2
≤ ||v||2 + 2 ||v|| ||w||+ ||w||2
= (||v||+ ||w||)2
-
5. Orthogonality 5.4. Inner product spaces
The p-norms
These generalize the standard norm on Rn
Definition 5.4.9
Given p ≥ 1, the p-norm on Rn is the norm
||x||p := ( ∑_{i=1}^{n} |xi|^p )^{1/p}
The uniform or ∞-norm on Rn is the norm
||x||∞ := max_{i=1,...,n} |xi|
The 2-norm is the usual notion of length in Rn
Only the 2-norm comes from an inner product on Rn: a normed linear space in general has no idea of what the angle between vectors means, only their lengths
-
5. Orthogonality 5.4. Inner product spaces
The three most common norms are the 1-, 2-, and ∞-norms
Example
If x = (1, 3, −1)T then
||x||1 = |1| + |3| + |−1| = 5
||x||2 = √(12 + 32 + (−1)2) = √11
||x||∞ = max{|1|, |3|, |−1|} = 3
Note that ||x||1 ≥ ||x||2 ≥ ||x||∞: this is true in general [a]
[a] See the homework. . .
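The same norms via NumPy (a sketch; the second argument of np.linalg.norm selects the p-norm):

```python
import numpy as np

x = np.array([1.0, 3.0, -1.0])
print(np.linalg.norm(x, 1))         # 5.0
print(np.linalg.norm(x, 2))         # sqrt(11)
print(np.linalg.norm(x, np.inf))    # 3.0
```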
-
5. Orthogonality 5.4. Inner product spaces
Lp norms on C[a, b] (non-examinable)
There are also analogues of the p-norms on function spaces
Definition 5.4.10
On C[a, b], the Lp-norm (p ≥ 1) is given by
||f||p := ( ∫_a^b |f(x)|^p dx )^{1/p}
The uniform or ∞-norm is defined by
||f||∞ := max_{x∈[a,b]} |f(x)|
Again only the L2 norm comes from an inner product, in this case the L2 inner product defined earlier
-
5. Orthogonality 5.5. Orthonormal sets
5.5 Orthonormal sets
Definition 5.5.1
v1, . . . , vn in an inner product space V are orthogonal iff (vi, vj) = 0, ∀i ≠ j
v1, . . . , vn are orthonormal iff
(vi, vj) = δij = { 1 if i = j, 0 if i ≠ j }
Can turn an orthogonal set into an orthonormal set by dividing by the norms:
{v1, . . . , vn} ↦ { v1/||v1||, . . . , vn/||vn|| }
Example
Recall 1/√(2π), (1/√π) sin x, (1/√π) cos x are orthonormal in (C[0, 2π], L2)
-
5. Orthogonality 5.5. Orthonormal sets
Theorem 5.5.2
An orthogonal set {v1, . . . , vn} of non-zero vectors is linearly independent
Proof.
Suppose that α1v1 + · · · + αnvn = 0
Then, for each i,
0 = (vi, 0) = (vi, α1v1 + · · · + αnvn) = α1(vi, v1) + · · · + αi(vi, vi) + · · · + αn(vi, vn) = αi ||vi||2
whence αi = 0, since vi ≠ 0. Since all αi = 0 we have linear independence
-
5. Orthogonality 5.5. Orthonormal sets
Calculating in orthonormal bases
Theorem 5.5.3
Let U = {u1, . . . , un} be an orthonormal basis of an inner product space (V, ( , )). Then:
1. v = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V: i.e. [v]U = ( (v, u1), . . . , (v, un) )T
2. ( ∑_{i=1}^{n} ai ui , ∑_{i=1}^{n} bi ui ) = ∑_{i=1}^{n} ai bi
3. || ∑_{i=1}^{n} ci ui ||2 = ∑_{i=1}^{n} ci²   (Parseval’s formula)
Everything [12] works as if you are in Rn with the basis e1, . . . , en!
[12] Essentially. . .
-
5. Orthogonality 5.5. Orthonormal sets
Proof.
Since {u1, . . . , un} is a basis there exist unique αi ∈ R such that
v = α1u1 + · · · + αnun
∴ (v, ui) = αi
which proves 1
2 and 3 are straightforward by linearity from 1
With careful caveats, the above formulæ are valid when dim V = ∞: which leads to the example. . .
-
5. Orthogonality 5.5. Orthonormal sets
Theorem 5.5.4
In C[−π, π] with the scaled L2 inner product (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx, the following infinite set is orthonormal:
{ 1/√2, sin x, cos x, sin 2x, cos 2x, . . . }
Proof.
Just compute integrals: use identities such as 2 sin nx sin mx = cos(n − m)x − cos(n + m)x, i.e.
(sin(nx), sin(mx)) = (1/π) ∫_{−π}^{π} sin(nx) sin(mx) dx
                   = (1/(2π)) ∫_{−π}^{π} cos(n − m)x − cos(n + m)x dx
                   = (1/(2π)) ∫_{−π}^{π} cos(n − m)x dx = δmn
-
5. Orthogonality 5.5. Orthonormal sets
Parseval’s formula makes some calculations extremely easy
Example
1/√2 and cos 2x are orthonormal with respect to the previous inner product, and
cos2 x = (1/2)(1 + cos 2x) = (1/√2) · (1/√2) + (1/2) cos 2x
Hence
(1/π) ∫_{−π}^{π} cos4 x dx = ||cos2 x||2 = (1/√2)2 + (1/2)2 = 3/4
∴ ∫_{−π}^{π} cos4 x dx = 3π/4
-
5. Orthogonality 5.5. Orthonormal sets
Least squares approximations
Orthogonal projections are least squares approximations
Theorem 5.5.5
Let u1, . . . , un be orthonormal in V and let S = Span(u1, . . . , un)
Then the orthogonal projection πS : V → S onto S is
πS(v) = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V
Proof.
πS is certainly linear (property III of the inner product)
Moreover, for each i,
(v − πS(v), ui) = (v, ui) − (v, ui) = 0 =⇒ v − πS(v) ∈ S⊥
∴ πS(v) + (v − πS(v)) is the unique decomposition of v into S, S⊥ parts
-
5. Orthogonality 5.5. Orthonormal sets
Corollary 5.5.6
πS(v) is the closest element of S to v
The proof is exactly the same as Theorem 5.3.2
Definition 5.5.7
Given v ∈ V we call πS(v) the least-squares approximation of v by S
Least squares approximations are often used to find approximations to complicated functions by simpler ones. . .
-
5. Orthogonality 5.5. Orthonormal sets
Example
1/√2 and √(3/2) x are orthonormal in (C[−1, 1], L2)
The least-squares approximation to f(x) = e^x by a linear polynomial on the interval [−1, 1] is therefore
e^x ≈ (e^x, 1/√2) · (1/√2) + (e^x, √(3/2) x) · √(3/2) x
    = (1/2) ∫_{−1}^{1} e^x dx + (3/2) ∫_{−1}^{1} x e^x dx · x
    = (1/2)(e − e^{−1}) + 3e^{−1} x
[Figure: e^x and its best-fitting line on [−1, 1]]
A different interval gives a different linear approximation. . .
-
5. Orthogonality 5.5. Orthonormal sets
Fourier Series
Least-squares mostly works for projection onto infinite sets
Recall: U = { 1/√2, sin x, cos x, sin 2x, cos 2x, . . . } is orthonormal with respect to (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx
Definition 5.5.8
Suppose f has period 2π. The Fourier Series of f is its orthogonal projection onto Span U
F(f)(x) = (1/√2, f(x)) (1/√2) + ∑_{n=1}^{∞} (sin nx, f(x)) sin nx + ∑_{n=1}^{∞} (cos nx, f(x)) cos nx
if the infinite sum converges [a]
[a] Beyond this course
-
5. Orthogonality 5.5. Orthonormal sets
Example
Let f(x) = x on [−π, π] extended periodically
Then (1/√2, x) = 0 = (cos nx, x) for all n, since x is odd
Moreover
(sin nx, x) = (1/π) ∫_{−π}^{π} x sin nx dx = −(1/(nπ)) [x cos nx]_{−π}^{π} = (2/n)(−1)^{n+1}
Thus
F(f)(x) = ∑_{n=1}^{∞} (2/n)(−1)^{n+1} sin nx = 2 sin x − sin 2x + (2/3) sin 3x − · · ·
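A numerical sketch of this projection (computing the coefficients (sin nx, f) with scipy.integrate.quad and evaluating a partial sum; names are my own):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x                          # the function being expanded on [-pi, pi]

def b(n):
    """Coefficient (sin nx, f) in the scaled L2 inner product (1/pi) * integral."""
    return quad(lambda x: np.sin(n * x) * f(x), -np.pi, np.pi)[0] / np.pi

print([round(b(n), 4) for n in range(1, 5)])     # 2, -1, 2/3, -1/2

def partial_sum(x, N=25):
    return sum(b(n) * np.sin(n * x) for n in range(1, N + 1))

print(partial_sum(1.0), f(1.0))          # the partial sum approximates f away from the jump at +/- pi
```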
-
5. Orthogonality 5.5. Orthonormal sets
Example
Similarly the Fourier series of f(x) = x2 on [−π, π] is
F(f)(x) = π2/3 + 4 ∑_{n=1}^{∞} ( (−1)^n / n2 ) cos nx
        = π2/3 − 4 cos x + cos 2x − (4/9) cos 3x + · · ·
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
5.6 Gram–Schmidt Orthogonalisation
Orthogonal [13] bases are useful: how to find them?
Answer: Use projections
Example
Let {x1, x2} be a basis of R2
Linear independence =⇒ x2 ≠ πx1(x2)
Moreover
x2 − πx1(x2) ⊥ x1
∴ {x1, x2 − πx1(x2)} is an orthogonal basis of R2
We have orthogonalized the basis {x1, x2}
[Figure: x1, x2, πx1(x2), and x2 − πx1(x2)]
The Gram–Schmidt algorithm does this in general, in any inner product space
[13] And orthonormal
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Theorem 5.6.1 (Gram–Schmidt)
Let {x1, . . . , xn} be a basis of an inner product space (V, ( , ))
Define vectors vi recursively by v1 = x1 and
vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1)
Then {v1, . . . , vn} is an orthogonal basis of V
If desired, can easily form an orthonormal basis
{u1, . . . , un} = { v1/||v1||, . . . , vn/||vn|| }
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Proof.
Fix k < n and suppose that {v1, . . . , vk} is an orthogonal basis of Span(x1, . . . , xk)
Observe:
vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1) ∈ Span(x1, . . . , xk+1)
If i ≤ k, then (vk+1, vi) = 0, since (πvj(xk+1), vi) = 0 for j ≠ i and (xk+1 − πvi(xk+1), vi) = 0
Hence {v1, . . . , vk+1} is an orthogonal (hence linearly independent) spanning set of Span(x1, . . . , xk+1)
I.e. {v1, . . . , vk+1} is an orthogonal basis of Span(x1, . . . , xk+1)
The result follows by induction
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Example
Orthonormalize the basis { (1, 0, 1)T, (3, 0, 1)T, (0, 2, 1)T } of R3
Label the vectors in order x1, x2, x3 and apply Gram–Schmidt:
v1 = x1 = (1, 0, 1)T
v2 = x2 − πv1(x2) = (3, 0, 1)T − ( ((1, 0, 1)T, (3, 0, 1)T) / ||(1, 0, 1)T||2 ) (1, 0, 1)T = (1, 0, −1)T
v3 = x3 − πv1(x3) − πv2(x3) = (0, 2, 1)T − (1/2)(1, 0, 1)T − (−1/2)(1, 0, −1)T = (0, 2, 0)T
{v1, v2, v3} is an orthogonal basis, whence
{ (1/√2)(1, 0, 1)T, (1/√2)(1, 0, −1)T, (0, 1, 0)T }
is an orthonormal basis
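A small NumPy implementation of the algorithm (a sketch of classical Gram–Schmidt as in Theorem 5.6.1, applied to this example; function name is my own):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent)."""
    V = []
    for x in X.T:
        v = x - sum((vi @ x) * vi for vi in V)   # subtract projections onto earlier (unit) v_i
        V.append(v / np.linalg.norm(v))
    return np.column_stack(V)

X = np.array([[1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])         # columns x1, x2, x3 from the example
Q = gram_schmidt(X)
print(Q)                                # columns (1,0,1)/sqrt(2), (1,0,-1)/sqrt(2), (0,1,0)
print(np.round(Q.T @ Q, 10))            # identity matrix: the columns are orthonormal
```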
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Gram–Schmidt is an algorithm which depends on the order of the inputs x1, . . . , xn
Example
The Gram–Schmidt orthonormalization of { (0, 2, 1)T, (1, 0, 1)T, (3, 0, 1)T } is
{ (1/√5)(0, 2, 1)T, (1/(3√5))(5, −2, 4)T, (1/3)(2, 1, −2)T }
completely different from the previous example
-
5. Orthogonality 5.6. Gram–Schmidt Orthogonalization
Example
Find an orthonormal basis of Span(1, x, x2) in (C[−1, 1], L2)
v1 = 1
v2 = x − ( (1, x) / ||1||2 ) · 1 = x − ( ∫_{−1}^{1} x dx / ∫_{−1}^{1} 12 dx ) = x
v3 = x2 − ( (1, x2) / ||1||2 ) · 1 − ( (x, x2) / ||x||2 ) · x
   = x2 − ( ∫_{−1}^{1} x2 dx / ∫_{−1}^{1} 12 dx ) − ( ∫_{−1}^{1} x3 dx / ∫_{−1}^{1} x2 dx ) x = x2 − 1/3
To normalize, divide through by norms:
{ 1/√2, √(3/2) x, √(45/8) (x2 − 1/3) }
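A numerical sanity check of this orthonormal set (a sketch with scipy.integrate.quad on [−1, 1]; the helper name ip is my own):

```python
import numpy as np
from scipy.integrate import quad

# L2 inner product on C[-1, 1]
ip = lambda f, g: quad(lambda x: f(x) * g(x), -1.0, 1.0)[0]

u = [lambda x: 1 / np.sqrt(2),
     lambda x: np.sqrt(3 / 2) * x,
     lambda x: np.sqrt(45 / 8) * (x**2 - 1 / 3)]

# Gram matrix of pairwise inner products: should be (numerically) the 3x3 identity
print(np.round([[ip(f, g) for g in u] for f in u], 10))
```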