
  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    5.1 The Scalar Product in Rn

    Thus far we have restricted ourselves to vector spaces and the operations of addition and scalar multiplication: how else might we combine vectors?

    You should know about the scalar (dot) and cross products of vectors in R3

    The scalar product extends nicely to other vector spaces, while the cross product is another story1

    The basic purpose of scalar products is to define and analyze the lengths of and angles between vectors2

    1But not for this class! You may see ‘wedge’ products in later classes. . .
    2Everything in this chapter, until stated otherwise, applies only to real vector spaces (where F = R)

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Euclidean Space

    Definition 5.1.1
    Suppose x, y ∈ Rn are written with respect to the standard basis {e1, . . . , en}

    1  The scalar product of x, y is the real number(a)

       (x, y) := xTy = x1y1 + x2y2 + · · · + xnyn

    2  x, y are orthogonal or perpendicular if (x, y) = 0

    3  n-dimensional Euclidean Space Rn is the vector space of column vectors Rn×1 together with the scalar product

    (a)Other notations include x · y and 〈x, y〉

    Euclidean Space is more than just a collection of co-ordinate vectors: it implicitly comes with notions of angle and length3

    Important Fact: (y, x) = yTx = (xTy)T = xTy = (x, y), so the scalar product is symmetric

    3To be seen in R2 and R3 shortly

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Angles and Lengths

    Definition 5.1.2

    The length of a vector x ∈ Rn is its norm ||x|| = √(x, x)

    The distance between two vectors x, y is given by ||y − x||

    Theorem 5.1.3
    The angle θ ∈ [0, π] between two vectors x, y in R2 or R3 satisfies the equation

    (x, y) = ||x|| ||y|| cos θ

    [Figure: vectors x = (x1, x2) and y = (y1, y2), the difference y − x, and the angle θ between x and y]

    Definition 5.1.4
    We define the angle θ between x, y ∈ Rn to be the number

    θ = cos⁻¹( (x, y) / (||x|| ||y||) )

    θ is the smaller of the two possible angles, since cos⁻¹ has range [0, π]

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Proof of Theorem.

    If x, y are parallel then θ = 0 or π and the Theorem is trivial

    Otherwise, in R2 (or in the plane Span(x, y) ≤ R3), the cosine rule holds:

    ||x − y||² = ||x||² + ||y||² − 2 ||x|| ||y|| cos θ

    Applying the definition of norm and scalar product, we obtain

    2 ||x|| ||y|| cos θ = ||x||² + ||y||² − ||x − y||²
                        = xTx + yTy − (x − y)T(x − y)
                        = xTx + yTy − (xTx + yTy − xTy − yTx)
                        = xTy + yTx = 2(x, y)

    as required

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Basic results & inequalities

    Several results that you will have used without thinking in elementary geometry follow directly from the definitions

    Theorem 5.1.5 (Cauchy–Schwarz inequality)

    If x, y are vectors in Rn then

    |(x, y)| ≤ ||x|| ||y||

    with equality iff x, y are parallel

    Proof.

    |(x, y)| = | ||x|| ||y|| cos θ | = ||x|| ||y|| |cos θ| ≤ ||x|| ||y||

    Equality is satisfied precisely when cos θ = ±1: that is, when θ = 0 or π, and so x, y are parallel

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Theorem 5.1.6 (Triangle inequality)

    If x, y ∈ Rn then ||x + y|| ≤ ||x|| + ||y||

    I.e. Any side of a triangle is shorter than the sum of the others

    Proof.

    ||x + y||² = (x + y, x + y) = (x + y)T(x + y)
               = ||x||² + 2(x, y) + ||y||²
               ≤ ||x||² + 2 |(x, y)| + ||y||²
               ≤ ||x||² + 2 ||x|| ||y|| + ||y||²
               = (||x|| + ||y||)²

    If (x, y) = 0, the second line in the proof of the triangle inequality immediately yields

    Theorem 5.1.7 (Pythagoras’)

    If x, y ∈ Rn are orthogonal then ||x + y||² = ||x||² + ||y||²

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Example

    Let x = (1, 2, −1)T and y = (−3, 1, 3)T, then

    ||x|| = √(x, x) = √(1 + 4 + 1) = √6

    ||y|| = √(9 + 1 + 9) = √19

    (x, y) = −3 + 2 − 3 = −4

    θ = cos⁻¹( (x, y) / (||x|| ||y||) ) = cos⁻¹( −4 / (√6 √19) ) ≈ 1.955 rad ≈ 112°

    ||y − x|| = ||(−4, −1, 4)T|| = √33 ≤ √6 + √19
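    The arithmetic above is easy to confirm numerically. A minimal sketch in Python with NumPy, using the vectors of this example:

        import numpy as np

        x = np.array([1.0, 2.0, -1.0])
        y = np.array([-3.0, 1.0, 3.0])

        dot = x @ y                                            # scalar product (x, y) = -4
        norm_x, norm_y = np.linalg.norm(x), np.linalg.norm(y)  # sqrt(6), sqrt(19)
        theta = np.arccos(dot / (norm_x * norm_y))             # about 1.955 rad
        print(np.degrees(theta))                               # about 112 degrees
        print(np.linalg.norm(y - x))                           # sqrt(33) <= sqrt(6) + sqrt(19)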

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Projections

    Scalar products are useful for calculating how much of one vector points in the direction of another

    Definition 5.1.8

    The unit vector in the direction of v ∈ Rn is the vector (1/||v||) v

    The scalar projection of x onto v ≠ 0 in Rn is the scalar product

    αv(x) = ( (1/||v||) v, x ) = (v, x) / ||v||

    The orthogonal (or vector) projection of x onto v ≠ 0 in Rn is

    πv(x) = αv(x) (1/||v||) v = ( (v, x) / ||v||² ) v

    [Figure: x, v and the projection πv(x) along v]

    Note: αv(x) ≠ ||πv(x)||: if αv(x) < 0 then the projection of x onto v points in the opposite direction to v
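    As a quick illustration, here is a minimal Python/NumPy sketch of both projections (the helper names scalar_projection and vector_projection are mine, not from the notes):

        import numpy as np

        def scalar_projection(x, v):
            """alpha_v(x) = (v, x) / ||v||, assuming v is non-zero."""
            return (v @ x) / np.linalg.norm(v)

        def vector_projection(x, v):
            """pi_v(x) = ((v, x) / ||v||^2) v, assuming v is non-zero."""
            return ((v @ x) / (v @ v)) * v

        v = np.array([3.0, 4.0])
        x = np.array([2.0, 1.0])
        print(scalar_projection(x, v))   # 10/5 = 2.0
        print(vector_projection(x, v))   # [1.2, 1.6], a multiple of v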

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Orthogonal projection means several things:

    1  πv ∈ L(Rn)                               (πv is a linear map)
    2  πv(Rn) = Span(v)                         (Projection onto Span(v))
    3  πv(v) = v                                (Identity on Span(v))
    4  ker πv = v⊥ = {y ∈ Rn : (y, v) = 0}      (Orthogonality)

    1, 2, 3 say that πv is a projection

    4 makes the projection orthogonal: anything orthogonal to v is mapped to zero

    Similarly αv ∈ L(Rn, R)

    [Figure: x, its projection πv(x) onto v, and ker πv orthogonal to v]

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    The matrix of a projection

    Since πv is a linear map, it has a standard matrix representation. Indeed

    πv(x) = ( (v, x) / ||v||² ) v = v (v, x) / ||v||² = v vTx / ||v||² = ( vvT / ||v||² ) x

    whence the matrix of πv is the n × n matrix vvT / ||v||²

    Example

    In R2, orthogonal projection onto v = (x, y)T has matrix

    Av = 1/(x² + y²) · ( x )( x  y ) = 1/(x² + y²) · ( x²  xy )
                       ( y )                         ( xy  y² )

    Projection onto the x-axis is therefore

    Ai = ( 1  0 )
         ( 0  0 )

    while projection onto the line y = x is

    Ai+j = 1/2 ( 1  1 )
               ( 1  1 )
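    The formula vvT/||v||² is easy to experiment with; a short Python/NumPy sketch (projection_matrix is a name of my choosing):

        import numpy as np

        def projection_matrix(v):
            """Standard matrix v v^T / ||v||^2 of orthogonal projection onto Span(v)."""
            v = np.asarray(v, dtype=float)
            return np.outer(v, v) / (v @ v)

        print(projection_matrix([1, 0]))   # [[1, 0], [0, 0]]          projection onto the x-axis
        print(projection_matrix([1, 1]))   # [[0.5, 0.5], [0.5, 0.5]]  projection onto y = x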

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    Planes in R3

    Projections are useful for describing planes4 in R3

    Let P be the plane normal to n = (a, b, c)T ∈ R3 and which passes through the point with position vector x0 = (x0, y0, z0)T

    The distance d of the plane from the origin is the scalar projection of any vector in the plane onto n: thus

    d = αn(x0) = (x0, n) / ||n|| = (ax0 + by0 + cz0) / √(a² + b² + c²)

    4And more general affine spaces in arbitrary dimension

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    The plane P is the set of points whose scalar projection onto n is d: otherwise said

    P = {x : αn(x) = d = αn(x0)}

    However

    αn(x) = αn(x0) ⇐⇒ αn(x − x0) = 0 ⇐⇒ (x − x0, n) = 0
                   ⇐⇒ a(x − x0) + b(y − y0) + c(z − z0) = 0

    which is an alternative description of the plane

    Example

    If x0 = (7, 9, 3)T and n = (0, 3, 1)T, then P has equation

    3(y − 9) + (z − 3) = 0, or 3y + z = 30

    The distance to P is

    d = αn(x0) = (x0, n) / ||n|| = 30 / √10

  • 5. Orthogonality 5.1. The Scalar Product in Euclidean Space

    The distance of a vector y from P is the scalar projection of y onto the normal n with d subtracted:

    dist(y, P) = αn(y) − d = (n, y) / ||n|| − αn(x0) = (n, y − x0) / ||n||

    Example

    If y = (3, 2, 1)T, x0 = (7, 9, 3)T, and n = (0, 3, 1)T, then

    dist(y, P) = −23 / √10

    The negative sign means that y is ‘below’ P (on the opposite side of P to the direction of n)
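    A numerical check of the last two examples, sketched in Python/NumPy:

        import numpy as np

        n  = np.array([0.0, 3.0, 1.0])    # normal to P
        x0 = np.array([7.0, 9.0, 3.0])    # a point on P
        y  = np.array([3.0, 2.0, 1.0])

        d = (x0 @ n) / np.linalg.norm(n)                 # 30/sqrt(10), distance of P from the origin
        dist_y = ((y - x0) @ n) / np.linalg.norm(n)      # -23/sqrt(10), negative: y lies 'below' P
        print(d, dist_y)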

  • 5. Orthogonality 5.2. Orthogonal subspaces

    5.2 Orthogonal subspaces

    Recall that x, y ∈ Rn are orthogonal if (x, y) = xTy = 0

    Definition 5.2.1
    Two subspaces U, V ≤ Rn are orthogonal, written U ⊥ V, iff

    (u, v) = 0 for all u ∈ U, v ∈ V

    The orthogonal complement to U in Rn is the subspace

    U⊥ := {x ∈ Rn : (x, u) = 0, ∀u ∈ U}

    E.g. a plane and its normal line intersecting at the origin are orthogonal complements

  • 5. Orthogonality 5.2. Orthogonal subspaces

    The previous example suggests the following

    Lemma 5.2.2
    1  U⊥ as defined really is a subspace of Rn
    2  U ∩ U⊥ = {0}

    Proof.
    1  (0, u) = 0 for all u ∈ U, hence 0 ∈ U⊥, so U⊥ is non-empty

       Now let u ∈ U, x, y ∈ U⊥, and α ∈ R, then

       (αx, u) = α(x, u) = 0 =⇒ αx ∈ U⊥
       (x + y, u) = (x, u) + (y, u) = 0 =⇒ x + y ∈ U⊥

       hence U⊥ is closed under scalar multiplication and addition and is therefore a subspace

    2  Let x ∈ U ∩ U⊥, then (x, x) = 0, whence x = 0

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Examples

    1  Suppose U = Span(u1, u2) = Span( (1, 3, −2)T, (−1, 0, 1)T ) ≤ R3

       U⊥ consists of all vectors orthogonal to both u1 and u2
       Multiples of the cross-product u1 × u2 are the only such vectors, whence

       U⊥ = Span( (1, 3, −2)T × (−1, 0, 1)T ) = Span( (3, 1, 3)T )

       In general the orthogonal complement U⊥ to a plane U ≤ R3 is spanned by the cross-product of any two spanning vectors in U: hence U⊥ is always a line

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Examples

    2  Suppose U = Span(u) = Span( (−2, 2, 5)T )

       Then (x, u) = 0 ⇐⇒ uTx = 0 ⇐⇒ (−2 2 5)x = 0, whence we find the nullspace:

       U⊥ = N( (−2 2 5) ) = Span( (1, 1, 0)T, (5, 0, 2)T )

       In general, the orthogonal complement to a line U ≤ R3 is the nullspace of a rank 1 matrix uT ∈ R1×3
       uT has nullity 3 − 1 = 2 and so dim U⊥ = 2: hence U⊥ is always a plane

    We will see shortly that orthogonal complements are naturally thought of as nullspaces of particular matrices

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Non-degeneracy

    The scalar product is said to be non-degenerate in the sense that

    (x, y) = 0, ∀y ∈ Rn =⇒ x = 0

    Alternatively said, the only vector which is orthogonal to everything is the zero-vector 0:

    (Rn)⊥ = {0}

    We can check this: if x is orthogonal to all y ∈ Rn, then

    (x, ei) = 0 for every standard basis vector e1, . . . , en

    But (x, ei) = xi =⇒ xi = 0 for all i and so x = 0

    Similarly {0}⊥ = Rn

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Orthogonality and matrices

    For a general matrix A, we consider how N(A) and C(A) are related to orthogonality. First we need to see how matrix multiplication interacts with the scalar product

    Lemma 5.2.3

    If x ∈ Rn, y ∈ Rm, and A ∈ Rm×n, then

    (Ax, y) = (x, ATy)

    Proof.

    (Ax, y) = (Ax)Ty = xTATy = (x, ATy)

    Note that the scalar product on the left is of vectors in Rm, while the product on the right is of vectors in Rn

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Theorem 5.2.4 (Fundamental subspaces)

    If A ∈ Rm×n then(a)

    N(A) = C(AT)⊥ and N(AT) = C(A)⊥

    (a)Warning: the book uses the strange notation R(A) = Range(A) for the column space of A here, rather than our C(A)

    Proof.
    Using the definition we see that

    C(AT)⊥ = {x ∈ Rn : (x, z) = 0, ∀z ∈ C(AT)}
           = {x ∈ Rn : (x, ATy) = 0, ∀y ∈ Rm}
           = {x ∈ Rn : (Ax, y) = 0, ∀y ∈ Rm}     (Lemma 5.2.3)
           = {x ∈ Rn : Ax = 0}                   (Non-degeneracy)
           = N(A)

    The second formula comes from replacing A ↔ AT

  • 5. Orthogonality 5.2. Orthogonal subspaces

    The Theorem tells us how to find the orthogonal complement to a general subspace U ≤ Rn:

    1  Take a basis {u1, . . . , ur} of U
    2  Build the rank r matrix A ∈ Rn×r with columns u1, . . . , ur
    3  U = C(A) =⇒ U⊥ = N(AT)

    Example

    If U = Span( (1, 0, −1, 0)T, (5, −2, 0, 1)T ) ≤ R4, then

    A = (  1   5 )                 AT = ( 1   0  −1  0 )
        (  0  −2 )      =⇒              ( 5  −2   0  1 )
        ( −1   0 )
        (  0   1 )

    from which we find U⊥ as the nullspace

    U⊥ = N(AT) = Span( (0, 1, 0, 2)T, (1, 0, 1, −5)T )
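    The recipe above is easy to carry out numerically. A sketch using scipy.linalg.null_space, which returns an orthonormal basis of the nullspace (so the basis vectors are scaled differently from the hand computation):

        import numpy as np
        from scipy.linalg import null_space

        A = np.array([[ 1.0,  5.0],      # columns span U <= R^4
                      [ 0.0, -2.0],
                      [-1.0,  0.0],
                      [ 0.0,  1.0]])

        U_perp = null_space(A.T)                     # basis (as columns) of N(A^T) = C(A)^perp
        print(U_perp)
        print(np.allclose(A.T @ U_perp, 0))          # True: each basis vector is orthogonal to U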

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Theorem 5.2.5
    1  If S ≤ Rn then dim S + dim S⊥ = n
    2  If B = {s1, . . . , sr} is a basis of S, then we may form a basis C = {sr+1, . . . , sn} of S⊥ such that B ∪ C is a basis of Rn

    The Theorem clears up what we’ve already seen: e.g. the orthogonal complement to a line in R3 is always a plane, etc.

    Proof.

    If S = {0} then S⊥ = Rn and the Theorem is trivial

    Otherwise let A = ( s1 · · · sr ) ∈ Rn×r be the matrix with columns s1, . . . , sr

    Since B is a basis we have S = C(A), whence Theorem 5.2.4 yields

    S⊥ = C(A)⊥ = N(AT)

    The Rank–Nullity Theorem gives us 1:

    dim S⊥ = null AT = n − rank AT = n − r = n − dim S

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Proof (cont).

    Now choose a basis C = {sr+1, . . . , sn} of S⊥ and suppose that

    s + s⊥ = 0, where s := α1s1 + · · · + αrsr ∈ S and s⊥ := αr+1sr+1 + · · · + αnsn ∈ S⊥

    Lemma 5.2.2 =⇒ s = −s⊥ ∈ S ∩ S⊥ = {0} =⇒ s = s⊥ = 0

    Since B, C are bases it follows that all αi = 0, whence s1, . . . , sn are linearly independent

    Since dim Rn = n we necessarily have a basis of Rn

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Direct Sums of Subspaces

    Definition 5.2.6
    Suppose that U, V are subspaces of W
    Moreover suppose that each w ∈ W can be written uniquely as a sum

    w = u + v

    for some u ∈ U, v ∈ V
    Then W is the direct sum of U and V and we write W = U ⊕ V

    W = U ⊕ V is equivalent to both of the following holding simultaneously:

    1  W = U + V; everything in W can be written as a combination u + v
    2  U ∩ V = {0}; the combination is unique

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Orthogonal complements are always direct sums

    Theorem 5.2.7

    If S is a subspace of Rn then S ⊕ S⊥ = Rn

    Proof.

    We must prove S ∩ S⊥ = {0} and Rn = S + S⊥

    The first is Lemma 5.2.2, part 2

    For the second we use Theorem 5.2.5 and the homework:

    dim(S + S⊥) = dim S + dim S⊥ − dim(S ∩ S⊥) = n

    from which S + S⊥ = Rn

    Thus S ⊕ S⊥ = Rn

  • 5. Orthogonality 5.2. Orthogonal subspaces

    Theorem 5.2.8

    If S is a subspace of Rn then (S⊥)⊥ = S

    Proof.
    If s ∈ S then

    (s, y) = 0 for all y ∈ S⊥

    Thus s ∈ (S⊥)⊥, hence S ≤ (S⊥)⊥

    Conversely, let z ∈ (S⊥)⊥
    Since Rn = S ⊕ S⊥ there exist unique s ∈ S, s⊥ ∈ S⊥ such that

    z = s + s⊥

    Now take scalar products with s⊥:

    0 = (z, s⊥) = (s, s⊥) + (s⊥, s⊥) = ||s⊥||² =⇒ s⊥ = 0

    Hence z = s ∈ S and we have (S⊥)⊥ ≤ S

    Putting both halves together gives the Theorem

  • 5. Orthogonality 5.2. Orthogonal subspaces

    The Fundamental Subspaces Theorem has a bearing on whether linear systems have solutions

    Corollary 5.2.9

    Let A ∈ Rm×n and b ∈ Rm. Then exactly one of the following holds:
    1  There is a vector x ∈ Rn such that Ax = b, or
    2  There exists some y ∈ N(AT) ≤ Rm such that (y, b) ≠ 0

    The corollary is illustrated for m = n = 3 and rank A = 2: a suitable, but unnecessary, choice satisfying 2 is y = πN(AT)(b)

    Proof.

    N(AT) = C(A)⊥ =⇒ Rm = C(A) ⊕ N(AT)

    Write b = p + y according to the direct sum, then (b, y) = ||y||²

    This is zero iff b ∈ C(A), iff Ax = b has a solution

  • 5. Orthogonality 5.3. Least squares problems

    5.3 Least squares problems

    In applications, one often has more equations than unknowns and cannot find a solution to all of them simultaneously: what do we do?

    Idea: find a combination of variables that comes as close as possible to solving all the equations

    Many methods exist, depending on the type of problem, the definition of ‘close as possible’, etc.5

    We consider a method for approaching overdetermined linear systems, first championed by Gauss

    5Take a Numerical Analysis class for more!

  • 5. Orthogonality 5.3. Least squares problems

    Suppose Ax = b is an overdetermined system: i.e.
    A ∈ Rm×n with m > n (more rows than columns)
    b ∈ Rm is given
    x = (x1, . . . , xn)T ∈ Rn is the column vector of variables

    The picture from Corollary 5.2.9 gives us an approach:

    In general b ∉ C(A) and there is no solution

    The closest we can get to a solution x would be to choose x̂ so that Ax̂ is as close as possible to b

    Since Rm = C(A) ⊕ N(AT), we decompose b = p + y and instead solve Ax̂ = p

  • 5. Orthogonality 5.3. Least squares problems

    Least Squares?

    Suppose Ax = b is our m × n overdetermined system
    Any vector x ∈ Rn creates a residual r(x) = Ax − b ∈ Rm: either

    1  We can solve Ax = b and thus make r(x) = 0, or
    2  We want to minimize the residual; equivalent to minimizing the length ||r(x)||

    Definition 5.3.1
    If x̂ ∈ Rn is such that ||Ax̂ − b|| ≤ ||Ax − b|| for all x ∈ Rn then we say that x̂ is a least squares solution to the system Ax = b

    Minimizing ||r(x)|| is equivalent to minimizing ||r(x)||², a sum of squares: no square-roots!
    In general there will be many least squares solutions to a given system: if x̂ is such, then x̂ + n is another for any n ∈ N(A)

  • 5. Orthogonality 5.3. Least squares problems

    Theorem 5.3.2
    Let S ≤ Rm and b ∈ Rm, then:

    1  There exists a unique p ∈ S which is closest to b
    2  p ∈ S is closest to b iff p − b ∈ S⊥

    Proof.

    Since Rm = S ⊕ S⊥ we may write b = p + s⊥ for some p ∈ S and s⊥ ∈ S⊥. Let s ∈ S, then

    ||b − s||² = ||b − p + p − s||²
               = ||b − p||² + ||p − s||²   (Pythagoras’: b − p = s⊥ ∈ S⊥ is orthogonal to p − s ∈ S)
               ≥ ||b − p||²

    with equality iff p = s

    The closest point in S to b is therefore the orthogonal projection of b onto S

  • 5. Orthogonality 5.3. Least squares problems

    By Theorem 5.3.2, it follows that x̂ is a least squares solution to Ax = b iff Ax̂ = p = πC(A)(b)

    We don’t yet have a formula for calculating the orthogonal projection πS for a general subspace S, but we can calculate when S is 1-dimensional

    Example

    Find the vector p ∈ S = Span( (1, 3, 2)T ) which is closest to b = (−1, 0, 1)T

    We want the projection onto S = Span(s):

    p = πS(b) = ( (s, b) / ||s||² ) s = (1/14) (1, 3, 2)T

  • 5. Orthogonality 5.3. Least squares problems

    Unique Least Squares Solutions

    We address the simplest situation of least squares solutions x̂ to Ax = b: when the solution x̂ is unique

    Theorem 5.3.3

    If A ∈ Rm×n has rank A = n, then the equations

    ATAx = ATb

    have a unique solution

    x̂ = (ATA)⁻¹ATb

    which is the unique least squares solution to the system Ax = b

    Proof.
    We must prove three things:

    1  ATA is invertible
    2  x̂ = (ATA)⁻¹ATb is a least squares solution to Ax = b
    3  x̂ is the only least squares solution

  • 5. Orthogonality 5.3. Least squares problems

    Proof (cont).
    1  Suppose that z ∈ Rn solves ATAz = 0
       Then

       Az ∈ N(AT) = C(A)⊥        (Fundamental Subspaces)

       But Az ∈ C(A), whence Az ∈ C(A) ∩ C(A)⊥ = {0} =⇒ Az = 0

       To finish, null A = n − rank A = 0, from which Az = 0 has only the solution z = 0
       Hence ATAz = 0 =⇒ z = 0, whence ATA is invertible

    2  x̂ = (ATA)⁻¹ATb certainly solves ATAx = ATb
       However, for any y ∈ Rn,

       (Ax̂ − b, Ay) = (AT(Ax̂ − b), y) = (ATA(ATA)⁻¹ATb − ATb, y) = 0

       hence Ax̂ − b ∈ C(A)⊥
       x̂ is therefore a least squares solution to Ax = b

  • 5. Orthogonality 5.3. Least squares problems

    Proof (cont).
    3  Now suppose that ŷ is another least squares solution
       Then

       A(ŷ − x̂) = (Aŷ − b) − (Ax̂ − b)

       where the left-hand side lies in C(A) and the right-hand side lies in C(A)⊥

       Since C(A) ∩ C(A)⊥ = {0} we have A(ŷ − x̂) = 0
       Since rank A = n we necessarily have ŷ − x̂ = 0 and so the least squares solution is unique

    Note how often the fact that rank A = n is required: the Theorem is false without it! Example to come. . .

  • 5. Orthogonality 5.3. Least squares problems

    General Orthogonal Projections (non-examinable)

    Corollary 5.3.4

    Suppose S ≤ Rm is a subspace with dim S = n
    Let A ∈ Rm×n be any matrix(a) with C(A) = S
    Then

    πS = A(ATA)⁻¹AT

    is the orthogonal projection onto S

    (a)Necessarily the columns of A form a basis of S

    It is easy to see that if A = v is a column vector, then we recover the original definition of orthogonal projection onto a vector:

    πv = v(vTv)⁻¹vT = (1/||v||²) vvT

  • 5. Orthogonality 5.3. Least squares problems

    Example

    Find the unique least-squares solution to the system of equations

    x1 + 2x2 = 0
    3x1 + 3x2 = 1
         x2 = 4

    We have Ax = b where

    A = ( 1  2 )        b = ( 0 )
        ( 3  3 )            ( 1 )
        ( 0  1 )            ( 4 )

    Since rank A = 2, the Theorem says that the unique solution is

    x̂ = (ATA)⁻¹ATb,   where   ATA = ( 10  11 )   and   ATb = ( 3 )
                                    ( 11  14 )              ( 7 )

    so

    x̂ = (1/19) (  14  −11 ) ( 3 )  =  (1/19) ( −35 )
               ( −11   10 ) ( 7 )            (  37 )

    x̂ ∈ R2 is closest to a solution to Ax = b in the sense that Ax̂ ∈ R3 is as close as possible to b: we are minimizing distance in R3, not in R2

    Should check using multivariable calculus that f(x1, x2) = (x1 + 2x2)² + (3x1 + 3x2 − 1)² + (x2 − 4)² has an absolute minimum at (x1, x2) = (−35/19, 37/19)
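    The same answer comes out of a few lines of Python/NumPy; np.linalg.lstsq solves the least squares problem directly, and the normal equations give the same result:

        import numpy as np

        A = np.array([[1.0, 2.0],
                      [3.0, 3.0],
                      [0.0, 1.0]])
        b = np.array([0.0, 1.0, 4.0])

        x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # (A^T A)^{-1} A^T b
        x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # built-in least squares solver
        print(x_normal, x_lstsq)                          # both approximately [-35/19, 37/19]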

  • 5. Orthogonality 5.3. Least squares problems

    Example

    Find all the least-squares solutions x̂ when

    A = (  3  −6 )        b = ( −2 )
        (  1  −2 )            (  1 )
        ( −1   2 )            (  4 )

    rank A = 1 < 2 and so ATA = (  11  −22 )  is non-invertible, and we are obliged to solve ATAx̂ = ATb directly
                                ( −22   44 )

    This reads

    (  11  −22 ) x̂ = ( −9 )
    ( −22   44 )     ( 18 )

    whence x̂ = ( −9/11 ) + λ ( 2 ),  where λ is any scalar
               (   0   )     ( 1 )

    There is a one-parameter set of least-squares solutions

  • 5. Orthogonality 5.3. Least squares problems

    Best-fitting curves in Statistics

    Least-squares solutions are often used in statistics when one wants to find a best-fitting polynomial to a set of data points

    Example

    Find the equation of the line y = α0 + α1t which minimizes the sum of the squares of the vertical distances to the data points (1, 3), (2, 6), and (3, 7)

    Observe how the different choices of line affect the sum of the squared distances d1² + d2² + d3²

  • 5. Orthogonality 5.3. Least squares problems

    Example (cont)

    The sum of the squared errors, as a function of α0, α1, is

    (y(1) − 3)² + (y(2) − 6)² + (y(3) − 7)² = || ( α0 + α1, α0 + 2α1, α0 + 3α1 )T − (3, 6, 7)T ||²

       = || ( 1  1 ) ( α0 )  −  ( 3 ) ||²  =  ||Aα − b||²
            ( 1  2 ) ( α1 )     ( 6 )
            ( 1  3 )            ( 7 )

    Therefore (α0, α1)T is the least-squares solution

    ( α0 ) = (ATA)⁻¹ATb = ( 3   6 )⁻¹ ( 1  1  1 ) ( 3 )
    ( α1 )                ( 6  14 )   ( 1  2  3 ) ( 6 )
                                                  ( 7 )

           = (1/6) ( 14  −6 ) ( 16 )  =  ( 4/3 )
                   ( −6   3 ) ( 36 )     (  2  )

    We therefore get the line y = 4/3 + 2t

    This is the “best-fitting least-squares” line to the data

  • 5. Orthogonality 5.3. Least squares problems

    Best-fitting least-squares polynomials

    Suppose {(ti, bi) : i = 1, . . . , n} is a set of data points where t1, . . . , tn are distinct6

    Question: If t is given, what do we expect b to be?

    We look for a polynomial p(t) of degree k < n which minimizes the squares of the errors in the dependent variable b
    p(t) is then a prediction of the value b if t is given

    Example

    Try plugging in the data “1 1; 2 2; 3 1; 4 3; 5 7; 6 2; 7 3;” to the applet for degrees 1–5

    6The ti are often time-values and the bi the values of some output at time ti

    http://www.shodor.org/chemviz/tools/regressionjava/index.html

  • 5. Orthogonality 5.3. Least squares problems

    Let p(t) = α0 + α1t + · · · + αk t^k be a polynomial of degree k < n

    The predictive error7 at t = ti is the distance |p(ti) − bi|
    Choose coefficients α0, . . . , αk to minimize the sum of the squared errors

    ∑_{i=1}^{n} (p(ti) − bi)²

    Sum squares of errors for three reasons:

    1  Positive and negative errors are treated the same (both positive)
    2  Large errors are penalized much more than small ones
    3  The calculations are much easier than with other methods!

    7If k = n then there is a unique polynomial through the n + 1 data points, so we have a formula b = f(t) and thus no predictive error at any ti

  • 5. Orthogonality 5.3. Least squares problems

    In terms of the coefficients of p(t), we can write

    ( p(t1) )     ( 1  t1  t1²  · · ·  t1^k ) ( α0 )
    (   ⋮   )  =  ( 1  t2  t2²  · · ·  t2^k ) ( α1 )  =: Pa,
    ( p(tn) )     ( ⋮                    ⋮  ) (  ⋮  )
                  ( 1  tn  tn²  · · ·  tn^k ) ( αk )

    defining the matrix P ∈ Rn×(k+1). Setting b = (b1, . . . , bn)T, we are trying to minimize

    ∑_{i=1}^{n} (p(ti) − bi)² = ||Pa − b||²

    This is a least squares problem
    Moreover, rank P = k + 1 is maximal iff the ti are distinct. The unique least squares solution is therefore

    â = (PTP)⁻¹PTb,

    which returns us the coefficients α0, . . . , αk of the best-fitting least-squares polynomial of degree ≤ k

  • 5. Orthogonality 5.3. Least squares problems

    Example

    Find the best-fitting line and quadratic to the data

    ti | 1  2  3  4
    bi | 1  2  1  3

    For the straight line we have

    P = ( 1  1 )        b = ( 1 )
        ( 1  2 )            ( 2 )
        ( 1  3 )            ( 1 )
        ( 1  4 )            ( 3 )

    thus

    â = (PTP)⁻¹PTb = (  4  10 )⁻¹ (  7 )  =  (1/2) ( 1 )
                     ( 10  30 )   ( 20 )           ( 1 )

    hence p(t) = (1 + t)/2 is the best-fitting straight line

    [Figure: the four data points and the line y = 0.5t + 0.5, with vertical errors d1, . . . , d4; ∑i di² = 1.5]

  • 5. Orthogonality 5.3. Least squares problems

    Example (cont)

    For the data

    ti | 1  2  3  4
    bi | 1  2  1  3

    the best-fitting quadratic requires

    P = ( 1  1   1 )
        ( 1  2   4 )
        ( 1  3   9 )
        ( 1  4  16 )

    thus

    â = (PTP)⁻¹PTb = (  4   10   30 )⁻¹ (  7 )  =  (1/4) (  7 )
                     ( 10   30  100 )   ( 20 )           ( −3 )
                     ( 30  100  354 )   ( 66 )           (  1 )

    The best-fitting quadratic polynomial is therefore p(t) = (1/4)(7 − 3t + t²)

    Note that p fits the data better than the straight line, otherwise the best-fitting quadratic would be a straight line!

    [Figure: the four data points and the parabola y = 0.25t² − 0.75t + 1.75, with vertical errors d1, . . . , d4; ∑i di² = 1.25]
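    Both fits can be reproduced with a short Python/NumPy sketch; np.polyfit performs exactly this kind of least-squares polynomial fit (coefficients are returned from the highest degree down):

        import numpy as np

        t = np.array([1.0, 2.0, 3.0, 4.0])
        b = np.array([1.0, 2.0, 1.0, 3.0])

        print(np.polyfit(t, b, 1))     # [0.5, 0.5]          ->  y = 0.5 t + 0.5
        print(np.polyfit(t, b, 2))     # [0.25, -0.75, 1.75] ->  y = 0.25 t^2 - 0.75 t + 1.75

        # equivalently, build P and solve the normal equations directly
        P = np.vander(t, 3, increasing=True)          # columns 1, t, t^2
        print(np.linalg.solve(P.T @ P, P.T @ b))      # [1.75, -0.75, 0.25]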

  • 5. Orthogonality 5.4. Inner product spaces

    5.4 Inner Product Spaces

    Inner products generalize the scalar product on Rn

    Definition 5.4.1
    An inner product ( , ) on a real vector space V is a function ( , ) : V × V → R which satisfies the following axioms:

    I    (x, x) ≥ 0, ∀x ∈ V, with equality iff x = 0
    II   (x, y) = (y, x), ∀x, y ∈ V
    III  (αx + βy, z) = α(x, z) + β(y, z), ∀x, y, z ∈ V, ∀α, β ∈ R

    (V, ( , )) is an inner product space

    ( , ) is also called a positive definite (I), symmetric (II), (bi)linear8 (III) form
    III says that each map Lz : V → R defined by Lz(x) = (x, z) is linear9

    8Linear in both arguments
    9When dim V < ∞ it is a fact (beyond this course) that all linear maps V → R are of the form Lz for some z ∈ V

  • 5. Orthogonality 5.4. Inner product spaces

    Inner Products on Rn

    If w1, . . . , wn > 0, then

    (x, y) := ∑_{i=1}^{n} wi xi yi

    is an inner product:10 the wi are called weights

    Indeed if A ∈ Rn×n is any symmetric (AT = A), positive-definite (xTAx > 0, ∀x ≠ 0) matrix, then

    (x, y) := xTAy

    is an inner product11 on Rn

    Two examples on R3 are

    (x, y) = xT ( 1  0  0 ) y        (x, y) = xT ( 3  0  0 ) y
                ( 0  3  0 )                      ( 0  1  1 )
                ( 0  0  4 )                      ( 0  1  2 )

    10If w1 = w2 = · · · = wn = 1 we get the standard scalar product
    11Check each of I, II, III

  • 5. Orthogonality 5.4. Inner product spaces

    Inner Products on Pn

    The standard basis {1, x, x², . . . , x^{n−1}} identifies Pn with Rn and we can use any of the inner products on the previous slide, e.g.

    (a1 + b1x + c1x², a2 + b2x + c2x²) = a1a2 + b1b2 + c1c2 in P3

    Alternatively, let x1, . . . , xn be distinct real numbers and define

    (p, q) := ∑_{i=1}^{n} p(xi)q(xi)

    Conditions II and III clearly hold, but I needs a little work:

    (p, p) = ∑_{i=1}^{n} p(xi)² = 0 ⇐⇒ p(xi) = 0, ∀i = 1, . . . , n

    This says that p(x) has at least n distinct roots
    However a polynomial of degree ≤ n − 1 has at most n − 1 roots, unless it is identically zero: hence I holds

    Can also have weights: if w(x) is a positive function,

    (p, q) := ∑_{i=1}^{n} w(xi)p(xi)q(xi) is an inner product

  • 5. Orthogonality 5.4. Inner product spaces

    Inner Products on C[a, b]

    Undoubtedly the most important example for future courses is the L2 inner product on C[a, b]

    (f, g) := ∫_a^b f(x)g(x) dx

    We check

    I    (f, f) = ∫_a^b f(x)² dx ≥ 0 with equality iff f(x) ≡ 0 (since f is continuous)

    II   (f, g) = ∫_a^b f(x)g(x) dx = ∫_a^b g(x)f(x) dx = (g, f)

    III  (αf + βg, h) = ∫_a^b (αf(x) + βg(x))h(x) dx
                      = α ∫_a^b f(x)h(x) dx + β ∫_a^b g(x)h(x) dx
                      = α(f, h) + β(g, h)

    Can similarly define a weighted inner product

    (f, g) = ∫_a^b w(x)f(x)g(x) dx

    where w(x) is any positive function

  • 5. Orthogonality 5.4. Inner product spaces

    Basic Properties

    Definition 5.4.2
    If (V, ( , )) is an inner product space then the norm or length of a vector v ∈ V is

    ||v|| := √(v, v)

    v, w ∈ V are orthogonal iff (v, w) = 0

    Observe: ||v|| = 0 ⇐⇒ (v, v) = 0 ⇐⇒ v = 0, by property I

    Theorem 5.4.3 (Pythagoras’)

    If v, w are orthogonal then ||v + w||² = ||v||² + ||w||²

    The proof is identical to that given in Rn

  • 5. Orthogonality 5.4. Inner product spaces

    Example

    Find the norms and inner products of the three vectors {1, sin x, cos x} with respect to the L2 inner product on C[0, 2π]

    ||1|| = √( ∫_0^{2π} 1² dx ) = √(2π)

    ||sin x|| = √( ∫_0^{2π} sin²x dx ) = √( ∫_0^{2π} (1/2)(1 − cos 2x) dx ) = √π

    ||cos x|| = √π

    (1, sin x) = ∫_0^{2π} 1 · sin x dx = 0

    (1, cos x) = 0

    (sin x, cos x) = ∫_0^{2π} sin x cos x dx = 0

    1, sin x, cos x are therefore orthogonal vectors in C[0, 2π]
    Dividing by the norms we see that

    1/√(2π),  (1/√π) sin x,  (1/√π) cos x

    are orthonormal vectors(a) in C[0, 2π]

    (a)Important for Fourier Series
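    These integrals are easy to confirm numerically; a sketch of the L2 inner product on C[0, 2π] using scipy.integrate.quad:

        import numpy as np
        from scipy.integrate import quad

        def ip(f, g, a=0.0, b=2*np.pi):
            """L2 inner product (f, g) = integral of f(x)g(x) over [a, b], computed numerically."""
            return quad(lambda x: f(x) * g(x), a, b)[0]

        one = lambda x: 1.0
        print(ip(one, one), ip(np.sin, np.sin), ip(np.cos, np.cos))    # 2*pi, pi, pi
        print(ip(one, np.sin), ip(one, np.cos), ip(np.sin, np.cos))    # all approximately 0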

  • 5. Orthogonality 5.4. Inner product spaces

    Orthogonal Projections

    Can define orthogonal projections exactly as in Rn

    Definition 5.4.4
    If v ≠ 0 in an inner product space (V, ( , )) then the orthogonal projection of x ∈ V onto v is

    πv(x) = ( (v, x) / ||v||² ) v

    In particular

    (πv(x), x − πv(x)) = ( ((v, x)/||v||²) v, x − ((v, x)/||v||²) v )
                       = ((v, x)/||v||²) ( (v, x) − ((v, x)/||v||²) ||v||² )
                       = 0

    whence πv(x) and x − πv(x) are orthogonal

    [Figure: x, v, the projection πv(x), and the orthogonal component x − πv(x)]

  • 5. Orthogonality 5.4. Inner product spaces

    Example

    Calculate the orthogonal projection of sin x onto Span{x} ≤ C[0, 2π] with the L2-inner product

    πx(sin x) = ( (x, sin x) / ||x||² ) x = ( ∫_0^{2π} x sin x dx / ∫_0^{2π} x² dx ) x

              = ( ( [−x cos x]_0^{2π} + ∫_0^{2π} cos x dx ) / [x³/3]_0^{2π} ) x

              = ( −2π / ((8/3)π³) ) x = ( −3/(4π²) ) x

    [Figure: y = sin x and the projection πx(sin x) = −3x/(4π²) plotted on [0, 2π]]

  • 5. Orthogonality 5.4. Inner product spaces

    Theorem 5.4.5 (Cauchy–Schwarz inequality)

    |(v, w)| ≤ ||v|| ||w||, with equality iff v, w are parallel

    Can’t rely on the cosine rule like in Rn as we currently have no notion of angle

    Proof.
    Suppose v ≠ 0, otherwise the Theorem is trivial
    πv(w) and w − πv(w) are orthogonal, so by Pythagoras’

    ||w||² = ||πv(w)||² + ||w − πv(w)||² ≥ ||πv(w)||² = (v, w)² / ||v||²

    Rearranging gives the Theorem: equality holds iff w = πv(w), and so iff v, w are parallel

  • 5. Orthogonality 5.4. Inner product spaces

    Angles in Inner Product Spaces

    Cauchy–Schwarz allows us to define the notion of angle

    Definition 5.4.6
    The angle θ between two non-zero vectors v, w in an inner product space is given by

    cos θ = (v, w) / (||v|| ||w||)

    Can now check that the Cosine rule holds:

    ||v − w||² = ||v||² + ||w||² − 2 ||v|| ||w|| cos θ

    and, more painfully, that the Sine rule holds also!

  • 5. Orthogonality 5.4. Inner product spaces

    Norms

    Definition 5.4.7
    A norm on a real vector space V is a function || || : V → R which satisfies the following axioms:

    I    ||v|| ≥ 0, ∀v ∈ V, with equality iff v = 0
    II   ||αv|| = |α| ||v||, ∀α ∈ R, ∀v ∈ V
    III  ||v + w|| ≤ ||v|| + ||w||, ∀v, w ∈ V

    We call (V, || ||) a normed linear space

    Condition III is the triangle inequality: the length of one side of a triangle is at most the sum of the lengths of the other two sides

    [Figure: the triangle with sides v, w and v + w]

  • 5. Orthogonality 5.4. Inner product spaces

    Theorem 5.4.8

    If (V, ( , )) is an inner product space, then ||v|| = √(v, v) is a norm

    Proof.
    I is the identical condition for an inner product

    For II, ||αv|| = √(αv, αv) = √(α²(v, v)) = |α| ||v||

    For III we need the Cauchy–Schwarz inequality:

    ||v + w||² = ||v||² + 2(v, w) + ||w||²
               ≤ ||v||² + 2 ||v|| ||w|| + ||w||²
               = (||v|| + ||w||)²

  • 5. Orthogonality 5.4. Inner product spaces

    The p-norms

    These generalize the standard norm on Rn

    Definition 5.4.9
    Given p ≥ 1, the p-norm on Rn is the norm

    ||x||p := ( ∑_{i=1}^{n} |xi|^p )^{1/p}

    The uniform or ∞-norm on Rn is the norm

    ||x||∞ := max_{i=1,...,n} |xi|

    The 2-norm is the usual notion of length in Rn

    Only the 2-norm comes from an inner product on Rn: a normed linear space in general has no idea of what the angle between vectors means, only their lengths

  • 5. Orthogonality 5.4. Inner product spaces

    The three most common norms are the 1-, 2-, and ∞-norms

    Example

    If x = (1, 3, −1)T then

    ||x||1 = |1| + |3| + |−1| = 5

    ||x||2 = √(1² + 3² + (−1)²) = √11

    ||x||∞ = max{|1|, |3|, |−1|} = 3

    Note that ||x||1 ≥ ||x||2 ≥ ||x||∞: this is true in general(a)

    (a)See the homework. . .
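    The three norms are built into NumPy; a quick check of this example (np.inf gives the ∞-norm):

        import numpy as np

        x = np.array([1.0, 3.0, -1.0])
        print(np.linalg.norm(x, 1))        # 5.0
        print(np.linalg.norm(x, 2))        # sqrt(11), about 3.317
        print(np.linalg.norm(x, np.inf))   # 3.0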

  • 5. Orthogonality 5.4. Inner product spaces

    Lp norms on C[a, b] (non-examinable)

    There are also analogues of the p-norms on function spaces

    Definition 5.4.10
    On C[a, b], the Lp-norm (p ≥ 1) is given by

    ||f||p := ( ∫_a^b |f(x)|^p dx )^{1/p}

    The uniform or ∞-norm is defined by

    ||f||∞ := max_{x∈[a,b]} |f(x)|

    Again only the L2 norm comes from an inner product, in this case the L2 inner product defined earlier

  • 5. Orthogonality 5.5. Orthonormal sets

    5.5 Orthonormal sets

    Definition 5.5.1
    v1, . . . , vn in an inner product space V are orthogonal iff (vi, vj) = 0, ∀i ≠ j
    v1, . . . , vn are orthonormal iff

    (vi, vj) = δij = { 1 if i = j
                     { 0 if i ≠ j

    Can turn an orthogonal set of non-zero vectors into an orthonormal set by dividing by the norms:

    {v1, . . . , vn} ↦ { v1/||v1||, . . . , vn/||vn|| }

    Example

    Recall that 1/√(2π), (1/√π) sin x, (1/√π) cos x are orthonormal in (C[0, 2π], L2)

  • 5. Orthogonality 5.5. Orthonormal sets

    Theorem 5.5.2
    A non-zero orthogonal set {v1, . . . , vn} is linearly independent

    Proof.
    Suppose that α1v1 + · · · + αnvn = 0
    Then, for each i,

    0 = (vi, 0) = (vi, α1v1 + · · · + αnvn)
                = α1(vi, v1) + · · · + αi(vi, vi) + · · · + αn(vi, vn)
                = αi ||vi||²

    Since each vi ≠ 0, every αi = 0 and we have linear independence

  • 5. Orthogonality 5.5. Orthonormal sets

    Calculating in orthonormal bases

    Theorem 5.5.3
    Let U = {u1, . . . , un} be an orthonormal basis of an inner product space (V, ( , )). Then:

    1  v = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V: i.e. [v]U = ( (v, u1), . . . , (v, un) )T

    2  ( ∑_{i=1}^{n} ai ui, ∑_{i=1}^{n} bi ui ) = ∑_{i=1}^{n} ai bi

    3  || ∑_{i=1}^{n} ci ui ||² = ∑_{i=1}^{n} ci²       (Parseval’s formula)

    Everything12 works as if you are in Rn with the basis e1, . . . , en!

    12Essentially. . .

  • 5. Orthogonality 5.5. Orthonormal sets

    Proof.
    Since {u1, . . . , un} is a basis there exist unique αi ∈ R such that

    v = α1u1 + · · · + αnun   ∴ (v, ui) = αi

    which proves 1; 2 and 3 are straightforward by linearity from 1

    With careful caveats, the above formulæ are valid when dim V = ∞: which leads to the example. . .

  • 5. Orthogonality 5.5. Orthonormal sets

    Theorem 5.5.4

    In C[−π, π] with the scaled L2 inner product (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx, the following infinite set is orthonormal:

    { 1/√2, sin x, cos x, sin 2x, cos 2x, . . . }

    Proof.
    Just compute integrals: use identities such as 2 sin nx sin mx = cos(n − m)x − cos(n + m)x, i.e.

    (sin(nx), sin(mx)) = (1/π) ∫_{−π}^{π} sin(nx) sin(mx) dx

                       = (1/(2π)) ∫_{−π}^{π} cos(n − m)x − cos(n + m)x dx

                       = (1/(2π)) ∫_{−π}^{π} cos(n − m)x dx = δmn

  • 5. Orthogonality 5.5. Orthonormal sets

    Parseval’s formula makes some calculations extremely easy

    Example

    1/√2, cos 2x are orthonormal with respect to the previous inner product, and

    cos² x = (1/2)(1 + cos 2x) = (1/√2) · (1/√2) + (1/2) cos 2x

    Hence

    (1/π) ∫_{−π}^{π} cos⁴ x dx = ||cos² x||² = (1/√2)² + (1/2)² = 3/4

    ∴ ∫_{−π}^{π} cos⁴ x dx = 3π/4

  • 5. Orthogonality 5.5. Orthonormal sets

    Least squares approximations

    Orthogonal projections ←→ least squares approximations

    Theorem 5.5.5
    Let u1, . . . , un be orthonormal in V and let S = Span(u1, . . . , un)
    Then the orthogonal projection πS : V → S onto S is

    πS(v) = ∑_{i=1}^{n} (v, ui) ui, ∀v ∈ V

    Proof.
    πS is certainly linear (property III of the inner product)
    Moreover, for each i,

    (v − πS(v), ui) = (v, ui) − (v, ui) = 0 =⇒ v − πS(v) ∈ S⊥

    ∴ πS(v) + (v − πS(v)) is the unique decomposition of v into S, S⊥ parts

  • 5. Orthogonality 5.5. Orthonormal sets

    Corollary 5.5.6

    πS(v) is the closest element of S to v

    The proof is exactly the same as that of Theorem 5.3.2

    Definition 5.5.7
    Given v ∈ V we call πS(v) the least-squares approximation of v by S

    Least squares approximations are often used to find approximations to complicated functions by simpler ones. . .

  • 5. Orthogonality 5.5. Orthonormal sets

    Example

    1/√2 and √(3/2) x are orthonormal in (C[−1, 1], L2)

    The least-squares approximation to f(x) = e^x by a linear polynomial on the interval [−1, 1] is therefore

    e^x ≈ ( e^x, 1/√2 ) · 1/√2 + ( e^x, √(3/2) x ) · √(3/2) x

        = (1/2) ∫_{−1}^{1} e^x dx + (3/2) ∫_{−1}^{1} x e^x dx · x

        = (1/2)(e − e⁻¹) + 3e⁻¹ x

    [Figure: y = e^x and its least-squares linear approximation on [−1, 1]]

    A different interval gives a different linear approximation. . .
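    A numerical sketch of this projection with scipy.integrate.quad; it recovers the two coefficients (e − e⁻¹)/2 ≈ 1.175 and 3e⁻¹ ≈ 1.104:

        import numpy as np
        from scipy.integrate import quad

        def ip(f, g, a=-1.0, b=1.0):
            """L2 inner product on C[a, b], computed numerically."""
            return quad(lambda x: f(x) * g(x), a, b)[0]

        u0 = lambda x: 1/np.sqrt(2)        # orthonormal basis of the linear polynomials on [-1, 1]
        u1 = lambda x: np.sqrt(1.5) * x
        f  = np.exp

        print(ip(f, u0) / np.sqrt(2))      # constant term: (e - 1/e)/2, about 1.1752
        print(ip(f, u1) * np.sqrt(1.5))    # coefficient of x: 3/e, about 1.1036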

  • 5. Orthogonality 5.5. Orthonormal sets

    Fourier Series

    Least-squares mostly works for projection onto infinite sets

    Recall: U = { 1/√2, sin x, cos x, sin 2x, cos 2x, . . . } is orthonormal with respect to (f, g) = (1/π) ∫_{−π}^{π} f(x)g(x) dx

    Definition 5.5.8
    Suppose f has period 2π. The Fourier Series of f is its orthogonal projection onto Span U

    F(f)(x) = ( 1/√2, f(x) ) 1/√2 + ∑_{n=1}^{∞} (sin nx, f(x)) sin nx + ∑_{n=1}^{∞} (cos nx, f(x)) cos nx

    if the infinite sum converges(a)

    (a)Beyond this course

  • 5. Orthogonality 5.5. Orthonormal sets

    Example

    Let f(x) = x on [−π, π], extended periodically
    Then

    (1/√2, x) = 0 = (cos nx, x) for all n, since x is odd

    Moreover

    (sin nx, x) = (1/π) ∫_{−π}^{π} x sin nx dx = −(1/(nπ)) [x cos nx]_{−π}^{π} = (2/n)(−1)^{n+1}

    Thus

    F(f)(x) = ∑_{n=1}^{∞} (2/n)(−1)^{n+1} sin nx = 2 sin x − sin 2x + (2/3) sin 3x − · · ·

  • 5. Orthogonality 5.5. Orthonormal sets

    Example

    Similarly the Fourier series of f(x) = x² on [−π, π] is

    F(f)(x) = π²/3 + 4 ∑_{n=1}^{∞} ((−1)^n / n²) cos nx

            = π²/3 − 4 cos x + cos 2x − (4/9) cos 3x + · · ·
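    The Fourier coefficients can be checked numerically; a sketch for f(x) = x using the scaled inner product (1/π)∫ and scipy.integrate.quad:

        import numpy as np
        from scipy.integrate import quad

        def ip(f, g):
            """Scaled L2 inner product (1/pi) * integral of f(x)g(x) over [-pi, pi]."""
            return quad(lambda x: f(x) * g(x), -np.pi, np.pi)[0] / np.pi

        f = lambda x: x
        for n in range(1, 5):
            b_n = ip(lambda x: np.sin(n * x), f)    # expect 2*(-1)^(n+1)/n
            a_n = ip(lambda x: np.cos(n * x), f)    # expect 0, since x is odd
            print(n, round(b_n, 4), round(a_n, 4))  # sine coefficients 2, -1, 0.6667, -0.5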

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    5.6 Gram–Schmidt Orthogonalisation

    Orthogonal13 bases are useful: how do we find them?
    Answer: use projections

    Example

    Let {x1, x2} be a basis of R2

    Linear independence =⇒ x2 ≠ πx1(x2)
    Moreover

    x2 − πx1(x2) ⊥ x1

    ∴ {x1, x2 − πx1(x2)} is an orthogonal basis of R2

    We have orthogonalized the basis {x1, x2}

    [Figure: x1, x2, the projection πx1(x2) and the orthogonal component x2 − πx1(x2)]

    The Gram–Schmidt algorithm does this in general, in any inner product space

    13And orthonormal

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    Theorem 5.6.1 (Gram–Schmidt)

    Let {x1, . . . , xn} be a basis of an inner product space (V, ( , ))
    Define vectors vi recursively by v1 = x1 and

    vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1)

    Then {v1, . . . , vn} is an orthogonal basis of V

    If desired, we can easily form an orthonormal basis

    {u1, . . . , un} = { v1/||v1||, . . . , vn/||vn|| }

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    Proof.
    Fix k < n and suppose that {v1, . . . , vk} is an orthogonal basis of Span(x1, . . . , xk)

    Observe:
    vk+1 = xk+1 − ∑_{i=1}^{k} πvi(xk+1) ∈ Span(x1, . . . , xk+1)
    If i ≤ k, then (vk+1, vi) = (xk+1, vi) − (πvi(xk+1), vi) = 0, since the remaining projections πvj(xk+1), j ≠ i, are multiples of vj and so orthogonal to vi

    Hence {v1, . . . , vk+1} is an orthogonal (hence linearly independent) spanning set of Span(x1, . . . , xk+1)

    I.e. {v1, . . . , vk+1} is an orthogonal basis of Span(x1, . . . , xk+1)

    The result follows by induction

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    Example

    Orthonormalize the basis { (1, 0, 1)T, (3, 0, 1)T, (0, 2, 1)T } of R3

    Label the vectors in order x1, x2, x3 and apply Gram–Schmidt:

    v1 = x1 = (1, 0, 1)T

    v2 = x2 − πv1(x2) = (3, 0, 1)T − ( ((1, 0, 1)T, (3, 0, 1)T) / ||(1, 0, 1)T||² ) (1, 0, 1)T
                      = (3, 0, 1)T − 2 (1, 0, 1)T = (1, 0, −1)T

    v3 = x3 − πv1(x3) − πv2(x3) = (0, 2, 1)T − (1/2)(1, 0, 1)T − (−1/2)(1, 0, −1)T = (0, 2, 0)T

    {v1, v2, v3} is an orthogonal basis, whence

    { (1/√2)(1, 0, 1)T, (1/√2)(1, 0, −1)T, (0, 1, 0)T }

    is an orthonormal basis
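    A short sketch of the algorithm in Python/NumPy applied to this basis (this is classical Gram–Schmidt; in floating point one usually prefers the modified variant or a QR factorization):

        import numpy as np

        def gram_schmidt(basis):
            """Return an orthogonal list of vectors spanning the same space (classical Gram-Schmidt)."""
            vs = []
            for x in basis:
                v = np.asarray(x, dtype=float)
                for u in vs:
                    v = v - (u @ x) / (u @ u) * u     # subtract the projection of x onto each earlier v
                vs.append(v)
            return vs

        vs = gram_schmidt([[1, 0, 1], [3, 0, 1], [0, 2, 1]])
        print(vs)                                     # [1,0,1], [1,0,-1], [0,2,0]
        print([v / np.linalg.norm(v) for v in vs])    # the orthonormal basis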

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    Gram–Schmidt is an algorithm which depends on the order of the inputs x1, . . . , xn

    Example

    The Gram–Schmidt orthonormalization of

    { (0, 2, 1)T, (1, 0, 1)T, (3, 0, 1)T }

    is

    { (1/√5)(0, 2, 1)T, (1/(3√5))(5, −2, 4)T, (1/3)(2, 1, −2)T }

    completely different from the previous example

  • 5. Orthogonality 5.6. Gram–Schmidt Orthogonalization

    Example

    Find an orthonormal basis of Span(1, x, x²) in (C[−1, 1], L2)

    v1 = 1

    v2 = x − ( (1, x) / ||1||² ) · 1 = x − ( ∫_{−1}^{1} x dx / ∫_{−1}^{1} 1² dx ) = x

    v3 = x² − ( (1, x²) / ||1||² ) · 1 − ( (x, x²) / ||x||² ) · x

       = x² − ( ∫_{−1}^{1} x² dx / ∫_{−1}^{1} 1² dx ) − ( ∫_{−1}^{1} x³ dx / ∫_{−1}^{1} x² dx ) x = x² − 1/3

    To normalize, divide through by the norms:

    { 1/√2, √(3/2) x, √(45/8) (x² − 1/3) }
