6 Orthogonality and Least Squares

INTRODUCTORY EXAMPLE: The North American Datum and GPS Navigation

Imagine starting a massive project that you estimate will take ten years and require the efforts of scores of people to construct and solve a 1,800,000 by 900,000 system of linear equations. That is exactly what the National Geodetic Survey did in 1974, when it set out to update the North American Datum (NAD)—a network of 268,000 precisely located reference points that span the entire North American continent, together with Greenland, Hawaii, the Virgin Islands, Puerto Rico, and other Caribbean islands.

The recorded latitudes and longitudes in the NAD must be determined to within a few centimeters because they form the basis for all surveys, maps, legal property boundaries, and layouts of civil engineering projects such as highways and public utility lines. However, more than 200,000 new points had been added to the datum since the last adjustment in 1927, and errors had gradually accumulated over the years, due to imprecise measurements and shifts in the earth's crust. Data gathering for the NAD readjustment was completed in 1983.

The system of equations for the NAD had no solution in the ordinary sense, but rather had a least-squares solution, which assigned latitudes and longitudes to the reference points in a way that corresponded best to the 1.8 million observations. The least-squares solution was found in 1986 by solving a related system of so-called normal equations, which involved 928,735 equations in 928,735 variables.¹

More recently, knowledge of reference points on the ground has become crucial for accurately determining the locations of satellites in the satellite-based Global Positioning System (GPS). A GPS satellite calculates its position relative to the earth by measuring the time it takes for signals to arrive from three ground transmitters. To do this, the satellites use precise atomic clocks that have been synchronized with ground stations (whose locations are known accurately because of the NAD).

The Global Positioning System is used both for determining the locations of new reference points on the ground and for finding a user's position on the ground relative to established maps. When a car driver (or a mountain climber) turns on a GPS receiver, the receiver measures the relative arrival times of signals from at least three satellites. This information, together with the transmitted data about the satellites' locations and message times, is used to adjust the GPS receiver's time and to determine its approximate location on the earth. Given information from a fourth satellite, the GPS receiver can even establish its approximate altitude.

¹ A mathematical discussion of the solution strategy (along with details of the entire NAD project) appears in North American Datum of 1983, Charles R. Schwarz (ed.), National Geodetic Survey, National Oceanic and Atmospheric Administration (NOAA) Professional Paper NOS 2, 1989.



Both the NAD and GPS problems are solved by finding a vector that "approximately satisfies" an inconsistent system of equations. A careful explanation of this apparent contradiction will require ideas developed in the first five sections of this chapter.

In order to find an approximate solution to an inconsistent system of equations that has no actual solution, a well-defined notion of nearness is needed. Section 6.1 introduces the concepts of distance and orthogonality in a vector space. Sections 6.2 and 6.3 show how orthogonality can be used to identify the point within a subspace W that is nearest to a point y lying outside of W. By taking W to be the column space of a matrix, Section 6.5 develops a method for producing approximate ("least-squares") solutions for inconsistent linear systems, such as the system solved for the NAD report.

Section 6.4 provides another opportunity to see orthogonal projections at work, creating a matrix factorization widely used in numerical linear algebra. The remaining sections examine some of the many least-squares problems that arise in applications, including those in vector spaces more general than Rⁿ.

6.1 INNER PRODUCT, LENGTH, AND ORTHOGONALITY

Geometric concepts of length, distance, and perpendicularity, which are well known for R² and R³, are defined here for Rⁿ. These concepts provide powerful geometric tools for solving many applied problems, including the least-squares problems mentioned above. All three notions are defined in terms of the inner product of two vectors.

The Inner Product

If u and v are vectors in Rⁿ, then we regard u and v as n × 1 matrices. The transpose uᵀ is a 1 × n matrix, and the matrix product uᵀv is a 1 × 1 matrix, which we write as a single real number (a scalar) without brackets. The number uᵀv is called the inner product of u and v, and often it is written as u·v. This inner product, mentioned in the exercises for Section 2.1, is also referred to as a dot product. If u and v are the column vectors

    u = (u₁, u₂, ..., uₙ)  and  v = (v₁, v₂, ..., vₙ)

then the inner product of u and v is the matrix product of the row [u₁ u₂ ⋯ uₙ] with the column v:

    u·v = uᵀv = u₁v₁ + u₂v₂ + ⋯ + uₙvₙ
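The definition translates directly into code. Below is a minimal NumPy sketch, not from the text, showing three equivalent ways to evaluate u·v (the vectors anticipate Example 1):

```python
# Three equivalent ways to compute the inner product u.v (assumes NumPy).
import numpy as np

u = np.array([2, -5, -1])
v = np.array([3, 2, -3])

print(sum(ui * vi for ui, vi in zip(u, v)))  # u1*v1 + ... + un*vn  -> -1
print(u.T @ v)                               # matrix product u^T v -> -1
print(np.dot(u, v))                          # NumPy's dot routine  -> -1
```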


EXAMPLE 1 Compute u·v and v·u for u = (2, −5, −1) and v = (3, 2, −3).

SOLUTION

    u·v = uᵀv = (2)(3) + (−5)(2) + (−1)(−3) = −1
    v·u = vᵀu = (3)(2) + (2)(−5) + (−3)(−1) = −1

It is clear from the calculations in Example 1 why u·v = v·u. This commutativity of the inner product holds in general. The following properties of the inner product are easily deduced from properties of the transpose operation in Section 2.1. (See Exercises 21 and 22 at the end of this section.)

THEOREM 1  Let u, v, and w be vectors in Rⁿ, and let c be a scalar. Then

a. u·v = v·u
b. (u + v)·w = u·w + v·w
c. (cu)·v = c(u·v) = u·(cv)
d. u·u ≥ 0, and u·u = 0 if and only if u = 0

Properties (b) and (c) can be combined several times to produce the following useful rule:

    (c₁u₁ + ⋯ + cₚuₚ)·w = c₁(u₁·w) + ⋯ + cₚ(uₚ·w)

The Length of a Vector

If v is in Rⁿ, with entries v₁, ..., vₙ, then the square root of v·v is defined because v·v is nonnegative.

DEFINITION  The length (or norm) of v is the nonnegative scalar ‖v‖ defined by

    ‖v‖ = √(v·v) = √(v₁² + v₂² + ⋯ + vₙ²),  and  ‖v‖² = v·v

Suppose v is in R², say, v = (a, b). If we identify v with a geometric point in the plane, as usual, then ‖v‖ coincides with the standard notion of the length of the line segment from the origin to v. This follows from the Pythagorean Theorem applied to a triangle such as the one in Fig. 1.

FIGURE 1  Interpretation of ‖v‖ as length: the point (a, b) lies at distance √(a² + b²) from the origin, over legs of lengths |a| and |b|.

A similar calculation with the diagonal of a rectangular box shows that the definition of length of a vector v in R³ coincides with the usual notion of length. For any scalar c, the length of cv is |c| times the length of v. That is,

    ‖cv‖ = |c|‖v‖

(To see this, compute ‖cv‖² = (cv)·(cv) = c²(v·v) = c²‖v‖² and take square roots.)


A vector whose length is 1 is called a unit vector. If we divide a nonzero vector v by its length—that is, multiply by 1/‖v‖—we obtain a unit vector u because the length of u is (1/‖v‖)‖v‖ = 1. The process of creating u from v is sometimes called normalizing v, and we say that u is in the same direction as v.

Several examples that follow use the space-saving notation for (column) vectors.

EXAMPLE 2 Let v = (1, −2, 2, 0). Find a unit vector u in the same direction as v.

SOLUTION First, compute the length of v:

    ‖v‖² = v·v = (1)² + (−2)² + (2)² + (0)² = 9,  so  ‖v‖ = √9 = 3

Then, multiply v by 1/‖v‖ to obtain

    u = (1/‖v‖)v = (1/3)v = (1/3, −2/3, 2/3, 0)

To check that ‖u‖ = 1, it suffices to show that ‖u‖² = 1:

    ‖u‖² = u·u = (1/3)² + (−2/3)² + (2/3)² + (0)² = 1/9 + 4/9 + 4/9 + 0 = 1
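As a hedged sketch of the normalization just performed in Example 2 (assuming NumPy; np.linalg.norm computes ‖v‖):

```python
# Normalizing v: multiply by 1/||v|| to get a unit vector (assumes NumPy).
import numpy as np

v = np.array([1.0, -2.0, 2.0, 0.0])
u = v / np.linalg.norm(v)          # ||v|| = 3, so u = (1/3, -2/3, 2/3, 0)

print(u)
print(np.linalg.norm(u))           # 1.0, confirming u is a unit vector
```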

EXAMPLE 3 Let W be the subspace of R² spanned by x = (2/3, 1). Find a unit vector z that is a basis for W.

SOLUTION W consists of all multiples of x, as in Fig. 2(a). Any nonzero vector in W is a basis for W. To simplify the calculation, "scale" x to eliminate fractions. That is, multiply x by 3 to get

    y = (2, 3)

Now compute ‖y‖² = 2² + 3² = 13, so ‖y‖ = √13, and normalize y to get

    z = (1/√13)(2, 3) = (2/√13, 3/√13)

See Fig. 2(b). Another unit vector is (−2/√13, −3/√13).

FIGURE 2  Normalizing a vector to produce a unit vector: (a) the line W spanned by x; (b) the unit vector z on the line spanned by y.

Distance in Rⁿ

We are ready now to describe how close one vector is to another. Recall that if a and b are real numbers, the distance on the number line between a and b is the number |a − b|. Two examples are shown in Fig. 3. This definition of distance in R has a direct analogue in Rⁿ.

FIGURE 3  Distances in R: the numbers 2 and 8 are 6 units apart, since |2 − 8| = |−6| = 6 (or |8 − 2| = |6| = 6); the numbers −3 and 4 are 7 units apart, since |(−3) − 4| = |−7| = 7 (or |4 − (−3)| = |7| = 7).


DEFINITION  For u and v in Rⁿ, the distance between u and v, written as dist(u, v), is the length of the vector u − v. That is,

    dist(u, v) = ‖u − v‖

In R² and R³, this definition of distance coincides with the usual formulas for the Euclidean distance between two points, as the next two examples show.

EXAMPLE 4 Compute the distance between the vectors u = (7, 1) and v = (3, 2).

SOLUTION Calculate

    u − v = (7, 1) − (3, 2) = (4, −1)
    ‖u − v‖ = √(4² + (−1)²) = √17

The vectors u, v, and u − v are shown in Fig. 4. When the vector u − v is added to v, the result is u. Notice that the parallelogram in Fig. 4 shows that the distance from u to v is the same as the distance from u − v to 0.

FIGURE 4  The distance between u and v is the length of u − v.
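In code the distance formula is a single norm of a difference. A minimal sketch reproducing Example 4, assuming NumPy:

```python
# dist(u, v) = ||u - v|| (assumes NumPy).
import numpy as np

u = np.array([7.0, 1.0])
v = np.array([3.0, 2.0])

print(np.linalg.norm(u - v))       # 4.1231..., i.e. sqrt(17)
```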

EXAMPLE 5 If u = (u₁, u₂, u₃) and v = (v₁, v₂, v₃), then

    dist(u, v) = ‖u − v‖ = √((u − v)·(u − v))
               = √((u₁ − v₁)² + (u₂ − v₂)² + (u₃ − v₃)²)

Orthogonal Vectors

The rest of this chapter depends on the fact that the concept of perpendicular lines in ordinary Euclidean geometry has an analogue in Rⁿ.

Consider R² or R³ and two lines through the origin determined by vectors u and v. The two lines shown in Fig. 5 are geometrically perpendicular if and only if the distance from u to v is the same as the distance from u to −v. This is the same as requiring the squares of the distances to be the same. Now

    [dist(u, −v)]² = ‖u − (−v)‖² = ‖u + v‖²
                   = (u + v)·(u + v)
                   = u·(u + v) + v·(u + v)        Theorem 1(b)
                   = u·u + u·v + v·u + v·v        Theorem 1(a), (b)
                   = ‖u‖² + ‖v‖² + 2u·v           Theorem 1(a)    (1)

FIGURE 5  The lines through u and v, with the distances ‖u − v‖ and ‖u − (−v)‖ marked.


The same calculations with v and −v interchanged show that

    [dist(u, v)]² = ‖u‖² + ‖−v‖² + 2u·(−v)
                  = ‖u‖² + ‖v‖² − 2u·v

The two squared distances are equal if and only if 2u·v = −2u·v, which happens if and only if u·v = 0.

This calculation shows that when vectors u and v are identified with geometric points, the corresponding lines through the points and the origin are perpendicular if and only if u·v = 0. The following definition generalizes to Rⁿ this notion of perpendicularity (or orthogonality, as it is commonly called in linear algebra).

DEFINITION  Two vectors u and v in Rⁿ are orthogonal (to each other) if u·v = 0.

Observe that the zero vector is orthogonal to every vector in Rⁿ because 0ᵀv = 0 for all v.

The next theorem provides a useful fact about orthogonal vectors. The proof follows immediately from the calculation in (1) above and the definition of orthogonality. The right triangle shown in Fig. 6 provides a visualization of the lengths that appear in the theorem.

THEOREM 2  The Pythagorean Theorem

Two vectors u and v are orthogonal if and only if ‖u + v‖² = ‖u‖² + ‖v‖².
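Theorem 2 is easy to test numerically. The sketch below (assuming NumPy; the vectors are an orthogonal pair chosen for illustration) checks the orthogonality condition and the Pythagorean identity together:

```python
# u.v == 0 if and only if ||u + v||^2 == ||u||^2 + ||v||^2 (assumes NumPy).
import numpy as np

u = np.array([3.0, 1.0, 1.0])
v = np.array([-1.0, 2.0, 1.0])

print(np.dot(u, v))                                    # 0.0: u and v are orthogonal
lhs = np.linalg.norm(u + v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2
print(np.isclose(lhs, rhs))                            # True
```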

FIGURE 6  The right triangle formed by u, v, and u + v, with sides of lengths ‖u‖, ‖v‖, and ‖u + v‖.

Orthogonal Complements

To provide practice using inner products, we introduce a concept here that will be of use in Section 6.3 and elsewhere in the chapter. If a vector z is orthogonal to every vector in a subspace W of Rⁿ, then z is said to be orthogonal to W. The set of all vectors z that are orthogonal to W is called the orthogonal complement of W and is denoted by W⊥ (and read as "W perpendicular" or simply "W perp").

EXAMPLE 6 Let W be a plane through the origin in R³, and let L be the line through the origin and perpendicular to W. If z and w are nonzero, z is on L, and w is in W, then the line segment from 0 to z is perpendicular to the line segment from 0 to w; that is, z·w = 0. See Fig. 7. So each vector on L is orthogonal to every w in W. In fact, L consists of all vectors that are orthogonal to the w's in W, and W consists of all vectors orthogonal to the z's in L. That is,

    L = W⊥  and  W = L⊥

FIGURE 7  A plane and line through 0 as orthogonal complements.

The following two facts about W⊥, with W a subspace of Rⁿ, are needed later in the chapter. Proofs are suggested in Exercises 29 and 30. Exercises 27–31 provide excellent practice using properties of the inner product.

1. A vector x is in W⊥ if and only if x is orthogonal to every vector in a set that spans W.
2. W⊥ is a subspace of Rⁿ.


The next theorem and Exercise 31 verify the claims made in Section 4.6 concerning the subspaces shown in Fig. 8. (Also see Exercise 28 in Section 4.6.)

FIGURE 8  The fundamental subspaces determined by an m × n matrix A: Row A and Nul A meet only at 0, as do Col A and Nul Aᵀ.

THEOREM 3  Let A be an m × n matrix. The orthogonal complement of the row space of A is the null space of A, and the orthogonal complement of the column space of A is the null space of Aᵀ:

    (Row A)⊥ = Nul A  and  (Col A)⊥ = Nul Aᵀ

PROOF The row–column rule for computing Ax shows that if x is in Nul A, then x is orthogonal to each row of A (with the rows treated as vectors in Rⁿ). Since the rows of A span the row space, x is orthogonal to Row A. Conversely, if x is orthogonal to Row A, then x is certainly orthogonal to each row of A, and hence Ax = 0. This proves the first statement of the theorem. Since this statement is true for any matrix, it is true for Aᵀ. That is, the orthogonal complement of the row space of Aᵀ is the null space of Aᵀ. This proves the second statement, because Row Aᵀ = Col A.

Angles in R² and R³ (Optional)

If u and v are nonzero vectors in either R² or R³, then there is a nice connection between their inner product and the angle θ between the two line segments from the origin to the points identified with u and v. The formula is

    u·v = ‖u‖ ‖v‖ cos θ    (2)

To verify this formula for vectors in R², consider the triangle shown in Fig. 9, with vertices at the origin, (u₁, u₂), and (v₁, v₂), and sides of lengths ‖u‖, ‖v‖, and ‖u − v‖. By the law of cosines,

    ‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖ ‖v‖ cos θ

which can be rearranged to produce

    ‖u‖ ‖v‖ cos θ = (1/2)[‖u‖² + ‖v‖² − ‖u − v‖²]
                  = (1/2)[u₁² + u₂² + v₁² + v₂² − (u₁ − v₁)² − (u₂ − v₂)²]
                  = u₁v₁ + u₂v₂
                  = u·v

FIGURE 9  The angle between two vectors.

The verification for R³ is similar. When n > 3, formula (2) may be used to define the angle between two vectors in Rⁿ. In statistics, for instance, the value of cos θ defined by (2) for suitable vectors u and v is what statisticians call a correlation coefficient.
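Formula (2) is also how software measures the angle between two vectors. A minimal sketch, assuming NumPy; the pair here is an arbitrary 45° example:

```python
# cos(theta) = u.v / (||u|| ||v||), formula (2) (assumes NumPy).
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))   # 45.0
```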

PRACTICE PROBLEMS

Let a = (−2, 1), b = (−3, 1), c = (4/3, −1, 2/3), and d = (5, 6, −1).

1. Compute (a·b)/(a·a) and ((a·b)/(a·a))a.
2. Find a unit vector u in the direction of c.
3. Show that d is orthogonal to c.
4. Use the results of Practice Problems 2 and 3 to explain why d must be orthogonal to the unit vector u.

6.1 EXERCISES

Compute the quantities in Exercises 1–8 using the vectors

    u = (−1, 2),  v = (4, 6),  w = (3, −1, −5),  x = (6, −2, 3)

1. u·u, v·u, and (v·u)/(u·u)
2. w·w, x·w, and (x·w)/(w·w)
3. (1/(w·w))w
4. (1/(u·u))u
5. ((u·v)/(v·v))v
6. ((x·w)/(x·x))x
7. ‖w‖
8. ‖x‖

In Exercises 9–12, find a unit vector in the direction of the given vector.

9. (−30, 40)
10. (−6, 4, −3)
11. (7/4, 1/2, 1)
12. (8/3, 2)

13. Find the distance between x = (10, −3) and y = (−1, −5).
14. Find the distance between u = (0, −5, 2) and z = (−4, −1, 8).

Determine which pairs of vectors in Exercises 15–18 are orthogonal.

15. a = (8, −5), b = (−2, −3)
16. u = (12, 3, −5), v = (2, −3, 3)
17. u = (3, 2, −5, 0), v = (−4, 1, −2, 6)
18. y = (−3, 7, 4, 0), z = (1, −8, 15, −7)

In Exercises 19 and 20, all vectors are in Rⁿ. Mark each statement True or False. Justify each answer.

19. a. v·v = ‖v‖².
    b. For any scalar c, u·(cv) = c(u·v).
    c. If the distance from u to v equals the distance from u to −v, then u and v are orthogonal.
    d. For a square matrix A, vectors in Col A are orthogonal to vectors in Nul A.
    e. If vectors v₁, ..., vₚ span a subspace W and if x is orthogonal to each vⱼ for j = 1, ..., p, then x is in W⊥.


20. a. u·v − v·u = 0.
    b. For any scalar c, ‖cv‖ = c‖v‖.
    c. If x is orthogonal to every vector in a subspace W, then x is in W⊥.
    d. If ‖u‖² + ‖v‖² = ‖u + v‖², then u and v are orthogonal.
    e. For an m × n matrix A, vectors in the null space of A are orthogonal to vectors in the row space of A.

21. Use the transpose definition of the inner product to verify parts (b) and (c) of Theorem 1. Mention the appropriate facts from Chapter 2.

22. Let u = (u₁, u₂, u₃). Explain why u·u ≥ 0. When is u·u = 0?

23. Let u = (2, −5, −1) and v = (−7, −4, 6). Compute and compare u·v, ‖u‖², ‖v‖², and ‖u + v‖². Do not use the Pythagorean Theorem.

24. Verify the parallelogram law for vectors u and v in Rⁿ:

    ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖²

25. Let v = (a, b). Describe the set H of vectors (x, y) that are orthogonal to v. [Hint: Consider v = 0 and v ≠ 0.]

26. Let u = (5, −6, 7), and let W be the set of all x in R³ such that u·x = 0. What theorem in Chapter 4 can be used to show that W is a subspace of R³? Describe W in geometric language.

27. Suppose a vector y is orthogonal to vectors u and v. Show that y is orthogonal to the vector u + v.

28. Suppose y is orthogonal to u and v. Show that y is orthogonal to every w in Span{u, v}. [Hint: An arbitrary w in Span{u, v} has the form w = c₁u + c₂v. Show that y is orthogonal to such a vector w. The accompanying figure shows y perpendicular to the plane Span{u, v} containing u, v, and w.]

29. Let W = Span{v₁, ..., vₚ}. Show that if x is orthogonal to each vⱼ, for 1 ≤ j ≤ p, then x is orthogonal to every vector in W.

30. Let W be a subspace of Rⁿ, and let W⊥ be the set of all vectors orthogonal to W. Show that W⊥ is a subspace of Rⁿ using the following steps.
    a. Take z in W⊥, and let u represent any element of W. Then z·u = 0. Take any scalar c and show that cz is orthogonal to u. (Since u was an arbitrary element of W, this will show that cz is in W⊥.)
    b. Take z₁ and z₂ in W⊥, and let u be any element of W. Show that z₁ + z₂ is orthogonal to u. What can you conclude about z₁ + z₂? Why?
    c. Finish the proof that W⊥ is a subspace of Rⁿ.

31. Show that if x is in both W and W⊥, then x = 0.

32. [M] Construct a pair u, v of random vectors in R⁴, and let

    A = [ .5  .5  .5  .5
          .5  .5 −.5 −.5
          .5 −.5  .5 −.5
          .5 −.5 −.5  .5 ]

    a. Denote the columns of A by a₁, ..., a₄. Compute the length of each column, and compute a₁·a₂, a₁·a₃, a₁·a₄, a₂·a₃, a₂·a₄, and a₃·a₄.
    b. Compute and compare the lengths of u, Au, v, and Av.
    c. Use equation (2) in this section to compute the cosine of the angle between u and v. Compare this with the cosine of the angle between Au and Av.
    d. Repeat parts (b) and (c) for two other pairs of random vectors. What do you conjecture about the effect of A on vectors?

33. [M] Generate random vectors x, y, and v in R⁴ with integer entries (and v ≠ 0), and compute the quantities

    ((x·v)/(v·v))v,  ((y·v)/(v·v))v,  (((x + y)·v)/(v·v))v,  (((10x)·v)/(v·v))v

    Repeat the computations with new random vectors x and y. What do you conjecture about the mapping x ↦ T(x) = ((x·v)/(v·v))v (for v ≠ 0)? Verify your conjecture algebraically.

34. [M] Let

    A = [ −6   3 −27 −33 −13
           6  −5  25  28  14
           8  −6  34  38  18
          12 −10  50  41  23
          14 −21  49  29  33 ]

    Construct a matrix N whose columns form a basis for Nul A, and construct a matrix R whose rows form a basis for Row A (see Section 4.6 for details). Perform a matrix computation with N and R that illustrates a fact from Theorem 3.


SOLUTIONS TO PRACTICE PROBLEMS

1. a·b = 7 and a·a = 5. Hence (a·b)/(a·a) = 7/5, and ((a·b)/(a·a))a = (7/5)a = (−14/5, 7/5).

2. Scale c, multiplying by 3 to get y = (4, −3, 2). Compute ‖y‖² = 29 and ‖y‖ = √29. The unit vector in the direction of both c and y is

    u = (1/‖y‖)y = (4/√29, −3/√29, 2/√29)

3. d is orthogonal to c, because

    d·c = (5)(4/3) + (6)(−1) + (−1)(2/3) = 20/3 − 6 − 2/3 = 0

4. d is orthogonal to u because u has the form kc for some k, and

    d·u = d·(kc) = k(d·c) = k(0) = 0

6.2 ORTHOGONAL SETS

A set of vectors {u₁, ..., uₚ} in Rⁿ is said to be an orthogonal set if each pair of distinct vectors from the set is orthogonal, that is, if uᵢ·uⱼ = 0 whenever i ≠ j.

EXAMPLE 1 Show that {u₁, u₂, u₃} is an orthogonal set, where

    u₁ = (3, 1, 1),  u₂ = (−1, 2, 1),  u₃ = (−1/2, −2, 7/2)

SOLUTION Consider the three possible pairs of distinct vectors, namely, {u₁, u₂}, {u₁, u₃}, and {u₂, u₃}.

    u₁·u₂ = 3(−1) + 1(2) + 1(1) = 0
    u₁·u₃ = 3(−1/2) + 1(−2) + 1(7/2) = 0
    u₂·u₃ = −1(−1/2) + 2(−2) + 1(7/2) = 0

Each pair of distinct vectors is orthogonal, and so {u₁, u₂, u₃} is an orthogonal set. See Fig. 1; the three line segments there are mutually perpendicular.

FIGURE 1  The orthogonal set {u₁, u₂, u₃} in R³.

THEOREM 4  If S = {u₁, ..., uₚ} is an orthogonal set of nonzero vectors in Rⁿ, then S is linearly independent and hence is a basis for the subspace spanned by S.

PROOF If 0 = c₁u₁ + ⋯ + cₚuₚ for some scalars c₁, ..., cₚ, then

    0 = 0·u₁ = (c₁u₁ + c₂u₂ + ⋯ + cₚuₚ)·u₁
      = (c₁u₁)·u₁ + (c₂u₂)·u₁ + ⋯ + (cₚuₚ)·u₁
      = c₁(u₁·u₁) + c₂(u₂·u₁) + ⋯ + cₚ(uₚ·u₁)
      = c₁(u₁·u₁)

because u₁ is orthogonal to u₂, ..., uₚ. Since u₁ is nonzero, u₁·u₁ is not zero and so c₁ = 0. Similarly, c₂, ..., cₚ must be zero. Thus S is linearly independent.


DEFINITION  An orthogonal basis for a subspace W of Rⁿ is a basis for W that is also an orthogonal set.

The next theorem suggests why an orthogonal basis is much nicer than other bases: the weights in a linear combination can be computed easily.

THEOREM 5  Let {u₁, ..., uₚ} be an orthogonal basis for a subspace W of Rⁿ. For each y in W, the weights in the linear combination

    y = c₁u₁ + ⋯ + cₚuₚ

are given by

    cⱼ = (y·uⱼ)/(uⱼ·uⱼ)   (j = 1, ..., p)

PROOF As in the preceding proof, the orthogonality of {u₁, ..., uₚ} shows that

    y·u₁ = (c₁u₁ + c₂u₂ + ⋯ + cₚuₚ)·u₁ = c₁(u₁·u₁)

Since u₁·u₁ is not zero, the equation above can be solved for c₁. To find cⱼ for j = 2, ..., p, compute y·uⱼ and solve for cⱼ.

EXAMPLE 2 The set S = {u₁, u₂, u₃} in Example 1 is an orthogonal basis for R³. Express the vector y = (6, 1, −8) as a linear combination of the vectors in S.

SOLUTION Compute

    y·u₁ = 11,   y·u₂ = −12,   y·u₃ = −33
    u₁·u₁ = 11,  u₂·u₂ = 6,    u₃·u₃ = 33/2

By Theorem 5,

    y = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂ + ((y·u₃)/(u₃·u₃))u₃
      = (11/11)u₁ + (−12/6)u₂ + (−33/(33/2))u₃
      = u₁ − 2u₂ − 2u₃

Notice how easy it is to compute the weights needed to build y from an orthogonal basis. If the basis were not orthogonal, it would be necessary to solve a system of linear equations in order to find the weights, as in Chapter 1.
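Theorem 5 reduces coordinate-finding to p inner products, with no system of equations to solve. Below is a sketch of Example 2's computation, assuming NumPy:

```python
# Weights relative to an orthogonal basis: c_j = (y.u_j)/(u_j.u_j) (assumes NumPy).
import numpy as np

u1 = np.array([3.0, 1.0, 1.0])
u2 = np.array([-1.0, 2.0, 1.0])
u3 = np.array([-0.5, -2.0, 3.5])
y = np.array([6.0, 1.0, -8.0])

c = [np.dot(y, u) / np.dot(u, u) for u in (u1, u2, u3)]
print(c)                                             # [1.0, -2.0, -2.0]
print(np.allclose(y, c[0]*u1 + c[1]*u2 + c[2]*u3))   # True: y = u1 - 2u2 - 2u3
```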

We turn next to a construction that will become a key step in many calculations involving orthogonality, and it will lead to a geometric interpretation of Theorem 5.

An Orthogonal Projection

Given a nonzero vector u in Rⁿ, consider the problem of decomposing a vector y in Rⁿ into the sum of two vectors, one a multiple of u and the other orthogonal to u. We wish to write

    y = ŷ + z    (1)

where ŷ = αu for some scalar α and z is some vector orthogonal to u. See Fig. 2. Given any scalar α, let z = y − αu, so that (1) is satisfied. Then y − ŷ is orthogonal to u if and only if

    0 = (y − αu)·u = y·u − (αu)·u = y·u − α(u·u)

That is, (1) is satisfied with z orthogonal to u if and only if α = (y·u)/(u·u) and ŷ = ((y·u)/(u·u))u. The vector ŷ is called the orthogonal projection of y onto u, and the vector z is called the component of y orthogonal to u.

FIGURE 2  Finding α to make y − ŷ orthogonal to u: ŷ = αu and z = y − ŷ.

If c is any nonzero scalar and if u is replaced by cu in the definition of ŷ, then the orthogonal projection of y onto cu is exactly the same as the orthogonal projection of y onto u (Exercise 31). Hence this projection is determined by the subspace L spanned by u (the line through u and 0). Sometimes ŷ is denoted by proj_L y and is called the orthogonal projection of y onto L. That is,

    ŷ = proj_L y = ((y·u)/(u·u))u    (2)

EXAMPLE 3 Let y = (7, 6) and u = (4, 2). Find the orthogonal projection of y onto u. Then write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.

SOLUTION Compute

    y·u = (7)(4) + (6)(2) = 40
    u·u = (4)(4) + (2)(2) = 20

The orthogonal projection of y onto u is

    ŷ = ((y·u)/(u·u))u = (40/20)u = 2(4, 2) = (8, 4)

and the component of y orthogonal to u is

    y − ŷ = (7, 6) − (8, 4) = (−1, 2)

The sum of these two vectors is y. That is,

    (7, 6) = (8, 4) + (−1, 2)
      y        ŷ       y − ŷ

This decomposition of y is illustrated in Fig. 3. Note: If the calculations above are correct, then {ŷ, y − ŷ} will be an orthogonal set. As a check, compute

    ŷ·(y − ŷ) = (8)(−1) + (4)(2) = −8 + 8 = 0

Since the line segment in Fig. 3 between y and ŷ is perpendicular to L, by construction of ŷ, the point identified with ŷ is the closest point of L to y. (This can be proved from geometry. We will assume this for R² now and prove it for Rⁿ in Section 6.3.)
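Formula (2) packaged as a small helper. This sketch assumes NumPy; proj_line is a hypothetical name, and the data are those of Example 3:

```python
# Orthogonal projection of y onto the line spanned by u (assumes NumPy).
import numpy as np

def proj_line(y, u):
    """Return y-hat = ((y.u)/(u.u)) u, the projection of y onto Span{u}."""
    return (np.dot(y, u) / np.dot(u, u)) * u

y = np.array([7.0, 6.0])
u = np.array([4.0, 2.0])

y_hat = proj_line(y, u)
print(y_hat)                       # [8. 4.]
print(y - y_hat)                   # [-1.  2.], the component orthogonal to u
print(np.dot(y_hat, y - y_hat))    # 0.0, so the two pieces are orthogonal
```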


FIGURE 3  The orthogonal projection of y onto a line L through the origin: L = Span{u}, with ŷ on L and y − ŷ perpendicular to L.

EXAMPLE 4 Find the distance in Fig. 3 from y to L.

SOLUTION The distance from y to L is the length of the perpendicular line segment from y to the orthogonal projection ŷ. This length equals the length of y − ŷ. Thus the distance is

    ‖y − ŷ‖ = √((−1)² + 2²) = √5

A Geometric Interpretation of Theorem 5

The formula for the orthogonal projection ŷ in (2) has the same appearance as each of the terms in Theorem 5. Thus Theorem 5 decomposes a vector y into a sum of orthogonal projections onto one-dimensional subspaces.

It is easy to visualize the case in which W = R² = Span{u₁, u₂}, with u₁ and u₂ orthogonal. Any y in R² can be written in the form

    y = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂    (3)

The first term in (3) is the projection of y onto the subspace spanned by u₁ (the line through u₁ and the origin), and the second term is the projection of y onto the subspace spanned by u₂. Thus (3) expresses y as the sum of its projections onto the (orthogonal) axes determined by u₁ and u₂. See Fig. 4.

FIGURE 4  A vector decomposed into the sum of two projections: ŷ₁ is the projection onto u₁, and ŷ₂ is the projection onto u₂.

Theorem 5 decomposes each y in Span{u₁, ..., uₚ} into the sum of p projections onto one-dimensional subspaces that are mutually orthogonal.


Decomposing a Force into Component Forces

The decomposition in Fig. 4 can occur in physics when some sort of force is applied to an object. Choosing an appropriate coordinate system allows the force to be represented by a vector y in R² or R³. Often the problem involves some particular direction of interest, which is represented by another vector u. For instance, if the object is moving in a straight line when the force is applied, the vector u might point in the direction of movement, as in Fig. 5. A key step in the problem is to decompose the force into a component in the direction of u and a component orthogonal to u. The calculations would be analogous to those made in Example 3 above.

FIGURE 5  A force y decomposed relative to a direction of motion u.

Orthonormal Sets

A set {u₁, ..., uₚ} is an orthonormal set if it is an orthogonal set of unit vectors. If W is the subspace spanned by such a set, then {u₁, ..., uₚ} is an orthonormal basis for W, since the set is automatically linearly independent, by Theorem 4.

The simplest example of an orthonormal set is the standard basis {e₁, ..., eₙ} for Rⁿ. Any nonempty subset of {e₁, ..., eₙ} is orthonormal, too. Here is a more complicated example.

EXAMPLE 5 Show that {v₁, v₂, v₃} is an orthonormal basis of R³, where

    v₁ = (3/√11, 1/√11, 1/√11),  v₂ = (−1/√6, 2/√6, 1/√6),  v₃ = (−1/√66, −4/√66, 7/√66)

SOLUTION Compute

    v₁·v₂ = −3/√66 + 2/√66 + 1/√66 = 0
    v₁·v₃ = −3/√726 − 4/√726 + 7/√726 = 0
    v₂·v₃ = 1/√396 − 8/√396 + 7/√396 = 0

Thus {v₁, v₂, v₃} is an orthogonal set. Also,

    v₁·v₁ = 9/11 + 1/11 + 1/11 = 1
    v₂·v₂ = 1/6 + 4/6 + 1/6 = 1
    v₃·v₃ = 1/66 + 16/66 + 49/66 = 1

which shows that v₁, v₂, and v₃ are unit vectors. Thus {v₁, v₂, v₃} is an orthonormal set. Since the set is linearly independent, its three vectors form a basis for R³. See Fig. 6.

FIGURE 6  The orthonormal set {v₁, v₂, v₃} in R³.


When the vectors in an orthogonal set of nonzero vectors are normalized to have unit length, the new vectors will still be orthogonal, and hence the new set will be an orthonormal set. See Exercise 32. It is easy to check that the vectors in Fig. 6 (Example 5) are simply the unit vectors in the directions of the vectors in Fig. 1 (Example 1).

Matrices whose columns form an orthonormal set are important in applications and in computer algorithms for matrix computations. Their main properties are given in Theorems 6 and 7.

THEOREM 6  An m × n matrix U has orthonormal columns if and only if UᵀU = I.

PROOF To simplify notation, we suppose that U has only three columns, each a vector in Rᵐ. The proof of the general case is essentially the same. Let U = [u₁ u₂ u₃] and compute

    UᵀU = [ u₁ᵀu₁  u₁ᵀu₂  u₁ᵀu₃
            u₂ᵀu₁  u₂ᵀu₂  u₂ᵀu₃
            u₃ᵀu₁  u₃ᵀu₂  u₃ᵀu₃ ]    (4)

The entries in the matrix at the right are inner products, using transpose notation. The columns of U are orthogonal if and only if

    u₁ᵀu₂ = u₂ᵀu₁ = 0,  u₁ᵀu₃ = u₃ᵀu₁ = 0,  u₂ᵀu₃ = u₃ᵀu₂ = 0    (5)

The columns of U all have unit length if and only if

    u₁ᵀu₁ = 1,  u₂ᵀu₂ = 1,  u₃ᵀu₃ = 1    (6)

The theorem follows immediately from (4)–(6).

THEOREM 7  Let U be an m × n matrix with orthonormal columns, and let x and y be in Rⁿ. Then

a. ‖Ux‖ = ‖x‖
b. (Ux)·(Uy) = x·y
c. (Ux)·(Uy) = 0 if and only if x·y = 0

Properties (a) and (c) say that the linear mapping x ↦ Ux preserves lengths and orthogonality. These properties are crucial for many computer algorithms. See Exercise 25 for the proof of Theorem 7.

EXAMPLE 6 Let

    U = [ 1/√2   2/3
          1/√2  −2/3
          0      1/3 ]    and    x = (√2, 3)

Notice that U has orthonormal columns; multiplying out UᵀU entry by entry confirms that

    UᵀU = [ 1  0
            0  1 ]

Verify that ‖Ux‖ = ‖x‖.


SOLUTION

    Ux = ( (1/√2)(√2) + (2/3)(3),  (1/√2)(√2) − (2/3)(3),  (0)(√2) + (1/3)(3) ) = (3, −1, 1)
    ‖Ux‖ = √(9 + 1 + 1) = √11
    ‖x‖ = √(2 + 9) = √11

Theorems 6 and 7 are particularly useful when applied to square matrices. An orthogonal matrix is a square invertible matrix U such that U⁻¹ = Uᵀ. By Theorem 6, such a matrix has orthonormal columns.¹ It is easy to see that any square matrix with orthonormal columns is an orthogonal matrix. Surprisingly, such a matrix must have orthonormal rows, too. See Exercises 27 and 28. Orthogonal matrices will appear frequently in Chapter 7.

¹ A better name might be orthonormal matrix, and this term is found in some statistics texts. However, orthogonal matrix is the standard term in linear algebra.

EXAMPLE 7 The matrix

    U = [ 3/√11  −1/√6  −1/√66
          1/√11   2/√6  −4/√66
          1/√11   1/√6   7/√66 ]

is an orthogonal matrix because it is square and because its columns are orthonormal, by Example 5. Verify that the rows are orthonormal, too!

PRACTICE PROBLEMS

1. Let u₁ = (−1/√5, 2/√5) and u₂ = (2/√5, 1/√5). Show that {u₁, u₂} is an orthonormal basis for R².

2. Let y and L be as in Example 3 and Fig. 3. Compute the orthogonal projection ŷ of y onto L using u = (2, 1) instead of the u in Example 3.

3. Let U and x be as in Example 6, and let y = (−3√2, 6). Verify that Ux·Uy = x·y.

6.2 EXERCISES

In Exercises 1–6, determine which sets of vectors are orthogonal.

1. (−1, 4, −3), (5, 2, 1), (3, −4, −7)
2. (1, −2, 1), (0, 1, 2), (−5, −2, 1)
3. (2, −7, −1), (−6, −3, 9), (3, 1, −1)
4. (2, −5, −3), (0, 0, 0), (4, −2, 6)
5. (3, −2, 1, 3), (−1, 3, −3, 4), (3, 8, 7, 0)
6. (5, −4, 0, 3), (−4, 1, −3, 8), (3, 3, 5, −1)

In Exercises 7–10, show that {u₁, u₂} or {u₁, u₂, u₃} is an orthogonal basis for R² or R³, respectively. Then express x as a linear combination of the u's.

7. u₁ = (2, −3), u₂ = (6, 4), and x = (9, −7)


8. u₁ = (3, 1), u₂ = (−2, 6), and x = (−6, 3)
9. u₁ = (1, 0, 1), u₂ = (−1, 4, 1), u₃ = (2, 1, −2), and x = (8, −4, −3)
10. u₁ = (3, −3, 0), u₂ = (2, 2, −1), u₃ = (1, 1, 4), and x = (5, −3, 1)

11. Compute the orthogonal projection of (1, 7) onto the line through (−4, 2) and the origin.
12. Compute the orthogonal projection of (1, −1) onto the line through (−1, 3) and the origin.
13. Let y = (2, 3) and u = (4, −7). Write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.
14. Let y = (2, 6) and u = (7, 1). Write y as the sum of a vector in Span{u} and a vector orthogonal to u.
15. Let y = (3, 1) and u = (8, 6). Compute the distance from y to the line through u and the origin.
16. Let y = (−3, 9) and u = (1, 2). Compute the distance from y to the line through u and the origin.

In Exercises 17–22, determine which sets of vectors are orthonormal. If a set is only orthogonal, normalize the vectors to produce an orthonormal set.

17. (1/3, 1/3, 1/3), (−1/2, 0, 1/2)
18. (0, 1, 0), (0, −1, 0)
19. (−.6, .8), (.8, .6)
20. (−2/3, 1/3, 2/3), (1/3, 2/3, 0)
21. (1/√10, 3/√20, 3/√20), (3/√10, −1/√20, −1/√20), (0, −1/√2, 1/√2)
22. (1/√18, 4/√18, 1/√18), (1/√2, 0, −1/√2), (−2/3, 1/3, −2/3)

In Exercises 23 and 24, all vectors are in Rⁿ. Mark each statement True or False. Justify each answer.

23. a. Not every linearly independent set in Rⁿ is an orthogonal set.
    b. If y is a linear combination of nonzero vectors from an orthogonal set, then the weights in the linear combination can be computed without row operations on a matrix.
    c. If the vectors in an orthogonal set of nonzero vectors are normalized, then some of the new vectors may not be orthogonal.
    d. A matrix with orthonormal columns is an orthogonal matrix.
    e. If L is a line through 0 and if ŷ is the orthogonal projection of y onto L, then ‖ŷ‖ gives the distance from y to L.

24. a. Not every orthogonal set in Rⁿ is linearly independent.
    b. If a set S = {u₁, ..., uₚ} has the property that uᵢ·uⱼ = 0 whenever i ≠ j, then S is an orthonormal set.
    c. If the columns of an m × n matrix A are orthonormal, then the linear mapping x ↦ Ax preserves lengths.
    d. The orthogonal projection of y onto v is the same as the orthogonal projection of y onto cv whenever c ≠ 0.
    e. An orthogonal matrix is invertible.

25. Prove Theorem 7. [Hint: For (a), compute ‖Ux‖², or prove (b) first.]

26. Suppose W is a subspace of Rⁿ spanned by n nonzero orthogonal vectors. Explain why W = Rⁿ.

27. Let U be a square matrix with orthonormal columns. Explain why U is invertible. (Mention the theorems you use.)

28. Let U be an n × n orthogonal matrix. Show that the rows of U form an orthonormal basis of Rⁿ.

29. Let U and V be n × n orthogonal matrices. Explain why UV is an orthogonal matrix. [That is, explain why UV is invertible and its inverse is (UV)ᵀ.]

30. Let U be an orthogonal matrix, and construct V by interchanging some of the columns of U. Explain why V is an orthogonal matrix.

31. Show that the orthogonal projection of a vector y onto a line L through the origin in R² does not depend on the choice of the nonzero u in L used in the formula for ŷ. To do this, suppose y and u are given and ŷ has been computed by formula (2) in this section. Replace u in that formula by cu, where c is an unspecified nonzero scalar. Show that the new formula gives the same ŷ.

32. Let {v₁, v₂} be an orthogonal set of nonzero vectors, and let c₁, c₂ be any nonzero scalars. Show that {c₁v₁, c₂v₂} is also an orthogonal set. Since orthogonality of a set is defined in terms of pairs of vectors, this shows that if the vectors in an orthogonal set are normalized, the new set will still be orthogonal.

33. Given u ≠ 0 in Rⁿ, let L = Span{u}. Show that the mapping x ↦ proj_L x is a linear transformation.

34. Given u ≠ 0 in Rⁿ, let L = Span{u}. For y in Rⁿ, the reflection of y in L is the point refl_L y defined by

    refl_L y = 2 proj_L y − y

    See the figure, which shows that refl_L y is the sum of ŷ = proj_L y and ŷ − y. Show that the mapping y ↦ refl_L y is a linear transformation.

    [Figure: The reflection of y in a line through the origin, with L = Span{u}, ŷ = proj_L y, and refl_L y = ŷ + (ŷ − y).]

35. [M] Show that the columns of the matrix A are orthogonal by making an appropriate matrix calculation. State the calculation you use.

    A = [ −6  −3   6   1
          −1   2   1  −6
           3   6   3  −2
           6  −3   6  −1
           2  −1   2   3
          −3   6   3   2
          −2  −1   2  −3
           1   2   1   6 ]

36. [M] In parts (a)–(d), let U be the matrix formed by normalizing each column of the matrix A in Exercise 35.
    a. Compute UᵀU and UUᵀ. How do they differ?
    b. Generate a random vector y in R⁸, and compute p = UUᵀy and z = y − p. Explain why p is in Col A. Verify that z is orthogonal to p.
    c. Verify that z is orthogonal to each column of U.
    d. Notice that y = p + z, with p in Col A. Explain why z is in (Col A)⊥. (The significance of this decomposition of y will be explained in the next section.)

SOLUTIONS TO PRACTICE PROBLEMS

1. The vectors are orthogonal because

    u₁·u₂ = −2/5 + 2/5 = 0

They are unit vectors because

    ‖u₁‖² = (−1/√5)² + (2/√5)² = 1/5 + 4/5 = 1
    ‖u₂‖² = (2/√5)² + (1/√5)² = 4/5 + 1/5 = 1

In particular, the set {u₁, u₂} is linearly independent, and hence is a basis for R² since there are two vectors in the set.

2. When y = (7, 6) and u = (2, 1),

    ŷ = ((y·u)/(u·u))u = (20/5)(2, 1) = 4(2, 1) = (8, 4)

This is the same ŷ found in Example 3. The orthogonal projection does not seem to depend on the u chosen on the line. See Exercise 31.

3. Uy = ( (1/√2)(−3√2) + (2/3)(6), (1/√2)(−3√2) − (2/3)(6), (1/3)(6) ) = (1, −7, 2). Also, from Example 6, x = (√2, 3) and Ux = (3, −1, 1). Hence

    Ux·Uy = 3 + 7 + 2 = 12,  and  x·y = −6 + 18 = 12


6.3 ORTHOGONAL PROJECTIONS

The orthogonal projection of a point in R² onto a line through the origin has an important analogue in Rⁿ. Given a vector y and a subspace W in Rⁿ, there is a vector ŷ in W such that (1) ŷ is the unique vector in W for which y − ŷ is orthogonal to W, and (2) ŷ is the unique vector in W closest to y. See Fig. 1. These two properties of ŷ provide the key to finding least-squares solutions of linear systems, mentioned in the introductory example for this chapter. The full story will be told in Section 6.5.

FIGURE 1  The projection ŷ of y onto a subspace W.

To prepare for the first theorem, observe that whenever a vector y is written as a linear combination of vectors u₁, ..., uₙ in Rⁿ, the terms in the sum for y can be grouped into two parts so that y can be written as

    y = z₁ + z₂

where z₁ is a linear combination of some of the uᵢ and z₂ is a linear combination of the rest of the uᵢ. This idea is particularly useful when {u₁, ..., uₙ} is an orthogonal basis. Recall from Section 6.1 that W⊥ denotes the set of all vectors orthogonal to a subspace W.

EXAMPLE 1 Let {u₁, ..., u₅} be an orthogonal basis for R⁵ and let

    y = c₁u₁ + ⋯ + c₅u₅

Consider the subspace W = Span{u₁, u₂}, and write y as the sum of a vector z₁ in W and a vector z₂ in W⊥.

SOLUTION Write

    y = (c₁u₁ + c₂u₂) + (c₃u₃ + c₄u₄ + c₅u₅) = z₁ + z₂

where z₁ = c₁u₁ + c₂u₂ is in Span{u₁, u₂} and z₂ = c₃u₃ + c₄u₄ + c₅u₅ is in Span{u₃, u₄, u₅}.

To show that z₂ is in W⊥, it suffices to show that z₂ is orthogonal to the vectors in the basis {u₁, u₂} for W. (See Section 6.1.) Using properties of the inner product, compute

    z₂·u₁ = (c₃u₃ + c₄u₄ + c₅u₅)·u₁
          = c₃(u₃·u₁) + c₄(u₄·u₁) + c₅(u₅·u₁)
          = 0

because u₁ is orthogonal to u₃, u₄, and u₅. A similar calculation shows that z₂·u₂ = 0. Thus z₂ is in W⊥.

The next theorem shows that the decomposition y = z₁ + z₂ in Example 1 can be computed without having an orthogonal basis for Rⁿ. It is enough to have an orthogonal basis only for W.


THEOREM 8  The Orthogonal Decomposition Theorem

Let W be a subspace of Rⁿ. Then each y in Rⁿ can be written uniquely in the form

    y = ŷ + z    (1)

where ŷ is in W and z is in W⊥. In fact, if {u₁, ..., uₚ} is any orthogonal basis of W, then

    ŷ = ((y·u₁)/(u₁·u₁))u₁ + ⋯ + ((y·uₚ)/(uₚ·uₚ))uₚ    (2)

and z = y − ŷ.

The vector ŷ in (1) is called the orthogonal projection of y onto W and often is written as proj_W y. See Fig. 2. When W is a one-dimensional subspace, the formula for ŷ matches the formula given in Section 6.2.

FIGURE 2  The orthogonal projection of y onto W: ŷ = proj_W y and z = y − ŷ.

PROOF Let {u₁, ..., uₚ} be any orthogonal basis for W, and define ŷ by (2).¹ Then ŷ is in W because ŷ is a linear combination of the basis u₁, ..., uₚ. Let z = y − ŷ. Since u₁ is orthogonal to u₂, ..., uₚ, it follows from (2) that

    z·u₁ = (y − ŷ)·u₁ = y·u₁ − ((y·u₁)/(u₁·u₁))(u₁·u₁) − 0 − ⋯ − 0
         = y·u₁ − y·u₁ = 0

Thus z is orthogonal to u₁. Similarly, z is orthogonal to each uⱼ in the basis for W. Hence z is orthogonal to every vector in W. That is, z is in W⊥.

To show that the decomposition in (1) is unique, suppose y can also be written as y = ŷ₁ + z₁, with ŷ₁ in W and z₁ in W⊥. Then ŷ + z = ŷ₁ + z₁ (since both sides equal y), and so

    ŷ − ŷ₁ = z₁ − z

This equality shows that the vector v = ŷ − ŷ₁ is in W and in W⊥ (because z₁ and z are both in W⊥, and W⊥ is a subspace). Hence v·v = 0, which shows that v = 0. This proves that ŷ = ŷ₁ and also z₁ = z.

The uniqueness of the decomposition (1) shows that the orthogonal projection ŷ depends only on W and not on the particular basis used in (2).

¹ We may assume that W is not the zero subspace, for otherwise W⊥ = Rⁿ and (1) is simply y = 0 + y. The next section will show that any nonzero subspace of Rⁿ has an orthogonal basis.


EXAMPLE 2 Let u₁ = (2, 5, −1), u₂ = (−2, 1, 1), and y = (1, 2, 3). Observe that {u₁, u₂} is an orthogonal basis for W = Span{u₁, u₂}. Write y as the sum of a vector in W and a vector orthogonal to W.

SOLUTION The orthogonal projection of y onto W is

    ŷ = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂
      = (9/30)(2, 5, −1) + (3/6)(−2, 1, 1)
      = (9/30)(2, 5, −1) + (15/30)(−2, 1, 1) = (−2/5, 2, 1/5)

Also,

    y − ŷ = (1, 2, 3) − (−2/5, 2, 1/5) = (7/5, 0, 14/5)

Theorem 8 ensures that y − ŷ is in W⊥. To check the calculations, however, it is a good idea to verify that y − ŷ is orthogonal to both u₁ and u₂ and hence to all of W. The desired decomposition of y is

    y = (1, 2, 3) = (−2/5, 2, 1/5) + (7/5, 0, 14/5)

A Geometric Interpretation of the Orthogonal Projection

When W is a one-dimensional subspace, the formula (2) for proj_W y contains just one term. Thus, when dim W > 1, each term in (2) is itself an orthogonal projection of y onto a one-dimensional subspace spanned by one of the u's in the basis for W. Figure 3 illustrates this when W is a subspace of R³ spanned by u₁ and u₂. Here ŷ₁ and ŷ₂ denote the projections of y onto the lines spanned by u₁ and u₂, respectively. The orthogonal projection ŷ of y onto W is the sum of the projections of y onto one-dimensional subspaces that are orthogonal to each other. The vector ŷ in Fig. 3 corresponds to the vector y in Fig. 4 of Section 6.2, because now it is ŷ that is in W.

FIGURE 3  The orthogonal projection of y is the sum of its projections onto one-dimensional subspaces that are mutually orthogonal:

    ŷ = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂ = ŷ₁ + ŷ₂


Properties of Orthogonal Projections

If {u₁, ..., uₚ} is an orthogonal basis for W and if y happens to be in W, then the formula for proj_W y is exactly the same as the representation of y given in Theorem 5 in Section 6.2. In this case, proj_W y = y.

    If y is in W = Span{u₁, ..., uₚ}, then proj_W y = y.

This fact also follows from the next theorem.

THEOREM 9  The Best Approximation Theorem

Let W be a subspace of Rⁿ, let y be any vector in Rⁿ, and let ŷ be the orthogonal projection of y onto W. Then ŷ is the closest point in W to y, in the sense that

    ‖y − ŷ‖ < ‖y − v‖    (3)

for all v in W distinct from ŷ.

The vector ŷ in Theorem 9 is called the best approximation to y by elements of W. Later sections in the text will examine problems where a given y must be replaced, or approximated, by a vector v in some fixed subspace W. The distance from y to v, given by ‖y − v‖, can be regarded as the "error" of using v in place of y. Theorem 9 says that this error is minimized when v = ŷ.

Inequality (3) leads to a new proof that ŷ does not depend on the particular orthogonal basis used to compute it. If a different orthogonal basis for W were used to construct an orthogonal projection of y, then this projection would also be the closest point in W to y, namely, ŷ.

PROOF Take v in W distinct from ŷ. See Fig. 4. Then ŷ − v is in W. By the Orthogonal Decomposition Theorem, y − ŷ is orthogonal to W. In particular, y − ŷ is orthogonal to ŷ − v (which is in W). Since

    y − v = (y − ŷ) + (ŷ − v)

the Pythagorean Theorem gives

    ‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²

(See the colored right triangle in Fig. 4. The length of each side is labeled.) Now ‖ŷ − v‖² > 0 because ŷ − v ≠ 0, and so inequality (3) follows immediately.

FIGURE 4  The orthogonal projection of y onto W is the closest point in W to y.


EXAMPLE 3 If u₁ = (2, 5, −1), u₂ = (−2, 1, 1), y = (1, 2, 3), and W = Span{u₁, u₂}, as in Example 2, then the closest point in W to y is

    ŷ = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂ = (−2/5, 2, 1/5)

EXAMPLE 4 The distance from a point y in Rⁿ to a subspace W is defined as the distance from y to the nearest point in W. Find the distance from y to W = Span{u₁, u₂}, where

    y = (−1, −5, 10),  u₁ = (5, −2, 1),  u₂ = (1, 2, −1)

SOLUTION By the Best Approximation Theorem, the distance from y to W is ‖y − ŷ‖, where ŷ = proj_W y. Since {u₁, u₂} is an orthogonal basis for W,

    ŷ = (15/30)u₁ + (−21/6)u₂ = (1/2)(5, −2, 1) − (7/2)(1, 2, −1) = (−1, −8, 4)
    y − ŷ = (−1, −5, 10) − (−1, −8, 4) = (0, 3, 6)
    ‖y − ŷ‖² = 3² + 6² = 45

The distance from y to W is √45 = 3√5.

The final theorem in this section shows how formula (2) for proj_W y is simplified when the basis for W is an orthonormal set.

THEOREM 10  If {u₁, ..., uₚ} is an orthonormal basis for a subspace W of Rⁿ, then

    proj_W y = (y·u₁)u₁ + (y·u₂)u₂ + ⋯ + (y·uₚ)uₚ    (4)

If U = [u₁ u₂ ⋯ uₚ], then

    proj_W y = UUᵀy  for all y in Rⁿ    (5)

PROOF Formula (4) follows immediately from (2) in Theorem 8. Also, (4) shows that proj_W y is a linear combination of the columns of U using the weights y·u₁, y·u₂, ..., y·uₚ. The weights can be written as u₁ᵀy, u₂ᵀy, ..., uₚᵀy, showing that they are the entries in Uᵀy and justifying (5).

Suppose U is an n × p matrix with orthonormal columns, and let W be the column space of U. Then

    UᵀUx = Iₚx = x   for all x in Rᵖ    (Theorem 6)
    UUᵀy = proj_W y  for all y in Rⁿ    (Theorem 10)

If U is an n × n (square) matrix with orthonormal columns, then U is an orthogonal matrix, the column space W is all of Rⁿ, and UUᵀy = Iy = y for all y in Rⁿ.

Although formula (4) is important for theoretical purposes, in practice it usually involves calculations with square roots of numbers (in the entries of the uᵢ). Formula (2) is recommended for hand calculations.


PRACTICE PROBLEM

Let u₁ = (−7, 1, 4), u₂ = (−1, 1, −2), y = (−9, 1, 6), and W = Span{u₁, u₂}. Use the fact that u₁ and u₂ are orthogonal to compute proj_W y.

6.3 EXERCISES

In Exercises 1 and 2, you may assume that {u₁, ..., u₄} is an orthogonal basis for R⁴.

1. u₁ = (0, 1, −4, −1), u₂ = (3, 5, 1, 1), u₃ = (1, 0, 1, −4), u₄ = (5, −3, −1, 1), x = (10, −8, 2, 0). Write x as the sum of two vectors, one in Span{u₁, u₂, u₃} and the other in Span{u₄}.

2. u₁ = (1, 2, 1, 1), u₂ = (−2, 1, −1, 1), u₃ = (1, 1, −2, −1), u₄ = (−1, 1, 1, −2), v = (4, 5, −3, 3). Write v as the sum of two vectors, one in Span{u₁} and the other in Span{u₂, u₃, u₄}.

In Exercises 3–6, verify that {u₁, u₂} is an orthogonal set, and then find the orthogonal projection of y onto Span{u₁, u₂}.

3. y = (−1, 4, 3), u₁ = (1, 1, 0), u₂ = (−1, 1, 0)
4. y = (6, 3, −2), u₁ = (3, 4, 0), u₂ = (−4, 3, 0)
5. y = (−1, 2, 6), u₁ = (3, −1, 2), u₂ = (1, −1, −2)
6. y = (6, 4, 1), u₁ = (−4, −1, 1), u₂ = (0, 1, 1)

In Exercises 7–10, let W be the subspace spanned by the u's, and write y as the sum of a vector in W and a vector orthogonal to W.

7. y = (1, 3, 5), u₁ = (1, 3, −2), u₂ = (5, 1, 4)
8. y = (−1, 4, 3), u₁ = (1, 1, 1), u₂ = (−1, 3, −2)
9. y = (4, 3, 3, −1), u₁ = (1, 1, 0, 1), u₂ = (−1, 3, 1, −2), u₃ = (−1, 0, 1, 1)
10. y = (3, 4, 5, 6), u₁ = (1, 1, 0, −1), u₂ = (1, 0, 1, 1), u₃ = (0, −1, 1, −1)

In Exercises 11 and 12, find the closest point to y in the subspace W spanned by v₁ and v₂.

11. y = (3, 1, 5, 1), v₁ = (3, 1, −1, 1), v₂ = (1, −1, 1, −1)
12. y = (3, −1, 1, 13), v₁ = (1, −2, −1, 2), v₂ = (−4, 1, 0, 3)

In Exercises 13 and 14, find the best approximation to z by vectors of the form c₁v₁ + c₂v₂.

13. z = (3, −7, 2, 3), v₁ = (2, −1, −3, 1), v₂ = (1, 1, 0, −1)
14. z = (2, 4, 0, −1), v₁ = (2, 0, −1, −3), v₂ = (5, −2, 4, 2)

15. Let y = (5, −9, 5), u₁ = (−3, −5, 1), u₂ = (−3, 2, 1). Find the distance from y to the plane in R³ spanned by u₁ and u₂.

16. Let y, v₁, and v₂ be as in Exercise 12. Find the distance from y to the subspace of R⁴ spanned by v₁ and v₂.

17. Let y = (4, 8, 1), u₁ = (2/3, 1/3, 2/3), u₂ = (−2/3, 2/3, 1/3), and W = Span{u₁, u₂}.
    a. Let U = [u₁ u₂]. Compute UᵀU and UUᵀ.
    b. Compute proj_W y and (UUᵀ)y.

18. Let y = (7, 9), u₁ = (1/√10, −3/√10), and W = Span{u₁}.
    a. Let U be the 2 × 1 matrix whose only column is u₁. Compute UᵀU and UUᵀ.
    b. Compute proj_W y and (UUᵀ)y.

19. Let u₁ = (1, 1, −2), u₂ = (5, −1, 2), and u₃ = (0, 0, 1). Note that u₁ and u₂ are orthogonal but that u₃ is not orthogonal to u₁ or u₂. It can be shown that u₃ is not in the subspace W spanned by u₁ and u₂. Use this fact to construct a nonzero vector v in R³ that is orthogonal to u₁ and u₂.

20. Let u₁ and u₂ be as in Exercise 19, and let u₄ = (0, 1, 0). It can be shown that u₄ is not in the subspace W spanned by u₁ and u₂. Use this fact to construct a nonzero vector v in R³ that is orthogonal to u₁ and u₂.

In Exercises 21 and 22, all vectors and subspaces are in Rⁿ. Mark each statement True or False. Justify each answer.

21. a. If z is orthogonal to u₁ and to u₂ and if W = Span{u₁, u₂}, then z must be in W⊥.
    b. For each y and each subspace W, the vector y − proj_W y is orthogonal to W.
    c. The orthogonal projection ŷ of y onto a subspace W can sometimes depend on the orthogonal basis for W used to compute ŷ.
    d. If y is in a subspace W, then the orthogonal projection of y onto W is y itself.
    e. If the columns of an n × p matrix U are orthonormal, then UUᵀy is the orthogonal projection of y onto the column space of U.

22. a. If W is a subspace of Rⁿ and if v is in both W and W⊥, then v must be the zero vector.
    b. In the Orthogonal Decomposition Theorem, each term in formula (2) for ŷ is itself an orthogonal projection of y onto a subspace of W.
    c. If y = z₁ + z₂, where z₁ is in a subspace W and z₂ is in W⊥, then z₁ must be the orthogonal projection of y onto W.
    d. The best approximation to y by elements of a subspace W is given by the vector y − proj_W y.
    e. If an n × p matrix U has orthonormal columns, then UUᵀx = x for all x in Rⁿ.

23. Let A be an m × n matrix. Prove that every vector x in Rⁿ can be written in the form x = p + u, where p is in Row A and u is in Nul A. Also, show that if the equation Ax = b is consistent, then there is a unique p in Row A such that Ap = b.

24. Let W be a subspace of Rⁿ with an orthogonal basis {w₁, ..., wₚ}, and let {v₁, ..., v_q} be an orthogonal basis for W⊥.
    a. Explain why {w₁, ..., wₚ, v₁, ..., v_q} is an orthogonal set.
    b. Explain why the set in part (a) spans Rⁿ.
    c. Show that dim W + dim W⊥ = n.

25. [M] Let U be the 8 × 4 matrix in Exercise 36 in Section 6.2. Find the closest point to y = (1, 1, 1, 1, 1, 1, 1, 1) in Col U. Write the keystrokes or commands you use to solve this problem.

26. [M] Let U be the matrix in Exercise 25. Find the distance from b = (1, 1, 1, 1, −1, −1, −1, −1) to Col U.

SOLUTION TO PRACTICE PROBLEM

Compute

    proj_W y = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂ = (88/66)u₁ + (−2/6)u₂
             = (4/3)(−7, 1, 4) − (1/3)(−1, 1, −2) = (−9, 1, 6) = y

In this case, y happens to be a linear combination of u₁ and u₂, so y is in W. The closest point in W to y is y itself.


6.4 THE GRAM–SCHMIDT PROCESS

The Gram–Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of Rⁿ. The first two examples of the process are aimed at hand calculation.

EXAMPLE 1 Let W = Span{x₁, x₂}, where x₁ = (3, 6, 0) and x₂ = (1, 2, 2). Construct an orthogonal basis {v₁, v₂} for W.

SOLUTION The subspace W is shown in Fig. 1, along with x₁, x₂, and the projection p of x₂ onto x₁. The component of x₂ orthogonal to x₁ is x₂ − p, which is in W because it is formed from x₂ and a multiple of x₁. Let v₁ = x₁ and

    v₂ = x₂ − p = x₂ − ((x₂·x₁)/(x₁·x₁))x₁ = (1, 2, 2) − (15/45)(3, 6, 0) = (0, 0, 2)

Then {v₁, v₂} is an orthogonal set of nonzero vectors in W. Since dim W = 2, the set {v₁, v₂} is a basis for W.

FIGURE 1  Construction of an orthogonal basis {v₁, v₂}.

The next example fully illustrates the Gram–Schmidt process. Study it carefully.

EXAMPLE 2 Let x1 = [1; 1; 1; 1], x2 = [0; 1; 1; 1], and x3 = [0; 0; 1; 1]. Then {x1, x2, x3} is clearly linearly independent and thus is a basis for a subspace W of R^4. Construct an orthogonal basis for W.

SOLUTION

Step 1. Let v1 = x1 and W1 = Span{x1} = Span{v1}.

Step 2. Let v2 be the vector produced by subtracting from x2 its projection onto the subspace W1. That is, let

v2 = x2 − proj_{W1} x2
   = x2 − ((x2·v1)/(v1·v1)) v1        (since v1 = x1)
   = [0; 1; 1; 1] − (3/4)[1; 1; 1; 1]
   = [−3/4; 1/4; 1/4; 1/4]

As in Example 1, v2 is the component of x2 orthogonal to x1, and {v1, v2} is an orthogonal basis for the subspace W2 spanned by x1 and x2.

Step 2′ (optional). If appropriate, scale v2 to simplify later computations. Since v2 has fractional entries, it is convenient to scale it by a factor of 4 and replace {v1, v2} by the orthogonal basis

v1 = [1; 1; 1; 1],   v2′ = [−3; 1; 1; 1]


Step 3. Let v3 be the vector produced by subtracting from x3 its projection onto the subspace W2. Use the orthogonal basis {v1, v2′} to compute this projection onto W2 (the first term is the projection of x3 onto v1, the second the projection of x3 onto v2′):

proj_{W2} x3 = ((x3·v1)/(v1·v1)) v1 + ((x3·v2′)/(v2′·v2′)) v2′
            = (2/4)[1; 1; 1; 1] + (2/12)[−3; 1; 1; 1]
            = [0; 2/3; 2/3; 2/3]

Then v3 is the component of x3 orthogonal to W2, namely,

v3 = x3 − proj_{W2} x3 = [0; 0; 1; 1] − [0; 2/3; 2/3; 2/3] = [0; −2/3; 1/3; 1/3]

See Fig. 2 for a diagram of this construction. Observe that v3 is in W, because x3 and proj_{W2} x3 are both in W. Thus {v1, v2′, v3} is an orthogonal set of nonzero vectors and hence a linearly independent set in W. Note that W is three-dimensional since it was defined by a basis of three vectors. Hence, by the Basis Theorem in Section 4.5, {v1, v2′, v3} is an orthogonal basis for W.

[FIGURE 2 The construction of v3 from x3 and W2, where W2 = Span{v1, v2′}.]

The proof of the next theorem shows that this strategy really works. Scaling of vectors is not mentioned because that is used only to simplify hand calculations.

THEOREM 11 The Gram–Schmidt Process

Given a basis {x1, …, xp} for a nonzero subspace W of R^n, define

v1 = x1
v2 = x2 − ((x2·v1)/(v1·v1)) v1
v3 = x3 − ((x3·v1)/(v1·v1)) v1 − ((x3·v2)/(v2·v2)) v2
⋮
vp = xp − ((xp·v1)/(v1·v1)) v1 − ((xp·v2)/(v2·v2)) v2 − ⋯ − ((xp·v_{p−1})/(v_{p−1}·v_{p−1})) v_{p−1}

Then {v1, …, vp} is an orthogonal basis for W. In addition,

Span{v1, …, vk} = Span{x1, …, xk}  for 1 ≤ k ≤ p   (1)


PROOF For 1 ≤ k ≤ p, let Wk = Span{x1, …, xk}. Set v1 = x1, so that Span{v1} = Span{x1}. Suppose, for some k < p, we have constructed v1, …, vk so that {v1, …, vk} is an orthogonal basis for Wk. Define

v_{k+1} = x_{k+1} − proj_{Wk} x_{k+1}   (2)

By the Orthogonal Decomposition Theorem, v_{k+1} is orthogonal to Wk. Note that proj_{Wk} x_{k+1} is in Wk and hence also in W_{k+1}. Since x_{k+1} is in W_{k+1}, so is v_{k+1} (because W_{k+1} is a subspace and is closed under subtraction). Furthermore, v_{k+1} ≠ 0 because x_{k+1} is not in Wk = Span{x1, …, xk}. Hence {v1, …, v_{k+1}} is an orthogonal set of nonzero vectors in the (k+1)-dimensional space W_{k+1}. By the Basis Theorem in Section 4.5, this set is an orthogonal basis for W_{k+1}. Hence W_{k+1} = Span{v1, …, v_{k+1}}. When k + 1 = p, the process stops.

Theorem 11 shows that any nonzero subspace W of R^n has an orthogonal basis, because an ordinary basis {x1, …, xp} is always available (by Theorem 11 in Section 4.5), and the Gram–Schmidt process depends only on the existence of orthogonal projections onto subspaces of W that already have orthogonal bases.

Orthonormal Bases

An orthonormal basis is constructed easily from an orthogonal basis {v1, …, vp}: simply normalize (i.e., "scale") all the vk. When working problems by hand, this is easier than normalizing each vk as soon as it is found (because it avoids unnecessary writing of square roots).

EXAMPLE 3 Example 1 constructed the orthogonal basis

v1 = [3; 6; 0],   v2 = [0; 0; 2]

An orthonormal basis is

u1 = (1/‖v1‖) v1 = (1/√45)[3; 6; 0] = [1/√5; 2/√5; 0]
u2 = (1/‖v2‖) v2 = [0; 0; 1]

QR Factorization of Matrices

If an m × n matrix A has linearly independent columns x1, …, xn, then applying the Gram–Schmidt process (with normalizations) to x1, …, xn amounts to factoring A, as described in the next theorem. This factorization is widely used in computer algorithms for various computations, such as solving equations (discussed in Section 6.5) and finding eigenvalues (mentioned in the exercises for Section 5.2).


THEOREM 12 The QR Factorization

If A is an m × n matrix with linearly independent columns, then A can be factored as A = QR, where Q is an m × n matrix whose columns form an orthonormal basis for Col A and R is an n × n upper triangular invertible matrix with positive entries on its diagonal.

PROOF The columns of A form a basis {x1, …, xn} for Col A. Construct an orthonormal basis {u1, …, un} for W = Col A with property (1) in Theorem 11. This basis may be constructed by the Gram–Schmidt process or some other means. Let

Q = [u1 u2 ⋯ un]

For k = 1, …, n, xk is in Span{x1, …, xk} = Span{u1, …, uk}. So there are constants r_{1k}, …, r_{kk} such that

xk = r_{1k} u1 + ⋯ + r_{kk} uk + 0·u_{k+1} + ⋯ + 0·un

We may assume that r_{kk} ≥ 0. (If r_{kk} < 0, multiply both r_{kk} and uk by −1.) This shows that xk is a linear combination of the columns of Q using as weights the entries in the vector

rk = [r_{1k}; …; r_{kk}; 0; …; 0]

That is, xk = Q rk for k = 1, …, n. Let R = [r1 ⋯ rn]. Then

A = [x1 ⋯ xn] = [Q r1 ⋯ Q rn] = QR

The fact that R is invertible follows easily from the fact that the columns of A are linearly independent (Exercise 19). Since R is clearly upper triangular, its nonnegative diagonal entries must be positive.

EXAMPLE 4 Find a QR factorization of A = [1 0 0; 1 1 0; 1 1 1; 1 1 1].

SOLUTION The columns of A are the vectors x1, x2, and x3 in Example 2. An orthogonal basis for Col A = Span{x1, x2, x3} was found in that example:

v1 = [1; 1; 1; 1],   v2′ = [−3; 1; 1; 1],   v3 = [0; −2/3; 1/3; 1/3]

To simplify the arithmetic that follows, scale v3 by letting v3′ = 3v3. Then normalize the three vectors to obtain u1, u2, and u3, and use these vectors as the columns of Q:

Q = [1/2  −3/√12   0;
     1/2   1/√12  −2/√6;
     1/2   1/√12   1/√6;
     1/2   1/√12   1/√6]


By construction, the first k columns of Q are an orthonormal basis of Span{x1, …, xk}. From the proof of Theorem 12, A = QR for some R. To find R, observe that QᵀQ = I, because the columns of Q are orthonormal. Hence

QᵀA = Qᵀ(QR) = IR = R

and

R = [1/2     1/2     1/2     1/2;
     −3/√12  1/√12   1/√12   1/√12;
     0      −2/√6    1/√6    1/√6] · [1 0 0; 1 1 0; 1 1 1; 1 1 1]

  = [2   3/2     1;
     0   3/√12   2/√12;
     0   0       2/√6]

NUMERICAL NOTES

1. When the Gram–Schmidt process is run on a computer, roundoff error can build up as the vectors uk are calculated, one by one. For j and k large but unequal, the inner products ujᵀuk may not be sufficiently close to zero. This loss of orthogonality can be reduced substantially by rearranging the order of the calculations.¹ However, a different computer-based QR factorization is usually preferred to this modified Gram–Schmidt method because it yields a more accurate orthonormal basis, even though the factorization requires about twice as much arithmetic.

2. To produce a QR factorization of a matrix A, a computer program usually left-multiplies A by a sequence of orthogonal matrices until A is transformed into an upper triangular matrix. This construction is analogous to the left-multiplication by elementary matrices that produces an LU factorization of A.
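The identity R = QᵀA used in Example 4 gives a quick numerical check of any QR factorization. A minimal sketch in Python with NumPy (our illustration, not the text's algorithm): numpy.linalg.qr uses the orthogonal-matrix approach described in Numerical Note 2, and its factors agree with Example 4 up to the signs of the columns of Q.

import numpy as np

A = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 1., 1.],
              [1., 1., 1.]])

Q, R = np.linalg.qr(A)          # Q: 4 x 3 with orthonormal columns, R: 3 x 3 upper triangular
print(np.allclose(Q @ R, A))    # True: the factors reproduce A
print(np.allclose(Q.T @ A, R))  # True: R = Q^T A, as computed by hand in Example 4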

PRACTICE PROBLEM

Let W = Span{x1, x2}, where x1 = [1; 1; 1] and x2 = [1/3; 1/3; −2/3]. Construct an orthonormal basis for W.

6.4 EXERCISES

In Exercises 1–6, the given set is a basis for a subspace W. Use the Gram–Schmidt process to produce an orthogonal basis for W.

1. [3; 0; −1], [8; 5; −6]
2. [0; 4; 2], [5; 6; −7]
3. [2; −5; 1], [4; −1; 2]
4. [3; −4; 5], [−3; 14; −7]
5. [1; −4; 0; 1], [7; −7; −4; 1]
6. [3; −1; 2; −1], [−5; 9; −9; 3]

¹See Fundamentals of Matrix Computations, by David S. Watkins (New York: John Wiley & Sons, 1991), pp. 167–180.


7. Find an orthonormal basis of the subspace spanned by the vectors in Exercise 3.

8. Find an orthonormal basis of the subspace spanned by the vectors in Exercise 4.

Find an orthogonal basis for the column space of each matrix in Exercises 9–12.

9. [3 −5 1; 1 1 1; −1 5 −2; 3 −7 8]
10. [−1 6 6; 3 −8 3; 1 −2 6; 1 −4 −3]
11. [1 2 5; −1 1 −4; −1 4 −3; 1 −4 7; 1 2 1]
12. [1 3 5; −1 −3 1; 0 2 3; 1 5 2; 1 5 8]

In Exercises 13 and 14, the columns of Q were obtained by applying the Gram–Schmidt process to the columns of A. Find an upper triangular matrix R such that A = QR. Check your work.

13. A = [5 9; 1 7; −3 −5; 1 5],   Q = [5/6 −1/6; 1/6 5/6; −3/6 1/6; 1/6 3/6]
14. A = [−2 3; 5 7; 2 −2; 4 6],   Q = [−2/7 5/7; 5/7 2/7; 2/7 −4/7; 4/7 2/7]

15. Find a QR factorization of the matrix in Exercise 11.
16. Find a QR factorization of the matrix in Exercise 12.

In Exercises 17 and 18, all vectors and subspaces are in R^n. Mark each statement True or False. Justify each answer.

17. a. If {v1, v2, v3} is an orthogonal basis for W, then multiplying v3 by a scalar c gives a new orthogonal basis {v1, v2, cv3}.
b. The Gram–Schmidt process produces from a linearly independent set {x1, …, xp} an orthogonal set {v1, …, vp} with the property that for each k, the vectors v1, …, vk span the same subspace as that spanned by x1, …, xk.
c. If A = QR, where Q has orthonormal columns, then R = QᵀA.

18. a. If W = Span{x1, x2, x3} with {x1, x2, x3} linearly independent, and if {v1, v2, v3} is an orthogonal set in W, then {v1, v2, v3} is a basis for W.
b. If x is not in a subspace W, then x − proj_W x is not zero.
c. In a QR factorization, say A = QR (when A has linearly independent columns), the columns of Q form an orthonormal basis for the column space of A.

19. Suppose A = QR, where Q is m × n and R is n × n. Show that if the columns of A are linearly independent, then R must be invertible. [Hint: Study the equation Rx = 0 and use the fact that A = QR.]

20. Suppose A = QR, where R is an invertible matrix. Show that A and Q have the same column space. [Hint: Given y in Col A, show that y = Qx for some x. Also, given y in Col Q, show that y = Ax for some x.]

21. Given A = QR as in Theorem 12, describe how to find an orthogonal m × m (square) matrix Q1 and an invertible n × n upper triangular matrix R such that

A = Q1 [R; 0]

(R stacked above an (m − n) × n zero block). The MATLAB qr command supplies this "full" QR factorization when rank A = n.

22. Let u1, …, up be an orthogonal basis for a subspace W of R^n, and let T: R^n → R^n be defined by T(x) = proj_W x. Show that T is a linear transformation.

23. Suppose A = QR is a QR factorization of an m × n matrix A (with linearly independent columns). Partition A as [A1 A2], where A1 has p columns. Show how to obtain a QR factorization of A1, and explain why your factorization has the appropriate properties.

24. [M] Use the Gram–Schmidt process as in Example 2 to produce an orthogonal basis for the column space of

A = [−10 13 7 −11; 2 1 −5 3; −6 3 13 −3; 16 −16 −2 5; 2 1 −5 −7]

25. [M] Use the method in this section to produce a QR factorization of the matrix in Exercise 24.

26. [M] For a matrix program, the Gram–Schmidt process works better with orthonormal vectors. Starting with x1, …, xp as in Theorem 11, let A = [x1 ⋯ xp]. Suppose Q is an n × k matrix whose columns form an orthonormal basis for the subspace Wk spanned by the first k columns of A. Then for x in R^n, QQᵀx is the orthogonal projection of x onto Wk (Theorem 10 in Section 6.3). If x_{k+1} is the next column of A, then equation (2) in the proof of Theorem 11 becomes

v_{k+1} = x_{k+1} − Q(Qᵀx_{k+1})

(The parentheses above reduce the number of arithmetic operations.) Let u_{k+1} = v_{k+1}/‖v_{k+1}‖. The new Q for the next step is [Q u_{k+1}]. Use this procedure to compute the QR factorization of the matrix in Exercise 24. Write the keystrokes or commands you use.



SOLUTION TO PRACTICE PROBLEM

Let v1 = x1 = [1; 1; 1] and v2 = x2 − ((x2·v1)/(v1·v1)) v1 = x2 − 0·v1 = x2. So {x1, x2} is already orthogonal. All that is needed is to normalize the vectors. Let

u1 = (1/‖v1‖) v1 = (1/√3)[1; 1; 1] = [1/√3; 1/√3; 1/√3]

Instead of normalizing v2 directly, normalize v2′ = 3v2 instead:

u2 = (1/‖v2′‖) v2′ = (1/√(1² + 1² + (−2)²)) [1; 1; −2] = [1/√6; 1/√6; −2/√6]

Then {u1, u2} is an orthonormal basis for W.

6.5 LEAST-SQUARES PROBLEMS

The chapter's introductory example described a massive problem Ax = b that had no solution. Inconsistent systems arise often in applications, though usually not with such an enormous coefficient matrix. When a solution is demanded and none exists, the best one can do is to find an x that makes Ax as close as possible to b.

Think of Ax as an approximation to b. The smaller the distance between b and Ax, given by ‖b − Ax‖, the better the approximation. The general least-squares problem is to find an x that makes ‖b − Ax‖ as small as possible. The adjective "least-squares" arises from the fact that ‖b − Ax‖ is the square root of a sum of squares.

DEFINITION If A is m × n and b is in R^m, a least-squares solution of Ax = b is an x̂ in R^n such that

‖b − Ax̂‖ ≤ ‖b − Ax‖

for all x in R^n.

The most important aspect of the least-squares problem is that no matter what x we select, the vector Ax will necessarily be in the column space, Col A. So we seek an x that makes Ax the closest point in Col A to b. See Fig. 1. (Of course, if b happens to be in Col A, then b is Ax for some x, and such an x is a "least-squares solution.")

[FIGURE 1 The vector b is closer to Ax̂ than to Ax for other x.]


Solution of the General Least-Squares Problem

Given A and b as above, apply the Best Approximation Theorem in Section 6.3 to the subspace Col A. Let

b̂ = proj_{Col A} b

Because b̂ is in the column space of A, the equation Ax = b̂ is consistent, and there is an x̂ in R^n such that

Ax̂ = b̂   (1)

Since b̂ is the closest point in Col A to b, a vector x̂ is a least-squares solution of Ax = b if and only if x̂ satisfies (1). Such an x̂ in R^n is a list of weights that will build b̂ out of the columns of A. See Fig. 2. [There are many solutions of (1) if the equation has free variables.]

[FIGURE 2 The least-squares solution x̂ is in R^n; b̂ = Ax̂ lies in Col A, a subspace of R^m, and b − Ax̂ joins Ax̂ to b.]

Suppose x̂ satisfies Ax̂ = b̂. By the Orthogonal Decomposition Theorem in Section 6.3, the projection b̂ has the property that b − b̂ is orthogonal to Col A, so b − Ax̂ is orthogonal to each column of A. If aj is any column of A, then aj·(b − Ax̂) = 0, and ajᵀ(b − Ax̂) = 0. Since each ajᵀ is a row of Aᵀ,

Aᵀ(b − Ax̂) = 0   (2)

(This equation also follows from Theorem 3 in Section 6.1.) Thus

Aᵀb − AᵀAx̂ = 0
AᵀAx̂ = Aᵀb

These calculations show that each least-squares solution of Ax = b satisfies the equation

AᵀAx = Aᵀb   (3)

The matrix equation (3) represents a system of equations called the normal equations for Ax = b. A solution of (3) is often denoted by x̂.

THEOREM 13 The set of least-squares solutions of Ax = b coincides with the nonempty set of solutions of the normal equations AᵀAx = Aᵀb.

PROOF As shown above, the set of least-squares solutions is nonempty and each least-squares solution x̂ satisfies the normal equations. Conversely, suppose x̂ satisfies AᵀAx̂ = Aᵀb. Then x̂ satisfies (2) above, which shows that b − Ax̂ is orthogonal to the rows of Aᵀ and hence is orthogonal to the columns of A. Since the columns of A span Col A, the vector b − Ax̂ is orthogonal to all of Col A. Hence the equation

b = Ax̂ + (b − Ax̂)

is a decomposition of b into the sum of a vector in Col A and a vector orthogonal to Col A. By the uniqueness of the orthogonal decomposition, Ax̂ must be the orthogonal projection of b onto Col A. That is, Ax̂ = b̂, and x̂ is a least-squares solution.

EXAMPLE 1 Find a least-squares solution of the inconsistent system Ax = b for

A = [4 0; 0 2; 1 1],   b = [2; 0; 11]

SOLUTION To use the normal equations (3), compute:

AᵀA = [4 0 1; 0 2 1] · [4 0; 0 2; 1 1] = [17 1; 1 5]

Aᵀb = [4 0 1; 0 2 1] · [2; 0; 11] = [19; 11]

Then the equation AᵀAx = Aᵀb becomes

[17 1; 1 5] [x1; x2] = [19; 11]

Row operations can be used to solve this system, but since AᵀA is invertible and 2 × 2, it is probably faster to compute

(AᵀA)⁻¹ = (1/84) [5 −1; −1 17]

and then to solve AᵀAx = Aᵀb as

x̂ = (AᵀA)⁻¹Aᵀb = (1/84) [5 −1; −1 17] [19; 11] = (1/84) [84; 168] = [1; 2]
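A short numerical sketch of Example 1 in Python with NumPy (our illustration): solve the normal equations (3) directly, then cross-check against a library least-squares routine.

import numpy as np

A = np.array([[4., 0.],
              [0., 2.],
              [1., 1.]])
b = np.array([2., 0., 11.])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: (A^T A) x = A^T b
print(x_hat)                                # [1. 2.], as in Example 1

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_ls))             # True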

In many calculations, AᵀA is invertible, but this is not always the case. The next example involves a matrix of the sort that appears in what are called analysis of variance problems in statistics.

EXAMPLE 2 Find a least-squares solution of Ax = b for

A = [1 1 0 0; 1 1 0 0; 1 0 1 0; 1 0 1 0; 1 0 0 1; 1 0 0 1],   b = [−3; −1; 0; 2; 5; 1]


SOLUTION Compute

AᵀA = [1 1 1 1 1 1; 1 1 0 0 0 0; 0 0 1 1 0 0; 0 0 0 0 1 1] · A = [6 2 2 2; 2 2 0 0; 2 0 2 0; 2 0 0 2]

Aᵀb = [1 1 1 1 1 1; 1 1 0 0 0 0; 0 0 1 1 0 0; 0 0 0 0 1 1] · [−3; −1; 0; 2; 5; 1] = [4; −4; 2; 6]

The augmented matrix for AᵀAx = Aᵀb is

[6 2 2 2 4; 2 2 0 0 −4; 2 0 2 0 2; 2 0 0 2 6] ~ [1 0 0 1 3; 0 1 0 −1 −5; 0 0 1 −1 −2; 0 0 0 0 0]

The general solution is x1 = 3 − x4, x2 = −5 + x4, x3 = −2 + x4, and x4 is free. So the general least-squares solution of Ax = b has the form

x̂ = [3; −5; −2; 0] + x4 [−1; 1; 1; 1]

The next theorem gives useful criteria for determining when there is only one least-squares solution of Ax = b. (Of course, the orthogonal projection b̂ is always unique.)

THEOREM 14 Let A be an m × n matrix. The following statements are logically equivalent:

a. The equation Ax = b has a unique least-squares solution for each b in R^m.
b. The columns of A are linearly independent.
c. The matrix AᵀA is invertible.

When these statements are true, the least-squares solution x̂ is given by

x̂ = (AᵀA)⁻¹Aᵀb   (4)

The main elements of a proof of Theorem 14 are outlined in Exercises 19–21, which also review concepts from Chapter 4. Formula (4) for x̂ is useful mainly for theoretical purposes and for hand calculations when AᵀA is a 2 × 2 invertible matrix.

When a least-squares solution x̂ is used to produce Ax̂ as an approximation to b, the distance from b to Ax̂ is called the least-squares error of this approximation.

EXAMPLE 3 Given A and b as in Example 1, determine the least-squares error in the least-squares solution of Ax = b.


SOLUTION From Example 1,

b = [2; 0; 11]  and  Ax̂ = [4 0; 0 2; 1 1] [1; 2] = [4; 4; 3]

Hence

b − Ax̂ = [2; 0; 11] − [4; 4; 3] = [−2; −4; 8]

and

‖b − Ax̂‖ = √((−2)² + (−4)² + 8²) = √84

The least-squares error is √84. For any x in R², the distance between b and the vector Ax is at least √84. See Fig. 3. Note that the least-squares solution x̂ itself does not appear in the figure.

[FIGURE 3 Col A is spanned by (4, 0, 1) and (0, 2, 1); the distance from b = (2, 0, 11) to Ax̂ is √84.]

Alternative Calculations of Least-Squares Solutions

The next example shows how to find a least-squares solution of Ax = b when the columns of A are orthogonal. Such matrices often appear in linear regression problems, discussed in the next section.

EXAMPLE 4 Find a least-squares solution of Ax = b for

A = [1 −6; 1 −2; 1 1; 1 7],   b = [−1; 2; 1; 6]

SOLUTION Because the columns a1 and a2 of A are orthogonal, the orthogonal projection of b onto Col A is given by

b̂ = ((b·a1)/(a1·a1)) a1 + ((b·a2)/(a2·a2)) a2 = (8/4) a1 + (45/90) a2   (5)
  = [2; 2; 2; 2] + [−3; −1; 1/2; 7/2] = [−1; 1; 5/2; 11/2]

Now that b̂ is known, we can solve Ax̂ = b̂. But this is trivial, since we already know what weights to place on the columns of A to produce b̂. It is clear from (5) that

x̂ = [8/4; 45/90] = [2; 1/2]

In some cases, the normal equations for a least-squares problem can be ill-conditioned; that is, small errors in the calculations of the entries of AᵀA can sometimes cause relatively large errors in the solution x̂. If the columns of A are linearly independent, the least-squares solution can often be computed more reliably through a QR factorization of A (described in Section 6.4).¹

¹The QR method is compared with the standard normal equation method in G. Golub and C. Van Loan, Matrix Computations, 3rd ed. (Baltimore: Johns Hopkins Press, 1996), pp. 230–231.


THEOREM 15 Given an m × n matrix A with linearly independent columns, let A = QR be a QR factorization of A as in Theorem 12. Then, for each b in R^m, the equation Ax = b has a unique least-squares solution, given by

x̂ = R⁻¹Qᵀb   (6)

PROOF Let x̂ = R⁻¹Qᵀb. Then

Ax̂ = QRx̂ = QRR⁻¹Qᵀb = QQᵀb

By Theorem 12, the columns of Q form an orthonormal basis for Col A. Hence, by Theorem 10, QQᵀb is the orthogonal projection b̂ of b onto Col A. Then Ax̂ = b̂, which shows that x̂ is a least-squares solution of Ax = b. The uniqueness of x̂ follows from Theorem 14.

NUMERICAL NOTE

Since R in Theorem 15 is upper triangular, x̂ should be calculated as the exact solution of the equation

Rx = Qᵀb   (7)

It is much faster to solve (7) by back-substitution or row operations than to compute R⁻¹ and use (6).

EXAMPLE 5 Find the least-squares solution of Ax = b for

A = [1 3 5; 1 1 0; 1 1 2; 1 3 3],   b = [3; 5; 7; −3]

SOLUTION The QR factorization of A can be obtained as in Section 6.4:

A = QR = [1/2 1/2 1/2; 1/2 −1/2 −1/2; 1/2 −1/2 1/2; 1/2 1/2 −1/2] · [2 4 5; 0 2 3; 0 0 2]

Then

Qᵀb = [1/2 1/2 1/2 1/2; 1/2 −1/2 −1/2 1/2; 1/2 −1/2 1/2 −1/2] · [3; 5; 7; −3] = [6; −6; 4]

The least-squares solution x̂ satisfies Rx = Qᵀb; that is,

[2 4 5; 0 2 3; 0 0 2] [x1; x2; x3] = [6; −6; 4]

This equation is solved easily by back-substitution and yields x̂ = [10; −6; 2].
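A numerical check of Example 5 in Python (our illustration, not the text's keystrokes; scipy.linalg.solve_triangular performs the back-substitution recommended in the Numerical Note):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 3., 5.],
              [1., 1., 0.],
              [1., 1., 2.],
              [1., 3., 3.]])
b = np.array([3., 5., 7., -3.])

Q, R = np.linalg.qr(A)                  # reduced QR: Q is 4 x 3, R is 3 x 3
x_hat = solve_triangular(R, Q.T @ b)    # back-substitution on R x = Q^T b, no R inverse
print(x_hat)                            # [10. -6.  2.], matching Example 5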


PRACTICE PROBLEMS

1. Let A = [1 −3 −3; 1 5 1; 1 7 2] and b = [5; −3; −5]. Find a least-squares solution of Ax = b, and compute the associated least-squares error.

2. What can you say about the least-squares solution of Ax = b when b is orthogonal to the columns of A?

6.5 EXERCISES

In Exercises 1–4, find a least-squares solution of Ax = b by (a) constructing the normal equations for x̂ and (b) solving for x̂.

1. A = [−1 2; 2 −3; −1 3],   b = [4; 1; 2]
2. A = [2 1; −2 0; 2 3],   b = [−5; 8; 1]
3. A = [1 −2; −1 2; 0 3; 2 5],   b = [3; 1; −4; 2]
4. A = [1 3; 1 −1; 1 1],   b = [5; 1; 0]

In Exercises 5 and 6, describe all least-squares solutions of the equation Ax = b.

5. A = [1 1 0; 1 1 0; 1 0 1; 1 0 1],   b = [1; 3; 8; 2]
6. A = [1 1 0; 1 1 0; 1 1 0; 1 0 1; 1 0 1; 1 0 1],   b = [7; 2; 3; 6; 5; 4]

7. Compute the least-squares error associated with the least-squares solution found in Exercise 3.

8. Compute the least-squares error associated with the least-squares solution found in Exercise 4.

In Exercises 9–12, find (a) the orthogonal projection of b onto Col A and (b) a least-squares solution of Ax = b.

9. A = [1 5; 3 1; −2 4],   b = [4; −2; −3]
10. A = [1 2; −1 4; 1 2],   b = [3; −1; 5]
11. A = [4 0 1; 1 −5 1; 6 1 0; 1 −1 −5],   b = [9; 0; 0; 0]
12. A = [1 1 0; 1 0 −1; 0 1 1; −1 1 −1],   b = [2; 5; 6; 6]

13. Let A = [3 4; −2 1; 3 4], b = [11; −9; 5], u = [5; −1], and v = [5; −2]. Compute Au and Av, and compare them with b. Could u possibly be a least-squares solution of Ax = b? (Answer this without computing a least-squares solution.)

14. Let A = [2 1; −3 −4; 3 2], b = [5; 4; 4], u = [4; −5], and v = [6; −5]. Compute Au and Av, and compare them with b. Is it possible that at least one of u or v could be a least-squares solution of Ax = b? (Answer this without computing a least-squares solution.)

In Exercises 15 and 16, use the factorization A = QR to find the least-squares solution of Ax = b.

15. A = [2 3; 2 4; 1 1] = [2/3 −1/3; 2/3 2/3; 1/3 −2/3] · [3 5; 0 1],   b = [7; 3; 1]
16. A = [1 −1; 1 4; 1 −1; 1 4] = [1/2 −1/2; 1/2 1/2; 1/2 −1/2; 1/2 1/2] · [2 3; 0 5],   b = [−1; 6; 5; 7]

In Exercises 17 and 18, A is an m × n matrix and b is in R^m. Mark each statement True or False. Justify each answer.

17. a. The general least-squares problem is to find an x that makes Ax as close as possible to b.
b. A least-squares solution of Ax = b is a vector x̂ that satisfies Ax̂ = b̂, where b̂ is the orthogonal projection of b onto Col A.
c. A least-squares solution of Ax = b is a vector x̂ such that ‖b − Ax‖ ≤ ‖b − Ax̂‖ for all x in R^n.
d. Any solution of AᵀAx = Aᵀb is a least-squares solution of Ax = b.
e. If the columns of A are linearly independent, then the equation Ax = b has exactly one least-squares solution.

18. a. If b is in the column space of A, then every solution of Ax = b is a least-squares solution.
b. The least-squares solution of Ax = b is the point in the column space of A closest to b.
c. A least-squares solution of Ax = b is a list of weights that, when applied to the columns of A, produces the orthogonal projection of b onto Col A.
d. If x̂ is a least-squares solution of Ax = b, then x̂ = (AᵀA)⁻¹Aᵀb.
e. The normal equations always provide a reliable method for computing least-squares solutions.
f. If A has a QR factorization, say A = QR, then the best way to find the least-squares solution of Ax = b is to compute x̂ = R⁻¹Qᵀb.

19. Let A be an m × n matrix. Use the steps below to show that a vector x in R^n satisfies Ax = 0 if and only if AᵀAx = 0. This will show that Nul A = Nul AᵀA.
a. Show that if Ax = 0, then AᵀAx = 0.
b. Suppose AᵀAx = 0. Explain why xᵀAᵀAx = 0, and use this to show that Ax = 0.

20. Let A be an m × n matrix such that AᵀA is invertible. Show that the columns of A are linearly independent. [Careful: You may not assume that A is invertible; it may not even be square.]

21. Let A be an m × n matrix whose columns are linearly independent. [Careful: A need not be square.]
a. Use Exercise 19 to show that AᵀA is an invertible matrix.
b. Explain why A must have at least as many rows as columns.
c. Determine the rank of A.

22. Use Exercise 19 to show that rank AᵀA = rank A. [Hint: How many columns does AᵀA have? How is this connected with the rank of AᵀA?]

23. Suppose A is m × n with linearly independent columns and b is in R^m. Use the normal equations to produce a formula for b̂, the projection of b onto Col A. [Hint: Find x̂ first. The formula does not require an orthogonal basis for Col A.]

24. Find a formula for the least-squares solution of Ax = b when the columns of A are orthonormal.

25. Describe all least-squares solutions of the system

x + y = 2
x + y = 4

26. [M] Example 3 in Section 4.8 displayed a low-pass linear filter that changed a signal {yk} into {y_{k+1}} and changed a higher-frequency signal {wk} into the zero signal, where yk = cos(πk/4) and wk = cos(3πk/4). The following calculations will design a filter with approximately those properties. The filter equation is

a0 y_{k+2} + a1 y_{k+1} + a2 yk = zk  for all k   (8)

Because the signals are periodic, with period 8, it suffices to study equation (8) for k = 0, …, 7. The action on the two signals described above translates into two sets of eight equations, shown below; the rows correspond to k = 0, …, 7, and the columns of the coefficient matrices hold the values of y_{k+2}, y_{k+1}, yk and of w_{k+2}, w_{k+1}, wk, respectively:

[0 .7 1; −.7 0 .7; −1 −.7 0; −.7 −1 −.7; 0 −.7 −1; .7 0 −.7; 1 .7 0; .7 1 .7] [a0; a1; a2] = [.7; 0; −.7; −1; −.7; 0; .7; 1]

[0 −.7 1; .7 0 −.7; −1 .7 0; .7 −1 .7; 0 .7 −1; −.7 0 .7; 1 −.7 0; −.7 1 −.7] [a0; a1; a2] = [0; 0; 0; 0; 0; 0; 0; 0]

Write an equation Ax = b, where A is a 16 × 3 matrix formed from the two coefficient matrices above and where b in R^16 is formed from the two right sides of the equations. Find a0, a1, and a2 given by the least-squares solution of Ax = b. (The .7 in the data above was used as an approximation for √2/2, to illustrate how a typical computation in an applied problem might proceed. If .707 were used instead, the resulting filter coefficients would agree to at least seven decimal places with √2/4, 1/2, and √2/4, the values produced by exact arithmetic calculations.)



SOLUTIONS TO PRACTICE PROBLEMS

1. First, compute

AᵀA = [1 1 1; −3 5 7; −3 1 2] · [1 −3 −3; 1 5 1; 1 7 2] = [3 9 0; 9 83 28; 0 28 14]

Aᵀb = [1 1 1; −3 5 7; −3 1 2] · [5; −3; −5] = [−3; −65; −28]

Next, row reduce the augmented matrix for the normal equations, AᵀAx = Aᵀb:

[3 9 0 −3; 9 83 28 −65; 0 28 14 −28] ~ [1 3 0 −1; 0 56 28 −56; 0 28 14 −28] ~ ⋯ ~ [1 0 −3/2 2; 0 1 1/2 −1; 0 0 0 0]

The general least-squares solution is x1 = 2 + (3/2)x3, x2 = −1 − (1/2)x3, with x3 free. For one specific solution, take x3 = 0 (for example), and get

x̂ = [2; −1; 0]

To find the least-squares error, compute

b̂ = Ax̂ = [1 −3 −3; 1 5 1; 1 7 2] · [2; −1; 0] = [5; −3; −5]

It turns out that b̂ = b, so ‖b − b̂‖ = 0. The least-squares error is zero because b happens to be in Col A.

2. If b is orthogonal to the columns of A, then the projection of b onto the column space of A is 0. In this case, a least-squares solution x̂ of Ax = b satisfies Ax̂ = 0.

6.6 APPLICATIONS TO LINEAR MODELS

A common task in science and engineering is to analyze and understand relationships among several quantities that vary. This section describes a variety of situations in which data are used to build or verify a formula that predicts the value of one variable as a function of other variables. In each case, the problem will amount to solving a least-squares problem.

For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data. Instead of Ax = b, we write Xβ = y and refer to X as the design matrix, β as the parameter vector, and y as the observation vector.

Least-Squares Lines

The simplest relation between two variables x and y is the linear equation y = β0 + β1x.¹ Experimental data often produce points (x1, y1), …, (xn, yn) that,

¹This notation is commonly used for least-squares lines instead of y = mx + b.


when graphed, seem to lie close to a line. We want to determine the parameters β0 and β1 that make the line as "close" to the points as possible.

Suppose β0 and β1 are fixed, and consider the line y = β0 + β1x in Fig. 1. Corresponding to each data point (xj, yj) there is a point (xj, β0 + β1xj) on the line with the same x-coordinate. We call yj the observed value of y and β0 + β1xj the predicted y-value (determined by the line). The difference between an observed y-value and a predicted y-value is called a residual.

[FIGURE 1 Fitting a line y = β0 + β1x to experimental data: each residual is the vertical gap between a data point (xj, yj) and the point (xj, β0 + β1xj) on the line.]

There are several ways to measure how "close" the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals. The least-squares line is the line y = β0 + β1x that minimizes the sum of the squares of the residuals. This line is also called a line of regression of y on x, because any errors in the data are assumed to be only in the y-coordinates. The coefficients β0, β1 of the line are called (linear) regression coefficients.²

If the data points were on the line, the parameters β0 and β1 would satisfy the equations (predicted y-value on the left, observed y-value on the right):

β0 + β1x1 = y1
β0 + β1x2 = y2
⋮
β0 + β1xn = yn

We can write this system as

Xβ = y,  where  X = [1 x1; 1 x2; ⋮; 1 xn],  β = [β0; β1],  y = [y1; y2; …; yn]   (1)

Of course, if the data points don't lie on a line, then there are no parameters β0, β1 for which the predicted y-values in Xβ equal the observed y-values in y, and Xβ = y has no solution. This is a least-squares problem, Ax = b, with different notation!

The square of the distance between the vectors Xβ and y is precisely the sum of the squares of the residuals. The β that minimizes this sum also minimizes the distance between Xβ and y. Computing the least-squares solution of Xβ = y is equivalent to finding the β that determines the least-squares line in Fig. 1.

²If the measurement errors are in x instead of y, simply interchange the coordinates of the data (xj, yj) before plotting the points and computing the regression line. If both coordinates are subject to possible error, then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular) distances from the points to the line. See the Practice Problems for Section 7.5.


EXAMPLE 1 Find the equation y = β0 + β1x of the least-squares line that best fits the data points (2, 1), (5, 2), (7, 3), and (8, 3).

SOLUTION Use the x-coordinates of the data to build the design matrix X in (1) and the y-coordinates to build the observation vector y:

X = [1 2; 1 5; 1 7; 1 8],   y = [1; 2; 3; 3]

For the least-squares solution of Xβ = y, obtain the normal equations (with the new notation):

XᵀXβ = Xᵀy

That is, compute

XᵀX = [1 1 1 1; 2 5 7 8] · [1 2; 1 5; 1 7; 1 8] = [4 22; 22 142]

Xᵀy = [1 1 1 1; 2 5 7 8] · [1; 2; 3; 3] = [9; 57]

The normal equations are

[4 22; 22 142] [β0; β1] = [9; 57]

Hence

[β0; β1] = [4 22; 22 142]⁻¹ [9; 57] = (1/84) [142 −22; −22 4] [9; 57] = (1/84) [24; 30] = [2/7; 5/14]

Thus the least-squares line has the equation

y = 2/7 + (5/14)x

See Fig. 2.

SeeFig. 2.

1 2 3

1

2

3

4 5 7 8 96

y

x

FIGURE 2 Theleast-squaresliney D 2

7C 5

14x.
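A brief sketch of Example 1 in Python with NumPy (our illustration): build the design matrix, solve the normal equations, and recover β0 = 2/7 and β1 = 5/14.

import numpy as np

x = np.array([2., 5., 7., 8.])
y = np.array([1., 2., 3., 3.])

X = np.column_stack([np.ones_like(x), x])   # design matrix with columns 1 and x
beta = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations X^T X (beta) = X^T y
print(beta)                                 # [0.2857... 0.3571...], i.e. [2/7, 5/14]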

A common practice before computing a least-squares line is to compute the average x̄ of the original x-values and form a new variable x* = x − x̄. The new x-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal. Solution of the normal equations is simplified, just as in Example 4 in Section 6.5. See Exercises 17 and 18.


The General Linear Model

In some applications, it is necessary to fit data points with something other than a straight line. In the examples that follow, the matrix equation is still Xβ = y, but the specific form of X changes from one problem to the next. Statisticians usually introduce a residual vector ε, defined by ε = y − Xβ, and write

y = Xβ + ε

Any equation of this form is referred to as a linear model. Once X and y are determined, the goal is to minimize the length of ε, which amounts to finding a least-squares solution of Xβ = y. In each case, the least-squares solution β̂ is a solution of the normal equations

XᵀXβ = Xᵀy

Least-Squares Fitting of Other Curves

When data points (x1, y1), …, (xn, yn) on a scatter plot do not lie close to any line, it may be appropriate to postulate some other functional relationship between x and y.

The next two examples show how to fit data by curves that have the general form

y = β0 f0(x) + β1 f1(x) + ⋯ + βk fk(x)   (2)

where f0, …, fk are known functions and β0, …, βk are parameters that must be determined. As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.

For a particular value of x, (2) gives a predicted, or "fitted," value of y. The difference between the observed value and the predicted value is the residual. The parameters β0, …, βk must be determined so as to minimize the sum of the squares of the residuals.

EXAMPLE 2 Suppose data points (x1, y1), …, (xn, yn) appear to lie along some sort of parabola instead of a straight line. For instance, if the x-coordinate denotes the production level for a company, and y denotes the average cost per unit of operating at a level of x units per day, then a typical average cost curve looks like a parabola that opens upward (Fig. 3). In ecology, a parabolic curve that opens downward is used to model the net primary production of nutrients in a plant, as a function of the surface area of the foliage (Fig. 4). Suppose we wish to approximate the data by an equation of the form

y = β0 + β1x + β2x²   (3)

Describe the linear model that produces a "least-squares fit" of the data by equation (3).

[FIGURE 3 Average cost curve: average cost per unit vs. units produced. FIGURE 4 Production of nutrients: net primary production vs. surface area of foliage.]

SOLUTION Equation (3) describes the ideal relationship. Suppose the actual values of the parameters are β0, β1, β2. Then the coordinates of the first data point (x1, y1) satisfy an equation of the form

y1 = β0 + β1x1 + β2x1² + ε1

where ε1 is the residual error between the observed value y1 and the predicted y-value β0 + β1x1 + β2x1². Each data point determines a similar equation:

y1 = β0 + β1x1 + β2x1² + ε1
y2 = β0 + β1x2 + β2x2² + ε2
⋮
yn = β0 + β1xn + β2xn² + εn


It is a simple matter to write this system of equations in the form y = Xβ + ε. To find X, inspect the first few rows of the system and look for the pattern:

y = [y1; y2; …; yn],   X = [1 x1 x1²; 1 x2 x2²; ⋮; 1 xn xn²],   β = [β0; β1; β2],   ε = [ε1; ε2; …; εn]

EXAMPLE 3 If data points tend to follow a pattern such as in Fig. 5, then an appropriate model might be an equation of the form

y = β0 + β1x + β2x² + β3x³

Such data, for instance, could come from a company's total costs, as a function of the level of production. Describe the linear model that gives a least-squares fit of this type to data (x1, y1), …, (xn, yn).

[FIGURE 5 Data points along a cubic curve.]

SOLUTION By an analysis similar to that in Example 2, we obtain the observation vector, design matrix, parameter vector, and residual vector:

y = [y1; y2; …; yn],   X = [1 x1 x1² x1³; 1 x2 x2² x2³; ⋮; 1 xn xn² xn³],   β = [β0; β1; β2; β3],   ε = [ε1; ε2; …; εn]
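A small sketch in Python with NumPy (our illustration; the data below are hypothetical) of the design matrices in Examples 2 and 3: each column of X holds a power of the x-data, so fitting the cubic is an ordinary least-squares solve.

import numpy as np

def poly_design(x, degree):
    # Design matrix with columns 1, x, x^2, ..., x^degree, as in Examples 2 and 3.
    return np.column_stack([x**j for j in range(degree + 1)])

x = np.array([0., 1., 2., 3., 4.])        # hypothetical data, for illustration only
y = np.array([1.1, 1.9, 4.2, 8.8, 16.3])

X = poly_design(x, 3)                     # cubic model of Example 3
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                               # least-squares estimates of beta0, ..., beta3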

Multiple Regression

Suppose an experiment involves two independent variables, say u and v, and one dependent variable, y. A simple equation for predicting y from u and v has the form

y = β0 + β1u + β2v   (4)

A more general prediction equation might have the form

y = β0 + β1u + β2v + β3u² + β4uv + β5v²   (5)

This equation is used in geology, for instance, to model erosion surfaces, glacial cirques, soil pH, and other quantities. In such cases, the least-squares fit is called a trend surface.

Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though u and v are multiplied). In general, a linear model will arise whenever y is to be predicted by an equation of the form

y = β0 f0(u, v) + β1 f1(u, v) + ⋯ + βk fk(u, v)

with f0, …, fk any sort of known functions and β0, …, βk unknown weights.

EXAMPLE 4 In geography, local models of terrain are constructed from data (u1, v1, y1), …, (un, vn, yn), where uj, vj, and yj are latitude, longitude, and altitude, respectively. Describe the linear model based on (4) that gives a least-squares fit to such data. The solution is called the least-squares plane. See Fig. 6.

[FIGURE 6 A least-squares plane.]

SOLUTION We expect the data to satisfy the following equations:

y1 = β0 + β1u1 + β2v1 + ε1
y2 = β0 + β1u2 + β2v2 + ε2
⋮
yn = β0 + β1un + β2vn + εn

This system has the matrix form y = Xβ + ε, where the observation vector, design matrix, parameter vector, and residual vector are

y = [y1; y2; …; yn],   X = [1 u1 v1; 1 u2 v2; ⋮; 1 un vn],   β = [β0; β1; β2],   ε = [ε1; ε2; …; εn]

Example 4 shows that the linear model for multiple regression has the same abstract form as the model for the simple regression in the earlier examples. Linear algebra gives us the power to understand the general principle behind all the linear models. Once X is defined properly, the normal equations for β have the same matrix form, no matter how many variables are involved. Thus, for any linear model where XᵀX is invertible, the least-squares β̂ is given by (XᵀX)⁻¹Xᵀy.

Further Reading

Ferguson, J., Introduction to Linear Algebra in Geology (New York: Chapman & Hall, 1994).
Krumbein, W. C., and F. A. Graybill, An Introduction to Statistical Models in Geology (New York: McGraw-Hill, 1965).
Legendre, P., and L. Legendre, Numerical Ecology (Amsterdam: Elsevier, 1998).
Unwin, David J., An Introduction to Trend Surface Analysis, Concepts and Techniques in Modern Geography, No. 5 (Norwich, England: GeoBooks, 1975).

PRACTICE PROBLEM

When the monthly sales of a product are subject to seasonal fluctuations, a curve that approximates the sales data might have the form

y = β0 + β1x + β2 sin(2πx/12)

where x is the time in months. The term β0 + β1x gives the basic sales trend, and the sine term reflects the seasonal changes in sales. Give the design matrix and the parameter vector for the linear model that leads to a least-squares fit of the equation above. Assume the data are (x1, y1), …, (xn, yn).


6.6 EXERCISES

In Exercises 1–4, find the equation y = β0 + β1x of the least-squares line that best fits the given data points.

1. (0, 1), (1, 1), (2, 2), (3, 2)
2. (1, 0), (2, 1), (4, 2), (5, 3)
3. (−1, 0), (0, 1), (1, 2), (2, 4)
4. (2, 3), (3, 2), (5, 1), (6, 0)

5. Let X be the design matrix used to find the least-squares line to fit data (x1, y1), …, (xn, yn). Use a theorem in Section 6.5 to show that the normal equations have a unique solution if and only if the data include at least two data points with different x-coordinates.

6. Let X be the design matrix in Example 2 corresponding to a least-squares fit of a parabola to data (x1, y1), …, (xn, yn). Suppose x1, x2, and x3 are distinct. Explain why there is only one parabola that fits the data best, in a least-squares sense. (See Exercise 5.)

7. A certain experiment produces the data (1, 1.8), (2, 2.7), (3, 3.4), (4, 3.8), (5, 3.9). Describe the model that produces a least-squares fit of these points by a function of the form

y = β1x + β2x²

Such a function might arise, for example, as the revenue from the sale of x units of a product, when the amount offered for sale affects the price to be set for the product.
a. Give the design matrix, the observation vector, and the unknown parameter vector.
b. [M] Find the associated least-squares curve for the data.

8. A simple curve that often makes a good model for the variable costs of a company, as a function of the sales level x, has the form y = β1x + β2x² + β3x³. There is no constant term because fixed costs are not included.
a. Give the design matrix and the parameter vector for the linear model that leads to a least-squares fit of the equation above, with data (x1, y1), …, (xn, yn).
b. [M] Find the least-squares curve of the form above to fit the data (4, 1.58), (6, 2.08), (8, 2.5), (10, 2.8), (12, 3.1), (14, 3.4), (16, 3.8), and (18, 4.32), with values in thousands. If possible, produce a graph that shows the data points and the graph of the cubic approximation.

9. A certain experiment produces the data (1, 7.9), (2, 5.4), and (3, −.9). Describe the model that produces a least-squares fit of these points by a function of the form

y = A cos x + B sin x

10. Suppose radioactive substances A and B have decay constants of .02 and .07, respectively. If a mixture of these two substances at time t = 0 contains MA grams of A and MB grams of B, then a model for the total amount y of the mixture present at time t is

y = MA e^(−.02t) + MB e^(−.07t)   (6)

Suppose the initial amounts MA and MB are unknown, but a scientist is able to measure the total amounts present at several times and records the following points (ti, yi): (10, 21.34), (11, 20.68), (12, 20.05), (14, 18.87), and (15, 18.30).
a. Describe a linear model that can be used to estimate MA and MB.
b. [M] Find the least-squares curve based on (6).

[Photo caption: Halley's Comet last appeared in 1986 and will reappear in 2061.]

11. [M] According to Kepler's first law, a comet should have an elliptic, parabolic, or hyperbolic orbit (with gravitational attractions from the planets ignored). In suitable polar coordinates, the position (r, ϑ) of a comet satisfies an equation of the form

r = β + e(r·cos ϑ)

where β is a constant and e is the eccentricity of the orbit, with 0 ≤ e < 1 for an ellipse, e = 1 for a parabola, and e > 1 for a hyperbola. Suppose observations of a newly discovered comet provide the data below. Determine the type of orbit, and predict where the comet will be when ϑ = 4.6 (radians).³

ϑ:  .88   1.10   1.42   1.77   2.14
r:  3.00  2.30   1.65   1.25   1.01

12. [M] A healthy child's systolic blood pressure p (in millimeters of mercury) and weight w (in pounds) are approximately related by the equation

β0 + β1 ln w = p

Use the following experimental data to estimate the systolic blood pressure of a healthy child weighing 100 pounds.

³The basic idea of least-squares fitting of data is due to K. F. Gauss (and, independently, to A. Legendre), whose initial rise to fame occurred in 1801 when he used the method to determine the path of the asteroid Ceres. Forty days after the asteroid was discovered, it disappeared behind the sun. Gauss predicted it would appear ten months later and gave its location. The accuracy of the prediction astonished the European scientific community.


w:     44    61    81    113   131
ln w:  3.78  4.11  4.39  4.73  4.88
p:     91    98    103   110   112

13. [M] To measure the takeoff performance of an airplane, the horizontal position of the plane was measured every second, from t = 0 to t = 12. The positions (in feet) were: 0, 8.8, 29.9, 62.0, 104.7, 159.1, 222.0, 294.5, 380.4, 471.1, 571.7, 686.8, and 809.2.
a. Find the least-squares cubic curve y = β0 + β1t + β2t² + β3t³ for these data.
b. Use the result of part (a) to estimate the velocity of the plane when t = 4.5 seconds.

14. Let x̄ = (1/n)(x1 + ⋯ + xn) and ȳ = (1/n)(y1 + ⋯ + yn). Show that the least-squares line for the data (x1, y1), …, (xn, yn) must pass through (x̄, ȳ). That is, show that x̄ and ȳ satisfy the linear equation ȳ = β̂0 + β̂1x̄. [Hint: Derive this equation from the vector equation y = Xβ̂ + ε. Denote the first column of X by 1. Use the fact that the residual vector ε is orthogonal to the column space of X and hence is orthogonal to 1.]

Given data for a least-squares problem, (x1, y1), …, (xn, yn), the following abbreviations are helpful:

Σx = Σ_{i=1}^n xi,   Σx² = Σ_{i=1}^n xi²,   Σy = Σ_{i=1}^n yi,   Σxy = Σ_{i=1}^n xi yi

The normal equations for a least-squares line y = β̂0 + β̂1x may be written in the form

n β̂0 + β̂1 Σx = Σy
β̂0 Σx + β̂1 Σx² = Σxy   (7)

15. Derive the normal equations (7) from the matrix form given in this section.

16. Use a matrix inverse to solve the system of equations in (7) and thereby obtain formulas for β̂0 and β̂1 that appear in many statistics texts.

17. a. Rewrite the data in Example 1 with new x-coordinates in mean-deviation form. Let X be the associated design matrix. Why are the columns of X orthogonal?
b. Write the normal equations for the data in part (a), and solve them to find the least-squares line, y = β0 + β1x*, where x* = x − 5.5.

18. Suppose the x-coordinates of the data (x1, y1), …, (xn, yn) are in mean-deviation form, so that Σxi = 0. Show that if X is the design matrix for the least-squares line in this case, then XᵀX is a diagonal matrix.

Exercises 19 and 20 involve a design matrix X with two or more columns and a least-squares solution β̂ of y = Xβ. Consider the following numbers.

(i) ‖Xβ̂‖², the sum of the squares of the "regression term"; denote this number by SS(R).
(ii) ‖y − Xβ̂‖², the sum of the squares for the error term; denote this number by SS(E).
(iii) ‖y‖², the "total" sum of the squares of the y-values; denote this number by SS(T).

Every statistics text that discusses regression and the linear model y = Xβ + ε introduces these numbers, though terminology and notation vary somewhat. To simplify matters, assume that the mean of the y-values is zero. In this case, SS(T) is proportional to what is called the variance of the set of y-values.

19. Justify the equation SS(T) = SS(R) + SS(E). [Hint: Use a theorem, and explain why the hypotheses of the theorem are satisfied.] This equation is extremely important in statistics, both in regression theory and in the analysis of variance.

20. Show that ‖Xβ̂‖² = β̂ᵀXᵀy. [Hint: Rewrite the left side and use the fact that β̂ satisfies the normal equations.] This formula for SS(R) is used in statistics. From this and from Exercise 19, obtain the standard formula for SS(E):

SS(E) = yᵀy − β̂ᵀXᵀy

SOLUTION TO PRACTICE PROBLEM

Construct X and β so that the kth row of Xβ is the predicted y-value that corresponds to the data point (xk, yk), namely,

β0 + β1xk + β2 sin(2πxk/12)

It should be clear that

X = [1 x1 sin(2πx1/12); ⋮; 1 xn sin(2πxn/12)],   β = [β0; β1; β2]

[Margin figure: Sales trend with seasonal fluctuations.]
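A short sketch of the practice problem's model in Python with NumPy (our illustration; the monthly sales data are synthetic, generated only to exercise the fit):

import numpy as np

x = np.arange(1., 25.)                           # hypothetical months 1, ..., 24
rng = np.random.default_rng(0)
y = 50 + 2*x + 10*np.sin(2*np.pi*x/12) + rng.normal(0, 1, x.size)   # synthetic sales

X = np.column_stack([np.ones_like(x), x, np.sin(2*np.pi*x/12)])     # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # estimates of (beta0, beta1, beta2), close to the true (50, 2, 10)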


6.7 INNER PRODUCT SPACES

Notions of length, distance, and orthogonality are often important in applications involving a vector space. For R^n, these concepts were based on the properties of the inner product listed in Theorem 1 of Section 6.1. For other spaces, we need analogues of the inner product with the same properties. The conclusions of Theorem 1 now become axioms in the following definition.

DEFINITION An inner product on a vector space V is a function that, to each pair of vectors u and v in V, associates a real number ⟨u, v⟩ and satisfies the following axioms, for all u, v, w in V and all scalars c:

1. ⟨u, v⟩ = ⟨v, u⟩
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
3. ⟨cu, v⟩ = c⟨u, v⟩
4. ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 if and only if u = 0

A vector space with an inner product is called an inner product space.

The vector space R^n with the standard inner product is an inner product space, and nearly everything discussed in this chapter for R^n carries over to inner product spaces. The examples in this section and the next lay the foundation for a variety of applications treated in courses in engineering, physics, mathematics, and statistics.

EXAMPLE 1 Fix any two positive numbers, say 4 and 5, and for vectors u = (u1, u2) and v = (v1, v2) in R², set

⟨u, v⟩ = 4u1v1 + 5u2v2   (1)

Show that equation (1) defines an inner product.

SOLUTION Certainly Axiom 1 is satisfied, because ⟨u, v⟩ = 4u1v1 + 5u2v2 = 4v1u1 + 5v2u2 = ⟨v, u⟩. If w = (w1, w2), then

⟨u + v, w⟩ = 4(u1 + v1)w1 + 5(u2 + v2)w2
           = 4u1w1 + 5u2w2 + 4v1w1 + 5v2w2
           = ⟨u, w⟩ + ⟨v, w⟩

This verifies Axiom 2. For Axiom 3, compute

⟨cu, v⟩ = 4(cu1)v1 + 5(cu2)v2 = c(4u1v1 + 5u2v2) = c⟨u, v⟩

For Axiom 4, note that ⟨u, u⟩ = 4u1² + 5u2² ≥ 0, and 4u1² + 5u2² = 0 only if u1 = u2 = 0, that is, if u = 0. Also, ⟨0, 0⟩ = 0. So (1) defines an inner product on R².

Inner products similar to (1) can be defined on R^n. They arise naturally in connection with "weighted least-squares" problems, in which weights are assigned to the various entries in the sum for the inner product in such a way that more importance is given to the more reliable measurements.

From now on, when an inner product space involves polynomials or other functions, we will write the functions in the familiar way, rather than use the boldface type for vectors. Nevertheless, it is important to remember that each function is a vector when it is treated as an element of a vector space.


EXAMPLE 2 Let t0, …, tn be distinct real numbers. For p and q in Pn, define

⟨p, q⟩ = p(t0)q(t0) + p(t1)q(t1) + ⋯ + p(tn)q(tn)   (2)

Inner product Axioms 1–3 are readily checked. For Axiom 4, note that

⟨p, p⟩ = [p(t0)]² + [p(t1)]² + ⋯ + [p(tn)]² ≥ 0

Also, ⟨0, 0⟩ = 0. (The boldface zero here denotes the zero polynomial, the zero vector in Pn.) If ⟨p, p⟩ = 0, then p must vanish at n + 1 points: t0, …, tn. This is possible only if p is the zero polynomial, because the degree of p is less than n + 1. Thus (2) defines an inner product on Pn.

EXAMPLE 3 Let V be P2, with the inner product from Example 2, where t0 = 0, t1 = 1/2, and t2 = 1. Let p(t) = 12t² and q(t) = 2t − 1. Compute ⟨p, q⟩ and ⟨q, q⟩.

SOLUTION

⟨p, q⟩ = p(0)q(0) + p(1/2)q(1/2) + p(1)q(1) = (0)(−1) + (3)(0) + (12)(1) = 12

⟨q, q⟩ = [q(0)]² + [q(1/2)]² + [q(1)]² = (−1)² + (0)² + (1)² = 2
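A tiny Python/NumPy sketch of the evaluation inner product (2) (our illustration): representing each polynomial by its vector of values at t0, …, tn turns the inner product into an ordinary dot product, and the computations of Example 3 fall out directly.

import numpy as np

t = np.array([0., 0.5, 1.])     # the evaluation points t0, t1, t2 of Example 3

def ip(p, q):
    # Evaluation inner product (2): sum of p(ti) * q(ti).
    return np.dot(p(t), q(t))

p = lambda s: 12 * s**2         # p(t) = 12t^2
q = lambda s: 2 * s - 1         # q(t) = 2t - 1
print(ip(p, q))                 # 12.0, as in Example 3
print(ip(q, q))                 # 2.0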

Lengths, Distances, and Orthogonality

Let V be an inner product space, with the inner product denoted by ⟨u, v⟩. Just as in R^n, we define the length, or norm, of a vector v to be the scalar

‖v‖ = √⟨v, v⟩

Equivalently, ‖v‖² = ⟨v, v⟩. (This definition makes sense because ⟨v, v⟩ ≥ 0, but the definition does not say that ⟨v, v⟩ is a "sum of squares," because v need not be an element of R^n.)

A unit vector is one whose length is 1. The distance between u and v is ‖u − v‖. Vectors u and v are orthogonal if ⟨u, v⟩ = 0.

EXAMPLE 4 Let P2 have the inner product (2) of Example 3. Compute the lengths of the vectors p(t) = 12t² and q(t) = 2t − 1.

SOLUTION

‖p‖² = ⟨p, p⟩ = [p(0)]² + [p(1/2)]² + [p(1)]² = 0 + [3]² + [12]² = 153

so ‖p‖ = √153. From Example 3, ⟨q, q⟩ = 2. Hence ‖q‖ = √2.

The Gram–Schmidt Process

The existence of orthogonal bases for finite-dimensional subspaces of an inner product space can be established by the Gram–Schmidt process, just as in R^n. Certain orthogonal bases that arise frequently in applications can be constructed by this process.

The orthogonal projection of a vector onto a subspace W with an orthogonal basis can be constructed as usual. The projection does not depend on the choice of orthogonal basis, and it has the properties described in the Orthogonal Decomposition Theorem and the Best Approximation Theorem.


EXAMPLE 5 Let V be P4 with the inner product in Example 2, involving evaluation of polynomials at −2, −1, 0, 1, and 2, and view P2 as a subspace of V. Produce an orthogonal basis for P2 by applying the Gram–Schmidt process to the polynomials 1, t, and t².

SOLUTION The inner product depends only on the values of a polynomial at −2, …, 2, so we list the values of each polynomial as a vector in R⁵, underneath the name of the polynomial:¹

Polynomial:         1                  t                   t²
Vector of values:   [1; 1; 1; 1; 1],   [−2; −1; 0; 1; 2],  [4; 1; 0; 1; 4]

The inner product of two polynomials in V equals the (standard) inner product of their corresponding vectors in R⁵. Observe that t is orthogonal to the constant function 1. So take p0(t) = 1 and p1(t) = t. For p2, use the vectors in R⁵ to compute the projection of t² onto Span{p0, p1}:

⟨t², p0⟩ = ⟨t², 1⟩ = 4 + 1 + 0 + 1 + 4 = 10
⟨p0, p0⟩ = 5
⟨t², p1⟩ = ⟨t², t⟩ = −8 + (−1) + 0 + 1 + 8 = 0

The orthogonal projection of t² onto Span{1, t} is (10/5)p0 + 0·p1. Thus

p2(t) = t² − 2p0(t) = t² − 2

An orthogonal basis for the subspace P2 of V is:

Polynomial:         p0                 p1                  p2
Vector of values:   [1; 1; 1; 1; 1],   [−2; −1; 0; 1; 2],  [2; −1; −2; −1; 2]   (3)

Best Approximation in Inner Product Spaces

A common problem in applied mathematics involves a vector space V whose elements are functions. The problem is to approximate a function f in V by a function g from a specified subspace W of V. The "closeness" of the approximation of f depends on the way ‖f − g‖ is defined. We will consider only the case in which the distance between f and g is determined by an inner product. In this case, the best approximation to f by functions in W is the orthogonal projection of f onto the subspace W.

EXAMPLE 6 Let V be P4 with the inner product in Example 5, and let p0, p1, and p2 be the orthogonal basis found in Example 5 for the subspace P2. Find the best approximation to p(t) = 5 − (1/2)t⁴ by polynomials in P2.

¹Each polynomial in P4 is uniquely determined by its values at the five numbers −2, …, 2. In fact, the correspondence between p and its vector of values is an isomorphism, that is, a one-to-one mapping onto R⁵ that preserves linear combinations.


SOLUTION The values of p0, p1, and p2 at the numbers −2, −1, 0, 1, and 2 are listed in the R⁵ vectors in (3) above. The corresponding values for p are −3, 9/2, 5, 9/2, and −3. Compute

⟨p, p0⟩ = 8,   ⟨p, p1⟩ = 0,   ⟨p, p2⟩ = −31
⟨p0, p0⟩ = 5,  ⟨p2, p2⟩ = 14

Then the best approximation in V to p by polynomials in P2 is

p̂ = proj_{P2} p = (⟨p, p0⟩/⟨p0, p0⟩) p0 + (⟨p, p1⟩/⟨p1, p1⟩) p1 + (⟨p, p2⟩/⟨p2, p2⟩) p2
  = (8/5) p0 + (−31/14) p2 = 8/5 − (31/14)(t² − 2)

This polynomial is the closest to p of all polynomials in P2, when the distance between polynomials is measured only at −2, −1, 0, 1, and 2. See Fig. 1.

[FIGURE 1 The graphs of p(t) and of its best approximation p̂(t).]

The polynomials p0, p1, and p2 in Examples 5 and 6 belong to a class of polynomials that are referred to in statistics as orthogonal polynomials.² The orthogonality refers to the type of inner product described in Example 2.
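A numerical sketch of Examples 5 and 6 in Python with NumPy (our illustration): with the evaluation inner product, every step reduces to dot products of value vectors.

import numpy as np

t = np.array([-2., -1., 0., 1., 2.])
p0, p1 = np.ones(5), t                     # values of the polynomials 1 and t
t2 = t**2                                  # values of t^2
p2 = t2 - (t2 @ p0)/(p0 @ p0)*p0 - (t2 @ p1)/(p1 @ p1)*p1   # Gram-Schmidt step
print(p2)                                  # [ 2. -1. -2. -1.  2.], i.e. t^2 - 2

p = 5 - 0.5*t**4                           # values of p(t) = 5 - t^4/2
proj = sum((p @ q)/(q @ q)*q for q in (p0, p1, p2))   # best approximation in P2
print(proj)                                # values of 8/5 - (31/14)(t^2 - 2)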

Two Inequalities

Given a vector v in an inner product space V and given a finite-dimensional subspace W, we may apply the Pythagorean Theorem to the orthogonal decomposition of v with respect to W and obtain

‖v‖² = ‖proj_W v‖² + ‖v − proj_W v‖²

See Fig. 2. In particular, this shows that the norm of the projection of v onto W does not exceed the norm of v itself. This simple observation leads to the following important inequality.

[FIGURE 2 The hypotenuse is the longest side: ‖v‖, ‖proj_W v‖, and ‖v − proj_W v‖ form a right triangle.]

THEOREM 16 The Cauchy–Schwarz Inequality

For all u, v in V,

|⟨u, v⟩| ≤ ‖u‖ ‖v‖   (4)

²See Statistics and Experimental Design in Engineering and the Physical Sciences, 2nd ed., by Norman L. Johnson and Fred C. Leone (New York: John Wiley & Sons, 1977). Tables there list "Orthogonal Polynomials," which are simply the values of the polynomial at numbers such as −2, −1, 0, 1, and 2.


PROOF If u = 0, then both sides of (4) are zero, and hence the inequality is true in this case. (See Practice Problem 1.) If u ≠ 0, let W be the subspace spanned by u. Recall that ‖cu‖ = |c| ‖u‖ for any scalar c. Thus

‖proj_W v‖ = ‖(⟨v, u⟩/⟨u, u⟩)u‖ = (|⟨v, u⟩|/|⟨u, u⟩|) ‖u‖ = (|⟨v, u⟩|/‖u‖²) ‖u‖ = |⟨u, v⟩|/‖u‖

Since ‖proj_W v‖ ≤ ‖v‖, we have |⟨u, v⟩|/‖u‖ ≤ ‖v‖, which gives (4).
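A quick numerical spot check of (4), not a substitute for the proof above; it samples random vectors in R⁵ with the standard inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    u, v = rng.normal(size=5), rng.normal(size=5)
    # |<u, v>| <= ||u|| ||v|| for the standard inner product on R^5
    assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12
print("Cauchy-Schwarz holds on all random samples")
```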

The Cauchy–Schwarz inequality is useful in many branches of mathematics. A few simple applications are presented in the exercises. Our main need for this inequality here is to prove another fundamental inequality involving norms of vectors. See Figure 3.

THEOREM 17 The Triangle Inequality

For all u, v in V,

‖u + v‖ ≤ ‖u‖ + ‖v‖

PROOF

‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩
         ≤ ‖u‖² + 2|⟨u, v⟩| + ‖v‖²
         ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖²   (Cauchy–Schwarz)
         = (‖u‖ + ‖v‖)²

The triangle inequality follows immediately by taking square roots of both sides.

FIGURE 3 The lengths of the sides of a triangle.

An Inner Product for C[a, b] (Calculus required)

Probably the most widely used inner product space for applications is the vector space C[a, b] of all continuous functions on an interval a ≤ t ≤ b, with an inner product that we will describe.

We begin by considering a polynomial p and any integer n larger than or equal to the degree of p. Then p is in Pn, and we may compute a "length" for p using the inner product of Example 2, involving evaluation at n + 1 points in [a, b]. However, this length of p captures the behavior at only those n + 1 points. Since p is in Pn for all large n, we could use a much larger n, with many more points for the "evaluation" inner product. See Figure 4.

FIGURE 4 Using different numbers of evaluation points in [a, b] to compute ‖p‖².


Let us partition [a, b] into n + 1 subintervals of length Δt = (b − a)/(n + 1), and let t0, …, tn be arbitrary points in these subintervals.

If n is large, the inner product on Pn determined by t0, …, tn will tend to give a large value to ⟨p, p⟩, so we scale it down and divide by n + 1. Observe that 1/(n + 1) = Δt/(b − a), and define

⟨p, q⟩ = (1/(n + 1)) Σ_{j=0}^{n} p(tj)q(tj) = (1/(b − a)) [ Σ_{j=0}^{n} p(tj)q(tj) Δt ]

Now let n increase without bound. Since the polynomials p and q are continuous functions, the expression in brackets is a Riemann sum that approaches a definite integral, and we are led to consider the average value of p(t)q(t) on the interval [a, b]:

(1/(b − a)) ∫_a^b p(t)q(t) dt

This quantity is defined for polynomials of any degree (in fact, for all continuous functions), and it has all the properties of an inner product, as the next example shows. The scale factor 1/(b − a) is inessential and is often omitted for simplicity.

EXAMPLE 7 For f, g in C[a, b], set

⟨f, g⟩ = ∫_a^b f(t)g(t) dt   (5)

Show that (5) defines an inner product on C[a, b].

SOLUTION Inner product Axioms 1–3 follow from elementary properties of definite integrals. For Axiom 4, observe that

⟨f, f⟩ = ∫_a^b [f(t)]² dt ≥ 0

The function [f(t)]² is continuous and nonnegative on [a, b]. If the definite integral of [f(t)]² is zero, then [f(t)]² must be identically zero on [a, b], by a theorem in advanced calculus, in which case f is the zero function. Thus ⟨f, f⟩ = 0 implies that f is the zero function on [a, b]. So (5) defines an inner product on C[a, b].
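In computations, the integral inner product (5) is usually approximated by numerical quadrature. A minimal sketch using SciPy's quad; the helper name inner is illustrative, not from the text.

```python
from scipy.integrate import quad

# Integral inner product on C[a, b], evaluated numerically.
def inner(f, g, a=0.0, b=1.0):
    value, _err = quad(lambda t: f(t) * g(t), a, b)
    return value

# Example: <t, t^2> on [0, 1] should be 1/4.
print(inner(lambda t: t, lambda t: t**2))  # ~0.25
```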

EXAMPLE 8 Let V be the space C[0, 1] with the inner product of Example 7, and let W be the subspace spanned by the polynomials p1(t) = 1, p2(t) = 2t − 1, and p3(t) = 12t². Use the Gram–Schmidt process to find an orthogonal basis for W.

SOLUTION Let q1 = p1, and compute

⟨p2, q1⟩ = ∫_0^1 (2t − 1)(1) dt = (t² − t) |_0^1 = 0


So p2 is already orthogonal to q1, and we can take q2 = p2. For the projection of p3 onto W2 = Span{q1, q2}, compute

⟨p3, q1⟩ = ∫_0^1 12t² · 1 dt = 4t³ |_0^1 = 4
⟨q1, q1⟩ = ∫_0^1 1 · 1 dt = t |_0^1 = 1
⟨p3, q2⟩ = ∫_0^1 12t²(2t − 1) dt = ∫_0^1 (24t³ − 12t²) dt = 2
⟨q2, q2⟩ = ∫_0^1 (2t − 1)² dt = (1/6)(2t − 1)³ |_0^1 = 1/3

Then

proj_{W2} p3 = (⟨p3, q1⟩/⟨q1, q1⟩)q1 + (⟨p3, q2⟩/⟨q2, q2⟩)q2 = (4/1)q1 + (2/(1/3))q2 = 4q1 + 6q2

and

q3 = p3 − proj_{W2} p3 = p3 − 4q1 − 6q2

As a function, q3(t) = 12t² − 4 − 6(2t − 1) = 12t² − 12t + 2. The orthogonal basis for the subspace W is {q1, q2, q3}.
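The Gram–Schmidt calculation of Example 8 can be reproduced symbolically; a minimal SymPy sketch, assuming the same integral inner product on [0, 1]:

```python
import sympy as sp

t = sp.symbols('t')
inner = lambda f, g: sp.integrate(f * g, (t, 0, 1))  # inner product of Example 7

p1, p2, p3 = sp.Integer(1), 2*t - 1, 12*t**2
q1 = p1
q2 = p2 - inner(p2, q1)/inner(q1, q1) * q1           # projection term is 0 here
q3 = sp.expand(p3 - inner(p3, q1)/inner(q1, q1)*q1
                  - inner(p3, q2)/inner(q2, q2)*q2)

print(q3)                            # 12*t**2 - 12*t + 2
print(inner(q1, q3), inner(q2, q3))  # 0 0
```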

PRACTICE PROBLEMS

Use the inner product axioms to verify the following statements.

1. ⟨v, 0⟩ = ⟨0, v⟩ = 0.
2. ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.

6.7 EXERCISES

1. Let R² have the inner product of Example 1, and let x = (1, 1) and y = (5, −1).
   a. Find ‖x‖, ‖y‖, and |⟨x, y⟩|².
   b. Describe all vectors (z1, z2) that are orthogonal to y.

2. Let R² have the inner product of Example 1. Show that the Cauchy–Schwarz inequality holds for x = (3, −2) and y = (−2, 1). [Suggestion: Study |⟨x, y⟩|².]

Exercises 3–8 refer to P2 with the inner product given by evaluation at −1, 0, and 1. (See Example 2.)

3. Compute ⟨p, q⟩, where p(t) = 4 + t, q(t) = 5 − 4t².

4. Compute ⟨p, q⟩, where p(t) = 3t − t², q(t) = 3 + 2t².

5. Compute ‖p‖ and ‖q‖, for p and q in Exercise 3.

6. Compute ‖p‖ and ‖q‖, for p and q in Exercise 4.

7. Compute the orthogonal projection of q onto the subspace spanned by p, for p and q in Exercise 3.

8. Compute the orthogonal projection of q onto the subspace spanned by p, for p and q in Exercise 4.

9. Let P3 have the inner product given by evaluation at −3, −1, 1, and 3. Let p0(t) = 1, p1(t) = t, and p2(t) = t².
   a. Compute the orthogonal projection of p2 onto the subspace spanned by p0 and p1.
   b. Find a polynomial q that is orthogonal to p0 and p1, such that {p0, p1, q} is an orthogonal basis for Span{p0, p1, p2}. Scale the polynomial q so that its vector of values at (−3, −1, 1, 3) is (1, −1, −1, 1).

10. Let P3 have the inner product as in Exercise 9, with p0, p1, and q the polynomials described there. Find the best approximation to p(t) = t³ by polynomials in Span{p0, p1, q}.

11. Let p0, p1, and p2 be the orthogonal polynomials described in Example 5, where the inner product on P4 is given by evaluation at −2, −1, 0, 1, and 2. Find the orthogonal projection of t³ onto Span{p0, p1, p2}.

12. Find a polynomial p3 such that {p0, p1, p2, p3} (see Exercise 11) is an orthogonal basis for the subspace P3 of P4. Scale the polynomial p3 so that its vector of values is (−1, 2, 0, −2, 1).


13. Let A be any invertible n × n matrix. Show that for u, v in Rⁿ, the formula ⟨u, v⟩ = (Au) · (Av) = (Au)ᵀ(Av) defines an inner product on Rⁿ.

14. Let T be a one-to-one linear transformation from a vector space V into Rⁿ. Show that for u, v in V, the formula ⟨u, v⟩ = T(u) · T(v) defines an inner product on V.

Use the inner product axioms and other results of this section to verify the statements in Exercises 15–18.

15. ⟨u, cv⟩ = c⟨u, v⟩ for all scalars c.

16. If {u, v} is an orthonormal set in V, then ‖u − v‖ = √2.

17. ⟨u, v⟩ = (1/4)‖u + v‖² − (1/4)‖u − v‖².

18. ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².

19. Given a ≥ 0 and b ≥ 0, let u = (√a, √b)ᵀ and v = (√b, √a)ᵀ. Use the Cauchy–Schwarz inequality to compare the geometric mean √(ab) with the arithmetic mean (a + b)/2.

20. Let u = (a, b)ᵀ and v = (1, 1)ᵀ. Use the Cauchy–Schwarz inequality to show that

((a + b)/2)² ≤ (a² + b²)/2

Exercises 21–24 refer to V = C[0, 1], with the inner product given by an integral, as in Example 7.

21. Compute ⟨f, g⟩, where f(t) = 1 − 3t² and g(t) = t − t³.

22. Compute ⟨f, g⟩, where f(t) = 5t − 3 and g(t) = t³ − t².

23. Compute ‖f‖ for f in Exercise 21.

24. Compute ‖g‖ for g in Exercise 22.

25. Let V be the space C[−1, 1] with the inner product of Example 7. Find an orthogonal basis for the subspace spanned by the polynomials 1, t, and t². The polynomials in this basis are called Legendre polynomials.

26. Let V be the space C[−2, 2] with the inner product of Example 7. Find an orthogonal basis for the subspace spanned by the polynomials 1, t, and t².

27. [M] Let P4 have the inner product as in Example 5, and let p0, p1, p2 be the orthogonal polynomials from that example. Using your matrix program, apply the Gram–Schmidt process to the set {p0, p1, p2, t³, t⁴} to create an orthogonal basis for P4.

28. [M] Let V be the space C[0, 2π] with the inner product of Example 7. Use the Gram–Schmidt process to create an orthogonal basis for the subspace spanned by {1, cos t, cos²t, cos³t}. Use a matrix program or computational program to compute the appropriate definite integrals.

SOLUTIONS TO PRACTICE PROBLEMS

1. By Axiom 1, ⟨v, 0⟩ = ⟨0, v⟩. Then ⟨0, v⟩ = ⟨0v, v⟩ = 0⟨v, v⟩, by Axiom 3, so ⟨0, v⟩ = 0.

2. By Axioms 1, 2, and then 1 again, ⟨u, v + w⟩ = ⟨v + w, u⟩ = ⟨v, u⟩ + ⟨w, u⟩ = ⟨u, v⟩ + ⟨u, w⟩.

6.8 APPLICATIONS OF INNER PRODUCT SPACES

The examples in this section suggest how the inner product spaces defined in Section 6.7 arise in practical problems. The first example is connected with the massive least-squares problem of updating the North American Datum, described in the chapter's introductory example.

Weighted Least-Squares

Let y be a vector of n observations, y1, …, yn, and suppose we wish to approximate y by a vector ŷ that belongs to some specified subspace of Rⁿ. (In Section 6.5, ŷ was written as Ax so that ŷ was in the column space of A.) Denote the entries in ŷ by ŷ1, …, ŷn. Then the sum of the squares for error, or SS(E), in approximating y by ŷ is

SS(E) = (y1 − ŷ1)² + ⋯ + (yn − ŷn)²   (1)

This is simply ‖y − ŷ‖², using the standard length in Rⁿ.


Now suppose the measurements that produced the entries in y are not equally reliable. (This was the case for the North American Datum, since measurements were made over a period of 140 years.) As another example, the entries in y might be computed from various samples of measurements, with unequal sample sizes. Then it becomes appropriate to weight the squared errors in (1) in such a way that more importance is assigned to the more reliable measurements.¹ If the weights are denoted by w1², …, wn², then the weighted sum of the squares for error is

Weighted SS(E) = w1²(y1 − ŷ1)² + ⋯ + wn²(yn − ŷn)²   (2)

This is the square of the length of y − ŷ, where the length is derived from an inner product analogous to that in Example 1 in Section 6.7, namely,

⟨x, y⟩ = w1²x1y1 + ⋯ + wn²xnyn

It is sometimes convenient to transform a weighted least-squares problem into an equivalent ordinary least-squares problem. Let W be the diagonal matrix with (positive) w1, …, wn on its diagonal, so that

W y = diag(w1, …, wn)(y1, y2, …, yn)ᵀ = (w1y1, w2y2, …, wnyn)ᵀ

with a similar expression for W ŷ. Observe that the jth term in (2) can be written as

wj²(yj − ŷj)² = (wj yj − wj ŷj)²

It follows that the weighted SS(E) in (2) is the square of the ordinary length in Rⁿ of W y − W ŷ, which we write as ‖W y − W ŷ‖².

Now suppose the approximating vector ŷ is to be constructed from the columns of a matrix A. Then we seek an x̂ that makes A x̂ = ŷ as close to y as possible. However, the measure of closeness is the weighted error,

‖W y − W ŷ‖² = ‖W y − WA x̂‖²

Thus x̂ is the (ordinary) least-squares solution of the equation

WAx = W y

The normal equation for the least-squares solution is

(WA)ᵀWAx = (WA)ᵀW y

EXAMPLE 1 Find the least-squares line y = β0 + β1x that best fits the data (−2, 3), (−1, 5), (0, 5), (1, 4), and (2, 3). Suppose the errors in measuring the y-values of the last two data points are greater than for the other points. Weight these data half as much as the rest of the data.

¹ Note for readers with a background in statistics: Suppose the errors in measuring the yi are independent random variables with means equal to zero and variances of σ1², …, σn². Then the appropriate weights in (2) are wi² = 1/σi². The larger the variance of the error, the smaller the weight.


SOLUTION As in Section 6.6, write X for the matrix A and β for the vector x, and obtain (rows of matrices separated by semicolons)

X = [1 −2; 1 −1; 1 0; 1 1; 1 2],  β = (β0, β1)ᵀ,  y = (3, 5, 5, 4, 3)ᵀ

For a weighting matrix, choose W with diagonal entries 2, 2, 2, 1, and 1. Left-multiplication by W scales the rows of X and y:

WX = [2 −4; 2 −2; 2 0; 1 1; 1 2],  W y = (6, 10, 10, 4, 3)ᵀ

For the normal equation, compute

(WX)ᵀWX = [14 −9; −9 25]  and  (WX)ᵀW y = (59, −34)ᵀ

and solve

[14 −9; −9 25](β0, β1)ᵀ = (59, −34)ᵀ

The solution of the normal equation is (to two significant digits) β0 = 4.3 and β1 = 0.20. The desired line is

y = 4.3 + 0.20x

In contrast, the ordinary least-squares line for these data is

y = 4.0 − 0.10x

Both lines are displayed in Figure 1.

FIGURE 1 Weighted (y = 4.3 + 0.2x) and ordinary (y = 4 − 0.1x) least-squares lines.
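A minimal NumPy sketch of Example 1: multiply the rows of X and y by the weights and solve the resulting ordinary least-squares problem. (The weighted solution is β0 ≈ 4.35, β1 ≈ 0.20 before rounding.)

```python
import numpy as np

# Data from Example 1; the last two points get half the weight.
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([3, 5, 5, 4, 3], dtype=float)
w = np.array([2, 2, 2, 1, 1], dtype=float)       # diagonal of W

X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
WX, Wy = w[:, None] * X, w * y                   # row scaling by W

# Ordinary least squares on the scaled system (WX)beta = Wy.
beta, *_ = np.linalg.lstsq(WX, Wy, rcond=None)
print(beta)                                      # ~[4.35, 0.20]

# For comparison, the unweighted fit:
print(np.linalg.lstsq(X, y, rcond=None)[0])      # [4.0, -0.1]
```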

Trend Analysis of Data

Let f represent an unknown function whose values are known (perhaps only approximately) at t0, …, tn. If there is a "linear trend" in the data f(t0), …, f(tn), then we might expect to approximate the values of f by a function of the form β0 + β1t. If there is a "quadratic trend" to the data, then we would try a function of the form β0 + β1t + β2t². This was discussed in Section 6.6, from a different point of view.

In some statistical problems, it is important to be able to separate the linear trend from the quadratic trend (and possibly cubic or higher-order trends). For instance, suppose engineers are analyzing the performance of a new car, and f(t) represents the distance between the car at time t and some reference point. If the car is traveling at constant velocity, then the graph of f(t) should be a straight line whose slope is the car's velocity. If the gas pedal is suddenly pressed to the floor, the graph of f(t) will change to include a quadratic term and possibly a cubic term (due to the acceleration). To analyze the ability of the car to pass another car, for example, engineers may want to separate the quadratic and cubic components from the linear term.

If the function is approximated by a curve of the form y = β0 + β1t + β2t², the coefficient β2 may not give the desired information about the quadratic trend in the data, because it may not be "independent" in a statistical sense from the other βi. To make


what is known as a trend analysis of the data, we introduce an inner product on the space Pn analogous to that given in Example 2 in Section 6.7. For p, q in Pn, define

⟨p, q⟩ = p(t0)q(t0) + ⋯ + p(tn)q(tn)

In practice, statisticians seldom need to consider trends in data of degree higher than cubic or quartic. So let p0, p1, p2, p3 denote an orthogonal basis of the subspace P3 of Pn, obtained by applying the Gram–Schmidt process to the polynomials 1, t, t², and t³. By Supplementary Exercise 11 in Chapter 2, there is a polynomial g in Pn whose values at t0, …, tn coincide with those of the unknown function f. Let ĝ be the orthogonal projection (with respect to the given inner product) of g onto P3, say,

ĝ = c0p0 + c1p1 + c2p2 + c3p3

Then ĝ is called a cubic trend function, and c0, …, c3 are the trend coefficients of the data. The coefficient c1 measures the linear trend, c2 the quadratic trend, and c3 the cubic trend. It turns out that if the data have certain properties, these coefficients are statistically independent.

Since p0, …, p3 are orthogonal, the trend coefficients may be computed one at a time, independently of one another. (Recall that ci = ⟨g, pi⟩/⟨pi, pi⟩.) We can ignore p3 and c3 if we want only the quadratic trend. And if, for example, we needed to determine the quartic trend, we would have to find (via Gram–Schmidt) only a polynomial p4 in P4 that is orthogonal to P3 and compute ⟨g, p4⟩/⟨p4, p4⟩.

EXAMPLE 2 The simplest and most common use of trend analysis occurs when the points t0, …, tn can be adjusted so that they are evenly spaced and sum to zero. Fit a quadratic trend function to the data (−2, 3), (−1, 5), (0, 5), (1, 4), and (2, 3).

SOLUTION The t-coordinates are suitably scaled to use the orthogonal polynomials found in Example 5 of Section 6.7:

Polynomial:        p0               p1                 p2                  Data: g
Vector of values:  (1, 1, 1, 1, 1)  (−2, −1, 0, 1, 2)  (2, −1, −2, −1, 2)  (3, 5, 5, 4, 3)

The calculations involve only these vectors, not the specific formulas for the orthogonal polynomials. The best approximation to the data by polynomials in P2 is the orthogonal projection given by

p̂ = (⟨g, p0⟩/⟨p0, p0⟩)p0 + (⟨g, p1⟩/⟨p1, p1⟩)p1 + (⟨g, p2⟩/⟨p2, p2⟩)p2
  = (20/5)p0 − (1/10)p1 − (7/14)p2

and

p̂(t) = 4 − 0.1t − 0.5(t² − 2)   (3)

Since the coefficient of p2 is not extremely small, it would be reasonable to conclude that the trend is at least quadratic. This is confirmed by the graph in Figure 2.

FIGURE 2 Approximation by a quadratic trend function.
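Because the pi are orthogonal, each trend coefficient in Example 2 is a single quotient of inner products. A minimal NumPy sketch:

```python
import numpy as np

ts = np.array([-2, -1, 0, 1, 2], dtype=float)
g  = np.array([3, 5, 5, 4, 3], dtype=float)     # data values

# Value vectors of the orthogonal polynomials p0, p1, p2 from Example 5.
P = [ts**0, ts, ts**2 - 2]

# Trend coefficients c_i = <g, p_i> / <p_i, p_i>, computed independently.
coeffs = [(g @ p) / (p @ p) for p in P]
print(coeffs)                                   # [4.0, -0.1, -0.5]

# Evaluate the quadratic trend function at the data points.
phat = sum(c * p for c, p in zip(coeffs, P))
print(phat)                                     # [3.2, 4.6, 5.0, 4.4, 2.8]
```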


Fourier Series (Calculus required)

Continuous functions are often approximated by linear combinations of sine and cosine functions. For instance, a continuous function might represent a sound wave, an electric signal of some type, or the movement of a vibrating mechanical system.

For simplicity, we consider functions on 0 ≤ t ≤ 2π. It turns out that any function in C[0, 2π] can be approximated as closely as desired by a function of the form

a0/2 + a1 cos t + ⋯ + an cos nt + b1 sin t + ⋯ + bn sin nt   (4)

for a sufficiently large value of n. The function (4) is called a trigonometric polynomial. If an and bn are not both zero, the polynomial is said to be of order n. The connection between trigonometric polynomials and other functions in C[0, 2π] depends on the fact that for any n ≥ 1, the set

{1, cos t, cos 2t, …, cos nt, sin t, sin 2t, …, sin nt}   (5)

is orthogonal with respect to the inner product

⟨f, g⟩ = ∫_0^{2π} f(t)g(t) dt   (6)

This orthogonality is verified as in the following example and in Exercises 5 and 6.

EXAMPLE 3 Let C[0, 2π] have the inner product (6), and let m and n be unequal positive integers. Show that cos mt and cos nt are orthogonal.

SOLUTION Use a trigonometric identity. When m ≠ n,

⟨cos mt, cos nt⟩ = ∫_0^{2π} cos mt cos nt dt
= (1/2) ∫_0^{2π} [cos(mt + nt) + cos(mt − nt)] dt
= (1/2) [ sin(mt + nt)/(m + n) + sin(mt − nt)/(m − n) ]_0^{2π} = 0
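The orthogonality relations can be spot-checked by numerical integration; a small sketch with SciPy's quad (the particular integers 2, 3, and 5 are arbitrary choices):

```python
from scipy.integrate import quad
from math import cos, pi

def inner(f, g):
    return quad(lambda t: f(t) * g(t), 0, 2 * pi)[0]

# <cos 2t, cos 5t> should vanish; <cos 3t, cos 3t> should equal pi.
print(inner(lambda t: cos(2*t), lambda t: cos(5*t)))  # ~0
print(inner(lambda t: cos(3*t), lambda t: cos(3*t)))  # ~3.14159
```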

Let W be the subspace of C[0, 2π] spanned by the functions in (5). Given f in C[0, 2π], the best approximation to f by functions in W is called the nth-order Fourier approximation to f on [0, 2π]. Since the functions in (5) are orthogonal, the best approximation is given by the orthogonal projection onto W. In this case, the coefficients ak and bk in (4) are called the Fourier coefficients of f. The standard formula for an orthogonal projection shows that

ak = ⟨f, cos kt⟩/⟨cos kt, cos kt⟩,  bk = ⟨f, sin kt⟩/⟨sin kt, sin kt⟩,  k ≥ 1

Exercise 7 asks you to show that ⟨cos kt, cos kt⟩ = π and ⟨sin kt, sin kt⟩ = π. Thus

ak = (1/π) ∫_0^{2π} f(t) cos kt dt,  bk = (1/π) ∫_0^{2π} f(t) sin kt dt   (7)

The coefficient of the (constant) function 1 in the orthogonal projection is

⟨f, 1⟩/⟨1, 1⟩ = (1/2π) ∫_0^{2π} f(t) · 1 dt = (1/2) [ (1/π) ∫_0^{2π} f(t) cos(0·t) dt ] = a0/2

where a0 is defined by (7) for k = 0. This explains why the constant term in (4) is written as a0/2.


EXAMPLE 4 Find the nth-order Fourier approximation to the function f(t) = t on the interval [0, 2π].

SOLUTION Compute

a0/2 = (1/2π) ∫_0^{2π} t dt = (1/2π) [ (1/2)t² ]_0^{2π} = π

and for k > 0, using integration by parts,

ak = (1/π) ∫_0^{2π} t cos kt dt = (1/π) [ (1/k²) cos kt + (t/k) sin kt ]_0^{2π} = 0

bk = (1/π) ∫_0^{2π} t sin kt dt = (1/π) [ (1/k²) sin kt − (t/k) cos kt ]_0^{2π} = −2/k

Thus the nth-order Fourier approximation of f(t) = t is

π − 2 sin t − sin 2t − (2/3) sin 3t − ⋯ − (2/n) sin nt

Figure 3 shows the third- and fourth-order Fourier approximations of f.

FIGURE 3 Fourier approximations of the function f(t) = t: (a) third order; (b) fourth order.
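The Fourier coefficients in (7) can be computed by numerical quadrature and assembled into the projection; a minimal sketch (fourier_approx is an illustrative helper, not a library routine):

```python
import numpy as np
from scipy.integrate import quad

def fourier_approx(f, n):
    """Return a callable n-th order Fourier approximation of f on [0, 2*pi]."""
    a = [quad(lambda t: f(t) * np.cos(k * t), 0, 2*np.pi)[0] / np.pi
         for k in range(n + 1)]
    b = [quad(lambda t: f(t) * np.sin(k * t), 0, 2*np.pi)[0] / np.pi
         for k in range(n + 1)]
    return lambda t: a[0] / 2 + sum(a[k] * np.cos(k * t) + b[k] * np.sin(k * t)
                                    for k in range(1, n + 1))

f3 = fourier_approx(lambda t: t, 3)
print(f3(1.0))   # compare with pi - 2 sin 1 - sin 2 - (2/3) sin 3
print(np.pi - 2*np.sin(1) - np.sin(2) - (2/3)*np.sin(3))
```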

The norm of the difference between f and a Fourier approximation is called the mean square error in the approximation. (The term mean refers to the fact that the norm is determined by an integral.) It can be shown that the mean square error approaches zero as the order of the Fourier approximation increases. For this reason, it is common to write

f(t) = a0/2 + Σ_{m=1}^{∞} (am cos mt + bm sin mt)

This expression for f(t) is called the Fourier series for f on [0, 2π]. The term am cos mt, for example, is the projection of f onto the one-dimensional subspace spanned by cos mt.

PRACTICE PROBLEMS

1. Let q1(t) = 1, q2(t) = t, and q3(t) = 3t² − 4. Verify that {q1, q2, q3} is an orthogonal set in C[−2, 2] with the inner product of Example 7 in Section 6.7 (integration from −2 to 2).

2. Find the first-order and third-order Fourier approximations to

f(t) = 3 − 2 sin t + 5 sin 2t − 6 cos 2t


6.8 EXERCISES

1. Find the least-squares line y = β0 + β1x that best fits the data (−2, 0), (−1, 0), (0, 2), (1, 4), and (2, 4), assuming that the first and last data points are less reliable. Weight them half as much as the three interior points.

2. Suppose 5 out of 25 data points in a weighted least-squares problem have a y-measurement that is less reliable than the others, and they are to be weighted half as much as the other 20 points. One method is to weight the 20 points by a factor of 1 and the other 5 by a factor of 1/2. A second method is to weight the 20 points by a factor of 2 and the other 5 by a factor of 1. Do the two methods produce different results? Explain.

3. Fit a cubic trend function to the data in Example 2. The orthogonal cubic polynomial is p3(t) = (5/6)t³ − (17/6)t.

4. To make a trend analysis of six evenly spaced data points, one can use orthogonal polynomials with respect to evaluation at the points t = −5, −3, −1, 1, 3, and 5.
   a. Show that the first three orthogonal polynomials are
      p0(t) = 1,  p1(t) = t,  and  p2(t) = (3/8)t² − 35/8
      (The polynomial p2 has been scaled so that its values at the evaluation points are small integers.)
   b. Fit a quadratic trend function to the data
      (−5, 1), (−3, 1), (−1, 4), (1, 4), (3, 6), (5, 8)

In Exercises 5–14, the space is C[0, 2π] with the inner product (6).

5. Show that sin mt and sin nt are orthogonal when m ≠ n.

6. Show that sin mt and cos nt are orthogonal for all positive integers m and n.

7. Show that ‖cos kt‖² = π and ‖sin kt‖² = π for k > 0.

8. Find the third-order Fourier approximation to f(t) = t − 1.

9. Find the third-order Fourier approximation to f(t) = 2π − t.

10. Find the third-order Fourier approximation to the square wave function, f(t) = 1 for 0 ≤ t < π and f(t) = −1 for π ≤ t < 2π.

11. Find the third-order Fourier approximation to sin²t, without performing any integration calculations.

12. Find the third-order Fourier approximation to cos³t, without performing any integration calculations.

13. Explain why a Fourier coefficient of the sum of two functions is the sum of the corresponding Fourier coefficients of the two functions.

14. Suppose the first few Fourier coefficients of some function f in C[0, 2π] are a0, a1, a2, and b1, b2, b3. Which of the following trigonometric polynomials is closer to f? Defend your answer.

    g(t) = a0/2 + a1 cos t + a2 cos 2t + b1 sin t
    h(t) = a0/2 + a1 cos t + a2 cos 2t + b1 sin t + b2 sin 2t

15. [M] Refer to the data in Exercise 13 in Section 6.6, concerning the takeoff performance of an airplane. Suppose the possible measurement errors become greater as the speed of the airplane increases, and let W be the diagonal weighting matrix whose diagonal entries are 1, 1, 1, .9, .9, .8, .7, .6, .5, .4, .3, .2, and .1. Find the cubic curve that fits the data with minimum weighted least-squares error, and use it to estimate the velocity of the plane when t = 4.5 seconds.

16. [M] Let f4 and f5 be the fourth-order and fifth-order Fourier approximations in C[0, 2π] to the square wave function in Exercise 10. Produce separate graphs of f4 and f5 on the interval [0, 2π], and produce a graph of f5 on [−2π, 2π].

16. [M] Let f4 and f5 bethefourth-orderandfifth-orderFourierapproximationsin C Œ0; 2�� tothesquarewavefunctioninExercise 10. Produceseparategraphsof f4 and f5 ontheinterval Œ0; 2��, andproduceagraphof f5 on Œ�2�; 2��.

SG The Linearity of an Orthogonal Projection 6–25

SOLUTIONS TO PRACTICE PROBLEMS

1. Compute

⟨q1, q2⟩ = ∫_{−2}^{2} 1 · t dt = (1/2)t² |_{−2}^{2} = 0
⟨q1, q3⟩ = ∫_{−2}^{2} 1 · (3t² − 4) dt = (t³ − 4t) |_{−2}^{2} = 0
⟨q2, q3⟩ = ∫_{−2}^{2} t · (3t² − 4) dt = ((3/4)t⁴ − 2t²) |_{−2}^{2} = 0


2. The third-order Fourier approximation to f is the best approximation in C[0, 2π] to f by functions (vectors) in the subspace spanned by 1, cos t, cos 2t, cos 3t, sin t, sin 2t, and sin 3t. But f is obviously in this subspace, so f is its own best approximation:

f(t) = 3 − 2 sin t + 5 sin 2t − 6 cos 2t

For the first-order approximation, the closest function to f in the subspace W = Span{1, cos t, sin t} is 3 − 2 sin t. The other two terms in the formula for f(t) are orthogonal to the functions in W, so they contribute nothing to the integrals that give the Fourier coefficients for a first-order approximation.

[Figure: First- and third-order approximations to f(t), showing the graphs of y = f(t) and y = 3 − 2 sin t.]

CHAPTER 6 SUPPLEMENTARY EXERCISES

1. The following statements refer to vectors in Rⁿ (or Rᵐ) with the standard inner product. Mark each statement True or False. Justify each answer.
   a. The length of every vector is a positive number.
   b. A vector v and its negative −v have equal lengths.
   c. The distance between u and v is ‖u − v‖.
   d. If r is any scalar, then ‖rv‖ = r‖v‖.
   e. If two vectors are orthogonal, they are linearly independent.
   f. If x is orthogonal to both u and v, then x must be orthogonal to u − v.
   g. If ‖u + v‖² = ‖u‖² + ‖v‖², then u and v are orthogonal.
   h. If ‖u − v‖² = ‖u‖² + ‖v‖², then u and v are orthogonal.
   i. The orthogonal projection of y onto u is a scalar multiple of y.
   j. If a vector y coincides with its orthogonal projection onto a subspace W, then y is in W.
   k. The set of all vectors in Rⁿ orthogonal to one fixed vector is a subspace of Rⁿ.
   l. If W is a subspace of Rⁿ, then W and W⊥ have no vectors in common.
   m. If {v1, v2, v3} is an orthogonal set and if c1, c2, and c3 are scalars, then {c1v1, c2v2, c3v3} is an orthogonal set.
   n. If a matrix U has orthonormal columns, then UUᵀ = I.
   o. A square matrix with orthogonal columns is an orthogonal matrix.
   p. If a square matrix has orthonormal columns, then it also has orthonormal rows.
   q. If W is a subspace, then ‖proj_W v‖² + ‖v − proj_W v‖² = ‖v‖².
   r. A least-squares solution of Ax = b is the vector Ax̂ in Col A closest to b, so that ‖b − Ax̂‖ ≤ ‖b − Ax‖ for all x.
   s. The normal equations for a least-squares solution of Ax = b are given by x̂ = (AᵀA)⁻¹Aᵀb.

2. Let {v1, …, vp} be an orthonormal set. Verify the following equality by induction, beginning with p = 2. If x = c1v1 + ⋯ + cpvp, then

   ‖x‖² = |c1|² + ⋯ + |cp|²

3. Let {v1, …, vp} be an orthonormal set in Rⁿ. Verify the following inequality, called Bessel's inequality, which is true for each x in Rⁿ:

   ‖x‖² ≥ |x·v1|² + |x·v2|² + ⋯ + |x·vp|²

4. Let U be an n × n orthogonal matrix. Show that if {v1, …, vn} is an orthonormal basis for Rⁿ, then so is {Uv1, …, Uvn}.

5. Show that if an n × n matrix U satisfies (Ux)·(Uy) = x·y for all x and y in Rⁿ, then U is an orthogonal matrix.

6. Show that if U is an orthogonal matrix, then any real eigenvalue of U must be ±1.

7. A Householder matrix, or an elementary reflector, has the form Q = I − 2uuᵀ, where u is a unit vector. (See Exercise 13 in the Supplementary Exercises for Chapter 2.) Show that Q is an orthogonal matrix. (Elementary reflectors are often used in computer programs to produce a QR factorization of a matrix A. If A has linearly independent columns, then left-multiplication by a sequence of elementary reflectors can produce an upper triangular matrix.)
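A numerical illustration of the reflector in Exercise 7 (an illustration, not a proof): for a random unit vector u, the matrix Q = I − 2uuᵀ has orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=4)
u /= np.linalg.norm(u)                  # unit vector

Q = np.eye(4) - 2 * np.outer(u, u)      # Householder reflector Q = I - 2uu^T
print(np.allclose(Q.T @ Q, np.eye(4)))  # True: Q^T Q = I
print(np.allclose(Q, Q.T))              # True: Q is also symmetric
```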


8. Let T: Rⁿ → Rⁿ be a linear transformation that preserves lengths; that is, ‖T(x)‖ = ‖x‖ for all x in Rⁿ.
   a. Show that T also preserves orthogonality; that is, T(x)·T(y) = 0 whenever x·y = 0.
   b. Show that the standard matrix of T is an orthogonal matrix.

9. Let u and v be linearly independent vectors in Rⁿ that are not orthogonal. Describe how to find the best approximation to z in Rⁿ by vectors of the form x1u + x2v without first constructing an orthogonal basis for Span{u, v}.

10. Suppose the columns of A are linearly independent. Determine what happens to the least-squares solution x̂ of Ax = b when b is replaced by cb for some nonzero scalar c.

11. If a, b, and c are distinct numbers, then the following system is inconsistent because the graphs of the equations are parallel planes. Show that the set of all least-squares solutions of the system is precisely the plane whose equation is x − 2y + 5z = (a + b + c)/3.

    x − 2y + 5z = a
    x − 2y + 5z = b
    x − 2y + 5z = c

12. Consider the problem of finding an eigenvalue of an n × n matrix A when an approximate eigenvector v is known. Since v is not exactly correct, the equation

    Av = λv   (1)

    will probably not have a solution. However, λ can be estimated by a least-squares solution when (1) is viewed properly. Think of v as an n × 1 matrix V, think of λ as a vector in R¹, and denote the vector Av by the symbol b. Then (1) becomes b = λV, which may also be written as Vλ = b. Find the least-squares solution of this system of n equations in the one unknown λ, and write this solution using the original symbols. The resulting estimate for λ is called a Rayleigh quotient. See Exercises 11 and 12 in Section 5.8.

13. Use the steps below to prove the following relations among the four fundamental subspaces determined by an m × n matrix A.

    Row A = (Nul A)⊥,  Col A = (Nul Aᵀ)⊥

    a. Show that Row A is contained in (Nul A)⊥. (Show that if x is in Row A, then x is orthogonal to every u in Nul A.)
    b. Suppose rank A = r. Find dim Nul A and dim(Nul A)⊥, and then deduce from part (a) that Row A = (Nul A)⊥. [Hint: Study the exercises for Section 6.3.]
    c. Explain why Col A = (Nul Aᵀ)⊥.

14. Explain why an equation Ax = b has a solution if and only if b is orthogonal to all solutions of the equation Aᵀx = 0.

Exercises 15 and 16 concern the (real) Schur factorization of an n × n matrix A in the form A = URUᵀ, where U is an orthogonal matrix and R is an n × n upper triangular matrix.¹

15. Show that if A admits a (real) Schur factorization, A = URUᵀ, then A has n real eigenvalues, counting multiplicities.

16. Let A be an n × n matrix with n real eigenvalues, counting multiplicities, denoted by λ1, …, λn. It can be shown that A admits a (real) Schur factorization. Parts (a) and (b) show the key ideas in the proof. The rest of the proof amounts to repeating (a) and (b) for successively smaller matrices, and then piecing together the results.
    a. Let u1 be a unit eigenvector corresponding to λ1, let u2, …, un be any other vectors such that {u1, …, un} is an orthonormal basis for Rⁿ, and then let U = [u1 u2 ⋯ un]. Show that the first column of UᵀAU is λ1e1, where e1 is the first column of the n × n identity matrix.
    b. Part (a) implies that UᵀAU has the block form

       UᵀAU = [λ1 ∗; 0 A1]

       where the first row is (λ1, ∗, …, ∗) and the entries below λ1 in the first column are zero. Explain why the eigenvalues of A1 are λ2, …, λn. [Hint: See the Supplementary Exercises for Chapter 5.]

[M] When the right side of an equation Ax = b is changed slightly, say to Ax = b + Δb for some vector Δb, the solution changes from x to x + Δx, where Δx satisfies A(Δx) = Δb. The quotient ‖Δb‖/‖b‖ is called the relative change in b (or the relative error in b when Δb represents possible error in the entries of b). The relative change in the solution is ‖Δx‖/‖x‖. When A is invertible, the condition number of A, written as cond(A), produces a bound on how large the relative change in x can be:

‖Δx‖/‖x‖ ≤ cond(A) · ‖Δb‖/‖b‖   (2)

In Exercises 17–20, solve Ax = b and A(Δx) = Δb, and show that the inequality (2) holds in each case. (See the discussion of ill-conditioned matrices in Exercises 41–43 in Section 2.3; a numerical sketch of the bound (2) appears after Exercise 20.)

17. A = [4.5 3.1; 1.6 1.1],  b = (19.249, 6.843)ᵀ,  Δb = (.001, −.003)ᵀ

18. A = [4.5 3.1; 1.6 1.1],  b = (.500, −1.407)ᵀ,  Δb = (.001, −.003)ᵀ

¹ If complex numbers are allowed, every n × n matrix A admits a (complex) Schur factorization, A = URU⁻¹, where R is upper triangular and U⁻¹ is the conjugate transpose of U. This very useful fact is discussed in Matrix Analysis, by Roger A. Horn and Charles R. Johnson (Cambridge: Cambridge University Press, 1985), pp. 79–100.


19. A = [7 −6 −4 1; −5 1 0 −2; 10 11 7 −3; 19 9 7 1],  b = (.100, 2.888, −1.404, 1.462)ᵀ,
    Δb = 10⁻⁴ · (.49, −1.28, 5.78, 8.04)ᵀ

20. A = [7 −6 −4 1; −5 1 0 −2; 10 11 7 −3; 19 9 7 1],  b = (4.230, −11.043, 49.991, 69.536)ᵀ,
    Δb = 10⁻⁴ · (.27, 7.76, −3.77, 3.93)ᵀ
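As the numerical sketch of the bound (2) referenced above: the code below uses the matrix of Exercises 17 and 18, and np.linalg.cond computes cond(A) in the 2-norm (other norms give different but comparable bounds).

```python
import numpy as np

A  = np.array([[4.5, 3.1], [1.6, 1.1]])
b  = np.array([19.249, 6.843])
db = np.array([0.001, -0.003])

x  = np.linalg.solve(A, b)
dx = np.linalg.solve(A, db)

lhs = np.linalg.norm(dx) / np.linalg.norm(x)    # relative change in x
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
print(lhs <= rhs, lhs, rhs)                     # True, with rhs the bound in (2)
```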