6 Orthogonality and Least Squares
INTRODUCTORY EXAMPLE

The North American Datum and GPS Navigation

Imagine starting a massive project that you estimate will take ten years and require the efforts of scores of people to construct and solve a 1,800,000 by 900,000 system of linear equations. That is exactly what the National Geodetic Survey did in 1974, when it set out to update the North American Datum (NAD)—a network of 268,000 precisely located reference points that span the entire North American continent, together with Greenland, Hawaii, the Virgin Islands, Puerto Rico, and other Caribbean islands.

The recorded latitudes and longitudes in the NAD must be determined to within a few centimeters because they form the basis for all surveys, maps, legal property boundaries, and layouts of civil engineering projects such as highways and public utility lines. However, more than 200,000 new points had been added to the datum since the last adjustment in 1927, and errors had gradually accumulated over the years, due to imprecise measurements and shifts in the earth's crust. Data gathering for the NAD readjustment was completed in 1983.

The system of equations for the NAD had no solution in the ordinary sense, but rather had a least-squares solution, which assigned latitudes and longitudes to the reference points in a way that corresponded best to the 1.8 million observations. The least-squares solution was found in 1986 by solving a related system of so-called normal equations, which involved 928,735 equations in 928,735 variables.¹

More recently, knowledge of reference points on the ground has become crucial for accurately determining the locations of satellites in the satellite-based Global Positioning System (GPS). A GPS satellite calculates its position relative to the earth by measuring the time it takes for signals to arrive from three ground transmitters. To do this, the satellites use precise atomic clocks that have been synchronized with ground stations (whose locations are known accurately because of the NAD).

The Global Positioning System is used both for determining the locations of new reference points on the ground and for finding a user's position on the ground relative to established maps. When a car driver (or a mountain climber) turns on a GPS receiver, the receiver measures the relative arrival times of signals from at least three satellites. This information, together with the transmitted data about the satellites' locations and message times, is used to adjust the GPS receiver's time and to determine its approximate location on the earth. Given information from a fourth satellite, the GPS receiver can even establish its approximate altitude.

¹ A mathematical discussion of the solution strategy (along with details of the entire NAD project) appears in North American Datum of 1983, Charles R. Schwarz (ed.), National Geodetic Survey, National Oceanic and Atmospheric Administration (NOAA) Professional Paper NOS 2, 1989.
Both the NAD and GPS problems are solved by finding a vector that "approximately satisfies" an inconsistent system of equations. A careful explanation of this apparent contradiction will require ideas developed in the first five sections of this chapter.

In order to find an approximate solution to an inconsistent system of equations that has no actual solution, a well-defined notion of nearness is needed. Section 6.1 introduces the concepts of distance and orthogonality in a vector space. Sections 6.2 and 6.3 show how orthogonality can be used to identify the point within a subspace W that is nearest to a point y lying outside of W. By taking W to be the column space of a matrix, Section 6.5 develops a method for producing approximate ("least-squares") solutions for inconsistent linear systems, such as the system solved for the NAD report.

Section 6.4 provides another opportunity to see orthogonal projections at work, creating a matrix factorization widely used in numerical linear algebra. The remaining sections examine some of the many least-squares problems that arise in applications, including those in vector spaces more general than ℝⁿ.
6.1 INNER PRODUCT, LENGTH, AND ORTHOGONALITY
Geometric concepts of length, distance, and perpendicularity, which are well known for ℝ² and ℝ³, are defined here for ℝⁿ. These concepts provide powerful geometric tools for solving many applied problems, including the least-squares problems mentioned above. All three notions are defined in terms of the inner product of two vectors.

The Inner Product

If u and v are vectors in ℝⁿ, then we regard u and v as n × 1 matrices. The transpose uᵀ is a 1 × n matrix, and the matrix product uᵀv is a 1 × 1 matrix, which we write as a single real number (a scalar) without brackets. The number uᵀv is called the inner product of u and v, and often it is written as u·v. This inner product, mentioned in the exercises for Section 2.1, is also referred to as a dot product. If
    u = [u₁, u₂, …, uₙ]ᵀ   and   v = [v₁, v₂, …, vₙ]ᵀ

then the inner product of u and v is

    [u₁  u₂  ⋯  uₙ] [v₁, v₂, …, vₙ]ᵀ = u₁v₁ + u₂v₂ + ⋯ + uₙvₙ
EXAMPLE 1  Compute u·v and v·u for u = [2, −5, −1]ᵀ and v = [3, 2, −3]ᵀ.

SOLUTION

    u·v = uᵀv = (2)(3) + (−5)(2) + (−1)(−3) = −1
    v·u = vᵀu = (3)(2) + (2)(−5) + (−3)(−1) = −1
It is clear from the calculations in Example 1 why u·v = v·u. This commutativity of the inner product holds in general. The following properties of the inner product are easily deduced from properties of the transpose operation in Section 2.1. (See Exercises 21 and 22 at the end of this section.)
THEOREM 1  Let u, v, and w be vectors in ℝⁿ, and let c be a scalar. Then
a. u·v = v·u
b. (u + v)·w = u·w + v·w
c. (cu)·v = c(u·v) = u·(cv)
d. u·u ≥ 0, and u·u = 0 if and only if u = 0

Properties (b) and (c) can be combined several times to produce the following useful rule:

    (c₁u₁ + ⋯ + c_p u_p)·w = c₁(u₁·w) + ⋯ + c_p(u_p·w)
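The inner product is easy to compute directly from its definition. The following sketch (the helper name `dot` is mine, not the text's) spot-checks Example 1 and the commutativity and linearity properties of Theorem 1:

```python
# Inner product on R^n as a plain Python function, used to spot-check
# Example 1 and the properties in Theorem 1.

def dot(u, v):
    """Inner product u.v = u1*v1 + ... + un*vn."""
    return sum(ui * vi for ui, vi in zip(u, v))

u = [2, -5, -1]
v = [3, 2, -3]

print(dot(u, v))   # u.v from Example 1: -1
print(dot(v, u))   # commutativity, Theorem 1(a): also -1

# Theorem 1(b): (u + v).w = u.w + v.w, for an arbitrary w
w = [1, 4, 2]
lhs = dot([ui + vi for ui, vi in zip(u, v)], w)
rhs = dot(u, w) + dot(v, w)
print(lhs == rhs)  # True

# Theorem 1(c): (cu).v = c(u.v)
c = 7
print(dot([c * ui for ui in u], v) == c * dot(u, v))  # True
```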
The Length of a Vector

If v is in ℝⁿ, with entries v₁, …, vₙ, then the square root of v·v is defined because v·v is nonnegative.

DEFINITION  The length (or norm) of v is the nonnegative scalar ‖v‖ defined by

    ‖v‖ = √(v·v) = √(v₁² + v₂² + ⋯ + vₙ²),   and   ‖v‖² = v·v
Suppose v is in ℝ², say, v = [a, b]ᵀ. If we identify v with a geometric point in the plane, as usual, then ‖v‖ coincides with the standard notion of the length of the line segment from the origin to v. This follows from the Pythagorean Theorem applied to a triangle such as the one in Fig. 1.

[FIGURE 1  Interpretation of ‖v‖ as length: the point (a, b) lies at distance √(a² + b²) from the origin.]

A similar calculation with the diagonal of a rectangular box shows that the definition of length of a vector v in ℝ³ coincides with the usual notion of length. For any scalar c, the length of cv is |c| times the length of v. That is,

    ‖cv‖ = |c| ‖v‖

(To see this, compute ‖cv‖² = (cv)·(cv) = c²(v·v) = c²‖v‖² and take square roots.)
A vector whose length is 1 is called a unit vector. If we divide a nonzero vector v by its length—that is, multiply by 1/‖v‖—we obtain a unit vector u, because the length of u is (1/‖v‖)‖v‖ = 1. The process of creating u from v is sometimes called normalizing v, and we say that u is in the same direction as v.

Several examples that follow use the space-saving notation for (column) vectors.
EXAMPLE 2  Let v = (1, −2, 2, 0). Find a unit vector u in the same direction as v.

SOLUTION  First, compute the length of v:

    ‖v‖² = v·v = (1)² + (−2)² + (2)² + (0)² = 9,   so ‖v‖ = √9 = 3

Then, multiply v by 1/‖v‖ to obtain

    u = (1/‖v‖)v = (1/3)v = (1/3)(1, −2, 2, 0) = (1/3, −2/3, 2/3, 0)

To check that ‖u‖ = 1, it suffices to show that ‖u‖² = 1:

    ‖u‖² = u·u = (1/3)² + (−2/3)² + (2/3)² + (0)² = 1/9 + 4/9 + 4/9 + 0 = 1
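Normalization as in Example 2 is a two-step computation: find the length, then divide each entry by it. A minimal sketch (helper names `norm` and `normalize` are mine):

```python
# Normalizing a vector: divide a nonzero v by its length ||v|| = sqrt(v.v)
# to get a unit vector in the same direction, as in Example 2.
import math

def norm(v):
    """Length ||v|| = sqrt(v1^2 + ... + vn^2)."""
    return math.sqrt(sum(vi * vi for vi in v))

def normalize(v):
    """Unit vector in the same direction as a nonzero v."""
    n = norm(v)
    return [vi / n for vi in v]

v = [1, -2, 2, 0]
print(norm(v))      # 3.0, since ||v||^2 = 1 + 4 + 4 + 0 = 9
u = normalize(v)
print(u)            # [1/3, -2/3, 2/3, 0] as floats
print(norm(u))      # 1, up to floating-point roundoff
```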
EXAMPLE 3  Let W be the subspace of ℝ² spanned by x = (2/3, 1). Find a unit vector z that is a basis for W.

SOLUTION  W consists of all multiples of x, as in Fig. 2(a). Any nonzero vector in W is a basis for W. To simplify the calculation, "scale" x to eliminate fractions. That is, multiply x by 3 to get

    y = [2, 3]ᵀ

Now compute ‖y‖² = 2² + 3² = 13, ‖y‖ = √13, and normalize y to get

    z = (1/√13)[2, 3]ᵀ = [2/√13, 3/√13]ᵀ

See Fig. 2(b). Another unit vector is (−2/√13, −3/√13).

[FIGURE 2  Normalizing a vector to produce a unit vector: (a) the line W spanned by x; (b) the unit vector z on W.]
Distance in ℝⁿ

We are ready now to describe how close one vector is to another. Recall that if a and b are real numbers, the distance on the number line between a and b is the number |a − b|. Two examples are shown in Fig. 3. This definition of distance in ℝ has a direct analogue in ℝⁿ.

[FIGURE 3  Distances in ℝ: |2 − 8| = |−6| = 6 or |8 − 2| = |6| = 6, so 2 and 8 are 6 units apart; |(−3) − 4| = |−7| = 7 or |4 − (−3)| = |7| = 7, so −3 and 4 are 7 units apart.]
DEFINITION  For u and v in ℝⁿ, the distance between u and v, written as dist(u, v), is the length of the vector u − v. That is,

    dist(u, v) = ‖u − v‖

In ℝ² and ℝ³, this definition of distance coincides with the usual formulas for the Euclidean distance between two points, as the next two examples show.
EXAMPLE 4  Compute the distance between the vectors u = (7, 1) and v = (3, 2).

SOLUTION  Calculate

    u − v = [7, 1]ᵀ − [3, 2]ᵀ = [4, −1]ᵀ
    ‖u − v‖ = √(4² + (−1)²) = √17

The vectors u, v, and u − v are shown in Fig. 4. When the vector u − v is added to v, the result is u. Notice that the parallelogram in Fig. 4 shows that the distance from u to v is the same as the distance from u − v to 0.

[FIGURE 4  The distance between u and v is the length of u − v.]
EXAMPLE 5  If u = (u₁, u₂, u₃) and v = (v₁, v₂, v₃), then

    dist(u, v) = ‖u − v‖ = √((u − v)·(u − v))
               = √((u₁ − v₁)² + (u₂ − v₂)² + (u₃ − v₃)²)
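The distance formula translates directly into code. A small sketch (the helper name `dist` is mine), checked against Example 4:

```python
# dist(u, v) = ||u - v||: subtract entrywise, then take the length.
import math

def dist(u, v):
    diff = [ui - vi for ui, vi in zip(u, v)]
    return math.sqrt(sum(d * d for d in diff))

# Example 4: u = (7, 1), v = (3, 2) gives u - v = (4, -1) and distance sqrt(17)
print(dist([7, 1], [3, 2]))   # about 4.123, i.e. sqrt(17)
```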
Orthogonal Vectors

The rest of this chapter depends on the fact that the concept of perpendicular lines in ordinary Euclidean geometry has an analogue in ℝⁿ.

Consider ℝ² or ℝ³ and two lines through the origin determined by vectors u and v. The two lines shown in Fig. 5 are geometrically perpendicular if and only if the distance from u to v is the same as the distance from u to −v. This is the same as requiring the squares of the distances to be the same. Now

    [dist(u, −v)]² = ‖u − (−v)‖² = ‖u + v‖²
                   = (u + v)·(u + v)
                   = u·(u + v) + v·(u + v)        Theorem 1(b)
                   = u·u + u·v + v·u + v·v        Theorem 1(a), (b)
                   = ‖u‖² + ‖v‖² + 2u·v           Theorem 1(a)        (1)

[FIGURE 5  The segments from u to v (length ‖u − v‖) and from u to −v (length ‖u − (−v)‖).]
The same calculations with v and −v interchanged show that

    [dist(u, v)]² = ‖u‖² + ‖−v‖² + 2u·(−v)
                  = ‖u‖² + ‖v‖² − 2u·v

The two squared distances are equal if and only if 2u·v = −2u·v, which happens if and only if u·v = 0.

This calculation shows that when vectors u and v are identified with geometric points, the corresponding lines through the points and the origin are perpendicular if and only if u·v = 0. The following definition generalizes to ℝⁿ this notion of perpendicularity (or orthogonality, as it is commonly called in linear algebra).
DEFINITION  Two vectors u and v in ℝⁿ are orthogonal (to each other) if u·v = 0.

Observe that the zero vector is orthogonal to every vector in ℝⁿ because 0ᵀv = 0 for all v.

The next theorem provides a useful fact about orthogonal vectors. The proof follows immediately from the calculation in (1) above and the definition of orthogonality. The right triangle shown in Fig. 6 provides a visualization of the lengths that appear in the theorem.

THEOREM 2  The Pythagorean Theorem
Two vectors u and v are orthogonal if and only if ‖u + v‖² = ‖u‖² + ‖v‖².
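Theorem 2 is easy to confirm numerically for a specific orthogonal pair. A sketch (vectors and helper chosen by me):

```python
# Checking Theorem 2 on one orthogonal pair:
# u.v = 0 exactly when ||u + v||^2 = ||u||^2 + ||v||^2.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u = [3, 1, 1]
v = [-1, 2, 1]                     # u.v = -3 + 2 + 1 = 0
s = [a + b for a, b in zip(u, v)]  # u + v

print(dot(u, v))                   # 0, so u and v are orthogonal
print(dot(s, s))                   # ||u + v||^2 = 17
print(dot(u, u) + dot(v, v))       # ||u||^2 + ||v||^2 = 11 + 6 = 17
```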
Orthogonal Complements

To provide practice using inner products, we introduce a concept here that will be of use in Section 6.3 and elsewhere in the chapter. If a vector z is orthogonal to every vector in a subspace W of ℝⁿ, then z is said to be orthogonal to W. The set of all vectors z that are orthogonal to W is called the orthogonal complement of W and is denoted by W⊥ (and read as "W perpendicular" or simply "W perp").

[FIGURE 6  The right triangle with legs of lengths ‖u‖ and ‖v‖ and hypotenuse of length ‖u + v‖.]
EXAMPLE 6  Let W be a plane through the origin in ℝ³, and let L be the line through the origin and perpendicular to W. If z and w are nonzero, z is on L, and w is in W, then the line segment from 0 to z is perpendicular to the line segment from 0 to w; that is, z·w = 0. See Fig. 7. So each vector on L is orthogonal to every w in W. In fact, L consists of all vectors that are orthogonal to the w's in W, and W consists of all vectors orthogonal to the z's in L. That is,

    L = W⊥   and   W = L⊥

[FIGURE 7  A plane and line through 0 as orthogonal complements.]

The following two facts about W⊥, with W a subspace of ℝⁿ, are needed later in the chapter. Proofs are suggested in Exercises 29 and 30. Exercises 27–31 provide excellent practice using properties of the inner product.

1. A vector x is in W⊥ if and only if x is orthogonal to every vector in a set that spans W.
2. W⊥ is a subspace of ℝⁿ.
The next theorem and Exercise 31 verify the claims made in Section 4.6 concerning the subspaces shown in Fig. 8. (Also see Exercise 28 in Section 4.6.)

[FIGURE 8  The fundamental subspaces determined by an m × n matrix A: Row A, Nul A, Col A, and Nul Aᵀ.]

THEOREM 3  Let A be an m × n matrix. The orthogonal complement of the row space of A is the null space of A, and the orthogonal complement of the column space of A is the null space of Aᵀ:

    (Row A)⊥ = Nul A   and   (Col A)⊥ = Nul Aᵀ

PROOF  The row–column rule for computing Ax shows that if x is in Nul A, then x is orthogonal to each row of A (with the rows treated as vectors in ℝⁿ). Since the rows of A span the row space, x is orthogonal to Row A. Conversely, if x is orthogonal to Row A, then x is certainly orthogonal to each row of A, and hence Ax = 0. This proves the first statement of the theorem. Since this statement is true for any matrix, it is true for Aᵀ. That is, the orthogonal complement of the row space of Aᵀ is the null space of Aᵀ. This proves the second statement, because Row Aᵀ = Col A.
Angles in ℝ² and ℝ³ (Optional)

If u and v are nonzero vectors in either ℝ² or ℝ³, then there is a nice connection between their inner product and the angle θ between the two line segments from the origin to the points identified with u and v. The formula is

    u·v = ‖u‖ ‖v‖ cos θ        (2)

To verify this formula for vectors in ℝ², consider the triangle shown in Fig. 9, with sides of lengths ‖u‖, ‖v‖, and ‖u − v‖. By the law of cosines,

    ‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖ ‖v‖ cos θ

[FIGURE 9  The angle θ between two vectors, with vertices at (u₁, u₂) and (v₁, v₂).]

which can be rearranged to produce

    ‖u‖ ‖v‖ cos θ = (1/2)[‖u‖² + ‖v‖² − ‖u − v‖²]
                  = (1/2)[u₁² + u₂² + v₁² + v₂² − (u₁ − v₁)² − (u₂ − v₂)²]
                  = u₁v₁ + u₂v₂
                  = u·v

The verification for ℝ³ is similar. When n > 3, formula (2) may be used to define the angle between two vectors in ℝⁿ. In statistics, for instance, the value of cos θ defined by (2) for suitable vectors u and v is what statisticians call a correlation coefficient.
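Formula (2) can be inverted to recover the angle itself: cos θ = (u·v)/(‖u‖ ‖v‖), then apply arccos. A sketch (helper names mine, test vectors mine):

```python
# Recovering the angle between u and v from formula (2):
# cos(theta) = (u.v) / (||u|| ||v||), for nonzero u and v.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle_degrees(u, v):
    cos_t = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.degrees(math.acos(cos_t))

print(angle_degrees([1, 0], [0, 1]))   # 90.0: orthogonal vectors
print(angle_degrees([1, 0], [1, 1]))   # 45, up to roundoff
```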
PRACTICE PROBLEMS

Let a = [−2, 1]ᵀ, b = [−3, 1]ᵀ, c = [4/3, −1, 2/3]ᵀ, and d = [5, 6, −1]ᵀ.

1. Compute (a·b)/(a·a) and ((a·b)/(a·a))a.
2. Find a unit vector u in the direction of c.
3. Show that d is orthogonal to c.
4. Use the results of Practice Problems 2 and 3 to explain why d must be orthogonal to the unit vector u.
6.1 EXERCISES

Compute the quantities in Exercises 1–8 using the vectors

    u = [−1, 2]ᵀ,  v = [4, 6]ᵀ,  w = [3, −1, −5]ᵀ,  x = [6, −2, 3]ᵀ

1. u·u, v·u, and (v·u)/(u·u)
2. w·w, x·w, and (x·w)/(w·w)
3. (1/(w·w))w
4. (1/(u·u))u
5. ((u·v)/(v·v))v
6. ((x·w)/(x·x))x
7. ‖w‖
8. ‖x‖

In Exercises 9–12, find a unit vector in the direction of the given vector.

9. [−30, 40]ᵀ
10. [−6, 4, −3]ᵀ
11. [7/4, 1/2, 1]ᵀ
12. [8/3, 2]ᵀ

13. Find the distance between x = [10, −3]ᵀ and y = [−1, −5]ᵀ.
14. Find the distance between u = [0, −5, 2]ᵀ and z = [−4, −1, 8]ᵀ.
Determine which pairs of vectors in Exercises 15–18 are orthogonal.

15. a = [8, −5]ᵀ, b = [−2, −3]ᵀ
16. u = [12, 3, −5]ᵀ, v = [2, −3, 3]ᵀ
17. u = [3, 2, −5, 0]ᵀ, v = [−4, 1, −2, 6]ᵀ
18. y = [−3, 7, 4, 0]ᵀ, z = [1, −8, 15, −7]ᵀ
In Exercises 19 and 20, all vectors are in ℝⁿ. Mark each statement True or False. Justify each answer.

19. a. v·v = ‖v‖².
    b. For any scalar c, u·(cv) = c(u·v).
    c. If the distance from u to v equals the distance from u to −v, then u and v are orthogonal.
    d. For a square matrix A, vectors in Col A are orthogonal to vectors in Nul A.
    e. If vectors v₁, …, v_p span a subspace W and if x is orthogonal to each v_j for j = 1, …, p, then x is in W⊥.

20. a. u·v − v·u = 0.
    b. For any scalar c, ‖cv‖ = c‖v‖.
    c. If x is orthogonal to every vector in a subspace W, then x is in W⊥.
    d. If ‖u‖² + ‖v‖² = ‖u + v‖², then u and v are orthogonal.
    e. For an m × n matrix A, vectors in the null space of A are orthogonal to vectors in the row space of A.

21. Use the transpose definition of the inner product to verify parts (b) and (c) of Theorem 1. Mention the appropriate facts from Chapter 2.

22. Let u = (u₁, u₂, u₃). Explain why u·u ≥ 0. When is u·u = 0?
23. Let u = [2, −5, −1]ᵀ and v = [−7, −4, 6]ᵀ. Compute and compare u·v, ‖u‖², ‖v‖², and ‖u + v‖². Do not use the Pythagorean Theorem.

24. Verify the parallelogram law for vectors u and v in ℝⁿ:

    ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖²

25. Let v = [a, b]ᵀ. Describe the set H of vectors [x, y]ᵀ that are orthogonal to v. [Hint: Consider v = 0 and v ≠ 0.]
26. Let u = [5, −6, 7]ᵀ, and let W be the set of all x in ℝ³ such that u·x = 0. What theorem in Chapter 4 can be used to show that W is a subspace of ℝ³? Describe W in geometric language.

27. Suppose a vector y is orthogonal to vectors u and v. Show that y is orthogonal to the vector u + v.

28. Suppose y is orthogonal to u and v. Show that y is orthogonal to every w in Span{u, v}. [Hint: An arbitrary w in Span{u, v} has the form w = c₁u + c₂v. Show that y is orthogonal to such a vector w.]

[Figure: a vector y orthogonal to u, v, and an arbitrary w in Span{u, v}.]

29. Let W = Span{v₁, …, v_p}. Show that if x is orthogonal to each v_j, for 1 ≤ j ≤ p, then x is orthogonal to every vector in W.

30. Let W be a subspace of ℝⁿ, and let W⊥ be the set of all vectors orthogonal to W. Show that W⊥ is a subspace of ℝⁿ using the following steps.
    a. Take z in W⊥, and let u represent any element of W. Then z·u = 0. Take any scalar c and show that cz is orthogonal to u. (Since u was an arbitrary element of W, this will show that cz is in W⊥.)
    b. Take z₁ and z₂ in W⊥, and let u be any element of W. Show that z₁ + z₂ is orthogonal to u. What can you conclude about z₁ + z₂? Why?
    c. Finish the proof that W⊥ is a subspace of ℝⁿ.

31. Show that if x is in both W and W⊥, then x = 0.
32. [M] Construct a pair u, v of random vectors in ℝ⁴, and let

    A = ⎡ .5   .5   .5   .5 ⎤
        ⎢ .5   .5  −.5  −.5 ⎥
        ⎢ .5  −.5   .5  −.5 ⎥
        ⎣ .5  −.5  −.5   .5 ⎦

    a. Denote the columns of A by a₁, …, a₄. Compute the length of each column, and compute a₁·a₂, a₁·a₃, a₁·a₄, a₂·a₃, a₂·a₄, and a₃·a₄.
    b. Compute and compare the lengths of u, Au, v, and Av.
    c. Use equation (2) in this section to compute the cosine of the angle between u and v. Compare this with the cosine of the angle between Au and Av.
    d. Repeat parts (b) and (c) for two other pairs of random vectors. What do you conjecture about the effect of A on vectors?
33. [M] Generate random vectors x, y, and v in ℝ⁴ with integer entries (and v ≠ 0), and compute the quantities

    ((x·v)/(v·v))v,  ((y·v)/(v·v))v,  (((x + y)·v)/(v·v))v,  (((10x)·v)/(v·v))v

Repeat the computations with new random vectors x and y. What do you conjecture about the mapping x ↦ T(x) = ((x·v)/(v·v))v (for v ≠ 0)? Verify your conjecture algebraically.

34. [M] Let

    A = ⎡ −6    3  −27  −33  −13 ⎤
        ⎢  6   −5   25   28   14 ⎥
        ⎢  8   −6   34   38   18 ⎥
        ⎢ 12  −10   50   41   23 ⎥
        ⎣ 14  −21   49   29   33 ⎦

Construct a matrix N whose columns form a basis for Nul A, and construct a matrix R whose rows form a basis for Row A (see Section 4.6 for details). Perform a matrix computation with N and R that illustrates a fact from Theorem 3.
SOLUTIONS TO PRACTICE PROBLEMS

1. a·b = 7, a·a = 5. Hence (a·b)/(a·a) = 7/5, and ((a·b)/(a·a))a = (7/5)a = [−14/5, 7/5]ᵀ.

2. Scale c, multiplying by 3 to get y = [4, −3, 2]ᵀ. Compute ‖y‖² = 29 and ‖y‖ = √29. The unit vector in the direction of both c and y is

    u = (1/‖y‖)y = [4/√29, −3/√29, 2/√29]ᵀ

3. d is orthogonal to c, because

    d·c = [5, 6, −1]ᵀ · [4/3, −1, 2/3]ᵀ = 20/3 − 6 − 2/3 = 0

4. d is orthogonal to u because u has the form kc for some k, and

    d·u = d·(kc) = k(d·c) = k(0) = 0
6.2 ORTHOGONAL SETS

A set of vectors {u₁, …, u_p} in ℝⁿ is said to be an orthogonal set if each pair of distinct vectors from the set is orthogonal, that is, if u_i·u_j = 0 whenever i ≠ j.
EXAMPLE 1  Show that {u₁, u₂, u₃} is an orthogonal set, where

    u₁ = [3, 1, 1]ᵀ,  u₂ = [−1, 2, 1]ᵀ,  u₃ = [−1/2, −2, 7/2]ᵀ

SOLUTION  Consider the three possible pairs of distinct vectors, namely, {u₁, u₂}, {u₁, u₃}, and {u₂, u₃}.

    u₁·u₂ = 3(−1) + 1(2) + 1(1) = 0
    u₁·u₃ = 3(−1/2) + 1(−2) + 1(7/2) = 0
    u₂·u₃ = −1(−1/2) + 2(−2) + 1(7/2) = 0

Each pair of distinct vectors is orthogonal, and so {u₁, u₂, u₃} is an orthogonal set. See Fig. 1; the three line segments there are mutually perpendicular.

[FIGURE 1  The three mutually perpendicular vectors u₁, u₂, u₃ in ℝ³.]
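Checking every pair of distinct vectors, as in Example 1, is mechanical. A minimal sketch (the helper `is_orthogonal_set` is my name for it):

```python
# An orthogonal set: every pair of distinct vectors has inner product 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal_set(vectors):
    n = len(vectors)
    return all(dot(vectors[i], vectors[j]) == 0
               for i in range(n) for j in range(i + 1, n))

u1 = [3, 1, 1]
u2 = [-1, 2, 1]
u3 = [-0.5, -2, 3.5]                         # Example 1's set

print(is_orthogonal_set([u1, u2, u3]))       # True
print(is_orthogonal_set([u1, u2, [1, 0, 0]]))  # False: u1.(1,0,0) = 3
```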
THEOREM 4  If S = {u₁, …, u_p} is an orthogonal set of nonzero vectors in ℝⁿ, then S is linearly independent and hence is a basis for the subspace spanned by S.

PROOF  If 0 = c₁u₁ + ⋯ + c_p u_p for some scalars c₁, …, c_p, then

    0 = 0·u₁ = (c₁u₁ + c₂u₂ + ⋯ + c_p u_p)·u₁
      = (c₁u₁)·u₁ + (c₂u₂)·u₁ + ⋯ + (c_p u_p)·u₁
      = c₁(u₁·u₁) + c₂(u₂·u₁) + ⋯ + c_p(u_p·u₁)
      = c₁(u₁·u₁)

because u₁ is orthogonal to u₂, …, u_p. Since u₁ is nonzero, u₁·u₁ is not zero and so c₁ = 0. Similarly, c₂, …, c_p must be zero. Thus S is linearly independent.
DEFINITION  An orthogonal basis for a subspace W of ℝⁿ is a basis for W that is also an orthogonal set.

The next theorem suggests why an orthogonal basis is much nicer than other bases: the weights in a linear combination can be computed easily.

THEOREM 5  Let {u₁, …, u_p} be an orthogonal basis for a subspace W of ℝⁿ. For each y in W, the weights in the linear combination

    y = c₁u₁ + ⋯ + c_p u_p

are given by

    c_j = (y·u_j)/(u_j·u_j)   (j = 1, …, p)

PROOF  As in the preceding proof, the orthogonality of {u₁, …, u_p} shows that

    y·u₁ = (c₁u₁ + c₂u₂ + ⋯ + c_p u_p)·u₁ = c₁(u₁·u₁)

Since u₁·u₁ is not zero, the equation above can be solved for c₁. To find c_j for j = 2, …, p, compute y·u_j and solve for c_j.
EXAMPLE 2  The set S = {u₁, u₂, u₃} in Example 1 is an orthogonal basis for ℝ³. Express the vector y = [6, 1, −8]ᵀ as a linear combination of the vectors in S.

SOLUTION  Compute

    y·u₁ = 11,   y·u₂ = −12,   y·u₃ = −33
    u₁·u₁ = 11,  u₂·u₂ = 6,    u₃·u₃ = 33/2

By Theorem 5,

    y = (y·u₁)/(u₁·u₁) u₁ + (y·u₂)/(u₂·u₂) u₂ + (y·u₃)/(u₃·u₃) u₃
      = (11/11)u₁ + (−12/6)u₂ + (−33/(33/2))u₃
      = u₁ − 2u₂ − 2u₃

Notice how easy it is to compute the weights needed to build y from an orthogonal basis. If the basis were not orthogonal, it would be necessary to solve a system of linear equations in order to find the weights, as in Chapter 1.
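Theorem 5 reduces each weight to a quotient of two inner products, with no linear system to solve. A sketch reproducing Example 2 (exact fractions keep the arithmetic clean; the code is mine, not the text's):

```python
# Theorem 5: with an orthogonal basis, c_j = (y.u_j) / (u_j.u_j).
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

basis = [[3, 1, 1],
         [-1, 2, 1],
         [Fraction(-1, 2), -2, Fraction(7, 2)]]   # Example 1's orthogonal basis
y = [6, 1, -8]

weights = [Fraction(dot(y, u)) / dot(u, u) for u in basis]
print([int(c) for c in weights])   # [1, -2, -2], matching Example 2
```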
We turn next to a construction that will become a key step in many calculations involving orthogonality, and it will lead to a geometric interpretation of Theorem 5.

An Orthogonal Projection

Given a nonzero vector u in ℝⁿ, consider the problem of decomposing a vector y in ℝⁿ into the sum of two vectors, one a multiple of u and the other orthogonal to u. We wish to write

    y = ŷ + z        (1)

where ŷ = αu for some scalar α and z is some vector orthogonal to u. See Fig. 2. Given any scalar α, let z = y − αu, so that (1) is satisfied. Then y − ŷ is orthogonal to u if and only if

    0 = (y − αu)·u = y·u − (αu)·u = y·u − α(u·u)

That is, (1) is satisfied with z orthogonal to u if and only if α = (y·u)/(u·u) and ŷ = ((y·u)/(u·u))u. The vector ŷ is called the orthogonal projection of y onto u, and the vector z is called the component of y orthogonal to u.

[FIGURE 2  Finding α to make y − ŷ orthogonal to u.]

If c is any nonzero scalar and if u is replaced by cu in the definition of ŷ, then the orthogonal projection of y onto cu is exactly the same as the orthogonal projection of y onto u (Exercise 31). Hence this projection is determined by the subspace L spanned by u (the line through u and 0). Sometimes ŷ is denoted by proj_L y and is called the orthogonal projection of y onto L. That is,

    ŷ = proj_L y = ((y·u)/(u·u))u        (2)
EXAMPLE 3  Let y = [7, 6]ᵀ and u = [4, 2]ᵀ. Find the orthogonal projection of y onto u. Then write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.

SOLUTION  Compute

    y·u = [7, 6]ᵀ · [4, 2]ᵀ = 40
    u·u = [4, 2]ᵀ · [4, 2]ᵀ = 20

The orthogonal projection of y onto u is

    ŷ = ((y·u)/(u·u))u = (40/20)u = 2[4, 2]ᵀ = [8, 4]ᵀ

and the component of y orthogonal to u is

    y − ŷ = [7, 6]ᵀ − [8, 4]ᵀ = [−1, 2]ᵀ

The sum of these two vectors is y. That is,

    [7, 6]ᵀ = [8, 4]ᵀ + [−1, 2]ᵀ
      (y)      (ŷ)      (y − ŷ)

This decomposition of y is illustrated in Fig. 3. Note: If the calculations above are correct, then {ŷ, y − ŷ} will be an orthogonal set. As a check, compute

    ŷ·(y − ŷ) = [8, 4]ᵀ · [−1, 2]ᵀ = −8 + 8 = 0

Since the line segment in Fig. 3 between y and ŷ is perpendicular to L, by construction of ŷ, the point identified with ŷ is the closest point of L to y. (This can be proved from geometry. We will assume this for ℝ² now and prove it for ℝⁿ in Section 6.3.)

[FIGURE 3  The orthogonal projection of y onto a line L through the origin.]

EXAMPLE 4  Find the distance in Fig. 3 from y to L.

SOLUTION  The distance from y to L is the length of the perpendicular line segment from y to the orthogonal projection ŷ. This length equals the length of y − ŷ. Thus the distance is

    ‖y − ŷ‖ = √((−1)² + 2²) = √5
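Formula (2) and the distance computation of Example 4 fit in a few lines. A sketch (helper names `dot`, `project` are mine) reproducing Examples 3 and 4:

```python
# Orthogonal projection onto the line spanned by a nonzero u, formula (2):
# y_hat = ((y.u)/(u.u)) u, and z = y - y_hat is orthogonal to u.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(y, u):
    alpha = dot(y, u) / dot(u, u)
    return [alpha * ui for ui in u]

y = [7, 6]
u = [4, 2]
y_hat = project(y, u)
z = [yi - pi for yi, pi in zip(y, y_hat)]

print(y_hat)                  # [8.0, 4.0], as in Example 3
print(z)                      # [-1.0, 2.0]
print(dot(y_hat, z))          # 0.0: the two pieces are orthogonal
print(math.sqrt(dot(z, z)))   # sqrt(5): the distance from y to L, Example 4
```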
A Geometric Interpretation of Theorem 5

The formula for the orthogonal projection ŷ in (2) has the same appearance as each of the terms in Theorem 5. Thus Theorem 5 decomposes a vector y into a sum of orthogonal projections onto one-dimensional subspaces.

It is easy to visualize the case in which W = ℝ² = Span{u₁, u₂}, with u₁ and u₂ orthogonal. Any y in ℝ² can be written in the form

    y = ((y·u₁)/(u₁·u₁))u₁ + ((y·u₂)/(u₂·u₂))u₂        (3)

The first term in (3) is the projection of y onto the subspace spanned by u₁ (the line through u₁ and the origin), and the second term is the projection of y onto the subspace spanned by u₂. Thus (3) expresses y as the sum of its projections onto the (orthogonal) axes determined by u₁ and u₂. See Fig. 4.

[FIGURE 4  A vector decomposed into the sum of two projections.]

Theorem 5 decomposes each y in Span{u₁, …, u_p} into the sum of p projections onto one-dimensional subspaces that are mutually orthogonal.
Decomposing a Force into Component Forces

The decomposition in Fig. 4 can occur in physics when some sort of force is applied to an object. Choosing an appropriate coordinate system allows the force to be represented by a vector y in ℝ² or ℝ³. Often the problem involves some particular direction of interest, which is represented by another vector u. For instance, if the object is moving in a straight line when the force is applied, the vector u might point in the direction of movement, as in Fig. 5. A key step in the problem is to decompose the force into a component in the direction of u and a component orthogonal to u. The calculations would be analogous to those made in Example 3 above.

[FIGURE 5  A force y decomposed relative to a direction u.]
Orthonormal Sets

A set {u₁, …, u_p} is an orthonormal set if it is an orthogonal set of unit vectors. If W is the subspace spanned by such a set, then {u₁, …, u_p} is an orthonormal basis for W, since the set is automatically linearly independent, by Theorem 4.

The simplest example of an orthonormal set is the standard basis {e₁, …, eₙ} for ℝⁿ. Any nonempty subset of {e₁, …, eₙ} is orthonormal, too. Here is a more complicated example.
EXAMPLE 5  Show that {v₁, v₂, v₃} is an orthonormal basis of ℝ³, where

    v₁ = [3/√11, 1/√11, 1/√11]ᵀ,  v₂ = [−1/√6, 2/√6, 1/√6]ᵀ,  v₃ = [−1/√66, −4/√66, 7/√66]ᵀ

SOLUTION  Compute

    v₁·v₂ = −3/√66 + 2/√66 + 1/√66 = 0
    v₁·v₃ = −3/√726 − 4/√726 + 7/√726 = 0
    v₂·v₃ = 1/√396 − 8/√396 + 7/√396 = 0

Thus {v₁, v₂, v₃} is an orthogonal set. Also,

    v₁·v₁ = 9/11 + 1/11 + 1/11 = 1
    v₂·v₂ = 1/6 + 4/6 + 1/6 = 1
    v₃·v₃ = 1/66 + 16/66 + 49/66 = 1

which shows that v₁, v₂, and v₃ are unit vectors. Thus {v₁, v₂, v₃} is an orthonormal set. Since the set is linearly independent, its three vectors form a basis for ℝ³. See Fig. 6.

[FIGURE 6  The orthonormal basis vectors v₁, v₂, v₃ in ℝ³.]

When the vectors in an orthogonal set of nonzero vectors are normalized to have unit length, the new vectors will still be orthogonal, and hence the new set will be an orthonormal set. See Exercise 32. It is easy to check that the vectors in Fig. 6 (Example 5) are simply the unit vectors in the directions of the vectors in Fig. 1 (Example 1).
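That last remark can be checked directly: normalizing the orthogonal set of Example 1 yields the orthonormal set of Example 5. A sketch (code mine):

```python
# Normalizing an orthogonal set of nonzero vectors produces an orthonormal set.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]

orthogonal = [[3, 1, 1], [-1, 2, 1], [-0.5, -2, 3.5]]   # Example 1
orthonormal = [normalize(v) for v in orthogonal]

for v in orthonormal:
    print(round(dot(v, v), 12))   # each is 1, up to roundoff
# Distinct pairs are still orthogonal, up to roundoff:
print(round(dot(orthonormal[0], orthonormal[1]), 12))
```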
Matrices whose columns form an orthonormal set are important in applications and in computer algorithms for matrix computations. Their main properties are given in Theorems 6 and 7.

THEOREM 6  An m × n matrix U has orthonormal columns if and only if UᵀU = I.

PROOF  To simplify notation, we suppose that U has only three columns, each a vector in ℝᵐ. The proof of the general case is essentially the same. Let U = [u₁ u₂ u₃] and compute

    UᵀU = ⎡ u₁ᵀ ⎤ [u₁  u₂  u₃] = ⎡ u₁ᵀu₁  u₁ᵀu₂  u₁ᵀu₃ ⎤
          ⎢ u₂ᵀ ⎥                ⎢ u₂ᵀu₁  u₂ᵀu₂  u₂ᵀu₃ ⎥        (4)
          ⎣ u₃ᵀ ⎦                ⎣ u₃ᵀu₁  u₃ᵀu₂  u₃ᵀu₃ ⎦

The entries in the matrix at the right are inner products, using transpose notation. The columns of U are orthogonal if and only if

    u₁ᵀu₂ = u₂ᵀu₁ = 0,  u₁ᵀu₃ = u₃ᵀu₁ = 0,  u₂ᵀu₃ = u₃ᵀu₂ = 0        (5)

The columns of U all have unit length if and only if

    u₁ᵀu₁ = 1,  u₂ᵀu₂ = 1,  u₃ᵀu₃ = 1        (6)

The theorem follows immediately from (4)–(6).
THEOREM 7  Let U be an m × n matrix with orthonormal columns, and let x and y be in ℝⁿ. Then
a. ‖Ux‖ = ‖x‖
b. (Ux)·(Uy) = x·y
c. (Ux)·(Uy) = 0 if and only if x·y = 0

Properties (a) and (c) say that the linear mapping x ↦ Ux preserves lengths and orthogonality. These properties are crucial for many computer algorithms. See Exercise 25 for the proof of Theorem 7.
EXAMPLE 6  Let

    U = ⎡ 1/√2    2/3 ⎤        x = [√2, 3]ᵀ
        ⎢ 1/√2   −2/3 ⎥
        ⎣  0      1/3 ⎦

Notice that U has orthonormal columns and

    UᵀU = ⎡ 1/√2   1/√2    0  ⎤ ⎡ 1/√2    2/3 ⎤ = ⎡ 1  0 ⎤
          ⎣ 2/3   −2/3    1/3 ⎦ ⎢ 1/√2   −2/3 ⎥   ⎣ 0  1 ⎦
                                ⎣  0      1/3 ⎦

Verify that ‖Ux‖ = ‖x‖.

SOLUTION

    Ux = ⎡ 1/√2    2/3 ⎤ [√2, 3]ᵀ = ⎡  3 ⎤
         ⎢ 1/√2   −2/3 ⎥            ⎢ −1 ⎥
         ⎣  0      1/3 ⎦            ⎣  1 ⎦

    ‖Ux‖ = √(9 + 1 + 1) = √11
    ‖x‖ = √(2 + 9) = √11
Theorems 6 and 7 are particularly useful when applied to square matrices. An orthogonal matrix is a square invertible matrix U such that U⁻¹ = Uᵀ. By Theorem 6, such a matrix has orthonormal columns.¹ It is easy to see that any square matrix with orthonormal columns is an orthogonal matrix. Surprisingly, such a matrix must have orthonormal rows, too. See Exercises 27 and 28. Orthogonal matrices will appear frequently in Chapter 7.

EXAMPLE 7  The matrix

    U = ⎡ 3/√11   −1/√6   −1/√66 ⎤
        ⎢ 1/√11    2/√6   −4/√66 ⎥
        ⎣ 1/√11    1/√6    7/√66 ⎦

is an orthogonal matrix because it is square and because its columns are orthonormal, by Example 5. Verify that the rows are orthonormal, too!
PRACTICE PROBLEMS

1. Let u₁ = [−1/√5, 2/√5]ᵀ and u₂ = [2/√5, 1/√5]ᵀ. Show that {u₁, u₂} is an orthonormal basis for ℝ².

2. Let y and L be as in Example 3 and Fig. 3. Compute the orthogonal projection ŷ of y onto L using u = [2, 1]ᵀ instead of the u in Example 3.

3. Let U and x be as in Example 6, and let y = [−3√2, 6]ᵀ. Verify that (Ux)·(Uy) = x·y.
6.2 EXERCISES

In Exercises 1–6, determine which sets of vectors are orthogonal.

1. [−1, 4, −3]ᵀ, [5, 2, 1]ᵀ, [3, −4, −7]ᵀ
2. [1, −2, 1]ᵀ, [0, 1, 2]ᵀ, [−5, −2, 1]ᵀ
3. [2, −7, −1]ᵀ, [−6, −3, 9]ᵀ, [3, 1, −1]ᵀ
4. [2, −5, −3]ᵀ, [0, 0, 0]ᵀ, [4, −2, 6]ᵀ
5. [3, −2, 1, 3]ᵀ, [−1, 3, −3, 4]ᵀ, [3, 8, 7, 0]ᵀ
6. [5, −4, 0, 3]ᵀ, [−4, 1, −3, 8]ᵀ, [3, 3, 5, −1]ᵀ
In Exercises 7–10, show that {u₁, u₂} or {u₁, u₂, u₃} is an orthogonal basis for ℝ² or ℝ³, respectively. Then express x as a linear combination of the u's.

7. u₁ = [2, −3]ᵀ, u₂ = [6, 4]ᵀ, and x = [9, −7]ᵀ
8. u₁ = [3, 1]ᵀ, u₂ = [−2, 6]ᵀ, and x = [−6, 3]ᵀ
9. u₁ = [1, 0, 1]ᵀ, u₂ = [−1, 4, 1]ᵀ, u₃ = [2, 1, −2]ᵀ, and x = [8, −4, −3]ᵀ
10. u₁ = [3, −3, 0]ᵀ, u₂ = [2, 2, −1]ᵀ, u₃ = [1, 1, 4]ᵀ, and x = [5, −3, 1]ᵀ

¹ A better name might be orthonormal matrix, and this term is found in some statistics texts. However, orthogonal matrix is the standard term in linear algebra.
11. Compute the orthogonal projection of [1, 7]ᵀ onto the line through [−4, 2]ᵀ and the origin.

12. Compute the orthogonal projection of [1, −1]ᵀ onto the line through [−1, 3]ᵀ and the origin.

13. Let y = [2, 3]ᵀ and u = [4, −7]ᵀ. Write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.

14. Let y = [2, 6]ᵀ and u = [7, 1]ᵀ. Write y as the sum of a vector in Span{u} and a vector orthogonal to u.

15. Let y = [3, 1]ᵀ and u = [8, 6]ᵀ. Compute the distance from y to the line through u and the origin.

16. Let y = [−3, 9]ᵀ and u = [1, 2]ᵀ. Compute the distance from y to the line through u and the origin.
In Exercises 17–22, determine which sets of vectors are orthonormal. If a set is only orthogonal, normalize the vectors to produce an orthonormal set.

17. [1/3, 1/3, 1/3]ᵀ, [−1/2, 0, 1/2]ᵀ
18. [0, 1, 0]ᵀ, [0, −1, 0]ᵀ
19. [−.6, .8]ᵀ, [.8, .6]ᵀ
20. [−2/3, 1/3, 2/3]ᵀ, [1/3, 2/3, 0]ᵀ
21. [1/√10, 3/√20, 3/√20]ᵀ, [3/√10, −1/√20, −1/√20]ᵀ, [0, −1/√2, 1/√2]ᵀ
22. [1/√18, 4/√18, 1/√18]ᵀ, [1/√2, 0, −1/√2]ᵀ, [−2/3, 1/3, −2/3]ᵀ
In Exercises 23 and 24, all vectors are in ℝⁿ. Mark each statement True or False. Justify each answer.

23. a. Not every linearly independent set in ℝⁿ is an orthogonal set.
    b. If y is a linear combination of nonzero vectors from an orthogonal set, then the weights in the linear combination can be computed without row operations on a matrix.
    c. If the vectors in an orthogonal set of nonzero vectors are normalized, then some of the new vectors may not be orthogonal.
    d. A matrix with orthonormal columns is an orthogonal matrix.
    e. If L is a line through 0 and if ŷ is the orthogonal projection of y onto L, then ‖ŷ‖ gives the distance from y to L.

24. a. Not every orthogonal set in ℝⁿ is linearly independent.
    b. If a set S = {u₁, …, u_p} has the property that u_i·u_j = 0 whenever i ≠ j, then S is an orthonormal set.
    c. If the columns of an m × n matrix A are orthonormal, then the linear mapping x ↦ Ax preserves lengths.
    d. The orthogonal projection of y onto v is the same as the orthogonal projection of y onto cv whenever c ≠ 0.
    e. An orthogonal matrix is invertible.
25. Prove Theorem 7. [Hint: For (a), compute ‖Ux‖^2, or prove (b) first.]
26. Suppose W is a subspace of R^n spanned by n nonzero orthogonal vectors. Explain why W = R^n.
27. Let U be a square matrix with orthonormal columns. Explain why U is invertible. (Mention the theorems you use.)
28. Let U be an n × n orthogonal matrix. Show that the rows of U form an orthonormal basis of R^n.
29. Let U and V be n × n orthogonal matrices. Explain why UV is an orthogonal matrix. [That is, explain why UV is invertible and its inverse is (UV)^T.]
30. Let U be an orthogonal matrix, and construct V by interchanging some of the columns of U. Explain why V is an orthogonal matrix.
31. Show that the orthogonal projection of a vector y onto a line L through the origin in R^2 does not depend on the choice of the nonzero u in L used in the formula for ŷ. To do this, suppose y and u are given and ŷ has been computed by formula (2) in this section. Replace u in that formula by cu, where c is an unspecified nonzero scalar. Show that the new formula gives the same ŷ.
32. Let {v1, v2} be an orthogonal set of nonzero vectors, and let c1, c2 be any nonzero scalars. Show that {c1v1, c2v2} is also an orthogonal set. Since orthogonality of a set is defined in terms of pairs of vectors, this shows that if the vectors in an orthogonal set are normalized, the new set will still be orthogonal.
33. Given u ≠ 0 in R^n, let L = Span{u}. Show that the mapping x ↦ proj_L x is a linear transformation.
34. Given u ≠ 0 in R^n, let L = Span{u}. For y in R^n, the reflection of y in L is the point refl_L y defined by
refl_L y = 2 proj_L y − y
See the figure, which shows that refl_L y is the sum of ŷ = proj_L y and ŷ − y. Show that the mapping y ↦ refl_L y is a linear transformation.
[Figure: The reflection of y in a line through the origin, showing y, ŷ, ŷ − y, and refl_L y on L = Span{u}.]
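The reflection formula of Exercise 34 is easy to check numerically. Below is a minimal plain-Python sketch; the helper names `dot`, `proj_line`, and `reflect` are ours, not the text's.

```python
# Exercise 34's reflection, refl_L(y) = 2*proj_L(y) - y, checked in code.
# Plain-Python sketch; the helper names are ours.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj_line(y, u):
    """Orthogonal projection of y onto L = Span{u}, formula (2) of Section 6.2."""
    c = dot(y, u) / dot(u, u)
    return [c * ui for ui in u]

def reflect(y, u):
    p = proj_line(y, u)
    return [2 * pi - yi for pi, yi in zip(p, y)]

# Reflecting twice gives back the original vector:
y, u = [7.0, 6.0], [2.0, 1.0]
assert reflect(reflect(y, u), u) == y
```

Linearity of `reflect` follows from linearity of proj_L (Exercise 33); the code mirrors that, since both maps are built from dot products with a fixed u.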
35. [M] Show that the columns of the matrix A are orthogonal by making an appropriate matrix calculation. State the calculation you use.
A = [−6 −3 6 1; −1 2 1 −6; 3 6 3 −2; 6 −3 6 −1; 2 −1 2 3; −3 6 3 2; −2 −1 2 −3; 1 2 1 6]
36. [M] In parts (a)–(d), let U be the matrix formed by normalizing each column of the matrix A in Exercise 35.
a. Compute U^T U and U U^T. How do they differ?
b. Generate a random vector y in R^8, and compute p = U U^T y and z = y − p. Explain why p is in Col A. Verify that z is orthogonal to p.
c. Verify that z is orthogonal to each column of U.
d. Notice that y = p + z, with p in Col A. Explain why z is in (Col A)⊥. (The significance of this decomposition of y will be explained in the next section.)
SOLUTIONS TO PRACTICE PROBLEMS
1. The vectors are orthogonal because
u1·u2 = −2/5 + 2/5 = 0
They are unit vectors because
‖u1‖^2 = (−1/√5)^2 + (2/√5)^2 = 1/5 + 4/5 = 1
‖u2‖^2 = (2/√5)^2 + (1/√5)^2 = 4/5 + 1/5 = 1
In particular, the set {u1, u2} is linearly independent, and hence is a basis for R^2 since there are two vectors in the set.
2. When y = (7, 6) and u = (2, 1),
ŷ = ((y·u)/(u·u)) u = (20/5)(2, 1) = 4(2, 1) = (8, 4)
This is the same ŷ found in Example 3. The orthogonal projection does not seem to depend on the u chosen on the line. See Exercise 31.
3. Uy = [1/√2 2/3; 1/√2 −2/3; 0 1/3] (−3√2, 6) = (1, −7, 2)
Also, from Example 6, x = (√2, 3) and Ux = (3, −1, 1). Hence
Ux·Uy = 3 + 7 + 2 = 12, and x·y = −6 + 18 = 12
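Practice Problem 3 can be replayed in code: a matrix U with orthonormal columns preserves inner products, so (Ux)·(Uy) = x·y (Theorem 7). A plain-Python sketch using the data above; the helper names are ours.

```python
# Practice Problem 3, replayed numerically: a matrix U with orthonormal
# columns preserves inner products, (Ux)·(Uy) = x·y (Theorem 7).
# Plain-Python sketch; U, x, y are the data from the problem above.
from math import isclose, sqrt

U = [[1/sqrt(2),  2/3],
     [1/sqrt(2), -2/3],
     [0.0,        1/3]]
x = [sqrt(2), 3.0]
y = [-3*sqrt(2), 6.0]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

Ux, Uy = matvec(U, x), matvec(U, y)
assert isclose(dot(Ux, Uy), dot(x, y))   # both equal 12
```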
6.3 ORTHOGONAL PROJECTIONS
The orthogonal projection of a point in R^2 onto a line through the origin has an important analogue in R^n. Given a vector y and a subspace W in R^n, there is a vector ŷ in W such that (1) ŷ is the unique vector in W for which y − ŷ is orthogonal to W, and (2) ŷ is the unique vector in W closest to y. See Fig. 1. These two properties of ŷ provide the key to finding least-squares solutions of linear systems, mentioned in the introductory example for this chapter. The full story will be told in Section 6.5.

[Figure 1]

To prepare for the first theorem, observe that whenever a vector y is written as a linear combination of vectors u1, …, un in R^n, the terms in the sum for y can be grouped into two parts so that y can be written as
y = z1 + z2
where z1 is a linear combination of some of the ui and z2 is a linear combination of the rest of the ui. This idea is particularly useful when {u1, …, un} is an orthogonal basis. Recall from Section 6.1 that W⊥ denotes the set of all vectors orthogonal to a subspace W.
EXAMPLE 1 Let {u1, …, u5} be an orthogonal basis for R^5 and let
y = c1u1 + ⋯ + c5u5
Consider the subspace W = Span{u1, u2}, and write y as the sum of a vector z1 in W and a vector z2 in W⊥.

SOLUTION Write
y = (c1u1 + c2u2) + (c3u3 + c4u4 + c5u5)
where z1 = c1u1 + c2u2 is in Span{u1, u2}
and z2 = c3u3 + c4u4 + c5u5 is in Span{u3, u4, u5}.
To show that z2 is in W⊥, it suffices to show that z2 is orthogonal to the vectors in the basis {u1, u2} for W. (See Section 6.1.) Using properties of the inner product, compute
z2·u1 = (c3u3 + c4u4 + c5u5)·u1 = c3 u3·u1 + c4 u4·u1 + c5 u5·u1 = 0
because u1 is orthogonal to u3, u4, and u5. A similar calculation shows that z2·u2 = 0. Thus z2 is in W⊥.
The next theorem shows that the decomposition y = z1 + z2 in Example 1 can be computed without having an orthogonal basis for R^n. It is enough to have an orthogonal basis only for W.
THEOREM 8 The Orthogonal Decomposition Theorem
Let W be a subspace of R^n. Then each y in R^n can be written uniquely in the form
y = ŷ + z   (1)
where ŷ is in W and z is in W⊥. In fact, if {u1, …, up} is any orthogonal basis of W, then
ŷ = ((y·u1)/(u1·u1)) u1 + ⋯ + ((y·up)/(up·up)) up   (2)
and z = y − ŷ.
The vector ŷ in (1) is called the orthogonal projection of y onto W and often is written as proj_W y. See Fig. 2. When W is a one-dimensional subspace, the formula for ŷ matches the formula given in Section 6.2.

[Figure 2: The orthogonal projection of y onto W, showing ŷ = proj_W y and z = y − ŷ.]
PROOF Let {u1, …, up} be any orthogonal basis for W, and define ŷ by (2).¹ Then ŷ is in W because ŷ is a linear combination of the basis u1, …, up. Let z = y − ŷ. Since u1 is orthogonal to u2, …, up, it follows from (2) that
z·u1 = (y − ŷ)·u1 = y·u1 − ((y·u1)/(u1·u1))(u1·u1) − 0 − ⋯ − 0 = y·u1 − y·u1 = 0
Thus z is orthogonal to u1. Similarly, z is orthogonal to each uj in the basis for W. Hence z is orthogonal to every vector in W. That is, z is in W⊥.
To show that the decomposition in (1) is unique, suppose y can also be written as y = ŷ1 + z1, with ŷ1 in W and z1 in W⊥. Then ŷ + z = ŷ1 + z1 (since both sides equal y), and so
ŷ − ŷ1 = z1 − z
This equality shows that the vector v = ŷ − ŷ1 is in W and in W⊥ (because z1 and z are both in W⊥, and W⊥ is a subspace). Hence v·v = 0, which shows that v = 0. This proves that ŷ = ŷ1 and also z1 = z.
The uniqueness of the decomposition (1) shows that the orthogonal projection ŷ depends only on W and not on the particular basis used in (2).
¹We may assume that W is not the zero subspace, for otherwise W⊥ = R^n and (1) is simply y = 0 + y. The next section will show that any nonzero subspace of R^n has an orthogonal basis.
EXAMPLE 2 Let u1 = (2, 5, −1), u2 = (−2, 1, 1), and y = (1, 2, 3). Observe that {u1, u2} is an orthogonal basis for W = Span{u1, u2}. Write y as the sum of a vector in W and a vector orthogonal to W.
SOLUTION The orthogonal projection of y onto W is
ŷ = ((y·u1)/(u1·u1)) u1 + ((y·u2)/(u2·u2)) u2
  = (9/30)(2, 5, −1) + (3/6)(−2, 1, 1)
  = (9/30)(2, 5, −1) + (15/30)(−2, 1, 1) = (−2/5, 2, 1/5)
Also
y − ŷ = (1, 2, 3) − (−2/5, 2, 1/5) = (7/5, 0, 14/5)
Theorem 8 ensures that y − ŷ is in W⊥. To check the calculations, however, it is a good idea to verify that y − ŷ is orthogonal to both u1 and u2 and hence to all of W. The desired decomposition of y is
y = (1, 2, 3) = (−2/5, 2, 1/5) + (7/5, 0, 14/5)
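The decomposition in Example 2 can be verified numerically with formula (2) of Theorem 8. A minimal plain-Python sketch; the helper name `proj_onto_basis` is ours.

```python
# Example 2, verified numerically: y = yhat + z with yhat in W and z in
# W-perp, via formula (2) of Theorem 8.  Plain-Python sketch; the helper
# name proj_onto_basis is ours.
from math import isclose

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def proj_onto_basis(y, basis):
    """Projection onto Span(basis); the basis must be orthogonal."""
    yhat = [0.0] * len(y)
    for u in basis:
        c = dot(y, u) / dot(u, u)
        yhat = [yh + c * ui for yh, ui in zip(yhat, u)]
    return yhat

u1, u2 = [2.0, 5.0, -1.0], [-2.0, 1.0, 1.0]
y = [1.0, 2.0, 3.0]
yhat = proj_onto_basis(y, [u1, u2])          # (-2/5, 2, 1/5)
z = [yi - yh for yi, yh in zip(y, yhat)]     # (7/5, 0, 14/5)

# z is orthogonal to both basis vectors, hence to all of W:
assert isclose(dot(z, u1), 0.0, abs_tol=1e-12)
assert isclose(dot(z, u2), 0.0, abs_tol=1e-12)
```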
A Geometric Interpretation of the Orthogonal Projection
When W is a one-dimensional subspace, the formula (2) for proj_W y contains just one term. Thus, when dim W > 1, each term in (2) is itself an orthogonal projection of y onto a one-dimensional subspace spanned by one of the u's in the basis for W. Figure 3 illustrates this when W is a subspace of R^3 spanned by u1 and u2. Here ŷ1 and ŷ2 denote the projections of y onto the lines spanned by u1 and u2, respectively. The orthogonal projection ŷ of y onto W is the sum of the projections of y onto one-dimensional subspaces that are orthogonal to each other. The vector ŷ in Fig. 3 corresponds to the vector y in Fig. 4 of Section 6.2, because now it is ŷ that is in W.

[Figure 3: The orthogonal projection of y is the sum of its projections onto one-dimensional subspaces that are mutually orthogonal: ŷ = ((y·u1)/(u1·u1)) u1 + ((y·u2)/(u2·u2)) u2 = ŷ1 + ŷ2.]
Properties of Orthogonal Projections
If {u1, …, up} is an orthogonal basis for W and if y happens to be in W, then the formula for proj_W y is exactly the same as the representation of y given in Theorem 5 in Section 6.2. In this case, proj_W y = y.

If y is in W = Span{u1, …, up}, then proj_W y = y.

This fact also follows from the next theorem.
THEOREM 9 The Best Approximation Theorem
Let W be a subspace of R^n, let y be any vector in R^n, and let ŷ be the orthogonal projection of y onto W. Then ŷ is the closest point in W to y, in the sense that
‖y − ŷ‖ < ‖y − v‖   (3)
for all v in W distinct from ŷ.
The vector ŷ in Theorem 9 is called the best approximation to y by elements of W. Later sections in the text will examine problems where a given y must be replaced, or approximated, by a vector v in some fixed subspace W. The distance from y to v, given by ‖y − v‖, can be regarded as the "error" of using v in place of y. Theorem 9 says that this error is minimized when v = ŷ.
Inequality (3) leads to a new proof that ŷ does not depend on the particular orthogonal basis used to compute it. If a different orthogonal basis for W were used to construct an orthogonal projection of y, then this projection would also be the closest point in W to y, namely, ŷ.
PROOF Take v in W distinct from ŷ. See Fig. 4. Then ŷ − v is in W. By the Orthogonal Decomposition Theorem, y − ŷ is orthogonal to W. In particular, y − ŷ is orthogonal to ŷ − v (which is in W). Since
y − v = (y − ŷ) + (ŷ − v)
the Pythagorean Theorem gives
‖y − v‖^2 = ‖y − ŷ‖^2 + ‖ŷ − v‖^2
(See the colored right triangle in Fig. 4. The length of each side is labeled.) Now ‖ŷ − v‖^2 > 0 because ŷ − v ≠ 0, and so inequality (3) follows immediately.
[Figure 4: The orthogonal projection of y onto W is the closest point in W to y.]
EXAMPLE 3 If u1 = (2, 5, −1), u2 = (−2, 1, 1), y = (1, 2, 3), and W = Span{u1, u2}, as in Example 2, then the closest point in W to y is
ŷ = ((y·u1)/(u1·u1)) u1 + ((y·u2)/(u2·u2)) u2 = (−2/5, 2, 1/5)
EXAMPLE 4 The distance from a point y in R^n to a subspace W is defined as the distance from y to the nearest point in W. Find the distance from y to W = Span{u1, u2}, where
y = (−1, −5, 10), u1 = (5, −2, 1), u2 = (1, 2, −1)
SOLUTION By the Best Approximation Theorem, the distance from y to W is ‖y − ŷ‖, where ŷ = proj_W y. Since {u1, u2} is an orthogonal basis for W,
ŷ = (15/30) u1 + (−21/6) u2 = (1/2)(5, −2, 1) − (7/2)(1, 2, −1) = (−1, −8, 4)
y − ŷ = (−1, −5, 10) − (−1, −8, 4) = (0, 3, 6)
‖y − ŷ‖^2 = 3^2 + 6^2 = 45
The distance from y to W is √45 = 3√5.
The final theorem in this section shows how formula (2) for proj_W y is simplified when the basis for W is an orthonormal set.
THEOREM 10
If {u1, …, up} is an orthonormal basis for a subspace W of R^n, then
proj_W y = (y·u1)u1 + (y·u2)u2 + ⋯ + (y·up)up   (4)
If U = [u1 u2 ⋯ up], then
proj_W y = U U^T y for all y in R^n   (5)

PROOF Formula (4) follows immediately from (2) in Theorem 8. Also, (4) shows that proj_W y is a linear combination of the columns of U using the weights y·u1, y·u2, …, y·up. The weights can be written as u1^T y, u2^T y, …, up^T y, showing that they are the entries in U^T y and justifying (5).
Suppose U is an n × p matrix with orthonormal columns, and let W be the column space of U. Then
U^T U x = I_p x = x for all x in R^p   (Theorem 6)
U U^T y = proj_W y for all y in R^n   (Theorem 10)
If U is an n × n (square) matrix with orthonormal columns, then U is an orthogonal matrix, the column space W is all of R^n, and U U^T y = I y = y for all y in R^n.
Although formula (4) is important for theoretical purposes, in practice it usually involves calculations with square roots of numbers (in the entries of the ui). Formula (2) is recommended for hand calculations.
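Theorem 10's identity proj_W y = U(U^T y) says the weights in formula (4) are just the dot products y·ui. A plain-Python sketch, using the orthonormal pair that appears in Exercise 17 of this section (the code layout is ours):

```python
# Theorem 10 in code: with orthonormal columns u1,...,up, the projection
# is proj_W(y) = U(U^T y) -- the weights are just the dot products y·ui.
# Plain-Python sketch using the orthonormal pair from Exercise 17.
from math import isclose

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u1 = [2/3, 1/3, 2/3]
u2 = [-2/3, 2/3, 1/3]
y = [4.0, 8.0, 1.0]

weights = [dot(y, u) for u in (u1, u2)]       # entries of U^T y: [6, 3]
proj = [weights[0] * a + weights[1] * b for a, b in zip(u1, u2)]

# Formula (4) gives the same thing; here it works out to (2, 4, 5):
assert all(isclose(p, q) for p, q in zip(proj, [2.0, 4.0, 5.0]))
```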
PRACTICE PROBLEM
Let u1 = (−7, 1, 4), u2 = (−1, 1, −2), y = (−9, 1, 6), and W = Span{u1, u2}. Use the fact that u1 and u2 are orthogonal to compute proj_W y.
6.3 EXERCISES
In Exercises 1 and 2, you may assume that {u1, …, u4} is an orthogonal basis for R^4.
1. u1 = (0, 1, −4, −1), u2 = (3, 5, 1, 1), u3 = (1, 0, 1, −4), u4 = (5, −3, −1, 1), x = (10, −8, 2, 0). Write x as the sum of two vectors, one in Span{u1, u2, u3} and the other in Span{u4}.
2. u1 = (1, 2, 1, 1), u2 = (−2, 1, −1, 1), u3 = (1, 1, −2, −1), u4 = (−1, 1, 1, −2), v = (4, 5, −3, 3). Write v as the sum of two vectors, one in Span{u1} and the other in Span{u2, u3, u4}.
In Exercises 3–6, verify that {u1, u2} is an orthogonal set, and then find the orthogonal projection of y onto Span{u1, u2}.
3. y = (−1, 4, 3), u1 = (1, 1, 0), u2 = (−1, 1, 0)
4. y = (6, 3, −2), u1 = (3, 4, 0), u2 = (−4, 3, 0)
5. y = (−1, 2, 6), u1 = (3, −1, 2), u2 = (1, −1, −2)
6. y = (6, 4, 1), u1 = (−4, −1, 1), u2 = (0, 1, 1)
In Exercises 7–10, let W be the subspace spanned by the u's, and write y as the sum of a vector in W and a vector orthogonal to W.
7. y = (1, 3, 5), u1 = (1, 3, −2), u2 = (5, 1, 4)
8. y = (−1, 4, 3), u1 = (1, 1, 1), u2 = (−1, 3, −2)
9. y = (4, 3, 3, −1), u1 = (1, 1, 0, 1), u2 = (−1, 3, 1, −2), u3 = (−1, 0, 1, 1)
10. y = (3, 4, 5, 6), u1 = (1, 1, 0, −1), u2 = (1, 0, 1, 1), u3 = (0, −1, 1, −1)
In Exercises 11 and 12, find the closest point to y in the subspace W spanned by v1 and v2.
11. y = (3, 1, 5, 1), v1 = (3, 1, −1, 1), v2 = (1, −1, 1, −1)
12. y = (3, −1, 1, 13), v1 = (1, −2, −1, 2), v2 = (−4, 1, 0, 3)
In Exercises 13 and 14, find the best approximation to z by vectors of the form c1v1 + c2v2.
13. z = (3, −7, 2, 3), v1 = (2, −1, −3, 1), v2 = (1, 1, 0, −1)
14. z = (2, 4, 0, −1), v1 = (2, 0, −1, −3), v2 = (5, −2, 4, 2)
15. Let y = (5, −9, 5), u1 = (−3, −5, 1), u2 = (−3, 2, 1). Find the distance from y to the plane in R^3 spanned by u1 and u2.
16. Let y, v1, and v2 be as in Exercise 12. Find the distance from y to the subspace of R^4 spanned by v1 and v2.
17. Let y = (4, 8, 1), u1 = (2/3, 1/3, 2/3), u2 = (−2/3, 2/3, 1/3), and W = Span{u1, u2}.
a. Let U = [u1 u2]. Compute U^T U and U U^T.
b. Compute proj_W y and (U U^T)y.
18. Let y = (7, 9), u1 = (1/√10, −3/√10), and W = Span{u1}.
a. Let U be the 2 × 1 matrix whose only column is u1. Compute U^T U and U U^T.
b. Compute proj_W y and (U U^T)y.
19. Let u1 = (1, 1, −2), u2 = (5, −1, 2), and u3 = (0, 0, 1). Note that u1 and u2 are orthogonal but that u3 is not orthogonal to u1 or u2. It can be shown that u3 is not in the subspace W spanned by u1 and u2. Use this fact to construct a nonzero vector v in R^3 that is orthogonal to u1 and u2.
20. Let u1 and u2 be as in Exercise 19, and let u4 = (0, 1, 0). It can be shown that u4 is not in the subspace W spanned by u1 and u2. Use this fact to construct a nonzero vector v in R^3 that is orthogonal to u1 and u2.
In Exercises 21 and 22, all vectors and subspaces are in R^n. Mark each statement True or False. Justify each answer.
21. a. If z is orthogonal to u1 and to u2 and if W = Span{u1, u2}, then z must be in W⊥.
b. For each y and each subspace W, the vector y − proj_W y is orthogonal to W.
c. The orthogonal projection ŷ of y onto a subspace W can sometimes depend on the orthogonal basis for W used to compute ŷ.
d. If y is in a subspace W, then the orthogonal projection of y onto W is y itself.
e. If the columns of an n × p matrix U are orthonormal, then U U^T y is the orthogonal projection of y onto the column space of U.
22. a. If W is a subspace of R^n and if v is in both W and W⊥, then v must be the zero vector.
b. In the Orthogonal Decomposition Theorem, each term in formula (2) for ŷ is itself an orthogonal projection of y onto a subspace of W.
c. If y = z1 + z2, where z1 is in a subspace W and z2 is in W⊥, then z1 must be the orthogonal projection of y onto W.
d. The best approximation to y by elements of a subspace W is given by the vector y − proj_W y.
e. If an n × p matrix U has orthonormal columns, then U U^T x = x for all x in R^n.
23. Let A be an m × n matrix. Prove that every vector x in R^n can be written in the form x = p + u, where p is in Row A and u is in Nul A. Also, show that if the equation Ax = b is consistent, then there is a unique p in Row A such that Ap = b.
24. Let W be a subspace of R^n with an orthogonal basis {w1, …, wp}, and let {v1, …, vq} be an orthogonal basis for W⊥.
a. Explain why {w1, …, wp, v1, …, vq} is an orthogonal set.
b. Explain why the set in part (a) spans R^n.
c. Show that dim W + dim W⊥ = n.
25. [M] Let U be the 8 × 4 matrix in Exercise 36 in Section 6.2. Find the closest point to y = (1, 1, 1, 1, 1, 1, 1, 1) in Col U. Write the keystrokes or commands you use to solve this problem.
26. [M] Let U be the matrix in Exercise 25. Find the distance from b = (1, 1, 1, 1, −1, −1, −1, −1) to Col U.
SOLUTION TO PRACTICE PROBLEM
Compute
proj_W y = ((y·u1)/(u1·u1)) u1 + ((y·u2)/(u2·u2)) u2 = (88/66) u1 + (−2/6) u2
= (4/3)(−7, 1, 4) − (1/3)(−1, 1, −2) = (−9, 1, 6) = y
In this case, y happens to be a linear combination of u1 and u2, so y is in W. The closest point in W to y is y itself.
6.4 THE GRAM–SCHMIDT PROCESS
The Gram–Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of R^n. The first two examples of the process are aimed at hand calculation.
EXAMPLE 1 Let W = Span{x1, x2}, where x1 = (3, 6, 0) and x2 = (1, 2, 2). Construct an orthogonal basis {v1, v2} for W.

[Figure 1: Construction of an orthogonal basis {v1, v2}.]
SOLUTION The subspace W is shown in Fig. 1, along with x1, x2, and the projection p of x2 onto x1. The component of x2 orthogonal to x1 is x2 − p, which is in W because it is formed from x2 and a multiple of x1. Let v1 = x1 and
v2 = x2 − p = x2 − ((x2·x1)/(x1·x1)) x1 = (1, 2, 2) − (15/45)(3, 6, 0) = (0, 0, 2)
Then {v1, v2} is an orthogonal set of nonzero vectors in W. Since dim W = 2, the set {v1, v2} is a basis for W.
The next example fully illustrates the Gram–Schmidt process. Study it carefully.
EXAMPLE 2 Let x1 = (1, 1, 1, 1), x2 = (0, 1, 1, 1), and x3 = (0, 0, 1, 1). Then {x1, x2, x3} is clearly linearly independent and thus is a basis for a subspace W of R^4. Construct an orthogonal basis for W.
SOLUTION
Step 1. Let v1 = x1 and W1 = Span{x1} = Span{v1}.
Step 2. Let v2 be the vector produced by subtracting from x2 its projection onto the subspace W1. That is, let
v2 = x2 − proj_W1 x2 = x2 − ((x2·v1)/(v1·v1)) v1   (since v1 = x1)
   = (0, 1, 1, 1) − (3/4)(1, 1, 1, 1) = (−3/4, 1/4, 1/4, 1/4)
As in Example 1, v2 is the component of x2 orthogonal to x1, and {v1, v2} is an orthogonal basis for the subspace W2 spanned by x1 and x2.
Step 2′ (optional). If appropriate, scale v2 to simplify later computations. Since v2 has fractional entries, it is convenient to scale it by a factor of 4 and replace {v1, v2} by the orthogonal basis
v1 = (1, 1, 1, 1), v2′ = (−3, 1, 1, 1)
Step 3. Let v3 be the vector produced by subtracting from x3 its projection onto the subspace W2. Use the orthogonal basis {v1, v2′} to compute this projection onto W2:
proj_W2 x3 = ((x3·v1)/(v1·v1)) v1 + ((x3·v2′)/(v2′·v2′)) v2′
           = (2/4)(1, 1, 1, 1) + (2/12)(−3, 1, 1, 1) = (0, 2/3, 2/3, 2/3)
Then v3 is the component of x3 orthogonal to W2, namely,
v3 = x3 − proj_W2 x3 = (0, 0, 1, 1) − (0, 2/3, 2/3, 2/3) = (0, −2/3, 1/3, 1/3)
See Fig. 2 for a diagram of this construction. Observe that v3 is in W, because x3 and proj_W2 x3 are both in W. Thus {v1, v2′, v3} is an orthogonal set of nonzero vectors and hence a linearly independent set in W. Note that W is three-dimensional since it was defined by a basis of three vectors. Hence, by the Basis Theorem in Section 4.5, {v1, v2′, v3} is an orthogonal basis for W.
[Figure 2: The construction of v3 from x3 and W2 = Span{v1, v2′}.]
The proof of the next theorem shows that this strategy really works. Scaling of vectors is not mentioned because that is used only to simplify hand calculations.
THEOREM 11 The Gram–Schmidt Process
Given a basis {x1, …, xp} for a nonzero subspace W of R^n, define
v1 = x1
v2 = x2 − ((x2·v1)/(v1·v1)) v1
v3 = x3 − ((x3·v1)/(v1·v1)) v1 − ((x3·v2)/(v2·v2)) v2
⋮
vp = xp − ((xp·v1)/(v1·v1)) v1 − ((xp·v2)/(v2·v2)) v2 − ⋯ − ((xp·v_{p−1})/(v_{p−1}·v_{p−1})) v_{p−1}
Then {v1, …, vp} is an orthogonal basis for W. In addition
Span{v1, …, vk} = Span{x1, …, xk} for 1 ≤ k ≤ p   (1)
PROOF For 1 ≤ k ≤ p, let Wk = Span{x1, …, xk}. Set v1 = x1, so that Span{v1} = Span{x1}. Suppose, for some k < p, we have constructed v1, …, vk so that {v1, …, vk} is an orthogonal basis for Wk. Define
v_{k+1} = x_{k+1} − proj_Wk x_{k+1}   (2)
By the Orthogonal Decomposition Theorem, v_{k+1} is orthogonal to Wk. Note that proj_Wk x_{k+1} is in Wk and hence also in W_{k+1}. Since x_{k+1} is in W_{k+1}, so is v_{k+1} (because W_{k+1} is a subspace and is closed under subtraction). Furthermore, v_{k+1} ≠ 0 because x_{k+1} is not in Wk = Span{x1, …, xk}. Hence {v1, …, v_{k+1}} is an orthogonal set of nonzero vectors in the (k+1)-dimensional space W_{k+1}. By the Basis Theorem in Section 4.5, this set is an orthogonal basis for W_{k+1}. Hence W_{k+1} = Span{v1, …, v_{k+1}}. When k + 1 = p, the process stops.
Theorem 11 shows that any nonzero subspace W of R^n has an orthogonal basis, because an ordinary basis {x1, …, xp} is always available (by Theorem 11 in Section 4.5), and the Gram–Schmidt process depends only on the existence of orthogonal projections onto subspaces of W that already have orthogonal bases.
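The recursion in Theorem 11 translates directly into code. A minimal plain-Python sketch (the function name and structure are ours); run on the data of Example 2, it reproduces v2 and v3 from the text.

```python
# The Gram-Schmidt recursion of Theorem 11, written out directly.
# Minimal plain-Python sketch; the input vectors are assumed linearly
# independent, and the function name is ours.
from math import isclose

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def gram_schmidt(xs):
    """Return an orthogonal basis v1,...,vp with Span{v1..vk} = Span{x1..xk}."""
    vs = []
    for x in xs:
        v = list(map(float, x))
        for u in vs:                 # subtract the projection onto each earlier v
            c = dot(x, u) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        vs.append(v)
    return vs

# The data of Example 2:
v1, v2, v3 = gram_schmidt([[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]])
assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(v2, [-3/4, 1/4, 1/4, 1/4]))
assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(v3, [0, -2/3, 1/3, 1/3]))
```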
Orthonormal Bases
An orthonormal basis is constructed easily from an orthogonal basis {v1, …, vp}: simply normalize (i.e., "scale") all the vk. When working problems by hand, this is easier than normalizing each vk as soon as it is found (because it avoids unnecessary writing of square roots).
EXAMPLE 3 Example 1 constructed the orthogonal basis
v1 = (3, 6, 0), v2 = (0, 0, 2)
An orthonormal basis is
u1 = (1/‖v1‖) v1 = (1/√45)(3, 6, 0) = (1/√5, 2/√5, 0)
u2 = (1/‖v2‖) v2 = (0, 0, 1)
QR Factorization of Matrices
If an m × n matrix A has linearly independent columns x1, …, xn, then applying the Gram–Schmidt process (with normalizations) to x1, …, xn amounts to factoring A, as described in the next theorem. This factorization is widely used in computer algorithms for various computations, such as solving equations (discussed in Section 6.5) and finding eigenvalues (mentioned in the exercises for Section 5.2).
THEOREM 12 The QR Factorization
If A is an m × n matrix with linearly independent columns, then A can be factored as A = QR, where Q is an m × n matrix whose columns form an orthonormal basis for Col A and R is an n × n upper triangular invertible matrix with positive entries on its diagonal.
PROOF The columns of A form a basis {x1, …, xn} for Col A. Construct an orthonormal basis {u1, …, un} for W = Col A with property (1) in Theorem 11. This basis may be constructed by the Gram–Schmidt process or some other means. Let
Q = [u1 u2 ⋯ un]
For k = 1, …, n, xk is in Span{x1, …, xk} = Span{u1, …, uk}. So there are constants r1k, …, rkk such that
xk = r1k u1 + ⋯ + rkk uk + 0·u_{k+1} + ⋯ + 0·un
We may assume that rkk ≥ 0. (If rkk < 0, multiply both rkk and uk by −1.) This shows that xk is a linear combination of the columns of Q using as weights the entries in the vector
rk = (r1k, …, rkk, 0, …, 0)
That is, xk = Q rk for k = 1, …, n. Let R = [r1 ⋯ rn]. Then
A = [x1 ⋯ xn] = [Q r1 ⋯ Q rn] = QR
The fact that R is invertible follows easily from the fact that the columns of A are linearly independent (Exercise 19). Since R is clearly upper triangular, its nonnegative diagonal entries must be positive.
EXAMPLE 4 Find a QR factorization of A = [1 0 0; 1 1 0; 1 1 1; 1 1 1].
SOLUTION The columns of A are the vectors x1, x2, and x3 in Example 2. An orthogonal basis for Col A = Span{x1, x2, x3} was found in that example:
v1 = (1, 1, 1, 1), v2′ = (−3, 1, 1, 1), v3 = (0, −2/3, 1/3, 1/3)
To simplify the arithmetic that follows, scale v3 by letting v3′ = 3v3. Then normalize the three vectors to obtain u1, u2, and u3, and use these vectors as the columns of Q:
Q = [1/2 −3/√12 0; 1/2 1/√12 −2/√6; 1/2 1/√12 1/√6; 1/2 1/√12 1/√6]
By construction, the first k columns of Q are an orthonormal basis of Span{x1, …, xk}. From the proof of Theorem 12, A = QR for some R. To find R, observe that Q^T Q = I, because the columns of Q are orthonormal. Hence
Q^T A = Q^T (QR) = IR = R
and
R = [1/2 1/2 1/2 1/2; −3/√12 1/√12 1/√12 1/√12; 0 −2/√6 1/√6 1/√6] [1 0 0; 1 1 0; 1 1 1; 1 1 1]
  = [2 3/2 1; 0 3/√12 2/√12; 0 0 2/√6]
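Example 4's observation R = Q^T A gives a compact way to compute a QR factorization once Q is known. A plain-Python sketch on the same matrix A; the helper names are ours.

```python
# QR via Gram-Schmidt, as in Example 4: normalize an orthogonal basis of
# the columns of A to get Q, then R = Q^T A because Q^T Q = I.
# Plain-Python sketch; columns are plain lists and the names are ours.
from math import isclose, sqrt

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def gram_schmidt(xs):
    vs = []
    for x in xs:
        v = list(map(float, x))
        for u in vs:
            c = dot(x, u) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        vs.append(v)
    return vs

cols = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]     # columns of A
qs = [[vi / sqrt(dot(v, v)) for vi in v] for v in gram_schmidt(cols)]

R = [[dot(q, a) for a in cols] for q in qs]           # R[i][j] = q_i · a_j

# Matches the R computed in the text, e.g. first row (2, 3/2, 1):
assert all(isclose(x, y) for x, y in zip(R[0], [2.0, 1.5, 1.0]))
assert isclose(R[2][2], 2 / sqrt(6))
# Entries below the diagonal vanish (up to roundoff):
assert abs(R[1][0]) < 1e-12 and abs(R[2][0]) < 1e-12 and abs(R[2][1]) < 1e-12
```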
NUMERICAL NOTES
1. When the Gram–Schmidt process is run on a computer, roundoff error can build up as the vectors uk are calculated, one by one. For j and k large but unequal, the inner products uj^T uk may not be sufficiently close to zero. This loss of orthogonality can be reduced substantially by rearranging the order of the calculations.¹ However, a different computer-based QR factorization is usually preferred to this modified Gram–Schmidt method because it yields a more accurate orthonormal basis, even though the factorization requires about twice as much arithmetic.
2. To produce a QR factorization of a matrix A, a computer program usually left-multiplies A by a sequence of orthogonal matrices until A is transformed into an upper triangular matrix. This construction is analogous to the left-multiplication by elementary matrices that produces an LU factorization of A.
PRACTICE PROBLEM
Let W = Span{x1, x2}, where x1 = (1, 1, 1) and x2 = (1/3, 1/3, −2/3). Construct an orthonormal basis for W.
6.4 EXERCISES
In Exercises 1–6, the given set is a basis for a subspace W. Use the Gram–Schmidt process to produce an orthogonal basis for W.
1. (3, 0, −1), (8, 5, −6)
2. (0, 4, 2), (5, 6, −7)
3. (2, −5, 1), (4, −1, 2)
4. (3, −4, 5), (−3, 14, −7)
5. (1, −4, 0, 1), (7, −7, −4, 1)
6. (3, −1, 2, −1), (−5, 9, −9, 3)
¹See Fundamentals of Matrix Computations, by David S. Watkins (New York: John Wiley & Sons, 1991), pp. 167–180.
7. Find an orthonormal basis of the subspace spanned by the vectors in Exercise 3.
8. Find an orthonormal basis of the subspace spanned by the vectors in Exercise 4.
Find an orthogonal basis for the column space of each matrix in Exercises 9–12.
9. [3 −5 1; 1 1 1; −1 5 −2; 3 −7 8]
10. [−1 6 6; 3 −8 3; 1 −2 6; 1 −4 −3]
11. [1 2 5; −1 1 −4; −1 4 −3; 1 −4 7; 1 2 1]
12. [1 3 5; −1 −3 1; 0 2 3; 1 5 2; 1 5 8]
In Exercises 13 and 14, the columns of Q were obtained by applying the Gram–Schmidt process to the columns of A. Find an upper triangular matrix R such that A = QR. Check your work.
13. A = [5 9; 1 7; −3 −5; 1 5], Q = [5/6 −1/6; 1/6 5/6; −3/6 1/6; 1/6 3/6]
14. A = [−2 3; 5 7; 2 −2; 4 6], Q = [−2/7 5/7; 5/7 2/7; 2/7 −4/7; 4/7 2/7]
15. Find a QR factorization of the matrix in Exercise 11.
16. Find a QR factorization of the matrix in Exercise 12.
In Exercises 17 and 18, all vectors and subspaces are in R^n. Mark each statement True or False. Justify each answer.
17. a. If {v1, v2, v3} is an orthogonal basis for W, then multiplying v3 by a scalar c gives a new orthogonal basis {v1, v2, cv3}.
b. The Gram–Schmidt process produces from a linearly independent set {x1, …, xp} an orthogonal set {v1, …, vp} with the property that for each k, the vectors v1, …, vk span the same subspace as that spanned by x1, …, xk.
c. If A = QR, where Q has orthonormal columns, then R = Q^T A.
18. a. If W = Span{x1, x2, x3} with {x1, x2, x3} linearly independent, and if {v1, v2, v3} is an orthogonal set in W, then {v1, v2, v3} is a basis for W.
b. If x is not in a subspace W, then x − proj_W x is not zero.
c. In a QR factorization, say A = QR (when A has linearly independent columns), the columns of Q form an orthonormal basis for the column space of A.
19. Suppose A = QR, where Q is m × n and R is n × n. Show that if the columns of A are linearly independent, then R must be invertible. [Hint: Study the equation Rx = 0 and use the fact that A = QR.]
20. Suppose A = QR, where R is an invertible matrix. Show that A and Q have the same column space. [Hint: Given y in Col A, show that y = Qx for some x. Also, given y in Col Q, show that y = Ax for some x.]
21. Given A = QR as in Theorem 12, describe how to find an orthogonal m × m (square) matrix Q1 and an invertible n × n upper triangular matrix R such that
A = Q1 [R; 0]
The MATLAB qr command supplies this "full" QR factorization when rank A = n.
22. Let u1, …, up be an orthogonal basis for a subspace W of R^n, and let T : R^n → R^n be defined by T(x) = proj_W x. Show that T is a linear transformation.
23. Suppose A = QR is a QR factorization of an m × n matrix A (with linearly independent columns). Partition A as [A1 A2], where A1 has p columns. Show how to obtain a QR factorization of A1, and explain why your factorization has the appropriate properties.
24. [M] Use the Gram–Schmidt process as in Example 2 to produce an orthogonal basis for the column space of
A = [−10 13 7 −11; 2 1 −5 3; −6 3 13 −3; 16 −16 −2 5; 2 1 −5 −7]
25. [M] Use the method in this section to produce a QR factorization of the matrix in Exercise 24.
26. [M] For a matrix program, the Gram–Schmidt process works better with orthonormal vectors. Starting with x1, …, xp as in Theorem 11, let A = [x1 ⋯ xp]. Suppose Q is an n × k matrix whose columns form an orthonormal basis for the subspace Wk spanned by the first k columns of A. Then for x in R^n, QQ^T x is the orthogonal projection of x onto Wk (Theorem 10 in Section 6.3). If x_{k+1} is the next column of A, then equation (2) in the proof of Theorem 11 becomes
v_{k+1} = x_{k+1} − Q(Q^T x_{k+1})
(The parentheses above reduce the number of arithmetic operations.) Let u_{k+1} = v_{k+1}/‖v_{k+1}‖. The new Q for the next step is [Q u_{k+1}]. Use this procedure to compute the QR factorization of the matrix in Exercise 24. Write the keystrokes or commands you use.
SOLUTION TO PRACTICE PROBLEM
Let v1 = x1 = (1, 1, 1) and v2 = x2 − ((x2·v1)/(v1·v1)) v1 = x2 − 0v1 = x2. So {x1, x2} is already orthogonal. All that is needed is to normalize the vectors. Let
u1 = (1/‖v1‖) v1 = (1/√3)(1, 1, 1) = (1/√3, 1/√3, 1/√3)
Instead of normalizing v2 directly, normalize v2′ = 3v2 = (1, 1, −2) instead:
u2 = (1/‖v2′‖) v2′ = (1/√(1^2 + 1^2 + (−2)^2)) (1, 1, −2) = (1/√6, 1/√6, −2/√6)
Then {u1, u2} is an orthonormal basis for W.
6.5 LEAST-SQUARES PROBLEMS
The chapter's introductory example described a massive problem Ax = b that had no solution. Inconsistent systems arise often in applications, though usually not with such an enormous coefficient matrix. When a solution is demanded and none exists, the best one can do is to find an x that makes Ax as close as possible to b.
Think of Ax as an approximation to b. The smaller the distance between b and Ax, given by ‖b − Ax‖, the better the approximation. The general least-squares problem is to find an x that makes ‖b − Ax‖ as small as possible. The adjective "least-squares" arises from the fact that ‖b − Ax‖ is the square root of a sum of squares.
DEFINITION If A is m × n and b is in R^m, a least-squares solution of Ax = b is an x̂ in R^n such that
‖b − Ax̂‖ ≤ ‖b − Ax‖
for all x in R^n.
The most important aspect of the least-squares problem is that no matter what x we select, the vector Ax will necessarily be in the column space, Col A. So we seek an x that makes Ax the closest point in Col A to b. See Fig. 1. (Of course, if b happens to be in Col A, then b is Ax for some x, and such an x is a "least-squares solution.")
[Figure 1: The vector b is closer to Ax̂ than to Ax for other x.]
Solution of the General Least-Squares Problem
Given A and b as above, apply the Best Approximation Theorem in Section 6.3 to the subspace Col A. Let
b̂ = proj_ColA b
Because b̂ is in the column space of A, the equation Ax = b̂ is consistent, and there is an x̂ in R^n such that
Ax̂ = b̂   (1)
Since b̂ is the closest point in Col A to b, a vector x̂ is a least-squares solution of Ax = b if and only if x̂ satisfies (1). Such an x̂ in R^n is a list of weights that will build b̂ out of the columns of A. See Fig. 2. [There are many solutions of (1) if the equation has free variables.]
[Figure 2: The least-squares solution x̂ is in R^n.]
Suppose $\hat{x}$ satisfies $A\hat{x} = \hat{b}$. By the Orthogonal Decomposition Theorem in Section 6.3, the projection $\hat{b}$ has the property that $b - \hat{b}$ is orthogonal to $\operatorname{Col} A$, so $b - A\hat{x}$ is orthogonal to each column of $A$. If $a_j$ is any column of $A$, then $a_j \cdot (b - A\hat{x}) = 0$, and $a_j^T(b - A\hat{x}) = 0$. Since each $a_j^T$ is a row of $A^T$,
$$A^T(b - A\hat{x}) = 0 \tag{2}$$
(This equation also follows from Theorem 3 in Section 6.1.) Thus
$$A^Tb - A^TA\hat{x} = 0$$
$$A^TA\hat{x} = A^Tb$$
These calculations show that each least-squares solution of $Ax = b$ satisfies the equation
$$A^TAx = A^Tb \tag{3}$$
The matrix equation (3) represents a system of equations called the normal equations for $Ax = b$. A solution of (3) is often denoted by $\hat{x}$.
THEOREM 13  The set of least-squares solutions of $Ax = b$ coincides with the nonempty set of solutions of the normal equations $A^TAx = A^Tb$.
PROOF  As shown above, the set of least-squares solutions is nonempty and each least-squares solution $\hat{x}$ satisfies the normal equations. Conversely, suppose $\hat{x}$ satisfies $A^TA\hat{x} = A^Tb$. Then $\hat{x}$ satisfies (2) above, which shows that $b - A\hat{x}$ is orthogonal to the rows of $A^T$ and hence is orthogonal to the columns of $A$. Since the columns of $A$ span $\operatorname{Col} A$, the vector $b - A\hat{x}$ is orthogonal to all of $\operatorname{Col} A$. Hence the equation
$$b = A\hat{x} + (b - A\hat{x})$$
is a decomposition of $b$ into the sum of a vector in $\operatorname{Col} A$ and a vector orthogonal to $\operatorname{Col} A$. By the uniqueness of the orthogonal decomposition, $A\hat{x}$ must be the orthogonal projection of $b$ onto $\operatorname{Col} A$. That is, $A\hat{x} = \hat{b}$, and $\hat{x}$ is a least-squares solution.
EXAMPLE 1  Find a least-squares solution of the inconsistent system $Ax = b$ for
$$A = \begin{bmatrix}4 & 0\\0 & 2\\1 & 1\end{bmatrix}, \qquad b = \begin{bmatrix}2\\0\\11\end{bmatrix}$$
SOLUTION  To use the normal equations (3), compute:
$$A^TA = \begin{bmatrix}4 & 0 & 1\\0 & 2 & 1\end{bmatrix}\begin{bmatrix}4 & 0\\0 & 2\\1 & 1\end{bmatrix} = \begin{bmatrix}17 & 1\\1 & 5\end{bmatrix}$$
$$A^Tb = \begin{bmatrix}4 & 0 & 1\\0 & 2 & 1\end{bmatrix}\begin{bmatrix}2\\0\\11\end{bmatrix} = \begin{bmatrix}19\\11\end{bmatrix}$$
Then the equation $A^TAx = A^Tb$ becomes
$$\begin{bmatrix}17 & 1\\1 & 5\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}19\\11\end{bmatrix}$$
Row operations can be used to solve this system, but since $A^TA$ is invertible and $2 \times 2$, it is probably faster to compute
$$(A^TA)^{-1} = \frac{1}{84}\begin{bmatrix}5 & -1\\-1 & 17\end{bmatrix}$$
and then to solve $A^TAx = A^Tb$ as
$$\hat{x} = (A^TA)^{-1}A^Tb = \frac{1}{84}\begin{bmatrix}5 & -1\\-1 & 17\end{bmatrix}\begin{bmatrix}19\\11\end{bmatrix} = \frac{1}{84}\begin{bmatrix}84\\168\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix}$$
In many calculations, $A^TA$ is invertible, but this is not always the case. The next example involves a matrix of the sort that appears in what are called analysis of variance problems in statistics.
EXAMPLE 2  Find a least-squares solution of $Ax = b$ for
$$A = \begin{bmatrix}1 & 1 & 0 & 0\\1 & 1 & 0 & 0\\1 & 0 & 1 & 0\\1 & 0 & 1 & 0\\1 & 0 & 0 & 1\\1 & 0 & 0 & 1\end{bmatrix}, \qquad b = \begin{bmatrix}-3\\-1\\0\\2\\5\\1\end{bmatrix}$$
SOLUTION  Compute
$$A^TA = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1\\1 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 1\end{bmatrix}\begin{bmatrix}1 & 1 & 0 & 0\\1 & 1 & 0 & 0\\1 & 0 & 1 & 0\\1 & 0 & 1 & 0\\1 & 0 & 0 & 1\\1 & 0 & 0 & 1\end{bmatrix} = \begin{bmatrix}6 & 2 & 2 & 2\\2 & 2 & 0 & 0\\2 & 0 & 2 & 0\\2 & 0 & 0 & 2\end{bmatrix}$$
$$A^Tb = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1\\1 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 1\end{bmatrix}\begin{bmatrix}-3\\-1\\0\\2\\5\\1\end{bmatrix} = \begin{bmatrix}4\\-4\\2\\6\end{bmatrix}$$
The augmented matrix for $A^TAx = A^Tb$ is
$$\begin{bmatrix}6 & 2 & 2 & 2 & 4\\2 & 2 & 0 & 0 & -4\\2 & 0 & 2 & 0 & 2\\2 & 0 & 0 & 2 & 6\end{bmatrix} \sim \begin{bmatrix}1 & 0 & 0 & 1 & 3\\0 & 1 & 0 & -1 & -5\\0 & 0 & 1 & -1 & -2\\0 & 0 & 0 & 0 & 0\end{bmatrix}$$
The general solution is $x_1 = 3 - x_4$, $x_2 = -5 + x_4$, $x_3 = -2 + x_4$, and $x_4$ is free. So the general least-squares solution of $Ax = b$ has the form
$$\hat{x} = \begin{bmatrix}3\\-5\\-2\\0\end{bmatrix} + x_4\begin{bmatrix}-1\\1\\1\\1\end{bmatrix}$$
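When $A^TA$ is singular, as here, `numpy.linalg.lstsq` still returns one least-squares solution (the one of minimum norm). The sketch below (an illustration, not part of the text) checks that this solution satisfies the normal equations and lies on the solution line just described:

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
b = np.array([-3.0, -1.0, 0.0, 2.0, 5.0, 1.0])

# lstsq handles the rank-deficient case; rcond=None silences a warning
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Any least-squares solution satisfies the normal equations A^T A x = A^T b
assert np.allclose(A.T @ A @ x_hat, A.T @ b)

# It differs from the particular solution (3, -5, -2, 0) only by a
# multiple of (-1, 1, 1, 1), as the general solution predicts
diff = x_hat - np.array([3.0, -5.0, -2.0, 0.0])
assert np.allclose(diff, diff[3] * np.array([-1.0, 1.0, 1.0, 1.0]))
```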
The next theorem gives useful criteria for determining when there is only one least-squares solution of $Ax = b$. (Of course, the orthogonal projection $\hat{b}$ is always unique.)
THEOREM 14  Let $A$ be an $m \times n$ matrix. The following statements are logically equivalent:
a. The equation $Ax = b$ has a unique least-squares solution for each $b$ in $\mathbb{R}^m$.
b. The columns of $A$ are linearly independent.
c. The matrix $A^TA$ is invertible.
When these statements are true, the least-squares solution $\hat{x}$ is given by
$$\hat{x} = (A^TA)^{-1}A^Tb \tag{4}$$
The main elements of a proof of Theorem 14 are outlined in Exercises 19–21, which also review concepts from Chapter 4. Formula (4) for $\hat{x}$ is useful mainly for theoretical purposes and for hand calculations when $A^TA$ is a $2 \times 2$ invertible matrix.
When a least-squares solution $\hat{x}$ is used to produce $A\hat{x}$ as an approximation to $b$, the distance from $b$ to $A\hat{x}$ is called the least-squares error of this approximation.

EXAMPLE 3  Given $A$ and $b$ as in Example 1, determine the least-squares error in the least-squares solution of $Ax = b$.
SOLUTION  From Example 1,
$$b = \begin{bmatrix}2\\0\\11\end{bmatrix} \quad\text{and}\quad A\hat{x} = \begin{bmatrix}4 & 0\\0 & 2\\1 & 1\end{bmatrix}\begin{bmatrix}1\\2\end{bmatrix} = \begin{bmatrix}4\\4\\3\end{bmatrix}$$
[FIGURE 3: the columns $(4, 0, 1)$ and $(0, 2, 1)$ of $A$ span $\operatorname{Col} A$ in $\mathbb{R}^3$; $b = (2, 0, 11)$ lies at distance $\sqrt{84}$ from $\operatorname{Col} A$.]
Hence
$$b - A\hat{x} = \begin{bmatrix}2\\0\\11\end{bmatrix} - \begin{bmatrix}4\\4\\3\end{bmatrix} = \begin{bmatrix}-2\\-4\\8\end{bmatrix}$$
and
$$\|b - A\hat{x}\| = \sqrt{(-2)^2 + (-4)^2 + 8^2} = \sqrt{84}$$
The least-squares error is $\sqrt{84}$. For any $x$ in $\mathbb{R}^2$, the distance between $b$ and the vector $Ax$ is at least $\sqrt{84}$. See Fig. 3. Note that the least-squares solution $\hat{x}$ itself does not appear in the figure.
Alternative Calculations of Least-Squares Solutions

The next example shows how to find a least-squares solution of $Ax = b$ when the columns of $A$ are orthogonal. Such matrices often appear in linear regression problems, discussed in the next section.
EXAMPLE 4  Find a least-squares solution of $Ax = b$ for
$$A = \begin{bmatrix}1 & -6\\1 & -2\\1 & 1\\1 & 7\end{bmatrix}, \qquad b = \begin{bmatrix}-1\\2\\1\\6\end{bmatrix}$$
SOLUTION  Because the columns $a_1$ and $a_2$ of $A$ are orthogonal, the orthogonal projection of $b$ onto $\operatorname{Col} A$ is given by
$$\hat{b} = \frac{b \cdot a_1}{a_1 \cdot a_1}a_1 + \frac{b \cdot a_2}{a_2 \cdot a_2}a_2 = \frac{8}{4}a_1 + \frac{45}{90}a_2 \tag{5}$$
$$= \begin{bmatrix}2\\2\\2\\2\end{bmatrix} + \begin{bmatrix}-3\\-1\\1/2\\7/2\end{bmatrix} = \begin{bmatrix}-1\\1\\5/2\\11/2\end{bmatrix}$$
Now that $\hat{b}$ is known, we can solve $A\hat{x} = \hat{b}$. But this is trivial, since we already know what weights to place on the columns of $A$ to produce $\hat{b}$. It is clear from (5) that
$$\hat{x} = \begin{bmatrix}8/4\\45/90\end{bmatrix} = \begin{bmatrix}2\\1/2\end{bmatrix}$$
In some cases, the normal equations for a least-squares problem can be ill-conditioned; that is, small errors in the calculations of the entries of $A^TA$ can sometimes cause relatively large errors in the solution $\hat{x}$. If the columns of $A$ are linearly independent, the least-squares solution can often be computed more reliably through a QR factorization of $A$ (described in Section 6.4).¹

¹The QR method is compared with the standard normal equation method in G. Golub and C. Van Loan, Matrix Computations, 3rd ed. (Baltimore: Johns Hopkins Press, 1996), pp. 230–231.
THEOREM 15  Given an $m \times n$ matrix $A$ with linearly independent columns, let $A = QR$ be a QR factorization of $A$ as in Theorem 12. Then, for each $b$ in $\mathbb{R}^m$, the equation $Ax = b$ has a unique least-squares solution, given by
$$\hat{x} = R^{-1}Q^Tb \tag{6}$$
PROOF  Let $\hat{x} = R^{-1}Q^Tb$. Then
$$A\hat{x} = QR\hat{x} = QRR^{-1}Q^Tb = QQ^Tb$$
By Theorem 12, the columns of $Q$ form an orthonormal basis for $\operatorname{Col} A$. Hence, by Theorem 10, $QQ^Tb$ is the orthogonal projection $\hat{b}$ of $b$ onto $\operatorname{Col} A$. Then $A\hat{x} = \hat{b}$, which shows that $\hat{x}$ is a least-squares solution of $Ax = b$. The uniqueness of $\hat{x}$ follows from Theorem 14.
NUMERICAL NOTE
Since $R$ in Theorem 15 is upper triangular, $\hat{x}$ should be calculated as the exact solution of the equation
$$Rx = Q^Tb \tag{7}$$
It is much faster to solve (7) by back-substitution or row operations than to compute $R^{-1}$ and use (6).
EXAMPLE 5  Find the least-squares solution of $Ax = b$ for
$$A = \begin{bmatrix}1 & 3 & 5\\1 & 1 & 0\\1 & 1 & 2\\1 & 3 & 3\end{bmatrix}, \qquad b = \begin{bmatrix}3\\5\\7\\-3\end{bmatrix}$$
SOLUTION  The QR factorization of $A$ can be obtained as in Section 6.4:
$$A = QR = \begin{bmatrix}1/2 & 1/2 & 1/2\\1/2 & -1/2 & -1/2\\1/2 & -1/2 & 1/2\\1/2 & 1/2 & -1/2\end{bmatrix}\begin{bmatrix}2 & 4 & 5\\0 & 2 & 3\\0 & 0 & 2\end{bmatrix}$$
Then
$$Q^Tb = \begin{bmatrix}1/2 & 1/2 & 1/2 & 1/2\\1/2 & -1/2 & -1/2 & 1/2\\1/2 & -1/2 & 1/2 & -1/2\end{bmatrix}\begin{bmatrix}3\\5\\7\\-3\end{bmatrix} = \begin{bmatrix}6\\-6\\4\end{bmatrix}$$
The least-squares solution $\hat{x}$ satisfies $Rx = Q^Tb$; that is,
$$\begin{bmatrix}2 & 4 & 5\\0 & 2 & 3\\0 & 0 & 2\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}6\\-6\\4\end{bmatrix}$$
This equation is solved easily by back-substitution and yields $\hat{x} = \begin{bmatrix}10\\-6\\2\end{bmatrix}$.
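The QR route of Theorem 15 is what numerical libraries actually use. A sketch of Example 5 with `numpy.linalg.qr` (the factors may differ from the hand computation by signs, but the unique $\hat{x}$ is the same; this check is not part of the text):

```python
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 2.0],
              [1.0, 3.0, 3.0]])
b = np.array([3.0, 5.0, 7.0, -3.0])

Q, R = np.linalg.qr(A)       # reduced QR: Q is 4x3, R is 3x3 upper triangular
# Solve R x = Q^T b (here with a general solve; production code
# would use back-substitution on the triangular R)
x_hat = np.linalg.solve(R, Q.T @ b)

print(x_hat)                 # [10. -6.  2.], as in Example 5
```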
PRACTICE PROBLEMS
1. Let $A = \begin{bmatrix}1 & -3 & -3\\1 & 5 & 1\\1 & 7 & 2\end{bmatrix}$ and $b = \begin{bmatrix}5\\-3\\-5\end{bmatrix}$. Find a least-squares solution of $Ax = b$, and compute the associated least-squares error.
2. What can you say about the least-squares solution of $Ax = b$ when $b$ is orthogonal to the columns of $A$?
6.5 EXERCISES
In Exercises 1–4, find a least-squares solution of $Ax = b$ by (a) constructing the normal equations for $\hat{x}$ and (b) solving for $\hat{x}$.
1. $A = \begin{bmatrix}-1 & 2\\2 & -3\\-1 & 3\end{bmatrix}$, $b = \begin{bmatrix}4\\1\\2\end{bmatrix}$
2. $A = \begin{bmatrix}2 & 1\\-2 & 0\\2 & 3\end{bmatrix}$, $b = \begin{bmatrix}-5\\8\\1\end{bmatrix}$
3. $A = \begin{bmatrix}1 & -2\\-1 & 2\\0 & 3\\2 & 5\end{bmatrix}$, $b = \begin{bmatrix}3\\1\\-4\\2\end{bmatrix}$
4. $A = \begin{bmatrix}1 & 3\\1 & -1\\1 & 1\end{bmatrix}$, $b = \begin{bmatrix}5\\1\\0\end{bmatrix}$
In Exercises 5 and 6, describe all least-squares solutions of the equation $Ax = b$.
5. $A = \begin{bmatrix}1 & 1 & 0\\1 & 1 & 0\\1 & 0 & 1\\1 & 0 & 1\end{bmatrix}$, $b = \begin{bmatrix}1\\3\\8\\2\end{bmatrix}$
6. $A = \begin{bmatrix}1 & 1 & 0\\1 & 1 & 0\\1 & 1 & 0\\1 & 0 & 1\\1 & 0 & 1\\1 & 0 & 1\end{bmatrix}$, $b = \begin{bmatrix}7\\2\\3\\6\\5\\4\end{bmatrix}$
7. Compute the least-squares error associated with the least-squares solution found in Exercise 3.
8. Compute the least-squares error associated with the least-squares solution found in Exercise 4.
In Exercises 9–12, find (a) the orthogonal projection of $b$ onto $\operatorname{Col} A$ and (b) a least-squares solution of $Ax = b$.
9. $A = \begin{bmatrix}1 & 5\\3 & 1\\-2 & 4\end{bmatrix}$, $b = \begin{bmatrix}4\\-2\\-3\end{bmatrix}$
10. $A = \begin{bmatrix}1 & 2\\-1 & 4\\1 & 2\end{bmatrix}$, $b = \begin{bmatrix}3\\-1\\5\end{bmatrix}$
11. $A = \begin{bmatrix}4 & 0 & 1\\1 & -5 & 1\\6 & 1 & 0\\1 & -1 & -5\end{bmatrix}$, $b = \begin{bmatrix}9\\0\\0\\0\end{bmatrix}$
12. $A = \begin{bmatrix}1 & 1 & 0\\1 & 0 & -1\\0 & 1 & 1\\-1 & 1 & -1\end{bmatrix}$, $b = \begin{bmatrix}2\\5\\6\\6\end{bmatrix}$
13. Let $A = \begin{bmatrix}3 & 4\\-2 & 1\\3 & 4\end{bmatrix}$, $b = \begin{bmatrix}11\\-9\\5\end{bmatrix}$, $u = \begin{bmatrix}5\\-1\end{bmatrix}$, and $v = \begin{bmatrix}5\\-2\end{bmatrix}$. Compute $Au$ and $Av$, and compare them with $b$. Could $u$ possibly be a least-squares solution of $Ax = b$? (Answer this without computing a least-squares solution.)
14. Let $A = \begin{bmatrix}2 & 1\\-3 & -4\\3 & 2\end{bmatrix}$, $b = \begin{bmatrix}5\\4\\4\end{bmatrix}$, $u = \begin{bmatrix}4\\-5\end{bmatrix}$, and $v = \begin{bmatrix}6\\-5\end{bmatrix}$. Compute $Au$ and $Av$, and compare them with $b$. Is it possible that at least one of $u$ or $v$ could be a least-squares solution of $Ax = b$? (Answer this without computing a least-squares solution.)
In Exercises 15 and 16, use the factorization $A = QR$ to find the least-squares solution of $Ax = b$.
15. $A = \begin{bmatrix}2 & 3\\2 & 4\\1 & 1\end{bmatrix} = \begin{bmatrix}2/3 & -1/3\\2/3 & 2/3\\1/3 & -2/3\end{bmatrix}\begin{bmatrix}3 & 5\\0 & 1\end{bmatrix}$, $b = \begin{bmatrix}7\\3\\1\end{bmatrix}$
16. $A = \begin{bmatrix}1 & -1\\1 & 4\\1 & -1\\1 & 4\end{bmatrix} = \begin{bmatrix}1/2 & -1/2\\1/2 & 1/2\\1/2 & -1/2\\1/2 & 1/2\end{bmatrix}\begin{bmatrix}2 & 3\\0 & 5\end{bmatrix}$, $b = \begin{bmatrix}-1\\6\\5\\7\end{bmatrix}$
In Exercises 17 and 18, $A$ is an $m \times n$ matrix and $b$ is in $\mathbb{R}^m$. Mark each statement True or False. Justify each answer.
17. a. The general least-squares problem is to find an $x$ that makes $Ax$ as close as possible to $b$.
b. A least-squares solution of $Ax = b$ is a vector $\hat{x}$ that satisfies $A\hat{x} = \hat{b}$, where $\hat{b}$ is the orthogonal projection of $b$ onto $\operatorname{Col} A$.
c. A least-squares solution of $Ax = b$ is a vector $\hat{x}$ such that $\|b - Ax\| \le \|b - A\hat{x}\|$ for all $x$ in $\mathbb{R}^n$.
d. Any solution of $A^TAx = A^Tb$ is a least-squares solution of $Ax = b$.
e. If the columns of $A$ are linearly independent, then the equation $Ax = b$ has exactly one least-squares solution.
18. a. If $b$ is in the column space of $A$, then every solution of $Ax = b$ is a least-squares solution.
b. The least-squares solution of $Ax = b$ is the point in the column space of $A$ closest to $b$.
c. A least-squares solution of $Ax = b$ is a list of weights that, when applied to the columns of $A$, produces the orthogonal projection of $b$ onto $\operatorname{Col} A$.
d. If $\hat{x}$ is a least-squares solution of $Ax = b$, then $\hat{x} = (A^TA)^{-1}A^Tb$.
e. The normal equations always provide a reliable method for computing least-squares solutions.
f. If $A$ has a QR factorization, say $A = QR$, then the best way to find the least-squares solution of $Ax = b$ is to compute $\hat{x} = R^{-1}Q^Tb$.
19. Let $A$ be an $m \times n$ matrix. Use the steps below to show that a vector $x$ in $\mathbb{R}^n$ satisfies $Ax = 0$ if and only if $A^TAx = 0$. This will show that $\operatorname{Nul} A = \operatorname{Nul} A^TA$.
a. Show that if $Ax = 0$, then $A^TAx = 0$.
b. Suppose $A^TAx = 0$. Explain why $x^TA^TAx = 0$, and use this to show that $Ax = 0$.
20. Let $A$ be an $m \times n$ matrix such that $A^TA$ is invertible. Show that the columns of $A$ are linearly independent. [Careful: You may not assume that $A$ is invertible; it may not even be square.]
21. Let $A$ be an $m \times n$ matrix whose columns are linearly independent. [Careful: $A$ need not be square.]
a. Use Exercise 19 to show that $A^TA$ is an invertible matrix.
b. Explain why $A$ must have at least as many rows as columns.
c. Determine the rank of $A$.
22. Use Exercise 19 to show that $\operatorname{rank} A^TA = \operatorname{rank} A$. [Hint: How many columns does $A^TA$ have? How is this connected with the rank of $A^TA$?]
23. Suppose $A$ is $m \times n$ with linearly independent columns and $b$ is in $\mathbb{R}^m$. Use the normal equations to produce a formula for $\hat{b}$, the projection of $b$ onto $\operatorname{Col} A$. [Hint: Find $\hat{x}$ first. The formula does not require an orthogonal basis for $\operatorname{Col} A$.]
24. Find a formula for the least-squares solution of $Ax = b$ when the columns of $A$ are orthonormal.
25. Describe all least-squares solutions of the system
$$x + y = 2$$
$$x + y = 4$$
26. [M] Example 3 in Section 4.8 displayed a low-pass linear filter that changed a signal $\{y_k\}$ into $\{y_{k+1}\}$ and changed a higher-frequency signal $\{w_k\}$ into the zero signal, where $y_k = \cos(\pi k/4)$ and $w_k = \cos(3\pi k/4)$. The following calculations will design a filter with approximately those properties. The filter equation is
$$a_0 y_{k+2} + a_1 y_{k+1} + a_2 y_k = z_k \quad\text{for all } k \tag{8}$$
Because the signals are periodic, with period 8, it suffices to study equation (8) for $k = 0, \dots, 7$. The action on the two signals described above translates into two sets of eight equations. For $k = 0, \dots, 7$, with columns $y_{k+2}$, $y_{k+1}$, $y_k$:
$$\begin{bmatrix}0 & .7 & 1\\-.7 & 0 & .7\\-1 & -.7 & 0\\-.7 & -1 & -.7\\0 & -.7 & -1\\.7 & 0 & -.7\\1 & .7 & 0\\.7 & 1 & .7\end{bmatrix}\begin{bmatrix}a_0\\a_1\\a_2\end{bmatrix} = \begin{bmatrix}.7\\0\\-.7\\-1\\-.7\\0\\.7\\1\end{bmatrix}$$
and, with columns $w_{k+2}$, $w_{k+1}$, $w_k$:
$$\begin{bmatrix}0 & -.7 & 1\\.7 & 0 & -.7\\-1 & .7 & 0\\.7 & -1 & .7\\0 & .7 & -1\\-.7 & 0 & .7\\1 & -.7 & 0\\-.7 & 1 & -.7\end{bmatrix}\begin{bmatrix}a_0\\a_1\\a_2\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\\0\\0\\0\\0\end{bmatrix}$$
Write an equation $Ax = b$, where $A$ is a $16 \times 3$ matrix formed from the two coefficient matrices above and where $b$ in $\mathbb{R}^{16}$ is formed from the two right sides of the equations. Find $a_0$, $a_1$, and $a_2$ given by the least-squares solution of $Ax = b$. (The .7 in the data above was used as an approximation for $\sqrt{2}/2$, to illustrate how a typical computation in an applied problem might proceed. If .707 were used instead, the resulting filter coefficients would agree to at least seven decimal places with $\sqrt{2}/4$, $1/2$, and $\sqrt{2}/4$, the values produced by exact arithmetic calculations.)
SOLUTIONS TO PRACTICE PROBLEMS
1. First, compute
$$A^TA = \begin{bmatrix}1 & 1 & 1\\-3 & 5 & 7\\-3 & 1 & 2\end{bmatrix}\begin{bmatrix}1 & -3 & -3\\1 & 5 & 1\\1 & 7 & 2\end{bmatrix} = \begin{bmatrix}3 & 9 & 0\\9 & 83 & 28\\0 & 28 & 14\end{bmatrix}$$
$$A^Tb = \begin{bmatrix}1 & 1 & 1\\-3 & 5 & 7\\-3 & 1 & 2\end{bmatrix}\begin{bmatrix}5\\-3\\-5\end{bmatrix} = \begin{bmatrix}-3\\-65\\-28\end{bmatrix}$$
Next, row reduce the augmented matrix for the normal equations, $A^TAx = A^Tb$:
$$\begin{bmatrix}3 & 9 & 0 & -3\\9 & 83 & 28 & -65\\0 & 28 & 14 & -28\end{bmatrix} \sim \begin{bmatrix}1 & 3 & 0 & -1\\0 & 56 & 28 & -56\\0 & 28 & 14 & -28\end{bmatrix} \sim \cdots \sim \begin{bmatrix}1 & 0 & -3/2 & 2\\0 & 1 & 1/2 & -1\\0 & 0 & 0 & 0\end{bmatrix}$$
The general least-squares solution is $x_1 = 2 + \frac{3}{2}x_3$, $x_2 = -1 - \frac{1}{2}x_3$, with $x_3$ free. For one specific solution, take $x_3 = 0$ (for example), and get
$$\hat{x} = \begin{bmatrix}2\\-1\\0\end{bmatrix}$$
To find the least-squares error, compute
$$\hat{b} = A\hat{x} = \begin{bmatrix}1 & -3 & -3\\1 & 5 & 1\\1 & 7 & 2\end{bmatrix}\begin{bmatrix}2\\-1\\0\end{bmatrix} = \begin{bmatrix}5\\-3\\-5\end{bmatrix}$$
It turns out that $\hat{b} = b$, so $\|b - \hat{b}\| = 0$. The least-squares error is zero because $b$ happens to be in $\operatorname{Col} A$.
2. If $b$ is orthogonal to the columns of $A$, then the projection of $b$ onto the column space of $A$ is $0$. In this case, a least-squares solution $\hat{x}$ of $Ax = b$ satisfies $A\hat{x} = 0$.
6.6 APPLICATIONS TO LINEAR MODELS
A common task in science and engineering is to analyze and understand relationships among several quantities that vary. This section describes a variety of situations in which data are used to build or verify a formula that predicts the value of one variable as a function of other variables. In each case, the problem will amount to solving a least-squares problem.
For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data. Instead of $Ax = b$, we write $X\beta = y$ and refer to $X$ as the design matrix, $\beta$ as the parameter vector, and $y$ as the observation vector.
Least-Squares Lines

The simplest relation between two variables $x$ and $y$ is the linear equation $y = \beta_0 + \beta_1 x$.¹ Experimental data often produce points $(x_1, y_1), \dots, (x_n, y_n)$ that,

¹This notation is commonly used for least-squares lines instead of $y = mx + b$.
when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$ and $\beta_1$ that make the line as "close" to the points as possible.
Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Fig. 1. Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line with the same $x$-coordinate. We call $y_j$ the observed value of $y$ and $\beta_0 + \beta_1 x_j$ the predicted $y$-value (determined by the line). The difference between an observed $y$-value and a predicted $y$-value is called a residual.
[Figure: data points $(x_j, y_j)$ and points $(x_j, \beta_0 + \beta_1 x_j)$ on the line $y = \beta_0 + \beta_1 x$, with residuals shown as vertical gaps.]
FIGURE 1  Fitting a line to experimental data.
There are several ways to measure how "close" the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals. The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the sum of the squares of the residuals. This line is also called a line of regression of $y$ on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The coefficients $\beta_0$, $\beta_1$ of the line are called (linear) regression coefficients.²
If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the equations (predicted $y$-value on the left, observed $y$-value on the right):
$$\begin{aligned}\beta_0 + \beta_1 x_1 &= y_1\\\beta_0 + \beta_1 x_2 &= y_2\\&\;\vdots\\\beta_0 + \beta_1 x_n &= y_n\end{aligned}$$
We can write this system as
$$X\beta = y, \quad\text{where } X = \begin{bmatrix}1 & x_1\\1 & x_2\\\vdots & \vdots\\1 & x_n\end{bmatrix}, \quad \beta = \begin{bmatrix}\beta_0\\\beta_1\end{bmatrix}, \quad y = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix} \tag{1}$$
Of course, if the data points don't lie on a line, then there are no parameters $\beta_0$, $\beta_1$ for which the predicted $y$-values in $X\beta$ equal the observed $y$-values in $y$, and $X\beta = y$ has no solution. This is a least-squares problem, $Ax = b$, with different notation!
The square of the distance between the vectors $X\beta$ and $y$ is precisely the sum of the squares of the residuals. The $\beta$ that minimizes this sum also minimizes the distance between $X\beta$ and $y$. Computing the least-squares solution of $X\beta = y$ is equivalent to finding the $\beta$ that determines the least-squares line in Fig. 1.

²If the measurement errors are in $x$ instead of $y$, simply interchange the coordinates of the data $(x_j, y_j)$ before plotting the points and computing the regression line. If both coordinates are subject to possible error, then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular) distances from the points to the line. See the Practice Problems for Section 7.5.
EXAMPLE 1  Find the equation $y = \beta_0 + \beta_1 x$ of the least-squares line that best fits the data points $(2, 1)$, $(5, 2)$, $(7, 3)$, and $(8, 3)$.
SOLUTION  Use the $x$-coordinates of the data to build the design matrix $X$ in (1) and the $y$-coordinates to build the observation vector $y$:
$$X = \begin{bmatrix}1 & 2\\1 & 5\\1 & 7\\1 & 8\end{bmatrix}, \qquad y = \begin{bmatrix}1\\2\\3\\3\end{bmatrix}$$
For the least-squares solution of $X\beta = y$, obtain the normal equations (with the new notation):
$$X^TX\beta = X^Ty$$
That is, compute
$$X^TX = \begin{bmatrix}1 & 1 & 1 & 1\\2 & 5 & 7 & 8\end{bmatrix}\begin{bmatrix}1 & 2\\1 & 5\\1 & 7\\1 & 8\end{bmatrix} = \begin{bmatrix}4 & 22\\22 & 142\end{bmatrix}$$
$$X^Ty = \begin{bmatrix}1 & 1 & 1 & 1\\2 & 5 & 7 & 8\end{bmatrix}\begin{bmatrix}1\\2\\3\\3\end{bmatrix} = \begin{bmatrix}9\\57\end{bmatrix}$$
The normal equations are
$$\begin{bmatrix}4 & 22\\22 & 142\end{bmatrix}\begin{bmatrix}\beta_0\\\beta_1\end{bmatrix} = \begin{bmatrix}9\\57\end{bmatrix}$$
Hence
$$\begin{bmatrix}\beta_0\\\beta_1\end{bmatrix} = \begin{bmatrix}4 & 22\\22 & 142\end{bmatrix}^{-1}\begin{bmatrix}9\\57\end{bmatrix} = \frac{1}{84}\begin{bmatrix}142 & -22\\-22 & 4\end{bmatrix}\begin{bmatrix}9\\57\end{bmatrix} = \frac{1}{84}\begin{bmatrix}24\\30\end{bmatrix} = \begin{bmatrix}2/7\\5/14\end{bmatrix}$$
Thus the least-squares line has the equation
$$y = \frac{2}{7} + \frac{5}{14}x$$
See Fig. 2.
FIGURE 2  The least-squares line $y = \frac{2}{7} + \frac{5}{14}x$.
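Example 1's line can be reproduced in a few lines of numpy (a sketch, not part of the text); the normal equations give $\beta = (2/7, 5/14)$ exactly:

```python
import numpy as np

# Data points (2,1), (5,2), (7,3), (8,3)
x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

# Design matrix: a column of ones and the x-coordinates, as in (1)
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)       # [0.2857... 0.3571...], i.e. [2/7, 5/14]
```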
A common practice before computing a least-squares line is to compute the average $\bar{x}$ of the original $x$-values and form a new variable $x^* = x - \bar{x}$. The new $x$-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal. Solution of the normal equations is simplified, just as in Example 4 in Section 6.5. See Exercises 17 and 18.
The General Linear Model

In some applications, it is necessary to fit data points with something other than a straight line. In the examples that follow, the matrix equation is still $X\beta = y$, but the specific form of $X$ changes from one problem to the next. Statisticians usually introduce a residual vector $\epsilon$, defined by $\epsilon = y - X\beta$, and write
$$y = X\beta + \epsilon$$
Any equation of this form is referred to as a linear model. Once $X$ and $y$ are determined, the goal is to minimize the length of $\epsilon$, which amounts to finding a least-squares solution of $X\beta = y$. In each case, the least-squares solution $\hat{\beta}$ is a solution of the normal equations
$$X^TX\beta = X^Ty$$
Least-Squares Fitting of Other Curves

When data points $(x_1, y_1), \dots, (x_n, y_n)$ on a scatter plot do not lie close to any line, it may be appropriate to postulate some other functional relationship between $x$ and $y$.
The next two examples show how to fit data by curves that have the general form
$$y = \beta_0 f_0(x) + \beta_1 f_1(x) + \cdots + \beta_k f_k(x) \tag{2}$$
where $f_0, \dots, f_k$ are known functions and $\beta_0, \dots, \beta_k$ are parameters that must be determined. As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.
For a particular value of $x$, (2) gives a predicted, or "fitted," value of $y$. The difference between the observed value and the predicted value is the residual. The parameters $\beta_0, \dots, \beta_k$ must be determined so as to minimize the sum of the squares of the residuals.
EXAMPLE 2  Suppose data points $(x_1, y_1), \dots, (x_n, y_n)$ appear to lie along some sort of parabola instead of a straight line. For instance, if the $x$-coordinate denotes the production level for a company, and $y$ denotes the average cost per unit of operating at a level of $x$ units per day, then a typical average cost curve looks like a parabola that opens upward (Fig. 3).
[FIGURE 3  Average cost curve: average cost per unit versus units produced.]
In ecology, a parabolic curve that opens downward is used to model the net primary production of nutrients in a plant, as a function of the surface area of the foliage (Fig. 4). Suppose we wish to approximate the data by an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 \tag{3}$$
Describe the linear model that produces a "least-squares fit" of the data by equation (3).
SOLUTION  Equation (3) describes the ideal relationship. Suppose the actual values of
[FIGURE 4  Production of nutrients: net primary production versus surface area of foliage.]
the parameters are $\beta_0$, $\beta_1$, $\beta_2$. Then the coordinates of the first data point $(x_1, y_1)$ satisfy an equation of the form
$$y_1 = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon_1$$
where $\epsilon_1$ is the residual error between the observed value $y_1$ and the predicted $y$-value $\beta_0 + \beta_1 x_1 + \beta_2 x_1^2$. Each data point determines a similar equation:
$$\begin{aligned}y_1 &= \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon_1\\y_2 &= \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \epsilon_2\\&\;\vdots\\y_n &= \beta_0 + \beta_1 x_n + \beta_2 x_n^2 + \epsilon_n\end{aligned}$$
It is a simple matter to write this system of equations in the form $y = X\beta + \epsilon$. To find $X$, inspect the first few rows of the system and look for the pattern:
$$\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix} = \begin{bmatrix}1 & x_1 & x_1^2\\1 & x_2 & x_2^2\\\vdots & \vdots & \vdots\\1 & x_n & x_n^2\end{bmatrix}\begin{bmatrix}\beta_0\\\beta_1\\\beta_2\end{bmatrix} + \begin{bmatrix}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_n\end{bmatrix}$$
$$y = X\beta + \epsilon$$
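Building the quadratic design matrix is the only new step; after that, the fit is the same least-squares solve. A sketch with made-up data lying exactly on a parabola (the data are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x + 3x^2 (so the fit is exact)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns 1, x, x^2, as in the display above
X = np.column_stack([np.ones_like(x), x, x**2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)        # [1. 2. 3.], the parameters are recovered
```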
EXAMPLE 3  If data points tend to follow a pattern such as in Fig. 5 (data points along a cubic curve), then an appropriate model might be an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$
Such data, for instance, could come from a company's total costs, as a function of the level of production. Describe the linear model that gives a least-squares fit of this type to data $(x_1, y_1), \dots, (x_n, y_n)$.
SOLUTION  By an analysis similar to that in Example 2, we obtain the observation vector, design matrix, parameter vector, and residual vector:
$$y = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}, \quad X = \begin{bmatrix}1 & x_1 & x_1^2 & x_1^3\\1 & x_2 & x_2^2 & x_2^3\\\vdots & \vdots & \vdots & \vdots\\1 & x_n & x_n^2 & x_n^3\end{bmatrix}, \quad \beta = \begin{bmatrix}\beta_0\\\beta_1\\\beta_2\\\beta_3\end{bmatrix}, \quad \epsilon = \begin{bmatrix}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_n\end{bmatrix}$$
Multiple Regression

Suppose an experiment involves two independent variables, say $u$ and $v$, and one dependent variable, $y$. A simple equation for predicting $y$ from $u$ and $v$ has the form
$$y = \beta_0 + \beta_1 u + \beta_2 v \tag{4}$$
A more general prediction equation might have the form
$$y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2 \tag{5}$$
This equation is used in geology, for instance, to model erosion surfaces, glacial cirques, soil pH, and other quantities. In such cases, the least-squares fit is called a trend surface.
Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though $u$ and $v$ are multiplied). In general, a linear model will arise whenever $y$ is to be predicted by an equation of the form
$$y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \cdots + \beta_k f_k(u, v)$$
with $f_0, \dots, f_k$ any sort of known functions and $\beta_0, \dots, \beta_k$ unknown weights.
EXAMPLE 4  In geography, local models of terrain are constructed from data $(u_1, v_1, y_1), \dots, (u_n, v_n, y_n)$, where $u_j$, $v_j$, and $y_j$ are latitude, longitude, and altitude, respectively. Describe the linear model based on (4) that gives a least-squares fit to such data. The solution is called the least-squares plane. See Fig. 6.
[FIGURE 6  A least-squares plane.]
SOLUTION  We expect the data to satisfy the following equations:
$$\begin{aligned}y_1 &= \beta_0 + \beta_1 u_1 + \beta_2 v_1 + \epsilon_1\\y_2 &= \beta_0 + \beta_1 u_2 + \beta_2 v_2 + \epsilon_2\\&\;\vdots\\y_n &= \beta_0 + \beta_1 u_n + \beta_2 v_n + \epsilon_n\end{aligned}$$
This system has the matrix form $y = X\beta + \epsilon$, where $y$ is the observation vector, $X$ the design matrix, $\beta$ the parameter vector, and $\epsilon$ the residual vector:
$$y = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}, \quad X = \begin{bmatrix}1 & u_1 & v_1\\1 & u_2 & v_2\\\vdots & \vdots & \vdots\\1 & u_n & v_n\end{bmatrix}, \quad \beta = \begin{bmatrix}\beta_0\\\beta_1\\\beta_2\end{bmatrix}, \quad \epsilon = \begin{bmatrix}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_n\end{bmatrix}$$
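Fitting the least-squares plane is the same computation with a three-column design matrix. A sketch with hypothetical terrain data generated from a known plane (assumption: exact, noise-free data, so the fit recovers the plane):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0, 10, size=20)     # hypothetical "latitude" values
v = rng.uniform(0, 10, size=20)     # hypothetical "longitude" values
y = 3.0 + 2.0 * u - 1.0 * v         # altitudes on the plane y = 3 + 2u - v

X = np.column_stack([np.ones_like(u), u, v])   # design matrix from (4)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)        # [ 3.  2. -1.], the plane is recovered
```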
Example 4 shows that the linear model for multiple regression has the same abstract form as the model for the simple regression in the earlier examples. Linear algebra gives us the power to understand the general principle behind all the linear models. Once $X$ is defined properly, the normal equations for $\beta$ have the same matrix form, no matter how many variables are involved. Thus, for any linear model where $X^TX$ is invertible, the least-squares $\hat{\beta}$ is given by $(X^TX)^{-1}X^Ty$.

Further Reading
Ferguson, J., Introduction to Linear Algebra in Geology (New York: Chapman & Hall, 1994).
Krumbein, W. C., and F. A. Graybill, An Introduction to Statistical Models in Geology (New York: McGraw-Hill, 1965).
Legendre, P., and L. Legendre, Numerical Ecology (Amsterdam: Elsevier, 1998).
Unwin, David J., An Introduction to Trend Surface Analysis, Concepts and Techniques in Modern Geography, No. 5 (Norwich, England: GeoBooks, 1975).
PRACTICE PROBLEM
When the monthly sales of a product are subject to seasonal fluctuations, a curve that approximates the sales data might have the form
$$y = \beta_0 + \beta_1 x + \beta_2 \sin(2\pi x/12)$$
where $x$ is the time in months. The term $\beta_0 + \beta_1 x$ gives the basic sales trend, and the sine term reflects the seasonal changes in sales. Give the design matrix and the parameter vector for the linear model that leads to a least-squares fit of the equation above. Assume the data are $(x_1, y_1), \dots, (x_n, y_n)$.
6.6 EXERCISES
In Exercises 1–4, find the equation $y = \beta_0 + \beta_1 x$ of the least-squares line that best fits the given data points.
1. $(0, 1)$, $(1, 1)$, $(2, 2)$, $(3, 2)$
2. $(1, 0)$, $(2, 1)$, $(4, 2)$, $(5, 3)$
3. $(-1, 0)$, $(0, 1)$, $(1, 2)$, $(2, 4)$
4. $(2, 3)$, $(3, 2)$, $(5, 1)$, $(6, 0)$
5. Let $X$ be the design matrix used to find the least-squares line to fit data $(x_1, y_1), \dots, (x_n, y_n)$. Use a theorem in Section 6.5 to show that the normal equations have a unique solution if and only if the data include at least two data points with different $x$-coordinates.
6. Let $X$ be the design matrix in Example 2 corresponding to a least-squares fit of a parabola to data $(x_1, y_1), \dots, (x_n, y_n)$. Suppose $x_1$, $x_2$, and $x_3$ are distinct. Explain why there is only one parabola that fits the data best, in a least-squares sense. (See Exercise 5.)
7. A certain experiment produces the data $(1, 1.8)$, $(2, 2.7)$, $(3, 3.4)$, $(4, 3.8)$, $(5, 3.9)$. Describe the model that produces a least-squares fit of these points by a function of the form
$$y = \beta_1 x + \beta_2 x^2$$
Such a function might arise, for example, as the revenue from the sale of $x$ units of a product, when the amount offered for sale affects the price to be set for the product.
a. Give the design matrix, the observation vector, and the unknown parameter vector.
b. [M] Find the associated least-squares curve for the data.
8. A simple curve that often makes a good model for the variable costs of a company, as a function of the sales level $x$, has the form $y = \beta_1 x + \beta_2 x^2 + \beta_3 x^3$. There is no constant term because fixed costs are not included.
a. Give the design matrix and the parameter vector for the linear model that leads to a least-squares fit of the equation above, with data $(x_1, y_1), \dots, (x_n, y_n)$.
b. [M] Find the least-squares curve of the form above to fit the data $(4, 1.58)$, $(6, 2.08)$, $(8, 2.5)$, $(10, 2.8)$, $(12, 3.1)$, $(14, 3.4)$, $(16, 3.8)$, and $(18, 4.32)$, with values in thousands. If possible, produce a graph that shows the data points and the graph of the cubic approximation.
9. A certain experiment produces the data $(1, 7.9)$, $(2, 5.4)$, and $(3, -.9)$. Describe the model that produces a least-squares fit of these points by a function of the form
$$y = A\cos x + B\sin x$$
10. Suppose radioactive substances A and B have decay constants of .02 and .07, respectively. If a mixture of these two substances at time $t = 0$ contains $M_A$ grams of A and $M_B$ grams of B, then a model for the total amount $y$ of the mixture present at time $t$ is
$$y = M_A e^{-.02t} + M_B e^{-.07t} \tag{6}$$
Suppose the initial amounts $M_A$ and $M_B$ are unknown, but a scientist is able to measure the total amounts present at several times and records the following points $(t_i, y_i)$: $(10, 21.34)$, $(11, 20.68)$, $(12, 20.05)$, $(14, 18.87)$, and $(15, 18.30)$.
a. Describe a linear model that can be used to estimate $M_A$ and $M_B$.
b. [M] Find the least-squares curve based on (6).
[Photo caption: Halley's Comet last appeared in 1986 and will reappear in 2061.]
11. [M] According to Kepler's first law, a comet should have an elliptic, parabolic, or hyperbolic orbit (with gravitational attractions from the planets ignored). In suitable polar coordinates, the position $(r, \vartheta)$ of a comet satisfies an equation of the form
$$r = \beta + e(r \cos\vartheta)$$
where $\beta$ is a constant and $e$ is the eccentricity of the orbit, with $0 \le e < 1$ for an ellipse, $e = 1$ for a parabola, and $e > 1$ for a hyperbola. Suppose observations of a newly discovered comet provide the data below. Determine the type of orbit, and predict where the comet will be when $\vartheta = 4.6$ (radians).³

$\vartheta$:  .88   1.10   1.42   1.77   2.14
$r$:  3.00   2.30   1.65   1.25   1.01

12. [M] A healthy child's systolic blood pressure $p$ (in millimeters of mercury) and weight $w$ (in pounds) are approximately related by the equation
$$\beta_0 + \beta_1 \ln w = p$$
Use the following experimental data to estimate the systolic blood pressure of a healthy child weighing 100 pounds.

$w$:  44   61   81   113   131
$\ln w$:  3.78   4.11   4.39   4.73   4.88
$p$:  91   98   103   110   112

³The basic idea of least-squares fitting of data is due to K. F. Gauss (and, independently, to A. Legendre), whose initial rise to fame occurred in 1801 when he used the method to determine the path of the asteroid Ceres. Forty days after the asteroid was discovered, it disappeared behind the sun. Gauss predicted it would appear ten months later and gave its location. The accuracy of the prediction astonished the European scientific community.
13. [M] To measure the takeoff performance of an airplane, the horizontal position of the plane was measured every second, from $t = 0$ to $t = 12$. The positions (in feet) were: 0, 8.8, 29.9, 62.0, 104.7, 159.1, 222.0, 294.5, 380.4, 471.1, 571.7, 686.8, and 809.2.
a. Find the least-squares cubic curve $y = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3$ for these data.
b. Use the result of part (a) to estimate the velocity of the plane when $t = 4.5$ seconds.
14. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$. Show that the least-squares line for the data $(x_1, y_1), \dots, (x_n, y_n)$ must pass through $(\bar{x}, \bar{y})$. That is, show that $\bar{x}$ and $\bar{y}$ satisfy the linear equation $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1\bar{x}$. [Hint: Derive this equation from the vector equation $y = X\hat{\beta} + \epsilon$. Denote the first column of $X$ by $\mathbf{1}$. Use the fact that the residual vector $\epsilon$ is orthogonal to the column space of $X$ and hence is orthogonal to $\mathbf{1}$.]
Given data for a least-squares problem, $(x_1, y_1), \dots, (x_n, y_n)$, the following abbreviations are helpful:
$$\textstyle\sum x = \sum_{i=1}^n x_i, \quad \sum x^2 = \sum_{i=1}^n x_i^2, \quad \sum y = \sum_{i=1}^n y_i, \quad \sum xy = \sum_{i=1}^n x_i y_i$$
The normal equations for a least-squares line $y = \hat{\beta}_0 + \hat{\beta}_1 x$ may be written in the form
$$n\hat{\beta}_0 + \hat{\beta}_1 \textstyle\sum x = \sum y$$
$$\hat{\beta}_0 \textstyle\sum x + \hat{\beta}_1 \sum x^2 = \sum xy \tag{7}$$
15. Derive the normal equations (7) from the matrix form given in this section.
16. Use a matrix inverse to solve the system of equations in (7) and thereby obtain formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ that appear in many statistics texts.
17. a. Rewrite the data in Example 1 with new $x$-coordinates in mean-deviation form. Let $X$ be the associated design matrix. Why are the columns of $X$ orthogonal?
b. Write the normal equations for the data in part (a), and solve them to find the least-squares line, $y = \beta_0 + \beta_1 x^*$, where $x^* = x - 5.5$.
18. Suppose the $x$-coordinates of the data $(x_1, y_1), \dots, (x_n, y_n)$ are in mean-deviation form, so that $\sum x_i = 0$. Show that if $X$ is the design matrix for the least-squares line in this case, then $X^TX$ is a diagonal matrix.
Exercises 19 and 20 involve a design matrix $X$ with two or more columns and a least-squares solution $\hat{\beta}$ of $y = X\beta$. Consider the following numbers.
(i) $\|X\hat{\beta}\|^2$, the sum of the squares of the "regression term." Denote this number by SS(R).
(ii) $\|y - X\hat{\beta}\|^2$, the sum of the squares for the error term. Denote this number by SS(E).
(iii) $\|y\|^2$, the "total" sum of the squares of the $y$-values. Denote this number by SS(T).
Every statistics text that discusses regression and the linear model $y = X\beta + \epsilon$ introduces these numbers, though terminology and notation vary somewhat. To simplify matters, assume that the mean of the $y$-values is zero. In this case, SS(T) is proportional to what is called the variance of the set of $y$-values.
19. Justify the equation SS(T) = SS(R) + SS(E). [Hint: Use a theorem, and explain why the hypotheses of the theorem are satisfied.] This equation is extremely important in statistics, both in regression theory and in the analysis of variance.
20. Show that $\|X\hat{\beta}\|^2 = \hat{\beta}^TX^Ty$. [Hint: Rewrite the left side and use the fact that $\hat{\beta}$ satisfies the normal equations.] This formula for SS(R) is used in statistics. From this and from Exercise 19, obtain the standard formula for SS(E):
$$\text{SS(E)} = y^Ty - \hat{\beta}^TX^Ty$$
SOLUTION TO PRACTICE PROBLEM
Construct $X$ and $\beta$ so that the $k$th row of $X\beta$ is the predicted $y$-value that corresponds to the data point $(x_k, y_k)$, namely,
$$\beta_0 + \beta_1 x_k + \beta_2 \sin(2\pi x_k/12)$$
It should be clear that
$$X = \begin{bmatrix}1 & x_1 & \sin(2\pi x_1/12)\\\vdots & \vdots & \vdots\\1 & x_n & \sin(2\pi x_n/12)\end{bmatrix}, \qquad \beta = \begin{bmatrix}\beta_0\\\beta_1\\\beta_2\end{bmatrix}$$
[Figure: sales trend with seasonal fluctuations.]
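The seasonal model behaves the same way numerically. A sketch with hypothetical monthly data generated from known parameters (illustration only, with noise-free data so the parameters are recovered exactly):

```python
import numpy as np

x = np.arange(1.0, 25.0)                      # 24 hypothetical months
# Hypothetical data from beta0=10, beta1=0.5, beta2=3 (no noise, for clarity)
y = 10.0 + 0.5 * x + 3.0 * np.sin(2 * np.pi * x / 12)

# Design matrix with columns 1, x, sin(2*pi*x/12), as in the solution above
X = np.column_stack([np.ones_like(x), x, np.sin(2 * np.pi * x / 12)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)        # [10.   0.5  3. ]
```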
6.7 INNER PRODUCT SPACES
Notions of length, distance, and orthogonality are often important in applicationsinvolvingavectorspace. For R
n, theseconceptswerebasedonthepropertiesoftheinnerproductlistedinTheorem 1ofSection 6.1. Forotherspaces, weneedanaloguesoftheinnerproductwiththesameproperties. TheconclusionsofTheorem 1nowbecomeaxioms inthefollowingdefinition.
DEFINITION  An inner product on a vector space V is a function that, to each pair of vectors u and v in V, associates a real number ⟨u, v⟩ and satisfies the following axioms, for all u, v, w in V and all scalars c:
1. ⟨u, v⟩ = ⟨v, u⟩
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
3. ⟨cu, v⟩ = c⟨u, v⟩
4. ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0

A vector space with an inner product is called an inner product space.
The vector space ℝⁿ with the standard inner product is an inner product space, and nearly everything discussed in this chapter for ℝⁿ carries over to inner product spaces. The examples in this section and the next lay the foundation for a variety of applications treated in courses in engineering, physics, mathematics, and statistics.
EXAMPLE 1  Fix any two positive numbers, say 4 and 5, and for vectors u = (u₁, u₂) and v = (v₁, v₂) in ℝ², set

⟨u, v⟩ = 4u₁v₁ + 5u₂v₂    (1)

Show that equation (1) defines an inner product.
SOLUTION  Certainly Axiom 1 is satisfied, because ⟨u, v⟩ = 4u₁v₁ + 5u₂v₂ = 4v₁u₁ + 5v₂u₂ = ⟨v, u⟩. If w = (w₁, w₂), then

⟨u + v, w⟩ = 4(u₁ + v₁)w₁ + 5(u₂ + v₂)w₂
           = 4u₁w₁ + 5u₂w₂ + 4v₁w₁ + 5v₂w₂
           = ⟨u, w⟩ + ⟨v, w⟩

This verifies Axiom 2. For Axiom 3, compute

⟨cu, v⟩ = 4(cu₁)v₁ + 5(cu₂)v₂ = c(4u₁v₁ + 5u₂v₂) = c⟨u, v⟩

For Axiom 4, note that ⟨u, u⟩ = 4u₁² + 5u₂² ≥ 0, and 4u₁² + 5u₂² = 0 only if u₁ = u₂ = 0, that is, if u = 0. Also, ⟨0, 0⟩ = 0. So (1) defines an inner product on ℝ².
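As a quick numerical sketch (our own code, not from the text), the weighted inner product of Example 1 can be written as a small Python function and the axioms spot-checked on sample vectors; the helper names `ip`, `add`, and `scale` are our own.

```python
# Weighted inner product of Example 1: <u, v> = 4*u1*v1 + 5*u2*v2,
# with the axioms spot-checked on a few sample vectors in R^2.

def ip(u, v):
    """Weighted inner product on R^2 with weights 4 and 5."""
    return 4 * u[0] * v[0] + 5 * u[1] * v[1]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(c, u):
    return (c * u[0], c * u[1])

u, v, w, c = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0), 2.5

assert ip(u, v) == ip(v, u)                        # Axiom 1: symmetry
assert ip(add(u, v), w) == ip(u, w) + ip(v, w)     # Axiom 2: additivity
assert ip(scale(c, u), v) == c * ip(u, v)          # Axiom 3: homogeneity
assert ip(u, u) > 0 and ip((0, 0), (0, 0)) == 0    # Axiom 4 (sampled)
```

Of course, a finite spot check is not a proof; the algebraic verification in the solution above is what establishes the axioms for all vectors.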
Inner products similar to (1) can be defined on ℝⁿ. They arise naturally in connection with "weighted least-squares" problems, in which weights are assigned to the various entries in the sum for the inner product in such a way that more importance is given to the more reliable measurements.

From now on, when an inner product space involves polynomials or other functions, we will write the functions in the familiar way, rather than use the boldface type for vectors. Nevertheless, it is important to remember that each function is a vector when it is treated as an element of a vector space.
EXAMPLE 2  Let t₀, …, tₙ be distinct real numbers. For p and q in ℙₙ, define

⟨p, q⟩ = p(t₀)q(t₀) + p(t₁)q(t₁) + ⋯ + p(tₙ)q(tₙ)    (2)

Inner product Axioms 1–3 are readily checked. For Axiom 4, note that

⟨p, p⟩ = [p(t₀)]² + [p(t₁)]² + ⋯ + [p(tₙ)]² ≥ 0

Also, ⟨0, 0⟩ = 0. (The boldface zero here denotes the zero polynomial, the zero vector in ℙₙ.) If ⟨p, p⟩ = 0, then p must vanish at n + 1 points: t₀, …, tₙ. This is possible only if p is the zero polynomial, because the degree of p is less than n + 1. Thus (2) defines an inner product on ℙₙ.
EXAMPLE 3  Let V be ℙ₂, with the inner product from Example 2, where t₀ = 0, t₁ = 1/2, and t₂ = 1. Let p(t) = 12t² and q(t) = 2t − 1. Compute ⟨p, q⟩ and ⟨q, q⟩.

SOLUTION

⟨p, q⟩ = p(0)q(0) + p(1/2)q(1/2) + p(1)q(1)
       = (0)(−1) + (3)(0) + (12)(1) = 12

⟨q, q⟩ = [q(0)]² + [q(1/2)]² + [q(1)]²
       = (−1)² + (0)² + (1)² = 2
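The evaluation inner product of Example 2 is easy to implement directly; the following sketch (our code, with our own helper name `eval_ip`) reproduces the two computations of Example 3.

```python
# Evaluation inner product <p, q> = p(t0)q(t0) + ... + p(tn)q(tn),
# applied to Example 3: points 0, 1/2, 1, with p(t) = 12t^2, q(t) = 2t - 1.

def eval_ip(p, q, points):
    """Inner product on P_n given by evaluation at the listed points."""
    return sum(p(t) * q(t) for t in points)

p = lambda t: 12 * t**2
q = lambda t: 2 * t - 1
pts = [0.0, 0.5, 1.0]

print(eval_ip(p, q, pts))   # prints 12.0, as computed in Example 3
print(eval_ip(q, q, pts))   # prints 2.0
```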
Lengths, Distances, and Orthogonality
Let V be an inner product space, with the inner product denoted by ⟨u, v⟩. Just as in ℝⁿ, we define the length, or norm, of a vector v to be the scalar

‖v‖ = √⟨v, v⟩

Equivalently, ‖v‖² = ⟨v, v⟩. (This definition makes sense because ⟨v, v⟩ ≥ 0, but the definition does not say that ⟨v, v⟩ is a "sum of squares," because v need not be an element of ℝⁿ.)
A unit vector is one whose length is 1. The distance between u and v is ‖u − v‖. Vectors u and v are orthogonal if ⟨u, v⟩ = 0.
EXAMPLE 4  Let ℙ₂ have the inner product (2) of Example 3. Compute the lengths of the vectors p(t) = 12t² and q(t) = 2t − 1.

SOLUTION

‖p‖² = ⟨p, p⟩ = [p(0)]² + [p(1/2)]² + [p(1)]²
     = 0 + [3]² + [12]² = 153

‖p‖ = √153

From Example 3, ⟨q, q⟩ = 2. Hence ‖q‖ = √2.
The Gram–Schmidt Process
The existence of orthogonal bases for finite-dimensional subspaces of an inner product space can be established by the Gram–Schmidt process, just as in ℝⁿ. Certain orthogonal bases that arise frequently in applications can be constructed by this process.
The orthogonal projection of a vector onto a subspace W with an orthogonal basis can be constructed as usual. The projection does not depend on the choice of orthogonal basis, and it has the properties described in the Orthogonal Decomposition Theorem and the Best Approximation Theorem.
EXAMPLE 5  Let V be ℙ₄ with the inner product in Example 2, involving evaluation of polynomials at −2, −1, 0, 1, and 2, and view ℙ₂ as a subspace of V. Produce an orthogonal basis for ℙ₂ by applying the Gram–Schmidt process to the polynomials 1, t, and t².

SOLUTION  The inner product depends only on the values of a polynomial at −2, …, 2, so we list the values of each polynomial as a vector in ℝ⁵, underneath the name of the polynomial:¹

Polynomial:          1                 t                 t²
Vector of values:  (1, 1, 1, 1, 1)   (−2, −1, 0, 1, 2)   (4, 1, 0, 1, 4)

The inner product of two polynomials in V equals the (standard) inner product of their corresponding vectors in ℝ⁵. Observe that t is orthogonal to the constant function 1. So take p₀(t) = 1 and p₁(t) = t. For p₂, use the vectors in ℝ⁵ to compute the projection of t² onto Span{p₀, p₁}:

⟨t², p₀⟩ = ⟨t², 1⟩ = 4 + 1 + 0 + 1 + 4 = 10
⟨p₀, p₀⟩ = 5
⟨t², p₁⟩ = ⟨t², t⟩ = −8 + (−1) + 0 + 1 + 8 = 0

The orthogonal projection of t² onto Span{1, t} is (10/5)p₀ + 0p₁. Thus

p₂(t) = t² − 2p₀(t) = t² − 2
An orthogonal basis for the subspace ℙ₂ of V is:

Polynomial:          p₀                p₁                p₂
Vector of values:  (1, 1, 1, 1, 1)   (−2, −1, 0, 1, 2)   (2, −1, −2, −1, 2)    (3)
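The single Gram–Schmidt step of Example 5 can be checked on the value vectors themselves; the sketch below (our code) subtracts the projection of the t² vector onto the first two vectors and recovers the values of p₂.

```python
# Gram-Schmidt on the value vectors of Example 5 (evaluation at -2..2):
# the inner product of two polynomials equals the dot product of their
# vectors of values.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

p0 = [1, 1, 1, 1, 1]          # values of 1
p1 = [-2, -1, 0, 1, 2]        # values of t
t2 = [4, 1, 0, 1, 4]          # values of t^2

# Subtract from t^2 its projection onto Span{p0, p1}.
c0 = dot(t2, p0) / dot(p0, p0)     # 10/5 = 2
c1 = dot(t2, p1) / dot(p1, p1)     # 0/10 = 0
p2 = [t2[i] - c0 * p0[i] - c1 * p1[i] for i in range(5)]

print(p2)                          # [2.0, -1.0, -2.0, -1.0, 2.0]
assert dot(p2, p0) == 0 and dot(p2, p1) == 0
```

These are exactly the values of p₂(t) = t² − 2 listed in (3).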
Best Approximation in Inner Product Spaces
A common problem in applied mathematics involves a vector space V whose elements are functions. The problem is to approximate a function f in V by a function g from a specified subspace W of V. The "closeness" of the approximation of f depends on the way ‖f − g‖ is defined. We will consider only the case in which the distance between f and g is determined by an inner product. In this case, the best approximation to f by functions in W is the orthogonal projection of f onto the subspace W.

EXAMPLE 6  Let V be ℙ₄ with the inner product in Example 5, and let p₀, p₁, and p₂ be the orthogonal basis found in Example 5 for the subspace ℙ₂. Find the best approximation to p(t) = 5 − (1/2)t⁴ by polynomials in ℙ₂.
¹Each polynomial in ℙ₄ is uniquely determined by its values at the five numbers −2, …, 2. In fact, the correspondence between p and its vector of values is an isomorphism, that is, a one-to-one mapping onto ℝ⁵ that preserves linear combinations.
SOLUTION  The values of p₀, p₁, and p₂ at the numbers −2, −1, 0, 1, and 2 are listed in the ℝ⁵ vectors in (3) above. The corresponding values for p are −3, 9/2, 5, 9/2, and −3. Compute

⟨p, p₀⟩ = 8,  ⟨p, p₁⟩ = 0,  ⟨p, p₂⟩ = −31
⟨p₀, p₀⟩ = 5,  ⟨p₂, p₂⟩ = 14

Then the best approximation in V to p by polynomials in ℙ₂ is

p̂ = proj_{ℙ₂} p = (⟨p, p₀⟩/⟨p₀, p₀⟩)p₀ + (⟨p, p₁⟩/⟨p₁, p₁⟩)p₁ + (⟨p, p₂⟩/⟨p₂, p₂⟩)p₂
  = (8/5)p₀ − (31/14)p₂ = 8/5 − (31/14)(t² − 2)

This polynomial is the closest to p of all polynomials in ℙ₂, when the distance between polynomials is measured only at −2, −1, 0, 1, and 2. See Fig. 1.
[Figure 1: Graphs of p(t) and p̂(t).]
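The projection coefficients of Example 6 can be computed exactly with rational arithmetic; the sketch below (our code) works on the value vectors from (3) and recovers 8/5, 0, and −31/14.

```python
# Best approximation of Example 6 on value vectors: project the values of
# p(t) = 5 - t^4/2 at -2, -1, 0, 1, 2 onto Span{p0, p1, p2} using
# c_i = <p, p_i> / <p_i, p_i>, with exact fractions.

from fractions import Fraction as F

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

pts = [-2, -1, 0, 1, 2]
p0 = [F(1)] * 5
p1 = [F(t) for t in pts]
p2 = [F(2), F(-1), F(-2), F(-1), F(2)]
p  = [F(5) - F(t**4, 2) for t in pts]      # values -3, 9/2, 5, 9/2, -3

coeffs = [dot(p, b) / dot(b, b) for b in (p0, p1, p2)]
print(coeffs)   # [Fraction(8, 5), Fraction(0, 1), Fraction(-31, 14)]
```

These match the coefficients in p̂ = (8/5)p₀ − (31/14)p₂ found above.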
The polynomials p₀, p₁, and p₂ in Examples 5 and 6 belong to a class of polynomials that are referred to in statistics as orthogonal polynomials.² The orthogonality refers to the type of inner product described in Example 2.
Two Inequalities
Given a vector v in an inner product space V and given a finite-dimensional subspace W, we may apply the Pythagorean Theorem to the orthogonal decomposition of v with respect to W and obtain

‖v‖² = ‖proj_W v‖² + ‖v − proj_W v‖²

See Fig. 2. In particular, this shows that the norm of the projection of v onto W does not exceed the norm of v itself. This simple observation leads to the following important inequality.

[Figure 2: The hypotenuse is the longest side.]
THEOREM 16  The Cauchy–Schwarz Inequality
For all u, v in V,

|⟨u, v⟩| ≤ ‖u‖ ‖v‖    (4)
²See Statistics and Experimental Design in Engineering and the Physical Sciences, 2nd ed., by Norman L. Johnson and Fred C. Leone (New York: John Wiley & Sons, 1977). Tables there list "Orthogonal Polynomials," which are simply the values of the polynomials at numbers such as −2, −1, 0, 1, and 2.
PROOF  If u = 0, then both sides of (4) are zero, and hence the inequality is true in this case. (See Practice Problem 1.) If u ≠ 0, let W be the subspace spanned by u. Recall that ‖cu‖ = |c| ‖u‖ for any scalar c. Thus

‖proj_W v‖ = ‖(⟨v, u⟩/⟨u, u⟩)u‖ = (|⟨v, u⟩|/|⟨u, u⟩|)‖u‖ = (|⟨v, u⟩|/‖u‖²)‖u‖ = |⟨u, v⟩|/‖u‖

Since ‖proj_W v‖ ≤ ‖v‖, we have |⟨u, v⟩|/‖u‖ ≤ ‖v‖, which gives (4).
The Cauchy–Schwarz inequality is useful in many branches of mathematics. A few simple applications are presented in the exercises. Our main need for this inequality here is to prove another fundamental inequality involving norms of vectors. See Fig. 3.
THEOREM 17  The Triangle Inequality
For all u, v in V,

‖u + v‖ ≤ ‖u‖ + ‖v‖

PROOF

‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩
         ≤ ‖u‖² + 2|⟨u, v⟩| + ‖v‖²
         ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖²        (Cauchy–Schwarz)
         = (‖u‖ + ‖v‖)²

The triangle inequality follows immediately by taking square roots of both sides.

[Figure 3: The lengths of the sides of a triangle.]
An Inner Product for C[a, b] (Calculus required)
Probably the most widely used inner product space for applications is the vector space C[a, b] of all continuous functions on an interval a ≤ t ≤ b, with an inner product that we will describe.
We begin by considering a polynomial p and any integer n larger than or equal to the degree of p. Then p is in ℙₙ, and we may compute a "length" for p using the inner product of Example 2 involving evaluation at n + 1 points in [a, b]. However, this length of p captures the behavior at only those n + 1 points. Since p is in ℙₙ for all large n, we could use a much larger n, with many more points for the "evaluation" inner product. See Fig. 4.
[Figure 4: Using different numbers of evaluation points in [a, b] to compute ‖p‖².]
Let us partition [a, b] into n + 1 subintervals of length Δt = (b − a)/(n + 1), and let t₀, …, tₙ be arbitrary points in these subintervals.
If n is large, the inner product on ℙₙ determined by t₀, …, tₙ will tend to give a large value to ⟨p, p⟩, so we scale it down and divide by n + 1. Observe that 1/(n + 1) = Δt/(b − a), and define

⟨p, q⟩ = (1/(n + 1)) Σⱼ₌₀ⁿ p(tⱼ)q(tⱼ) = (1/(b − a)) [ Σⱼ₌₀ⁿ p(tⱼ)q(tⱼ) Δt ]

Now, let n increase without bound. Since polynomials p and q are continuous functions, the expression in brackets is a Riemann sum that approaches a definite integral, and we are led to consider the average value of p(t)q(t) on the interval [a, b]:

(1/(b − a)) ∫ₐᵇ p(t)q(t) dt

This quantity is defined for polynomials of any degree (in fact, for all continuous functions), and it has all the properties of an inner product, as the next example shows. The scale factor 1/(b − a) is inessential and is often omitted for simplicity.
EXAMPLE 7  For f, g in C[a, b], set

⟨f, g⟩ = ∫ₐᵇ f(t)g(t) dt    (5)

Show that (5) defines an inner product on C[a, b].

SOLUTION  Inner product Axioms 1–3 follow from elementary properties of definite integrals. For Axiom 4, observe that

⟨f, f⟩ = ∫ₐᵇ [f(t)]² dt ≥ 0

The function [f(t)]² is continuous and nonnegative on [a, b]. If the definite integral of [f(t)]² is zero, then [f(t)]² must be identically zero on [a, b], by a theorem in advanced calculus, in which case f is the zero function. Thus ⟨f, f⟩ = 0 implies that f is the zero function on [a, b]. So (5) defines an inner product on C[a, b].
EXAMPLE 8  Let V be the space C[0, 1] with the inner product of Example 7, and let W be the subspace spanned by the polynomials p₁(t) = 1, p₂(t) = 2t − 1, and p₃(t) = 12t². Use the Gram–Schmidt process to find an orthogonal basis for W.

SOLUTION  Let q₁ = p₁, and compute

⟨p₂, q₁⟩ = ∫₀¹ (2t − 1)(1) dt = (t² − t) |₀¹ = 0

So p₂ is already orthogonal to q₁, and we can take q₂ = p₂. For the projection of p₃ onto W₂ = Span{q₁, q₂}, compute

⟨p₃, q₁⟩ = ∫₀¹ 12t² · 1 dt = 4t³ |₀¹ = 4
⟨q₁, q₁⟩ = ∫₀¹ 1 · 1 dt = t |₀¹ = 1
⟨p₃, q₂⟩ = ∫₀¹ 12t²(2t − 1) dt = ∫₀¹ (24t³ − 12t²) dt = 2
⟨q₂, q₂⟩ = ∫₀¹ (2t − 1)² dt = (1/6)(2t − 1)³ |₀¹ = 1/3

Then

proj_{W₂} p₃ = (⟨p₃, q₁⟩/⟨q₁, q₁⟩)q₁ + (⟨p₃, q₂⟩/⟨q₂, q₂⟩)q₂ = (4/1)q₁ + (2/(1/3))q₂ = 4q₁ + 6q₂

and

q₃ = p₃ − proj_{W₂} p₃ = p₃ − 4q₁ − 6q₂

As a function, q₃(t) = 12t² − 4 − 6(2t − 1) = 12t² − 12t + 2. The orthogonal basis for the subspace W is {q₁, q₂, q₃}.
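For polynomials, the integral inner product of Example 7 can be evaluated exactly from coefficient lists, since ∫₀¹ tᵐ dt = 1/(m + 1). The sketch below (our code, not from the text) redoes Example 8 with exact rational arithmetic.

```python
# Example 8 with exact arithmetic.  A polynomial is a coefficient list
# [c0, c1, ...] meaning c0 + c1*t + ..., and <p, q> is the exact integral
# of p(t)q(t) over [0, 1].

from fractions import Fraction as F

def ip(p, q):
    """<p, q> = integral_0^1 p(t) q(t) dt, computed exactly."""
    total = F(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            total += F(a) * F(b) / (i + j + 1)   # integral of t^(i+j)
    return total

p1 = [1]          # 1
p2 = [-1, 2]      # 2t - 1
p3 = [0, 0, 12]   # 12t^2

q1, q2 = p1, p2                       # p2 is already orthogonal to p1
proj = (ip(p3, q1) / ip(q1, q1), ip(p3, q2) / ip(q2, q2))
print(proj)       # (Fraction(4, 1), Fraction(6, 1)): proj = 4*q1 + 6*q2

q3 = [2, -12, 12]                     # q3(t) = 12t^2 - 12t + 2, as above
assert ip(q3, q1) == 0 and ip(q3, q2) == 0
```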
PRACTICE PROBLEMS
Use the inner product axioms to verify the following statements.
1. ⟨v, 0⟩ = ⟨0, v⟩ = 0.
2. ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
6.7 EXERCISES
1. Let ℝ² have the inner product of Example 1, and let x = (1, 1) and y = (5, −1).
   a. Find ‖x‖, ‖y‖, and |⟨x, y⟩|².
   b. Describe all vectors (z₁, z₂) that are orthogonal to y.
2. Let ℝ² have the inner product of Example 1. Show that the Cauchy–Schwarz inequality holds for x = (3, −2) and y = (−2, 1). [Suggestion: Study |⟨x, y⟩|².]
Exercises 3–8 refer to ℙ₂ with the inner product given by evaluation at −1, 0, and 1. (See Example 2.)
3. Compute ⟨p, q⟩, where p(t) = 4 + t, q(t) = 5 − 4t².
4. Compute ⟨p, q⟩, where p(t) = 3t − t², q(t) = 3 + 2t².
5. Compute ‖p‖ and ‖q‖, for p and q in Exercise 3.
6. Compute ‖p‖ and ‖q‖, for p and q in Exercise 4.
7. Compute the orthogonal projection of q onto the subspace spanned by p, for p and q in Exercise 3.
8. Compute the orthogonal projection of q onto the subspace spanned by p, for p and q in Exercise 4.
9. Let ℙ₃ have the inner product given by evaluation at −3, −1, 1, and 3. Let p₀(t) = 1, p₁(t) = t, and p₂(t) = t².
   a. Compute the orthogonal projection of p₂ onto the subspace spanned by p₀ and p₁.
   b. Find a polynomial q that is orthogonal to p₀ and p₁, such that {p₀, p₁, q} is an orthogonal basis for Span{p₀, p₁, p₂}. Scale the polynomial q so that its vector of values at (−3, −1, 1, 3) is (1, −1, −1, 1).
10. Let ℙ₃ have the inner product as in Exercise 9, with p₀, p₁, and q the polynomials described there. Find the best approximation to p(t) = t³ by polynomials in Span{p₀, p₁, q}.
11. Let p₀, p₁, and p₂ be the orthogonal polynomials described in Example 5, where the inner product on ℙ₄ is given by evaluation at −2, −1, 0, 1, and 2. Find the orthogonal projection of t³ onto Span{p₀, p₁, p₂}.
12. Find a polynomial p₃ such that {p₀, p₁, p₂, p₃} (see Exercise 11) is an orthogonal basis for the subspace ℙ₃ of ℙ₄. Scale the polynomial p₃ so that its vector of values is (−1, 2, 0, −2, 1).
13. Let A be any invertible n × n matrix. Show that for u, v in ℝⁿ, the formula ⟨u, v⟩ = (Au)·(Av) = (Au)ᵀ(Av) defines an inner product on ℝⁿ.
14. Let T be a one-to-one linear transformation from a vector space V into ℝⁿ. Show that for u, v in V, the formula ⟨u, v⟩ = T(u)·T(v) defines an inner product on V.
Use the inner product axioms and other results of this section to verify the statements in Exercises 15–18.
15. ⟨u, cv⟩ = c⟨u, v⟩ for all scalars c.
16. If {u, v} is an orthonormal set in V, then ‖u − v‖ = √2.
17. ⟨u, v⟩ = (1/4)‖u + v‖² − (1/4)‖u − v‖².
18. ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
19. Given a ≥ 0 and b ≥ 0, let u = (√a, √b) and v = (√b, √a). Use the Cauchy–Schwarz inequality to compare the geometric mean √(ab) with the arithmetic mean (a + b)/2.
20. Let u = (a, b) and v = (1, 1). Use the Cauchy–Schwarz inequality to show that
    ((a + b)/2)² ≤ (a² + b²)/2
Exercises 21–24 refer to V = C[0, 1], with the inner product given by an integral, as in Example 7.
21. Compute ⟨f, g⟩, where f(t) = 1 − 3t² and g(t) = t − t³.
22. Compute ⟨f, g⟩, where f(t) = 5t − 3 and g(t) = t³ − t².
23. Compute ‖f‖ for f in Exercise 21.
24. Compute ‖g‖ for g in Exercise 22.
25. Let V be the space C[−1, 1] with the inner product of Example 7. Find an orthogonal basis for the subspace spanned by the polynomials 1, t, and t². The polynomials in this basis are called Legendre polynomials.
26. Let V be the space C[−2, 2] with the inner product of Example 7. Find an orthogonal basis for the subspace spanned by the polynomials 1, t, and t².
27. [M] Let ℙ₄ have the inner product as in Example 5, and let p₀, p₁, p₂ be the orthogonal polynomials from that example. Using your matrix program, apply the Gram–Schmidt process to the set {p₀, p₁, p₂, t³, t⁴} to create an orthogonal basis for ℙ₄.
28. [M] Let V be the space C[0, 2π] with the inner product of Example 7. Use the Gram–Schmidt process to create an orthogonal basis for the subspace spanned by {1, cos t, cos²t, cos³t}. Use a matrix program or computational program to compute the appropriate definite integrals.
SOLUTIONS TO PRACTICE PROBLEMS
1. By Axiom 1, ⟨v, 0⟩ = ⟨0, v⟩. Then ⟨0, v⟩ = ⟨0v, v⟩ = 0⟨v, v⟩, by Axiom 3, so ⟨0, v⟩ = 0.
2. By Axioms 1, 2, and then 1 again, ⟨u, v + w⟩ = ⟨v + w, u⟩ = ⟨v, u⟩ + ⟨w, u⟩ = ⟨u, v⟩ + ⟨u, w⟩.
6.8 APPLICATIONS OF INNER PRODUCT SPACES
The examples in this section suggest how the inner product spaces defined in Section 6.7 arise in practical problems. The first example is connected with the massive least-squares problem of updating the North American Datum, described in the chapter's introductory example.
Weighted Least-Squares
Let y be a vector of n observations, y₁, …, yₙ, and suppose we wish to approximate y by a vector ŷ that belongs to some specified subspace of ℝⁿ. (In Section 6.5, ŷ was written as Ax so that ŷ was in the column space of A.) Denote the entries in ŷ by ŷ₁, …, ŷₙ. Then the sum of the squares for error, or SS(E), in approximating y by ŷ is

SS(E) = (y₁ − ŷ₁)² + ⋯ + (yₙ − ŷₙ)²    (1)

This is simply ‖y − ŷ‖², using the standard length in ℝⁿ.
Now suppose the measurements that produced the entries in y are not equally reliable. (This was the case for the North American Datum, since measurements were made over a period of 140 years.) As another example, the entries in y might be computed from various samples of measurements, with unequal sample sizes. Then it becomes appropriate to weight the squared errors in (1) in such a way that more importance is assigned to the more reliable measurements.¹ If the weights are denoted by w₁², …, wₙ², then the weighted sum of the squares for error is

Weighted SS(E) = w₁²(y₁ − ŷ₁)² + ⋯ + wₙ²(yₙ − ŷₙ)²    (2)

This is the square of the length of y − ŷ, where the length is derived from an inner product analogous to that in Example 1 in Section 6.7, namely,

⟨x, y⟩ = w₁²x₁y₁ + ⋯ + wₙ²xₙyₙ
It is sometimes convenient to transform a weighted least-squares problem into an equivalent ordinary least-squares problem. Let W be the diagonal matrix with (positive) w₁, …, wₙ on its diagonal, so that

W y = [ w₁  0  ⋯  0  ] [ y₁ ]   [ w₁y₁ ]
      [ 0   w₂     ⋮ ] [ y₂ ] = [ w₂y₂ ]
      [ ⋮       ⋱    ] [ ⋮  ]   [  ⋮   ]
      [ 0   ⋯     wₙ ] [ yₙ ]   [ wₙyₙ ]

with a similar expression for W ŷ. Observe that the jth term in (2) can be written as

wⱼ²(yⱼ − ŷⱼ)² = (wⱼyⱼ − wⱼŷⱼ)²

It follows that the weighted SS(E) in (2) is the square of the ordinary length in ℝⁿ of Wy − Wŷ, which we write as ‖Wy − Wŷ‖².
Now suppose the approximating vector ŷ is to be constructed from the columns of a matrix A. Then we seek an x̂ that makes Ax̂ = ŷ as close to y as possible. However, the measure of closeness is the weighted error,

‖Wy − Wŷ‖² = ‖Wy − WAx̂‖²

Thus x̂ is the (ordinary) least-squares solution of the equation

WAx = Wy

The normal equation for the least-squares solution is

(WA)ᵀWAx = (WA)ᵀWy
EXAMPLE 1  Find the least-squares line y = β₀ + β₁x that best fits the data (−2, 3), (−1, 5), (0, 5), (1, 4), and (2, 3). Suppose the errors in measuring the y-values of the last two data points are greater than for the other points. Weight these data half as much as the rest of the data.
¹Note for readers with a background in statistics: Suppose the errors in measuring the yᵢ are independent random variables with means equal to zero and variances of σ₁², …, σₙ². Then the appropriate weights in (2) are wᵢ² = 1/σᵢ². The larger the variance of the error, the smaller the weight.
SOLUTION  As in Section 6.6, write X for the matrix A and β for the vector x, and obtain

    [ 1 −2 ]                   [ 3 ]
    [ 1 −1 ]                   [ 5 ]
X = [ 1  0 ] ,  β = [ β₀ ] , y = [ 5 ]
    [ 1  1 ]        [ β₁ ]     [ 4 ]
    [ 1  2 ]                   [ 3 ]

For a weighting matrix, choose W with diagonal entries 2, 2, 2, 1, and 1. Left-multiplication by W scales the rows of X and y:

     [ 2 −4 ]         [ 6  ]
     [ 2 −2 ]         [ 10 ]
WX = [ 2  0 ] , Wy =  [ 10 ]
     [ 1  1 ]         [ 4  ]
     [ 1  2 ]         [ 3  ]

For the normal equation, compute

(WX)ᵀWX = [ 14 −9 ]   and   (WX)ᵀWy = [  59 ]
          [ −9 25 ]                   [ −34 ]

and solve

[ 14 −9 ] [ β₀ ] = [  59 ]
[ −9 25 ] [ β₁ ]   [ −34 ]

The solution of the normal equation is (to two significant digits) β₀ = 4.3 and β₁ = .20. The desired line is

y = 4.3 + .20x

In contrast, the ordinary least-squares line for these data is

y = 4.0 − .10x

Both lines are displayed in Fig. 1.
[Figure 1: Weighted (y = 4.3 + .2x) and ordinary (y = 4 − .1x) least-squares lines.]
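The weighted normal equations of Example 1 reduce to a 2 × 2 linear system, which the following sketch (our code, not from the text) assembles entrywise and solves by Cramer's rule.

```python
# Weighted least-squares line of Example 1: solve (WX)^T WX b = (WX)^T Wy
# for b = (b0, b1).  Squared weights w^2 appear because W multiplies both
# X and y.

xs = [-2, -1, 0, 1, 2]
ys = [3, 5, 5, 4, 3]
ws = [2, 2, 2, 1, 1]        # last two points weighted half as much

# Entries of (WX)^T WX and (WX)^T Wy for the line y = b0 + b1*x.
a11 = sum(w * w for w in ws)                            # 14
a12 = sum(w * w * x for w, x in zip(ws, xs))            # -9
a22 = sum(w * w * x * x for w, x in zip(ws, xs))        # 25
r1 = sum(w * w * y for w, y in zip(ws, ys))             # 59
r2 = sum(w * w * x * y for w, x, y in zip(ws, xs, ys))  # -34

det = a11 * a22 - a12 * a12
b0 = (r1 * a22 - a12 * r2) / det
b1 = (a11 * r2 - a12 * r1) / det
print(round(b0, 2), round(b1, 2))   # 4.35 0.2 (the text rounds to 4.3, .20)
```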
Trend Analysis of Data
Let f represent an unknown function whose values are known (perhaps only approximately) at t₀, …, tₙ. If there is a "linear trend" in the data f(t₀), …, f(tₙ), then we might expect to approximate the values of f by a function of the form β₀ + β₁t. If there is a "quadratic trend" to the data, then we would try a function of the form β₀ + β₁t + β₂t². This was discussed in Section 6.6, from a different point of view.
In some statistical problems, it is important to be able to separate the linear trend from the quadratic trend (and possibly cubic or higher-order trends). For instance, suppose engineers are analyzing the performance of a new car, and f(t) represents the distance between the car at time t and some reference point. If the car is traveling at constant velocity, then the graph of f(t) should be a straight line whose slope is the car's velocity. If the gas pedal is suddenly pressed to the floor, the graph of f(t) will change to include a quadratic term and possibly a cubic term (due to the acceleration). To analyze the ability of the car to pass another car, for example, engineers may want to separate the quadratic and cubic components from the linear term.
If the function is approximated by a curve of the form y = β₀ + β₁t + β₂t², the coefficient β₂ may not give the desired information about the quadratic trend in the data, because it may not be "independent" in a statistical sense from the other βᵢ. To make
what is known as a trend analysis of the data, we introduce an inner product on the space ℙₙ analogous to that given in Example 2 in Section 6.7. For p, q in ℙₙ, define

⟨p, q⟩ = p(t₀)q(t₀) + ⋯ + p(tₙ)q(tₙ)

In practice, statisticians seldom need to consider trends in data of degree higher than cubic or quartic. So let p₀, p₁, p₂, p₃ denote an orthogonal basis of the subspace ℙ₃ of ℙₙ, obtained by applying the Gram–Schmidt process to the polynomials 1, t, t², and t³. By Supplementary Exercise 11 in Chapter 2, there is a polynomial g in ℙₙ whose values at t₀, …, tₙ coincide with those of the unknown function f. Let ĝ be the orthogonal projection (with respect to the given inner product) of g onto ℙ₃, say,

ĝ = c₀p₀ + c₁p₁ + c₂p₂ + c₃p₃

Then ĝ is called a cubic trend function, and c₀, …, c₃ are the trend coefficients of the data. The coefficient c₁ measures the linear trend, c₂ the quadratic trend, and c₃ the cubic trend. It turns out that if the data have certain properties, these coefficients are statistically independent.
Since p₀, …, p₃ are orthogonal, the trend coefficients may be computed one at a time, independently of one another. (Recall that cᵢ = ⟨g, pᵢ⟩/⟨pᵢ, pᵢ⟩.) We can ignore p₃ and c₃ if we want only the quadratic trend. And if, for example, we needed to determine the quartic trend, we would have to find (via Gram–Schmidt) only a polynomial p₄ in ℙ₄ that is orthogonal to ℙ₃ and compute ⟨g, p₄⟩/⟨p₄, p₄⟩.
EXAMPLE 2  The simplest and most common use of trend analysis occurs when the points t₀, …, tₙ can be adjusted so that they are evenly spaced and sum to zero. Fit a quadratic trend function to the data (−2, 3), (−1, 5), (0, 5), (1, 4), and (2, 3).
SOLUTION  The t-coordinates are suitably scaled to use the orthogonal polynomials found in Example 5 of Section 6.7:

Polynomial:          p₀                p₁                p₂                Data: g
Vector of values:  (1, 1, 1, 1, 1)   (−2, −1, 0, 1, 2)   (2, −1, −2, −1, 2)   (3, 5, 5, 4, 3)

The calculations involve only these vectors, not the specific formulas for the orthogonal polynomials. The best approximation to the data by polynomials in ℙ₂ is the orthogonal projection given by

p̂ = (⟨g, p₀⟩/⟨p₀, p₀⟩)p₀ + (⟨g, p₁⟩/⟨p₁, p₁⟩)p₁ + (⟨g, p₂⟩/⟨p₂, p₂⟩)p₂
  = (20/5)p₀ − (1/10)p₁ − (7/14)p₂

and

p̂(t) = 4 − .1t − .5(t² − 2)    (3)
Since the coefficient of p₂ is not extremely small, it would be reasonable to conclude that the trend is at least quadratic. This is confirmed by the graph in Fig. 2.
[Figure 2: Approximation by a quadratic trend function, y = p̂(t).]
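Because p₀, p₁, p₂ are orthogonal, each trend coefficient in Example 2 is an independent one-dimensional projection; the sketch below (our code) computes all three from the value vectors.

```python
# Quadratic trend fit of Example 2: with orthogonal value vectors,
# each trend coefficient is c_i = <g, p_i> / <p_i, p_i>.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

p0 = [1, 1, 1, 1, 1]
p1 = [-2, -1, 0, 1, 2]
p2 = [2, -1, -2, -1, 2]
g  = [3, 5, 5, 4, 3]          # the data

c = [dot(g, p) / dot(p, p) for p in (p0, p1, p2)]
print(c)    # [4.0, -0.1, -0.5]  ->  phat(t) = 4 - .1 t - .5 (t^2 - 2)
```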
Fourier Series (Calculus required)
Continuous functions are often approximated by linear combinations of sine and cosine functions. For instance, a continuous function might represent a sound wave, an electric signal of some type, or the movement of a vibrating mechanical system.
For simplicity, we consider functions on 0 ≤ t ≤ 2π. It turns out that any function in C[0, 2π] can be approximated as closely as desired by a function of the form

a₀/2 + a₁ cos t + ⋯ + aₙ cos nt + b₁ sin t + ⋯ + bₙ sin nt    (4)

for a sufficiently large value of n. The function (4) is called a trigonometric polynomial. If aₙ and bₙ are not both zero, the polynomial is said to be of order n. The connection between trigonometric polynomials and other functions in C[0, 2π] depends on the fact that for any n ≥ 1, the set

{1, cos t, cos 2t, …, cos nt, sin t, sin 2t, …, sin nt}    (5)

is orthogonal with respect to the inner product

⟨f, g⟩ = ∫₀^{2π} f(t)g(t) dt    (6)

This orthogonality is verified as in the following example and in Exercises 5 and 6.
EXAMPLE 3  Let C[0, 2π] have the inner product (6), and let m and n be unequal positive integers. Show that cos mt and cos nt are orthogonal.

SOLUTION  Use a trigonometric identity. When m ≠ n,

⟨cos mt, cos nt⟩ = ∫₀^{2π} cos mt cos nt dt
  = (1/2) ∫₀^{2π} [cos(mt + nt) + cos(mt − nt)] dt
  = (1/2) [ sin(mt + nt)/(m + n) + sin(mt − nt)/(m − n) ] |₀^{2π} = 0
Let W be the subspace of C[0, 2π] spanned by the functions in (5). Given f in C[0, 2π], the best approximation to f by functions in W is called the nth-order Fourier approximation to f on [0, 2π]. Since the functions in (5) are orthogonal, the best approximation is given by the orthogonal projection onto W. In this case, the coefficients aₖ and bₖ in (4) are called the Fourier coefficients of f. The standard formula for an orthogonal projection shows that

aₖ = ⟨f, cos kt⟩/⟨cos kt, cos kt⟩,   bₖ = ⟨f, sin kt⟩/⟨sin kt, sin kt⟩,   k ≥ 1

Exercise 7 asks you to show that ⟨cos kt, cos kt⟩ = π and ⟨sin kt, sin kt⟩ = π. Thus

aₖ = (1/π) ∫₀^{2π} f(t) cos kt dt,   bₖ = (1/π) ∫₀^{2π} f(t) sin kt dt    (7)
The coefficient of the (constant) function 1 in the orthogonal projection is

⟨f, 1⟩/⟨1, 1⟩ = (1/(2π)) ∫₀^{2π} f(t)·1 dt = (1/2) [ (1/π) ∫₀^{2π} f(t) cos(0·t) dt ] = a₀/2

where a₀ is defined by (7) for k = 0. This explains why the constant term in (4) is written as a₀/2.
EXAMPLE 4  Find the nth-order Fourier approximation to the function f(t) = t on the interval [0, 2π].

SOLUTION  Compute

a₀/2 = (1/2)·(1/π) ∫₀^{2π} t dt = (1/(2π)) [ (1/2)t² |₀^{2π} ] = π

and for k > 0, using integration by parts,

aₖ = (1/π) ∫₀^{2π} t cos kt dt = (1/π) [ (1/k²) cos kt + (t/k) sin kt ] |₀^{2π} = 0

bₖ = (1/π) ∫₀^{2π} t sin kt dt = (1/π) [ (1/k²) sin kt − (t/k) cos kt ] |₀^{2π} = −2/k

Thus the nth-order Fourier approximation of f(t) = t is

π − 2 sin t − sin 2t − (2/3) sin 3t − ⋯ − (2/n) sin nt
Figure 3 shows the third- and fourth-order Fourier approximations of f.

[Figure 3: Fourier approximations of the function f(t) = t. (a) Third order; (b) fourth order.]
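The integrals in (7) can also be approximated numerically; the sketch below (our code, not from the text) uses a midpoint Riemann sum and confirms that for f(t) = t the coefficients come out close to the exact values aₖ = 0 and bₖ = −2/k of Example 4.

```python
# Numerical Fourier coefficients of f(t) = t on [0, 2*pi] via a midpoint
# Riemann sum, compared with the exact values a_k = 0, b_k = -2/k.

import math

def fourier_coeffs(f, k, n=20_000):
    """Approximate a_k and b_k from formula (7) with n midpoint samples."""
    h = 2 * math.pi / n
    ts = [(j + 0.5) * h for j in range(n)]
    a = sum(f(t) * math.cos(k * t) for t in ts) * h / math.pi
    b = sum(f(t) * math.sin(k * t) for t in ts) * h / math.pi
    return a, b

for k in (1, 2, 3):
    a, b = fourier_coeffs(lambda t: t, k)
    assert abs(a) < 1e-3 and abs(b - (-2 / k)) < 1e-3
```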
The norm of the difference between f and a Fourier approximation is called the mean square error in the approximation. (The term mean refers to the fact that the norm is determined by an integral.) It can be shown that the mean square error approaches zero as the order of the Fourier approximation increases. For this reason, it is common to write

f(t) = a₀/2 + Σ_{m=1}^{∞} (aₘ cos mt + bₘ sin mt)

This expression for f(t) is called the Fourier series for f on [0, 2π]. The term aₘ cos mt, for example, is the projection of f onto the one-dimensional subspace spanned by cos mt.
PRACTICE PROBLEMS
1. Let q₁(t) = 1, q₂(t) = t, and q₃(t) = 3t² − 4. Verify that {q₁, q₂, q₃} is an orthogonal set in C[−2, 2] with the inner product of Example 7 in Section 6.7 (integration from −2 to 2).
2. Find the first-order and third-order Fourier approximations to
   f(t) = 3 − 2 sin t + 5 sin 2t − 6 cos 2t
6.8 EXERCISES
1. Find the least-squares line y = β₀ + β₁x that best fits the data (−2, 0), (−1, 0), (0, 2), (1, 4), and (2, 4), assuming that the first and last data points are less reliable. Weight them half as much as the three interior points.
2. Suppose 5 out of 25 data points in a weighted least-squares problem have a y-measurement that is less reliable than the others, and they are to be weighted half as much as the other 20 points. One method is to weight the 20 points by a factor of 1 and the other 5 by a factor of 1/2. A second method is to weight the 20 points by a factor of 2 and the other 5 by a factor of 1. Do the two methods produce different results? Explain.
3. Fit a cubic trend function to the data in Example 2. The orthogonal cubic polynomial is p₃(t) = (5/6)t³ − (17/6)t.
4. To make a trend analysis of six evenly spaced data points, one can use orthogonal polynomials with respect to evaluation at the points t = −5, −3, −1, 1, 3, and 5.
   a. Show that the first three orthogonal polynomials are
      p₀(t) = 1,  p₁(t) = t,  and  p₂(t) = (3/8)t² − 35/8
      (The polynomial p₂ has been scaled so that its values at the evaluation points are small integers.)
   b. Fit a quadratic trend function to the data
      (−5, 1), (−3, 1), (−1, 4), (1, 4), (3, 6), (5, 8)
In Exercises 5–14, the space is C[0, 2π] with the inner product (6).
5. Show that sin mt and sin nt are orthogonal when m ≠ n.
6. Show that sin mt and cos nt are orthogonal for all positive integers m and n.
7. Show that ‖cos kt‖² = π and ‖sin kt‖² = π for k > 0.
8. Find the third-order Fourier approximation to f(t) = t − 1.
9. Find the third-order Fourier approximation to f(t) = 2π − t.
10. Find the third-order Fourier approximation to the square wave function, f(t) = 1 for 0 ≤ t < π and f(t) = −1 for π ≤ t < 2π.
11. Find the third-order Fourier approximation to sin²t, without performing any integration calculations.
12. Find the third-order Fourier approximation to cos³t, without performing any integration calculations.
13. Explain why a Fourier coefficient of the sum of two functions is the sum of the corresponding Fourier coefficients of the two functions.
14. Suppose the first few Fourier coefficients of some function f in C[0, 2π] are a₀, a₁, a₂, and b₁, b₂, b₃. Which of the following trigonometric polynomials is closer to f? Defend your answer.
    g(t) = a₀/2 + a₁ cos t + a₂ cos 2t + b₁ sin t
    h(t) = a₀/2 + a₁ cos t + a₂ cos 2t + b₁ sin t + b₂ sin 2t
15. [M] Refer to the data in Exercise 13 in Section 6.6, concerning the takeoff performance of an airplane. Suppose the possible measurement errors become greater as the speed of the airplane increases, and let W be the diagonal weighting matrix whose diagonal entries are 1, 1, 1, .9, .9, .8, .7, .6, .5, .4, .3, .2, and .1. Find the cubic curve that fits the data with minimum weighted least-squares error, and use it to estimate the velocity of the plane when t = 4.5 seconds.
16. [M] Let f₄ and f₅ be the fourth-order and fifth-order Fourier approximations in C[0, 2π] to the square wave function in Exercise 10. Produce separate graphs of f₄ and f₅ on the interval [0, 2π], and produce a graph of f₅ on [−2π, 2π].
SG The Linearity of an Orthogonal Projection 6–25
SOLUTIONS TO PRACTICE PROBLEMS
1. Compute

⟨q₁, q₂⟩ = ∫₋₂² 1·t dt = (1/2)t² |₋₂² = 0

⟨q₁, q₃⟩ = ∫₋₂² 1·(3t² − 4) dt = (t³ − 4t) |₋₂² = 0

⟨q₂, q₃⟩ = ∫₋₂² t·(3t² − 4) dt = [ (3/4)t⁴ − 2t² ] |₋₂² = 0
2. The third-order Fourier approximation to f is the best approximation in C[0, 2π] to f by functions (vectors) in the subspace spanned by 1, cos t, cos 2t, cos 3t, sin t, sin 2t, and sin 3t. But f is obviously in this subspace, so f is its own best approximation:

f(t) = 3 − 2 sin t + 5 sin 2t − 6 cos 2t

For the first-order approximation, the closest function to f in the subspace W = Span{1, cos t, sin t} is 3 − 2 sin t. The other two terms in the formula for f(t) are orthogonal to the functions in W, so they contribute nothing to the integrals that give the Fourier coefficients for a first-order approximation.
[Figure: First- (y = 3 − 2 sin t) and third-order (y = f(t)) approximations to f(t).]
CHAPTER 6 SUPPLEMENTARY EXERCISES1. Thefollowingstatementsrefertovectorsin R
n (or Rm/ with
thestandardinnerproduct. MarkeachstatementTrueorFalse. Justifyeachanswer.a. Thelengthofeveryvectorisapositivenumber.b. A vector v anditsnegative �v haveequallengths.c. Thedistancebetween u and v is ku � vk.d. If r isanyscalar, then krvk D rkvk.e. Iftwovectorsareorthogonal, theyarelinearlyindepen-
dent.f. If x is orthogonal to both u and v, then x must be
orthogonalto u � v.g. If ku C vk2 D kuk2 C kvk2, then u and v areorthogonal.h. If ku � vk2 D kuk2 C kvk2, then u and v areorthogonal.i. Theorthogonalprojectionof y onto u isascalarmultiple
of y.j. Ifavector y coincideswithitsorthogonalprojectiononto
asubspace W , then y isin W .k. ThesetofallvectorsinR
n orthogonaltoonefixedvectorisasubspaceof R
n.l. If W isasubspaceof R
n, then W and W ? havenovectorsincommon.
m. If fv1; v2; v3g isanorthogonalsetandif c1, c2, and c3 arescalars, then fc1v1; c2v2; c3v3g isanorthogonalset.
n. Ifamatrix U hasorthonormalcolumns, then U U T D I .o. A squarematrixwithorthogonalcolumnsisanorthogo-
nalmatrix.p. Ifasquarematrixhasorthonormalcolumns, thenitalso
hasorthonormalrows.q. IfW isasubspace, then k projW vk2 C kv � projW vk2 D
kvk2.
r. A least-squaressolutionof Ax D b isthevector AOx inColA closestto b, sothat kb � AOx k � kb � Axk forall x.
s. The normal equations for a least-squares solution ofAx D b aregivenby Ox D .ATA/�1AT b.
2. Let {v₁, …, vₚ} be an orthonormal set. Verify the following equality by induction, beginning with p = 2. If x = c₁v₁ + ⋯ + cₚvₚ, then

   ‖x‖² = |c₁|² + ⋯ + |cₚ|²
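Before attempting the induction, it can help to see the identity numerically. The sketch below builds an orthonormal set from a QR factorization (an illustrative choice) and checks the equality:

```python
import numpy as np

rng = np.random.default_rng(0)
# orthonormal set {v1, v2, v3} in R^5, taken from a QR factorization
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
c = np.array([2.0, -1.0, 0.5])
x = Q @ c                                 # x = c1*v1 + c2*v2 + c3*v3

assert np.isclose(x @ x, np.sum(c**2))    # ||x||^2 = |c1|^2 + |c2|^2 + |c3|^2
```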
3. Let {v₁, …, vₚ} be an orthonormal set in Rⁿ. Verify the following inequality, called Bessel's inequality, which is true for each x in Rⁿ:

   ‖x‖² ≥ |x·v₁|² + |x·v₂|² + ⋯ + |x·vₚ|²
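A quick numerical sanity check of Bessel's inequality, using three orthonormal vectors in R⁶ obtained from a QR factorization (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # 3 orthonormal vectors in R^6
x = rng.standard_normal(6)

lhs = x @ x                                        # ||x||^2
rhs = sum((x @ Q[:, j]) ** 2 for j in range(3))    # sum of |x . vj|^2
assert rhs <= lhs + 1e-12                          # Bessel's inequality
```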
4. Let U be an n × n orthogonal matrix. Show that if {v₁, …, vₙ} is an orthonormal basis for Rⁿ, then so is {Uv₁, …, Uvₙ}.
5. Show that if an n × n matrix U satisfies (Ux)·(Uy) = x·y for all x and y in Rⁿ, then U is an orthogonal matrix.
6. Show that if U is an orthogonal matrix, then any real eigenvalue of U must be ±1.
7. A Householder matrix, or an elementary reflector, has the form Q = I − 2uuᵀ where u is a unit vector. (See Exercise 13 in the Supplementary Exercises for Chapter 2.) Show that Q is an orthogonal matrix. (Elementary reflectors are often used in computer programs to produce a QR factorization of a matrix A. If A has linearly independent columns, then left-multiplication by a sequence of elementary reflectors can produce an upper triangular matrix.)
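The claim in Exercise 7 can be confirmed numerically before proving it. The sketch below forms Q = I − 2uuᵀ for a random unit vector u and checks that QᵀQ = I (Q is also symmetric, so it is its own inverse):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(4)
u /= np.linalg.norm(u)                   # unit vector
Q = np.eye(4) - 2 * np.outer(u, u)       # Householder reflector

assert np.allclose(Q.T @ Q, np.eye(4))   # orthonormal columns: Q is orthogonal
assert np.allclose(Q, Q.T)               # symmetric, hence Q @ Q = I
```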
8. Let T : Rⁿ → Rⁿ be a linear transformation that preserves lengths; that is, ‖T(x)‖ = ‖x‖ for all x in Rⁿ.
   a. Show that T also preserves orthogonality; that is, T(x)·T(y) = 0 whenever x·y = 0.
   b. Show that the standard matrix of T is an orthogonal matrix.
9. Let u and v be linearly independent vectors in Rⁿ that are not orthogonal. Describe how to find the best approximation to z in Rⁿ by vectors of the form x₁u + x₂v without first constructing an orthogonal basis for Span{u, v}.
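One approach to Exercise 9 is to solve the normal equations for A = [u v] directly; no orthogonal basis is needed. A sketch with hypothetical vectors u, v, z (any independent, non-orthogonal pair works):

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([1.0, 0.0, 1.0])    # independent and not orthogonal (u . v = 1)
z = np.array([3.0, 1.0, 2.0])

A = np.column_stack([u, v])
x_hat = np.linalg.solve(A.T @ A, A.T @ z)   # normal equations A^T A x = A^T z
z_hat = A @ x_hat                           # best approximation in Span{u, v}

# the residual is orthogonal to both u and v
assert np.allclose(A.T @ (z - z_hat), 0)
```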
10. Suppose the columns of A are linearly independent. Determine what happens to the least-squares solution x̂ of Ax = b when b is replaced by cb for some nonzero scalar c.
11. If a, b, and c are distinct numbers, then the following system is inconsistent because the graphs of the equations are parallel planes. Show that the set of all least-squares solutions of the system is precisely the plane whose equation is x − 2y + 5z = (a + b + c)/3.

    x − 2y + 5z = a
    x − 2y + 5z = b
    x − 2y + 5z = c
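A numerical spot check of the claim in Exercise 11, for one arbitrary choice of distinct a, b, c: the particular least-squares solution returned by `lstsq` does satisfy x − 2y + 5z = (a + b + c)/3.

```python
import numpy as np

a, b2, c = 1.0, 2.0, 6.0               # arbitrary distinct right-hand sides
A = np.array([[1.0, -2.0, 5.0]] * 3)   # three copies of the same normal vector
rhs = np.array([a, b2, c])

x_hat, *_ = np.linalg.lstsq(A, rhs, rcond=None)
# every least-squares solution satisfies x - 2y + 5z = (a + b + c)/3
assert np.isclose(A[0] @ x_hat, (a + b2 + c) / 3)
```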
12. Consider the problem of finding an eigenvalue of an n × n matrix A when an approximate eigenvector v is known. Since v is not exactly correct, the equation

    Av = λv    (1)

    will probably not have a solution. However, λ can be estimated by a least-squares solution when (1) is viewed properly. Think of v as an n × 1 matrix V, think of λ as a vector in R¹, and denote the vector Av by the symbol b. Then (1) becomes b = λV, which may also be written as Vλ = b. Find the least-squares solution of this system of n equations in the one unknown λ, and write this solution using the original symbols. The resulting estimate for λ is called a Rayleigh quotient. See Exercises 11 and 12 in Section 5.8.
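Carrying out the computation asked for in Exercise 12: the normal equation VᵀVλ = Vᵀb gives λ = (vᵀAv)/(vᵀv), the Rayleigh quotient. A small numerical illustration (the matrix and approximate eigenvector are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # eigenvalues 1 and 3
v = np.array([1.0, 0.9])          # rough guess at the eigenvector (1, 1)

# least-squares solution of V*lam = Av:  lam = (v . Av)/(v . v)
lam = (v @ A @ v) / (v @ v)

assert abs(lam - 3) < 0.01        # close to the true eigenvalue 3
```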
13. Use the steps below to prove the following relations among the four fundamental subspaces determined by an m × n matrix A.

    Row A = (Nul A)⊥,   Col A = (Nul Aᵀ)⊥

    a. Show that Row A is contained in (Nul A)⊥. (Show that if x is in Row A, then x is orthogonal to every u in Nul A.)
    b. Suppose rank A = r. Find dim Nul A and dim(Nul A)⊥, and then deduce from part (a) that Row A = (Nul A)⊥. [Hint: Study the exercises for Section 6.3.]
    c. Explain why Col A = (Nul Aᵀ)⊥.

14. Explain why an equation Ax = b has a solution if and only if b is orthogonal to all solutions of the equation Aᵀx = 0.
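The relation Row A = (Nul A)⊥ in Exercise 13 can be illustrated numerically with the SVD, whose right singular vectors split into an orthonormal basis for Row A and one for Nul A (the example matrix is an arbitrary rank-2 choice):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])       # rank 2, so dim Nul A = 1

_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # rank A
row_basis, null_basis = Vt[:r], Vt[r:]

assert np.allclose(row_basis @ null_basis.T, 0)   # Row A is orthogonal to Nul A
assert r + null_basis.shape[0] == A.shape[1]      # dimensions add up to n
# each row of A lies in the span of row_basis
assert np.allclose(A, (A @ row_basis.T) @ row_basis)
```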
Exercises 15 and 16 concern the (real) Schur factorization of an n × n matrix A in the form A = URUᵀ, where U is an orthogonal matrix and R is an n × n upper triangular matrix.¹
15. Show that if A admits a (real) Schur factorization, A = URUᵀ, then A has n real eigenvalues, counting multiplicities.
16. Let A be an n × n matrix with n real eigenvalues, counting multiplicities, denoted by λ₁, …, λₙ. It can be shown that A admits a (real) Schur factorization. Parts (a) and (b) show the key ideas in the proof. The rest of the proof amounts to repeating (a) and (b) for successively smaller matrices, and then piecing together the results.
    a. Let u₁ be a unit eigenvector corresponding to λ₁, let u₂, …, uₙ be any other vectors such that {u₁, …, uₙ} is an orthonormal basis for Rⁿ, and then let U = [u₁ u₂ ⋯ uₙ]. Show that the first column of UᵀAU is λ₁e₁, where e₁ is the first column of the n × n identity matrix.
    b. Part (a) implies that UᵀAU has the form shown below. Explain why the eigenvalues of A₁ are λ₂, …, λₙ. [Hint: See the Supplementary Exercises for Chapter 5.]

    UᵀAU = [ λ₁  ∗  ⋯  ∗ ]
           [  0           ]
           [  ⋮     A₁    ]
           [  0           ]
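Part (a) can be checked on a small example: take a unit eigenvector u₁, complete it to an orthonormal basis with a QR factorization (any completion works), and inspect the first column of UᵀAU. The 2 × 2 matrix below is an arbitrary choice with eigenvalues 5 and 2:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])               # eigenvalues 5 and 2
lam1 = 5.0
u1 = np.array([1.0, 1.0]) / np.sqrt(2)   # unit eigenvector: A u1 = 5 u1

# complete u1 to an orthonormal basis (QR of [u1 | e1] suffices here)
U, _ = np.linalg.qr(np.column_stack([u1, [1.0, 0.0]]))
T = U.T @ A @ U

assert np.allclose(U.T @ U, np.eye(2))                    # U is orthogonal
assert np.allclose(T[:, 0], lam1 * np.array([1.0, 0.0]))  # first column is lam1*e1
```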
[M] When the right side of an equation Ax = b is changed slightly, say, to Ax = b + Δb for some vector Δb, the solution changes from x to x + Δx, where Δx satisfies A(Δx) = Δb. The quotient ‖Δb‖/‖b‖ is called the relative change in b (or the relative error in b when Δb represents possible error in the entries of b). The relative change in the solution is ‖Δx‖/‖x‖. When A is invertible, the condition number of A, written as cond(A), produces a bound on how large the relative change in x can be:

‖Δx‖/‖x‖ ≤ cond(A) · ‖Δb‖/‖b‖    (2)
In Exercises 17–20, solve Ax = b and A(Δx) = Δb, and show that the inequality (2) holds in each case. (See the discussion of ill-conditioned matrices in Exercises 41–43 in Section 2.3.)
17. A = [ 4.5  3.1 ]     b = [ 19.249 ]     Δb = [  .001 ]
        [ 1.6  1.1 ],        [  6.843 ],         [ −.003 ]
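For Exercise 17 the computation can be sketched as follows; the exact solution is x = (3.94, 0.49), and inequality (2) holds with the 2-norm condition number:

```python
import numpy as np

A = np.array([[4.5, 3.1],
              [1.6, 1.1]])
b = np.array([19.249, 6.843])
db = np.array([0.001, -0.003])

x = np.linalg.solve(A, b)         # exact solution: (3.94, 0.49)
dx = np.linalg.solve(A, db)       # change in the solution

rel_x = np.linalg.norm(dx) / np.linalg.norm(x)
rel_b = np.linalg.norm(db) / np.linalg.norm(b)
cond = np.linalg.cond(A)          # 2-norm condition number

assert rel_x <= cond * rel_b      # inequality (2)
```

A tiny relative change in b (about 0.015%) produces a relative change in x of roughly 46%, which is exactly the ill-conditioning that cond(A) predicts.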
18. A = [ 4.5  3.1 ]     b = [   .500 ]     Δb = [  .001 ]
        [ 1.6  1.1 ],        [ −1.407 ],         [ −.003 ]
¹ If complex numbers are allowed, every n × n matrix A admits a (complex) Schur factorization, A = URU⁻¹, where R is upper triangular and U⁻¹ is the conjugate transpose of U. This very useful fact is discussed in Matrix Analysis, by Roger A. Horn and Charles R. Johnson (Cambridge: Cambridge University Press, 1985), pp. 79–100.
19. A = [  7  −6  −4   1 ]     b = [   .100 ]     Δb = 10⁻⁴ [   .49 ]
        [ −5   1   0  −2 ]         [  2.888 ]               [ −1.28 ]
        [ 10  11   7  −3 ]         [ −1.404 ]               [  5.78 ]
        [ 19   9   7   1 ],        [  1.462 ],              [  8.04 ]
20. A = [  7  −6  −4   1 ]     b = [   4.230 ]     Δb = 10⁻⁴ [   .27 ]
        [ −5   1   0  −2 ]         [ −11.043 ]               [  7.76 ]
        [ 10  11   7  −3 ]         [  49.991 ]               [ −3.77 ]
        [ 19   9   7   1 ],        [  69.536 ],              [  3.93 ]