7/27/2019 MIT2 086F12 Notes Unit3
DRAFT V1.2

From
Math, Numerics, & Programming (for Mechanical Engineers)
Masayuki Yano, James Douglass Penn, George Konidaris, Anthony T. Patera
September 2012
The Authors. License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (CC BY-NC-SA 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and MIT OpenCourseWare source are credited; the use is non-commercial; and the CC BY-NC-SA license is retained. See also http://ocw.mit.edu/terms/ and http://creativecommons.org/licenses/by-nc-sa/3.0/us/ .
Contents

III Linear Algebra 1: Matrices and Least Squares. Regression. 199

15 Motivation 201

16 Matrices and Vectors: Definitions and Operations 203
  16.1 Basic Vector and Matrix Operations 203
    16.1.1 Definitions 203
      Transpose Operation 204
    16.1.2 Vector Operations 205
      Inner Product 207
      Norm (2-Norm) 207
      Orthogonality 211
      Orthonormality 212
    16.1.3 Linear Combinations 212
      Linear Independence 213
      Vector Spaces and Bases 214
  16.2 Matrix Operations 217
    16.2.1 Interpretation of Matrices 217
    16.2.2 Matrix Operations 218
      Matrix-Matrix Product 219
    16.2.3 Interpretations of the Matrix-Vector Product 222
      Row Interpretation 222
      Column Interpretation 223
      Left Vector-Matrix Product 223
    16.2.4 Interpretations of the Matrix-Matrix Product 224
      Matrix-Matrix Product as a Series of Matrix-Vector Products 224
      Matrix-Matrix Product as a Series of Left Vector-Matrix Products 225
    16.2.5 Operation Count of Matrix-Matrix Product 225
    16.2.6 The Inverse of a Matrix (Briefly) 226
  16.3 Special Matrices 227
    16.3.1 Diagonal Matrices 228
    16.3.2 Symmetric Matrices 228
    16.3.3 Symmetric Positive Definite Matrices 228
    16.3.4 Triangular Matrices 230
    16.3.5 Orthogonal Matrices 230
    16.3.6 Orthonormal Matrices 232
  16.4 Further Concepts in Linear Algebra 233
    16.4.1 Column Space and Null Space 233
    16.4.2 Projectors 234

17 Least Squares 237
  17.1 Data Fitting in Absence of Noise and Bias 237
  17.2 Overdetermined Systems 242
    Row Interpretation 243
    Column Interpretation 244
  17.3 Least Squares 245
    17.3.1 Measures of Closeness 245
    17.3.2 Least-Squares Formulation ($\ell_2$ minimization) 246
    17.3.3 Computational Considerations 250
      QR Factorization and the Gram-Schmidt Procedure 251
    17.3.4 Interpretation of Least Squares: Projection 253
    17.3.5 Error Bounds for Least Squares 255
      Error Bounds with Respect to Perturbation in Data, g (constant model) 255
      Error Bounds with Respect to Perturbation in Data, g (general) 256
      Error Bounds with Respect to Reduction in Space, B 262

18 Matlab Linear Algebra (Briefly) 267
  18.1 Matrix Multiplication (and Addition) 267
  18.2 The Matlab Inverse Function: inv 268
  18.3 Solution of Linear Systems: Matlab Backslash 269
  18.4 Solution of (Linear) Least-Squares Problems 269

19 Regression: Statistical Inference 271
  19.1 Simplest Case 271
    19.1.1 Friction Coefficient Determination Problem Revisited 271
    19.1.2 Response Model 272
    19.1.3 Parameter Estimation 275
    19.1.4 Confidence Intervals 277
      Individual Confidence Intervals 278
      Joint Confidence Intervals 279
    19.1.5 Hypothesis Testing 284
    19.1.6 Inspection of Assumptions 285
      Checking for Plausibility of the Noise Assumptions 285
      Checking for Presence of Bias 286
  19.2 General Case 287
    19.2.1 Response Model 287
    19.2.2 Estimation 288
    19.2.3 Confidence Intervals 289
    19.2.4 Overfitting (and Underfitting) 290
Unit III
Linear Algebra 1: Matrices and Least Squares. Regression.
Chapter 15
Motivation
In odometry-based mobile robot navigation, the accuracy of the robot's dead reckoning pose tracking depends on minimizing slippage between the robot's wheels and the ground. Even a momentary slip can lead to an error in heading that will cause the error in the robot's location estimate to grow linearly over its journey. It is thus important to determine the friction coefficient between the robot's wheels and the ground, which directly affects the robot's resistance to slippage. Just as importantly, this friction coefficient will significantly affect the performance of the robot: the ability to push loads.
When the mobile robot of Figure 15.1 is commanded to move forward, a number of forces come into play. Internally, the drive motors exert a torque (not shown in the figure) on the wheels, which is resisted by the friction force $F_f$ between the wheels and the ground. If the magnitude of $F_f$, dictated by the sum of the drag force $F_{\rm drag}$ (a combination of all forces resisting the robot's motion) and the product of the robot's mass and acceleration, is less than the maximum static friction force $F^{\max}_{f,\rm static}$ between the wheels and the ground, the wheels will roll without slipping and the robot will move forward with velocity $v = \omega r_{\rm wheel}$. If, however, the magnitude of $F_f$ reaches $F^{\max}_{f,\rm static}$, the wheels will begin to slip and $F_f$ will drop to a lower level $F_{f,\rm kinetic}$, the kinetic friction force. The wheels will continue to slip ($v < \omega r_{\rm wheel}$).
[Figure 15.3 (plot): $F_{\rm friction}$ (Newtons) versus time (seconds) for a 500 gram load, annotated with $F_{f,\rm static} = F_{\rm tangential,applied} \le \mu_s F_{\rm normal}$, $F^{\max,\rm meas}_{f,\rm static} = \mu_s F_{\rm normal} + \text{experimental error}$, and $F_{f,\rm kinetic} = \mu_k F_{\rm normal} + \text{experimental error}$.]
Figure 15.2: Experimental setup for friction measurement: force transducer (A) is connected to contact area (B) by a thin wire. Normal force is exerted on the contact area by load stack (C). Tangential force is applied using turntable (D) via the friction between the turntable surface and the contact area. Apparatus and photograph courtesy of James Penn.

Figure 15.3: Sample data for one friction measurement, yielding one data point for $F^{\max,\rm meas}_{f,\rm static}$. Data courtesy of James Penn.

We expect that
$$F^{\max}_{f,\rm static} = \mu_s F_{\rm normal,rear} , \qquad (15.1)$$
where $\mu_s$ is the static coefficient of friction and $F_{\rm normal,rear}$ is the normal force from the ground on the rear, driving, wheels. In order to minimize the risk of slippage (and to be able to push large loads), robot wheels should be designed for a high value of $\mu_s$ between the wheels and the ground. This value, although difficult to predict accurately by modeling, can be determined by experiment.
We first conduct experiments for the friction force $F^{\max}_{f,\rm static}$ (in Newtons) as a function of normal load $F_{\rm normal,applied}$ (in Newtons) and (nominal) surface area of contact $A_{\rm surface}$ (in cm$^2$) with the friction turntable apparatus depicted in Figure 15.2. Weights permit us to vary the normal load, and washer inserts permit us to vary the nominal surface area of contact. A typical experiment (at a particular prescribed value of $F_{\rm normal,applied}$ and $A_{\rm surface}$) yields the time trace of Figure 15.3, from which $F^{\max,\rm meas}_{f,\rm static}$ (our measurement of $F^{\max}_{f,\rm static}$) is deduced as the maximum of the response.
We next postulate a dependence (or model)
$$F^{\max}_{f,\rm static}(F_{\rm normal,applied}, A_{\rm surface}) = \beta_0 + \beta_1 F_{\rm normal,applied} + \beta_2 A_{\rm surface} , \qquad (15.2)$$
where we expect (but do not a priori assume, from Messieurs Amontons and Coulomb) that $\beta_0 = 0$ and $\beta_2 = 0$ (and of course $\beta_1 \equiv \mu_s$). In order to confirm that $\beta_0 = 0$ and $\beta_2 = 0$, or at least confirm that $\beta_0 = 0$ and $\beta_2 = 0$ is not untrue, and to find a good estimate for $\beta_1 \equiv \mu_s$, we must appeal to our measurements.
The mathematical techniques by which to determine $\mu_s$ (and $\beta_0$, $\beta_2$) with some confidence from noisy experimental data are known as regression, which is the subject of Chapter 19. Regression, in turn, is best described in the language of linear algebra (Chapter 16), and is built upon the linear algebra concept of least squares (Chapter 17).
Chapter 16
Matrices and Vectors: Definitions and Operations

16.1 Basic Vector and Matrix Operations

16.1.1 Definitions

Let us first introduce the primitive objects in linear algebra: vectors and matrices. An m-vector $v \in \mathbb{R}^{m\times 1}$ consists of $m$ real numbers:¹
$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix} .$$
It is also called a column vector, which is the default vector in linear algebra. Thus, by convention, $v \in \mathbb{R}^m$ implies that $v$ is a column vector in $\mathbb{R}^{m\times 1}$. Note that we use subscript $(\cdot)_i$ to address the $i$-th component of a vector. The other kind of vector is a row vector $v \in \mathbb{R}^{1\times n}$ consisting of $n$ entries:
$$v = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} .$$
Let us consider a few examples of column and row vectors.

Example 16.1.1 (vectors). Examples of (column) vectors in $\mathbb{R}^3$ are
$$v = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}, \quad u = \begin{pmatrix} \sqrt{3} \\ 7 \\ 1 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 9.1 \\ 7/3 \\ \pi \end{pmatrix} .$$

¹ The concept of vectors readily extends to complex numbers, but we only consider real vectors in our presentation of this chapter.
To address a specific component of the vectors, we write, for example, $v_1 = 1$, $u_1 = \sqrt{3}$, and $w_3 = \pi$. Examples of row vectors in $\mathbb{R}^{1\times 4}$ are
$$v = \begin{pmatrix} \sqrt{2} & -5 & 2 & e \end{pmatrix} \quad \text{and} \quad u = \begin{pmatrix} \pi & 1 & 1 & 0 \end{pmatrix} .$$
Some of the components of these row vectors are $v_2 = -5$ and $u_4 = 0$.
A matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} .$$
Extending the convention for addressing an entry of a vector, we use subscript $(\cdot)_{ij}$ to address the entry on the $i$-th row and $j$-th column. Note that the order in which the row and column are referred follows that for describing the size of the matrix. Thus, $A \in \mathbb{R}^{m\times n}$ consists of entries
$$A_{ij}, \quad i = 1, \dots, m , \; j = 1, \dots, n .$$
Sometimes it is convenient to think of a (column) vector as a special case of a matrix with only one column, i.e., $n = 1$. Similarly, a (row) vector can be thought of as a special case of a matrix with $m = 1$. Conversely, an $m\times n$ matrix can be viewed as $m$ row $n$-vectors or $n$ column $m$-vectors, as we discuss further below.

Example 16.1.2 (matrices). Examples of matrices are
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 1 \\ 2 & 8 & 1 \\ 0 & 3 & 0 \end{pmatrix} .$$
The matrix $A$ is a $3\times 2$ matrix ($A \in \mathbb{R}^{3\times 2}$) and matrix $B$ is a $3\times 3$ matrix ($B \in \mathbb{R}^{3\times 3}$). We can also address specific entries as, for example, $A_{12} = \sqrt{3}$, $A_{31} = \pi$, and $B_{32} = 3$.
While vectors and matrices may appear like arrays of numbers, linear algebra defines a special set of rules to manipulate these objects. One such operation is the transpose operation considered next.

Transpose Operation

The first linear algebra operator we consider is the transpose operator, denoted by superscript $(\cdot)^{\rm T}$. The transpose operator swaps the rows and columns of the matrix. That is, if $B = A^{\rm T}$ with $A \in \mathbb{R}^{m\times n}$, then
$$B_{ij} = A_{ji} , \quad 1 \le i \le n , \; 1 \le j \le m .$$
Because the rows and columns of the matrix are swapped, the dimensions of the matrix are also swapped, i.e., if $A \in \mathbb{R}^{m\times n}$ then $B \in \mathbb{R}^{n\times m}$.

If we swap the rows and columns twice, then we return to the original matrix. Thus, the transpose of a transposed matrix is the original matrix, i.e.,
$$(A^{\rm T})^{\rm T} = A .$$
Example 16.1.3 (transpose). Let us consider a few examples of the transpose operation. A matrix $A$ and its transpose $B = A^{\rm T}$ are related by
$$A = \begin{pmatrix} 1 & \sqrt{3} \\ -4 & 9 \\ \pi & -3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & -4 & \pi \\ \sqrt{3} & 9 & -3 \end{pmatrix} .$$
The rows and columns are swapped in the sense that $A_{31} = B_{13} = \pi$ and $A_{12} = B_{21} = \sqrt{3}$. Also, because $A \in \mathbb{R}^{3\times 2}$, $B \in \mathbb{R}^{2\times 3}$. Interpreting a vector as a special case of a matrix with one column, we can also apply the transpose operator to a column vector to create a row vector, i.e., given
$$v = \begin{pmatrix} \sqrt{3} \\ 7 \end{pmatrix} ,$$
the transpose operation yields
$$u = v^{\rm T} = \begin{pmatrix} \sqrt{3} & 7 \end{pmatrix} .$$
Note that the transpose of a column vector is a row vector, and the transpose of a row vector is a column vector.
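The transpose rule above, $B_{ij} = A_{ji}$, translates directly into code. A minimal Python sketch (the helper name `transpose` and the list-of-rows matrix representation are our own, not the text's):

```python
import math

def transpose(A):
    """Return B = A^T: B[j][i] = A[i][j] for an m x n matrix A (list of rows)."""
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]

# the 3 x 2 matrix A of Example 16.1.3
A = [[1.0, math.sqrt(3)],
     [-4.0, 9.0],
     [math.pi, -3.0]]
B = transpose(A)                  # B is 2 x 3
assert B[0] == [1.0, -4.0, math.pi]
assert transpose(B) == A          # (A^T)^T = A
```

Note that transposing twice returns the original matrix, mirroring $(A^{\rm T})^{\rm T} = A$.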
16.1.2 Vector Operations

The first vector operation we consider is multiplication of a vector $v \in \mathbb{R}^m$ by a scalar $\alpha \in \mathbb{R}$. The operation yields
$$u = \alpha v ,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = \alpha v_i , \quad i = 1, \dots, m .$$
In other words, multiplication of a vector by a scalar results in each component of the vector being scaled by the scalar.

The second operation we consider is addition of two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$. The addition yields
$$u = v + w ,$$
where each entry of $u \in \mathbb{R}^m$ is given by
$$u_i = v_i + w_i , \quad i = 1, \dots, m .$$
In order for addition of two vectors to make sense, the vectors must have the same number of components. Each entry of the resulting vector is simply the sum of the corresponding entries of the two vectors.

We can summarize the action of the scaling and addition operations in a single operation. Let $v \in \mathbb{R}^m$, $w \in \mathbb{R}^m$, and $\alpha \in \mathbb{R}$. Then, the operation
$$u = \alpha v + w$$
Figure 16.1: Illustration of vector scaling and vector addition: (a) scalar scaling; (b) vector addition.
yields a vector $u \in \mathbb{R}^m$ whose entries are given by
$$u_i = \alpha v_i + w_i , \quad i = 1, \dots, m .$$
The result is nothing more than a combination of the scalar multiplication and vector addition rules.

Example 16.1.4 (vector scaling and addition in $\mathbb{R}^2$). Let us illustrate scaling of a vector by a scalar and addition of two vectors in $\mathbb{R}^2$ using
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix}, \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}, \quad \text{and} \quad \alpha = \frac{3}{2} .$$
First, let us consider scaling of the vector $v$ by the scalar $\alpha$. The operation yields
$$u = \alpha v = \frac{3}{2}\begin{pmatrix} 1 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 1/2 \end{pmatrix} .$$
This operation is illustrated in Figure 16.1(a). The vector $v$ is simply stretched by the factor of $3/2$ while preserving the direction.

Now, let us consider addition of the vectors $v$ and $w$. The vector addition yields
$$u = v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix} .$$
Figure 16.1(b) illustrates the vector addition process. We translate $w$ so that it starts from the tip of $v$ to form a parallelogram. The resultant vector is precisely the sum of the two vectors. Note that the geometric intuition for scaling and addition provided for $\mathbb{R}^2$ readily extends to higher dimensions.
Example 16.1.5 (vector scaling and addition in $\mathbb{R}^3$). Let $v = (1, 3, 6)^{\rm T}$, $w = (2, -1, 0)^{\rm T}$, and $\alpha = 3$. Then,
$$u = v + \alpha w = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} + \begin{pmatrix} 6 \\ -3 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 \\ 0 \\ 6 \end{pmatrix} .$$
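Because the operation $u = v + \alpha w$ acts entry by entry, it is a one-line loop in code. A Python sketch (the helper name `axpy` is ours, borrowed from the BLAS naming convention) that reproduces Example 16.1.5:

```python
def axpy(alpha, w, v):
    """Return u = v + alpha*w, entry by entry (vectors as Python lists)."""
    assert len(v) == len(w)   # vectors must have the same number of components
    return [vi + alpha * wi for vi, wi in zip(v, w)]

v = [1, 3, 6]
w = [2, -1, 0]
u = axpy(3, w, v)   # Example 16.1.5
print(u)            # [7, 0, 6]
```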
Inner Product

Another important operation is the inner product. This operation takes two vectors of the same dimension, $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$, and yields a scalar $\beta \in \mathbb{R}$:
$$\beta = v^{\rm T} w \quad \text{where} \quad \beta = \sum_{i=1}^m v_i w_i .$$
The appearance of the transpose operator will become obvious once we introduce the matrix-matrix multiplication rule. The inner product in a Euclidean vector space is also commonly called the dot product and is denoted by $\beta = v \cdot w$. More generally, the inner product of two elements of a vector space is denoted by $(\cdot, \cdot)$, i.e., $\beta = (v, w)$.
Example 16.1.6 (inner product). Let us consider two vectors in $\mathbb{R}^3$, $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The inner product of these two vectors is
$$\beta = v^{\rm T} w = \sum_{i=1}^3 v_i w_i = 1\cdot 2 + 3\cdot(-1) + 6\cdot 0 = -1 .$$
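The inner product is a single accumulation over paired entries. A Python sketch (the helper name is ours) reproducing Example 16.1.6:

```python
def inner(v, w):
    """beta = v^T w = sum_i v_i * w_i for two m-vectors."""
    assert len(v) == len(w)
    return sum(vi * wi for vi, wi in zip(v, w))

beta = inner([1, 3, 6], [2, -1, 0])   # Example 16.1.6
print(beta)                           # -1
```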
Norm (2-Norm)

Using the inner product, we can naturally define the 2-norm of a vector. Given $v \in \mathbb{R}^m$, the 2-norm of $v$, denoted by $\|v\|_2$, is defined by
$$\|v\|_2 = \sqrt{v^{\rm T} v} = \sqrt{\sum_{i=1}^m v_i^2} .$$
Note that the norm of any vector is non-negative, because it is a sum of $m$ non-negative numbers (squared values). The 2-norm is the usual Euclidean length; in particular, for $m = 2$, the expression simplifies to the familiar Pythagorean theorem, $\|v\|_2 = \sqrt{v_1^2 + v_2^2}$. While there are other norms, we almost exclusively use the 2-norm in this unit. Thus, for notational convenience, we will drop the subscript 2 and write the 2-norm of $v$ as $\|v\|$, with the implicit understanding $\|\cdot\| \equiv \|\cdot\|_2$.

By definition, any norm must satisfy the triangle inequality,
$$\|v + w\| \le \|v\| + \|w\| ,$$
for any $v, w \in \mathbb{R}^m$. The theorem states that the sum of the lengths of two adjoining segments is longer than the distance between their non-joined end points, as is intuitively clear from Figure 16.1(b). For norms defined by inner products, such as our 2-norm above, the triangle inequality is automatically satisfied.
Proof. For norms induced by an inner product, the proof of the triangle inequality follows directly from the definition of the norm and the Cauchy-Schwarz inequality. First, we expand the expression as
$$\|v + w\|^2 = (v + w)^{\rm T}(v + w) = v^{\rm T} v + 2 v^{\rm T} w + w^{\rm T} w .$$
The middle term can be bounded by the Cauchy-Schwarz inequality, which states that
$$v^{\rm T} w \le |v^{\rm T} w| \le \|v\| \|w\| .$$
Thus, we can bound the norm as
$$\|v + w\|^2 \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2 ,$$
and taking the square root of both sides yields the desired result.
Example 16.1.7 (norm of a vector). Let $v = (1, 3, 6)^{\rm T}$ and $w = (2, -1, 0)^{\rm T}$. The 2-norms of these vectors are
$$\|v\| = \sqrt{\sum_{i=1}^3 v_i^2} = \sqrt{1^2 + 3^2 + 6^2} = \sqrt{46}$$
and
$$\|w\| = \sqrt{\sum_{i=1}^3 w_i^2} = \sqrt{2^2 + (-1)^2 + 0^2} = \sqrt{5} .$$
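Since $\|v\| = \sqrt{v^{\rm T} v}$, the 2-norm is one square root away from the inner product. A Python sketch (the helper name is ours) checking Example 16.1.7:

```python
import math

def norm2(v):
    """2-norm: ||v|| = sqrt(v^T v) = sqrt(sum of squared entries)."""
    return math.sqrt(sum(vi * vi for vi in v))

assert abs(norm2([1, 3, 6]) - math.sqrt(46)) < 1e-12   # Example 16.1.7
assert abs(norm2([2, -1, 0]) - math.sqrt(5)) < 1e-12
```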
Example 16.1.8 (triangle inequality). Let us illustrate the triangle inequality using two vectors
$$v = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} .$$
The lengths (or the norms) of the vectors are
$$\|v\| = \sqrt{\frac{10}{9}} \approx 1.054 \quad \text{and} \quad \|w\| = \sqrt{\frac{5}{4}} \approx 1.118 .$$
On the other hand, the sum of the two vectors is
$$v + w = \begin{pmatrix} 1 \\ 1/3 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 4/3 \end{pmatrix} ,$$
Figure 16.2: Illustration of the triangle inequality.

and its length is
$$\|v + w\| = \sqrt{\frac{145}{36}} \approx 2.007 .$$
The norm of the sum is shorter than the sum of the norms, which is
$$\|v\| + \|w\| \approx 2.172 .$$
This inequality is illustrated in Figure 16.2. Clearly, the length of $v + w$ is strictly less than the sum of the lengths of $v$ and $w$ (unless $v$ and $w$ align with each other, in which case we obtain equality).
In two dimensions, the inner product can be interpreted as
$$v^{\rm T} w = \|v\| \|w\| \cos(\theta) , \qquad (16.1)$$
where $\theta$ is the angle between $v$ and $w$. In other words, the inner product is a measure of how well $v$ and $w$ align with each other. Note that we can show the Cauchy-Schwarz inequality from the above equality. Namely, $|\cos(\theta)| \le 1$ implies that
$$|v^{\rm T} w| = \|v\| \|w\| |\cos(\theta)| \le \|v\| \|w\| .$$
In particular, we see that the inequality holds with equality if and only if $\theta = 0$ or $\pi$, which corresponds to the cases where $v$ and $w$ align. It is easy to demonstrate Eq. (16.1) in two dimensions.
Proof. Noting $v, w \in \mathbb{R}^2$, we express them in polar coordinates,
$$v = \|v\| \begin{pmatrix} \cos(\theta_v) \\ \sin(\theta_v) \end{pmatrix} \quad \text{and} \quad w = \|w\| \begin{pmatrix} \cos(\theta_w) \\ \sin(\theta_w) \end{pmatrix} .$$
The inner product of the two vectors yields
$$\beta = v^{\rm T} w = \sum_{i=1}^2 v_i w_i = \|v\|\cos(\theta_v)\,\|w\|\cos(\theta_w) + \|v\|\sin(\theta_v)\,\|w\|\sin(\theta_w)$$
$$= \|v\|\|w\| \bigl( \cos(\theta_v)\cos(\theta_w) + \sin(\theta_v)\sin(\theta_w) \bigr)$$
$$= \|v\|\|w\| \left( \frac{1}{2}\bigl(e^{i\theta_v} + e^{-i\theta_v}\bigr)\frac{1}{2}\bigl(e^{i\theta_w} + e^{-i\theta_w}\bigr) + \frac{1}{2i}\bigl(e^{i\theta_v} - e^{-i\theta_v}\bigr)\frac{1}{2i}\bigl(e^{i\theta_w} - e^{-i\theta_w}\bigr) \right)$$
$$= \|v\|\|w\| \left( \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} + e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr) - \frac{1}{4}\bigl(e^{i(\theta_v+\theta_w)} + e^{-i(\theta_v+\theta_w)} - e^{i(\theta_v-\theta_w)} - e^{-i(\theta_v-\theta_w)}\bigr) \right)$$
$$= \|v\|\|w\| \, \frac{1}{2}\bigl(e^{i(\theta_v-\theta_w)} + e^{-i(\theta_v-\theta_w)}\bigr)$$
$$= \|v\|\|w\| \cos(\theta_v - \theta_w) = \|v\|\|w\| \cos(\theta) ,$$
where the last equality follows from the definition $\theta \equiv \theta_v - \theta_w$.
Begin Advanced Material

For completeness, let us introduce a more general class of norms.

Example 16.1.9 (p-norms). The 2-norm, which we will almost exclusively use, belongs to a more general class of norms, called the p-norms. The p-norm of a vector $v \in \mathbb{R}^m$ is
$$\|v\|_p = \left( \sum_{i=1}^m |v_i|^p \right)^{1/p} ,$$
where $p \ge 1$. Any p-norm satisfies the positivity requirement, the scalar scaling requirement, and the triangle inequality. We see that the 2-norm is a case of the p-norm with $p = 2$.

Another case of the p-norm that we frequently encounter is the 1-norm, which is simply the sum of the absolute values of the entries, i.e.,
$$\|v\|_1 = \sum_{i=1}^m |v_i| .$$
The other one is the infinity norm given by
$$\|v\|_\infty = \lim_{p \to \infty} \|v\|_p = \max_{i=1,\dots,m} |v_i| .$$
In other words, the infinity norm of a vector is its largest entry in absolute value.

End Advanced Material
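The whole p-norm family fits in one function, with the infinity norm treated as the limiting case. A Python sketch (the helper name is ours):

```python
import math

def pnorm(v, p):
    """p-norm for p >= 1; p = float('inf') returns the largest |v_i|."""
    if math.isinf(p):
        return max(abs(vi) for vi in v)
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

v = [1, -3, 6]
print(pnorm(v, 1))             # 10.0 (sum of absolute values)
print(pnorm(v, float('inf')))  # 6   (largest entry in absolute value)
```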
Figure 16.3: Set of vectors considered to illustrate orthogonality.

Orthogonality

Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthogonal to each other if
$$v^{\rm T} w = 0 .$$
In two dimensions, it is easy to see that
$$v^{\rm T} w = \|v\|\|w\|\cos(\theta) = 0 \quad \Rightarrow \quad \cos(\theta) = 0 \quad \Rightarrow \quad \theta = \pi/2 .$$
That is, the angle between $v$ and $w$ is $\pi/2$, which is the definition of orthogonality in the usual geometric sense.

Example 16.1.10 (orthogonality). Let us consider three vectors in $\mathbb{R}^2$,
$$u = \begin{pmatrix} -4 \\ 2 \end{pmatrix}, \quad v = \begin{pmatrix} 3 \\ 6 \end{pmatrix}, \quad \text{and} \quad w = \begin{pmatrix} 0 \\ 5 \end{pmatrix} ,$$
and compute three inner products formed by these vectors:
$$u^{\rm T} v = -4\cdot 3 + 2\cdot 6 = 0$$
$$u^{\rm T} w = -4\cdot 0 + 2\cdot 5 = 10$$
$$v^{\rm T} w = 3\cdot 0 + 6\cdot 5 = 30 .$$
Because $u^{\rm T} v = 0$, the vectors $u$ and $v$ are orthogonal to each other. On the other hand, $u^{\rm T} w \ne 0$ and the vectors $u$ and $w$ are not orthogonal to each other. Similarly, $v$ and $w$ are not orthogonal to each other. These vectors are plotted in Figure 16.3; the figure confirms that $u$ and $v$ are orthogonal in the usual geometric sense.
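The orthogonality test is just a vanishing inner product. A Python sketch (the helper name is ours) reproducing Example 16.1.10:

```python
def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

u, v, w = [-4, 2], [3, 6], [0, 5]
print(inner(u, v))   # 0  -> u and v are orthogonal
print(inner(u, w))   # 10 -> u and w are not orthogonal
print(inner(v, w))   # 30 -> v and w are not orthogonal
```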
Figure 16.4: An orthonormal set of vectors.

Orthonormality

Two vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^m$ are said to be orthonormal to each other if they are orthogonal to each other and each has unit length, i.e.,
$$v^{\rm T} w = 0 \quad \text{and} \quad \|v\| = \|w\| = 1 .$$

Example 16.1.11 (orthonormality). Two vectors
$$u = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
are orthonormal to each other. It is straightforward to verify that they are orthogonal to each other,
$$u^{\rm T} v = \frac{1}{5}\begin{pmatrix} -2 \\ 1 \end{pmatrix}^{\rm T}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{5}\bigl(-2\cdot 1 + 1\cdot 2\bigr) = 0 ,$$
and that each of them has unit length,
$$\|u\| = \frac{1}{\sqrt{5}}\sqrt{(-2)^2 + 1^2} = 1 , \qquad \|v\| = \frac{1}{\sqrt{5}}\sqrt{1^2 + 2^2} = 1 .$$
Figure 16.4 shows that the vectors are orthogonal and have unit length in the usual geometric sense.
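Orthonormality adds a unit-length check to orthogonality; in floating point, both comparisons need a tolerance. A Python sketch (the helper names are ours) for the pair in Example 16.1.11:

```python
import math

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm2(v):
    return math.sqrt(inner(v, v))

s = 1.0 / math.sqrt(5.0)
u = [-2.0 * s, 1.0 * s]
v = [1.0 * s, 2.0 * s]
assert abs(inner(u, v)) < 1e-12       # orthogonal
assert abs(norm2(u) - 1.0) < 1e-12    # unit length
assert abs(norm2(v) - 1.0) < 1e-12    # unit length
```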
16.1.3 Linear Combinations

Let us consider a set of $n$ $m$-vectors,
$$v^1 \in \mathbb{R}^m, \; v^2 \in \mathbb{R}^m, \; \dots, \; v^n \in \mathbb{R}^m .$$
A linear combination of the vectors is given by
$$w = \sum_{j=1}^n \alpha^j v^j ,$$
where $\alpha^1, \alpha^2, \dots, \alpha^n$ is a set of real numbers, and each $v^j$ is an $m$-vector.
Example 16.1.12 (linear combination of vectors). Let us consider three vectors in $\mathbb{R}^2$, $v^1 = (-4, 2)^{\rm T}$, $v^2 = (3, 6)^{\rm T}$, and $v^3 = (0, 5)^{\rm T}$. A linear combination of the vectors, with $\alpha^1 = 1$, $\alpha^2 = -2$, and $\alpha^3 = 3$, is
$$w = \sum_{j=1}^3 \alpha^j v^j = 1\begin{pmatrix} -4 \\ 2 \end{pmatrix} + (-2)\begin{pmatrix} 3 \\ 6 \end{pmatrix} + 3\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \end{pmatrix} + \begin{pmatrix} -6 \\ -12 \end{pmatrix} + \begin{pmatrix} 0 \\ 15 \end{pmatrix} = \begin{pmatrix} -10 \\ 5 \end{pmatrix} .$$
Another example of a linear combination, with $\alpha^1 = 1$, $\alpha^2 = 0$, and $\alpha^3 = 0$, is
$$w = \sum_{j=1}^3 \alpha^j v^j = 1\begin{pmatrix} -4 \\ 2 \end{pmatrix} + 0\begin{pmatrix} 3 \\ 6 \end{pmatrix} + 0\begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \end{pmatrix} .$$
Note that a linear combination of a set of vectors is simply a weighted sum of the vectors.

Linear Independence

A set of $n$ $m$-vectors is linearly independent if
$$\sum_{j=1}^n \alpha^j v^j = 0 \quad \text{only if} \quad \alpha^1 = \alpha^2 = \cdots = \alpha^n = 0 .$$
Otherwise, the vectors are linearly dependent.
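The weighted sum defining a linear combination can be sketched in a few lines of Python (the helper name is ours); the numbers reproduce Example 16.1.12:

```python
def linear_combination(alphas, vecs):
    """w = sum_j alpha_j * v_j for m-vectors v_j (as lists)."""
    m = len(vecs[0])
    w = [0] * m
    for a, v in zip(alphas, vecs):
        for i in range(m):
            w[i] += a * v[i]
    return w

# Example 16.1.12: 1*v1 - 2*v2 + 3*v3
w = linear_combination([1, -2, 3], [[-4, 2], [3, 6], [0, 5]])
print(w)   # [-10, 5]
```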
Example 16.1.13 (linear independence). Let us consider four vectors,
$$w^1 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}, \quad w^2 = \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}, \quad w^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad w^4 = \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix} .$$
The set of vectors $\{w^1, w^2, w^4\}$ is linearly dependent because
$$1\cdot w^1 + \frac{5}{3}\cdot w^2 - 1\cdot w^4 = 1\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \frac{5}{3}\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} ;$$
the linear combination with the weights $\{1, 5/3, -1\}$ produces the zero vector. Note that the choice of the weights that achieves this is not unique; we just need to find one set of weights to show that the vectors are not linearly independent (i.e., are linearly dependent).

On the other hand, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent. Considering a linear combination,
$$\alpha^1 w^1 + \alpha^2 w^2 + \alpha^3 w^3 = \alpha^1\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + \alpha^2\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} + \alpha^3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} ,$$
we see that we must choose $\alpha^1 = 0$ to set the first component to 0, $\alpha^2 = 0$ to set the third component to 0, and $\alpha^3 = 0$ to set the second component to 0. Thus, the only way to satisfy the equation is to choose the trivial set of weights, $\{0, 0, 0\}$. Thus, the set of vectors $\{w^1, w^2, w^3\}$ is linearly independent.
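A set of $n$ vectors can be checked for linear independence numerically by computing the rank of the matrix whose rows are the vectors: the set is independent exactly when the rank equals $n$. The text defers such machinery to later chapters; the sketch below is our own, a naive Gaussian elimination with a fixed pivot tolerance:

```python
def rank(vectors, tol=1e-12):
    """Rank of the set via Gaussian elimination on rows; the set of n
    m-vectors is linearly independent iff its rank equals n."""
    rows = [list(map(float, v)) for v in vectors]
    r, m = 0, len(rows[0])
    for col in range(m):
        # find a pivot row for this column among the unprocessed rows
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][col]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            for j in range(col, m):
                rows[i][j] -= f * rows[r][j]
        r += 1
    return r

# Example 16.1.13
w1, w2, w3, w4 = [2, 0, 0], [0, 0, 3], [0, 1, 0], [2, 0, 5]
print(rank([w1, w2, w4]))  # 2 -> 3 vectors of rank 2: linearly dependent
print(rank([w1, w2, w3]))  # 3 -> linearly independent
```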
Begin Advanced Material
Vector Spaces and Bases

Given a set of $n$ $m$-vectors, we can construct a vector space, $V$, given by $V = {\rm span}(\{v^1, v^2, \dots, v^n\})$, where
$${\rm span}(\{v^1, v^2, \dots, v^n\}) = \left\{ v \in \mathbb{R}^m : v = \sum_{k=1}^n \alpha^k v^k , \; \alpha^k \in \mathbb{R} \right\}$$
$$= \text{space of vectors which are linear combinations of } v^1, v^2, \dots, v^n .$$
In general we do not require the vectors $\{v^1, \dots, v^n\}$ to be linearly independent. When they are linearly independent, they are said to be a basis of the space. In other words, a basis of the vector space $V$ is a set of linearly independent vectors that spans the space. As we will see shortly in our example, there are many bases for any space. However, the number of vectors in any basis for a given space is unique, and that number is called the dimension of the space. Let us demonstrate the idea in a simple example.

Example 16.1.14 (bases for a vector space in $\mathbb{R}^3$). Let us consider a vector space $V$ spanned by vectors
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} .$$
By definition, any vector $x \in V$ is of the form
$$x = \alpha^1 v^1 + \alpha^2 v^2 + \alpha^3 v^3 = \alpha^1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha^2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + \alpha^3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha^1 + 2\alpha^2 \\ 2\alpha^1 + \alpha^2 + \alpha^3 \\ 0 \end{pmatrix} .$$
Clearly, we can express any vector of the form $x = (x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha^1$, $\alpha^2$, and $\alpha^3$ judiciously. Thus, our vector space consists of vectors of the form $(x_1, x_2, 0)^{\rm T}$, i.e., all vectors in $\mathbb{R}^3$ with zero in the third entry.

We also note that the selection of coefficients that achieves $(x_1, x_2, 0)^{\rm T}$ is not unique, as it requires solution to a system of two linear equations with three unknowns. The non-uniqueness of the coefficients is a direct consequence of $\{v^1, v^2, v^3\}$ not being linearly independent. We can easily verify the linear dependence by considering a non-trivial linear combination such as
$$2v^1 - v^2 - 3v^3 = 2\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} - 1\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} .$$
Because the vectors are not linearly independent, they do not form a basis of the space. To choose a basis for the space, we first note that vectors in the space $V$ are of the form $(x_1, x_2, 0)^{\rm T}$. We observe that, for example,
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
would span the space, because any vector in $V$ (a vector of the form $(x_1, x_2, 0)^{\rm T}$) can be expressed as a linear combination,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha^1 w^1 + \alpha^2 w^2 = \alpha^1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha^2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha^1 \\ \alpha^2 \\ 0 \end{pmatrix} ,$$
by choosing $\alpha^1 = x_1$ and $\alpha^2 = x_2$. Moreover, $w^1$ and $w^2$ are linearly independent. Thus, $\{w^1, w^2\}$ is a basis for the space $V$. Unlike the set $\{v^1, v^2, v^3\}$, which is not a basis, the coefficients for $\{w^1, w^2\}$ that yield $x \in V$ are unique. Because the basis consists of two vectors, the dimension of $V$ is two. This is succinctly written as
$$\dim(V) = 2 .$$
Because a basis for a given space is not unique, we can pick a different set of vectors. For example,
$$z^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad z^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$$
is also a basis for $V$. Since $z^1$ is not a constant multiple of $z^2$, it is clear that they are linearly independent. We need to verify that they span the space $V$. We can verify this by a direct argument,
$$\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \alpha^1 z^1 + \alpha^2 z^2 = \alpha^1\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha^2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha^1 + 2\alpha^2 \\ 2\alpha^1 + \alpha^2 \\ 0 \end{pmatrix} .$$
We see that, for any $(x_1, x_2, 0)^{\rm T}$, we can find the linear combination of $z^1$ and $z^2$ by choosing the coefficients $\alpha^1 = (-x_1 + 2x_2)/3$ and $\alpha^2 = (2x_1 - x_2)/3$. Again, the coefficients that represent $x$
using $\{z^1, z^2\}$ are unique. For the space $V$, and for any given basis, we can find a unique set of two coefficients to represent any vector $x \in V$. In other words, any vector in $V$ is uniquely described by two coefficients, or parameters. Thus, a basis provides a parameterization of the vector space $V$. The dimension of the space is two, because the basis has two vectors, i.e., the vectors in the space are uniquely described by two parameters.
While there are many bases for any space, there are certain bases that are more convenient to work with than others. Orthonormal bases (bases consisting of orthonormal sets of vectors) are such a class of bases. We recall that two vectors are orthogonal to each other if their inner product vanishes. In order for a set of vectors $\{v^1, \dots, v^n\}$ to be orthogonal, the vectors must satisfy
$$(v^i)^{\rm T} v^j = 0 , \quad i \ne j .$$
In other words, the vectors are mutually orthogonal. An orthonormal set of vectors is an orthogonal set of vectors with each vector having norm unity. That is, the set of vectors $\{v^1, \dots, v^n\}$ is mutually orthonormal if
$$(v^i)^{\rm T} v^j = 0 , \quad i \ne j , \qquad \|v^i\| = \sqrt{(v^i)^{\rm T} v^i} = 1 , \quad i = 1, \dots, n .$$
We note that an orthonormal set of vectors is linearly independent by construction, as we now prove.

Proof. Let $\{v^1, \dots, v^n\}$ be an orthogonal set of (non-zero) vectors. By definition, the set of vectors is linearly independent if the only linear combination that yields the zero vector corresponds to all coefficients equal to zero, i.e.,
$$\alpha^1 v^1 + \cdots + \alpha^n v^n = 0 \quad \Rightarrow \quad \alpha^1 = \cdots = \alpha^n = 0 .$$
To verify this indeed is the case for any orthogonal set of vectors, we perform the inner product of the linear combination with $v^1, \dots, v^n$ to obtain
$$(v^i)^{\rm T}(\alpha^1 v^1 + \cdots + \alpha^n v^n) = \alpha^1 (v^i)^{\rm T} v^1 + \cdots + \alpha^i (v^i)^{\rm T} v^i + \cdots + \alpha^n (v^i)^{\rm T} v^n = \alpha^i \|v^i\|^2 , \quad i = 1, \dots, n .$$
Note that $(v^i)^{\rm T} v^j = 0$, $i \ne j$, due to orthogonality. Thus, setting the linear combination equal to zero requires
$$\alpha^i \|v^i\|^2 = 0 , \quad i = 1, \dots, n .$$
In other words, $\alpha^i = 0$ or $\|v^i\|^2 = 0$ for each $i$. If we restrict ourselves to a set of non-zero vectors, then we must have $\alpha^i = 0$. Thus, a vanishing linear combination requires $\alpha^1 = \cdots = \alpha^n = 0$, which is the definition of linear independence.

Because an orthogonal set of vectors is linearly independent by construction, an orthonormal basis for a space $V$ is an orthonormal set of vectors that spans $V$. One advantage of using an orthonormal basis is that finding the coefficients for any vector in $V$ is straightforward. Suppose we have a basis $\{v^1, \dots, v^n\}$ and wish to find the coefficients $\alpha^1, \dots, \alpha^n$ that result in $x \in V$. That is, we are looking for the coefficients such that
$$x = \alpha^1 v^1 + \cdots + \alpha^i v^i + \cdots + \alpha^n v^n .$$
To find the $i$-th coefficient $\alpha^i$, we simply consider the inner product with $v^i$, i.e.,
$$(v^i)^{\rm T} x = (v^i)^{\rm T}(\alpha^1 v^1 + \cdots + \alpha^i v^i + \cdots + \alpha^n v^n) = \alpha^1 (v^i)^{\rm T} v^1 + \cdots + \alpha^i (v^i)^{\rm T} v^i + \cdots + \alpha^n (v^i)^{\rm T} v^n = \alpha^i (v^i)^{\rm T} v^i = \alpha^i \|v^i\|^2 = \alpha^i , \quad i = 1, \dots, n ,$$
where the last equality follows from $\|v^i\|^2 = 1$. That is, $\alpha^i = (v^i)^{\rm T} x$, $i = 1, \dots, n$. In particular, for an orthonormal basis, we simply need to perform $n$ inner products to find the $n$ coefficients. This is in contrast to an arbitrary basis, which requires a solution to an $n \times n$ linear system (which is significantly more costly, as we will see later).
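The coefficient formula $\alpha^i = (v^i)^{\rm T} x$ means that expanding $x$ in an orthonormal basis costs only $n$ inner products. A Python sketch (the helper names and the test vector are our own) using the second orthonormal basis of Example 16.1.15:

```python
import math

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

# orthonormal basis of the x1-x2 plane in R^3 (second basis of Example 16.1.15)
s = 1.0 / math.sqrt(5.0)
w1 = [1.0 * s, 2.0 * s, 0.0]
w2 = [2.0 * s, -1.0 * s, 0.0]

x = [3.0, 1.0, 0.0]                        # a vector in the spanned space V
alphas = [inner(w, x) for w in (w1, w2)]   # n inner products give the n coefficients

# reconstruct x = alpha^1 w^1 + alpha^2 w^2
xr = [alphas[0] * a + alphas[1] * b for a, b in zip(w1, w2)]
assert all(abs(a - b) < 1e-12 for a, b in zip(x, xr))
```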
Example 16.1.15 (orthonormal basis). Let us consider the vector space $V$ spanned by
$$v^1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v^2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad v^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} .$$
Recalling every vector in $V$ is of the form $(x_1, x_2, 0)^{\rm T}$, a set of vectors
$$w^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
forms an orthonormal basis of the space. It is trivial to verify they are orthonormal, as they are orthogonal, i.e., $(w^1)^{\rm T} w^2 = 0$, and each vector is of unit length, $\|w^1\| = \|w^2\| = 1$. We also see that we can express any vector of the form $(x_1, x_2, 0)^{\rm T}$ by choosing the coefficients $\alpha^1 = x_1$ and $\alpha^2 = x_2$. Thus, $\{w^1, w^2\}$ spans the space. Because the set of vectors spans the space and is orthonormal (and hence linearly independent), it is an orthonormal basis of the space $V$.

Another orthonormal basis is formed by
$$w^1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad \text{and} \quad w^2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} .$$
We can easily verify that they are orthogonal and each has a unit length. The coefficients for an arbitrary vector $x = (x_1, x_2, 0)^{\rm T} \in V$ represented in the basis $\{w^1, w^2\}$ are
$$\alpha^1 = (w^1)^{\rm T} x = \frac{1}{\sqrt{5}}(x_1 + 2x_2) , \qquad \alpha^2 = (w^2)^{\rm T} x = \frac{1}{\sqrt{5}}(2x_1 - x_2) .$$
End Advanced Material

16.2 Matrix Operations

16.2.1 Interpretation of Matrices

Recall that a matrix $A \in \mathbb{R}^{m\times n}$ consists of $m$ rows and $n$ columns for the total of $mn$ entries,
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} .$$
This matrix can be interpreted in a column-centric manner as a set of $n$ column $m$-vectors. Alternatively, the matrix can be interpreted in a row-centric manner as a set of $m$ row $n$-vectors. Each of these interpretations is useful for understanding matrix operations, which are covered next.
16.2.2 Matrix Operations

The first matrix operation we consider is multiplication of a matrix A ∈ R^{m1×n1} by a scalar α ∈ R. The operation yields

B = α A ,

where each entry of B ∈ R^{m1×n1} is given by

B_ij = α A_ij , i = 1, . . . , m1, j = 1, . . . , n1 .

Similar to the multiplication of a vector by a scalar, the multiplication of a matrix by a scalar scales each entry of the matrix.

The second operation we consider is addition of two matrices A ∈ R^{m1×n1} and B ∈ R^{m2×n2}. The addition yields

C = A + B ,

where each entry of C ∈ R^{m1×n1} is given by

C_ij = A_ij + B_ij , i = 1, . . . , m1, j = 1, . . . , n1 .

In order for addition of two matrices to make sense, the matrices must have the same dimensions, m1 = m2 and n1 = n2.

We can combine the scalar scaling and addition operations. Let A ∈ R^{m1×n1}, B ∈ R^{m1×n1}, and α ∈ R. Then, the operation

C = A + α B

yields a matrix C ∈ R^{m1×n1} whose entries are given by

C_ij = A_ij + α B_ij , i = 1, . . . , m1, j = 1, . . . , n1 .

Note that the scalar-matrix multiplication and matrix-matrix addition operations treat the matrices as arrays of numbers, operating entry by entry. This is unlike the matrix-matrix product, which is introduced next after an example of matrix scaling and addition.

Example 16.2.1 matrix scaling and addition
Consider the following matrices and scalar,
A = [ 1   3 ]       B = [ 0   2 ]
    [ −4  9 ] ,         [ 2  −3 ] ,  and  α = 2 .
    [ 0  −3 ]           [ π  −4 ]

Then,

C = A + α B = [ 1   3 ]     [ 0   2 ]   [ 1    7   ]
              [ −4  9 ] + 2 [ 2  −3 ] = [ 0    3   ]
              [ 0  −3 ]     [ π  −4 ]   [ 2π  −11  ] .
Matrix-Matrix Product

Let us consider two matrices A ∈ R^{m1×n1} and B ∈ R^{m2×n2} with n1 = m2. The matrix-matrix product of the matrices results in

C = A B

with

C_ij = Σ_{k=1}^{n1} A_ik B_kj , i = 1, . . . , m1, j = 1, . . . , n2 .
Because the summation applies to the second index of A and the first index of B, the number of columns of A must match the number of rows of B: n1 = m2 must be true. Let us consider a few examples.

Example 16.2.2 matrix-matrix product
Let us consider matrices A ∈ R^{3×2} and B ∈ R^{2×3} with

A = [ 1   3 ]          B = [ 2  3  −5 ]
    [ −4  9 ]   and        [ 1  0  −1 ] .
    [ 0  −3 ]

The matrix-matrix product yields

C = A B = [ 1   3 ] [ 2  3  −5 ]   [  5    3   −8 ]
          [ −4  9 ] [ 1  0  −1 ] = [  1  −12   11 ]
          [ 0  −3 ]                [ −3    0    3 ] ,

where each entry is calculated as

C11 = Σ_{k=1}^{2} A_1k B_k1 = A11 B11 + A12 B21 = 1·2 + 3·1 = 5
C12 = Σ_{k=1}^{2} A_1k B_k2 = A11 B12 + A12 B22 = 1·3 + 3·0 = 3
C13 = Σ_{k=1}^{2} A_1k B_k3 = A11 B13 + A12 B23 = 1·(−5) + 3·(−1) = −8
C21 = Σ_{k=1}^{2} A_2k B_k1 = A21 B11 + A22 B21 = −4·2 + 9·1 = 1
⋮
C33 = Σ_{k=1}^{2} A_3k B_k3 = A31 B13 + A32 B23 = 0·(−5) + (−3)·(−1) = 3 .

Note that because A ∈ R^{3×2} and B ∈ R^{2×3}, C ∈ R^{3×3}. This is very different from

D = B A = [ 2  3  −5 ] [ 1   3 ]   [ −10  48 ]
          [ 1  0  −1 ] [ −4  9 ] = [   1   6 ] ,
                       [ 0  −3 ]
where each entry is calculated as

D11 = Σ_{k=1}^{3} B_1k A_k1 = B11 A11 + B12 A21 + B13 A31 = 2·1 + 3·(−4) + (−5)·0 = −10
⋮
D22 = Σ_{k=1}^{3} B_2k A_k2 = B21 A12 + B22 A22 + B23 A32 = 1·3 + 0·9 + (−1)·(−3) = 6 .

Note that because B ∈ R^{2×3} and A ∈ R^{3×2}, D ∈ R^{2×2}. Clearly, C = A B ≠ B A = D; C and D in fact have different dimensions. Thus, the matrix-matrix product is not commutative in general, even if both A B and B A make sense.
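The non-commutativity above can be demonstrated with a minimal sketch that implements the index formula C_ij = Σ_k A_ik B_kj directly with plain Python lists (the notes use Matlab; Python is used here just to illustrate):

```python
# Matrices from Example 16.2.2
A = [[1, 3], [-4, 9], [0, -3]]          # 3x2
B = [[2, 3, -5], [1, 0, -1]]            # 2x3

def matmul(X, Y):
    """C_ij = sum_k X_ik Y_kj; requires cols(X) == rows(Y)."""
    m, n, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

C = matmul(A, B)   # 3x3
D = matmul(B, A)   # 2x2
print(C)  # [[5, 3, -8], [1, -12, 11], [-3, 0, 3]]
print(D)  # [[-10, 48], [1, 6]]
```

AB and BA do not even have the same shape here, so commutativity fails trivially.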
Example 16.2.3 inner product as matrix-matrix product
The inner product of two vectors can be considered as a special case of matrix-matrix product. Let

v = ( 1 )          w = ( −2 )
    ( 3 )   and        (  0 ) .
    ( 6 )              (  4 )

We have v, w ∈ R^3 (= R^{3×1}). Taking the transpose, we have v^T ∈ R^{1×3}. Noting that the second dimension of v^T and the first dimension of w match, we can perform the matrix-matrix product,

β = v^T w = ( 1  3  6 ) ( −2 )
                        (  0 ) = 1·(−2) + 3·0 + 6·4 = 22 .
                        (  4 )
Example 16.2.4 outer product
The outer product of two vectors is yet another special case of matrix-matrix product. The outer product B of two vectors v ∈ R^m and w ∈ R^m is defined as

B = v w^T .

Because v ∈ R^{m×1} and w^T ∈ R^{1×m}, the matrix-matrix product v w^T is well-defined and yields an m × m matrix. As in the previous example, let

v = ( 1 )          w = ( −2 )
    ( 3 )   and        (  0 ) .
    ( 6 )              (  4 )

The outer product of the two vectors is given by

w v^T = ( −2 ) ( 1  3  6 ) = [ −2  −6  −12 ]
        (  0 )               [  0   0    0 ]
        (  4 )               [  4  12   24 ] .

Clearly, β = v^T w ≠ w v^T = B, as they even have different dimensions.
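Both special cases can be sketched in a few lines of plain Python (an illustration, not the notes' Matlab code):

```python
v = [1, 3, 6]
w = [-2, 0, 4]

# Inner product: (1 x m)(m x 1) -> a scalar
beta = sum(vi * wi for vi, wi in zip(v, w))

# Outer product: (m x 1)(1 x m) -> an m x m matrix; here B = w v^T as in the example
B = [[wi * vj for vj in v] for wi in w]

print(beta)  # 22
print(B)     # [[-2, -6, -12], [0, 0, 0], [4, 12, 24]]
```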
In the above example, we saw that A B ≠ B A in general. In fact, A B might not even be allowed even if B A is allowed (consider A ∈ R^{2×1} and B ∈ R^{3×2}). However, although the matrix-matrix product is not commutative in general, the matrix-matrix product is associative, i.e.

A B C = A (B C) = (A B) C .

Moreover, the matrix-matrix product is also distributive, i.e.

(A + B) C = A C + B C .
Proof. The associative and distributive properties of the matrix-matrix product are readily proven from its definition. For associativity, we consider the ij-entry of the m1 × n3 matrix A(BC), i.e.

(A(BC))_ij = Σ_{k=1}^{n1} A_ik (BC)_kj = Σ_{k=1}^{n1} A_ik ( Σ_{l=1}^{n2} B_kl C_lj ) = Σ_{k=1}^{n1} Σ_{l=1}^{n2} A_ik B_kl C_lj = Σ_{l=1}^{n2} Σ_{k=1}^{n1} A_ik B_kl C_lj
           = Σ_{l=1}^{n2} ( Σ_{k=1}^{n1} A_ik B_kl ) C_lj = Σ_{l=1}^{n2} (AB)_il C_lj = ((AB)C)_ij , ∀ i, j .

Since the equality (A(BC))_ij = ((AB)C)_ij holds for all entries, we have A(BC) = (AB)C.

The distributive property can also be proven directly. The ij-entry of (A + B)C can be expressed as

((A + B)C)_ij = Σ_{k=1}^{n1} (A + B)_ik C_kj = Σ_{k=1}^{n1} (A_ik + B_ik) C_kj = Σ_{k=1}^{n1} (A_ik C_kj + B_ik C_kj)
              = Σ_{k=1}^{n1} A_ik C_kj + Σ_{k=1}^{n1} B_ik C_kj = (AC)_ij + (BC)_ij , ∀ i, j .

Again, since the equality holds for all entries, we have (A + B)C = AC + BC.
Another useful rule concerning the matrix-matrix product and the transpose operation is

(A B)^T = B^T A^T .

This rule is used very often.

Proof. The proof follows by checking the components of each side. The left-hand side yields

((AB)^T)_ij = (AB)_ji = Σ_{k=1}^{n1} A_jk B_ki .

The right-hand side yields

(B^T A^T)_ij = Σ_{k=1}^{n1} (B^T)_ik (A^T)_kj = Σ_{k=1}^{n1} B_ki A_jk = Σ_{k=1}^{n1} A_jk B_ki .
Thus, we have

((AB)^T)_ij = (B^T A^T)_ij , i = 1, . . . , n2, j = 1, . . . , m1 .
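Both the transpose rule and associativity can be spot-checked numerically with a small plain-Python sketch (the matrices here are arbitrary; C is a made-up 3 × 2 matrix chosen so the shapes conform):

```python
def matmul(X, Y):
    """C_ij = sum_k X_ik Y_kj."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 3], [-4, 9], [0, -3]]   # 3x2
B = [[2, 3, -5], [1, 0, -1]]     # 2x3
C = [[1, 0], [2, 1], [-1, 4]]    # 3x2, an arbitrary example matrix

# (AB)^T == B^T A^T
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs == rhs)  # True

# Associativity: A(BC) == (AB)C (exact, since all entries are integers)
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))  # True
```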
16.2.3 Interpretations of the Matrix-Vector Product

Let us consider a special case of the matrix-matrix product: the matrix-vector product. The special case arises when the second matrix has only one column. Then, with A ∈ R^{m×n} and w = B ∈ R^{n×1} = R^n, we have

C = A B ,

where

C_ij = Σ_{k=1}^{n} A_ik B_kj = Σ_{k=1}^{n} A_ik w_k , i = 1, . . . , m, j = 1 .

Since C ∈ R^{m×1} = R^m, we can introduce v ∈ R^m and concisely write the matrix-vector product as

v = A w ,

where

v_i = Σ_{k=1}^{n} A_ik w_k , i = 1, . . . , m .

Expanding the summation, we can think of the matrix-vector product as

v1 = A11 w1 + A12 w2 + ··· + A1n wn
v2 = A21 w1 + A22 w2 + ··· + A2n wn
⋮
vm = Am1 w1 + Am2 w2 + ··· + Amn wn .

Now, we consider two different interpretations of the matrix-vector product.

Row Interpretation

The first interpretation is the "row" interpretation, where we consider the matrix-vector multiplication as a series of inner products. In particular, we consider v_i as the inner product of the i-th row of A and w. In other words, the vector v is computed entry by entry in the sense that

v_i = ( Ai1  Ai2  ···  Ain ) ( w1 )
                             ( w2 )
                             ( ⋮  )
                             ( wn ) ,  i = 1, . . . , m .
Example 16.2.5 row interpretation of matrix-vector product
An example of the row interpretation of matrix-vector product is

v = [ 0  1  0 ] ( 3 )   [ (0  1  0)(3  2  1)^T ]   ( 2 )
    [ 1  0  0 ] ( 2 ) = [ (1  0  0)(3  2  1)^T ] = ( 3 )
    [ 0  0  0 ] ( 1 )   [ (0  0  0)(3  2  1)^T ]   ( 0 )
    [ 0  0  1 ]         [ (0  0  1)(3  2  1)^T ]   ( 1 ) .
Column Interpretation

The second interpretation is the "column" interpretation, where we consider the matrix-vector multiplication as a sum of n vectors corresponding to the n columns of the matrix, i.e.

v = ( A11 )      ( A12 )            ( A1n )
    ( A21 ) w1 + ( A22 ) w2 + ··· + ( A2n ) wn .
    (  ⋮  )      (  ⋮  )            (  ⋮  )
    ( Am1 )      ( Am2 )            ( Amn )

In this case, we consider v as a linear combination of columns of A with coefficients w. Hence v = Aw is simply another way to write a linear combination of vectors: the columns of A are the vectors, and w contains the coefficients.

Example 16.2.6 column interpretation of matrix-vector product
An example of the column interpretation of matrix-vector product is

v = [ 0  1  0 ] ( 3 )       ( 0 )       ( 1 )       ( 0 )   ( 2 )
    [ 1  0  0 ] ( 2 ) = 3 · ( 1 ) + 2 · ( 0 ) + 1 · ( 0 ) = ( 3 )
    [ 0  0  0 ] ( 1 )       ( 0 )       ( 0 )       ( 0 )   ( 0 )
    [ 0  0  1 ]             ( 0 )       ( 0 )       ( 1 )   ( 1 ) .

Clearly, the outcome of the matrix-vector product is identical to that computed using the row interpretation.
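The two interpretations can be sketched side by side in plain Python (illustrative only; the notes' own examples use Matlab):

```python
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
w = [3, 2, 1]
m, n = len(A), len(w)

# Row interpretation: v_i = inner product of row i of A with w
v_row = [sum(A[i][k] * w[k] for k in range(n)) for i in range(m)]

# Column interpretation: v = w_1*(column 1) + ... + w_n*(column n)
v_col = [0] * m
for j in range(n):
    for i in range(m):
        v_col[i] += w[j] * A[i][j]

print(v_row)  # [2, 3, 0, 1]
print(v_col)  # [2, 3, 0, 1]
```

Both loops visit the same mn products; only the order of accumulation differs.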
Left Vector-Matrix Product

We now consider another special case of the matrix-matrix product: the left vector-matrix product. This special case arises when the first matrix only has one row. Then, we have A ∈ R^{1×m} and B ∈ R^{m×n}. Let us denote the matrix A, which is a row vector, by w^T. Clearly, w ∈ R^m, because w^T ∈ R^{1×m}. The left vector-matrix product yields

v = w^T B ,

where

v_j = Σ_{k=1}^{m} w_k B_kj , j = 1, . . . , n .
The resultant vector v is a row vector in R^{1×n}. The left vector-matrix product can also be interpreted in two different manners. The first interpretation considers the product as a series of dot products, where each entry v_j is computed as a dot product of w with the j-th column of B, i.e.

v_j = ( w1  w2  ···  wm ) ( B1j )
                          ( B2j )
                          (  ⋮  )
                          ( Bmj ) ,  j = 1, . . . , n .

The second interpretation considers the left vector-matrix product as a linear combination of rows of B, i.e.

v = w1 ( B11  B12  ···  B1n ) + w2 ( B21  B22  ···  B2n ) + ··· + wm ( Bm1  Bm2  ···  Bmn ) .
16.2.4 Interpretations of the Matrix-Matrix Product

Similar to the matrix-vector product, the matrix-matrix product can be interpreted in a few different ways. Throughout the discussion, we assume A ∈ R^{m1×n1} and B ∈ R^{n1×n2} and hence C = AB ∈ R^{m1×n2}.

Matrix-Matrix Product as a Series of Matrix-Vector Products

One interpretation of the matrix-matrix product is to consider it as computing C one column at a time, where the j-th column of C results from the matrix-vector product of the matrix A with the j-th column of B, i.e.

C_j = A B_j , j = 1, . . . , n2 ,

where C_j refers to the j-th column of C. In other words,

( C1j  )   [ A11   A12   ···  A1n1  ] ( B1j  )
( C2j  ) = [ A21   A22   ···  A2n1  ] ( B2j  )
(  ⋮   )   [  ⋮     ⋮    ⋱     ⋮    ] (  ⋮   )
( Cm1j )   [ Am11  Am12  ···  Am1n1 ] ( Bn1j ) ,  j = 1, . . . , n2 .

Example 16.2.7 matrix-matrix product as a series of matrix-vector products
Let us consider matrices A ∈ R^{3×2} and B ∈ R^{2×3} with

A = [ 1   3 ]          B = [ 2  3  −5 ]
    [ −4  9 ]   and        [ 1  0  −1 ] .
    [ 0  −3 ]

The first column of C = AB ∈ R^{3×3} is given by

C_1 = A B_1 = [ 1   3 ] ( 2 )   (  5 )
              [ −4  9 ] ( 1 ) = (  1 )
              [ 0  −3 ]         ( −3 ) .
Similarly, the second and third columns are given by

C_2 = A B_2 = [ 1   3 ] ( 3 )   (   3 )
              [ −4  9 ] ( 0 ) = ( −12 )
              [ 0  −3 ]         (   0 )

and

C_3 = A B_3 = [ 1   3 ] ( −5 )   ( −8 )
              [ −4  9 ] ( −1 ) = ( 11 )
              [ 0  −3 ]          (  3 ) .

Putting the columns of C together,

C = ( C_1  C_2  C_3 ) = [  5    3   −8 ]
                        [  1  −12   11 ]
                        [ −3    0    3 ] .
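The column-at-a-time construction above translates directly into a short plain-Python sketch (Python rather than the notes' Matlab, purely as an illustration):

```python
A = [[1, 3], [-4, 9], [0, -3]]
B = [[2, 3, -5], [1, 0, -1]]
m1, n1, n2 = 3, 2, 3

# Build C = AB one column at a time: column j of C is the matrix-vector product A B_j
C = [[0] * n2 for _ in range(m1)]
for j in range(n2):
    Bj = [B[k][j] for k in range(n1)]            # j-th column of B
    for i in range(m1):
        C[i][j] = sum(A[i][k] * Bj[k] for k in range(n1))

print(C)  # [[5, 3, -8], [1, -12, 11], [-3, 0, 3]]
```

The result agrees with the full product computed in Example 16.2.2, as it must.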
Matrix-Matrix Product as a Series of Left Vector-Matrix Products

In the previous interpretation, we performed the matrix-matrix product by constructing the resultant matrix one column at a time. We can also use a series of left vector-matrix products to construct the resultant matrix one row at a time. Namely, in C = AB, the i-th row of C results from the left vector-matrix product of the i-th row of A with the matrix B, i.e.

C_i = A_i B , i = 1, . . . , m1 ,

where C_i refers to the i-th row of C. In other words,

( Ci1  ···  Cin2 ) = ( Ai1  ···  Ain1 ) [ B11   ···  B1n2  ]
                                        [  ⋮     ⋱    ⋮    ]
                                        [ Bn11  ···  Bn1n2 ] ,  i = 1, . . . , m1 .

16.2.5 Operation Count of Matrix-Matrix Product

Matrix-matrix product is ubiquitous in scientific computing, and significant effort has been put into efficient performance of the operation on modern computers. Let us now count the number of additions and multiplications required to compute this product. Consider multiplication of A ∈ R^{m1×n1} and B ∈ R^{n1×n2}. To compute C = AB, we perform

C_ij = Σ_{k=1}^{n1} A_ik B_kj , i = 1, . . . , m1, j = 1, . . . , n2 .

Computing each C_ij requires n1 multiplications and n1 additions, yielding the total of 2n1 operations. We must perform this for m1 n2 entries in C. Thus, the total operation count for computing C is 2 m1 n1 n2. Considering the matrix-vector product and the inner product as special cases of the matrix-matrix product, we can summarize how the operation count scales.
Operation       Sizes                           Operation count
Matrix-matrix   m1 = n1 = m2 = n2 = n           2n^3
Matrix-vector   m1 = n1 = m2 = n, n2 = 1        2n^2
Inner product   n1 = m2 = n, m1 = n2 = 1        2n

The operation count is measured in FLoating Point Operations, or FLOPs. (Note FLOPS is different from FLOPs: FLOPS refers to FLoating Point Operations per Second, which is a "speed" associated with a particular computer/hardware and a particular implementation of an algorithm.)
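The 2 m1 n1 n2 count can be verified by instrumenting a naive triple-loop product, counting one multiply and one add per inner-loop step (a plain-Python sketch for illustration):

```python
# Count additions and multiplications in a naive matrix-matrix product.
def matmul_with_count(A, B):
    m1, n1, n2 = len(A), len(B), len(B[0])
    flops = 0
    C = [[0] * n2 for _ in range(m1)]
    for i in range(m1):
        for j in range(n2):
            s = 0
            for k in range(n1):
                s += A[i][k] * B[k][j]   # one multiply and one add
                flops += 2
            C[i][j] = s
    return C, flops

m1, n1, n2 = 4, 5, 6
A = [[1] * n1 for _ in range(m1)]
B = [[1] * n2 for _ in range(n1)]
C, flops = matmul_with_count(A, B)
print(flops)                       # 240
print(flops == 2 * m1 * n1 * n2)   # True
```

With m1 = n1 = n2 = n this is exactly the 2n^3 entry of the table above.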
16.2.6 The Inverse of a Matrix (Briefly)

We have now studied the matrix-vector product, in which, given a vector x ∈ R^n, we calculate a new vector b = Ax, where A ∈ R^{n×n} and hence b ∈ R^n. We may think of this as a "forward" problem, in which given x we calculate b = Ax. We can now also ask about the corresponding "inverse" problem: given b, can we find x such that Ax = b? Note in this section, and for reasons which shall become clear shortly, we shall exclusively consider square matrices, and hence we set m = n.

To begin, let us revert to the scalar case. If b is a scalar and a is a non-zero scalar, we know that the (very simple linear) equation ax = b has the solution x = b/a. We may write this more suggestively as x = a^{-1} b since of course a^{-1} = 1/a. It is important to note that the equation ax = b has a solution only if a is non-zero; if a is zero, then of course there is no x such that ax = b. (This is not quite true: in fact, if b = 0 and a = 0 then ax = b has an infinity of solutions, any value of x. We discuss this singular but solvable case in more detail in Unit V.)

We can now proceed to the matrix case "by analogy." The matrix equation Ax = b can of course be viewed as a system of linear equations in n unknowns. The first equation states that the inner product of the first row of A with x must equal b1; in general, the i-th equation states that the inner product of the i-th row of A with x must equal b_i. Then if A is "non-zero" we could plausibly expect that x = A^{-1} b. This statement is clearly deficient in two related ways: what do we mean when we say a matrix is non-zero? and what do we in fact mean by A^{-1}?

As regards the first question, Ax = b will have a solution when A is non-singular: non-singular is the proper extension of the scalar concept of "non-zero" in this linear systems context. Conversely, if A is singular then (except for special b) Ax = b will have no solution: singular is the proper extension of the scalar concept of "zero" in this linear systems context. How can we determine if a matrix A is singular? Unfortunately, it is not nearly as simple as verifying, say, that the matrix consists of at least one non-zero entry, or contains all non-zero entries.
There are a variety of ways to determine whether a matrix is non-singular, many of which may only make good sense in later chapters (in particular, in Unit V): a non-singular n × n matrix A has n independent columns (or, equivalently, n independent rows); a non-singular n × n matrix A has all non-zero eigenvalues; a non-singular matrix A has a non-zero determinant (perhaps this condition is closest to the scalar case, but it is also perhaps the least useful); a non-singular matrix A has all non-zero pivots in a (partially pivoted) LU decomposition process (described in Unit V). For now, we shall simply assume that A is non-singular. (We should also emphasize that in the numerical context we must be concerned not only with matrices which might be singular but also with matrices which are "almost" singular in some appropriate sense.) As regards the second question, we must first introduce the identity matrix, I.
Let us now define an identity matrix. The identity matrix is an m × m square matrix with ones on the diagonal and zeros elsewhere, i.e.

I_ij = { 1, i = j
       { 0, i ≠ j .
Identity matrices in R^1, R^2, and R^3 are

I = ( 1 ) ,   I = [ 1  0 ]          I = [ 1  0  0 ]
                  [ 0  1 ] ,  and       [ 0  1  0 ]
                                        [ 0  0  1 ] .

The identity matrix is conventionally denoted by I. If v ∈ R^m, the i-th entry of Iv is

(Iv)_i = Σ_{k=1}^{m} I_ik v_k = 0·v1 + ··· + 0·v_{i−1} + 1·v_i + 0·v_{i+1} + ··· + 0·v_m = v_i , i = 1, . . . , m .

So, we have Iv = v. Following the same argument, we also have v^T I = v^T. In essence, I is the m-dimensional version of "one."
We may then define A^{-1} as that (unique) matrix such that A^{-1} A = I. (Of course in the scalar case, this defines a^{-1} as the unique scalar such that a^{-1} a = 1 and hence a^{-1} = 1/a.) In fact, A^{-1} A = I and also A A^{-1} = I, and thus this is a case in which matrix multiplication does indeed commute. We can now "derive" the result x = A^{-1} b: we begin with Ax = b and multiply both sides by A^{-1} to obtain A^{-1} A x = A^{-1} b or, since the matrix product is associative, x = A^{-1} b. Of course this definition of A^{-1} does not yet tell us how to find A^{-1}: we shall shortly consider this question from a pragmatic Matlab perspective and then in Unit V from a more fundamental numerical linear algebra perspective. We should note here, however, that the matrix inverse is very rarely computed or used in practice, for reasons we will understand in Unit V. Nevertheless, the inverse can be quite useful for very small systems (n small) and of course more generally as a central concept in the consideration of linear systems of equations.

Example 16.2.8 The inverse of a 2 × 2 matrix
We consider here the case of a 2 × 2 matrix A which we write as

A = [ a  b ]
    [ c  d ] .

If the columns are to be independent we must have a/b ≠ c/d or (ad)/(bc) ≠ 1 or ad − bc ≠ 0, which in fact is the condition that the determinant of A is nonzero. The inverse of A is then given by

A^{-1} = 1/(ad − bc) [  d  −b ]
                     [ −c   a ] .

Note that this inverse is only defined if ad − bc ≠ 0, and we thus see the necessity that A is non-singular. It is a simple matter to show by explicit matrix multiplication that A^{-1} A = A A^{-1} = I, as desired.
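The closed-form 2 × 2 inverse above is easy to check numerically; a plain-Python sketch (for illustration, not the notes' Matlab):

```python
def inv2(A):
    """Inverse of a 2x2 matrix via the ad - bc formula (requires ad - bc != 0)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    s = 1.0 / det
    return [[s * d, -s * b], [-s * c, s * a]]

A = [[2.0, 1.0], [1.0, 2.0]]
Ainv = inv2(A)

# Check A^{-1} A = I by explicit matrix multiplication
I = [[sum(Ainv[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(Ainv)  # [[0.666..., -0.333...], [-0.333..., 0.666...]]
print(I)     # ~ [[1, 0], [0, 1]]
```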
16.3 Special Matrices

Let us now introduce a few special matrices that we shall encounter frequently in numerical methods.
16.3.1 Diagonal Matrices

A square matrix A is said to be diagonal if the off-diagonal entries are zero, i.e.

A_ij = 0, i ≠ j .

Example 16.3.1 diagonal matrices
Examples of diagonal matrices are

A = [ 1  0 ]       B = [ 2  0  0 ]
    [ 0  3 ] ,         [ 0  1  0 ] ,  and  C = ( 4 ) .
                       [ 0  0  7 ]

The identity matrix is a special case of a diagonal matrix with all the entries in the diagonal equal to 1. Any 1 × 1 matrix is trivially diagonal as it does not have any off-diagonal entries.
16.3.2 Symmetric Matrices

A square matrix A is said to be symmetric if the off-diagonal entries are symmetric about the diagonal, i.e.

A_ij = A_ji , i = 1, . . . , m , j = 1, . . . , m .

The equivalent statement is that A is not changed by the transpose operation, i.e.

A^T = A .

We note that the identity matrix is a special case of a symmetric matrix. Let us look at a few more examples.

Example 16.3.2 Symmetric matrices
Examples of symmetric matrices are

A = [ 1  2 ]       B = [ 2  π  3 ]
    [ 2  3 ] ,         [ π  1  1 ] ,  and  C = ( 4 ) .
                       [ 3  1  7 ]

Note that any scalar, or a 1 × 1 matrix, is trivially symmetric and unchanged under transpose.
16.3.3 Symmetric Positive Definite Matrices

An m × m square matrix A is said to be symmetric positive definite (SPD) if it is symmetric and furthermore satisfies

v^T A v > 0, ∀ v ∈ R^m (v ≠ 0) .

Before we discuss its properties, let us give an example of an SPD matrix.
Example 16.3.3 Symmetric positive definite matrices
An example of a symmetric positive definite matrix is

A = [  2  −1 ]
    [ −1   2 ] .

We can confirm that A is symmetric by inspection. To check if A is positive definite, let us consider the quadratic form

q(v) ≡ v^T A v = Σ_{i=1}^{2} Σ_{j=1}^{2} A_ij v_i v_j = A11 v1^2 + A12 v1 v2 + A21 v2 v1 + A22 v2^2
             = A11 v1^2 + 2 A12 v1 v2 + A22 v2^2 ,

where the last equality follows from the symmetry condition A12 = A21. Substituting the entries of A,

q(v) = v^T A v = 2 v1^2 − 2 v1 v2 + 2 v2^2 = 2 ( v1 − (1/2) v2 )^2 − (1/2) v2^2 + 2 v2^2 = 2 ( v1 − (1/2) v2 )^2 + (3/2) v2^2 .

Because q(v) is a sum of two positive terms (each squared), it is non-negative. It is equal to zero only if

v1 − (1/2) v2 = 0   and   v2 = 0 .

The second condition requires v2 = 0, and the first condition with v2 = 0 requires v1 = 0. Thus, we have

q(v) = v^T A v > 0, ∀ v ∈ R^2 (v ≠ 0) ,

and v^T A v = 0 if v = 0. Thus A is symmetric positive definite.
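The quadratic form and its completed-square version can be checked against each other on a few sample vectors with a plain-Python sketch (illustration only; sampling does not prove positive definiteness, it merely spot-checks the algebra):

```python
# Quadratic form q(v) = v^T A v for the example matrix A = [[2,-1],[-1,2]]
A = [[2.0, -1.0], [-1.0, 2.0]]

def q(v):
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

# Completed-square form derived in the text: 2*(v1 - v2/2)^2 + (3/2)*v2^2
def q_completed(v):
    return 2.0 * (v[0] - 0.5 * v[1]) ** 2 + 1.5 * v[1] ** 2

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-2.0, 3.0], [0.3, -0.7]]
for v in samples:
    assert abs(q(v) - q_completed(v)) < 1e-12   # the two forms agree
    assert q(v) > 0.0                           # positive for each nonzero sample
print("q(v) > 0 for all sampled v != 0")
```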
Symmetric positive definite matrices are encountered in many areas of engineering and science. They arise naturally in the numerical solution of, for example, the heat equation, the wave equation, and the linear elasticity equations. One important property of symmetric positive definite matrices is that they are always invertible: A^{-1} always exists. Thus, if A is an SPD matrix, then, for any b, there is always a unique x such that

A x = b .

In a later unit, we will discuss techniques for the solution of linear systems, such as the one above. For now, we just note that there are particularly efficient techniques for solving the system when the matrix is symmetric positive definite.
16.3.4 Triangular Matrices

Triangular matrices are square matrices whose entries are all zeros either below or above the diagonal. An m × m square matrix is said to be upper triangular if all entries below the diagonal are zero, i.e.

A_ij = 0, i > j .

A square matrix is said to be lower triangular if all entries above the diagonal are zero, i.e.

A_ij = 0, j > i .

We will see later that a linear system, Ax = b, in which A is a triangular matrix is particularly easy to solve. Furthermore, the linear system is guaranteed to have a unique solution as long as all diagonal entries are nonzero.

Example 16.3.4 triangular matrices
Examples of upper triangular matrices are

A = [ 1  2 ]          B = [ 1  0  2 ]
    [ 0  3 ]   and        [ 0  4  1 ]
                          [ 0  0  3 ] .

Examples of lower triangular matrices are

C = [ 1  0 ]          D = [ 2  0  0 ]
    [ 7  6 ]   and        [ 7  5  0 ]
                          [ 3  1  4 ] .
Begin Advanced Material

16.3.5 Orthogonal Matrices

An m × m square matrix Q is said to be orthogonal if its columns form an orthonormal set. That is, if we denote the j-th column of Q by q_j, we have

Q = ( q1  q2  ···  qm ) ,

where

q_i^T q_j = { 1, i = j
            { 0, i ≠ j .

Orthogonal matrices have a special property,

Q^T Q = I .

This relationship follows directly from the fact that columns of Q form an orthonormal set. Recall that the ij entry of Q^T Q is the inner product of the i-th row of Q^T (which is the i-th column of Q) and the j-th column of Q. Thus,

(Q^T Q)_ij = q_i^T q_j = { 1, i = j
                         { 0, i ≠ j ,
which is the definition of the identity matrix. Orthogonal matrices also satisfy

Q Q^T = I ,

which in fact is a minor miracle.

Example 16.3.5 Orthogonal matrices
Examples of orthogonal matrices are

Q = [ 2/√5  −1/√5 ]          I = [ 1  0 ]
    [ 1/√5   2/√5 ]   and        [ 0  1 ] .

We can easily verify that the columns of the matrix Q are orthogonal to each other and each are of unit length. Thus, Q is an orthogonal matrix. We can also directly confirm that Q^T Q = Q Q^T = I. Similarly, the identity matrix is trivially orthogonal.
Let us discuss a few important properties of orthogonal matrices. First, the action by an orthogonal matrix preserves the 2-norm of a vector, i.e.

‖Qx‖2 = ‖x‖2 , ∀ x ∈ R^m .

This follows directly from the definition of the 2-norm and the fact that Q^T Q = I, i.e.

‖Qx‖2^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T I x = x^T x = ‖x‖2^2 .

Second, orthogonal matrices are always invertible. In fact, solving a linear system defined by an orthogonal matrix is trivial because

Qx = b  ⇒  Q^T Q x = Q^T b  ⇒  x = Q^T b .

In considering linear spaces, we observed that a basis provides a unique description of vectors in V in terms of the coefficients. As the columns of Q form an orthonormal set of m m-vectors, Q can be thought of as a basis of R^m. In solving Qx = b, we are finding the representation of b in the coefficients of {q1, . . . , qm}. Thus, the operation by Q^T (or Q) represents a simple coordinate transformation. Let us solidify this idea by showing that a rotation matrix in R^2 is an orthogonal matrix.

Example 16.3.6 Rotation matrix
Rotation of a vector is equivalent to representing the vector in a rotated coordinate system. A rotation matrix that rotates a vector in R^2 by angle θ is
R(θ) = [ cos(θ)  −sin(θ) ]
       [ sin(θ)   cos(θ) ] .

Let us verify that the rotation matrix is orthogonal for any θ. The two columns are orthogonal because

r1^T r2 = ( cos(θ)  sin(θ) ) ( −sin(θ) )
                             (  cos(θ) ) = −cos(θ) sin(θ) + sin(θ) cos(θ) = 0, ∀ θ .

Each column is of unit length because

‖r1‖2^2 = (cos(θ))^2 + (sin(θ))^2 = 1 ,
‖r2‖2^2 = (−sin(θ))^2 + (cos(θ))^2 = 1, ∀ θ .

Thus, the columns of the rotation matrix are orthonormal, and the matrix is orthogonal. This result verifies that the action of the orthogonal matrix represents a coordinate transformation in R^2. The interpretation of an orthogonal matrix as a coordinate transformation readily extends to higher-dimensional spaces.
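Both claims, R(θ)^T R(θ) = I and norm preservation, can be spot-checked for a particular angle with a plain-Python sketch (the angle 0.73 is an arbitrary choice; any θ works):

```python
import math

def rotation(theta):
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

theta = 0.73
R = rotation(theta)
RtR = matmul(transpose(R), R)   # should be the 2x2 identity (up to rounding)

# Rotation preserves the 2-norm: ||Rx|| = ||x||
x = [3.0, -4.0]
Rx = [R[0][0] * x[0] + R[0][1] * x[1], R[1][0] * x[0] + R[1][1] * x[1]]
print(RtR)                               # ~ [[1, 0], [0, 1]]
print(math.hypot(*Rx), math.hypot(*x))   # both ~ 5.0
```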
16.3.6 Orthonormal Matrices

Let us define orthonormal matrices to be m × n matrices whose columns form an orthonormal set, i.e.

Q = ( q1  q2  ···  qn ) ,

with

q_i^T q_j = { 1, i = j
            { 0, i ≠ j .

Note that, unlike an orthogonal matrix, we do not require the matrix to be square. Just like orthogonal matrices, we have

Q^T Q = I ,

where I is an n × n matrix. The proof is identical to that for the orthogonal matrix. However, Q Q^T does not yield an identity matrix,

Q Q^T ≠ I ,

unless of course m = n.

Example 16.3.7 orthonormal matrices
An example of an orthonormal matrix is

Q = [ 1/√6   2/√5 ]
    [ 2/√6  −1/√5 ]
    [ 1/√6   0    ] .

We can verify that Q^T Q = I because

Q^T Q = [ 1/√6   2/√6  1/√6 ] [ 1/√6   2/√5 ]   [ 1  0 ]
        [ 2/√5  −1/√5  0    ] [ 2/√6  −1/√5 ] = [ 0  1 ] .
                              [ 1/√6   0    ]

However, Q Q^T ≠ I because

Q Q^T = [ 1/√6   2/√5 ] [ 1/√6   2/√6  1/√6 ]   [ 29/30  −1/15  1/6 ]
        [ 2/√6  −1/√5 ] [ 2/√5  −1/√5  0    ] = [ −1/15  13/15  1/3 ]
        [ 1/√6   0    ]                         [ 1/6    1/3    1/6 ] .
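A plain-Python sketch (illustrative, not the notes' Matlab) confirms that Q^T Q is the 2 × 2 identity while Q Q^T is not the 3 × 3 identity:

```python
import math

# The orthonormal (3x2) matrix from the example
Q = [[1 / math.sqrt(6),  2 / math.sqrt(5)],
     [2 / math.sqrt(6), -1 / math.sqrt(5)],
     [1 / math.sqrt(6),  0.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

QtQ = matmul(transpose(Q), Q)   # 2x2: equals I (up to rounding)
QQt = matmul(Q, transpose(Q))   # 3x3: does NOT equal I

print(QtQ)         # ~ [[1, 0], [0, 1]]
print(QQt[0][0])   # 29/30, not 1
```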
End Advanced Material
Begin Advanced Material

16.4 Further Concepts in Linear Algebra

16.4.1 Column Space and Null Space

Let us introduce more concepts in linear algebra. First is the column space. The column space of matrix A is a space of vectors that can be expressed as Ax. From the column interpretation of matrix-vector product, we recall that Ax is a linear combination of the columns of A with the weights provided by x. We will denote the column space of A ∈ R^{m×n} by col(A), and the space is defined as

col(A) = { v ∈ R^m : v = Ax for some x ∈ R^n } .

The column space of A is also called the image of A, img(A), or the range of A, range(A).

The second concept is the null space. The null space of A ∈ R^{m×n} is denoted by null(A) and is defined as

null(A) = { x ∈ R^n : Ax = 0 } ,

i.e., the null space of A is a space of vectors that results in Ax = 0. Recalling the column interpretation of matrix-vector product and the definition of linear independence, we note that the columns of A must be linearly dependent in order for A to have a non-trivial null space. The null space defined above is more formally known as the right null space and is also called the kernel of A, ker(A).

Example 16.4.1 column space and null space
Let us consider a 3 × 2 matrix
A = [ 0  2 ]
    [ 1  0 ]
    [ 0  0 ] .

The column space of A is the set of vectors representable as Ax, which are

Ax = [ 0  2 ] ( x1 )   ( 0 )      ( 2 )      ( 2x2 )
     [ 1  0 ] ( x2 ) = ( 1 ) x1 + ( 0 ) x2 = ( x1  )
     [ 0  0 ]          ( 0 )      ( 0 )      ( 0   ) .

So, the column space of A is a set of vectors with arbitrary values in the first two entries and zero in the third entry. That is, col(A) is the 1-2 plane in R^3.

Because the columns of A are linearly independent, the only way to realize Ax = 0 is if x is the zero vector. Thus, the null space of A consists of the zero vector only.
Let us now consider a 2 × 3 matrix

B = [ 1   2  0 ]
    [ 2  −1  3 ] .

The column space of B consists of vectors of the form

Bx = [ 1   2  0 ] ( x1 )   ( x1 + 2x2        )
     [ 2  −1  3 ] ( x2 ) = ( 2x1 − x2 + 3x3 ) .
                  ( x3 )

By judiciously choosing x1, x2, and x3, we can express any vector in R^2. Thus, the column space of B is the entire R^2, i.e., col(B) = R^2.

Because the columns of B are not linearly independent, we expect B to have a nontrivial null space. Invoking the row interpretation of matrix-vector product, a vector x in the null space must satisfy

x1 + 2x2 = 0   and   2x1 − x2 + 3x3 = 0 .

The first equation requires x1 = −2x2. The combination of the first requirement and the second equation yields x3 = (5/3) x2. Thus, the null space of B is

null(B) = { α ( −2, 1, 5/3 )^T : α ∈ R } .

Thus, the null space is a one-dimensional space (i.e., a line) in R^3.
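The null-space direction found by hand can be confirmed by a direct matrix-vector product in plain Python (an illustration; α = −4.2 is an arbitrary scalar):

```python
# B from Example 16.4.1 and the null-space direction derived in the text
B = [[1, 2, 0], [2, -1, 3]]
x = [-2.0, 1.0, 5.0 / 3.0]   # the alpha = 1 member of null(B)

Bx = [sum(B[i][k] * x[k] for k in range(3)) for i in range(2)]
print(Bx)  # ~ [0.0, 0.0]

# Any scalar multiple also lies in the null space
alpha = -4.2
Bax = [sum(B[i][k] * alpha * x[k] for k in range(3)) for i in range(2)]
print(Bax)  # ~ [0.0, 0.0]
```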
16.4.2 Projectors

Another important concept, in particular for the least squares covered in Chapter 17, is the concept of a projector. A projector is a square matrix P that is idempotent, i.e.

P^2 = P P = P .

Let v be an arbitrary vector in R^m. The projector P projects v, which is not necessarily in col(P), onto col(P), i.e.

w = P v ∈ col(P), ∀ v ∈ R^m .

In addition, the projector P does not modify a vector that is already in col(P). This is easily verified because

P w = P P v = P v = w, ∀ w ∈ col(P) .

Intuitively, a projector projects a vector v ∈ R^m onto a smaller space col(P). If the vector is already in col(P), then it would be left unchanged.

The complementary projector of P is the projector I − P. It is easy to verify that I − P is a projector itself because

(I − P)^2 = (I − P)(I − P) = I − 2P + P P = I − P .

It can be shown that the complementary projector I − P projects onto the null space of P, null(P).

When the space along which the projector projects is orthogonal to the space onto which the projector projects, the projector is said to be an orthogonal projector. Algebraically, orthogonal projectors are symmetric.

When an orthonormal basis for a space is available, it is particularly simple to construct an orthogonal projector onto the space. Say {q1, . . . , qn} is an orthonormal basis for an n-dimensional subspace of R^m, n < m. Given any v ∈ R^m, we recall that

u_i = q_i^T v
is the component of v in the direction of q_i represented in the basis {q_i}. We then introduce the vector

w_i = q_i (q_i^T v) ;

the sum of such vectors would produce the projection of v ∈ R^m onto V spanned by {q_i}. More compactly, if we form an m × n matrix

Q = ( q1  ···  qn ) ,

then the projection of v onto the column space of Q is

w = Q (Q^T v) = (Q Q^T) v .

We recognize that the orthogonal projector onto the span of {q_i} or col(Q) is

P = Q Q^T .
P =QQT .Of course P is symmetric, (QQT)T = (QT)TQT = QQT, and idempotent, (QQT)(QQT) =Q(Q
T
Q)QT
=QQT
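A plain-Python sketch of the orthogonal projector P = Q Q^T, using the orthonormal matrix from Example 16.3.7 as the basis of a 2D subspace of R^3 (the vector v is an arbitrary test vector):

```python
import math

# Orthonormal basis of a 2D subspace of R^3 (the Q from Example 16.3.7)
Q = [[1 / math.sqrt(6),  2 / math.sqrt(5)],
     [2 / math.sqrt(6), -1 / math.sqrt(5)],
     [1 / math.sqrt(6),  0.0]]

# Orthogonal projector P = Q Q^T, i.e., P_ij = sum_k Q_ik Q_jk
P = [[sum(Q[i][k] * Q[j][k] for k in range(2)) for j in range(3)] for i in range(3)]

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

v = [1.0, 2.0, 3.0]
w = apply(P, v)    # projection of v onto col(Q)
w2 = apply(P, w)   # projecting again changes nothing: P is idempotent

print(w)
print(all(abs(w[i] - w2[i]) < 1e-12 for i in range(3)))                         # True
print(all(abs(P[i][j] - P[j][i]) < 1e-12 for i in range(3) for j in range(3)))  # True
```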
End Advanced Material
Chapter 17
Least Squares

17.1 Data Fitting in Absence of Noise and Bias
We motivate our discussion by reconsidering the friction coefficient example of Chapter 15. We recall that, according to Amontons, the static friction, F_f,static, and the applied normal force, F_normal,applied, are related by

F_f,static ≤ μs F_normal,applied ;

here μs is the coefficient of friction, which is only dependent on the two materials in contact. In particular, the maximum static friction is a linear function of the applied normal force, i.e.

F^max_f,static = μs F_normal,applied .

We wish to deduce μs by measuring the maximum static friction attainable for several different values of the applied normal force.

Our approach to this problem is to first choose the form of a model based on physical principles and then deduce the parameters based on a set of measurements. In particular, let us consider a simple affine model

y = Y_model(x; β) = β0 + β1 x .

The variable y is the predicted quantity, or the output, which is the maximum static friction F^max_f,static. The variable x is the independent variable, or the input, which is the normal force F_normal,applied. The function Y_model is our predictive model which is parameterized by a parameter β = (β0, β1). Note that Amontons' law is a particular case of our general affine model with β0 = 0 and β1 = μs. If we take m noise-free measurements and Amontons' law is exact, then we expect

F^max_f,static,i = μs F_normal,applied,i , i = 1, . . . , m .

The equation should be satisfied exactly for each one of the m measurements. Accordingly, there is also a unique solution to our model-parameter identification problem

y_i = β0 + β1 x_i , i = 1, . . . , m ,
with the solution given by β0^true = 0 and β1^true = μs.

Because the dependency of the output y on the model parameters {β0, β1} is linear, we can write the system of equations as an m × 2 matrix equation

[ 1  x1 ]          ( y1 )
[ 1  x2 ] ( β0 ) = ( y2 )
[ ⋮   ⋮ ] ( β1 )   (  ⋮ )
[ 1  xm ]          ( ym )
  (X)      (β)      (Y)

or, more compactly,

X β = Y .

Using the row interpretation of matrix-vector multiplication, we immediately recover the original set of equations,

X β = [ β0 + β1 x1 ]   ( y1 )
      [ β0 + β1 x2 ] = ( y2 ) = Y .
      [     ⋮      ]   (  ⋮ )
      [ β0 + β1 xm ]   ( ym )

Or, using the column interpretation, we see that our parameter fitting problem corresponds to choosing the two weights for the two m-vectors to match the right-hand side,

X β = β0 ( 1 )      ( x1 )   ( y1 )
         ( 1 ) + β1 ( x2 ) = ( y2 ) = Y .
         ( ⋮ )      (  ⋮ )   (  ⋮ )
         ( 1 )      ( xm )   ( ym )
We emphasize that the linear system X β = Y is overdetermined, i.e., there are more equations than unknowns (m > n). (We study this in more detail in the next section.) However, we can still find a solution to the system because the following two conditions are satisfied:

Unbiased: Our model includes the true functional dependence y = μs x, and thus the model is capable of representing this true underlying functional dependence. This would not be the case if, for example, we consider a constant model y(x) = β0, because our model would be incapable of representing the linear dependence of the friction force on the normal force. Clearly the assumption of no bias is a very strong assumption given the complexity of the physical world.

Noise free: We have perfect measurements: each measurement y_i corresponding to the independent variable x_i provides the exact value of the friction force. Obviously this is again rather naive and will need to be relaxed.

Under these assumptions, there exists a parameter β^true that completely describes the measurements, i.e.

y_i = Y_model(x_i; β^true), i = 1, . . . , m .
(The β^true will be unique if the columns of X are independent.) Consequently, our predictive model is perfect, and we can exactly predict the experimental output for any choice of x, i.e.

Y(x) = Y_model(x; β^true), ∀ x ,

where Y(x) is the experimental measurement corresponding to the condition described by x. However, in practice, the bias-free and noise-free assumptions are rarely satisfied, and our model is never a perfect predictor of the reality.

In Chapter 19, we will develop a probabilistic tool for quantifying the effect of noise and bias; the current chapter focuses on developing a least-squares technique for solving an overdetermined linear system (in the deterministic context), which is essential to solving these data fitting problems. In particular we will consider a strategy for solving overdetermined linear systems of the form

B z = g ,

where B ∈ R^{m×n}, z ∈ R^n, and g ∈ R^m with m > n.

Before we discuss the least-squares strategy, let us consider another example of overdetermined systems in the context of polynomial fitting. Let us consider a particle experiencing constant acceleration, e.g. due to gravity. We know that the position y of the particle at time t is described by a quadratic function

y(t) = (1/2) a t^2 + v0 t + y0 ,

where a is the acceleration, v0 is the initial velocity, and y0 is the initial position. Suppose that we do not know the three parameters a, v0, and y0 that govern the motion of the particle and we are interested in determining the parameters. We could do this by first measuring the position of the particle at several different times and recording the pairs {t_i, y(t_i)}. Then, we could fit our measurements to the quadratic model to deduce the parameters.
The problem of finding the parameters that govern the motion of the particle is a special case of a more general problem: polynomial fitting. Let us consider a quadratic polynomial, i.e.

y(x) = β0^true + β1^true x + β2^true x^2 ,

where β^true = {β0^true, β1^true, β2^true} is the set of true parameters characterizing the modeled phenomenon. Suppose that we do not know β^true but we do know that our output depends on the input x in a quadratic manner. Thus, we consider a model of the form

Y_model(x; β) = β0 + β1 x + β2 x^2 ,

and we determine the coefficients by measuring the output y for several different values of x. We are free to choose the number of measurements m and the measurement points x_i, i = 1, . . . , m. In particular, upon choosing the measurement points and taking a measurement at each point, we obtain a system of linear equations,

y_i = Y_model(x_i; β) = β0 + β1 x_i + β2 x_i^2 , i = 1, . . . , m ,

where y_i is the measurement corresponding to the input x_i.

Note that the equation is linear in our unknowns {β0, β1, β2} (the appearance of x_i^2 only affects the manner in which data enters the equation). Because the dependency on the parameters is
linear,wecanwritethesystemasmatrixequation, 1 x1 x2 y1 1
01 x2 x2y2 2 1= ,. . . .. . . .
.
. . .
2
ym 1 xm x2m"'"'"'vY vX v
or,morecompactly,Y =X .
NotethatthisparticularmatrixX hasaratherspecialstructureeachrowformsageometricsej1riesandtheij-thentryisgivenbyBij =x . MatriceswiththisstructurearecalledVandermondei
matrices.
As in the friction coefficient example considered earlier, the row interpretation of the matrix-vector product recovers the original set of equations,

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 \\ \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 \\ \vdots \\ \beta_0 + \beta_1 x_m + \beta_2 x_m^2 \end{pmatrix} = X\beta .
With the column interpretation, we immediately recognize that this is a problem of finding the three coefficients, or parameters, of the linear combination that yields the desired m-vector Y, i.e.

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \beta_0 \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} + \beta_2 \begin{pmatrix} x_1^2 \\ x_2^2 \\ \vdots \\ x_m^2 \end{pmatrix} = X\beta .

We know that if we have three or more non-degenerate measurements (i.e., m \ge 3), then we can find the unique solution to the linear system. Moreover, the solution is the coefficients of the underlying
polynomial, (\beta_0^{\rm true}, \beta_1^{\rm true}, \beta_2^{\rm true}).

Example 17.1.1 A quadratic polynomial
Let us consider a more specific case, where the underlying polynomial is of the form

y(x) = -\frac{1}{2} + \frac{2}{3} x - \frac{1}{8} c x^2 .

We recognize that y(x) = Y_{\rm model}(x; \beta^{\rm true}) for Y_{\rm model}(x; \beta) = \beta_0 + \beta_1 x + \beta_2 x^2 and the true parameters

\beta_0^{\rm true} = -\frac{1}{2}, \quad \beta_1^{\rm true} = \frac{2}{3}, \quad {\rm and} \quad \beta_2^{\rm true} = -\frac{1}{8} c .

The parameter c controls the degree of quadratic dependency; in particular, c = 0 results in an affine function.
First, we consider the case with c = 1, which results in a strong quadratic dependency, i.e., \beta_2^{\rm true} = -1/8. The result of measuring y at three non-degenerate points (m = 3) is shown in Figure 17.1(a). Solving the 3 \times 3 linear system with the coefficients as the unknown, we obtain

\beta_0 = -\frac{1}{2}, \quad \beta_1 = \frac{2}{3}, \quad {\rm and} \quad \beta_2 = -\frac{1}{8} .
Figure 17.1: Deducing the coefficients of a polynomial with a strong quadratic dependence. (a) m = 3; (b) m = 7.
Not surprisingly, we can find the true coefficients of the quadratic equation using three data points.
Suppose we take more measurements. An example of taking seven measurements (m = 7) is shown in Figure 17.1(b). We now have seven data points and three unknowns, so we must solve the 7 \times 3 linear system, i.e., find the set \beta = \{\beta_0, \beta_1, \beta_2\} that satisfies all seven equations. The solution to the linear system, of course, is given by

\beta_0 = -\frac{1}{2}, \quad \beta_1 = \frac{2}{3}, \quad {\rm and} \quad \beta_2 = -\frac{1}{8} .

The result is correct (\beta = \beta^{\rm true}) and, in particular, no different from the result for the m = 3 case.
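This experiment can be reproduced numerically with the coefficients of Example 17.1.1 (c = 1). A sketch in Python/NumPy; the particular measurement points (equispaced on [0, 3]) are an assumption for illustration.

```python
import numpy as np

# True quadratic from Example 17.1.1 with c = 1:
#   y(x) = -1/2 + (2/3) x - (1/8) x^2
beta_true = np.array([-0.5, 2.0 / 3.0, -1.0 / 8.0])

for m in (3, 7):                              # m = 3 and m = 7 measurements
    x = np.linspace(0.0, 3.0, m)              # non-degenerate (distinct) points
    X = np.vander(x, N=3, increasing=True)    # m-by-3 Vandermonde matrix
    Y = X @ beta_true                         # perfect (noise-free) measurements
    # For m > 3 the system is overdetermined but still consistent,
    # so least squares returns the unique exact solution.
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    assert np.allclose(beta, beta_true)       # same answer for m = 3 and m = 7
```

With noise-free data the m = 7 result is identical to the m = 3 result, as the text observes.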
We can modify the underlying polynomial slightly and repeat the same exercise. For example, let us consider the case with c = 1/10, which results in a much weaker quadratic dependency of y on x, i.e., \beta_2^{\rm true} = -1/80. As shown in Figure 17.2, we can take either m = 3 or m = 7 measurements. Similar to the c = 1 case, we identify the true coefficients,
\beta_0 = -\frac{1}{2}, \quad \beta_1 = \frac{2}{3}, \quad {\rm and} \quad \beta_2 = -\frac{1}{80} ,

using either m = 3 or m = 7 (in fact using any three or more non-degenerate measurements).
In the friction coefficient determination and the (particle motion) polynomial identification problems, we have seen that we can find a solution to the m \times n overdetermined system (m > n) if
(a) our model includes the underlying input-output functional dependence (no bias);
(b) the measurements are perfect (no noise).
As already stated, in practice these two assumptions are rarely satisfied; i.e., models are often (in fact, always) incomplete and measurements are often inaccurate. (For example, in our particle motion model, we have neglected friction.) We can still construct a m \times n linear system Bz = g using our model and measurements, but the solution to the system in general does not exist. Knowing that we cannot find the solution to the overdetermined linear system, our objective is
Figure 17.2: Deducing the coefficients of a polynomial with a weak quadratic dependence. (a) m = 3; (b) m = 7.
to find a solution that is "close" to satisfying the equations. In the following section, we will define the notion of closeness suitable for our analysis and introduce a general procedure for finding the closest solution to a general overdetermined system of the form

Bz = g ,

where B \in \mathbb{R}^{m \times n} with m > n. We will subsequently address the meaning and interpretation of this (non-)solution.

17.2 Overdetermined Systems

Let us consider an overdetermined linear system, such as the one arising from the regression example earlier, of the form
Bz = g ,

or, more explicitly,

\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} g_1 \\ g_2 \\ g_3 \end{pmatrix} .
Our objective is to find z that makes the three-component vector equation true, i.e., find the solution to the linear system. In Chapter 16, we considered the forward problem of matrix-vector multiplication in which, given z, we calculate g = Bz. We also briefly discussed the inverse problem in which, given g, we would like to find z. But for m \ne n, B^{-1} does not exist; as discussed in the previous section, there may be no z that satisfies Bz = g. Thus, we will need to look for a z that satisfies the equation "closely" in the sense we must specify and interpret. This is the focus of this section.1
1 Note that later (in Unit V) we shall look at the ostensibly simpler case in which B is square and a solution z exists and is even unique. But, for many reasons, overdetermined systems are a nicer place to start.
Row Interpretation

Let us consider a row interpretation of the overdetermined system. Satisfying the linear system requires

B_{i1} z_1 + B_{i2} z_2 = g_i, \quad i = 1, 2, 3 .

Note that each of these equations defines a line in \mathbb{R}^2. Thus, satisfying the three equations is equivalent to finding a point that is shared by all three lines, which in general is not possible, as we will demonstrate in an example.

Example 17.2.1 Row interpretation of overdetermined system
Let us consider an overdetermined system
\begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 5/2 \\ 2 \\ -2 \end{pmatrix} .
Using the row interpretation of the linear system, we see there are three linear equations to be satisfied. The set of points x = (x_1, x_2) that satisfies the first equation,

1 x_1 + 2 x_2 = \frac{5}{2} ,

forms a line

L_1 = \{(x_1, x_2) : 1 x_1 + 2 x_2 = 5/2\}
in the two-dimensional space. Similarly, the sets of points that satisfy the second and third equations form lines described by

L_2 = \{(x_1, x_2) : 2 x_1 + 1 x_2 = 2\}
L_3 = \{(x_1, x_2) : 2 x_1 - 3 x_2 = -2\} .
These sets of points in L_1, L_2, and L_3, or the lines, are shown in Figure 17.3(a).
The solution to the linear system must satisfy each of the three equations, i.e., belong to all three lines. This means that there must be an intersection of all three lines and, if it exists, the solution is the intersection. This linear system has the solution

z = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix} .

However, three lines intersecting in \mathbb{R}^2 is a rare occurrence; in fact, the right-hand side of the system was chosen carefully so that the system has a solution in this example. If we perturb either the matrix or the right-hand side of the system, it is likely that the three lines will no longer intersect.
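The consistency of this example, and its fragility under perturbation, can be checked directly. A brief sketch in Python/NumPy:

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, -3.0]])
g = np.array([5.0 / 2.0, 2.0, -2.0])

z = np.array([0.5, 1.0])
# All three equations B[i] @ z = g[i] hold: the three lines share the point z,
# so the residual g - Bz is exactly zero.
residual = g - B @ z

# Perturbing one component of the right-hand side generically destroys
# the common intersection: z no longer satisfies the first equation.
g_pert = g + np.array([0.1, 0.0, 0.0])
still_solves = np.allclose(B @ z, g_pert)   # False
```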
A more typical overdetermined system is the following system,

\begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ -4 \end{pmatrix} .

Again, interpreting the matrix equation as a system of three linear equations, we can illustrate the set of points that satisfy each equation as a line in \mathbb{R}^2, as shown in Figure 17.3(b). There is no solution to this overdetermined system, because there is no point (z_1, z_2) that belongs to all three lines, i.e., the three lines do not intersect at a point.
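Inconsistency of an overdetermined system can be detected systematically by comparing the rank of B with the rank of the augmented matrix [B | g]: if the ranks differ, g does not lie in the column space of B and no solution exists. A sketch in Python/NumPy; the right-hand side below is one inconsistent choice, used here purely for illustration.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, -3.0]])
g = np.array([0.0, 2.0, -4.0])

# rank(B) = 2 but rank([B | g]) = 3: g is not in the column space of B,
# so no point (z1, z2) lies on all three lines.
rank_B = np.linalg.matrix_rank(B)
rank_aug = np.linalg.matrix_rank(np.column_stack([B, g]))
no_solution = rank_aug > rank_B   # True for this system
```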
Figure 17.3: Illustration of the row interpretation of the overdetermined systems. Each line is a set of points that satisfies B_i x = g_i, i = 1, 2, 3. (a) system with a solution; (b) system without a solution.
Column Interpretation

Let us now consider a column interpretation of the overdetermined system. Satisfying the linear system requires

z_1 \begin{pmatrix} B_{11} \\ B_{21} \\ B_{31} \end{pmatrix} + z_2 \begin{pmatrix} B_{12} \\ B_{22} \\ B_{32} \end{pmatrix} = \begin{pmatrix} g_1 \\ g_2 \\ g_3 \end{pmatrix} .
In other words, we consider a linear combination of two vectors in \mathbb{R}^3 and try to match the right-hand side g \in \mathbb{R}^3. The vectors span at most a plane in \mathbb{R}^3, so there is no weight (z_1, z_2) that makes the equation hold unless the vector g happens to lie in the plane. To clarify the idea, let us consider a specific example.

Example 17.2.2 Column interpretation of overdetermined system
For simplicity, let us consider the following special case:

\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3/2 \\ 2 \end{pmatrix} .
The column interpretation results in

z_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + z_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3/2 \\ 2 \end{pmatrix} .

By changing z_1 and z_2, we can move in the plane

z_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + z_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} z_1 \\ z_2 \\ 0 \end{pmatrix} .
Figure 17.4: Illustration of the column interpretation of the overdetermined system.

Clearly, if g_3 \ne 0, it is not possible to find z_1 and z_2 that satisfy the linear equation, Bz = g. In other words, g must lie in the plane spanned by the columns of B, which is the 1-2 plane in this case.
Figure 17.4 illustrates the column interpretation of the overdetermined system. The vector g \in \mathbb{R}^3 does not lie in the space spanned by the columns of B, thus there is no solution to the
system. However, if g_3 is small, then we can find a z^* such that Bz^* is close to g, i.e., a good approximation to g. Such an approximation is shown in the figure, and the next section discusses how to find such an approximation.
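The closest point Bz^* for Example 17.2.2 can be computed directly with a least-squares solver; a sketch in Python/NumPy (anticipating the method of the next section):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
g = np.array([1.0, 1.5, 2.0])

# g has g3 != 0, so it lies outside the 1-2 plane spanned by the columns
# of B and Bz = g has no exact solution. The least-squares z* matches the
# first two components of g and leaves the third entirely as residual.
z_star, *_ = np.linalg.lstsq(B, g, rcond=None)   # z* = (1, 3/2)
closest = B @ z_star                             # (1, 3/2, 0): projection of g
```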
17.3 Least Squares

17.3.1 Measures of Closeness

In the previous section, we observed that it is in general not possible to find a solution to an overdetermined system. Our aim is thus to find z such that Bz is close to g, i.e., z such that

Bz \approx g ,

for B \in \mathbb{R}^{m \times n}, m > n. For convenience, let us introduce the residual function, which is defined as

r(z) \equiv g - Bz .

Note that

r_i = g_i - (Bz)_i = g_i - \sum_{j=1}^{n} B_{ij} z_j, \quad i = 1, \dots, m .

Thus, r_i is the extent to which the i-th equation (Bz)_i = g_i is not satisfied. In particular, if r_i(z) = 0, i = 1, \dots, m, then Bz = g and z is the solution to the linear system. We note that the residual is a measure of closeness described by m values. It is more convenient to have a single scalar value for assessing the extent to which the equation is satisfied. A simple way to achieve this is to take a norm of the residual vector. Different norms result in different measures of closeness, which in turn produce different best-fit solutions.
Begin Advanced Material

Let us consider first two examples, neither of which we will pursue in this chapter.

Example 17.3.1 \ell_1 minimization
The first method is based on measuring the residual in the 1-norm. The scalar representing the extent of mismatch is

J_1(z) \equiv \|r(z)\|_1 = \sum_{i=1}^{m} |r_i(z)| = \sum_{i=1}^{m} |(g - Bz)_i| .

The best z, denoted by z^*, is the z that minimizes the extent of mismatch measured in J_1(z), i.e.

z^* = \arg \min_{z \in \mathbb{R}^n} J_1(z) .

The \arg \min_{z \in \mathbb{R}^n} J_1(z) returns the argument z that minimizes the function J_1(z). In other words, z^* satisfies

J_1(z^*) \le J_1(z), \quad \forall z \in \mathbb{R}^n .

This minimization problem can be formulated as a linear programming problem. The minimizer is not necessarily unique and the solution procedure is not as simple as that resulting from the 2-norm. Thus, we will not pursue this option here.
Example 17.3.2 \ell_\infty minimization
The second method is based on measuring the residual in the \infty-norm. The scalar representing the extent of mismatch is

J_\infty(z) \equiv \|r(z)\|_\infty = \max_{i=1,\dots,m} |r_i(z)| = \max_{i=1,\dots,m} |(g - Bz)_i| .

The best z^* that minimizes J_\infty(z) is

z^* = \arg \min_{z \in \mathbb{R}^n} J_\infty(z) .

This so-called min-max problem can also be cast as a linear programming problem. Again, this procedure is rather complicated, and the solution is not necessarily unique.
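The three scalar mismatch measures (\ell_1, squared \ell_2, and \ell_\infty) can be evaluated side by side for any candidate z. A sketch in Python/NumPy; the matrix, right-hand side, and candidate point below are chosen only for illustration.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, -3.0]])
g = np.array([0.0, 2.0, -4.0])   # an inconsistent right-hand side

def residual(z):
    """r(z) = g - Bz, one component per equation."""
    return g - B @ z

z = np.array([0.5, 1.0])         # some candidate (not a minimizer)
r = residual(z)
J1 = np.sum(np.abs(r))           # l1 mismatch:        sum |r_i|
J2 = np.sum(r ** 2)              # squared l2 mismatch: sum r_i^2 (least squares)
Jinf = np.max(np.abs(r))         # l-infinity mismatch: max |r_i|
```

Different choices of norm generally select different minimizers z^*; the squared \ell_2 measure is the one pursued in the next section.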
End Advanced Material
17.3.2 Least-Squares Formulation (\ell_2 minimization)

Minimizing the residual measured in (say) the 1-norm or \infty-norm results in a linear programming problem that is not so easy to solve. Here we will show that measuring the residual in the 2-norm results in a particularly simple minimization problem. Moreover, the solution to the minimization problem is unique assuming that the matrix B is full rank, i.e., has n independent columns. We shall assume that B does indeed have independent columns.
The scalar function representing the extent of mismatch for \ell_2 minimization is

J_2(z) \equiv \|r(z)\|_2^2 = r^{\rm T}(z) r(z) = (g - Bz)^{\rm T} (g - Bz) .

Note that we consider the square of the 2-norm for convenience, rather than the 2-norm itself. Our objective is to find z^* such that
z^* = \arg \min_{z \in \mathbb{R}^n} J_2(z) ,

which is equivalent to finding z^* with

\|g - Bz^*\|_2^2 = J_2(z^*) < J_2(z) = \|g - Bz\|_2^2, \quad \forall z \ne z^* .

(Note \arg \min refers to the argument which minimizes: so \min is the minimum and \arg \min is the minimizer.) Note that we can write our objective function J_2(z) as

J_2(z) = \|r(z)\|_2^2 = r^{\rm T}(z) r(z) = \sum_{i=1}^{m} (r_i(z))^2 .

In other words, our objective is to minimize the sum of the squares of the residuals, i.e., least squares. Thus, we say that z^* is the least-squares solution to the overdetermined system Bz = g: z^* is that z which makes J_2(z), the sum of the squares of the residuals, as small as possible.
Note that if Bz = g does have a solution, the least-squares solution is the solution to the overdetermined system. If z is the solution, then r = g - Bz = 0 and in particular J_2(z) = 0, which is the minimum value that J_2 can take. Thus, the solution z is the minimizer of J_2: z = z^*. Let us now derive a procedure for solving the least-squares problem for a more general case where Bz = g does not have a solution.
For convenience, we drop the subscript 2 of the objective function J_2, and simply denote it by J. Again, our objective is to find z^* such that

J(z^*) < J(z), \quad \forall z \ne z^* .

Expanding out the expression for J(z), we have
J(z) = (g - Bz)^{\rm T}(g - Bz) = (g^{\rm T} - (Bz)^{\rm T})(g - Bz)
     = g^{\rm T}(g - Bz) - (Bz)^{\rm T}(g - Bz)
     = g^{\rm T} g - g^{\rm T} Bz - (Bz)^{\rm T} g + (Bz)^{\rm T}(Bz)
     = g^{\rm T} g - g^{\rm T} Bz - z^{\rm T} B^{\rm T} g + z^{\rm T} B^{\rm T} B z ,

where we have used the transpose rule which tells us that (Bz)^{\rm T} = z^{\rm T} B^{\rm T}. We note that g^{\rm T} Bz is a scalar, so it does not change under the transpose operation. Thus, g^{\rm T} Bz can be expressed as

g^{\rm T} Bz = (g^{\rm T} Bz)^{\rm T} = z^{\rm T} B^{\rm T} g ,

again by the transpose rule. The function J thus simplifies to

J(z) = g^{\rm T} g - 2 z^{\rm T} B^{\rm T} g + z^{\rm T} B^{\rm T} B z .

For convenience, let us define N \equiv B^{\rm T} B \in \mathbb{R}^{n \times n}, so that

J(z) = g^{\rm T} g - 2 z^{\rm T} B^{\rm T} g + z^{\rm T} N z .
It is simple to confirm that each term in the above expression is indeed a scalar.
The solution to the minimization problem is given by

N z^* = d ,

where d = B^{\rm T} g. The equation is called the "normal equation," which can be written out as

\begin{pmatrix} N_{11} & N_{12} & \cdots & N_{1n} \\ N_{21} & N_{22} & \cdots & N_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ N_{n1} & N_{n2} & \cdots & N_{nn} \end{pmatrix} \begin{pmatrix} z_1^* \\ z_2^* \\ \vdots \\ z_n^* \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix} .
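Forming and solving the normal equation is a few lines of NumPy. A sketch; the matrix and right-hand side are illustrative values, and the result is cross-checked against NumPy's own least-squares solver.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, -3.0]])
g = np.array([0.0, 2.0, -4.0])

# Normal equation: N z* = d with N = B^T B (n-by-n) and d = B^T g.
N = B.T @ B
d = B.T @ g
z_star = np.linalg.solve(N, d)

# z* minimizes ||g - Bz||_2; it agrees with numpy's least-squares solver.
z_lstsq, *_ = np.linalg.lstsq(B, g, rcond=None)
assert np.allclose(z_star, z_lstsq)
```

Note that solving the normal equation directly, as done here, mirrors the derivation in the text; numerically robust libraries typically use a QR or SVD factorization of B instead.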
The existence and uniqueness of z^* is guaranteed assuming that the columns of B are independent. We provide below the proof that z^* is the unique minimizer of J(z) in the case in which B has independent columns.
Proof. We first show that the normal matrix N is symmetric positive definite, i.e.

x^{\rm T} N x > 0, \quad \forall x \in \mathbb{R}^n \ (x \ne 0) ,

assuming the columns of B are linearly independent. The normal matrix N = B^{\rm T} B is symmetric because

N^{\rm T} = (B^{\rm T} B)^{\rm T} = B^{\rm T} (B^{\rm T})^{\rm T} = B^{\rm T} B = N .

To show N is positive definite, we first observe that

x^{\rm T} N x = x^{\rm T} B^{\rm T} B x = (Bx)^{\rm T} (Bx) = \|Bx\|_2^2 .

That is, x^{\rm T} N x is the square of the 2-norm of Bx. Recall that the norm of a vector is zero if and only if the vector is the zero vector. In our case,

x^{\rm T} N x = 0 \quad {\rm if\ and\ only\ if} \quad Bx = 0 .

Because the columns of B are linearly independent, Bx = 0 if and only if x = 0. Thus, we have

x^{\rm T} N x = \|Bx\|_2^2 > 0, \quad \forall x \ne 0 .

Thus, N is symmetric positive definite.
Now recall that the function to be minimized is
J(z) = g^{\rm T} g - 2 z^{\rm T} B^{\rm T} g + z^{\rm T} N z .

If z^* minimizes the function, then for any \delta z \ne 0, we must have

J(z^*) < J(z^* + \delta z) .

Let us expand J(z^* + \delta z):

J(z^* + \delta z) = g^{\rm T} g - 2 (z^* + \delta z)^{\rm T} B^{\rm T} g + (z^* + \delta z)^{\rm T} N (z^* + \delta z)
                  = g^{\rm T} g - 2 z^{*{\rm T}} B^{\rm T} g + z^{*{\rm T}} N z^* - 2 \delta z^{\rm T} B^{\rm T} g + 2 \delta z^{\rm T} N z^* + \delta z^{\rm T} N \delta z
                  = J(z^*) + 2 \delta z^{\rm T} (N z^* - B^{\rm T} g) + \delta z^{\rm T} N \delta z ,

where we have used z^{*{\rm T}} N \delta z = \delta z^{\rm T} N^{\rm T} z^* = \delta z^{\rm T} N z^*.
Note that N^{\rm T} = N because N^{\rm T} = (B^{\rm T} B)^{\rm T} = B^{\rm T} B = N. If z^* satisfies the normal equation, N z^* = B^{\rm T} g, then

N z^* - B^{\rm T} g = 0 ,

and thus

J(z^* + \delta z) = J(z^*) + \delta z^{\rm T} N \delta z .

The second term is always positive because N is positive definite. Thus, we have

J(z^* + \delta z) > J(z^*), \quad \forall \delta z \ne 0 ,

or, setting \delta z = z - z^*,

J(z^*) < J(z), \quad \forall z \ne z^* .

Thus, z^* satisfying the normal equation N z^* = B^{\rm T} g is the minimizer of J, i.e., the least-squares solution to the overdetermined system Bz = g.
Example 17.3.3 2 \times 1 least squares and its geometric interpretation
Consider a simple case of an overdetermined system,

B = \begin{pmatrix} 2 \\ 1 \end{pmatrix} , \quad g = \begin{pmatrix} 1 \\ 2 \end{pmatrix} .
Because the system is 2 \times 1, there is a single scalar parameter, z, to be chosen. To obtain the normal equation, we first construct the matrix N and the vector d (both of which are simply scalars for this problem):

N = B^{\rm T} B = \begin{pmatrix} 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = 5 ,
d = B^{\rm T} g = \begin{pmatrix} 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = 4 .

Solving the normal equation, we obtain the least-squares solution

N z^* = d \quad \Rightarrow \quad 5 z^* = 4 \quad \Rightarrow \quad z^* = 4/5 .
This choice of z^* yields

B z^* = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \frac{4}{5} = \begin{pmatrix} 8/5 \\ 4/5 \end{pmatrix} ,

which of course is different from g.
The process is illustrated in Figure 17.5. The span of the column of B is the line parameterized by \begin{pmatrix} 2 & 1 \end{pmatrix}^{\rm T} z, z \in \mathbb{R}. Recall that the solution Bz^* is the point on the line that is closest to g in the least-squares sense, i.e.

\|B z^* - g\|_2 \le \|B z - g\|_2, \quad \forall z \in \mathbb{R} .
Figure 17.5: Illustration of least squares in \mathbb{R}^2.

Recalling that the \ell_2 distance is the usual Euclidean distance, we expect the closest point to be the orthogonal projection of g onto the line span(col(B)). The figure confirms that this indeed is the case. We can verify this algebraically,
B^{\rm T}(B z^* - g) = \begin{pmatrix} 2 & 1 \end{pmatrix} \left( \begin{pmatrix} 2 \\ 1 \end{pmatrix} \frac{4}{5} - \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) = \begin{pmatrix} 2 & 1 \end{pmatrix} \begin{pmatrix} 3/5 \\ -6/5 \end{pmatrix} = 0 .

Thus, the residual vector B z^* - g and the column space of B are orthogonal to each other. While the geometric illustration of orthogonality may be difficult for a higher-dimensional least squares, the orthogonality condition can be checked systematically using the algebraic method.
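The 2 x 1 example can be reproduced numerically, including the orthogonality check. A sketch in Python/NumPy:

```python
import numpy as np

B = np.array([[2.0], [1.0]])     # the 2-by-1 "matrix"
g = np.array([1.0, 2.0])

N = (B.T @ B).item()             # scalar normal matrix: 5
d = (B.T @ g).item()             # scalar right-hand side: 4
z_star = d / N                   # least-squares solution: 4/5

Bz_star = B.flatten() * z_star   # (8/5, 4/5), different from g

# Residual Bz* - g is orthogonal to the column of B (zero up to roundoff):
orthogonality = B.T @ (Bz_star - g)
```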
17.3.3 Computational Considerations

Let us analyze the computational cost of solving the least-squares system. The first step is the formulation of the normal matrix,

N = B^{\rm T} B ,

which requires a matrix-matrix multiplication of B^{\rm T} \in \mathbb{R}^{n \times m} and B \in \mathbb{R}^{m \times n}. Because N is symmetric, we only need to compute the upper triangular part of N, which corresponds to performing n(n+1)/2 m-vector inner products. Thus, the computational cost is mn(n+1). Forming the right-hand side,