
Page 1: Data Modeling and Least Squares Fitting 2

Data Modeling and Least Squares Fitting 2

COS 323

Page 2: Data Modeling and Least Squares Fitting 2

Nonlinear Least Squares

• Some problems can be rewritten to linear

• Fit data points $(x_i, \log y_i)$ to $a^* + bx$, with $a = e^{a^*}$

• Big problem: this no longer minimizes squared error!

$$y = a\,e^{bx} \quad\Longrightarrow\quad \log y = (\log a) + bx$$
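To make the trick concrete, here is a minimal numpy sketch (mine, not from the slides; the synthetic data and constants are hypothetical). It fits $(x_i, \log y_i)$ with ordinary linear least squares:

```python
import numpy as np

# Hypothetical data from y = 2 * exp(0.5 x), with a little multiplicative noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 30)
y = 2.0 * np.exp(0.5 * x) * (1.0 + 0.05 * rng.standard_normal(x.size))

# Ordinary linear least squares on (x_i, log y_i): slope is b, intercept is a*.
b, a_star = np.polyfit(x, np.log(y), 1)
a = np.exp(a_star)   # recover a = e^{a*}

print(a, b)   # close to (2, 0.5), but this minimized error in log y, not in y
```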

Page 3: Data Modeling and Least Squares Fitting 2

Nonlinear Least Squares

• Can write the error function and minimize it directly

• For the exponential, there is no analytic solution for a, b:

Set
$$\frac{\partial \chi^2}{\partial a} = 0, \quad \frac{\partial \chi^2}{\partial b} = 0, \quad \text{etc.}, \qquad \text{where } \chi^2 = \sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

For $f(x; a, b) = a\,e^{bx}$ this gives

$$\frac{\partial \chi^2}{\partial a} = -2 \sum_i e^{b x_i} \left( y_i - a\,e^{b x_i} \right) = 0$$

$$\frac{\partial \chi^2}{\partial b} = -2 \sum_i a\,x_i\,e^{b x_i} \left( y_i - a\,e^{b x_i} \right) = 0$$
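With no closed form, the error function is minimized numerically. Below is a minimal sketch using scipy.optimize.least_squares, one convenient solver choice rather than anything the slides prescribe (data and names are hypothetical):

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data from y = 2 * exp(0.5 x), with noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 30)
y = 2.0 * np.exp(0.5 * x) * (1.0 + 0.05 * rng.standard_normal(x.size))

def residuals(params):
    a, b = params
    return y - a * np.exp(b * x)   # r_i = y_i - f(x_i; a, b)

# Minimize the sum of squared residuals directly; the linearized fit from the
# previous slide makes a good starting guess.
result = least_squares(residuals, x0=[1.0, 1.0])
a_hat, b_hat = result.x
```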

Page 4: Data Modeling and Least Squares Fitting 2

Newton’s Method

• Apply Newton’s method for minimization:

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i - H^{-1} G$$

where H is the Hessian (the matrix of all 2nd derivatives) and G is the gradient (the vector of all 1st derivatives)
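A generic sketch of this update (illustrative only; `grad` and `hess` are hypothetical caller-supplied callables returning G and H at a parameter vector):

```python
import numpy as np

def newton_minimize(grad, hess, p0, iters=20):
    """Newton's method for minimization: p_{i+1} = p_i - H(p_i)^{-1} G(p_i)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        # Solve H * step = G rather than forming an explicit inverse.
        p = p - np.linalg.solve(hess(p), grad(p))
    return p
```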

Page 5: Data Modeling and Least Squares Fitting 2

Newton’s Method for Least Squares

• Gradient has 1st derivatives of f, Hessian 2nd:

$$\chi^2 = \sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

$$G = \begin{pmatrix} -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial f}{\partial a} \\[4pt] -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial f}{\partial b} \end{pmatrix}$$

$$H = 2 \begin{pmatrix} \sum_i \left[ \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} - (y_i - f)\frac{\partial^2 f}{\partial a^2} \right] & \sum_i \left[ \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} - (y_i - f)\frac{\partial^2 f}{\partial a\,\partial b} \right] \\[4pt] \sum_i \left[ \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} - (y_i - f)\frac{\partial^2 f}{\partial a\,\partial b} \right] & \sum_i \left[ \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} - (y_i - f)\frac{\partial^2 f}{\partial b^2} \right] \end{pmatrix}$$

Page 6: Data Modeling and Least Squares Fitting 2

Gauss-Newton Iteration

• Consider one term of the Hessian:

$$\frac{\partial^2 \chi^2}{\partial a^2} = -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial^2 f}{\partial a^2} + 2 \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a}$$

• If close to the answer, the first term is close to 0

• Gauss-Newton method: ignore the first term!

– Eliminates the requirement to calculate 2nd derivatives of f

– Surprising fact: still superlinear convergence if “close enough” to the answer
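A minimal Gauss-Newton sketch for the running exponential model f(x; a, b) = a e^{bx} (my illustration, hypothetical names; with the first term dropped, H ≈ 2 JᵀJ where J is the Jacobian of f):

```python
import numpy as np

def gauss_newton_exp(x, y, a, b, iters=20):
    """Gauss-Newton for f(x; a, b) = a * exp(b * x): no 2nd derivatives needed."""
    for _ in range(iters):
        r = y - a * np.exp(b * x)                    # residuals y_i - f(x_i)
        # Jacobian of f w.r.t. (a, b); H is approximated by 2 J^T J and G = -2 J^T r,
        # so the Newton step -H^{-1} G becomes (J^T J)^{-1} J^T r.
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step = np.linalg.solve(J.T @ J, J.T @ r)
        a, b = a + step[0], b + step[1]
    return a, b
```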

Page 7: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Newton (and Gauss-Newton) work well when close to the answer, terribly when far away

• Steepest descent is safe when far away

• Levenberg-Marquardt idea: let’s do both

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i \;-\; \underbrace{\begin{pmatrix} \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} \\[4pt] \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} \end{pmatrix}^{-1} G}_{\text{Gauss-Newton}} \;-\; \underbrace{\lambda\,G}_{\text{Steepest descent}}$$

Page 8: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Trade off between the two depending on how far away you are…

• Clever way of doing this:

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i \;-\; \begin{pmatrix} (1+\lambda) \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} \\[4pt] \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} & (1+\lambda) \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} \end{pmatrix}^{-1} G$$

• If λ is small, mostly like Gauss-Newton

• If λ is big, the matrix becomes mostly diagonal and behaves like steepest descent

Page 9: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Final bit of cleverness: adjust λ depending on how well we’re doing (see the sketch below)

– Start with some λ, e.g. 0.001

– If the last iteration decreased error, accept the step and decrease λ to λ/10

– If the last iteration increased error, reject the step and increase λ to 10λ

• Result: a fairly stable algorithm, not too painful (no 2nd derivatives), used a lot
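Pulling together the Gauss-Newton step, the (1+λ) diagonal scaling, and the λ schedule above, here is a minimal Levenberg-Marquardt sketch for the same exponential model (my illustration; all names are hypothetical):

```python
import numpy as np

def lm_exp(x, y, a, b, lam=1e-3, iters=50):
    """Levenberg-Marquardt for f(x; a, b) = a * exp(b * x)."""
    def chi2(a_, b_):
        return np.sum((y - a_ * np.exp(b_ * x)) ** 2)

    err = chi2(a, b)
    for _ in range(iters):
        r = y - a * np.exp(b * x)
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        A = J.T @ J
        A[np.diag_indices_from(A)] *= (1.0 + lam)    # boost diagonal by (1 + lambda)
        step = np.linalg.solve(A, J.T @ r)
        new_err = chi2(a + step[0], b + step[1])
        if new_err < err:                            # error decreased: accept step,
            a, b, err, lam = a + step[0], b + step[1], new_err, lam / 10
        else:                                        # error increased: reject step,
            lam *= 10                                # move toward steepest descent
    return a, b
```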

Page 10: Data Modeling and Least Squares Fitting 2

Outliers

• A lot of derivations assume a Gaussian distribution for errors

• Unfortunately, nature (and experimenters) sometimes don’t cooperate

• Outliers: points with extremely low probability of occurrence (according to Gaussian statistics)

• Can have a strong influence on least squares

[Figure: probability vs. error for a Gaussian and a non-Gaussian (heavy-tailed) distribution]

Page 11: Data Modeling and Least Squares Fitting 2

Robust Estimation

• Goal: develop parameter estimation methods insensitive to small numbers of large errors

• General approach: try to give large deviations less weight

• M-estimators: minimize some function other than the square of $y - f(x; a, b, \ldots)$

Page 12: Data Modeling and Least Squares Fitting 2

Least Absolute Value Fitting

• Minimize

$$\sum_i \left| y_i - f(x_i; a, b, \ldots) \right|$$

instead of

$$\sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

• Points far away from the trend get comparatively less influence

Page 13: Data Modeling and Least Squares Fitting 2

Example: Constant

• For the constant function y = a, minimizing $\sum_i (y_i - a)^2$ gave a = mean

• Minimizing $\sum_i |y_i - a|$ gives a = median (numeric check below)
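A quick numeric check of this claim (my illustration; the data are made up):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])    # one large outlier
a_grid = np.linspace(0.0, 100.0, 100001)     # candidate constants, step 0.001

sq_err = np.array([np.sum((y - a) ** 2) for a in a_grid])
abs_err = np.array([np.sum(np.abs(y - a)) for a in a_grid])

print(a_grid[np.argmin(sq_err)], np.mean(y))     # ~22.0: the mean, dragged by the outlier
print(a_grid[np.argmin(abs_err)], np.median(y))  # 3.0: the median, ignores the outlier
```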

Page 14: Data Modeling and Least Squares Fitting 2

Doing Robust Fitting

• In the general case, a nasty function: discontinuous derivative

• Simplex method often a good choice (see the sketch below)
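A minimal sketch of a robust line fit via the simplex method, using scipy's Nelder-Mead option (one reasonable tool choice, not something the slides prescribe; data and names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical line data y ~ 0 + 1*x, with one gross outlier at the end.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9, 20.0])

def abs_loss(params):
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))   # least-absolute-value objective

# Nelder-Mead is derivative-free, so the discontinuous derivative of |.| is no problem.
result = minimize(abs_loss, x0=[0.0, 1.0], method="Nelder-Mead")
a_hat, b_hat = result.x   # near (0, 1), largely ignoring the outlier
```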

Page 15: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Sometimes-used approximation: convert to iterated weighted least squares

$$\sum_i \left| y_i - f(x_i; a, b, \ldots) \right| = \sum_i \frac{1}{\left| y_i - f(x_i; a, b, \ldots) \right|} \left( y_i - f(x_i; a, b, \ldots) \right)^2 = \sum_i w_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

with $w_i$ based on the previous iteration

Page 16: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Different options for weights:

$$w_i = \frac{1}{\left| y_i - f(x_i; a, b, \ldots) \right|}$$

– Avoid problems with infinities:

$$w_i = \frac{1}{k + \left| y_i - f(x_i; a, b, \ldots) \right|} \qquad\text{or}\qquad w_i = \frac{1}{k + \left( y_i - f(x_i; a, b, \ldots) \right)^2}$$

– Give even less weight to outliers:

$$w_i = e^{-k \left( y_i - f(x_i; a, b, \ldots) \right)^2}$$

Page 17: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Danger! This is not guaranteed to converge to the right answer!

– Needs a good starting point, which is available if the initial least squares estimate is reasonable

– In general, works OK if there are few outliers and they are not too far off (sketch below)
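A minimal IRLS sketch for a line y = a + bx approximating the least-absolute-value fit (my illustration, hypothetical names; the small eps implements the "avoid infinities" variant of the weights from the previous slide):

```python
import numpy as np

def irls_line(x, y, iters=10, eps=1e-8):
    """Iteratively reweighted least squares for y = a + b*x with w_i ~ 1/|r_i|."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(iters):
        # Weighted linear least squares: solve (X^T W X) p = X^T W y.
        W = np.diag(w)
        p = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        w = 1.0 / (eps + np.abs(y - X @ p))   # reweight from current residuals
    return p   # [a, b]
```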

Page 18: Data Modeling and Least Squares Fitting 2

Outlier Detection and Rejection

• Special case of IRWLS: set weight = 0 if outlier, 1 otherwise

• Detecting outliers: $(y_i - f(x_i))^2 >$ threshold

– One choice: a multiple of the mean squared difference

– Better choice: a multiple of the median squared difference (sketch below)

– Can iterate…

– As before, not guaranteed to do anything reasonable; tends to work OK if there are only a few outliers
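A sketch of the median-based rejection loop (my illustration; `fit` and `predict` are hypothetical caller-supplied functions):

```python
import numpy as np

def reject_outliers(x, y, fit, predict, k=6.0, rounds=3):
    """Weight-0/1 IRWLS: drop points whose squared difference exceeds a
    multiple k of the median squared difference, then refit; iterate."""
    keep = np.ones(y.size, dtype=bool)
    for _ in range(rounds):
        params = fit(x[keep], y[keep])
        sq = (y - predict(x, params)) ** 2
        keep = sq < k * np.median(sq)   # the median itself is robust to outliers
    return fit(x[keep], y[keep]), keep
```

For a line, `fit=lambda x, y: np.polyfit(x, y, 1)` and `predict=lambda x, p: np.polyval(p, x)` would do.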

Page 19: Data Modeling and Least Squares Fitting 2

RANSAC

• RANdom SAmple Consensus: designed for bad data (in the best case, up to 50% outliers)

• Take many random subsets of the data

– Compute a least squares fit for each sample

– See how many points agree: $(y_i - f(x_i))^2 <$ threshold

– Threshold user-specified or estimated from further trials

• At the end, use the fit that agreed with the most points

– Can do one final least squares fit with all inliers (sketch below)
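A minimal RANSAC sketch for a line (my illustration with hypothetical names; n_trials, sample_size, and thresh would be tuned per problem):

```python
import numpy as np

def ransac_line(x, y, n_trials=100, sample_size=2, thresh=1.0, rng=None):
    """Fit many random minimal samples, keep the fit with the largest
    consensus set, then do one final least squares on its inliers."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = None
    for _ in range(n_trials):
        idx = rng.choice(x.size, size=sample_size, replace=False)
        b, a = np.polyfit(x[idx], y[idx], 1)          # least squares on the sample
        inliers = (y - (a + b * x)) ** 2 < thresh     # points that agree
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least squares fit using all inliers of the winning sample.
    return np.polyfit(x[best_inliers], y[best_inliers], 1)
```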