
Page 1: Data Modeling and Least Squares Fitting 2

Data Modeling and Least Squares Fitting 2

COS 323

Page 2: Data Modeling and Least Squares Fitting 2

Nonlinear Least Squares

• Some problems can be rewritten to linear

• Fit data points $(x_i, \log y_i)$ to $a^* + bx$, with $a = e^{a^*}$

• Big problem: this no longer minimizes squared error!

$$y = a\,e^{bx} \quad\Longrightarrow\quad \log y = (\log a) + bx$$
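To make the trick concrete, here is a minimal numpy sketch (mine, not from the slides; the synthetic data and constants are hypothetical). It fits $(x_i, \log y_i)$ with ordinary linear least squares:

```python
import numpy as np

# Hypothetical data from y = 2 * exp(0.5 x), with a little multiplicative noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 30)
y = 2.0 * np.exp(0.5 * x) * (1.0 + 0.05 * rng.standard_normal(x.size))

# Ordinary linear least squares on (x_i, log y_i): slope is b, intercept is a*.
b, a_star = np.polyfit(x, np.log(y), 1)
a = np.exp(a_star)   # recover a = e^{a*}

print(a, b)   # close to (2, 0.5), but this minimized error in log y, not in y
```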

Page 3: Data Modeling and Least Squares Fitting 2

Nonlinear Least Squares

• Can write the error function and minimize it directly

• For the exponential, there is no analytic solution for a, b:

Set
$$\frac{\partial \chi^2}{\partial a} = 0, \quad \frac{\partial \chi^2}{\partial b} = 0, \quad \text{etc.}, \qquad \text{where } \chi^2 = \sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

For $f(x; a, b) = a\,e^{bx}$ this gives

$$\frac{\partial \chi^2}{\partial a} = -2 \sum_i e^{b x_i} \left( y_i - a\,e^{b x_i} \right) = 0$$

$$\frac{\partial \chi^2}{\partial b} = -2 \sum_i a\,x_i\,e^{b x_i} \left( y_i - a\,e^{b x_i} \right) = 0$$
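With no closed form, the error function is minimized numerically. Below is a minimal sketch using scipy.optimize.least_squares, one convenient solver choice rather than anything the slides prescribe (data and names are hypothetical):

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data from y = 2 * exp(0.5 x), with noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 30)
y = 2.0 * np.exp(0.5 * x) * (1.0 + 0.05 * rng.standard_normal(x.size))

def residuals(params):
    a, b = params
    return y - a * np.exp(b * x)   # r_i = y_i - f(x_i; a, b)

# Minimize the sum of squared residuals directly; the linearized fit from the
# previous slide makes a good starting guess.
result = least_squares(residuals, x0=[1.0, 1.0])
a_hat, b_hat = result.x
```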

Page 4: Data Modeling and Least Squares Fitting 2

Newton’s Method

• Apply Newton’s method for minimization:

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i - H^{-1} G$$

where H is the Hessian (the matrix of all 2nd derivatives) and G is the gradient (the vector of all 1st derivatives)
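A generic sketch of this update (illustrative only; `grad` and `hess` are hypothetical caller-supplied callables returning G and H at a parameter vector):

```python
import numpy as np

def newton_minimize(grad, hess, p0, iters=20):
    """Newton's method for minimization: p_{i+1} = p_i - H(p_i)^{-1} G(p_i)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        # Solve H * step = G rather than forming an explicit inverse.
        p = p - np.linalg.solve(hess(p), grad(p))
    return p
```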

Page 5: Data Modeling and Least Squares Fitting 2

Newton’s Method for Least Squares

• Gradient has 1st derivatives of f, Hessian 2nd:

$$\chi^2 = \sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

$$G = \begin{pmatrix} -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial f}{\partial a} \\[4pt] -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial f}{\partial b} \end{pmatrix}$$

$$H = 2 \begin{pmatrix} \sum_i \left[ \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} - (y_i - f)\frac{\partial^2 f}{\partial a^2} \right] & \sum_i \left[ \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} - (y_i - f)\frac{\partial^2 f}{\partial a\,\partial b} \right] \\[4pt] \sum_i \left[ \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} - (y_i - f)\frac{\partial^2 f}{\partial a\,\partial b} \right] & \sum_i \left[ \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} - (y_i - f)\frac{\partial^2 f}{\partial b^2} \right] \end{pmatrix}$$

Page 6: Data Modeling and Least Squares Fitting 2

Gauss-Newton Iteration

• Consider one term of the Hessian:

$$\frac{\partial^2 \chi^2}{\partial a^2} = -2 \sum_i \left( y_i - f(x_i; a, b, \ldots) \right) \frac{\partial^2 f}{\partial a^2} + 2 \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a}$$

• If close to the answer, the first term is close to 0

• Gauss-Newton method: ignore the first term!

– Eliminates the requirement to calculate 2nd derivatives of f

– Surprising fact: still superlinear convergence if “close enough” to the answer
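A minimal Gauss-Newton sketch for the running exponential model f(x; a, b) = a e^{bx} (my illustration, hypothetical names; with the first term dropped, H ≈ 2 JᵀJ where J is the Jacobian of f):

```python
import numpy as np

def gauss_newton_exp(x, y, a, b, iters=20):
    """Gauss-Newton for f(x; a, b) = a * exp(b * x): no 2nd derivatives needed."""
    for _ in range(iters):
        r = y - a * np.exp(b * x)                    # residuals y_i - f(x_i)
        # Jacobian of f w.r.t. (a, b); H is approximated by 2 J^T J and G = -2 J^T r,
        # so the Newton step -H^{-1} G becomes (J^T J)^{-1} J^T r.
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step = np.linalg.solve(J.T @ J, J.T @ r)
        a, b = a + step[0], b + step[1]
    return a, b
```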

Page 7: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Newton (and Gauss-Newton) work well when close to the answer, terribly when far away

• Steepest descent is safe when far away

• Levenberg-Marquardt idea: let’s do both

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i \;-\; \underbrace{\begin{pmatrix} \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} \\[4pt] \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} \end{pmatrix}^{-1} G}_{\text{Gauss-Newton}} \;-\; \underbrace{\lambda\,G}_{\text{Steepest descent}}$$

Page 8: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Trade off between the two depending on how far away you are…

• Clever way of doing this:

$$\begin{pmatrix} a \\ b \end{pmatrix}_{i+1} = \begin{pmatrix} a \\ b \end{pmatrix}_i \;-\; \begin{pmatrix} (1+\lambda) \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial a} & \sum_i \frac{\partial f}{\partial a}\frac{\partial f}{\partial b} \\[4pt] \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial a} & (1+\lambda) \sum_i \frac{\partial f}{\partial b}\frac{\partial f}{\partial b} \end{pmatrix}^{-1} G$$

• If λ is small, mostly like Gauss-Newton

• If λ is big, the matrix becomes mostly diagonal and behaves like steepest descent

Page 9: Data Modeling and Least Squares Fitting 2

Levenberg-Marquardt

• Final bit of cleverness: adjust λ depending on how well we’re doing (see the sketch below)

– Start with some λ, e.g. 0.001

– If the last iteration decreased error, accept the step and decrease λ to λ/10

– If the last iteration increased error, reject the step and increase λ to 10λ

• Result: a fairly stable algorithm, not too painful (no 2nd derivatives), used a lot
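Pulling together the Gauss-Newton step, the (1+λ) diagonal scaling, and the λ schedule above, here is a minimal Levenberg-Marquardt sketch for the same exponential model (my illustration; all names are hypothetical):

```python
import numpy as np

def lm_exp(x, y, a, b, lam=1e-3, iters=50):
    """Levenberg-Marquardt for f(x; a, b) = a * exp(b * x)."""
    def chi2(a_, b_):
        return np.sum((y - a_ * np.exp(b_ * x)) ** 2)

    err = chi2(a, b)
    for _ in range(iters):
        r = y - a * np.exp(b * x)
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        A = J.T @ J
        A[np.diag_indices_from(A)] *= (1.0 + lam)    # boost diagonal by (1 + lambda)
        step = np.linalg.solve(A, J.T @ r)
        new_err = chi2(a + step[0], b + step[1])
        if new_err < err:                            # error decreased: accept step,
            a, b, err, lam = a + step[0], b + step[1], new_err, lam / 10
        else:                                        # error increased: reject step,
            lam *= 10                                # move toward steepest descent
    return a, b
```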

Page 10: Data Modeling and Least Squares Fitting 2

Outliers

• A lot of derivations assume a Gaussian distribution for errors

• Unfortunately, nature (and experimenters) sometimes don’t cooperate

• Outliers: points with extremely low probability of occurrence (according to Gaussian statistics)

• Can have a strong influence on least squares

[Figure: probability vs. error for a Gaussian and a non-Gaussian (heavy-tailed) distribution]

Page 11: Data Modeling and Least Squares Fitting 2

Robust Estimation

• Goal: develop parameter estimation methods insensitive to small numbers of large errors

• General approach: try to give large deviations less weight

• M-estimators: minimize some function other than the square of $y - f(x; a, b, \ldots)$

Page 12: Data Modeling and Least Squares Fitting 2

Least Absolute Value Fitting

• Minimize

$$\sum_i \left| y_i - f(x_i; a, b, \ldots) \right|$$

instead of

$$\sum_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

• Points far away from the trend get comparatively less influence

Page 13: Data Modeling and Least Squares Fitting 2

Example: Constant

• For the constant function y = a, minimizing $\sum_i (y_i - a)^2$ gave a = mean

• Minimizing $\sum_i |y_i - a|$ gives a = median (numeric check below)
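A quick numeric check of this claim (my illustration; the data are made up):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])    # one large outlier
a_grid = np.linspace(0.0, 100.0, 100001)     # candidate constants, step 0.001

sq_err = np.array([np.sum((y - a) ** 2) for a in a_grid])
abs_err = np.array([np.sum(np.abs(y - a)) for a in a_grid])

print(a_grid[np.argmin(sq_err)], np.mean(y))     # ~22.0: the mean, dragged by the outlier
print(a_grid[np.argmin(abs_err)], np.median(y))  # 3.0: the median, ignores the outlier
```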

Page 14: Data Modeling and Least Squares Fitting 2

Doing Robust Fitting

• In the general case, a nasty function: discontinuous derivative

• Simplex method often a good choice (see the sketch below)
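A minimal sketch of a robust line fit via the simplex method, using scipy's Nelder-Mead option (one reasonable tool choice, not something the slides prescribe; data and names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical line data y ~ 0 + 1*x, with one gross outlier at the end.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9, 20.0])

def abs_loss(params):
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))   # least-absolute-value objective

# Nelder-Mead is derivative-free, so the discontinuous derivative of |.| is no problem.
result = minimize(abs_loss, x0=[0.0, 1.0], method="Nelder-Mead")
a_hat, b_hat = result.x   # near (0, 1), largely ignoring the outlier
```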

Page 15: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Sometimes-used approximation: convert to iterated weighted least squares

$$\sum_i \left| y_i - f(x_i; a, b, \ldots) \right| = \sum_i \frac{1}{\left| y_i - f(x_i; a, b, \ldots) \right|} \left( y_i - f(x_i; a, b, \ldots) \right)^2 = \sum_i w_i \left( y_i - f(x_i; a, b, \ldots) \right)^2$$

with $w_i$ based on the previous iteration

Page 16: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Different options for weights:

$$w_i = \frac{1}{\left| y_i - f(x_i; a, b, \ldots) \right|}$$

– Avoid problems with infinities:

$$w_i = \frac{1}{k + \left| y_i - f(x_i; a, b, \ldots) \right|} \qquad\text{or}\qquad w_i = \frac{1}{k + \left( y_i - f(x_i; a, b, \ldots) \right)^2}$$

– Give even less weight to outliers:

$$w_i = e^{-k \left( y_i - f(x_i; a, b, \ldots) \right)^2}$$

Page 17: Data Modeling and Least Squares Fitting 2

Iteratively Reweighted Least Squares

• Danger! This is not guaranteed to converge to the right answer!

– Needs a good starting point, which is available if the initial least squares estimate is reasonable

– In general, works OK if there are few outliers and they are not too far off (sketch below)
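A minimal IRLS sketch for a line y = a + bx approximating the least-absolute-value fit (my illustration, hypothetical names; the small eps implements the "avoid infinities" variant of the weights from the previous slide):

```python
import numpy as np

def irls_line(x, y, iters=10, eps=1e-8):
    """Iteratively reweighted least squares for y = a + b*x with w_i ~ 1/|r_i|."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(iters):
        # Weighted linear least squares: solve (X^T W X) p = X^T W y.
        W = np.diag(w)
        p = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        w = 1.0 / (eps + np.abs(y - X @ p))   # reweight from current residuals
    return p   # [a, b]
```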

Page 18: Data Modeling and Least Squares Fitting 2

Outlier Detection and Rejection

• Special case of IRWLS: set weight = 0 if outlier, 1 otherwise

• Detecting outliers: $(y_i - f(x_i))^2 >$ threshold

– One choice: a multiple of the mean squared difference

– Better choice: a multiple of the median squared difference (sketch below)

– Can iterate…

– As before, not guaranteed to do anything reasonable; tends to work OK if there are only a few outliers
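A sketch of the median-based rejection loop (my illustration; `fit` and `predict` are hypothetical caller-supplied functions):

```python
import numpy as np

def reject_outliers(x, y, fit, predict, k=6.0, rounds=3):
    """Weight-0/1 IRWLS: drop points whose squared difference exceeds a
    multiple k of the median squared difference, then refit; iterate."""
    keep = np.ones(y.size, dtype=bool)
    for _ in range(rounds):
        params = fit(x[keep], y[keep])
        sq = (y - predict(x, params)) ** 2
        keep = sq < k * np.median(sq)   # the median itself is robust to outliers
    return fit(x[keep], y[keep]), keep
```

For a line, `fit=lambda x, y: np.polyfit(x, y, 1)` and `predict=lambda x, p: np.polyval(p, x)` would do.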

Page 19: Data Modeling and Least Squares Fitting 2

RANSAC

• RANdom SAmple Consensus: designed for bad data (in the best case, up to 50% outliers)

• Take many random subsets of the data

– Compute a least squares fit for each sample

– See how many points agree: $(y_i - f(x_i))^2 <$ threshold

– Threshold user-specified or estimated from further trials

• At the end, use the fit that agreed with the most points

– Can do one final least squares fit with all inliers (sketch below)
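A minimal RANSAC sketch for a line (my illustration with hypothetical names; n_trials, sample_size, and thresh would be tuned per problem):

```python
import numpy as np

def ransac_line(x, y, n_trials=100, sample_size=2, thresh=1.0, rng=None):
    """Fit many random minimal samples, keep the fit with the largest
    consensus set, then do one final least squares on its inliers."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = None
    for _ in range(n_trials):
        idx = rng.choice(x.size, size=sample_size, replace=False)
        b, a = np.polyfit(x[idx], y[idx], 1)          # least squares on the sample
        inliers = (y - (a + b * x)) ** 2 < thresh     # points that agree
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least squares fit using all inliers of the winning sample.
    return np.polyfit(x[best_inliers], y[best_inliers], 1)
```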