uva cs 4501: machine learning lecture 5: non …...rbf= radial-basis function:a function which...

82
UVA CS 4501: Machine Learning Lecture 5: Non-Linear Regression Models Dr. Yanjun Qi University of Virginia Department of Computer Science 10/6/18 Dr. Yanjun Qi / UVA CS 1

Upload: others

Post on 20-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

UVACS4501:MachineLearning

Lecture5:Non-LinearRegressionModels

Dr.Yanjun Qi

UniversityofVirginiaDepartmentofComputerScience

10/6/18

Dr.YanjunQi/UVACS

1

Page 2: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Wherearewe?èFivemajorsectionsofthiscourse

q Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheoryq Graphicalmodels

10/6/18 2

Dr.YanjunQi/UVACS

Page 3: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Regression(supervised)q Fourwaystotrain/performoptimizationforlinearregressionmodelsq NormalEquationq GradientDescent(GD)q StochasticGDq Newton’smethod

qSupervisedregressionmodelsqLinearregression(LR)qLRwithnon-linearbasisfunctionsqLocallyweightedLRqLRwithRegularizations

10/6/18 3

Dr.YanjunQi/UVACS

Page 4: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

TodayèRegression(supervised)

q Fourwaystotrain/performoptimizationforlinearregressionmodelsq NormalEquationq GradientDescent(GD)q StochasticGDq Newton’smethod

qSupervisedregressionmodelsqLinearregression(LR)qLRwithnon-linearbasisfunctionsqLocallyweightedLRqLRwithRegularizations

10/6/18 4

Dr.YanjunQi/UVACS

Page 5: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Today

q RegressionModelsBeyondLinear– LRwithnon-linearbasisfunctions– Instance-basedRegression:K-NearestNeighbors(later)

– Locallyweightedlinearregression(extra)–RegressiontreesandMultilinear Interpolation(later)

10/6/18 5

Dr.YanjunQi/UVACS

Page 6: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

6

LRwithnon-linearbasisfunctions

• LRdoesnotmeanwecanonlydealwithlinearrelationships

y =θ0 + θ jϕ j(x)j=1m∑ =θTϕ(x)

10/6/18

y =θ Tx

Page 7: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

7

LRwithnon-linearbasisfunctions

• Wearefreetodesignbasisfunctions(e.g.,non-linearfeatures:

Herearefixedbasisfunctions(alsodefine)

• E.g.:polynomialregression:

ϕ(x):= 1,x ,x2⎡⎣ ⎤⎦T

10/6/18

!!ϕ j(x) !!ϕ0(x)=1

Page 8: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

e.g.(1)polynomialregression

10/6/18

Dr.YanjunQi/UVACS

8

θ * = ϕTϕ( )−1ϕT !y( ) yXXX TT !1−=*θ

y =θTϕ(x)y =θ Tx

ϕ(x):= 1,x ,x2⎡⎣ ⎤⎦T

Page 9: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

e.g.(1)polynomialregression

10/6/18

Dr.YanjunQi/UVACS

9

KEY:ifthebasesaregiven,theproblemof

learningtheparameters isstilllinear.

y =θTϕ(x)y =θ Tx

Page 10: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

10

ManyPossibleBasisfunctions• Therearemanybasisfunctions,e.g.:

– Polynomial

– Radialbasisfunctions

– Sigmoidal

– Splines,– Fourier,– Wavelets,etc

ϕ j (x) = xj−1

( )⎟⎟⎠

⎞⎜⎜⎝

⎛ −−= 2

2

2sx

x jj

µφ exp)(

⎟⎟⎠

⎞⎜⎜⎝

⎛ −=

sx

x jj

µσφ )(

10/6/18

Page 11: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

ManyPossibleBasisfunctions

10/6/18

Dr.YanjunQi/UVACS

11

Page 12: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

12

e.g.(2)LRwithradial-basisfunctions

• E.g.:LRwithRBFregression:

!! y =θ0 + θ jϕ j(x)j=1m∑ =ϕ(x)Tθ

10/6/18

!!ϕ(x):= 1,Kλ1(x ,r1),Kλ2(x ,r2),Kλ3

(x ,r3),Kλ4(x ,r4 )⎡⎣

⎤⎦T

θ * = ϕTϕ( )−1ϕT !y

Page 13: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Kλ(x ,r)= exp − (x − r)

2

2λ2⎛

⎝⎜⎞

⎠⎟

RBF = radial-basisfunction: afunctionwhichdependsonlyontheradial distancefromacentre point

GaussianRBFè

asdistance fromthecenterr increases, theoutputoftheRBFdecreases

1Dcase 2Dcase

10/6/18

Dr.YanjunQi/UVACS

13

Page 14: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18

Dr.YanjunQi/UVACS

14

Kλ(x ,r)= exp − (x − r)

2

2λ2⎛

⎝⎜⎞

⎠⎟

X=

10.6065307

0.1353353

0.0001234098

Kλ(x ,r)=r

r +λ

r +2λr +3λ

Page 15: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

e.g.anotherLinearregressionwith1DRBFbasisfunctions

(assuming3predefinedcentres andwidth)

10/6/18

Dr.YanjunQi/UVACS

15

ϕ(x):= 1,Kλ1(x ,r1),Kλ2(x ,r2),Kλ3

(x ,r3)⎡⎣

⎤⎦T

θ * = ϕTϕ( )−1ϕT !y

Page 16: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

16

e.g.aLRwith1DRBFs(3predefinedcentres andwidth)

• 1DRBF

• Afterfit:

10/6/18

Page 17: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

e.g.EvenmorepossibleBasisFunc?

10/6/18

Dr.YanjunQi/UVACS

17

Page 18: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

(2) Multivariate Linear Regression with basis Expansion

Regression

Y = Weighted linear sum of (X basis expansion)

SSE

Linear algebra

Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

!! y =θ0 + θ jϕ j(x)j=1m∑ =ϕ(x)Tθ

10/6/18 18

Dr.YanjunQi/UVACS

Page 19: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Twomainissues:

• ToLearntheparameter– AlmostthesameasLR,justè Xto– Linearcombinationofbasisfunctions(thatcanbenon-linear)

• Howtochoosethemodelorder,– E.g.whatpolynomialdegreeforpolynomialregression

– E.g.,wheretoputthecentersfortheRBFkernels?Howwide?

10/6/18

Dr.YanjunQi/UVACS

19

ϕ(x)θ *

Page 20: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

20

e.g.2DGoodandBadRBFBasis

• Agood2DRBF

• Twobad2DRBFs

10/6/18

Page 21: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Issue:OverfittingandUnderfitting

x

y

y=f(x)+noiseCanwelearnaregression ffromthedata?

Let’sconsiderthreemethods…

Page 22: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LinearRegression

x

y

Page 23: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

QuadraticRegression

x

y

Page 24: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Join-the-dots

x

y

Alsoknownaspiecewiselinearnonparametric regression ifthatmakesyoufeelbetter

Page 25: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Whichisbest?

x

y

x

y

Whynotchoosethemethodwiththebestfittothedata?

Page 26: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Whatdowereallywant?

x

y

x

y

Whynotchoosethemethodwiththebestfittothedata?

“Howwellareyougoing topredictfuturedatadrawnfromthesamedistribution?”

Page 27: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Whatdowereallywant?

x

y

x

y

Whynotchoosethemethodwiththebestfittothedata?

“Howwellareyougoing topredictfuturedatadrawnfromthesamedistribution?”

Underfit Good? Overfit

Page 28: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

28

Issue:Overfitting andunderfitting

xy 10 θθ += 2210 xxy θθθ ++= ∑ =

= 5

0jj

j xy θ

10/6/18

K-foldCrossValidation/Train-Test/

Generalisation:learnfunction/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”new dataexamples

Underfit Looksgood Overfit

Page 29: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

x

y

1.Randomlychoosesomepercentagelike30% ofthelabeleddatatobeinatestset2.Theremainder isatrainingset

Credit:Prof.AndrewMoore

Page 30: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

x

y

1.Randomlychoosesomepercentagelike30% ofthelabeleddatatobeinatestset2.Theremainder isatrainingset3.Performyour regressiononthetrainingset

(Linearregressionexample)

Credit:Prof.AndrewMoore

Page 31: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

x

y

1.Randomlychoose30%ofthedatatobeinatestset2.Theremainder isatrainingset3.Performyour regressiononthetrainingset4.Estimateyour futureperformancewiththetestset

(Linearregressionexample)MeanSquaredError=2.4

Credit:Prof.AndrewMoore

Page 32: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Evaluation: e.g. train / test split as follows

10/6/18 32

Xtrain =

−− x1T −−

−− x2T −−

! ! !−− xn

T −−

"

#

$$$$$

%

&

'''''

!ytrain =

y1y2"yn

!

"

#####

$

%

&&&&&

Xtest =

−− xn+1T −−

−− xn+2T −−

! ! !−− xn+m

T −−

"

#

$$$$$

%

&

'''''

!ytest =

yn+1yn+2"yn+m

!

"

#####

$

%

&&&&&

Dr.YanjunQi/UVACS

Page 33: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18 33

TestingMSEErrortoreport:

Jtest =1m

(x iTθ * − yi )2i=n+1

n+m

∑ = 1m

ε i2

i=n+1

n+m

Dr.YanjunQi/UVACS

InHomework,whenweaskforplotsof trainingerror,weaskfortheMSEper-sampletrainerrors;BecauseitiscomparabletotestMSEerror.

Page 34: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18

Dr.YanjunQi/

34

Jtrain−MSE =

1n

(x iTθ * − yi )2i=1

n

• TrainMSEErrortoobserve:

InHomework,whenweaskforplotsof trainingerror,weaskfortheMSEper-sampletrainerrors;BecauseitiscomparabletotestMSEerror.

Inmanysituations,visualizingTrain-MSEcanbehelpful tounderstand thebehaviorofyourmethod, e.g.,theinfluenceofthehyperparameteryouchose……

Page 35: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

x

y

1.Randomlychoose30%ofthedatatobeinatestset2.Theremainder isatrainingset3.Performyour regressiononthetrainingset4.Estimateyour futureperformancewiththetestset

(Quadraticregressionexample)MeanSquaredError=0.9

Credit:Prof.AndrewMoore

Page 36: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

x

y

1.Randomlychoose30%ofthedatatobeinatestset2.Theremainder isatrainingset3.Performyour regressiononthetrainingset4.Estimateyour futureperformancewiththetestset

(Join thedotsexample)MeanSquaredError=2.2

Credit:Prof.AndrewMoore

Page 37: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

Goodnews:•Veryverysimple•Canthensimplychoosethemethodwiththebesttest-setscoreBadnews:•What’sthedownside?

Credit:Prof.AndrewMoore

Page 38: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Thetestsetmethod

Goodnews:•Veryverysimple•Canthensimplychoosethemethodwiththebesttest-setscoreBadnews:•Wastesdata:wegetanestimateofthebestmethodtoapplyto30%lessdata•Ifwedon’thavemuchdata,ourtest-setmightjustbeluckyorunlucky

We say the “test-set estimator of performance has high variance”

Credit:Prof.AndrewMoore

Page 39: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Regression: Complexity versus Goodness of Fit

x

y

x

y

x

y

x

y

Too simple?

Too complex ? About right ?

Training data

What ultimately matters: GENERALIZATION

LowVariance/HighBias

LowBias/HighVariance

10/6/18 39

Dr.YanjunQi/UVACS

Page 40: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)

x

y

For k=1 to n

1. Let (xk,yk) be the kth record

Credit:Prof.AndrewMoore

Page 41: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)

x

y

For k=1 to n

1. Let (xk,yk) be the kth record

2. Temporarily remove (xk,yk)from the dataset

Page 42: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)

x

y

For k=1 to n

1. Let (xk,yk) be the kth record

2. Temporarily remove (xk,yk)from the dataset

3. Train on the remaining R-1 datapoints

Credit:Prof.AndrewMoore

Page 43: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)For k=1 to n

1. Let (xk,yk) be the kth record

2. Temporarily remove (xk,yk)from the dataset

3. Train on the remaining R-1 datapoints

4. Note your error (xk,yk)

x

y

Page 44: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)For k=1 to R

1. Let (xk,yk) be the kth record

2. Temporarily remove (xk,yk)from the dataset

3. Train on the remaining R-1 datapoints

4. Note your error (xk,yk)

When you’ve done all points, report the mean error.

x

y

Page 45: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCV(Leave-one-outCrossValidation)For k=1 to n

1. Let (xk,yk) be the kth

record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining R-1 datapoints

4. Note your error (xk,yk)

When you’ve done all points, report the mean error.

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

MSELOOCV=2.12

Credit:Prof.AndrewMoore

Page 46: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCVforQuadraticRegressionFor k=1 to n

1. Let (xk,yk) be the kth record

2. Temporarily remove (xk,yk)from the dataset

3. Train on the remaining R-1 datapoints

4. Note your error (xk,yk)

When you’ve done all points, report the mean error.

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

MSELOOCV=0.962

Credit:Prof.AndrewMoore

Page 47: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LOOCVforJoinTheDotsFor k=1 to n

1. Let (xk,yk) be the kth

record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining R-1 datapoints

4. Note your error (xk,yk)

When you’ve done all points, report the mean error.

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

x

y

MSELOOCV=3.33

Credit:Prof.AndrewMoore

Page 48: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

WhichkindofCrossValidation?Downside Upside

Test-set Variance: unreliable estimate of future performance

Cheap

Leave-one-out

Expensive. Has some weird behavior

Doesn’t waste data

..canwegetthebestofbothworlds?

Credit:Prof.AndrewMoore

Page 49: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

e.g.Byk=10foldCrossValidationmodel P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

1 train train train train train train train train train test

2 train train train train train train train train test train

3 train train train train train train train test train train

4 train train train train train train test train train train

5 train train train train train test train train train train

6 train train train train test train train train train train

7 train train train test train train train train train train

8 train train test train train train train train train train

9 train test train train train train train train train train

10 test train train train train train train train train train

• Dividedatainto10equalpieces

• 9piecesastrainingset,therest1astestset

• Collectthescoresfromthediagonal

• Wenormallyusethemeanofthescores 4910/6/18

Dr.YanjunQi/UVACSMakesurethatthetrain/test/validationfoldsareindeed independent samples.

Page 50: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

Credit:Prof.AndrewMoore

Page 51: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Credit:Prof.AndrewMoore

Page 52: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

Credit:Prof.AndrewMoore

Page 53: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points.

Credit:Prof.AndrewMoore

Page 54: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points.

Then report the mean error

LinearRegressionMSE3FOLD=2.05

Credit:Prof.AndrewMoore

Page 55: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points.

Then report the mean error

QuadraticRegressionMSE3FOLD=1.11

Credit:Prof.AndrewMoore

Page 56: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

k-foldCrossValidation

x

y

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Purple Green and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error

Joint-the-dotsMSE3FOLD=2.93

Credit:Prof.AndrewMoore

Page 57: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

WhichkindofCrossValidation?Downside Upside

Test-set Variance: unreliable estimate of future performance

Cheap

Leave-one-out

Expensive. Has some weird behavior

Doesn’t waste data

10-fold Wastes 10% of the data. 10 times more expensive than test set

Only wastes 10%. Only 10 times more expensive instead of R times.

3-fold Wastier than 10-fold. Expensivier than test set

better than test-set

n-fold Identical to Leave-one-out

Page 58: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

CV-basedModelSelection

• We’retryingtodecidewhichalgorithmtouse.• Wetraineachmachineandmakeatable…

i fi TRAINERR k-FOLD-CV-ERR Choice1 f12 f23 f3 Ö

4 f45 f56 f6

Credit:Prof.AndrewMoore

Page 59: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18

Dr.YanjunQi/UVACS

59Credit:StanfordMachineLearningcourse

Page 60: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

References

• BigthankstoProf.EricXing@CMUforallowingmetoreusesomeofhisslides

q Prof.Nando deFreitas’stutorialslideq Prof.AndrewMoore’sslides@CMU

10/6/18 60

Dr.YanjunQi/UVACS

Page 61: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

61

Extra: Performancevs.TrainingSizeforanexample

l TheresultsfromBatchGDandOnline(SGD)updatearealmostidentical.Sotheplotscoincide.

l ThetestMSEfromthenormalequationismorethanthatofBGDandO(SGD)during smalltraining.Thisisprobablyduetooverfitting.

l InBandO,sinceonly2000(forexample)iterationsareallowedatmost.Thisroughlyactsasamechanismthatavoidsoverfitting.

10/6/18

Dr.YanjunQi/UVACS

Dr.EricXing’s tutorialslide

Page 62: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

EXTRA:MOREREGRESSIONMODELS

10/6/18

Dr.YanjunQi/UVACS

62

Page 63: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

ExtraToday

q RegressionModelsBeyondLinear– LRwithnon-linearbasisfunctions– Instance-basedRegression:K-NearestNeighbors(later)

– Locallyweightedlinearregression(extra)–RegressiontreesandMultilinear Interpolation(later)

10/6/18 63

Dr.YanjunQi/UVACS

Page 64: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

K-NearestNeighbor

• Features– Allinstancescorrespondtopointsinanp-dimensionalEuclideanspace

– Regressionisdelayedtillanewinstancearrives– Regressionisdonebycomparingfeaturevectorsofthedifferentpoints

– Targetfunctionmaybediscreteorreal-valued• Whentargetiscontinuous,thepredictionisthemeanvalueoftheknearesttrainingexamples

Page 65: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

K=5-NearestNeighbor(1Dinput)

10/6/18

Dr.YanjunQi/UVACS

65

Page 66: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

K=1-NearestNeighbor(1Dinput)

10/6/18

Dr.YanjunQi/UVACS

66

Page 67: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

K-Nearest Neighbor

Regression/ classification

Local Smoothness

NA

NA

Training Samples

Task

Representation

Score Function

Search/Optimization

Models, Parameters

10/6/18 67

Dr.YanjunQi/UVACS6316/f16

Page 68: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Variants:Distance-Weightedk-NearestNeighborAlgorithm

• Assignweightstotheneighborsbasedontheir“distance” fromthequerypoint– Weight“may” beinversesquareofthedistances

• Alltrainingpointsmayinfluenceaparticularinstance– E.g.,Shepard’smethod/ModifiedShepard,… byGeospatialAnalysis

Page 69: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Instance-basedRegressionvs.LinearRegression

• LinearRegressionLearning– Explicitdescriptionoftargetfunctiononthewholetrainingset

• Instance-basedLearning– Learning=storingalltraininginstances– Referredtoas“Lazy” learning

Page 70: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

ExtraToday

q RegressionModelsBeyondLinear– LRwithnon-linearbasisfunctions– Instance-basedRegression:K-NearestNeighbors(later)

– Locallyweightedlinearregression(extra)–RegressiontreesandMultilinear Interpolation(later)

10/6/18 70

Dr.YanjunQi/UVACS

Page 71: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

71

Locally weighted regression• aka locally weighted regression, local

linear regression, LOESS, …– A combination of kNN and Linear regression

10/6/18

Dr.YanjunQi/UVACS

Kλ (xi, x0 )

Page 72: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

72

Locally weighted regression

10/6/18

Dr.YanjunQi/UVACS

UseRBFfunctiontopickout/emphasizetheneighborregionofx_0è

Kλ (xi, x0 )

Kλ (xi, x0 )

Page 73: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

73

Locally weighted regression

10/6/18

Dr.YanjunQi/UVACS

Alinear_func(x)->yèOnlytorepresent

theneighborregionofx_0

Kλ (xi, x0 )

f (x0)=θ0!(x0)+θ1

!(x0)x0

Page 74: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

74

Locallyweightedlinearregression

Insteadofminimizing

nowwefittominimize

∑=

−=n

ii

Ti yJ

1

2

21 )()( θθ x

J(θ ) = 12

wi (xiTθ − yi )

2

i=1

n

wi = Kλ(x i ,x0)= exp −

(x i − x0)22λ2

⎝⎜

⎠⎟

10/6/18

wherex_0 isthequerypointforwhichwe'dliketoknowitscorrespondingy

Page 75: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

75

Locallyweightedlinearregression

Wefit \thetatominimize

wi comesfrom:

• x_0 isthequerypointforwhichwe'dliketoknowitscorrespondingy

J(θ ) = 12

wi (xiTθ − yi )

2

i=1

n

wi = Kλ(x i ,x0)= exp −

(x i − x0)22λ2

⎝⎜

⎠⎟

10/6/18

Essentiallyweputhigherweightsontrainingexamplesthatareclosetothequerypointx_0(thanthosethatarefurtherawayfromthequerypointx_0)

Page 76: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

76

Locally weighted linear regression

• ThewidthofRBFmatters!

10/6/18

Dr.YanjunQi/UVACS

Page 77: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

LEARNING of Locally weighted linear regression

10/6/18

Dr.YanjunQi/UVACS

x0

77

• Separate weighted least squares training and inference at each target point x0

f (x0)=θ0!(x0)+θ1

!(x0)x0

Page 78: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18

Dr. Yanjun Qi / UVA CS

78

Locally weighted linear regression

• èSeparate weighted least square error minimization at each target point x0:

f (x0)= x0Tθ *(x0)

θ *(x0)= argmin12 wi(x iTθ(x0)− yi )2

i=1

n

= argmin12 Kλ(xi ,x0)(x iTθ(x0)− yi )2i=1

n

0000 )(ˆ)(ˆ)(ˆ xxxxf βα +=

Page 79: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18

Dr. Yanjun Qi / UVA CS

79

Extra: Solution of Locally weighted linear/NonLinearBasis regression

( ) NixxKdiagxW iNN ,,1,),()( 00 !==× λ

θ*(x0)= (BTW(x0)B)−1BTW(x0)y

versus LR θ * = XTX( )−1 XT !y

LWR

Page 80: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

10/6/18 80

More è Local Weighted Polynomial Regression

• Local polynomial fits of any degree d

∑∑ ∑

=

= ==

+=

⎥⎦

⎤⎢⎣

⎡−−

d

jj

j

N

i

d

j

jijiidjxx

xxxxf

xxxyxxKj

1 0000

1

2

1000,,1),(),(

)(ˆ)(ˆ)(ˆ

)()(),(min00

βα

βαλβα !

Dr.YanjunQi/UVACS

Blue:trueGreen:estimated

Page 81: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

Dr.YanjunQi/UVACS

81

Extra:Parametricvs.non-parametric

• Locallyweightedlinearregressionisanon-parametricalgorithm.

• The(unweighted)linearregressionalgorithmthatwesawearlierisknownasaparametric learningalgorithm– becauseithasafixed,finitenumberofparameters(the),which

arefittothedata;– Oncewe'vefitthe\theta andstoredthemaway,wenolongerneed

tokeepthetrainingdataaroundtomakefuturepredictions.– Incontrast,tomakepredictionsusinglocallyweightedlinear

regression,weneedtokeeptheentiretrainingsetaround.

• Theterm"non-parametric"(roughly)referstothefactthattheamountofknowledgeweneedtokeep,inordertorepresentthehypothesisgrowswithlinearlythesizeofthetrainingset.

10/6/18

θ

Page 82: UVA CS 4501: Machine Learning Lecture 5: Non …...RBF= radial-basis function:a function which depends only on the radial distance from a centrepoint Gaussian RBFè as distancefrom

(3) Locally Weighted / Kernel Linear Regression

Regression

Y = Weighted linear sum of X’s

Weighted SSE

Linear algebra

Local Regression coefficients

(conditioned on each test point)

Task

Representation

Score Function

Search/Optimization

Models, Parameters

0000 )(ˆ)(ˆ)(ˆ xxxxf βα +=min

α (x0 ),β(x0 )Kλ(x0 ,xi )[ yi −α(x0)−β(x0)xi ]2

i=1

N

∑10/6/18 82

Dr.YanjunQi/UVACS

θ*(x0)= (BTW(x0)B)−1BTW(x0)y