

Lecture 16: High-dimensional regression, non-linear regression

Reading: Sections 6.4, 7.1

STATS 202: Data mining and analysis

Jonathan Taylor
Nov 2, 2018

Slide credits: Sergio Bacallado

1 / 18


High-dimensional regression

- Most of the methods we’ve discussed work best when n is much larger than p.

- However, the case p ≫ n is now common, due to experimental advances and cheaper computers:

1. Medicine: Instead of regressing heart disease onto just a few clinical observations (blood pressure, salt consumption, age), we use in addition 500,000 single nucleotide polymorphisms.

2. Marketing: Using search terms to understand online shopping patterns. A bag-of-words model defines one feature for every possible search term, which counts the number of times the term appears in a person’s searches. There can be as many features as words in the dictionary.

2 / 18



Some problems we have talked about

- When n = p, we can find a fit that goes through every point.

[Figure: two panels plotting Y against X.]

- Least-squares regression doesn’t have a unique solution when p > n.

- We can use regularization methods, such as variable selection, ridge regression, and the lasso (see the sketch after this slide).

3 / 18
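A quick reference sketch, not from the slides: ridge regression and the lasso can both be fit with the glmnet package (assumed available), here on simulated data with p > n; alpha = 0 gives ridge and alpha = 1 gives the lasso.

library(glmnet)                         # assumed available; not part of the lecture

set.seed(1)
n <- 50; p <- 200                       # a p > n problem
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rep(1, 5) + rnorm(n))   # only 5 predictors matter

ridge <- glmnet(X, y, alpha = 0)        # ridge regression path
lasso <- glmnet(X, y, alpha = 1)        # lasso path
coef(lasso, s = 0.1)                    # sparse coefficients at one value of lambda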



Some problems we have talked about

- We know that least-squares regression doesn’t work when p > n.

- We can use regularization methods, such as variable selection, ridge regression, and the lasso.

- When n = p, we can find a fit that goes through every point.

- Measures of training error are really bad here: they only improve as variables are added, so they are useless for model selection.

[Figure: R², training MSE, and test MSE plotted against the number of variables.]

4 / 18


Some new problems

[Figure: two panels plotting Y against X.]

- Furthermore, it becomes hard to estimate the noise variance σ².

- Measures of model fit such as Cp, AIC, and BIC fail.

5 / 18


Some new problems

[Figure: test error of the lasso against degrees of freedom, for p = 20, p = 50, and p = 2000.]

- In each case, only 20 predictors are associated with the response.

- The plots show the test error of the lasso.

- Message: Adding predictors that are uncorrelated with the response hurts the performance of the regression! (A small simulation follows this slide.)

6 / 18
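A small simulation in the spirit of this figure (my own sketch, not the code behind the plots): keep 20 signal predictors and pad the design with pure-noise columns; the cross-validated lasso’s test error typically grows with p.

library(glmnet)                          # assumed available

set.seed(1)
n <- 100; beta <- rep(1, 20)             # 20 predictors associated with the response

test_mse <- sapply(c(20, 50, 2000), function(p) {
  X  <- matrix(rnorm(n * p), n, p)       # first 20 columns carry the signal
  y  <- drop(X[, 1:20] %*% beta + rnorm(n))
  Xt <- matrix(rnorm(n * p), n, p)       # independent test set
  yt <- drop(Xt[, 1:20] %*% beta + rnorm(n))
  fit <- cv.glmnet(X, y)                 # lambda chosen by cross-validation
  mean((yt - predict(fit, newx = Xt, s = "lambda.min"))^2)
})
test_mse                                 # tends to increase as noise predictors are added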



Interpreting coefficients when p > n

- When p > n, every predictor is a linear combination of the other predictors, i.e. there is an extreme level of multicollinearity.

- The lasso and ridge regression will choose one set of coefficients.

- The set of selected coefficients {i : |β̂_i| > δ} is not guaranteed to match the set of truly important coefficients {i : |β_i| > δ}. There can be many sets of predictors (possibly non-overlapping) which yield good models.

- Message: Don’t overstate the importance of the predictors selected.

7 / 18



Interpreting inference for p > n

- When p > n, the lasso might select a sparse model.

- Running lm on the selected variables with the same training data is bad: the selection step invalidates the usual p-values and confidence intervals.

- Running lm on the selected variables with independent validation data is OK (see the data-splitting sketch after this slide).

- Message: Don’t use inferential methods developed for least-squares regression for things like the lasso, forward stepwise selection, etc.

- Can we do better? Yes, but it’s complicated.

8 / 18
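A sketch of the data-splitting recipe above, on simulated data (glmnet is assumed for the selection step): select with the lasso on one half, then refit with lm on the other half, where the usual inference applies.

library(glmnet)                          # assumed available for the selection step

set.seed(1)
n <- 200; p <- 500
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rep(2, 5) + rnorm(n))

train <- sample(n, n / 2)

# 1. Variable selection with the lasso, using the training half only
cvfit    <- cv.glmnet(X[train, ], y[train])
beta_hat <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]
selected <- which(beta_hat != 0)

# 2. Refit by ordinary least squares on the held-out half; these data
#    played no role in the selection, so the usual lm inference applies
valid <- data.frame(y = y[-train], X[-train, selected, drop = FALSE])
summary(lm(y ~ ., data = valid))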



Non-linear regression

Problem: How do we model a non-linear relationship?

[Figure: Degree-4 Polynomial fits on the Wage data.]

Left: Regression of wage onto age. Right: Logistic regression for the classes wage > 250 and wage ≤ 250, shown as Pr(Wage > 250 | Age).

9 / 18
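The two panels can be reproduced along these lines (a sketch assuming the Wage data from the ISLR package, the dataset these figures are based on):

library(ISLR)                            # assumed: provides the Wage data frame

# Left panel: degree-4 polynomial regression of wage on age
fit_lm  <- lm(wage ~ poly(age, 4), data = Wage)

# Right panel: logistic regression for the event wage > 250
fit_glm <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial)

age_grid <- seq(min(Wage$age), max(Wage$age))
prob_250 <- predict(fit_glm, newdata = data.frame(age = age_grid), type = "response")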


Basis functions

Strategy:

- Define a model:

Y = β_0 + β_1 f_1(X) + β_2 f_2(X) + · · · + β_d f_d(X) + ε.

- Fit this model through least-squares regression: the f_j’s are nonlinear, but the model is linear!

- Options for f_1, . . . , f_d (see the sketch after this slide):

1. Polynomials, f_i(x) = x^i.

2. Indicator functions, f_i(x) = 1(c_i ≤ x < c_{i+1}).

[Figure: Piecewise Constant fits on the Wage data: regression of wage onto age, and logistic regression showing Pr(Wage > 250 | Age).]

10 / 18
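For option 2, cut() builds the indicator features and lm() does the fitting; looking at the design matrix makes it explicit that the model is still linear, just in nonlinear features of age. A sketch, again assuming the ISLR Wage data:

library(ISLR)

# Piecewise-constant basis: indicator functions for four age intervals
fit_step <- lm(wage ~ cut(age, breaks = 4), data = Wage)

# The design matrix is made of 0/1 columns, so this is ordinary least squares
head(model.matrix(fit_step))
coef(fit_step)                            # one coefficient per interval (first is the baseline)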



Basis functions

- Options for f_1, . . . , f_d:

3. Piecewise polynomials:

[Figure: wage against age with four fits: Piecewise Cubic, Continuous Piecewise Cubic, Cubic Spline, and Linear Spline.]

11 / 18


Cubic splines

- Define a set of knots ξ_1 < ξ_2 < · · · < ξ_K.

- We want the function f in Y = f(X) + ε to:

1. Be a cubic polynomial between every pair of knots ξ_i, ξ_{i+1}.

2. Be continuous at each knot.

3. Have continuous first and second derivatives at each knot.

- It turns out we can write f in terms of K + 3 basis functions (see the sketch after this slide):

f(X) = β_0 + β_1 X + β_2 X^2 + β_3 X^3 + β_4 h(X, ξ_1) + · · · + β_{K+3} h(X, ξ_K),

where

h(x, ξ) = (x − ξ)^3 if x > ξ, and 0 otherwise.

12 / 18
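The basis above can be coded directly and handed to lm(); splines::bs() builds an equivalent cubic-spline basis, so the two fits agree. A sketch with hypothetical knots at ages 35, 50, and 65 on the ISLR Wage data:

library(ISLR)
library(splines)                          # bs() builds a cubic-spline basis

h <- function(x, xi) pmax(x - xi, 0)^3    # h(x, xi) = (x - xi)^3 for x > xi, else 0

fit_hand <- lm(wage ~ age + I(age^2) + I(age^3) +
                 h(age, 35) + h(age, 50) + h(age, 65), data = Wage)

fit_bs <- lm(wage ~ bs(age, knots = c(35, 50, 65)), data = Wage)

# Same function space, so the fitted values agree up to numerical error
max(abs(fitted(fit_hand) - fitted(fit_bs)))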



Natural cubic splines

A spline which is linear instead of cubic for X < ξ_1 and X > ξ_K.

[Figure: wage against age, comparing a Natural Cubic Spline fit with a Cubic Spline fit.]

The predictions are more stable for extreme values of X.

13 / 18


Choosing the number and locations of knots

The locations of the knots are typically quantiles of X.

The number of knots, K, is chosen by cross-validation (a sketch follows this slide):

[Figure: cross-validated mean squared error against degrees of freedom, for the natural cubic spline and for the cubic spline.]

14 / 18
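A sketch of the cross-validation step on the ISLR Wage data, using ns() (which places its knots at quantiles of age by default) and cv.glm() from the boot package:

library(ISLR)
library(splines)
library(boot)                             # cv.glm() for K-fold cross-validation

set.seed(1)
dfs    <- 2:10
cv_err <- sapply(dfs, function(d) {
  fit <- glm(wage ~ ns(age, df = d), data = Wage)   # glm() so that cv.glm() applies
  cv.glm(Wage, fit, K = 10)$delta[1]
})
dfs[which.min(cv_err)]                    # degrees of freedom with the lowest CV error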



Natural cubic splines vs. polynomial regression

- Splines can fit complex functions with few parameters.

- Polynomials require high-degree terms to be flexible.

- High-degree polynomials can be unstable at the edges.

[Figure: wage against age, comparing a Natural Cubic Spline fit with a Polynomial fit.]

15 / 18


Smoothing splines

Find the function f which minimizes

∑_{i=1}^{n} (y_i − f(x_i))² + λ ∫ f′′(x)² dx

- The first term is the RSS of the model.

- The second term is a penalty for the roughness of the function.

Facts:

- The minimizer f is a natural cubic spline, with knots at each sample point x_1, . . . , x_n.

- Obtaining f is similar to a ridge regression (see the sketch after this slide).

16 / 18
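R wraps this whole construction in smooth.spline(), which puts knots at the data points and can pick λ by cross-validation; a sketch on the ISLR Wage data:

library(ISLR)

fit_ss <- with(Wage, smooth.spline(age, wage, cv = TRUE))  # lambda chosen by ordinary CV
fit_ss$df                                                  # effective degrees of freedom
fit_ss$lambda                                              # the chosen penalty

plot(Wage$age, Wage$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(fit_ss, lwd = 2)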



Deriving a smoothing spline

1. Show that if you fix the values f(x_1), . . . , f(x_n), the roughness

∫ f′′(x)² dx

is minimized by a natural cubic spline.

2. Deduce that the solution to the smoothing spline problem is a natural cubic spline, which can be written in terms of its basis functions:

f(x) = β_0 + β_1 f_1(x) + · · · + β_{n+3} f_{n+3}(x)

3. Letting N be the matrix with N(i, j) = f_j(x_i), we can write the objective function as

(y − Nβ)^T (y − Nβ) + λ β^T Ω_N β,

where Ω_N(i, j) = ∫ N_i′′(t) N_j′′(t) dt.

17 / 18



Deriving a smoothing spline

4. By simple calculus, the coefficients β which minimize

(y − Nβ)^T (y − Nβ) + λ β^T Ω_N β

are β̂ = (N^T N + λ Ω_N)^{-1} N^T y.

5. Note that the predicted values are a linear function of the observed values:

ŷ = N (N^T N + λ Ω_N)^{-1} N^T y = S_λ y.

(A numerical sketch follows this slide.)

18 / 18
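A rough numerical sketch of steps 4 and 5 (entirely illustrative: simulated data, a natural-spline basis from splines::ns() with knots at the sample points, and Ω_N approximated by finite differences on a grid):

library(splines)

set.seed(1)
x <- sort(runif(50))
y <- sin(2 * pi * x) + rnorm(50, sd = 0.3)

N <- ns(x, knots = x[2:49], intercept = TRUE)    # n basis functions, knots at the data

# Approximate Omega_N(i, j) = integral of N_i''(t) N_j''(t) dt on a fine grid
grid  <- seq(min(x), max(x), length.out = 2001)
hstep <- diff(grid)[1]
Ng    <- predict(N, grid)                        # basis evaluated on the grid
D2    <- apply(Ng, 2, function(col) diff(col, differences = 2)) / hstep^2
Omega <- t(D2) %*% D2 * hstep

lambda   <- 1e-4
beta_hat <- solve(t(N) %*% N + lambda * Omega, t(N) %*% y)   # step 4
S_lambda <- N %*% solve(t(N) %*% N + lambda * Omega, t(N))   # step 5
y_hat    <- S_lambda %*% y                                   # fitted values, linear in y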
