![Page 1: Generalized Linear Models - Yanfei Kang, Ph.D. · 2018. 6. 7. · Generalized Linear Models Lecture 10: Nonparametric regression. Nonparametric regression y ... Local linear and local](https://reader036.vdocuments.net/reader036/viewer/2022062610/6116a8d54019e05851216453/html5/thumbnails/1.jpg)
Generalized Linear Models
Lecture 10: Nonparametric regression
Nonparametric regression
$y_i = f(x_i) + \varepsilon_i$
How to estimate $f$?
Could assume $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$
Or $f(x) = \beta_0 + \beta_1 x^{\beta_2}$
OK if you know the right form.
But often better to assume only that $f$ is continuous and smooth.
Examples
[Figure: three scatterplots. Example A: y against x. Example B: y against x. Old Faithful: waiting against eruptions.]
Outline
1 Kernel estimators
2 Local polynomials
3 Splines
4 Multivariate predictors
Kernel estimators
In brief
For any point $x_0$, the value of the function at that point, $f(x_0)$, is some combination of the (nearby) observations.
Kernel function
The contribution of each observation $(x_j, f(x_j))$ to $f(x_0)$ is calculated using a weighting function or kernel $K_h(x_0, x_j)$, where $h$ is the width of the neighborhood.
K-Nearest-Neighbor (KNN) Average
A simple estimate of $f(x_0)$ at any point $x_0$ is the mean of the $k$ points closest to $x_0$.
$$\hat{f}(x_0) = \text{Ave}(y_i \mid x_i \in N_k(x_0)).$$
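The KNN average can be sketched in a few lines of R. This is a minimal illustration, not the code behind the slides: the function name `knn_average`, the grid, and the choice `k = 20` are assumptions made here.

```r
# k-nearest-neighbour average: estimate f(x0) by the mean of the y-values
# whose x-values are the k closest to x0.
knn_average <- function(x0, x, y, k = 5) {
  sapply(x0, function(pt) {
    nearest <- order(abs(x - pt))[seq_len(k)]  # indices of the k closest points
    mean(y[nearest])
  })
}

# Illustration on the Old Faithful data (built into R)
grid <- seq(min(faithful$eruptions), max(faithful$eruptions), length.out = 200)
fhat <- knn_average(grid, faithful$eruptions, faithful$waiting, k = 20)
```

Plotting `fhat` against `grid` gives a step-like curve, because the neighbourhood changes abruptly as points enter and leave it.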
KNN example
Problem with KNN
Problem
The regression function $\hat{f}(x)$ is discontinuous.
Solution
Weight all points so that their contributions drop off smoothly with distance.
Kernel estimators
Nadaraya–Watson estimator
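$$\hat{f}_h(x) = \frac{\sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right) y_j}{\sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right)}$$
$K$ is a kernel function where $\int K = 1$, $K(a) = K(-a)$, and $K(0) \ge K(a)$ for all $a$.
$\hat{f}$ is a weighted moving average.
Need to choose $K$ and $h$.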
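The estimator translates almost directly into R. The sketch below assumes a Gaussian kernel (`dnorm`) and a bandwidth of 0.3; the function name `nw_estimate` is hypothetical and chosen only for this illustration.

```r
# Nadaraya-Watson estimator: a kernel-weighted average of the responses.
nw_estimate <- function(x0, x, y, h, K = dnorm) {
  sapply(x0, function(pt) {
    w <- K((pt - x) / h)   # kernel weights K((x - x_j) / h)
    sum(w * y) / sum(w)    # weighted average of the y_j
  })
}

# Illustration on the Old Faithful data (built into R)
grid <- seq(min(faithful$eruptions), max(faithful$eruptions), length.out = 200)
fhat <- nw_estimate(grid, faithful$eruptions, faithful$waiting, h = 0.3)
```

Because the weights are non-negative and sum to one after normalisation, every fitted value lies between the smallest and largest observed response.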
Common kernels
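Uniform
$$K(x) = \begin{cases} \frac{1}{2} & -1 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Epanechnikov
$$K(x) = \begin{cases} \frac{3}{4}(1 - x^2) & -1 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Tri-cube
$$K(x) = \begin{cases} c(1 - |x|^3)^3 & -1 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Gaussian
$$K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}$$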
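As a quick check of these definitions, the kernels can be written as R functions and integrated numerically. The function names are chosen here for illustration, and the tri-cube constant is taken as $c = 70/81$, which is the value that makes the kernel integrate to one.

```r
# The four kernels as R functions
uniform      <- function(x) ifelse(abs(x) < 1, 0.5, 0)
epanechnikov <- function(x) ifelse(abs(x) < 1, 0.75 * (1 - x^2), 0)
tricube      <- function(x) ifelse(abs(x) < 1, (70 / 81) * (1 - abs(x)^3)^3, 0)
gaussian     <- dnorm

# Each kernel should integrate to 1
areas <- c(
  uniform      = integrate(uniform, -1, 1)$value,
  epanechnikov = integrate(epanechnikov, -1, 1)$value,
  tricube      = integrate(tricube, -1, 1)$value,
  gaussian     = integrate(gaussian, -Inf, Inf)$value
)
print(areas)
```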
Common kernels
[Figure: the Epanechnikov, Gaussian, tri-cube and uniform kernels; K(x) against x.]
KNN vs Smooth Kernel Comparison
Old Faithful
Kernel smoothing
A smooth kernel is better, but otherwise the choice of kernel makes little difference.
Optimal kernel (minimizing MSE) is Epanechnikov. It is also fast.
The choice of h is crucial.
Example A
Example B
Cross-validation
$$\mathrm{CV}(h) = \frac{1}{n} \sum_{j=1}^{n} \left(y_j - \hat{f}_h^{(-j)}(x_j)\right)^2$$
$(-j)$ indicates that the $j$th point is omitted from the estimate.
Pick the $h$ that minimizes CV.
Works OK provided there are no duplicate $(x, y)$ pairs, but occasionally gives odd results.
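A leave-one-out version of this criterion might be coded as follows. This is a sketch under stated assumptions: a Gaussian-kernel Nadaraya–Watson fit, a hypothetical helper `cv_score`, and an arbitrary grid of candidate bandwidths.

```r
# Leave-one-out cross-validation score for a kernel smoother with bandwidth h.
cv_score <- function(h, x, y, K = dnorm) {
  n <- length(x)
  pred <- sapply(seq_len(n), function(j) {
    w <- K((x[j] - x[-j]) / h)   # weights computed without the j-th point
    sum(w * y[-j]) / sum(w)      # prediction at x_j from the remaining points
  })
  mean((y - pred)^2)
}

# Evaluate CV(h) over a grid of bandwidths for the Old Faithful data
hs <- seq(0.1, 1, by = 0.05)
cv <- sapply(hs, cv_score, x = faithful$eruptions, y = faithful$waiting)
h_best <- hs[which.min(cv)]
```

Plotting `cv` against `hs` is a useful check that the minimum is not at the edge of the grid.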
Kernel smoothing in R
Many packages available. One of the better ones is KernSmooth:
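library(KernSmooth)
library(tibble)
library(ggplot2)
# degree=0 gives a local constant (kernel) fit; locpoly returns a list with x and y
fit <- as_tibble(locpoly(faithful$eruptions, faithful$waiting,
                         degree=0, bandwidth=0.3))
ggplot(faithful) +
  geom_point(aes(x=eruptions, y=waiting)) +
  geom_line(data=fit, aes(x=x, y=y), col='blue')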
[Figure: Old Faithful data with the kernel fit overlaid; waiting against eruptions.]
Problems with the Smooth Weighted Average
Local linear regression
Kernel smoothing
Equivalent to local constant regression at each point.
$$\min_{a_0} \sum_{j=1}^{n} w_j(x)(y_j - a_0)^2$$
Local linear regression
Fit a line at each point instead.
$$\min_{a_0, a_1} \sum_{j=1}^{n} w_j(x)(y_j - a_0 - a_1 x_j)^2$$
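The local linear criterion is a weighted least-squares problem, so one way to sketch it is a weighted `lm` fit at each evaluation point. The function name `local_linear`, the Gaussian weights, and the bandwidth below are assumptions for illustration only.

```r
# Local linear regression: at each point, fit a weighted straight line
# and take its value at that point as the estimate.
local_linear <- function(x0, x, y, h, K = dnorm) {
  sapply(x0, function(pt) {
    w <- K((x - pt) / h)               # weights w_j(x) centred at the target point
    fit <- lm(y ~ x, weights = w)      # minimises sum_j w_j (y_j - a0 - a1 x_j)^2
    unname(predict(fit, newdata = data.frame(x = pt)))
  })
}

grid <- seq(min(faithful$eruptions), max(faithful$eruptions), length.out = 100)
fhat <- local_linear(grid, faithful$eruptions, faithful$waiting, h = 0.3)
```

If the data lie exactly on a straight line, this fit reproduces the line everywhere, including at the ends of the data, which a local constant fit does not.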
Local polynomials
[Figure: four panels plotted against world pulp price. Step 1: pulp shipments. Step 2: weight function. Step 3: pulp shipments with one fitted point. Result: pulp shipments with one fitted point.]
Local polynomials
$$\min_{a_0, \dots, a_p} \sum_{j=1}^{n} w_j(x)\left(y_j - \sum_{k=0}^{p} a_k x_j^k\right)^2$$
Local linear and local quadratic are commonly used.
Robust regression can be used instead.
Less biased at boundaries than kernel smoothing.
Local quadratic is less biased at peaks and troughs than local linear or kernel smoothing.
Local polynomials in R
One useful implementation is KernSmooth::locpoly(x, y, degree, bandwidth).
dpill can be used to choose the bandwidth h if degree=1.
Otherwise, h could be selected by cross-validation.
But most people seem to use trial and error, finding the largest h that captures what they think they see by eye.
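As a sketch of the plug-in route, the following combines `dpill` and `locpoly` for a local linear fit. It assumes the KernSmooth package is installed and uses the built-in `faithful` data; the variable names are illustrative.

```r
library(KernSmooth)

x <- faithful$eruptions
y <- faithful$waiting

# Direct plug-in bandwidth, intended for local linear fits (degree = 1)
h <- dpill(x, y)

# Local linear fit on an equally spaced grid; the result is a list with x and y
fit <- locpoly(x, y, degree = 1, bandwidth = h)
```

The fitted curve can then be drawn with `lines(fit)` on top of `plot(x, y)`.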
Loess
Best known implementation is loess (locally quadratic)
fit <- loess(y ~ x, data=data, span=0.75, degree=2, family="gaussian")
Uses tri-cube kernel and variable bandwidth.
span controls bandwidth. Specified as the proportion of the data covered by each local fit.
degree is the order of the polynomial.
Use family="symmetric" for a robust fit.
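To see what the robust option does, one can compare the two families on data with a single gross outlier. The simulated data below are purely illustrative and are not one of the lecture's examples.

```r
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.1)
y[25] <- 10                      # a single gross outlier
dat <- data.frame(x = x, y = y)

fit_ls  <- loess(y ~ x, data = dat, span = 0.75, degree = 2, family = "gaussian")
fit_rob <- loess(y ~ x, data = dat, span = 0.75, degree = 2, family = "symmetric")

# Absolute error of each fit at the outlier's x-value, relative to the true curve
truth <- sin(2 * pi * x[25])
err <- c(gaussian  = abs(fitted(fit_ls)[25] - truth),
         symmetric = abs(fitted(fit_rob)[25] - truth))
print(err)
```

The least-squares fit is pulled towards the outlier, while the robust fit downweights it.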
Loess
[Figure: Old Faithful (Loess, span=0.75); waiting against eruptions.]
Loess in R
smr <- loess(waiting ~ eruptions, data=faithful)
ggplot(faithful) +
  geom_point(aes(x=eruptions, y=waiting)) +
  ggtitle("Old Faithful (Loess, span=0.75)") +
  geom_line(aes(x=eruptions, y=fitted(smr)), col='blue')
Loess
[Figure: Example A (Loess, span=0.75); y against x.]
Loess
[Figure: Example A (Loess, span=0.22); y against x.]
Loess
[Figure: Example B (Robust Loess, span=0.75); y against x.]
Loess and geom_smooth()
ggplot(exa) +
  geom_point(aes(x=x, y=y)) +
  geom_smooth(aes(x=x, y=y), method='loess', span=0.22)
[Figure: Example A with the geom_smooth() loess fit (span=0.22); y against x.]
Loess and geom_smooth()
Because local polynomials use local linear models, we can easily find standard errors for the fitted values.
Connected together, these form a pointwise confidence band.
Automatically produced using geom_smooth
Splines
Splines
A spline is a continuous function $f(x)$ interpolating all points $(\kappa_j, y_j)$ for $j = 1, \dots, K$ and consisting of polynomials between each consecutive pair of 'knots' $\kappa_j$ and $\kappa_{j+1}$.
Parameters constrained so that $f(x)$ is continuous.
Further constraints imposed to give continuous derivatives.
Cubic splines most common, with $f'$, $f''$ continuous.
Smoothing splines
Let $y = f(x) + \varepsilon$ where $\varepsilon \sim \text{IID}(0, \sigma^2)$. Then choose $\hat{f}$ to minimize
$$\frac{1}{n} \sum_{i} (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2 \, dx$$
$\lambda$ is a smoothing parameter to be chosen.
$\int [f''(x)]^2 \, dx$ is a measure of roughness.
Solution: $\hat{f}$ is a cubic spline with knots $\kappa_i = x_i$, $i = 1, \dots, n$ (ignoring duplicates).
Other penalties lead to higher order splines.
Cross-validation can be used to select $\lambda$.
Smoothing splines
Smoothing splines
[Figure: Old Faithful (Smoothing spline, lambda chosen by CV); waiting against eruptions.]
Smoothing splines
smr <- smooth.spline(faithful$eruptions, faithful$waiting,
cv=TRUE)
smr <- data.frame(x=smr$x,y=smr$y)
ggplot(faithful) +
geom_point(aes(x=eruptions,y=waiting)) +
ggtitle("Old Faithful (Smoothing spline, lambda chosen by CV)") +
geom_line(data=smr, aes(x=x, y=y), col='blue')
Smoothing splines
[Figure: Example A (Smoothing spline, lambda chosen by CV); y against x.]
Smoothing splines
[Figure: Example B (Smoothing spline, lambda chosen by CV); y against x.]
Regression splines
Fewer knots than smoothing splines.
Need to choose the knots rather than a smoothing parameter.
Can be estimated as a linear model once knots are selected.
General cubic regression splines
Let $\kappa_1 < \kappa_2 < \cdots < \kappa_K$ be "knots" in the interval $(a, b)$.
Let $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$, $x_j = (x - \kappa_{j-3})_+^3$ for $j = 4, \dots, K + 3$.
Then the regression of $y$ on $x_1, \dots, x_{K+3}$ is piecewise cubic, but smooth at the knots.
Choice of knots can be difficult and arbitrary.
Automatic knot selection algorithms very slow.
Often use equally spaced knots. Then only need to choose K.
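The construction above can be sketched directly: build the $K + 3$ basis columns and pass them to `lm`. The helper name `cubic_spline_basis` and the choice of three equally spaced interior knots are assumptions made for this illustration.

```r
# Truncated power basis for a cubic regression spline:
# columns x, x^2, x^3 and (x - kappa_k)_+^3 for each knot.
cubic_spline_basis <- function(x, knots) {
  trunc_cubes <- outer(x, knots, function(x, k) pmax(x - k, 0)^3)
  cbind(x, x^2, x^3, trunc_cubes)
}

x <- faithful$eruptions
y <- faithful$waiting

# K = 3 equally spaced interior knots across the range of x
knots <- seq(min(x), max(x), length.out = 5)[2:4]

B <- cubic_spline_basis(x, knots)
fit <- lm(y ~ B)                 # ordinary linear model on the spline basis
fhat <- fitted(fit)
```

In practice `splines::bs()` or `splines::ns()` give a better-conditioned basis for the same kind of fit.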
Natural splines in R
library(splines)  # provides ns()
library(ggplot2)
fit <- lm(waiting ~ ns(eruptions, df=6), faithful)
ggplot(faithful) +
  geom_point(aes(x=eruptions, y=waiting)) +
  ggtitle("Old Faithful (Natural splines, 6 df)") +
  geom_line(aes(x=eruptions, y=fitted(fit)), col='blue')
[Figure: Old Faithful (Natural splines, 6 df); waiting against eruptions.]
Natural splines in R
[Figure: Example A (Natural splines, 12 df); y against x.]
Natural splines in R
[Figure: Example B (Natural splines, 3 df); y against x.]
Natural splines in R
[Figure: Example B (Natural splines, 10 df); y against x.]
Splines and geom_smooth()
ggplot(exa) +
  geom_point(aes(x=x, y=y)) +
  geom_smooth(aes(x=x, y=y), method='gam',
              formula = y ~ s(x, k=12))
[Figure: Example A with the geom_smooth() gam fit, s(x, k=12); y against x.]
Splines and geom_smooth()
Because regression splines are fitted as linear models, we can easily find standard errors for the fitted values.
Connected together, these form a pointwise confidence band.
Automatically produced using geom_smooth
Multivariate predictors
$y_i = f(x_i) + \varepsilon_i$, $x \in \mathbb{R}^d$
Most methods extend naturally to higher dimensions.
Multivariate kernel methods
Multivariate local quadratic surfaces
Thin-plate splines (2-d version of smoothing splines)
Problem
The curse of dimensionality!
Curse of dimensionality
Most data lie near the boundary
x <- matrix(runif(1e6,-1,1), ncol=100)
boundary <- function(z) { any(abs(z) > 0.95) }
mean(apply(x[,1,drop=FALSE], 1, boundary))
## [1] 0.0494
mean(apply(x[,1:2], 1, boundary))
## [1] 0.0971
mean(apply(x[,1:5], 1, boundary))
## [1] 0.2276
Curse of dimensionality
Most data lie near the boundary
x <- matrix(runif(1e6,-1,1), ncol=100)
boundary <- function(z) { any(abs(z) > 0.95) }
mean(apply(x[,1:10], 1, boundary))
## [1] 0.4052
mean(apply(x[,1:50], 1, boundary))
## [1] 0.9205
mean(apply(x[,1:100], 1, boundary))
## [1] 0.9938
Curse of dimensionality
Data are sparse
x <- matrix(runif(1e6,-1,1), ncol=100)
nearby <- function(z) { all(abs(z) < 0.5) }
mean(apply(x[,1,drop=FALSE], 1, nearby))
## [1] 0.4953
mean(apply(x[,1:2], 1, nearby))
## [1] 0.2512
mean(apply(x[,1:5], 1, nearby))
## [1] 0.0291
Curse of dimensionality
Data are sparse
x <- matrix(runif(1e6,-1,1), ncol=100)
nearby <- function(z) { all(abs(z) < 0.5) }
mean(apply(x[,1:10], 1, nearby))
## [1] 5e-04
mean(apply(x[,1:50], 1, nearby))
## [1] 0
mean(apply(x[,1:100], 1, nearby))
## [1] 0
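The simulated proportions can be compared with exact probabilities. For independent coordinates uniform on $(-1, 1)$, the chance that at least one of $d$ coordinates exceeds 0.95 in absolute value is $1 - 0.95^d$, and the chance that all lie within 0.5 of the origin is $0.5^d$. The short sketch below tabulates both; the variable names are illustrative.

```r
d <- c(1, 2, 5, 10, 50, 100)

# P(at least one coordinate has |z| > 0.95): "near the boundary"
p_boundary <- 1 - 0.95^d

# P(all coordinates have |z| < 0.5): "near the centre"
p_nearby <- 0.5^d

print(data.frame(d = d, boundary = round(p_boundary, 4), nearby = signif(p_nearby, 3)))
```

With 10,000 simulated rows, a probability as small as $0.5^{50}$ is expected to give no nearby points at all, which is consistent with the zeros in the simulation.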
Bivariate smoothing
lomod <- loess(sr ~ pop15 + ddpi, data=savings)
[Figure: fitted loess surface; savings rate against pop15 and ddpi.]
Bivariate smoothing
library(mgcv)
smod <- gam(sr ~ s(pop15, ddpi), data=savings)
[Figure: fitted gam surface; linear predictor against pop15 and ddpi.]