Semiparametric models with applications using R
Gilberto A. Paula
Instituto de Matemática e Estatística, Universidade de São Paulo, Brazil
2nd Semester 2016
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 1 / 61
Outline
1 Examples
2 Defining f(x)
3 Additive normal model
4 Semiparametric normal model
5 Packages in R
6 Voltage drop data
7 Boston housing data
8 Bibliography
Examples
Voltage drop data
Description
As a first example we consider the voltage drop data (Montgomery and Peck, 2001), in which the battery voltage drop in a guided missile motor is observed over the time of missile flight. The goal was to develop a voltage drop model for use in a digital-analog simulation model of the missile. Altogether there are 41 observations.
[Figure: scatter plot of the voltage drop data, Voltage versus Time.]
Possible model
Description
The data suggest a nonparametric model such as
$$\text{Voltage}_i = \alpha + f(\text{Time}_i) + \epsilon_i,$$
where $\epsilon_i \sim N(0, \sigma^2)$ for $i = 1, \ldots, 41$, with $f(\cdot)$ being a continuous, smooth and nonparametric function.
Boston housing data
Description
As a second example we consider the Boston housing data, which have been analyzed by various authors (see, for instance, Belsley et al., 1980). The aim of the study is to assess the association of house prices with the air quality of the neighborhood by using regression models. The outcome variable LMEDV (logarithm of the median house price in USD 1000) is related to 13 explanatory variables. Altogether there are 506 observations.
Illustration
To motivate the semiparametric models, we will work with three explanatory variables:
NOX (annual average nitric oxide concentration, parts per 10 million);
LSTAT (% lower status of the population);
DIS (weighted distances to five Boston employment centers).
[Figure: plot of LMEDV versus NOX.]
[Figure: plot of LMEDV versus LSTAT.]
[Figure: plot of LMEDV versus DIS.]
Possible model
Description
We may initially try to fit the following semiparametric model:
$$\text{LMEDV}_i = \alpha + \beta\,\text{NOX}_i + f_1(\text{LSTAT}_i) + f_2(\text{DIS}_i) + \epsilon_i,$$
where $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$ for $i = 1, \ldots, 506$, with $f_1(\cdot)$ and $f_2(\cdot)$ being continuous, smooth and nonparametric functions.
Comparison of snacks
Description
As a third example, we consider a data set from an experiment conducted at the School of Public Health, Universidade de São Paulo, in which 4 different forms of light snacks (B, C, D and E) were compared with a traditional snack (A) over 20 weeks.
Experiment description
The hydrogenated vegetable fat (hvf) was replaced by canola oil in different proportions:
A: 22% hvf, 0% canola oil
B: 0% hvf, 22% canola oil
C: 17% hvf, 5% canola oil
D: 11% hvf, 11% canola oil
E: 5% hvf, 17% canola oil.
In this analysis we consider only the variable TEXTURE, which will be compared across time among the 5 snack types.
[Figure: mean profiles of Texture versus Weeks for snack types A, B, C, D and E.]
[Figure: variation coefficient (VC) profiles of Texture versus Weeks for snack types A, B, C, D and E.]
Double gamma model
Description
Similarly to Paula (2013), we may consider a semiparametric double gamma model:
$$y_{ijk} \overset{\text{ind}}{\sim} G(\mu_{ij}, \phi_{ij}),$$
$$\log(\mu_{ij}) = \beta_0 + \beta_i + f(\text{Weeks}_j),$$
$$\log(\phi_{ij}^{-1}) = \gamma_0 + \gamma_i,$$
for $i = 1$ (A), $2$ (B), $3$ (C), $4$ (D), $5$ (E), $j = 2, 4, \ldots, 20$ and $k = 1, \ldots, 15$, where $\phi_{ij}^{-1}$ is the dispersion parameter, $\beta_0 + \beta_i$ and $\gamma_0 + \gamma_i$ denote the snack effects, and $f(\cdot)$ is a continuous, smooth and nonparametric function.
Defining f(x)
How to define f(x)?
Piecewise-cubic splines
B-splines
Natural cubic splines
P-splines
Thin-plate splines
...
Kernel
Loess
Wavelets
...
Piecewise-cubic splines
Definition
Suppose the explanatory variable values $x_i$, for $i = 1, \ldots, n$, lie in the interval $[a, b]$, with $m$ internal knots, namely $a < t_1 < \cdots < t_m < b$, where $m \le n - 2$.
A simple choice for the nonparametric function $f(x)$ is the piecewise-cubic spline, described as
$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^{m} \gamma_j (x - t_j)_+^3,$$
where
$$(x - t_j)_+ = \begin{cases} 0 & \text{if } x \le t_j, \\ x - t_j & \text{if } x > t_j. \end{cases}$$
Voltage drop data
Suppose $m = 2$ internal knots at $t_1 = 6.5$ and $t_2 = 13$.
[Figure: scatter plot of the voltage drop data, Voltage versus Time.]
Fitting on the interval [0, 6.5]:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \epsilon_i.$$
Fitting on the interval (6.5, 13]:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \gamma_1 (x_i - 6.5)^3 + \epsilon_i.$$
Fitting on the interval (13, 20]:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \gamma_1 (x_i - 6.5)^3 + \gamma_2 (x_i - 13)^3 + \epsilon_i.$$
The parameter vector $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \gamma_1, \gamma_2)^\top$ may be estimated by least squares.
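The three interval-specific equations above are rows of a single design matrix built from the truncated power basis. The following minimal sketch, in plain Python, shows how such rows could be assembled for the knots 6.5 and 13. The function names and the grid of 41 time points are hypothetical (the actual voltage drop data are not reproduced here), and the least-squares step itself is left to any standard solver.

```python
def truncated_cubic(x, knot):
    """(x - knot)_+^3: zero up to the knot, cubic afterwards."""
    return max(x - knot, 0.0) ** 3


def design_row(x, knots):
    """One design-matrix row: 1, x, x^2, x^3, then one truncated cubic per knot."""
    return [1.0, x, x ** 2, x ** 3] + [truncated_cubic(x, t) for t in knots]


def spline_value(x, coef, knots):
    """Evaluate f(x) as the inner product of the coefficients and the basis row."""
    return sum(c * b for c, b in zip(coef, design_row(x, knots)))


knots = [6.5, 13.0]                       # internal knots used in the slides
times = [0.5 * i for i in range(41)]      # hypothetical grid of 41 points on [0, 20]
design = [design_row(t, knots) for t in times]

# A row with x <= 6.5 has zeros in both truncated columns, a row with
# 6.5 < x <= 13 activates only the first one, and a row with x > 13 activates both.
row_first = design_row(5.0, knots)
row_second = design_row(10.0, knots)
row_third = design_row(15.0, knots)
```

Because every truncated term vanishes together with its first two derivatives at its knot, any coefficient vector gives a function that is continuous with continuous first and second derivatives across the knots, which is why one global least-squares fit suffices.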
B-splines
Definition
A more flexible class of candidates for $f(x)$ is the class of B-splines, defined as
$$f(x) = \sum_{j=1}^{q} N_j(x)\tau_j, \quad x \in [a, b],$$
where $N_j(x)$ are the B-spline basis functions and $\tau_j$ are coefficients.
Natural cubic splines
Definition
Natural cubic splines (NCS; see, for instance, Green and Silverman, 1994) may be expressed as B-splines and have the following properties:
the explanatory variable takes distinct values, namely $a \le t_1 < \cdots < t_q \le b$,
$f(x)$ is a cubic polynomial on each of the intervals $[t_1, t_2], \ldots, [t_{q-1}, t_q]$,
$f(x)$ is linear on the intervals $[a, t_1]$ and $[t_q, b]$,
$f(x)$, $f'(x)$ and $f''(x)$ are continuous.
Therefore, for NCS one has $m = q - 2$ internal knots.
NCS may also be defined for an arbitrary number $m$ of internal knots.
P-splines
Definition
P-splines (Eilers and Marx, 1996) form a flexible class of B-splines defined as
$$f(x) = \sum_{j=1}^{q} N_{j,k}(x)\tau_j, \quad x \in [a, b],$$
where $N_{j,k}(x)$ are the B-spline basis functions of degree $k$ (de Boor, 1978), for $k = 0, 1, 2, \ldots$, $\tau_j$ are coefficients, and $m$ is the number of internal knots, namely $a < t_1 < \cdots < t_m < b$, with $m = q + k + 1$.
Basis function
De Boor's B-spline basis functions are expressed as
$$N_{j,0}(x) = \begin{cases} 1 & t_j \le x \le t_{j+1}, \\ 0 & \text{otherwise}, \end{cases}$$
and
$$N_{j,k}(x) = \frac{x - t_j}{t_{j+k} - t_j}\, N_{j,k-1}(x) + \frac{t_{j+k+1} - x}{t_{j+k+1} - t_{j+1}}\, N_{j+1,k-1}(x),$$
for $j = 1, \ldots, q$ and $k = 1, 2, 3, \ldots$.
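The recursion above translates almost line by line into code. The sketch below is one possible plain-Python version, not the implementation used by any particular R package. It makes three assumptions that are not in the slides: indices are 0-based, the degree-0 indicator is taken on the half-open interval $[t_j, t_{j+1})$ (closed only for the last non-empty interval) so that neighbouring indicators do not overlap at a knot, and the boundary knots are repeated $k + 1$ times. Terms with a zero denominator are treated as zero.

```python
def bspline_basis(j, k, x, knots):
    """Cox-de Boor recursion for N_{j,k}(x) on a non-decreasing knot list (0-based j)."""
    if k == 0:
        # Half-open indicator; the last non-empty interval also includes its right end.
        is_last = knots[j + 1] == knots[-1] and knots[j] < knots[j + 1]
        if knots[j] <= x < knots[j + 1] or (is_last and x == knots[-1]):
            return 1.0
        return 0.0
    left = 0.0
    if knots[j + k] > knots[j]:
        left = ((x - knots[j]) / (knots[j + k] - knots[j])
                * bspline_basis(j, k - 1, x, knots))
    right = 0.0
    if knots[j + k + 1] > knots[j + 1]:
        right = ((knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1])
                 * bspline_basis(j + 1, k - 1, x, knots))
    return left + right


# Hypothetical setup: cubic basis on [0, 20] with internal knots 6.5 and 13.
degree = 3
knots = [0.0] * 4 + [6.5, 13.0] + [20.0] * 4
n_basis = len(knots) - degree - 1          # number of basis functions

# Basis values at a single point; with this knot layout they sum to one.
values = [bspline_basis(j, degree, 10.0, knots) for j in range(n_basis)]
```

Evaluating the basis at every observed point and stacking the results row by row would give the matrix of basis values that multiplies the coefficient vector $\tau$.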
Penalization
Why penalize?
The aim of penalization is to restrict the solution space in order to avoid overfitting.
Additive normal model
Description
First, we assume the following nonparametric model:
$$y_i = f(t_i) + \epsilon_i,$$
where $f(t)$ is a continuous, smooth and nonparametric function and $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$, for $i = 1, \ldots, n$.
Penalization
A suggestion is to penalize the second derivative. The objective function to be minimized is then given by
$$S_P(\mathbf{f}, \lambda) = \sum_{i=1}^{n} \{y_i - f(t_i)\}^2 + \lambda \int_a^b [f''(x)]^2\,dx,$$
where $\mathbf{f} = (f(t_1), \ldots, f(t_q))^\top$, $[a, b]$ denotes the data interval and $\lambda > 0$ is the smoothing parameter.
The solution is a natural cubic spline with knots at the distinct values $a \le t_1 < \cdots < t_q \le b$.
Smoothing parameter
The smoothing parameter $\lambda$ has the following interpretation:
when $\lambda \to 0$, minimizing $S_P(\mathbf{f}, \lambda)$ leads to interpolation of the data;
when $\lambda \to \infty$, one has to impose $f''(x) = 0$, so the solution is a linear function $f(x)$;
therefore one takes $0 < \lambda < \infty$.
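The two limiting cases can be checked numerically. The sketch below does not compute the natural cubic spline solution; as a stand-in it uses a discrete analogue in which the integral of the squared second derivative is replaced by the sum of squared second differences of the fitted values at equally spaced points (a Whittaker-type smoother). The data and all function names are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def second_difference_penalty(n):
    """Return D^T D, where D maps (f_1, ..., f_n) to its n - 2 second differences."""
    penalty = [[0.0] * n for _ in range(n)]
    stencil = [1.0, -2.0, 1.0]
    for start in range(n - 2):
        for i, di in enumerate(stencil):
            for j, dj in enumerate(stencil):
                penalty[start + i][start + j] += di * dj
    return penalty


def roughness(f):
    """Sum of squared second differences (discrete stand-in for the integral)."""
    return sum((f[i + 1] - 2.0 * f[i] + f[i - 1]) ** 2 for i in range(1, len(f) - 1))


def smooth(y, lam):
    """Minimise sum (y_i - f_i)^2 + lam * roughness(f): solve (I + lam D^T D) f = y."""
    n = len(y)
    penalty = second_difference_penalty(n)
    system = [[(1.0 if i == j else 0.0) + lam * penalty[i][j] for j in range(n)]
              for i in range(n)]
    return solve_linear_system(system, y)


y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.5, 8.0]   # made-up responses
nearly_interpolating = smooth(y, 1e-8)          # lam close to 0: fit follows the data
moderate = smooth(y, 1.0)                       # intermediate amount of smoothing
nearly_linear = smooth(y, 1e6)                  # lam very large: second differences vanish
```

With a tiny $\lambda$ the fitted values essentially reproduce the data, and with a huge $\lambda$ they fall on a straight line, mirroring the two bullets above.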
Penalization
For B-splines the penalty takes the following form (see, for instance, Wood, 2006):
$$\int_a^b [f''(x)]^2\,dx = \tau^\top K \tau,$$
where $K$ is a $(q \times q)$ non-negative definite smoothing matrix that does not depend on $\tau$.
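The quadratic form can be obtained directly from the basis expansion. The short derivation below is a standard argument, sketched under the assumption that each basis function $N_j$ has a square-integrable second derivative on $[a, b]$ (for instance, B-splines of degree at least 2); the entry-wise expression for $K$ is written out here for illustration and is not taken from the slides.

```latex
f(x) = \sum_{j=1}^{q} N_j(x)\,\tau_j
\;\Longrightarrow\;
f''(x) = \sum_{j=1}^{q} N_j''(x)\,\tau_j ,
\qquad
\int_a^b [f''(x)]^2\,dx
= \sum_{j=1}^{q}\sum_{l=1}^{q} \tau_j \tau_l \int_a^b N_j''(x)\,N_l''(x)\,dx
= \tau^\top K \tau ,
\qquad
K_{jl} = \int_a^b N_j''(x)\,N_l''(x)\,dx .
```

Since $\tau^\top K \tau$ equals the integral of a square, it is never negative, which is exactly the non-negative definiteness of $K$; and $K$ depends only on the basis functions, not on $\tau$.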
Semiparametric normal model
Description
We now assume the following partially linear model:
$$y_i = \mathbf{x}_i^\top \beta + f(t_i) + \epsilon_i,$$
where $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^\top$ contains values of explanatory variables, $\beta = (\beta_1, \ldots, \beta_p)^\top$, $f(t_i) = \mathbf{N}_i^\top \tau$ is a B-spline and $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$, for $i = 1, \ldots, n$.
Objective function
The penalized least-squares function becomes
$$S_P(\theta, \lambda) = (y - X\beta - N\tau)^\top (y - X\beta - N\tau) + \lambda \tau^\top K \tau,$$
where $\theta = (\beta^\top, \tau^\top)^\top$.
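Setting the gradient of $S_P(\theta, \lambda)$ to zero gives one set of normal equations per block of parameters. The derivation below is a sketch that assumes $K$ is symmetric and that $X^\top X$ and $N^\top N + \lambda K$ are invertible.

```latex
\frac{\partial S_P}{\partial \beta}
= -2 X^\top (y - X\beta - N\tau) = 0
\;\Longrightarrow\;
\beta = (X^\top X)^{-1} X^\top (y - N\tau),
\qquad
\frac{\partial S_P}{\partial \tau}
= -2 N^\top (y - X\beta - N\tau) + 2\lambda K \tau = 0
\;\Longrightarrow\;
\tau = (N^\top N + \lambda K)^{-1} N^\top (y - X\beta).
```

Each equation expresses one block in terms of the other, so alternating between them, with $\lambda$ held fixed, is a natural way to solve the pair.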
Iterative process
One has the following iterative process:
start with $\beta^{(0)}$, the parametric least-squares solution;
set $\tau^{(0)} = (N^\top N + \lambda K)^{-1} N^\top (y - X\beta^{(0)})$;
apply the back-fitting (Gauss-Seidel) algorithm:
$$\beta^{(m+1)} = (X^\top X)^{-1} X^\top \{y - N\tau^{(m)}\},$$
$$\tau^{(m+1)} = (N^\top N + \lambda K)^{-1} N^\top \{y - X\beta^{(m+1)}\},$$
for $m = 0, 1, 2, \ldots$ and $\lambda$ fixed.
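The updates above can be sketched in a few lines of plain Python. Everything in the example is hypothetical: the data are made up, $X$ has a single alternating 0/1 column, and in place of a B-spline basis the smooth part uses the small basis $1$, $t$, $(t - 0.5)_+^3$, for which the second-derivative penalty on $[0, 1]$ works out to $K = \mathrm{diag}(0, 0, 1.5)$. A fixed number of iterations is used instead of a convergence test.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def gram(a, b):
    """a^T b for matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def cross(a, v):
    """a^T v for a matrix a (list of rows) and a vector v."""
    return [sum(row[i] * vi for row, vi in zip(a, v)) for i in range(len(a[0]))]


def residual(y, a, coef):
    """y - a coef."""
    return [yi - sum(x * c for x, c in zip(row, coef)) for yi, row in zip(y, a)]


def backfit(X, N, K, y, lam, n_iter=100):
    """Alternate the beta and tau updates for a fixed smoothing parameter lam."""
    xtx = gram(X, X)
    ntn_pen = [[g + lam * k for g, k in zip(g_row, k_row)]
               for g_row, k_row in zip(gram(N, N), K)]
    beta = solve(xtx, cross(X, y))                          # beta^(0)
    tau = solve(ntn_pen, cross(N, residual(y, X, beta)))    # tau^(0)
    for _ in range(n_iter):
        beta = solve(xtx, cross(X, residual(y, N, tau)))    # beta^(m+1)
        tau = solve(ntn_pen, cross(N, residual(y, X, beta)))  # tau^(m+1)
    return beta, tau


# Made-up example with n = 12 observations.
t = [i / 11 for i in range(12)]
X = [[float(i % 2)] for i in range(12)]                   # one 0/1 covariate
N = [[1.0, ti, max(ti - 0.5, 0.0) ** 3] for ti in t]      # hypothetical smooth basis
K = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5]]   # penalty for that basis
y = [2.0 * row[0] + (1.0 + ti) ** 2 for row, ti in zip(X, t)]
lam = 0.1

beta_hat, tau_hat = backfit(X, N, K, y, lam)
```

At convergence the pair returned by `backfit` satisfies both update equations at once, so it should agree with the solution obtained by solving the joint system for $\theta = (\beta^\top, \tau^\top)^\top$ in one step.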
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
= N(N⊤N + λK)−1N⊤{y − Xβ}
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
= N(N⊤N + λK)−1N⊤{y − Xβ}
= H(λ){y − Xβ}.
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
= N(N⊤N + λK)−1N⊤{y − Xβ}
= H(λ){y − Xβ}.
So, as suggested by Hastie and Tibshirani (1990) one may take
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
= N(N⊤N + λK)−1N⊤{y − Xβ}
= H(λ){y − Xβ}.
So, as suggested by Hastie and Tibshirani (1990) one may take
df(λ) = tr{H(λ)}
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
Effective degrees of freedom
From the iterative process at the convergence one has that
f = Nτ
= N(N⊤N + λK)−1N⊤{y − Xβ}
= H(λ){y − Xβ}.
So, as suggested by Hastie and Tibshirani (1990) one may take
df(λ) = tr{H(λ)}
= tr{N(N⊤N + λK)−1N⊤}
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 39 / 61
Semiparametric normal model
Semiparametric normal model
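Effective degrees of freedom
From the iterative process, at convergence one has
\[ \hat{f} = N\hat{\tau} = N(N^\top N + \lambda K)^{-1} N^\top \{y - X\hat{\beta}\} = H(\lambda)\{y - X\hat{\beta}\}. \]
So, as suggested by Hastie and Tibshirani (1990), one may take
\[ \mathrm{df}(\lambda) = \mathrm{tr}\{H(\lambda)\} = \mathrm{tr}\{N(N^\top N + \lambda K)^{-1} N^\top\} = \mathrm{tr}\{N^\top N (N^\top N + \lambda K)^{-1}\}. \]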
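The two trace expressions agree because the trace is invariant under cyclic permutation of a product. The sketch below checks this numerically and shows how df(λ) shrinks as λ grows. The basis (splines::bs with 10 columns) and the second-order difference penalty used for K are assumptions made for illustration only.

```r
library(splines)

set.seed(1)
z <- sort(runif(100))
N <- matrix(bs(z, df = 10), nrow = length(z))          # 100 x 10 basis (assumption)
K <- crossprod(diff(diag(ncol(N)), differences = 2))   # assumed penalty matrix

# n x n smoother matrix H(lambda) = N (N'N + lambda K)^{-1} N'
hat_matrix <- function(lambda) N %*% solve(crossprod(N) + lambda * K, t(N))

# df(lambda) from the n x n form and from the cheaper q x q form
df_H <- function(lambda) sum(diag(hat_matrix(lambda)))
df_qxq <- function(lambda) {
  NtN <- crossprod(N)
  sum(diag(NtN %*% solve(NtN + lambda * K)))
}

lambdas <- c(0, 0.1, 1, 10, 1000)
cbind(lambda = lambdas,
      df_H   = sapply(lambdas, df_H),
      df_qxq = sapply(lambdas, df_qxq))
```

With λ = 0 the trace equals the number of basis columns, and for large λ it approaches the dimension of the null space of K (two for this assumed penalty).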
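Model selection
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are, respectively, defined as
\[ \mathrm{AIC}(\lambda) = -2L(\hat{\theta}, \hat{\sigma}^2) + 2\{p + \mathrm{df}(\lambda) + 1\}; \]
\[ \mathrm{BIC}(\lambda) = -2L(\hat{\theta}, \hat{\sigma}^2) + \log(n)\{p + \mathrm{df}(\lambda) + 1\}, \]
for given $\lambda$.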
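As a small illustration, the two criteria can be coded directly from the normal log-likelihood. The helper name info_criteria and its arguments are hypothetical, introduced only for this sketch. As a sanity check, with df(λ) = 0 and the maximum likelihood variance estimate, the function reproduces the values that stats::AIC and stats::BIC report for an ordinary linear model (here fitted to the built-in cars data, which is not part of the slides).

```r
# Hypothetical helper: AIC and BIC for a normal model with p parametric
# coefficients, df effective degrees of freedom and one variance parameter.
info_criteria <- function(y, fitted, p, df, sigma2) {
  n <- length(y)
  loglik <- sum(dnorm(y, mean = fitted, sd = sqrt(sigma2), log = TRUE))
  k <- p + df + 1                      # penalty count used in the slides
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + log(n) * k)
}

# Sanity check on a purely parametric fit (df = 0)
fit <- lm(dist ~ speed, data = cars)
ic <- info_criteria(y      = cars$dist,
                    fitted = fitted(fit),
                    p      = length(coef(fit)),
                    df     = 0,
                    sigma2 = mean(residuals(fit)^2))   # ML variance estimate
ic
c(AIC = AIC(fit), BIC = BIC(fit))      # should match the line above
```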
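Estimator of the variance
For $\sigma^2$ one has (given $\lambda$) the following estimator:
\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\{n - p - \mathrm{df}(\lambda)\}}. \]
Choosing the smoothing parameter
Minimizing the generalized cross-validation score
\[ \mathrm{GCV}(\lambda) = \frac{n \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\{n - \mathrm{df}(\lambda)\}^2}, \]
or jointly minimizing AIC($\lambda$) and df($\lambda$) over a grid of $\lambda$ values.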
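A grid search over λ using the GCV score might look as follows. To keep the sketch short it uses a purely nonparametric fit (no separate parametric block, so p = 0 in the variance estimator), simulated data, a B-spline basis with an intercept column and a second-order difference penalty; all of these are simplifying assumptions rather than the setup of the slides.

```r
library(splines)

set.seed(2)
n <- 150
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)

N <- matrix(bs(z, df = 20, intercept = TRUE), nrow = n)   # assumed basis
K <- crossprod(diff(diag(ncol(N)), differences = 2))      # assumed penalty matrix

# For one lambda: effective df, GCV score and variance estimate (p = 0 here)
gcv <- function(lambda) {
  H   <- N %*% solve(crossprod(N) + lambda * K, t(N))
  df  <- sum(diag(H))
  rss <- sum((y - H %*% y)^2)
  c(lambda = lambda,
    df     = df,
    GCV    = n * rss / (n - df)^2,
    sigma2 = rss / (n - df))
}

grid <- 10^seq(-4, 4, length.out = 41)      # grid of lambda values
res  <- t(sapply(grid, gcv))                # one row per lambda
best <- res[which.min(res[, "GCV"]), ]      # row with the smallest GCV score
best
```

The log-spaced grid is a common choice because df(λ) changes roughly on the scale of log λ.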
Alternative penalization
P-splines
Eilers and Marx (1996) propose the alternative penalization
\[ S_P(\theta, \lambda) = (y - X\beta - N\tau)^\top (y - X\beta - N\tau) + \lambda \sum_{j=d+1}^{q} [\Delta^d \tau_j]^2, \]
where $N$ is de Boor's B-spline basis and $\Delta^d \tau_j$ is the difference of order $d$ of the coefficients, which defines the penalty.
In matrix notation
\[ S_P(\theta, \lambda) = (y - X\beta - N\tau)^\top (y - X\beta - N\tau) + \lambda \tau^\top D_d^\top D_d \tau, \]
where $D_d$ is the penalty difference matrix of order $d$.
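The equivalence between the sum form and the matrix form of the penalty can be checked numerically. In this sketch the values of q, d and the coefficient vector tau are arbitrary illustrative choices, and the difference matrix is built with base R's diff applied to an identity matrix.

```r
q <- 8                                   # number of basis coefficients (illustrative)
d <- 2                                   # order of the difference penalty (illustrative)
set.seed(3)
tau <- rnorm(q)                          # arbitrary coefficient vector

Dd <- diff(diag(q), differences = d)     # (q - d) x q difference matrix

penalty_sum    <- sum(diff(tau, differences = d)^2)         # sum over j = d+1, ..., q
penalty_matrix <- drop(t(tau) %*% crossprod(Dd) %*% tau)    # tau' Dd' Dd tau

all.equal(penalty_sum, penalty_matrix)
```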
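P-splines
Penalization examples
\[ \Delta\tau_j = \tau_j - \tau_{j-1}, \qquad
D_1 = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix}. \]
\[ \Delta^2\tau_j = \tau_j - 2\tau_{j-1} + \tau_{j-2}, \qquad
D_2 = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix}. \]
\[ \Delta^3\tau_j = \tau_j - 3\tau_{j-1} + 3\tau_{j-2} - \tau_{j-3}, \qquad
D_3 = \begin{pmatrix} -1 & 3 & -3 & 1 & 0 & 0 \\ 0 & -1 & 3 & -3 & 1 & 0 \\ 0 & 0 & -1 & 3 & -3 & 1 \end{pmatrix}. \]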
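The three example matrices can be reproduced in base R by differencing the rows of an identity matrix. The object names D1, D2 and D3 are only labels for this sketch, and the identity sizes (4, 5 and 6) are chosen to match the dimensions shown on the slide.

```r
# Difference matrices of order 1, 2 and 3, each with three rows
D1 <- diff(diag(4), differences = 1)   # rows of the form (-1, 1)
D2 <- diff(diag(5), differences = 2)   # rows of the form (1, -2, 1)
D3 <- diff(diag(6), differences = 3)   # rows of the form (-1, 3, -3, 1)

D1
D2
D3
```

Applying any of these matrices to a coefficient vector gives the same result as diff on that vector with the corresponding differences argument.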
Packages in R
Some packages for fitting semiparametric regression models, available from CRAN at http://CRAN.R-project.org:
gamlss (Rigby and Stasinopoulos, 2005)
mgcv (Wood, 2015)
ssym (Vanegas and Paula, 2015)
Voltage drop data
Scatter plot of voltage drop data
[Figure: scatter plot; horizontal axis: Time (0 to 20), vertical axis: Voltage (8 to 14)]
Fitted model
Description
We will use the package ssym to fit the following model:
\[ \text{Voltage}_i = \alpha + f(\text{Time}_i) + \epsilon_i, \]
where $\alpha$ is an intercept, $f(\cdot)$ is a continuous, smooth nonparametric function and $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ for $i = 1, \ldots, 41$.
Suggestion: $(n^{1/3} + 3)$ knots.
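For orientation, the suggested number of knots can be evaluated for the two sample sizes quoted in these slides (41 and 506). The slide does not say how the value is rounded, so the unrounded value is shown and any rounding rule would be an assumption.

```r
# Sample sizes quoted in the slides for the two data sets
n <- c(voltage = 41, boston = 506)

# Suggested number of knots, n^(1/3) + 3, before any rounding
knots_raw <- n^(1/3) + 3
knots_raw
```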
> require(ssym)
> fit1.battery = ssym.l(voltage ~ ncs(time), data=battery, family="Normal")
> summary(fit1.battery)

Family: Normal
Sample size: 41
Quantile of the Weights
 0% 25% 50% 75% 100%
  1   1   1   1    1

************************* Median/Location submodel *************************
******** Parametric component

            Estimate Std.Err  z-value  Pr(>|z|)
(Intercept)   10.904  0.0542 201.3309 < 2.2e-16 ***

******** Nonparametric component

          Smooth.param Basis.dimen  d.f. Statistic p-value
ncs(time)        4.243       5.000 4.931      2709  <2e-16 ***

**** Deviance: 41

************************* Skewness/Dispersion submodel *************************
******** Parametric component

            Estimate Std.Err  z-value  Pr(>|z|)
(Intercept)  -2.3484  0.2209 -10.6329 < 2.2e-16 ***

**** Deviance: 42.2

*******************************************************************
Overall goodness-of-fit statistic: 0.152165
-2*log-likelihood: 20.068
AIC: 33.931
BIC: 45.808

> np.graph(fit1.battery, which=1, xlab="Time", ylab="Voltage")
> np.graph(fit1.battery, which=1, xlab="Time", ylab="Voltage", obs=TRUE)
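The reported AIC and BIC can be checked against the definitions given earlier, using only numbers printed in the summary: the value of -2*log-likelihood, n = 41, the d.f. of the smooth term, and p = 1 for the intercept (taking p = 1 is an interpretation of the output, not something the summary states). The variable names are illustrative.

```r
# Values copied from the summary output above
m2loglik <- 20.068     # -2 * log-likelihood
n        <- 41         # sample size
p        <- 1          # parametric coefficients (intercept only; assumption)
df       <- 4.931      # effective degrees of freedom of ncs(time)

k   <- p + df + 1      # p + df(lambda) + 1, the extra 1 counting sigma^2
aic <- m2loglik + 2 * k
bic <- m2loglik + log(n) * k
round(c(AIC = aic, BIC = bic), 3)
```

Both values agree with the printed AIC and BIC to about two decimal places; the small discrepancy comes from the rounding of the inputs.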
[Figure: output of the two np.graph calls; panel title "Voltage 95% confidence band", vertical axis "Nonparametric estimate" (-4 to 4), horizontal axis from 0 to 20]
Boston housing data
Plot of LMEDV versus NOX
[Figure: scatter plot; horizontal axis: NOX (0.4 to 0.8), vertical axis: LMEDV (2.0 to 4.0)]
Plot of LMEDV versus LSTAT
[Figure: scatter plot; horizontal axis: LSTAT (10 to 30), vertical axis: LMEDV (2.0 to 4.0)]
Possible model
Description
We may initially try to fit the following semiparametric model:
\[ \text{LMEDV}_i = \alpha + \beta\,\text{NOX}_i + f(\text{LSTAT}_i) + \epsilon_i, \]
where $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ for $i = 1, \ldots, 506$, with $f(\cdot)$ being a continuous, smooth nonparametric function.
> require(ssym)
> require(MASS)
> fit1.boston = ssym.l(log(medv) ~ nox + psp(lstat), data=Boston, family="Normal")
> summary(fit1.boston)

Family: Normal
Sample size: 506
Quantile of the Weights
 0% 25% 50% 75% 100%
  1   1   1   1    1

************************* Median/Location submodel *************************
******** Parametric component

            Estimate Std.Err z-value Pr(>|z|)
(Intercept)   3.1251  0.0650 48.0810   <2e-16 ***
nox          -0.1543  0.1106 -1.3954   0.1629

******** Nonparametric component

           Smooth.param Basis.dimen  d.f. Statistic p-value
psp(lstat)         17.1      11.000 7.282     731.9  <2e-16 ***

**** Deviance: 506

************************* Skewness/Dispersion submodel *************************
******** Parametric component

            Estimate Std.Err  z-value  Pr(>|z|)
(Intercept)  -2.9854  0.0629 -47.4859 < 2.2e-16 ***

**** Deviance: 762.68

*******************************************************************
Overall goodness-of-fit statistic: 0.110987
-2*log-likelihood: -74.654
AIC: -54.09
BIC: -10.632

> np.graph(fit1.boston, which=1, xlab="Lstat", ylab="Estimate of f(Lstat)")
> envelope(fit1.boston)
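For readers without ssym installed, a broadly comparable model can be fitted with mgcv, one of the other packages listed earlier. This is a sketch, not a reproduction of the fit above: mgcv uses its own P-spline construction and its own smoothing parameter selection, so the estimates, effective degrees of freedom and AIC should not be expected to match the ssym output. The choice k = 11 and method = "GCV.Cp" are assumptions made to stay close in spirit to the slides.

```r
library(mgcv)   # recommended package distributed with R
library(MASS)   # provides the Boston data frame

# Semiparametric normal model: linear in nox, P-spline smooth in lstat
fit_gam <- gam(log(medv) ~ nox + s(lstat, bs = "ps", k = 11),
               data = Boston, family = gaussian(), method = "GCV.Cp")

summary(fit_gam)     # parametric coefficients and edf of s(lstat)
sum(fit_gam$edf)     # total effective degrees of freedom
AIC(fit_gam)
```

Calling plot(fit_gam, shade = TRUE) would draw the estimated smooth with a pointwise confidence band, playing a role similar to np.graph above.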
[Figure: panel title "f(Lstat) 95% confidence band"; horizontal axis: Lstat (10 to 30), vertical axis: nonparametric estimate (-1.0 to 1.0)]
Normal probability plot
[Figure: two panels plotting the mean deviance residual and the dispersion deviance residual against N(0,1) quantiles]
Bibliography
Outline
1 Examples
2 Defining f (x)
3 Additive normal model
4 Semiparametric normal model
5 Packages in R
6 Voltage drop data
7 Boston housing data
8 Bibliography
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 59 / 61
Bibliography
References
References
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 60 / 61
Bibliography
References
References
Belsley DA, Kuh E and Welsch RE (1980). RegressionDiagnostics. Identifying Influential Data and Sources ofCollinearity. Wiley, New York.
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 60 / 61
Bibliography
References
References
Belsley DA, Kuh E and Welsch RE (1980). RegressionDiagnostics. Identifying Influential Data and Sources ofCollinearity. Wiley, New York.
De Boor C (1978). A Practical Guide to Splines. AppliedMathematical Sciences. Springer-Verlag, New York.
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 60 / 61
Bibliography
References
References
Belsley DA, Kuh E and Welsch RE (1980). RegressionDiagnostics. Identifying Influential Data and Sources ofCollinearity. Wiley, New York.
De Boor C (1978). A Practical Guide to Splines. AppliedMathematical Sciences. Springer-Verlag, New York.
Eilers PHC and Marx BD (1996). Flexible smoothing withB-splines and penalties. Statistical Science, 11, 89-121.
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 60 / 61
Bibliography
References
References
Belsley DA, Kuh E and Welsch RE (1980). RegressionDiagnostics. Identifying Influential Data and Sources ofCollinearity. Wiley, New York.
De Boor C (1978). A Practical Guide to Splines. AppliedMathematical Sciences. Springer-Verlag, New York.
Eilers PHC and Marx BD (1996). Flexible smoothing withB-splines and penalties. Statistical Science, 11, 89-121.
Green PJ and Silverman BW (1994). Nonparametric Regressionand Generalized Linear Models. Chapman and Hall, London.
G. A. Paula (IME-USP) Semiparametric models in R 2o Semestre 2016 60 / 61
Bibliography
References
References
Belsley DA, Kuh E and Welsch RE (1980). Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. Wiley, New York.
De Boor C (1978). A Practical Guide to Splines. Applied Mathematical Sciences. Springer-Verlag, New York.
Eilers PHC and Marx BD (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89-121.
Green PJ and Silverman BW (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London.
Hastie TJ and Tibshirani RJ (1990). Generalized Additive Models. Chapman and Hall, London.
Montgomery DC, Peck EA and Vining GG (2001). Introduction to Linear Regression Analysis, 3rd Edition. Wiley, New York.
Paula GA (2013). On diagnostics in double generalized linear models. Computational Statistics & Data Analysis, 68, 44-51.
Rigby RA and Stasinopoulos DM (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507-554.
Vanegas LH and Paula GA (2015). ssym: Fitting Semi-parametric Log-symmetric Regression Models. R package version 1.5.3. http://CRAN.R-project.org/package=ssym.
Wood SN (2015). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. R package version 1.8-7. http://CRAN.R-project.org/package=mgcv.