Splines and Penalized Regression

TRANSCRIPT

  • 8/10/2019 Spline and Penalized Regression

    Splines and penalized regression

    Patrick Breheny

    STA 621: Nonparametric Statistics

    November 23

    Outline: Introduction; Regression splines (parametric); Smoothing splines (nonparametric)

  • Introduction

    We are discussing ways to estimate the regression function f, where

    E(y|x) = f(x)

    One approach is of course to assume that f has a certain shape, such as linear or quadratic, that can be estimated parametrically

    We have also discussed locally weighted linear/polynomial models as a way of allowing f to be more flexible

    An alternative, more direct approach is penalization

  • Controlling smoothness with penalization

    Here, we directly solve for the function f that minimizes the following objective function, a penalized version of the least squares objective:

    ∑_{i=1}^n {y_i − f(x_i)}² + λ ∫ {f″(u)}² du

    The first term captures the fit to the data, while the second penalizes curvature; note that for a line, f″(u) = 0 for all u

    Here, λ is the smoothing parameter, and it controls the tradeoff between the two terms:

    λ = 0 imposes no restrictions, and f will therefore interpolate the data

    λ = ∞ renders curvature impossible, thereby returning us to ordinary linear regression

  • Splines

    It may sound impossible to solve for such an f over all possible functions, but the solution turns out to be surprisingly simple

    This solution, it turns out, depends on a class of functions called splines

    We will begin by introducing splines themselves, then move on to discuss how they represent a solution to our penalized regression problem

  • Basis functions

    One approach for extending the linear model is to represent x using a collection of basis functions:

    f(x) = ∑_{m=1}^M β_m h_m(x)

    Because the basis functions {h_m} are prespecified and the model is linear in these new variables, ordinary least squares approaches for model fitting and inference can be employed

    This idea is probably not new to you, as transformations and expansions using polynomial bases are common
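    As a sketch in Python (the lecture itself uses R), fitting a basis expansion by ordinary least squares might look like the following; the data and the choice of a degree-3 polynomial basis here are illustrative assumptions, not the lecture's example:

    ```python
    import numpy as np

    # Hypothetical data standing in for the lecture's age/spnbmd example
    rng = np.random.default_rng(0)
    x = rng.uniform(10, 25, 100)
    y = np.sin(x / 4) + rng.normal(0, 0.1, 100)

    # Prespecified polynomial basis: h_m(x) = x^(m-1), m = 1..4
    H = np.column_stack([x**m for m in range(4)])

    # Because the model is linear in the basis, ordinary least squares applies
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    fhat = H @ beta
    ```

    Any collection of prespecified functions could replace the polynomial columns; the fitting step is unchanged.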

  • Global versus local bases

    However, polynomial bases with global representations have undesirable side effects: each observation affects the entire curve, even for x values far from the observation

    In previous lectures, we got around this problem with local weighting

    In this lecture, we will explore instead an approach based on piecewise basis functions

    As we will see, splines are piecewise polynomials joined together to make a single smooth curve

  • The piecewise constant model

    To understand splines, we will gradually build up a piecewise model, starting at the simplest one: the piecewise constant model

    First, we partition the range of x into K + 1 intervals by choosing K points {ξ_k}_{k=1}^K called knots

    For our example involving bone mineral density, we will choose the tertiles of the observed ages

  • The piecewise constant model (cont'd)

    [Figure: piecewise constant fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • The piecewise linear model

    [Figure: piecewise linear fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • The continuous piecewise linear model

    [Figure: continuous piecewise linear fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • Basis functions for piecewise continuous models

    These constraints can be incorporated directly into the basis functions:

    h₁(x) = 1, h₂(x) = x, h₃(x) = (x − ξ₁)₊, h₄(x) = (x − ξ₂)₊,

    where (·)₊ denotes the positive portion of its argument:

    r₊ = r if r ≥ 0, and r₊ = 0 if r < 0
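    A minimal Python sketch of these four basis functions (the knot locations here are arbitrary assumptions; the lecture places them at the age tertiles):

    ```python
    import numpy as np

    def pos(r):
        # (r)+ : the positive part of r
        return np.maximum(r, 0.0)

    # Hypothetical knots
    xi1, xi2 = 15.0, 20.0

    x = np.linspace(10, 25, 200)
    # Basis: h1 = 1, h2 = x, h3 = (x - xi1)+, h4 = (x - xi2)+
    H = np.column_stack([np.ones_like(x), x, pos(x - xi1), pos(x - xi2)])

    # Any coefficient vector yields a continuous piecewise linear function
    beta = np.array([0.1, -0.02, 0.05, -0.04])
    f = H @ beta
    ```

    The third and fourth columns are zero to the left of their knots, so each knot changes only the slope, never the level, of the composite function.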

  • Basis functions for piecewise continuous models (cont'd)

    It can be easily checked that these basis functions lead to a composite function f(x) that:

    Is everywhere continuous

    Is linear everywhere except the knots

    Has a different slope for each region

    Also, note that the degrees of freedom add up: 3 regions × 2 degrees of freedom in each region − 2 constraints = 4 basis functions

  • Splines

    The preceding is an example of a spline: a piecewise (m − 1)-degree polynomial that is continuous up to its first m − 2 derivatives

    By requiring continuous derivatives, we ensure that the resulting function is as smooth as possible

    We can obtain more flexible curves by increasing the degree of the spline and/or by adding knots

    However, there is a tradeoff:

    Few knots/low degree: the resulting class of functions may be too restrictive (bias)

    Many knots/high degree: we run the risk of overfitting (variance)

  • The truncated power basis

    The set of basis functions introduced earlier is an example of what is called the truncated power basis

    Its logic is easily extended to splines of order m:

    h_j(x) = x^{j−1},  j = 1, . . . , m

    h_{m+k}(x) = (x − ξ_k)₊^{m−1},  k = 1, . . . , K

    Note that such a spline has m + K degrees of freedom
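    The general construction above is short enough to sketch in Python (the lecture uses R); the knots and sample points below are illustrative assumptions:

    ```python
    import numpy as np

    def truncated_power_basis(x, knots, m):
        """Truncated power basis for an order-m spline with the given knots.

        Columns: x^0, ..., x^(m-1), then (x - xi_k)_+^(m-1) for each knot,
        giving m + K columns (= m + K degrees of freedom)."""
        x = np.asarray(x, dtype=float)
        cols = [x**j for j in range(m)]
        cols += [np.maximum(x - xi, 0.0)**(m - 1) for xi in knots]
        return np.column_stack(cols)

    # Example: cubic spline (m = 4) with two hypothetical knots
    H = truncated_power_basis(np.linspace(10, 25, 50), knots=[15, 20], m=4)
    ```

    Here H has 4 + 2 = 6 columns, matching the m + K degrees-of-freedom count.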

  • Quadratic splines

    [Figure: quadratic spline fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • Cubic splines

    [Figure: cubic spline fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • Additional notes

    These types of fixed-knot models are referred to as regression splines

    Recall that cubic splines contain 4 + K degrees of freedom: (K + 1) regions × 4 parameters per region − K knots × 3 constraints per knot

    It is claimed that cubic splines are the lowest-order spline for which the discontinuity at the knots cannot be noticed by the human eye

    There is rarely any need to go beyond cubic splines, which are by far the most common type of spline in practice

  • Implementing regression splines

    The truncated power basis has two principal virtues:

    Conceptual simplicity

    The linear model is nested inside it, leading to simple tests of the null hypothesis of linearity

    Unfortunately, it has a number of computational/numerical flaws: it is inefficient and can lead to overflow and nearly singular matrix problems

    The more complicated, but numerically much more stable and efficient, B-spline basis is often employed instead

  • B-splines in R

    Fortunately, one can use B-splines without knowing the details behind their complicated construction

    In the splines package (which by default is installed but not loaded), the bs() function will implement a B-spline basis for you; for example, with knots at the tertiles as in our BMD example (a typical call, shown here for illustration):

    library(splines)
    X <- bs(x, knots = quantile(x, c(1/3, 2/3)))

  • Natural cubic splines

    Polynomial fits tend to be erratic at the boundaries of the data

    This is even worse for cubic splines

    Natural cubic splines ameliorate this problem by adding the additional (4) constraints that the function is linear beyond the boundaries of the data

  • Natural cubic splines (cont'd)

    [Figure: natural cubic spline fits of spnbmd vs. age; separate panels for females and males, ages 10–25]

  • Natural cubic splines, 6 df

    [Figure: natural cubic spline fits with 6 degrees of freedom, spnbmd vs. age; separate panels for females and males, ages 10–25]

  • Natural cubic splines, 6 df (cont'd)

    [Figure: natural cubic spline fits (6 df) of relative change in spinal BMD vs. age, with female and male curves overlaid]

  • Splines vs. loess (6 df each)

    [Figure: spline and loess fits (6 df each) of spnbmd vs. age; separate panels for females and males]

  • Natural splines in R

    R also provides a function to compute a basis for the natural cubic splines, ns(), which works almost exactly like bs(), except that there is no option to change the degree

    Note that a natural spline has m + K − 4 degrees of freedom; thus, a natural cubic spline with K knots has K degrees of freedom

  • Mean and variance estimation

    Because the basis functions are fixed, all standard approaches to inference for regression are valid

    In particular, letting L = X(XᵀX)⁻¹Xᵀ denote the projection matrix,

    E(f̂) = Lf

    V(f̂) = σ²LLᵀ

    CV = (1/n) ∑_i {(y_i − ŷ_i)/(1 − l_ii)}²

    Furthermore, extensions to logistic regression, Cox proportional hazards regression, etc., are straightforward
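    A Python sketch of these quantities on made-up data (the design matrix and noise level are assumptions for illustration); the CV formula is the standard leave-one-out shortcut for linear fits:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])  # fixed basis
    y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0, 0.3, 30)

    # Projection ("hat") matrix L = X (X'X)^{-1} X'
    L = X @ np.linalg.solve(X.T @ X, X.T)
    yhat = L @ y

    # Leave-one-out CV via CV = (1/n) sum_i {(y_i - yhat_i)/(1 - l_ii)}^2
    lii = np.diag(L)
    cv = np.mean(((y - yhat) / (1 - lii))**2)
    ```

    The point of the shortcut is that one fit of the full model yields the same value as refitting n times with each observation held out.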

  • Mean and variance estimation (cont'd)

    [Figure: fitted mean curve with pointwise variability for spnbmd vs. age; separate panels for females and males]

  • Mean and variance estimation (cont'd)

    Coronary heart disease study, K = 4

    [Figure: estimated f̂(x) and the corresponding probability π̂(x) vs. systolic blood pressure (100–220), from a spline fit to the coronary heart disease data]

  • Problems with knots

    Fixed-df splines are useful tools, but are not truly nonparametric

    Choices regarding the number of knots and where they are located are fundamentally parametric choices and have a large effect on the fit

    Furthermore, assuming that you place knots at quantiles, models with different numbers of knots will not be nested inside each other, which complicates hypothesis testing

  • Controlling smoothness with penalization

    We can avoid the knot selection problem altogether via the nonparametric formulation introduced at the beginning of lecture: choose the function f that minimizes

    ∑_{i=1}^n {y_i − f(x_i)}² + λ ∫ {f″(u)}² du

    We will now see that the solution to this problem lies in the family of natural cubic splines

  • Terminology

    First, some terminology:

    The parametric splines with fixed degrees of freedom that we have talked about so far are called regression splines

    A spline that passes through the points {x_i, y_i} is called an interpolating spline, and is said to interpolate the points {x_i, y_i}

    A spline that describes and smooths noisy data by passing close to {x_i, y_i} without the requirement of passing through them is called a smoothing spline

  • Natural cubic splines are the smoothest interpolators

    Theorem: Out of all twice-differentiable functions passing through the points {x_i, y_i}, the one that minimizes

    ∫ {f″(u)}² du

    is a natural cubic spline with knots at every unique value of {x_i}

  • Natural cubic splines solve the nonparametric formulation

    Theorem: Out of all twice-differentiable functions, the one that minimizes

    ∑_{i=1}^n {y_i − f(x_i)}² + λ ∫ {f″(u)}² du

    is a natural cubic spline with knots at every unique value of {x_i}


  • Solution

    The penalized objective function is therefore

    (y − Nβ)ᵀ(y − Nβ) + λβᵀΩβ,

    where Ω_jk = ∫ N″_j(t) N″_k(t) dt

    The solution is therefore

    β̂ = (NᵀN + λΩ)⁻¹Nᵀy
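    The closed-form solution is a one-liner to sketch in Python; the basis matrix N and penalty matrix Ω below are random stand-ins (the real Ω would come from integrating products of second derivatives of the natural spline basis functions):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 40, 6
    N = rng.normal(size=(n, p))   # stand-in for the natural spline basis matrix
    Omega = np.eye(p)             # stand-in for Omega_jk = ∫ N_j''(t) N_k''(t) dt
    y = rng.normal(size=n)
    lam = 0.5

    # beta_hat = (N'N + lam * Omega)^{-1} N'y
    beta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
    ```

    This is structurally a generalized ridge regression: the penalty shrinks β in the directions Ω deems "wiggly".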


  • Smoothing splines are linear smoothers (cont'd)

    In particular:

    CV = (1/n) ∑_i {(y_i − ŷ_i)/(1 − l_ii)}²

    E f̂(x₀) = ∑_i l_i(x₀) f(x_i)

    V f̂(x₀) = σ² ∑_i l_i(x₀)²

    σ̂² = ∑_i (y_i − ŷ_i)² / (n − 2ν + ν̃),

    where ν = tr(L) and ν̃ = tr(LᵀL)
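    Putting the smoother matrix and these quantities together in Python (again with random stand-ins for the basis and penalty matrices, so the numbers are illustrative only):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 40, 6
    N = rng.normal(size=(n, p))   # stand-in basis matrix
    Omega = np.eye(p)             # stand-in curvature penalty matrix
    y = rng.normal(size=n)
    lam = 0.5

    # Smoother matrix: yhat = L y, with L = N (N'N + lam*Omega)^{-1} N'
    L = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)
    yhat = L @ y

    nu = np.trace(L)              # effective degrees of freedom, tr(L)
    nu_tilde = np.trace(L.T @ L)  # tr(L'L)
    sigma2 = np.sum((y - yhat)**2) / (n - 2*nu + nu_tilde)
    cv = np.mean(((y - yhat) / (1 - np.diag(L)))**2)
    ```

    As λ grows, tr(L) shrinks toward the degrees of freedom of the unpenalized directions, which is what makes ν a useful "effective df" for model comparison.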

  • CV, GCV for BMD example

    [Figure: CV and GCV scores vs. log(λ) for the BMD example; separate panels for males and females]

  • Undersmoothing and oversmoothing of BMD data

    [Figure: smoothing spline fits to the BMD data (males and females) at several values of λ, illustrating undersmoothing and oversmoothing]

  • Pointwise confidence bands

    [Figure: smoothing spline fits of spnbmd vs. age with pointwise confidence bands; separate panels for females and males]

  • R implementation

    Recall that local regression had a simple, standard function for basic one-dimensional smoothing (loess) and an extensive package for more comprehensive analyses (locfit)

    Spline-based smoothing is similar

    smooth.spline does not require any packages and implements simple one-dimensional smoothing:

    fit <- smooth.spline(x, y)

  • The mgcv package

    If you have a binary outcome variable or multiple covariates, or want confidence intervals, however, smooth.spline is lacking

    A very extensive package called mgcv provides those features, as well as much more

    The basic function is called gam, which stands for generalized additive model (we'll discuss GAMs more in a later lecture)

  • The mgcv package (cont'd)

    The syntax of gam is very similar to glm and locfit, with a function s() placed around any terms that you want a smooth function of:

    fit <- gam(y ~ s(x))


  • Hypothesis testing (cont'd)

    These results do not hold exactly for the case of penalized least squares, but still provide a way to compute useful approximate p-values, using ν = tr(L) as the degrees of freedom in a model

    One can do this manually, or via

    anova(fit0, fit, test="F")

    anova(fit0, fit, test="Chisq")

    It should be noted, however, that such tests treat λ as fixed, even though in reality it is almost always estimated from the data using CV or GCV

    Patrick Breheny STA 621: Nonparametric Statistics