
Self-Modeling Regression for Longitudinal Data with Time-Invariant Covariates

Naomi S. Altman

Penn State University

naomi_altman@stat.psu.edu

Julio Villarreal

EdVision

Experiments in Which the Response Is a Curve

• growth curves

• longitudinal studies

• blood concentration curves

Features

• multiple curves with similar shape

• covariates or treatments (“time-invariant”)

Objectives

• flexible nonparametric model for shape

• interpretable parameters for treatment effects

• test statistics for treatment effects

• test statistics for effect of treatment on shape

Shape-Invariant Regression

$y_i(t) = \alpha_{i0} + A_{i1}\,\mu_0(\beta_{i0} + B_{i1} t) + \varepsilon_{it}$

(Lawton et al 1972)

• Common shape: $\mu_0(t)$

• Parameters: $\theta_i' = (\alpha_{i0}, \alpha_{i1}, \beta_{i0}, \beta_{i1})$

• $A_{i1} = \exp(\alpha_{i1})$

• $B_{i1} = \exp(\beta_{i1})$

Why Use Shape-Invariant Regression?

• If shape is not of interest:

– treatment effects can be summarized by a model for the parameters

– common summaries (e.g., height of maximum, time to maximum) depend on shape only through functionals which are constant over “i”

• Shape can be estimated nonparametrically:

– this allows a test of treatment effect on shape

– a functional form for shape may be suggested
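As a worked check of the “common summaries” bullet: suppose the common shape $\mu_0$ attains its maximum $\mu^*$ at $\tau^*$. Because $A_{i1} = \exp(\alpha_{i1}) > 0$ and $B_{i1} = \exp(\beta_{i1}) > 0$, curve $i$ peaks where $\beta_{i0} + B_{i1} t = \tau^*$, that is at $t_i^* = (\tau^* - \beta_{i0})/B_{i1}$, with height $\alpha_{i0} + A_{i1}\mu^*$. Only $\tau^*$ and $\mu^*$ involve the shape. The Python sketch below evaluates the model and these two summaries; the function names and the example shape are hypothetical illustrations, not taken from the talk.

```python
import numpy as np

def sim_curve(theta, t, mu0):
    """Shape-invariant model: alpha0 + exp(alpha1) * mu0(beta0 + exp(beta1) * t)."""
    a0, a1, b0, b1 = theta
    return a0 + np.exp(a1) * mu0(b0 + np.exp(b1) * t)

def peak_summaries(theta, tau_max, mu_max):
    """Time and height of the maximum of curve i, given the argmax tau_max and the
    maximum mu_max of the common shape mu0 (these two do not depend on i)."""
    a0, a1, b0, b1 = theta
    t_peak = (tau_max - b0) / np.exp(b1)   # solves b0 + exp(b1) * t = tau_max
    height = a0 + np.exp(a1) * mu_max      # exp(a1) > 0, so the maximum stays a maximum
    return t_peak, height
```

For example, with the hypothetical shape $\mu_0(u) = u e^{-u}$ ($\tau^* = 1$, $\mu^* = e^{-1}$), `peak_summaries(theta, 1.0, np.exp(-1))` agrees with a fine-grid search for the maximum of `sim_curve`.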

Outline

• Nestling Growth Experiments

• Fitting a SIM model (algorithm)

• Testing effects (simulation)

• Results for Tree Swallow Growth

• Does the model fit?

Nestling Growth Experiments

• Several data sets, experimental conditions, and covariates; many response variables.

• Questions of interest:

a) Are there treatment effects?

b) Does the shape of the growth curve vary with the response variable?

c) Do treatment effects differ among response variables? For example, if a treatment delays growth of tarsus length, does it also delay growth of head circumference?

Experiment Design for Tree Swallow Study

• 2 to 8 time points per curve (mean 6)

• 297 nestlings

• A split-plot design with whole-plot (nest) factors:

– covariate HatchDate

– dietary supplement/none

(courtesy of Matt Wasson, Cornell)

Growth Curves

[Figure: raw data versus age and fitted growth curves versus transformed time for tarsus length, wing length, head diameter, and mass]

Back to Shape-Invariant Regression

$y_i(t) = \alpha_{i0} + A_{i1}\,\mu_0(\beta_{i0} + B_{i1} t) + \varepsilon_{it}$

(Lawton et al 1972)

• Common shape: $\mu_0(t)$

• Parameters: $\theta_i' = (\alpha_{i0}, \alpha_{i1}, \beta_{i0}, \beta_{i1})$

• $A_{i1} = \exp(\alpha_{i1})$

• $B_{i1} = \exp(\beta_{i1})$

Fitting the SIM Model

Starting with $\theta_{ij} = (0, 0, 0, 0)$:

1. Let $Y^*_{ij}(t) = (Y_{ij}(t) - \alpha_{ij0})/\exp(\alpha_{ij1})$ (the “aligned response”) and $t^* = \beta_{ij0} + \exp(\beta_{ij1})\,t$ (the “aligned time”).

2. Use a nonparametric smoother to regress $Y^*$ on $t^*$. Call this $m(t^*)$.

3. Use nonlinear mixed models to fit the model $y_{ij}(t) = \alpha_{ij0} + A_{ij1}\, m(\beta_{ij0} + B_{ij1} t) + \varepsilon_{ijt}$.

4. Check for convergence. If not converged, go to 1.

• In fitting a complicated SIM model, such as the one for the bird growth data, it is convenient to fit first without the linear model for the parameters and to fit that linear model only in the last step.

• The convergence criterion used was the change in predicted values.

• It is convenient to smooth using a penalized cubic spline. This can be done using a linear mixed models routine.
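The four steps can be sketched in Python as a backfitting loop. This is a simplified, hypothetical illustration, not the authors' implementation: the NLME step is replaced by a separate damped Gauss-Newton fit of $(\alpha_0, \alpha_1, \beta_0, \beta_1)$ for each curve (fixed effects, no random-effect distribution), the smoother is an ordinary least-squares polynomial instead of a penalized cubic spline, and the parameters are re-centered to mean zero on each pass as a crude identifiability constraint. All function names are assumptions.

```python
import numpy as np

def sim_predict(theta, t, coef):
    """alpha0 + exp(alpha1) * m(beta0 + exp(beta1) * t), with m a polynomial."""
    a0, a1, b0, b1 = theta
    return a0 + np.exp(a1) * np.polyval(coef, b0 + np.exp(b1) * t)

def fit_one_curve(theta, t, y, coef, n_steps=20):
    """Stand-in for step 3: damped Gauss-Newton for one curve, shape held fixed."""
    dcoef = np.polyder(coef)
    theta = np.asarray(theta, dtype=float)
    sse = np.sum((y - sim_predict(theta, t, coef)) ** 2)
    for _ in range(n_steps):
        a0, a1, b0, b1 = theta
        u = b0 + np.exp(b1) * t
        m, dm = np.polyval(coef, u), np.polyval(dcoef, u)
        # Derivatives of the prediction w.r.t. alpha0, alpha1, beta0, beta1.
        J = np.column_stack([np.ones_like(t), np.exp(a1) * m,
                             np.exp(a1) * dm, np.exp(a1) * dm * np.exp(b1) * t])
        r = y - sim_predict(theta, t, coef)
        step = np.linalg.solve(J.T @ J + 1e-6 * np.eye(4), J.T @ r)
        for _ in range(30):                  # halve the step until the SSE drops
            cand = theta + step
            cand_sse = np.sum((y - sim_predict(cand, t, coef)) ** 2)
            if cand_sse < sse:
                theta, sse = cand, cand_sse
                break
            step = step / 2
        else:
            break                            # no improving step: treat as converged
    return theta

def fit_sim(times, ys, degree=5, tol=1e-6, max_iter=100):
    """Backfitting over steps 1-4; times[i] and ys[i] hold curve i."""
    thetas = np.zeros((len(ys), 4))          # start from theta = (0, 0, 0, 0)
    old_pred = None
    for _ in range(max_iter):
        thetas = thetas - thetas.mean(axis=0)    # crude identifiability constraint
        # Step 1: aligned response and aligned time.
        y_star = np.concatenate([(y - th[0]) / np.exp(th[1]) for th, y in zip(thetas, ys)])
        t_star = np.concatenate([th[2] + np.exp(th[3]) * t for th, t in zip(thetas, times)])
        # Step 2: smooth Y* on t* (a polynomial stands in for the penalized spline).
        coef = np.polyfit(t_star, y_star, degree)
        # Step 3: refit each curve's parameters with the shape m held fixed.
        thetas = np.array([fit_one_curve(th, t, y, coef)
                           for th, t, y in zip(thetas, times, ys)])
        # Step 4: converge on the change in predicted values.
        pred = np.concatenate([sim_predict(th, t, coef) for th, t in zip(thetas, times)])
        if old_pred is not None and np.max(np.abs(pred - old_pred)) < tol:
            break
        old_pred = pred
    return thetas, coef, pred
```

Because polynomials of a fixed degree are closed under affine changes of argument and value, re-centering the parameters is absorbed by the next smoothing step and does not move the fitted curves; the stopping rule mirrors the convergence criterion stated above.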

Penalized Cubic Spline

• Pick a dense set of equally spaced knots $\tau_1, \dots, \tau_K$: in a typical study with 4-12 time points per curve, 20 knots will do.

• Fit a linear mixed model:

$$\mu(t) = \sum_{i=0}^{3} \delta_i t^i + \sum_{i=1}^{K} \gamma_i (t - \tau_i)_+^3 + \eta_t, \qquad \text{subject to } \gamma'\gamma \le C.$$

The cubic polynomial in time is the fixed effect; the truncated cubic terms $(t - \tau_i)_+^3$ are the random effects.

• The result is similar to a smoothing spline, but computationally simpler.

(Carroll and Ruppert, 1997; Eilers and Marx, 1997)

It turns out that this is readily fitted by treating the $\delta$'s as fixed effects and the $\gamma$'s as random effects with common variance $\sigma_\gamma^2$.

Why? It is computationally very simple and fast compared to other smoothing techniques.

This has two nice consequences:

The shape is a polynomial if $\sigma_\gamma^2 = 0$.

The treatment effects on the curves can be readily modeled by using the same linear model that we used for the parameters.
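A minimal NumPy sketch of this fit, assuming the smoothing ratio $\lambda = \sigma_\varepsilon^2/\sigma_\gamma^2$ is known. The penalized least-squares system below is Henderson's mixed-model equation for $\delta$ fixed and $\gamma \sim N(0, \sigma_\gamma^2 I)$, so at that ratio it gives the same fitted curve as a mixed-model routine; a routine such as lme would also estimate the variance components. The function names are hypothetical.

```python
import numpy as np

def pspline_design(t, knots):
    """Fixed-effect columns 1, t, t^2, t^3 and random-effect columns (t - tau_k)_+^3."""
    X = np.vander(t, 4, increasing=True)
    Z = np.clip(t[:, None] - knots[None, :], 0.0, None) ** 3
    return X, Z

def pspline_fit(t, y, knots, lam):
    """Minimize ||y - X delta - Z gamma||^2 + lam * gamma'gamma (only gamma is penalized).

    With lam = sigma_eps^2 / sigma_gamma^2 these are the mixed-model equations
    for delta fixed and gamma ~ N(0, sigma_gamma^2 I).
    """
    X, Z = pspline_design(t, knots)
    C = np.hstack([X, Z])
    penalty = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    coef = np.linalg.solve(C.T @ C + lam * penalty, C.T @ y)
    delta, gamma = coef[:X.shape[1]], coef[X.shape[1]:]
    return delta, gamma, C @ coef
```

Letting `lam` grow (that is, $\sigma_\gamma^2 \to 0$) drives $\gamma$ to zero and recovers the ordinary cubic least-squares fit, which is the “shape is a polynomial” case.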

Fitting the Penalized Spline

Mixed Models for the Parameters

• Suppose that we now add another level to the model:

• $y_i(t) = \alpha_{i0} + A_{i1}\,\mu_0(\beta_{i0} + B_{i1} t) + \varepsilon_{it}$, with $\theta_i' = (\alpha_{i0}, \alpha_{i1}, \beta_{i0}, \beta_{i1})$

• $A_{i1} = \exp(\alpha_{i1})$, $B_{i1} = \exp(\beta_{i1})$

• $\theta_{ij} = X_j d_j + Z_j D_j$, where $X_j$ and $Z_j$ are observed predictors and $d_j$ and $D_j$ are fixed and random effects, respectively

Estimation of the Mixed Model Parameters

The mixed model for the parameters is readily incorporated into the algorithm by either:

• adding the mixed model to the NLME step during iteration (unconditional method), or

• iterating the basic SIM model until convergence and then fitting the mixed model in the final NLME step (conditional method).

Testing the Mixed Model Parameters

• Conditional on the fitted shape, we have an NLME, so the LRT from the conditional method should be asymptotically chi-squared.

• How asymptotic is this?

• What about the unconditional method?

• Why not fit the entire model as one huge NLME?

Distribution of the Conditional LRT

[Figure: Curve 1, Curve 2, and Curve 3]

Simulation Study

SD              small variance    large variance
$\sigma_{a0}$   0.10              0.30
$\sigma_{a1}$   0.05              0.10
$\sigma_{b0}$   0.10              0.30
$\sigma_{b1}$   0.05              0.10
$\sigma_\varepsilon$   0.05       0.10
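To make the simulation design concrete, here is a hypothetical data generator using the standard deviations from the table. The slides do not give the three test shapes (Curve 1 to Curve 3), so the shape is passed in as an argument; the zero means, independent normal draws, and the equally spaced grid on [0, 5] are assumptions of this sketch, as are the names.

```python
import numpy as np

# Standard deviations from the simulation-study table.
SMALL = {"a0": 0.10, "a1": 0.05, "b0": 0.10, "b1": 0.05, "eps": 0.05}
LARGE = {"a0": 0.30, "a1": 0.10, "b0": 0.30, "b1": 0.10, "eps": 0.10}

def simulate_sim(n_curves, n_points, sds, mu0, rng, t_max=5.0):
    """Draw theta_i ~ N(0, diag(sd^2)) independently per curve and generate
    y_i(t) = a0 + exp(a1) * mu0(b0 + exp(b1) * t) + noise on a common time grid."""
    t = np.linspace(0.0, t_max, n_points)
    scales = [sds["a0"], sds["a1"], sds["b0"], sds["b1"]]
    theta = rng.normal(0.0, scales, size=(n_curves, 4))
    a0, a1, b0, b1 = (theta[:, [k]] for k in range(4))   # each (n_curves, 1)
    y = a0 + np.exp(a1) * mu0(b0 + np.exp(b1) * t)        # broadcasts to (n_curves, n_points)
    y = y + rng.normal(0.0, sds["eps"], size=y.shape)
    return t, theta, y
```

For instance, `simulate_sim(20, 20, SMALL, np.sin, np.random.default_rng(0))` draws 20 curves at 20 time points with a stand-in sine shape.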

How Good is the Fit?

The ASE is an order of magnitude smaller than that obtained by fitting each curve individually.

[Figure: ASE × 10,000 versus number of curves (20, 30, 50), for 20 and 30 points per curve]

How Good Are the Parameter Estimates?

[Figure: correlation versus number of curves (20, 30, 50) for a0, a1, b0, b1; Curve 1 with small and large variance]

Correlation Among Estimates:

[Figure: r(a0,a1), r(a0,b0), r(a0,b1), r(a1,b0), r(a1,b1), r(b0,b1) for Curve 1 with small and large variance]

Distribution of the LRT

[Figure: panels for m = n = 20 and m = 30, n = 50; CLRT percentiles plotted against chi-square percentiles and against parametric LRT percentiles]

The top curves are the observed percentiles of the CLRT (conditional LRT) versus the chi-square percentiles.

The lower curves are the observed percentiles of the CLRT versus those of the LRT with the correct parametric form.

Power of the CLRT (level = 0.05)

20 curves, 20 time points:

shift    conditional    unconditional
0.5 σ    33%            36%
1.0 σ    87%            89%

50 curves, 30 time points:

shift    conditional    unconditional
0.5 σ    67%            70%
1.0 σ    91%            99%

Why not fit the whole thing as one big mixed model?

SIM Model for the Nestling Growth Study

$y_{ijk}(t) = \alpha_{ij0} + \exp(\alpha_{ij1})\,\mu_0(\exp(\beta_{ij1})\,t) + \varepsilon_{ijt}$

$\theta_{ij} = \gamma_0 + \phi_i(\text{HatchDate}_{ij}) + \text{Treatment}_{ij}$

HatchDate and Treatment are time invariant and are applied to every bird in the nest.

Note: larger $\alpha_{ijk}$ means larger birds; larger $\beta_{ij1}$ means faster growth.

Tarsus Length versus Age

[Figure: tarsus versus age for some individual growth curves; aggregate aligned (transformed) tarsus versus transformed time; fitted growth curve with 26 knots, random effects only]

Parameters versus Hatch Date

[Figure: estimated parameters, including a0, versus hatch date (30 to 60)]

Parameters versus Treatment

[Figure: estimated parameters for the control and calcium groups]

cor(a0, a1) = 0.88, cor(a0, b1) = 0.60

SIM Model for the Nestling Growth Study

$y_{ijk}(t) = \alpha_{ij0} + \exp(\alpha_{ij1})\,\mu_0(\exp(\beta_{ij1})\,t) + \varepsilon_{ijt}$

$\theta_{ij} = \gamma_0 + \rho_1\,\text{HatchDate}_{ij} + \rho_2\,\text{HatchDate}_{ij}^2 + \text{Treatment}_{ij} + \zeta_{ij}$
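The fixed part of this parameter model can be sketched as a design matrix. Because HatchDate and Treatment are whole-plot (nest-level) covariates, every bird inherits its nest's row. The function name, the 0/1 calcium coding, and centering the hatch date before squaring (to reduce the collinearity between the linear and quadratic terms) are assumptions of this sketch, not details given in the talk.

```python
import numpy as np

def nest_design(hatch_date, calcium, nest_of_bird):
    """Fixed-effect design for one SIM parameter: intercept, hatch date,
    hatch date^2, and a calcium-treatment indicator, expanded from nests to birds."""
    hd = hatch_date - hatch_date.mean()       # center before squaring
    X_nest = np.column_stack([np.ones_like(hd), hd, hd ** 2, calcium.astype(float)])
    return X_nest[nest_of_bird]               # one row per bird, copied from its nest
```

Multiplying this matrix by a coefficient vector $(\gamma_0, \rho_1, \rho_2, \text{treatment effect})$ and adding draws of $\zeta_{ij}$ gives one component of $\theta_{ij}$ for every bird.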

Conditional Likelihoods

• Conclusion: Both hatch date (quadratic) and treatment have an effect on nestling growth.

• Similarly, despite the small variance component for $\alpha_1$, the fit is significantly worse without it.

model         likelihood    d.f.
null          -1843         10
hatch date    -1836         16
treatment     -1839         13
full          -1832         19
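The likelihood-ratio comparisons behind this conclusion can be reproduced from the likelihood table, under two assumptions: that the “likelihood” column holds maximized log-likelihoods, and that each named row contains only the named covariate. The chi-squared tail probability is computed with a self-contained series so the sketch needs only the standard library; the helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), from the series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def lrt(loglik_full, df_full, loglik_reduced, df_reduced):
    """LRT statistic 2 * (l_full - l_reduced), its d.f., and the chi-squared p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    ddf = df_full - df_reduced
    return stat, ddf, chi2_sf(stat, ddf)

# (log-likelihood, d.f.) from the table, read as maximized log-likelihoods (assumption).
TABLE = {"null": (-1843, 10), "hatch date": (-1836, 16),
         "treatment": (-1839, 13), "full": (-1832, 19)}
```

Under these assumptions, full versus treatment-only (testing hatch date) gives 14 on 6 d.f. (p ≈ 0.030), and full versus hatch-date-only (testing treatment) gives 8 on 3 d.f. (p ≈ 0.046), consistent with the stated conclusion.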

Does the Model Fit?

• Does the Treatment Affect Shape?

• A simple idea: add the treatment to the linear mixed model (LME) used to fit the shape.

Does the Model Fit?

Crainiceanu and Ruppert (2003) show that the LRT cannot be used to test a polynomial against a P-spline unless the design matrices are orthogonalized.

Xu (2003) found that, when testing equality of curves, the P-spline fit of the full model under the null hypothesis is WORSE than the fit of the null model (although the models are nested) unless the design matrices are orthogonalized.
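One generic way to orthogonalize is to replace the random-effect (spline) design $Z$ by its residual after projecting onto the column space of the fixed-effect (polynomial) design $X$. This is a sketch of that standard projection only; whether it matches the exact construction in Crainiceanu and Ruppert (2003) or Xu (2003) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def orthogonalize(X, Z):
    """Return Z minus its projection onto col(X), so that X' Z_orth = 0 while
    [X, Z_orth] spans the same space as [X, Z]."""
    Q, _ = np.linalg.qr(X)        # orthonormal basis for the column space of X
    return Z - Q @ (Q.T @ Z)
```

The unpenalized fitted values are unchanged, but the penalty on the spline coefficients now acts only on the part of the curve that the polynomial cannot represent.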

Does the Model Fit?

Crainiceanu and Ruppert (2003)

Xu (2003)

Good News: There is never a shortage of research problems.

Does the Model Fit?

ΔLRT = 56

p < 0.05 (for d.f. < 40)

estimated d.f. ≈ 5
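The d.f. bound on this slide can be checked numerically, assuming a chi-squared reference distribution for ΔLRT. Since the upper tail at a fixed statistic grows with the d.f., the sketch scans upward for the last d.f. at which 56 is still significant at the 0.05 level. The tail function is a self-contained series and the helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the regularized lower incomplete gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def largest_significant_df(stat, alpha=0.05, max_df=500):
    """Largest reference d.f. at which `stat` is still significant at level alpha."""
    best = None
    for df in range(1, max_df + 1):
        if chi2_sf(stat, df) < alpha:
            best = df
        else:
            break
    return best
```

`largest_significant_df(56.0)` returns 40 (the 0.95 chi-squared quantile is about 55.76 on 40 d.f. and 56.94 on 41 d.f.), and at the estimated 5 d.f. the p-value is far below 0.05.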

Why consider the SIM Model?

• can be used in a variety of problems:

– growth

– serum concentrations (hormones, drugs)

– bio-equivalence

– materials deformation

• more flexible than polynomial or other parametric fits

• just as easy to use and interpret as a parametric nonlinear mixed model

• can be used to test goodness-of-fit of parametric models (particularly easy for polynomials)

• can be used to suggest parametric shapes

• can be used to compare across curves with different shapes but similar treatment effects

Main References

• Crainiceanu et al (2005) Biometrika

• Ke and Wang (2001) JASA

• Kneip and Gasser (1998)

• Lawton, Sylvestre, and Maggio (1972) Technometrics

• Lindstrom (1995) Statistics in Medicine

• Murphy and van der Vaart (2000) JASA

And many thanks to ...

Chuck McCulloch (penalized splines)

Matt Wasson (data)

Doug Bates and Jose Pinheiro (lme)

JCSS editors, associate editors and

reviewers

The awards committee
