Model Comparison for Tree Resin Dose Effect on Termites
Lianfen Qian
Florida Atlantic University
Co-author: Soyoung Ryu, University of Washington
Outline
Introduction
Longitudinal Data: Termites Data Set
Model Comparison
– Partially Linear Model
– Piecewise Linear Models
– Nonparametric Smoothing Methods
Conclusions
Introduction
Termite destruction in Florida is a serious problem.
Each year wood termites bore into thousands of homes and businesses causing millions of dollars of damage.
Current chemical pesticides that are used in the control of termites and protection from their damage are potentially harmful to Florida’s delicate environment.
Goal of Study
To determine the effectiveness of a natural tropical tree resin in controlling termites, thus providing protection from their destruction.
Longitudinal Data
Definition: Longitudinal data are characterized by repeated measurements over time on the same set of units.
Incomplete data: one or more of the units' sequences of measurements are incomplete.
– Unbalanced data: the measurement was NEVER INTENDED to be taken.
– Missing data: the measurement was INTENDED to be taken.
Longitudinal Data, Cont.
Benefits
– Distinguish changes over time within units from differences among units.
– Use units efficiently once they are enrolled in a study.
Issue: Repeated observations on the same subject tend to be correlated.
– We need a statistical analysis that accounts for this correlation.
Termites Data Set
The resin was derived from the bark of tropical trees, dissolved in a solvent, and placed on filter paper at one of two concentrations: 5 mg or 10 mg.
There are eight dishes for each dose. Twenty-five live termites were placed in each dish.
Each dish was observed on 13 specific days between day 1 and day 15; no observations were made on days 3 and 9.
[Figure: a dish with 25 termites on filter paper treated with 5 mg or 10 mg of resin]
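As a small sketch of the longitudinal layout (assuming one record per dose, dish and observation day; this long format is an illustrative choice, not the authors' file format), the design gives 13 observation days per dish and 2 × 8 × 13 = 208 measurements in total:

```python
# Observation days: 1-15, skipping days 3 and 9 (no observations were made).
days = [d for d in range(1, 16) if d not in (3, 9)]

doses_mg = (5, 10)
dishes_per_dose = 8

# Long format (hypothetical layout): one record per (dose, dish, day).
records = [
    (dose, dish, day)
    for dose in doses_mg
    for dish in range(1, dishes_per_dose + 1)
    for day in days
]

print(len(days))     # 13
print(len(records))  # 208
```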
Termites Data, Cont.
[Table: number of live termites in each dish by day (days 1–15) for the 5 mg and 10 mg doses; each of the 16 dishes starts with 25 termites]
Scatter Plot
Longitudinal Plots
[Figure: (a) 5 mg dose; (b) 10 mg dose]
Partially Linear Model
[Flowchart of the modeling steps]
Data set → EDA: strange behavior of dishes 1 & 2 under the 10 mg dose is found.
Mistake?
– YES: remove dishes 1 & 2 for 10 mg.
– NO: add an additional unknown dosage level (*mg, 5mg, 10mg).
Add a random effect of dish.
Time effect: a common time effect for the different doses, or a different time effect for each dose.
Are error terms correlated?
– YES: add an additional correlation term to capture the correlation.
– NO: End.
Partially Linear Model
Benefits:
It is more efficient than a standard linear regression model when the response variable depends linearly on some covariates but nonlinearly on others.
It can provide a parsimonious description of the relationship between the response variable and the explanatory variables.
It has the flexibility of a nonparametric model.
Partially Linear Model
$$Y_{ij} = x_{ij}^T\beta + g(t_{ij}) + \varepsilon_{ij}, \quad i = 1,\dots,m,\ j = 1,\dots,n_i$$
– $m$ is the number of units and $n_i$ is the number of observations on unit $i$.
– $(x_{ij}, t_{ij})$ are either independent and identically distributed random design points or fixed design points.
– $g$ is an unknown nonparametric function.
– $\varepsilon_{ij}$ are $N = n_1 + \dots + n_m$ random variables, each with zero mean and finite variance.
Back-fitting Algorithm
1. Given the current estimate $\hat\beta$, calculate residuals $r_{ij} = Y_{ij} - x_{ij}^T\hat\beta$ and use these in place of $Y_{ij}$ to calculate a cubic spline estimate $\hat g(t)$.
2. Given $\hat g$, calculate residuals $r_{ij} = Y_{ij} - \hat g(t_{ij})$ and update the estimate $\hat\beta$ using generalized least squares,
$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} r,$$
where $X$ is the matrix with rows $x_{ij}^T$, $V$ is the assumed block-diagonal covariance matrix of the data, and $r$ is the vector of residuals.
3. Repeat steps 1 and 2 until convergence.
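The three back-fitting steps can be sketched in Python/NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the spline penalty uses the standard natural-cubic-spline matrix form (Green & Silverman), λ is fixed rather than chosen by cross-validation, V is assumed compound-symmetric within each dish (one way to encode a dish random effect), and the data are simulated with hypothetical values (a 10 mg effect of −3 and a time trend 25·exp(−0.15t)).

```python
import numpy as np


def natural_spline_penalty(knots):
    """Matrix K with integral of g''(t)^2 dt = g^T K g, where g holds the values
    of a natural cubic spline at the sorted knots (Green & Silverman form)."""
    n = len(knots)
    h = np.diff(knots)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def smoothing_spline(t, r, lam):
    """Cubic smoothing spline of r on t, returned at every observed t."""
    knots, idx = np.unique(t, return_inverse=True)
    w = np.bincount(idx, minlength=len(knots)).astype(float)   # observations per knot
    sums = np.bincount(idx, weights=r, minlength=len(knots))   # sum of r per knot
    g = np.linalg.solve(np.diag(w) + lam * natural_spline_penalty(knots), sums)
    return g[idx]


def backfit(y, X, t, V, lam, tol=1e-8, max_iter=500):
    """Alternate the spline step and the GLS step until beta stops changing."""
    Vinv = np.linalg.inv(V)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g = smoothing_spline(t, y - X @ beta, lam)   # step 1: spline on Y - X beta
        r = y - g                                     # step 2: residuals Y - g(t)
        beta_new = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ r)  # GLS update
        if np.max(np.abs(beta_new - beta)) < tol:    # step 3: stop at convergence
            return beta_new, g
        beta = beta_new
    return beta, g


# Simulated (hypothetical) data on the termite design: 16 dishes x 13 days.
rng = np.random.default_rng(0)
days = np.array([d for d in range(1, 16) if d not in (3, 9)], dtype=float)
n_dish, n_day = 16, len(days)
t = np.tile(days, n_dish)
X = np.repeat(np.repeat([0.0, 1.0], 8), n_day).reshape(-1, 1)  # 10 mg indicator
dish_effect = np.repeat(rng.normal(0.0, 0.2, n_dish), n_day)
y = -3.0 * X[:, 0] + 25.0 * np.exp(-0.15 * t) + dish_effect + rng.normal(0.0, 0.3, t.size)

# Assumed V: block diagonal, compound symmetry within each dish.
block = 0.09 * np.eye(n_day) + 0.04 * np.ones((n_day, n_day))
V = np.kron(np.eye(n_dish), block)

beta_hat, g_hat = backfit(y, X, t, V, lam=1.0)
print("estimated 10 mg effect:", beta_hat[0])
```

Because g absorbs the intercept, X holds only the 10 mg indicator; adding a constant column to X would make β and g unidentifiable.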
Spline Estimator of g
Among all functions g(t) with two continuous derivatives, find the one that minimizes the penalized residual sum of squares
$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}\{r_{ij} - g(t_{ij})\}^2 + \lambda \int \{g''(t)\}^2\,dt.$$
λ controls the smoothness of the fitted curve:
Larger λ => Smaller variance => Smoother curve => Larger bias
Trade-off between bias and variance.
The Generalized Cross-Validation function (Rice & Silverman, 1991) is used to choose λ: minimize
$$S(\lambda) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\{r_{ij} - \hat g^{(-ij)}(t_{ij})\}^2,$$
where $\hat g^{(-ij)}$ is the cubic spline estimator computed without the jth observation of the ith unit.
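The leave-one-out criterion S(λ) can be computed by brute force: refit the spline once per held-out observation and sum the squared prediction errors. The sketch below assumes the natural-cubic-spline penalty form, a hypothetical λ grid, and simulated residuals; it recomputes every fit instead of using any closed-form shortcut, and the function names are illustrative.

```python
import numpy as np


def natural_spline_penalty(knots):
    """K with integral of g''(t)^2 dt = g^T K g for a natural cubic spline."""
    n = len(knots)
    h = np.diff(knots)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def spline_at_knots(knots, idx, r, lam, K):
    """Minimise sum (r - g(t))^2 + lam * g^T K g; returns g at every knot."""
    w = np.bincount(idx, minlength=len(knots)).astype(float)
    sums = np.bincount(idx, weights=r, minlength=len(knots))
    return np.linalg.solve(np.diag(w) + lam * K, sums)


def cv_score(t, r, lam):
    """S(lam): squared errors of fits that each leave out one observation."""
    knots, idx = np.unique(t, return_inverse=True)
    K = natural_spline_penalty(knots)
    total = 0.0
    for k in range(len(r)):
        keep = np.arange(len(r)) != k
        g = spline_at_knots(knots, idx[keep], r[keep], lam, K)
        total += (r[k] - g[idx[k]]) ** 2
    return total


# Hypothetical residuals on the termite design: 16 dishes x 13 observation days.
rng = np.random.default_rng(1)
days = np.array([d for d in range(1, 16) if d not in (3, 9)], dtype=float)
t = np.tile(days, 16)
r = 25.0 * np.exp(-0.15 * t) + rng.normal(0.0, 1.0, t.size)

lams = np.logspace(-3, 3, 13)
scores = [cv_score(t, r, lam) for lam in lams]
lam_best = lams[int(np.argmin(scores))]
print(f"chosen lambda = {lam_best:.3g}")
```

As λ grows, the fitted curve approaches the least-squares straight line, which is the "larger bias, smaller variance" end of the trade-off.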
Original Data Set with Common Time Effect
Removing Outliers (Dishes 1 & 2)
Add Additional Dose
Different Time Effect for Dose
Piecewise Linear Regression Model
For 5 mg, the data show no change point. For 10 mg, the data show a change point, so we use the following piecewise linear model:
$$E(y \mid x) = \begin{cases} \beta_0 + \beta_1 x, & \text{if } x < \tau, \\ \gamma_0 + \gamma_1 x, & \text{otherwise,} \end{cases}$$
where τ is the change point.
• Change point estimated using M-estimation (Koul, Qian & Surgailis, 2003)
Two-Phase Linear Regression
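A minimal two-phase fit can be sketched by searching candidate change points among the observed days and fitting a separate least-squares line on each side. Note that this uses squared-error loss, the simplest M-estimator; it is an illustration, not the specific estimator of Koul, Qian & Surgailis (2003). The data are simulated with a hypothetical change at day 7, and the function names are illustrative.

```python
import numpy as np


def ols_sse(x, y):
    """Least-squares line y = a + b x; returns (coefficients, residual SS)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def fit_two_phase(x, y, min_points=2):
    """Grid search over tau: line on x < tau, another line on x >= tau."""
    best = None
    for tau in np.unique(x)[1:]:
        left, right = x < tau, x >= tau
        # Each phase needs at least two distinct x values to fit a line.
        if len(np.unique(x[left])) < min_points or len(np.unique(x[right])) < min_points:
            continue
        coef_l, sse_l = ols_sse(x[left], y[left])
        coef_r, sse_r = ols_sse(x[right], y[right])
        if best is None or sse_l + sse_r < best[0]:
            best = (sse_l + sse_r, tau, coef_l, coef_r)
    return best[1], best[2], best[3]


# Hypothetical 10 mg data: 8 dishes x 13 days, steep decline before day 7.
rng = np.random.default_rng(2)
days = np.array([d for d in range(1, 16) if d not in (3, 9)], dtype=float)
x = np.tile(days, 8)
mean = np.where(x < 7, 26.0 - 3.0 * x, 2.0 - 0.1 * x)
y = mean + rng.normal(0.0, 0.5, x.size)

tau_hat, coef_left, coef_right = fit_two_phase(x, y)
print("change point:", tau_hat, "left:", coef_left, "right:", coef_right)
```

Restricting τ to observed days keeps the search finite; a robust loss (for example, absolute error) could replace the squared error inside `ols_sse` without changing the search.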
Piecewise Linear Regression
Cubic Spline Smoothing
Cubic spline with one knot at day 7:
$$E(y \mid x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - 7)_+^3,$$
where $(u)_+ = \max(u, 0)$.
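The one-knot cubic spline is an ordinary linear regression on the truncated power basis 1, x, x², x³, (x − 7)³₊, so it can be fitted by least squares. The sketch below uses hypothetical coefficients and noise-free simulated data so that the fit recovers them exactly; the function names are illustrative.

```python
import numpy as np


def truncated_cubic_design(x, knot=7.0):
    """Columns 1, x, x^2, x^3 and (x - knot)_+^3 for a cubic spline with one knot."""
    return np.column_stack(
        [np.ones_like(x), x, x**2, x**3, np.maximum(x - knot, 0.0) ** 3]
    )


def fit_cubic_spline(x, y, knot=7.0):
    """Least-squares coefficients beta_0..beta_4."""
    beta, *_ = np.linalg.lstsq(truncated_cubic_design(x, knot), y, rcond=None)
    return beta


# Hypothetical coefficients on the termite design (8 dishes x 13 days).
days = np.array([d for d in range(1, 16) if d not in (3, 9)], dtype=float)
x = np.tile(days, 8)
beta_true = np.array([28.0, -2.0, -0.5, 0.05, -0.04])
y = truncated_cubic_design(x) @ beta_true

beta_hat = fit_cubic_spline(x, y)
print(beta_hat)
```

The truncated term (x − 7)³₊ keeps the curve and its first two derivatives continuous at day 7, so β₄ measures only the jump in the third derivative there.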
Cubic Spline Method
[Figure: (a) 5 mg dose; (b) 10 mg dose; (c) unknown dose]
No significant difference between the cubic spline smoothing and piecewise linear models.
Model Comparisons
Partially Linear Model
It gives significant dose effects and a nonlinear time trend.
The 10 mg dose kills termites about 1.5 times faster than the 5 mg dose.
The time trend levels off by the end of the experiment, possibly because few termites remain in the dishes or because the termites build up resistance to the tree resin.
Piecewise Linear Models
There is a dramatic effect in the first seven days under the 10 mg dose.
There is a linear trend and a dose effect under the 5 mg dose.
For the two anomalous dishes under the 10 mg dose, the effect during the first seven days is not significantly different from the 5 mg dose, while after seven days it is weaker than the 5 mg effect. This indicates recording or operating mistakes in those two dishes' records.
Cubic Spline
It gives results similar to those of the piecewise linear models: one knot is identified at day 7 for the 10 mg dose, but none for the 5 mg dose.
Conclusions
Overall, the 10 mg dose is significantly more effective than the 5 mg dose.
For the 10 mg dose, both the piecewise linear model and cubic spline smoothing show that the termites are killed in about 7 days.
Conclusions, Cont.
For the 5 mg dose, all methods (linear, partially linear, piecewise linear and cubic spline smoothing) show that the effect is linear. It takes more than twice as long to kill the termites as with the 10 mg dose.
The two anomalous dishes recorded under the 10 mg dose do not differ significantly from the 5 mg dose for the first 7 days. After day 7, they show no significant effectiveness in killing termites.
Conclusions, Cont.
The estimated treatment effect is time varying with a change point at day 7.
The final piecewise model fits the data with adjusted R² = 93.7%.
On average, 10mg is 68.9% more efficient than 5mg in killing termites during the first week.
Thank you!
Florida Atlantic University
http://www.math.fau.edu/qian
Please contact:
E-MAIL: [email protected]
PHONE: 561-297-2486
Department of Mathematical Sciences
Florida Atlantic University
Boca Raton, FL 33431