Statistical Analysis Overview I, Session 2
Peg Burchinal
Frank Porter Graham
Child Development Institute,
University of North Carolina-Chapel Hill
Overview: Statistical analysis overview I-b
• Nesting and intraclass correlation
• Hierarchical Linear Models
– 2 level models
– 3 level models
Nesting
• Nesting implies violation of the linear model assumption of independence of observations
• Ignoring this dependency in the data results in inflated test statistics when observations are positively correlated
– CAN DRAW INCORRECT CONCLUSIONS
Nesting and Design
• Educational data often collected in schools, classrooms, or special treatment groups
– Lack of independence among individuals -> reduction in variability
• Pre-existing similarities (i.e., students within the cluster are more similar than students who would be randomly selected)
• Shared instructional environment (i.e., variability in instruction greater across classrooms than within classrooms)
• Educational treatments often assigned to schools or classrooms
– Advantage: To avoid contamination, make study more acceptable (often simple random assignment not possible)
– Disadvantage: Analysis must take dependencies or relatedness of responses within clusters into account
Intraclass Correlation (ICC)
• For models with clustering of individuals
– “Cluster effect”: proportion of variance in the outcomes that is between clusters (compares within-cluster variance to between-cluster variance)
– Example: clustering of children in classrooms. ICC describes the proportion of variance associated with differences between classrooms
Intraclass Correlation
• Intraclass correlation (ICC): measure of relatedness or dependence of clustered data
– Proportion of variance that is between clusters
– ICC or $\rho = \sigma_b^2 / (\sigma_b^2 + \sigma_w^2)$, with $\sigma_b^2$ the between-cluster variance and $\sigma_w^2$ the within-cluster variance
– ICC = 0: no correlation among individuals within a cluster
– ICC = 1: all responses within a cluster are identical
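The ICC definition can be illustrated with a small estimator. This is a minimal sketch, assuming equal-sized clusters and the one-way ANOVA (method-of-moments) estimator; the function name `anova_icc` and the toy data are hypothetical and not from the slides.

```python
from statistics import mean

def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC, assuming equal cluster sizes."""
    k = len(clusters)          # number of clusters
    m = len(clusters[0])       # individuals per cluster
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Mean square between clusters and mean square within clusters.
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum((x - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for x in c) / (k * (m - 1))
    # Method-of-moments variance components; the between-cluster
    # component can come out negative in small samples.
    var_between = (msb - msw) / m
    var_within = msw
    return var_between / (var_between + var_within)

# All responses within each cluster identical, so the estimate is 1.
print(anova_icc([[1, 1, 1], [5, 5, 5], [9, 9, 9]]))
```

With real data the clusters would be classrooms and the values child outcomes; a mixed-model fit would normally replace this moment estimator.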
Nesting, Design, and ICC
• Taking ICC into account results in less power for given sample size – less independent information
• Design effect = $1 + \rho(m - 1)$; effective sample size = $mk / (1 + \rho(m - 1))$
– $m$ = number of individuals per cluster
– $k$ = number of clusters
– $\rho$ = ICC
• Effective sample size is number of clusters (k) when ICC=1 and is number of individuals (mk) when ICC=0
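The two limiting cases above can be checked numerically. A minimal sketch, assuming equal cluster sizes and the standard design effect 1 + ICC × (m − 1); the function names and the example values (20 children in each of 10 classrooms) are illustrative assumptions.

```python
def design_effect(m, icc):
    """Variance inflation from clustering, for m individuals per cluster."""
    return 1 + icc * (m - 1)

def effective_sample_size(m, k, icc):
    """Total sample size m * k deflated by the design effect."""
    return m * k / design_effect(m, icc)

# Hypothetical design: 20 children in each of 10 classrooms.
print(effective_sample_size(m=20, k=10, icc=0.0))  # 200.0, the number of individuals
print(effective_sample_size(m=20, k=10, icc=1.0))  # 10.0, the number of clusters
print(effective_sample_size(m=20, k=10, icc=0.1))  # in between, closer to k than to mk
```

Even a modest ICC removes a large share of the independent information, which is why power depends mainly on the number of clusters.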
ICC and Hierarchical Linear Models
• Hierarchical linear models (HLM) implicitly take nesting into account
– Clustering of data is explicitly specified by the model
– ICC is considered when estimating standard errors, test statistics, and p-values
2 level HLM
• One level of nesting
– Longitudinal: Repeated measures of individual over time
• Typically: random intercepts and slopes to describe individual patterns of change over time
– Clusters: Nesting of individuals within classes, families, therapy groups, etc.
• Typically: random intercept to describe cluster effect
2 level HLM: Random-intercepts models
• Corresponds to one-way ANOVA with random effects (mixed model ANOVA)
• Example: Classrooms randomly assigned to treatment or control conditions
– All study children within a classroom are in the same condition
– Post-treatment outcome per child (can use pre-treatment score as covariate to increase power)
– Level 1 = children in classroom
– Level 2 = classroom
• ICC reflects the degree of similarity among students within the classroom.
2 Level HLM Random Intercept Model
• Level 1: individual students within the classroom
– Unconditional model: $Y_{ij} = B_{0j} + r_{ij}$
– Conditional model: $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$
• $Y_{ij}$ = outcome for ith student in jth class
• $B_{0j}$ = intercept (e.g., mean) for jth class
• $B_1$ = coefficient for individual-level covariate, $X_{ij}$
• $r_{ij}$ = random error term for ith student in jth class, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$
2 Level HLM Random Intercept Model
• Level 2: classrooms
– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$
– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$
• $B_{0j}$ = intercept (e.g., mean) for jth class
• $\gamma_{00}$ = grand mean in population
• $\gamma_{01}$ = treatment effect for $W_{j1}$, dummy variable indicating treatment status (-.5 if control; .5 if treatment)
• $\gamma_{02}$ = coefficient for $W_{j2}$, class-level covariate
• $u_{0j}$ = random effect associated with jth classroom, $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$
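2 Level HLM Random Intercept Model
• Combined (unconditional)
– $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$
• $Y_{ij} = B_{0j} + r_{ij}$
• $B_{0j} = \gamma_{00} + u_{0j}$
• Combined (conditional)
– $Y_{ij} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + B_1 X_{ij} + u_{0j} + r_{ij}$
• $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$
• $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$
• $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(u_{0j} + r_{ij}) = \mathrm{var}(u_{0j}) + \mathrm{var}(r_{ij}) = \tau_{00} + \sigma^2$
• ICC = $\rho = \tau_{00} / (\tau_{00} + \sigma^2)$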
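The unconditional combined model can be simulated to see the variance decomposition at work. This is a sketch under stated assumptions: normal random effects, equal class sizes, and made-up parameter values (grand mean 7, classroom variance 1, residual variance 3, so the true ICC is 0.25); none of these numbers come from the study.

```python
import random
from statistics import mean, variance

def simulate_random_intercepts(n_classes, n_students, grand_mean,
                               tau00, sigma2, seed=0):
    """Draw Y_ij = grand_mean + u_0j + r_ij for each class j and student i."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_classes):
        u0j = rng.gauss(0.0, tau00 ** 0.5)           # classroom random effect
        data.append([grand_mean + u0j + rng.gauss(0.0, sigma2 ** 0.5)
                     for _ in range(n_students)])    # student-level errors
    return data

m = 20
classes = simulate_random_intercepts(n_classes=500, n_students=m,
                                     grand_mean=7.0, tau00=1.0, sigma2=3.0)

# Within-class variance estimates sigma^2; the variance of class means
# estimates tau00 + sigma^2 / m, so subtract the sampling part.
sigma2_hat = mean(variance(c) for c in classes)
tau00_hat = variance([mean(c) for c in classes]) - sigma2_hat / m
icc_hat = tau00_hat / (tau00_hat + sigma2_hat)
print(round(sigma2_hat, 2), round(tau00_hat, 2), round(icc_hat, 2))
```

The estimates should land near 3, 1, and 0.25; in practice a mixed-model routine would estimate the same quantities by maximum likelihood.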
Example: 2 level HLM Random Intercepts
• Purdue Curriculum Study (Powell & Diamond)
– Onsite or Remote coaching
– 27 Head Start classes randomly assigned to onsite coaching and 25 to remote coaching
– Post-test scores on writing
– Onsite: n=196, M=6.70, SD=1.54
– Remote: n=171, M=7.05, SD=1.64
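Example: 2 level HLM Random Intercepts
• Level 1: $\text{Writing}_{ij} = B_{0j} + B_1\,\text{Writing-pre}_{ij} + r_{ij}$
– $B_1$ = .56, se = .05, p < .001
– $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = 1.67$
• Level 2: $B_{0j} = \gamma_{00} + \gamma_{01}\,\text{Onsite}_j + u_{0j}$
– $\gamma_{00}$ (intercept: remote group adjusted mean) = 3.74, se = .31
– $\gamma_{01}$ (Onsite-Remote difference) = -.37, se = .17, p = .03
– $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$
• ICC = $\tau_{00} / (\tau_{00} + 1.67)$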
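The reported p-value for the Onsite-Remote difference can be roughly checked from the estimate and its standard error. A minimal sketch, assuming a two-sided Wald test with a standard normal reference distribution; HLM software typically uses a t reference with model-based degrees of freedom, so this is an approximation rather than the study's own computation. The function name is hypothetical.

```python
import math

def wald_z_test(estimate, se):
    """Two-sided Wald test using a standard normal reference distribution."""
    z = estimate / se
    # P(|Z| > |z|) for standard normal Z, via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Onsite-Remote difference from the example: estimate -.37, se .17.
z, p = wald_z_test(-0.37, 0.17)
print(round(z, 2), round(p, 2))  # -2.18 0.03
```

The normal approximation reproduces the reported p = .03 to two decimals.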
2 Level HLM: Longitudinal (random-slopes and random-intercepts models)
• Does NOT correspond to one-way ANOVA with random effects
• Example: Longitudinal assessment of children’s literacy skills during Pre-K years
– Level 1 = individual growth curve
– Level 2 = group growth curve
Level 1: Longitudinal HLM
• Level 1: individual growth curve
– Unconditional model: $Y_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + r_{ij}$
– Conditional model: $Y_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + B_2 X_{ij} + r_{ij}$
• $Y_{ij}$ = outcome for jth student on the ith occasion
• $\text{Age}_{ij}$ = age at assessment for jth student on the ith occasion
• $B_{0j}$ = intercept for jth student
• $B_{1j}$ = slope for Age for jth student
• $B_2$ = coefficient for time-varying covariate, $X_{ij}$
• $r_{ij}$ = random error term for jth student on the ith occasion, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$
Level 2: Longitudinal HLM
• Level 2: predicting individual trajectories
– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + \gamma_{12} W_{j2} + u_{1j}$
• $B_{0j}$ = intercept for jth student; $B_{1j}$ = slope for Age for jth student
• $\gamma_{00}$ = intercept in population; $\gamma_{10}$ = slope in population
• $\gamma_{01}$ = treatment effect on intercept for $W_{j1}$, student-level covariate
• $\gamma_{11}$ = treatment effect on slope for $W_{j1}$, student-level covariate
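Level 2: Longitudinal HLM
• Level 2: predicting individual trajectories
– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + u_{1j}$
• $u_{0j}$ = random effect for individual intercept; $u_{1j}$ = random effect for individual slope
• $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$
• $E(u_{1j}) = 0$, $\mathrm{var}(u_{1j}) = \tau_{11}$, $\mathrm{cov}(u_{0j}, u_{1j}) = \tau_{10}$
• $\mathrm{var}(u_{0j}, u_{1j}) = T = \begin{pmatrix} \tau_{00} & \tau_{10} \\ \tau_{10} & \tau_{11} \end{pmatrix}$
• Level 1 and level 2 error terms are independent: $\mathrm{cov}(r_{ij}, u_{0j}) = \mathrm{cov}(r_{ij}, u_{1j}) = 0$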
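One consequence of random slopes is that the variance of the outcome changes with age. Substituting the unconditional level 2 equations into level 1 gives Var(Y) = var(u0) + 2·Age·cov(u0, u1) + Age²·var(u1) + var(r). A minimal sketch of that formula follows; the function name, the age coding (0 at the intercept), and the variance values are hypothetical and not estimates from the study.

```python
def implied_variance(age, tau00, tau11, tau10, sigma2):
    """Model-implied Var(Y) at a given age for a random intercept + slope model.

    tau00: intercept variance, tau11: slope variance,
    tau10: intercept-slope covariance, sigma2: level 1 residual variance.
    """
    return tau00 + 2.0 * age * tau10 + age ** 2 * tau11 + sigma2

# Hypothetical variance components, with age coded 0 at the intercept.
print(implied_variance(0, tau00=4.0, tau11=1.0, tau10=-0.5, sigma2=2.0))  # 6.0
print(implied_variance(2, tau00=4.0, tau11=1.0, tau10=-0.5, sigma2=2.0))  # 8.0
```

Because the variance depends on age, a single ICC no longer summarizes the dependence in a random-slopes model.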
Example: Longitudinal HLM
• Purdue Curriculum Study (Powell & Diamond)
– Level 1: estimating individual growth curves for children in one treatment condition (Remote)
– Level 2: estimating population growth curves for Remote condition
Blending    Pre            Post            Follow-up
N           187            171             63
M (sd)      9.48 (5.34)    13.75 (4.57)    15.14 (4.60)
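Example
• Level 1: $\text{blending}_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + r_{ij}$
• Level 2: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
Estimated results
Intercept: $\gamma_{00}$ = 11.86 (se=.48), $\tau_{00}$ = 10.03**
Season: $\gamma_{01}$ = 2.43* (se=.70)
Slope: $\gamma_{10}$ = 1.51* (se=.60), $\tau_{11}$ = 4.24**, $\tau_{10}$ = -1.45**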
3 level HLM
• 2 levels of nesting
• Examples
– Longitudinal assessments of children in randomly assigned classrooms
• Level 1 – child level data
• Level 2 – child’s growth curve
• Level 3 – classroom level data
– Two levels of nesting such as children nested in classrooms that are nested in schools
• Level 1 – child level data
• Level 2 – classroom level data
• Level 3 – school level data
3 level Model: Random Intercepts
• Children nested in classrooms, classrooms nested in schools
– Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$
• $Y_{ijk}$ is achievement of child i in class j in school k
• $\pi_{0jk}$ is mean score of class j in school k
• $e_{ijk}$ is random “child effect”
– Classroom level model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$
• $\beta_{00k}$ is mean score for school k
• $r_{0jk}$ is random “class effect”
– School level model: $\beta_{00k} = \gamma_{000} + u_{00k}$
• $\gamma_{000}$ is grand mean score
• $u_{00k}$ is random “school effect”
3 level Model: Random Intercepts
• Children nested in classrooms, classrooms nested in schools
– Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$
• $e_{ijk}$ is random “child effect”, $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$
– Within classroom level model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$
• $r_{0jk}$ is random “class effect”, $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_\pi$
• Assume variance among classes within school is the same
– Between classroom (school): $\beta_{00k} = \gamma_{000} + \gamma_{001}\,\text{trt}_k + u_{00k}$
• $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_\beta$
Partitioning variance
• Proportion of variance within classroom
• Proportion of variance among classrooms within schools
• Proportion of variance among schools
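The three proportions follow from the variance components of the three-level random-intercepts model: each component divided by the total. A minimal sketch, assuming the child-level, class-level, and school-level variances are already estimated; the function name, argument names, and the numbers in the example are hypothetical.

```python
def partition_variance(sigma2, tau_class, tau_school):
    """Share of total outcome variance at each level of a 3-level model.

    sigma2: variance among children within classrooms,
    tau_class: variance among classrooms within schools,
    tau_school: variance among schools.
    """
    total = sigma2 + tau_class + tau_school
    return {
        "within_classroom": sigma2 / total,
        "classrooms_within_schools": tau_class / total,
        "schools": tau_school / total,
    }

# Hypothetical variance components: 60% child, 30% classroom, 10% school.
shares = partition_variance(sigma2=6.0, tau_class=3.0, tau_school=1.0)
for level, share in shares.items():
    print(level, round(share, 2))
```

The three shares always sum to 1, so a large within-classroom share implies little clustering at the higher levels.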
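3 Level HLM: level 2 longitudinal and level 3 random intercepts
• Typically: treatment randomly assigned at classroom level, children followed longitudinally (e.g., Purdue Curriculum Study)
– (within child) Level 1: $Y_{ijk} = \pi_{0jk} + \pi_{1jk}\,\text{Age}_{ijk} + e_{ijk}$
• $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$
– (between child) Level 2: $\pi_{0jk} = \beta_{00k} + r_{0jk}$; $\pi_{1jk} = \beta_{10k} + r_{1jk}$
• $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_{\pi 00}$; $E(r_{1jk}) = 0$, $\mathrm{var}(r_{1jk}) = \tau_{\pi 11}$
– (between classes) Level 3: $\beta_{00k} = \gamma_{000} + u_{00k}$; $\beta_{10k} = \gamma_{100} + u_{10k}$
• $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_{\beta 00}$; $E(u_{10k}) = 0$, $\mathrm{var}(u_{10k}) = \tau_{\beta 11}$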
Example: Purdue Curriculum Study
• Level 1 – individual growth curve
• Level 2 – classroom growth curve
• Level 3 – treatment differences in classroom growth curves
Writing          Pre            Post           Follow-up
Onsite N         199            196            79
Onsite M (sd)    5.98 (1.49)    6.70 (1.54)    6.92 (1.74)
Remote N         187            171            63
Remote M (sd)    6.01 (1.55)    7.04 (1.64)    7.48 (1.62)
Purdue Curriculum Study
Threats
• Homogeneity of variance at each level
– Nonnormal data with heavy tails
– Bad data
– Differences in variability among groups
• Normality assumption
– Examine residuals
– Robust standard errors (large n)
• Inferences with small samples
3 Level HLM: Longitudinal assessments of individuals in clustered settings