
Statistical Analysis Overview I: Session 2

Peg Burchinal

Frank Porter Graham

Child Development Institute,

University of North Carolina-Chapel Hill

Overview: Statistical analysis overview I-b

• Nesting and intraclass correlation

• Hierarchical Linear Models

– 2 level models

– 3 level models

Nesting

• Nesting implies violation of the linear model assumption of independence of observations

• Ignoring this dependency in the data results in inflated test statistics when observations are positively correlated

– Can lead to incorrect conclusions

Nesting and Design

• Educational data often collected in schools, classrooms, or special treatment groups

– Lack of independence among individuals -> reduction in variability

• Pre-existing similarities (i.e., students within the cluster are more similar than students who would be randomly selected)

• Shared instructional environment (i.e., variability in instruction greater across classrooms than within a classroom)

• Educational treatments often assigned to schools or classrooms

– Advantage: avoids contamination and makes the study more acceptable (often simple random assignment not possible)

– Disadvantage: analysis must take dependencies or relatedness of responses within clusters into account

Intraclass Correlation (ICC)

• For models with clustering of individuals

– “Cluster effect”: proportion of variance in the outcomes that is between clusters (compares within-cluster variance to between-cluster variance)

– Example: clustering of children in classrooms. ICC describes the proportion of variance associated with differences between classrooms

Intraclass Correlation

• Intraclass correlation (ICC) – measure of relatedness or dependence of clustered data

– Proportion of variance that is between clusters

– ICC or $\rho = \sigma^2_b / (\sigma^2_b + \sigma^2_w)$, where $\sigma^2_b$ is the between-cluster variance and $\sigma^2_w$ is the within-cluster variance

– ICC = 0: no correlation among individuals within a cluster

– ICC = 1: all responses within a cluster are identical
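
As a minimal sketch of the ratio above, the function below computes the ICC from a between-cluster and a within-cluster variance. The function name `icc` and the example variances are illustrative assumptions, not values from any study discussed here.

```python
def icc(between_var: float, within_var: float) -> float:
    """Proportion of total variance that lies between clusters."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variances must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical variances: 1.0 between classrooms, 3.0 within classrooms.
example_icc = icc(1.0, 3.0)
```

With these made-up numbers a quarter of the outcome variance lies between classrooms. The boundary cases match the slide: no between-cluster variance gives 0, and no within-cluster variance gives 1.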

Nesting, Design, and ICC

• Taking ICC into account results in less power for a given sample size – less independent information

• Effective sample size = $mk / (1 + \rho(m - 1))$; the denominator $1 + \rho(m - 1)$ is the design effect

– $m$ = number of individuals per cluster

– $k$ = number of clusters

– $\rho$ = ICC

• Effective sample size is the number of clusters ($k$) when ICC = 1 and the number of individuals ($mk$) when ICC = 0
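
The relationship between cluster size, ICC, and independent information can be sketched as follows. The function names and the example design (20 classrooms of 15 children, ICC = 0.10) are hypothetical and chosen only for illustration.

```python
def design_effect(m: int, rho: float) -> float:
    """Variance inflation from clustering: 1 + rho * (m - 1)."""
    if m < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("need m >= 1 and 0 <= rho <= 1")
    return 1.0 + rho * (m - 1)


def effective_sample_size(m: int, k: int, rho: float) -> float:
    """Total sample size m * k divided by the design effect."""
    return (m * k) / design_effect(m, rho)


# Hypothetical design: 20 classrooms of 15 children each, ICC = 0.10.
ess = effective_sample_size(m=15, k=20, rho=0.10)
```

Under these assumptions the 300 children carry roughly as much independent information as 125 unclustered children. Setting `rho` to 0 or 1 reproduces the two limiting cases stated in the slide.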

ICC and Hierarchical Linear Models

• Hierarchical linear models (HLM) implicitly take nesting into account

– Clustering of data is explicitly specified by the model

– ICC is considered when estimating standard errors, test statistics, and p-values

2 level HLM

• One level of nesting

– Longitudinal: repeated measures of an individual over time

• Typically: random intercepts and slopes to describe individual patterns of change over time

– Clusters: nesting of individuals within classes, families, therapy groups, etc.

• Typically: random intercept to describe cluster effect

2 level HLM Random-intercepts models

• Corresponds to One-way ANOVA with random effects (mixed model ANOVA)

• Example: Classrooms randomly assigned to treatment or control conditions

– All study children within a classroom in the same condition

– Post treatment outcome per child (can use pre-treatment as covariate to increase power)

– Level 1 = children in classroom

Level 2 = classroom

ICC reflects the degree of similarity among students within the classroom.

2 Level HLM: Random Intercept Model

• Level 1 – individual students within the classroom

– Unconditional model: $Y_{ij} = B_{0j} + r_{ij}$

– Conditional model: $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$

• $Y_{ij}$ = outcome for ith student in jth class

• $B_{0j}$ = intercept (e.g., mean) for jth class

• $B_1$ = coefficient for individual-level covariate, $X_{ij}$

• $r_{ij}$ = random error term for ith student in jth class, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$


• Level 2 – classrooms

– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$

– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$

• $B_{0j}$ = intercept (e.g., mean) for jth class

• $\gamma_{00}$ = grand mean in population

• $\gamma_{01}$ = treatment effect for $W_{j1}$, dummy variable indicating treatment status: -.5 if control; .5 if treatment

• $\gamma_{02}$ = coefficient for $W_{j2}$, class-level covariate

• $u_{0j}$ = random effect associated with jth classroom, $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$
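
The -.5/.5 coding of treatment status has a convenient interpretation, sketched below while ignoring the class-level covariate and the random effect. With equally weighted groups, the intercept sits midway between the two condition means and the treatment coefficient equals the treatment minus control difference. The function names and the two condition means are hypothetical.

```python
def effect_coded_parameters(control_mean: float, treatment_mean: float):
    """Intercept and treatment coefficient implied by -0.5 / +0.5 coding."""
    gamma00 = (control_mean + treatment_mean) / 2.0  # midpoint of the two means
    gamma01 = treatment_mean - control_mean          # full group difference
    return gamma00, gamma01


def predicted_class_mean(gamma00: float, gamma01: float, w: float) -> float:
    """Expected class intercept for treatment code w (-0.5 or +0.5)."""
    return gamma00 + gamma01 * w


# Hypothetical condition means: 10.0 for control, 12.0 for treatment.
gamma00, gamma01 = effect_coded_parameters(10.0, 12.0)
control_pred = predicted_class_mean(gamma00, gamma01, -0.5)
treatment_pred = predicted_class_mean(gamma00, gamma01, 0.5)
```

Plugging the two codes back in recovers the original condition means. With 0/1 coding instead, the intercept would be the mean of the group coded 0.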


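• Combined (unconditional)

– $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$

• $Y_{ij} = B_{0j} + r_{ij}$

• $B_{0j} = \gamma_{00} + u_{0j}$

• Combined (conditional)

– $Y_{ij} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + B_1 X_{ij} + u_{0j} + r_{ij}$

• $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$

• $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$

• $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(u_{0j} + r_{ij}) = \tau_{00} + \sigma^2$, where $\tau_{00} = \mathrm{var}(u_{0j})$ and $\sigma^2 = \mathrm{var}(r_{ij})$

• ICC = $\rho = \tau_{00} / (\tau_{00} + \sigma^2)$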
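
To make the variance decomposition concrete, the sketch below simulates data from the unconditional random-intercept model and recovers the ICC with a simple one-way ANOVA (method-of-moments) estimator for balanced data. This is an illustration only: HLM software typically estimates the variance components by maximum likelihood or REML, and every name and parameter value here is a hypothetical choice.

```python
import random


def simulate_classes(k, m, gamma00, tau00, sigma2, seed=0):
    """Generate k classes of m students from Y_ij = gamma00 + u_0j + r_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(k):
        u0j = rng.gauss(0.0, tau00 ** 0.5)  # class-level random effect
        data.append([gamma00 + u0j + rng.gauss(0.0, sigma2 ** 0.5)
                     for _ in range(m)])
    return data


def anova_icc(data):
    """Method-of-moments ICC estimate for balanced clustered data."""
    k, m = len(data), len(data[0])
    class_means = [sum(c) / m for c in data]
    grand_mean = sum(class_means) / k
    # Mean square between classes and mean square within classes.
    msb = m * sum((cm - grand_mean) ** 2 for cm in class_means) / (k - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(data, class_means) for y in c) / (k * (m - 1))
    tau00_hat = max((msb - msw) / m, 0.0)  # truncate negative estimates at 0
    return tau00_hat / (tau00_hat + msw)


# Hypothetical population values: tau00 = 2, sigma2 = 8, so the true ICC is 0.2.
data = simulate_classes(k=300, m=20, gamma00=50.0, tau00=2.0, sigma2=8.0, seed=42)
estimated_icc = anova_icc(data)
```

With many classes the estimate should land close to the population value of 0.2; with few classes it can vary considerably from sample to sample.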

Example: 2 level HLM Random Intercepts

• Purdue Curriculum Study (Powell & Diamond)

– Onsite or remote coaching

– 27 Head Start classes randomly assigned to onsite coaching and 25 to remote coaching

– Post-test scores on writing

– Onsite: n=196, M=6.70, SD=1.54

– Remote: n=171, M=7.05, SD=1.64

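Example: 2 level HLM Random Intercepts

• Level 1: $\text{Writing}_{ij} = B_{0j} + B_1\,\text{Writing-pre}_{ij} + r_{ij}$

– $B_1$ = .56, se = .05, p < .001

– $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = 1.67$

• Level 2: $B_{0j} = \gamma_{00} + \gamma_{01}\,\text{Onsite}_j + u_{0j}$

– $\gamma_{00}$ (intercept: remote group adjusted mean) = 3.74, se = .31

– $\gamma_{01}$ (Onsite − Remote difference) = -.37, se = .17, p = .03

– $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$

• ICC = $\tau_{00} / (\tau_{00} + 1.67)$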
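
The reported p-values can be roughly checked from the estimates and standard errors with a Wald test. The sketch below uses a standard normal reference distribution, which is an assumption; HLM software often uses a t distribution with degrees of freedom based on the number of classes, which gives slightly larger p-values. The function name is hypothetical.

```python
import math


def wald_z_test(estimate: float, se: float):
    """Return the z statistic and two-sided normal p-value for estimate / se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area under N(0, 1)
    return z, p


# Onsite - Remote difference reported in the slide: -.37 (se = .17).
z_onsite, p_onsite = wald_z_test(-0.37, 0.17)

# Pre-test writing coefficient reported in the slide: .56 (se = .05).
z_pre, p_pre = wald_z_test(0.56, 0.05)
```

Under the normal approximation the onsite coefficient gives a p-value that rounds to .03, and the pre-test coefficient gives one far below .001, in line with the values on the slide.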

2 Level HLM - Longitudinal (random-slopes and -intercepts models)

• Does NOT correspond to one-way ANOVA with random effects

• Example: Longitudinal assessment of children’s literacy skills during Pre-K years

– Level 1 = individual growth curve

– Level 2 = group growth curve

Level 1 – Longitudinal HLM

• Level 1 – individual growth curve

– Unconditional model: $Y_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + r_{ij}$

– Conditional model: $Y_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + B_2 X_{ij} + r_{ij}$

• $Y_{ij}$ = outcome for jth student on the ith occasion

• $\text{Age}_{ij}$ = age at assessment for jth student on the ith occasion

• $B_{0j}$ = intercept for jth student

• $B_{1j}$ = slope for Age for jth student

• $B_2$ = coefficient for time-varying covariate, $X_{ij}$

• $r_{ij}$ = random error term for jth student on the ith occasion, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$
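
One way to picture the level-1 model is as a separate straight line for each child. The sketch below fits such a line by ordinary least squares for a single hypothetical child. This is only an illustration of what the intercept and slope mean: HLM estimates all children's curves jointly rather than one regression at a time, and the ages and scores are invented.

```python
def fit_growth_curve(ages, scores):
    """Ordinary least squares intercept and slope for one child's scores on age."""
    n = len(ages)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two matched age/score pairs")
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (y - mean_score) for a, y in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    return intercept, slope


# Hypothetical child assessed at ages 4.0, 4.5, and 5.0 years.
ages = [4.0, 4.5, 5.0]
scores = [9.0, 12.0, 15.0]
b0, b1 = fit_growth_curve(ages, scores)
```

Here the slope is 6 points per year and the intercept is the predicted score at age 0, which is -15 and not meaningful on its own. Centering age (for example, subtracting 4.0) makes the intercept the predicted score at that age.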

Level 2 – Longitudinal HLM

• Level 2 – predicting individual trajectories

– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$

– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + \gamma_{12} W_{j2} + u_{1j}$

• $B_{0j}$ = intercept for jth student; $B_{1j}$ = slope for Age for jth student

• $\gamma_{00}$ = intercept in population; $\gamma_{10}$ = slope in population

• $\gamma_{01}$ = treatment effect on intercept for $W_{j1}$, student-level covariate

• $\gamma_{11}$ = treatment effect on slope for $W_{j1}$, student-level covariate

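Level 2 – Longitudinal HLM

• Level 2 – predicting individual trajectories

– Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$

– Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + u_{1j}$

• $u_{0j}$ = random effect for individual intercept; $u_{1j}$ = random effect for individual slope

• $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$; $E(u_{1j}) = 0$, $\mathrm{var}(u_{1j}) = \tau_{11}$; $\mathrm{cov}(u_{0j}, u_{1j}) = \tau_{10}$

• $\mathrm{var}(u_{0j}, u_{1j}) = T = \begin{pmatrix} \tau_{00} & \tau_{10} \\ \tau_{10} & \tau_{11} \end{pmatrix}$

• Level 1 and level 2 error terms are independent: $\mathrm{cov}(r_{ij}, u_{0j}) = \mathrm{cov}(r_{ij}, u_{1j}) = 0$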
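
A worked consequence of having both random intercepts and random slopes is that the variance of an observation depends on age. The derivation below is a sketch for the unconditional model. It assumes that $\tau_{00}$, $\tau_{11}$, and $\tau_{10}$ denote the intercept variance, slope variance, and intercept-slope covariance, that $\sigma^2$ denotes the level-1 variance, and that level-1 and level-2 errors are independent.

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{10}\,\mathrm{Age}_{ij}
          + u_{0j} + u_{1j}\,\mathrm{Age}_{ij} + r_{ij} \\
\mathrm{Var}(Y_{ij} \mid \mathrm{Age}_{ij})
       &= \mathrm{Var}(u_{0j}) + 2\,\mathrm{Age}_{ij}\,\mathrm{cov}(u_{0j}, u_{1j})
          + \mathrm{Age}_{ij}^{2}\,\mathrm{Var}(u_{1j}) + \mathrm{Var}(r_{ij}) \\
       &= \tau_{00} + 2\,\mathrm{Age}_{ij}\,\tau_{10}
          + \mathrm{Age}_{ij}^{2}\,\tau_{11} + \sigma^{2}
\end{aligned}
```

Unlike the random-intercept model, there is no single ICC here: the share of variance attributable to differences among individuals changes with age and with how age is centered.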

Example – Longitudinal HLM

• Purdue Curriculum Study (Powell & Diamond)

– Level 1 – estimating individual growth curves for children in one treatment condition (Remote)

– Level 2 – estimating population growth curves for Remote condition

Blending   Pre           Post           Follow-up
N          187           171            63
M (sd)     9.48 (5.34)   13.75 (4.57)   15.14 (4.60)

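Example

• Level 1: $\text{blending}_{ij} = B_{0j} + B_{1j}\,\text{Age}_{ij} + r_{ij}$

• Level 2: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$

Estimated results

• Intercept: $\gamma_{00}$ = 11.86 (se=.48), $\tau_{00}$ = 10.03**

• Season: $\gamma_{01}$ = 2.43* (se=.70)

• Slope: $\gamma_{10}$ = 1.51* (se=.60), $\tau_{11}$ = 4.24**, $\tau_{10}$ = -1.45**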
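
The covariance between random intercepts and slopes is easier to interpret as a correlation. The sketch below assumes that the three starred values in the estimated results are the intercept variance (10.03), the slope variance (4.24), and the intercept-slope covariance (-1.45); that reading is an assumption, because the symbols were lost from the slide. The function name is hypothetical.

```python
import math


def random_effect_correlation(var_intercept: float, var_slope: float,
                              covariance: float) -> float:
    """Correlation between random intercepts and random slopes."""
    if var_intercept <= 0 or var_slope <= 0:
        raise ValueError("variances must be positive")
    return covariance / math.sqrt(var_intercept * var_slope)


# Values read from the estimated results, under the assumption stated above.
corr = random_effect_correlation(10.03, 4.24, -1.45)
```

Under that reading the correlation is about -0.22, which would mean that children with higher intercepts tend to show somewhat slower growth.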

3 level HLM

• 2 levels of nesting

• Examples

– Longitudinal assessments of children in randomly assigned classrooms

• Level 1 – child level data

• Level 2 – child’s growth curve

• Level 3 – classroom level data

– Two levels of nesting such as children nested in classrooms that are nested in schools

• Level 1 – child level data

• Level 2 – classroom level data

• Level 3 – school level data

3 level Model - Random Intercepts

• Children nested in classrooms, classrooms nested in schools

– Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$

• $Y_{ijk}$ is achievement of child i in class j in school k

• $\pi_{0jk}$ is mean score of class j in school k

• $e_{ijk}$ is random “child effect”

– Classroom level model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$

• $\beta_{00k}$ is mean score for school k

• $r_{0jk}$ is random “class effect”

– School level model: $\beta_{00k} = \gamma_{000} + u_{00k}$

• $\gamma_{000}$ is grand mean score

• $u_{00k}$ is random “school effect”

3 level Model - Random Intercepts

• Children nested in classrooms, classrooms nested in schools

– Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$

• $e_{ijk}$ is random “child effect”, $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$

– Within classroom level model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$

• $r_{0jk}$ is random “class effect”, $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_{\pi}$

• Assume variance among classes within school is the same

– Between classroom (school): $\beta_{00k} = \gamma_{000} + \gamma_{001}\,\text{trt}_k + u_{00k}$

• $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_{\beta}$

Partitioning variance

• Proportion of variance within classroom

• Proportion of variance among classrooms within schools

• Proportion of variance among schools
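
The three proportions can be sketched as each variance component divided by the total. The code below assumes the 3-level random-intercepts model with a child-level variance, a classroom-within-school variance, and a school-level variance. The function name, dictionary keys, and example values are hypothetical.

```python
def partition_variance(sigma2: float, tau_class: float, tau_school: float) -> dict:
    """Share of total outcome variance at the child, classroom, and school levels."""
    components = (sigma2, tau_class, tau_school)
    if any(c < 0 for c in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance must be positive")
    return {
        "within_classroom": sigma2 / total,              # among children in a class
        "classrooms_within_schools": tau_class / total,  # among classes in a school
        "schools": tau_school / total,                   # among schools
    }


# Hypothetical variance components: 6.0 (child), 3.0 (classroom), 1.0 (school).
shares = partition_variance(sigma2=6.0, tau_class=3.0, tau_school=1.0)
```

With these invented values, 60% of the variance lies within classrooms, 30% among classrooms within schools, and 10% among schools. The three shares always sum to 1.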

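3 Level HLM – level 2 longitudinal and level 3 random intercepts

• Typically – treatment randomly assigned at classroom level, children followed longitudinally (e.g., Purdue Curriculum Study)

– (within child) Level 1: $Y_{ijk} = \pi_{0jk} + \pi_{1jk}\,\text{Age}_{ijk} + e_{ijk}$

• $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$

– (between child) Level 2: $\pi_{0jk} = \beta_{00k} + r_{0jk}$; $\pi_{1jk} = \beta_{10k} + r_{1jk}$

• $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_{\pi 00}$; $E(r_{1jk}) = 0$, $\mathrm{var}(r_{1jk}) = \tau_{\pi 11}$

– (between classes) Level 3: $\beta_{00k} = \gamma_{000} + u_{00k}$; $\beta_{10k} = \gamma_{100} + u_{10k}$

• $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_{\beta 00}$; $E(u_{10k}) = 0$, $\mathrm{var}(u_{10k}) = \tau_{\beta 11}$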
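
Substituting the level-3 equations into level 2, and level 2 into level 1, gives a single combined equation. The sketch below assumes the usual notation for the symbols lost from the slide: $\pi$ for child-level coefficients, $\beta$ for class-level coefficients, $\gamma$ for fixed effects, and $e_{ijk}$ for the within-child error.

```latex
\begin{aligned}
Y_{ijk} &= \pi_{0jk} + \pi_{1jk}\,\mathrm{Age}_{ijk} + e_{ijk} \\
        &= (\beta_{00k} + r_{0jk}) + (\beta_{10k} + r_{1jk})\,\mathrm{Age}_{ijk} + e_{ijk} \\
        &= \underbrace{\gamma_{000} + \gamma_{100}\,\mathrm{Age}_{ijk}}_{\text{fixed part}}
           + \underbrace{u_{00k} + u_{10k}\,\mathrm{Age}_{ijk}}_{\text{class effects}}
           + \underbrace{r_{0jk} + r_{1jk}\,\mathrm{Age}_{ijk}}_{\text{child effects}}
           + e_{ijk}
\end{aligned}
```

The fixed part is the average growth curve, while each class and each child deviates from it in both starting level and rate of change.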

Example: Purdue Curriculum Study

• Level 1 – individual growth curve

• Level 2 – classroom growth curve

• Level 3 – treatment differences in classroom growth curves

Writing           Pre           Post          Follow-up
Onsite   N        199           196           79
         M (sd)   5.98 (1.49)   6.70 (1.54)   6.92 (1.74)
Remote   N        187           171           63
         M (sd)   6.01 (1.55)   7.04 (1.64)   7.48 (1.62)

Purdue Curriculum Study

Threats

• Homogeneity of variance – at each level

– Nonnormal data with heavy tails

– Bad data

– Differences in variability among groups

• Normality assumption

– Examine residuals

– Robust standard error (large n)

• Inferences with small samples

3 Level HLM: Longitudinal assessments of individuals in clustered settings
