Sample Design for Group-Randomized Trials

Howard S. Bloom
Chief Social Scientist
MDRC

Prepared for the IES/NCER Summer Research Training Institute held at Northwestern University on July 27, 2010.



Today we will examine

- Sample size determinants
- Precision requirements
- Sample allocation
- Covariate adjustments
- Matching and blocking
- Subgroup analyses
- Generalizing findings for sites and blocks
- Using two-level data for three-level situations

Part I:

The Basics

Statistical properties of group-randomized impact estimators

Unbiased estimates

$Y_{ij} = \alpha + B_0 T_j + e_j + \varepsilon_{ij}$

$E(b_0) = B_0$

Less precise estimates

$\mathrm{VAR}(\varepsilon_{ij}) = \sigma^2$

$\mathrm{VAR}(e_j) = \tau^2$

$\rho = \tau^2 / (\tau^2 + \sigma^2)$

$SE(b_0)_{\text{group-randomized}} / SE(b_0)_{\text{individually randomized}} = \sqrt{1 + (n - 1)\rho}$

Design Effect (for a given total number of individuals)

[Table: design effect by intraclass correlation ($\rho$) and individuals per group (n); the cell values did not survive in this transcript.]
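As a rough illustration of the design effect, the sketch below evaluates $\sqrt{1 + (n - 1)\rho}$, the ratio of the group-randomized to the individually randomized standard error for a fixed total number of individuals. The grid of $\rho$ and n values is an assumption chosen for illustration; it is not the slide's original table.

```python
import math

def design_effect(n, rho):
    """Ratio of the group-randomized SE to the individually randomized SE."""
    return math.sqrt(1 + (n - 1) * rho)

# Illustrative grid only (assumed values, not taken from the slide).
for rho in (0.00, 0.02, 0.05, 0.10):
    row = "  ".join(f"{design_effect(n, rho):5.2f}" for n in (10, 50, 500))
    print(f"rho = {rho:.2f}: {row}")
```

With no clustering ($\rho$ = 0) the ratio is 1; it grows with both the intraclass correlation and the number of individuals per group.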

Sample design parameters

- Number of randomized groups (J)
- Number of individuals per randomized group (n)
- Proportion of groups randomized to program status (P)

Reporting precision

A minimum detectable effect (MDE) is the smallest true effect that has a “good chance” of being found to be statistically significant.

We typically define an MDE as the smallest true effect that has 80 percent power for a two-tailed test of statistical significance at the 0.05 level.

An MDE is reported in natural units whereas a minimum detectable effect size (MDES) is reported in units of standard deviations.

Minimum Detectable Effect Sizes for a Group-Randomized Design with $\rho$ = 0.05 and No Covariates

______________________________________________________________________
Randomized          Individuals per Group (n)
Groups (J)          10        50        500
______________________________________________________________________
10                  0.77      0.53      0.46
20                  0.50      0.35      0.30
40                  0.35      0.24      0.21
120                 0.20      0.14      0.12
______________________________________________________________________

Implications for sample design

It is extremely important to randomize an adequate number of groups.

It is often far less important how many individuals per group you have.

Part II

Determining required precision

When assessing how much precision is needed:

Always ask “relative to what?”
- Program benefits
- Program costs
- Existing outcome differences
- Past program performance

Effect Size Gospel According to Cohen and Lipsey

            Cohen            Lipsey
            (speculative)    (empirical)
______________________________________________
Small       0.2              0.15
Medium      0.5              0.45
Large       0.8              0.90

Five-year impacts of the Tennessee class-size experiment

Treatment: 13-17 versus 22-26 students per class

Effect sizes: 0.11 to 0.22 for reading and math

Findings are summarized from Nye, Barbara, Larry V. Hedges and Spyros Konstantopoulos (1999) “The Long-Term Effects of Small Classes: A Five-Year Follow-up of the Tennessee Class Size Experiment,” Educational Evaluation and Policy Analysis, Vol. 21, No. 2: 127-142.

Annual reading and math growth

Grade         Reading Growth    Math Growth
Transition    Effect Size       Effect Size
----------------------------------------------
K - 1         1.52              1.14
1 - 2         0.97              1.03
2 - 3         0.60              0.89
3 - 4         0.36              0.52
4 - 5         0.40              0.56
5 - 6         0.32              0.41
6 - 7         0.23              0.30
7 - 8         0.26              0.32
8 - 9         0.24              0.22
9 - 10        0.19              0.25
10 - 11       0.19              0.14
11 - 12       0.06              0.01
----------------------------------------------

Based on work in progress using documentation on the national norming samples for the CAT5, SAT9, Terra Nova CTBS, Gates MacGinitie (for reading only), MAT8, Terra Nova CAT, and SAT10. 95% confidence intervals range in reading from +/- .03 to .15 and in math from +/- .03 to .22.
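One way to read the growth benchmarks is to express a program effect as a share of a typical year's growth. The sketch below uses three reading growth values from the table; the 0.20 effect size is a hypothetical figure chosen only for illustration.

```python
# Annual reading growth in effect-size units, copied from the table above.
reading_growth = {"K - 1": 1.52, "4 - 5": 0.40, "9 - 10": 0.19}

def share_of_annual_growth(effect_size, growth):
    """Express an effect size as a fraction of one year's typical growth."""
    return effect_size / growth

hypothetical_effect = 0.20  # assumed program effect, in standard deviations
for transition, growth in reading_growth.items():
    share = share_of_annual_growth(hypothetical_effect, growth)
    print(f"Grades {transition}: {share:.2f} of a year's growth")
```

The same effect size is a small share of a year's growth in the early grades and roughly a full year's growth in high school, which is why "relative to what?" matters.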

Performance gap between “average” (50th percentile) and “weak” (10th percentile) schools

Subject and grade District I District II District III District IV

Reading

Grade 3 0.31 0.18 0.16 0.43

Grade 5 0.41 0.18 0.35 0.31

Grade 7 0.25 0.11 0.30 NA

Grade 10 0.07 0.11 NA NA

Math

Grade 3 0.29 0.25 0.19 0.41

Grade 5 0.27 0.23 0.36 0.26

Grade 7 0.20 0.15 0.23 NA

Grade 10 0.14 0.17 NA NA

Source: District I outcomes are based on ITBS scaled scores, District II on SAT 9 scaled scores, District III on MAT NCE scores, and District IV on SAT 8 NCE scores.

Demographic performance gap in reading and math: Main NAEP scores

Subject and grade   Black-White   Hispanic-White   Male-Female   Eligible-Ineligible for free/reduced-price lunch

Reading

Grade 4 -0.83 -0.77 -0.18 -0.74

Grade 8 -0.80 -0.76 -0.28 -0.66

Grade 12 -0.67 -0.53 -0.44 -0.45

Math

Grade 4 -0.99 -0.85 0.08 -0.85

Grade 8 -1.04 -0.82 0.04 -0.80

Grade 12 -0.94 -0.68 0.09 -0.72

Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2002 Reading Assessment and 2000 Mathematics Assessment.

ES Results from Randomized Studies

Achievement Measure n Mean

Elementary School 389 0.33

Standardized test (Broad) 21 0.07

Standardized test (Narrow) 181 0.23

Specialized Topic/Test 180 0.44

Middle Schools 36 0.51

High Schools 43 0.27

Part III

The ABCs of Sample Allocation

Sample allocation alternatives

Balanced allocation
- maximizes precision for a given sample size
- maximizes robustness to distributional assumptions

Unbalanced allocation
- precision erodes slowly with imbalance for a given sample size
- imbalance can facilitate a larger sample
- imbalance can facilitate randomization

Variance relationships for the program and control groups

Equal variances: when the program does not affect the outcome variance.

Unequal variances: when the program does affect the outcome variance.

MDES for equal variances without covariates

$MDES = M_{J-2} \sqrt{\frac{\rho}{P(1-P)J} + \frac{1-\rho}{P(1-P)nJ}}$

How allocation affects MDES

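$\sqrt{\frac{1}{P(1-P)}}$

$\sqrt{\frac{1}{0.5(1-0.5)}} = 2.00$

$\sqrt{\frac{1}{0.6(1-0.6)}} = 2.04$

$\sqrt{\frac{1}{0.7(1-0.7)}} = 2.18$

$\sqrt{\frac{1}{0.8(1-0.8)}} = 2.50$

$\sqrt{\frac{1}{0.9(1-0.9)}} = 3.33$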

Minimum Detectable Effect Size for Sample Allocations Given Equal Variances

Allocation    Example*    Ratio to Balanced Allocation
______________________________________________________
0.5/0.5       0.54        1.00
0.6/0.4       0.55        1.02
0.7/0.3       0.59        1.09
0.8/0.2       0.68        1.25
0.9/0.1       0.91        1.67
______________________________________________________
* Example is for n = 20, J = 10, $\rho$ = 0.05, a one-tail hypothesis test and no covariates.
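The "ratio to balanced allocation" column depends only on P, through the factor sqrt(1 / (P(1 - P))). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
import math

def allocation_factor(P):
    """Factor by which the allocation P scales the MDES: sqrt(1 / (P (1 - P)))."""
    return math.sqrt(1 / (P * (1 - P)))

balanced = allocation_factor(0.5)
for P in (0.5, 0.6, 0.7, 0.8, 0.9):
    factor = allocation_factor(P)
    print(f"P = {P:.1f}: factor = {factor:.2f}, "
          f"ratio to balanced = {factor / balanced:.2f}")
```

The ratio stays close to 1 until the allocation becomes quite lopsided, which is the sense in which precision erodes slowly with imbalance.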

Implications of unbalanced allocations with unequal variances

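$SE(b_0)_U = \sqrt{\frac{\sigma_P^2}{J_P} + \frac{\sigma_C^2}{J_C}}$

$E(se(b_0)_E) \approx \sqrt{\frac{\sigma_P^2}{J_C} + \frac{\sigma_C^2}{J_P}}$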

Implications Continued

The estimated standard error is unbiased
- when the allocation is balanced
- when the variances are equal

The estimated standard error is biased upward
- when the larger sample has the larger variance

The estimated standard error is biased downward
- when the larger sample has the smaller variance
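The direction of the bias can be checked numerically. This sketch assumes the true variance of the impact estimate is var_P / J_P + var_C / J_C, and that a pooled (equal-variance) estimate has an expected variance of roughly var_P / J_C + var_C / J_P, which is the large-sample behavior of a pooled variance estimator. The variances and group counts are hypothetical.

```python
import math

def true_se(var_p, var_c, j_p, j_c):
    """Standard error when each arm's variance is paired with its own size."""
    return math.sqrt(var_p / j_p + var_c / j_c)

def expected_pooled_se(var_p, var_c, j_p, j_c):
    """Approximate expected SE under the equal-variance assumption:
    each arm's variance ends up divided by the other arm's size."""
    return math.sqrt(var_p / j_c + var_c / j_p)

cases = {
    "balanced": (2.0, 1.0, 20, 20),
    "larger sample, larger variance": (2.0, 1.0, 30, 10),
    "larger sample, smaller variance": (1.0, 2.0, 30, 10),
}
for label, args in cases.items():
    ratio = expected_pooled_se(*args) / true_se(*args)
    print(f"{label}: estimated / true = {ratio:.2f}")
```

A ratio above 1 means the estimated standard error overstates the true one (biased upward); a ratio below 1 means it understates it.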

Interim Conclusions

Don’t use the equal variance assumption for an unbalanced allocation with many degrees of freedom.

Use a balanced allocation when there are few degrees of freedom.

References

Gail, Mitchell H., Steven D. Mark, Raymond J. Carroll, Sylvan B. Green and David Pee (1996) “On Design Considerations and Randomization-Based Inferences for Community Intervention Trials,” Statistics in Medicine 15: 1069 – 1092.

Bryk, Anthony S. and Stephen W. Raudenbush (1988) “Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations,” Psychological Bulletin, 104(3): 396 – 404.

Part IV

Using Covariates to Reduce Sample Size

Basic ideas

Goal: Reduce the number of clusters randomized

Approach: Reduce the standard error of the impact estimator by controlling for baseline covariates

Alternative covariates
- Individual-level
- Cluster-level
- Pretests
- Other characteristics

Impact Estimation with a Covariate

$y_{ij}$ = the outcome for student i from school j

$T_j$ = 1 for treatment schools and 0 for control schools

$X_j$ = a covariate for school j

$x_{ij}$ = a covariate for student i from school j

$e_j$ = a random error term for school j

$\varepsilon_{ij}$ = a random error term for student i from school j

$y_{ij} = \alpha + \beta_0 T_j + \beta_1 x_{ij} + e_j + \varepsilon_{ij}$

$y_{ij} = \alpha + \beta_0 T_j + \beta_1 X_j + e_j + \varepsilon_{ij}$

Minimum Detectable Effect Size with a Covariate

$MDES = M_{J-K} \sqrt{\frac{\rho(1 - R_2^2)}{P(1-P)J} + \frac{(1-\rho)(1 - R_1^2)}{P(1-P)nJ}}$

MDES = minimum detectable effect size

$M_{J-K}$ = a degrees-of-freedom multiplier (see note 1)

J = the total number of schools randomized

n = the number of students in a grade per school

P = the proportion of schools randomized to treatment

$\rho$ = the unconditional intraclass correlation (without a covariate)

$R_1^2$ = the proportion of variance across individuals within schools (at level 1) predicted by the covariate

$R_2^2$ = the proportion of variance across schools (at level 2) predicted by the covariate

Note 1: For 20 or more degrees of freedom $M_{J-K}$ equals 2.8 for a two-tail test and 2.5 for a one-tail test with statistical power of 0.80 and statistical significance of 0.05.
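To see how a covariate changes the MDES, the sketch below evaluates MDES = M * sqrt(rho (1 - R2_level2) / (P(1-P)J) + (1 - rho)(1 - R2_level1) / (P(1-P)nJ)) with the multiplier fixed at 2.8, the value the slide gives for a two-tail test with 20 or more degrees of freedom. The design values (40 schools, 60 students per school, rho = 0.20, school-level R-squared of 0.75) are hypothetical, as are the function and argument names.

```python
import math

def mdes_with_covariate(J, n, rho, r2_level1=0.0, r2_level2=0.0,
                        P=0.5, multiplier=2.8):
    """MDES for a two-level design with covariate adjustment.

    r2_level1: share of within-school (student-level) variance explained.
    r2_level2: share of between-school variance explained.
    """
    between = rho * (1 - r2_level2) / (P * (1 - P) * J)
    within = (1 - rho) * (1 - r2_level1) / (P * (1 - P) * n * J)
    return multiplier * math.sqrt(between + within)

# Hypothetical design values, not taken from the slides.
no_covariate = mdes_with_covariate(J=40, n=60, rho=0.20)
school_pretest = mdes_with_covariate(J=40, n=60, rho=0.20, r2_level2=0.75)
print(f"no covariate:         {no_covariate:.2f}")
print(f"school-level pretest: {school_pretest:.2f}")
```

Because the between-school term usually dominates, a covariate that explains school-level variance reduces the MDES much more than one that explains only student-level variance.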

Questions Addressed Empirically about the Predictive Power of Covariates

- School-level vs. student-level pretests
- Earlier vs. later follow-up years
- Reading vs. math
- Elementary vs. middle vs. high school
- All schools vs. low-income schools vs. low-performing schools

Empirical Analysis

- Estimate $\rho$, $R_2^2$ and $R_1^2$ from data on thousands of students from hundreds of schools, during multiple years at five urban school districts
- Summarize these estimates for reading and math in grades 3, 5, 8 and 10
- Compute implications for minimum detectable effect sizes

Estimated Parameters for Reading with a School-level Pretest Lagged One Year

___________________________________________________________________
                            School District
                A        B        C        D        E
___________________________________________________________________
Grade 3
  $\rho$        0.20     0.15     0.19     0.22     0.16
  $R_2^2$       0.31     0.77     0.74     0.51     0.75
Grade 5
  $\rho$        0.25     0.15     0.20     NA       0.12
  $R_2^2$       0.33     0.50     0.81     NA       0.70
Grade 8
  $\rho$        0.18     NA       0.23     NA       NA
  $R_2^2$       0.77     NA       0.91     NA       NA
Grade 10
  $\rho$        0.15     NA       0.29     NA       NA
  $R_2^2$       0.93     NA       0.95     NA       NA
___________________________________________________________________

Minimum Detectable Effect Sizes for Reading with a School-Level Pretest ($Y_{-1}$) or a Student-Level Pretest ($y_{-1}$) Lagged One Year

________________________________________________________________
                        Grade 3    Grade 5    Grade 8    Grade 10
________________________________________________________________
20 schools randomized
  No covariate          0.57       0.56       0.61       0.62
  $Y_{-1}$              0.37       0.38       0.24       0.16
  $y_{-1}$              0.38       0.40       0.28       0.15
40 schools randomized
  No covariate          0.39       0.38       0.42       0.42
  $Y_{-1}$              0.26       0.26       0.17       0.11
  $y_{-1}$              0.26       0.27       0.19       0.10
60 schools randomized
  No covariate          0.32       0.31       0.34       0.34
  $Y_{-1}$              0.21       0.21       0.13       0.09
  $y_{-1}$              0.21       0.22       0.15       0.08
________________________________________________________________

Key Findings

- Using a pretest improves precision dramatically.
- This improvement increases appreciably from elementary school to middle school to high school because $R_2^2$ increases.
- School-level pretests produce as much precision as do student-level pretests.
- The effect of a pretest declines somewhat as the time between it and the post-test increases.
- Adding a second pretest increases precision slightly.
- Using a pretest for a different subject increases precision substantially.
- Narrowing the sample to schools that are similar to each other does not improve precision beyond that achieved by a pretest.

Source

Bloom, Howard S., Lashawn Richburg-Hayes and Alison Rebeck Black (2007) “Using Covariates to Improve Precision for Studies that Randomize Schools to Evaluate Educational Interventions,” Educational Evaluation and Policy Analysis, 29(1): 30 – 59.

Part V

The Putative Power of Pairing

A Tail of Two Tradeoffs

(“It was the best of techniques. It was the worst of techniques.” Who the dickens said that?)

Pairing

Why match pairs?
- for face validity
- for precision

How to match pairs?
- rank order clusters by covariate
- pair clusters in rank-ordered list
- randomize clusters in each pair
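The three matching steps (rank, pair, randomize within pairs) can be sketched as follows. The school names and pretest means are hypothetical, and the function name is illustrative; with an odd number of clusters, this sketch leaves the last cluster in the ranking unassigned.

```python
import random

def pair_and_randomize(clusters, covariate, seed=None):
    """Rank clusters by a covariate, pair neighbors, randomize within pairs."""
    rng = random.Random(seed)
    ranked = sorted(clusters, key=covariate)      # step 1: rank order
    pairs = []
    for i in range(0, len(ranked) - 1, 2):        # step 2: pair adjacent clusters
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                         # step 3: coin flip within pair
        pairs.append({"program": pair[0], "control": pair[1]})
    return pairs

# Hypothetical school-level pretest means.
schools = {"A": 48.2, "B": 61.5, "C": 50.1, "D": 59.8, "E": 55.0, "F": 54.3}
pairs = pair_and_randomize(schools, schools.get, seed=1)
for pair in pairs:
    print(pair)
```

Each pair contributes exactly one program cluster and one control cluster, so the allocation is balanced by construction.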

When to pair?

When the gain in predictive power outweighs the loss of degrees of freedom

Degrees of freedom
- J - 2 without pairing
- J/2 - 1 with pairing

Deriving the Minimum Required Predictive Power of Pairing

Without pairing

$MDE(b_0)_{GR} = M_{J-2} \, SE(b_0)_{GR}$

With pairing

$MDE(b_0)_{GR} = M_{J/2-1} \sqrt{1 - R^2} \, SE(b_0)_{GR}$

Breakeven $R^2$

$R^2_{\min} = 1 - \frac{M_{J-2}^2}{M_{J/2-1}^2}$

The Minimum Required Predictive Power of Pairing

Randomized        Required Predictive
Clusters (J)      Power ($R^2_{\min}$)*
______________________________________
6                 0.52
8                 0.35
10                0.26
20                0.11
30                0.07
______________________________________
* For a two-tail test.
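The breakeven values can be approximated from the degrees-of-freedom multipliers alone, using R2_min = 1 - (M_{J-2} / M_{J/2-1})^2. The sketch below assumes a two-tailed test at the 0.05 level with 80 percent power and computes t quantiles numerically with the standard library; the helper names are hypothetical, and a library routine such as `scipy.stats.t.ppf` could replace `t_quantile`.

```python
import math
from functools import lru_cache

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """CDF for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

@lru_cache(maxsize=None)
def multiplier(df, alpha=0.05, power=0.80):
    """Two-tailed critical t value plus the t value for the desired power."""
    return t_quantile(1 - alpha / 2, df) + t_quantile(power, df)

def breakeven_r2(J):
    """Smallest R-squared at which pairing J (even) clusters pays off."""
    return 1 - (multiplier(J - 2) / multiplier(J // 2 - 1)) ** 2

for J in (6, 8, 10, 20, 30):
    print(J, round(breakeven_r2(J), 2))
```

The required predictive power falls quickly as J grows, because the penalty from halving the degrees of freedom matters most when there are few clusters.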

A few key points about blocking

- Blocking for face validity vs. blocking for precision
- Treating blocks as fixed effects vs. random effects
- Defining blocks using baseline information

Part VI

Subgroup Analyses #1: When to Emphasize Them

Confirmatory vs. Exploratory Findings

Confirmatory: Draw conclusions about the program’s effectiveness if results are
- consistent with theory and contextual factors
- statistically significant and large
- and the subgroup was pre-specified

Exploratory: Develop hypotheses for further study


Pre-specification

Before the analysis, state that conclusions about the program will be based in part on findings for this set of subgroups

Pre-specification can be based on
- Theory
- Prior evidence
- Policy relevance


Statistical significance

When should we discuss subgroup findings?
- Depends on whether there are significant differences in impacts across subgroups
- Might depend on whether impacts for the full sample are statistically significant


Part VII

Subgroup Analyses #2: Creating Subgroups

Defining Features

Creating subgroups in terms of:
- Program characteristics
- Randomized group characteristics
- Individual characteristics

Defining Subgroups by Program Characteristics

Based only on program features that were randomized

Thus one cannot use implementation quality

Defining Subgroups by Characteristics of Randomized Groups

Types of impacts
- Net impacts
- Differential impacts

Internal validity
- Only use pre-existing characteristics

Precision
- Net impact estimates are limited by reduced number of randomized groups
- Differential impact estimates are triply limited (and often need four times as many randomized groups)
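The "four times as many randomized groups" figure can be illustrated with simple variance arithmetic. The sketch assumes the variance of an impact estimate is proportional to 1 / (P(1 - P)J), that the J randomized groups split into two equal-sized subgroups, and that the two subgroup estimates are independent; it ignores the additional loss of degrees of freedom.

```python
def impact_variance(J, P=0.5, group_variance=1.0):
    """Variance of an impact estimate, proportional to 1 / (P (1 - P) J)."""
    return group_variance / (P * (1 - P) * J)

J = 40                                        # hypothetical number of groups
full_sample = impact_variance(J)
net_subgroup = impact_variance(J // 2)        # each subgroup has half the groups
differential = 2 * net_subgroup               # difference of two independent estimates
print(f"net subgroup / full sample: {net_subgroup / full_sample:.1f}")
print(f"differential / full sample: {differential / full_sample:.1f}")

# Quadrupling J brings the differential estimate back to full-sample precision.
restored = 2 * impact_variance(4 * J // 2)
print(f"differential with 4J groups / full sample: {restored / full_sample:.1f}")
```

Under these assumptions the differential impact has four times the variance of the full-sample impact, so matching full-sample precision takes four times as many randomized groups.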

Defining Subgroups by Characteristics of Individuals

Types of impacts
- Net impacts
- Differential impacts

Internal validity
- Only use pre-existing characteristics
- Only use subgroups with sample members from all randomized groups

Precision
- For net impacts: can be almost as good as for full sample
- For differential impacts: can be even better than for full sample

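Differential Impacts by Gender

[Figure: boys and girls within each randomized group of the program group and the control group. The differential impact by gender contrasts the boy-girl difference in the program group, $Y_{PB} - Y_{PG}$, with the boy-girl difference in the control group, $Y_{CB} - Y_{CG}$.]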

Part VIII

Generalizing Results from Multiple Sites and Blocks

Fixed vs. Random Effects Inference: A Vexing Issue

- Known vs. unknown populations
- Broader vs. narrower inferences
- Weaker vs. stronger precision
- Few vs. many sites or blocks

Weighting Sites and Blocks

- Implicitly through a pooled regression
- Explicitly based on
  - Number of schools
  - Number of students
- Explicitly based on precision
  - Fixed effects
  - Random effects
- Bottom line: the question addressed is what counts

Part IX

Using Two-Level Data for Three-Level Situations

The Issue

General Question: What happens when you design a study with randomized groups that comprise three levels based on data which do not account explicitly for the middle level?

Specific Example: What happens when you design a study that randomizes schools (with students clustered in classrooms in schools) based on data for students clustered in schools?

3-level vs. 2-level Variance Components

Variance Components

                                         3-Level Model                           2-Level Model
Outcomes                                 School    Class     Student    Total    School    Student    Total
Expressive vocab-spring                  19.84     32.45     306.18     358.48   38.15     321.11     359.26
Stanford 9 Total Math Scaled Score       115.14    36.40     1273.15    1424.69  131.39    1293.24    1424.63
Stanford 9 Total Reading Scaled Score    108.75    158.95    1581.86    1849.56  181.77    1666.48    1848.25

Sources: The Chicago Literacy Initiative: Making Better Early Readers study (CLIMBERs) database and the School Breakfast Pilot Project (SBPP) database.
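One way to compare the two decompositions is to convert the variance components into shares of total variance. The sketch below copies the components from the table; the function name is hypothetical, and shares are computed from the sum of the components rather than the printed totals, so they can differ from the table in the last digit.

```python
# Variance components copied from the table above.
three_level = {  # (school, class, student)
    "Expressive vocab-spring": (19.84, 32.45, 306.18),
    "Stanford 9 Total Math Scaled Score": (115.14, 36.40, 1273.15),
    "Stanford 9 Total Reading Scaled Score": (108.75, 158.95, 1581.86),
}
two_level = {  # (school, student)
    "Expressive vocab-spring": (38.15, 321.11),
    "Stanford 9 Total Math Scaled Score": (131.39, 1293.24),
    "Stanford 9 Total Reading Scaled Score": (181.77, 1666.48),
}

def variance_shares(components):
    """Each component as a share of the total variance."""
    total = sum(components)
    return [c / total for c in components]

for outcome in three_level:
    school3, class3, _ = variance_shares(three_level[outcome])
    school2, _ = variance_shares(two_level[outcome])
    print(outcome)
    print(f"  3-level: school {school3:.3f}, class {class3:.3f}")
    print(f"  2-level: school {school2:.3f}")
```

For all three outcomes the two-level school component falls between the three-level school component and the sum of the three-level school and class components, which suggests that omitting the classroom level moves part of the class variance into the school component and the rest into the student component.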

3-level vs. 2-level MDES for Original Sample

MDES

                                         3-Level Model                  2-Level Model
Outcomes                                 Unconditional   Conditional    Unconditional   Conditional
Expressive vocab-spring                  0.482           0.386          0.495           0.311
Stanford 9 Total Math Scaled Score       0.259           0.184          0.259           0.184
Stanford 9 Total Reading Scaled Score    0.261           0.148          0.264           0.150

Sources: The Chicago Literacy Initiative: Making Better Early Readers study (CLIMBERs) database and the School Breakfast Pilot Project (SBPP) database.

Further References

Bloom, Howard S. (2005) “Randomizing Groups to Evaluate Place-Based Programs,” in Howard S. Bloom, editor, Learning More From Social Experiments: Evolving Analytic Approaches (New York: Russell Sage Foundation).

Bloom, Howard S., Lashawn Richburg-Hayes and Alison Rebeck Black (2005) “Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions” (New York: MDRC).

Donner, Allan and Neil Klar (2000) Cluster Randomization Trials in Health Research (London: Arnold).

Hedges, Larry V. and Eric C. Hedberg (2006) “Intraclass Correlation Values for Planning Group Randomized Trials in Education” (Chicago: Northwestern University).

Murray, David M. (1998) Design and Analysis of Group-Randomized Trials (New York: Oxford University Press).

Raudenbush, Stephen W., Andres Martinez and Jessaca Spybrook (2005) “Strategies for Improving Precision in Group-Randomized Experiments” (University of Chicago).

Raudenbush, Stephen W. (1997) “Statistical Analysis and Optimal Design for Cluster Randomized Trials,” Psychological Methods, 2(2): 173 – 185.

Schochet, Peter Z. (2005) “Statistical Power for Random Assignment Evaluations of Education Programs” (Princeton, NJ: Mathematica Policy Research).