Sample Design for Group-Randomized Trials
Howard S. Bloom
Chief Social Scientist
MDRC
Prepared for the IES/NCER Summer Research Training Institute held at Northwestern University on July 27, 2010.
Today we will examine
Sample size determinants
Precision requirements
Sample allocation
Covariate adjustments
Matching and blocking
Subgroup analyses
Generalizing findings for sites and blocks
Using two-level data for three-level situations
Statistical properties of group-randomized impact estimators
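Unbiased estimates
  $Y_{ij} = \alpha + B_0 T_j + e_j + \varepsilon_{ij}$
  $E(b_0) = B_0$
Less precise estimates
  $VAR(\varepsilon_{ij}) = \sigma^2$
  $VAR(e_j) = \tau^2$
  $\rho = \tau^2 / (\tau^2 + \sigma^2)$
  Ratio of the standard error of $b_0$ under group randomization to its standard error under individual randomization: $\sqrt{1 + (n-1)\rho}$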
Design Effect (for a given total number of individuals)
[Table: design effect by intraclass correlation ($\rho$) and individuals per group (n)]
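The design effect can be sketched numerically as follows. This is a minimal illustration of the expression $\sqrt{1 + (n-1)\rho}$; the function name `design_effect` and the grid of $\rho$ and n values are assumptions chosen for illustration, not the values from the original slide.

```python
import math


def design_effect(n, rho):
    """Ratio of the standard error under group randomization to the
    standard error under individual randomization, for the same total
    number of individuals: sqrt(1 + (n - 1) * rho)."""
    return math.sqrt(1.0 + (n - 1) * rho)


# Illustrative grid only (assumed values, not the slide's table).
for rho in (0.00, 0.05, 0.10):
    row = [round(design_effect(n, rho), 2) for n in (10, 50, 500)]
    print(f"rho={rho:.2f}: {row}")
```

With no clustering ($\rho = 0$) the ratio is 1; it grows with both n and $\rho$, which is why even a small intraclass correlation matters when groups are large.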
Sample design parameters
Number of randomized groups (J)
Number of individuals per randomized group (n)
Proportion of groups randomized to program status (P)
Reporting precision
A minimum detectable effect (MDE) is the smallest true effect that has a “good chance” of being found to be statistically significant.
We typically define an MDE as the smallest true effect that has 80 percent power for a two-tailed test of statistical significance at the 0.05 level.
An MDE is reported in natural units whereas a minimum detectable effect size (MDES) is reported in units of standard deviations.
Minimum Detectable Effect Sizes for a Group-Randomized Design with $\rho$ = 0.05 and No Covariates
Randomized      Individuals per Group (n)
Groups (J)      10      50      500
10              0.77    0.53    0.46
20              0.50    0.35    0.30
40              0.35    0.24    0.21
120             0.20    0.14    0.12
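A rough way to reproduce the pattern in this table is sketched below. It assumes the no-covariate form of the MDES expression, $M \sqrt{\rho/(P(1-P)J) + (1-\rho)/(P(1-P)Jn)}$, and uses the approximate multiplier of 2.8 (two-tail test, 80 percent power, 20 or more degrees of freedom). The function name and defaults are hypothetical. For small J the exact multiplier is larger than 2.8, so this sketch understates the MDES in the first rows of the table.

```python
import math


def mdes_no_covariate(J, n, rho, P=0.5, multiplier=2.8):
    """Approximate MDES for a group-randomized design without covariates.

    multiplier=2.8 is the large-degrees-of-freedom approximation for a
    two-tail test at 0.05 with 80 percent power (an assumption here).
    """
    between = rho / (P * (1 - P) * J)            # group-level variance term
    within = (1 - rho) / (P * (1 - P) * J * n)   # individual-level variance term
    return multiplier * math.sqrt(between + within)


for J in (40, 120):
    print(J, [round(mdes_no_covariate(J, n, 0.05), 3) for n in (10, 50, 500)])
```

Doubling the number of groups lowers the MDES more than a tenfold increase in individuals per group, which is the point the next slide draws out.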
Implications for sample design
It is extremely important to randomize an adequate number of groups.
It is often far less important how many individuals per group you have.
When assessing how much precision is needed:
Always ask “relative to what?”
  Program benefits
  Program costs
  Existing outcome differences
  Past program performance
Effect Size Gospel According to Cohen and Lipsey
            Cohen            Lipsey
            (speculative)    (empirical)
Small       0.2              0.15
Medium      0.5              0.45
Large       0.8              0.90
Five-year impacts of the Tennessee class-size experiment
Treatment: 13-17 versus 22-26 students per class
Effect sizes: 0.11 to 0.22 for reading and math
Findings are summarized from Nye, Barbara, Larry V. Hedges and Spyros Konstantopoulos (1999) “The Long-Term Effects of Small Classes: A Five-Year Follow-up of the Tennessee Class Size Experiment,” Educational Evaluation and Policy Analysis, Vol. 21, No. 2: 127-142.
Annual reading and math growth
Grade         Reading Growth    Math Growth
Transition    Effect Size       Effect Size
K - 1         1.52              1.14
1 - 2         0.97              1.03
2 - 3         0.60              0.89
3 - 4         0.36              0.52
4 - 5         0.40              0.56
5 - 6         0.32              0.41
6 - 7         0.23              0.30
7 - 8         0.26              0.32
8 - 9         0.24              0.22
9 - 10        0.19              0.25
10 - 11       0.19              0.14
11 - 12       0.06              0.01
Based on work in progress using documentation on the national norming samples for the CAT5, SAT9, Terra Nova CTBS, Gates MacGinitie (for reading only), MAT8, Terra Nova CAT, and SAT10. 95% confidence intervals range in reading from +/- .03 to .15 and in math from +/- .03 to .22.
Performance gap between “average” (50th percentile) and “weak” (10th percentile) schools
Subject and grade District I District II District III District IV
Reading
Grade 3 0.31 0.18 0.16 0.43
Grade 5 0.41 0.18 0.35 0.31
Grade 7 0.25 0.11 0.30 NA
Grade 10 0.07 0.11 NA NA
Math
Grade 3 0.29 0.25 0.19 0.41
Grade 5 0.27 0.23 0.36 0.26
Grade 7 0.20 0.15 0.23 NA
Grade 10 0.14 0.17 NA NA
Source: District I outcomes are based on ITBS scaled scores, District II on SAT 9 scaled scores, District III on MAT NCE scores, and District IV on SAT 8 NCE scores.
Demographic performance gap in reading and math: Main NAEP scores
Subject and grade Black-White Hispanic-White Male-Female Eligible-Ineligible for free/reduced-price lunch
Reading
Grade 4 -0.83 -0.77 -0.18 -0.74
Grade 8 -0.80 -0.76 -0.28 -0.66
Grade 12 -0.67 -0.53 -0.44 -0.45
Math
Grade 4 -0.99 -0.85 0.08 -0.85
Grade 8 -1.04 -0.82 0.04 -0.80
Grade 12 -0.94 -0.68 0.09 -0.72
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2002 Reading Assessment and 2000 Mathematics Assessment.
ES Results from Randomized Studies
Achievement Measure n Mean
Elementary School 389 0.33
Standardized test (Broad) 21 0.07
Standardized test (Narrow) 181 0.23
Specialized Topic/Test 180 0.44
Middle Schools 36 0.51
High Schools 43 0.27
Sample allocation alternatives
Balanced allocation
  maximizes precision for a given sample size
  maximizes robustness to distributional assumptions
Unbalanced allocation
  precision erodes slowly with imbalance for a given sample size
  imbalance can facilitate a larger sample
  imbalance can facilitate randomization
Variance relationships for the program and control groups
Equal variances: when the program does not affect the outcome variance.
Unequal variances: when the program does affect the outcome variance.
How allocation affects MDES
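The MDES is proportional to $\sqrt{1 / (P(1-P))}$:
  P = 0.5: $\sqrt{1 / (0.5(1-0.5))} = 2.00$
  P = 0.6: $\sqrt{1 / (0.6(1-0.6))} = 2.04$
  P = 0.7: $\sqrt{1 / (0.7(1-0.7))} = 2.18$
  P = 0.8: $\sqrt{1 / (0.8(1-0.8))} = 2.50$
  P = 0.9: $\sqrt{1 / (0.9(1-0.9))} = 3.33$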
Minimum Detectable Effect Size for Sample Allocations Given Equal Variances
Allocation    Example*    Ratio to Balanced Allocation
0.5/0.5       0.54        1.00
0.6/0.4       0.55        1.02
0.7/0.3       0.59        1.09
0.8/0.2       0.68        1.25
0.9/0.1       0.91        1.67
* Example is for n = 20, J = 10, $\rho$ = 0.05, a one-tail hypothesis test and no covariates.
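The allocation factors and ratios above can be checked with a few lines of arithmetic. The sketch below only evaluates $\sqrt{1/(P(1-P))}$ and its ratio to the balanced case; the function name `allocation_factor` is a hypothetical helper.

```python
import math


def allocation_factor(P):
    """Factor by which the MDES scales with the allocation: sqrt(1 / (P * (1 - P)))."""
    return math.sqrt(1.0 / (P * (1.0 - P)))


balanced = allocation_factor(0.5)
for P in (0.5, 0.6, 0.7, 0.8, 0.9):
    factor = allocation_factor(P)
    print(f"P={P:.1f}  factor={factor:.2f}  ratio to balanced={factor / balanced:.2f}")
```

The ratio stays within about 10 percent of the balanced case up to a 0.7/0.3 split, which is the sense in which precision erodes slowly with imbalance.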
Implications of unbalanced allocations with unequal variances
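True standard error when the variances are unequal:
  $SE(b_0)_U = \sqrt{\sigma_P^2 / J_P + \sigma_C^2 / J_C}$
Expected value of the standard error estimated under the equal-variance assumption:
  $E(se(b_0))_E \approx \sqrt{\sigma_P^2 / J_C + \sigma_C^2 / J_P}$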
Implications Continued
The estimated standard error is unbiased
  When the allocation is balanced
  When the variances are equal
The estimated standard error is biased upward
  When the larger sample has the larger variance
The estimated standard error is biased downward
  When the larger sample has the smaller variance
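The direction of the bias can be illustrated with a small numerical sketch. It assumes the large-sample approximation in which the true standard error is $\sqrt{\sigma_P^2/J_P + \sigma_C^2/J_C}$ and the expected equal-variance estimate swaps the denominators, $\sqrt{\sigma_P^2/J_C + \sigma_C^2/J_P}$. The function names and the example variances and group counts are hypothetical.

```python
import math


def true_se(var_p, var_c, J_p, J_c):
    """True standard error of the impact estimate with unequal variances."""
    return math.sqrt(var_p / J_p + var_c / J_c)


def expected_equal_variance_se(var_p, var_c, J_p, J_c):
    """Approximate expected standard error when equal variances are assumed:
    each variance ends up divided by the other group's sample size."""
    return math.sqrt(var_p / J_c + var_c / J_p)


# Hypothetical unbalanced design: 40 program groups, 10 control groups.
for var_p, var_c in ((2.0, 1.0), (1.0, 2.0), (1.0, 1.0)):
    t = true_se(var_p, var_c, 40, 10)
    e = expected_equal_variance_se(var_p, var_c, 40, 10)
    print(f"var_P={var_p} var_C={var_c}: true={t:.3f} estimated={e:.3f}")
```

When the larger (program) sample has the larger variance, the estimated standard error exceeds the true one; when it has the smaller variance, the estimate falls short; with equal variances the two agree.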
Interim Conclusions
Don’t use the equal variance assumption for an unbalanced allocation with many degrees of freedom.
Use a balanced allocation when there are few degrees of freedom.
References
Gail, Mitchell H., Steven D. Mark, Raymond J. Carroll, Sylvan B. Green and David Pee (1996) “On Design Considerations and Randomization-Based Inferences for Community Intervention Trials,” Statistics in Medicine 15: 1069-1092.
Bryk, Anthony S. and Stephen W. Raudenbush (1988) “Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations,” Psychological Bulletin, 104(3): 396-404.
Basic ideas
Goal: Reduce the number of clusters randomized
Approach: Reduce the standard error of the impact estimator by controlling for baseline covariates
Alternative covariates
  Individual-level
  Cluster-level
  Pretests
  Other characteristics
Impact Estimation with a Covariate
$y_{ij}$ = the outcome for student i from school j
$T_j$ = 1 for treatment schools and 0 for control schools
$X_j$ = a covariate for school j
$x_{ij}$ = a covariate for student i from school j
$e_j$ = a random error term for school j
$\varepsilon_{ij}$ = a random error term for student i from school j
With a student-level covariate: $y_{ij} = \alpha + \beta_0 T_j + \beta_1 x_{ij} + e_j + \varepsilon_{ij}$
With a school-level covariate: $y_{ij} = \alpha + \beta_0 T_j + \beta_1 X_j + e_j + \varepsilon_{ij}$
Minimum Detectable Effect Size with a Covariate
$MDES = M_{J-K} \sqrt{ \frac{\rho (1 - R_2^2)}{P(1-P)J} + \frac{(1-\rho)(1 - R_1^2)}{P(1-P)Jn} }$
MDES = minimum detectable effect size
$M_{J-K}$ = a degrees-of-freedom multiplier (1)
J = the total number of schools randomized
n = the number of students in a grade per school
P = the proportion of schools randomized to treatment
$\rho$ = the unconditional intraclass correlation (without a covariate)
$R_1^2$ = the proportion of variance across individuals within schools (at level 1) predicted by the covariate
$R_2^2$ = the proportion of variance across schools (at level 2) predicted by the covariate
(1) For 20 or more degrees of freedom, $M_{J-K}$ equals approximately 2.8 for a two-tail test and 2.5 for a one-tail test with statistical power of 0.80 and statistical significance of 0.05.
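The MDES expression with a covariate, $M_{J-K}\sqrt{\rho(1-R_2^2)/(P(1-P)J) + (1-\rho)(1-R_1^2)/(P(1-P)Jn)}$, can be turned into a small calculator. The sketch below is illustrative only: the function name, the default multiplier of 2.8 (the approximation for 20 or more degrees of freedom, two-tail), and the example inputs (40 schools, 60 students per school, $\rho$ = 0.20, $R_2^2$ = 0.75) are assumptions, not figures from the empirical analysis.

```python
import math


def mdes_with_covariate(J, n, rho, r2_level1=0.0, r2_level2=0.0,
                        P=0.5, multiplier=2.8):
    """Approximate MDES for a design that randomizes J schools of n students.

    r2_level1: share of within-school (level 1) variance explained by the covariate.
    r2_level2: share of between-school (level 2) variance explained by the covariate.
    multiplier: degrees-of-freedom multiplier; 2.8 is an approximation.
    """
    denom = P * (1 - P) * J
    between = rho * (1 - r2_level2) / denom
    within = (1 - rho) * (1 - r2_level1) / (denom * n)
    return multiplier * math.sqrt(between + within)


# Hypothetical inputs, for illustration only.
no_cov = mdes_with_covariate(J=40, n=60, rho=0.20)
school_pretest = mdes_with_covariate(J=40, n=60, rho=0.20, r2_level2=0.75)
print(f"no covariate: {no_cov:.2f}, school-level pretest: {school_pretest:.2f}")
```

A school-level pretest enters only through `r2_level2`, since a school-level covariate cannot explain variation among students within the same school.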
Questions Addressed Empirically about the Predictive Power of Covariates
School-level vs. student-level pretests
Earlier vs. later follow-up years
Reading vs. math
Elementary vs. middle vs. high school
All schools vs. low-income schools vs. low-performing schools
Empirical Analysis
Estimate $\rho$, $R_2^2$ and $R_1^2$ from data on thousands of students from hundreds of schools, during multiple years at five urban school districts
Summarize these estimates for reading and math in grades 3, 5, 8 and 10
Compute implications for minimum detectable effect sizes
Estimated Parameters for Reading with a School-level Pretest Lagged One Year
                      School District
                      A       B       C       D       E
Grade 3    $\rho$     0.20    0.15    0.19    0.22    0.16
           $R_2^2$    0.31    0.77    0.74    0.51    0.75
Grade 5    $\rho$     0.25    0.15    0.20    NA      0.12
           $R_2^2$    0.33    0.50    0.81    NA      0.70
Grade 8    $\rho$     0.18    NA      0.23    NA      NA
           $R_2^2$    0.77    NA      0.91    NA      NA
Grade 10   $\rho$     0.15    NA      0.29    NA      NA
           $R_2^2$    0.93    NA      0.95    NA      NA
Minimum Detectable Effect Sizes for Reading with a School-Level Pretest ($Y_{-1}$) or a Student-Level Pretest ($y_{-1}$) Lagged One Year
                          Grade 3    Grade 5    Grade 8    Grade 10
20 schools randomized
  No covariate            0.57       0.56       0.61       0.62
  $Y_{-1}$                0.37       0.38       0.24       0.16
  $y_{-1}$                0.38       0.40       0.28       0.15
40 schools randomized
  No covariate            0.39       0.38       0.42       0.42
  $Y_{-1}$                0.26       0.26       0.17       0.11
  $y_{-1}$                0.26       0.27       0.19       0.10
60 schools randomized
  No covariate            0.32       0.31       0.34       0.34
  $Y_{-1}$                0.21       0.21       0.13       0.09
  $y_{-1}$                0.21       0.22       0.15       0.08
Key Findings
Using a pretest improves precision dramatically.
This improvement increases appreciably from elementary school to middle school to high school because $R_2^2$ increases.
School-level pretests produce as much precision as do student-level pretests.
The effect of a pretest declines somewhat as the time between it and the post-test increases.
Adding a second pretest increases precision slightly.
Using a pretest for a different subject increases precision substantially.
Narrowing the sample to schools that are similar to each other does not improve precision beyond that achieved by a pretest.
Source
Bloom, Howard S., Lashawn Richburg-Hayes and Alison Rebeck Black (2007) “Using Covariates to Improve Precision for Studies that Randomize Schools to Evaluate Educational Interventions,” Educational Evaluation and Policy Analysis, 29(1): 30-59.
Part V
The Putative Power of Pairing
A Tail of Two Tradeoffs
(“It was the best of techniques. It was the worst of techniques.” Who the dickens said that?)
Pairing
Why match pairs?
  for face validity
  for precision
How to match pairs?
  rank order clusters by covariate
  pair clusters in rank-ordered list
  randomize clusters in each pair
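The three matching steps (rank order, pair, randomize within pair) can be sketched as below. This is a minimal illustration, not a prescribed procedure: the function name, the dictionary-based inputs, the "program"/"control" labels, and the requirement of an even number of clusters are all assumptions made for the example.

```python
import random


def pair_and_randomize(clusters, covariate, seed=None):
    """Rank clusters by a baseline covariate, pair adjacent clusters,
    and randomly assign one member of each pair to the program.

    clusters: list of cluster identifiers.
    covariate: dict mapping each identifier to its baseline covariate value.
    """
    if len(clusters) % 2 != 0:
        raise ValueError("an even number of clusters is needed to form pairs")
    rng = random.Random(seed)
    ranked = sorted(clusters, key=lambda c: covariate[c])   # step 1: rank order
    assignment = {}
    for k in range(0, len(ranked), 2):                      # step 2: adjacent pairs
        pair = [ranked[k], ranked[k + 1]]
        rng.shuffle(pair)                                   # step 3: randomize in pair
        assignment[pair[0]] = "program"
        assignment[pair[1]] = "control"
    return assignment


# Hypothetical schools and pretest means.
pretest = {"A": 10, "B": 50, "C": 20, "D": 40, "E": 30, "F": 60}
print(pair_and_randomize(list(pretest), pretest, seed=1))
```

Each pair contributes exactly one program cluster and one control cluster, so the two groups are balanced on the matching covariate by construction.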
When to pair?
When the gain in predictive power outweighs the loss of degrees of freedom
Degrees of freedom
  J - 2 without pairing
  J/2 - 1 with pairing
Deriving the Minimum Required Predictive Power of Pairing
Without pairing
  $MDE(b_0)_{GR} = M_{J-2} \, SE(b_0)_{GR}$
With pairing
  $MDE(b_0)_{GR} = M_{J/2-1} \sqrt{1 - R^2} \, SE(b_0)_{GR}$
Breakeven $R^2$
  $R^2_{min} = 1 - \frac{M^2_{J-2}}{M^2_{J/2-1}}$
The Minimum Required Predictive Power of Pairing
Randomized      Required Predictive
Clusters (J)    Power ($R^2_{min}$)*
6               0.52
8               0.35
10              0.26
20              0.11
30              0.07
* For a two-tail test.
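The breakeven values can be recomputed from $R^2_{min} = 1 - M^2_{J-2}/M^2_{J/2-1}$. The sketch below assumes the usual construction of the multiplier as the sum of the two-tail 0.05 critical t value and the t value for 80 percent power, and hard-codes standard t-table entries for the degrees of freedom needed; the names `T_VALUES`, `multiplier` and `breakeven_r2` are hypothetical.

```python
# Standard t-table entries by degrees of freedom:
# (critical value for a two-tail test at 0.05, value for a one-tail area of 0.20).
T_VALUES = {
    2: (4.303, 1.061), 3: (3.182, 0.978), 4: (2.776, 0.941),
    6: (2.447, 0.906), 8: (2.306, 0.889), 9: (2.262, 0.883),
    14: (2.145, 0.868), 18: (2.101, 0.862), 28: (2.048, 0.855),
}


def multiplier(df):
    """Degrees-of-freedom multiplier M for a two-tail test with 80 percent power
    (assumed here to be the sum of the two t values)."""
    t_alpha, t_power = T_VALUES[df]
    return t_alpha + t_power


def breakeven_r2(J):
    """Minimum R-squared at which pairing J clusters does not raise the MDE."""
    m_unpaired = multiplier(J - 2)       # J - 2 degrees of freedom without pairing
    m_paired = multiplier(J // 2 - 1)    # J/2 - 1 degrees of freedom with pairing
    return 1.0 - (m_unpaired / m_paired) ** 2


for J in (6, 8, 10, 20, 30):
    print(J, round(breakeven_r2(J), 2))
```

The required predictive power falls quickly as J grows because the degrees-of-freedom penalty from pairing shrinks.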
A few key points about blocking
Blocking for face validity vs. blocking for precision
Treating blocks as fixed effects vs. random effects
Defining blocks using baseline information
Confirmatory vs. Exploratory Findings
Confirmatory: Draw conclusions about the program’s effectiveness if results are
  Consistent with theory and contextual factors
  Statistically significant and large
  And subgroup was pre-specified
Exploratory: Develop hypotheses for further study
Pre-specification
Before the analysis, state that conclusions about the program will be based in part on findings for this set of subgroups
Pre-specification can be based on
  Theory
  Prior evidence
  Policy relevance
Statistical significance
When should we discuss subgroup findings?
  Depends on whether there are significant differences in impacts across subgroups
  Might depend on whether impacts for the full sample are statistically significant
Defining Features
Creating subgroups in terms of:
  Program characteristics
  Randomized group characteristics
  Individual characteristics
Defining Subgroups by Program Characteristics
Based only on program features that were randomized
Thus one cannot use implementation quality
Defining Subgroups by Characteristics of Randomized Groups
Types of impacts
  Net impacts
  Differential impacts
Internal validity
  Only use pre-existing characteristics
Precision
  Net impact estimates are limited by the reduced number of randomized groups
  Differential impact estimates are triply limited (and often need four times as many randomized groups)
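The "four times as many randomized groups" figure can be motivated with a simplified calculation. The sketch below assumes that the randomized groups are split into two subgroups with the same outcome variance and the same program/control allocation, and it ignores the loss of degrees of freedom; under those assumptions the variance of an impact estimate is inversely proportional to the number of groups it uses. The function names are hypothetical.

```python
import math


def se_inflation_net(share):
    """Standard error of a subgroup's net impact relative to the full-sample
    standard error, when the subgroup holds `share` of the randomized groups."""
    return math.sqrt(1.0 / share)


def se_inflation_differential(share):
    """Standard error of the difference between the two subgroups' impacts,
    relative to the full-sample standard error (the two variances add)."""
    return math.sqrt(1.0 / share + 1.0 / (1.0 - share))


share = 0.5  # two equal-size subgroups of randomized groups
net = se_inflation_net(share)
diff = se_inflation_differential(share)
print(f"net impact SE x{net:.2f}, differential impact SE x{diff:.2f}")
print(f"groups needed to restore full-sample precision: x{diff ** 2:.0f}")
```

With an even split, the differential impact has twice the full-sample standard error, so restoring the original precision takes four times as many randomized groups.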
Defining Subgroups by Characteristics of Individuals
Types of impacts
  Net impacts
  Differential impacts
Internal validity
  Only use pre-existing characteristics
  Only use subgroups with sample members from all randomized groups
Precision
  For net impacts: can be almost as good as for full sample
  For differential impacts: can be even better than for full sample
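Differential Impacts by Gender
[Figure: boys and girls within each randomized group of the program group and of the control group; the boy-girl outcome difference in the program group (YPB - YPG) is compared with the boy-girl outcome difference in the control group (YCB - YCG).]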
Part VIII
Generalizing Results from Multiple Sites and Blocks
Fixed vs. Random Effects Inference: A Vexing Issue
Known vs. unknown populations
Broader vs. narrower inferences
Weaker vs. stronger precision
Few vs. many sites or blocks
Weighting Sites and Blocks
Implicitly through a pooled regression
Explicitly based on
  Number of schools
  Number of students
Explicitly based on precision
  Fixed effects
  Random effects
Bottom line: the question addressed is what counts
Part IX
Using Two-Level Data for Three-Level Situations
The Issue
General Question: What happens when you design a study with randomized groups that comprise three levels based on data which do not account explicitly for the middle level?
Specific Example: What happens when you design a study that randomizes schools (with students clustered in classrooms in schools) based on data for students clustered in schools?
3-level vs. 2-level Variance Components
                                         3-Level Model                         2-Level Model
Outcomes                                 School   Class    Student   Total     School   Student   Total
Expressive vocab-spring                  19.84    32.45    306.18    358.48    38.15    321.11    359.26
Stanford 9 Total Math Scaled Score       115.14   36.40    1273.15   1424.69   131.39   1293.24   1424.63
Stanford 9 Total Reading Scaled Score    108.75   158.95   1581.86   1849.56   181.77   1666.48   1848.25
Sources: The Chicago Literacy Initiative: Making Better Early Readers study (CLIMBERs) database and the School Breakfast Pilot Project (SBPP) database.
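One way to read the variance components is as shares of total variance at each level. The sketch below does that arithmetic for the values reported in the table; the data layout and the helper name `variance_shares` are assumptions made for illustration.

```python
def variance_shares(components):
    """Return each variance component as a share of their sum."""
    total = sum(components)
    return [c / total for c in components]


# (school, class, student) components from the 3-level model
three_level = {
    "Expressive vocab-spring": (19.84, 32.45, 306.18),
    "Stanford 9 Total Math": (115.14, 36.40, 1273.15),
    "Stanford 9 Total Reading": (108.75, 158.95, 1581.86),
}
# (school, student) components from the 2-level model
two_level = {
    "Expressive vocab-spring": (38.15, 321.11),
    "Stanford 9 Total Math": (131.39, 1293.24),
    "Stanford 9 Total Reading": (181.77, 1666.48),
}

for outcome in three_level:
    s3 = variance_shares(three_level[outcome])
    s2 = variance_shares(two_level[outcome])
    print(f"{outcome}: 3-level school share={s3[0]:.3f}, class share={s3[1]:.3f}; "
          f"2-level school share={s2[0]:.3f}")
```

In each row the two-level school component lies between the three-level school component and the sum of the school and class components, so when the classroom level is omitted its variance is split between the school and student levels.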
3-level vs. 2-level MDES for Original Sample
                                         MDES, 3-Level Model            MDES, 2-Level Model
Outcomes                                 Unconditional   Conditional    Unconditional   Conditional
Expressive vocab-spring                  0.482           0.386          0.495           0.311
Stanford 9 Total Math Scaled Score       0.259           0.184          0.259           0.184
Stanford 9 Total Reading Scaled Score    0.261           0.148          0.264           0.150
Sources: The Chicago Literacy Initiative: Making Better Early Readers study (CLIMBERs) database and the School Breakfast Pilot Project (SBPP) database.
Further References
Bloom, Howard S. (2005) “Randomizing Groups to Evaluate Place-Based Programs,” in Howard S. Bloom, editor, Learning More From Social Experiments: Evolving Analytic Approaches (New York: Russell Sage Foundation).
Bloom, Howard S., Lashawn Richburg-Hayes and Alison Rebeck Black (2005) “Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions” (New York: MDRC).
Donner, Allan and Neil Klar (2000) Cluster Randomization Trials in Health Research (London: Arnold).
Hedges, Larry V. and Eric C. Hedberg (2006) “Intraclass Correlation Values for Planning Group Randomized Trials in Education” (Chicago: Northwestern University).
Murray, David M. (1998) Design and Analysis of Group-Randomized Trials (New York: Oxford University Press).
Raudenbush, Stephen W., Andres Martinez and Jessaca Spybrook (2005) “Strategies for Improving Precision in Group-Randomized Experiments” (University of Chicago).
Raudenbush, Stephen W. (1997) “Statistical Analysis and Optimal Design for Cluster Randomized Trials,” Psychological Methods, 2(2): 173-185.
Schochet, Peter Z. (2005) “Statistical Power for Random Assignment Evaluations of Education Programs” (Princeton, NJ: Mathematica Policy Research).