© Murnane & Willett, Harvard University Graduate School of Education, 04/22/2023 S290/Class #01 – Slide 1
S290: Quantitative Methods for Improving Causal Inference in Educational Research Class #01: The Formal Design of Experiments
Roadmap of Today’s Class

I. What Is An Experiment?
• Making Causal Statements – An Ideal Experiment.
• Making Causal Statements in Practice -- The Randomized Experiment.
• The Critical Condition – Exogenous and Randomized Assignment.

II. Analyzing the Data Collected in a Randomized Experiment
• It’s So Simple, You Can Use a t-Test (or ANOVA).
• But, You’re Always Better Off Using OLS Regression Analysis.

III. Properties and Problems of OLS Estimation
• Critical Assumption of OLS Regression Analysis -- Bias (and Precision).
• Then, You Can Add Covariates (Control Predictors) … But, Why?
• An All-Too-Common Story – Omitted-Variable Bias, When Experiments Fail!

IV. You Can Make Unbiased Causal Inferences in Other Ways Too
• Roadmap of the S290 Seminar.

Appendices
I. Random Numbers, Random Sampling & Randomization.
II. Randomized Experiments Are Not Limited To Simple Two-Group Comparisons.
III. Selected Comments On Next Week’s Reading.
Making Causal Statements – An “Ideal” Experiment
In an ideal world unicorns exist, and this is what we would need to do to prove that a “treatment” caused an “effect” …
[Figure: a random sample is drawn from a defined population.]
While we can’t do this in practice, there is nothing to stop us imagining, hypothetically at least, that each participant in the population has a value of the outcome that would “potentially” be revealed under each experimental condition:
• Yi(1) = Potential value of the outcome for the ith person, when treated (Ti = 1).
• Yi(0) = Potential value of the outcome for the ith person, when not treated (Ti = 0).
Hypothetically, we could then implement the treatment for each participant … and also, concurrently, not implement it … a perfect counterfactual!!! (We would need to turn back the clock, mysteriously erasing all impact and memory of the treatment and returning participants to their original pristine condition.)
The Individual Treatment Effect (ITE) is then the difference in potential outcome values between treated and control conditions, for each person:

$$ITE_i = Y_i(1) - Y_i(0)$$

And the Average Treatment Effect (ATE) in the population is the average of the individual treatment effects across all participants in the population:

$$ATE = E[ITE_i]$$

And if the ATE differed from zero, we could claim that the treatment caused the effect, in the population. Why? Because there would be no explanation for the differences detected between the “treated” and “control” conditions other than participants’ experience of the treatment!
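The two definitions above can be made concrete with a tiny table of potential outcomes. The sketch below is only an illustration: the five people and their outcome values are made up, and in any real study only one of the two potential outcomes per person would ever be observed.

```python
# Hypothetical potential outcomes for five people (made-up numbers).
# y1 = outcome if treated, y0 = outcome if not treated.
potential_outcomes = [
    {"id": 1, "y1": 72.0, "y0": 65.0},
    {"id": 2, "y1": 80.0, "y0": 80.0},
    {"id": 3, "y1": 58.0, "y0": 61.0},
    {"id": 4, "y1": 90.0, "y0": 84.0},
    {"id": 5, "y1": 66.0, "y0": 61.0},
]


def individual_effects(rows):
    """ITE_i = Y_i(1) - Y_i(0), one value per person."""
    return [row["y1"] - row["y0"] for row in rows]


def average_effect(rows):
    """ATE = average of the individual treatment effects."""
    ites = individual_effects(rows)
    return sum(ites) / len(ites)


ites = individual_effects(potential_outcomes)
ate = average_effect(potential_outcomes)
print("ITEs:", ites)  # [7.0, 0.0, -3.0, 6.0, 5.0]
print("ATE:", ate)    # 3.0
```

Note that the individual effects differ in sign and size even though the average is positive, which is why the ATE is a statement about the population rather than about any one person.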
Making Causal Statements In Practice – The Randomized Experiment
An “ideal” experiment is not possible in practice because the same group of people cannot both experience and not experience treatment … you have a missing-data problem!
[Figure: participants are randomly selected from a defined population into a sample.]

Then, randomly assign each participant …
• … either to the Treatment Group, where they are treated, and their value of the outcome, Yi(1), is measured,
• … or to the Control Group, where they are not treated, and their value of the outcome, Yi(0), is measured.

So, while you may not be able to estimate individual treatment effects in practice, you may still be able to estimate the average treatment effect ... if you conduct a randomized experiment!

Provided your random assignment is credible, you can estimate the population Average Treatment Effect as a difference of treatment and control group sample means:

$$\widehat{ATE} = \frac{\sum_{i=1}^{n_1} Y_i(1)}{n_1} - \frac{\sum_{i=1}^{n_0} Y_i(0)}{n_0}$$

And, if we can reject the null hypothesis that the ATE is zero in the population, we can claim the treatment caused the effect. Why? Because random assignment to experimental conditions has ensured that the treatment and control groups were identical on average in the population (“Equal in Expectation”) before treatment onset.
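A small simulation can illustrate the estimator above. This is a sketch under stated assumptions, not an analysis from the course: the sample size, the outcome distribution, and the constant treatment effect `TRUE_ATE` are all hypothetical choices made for illustration.

```python
import random

random.seed(290)  # arbitrary seed so the sketch is reproducible

TRUE_ATE = 5.0  # assumed constant treatment effect (hypothetical)

# Build a hypothetical sample in which both potential outcomes exist.
sample = []
for _ in range(2000):
    y0 = random.gauss(50.0, 10.0)  # outcome if not treated
    y1 = y0 + TRUE_ATE             # outcome if treated
    sample.append((y1, y0))

# Randomly assign exactly half of the sample to treatment.
treated_flags = [1] * 1000 + [0] * 1000
random.shuffle(treated_flags)

# Only ONE potential outcome is revealed per person (the missing-data problem).
treated_outcomes = [y1 for (y1, y0), t in zip(sample, treated_flags) if t == 1]
control_outcomes = [y0 for (y1, y0), t in zip(sample, treated_flags) if t == 0]

# Difference of treatment and control group sample means.
ate_hat = (sum(treated_outcomes) / len(treated_outcomes)
           - sum(control_outcomes) / len(control_outcomes))
print("Estimated ATE:", round(ate_hat, 2), "| true ATE:", TRUE_ATE)
```

The estimate will not equal the true value exactly in any one sample; it only lands close to it because the two groups are equal in expectation.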
A Critical Requirement -- Exogenous & Random Treatment Assignment!
The big idea in a randomized experiment is that treatment variation is exogenously and randomly assigned …

An external (“exogenous”) agent (the investigator) determines who is treated (Ti = 1) and who is not (Ti = 0) … by randomly assigning each participant to an experimental condition. In particular, participants themselves have no say!

This ensures that:
• Values of all observed and unobserved characteristics of the participants are randomized across treatment and control groups.
• Members of the treatment and control groups are then equal, on average, in the population (“Equal in Expectation”) before the experiment begins, on every possible dimension.
• As an added benefit, the values of the treatment variable, T, will also be completely uncorrelated with all characteristics of participants, observed and unobserved, in the population.

It is the presence of exogenous and random treatment variation that validates the causal attributions; that is, it ensures the internal validity of an experiment.
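The “equal in expectation” claim can be checked in a simulation. The sketch below assumes a single hypothetical unobserved trait (called `motivation` here purely for illustration) and a coin-flip assignment controlled by the investigator; neither comes from the slides.

```python
import math
import random

random.seed(4)  # arbitrary seed for reproducibility

n = 4000
# Hypothetical unobserved participant characteristic.
motivation = [random.gauss(0.0, 1.0) for _ in range(n)]

# The investigator, not the participant, decides who is treated.
treatment = [1 if random.random() < 0.5 else 0 for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation computed from centered sums."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(treatment, motivation)
gap = (mean([m for m, t in zip(motivation, treatment) if t == 1])
       - mean([m for m, t in zip(motivation, treatment) if t == 0]))
print("corr(T, motivation):", round(r, 3))
print("treated minus control mean motivation:", round(gap, 3))
```

Both numbers should sit near zero, even though `motivation` was never measured or used in the assignment.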
It’s Easy to Analyze Data From Randomized Experiments Using a t-Test, or ANOVA
The great thing about doing an experiment is … the cleaner the design, the simpler the data-analysis …
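No need for a pre-test, no need for controls, no need for complex analyses … Just estimate and examine the difference in average outcome between the Treatment and Control Groups.

[Figure: the sample is split into two groups.
Control Group (Ti = 0): n0 cases; mean of outcome, $\bar{Y}_0$; st. dev. of outcome, s0.
Treatment Group (Ti = 1): n1 cases; mean of outcome, $\bar{Y}_1$; st. dev. of outcome, s1.]

The population average treatment effect, $\mu_1 - \mu_0$, is estimated by the sample mean difference, $\bar{Y}_1 - \bar{Y}_0$.

To test for a treatment effect, conduct a two-sample t-test:

$$H_0: \mu_1 - \mu_0 = 0$$

$$t_{obs} = \frac{\bar{Y}_1 - \bar{Y}_0}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_0}}}, \quad \text{where } s^2 = \frac{(n_1 - 1)s_1^2 + (n_0 - 1)s_0^2}{n_1 + n_0 - 2}$$

$$\text{If } |t_{obs}| > t_{crit}, \text{ then reject } H_0, \quad \text{where } t_{crit} = t_{df = n_1 + n_0 - 2}(\alpha = .05)$$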
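The pooled two-sample t-test can be computed by hand in a few lines. The sketch below uses simulated data with hypothetical group sizes and means. It also makes one simplifying assumption that is not part of the slide: because the degrees of freedom are large, the critical value is approximated with the standard normal distribution rather than looked up in a t table.

```python
import math
import random
from statistics import NormalDist, mean, variance

random.seed(5)  # arbitrary seed for reproducibility

# Hypothetical outcome data for the two randomized groups.
control = [random.gauss(50.0, 10.0) for _ in range(400)]
treated = [random.gauss(55.0, 10.0) for _ in range(400)]


def pooled_t(y1, y0):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n0 = len(y1), len(y0)
    # statistics.variance is the sample variance (divides by n - 1).
    s2 = ((n1 - 1) * variance(y1) + (n0 - 1) * variance(y0)) / (n1 + n0 - 2)
    se = math.sqrt(s2 / n1 + s2 / n0)
    return (mean(y1) - mean(y0)) / se, n1 + n0 - 2


t_obs, df = pooled_t(treated, control)

# Large-df approximation (an assumption of this sketch): with df this
# large, the two-sided 5% critical value of t is close to the normal one.
t_crit = NormalDist().inv_cdf(0.975)
reject = abs(t_obs) > t_crit

print("t_obs =", round(t_obs, 2), "on", df, "df")
print("approximate t_crit =", round(t_crit, 3), "| reject H0:", reject)
```

With small samples the normal approximation is too lenient, and an exact t critical value for the stated degrees of freedom should be used instead.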
But, You’re Better Off Using OLS Regression Analysis
You don’t actually need to conduct a t-test, you can use OLS regression analysis to get an identical answer …
Regress outcome, Yi, on the dichotomous treatment predictor, Ti:

$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$

[Figure: the sample split into a Control Group (Ti = 0; n0 cases; mean of outcome, $\bar{Y}_0$; st. dev. of outcome, s0) and a Treatment Group (Ti = 1; n1 cases; mean of outcome, $\bar{Y}_1$; st. dev. of outcome, s1).]

This model specifies that, in the population:

$$\mu_0 = E[Y \mid T = 0] = \beta_0 + \beta_1(0) = \beta_0$$

$$\mu_1 = E[Y \mid T = 1] = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$$

$$\text{and so, } \mu_1 - \mu_0 = (\beta_0 + \beta_1) - \beta_0 = \beta_1$$

And hence the OLS regression slope associated with the treatment dummy, $\hat{\beta}_1$, is the estimated average treatment effect.

The t-test associated with parameter $\beta_1$ in an OLS regression analysis of the same data is identical to the t-test offered on the previous slide …

Still no need for a pre-test, still no need for controls, still no need for complex analyses …

The estimated regression parameter, $\hat{\beta}_1$, is the difference in average outcome!!!
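The equivalence between the regression slope and the mean difference can be verified numerically. The sketch below fits the one-predictor OLS model with closed-form formulas on simulated data (the sample size and parameter values are hypothetical) and compares the result with a pooled two-sample t-test computed separately.

```python
import math
import random

random.seed(6)  # arbitrary seed for reproducibility

# Hypothetical data: 300 treated, 300 control.
T = [1] * 300 + [0] * 300
Y = [50.0 + 4.0 * t + random.gauss(0.0, 10.0) for t in T]


def simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1*x + e; returns (b0, b1, se_b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, se_b1


def pooled_t(y1, y0):
    """Two-sample pooled-variance t statistic, computed by hand."""
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    ss1 = sum((v - m1) ** 2 for v in y1)
    ss0 = sum((v - m0) ** 2 for v in y0)
    s2 = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(s2 / n1 + s2 / n0)


y_treated = [y for y, t in zip(Y, T) if t == 1]
y_control = [y for y, t in zip(Y, T) if t == 0]
mean_diff = sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)

b0, b1, se_b1 = simple_ols(T, Y)
t_ols = b1 / se_b1
t_two_sample = pooled_t(y_treated, y_control)

print("slope:", round(b1, 4), "| mean difference:", round(mean_diff, 4))
print("t (OLS):", round(t_ols, 4), "| t (two-sample):", round(t_two_sample, 4))
```

The intercept equals the control group mean, the slope equals the mean difference, and the two t statistics agree up to floating-point rounding.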
Critical Assumption of OLS Regression Analysis (Bias and Precision)!

$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$

In OLS regression analysis, we usually say that the “residuals must be distributed randomly” …
• What we actually mean by this is that we are assuming that the residuals are uncorrelated with everything else, including any predictor(s) in the model.
• If they are not, then OLS provides biased estimates of the underlying population parameters!

What is Bias? ... in repeated sampling from a population, an unbiased estimate is correct on average, while a biased estimate is systematically off target.
• If residuals and predictor(s) are uncorrelated, then OLS estimates of regression parameters are unbiased.
• If residuals and predictor(s) are correlated, then OLS estimates of regression parameters are biased.

[Figure: panels labeled “Unbiased estimate” and “Biased estimate.”]

Critically, in an experiment, the most important assumption of OLS regression analysis is automatically satisfied … because, when the values of the treatment predictor have been assigned at random, the “treatment” predictor will automatically be uncorrelated with everything, including the residuals (in the population).
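“Unbiased in repeated sampling” can be illustrated by running the same randomized experiment many times and averaging the estimates. The sketch below is a simulation under assumed values: `BETA0`, `BETA1`, the residual spread, and the number of replications are hypothetical.

```python
import random

random.seed(7)  # arbitrary seed for reproducibility

BETA0, BETA1 = 50.0, 5.0  # hypothetical population parameters


def one_experiment(n=200):
    """Run one randomized experiment; return the estimated treatment effect."""
    t = [1] * (n // 2) + [0] * (n // 2)
    random.shuffle(t)  # random assignment by the investigator
    y = [BETA0 + BETA1 * ti + random.gauss(0.0, 10.0) for ti in t]
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    # With a binary predictor, this difference is the OLS slope.
    return sum(y1) / len(y1) - sum(y0) / len(y0)


estimates = [one_experiment() for _ in range(2000)]
average_estimate = sum(estimates) / len(estimates)

print("smallest estimate:", round(min(estimates), 2))
print("largest estimate: ", round(max(estimates), 2))
print("average estimate: ", round(average_estimate, 2), "| true value:", BETA1)
```

Individual estimates scatter around the true value (that scatter is the precision), but their average sits very close to it (that is the absence of bias).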
Then You Can Add Covariates (Control Predictors) … But, Why?

If analyses are so simple with data from a randomized experiment, why do folk complicate their work by adding control predictors (“covariates”) to OLS regression analyses of experimental data?

Before covariate added …

$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$

After covariate added …

$$Y_i = \beta_0 + \beta_1 T_i + \beta_2 Z_i + \varepsilon_i$$

If you add covariate Z to the regression model, the part of Y that is now predicted by Z must have been part of the earlier residual (and not part of the variation in Y that has been predicted by T), because Z and T are uncorrelated, by randomization.

• Reduced residual variance means better precision (smaller standard error) for the estimated treatment effect: $s.e.(\hat{\beta}_1) \propto \sqrt{\text{Residual Variance}}$
• Smaller standard error means larger t-statistic: $t = \hat{\beta}_1 / s.e.(\hat{\beta}_1)$
• Larger t-statistic means a smaller p-value, which means you have more power to reject H0.

[Figure: the total variation in Y, partitioned before and after including Z. Before including Z: variation in Y predicted by T, plus residual variation in Y. After Z is included: variation in Y predicted by T, variation in Y now predicted by Z, and a smaller residual variation.]
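The precision argument can be checked in a simulation. The sketch below assumes a hypothetical pre-treatment covariate `Z` that strongly predicts the outcome, and fits both models with closed-form OLS formulas on centered data. All parameter values are illustrative.

```python
import math
import random

random.seed(8)  # arbitrary seed for reproducibility

n = 1000
T = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(T)                                # randomized treatment
Z = [random.gauss(0.0, 1.0) for _ in range(n)]   # hypothetical covariate
Y = [50.0 + 3.0 * t + 8.0 * z + random.gauss(0.0, 5.0) for t, z in zip(T, Z)]


def centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slope_and_se(y, t, z=None):
    """OLS slope on t and its standard error, optionally controlling for z."""
    yc, tc = centered(y), centered(t)
    k = len(y)
    if z is None:
        b_t = dot(tc, yc) / dot(tc, tc)
        resid = [yi - b_t * ti for yi, ti in zip(yc, tc)]
        return b_t, math.sqrt(dot(resid, resid) / (k - 2) / dot(tc, tc))
    zc = centered(z)
    stt, szz, stz = dot(tc, tc), dot(zc, zc), dot(tc, zc)
    sty, szy = dot(tc, yc), dot(zc, yc)
    det = stt * szz - stz ** 2
    b_t = (sty * szz - stz * szy) / det
    b_z = (szy * stt - stz * sty) / det
    resid = [yi - b_t * ti - b_z * zi for yi, ti, zi in zip(yc, tc, zc)]
    # Residual variance uses n - 3 df: intercept, T, and Z are estimated.
    return b_t, math.sqrt(dot(resid, resid) / (k - 3) * szz / det)


b_without, se_without = slope_and_se(Y, T)
b_with, se_with = slope_and_se(Y, T, Z)

print("without Z: slope", round(b_without, 3), "s.e.", round(se_without, 3))
print("with Z:    slope", round(b_with, 3), "s.e.", round(se_with, 3))
```

Both slopes estimate the same treatment effect; the difference is that the standard error shrinks once Z soaks up residual variation.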
A Common Story When Experiments Fail – “Omitted Variable” Bias!
There is a story you will hear repeatedly … An Experiment Will Fail – And Its Findings Will Be Biased – If Participants Can Choose Their Own Experimental Conditions … If Treatment Assignment Is Endogenous …

[Figure: Small vs. Large Classes → Student Achievement?]

$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$

• If students can self-select into classes, perhaps their selection will be driven by an unobserved characteristic, like Motivation. Maybe more motivated students seek admission to small classes? So, after their choices, T will be correlated with Student Motivation.
• But, what if Student Motivation also impacts the Achievement outcome?
• We have not included the offending unobserved participant characteristic (Motivation) in the model as an explicit predictor …
• But, we have just admitted that the offending (unobserved) characteristic (Motivation) impacts the outcome.
• So, if it has been omitted as a predictor, its effect must naturally reside surreptitiously in the residual (where all omitted effects on the outcome go to die).
• Now, because the residuals contain the effect of the omitted predictor, Motivation, they must be correlated with T. Why? Because the values of T are now correlated with student motivation, by the selection that occurred … BIAS, BIAS, BIAS!
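The story above can be simulated. The sketch below is a made-up scenario, not data from any study: students with higher (unobserved) motivation are more likely to choose the small class, motivation also raises achievement, and the naive treatment-versus-control comparison is then computed. The selection rule and every parameter value are hypothetical.

```python
import random

random.seed(9)  # arbitrary seed for reproducibility

TRUE_EFFECT = 2.0  # hypothetical causal effect of small classes
n = 5000

# Unobserved characteristic that drives both selection and achievement.
motivation = [random.gauss(0.0, 1.0) for _ in range(n)]

# Endogenous assignment: more motivated students tend to pick T = 1.
T = [1 if m + random.gauss(0.0, 1.0) > 0 else 0 for m in motivation]

# Achievement depends on the treatment AND on motivation.
Y = [50.0 + TRUE_EFFECT * t + 5.0 * m + random.gauss(0.0, 5.0)
     for t, m in zip(T, motivation)]

y_treated = [y for y, t in zip(Y, T) if t == 1]
y_control = [y for y, t in zip(Y, T) if t == 0]

# Naive estimate: what OLS of Y on T alone would report.
naive_estimate = (sum(y_treated) / len(y_treated)
                  - sum(y_control) / len(y_control))

print("true effect:   ", TRUE_EFFECT)
print("naive estimate:", round(naive_estimate, 2))
```

The naive estimate overshoots the true effect by a wide margin, because part of the motivation effect hiding in the residual is credited to the treatment.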
A Common Story When Experiments Fail – “Omitted Variable” Bias!
You might react by saying, “Well, then, let’s include motivation as a predictor!” … and you’d be right …

$$Y_i = \beta_0 + \beta_1 T_i + \beta_2 Z_i + \varepsilon_i$$

• Pull the offending omitted variable (motivation?) out of the residual.
• Include it as a predictor in the regression model …
• This leaves the treatment predictor and the residual again uncorrelated … now, our OLS estimate of the treatment effect will again be unbiased.

But, what about competitiveness, family wealth, …
• If treatment assignment is not random, there may be some other unobserved characteristic related to treatment assignment that has also been omitted?
• How would you ever know that you had included all the needed controls!
• So, an unknown random bias is always possible regardless of how many covariates you include!!!!! Be careful when experiments fail!
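The fix described above can be sketched in the same kind of simulation. Here the omitted characteristic is treated as if it had been measured (an unrealistic luxury, assumed only for illustration) and entered as Z in a two-predictor OLS fit computed from closed-form formulas. All values are hypothetical.

```python
import random

random.seed(10)  # arbitrary seed for reproducibility

TRUE_EFFECT = 2.0  # hypothetical causal effect
n = 5000

motivation = [random.gauss(0.0, 1.0) for _ in range(n)]
# Self-selection: assignment depends on motivation.
T = [1 if m + random.gauss(0.0, 1.0) > 0 else 0 for m in motivation]
Y = [50.0 + TRUE_EFFECT * t + 5.0 * m + random.gauss(0.0, 5.0)
     for t, m in zip(T, motivation)]


def centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


yc, tc, zc = centered(Y), centered(T), centered(motivation)

# Model that omits Z: slope of Y on T alone.
b1_omitted = dot(tc, yc) / dot(tc, tc)

# Model that includes Z: slope on T from the two-predictor normal equations.
det = dot(tc, tc) * dot(zc, zc) - dot(tc, zc) ** 2
b1_adjusted = (dot(tc, yc) * dot(zc, zc) - dot(tc, zc) * dot(zc, yc)) / det

print("true effect:         ", TRUE_EFFECT)
print("estimate omitting Z: ", round(b1_omitted, 2))
print("estimate including Z:", round(b1_adjusted, 2))
```

The adjustment works here only because the simulation has exactly one confounder and it is measured; with a second unmeasured one (say, family wealth) the adjusted estimate would be biased again.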
Other Than Experiments … What Then Can You Do?
In addition to true experiments in which an “exogenous” investigator takes charge of the randomization to experimental conditions, there are other strategies and situations that can support unbiased causal attribution …
• Natural Experiments: You can sometimes locate exogenous variation in assignment to treatments that has been enforced “by nature,” rather than by the investigator. Perhaps due to unanticipated policy shifts, natural disasters, random events, and the like.
• Regression-Discontinuity Designs: You can capitalize on some ubiquitous initial ranking or arraying of participants, and the exogenous imposition of a cut-score or boundary to create “haves” and “have-nots.” Natural experiments often fall into this category.
• Instrumental Variables Estimation: You can sometimes tease out analytically the exogenous component of otherwise endogenous treatment variation. Often, this permits making causal inferences in observational data, or when an experiment has failed.
• Model, and Account for, Endogenous Selection: You can resolve treatment assignment bias (and sample selection bias) by modeling the selection process explicitly and using analytic bias-correction methods, like propensity score estimation.
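As one illustration of these strategies, the instrumental-variables idea can be sketched with the simplest estimator of that family, the Wald ratio: the effect of a randomized offer on the outcome, divided by the effect of the offer on treatment take-up. This is an assumed, simplified setup (a randomized binary offer, a constant treatment effect, made-up parameter values), not the specific procedure taught later in the course.

```python
import random

random.seed(11)  # arbitrary seed for reproducibility

TRUE_EFFECT = 2.0  # hypothetical constant treatment effect
n = 20000

rows = []
for _ in range(n):
    motivation = random.gauss(0.0, 1.0)     # unobserved confounder
    offer = 1 if random.random() < 0.5 else 0   # randomized offer (instrument)
    # Take-up is endogenous: it depends on motivation AND on the offer.
    took_up = 1 if motivation + 1.5 * offer + random.gauss(0.0, 1.0) > 0.75 else 0
    y = 50.0 + TRUE_EFFECT * took_up + 5.0 * motivation + random.gauss(0.0, 5.0)
    rows.append((offer, took_up, y))


def mean(values):
    return sum(values) / len(values)


# Naive comparison of those who took the treatment with those who did not.
naive = (mean([y for _, t, y in rows if t == 1])
         - mean([y for _, t, y in rows if t == 0]))

# Wald ratio: offer's effect on the outcome over offer's effect on take-up.
effect_on_y = (mean([y for z, _, y in rows if z == 1])
               - mean([y for z, _, y in rows if z == 0]))
effect_on_t = (mean([t for z, t, _ in rows if z == 1])
               - mean([t for z, t, _ in rows if z == 0]))
wald_iv = effect_on_y / effect_on_t

print("true effect:", TRUE_EFFECT)
print("naive:      ", round(naive, 2))
print("Wald IV:    ", round(wald_iv, 2))
```

The ratio recovers the effect here because the offer is random and, by construction, influences the outcome only through take-up; neither condition can be taken for granted in real data.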
And You Can Dream on …
Appendix I: Random Numbers, Random Sampling & Randomization
It’s hard to ensure true randomness.

How do you know there are no trends or biases hidden deep in the list of random numbers (or even secret messages hidden by the Architect)?
• Random numbers were first created using dice in ancient Sumeria, an effective but inefficient and potentially flawed process (see graphic on right).
• Up until the 20th century, random numbers were generated by tossing coins, rolling dice, dealing cards, picking ping-pong balls out of urns, etc.
• In 1927, statistician L.H.C. Tippett published a list of 41,600 “random” numbers by picking out the middle digits of measurements of the areas of English churches (!!!).
• In 1955, the RAND Corporation published A Million Random Digits with 100,000 Normal Deviates (but, was it peer-reviewed?)
• Today, there are random number generators on the WWW.
Appendix I: Random Numbers, Random Sampling & Randomization
How do you use a random number table to select people?
• Obtain an initial random start by some reasonable mechanical means, and note the random number in that cell.
• Count down from that cell a number of cells equal to the initial random number.
• The new cell now contains your first true random number.
• Starting with the number in this cell, assign random ID numbers to each person …
  • For random selection, assign a random ID number to each person in the population.
  • For random assignment, assign a random ID number to each person in the sample.
• In either case, adopt a sensible rule that uses the random ID numbers to accept or reject each person:
  • For random selection, order the individuals in the population by their random ID and sample the number of participants that you need from the top of the list.
  • For random assignment, order the individuals in the sample by their random ID and alternately assign them to treatment and control groups.
• Etc.
In a table of random numbers, every sequence of numbers listed in any direction is random – any column or row of single digits, columns and rows of double digits, columns and rows of triple digits, etc. You choose the number of digits you use in order to match the size of the problem you’re addressing.
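The selection and assignment rules above translate directly into code. The sketch below swaps the printed random number table for Python's pseudo-random generator, which is a substitution made here for convenience; the roster, the seed, and the sample size are hypothetical.

```python
import random

rng = random.Random(13)  # software stand-in for the printed table

# Hypothetical roster of a defined population.
population = [f"person_{k:03d}" for k in range(1, 201)]


def order_by_random_id(people):
    """Give each person a random ID number, then order people by that ID."""
    tagged = [(rng.random(), person) for person in people]
    tagged.sort()
    return [person for _, person in tagged]


# Random selection: order the population by random ID and take the
# number of participants needed from the top of the list.
sample = order_by_random_id(population)[:40]

# Random assignment: order the sample by a fresh random ID and
# alternately assign people to treatment and control.
ordered_sample = order_by_random_id(sample)
treatment_group = ordered_sample[0::2]
control_group = ordered_sample[1::2]

print("sample size:", len(sample))
print("treatment:", len(treatment_group), "| control:", len(control_group))
```

Using a fresh set of random IDs for the assignment step keeps assignment independent of the order in which people were selected.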
Appendix II: Randomized Experiments Need Not Simply Be Two-Group Comparisons …
You can include more than two groups in the research design
Tennessee Student/Teacher Achievement Ratio (STAR) Experiment (Krueger, 1999)
• Research Question: Is academic achievement enhanced when students are taught in smaller classes?
• Outcome: Student achievement.
• Experimental Conditions:
  • Treatment: “Small” classes (13-17 kids).
  • Control #1: “Regular” classes (22-25) without a teacher’s aide.
  • Control #2: “Regular” classes (22-25) with a teacher’s aide.

Situated Science Learning in Multi-User Virtual Environments (Dede, 2004)
• In an interactive 3-D virtual environment called “River City,” learners collaborate as scientists to resolve environmental and health problems.
• Research Question: Are student science achievement and attitudes towards science enhanced by apprenticeship learning?
• Outcome: Science achievement and attitudes.
• Experimental Conditions provide educational supports according to four theories of learning:
  • Treatment #1: Expert Coaching.
  • Treatment #2: Peripheral Participation.
  • Treatment #3: Pure Constructivist.
  • Control: Constructivist without Technology.

Each supports interesting post-hoc empirical comparisons, using GLH (general linear hypothesis) testing.
Appendix II: Randomized Experiments Need Not Simply Be Two-Group Comparisons …
You can make the treatment levels ordinal or continuous
Modeling Across the Curriculum (MAC) Project (Concord Consortium, 2004)
• Computer-aided learning -- learners explore science concepts in structured simulations in high-school biology and chemistry.
• Research Question: Does increased on-line guidance cause students to achieve at higher levels?
• Outcome: Science achievement, measured on-line.
• Experimental Conditions:
  • On-line guidance provided, with the level of guidance being manipulated continuously.
  • Assigned randomly to student, by ID.
• Treatment level entered as a continuous predictor into the data-analyses.

More statistical power than “categorical” treatment assignment.
Appendix II: Randomized Experiments Need Not Simply Be Two-Group Comparisons …
You can make the treatment level a time-varying predictor
Possible Extension of the Tennessee Student/Teacher Achievement Ratio (STAR) Experiment (Krueger, 1999)
• Research Question: Is academic achievement enhanced when students are taught in smaller classes?
• Outcome: Student Achievement.
• Experimental Conditions:
  • Could have reassigned students exogenously to different sized classes in different years.
  • Solves an ethical issue?
• Analysis: Could have modeled the relationship between individual outcome and the longitudinal class size profiles over time.

Modeling Across the Curriculum (MAC) Project (Concord Consortium, 2004)
• Computer-aided learning -- learners explore science concepts in guided simulations in high-school biology and chemistry.
• Research Question: Does increased on-line guidance lead students to achieve at higher levels?
• Outcome: Science achievement, measured at each log-on, via the interactive technology.
• Experimental Conditions:
  • On-line guidance provided, with level of guidance being manipulated continuously.
  • Student randomly assigned a different level of individual guidance at each log-on.
• Treatment level entered as a continuous time-varying predictor into the data-analyses.

More statistical power, richer research questions.
Appendix III: Selected Comments On Next Week’s Reading
Howell, W. G., Wolf, P. J., Campbell, D. E., & Peterson, P. E. (2002). School Vouchers And Academic Performance: Results From Three Randomized Field Trials. Journal of Policy Analysis and Management, 21(2), 191-217.
• The authors report on a classical two-group randomized experimental design.
• Their regression analyses of the experimental data include covariates, but their comments about why the covariates were included are not entirely correct.
• The paper distinguishes the effect of Intent to Treat from the effect of Treatment, a distinction that is worth paying attention to, and that will feature repeatedly in the course.
• In fact, in follow-up analyses, the authors use the voucher offer as an instrumental variable to estimate the unbiased causal effect of private school. We will talk about this later.
• There is attrition from the sample between kindergarten and fourth grade that the authors attempt to adjust for, using empirically estimated post-hoc “probability” weights. Did they get the weights right? Do they adequately reflect the attrition?