![Page 1: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/1.jpg)
Efficient Estimation for Staggered Rollout Designs
Jonathan Roth
Microsoft
Pedro H. C. Sant’Anna
Microsoft and Vanderbilt University
July 2021
![Page 2: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/2.jpg)
Introduction
• In many settings, treatments are rolled out to different units at different points in time.
• Social policies may be introduced in different locations at different times.
• Companies may introduce new feature or marketing campaign to customers at different times.
• Researcher can often ensure that treatment timing is random by design.
• Alternatively, researcher might argue that treatment timing is quasi-random or
as-good-as-random (which is somehow common in Economics).
• This paper studies efficient estimation of causal effects in such (quasi-)randomized
staggered rollout designs
1
![Page 3: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/3.jpg)
What is the status-quo method in these cases?
2
![Page 4: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/4.jpg)
Current Practice
• Two-way fixed effects (TWFE) methods are commonly use, but we now know they may not
recover interpretable treatment effect parameters (Borusyak and Jaravel, 2017; Athey and Imbens,
2021; Goodman-Bacon, 2018; de Chaisemartin and D’Haultfœuille, 2020; Sun and Abraham, 2020)
• Alternative DiD estimators that give more sensible estimands under heterogeneity have been
proposed (Callaway and Sant’Anna, 2020; Sun and Abraham, 2020; de Chaisemartin and D’Haultfœuille,
2020)
• All of these procedures exploit different parallel trends assumptions, not random timing.
• Although technically weaker, parallel trends assumptions are routinely motivated by arguing that
treatment timing is “quasi-random”.
• In the absence of random treatment timing, DiD estimators may be sensitive to functional form
restrictions. (Roth and Sant’Anna, 2021).
3
![Page 5: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/5.jpg)
What we do in this paper
• We introduce a design-based framework formalizing the notion of random treatment
timing (Imbens and Rubin, 2015; Athey and Imbens, 2021; Abadie, Athey, Imbens and Wooldridge, 2020;
Bojinov, Rambachan and Shephard, 2020; Rambachan and Roth, 2020; Xu, 2021)
• Consider estimation of a large class of causal parameters that aggregate average effects
across periods and cohorts
• Solve for the efficient estimator in a class of estimators that nests existing approaches• Sample analog plus linear adjustment for pre-treatment differences in outcome
• In our setting, pre-treatment outcomes play a similar role to fixed covariates in a
randomized experiment (Freedman, 2008b,a; Lin, 2013; Li and Ding, 2017; Imbens and Rubin, 2015)
• We provide both t-based and permutation-test based methods for randomization inference.
4
![Page 6: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/6.jpg)
Does this matter in practice?
5
![Page 7: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/7.jpg)
Application - Background
• Reducing police misconduct and use of force is an important policy objective.
• Wood, Tyler and Papachristos (2020a, PNAS) studied a randomized roll out of aprocedural justice training program for police officers• Emphasized respect, neutrality, and transparency in the exercise of authority
• Original study found large & significant reductions in complaints/use of force
• But we discovered a statistical error in the original analysis!• They compared cohorts without normalizing by cohort size
• In Wood, Tyler, Papachristos, Roth and Sant’Anna (2020b), we re-analyzed data usingthe method of Callaway and Sant’Anna (2020)• No significant impacts on complaints; borderline significant effects on force; but CIs for all
outcomes were wide
6
![Page 8: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/8.jpg)
7
![Page 9: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/9.jpg)
Table 1: Estimates and 95% CIs as a Percentage of Pre-treatment Means
Note: This table shows the pre-treatment means for the three outcomes. It also displays the
estimates and 95% CIs in Figure ?? as percentages of these means, as well as the p-value from
a Fisher Randomization Test (FRT). The final columns shows the ratio of the CI length using
the CS estimator relative to the plug-in efficient estimator.
8
![Page 10: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/10.jpg)
Framework
9
![Page 11: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/11.jpg)
Framework
• Finite population of units: i = 1, ...,N
• T periods: t = 1, ...,T
• Unit i is first treated at period Gi ∈ G ⊂ {1, ...,T} ∪ {∞}• Gi =∞ denotes never treated.
• Treatment is an “absorbing state.”
• Potential outcomes: Yi ,t(g) = i ’s outcome in t if first treated at g
• We observe Yi ,t =∑
g 1[Gi = g ]Yi ,t(g)
• Adopt a design-based framework: Yi ,t(·) and Ng =∑
i 1[Gi = g ] treated as fixed, G is
stochastic.
10
![Page 12: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/12.jpg)
Two Key Assumptions
Assumption 1 (Random treatment timing)
Let G = (G1, ...,GN). Then P (G = g) = (∏
g∈G Ng !)/N! if∑
i 1[gi = g ] = Ng for all g , and
zero otherwise.
• Any permutation of treatment timing that preserves group size is equally likely
Assumption 2 (No anticipation)
For all i ,t and g , g ′ > t, Yi ,t(g) = Yi ,t(g′).
• No Anticipation may fail if treatment timing announced in advance (Malani and Reif, 2015)
• Note that No Anticipation allows for arbitrary treatment effect dynamics once treatment
has occurred.
11
![Page 13: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/13.jpg)
Special Case: 2-Period Model
• Suppose T = 2 and G = {2,∞}, so some units are treated in period 2 and some are never
treated.
• Under Randomization and No Anticipation, this is analogous to a cross-sectionalrandom experiment with Yi ,t=2 the outcome and Yi ,t=1 playing the role of a fixedcovariate.
• Yi = Yi,t=2;
• Xi = Yi,t=1 ≡ Yi,t=1(∞)
• Di = 1[Gi = 2]
12
![Page 14: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/14.jpg)
Causal Parameters of Interest
13
![Page 15: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/15.jpg)
Estimands
• With staggered treatment timing, there are many possible causal estimands.
Consider a flexible class of possible aggregations.
• Building block: Following Athey and Imbens (2021), let τt,gg ′ be average effect on
outcome in period t of switching treatment from g ′ to g
τt,gg ′ =1
N
∑i
Yi ,t(g)− Yi ,t(g′).
• Consider a (scalar) estimand that aggregates these building blocks:
θ =∑t,gg ′
at,gg ′τt,gg ′
14
![Page 16: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/16.jpg)
Estimands in the Staggered Case
• In the staggered case, there are many possible ways of aggregating effects across cohorts
and time periods.
• One useful parameter is ATE (t, g), the average effect at time t of being treated at g
relative to being never treated:
ATE (t, g) =1
N
∑i
Yi ,t(g)− Yi ,t(∞).
• Following Callaway and Sant’Anna (2020), one might also be interested in summary
parameters that are weighted averages of ATE (t, g) along different dimensions.
• All these aggregations fit into our setup
15
![Page 17: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/17.jpg)
Class of estimators we consider
16
![Page 18: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/18.jpg)
Estimators
• Define θ0 to be the sample plug-in estimator for θ:
θ0 =∑t,gg ′
at,gg ′ τt,gg ′ ,
where τt,gg ′ = Ytg − Ytg ′ and Ytg is the sample mean of Yi ,t for cohort g .
• We will consider the class of estimators of the form
θβ = θ0 − X ′β,
where X is a vector guaranteed to be mean-zero by No Anticipation.
• Formally, each element of X aggregates differences btwn groups before either is treated
Xj =∑
(t,g ,g ′):g ,g ′>t
bjt,gg ′ τt,gg ′ .
17
![Page 19: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/19.jpg)
Example: 2 period Example
• Set-up: Two periods (T = 2). Units treated in period 2 or never (G = {2,∞})
• Target parameter: Average treatment effect (ATE) in period 2:
θ = τ2,2∞ =1
N
∑i
Yi ,t=2(2)− Yi ,t=2(∞)
• Class of estimators:
θ0 is the simple difference in means at t = 2:
Yt=2,g=2 − Yt=2,g=∞
X is the pre-treatment difference in means: Yt=1,g=2 − Yt=1,g=∞
18
![Page 20: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/20.jpg)
Example: 2 period Example
• Our proposed estimator is of the form
θβ = θ0 − X ′β,
• The difference-in-differences estimator is
θ1 = θ0 − X = (Yt=2,g=2 − Yt=2,g=∞)− (Yt=1,g=2 − Yt=1,g=∞),
corresponding with θβ with β = 1.
• In this simple 2x2 case, θβ = θ0 − X ′β is isomorphic to class of regression adjustedestimators in Freedman (2008b,a); Lin (2013)
• They consider τ(β1, β2) = θ0 − β′1X1 + β′0X0 = θβ , for β = N1
N β0 + N0
N β1.
19
![Page 21: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/21.jpg)
Estimators in this class
• Several previously proposed estimators correspond with θ1 = θ0 − X for an appropriately
specificied θ0 and X .
• Callaway and Sant’Anna (2020) consider estimators that aggregate 2x2 diff-in-diff
estimators:
τCSw =∑t,g
wt,g
(Yt,g − Yt,∞)︸ ︷︷ ︸Diff in period t
− (Yg−1,g − Yg−1,∞)︸ ︷︷ ︸Diff in period g−1
.• This can be viewed as an estimator of the form θ0 − X , where
θ0 =∑t,g
wt,g τt,g∞ and X =∑t,g
wt,g τg−1,g∞
20
![Page 22: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/22.jpg)
Related Staggered Estimators
• Several variants to the Callaway and Sant’Anna (2020) estimator have been proposed that
can likewise be cast into this class
• Callaway and Sant’Anna (2020) propose an alternative estimator using not-yet-treated
instead of never-treated as the comparison
• Sun and Abraham (2020) propose a similar estimator using last-to-be-treated as the
comparison
• de Chaisemartin and D’Haultfœuille (2020)’s estimator equivalent to Callaway and
Sant’Anna (2020) estimator for particular choice of weights, corresponding with
event-study at lag 0
21
![Page 23: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/23.jpg)
The Efficient Estimator
22
![Page 24: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/24.jpg)
Unbiasedness
Proposition 1 (Unbiasedness)The estimator θβ = θ0 − X ′β is unbiased over the randomization distribution for any β,
E[θβ
]= θ for all β.
23
![Page 25: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/25.jpg)
Efficient Estimator
Proposition 2The variance of θβ = θ0 − X ′β is uniquely minimized at
β∗ = (Var[X]
︸ ︷︷ ︸=VX
)−1 Cov[X , θ0
]︸ ︷︷ ︸
=VX ,θ0
if VX is positive definite.
24
![Page 26: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/26.jpg)
Solving for the Variance
Recall that θ0 and X are both linear functions of cohort sample means Yg .
Can write them as:
θ0 =∑g
Aθ,g Yg and X =∑g
A0,g Yg .
Can apply Li and Ding (2017)’s results for experiments with multiple outcomes/treatments:
Proposition 3
Var
[(θ0
X
)]=
( ∑g Ng
−1 Aθ,g Sg A′θ,g−N−1Sθ,
∑g Ng
−1 Aθ,g Sg A′0,g∑
g Ng−1 A0,g Sg A
′θ,g ,
∑g Ng
−1 A0,g Sg A′0,g
),
where Sg = Varf [Yi (g)], Sθ = Varf
[∑g Aθ,gYi (g)
].
• Depends on estimable variances of potential outcomes (Sg ), and
non-estimable variances of treatment effects Sθ.
• But β∗ depends only on estimable quantities not on heterogeneous treatment effects.25
![Page 27: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/27.jpg)
Properties of the Plug-In Efficient Estimator
26
![Page 28: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/28.jpg)
The Plug-In Estimator
• So far we have solved for the efficient β∗, but it depends on the variances of potential
outcomes Sg , which are typically not known ex ante.
• Consider the feasible plug-in efficient estimator based on β∗, which replaces Sg with asample analog Sg in the expression for β∗.
• Sg = 1/(Ng − 1)∑
i 1[Gi = g ](Yi − Yg )(Yi − Yg )′.
• Will show that in large populations the plug-in estimator θβ∗ has similar properties to the
“oracle” estimator θβ∗ .
27
![Page 29: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/29.jpg)
Large population asymptotics
• Consider a sequence of populations in which Ng grows large for all g , satisfying certain
regularity conditions
Assumption 3
(i) Cohort shares converge to a constant:
• For all g ∈ G, Ng/N → pg ∈ (0, 1).
(ii) Variances of potential outcomes converge to a constant:
• For all g , g ′, Sg and Sgg′ have limiting values denoted S∗g and S∗gg′ , respectively, with S∗g positive
definite.
(iii) No individual dominates the variance of potential outcomes (Lindeberg-type condition):
• maxi,g ||Yi (g)− Y (g)||2/N → 0.
28
![Page 30: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/30.jpg)
Asymptotic Properties of the Plug-In Estimator
• Under the given asymptotic conditions, the plug-in efficient estimator is asymptotically
normally distributed with the same variance as the “oracle” efficient estimator.
Proposition 4
Under the given asymptotic conditions,
√N(θβ∗ − θ)→d N
(0, σ2
∗),
where
σ2∗ = lim
N→∞NVar
[θβ∗].
29
![Page 31: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/31.jpg)
Variance Estimation
30
![Page 32: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/32.jpg)
Covariance Estimation
• As is common in finite-population settings, the variance of θβ∗ can only be estimated
conservatively.
• The issue is that the variance of θβ∗ contains the term −Sθ = −Varf
[∑g Aθ,gYi (g)
].
This is not consistently estimable since it depends on covariances of potential outcomes
that are never observed together.
• A natural conservative approach is the Neyman-style variance estimate, which ignores
Sθ and replaces Sg with Sg in the variance formula.
• In paper, we show that a less conservative variance estimator can be obtained byestimating the part of Sθ explained by X .
• Mirrors the use of pre-treatment covariates in Lin (2013); Abadie et al. (2020)
31
![Page 33: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/33.jpg)
What about Fisher Randomization Tests
32
![Page 34: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/34.jpg)
Fisher Randomization Tests
• An alternative approach to inference uses Fisher Randomization Tests (FRTs)
• We show that an FRT using a studentized version of the efficient estimator has thedual advantages :
1. has exact size under the sharp null of no treatment effects for all units;
2. is asymptotically valid for the weak null that θ = 0.
• Studentization is key!
• In general, (unstudentized) FRTs may not have correct size for such weak null hypotheses
even asymptotically (Wu and Ding, 2020).
• We build on Wu and Ding (2020) and Zhao and Ding (2020) to show that studentization
bypass this problem: FRT is asy. equiv. to testing that 0 falls within the t-based confidence
interval CI∗∗
33
![Page 35: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/35.jpg)
Fisher Randomization Tests
The following regularity condition imposes that the means of the potential outcomes have
limits, and that their fourth moment is bounded.
Assumption 4Suppose that for all g , limN→∞ Ef [Yi (g)] = µg <∞, and there exists L <∞ such that
N−1∑
i ||Yi (g)− Ef [Yi (g)] ||4 < L for all N.
34
![Page 36: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/36.jpg)
Fisher Randomization Tests
With this assumption in hand, we can make precise the sense in which the FRT is
asymptotically valid under the weak null.
Proposition 5
Suppose Assumptions 1-4 hold. Let tπ = (θ∗/se)π be the studentized statistic under
permutation π. Then tπ →d N (0, 1), PG -almost surely. Hence, if pFRT is the p-value from
the FRT associated with |tπ|, then under H0 : θ = 0,
limN→∞
P(pFRT ≤ α) ≤ α,
PG -almost surely, with equality if and only if S∗θ = 0.
35
![Page 37: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/37.jpg)
Conclusion
36
![Page 38: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/38.jpg)
Conclusion
• We study staggered rollout designs, in which units randomly receive treatment at different
times.
• Estimation in these settings is often done using generalized difference-in-differences
procedures.
• We solve for the efficient estimator in a class that nests common procedures.
Our proofs draw on parallels to the literature on covariate adjustment in experiments.
• We provide both t-based and permutation-test based methods for randomization inference.
• The plug-in efficient estimator offers substantial precision gains relative to existing
approaches.
37
![Page 39: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/39.jpg)
38
![Page 40: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/40.jpg)
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey M. Wooldridge,
“Sampling-Based versus Design-Based Uncertainty in Regression Analysis,” Econometrica,
2020, 88 (1), 265–296.
Athey, Susan and Guido Imbens, “Design-Based Analysis in Difference-In-Differences
Settings with Staggered Adoption,” Journal of Econometrics, 2021, Forthcoming.
Bojinov, Iavor, Ashesh Rambachan, and Neil Shephard, “Panel Experiments and Dynamic
Causal Effects: A Finite Population Perspective,” arXiv:2003.09915 [stat], 2020. arXiv:
2003.09915.
Borusyak, Kirill and Xavier Jaravel, “Revisiting Event Study Designs,” SSRN Scholarly
Paper ID 2826228, Social Science Research Network, Rochester, NY 2017.
Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences with multiple
time periods,” Journal of Econometrics, December 2020.
38
![Page 41: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/41.jpg)
de Chaisemartin, Clement and Xavier D’Haultfœuille, “Two-Way Fixed Effects Estimators
with Heterogeneous Treatment Effects,” American Economic Review, September 2020, 110
(9), 2964–2996.
Freedman, David A., “On Regression Adjustments in Experiments with Several Treatments,”
The Annals of Applied Statistics, 2008, 2 (1), 176–196.
, “On regression adjustments to experimental data,” Advances in Applied Mathematics,
2008, 40 (2), 180–193.
Goodman-Bacon, Andrew, “Public Insurance and Mortality: Evidence from Medicaid
Implementation,” Journal of Public Economics, 2018, 126 (1), 216–262.
Imbens, Guido W. and Donald B. Rubin, Causal Inference for Statistics, Social, and
Biomedical Sciences: An Introduction, 1 edition ed., New York: Cambridge University Press,
April 2015.
38
![Page 42: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/42.jpg)
Li, Xinran and Peng Ding, “General Forms of Finite Population Central Limit Theorems with
Applications to Causal Inference,” Journal of the American Statistical Association, October
2017, 112 (520), 1759–1769.
Lin, Winston, “Agnostic notes on regression adjustments to experimental data: Reexamining
Freedman’s critique,” Annals of Applied Statistics, March 2013, 7 (1), 295–318.
Malani, Anup and Julian Reif, “Interpreting pre-trends as anticipation: Impact on estimated
treatment effects from tort reform,” Journal of Public Economics, April 2015, 124, 1–17.
Rambachan, Ashesh and Jonathan Roth, “Design-Based Uncertainty for
Quasi-Experiments,” arXiv:2008.00602 [econ.EM], 2020. arXiv: 2003.09915.
Roth, Jonathan and Pedro H. C. Sant’Anna, “When Is Parallel Trends Sensitive to
Functional Form?,” arXiv:2010.04814 [econ, stat], January 2021. arXiv: 2010.04814.
Sun, Liyang and Sarah Abraham, “Estimating dynamic treatment effects in event studies
with heterogeneous treatment effects,” Journal of Econometrics, December 2020.
38
![Page 43: Efficient Estimation for Staggered Rollout Designs...E cient Estimation for Staggered Rollout Designs Jonathan Roth Microsoft Pedro H. C. Sant’Anna Microsoft and Vanderbilt University](https://reader036.vdocuments.net/reader036/viewer/2022070223/61405fa21664f1518558b6b3/html5/thumbnails/43.jpg)
Wood, George, Tom R. Tyler, and Andrew V. Papachristos, “Procedural justice training
reduces police use of force and complaints against officers,” Proceedings of the National
Academy of Sciences, May 2020, 117 (18), 9815–9821.
, , , Jonathan Roth, and Pedro H.C. Sant’Anna, “Revised Findings for “Procedural
justice training reduces police use of force and complaints against officers”,” Working Paper,
2020.
Wu, Jason and Peng Ding, “Randomization Tests for Weak Null Hypotheses in Randomized
Experiments,” Journal of the American Statistical Association, May 2020, pp. 1–16. arXiv:
1809.07419.
Xu, Ruonan, “Potential outcomes and finite-population inference for M-estimators,”
Econometrics Journal, 2021, (Forthcoming).
Zhao, Anqi and Peng Ding, “Covariate-adjusted Fisher randomization tests for the average
treatment effect,” arXiv:2010.14555 [math, stat], November 2020. arXiv: 2010.14555.
37