rerandomization in randomized experiments

Rerandomization in Randomized Experiments

Kari Lock and Don Rubin

Harvard University

JSM 2010

The “Gold Standard”

Why are randomized experiments so good?

• They yield unbiased estimates of the treatment effect

• They eliminate (?) confounding factors… ON AVERAGE. For any particular experiment, covariate imbalance is possible (and likely)

Rerandomization

• Suppose you are doing a randomized experiment and have covariate information available before conducting the experiment

• You randomize to treatment and control, but get a “bad” randomization

• Can you rerandomize?

• Yes, but you first need to specify a concrete definition of “bad”

1) Collect covariate data. Specify a criterion, based on covariate balance, determining when a randomization is unacceptable.

2) (Re)randomize subjects to treated and control. Check covariate balance: if unacceptable, rerandomize; if acceptable, continue.

3) Conduct experiment.

4) Analyze results with a Fisher randomization test.
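The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rerandomize`, `randomization_test`), the single-covariate balance rule with its 2-year threshold, and the data are all hypothetical. The reference distribution for the test is drawn only from acceptable randomizations, so the analysis mirrors the design.

```python
import random
from statistics import mean

def diff_in_means(values, treated):
    """Mean of `values` over treated units minus mean over control units."""
    t = [v for v, w in zip(values, treated) if w]
    c = [v for v, w in zip(values, treated) if not w]
    return mean(t) - mean(c)

def rerandomize(n, acceptable, rng, max_tries=100_000):
    """Draw equal-sized random assignments until `acceptable` returns True."""
    base = [True] * (n // 2) + [False] * (n - n // 2)
    for _ in range(max_tries):
        treated = base[:]
        rng.shuffle(treated)
        if acceptable(treated):
            return treated
    raise RuntimeError("no acceptable randomization found; criterion too strict?")

def randomization_test(outcome, treated, acceptable, rng, draws=2000):
    """Monte Carlo Fisher randomization p-value for the sharp null of no effect.

    Only assignments that pass the design's balance criterion enter the
    reference distribution.
    """
    observed = abs(diff_in_means(outcome, treated))
    hits = 0
    for _ in range(draws):
        w = rerandomize(len(outcome), acceptable, rng)
        if abs(diff_in_means(outcome, w)) >= observed:
            hits += 1
    return hits / draws

rng = random.Random(2010)

# Step 1: hypothetical covariate data and a balance criterion fixed in advance.
age = [23, 35, 41, 29, 52, 47, 31, 38, 26, 44, 33, 50]

def acceptable(treated):
    # Symmetric rule: swapping the two groups leaves the decision unchanged.
    return abs(diff_in_means(age, treated)) <= 2.0

# Step 2: (re)randomize until the criterion is met.
treated = rerandomize(len(age), acceptable, rng)

# Step 3: "conduct" the experiment; outcomes are simulated with no treatment effect.
outcome = [0.1 * x + rng.gauss(0, 1) for x in age]

# Step 4: analyze with a randomization test that respects the criterion.
p_value = randomization_test(outcome, treated, acceptable, rng)
```

The `max_tries` guard is a design choice of this sketch: a criterion that almost no assignment satisfies fails loudly instead of looping forever.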

Unbiased

To maintain an unbiased estimate of the treatment effect, the decision to rerandomize or not must be automatic and specified in advance, blind to which group is treated

Theorem: If the treated and control groups are the same size, and if for every unacceptable randomization the exact opposite randomization is also unacceptable, then rerandomization yields an unbiased estimate of the treatment effect.
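The theorem can be checked exhaustively on a toy example. The sketch below assumes six hypothetical units with made-up covariate values and potential outcomes, enumerates every equal-sized assignment, keeps those that pass a symmetric balance rule, and compares the average difference in means over the accepted assignments with the true average effect. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical covariate and potential outcomes for six units.
x = [1, 2, 4, 7, 8, 11]       # covariate
y0 = [3, 5, 4, 9, 8, 12]      # outcome if assigned to control
y1 = [4, 9, 5, 9, 13, 14]     # outcome if assigned to treatment
n = len(x)
units = range(n)

true_effect = Fraction(sum(y1) - sum(y0), n)

def mean_diff(values_t, values_c, treated):
    """Treated-group mean of values_t minus control-group mean of values_c."""
    control = [i for i in units if i not in treated]
    t = Fraction(sum(values_t[i] for i in treated), len(treated))
    c = Fraction(sum(values_c[i] for i in control), len(control))
    return t - c

def acceptable(treated):
    # Symmetric rule: swapping the groups only flips the sign of the difference,
    # so an assignment and its exact opposite are accepted or rejected together.
    return abs(mean_diff(x, x, treated)) <= 2

accepted = [set(t) for t in combinations(units, n // 2) if acceptable(set(t))]
estimates = [mean_diff(y1, y0, t) for t in accepted]
average_estimate = sum(estimates) / len(estimates)

# Averaged over the accepted assignments, the estimator equals the true effect.
assert average_estimate == true_effect
```

Replacing the rule with a one-sided one, for example accepting only when the treated covariate mean is the larger, breaks the symmetry condition, and the equality generally no longer holds.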

Mahalanobis Distance

Define overall covariate distance by $M = D' r^{-1} D$

$D_j$: standardized difference between treated and control covariate means for covariate $j$

$k$ = number of covariates

$D = (D_1, \ldots, D_k)$

$r$ = covariate correlation matrix = $\mathrm{cov}(D)$

Under adequate sample sizes and pure randomization: $M \sim \chi^2_k$

Choose $a$ and rerandomize when $M > a$
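A minimal standard-library sketch of computing M for one candidate assignment follows. It relies on the algebraically equivalent form M = d' [cov(d)]^{-1} d, where d is the raw difference in covariate means and cov(d) is the sample covariance matrix of the covariates times (1/n_T + 1/n_C); standardizing d and using the correlation matrix gives the same number. The helper names and the data are hypothetical.

```python
from statistics import mean

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 divisor) of a list of covariate rows."""
    n, k = len(rows), len(rows[0])
    centre = [mean(r[j] for r in rows) for j in range(k)]
    return [[sum((r[i] - centre[i]) * (r[j] - centre[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x

def mahalanobis_m(rows, treated):
    """M = d' [cov(d)]^{-1} d, with d the treated-minus-control mean difference."""
    k = len(rows[0])
    t = [r for r, w in zip(rows, treated) if w]
    c = [r for r, w in zip(rows, treated) if not w]
    d = [mean(r[j] for r in t) - mean(r[j] for r in c) for j in range(k)]
    scale = 1 / len(t) + 1 / len(c)
    cov_d = [[s * scale for s in row] for row in covariance_matrix(rows)]
    return sum(di * zi for di, zi in zip(d, solve(cov_d, d)))

# Hypothetical data: (age, GPA) for eight subjects and one candidate assignment.
covariates = [(23, 3.1), (35, 2.8), (41, 3.6), (29, 3.9),
              (52, 2.5), (47, 3.3), (31, 3.0), (38, 3.7)]
treated = [True, False, True, False, True, False, True, False]
m = mahalanobis_m(covariates, treated)

# Rescaling and shifting the covariates leaves M unchanged (up to rounding).
rescaled = [(12 * age + 5, 100 * gpa) for age, gpa in covariates]
m_rescaled = mahalanobis_m(rescaled, treated)
```

The assignment would be rejected when `m` exceeds the chosen threshold a.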

Rerandomization Based on M

• Since M follows a known distribution, easy to specify the proportion of rejected randomizations

• M is affinely invariant

• Correlations between covariates are maintained

• The variance reduction on each covariate is the same (and known)

• The variance reduction for any linear combination of the covariates is known
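Because M is approximately chi-square with k degrees of freedom under pure randomization, the threshold a can be set from the proportion of randomizations one is willing to accept. The sketch below uses only the standard library; the function names and the example acceptance proportion of 1% are assumptions for illustration. The chi-square CDF is evaluated with the usual power series for the incomplete gamma function and inverted by bisection.

```python
import math

def chi2_cdf(a, k):
    """P(chi-square with k degrees of freedom <= a), via the gamma series."""
    if a <= 0:
        return 0.0
    s, x = k / 2, a / 2
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s + 1))

def threshold_for_acceptance(p_accept, k):
    """Find a such that P(M <= a) = p_accept when M ~ chi-square(k)."""
    if not 0 < p_accept < 1:
        raise ValueError("p_accept must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p_accept:   # expand until the root is bracketed
        hi *= 2
    for _ in range(200):                # bisection
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p_accept:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical choice: 8 covariates, accept about 1% of randomizations.
a = threshold_for_acceptance(0.01, 8)
expected_draws = 1 / 0.01   # average number of randomizations until acceptance
```

A smaller acceptance proportion gives tighter balance at the cost of more draws; on average 1 / p_accept randomizations are needed before one is accepted.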

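Rerandomization

Theorem: If $n_T = n_C$ and rerandomization occurs when $M > a$, then

$E(\bar{X}_T - \bar{X}_C \mid M \le a) = 0$

and

$\mathrm{cov}(\bar{X}_T - \bar{X}_C \mid M \le a) = v_a \, \mathrm{cov}(\bar{X}_T - \bar{X}_C)$,

where

$v_a = \dfrac{2}{k} \cdot \dfrac{\gamma(k/2 + 1,\; a/2)}{\gamma(k/2,\; a/2)}$ and $\gamma$ is the incomplete gamma function.

For the outcome, when the treatment has no effect,

$E(\bar{Y}_T - \bar{Y}_C \mid M \le a) = 0$

and

$\mathrm{var}(\bar{Y}_T - \bar{Y}_C \mid M \le a) = \left(1 - (1 - v_a) R^2\right) \mathrm{var}(\bar{Y}_T - \bar{Y}_C)$,

where $R^2$ is the squared multiple correlation between the outcome and the covariates.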
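The quantity v_a, the ratio of two incomplete gamma functions scaled by 2/k, can be evaluated numerically. The sketch below is a standard-library illustration with hypothetical function names; it uses the power series for the lower incomplete gamma function and also computes the outcome variance factor 1 - (1 - v_a) R^2. The values of k, a and R^2 in the usage lines are made up.

```python
import math

def lower_incomplete_gamma(s, x):
    """gamma(s, x) = integral from 0 to x of t^(s-1) e^(-t) dt, by power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x)

def v_a(k, a):
    """Remaining fraction of covariate variance when rerandomizing if M > a."""
    upper = lower_incomplete_gamma(k / 2 + 1, a / 2)
    lower = lower_incomplete_gamma(k / 2, a / 2)
    return (2 / k) * upper / lower

def outcome_variance_ratio(k, a, r_squared):
    """Remaining fraction of outcome variance: 1 - (1 - v_a) R^2."""
    return 1 - (1 - v_a(k, a)) * r_squared

# Hypothetical inputs: 8 covariates, threshold a = 2.0, R^2 = 0.5.
covariate_ratio = v_a(8, 2.0)
outcome_ratio = outcome_variance_ratio(8, 2.0, 0.5)
```

Both ratios lie between 0 and 1, and the outcome ratio approaches 1 as R^2 approaches 0, which matches the remark that the gain in precision depends on the covariates being correlated with the response.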

Difference in Covariate Means

[Figure: Standardized Differences in Covariate Means, Pure Randomization vs. Re-Randomization; horizontal axis from -4 to 4; one row per covariate: male, age, collgpaa, actcomp, preflit, likelit, likemath, numbmath]

Per-covariate values of $\mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid M \le a) \,/\, \mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C})$ shown beside the plot: 0.14, 0.15, 0.17, 0.14, 0.16, 0.16, 0.16, 0.15 (theoretical $v_a$ = .16)

Difference in Outcome Means Under Null

[Figure: Difference in Means for the Math and Verbal outcomes, Pure Randomization vs. Re-Randomization; horizontal axis from -1.0 to 1.0 in each panel]

$\mathrm{var}(\bar{Y}_T - \bar{Y}_C \mid M \le a) \,/\, \mathrm{var}(\bar{Y}_T - \bar{Y}_C) = .57$ (theory = .58)

Equivalent to increasing the sample size by a factor of 1.7
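The sample-size equivalence quoted above can be checked with a line of arithmetic. The variance of a difference in means is inversely proportional to the sample size, so a variance ratio of about .58 corresponds to what pure randomization would achieve with 1/.58 times as many subjects. The second line is an inference rather than something stated on the slides: it assumes the theoretical values .58 and $v_a$ = .16 refer to the same threshold and covariates, with $R^2$ denoting the squared multiple correlation between outcome and covariates.

```latex
\[
\frac{\operatorname{var}(\bar{Y}_T - \bar{Y}_C \mid M \le a)}
     {\operatorname{var}(\bar{Y}_T - \bar{Y}_C)}
  = 1 - (1 - v_a) R^2 \approx 0.58
\quad\Longrightarrow\quad
\text{sample-size factor} = \frac{1}{0.58} \approx 1.7
\]
\[
v_a = 0.16:\qquad
R^2 = \frac{1 - 0.58}{1 - 0.16} = \frac{0.42}{0.84} = 0.5
\]
```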

Conclusion

• Rerandomization improves covariate balance between the treated and control groups, and increases precision in estimating the treatment effect if the covariates are correlated with the response

• Rerandomization gives the researcher more power to detect a significant result, and more faith that an observed effect is really due to the treatment

