
Propensity Score Matching: A Primer in R

David Zepeda, Assistant Professor

Supply Chain & Information

Center for Health Policy and Healthcare Research, Brown Bag Series

April 1, 2015

http://cba.neu.edu/

Outline

1. Problem description
2. Theory
3. Two-Step Approach
4. Implementation in R
5. Example 1 – Hospitals
6. Example 2 – Primary Care Clinics
7. Example 3 – Farm Land
8. References

Problem

Consider a study with n observational units. There are two treatment states, 0 and 1.

The response of the ith unit is $Y_{1i}$ if the ith unit received treatment 1 and $Y_{0i}$ if the ith unit received treatment 0.

Mean causal effect = $E(Y_1) - E(Y_0)$

Problem

The goal is to estimate the average effect over observations.

Let $X$ be a vector of the characteristics of the unit (i.e., covariates) that are not consequences of the treatment.

The mean causal effect averaged over all units is the average treatment effect (ATE):

ATE = $E(Y_1 - Y_0) = E_X\left[E(Y_1 - Y_0 \mid X)\right]$
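In a simulation, unlike real observational data, both potential responses are known for every unit, so the sample analogue of the ATE is just the average of the unit-level differences $Y_{1i} - Y_{0i}$. A minimal Python sketch (the deck itself works in R); the outcome values are invented for illustration:

```python
def average_treatment_effect(y1, y0):
    """Sample ATE: mean over units of Y_1i - Y_0i (needs BOTH potential outcomes)."""
    if len(y1) != len(y0) or not y1:
        raise ValueError("need two equal-length, non-empty outcome lists")
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)

# Hypothetical potential outcomes for n = 4 units.
y1 = [5.0, 7.0, 6.0, 10.0]  # response if treated (T = 1)
y0 = [4.0, 5.0, 6.0, 7.0]   # response if control (T = 0)

ate = average_treatment_effect(y1, y0)
print(ate)  # (1 + 2 + 0 + 3) / 4 = 1.5
```

With real data only one of the two lists is observed for each unit, which is exactly the problem the following slides address.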

Problem

An observational unit is generally assigned only one of the two treatments, so only one of its two potential responses is observed.

The treatment is not randomly assigned.

This results in a number of potential problems regarding bias and model dependence.

Problem

Source: Ho, D. E., Imai, K., King, G. & Stuart, E. A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.

Let $T$ be an indicator of treatment ($T = 1$ if treated, $T = 0$ if control).

Now compare a randomly selected treated unit to a randomly selected control unit:

$E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0)$

Because this comparison conditions on $T$, the difference between two randomly selected observational units is not, in general, the mean causal effect $E(Y_1) - E(Y_0)$!
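To see why the naive comparison fails, here is a small deterministic sketch with a hypothetical confounder: units with $X = 1$ have a higher baseline response and are also more likely to be treated. Every unit's true effect is +2, yet comparing treated and control means gives a much larger number. The population and all numbers are invented for illustration:

```python
# Hypothetical population: X = 1 raises the baseline outcome AND makes
# treatment more likely (80% treated vs 20% when X = 0), i.e., a confounder.
units = []
for x, n_treated, n_control in [(1, 80, 20), (0, 20, 80)]:
    for t in [1] * n_treated + [0] * n_control:
        y0 = 10.0 * x   # potential outcome without treatment
        y1 = y0 + 2.0   # treatment adds 2 for everyone, so the ATE is 2
        units.append({"x": x, "t": t, "y0": y0, "y1": y1})

def mean(values):
    values = list(values)
    return sum(values) / len(values)

true_ate = mean(u["y1"] - u["y0"] for u in units)
# Observed data reveal Y_1 only for treated units and Y_0 only for controls.
naive = (mean(u["y1"] for u in units if u["t"] == 1)
         - mean(u["y0"] for u in units if u["t"] == 0))
print(true_ate, naive)  # 2.0 8.0
```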

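Theory

But if treatment assignment is strongly ignorable given $X$, that is, $(Y_1, Y_0) \perp T \mid X$ and $0 < \Pr(T = 1 \mid X) < 1$, then we have

$E(Y_1 \mid X, T = 1) - E(Y_0 \mid X, T = 0) = E(Y_1 - Y_0 \mid X)$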

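Theory

A tool used to estimate treatment effects is a “balancing score”. Suppose $b(X)$ is a function of the covariates such that

$X \perp T \mid b(X)$,

that is, given $b(X)$, the covariates have the same distribution in the treated and control groups. If treatment assignment is strongly ignorable given $X$, it is then also strongly ignorable given $b(X)$:

$(Y_1, Y_0) \perp T \mid b(X)$

Sampling a treated and a control unit with the same value of $b(X)$, we have

$E(Y_1 \mid b(X), T = 1) - E(Y_0 \mid b(X), T = 0) = E(Y_1 - Y_0 \mid b(X))$,

and averaging over the distribution of $b(X)$ gives the ATE, $E(Y_1 - Y_0)$.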
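The averaging step can be sketched by stratifying on a balancing score: compare treated and control means within each value of $b(X)$, then weight the differences by stratum size. In the hypothetical confounded population below (invented numbers, true effect +2), $b(X) = X$ is trivially a balancing score and recovers the effect, while a constant "score", which does not balance $X$, reproduces the naive, biased comparison:

```python
from collections import defaultdict

# Hypothetical population: X is binary, treatment is more common when X = 1,
# and every unit's true treatment effect is +2. Only observed outcomes are kept.
units = []
for x, n_treated, n_control in [(1, 80, 20), (0, 20, 80)]:
    for t in [1] * n_treated + [0] * n_control:
        y = 10.0 * x + (2.0 if t == 1 else 0.0)
        units.append((x, t, y))

def stratified_effect(units, b):
    """Within-stratum treated-minus-control mean differences, averaged with
    stratum-size weights; b maps a unit's covariates to its score b(X)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in units:
        strata[b(x)][t].append(y)
    total = len(units)
    effect = 0.0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            raise ValueError("each stratum needs treated and control units")
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        effect += diff * (len(groups[0]) + len(groups[1])) / total
    return effect

print(stratified_effect(units, b=lambda x: x))  # 2.0 (balancing score)
print(stratified_effect(units, b=lambda x: 0))  # 8.0 (not a balancing score)
```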

Theory

A propensity score is $p(X) = \Pr(T = 1 \mid X)$, the probability that a unit receives treatment given its covariates.

The true propensity score is a “balancing score”: if treatment and control groups have identical propensity score distributions, then all covariates are balanced between the two groups.

Because the true $p(X)$ is unknown, we estimate it with logistic regression or another model (e.g., probit) and then match treated and control units on the estimated score to get matched samples!

We can then apply the same parametric analysis to the matched dataset as would have been used to analyze the original raw dataset before propensity score matching.
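Estimating $p(X)$ with logistic regression means maximizing the binomial likelihood of $T$ given $X$. In R this is normally done by `glm` (which MatchIt calls for you); the hand-rolled Newton-Raphson fit below is only a sketch, for a single covariate, of what that estimation does. With one binary covariate the logistic model is saturated, so the fitted scores should equal the treated share in each group. The data are hypothetical:

```python
import math

def fit_logistic(x, t, iterations=25):
    """Maximum-likelihood fit of Pr(T = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))
    by Newton-Raphson, for a single covariate x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def propensity(b0, b1, xi):
    """Estimated propensity score p(x) for a fitted model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))

# Hypothetical data: 80% of X = 1 units and 20% of X = 0 units are treated.
x = [1] * 100 + [0] * 100
t = [1] * 80 + [0] * 20 + [1] * 20 + [0] * 80
b0, b1 = fit_logistic(x, t)
print(round(propensity(b0, b1, 1), 3), round(propensity(b0, b1, 0), 3))  # 0.8 0.2
```

A probit model would swap the logistic link for the normal CDF; the matching step that follows only needs the fitted scores.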

Two-Step Approach

1. Preprocessing via matching
- Distributions of $X$ are matched as closely as possible (i.e., balance)
- Relationship between $T$ and $X$ is eliminated
- Do not introduce bias
- Do not increase inefficiency too much
- Create a dataset closer to an experiment

2. Parametric analysis after matching
- Holding covariates constant
- Increase robustness
- Less sensitive to modeling assumptions
- A variety of treatment effects can be analyzed (e.g., ATE)
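The two steps can be sketched end to end: step 1 keeps only matched treated/control pairs, and step 2 runs the usual regression of $Y$ on $T$ and $X$ on that matched subset. This is a hedged Python sketch, not MatchIt's implementation; the helper names, the greedy 1:1 matching rule, and the data (generated with an effect of exactly 2, matching on $x$ itself instead of an estimated propensity score) are all hypothetical:

```python
def nearest_neighbor_pairs(treated, controls):
    """Step 1: greedy 1:1 nearest neighbor matching without replacement.
    Units are (matching_score, x, y) tuples."""
    available = list(controls)
    pairs = []
    for unit in treated:
        best = min(available, key=lambda c: abs(c[0] - unit[0]))
        available.remove(best)
        pairs.append((unit, best))
    return pairs

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def ols(rows, y):
    """Step 2: least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated as y = 1 + 2*T + 3*x; the score here is x itself.
treated = [(x, x, 1 + 2 * 1 + 3 * x) for x in (1.0, 2.0, 3.0)]
controls = [(x, x, 1 + 2 * 0 + 3 * x) for x in (0.9, 2.1, 3.2, 5.0, 0.0)]

pairs = nearest_neighbor_pairs(treated, controls)
matched = [(1, tr) for tr, _ in pairs] + [(0, co) for _, co in pairs]
design = [[1.0, float(flag), unit[1]] for flag, unit in matched]
outcome = [unit[2] for _, unit in matched]
intercept, tau, beta_x = ols(design, outcome)
print(round(tau, 6))  # 2.0
```

Because the simulated effect is exactly 2, `tau` comes back as 2 up to floating-point error. With real data the point of matching first is that the regression in step 2 then depends less on getting its functional form right.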

Implementation in R

What is R?
- A language and environment for statistical computing and graphics
- Provides a wide variety of statistical and graphical techniques
- Is highly extensible
- Provides an Open Source route to participation
- Great care has been taken over the defaults for the minor design choices in graphics
- User retains full control
- Available as Free Software!
- Allows users to add additional functionality
- Can be extended (easily) via packages

The R Project for Statistical Computing: http://www.r-project.org/

Implementation in R

MatchIt package
- Dichotomous treatment variable
- Experimental and observational data
- Improving parametric statistical models
- Reduces model dependence
- Semi-parametric and non-parametric preprocessing
- Assess covariate distributions in the two groups (i.e., balance)

Large range of matching methods
- Exact
- Subclassification
- Nearest neighbor
- Optimal
- Genetic

Implementation in R

Exact matching
- Simplest version of matching
- Match each treated unit to all possible control units with exactly the same values on all the covariates
- Sufficient matches often cannot be found

Subclassification
- Forms subclasses with “close” distributions of covariates
- Various subclassification schemes
- Can be used in conjunction with other matching methods

Nearest neighbor matching
- Selects the “best” control matches for each treated unit
- Chooses the not-yet-matched control unit closest to the treated unit
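Exact matching as described above can be sketched in a few lines: group controls by their exact covariate values, pair each treated unit with every control in its group, and set aside treated units that have no exact match (the "sufficient matches often cannot be found" problem). This is a minimal Python illustration, not MatchIt's code; the unit IDs and covariate values are hypothetical:

```python
from collections import defaultdict

def exact_match(treated, controls):
    """Pair each treated unit with ALL controls sharing exactly the same
    covariate values. Units are (id, covariate_tuple) pairs. Returns the
    matches and the list of treated units with no exact match."""
    controls_by_key = defaultdict(list)
    for cid, covs in controls:
        controls_by_key[covs].append(cid)
    matches, unmatched = {}, []
    for tid, covs in treated:
        if controls_by_key.get(covs):  # .get does not insert missing keys
            matches[tid] = controls_by_key[covs]
        else:
            unmatched.append(tid)
    return matches, unmatched

# Hypothetical units with two categorical covariates.
treated = [("t1", ("urban", "large")), ("t2", ("rural", "small")),
           ("t3", ("rural", "large"))]
controls = [("c1", ("urban", "large")), ("c2", ("urban", "large")),
            ("c3", ("rural", "small")), ("c4", ("urban", "small"))]

matches, unmatched = exact_match(treated, controls)
print(matches, unmatched)  # {'t1': ['c1', 'c2'], 't2': ['c3']} ['t3']
```

With continuous covariates or many covariates, almost every covariate tuple is unique, which is why the coarser propensity score is used instead.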

Implementation in R

Optimal matching
- Finds matched samples with the smallest average absolute distance
- Helpful when there are not many appropriate control matches

Genetic matching
- Uses a genetic search algorithm
- Optimal balance achieved after matching
- Performs statistical tests for determining balance

Variety of options for matching methods
- Number of matched control units
- Matching with or without replacement
- Kernel matching
- Discard treated units, control units, or both
- Number of subclasses
- Distance measurement (e.g., logit)
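The difference between nearest neighbor and optimal matching shows up even with two treated units. Greedy nearest neighbor gives the first treated unit its closest control, which can leave the second unit a poor match; optimal matching minimizes the total (equivalently, average) distance over all pairs. The sketch below finds the optimal assignment by brute force, which only works for tiny examples; MatchIt's `method = "optimal"` relies on the optmatch package instead. The scores are hypothetical:

```python
from itertools import permutations

def greedy_match(treated, controls):
    """Nearest neighbor, 1:1 without replacement: take treated units in order
    and give each the closest control not yet used. Returns (t, c) index pairs."""
    available = list(range(len(controls)))
    pairs = []
    for i, score in enumerate(treated):
        j = min(available, key=lambda k: abs(controls[k] - score))
        available.remove(j)
        pairs.append((i, j))
    return pairs

def optimal_match(treated, controls):
    """Optimal 1:1 matching by brute force: the assignment with the smallest
    total absolute distance. Exponential cost, so tiny examples only."""
    best = None
    for chosen in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[i] - controls[j]) for i, j in enumerate(chosen))
        if best is None or cost < best[0]:
            best = (cost, list(enumerate(chosen)))
    return best[1]

def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)

# Hypothetical propensity scores.
treated = [0.50, 0.60]
controls = [0.55, 0.40]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
print(greedy, round(total_distance(greedy, treated, controls), 2))    # [(0, 0), (1, 1)] 0.25
print(optimal, round(total_distance(optimal, treated, controls), 2))  # [(0, 1), (1, 0)] 0.15
```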

Example 1

Association between hospital system affiliation and hospital inventory in California hospitals (Zepeda, Nyaga, & Young, WP 2015)
- California hospital data from 2007 – 2009
- 878 observations (126 affiliated with smaller hospital systems)
- Preprocessing of data on affiliation with smaller hospital systems
- Genetic matching method
- 2 control observations with replacement for every treated observation
- 126 observations in treatment group
- 156 observations in control group
- Propensity score balancing improved by 95%
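A balance improvement figure like the one above is commonly computed as the percent reduction in the absolute treated-minus-control difference in means of a variable (here the propensity score), before versus after matching; older MatchIt summaries report a "Percent Balance Improvement" of this kind. Assuming that definition, a sketch with hypothetical scores (not the study's data) looks like this:

```python
def mean(values):
    return sum(values) / len(values)

def percent_balance_improvement(treated_before, control_before,
                                treated_after, control_after):
    """100 * (|d_before| - |d_after|) / |d_before|, where d is the
    treated-minus-control difference in means of one variable."""
    d_before = mean(treated_before) - mean(control_before)
    d_after = mean(treated_after) - mean(control_after)
    if d_before == 0:
        raise ValueError("no imbalance before matching")
    return 100.0 * (abs(d_before) - abs(d_after)) / abs(d_before)

# Hypothetical propensity scores before and after matching.
pbi = percent_balance_improvement(
    treated_before=[0.6, 0.8], control_before=[0.2, 0.4],
    treated_after=[0.6, 0.8], control_after=[0.5, 0.82],
)
print(round(pbi, 1))  # 90.0
```

A value near 100% means matching removed almost all of the difference in mean propensity score between the groups; it says nothing by itself about balance on individual covariates.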


Example 2

Association between IT-leveraging capability and high quality diabetes care in Minnesota primary care clinics (Zepeda & Sinha, WP 2015)
- Minnesota primary care clinics in 2010
- 450 observations (135 with high IT-leveraging capability)
- Preprocessing of data on high IT-leveraging capability
- Optimal matching method
- 1 control observation without replacement for every treated observation
- 135 observations in treatment group
- 135 observations in control group
- Propensity score balancing improved by 76%


Example 3

Effect of easements on the selling price of farms in Minnesota (Taff & Weisberg, 2007)
- Federal Conservation Reserve Program (CRP)
- Temporary conservation easement by USDA (10-15 years)
- Annual payment by USDA for enrolled land
- Land valuation theory predicts that temporary easements should have no effect on value of properties

Data
- Oct 1, 2002 – Sep 30, 2004
- Farm properties with short-term conservation easements
- Farm properties with no conservation easements
- Covariates
- 2,937 property sales (271 were restricted by CRP contracts)

Example 3

The primary objective
- Compare 271 sales with CRP restrictions to sales without

Standard observational study approach
- Use all sales with no CRP as a comparison group

Potential problem
- Properties sold without a random assignment
- Differences between observable sample and target population may be a cause for bias

Using propensity score matching
- Mimic a randomized experiment
- Sample of non-CRP and CRP sales
- Closely agree on salient property characteristics (i.e., balance)
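One common way to check that matched CRP and non-CRP sales "closely agree" on a characteristic is the standardized mean difference (SMD): the difference in group means divided by a standard deviation. The sketch below uses the pooled SD of the two groups; other tools, MatchIt included, may standardize by a different SD, so values are not directly comparable across software. The farm sizes are hypothetical:

```python
import statistics

def standardized_mean_difference(treated, control):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), a pooled-SD definition."""
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

# Hypothetical farm sizes (acres) for restricted sales and two comparison groups.
crp_sales = [100, 120, 140]
all_non_crp = [40, 60, 80]       # unmatched comparison group
matched_non_crp = [96, 116, 136]  # comparison group after matching

smd_before = standardized_mean_difference(crp_sales, all_non_crp)
smd_after = standardized_mean_difference(crp_sales, matched_non_crp)
print(smd_before, smd_after)  # 3.0 0.2
```

Rules of thumb for an acceptable SMD vary (0.1 and 0.25 are both seen in practice), so the threshold is a judgment call rather than part of the method.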

Example 3

[Figure: medians with upper 75% and lower 25% bands; dotted lines = 95%]

Example 3

Six models developed and tested
- Models 1 – 3: use all data, CRP and portion of land RESTRICTED
- Model 4: restricts data to sales with PRODUCTIVITY measure
- Model 5: matched sample on CRP restriction
- Model 6: matched sample with PRODUCTIVITY measure

Consistency in results
- CRP contracts negatively associated with sales prices
- Most of CRP effect is captured by RESTRICTED amount
- Counter to land valuation theory


References

The R Project for Statistical Computing. http://www.r-project.org/

MatchIt R package. http://gking.harvard.edu/matchit

Ho, D. E., Imai, K., King, G., & Stuart, E. A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.

Examples

Zepeda, D., Nyaga, G., & Young, G. 2015. Supply Chain Risk Management and Hospital Inventory: Effects of System Affiliation. Working Paper.

Zepeda, D. & Sinha, K. 2015. IT-Leveraging Capability for Reducing Health Care Disparities: An Empirical Analysis of Primary Care Operations. Working Paper.

Taff, S. J. & Weisberg, S. 2007. Compensated short-term conservation restrictions may reduce sales prices. The Appraisal Journal, Winter.

Thank You!
