Propensity Score Matching: A Primer in R
David Zepeda
Assistant Professor
Supply Chain & Information [email protected]
Center for Health Policy and Healthcare Research
Brown Bag Series
April 1, 2015
Outline
1. Problem description
2. Theory
3. Two-Step Approach
4. Implementation in R
5. Example 1 – Hospitals
6. Example 2 – Primary Care Clinics
7. Example 3 – Farm Land
8. References
Problem
Consider a study with n observational units. There are two treatment states, 0 and 1.
The response of the ith unit is denoted $Y_i(1)$ if the ith unit received treatment 1 and $Y_i(0)$ if the ith unit received treatment 0.
Mean causal effect = $E[Y_i(1) - Y_i(0)]$
The goal is to estimate the average effect over observations.
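Let $X_i$ be a vector of the characteristics of the ith unit (i.e., covariates) that are not consequences of the treatment.
The mean causal effect averaged over all units is the average treatment effect (ATE).
ATE = $\frac{1}{n} \sum_{i=1}^{n} E[Y_i(1) - Y_i(0) \mid X_i]$, where $Y_i(1)$ and $Y_i(0)$ are the ith unit's responses under treatments 1 and 0.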
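As a rough illustration of the ATE definition, the sketch below simulates both potential responses for every unit, which is only possible in a simulation. The data-generating process and the names x, y0, and y1 are assumptions made for illustration, not part of the original slides.

```r
set.seed(1)
n  <- 1000
x  <- rnorm(n)                  # a single covariate
y0 <- 1 + 2 * x + rnorm(n)      # response if the unit received treatment 0
y1 <- y0 + 3 + 0.5 * x          # response if the unit received treatment 1

unit_effect <- y1 - y0          # unit-level causal effect (never observable in practice)
ate <- mean(unit_effect)        # average treatment effect over the n units
ate
```

In real data only one of y0 and y1 is seen for each unit, which is why the ATE has to be estimated rather than computed this way.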
An observational unit is generally assigned only one of the two treatments.
The treatment is not randomly assigned.
This results in a number of potential problems regarding bias and model dependence.
Source: Ho, D. E., Imai, K., King, G. & Stuart, E.A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.
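Let T be an indicator of treatment (T = 1 if treated, T = 0 if control).
Now compare a randomly selected treated unit to a randomly selected control unit (dropping the unit subscript i):
$E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0]$
Because this comparison conditions on T, the difference between two randomly selected observational units is not in general the mean causal effect $E[Y(1)] - E[Y(0)]$!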
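The point about conditioning on T can be seen in a small simulation. In this sketch, which assumes a made-up data-generating process, units with a larger covariate value are more likely to be treated, so the simple treated-minus-control difference in means drifts away from the true mean causal effect.

```r
set.seed(42)
n  <- 5000
x  <- rnorm(n)
y0 <- 1 + 2 * x + rnorm(n)                 # response under treatment 0
y1 <- y0 + 3                               # response under treatment 1 (true effect is 3)

treat <- rbinom(n, 1, plogis(1.5 * x))     # non-random assignment: depends on x
y     <- ifelse(treat == 1, y1, y0)        # only one response is observed per unit

true_ate <- mean(y1 - y0)
naive    <- mean(y[treat == 1]) - mean(y[treat == 0])
c(true_ate = true_ate, naive = naive)
```

The naive difference mixes the treatment effect with the difference in x between the two groups.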
Theory
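But if treatment assignment is strongly ignorable given the covariates X, that is, $(Y(1), Y(0)) \perp T \mid X$ and $0 < P(T = 1 \mid X) < 1$, then we have
$E[Y(1) \mid T = 1, X] - E[Y(0) \mid T = 0, X] = E[Y(1) - Y(0) \mid X]$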
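A tool used to estimate treatment effects is a “balancing score”. Suppose $b(X)$ is a function of the covariates X such that
$X \perp T \mid b(X)$
Sampling a treated and a control unit with the same value of $b(X)$, we have (when treatment assignment is strongly ignorable)
$E[Y(1) \mid b(X), T = 1] - E[Y(0) \mid b(X), T = 0] = E[Y(1) - Y(0) \mid b(X)]$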
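The balancing idea can be checked numerically. The sketch below assumes a simulated dataset with two covariates and uses the true treatment probability as the balancing score. It compares the treated-versus-control gap in one covariate overall and within deciles of the score. Deciles are coarse, so the within-group balance is only approximate.

```r
set.seed(7)
n  <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(0.8 * x1 + 0.8 * x2)      # true probability of treatment given x1, x2
treat <- rbinom(n, 1, p)

# Raw gap in x1 between treated and control units
overall_gap <- mean(x1[treat == 1]) - mean(x1[treat == 0])

# Gap in x1 among units with similar values of the score (deciles of p)
decile <- cut(p, breaks = quantile(p, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE)
gap_by_decile <- sapply(split(data.frame(x1, treat), decile), function(d) {
  mean(d$x1[d$treat == 1]) - mean(d$x1[d$treat == 0])
})

round(c(overall = overall_gap, largest_within = max(abs(gap_by_decile))), 3)
```

Narrower score intervals should shrink the within-group gaps further, at the cost of fewer units per interval.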
A propensity score is $p(X) = P(T = 1 \mid X)$, the probability of receiving treatment given the covariates X.
The true propensity score is a “balancing score” such that if treatment and control groups have identical propensity score distributions then all covariates are balanced between the two groups.
Therefore, we can use logistic regression or another approach (e.g., probit) to estimate p so we can get matched samples!
We can now use the same parametric analysis on the matched dataset as would have been used to analyze the original raw dataset prior to propensity score matching.
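A minimal sketch of estimating the propensity score with logistic regression in base R follows. The simulated data and the variable names treat, x1, and x2 are assumptions for illustration.

```r
set.seed(123)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.6 * x2))
dat <- data.frame(treat, x1, x2)

# Logistic regression of the treatment indicator on the covariates
ps_model <- glm(treat ~ x1 + x2, family = binomial(link = "logit"), data = dat)

# Fitted values are the estimated probabilities of treatment
dat$pscore <- fitted(ps_model)
summary(dat$pscore)

# A probit model would only change the link:
# glm(treat ~ x1 + x2, family = binomial(link = "probit"), data = dat)
```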
Two-Step Approach
1. Preprocessing via matching
   - Distributions of X are matched as closely as possible between the treated and control groups (i.e., balance)
   - Relationship between T and X is eliminated
   - Do not introduce bias
   - Do not increase inefficiency too much
   - Create a dataset closer to an experiment
2. Parametric analysis after matching
   - Holding covariates constant
   - Increase robustness
   - Less sensitive to modeling assumptions
   - A variety of treatment effects can be analyzed (e.g., ATE)
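The two steps can be sketched in base R on simulated data. This is a hand-rolled greedy 1:1 nearest neighbor match on the estimated propensity score, written only to show the flow. It is not the algorithm of any particular package, and all variable names are hypothetical.

```r
set.seed(2015)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + x1 + 0.5 * x2))   # fewer treated than control units
y  <- 2 * treat + 1.5 * x1 + x2 + rnorm(n)          # true treatment effect is 2
dat <- data.frame(y, treat, x1, x2)

# Step 1: preprocessing. Estimate the propensity score, then give each treated
# unit the closest control unit that has not been used yet (without replacement).
dat$ps <- fitted(glm(treat ~ x1 + x2, family = binomial, data = dat))
treated  <- which(dat$treat == 1)
controls <- which(dat$treat == 0)
matched_controls <- integer(0)
for (i in treated) {
  if (length(controls) == 0) break
  j <- controls[which.min(abs(dat$ps[controls] - dat$ps[i]))]
  matched_controls <- c(matched_controls, j)
  controls <- setdiff(controls, j)
}
matched <- dat[c(treated[seq_along(matched_controls)], matched_controls), ]

# Step 2: parametric analysis on the matched data.
raw_diff     <- mean(dat$y[dat$treat == 1]) - mean(dat$y[dat$treat == 0])
matched_diff <- mean(matched$y[matched$treat == 1]) - mean(matched$y[matched$treat == 0])
fit_matched  <- lm(y ~ treat + x1 + x2, data = matched)

c(raw_diff = raw_diff, matched_diff = matched_diff,
  matched_lm = coef(fit_matched)[["treat"]])
```

Comparing the unadjusted difference before and after matching gives a feel for how much the preprocessing step alone reduces dependence on the outcome model.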
Implementation in R

What is R?
- A language and environment for statistical computing and graphics
- Provides a wide variety of statistical and graphical techniques
- Is highly extensible
- Provides an Open Source route to participation
- Great care has been taken over the defaults for the minor design choices in graphics
- User retains full control
- Available as Free Software!
- Allows users to add additional functionality
- Can be extended (easily) via packages

The R Project for Statistical Computing
http://www.r-project.org/
MatchIt Package
- Dichotomous treatment variable
- Experimental and observational data
- Improving parametric statistical models
- Reduces model dependence
- Semi-parametric and non-parametric preprocessing
- Assess covariate distributions in the two groups (i.e., balance)

Large range of matching methods
- Exact
- Subclassification
- Nearest neighbor
- Optimal
- Genetic
Exact matching
- Simplest version of matching
- Match each treated unit to all possible control units
- Exactly the same values on all the covariates
- Sufficient matches often cannot be found

Subclassification
- Forms subclasses with “close” distributions of covariates
- Various subclassification schemes
- Can be used in conjunction with other matching methods

Nearest neighbor matching
- Selects “best” control matches for each treated unit
- Chooses the control unit not yet matched closest to treated unit
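Exact matching is simple enough to sketch by hand in base R. The example below assumes a simulated dataset with two discrete covariates (the hypothetical names urban and size) and keeps only the covariate cells that contain both treated and control units.

```r
set.seed(3)
n <- 400
dat <- data.frame(
  treat = rbinom(n, 1, 0.3),
  urban = rbinom(n, 1, 0.5),
  size  = sample(c("small", "medium", "large"), n, replace = TRUE)
)

# A cell is one unique combination of covariate values
dat$cell <- interaction(dat$urban, dat$size, drop = TRUE)

# Count treated and control units per cell
n_treat   <- tapply(dat$treat == 1, dat$cell, sum)
n_control <- tapply(dat$treat == 0, dat$cell, sum)

# Keep cells with at least one treated and one control unit
keep_cells <- names(n_treat)[n_treat > 0 & n_control > 0]
matched <- dat[dat$cell %in% keep_cells, ]
table(matched$cell, matched$treat)
```

With many covariates, or any continuous one, most cells would hold a single unit, which is why sufficient exact matches often cannot be found.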
Optimal matching
- Finds matched samples with smallest average absolute distance
- Helpful when there are not many appropriate control matches

Genetic matching
- Uses a genetic search algorithm
- Optimal balance achieved after matching
- Performs statistical tests for determining balance

Variety of options for matching methods
- Number of matched control units
- Matching with or without replacement
- Kernel matching
- Discard treated units, control units, or both
- Number of subclasses
- Distance measurement (e.g., logit)
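A hedged sketch of how these choices map onto a MatchIt call follows. It assumes the MatchIt package is installed and uses the lalonde dataset that ships with it. The covariates and option values are chosen only for illustration, and the code is skipped if the package is missing. Switching to method = "optimal" or method = "genetic" requires additional packages.

```r
# Assumption: MatchIt is installed; the whole block is skipped otherwise.
if (requireNamespace("MatchIt", quietly = TRUE)) {
  library(MatchIt)
  data("lalonde", package = "MatchIt")

  # Step 1: nearest neighbor matching on an estimated propensity score,
  # 2 control units per treated unit, with replacement.
  m.out <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                   data = lalonde, method = "nearest",
                   ratio = 2, replace = TRUE)
  print(summary(m.out))        # covariate balance before and after matching

  # Step 2: parametric analysis on the matched data, using the matching weights.
  m.data <- match.data(m.out)
  fit <- lm(re78 ~ treat + age + educ + married + nodegree + re74 + re75,
            data = m.data, weights = weights)
  print(coef(fit)["treat"])
}
```

The weights column returned by match.data() matters here because control units can be reused when matching with replacement.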
Example 1

Association between hospital system affiliation and hospital inventory in California hospitals (Zepeda, Nyaga, & Young, WP 2015)
- California hospital data from 2007 – 2009
- 878 observations (126 affiliated with smaller hospital systems)
- Preprocessing of data on affiliation with smaller hospital systems
- Genetic matching method
- 2 control observations with replacement for every treated observation
- 126 observations in treatment group
- 156 observations in control group
- Propensity score balancing improved by 95%
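The slides do not say how the balance improvement figure was computed. One common definition, which is the one MatchIt's summary output uses for its percent balance improvement, compares the absolute treated-versus-control difference before and after matching. The sketch below assumes that definition, and the input numbers are hypothetical.

```r
# Percent balance improvement: how much of the original imbalance was removed
pct_balance_improvement <- function(before, after) {
  100 * (abs(before) - abs(after)) / abs(before)
}

# Hypothetical mean differences in propensity score between the two groups
improvement <- pct_balance_improvement(before = 0.20, after = 0.01)
improvement
```

A negative value would mean that matching made the balance on that quantity worse.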
Example 2

Association between IT-leveraging capability and high quality diabetes care in Minnesota primary care clinics (Zepeda & Sinha, WP 2015)
- Minnesota primary care clinics in 2010
- 450 observations (135 with high IT-leveraging capability)
- Preprocessing of data on high IT-leveraging capability
- Optimal matching method
- 1 control observation without replacement for every treated observation
- 135 observations in treatment group
- 135 observations in control group
- Propensity score balancing improved by 76%
Example 3

Effect of easements on the selling price of farms in Minnesota (Taff & Weisberg, 2007)
- Federal Conservation Reserve Program (CRP)
- Temporary conservation easement by USDA (10-15 years)
- Annual payment by USDA for enrolled land
- Land valuation theory predicts that temporary easements should have no effect on value of properties

Data
- Oct 1, 2002 – Sep 30, 2004
- Farm properties with short-term conservation easements
- Farm properties with no conservation easements
- Covariates
- 2,937 property sales (271 were restricted by CRP contracts)
The primary objective
- Compare 271 sales with CRP restrictions to sales without

Standard observational study approach
- Use all sales with no CRP as a comparison group

Potential problem
- Properties sold without a random assignment
- Differences between observable sample and target population may be a cause for bias

Using propensity score matching
- Mimic a randomized experiment
- Sample of non-CRP and CRP sales
- Closely agree on salient property characteristics (i.e., balance)
[Figure legend: medians; upper 75%; lower 25%; dotted lines = 95%]

Six models developed and tested
- Models 1 – 3: use all data, CRP and portion of land RESTRICTED
- Model 4: restricts data to sales with PRODUCTIVITY measure
- Model 5: matched sample on CRP restriction
- Model 6: matched sample with PRODUCTIVITY measure

Consistency in results
- CRP contracts negatively associated with sales prices
- Most of CRP effect is captured by RESTRICTED amount
- Counter to land valuation theory
References

The R Project for Statistical Computing
http://www.r-project.org/

MatchIt R Package
http://gking.harvard.edu/matchit
Ho, D. E., Imai, K., King, G. & Stuart, E.A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.

Examples
Zepeda, D., Nyaga, G., & Young, G. 2015. Supply Chain Risk Management and Hospital Inventory: Effects of System Affiliation. Working Paper.
Zepeda, D. & Sinha, K. IT-Leveraging Capability for Reducing Health Care Disparities: An Empirical Analysis of Primary Care Operations. Working Paper.
Taff, S.J. & Weisberg, S. 2007. Compensated short-term conservation restrictions may reduce sales prices. The Appraisal Journal, Winter.