kernel matching with automatic bandwidth selection - stata · c cccccccc ccc cc ccc cc c cc cccc...

61
Kernel matching with automatic bandwidth selection Ben Jann University of Bern, [email protected] 2017 London Stata Users Group meeting London, September 7–8, 2017 Ben Jann (University of Bern) Kernel matching London, 07.09.2017 1

Upload: voquynh

Post on 26-May-2018

439 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Kernel matching with automatic bandwidth selection

Ben Jann

University of Bern benjannsozunibech

2017 London Stata Users Group meetingLondon September 7ndash8 2017

Ben Jann (University of Bern) Kernel matching London 07092017 1

Contents

1 BackgroundWhat is MatchingMultivariate Distance Matching (MDM)Propensity Score Matching (PSM)Matching AlgorithmsldquoWhy PSM Should Not Be Used for Matchingrdquo

2 The kmatch commandFeaturesExamplesSome Simulation Results

3 Conclusions

Ben Jann (University of Bern) Kernel matching London 07092017 2

What is Matching

Matching is an approach to ldquocondition on X rdquo between a treatmentgroup and a control group

Basic idea1 For each observation in the treatment group find ldquostatistical twinsrdquo in

the control group with the same (or at least very similar) X values2 The Y values of these matching observations are then used to

compute the counterfactual outcome without treatment for theobservation at hand

3 An estimate for the average treatment effect can be obtained as themean of the differences between the observed values and theldquoimputedrdquo counterfactual values over all observations

Ben Jann (University of Bern) Kernel matching London 07092017 3

What is Matching

Formally

ATT =1

NT=1

sumi |T=1

[Yi minus Y 0

i

]with Y 0

i =sum

j |T=0

wijYj

ATC =1

NT=0

sumi |T=0

[Y 1

i minus Yi

]with Y 1

i =sum

j |T=1

wijYj

ATE =NT=1

Nmiddot ATT +

NT=0

Nmiddot ATC

Different matching algorithms use different definitions of wij

ATE average treatment effect ATT ate on the treated ATC ate on the untreatedT treatment indicator (01)Y observed outcome Y 1 potential outcome with treatment Y 0 po without treatment

Ben Jann (University of Bern) Kernel matching London 07092017 4

Exact Matching

Exact matchingwij =

1ki if Xi = Xj

0 else

with ki as the number of observations for which Xi = Xj applies

The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)

Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)

Ben Jann (University of Bern) Kernel matching London 07092017 5

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 2: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Contents

1 BackgroundWhat is MatchingMultivariate Distance Matching (MDM)Propensity Score Matching (PSM)Matching AlgorithmsldquoWhy PSM Should Not Be Used for Matchingrdquo

2 The kmatch commandFeaturesExamplesSome Simulation Results

3 Conclusions

Ben Jann (University of Bern) Kernel matching London 07092017 2

What is Matching

Matching is an approach to ldquocondition on X rdquo between a treatmentgroup and a control group

Basic idea1 For each observation in the treatment group find ldquostatistical twinsrdquo in

the control group with the same (or at least very similar) X values2 The Y values of these matching observations are then used to

compute the counterfactual outcome without treatment for theobservation at hand

3 An estimate for the average treatment effect can be obtained as themean of the differences between the observed values and theldquoimputedrdquo counterfactual values over all observations

Ben Jann (University of Bern) Kernel matching London 07092017 3

What is Matching

Formally

ATT =1

NT=1

sumi |T=1

[Yi minus Y 0

i

]with Y 0

i =sum

j |T=0

wijYj

ATC =1

NT=0

sumi |T=0

[Y 1

i minus Yi

]with Y 1

i =sum

j |T=1

wijYj

ATE =NT=1

Nmiddot ATT +

NT=0

Nmiddot ATC

Different matching algorithms use different definitions of wij

ATE average treatment effect ATT ate on the treated ATC ate on the untreatedT treatment indicator (01)Y observed outcome Y 1 potential outcome with treatment Y 0 po without treatment

Ben Jann (University of Bern) Kernel matching London 07092017 4

Exact Matching

Exact matchingwij =

1ki if Xi = Xj

0 else

with ki as the number of observations for which Xi = Xj applies

The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)

Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)

Ben Jann (University of Bern) Kernel matching London 07092017 5

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 3: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

What is Matching

Matching is an approach to ldquocondition on X rdquo between a treatmentgroup and a control group

Basic idea1 For each observation in the treatment group find ldquostatistical twinsrdquo in

the control group with the same (or at least very similar) X values2 The Y values of these matching observations are then used to

compute the counterfactual outcome without treatment for theobservation at hand

3 An estimate for the average treatment effect can be obtained as themean of the differences between the observed values and theldquoimputedrdquo counterfactual values over all observations

Ben Jann (University of Bern) Kernel matching London 07092017 3

What is Matching

Formally

ATT =1

NT=1

sumi |T=1

[Yi minus Y 0

i

]with Y 0

i =sum

j |T=0

wijYj

ATC =1

NT=0

sumi |T=0

[Y 1

i minus Yi

]with Y 1

i =sum

j |T=1

wijYj

ATE =NT=1

Nmiddot ATT +

NT=0

Nmiddot ATC

Different matching algorithms use different definitions of wij

ATE average treatment effect ATT ate on the treated ATC ate on the untreatedT treatment indicator (01)Y observed outcome Y 1 potential outcome with treatment Y 0 po without treatment

Ben Jann (University of Bern) Kernel matching London 07092017 4

Exact Matching

Exact matchingwij =

1ki if Xi = Xj

0 else

with ki as the number of observations for which Xi = Xj applies

The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)

Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)

Ben Jann (University of Bern) Kernel matching London 07092017 5

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 4: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

What is Matching

Formally

ATT =1

NT=1

sumi |T=1

[Yi minus Y 0

i

]with Y 0

i =sum

j |T=0

wijYj

ATC =1

NT=0

sumi |T=0

[Y 1

i minus Yi

]with Y 1

i =sum

j |T=1

wijYj

ATE =NT=1

Nmiddot ATT +

NT=0

Nmiddot ATC

Different matching algorithms use different definitions of wij

ATE average treatment effect ATT ate on the treated ATC ate on the untreatedT treatment indicator (01)Y observed outcome Y 1 potential outcome with treatment Y 0 po without treatment

Ben Jann (University of Bern) Kernel matching London 07092017 4

Exact Matching

Exact matchingwij =

1ki if Xi = Xj

0 else

with ki as the number of observations for which Xi = Xj applies

The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)

Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)

Ben Jann (University of Bern) Kernel matching London 07092017 5

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 5: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Exact Matching

Exact matchingwij =

1ki if Xi = Xj

0 else

with ki as the number of observations for which Xi = Xj applies

The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)

Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)

Ben Jann (University of Bern) Kernel matching London 07092017 5

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 6: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X

The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches

A common approach is to use

MD(Xi Xj) =radic

(Xi minus Xj)primeΣminus1(Xi minus Xj)

as distance metric where Σ is an appropriate scaling matrix

I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X

Ben Jann (University of Bern) Kernel matching London 07092017 6

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 7: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Propensity Score Matching (PSM)

(Y 0Y 1) perpperp T |X implies (Y 0Y 1) perpperp T |π(X ) where π(X ) is thetreatment probability conditional on X (the ldquopropensity scorerdquo)(Rosenbaum and Rubin 1983)

This simplifies the matching task as we can match onone-dimensional π(X ) instead of multi-dimensional X

ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances

PSM is very popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)

Ben Jann (University of Bern) Kernel matching London 07092017 7

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 8: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Matching Algorithms

Various matching algorithms can be used to find potential matchesbased on MD or π(X ) and determine the matching weights wij

Pair matching (one-to-one matching without replacement)I For each observation in the treatment group find the closestobservation in the control group Each control is only used once

Nearest-neighbor matching (with replacement)I For each observation in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times In case of ties use all ties as matches k is set by theresearcher

Caliper matchingI Like nearest-neighbor matching but only use controls with a distancesmaller than some threshold c

Ben Jann (University of Bern) Kernel matching London 07092017 8

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 9: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Matching Algorithms

Radius matchingI Use all controls with a distance smaller than some threshold c

Kernel matchingI Like radius matching but give larger weight to controls with smallerdistances (using some kernel function such as eg the Epanechnikovkernel)

Optional remove remaining imbalance after matching usingregression adjustment (aka ldquobias correctionrdquo in the context ofnearest-neighbor matching)

Ben Jann (University of Bern) Kernel matching London 07092017 9

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 10: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

ldquoWhy PSM Should Not Be Used for Matchingrdquo

The message of a recent paper by Gary King and Richard Nielsen isDo not use PSM it is really really badI The paper httpjmp1sexgVwI Slides httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6

I Watch it httpswwwyoutubecomwatchv=rBv39pK1iEs

Their argument goes about as followsI In experimental language PSM approximates complete randomizationI Other methods such as MDM approximate fully blocked

randomizationI A fully blocked design is more efficient It leads to less data imbalanceand less ldquomodel dependencerdquo (dependence of results on modelingdecisions by the researcher)

I Hence procedures such as MDM dominate PSMI King and Nielsen provide evidence suggesting that PSM performsshockingly bad

Ben Jann (University of Bern) Kernel matching London 07092017 10

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 11: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Matching Finding Hidden Randomized Experiments

Types of Experiments

BalanceCovariates

CompleteRandomization

FullyBlocked

Observed On average ExactUnobserved On average On average

Fully blocked dominates complete randomization forimbalance model dependence power eciency bias researchcosts robustness Eg Imai King Nall 2009 SEs 600 smaller

Goal of Each Matching Method (in Observational Data)

bull PSM complete randomization

bull Other methods fully blocked

bull Other matching methods dominate PSM (wait it gets worse)

623

(slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 11

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 12: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 13: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 14: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 15: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 16: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Best Case Propensity Score Matching is Suboptimal

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

CC

C C

CC

C

C

C CC

C

C

CC

C

C

C

C

C

CC CC

C

C

C

C

CC

C

CC

C

C

C

C

C

C

C

C

C C

CC

CCC

CC

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1523 (slides

byKingandNielsen)

Ben Jann (University of Bern) Kernel matching London 07092017 12

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 17: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

ldquoWhy PSM Should Not Be Used for Matchingrdquo

Are King and Nielsen rightI For a given sample size (as in an experiment with fixed budget) fullyblocked randomization is more efficient than complete randomizationThings are less clear if blocking reduces the sample size as inmatching

I The complete randomization analogy only works for observations withthe same propensity score If X has a strong effect on T there is alot of blocking also in PSM

I King and Nielsonrsquos examples illustrating the bad performance of PSMseem to be based on pair matching without replacement Pairmatching throws away a lot of data For PSM pair matching isparticularly bad because a lot of good data (ie observations with thesame PS) is thrown away (ldquorandom pruningrdquo)

I The performance of PSM should be alright for matching algorithmsthat do not engage in random pruning such as radius or kernelmatching

Ben Jann (University of Bern) Kernel matching London 07092017 13

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 18: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

The kmatch command

New matching software for Stata

Partly written in response to the paper by King and Nielsen

Available from SSC (ssc install kmatch)

Ben Jann (University of Bern) Kernel matching London 07092017 14

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 19: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Key Features

Type of matchingI Multivariate Distance Matching (MDM)I Propensity Score Matching (PSM)I MDM combined with PSMI MDM and PSM combined with exact matching

Matching algorithmsI Kernel matching including ridge and local-linear matchingI Nearest-neighbor matching optionally with caliperI Optional regression adjustment

Several automatic bandwidth selectors for kernel matching

Joint analysis of multiple subgroups and multiple outcome variables

Various post-estimation commands for balancing andcommon-support diagnostics

Computationally efficient

Ben Jann (University of Bern) Kernel matching London 07092017 15

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 20: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Mahalanobis-Distance Kernel MatchingEstimation of the ldquoeffectrdquo of union membership on wages using theNLSW 1988 data sysuse nlsw88 clear(NLSW 1988 extract)

drop if industry==2(4 observations deleted)

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 16

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 21: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Balancing Statistics kmatch summarize(refitting the model using the generate() option)

Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif

collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888

3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0

10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0

2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0

Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio

collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966

3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1

10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1

2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1

Ben Jann (University of Bern) Kernel matching London 07092017 17

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 22: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Make a Graph of the Balancing Statistics mat M = r(M)

mat V = r(V)

coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)

addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))

addplot 2 xline(1) norescaling

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-4 -2 0 2 4 0 1 2 3 4

Std mean difference Variance ratio

Raw Matched

Ben Jann (University of Bern) Kernel matching London 07092017 18

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 23: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Propensity-Score Kernel Matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Kernel matching London 07092017 19

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 24: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Density Balancing Plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 20

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 25: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Cumulative Distribution Balancing Plot

kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 21

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 26: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Balancing Box Plot

kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Kernel matching London 07092017 22

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 27: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Standard Errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Kernel matching London 07092017 23

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 28: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Postestimation Tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Kernel matching London 07092017 24

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 29: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Nearest-Neighbor Matching (1 Neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Kernel matching London 07092017 25

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 30: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Nearest-Neighbor Matching (5 Neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Kernel matching London 07092017 26

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 31: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Regression Adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Kernel matching London 07092017 27

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 32: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples MDM and PSM combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Kernel matching London 07092017 28

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 33: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples MDM with Exact Matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Kernel matching London 07092017 29

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 34: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

Default 15 times the 90 quantile of the (non-zero) distances inpair matching with replacement (Huber et al 2013 2015)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(pm)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Kernel matching London 07092017 30

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 35: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

Cross validation with respect to the means of X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

Ben Jann (University of Bern) Kernel matching London 07092017 31

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 36: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

302

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 32

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 37: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

Cross validation with respect to Y (Froumllich 2004 2005)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

Ben Jann (University of Bern) Kernel matching London 07092017 33

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 38: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 34

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 39: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

Weighted cross validation with respect to Y (Galdo et al 2008Section 42)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

Ben Jann (University of Bern) Kernel matching London 07092017 35

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 40: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Bandwidth Selection

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

411

1213

14W

eigh

ted

MIS

E

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Kernel matching London 07092017 36

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 41: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Common Support Diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Kernel matching London 07092017 37

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 42: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Make a Graph of Common Support Statistics mat M = r(M)

coefplot matrix(M[4]) noci nolabels xline(0) gt title(Std difference between matched and original)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference between matched and original

Ben Jann (University of Bern) Kernel matching London 07092017 38

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 43: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Multiple Outcome Variables kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Kernel matching London 07092017 39

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 44: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Varying Regression-Adjustment Equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Kernel matching London 07092017 40

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 45: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Examples Treatment Effects by Subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2808206 163 0102 -0917652 1009032

1ATT 9518705 334356 285 0004 2965449 1607196

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 136Prob gt chi2 = 02433

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4227171 117 0243 -335273 1321748

Ben Jann (University of Bern) Kernel matching London 07092017 41

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 46: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 or 5000) from population andcompute various matching estimators

Ben Jann (University of Bern) Kernel matching London 07092017 42

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 47: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Kernel matching London 07092017 43

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 48: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Kernel matching London 07092017 44

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 49: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Kernel matching London 07092017 45

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 50: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 51: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Kernel matching London 07092017 46

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 52: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 53: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Kernel matching London 07092017 47

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 54: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Kernel matching London 07092017 48

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 55: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Here we can observe the well-known result that bootstrap standarderrors are biased (too large) for nearest-neighbor matching

In small samples also the teffects standard errors seem to be slightlyoff (too low) for PSM and for MDM with bias-correction

For kernel matching bootstrap standard standard errors are oftensomewhat too large especially in the small sample The bias is mostpronounced for the estimates using the pair-matching bandwidthselector Results are better if the bandwidth is selected bycross-validation

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 56: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Kernel matching London 07092017 49

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 57: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

2017

-09-

12Kernel matching

The kmatch commandSome Simulation Results

Coverage of teffects CIs is a bit too low for PSM (and for MDM withbias-correction in the small sample)

Bootstrap CIs are too conservative for nearest-neighbor matching

For kernel matching coverage is mostly okay being a bit tooconservative in case of the pair-matching bandwidth selector andconsiderably off (anti-conservative) for the PSM estimates withoutbias-correction (due to the pronounced bias in these estimates)

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 58: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Conclusions

Overall I agree with King and Nielsen that MDM has someadvantages over PSM but it also has some disadvantages Inapplied research the choice may not be that clear- MDM leaves less scope for bias due to post-matching modeling

decisions- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Kernel matching London 07092017 50

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 59: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

Conclusions

Some additional conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some more simulationsI Variance estimation based on influence functionsI Better (and faster) bandwidth selection algorithmsI Explore potential of adaptive bandwidths

Ben Jann (University of Bern) Kernel matching London 07092017 51

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 60: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2005 Matching estimators and optimal bandwidth choiceStatistics and Computing 15197-215

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Galdo JC J Smith D Black 2008 Bandwidth selection and theestimation of treatment effects with unbalanced data Annales drsquoEacuteconomieet de Statistique 919289-216

Ben Jann (University of Bern) Kernel matching London 07092017 52

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions
Page 61: Kernel matching with automatic bandwidth selection - Stata · c cccccccc ccc cc ccc cc c cc cccc cccc ccccccccc cc ccc cc cccc cccccc cc c c cc ccc c ccccc cccccccc cc ccc ccc c cc

References II

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

Huber M M Lechner A Steinmayr 2015 Radius matching on thepropensity score with bias adjustment tuning parameters and finite samplebehaviour Empirical Economics 491-31

Huber M M Lechner C Wunsch 2013 The performance of estimatorsbased on the propensity score Journal of Econometrics 1751-21

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Ben Jann (University of Bern) Kernel matching London 07092017 53

  • Background
    • What is Matching
    • Multivariate Distance Matching (MDM)
    • Propensity Score Matching (PSM)
    • Matching Algorithms
    • ldquoWhy PSM Should Not Be Used for Matchingrdquo
      • The kmatch command
        • Features
        • Examples
        • Some Simulation Results
          • Conclusions