Advanced Quantitative Research Methodology, Lecture Notes:
Matching Methods for Causal Inference

Gary King
Institute for Quantitative Social Science, Harvard University

© Copyright 2016 Gary King, All Rights Reserved. GaryKing.org

1 / 48


  • Matching Overview

    • Current practice:

      "Matching As Nonparametric Preprocessing For Reducing Model Dependence In Parametric Causal Inference" (Daniel Ho, Kosuke Imai, Gary King, Elizabeth Stuart)

    • Current practice violates current statistical theory. So let's change the theory:

      "A Theory of Statistical Inference for Matching Methods in Applied Causal Research" (Stefano Iacus, Gary King, Giuseppe Porro)

    • The most popular method (propensity score matching, used in 53,200 articles!) sounds magical:

      "Why Propensity Scores Should Not Be Used for Matching" (Gary King, Richard Nielsen)

    • Matching methods optimize either imbalance (≈ bias) or # units pruned (≈ variance); users need both simultaneously:

      "The Balance-Sample Size Frontier in Matching Methods for Causal Inference" (Gary King, Christopher Lucas, Richard Nielsen)

    2 / 48


  • Overview of Matching for Causal Inference

    • Goal: reduce model dependence
    • A nonparametric, non-model-based approach
    • Makes parametric models work better rather than substituting for them (i.e., matching is not an estimator; it's a preprocessing method)
    • Should have been called pruning (no bias is introduced if pruning is a function of T and X, but not Y)
    • Apply the model to the preprocessed (pruned) rather than the raw data
    • Violates the "more data is better" principle, but that principle only applies when you know the DGP
    • Overall idea:
      • If each treated unit exactly matches a control unit w.r.t. X, then: (1) treated and control groups are identical, (2) X is no longer a confounder, and (3) there is no need to worry about the functional form (ȲT − ȲC is good enough; see the sketch below).
      • If treated and control groups are better balanced than when you started, due to pruning, model dependence is reduced

    3 / 48
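To make the preprocessing idea concrete, here is a minimal Python sketch (toy data and column names invented for illustration): exact matching on X as a pruning step, followed by the plain difference in means the slide describes.

```python
import pandas as pd

# Hypothetical toy data: T is treatment (1 = treated, 0 = control),
# X1 and X2 are discrete confounders, Y is the outcome.
df = pd.DataFrame({
    "T":  [1, 1, 0, 0, 0, 1, 0],
    "X1": [1, 2, 1, 2, 4, 3, 3],
    "X2": [0, 1, 0, 1, 1, 1, 1],
    "Y":  [5.0, 6.0, 3.0, 4.5, 4.0, 7.0, 4.2],
})

# Pruning: keep only covariate strata that contain both treated and
# control units.  The rule depends on T and X only, never on Y, so it
# introduces no bias.
has_both = df.groupby(["X1", "X2"])["T"].transform(lambda t: t.nunique() == 2)
matched = df[has_both]

# After exact matching, treated and control groups are identical on X,
# so a simple difference in means is enough.  (For SATT with strata of
# unequal size, one would weight controls by the number of treated
# units in each stratum.)
effect = (matched.loc[matched["T"] == 1, "Y"].mean()
          - matched.loc[matched["T"] == 0, "Y"].mean())
print(f"difference in means after pruning: {effect:.2f}")
```

In this toy, the control-only stratum (X1 = 4, X2 = 1) is pruned and everything else is retained; real data rarely admits exact matches, which is what motivates the coarsened and distance-based methods later in these notes.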


  • Model Dependence: A Simpler Example
    (King and Zeng, 2006: fig. 4, Political Analysis)

    What to do?

    • Preprocess I: Eliminate the extrapolation region
    • Preprocess II: Match (prune) within the interpolation region
    • Model the remaining imbalance (as you would without matching)

    4 / 48


  • Remove Extrapolation Region, then Match

    • Must remove data (selecting on X) to avoid extrapolation.
    • Options to find the "common support" of p(X|T = 1) and p(X|T = 0):

      1. Exact matching, so the support is defined only at the data points
      2. Less restrictive but still conservative: the convex hull approach (see the sketch below)
         • let T* and X* denote subsets of T and X s.t. {1 − T*, X*} falls within the convex hull of {T, X}
         • use X* as the estimate of the common support (deleting the remaining observations)
      3. Other approaches, based on distance metrics, propensity scores, etc.
      4. Easiest: Coarsened Exact Matching, no separate step needed

    5 / 48
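As a rough sketch of option 2, here is one way to approximate the common support with a convex hull test in Python. This simplified version keeps only controls whose X falls inside the convex hull of the treated units' X (the slide's definition is symmetric and stricter); scipy's Delaunay triangulation provides the point-in-hull test, and the data is simulated.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
X_treated = rng.normal(0.0, 1.0, size=(50, 2))   # confounders, treated units
X_control = rng.normal(0.5, 1.5, size=(200, 2))  # controls with wider spread

# Triangulate the treated units' convex hull; find_simplex returns -1
# for query points outside the hull, i.e. in the extrapolation region.
hull = Delaunay(X_treated)
inside = hull.find_simplex(X_control) >= 0

X_common = X_control[inside]  # estimated common support
print(f"kept {inside.sum()} of {len(X_control)} controls; "
      f"pruned {(~inside).sum()} that would require extrapolation")
```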


  • Matching within the Interpolation Region
    (Ho, Imai, King, Stuart, 2007: fig. 1, Political Analysis)

    [Figure: scatterplot of treated (T) and control (C) units; horizontal axis Education (years), 12 to 28; vertical axis Outcome, 0 to 12. Successive overlays prune controls far from the treated units.]

    Matching reduces model dependence, bias, and variance

    6 / 48

  • Empirical Illustration: Carpenter, AJPS, 2002

    • Hypothesis: Democratic Senate majorities slow FDA drug approval time
    • n = 408 new drugs (262 approved, 146 pending)
    • Lognormal survival model (see the sketch below)
    • Seven oversight variables (median adjusted ADA scores for House and Senate committees as well as for House and Senate floors, Democratic majority in House and Senate, and Democratic presidency)
    • 18 control variables (clinical factors, firm characteristics, media variables, etc.)

    7 / 48
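As a rough sketch only: a lognormal survival model of this kind can be fit in Python with the lifelines library (not what the original article used), where the pending drugs enter as right-censored observations. All file and column names below are hypothetical.

```python
import pandas as pd
from lifelines import LogNormalAFTFitter

# Hypothetical frame: one row per drug; "months" is time under FDA
# review, "approved" is 1 for the 262 approved drugs and 0 for the
# 146 still pending (right-censored), plus oversight and control
# covariates such as "dem_senate_majority".
df = pd.read_csv("fda_drugs.csv")  # hypothetical file

aft = LogNormalAFTFitter()
aft.fit(df[["months", "approved", "dem_senate_majority", "firm_size"]],
        duration_col="months", event_col="approved")
aft.print_summary()  # coefficients on the log approval-time scale
```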


  • Evaluating Reduction in Model Dependence

    • Focus on the causal effect of a Democratic majority in the Senate (identified by Carpenter as not robust).
    • Match: prune 19 units (2 treated, 17 control units).
    • Run every possible specification, i.e., every nonempty subset of the 18 control variables (2^18 − 1 = 262,143 regressions), and calculate the ATE for each (see the sketch below).
    • Look at the variability in the ATE estimate across specifications.
    • (Normal applications would use only one or a few specifications.)

    8 / 48
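A minimal sketch of this model-dependence check: re-estimate the effect under every nonempty subset of the controls and inspect the spread of estimates. The toy below uses 4 simulated controls and OLS (so 2^4 − 1 = 15 regressions) rather than the 262,143 survival models of the application; all names are invented.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, controls = 500, ["x1", "x2", "x3", "x4"]
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=controls)
df["treat"] = rng.integers(0, 2, n)
df["y"] = 2.0 * df["treat"] + df["x1"] - 0.5 * df["x2"] + rng.normal(size=n)

# One regression per nonempty subset of the controls.
effects = []
for k in range(1, len(controls) + 1):
    for subset in combinations(controls, k):
        fit = smf.ols(f"y ~ treat + {' + '.join(subset)}", data=df).fit()
        effects.append(fit.params["treat"])

# The range of estimates across specifications measures model dependence.
print(f"{len(effects)} specifications; effect ranges from "
      f"{min(effects):.2f} to {max(effects):.2f}")
```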


  • Reducing Model Dependence

    [Figure: two densities of the estimated in-sample average treatment effect for the treated, one for the raw data and one for the matched data, with the point estimate of Carpenter's specification using the raw data marked; horizontal axis roughly −80 to −30.]

    Figure: SATT histogram: effect of Democratic Senate majority on FDA drug approval time, across 262,143 specifications.

    9 / 48

  • Another Example: Jeffrey Koch, AJPS, 2002

    [Figure: densities of the estimated average treatment effect for the raw data and the matched data, with the raw-data point estimate marked; horizontal axis roughly −0.05 to 0.10.]

    Figure: SATT histogram: effect of being a highly visible female Republican candidate, across 63 possible specifications with the Koch data.

    10 / 48

  • The Problems Matching Solves

    Without Matching: Imbalance, Model Dependence, Researcher discretion, Bias
    With Matching: all four are crossed out on the slide's final overlay.

    A central project of statistics: Automating away human discretion

    • Qualitative choice from unbiased estimates = biased estimator
      • e.g., choosing from the results of 50 randomized experiments
      • choosing based on "plausibility" is probably worse
    • Conscientious effort doesn't avoid biases (Banaji 2013)
    • People do not have easy access to their own mental processes or feedback to avoid the problem (Wilson and Brekke 1994)
    • Experts overestimate their ability to control personal biases more than nonexperts, and more prominent experts are the most overconfident (Tetlock 2005)
    • "Teaching psychology is mostly a waste of time" (Kahneman 2011)

    11 / 48

  • What's Matching?

    • Yi dep. var., Ti (1 = treated, 0 = control), Xi confounders
    • Treatment effect for treated observation i:

      TEi = Yi(1) − Yi(0) = observed − unobserved

    • Estimate Yi(0) with Yj from a matched (Xi ≈ Xj) control
    • Quantities of interest:

      1. SATT: Sample Average Treatment effect on the Treated:

         SATT = Mean_{i ∈ {Ti = 1}} (TEi)

      2. FSATT: Feasible SATT (prune badly matched treated units too; see the sketch below)

    • Big convenience: follow preprocessing with whatever statistical method you'd have used without matching
    • Pruning nonmatches makes control variables matter less: reduces imbalance, model dependence, researcher discretion, and bias

    12 / 48

  • Matching: Finding Hidden Randomized Experiments

    Types of Experiments

    Balance on covariates:   Complete Randomization   Fully Blocked
    Observed                 On average               Exact
    Unobserved               On average               On average

    Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, and robustness. E.g., Imai, King, and Nall (2009): SEs 600% smaller!

    Goal of Each Matching Method (in Observational Data)

    • PSM: complete randomization
    • Other methods: fully blocked
    • Other matching methods dominate PSM (wait, it gets worse)

    13 / 48
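    A small simulation sketch (my illustration, not from the slides) of the table's first row: under complete randomization a covariate is balanced only on average across repeated experiments, whereas blocking on it before randomizing makes the treated and control groups nearly identical in every single draw.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=1000)  # one observed covariate

    # Complete randomization: T assigned independently of X
    T_complete = rng.permutation(np.repeat([0, 1], 500))

    # Fully blocked: pair units adjacent in X, randomize within each pair
    order = np.argsort(X)
    T_blocked = np.empty(1000, dtype=int)
    for pair in order.reshape(500, 2):
        T_blocked[pair] = rng.permutation([0, 1])

    for name, T in [("complete", T_complete), ("blocked", T_blocked)]:
        imbalance = X[T == 1].mean() - X[T == 0].mean()
        print(f"{name:9s} covariate imbalance: {imbalance:+.4f}")
    ```

    The blocked design's imbalance is essentially zero by construction, which is the sense in which matching tries to recover a fully blocked experiment hidden inside observational data.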


  • Method 1: Mahalanobis Distance Matching (Approximates Fully Blocked Experiment)

    1. Preprocess (Matching)

       • Distance(Xc, Xt) = √[(Xc − Xt)′ S⁻¹ (Xc − Xt)], where S is the sample covariance matrix of X
       • (Mahalanobis is for methodologists; in applications, use Euclidean!)
       • Match each treated unit to the nearest control unit
       • Control units: not reused; pruned if unused
       • Prune matches if Distance > caliper
       • (Many adjustments to this basic method are available)

    2. Estimation: difference in means or a model

    14 / 48
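    A minimal sketch of step 1 above (my own illustration; in applications one would use mature software such as the authors' MatchIt package for R): greedy 1:1 nearest-neighbor matching on the Mahalanobis distance, with no reuse of controls and optional caliper pruning.

    ```python
    import numpy as np

    def mahalanobis_match(X_t, X_c, caliper=None):
        """Greedy 1:1 nearest-neighbor Mahalanobis matching without replacement.
        X_t: (n_t, k) treated covariates; X_c: (n_c, k) control covariates.
        Returns (treated_index, control_index) pairs; a treated unit is
        pruned if its nearest remaining control lies beyond the caliper."""
        S_inv = np.linalg.inv(np.cov(np.vstack([X_t, X_c]), rowvar=False))
        pairs, used = [], set()
        for i, x in enumerate(X_t):
            d = X_c - x  # (n_c, k) differences from treated unit i
            dist = np.sqrt(np.einsum("ij,jk,ik->i", d, S_inv, d))
            if used:
                dist[list(used)] = np.inf  # controls are never reused
            j = int(np.argmin(dist))
            if np.isfinite(dist[j]) and (caliper is None or dist[j] <= caliper):
                pairs.append((i, j))
                used.add(j)
        return pairs  # controls that never appear here are pruned
    ```

    Replacing S⁻¹ with the identity matrix (after standardizing X) gives the Euclidean variant the slide recommends for applications.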


  • Mahalanobis Distance Matching

    [Figure, built up over several overlays: a scatter plot of Age (20–80) against Education in years (12–28). The treated units (T) appear first, the control units (C) are added, and successive frames prune the distant controls until each T retains a nearby matched C.]

    15 / 48

  • Best Case: Mahalanobis Distance Matching

    [Figure: the same Age-by-Education scatter plot. In the best case, each treated unit (T) has a control unit (C) at essentially the same covariate values; the large surrounding cloud of unmatched controls is pruned away.]

    16 / 48

  • Method 2: Coarsened Exact Matching (Approximates Fully Blocked Experiment)

    1. Preprocess (Matching)

       • Temporarily coarsen each variable in X as much as you're willing
         • e.g., Education → grade school, high school, college, graduate
       • Apply exact matching to the coarsened X, C(X)
         • Sort observations into strata, each with unique values of C(X)
         • Prune any stratum with 0 treated or 0 control units
       • Pass on the original (uncoarsened) units, except those pruned

    2. Estimation: difference in means or a model

       • Weight controls in each stratum to equal treateds

    17 / 48
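    A minimal sketch of these two steps (my illustration, assuming a pandas DataFrame with a treatment column `T` and bin edges that cover each variable's range; the authors' cem package for R implements the real thing): coarsen, stratify exactly on the coarsened values, prune one-sided strata, and weight the surviving controls.

    ```python
    import numpy as np
    import pandas as pd

    def cem(df: pd.DataFrame, bins: dict) -> pd.DataFrame:
        """Coarsened Exact Matching sketch. `bins` maps a covariate name to
        its coarsening bin edges. Returns the unpruned rows with CEM weights:
        treated units get w = 1; a control in stratum s gets
        w = (treated_s / controls_s) * (total controls / total treateds)."""
        out = df.copy()
        coarse = pd.DataFrame({v: pd.cut(out[v], edges, labels=False)
                               for v, edges in bins.items()})
        out["stratum"] = coarse.astype("Int64").astype(str).agg("-".join, axis=1)
        # prune any stratum with 0 treated or 0 control units
        both = out.groupby("stratum")["T"].transform(lambda t: t.nunique() == 2)
        out = out[both].copy()
        m_ratio = out.groupby("stratum")["T"].transform(
            lambda t: t.sum() / (t == 0).sum())            # treated_s / controls_s
        M_C, M_T = (out["T"] == 0).sum(), out["T"].sum()   # matched-sample totals
        out["w"] = np.where(out["T"] == 1, 1.0, m_ratio * M_C / M_T)
        return out
    ```

    Estimation then proceeds on the matched data with `w` as weights: a weighted difference in means, or whatever model you would have run anyway.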
