The logic of Counterfactual Impact Evaluation 1

Upload: sarah-jacobs

Post on 27-Mar-2015

TRANSCRIPT

Page 1: The logic of Counterfactual Impact Evaluation 1

The logic of Counterfactual Impact Evaluation 1

Page 2: The logic of Counterfactual Impact Evaluation 1

To understand counterfactuals, it is necessary to understand impacts

Page 3: The logic of Counterfactual Impact Evaluation 1

Impacts differ in one fundamental way from outputs and results

Outputs and results are observable quantities

Page 4: The logic of Counterfactual Impact Evaluation 1

Can we observe an impact?

No, we can’t

Page 5: The logic of Counterfactual Impact Evaluation 1

Just as output indicators measure outputs and result indicators measure results, so impact indicators measure impacts

Sorry, they don’t

Page 6: The logic of Counterfactual Impact Evaluation 1

Almost everything about programmes can be observed (at least in principle):

outputs (beneficiaries served, activities done, training courses offered, km of roads built, sewers cleaned)

outcomes/results (income levels, inequality, well-being of the population, pollution, congestion, inflation, unemployment, birth rate)

Page 7: The logic of Counterfactual Impact Evaluation 1

What is needed for M&E of outputs and results is BITs (baselines, indicators, and targets)

Page 8: The logic of Counterfactual Impact Evaluation 1

Unlike with outputs and results, to define, detect, understand, and measure impacts one needs to deal with causality

Page 9: The logic of Counterfactual Impact Evaluation 1

“Causality is in the mind”

J.J. Heckman

Page 10: The logic of Counterfactual Impact Evaluation 1

Why this focus on causality?

Because, unless we can attribute changes (or differences) to policies, we do not know whether the intervention “works”, “for whom” it works, and even less “why” it works (or does not)

Causal questions represent a bigger challenge than non-causal questions (descriptive, normative, exploratory)

Page 11: The logic of Counterfactual Impact Evaluation 1

The social science community defines impact/effect as

“the difference between a situation observed after a stimulus has been applied and the situation that would have occurred without such stimulus”

Page 12: The logic of Counterfactual Impact Evaluation 1

A very intuitive example of the role of causality in producing credible evidence for policy decisions

Page 13: The logic of Counterfactual Impact Evaluation 1

Does playing chess have an impact on math learning?

Page 14: The logic of Counterfactual Impact Evaluation 1

Policy-relevant question:

Should we make chess part of the regular curriculum in elementary schools, to improve mathematics achievement?

Which kind of evidence do we need to make this decision in an informed way?

We can think of three types of evidence, from the most naive to the most credible

Page 15: The logic of Counterfactual Impact Evaluation 1

1. The naive evidence: pre-post difference

• Take a sample of pupils in fourth grade

• Measure their achievement in math at the beginning of the year

• Teach them to play chess during the year

• Test them again at the end of the year

Page 16: The logic of Counterfactual Impact Evaluation 1

Results for the pre-post difference

Pupils at the beginning of the year: average score = 40 points

The same pupils at the end of the year: average score = 52 points

Difference = 12 points = +30%

Question: what are the implications for making chess compulsory in schools? Have we proven anything?

Page 17: The logic of Counterfactual Impact Evaluation 1

Can we attribute the increase in test score to playing chess?

OBVIOUSLY NOT

The data tell us that the effect is between zero and 12 points

• There is no doubt that many more factors are at play

• So we must dismiss the increase of 12 points as unable to tell us anything about impact.

Page 18: The logic of Counterfactual Impact Evaluation 1

The great pre-post temptation

• Pre-post comparisons have a great advantage: they seem kind of obvious (the “pop” definition of impact coincides with the pre-post difference)

• Particularly when the intervention is big, and the theory suggests that the outcomes should be affected

• This is not the case here, but in general we should be careful about making causal inferences based on pre-post comparisons

Page 19: The logic of Counterfactual Impact Evaluation 1

The risky alternative: with-without difference

Impact = difference between treated and not treated?

Compare math test scores for kids who have learned chess by themselves and kids who have not

Page 20: The logic of Counterfactual Impact Evaluation 1

Not really

Average score of pupils who already play chess on their own (25% of the total) = 66 points

Average score of pupils who DO NOT play chess on their own (75% of the total) = 45 points

Difference = 21 points = +47%

This difference is OBJECTIVE, but what does it mean, really? Does it have any implication for policy?

Page 21: The logic of Counterfactual Impact Evaluation 1

This evidence tells us almost nothing about making chess compulsory for all students

The data tell us that the effect of playing chess is between zero and 21 points.

Why?

The observed difference could be entirely due to differences in mathematical ability between the two groups that existed before anyone learned chess

Page 22: The logic of Counterfactual Impact Evaluation 1

[Diagram: innate math ability affects both playing chess (SELECTION PROCESS) and math test scores (DIRECT INFLUENCE). Does playing chess have an impact on math test scores?]

Ignoring math ability could severely bias the results, if we intend to interpret them as a causal effect

66 – 45: real effect or the fruit of sorting?

Page 23: The logic of Counterfactual Impact Evaluation 1

Counterfeit Counterfactual

Both the raw difference between self-selected participants and non-participants, and the raw change between pre and post are a caricature of the counterfactual logic

In the case of raw differences, the problem is selection bias (predetermined differences)

In the case of raw changes, the problem is maturation bias (a.k.a. natural dynamics)

Page 24: The logic of Counterfactual Impact Evaluation 1

The modern way to understand causality is to think in terms of POTENTIAL OUTCOMES

Let us imagine we know the score that kids would get if they played chess and the score they would get if they did not
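The idea of potential outcomes can be written compactly. The notation below is not in the slides; it is the conventional one, introduced here only as a sketch. $Y_i(1)$ is the score pupil $i$ would get if they played chess, $Y_i(0)$ the score if they did not, and $D_i$ equals 1 if the pupil actually plays and 0 otherwise.

```latex
\[
\text{impact}_i = Y_i(1) - Y_i(0),
\qquad
Y_i^{\text{obs}} = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)
\]
```

Only one of the two potential outcomes is ever observed for a given pupil, which is why an individual impact cannot be observed directly.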

Page 25: The logic of Counterfactual Impact Evaluation 1

Let’s say there are three levels of ability

Kids in the top quartile (top 25%) learn to play chess on their own

Kids in the two middle quartiles learn if they are taught in school

Kids in the bottom quartile (last 25%) never learn to play chess

Page 26: The logic of Counterfactual Impact Evaluation 1

High math ability (25%): play chess by themselves

Mid math ability (50%): do not play chess unless taught in school

Low math ability (25%): never learn to play

Page 27: The logic of Counterfactual Impact Evaluation 1

Potential outcomes

                     If they do play chess   If they do NOT play chess   Impact = gain from playing chess
High math ability    66                      56                          10
Mid math ability     54                      48                          6
Low math ability     40                      40                          0

Page 28: The logic of Counterfactual Impact Evaluation 1

Observed outcomes

                                 For those who play chess   For those who do not play chess
High math ability                66
Mid math ability                                            48
Low math ability                                            40
Mid/Low math ability combined                               45

The difference of 21 points is NOT an IMPACT, it is just an OBSERVED difference

Page 29: The logic of Counterfactual Impact Evaluation 1

The problem: we do not observe the counterfactual(s)

• For the treated, the counterfactual is 56, but we do not see it

• The true impact is 10, but we do not see it

• Still, we cannot use 45, which is the observed outcome of the untreated

We can think of decomposing the 66 – 45 difference as the sum of the true impact on the treated and the effect of sorting

Page 30: The logic of Counterfactual Impact Evaluation 1

Decomposing the observed difference

                       If play chess   If do not play chess
High math ability      66              56
Low/mid math ability                   45

66 – 56 = 10: impact for players
56 – 45 = 11: preexisting differences
66 – 45 = 21: observed difference

21 = 10 + 11

Page 31: The logic of Counterfactual Impact Evaluation 1

21 = 10 + 11

Observed differences = Impact + Preexisting differences (selection bias)

The heart of impact evaluation is getting rid of selection bias, by using experiments or by using some non-experimental methods
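The decomposition can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the variable and function names are hypothetical, while the population shares and scores are those of the three ability groups in the chess example.

```python
# Potential outcomes (math scores) by ability group, as given in the slides.
# Each entry: (population share, score if playing chess, score if not playing).
groups = {
    "high": (0.25, 66, 56),  # learn chess on their own: the self-selected players
    "mid": (0.50, 54, 48),   # would play only if taught in school
    "low": (0.25, 40, 40),   # never learn to play
}


def weighted_mean(pairs):
    """Mean of values weighted by population share; pairs = [(share, value), ...]."""
    total_share = sum(share for share, _ in pairs)
    return sum(share * value for share, value in pairs) / total_share


# Without any intervention, only the high-ability group plays chess.
players = ["high"]
non_players = ["mid", "low"]

observed_players = weighted_mean([(groups[g][0], groups[g][1]) for g in players])
observed_non_players = weighted_mean([(groups[g][0], groups[g][2]) for g in non_players])
# Counterfactual for the players: their score without chess (never observed in practice).
counterfactual_players = weighted_mean([(groups[g][0], groups[g][2]) for g in players])

observed_difference = observed_players - observed_non_players
impact_on_players = observed_players - counterfactual_players
selection_bias = counterfactual_players - observed_non_players

print(round(observed_difference, 2), impact_on_players, round(selection_bias, 2))
```

Computed without rounding, the non-players average about 45.33, so the identity reads roughly 20.67 = 10 + 10.67; the slides round the non-players' mean to 45, which gives 21 = 10 + 11.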

Page 32: The logic of Counterfactual Impact Evaluation 1

Experimental evidence to the rescue

Schools get a free instructor to teach chess to one class, if they agree to select the class at random among the fourth grade classes

Now we have the following situation

Page 33: The logic of Counterfactual Impact Evaluation 1

Results of the randomized experiment

Pupils in the selected classes: average score of randomized chess players = 60 points

Pupils in the excluded classes: average score of NON chess players = 52 points

Difference = 8 points

Question: what does this difference tell us?

Page 34: The logic of Counterfactual Impact Evaluation 1

The results tell us that teaching chess truly improves math performance (by 8 points, about 15%)

Thus we are able to isolate the effect of chess from other factors (but some problems remain)

Page 35: The logic of Counterfactual Impact Evaluation 1

                If they do play chess   If they do NOT play chess   Composition of population
High ability    66                      56                          25%
Mid ability     54                      48                          50%
Low ability     40                      40                          25%
Averages        54                      48                          100%

Impact = 54 – 48 = 6

Average Treatment Effect (ATE)
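The averages in this table are population-weighted means of the potential outcomes. A minimal Python sketch of that calculation follows; it is not part of the original slides and the variable names are hypothetical.

```python
# (population share, score if playing chess, score if not playing) per ability group.
groups = {
    "high": (0.25, 66, 56),
    "mid": (0.50, 54, 48),
    "low": (0.25, 40, 40),
}

# Population averages of each potential outcome.
mean_if_play = sum(share * y1 for share, y1, _ in groups.values())
mean_if_not_play = sum(share * y0 for share, _, y0 in groups.values())

# ATE as a difference of averages, and as the average of group-level impacts.
ate = mean_if_play - mean_if_not_play
ate_from_impacts = sum(share * (y1 - y0) for share, y1, y0 in groups.values())

print(mean_if_play, mean_if_not_play, ate)
```

Without rounding, the average score if everyone plays is 53.5 and the ATE is 5.5; the slide shows 54 and 6 because it rounds the average before subtracting.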

Page 36: The logic of Counterfactual Impact Evaluation 1

[Diagram: play chess, math ability, math test scores]

Note that the experiment does not solve all the cognitive problems related to policy design: for example, it does not identify impact heterogeneity (“for whom it works”)

Page 37: The logic of Counterfactual Impact Evaluation 1

The ATE is the average effect if every member of the population is treated

Generally there is more policy interest in the Average Treatment Effect on the Treated (ATT)

ATT = 10 in the chess example, while ATE = 6

(We ran an experiment and got an impact of 8. Can you think why this happens?)

Page 38: The logic of Counterfactual Impact Evaluation 1

                Schools that volunteered   Schools that DID NOT volunteer   True impact
High ability    50%                                                         10
Mid ability     50%                        50%                              6
Low ability                                50%                              0

EXPERIMENTAL: mean of 66 and 54 = 60

CONTROL: mean of 56 and 48 = 52

Impact = 60 – 52 = 8

Internal validity

Little external validity
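The gap between the experimental estimate (8) and the population ATE comes from who took part in the experiment. The Python sketch below is not from the slides: the function and variable names are hypothetical, and the school compositions are assumptions read off this slide (volunteering schools half high and half mid ability, the others half mid and half low).

```python
# Potential outcomes per ability group: (score if playing chess, score if not).
potential = {"high": (66, 56), "mid": (54, 48), "low": (40, 40)}

# Assumed ability mix in each kind of school (see the lead-in above).
volunteer_mix = {"high": 0.5, "mid": 0.5}
non_volunteer_mix = {"mid": 0.5, "low": 0.5}
population_mix = {"high": 0.25, "mid": 0.50, "low": 0.25}


def average_effect(mix):
    """Return (mean if treated, mean if untreated, difference) for an ability mix.

    With random assignment, treated and control classes share the same mix,
    so the difference in means estimates the average effect for that mix only.
    """
    treated = sum(weight * potential[g][0] for g, weight in mix.items())
    control = sum(weight * potential[g][1] for g, weight in mix.items())
    return treated, control, treated - control


print(average_effect(volunteer_mix))      # the experiment run in volunteering schools
print(average_effect(non_volunteer_mix))  # what the same experiment would find elsewhere
print(average_effect(population_mix))     # the whole population
```

Under these assumptions the experiment is internally valid for the volunteering schools, but its estimate does not carry over to schools with a different mix of pupils.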

Page 39: The logic of Counterfactual Impact Evaluation 1

Lessons learned

Impacts are differences, but not all differences are impacts

Differences (and changes) have many causes, but we do not need to understand all the causes

We are especially interested in one cause, the policy, and we would like to eliminate all the confounding causes of the difference (or change)

Internal vs. external validity

Page 40: The logic of Counterfactual Impact Evaluation 1

An example of a real ERDF policy

Grants to small enterprises to invest in R&D

Page 41: The logic of Counterfactual Impact Evaluation 1

To design an impact evaluation, one needs to answer three important questions

1. Impact of what?

2. Impact for whom?

3. Impact on what?

Page 42: The logic of Counterfactual Impact Evaluation 1

R&D EXPENDITURES AMONG THE FIRMS RECEIVING GRANTS

                   AVERAGE   N
PRE                65.000    2400
POST               75.000    2400
OBSERVED CHANGE    10.000

Is 10.000 the true average impact of the grant?

Page 45: The logic of Counterfactual Impact Evaluation 1

The fundamental challenge to this assumption is the well-known fact that things change over time by “natural dynamics”

How do we disentangle the change due to the policy from the myriad changes that would have occurred anyway?

Page 46: The logic of Counterfactual Impact Evaluation 1

                                    AVERAGE   N
T=0 (non-treated)                   60.000    2600
T=1 (treated)                       75.000    2400
DIFFERENCE TREATED - NON TREATED    +15.000

IS 15.000 THE TRUE IMPACT OF THE POLICY?

Page 47: The logic of Counterfactual Impact Evaluation 1

WITH-WITHOUT (IDENTIFYING ASSUMPTION: NO PRE-INTERVENTION DIFFERENCES)

Page 48: The logic of Counterfactual Impact Evaluation 1

DECOMPOSITION OF WITH-WITHOUT DIFFERENCES

Page 50: The logic of Counterfactual Impact Evaluation 1

We cannot use experiments with firms, for obvious (?) political reasons

The good news is that there are lots of non-experimental counterfactual methods

Page 51: The logic of Counterfactual Impact Evaluation 1

Difference-in-differences (DID) is a combination of the first two strategies

And it is a good way to understand the logic of (non-experimental) counterfactual evaluation

Page 58: The logic of Counterfactual Impact Evaluation 1

[Figure labels: POST DIFFERENCE, PRE DIFFERENCE]

Page 60: The logic of Counterfactual Impact Evaluation 1

POST DIFFERENCE (15.000) – PRE DIFFERENCE (10.000) = Impact (5.000)
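The difference-in-differences arithmetic can be sketched in a few lines of Python. This is an illustration, not part of the slides: the variable names are hypothetical, and the untreated pre-intervention mean does not appear in the transcript, so it is backed out from the stated pre-intervention difference of 10.000.

```python
# Average R&D expenditure per firm, from the slides (dots are thousands separators).
treated_pre, treated_post = 65_000, 75_000
untreated_post = 60_000

# Assumption: only the pre-intervention gap (10.000) is stated, so the
# untreated pre-intervention mean is derived from it.
pre_difference = 10_000
untreated_pre = treated_pre - pre_difference

# With-without difference after the grant.
post_difference = treated_post - untreated_post

# DID computed two equivalent ways.
did_from_gaps = post_difference - pre_difference
did_from_changes = (treated_post - treated_pre) - (untreated_post - untreated_pre)

print(post_difference, did_from_gaps, did_from_changes)
```

Both forms give the same number: DID removes the pre-existing gap from the with-without difference, or equivalently removes the controls' change over time from the pre-post change of the treated.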

Page 62: The logic of Counterfactual Impact Evaluation 1

CAN WE TEST THE PARALLELISM ASSUMPTION?

With four observed means, we cannot

The parallelism becomes testable if we have two additional pre-intervention data points (PRE-PRE)

Page 69: The logic of Counterfactual Impact Evaluation 1

WHEN TO USE DIFF-IN-DIFF?

When we have longitudinal data and have reasons to believe that most of what drives selection is individual unobserved characteristics

Page 70: The logic of Counterfactual Impact Evaluation 1

Second, the path taken by the controls must be a plausible approximation of what would have happened to the treated without the intervention

The following is an example in which it would be better NOT to use DID

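Page 72: The logic of Counterfactual Impact Evaluation 1

            PRE-PRE   PRE      CHANGE
Treated     58.000    65.000   7.000
Controls    57.000    55.000   -2.000
Diff-in-diff before the intervention: 9.000

            PRE       POST     CHANGE
Treated     65.000    75.000   10.000
Controls    55.000    67.000   12.000
Diff-in-diff: -2.000

Diff-in-diff-in-diff -11.000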
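The triple difference on this slide can be reproduced as follows. This is a hedged Python sketch: the period labels and the assignment of rows to treated and control firms are assumptions inferred from the figures (65.000 and 75.000 match the treated firms' pre and post means), and the names are hypothetical.

```python
# Mean outcomes at three time points; labels are assumed, figures are from the slide.
treated = {"pre_pre": 58_000, "pre": 65_000, "post": 75_000}
controls = {"pre_pre": 57_000, "pre": 55_000, "post": 67_000}


def did(start, end):
    """Difference-in-differences between two periods: treated change minus control change."""
    return (treated[end] - treated[start]) - (controls[end] - controls[start])


# DID over the two pre-intervention periods: should be near zero if trends are parallel.
placebo_did = did("pre_pre", "pre")
# The usual DID over the intervention period.
main_did = did("pre", "post")
# Triple difference: the main DID net of the pre-existing divergence in trends.
triple_difference = main_did - placebo_did

print(placebo_did, main_did, triple_difference)
```

A pre-intervention DID of 9.000 shows the two groups were already on different paths before the policy, which is why plain DID is a poor choice in this example.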

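Page 73: The logic of Counterfactual Impact Evaluation 1

58.000 to 65.000: change of 7.000 before the intervention

65.000 + 7.000 = 72.000: linearly projected outcome without the intervention

75.000: observed outcome after the intervention

Linearly projected impact 3.000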
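The linear projection uses only the treated firms' own history. A minimal Python sketch follows, with hypothetical variable names and the figures from this slide; it assumes the pre-intervention change would simply have repeated itself without the policy.

```python
# Treated firms' mean R&D expenditure at three time points (from the slide).
treated_pre_pre, treated_pre, treated_post = 58_000, 65_000, 75_000

# Assumption: without the policy, the pre-intervention change would have continued.
pre_trend = treated_pre - treated_pre_pre
projected_post = treated_pre + pre_trend

# Impact = observed outcome minus the projected (counterfactual) outcome.
projected_impact = treated_post - projected_post

print(pre_trend, projected_post, projected_impact)
```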