
Impact Evaluations and

Randomization

Riza Halili

Policy Associate

IPA Philippines

Impact Evaluation of Social

Development Programs

June 20, 2018

Overview

June 20, 2018

1. Impact Evaluations and Randomization
2. How to Randomize
3. RCTs Start to Finish

Presentation Overview

• Introducing Innovations for Poverty Action

• Theory of change, indicators (monitoring), and

impact evaluation

• Causality and impact

• Impact evaluation methods: Non-experimental

and Experimental

• Conclusions


The Problem → Our Solution

OUR VISION: More evidence, less poverty

IPA’s Approach

We generate insights on what works and what does not through randomized evaluations, and ensure that those findings will be useful to, and used by, practitioners and policymakers.

ABOUT IPA

• 425 completed projects

• 431 ongoing projects

• 50+ countries

• 400+ leading academics

• 400+ partner organizations (public, private,

and non-profit sectors)

• 20 country offices

IPA Stats

ABOUT IPA

WE WORK ACROSS SECTORS

Financial Services, Health, Agriculture, Education, Governance & Democracy, Small & Medium Enterprises

Focusing on the Local

• 20 countries with a long-term presence
• Widely recognized as the experts in field-based randomized evaluations

ABOUT IPA-PH

• Agrarian Reform (CARP)

• Agricultural Micro-insurance

• Kalahi-CIDSS

• KASAMA (child labor)

• Labeled Remittances

• Special Program for the Employment of Students (SPES)

• SME Credit Scoring

• Court decongestion

• Micro-savings

• Micro-credit

• Combating Vote-Selling

• Values education

• Graduation of the ultra-poor

• Rules of Thumb Financial Literacy

PROJECTS

ABOUT IPA-PH

• Bank of the Philippine Islands

• Department of Agrarian Reform

• International Care Ministries

• Philippine Crop Insurance Corp.

• Department of Social Welfare and Development

• Development Bank of the Philippines

• Department of Labor and Employment

• Department of Education

• Supreme Court

• Office of the Vice President

• Negros Women for Tomorrow Foundation

• First Macro Bank

• Asian Development Bank

• National Economic Development Authority

PARTNERS

In the News

• IPA “has succeeded in bringing complex issues in aid and development to the forefront of global development media coverage.” (The Guardian)

Presentation Overview

• Introducing Innovations for Poverty Action

• Theory of change, indicators (monitoring),

and impact evaluation

• Causality and impact

• Impact evaluation methods: Non-experimental

and Experimental

• Conclusions

Components of Program Evaluation

• Needs Assessment: What is the problem?
• Program Theory Assessment: How, in theory, does the program fix the problem?
• Process Evaluation: Does the program work as planned?
• Impact Evaluation: Were its goals achieved? What was the magnitude?
• Cost Effectiveness: Given magnitude and cost, how does it compare to alternatives?

Levels of Program Evaluation

Needs Assessment

Program Theory Assessment

Process evaluation

Impact evaluation

Cost-benefit / Cost-effectiveness analysis


[Diagram: Conceptualizing and Designing the Programme → Implementation → Assessment]

Build a Theory of Change

What is a Theory of Change

(ToC)?

• An explanation of how something is made different

• Definition

–A theory of change is a structured approach used in the

design and evaluation of social programs to explore change

and how it happens. It maps the logical chain of how

program inputs achieve changes in outcomes.

–An on-going process of reflection to explore change and how

it happens – and what that means in a particular context,

sector, and/or group of people.

–Guide for what we need to monitor and evaluate

Theory of change is a PROCESS and a PRODUCT.

Causal Hypothesis

Do changes in one variable cause changes in another variable?

Q: How do I expect results to be achieved?

A: If [inputs] produce [outputs] this should lead to [outcomes]

which will ultimately contribute to [goal].

Theory of Change Components

• Inputs / Program Activities: What we do as part of the program (deliver, teach, offer loans, etc.)
• Outputs: Tangible products or services produced as a result of the activities; usually can be counted.
• Intermediate Outcomes: Short-term behavioral changes that result from the outputs (preventive health habits, usage of tablets).
• Impact: Long-term changes that result from the outcomes; the result of the program.

6 Steps to Building a ToC

1. Situation analysis – Specifying the context

2. Clarify the Program goal

3. Design the Program/product

4. Map the causal pathway

5. Explicate assumptions

6. Design SMART indicators

Building a Theory of Change

Situation/Context Analysis: High health worker absenteeism, low value of immunization, limited income and time

INPUT: Immunization Camps; Incentives for Immunization
OUTPUT: Camps are reliably open; Incentives are delivered
OUTCOME: Parents bring children to the camps; Parents bring children to the camps repeatedly
GOAL: Increased Immunization

ToC: Explicate Assumptions

• Definition

– Hypotheses about factors or risks which could affect

the progress or success of an intervention

• Intervention results depend on whether or not the assumptions made prove to be correct

• Assumptions are the key to unlocking theory of

change thinking

• Source: http://www.unaids.org/sites/default/files/sub_landing/files/11_ME_Glossary_FinalWorkingDraft.pdf

Theory of Change: Assumptions

Situation/Context Analysis: High health worker absenteeism, low value of immunization, limited income and time

Causal chain: Immunization Camps + Incentives for Immunization → Parents bring children to the camps → GOAL: Increased Immunization

Assumptions along the chain: the camp provides immunizations; parents value incentives; parents trust camps; incentives are paid regularly.

ToC: Design Indicators

• Indicators for each component: goal, outcome, output, input
• Risk indicators: measure whether assumptions and risks have been met and are facilitating change

Indicators: What are they?

Indicators are:

–A measurement of achievement or change.

How do we know that we are achieving the results we set

out to achieve?

Indicators vs. targets

• Indicators: progress markers. What do we measure and track?
• Targets: What are our goals? What change do we expect to see? In what time frame?

• Non-directional vs. directional
  – Indicator: measures change
  – Target: tells us whether change should go up or down
• Time-bound
  – Indicator: may or may not specify a time period
  – Target: tells us by when we should expect to see change

Indicators vs. targets

Components of an indicator:

Component                                        | Example
What is to be measured                           | Farmers adopting sustainable farming practices
Unit of measurement                              | % of farmers
Quality or standard of the change to be achieved | # of practices adopted by farmers
Target population                                | Farmers benefited by the program

Full indicator: % of farmers employing at least 3 sustainable farming practices.

Indicators vs. targets

Additional components of a target:

Component                               | Example
Baseline status                         | From 20%
Size, magnitude, or dimension of change | Increase to 50%
Time frame                              | 1 year after the training

Full target: Increase the % of farmers who employ at least 3 sustainable farming techniques from 20% to 50% one year after the training.

Objective vs. subjective indicators

• Objective

– Observed

– Information collected will be the same if collected by different people

– Can be quantitative

• Subjective

– Reported by beneficiary/respondent

– Measured using judgment or perception

Example:

Kg of rice harvested vs. kg of rice harvested as reported by farmer

Quantitative vs. Qualitative

indicators

• Quantitative:

–Statistical measures

–#, %, rate, ratio

–Easily aggregated and compared

Examples:

• # of trainings held

• % of farmers adopting sustainable farming practices

• # of trees planted per household

Quantitative vs. Qualitative indicators

• Qualitative:
  – Capture judgments and perceptions: “quality of”, “extent of”, “compliance with”, “satisfaction with”
  – Can be quantified and compared, e.g. a scale to measure teaching quality, or a scale to measure youth perceptions of self-efficacy
  – Hint: Use pre-existing scales and indicators!

Examples:
• % of participants who rated the training as high quality
• Extent to which government supports integrate with local programming

ToC: Design Indicators (example)

Situation/Context Analysis: High health worker absenteeism, low value of immunization, limited income and time

INPUT: Immunization Camps + Incentives
Indicators: # of villages with camps established; # of trained health workers
Result: After 6 months, camps were established and equipped to run in 90% of program villages. All health workers were trained to offer parents the appropriate incentives at their visit.

OUTPUT: Camps are open and incentives are delivered
Indicators: # of camps open; # of camps incentives were delivered to
Result: After 9 months, camps were running on a monthly basis at 90% of the planned villages. Incentives were delivered to these camps.

OUTCOME: Parents bring children to the camps
Indicators: # of beneficiaries attending camps; # of beneficiaries receiving incentives
Result: 70–75% of parents brought children to be immunized in the camps that were open and reported receiving incentives.

OUTCOME: Parents bring children to the camps repeatedly
Indicators: # of beneficiaries attending camps repeatedly; # of beneficiaries receiving incentives
Result: 90–95% of parents who immunized their children during the first round of immunization brought them to be immunized for the second round.

GOAL: Increased Immunization Rates
Indicator: # of children immunized
Result: At the end of the program, the immunization rate was 39% in intervention villages compared to 6% in comparison villages.

ToC: Design Indicators (assumptions)

Situation/Context Analysis: High health worker absenteeism, low value of immunization, limited income and time

Causal chain: Immunization Camps + Incentives for Immunization → Parents bring children to the camps → GOAL: Increased Immunization

Assumptions and their indicators:
• Camp provides immunizations: # of immunizations administered at the camp site by hired nurses
• Parents value incentives: # of beneficiaries receiving incentives
• Incentives paid regularly: # of incentives bought as reflected in receipts
• Parents trust camps: # of beneficiaries attending camps repeatedly


Why is Theory of Change Important?

• For evaluators, it reminds us to consider process.
• For implementers, it helps us be results oriented.

[Recap diagram: INPUT: Immunization Camps + Incentives → OUTPUT: Camps are reliably open; Incentives are delivered → OUTCOME: Parents bring children to the camps, and repeatedly → GOAL: Increased Immunization]

Theory Failure vs. Implementation Failure

Causal chain: Inputs → Activities → Outputs → Outcomes → Goal

• Successful intervention: the chain holds from inputs all the way to the goal.
• Implementation failure: the program is not delivered as planned, so the chain breaks before the outputs.
• Theory failure: the program is delivered as planned, but the outputs fail to produce the expected outcomes and goal.

Presentation Overview

• Introducing Innovations for Poverty Action

• Theory of change, indicators (monitoring), and

impact evaluation

• Causality and impact

• Impact evaluation methods: Non-experimental

and Experimental

• Conclusions

What is Impact Evaluation?

• Two key concepts…

–Causality

–The counterfactual…what is that?

Impact Evaluation tells you…

The causal effect of a program or activity on

an outcome of interest by comparing the

outcomes of interest (short, medium, or long

term) with what would have happened without

the program—a counterfactual.

What is causality…

and what do we mean by impact?

Which of the following indicates

a causal relationship?

A. A positive correlation between computer aided learning

and test scores

B. A positive correlation between income and health

outcomes

C. A positive correlation between years of schooling and

income

D. None of the above

E. Don’t know

Causality

Cause-and-effect language is used every day in a lot of contexts, but it means something very specific in impact evaluation.

• We can think of causality as:

• Isolating the singular effect of a program, independent of any other intervening factors, on an outcome of interest and estimating the size of this effect accurately and with confidence

• We use impact evaluation to rule out the possibility that any other factors, other than the program of interest, are the reason for these changes

Measuring Impact for Pratham’s Balsakhi Program

Case 2: Remedial Education in India – Evaluating the Balsakhi Program
Incorporating random assignment into the program

What was the Problem?

▪ Many children in 3rd and 4th standard were not even at

the 1st standard level of competency

▪ Class sizes were large

▪ Social distance between teacher and many of the

students was large

Proposed Solution

▪ Work with Pratham in 124 Municipal Schools in

Vadodara (Western India)

▪ Hire local women (Balsakhis) from the

community

▪ Train them to teach remedial competencies

• Basic literacy, numeracy

▪ Identify lowest performing 3rd and 4th standard

students

• Take these students out of class (2 hours/day)

• Balsakhi teaches them basic competencies

Setting up the impact evaluation

• Implemented over 2 years

• Outcome of interest: test scores

UNDERSTANDING IMPACT

[Chart: Primary Outcome over Time, with the program starting partway through.]

What is the impact of the Balsakhi program?
A. POSITIVE
B. NEGATIVE
C. ZERO
D. DON’T KNOW

Impact

▪ Impact is defined as a comparison between:

• The outcome some time after the program has been introduced

• The outcome at that same point in time had the program not been introduced

This is known as the “Counterfactual”

How to Measure Impact?


UNDERSTANDING IMPACT

In other words, impact evaluation measures how lives have changed (with the program) compared to how they would have changed (without the program). That difference is the IMPACT of the program.

UNDERSTANDING IMPACT

Counterfactual: represents the state of the world that program participants would have experienced in the absence of the program (i.e., had they not participated in the program).

Impact: What is it?

[Chart: Primary Outcome over Time. After the intervention, the observed outcome diverges from the counterfactual trajectory; the gap between the two is the impact.]

UNDERSTANDING IMPACT

Problem: The counterfactual cannot be observed.

Solution: We need to “mimic” or construct the counterfactual.
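To make the counterfactual concrete, here is a minimal Python sketch (all numbers hypothetical): each participant has two potential outcomes, but only one is ever observed, so the counterfactual must be constructed rather than measured.

```python
# Minimal potential-outcomes sketch (hypothetical numbers).
# Each child has two potential test scores: one with the program, one without.
# Only one of the two is ever observed for any given child.

y_with_program = 55.0     # outcome if the child participates (observed for participants)
y_without_program = 48.0  # counterfactual outcome (never observed for participants)

true_impact = y_with_program - y_without_program
print(f"True individual impact: {true_impact}")  # 7.0 -- in practice we see only one of the two scores
```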

Presentation Overview

• Introducing Innovations for Poverty Action

• Theory of change, indicators (monitoring), and

impact evaluation

• Causality and impact

• Impact evaluation methods: Non-

experimental and Experimental

• Conclusions

MEASURING IMPACT

How can we “mimic” or construct the counterfactual? Select a comparison group not affected by the program.

• Randomized: Use random assignment of the program to create a control group that mimics the counterfactual.
• Non-randomized: Argue that a certain excluded group mimics the counterfactual.

MEASURING IMPACT

Non-randomized methods:
• Pre-post
• Simple difference
• Difference-in-differences
• Multivariate regression
• Statistical matching
• Interrupted time series
• Instrumental variables
• Regression discontinuity

Randomized methods:
• Randomized controlled trials

MEASURING IMPACT

An impact evaluation is only as good as the comparison group it uses to mimic the counterfactual.

For each evaluation method, ask yourself:
1. What is the comparison group?
2. What assumptions must be valid in order for the comparison group to accurately represent the counterfactual?

INTERNAL VALIDITY


Objectives Achievement: What intended outputs and outcomes/impact

were found and to what extent can they be attributed to

project/program activities?

Annex A: Evaluation Criteria, NEDA-DBM Joint Memorandum Circular No. 2015-01

…ensure that evaluations are conducted with the highest possible degree

of impartiality in order to maximize objectivity and minimize the potential

for bias.

Annex E: Impartiality, NEDA-DBM Joint Memorandum Circular No. 2015-01

MEASURING IMPACT

Case Study: Pratham’s Balsakhi Program
Case 2: Remedial Education in India – Evaluating the Balsakhi Program
Incorporating random assignment into the program

MEASURING IMPACT: COMPARING METHODS

For each method, we will fill in: the comparison group, the assumptions it requires, and the resulting Balsakhi impact estimate.

Methods: Pre-post; Simple difference; Difference-in-differences; Multivariate regression; Randomized controlled trial.

MEASURING IMPACT: PRE-POST

One of the most common methods for determining impact: compare data for program participants BEFORE and AFTER the intervention.

MEASURING IMPACT: PRE-POST

Balsakhi Program: Outcomes

• You are tasked to conduct a pre-post evaluation of the Balsakhi program on at-risk children, evaluating the impact of the program on test scores.

1. What is the comparison group in this evaluation?
2. What are the potential problems with this evaluation? I.e., what assumptions must be true in order for the comparison group to be valid?

Pre-post comparison group: program participants before participating in the program.

MEASURING IMPACT: PRE-POST

[Chart: Average test scores of Balsakhi students at start and end of program.]

Average post-test score for children with a Balsakhi: 51.22
Average pretest score for children with a Balsakhi: 24.80
Difference: 26.42
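As a minimal sketch of the arithmetic (in Python, using the figures above), the pre-post estimate is simply the mean score after the program minus the mean score before it:

```python
# Pre-post estimate: mean outcome after the program minus mean outcome
# before it, computed for program participants only (figures from the slide).
pretest_mean = 24.80
posttest_mean = 51.22

pre_post_estimate = posttest_mean - pretest_mean
print(f"Pre-post impact estimate: {pre_post_estimate:.2f}")  # 26.42
```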

MEASURING IMPACT: PRE-POST

Problem: many other factors could have influenced test scores alongside the Balsakhi program:
• school feeding program
• free uniforms
• potable water system installed
• conditional cash transfers
• new textbooks
• teacher trainings
• improved roads
• a good harvest
• natural maturity / increased cognitive skills over time

A pre-post comparison attributes all of these to the program.

Pre-post
• Comparison group: program participants before participating in the program
• Assumption: the program was the only factor influencing outcomes over time
• Balsakhi impact estimate: 26.42*

MEASURING IMPACT: SIMPLE DIFFERENCE

Measures the difference between program participants and non-participants after the program is completed.

[Chart: outcomes of participants and non-participants after the program starts.]

Balsakhi Program: Outcomes

• You are tasked to conduct a simple difference evaluation of the Balsakhi program, evaluating the impact of the program on test scores.

1. What is the comparison group in this evaluation?
2. What are the potential problems with this evaluation? I.e., what assumptions must be true in order for the comparison group to be valid?

Simple difference comparison group: non-participants from whom we have outcome data.

MEASURING IMPACT: SIMPLE DIFFERENCE

[Chart: Average test scores at end of program. Not enrolled: 56.27; enrolled: 51.22.]

Average score for children with a balsakhi: 51.22
Average score for children without a balsakhi: 56.27
Difference: -5.05
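A minimal sketch of the same arithmetic for the simple difference (using the figures above): the post-program mean of participants minus the post-program mean of non-participants.

```python
# Simple difference estimate: mean outcome of participants minus mean
# outcome of non-participants, both measured after the program ends.
mean_with_balsakhi = 51.22
mean_without_balsakhi = 56.27

simple_difference = mean_with_balsakhi - mean_without_balsakhi
print(f"Simple difference estimate: {simple_difference:.2f}")  # -5.05
```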

MEASURING IMPACT: SIMPLE DIFFERENCE

Selection effect
• Self-selection: those who voluntarily join the program are likely to be different from those who don’t (e.g., more motivation, better access, etc.)
• Administrative selection: administrators select participants based on certain criteria
• As a result, treatment and comparison groups are not comparable

Simple difference
• Comparison group: non-participants from whom we have outcome data
• Assumption: participants and non-participants are identical except for program participation (i.e., no selection effect)
• Balsakhi impact estimate: -5.05*

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

Combines the simple difference and pre-post approaches: measures changes in outcomes over time of program participants relative to the changes in outcomes of non-participants.

[Chart: outcomes of participants and non-participants, before and after the program starts.]

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

[Chart: Primary Outcome over Time. The treatment group rises from 24.8 to 51.22; the comparison group rises from 36.67 to 56.27.]

Before the program: T - C = -11.87
After the program: T - C = -5.05
IMPACT = -5.05 - (-11.87) = 6.82

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

                                                            | Pretest | Post-test | Difference
Average score for children with a balsakhi (treatment)     | 24.80   | 51.22     | 26.42
Average score for children without a balsakhi (comparison) | 36.67   | 56.27     | 19.60
Difference                                                  | -11.87  | -5.05     | 6.82
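A minimal sketch of the difference-in-differences calculation in Python, using the four cell means from the table above:

```python
# Difference-in-differences: the change over time in the treatment group
# minus the change over time in the comparison group.
treat_pre, treat_post = 24.80, 51.22
comp_pre, comp_post = 36.67, 56.27

did_estimate = (treat_post - treat_pre) - (comp_post - comp_pre)
print(f"Difference-in-differences estimate: {did_estimate:.2f}")  # 6.82
```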

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

Parallel Trends Assumption: differences between treatment and comparison groups do not have more or less of an effect on outcomes over time; the differences have a constant effect on outcomes.

[Chart: test scores over time. The gap between the two groups stays constant at 10 before and after the program starts.]

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

[Chart: parallel trends violated. The gap between the two groups grows from 10 to 24 over time, so the assumption fails and the difference-in-differences estimate is biased.]
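A small simulation sketch (all numbers hypothetical) of why the assumption matters: difference-in-differences recovers the true effect when the gap between groups stays constant over time, and is biased when the trends diverge.

```python
# Hypothetical illustration of the parallel trends assumption.
true_effect = 5.0

# Case 1: parallel trends. Both groups improve by 20 points; the gap stays at 10.
treat_pre, comp_pre = 30.0, 40.0
treat_post = treat_pre + 20.0 + true_effect   # 55.0
comp_post = comp_pre + 20.0                   # 60.0
did_ok = (treat_post - treat_pre) - (comp_post - comp_pre)
print(did_ok)      # 5.0 -> recovers the true effect

# Case 2: diverging trends. The comparison group improves by 34, so the gap
# grows from 10 to 24 even without the program.
comp_post_diverging = comp_pre + 34.0         # 74.0
did_biased = (treat_post - treat_pre) - (comp_post_diverging - comp_pre)
print(did_biased)  # -9.0 -> biased, even though the true effect is 5.0
```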

MEASURING IMPACT: DIFFERENCE-IN-DIFFERENCES

Balsakhi Program: Outcomes

• You are tasked to conduct a double difference evaluation of the Balsakhi program on at-risk children, evaluating the impact of the program on test scores.

1. What is the comparison group in this evaluation?
2. What are the potential problems with this evaluation? Be specific.

Difference-in-differences
• Comparison group: non-participants from whom we have outcome data before and after the program
• Assumption: if the program wasn’t implemented, the two groups would have had identical trajectories
• Balsakhi impact estimate: 6.82*

MEASURING IMPACT: MULTIVARIATE REGRESSION

Like simple difference, but “controls” for factors that might explain differences in outcomes other than the program, using explanatory variables (age, income, education, etc.).
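A minimal sketch of how such a regression might be run in Python with statsmodels. The data file and column names here are hypothetical illustrations, not the actual Balsakhi study data; the coefficient on the program indicator is the impact estimate only under the assumptions summarized below.

```python
# Sketch of a multivariate regression impact estimate.
# Hypothetical file and column names; one row per child.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("balsakhi_survey.csv")

# The coefficient on in_program estimates the program effect, holding the
# listed controls fixed -- valid only if no unobserved differences remain.
model = smf.ols(
    "post_test_score ~ in_program + pre_test_score + age + household_income",
    data=df,
).fit()
print(model.params["in_program"])
```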

Multivariate regression
• Comparison group: non-participants
• Assumption: all observable differences are controlled for, and no unobservable differences affect the outcome
• Balsakhi impact estimate: 1.92

MEASURING IMPACT: OTHER METHODS

There are more non-experimental methods to estimate program impacts:
• Statistical matching
• Regression discontinuity design (RDD)
• Instrumental variables
• Interrupted time series

Common thread: all try to mimic the counterfactual to estimate impact.

Problem: their assumptions are not testable.

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

Also known as:
• Randomized Controlled Trials (RCTs)
• Randomized Assignment Studies
• Randomized Field Trials
• Social Experiments
• Randomized Controlled Experiments

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

Program candidates are randomly split into two groups:
• TREATMENT GROUP: receives the intervention
• CONTROL GROUP: receives no intervention

Outcomes of interest are then measured in both groups.
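A minimal sketch of random assignment in Python; the seed and the school labels are illustrative (the Balsakhi evaluation worked with 124 municipal schools in Vadodara).

```python
# Random assignment sketch: split program candidates into treatment and
# control purely by chance, so the groups do not differ systematically.
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed makes the assignment reproducible

candidates = [f"school_{i}" for i in range(124)]  # illustrative unit labels
shuffled = rng.permutation(candidates)

half = len(shuffled) // 2
treatment_group = list(shuffled[:half])  # offered the intervention
control_group = list(shuffled[half:])    # no intervention; mimics the counterfactual
print(len(treatment_group), len(control_group))  # 62 62
```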

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

KEY ADVANTAGE: Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors.

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

What’s the difference between random selection and random assignment?

From the total population, a target population is identified, and an evaluation sample is drawn from it (the rest are not in the evaluation). Random assignment then splits the evaluation sample into a treatment group and a control group.

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

• EXTERNAL VALIDITY: randomly sample from the area of interest.
• INTERNAL VALIDITY: randomly assign units to program and control, then randomly sample from both program and control groups to measure outcomes.

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

• Assignment purely by chance; allocation unrelated to characteristics that affect outcomes
• With a large enough number of units, the two groups are statistically identical, on average
• Key advantage: balanced on unobservable characteristics as well as observables
• Any differences that subsequently arise can be attributed to the program rather than other factors

MEASURING IMPACT: RANDOMIZED EXPERIMENTS

Evaluation of DOLE’s KASAMA Program: Balance Table

[Balance table not reproduced. Standard deviation in parentheses. Statistics displayed for Regions I, II, III, IV-A, and V.]
*/**/***: statistically significant at the 10% / 5% / 1% level
Source: Edmonds, et al. (2016). Impact Evaluation of KASAMA Program: Baseline Report.

MEASURING IMPACT: SUMMARY OF METHODS

Evaluation method | Comparison group | Assumptions | Balsakhi impact estimate
Pre-post | Program participants before participating in the program | The program was the only factor influencing outcomes over time | 26.42*
Simple difference | Non-participants from whom we have outcome data | Participants and non-participants are identical except for program participation (no selection effect) | -5.05*
Difference-in-differences | Non-participants from whom we have outcome data before and after the program | If the program wasn’t implemented, the two groups would have identical trajectories | 6.82*
Multivariate regression | Non-participants | All observable differences controlled for; no unobservable differences affect the outcome | 1.92
Randomized controlled trial | Units randomly assigned to the control group | Assumptions are limited provided the design is strictly followed | 5.87*

Presentation Overview

• Introducing Innovations for Poverty Action

• Theory of change, indicators (monitoring), and

impact evaluation

• Causality and impact

• Impact evaluation methods: Non-experimental

and Experimental

• Conclusions

Conclusions

▪ A theory of change is the basis for what to monitor and evaluate
▪ Monitoring is just as important as impact evaluation
▪ There are many ways to estimate a program’s impact
▪ Different methods can generate different estimates
▪ Each evaluation method has specific assumptions and limitations
▪ Where applicable, randomized experiments, when properly designed and conducted, provide the most credible method to estimate the impact of a program

More Evidence,

Less Poverty

Innovations for Poverty Action

www.poverty-action.org

rhalili@poverty-action.org

Thank you!
