
Program Evaluation

PSYCO 325

Sept 27, 2007

Rebecca Watchorn

History

• Effect of:
  – Installing electric street lighting
  – Purification of water
  – Prohibiting child labour
  – Unemployment benefits

Modern example

• Documentary about juvenile delinquents sent to visit a prison, where they meet actual inmates who show them what prison life is like, intended to deter the young people from that life

• Developed into actual programs across the US

Petrosino, A., Turpin-Petrosino, C., & Buehler, J. (2005). Scared Straight and Other Juvenile Awareness Programs for Preventing Juvenile Delinquency. The Scientific Review of Mental Health Practice, 4(1), 48-54.

• Let's Talk Science strives to improve Science literacy through leadership, innovative educational programs, research and advocacy. We motivate and empower youth to use science, technology and engineering to develop critical skills, knowledge and attitudes needed to thrive in our world.

• Wings of Discovery is Canada's first comprehensive year-long hands-on program for children to explore, learn and apply basic mathematics, sciences and technology concepts in daily programming within a structured early years learning program (ages infant – 5 years) and an after-school program (ages 6-12).

• The Butterfly Learning Centre is a preschool / after-school care facility that strives to improve the learning and working environment of an early childhood education institution, with the goal of enriching the learning experience of all children, families and staff. Built in 2001, its facilities include networked computers in all rooms, a specialized ‘science room’, and innovatively designed classrooms. BLC fosters strategic relationships with technology companies to enhance learning for all children.

Becky’s example

What is Evaluation?

Coming up…

• Definition

• Outcome vs. Process evaluation

• Who wants evaluation?

• Functions of evaluation

Definition of Evaluation

The systematic assessment of the operation and/or outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy. (Weiss, 1998)

Outcome vs. Process Evaluation

• Summative vs. Formative
• Outcomes/results/effects
  – Must be careful about definitions
  – What about things you don’t want to happen (e.g., self-labels)?
• Process
  – Integrity of the protocol
  – Can help with understanding the outcome (what are the outcomes actually OF?)

Process evaluation

• Mobiles in nursery

• Found no effect. Why?

• Importance of program fidelity

Who wants evaluation?

• Philanthropic donors

• Local/provincial/federal governments

• Program directors

• Program managers

• Mandated grant requirement

– Each has their own questions and concerns (organizational learning, decision making)

Who wants evaluation?

• Community / corporate donors

• BLC teachers

• BLC directors / board of governors

• Wings of Discovery program creators

• Let’s Talk Science administrators

• Queen’s University Professors

• Wilfrid Laurier University graduate students

• Parents

[Slide graphics: “$30,000 – Premiere Room Sponsors”; “ASSESSMENT AND EVALUATION GROUP”]

Functions of Evaluation

Each evaluation function informs a corresponding development function:

• Review → Project conceptualization
• Needs assessment → Design
• Formative evaluation → Development
• Effectiveness evaluation → Implementation
• Impact evaluation → Institutionalization
• Maintenance evaluation → Project reconceptualization

Formative evaluation

• Purpose: to provide information to guide decisions about fixing problems or enhancing a program at various stages of development

• Can enhance the effectiveness and efficiency of the program

• Resistance: time, money, human nature (reluctance to subject oneself to potential criticism), measurement difficulties

• Key to detecting and reducing flaws in order to eventually attain a high-quality program

Formative Evaluation

E.g.

• How is the Wings of Discovery program being implemented by actual teachers?

• How are the kids responding to the lessons? Do they seem to like them? Are they giving the types of responses program creators anticipated?

Effectiveness evaluation

• Purpose: to determine whether a program accomplishes its objectives within the immediate or short-term context of its implementation

• Decision: How should the program be marketed?
  Ex. questions: Do participants achieve the objectives of the program?
• Decision: Should this program be adopted?
  Ex. questions: What are the implementation requirements? What are the outcomes of the different implementations of this program?
• Decision: How should instructors be trained to implement this program?
  Ex. questions: To what degree do instructors implement the program as designed? What creative adaptations of this program have instructors made?
• Decision: What price should be charged for this program?
  Ex. questions: How do the results achieved with this program compare to alternatives?

Evaluation measures

• How is information collected?
  – E.g., implementation logs, questionnaires, interviews, observations, tests, expert review

Effectiveness Evaluation

E.g.
• Are the children learning what they are supposed to? Are they using the science terms they are introduced to?
• Should the Wings of Discovery program be adopted?
• How should this program be marketed?

Impact evaluation

• Purpose: to determine whether the knowledge, skills, and attitudes learned via the program transfer to the intended context of use

• Typically, in industry, impact questions relate to the bottom line; in education, they relate to the long-term effects of a program

• A key challenge in evaluating impact is establishing causal relationships (best strategy: triangulation)

Impact Evaluation

E.g.

• Are the children transferring what they learn from this program to other contexts (at home, outside of school, or later grades)?

• How do we know if these behaviours are a result of being in the program?

Maintenance evaluation

• Purpose: to monitor the progress and regular use of the program so that decisions about support, modification, or reconceptualization can be informed

• Over time, every system changes: people leave, change roles, equipment becomes obsolete, ‘best practices’ are updated, etc.

• Decision: Should the program be reconceptualized?
  Ex. questions: Is the system still being used? Are its objectives still relevant?
• Decision: Should the implementation plans be modified?
  Ex. questions: How have teachers integrated the program into courses? What aspects of the program have been dropped?
• Decision: Should aspects of the program be updated?
  Ex. questions: Do the participants perceive the program as current?

Maintenance Evaluation

E.g. Future evaluation:

• Is the program still up to date? (advances in technology, scientific understanding)

• Have any aspects of the program been dropped through the years?

• Do parents still think this program is something worthwhile?

Planning the evaluation

Coming up…

• How to decide which questions to pursue

• Types of evaluation questions

• Quantitative or qualitative?

• Ethical issues

How to decide which questions to pursue

– Possible criteria:
  • Decisional timetable (can evaluation information contribute to making a more informed decision?)
  • Relative clout of interested parties
  • Preferences of stakeholders
  • Uncertainties in the knowledge base
  • Practicalities
  • Assumptions of program theory
  • Potential for use of the findings
  • Evaluator’s professional judgment

Goals of evaluation questions

• Program process: what is going on in the program?
  – Fidelity of the program to designers’ intentions
  – E.g., is the program attracting homeless clients? For how many of them is it providing hot meals? Or, more openly: Who is coming to the program? What help is staff giving them?
  – Emphasis on processes of recruitment, service, client response, and program operation
• Program outcomes: consequences of the intervention for its clients
  – Focus on change in clients’ situations (knowledge, behaviours, earnings, health status, drug use, etc.)
• Attributing outcomes to the program: determining whether any changes are due to the program
  – E.g., the economy might have improved and better jobs became available; were trainees at a low point before, and are they now just older and more savvy?
• Links between process and outcomes
  – Are particular features of the program related to better or poorer outcomes?
  – E.g., did group discussions lead to better outcomes than the same information given one on one?
• Explanations: not only what happened, but how and why?
  – If you want to improve the likelihood of success, it helps to know the reasons for achievements and shortfalls

Quantitative or qualitative?

• Quantitative: data that can be transformed into numerical form; analyses usually statistical; reports based largely on the size of effects and statistical significance

• Qualitative: interviewing and observation techniques; analyses and reporting often in narrative form

Quantitative or qualitative?

• Methods should match the central focus of the inquiry
• Program process:
  – New programs: often qualitative. The whole program may be too volatile to tie to a few arbitrary measures; the evaluator can remain open to new information and ideas about what the program is and does.
  – Established programs: often quantitative. A clearly defined program with well-specified activities allows quantitative methods to characterize program process.
• Program outcomes:
  – Are there precise questions? Quantitative methods may be preferable.

Quantitative or qualitative?

• E.g., a job training program:
  – Quantitative: accurate data on the proportion of trainees who find a job after training and the wages they earn
  – Qualitative: how trainees feel about the job-hunting process, the kinds of jobs they look for, why they quit jobs after a brief time, etc.

Note: neither approach is as limited as this implies; these are just tendencies

Ethical issues

• Real people in real programs, often in serious need of help
• Simply put: do not harm the people studied; do not distort the data
• Evaluators intrude into the work domain of staff: interrupting their routines, possibly observing them in action, asking them questions about what they do, know, and think
• Evaluators will have access to client information that might show the client in a poor light and might even subject them to sanctions if others found out (e.g., delinquents may reveal other law violations, veterans receiving assistance may give facts that reveal their ineligibility, etc.)
• Safeguards:
  – Informed consent
  – Confidentiality and anonymity
  – High competence of the evaluator
  – Honesty & integrity
  – Reciprocity: many feel respondents should have access to results
  – Protection of staff and client interests

Developing Measures

Coming up…

• Choices among measures

• Measurement of variables

• Developing new measures

• Desirable characteristics of measures in evaluation

Choice among measures

• Want to measure: inputs, processes, interim markers of progress, longer term outcomes, unintended consequences

• Same types of decisions as choosing questions (practicalities, uncertainties in the knowledge base, preferences of stakeholders)

• Want to be sure the measures are tapping the outcome you want to assess

Measurement of variables

• A demanding/time-consuming phase

• Might be able to use existing measures from earlier studies (trial-and-error work done, established reliability, comparison groups)

Developing new measures

• If you can’t find existing measures, you may need to develop your own.

• Much more difficult than it looks!
  – Balancing, interpretation of questions, etc.

Desirable characteristics of measures in evaluation

• Validity (extent to which the measure captures the concept of interest)
  – Criterion validity (how well does your measure correlate with another established measure of the same concept? a sketch of this check follows below)
  – Construct validity (do differences on your measure actually reflect differences in the theoretical construct?)
  – Content validity (does your measure cover the full spectrum of the concept?)
• Reliability (do repeated efforts to measure the same phenomenon come up with the same answer?)
• Direction (for evaluation, outcome measures usually have a “good end” and a “bad end”; e.g., which direction do you hope to see unemployment rates, birth weights, history test scores, etc. go?)
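A minimal numerical sketch of the criterion validity and reliability checks above, written in Python; the scale names and all scores are made-up assumptions for illustration, not data from the lecture:

    # Hypothetical example: correlate a new science-attitude scale with an
    # established scale (criterion validity) and with a retest of itself
    # (test-retest reliability). All numbers are invented for illustration.
    from math import sqrt

    def pearson_r(x, y):
        """Pearson correlation between two equal-length lists of scores."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    new_measure = [12, 15, 9, 20, 17, 11, 14, 18]    # scores on the new scale
    established = [30, 35, 22, 44, 40, 27, 33, 41]   # same children, established scale
    retest      = [13, 14, 10, 19, 18, 10, 15, 17]   # new scale again, two weeks later

    print("criterion validity r =", round(pearson_r(new_measure, established), 2))
    print("test-retest reliability r =", round(pearson_r(new_measure, retest), 2))

High correlations would support criterion validity and test-retest reliability; the direction of each measure still has to be checked separately.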


Collecting data

Coming up…

• Sources of data

• Sampling

• Interviewing

• Coding responses

Sources of data – more ideas

• Informal interviews
• Observations
• Formal interviews
• Written questionnaires
• Program records
• Data from other institutions (e.g., school, jail, etc.)
• Many others (e.g., tests of knowledge, simulation games, psychometric tests of attitude/value/personality/beliefs, diaries, focus groups, physical tests, etc.)

Sampling

• Purposive sampling – picking particular people for particular reasons
  – Often used if interested in extremes of the population
  – May be needed to address specific policy questions
  – Not generally as justifiable as random sampling (strong statistical advantages in favor of random sampling)
• Random sampling – assuming representativeness through the laws of chance
  – Every unit has to have a known chance of being selected
  – E.g., draw every 3rd name from the list (see the sketch after this list)
  – Larger samples are better
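A minimal sketch of the two probability-based selections just listed, assuming a hypothetical participant roster (all names and numbers are invented for illustration):

    # Systematic selection ("every 3rd name") vs. simple random selection.
    import random

    roster = [f"participant_{i:03d}" for i in range(1, 31)]  # made-up list of 30 names

    # Systematic sample: random starting point, then every 3rd name on the list.
    k = 3
    start = random.randrange(k)
    systematic_sample = roster[start::k]

    # Simple random sample of the same size: every unit equally likely to be drawn.
    simple_random_sample = random.sample(roster, len(systematic_sample))

    print(systematic_sample)
    print(simple_random_sample)

Purposive sampling, by contrast, would be a hand-picked subset (e.g., only the most and least engaged classrooms) and carries no comparable statistical guarantee.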

Interviewing

• Survey interviewing
  – Interviewers are trained so that each one follows the same procedures.
• Open-ended (unstructured) interviewing
  – The interviewer starts with a list of topics to be covered, but questions are crafted to suit the particular respondent and the flow of conversation.

Coding responses

• Narrative responses have to be slotted into a set of categories that capture the essence of their meaning
  – Develop a set of descriptive categories that capture the main themes in the material
  – Assign numerical values to each category
  – Tabulate the number of responses that fall into each category of the code, and analyze these responses against other data
• Quantitative data may also need to be coded (e.g., participants may be grouped based on their responses)
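A minimal sketch of that develop-assign-tabulate sequence, assuming a hypothetical code book and made-up interview snippets (none of this comes from the lecture):

    # Code narrative responses into numbered categories, then tabulate them.
    from collections import Counter

    code_book = {"enjoy": 1, "confus": 2, "bor": 3}  # crude keyword stems -> numeric codes

    responses = [
        "I enjoyed the experiments with magnets",
        "I was confused by the vocabulary",
        "It was boring when we just watched",
        "I enjoyed mixing the colours",
    ]

    def code_response(text):
        """Return the first matching category code; 0 = uncodable/other."""
        for stem, code in code_book.items():
            if stem in text.lower():
                return code
        return 0

    codes = [code_response(r) for r in responses]
    print(Counter(codes))  # Counter({1: 2, 2: 1, 3: 1})

In practice the categories come from reading the material, and a second coder would check agreement; the keyword matching here only stands in for that human judgment.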

Analyzing data

• Convert a mass of raw data into a coherent account
  – Whether quantitative or qualitative: sort, arrange, and process the data to make sense of their configuration
  – Many statistical approaches available
  – Some basic analytic strategies:
    • Describing, counting, factoring, clustering, comparing, modeling, telling the story

Interpretation of results

• Actual graph:

– Mentoring effective for boys and not for girls

– Funder asks: Scrap program for girls?

• Averages (see the numerical sketch after the chart below)
• What are we measuring?
• How could you improve the program for girls (or for more girls)?

[Chart: BASC TRS AB – Final Scores, Spring 2005. Interaction of Gender and Months Mentored: final scores (y-axis, 0–50) by gender (female, male) for children at the 25th vs. 75th percentile of months mentored.]
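A minimal numerical sketch of why an overall average can mask this kind of gender-by-mentoring interaction; all scores are invented for illustration and are not the data behind the chart:

    # Hypothetical problem-behaviour scores (lower = better) for children with
    # little vs. lots of mentoring, split by gender.
    boys_low_mentoring   = [40, 42, 38]
    boys_high_mentoring  = [28, 30, 26]   # clear improvement
    girls_low_mentoring  = [34, 33, 35]
    girls_high_mentoring = [34, 36, 33]   # essentially no change

    def mean(xs):
        return sum(xs) / len(xs)

    print("boys' drop:   ", mean(boys_low_mentoring) - mean(boys_high_mentoring))              # 12.0
    print("girls' drop:  ", round(mean(girls_low_mentoring) - mean(girls_high_mentoring), 2))  # -0.33

    overall_low  = boys_low_mentoring + girls_low_mentoring
    overall_high = boys_high_mentoring + girls_high_mentoring
    print("overall drop: ", round(mean(overall_low) - mean(overall_high), 2))  # 5.83, hides the split

Reporting only the pooled drop would suggest the program works moderately well for everyone; splitting by subgroup is what raises the funder's question about the girls.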

Post-evaluation

• Replication

• Meta-analysis

• Cost-benefit analysis (Is the program worth the cost? Do the benefits outweigh the costs that the program incurs?)
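A minimal worked sketch of the cost-benefit arithmetic; every figure below is a hypothetical placeholder, not a number from any of the programs discussed:

    # Compare a program's yearly benefits (expressed in dollars) with its yearly costs.
    annual_cost = 30_000 + 12_000        # e.g., room sponsorship + staff training (made up)
    annual_benefit = 55_000              # e.g., dollar value assigned to learning gains (made up)

    net_benefit = annual_benefit - annual_cost
    benefit_cost_ratio = annual_benefit / annual_cost

    print(f"net benefit: ${net_benefit:,}")                 # $13,000
    print(f"benefit-cost ratio: {benefit_cost_ratio:.2f}")  # 1.31 (> 1: benefits outweigh costs)

The hard part is not this arithmetic but deciding which benefits can credibly be given a dollar value, which is why cost-benefit analysis comes after the outcome and impact evaluations.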

Evaluation vs. other research

• After all of this, what do you think?
  – Purpose
  – Intended generality
  – Utility
  – Program-derived questions
  – Judgmental quality
  – Action setting
  – Role conflicts
  – Publication
  – Allegiance

References

• Weiss, C. H. (1998). Evaluation. Upper Saddle River, NJ: Prentice Hall.
• Reeves, T. C., & Hedberg, J. G. (2003). Interactive learning systems evaluation. Englewood Cliffs, NJ: Educational Technology Publications.