

Studies in Educational Evaluation 33 (2007) 135–158
doi:10.1016/j.stueduc.2007.04.003
0191-491X/04/$ – see front matter © 2006 Published by Elsevier Ltd.

MULTI-SITE EVALUATION OF SCIENCE AND MATHEMATICS TEACHER PROFESSIONAL DEVELOPMENT PROGRAMS:

THE PROJECT PROFILE APPROACH¹

Sandra K. Abell*, John K. Lannin*, Rose M. Marra*, Mark W. Ehlert*, James S. Cole**, Michele H. Lee*, Meredith A. Park Rogers**, and Chia-Yu Wang*

*University of Missouri-Columbia, USA
**Indiana University, USA

Abstract

The purpose of this article is to describe a new approach for conducting multi-site evaluations of professional development (PD). The approach extends a popular model of PD evaluation through the creation of individual project profiles that take into account the unique contextual variables of individual projects while providing an evaluation picture of the overall program. We provide illustrative evaluation results highlighting the importance of understanding project contexts. Through two case comparisons, we demonstrate how case-based context informed our interpretation of outcomes data. Finally, we discuss how our approach maps onto and extends a popular model for professional development evaluation.

Accountability is a pervasive theme in the current US educational policy arena. Accountability is aimed at improving student learning and achievement (Linn, 2003; Reeves, 2002). Yet, to attain such ends, accountability measures can and often do address a wide range of variables. The focus on producing high-quality teachers for our schools is one such example. Although debates rage over how best to prepare such teachers (Darling-Hammond & Youngs, 2002; US Department of Education, 2002), a shared concern exists about the need. Too many classrooms are taught by individuals who are not certified in the subject matter they are teaching. Nationally, this problem is acute in the areas of science and mathematics, where figures for those who lack state certification in their field range from 28-33% for mathematics teachers and 18-20% for science teachers (Ingersoll, 1999; Olson, 2000).

Designing and implementing effective professional development for mathematics and science teachers is one solution to the teacher quality problem. Professional development (PD) is a term used to describe experiences aimed at improving teacher knowledge and practice. Accountability is an increasingly relevant issue for PD projects as funding agencies try to ensure that their investments result in measurable gains in teacher content knowledge, positive changes in teacher practice, and improved student learning. Although much is known about evaluation methods in general, effective evaluation of PD projects such as the US federally-funded Improving Teacher Quality State Grants Program² poses the special challenge of conducting a single evaluation of a program instantiated by multiple projects, each of which addresses the program goals in differing ways.

The purpose of this article is twofold. First, we describe our approach for conducting multi-site evaluations of PD projects. The approach extends a popular model of PD evaluation (Guskey, 2000). We create individual project profiles that take into account the unique contextual variables of each project while still providing an evaluation picture of the overall program by reporting outcomes data within and across projects. We then explicitly map our approach onto Guskey's (2000) model for evaluating professional development. This mapping allowed us to analyze our approach closely, which, as we describe later in the article, resulted in an adaptation of Guskey's model.

As external evaluators for one state's Improving Teacher Quality Grants Program, we applied our approach to nine PD projects for science and mathematics teachers. After a short review of the literature on large evaluation projects, we describe the approach that the external evaluation team took in evaluating the nine PD projects. We then provide illustrative evaluation results that highlight the importance of understanding project contexts. Through two case comparisons, we demonstrate how the outcomes data depended on the case-based context for interpretation. Finally, we discuss how our approach maps onto and extends the Guskey (2000) model for professional development evaluation.

Review of the Literature

According to Guskey (2000), "True professional development is a deliberate process, guided by a clear vision of purposes and planned goals" (p. 17). Some degree of agreement has emerged about what characterizes effective professional development (Loucks-Horsley, Love, Stiles, Mundry, & Hewson, 2003), including dimensions related to content and implementation. Researchers (Jauhiainen, Lavonen, Koponen, & Suonio, 2002; Loucks-Horsley et al., 2003; Shepardson, Harbor, Cooper, & McDonald, 2002), as well as US standards documents in mathematics and science education (National Council of Teachers of Mathematics [NCTM], 1991; National Research Council [NRC], 1996), agree that PD for mathematics and science teachers should include a focus on subject matter knowledge, pedagogical knowledge, and pedagogical content knowledge. Research evidence demonstrates that effective PD engages teachers as learners (Loucks-Horsley et al., 2003), provides sufficient time, structure, and support for teachers (Garet, Porter, Desimone, Birman, & Yoon, 2001; Sparks, 2002; Thompson & Zeuli, 1999), encourages change in deeply held beliefs and practices through cognitive dissonance (Thompson & Zeuli, 1999), and engages teachers in an intensive and sustainable process of continuous improvement (Garet et al., 2001; Loucks-Horsley et al., 2003).

Even though some degree of agreement exists regarding the characteristics of effective PD, there is still a gap between this knowledge and common practice (Loucks-Horsley et al., 2003). PD continues to be described as (a) lacking in number and variety of opportunities for educators to participate; (b) misaligned to the needs or learning goals emphasized in education reform; (c) insufficient in providing sustained support to educators; (d) focused on changing the individual educator rather than the organization/school; and (e) introducing pockets of innovation with minimal means for impact at the classroom and system levels (Loucks-Horsley et al., 2003). These gaps between research and performance underscore the need to evaluate PD effectiveness.

Many researchers have evaluated the effectiveness of mathematics and science teacher PD programs by examining individual cases of PD (e.g., Jauhiainen, Lavonen, Koponen, & Suonio, 2002; Shepardson, Harbor, Cooper, & McDonald, 2002). Such single-case PD evaluations have also taken place in other curricular areas. For example, in a recent evaluation of a single case of PD for reading instruction, Shaha, Lewis, O'Donnell, and Brown (2004) proposed an approach focusing on three types of impact: learning, attitudinal, and resource. In the reading PD, Shaha et al. measured these at the student and teacher levels using a design that included testing, control groups, and demographic analysis. They reported teacher and student gains in learning; student attitudes were significantly higher for the experimental group than the control group. They reported that this PD program was more cost-effective than others, and the school leadership decided to cease the "costlier" program in favor of this one.

This study, like many in mathematics and science education, focused on the evaluation of a single project at a single location, where the capacity to carry out experimental designs is enhanced. The simultaneous evaluation of multiple projects poses some unique challenges. One challenge is how to report data that are comparable across all projects, while at the same time reporting idiosyncratic characteristics of each project that influence project quality.

Supovitz and Turner (2000) used a multiple site approach to PD evaluation in science education. One of their primary research questions was whether or not the effectiveness of PD programs in 24 communities across the US was related to inquiry teaching. Their outcome variables included teaching practice and classroom culture. Predictor variables included teacher attitudes toward reform, content preparedness, principal support, teacher resource availability, and school resource availability. They collected data consistently and uniformly across projects. Because the data were considered to be nested, the data analysis approach included hierarchical linear modeling. Level 1 variables included teacher demographics, teacher attitudes, resource availability, and principal support. Level 2 variables included information specific to each project (community), such as suburban or rural, percent of free or reduced lunch, and school size. They found that school factors were important predictors of teaching practice and classroom culture. However, they did not report data specific to each project. Though they highlighted many important characteristics of effective PD, they did not report the unique characteristics that may have been important influences in each individual project.
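To illustrate the general logic of such a nested analysis, the sketch below fits a two-level model with teachers (Level 1) nested in projects (Level 2) and a random intercept per project. It is an illustration only, not Supovitz and Turner's actual model; the data file and variable names are hypothetical placeholders.

```python
# Illustrative two-level model: teachers nested in projects/communities.
# All column names and the data file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

teachers = pd.read_csv("teachers.csv")  # hypothetical: one row per teacher

# Fixed effects mix Level 1 (teacher) and Level 2 (project) predictors;
# the random intercept for project_id captures the nesting.
model = smf.mixedlm(
    "inquiry_practice ~ attitude_toward_reform + content_preparedness "
    "+ principal_support + resource_availability "        # Level 1 (teacher)
    "+ pct_free_reduced_lunch + school_size + is_rural",  # Level 2 (project)
    data=teachers,
    groups=teachers["project_id"],
)
result = model.fit()
print(result.summary())
```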


In another multi-site study, Ingvarson, Meiers, and Beavis (2005) investigated the impact of PD on four outcome variables: teacher knowledge, teaching practice, student learning, and teacher efficacy. They predicted that background variables (e.g., school size), structural features (e.g., contact hours), opportunity to learn (e.g., feedback on practice), and professional community (e.g., teacher collaboration) would influence the four outcome variables. Data were collected from 3250 Australian elementary and secondary teacher participants who attended over 80 different PD programs in a variety of subject areas, including mathematics and science. The researchers used regression analysis to determine how much variance in each of the four outcome variables was explained by the background characteristics, structural features, opportunity to learn, and professional community. By doing so, the relative impact of the PD components could be evaluated by the outcomes for each project. This technique allowed for a better understanding of the unique contribution of each PD component for the different projects.
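The sketch below shows one way such a variance-explained comparison can be organized: nested ordinary least squares models are fit block by block, and the incremental R-squared attributed to each block of predictors is reported. It is a generic illustration under assumed data and variable names, not a reproduction of Ingvarson et al.'s analysis.

```python
# Illustrative block-wise regression: track the gain in R-squared as each
# block of predictors is added. Data and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pd_survey.csv")  # hypothetical: one row per teacher

blocks = [
    ("background", ["school_size", "school_sector"]),
    ("structural features", ["contact_hours", "span_months"]),
    ("opportunity to learn", ["feedback_on_practice", "active_learning"]),
    ("professional community", ["teacher_collaboration"]),
]

terms, prev_r2 = [], 0.0
for label, cols in blocks:
    terms.extend(cols)
    formula = "teacher_knowledge ~ " + " + ".join(terms)
    r2 = smf.ols(formula, data=df).fit().rsquared
    print(f"{label:>24}: R2 = {r2:.3f} (gain = {r2 - prev_r2:.3f})")
    prev_r2 = r2
```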

Other multi-site PD evaluation has taken place under the auspices of PD programs funded by the US National Science Foundation (NSF) and the Eisenhower program. Contracted by the NSF, Horizon Research Inc. (HRI) conducted an evaluation of multi-site PD projects in mathematics and science, analyzing the comprehensive evaluation reports from 61 NSF Local Systemic Change (LSC) projects (Boyd, Banilower, Pasley, & Weiss, 2003). The evaluation reports blended data from core evaluation observations, interviews, and questionnaires with data from cross-site analyses of teacher and principal questionnaires, teacher interviews, classroom observations, and PD observations, as well as in-depth exit interviews with PIs of LSCs ending in 2000, 2001, and 2002. HRI used factor analysis to identify survey questions that could be combined into "composites" (Flora & Panter, 1999). Each composite represented an important construct related to one of the key evaluation questions. Evaluators reported the impact of the LSC initiative on teachers and their teaching by the extent of teacher involvement in LSC PD activities. The results reported by HRI included a list of overarching themes related to the impact of the PD provided in the LSCs. Their reports did not discuss how these themes related to specific PD projects.

Several evaluations of the Eisenhower PD program have been conducted by different entities. The Eisenhower National Clearinghouse and Regional Consortia conducted annual evaluations of activities and associated outcomes and issued their last report in 2004 for activities conducted between October 1, 2002 and September 30, 2003 (Eisenhower Network Evaluation Committee, 2004). The report was organized around performance standards associated with two primary objectives, providing technical assistance and dissemination. Using administrative data and surveys from samples of teachers who participated in at least 12 hours of Eisenhower-sponsored activities, the report documented service levels and participants' ratings of the quality and usefulness of those services. The impact of PD experiences on student learning was estimated by summarizing changes in school-level performance on state assessments. While this evaluation yielded positive outcomes from participating in PD activities, the report did not attempt to document and/or control for differences in school contexts or the characteristics of specific projects.

In another evaluation of specific components of the overall Eisenhower program, Blackwell (2004) analyzed artifacts and survey responses from nine Initial Teacher Preparation projects to investigate whether state-level policies on teacher preparation and PD had been affected by these unique partnership grants. The author described results across eight dimensions of teacher preparation programs from a standards survey and recognized the diversity of approaches to instruction and evaluation evident in the various projects. Many of the "summaries" of results included project-specific exemplars of practices or descriptions of how projects diverged from overall findings. The Eisenhower National Clearinghouse also published a report describing lessons learned about collaboration from the work of the Regional Consortia (National Network of Eisenhower Regional Consortia and Clearinghouse, 2004).

The most comprehensive evaluation of the Eisenhower Professional Development Program was conducted between 1997 and 2000 by the American Institutes for Research. Different research designs and methods were used to investigate different evaluation questions. The first report from this program evaluation was based on case studies in six school districts and focused on describing PD programs and identifying issues that might affect implementation efforts at specific sites (Birman, Reeve, & Sattler, 1998). The second report was based on survey and interview data from a national probability sample of participating teachers and PD providers and in-depth case studies at 10 participating school districts. The research team used regression analyses to document relationships among structural characteristics, core features, and participating teachers' changes in knowledge and practice (Garet, Birman, Porter, Desimone, & Herman, 1999). The third report examined changes in teacher practices as a function of PD and was based on longitudinal survey data from mathematics and science teachers in 30 schools. The researchers examined differences in characteristics of Eisenhower-funded PD and other district PD and the extent to which teachers attributed changes in knowledge and practice to their PD experiences (Porter, Garet, Desimone, Yoon, & Birman, 2000). The results of the third study indicated that specific characteristics of PD were related to changes in teacher practice. The authors used hierarchical linear modeling techniques to identify sources of variation in measures from different districts, schools, and subject areas, but did not attempt to describe key features of those differences that may have influenced the quality of PD or its effectiveness in changing teaching practice. In summary, while national studies of the Eisenhower program documented that differences in projects and activities occurred and had an impact on the quality and effectiveness of PD efforts, the overall focus was on the aggregate effectiveness of all projects, i.e., the program.

In this article, we describe an evaluation model that extends the literature on multi-site PD evaluation by demonstrating the need to address the contextual variables of single projects along with documenting program outcomes as a whole. We begin by describing the context of our evaluation and the approach we took. Next we illustrate our results using two individual cases. Finally, we discuss the theory reflected in our multi-site PD evaluation model.

Context

In June, 2002, the Missouri Department of Higher Education (MDHE) announced a Request for Proposals (RFP) for Cycle 1 of the Improving Teacher Quality State Grants Program. The RFP solicited projects that would increase student achievement in mathematics and science, K-12, through improving teacher and principal quality, thereby increasing the number of highly qualified mathematics and science teachers in the state. The RFP recommended that each project address the following core features in its design:

(1) Ongoing collaboration of teachers (measured in years) for purposes of planning; (2) the explicit goal of improving students' achievement of clear learning goals; (3) …anchored by attention to students' thinking, the curriculum, and pedagogy; and (4) access to alternative ideas and methods, with opportunities to observe these in action and reflect on the reasons for their effectiveness. (MDHE, 2002, p. 4)

More specifically, proposal writers were required to plan and provide PD designed to meet school needs over a 17-month period, including a 10-16 day summer institute and intensive, school-based, job-embedded follow-up of 4-10 days. The PD projects could involve paraprofessionals and principals as participants, in addition to teachers, to support the implementation of standards-based instructional practices. Funds were also available to eligible partners to develop appropriate curriculum materials for participating teachers and/or their students. Eligible partnerships were required to include: a higher education unit that prepares teachers; a higher education arts and sciences unit; and high-need schools/school districts. High-need schools/school districts were defined by (1) 20% poverty; (2) teacher quality (a high percent of teachers teaching out of field or without certification); and/or (3) low student achievement in mathematics or science.

Each project was required to include an internal evaluation component to assess project outcomes. In addition, the MDHE (2003) produced a separate RFP that solicited proposals for external evaluation of the funded projects. In February, 2003, we were contracted by MDHE to serve as the Cycle 1 external evaluation team. Subsequently, we have been awarded contracts for evaluation of Cycles 2-8. This article focuses on our Cycle 1 evaluation.

The Cycle 1 evaluation RFP required that the external evaluation team provide formative and summative evaluation of the funded PD projects, both individually and collectively. We were required to conduct two site visits to each project as well as collect and analyze data on teacher quality and student achievement. We began the external evaluation design process with these requirements in mind.

Our Approach to PD Evaluation

Design Process

Our evaluation design process began as we examined the MDHE RFPs, in terms of requirements for both the PD projects and external evaluation. Using the RFPs as a starting point, the first design stage involved determining how we would evaluate the degree to which projects attained the objectives specified in the RFP along with project-specific goals. We thought that project goals and activities would be related to the theoretical assumptions held by the project team (e.g., that inquiry-based science teaching was essential to student performance, or that teachers need to be taught in ways that model how their students will learn).

Our evaluation design was also influenced by a model of PD program evaluation used by Medina, Pollard, Schneider, and Leonhardt (2000) to evaluate history teacher PD in California. The model links teacher PD, classroom practice, and student learning, implying that the teacher learning that occurs during PD influences instructional planning and delivery, which in turn influences what and how students learn. Student performance becomes an input into the cycle for planning new PD experiences. This view of the influence of PD on teacher practice and student learning was consistent with the goals of the Cycle 1 RFP.

We further believed that the extent of teacher learning, and the effect of that learning on teacher work and on student learning, would be related to the quality of implementation of the PD itself. The research and standards on effective PD, as well as the characteristics listed in the Cycle 1 RFP, influenced our thinking about what would constitute PD quality. Therefore, an important component of the evaluation design was to plan data collection around key dimensions of PD planning and implementation suggested by research and standards for PD (Jauhiainen et al., 2002; Loucks-Horsley et al., 2003; NCTM, 1991; NRC, 1996), including:

• organization and coherence;
• degree of emphasis on improving content knowledge versus improving teaching practice;
• reform-based instructional techniques;
• availability of resources and materials to facilitate classroom implementation; and
• level of effort and support during the subsequent school year.

Thus, our evaluation design sought to gather data about program goals, assumptions, activities, and participants, as well as the quality of implementation (see Figure 1). We believed that outcomes for teachers and students would result from the interrelationships among these factors.

Figure 1: Cycle 1 Initial Evaluation Design Model.
[Figure elements: Program Goals, Theories & Assumptions; Program Activities & Methods; Participants and their Schools; Implementation; Outcomes.]


Research Questions

We designed our evaluation to determine: (1) how well projects attained their objectives; (2) the quality of the PD that was delivered; and (3) what outcomes were achieved for teachers and students. In order to address these evaluation goals, we developed the following set of research questions that guided data collection and analysis:

1. What are the characteristics of the funded PD projects? How much variation and similarity exist in philosophy, assumptions, and methods? To what extent do projects achieve their stated goals and the goals of the Improving Teacher Quality Program?
2. What are the characteristics of the teachers who participate in the PD projects?
3. What learning outcomes (knowledge and practice) for teachers are demonstrated in the PD projects? How are they related to project goals?
4. Are there relationships among project characteristics, teacher characteristics, and teacher learning outcomes?
5. What learning outcomes are demonstrated by students of participating teachers?
6. Do aggregate measures of student achievement based on the Missouri Assessment Program (MAP) and other standardized assessments show increases in schools with teachers who participated in PD activities?

Procedures

Members of the external evaluation team conducted two site visits per PD project, the first near the end of the project's summer institute (June or July, 2003), and the second at a project follow-up activity held during the subsequent academic year. During the site visits, we made observations and took field notes, interviewed the project director and other contributing staff members, and interviewed a random sample of teachers (N = 72). We also collected data on participants and their schools using a Demographic Questionnaire.

After the summer site visit, we used inductive analysis to examine the interview and observational data, and compared those data with goals as explicated in the observed project's original proposal. We then wrote brief formative reports to provide feedback to each project director. Each report characterized the assumptions, goals, and activities of the project and listed strengths, areas needing improvement, and specific recommendations for project improvement.

The summative evaluation focused on the quality of the implementation of PD projects and resulting outcomes. Site visits, including interviews and observations, were an important source of information about PD implementation. In addition, participants were asked to complete a Satisfaction Survey at the end of the summer institute and again at the end of the following academic year. These surveys provided data on teacher perceptions of goal attainment, effectiveness of instruction, impacts of participation on teaching practice and student learning, overall satisfaction with project goals and activities, and suggestions for improving PD projects.

Availability of data about important project outcomes depended in large part on the cooperation of each project director and internal evaluator with the external evaluation team. Most projects collected data on teacher content knowledge, teacher pedagogical practice, and/or teacher attitudes. Some projects also collected student achievement data, although this was less common. Project directors summarized data from internal evaluation activities in the final reports submitted to the MDHE.

Student learning outcomes were measured by some internal evaluators via teacher-made assessments and standardized tests. The external evaluators received teacher self-reported data on student achievement via the Satisfaction Survey. In addition, we examined changes in school-level performance on relevant Missouri Assessment Program (MAP) examination results. We calculated average MAP performance measures on mathematics and science exams for all schools in the state for all years prior to Cycle 1 implementation. We calculated changes in performance measures by subtracting the average performance from the 2004 performance in each school. We tested whether the MAP performance of schools with teachers who participated in the PD projects improved more than that of other schools. We compared average change scores for each relevant test using a one-way analysis of variance. In addition, school-level change scores were regressed on the percent of teachers who participated in the PD project while controlling for the percent of students eligible for free/reduced lunch, school size measured as total enrollment, and the percent minority student enrollment.
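The sketch below makes this school-level analysis concrete: a change score is computed for each school, participating and non-participating schools are compared with a one-way ANOVA, and change scores are regressed on the percent of participating teachers with demographic controls. The data file and column names are hypothetical placeholders, not the actual MDHE or MAP data.

```python
# Illustrative school-level analysis of MAP change scores.
# File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

schools = pd.read_csv("map_schools.csv")  # hypothetical: one row per school

# Change score: 2004 performance minus the average of pre-Cycle 1 years.
schools["change"] = schools["map_2004"] - schools["map_pre_cycle1_avg"]

# One-way ANOVA comparing mean change for participating vs. other schools.
groups = [g["change"].values for _, g in schools.groupby("participated")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Regress change scores on the percent of participating teachers, controlling
# for free/reduced lunch eligibility, enrollment, and percent minority enrollment.
ols = smf.ols(
    "change ~ pct_teachers_in_pd + pct_free_reduced_lunch "
    "+ total_enrollment + pct_minority",
    data=schools,
).fit()
print(ols.summary())
```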

Reporting

In order to capture the data on context, quality of PD delivery, and outcomes, we constructed individual profiles for each of the nine PD projects. Each profile consisted of six main sections that were linked to our research questions and reflected our evaluation model (Figure 1). In the first, Project Background, we described the project team and their experience with PD, the goals for the project, and the theoretical assumptions guiding project design. The second section, Project Design, described the activities of the PD, including the summer institute and follow-up activities. In the third section, Participants and their Schools, we presented data on the characteristics of the PD participants (gender, educational background, teaching experience, and certification) and of their schools (percent of free and reduced lunch and performance on the state science or math examination). The fourth section was Quality of Implementation. Evaluators analyzed and synthesized data from observations, interviews, and the Satisfaction Survey to characterize implementation of the summer institute and follow-up activities, and the cohesiveness of the instructional team. In the fifth section of the profile, Outcomes, we presented data on teacher content knowledge gains, change in practice, and student achievement. In the final profile section, Recommendations, we listed suggestions for improving the PD based on the evaluation. Our final report (Abell et al., 2004) also included project participant demographics for the total set of participants and results across all nine projects related to project implementation, participant satisfaction, and outcomes for teachers and students.

Examining Contextual Variables: Sample Profile Comparisons

As the literature on multi-site project evaluation shows, one of the challenges of PD evaluation is to capture simultaneously the individual characteristics of each project while also making claims across all projects. We intended that the individual project profiles would provide windows into each project, whereas the cross-project evaluation would allow summative claims about the PD program as a whole. During the reporting process, we recognized that the profiles helped us understand the similarities and differences among projects, which in turn helped us to interpret the outcomes in light of project context variables and quality of implementation. We present cross-case analyses of two PD project pairs (one science, one mathematics), selected purposefully to illustrate our evaluation model. For this analysis, we drew key data from selected profile sections including Project Design, Participants and their Schools, Quality of Implementation, and Outcomes. In the first comparison in each pair, we present information associated with the first three boxes in Figure 1: Program Goals, Activities, and Participants (factors related to planning and program design). In the second comparison in each pair, we present information associated with the last two boxes in Figure 1: Implementation and Outcomes.

Cross-case Analysis I: Science Projects

Of the six science PD projects in Cycle 1, we focus our attention here on two cases, the "DNA" project and the "Improving Achievement" project. Although these projects addressed the same RFP requirements, they were significantly different in focus, participants, and activities. Table 1 summarizes these differences.

Table 1: Comparison of Two Science Projects by Characteristics

Project history
  DNA: Previously delivered seven times
  Improving Achievement: New project

Science content
  DNA: Focused (DNA analysis techniques and science/society implications)
  Improving Achievement: Distributed among 3 courses (physics, ecology, and science/technology)

Participants
  DNA: High school biology teachers; from urban and rural districts across the state; most with 6+ years of science teaching experience; included master teachers who had attended before
  Improving Achievement: K-12 teachers; all from the same urban district; most first-year teachers in an alternative certification program; all first-time attendees

Summer institute activities
  DNA: Practice DNA lab techniques; share best practices; guest speakers; Socratic seminars
  Improving Achievement: Read and discuss science articles; map science content to state standards and district curriculum; develop lesson plans

Follow-up activities
  DNA: DNA equipment rotates among schools; "DNA Day" for teachers and students
  Improving Achievement: Complete and implement lesson plans; classroom coaching


"DNA" was a PD project aimed at teaching high school biology teachers DNAanalysis techniques and exploring societal issues related to DNA (e.g., DNA fingerprinting,evolution). Teachers who attended were from all over the state (rural, suburban, and urbandistricts) and most had more than six years of teaching experience. Project staff included ascientist and a science educator, as well as master teachers who had attended the institutenumerous times in the past. "DNA" included a summer institute where teachers engaged inthe laboratory activities that they would later implement with their students, participated indiscussions and data analysis, and listened to other teachers present examples of "bestpractices." After the summer institute, teachers signed up to receive the DNA labequipment and reagents for a 2-week period during the school year. They were also invitedto a "DNA Day" for high school students and teachers held in a science museum in one ofthe urban areas of the state.

"Improving Achievement" offered PD to K-12 teachers from one large urban districtin the state. Most of the participants had finished their first year of teaching as part of analternative certification program. Participants signed up for one of three courses: physics,ecology, or science/technology. Two of the courses were taught by scientists from localuniversities, and the third by the district's science coordinator. In each course, teachersread and discussed science articles to learn more science content. In the course led by thedistrict science coordinator, there was also an emphasis on mapping the content learned tostate standards and the district curriculum. Teachers were expected to develop lesson plansfor their classrooms. During the follow up, the staff planned to carry out classroomcoaching to assist teachers in implementing these lessons.

Thus, these two projects were quite different in scope, participants, and activities. A further difference between the two projects was that "DNA" had been delivered numerous times in the past, with Eisenhower and other funds. Over time, components such as teachers sharing best practices had been added, protocols had been revised, and bugs worked out. The project functioned smoothly and the staff was able to carry out plans with little difficulty. The project was subsequently awarded another grant in MDHE's Cycle 2. "Improving Achievement," on the other hand, was a new project. The project ran into several logistical challenges along the way and was not able to accomplish its plans for job-embedded PD through instructional coaching during Cycle 1 funding. The PI did not apply for Cycle 2 funding, but we found out that some coaching did take place in the subsequent year.

If we examine outcomes data for both projects (see Table 2), we find that teachers had different levels of satisfaction in each project, yet demonstrated high levels of science content learning in both. The resulting impact on instructional practice was high for "DNA," though this project focused on only a small portion of the biology curriculum. The impact evidence for "Improving Achievement" was incomplete. Because this project did not complete planned follow-up activities during Cycle 1, we were unable to capture the impact on teaching for most participants. In terms of student achievement, the "DNA" project provided data on a common test administered by a subset of project teachers. The results were consistently positive, but some classes showed much greater gains than others. Because the "Improving Achievement" project internal evaluators did not report student learning data, student achievement for the project was unknown at the end of Cycle 1.


These two cases illustrate the importance of PD evaluation that includes thick description of individual project characteristics as well as reports of project outcomes. Teachers from both projects were successful in attaining gains in content knowledge, but had differing levels of satisfaction. These outcomes could be due to important differences between the projects. For example, teachers in the "DNA" project benefited from the history of the project, in which the PI had worked with and listened to teacher participants and involved them in future iterations. The experience of participants in "Improving Achievement" varied by the course in which they participated. Those who took the course from the district's science coordinator learned about direct connections between the PD and the district's science curriculum, and received follow-up support during the school year. Those in the other two courses had a very different experience, which led to different outcomes. The projects also varied by the background and teaching context of the participants, factors that could also affect the outcomes. Thus the individual project profiles capture these rich contextual differences in ways not possible by examining outcomes data alone.

Table 2: Comparison of Two Science Projects by Implementation and Outcomes

Teacher Satisfaction
  DNA: High
  Improving Achievement: Moderate
Teacher Content Knowledge
  DNA: High
  Improving Achievement: High (for 1 course; others unknown)
Teacher Instructional Practice
  DNA: High
  Improving Achievement: Data not available
Student Achievement
  DNA: Improved
  Improving Achievement: Unknown

Cross-case Analysis II: Mathematics Projects

As another demonstration of the importance of understanding the context of the PD, we provide a comparison of two of the three Cycle 1 mathematics PD projects. As seen in Table 3, the "Rural School" project and the "General Workshop" project shared some characteristics, yet differed in many ways. Both projects focused on a particular mathematical strand across grades K-12 and involved relatively inexperienced teachers. However, the "Rural School" project drew participants primarily from two rural schools with high poverty rates and low student achievement, whereas the "General Workshop" project involved teachers from nine school districts. One of these nine districts, located near the PI's university and serving relatively high-performing students, accounted for one-third of the participants. The summer institute for the "Rural School" project was led by two mathematics educators who designed the follow-up activities and visited the rural school sites regularly throughout the school year. The "General Workshop" project brought in master teachers at the elementary, middle, and high school levels to lead the summer institute. An education faculty member visited classrooms once during the school year, and co-led the follow-up sessions along with the project director, a faculty member in statistics.


Table 3: Comparison of Two Mathematics Projects by Characteristics

Project history
  Rural School: New project
  General Workshop: Delivered multiple times under Eisenhower

Mathematics content
  Rural School: Focused (geometry and spatial sense)
  General Workshop: Focused (statistics and probability)

Participants
  Rural School: K-12 teachers of all disciplines, and teacher aides; mainly from 2 rural schools; most with 1-5 years of teaching experience; districts of high poverty and low student state assessment scores
  General Workshop: K-12 teachers of mathematics; teachers from a variety of rural districts; most with 1-5 years of teaching experience; some districts with high poverty and moderate state assessment scores

Summer institute activities
  Rural School: Working sample state assessment items and examining district student performance; solving problems involving geometry and spatial sense; developing a curricular unit to be used the following year
  General Workshop: Provided teachers with a variety of activities that could be used in classrooms; activity sessions led by experienced elementary, middle, and secondary teachers

Follow-up activities
  Rural School: Biweekly visits to schools by project staff; full-day follow-up seminars at rural school sites; online discussion board
  General Workshop: Full-day follow-up seminars; one follow-up visit to observe and provide feedback

The "Rural School" project was a new project; the PIs worked hard to build relationships with the participating schools before the PD began. They ran into some snags in program delivery: local school politics that were unrelated to the PD led to disruptions within the schools and distracted teachers from focusing on PD. The "General Workshop," on the other hand, had previously been funded under the Eisenhower program and the project logistics seemed to be carried out smoothly.

Table 4 compares the outcomes for both of these projects. Both projects received strong teacher satisfaction scores for the summer institutes. However, the "Rural School" project received high satisfaction scores for the follow-up sessions and moderate satisfaction for the school-based visits. The "General Workshop" project received moderate satisfaction for both the follow-up sessions and school-based visits. Based on a content knowledge test designed by the local project, the teachers in the "Rural School" project demonstrated moderate content knowledge gains. Data on teacher knowledge were not collected by internal evaluators of the "General Workshop" project. Teachers from both projects reported some changes to instructional practice. Based on a locally designed exam, students in the classes of teachers from the "General Workshop" project demonstrated an increase in achievement based on pre- and post-test scores. Student achievement data were not reported for the "Rural School" project.

Without understanding the context of these PD projects, it would be difficult to assess the effectiveness of each project. Understanding the context allowed us to examine the impact of the design and the teacher characteristics from the "Rural School" and "General Workshop" projects. Teachers from both projects were satisfied with the summer institute. However, differences existed in the value the teachers placed on follow-up activities. In the "Rural School" project, the regularity of involvement of project staff within two participating schools could have led to stronger teacher satisfaction with the follow-up sessions. The teachers in the "General Workshop" expressed satisfaction with the master teachers in the summer, but were less satisfied with the design of the follow-up activities. Such dissatisfaction could have occurred due to the lack of connection with the project staff in the summer institute or to the difficulty in meeting the needs of teachers from a variety of school districts. Considering the context of the PD allowed us to consider factors that otherwise might have been ignored.

Table 4: Comparison of Two Mathematics Projects by Implementation and Outcomes

Teacher Satisfaction: Summer Institutes
  Rural School: High
  General Workshop: Very High
Teacher Satisfaction: Follow-up Seminars
  Rural School: High
  General Workshop: Medium
Teacher Satisfaction: School-based Activities
  Rural School: Medium
  General Workshop: Medium
Teacher Content Knowledge
  Rural School: Medium
  General Workshop: Data not available
Teacher Instructional Practice
  Rural School: Moderate
  General Workshop: Moderate
Student Achievement
  Rural School: Unknown
  General Workshop: Improved

Discussion

As part of our work as external evaluators of the Improving Teacher Quality grants, we attempted to examine characteristics of effective PD across projects, yet considered the contextual variables that could influence the results of the projects. In the following sections, we further discuss the use of project profiles and relate our evaluation design to Guskey's (2000) model for PD evaluation.

Project Profiles: A Useful Tool for PD Evaluation

Multi-site PD evaluation attempts to document common outcomes across projects in order to make claims about the value of a PD program as a whole. This is a reasonable goal for funding agents who need to demonstrate the usefulness of their programs in advancing teacher quality and student achievement. Yet, is it reasonable to expect that projects such as the ones described above, which deliver a variety of science or mathematics content to very different teachers (grade level, teaching experience) from diverse school settings (urban, suburban, rural), would have similar outcomes? As demonstrated in our case analyses, the context of professional development matters. Outcomes do not stand alone nor do they tell the entire story. Furthermore, program-level outcomes devoid of project-specific details are inadequate.

We believe that our approach to PD evaluation, via both individual project profiles and common cross-project measures, helps to resolve this dilemma. We attempted to determine if PD was effective in terms of the satisfaction of the participants, the views of the staff, the quality of the implementation, and the outcomes for teachers and students. We created rich descriptions of individual projects as well as summative claims for the Improving Teacher Quality Grants program. In the next section, we contrast our approach with an accepted model of PD evaluation.

Comparing Our Approach to the Literature

Professional development evaluation design rests upon theoretical models of the relationships among PD characteristics and outcomes (e.g., Medina et al., 2000). Guskey and Sparks (1996) proposed a model based on the premise that the quality of PD is influenced by content, context, and process factors. Content factors include the knowledge and skills to be developed as well as the degree of change required to enact the new knowledge and skills. Context factors include the "who, when, where, and why of professional development" (Guskey, 2000, p. 74). Process refers to PD delivery format and instructional strategies. Guskey and Sparks claimed that these factors affect the quality of PD, which in turn influences outcomes (knowledge and practices) for the teachers, administrators, and others involved. These outcomes can have an impact on student learning. "Although the relationship between professional development and improvement in student learning is complex, it is not random or chaotic" (Guskey, pp. 76-77). Furthermore, according to both Guskey and Medina et al. (2000), information about student learning provides a feedback loop to professional developers as they design and deliver PD.

Theoretical models of the relationship of PD to teacher and student learning outcomes have implications for PD evaluation. Guskey (2000, 2002) proposed a model for evaluating PD that is hierarchical in nature. According to this model, PD evaluation should move from the simple (reactions of participants) to the more complex (student learning outcomes), with data from each level building on the previous. His model consists of evaluation at five levels: (1) participant reactions; (2) participant learning; (3) organization support and change; (4) participant use of new knowledge and skills; and (5) student learning outcomes. Each level addresses a certain set of questions and informs the types of evaluation data to be collected. At Level 1, evaluators ask if the participants liked the PD through questionnaires and interviews. For Level 2, evaluators ask if participants learned the intended knowledge and skills. They gather evidence through tests, presentations, and artifacts. At Level 3, evaluation is concerned with the impact on the organization (i.e., the school, school district, institution of higher education). Questionnaires, interviews, and artifacts can provide such information. Level 4 focuses on how participants apply their new knowledge and skills in their classrooms and schools. Evaluators use questionnaires, interviews, written reflections, and observations as evidence. Level 5 is concerned with the impact of PD on student achievement, performance, attitudes, and self-efficacy. Evidence of PD impact on students comes from school records, questionnaires, interviews, and artifacts such as tests. Guskey claimed that good PD evaluation addresses all five levels, and by doing so, meets the criteria established by the Joint Committee on Standards for Educational Evaluation (1994), which include standards of utility, feasibility, propriety, and accuracy.

Table 5: External Evaluation Data by Guskey's PD Characteristics
(Bulleted entries are MDHE External Evaluation Team data sources.)

Content (what: the knowledge, skills, and understandings the PD aims to develop)
  • Artifacts: RFP, proposals, agendas, handouts
  • Interviews with professional developers and teachers

Context (who, when, where, and why)
  • Demographic Questionnaire
  • Interviews
  • Observations

Process (how: types and forms of PD, including follow-up)
  • Artifacts: proposals, agendas
  • Observations
  • Satisfaction Survey
  • Interviews

As we reflected on our evaluation processes and products in light of Guskey's (2000) model, we realized that our approach extended the model. First, we characterized the content, context, and process variables associated with each project. Table 5 displays how the data that we collected map onto each of Guskey's categories of PD characteristics. Second, we analyzed how our data collection tools measured outcomes as specified by Guskey's (2000) PD evaluation hierarchy. Table 6 displays the levels of the Guskey evaluation model along with the data collection strategies employed by Cycle 1 internal and external evaluators. (Internal evaluation activities were unique to each project; the items included in that column were those that individual projects designed and administered.)

Table 6: Evaluation Data by Level of Guskey (2000) Hierarchy

1. Participants' reactions
  Project Internal Evaluator: project data collection (attitudes, daily evaluation, etc.)
  MDHE External Evaluator: Satisfaction Survey; observations and interviews
2. Participants' learning
  Project Internal Evaluator: content knowledge tests
  MDHE External Evaluator: Satisfaction Survey; interviews
3. Organization support and change
  Interviews
4. Participants' use of new knowledge and skills
  Project Internal Evaluator: classroom observations
  MDHE External Evaluator: interviews; Satisfaction Survey
5. Student learning outcomes
  Project Internal Evaluator: teacher-made or project-made tests; standardized tests
  MDHE External Evaluator: state standardized tests


We generated individual profiles to synthesize the data collected from each of these sources into a coherent picture of each project. When we mapped the sections of the project profiles onto Guskey's hierarchical evaluation model (see Table 7), we found inadequacies both in Guskey's model and in our evaluation design. First, we realized that Guskey's levels of evaluation applied only to two sections of our profiles: Quality of Implementation, which Guskey called "participants' reactions," and Outcomes, which encompass Levels 2, 4, and 5 of his model. Four sections of our profiles could not be mapped directly to Guskey's evaluation hierarchy: Program Background, Program Design, Participants and their Schools, and Recommendations. We found, instead, that Guskey presented these components in his theoretical model of PD, in the content, context, and process variables that influence the quality of the PD and in the feedback loop that informs new PD design, but did not insert them into his model of PD evaluation. This mapping task also demonstrated that our PD evaluation neglected one of Guskey's levels, Level 3, "organization support and change."

Our PD evaluation approach has led us to modify Guskey's PD evaluation model. We attempted to determine PD outcomes and characterize the complex setting in which PD occurred. The individual profiles helped us capture not only the levels of Guskey's hierarchy (Guskey, 1997, 2000; Guskey & Sparks, 1991, 1996), but also his Content Characteristics, Context Characteristics, and Process Characteristics (Guskey, 2000).

Table 7: Project Profile and Guskey Models

I. Program Background: Theoretical Model (Content, Context)
II. Program Design: Theoretical Model (Content, Process)
III. Participants and their Schools: Theoretical Model (Context)
IV. Quality of Implementation: Evaluation Model Level 1 (Participants' reactions)
V. Outcomes: Evaluation Model Levels 2 (Participants' learning), 4 (Participants' use of new knowledge and skills), and 5 (Student learning)
VI. Recommendations: Theoretical Model (Feedback Loop)

We offer a modification of Guskey's model of PD evaluation that takes both project characteristics and outcomes into account (see Figure 2). The project characteristics form an outer circle around outcomes, indicating the importance of context in understanding PD outcomes. The outcomes are contained within a permeable membrane, indicating the feedback that takes place back and forth between project characteristics and outcomes. We have also modified Guskey's hierarchical arrangement of outcomes to demonstrate a slightly different set of connections among the levels. Our model highlights the importance of understanding content, context, and process in order to interpret outcomes more fully.


Figure 2: An Adaptation of Guskey's (2000) Model for Evaluating Professional Development Programs.

Conclusion

Evaluation Research and Accountability

Outcomes typically take the spotlight in today's era of accountability. Our PD evaluation experience provides evidence that educators, evaluation teams, and policymakers must also consider the setting in which outcomes arise. In retrospect, our approach to evaluation is akin to the notion of "holistic accountability" (Reeves, 2002). According to Reeves, evaluative information must be meaningful, constructive, and comprehensive, and consider both effects and causes. In the context of PD evaluation, holistic accountability involves the interpretation of student achievement data in association with information about the implementation of the PD project and teacher learning outcomes. Instead of limiting the evaluation to context-free collections of numbers, we used both quantitative and qualitative data sets to construct individual project profiles and conduct cross-project analyses. Our "holistic profiles" of individual PD projects resulted in contextualized insights and analyses. The results provided program information that satisfied the MDHE's interests in accountability and was useful to PD project directors.


Our emphasis on individual project profiles for organizing and subsequently interpreting evaluation data reflects an effort on the team's part to balance accountability and understanding in the execution of the program evaluation. Evaluation standards (and general logic) suggest that understanding is as important to improving professional practice as accountability. Trying to understand the where and why of observed results is made possible through explicit measurement and analysis of contextual variables. Our work and findings are consistent with and support Chatterji's (2004) proposed extended-term mixed-method evaluation designs for education. She advocated viewing programs through a life-cycle lens and argued that appropriate evaluation methodology varies with the length of time that an intervention has been in place. Chatterji's basic premise is that field-based experiments focused on comparing outcomes from educational interventions are appropriate for evaluation studies only after important process and context factors have been identified and studied. Her argument is that early evaluation research documenting context, processes, and inputs, and their potential roles in mediating outcomes, is an important precursor to designing and implementing sound field-based experiments. Our continuing involvement as the external evaluators for Improving Teacher Quality Grants, and the increasing emphasis by the funding agent on partner relationships between PD providers and the schools and districts that receive PD services, will provide us with the opportunity to apply this perspective to our evaluation design.

Next Steps

Our team continues to serve as external evaluators for MDHE's Improving Teacher Quality Grants program. Our Cycle 1 analysis relative to PD evaluation models such as Guskey's (2000) has influenced both our conceptions of evaluation and our methods. Changes in the MDHE RFP for projects in subsequent cycles also created a need to extend our evaluation practices and our project profile reports. We have added new data sources to broaden our understanding of PD outcomes. For example, we included new instruments to measure outcomes at Level 2 of the Guskey model, participants' learning. These instruments aim to capture changes in teacher pedagogical knowledge and beliefs about inquiry-based instruction. In subsequent cycles, we addressed Guskey's Level 3, organization change and support, a level we did not adequately address in Cycle 1. We targeted K-12 organizational change through additional interview questions and higher education organizational change through a Higher Education Impact Survey that asked about the impact of the PD project on science and education curricula at the university level. We also extended data collection on Guskey's Level 4, participants' use of new knowledge and skills. We believe that these new data provide us with a clearer picture of changes across Guskey's levels.

As we progress with our PD evaluation efforts, we are guided by our modification of the Guskey evaluation model. No doubt our approach to PD evaluation will evolve and improve as we continue to learn how the relation between the model's outer circle of PD characteristics and its inner circle of PD outcomes (see Figure 2) shapes our understanding of the effectiveness of PD.

We have come to see evaluating PD much like scoring ballroom dance events. Professional developers try to fit their dance movements to different partners throughout the new steps and changing tempos of a PD project. Evaluators observe their progress and rate their performance. However, success on the dance floor goes beyond the proper execution of the steps; dance scorers try to capture the entire milieu: the costumes and effect of the dancers, and the connection they make to their surroundings and to the rhythms and melodies of the band. Similarly, PD evaluation must take into account both context and outcomes in order to understand the successes and failures and provide useful feedback to improve subsequent dances. We believe that project profiles, as a data organization and reporting tool, better capture some of this complexity.

Acknowledgment

We would like to acknowledge the contributions of other members of our evaluation team who assisted with data collection and analysis: David Barker, Shannon Dingman, Dave Mathys, Kusalin Musikul, Dorothy Belsky, and Nichole Radman.

Notes

1. A previous version of this article was presented at the annual meeting of the American Educational Research Association, April 2005, Montreal, Quebec, Canada.

2. In 2002, with the reauthorization of the US Elementary and Secondary Education Act as the No Child Left Behind Act (http://www.ed.gov/nclb/landing.jhtml), funds for PD became available to the states through Title II, Part A, Improving Teacher Quality State Grants Program.

References

Abell, S., Cole, J., Ehlert, M., Lannin, J., Marra, R., et al. (2004). Missouri Department of Higher Education Improving Teacher Quality Grants Cycle 1 external evaluation report. Columbia, MO: Southwestern Bell Science Education Center. Accessed on 5/11/05 from http://www.pdeval.missouri.edu/results.html

Birman, B.G., Reeve, A.L., & Sattler, C.L. (1998). The Eisenhower Professional Development Program: Emerging themes from six districts. Washington, DC: U.S. Department of Education.

Blackwell, P.J. (2004). Putting the system together: Lessons learned by the Eisenhower Initial Teacher Professional Development Programs. Action in Teacher Education, 25 (4), 38-47.

Boyd, S.E., Banilower, E.R., Pasley, J.D., & Weiss, I.R. (2003). Progress and pitfalls: A cross-site look at local systemic change through teacher enhancement. Chapel Hill, NC: Horizon Research Inc. Accessed 3/28/05 from http://www.horizon-research.com/reports/

Chatterji, M. (2004). Evidence on "What Works": An argument for extended-term mixed-method (ETMM) evaluation design. Educational Researcher, 33 (9), 3-13.

Darling-Hammond, L., & Youngs, P. (2002). Defining "highly qualified teachers": What does "scientifically-based research" actually tell us? Educational Researcher, 31 (9), 13-25.

Eisenhower Network Evaluation Committee (2004). Evaluation report of the National Network of Eisenhower Regional Consortia and Clearinghouse: FY 2003 (October 1, 2002 – September 30, 2003). Washington, DC: National Network of Eisenhower Regional Consortia and Clearinghouse.

Flora, D.B., & Panter, L.L. (1999). Technical report: Analysis of the psychometric structure of the Local Systemic Change surveys. University of North Carolina at Chapel Hill: Thurstone Psychometric Laboratory.

Garet, M., Birman, B., Porter, A., Desimone, L., & Herman, R., with Yoon, K.S. (1999). Designing effective professional development: Lessons from the Eisenhower Program. Washington, DC: U.S. Department of Education.

Garet, M.S., Porter, A.C., Desimone, L., Birman, B.F., & Yoon, K.S. (2001). What makes professional development effective? Results from a national sample of teachers. American Educational Research Journal, 38, 915-945.

Guskey, T.R. (1997). Research needs to link professional development and student learning. Journal of Staff Development, 18 (2), 36-40.

Guskey, T.R. (2000). Evaluating professional development. Thousand Oaks, CA: Corwin.

Guskey, T.R. (2002). Does it make a difference? Evaluating professional development. Educational Leadership, 59 (6), 45-51.

Guskey, T.R., & Sparks, D. (1991). What to consider when evaluating staff development. Educational Leadership, 49 (3), 73-76.

Guskey, T.R., & Sparks, D. (1996). Exploring the relationship between staff development and improvements in student learning. Journal of Staff Development, 17 (4), 34-38.

Ingersoll, R.M. (1999). The problem of underqualified teachers in American secondary schools. Educational Researcher, 28 (2), 26-37.

Ingvarson, L., Meiers, M., & Beavis, A. (2005). Factors affecting the impact of professional development programs on teachers' knowledge, practice, student outcomes and efficacy. Education Policy Analysis Archives, 10 (13), 1-26.

Jauhiainen, J., Lavonen, J., Koponen, I., & Suonio, K.K. (2002). Experiences from long-term in-service training for physics teachers in Finland. Physics Education, 37, 128-134.

Joint Committee on Standards for Educational Evaluation (1994). The program evaluation standards (2nd ed.). Newbury Park, CA: Sage.

Linn, R.L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32 (7), 3-13.

Loucks-Horsley, S., Love, N., Stiles, K.E., Mundry, S., & Hewson, P.W. (2003). Designing professional development for teachers of science and mathematics (2nd ed.). Thousand Oaks, CA: Corwin.

Medina, K., Pollard, J., Schneider, D., & Leonhardt, C. (2000). How do students understand the discipline of history as an outcome of teachers' professional development? (Report of the "Every Teacher a Historian" project). Davis, CA: Regents of the University of California.

Missouri Department of Higher Education (2002). Improving teacher quality state grants. Request for proposals. Jefferson City, MO: Author.

Missouri Department of Higher Education (2003). Evaluation of professional development projects. Request for proposals. Jefferson City, MO: Author.

National Council of Teachers of Mathematics (1991). Professional standards for teaching mathematics. Reston, VA: Author.

National Network of Eisenhower Regional Consortia and Clearinghouse (2004). What experience has taught us about collaboration. Washington, DC: Author. Accessed on 3/28/05 from http://www.mathsciencenetwork.org/collaboration.pdf

National Research Council (1996). National science education standards. Washington, DC: National Academy Press.

Olson, L. (2000, January 13). Finding and keeping competent teachers. Quality Counts 2000. Education Week on the Web. [On-line]. Accessed 3/28/05 from www.edweek.org

Porter, A., Garet, M., Desimone, L., Yoon, K.S., & Birman, B. (2000). Does professional development change teaching practice? Results from a three-year study. Washington, DC: U.S. Department of Education.

Reeves, D. (2002). Holistic accountability: Serving students, schools, and community. Thousand Oaks, CA: Corwin.

Shaha, S.H., Lewis, V.K., O'Donnell, T.J., & Brown, D.H. (2004). Evaluating professional development: An approach to verifying program impact on teachers and students. Journal of Research in Professional Learning, 1, 1-18.

Shepardson, D.P., Harbor, J., Cooper, B., & McDonald, J. (2002). The impact of a professional development program on teachers' understanding about watersheds, water quality, and stream monitoring. The Journal of Environmental Education, 33 (3), 34-40.

Sparks, D. (2002). Designing powerful professional development for teachers and principals. Oxford, OH: National Staff Development Council.

Supovitz, J.A., & Turner, H.M. (2000). The effects of professional development on science teaching practices and classroom culture. Journal of Research in Science Teaching, 37, 963-980.

Thompson, C.L., & Zeuli, J.S. (1999). The frame and the tapestry: Standards-based reform and professional development. In L. Darling-Hammond & G. Sykes (Eds.), Teaching as the learning profession: Handbook of policy and practice (pp. 341-375). San Francisco: Jossey-Bass.

US Department of Education (2002). Meeting the highly qualified teachers challenge: The Secretary's annual report on teacher quality. Washington, DC: US Department of Education, Office of Postsecondary Education, Office of Policy, Planning, and Innovation.

The Authors

SANDRA K. ABELL is professor of science education and director of the Science Education Center at the University of Missouri-Columbia. Her research focuses on teacher learning of both science and pedagogy throughout the professional continuum. She also has published extensively on inquiry-based science instruction.

JOHN K. LANNIN is an assistant professor of mathematics education at the University of Missouri-Columbia. His research focuses on the development of algebraic reasoning with K-12 students. He also has published on teacher learning of both mathematics and pedagogy.

ROSE M. MARRA is an associate professor in the School of Information Science and Learning Technologies at the University of Missouri. Her research interests include evaluation research, engineering education, gender equity issues, the epistemological development of college students, and promoting meaningful learning in web-based environments.

MARK W. EHLERT is a research analyst in the department of economics at the University of Missouri-Columbia. He has spent more than 20 years working with schools and compiling and analyzing education data. Other research interests include teacher supply and demand, financial aid in higher education, correlates of student achievement, and value-added assessment.

JAMES S. COLE is a research associate at Indiana University's Center for Postsecondary Research. His research focus is on college student engagement, achievement motivation, test-taking motivation, and assessment.

MICHELE H. LEE is a student in the science education doctoral program at the University of Missouri-Columbia. Formerly, she worked as an elementary and high school science teacher, professional developer for Project 2061/American Association for the Advancement of Science, and instructor at The Johns Hopkins University. Her present research focuses on science teacher education and policies.

MEREDITH A. PARK ROGERS is an assistant professor of science education at Indiana University – Bloomington. Her research focuses on elementary teacher learning of inquiry-based science pedagogy and the role of science in designing and implementing interdisciplinary curriculum. She has also been actively involved in professional development and program evaluation.

CHIA-YU WANG is a doctoral candidate in science education at the University of Missouri-Columbia. Her research interests include science teacher learning; technology in science teacher education; and student understanding of chemical representations.

Correspondence: <[email protected]>