
Designing a Statewide System for Measuring Teacher and Leader Effectiveness

Wyoming Accountability Advisory Committee

Scott Marion & Chris Domaleski, Center for Assessment

June 14, 2012

Center for Assessment. WY Accountability Advisory Committee (6/14/12)


Some background

Outline key decisions for creating educator evaluation systems

Our purpose today is to highlight some of the key decisions we will need to make during the interim

We’ll be asking many more questions than we answer, but we will need to answer these questions in order to move forward…

A process note: Given the number of people on the WebEx/call, I will pause at specific places in the presentation to respond to questions.

Overview of presentation…



Wyoming, like an increasing number of states, intends to revise its teacher and leader evaluation practices

Educator effectiveness will be determined “in part by student achievement”

This enterprise holds great promise, but also presents real challenges

We are fortunate to be able to build on the work of many other states. We are closely involved in the work in CO, RI, NH, GA, PA, UT, NYC, HI, and LA.

Introduction



Why the interest in new forms of teacher evaluation?

Nobody doubts the critical influence of teacher quality on student achievement

Current (traditional) evaluation systems rarely identify either highly effective or ineffective teachers

Rationale



From the Aspen Report and our experience:
◦ Vision and Goals
◦ State-Local Roles and Responsibilities
◦ Theory of Action
◦ General Evaluation Model
◦ Coherence
◦ Specific Measurement Model(s), including attribution rules and combining multiple measures
◦ Information Requirements
◦ Capacity Requirements
◦ Reporting & Communication
◦ Consequences & Support
◦ Monitoring and Evaluation

Key Decisions



What is the vision and what are the guiding principles of the system we will design?

For example, will the system be designed to identify and “counsel out” low-quality educators, or is it designed primarily to improve the performance of the majority of educators?

Goals and key principles



The primary purpose of the system is to maximize student learning.

The system is designed to maximize educator development by providing specific information, including appropriate formative information that can be used to improve teaching quality.

Local instantiations of the State Model system must be designed collaboratively among teachers, leaders, and other key stakeholders, such as parents and students, as appropriate. Individual educators will have input into the specific nature of their evaluation and considerable involvement in establishing their specific goals.

The effectiveness rating of each educator shall be based on multiple measures of teaching practice and student outcomes including using multiple years of data when available, especially for measures of student learning.

The Model system is designed to ensure that the framework, methods, and tools form a coherent system that is also aligned with the developing NH Leader Evaluation System.

The Model system shall be applied by well-trained leaders and evaluation teams using the multiple sources of evidence along with professional judgment to arrive at an overall evaluation for each educator.

Excerpt from NH’s draft system



What will be the “reach” of the state in defining local systems?

What factors must be considered in this decision?
◦ Comparability/portability vs. flexibility
◦ Support and capacity building
◦ Oversight and monitoring
◦ Required framework, “State Model,” or state-required system

Are we correct in proceeding here on the assumption that there will, at a minimum, be a state-required framework?

Major policy decisions



Grounds our design

Clarifies the assumptions, purposes, and goals of the system

Specifies the various indicators and mechanisms by which the system will fulfill its purposes (and minimize unintended negative consequences)

Serves as a framework for evaluation

The theory of action (ToA) on the following slide is oversimplified and somewhat naïve, but it is what is driving much of the policy. We’ll be working with more complex and honest ToAs as we do our work.

A Theory of Action…


A Simplified Theory of Action for Reformed Educator Evaluation Systems

Diagram: Measures of Educator Effectiveness and Evaluation Processes → Hiring, Placement, Career Ladder, Compensation, Dismissal, Professional Development → Student Outcomes Improve


Basic Structure of a Theory of Action

Diagram components: Assumptions or Antecedents; Activities and Mechanisms; Proximal Indicators; Intermediate Indicators; Activities and Mechanisms; Distal Indicators (Intended Outcomes); Consequences



Let’s look at a more reasonable approximation for an improvement-based educator evaluation system

Theory of Action

Simple ToA for an “improvement” system

Diagram components: Educator evaluation system; Student performance is well measured; Evaluation results improve; Focuses educators’ attention on productive practices; Results are used to improve instruction; Student Learning Improves

Thinking Through a Theory of Action

Policy makers should have to say very explicitly why and how implementing test-based approaches to support educator effectiveness for these grades and subjects will lead to improved educational opportunities for students.

For example, one might postulate that holding teachers accountable for increases in student test scores on classroom-based assessments will lead to the development of both better assessments and improvements in student learning.

What are the specific mechanism(s) by which the intended outcomes will occur? E.g., targeted instruction, better PD, and/or more appropriate curricular materials?




What will be the major components of our system?
◦ Measures of teacher practice
◦ Measures of student performance
◦ Student voice?
◦ Peer input?
◦ Other?

How will these be combined and weighted?

How will these classes of indicators be integrated to form a coherent picture?

The General Evaluation Model



Involves ensuring that the school accountability and educator accountability systems are sending similar messages to schools and stakeholders

It would make sense to use data from the school accountability system to augment information from the educator system

Further, it would also make sense to integrate the various components of the educator evaluation system to avoid a silo effect

Coherence



The following slides present some of the key decisions related to the measurement model that will need to be made as we proceed.

As you know, the “devil is in the details” and there are many details with which to contend.

This is even more complicated when trying to reconcile, and be clear about, the state’s role.

Specific Measurement Model



What are the indicators that operationalize the knowledge and skills that define educator practice? For example, the domains of Danielson’s Framework for Teaching are:
◦ Planning and Preparation
◦ The Classroom Environment
◦ Instruction
◦ Professional Responsibilities

Should these be the default “standards of professional practice,” or should WY adopt more general standards (e.g., ISLLC, NC, CO) or leave it up to districts?

Measures of Educator Practice



Whatever standards are selected or developed, how shall they be measured?
◦ Classroom observations?
◦ Document (artifact) analysis?
◦ Structured interviews?
◦ Professional portfolios?

What about required data collection strategies and protocols (e.g., 4 observations/year)?

What are the expected levels of performance on the various indicators?

What about observer training and certification?

Measures of Educator Practice



Student Performance Measures and Analytics

What indicators of student growth should be used for PAWS (Proficiency Assessments for Wyoming Students) grades and content areas?

What performance (growth) indicators should be used for non-PAWS grades and content areas?
◦ This is a huge issue!

Should state-level measures of student growth be combined with local measures of student performance for each educator determination? If so, how?



What analytic approach (model) will be used for analyzing state test data?
◦ What are the technical and policy issues that need to be considered in choosing a model?
◦ What are the advantages and disadvantages of using student growth percentiles (SGPs) for educator evaluation?

What is the standard for ‘good enough’ growth?

Should growth expectations be “conditioned” on factors other than prior performance, such as poverty?

What information should be reported to whom and at what level?

Student Performance: Analyzing Growth

Figure: Mapping educators to standards, assessments & growth (Lee, 2010, based on preliminary data from MA DOE; Robert Lee, Massachusetts ESE, Spring 2010). Categories and shares of educators: Growth Direct (16%); Growth Indirect (17%); Assessment, but no growth (10%); Curriculum Framework but no Assessment (32%); No curriculum framework (25%). Teacher groups shown include ELA and math 4-8 self-contained classes and middle school subject teachers; grade 9 & 10 ELA and math**; grade 11 & 12 STE, ELA, and math; grade 3 teachers; MS & HS STE teachers; K-12 ELL teachers (MEPA); reading specialists (4-8); K-4 reading using DIBELS; specialists (K-2, 11 & 12); special education, including grades 4-10; Pre-K-2; grade 7 history* and grade 10 & 11 US history*; grade 8 & 12 history and social science; HS electives; music; drama; visual arts; foreign language; physical education; health; vocational education; MS & HS computers; business & marketing; AP and IB teachers**; and administrative staff.

* HSS tests have been suspended. ** These teachers have not been linked yet.

The Non-Tested Challenge


Lack of high-quality measures of student performance, particularly for the purposes for which they are being used

Limitations of analytical options for calculating educator contributions to student performance

Comparability concerns

Lack of technical capacity at the local and even state levels

Lack of predictable course sequences

Not enough time

Not enough money

Too much policy pressure (e.g., 50%)

Huge risk of corruption

Challenging issues of attribution

Many of these are challenges in tested as well as non-tested subjects and grades (NTSG), but they may be exacerbated in NTSG.

All Educators in NTSG are Not the Same


Instead of dealing with each individual case, it makes sense to create an approach for addressing categories of educators

The general categorization can occur at the state level and should be fine-tuned at the district or even school level

One classification approach is based on the data available for the various groups of educators

The following excerpt of a chart, created for Colorado, provides examples of the nominal types of educators that would fall into the different data categories


Personnel categories defined by the availability of end-of-year state summative assessments, with example personnel types:

1. Personnel teaching a core subject area where end-of-year state assessments measuring content taught in their subject area are available in two adjacent grades
Examples: grades 4-10 core subject teachers for literacy and math; interventionists/specialists with shared responsibility with core subject teachers for improving the literacy/numeracy skills of students in grades 4-10 (e.g., RTI specialists, ELA, special education teachers)

2. Personnel teaching in a core subject area where an end-of-year state summative assessment is available to measure content taught in their classrooms
Examples: science teachers (currently grades 5, 8, and 10) and grade 3 teachers with end-of-year summative state assessments available for their respective grade

3. Personnel teaching in a core subject area where no end-of-year state summative assessments are currently available to measure content taught in their classrooms
Examples: core subject teachers in the sciences (with the exception of grades 5, 8, and some personnel for grade 10) and social studies; all ECE, grades K-2, and grades 11-12 teachers; resource teachers/specialists with instructional responsibility not directly linked to the literacy/numeracy skills of students (e.g., music, arts, and P.E. teachers)

4. Personnel with no direct instructional responsibilities
Examples: resource teachers/specialists with indirect (non-instructional) responsibility for improving the literacy/numeracy skills of students (e.g., social workers, psychologists, and school nurses)
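One way to operationalize the data categories in the Colorado excerpt is a simple rule applied to each staff record. The sketch below is a hypothetical illustration, not Colorado’s or Wyoming’s procedure: the `Educator` fields and the `data_category` function are assumptions, and the rule ignores cases such as interventionists who share responsibility with core subject teachers.

```python
from dataclasses import dataclass


@dataclass
class Educator:
    # Hypothetical fields for illustration; a real data system would derive
    # these from course codes, rosters, and the state testing schedule.
    role: str
    direct_instruction: bool    # has direct instructional responsibility
    current_grade_tested: bool  # a state test covers the content in the grade taught
    prior_grade_tested: bool    # a state test covers the same content one grade earlier


def data_category(e: Educator) -> int:
    """Assign one of four data categories, loosely following the Colorado chart:
    1 = state tests in two adjacent grades (pre/post growth is possible)
    2 = a state test in the grade taught, but not in the prior grade
    3 = direct instruction, but no state test of the content taught
    4 = no direct instructional responsibility
    """
    if not e.direct_instruction:
        return 4
    if e.current_grade_tested and e.prior_grade_tested:
        return 1
    if e.current_grade_tested:
        return 2
    return 3


# Hypothetical staff records
staff = [
    Educator("Grade 5 math teacher", True, True, True),
    Educator("Grade 3 teacher", True, True, False),
    Educator("Music teacher", True, False, False),
    Educator("School nurse", False, False, False),
]
categories = {e.role: data_category(e) for e in staff}
print(categories)
```

Because the general categorization can occur at the state level and be fine-tuned locally, a rule like this would likely serve only as a starting point that district or school staff review and override.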

Comparability

What do we mean by comparability in this context?
◦ Educators within the units of analysis are held to similar levels of expectations, at least in some relative sense.
◦ For example, it would be a threat to the system if the teachers in grades 4-8 reading and math received noticeably lower ratings than the rest of the teachers (NTSG) in the school.

At what levels is comparability important?
◦ Within schools? Clearly yes.
◦ Within districts? Probably yes.
◦ Within states? It would be nice, but it might be too high a bar right now.
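A within-school comparability check could start by comparing the mean ratings of tested-grade and NTSG teachers in each school. This is a hypothetical sketch: the 1-4 rating scale, the `comparability_gaps` function, and the 0.5-point threshold are illustrative assumptions, and a real check would also need to account for the small number of teachers in many schools.

```python
from collections import defaultdict
from statistics import mean


def comparability_gaps(ratings, threshold=0.5):
    """Flag schools where tested and non-tested (NTSG) teachers' ratings diverge.

    ratings: iterable of (school, group, rating), where group is "tested" or
    "ntsg" and rating is on a hypothetical 1-4 scale.
    threshold: hypothetical tolerance for the gap in mean ratings.
    Returns {school: tested_mean - ntsg_mean} for schools over the threshold.
    """
    by_school = defaultdict(lambda: {"tested": [], "ntsg": []})
    for school, group, rating in ratings:
        by_school[school][group].append(rating)

    flagged = {}
    for school, groups in by_school.items():
        # A within-school comparison needs at least one teacher in each group
        if groups["tested"] and groups["ntsg"]:
            gap = mean(groups["tested"]) - mean(groups["ntsg"])
            if abs(gap) > threshold:
                flagged[school] = gap
    return flagged


# Hypothetical ratings for two schools
ratings = [
    ("North", "tested", 2), ("North", "tested", 2),
    ("North", "ntsg", 3), ("North", "ntsg", 4),
    ("South", "tested", 3), ("South", "ntsg", 3),
]
print(comparability_gaps(ratings))  # {'North': -1.5}
```

A negative gap, as for the hypothetical North school, is the pattern the slide warns about: tested-grade teachers rated noticeably lower than their NTSG colleagues.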


What Measurement Approaches Are Being Proposed?

1. Norm-referenced tests (NRTs)
2. Commercial interim assessments
3. State- or district-created end-of-course exams (both externally and locally developed)
   a. Includes new assessment development in places like DE, CO, and Hillsborough, FL
4. School- or teacher-developed measures of student performance
   a. Often includes Student Learning Objectives

*Note: 1 & 2 rarely cover courses beyond the core content areas and, even then, do not cover high school courses well.


Analytic Approaches


If you thought the measurement/assessment issue was daunting…

It pales in comparison to the analytic challenges (i.e., how growth is calculated at local levels).

Remember, even the most sophisticated value-added models (VAMs), applied to high-quality state test data, have been rightfully questioned because of challenges with causal inference, year-to-year unreliability, and other technical issues (e.g., EPI report; Braun et al., 2010; Rothstein, 2009, 2010).

What Approaches Are Being Proposed for NTSG?


1. Growth models using pre- and post-tests in the same subject
2. Value-added models
   a. Pre- and post-test scores in the same subject
   b. Conditioned on data other than a pretest from the same content area as the posttest
3. Student Growth Percentiles
4. Shared attribution of aggregate growth/VAM results
5. Student learning objectives (SLO)

Definitions

Growth refers to measures of performance for the same students at two or more points in time and requires a common, often vertical, scale to evaluate the magnitude of change. Of the approaches listed, this is the only true growth model.
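A minimal sketch of the simplest growth measure, a gain score, assuming pre- and post-test scores are already reported on a common vertical scale. The student IDs, scores, and the `gain_scores` helper are hypothetical.

```python
from statistics import mean


def gain_scores(pre, post):
    """Return {student: post - pre} for students tested at both time points.

    Assumes both score sets are on the same (vertical) scale, so a
    difference is meaningful; students missing either score are dropped,
    not imputed.
    """
    matched = sorted(pre.keys() & post.keys())
    return {sid: post[sid] - pre[sid] for sid in matched}


# Hypothetical vertical-scale scores for one classroom
fall = {"s1": 410.0, "s2": 395.0, "s3": 430.0, "s4": 402.0}
spring = {"s1": 432.0, "s2": 401.0, "s3": 455.0}  # s4 left mid-year

gains = gain_scores(fall, spring)
print(gains)                           # {'s1': 22.0, 's2': 6.0, 's3': 25.0}
print(round(mean(gains.values()), 2))  # 17.67
```

Without a common scale, the subtraction above would not mean anything, which is why the definition makes the vertical scale a requirement.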

VAM: Generally describes multivariate models that include certain variables to produce an expectation against which actual performance is evaluated.

Student Growth Percentiles (SGP) is a regression-based measure of growth that works by evaluating current achievement based on prior achievement and describing performance (using percentiles) relative to other students with the “same” prior achievement histories.
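Operational SGP implementations estimate conditional percentiles with quantile regression on students’ prior score histories. The sketch below is only a crude stand-in for that idea: it treats students whose prior scores fall in the same 10-point bin as having the “same” history and reports each student’s mid-rank percentile within that bin. The bin width, data, and the `binned_growth_percentiles` function are hypothetical.

```python
from collections import defaultdict


def binned_growth_percentiles(scores, bin_width=10):
    """Crude growth percentiles: rank each student's current score among
    peers whose prior score falls in the same bin.

    scores: iterable of (student_id, prior_score, current_score)
    bin_width: hypothetical bin size standing in for "the same" prior history
    Returns {student_id: percentile}, using mid-ranks clipped to 1-99.
    """
    peers = defaultdict(list)
    for sid, prior, current in scores:
        peers[prior // bin_width].append((sid, current))

    result = {}
    for group in peers.values():
        currents = [c for _, c in group]
        n = len(currents)
        for sid, c in group:
            below = sum(other < c for other in currents)
            ties = sum(other == c for other in currents)  # includes the student
            pct = 100 * (below + 0.5 * ties) / n
            result[sid] = min(99, max(1, round(pct)))
    return result


# Hypothetical (student, prior, current) scores
scores = [
    ("s1", 400, 420), ("s2", 405, 431), ("s3", 402, 410),
    ("s4", 408, 445), ("s5", 401, 425), ("s6", 352, 380),
]
print(binned_growth_percentiles(scores))
# {'s1': 30, 's2': 70, 's3': 10, 's4': 90, 's5': 50, 's6': 50}
```

Note that s6, alone in its bin, lands at the 50th percentile by default; sparse peer groups are one reason real SGP models smooth across prior scores instead of binning.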

Student Learning Objectives (SLO) is a general approach (often called Student Growth Objectives) whereby educators establish goals for individual students or groups of students (often in conjunction with administrators) and then evaluate the extent to which the goals have been achieved.
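An SLO result is often summarized as the share of students who met their targets, which is then mapped to a rating. In the sketch below, the targets, results, cut points, and function names are hypothetical; real SLO scoring rules vary by state and district.

```python
def slo_attainment(targets, results):
    """Share of students with both a target and a result who met the target."""
    assessed = [sid for sid in targets if sid in results]
    if not assessed:
        raise ValueError("no student has both a target and a result")
    met = sum(results[sid] >= targets[sid] for sid in assessed)
    return met / len(assessed)


def slo_rating(share_met, cuts=(0.50, 0.70, 0.90)):
    """Map attainment to a 1-4 rating; the cut points are hypothetical."""
    return 1 + sum(share_met >= c for c in cuts)


# Hypothetical targets set in the fall and end-of-course results
targets = {"s1": 70, "s2": 65, "s3": 80, "s4": 75}
results = {"s1": 74, "s2": 60, "s3": 82, "s4": 75}

share = slo_attainment(targets, results)
print(share, slo_rating(share))  # 0.75 3
```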




Attribution: linking educator behavior to student outcomes
◦ Assigning accountability: multiple educators contribute to instruction, and “contact time” requirements must specify how long a student needs to be in a teacher’s classroom to count
◦ Opportunity to employ shared attribution strategies, which must be tied to local theories of action or theories of improvement

Attribution
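The contact-time and shared-attribution questions can be made concrete with dosage weighting: each student’s outcome counts toward a teacher in proportion to how much of that student’s instruction the teacher provided, after a minimum-enrollment rule is applied. This is a hypothetical sketch; the 75% enrollment threshold, the teacher names, and the weighting scheme are illustrative assumptions, not a recommended rule.

```python
from collections import defaultdict


def weighted_teacher_scores(links, outcomes, min_enrolled=0.75):
    """Dosage-weighted mean student outcome for each teacher.

    links: iterable of (student, teacher, enrolled_fraction, instruction_share)
      enrolled_fraction: share of the year the student was on this roster
      instruction_share: share of the subject's instruction this teacher
        delivered (e.g., 0.5 for each of two co-teachers)
    outcomes: {student: outcome}, e.g., a growth percentile
    min_enrolled: hypothetical contact-time rule; shorter links are dropped
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for student, teacher, enrolled, share in links:
        if enrolled < min_enrolled or student not in outcomes:
            continue
        w = enrolled * share
        totals[teacher] += w * outcomes[student]
        weights[teacher] += w
    return {t: totals[t] / weights[t] for t in totals if weights[t] > 0}


# Hypothetical outcomes and roster links
outcomes = {"s1": 60, "s2": 30, "s3": 90, "s4": 10}
links = [
    ("s1", "Lopez", 1.0, 0.5), ("s1", "Kim", 1.0, 0.5),  # co-taught
    ("s2", "Lopez", 1.0, 1.0),
    ("s3", "Kim", 1.0, 1.0),
    ("s4", "Lopez", 0.5, 1.0),  # arrived mid-year: below the contact rule
]
print(weighted_teacher_scores(links, outcomes))  # {'Lopez': 40.0, 'Kim': 80.0}
```

Lowering `min_enrolled` to 0.4 would pull s4 into the hypothetical Lopez average (32.5 instead of 40.0), which shows how much a contact-time rule alone can move a result.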



Combining Multiple Measures

How should we arrive at an overall judgment of educator effectiveness?
◦ Weighting of student performance and knowledge & skills

What are the different types of information that should be employed when evaluating principals compared with teachers?
◦ We know the specific indicators and even standards will differ.

Who should be responsible for making these overall judgments?
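A compensatory (weighted-average) rule is one common way to combine multiple measures. The sketch below assumes every component is already on a 1-4 scale; the component names, weights, cut scores, and the `composite_rating` function are hypothetical. A conjunctive rule or a decision matrix are alternatives that keep a high score on one measure from fully offsetting a low score on another.

```python
def composite_rating(scores, weights, cuts=(1.75, 2.5, 3.25)):
    """Compensatory (weighted-average) combination of component scores.

    scores, weights: {component: value}; every score is on a 1-4 scale and
    the weights must sum to 1. cuts are hypothetical boundaries between the
    four overall performance levels. Returns (weighted score, level 1-4).
    """
    if set(scores) != set(weights):
        raise ValueError("every component needs both a score and a weight")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(scores[k] * weights[k] for k in scores)
    level = 1 + sum(total >= c for c in cuts)
    return total, level


# Hypothetical components and weights
scores = {"practice": 3.0, "state_growth": 2.0, "local_measures": 3.0}
weights = {"practice": 0.5, "state_growth": 0.25, "local_measures": 0.25}
print(composite_rating(scores, weights))  # (2.75, 3)
```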


Data system requirements to link students with teachers at the state level

Data system requirements to manage the data at the local level

Dealing with student mobility

Dealing with missing data, especially non-random missing data

“Full academic year” rules


Information Requirements
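“Full academic year” rules and missing-data monitoring interact, because students excluded by such a rule may not be missing at random. The sketch below applies a hypothetical rule (continuous enrollment across an October 1 to April 15 window) to one teacher’s roster and returns the excluded students as well, so the amount of exclusion can be tracked. The dates, the roster, and the function names are assumptions.

```python
from datetime import date


def full_academic_year(start, end, window_start, window_end):
    """True if one enrollment spell spans the whole accountability window."""
    return start <= window_start and end >= window_end


def split_roster(roster, window_start, window_end):
    """Split a teacher's roster into included and excluded students.

    roster: iterable of (student, enrollment_start, enrollment_end) spells.
    Excluded students are returned too, so that the amount (and pattern)
    of exclusion can be monitored rather than silently dropped.
    """
    included, excluded = set(), set()
    for sid, start, end in roster:
        if full_academic_year(start, end, window_start, window_end):
            included.add(sid)
        else:
            excluded.add(sid)
    # A student with any qualifying spell counts as included
    return sorted(included), sorted(excluded - included)


# Hypothetical roster spells
roster = [
    ("s1", date(2011, 8, 22), date(2012, 5, 25)),
    ("s2", date(2011, 11, 7), date(2012, 5, 25)),  # enrolled after the window opened
    ("s3", date(2011, 8, 22), date(2012, 2, 10)),  # left before the window closed
]
included, excluded = split_roster(roster, date(2011, 10, 1), date(2012, 4, 15))
print(included, excluded)  # ['s1'] ['s2', 's3']
```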


How will this be managed at the state level?
◦ Data, information, and analytics
◦ Reporting and communication
◦ Support and capacity building
◦ Training and monitoring

How will this be managed at the local level?
◦ Capacity for implementation: conducting observations, document analysis, etc.; induction, mentoring, and support; training; record keeping; reporting and feedback; decision making and appeals


Capacity Requirements


How will results be communicated to educators to improve practice?

How will information about the system be communicated to the public and policy makers while protecting educators?


Reporting & Communication



What sanctions, rewards, and/or consequences are appropriate to advance prioritized outcomes?

What strategies will be employed to use information to support schools/teachers/students?

Is there capacity in the state (in the districts) to improve educator quality in WY?

What resources will be required for this improvement to occur?
◦ Where will they come from?

Consequences & Support



Negative Consequences

As we consider the design and implementation of WY’s new educator evaluation system, we must be mindful that the likelihood of getting this wrong (i.e., leading to unintended negative consequences) is at least as high as the chance of getting it right (i.e., improving teacher quality and student learning).

Unintended consequences could include:
◦ Narrowing of the curriculum
◦ Competition vs. cooperation
◦ Assignment of students or teachers to selected classes for reasons unrelated to educational benefit
◦ Educator transition
◦ Educator attrition

Campbell’s Law


"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (emphases added)

http://en.wikipedia.org/wiki/Campbell%27s_Law

Educator accountability systems will invite significantly more implicit and explicit corruption than has been seen with school accountability





What types of formative evaluation approaches need to be put in place to monitor implementation and consequences?

Evaluate claims in the theory of action

Evaluate impact
◦ Establish criteria to determine whether results are reasonable

Develop methods and standards to assess the precision and stability of results

Does the system meet important utility criteria?

Monitoring and Evaluation



How should we plan our work going forward?

Who’s going to do what? How will we work?

Goals for next meeting…

Next steps…
