
DRAFT WY Educator Evaluation Framework. Advisory Committee October 17-18, 2012 1

THE WYOMING FRAMEWORK FOR EDUCATOR EVALUATION

DRAFT: October 17, 2012

Introduction

The Advisory Committee to the Wyoming Select Committee on Educational Accountability was

charged with carrying out the recommendations put forth in the Wyoming Accountability in

Education Act of 2012 (WEA 65). The specific charge for the Advisory Committee was to

design a framework for educator evaluation in Wyoming. The Select Committee was quite clear

that they wanted a balance between state and local control and, in keeping with Wyoming’s

educational philosophy, the Select Committee placed considerable authority for making specific

design and implementation decisions with local educational leaders and teachers. However, in

order to best support the work of districts, the Advisory Committee produced this document: A

Framework of Educator Evaluation in Wyoming. This framework outlines methods and design

decisions necessary for implementing an educator evaluation system and indicates where the

Advisory Committee recommends that the requirements be “tight” or more

standardized across districts and where flexibility is expected and even encouraged. The

Advisory Committee intends for districts to be able to use the Framework described below as the

basis for their local systems if they choose. The Framework will not be “plug and

play” in that local districts will still have many decisions to make to operationalize their local

system, but the Framework is designed to make districts’ jobs considerably easier.

A critical aspect of the framework, as reflected below in the key principles, is the intention to

build both an internally coherent system and an educator evaluation system that is coherent with

other educational accountability systems in Wyoming. A coherent system would use

information from the school accountability system (and perhaps the district accreditation system)

to supplement the information generated from the educator accountability systems. For example,

if a school has demonstrated high achievement and the students are growing at admirable rates,

there is good evidence of high quality education in the school, which therefore suggests we can

trust that the educators in the building are performing well. Because a school’s results rest on

larger sample sizes than those of any individual teacher, school-level determinations are that

much more reliable. This intent to build on the information from the school accountability system

does not relieve school districts from implementing educator evaluation systems, but it could

mean that the state would have to provide far less oversight of educator evaluation systems in

high performing schools.


Key Principles

The following principles guided the development of the Wyoming Framework for Educator

Evaluation. The Advisory Committee kept these principles at the center of its

deliberations in developing the various components of the system, and they are at the heart of

the recommendations discussed throughout this document. As noted below, the primary purpose

of the system is to maximize student learning and improvements in student learning. The system

must maintain the focus on student learning and all of the following principles support this

primary purpose.

1. The primary purpose of Wyoming’s educator evaluation and the reason for engaging

in this work is to support and promote increases in student learning in Wyoming

schools.

2. The system must be designed coherently to support a system of continuous school

improvement. A coherent system will work seamlessly with the school and leader

accountability systems and foster collaboration among educators, administrators, and

other stakeholders.

3. The Framework and locally-aligned versions of the system shall be designed to

promote opportunities for meaningful professional growth of educators. As such, the

system must be designed to provide specific and timely feedback on multiple aspects

of professional practice and student learning. A feedback-oriented system must be a

continuous improvement process and not a one-time event.

4. The system must be designed and implemented with integrity. Doing so will offer a

positive and far-reaching vision for education as a profession, one built on respect,

caring, and fairness. A system designed with integrity will be transparent such that

all relevant participants clearly understand the expectations.

5. The Framework must allow for flexibility to best fit local contexts and needs. The

local evaluation systems should be designed collaboratively by administrators and

educators, with input gathered from parents and community members.

6. The system will provide credible information to support hiring, placement, and career

ladder decisions in a technically and morally defensible manner.

Domains of the Wyoming Educator Evaluation Framework

A key aspect of the Framework is that it will contain five major components: four domains of

professional practice and one domain of student performance data. The four domains of

professional practice noted below represent the overarching categories of the Interstate Teacher


Assessment and Support Consortium Model Core Teaching Standards (InTASC Standards)1.

Districts will use a variety of tools to measure professional practice (e.g., Danielson’s Framework for

Effective Teaching; Marzano’s Art and Science of Teaching). The Advisory Committee does not

want to limit the options to specific tools, but recommends that all local systems measure the

four domains of effective teaching described in the InTASC Standards.

Learner and Learning

Content Knowledge

Instructional Practice

Professional Responsibility

The Advisory Committee intends for each domain, including student performance results, to be

equally valued in the overall evaluation. Further, the Framework is designed to promote

coherence and integration among the five domains. Therefore, the Advisory Committee

recommends weighting each component, especially student learning, as equally as possible in the

overall evaluation of each educator. Further, there is an important difference between nominal

(intended) and effective (actual) weights and the Advisory Committee recommends that as each

district pilots its system, it analyzes the data to determine the actual weight of the various

domains. This actual weighting will depend on the variability in the responses to the specific

instruments used in each district. In the following sections, the major components of the

Framework are discussed in more detail.
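
The difference between nominal and effective weights can be illustrated with a short calculation. The following Python sketch is illustrative only: the scores are invented pilot data for six hypothetical educators on an assumed 1-4 scale, the domain names and the function `effective_weights` are not part of the Framework, and the formula used (each domain's nominal weight multiplied by its covariance with the composite score, divided by the variance of the composite) is one common way to express effective weights rather than a method prescribed by the Advisory Committee.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def effective_weights(scores, nominal):
    """Share of composite-score variance attributable to each domain.

    scores:  dict mapping domain name -> list of scores, one per educator
    nominal: dict mapping domain name -> intended (nominal) weight
    """
    n = len(next(iter(scores.values())))
    composite = [sum(nominal[d] * scores[d][i] for d in scores) for i in range(n)]
    composite_variance = covariance(composite, composite)
    return {
        d: nominal[d] * covariance(scores[d], composite) / composite_variance
        for d in scores
    }


# Invented pilot data (assumption): six educators, five domains, 1-4 scale.
pilot_scores = {
    "learner_and_learning":        [3, 3, 3, 3, 3, 3],  # no variability at all
    "content_knowledge":           [3, 3, 3, 4, 3, 3],
    "instructional_practice":      [2, 3, 3, 3, 4, 3],
    "professional_responsibility": [3, 3, 4, 3, 3, 3],
    "student_performance":         [1, 2, 3, 4, 2, 4],  # widest spread
}
nominal_weights = {domain: 0.20 for domain in pilot_scores}  # equal by design

weights = effective_weights(pilot_scores, nominal_weights)
for domain, weight in weights.items():
    print(f"{domain:30s} nominal=0.20 effective={weight:.3f}")
```

In this invented example every domain has a nominal weight of 0.20, yet the student performance domain carries an effective weight of about 0.63 because its scores vary the most, while a domain on which every educator receives the same score contributes nothing to differences in the composite.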

Standards of Professional Practice

The Framework uses InTASC Standards as the measurement framework for evaluating teachers

relative to the four domains of effective teaching. This recommendation is based on the research

base supporting this framework and the extensive materials available to support its use and

professional development. Local districts may adopt tools or approaches to add more specificity

to the InTASC Standards, but the Advisory Committee recommends requiring that any

framework used must document the research supporting its use and provide the specifications

necessary to support reliable and valid measurement of teacher practices. The specific InTASC

Standards, grouped by domain, are presented below. For a more complete explanation of the

standards, please refer to the InTASC document referenced in the footnote.

Learner and Learning

Standard #1: Learner Development. The teacher understands how learners grow and develop,

recognizing that patterns of learning and development vary individually within

1 Council of Chief State School Officers. (2011, April). Interstate Teacher Assessment and Support Consortium

(InTASC) Model Core Teaching Standards: A Resource for State Dialogue. Washington, DC: Author.

http://www.ccsso.org/Resources/Programs/Interstate_Teacher_Assessment_Consortium_(InTASC).html



and across the cognitive, linguistic, social, emotional, and physical areas, and

designs and implements developmentally appropriate and challenging learning

experiences.

Standard #2: Learning Differences. The teacher uses understanding of individual differences

and diverse cultures and communities to ensure inclusive learning environments

that enable each learner to meet high standards.

Standard #3: Learning Environments. The teacher works with others to create environments

that support individual and collaborative learning, and that encourage positive

social interaction, active engagement in learning, and self motivation.

Content Knowledge

Standard #4: Content Knowledge. The teacher understands the central concepts, tools of

inquiry, and structures of the discipline(s) he or she teaches and creates learning

experiences that make the discipline accessible and meaningful for learners to

assure mastery of the content.

Standard #5: Application of Content. The teacher understands how to connect concepts and

use differing perspectives to engage learners in critical thinking, creativity, and

collaborative problem solving related to authentic local and global issues.

Instructional Practice

Standard #6: Assessment. The teacher understands and uses multiple methods of assessment to

engage learners in their own growth, to monitor learner progress, and to guide the

teacher’s and learner’s decision making.

Standard #7: Planning for Instruction. The teacher plans instruction that supports every

student in meeting rigorous learning goals by drawing upon knowledge of content

areas, curriculum, cross-disciplinary skills, and pedagogy, as well as knowledge

of learners and the community context.

Standard #8: Instructional Strategies. The teacher understands and uses a variety of

instructional strategies to encourage learners to develop deep understanding of

content areas and their connections, and to build skills to apply knowledge in

meaningful ways.

Professional Responsibility

Standard #9: Professional Learning and Ethical Practice. The teacher engages in ongoing

professional learning and uses evidence to continually evaluate his/her practice,

particularly the effects of his/her choices and actions on others (learners, families,

other professionals, and the community), and adapts practice to meet the needs of

each learner.


Standard #10: Leadership and Collaboration. The teacher seeks appropriate leadership roles

and opportunities to take responsibility for student learning, to collaborate with

learners, families, colleagues, other school professionals, and community

members to ensure learner growth, and to advance the profession.

Performance Standards

All Wyoming schools, as determined by their districts, will classify all licensed personnel, as

illustrated by the Framework, as highly effective, effective, needs improvement,

and ineffective based on data from measures of the standards for professional practice and

measures of student performance. The evaluation system will produce an overall rating for each

teacher. To arrive at an overall rating, the types

of knowledge, skills, dispositions, and behaviors that characterize an “effective” teacher (as well as other

levels) must be described. Further, if there is any hope of comparable ratings across the state,

common performance level descriptors must be used. Performance standards describe “how

good is good enough” and the “performance level descriptor” (PLD) is the narrative component

of the performance standard that describes the key qualities that differentiate educators at each of

the various levels.

The InTASC Standards provide performance descriptors for each of the ten standards, but they

do not provide an overall description for various levels of teacher effectiveness. One might ask,

why not require educators to meet the requirements on each of the ten standards in order to be

classified as effective? Such a conjunctive system where candidates must meet every threshold

in order to be classified as “effective” is both unrealistic and unreliable. No Child Left Behind’s

(NCLB) Adequate Yearly Progress (AYP) system is the most recent, well known example of a

conjunctive system that leads to many unreliable and invalid decisions. Therefore, a more

compensatory approach where stronger performance in one area may offset weaker performance

in other areas is more reliable and often much more realistic. Further, hybrid systems can clearly

value important aspects of the domain while allowing some compensatory decisions elsewhere in

the system. Therefore, an educator evaluation system that results in an overall classification for

each educator must also include an omnibus description of educator effectiveness. This

definition is also critical to help guide the data collection and validity evaluation of the system.
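
The contrast among conjunctive, compensatory, and hybrid decision rules can be made concrete with a small sketch. Everything specific in it is an assumption made for illustration: the 1-4 rating scale, the cut score of 3.0, the use of a simple average as the compensatory rule, and the function names. The Framework does not prescribe any of these.

```python
def conjunctive(scores, cut=3.0):
    """Effective only if every standard individually meets the cut score."""
    return all(score >= cut for score in scores.values())


def compensatory(scores, cut=3.0):
    """Effective if the average across all standards meets the cut score,
    so strength on one standard can offset weakness on another."""
    return sum(scores.values()) / len(scores) >= cut


def hybrid(scores, required, cut=3.0):
    """Each standard named in `required` must meet the cut on its own,
    and the overall average must also meet the cut."""
    meets_required = all(scores[name] >= cut for name in required)
    return meets_required and compensatory(scores, cut)


# Hypothetical ratings on the ten InTASC standards (assumed 1-4 scale).
ratings = [4, 3, 3, 4, 3, 3, 3, 4, 2, 3]
teacher = {f"standard_{i}": r for i, r in enumerate(ratings, start=1)}

print(conjunctive(teacher))                        # False: standard_9 is below the cut
print(compensatory(teacher))                       # True: the average is 3.2
print(hybrid(teacher, required={"standard_6"}))    # True: standard_6 meets the cut
print(hybrid(teacher, required={"standard_9"}))    # False: standard_9 is required
```

The same hypothetical teacher is classified differently under each rule, which is why the choice of rule, and of which standards (if any) are treated as non-negotiable, needs to be made deliberately.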

The Framework provides PLDs for each of the four overall levels of the system. These

descriptors connect the standards for professional practice with the various data produced by the

measurement instruments used in the system. This overall description is necessary, because an

effective teacher is not necessarily a simple sum of the scores on the various

components/indicators in the system. Further, defining an effective teacher as one who is

effective on each component will establish a “conjunctive” system (e.g., NCLB-AYP) with the


potential negative consequence of having very few teachers classified as effective or highly

effective. A DRAFT PLD for an effective teacher in WY is as follows:

Effective teachers in Wyoming have the knowledge, skills, and commitments that

ensure meaningful learning opportunities for all students and high rates of

growth for most students. Effective teachers facilitate mastery of content and skill

development, and identify and employ appropriate strategies for students who are

not achieving success. Effective teachers communicate high expectations to

students and their families and find ways to engage them in mutually-supportive

teaching and learning environments. They also develop in students the skills,

interests and abilities necessary to be lifelong learners. Because effective

teachers understand that the work of ensuring meaningful learning opportunities

for all students cannot happen in isolation, they engage in collaboration,

continuous reflection, on-going learning and leadership within the profession.

The Advisory Committee (or subcommittee) should craft PLDs for each of the remaining

performance levels in the WY Framework. The Advisory Committee strongly endorses

employing a set of common performance descriptors for WY in order to promote comparable

expectations for educators across districts.

General Evaluation Framework

The general measurement framework describes the overall approach for how local districts

following the Framework would approach the data collection involved in evaluating educators.

The measurement framework follows from the key principles outlined at the beginning of this

document. There are four domains of educator practice along with evaluations based on student

achievement. The general measurement framework is tied to this overall depiction, but provides

more structure for the Framework and perhaps local instantiations of the Framework. All

evaluations, conducted using the Framework, shall include:

Professional practice measures

Multiple approaches and measures will be used to collect data on educator practices to best tailor

the data collection approaches to the complex nature of teaching practice. Each educator shall conduct

a self-assessment each year that will be used as the foundation of a goal setting meeting with the

principal and/or peer coach (mentor). The self assessment and collaboratively established goals

will be used to focus the professional practice data collection for the year in which the educator

is being formally evaluated. For the years in which the educator is not undergoing a formal

evaluation, the self assessment and goals shall be used to guide professional development and

formative evaluation. Data related to professional practices shall be collected using:


A focused professional portfolio used to document specific goals and artifacts related to

these goals, and

Observations of practice by educational leaders and potentially peers.

Measures of student performance

Student Learning Objectives

Student Growth Percentiles (if applicable)

The SLO and/or SGP results may be “shared” among multiple educators depending upon

local theories of action around school improvement.

As part of the general measurement approach, the Framework includes the use of multiple

measures of each domain when possible and when the use of the multiple measures improves the

validity of the evaluation decision. In addition to multiple measures, the Advisory Committee

recognizes the challenge of having enough expertise and time in any single individual to conduct

all required evaluations. Therefore, the Framework calls for the use of peer teams, in addition to

building-level administrators, to participate and advise in the evaluation process.

The Advisory Committee further recommends that at least part of the SLO and/or SGP results be

shared among multiple educators depending upon local theories of action around school

improvement.

The Professional Portfolio

The professional portfolio is a critical component of WY’s Framework and contributes data to

multiple domains of teacher practice. All educators are required to establish yearly professional

goals in consultation with their supervisor or designee and document the process and products

associated with these goals through a professional portfolio that is reviewed each year. The

Wyoming Department of Education (WDE) or other designees will produce guidance outlining

the requirements of a professional portfolio to be used as a starting point for local requirements.

The Advisory Committee recommends that each educator maintain a professional portfolio that

includes the following components:

Documentation of self assessment

Documentation of collaboratively established specific goals

A plan, including identified professional development, for achieving the goals

Analyses, among other things, of key artifacts such as student work from

specific assignments, planning documents, and assessments related to the

established goals

Self reflection at the end of the year to self evaluate the extent to which the

specific goals have been achieved


Implementation and Differentiation [Note we need to address and avoid any potential contract

issues in this section]

The Advisory Committee has been sensitive to balancing the needs of creating a valid system

with an understanding that the system or one like it must be implemented by all school districts

without creating an unmanageable burden. While many states have required the full evaluation

of every teacher every year, the Advisory Committee recognized that this would place an

impossible and inefficient burden on WY schools. Therefore, the Advisory Committee

recommends that evaluations should be differentiated according to the experience and status of

the schools’ educators. Ultimately, each district shall enact a policy and set of procedures to

differentiate evaluation systems for its different classes of educators (e.g., novice, veteran, and/or

high performing, low performing) and according to the specific evaluation questions to be investigated.

Within the first three years of implementation, each educator shall undergo a full evaluation. To

the extent possible, yearly evaluations shall include multiple years of student performance

results.

Novice educators, defined as those within the first two years of the teaching profession, must be

evaluated every year until they are rated “effective” for three consecutive years. In order to be

granted professional (continuing contract) status, educators must be rated effective for three

consecutive years. These two events can happen concurrently. Districts may decide to focus

specific aspects of the evaluation for novice educators by reducing the demands of the

professional portfolio, for example.

Teachers with professional status (continuing contract) shall be evaluated every year until they

receive “effective” ratings or better for two consecutive ratings. Once these teachers receive two

consecutive effective ratings, they shall receive summative evaluations every three years. A

yearly evaluation schedule shall not be required as long as the educator continues receiving

effective or better ratings.
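
The differentiated schedule described above can be sketched as a simple decision rule. This is an illustration under stated assumptions, not district policy: the function and variable names are hypothetical, "highly effective" is assumed to count toward a run of "effective" ratings for novice educators, and a novice who completes three consecutive effective ratings is assumed to move onto the same three-year cycle as teachers with continuing contract status.

```python
EFFECTIVE_OR_BETTER = {"effective", "highly effective"}


def trailing_effective_streak(ratings):
    """Count the most recent consecutive ratings that are effective or better.

    ratings: list of summative ratings, oldest first.
    """
    streak = 0
    for rating in reversed(ratings):
        if rating not in EFFECTIVE_OR_BETTER:
            break
        streak += 1
    return streak


def evaluation_due(has_continuing_contract, ratings, years_since_last_evaluation):
    """Return True if a full (summative) evaluation is due this year."""
    # Novice educators need three consecutive effective ratings;
    # continuing contract teachers need two.
    required_streak = 2 if has_continuing_contract else 3
    if trailing_effective_streak(ratings) < required_streak:
        return True  # evaluated every year until the streak is reached
    # Once the streak is reached, summative evaluations occur every three years.
    return years_since_last_evaluation >= 3


# Novice with only two effective ratings so far: still evaluated yearly.
print(evaluation_due(False, ["effective", "effective"], 1))
# Continuing contract teacher with two consecutive effective ratings, one year on.
print(evaluation_due(True, ["needs improvement", "effective", "effective"], 1))
# Same teacher three years after the last summative evaluation.
print(evaluation_due(True, ["effective", "effective"], 3))
```

A district adopting a rule of this kind would still need to decide how a less-than-effective rating during an "off year" review returns an educator to the yearly schedule.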

Specific Measurement Framework

The specific measurement framework adds the details to the general measurement framework to

guide the data collection methods in order to successfully conduct educator evaluations. Such a

detailed measurement framework would describe the type and frequency of data collection

approaches for each of the major domains. The following paragraphs briefly highlight aspects of

the specific measurement framework, organized by major domain. Subsequent work will be

required to fully describe the specific measurement procedures and policies to be enacted for the

various educators in the system.


Domain 1: Planning and Preparation

A professional portfolio shall be required as evidence of educator performance related to

Domain 1 for each educator. Given the scope of Domain 1, each educator, along with her/his

evaluator (principal) shall identify the sub-components of the Domain that will be the focus of

the evaluation for that particular year. The focal sub-domains for the given year will determine

the specific data to be included in the portfolio. For example, if one of the foci was on planning

instruction, the teacher and evaluator might agree that a series of lesson and unit plans with

structured reflections would serve as useful entries in the professional portfolio. District

evaluation teams and building administrators will need to track the focus of each educator’s

portfolio each year to ensure that the planning and preparation domain is fully represented for

each educator over time.

Domain 2 (Classroom Environment) and Domain 3 (Instruction)

These domains generally require direct observation to collect evidence of the educator’s

successful mastery of these domains. The Advisory Committee recognizes that any manageable

schedule of observations when the system becomes operational will be necessarily “thin.”

Therefore, districts must think carefully about the nature and frequency of the observations. For

example, the Advisory Committee recommends that Novice and Ineffective teachers be formally

observed at least three times each year, while Effective educators may be observed at least three

times only in the year of their evaluation and less frequently during their “off years.”

In the years in which they are evaluated, teachers shall be observed formally on at least three

different occasions. The general time frame/unit of instruction for the observations shall be chosen in

consultation with the educator, but the specific lessons observed may be unannounced. At least

one of the observations, but preferably most of them, should be tied to aspects of the curriculum

that are the focus of the SLOs. Further, the observations shall include an analysis and discussion

of relevant documents associated with the unit of study being observed. These documents may

include lesson plans, assessments, assignments, student work, and other relevant documents

associated with the teaching, learning, and assessment of the unit. To improve coherence, at

least some of these artifacts or other documents should be included in the professional portfolio.

Domain 4: Professional Responsibility

Similar to Domain 1, professional responsibility cannot be evaluated with direct observation.

The Advisory Committee separated Domain 4 from Domain 1 in this discussion because the

Framework recognizes that the nature of professional responsibility will be quite different for

novice compared with experienced teachers. The professional responsibility for a novice


educator would tend to focus more on Standard 9 (Professional Learning and Ethical Practice),

while experienced educators should be expected to provide evidence for both Standards 9 and 10

(Leadership and Collaboration). For experienced educators, defining the specific aspects of

their professional responsibilities to be evaluated is a critical aspect of their goal setting. The

specific focus of the professional responsibility will guide the required data collection and

reflection.

Domain 5: Student Performance

As stated in the first guiding principle of this Framework, the primary purpose of Wyoming’s

educator evaluation and the reason for engaging in this work is to support and promote increases

in student learning in Wyoming schools. Therefore, it is critical that the results of student

achievement be incorporated in the evaluations of all educators. While this sounds so intuitively

simple, it is one of the most complex aspects of new forms of educator evaluation. The

Wyoming Framework uses a three-part approach for incorporating student achievement and

growth into evaluations in order to attempt to maximize the benefits of doing so, while striving

to minimize potential unintended negative consequences.

Student Learning Objectives (SLO) form the foundation of Wyoming’s approach for

documenting changes in student performance associated with a teacher or group of educators

and, as such, all educators will have the results of SLOs incorporated into their evaluations. For

educators in “tested” subjects and grades (those grades and subjects for which there is a state

standardized test as well as a state test in the same subject in the previous year), student

performance will be evaluated using Student Growth Percentiles (SGP), and the results of SGP

analyses, along with SLO results, will be used in the evaluations of educators in tested subjects

and grades. Both SGP and SLO approaches are described in more detail below.

Both SGP and SLO approaches can be used to attribute the academic achievement and growth of

students to individual educators or to appropriate aggregations of educators such as grade or

content-level teams or even the whole school. Distributing student performance results to

multiple educators is referred to as “shared attribution.” The tradeoffs associated with shared

attribution are also discussed below.

Student Learning Objectives (SLO)

All teachers, whether in “tested grades and subjects” or not, shall be required to document student

academic performance each year using SLOs in accordance with Wyoming’s SLO guidance.

Both SGP and SLO analyses shall produce results in three classifications of performance, to the

extent possible, such as: high, typical/average, and low. The results of the SLO determinations

shall be incorporated into the evaluation of all educators according to the rules described below


in the section on combining multiple measures. [Note: A draft of the SLO Guidance is found in

Appendix A].

Calculating Student Performance Results in “Tested” Subjects and Grades

The growing interest in reforming long-standing approaches for evaluating and compensating

teachers has been characterized by, among other things, incorporating student performance results

in teacher evaluations. Advances in growth and value-added models in education have

contributed to the interest in using changes in student test scores over time as part of educator

accountability systems. Many districts, states, and non-governmental organizations have

embraced these test-based accountability initiatives, but the initial focus has been on the content

areas and grade levels for which there are state standardized tests, generally administered at the

end of each school year, or “tested” grades/subjects. Student performance, for the purposes of

educator evaluation, is generally evaluated using complex statistical models such as value-added

or student growth percentile models.

There are several possible approaches that Wyoming could use for evaluating student

performance in tested grades, but in order to adhere to the coherence principle, the Advisory

Committee recommends that educator evaluation use the same Student Growth Percentile model currently

used for the school accountability system. However, moving from school to teacher

accountability is not necessarily as simple as it sounds. Appendix B

outlines multiple considerations for using SGPs in educator evaluation.

WDE shall produce Student Growth Percentiles (SGP) results documenting the individual

student and aggregate growth for students. These results will be aggregated according to

“teacher of record” rules as well as for the whole school. Further, results will be disaggregated

according to identifiable student groups in the school. All educators in “tested” grades and

subjects shall receive a report each year from WDE. These results, based on PAWS and

eventually Smarter Balanced Assessment Consortium (SBAC) test scores or another assessment,

using the SGP model, shall be incorporated into teachers’ evaluations either using a shared or

individual attribution framework.
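
The aggregation described above can be sketched as follows. The records, field names, and student group labels are invented for illustration, and the use of the median as the summary statistic is an assumption (it is a common choice when reporting SGPs, but this document does not specify one).

```python
from collections import defaultdict
from statistics import median


def median_sgp_by(records, key):
    """Group student records by `key` and return the median SGP per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["sgp"])
    return {name: median(values) for name, values in groups.items()}


# Invented student-level results (assumption): one row per student.
records = [
    {"student": "s1", "teacher": "T1", "group": "IEP",     "sgp": 35},
    {"student": "s2", "teacher": "T1", "group": "non-IEP", "sgp": 60},
    {"student": "s3", "teacher": "T1", "group": "non-IEP", "sgp": 72},
    {"student": "s4", "teacher": "T2", "group": "IEP",     "sgp": 48},
    {"student": "s5", "teacher": "T2", "group": "non-IEP", "sgp": 55},
]

by_teacher = median_sgp_by(records, "teacher")        # "teacher of record" view
by_group = median_sgp_by(records, "group")            # disaggregated view
whole_school = median(r["sgp"] for r in records)      # whole-school view

print(by_teacher)     # {'T1': 60, 'T2': 51.5}
print(by_group)       # {'IEP': 41.5, 'non-IEP': 60}
print(whole_school)   # 55
```

The same student-level results feed all three views, so the choice between shared and individual attribution changes which aggregate is attached to an educator, not the underlying data.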

Shared Attribution

The Advisory Committee recognizes the challenges of properly attributing the results of student

performance to individual teachers. It is easy to think of many examples where it does not make

much sense to attribute the performance of students to any individual teacher, such as the case

when grade-level teams of teachers place students into differentiated instructional groups and

instruction is provided by educators other than the child’s regular teachers. Therefore,

the Wyoming Framework relies on a mix of shared attribution and individual attribution of

student performance results. The SGP results, based on state tests in grades 3-8 should,


depending on the specific theory of improvement for the particular school, be shared among

educators at the same grade and/or teaching the same subject areas. SLO results, assuming

groups of educators are working on the same SLO, may also be shared among educators at the

same grade and/or content area. However, SLOs allow for more control than state test results

and the Framework requires that at least some portion of the SLOs used to document student

performance be attributed to the individual educator of record. Like anything else in

accountability system design, there are both advantages and disadvantages to using shared

attribution.

One of the major concerns with attributing the results of student performance to individual

teachers is that many fear that this could erode collaborative cultures at many schools, especially

if the results are used in some sort of “zero sum game” accountability design. Shared attribution

approaches, if implemented sensibly, can help promote both collaboration and internal (to the

group of teachers) accountability orientations, both of which are associated with high

performing schools and organizations. Another concern for policy makers and accountability

system designers is the potential for unintended negative consequences from having the mathematics and

reading teachers in grades 4-8 evaluated in potentially very different ways than the other 70-75%

of educators in the district. This could lead to higher rates of attrition from these subjects

and grades or perhaps a feeling of professional isolation. The requirement for all educators to

participate in the SLO process is one hedge against this potential problem. In addition, sharing the

results of all of the student performance indicators among multiple educators, as appropriate, is

one way to recognize the contributions of other educators to student performance, especially in

reading and math. Finally, one of the major concerns with tying student performance results to

individual teachers involves the reliability concerns when dealing with such small groups of

students. Aggregating the student performance results for multiple educators is one way to

ameliorate, but far from eliminate, the reliability challenges.
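The reliability point can be illustrated with a minimal sketch in Python. It assumes, as a simplification, that student results behave like independent draws with a common standard deviation; the function name and the numbers are hypothetical and are not part of the Framework. Under that assumption, the standard error of a group's mean result shrinks with the square root of the number of students.

```python
import math


def standard_error_of_mean(student_sd: float, n_students: int) -> float:
    """Standard error of a group mean, assuming independent student results."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return student_sd / math.sqrt(n_students)


# Hypothetical numbers: one class of 25 students versus a grade-level team of 100.
single_class = standard_error_of_mean(student_sd=20.0, n_students=25)
grade_team = standard_error_of_mean(student_sd=20.0, n_students=100)
print(single_class, grade_team)
```

In this sketch, quadrupling the number of students only halves the error, which is consistent with the observation that aggregation ameliorates, but does not eliminate, the reliability challenge.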

This discussion could lead one to ask: if shared attribution has so many advantages, why

would a system include any other approach? Of course, there are potential disadvantages to

shared attribution too. One important disadvantage, which could be reduced with careful

design, is that educators may be held accountable for results over which they have little to no

control. This was a considerable criticism of Tennessee’s approach for including student

performance results in the evaluations of teachers from non-tested subjects and grades. This

threat is likely greatest when student performance on the state math and/or reading tests is

attributed to all educators in the school as opposed to a finer-grained aggregation. Another

potential disadvantage to shared attribution is that it may mask true variability in educator

quality. If we believe that educator quality is truly variable along a continuum of being able to

influence student performance, then pooling results among multiple educators could mask such

differences. Of course, being able to separate the “signal” (true variability) from the “noise”

(unreliability in the system) is not easy.


Therefore, the Advisory Committee recommends that decisions to share student performance results

among multiple educators be based on more than just reliability concerns and that such decisions

be tied to local theories of improvement. For example, if the focus of improvement

activities is the grade-level team, attribution should be shared among educators at that grade and

not at the whole school level. Therefore, the first step in implementing any sort of shared

attribution approach involves a careful articulation of the school’s locus of improvement actions.

This theory of improvement (action) should also make clear which subjects are shared and with

whom. For example, does the 5th grade team share both math and ELA results or just one

subject? Finally, while the Advisory Committee favors shared attribution approaches in many

cases and for at least some of the weight in the accountability determinations, it also

recommends that at least some of the changes in student performance be attributed to individual

teachers. This might best be accomplished with SLOs rather than SGPs because of the closer

ties to the specific course, but the Advisory Committee suggests leaving this specific decision to

local school districts.

Combining Multiple Measures

There are many approaches for combining multiple indicators to yield a single outcome:

compensatory, conjunctive, disjunctive, and profile methods. Compensatory means that higher

performance in one measure may offset or compensate for lower performance on another

measure. Conjunctive means that acceptable performance must be achieved for every measure

(e.g., AYP). Disjunctive means that performance must be acceptable on at least one measure. A

profile refers to a defined pattern of performance that is judged against specific performance

level descriptions. A profile approach is often operationalized using a matrix to combine

indicators for making judgments. Given the challenges involved in characterizing the

complexities of teaching, the Framework must employ a thoughtful approach for combining the

multiple sources of data in order to arrive at the most valid inferences about overall teacher

quality possible.
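The first three decision rules can be contrasted with a minimal sketch in Python. The 1-4 score scale, the cut score of 2.5, the measure names, and the function names are all hypothetical assumptions chosen for illustration, and the 25/75 weights are only one possible weighting. The profile method is not shown because it depends on a locally defined pattern or matrix.

```python
from typing import Dict


def compensatory(scores: Dict[str, float], weights: Dict[str, float], cut: float) -> bool:
    """Weighted average must reach the cut, so a high score can offset a low one."""
    total_weight = sum(weights[name] for name in scores)
    weighted_mean = sum(scores[name] * weights[name] for name in scores) / total_weight
    return weighted_mean >= cut


def conjunctive(scores: Dict[str, float], cut: float) -> bool:
    """Every measure must reach the cut."""
    return all(value >= cut for value in scores.values())


def disjunctive(scores: Dict[str, float], cut: float) -> bool:
    """At least one measure must reach the cut."""
    return any(value >= cut for value in scores.values())


# Hypothetical educator: strong practice rating, weak student performance rating.
scores = {"professional_practice": 3.5, "student_performance": 1.5}
weights = {"professional_practice": 0.75, "student_performance": 0.25}
cut = 2.5

results = {
    "compensatory": compensatory(scores, weights, cut),
    "conjunctive": conjunctive(scores, cut),
    "disjunctive": disjunctive(scores, cut),
}
print(results)
```

For this hypothetical educator the compensatory and disjunctive rules are satisfied while the conjunctive rule is not, which shows how the same evidence can lead to different overall decisions depending on the rule chosen.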

A compensatory approach recognizes that some degree of variability in performance across

indicators may be expected. Such an approach has a higher degree of reliability because the

overall decision is based on multiple indicators evaluated more holistically. Conjunctive

decisions are less reliable because errors accumulate across multiple judgments, meaning a

teacher might fail to be classified as effective due to poor performance on the least reliable

measure. A conjunctive approach does not appear to make much sense for an educator

evaluation system. A disjunctive method is used when any one component is viewed as adequate

assurance the teacher met expectations. Again, this does not appear to make much sense in a

teacher evaluation system. Finally, profiles are especially useful when certain patterns

can be described that reflect valued performance but are not easily captured by other methods,

usually because the combinations of criteria are judged not to be equivalent.

These approaches should not be regarded as mutually exclusive. It is possible, for example, to

combine aspects of compensatory and profile ‘rules’ to arrive at a final result. For example, a

compensatory approach may be used to aggregate the data from the multiple measures within

any single domain, while a profile approach could be used to combine information across

domains. A major advantage of a profile or decision matrix approach is that, once the matrix is

established, a teacher can never receive an unexpected overall rating, whereas the simple averages

characteristic of a compensatory approach can produce some surprising outcomes.

The Advisory Committee recommends using, as part of the Framework, an approach for

combining the various sources of information that avoids mechanistic approaches such as simple

averaging, but that takes into account the nature of the different sources of information. A

“panel” or “decision matrix” approach for combining the multiple measures allows the goals of

the system to be reflected explicitly and not buried in some numerical composite. An example of

such a panel approach is found below.

EXAMPLE Panel Approach for Combining Multiple Measures (based on an approximate

25/75 weighting between student performance and teacher practices)

“Professional            “Student Performance” Rating
Practice” Rating         1                    2                    3

4                        Automatic Review     Highly Effective     Highly Effective
3                        Needs Improvement    Effective            Effective
2                        Needs Improvement    Needs Improvement    Needs Improvement
1                        Ineffective          Ineffective          Automatic Review

Again, this is just an example. If we want to include such detail in the Framework, we will need

to provide the details for combining across the various domains.
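As a minimal sketch, the example panel could be encoded as a simple lookup table in Python. The cell values are read from the example above; the names EXAMPLE_PANEL and overall_rating are hypothetical, and an operational system would need locally agreed definitions for each rating.

```python
# Keys are (professional_practice_rating, student_performance_rating).
# Values follow the EXAMPLE panel only; they are not a recommended rule set.
EXAMPLE_PANEL = {
    (4, 1): "Automatic Review",
    (4, 2): "Highly Effective",
    (4, 3): "Highly Effective",
    (3, 1): "Needs Improvement",
    (3, 2): "Effective",
    (3, 3): "Effective",
    (2, 1): "Needs Improvement",
    (2, 2): "Needs Improvement",
    (2, 3): "Needs Improvement",
    (1, 1): "Ineffective",
    (1, 2): "Ineffective",
    (1, 3): "Automatic Review",
}


def overall_rating(practice: int, student_performance: int) -> str:
    """Look up the overall rating for a pair of domain ratings."""
    try:
        return EXAMPLE_PANEL[(practice, student_performance)]
    except KeyError:
        raise ValueError(
            "practice must be 1-4 and student_performance must be 1-3"
        ) from None


print(overall_rating(3, 2))
```

Because every combination is listed explicitly, each overall rating reflects a deliberate decision about that combination rather than the by-product of a numerical composite.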

Supports and Consequences

Assumptions

As stated in the guiding principles, Wyoming’s Framework is being designed such that it can

support improvements in teaching and learning. As part of this design, the Advisory Committee


emphasizes the importance of reporting detailed and actionable information so that educators and

their leaders have the information they need to guide efforts to improve their practice. This

means that educators need to receive information on each of the indicators in the system, while

recognizing that the information at the indicator level is considerably less reliable than the total

evaluation. This will require that each local system is well documented, in terms of the

components and indicators outlined in this document, so that each local educator understands the

nature of the information on which they will be evaluated.

The WY Framework and all local systems must produce an overall effectiveness rating that

guides support, career development, and employment decisions. The overall rating can serve only

as a general flag to guide support, since the detailed indicator-level information is required for

focused support and development.

Supports

A critical support is ensuring that each educator understands the rules by which they will be

evaluated. Therefore, each district shall develop and implement a process for training all

licensed personnel on the educator evaluation system including the consequences associated with

the ratings. Further, the district shall require all personnel conducting classroom observations to

undergo a defined training and qualification process.

In order to fulfill one of the major guiding principles that the system is being designed to

improve educators’ performance, the Framework requires that each Wyoming school district

include a well-specified and formalized process of mentoring and support designed to

improve the performance of all educators in the district. The support and mentoring systems

should be designed in collaboration with teachers, administrators, and other key stakeholders

(e.g., parents, Board members) and based on research and documented best practices.

Additionally, all evaluators (administrators) must receive research-based training on how best to

share results of the evaluation system with those evaluated in order to support understanding of

the information and to improve practice.

Educators rated ineffective or needs improvement in one year must be placed on a directed

professional growth (improvement) plan that includes receiving targeted mentoring and support.

These support systems must be research-based to the maximum extent possible. Further, the

evaluations of the educators involved in a directed professional growth plan shall include

additional data sources such as video records of classroom teaching experiences. The video

recording of classroom teaching is designed to serve two purposes. First, it can be a very effective

feedback tool for all educators, but particularly for struggling educators if viewed with an expert

mentor. Second, the video evidence will allow for review by an appeals panel within the school

district to ensure the accuracy of the principal ratings for classroom performance.


Consequences

Ultimately, the system will lead to certain consequences for educators falling well below or well

above expectations. While the system is designed for improvement and a significant support

system is required to help struggling educators, there will likely come a point where educators

may need to be counseled out of the profession. The Framework includes the following

expectations for such eventualities:

1. An experienced educator with two consecutive years of ineffective ratings shall lose

her/his current (continuing contract) status and may be dismissed without additional

cause.

2. An educator with two consecutive years of needs improvement ratings shall be moved to

ineffective status.

3. An educator rated highly effective for two consecutive ratings should receive recognition

or reward, as determined by the local district, and may assume a “teacher leader role” as

part of the mentoring and support system.

4. Only educators with consistent ratings of highly effective may participate in the

evaluations of other educators in their district or building.
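The first three expectations above can be sketched as a small rule check in Python. This is only an illustration under stated assumptions: the rating labels are written in lowercase, ratings are listed from oldest to newest, the function name is hypothetical, and the fourth expectation is not modeled because “consistent ratings” is not defined here.

```python
from typing import List


def consequence(ratings: List[str], continuing_contract: bool = True) -> str:
    """Return the consequence implied by the two most recent annual ratings."""
    if len(ratings) < 2:
        return "no consequence"
    last_two = ratings[-2:]
    if last_two == ["ineffective", "ineffective"] and continuing_contract:
        # Expectation 1: applies to experienced educators on continuing contract.
        return "loses continuing contract status"
    if last_two == ["needs improvement", "needs improvement"]:
        # Expectation 2.
        return "moved to ineffective status"
    if last_two == ["highly effective", "highly effective"]:
        # Expectation 3: the form of recognition is left to the local district.
        return "recognition or reward"
    return "no consequence"


print(consequence(["effective", "needs improvement", "needs improvement"]))
```

A local district adopting rules of this kind would still need to decide details the sketch leaves open, such as how a rating moved to ineffective status under the second expectation counts toward the first.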

Implementation Recommendations

The Advisory Committee, as can be seen from the preceding discussion, has been very

thoughtful about designing a framework for educator evaluation in Wyoming. We have

attempted to outline a clear approach to addressing the complexities for designing and

implementing educator evaluation systems in Wyoming. However, the Advisory Committee

wants to stress that there are enormous challenges to implementing such systems in any locale.

One positive aspect of having Wyoming follow other states and districts in this work is that we

have the opportunity to learn from the experiences of others. One of the most striking things

being learned is that significant time and thoughtfulness are needed to implement these systems

well. Further, the odds of getting things wrong are much greater than the odds of getting things

right when these systems are rushed into operational practice too soon.

This would be true even under conditions where the state standards and assessment systems were

stable. As we know, both the standards and assessments have been or are in the process of being

revised. Further, the Common Core State Standards call for deeper levels of understanding on

the part of students than ever before. Shifting instructional practices and curriculum will require

considerable effort on the part of local school districts. Adding requirements for a new school

accountability system will further stress systems. Therefore, the Advisory Committee strongly

recommends the proposed educator evaluation system be implemented thoughtfully with an


extended pilot period both to implement the system gradually and to allow for formative

feedback to make adjustments to the system before it is implemented for high stakes.


APPENDIX A:

Student Learning Objectives: Guidance

Introduction

The Wyoming Accountability Advisory Committee recommends the use of Student Learning

Objectives (SLOs) to document educators’ contributions to student performance in both “tested”

and “non-tested” subjects and grades. SLOs are content- and grade/course-specific measurable

learning objectives that can be used to document student learning over a defined period of time.

In essence, educators establish learning goals for individual students or groups of students, monitor

students’ progress toward these goals, and then evaluate the degree to which educators help

students achieve these goals. This is a key advantage of the SLO approach. It is designed to

reflect and incentivize good teaching practices such as setting clear learning targets,

differentiating instruction for students, monitoring students’ progress toward these targets, and

evaluating the extent to which students have met the targets.

There are several important considerations for employing SLOs in educator evaluations. First,

the quality of the objectives and the validity of the inferences that can be made from the SLO

process must be assured. Second, the process by which the objectives are established must be

considered if the objectives are to be seen as fair for all educators. Third, the measurement approaches

and tools must enable educators and their evaluators to judge the extent to which educators have

met their objectives. Finally, the oversight and support, especially the professional development

necessary to help educators and administrators learn how to set and evaluate meaningful

objectives, and the cross school/district monitoring will be critical to assure fairness and rigor

within and across schools and districts.

While many have an interest in developing “growth-based” SLOs (i.e., measuring the change in

student achievement over two or more points in time), most will be “status-based,” usually

roughly conditioned on estimated initial understanding, then evaluating the degree to which

students reach specific targets on the measurement at the end of the instructional period. This

distinction between growth and status SLOs is discussed in more detail in Marion, et al. (2012) [2].

This section of the report will help guide educators and administrators in designing and

implementing a local SLO process. It is divided into four sections: 1) The Objectives; 2)

The Objective Setting Process; 3) Assessments/Measures; and 4) Oversight and Support. Each

section provides both recommendations and a rationale for the recommendations. To the extent

applicable, reference is made regarding the distinction between the early implementation years

and a more complete operational system.

[2] Marion, S., DePascale, C., Domaleski, C., Gong, B., and Diaz-Bilello, E. (2012, May). Considerations for analyzing educators’

contributions to student learning in non-tested subjects and grades with a focus on Student Learning Objectives.

http://www.nciea.org/publication_PDFs/Measurement%20Considerations%20for%20NTSG_052212.pdf

The Objectives

The number and specificity of the objectives are important considerations in terms of

maximizing the validity of the evidence regarding the claims one is trying to make as a result of

the SLO process. At a minimum, evaluators are at least implicitly claiming that the results of the

SLO determinations for a given time period are a fair and valid depiction of the learning results

of an individual or group of students associated with a particular educator or educators. The

intention is to clearly use the results of the SLO process as evidence of the quality of a particular

educational experience in a particular setting.

SLOs will work best if they are situated within the theory of action or theory of improvement for

the particular school. In order to help ensure the validity of the claims about educators from the

SLO process, it is important to use a sufficiently large and representative set of objectives to

ensure that the domain of the course is appropriately sampled, but not so many objectives that

certain objectives become trivialized. As such, educational leaders should consider requiring

that at least a portion of the SLOs in the building will be shared among a group of educators

(e.g., grade level team). Further, while most SLOs will be tailored to the specific learning targets

in the particular class or course, district and school leaders should work to have SLOs related to

overall school improvement goals to the extent practical. The following recommendations are

designed to maximize the validity of the inferences from the SLOs related to educator quality

while trying to manage the implementation challenges of a new SLO process.

1. All non-administrator educator evaluations shall include a minimum of two individually

based SLOs for each individual educator in a building during the first pilot year. By the

first operational year, up to four SLOs per teacher should be the requirement to ensure

that the subjects and grades are more appropriately represented in the complete set of

SLOs. Reliability concerns can be mitigated by:

a. Using multiple measures for each SLO, and,

b. Increasing the number of SLOs, each with its own measure.

2. The objectives shall be established as “close to the individual students as possible.” This

may involve establishing subgroup, overall class or school goals, and then allowing

variation from these goals based on the current achievement levels of individual or

groups of students.

3. Objectives for each educator should be as representative of the set of courses/subjects

they teach as possible. For example, a middle or high school teacher should have

objectives from multiple sections or courses. This does not mean that every

course/section is represented, but there should be an effort to ensure such representation


over time. Similarly, objectives for elementary school teachers should be as

representative as possible for the subjects that these teachers teach.

4. The objectives shall be linked to the appropriate specific content and skills from the

Wyoming Content Standards and/or course standards. The SLOs should be targeted to

“enduring understandings” or high priority standards. In other words, given the limited

number of student learning objectives for each teacher, they should be tied to the most

critical learning outcomes. It will be important for educators to focus on the most

important outcomes and be cautious not to narrow the curriculum.

5. Each educator shall participate in at least one shared or aggregate objective. This may be

in alignment with a school wide goal or could be a grade level or content area goal

(typically for middle or high school). This should be based on a theory of

action/improvement that leaves the school and district able to decide on the appropriate

aggregation (e.g., grade level teams) based on school/district philosophy. For example,

most schools have “literacy across the curriculum” initiatives in place and it will make

sense to maintain focus on such initiatives through the SLO process.

6. Objectives for each individual educator, and especially the shared/aggregate objectives,

should reflect consideration of the overall school improvement plan.

7. Growth-based objectives should be encouraged, but employed only where it is possible to do

so in technically defensible ways (Marion, et al., 2012).

8. The objectives should be ambitious, but realistic. Further, the objectives should be rich

enough such that educators are not simply classified as having met or not met the specific

objectives. The student learning objectives should be tied to a rubric of performance that

includes at least three or four levels. The objectives should be able to produce nuanced

results such as “clearly not met,” “partially met,” “met objective,” and “exceeded

objective,” as categories of performance. Such an approach will encourage objectives

rich enough to support such a scoring scheme and will hopefully maximize the chances of

capturing the true variance in educators.

The Objective Setting Process

The process of setting the student learning objectives is critical to the fairness, educator buy-in,

and manageability of the SLOs. A process should be established so that educators are held to

similar levels of rigor at least within a school building. The focus should be on trying to

implement as comparable a process within each school as possible. Hopefully in the long run,

this comparability will expand across the district. If SLOs are to lead to the improvements in


student learning that many hope to see, educators should fully participate in the process and not

“have SLOs done to them.” The following recommendations are designed to address these

concerns.

1. Each district shall establish a framework for ensuring that objectives across the district

are as comparable as possible. Participating on statewide peer teams to set objectives for

content areas may be an option for districts to consider. Further, the principal or her/his

designees shall consider comparability when approving all objectives in the building.

2. Generally, the school principal is legally responsible for the evaluation of all personnel in

the building and therefore should approve all objectives. However, the principal,

especially at the secondary level, should consider employing a team approach to take

advantage of distributed leadership and expertise. Having a single point person (or team)

can help ensure the comparability of SLOs across the school building.

3. In addition to school administrators, teams of educators shall be involved in establishing

both shared and individual teacher objectives. Team members may include: members

of the same academic department, grade level colleagues, district content area experts,

and other qualified individuals. This recommendation is designed to address three major

concerns: content knowledge, comparability, and buy-in.

4. Each educator shall have considerable say in establishing her/his objectives. Shared

district objectives can influence educator SLOs, but with administrator approval,

significant input is appropriate to better fit the needs of the educators’ particular classes.

5. Relevant performance data on the students for whom objectives will be set, as well as data

from the same course in prior years, shall be used to assist in establishing meaningful

objectives. Other student information and longitudinal information shall also be used if

available.

6. The objectives for each course should be established within six weeks of the start of the

course.

Assessments/Measures

Even with rigorous and appropriate learning goals, SLOs may be meaningless without high

quality measures to evaluate the degree to which students achieved these learning goals. In fact,

the quality of the measures may be the Achilles’ heel of the entire SLO process, because outside

of a few core content areas, the quality of the available measures is quite variable at best.

However, rather than using concerns about potential measures as a reason to abandon the SLO


process, we should use the SLO approach as motivation to upgrade the quality of measures and

assessments available for teachers to be able to document student learning.

Educators should rely on the best measures available to evaluate the specific SLOs. The use of

the measures should be driven by the fit between the particular learning targets and the

assessments used to evaluate the SLOs. The highest quality assessments should be used to

evaluate the SLOs, but these assessments should be the ones that best match the specific learning

targets. It will be a challenge in the early years to find high quality assessments to evaluate the

SLOs, but this should be seen as an opportunity to improve the quality of local assessments.

This is one of the main reasons why it makes sense to focus first on status-based SLOs. It will

be hard enough to develop or select at least one high quality assessment to evaluate SLOs

without the challenge of needing to find both a high quality pretest and posttest (again, see

Marion, et al., 2012). The following recommendations are intended to help guide the assessment

component of the SLO process.

1. State standards-based assessments shall be used to evaluate the teachers’ contributions to

student performance in the subjects and grades where such assessments are available.

This recommendation allows for local districts to decide to use an SLO process to

contextualize the student assessment results or the district can choose to use a more

conventional test-based approach. Supplementing the use of student growth percentiles

(SGPs) in tested subjects and grades with a small set of SLOs can provide another set of

measures to broaden the assessment information for each educator.

2. When state assessments are not available, which is the case for all non-tested subjects and

grades, schools and districts will have to choose another method for measuring the SLOs.

Common benchmark tests created by the district or other entities shall be used to evaluate

SLOs to the extent that the assessment provides a valid measure of the learning

objectives. Determining what constitutes a valid measure of the learning objectives is not

an easy task and there will be other resources available, such as quality criteria for

assessments, to help districts evaluate the technical quality of various assessments.

3. WDE and a consortium of districts shall be encouraged to facilitate the development of

resources/tools (e.g., common rubrics, common assessments) as examples to aid in the

assessment of SLOs in non-tested subjects and grades. It makes little sense for every

district to tackle this challenge on its own, so this recommendation is intended to

encourage cross-district collaboration to build higher quality assessments for SLOs than

would be possible if each district were working on its own. Because we are concerned

about the cost, both in terms of time and money, of creating new common assessments

for courses and grades where there are currently no state-supported assessments, criteria

for quality student assessments will be established, frameworks and examples will be

provided, and local districts and schools will be provided professional development on


creating quality assessments. This is an important aspect of building professional human

capacity.

4. Educator performance on the SLOs should generally be scored using at least three or four

categories of performance (e.g., exceeded SLO, met SLO, partially met, and did not meet

SLO).
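The category-based scoring described in the last recommendation could look something like the following minimal Python sketch. It assumes, purely for illustration, that an SLO is scored by the share of students who reached their target; the cut points (0.50, 0.75, 0.90) and the function name are hypothetical and would be set locally.

```python
def slo_category(
    share_meeting_target: float,
    partial_cut: float = 0.50,
    met_cut: float = 0.75,
    exceeded_cut: float = 0.90,
) -> str:
    """Map the share of students meeting their target to one of four categories."""
    if not 0.0 <= share_meeting_target <= 1.0:
        raise ValueError("share_meeting_target must be between 0 and 1")
    if share_meeting_target >= exceeded_cut:
        return "exceeded SLO"
    if share_meeting_target >= met_cut:
        return "met SLO"
    if share_meeting_target >= partial_cut:
        return "partially met"
    return "did not meet SLO"


print(slo_category(0.8))
```

Using several ordered categories rather than a simple met/not met decision preserves more of the information in the underlying results.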

Oversight and Support

Designing and implementing an SLO process assumes that teachers and leaders have the

knowledge and skills to establish appropriate objectives, locate or develop assessments suitable

for measuring student learning relative to these targets, and evaluate educator performance

according to how well the students performed. Educators will need professional development to

gain the knowledge and skills necessary to sustain wide-scale implementation of the SLO

process. Further, some level of monitoring and oversight at the state level is necessary to

promote comparability in SLO processes and outcomes. Comparability of SLOs and SLO

outcomes is a major concern of the Wyoming Advisory Committee. As such, the

recommendations discussed below are intended to help ensure comparability of goals and

objectives starting from the classroom (i.e., multiple SLOs within the same classroom and across

classrooms should be comparable) to the school, district, and state. The recommendations that

follow are intended to address the support necessary to successfully implement an SLO approach

for documenting educator contributions to student learning as well as to provide guidance around

the type of monitoring and support the Advisory Committee recommends for the state and

districts.

1. WDE, based on recommendations from the Advisory Committee, shall create clear

guidance for creating a local SLO process that includes the items described in this

document. This guidance shall describe criteria for developing and evaluating high

quality SLOs and should provide examples of both high quality and weaker (for contrast)

SLOs.

2. A State SLO Advisory Review Committee shall be established to review and support the

SLO process including evaluating the quality and rigor of objectives, assessment

measures, and performance expectations (what counts as “good enough”). This SLO

Advisory Review Committee will be designed to ameliorate differences in SLOs across

districts due, in part, to differences in district capacity. At a minimum, districts shall

conduct such processes across schools within their districts.

3. WDE along with contributing schools and districts shall develop a resource bank of

exemplar SLOs and potential assessment instruments.


4. Each district, with WDE support, shall design a structure and process for providing

professional development on the development of an SLO process for its educators and

administrators. This shall include training for educational leaders on how to work with

their teachers in establishing meaningful and rigorous learning objectives, how to

establish and support peer teams, and how to determine what types of assessments are

suitable for evaluating SLOs. The support for educators shall include training on using

data to establish learning objectives, determining the appropriateness and

meaningfulness of targets, monitoring student progress toward the targets, and using

assessments to evaluate the degree to which students met the targets.

5. As part of the pilot of the educator evaluation system, special attention should be devoted

to the ways that student growth measures work within the systems. The results of the

pilot process shall be reported and used to inform subsequent modifications to the SLO

process and the weighting of student growth in the Wyoming evaluation system.


APPENDIX B:

Considerations When Calculating Student Performance Results in “Tested” Subjects and

Grades

Incorporating the results of student achievement tests requires the Advisory Committee to

consider and make recommendations about several important issues. The following pages lay

out many of these considerations to provide background information for decisions to be made by

the Advisory Committee.

Tests Included

It is assumed that the grade/subject tests included in the Wyoming Framework will be the same

as those included in the school accountability system. Creating as much overlap as possible

among the set of included tests is a desirable feature of coherence. The proposed school

accountability system to meet the requirements of WEA 65 includes academic growth based on

state assessment results (PAWS currently) in grades 4-8 in reading and mathematics. Therefore,

these grades and content areas should serve as the basis for inclusion of SGP in the Wyoming

Framework as well.

Obviously, it is not desirable to exclude high schools from SGP calculations. The State Board

and the legislature are currently considering a plan for implementing end-of-course (EOC) tests,

which may open up new options for calculating SGPs at the high school. However, calculating

growth at the high school level is extremely complex, particularly if, as expected, there is

variability in course sequence. Therefore, until we know much more about the developing high

school assessment system, the focus should be on grades 4-8 in reading and mathematics.

Teacher/Leader of Record

Another important consideration in operationalizing growth in Wyoming’s Educator Evaluation

Framework is determining which teacher/leader should be held accountable for a student’s

performance (leaving aside for the moment the discussion of shared attribution). A suitable

definition - and an accompanying data system that permits operationalization of this definition -

should establish the conditions and circumstances governing the connection of educators with

classes and account for the variety of learning environments in Wyoming’s schools. For

example, the Data Quality Campaign (DQC) (2010) advises states seeking to use assessment data

to inform educator evaluation to:

Account for contributions of multiple educators in a single course

Enable teachers to review rosters for accuracy


Account for schedule changes and variable class environments such as virtual classes or

labs

Link attendance records with teachers to track actual days of instruction

Based on the framework for defining teacher of record offered by DQC (2010b), the following

questions are important to address in order to arrive at an operational definition of the included

teacher/leader of record. Sample responses, intended only as ‘placeholders’ at this time, are

provided. It is recommended that the Advisory Committee carefully consider each.

What educators and leaders will be included?

o The primary educator who provides instruction contributing to and culminating in

the statewide PAWS test in reading or mathematics

o Elementary and middle school principals

o Other building level leaders/administrators whose role is primarily associated

with instruction

How much instructional time is required to establish a link?

o Teacher has primary responsibility for instruction in the class of record

o Minimum of 90 days of instruction (approximately half of the full academic year)

for the class of record

What prior measures will be required?

o At least one prior year summative state test score in the same content area

Will any courses/ schools be specifically excluded and why?

Will any teachers/ leaders be specifically excluded and why?

What is the minimum n size?

o Class and school growth estimates reported for groups of 20 or more students, but

multiple years of data can be aggregated to reach 20 students.

What is the inclusion rule?

o Class scores are not reported if contributing students represent fewer than 25% of

class size.

o School scores are not reported if contributing students represent fewer than 25%

of school size.

What students will be included?

o Students in grades 4-8 continuously enrolled for the full academic year in the

current year participating in the state PAWS in reading or math.

o All prior test scores in PAWS reading or math regardless of term of enrollment.
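
The placeholder responses above can be read as a set of business rules. The following is a minimal sketch, not a specification: the thresholds (90 days, 20 students, 25%) are the placeholder values listed above, while the record layout, field names, and function names are hypothetical. Treating the 90-day requirement as a per-student link to the teacher of record is one possible interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder thresholds taken from the sample responses; all are subject to change.
MIN_INSTRUCTION_DAYS = 90    # roughly half of the academic year
MIN_N = 20                   # minimum number of contributing students
MIN_INCLUSION_RATE = 0.25    # contributing students as a share of class size


@dataclass
class StudentRecord:
    """Hypothetical record linking one student to one class of record."""
    grade: int
    days_linked: int                 # instructional days with the teacher of record
    full_year_enrolled: bool         # continuously enrolled for the full academic year
    current_score: Optional[float]   # current-year PAWS score, None if missing
    prior_score: Optional[float]     # prior-year PAWS score in the same content area


def contributes(s: StudentRecord) -> bool:
    """Return True if the student meets every placeholder inclusion condition."""
    return (
        4 <= s.grade <= 8
        and s.full_year_enrolled
        and s.days_linked >= MIN_INSTRUCTION_DAYS
        and s.current_score is not None
        and s.prior_score is not None
    )


def class_is_reportable(roster: List[StudentRecord]) -> bool:
    """Apply the minimum n size and the inclusion rule to a class roster.

    The roster may pool more than one year of students, since the sample
    response allows multiple years to be aggregated to reach 20.
    """
    if not roster:
        return False
    n = sum(contributes(s) for s in roster)
    return n >= MIN_N and n / len(roster) >= MIN_INCLUSION_RATE
```

Writing the rules this way makes each placeholder a single named constant, so the Advisory Committee's eventual decisions could be changed in one place.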

Missing/ Incomplete Data

Another ‘data issue’ to address is missing and/or incomplete data. This situation exists when any

of the following occur:


One or more prior (pre) test scores are missing

The current year (post) test score is missing

The student is not continuously enrolled in a single building/class throughout the term of

instruction

The student record is missing or incomplete (e.g. test scores but no identifier)

Missing data can impact the precision and stability of the model and introduce systematic bias in

the resulting estimates (Braun et al., 2010). Moreover, it is generally acknowledged that data are

not Missing At Random (MAR), meaning that it is likely that the performance of students with

missing or incomplete data differs systematically from that of students with complete records. Consider,

for example, that mobility rates are typically higher for economically disadvantaged students

compared to other students.

There is no single or best approach to dealing with missing data. It is recommended that

Wyoming take these near-term steps moving forward.

Identify business rules to clearly define what data are usable and which are not.

Investigate the extent to which data are missing for districts, schools, and classes. Seek to

understand patterns of missing data for various levels of performance and by subgroup.

Such analyses will help determine the extent to which data are MAR or differ in a

systematic manner.
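
As an illustration of the second step, the sketch below tabulates the share of missing scores by group. It assumes a simple list of (group label, score) pairs; the function name and data layout are hypothetical and do not describe the state data system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def missing_rate_by_group(
    records: Iterable[Tuple[str, Optional[float]]]
) -> Dict[str, float]:
    """Return the proportion of missing scores (None) for each group label."""
    totals: Dict[str, int] = defaultdict(int)
    missing: Dict[str, int] = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Made-up example data: group label and current-year score (None = missing).
example = [
    ("economically disadvantaged", 610.0),
    ("economically disadvantaged", None),
    ("other", 655.0),
    ("other", 640.0),
]
rates = missing_rate_by_group(example)
```

Large differences in these rates across subgroups, schools, or performance levels would suggest that the missing data differ in a systematic manner.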

Multiple Educators and Shared Attribution

Another issue to consider is how to handle circumstances where students receive instruction

from multiple educators. This may be regarded as a special case of the teacher/ leader of record

issue, but merits specific attention.

There are three general cases that lead to this occurrence. First, the student may receive planned,

ongoing instruction from multiple teachers, as with a team teaching approach or scheduled

support sessions. Second, changes can occur throughout the year, such as a leave of absence for

the primary instructor or a student’s transition to another class. Finally, additional instruction

can occur in a variety of contexts, such as when a student receives tutoring outside of class.

Whatever the case, multiple sources of instruction will likely have an impact on student

achievement.

Some researchers have hypothesized that a ‘dosage’ model may be appropriate in such

circumstances. That is, if Ms. Smith provides 70% of instruction and Mr. Jones provides 30% of

instruction, then the outcomes are assigned to the educators consistent with the proportion of


instruction provided. While it may be useful to research the feasibility of this approach, the

following caveats should be considered:

It is unlikely that proportional contribution to instruction can be captured with precision,

particularly when it is unscheduled. Also, it will be necessary to create potentially

complex connections in the state data system to account for this.

The proportional contribution to instruction may not be governed by time alone. For

example, an hour spent introducing new concepts to a class may not represent the same

‘instructional contribution’ as an hour spent overseeing time allotted for student directed

study.

The research on attributing a student’s academic performance to teachers and leaders is

emerging – even for the least ambiguous circumstances when the teacher of record is well

defined. Much less is known about the credibility of results based on proportional

attribution of scores.

Therefore, we strongly recommend using the shared attribution framework discussed in the main

part of this document and basing decisions about which results are shared by which teachers on an

explicit theory of action or improvement for the school.

Performance Standards

Coherence

In order to maximize the coherence between the school and educator evaluation systems, it is

desirable for performance expectations for growth at the class level to be similar by design to

growth targets at the school level. By so doing, the likelihood that outcomes will be favorable

for schools but not educators at that school (or vice versa) will be minimized. Additionally, it is

critical to ensure that the system does not create incentives that are in conflict.

More specifically, it is expected that the growth outcome for classes will be the median student

growth percentile (MGP) and that standards for meeting and exceeding targets will be coherent

with those established for the school. At the time of writing this document, the growth targets

for schools have not been finalized, but draft plans call for three categories of performance—

high, typical, and low—at the school level in grades 4-8 based on PAWS.

Before moving forward with growth standards for the educator accountability system, there are

at least three critical considerations that should be addressed by the Advisory Committee. The

first is to determine the number and type of growth levels that need to be produced to support the

intended purposes and uses of the system. The second consideration is to explore the extent to

which the proposed growth rates are both attainable and meaningful at the class level. Based on

the documentation provided to date, it appears that the school targets were selected normatively.


That is, performance cutscores were selected based on the percentages of schools that would end

up in each of the three categories. However, it is less clear whether the growth rates in the proposed

meets and exceeds ranges are sufficient to establish meaningful growth, that is, growth that keeps

students on track to achieve or maintain proficiency or readiness.

Finally, it is important to deal with the inherent unreliability of class level outcomes. Given that

class level results will be much more variable and subject to sampling error than school level

results, mechanisms must be put in place to deal with the lack of stability of outcomes in order to

have a greater degree of confidence in the results. The following sections address these

issues.
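
The difference in stability between class-level and school-level results can be illustrated with a small simulation. This is a sketch under strong assumptions that are not part of the Framework: student growth percentiles are drawn independently and uniformly from 1 to 99, a class has 20 students, and a school has 400. It shows sampling error only and says nothing about true differences among educators.

```python
import random
import statistics


def spread_of_medians(group_size: int, trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the median SGP across many simulated groups."""
    rng = random.Random(seed)
    medians = []
    for _ in range(trials):
        # Assumption: each student's SGP is an independent uniform draw from 1..99.
        sgps = [rng.randint(1, 99) for _ in range(group_size)]
        medians.append(statistics.median(sgps))
    return statistics.pstdev(medians)


class_spread = spread_of_medians(20)    # hypothetical class of 20 students
school_spread = spread_of_medians(400)  # hypothetical school of 400 students
```

Under these assumptions, the medians for groups of 20 vary far more from sample to sample than the medians for groups of 400, which is why class-level results call for additional safeguards.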

Reporting Outcomes

It is essential to determine the number and type of growth outcomes necessary to support the

purposes and uses of the educator evaluation system. In general, there is a tension between

reporting high-level results that are more reliable and the desire to report more nuanced but less

precise outcomes for multiple indicators. For example, there will be a much higher level of

confidence in classifications of class effects as low, typical, or high compared to class effects

described on a ten-point scale from 1 (ineffective) to 10 (highly effective). In the latter case,

stakeholders may regard this information as useful to understand more fine grained degrees of

difference, but such a scale may carry only the appearance of precision that is not supported by

evidence, particularly for adjacent ratings. The same issue is generally true for reporting units.

That is, results for individual content areas or classes will be much less defensible (and results

based on strands or subscores will be almost certainly indefensible) than aggregate results for

multiple classes. The goal, of course, is to find the balance between the necessary specificity of

outcomes and an acceptable level of precision. As a matter of best practice, it is advisable to

privilege technical defensibility, in order to provide the best case for results to be meaningfully

interpreted and utilized.

Norm and Criterion Referenced Growth

Broadly, approaches to identifying growth standards can be characterized as either norm-

referenced or criterion-referenced. A norm-referenced approach compares student achievement

to an expectation often based on a distribution of observed performance. Alternatively,

criterion-referenced growth standards establish a specific target outcome. For example,

requiring students who are not proficient to grow at a rate such that they achieve proficiency in a

set amount of time is a criterion referenced approach.

Each approach has advantages and limitations. Setting a norm-referenced expectation is useful

for identifying comparably high or low growth. Indeed, it seems intuitively reasonable to


describe valued growth as that which is significantly higher than that of other students. However, a

limitation is that some students who grow at very high rates relative to their peers may not

achieve proficiency in a reasonable amount of time. A criterion-referenced standard resolves this

potential ‘growth to nowhere’ problem, but raises a new issue: some students may be so far

below standard that even at exceptionally high rates of growth the student will not achieve

proficiency in a reasonable time frame. Particularly when growth is used for accountability

purposes, this can create a condition where some classes are uniformly disadvantaged.

Conversely, very high performing classes could exhibit little or no growth and meet standard.

While the Advisory Committee recommended blending both normative and criterion approaches

for evaluating growth for school accountability purposes, standards for growth in educator

evaluation systems should only be normative. This is due to the fact that students, rightfully so

in many cases, are not randomly assigned to teachers. Requiring teachers to equally advance

students toward meaningful outcomes (e.g., proficient) does not take into account that this is

much more challenging for students far below proficient than for students closer to the proficient

cut. However, expecting all teachers to have their students grow at meaningful rates compared

to each student’s academic peers in a normative sense is fairer to all educators in the system.

Reliability

Reliability refers to the consistency or stability of a measure. In this case, we are interested in

the reliability of the measures of teacher/leader effectiveness based on a system influenced by

growth estimates. Reliability is challenging in this context due to the error in achievement

measures and growth measures and the likely variation in the performance of teachers – about

which little is known. We know little, except anecdotally, about the extent to which

performance differs across content areas for the same teacher. For example, would we expect a

teacher to be effective in ELA but not math? If so, to what extent would the levels of

effectiveness differ? Further, how stable is teaching effectiveness across years? Could a teacher

be effective one year but not the next and if so, to what would we attribute this variability?

Ultimately, it is challenging to disentangle measurement error from true variation in

performance. In the end, an educator evaluation system is built on the assumption that

performance is “stable-enough” to reliably detect some differences in true effectiveness.

One way to mitigate issues of unreliability is to base overall outcomes on aggregations of results

within content areas for the current year and across multiple years. For example, if a teacher

teaches three sections of the same mathematics class, the median growth informing the

performance category is based on all students across sections. Additionally, if that teacher has

results for the prior year, the teacher’s outcome for the current year could be based on the median

of the two years combined. The idea behind this approach is to minimize uncertainty. The

reliability of overall outcomes will also be improved by the manner in which additional elements


aside from academic growth are incorporated into the system (e.g. professional practices), but

that will be addressed separately.
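
The pooling described above can be sketched as follows. This is one plausible reading, in which student growth percentiles from all sections and years are combined before the median is taken; the function name and the example values are hypothetical.

```python
import statistics
from typing import Iterable, List


def pooled_mgp(*groups: Iterable[float]) -> float:
    """Median growth percentile over all students pooled across sections or years."""
    pooled: List[float] = [sgp for group in groups for sgp in group]
    if not pooled:
        raise ValueError("at least one student growth percentile is required")
    return statistics.median(pooled)


# Made-up SGPs for three sections of the same mathematics class.
section_a = [40, 55]
section_b = [60]
section_c = [35, 70]
current_year_mgp = pooled_mgp(section_a, section_b, section_c)

# Adding a prior year simply enlarges the pool.
prior_year = [45, 50, 65]
two_year_mgp = pooled_mgp(section_a, section_b, section_c, prior_year)
```

Pooling students before taking the median weights each student equally, which differs from taking the median of the section or year medians.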

Shared and Individual Attribution of Student Performance Results

The Advisory Committee recognizes the challenges of properly attributing the results of student

performance to individual teachers. Therefore, the Framework relies on a mix of shared

attribution and individual attribution of student performance results. The SGP results, based on

PAWS tests in grades 3-8 should, depending on the specific theory of improvement for the

particular school, be shared among educators at the same grade and/or teaching the same subject

areas. SLO results, assuming groups of educators are working on the same SLO, may also be

shared among educators at the same grade and/or content area. However, SLOs allow for more

control than state test results and the Framework requires that at least some portion of the SLOs

used to document student performance be attributed to the individual educator of record.

References

Data Quality Campaign. (2010a). Strengthening the teacher-student data link to inform teacher

quality efforts. Retrieved from: www.DataQualityCampaign.org/resources/947.

Data Quality Campaign. (2010b). Developing a definition of teacher of record. Retrieved from:

http://dataqualitycampaign.org/files/Teacher%20of%20Record.pdf.

National Research Council. (2010). Getting value out of value-added. H. Braun, N. Chudowsky,

and J. Koenig (eds.). Washington, DC: National Academy Press.
