
Prepared for the Bill and Melinda Gates Foundation

Improving Teaching Effectiveness

BRIAN M. STECHER, DEBORAH J. HOLTZMAN, MICHAEL S. GARET, LAURA S. HAMILTON, JOHN ENGBERG, ELIZABETH D. STEINER, ABBY ROBYN, MATTHEW D. BAIRD, ITALO A. GUTIERREZ, EVAN D. PEET, ILIANA BRODZIAK DE LOS REYES, KAITLIN FRONBERG, GABRIEL WEINBERGER, GERALD PAUL HUNTER, JAY CHAMBERS

Final Report Appendixes

The INTENSIVE PARTNERSHIPS for EFFECTIVE TEACHING Through 2015–2016

RAND CORPORATION

This work is licensed under a Creative Commons Attribution 4.0 International License. All users of the publication are permitted to copy and redistribute the material in any medium or format and transform and build upon the material, including for any purpose (including commercial) without further permission or fees being required. For additional information, please visit http://creativecommons.org/licenses/by/4.0/.

The RAND Corporation is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest.

RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors.

Support RAND: Make a tax-deductible charitable contribution at

www.rand.org/giving/contribute

www.rand.org

For more information on this publication, visit www.rand.org/t/RR2242

Published by the RAND Corporation, Santa Monica, Calif.

© Copyright 2018 RAND Corporation

R® is a registered trademark.

Contents

Figures .......... vii
Tables .......... ix
Appendix A. Survey, Interview, and Archival Academic Data Collection and Analysis .......... 1
  Survey Methods .......... 1
    Survey Content and Constructs .......... 1
    Survey Sampling .......... 3
    Survey Administration .......... 4
    Survey Data Analysis .......... 7
  Interview Methods .......... 10
    Interview Data Collection .......... 10
    Interview Analysis .......... 13
  Archival Academic Data Methods .......... 13
    Data Acquisition .......... 13
    Estimation of Teacher Value Added .......... 14
Appendix B. Site TE Measures: Supplementary Material for Chapter Three .......... 17
  Districts .......... 17
    HCPS .......... 17
    PPS .......... 19
    SCS .......... 23
  CMOs: Common Elements of the TE Measures .......... 26
    Composite Measure .......... 26
    Classroom Practice Measure .......... 27
    Student Achievement Measure .......... 28
    Student Feedback Measure .......... 28
    Family Feedback Measure .......... 28
    Peer Feedback Measure .......... 28
  CMO-Specific Aspects of the TE Measures .......... 29
    Alliance .......... 29
    Aspire .......... 30
    Green Dot .......... 32
    PUC .......... 35
Appendix C. Additional Exhibits for Chapter Three .......... 39
Appendix D. Site Recruitment, Hiring, Placement, and Transfer Policies: Supplementary Material for Chapter Four .......... 45
  District Recruitment, Hiring, Placement, and Transfer Policies .......... 45
    HCPS .......... 45
    PPS .......... 47
    SCS .......... 49
  CMO Recruitment, Hiring, Placement, and Transfer Policies .......... 52
    Alliance .......... 53
    Aspire .......... 54
    Green Dot .......... 56
    PUC .......... 58
Appendix E. Site Tenure and Dismissal Policies: Supplementary Material for Chapter Five .......... 59
  District Tenure and Dismissal Policies .......... 59
    HCPS .......... 59
    PPS .......... 59
    SCS .......... 60
  CMO Tenure and Dismissal Policies .......... 61
Appendix F. Site PD Policies: Supplementary Material for Chapter Six .......... 63
  District PD Policies .......... 63
    HCPS .......... 63
    PPS .......... 64
    SCS .......... 66
  CMO PD Policies .......... 68
    Alliance .......... 69
    Aspire .......... 71
    Green Dot .......... 72
    PUC .......... 73
Appendix G. Additional Exhibits for Chapter Six .......... 75
Appendix H. Site Compensation Policies: Supplementary Material for Chapter Seven .......... 79
  District Compensation Policies .......... 79
    HCPS .......... 79
    PPS .......... 80
    SCS .......... 82
  CMO Compensation Policies .......... 83
    Supplementary Effectiveness-Based Payments .......... 83
    Effectiveness-Based Salary Schedule .......... 84
Appendix I. Analyzing the Relationships Between Teacher Compensation, Assignment to LIM Populations, and TE: Analytic Methods for Chapter Seven .......... 87
Appendix J. Site CL Policies: Supplementary Material for Chapter Eight .......... 89
  District CL Policies .......... 89
    HCPS .......... 89
    PPS .......... 89
    SCS .......... 92
  CMO CL Policies .......... 93
    Alliance .......... 93
    Aspire .......... 93
    Green Dot .......... 94
    PUC .......... 96
Appendix K. Additional Exhibits for Chapter Eight .......... 99
Appendix L. Resources Invested in the IP Initiative: Analytic Methods for Chapter Nine .......... 103
  Site Expenditure Data and Analysis .......... 103
    Data Sources .......... 103
    Data Analysis .......... 105
  Time Allocation Data and Analysis .......... 106
    Description of the Survey Section .......... 106
    Data Cleaning and Processing .......... 107
    Requirements for Inclusion in Analysis .......... 108
    Analytic Samples .......... 108
  Estimation of the Value of Teacher and SL Time Spent on Evaluation Activities .......... 110
    Data .......... 110
    Data Analysis .......... 110
Appendix M. Additional Exhibits for Chapter Nine .......... 113
Appendix N. Additional Exhibits for Chapter Ten .......... 119
  HCPS .......... 119
  PPS .......... 120
  SCS .......... 121
  Alliance .......... 123
  Aspire .......... 124
  Green Dot .......... 125
Appendix O. Estimating the Relationship Between TE and Retention: Analytic Methods for Chapter Eleven .......... 127
  Modeling Teacher Retention as a Function of Effectiveness .......... 127
Appendix P. Additional Exhibits for Chapter Eleven .......... 131
  Annual Trends in Retention Rates .......... 131
    HCPS .......... 131
    PPS .......... 132
    SCS .......... 134
    Alliance .......... 135
    Aspire .......... 136
    Green Dot .......... 137
  Sensitivity Check: Teacher Retention After Two Consecutive Years .......... 137
    HCPS .......... 138
    PPS .......... 139
    SCS .......... 140
Appendix Q. Additional Exhibits for Chapter Twelve .......... 143
Appendix R. The Initiative’s Effects on TE and LIM Students’ Access to Effective Teaching: Analytic Methods for Chapter Twelve .......... 145
  Relationship Between Percentage of Students Who Are LIM Students and Teacher Value Added .......... 145
  Change in Access Coefficient: Interrupted Time-Series Methodology .......... 147
  Analysis of Mechanisms Used to Change Access .......... 148
Appendix S. Additional Exhibits for Chapter Thirteen .......... 151
Appendix T. Estimating the Initiative’s Impact on Student Outcomes: Data and Analytic Methods for Chapter Thirteen .......... 153
  Data and Outcomes .......... 153
  School-Level Difference-in-Differences Methodology .......... 156
  Estimation Models .......... 159
Appendix U. Additional Impact Estimates for Chapter Thirteen .......... 163

Figures

Figure C.1. Teachers Reporting That Evaluation Components Were Valid Measures of Their Effectiveness to a Large or Moderate Extent, Springs 2013–2016 .......... 39
Figure C.2. Teachers’ Agreement with Statements About Observations, Springs 2013–2016 .......... 40
Figure C.3. Teachers’ Agreement with Statements About the Use of Student Achievement in Teachers’ Evaluations, Springs 2013–2016 .......... 41
Figure C.4. Teachers’ Agreement with Statements About the Use of Student Feedback in Teachers’ Evaluations, Springs 2013–2016 .......... 41
Figure C.5. Teachers’ Agreement with Statements About Evaluation, Springs 2013–2016 .......... 42
Figure C.6. Teachers’ Agreement with Statements About the Usefulness of Feedback from Evaluation Components, Springs 2013–2016 .......... 43
Figure G.1. Teachers’ Responses About Uses of Evaluation Results, Springs 2013–2016 .......... 75
Figure G.2. Teachers’ Responses to the Survey Question, “To What Extent Did Each of the Following Influence What Professional Development You Participated in This Year?” Springs 2011–2016 .......... 76
Figure G.3. Teachers’ Agreement That Their PD During the Past Year Was Aligned with Various Sources, Springs 2013–2016 .......... 76
Figure G.4. Teachers’ Agreement with Statements About Support for PD, Springs 2011–2016 .......... 77
Figure G.5. Percentage of Teachers Reporting Enhanced Skills and Knowledge, in Various Areas, Due to PD, Springs 2011–2016 .......... 78
Figure G.6. Teachers’ Perceptions of the Usefulness of Various Forms of PD, Springs 2013–2016 .......... 78
Figure K.1. SLs Reporting That Their Site Had or Was Phasing in a CL or Specialized Instructional Positions, Springs 2013–2016 .......... 100
Figure K.2. SLs Reporting That There Were Teachers at Their School Who Held Higher-Level CL or Specialized Instructional Positions, Springs 2013–2016 .......... 101
Figure K.3. Teachers’ Agreement with Statements About CLs, Selected Sites and Years .......... 101
Figure N.1. HCPS Middle-Experience Effectiveness, by VAM Score and Composite TE Level .......... 119
Figure N.2. HCPS High-Experience Effectiveness, by VAM Score and Composite TE Level .......... 120
Figure N.3. PPS Middle-Experience Effectiveness, by VAM Score and Composite TE Level .......... 121
Figure N.4. PPS High-Experience Effectiveness, by VAM Score and Composite TE Level .......... 121
Figure N.5. SCS Middle-Experience Effectiveness, by VAM Score and Composite TE Level .......... 122
Figure N.6. SCS High-Experience Effectiveness, by VAM Score and Composite TE Level .......... 122
Figure N.7. Alliance Middle-Experience Effectiveness, by Composite TE Level .......... 123
Figure N.8. Alliance High-Experience Effectiveness, by Composite TE Level .......... 123
Figure N.9. Aspire Middle-Experience Effectiveness, by VAM Score and Composite TE Level .......... 124
Figure N.10. Aspire High-Experience Effectiveness, by VAM Score and Composite TE Level .......... 125
Figure N.11. Green Dot Middle-Experience Effectiveness, by Composite TE Level .......... 125
Figure N.12. Green Dot High-Experience Effectiveness, by Composite TE Level .......... 126
Figure P.1. Adjusted Percentage of Teachers Remaining in HCPS, by Year, Composite TE Level, and VAM Score .......... 132
Figure P.2. Adjusted Percentage of Teachers Remaining in PPS, by Year, Composite TE Level, and VAM Score .......... 133
Figure P.3. Adjusted Percentage of Teachers Remaining in SCS, by Year, Composite TE Level, and VAM Score .......... 134
Figure P.4. Adjusted Percentage of Teachers Remaining in Alliance, by Year and Composite TE Level .......... 135
Figure P.5. Adjusted Percentage of Teachers Remaining in Aspire, by Year, Composite TE Level, and VAM Score .......... 136
Figure P.6. Adjusted Percentage of Teachers Remaining in Green Dot from One Year to the Next, by Composite TE Level .......... 137
Figure P.7. Adjusted Percentage of Teachers Remaining in HCPS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level .......... 139
Figure P.8. Adjusted Percentage of Teachers Remaining in PPS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level .......... 140
Figure P.9. Adjusted Percentage of Teachers Remaining in SCS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level .......... 141
Figure Q.1. SLs’ Agreement with Statements About Teacher Assignments, Springs 2014–2016 .......... 143
Figure S.1. SLs’ Perceptions of “How Many Teachers in Your School” Possessed Various Skills, Springs 2013–2016 .......... 151
Figure T.1. Graphical Depiction of Methodology for Computing Forecasts of Postinitiative Trends .......... 158

Tables

Table A.1. Numbers of Schools Surveyed .......... 3
Table A.2. Numbers of Teachers and SLs Surveyed .......... 4
Table A.3. District Teacher Response Rates, Surveys Completed, and Teachers Sampled .......... 5
Table A.4. CMO Teacher Response Rates, Surveys Completed, and Teachers Sampled .......... 5
Table A.5. District SL Response Rates, Surveys Completed, and Leaders Sampled .......... 6
Table A.6. CMO SL Response Rates, Surveys Completed, and Leaders Sampled .......... 6
Table A.7. Collapsing of Site TE Categories for Survey Item Disaggregations, by TE Rating .......... 9
Table A.8. Number of Central-Office Administrators and Stakeholders Interviewed Each Fall .......... 11
Table A.9. Number of School-Level Staff Interviewed .......... 12
Table D.1. Participants in the Aspire Residency Program .......... 55
Table K.1. Teacher Survey Questions About Awareness of CLs and Specialized Positions .......... 99
Table L.1. IP Sites’ Financial Reports .......... 103
Table L.2. Strategies, by Site .......... 105
Table L.3. Detailed Description of SL and Teacher Survey Sample Exclusions for the Time Allocation Analysis .......... 109
Table L.4. Final Sample Sizes, by Site .......... 109
Table L.5. Value of Teacher Time Spent on Evaluation Activities .......... 111
Table L.6. Value of SL Time Spent on Evaluation Activities .......... 111
Table M.1. Teacher Time Allocation Mean Percentages, by Site .......... 113
Table M.2. SL Time Allocation Mean Percentages, by Site .......... 114
Table M.3. Principal and AP Time Allocation Mean Percentages, by Site .......... 116
Table O.1. Estimated Teacher-Retention Percentages, by TE Level, Period, and Site, for All Teachers with Composite TE Levels .......... 128
Table O.2. Estimated Teacher-Retention Percentages, by Level of Value Added, Period, and Site, for All Teachers with VAM Scores .......... 129
Table O.3. Estimated Teacher-Retention Percentages, by TE Level, Period, and Site, for All Teachers with Both Composite TE Levels and VAM Scores .......... 130
Table O.4. Estimated Teacher-Retention Percentages, by Level of Value Added, Period, and Site, for All Teachers with Both Composite TE Levels and VAM Scores .......... 130
Table Q.1. Average and Standard Deviations of Teacher Value Added .......... 143
Table T.1. Summary of Data Elements .......... 155
Table T.2. Average Demographics in the IP Sites and in the Rest of Their States, as Proportions .......... 157
Table U.1. HCPS Impact Estimates, by Grade, Subgroup, and Year .......... 164
Table U.2. PPS Impact Estimates, by Grade, Subgroup, and Year .......... 169
Table U.3. SCS Impact Estimates, by Grade, Subgroup, and Year .......... 173
Table U.4. CMOs’ Combined Impact Estimates, by Grade, Subgroup, and Year .......... 175
Table U.5. Aspire Impact Estimates, by Grade, Subgroup, and Year .......... 180
Table U.6. Green Dot Impact Estimates, by Grade, Subgroup, and Year .......... 184


Appendix A. Survey, Interview, and Archival Academic Data Collection and Analysis

Many parts of this evaluation relied on three types of data collected from the initiative sites: (1) surveys of teachers and SLs designed and administered by the evaluation team; (2) interviews conducted by the evaluation team with central-office administrators and with SLs and teachers in a small sample of schools; and (3) archival academic data related to students and teachers. This appendix describes the survey, interview, and archival data we acquired from the sites. We also describe the methods we used to analyze the data in those cases in which the analyses were common to many parts of the evaluation (and were reported in many chapters of this report). In those instances in which data or methods are pertinent to only individual chapters of the report, we present those methods in chapter-specific appendixes that follow.

Survey Methods

Throughout this report, we present results of surveys administered to teachers and SLs in the seven IP sites. This appendix provides details about the content and constructs, sampling, administration, and analysis of those surveys. The analysis section discusses, among other things, how we selected the survey items for which we present results in this report.

For the purposes of this report, teachers were surveyed five times: the springs of 2011, 2013, 2014, 2015, and 2016. SLs were surveyed six times: the springs of 2011, 2012, 2013, 2014, 2015, and 2016.1

Survey Content and Constructs

The teacher and SL surveys used in the IP evaluation were developed for the evaluation, although they were informed by a variety of existing surveys. The surveys asked about respondents’ experiences with and perceptions of a variety of initiative components (i.e., the levers), as well as other issues related to TE.

Topics on the teacher survey included PD, collaboration, the current teacher-evaluation system and its components (e.g., classroom observations, student achievement, student input), career paths and opportunities for advancement, compensation and other HR policies, and perceived influences on student learning. We asked SLs about similar topics, as well as staffing, teacher termination, and assignment of students and teachers to classes. We asked both groups a few background questions. In selected years, the teacher and SL surveys also included detailed questions about respondents’ time allocation; in Appendix L, we describe the use of these data.

1 The teacher and SL surveys are continuing (in all of the sites except Alliance) in 2017 and 2018, but this report contains results only through 2016.

In this report, we present teacher and SL perceptions, based largely on the surveys, of each IP lever along the following dimensions:

• awareness: Did teachers and SLs in each site know about and understand that site’s policies related to each lever? Although a policy could have its intended effects without teacher and SL awareness of it, awareness of a policy, particularly by those it directly affects, is generally a necessary precondition for successful implementation and effectiveness.

• endorsement: Did teachers and SLs in each site approve of that site’s policies related to each lever? Policies are more likely to be implemented and to be effective if the affected stakeholders—in this case, teachers and SLs—buy into and support the policies.

• fairness: Did teachers and SLs in each site think that that site’s policies related to each lever were fair? Again, policies are more likely to be implemented and to be effective if the affected stakeholders perceive them as being fair.

• perceived effects: What types of effects did teachers and SLs report that policies related to each lever had had? For instance, did teachers find the policies useful for improving their teaching, and did SLs think that the policies had helped improve the quality of teaching at their school? Although self-report of policy effectiveness is not a substitute for objective analysis, it is nevertheless instructive to gauge self-perceptions related to effectiveness because they can be a leading indicator of effectiveness measured by other means. In addition, stakeholders might have a broader definition of usefulness or effectiveness that goes beyond what can be easily measured (for example, by student test scores). And, as with endorsement and fairness, policies might be more likely to be implemented successfully and to be sustained over time if the implementers perceive them to be useful.

We designed the surveys with these constructs in mind, although not every lever had survey questions pertaining to all four constructs for both teachers and SLs.

We designed both the teacher survey and the SL survey to take 45 to 60 minutes to complete, except for the teacher survey administered in 2014 and 2016, which was a short version designed to take 20 to 30 minutes to complete. With that exception, the content of the surveys changed relatively little from year to year, although some modifications were made each year, including some items being dropped and others being added. (In rare cases, we revised the wording on individual items, but we tried to keep such changes to a minimum to ensure comparability over time of results on a given item.)


Survey Sampling

In each IP site, the survey sampling frame included all regular, public schools serving students in grades K through 12.2 Table A.1 presents the number of surveyed schools in each site in each year.

Table A.1. Numbers of Schools Surveyed

Year    HCPS   PPS   SCS   Alliance   Aspire   Green Dot   PUC
2011     239    62   191         18       30          16    12
2012a    228    60   188         20       34          18    13
2013     240    54   178         21       34          18    13
2014     240    54   186         20       37          16    13
2015     235    54   172         26       38          19    15
2016     236    54   163         27       38          21    15

a In 2012, we surveyed only SLs. In HCPS, some small alternative schools lacked SLs, so the 2012 number of schools is slightly smaller than that for the other years. Other year-to-year changes reflect growth or decline in the actual number of schools in each site.

We surveyed all SLs and a sample of teachers from every school within each site. We used a stratified random sampling design to select the teachers, taking into account the subject area taught and years of teaching experience;3 the number of teachers selected in each school varied by site and school level. SLs included principals, APs, and all other staff holding equivalent titles (e.g., director, instructional leader, dean). We did not follow teachers longitudinally over the years of the survey; we drew a new sample of teachers each year. Table A.2 shows the total number of teachers and SLs invited to participate in the survey during each administration.

2 We excluded charter schools in the three districts, based on an understanding (from district central-office staff) that charter schools were not part of the IP initiative. In 2014, we excluded schools in SCS that were with the district only temporarily (i.e., legacy SCS schools that were departing to municipalities following the 2013–2014 year).

3 Specifically, we stratified based on core and noncore subject areas, in order to ensure adequate representation from teachers of all types. We defined core teachers as general-education teachers of reading and ELA, mathematics, science, social studies, and (at MS and HS levels) foreign languages. We defined noncore teachers as teachers of other subject areas and special-education teachers. Our samples typically consisted of approximately 80 percent core teachers and 20 percent noncore teachers. In addition, we oversampled novice teachers in the districts (which have high proportions of experienced teachers) and experienced teachers in the CMOs (which have high proportions of novice teachers) to ensure adequate representation from each group.
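To make the sampling design concrete, the following is a minimal Python sketch of drawing a stratified sample within each school. The roster file name, the column names (school_id, core, novice), and the per-school targets of eight core and two noncore teachers are illustrative assumptions, not the exact figures used in the study.

    import pandas as pd

    # Hypothetical roster extract with one row per teacher: school_id, teacher_id,
    # core (True for core-subject teachers), novice (True for novice teachers).
    roster = pd.read_csv("teacher_roster.csv")

    def draw_school_sample(school, n_core=8, n_noncore=2, seed=2016):
        # Sample separately within the core and noncore strata of one school.
        core = school[school["core"]]
        noncore = school[~school["core"]]
        picked = [
            core.sample(n=min(n_core, len(core)), random_state=seed),
            noncore.sample(n=min(n_noncore, len(noncore)), random_state=seed),
        ]
        return pd.concat(picked)

    # Draw the sample school by school (roughly an 80/20 core/noncore split).
    survey_sample = roster.groupby("school_id", group_keys=False).apply(draw_school_sample)

In practice, the counts per school varied by site and school level, and the within-stratum draws were further adjusted to oversample novice teachers in the districts and experienced teachers in the CMOs, as described in footnote 3.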


Table A.2. Numbers of Teachers and SLs Surveyed

Year    Teachers     SLs
2011       4,311   1,174
2012a        N/A   1,209
2013       4,697   1,172
2014       4,838   1,287
2015       4,946   1,310
2016       5,055   1,319

a In 2012, we surveyed only SLs.

Survey Administration

Surveys were web-based and administered in the late spring of each year. We contacted survey recipients at the email addresses that the sites provided to the RAND team that collected site administrative data. We provided each recipient with a unique link to access the survey; this link included an embedded identification code by which we could track responses and merge them with administrative data, such as each teacher’s grade level taught and effectiveness rating, and school demographic characteristics (see “Archival Academic Data Methods”). We contacted nonrespondents about once per week throughout the data-collection period, initially by email and later by phone.4 Every person who completed the survey received a gift card;5 there were also occasional drawings for $50 gift cards and, at the end of each year, a final drawing for $500 school prizes from among schools with high response rates.

We calculated the survey response rate as the number of responding teachers (or SLs) divided by the number of sampled teachers (or SLs).6 Tables A.3 through A.6 show the response rates for teachers and SLs, respectively, in each district and each CMO in each year.

4 The administration of the 2016 surveys in Alliance followed a different procedure, in which site staff (not the evaluation team) emailed all teachers and leaders a generic survey link; completion of the survey was anonymous, and there were no individualized follow-up efforts.

5 The amount and disbursement of the gift card differed across years and surveys. In 2011, 2013, and 2015, each teacher received a $25 iCard for completing the survey. In 2014, each teacher received a $10 iCard for completing the survey, which was shorter that year. In 2016 (another short-survey year), each teacher invited to complete the survey received a $10 Amazon gift card, and each teacher who completed the survey received an additional $10 Amazon gift card. Each SL, meanwhile, received a $25 iCard for completing the survey in each year from 2011 through 2015. In 2016, each SL invited to complete the survey received a $10 Amazon gift card, and each SL who completed the survey received an additional $15 Amazon gift card.

6 To be included in the response-rate calculation, as well as in the analysis, a survey had to have at least one question answered in more than half of the major survey sections.
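As a quick illustration of the response-rate calculation, the following sketch reproduces one cell of Table A.3 (the 2011 HCPS teacher rate); only surveys meeting the completion criterion in footnote 6 count toward the numerator.

    # 2011 HCPS teachers (Table A.3): 1,168 completed surveys out of 1,393 sampled.
    completed, sampled = 1168, 1393
    response_rate = round(100 * completed / sampled)   # 84 percent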


Table A.3. District Teacher Response Rates, Surveys Completed, and Teachers Sampled

                HCPS                           PPS                            SCS
Year    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled
2011          84      1,168    1,393          78        657      838          82      1,052    1,282
2013          75      1,040    1,393          75        586      783          83      1,038    1,244
2014          79      1,109    1,397          70        548      780          84      1,087    1,298
2015          73      1,026    1,407          76        578      758          80        987    1,234
2016          81      1,168    1,442          74        562      762          75        862    1,157

Table A.4. CMO Teacher Response Rates, Surveys Completed, and Teachers Sampled

                Alliance                       Aspire                         Green Dot                      PUC
Year    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled
2011          77        140      182          86        261      303          65        132      203          82         90      110
2013          77        313      407          79        285      359          61        206      335          76        134      176
2014          79        344      435          80        300      375          68        231      341          75        159      212
2015          70        363      518          68        276      403          64        239      376          62        156      250
2016         16a         97      598          77        316      408          69        286      416          68        185      272

a In the spring 2016 survey, the leadership at Alliance severely restricted our access to teachers, resulting in a lower response rate.


Table A.5. District SL Response Rates, Surveys Completed, and Leaders Sampled

                HCPS                           PPS                            SCS
Year    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled
2011          77        465      607          83         85      102          76        259      339
2012          81        493      610          80         78       97          82        277      337
2013          77        459      597          74         64       86          65        207      317
2014          68        433      637          71         58       82          66        254      386
2015          66        426      646          69         61       88          63        225      360
2016          56        366      651          61         54       89          54        188      349

Table A.6. CMO SL Response Rates, Surveys Completed, and Leaders Sampled

                Alliance                       Aspire                         Green Dot                      PUC
Year    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled    Rate (%)  Completed  Sampled
2011          59         23       39          81         30       37          56         18       32          72         13       18
2012          67         33       49          72         38       53          66         25       38          76         19       25
2013          65         31       48          69         33       48          65         33       51          72         18       25
2014          78         43       55          62         32       52          71         37       52          70         16       23
2015          61         44       72          53         31       58          58         33       57          55         16       29
2016         15a         13       84          63         38       60          41         24       58          46         13       28

a In the spring 2016 survey, the leadership at Alliance severely restricted our access to school leaders, resulting in a lower response rate.


Survey Data Analysis

Weighting

We calculated sampling weights for each teacher based on the sampling design. (SLs had an implicit sampling weight of 1 because all SLs were surveyed.) Following data collection, for both teachers and SLs, we conducted nonresponse analyses to adjust the weights. We used a two-level hierarchical generalized linear model (individuals nested within schools) predicting the probability of response based on person-level characteristics, such as gender and years of experience, as well as school-level characteristics, such as percentage of students who were LIM and school level (elementary school, MS, or HS).7 Accordingly, the reported survey percentages represent the full population of teachers or SLs in each site in each year.
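As a rough illustration of the nonresponse adjustment, the sketch below fits a simplified, single-level logistic model (the study used a two-level hierarchical model with individuals nested within schools) and divides the base sampling weight by each person’s predicted response probability. The file name and variable names (responded, base_weight, gender, years_exp, pct_lim, school_level) are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical analysis file with one row per sampled teacher.
    teachers = pd.read_csv("sampled_teachers.csv")

    # Simplified stand-in for the two-level hierarchical generalized linear model:
    # predict the probability of responding from person- and school-level characteristics.
    model = smf.logit(
        "responded ~ C(gender) + years_exp + pct_lim + C(school_level)",
        data=teachers,
    ).fit()

    # Adjust the design-based sampling weight by the inverse predicted response probability.
    teachers["p_respond"] = model.predict(teachers)
    teachers["adj_weight"] = teachers["base_weight"] / teachers["p_respond"]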

Analysis Strategy

We conducted survey analyses in Stata, using Stata’s survey estimation procedures (e.g., svy: proportion). For both teachers and SLs, we specified a two-stage design, with schools as the first stage and individuals as the second stage. At the first stage, we treated each site as a stratum, and we included a finite population correction for the number of schools in each site. At the second stage for teachers, we treated core and noncore teachers within each school as strata, with a finite population correction for the number of teachers (within school) in each stratum. At the second stage for SLs, we specified principals and APs within each school as strata, with a finite population correction for the number of leaders in each stratum. Analyses, which we conducted separately for each survey year, used Stata’s over option to provide separate results for each IP site.8
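The point estimates behind the reported percentages can be reproduced, in simplified form, as survey-weighted proportions. The sketch below assumes a respondent-level file with columns site, adj_weight, and a 0/1 item indicator (item_agree); it omits the strata and finite-population corrections that Stata’s svy machinery applies when estimating the standard errors.

    import pandas as pd

    # Hypothetical respondent-level file: site, adj_weight, item_agree (1 = agree, 0 = disagree).
    responses = pd.read_csv("teacher_responses.csv")

    def weighted_percentage(group):
        # Survey-weighted proportion of respondents agreeing, expressed as a percentage.
        return 100 * (group["adj_weight"] * group["item_agree"]).sum() / group["adj_weight"].sum()

    # Separate results for each IP site, analogous to Stata's over(site) option.
    by_site = responses.groupby("site").apply(weighted_percentage)
    print(by_site.round(1))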

Survey results presented in this report are based primarily on descriptive analyses (i.e., survey-weighted proportions or percentages). We always present results separately for each site and for each year. Except where we present subgroup disaggregations (described later in this section), figures depicting teacher survey responses have green bars, and figures depicting SL survey responses have blue bars.

Selection of Survey Findings for This Report

Space did not permit reporting of results on every item in the teacher and SL surveys. In each report chapter focusing on an IP lever (Chapters Three through Eight), we selected items that were most salient to that lever along each of the four dimensions described earlier (awareness, endorsement, fairness, and perceived effects). In some cases, there were multiple relevant survey items, and we used our judgment to select which ones to present in the main report. For some chapters, we also present results for additional related survey items in the appendixes.

7 The exact model used for the nonresponse analysis varied by site and by year. We included only predictors that were statistically significant (p < 0.05), prioritizing parsimony in model selection.

8 We did, however, create a file of responses across all years, which we used to test for the significance of differences between years (within each site).

That said, we report results for a relatively high proportion of survey items, particularly on the teacher survey. Using the 2015 teacher survey as an example (a “long-form” year) and excluding the questions related to time allocation, the survey had 309 individual items (including individual rows in table-type questions and individual checkboxes in checkbox questions). Of the 309 items, 21 were respondent background or teaching-situation items, 21 were checkbox or yes/no questions used primarily for routing to later questions, and 73 (constituting just nine question blocks) were on topics that turned out to be insufficiently relevant to the topics discussed in the report9 or, because of survey routing or skip patterns, were not answered by a large proportion of respondents. Of the remaining 194 items, we report results for 131 (68 percent) of them in the report or the appendixes.10

Subgroup Disaggregations

Starting in 2013, we disaggregated many of the survey results—especially those from survey items with Likert scale and yes-no response options—by a variety of respondent and school characteristics so that we could examine differences between subgroups. For Likert scale items, we typically collapsed the response options into two dichotomous categories, such as agree (combining “agree strongly” and “agree somewhat”) and disagree (combining “disagree strongly” and “disagree somewhat”), and looked at subgroup differences for only one of the two combined categories (e.g., “agree”).11 Disaggregations were done separately within site and within year.
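A minimal sketch of the collapsing step, assuming a four-point agreement scale coded 1 through 4 plus a “don’t know/not applicable” code of 9 (the actual item codes varied); the file and column names are hypothetical.

    import numpy as np
    import pandas as pd

    # Hypothetical item coding: 1 = agree strongly, 2 = agree somewhat,
    # 3 = disagree somewhat, 4 = disagree strongly, 9 = don't know / not applicable.
    responses = pd.read_csv("teacher_responses.csv")
    collapse_map = {1: 1, 2: 1, 3: 0, 4: 0, 9: np.nan}

    # "Don't know" responses become missing before collapsing, as described above.
    responses["item_agree"] = responses["item_raw"].map(collapse_map)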

For teachers, we disaggregated items by the following teacher and school characteristics:

• teacher experience: novice teachers versus experienced teachers
• TE rating from the previous year: low versus middle versus high effectiveness category, with the categories defined from sites’ own rating categories as shown in Table A.7
• teachers of core versus noncore subject areas (definitions provided earlier, but based on self-report on the survey rather than on the extant data used for the sampling)
• teachers of tested versus nontested subject areas and grade levels (based on self-report)
• school level: elementary schools versus MSs versus HSs (or, in some cases, elementary versus MS and HS combined or elementary and MS combined versus HS)12
• school percentage of enrolled students with LIM status, specified in one of two ways (see the sketch below):

- schools with 80 percent or more students with LIM status versus all other schools: The advantage of this specification was that it was based on an absolute criterion that might have intrinsic meaning. The disadvantage was that, in some sites (in some or all years), all of the schools fell on one side of the 80-percent cutoff, meaning that we could not make a comparison for that site.

- above median (top half) versus below median (bottom half), with median determined separately within each site (and year) based on its own distribution of school percentages of students with LIM status.13 The advantage of this specification was that all seven sites always had both categories (above median and below median). The disadvantage was that, if a site actually had very little variation across schools in the percentage of students with LIM status, schools in the two halves might not, in fact, have been very meaningfully different from one another.

9 For instance, 26 items (in three question blocks) pertained to perceptions of collaboration and leadership within the respondent’s site and school.

10 We do not present all the results graphically; some we report only in narrative form. Moreover, not all have results presented for every year of data available, particularly in the report itself, although, in many cases, we provide results for additional years in an appendix.

11 For items that had a “don’t know” or “not applicable” option, we coded responses of that option as missing prior to collapsing categories.

12 Each teacher in a school with a grade span crossing the traditional elementary/MS/HS boundaries was assigned a school level based on his or her grade most taught.
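Both LIM specifications can be computed directly from a school-level file. Below is a minimal sketch, assuming hypothetical columns site, year, and pct_lim (the percentage of enrolled students with LIM status); the tertile variant mentioned in footnote 13 is included for completeness.

    import pandas as pd

    # Hypothetical school-level file: one row per school per year.
    schools = pd.read_csv("school_characteristics.csv")

    # Specification 1: absolute criterion (80 percent or more LIM students).
    schools["lim_80_plus"] = schools["pct_lim"] >= 80

    # Specification 2: above versus below the median, computed within each site and year.
    site_year_median = schools.groupby(["site", "year"])["pct_lim"].transform("median")
    schools["lim_above_median"] = schools["pct_lim"] > site_year_median

    # Tertile variant (bottom, middle, top thirds within site and year).
    schools["lim_tertile"] = schools.groupby(["site", "year"])["pct_lim"].transform(
        lambda s: pd.qcut(s, 3, labels=False)
    )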

Table A.7. Collapsing of Site TE Categories for Survey Item Disaggregations, by TE Rating

Site        Low                                                    Middle                                      High
HCPS        U or NI                                                E                                           HE level 4 or HE level 5
SCS         Performing below or significantly below expectations   Meeting or performing above expectations    Performing significantly above expectations
PPS         F or NI                                                P                                           D
Alliance    Entering or emerging                                   E                                           HE or master
Aspire      Emerging                                               E                                           HE or master
Green Dot   Entry or emerging                                      E                                           HE or HE 2

NOTE: We exclude PUC because it did not provide TE ratings after 2013.

For SLs, we disaggregated items by the following:

• position: principals versus other SLs (mostly APs)
• school level: elementary versus MS versus HS (or, in some cases, elementary versus MS and HS combined or elementary and MS combined versus HS)14
• school percentage of students with LIM status, specified in the same two ways noted for teachers.

In the subgroup-disaggregation graphs included in the report, different colors are used for each of the different types of disaggregation. For example, comparisons of novice and experienced teachers have orange bars, while comparisons based on the TE rating have purple bars.

13 In some cases, we also looked at tertiles (thirds) rather than halves, again with the cut points determined within site (and year). Tertiles offered the advantage of allowing for comparison of more-extreme groups (i.e., bottom third versus top third) but had smaller samples within each third and thus had less statistical power.

14 Each SL in a school with a grade span crossing the traditional elementary/MS/HS boundaries was assigned a school level based on the school’s grade-span enrollment.

We present disaggregated results in the report only for items for which there was a clear theoretical rationale for comparing particular subgroups (i.e., a theory-based reason that results for one subgroup might differ from results for another subgroup).15 Exploring subgroup differences for every survey item presented in the report, across all the subgroup classifications, sites, and years, would not have been feasible.

Where we include subgroup comparisons, we indicate whether the difference between the subgroups (within each site) is statistically significant, using superscripts on the site names. Where we compare only two subgroups (e.g., novice and experienced teachers), we provide an indication of significance level between the two groups;16 each of these figures also includes a bar showing each site’s overall percentage. Where we compare three subgroups, we indicate whether the difference between each pair of subgroups is significant (p < 0.05), such as low-rated versus middle-rated teachers, low-rated versus high-rated teachers, and middle-rated versus high-rated teachers. These figures show bars for only the three subgroups, excluding the overall percentages.
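For readers who want a feel for the significance testing, the sketch below applies a simple two-sample test of proportions to hypothetical subgroup counts. The actual comparisons were design-based (they accounted for the survey weights, strata, and clustering described earlier), so this unweighted version is only an approximation of the approach, shown here to illustrate the superscript convention in footnote 16.

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical counts: novice vs. experienced teachers in one site agreeing with an item.
    agree_counts = [310, 520]   # number agreeing in each subgroup
    respondents = [400, 750]    # number answering the item in each subgroup

    stat, p_value = proportions_ztest(agree_counts, respondents)

    # Translate the p-value into the star convention used in the figures.
    stars = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else ""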

Interview Methods

Interview Data Collection

Each fall, we conducted in-person interviews with the key central-office administrators in each IP site (see Table A.8) who were involved in developing, implementing, or reviewing the IP levers, as well as two or three selected local stakeholders (e.g., teachers’ union officials, school board members). The interviews focused on the development and implementation of the IP reforms and policies, such as the use of TE ratings, development and implementation of targeted PD, challenges, implementation successes, local contextual factors, and interactions with the foundation and with other districts.

15 In a few cases, we added subgroup comparisons that reviewers requested.

16 * denotes that the difference is significant at p < 0.05. ** denotes that the difference is significant at p < 0.01. *** denotes that the difference is significant at p < 0.001.


Table A.8. Number of Central-Office Administrators and Stakeholders Interviewed Each Fall

Year    HCPS   SCS   PPS   Alliance   Aspire   Green Dot   PUC
2010      21    12    19          1        1           2     2
2011      21    10    28          3        3           4     3
2012      11    12    18          9        8           8     5
2013      14    15    18          9        5           7     5
2014      12    13    17         13        7          10     8
2015      11    13    17          7        7           9     7
2016       8     3     9          4        3           3     3

NOTE: We also interviewed TCRP leaders who coordinated activities among the CMOs: five leaders in 2010, two in 2011, one in 2012, and one in 2014. The numbers of interviewees changed over time because of site input into which staff should be included. For example, in HCPS, during the initial years, we interviewed several staff who worked in finance or IT, and we dropped several of these staff from the sample in later years.

Each spring, we conducted in-person and telephone interviews with school staff at seven schools in each district and one to two schools in each of the four CMOs. Table A.9 shows the number of teachers and SLs we interviewed in each site each year. We purposefully sampled the schools with feedback from staff in each site to ensure representation across grade-level configurations, geography, and achievement levels. We also considered site-specific implementation factors, such as piloting or implementation of policies or programs of interest in certain schools. In the first year of the project (2010–2011), we conducted in-person visits at all seven schools in each district and at the seven schools across the four CMOs.17 During these visits, we conducted individual interviews with three SLs (including the principal) and three teachers, as well as a focus group of six to eight teachers. In the second year of the study (2011–2012), to minimize burden on the schools, we conducted in-person visits at half of the schools in each site, conducting interviews with SLs and teachers as described, and telephone interviews with the principals in the remaining schools. We randomly selected schools for each group and switched them in subsequent years (e.g., schools that received in-person visits in the spring of 2012 received telephone interviews in the spring of 2013).

17 One school dropped out after the 2011 interview. In the spring of 2012, we added one Aspire school. In the spring of 2013, we added one PUC school. From 2013 on, the sample contained two schools at each CMO.


Table A.9. Number of School-Level Staff Interviewed

Interview                                              HCPS   SCS   PPS   Alliance   Aspire   Green Dot   PUC
Spring 2011 school visit
  SLs interviewed                                        18    20    21          4        1           4     2
  Teachers interviewed (individual and focus group)      52    65    63         19        9          15     3
Spring 2012 school visit
  SLs interviewed                                        11    13    13          3        3           4     3
  Teachers interviewed (individual and focus group)      31    34    42         12        9           9     9
Spring 2013 school visit
  SLs interviewed                                        12     8     8          3        2           2     2
  Teachers interviewed (individual and focus group)      47    48    37         15       15           4     4
Spring 2014 school visit
  SLs interviewed                                         8     8     9          3        3           2     3
  Teachers interviewed (individual and focus group)      33    40    45         11        9           2    13
Spring 2015 school visit
  SLs interviewed                                        15     8     8          3        2           4     3
  Teachers interviewed (individual and focus group)      14    27    35         16       12          19    12
Spring 2016 school visit
  SLs interviewed                                         7     6     7          2        2           2     2
  Teachers interviewed (individual and focus group)      18    10    14          4        4           4     4

In the third year of the study (2012–2013), we adjusted our participant sample with the goal of increasing the number of teachers interviewed. In the schools that received in-person visits, we reduced the number of SLs sampled from three to two (i.e., the principal and another SL), and we increased the number of teachers sampled for individual interviews from three to four. The teacher focus group was unchanged. In the schools that received telephone interviews, we sampled two teachers for telephone interviews in addition to interviewing the principals. In the fourth year of the study (2013–2014), we further refined our participant sample; in the schools that received telephone interviews, we reduced the number of teachers sampled from two to one to reduce the burden on the school. Sampling for the schools that received in-person visits was unchanged. We repeated these sampling and interviewing procedures in the spring of 2015. In the spring of 2016, we conducted in-person visits at all seven site-visit schools and interviewed the principal and two teachers at each school. In the three districts, any teacher who participated in a focus group scheduled after school hours received a $25 gift card as thanks for his or her time.

A member of the research team conducted each interview using a semistructured protocol to guide the questioning. We also used probe questions as needed to follow up, and we audio-recorded all interviews and focus groups. We informed all participants that their interview responses would be confidential and that any reporting would be done in the aggregate. We also informed participants that no responses or quotations would be reported in a way that would allow them to be identified. School-based in-person and telephone individual interviews lasted approximately 45 minutes, and the in-person focus group lasted approximately one hour. We randomly sampled teachers for the individual interviews and focus groups to ensure variability across grades and subjects (tested and not tested), years of teaching experience, and levels of involvement or holding special roles in the school (e.g., coaching or CL roles). We requested the staff rosters used for sampling directly from the district central office or from the principals of the CMO schools, and we requested supplemental information (e.g., teachers serving in coaching or CL roles) from the principals. An interview with central-office staff lasted one hour.

Interview Analysis

The analysis of the interview data each year proceeded in several steps. First, we compared interview notes with the audio recording and cleaned them to serve as a near-transcript of the conversations. We then loaded the cleaned interview notes into the qualitative analysis software package NVivo 10 and autocoded them by interview question (i.e., so responses to specific interview questions were easily accessible). We also coded them using a thematic codebook that we developed. (For example, we included such codes as “teacher evaluation system,” “teacher PD, coaching, mentoring,” “communication strategies,” and “challenges.”) Once we finished the thematic coding, we conducted a second round of coding, analyzing the data according to research questions of interest (e.g., how do principals’ opinions about the teacher-evaluation measures differ from teachers’ opinions?). At this stage, we used an inductive coding process (i.e., we derived codes from the data rather than from a structured codebook) to develop responses to the question of interest. The codebook remained largely unchanged from the beginning of the study, with some minor revisions to eliminate redundancies or to capture new themes as they emerged. The consistency of the codebook and coding methodology over time allowed us to examine changes over time, as well as look at each year’s interviews individually.

Archival Academic Data Methods

Data Acquisition

Each of the three districts and four CMOs provided us with administrative data on students and staff. They provided the data for school years 2007–2008 through 2015–2016. Student-level data included enrollment by date and school, demographics, FRPL status, ELL status, gifted status, and state assessment scaled scores. Staff-level data included demographics, highest degree attained, NBPTS certification status, years of experience in the site, job title, and other fields used for survey sampling and administration (see the “Survey Methods” section of this appendix). We also obtained site-generated composite TE levels for each teacher during the years we computed these scores. We linked students with their teachers and classmates by using administrative records on courses and class sections for each student and teacher. We used these data sets for survey sampling and administration, for outcome analyses contained in this report and in interim reports to the foundation and the sites, and for the creation of administrative dashboards provided annually to the sites and the foundation.

Estimation of Teacher Value Added

We used the student and staff data that the sites provided to calculate teacher VAM scores, which, in turn, we used to analyze the relationship of value added to various aspects of the initiative. Here, we describe our methodology for estimating VAM scores. In later appendixes, corresponding to Chapter Seven and Chapters Ten through Thirteen, we describe how we used the VAM scores to analyze the initiative’s effects on various outcomes.

Our methodology estimates teacher VAM scores by performing a two-stage least-squares regression of student achievement (standardized to z-scores) on lagged student achievement (instrumented by achievement in the other subject), student and classroom covariates, and a full set of teacher indicator variables, which capture each teacher’s VAM score. Including classroom covariates is important to control for peer effects and the different learning environments in which LIM students often study, independently of the teachers (see Goldhaber, Quince, and Theobald, 2016).

We estimate VAM scores and the estimates’ sorting parameters in separate stages, employing a generalized least-squares hierarchical fixed-effects approach that Borjas and Sueyoshi, 1994, describes and Aaronson, Barrow, and Sander, 2007, applies to teacher VAM scores. In the first-stage model,

Aicjt = α0 + α1 Ait−1 + Xit αX + Zct αZ + µjt + εicjt.   (A.1)

Aicjt is student achievement for student i assigned to teacher j in year t and classroom section c. We first scaled it to a state-level z-score using the state/year/grade standard deviations and means and, from there, scaled to the national level using NAEP.18 Achievement is a function of lagged achievement (Ait – 1), which is an estimate of the combination of innate ability and prior learning; observed student-level covariates (Xit), including gender, race and ethnicity, socioeconomic status, being overage for one’s grade, gifted status, and status as an ELL; and classroom-level covariates (Zct), which include lagged student test scores and the other covariates, each aggregated to the classroom level, as well as class size. µjt is the teacher VAM score in year t, and εicjt is the random noise (unexplained variation in student test scores).

18 We want estimates of VAM scores to be in units that allow us to compare across sites and over time, which scaling to the external NAEP allows us to do. A sample of students in grades 4 and 8 takes the exam every two years in each state. We use the means and standard deviations for each state and nationally to rescale scores to the national norm. We use linear regression to interpolate means and standard deviations for grades in between grades 4 and 8 (so grades 5 through 7) and for untested years.
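
To make the rescaling concrete, the sketch below illustrates one way to implement it. The NAEP statistics, function names, and the exact form of the state-to-national mapping are illustrative assumptions consistent with the description above (state z-scores are re-expressed in NAEP units and then standardized against the national NAEP distribution, with grade-level means and standard deviations linearly interpolated between grades 4 and 8); it is not the study's actual code.

```python
import numpy as np

def interpolate_naep(stat_by_grade, grade):
    """Linearly interpolate a NAEP mean or SD for grades between 4 and 8.

    stat_by_grade: values for the NAEP-tested grades, e.g., {4: 215.0, 8: 263.0}.
    """
    return np.interp(grade, [4, 8], [stat_by_grade[4], stat_by_grade[8]])

def rescale_to_national(state_z, grade, naep_state, naep_national):
    """Map a state-level z-score to a national-level z-score using NAEP.

    naep_state and naep_national are dicts of the form
    {"mean": {4: ..., 8: ...}, "sd": {4: ..., 8: ...}}. The mapping assumes the
    state score is re-expressed in NAEP units and restandardized nationally.
    """
    s_mean = interpolate_naep(naep_state["mean"], grade)
    s_sd = interpolate_naep(naep_state["sd"], grade)
    n_mean = interpolate_naep(naep_national["mean"], grade)
    n_sd = interpolate_naep(naep_national["sd"], grade)
    return (state_z * s_sd + s_mean - n_mean) / n_sd
```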



Student-level and classroom-level covariates (i.e., measures except for lagged test scores) are centered at their site-specific (i.e., district- or CMO-specific) means.

The inclusion of classroom-level covariates allows us to separate teachers’ contributions to student learning and the aggregate effects of the classroom composition. We identified the effects of the classroom-level covariates within teacher, taking advantage of the fact that many teachers across grades and sites teach more than one class section in a given content area each year.

Equation A.1 could alternatively be estimated in two stages: one that regresses student test scores on student covariates and classroom dummy variables and a second that regresses the estimated classroom fixed effects on classroom-level covariates and teacher dummy variables. However, in sensitivity analyses, we found very little difference in VAM estimates or in associations between estimates of VAM scores and students’ LIM status when we collapsed the first two stages, as shown in Equation A.1. This suggests that the classroom-level covariates capture the important sources of variation for teachers’ classroom-level deviations from their overall VAM scores.

Our models account for the fact that test scores (and thus lagged test scores) are measured with error. Like Briggs and Domingue, 2011, in accounting for this measurement error, we use two-stage least squares and instrument lagged test scores using the lagged test scores from the other subject (e.g., lagged mathematics score is instrumented by lagged reading score).19

To estimate Equation A.1, we use weighted least squares (WLS), with weights given by the proportion of the year in which a given teacher taught students in the tested subject. In other words, following the Hock and Isenberg, 2012, full-roster method, a student’s test score might appear as multiple observations in the data, with one record for each course in which the student was taught the tested subject. Weights reflect the proportion of the school year that the student spent in a particular course and are constrained not to exceed 1. This constraint means that we anticipate 0 marginal return to supplemental doses of mathematics or reading instruction beyond the first course. Weights are calculated as

19 We experimented with various instruments, such as double lags in the same subject and in the other subject, and found little difference in the estimates of VAM scores or in the teacher sorting coefficients. Likewise, we tested the inclusion of the lagged other-subject test score as a control variable instead of as an instrument and found similar results. We settled on the specification used here to be consistent with the literature that accounts for measurement error and to retain as many observations as possible (hence, not using double lags). We note that Lockwood and McCaffrey, 2014, investigates a variety of methods for correcting for measurement error and uses simulation methods to show that a well-identified instrumental variable method performs just as well as a more burdensome method based on conditional standard errors of measurement. It does not, however, investigate whether using an additional score as an instrument, as we do, is preferable to using it as an additional covariate.

p/k,


where p is the proportion of the school year the student spent in a given school (using modal enrollment days at that school as a denominator) and k is the number of unique mathematics or reading class sections in that school to which the student is linked in a given year.20
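
To make the estimation steps concrete, the following sketch implements a simplified version of Equation A.1: full-roster weights of p/k, a first stage that instruments the lagged same-subject score with the lagged other-subject score, and a second stage whose teacher-indicator coefficients serve as the VAM estimates. The column names (score, lag_score, lag_other, teacher_id, w) and the covariate handling are illustrative assumptions; the actual models also used the site-centered student and classroom covariates described below and appropriate standard-error corrections.

```python
import pandas as pd
import statsmodels.api as sm

def estimate_vam(df):
    """Sketch of the two-stage least-squares VAM estimation in Equation A.1.

    Assumes df has columns: score (current z-score), lag_score (lagged z-score,
    same subject), lag_other (lagged z-score, other subject, the instrument),
    teacher_id, full-roster weight w = p / k, and student/classroom covariates
    whose names start with "x" or "z".
    """
    covars = [c for c in df.columns if c.startswith(("x", "z"))]

    # First stage: predict the error-ridden lagged score from the instrument
    # and the exogenous covariates, using the full-roster weights.
    X1 = sm.add_constant(df[["lag_other"] + covars])
    first_stage = sm.WLS(df["lag_score"], X1, weights=df["w"]).fit()
    df = df.assign(lag_score_hat=first_stage.fittedvalues)

    # Second stage: regress achievement on the instrumented lagged score, the
    # covariates, and a full set of teacher indicators; the teacher-indicator
    # coefficients are the (unshrunken) VAM estimates. Standard errors from
    # this two-step shortcut would need the usual 2SLS correction.
    teacher_dummies = pd.get_dummies(df["teacher_id"], prefix="t", dtype=float)
    X2 = pd.concat([df[["lag_score_hat"] + covars], teacher_dummies], axis=1)
    second_stage = sm.WLS(df["score"], X2, weights=df["w"]).fit()

    return second_stage.params.filter(like="t_")  # one estimate per teacher
```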

20 In sensitivity tests, we gave each record a weight of p rather than p/k, thereby allowing the sum of a student’s weights to exceed 1. Our results were not sensitive to the use of this alternative weighting approach.


Appendix B. Site TE Measures: Supplementary Material for Chapter Three

The following descriptions supplement the information on TE measures presented in Chapter Three. We first describe the districts, followed by the CMOs.

Districts

HCPS

Composite Measure

The final composite rating (0–100) consists of up to 40 points based on VAM scores and up to 60 points from the classroom observations. When the TE measure was first implemented in 2011–2012, 30 points of the observation score derived from the school administrator observations and 30 points derived from the peer evaluator or swap mentor observations. Starting in the 2012–2013 school year, HCPS revised the composition so that 35.1 points derived from the school administrator observations and 24.9 points from the peer evaluator or swap mentor observations. This change was intended to reflect that school administrators evaluated teachers on more components of domain 4 (professional responsibilities) than peer evaluators and swap mentors did. The composite rating is broken into five performance levels, determined in 2012–2013 and used for the remainder of the grant period:21

• level 5 (HE): 70–100
• level 4 (HE): 63–69.9999
• level 3 (E): 46–62.9999
• level 2 (NI): 42–45.9999
• level 1 (U): 0–41.9999.
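
For reference, a minimal sketch of the mapping from the 0–100 composite rating to these performance levels follows; the cut points are taken directly from the list above, and the function name is illustrative.

```python
def hcps_performance_level(composite):
    """Map an HCPS composite rating (0-100) to its performance level (1-5)."""
    if composite >= 70:
        return 5  # level 5 (HE)
    if composite >= 63:
        return 4  # level 4 (HE)
    if composite >= 46:
        return 3  # level 3 (E)
    if composite >= 42:
        return 2  # level 2 (NI)
    return 1      # level 1 (U)

# A teacher at the rescaled averages noted later in this appendix
# (25 VAM points + 38 observation points = 63) falls at the bottom of level 4.
print(hcps_performance_level(25 + 38))  # 4
```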

Classroom Practice Measure

Before the grant was awarded, HCPS had already started developing a new classroom practice evaluation rubric, based on the 22 components of professional practice from the FFT. Development began in 2009, and the new rubric was implemented for observations and evaluation in the 2010–2011 school year. The teacher’s union, HCTA, was involved in developing the new evaluation system and bought into the process from the beginning, accepting the contract that included the new rubric with 96 percent of the voting membership in the 2010–2011 school year.

21 Level 5 and level 4 are both referred to as HE.


The observation component included both formal and informal peer observations, as well as formal and informal observations by the principal or AP. Observations were scored using a four-point scale on a 22-item rubric, based on the FFT and aligned to the FEAP. The 22 items were divided into four weighted domains: planning and preparation (20 percent), the classroom environment (20 percent), instruction (40 percent), and professional responsibilities (20 percent). The final scores included findings from both formal and informal observations. For their first two years in the district, new teachers were observed six times: one formal and two informal observations by a school administrator and three formal observations by the teacher’s swap mentor, a mentor assigned to the new teacher specifically for observation purposes. The number of observations for other teachers depended on their combined observation score from the prior year. Through 2015–2016, all teachers received a minimum of one formal observation each from a school administrator and a peer evaluator. A teacher with a combined observation score of 22.99 or less received an additional formal peer observation. A teacher with a combined observation score of 45.00 or higher received one informal administrator observation and could choose to have one informal peer observation. A teacher with a score between 35.00 and 44.99 received two informal observations (one administrator, one peer). A teacher with a score between 23.00 and 34.99 received four informal observations (two from administrators, two from peers), and a teacher with a score lower than 23.00 received five (two administrator, three peer). Any teacher in the Deferred Retirement Option Program (i.e., who had declared his or her intent to retire within three years) and with an overall rating of E or HE received two formal observations (one peer, one administrator).22 This change represents a sharp increase in the number of both formal and informal classroom observations; prior to 2011–2012, most experienced teachers received formal observations less than once per year.

Student Achievement Measure

Before the grant and in its first year, HCPS used Florida’s MAP scores to measure student achievement with a value table calculation. To develop a robust student growth measure to replace MAP, HCPS partnered with VARC. VARC produced its first calculations of VAM scores for HCPS in the fall of 2011 for the 2010–2011 school year; HCPS used the same method in all subsequent years of the grant. At the beginning of the grant, the Florida state test was the FCAT. In the spring of 2015, Florida switched to the FSAs. Although the method of calculating the VAM score did not change, the changeover caused considerable delay.

Students take standardized tests in all subjects in HCPS. Local standardized tests have been developed for those subjects that the state does not test. Therefore, HCPS can calculate a VAM score for each classroom teacher based on a standardized test score. Weights of state and local test scores vary depending on subject. Student performance is calculated for up to three prior years of data, depending on how many years are available for a given teacher. Scores from all three years are reported to the teacher. To combine the VAM scores with the classroom-observation data on a 100-point scale at the proper percentages, we rescaled the data from a total of 60 points (centered on an average score of 38) to a 40-point scale (centered on an average score of 25).

22 In 2016–2017, HCPS discontinued the peer evaluations for all teachers and simplified the observation schedule.

PPS

Composite Measure

Before the IP initiative, principals rated PPS teachers as either S or U. The composite teacher-evaluation measure that was developed as part of the IP initiative consisted of three components (observation of practice, student achievement growth, and student feedback) and has a maximum score of 300 points and four performance levels:

• D: 210–300
• P: 150–209
• NI: 140–149
• F: 0–139.

U ratings result from an F or from two NI ratings in the same certification area in a ten-year period; all other ratings are considered satisfactory. PPS first provided teachers with a preview of their composite score data in the spring of 2013, based on 2012–2013 data. Principals were also provided these preview data for their teachers, but teachers received their scores a few days before principals. PPS implemented the measure as its teacher-evaluation system, with stakes attached, in the fall of 2013. In the spring of 2014, teacher performance information was provided to principals, as well as teachers. Thereafter, teachers received reports summarizing their evaluation data in the spring of each year, shortly before the end of the school year. These reports were delivered to teachers via email; teachers could also access their performance information via the district’s online portal.

In the PPS measure, observations of practice are weighted at 50 percent, a measure of individual student achievement growth at 30 percent, student feedback at 15 percent, and school-level student achievement growth at 5 percent. PPS used its composite measure for compensation decisions for some teachers, for determining eligibility for differentiated career roles, and for performance improvement plans. The composite measure and its components were developed collaboratively with the union, PFT. The composite measure was used for the majority of teachers in all subjects; in particular, the measures of classroom practice did not include subject-specific measures. The combined measure was not used for pretenure teachers in their first three semesters of service; teachers in PPS’ special schools, which serve students with exceptionalities; and other unique teacher groups, such as early childhood.

The composite measure was calculated by multiplying the scores for each component by the weight of that component and then adding them. The measures of student outcomes and student feedback, which were calculated on a normal curve–equivalent (NCE) scale, were multiplied by 3.03 before weighting. PPS made this adjustment to translate the 1–99 NCE scale to the 300-point scale used for the other measures (99 × 3.03 ≈ 300).
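
A minimal sketch of this calculation, using the weights listed earlier in this subsection, follows; the argument names are illustrative, not PPS terminology.

```python
def pps_composite(observation_300, vam_nce, tripod_nce, school_vam_nce):
    """Sketch of the PPS 300-point composite.

    observation_300 is already on the 0-300 scale; the NCE-based measures
    (1-99) are multiplied by 3.03 before weighting. Weights: 50 percent
    observation, 30 percent individual growth, 15 percent student feedback,
    and 5 percent school-level growth.
    """
    return (0.50 * observation_300
            + 0.30 * vam_nce * 3.03
            + 0.15 * tripod_nce * 3.03
            + 0.05 * school_vam_nce * 3.03)

# A teacher at the NCE median (50) on every growth and feedback measure and at
# 150 on observation scores 75 + 45.45 + 22.725 + 7.575, or about 150.75 ("P").
print(pps_composite(150, 50, 50, 50))
```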

Classroom Practice Measure

RISE, an observation rubric, was based on the FFT and developed in 2008–2009. RISE was piloted in about one-third of the district’s schools in 2009–2010, before the award of the IP grant, and implemented district-wide in 2010–2011. From 2010–2011 through 2012–2013, the district used RISE scores as its teacher-evaluation measure while other measures were being piloted. The RISE rubric was revised over time, always by a committee of teachers, union officials, and district staff, to simplify the language and include examples of what each level of practice should look like, with the goal of helping observers rate practices more consistently.

From 2010–2011 through 2013–2014, RISE ratings were based on scores on 12 “power components” out of 24 total components across four domains. Teachers, union officials, and district staff considered the 12 power components to be those most important and indicative of good instruction and described them as providing a “common language for what effective teaching looks like in the district.” Principals were the primary observers and scored the power components in one of four categories (i.e., U, basic [B], P, D). The principal determined the final RISE score based on a “preponderance of evidence,” which was a qualitative judgment the principal made after considering the teacher’s RISE scores throughout the year, along with other evidence, such as receptiveness to feedback, improvement of practice, and the teacher’s self-evaluation. In the first year of RISE implementation (2010–2011), all teachers were observed; in 2011–2012 and 2012–2013, tenured teachers with satisfactory performance were observed every two or three years, depending on principal preference. In the nonobservation years, each tenured teacher was expected to complete an independent project (called a supported growth project), which focused on one RISE component, with the result that about one-third to one-half of tenured teachers would be observed in a given year. A teacher working on a supported growth project would be scored on the single RISE component related to his or her project; ratings for the other RISE components were carried over from the previous year. In 2014–2015, the district replaced supported growth projects with the Independent Growth Year (IGY). A teacher on IGY was not observed and did not complete a project. IGY teachers’ RISE ratings were carried over from the previous year; other measures that were part of the composite score (e.g., VAM, Tripod) were assessed in the IGY.

A pretenure teacher was observed each year until he or she achieved tenure. In the years in which a tenured teacher was observed, he or she received at least one formal observation and multiple informal observations (at the principal’s discretion) per year; a pretenure teacher received at least one formal observation and multiple informal observations (at the principal’s discretion) per semester. In 2012–2013, PPS developed guidelines for adjusting the number of observations based on teachers’ needs; these guidelines were developed as part of the release of the preview composite score in the spring of 2013. Starting in the fall of 2013, any teacher with a performance rating of NI or F received up to 15 “touch points” per year (of which two were formal observations). A tenured teacher with a preponderance of P in RISE domains 2 and 3 received one formal observation and four to six informal observations per year. Formal observations could be either announced or unannounced. An announced formal observation followed a protocol that consisted of four steps: preconference, observation, teacher self-score, and postconference; an unannounced formal observation did not include a preconference.

Teachers received training on the RISE observation process throughout implementation, and PPS had a formal process for training and calibrating observers in the early years of the initiative. From 2011–2012 through 2014–2015, as part of this process, PPS principals were expected to rate videos of teacher practice using RISE, discuss their scores and resolve any discrepancies, complete a training course, and train the other observers in their buildings (i.e., other building administrators and, in some schools, teachers in CL roles). Most principals passed the calibration process; those who did not were provided with extra training and support from their supervisors but were not barred from observing teachers. In the fall of 2012, a new district-level role, instructional leadership specialist, was implemented to conduct co-observations with principals, with the goal of helping increase the accuracy of their ratings. As of 2015–2016, the process for calibrating observers was less formal; principals were expected to participate in periodic calibration conversations with their supervisors and with grade-level peers in their school support networks.

As of 2013–2014, the year the composite measure was implemented, each of the four RISE performance categories was assigned a point value (D = 300, P = 200, B = 100, U = 0), and 15 power components across the four domains were rated on this 0–300 scale. The final rating for each component was weighted and averaged to arrive at the final observation score. In the fall of 2015, PPS discontinued the practice of rating teachers on RISE components during informal observations; however, observers were still expected to collect evidence and share that evidence with teachers in a feedback conversation. PPS made this change in an effort to focus informal observations on conversations about growth and feedback, as well as to reduce the burden on observers. However, the evidence collected during informal observations could still be used to inform teachers’ summative RISE ratings at the end of the year.

Student Achievement Measures

In 2009–2010 and 2010–2011, PPS contracted with Mathematica Policy Research to develop customized, individual teacher VAM scores (where data were available) and school-level VAM scores. PPS solicited teacher input during development of the VAM scores, with the intention of ensuring that the measures reflected the things that the district thought were important (e.g., treatment of student characteristics). Individual VAM scores were first shared with teachers in the spring of 2012 but were not part of the TE measure and were not shared broadly with principals. VAM scores were first used for teacher evaluation in 2012–2013 and shared with teachers and principals in the spring of 2013.


The individual VAM score is based on three years of data but does not include the current school year. For example, the VAM score that each teacher received in August 2013 was based on data from the 2009–2010, 2010–2011, and 2011–2012 school years. PPS chose this approach for two reasons: The first was to enable a more stable estimate, and the second was to be able to include a VAM score in the composite measure when that measure was provided to teachers at the end of the school year. Similarly, the school-level VAM score was based on two years of data and did not include the current school year. Individual VAM scores were calculated only for teachers with the requisite data, which typically come from state tests, so relatively few teachers received VAM scores. PPS worked with Mathematica to create a value-added model that would include as many teachers as possible and, as a result, included some district-developed tests (CBAs) in the models. Although the district made an effort to maximize the number of teachers with VAM scores using available tests, PPS committed to not developing additional tests solely for the purpose of teacher evaluation. Over time, teachers not only expressed concerns about the quality of the CBAs but also found the practice of using locally developed tests for high-stakes purposes problematic, and the CBAs were removed from the VAM calculations in the fall of 2015. School and individual scores are given on an NCE scale (a distribution of 1 to 99, with 50 as the median), and the number of points was determined by multiplying the NCE score by 3.03 to get the score on the 0–300 scale, which was then input into the composite measure. VAM scores are scaled to SLO scores (the means and standard deviations are set as equal) so as not to disadvantage teachers with VAM scores.

PPS teachers without individual VAM scores measured student growth using component 3f on the RISE rubric for two school years (2012–2013 and 2013–2014) and, in 2014–2015, switched to using SLOs, a procedure required by the state. Principals rated component 3f, which PPS developed in its adaptation of the FFT, on the four-point RISE scale, and it was weighted at 30 percent in the composite measure. SLOs, which were piloted in 2013–2014 and adopted in the fall of 2014 to conform to state requirements, were written centrally for each grade and subject, and teachers worked with principals to set their own targets (e.g., 100 percent of students will accomplish x). At the end of the year, principals scored the SLO using the same four categories as those used for performance levels (i.e., D, P, NI, or F). The performance level was determined based on the percentage of students who met the stated target. Once a categorical rating was assigned, it was then translated into a numeric score on the district’s 300-point scale, in much the same way as RISE ratings, for incorporation into the combined measure.

Student Feedback Measure

PPS used the Tripod survey, developed by Ron Ferguson, as its measure of student feedback. It was administered twice per year to one class of students per teacher. The Tripod survey was piloted in a few schools in 2010–2011 and was administered district-wide for formative purposes in 2011–2012 and 2012–2013. Results from the 2011–2012 pilot were shared with teachers but not with principals; results from the 2012–2013 pilot were shared with teachers, principals, and central-office staff. Tripod scores are compared within grade bands (e.g., K through 2, 3 through 5) and then scaled to NCE scores, which are on a 1–99 scale. PPS chose to compare Tripod results within grade band, rather than district-wide, in an effort to avoid disadvantaging teachers of higher grades; as their rationale for this decision, central-office staff mentioned evidence from national studies suggesting that students in upper grades tend to respond more negatively than students in lower grades. The NCE score was then multiplied by 3.03 to calculate the number of points for the composite measure. Multiple years of data were used where available.

SCS

Composite Measure

Before the IP initiative, principals rated SCS teachers annually on a multidimensional rubric, the Tennessee, which was based on the FFT. The state calculated a measure of value added, TVAAS, for teachers in tested grades and subjects. TVAAS scores were shared with teachers but were not used for evaluation. This system was in use until July 2011, when SCS (then MCS) adopted TEM. In 2010, shortly after SCS was awarded the IP grant, the state of Tennessee was awarded a federal RTT grant, one of the requirements of which was that the state implement a teacher-evaluation system using multiple measures. From 2010 through 2011, SCS worked closely with the state to inform the design of the state TE system and adopted the state’s implementation timeline.

In the SCS measure, as of 2015–2016, TVAAS was weighted at 35 percent and a measure of student achievement at 15 percent for teachers in tested grades and subjects; classroom practice was weighted at 40 percent, student feedback at 5 percent, and other measures at 5 percent. The same weights applied to teachers with portfolios, which are a measure of student growth for teachers of world languages, fine arts, health, and physical activity and which carry the 35-percent weight in place of TVAAS. Measures of classroom practice were given greater weight (65 percent) for teachers without test or portfolio scores; for such teachers, data on state-level student achievement were weighted at 10 percent, and the other weights remained the same. From July 2011 to July 2013, the other measures (5 percent of the total) consisted of a measure of teacher content knowledge. In July 2013, this was changed to a measure of professionalism as a result of the merger between legacy MCS and legacy SCS.

TEM has five performance levels:

• significantly above expectations (TEM 5): 425–500
• above expectations (TEM 4): 350–424.99
• meeting expectations (TEM 3): 275–349.99
• below expectations (TEM 2): 200–274.99
• significantly below expectations (TEM 1): 100–199.99 points.


The maximum score is 500 points. Each TEM component is given a score between 1 and 5; these scores are weighted by multiplying by the weight, and the weighted scores are summed to produce the final score.
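
A sketch of this calculation appears below. The source specifies only that the weighted 1–5 component scores are summed; the multiplication by 100, which reproduces the 100–500 range, and the component names are assumptions for illustration.

```python
def tem_composite(component_scores, weights):
    """Sketch of the SCS TEM composite (assumed final scaling by 100)."""
    weighted_sum = sum(weights[k] * component_scores[k] for k in weights)
    return 100 * weighted_sum

# Example with the 2015-2016 weights for a teacher in a tested grade/subject.
weights = {"tvaas": 0.35, "achievement": 0.15, "practice": 0.40,
           "student_feedback": 0.05, "other": 0.05}
scores = {"tvaas": 3, "achievement": 4, "practice": 3,
          "student_feedback": 5, "other": 4}
print(tem_composite(scores, weights))  # about 330 -> meeting expectations (TEM 3)
```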

In 2015–2016, there were problems administering the state tests on which the student achievement measures are based; the tests were administered only in the applicable HS grades and subjects. As a result, student growth measures could not be calculated for teachers in grades K through 8. Teachers’ prior-year observation scores were substituted for the student achievement portion of the measure.

Classroom Practice Measure

In 2009, when SCS was awarded the IP grant, the district piloted three observation rubrics as measures of classroom practice and, in 2011, selected the Washington, D.C., IMPACT rubric as its measure (locally called the TEM rubric). The TEM rubric had four domains, two of which (teach and classroom learning environment) the observer rated. The TEM rubric included grade-level and subject-specific addenda (e.g., special education, early grades, HS grades and subjects), which were intended to clarify what teaching practice at each level should look like in specific grades and subjects. Use of these addenda was optional until 2015–2016, when they became required. A tenured teacher received a minimum of four observations per year (at least two unannounced) for a combined total of 60 minutes, and a pretenure teacher received a minimum of six per year (at least three unannounced) for a total of 90 minutes. Principals were expected to conduct the first and last evaluations each year, but the other observations could be conducted by other school or district administrators. Observers were trained in a district-wide process and participated in monthly “norming” training, generally using videos, intended to maintain interrater reliability. To be considered certified, observers had to score within one point of a master rater, a process SCS calls calibration. Raters who did not meet the calibration threshold were required to participate in an additional training session focused on intensive review of the rubric indicators. Most principals passed the certification test, but those who did not were given extra support; they were not barred from observing teachers. After the first year of implementation, principals were responsible for training the raters (e.g., APs) in their schools.

After the merger, the number of observations teachers received depended on their observation scores, and teachers were placed in “tracks” (i.e., groups) that specified the number of announced and unannounced observations. Lower-rated teachers received more unannounced observations. As of the spring of 2016, there were three observation tracks: (1) teachers in the first year of service; (2) teachers with a prior-year TEM score of 1 or 2; and (3) teachers with prior-year TEM scores of 3, 4, or 5. A teacher in track 1 received one announced and three unannounced observations. A teacher in track 2 received one announced and two unannounced observations for the year and began the year with an initial coaching conversation about prior-year performance. A teacher in track 3 received one announced and one unannounced observation. An observation was added for any teacher, in any track, who received a score of 2 or less on two or more rubric indicators during the year. In addition, principals could add observations at their discretion. In the fall of 2014, the district decided not to score the classroom learning environment domain and further reduced the number of observations for tenured teachers at the highest performance levels, in large part to reduce the workload for principals. In addition, the district revised the rubric yearly to clarify language and guidance for observers and to align with the Tennessee Academic Standards. Each teacher received a report that contained his or her evaluation data for the practice, student feedback, and other measures at the end of the school year and his or her student achievement measures that fall.

Student Achievement Measures

Each teacher in a tested subject received a measure of student growth in the form of TVAAS, the state’s system for assessing value added; the measure included all years of available data for that teacher and subject. In 2012, SCS adopted portfolios as a measure of student growth for teachers in some nontested subjects (e.g., world languages, fine arts, health and physical education). Portfolios were intended to show improvement in student work toward specific goals over time and were scored by peer raters (e.g., retired educators) with expertise in the subject matter. TEM also included a measure of student achievement per state requirements. Teachers could choose from a list of state-approved measures (e.g., state test scores) in consultation with their principals. Most teachers did not receive TVAAS scores for the 2015–2016 school year because there were statewide logistical issues administering TCAP, the state test, after a transition to a version aligned with Tennessee standards. Teachers in nontested subjects did not have individual measures of student growth, but 10 percent of their composite measures consisted of school-level TVAAS scores, which use one year of data.

Student Feedback Measure

SCS uses the Tripod survey as a measure of student feedback and piloted the measure in 2009–2010 and 2010–2011. Special-education teachers do not receive Tripod scores, and the weight of their practice measures is increased accordingly. From 2011–2012 through 2014–2015, Tripod was administered twice per year, and the two scores were combined for inclusion in the composite TEM. In the fall of 2015, only the higher of the two scores was included in TEM, to mitigate the problem of missing Tripod scores for teachers who were hired late or who changed teaching assignments midyear. In the fall of 2015, the district also switched to using the shorter, 30-question version of Tripod, rather than the longer, 80-question version, to combat survey fatigue. Tripod scores were scaled to NCE scores, which are on a 1–99 scale. The NCE distribution is divided into quintiles, and scores of 1 to 5 are assigned for weighting in the composite TEM.
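
As an illustration of the quintile step, the sketch below assigns 1–5 scores from empirical quintiles of teachers' NCE scores. Whether SCS used empirical quintiles or fixed NCE cut points is not stated, so this is an assumption.

```python
import numpy as np
import pandas as pd

def tripod_quintile_scores(nce_scores):
    """Assign 1-5 scores by quintile of the observed NCE distribution."""
    return pd.qcut(pd.Series(nce_scores), q=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Example with simulated NCE scores for 200 teachers.
simulated = np.random.default_rng(0).uniform(1, 99, size=200)
print(tripod_quintile_scores(simulated).head())
```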


Other Measures

From July 2011 to July 2013, the other measures (5 percent of the total) consisted of a measure of teacher content knowledge. Teachers could choose from a menu of options that included teachers’ Praxis (licensure test) scores, completion of content-specific PD, observation by a content-area specialist, or a portfolio. In July 2013, after the merger, SCS stopped using measures of teacher content knowledge and replaced them with a measure of professionalism, which was in use in legacy SCS. The professionalism rubric had four components: professional growth and learning, use of data, school and community involvement, and leadership. Teachers and school administrators were expected to collect evidence of a teacher’s professionalism in these domains and meet at the end of the year to determine a final score.

CMOs: Common Elements of the TE Measures

Because TCRP began as a consortium of CMOs, in accordance with their Gates Foundation grant, the CMOs developed common student growth and teacher practice measures and common evaluation component weights. Working jointly, they also developed a common observation rubric and observation process and used common stakeholder feedback measures. Each of the CMOs communicated its vision of TE through extensive teacher and SL participation in the development and review of measures, through members of an advisory council composed of teacher representatives from each school who were intended to act as two-way conduits of information, and through PD sessions examining each indicator on the College-Ready Teaching Framework and each evaluation measure. In this section, we describe the common initial elements of the TCRP evaluation system and, in the subsequent site-focused sections, describe the ongoing development and modifications made by each of the CMOs.

Composite Measure

All of the CMOs began with the same weights for the components of the evaluation: 40 percent teacher practice, 40 percent student achievement, and 20 percent stakeholder feedback. Teachers received results on their observations within a few days and survey results within a few weeks. Student achievement results were not available until the following fall. Typically, results were available online, and SLs reviewed them with teachers. Some of the CMOs prepared written reports for teachers showing their results and comparing them with the results of other teachers. Composite scores were calculated in the fall of the following school year, when the student assessment results became available. For the calculation of the composite score, each evaluation measure was converted to a four- or five-point scale, and then the teacher’s score on each measure was multiplied by the weight of the measure. Each CMO set the cut points on the scale for each individual measure and set the cut points for the composite measure. Once the special-education rubrics were implemented, most of the CMOs developed a separate set of weights for special-education teachers. With the loss of state test scores in 2013–2015 and the consequent inability to calculate SGPs, the CMOs each began to adjust the weight of the components, increasing the weight of teacher practice and decreasing the weight of student achievement.
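
A minimal sketch of this two-step calculation (conversion of each measure to a point scale via cut points, then weighting) follows; the cut points and measure names are placeholders because each CMO set its own, and the weights shown are the initial common TCRP weights.

```python
def to_scale_score(raw, cut_points):
    """Convert a raw measure to a 1-4 scale using ascending cut points."""
    level = 1
    for threshold in cut_points:
        if raw >= threshold:
            level += 1
    return level

def tcrp_composite(obs_raw, achievement_raw, feedback_raw, cuts):
    """Initial common weighting: 40% practice, 40% achievement, 20% feedback."""
    return (0.40 * to_scale_score(obs_raw, cuts["observation"])
            + 0.40 * to_scale_score(achievement_raw, cuts["achievement"])
            + 0.20 * to_scale_score(feedback_raw, cuts["feedback"]))

# Placeholder cut points: rubric average for observation, SGP for achievement,
# survey percentile for stakeholder feedback.
cuts = {"observation": [1.8, 2.5, 3.3],
        "achievement": [30, 50, 70],
        "feedback": [40, 60, 80]}
print(tcrp_composite(obs_raw=2.8, achievement_raw=55, feedback_raw=65, cuts=cuts))  # 3.0
```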

Classroom Practice Measure

The CMOs reviewed existing teacher-evaluation frameworks and selected the FFT as a base. Working with administrators and teachers and using an iterative process, Teaching Learning Solutions created a draft rubric that was submitted to administrators and teachers for feedback, and, after more modification, the resulting framework was called the College-Ready Teaching Framework. For the observation process, CMOs agreed on a minimum for teachers of one preobservation conference, one classroom observation, and a postobservation conference plus one other event (e.g., a shorter observation, peer observation, portfolio). In the spring of 2012, Teaching Learning Solutions trained SL observers, who then piloted the rubric and the process in a minimum of six schools in each CMO. The rubric was revised, a process that continued annually, and the teacher practice measure was implemented with all teachers in 2012–2013. Concurrently, the CMOs worked with several vendors to provide platforms for entering the observation data. By 2012–2013, all the CMOs were using the BloomBoard platform to enter teacher-observation scripts, ratings, and recommendations.

All of the CMOs began with formal observations of about 45 minutes (or the length of a classroom period) and several informal observations of about 20 minutes. Teachers were rated on a rubric based on the FFT, with 39 indicators. Ratings were on a scale of 1 to 4, with very specific explanations of the requirement for each rating level. From 2014 on, the CMOs began experimenting with shorter, more-frequent observations and shorter rubrics. See the individual site descriptions of the classroom practice measure in the next section for details. The formal observations consisted of three parts: a preobservation conference, the observation, and a postobservation conference. At the preobservation conference, the teacher and evaluator discussed the lesson that would be observed, and the teacher presented supplemental materials (e.g., examples of student work). During the observation, the evaluator took detailed notes (called scripting), which were then assigned to specific indicators on the observation rubric as evidence for the rating. The scripting and ratings were entered onto an online platform, BloomBoard, and were available to the teacher. At the postobservation conference, the teacher reviewed what had occurred in the observation and discussed his or her ratings of the lesson and the observer’s rating. Typically, after the informal observations, teachers received either emailed or in-person feedback. The extent to which indicators were scored during the informal observations varied by CMO. At all of the CMOs, evaluators had to be certified or conditionally certified to be observers. The certification was annual and was conducted for all the CMOs against the same true-scored video. If an evaluator did not pass all the certification areas, he or she could be conditionally certified but needed to be accompanied by a certified observer for evaluation purposes. If the evaluator did not pass any areas, he or she was not certified and could not conduct observations for evaluation purposes. Observers who had difficulty passing the certification test received one-to-one coaching and continued to be tested until they were certified. It was rare for an SL not to be certified. Central-office representatives from all of the CMOs calibrated annually with each other against a true-scored video.

Student Achievement Measure

The CMOs selected SGPs as their student achievement growth measure. They were chosen instead of a measure of value added because they were perceived as easier than VAM to explain to teachers. Scores were based on the CSTs using the Los Angeles Unified School District scores as a comparison group. In 2013–2014, California began to transition to a new state assessment, Smarter Balanced, which was more closely aligned with the Common Core State Standards. No CSTs were administered in 2013–2014; instead, the state piloted the new assessment, but scores were not reported. The first scores on the new assessments reported to schools were for 2014–2015. Because a minimum of two years of scores is necessary to calculate an SGP, it was not until the 2015–2016 scores were available that an SGP could once again be calculated from state assessment data. Each of the CMOs made its own adjustments to its evaluation calculations. Even in 2016, when the CMOs could use state scores, they were hesitant to calculate growth scores using the state assessment results, wanting to wait until they felt that they were valid and reliable measures. See the site descriptions in the next section for details on the changes that each CMO made to its assessments and their weights in its composite evaluation measure.
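
To clarify what an SGP measures, the sketch below computes a simplified growth percentile: each student's current score is ranked against comparison students (e.g., the LAUSD comparison group mentioned above) with similar prior-year scores. Operational SGPs are typically estimated with quantile regression; the bin-based approximation, column names, and data frames here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def simple_sgp(students, comparison, n_bins=20):
    """Simplified student growth percentile.

    Both data frames need 'prior' and 'current' score columns. "Academically
    similar peers" are approximated by binning the comparison group on the
    prior-year score and computing each student's percentile rank of current
    score within his or her bin.
    """
    comp_bins, edges = pd.qcut(comparison["prior"], q=n_bins,
                               labels=False, retbins=True, duplicates="drop")
    stu_bins = pd.cut(students["prior"], bins=edges,
                      labels=False, include_lowest=True)

    sgps = []
    for bin_id, current in zip(stu_bins, students["current"]):
        peers = comparison.loc[comp_bins == bin_id, "current"]
        sgps.append(100.0 * (peers < current).mean() if len(peers) else np.nan)
    return pd.Series(sgps, index=students.index)
```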

Student Feedback Measure

The CMOs piloted the Tripod student survey in 2010–2011. Feedback from teachers and administrators indicated that the Tripod survey took too long and that there were research questions (e.g., “How many people live in the household?”) that were unnecessary. The survey was refined in 2011–2012 and substantially shortened. From 2012–2013 on, each of the CMOs created its own version of the survey, generally bringing the questions into closer alignment with the teacher rubric.

Family Feedback Measure

All of the CMOs piloted the Tripod family survey in 2010–2011 and subsequently conducted family surveys based on their own modified versions of the Tripod survey. The family survey typically contains parent satisfaction–type questions.

Peer Feedback Measure

For several years, Aspire, Green Dot, and PUC conducted annual peer surveys and found that the ratings all tended to be very high. However, because the peer surveys typically contributed only 5 percent of the composite TE rating, they had little impact.


CMO-­Specific Aspects of the TE Measures

Alliance

Composite Measure

Alliance created the following weights for its teacher-evaluation composite measure:

• 55 percent observation
• 25 percent student achievement
• 10 percent family survey
• 10 percent student survey.

These weights remained the same during the entire study period; special-education teachers used the same weights.

Classroom Practice Measure

Of the two formal observations per year and the informal observations, only the score on the second formal observation counted toward the evaluation rating. Alliance added measures focusing on compliance issues to the rubric for special-education teachers in 2013–2014. In 2015–2016, it conducted several pilots in which teachers could choose to participate, including shorter baseline observations for new teachers; shorter lesson plans; shorter, more-frequent observations; multiple observers; a revised student survey for special-education teachers; and a rubric shortened from 39 indicators across four domains to about 15 indicators covering domains 1 through 3: (1) classroom learning environment, (2) instruction, and (3) professional responsibilities.

Student Achievement Measure

To accommodate the lack of an SGP measure, from 2013 onward, Alliance has used a Lexile growth metric for all teachers. The measure is based on a preassessment, a midyear assessment, and a postassessment of reading ability. Achieve 3000 is the program that delivers the online instruction from which the Lexile score is calculated. Each teacher uses the percentage of his or her students meeting their expected growth targets to calculate his or her student achievement rating.

Student Feedback Measure

In 2011–2012, Alliance began conducting an online student survey based on the Tripod student survey but shortened and with more-specific questions. In 2012–2013, Alliance began using its own student survey, which was more aligned with the teacher-observation rubric. The CMO administered the student survey each spring to a sample of the teacher’s students and reported scores at the teacher level.


Family Feedback Measure

From 2011–2012 onward, Alliance has conducted a modified and shortened version of the Tripod family survey. Alliance conducts the family survey annually and reports results at the school level.

Aspire

Composite Measure

Aspire’s initial composite evaluation measure in 2011–2012 contained the following weights for teachers in tested subjects and grade levels:

• 40 percent observation
• 30 percent individual student achievement
• 10 percent school-level student achievement
• 10 percent student survey
• 5 percent parent survey
• 5 percent peer survey.

For teachers in nontested subjects or grade levels, all of the student achievement percentage (40 percent) was school-wide student achievement. All other components remained the same.

To offset the loss of state test scores in 2013–2015, Aspire initially administered the previous year’s state test and used it to calculate an SGP, then used a combination of measures (see “Student Achievement Measure”).

In 2014–2015, Aspire developed a set of weights for special-education teachers:

• 60 percent practice (40 percent observation on the special-education rubric and 20 percent observation of individualized education program [IEP] facilitation and individualized education)

• 20 percent student achievement (school level)
• 10 percent student feedback
• 5 percent family feedback
• 5 percent peer feedback.

From 2015–2016 onward, Aspire has used an increased weight for the teacher practice measure and decreased weight for student achievement. As before, nontested teachers used school-wide student achievement scores:

• 50 percent observation
• 20 percent individual student achievement
• 10 percent school student achievement
• 10 percent student survey
• 5 percent family survey
• 5 percent peer survey.


For special-education teachers, the percentages were as follows:

• 40 percent observation on the special-education rubric
• 20 percent observation of IEP facilitation and individualized education
• 20 percent school student achievement
• 10 percent student survey
• 5 percent family survey
• 5 percent peer survey.

Classroom Practice Measure

Over the course of the study, Aspire adjusted the number of observations and how scores were calculated. In 2011–2012, each teacher had one formal and four informal observations. In 2012–2013, each teacher had two formal and three to four mini-observations, and the three lowest scores on the formal observation could be replaced by scores from the mini-observations. In 2013–2014, Aspire returned to one formal observation, which counted for 30 percent of the teacher-evaluation score, and three mini-observations, which counted for a total of 10 percent. In the next two years, 2014–2015 and 2015–2016, each teacher could choose between the “classic model” of one formal and three mini-observations or the “many-mini” model of six mini-observations of about 20 minutes each, three of which were unannounced. The score for each indicator was the average of the ratings the teacher received on that indicator. For the many-mini model, if a teacher was not rated on at least 80 percent of the rubric indicators, that teacher did not receive a computed observation score for the year and instead received a rating of either E or the previous year’s rating, whichever was higher.

Aspire added measures focusing on compliance issues to the rubric for special-education teachers in 2014–2015.

Student Achievement Measure

Aspire continued to calculate SGP scores using previous state assessments and other measures. For 2013–2014, Aspire administered CSTs from 2012–2013. All elementary schools gave mathematics and ELA CSTs. One of the nine secondary schools gave math and ELA; the other eight gave either the mathematics or ELA assessment.

In 2014–2015, Aspire used the following assessments:

• grades K through 5: Renaissance’s Star Renaissance within-year growth measure
• grades 6 through 12: ACT Aspire within-year growth in ELA, math, and science.

Both measures are aligned to Common Core State Standards and generate SGPs based on a national sample of academically similar peers. Aspire worked with ACT’s research department to create a norm group that looked more like Aspire students than a national sample.

In 2015–2016, Aspire used the following assessments:

• grades K through 2: Star Renaissance within-year growth
• grades 3 through 8 and 11: Smarter Balanced spring-to-spring growth score
• grades 9 and 10: ACT Aspire spring-to-spring growth
• grade 11: ACT Aspire, using the previous year’s ACT Aspire score for the growth measure.

Even though the students in grades 3 and 11 took the Smarter Balanced Assessment Consortium assessment, Aspire could not generate a growth measure for them because there were no prior grade assessments. Instead, teachers of grades K through 3 used Star results, teachers of grades 4 through 8 used Smarter Balanced results, and teachers of grades 9 through 11 used ACT Aspire results.

Student Feedback Measure

In 2011–2012, Aspire shortened the annual student survey from the original Tripod version. The CMO revised it again from 2012–2013 onward to use language aligned to the observation rubric. At the elementary level, there is one survey for grades 1 and 2 and another for grades 3 through 5, and results are reported at the classroom level. At the secondary level, a panel of students for each teacher takes the survey for that teacher (each student responds for two randomly selected teachers), and results are reported at the teacher level.

Family Feedback Measure

Like the other CMOs, Aspire piloted the Tripod family survey in 2010–2011 and subsequently conducted annual family surveys based on its own modified version of the Tripod survey; in 2014–2015, Aspire developed its own family survey. The family survey contains parent satisfaction–type questions. Results are typically reported at the school level, but Aspire reports them at the teacher level for grades K through 5.

Peer Feedback Measure

From 2011–2012 through 2015–2016, Aspire administered an annual peer survey. The central office provided principals with the names of the peers who would anonymously rate their colleagues. Aspire implemented a new version of the peer survey in 2012–2013 that was more closely aligned with the observation rubric. In 2014–2015, it revised the peer survey to align more closely with the Aspire core values of quality, collaboration, ownership, and purposefulness.

Green Dot

Composite Measure

The original Green Dot composite measure, used from 2011–2012 through 2012–2013, consisted of the following weights:

• for teachers in tested subjects and grades
- 40 percent observation
- 30 percent individual student achievement
- 10 percent school student achievement
- 10 percent student survey
- 5 percent family survey
- 5 percent peer survey.

• for teachers in other subjects and grades
- 55 percent observation
- 25 percent school student achievement
- 10 percent student survey
- 5 percent family survey
- 5 percent peer survey.

In 2012–2013, Green Dot developed a set of weights for special-education teachers:

- 35 percent observation
- 25 percent compliance
- 20 percent school student achievement
- 10 percent student survey
- 5 percent family survey
- 5 percent peer survey.

In 2013–2014, when state assessment scores were no longer available for calculating an SGP measure, Green Dot temporarily eliminated student achievement as an evaluation component and increased the weight of other measures to compensate. The weights continued through 2015–2016:

• 65 percent observation
• 15 percent peer survey
• 15 percent student survey
• 5 percent family survey.

Weights for special-education teachers were as follows:

• 65 percent practice (50 percent observation on the special-education rubric and 15 percent compliance)

• 15 percent peer feedback
• 15 percent student feedback
• 5 percent family feedback.
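
To make the weighting arithmetic concrete, the sketch below computes a weighted composite from component scores. Only the 2013–2014 general-education weights are taken from the lists above; the function, the dictionary layout, and the example component scores on a 0–4 scale are illustrative assumptions.

```python
def composite_score(component_scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted average of evaluation components; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[c] * component_scores[c] for c in weights)

# General-education weights used from 2013-2014 through 2015-2016
weights = {"observation": 0.65, "peer_survey": 0.15,
           "student_survey": 0.15, "family_survey": 0.05}
scores = {"observation": 3.0, "peer_survey": 4.0,   # illustrative scores
          "student_survey": 2.0, "family_survey": 4.0}
print(round(composite_score(scores, weights), 2))  # prints 3.05
```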

Classroom Practice Measure

From the evaluation system’s inception in 2011 through 2014–2015, Green Dot maintained a model of two informal observations and one formal observation each semester. At the second formal observation, teachers could choose to keep any 3s and 4s from the first observation and not be observed and rated on those indicators again in the spring.


In 2013–2014, Green Dot piloted an alternative model of six mini-observations at several schools. The teachers’ union in California rejected the model, but the Green Dot schools in Tennessee adopted it.

In 2014–2015, Green Dot added measures focusing on compliance issues to the rubric for special-education teachers.

In 2015–2016, it implemented a new configuration of observations to try to minimize the burden on both teachers and administrators, provide a more authentic picture of teacher practice, and place more emphasis on teacher support. The new configuration divided teachers into two groups. Group 1 contained any teacher who had taught at Green Dot for at least two years and, in 2014–2015, had an observation score of 2.7 or higher. In one semester, the teacher received one formal scheduled observation (45 minutes) and two unscheduled informal observations (25 minutes each) that were scripted but not scored; evidence from those informal observations nonetheless drove the summative score. In the other semester, the teacher received three informal observations, which were scored, and the observation results were used for coaching.

Group 2 contained any first- or second-year teacher at Green Dot who, in 2014–2015, had an observation score lower than 2.7. In one semester, the teacher received one formal scheduled observation (45 minutes) and two unscheduled informal observations (25 minutes each), with the evidence aggregated for a summative score, plus one informal observation that was not scored. In the other semester, the teacher received two unscheduled observations (25 minutes each). Scores of 3 or 4 on up to 15 indicators could be carried over from the fall summative score to the spring summative score, and those indicators were not rerated.

Student Achievement Measure

Green Dot implemented an SGP measure based on the CST from 2011–2012 through 2012–2013. From 2013–2014 onward, Green Dot has “grayed out” the student achievement measure in its teacher-evaluation components. The new Smarter Balanced state assessment is administered only once (grade 11) for mathematics and ELA at the HS level. Because Green Dot schools are primarily at the HS level, it is challenging for Green Dot to calculate SGP scores using state assessments.

Student Feedback Measure

In 2011–2012 and 2012–2013, Green Dot administered a student survey once in the fall to students in a teacher’s second-period class and in the spring to a randomized set of 25 students. From 2013–2014 onward, Green Dot has administered the student survey annually to about 42 students per teacher (randomly selected) during the students’ advisory periods. Green Dot stopped using the Tripod survey after the first year because of its length and developed its own survey aligned with the teacher-observation rubric. Questions focusing more on “what happens in class” than on “what my teacher does” correlated better with SGP scores. For example, a question on the 2011–2012 Green Dot student survey read, “I know how each lesson is related to other lessons,” whereas the question on the 2016 student survey read, “My teacher explains how today’s lesson connects to what we learned before and what we will learn in the future.” The former item correlated with SGP scores better than the latter did.

Family Feedback Measure

From 2011–2012 onward, Green Dot has conducted a family survey annually based on its revised and shortened version of the Tripod survey.

Peer Feedback Measure

Green Dot has annually administered a 360 peer survey from 2012–2013 onward. In 2012–2013, each teacher was rated by three peers. From 2013–2014 onward, each teacher has been rated by five peers: two from the teacher’s department, two from the teacher’s grade level, and the fifth from either the department or the grade level. One administrator, who does the evaluation for the formal observation, also fills out a survey. Each teacher receives a copy of his or her self-rating, the aggregated peer rating (involving five surveys), and the administrator rating.

PUC

Composite Measure

From 2011–2012 through 2012–2013, the PUC composite measure consisted of the following items and weights:

• for teachers of tested subjects and grade levels
- 44 percent observation
- 30 percent individual student achievement
- 10 percent school-level student achievement
- 10 percent student survey
- 3 percent parent survey
- 3 percent peer survey.

• for teachers of other subjects and grade levels
- 44 percent observation
- 40 percent school-wide student achievement
- 10 percent student survey
- 3 percent parent survey
- 3 percent peer survey.

In 2013–2014, the student achievement measure was school-level Lexile scores for all teachers. That same year, PUC developed a composite measure for special-education teachers consisting of the following:

• 55 percent teacher practice (15 percent compliance review, 15 percent IEP meetings, 10 percent growth goals, and 15 percent the collaboration meeting)

• 25 percent student growth and achievement (15 percent individual student and 10 percent school level)

• 10 percent professional contributions (peer and family surveys and collaborative rating)
• 10 percent student survey.

In 2014–2015, PUC stopped calculating a composite score because of pushback from teachers who resented being reduced to a single number. Instead, at his or her summative conference, each PUC teacher receives the results for each individual component: the student survey data, the parent survey data, the Lexile score, observation notes, and a narrative that describes his or her strengths and areas of growth related to his or her growth goals. The SL reviews with the teacher the student survey results, performance on growth goals, professional contributions (as measured by domain 4 on the rubric and the family survey), and student Lexile growth, and PUC uses these elements to determine whether the teacher met his or her growth goals.

Classroom Practice Measure

From 2014–2015 onward, each PUC teacher has two classic observations per year and a minimum of two open (i.e., shorter and more informal) observations per semester. The observations focus on three to five growth goals drawn from the rubric indicators: one organization goal (e.g., parent involvement), one school goal, and one to three teacher goals. Observers script and enter data into BloomBoard, but there is no scoring along the way. All evidence feeds into a final determination of whether the teacher has met his or her growth goals. An observer is always looking at all the indicators but providing feedback and development primarily on the teacher’s specific growth goals.

PUC added measures focusing on compliance issues to the rubric for special-education teachers in 2013–2014.

Student Achievement Measure

From 2013–2014 onward, because state assessment scores are no longer available to calculate SGPs, PUC has used school-level Lexile scores to provide fall-to-spring student growth scores. This score is part of the data reviewed with the teacher at the summative conference to consider progress on his or her growth goals.


Student Feedback Measure

In 2011–2012, PUC began annually administering a student survey. It was a slightly modified version of the Tripod student survey. The site randomly selected cohorts of students and randomly assigned each to a teacher to rate. From 2014–2015 onward, the survey has been shortened and divided into four sections. PUC changed the sample of students so that every student completes a survey for each of his or her teachers but only one of the four sections of the survey.

Family Feedback Measure

PUC began annually administering a modified version of the Tripod family survey in 2011–2012.

From 2014–2015 onward, PUC has split the questions on the family survey into two versions. Results of the survey are reported at the school level.

Peer Feedback Measure

PUC administered a peer feedback survey once a year from 2011 through 2013. The survey development team looked at work done by Achievement First and at the 360 model. At the MS level, teachers were rated by grade-level team members; at the HS level, by department team members. At both levels, the central office also randomly selected additional raters, who were vetted by the principal. The site discontinued the peer survey after 2013 because of teacher dissatisfaction with the measure.


Appendix C. Additional Exhibits for Chapter Three

Figure C.1. Teachers Reporting That Evaluation Components Were Valid Measures of Their Effectiveness to a Large or Moderate Extent, Springs 2013–2016

NOTE: Significant (p < 0.05) differences between the 2016 percentage and other years’ percentages: For the component in row 1, SCS declined from 2013, PPS declined from 2015, Alliance increased from 2014, and Green Dot increased from 2015. For the component in row 2, SCS declined from 2013, Alliance increased from 2014 and 2015, and Aspire declined from 2013 and 2014. For the component in row 3, SCS declined from 2013, and PUC increased from 2013. For the component in row 4, HCPS increased from 2013 and 2015; PPS increased from 2014; Alliance increased from 2013, 2014, and 2015; Aspire declined from 2014; Green Dot increased from 2015; and PUC increased from 2013, 2014, and 2015. HCPS’s TE measure did not include student input, and, after 2014, Green Dot’s measure did not include student achievement.

The figure shows the percentage of teachers reporting that the component is valid to a large or moderate extent, by site (HCPS, SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016), for four components: (row 1) observations of your teaching; (row 2) student achievement or growth on state, local, or other standardized tests; (row 3) student input or feedback (for example, survey responses); and (row 4) all evaluation components combined.

Figure C.2. Teachers’ Agreement with Statements About Observations, Springs 2013–2016

NOTE: Statements in rows with bars missing for some years were not included in the survey administered in those years.

The figure shows the percentage of teachers agreeing (somewhat or strongly) with each statement, by site (HCPS, SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016). The statements are (row 1) I have a clear understanding of the rubric that observers are using to evaluate my teaching; (row 2) the observation rubric is well suited for measuring many different forms or styles of good teaching; (row 3) the observation rubric is well suited for measuring instruction in my subject area(s); (row 4) the observation rubric is well suited for measuring instruction with the types of students I teach; (row 5) the people who observe my teaching are well qualified to evaluate it; (row 6) the observations are long enough to provide an accurate view of my teaching; (row 7) there are enough observations to provide an accurate view of my teaching; (row 8) I do extra preparation or planning for lessons that are going to be formally observed; and (row 9) the way I teach during formal observations is the same as the way I teach when I’m not being observed.

Figure C.3. Teachers’ Agreement with Statements About the Use of Student Achievement in Teachers’ Evaluations, Springs 2013–2016

NOTE: Statements in rows with bars missing for some years were not included in the survey administered in those years.

The figure shows the percentage of teachers agreeing (somewhat or strongly) with each statement, by site (HCPS, SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016). The statements are (row 1) I have a clear understanding of how student test scores are used to evaluate my performance; (row 2) the student tests used in my evaluation measure important skills and knowledge; (row 3) the student tests used in my evaluation are well aligned with my curriculum; (row 4) scores on the student tests used in my evaluation are a good measure of how well students have learned what I’ve taught during the year; (row 5) the ways that student test scores are used to evaluate my performance appropriately adjust for student factors not under my control; (row 6) the student tests used in my evaluation have room at the top for even the district’s/CMO’s highest-achieving students to grow; and (row 7) if I am an effective teacher, my students will show progress on standardized test scores in my subject area(s) during the time I am their teacher.

Figure C.4. Teachers’ Agreement with Statements About the Use of Student Feedback in Teachers’ Evaluations, Springs 2013–2016

NOTE: Significant (p < 0.05) differences between the 2016 percentage and other years’ percentages: For the statement in row 1, from 2013 to 2015, we saw a decline in Alliance and increases in Aspire and PUC. For the statement in row 2, PPS declined from 2013, Aspire increased from 2013, Green Dot declined from 2013 and 2014, and PUC increased from 2013 and 2015. For the statement in row 3, SCS declined from 2013, 2014, and 2015; PPS declined from 2013 and 2015; Alliance increased from 2014 and 2015; Aspire increased from 2013 and 2014; and PUC increased from 2013, 2014, and 2015. For the statement in row 4, Aspire declined from 2013 and 2015; Green Dot declined from 2015; and PUC declined from 2013 and 2014.

The figure shows the percentage of teachers agreeing (somewhat or strongly) with each statement, by site (SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016). The statements are (row 1) getting input from students is important to assessing teacher effectiveness (not asked in 2014 or 2016); (row 2) students are good judges of how effective a teacher’s instruction is; (row 3) I trust my students to provide honest, accurate feedback about my teaching; and (row 4) I worry that many students do not really understand the questions they are asked about their teacher or class.

Figure C.5. Teachers’ Agreement with Statements About Evaluation, Springs 2013–2016

NOTE: Significant (p < 0.05) differences between the 2016 percentage and other years’ percentages: For the statement in row 1, HCPS declined from 2014, SCS declined from 2013, Alliance increased from 2013, Aspire increased from 2013 and 2015, Green Dot increased from 2014 and 2015, and PUC increased from 2014 and 2015. For the statement in row 2, HCPS declined from 2013 and 2014, SCS declined from 2013 and 2014, and PPS declined from 2015. For the statement in row 3, HCPS declined from 2014, Green Dot increased from 2014, and PUC increased from 2014 and 2015. For the statement in row 4, SCS increased from 2014; PPS declined from 2013 but increased from 2014; Alliance increased from 2013, 2014, and 2015; Green Dot increased from 2013, 2014, and 2015; and PUC increased from 2013, 2014, and 2015. For the statement in row 5, HCPS increased from 2015, PPS increased from 2014, Alliance increased from 2014 and 2015, Aspire increased from 2014 and 2015, and PUC increased from 2014 and 2015. For the statement in row 6, HCPS increased from 2013; PPS increased from 2014; Alliance increased from 2013, 2014, and 2015; Aspire decreased from 2014 and 2015; Green Dot increased from 2014 and 2015; and PUC increased from 2013 and 2014.

The figure shows the percentage of teachers agreeing (somewhat or strongly) with each statement, by site (HCPS, SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016). The statements are (row 1) as a result of the evaluation system, I have become more reflective about my teaching; (row 2) the evaluation system has helped me to pinpoint specific things I can do to improve my instruction; (row 3) as a result of the evaluation system, I have made changes in the way I teach (not asked in 2013); (row 4) the evaluation system is fair to all teachers, regardless of their personal characteristics or those of the students they teach; (row 5) the evaluation system has been fair to me (not asked in 2013); and (row 6) the consequences tied to teachers’ evaluation results are reasonable, fair, and appropriate.

Figure C.6. Teachers’ Agreement with Statements About the Usefulness of Feedback from Evaluation Components, Springs 2013–2016

NOTE: Significant (p < 0.05) differences between the 2016 percentage and other years’ percentages: For the statement in row 1, HCPS increased from 2015, Alliance increased from 2013, Aspire declined from 2014 and 2015, and PUC increased from 2013 and 2014. For the statement in row 2, HCPS declined from 2014, Aspire declined from 2014, and PUC increased from 2013 and 2014. For the statement in row 3, HCPS declined from 2014, SCS declined from 2014 and 2015, and Aspire declined from 2014. For the statement in row 4, HCPS declined from 2014, SCS declined from 2014 and 2015, and Aspire declined from 2014. For the statement in row 5, SCS declined from 2013; Alliance increased from 2015; Aspire increased from 2014; Green Dot declined from 2013; and PUC increased from 2013, 2014, and 2015. For the statement in row 6, SCS declined from 2013 and 2015; Alliance increased from 2013, 2014, and 2015; Green Dot declined from 2013; and PUC increased from 2014. HCPS’s TE measure did not include student input, and, after 2014, Green Dot’s measure did not include student achievement.

The figure shows the percentage of teachers agreeing (somewhat or strongly) with each statement, by site (HCPS, SCS, PPS, Alliance, Aspire, Green Dot, PUC) and by year (springs 2013 through 2016). The statements are (row 1) after my teaching is observed, I receive useful and actionable feedback; (row 2) I have made changes in the way I teach as a result of feedback I have received from observers; (row 3) I receive useful and actionable data from the student tests used in my evaluation (not asked in 2013); (row 4) I have made changes in what (or how) I teach based on data from the student tests used in my evaluation (not asked in 2013); (row 5) I would consider making changes to my teaching based on feedback from my students; and (row 6) the student feedback results help me understand my strengths and weaknesses as a teacher.

Appendix D. Site Recruitment, Hiring, Placement, and Transfer Policies: Supplementary Material for Chapter Four

The descriptions in this appendix supplement the information presented in Chapter Four on recruitment, hiring, placement, and transfer policies. In HCPS, we describe placement in the section on transfer because the district does not place teachers involuntarily. In PPS and SCS, we describe both transfer and placement policies in the section on hiring because they are part of the hiring process. The CMOs have no centralized transfer and placement policies, and we note this in the descriptions. We describe the districts first, then the CMOs.

District Recruitment, Hiring, Placement, and Transfer Policies

HCPS

HCPS is a growing school district, and, as such, improving recruitment and hiring processes is a high priority. The district made two major changes during the initiative: beginning to hire earlier in the year and launching a new online application tool called AppliTrack.

Recruitment

HCPS recruits new teachers by a variety of means, including hiring fairs and social media campaigns throughout the state of Florida. The district holds hiring fairs specifically targeting difficult-to-staff schools and, during the initiative, began conducting outreach in Puerto Rico. To track the effectiveness of its recruiting efforts, HCPS entered into a partnership with TNTP, an organization that assists in collecting and analyzing recruitment data.

Screening

At the start of the initiative, HCPS used the Haberman tool to evaluate and screen teacher applications, and HR sent teacher applications to individual principals by paper mail, email, and fax. Starting in 2013–2014, the district began to design, then pilot, AppliTrack, a comprehensive hiring and recruitment platform that teacher candidates would use to upload resumes and portfolios. HCPS launched AppliTrack to all schools for new hires in 2015–2016. One key aspect of AppliTrack is the Teacher Fit tool, a survey based on HCPS’s TE rubric that all applicants complete. HCPS drops from consideration any applicant who scores in the first or second stanine. Applicants scoring in the third stanine are evaluated by HR for potential interviews, while applicants scoring in the fourth stanine or above are automatically deemed eligible for interviews and added to the hiring pool. School principals then use AppliTrack to review the eligible pool of candidates, schedule interviews, and track open positions.
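
The stanine-based routing rule described above can be sketched as a small function; the function name and return strings are illustrative assumptions, while the stanine cutoffs come from the text.

```python
def applitrack_disposition(teacher_fit_stanine: int) -> str:
    """Route an applicant based on his or her Teacher Fit stanine (1-9)."""
    if not 1 <= teacher_fit_stanine <= 9:
        raise ValueError("stanines run from 1 to 9")
    if teacher_fit_stanine <= 2:
        return "dropped from consideration"
    if teacher_fit_stanine == 3:
        return "reviewed by HR for a potential interview"
    return "eligible for interviews; added to the hiring pool"
```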


Hiring

HCPS began moving up its hiring timeline in 2013–2014 to give SLs more time to plan and to better compete with other area districts that started hiring earlier in the year. As of 2015–2016, the hiring process began six weeks earlier than it had before the IP grant, in April rather than in June.

Hiring for Hard-to-Staff Schools

To recruit teachers to lower-performing schools, HCPS offers special bonuses and other incentives for teachers who are willing to transfer or apply to these schools. There are two different incentive programs. One, which is funded through Title I, serves the Renaissance schools. Renaissance schools are the 50 schools in the district with the highest percentages of students who qualify for FRPL (at least 90 percent for elementary school, 85 percent for MS, and 75 percent for HS). Before 2014–2015, any teacher who worked at a Renaissance school received a bonus of 5 percent of base pay (2 percent of base pay for first-year teachers). Since 2014–2015, a teacher with one year or less of experience receives an annual bonus of $1,000; a teacher with two to ten years of experience receives $2,300 annually; and a teacher with 11 years or more of experience receives $3,600 annually. An NBPTS-certified teacher is eligible for an additional $4,500 annually. The second type of incentive applies to POWER3 schools, which a TIF grant funded starting in 2012–2013. HCPS offers each teacher at one of the 30 high-need schools a $1,000 hiring bonus and a $2,000 annual retention bonus. Only teachers with a TE rating of HE (4 or 5) are eligible for POWER3 bonuses. A new teacher with experience in another district is also eligible for the bonus if he or she can show a rating equivalent to HE that includes a student growth measure.
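
The Renaissance bonus schedule in place since 2014–2015 amounts to a simple tiered lookup. The sketch below is illustrative (the function name and the experience encoding are assumptions); the dollar amounts are those listed above.

```python
def renaissance_bonus(years_of_experience: int, nbpts_certified: bool = False) -> int:
    """Annual Renaissance-school bonus under the schedule in place since 2014-2015."""
    if years_of_experience <= 1:
        bonus = 1_000
    elif years_of_experience <= 10:
        bonus = 2_300
    else:
        bonus = 3_600
    if nbpts_certified:
        bonus += 4_500   # additional amount for NBPTS-certified teachers
    return bonus

print(renaissance_bonus(12, nbpts_certified=True))  # prints 8100
```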

Transfer

Teachers are free to request a voluntary transfer within HCPS, and such requests receive priority over new hires, except at Renaissance schools. HCPS requires principals to check the TE scores of prospective new hires, and principals can factor these scores into their hiring decisions. Seniority within HCPS is not considered as a factor. Transfer candidates apply directly to the schools where they want to work, and the schools make the final hiring decisions. The district does not place teachers involuntarily, except when enrollment declines cause positions to be eliminated. In those cases, HCPS gives the teacher the choice to stay at his or her current school (if enough positions remain) or transfer to an open position for which he or she qualifies, and those teachers are assigned in order of seniority. Beginning in 2014–2015, if two or more teachers have the same level of seniority, HCPS ranks them in order of TE scores to determine the order of placement.


PPS

PPS could not place much emphasis on changing the composition of the teacher workforce through recruitment, hiring, or placement in part because, as administrators explained, student enrollment in the district was declining throughout the period of the initiative, so the district hired very few teachers early in the initiative. In fact, because of budget shortfalls, PPS furloughed teachers at the end of the 2011–2012 school year. In addition, state laws and the collective bargaining agreement with teachers made it difficult to change many of these policies. Since the start of the IP initiative, PPS has not partnered with any alternative teacher certification programs (e.g., TFA, TNTP).

Recruitment

A teacher can enter the candidate pool in two ways: as an external candidate or as an internal candidate. Applicants who do not currently teach in PPS are external candidates; current teachers seeking new positions are internal candidates. For external candidates, Pennsylvania law requires PPS to consider only the top 10 percent of candidates in any certification area; teachers in the top 10 percent of applicants constitute the “eligible list” from which PPS can hire. PPS sets its own requirements for determining which teachers fall into this top 10 percent, and it has defined a 20-point application model.

As part of its IP proposal, PPS planned to implement “teacher academies” in two of its highest-need schools. The district intended the teacher academies to attract highly qualified candidates and provide them with on-the-job training, their teaching certificates, and positions teaching in the district once they completed the two-year residency program. Placing the academies in two of the district’s highest-need schools was intended to attract high-quality teachers to those schools. However, the district never implemented the teacher academies because it faced a budget shortfall and was going to have to furlough teachers. Academy resident teachers, as the least senior teachers in the district, would have been the first to be furloughed.

Although recruitment was not a focus of the initiative at first (the district downsized its workforce because of financial constraints), it has become more of a focus in recent years, and PPS has been trying to improve the racial and ethnic diversity of the teaching workforce by recruiting a more diverse applicant pool. Specifically, central-office staff reported that the district is working to recruit teacher candidates outside of Pennsylvania, including at historically black colleges and universities by visiting their campuses and attending job fairs.

Screening

An applicant who applies to PPS must fill out an application; provide a resume; complete three short essays (developed with TNTP’s help) to screen for grit, desire to work in an urban setting, and high expectations for all students; and complete the Gallup TeacherInsight survey. A teacher is awarded up to ten points for leadership and teaching experience, both overall and as a substitute in the district; up to five points for his or her three essays; and up to five points for his or her score on the Gallup TeacherInsight survey. The district has a cadre of teachers trained to screen applications based on score; the “eligible list” consists of teachers who are in the top 10 percent of applicants. The district does not use TE data in the recruiting, application, or screening process.
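
A minimal sketch of the 20-point scoring and top-10-percent eligible list described above follows; the function names are illustrative, and how PPS rounds the 10-percent cutoff is an assumption.

```python
def application_points(experience_points: float, essay_points: float,
                       teacherinsight_points: float) -> float:
    """20-point model: up to 10 points for leadership and teaching experience,
    up to 5 for the three essays, and up to 5 for the Gallup TeacherInsight survey."""
    assert 0 <= experience_points <= 10
    assert 0 <= essay_points <= 5
    assert 0 <= teacherinsight_points <= 5
    return experience_points + essay_points + teacherinsight_points

def eligible_list(scored_applicants: dict[str, float]) -> set[str]:
    """Top 10 percent of scored applicants in a certification area."""
    ranked = sorted(scored_applicants, key=scored_applicants.get, reverse=True)
    cutoff = max(1, round(0.10 * len(ranked)))   # rounding rule assumed
    return set(ranked[:cutoff])
```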

A teacher who is currently employed in the district and who received an S rating on the composite TE measure may enter the candidate pool as an internal candidate. A current teacher can opt to change schools voluntarily, or he or she can be displaced from a school involuntarily if his or her position is cut (e.g., for budget reasons), if the position was funded through supplemental funds that have expired, or if his or her date of hire was after August 1. When a school has to let teachers go—or displace them—involuntarily, the least senior teachers are displaced first.

Hiring

As part of its collective bargaining agreement, PPS must find jobs for its internal candidates before matching external candidates to open positions. The district considers internal candidates for positions between approximately March and June. Principals in all schools, including staffing-support schools, must interview internal candidates, and internal candidates must be matched with positions, before HR hires and assigns any external candidates.

The hiring process begins with the district HR office sharing the list of qualified candidates with principals, who then interview the candidates with their site-based selection teams, which usually include the principal and several teachers. After conducting interviews, principals submit their hiring preferences to the HR office, which extends offers to the internal candidates whom the principal prefers. HR does its best to use a “mutual-match” process and take principal and teacher preferences into account when making offers to candidates. When displaced teachers remain, HR assigns them to remaining positions by considering principal and teacher preferences and reviewing TE information. According to district policy, internal teacher candidates are placed in order of seniority (i.e., the most-senior teachers are placed first), but, in practice, the district also takes other factors, such as teacher and principal preferences, into account. The district must place all internal candidates before offering any external candidate a position. It is the district HR office’s responsibility to assign to available positions any internal teacher candidates who do not apply, are not selected, or decline offers of open positions.

Any remaining vacant positions that internal candidates do not fill are open to external applicants, and the external hiring process follows a similar order (minus seniority considerations), from about June through August. When hiring for the 2016–2017 school year, PPS offered an early hiring option to the most-qualified external candidates. It offered selected external candidates “tryouts,” in which each taught a 30-minute lesson to a panel of educators and students. The district made hiring commitments to those judged to be the most effective even though there was not yet an actual position available. The tryout process enabled the district to secure the most-qualified external candidates early in the year. The central office also shares TE information for internal candidates with the principals of staffing-support schools to help those principals recruit or interview specific teachers if they wish. Principals of non–staffing-support schools do not receive TE information from the central office about the applications they receive, but teachers can opt to share the information on their own.

Hiring for Hard-to-Staff Schools

PPS provides a range of support services to hard-to-staff schools, which it calls staffing-support schools, to enable them to hire and retain more-effective teachers. PPS identifies these schools based on low achievement and low scores on the district’s teaching and learning conditions survey. There were about 14 staffing-support schools in PPS in 2015–2016. To help staffing-support schools hire more-effective teachers, PPS employs several strategies:

• It offers teachers incentives—placement at a higher step on the salary scale than the typical new hire if they are hired into staffing-support schools.

• The central office provides TE data, if available, for all teachers planning to transfer internally, so principals can be strategic about which teachers they recruit to apply. TE information is provided only to staffing-support schools.

• Teachers with TE ratings of NI or F are not placed in staffing-support schools.
• Any teacher who applies for a voluntary transfer must visit a staffing-support school and meet with the principal and site-based selection team.
• Principals of staffing-support schools may interview external candidates if they do not receive any internal applicants.
• Hiring for the 2016–2017 school year for staffing-support schools began in January 2016, well before hiring for the rest of the district.

In addition, to improve retention of effective new teachers in staffing-support schools, such teachers can be exempted from the district’s August 1 rule. This rule, which is part of the collective bargaining agreement, requires that teachers hired after August 1 be automatically displaced at the end of the school year. In staffing-support schools, a teacher hired after August 1 can stay if the teacher and principal agree.

SCS

Before the IP initiative, some of SCS’s teacher staffing policies were not in alignment with the initiative’s goals. For example, placement of furloughed teachers was based on seniority, and the district had to place any furloughed teacher in a position before considering external candidates. SCS had limited ability to change staffing requirements that were subject to state law. During the initiative, SCS placed a great deal of emphasis on staffing (i.e., recruitment, hiring, and placement) strategies through its partnership with TNTP, which predates the IP initiative by several years. SCS’s partnership with TNTP began in about 2004, and, from 2004 through 2007, TNTP helped the district build an online recruitment and application system, establish systems to track vacant positions, and manage a teacher residency program. In 2010, after the start of the IP initiative, TNTP’s role expanded. From about 2012 through 2015, TNTP was responsible for filling teacher vacancies; thus, it managed much of the district’s recruitment, screening, hiring, and placement efforts. During this period, the district HR department handled compliance matters (e.g., checking licensure requirements, handling grievances). In 2015, hiring responsibilities were transitioned back to the district. According to district staff, TNTP staff resumed these duties again in 2016, at the request of district leadership.

Recruitment

In addition to its regular recruitment practices, such as attending teacher-hiring fairs and recruiting candidates from local teacher-preparation programs, SCS partnered with alternative certification programs (e.g., TFA, TNTP) prior to the initiative to recruit and prepare new-teacher candidates, and many of those partnerships continue. In 2010, when the district expanded its partnership with TNTP, TNTP focused on expanding the pool of teacher applicants by recruiting earlier in the year and expanding the reach of recruitment to out-of-region candidates. We did not ask specifically about diversity, and concerns about the racial and ethnic diversity of the teaching workforce did not come up in interviews with central-office staff.

Screening

SCS had a rolling application deadline (i.e., there was no cutoff date for applying; the district accepted applications year-round), and that process continued during the IP initiative. In the fall of 2010, right after the IP grant was awarded, the district asked each external applicant to complete a paper application and a phone interview. In 2011, TNTP moved the application process online. In the fall of 2013, TNTP implemented a new screening process, linked to the TEM rubric, for applicants new to the district. The process consisted of a phone interview in which the candidate was asked to review data and describe how those data would inform his or her teaching, while the interviewer rated the candidate’s responses on a rubric. TNTP staff referred the candidates with the highest scores on these interviews to principals first. In the fall of 2015, the process was refined further: Teacher applicants from other districts who had TEM scores of 3, 4, or 5 bypassed the screening process, and their applications went directly to principals for consideration.

Hiring

The process for filling vacant teaching positions usually follows these steps once a principal or teacher notifies HR or TNTP that a vacancy exists:

1. The district posts the available position on a rolling basis.
2. HR or TNTP provides the principal with a list of screened applicants.
3. The principal interviews applicants.
4. The principal submits the hiring choice to HR or TNTP.

Teachers can transfer within the district voluntarily (i.e., opt to move to a different school) or involuntarily (i.e., the district does not offer them positions for the next year at their current schools). According to central-office staff, the vast majority of transfers are voluntary. The rolling transfer period starts in February. The district posts and fills available positions, which are generally determined by the school budget or by a teacher request to leave (e.g., voluntary transfer, resignation, retirement), on a rolling basis.

Before the initiative, the district matched tenured teachers with open positions by seniority within their areas of certification. In practice, this meant that the lists of candidates that HR sent to a principal for consideration consisted of the most-senior teachers in that certification area. This process usually occurred in March and April. If there were no internal candidates for a position, the principal could consider external candidates.

The process changed with the passage of the RTT legislation in 2009. The state of Tennessee issued guidelines for placing teachers in open positions. The guidelines stated that districts should (1) use evaluations and student achievement data to place teachers, (2) base both reductions in force (RIFs) and recalls from furlough on teacher effectiveness, (3) avoid seniority as a determining factor in personnel decisions, and (4) strive for placements that have principal and teacher buy-in (mutual consent). As a result, starting in the fall of 2011, SCS no longer placed internal candidates in positions according to seniority. The district allowed, and TNTP facilitated, mutual-consent hiring, in which the principal and teacher had to agree that the position was a good fit. However, at this time, internal candidates who did not “match” with positions were assigned to one (i.e., forced into a placement).

This change in policy was intended to improve the match between teachers and schools and thus reduce teacher turnover. In the mutual-consent process, a principal could consider external candidates before all internal candidates had been placed if he or she could not find a “match” among internal candidates. In 2012, TNTP began projecting the number of vacancies anticipated in each grade and subject, along with the number of anticipated internal candidates. In grades and subjects that had more vacancies than there were internal candidates, a principal could hire his or her candidate of choice, without regard to internal status or seniority, until the number of vacancies approached the number of expected internal candidates. According to TNTP staff, during this period, predicting the number of vacancies was difficult. To help predict the number of vacancies, TNTP began surveying teachers about their plans for the subsequent year, specifically asking each teacher whether he or she planned to transfer or leave the district (e.g., retire or resign). Principals did not have complete flexibility in their choices of hire because the district had an obligation to provide positions to all internal candidates.

As of the fall of 2013, the hiring and placement preferences were relaxed further, becoming what one central-office interviewee described as “a free market.” A principal could select his or her preferred candidate without regard for internal status or seniority. Principals were no longer obligated to interview internal candidates before interviewing external candidates. The mutual-consent process was still in place, and the teacher and principal had to agree to the placement. Principals were expected to look at effectiveness data when making hiring decisions, internal candidates were no longer entitled to positions, and principals were no longer required to take teachers who did not match into positions. At this time, TNTP started conducting hiring fairs for internal candidates to help them network and find positions, a practice that continues as of the writing of this report. TNTP encouraged internal candidates to bring their effectiveness data to discuss with principals during interviews.

Hiring for High-Need Schools

Before the fall of 2012, the district’s high-need schools (Striving Schools or iZone schools) were subject to the same hiring policies as other schools. Starting in the fall of 2012, principals of these high-need schools were permitted to make staffing decisions based on TE data. iZone teachers with TEM scores of 3 to 5 were automatically retained in their positions because of their effectiveness scores, and, when hiring for open positions, a principal could select the candidate of his or her choice, without regard to internal status, from a pool of high-performing teachers. To be eligible to transfer to an iZone school, an internal candidate had to have received a TEM score of 4 or 5. If the candidate had a lower score, he or she needed to secure permission from HR. Also starting in the fall of 2012, the district offered any teacher hired into a high-need school either a signing bonus of $1,000 (payable at the beginning of the year) or a retention bonus of $1,000 (disbursed in two payments, one in December and one in May). High-need schools were also exempt from seniority-based layoffs and “surplussing.” As of the spring of 2013, the hiring period for high-need schools began one week earlier than it did for other schools. As of the spring of 2014, iZone schools had complete autonomy in hiring, could use performance data to make hiring decisions, and benefited from an earlier hiring timeline.
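
The iZone staffing rules described above reduce to two simple checks. The sketch below is illustrative; the function and argument names are assumptions, and the TEM thresholds are those given in the text.

```python
def automatically_retained_in_izone(tem_score: int) -> bool:
    """iZone teachers with TEM scores of 3 to 5 were automatically retained."""
    return 3 <= tem_score <= 5

def eligible_to_transfer_to_izone(tem_score: int, hr_permission: bool = False) -> bool:
    """An internal candidate needs a TEM score of 4 or 5 to transfer to an
    iZone school; with a lower score, he or she needs permission from HR."""
    return tem_score >= 4 or hr_permission
```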

Hiring for High-Need Positions

Starting in the spring of 2012, the district offered what it called open contracts for some high-need positions for which there were typically more vacancies than internal candidates (e.g., special education, English as a second language, MS science and math). The district would make a hiring commitment to a candidate before a specific vacancy was identified. The candidate would then be matched to a position when it became available.

CMO Recruitment, Hiring, Placement, and Transfer Policies

All the CMOs had some of the recommended recruitment and hiring policies in place prior to the introduction of the IP initiative. At all the CMOs, teachers apply online and receive a preliminary screening for credentials and security by the central office. Further screening at the central office or the school site varies by CMO. Principals can also recruit candidates directly (e.g., by getting recommendations from existing staff). Hiring authority rests with the principal, and teachers serve “at will”—that is, there is no tenure. Teachers wishing to change schools must apply along with any other applicants. Although there is no tenure, teachers and administrators alike expect that teachers will continue their employment. For example, each CMO asks each teacher each March to file a notice of nonintent to return if he or she does not plan to continue teaching the following year.

Generally, recruiting begins in March after the schools receive these notifications and know the number of positions they will need to fill. At the outset of the initiative in 2009–2010, when the California economy was depressed, there were large numbers of applicants. In the past few years, as the economy has improved, there has been severe competition from other schools and districts for good candidates. In response, the CMOs have instituted more-extensive recruiting strategies, including more social media outreach, partnerships with local colleges, and residency programs.

In addition, Green Dot reorganized its personnel department as part of the initiative. Before TCRP, the Green Dot HR department was responsible for all personnel functions, including posting positions online, conducting background checks, reviewing credentials, maintaining employee records, and processing payroll. To support the TCRP initiative, in the spring of 2011, Green Dot elevated the importance of staff improvement by creating an HC department within the education section of the organization; three staff members were assigned to work there, and a vice president was given supervisory responsibility. The HR department continued to focus on employee records, payroll, background checks, and similar functions, while the HC department focused on recruitment, retention, PD, transitions, and performance management.

In the following sections, we present details of each CMO’s recruitment, screening, hiring, transfer, and placement policies.

Alliance

Recruitment

Before the initiative, Alliance did very little centralized recruitment. Principals managed their own recruitment and selection processes and made all hiring decisions. After the initiative started, the Alliance central office began to recruit more extensively, including holding job fairs at universities and using LinkedIn for mathematics and science positions. In 2013–2014, the HR team grew, and, for the first time, the department hosted a career fair for teacher applicants to Alliance schools. TFA was another major source of recruits (e.g., Alliance hired about 35 TFA residents for 2012–2013 and 26 for 2013–2014). Starting in 2011–2012, Alliance established a residency program with Loyola Marymount University in mathematics and science as a recruitment source. The program had eight to ten residents in 2011–2012 and three residents in 2012–2013. The program was discontinued because of an insufficient number of available mentors. In 2014–2015, Alliance began an expanded residency program in conjunction with the University of the Pacific, and ten residents participated. The program proved to be too expensive and was discontinued for 2015–2016.

Screening

At the beginning of the initiative, the Alliance central office screened candidates for credentials and security only, and principals were responsible for any subsequent screening.


However, in 2012–2013, HR began recruiting and screening teachers to identify a pool of qualified applicants on which all schools could draw. The HR office reviewed teacher certifications, conducted phone interviews, and conducted interviews with a central-office committee. It placed candidates whom it deemed to be qualified into a pool of candidates that was available to school administrators. About 35 percent of principals made use of the service; the others continued to do their own screening.

Hiring

For most of its history, job postings for positions at Alliance were for individual schools, with the principal as the key contact person. Principals were responsible for scheduling, interviewing, and making hiring decisions. Principals also devised their own selection processes; there was no standardized hiring process across the CMO.

After Alliance selected a new CEO in 2015, the central office was restructured, and a new talent management division was created. The talent management division also includes offices responsible for educator effectiveness and new-teacher support, reflecting the organization’s increased concern about teacher selection and support. Alliance hired TNTP to review the organization’s HR functions, and TNTP suggested creating standard procedures for recruiting, selecting, onboarding, dismissal, and similar activities; the development of such procedures began in 2015–2016.

Despite these efforts to standardize the process, some of the responsibility remains decentralized. For example, principals are encouraged to offer stipends or signing bonuses for teachers in shortage fields, but the funds must come out of the school’s budget. Bonuses are not frequently offered. Similarly, HR does not provide any interviewer training for principals.

Transfer and Placement

There are no transfers of teachers per se in Alliance. Each school does its own hiring, and any teacher wanting to change schools has to apply to the new school and go through the hiring process for that school. Similarly, there is no centralized placement of teachers in schools at any of the CMOs.

Aspire

Recruitment

Aspire’s central office engages in a variety of recruitment efforts, including participating in job fairs, doing outreach to colleges, posting on employment websites (including EDJOIN), and holding open houses and interview days. A central-office administrator noted, “It’s a high priority to have teachers who share the same background and experiences as our kids.” That priority is seen in Aspire’s residency program, which began in 2010–2011, continues today, and has become a strong source of candidates; because Aspire eventually hires most residents, it is a strong source of teachers as well. When the program began, 35 percent of the participants were nonwhite; because of efforts to diversify, by 2015–2016, 70 percent were nonwhite, much closer to the racial and ethnic breakdown of the students Aspire serves. A student who is accepted into the residency program works with a mentor teacher for four days per week and attends classes one day per week. Each resident receives a $13,500 stipend while in training. The mentor teacher receives a $3,000 bonus, plus $500 to spend on personal PD. Table D.1 shows the number of participants in the residency program, by year, since its inception. A large percentage of the students who are accepted into the program complete it and are hired by Aspire.

Table D.1. Participants in the Aspire Residency Program

Status       Cohort 1      Cohort 2      Cohort 3      Cohort 4      Cohort 5      Cohort 6
             (2010–2011)   (2011–2012)   (2012–2013)   (2013–2014)   (2014–2015)   (2015–2016)
Accepted     20            18            34            29            38            54
Completed    18            17            28            27            33
Hired        18            17            23            25            33

Screening

All candidates apply online. The hiring process consists of a phone screening by an HR recruiter for “mission fit” and eligibility; then, HR refers eligible candidates to interested principals, who conduct additional phone screening before deciding whom they will invite for in-person interviews.

Hiring

Before a school does any interviewing, the HR department trains any school staff who will be involved in the process. A new principal also receives one day of training on good interviewing practices and HR tools that are available for interviewing.

The invited candidate participates in an interview at the school site with a panel that includes teachers, parents, and community members. The candidate delivers a sample lesson attended by the other teachers in his or her subject area, the principal or AP, and the regional superintendent. The principal has the final hiring authority.

Hiring for Hard-to-Fill Positions

Occasionally, a school might offer a stipend of up to $2,500 to attract a teacher to a hard-to-fill position. This most often happens when there is an urgent need to fill an interim position, such as to replace a teacher going on maternity leave. Beginning in 2013–2014, Aspire offered a $10,000 incentive for any HE or master teacher to move to a focus (low-achieving) school, but very few teachers took up the offer.


Transfer and Placement

The central office does not place teachers in schools at any of the CMOs. Each school does its own hiring. A teacher wanting to change schools must formally apply to another school and go through the same interview process as any other applicant. For example, when a charter lapsed at one Aspire school, Aspire did not guarantee teachers positions at a newly opening school. They had to apply and go through the interview process, and their TE ratings were taken into consideration in hiring.

Green Dot

Recruitment

Recruits apply online through TalentEd recruiting and hiring programs. The California economy was doing poorly at the start of the initiative. Relatively high unemployment meant that recruitment was not a challenge. For example, in 2011–2012, Green Dot had 3,000 applicants for 180 teaching positions. However, as the economy began to improve and the environment became more competitive, the ratio of applicants to positions decreased; in 2015–2016, Green Dot had only 892 applicants for 180 positions. Green Dot began to put substantial effort into upgrading its recruitment efforts to improve the quality of available candidates. In 2014–2015, it began a partnership with CSUDH to increase the preparation of teachers who might work in Green Dot schools. As part of the partnership, Green Dot staff members sit on the CSUDH panel reviewing candidates for the university teaching credential program, Green Dot offers PD to students preparing for teaching jobs (e.g., tips and workshops on successful interviewing techniques), and Green Dot staff participated in mixers to meet all of the CSUDH candidates. The CMO also began increasing the number of student teachers who were allowed to work in its schools, and the partnership organization took steps to ensure that student teachers were paired with HE teachers. To further advertise for Green Dot, the organization hired student ambassadors at targeted colleges to help with recruitment. To target its outreach to colleges that provided the best candidates, the CMO started collecting data from new teachers’ initial evaluations and linking them back to the colleges from which they were recruited. In 2015–2016, Green Dot increased its use of social media as a recruitment device, particularly to reach younger teachers. Despite all these recruitment efforts, Green Dot leaders believe that the candidates who are best suited to Green Dot are those whom other Green Dot teachers refer. To this end, Green Dot offers its teachers a $250 referral incentive if they refer a candidate who teaches in a Green Dot school for three months. In 2011–2012, the CMO began a residency program with Loyola Marymount University but discontinued it because of lack of funding.

Screening

Before the initiative, a principal would identify a potential candidate and then ask HR to review the candidate’s qualifications. After the reorganization in the spring of 2011, the HC department assumed responsibility for screening applicants and then made the information available to the principals. All applicants apply online, and HC identifies the eligible candidates through phone interviews aligned to the observation rubric. Green Dot also used the Haberman assessment but eliminated it in 2011–2012. Eligible candidates submit lesson plans, receive feedback, and participate in home office groups. Qualified candidates are placed into a pool from which principals can select appropriate candidates to interview for their schools. In 2015–2016, Green Dot explored ways to streamline the process for highly qualified candidates (e.g., TFA alumni who were known to the organization and tended to do well). For such applicants, the CMO skipped the phone screen; the applicant either did a mini demonstration lesson for the home office, received feedback, participated in a Socratic discussion, and then did a demonstration lesson at the school, or went directly to the school demonstration lesson.

Hiring

The school site typically asks each candidate to conduct a demonstration lesson and respond orally to teaching scenarios. The school’s hiring panel scores candidates’ responses, and the final hiring decision belongs to the principal. In 2015–2016, Green Dot explored ways to streamline the process for highly qualified candidates (e.g., TFA alumni). Candidates who had been through the TFA training tended to do well as Green Dot hires, so the CMO skipped the phone screen for such an applicant and either referred the applicant directly to the school to conduct a demonstration lesson or asked the applicant to conduct a mini–demonstration lesson for the HC department, receive feedback, and participate in a Socratic discussion; those who performed satisfactorily were referred to schools.

Transfer and Placement

There are no mandatory transfers of teachers in Green Dot. Each school does its own hiring, and any teacher wanting to change schools must apply to the new school and go through the hiring process for that school.

All the CMOs are expanding, and RIFs are rare. At Green Dot, the union contract describes the criteria to be followed in identifying teachers to be let go if a RIF becomes necessary and the principal and the affected department members cannot agree on who will be laid off. In that case, the CMO must rank the teachers in the affected department according to four criteria, with the following weights (a scoring sketch follows the list):

• 40 percent: status of credential
• 30 percent: average score of all evaluations
• 15 percent: educational attainment
• 15 percent: years of experience.
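
To make the weighting concrete, the following minimal sketch (in Python) computes a composite ranking from the four contract criteria for a hypothetical department. The 0–1 normalization of each criterion and the example scores are illustrative assumptions on our part, not Green Dot's actual scoring procedure.

# Minimal sketch, not Green Dot's actual system: rank teachers in an affected
# department by the contract's four weighted criteria. How each criterion is
# converted to a common 0-1 scale is an assumption made for illustration.
RIF_WEIGHTS = {
    "credential_status": 0.40,       # 40 percent: status of credential
    "evaluation_average": 0.30,      # 30 percent: average score of all evaluations
    "educational_attainment": 0.15,  # 15 percent: educational attainment
    "years_of_experience": 0.15,     # 15 percent: years of experience
}

def rif_composite(scores):
    """Weighted sum of criterion scores, each assumed to be pre-normalized to 0-1."""
    return sum(weight * scores[criterion] for criterion, weight in RIF_WEIGHTS.items())

teachers = {  # hypothetical, pre-normalized criterion scores
    "Teacher A": {"credential_status": 1.0, "evaluation_average": 0.75,
                  "educational_attainment": 0.50, "years_of_experience": 0.80},
    "Teacher B": {"credential_status": 0.5, "evaluation_average": 0.90,
                  "educational_attainment": 1.00, "years_of_experience": 0.40},
}
ranking = sorted(teachers, key=lambda name: rif_composite(teachers[name]), reverse=True)
print(ranking)  # ['Teacher A', 'Teacher B']  (composites of 0.82 versus 0.68)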

Green Dot places any teacher selected for layoff on a reemployment list for 12 months and offers that teacher any vacant position that meets his or her qualifications. If the teacher declines the offer, Green Dot removes him or her from the list.


PUC

Recruitment

PUC recruits at university job fairs, at job fairs hosted by PUC, at “meet and greets” at local universities, and through partnerships with UC Los Angeles (UCLA) Extension and the Claremont Colleges.

PUC started a mathematics and science residency program with Loyola Marymount University in 2010–2011 but had difficulties with the quality of the residents and the time commitment necessary for the teacher mentors; it discontinued the program.

In 2014–2015, PUC began a “grow-your-own” residency program with Loyola Marymount University for graduates of PUC HSs and people who had worked for the schools in such roles as teacher aides. The program started with two residents in 2014–2015, and seven residents completed it in 2015–2016; all but one accepted positions at PUC. Although we never asked PUC staff directly about diversity issues, having a residency program that draws on graduates of PUC’s own schools allows the CMO to hire teachers who match its students in terms of ethnicity and socioeconomic background.

Screening

Before the IP initiative, candidates applied online through EDJOIN or at job fairs. In 2013, PUC implemented its own online platform, which includes a tracking system called ClearCompany that allows SLs to track where a candidate is in the process. HR screens candidates to make sure they have the appropriate credentials and experience and then uploads the applications for principals to review.

Hiring

Hiring is coordinated centrally, but individual principals make the final decisions. HR holds two interview days per week, during which panels of candidates are interviewed. Principals attend if they are interested in the set of candidates being interviewed. This approach allows principals to see, at one time, all the candidates qualified for a particular position (e.g., science teacher). Principals, APs, and teachers can attend the interviews. The next day, the candidate returns and delivers a 30-minute demonstration lesson, which the students in the class rate. In 2015–2016, PUC changed the hiring process so that it could make offers more quickly, interviewing a candidate and hosting the candidate’s sample lesson in a single day.

Transfer and Placement

All movement between schools is voluntary. Teachers who want to change schools must apply for any openings. There are no monetary incentives to move to high-need schools, but, if a high-need school needs veteran teachers, a teacher might get a leadership opportunity, such as serving as department chair. This has occurred very rarely.


Appendix E. Site Tenure and Dismissal Policies: Supplementary Material for Chapter Five

The descriptions in this appendix supplement the information on tenure and dismissal policies presented in Chapter Five. We first describe the districts, then the CMOs.

District Tenure and Dismissal Policies

HCPS

HCPS offered tenure until July 1, 2011, when the Florida state legislature passed a law abolishing tenure for newly hired teachers. (Under the new law, teachers who had already earned tenure retained tenured status.) From 2011–2012 until 2015–2016, HCPS offered any newly hired teacher nonprobationary status after three years of satisfactory performance as a probationary teacher and a fourth-year appointment to a teaching position. HCPS granted nonprobationary teachers protections similar to those provided under tenure in the past. HCPS is considering changes in its approach to defining probationary and nonprobationary status beginning in the 2016–2017 school year.

After a rating of U or NI, a teacher is required to participate in an assistance plan aligned with the teacher-evaluation rubric that includes target dates for showing improvement. In general, teachers who are placed on assistance plans and do not show marked improvement are not renominated for teaching positions (i.e., not offered teaching contracts for the subsequent year). In most cases, these teachers are counseled into nonteaching positions, such as assistant teachers, rather than dismissed from the district outright.

Of the 152 teachers not in probationary status and eligible for assistance plans in 2015–2016,

• 135 stayed in teaching positions and went on the assistance plan. Of these, eight were in Deferred Retirement Option Program status, which means that they planned to retire within the subsequent five years.

• 12 were terminated or retired.
• five were demoted to assistant teaching positions.

PPS

A PPS teacher must complete six semesters (three years) of satisfactory performance to earn tenure. Teachers are pretenure during this three-year period. This policy predates the IP reforms, but the definition of satisfactory performance changed with the IP reforms. From 2010–2011, the year PPS implemented the RISE teacher-evaluation rubric and observation process, through 2012–2013, satisfactory performance consisted of RISE ratings of B, P, or D. When PPS
implemented the combined TE measure in 2012–2013, satisfactory performance consisted of a TE rating of D, P, or NI, if it was the first NI rating. U performance consisted of a TE rating of F or two NI ratings in the same certification area in ten years; two NI ratings equals one F rating. Even though PPS based the definition of U performance on ratings on the combined measure, teachers who are eligible for tenure typically do not have all the TE measures. They are generally missing the measures of school and individual value added because those measures are based on multiple years of data. Therefore, the combined measure rating used for tenure decisions generally consists of observation and Tripod scores. At the end of the 2014–2015 school year, 52 teachers (out of a cohort of 109) were eligible for tenure, and all 27 received tenure; about half left the district prior to the tenure decision year.
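
The rating logic just described can be summarized in a short sketch (Python); the rating codes and the form of the rating history are our own simplifications for illustration, not PPS's actual data system.

# Minimal sketch of the rule described above, not PPS's actual system.
# D or P is satisfactory; a first NI is satisfactory; an F, or a second NI in the
# same certification area within ten years (two NIs equal one F), is U.
def pps_performance(current_rating, prior_ni_years, current_year):
    """Classify a combined TE rating as 'satisfactory' or 'U'.

    prior_ni_years: years of earlier NI ratings in the same certification area.
    """
    if current_rating in ("D", "P"):
        return "satisfactory"
    if current_rating == "F":
        return "U"
    if current_rating == "NI":
        ni_within_ten_years = [year for year in prior_ni_years if current_year - year < 10]
        return "U" if ni_within_ten_years else "satisfactory"
    raise ValueError("unknown rating: " + current_rating)

print(pps_performance("NI", [], 2016))      # satisfactory (first NI)
print(pps_performance("NI", [2010], 2016))  # U (second NI within ten years)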

Before 2011–2012, a principal could place a teacher on an improvement plan at his or her discretion, based on an observation. During the 2011–2012 and 2012–2013 school years, PPS provided additional support for low-performing teachers through the employee improvement plan (EIP) process. Teachers who did not perform well on the RISE process in 2010–2011 were required to participate in the EIP process starting in the fall of 2011. Although there were no clear criteria for placing teachers on EIPs, teachers who were low performers (U or low B) were asked to participate. When the combined TE measure was implemented in 2012–2013, PPS required any PPS teacher who received a rating of NI or F in the previous year to participate in an intensive support plan, which replaced the EIP process. PPS first used 2012–2013 ratings to identify teachers for intensive support in the fall of 2013. In 2015–2016, 55 teachers participated in intensive support (23 pretenured and 32 tenured).

Tenured teachers who receive U TE ratings for two consecutive years are eligible for dismissal based on poor performance. Of the 55 teachers who participated in intensive support during 2015–2016,

• five did not receive ratings because they were on long-term leaves of absence
• three retired
• six resigned
• 11 teachers received second U ratings for which they could have been dismissed; ten of these were reassigned, and one is in the dismissal process
• 27 improved
• three received second negative ratings but were not dismissed because these were not their second consecutive U ratings; they all participated in intensive support the next year.

SCS

Before July 2011, completion of six semesters (three years) of satisfactory performance was required to earn tenure. Since then, SCS has required teachers to meet all of the following conditions:

• Hold a bachelor’s degree from an approved college or a two-year degree with equivalent training.
• Possess a teacher’s license that is valid in Tennessee.
• Complete a probationary period of five school years, or not less than 45 months, with the last two years employed in a regular (rather than interim, such as substitute) teaching position.

• Receive scores of 4 or 5 on the combined TEM in the last two years of the probationary period.

• Receive an offer of employment at the conclusion of the probationary period.

A teacher who earned tenure on or after July 1, 2011, can return to probationary status based on poor performance if he or she receives TE ratings of 1 or 2 on the combined measure for two consecutive years. Therefore, to maintain tenure, a teacher must earn a rating of 3, 4, or 5 on the combined measure. (Teachers who earned tenure earlier could not return to probationary status.)
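
A compact sketch (Python) of the post–July 2011 eligibility conditions listed above and the tenure-maintenance rule follows; the field names are illustrative assumptions rather than SCS data elements.

# Minimal sketch of the SCS rules described above; field names are illustrative.
def earns_tenure(has_degree, has_tn_license, probation_months,
                 last_two_years_in_regular_position, last_two_tem_scores,
                 offered_employment):
    """Post-July 2011 eligibility: all conditions must hold (at least 45 months of
    probation, roughly five school years)."""
    return (has_degree
            and has_tn_license
            and probation_months >= 45
            and last_two_years_in_regular_position
            and all(score in (4, 5) for score in last_two_tem_scores)
            and offered_employment)

def returns_to_probation(last_two_combined_ratings):
    # Tenure earned on or after July 1, 2011: two consecutive ratings of 1 or 2
    # send the teacher back to probationary status.
    return all(rating in (1, 2) for rating in last_two_combined_ratings)

print(earns_tenure(True, True, 60, True, [4, 5], True))  # True
print(returns_to_probation([2, 3]))                      # False: only one low rating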

Ratings for 2011–2012 were first used in 2012–2013 to identify teachers for improvement plans, called professional learning plans (PLPs). Any teacher who received a rating of 2 or lower on two or more of the seven observation indicators or who had an overall TEM score of 1 or 2 was recommended for a PLP, in which the principal and teacher are supposed to plan a course of PD and feedback to help the teacher improve during the course of the year. If the teacher does not improve during the year, he or she is generally counseled out at the end of the year. In addition, teachers with low scores on the observation measure are recommended for initial coaching conversations at the beginning of the subsequent school year and encouraged to develop PLPs with their principals to help them improve, but, for such teachers, PLPs are not required.
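
The PLP trigger can be expressed as a simple check; the sketch below (Python) uses an assumed data format and is an illustration only, not SCS's system.

# Minimal sketch of the PLP recommendation rule described above.
def recommended_for_plp(indicator_scores, overall_tem_score):
    """indicator_scores: the seven observation-indicator ratings (scales assumed 1-5)."""
    low_indicators = sum(1 for score in indicator_scores if score <= 2)
    return low_indicators >= 2 or overall_tem_score <= 2

print(recommended_for_plp([3, 4, 2, 5, 3, 4, 3], overall_tem_score=3))  # False: one low indicator
print(recommended_for_plp([2, 2, 3, 4, 5, 3, 4], overall_tem_score=3))  # True: two indicators at 2 or lower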

Principals are responsible for recommending teachers for nonrenewal (dismissal), and HR must uphold the recommendation. In SCS, nontenured teachers are on year-to-year contracts and can simply be “not renewed” at the end of the year. If a teacher is not renewed at a specific school, he or she can search for employment at another school, but the district does not guarantee a position in the district. A teacher who received tenure before July 1, 2011, cannot be dismissed for poor performance; he or she can be dismissed only for cause (e.g., insubordination).

CMO Tenure and Dismissal Policies

None of the CMOs offers tenure. Teachers are hired by the principals at the schools where they will be teaching. Their employment is at will, and they are retained or dismissed at the discretion of their principal. Teacher-evaluation results do not automatically trigger dismissal or review. However, both evaluative data and the input of CMO central staff are taken into consideration. The principal has final authority to retain or dismiss. Very few teachers are dismissed midyear; typically, they are just not rehired for the subsequent year.

Generally, CMO SLs visit teachers’ classrooms frequently, particularly those of new teachers, and they identify poor teacher performance using these “pop-in” visits, the formal and informal observations that are part of the evaluation cycle, and data on student performance. Typically, when a teacher is seriously struggling to perform acceptably, the principal will consult with his
or her area superintendent and the HR department and will document evidence of the teacher’s deficiencies and the measures taken to assist the teacher to improve, such as placement on an improvement plan. Depending on the CMO, teachers are usually given 30 to 45 days to show improvement, then possibly an additional 30 to 45 days if necessary. Teachers on improvement plans typically are observed more often (although not as part of the observation evaluation score), given more feedback, and given more coaching that can include coteaching or coplanning with them. Teacher-evaluation results do not automatically trigger placement on an improvement plan. A principal can choose a less formal approach to assisting a teacher (e.g., recommending that the teacher observe another teacher) because improvement plans require more of a principal’s time. The principal considers observation results, student assessments, and stakeholder survey results when deciding how to support a teacher.

Because Green Dot has a teachers’ union, the process for placement on an improvement or development plan and for termination is defined in the union contract. The contract states that a teacher with less than two years of service can be placed on a development plan (the first phase of an improvement plan) after two informal observations and debriefs showing two or more indicators with ratings of 1.0. A teacher with two years or more of service who averages less than 2.0 after any formal observation can be placed on a development plan. In 2015–2016, Green Dot piloted a change under which a veteran teacher could be placed on a development plan if his or her summative observation score was less than 2.0 or if, in the past two consecutive years, he or she received fall-semester observation average scores between 2.0 and 2.3.
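
As a rough illustration of these contract triggers, the following Python sketch applies the thresholds described above; how observation data are recorded, and the treatment of the 2015–2016 pilot, are our own simplifying assumptions, not Green Dot's actual system.

# Minimal sketch of the development-plan triggers described above.
def eligible_for_development_plan(years_of_service,
                                  low_indicator_counts_per_informal_obs,
                                  formal_obs_average,
                                  summative_score=None,
                                  fall_averages_last_two_years=None,
                                  use_2015_16_pilot=False):
    if years_of_service < 2:
        # Two informal observations/debriefs, each showing two or more indicators rated 1.0.
        flagged = [count for count in low_indicator_counts_per_informal_obs if count >= 2]
        return len(flagged) >= 2
    if formal_obs_average < 2.0:  # two or more years of service
        return True
    if use_2015_16_pilot:
        if summative_score is not None and summative_score < 2.0:
            return True
        if fall_averages_last_two_years and all(
                2.0 <= average <= 2.3 for average in fall_averages_last_two_years):
            return True
    return False

# A first-year teacher with two informal observations, each flagging two or more indicators at 1.0:
print(eligible_for_development_plan(1, [2, 3], formal_obs_average=2.5))  # True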

The development plan requires all of the following:

• areas of growth in which specific improvement is needed, along with supporting evidence
• specific expected outcomes for improvement
• supports and resources to be utilized to assist with the improvement
• the means by which improvement will be measured.

If, after 45 days, the teacher does not make sufficient improvement, Green Dot can place him or her on an improvement plan for another 45 days. If the teacher still does not show sufficient improvement, Green Dot can terminate or not rehire him or her for the following year.


Appendix F. Site PD Policies: Supplementary Material for Chapter Six

The descriptions in this appendix supplement the information presented in Chapter Six on PD policies. We first describe the districts, then the CMOs.

District PD Policies

HCPS

HCPS has long prided itself on offering a robust menu of PD options to its teachers, including both school-wide and individualized training. Although HCPS offered a wide range of PD opportunities before the IP initiative, they were not explicitly linked to teacher evaluation. The primary change in the provision of PD at HCPS during the initiative was to link the content of PD programs to the components of the newly developed classroom-observation rubric. HCPS accomplished this connection in three ways: by offering training on the rubric itself, by updating the online web platform to match PD offerings with rubric components, and by providing mentoring for new teachers.

In 2011–2012, HCPS created a seven-hour training program to educate teachers on the components of the new TE measure, particularly the classroom-observation rubric. It provided this program via multiple routes: a two-evening workshop, a daylong Saturday workshop, a school-wide in-service, and video. The site offered the seven-hour training to teachers throughout the initiative.

During the initiative, most teachers at HCPS accessed PD through an online web platform launched in April 2012 that lists PD options by observation rubric component. A teacher could choose PD based on the areas in which he or she needed the most development according to his or her scores, and peer evaluators, mentors, and principals and APs recommended specific PD programs based on the teacher’s classroom evaluation scores in a postevaluation conference held after each formal observation. Teachers on improvement plans also had access to coaches, who could recommend specific PD as well. The district created what it referred to as “look-for” lists, to help teachers use their evaluation scores to guide them toward the PD offerings providing the most benefit. The district also looked at general trends across teacher-evaluation rubric scores across the district to decide which PD programs to offer and highlight through the website. The website included options for online-only, face-to-face, and hybrid PD programs. Moodle was the primary platform for online-only PD both before and during the initiative.

Starting in 2010–2011, HCPS assigned each new teacher (a teacher in the first two years at HCPS, with less than six months’ prior teaching experience) a mentor, as described in Chapter
Two. Mentors met with new teachers weekly for coaching and debriefing, and they kept records of their mentees’ PD. PD for new teachers was aligned with the state’s new-teacher development program (the FEAP), and, in 2011–2012, the site developed specific courses linked to the TE rubric as part of TIP, HCPS’s two-year PD course for all new teachers with no previous teaching experience.

During 2015–2016, the district began a shift toward offering more in-school PD, with the goal of leveraging in-house expertise on the TE system and best practices to deliver PD to teachers and to craft PD programs that would address the needs of the teachers at each individual school. Principals attended training on looking at teacher-observation data and using the data to create these types of specialized PD programs for their schools. A few schools began offering this type of PD in 2015–2016, and HCPS planned to expand such offerings to all its schools during the 2016–2017 school year.

During the initiative, HCPS encouraged PD for all teachers, but, for veteran teachers, it was not required (except for a minimum amount for state recertification every five years), even for teachers on development plans. Section 4 of the classroom-observation rubric contains a part in which the principal can note a teacher’s PD, but principals were not required to monitor teacher participation in PD, apart from completing the rubric (although some did).

PPS

Before the initiative, in general, PPS did not use TE data to inform PD options or recommendations for teachers. This changed in 2010, when it implemented the RISE observation process, which included postobservation feedback. From 2010 through 2013, PPS used RISE data to place low-performing teachers on EIPs, which were structured plans for professional growth approved by the teacher and the principal. Starting in June 2013, when PPS implemented the composite TE measure, teachers received general suggestions for PD they could pursue as part of the package of information they received along with their TE scores. Teachers who scored at the P or D level were expected to identify specific PD opportunities and pursue them independently. The TE data were supposed to inform teachers’ PD planning, but the district did not monitor this in any systematic way. Teachers who scored at the F or NI level were put on structured PD plans (called intensive support) that were approved and supervised by the principal. Informal feedback and coaching through the regular observation and feedback process (which the district called the RISE process) continued as well.

Throughout the initiative, PPS teachers had access to several types of support:

Coaching

PPS expected principals and other observers (e.g., ITL2s) to coach and give feedback to teachers as part of the RISE process. According to our interviews with principals, teachers, and central-office staff, this feedback covered everything from specific instructional strategies (e.g.,
questioning techniques) to more-general topics, such as classroom layout. In addition, LESs provided coaching to teachers struggling with classroom management.

Induction and Ongoing Support for New Teachers

Pennsylvania requires completion of a new-teacher induction program for level II certification, and PPS had made completing such a program a prerequisite for the tenure milestone, so it is something the district has always provided. Most of the teachers who have participated in PPS’s induction program are in their first or second years of teaching. Before the initiative, this orientation lasted two to three days; during the initiative, it was expanded to about two weeks before the beginning of the school year. PPS implemented this two-week induction program from 2010–2011 through 2013–2014. In addition, during the initiative, PPS planned to provide ongoing mentoring and support for new teachers, but it did not implement these programs systematically. In 2014, PPS hired a full-time coordinator and coach for the district’s new-teacher support efforts. The new coordinator adjusted the timing of the induction courses so they occurred throughout the school year rather than before the start of school. According to central-office staff, PPS made this change to accommodate teachers who were hired after the start of the school year. As of 2014–2015, the content of the induction course included a series of face-to-face seminars that focused on the RISE rubric, teachers’ specific content areas, and networking with more-experienced colleagues; online courses on topics that included fostering a positive classroom climate and culturally responsive pedagogy; and Beyond Diversity training on raising awareness of race-based inequity. The new coordinator also provided periodic coaching (differentiated according to need) to every teacher in his or her first year, connected new teachers with more-experienced teachers for peer-to-peer coaching, and coordinated the induction program.

District-Provided Large-Group Sessions

PPS provided large-group PD sessions for teachers before the initiative and continued to do so throughout the initiative. These daylong sessions occurred about four times per year and, according to teachers, typically covered grade-level curriculum content and district administrative matters.

PD Provided at School Sites

Each principal provided large- and small-group PD sessions for his or her teachers at the school site before the initiative and continued to do so throughout the initiative. According to principals and teachers we interviewed, for the first several years of the initiative, many of these sessions were training sessions devoted to the RISE rubric. For example, each school sent representatives (called RISE teams) to receive training on RISE from the district; these RISE teams would then train the teachers in their schools. As the initiative progressed, principals we
interviewed told us, some of these sessions addressed components of the RISE rubric for which principals’ observations indicated support was needed.

Resources for Individual Use

At the beginning of the initiative, PPS planned to provide resources—generally online—which teachers could opt to access and use for their individual development; such resources were not available before the initiative. In the fall of 2011, PPS implemented an online platform it called the Learning Bridge, which organized and presented a variety of articles, videos, a catalog of resources linked to the Tripod survey, and other resources that teachers could choose to access and that were designed to help them improve their instruction. According to the central-office staff we interviewed, PPS struggled to develop or vet appropriate, high-quality content, and teacher use of the Learning Bridge was low. In 2013, PPS switched platforms and implemented BloomBoard, an online platform that enabled PPS staff to organize resources by RISE component and that came with a built-in library of resources. Low teacher use of BloomBoard remained a challenge in 2016, according to central-office staff. This is consistent with what we heard in teacher interviews; few teachers reported using it.

SCS

Before the initiative, PD for legacy MCS teachers was entirely online; teachers completed their required number of PD hours individually through online courses. According to interviews with the central-office staff, the district did not use TE data to inform PD recommendations for teachers before 2011. SCS began using observation data to identify teacher development needs in 2011–2012, the same year it adopted its effectiveness measure. Teachers who were performing at acceptable levels (i.e., 3, 4, or 5 out of 5) were provided with periodic observer feedback and encouraged to seek additional development opportunities on their own. Teachers who received low scores (i.e., 1 or 2 out of 5) on two or more rubric components were encouraged to seek PD designed to help them improve in those areas.

After the MCS–SCS merger, in July 2013, the district’s emphasis for PD shifted to one-on-one coaching. As of the fall of 2013, central-office staff told us that there were four “tiers” to the SCS teacher support model: (1) large-group coaching from PIT crew and PD staff on issues or topics concerning large groups of teachers (e.g., Common Core or new-teacher induction); (2) in-school team-based learning through PLCs, organized by the principal and led by the PLC coach; (3) support for struggling and new teachers in the form of job-embedded coaching support from HE educators (i.e., learning coaches, master teachers, PAR CTs, and PIT crew); and (4) self-directed independent study opportunities available to all teachers through the video library (in PD 360, the district’s PD repository before My Learning Plan; videos are aligned with rubric standards) and through the Teachscape video-capture and reflective practice process, in which the teacher video-records lessons and discusses the videos with a coach. In interviews, SCS’s central-office
staff described this as a shift in approach to PD from a centralized “sit-and-get” model to one in which teachers received differentiated support through the coaching model or through individual, self-directed study; SCS expected each teacher to be responsible for his or her own professional learning.

SCS teachers had access to a variety of PD opportunities throughout the initiative:

Coaching

In 2013–2014—the year after the merger—SCS adopted legacy SCS’s coaching model (called tiered coaching) as a means of ensuring that struggling teachers received some coaching support. SCS required any teacher who received a score of 1 or 2 on more than two rubric indicators in a given observation to work with a coach for about six weeks, after which the teacher would be observed again. Teachers who did not show improvement with this approach were referred to increasingly intensive coaching with more expert teacher coaches in their schools and then to full-time coaches who served larger regions. This system was in place until the end of the 2014–2015 school year. PAR CTs supported struggling veteran teachers in some schools starting in 2013 and continuing as of the writing of this report. PLC coaches (2013 and ongoing) provided additional school-based coaching support in their buildings. SCS shifted its coaching strategy in 2015, according to interviews with central-office staff, away from using TEM data to identify the lowest-performing teachers for coaching to using TEM data to help identify teachers who would potentially benefit the most from coaching support (i.e., teachers who were likely to grow the most and who were most receptive to coaching). In 2015, the district also implemented subject-specific coaching in mathematics and literacy. From 2011 through 2013, SCS also offered real-time coaching in a few schools to teachers who were willing to participate. The participating teacher would wear an earpiece and receive real-time coaching advice from an observer standing in the back of the classroom.
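
The referral threshold and the escalation it triggered can be sketched as follows (Python); the tier names and data format are our assumptions for illustration, not SCS's actual system.

# Minimal sketch of the tiered-coaching referral rule described above.
def triggers_coaching(indicator_scores):
    """A score of 1 or 2 on more than two rubric indicators in a single observation."""
    return sum(1 for score in indicator_scores if score <= 2) > 2

# Assumed escalation order for teachers who do not improve after roughly six weeks of coaching.
ESCALATION = ["initial coach", "expert teacher coach in the school", "regional full-time coach"]

def next_tier(current_tier):
    position = ESCALATION.index(current_tier)
    return ESCALATION[min(position + 1, len(ESCALATION) - 1)]

print(triggers_coaching([2, 1, 2, 4, 5, 3, 4]))  # True: three indicators scored 1 or 2
print(next_tier("initial coach"))                # expert teacher coach in the school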

New-Teacher Mentors

As of the fall of 2012, SCS paired any teacher new to the district and to teaching with a veteran teacher mentor for the new teacher’s first year. The mentor worked with the mentee face-to-face every month for coaching and support and discussed the mentee’s progress with the principal. As of the fall of 2015, coaching was no longer part of the mentor teacher’s role; instead, the mentor’s support was more about how to navigate the PD system and where to access instructional resources. As far as we know, the new-teacher mentor program was ongoing as of the writing of this report.

District-Provided Large-Group Sessions

SCS offered district learning days, large-group sessions that all district teachers were invited to attend, about three times per year before and during the initiative. As of the summer of 2016, the
district began offering TEM “deep-dive” sessions, which teachers could choose to attend over the summer.

PD Provided at School Sites

As of the fall of 2013, SCS expected principals to offer PLC PD sessions in their schools. In most schools, the principal developed the session and a PLC coach led it. According to our interviews with central-office staff, these PLC sessions were supposed to offer teachers the opportunity for team-based learning in their schools. Principals were also responsible for organizing PD sessions to orient teachers to the evaluation system.

Resources for Individual Use

SCS teachers had access to numerous resources for individual use throughout the initiative. One of the first, developed in 2011, was a handbook called Resource Book (Whitney et al., undated), an online and printed listing of PD resources (e.g., videos, articles, lesson plans, in-person PD sessions) and a crosswalk so that teachers could easily identify which resources were relevant to which rubric components. This handbook was in use until the merger in 2013.

Starting in the fall of 2012, SCS began to build a video library in the Teachscape platform, which provided PD opportunities in two different ways. First, teachers could access exemplar videos of SCS teachers. Second, for the teachers whose teaching was featured in the videos, the creation of the videos offered an opportunity to review and reflect on their practice.

After the 2013 merger, teachers could access some independent study resources online via the Learning Loop (later called My Learning Plan), an online resource available to teachers; principals recommended resources (e.g., videos, readings, example materials) based on observation scores. Principals could also use these resources as a starting point for discussion during postobservation conferences.

These opportunities were generally available to all teachers; in addition, the district’s most-struggling schools (known as iZone schools) had their own, supplemental PD resources not available to other schools.

CMO PD Policies

The amount of PD that the CMOs offered changed very little from before the initiative: a few days of CMO-wide sessions, weekly school sessions, and some content-focused half-days. A main driver of PD has always been and continues to be student assessment results, but, for the first three years of implementation of the IP initiative in the CMOs (school years 2010–2011 through 2013–2014), the CMOs spent a good deal of time on the instructional strategies embodied in the observation rubric. With the advent of the Common Core State Standards in 2012–2013, the focus shifted to implementing curricula to meet those standards.

All of the CMOs offered several CMO-wide PD days each year; these typically included sessions with a subject-matter focus and sessions targeted at rubric indicators. In addition,
schools conducted weekly PD sessions for one to two hours. The agenda and format for the school sessions varied by CMO. At Alliance, they were usually directed by the principal; at Aspire and PUC, school instructional teams planned the sessions; and, at Green Dot, the central office set the subject for a few of the sessions and the school team planned the rest.

Except for Aspire, the CMOs did not have central-office coaching staff before the initiative, but they all developed coaching staffs during the initiative. All of the CMOs separated coaching from evaluation and focused coaching on teacher development; consequently, coaches did not have access to observation results. Typically, the principal and teacher set a series of goals based on observation results, and coaching addressed those goals. However, in most of the CMOs, much of the coaching was subject matter–based. Because the CMOs are relatively new organizations and attract many young teachers, many of their new teachers had to participate in state-required induction programs to “clear” their preliminary credentials. Teachers could select from a variety of induction programs. Those who enrolled in BTSA also received one-to-one coaching, on average, for one hour each week from their BTSA coaches. Two of the CMOs—Aspire and PUC—had their own BTSA coaches.

With the launch of the observation process in 2011–2012, PD focused on acquainting teachers with the rubric, and the CMOs provided PD sessions on specific indicators. In 2012–2013, the Common Core standards replaced the California Instructional Standards, and, the next year, the CMOs began planning for the transition to a new state assessment aligned with the Common Core. The PD emphasis shifted to the Common Core, although with attempts to highlight links to the observation rubric indicators. As one Alliance administrator said in the fall of 2014, rubric-linked instructional strategies “are part of trainings, but we don’t take the matrix and say ‘this training is focused on this module of the TCRP.’” Several CMOs created crosswalks detailing the links between Common Core standards and the CMOs’ observation frameworks (e.g., Aspire updated its Aspire Instructional Rubric guides for each of the observation rubric indicators to show the link with the Common Core).

Alliance

Before the initiative, Alliance held several days of Alliance-wide PD sessions every year. Each school held its own weekly PD session, with the agenda at the principal’s discretion. The central office did not provide coaching. In this section, we describe Alliance’s PD practices during the initiative.

Use of TE to Drive PD

The instructional branch of Alliance, which designed most of the Alliance-wide PD, used student performance data and teacher input to design PD sessions for the Alliance-wide PD days. Common Core standards and assessments played a major role. According to central-office staff, only about 10 percent of PD focused directly on the rubric indicators, reflecting the instructional
staff’s focus on assessment results. Observers also used observation results to provide feedback to teachers.

Content

School-based PD sessions were at the principal’s discretion through 2015–2016. In 2011–2012, Alliance asked principals for the first time to submit PD action plans to the central office, but few principals complied. Alliance-wide PD sessions occurred every ten weeks and included time for schools to review their benchmark test data, content-area group meetings, and choice sessions related to the observation rubric or other topics (e.g., blended learning). Besides the Alliance-wide sessions, there were regional sessions focused on content areas.

Coaching

Alliance hired coaches for the first time in 2013–2014: four ELA coaches, four mathematics coaches, and two other coaches. In 2015–2016, there were 15 content-area central-office coaches. Each coach had a caseload of up to six schools, which they visited every one to two weeks to assist teachers. Most coaching was subject matter–based. Alliance also instituted ALLI coaches. These were teachers who coached several periods per day and taught during the other periods. Each school had one or two ALLI coaches trained to coach new teachers. Ideally, they coached each new teacher 90 minutes per week. New teachers enrolled in BTSA received about one hour each week of additional coaching through that program. The central-office coaches were discontinued in 2016–2017, and the ALLI coaches took on more of an induction role starting in 2015–2016.

For New Teachers

PD for new teachers consisted of a two-day induction at the beginning of the year; the duration was increased to four days in 2013–2014. In addition to HR information, the induction included an introduction to TCRP, ELL instructional strategies, and orientation to the instructional guides. New teachers also had the option of enrolling in BTSA, but doing so was not a requirement, and TFA teachers (a strong source of new teachers for Alliance) did not enroll in BTSA.

Resources

In 2012–2013, Alliance had a few webinars and some videos on its internal website that aligned with the observation rubric. Other resources were scattered across several websites, and teachers did not access them frequently. The CMO switched to the BloomBoard platform in 2012–2013 and began to populate it with resources, but the resources remained limited compared with those available in the other CMOs.


Aspire

Before the initiative, Aspire provided summer training sessions for teachers new to Aspire, as well as a new-teacher support group, instructional coaches, classroom observations and formal performance feedback from the principal, and weekly school-based PD sessions. In this section, we describe PD that Aspire provided during the initiative.

Use of TE to Drive PD

The CMO looked at areas in which ratings were low and used trends to drive workshops. Principals also used TE data to select their schools’ focus indicators and to inform their individual work with teachers or the decision to bring in coaches. As of 2016–2017, TE data continued to influence PD.

Content

Aspire offered several days of retreats for school principals and department chairs, quarterly assessment data days for teachers, two half-days per month for planning meetings and lesson study with a teacher’s grade and content cohort, and weekly school sessions lasting 90 minutes to three hours. PD centered on gaps in teacher practices identified through the observation data and student assessment results. At the school level, each principal identified a few indicators as the quarter- or semester-long focus for PD at that school.

In 2014–2015, Aspire held a summer Common Core institute for all teachers and an optional additional one-week Common Core summer training. PD focused mainly on instructional strategies aligned with the Common Core standards.

Coaching

Aspire assigned central-office instructional coaches by region, subject, and grade level. In most instances, a principal or teacher initiated the request for coaching support. Aspire is a BTSA provider, and about half of a coach’s clients were new teachers completing the state-required induction program, which includes one-to-one coaching for, on average, one hour per week. Aspire restructured its operations in 2016–2017 and eliminated the central-office coaches in three of its four regions. Most were assigned to specific schools as deans of instruction and continued to have a strong coaching role for new teachers.

For New Teachers

Each new teacher attended a one-week summer training session and monthly follow-up sessions during his or her first semester. If the teacher enrolled in BTSA, he or she also received the general guidance of an induction coach and about an hour of coaching each week.


Resources

Aspire directed much of its PD resources to the development of online products linked to the observation rubric indicators. Observation ratings and feedback were entered into an online platform, BloomBoard, along with materials tagged to rubric indicators and performance level. In 2011–2012, it launched the Purple Planet, a website with PD aligned with the rubric. In 2013–2014, it expanded online resources with the addition of Doug Lemov videos and Relay teacher-training courses. Aspire also created short videos of instruction linked to specific indicators, featuring Aspire teachers at various performance levels. By 2014–2015, Aspire had an online library of more than 200 film clips.

Green Dot

Before the initiative, Green Dot conducted two Green Dot–wide collaboration days, for teachers to “learn, collaborate, create common assessments, and share best practices with discipline specific peers across the organization” (Green Dot, undated [c]), and benchmark collaboration days for reviewing benchmark test results. In addition, schools conducted weekly 90-minute PD sessions. In this section, we describe PD delivered at Green Dot during the initiative.

Use of TE to Drive PD

By 2013–2014, Green Dot had identified four observation rubric indicators that correlated with academic performance and were well aligned with Common Core strategies (cognitive engagement, group structures, academic discourse, and questioning), and it linked all PD to those indicators. Principals also used teachers’ observation results to direct coaching and to supply topics for some school PD sessions.

Content

Green Dot offered several CMO-wide PD days, which focused primarily on content areas, and schools provided 90 minutes of PD once a week, which increased to two days a week in 2014–2015. Each spring, Green Dot provided a PD focus for the coming year (e.g., focusing on cognitive engagement, which is on the observation rubric and in the Common Core standards). Principals and their cluster superintendents developed each school’s PD focus based on teacher and student data, usually paralleling the Green Dot–wide sessions. One school session per quarter, developed by the central office, focused on an observation rubric indicator. According to central-office staff, about 60 percent of school PD aligned with both the rubric indicators and the Common Core standards. In the summer of 2014, Green Dot held a four-day Common Core boot camp for administrators, PD central-office staff, and teacher instructional leaders. Starting in 2014–2015, the content of CMO-wide PD days focused on Common Core State Standards but was explicitly linked to rubric indicators. As one central-office staff person explained,
“The framework is our common language. We’ll put it on a slide at the beginning and say what we’re focusing on, but it’s not the meat of what we’re focusing on. It’s how we view practice.”

Coaching

Before the initiative, Green Dot did not have any central-office coaches. By 2015–2016, 15 coaches were available: three science, two mathematics, three history, four ELA, one special education, and two teacher effectiveness support specialists. Green Dot assigned coaches to teachers at the CMO level. An administrator could also request a coach for a teacher or school PD session. Green Dot offered coaching to all first- and second-year teachers. In 2013–2014, Green Dot organized its coaching practices into a three-tiered coaching system based on observation results: Basic coaching consisted of observation or lesson planning once a month, limited coaching was twice a month, and targeted coaching was a weekly observation and debrief.

For New Teachers

Each new teacher received five additional days of PD in the summer before the start of school. A new teacher also received targeted coaching in the second quarter consisting of a weekly observation and debrief.

Resources

In 2013–2014, Green Dot began modifying the PUC instructional growth guides (see the section on PUC), which are linked to specific observation rubric indicators, and making them available online. These guides were also linked to Common Core standards. Green Dot also developed videos of best practices with examples of several teachers effectively implementing the observation rubric indicators.

PUC

Before the initiative, PD at PUC included weekly workshops and school-level PD. Principals developed growth goals and targets in specific areas. In this section, we provide more-specific details of the PD provided during the initiative.

Use of TE to Drive PD

Principals focused their school PD sessions in part on teachers’ growth goals linked to the rubric indicators that were common among their faculty.

Content

During the initiative, PUC continued having several CMO-wide PD days and weekly PD sessions at each school. At the school level, PD in 2011–2012 and 2012–2013 focused on the observation rubric and student achievement. The CMO encouraged principals to form PLCs for teachers with growth goals for similar indicators. These small groups met together during PUC-wide PD days. Also, once a year, teachers presented sessions at the PUC-wide days focused on the rubric indicators and designed their own PD, which might include observing a teacher at another school or doing research on the internet for a specific topic. In 2012–2013, PUC began the transition to the Common Core standards with a weeklong Common Core institute for teachers, and half of the time at the PUC-wide PD days focused on the Common Core standards. After the introduction of the Common Core, PD focused more heavily on content and results of the state assessments. In 2015–2016, for example, the PUC-wide PD days focused on addressing literacy problems for ELL and special-education students, an issue identified by the state assessment. After low mathematics results on the state assessment, a mathematics specialist was hired to provide PD for mathematics teachers.

Coaching

In 2011–2012, PUC began hiring coaches and brought in TNTP to develop the CMO’s coaching capacity. The CMO assigned each coach two schools, and the coach would spend one day a week at each school. Coaches’ first priority was new teachers. All coaches were also BTSA induction coaches.

For New Teachers

New teachers received one week of summer training sessions and follow-up sessions with central-office staff. Each teacher in induction had two hours a week with a coach and three pull-out days for workshops and observing other teachers.

Resources

PUC produced videos of effective teaching strategies, instructional guides with criteria for each level of the observation rubric indicators, and troubleshooting information for implementing the strategies. Teachers could access all resources via the internet. The instructional guides were developed in 2012–2013 and served as models for Aspire and Green Dot, which produced modified versions of the PUC guides.


Appendix G. Additional Exhibits for Chapter Six

The exhibits in this appendix supplement the information presented in Chapter Six on staff responses to survey items related to PD.

Figure G.1. Teachers’ Responses About Uses of Evaluation Results, Springs 2013–2016

Percentage of teachers reporting that results from the evaluation of their teaching in the current school year will be used to a moderate or large extent for each of the following purposes:

• To provide you with feedback that you can use to improve your instruction
• To identify areas in which you need professional development
• To determine whether you need additional support (for example, from an instructional coach)
• To decide whether you receive (or keep) tenure [not asked in 2014]
• To determine whether you receive a monetary bonus on top of your salary
• To determine how much of a salary increase you receive for next year
• To determine where you are placed on a career ladder, or whether you are promoted to a higher level
• To determine whether you should move from your current school to a different school
• To determine what classes or students within your school you will teach next year
• To provide information to parents and/or the general public about the quality of your teaching
• To determine whether you enter into some type of probationary status (employee improvement plan, etc.)
• To determine whether you are qualified to continue teaching

[Figure G.1 displays these percentages separately for HCPS, SCS, PPS, Alliance, Aspire, Green Dot, and PUC for the springs of 2013 through 2016; the individual data values are omitted here.]

Figure G.2. Teachers’ Responses to the Survey Question, “To What Extent Did Each of the Following Influence What Professional Development You Participated in This Year?” Springs 2011–2016

Percentage of teachers saying that each of the following influenced what PD they participated in to a moderate or large extent:

• Needs identified as part of a formal evaluation of your teaching
• Needs identified from informal feedback you have received on your teaching
• Needs and interests you identified yourself
• Priorities set by your school or district/CMO for multiple teachers (not asked in 2011 or 2013)

[Figure G.2 displays these percentages separately for HCPS, SCS, PPS, Alliance, Aspire, Green Dot, and PUC for the springs of 2011 and 2013 through 2016; the individual data values are omitted here.]

Figure G.3. Teachers’ Agreement That Their PD During the Past Year Was Aligned with Various Sources, Springs 2013–2016

Percentage of teachers agreeing (somewhat or strongly) that their PD experiences in the current year had been:

• Well aligned with the Common Core State Standards and/or curriculum based on these standards
• Well aligned with other standards and/or curriculum
• Aligned with or focused on specific elements of my district/CMO teacher observation rubric

[Figure G.3 displays these percentages separately for HCPS, SCS, PPS, Alliance, Aspire, Green Dot, and PUC for the springs of 2013 through 2016; the individual data values are omitted here.]

Figure G.4. Teachers’ Agreement with Statements About Support for PD, Springs 2011–2016

Percentage of teachers agreeing with each statement (somewhat or strongly):

• School and local administrators have encouraged and supported my participation in professional development.
• I have had sufficient flexibility in my schedule to pursue professional development opportunities of interest to me.
• Sufficient resources (for example, substitute coverage, funding to cover expenses, stipends) have been available to allow me to participate in the professional development I need to teach effectively.

[Figure G.4 displays these percentages separately for HCPS, SCS, PPS, Alliance, Aspire, Green Dot, and PUC for the springs of 2011 and 2013 through 2016; the individual data values are omitted here.]

NOTE: Significant (p < 0.05) differences between the 2016 percentage and other years’ percentages: For the statement in row 1, HCPS increased from 2013; SCS decreased from 2015; Alliance increased from 2011, 2013, and 2014; Aspire increased from 2011; Green Dot increased from 2011 and 2014; and PUC increased from 2011, 2013, and 2014. For the statement in row 2, HCPS decreased from 2011 and increased from 2013; SCS decreased from 2011; PPS decreased from 2011 and 2015; Alliance increased from all prior years; Aspire increased from 2011, 2013, and 2014; Green Dot increased from 2013 and 2015; and PUC increased from 2011, 2013, and 2014 and decreased from 2015. For the statement in row 3, HCPS decreased from 2011 and 2014; PPS decreased from 2011; Alliance increased from all prior years; Aspire increased from 2011, 2013, and 2014; Green Dot increased from all prior years; and PUC increased from 2011 and 2013.

Figure G.5. Percentage of Teachers Reporting Enhanced Skills and Knowledge, in Various Areas, Due to PD, Springs 2011–2016

Figure G.6. Teachers’ Perceptions of the Usefulness of Various Forms of PD, Springs 2013–2016

Percentage of teachers saying that as a result of PD they had participated in during the current year, their knowledge and skills had been enhanced to a moderate or large extent in each of the following areas:

HCPS SCS PPS Alliance Aspire Green Dot PUC

Your familiarity with effective instructional strategies in subject area(s) that you teach

Your content knowledge in subject area(s) that you teach

Your understanding of difficulties students commonly face, or misconceptions they commonly have, in subject area(s) that you teach

How to differentiate instruction for students in classes with a wide range of ability levels or needs

How to promote student engagement or motivation [not asked in 2014 and 2016]

How to analyze data on student performance

How to manage your classroom and student behavior [not asked in 2014 and 2016]

How to work with or involve students' families [not asked in 2014 and 2016]


Figure G.6. Teachers' Perceptions of the Usefulness of Various Forms of PD, Springs 2013–2016

Percentage of teachers reporting that each of the following types of PD had been moderately or very useful for helping them improve their effectiveness

HCPS SCS PPS Alliance Aspire Green Dot PUC

Workshops or inservices for teachers at your school only (typically on-site)

Workshops, inservices, institutes, or conferences organized by your district/CMO for teachers from multiple schools

Workshops, institutes, or conferences put on by external providers (professional associations, universities, etc.) [not asked in 2013]

Online professional development offered by or through your district/CMO

Receiving instructional coaching (provided by school-based coaches or district/CMO coaches)

School-based teacher collaboration (grade-level or subject-area teams, professional learning communities, study groups, etc.)

Videos of sample lessons


Appendix H. Site Compensation Policies: Supplementary Material for Chapter Seven

The descriptions in this appendix supplement the information presented in Chapter Seven on compensation policies. We first describe the districts, then the CMOs.

District Compensation Policies

HCPS

Performance-Based Salary Adjustments

Two Florida state laws—Florida statute 1012.01(2)(a)–(d) (definitions of classroom teachers, student personnel services, librarians/media specialists, and other instructional staff), passed in 2005–2006, and Senate Bill 736, passed in 2011—mandated merit pay for public school teachers in Florida. In response to these laws, HCPS adopted a MAP, beginning in 2006–2007, which offered bonus pay to teachers in the top quartile of effectiveness. In 2010–2011, HCPS shifted to using the new composite TE measure developed as part of the IP initiative. The district began awarding merit pay to teachers in the top quartile under this new measure starting in 2011–2012.

In 2013–2014, HCPS announced a new performance-based salary adjustment for any teacher who received a 4 or 5 TE rating in the new composite.23 Because the VAM measure of the composite is based on three years of data, it took until 2013–2014 to have enough data to calculate the new TE ratings properly and set the cut scores. To be eligible for this salary adjustment, HCPS required that the teacher be in at least the fourth year of teaching and have a VAM score and three consecutive years of observations. A teacher who earns a salary adjustment based on performance in a particular year receives the award as a modification to his or her regular paycheck throughout the subsequent year. (Only staff who remained teaching received this performance-based salary adjustment. HCPS did not award it to staff who retired or left the district during the year when they would have expected to receive the salary adjustment.) The district also required that, to remain eligible for the salary adjustment, the teacher remain on the same rubric during this time. If a teacher changed positions in the district, such as moving into a coaching position, he or she would have to wait another five years for a potential performance-based salary adjustment.

23 HCPS concluded that, by awarding IP bonuses to teachers with TE scores of either 4 or 5, it would be in conformance with the aforementioned laws.


Between 2014 and 2016, every qualifying teacher received a predetermined salary adjustment based on his or her rating. HCPS provided an award of $2,000 for any teacher with a level 4 rating and $3,000 for a teacher with a level 5 rating in 2014–2015. In 2015–2016, a teacher with a TE rating of 4 received $1,900, and one with a TE rating of 5 received $2,900. For the 2016–2017 school year, for which awards were based on the 2015–2016 ratings, HCPS set aside $12.4 million for teachers with level 4 and 5 TE ratings; this eventually broke out to $1,399.99 for every teacher with a level 4 TE rating and $2,101.84 for every teacher with a level 5 TE rating. About 50 percent of teachers earned performance-based salary adjustments based on their TE ratings in 2014–2015, and about 55 percent did so in 2015–2016.
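To keep these amounts straight, the following minimal sketch (our illustration, not HCPS code) expresses the adjustments described above as a simple lookup by payout year and TE rating; keying payout school years by their starting calendar year is an assumption made for readability.

```python
# Illustrative lookup of the HCPS performance-based salary adjustments described above.
# This is our sketch, not district code; 2014 stands for the 2014-2015 school year, and so on.
HCPS_ADJUSTMENTS = {
    2014: {4: 2000.00, 5: 3000.00},   # paid in 2014-2015
    2015: {4: 1900.00, 5: 2900.00},   # paid in 2015-2016
    2016: {4: 1399.99, 5: 2101.84},   # paid in 2016-2017, based on 2015-2016 TE ratings
}

def hcps_adjustment(payout_year: int, te_rating: int) -> float:
    """Return the adjustment for a teacher with the given TE rating, or 0 if none applies."""
    return HCPS_ADJUSTMENTS.get(payout_year, {}).get(te_rating, 0.0)

print(hcps_adjustment(2016, 5))  # 2101.84
```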

In addition, HCPS had three bonus programs based on federal TIF grants during this period: POWER1 from 2007–2008 through 2011–2012, POWER2 from 2010–2011 through 2014–2015, and POWER3 from 2012–2013 through 2016–2017. POWER1 covered 116 high-need schools, while POWER2 and POWER3 covered 35 and 30 schools, respectively. Each grant provided HE teachers (defined as top quartile in POWER1 and POWER2 and with a TE rating of 4 or 5 in POWER3) with a lump-sum bonus. The bonus amount for POWER1 and POWER2 was defined as 5 percent of base salary, or an average of $2,000, while the POWER3 bonus was $3,800. Because the TIF grants provided bonuses, not salary adjustments, awarded teachers received this money regardless of whether they were still at the POWER schools.

Effectiveness-Based Salary Schedule

HCPS did not make any changes to its salary schedule to link base compensation to TE.

PPS

Supplementary Effectiveness-Based Payments

PPS adopted the PRC Cohort Award as part of the 2010 collective bargaining agreement. The PRC consists of teachers of grades 9 and 10 who work in teams and “loop” with their students over two years—that is, the same group of teachers teach the same group of students in grades 9 and 10. Before 2015–2016, each of three PPS schools had multiple PRC teams. The PRC program was expanded to three additional schools in the fall of 2015. PRC Cohort Awards are based on the PRC teams’ contributions to growth of their students over the two-year loop and are thus awarded every two years. PPS selected teachers for the PRC based on years of teaching experience and an application process that included consideration of TE scores (i.e., teachers rated F or NI are not eligible to hold CL roles). Teachers who are not part of the PRC but whose students are at least 60 percent grade 9 or 10 (non-PRC teachers) are also eligible for a version of the PRC Cohort Award. About 8 percent of the total PPS workforce was eligible for some form of this award in 2015–2016. PRC teachers can earn up to $20,000 each two-year loop; the exact amount is based on a VAM score of at least 51 (out of 99), and amounts increase as the VAM
score increases. A non-PRC teacher’s award is based on the maximum awards in his or her school and then reduced by half and prorated based on the number of teams in the school and the proportion of students in grades 9 and 10 whom he or she teaches. Awards for PRC and non-PRC teachers are also prorated based on attendance. In 2015–2016, six of the nine PRC teams had VAM scores that made them eligible for PRC Cohort Awards. In 2015–2016, 80 percent of eligible PRC teachers earned awards based on 2013–2014 and 2014–2015 results, and approximately 26 percent of non-PRC teachers met criteria to earn prorated awards. Awards for non-PRC teachers ranged from $64 to $1,500. From 2010 to the end of the 2014–2015 school year, the PRC VAM score was based 50 percent on assessments and 50 percent on other measures intended to measure student growth in nonacademic areas, such as attendance. In the fall of 2015, in alignment with the district’s decision to eliminate the CBAs from teacher and school-level VAM scores, PPS adjusted the PRC VAM scores, based on feedback from PRC members, to 40 percent assessments and 60 percent nonassessments. The 2015 PRC VAM scores also omitted the district’s homegrown CBAs and replaced them with the state’s Keystone exams.

The STAR Award is a school-level award that was also adopted with the 2010 collective bargaining agreement but first awarded in the fall of 2012. Eligibility for traditional schools is based on a two-year VAM score that compares the performance of PPS schools with that of other schools in the state, with awards going to up to eight PPS schools in the top 15 percent of schools in the state. If fewer than eight PPS schools are in the top 15 percent, up to eight schools will still receive awards, provided that the schools are within the top 25 percent of schools in the state. A staff member represented by PFT who works in a school that meets either the 15-percent requirement or the 25-percent requirement receives a bonus of up to $6,000. PFT-represented staff at the district’s special schools are eligible for STAR Awards. (There is a separate formula for determining eligibility for PPS schools for students with special needs.) Additional individual eligibility criteria for awards are (1) satisfactory performance, according to the TE composite measure, for the year in which the STAR Award is earned, and (2) assignment to the school for at least 91 days of the school year. PPS award amounts are prorated to account for leaves and absences, as well as the number of days per week worked at the school. If a school earns STAR status, 100 percent of teachers at all of the district’s traditional schools and four of its special schools are eligible for STAR Awards. In 2015–2016, four of the district’s traditional schools and three of its special schools earned STAR status based on 2013–2014 and 2014–2015 outcomes. Although eligibility for STAR awards is determined at the school level, a teacher in one of those schools must demonstrate satisfactory performance on the composite TE measure to receive an award.

The AYP Award is a one-time bonus paid to PFT members (i.e., all teachers but also other staff, such as nurses and librarians) at the top step of the salary scale in years the district achieves AYP, a performance metric for improvement in student achievement under the federal NCLB legislation. Absences would cause the bonus amount of $1,000 to be prorated. PPS achieved
AYP only once while NCLB was still in force—during the 2010–2011 school year. AYP bonuses were paid to eligible staff in the fall of 2011.

Effectiveness-Based Salary Schedule

PPS adopted this policy when PPS teachers ratified the collective bargaining agreement in July 2010. Under this policy, teachers hired after July 2010 could receive salary increases in either of two ways: (1) accrue years of service, known as advancing up the ladder, or (2) demonstrate high levels of performance, known as advancing across levels. Thus, the portion of the policy that awards salary increases based on performance is known as a level decision. A teacher rated D at least once in the three years since the previous level decision moves across levels to receive the performance-based salary increase. However, the first year of service in the district does not count for pretenure teachers; thus, their first level decisions occur after four years. In future years, a teacher could earn an additional amount of up to about $30,000, depending on his or her step placement at the time of the level decision. In 2014–2015, the first year PPS gave these increases, 63 percent of teachers (ten out of 16) received increases; the following year, 98 percent of eligible teachers (43 out of 44) received increases.

SCS

Supplementary Effectiveness-Based Payments

In the fall of 2012, SCS awarded effectiveness bonuses based on data from the 2011–2012 school year. This bonus was intended to reward three groups of the district’s highest-performing teachers:

• “5 × 5” teachers: A teacher with a score of 5 on each component of the TEM would receive a $2,000 award.

• “irreplaceables”: A teacher with a TEM score in the top 10 percent of the legacy-MCS workforce would receive a $1,000 award.

• “TEM 5 professionals”: A teacher with a TEM score of 5 (between 425 and 453) would receive a $500 award.

Approximately 1,525 teachers (25 5 × 5 teachers, 600 irreplaceables, and 900 TEM 5 professionals) were awarded bonuses in the fall of 2012; this is approximately 25 percent of the district’s teacher workforce.

The bonus for TVAAS gains used TIF and RTT funding to reward teachers for gains in their TVAAS (state VAM) scores. Legacy MCS awarded these bonuses in the fall of 2012 based on data from the 2011–2012 school year. Schools that received bonuses out of TIF funds had been classified as high-priority schools in two of the three years starting with the 2009–2010 school year, and a school was identified for awards if its school-level TVAAS score for 2011–2012 showed positive gains in all tested content areas. Schools that received bonuses out of RTT funds were ranked by their school-level scores; schools that met the threshold for "sufficient gains," as determined by legacy-MCS administration, were awarded bonuses. Principals, APs, and support staff received awards in addition to teachers. This bonus program did not directly award teachers based on their individual effectiveness scores. Instead, schools were deemed eligible for the program based on school-wide achievement, and all teachers within eligible schools received payments.

The bonus for achievement on state tests was paid with TIF funds in the fall of 2014, based on data from the 2013–2014 school year. Schools that met or exceeded achievement goals on state tests during the 2013–2014 school year received awards; in 2014, this was 14 schools. Principals, APs, and support staff received awards in addition to teachers. Like the bonus for TVAAS gains, this bonus for achievement on state tests did not directly reward teachers based on their individual effectiveness scores. Instead, schools were deemed eligible for the program based on school-wide achievement, and all teachers within eligible schools received payments.

The reward status bonus used SIG funds starting in the fall of 2013, based on data from the 2012–2013 school year. Any iZone school that is among the top 5 percent of schools in the state in terms of achievement growth or proficiency on state tests receives awards. Every teacher who teaches in one of those schools during the year the gains occur receives a $3,000 bonus. Like the bonus for TVAAS gains and the bonus for achievement on state tests, this reward status bonus does not directly reward teachers based on their individual effectiveness scores. This program is ongoing as of the writing of this report.

Effectiveness-Based Salary Schedule

SCS did not make any changes to its salary schedule to link base compensation to TE.

CMO Compensation Policies

Supplementary Effectiveness-Based Payments

Initially, the CMOs implemented bonus systems rather than pay-for-performance salary structures because, given California’s uncertain financial situation at the time, they were not sure they could maintain increasing salary commitments. The bonuses were small, typically between $500 and $5,000. By 2014–2015, all of the CMOs had discontinued bonuses that were based on effectiveness ratings.

As a result of the financial crises in California, wages in the CMOs were frozen for three years, from 2008–2009 through 2010–2011. Evaluation data became available as the economy began to recover. Although, originally, the CMOs considered 2011–2012 a pilot year, in the fall of 2013, all of the CMOs distributed bonuses based on overall effectiveness scores using teachers’ 2011–2013 results. Typically, they awarded bonuses to every teacher in the top three of five effectiveness categories. They awarded no bonuses to teachers rated as entry level.


Alliance

From 2012–2013 through 2014–2015, Alliance awarded bonuses to teachers at each TE level. The bonuses were $5,500 for a master teacher; $4,000 for HE; $2,250 for E; and $750 for achieving. Alliance awarded no bonuses to entering teachers.

The CMO switched to a pay-for-performance salary schedule in 2014–2015; until then, it had awarded bonuses to teachers in all TE categories except entry level. The 2016–2017 Alliance salary schedule is based on years of service and two years of performance at a given TE level.

Aspire

Aspire awarded bonuses linked to effectiveness ratings in 2012–2013 and 2013–2014 for teachers at each TE level except entering teacher. Bonuses were $500 for an emerging teacher; $1,000 for E; $2,000 for HE; and $3,000 for a master teacher.

Green Dot

Green Dot awarded bonuses in 2012–2013 and 2013–2014 to teachers in the top three of five TE categories. Bonuses were $500 for E teachers; $1,000 for HE teachers; and $2,000 for HE II teachers.

PUC

In January 2014, PUC awarded bonuses linked to the 2012–2013 TE ratings. It awarded them to teachers in the top three of five effectiveness categories: $1,500 for the progressing level; $3,000 for HE level; and $5,000 for exemplary level. For 2013–2014, instead of a bonus linked to 2013–2014 ratings, every teacher received $500 for being part of the research and development of the evaluation system. PUC subsequently discontinued bonuses. The CMO changed its emphasis from evaluation to teacher development and stopped calculating a TE score. PUC continues to implement a traditional step-and-column pay structure.

Effectiveness-Based Salary Schedule

Alliance

Alliance instituted a salary schedule linked to TE level in 2014–2015 based on data from the previous two years and on years of service. A teacher needs two consecutive years at a new TE level to move up the salary scale. Alliance places every new teacher with one to two years of experience at entry level and any teacher with more than two years’ experience at achieving level. Teachers cannot be moved lower on the salary schedule even if they earn lower effectiveness scores.

Aspire

Aspire instituted a salary schedule linked to TE level in 2014–2015 based on the prior year’s TE score and on years of service. Teachers cannot be moved down on the salary scale.


Green Dot and PUC

Green Dot and PUC continue to use traditional step-and-column pay structures based on years of service and education credits.


Appendix I. Analyzing the Relationships Between Teacher Compensation, Assignment to LIM Populations, and TE: Analytic Methods for Chapter Seven

The estimates presented in Chapter Seven (specifically, Figures 7.8 and 7.9) result from modeling teacher compensation as a function of TE (measured in terms of the site's composite TE level or the study-calculated VAM score), controlling for the teacher's age, teaching experience, educational attainment, gender, and race. This modeling conceptualizes teacher compensation as responsive to effectiveness. Consequently, the estimates show the effect that composite TE levels and VAM scores measured in one year have on compensation in the subsequent year. Specifically, the dependent variable in our specification is the natural log of total compensation (base compensation plus all other compensation) for teacher i in year t + 1, ln(P)_{it+1}. We regressed the dependent variable on indicators of effectiveness from year t (E1_{it}, E2_{it}, and E3_{it}). We grouped effectiveness measures (composite TE levels and VAM scores) into three categories: E1_{it} = 1 if the teacher received a low composite TE or VAM rating, E2_{it} = 1 if the teacher received a middle composite TE or VAM rating, and E3_{it} = 1 if the teacher received a high composite TE or VAM rating. Additionally, we included a vector of control variables, X_{it}, including age, gender, race, educational attainment, and teaching experience. Furthermore, we centered each control variable by its annual mean and excluded the constant so that the coefficients on E1_{it}, E2_{it}, and E3_{it} give the expected log compensation for an average teacher at each effectiveness level. The following equation shows this specification:

ln(P)_{it+1} = X_{it} γ + β_1 E1_{it} + β_2 E2_{it} + β_3 E3_{it} + ε_{it}.

We ran the models separately for each site and for each year. To obtain expected compensation for teachers at each effectiveness level, we converted the estimates using the smearing method for nonparametric retransformation developed by Duan (1983).
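To make these steps concrete, the following sketch (ours, not the study's code) fits the no-constant specification with annually mean-centered controls and applies the Duan smearing retransformation. The column names (ln_comp_next, eff_low, eff_mid, eff_high, year) and the controls argument are hypothetical.

```python
# Illustrative sketch of the Appendix I specification; not the authors' code.
# Assumed (hypothetical) columns: ln_comp_next = ln(total compensation) in year t+1;
# eff_low/eff_mid/eff_high = 0/1 indicators of the year-t effectiveness category;
# year = the year of the effectiveness measure; controls = age, gender, race, etc.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def expected_compensation(df: pd.DataFrame, controls: list) -> dict:
    levels = ["eff_low", "eff_mid", "eff_high"]
    # Center each control on its annual mean so that the level coefficients give the
    # expected log compensation of an average teacher at each effectiveness level.
    centered = df[controls] - df.groupby("year")[controls].transform("mean")
    X = pd.concat([df[levels], centered], axis=1)   # no constant term
    fit = sm.OLS(df["ln_comp_next"], X).fit()
    smear = np.exp(fit.resid).mean()                # Duan (1983) smearing factor
    return {lvl: float(np.exp(fit.params[lvl]) * smear) for lvl in levels}
```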


Appendix J. Site CL Policies: Supplementary Material for Chapter Eight

The descriptions in this appendix supplement the information presented in Chapter Eight on CL policies. We first describe the districts, then the CMOs.

District CL Policies

HCPS

In its proposal, HCPS aimed to implement a six-step CL, which it did not adopt. However, the mentors who evaluated novice teachers played an important mentoring role, so mentor could be considered a CL position, according to our definition. HCPS launched the mentor position in 2010–2011.24 Mentors provided advice and support to new teachers and evaluated new teachers who were not their mentees. (In this capacity, they were called swap mentors.) In 2013–2014, the district created a new position, teacher leader, as part of HCPS’s TIF POWER3 grant; HCPS implemented this pilot program in 15 high-need schools and expanded it to 30 high-need schools in 2014–2015. Teacher leaders provided individualized coaching to teachers.

The rules governing the two positions differed somewhat. Mentors were originally appointed for two-year terms; however, HCPS repeatedly extended the appointments for all interested mentors to cover the full length of the IP grant. Mentors served in these roles full time and had no other teaching responsibilities. Teacher leaders were appointed through the end of the POWER3 grant, 2016–2017. Unlike mentors, they served half the time as teacher leaders and half the time as classroom teachers.

PPS

Before the IP initiative, PPS did not have any teacher leadership roles that were based explicitly on teachers' effectiveness. As part of the IP initiative, PPS proposed five CL roles, each of which offered additional compensation in the form of bonuses or salary increments or both: CRI, PRC, LES, ITL2,25 and turnaround teacher. The program had two goals: to improve the overall quality of PPS's teacher workforce through peer support and coaching and to attract some of the district's best teachers to the neediest schools. PPS implemented the CL positions in 2011–2012 and 2012–2013, and most of the positions were in the lowest-performing schools.

24 Simultaneously, HCPS created the related peer evaluator position, which we discuss in detail in Chapter Two. 25 There had previously been a position called instructional teacher leader, and this new position was called ITL2 to distinguish the two positions.


All the CL positions were term limited—that is, each teacher selected served a two- or three-year term, although these teachers could serve multiple terms. In addition, CL teachers were evaluated on domain 5, an additional RISE domain that PPS developed specifically to evaluate CL positions. Domain 5 focuses on skills critical to the CL role, such as coaching and instructional leadership. PPS selected teachers for CL roles based on a rigorous interview process, which included teaching a sample lesson, participating in group discussions, and responding to writing prompts. The selection process also included a review of available effectiveness data. Each CL teacher had to maintain a performance level of P or D on the TE composite measure, and a CL teacher could receive no more than one rating of U in a domain 5 component in the first year and no U ratings in domain 5 components in the second year. In 2015–2016, CL teachers could accrue building seniority while they served in CL roles. PPS intended this change to incentivize teachers to transfer into high-need schools.

The district implemented the ITL2 position in the fall of 2012 in several of the district's highest-need schools. A teacher needed to have three years of teaching experience, with at least one of those in PPS, and a composite TE rating of P or D to be eligible for the role. The $11,300 stipend included $5,000 for taking on a school-leadership role based on effectiveness; the remaining amount covered the extended work hours entailed by an extended working year. The position carried a three-year term from the program's inception through 2014–2015; PPS reduced the term to two years in 2015–2016.

Each ITL2 received a stipend and taught a reduced course load (three or four periods, depending on the school). Initially, ITL2s served as coaches and mentors of their peers, often first-year teachers, by doing informal observations and providing formative feedback. ITL2s were matched with mentee teachers in their subject areas to the extent possible and were expected to provide subject-specific feedback and coaching. In the second year of the program, ITL2s conducted formal observations (i.e., with stakes attached) in addition to the informal observations, feedback, and coaching. In 2015–2016, the program was expanded to include more positions in most of the district’s schools and a new role, ITL2 leads, who were supposed to support and coach three or four first-year ITL2s. As of 2015–2016, there were 70 ITL2s in about 40 schools. Central-office staff told us that the position would be discontinued at the end of the 2016–2017 school year because of the new district leaders’ desire to invest in content-specific coaching in roles that did not also include responsibility for evaluation.

In the fall of 2011, PPS implemented the role of the CRI, which had a three-year term and offered a stipend for coaching responsibilities in addition to teaching, in two of the district’s lowest-performing schools. In the original conception, the CRI role was to train and mentor first-year teachers and provide peer support to experienced teachers as part of the teacher academies; however, PPS did not implement the academies, and the CRI role changed several times after its implementation. At first, PPS asked CRIs to serve as nonevaluative peer mentors to struggling teachers in their schools, as well as teach limited course loads. CRIs were given training in classroom observation, providing feedback, and peer leadership and were expected to assist
principals in implementing schools’ improvement plans. In later years, PPS expanded the CRI’s role to include reviewing existing PD activities and materials and developing materials as part of the district’s effort to create an online repository of PD offerings linked to RISE rubric components. At the end of the 2013–2014 school year, PPS discontinued the CRI role in one school and, at the end of the 2014–2015 school year, in the second school.

The goal of the PRC program, which was piloted in the fall of 2010–2011, fully implemented in the fall of 2011–2012, and still in place as of the writing of this report, was to form an elite cadre of teachers to help students transition to HS and provide intensive academic support to help as many students as possible be “Promise-Ready,” or eligible for the Pittsburgh Promise scholarship program. To be eligible, a teacher must have at least one year of teaching experience, which does not have to be in PPS, as well as a composite TE rating of P or D. The PRC consists of teachers of grades 9 and 10 who loop or teach consecutive grades to the same group of students. PRC teachers meet daily to plan support for students and teach in multisubject teams. Until 2015–2016, every PRC teacher received an annual stipend to compensate him or her for the additional duties. The $9,300 stipend included $5,000 for taking on a school-leadership role based on effectiveness. The remaining amount covered the extended work hours incurred by having an extended working year. Members of PRC teams that produced above-average student achievement gains also receive performance-based bonuses (a lump-sum payment every two years), described in more detail in Chapter Seven. In the 2015–2016 school year, the district changed the structure of the program and introduced PRC lead roles. PRC leads were responsible for leading the PRC teams and were considered to have CL roles; teachers who joined a PRC team starting in 2015–2016 were not considered to be in CL roles and thus were not eligible for the stipend. But existing PRC teachers were grandfathered in and are still considered to have CL roles, thereby remaining eligible for the stipend. As of 2015–2016, there were 76 PRC or PRC lead teachers.

In the fall of 2011–2012, PPS implemented the LES position in seven of the district’s lowest-performing schools. The position carried a three-year term from the inception of the program through 2014–2015; beginning in 2015–2016, PPS reduced the term to two years. To be eligible, a teacher must have three years of teaching experience with at least one of those in PPS and a composite TE rating of P or D. The $9,300 stipend includes $5,000 for taking on a school-leadership role based on effectiveness. The remaining amount covers the extended work hours entailed by an eight-hour workday and an extended working year. The goal of the LES positions was to provide coaching and mentoring in classroom management to teachers who struggle with that skill. An LES teacher also worked with the principal and staff in his or her school to implement the district’s equity program, which encouraged teachers to provide equitable instruction to all groups of students. The LES did not teach classes and received a stipend for the increased responsibilities. In the fall of 2012–2013, because of school closures, PPS reduced the number of LES positions to five. In 2015–2016, it added district-level positions, resulting in
three district-based LESs, each responsible for several schools, and school-based LESs in three schools. As of 2015–2016, there were six LESs.

The turnaround-teacher position, which was envisioned as a team of HE teachers who would be deployed to the district’s highest-need schools, was scheduled to be implemented in the fall of 2012–2013, along with the ITL2 position, but PPS never implemented the position.

SCS

Legacy MCS proposed new CL roles that were closely integrated with the district’s plans for reforming teacher compensation, but, as of the writing of this report, SCS has not implemented these plans, largely because of the lack of staffing at the district level, challenges working with consultants, and the merger. Instead, the district implemented several coaching roles in ways that fit our definition of teacher leadership roles, so we include them here. We do not have any information on what CL roles, if any, existed before the initiative.

In the 2012–2013 school year, legacy MCS piloted the PAR program, which used effective veteran teachers, PAR CTs, to coach struggling veteran teachers. In 2013–2014, SCS fully implemented the PAR program, which continues as of the writing of this report. To be eligible for the PAR CT role, a teacher must have “demonstrated effectiveness” through a TEM score of 4 or 5, with a minimum TE score of 3. A PAR CT receives a yearly stipend of $3,000.

After the merger in 2013, SCS implemented several coaching roles that could be filled by effective teachers. Taken together, the roles of learning coach, master teacher, and PIT crew were referred to as the tiered coaching model and were intended to provide increasingly intensive coaching to struggling teachers, ranging from coaches who were teachers at their local schools to more-expert coaches who served larger regions. The PLC coach was another coaching role for effective teachers, but it was not considered part of the tiered coaching model.

A learning coach or master teacher received a yearly stipend and taught a full course load, along with coaching his or her building peers. PIT crew members and PLC coaches were full-time coaches and did not teach. Learning coaches provided formative coaching, with no stakes attached, to new and struggling teachers; they taught full course loads, but SCS provided each a stipend for the extra work coaching. Master teachers supported the learning coaches in their buildings, provided extra support to struggling or new teachers, and could conduct formal observations and implement school-wide PD. SCS called on PIT crew coaches to provide more-intensive coaching when a teacher struggled to improve and conducted formal evaluations. PLC coaches provided coaching to teachers in their buildings and could conduct formal evaluations. To be eligible for these roles, a teacher needed to have a score of 4 or 5 on the observation rubric and a minimum composite TEM score of 3; he or she also had to demonstrate competency in coaching. Initially, district staff identified teachers for these roles based on the prior year’s evaluation scores; however, central-office staff told us that principals had discretion in selecting learning coaches in their buildings and that some principals might have ignored the set requirements. The roles of learning coach, master teacher, and PIT Crew were in place for two
years, 2013–2014 and 2014–2015; the role of PLC coach continues as of the writing of this report.

CMO CL Policies

Alliance

Before the start of the IP initiative, Alliance selected its most-effective mathematics teachers to work as transformational leaders. Each transformational leader received a stipend of $6,000, continued to teach in his or her own school, and served as a trainer for other schools. These same teachers later became mentors when Alliance began a teacher-residency training program.

Alliance created another specialized teacher role in 2012–2013, the ALLI coach, and fully implemented it the following year. These teachers coached first- and second-year teachers for one, two, or three periods out of the teaching day. To be eligible to be an ALLI coach, a teacher must have scored at one of the top two effectiveness levels (out of five). Each school could have up to two ALLI coaches, and the principal determined, based on the school's resources, the number of periods they coached.

Alliance created several other career positions in 2013–2014. These positions included data fellow (to help administrators and teachers navigate data, such as student test results), demonstration teacher (to conduct lessons that other teachers could observe), and instructional PD teacher (to work with the school’s instructional team to help implement the Common Core standards). Although the Alliance central office developed these positions and provided training, each SL decided whether to implement the position at his or her school and find funding to support it. The Alliance central office believed that these positions would contribute to improved student achievement, but, aside from the ALLI coaches, few teachers applied. One central-office administrator explained the low application rate:

We offered a handful of positions and advertised them to teachers; to my knowledge, I don’t think there was a lot of energy and interest in that because folks were just worried about their evaluation and how to perform in a classroom.

Aspire

Before the IP initiative, Aspire’s school organization included lead teachers, expert teachers in their subjects whom principals selected to lead department meetings and serve on the school instructional teams that set the PD agendas for the schools. These positions continue to be implemented.

In 2012–2013, Aspire began creating an array of teacher leadership roles available to teachers at all levels of effectiveness. In 2013–2014, Aspire fully implemented this set of positions, the Aspire Teacher Leadership and Career Path. According to central-office staff, the
motivation for these roles was to keep HE teachers in the classroom. One central-office administrator explained the rationale in these words:

Don’t become a [central-office] coach; don’t become a dean; don’t become a principal. $2000 bonus or salary may not incent them, but if we say they can also be the peer observer or be the instructional driver and get extra PD, [they will think,] “I love and live for instruction. That will incent me to stay in the classroom.”

Among the 20 new positions created were induction coach, Common Core driver, data driver, mentor teacher, model teacher, video teacher, and both virtual and in-person PLC leaders in various topics. Stipends for the Aspire roles typically ranged from $1,000 to $2,500. Aspire used its TIF grant, received in 2012–2013, to pay for these teacher leader roles. Over time, new roles evolved, and Aspire retired roles that it no longer needed (e.g., Common Core driver). According to central-office staff, one of the primary reasons teachers sought the roles was to receive the extra PD provided to them to prepare them for the positions. However, in 2016, we were told that the number of applicants had declined following the adoption of the Common Core standards. According to central-office staff, “a lot [of] teachers realized that they needed to refocus on their own classrooms, and really buckle down on their own practice, rather than serving in these regional or cross-regional roles.”

In 2015–2016, Aspire teachers held 21 teacher leadership positions. The most-popular positions, their minimum effectiveness requirements, and their stipends were as follows:

• data driver (E, HE, or master): $1,000
• ELA instructional driver (E, HE, or master): $1,500
• equity driver (any effectiveness level): $1,000
• math instructional driver (E, HE, or master): $1,500
• peer observer (HE or master): $1,500
• site-based induction coach (HE or master): $1,500.

Green Dot

Green Dot offered two teacher leadership positions before the IP initiative: ILT member, which was similar to department chair, and new-teacher mentor. These positions were both selected at the school level in compliance with the negotiated union contract and continue to be implemented.

Green Dot began offering three additional teacher leader positions in 2012–2013: teacher leader facilitators, who designed and led PD at CMO-wide PD days; the Green Dot ILT, which consisted of 96 teachers, about five from each school and one from each department, who received training on being effective department leaders; and demonstration classroom teachers, four teachers who conducted classes that other teachers could observe. The number of positions was expanded in 2013–2014 to include PD leaders, who provided PD sessions, and data fellows, who assisted teachers in navigating and interpreting data.


In 2014–2015, Green Dot discontinued the teacher leader facilitator and data fellow positions but created several new teacher leader positions. In some of these positions, the central office chose teachers who could serve across the CMO or only in their own schools, while, for other positions, teachers were selected and served only at the school level. The school-level positions were not standardized, in that the exact role and the stipend could differ by school. One advantage of the school-defined positions, which could make them more appealing to teachers, was that teachers did not have to travel to other locations; the teachers might have been more committed to their schools than to the CMO as a whole.

The career positions that were operating in 2015–2016 were as follows:

• school-level positions
- English learner lead (one per school)
- Green Dot ILT (six department chairs per school)
- new-teacher mentor (one per school).
• central-office positions
- special-education new-teacher support advisers (one MS, one HS)
- National Expansion Leadership Collaborative, to assist in Memphis (four)
- teacher PD advisers (six; none filled in 2015–2016)
- demo class teacher (one ELA, one math, and two science, history, or electives)
- PD leader (11, by subject and level)
- TIP coaches (one per participating induction teacher)
- special-education coteaching advisers (two special education, one general-education teacher; none filled in 2015–2016)
- core curriculum review team (28)
- sheltered ELA revision committee (seven)
- special-education academic success working team (four)
- special-education curriculum and assessment adviser (two)
- technology pathways review team (one MS, one HS)
- site liaison (one per school).

There are minimum qualifications for all the instructional leadership positions—demo class teacher, PD leader, TIP coach, special-education coteaching advisers, ILT, new-teacher mentor, and English learner lead. First, for leadership roles, the organization hires only people whom central-office staff have observed and who have performed well in the observations. Second, all roles require recommendations from the school site administrator. Other requirements depend on the position. For example, for the instructional leadership positions, teachers should have at least HE on the TE measure. However, a teacher does not need to have an HE rating to qualify for a technology pathway review team position. A central-office staff member said,

We look at the total score and take it with [a] grain of salt. Somebody could have 3.3 on observation but we know they’re not as strong as at a school with rigorous evaluators. We don’t look at anybody as just a number. [We are a] small enough organization that we have observed all these people.


Green Dot has two main goals for the leadership roles: to keep teachers in the classroom who want to remain in the classroom and to provide leadership opportunities for teachers who want to move into administration:

We want to keep everyone in the classroom who wants to stay in the classroom. The demo classroom teacher is our highest-paid leadership position. It incentivizes teachers to stay in the classroom and lead by doing what they do well. We also need administrators, so if people are interested in exploring leadership, we support that.

Despite the variety of new CL roles (which reflected the organization’s needs) and despite the stipends provided, many teachers were reluctant to take on the extra duties. Thus, the roles did not encourage retention of effective teachers. A TCRP study (Abshere, 2016) identified the same reluctance on the part of teachers: “Our [TCRP] teachers talked about really not liking the teacher leadership roles, not seeing them in a way they were intended. Not really a retention driver. It worked for a subset of people” (personal communication; Abshere, 2016). Some teachers said that teacher leadership roles led to a path out of the classroom, which they did not want, and some teachers placed more value on their relationships at their schools and with their principals than on a career path.

The central office decided not to adopt a hierarchical CL after feedback from teachers who felt that being in a position at the bottom of the ladder did not reflect their importance or value to the organization. However, even though the current CL positions are not organized in the form of a ladder, administrators used them as signals of readiness when considering teachers' qualifications for administrative positions. One administrator explained how he factored CL experience into the equation when deciding which teachers to admit to the formal administrator residency program:

When recruiting for administrator residency, I would look for someone who participated in two site-level positions, one of which would be the instructional leadership team, which are the most-effective teachers in each content area. I’d look for somebody with experience in one Green Dot–wide leadership position.

PUC

PUC had two teacher leadership positions before the IP initiative that are still ongoing. One is induction support provider; teachers in this role assist new teachers in meeting state requirements to move from a preliminary to a "clear" credential. The second, learning lab demo teacher, operates during the summer institute for new teachers, in which actual classrooms are set up and new teachers can observe the instruction conducted by the learning lab demo teachers.

In 2013–2014, PUC implemented the Common Core Pioneer position to train teachers in Common Core teaching strategies that they could then model for other teachers at their schools. In 2014–2015, PUC eliminated the position and replaced it with content coordinator and assistant content coordinator positions. The content coordinators prepared descriptions of their best activities and made them available online for other teachers and students. Five CL positions were in place in 2015–2016: advisory panel member, content coordinator, assistant content coordinator, Alumni Teach Project mentor, and learning lab demo teacher. The minimum qualification for every one of these positions in 2015–2016 was that the teacher be in good standing with respect to his or her evaluation and be recommended by his or her principal. The central office reviewed the qualifications and interviewed the candidates. (PUC stopped calculating composite TE scores after 2012–2013, so these were not available to use for selection.) Some positions also required that the teacher have at least two years of experience.

PUC central-office administrators described the purpose of the teacher leader roles as giving teachers more responsibilities and stipends based on their expertise, both so that teachers who wanted to stay in the classroom could do so and so that teachers could gain experience to move into leadership. Additionally, for people in the positions with stipends, those stipends would serve in place of effectiveness bonuses. One administrator added, “We believe [that] the career path for which teachers have to qualify . . . is, in essence, going to be rewarding them for effectiveness.”

Despite these intentions, central-office staff acknowledged that PUC has been slow to implement CL positions and that those that it has implemented are not attracting large numbers of teachers. One central-office staff person commented,

I don’t have a good explanation for why we haven’t moved on this. It’s definitely the area we haven’t moved in, especially compared to the other CMOs. It’s apparently a priority for this year [2015–2016]. We’re still in at the level of identifying the teacher leader positions, and we haven’t got up to the point of identifying what the career path is and we haven’t restructured our salary schedule.

Principals were supposed to publicize the positions and identify appropriate teachers to fill them, but, apparently, this has not always occurred. Another staff person suggested that SLs might not have been prepared for this task: “Almost all of our school leaders know about the roles, but not all are using them or feel capable of naming a teacher [who] can do X, Y, or Z for us.”


Appendix K. Additional Exhibits for Chapter Eight

Table K.1. Teacher Survey Questions About Awareness of CLs and Specialized Positions

Question 1: This year, does your district/CMO have in place a "career ladder" for teachers, or specialized instructional positions that teachers may take on if they are considered qualified?
Who responded: All survey respondents
Response options:
• Yes
• Partially implemented or being phased in (for example, some positions are currently available while others are still being developed)
• No
• Don't know

Question 2: This year, are there teachers who hold higher-level career ladder or specialized instructional positions at your school?
Who responded: Only respondents who answered "yes" or "partially implemented or being phased in" to the first question
Response options:
• Yes, me (non-exclusive)
• Yes, teacher(s) other than me (non-exclusive)
• No (exclusive)
• Don't know (exclusive)

Question 3: Please fill in the title of the career ladder or specialized position you currently hold.
Who responded: Only respondents who answered "Yes, me" to the second question
Response options:
• (write-in response)


Figure K.1. SLs Reporting That Their Site Had or Was Phasing in a CL or Specialized Instructional Positions, Springs 2013–2016

NOTE: Omitted response categories are “no” and “don’t know.” We did not ask the question in 2011. Because of rounding, some percentages do not sum precisely.


Figure K.2. SLs Reporting That There Were Teachers at Their School Who Held Higher-Level CL or Specialized Instructional Positions, Springs 2013–2016

NOTE: We asked this question only of SLs who said that their site had a fully or partially implemented CL. Omitted response categories are “no” and “don’t know.”

Figure K.3. Teachers’ Agreement with Statements About CLs, Selected Sites and Years

NOTE: We based the decision about which site-­years to include on the analysis of awareness presented in Chapter Eight of the report. Omitted response categories are “disagree somewhat” and “disagree strongly.” Significant (p < 0.05) differences between the 2015 percentage and other years' percentages: For the statement in row 1, Alliance increased from 2014, Aspire increased from 2013, and PUC increased from 2013 and 2014 and decreased to 2016. For the statement in row 2, PPS increased from 2014. For the statement in row 3, PPS increased from 2013 and 2014, Alliance increased to 2016, and Aspire increased from 2013. For the statement in row 4, PPS increased from 2013 and 2014 and Aspire increased from 2013.

Percentage of teachers agreeing with each statement (somewhat or strongly)

SCS PPS Alliance Aspire Green Dot PUC

The process by which teachers in my district/CMO are selected for the various career ladder/specialized positions is fair.

I aspire to a higher or specialized teaching position in my district/CMO.

The opportunity to advance to a higher or specialized teaching position in my district/CMO has motivated me to improve my instruction.

The opportunity to advance to a higher or specialized teaching position in my district/CMO increases the chances that I will remain in teaching.


Appendix L. Resources Invested in the IP Initiative: Analytic Methods for Chapter Nine

Site Expenditure Data and Analysis

Data Sources

We based our analysis of sites’ IP expenditures mainly on the financial reports that each site submitted to the Gates Foundation. For most of the sites, we obtained copies of the financial report files submitted for 2013, 2014, 2015, and 2016. We also looked at the stocktake narratives and other supporting documentation that accompanied the financial reports. Table L.1 shows the expenditure files that we reviewed for each site.

Table L.1. IP Sites’ Financial Reports

Site and information provided: file names

HCPS, expenditure reporting to the foundation:
• HCPS IPS Financial Report - October 19 2012.xls
• HCPS IPS Financial Report - Fall 2013.xls
• Hillsborough IPS Financial Report - Spring 2014 final sent.xls
• Hillsborough IPS Financial Report - Fall 2015 8-24-15.xls
• HCPS Final GATES Stocktake Fall 2016.xls

HCPS, stocktake submissions to the foundation:
• Stocktake Narrative_Fall2015 Final.pdf

PPS, expenditure reporting to the foundation:
• Pittsburgh IPS Financial Report - Fall 2014.xls
• Pittsburgh IPS Financial Report - Spring 2015.xls
• OPP1006112_2015_PPS_Financial Report - For the Period of Jan-Dec 2016.xls

PPS, stocktake submissions to the foundation:
• PittsburghPublicSchools_Spring2013StocktakeNarrative.pdf
• Financial Section - Pittsburgh Public Schools Fall 2014 Sustainability Progress Plans.pdf

SCS, expenditure reporting to the foundation:
• Shelby IPS Financial Report Fall 2014 with Actuals for FY2015.xls
• Copy of OPP1006364_2016_Shelby_Budget_FALL 2016 Final 9-12-2016 Final.xls

SCS, stocktake submissions to the foundation:
• SCS IPS Progress Report Dec 2014 Submitted.pdf
• SCS IPS Progress Report_Fall 2015 FINAL.doc

Alliance, expenditure reporting to the foundation:
• Alliance.Stocktake Fall 2013_11-25-2013.xls
• Alliance Financial Report - Fall 2014.xls
• Alliance Financial Report - Fall 2015.xls
• OPP1040958_2016_Alliance_IPS Financial Report - Fall 2016.xls

Aspire, expenditure reporting to the foundation:
• Aspire_Stocktake Nov 2013_FINAL.xls
• Aspire Financial Report - Fall 2014 10.13.14.xls
• 100116 Aspire IPS Financial Report - Fall 2016.xls

Aspire, stocktake submissions to the foundation:
• Aspire_Contextualizing RAND Data_Stocktake January 2015.pdf

Green Dot, expenditure reporting to the foundation:
• Green Dot Financial Report - Fall 2014.xls
• Green Dot IPS Financial Report - Fall 2015 BMGF.xls
• OPP1040954_Green_Dot_IPS_Financial_Report_Fall_2016.xls

Green Dot, stocktake submissions to the foundation:
• Green Dot November 2013 with problem of practice.pptx
• GD_Programmatic Stocktake final.pptx (January 2015)

PUC, expenditure reporting to the foundation:
• PUC Programmatic Stocktake Spring 13.xls
• PUC Financial Report 11-21-14.xls

PUC, stocktake submissions to the foundation:
• PUC Programmatic Jan 2015.pdf

TCRP, expenditure reporting to the foundation:
• TCRP Gates Expenditures 2010.xls
• TCRP_Actual_vs_Budget_Report_063010_v4_IPS_Progress_Report.xlsx
• TCRP_Actual_vs_Budget_Report_123110.xlsx

TCRP, stocktake submissions to the foundation:
• Hub Programmatic and Fiscal Stocktake Spring 13

NOTE: Expenditure statements were attachments to the stocktake documents, and, in Aspire and PUC, they did not have separate titles. We did not receive complete stocktake submissions from Alliance—just the expenditure attachment.

In the financial reports, each site detailed the strategies and activities it implemented, with its corresponding expenditures and funding source. The strategies and the specific line items listed under each strategy differed by site, especially among the three districts. For example, as Table L.2 shows, HCPS had nine strategies, whereas SCS had only five; the number of individual line items ranged in the districts from just over 100 (SCS) to more than 250 (PPS). There appeared to be greater consistency among the CMOs: Alliance, Aspire, and PUC listed the same six strategies; Green Dot had these six plus two more. However, despite the similarity of overall strategy names, the CMOs also reported their expenditures quite differently from one another at the line-item level, and the number of line items in the CMO reports ranged from about 40 in Alliance and PUC up to 80 in Green Dot. The next section explains how our analysis of expenditures handled this variation across sites.


Table L.2. Strategies, by Site

Site | Approximate Number of Unique Line Items Reported | Strategies
HCPS | 120 | Measuring Teacher Effectiveness, Generation for Pay, Programs and Incentives for High Needs Students, Apprentice Teacher Acceleration Program, Enhanced Recruitment and Dismissal, Strengthen School Leadership, Performance Management, Integrated Instructional Toolkit, Change Management Communication
PPS | 270 | PRC, Teacher Practice Evaluation, HR Effectiveness, TLE, Teachers Academy, CL, Aligned IT Systems, Performance Pay/Collective Bargaining, Integrated Communications, Project Management
SCS | 110 | Define and Measure Effective Teaching; Make Smarter Decisions About Who Teaches; Better Support, Utilize, and Compensate Teachers; Improve the Surrounding Context to Foster Effective Teaching; Overall Implementation
Alliance | 45 | Teacher Supports, Principal Leadership, Teacher Residency, Extended Implementation Team, Career Path, Differentiated Compensation
Aspire | 60 | Teacher Supports, Principal Leadership, Teacher Residency, Extended Implementation Team, Career Path, Differentiated Compensation
Green Dot | 80 | Teacher Supports, Principal Leadership, Teacher Residency, Extended Implementation Team, Career Path, Differentiated Compensation, Program Expense Reimbursement, Counseling
PUC | 40 | Teacher Supports, Principal Leadership, Teacher Residency, Extended Implementation Team, Career Path, Differentiated Compensation

SOURCES: IP site financial reports for the fall of 2014 and the springs of 2015 and 2016.

Detailed financial reports were not available for the CMOs for the years before FY 2012. For FYs 2010 and 2011, when the CMOs were organized collectively as TCRP, we estimated each CMO's funding by prorating the total TCRP funding in those years by each CMO's share of the four CMOs' combined funding in FYs 2012–2014.
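The proration is a simple share calculation; the sketch below illustrates it with hypothetical dollar amounts (the actual TCRP totals and CMO-level figures are not reproduced here).

```python
# Illustration of the proration described above, using hypothetical amounts.
# Each CMO's FY 2010 and FY 2011 funding is estimated by applying its share of
# the four CMOs' combined FY 2012-2014 funding to the total TCRP funding.

tcrp_totals = {"FY2010": 4_000_000, "FY2011": 6_000_000}        # hypothetical
cmo_fy2012_2014 = {"Alliance": 9_000_000, "Aspire": 6_000_000,  # hypothetical
                   "Green Dot": 3_000_000, "PUC": 2_000_000}

combined = sum(cmo_fy2012_2014.values())
estimated = {
    cmo: {year: total * amount / combined for year, total in tcrp_totals.items()}
    for cmo, amount in cmo_fy2012_2014.items()
}
print(estimated["Alliance"]["FY2010"])  # Alliance's estimated FY 2010 funding
```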

Data Analysis

Our analyses of total expenditures and expenditures by funding source were relatively straightforward: We simply summed the reported expenditures across years or across funding sources. We calculated per-pupil expenditures by dividing each expenditure by the number of students enrolled in the site in 2015–2016. We took these enrollment numbers from the sites’ stocktakes, data dashboards, and, for PPS only, the general fund budget. Although we know that student enrollments might have changed over the course of the initiative, particularly in the CMOs, we elected to keep the enrollment number constant so that per-pupil expenditures would be more comparable across years. In addition, for the per-pupil expenditures across the entire grant period, we did not have data allowing us to calculate the number of unique students in each site over the whole grant period. For the sake of simplicity, we elected to use the final-year (2015–2016) enrollment.
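A minimal sketch of the per-pupil calculation, with hypothetical expenditure and enrollment figures (the actual figures come from the sites' stocktakes, data dashboards, and budgets):

```python
# Divide each expenditure total by the site's 2015-2016 enrollment, which is
# held constant across years. All figures below are hypothetical.

enrollment_2015_16 = 25_000
expenditures_by_year = {          # in dollars
    "2013-2014": 10_500_000,
    "2014-2015": 12_250_000,
    "2015-2016": 9_750_000,
}

per_pupil_by_year = {year: amount / enrollment_2015_16
                     for year, amount in expenditures_by_year.items()}
per_pupil_total = sum(expenditures_by_year.values()) / enrollment_2015_16

print(per_pupil_by_year)
print(round(per_pupil_total))
```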


Our analysis of spending by implementation lever was somewhat more involved. This analysis required calculating expenditures in each of the four main implementation lever categories: teacher evaluation, staffing (recruitment, hiring, placement, transfer, tenure, and dismissal), PD, and compensation and CLs. The site financial reports, however, listed expenditures by the sites’ strategies (as shown in Table L.2), which did not always correspond directly to one of the four lever categories. Thus, we attempted to classify each individual line item into one or more of the levers. Because our knowledge of the individual activities was sometimes limited, we shared with each site the definition of each implementation lever and our initial classification of each of the site’s reported expenditures. We then met with each site to discuss the classification, including whether certain expenditures should be distributed across multiple levers.26 For some expenditures cutting across levers, the sites specified how we should allocate the expenditures. For example, HCPS had a line item referring to lead mentors; the district finance person indicated that 85 percent of this expenditure should be allocated to the teacher-evaluation lever and the remaining 15 percent to the PD lever. In other cases, we apportioned an expenditure proportionally across all of the levers. For instance, the line items under HCPS’s Change Management Communication strategy applied to all of the levers. In consultation with the site, we distributed these expenditures proportionally across all four of the implementation levers, once we knew what each lever’s proportion of total spending was.
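The allocation logic amounts to weighting each line item across the four levers and then spreading cross-cutting items in proportion to the classified totals. A minimal sketch follows, with hypothetical line items and amounts; only the 85/15 lead-mentor split comes from the text above.

```python
# Allocate line-item expenditures to implementation levers using item-level
# weights, then spread cross-cutting items proportionally. Hypothetical data.

LEVERS = ["evaluation", "staffing", "PD", "compensation_CL"]

line_items = [
    # (description, amount in dollars, {lever: share of the amount})
    ("Lead mentors", 1_000_000, {"evaluation": 0.85, "PD": 0.15}),
    ("Recruiting and placement support", 200_000, {"staffing": 1.0}),
    ("Observer stipends", 350_000, {"evaluation": 1.0}),
]

totals = {lever: 0.0 for lever in LEVERS}
for _description, amount, shares in line_items:
    for lever, share in shares.items():
        totals[lever] += amount * share

# A strategy that applies to all levers (e.g., communication costs) is spread
# in proportion to each lever's share of the spending classified so far.
cross_cutting_amount = 100_000
classified_total = sum(totals.values())
for lever in LEVERS:
    totals[lever] += cross_cutting_amount * totals[lever] / classified_total

print(totals)
```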

Time Allocation Data and Analysis

Except for the “short” teacher surveys administered in 2014 and 2016, the surveys administered to teachers and SLs (see Appendix A) included detailed questions about respondents’ time allocation. In this report, we present results based on the surveys administered in the spring of 2013 (about the 2012–2013 school year) and the spring of 2015 (about the 2014–2015 school year). We did not use data from the 2011 surveys because the format for the time allocation questions in that year’s surveys differed from that used in subsequent years, creating a challenge for comparability across years. In addition, in preliminary analyses, we found that there were almost no differences between 2012 and 2013 in the SL time allocations; the same was true for 2014 and 2015. Thus, for simplicity and to parallel the teacher survey data, we present the SL results from the spring of 2013 and the spring of 2015 only.

Description of the Survey Section

26 We were able to meet with all of the sites except PUC.
27 As an aid to respondents, the surveys provided the typical number of contract days in each site.

The time allocation section of each survey began by asking respondents how many contract days they worked per year and how many hours they worked in a typical work week.27 (For hours, the survey instructed respondents to enter actual hours, including off-site hours and weekend hours, rather than contract hours.) The survey then presented a detailed list of specific activities and asked respondents to report the hours (either per week or per year, whichever they preferred for each activity) they spent on each activity. The teacher survey asked about 39 specific activities, grouped into the following categories:

• classroom instruction during the regular school year (two activities)
• noninstructional contact with students and contact with families (three activities)
• PD received (11 activities)
• participating in activities related to the respondent’s own performance evaluation (three activities)
• serving as a formal or informal mentor, instructional coach, or PD provider (four activities)
• observing teachers for the purpose of their formal evaluation (five activities, seen only if the respondent had reported earlier in the survey that he or she evaluated other teachers)
• general administrative, nonteaching activities (six activities)
• participating in activities related to district reform initiatives (two activities)
• planning and preparation for the classes the respondent taught (three activities).

In a separate section, the teacher survey also asked the respondent about the amount of time he or she had spent the previous summer (before the current school year) on activities related to his or her job as a teacher, such as planning and attending training or other PD.

Similarly, the SL survey asked SLs to “provide your best estimate of hours you spend” on each of 30 activities, grouped as follows:

• PD received (six activities)
• PD provided for staff (three activities)
• observation and evaluation of teachers and other staff (five activities)
• administrative duties and activities (nine activities)
• recruitment and hiring (three activities)
• classroom instruction and related preparation duties (two items, seen only if the respondent had reported earlier in the survey that he or she had official teaching responsibilities)
• district reform initiative activities (two activities).

Data Cleaning and Processing

Before conducting our analyses, we cleaned and processed the data. First, for teachers only, we checked the total amount of time they reported spending teaching in relation to total hours worked. In particular, we found that some teachers who reported working at least 35 hours during a typical week reported teaching less than seven hours during a regular week.28 We concluded that these teachers likely mistakenly entered daily, instead of weekly, hours for time spent teaching. We thus multiplied their instructional hours by 5 to create a number of weekly hours.

28 We restricted this analysis to teachers who filled regular teaching roles—that is, those who indicated that they were “regular education teachers” who either “taught a single group of students all or most of the day in multiple subject areas” or “several classes of different students during the day in a particular subject or two subjects.”

Second (or first, for SLs), we calculated weekly hours spent on each activity. This was straightforward when a respondent entered the time as a per-week total; when a respondent instead entered time as a per-year total, we used the reported number of contracted workdays to convert, for each activity, the yearly hours into weekly hours. (We also factored in teachers’ summer hours.) We then summed the weekly hours across all activities and checked to see how closely this cross-activity sum matched the total amount of time the respondent reported working in a week. Although the survey instructed respondents to attempt to make the two amounts of time match, we nevertheless found some discrepancies. We assumed that the reported total number of weekly hours worked was likely to be more accurate than the sum of the reported hours worked on each of the individual activities, so we used the reported total weekly hours in our analyses rather than the sum of hours across activities. In particular, we calculated the percentage of weekly hours spent on the various activities and then multiplied the percentage of time for each activity by the total number of reported weekly hours. In other words, we created a revised weekly total of hours per activity that preserved the relative proportions of time spent on each activity but rescaled the hours themselves to sum to the total number of reported weekly hours.
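A minimal sketch of these two steps, assuming five workdays per contract week and using hypothetical survey responses:

```python
# Convert per-year entries to weekly hours using contract days, then rescale
# activity hours so they sum to the reported weekly total while preserving
# each activity's share. All inputs are hypothetical.

def weekly_hours(entry_hours, per_year, contract_days):
    """Convert a per-year entry to weekly hours; pass per-week entries through."""
    if per_year:
        weeks_worked = contract_days / 5.0   # assume a 5-day contract week
        return entry_hours / weeks_worked
    return entry_hours

contract_days = 190
reported_total_per_week = 55.0

responses = [                       # (activity, hours entered, entered per year?)
    ("classroom instruction", 25.0, False),
    ("PD received", 76.0, True),
    ("planning and preparation", 12.0, False),
]

weekly = {activity: weekly_hours(hours, per_year, contract_days)
          for activity, hours, per_year in responses}
activity_sum = sum(weekly.values())

rescaled = {activity: hours / activity_sum * reported_total_per_week
            for activity, hours in weekly.items()}
print(rescaled)
```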

Requirements for Inclusion in Analysis

To be included in the analysis, a respondent had to complete the time allocation section of the survey, particularly the time worked for each regular school day activity. We included in the sample any respondent who did not report the total time worked during the regular school year if he or she reported the time for each activity; for these respondents, we imputed total time per week using the sum of time across activities. In addition, we dropped from the analysis any respondent who had outlying values on the sum of his or her regular school year hours across all activities; we defined outliers as those whose sum fell outside the outer fences (the 25th percentile minus three times the interquartile range and the 75th percentile plus three times the interquartile range). We also dropped any teacher survey on which the respondent reported working less than 11 hours per week and any SL survey on which the respondent reported working less than ten hours per week.
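A minimal sketch of the outlier screen, using Python's statistics module and hypothetical weekly totals:

```python
# Drop respondents whose summed weekly hours fall outside the outer fences
# (Q1 - 3*IQR, Q3 + 3*IQR), plus the minimum-hours floor noted above.

import statistics

def outer_fences(values):
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 3 * iqr, q3 + 3 * iqr

weekly_totals = [38, 42, 45, 50, 52, 55, 58, 60, 62, 65, 180, 9]  # hypothetical
low_fence, high_fence = outer_fences(weekly_totals)

min_hours = 11   # 11 for teachers; ten for SLs
kept = [h for h in weekly_totals
        if low_fence <= h <= high_fence and h >= min_hours]
print(low_fence, high_fence, kept)
```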

Analytic Samples

Table L.3 provides the number of teachers and SLs we excluded from the analysis and the summary statistics for the final samples. Table L.4 shows the final sample sizes, by site.


Table L.3. Detailed Description of SL and Teacher Survey Sample Exclusions for the Time Allocation Analysis

Survey | Number of Surveys Responding | Missing Time Allocation Data (Dropped)a | Low Hour Outliers (Dropped)b | High Hour Outliers (Dropped)c | Final Sample Size | Total Weekly Hours: Mean | SD | Min | Max
Teacher, 2012–2013 | 3,602 | 140 | 41 | 102 | 3,319 | 56.0 | 12.2 | 11 | 100
Teacher, 2014–2015 | 3,625 | 226 | 40 | 82 | 3,277 | 56.6 | 13.3 | 12 | 100
School leader, 2012–2013 | 845 | 41 | 6 | 43 | 755 | 58.0 | 13.8 | 8 | 220
School leader, 2014–2015 | 836 | 57 | 10 | 36 | 733 | 59.7 | 22.2 | 7 | 375

SOURCES: Teacher and SL surveys from the springs of 2013 and 2015.
NOTE: Numbers and hours are unweighted.
a We excluded from the sample any respondent who did not report time worked for each regular school day activity.
b We excluded from the sample any respondent who reported weekly hours below the value of the lower fence (the 25th percentile minus three times the interquartile range). In addition, we excluded from the sample any teacher who reported working less than 11 hours per week and any SL who reported working less than ten hours per week.
c We excluded from the sample any respondent with weekly hours larger than the value of the outer fence (the 75th percentile plus three times the interquartile range). For teachers, the outer fence was 136 hours in 2012–2013 and 139 hours in 2014–2015. For SLs, the outer fence was 118 hours in 2012–2013 and 128 hours in 2014–2015.

Table L.4. Final Sample Sizes, by Site

Site | Teachers in Sample, 2012–2013 | Teachers in Sample, 2014–2015 | SLs in Sample, 2012–2013 | SLs in Sample, 2014–2015
District | 2,437 | 2,337 | 653 | 624
HCPS | 966 | 944 | 423 | 389
PPS | 543 | 538 | 53 | 50
SCS | 928 | 855 | 177 | 185
CMO | 882 | 940 | 102 | 109
Alliance | 294 | 334 | 28 | 39
Aspire | 270 | 255 | 29 | 28
Green Dot | 194 | 212 | 29 | 28
PUC | 124 | 139 | 16 | 14
Total | 3,319 | 3,277 | 755 | 733

SOURCES: Teacher and SL surveys from the springs of 2013 and 2015.


Estimation of the Value of Teacher and SL Time Spent on Evaluation Activities

Data

The RAND data team collected individual-level 2014–2015 compensation data from the sites and provided them, deidentified, to us for all teachers and SLs in each site. The data contained fields for several different types of compensation, including base salary, benefits, and bonuses; these are all the types of compensation included:29

• base salary
• medical and health benefits
• retirement systems
• sick-pay sources
• life insurance benefits
• performance, including bonuses
• teacher reimbursements and stipends
• paid time off, including holiday and vacation
• disability benefits
• overtime
• instruction beyond the normal school day or school year
• teaching workshops
• adjustments and differentials
• summer teaching and activities
• substitute-related activities
• extracurricular activities, including athletics and clubs
• TIF
• other.

We calculated each individual’s overall compensation by summing across all the types of compensation.

Data Analysis

We created our teacher analysis sample based on the following criteria: Total compensation had to be between $15,000 and $150,000. We assumed that teachers whose compensation was not in this range were highly atypical and should not be included in our compensation analyses. For SLs, we did not find any substantially low or high values, so we did not exclude any observations.30
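A minimal sketch of the teacher-sample screen, with hypothetical compensation records:

```python
# Keep teachers whose total compensation falls between $15,000 and $150,000;
# SL records are kept as-is. All records below are hypothetical.

teacher_compensation = [12_000, 48_500, 61_200, 149_000, 210_000]  # dollars

LOW, HIGH = 15_000, 150_000
analysis_sample = [c for c in teacher_compensation if LOW <= c <= HIGH]

print(analysis_sample)          # [48500, 61200, 149000]
site_total = sum(analysis_sample)
```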

29 Compensation data are from FY 2015.
30 SLs’ total compensation ranged from $53,000 to $275,000.


Then we summed the overall compensation amounts across individuals to estimate each site’s total compensation-related expenditures. We did this separately for teachers and SLs.

Finally, to estimate the value of teacher and SL time allocated to evaluation activities, we simply multiplied the percentage of total time they spent on evaluation by the total compensation expenditures. Tables L.5 and L.6 illustrate our calculations.
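As a worked example, the sketch below reproduces the HCPS teacher row of Table L.5:

```python
# Value of time on evaluation = percentage of time spent on evaluation times
# total compensation expenditures; per pupil = that value over enrollment.
# Figures are the HCPS values reported in Table L.5.

total_compensation = 755_493_727   # total teacher compensation, in dollars
pct_time_on_evaluation = 3.5       # percentage of time on evaluation, 2014-2015
enrollment = 193_532

value_of_time = total_compensation * pct_time_on_evaluation / 100
per_pupil = value_of_time / enrollment

print(round(value_of_time))  # about 26.4 million dollars
print(round(per_pupil))      # about 137 dollars per pupil
```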

Table L.5. Value of Teacher Time Spent on Evaluation Activities

Site | Total Compensation Expenditures, in Dollars | Percentage of Time Spent on Evaluation, 2014–2015 | Estimated Value of Time on Evaluation, in Dollars | Enrollment | Per Pupil, in Dollars
HCPS | 755,493,727 | 3.5 | 26,442,280 | 193,532 | 137
PPS | 67,677,213 | 4.6 | 3,113,152 | 25,504 | 122
SCS | 326,593,440 | 5.7 | 18,615,826 | 117,269 | 159
Alliance | 30,337,756 | 5.7 | 1,729,252 | 11,000 | 157
Aspire | 44,701,196 | 4.7 | 2,100,956 | 14,682 | 143
Green Dot | 31,865,087 | 3.1 | 987,818 | 11,909 | 83
PUC | 12,766,385 | 3.8 | 485,123 | 4,800 | 101

SOURCES: Spring 2015 teacher survey and compensation data for FY 2015.

Table L.6. Value of SL Time Spent on Evaluation Activities

Site | Total Compensation Expenditures, in Dollars | Percentage of Time Spent on Evaluation, 2014–2015 | Estimated Value of Time on Evaluation | Enrollment | Per Pupil, in Dollars
HCPS | 23,931,910 | 23.6 | 5,643,261 | 193,532 | 29
PPS | 6,338,235 | 26.7 | 1,690,618 | 25,504 | 66
SCS | 34,333,684 | 26.3 | 9,034,014 | 117,269 | 77
Alliance | 6,666,519 | 18.6 | 1,242,008 | 11,000 | 113
Aspire | 2,633,081 | 22.4 | 590,197 | 14,682 | 40
Green Dot | 5,317,577 | 23.4 | 1,241,865 | 11,909 | 104
PUC | 2,456,479 | 15.0 | 369,369 | 4,800 | 77

SOURCES: Spring 2015 SL survey and compensation data for FY 2015.


Appendix M. Additional Exhibits for Chapter Nine

Table M.1. Teacher Time Allocation Mean Percentages, by Site

Site | Construct | 2013 | 2015 | Difference
HCPS | Classroom instruction | 51 | 51 | 0
HCPS | Instructional planning | 21 | 21 | 0
HCPS | Administration | 4 | 4 | 0
HCPS | Contact with students and families | 8 | 8 | –1
HCPS | PD | 12 | 12 | 0
HCPS | Mentoring and evaluation | 3 | 4 | 0
HCPS | Reform | 1 | 1 | 0
PPS | Classroom instruction | 46 | 46 | 0
PPS | Instructional planning | 22 | 23 | 1
PPS | Administration | 5 | 5 | 0
PPS | Contact with students and families | 9 | 9 | 0
PPS | PD | 12 | 12 | 0
PPS | Mentoring and evaluation | 5 | 5 | –1
PPS | Reform | 1 | 1 | 0a
SCS | Classroom instruction | 51 | 50 | –1
SCS | Instructional planning | 17 | 18 | 1
SCS | Administration | 4 | 4 | 0
SCS | Contact with students and families | 10 | 9 | –1
SCS | PD | 13 | 13 | 0
SCS | Mentoring and evaluation | 4 | 6 | 1a
SCS | Reform | 1 | 1 | 0
Alliance | Classroom instruction | 51 | 49 | –2a
Alliance | Instructional planning | 22 | 21 | –1
Alliance | Administration | 4 | 4 | 0
Alliance | Contact with students and families | 6 | 6 | 0
Alliance | PD | 13 | 14 | 1
Alliance | Mentoring and evaluation | 4 | 6 | 2a
Alliance | Reform | 0 | 0 | 0
Aspire | Classroom instruction | 46 | 47 | 1
Aspire | Instructional planning | 28 | 22 | –6a
Aspire | Administration | 5 | 5 | 1
Aspire | Contact with students and families | 6 | 7 | 1a
Aspire | PD | 11 | 14 | 3a
Aspire | Mentoring and evaluation | 5 | 5 | 0
Aspire | Reform | 1 | 1 | 0
Green Dot | Classroom instruction | 47 | 49 | 3
Green Dot | Instructional planning | 24 | 23 | –2
Green Dot | Administration | 5 | 5 | 0
Green Dot | Contact with students and families | 7 | 7 | 0
Green Dot | PD | 13 | 13 | 0
Green Dot | Mentoring and evaluation | 4 | 3 | –1
Green Dot | Reform | 1 | 0 | 0a
PUC | Classroom instruction | 44 | 46 | 2
PUC | Instructional planning | 28 | 22 | –6a
PUC | Administration | 4 | 5 | 1a
PUC | Contact with students and families | 6 | 8 | 1a
PUC | PD | 14 | 15 | 1a
PUC | Mentoring and evaluation | 3 | 4 | 1a
PUC | Reform | 1 | 0 | 0a

SOURCES: Teacher surveys from the springs of 2013 and 2015.
NOTE: We calculated differences using unrounded data and then rounded to the nearest whole percentage.
a The difference between years is statistically significant at p < 0.05.

Table M.2. SL Time Allocation Mean Percentages, by Site

Site | Construct | 2013 | 2015 | Difference
HCPS | Administration | 50 | 52 | 2a
HCPS | Classroom instruction | 0 | 0 | 0a
HCPS | Evaluation | 24 | 24 | 0a
HCPS | PD received | 14 | 14 | 0a
HCPS | PD provided | 6 | 6 | 0a
HCPS | Recruitment | 2 | 2 | 0a
HCPS | Reform | 3 | 3 | 0a
PPS | Administration | 45 | 44 | –1
PPS | Classroom instruction | 0 | 0 | 0
PPS | Evaluation | 28 | 27 | –1
PPS | PD received | 16 | 17 | 1
PPS | PD provided | 8 | 9 | 1
PPS | Recruitment | 1 | 2 | 1a
PPS | Reform | 3 | 3 | 0
SCS | Administration | 39 | 43 | 4a
SCS | Classroom instruction | 0 | 0 | 0
SCS | Evaluation | 29 | 26 | –3a
SCS | PD received | 17 | 16 | –1
SCS | PD provided | 9 | 9 | 0
SCS | Recruitment | 3 | 2 | –1a
SCS | Reform | 3 | 3 | 0
Alliance | Administration | 49 | 50 | 1
Alliance | Classroom instruction | 2 | 2 | 0
Alliance | Evaluation | 22 | 19 | –3a
Alliance | PD received | 12 | 16 | 4a
Alliance | PD provided | 9 | 9 | 0
Alliance | Recruitment | 3 | 2 | –1
Alliance | Reform | 3 | 3 | 0
Aspire | Administration | 47 | 44 | –3
Aspire | Classroom instruction | 5 | 4 | –1
Aspire | Evaluation | 23 | 22 | –1
Aspire | PD received | 13 | 14 | 1
Aspire | PD provided | 9 | 11 | 2
Aspire | Recruitment | 3 | 3 | 0
Aspire | Reform | 1 | 2 | 1
Green Dot | Administration | 49 | 44 | –5a
Green Dot | Classroom instruction | 0 | 0 | 0
Green Dot | Evaluation | 22 | 23 | 1
Green Dot | PD received | 15 | 14 | –1
Green Dot | PD provided | 10 | 14 | 4a
Green Dot | Recruitment | 3 | 3 | 0
Green Dot | Reform | 2 | 2 | 0
PUC | Administration | 42 | 38 | –4
PUC | Classroom instruction | 0 | 1 | 1
PUC | Evaluation | 24 | 15 | –9a
PUC | PD received | 14 | 20 | 6a
PUC | PD provided | 15 | 21 | 6
PUC | Recruitment | 3 | 4 | 1a
PUC | Reform | 3 | 1 | –2a

SOURCES: SL surveys from the springs of 2013 and 2015.
NOTE: We calculated differences using unrounded data and then rounded to the nearest whole percentage.
a The difference between years is statistically significant at p < 0.05.


Table M.3. Principal and AP Time Allocation Mean Percentages, by Site

Site | Construct | Principal, 2013 | AP, 2013 | Difference, 2013 | Principal, 2015 | AP, 2015 | Difference, 2015
HCPS | Administration | 42 | 55 | 13a | 45 | 56 | 11a
HCPS | Classroom instruction | 0 | 0 | 0a | 0 | 0 | 0
HCPS | Evaluation | 31 | 19 | –12a | 30 | 20 | –10a
HCPS | PD received | 14 | 14 | 0 | 13 | 14 | 1a
HCPS | PD provided | 6 | 7 | 1a | 6 | 6 | 0
HCPS | Recruitment | 3 | 2 | –1a | 3 | 2 | –1a
HCPS | Reform | 4 | 3 | –1a | 3 | 2 | –1a
PPS | Administration | 37 | 62 | 25a | 44 | 46 | 0
PPS | Classroom instruction | 0 | 0 | 0 | 0 | 0 | 0
PPS | Evaluation | 34 | 16 | –18a | 27 | 22 | –5a
PPS | PD received | 17 | 12 | –5a | 15 | 20 | 0
PPS | PD provided | 9 | 5 | –4a | 10 | 8 | 0
PPS | Recruitment | 1 | 2 | 0 | 2 | 2 | 0
PPS | Reform | 2 | 3 | 0 | 3 | 3 | 0
SCS | Administration | 37 | 40 | 3a | 40 | 45 | 5a
SCS | Classroom instruction | 0 | 0 | 0 | 0 | 0 | 0
SCS | Evaluation | 30 | 28 | –2a | 28 | 25 | –3a
SCS | PD received | 16 | 19 | 3a | 17 | 16 | 0
SCS | PD provided | 9 | 8 | 0 | 10 | 8 | –2a
SCS | Recruitment | 3 | 2 | –1a | 2 | 2 | 0
SCS | Reform | 4 | 3 | –1a | 3 | 3 | 0
Alliance | Administration | 54 | 46 | –8a | 49 | 50 | 0
Alliance | Classroom instruction | 1 | 2 | 1a | 1 | 2 | 0
Alliance | Evaluation | 16 | 26 | 10a | 22 | 16 | –6a
Alliance | PD received | 14 | 11 | –3a | 12 | 18 | 6a
Alliance | PD provided | 11 | 8 | –3a | 9 | 10 | 0
Alliance | Recruitment | 2 | 3 | 0 | 3 | 2 | –1a
Alliance | Reform | 2 | 4 | 2a | 4 | 2 | –2a
Aspire | Administration | 39 | 61 | 22a | 40 | 48 | 0
Aspire | Classroom instruction | 1 | 12 | 11a | 0 | 9 | 9a
Aspire | Evaluation | 29 | 10 | –19a | 29 | 16 | –13a
Aspire | PD received | 16 | 6 | –10a | 11 | 17 | 0
Aspire | PD provided | 9 | 9 | 0 | 15 | 7 | –8a
Aspire | Recruitment | 4 | 1 | –3a | 3 | 2 | 0
Aspire | Reform | 2 | 0 | –2a | 2 | 2 | 0
Green Dot | Administration | 52 | 48 | 0 | 41 | 46 | 0
Green Dot | Classroom instruction | 0 | 0 | 0 | 0 | 0 | 0
Green Dot | Evaluation | 21 | 22 | 0 | 21 | 25 | 0
Green Dot | PD received | 12 | 16 | 4a | 15 | 14 | 0
Green Dot | PD provided | 10 | 10 | 0 | 17 | 12 | –5a
Green Dot | Recruitment | 4 | 2 | –2a | 5 | 1 | –4a
Green Dot | Reform | 2 | 2 | 0 | 1 | 3 | 0
PUC | Administration | 49 | 36 | –13a | 37 | 38 | 0
PUC | Classroom instruction | 0 | 0 | 0 | 0 | 1 | 0
PUC | Evaluation | 19 | 27 | 8a | 15 | 15 | 0
PUC | PD received | 13 | 15 | 2a | 23 | 18 | –0
PUC | PD provided | 12 | 18 | 6a | 20 | 22 | 0
PUC | Recruitment | 3 | 2 | –1a | 4 | 5 | 0
PUC | Reform | 4 | 2 | –2a | 1 | 1 | 0

SOURCES: SL surveys from the springs of 2013 and 2015.
NOTE: We calculated differences using unrounded data and then rounded to the nearest whole percentage.
a The difference between principals and APs is statistically significant at p < 0.05.


Appendix N. Additional Exhibits for Chapter Ten

This appendix presents additional information about trends in the effectiveness (measured in terms of both VAM score and composite TE level) of experienced teachers for each of the sites, supplementing the analysis presented in Chapter Ten of the effectiveness of newly hired teachers. We examine the trends in the VAM scores and composite TE levels of experienced teachers as a check on potential drift in composite TE measures of new teachers. We adjusted VAM scores based on state NAEP performance trends to make them equivalent across states and over time. If the changes in the composite TE levels of new hires parallel changes of more-experienced teachers, there are two possible explanations: (1) a drift in the composite TE measure that does not reflect true improvement or (2) an increase in composite TE for all existing teachers and an improvement in teacher-preparation programs such that new teachers are also more effective over time. This comparison facilitates the analysis of whether the changes in hiring policies resulted in true improvements in the effectiveness of new hires.

HCPS

Figure N.1 shows the trends in the VAM scores and composite TE levels of middle-experience teachers (those with three to five years of experience) in HCPS. By VAM score, the distributions are generally as expected until 2014–2015, when the percentage of middle-experience teachers in the bottom 20 percent of the VAM distribution increased and the percentage in the middle 60 percent decreased. Alternatively, by composite TE level, we see a general decrease in the proportion of middle-TE teachers and an increase in the proportion of high-TE teachers over time.

Figure N.1. HCPS Middle-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.1 panels: VAM of teachers with 3–5 years of experience (bottom 20 percent, middle, top 20 percent), 2007–08 through 2014–15; TE of teachers with 3–5 years of experience (low, mid, and high TE), 2010–11 through 2014–15.]

Figure N.2 shows the trends in the VAM scores and composite TE levels of high-experience teachers (those with six or more years of experience) in HCPS. By VAM score, the distributions are as expected throughout the period. Alternatively, by composite TE, we see a general decrease in the proportion of middle-TE teachers and an increase in the proportion of high-TE teachers over time.

Figure N.2. HCPS High-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.2 panels: VAM of teachers with 6+ years of experience, 2007–08 through 2014–15; TE of teachers with 6+ years of experience, 2010–11 through 2014–15.]

The stability of VAM scores for middle- and high-experience teachers, combined with the improvement of composite TE for these same two groups, suggests that there is upward drift in the composite TE ratings for these groups. This is similar to the measurement drift observed among new hires in Figure 10.5 in Chapter Ten.

PPS

Figure N.3 shows the trends in the VAM scores and composite TE levels of middle-experience teachers in PPS. By VAM score, the distributions are variable but without a consistent pattern. Alternatively, by composite TE, we see a general decrease in the proportion of low- and middle-TE teachers and an increase in the proportion of high-TE teachers over time.


Figure N.3. PPS Middle-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.3 panels: VAM of teachers with 3–5 years of experience, 2007–08 through 2014–15; TE of teachers with 3–5 years of experience, 2011–12 through 2014–15.]

Figure N.4 shows the trends in the VAM scores and composite TE levels of high-experience teachers in PPS. By VAM score, the distributions are as expected throughout the period. Alternatively, by composite TE, we see a general decrease in the proportion of middle-TE teachers and an increase in the proportion of high-TE teachers over time.

Figure N.4. PPS High-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.4 panels: VAM of teachers with 6+ years of experience, 2007–08 through 2014–15; TE of teachers with 6+ years of experience, 2011–12 through 2014–15.]

VAM score stability for middle- and high-experience teachers, combined with the improvement of composite TE for these same two groups, suggests that there is upward drift in the composite TE ratings for these groups. This is similar to the measurement drift observed among new hires in Figure 10.6 in Chapter Ten.

SCS

Figure N.5 shows the trends in the VAM scores and composite TE levels of middle-experience teachers in SCS. By VAM score, the distributions are consistent and as expected, with an increase in the percentage of middle-experience teachers in the top 20 percent of the VAM score distribution in 2014–2015. By composite TE, we see an earlier increase in the proportion of high-TE teachers accompanied by a decrease in the proportion of low- and middle-TE teachers over time.

Figure N.5. SCS Middle-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.5 panels: VAM of teachers with 3–5 years of experience, 2008–09 through 2014–15; TE of teachers with 3–5 years of experience, 2011–12 through 2014–15.]

Figure N.6 shows the trends in the VAM scores and composite TE levels of high-experience teachers in SCS. By VAM score, the distributions are as expected throughout the period. Alternatively, by composite TE, we see a general decrease in the proportion of middle- and low-TE teachers and an increase in the proportion of high-TE teachers over time.

Figure N.6. SCS High-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.6 panels: VAM of teachers with 6+ years of experience, 2008–09 through 2014–15; TE of teachers with 6+ years of experience, 2011–12 through 2014–15.]

VAM score stability for middle- and high-experience teachers, combined with the early improvement of composite TE for these same two groups, suggests that there is upward drift in the composite TE ratings for these groups.


Alliance

Figure N.7 shows the trends in the composite TE levels of middle-experience teachers in Alliance. We see a decrease in the proportion of low- and middle-TE teachers accompanied by an increase in the proportion of high-TE teachers over time.

Figure N.7. Alliance Middle-Experience Effectiveness, by Composite TE Level

[Figure N.7 panel: TE of teachers with 3–5 years of experience (low, mid, and high TE), 2011–12 through 2014–15.]

Figure N.8 shows the trends in the composite TE levels of high-experience teachers in Alliance. None of these teachers was in the low-TE category. Also, we see a decrease in the proportion of middle-TE teachers accompanied by an increase in the proportion of high-TE teachers over time.

Figure N.8. Alliance High-Experience Effectiveness, by Composite TE Level

[Figure N.8 panel: TE of teachers with 6+ years of experience (low, mid, and high TE), 2011–12 through 2014–15.]

Without VAM scores, the conclusions that we can draw are limited. However, the level of turnover among middle- and high-experience teachers does not appear to be sufficient to explain the growth in the composite TE levels. Consequently, these results suggest that there is upward drift in the composite TE levels for these groups similar to the upward drift in the composite TE levels of new hires seen in Figure 10.8.

Aspire

Figure N.9 shows the trends in the VAM scores and composite TE levels of middle-experience teachers in Aspire. By VAM score, the distributions are relatively consistent and as expected (with greater variance than the districts due to sample size). We observe a slight increase in the percentage of middle-experience teachers in the top 20 percent of the VAM score distribution in 2013–2015. By composite TE level, we see an earlier increase in the proportion of high-TE teachers accompanied by a decrease in the proportion of low- and middle-TE teachers over time.

Figure N.9. Aspire Middle-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.9 panels: VAM of teachers with 3–5 years of experience, 2008–09 through 2013–14; TE of teachers with 3–5 years of experience, 2011–12 through 2014–15.]

Figure N.10 shows the trends in the VAM scores and composite TE levels of high-experience teachers in Aspire. By VAM score, the distributions are highly variable, which is expected given the sample sizes. However, we observe a decrease in the percentage of high-experience teachers in the bottom 20 percent of the VAM score distribution over time and an accompanying increase in the percentage of teachers in the top 20 percent of the distribution. We also see a decrease in the proportion of middle-TE teachers and an increase in the proportion of high-TE teachers over time.


Figure N.10. Aspire High-Experience Effectiveness, by VAM Score and Composite TE Level

[Figure N.10 panels: VAM of teachers with 6+ years of experience, 2008–09 through 2013–14; TE of teachers with 6+ years of experience, 2011–12 through 2014–15.]

VAM score stability for middle-experience teachers, combined with the improvement of composite TE, suggests that there is upward drift in the composite TE levels for middle-experience teachers. However, the increase in VAM scores among high-experience teachers between 2008–2009 and 2013–2014 suggests that the increase in composite TE over the same period could reflect actual improvements in the effectiveness of high-experience teachers. This differs from the trend of less effective new hires observed in Figure 10.9 in Chapter Ten.

Green Dot

Figure N.11 shows the trends in the composite TE levels of middle-experience teachers in Green Dot. We see a decrease in the proportion of low- and middle-TE teachers accompanied by an increase in the proportion of high-TE teachers over time.

Figure N.11. Green Dot Middle-Experience Effectiveness, by Composite TE Level

[Figure N.11 panel: TE of teachers with 3–5 years of experience (low, mid, and high TE), 2011–12 through 2014–15.]


Figure N.12 shows the trends in the composite TE levels of high-experience teachers in Green Dot. None of these teachers was in the low-TE category. Also, we see a decrease in the proportion of middle-TE teachers accompanied by an increase in the proportion of high-TE teachers over time.

Figure N.12. Green Dot High-Experience Effectiveness, by Composite TE Level

[Figure N.12 panel: TE of teachers with 6+ years of experience (low, mid, and high TE), 2011–12 through 2014–15.]

Without VAM scores, the conclusions that we can draw are limited. However, the level of turnover among middle- and high-experience teachers does not appear to be sufficient to explain the growth in the composite TE levels. Consequently, these results suggest that there is upward drift in the composite TE levels for these groups similar to the drift observed among new hires in Figure 10.10 in Chapter Ten.


Appendix O. Estimating the Relationship Between TE and Retention: Analytic Methods for Chapter Eleven

Modeling Teacher Retention as a Function of Effectiveness

The estimates presented in Chapter Eleven (specifically, Figures 11.8, 11.9, 11.12, 11.13, 11.16, 11.17, 11.20, 11.23, 11.24, and 11.27) and in Figures P.1 through P.6 in Appendix P result from modeling teacher retention as a function of TE (measured in terms of the site's composite TE level or the study-calculated VAM score), controlling for the teacher's age, teaching experience, educational attainment, gender, and race. This modeling conceptualizes teacher retention as responsive to effectiveness. Consequently, the estimates show the effect that composite TE levels and VAM scores measured in one year have on retention in the next year. Specifically, we regressed retention of teacher i in year t + 1, Rit+1, on indicators of effectiveness from year t (E1it, E2it, and E3it). We grouped effectiveness measures (composite TE levels and VAM scores) into three categories: E1it = 1 if the teacher received a low composite TE or VAM rating, E2it = 1 if the teacher received a middle composite TE or VAM rating, and E3it = 1 if the teacher received a high composite TE or VAM rating. We allowed the coefficients on E1it, E2it, and E3it to differ for each year. Additionally, we included a vector of control variables, Xit, including age, gender, race, educational attainment, and teaching experience. Furthermore, we centered each control variable in Xit by its annual mean and excluded the constant so that the coefficients β1t, β2t, and β3t give the expected retention likelihood for an average teacher of each effectiveness level and year. The following equation shows this specification:

R_{i,t+1} = X_{it}\gamma + \sum_{t=1}^{T} \beta_{1t} E1_{it} + \sum_{t=1}^{T} \beta_{2t} E2_{it} + \sum_{t=1}^{T} \beta_{3t} E3_{it} + \varepsilon_{it}.

We ran the models separately for each site. We used a linear probability model that avoids bias introduced by model misspecification (i.e., arbitrarily assigning a distribution to the error terms). In Figures P.1 through P.6 in Appendix P, we plot the estimates of β1t, β2t, and β3t and their confidence intervals.

To examine and test whether composite TE levels or VAM scores had changed by the end of the study period, we estimated a more parsimonious model in which we grouped the years into three periods (pre-IP up through 2009–2010, early IP between 2010–2011 and 2012–2013, and late IP 2013–2014 onward). The model we estimated is similar to the previous model with the exception that, instead of years, t = 1,…,T, we use periods, p = 1,…,P. Again, we excluded the constant so that the coefficients β1p, β2p, and β3p give the expected retention likelihood for an average teacher of each effectiveness level and period. We centered each control variable by its period mean. The following equation shows this specification:

R_{i,t+1} = X_{it}\gamma + \sum_{p=1}^{P} \beta_{1p} E1_{it} + \sum_{p=1}^{P} \beta_{2p} E2_{it} + \sum_{p=1}^{P} \beta_{3p} E3_{it} + \varepsilon_{it}.

Tables O.1 and O.2 in Appendix O show the estimates of these retention rates, β1p, β2p, and β3p, and their standard errors. We also chart these estimates and their confidence intervals in Figures 11.8, 11.9, 11.12, 11.13, 11.16, 11.17, 11.20, 11.23, 11.24, and 11.27 in Chapter Eleven. The blue, red, and green bars depict different effectiveness levels. In Figures 11.8, 11.12, 11.16, 11.20, 11.23, and 11.27, blue depicts low composite TE level, red depicts middle composite TE level, and green depicts high composite TE level. In Figures 11.9, 11.13, 11.17, and 11.24, blue depicts low VAM scores or the bottom 20 percent of the distribution, red depicts middle VAM scores or the middle 60 percent of the distribution, and green depicts high VAM scores or the top 20 percent of the distribution. The definition of each of these levels varies by district, but we specify them in the figure notes. Additionally, we group the results by period: pre-IP (up through 2009–2010), early IP (2010–2011 through 2012–2013), and late IP (2013–2014 onward), although composite TE level is not available for the pre-IP period.
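A minimal sketch of how such a linear probability model could be estimated with pandas and statsmodels; the data frame and column names are hypothetical, and this is not the study's actual estimation code.

```python
# Linear probability model with one indicator per effectiveness level per year,
# annually mean-centered controls, and no constant, so each indicator's
# coefficient is the expected retention rate for an average teacher in that
# level and year. Column names below are hypothetical.

import pandas as pd
import statsmodels.api as sm

def fit_retention_model(df):
    # df columns (assumed): retained_next_year (0/1), year,
    # te_level ("low", "middle", "high"), age, experience
    controls = ["age", "experience"]

    centered = df.copy()
    for c in controls:
        # Center each control on its annual mean.
        centered[c] = df[c] - df.groupby("year")[c].transform("mean")

    # Effectiveness-level-by-year indicators (no constant term added).
    dummies = pd.get_dummies(
        centered["te_level"].astype(str) + "_" + centered["year"].astype(str)
    ).astype(float)

    X = pd.concat([dummies, centered[controls]], axis=1)
    y = centered["retained_next_year"].astype(float)
    return sm.OLS(y, X).fit(cov_type="HC1")  # robust SEs for the LPM

# Example usage (hypothetical panel of teacher-year observations):
# model = fit_retention_model(teacher_panel)
# print(model.params.filter(like="high_"))
```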

Table O.1. Estimated Teacher-Retention Percentages, by TE Level, Period, and Site, for All Teachers with Composite TE Levels

Site | Low TE, Early IP | Low TE, Late IP | Middle TE, Early IP | Middle TE, Late IP | High TE, Early IP | High TE, Late IP
HCPS | 65.30 (1.32) | 57.14a (1.54) | 89.64 (0.21) | 88.39 (0.28) | 90.70 (0.25) | 90.09 (0.29)
PPS | 74.33 (2.32) | 62.03a (5.00) | 82.87 (1.36) | 85.26a (1.23) | 82.60 (1.93) | 85.90a (1.37)
SCS | 81.00 (1.07) | 81.76 (1.07) | 86.83 (0.39) | 82.94a (0.39) | 88.56 (0.46) | 85.70a (0.41)
Alliance | 63.58 (6.04) | 56.11 (8.33) | 84.65 (1.40) | 78.87a (1.23) | 94.36 (4.07) | 90.36 (1.17)
Aspire | 80.10 (4.33) | 69.17 (8.04) | 87.97 (1.23) | 76.30a (1.20) | 85.31 (4.11) | 86.06 (3.79)
Green Dot | 70.51 (8.20) | 49.25 (24.17) | 89.95 (1.13) | 81.44a (2.31) | 91.80 (2.46) | 87.46 (2.32)

NOTE: Standard errors are shown in parentheses. We omitted any CMO without sufficient data (in this case, PUC).
a Differs from the early IP estimated retention likelihood at p < 0.05.



Table O.2. Estimated Teacher-Retention Percentages, by Level of Value Added, Period, and Site, for All Teachers with VAM Scores

Site | Low VAM, Pre-IP | Low VAM, Early IP | Low VAM, Late IP | Middle VAM, Pre-IP | Middle VAM, Early IP | Middle VAM, Late IP | High VAM, Pre-IP | High VAM, Early IP | High VAM, Late IP
HCPS | 88.52 (0.75) | 88.27 (0.78) | 86.05a (0.94) | 91.01 (0.40) | 89.33 (0.44) | 87.75a (0.52) | 90.06 (0.69) | 90.38 (0.70) | 89.45 (0.81)
PPS | 70.95 (3.01) | 72.53 (2.96) | 77.39 (3.20) | 76.64 (2.05) | 76.72 (2.10) | 78.98 (2.24) | 78.79 (2.45) | 78.59 (2.61) | 80.41 (2.87)
SCS | 93.71 (1.54) | 83.35 (1.41) | 76.92a (2.13) | 94.64 (0.83) | 86.42 (0.80) | 83.37a (1.09) | 94.94 (1.34) | 91.19 (1.01) | 85.89a (1.68)
Aspire | 80.53 (5.04) | 67.39 (4.93) | 52.71a (9.29) | 80.94 (2.97) | 79.21 (2.48) | 69.67a (4.90) | 81.13 (4.89) | 85.54 (4.07) | 58.61a (9.34)

NOTE: Standard errors are shown in parentheses. We omitted any CMO without sufficient data (in this case, Alliance, Green Dot, and PUC).
a Differs from the pre-IP estimated retention likelihood at p < 0.05.

For comparison purposes, we provide Tables O.3 and O.4, which are identical to Tables O.1 and O.2 except that they use only teachers who have both VAM scores and composite TE levels—reading and mathematics teachers in grades 4 through 8 in the early IP and late IP periods. In HCPS, PPS, and Aspire, the patterns are similar to those in Table O.1, which uses all teachers with either composite TE levels or VAM ratings, except that the estimates with the restricted sample are less precise. However, for SCS, we see a very different pattern. For this subset of teachers, the retention rates rise rather than fall from early IP to late IP. This reflects very high exit rates for a subset of teachers in 2013–2014 and 2014–2015 who had VAM scores but not composite TE levels. We think that this is primarily a group of teachers who left the district and for whom the district did not calculate composite TE level.


Table O.3. Estimated Teacher-Retention Percentages, by TE Level, Period, and Site, for All Teachers with Both Composite TE Levels and VAM Scores

Site | Low TE, Early IP | Low TE, Late IP | Middle TE, Early IP | Middle TE, Late IP | High TE, Early IP | High TE, Late IP
HCPS | 67.82 (3.64) | 47.23a (4.70) | 89.91 (0.47) | 88.62 (0.66) | 90.88 (0.51) | 89.05a (0.61)
PPS | 68.03 (5.94) | 62.36 (11.08) | 84.07 (3.12) | 84.60 (2.83) | 85.03 (3.71) | 85.86 (3.25)
SCS | 80.19 (2.57) | 92.22a (2.01) | 87.55 (1.01) | 95.64a (0.73) | 90.33 (1.13) | 96.82a (0.71)
Aspire | 89.28 (4.45) | 61.56 (17.50) | 87.06 (2.54) | 69.05a (4.62) | 84.67 (8.28) | 76.68 (10.64)

NOTE: Standard errors are shown in parentheses. We omitted any CMO without sufficient data (in this case, Alliance, Green Dot, and PUC).
a Differs from the early IP estimated retention likelihood at p < 0.05.

Table O.4. Estimated Teacher-Retention Percentages, by Level of Value Added, Period, and Site, for All Teachers with Both Composite TE Levels and VAM Scores

Site | Low VAM, Early IP | Low VAM, Late IP | Middle VAM, Early IP | Middle VAM, Late IP | High VAM, Early IP | High VAM, Late IP
HCPS | 88.64 (0.78) | 86.57 (1.04) | 89.68 (0.46) | 87.80a (0.59) | 90.47 (0.71) | 88.67 (0.95)
PPS | 79.71 (4.14) | 85.16 (3.36) | 81.77 (3.31) | 83.13 (3.07) | 85.16 (3.57) | 85.65 (3.46)
SCS | 83.99 (1.79) | 93.85a (1.39) | 87.72 (0.98) | 96.20a (0.64) | 90.81 (1.32) | 95.65a (1.06)
Aspire | 82.51 (6.24) | 71.79 (10.71) | 88.32 (2.57) | 72.28a (5.08) | 87.22 (4.51) | 60.09a (9.28)

NOTE: Standard errors are shown in parentheses. We omitted any CMO without sufficient data (in this case, Alliance, Green Dot, and PUC).
a Differs from the early IP estimated retention likelihood at p < 0.05.


Appendix P. Additional Exhibits for Chapter Eleven

This appendix presents annual trends in teacher retention in the sites and a sensitivity check using two-year retention data.

Annual Trends in Retention Rates

HCPS

The left-hand side of Figure P.1 shows that, from 2010–2011 through 2014–2015, high-TE teachers were more likely than middle- and low-TE teachers to remain and that the likelihood remained relatively stable. The differences between high- and low-rated teachers are statistically significant for each year. On one hand, the retention of high-TE teachers and middle-TE teachers did not increase because of any actions that the district took during this period. On the other hand, the likelihood that low-TE teachers would remain teaching decreased over the period. We discuss dismissal-related policies in Chapter Five, and this decline in retention could have been related to those efforts. The largest year-to-year change occurred between 2010–2011 and 2011–2012, the point at which HCPS implemented a policy enabling effectiveness to be used as a basis for dismissal. The retention rate for low-TE teachers significantly decreased in 2014–2015, the point at which the full composite TE rating, using a three-year average of results, became available for use. Note that the confidence intervals for the low-TE estimates are much larger than those for middle- and high-TE. This is because the sample of low-TE teachers is smaller than for the other categories (see Figures 11.6 and 11.7 in Chapter Eleven).


Figure P.1. Adjusted Percentage of Teachers Remaining in HCPS, by Year, Composite TE Level, and VAM Score

NOTE: For any given year, we classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = U or NI, middle TE = E, and high TE = HE level 4 or HE level 5. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. The composite TE measure was available beginning in 2010–2011. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.1 shows the average likelihood of teacher retention, by VAM score, over time. VAM estimates are available in four additional years—three years prior to the composite TE rating (2007–2008, 2008–2009, and 2009–2010) and one year after (2015–2016). Fewer teachers had VAM scores than composite TE levels in each year, making the estimated likelihood of retention by VAM scores less precise for each year. In contrast to the composite TE results, the likelihood of retention of low-, middle-, and high-VAM teachers did not significantly differ from one to another VAM level in any year from 2010–2011 through 2014–2015. Although the differences were not statistically significant, in each of these years, high-VAM teachers were more likely to remain teaching than both middle- and low-VAM teachers. However, in 2008–2009 and 2009–2010, before the initiative, high-VAM teachers were also more likely than low-VAM teachers to remain teaching. In 2015–2016, the high-VAM teachers were more likely than middle- and low-VAM teachers to remain teaching and more likely than high-VAM teachers in previous years to remain teaching.

PPS

Each year, HE PPS teachers were more likely than less effective teachers to remain teaching in the site; however, the likelihood of HE teachers remaining in teaching did not generally increase over time. The left-hand side of Figure P.2 shows the likelihood that teachers would remain in teaching, by composite TE level, over time in PPS. Overall, we observe that, in each year, middle-TE and high-TE teachers were more likely than low-TE teachers to remain teaching. These differences were statistically significant from 2012–2013 through 2014–2015. The likelihood that high-TE teachers would remain teaching significantly increased in 2012–2013 relative to 2011–2012. Additionally, beginning with 2013–2014, we observe a statistically significant decrease from 2011–2012 and 2012–2013 in the likelihood that low-TE teachers would remain in teaching. This could be driven by the 2013–2014 policy change to place low-TE teachers on improvement plans (see Chapter Five).

Figure P.2. Adjusted Percentage of Teachers Remaining in PPS, by Year, Composite TE Level, and VAM Score

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = F or NI, middle TE = P, and high TE = D. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. We based TE results for 2011–2012 on a pilot version of PPS’s composite measure that was never shared with teachers or SLs; the composite TE level became fully operational, with stakes attached, beginning in 2013–2014. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.2 shows the likelihood of retention for PPS teachers, by VAM level and year. The point estimates suggest that, in each year, high-VAM teachers were more likely than lower-VAM teachers to remain teaching in the district. However, because we could calculate VAM scores for only a small fraction of teachers in PPS, the estimates are imprecise, and it is difficult to discern any temporal patterns by year in retention among teachers in terms of VAM score. Retention of each type of teacher increased in later years, but the estimates do not indicate significantly different likelihood of retention. Additionally, we do not observe a decrease in the likelihood that low-VAM teachers remain teaching, in contrast to the decrease observed for low-TE teachers.


SCS

HE teachers were more likely than less effective teachers to remain teaching in SCS in several of the years. The left-hand side of Figure P.3 shows changes in the retention likelihood, by composite TE level, over time in SCS. In each year except 2014–2015, middle- and high-TE teachers were significantly more likely than less effective teachers to remain teaching. The likelihood that low- and middle-TE teachers would remain as teachers in SCS decreased significantly in 2012–2013, increased substantially the next two years, and then plummeted in 2015–2016. This instability probably reflects the merger of legacy SCS and legacy MCS in June and July 2013 and the creation of the ASD. We know that these changes affected teachers’ career decisions to some degree. For example, during the 18-month merger negotiation, many teachers we interviewed reported feeling unsure about their job security following the merger. This perceived lack of security might have motivated many teachers, particularly low- and middle-TE teachers, to leave. We do not have good explanations for the other changes shown in Figure P.3.

Figure P.3. Adjusted Percentage of Teachers Remaining in SCS, by Year, Composite TE Level, and VAM Score

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low = performing significantly below or below expectations, middle = meeting or performing above expectations, and high = performing significantly above expectations. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. We have computed the composite TE level only since 2011–2012. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.3 shows the retention likelihood, by VAM score, beginning in 2009–2010, two years before the composite TE level was available. The likelihood that all teachers would remain in SCS substantially decreased in 2012–2013 for all levels of VAM scores, a pattern also observed in the composite TE results. This could be related to uncertainty caused by the merger or by schools (and teachers) being assigned to the ASD. Multiple retention policy changes in 2013–2014 might have increased the retention of effective teachers and decreased the retention of ineffective teachers, including the reintroduction of effectiveness-based bonuses, CL changes, and the use of effectiveness as a basis for dismissal. However, we do not see any significant changes in 2013–2014 from 2012–2013.

Alliance

Each year, HE Alliance teachers were more likely than middle- or low-TE teachers to remain teaching; however, over time, the likelihood that high-TE teachers would remain did not increase. The likelihood that low-TE and middle-TE teachers would remain in teaching decreased in 2012–2013. Figure P.4 shows the changes in the likelihood of retention, by composite TE level, over time in Alliance. In each year, high-TE teachers were significantly more likely than low-TE teachers to remain teaching. Comparing year to year, we see that the retention likelihood of middle- and high-TE teachers does not statistically differ between 2011–2012 and 2013–2014. There was a large and significant decrease in the retention likelihood of low- and middle-TE teachers in 2012–2013.

Figure P.4. Adjusted Percentage of Teachers Remaining in Alliance, by Year and Composite TE Level

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE category, low = entering or emerging, middle = E, and high = HE or master. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. The composite TE level was available beginning in 2011–2012. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.


Aspire

More-effective teachers in Aspire were more likely than less effective teachers to remain teaching, although the differences were not statistically significant. There were no statistically significant changes over time in the likelihood that high-TE teachers would remain teaching in Aspire. The left-hand side of Figure P.5 shows the changes in retention rates for Aspire, by composite TE level and year. Generally, low-TE teachers were the least likely to remain teaching; however, because of the small sample size, the differences are not statistically significant. There was no significant change over time in the likelihood that high-TE teachers would remain teaching. The retention likelihood of low-TE teachers significantly decreased in 2012–2013. In 2013–2014, the retention likelihood for middle-TE teachers significantly decreased.

Figure P.5. Adjusted Percentage of Teachers Remaining in Aspire, by Year, Composite TE Level, and VAM Score

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = emerging, middle TE = E, and high TE = HE or master. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. The composite TE level was available beginning in 2011–2012. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.5 shows the retention likelihood, by each VAM level, for Aspire between 2007–2008 and 2013–2014. Although the small number of teachers with VAM scores produced imprecise estimates for each year, we generally find that high-VAM teachers were more likely than low- and middle-VAM teachers to remain teaching. Overall, there is some discrepancy between the changes observed by composite TE level and by VAM score, and the imprecision of the estimates limits the conclusions we can draw.


Green Dot

The most-effective Green Dot teachers were the most likely to remain teaching in the site, and there was no statistically significant change over time. Figure P.6 shows the changes in retention rates for Green Dot, by composite TE level and year. In each year, middle- and high-TE teachers were more likely than low-TE teachers to remain teaching, although, because of sample size, the differences are significant only in 2012–2013. The retention likelihood for each composite TE level decreased each year, and the largest decrease in retention likelihood occurred among low-TE teachers.

Figure P.6. Adjusted Percentage of Teachers Remaining in Green Dot from One Year to the Next, by Composite TE Level

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low = entry or emerging, middle = E, and top = HE or HE 2. Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. The composite TE level was available beginning in 2011–2012. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

[Figure P.6 panel: Retention by Teacher Effectiveness Level, 2011–12 through 2013–14; Low TE, Mid TE, High TE; vertical axis shows adjusted percentage of teachers retained.]

Sensitivity Check: Teacher Retention After Two Consecutive Years

In addition to the estimates presented in Chapter Eleven (specifically, Figures 11.8, 11.9, 11.12, 11.13, 11.16, 11.17, 11.20, 11.23, 11.24, and 11.27, which assess teacher retention by site and period), we assess retention as a function of two consecutive years of effectiveness ratings.

As with the other results, we estimated the following model for each site and for each year:

R_{i,t+1} = X_{it}\gamma + \sum_{t=1}^{T} \beta_{1t} E_{1it} + \sum_{t=1}^{T} \beta_{2t} E_{2it} + \sum_{t=1}^{T} \beta_{3t} E_{3it} + \varepsilon_{it}.

For this analysis, we classified a teacher as low TE in a given year only if he or she received a low composite TE level in both year t and year t – 1 (e.g., a teacher would have low TE in 2011–2012 only if he or she had a low composite TE level in 2010–2011 and 2011–2012). Similarly, high-TE teachers were those who received high TE scores two years in a row (e.g., high-TE teachers in 2014–2015 had high composite TE levels in 2013–2014 and 2014–2015). The middle-TE category consisted of teachers who had consecutive middle composite TE levels, had middle TE one year and low TE or high TE the next, had low TE one year and improved the next, or had high TE and then regressed. Similarly, for VAM scores, we denote teachers with two consecutive years in the bottom 20 percent as low VAM score, teachers with two consecutive years in the top 20 percent as high VAM score, and those in the middle 60 percent both years or those who shift from the bottom or top as middle VAM score. Note that, because we base these categories on composite TE levels from consecutive years, we do not report results for the first year that the composite TE level or VAM score was available.
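To make the two-consecutive-years categorization concrete, the sketch below derives the categories from a long-format panel of annual TE levels. It is a minimal illustration only, not the code used for the analyses; the column names, toy data, and combine() rule are hypothetical.

```python
import pandas as pd

# Hypothetical long-format panel: one row per teacher per year, with te_level
# coded as "low", "middle", or "high" for that single year.
ratings = pd.DataFrame({
    "teacher_id": [1, 1, 1, 2, 2, 2],
    "year":       [2012, 2013, 2014, 2012, 2013, 2014],
    "te_level":   ["low", "low", "middle", "high", "high", "high"],
})

def combine(previous, current):
    """Two-consecutive-years category: low/high only if both years agree;
    every other pattern (including mixed pairs) falls into middle."""
    if previous == current and current in ("low", "high"):
        return current
    return "middle"

ratings = ratings.sort_values(["teacher_id", "year"])
ratings["prior_level"] = ratings.groupby("teacher_id")["te_level"].shift(1)

# No category is assigned for a teacher's first observed year, mirroring the
# decision not to report results for the first year a rating was available.
mask = ratings["prior_level"].notna()
ratings.loc[mask, "te_2yr_category"] = [
    combine(p, c)
    for p, c in zip(ratings.loc[mask, "prior_level"], ratings.loc[mask, "te_level"])
]
print(ratings)
```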

HCPS

The left-hand side of Figure P.7 displays the results of analysis describing the likelihood of retention after two consecutive composite TE evaluations. Generally, these results are similar to those from previous analyses; they show that the most-effective teachers were significantly more likely than the least effective teachers to remain teaching. The retention likelihood for teachers with two years of low TE ratings significantly decreased in 2012–2013 and again in 2014–2015, and it gradually declined over the period as a whole. In contrast, there was no change for middle-TE or high-TE teachers between 2010–2011 and 2014–2015; their likelihood of remaining in teaching did not increase over time. The retention likelihood for middle-TE and high-TE teachers was significantly higher than that of low-TE teachers from 2011–2012 through 2014–2015.


Figure P.7. Adjusted Percentage of Teachers Remaining in HCPS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = U or NI in two consecutive years, high TE = HE level 4 or HE level 5 in two consecutive years, and middle TE = all others (e.g., those with U or NI in one year and E the next, those with E in one year and HE level 4 in the next). Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.7 displays the results of analysis describing the likelihood of retention after two consecutive VAM score evaluations. Generally, teachers who received consecutive VAM scores in the top 20 percent were more likely than less effective teachers to remain teaching. However, in most years, the differences between the retention likelihood for low-, middle-, and high-VAM teachers were not statistically significant. The retention likelihood for high-VAM teachers significantly increased in 2015–2016, and the retention likelihood for low-VAM teachers significantly decreased in 2014–2015, but these patterns did not persist.

PPS

The left-hand side of Figure P.8 displays the results of analysis describing the likelihood of retention using two consecutive composite TE levels. The retention likelihood for middle- and high-TE teachers remained relatively constant during this period; however, the retention likelihood for low-TE teachers significantly decreased in 2014–2015. The decrease in the retention likelihood for low-TE teachers occurred one year later in this analysis than in the previous analysis using one-year categorization.

[Figure panels: Retention by Teacher Effectiveness Level (2011–12 through 2014–15; Low TE, Mid TE, High TE) and Retention by Teacher Value-Added Level (2008–09 through 2015–16; Bottom 20%, Middle, Top 20%); vertical axes show adjusted percentage of teachers retained.]

Figure P.8. Adjusted Percentage of Teachers Remaining in PPS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = F or NI in two consecutive years, high TE = D in two consecutive years, and middle TE = all others (e.g., those with F or NI in one year and P the next, those with P in one year and D in the next). Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.8 displays the results of analysis describing the likelihood of retention using two consecutive VAM evaluations. Because of sample-size restrictions, we do not find any significant differences in terms of low-VAM versus high-VAM retention within year or changes over time within a VAM level.

SCS

The left-hand side of Figure P.9 describes the retention likelihood by consecutive composite TE levels. The results are similar to those presented in Figures 11.16 and 11.17 in Chapter Eleven; the only difference is that the likelihood that teachers who received consecutive low-TE evaluations would remain teaching is slightly lower in each year.

[Figure panels: Retention by Teacher Effectiveness Level (2012–13 through 2014–15; Low TE, Mid TE, High TE) and Retention by Teacher Value-Added Level (2008–09 through 2014–15; Bottom 20%, Middle, Top 20%); vertical axes show adjusted percentage of teachers retained.]

Figure P.9. Adjusted Percentage of Teachers Remaining in SCS Based on Two Consecutive Years of Ratings for TE and Teacher Value Added, by Effectiveness Level

NOTE: For any given year, we have classified every teacher as either remaining as a teacher in the district the following year or not. For composite TE level, low TE = performing significantly below or below expectations in two consecutive years, high TE = performing above or significantly above expectations in two consecutive years, and middle TE = all others (e.g., those with below expectations in one year and meeting expectations the next, those with meeting expectations in one year and above expectations in the next). Error bars show 95-percent confidence intervals; estimates control for teacher characteristics. We used multiple regression to adjust percentages for teacher characteristics, including gender, experience level, and education level. Therefore, differences between percentages can be associated with year, effectiveness level, or level of VAM score rather than these characteristics.

The right-hand side of Figure P.9 describes the retention likelihood by consecutive VAM evaluations. Again, these results are similar to those in Chapter Eleven, the only difference being that the likelihood of separation is lower in each year for teachers who received consecutive low VAM scores.

[Figure panels: Retention by Teacher Effectiveness Level (2012–13 through 2015–16; Low TE, Mid TE, High TE) and Retention by Teacher Value-Added Level (2010–11 through 2014–15; Bottom 20%, Middle, Top 20%); vertical axes show adjusted percentage of teachers retained.]

Appendix Q. Additional Exhibits for Chapter Twelve

Figure Q.1. SLs' Agreement with Statements About Teacher Assignments, Springs 2014–2016

NOTE: Data are the percentage of SLs agreeing somewhat or strongly with the statement in the first column.

Table Q.1. Average and Standard Deviations of Teacher Value Added

                                  Mathematics                          Reading
Site   Data Point                 Before      Early       Late         Before      Early       Late
                                  Reform      Reform      Reform       Reform      Reform      Reform
HCPS   Mean                       –0.047      –0.080      –0.035a      –0.033      –0.097      –0.102a
       Standard deviation          0.114       0.170       0.216        0.098       0.087       0.107
PPS    Mean                       –0.058      –0.072      –0.055       –0.015      –0.057      –0.035a
       Standard deviation          0.169       0.170       0.155        0.145       0.151       0.124
SCS    Mean                       –0.075      –0.071      –0.073a      –0.185      –0.134       0.048a
       Standard deviation          0.202       0.273       0.244        0.170       0.169       0.131
a Differs from before the reform at p < 0.05.

Percentage of school leaders agreeing somewhat or strongly with each statement (values are for spring 2014, 2015, and 2016, respectively):

In my school, the highest-achieving students typically get the best teachers.
  HCPS 44, 38, 36; PPS 40, 39, 40; SCS 19, 25, 15; Alliance 6, 3, 6; Aspire 22, 39, 38; Green Dot 6, 0, 0; PUC 16, 17, 16

Parents have a lot of influence over which students get which teachers at my school.
  HCPS 24, 20, 20; PPS 14, 19, 19; SCS 2, 5, 8; Alliance 3, 6, 21; Aspire 6, 9, 0; Green Dot 6, 0, 0; PUC 12, 13, 12

Teachers at my school would be resistant to changing the methods by which teachers are assigned to classes.
  HCPS 42, 29, 33; PPS 39, 40, 41; SCS 31, 30, 44; Alliance 10, 24, 27; Aspire 42, 55, 51; Green Dot 18, 7, 40; PUC 38, 41, 41

I have taken steps to ensure that students with the greatest needs are taught by the most effective teachers.
  HCPS 78, 80, 79; PPS 89, 88, 87; SCS 72, 68, 59; Alliance 64, 90, 82; Aspire 68, 84, 67; Green Dot 57, 26, 45; PUC 79, 81, 82

My school does a good job of matching students with teachers in ways that will benefit the most students.
  HCPS 84, 90, 87; PPS 87, 90, 89; SCS 79, 77, 67; Alliance 76, 93, 82; Aspire 74, 79, 60; Green Dot 69, 46, 63; PUC 73, 81, 67

Teachers who are effective with high-achieving students would probably be less effective with low-achieving students.
  HCPS 39, 31, 27; PPS 25, 26, 29; SCS 23, 20, 23; Alliance 13, 17, 33; Aspire 12, 33, 8; Green Dot 19, 7, 17; PUC 19, 28, 38

Appendix R. The Initiative’s Effects on TE and LIM Students’ Access to Effective Teaching: Analytic Methods for Chapter Twelve

Chapter Twelve empirically evaluates the extent of access that LIM students have to effective teachers. This appendix describes the methodology used to determine the access parameters, changes in these parameters over time, and decompositions of the access coefficients into different mechanisms.

Relationship Between Percentage of Students Who Are LIM Students and Teacher Value Added

There are considerable differences in VAM scores among teachers, suggesting that students in the same site might be taught by teachers of very different performance levels. Therefore, after estimating teacher effects, we estimated three relationships between student LIM proportions in year t and teacher effects in year t. The first is the overall relationship, representing the extent to which each teacher's fitted value added, μjt, is related to the proportion of his or her students who are LIM, regardless of the school in which he or she works. We captured this relationship with a second-stage regression, in which the parameter of interest, β1, represents the difference in μjt associated with a unit difference in the share of all of teacher j's students in year t who are LIM:31

\mu_{jt} = \beta_0 + \beta_1 LIM_{jt} + \upsilon_{jt}.   (R.1)

To account for the randomness associated with the estimation of μjt, we estimated these two stages using generalized least squares—that is, we weighted the second-stage regression by the Cholesky decomposition of the inverse of the variance–covariance matrix associated with the estimation of μjt. Note that this also shrinks noisy estimates of the VAM scores and so is comparable to empirical Bayes shrinkage, a common postestimation strategy for teacher VAM scores (McCaffrey et al., 2004).
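The precision weighting in this second stage can be illustrated with a small sketch. The example below uses a diagonal (inverse-variance) approximation rather than the full Cholesky-based GLS weighting described above, and all names (mu_hat, mu_var, lim_share) are hypothetical rather than drawn from the study's code.

```python
import numpy as np

def second_stage_gls(mu_hat, mu_var, lim_share):
    """Regress estimated teacher effects (mu_hat) on the share of each
    teacher's students who are LIM, weighting by estimation precision.
    Diagonal (inverse-variance) approximation to the GLS weighting."""
    X = np.column_stack([np.ones_like(lim_share), lim_share])
    W = np.diag(1.0 / mu_var)                     # precision weights
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ mu_hat)
    return beta                                   # [beta_0, beta_1]

# Toy inputs: noisy VAM estimates, their sampling variances, and LIM shares.
rng = np.random.default_rng(0)
lim = rng.uniform(0, 1, 200)
true_mu = -0.10 * lim + rng.normal(0, 0.05, 200)
var = rng.uniform(0.001, 0.01, 200)
mu_hat = true_mu + rng.normal(0, np.sqrt(var))
print(second_stage_gls(mu_hat, var, lim))         # slope should be close to -0.10
```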

We were also interested in decomposing the effect of LIMjt into the within-school and between-school components to see whether sorting is particularly strong in one or both areas. To do so, instead of estimating Equation R.1 as the second stage, we estimated Equation R.2, in which θst is a fixed effect controlling for the school, s, in which the teacher works during year t. Including this fixed effect changes the interpretation of the coefficient on the share of the teacher's students with LIM status. This coefficient in Equation R.2 now reflects the estimated difference between teachers within a school, rather than throughout the district (as in Equation R.1), with low and high percentages of students with LIM status:

\mu_{jt} = \beta_0' + \beta_1' LIM_{jt} + \theta_{st} + \upsilon_{jt}.   (R.2)

31 Note that, because LIMjt is coded from 0 to 1, a unit difference is actually a 100-percentage-point difference.

We also estimated a third regression (again, using generalized least squares), replacing the LIM share of the teacher's students (LIMjt) with the LIM share of the school's students:

\mu_{jt} = \gamma_0 + \gamma_1 LIM_{st} + \eta_{st}.   (R.3)

γ1 represents the relationship of TE among schools based on the percentage of their students who are LIM students. It reflects the sorting of TE between schools. Overall sorting (β1) is a weighted average of within-school sorting (β1′) and between-school sorting, with the weights reflecting the ratio of the variances of between-teacher percentage LIM and between-school percentage LIM (Raudenbush and Bryk, 2002, p. 137).

It is important to note that γ1 also reflects anything about the school that makes all teachers in the school more or less productive, such as leadership effectiveness, special programs, or resources. Although it has been shown that teachers are the most-important school-based factors in students’ achievement growth, the presence of these other factors could bias our estimates of between-school sorting.
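A minimal sketch of how the overall, within-school, and between-school relationships (analogues of Equations R.1–R.3) can be computed from teacher-year data is shown below. It omits the precision weighting used in the actual analysis, relies on simulated data, and all column names are hypothetical.

```python
import numpy as np
import pandas as pd

def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Simulated teacher-year data with sorting both within and between schools.
rng = np.random.default_rng(1)
school = np.repeat(np.arange(30), 10)                     # 30 schools, 10 teachers each
school_lim = rng.uniform(0.2, 0.8, 30)[school]            # school-level LIM share
lim_share = np.clip(school_lim + rng.normal(0, 0.1, 300), 0, 1)
mu_hat = -0.08 * lim_share + rng.normal(0, 0.05, 300)     # estimated teacher effects
df = pd.DataFrame({"mu_hat": mu_hat, "lim_share": lim_share, "school": school})

# Overall sorting (analogue of Equation R.1).
beta1 = ols_slope(df["mu_hat"], df["lim_share"])

# Within-school sorting (analogue of Equation R.2): demeaning by school is
# equivalent to including school fixed effects.
within = df.groupby("school")[["mu_hat", "lim_share"]].transform(lambda s: s - s.mean())
beta1_within = ols_slope(within["mu_hat"], within["lim_share"])

# Between-school sorting (analogue of Equation R.3): school-level means.
means = df.groupby("school")[["mu_hat", "lim_share"]].mean()
gamma1 = ols_slope(means["mu_hat"], means["lim_share"])

print(round(beta1, 3), round(beta1_within, 3), round(gamma1, 3))
```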

In estimating value added, an important consideration is whether to estimate teachers' VAM scores using just their students in the current year or whether to include the performance of their prior-year students as well. Several studies have demonstrated marked improvement in the reliability of estimates of VAM scores when they incorporate the performance of the students the teachers taught not only in the current year but also in one or more previous years (Goldhaber and Hansen, 2010; Schochet and Chiang, 2010). Presumably for this reason, the PPS and SCS IP sites calculate teachers' VAM scores by averaging teachers' performance across multiple years. Given that the sites' estimates carry high stakes for teachers, this approach seems appropriate for strengthening the reliability of the estimates.

However, the downside of averaging VAM scores across years is that it likely understates true year-to-year variation in teacher performance. In the case of the IP evaluation, in which we were interested in gauging the initiative’s impact not only on teachers’ assignments to their schools but on changes in individual teachers’ effectiveness relative to other teachers in the same IP site, we estimated VAM scores based on the performance of a teacher’s students in the current year. Our own investigations have revealed this to be the correct choice in our setting across various loss functions. Although this might result in some instability due to the sample of students a teacher is assigned in a given year, it also allowed our estimates to capture true year-to-year changes in teachers’ relative effectiveness.

A related consideration we faced was whether to examine sorting of teacher VAM scores by student LIM composition in terms of teachers' estimated effectiveness in the current or prior year. In this study, we focused on the sorting of LIM students in terms of teachers' current-year effectiveness estimates. This approach allows us to examine the extent to which LIM students have access to high-quality teaching in each year of the study, relative to their non-LIM peers in the same site. Changes in sorting patterns from year to year can arise for a variety of reasons. These include not only changes in how administrators assign existing teachers to classrooms or schools (or how teachers are encouraged to take different assignments) but also such factors as how new teachers are assigned and how teachers of LIM students are professionally developed or rewarded for improving their instructional practice. In other words, our approach takes into account all of the factors that can shift the relative quality of teaching that LIM students receive from year to year.

An alternative approach would have been to estimate the relationship between estimates of teachers’ prior-year VAM scores and the LIM statuses of their current students. This approach would capture the extent to which the sites were assigning teachers to classrooms or schools based on what was previously known about their performance. However, because schools typically do not have estimates of VAM scores available for the prior year until shortly before or even after the start of a new school year, we would actually have needed to use teachers’ VAM scores from two years prior to the current year to report on the extent to which sites were deliberately assigning teachers to schools or classrooms based on prior estimates of VAM scores. Moreover, because the systematic use of teachers’ VAM scores in decisionmaking was largely precipitated by the IP initiative, schools would not have been able to base assignments on prior-year VAM scores until the 2012–2013 school year in HCPS and the 2013–2014 school year in the other sites, so we would not have been able to detect these impacts in our current data. For all of these reasons, we focus instead on sorting of current-year VAM scores by teachers’ current-year student LIM composition. From a student’s perspective, this is the most important definition because it captures the relative quality of instruction that LIM students received in a given year.

In general, we pooled all teachers in grades 4 through 8 when we examined sorting. However, the greater variety of course offerings during MS suggests that there might be more sorting of students within schools during these years. The greater departmentalization suggests that within-school sorting might differ more between subjects in MS grades than in elementary school grades. Therefore, we also conducted the same sorting analysis after dividing teachers into elementary grades (grades 4 through 5) and MS grades (6 through 8).

Change in Access Coefficient: Interrupted Time-Series Methodology

To evaluate the change in the sorting coefficient, we used an interrupted time-series regression, presented in Equation R.4:

VAM_{jt} = \beta_0 + \beta_1 LIM_{jt} + \beta_2 Post_{jt} + \beta_3 LIM_{jt} \times Post_{jt} + \varepsilon_{jt}.   (R.4)

We regressed VAM scores for teacher j in year t on the fraction of his or her students with LIM status, an indicator for whether the year is preinitiative (Post = 0 for academic years 2009–2010 and earlier) or recent (Post = 1 for academic years 2013–2014 and later), and the interaction between the two. To emphasize the impact after relatively full implementation of the reforms, we did not include early initiative years in this regression. β3, the coefficient on the interaction, measures how the overall sorting coefficient changed from preinitiative to recent years and was the parameter of interest. We weighted the regression using the inverse of the standard error of each VAM score, to weight toward measures with greater precision, and clustered the standard errors at the school level. The within-school and between-school regressions are similar extensions, adding a posttreatment indicator and an interaction between Post and LIM to the models explained in Baird et al., 2016.
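The sketch below illustrates one way to estimate a regression of the form of Equation R.4 with inverse-standard-error weights and school-clustered standard errors, here using statsmodels. The data and column names are simulated and hypothetical; this is an illustration under those assumptions, not the study's code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical teacher-year data restricted to preinitiative (post = 0)
# and recent (post = 1) years, as in Equation R.4.
rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "vam":       rng.normal(0, 0.1, n),
    "lim_share": rng.uniform(0, 1, n),
    "post":      rng.integers(0, 2, n),
    "vam_se":    rng.uniform(0.02, 0.08, n),
    "school":    rng.integers(0, 40, n),
})
df["lim_x_post"] = df["lim_share"] * df["post"]

X = sm.add_constant(df[["lim_share", "post", "lim_x_post"]])
model = sm.WLS(df["vam"], X, weights=1.0 / df["vam_se"])   # weight toward precise VAM estimates
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["school"]})

# The coefficient on lim_x_post is the change in the sorting coefficient
# from preinitiative to recent years (beta_3 in Equation R.4).
print(result.params["lim_x_post"], result.pvalues["lim_x_post"])
```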

Analysis of Mechanisms Used to Change Access

We also examined whether LIM students' access to high-VAM teaching changed during the initiative by any of three possible mechanisms: (1) teachers with more LIM students have greater improvements in VAM scores, (2) higher-VAM teachers are reassigned to classes with more LIM students, or (3) exiting teachers of high-LIM classes are replaced with higher-VAM teachers than their counterparts in low-LIM classes. We refer to these three mechanisms as improve, reassign, and replace, respectively. To investigate this possibility, we decomposed the change in overall sorting into four components, as follows:

\beta_1 - \beta_0 = \frac{p_{new} + p_{exit}}{2}\left(\beta_{V1L1|new} - \beta_{V0L0|exit}\right)
    + \frac{p_{stay} + p_{exp}}{2}\left(\beta_{V1L0|stay} - \beta_{V0L0|stay} + \beta_{V0L1|stay} - \beta_{V0L0|stay}\right) + R
    = \left(1 - p\right)\Delta_{replace} + p\left(\Delta_{improve} + \Delta_{reassign}\right) + R.

On the left-hand side of the equation are the estimates of the coefficients, βt and βt – 1, that indicate LIM students' level of access to high-VAM teachers in two consecutive years. The top line on the right-hand side of the equation is the difference between the access coefficients for teachers who are new and for the teachers they replace. This is weighted by the proportion of the staff in those two years who fall into those two categories, (1 – p) = (pnew + pexit)/2, which is the proportion of teachers who transition, on average, in those two years. The second element, Δimprove = βV1L0|stay – βV0L0|stay, measures the change in the sorting coefficients caused by changes in VAM scores across the two years. In this expression, βV1L0|stay is the estimated access coefficient on the subsample of teachers who stay between years 0 and 1, using year 1 VAM scores and year 0 LIM assignments, and βV0L0|stay is the access coefficient for the same population and assignments but using year 0 VAM scores. In other words, it measures what the change in the sorting coefficients would have been if each teacher's fraction of students who were LIM students did not change across the two years but each teacher's effectiveness was allowed to change as was observed. This is weighted by p = (pstay + pexp)/2, the average of the fraction of teachers who stay and the fraction who are returning (i.e., experienced) teachers the next year (in the case that the total number of teachers is the same across years, these two measures are identical). The third element is Δreassign = βV0L1|stay – βV0L0|stay, the portion of the sorting coefficient changed by changes in assignments of teachers and their fractions of LIM students. Similar to before, βV0L1|stay is the estimated access coefficient for the teachers who stay between years 0 and 1, using year 0 VAM scores and year 1 LIM assignments, while βV0L0|stay is the same but uses year 0 LIM assignments. This then measures how the sorting coefficient would have changed if those teachers' VAM scores had stayed the same but their fractions of LIM students changed as we observed in the data. It is weighted by the same p. The fourth element, R, is the residual difference between the actual difference in sorting coefficients and our decomposition; it is a complicated function of regression coefficients on various samples that largely cancels out. Another difference is that this decomposition does not use the WLS weights that the actual analysis uses. However, the correlation coefficient between the actual (WLS) difference in the sorting coefficients and our decomposition (leaving R out) is above 0.96, and a regression of the former on the latter yields an ordinary-least-squares coefficient of 0.903 (t-statistic of 19.82) with an intercept of –0.006 (t-statistic of –0.36). This demonstrates how close our decomposition is to a complete decomposition (leaving a negligible residual) even without accounting for the WLS weights (we did not perform any inference on these statistics but used them as guidance), and we use this version, which presents interpretable elements that can be examined.
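As a rough illustration of the decomposition (ignoring the residual R and the WLS weights), the sketch below computes Δreplace, Δimprove, and Δreassign from hypothetical year-0 and year-1 arrays of VAM estimates and LIM shares for stayers, exiters, and new teachers. All names and data are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def access_slope(vam, lim):
    """Unweighted access coefficient: OLS slope of VAM on LIM share."""
    vam, lim = np.asarray(vam, float), np.asarray(lim, float)
    return np.cov(lim, vam, bias=True)[0, 1] / np.var(lim)

def decompose(stay0, stay1, exit0, new1):
    """Approximate replace/improve/reassign decomposition (residual R and WLS
    weights ignored). Each argument is a dict with 'vam' and 'lim' arrays:
    stay0/stay1 hold stayers' values in year 0 and year 1 (same teachers,
    same order); exit0 holds exiters' year-0 values; new1 holds entrants'
    year-1 values."""
    n_stay, n_exit, n_new = len(stay0["vam"]), len(exit0["vam"]), len(new1["vam"])
    p_stay = n_stay / (n_stay + n_exit)            # share of year-0 staff who stay
    p_exp = n_stay / (n_stay + n_new)              # share of year-1 staff who return
    p = (p_stay + p_exp) / 2
    base = access_slope(stay0["vam"], stay0["lim"])                    # beta_{V0L0|stay}
    d_replace = access_slope(new1["vam"], new1["lim"]) - access_slope(exit0["vam"], exit0["lim"])
    d_improve = access_slope(stay1["vam"], stay0["lim"]) - base        # year-1 VAM, year-0 LIM
    d_reassign = access_slope(stay0["vam"], stay1["lim"]) - base       # year-0 VAM, year-1 LIM
    total = (1 - p) * d_replace + p * (d_improve + d_reassign)
    return total, d_replace, d_improve, d_reassign

# Toy example with simulated arrays.
rng = np.random.default_rng(6)
stay0 = {"vam": rng.normal(0, 0.1, 80), "lim": rng.uniform(0, 1, 80)}
stay1 = {"vam": stay0["vam"] + rng.normal(0, 0.02, 80), "lim": rng.uniform(0, 1, 80)}
exit0 = {"vam": rng.normal(-0.05, 0.1, 20), "lim": rng.uniform(0, 1, 20)}
new1 = {"vam": rng.normal(0.02, 0.1, 20), "lim": rng.uniform(0, 1, 20)}
print(decompose(stay0, stay1, exit0, new1))
```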

Appendix S. Additional Exhibits for Chapter Thirteen

Figure S.1. SLs’ Perceptions of “How Many Teachers in Your School” Possessed Various Skills, Springs 2013–2016

NOTE: Omitted response categories are “about half,” “a few,” and “none or almost none.” We did not ask this question in 2011.

Percentage of school leaders reporting that more than half of the teachers in their school... (values are for spring 2013, 2014, 2015, and 2016, respectively):

Have a good grasp of the subject matter they teach
  HCPS 85, 86, 86, 87; SCS 85, 81, 85, 79; PPS 76, 82, 81, 83; Alliance 94, 95, 95, 90; Aspire 88, 80, 82, 68; Green Dot 79, 83, 76, 83; PUC 100, 94, 100, 84

Are fully prepared to teach based on the Common Core State Standards (math and ELA teachers) or other relevant subject-area standards (other teachers)
  HCPS 34, 60, 73, 73; SCS 41, 47, 47, 52; PPS 26, 34, 62, 53; Alliance 42, 61, 68, 74; Aspire 34, 44, 51, 41; Green Dot 22, 25, 33, 47; PUC 22, 56, 81, 61

Have the skills needed to foster meaningful student learning
  HCPS 72, 74, 75, 76; SCS 75, 75, 75, 69; PPS 54, 71, 73, 64; Alliance 81, 80, 64, 84; Aspire 85, 73, 67, 46; Green Dot 59, 65, 55, 53; PUC 83, 74, 94, 78

Have the skills needed to help students improve their performance on standardized tests
  HCPS 71, 74, 74, 74; SCS 70, 73, 71, 61; PPS 56, 67, 70, 59; Alliance 87, 81, 58, 100; Aspire 88, 76, 50, 50; Green Dot 56, 67, 48, 45; PUC 89, 75, 81, 76

Are able to promote learning among all students, even those who are difficult to teach
  HCPS 59, 59, 62, 60; SCS 58, 60, 65, 57; PPS 40, 50, 58, 47; Alliance 68, 75, 64, 64; Aspire 70, 67, 49, 49; Green Dot 41, 43, 33, 29; PUC 78, 63, 81, 59

Engage in regular, productive conversations with one another about how to improve instruction
  HCPS 52, 61, 63, 60; SCS 62, 64, 67, 63; PPS 52, 60, 61, 50; Alliance 71, 76, 66, 67; Aspire 79, 77, 63, 65; Green Dot 50, 46, 58, 68; PUC 78, 74, 87, 67

Really believe every child can learn and be college ready
  HCPS 58, 63, 65, 67; SCS 64, 64, 65, 62; PPS 47, 59, 60, 44; Alliance 81, 84, 77, 77; Aspire 85, 90, 87, 68; Green Dot 66, 68, 67, 64; PUC 94, 88, 94, 84

Appendix T. Estimating the Initiative’s Impact on Student Outcomes: Data and Analytic Methods for Chapter Thirteen

In this appendix, we describe the data, the student outcomes, and the estimation method that we used in our evaluation of the initiative’s impact.

Data and Outcomes

We used school-level data on student achievement and dropout rates from Florida, Pennsylvania, Tennessee, and California to estimate the impact of the IP initiative.32 We also used data on graduation rates from Tennessee and California. The schools in the initiative sites formed the treatment group, and we used the remainder of schools in the four respective states as the comparison group. All analyses used only publicly available data aggregated to the school by grade by subject level or school by grade by subject by subgroup level. We obtained the data from state department of education websites or by making requests for such data to the state departments of education.33

Changes in SCS’s composition in the 2013–2014 school year and later introduced complications. The most significant change was that legacy MCS merged with legacy SCS just prior to the 2013–2014 school year.34 This means that, in 2013–2014, a single district included both the original schools from legacy MCS that were part of the IP initiative and all the schools that used to be part of legacy SCS, which did not receive the initiative.35 If we were to use all 2013–2014 SCS schools in the analysis, a significant portion of the schools in our treatment group would not actually have received the initiative. Therefore, before conducting the analysis, we removed all schools in the merged SCS that used to be part of legacy SCS rather than legacy MCS; we excluded these schools from the analysis for all years. They were in neither the treatment nor the control group.

Another challenge to the SCS analysis was that some schools from legacy MCS were subsequently transferred into a new state organization, the ASD, which either directly operates the schools or transfers their operation to other groups, including CMOs.36 These schools were subject to the initiative until they were transferred to the ASD but not after. To address the issue of partial exposure to the IP initiative, we excluded ASD schools from the comparison group in all years and included schools that were originally from legacy MCS in the analysis up to the year they transferred to the ASD.37

32 We could not obtain college-going rates.
33 We had to request data directly from the state departments of education of Pennsylvania and Tennessee.
34 Further changes to the district boundaries occurred the following year, with many of the suburbs of legacy SCS leaving the newly merged SCS district and creating their own districts. Because our analyses do not include the legacy SCS schools, the departure of these schools did not affect our estimates.
35 Schools that were part of legacy SCS did not receive funding from the IP initiative until after the merge.
36 ASD, undated, provides information and a list of schools.

The main outcome of interest is the school-level average of student scale scores on the state assessments. Because the tests used in HS differ from those used in grades 3–8, we report results separately for grades 3–8 and for HSs, for mathematics and reading (English language arts), where data permit. We also report disaggregated results for each grade and subject. In the analysis, we standardized the scale scores by the within-state student mean and standard deviation (in that subject-grade-year). Therefore, we can interpret the estimates in effect-size units of the student-level test score distributions in each state.38
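As an illustration of the standardization, the sketch below z-scores scale scores within state-subject-grade-year cells. The column names and data are hypothetical placeholders; the actual analysis standardizes by the within-state student-level mean and standard deviation for each subject, grade, and year.

```python
import pandas as pd

# Hypothetical scores with state, subject, grade, and year identifiers.
scores = pd.DataFrame({
    "state":       ["FL"] * 4 + ["PA"] * 4,
    "subject":     ["math"] * 8,
    "grade":       [4] * 8,
    "year":        [2014] * 8,
    "scale_score": [310.0, 295.0, 305.0, 330.0, 1480.0, 1510.0, 1495.0, 1520.0],
})

# Standardize within each state-subject-grade-year cell so that estimates can
# be read in effect-size units of that cell's score distribution.
cells = scores.groupby(["state", "subject", "grade", "year"])["scale_score"]
scores["effect_size_units"] = (scores["scale_score"] - cells.transform("mean")) / cells.transform("std")
print(scores)
```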

In addition to average overall scale scores for grades 3 through 8 and HS, we examined the initiative's impact on exit exams (for the CMOs only) and on nontest outcomes, such as dropout rates, attendance (for legacy MCS only), graduation rates (for legacy MCS and the CMOs only), and University of California (UC) eligibility (for the CMOs only).39 We also examined results for demographic subgroups. Specifically, we examined results for Hispanic, black, and economically disadvantaged groups, by grade and subject, when these subgroups made up a sufficient proportion of the district population. We report these results in Appendix U. Table T.1 lists the outcomes and subgroups for each site.

37 However, in our analysis, we do include the Memphis Innovation Zone (i-Zone) schools. 38 To understand the effect-size concept, consider a simple example. Suppose that students take a test and that the scale score values for this test range from 100 to 500, with a mean of 300. Without further information, an estimated impact of, say, three points would be uninformative. To make sense of this finding, what is needed is information on how much variation there is in the scale score. The standard deviation of the test scale score is the usual way of measuring this variation. Frequently, test scale scores follow a bell-shaped distribution known as a normal distribution. In this common case, about two-thirds of students score within one standard deviation of the mean (300, in this example), and about 95 percent score within two standard deviations of the mean. The effect size is simply the change in the scale score (for example, three points) translated into standard deviation units. If the standard deviation were 10, the effect size would be 3 ÷ 10 = 0.3, indicating that the program increased test scores by 0.3 standard deviations. This would be a meaningful impact, which not many education interventions attain. In contrast, if the standard deviation were 100 and the difference in scale score were three points, the effect size would be a more modest 0.03. 39 UC eligible describes a student who graduates meeting the UC or California State University entrance requirements.


Table T.1. Summary of Data Elements

Test scores (unless otherwise noted) for grades 3 through 8
  HCPS: Math and reading. PPS: Math and reading. SCS: Math and reading (grades 3–7). CMOs: Math and reading (grades 3–7).

HS test scores
  HCPS: Reading (grades 9–10). PPS: Reading (grade 11). SCS: Nonea. CMOs: Reading (grade 11); exit exams (reading and math).

Relevant subgroup test score
  HCPS: Black, Hispanic, low socioeconomic status. PPS: Black, low socioeconomic status. SCS: Noneb. CMOs: Black, Hispanic, low socioeconomic status.

Nontest outcomes
  HCPS: Dropout rates. PPS: Dropout rates. SCS: Graduation rates; dropout rates; promotion rates (K–8); attendance rates (K–12). CMOs: Graduation rates; dropout rates; UC-eligible rates.

Covariates
  HCPS: Ethnicity, ELLs, FRPL, average number of preinitiative students absent more than 21 days,c stability rate,c,d and proficiency levels 1, 2, 3, 4, and 5 for mathematics and reading.c
  PPS: Ethnicity; FRPL; average preinitiative percentages in proficient, advanced, basic, and below basic in mathematics and reading.c
  SCS: Ethnicity; FRPL; average preinitiative percentages in proficient, advanced, and below proficient in mathematics and reading.c
  CMOs: Ethnicity; FRPL; ELLs; average preinitiative percentages in far below, below, basic, proficient, and advanced in mathematics and reading.c

a Tennessee administers several EOC exams in HS. However, these exams can be retaken throughout HS; without being able to separate first-time from retested students' scores, we determined these test scores to be noisy signals of performance. As a result, we excluded these tests from our analysis.
b The Tennessee Department of Education provided the overall test score information, not broken out by subgroup, to RAND. Thus, we could not complete these analyses by subgroup.
c For the HCPS analysis, we used an average of all preinitiative years of the variable at the district level.
d Stability rate indicates the percentage of students included on the October membership survey still present for the February membership survey.

Having preinitiative data was important to control for differences between schools and students in treatment sites and those in the rest of the state. Thus, we collected and used three years of preinitiative data, from school years 2006–2007 through 2008–2009.40 In addition to preinitiative outcomes, we used other publicly available school-level covariates in the analysis. Table T.1 also lists these. In the rest of this appendix, we describe the DiD method we employed to study the effects of the IP initiative.

40 We truncated the preinitiative data at the 2006–2007 school year to avoid additional changes in tests and so that predictions from the model would better reflect recent trends and changes in states’ testing and school demographics. For example, PPS experienced a major change in demographics between 2006 and 2007 that led to a sharp decline in test scores compared with other schools in the state.

School-Level Difference-in-Differences Methodology

Estimating the initiative's impact was difficult because the outcomes in the IP sites could differ from those in non-IP sites for reasons other than the IP initiative itself, such as students in IP sites being from less affluent families than students in other sites. As shown in Table T.2, there are clear differences between the distributions of characteristics of students served by schools in the IP sites and those of students in other schools in the same state that are not in the IP sites. For example, the IP sites, except for HCPS, had much larger fractions of students from minority ethnicities and in poverty than the other districts in their states. To the extent that these differences drive differences in student outcomes, comparisons between the outcomes of students in schools in the IP sites and those in the non-IP sites will be misleading about the initiative's impact.

Table T.2. Average Demographics in the IP Sites and in the Rest of Their States, as Proportions

                                 School Year 2008–2009        School Year 2014–2015
Student Characteristic           Site      Rest of State      Site      Rest of State
HCPS and the rest of Florida
  Black                          0.22      0.23               0.21      0.23
  Hispanic                       0.28      0.25               0.35      0.31
  Asian                          0.03      0.02               0.04      0.03
  Receiving FRPL                 0.52      0.49               0.64      0.61
  ELL                            0.15      0.11               0.13      0.09
PPS and the rest of Pennsylvania
  Black                          0.56      0.14               0.52      0.14
  Hispanic                       0.01      0.07               0.02      0.09
  Asian                          0.02      0.03               0.04      0.04
  Receiving FRPL                 0.69      0.40               0.68      0.44
SCS and rest of Tennessee
  Black                          0.86      0.18               0.77      0.15
  Hispanic                       0.06      0.05               0.13      0.09
  Asian                          0.01      0.02               0.03      0.02
  Receiving FRPL                 0.79      0.52               0.85      0.56
CMOs and rest of California
  Black                          0.16      0.08               0.09      0.06
  Hispanic                       0.78      0.50               0.86      0.55
  Asian                          0.01      0.09               0.02      0.09
  Receiving FRPL                 0.84      0.53               0.87      0.59
  ELL                            0.22      0.21               0.16      0.19

SOURCES: Public data published by the respective states’ departments of education. NOTE: We calculated demographic variables at the school level by dividing the number of students from a certain category by the total number of students in the school and then averaging across schools in the IP site and in the rest of the state based on student enrollment by school.

To disentangle the initiative's effects from the effects of student characteristics and other district-specific factors, we employed a DiD approach using school-level data. This approach involved two steps. The first step used data on school-level outcomes and on demographic characteristics (at the school and district levels) in the preinitiative years to forecast what school outcomes were likely to be in the postinitiative years, taking into account any changes in demographic characteristics (at the school and district levels).41 In the second step, we examined whether differences between the actual outcomes and the forecasted outcomes systematically differed between schools in an IP site and those in the same state's non-IP districts. This DiD can be interpreted as the gap between the performance of schools in IP sites and non-IP schools, net of the difference that would be expected given the preinitiative outcome patterns and differences in demographics.

41 In previous interim analyses, we also tried first trimming the sample of non-IP schools to keep only those schools that are more similar in terms of demographics to the schools in the IP site. After selecting these schools, we followed the two estimation steps described here. The estimation results with the trimmed sample were very similar to the results obtained using the full sample of schools in the state (which we present in this report). This suggests that a linear specification does a relatively good job of controlling for differences in observed characteristics.

The hypothetical example in Figure T.1 depicts how the first step in this procedure works. Figure T.1 shows the relationship between some school-level outcome (average scale scores, in this example) and time. Data points to the left of the red dashed line are from years prior to the IP initiative, and data points to the right are from years after the IP initiative went into place. In this example, there are very large differences in the preinitiative years between the treated and comparison schools, shown by the difference in the heights of the lines along the vertical axis.

Figure T.1. Graphical Depiction of Methodology for Computing Forecasts of Postinitiative Trends

NOTE: Data points to the left of the red dashed line are from years prior to the IP initiative, and data points to the right are from years after the IP initiative went into place. In this example, there are very large differences in the preinitiative years between the treated and comparison schools, shown by the difference in the heights of the lines along the vertical axis. The deviations from this forecast for the first two years after the initiative equal the vertical distance from the forecasted outcome (i.e., the dashed line) and the actual outcome. For the comparison group, these deviations are dc1 and dc2, respectively; for the treatment group, they are dt1 and dt2, respectively. In this example, the comparison group does a little better than predicted in one year and a little worse in the other. In contrast, the difference between the actual and predicted treatment-group performance is large and positive in both years.


To account for the differences between the treated and nontreated schools, our method used data from before the start of the initiative to form a prediction of what the counterfactual outcomes would be in the postinitiative world. We based this forecast on a statistical model that uses preinitiative data to estimate linear predictions for postinitiative years. We then used the predictions to determine what the outcomes likely would have been had school and district demographics continued to have the same effect on outcomes as they did before the initiative. We depict the preinitiative data graphically as squares (for the control group) and circles (for the treatment group) to the left of the dashed red line in Figure T.1. The solid lines represent the fit of the statistical model, and the dashed lines depict the forecasts of the model.

We then computed the difference between what the forecasting model predicted that the outcome would be and the actual outcome. In Figure T.1, these deviations from this forecast for the first two years after the initiative equal the vertical distance from the forecasted outcome (i.e., the dashed line) and the actual outcome. For the comparison group, these deviations are dc1 and dc2, respectively; for the treatment group, they are dt1 and dt2, respectively. In this example, the comparison group does a little better than predicted in one year and a little worse in the other. In contrast, the difference between the actual and predicted treatment-group performance is large and positive in both years.

The second step of our method consisted of estimating whether these differences systematically and statistically differed between schools in IP districts and those in the comparison non-IP districts. This difference in prediction differences (or prediction errors) provided our DiD estimation of the IP initiative’s impact. It can be interpreted as the difference in performance between schools in treated districts and other schools in the state, after netting out the difference that would be expected, given preinitiative outcome patterns and school and district demographics.

Estimation Models

To implement this two-step DiD analysis, we used a multivariate regression procedure. In the first step of the method, we developed a forecasting model that used preinitiative data to predict the outcomes in postinitiative years under the counterfactual assumption that the initiative had not happened. The prediction model accounts for separate intercepts for each district and for differences in school and district demographics. In our analysis, we grouped the CMO schools and treated them as if they were a separate district. As mentioned in Chapter Thirteen, we also conducted separate analyses for Aspire in grades 3 through 8 and for Green Dot in grade 11.

The equation we estimated is given by42

Y_{sdt} = \alpha_d + \beta_X X_{sdt} + \beta_{\bar{X}} \bar{X}_{dt} + \epsilon_{sdt},   (T.1)

where Ysdt is the outcome for school s in district d by year t (the outcomes can pertain to a specific grade or student subgroup); αd denotes district-specific intercepts; Xsdt denotes the school demographic characteristics each year, including ethnicity composition and percentage of students in FRPL plans. It also includes some time-invariant characteristics, such as average preinitiative proficiency levels in mathematics and reading (i.e., in school years 2006–2007 through 2008–2009). For a list of specific covariates, see Table T.1. X̄dt contains the time-varying variables in Xsdt but aggregated at the district level (we did not include time-constant district-level variables because they were perfectly collinear with the district-specific intercept).

42 We did not weight these models by school size, so we weighted each school equally in the analysis.

Equation T.1 does not have an overall time trend. This is because the test scores (our main outcomes of interest) are standardized by the within-state student mean and standard deviation (the example in Figure T.1 shows raw scale scores instead of standardized scores). Apart from the overall time trend, an extension of our model would be to use linear district-specific time trends to predict postinitiative counterfactual outcomes. However, we found that, as we predict several years into the future, maintaining the trends from before the initiative leads to large prediction errors and imprecise estimates of the initiative's impact. Thus, in this report, we do not include linear district-specific trends in our model.43

We estimated the model in Equation T.1 using only information from school years 2006–2007 through 2008–2009. We used the estimated model to form a forecast of the outcome for each school in the postinitiative period. We then computed the difference between each school’s forecast and actual value. This difference reflects how the school’s outcome differed from what was expected, based on the school’s and district’s characteristics. This approach implies that we included in our analysis only schools that were open by 2008–2009. In this analysis, we did not include schools that opened after this period. The only exception is our analysis of HSs in PPS, as explained in Chapter Thirteen.44 We also kept in our analyses schools that closed after 2008–2009 until the year of closure.
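A minimal sketch of this first (forecasting) step is shown below: it fits a model in the spirit of Equation T.1 on preinitiative years only and then computes the difference between each school's forecast and actual outcome in postinitiative years. The data, covariates, and column names are simulated placeholders, not the study's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical school-by-year panel; 'district' identifies districts (with the
# CMOs grouped as one pseudo-district), and x_school / x_district stand in for
# the school- and district-level covariates in Equation T.1.
rng = np.random.default_rng(4)
panel = pd.DataFrame({
    "school":     np.repeat(np.arange(100), 10),
    "district":   np.repeat(rng.integers(0, 8, 100), 10),
    "year":       np.tile(np.arange(2007, 2017), 100),
    "x_school":   rng.normal(0, 1, 1000),
    "x_district": rng.normal(0, 1, 1000),
    "outcome":    rng.normal(0, 1, 1000),
})

pre = panel[panel["year"] <= 2009]                        # preinitiative years only
step1 = smf.ols("outcome ~ C(district) + x_school + x_district", data=pre).fit()

post = panel[panel["year"] >= 2010].copy()
post["forecast"] = step1.predict(post)                    # counterfactual forecast
post["dif"] = post["forecast"] - post["outcome"]          # prediction difference for step 2
print(post[["school", "year", "dif"]].head())
```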

43 We examined the error of the predicted outcomes for schools in the comparison group. We used the comparison group; because it was not exposed to the initiative, we expected that past trends at the district level would be a good predictor of future outcomes. This can be tested in the data. We found that the model without trends delivered smaller prediction errors, as measured by the root-mean-squared error.
44 For the analysis of the initiative impact on HS (grade 11) reading in PPS, we also included HSs that opened after 2008–2009 (or that merged with other schools, which we treated as new schools in the data). We believe that this better captures the initiative's district-wide effects because of the high rate of closures or mergers of HSs in PPS, a phenomenon that we do not see in the other IP sites. To include schools that opened or merged after 2008–2009 in our analysis, we had to exclude the average preinitiative (i.e., 2006–2007 to 2008–2009) proficiency levels in mathematics and reading from the list of controls in Xsdt (and in X̄dt) in Equation T.1 because these values are undefined for schools that opened after 2008–2009.

The second step in the analysis examined whether the differences between the forecast and actual values were systematically different in the IP districts and the comparison districts. We estimated the following regression:

dif_{sdt} = \gamma + \eta_t treatment_d + \theta_{t,X} X_{sdt} + \theta_{t,\bar{X}} \bar{X}_{dt} + \mu_{dt} + v_{sdt}.   (T.2)

The variable difsdt denotes the difference between the forecast and actual values of the outcome. The vectors Xsdt and X̄dt are the same vectors of school-level and district-level demographics as in Equation T.1 (excluding time-invariant variables). The variable treatmentd is an indicator variable that equals 1 if the schools were in the initiative district.

The coefficient of interest is ηt, which captures the difference in the prediction error (the DiD) between schools in the initiative and comparison districts in year t. We allowed ηt to vary with time because it is plausible that the IP initiative will take time to generate effects, given that the reforms it entails require several years to implement. In practice, we estimated Equation T.2 separately for every year in the data (before and after the initiative).

Also, we control for demographic factors both in Equation T.1, the forecasting model, and in Equation T.2, the model that explains the difference between the forecast and actual values. The reasoning for following this approach is that, in Equation T.1, we assumed the effects of demographic factors to be constant over time. In other words, we assumed that the influence that different factors (such as the ethnicity composition) have on the achievement outcome did not vary over time. In reality, however, this might not be true, and it adds to the prediction error. We acknowledged this by adding demographic factors to Equation T.2 and letting them have differential impacts in every year (because we estimated separate regressions for Equation T.2 for each time period). The key assumption behind this approach was that changes in demographics and in their impacts on outcomes are independent or unrelated to the IP initiative.

A significant empirical challenge was to determine whether the usual variability in outcomes that occurred across districts could explain the initiative's estimated impacts. The district-by-time–level random-effect component (µdt) included in the analysis addresses this problem. We assumed that the common shocks to schools' performances in a district in a given year, which would occur regardless of the IP initiative, followed a normal distribution (i.e., \mu_{dt} \sim N(0, \sigma^2)).

Adding this district–year random-effect component to the model allowed us to measure the natural variability in outcomes across districts. This allowed us to judge whether the initiative’s estimated impacts were large enough in comparison to the expected variation in the absence of any initiative.
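The sketch below illustrates the second step for a single year: regressing the prediction differences on the treatment indicator and a school covariate, with a district random intercept standing in for the district-by-year random effect µdt (grouping by district within one year's data). The data and column names are hypothetical, and the actual models also include district-level covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical one-year slice of step-2 data: prediction differences (dif),
# a treatment indicator for IP-site schools, a demographic covariate, and a
# district identifier for the random intercept.
rng = np.random.default_rng(5)
n = 500
year_slice = pd.DataFrame({
    "dif":       rng.normal(0, 0.2, n),
    "treatment": rng.integers(0, 2, n),
    "x_school":  rng.normal(0, 1, n),
    "district":  rng.integers(0, 25, n),
})

# Analogue of Equation T.2 for one year: the coefficient on treatment plays
# the role of eta_t; the district random intercept captures common shocks.
model = smf.mixedlm("dif ~ treatment + x_school", data=year_slice,
                    groups=year_slice["district"])
fit = model.fit()
print(fit.params["treatment"])    # DiD estimate for this year
```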


Appendix U. Additional Impact Estimates for Chapter Thirteen

This appendix contains the results for additional outcomes, including test scores for student subgroups (specifically, by demographic characteristic and grade) and indicators of HS persistence (dropout and graduation rates). For each outcome, we report the estimated impact for a given postinitiative year in the first row and its p-value, in brackets, in the second row.


Table U.1. HCPS Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015
3 Math All DiD –0.014 0.016 –0.038 –0.029 0.024 –0.051

p-­value [0.449] [0.536] [0.293] [0.446] [0.618] [0.233]

Black DiD –0.035 –0.001 –0.011 –0.073 –0.014 –0.068

p-­value [0.153] [0.979] [0.758] [0.106] [0.808] [0.324]

FRPL DiD 0.001 0.009 –0.039 –0.023 0.031 –0.004

p-­value [0.972] [0.764] [0.423] [0.594] [0.571] [0.935]

Hispanic DiD –0.04 –0.007 –0.034 –0.097* –0.013 –0.086

p-­value [0.109] [0.851] [0.504] [0.073] [0.836] [0.144]

Reading All DiD 0.02** 0.057*** 0.029** –0.013 0.076*** 0.007

p-­value [0.03] [0.001] [0.011] [0.562] [0.001] [0.791]

Black DiD 0.022 –0.005 0.044*** –0.015 0.065** –0.023

p-­value [0.105] [0.73] [0.007] [0.453] [0.011] [0.595]

FRPL DiD 0.014 0.049*** 0.03** –0.006 0.076*** 0.016

p-­value [0.09] [0.001] [0.009] [0.735] [0.001] [0.689]

Hispanic DiD –0.03 0.102*** 0.038 –0.009 0.098*** –0.024

p-­value [0.14] [0.001] [0.201] [0.694] [0.004] [0.542]

4 Math All DiD 0.027 –0.007 –0.074*** –0.058** –0.061* –0.065*

p-­value [0.106] [0.741] [0.006] [0.043] [0.064] [0.057]

Black DiD 0.033* 0.026 –0.023 –0.036 –0.013 –0.004

p-­value [0.07] [0.382] [0.481] [0.232] [0.744] [0.931]

FRPL DiD 0.074*** 0.019 –0.078*** –0.042 –0.042 –0.052

P-­value [0.001] [0.368] [0.008] [0.181] [0.247] [0.151]

Hispanic DiD –0.006 –0.031* –0.097*** –0.065** –0.138*** –0.063*

p-­value [0.774] [0.091] [0.001] [0.035] [0.001] [0.081]

Reading All DiD –0.017*** 0.058*** 0.035*** 0.017 0.033*** –0.01

p-value [0.001] [0.001] [0.001] [0.281] [0.006] [0.698]

Black DiD 0.002 0.067*** 0.048 0.011 0.103*** –0.005

p-­value [0.934] [0.001] [0.099] [0.569] [0.001] [0.878]

FRPL DiD 0.011 0.065*** 0.025* 0.023 0.061*** 0.001

p-­value [0.088] [0.001] [0.068] [0.17] [0.001] [0.975]

Hispanic DiD –0.063*** 0.027* 0.045*** 0.008 –0.007 –0.012

p-­value [0.005] [0.056] [0.001] [0.701] [0.651] [0.756]

5 Math All DiD –0.003 –0.012 –0.046** –0.005 –0.035 –0.052*

p-­value [0.722] [0.63] [0.019] [0.856] [0.311] [0.064]

Black DiD –0.008 0.068* –0.02 0.065* 0.015 –0.004

p-­value [0.45] [0.085] [0.474] [0.052] [0.721] [0.889]

FRPL DiD –0.029*** 0.007 –0.043** –0.01 –0.023 –0.025

p-­value [0.001] [0.75] [0.024] [0.722] [0.514] [0.367]

Hispanic DiD 0.003 –0.009 –0.056** –0.033 –0.058 –0.087*

p-­value [0.845] [0.755] [0.015] [0.224] [0.139] [0.049]

Reading All DiD 0.01 0.008 –0.012 0.002 0.06*** 0.024

p-­value [0.191] [0.468] [0.31] [0.89] [0.002] [0.222]

Black DiD –0.019 0.029 0.009 0.007 0.065** 0.033

p-­value [0.104] [0.268] [0.504] [0.809] [0.021] [0.244]

FRPL DiD –0.003 0.02 –0.004 –0.019 0.092*** 0.038*

p-­value [0.698] [0.101] [0.75] [0.174] [0.001] [0.065]

Hispanic DiD –0.019 0.001 –0.026 –0.031 0.042 –0.004

p-­value [0.248] [0.983] [0.24] [0.495] [0.186] [0.943]

6 Math All DiD –0.028*** 0.02 –0.104*** –0.073*** –0.073*** –0.075*

p-­value [0.001] [0.288] [0.001] [0.001] [0.002] [0.084]

Black DiD –0.012 0.01 –0.099*** –0.079** –0.074 –0.035

p-­value [0.309] [0.588] [0.004] [0.035] [0.125] [0.429]

FRPL DiD –0.016* 0.001 –0.066*** –0.099*** –0.032 –0.05

p-­value [0.074] [0.96] [0.001] [0.001] [0.338] [0.21]

Hispanic DiD 0.024 0.077*** –0.019 –0.055 –0.022 0.027

p-­value [0.268] [0.001] [0.608] [0.12] [0.558] [0.608]

Reading All DiD –0.023*** –0.038*** –0.105*** –0.085*** –0.115*** –0.087***

p-­value [0.001] [0.006] [0.001] [0.001] [0.001] [0.001]

Black DiD 0 –0.05*** –0.09*** –0.1*** –0.121*** –0.089***

p-­value [0.98] [0.001] [0.001] [0.001] [0.001] [0.001]

FRPL DiD –0.001 –0.034* –0.088*** –0.08*** –0.081*** –0.067***

p-­value [0.926] [0.056] [0.001] [0.001] [0.001] [0.001]

Hispanic DiD 0.021 0.044** –0.015 –0.023 –0.053*** 0.028

p-­value [0.229] [0.025] [0.498] [0.402] [0.004] [0.298]

7 Math All DiD –0.041*** –0.051*** –0.042** –0.03 –0.002 0.107**

p-­value [0.001] [0.001] [0.045] [0.325] [0.96] [0.021]

Black DiD –0.034 –0.033 –0.103*** –0.167*** –0.099** 0.093**

p-­value [0.155] [0.324] [0.001] [0.001] [0.01] [0.044]

FRPL DiD –0.016 –0.023 –0.034* –0.022 –0.002 0.111**

p-­value [0.289] [0.131] [0.078] [0.47] [0.968] [0.016]

Hispanic DiD –0.058*** –0.005 –0.048** –0.061 0 0.16***

p-­value [0.003] [0.876] [0.034] [0.169] [0.994] [0.001]

Reading All DiD –0.019** –0.06*** –0.093*** –0.124*** –0.083*** 0.068***

p-­value [0.039] [0.001] [0.001] [0.001] [0.001] [0.006]

Black DiD 0.003 –0.031 –0.143*** –0.164*** –0.14*** 0.047

p-­value [0.908] [0.33] [0.001] [0.001] [0.001] [0.202]

FRPL DiD 0 –0.035*** –0.082*** –0.087*** –0.076*** 0.067**

p-­value [0.976] [0.001] [0.001] [0.003] [0.009] [0.015]

Hispanic DiD –0.004 –0.041*** –0.104*** –0.142*** –0.11*** 0.119***

p-value [0.83] [0.002] [0.001] [0.001] [0.002] [0.001]

8 Math All DiD 0.007 –0.009 –0.034** 0.115* 0.163*** –0.27***

p-­value [0.468] [0.259] [0.047] [0.065] [0.002] [0.001]

Black DiD 0.003 –0.061*** –0.042* –0.001 0.055 –0.245***

p-­value [0.84] [0.009] [0.086] [0.982] [0.135] [0.001]

FRPL DiD –0.03*** –0.026** –0.069*** 0.043 0.115** –0.233***

p-­value [0.004] [0.019] [0.001] [0.392] [0.012] [0.001]

Hispanic DiD –0.016 –0.008 –0.014 0.1 0.183** –0.288***

p-­value [0.206] [0.83] [0.691] [0.139] [0.018] [0.001]

Reading All DiD 0.018** –0.072*** –0.099*** –0.083*** –0.086*** –0.067

p-­value [0.032] [0.001] [0.001] [0.001] [0.001] [0.19]

Black DiD –0.005 –0.064 –0.088*** –0.087*** –0.018 –0.133**

p-­value [0.756] [0.13] [0.001] [0.005] [0.522] [0.037]

FRPL DiD –0.009 –0.096*** –0.119*** –0.119*** –0.094*** –0.139***

p-­value [0.459] [0.001] [0.001] [0.001] [0.001] [0.001]

Hispanic DiD 0 –0.045 –0.095*** –0.107** –0.068 –0.107*

p-­value [0.985] [0.332] [0.002] [0.03] [0.11] [0.075]

3–8 Math All DiD –0.002 –0.004 –0.052** –0.017 0.002 –0.046

p-­value [0.84] [0.841] [0.03] [0.58] [0.962] [0.146]

Black DiD 0 0.003 –0.056** –0.018 0.02 –0.031

p-­value [0.989] [0.876] [0.019] [0.475] [0.588] [0.191]

FRPL DiD 0.004 –0.005 –0.058** –0.024 0.004 –0.026

p-­value [0.597] [0.787] [0.02] [0.418] [0.912] [0.447]

Hispanic DiD –0.021 –0.01 –0.044* –0.047 –0.018 –0.061*

p-­value [0.101] [0.6] [0.081] [0.126] [0.642] [0.074]

Reading All DiD 0.003 0.013 –0.012 –0.025 0.026 0.001

p-­value [0.477] [0.161] [0.255] [0.106] [0.102] [0.948]

Black DiD 0.022* –0.001 –0.028*** –0.014 0.037** –0.016

p-­value [0.066] [0.948] [0.001] [0.305] [0.032] [0.451]

FRPL DiD 0.006 0.009 –0.011 –0.019 0.042* 0.011

p-­value [0.107] [0.376] [0.394] [0.268] [0.054] [0.642]

Hispanic DiD –0.015 0.048** 0.013 –0.018 0.036 –0.004

p-­value [0.194] [0.025] [0.534] [0.502] [0.161] [0.877]

9 Reading All DiD –0.027** –0.004 –0.106*** –0.127*** –0.132*** 0.027

p-­value [0.031] [0.7] [0.001] [0.001] [0.001] [0.278]

Black DiD 0.001 –0.054*** –0.093** –0.087** –0.211*** –0.106

p-­value [0.974] [0.002] [0.01] [0.044] [0.001] [0.162]

FRPL DiD –0.003 –0.011 –0.07*** –0.122*** –0.151*** –0.083

p-­value [0.812] [0.499] [0.007] [0.001] [0.001] [0.132]

Hispanic DiD –0.031 –0.006 –0.087*** –0.096*** –0.14*** –0.031

p-­value [0.124] [0.794] [0.001] [0.001] [0.001] [0.488]

10 Reading All DiD –0.077*** –0.173*** –0.124*** –0.163*** –0.104*** –0.071**

p-­value [0.001] [0.001] [0.001] [0.001] [0.001] [0.011]

Black DiD –0.087*** –0.184*** –0.134*** –0.222*** –0.133*** –0.176***

p-­value [0.001] [0.001] [0.001] [0.001] [0.001] [0.005]

FRPL DiD –0.052*** –0.189*** –0.162*** –0.181*** –0.151*** –0.133**

p-­value [0.002] [0.001] [0.001] [0.001] [0.001] [0.031]

Hispanic DiD –0.051*** –0.132*** –0.101*** –0.162*** –0.073* –0.127**

p-­value [0.002] [0.001] [0.001] [0.001] [0.084] [0.032]

HS Reading All DiD –0.034*** –0.063*** –0.11*** –0.126*** –0.078*** 0

p-­value [0.001] [0.001] [0.001] [0.001] [0.004] [0.986]

Black DiD –0.041*** –0.119*** –0.098*** –0.139*** –0.129*** –0.07

p-­value [0.001] [0.001] [0.001] [0.001] [0.002] [0.153]

FRPL DiD –0.018 –0.065*** –0.106*** –0.159*** –0.136*** –0.066*

p-value [0.111] [0.001] [0.001] [0.001] [0.001] [0.056]

Hispanic DiD –0.013 –0.032* –0.06*** –0.087*** –0.08*** –0.056

p-­value [0.424] [0.058] [0.003] [0.001] [0.001] [0.202]

HS Dropout rate, as a percentage

All DiD 0.03 –2.1 –0.2 1.55*** 0.48 0.12

p-­value [0.86] [0.295] [0.576] [0.001] [0.366] [0.798]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. HS indicates the average for tests taken in grade 9 or 10. For the graduation and dropout rates, we used a logit model to estimate the predicted trends to account for the bounded range of these estimates. In 2011, Florida began implementing the FCAT 2.0, which does not include a mathematics exam in grade 9 or 10. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.
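
To aid interpretation of the DiD rows, the table notes describe comparing observed outcomes with trends predicted from pre-initiative data. One simplified way to write such a year-specific estimate is sketched below; the notation is introduced here for illustration only, and the exact specification is documented in Gutierrez, Weinberger, and Engberg, 2016:

\[
\widehat{\mathrm{DiD}}_{t} = \bigl(Y^{\mathrm{IP}}_{t} - \hat{Y}^{\mathrm{IP}}_{t}\bigr) - \bigl(Y^{\mathrm{C}}_{t} - \hat{Y}^{\mathrm{C}}_{t}\bigr),
\]

where \(Y^{\mathrm{IP}}_{t}\) is the observed average outcome in the Intensive Partnership site for a given grade, subject, and subgroup in year \(t\); \(\hat{Y}^{\mathrm{IP}}_{t}\) is the value predicted from the site’s pre-initiative trend; and the \(\mathrm{C}\) terms are the corresponding observed and predicted values for the comparison group used in the analysis. Read this way, a negative entry indicates that the site fell further below (or rose less above) its predicted trend than the comparison group did, and the bracketed p-value tests whether that gap differs from zero.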

Table U.2. PPS Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015

3 Math All DiD –0.07*** –0.067* –0.095 0.112*** 0.137*** 0.048

p-­value [0.001] [0.096] [0.137] [0.006] [0.001] [0.234]

Black DiD –0.071*** 0.024 –0.078* 0.14*** 0.149*** –0.008

p-­value [0.001] [0.692] [0.095] [0.004] [0.001] [0.854]

FRPL DiD –0.043*** –0.032 –0.108** 0.166*** 0.182*** 0.072*

p-­value [0.005] [0.478] [0.031] [0.001] [0.001] [0.065]

Reading All DiD –0.066*** –0.115*** –0.092* –0.075** –0.009 –0.021

p-­value [0.001] [0.001] [0.063] [0.02] [0.773] [0.605]

Black DiD –0.073*** –0.013 –0.074** 0.005 0.033 –0.078**

p-­value [0.001] [0.777] [0.029] [0.87] [0.326] [0.04]

FRPL DiD –0.019 0.013 –0.096** 0.007 0.072** 0.045

p-­value [0.132] [0.744] [0.019] [0.84] [0.032] [0.251]

4 Math All DiD –0.072*** –0.031 –0.066* –0.072* 0.114*** 0.029

p-­value [0.001] [0.326] [0.086] [0.087] [0.001] [0.335]

Black DiD –0.103*** 0.033 0.028 0.016 0.289*** 0.065

p-­value [0.001] [0.423] [0.546] [0.74] [0.001] [0.1]

FRPL DiD –0.046*** –0.019 –0.096** –0.043 0.157*** 0.065*

p-­value [0.001] [0.563] [0.029] [0.375] [0.001] [0.096]

Reading All DiD –0.051*** 0.038 0.038 –0.055* 0.04 0.058*

p-­value [0.001] [0.111] [0.342] [0.094] [0.194] [0.09]

Black DiD –0.089*** 0.096** 0.174*** 0.004 0.156*** 0.023

p-­value [0.001] [0.017] [0.001] [0.915] [0.001] [0.519]

FRPL DiD –0.038*** 0.072*** –0.043 –0.031 0.065 0.083**

p-­value [0.001] [0.006] [0.241] [0.39] [0.114] [0.028]

5 Math All DiD –0.003 –0.005 0.011 –0.024 –0.002 –0.051

p-­value [0.738] [0.867] [0.828] [0.578] [0.962] [0.214]

Black DiD –0.033* 0.01 0.048 0.035 0.123*** –0.017

p-­value [0.059] [0.808] [0.437] [0.423] [0.001] [0.668]

FRPL DiD –0.003 0.024 –0.15*** –0.008 0.01 –0.021

p-­value [0.845] [0.457] [0.001] [0.854] [0.72] [0.562]

Reading All DiD 0.035*** 0.106*** 0.146*** –0.055 0.005 –0.035

p-­value [0.001] [0.001] [0.005] [0.183] [0.893] [0.326]

Black DiD 0.015 0.117*** 0.215*** –0.007 0.116*** –0.013

p-­value [0.465] [0.007] [0.001] [0.853] [0.004] [0.67]

FRPL DiD 0.024** 0.14*** –0.028 –0.015 0.019 0.042

p-­value [0.012] [0.001] [0.506] [0.707] [0.643] [0.382]

6 Math All DiD –0.113*** –0.139*** –0.083** –0.158*** –0.032 –0.163***

p-­value [0.001] [0.001] [0.044] [0.002] [0.527] [0.001]

Black DiD –0.095*** –0.114*** 0.018 –0.14*** 0.073 –0.099***

p-­value [0.001] [0.004] [0.628] [0.001] [0.135] [0.002]

FRPL DiD –0.066*** –0.063** 0.007 –0.086* 0.037 –0.115**

p-value [0.001] [0.021] [0.901] [0.078] [0.41] [0.027]

Reading All DiD –0.049*** 0.034 0.082*** –0.011 –0.075*** –0.055**

p-­value [0.001] [0.119] [0.004] [0.715] [0.009] [0.049]

Black DiD –0.054*** 0.046 0.181*** –0.074*** 0.066* –0.009

p-­value [0.001] [0.215] [0.001] [0.006] [0.056] [0.782]

FRPL DiD –0.018 0.072*** 0.109*** 0.009 –0.013 –0.014

p-­value [0.174] [0.002] [0.001] [0.819] [0.622] [0.676]

7 Math All DiD 0.032** –0.004 –0.11** –0.086*** 0.119*** –0.012

p-­value [0.027] [0.822] [0.021] [0.003] [0.002] [0.778]

Black DiD 0.051*** –0.067* –0.072 –0.094** 0.108** 0.055*

p-­value [0.002] [0.053] [0.168] [0.015] [0.014] [0.089]

FRPL DiD 0.039** 0.016 0.021 –0.038 0.189*** 0.051

p-­value [0.016] [0.412] [0.777] [0.215] [0.001] [0.257]

Reading All DiD 0.026** 0.023 0.081** –0.109*** 0.048* –0.047

p-­value [0.041] [0.13] [0.043] [0.001] [0.054] [0.269]

Black DiD –0.002 –0.028 0.152*** –0.071** 0.072* 0.085***

p-­value [0.913] [0.264] [0.001] [0.047] [0.051] [0.009]

FRPL DiD 0.014 0.054*** 0.083** –0.045*** 0.141*** 0.043

p-­value [0.272] [0.001] [0.014] [0.002] [0.001] [0.253]

8 Math All DiD 0.001 –0.001 0.023 –0.133*** 0.05* 0.029

p-­value [0.883] [0.955] [0.385] [0.001] [0.052] [0.442]

Black DiD 0.002 0.005 –0.019 –0.123** 0.082** 0.019

p-­value [0.85] [0.888] [0.693] [0.029] [0.035] [0.629]

FRPL DiD –0.001 0.038 –0.042 –0.027 0.149*** 0.063

p-­value [0.93] [0.219] [0.584] [0.378] [0.001] [0.104]

Reading All DiD 0.027*** –0.014 0.068*** –0.083*** –0.03 –0.005

p-­value [0.003] [0.431] [0.005] [0.001] [0.183] [0.875]

Black DiD 0.004 –0.013 0.012 –0.08 0.005 –0.029

p-­value [0.835] [0.709] [0.755] [0.101] [0.897] [0.541]

FRPL DiD –0.006 –0.009 –0.012 –0.017 0.057*** 0.024

p-­value [0.661] [0.711] [0.711] [0.262] [0.002] [0.481]

3–8 Math All DiD –0.036*** 0.011 –0.02 –0.009 0.104*** 0.006

p-­value [0.001] [0.631] [0.455] [0.774] [0.001] [0.83]

Black DiD –0.06*** –0.022 –0.018 –0.009 0.147*** 0.016

p-­value [0.001] [0.454] [0.563] [0.786] [0.001] [0.545]

FRPL DiD –0.017*** 0.022 –0.077* 0.005 0.113*** 0.02

p-­value [0.094] [0.319] [0.083] [0.874] [0.001] [0.446]

Reading All DiD –0.025*** 0.035 0.046* –0.04 0.021 –0.011

p-­value [0.001] [0.136] [0.084] [0.107] [0.391] [0.654]

Black DiD –0.04*** 0.028 0.065** –0.033 0.076*** 0.007

p-­value [0.001] [0.262] [0.018] [0.161] [0.005] [0.755]

FRPL DiD –0.004** 0.055** –0.022 –0.021 0.031 0.005

p-­value [0.702] [0.023] [0.542] [0.375] [0.219] [0.853]

11 (previous model)

Reading All DiD –0.029** –0.007 –0.055** –0.092*** –0.076** –0.082**

p-­value [0.016] [0.718] [0.039] [0.001] [0.023] [0.017]

Black DiD –0.071*** 0.021 0.075* –0.033 0.152*** 0.089**

p-­value [0.002] [0.449] [0.052] [0.557] [0.006] [0.032]

FRPL DiD –0.016 0.006 –0.065* –0.05 0.104** 0.004

p-­value [0.277] [0.801] [0.068] [0.204] [0.038] [0.915]

11 (new model) Reading All DiD –0.133*** 0.005 0.072*** 0.097*** 0.059 0.091**

p-­value [0.001] [0.881] [0.008] [0.004] [0.223] [0.019]

Black DiD –0.15*** 0.011 0.15*** 0.106** 0.147*** 0.133***

p-­value [0.001] [0.776] [0.001] [0.02] [0.003] [0.001]

FRPL DiD –0.095*** 0.009 –0.052* 0.042 0.053 0.098**

p-value [0.001] [0.805] [0.059] [0.355] [0.331] [0.012]

HS Dropout rate, as a percentage

All DiD –1.31*** 0.64 –3.52*** –1.54*** –0.92 –1.68*

p-­value [0.008] [0.435] [0.001] [0.008] [0.147] [0.051]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. The grade 11 test scores come from the Keystone Exam, an end-of-course (EOC) exam in specific subjects (algebra 1 for math and literature for reading). The Keystone Exam for math is less standardized across schools, so we did not include it in the analysis. For the dropout rates, we used a logit model to estimate the predicted trends to account for the bounded range of these estimates. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.
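
The note above indicates that, for rates bounded between 0 and 100 percent (such as the dropout rate), predicted trends were estimated with a logit model. The short sketch below is purely illustrative of that general idea and is not the report’s estimation code; the package choice, the example rates and years, and the simple linear trend on the logit scale are all assumptions introduced here.

```python
import numpy as np

def logit(p):
    """Map a rate in (0, 1) onto the unbounded logit scale."""
    return np.log(p / (1.0 - p))

def inv_logit(x):
    """Map a logit-scale value back to a rate in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-initiative dropout rates (as proportions).
years = np.array([2007.0, 2008.0, 2009.0])
rates = np.array([0.062, 0.058, 0.055])

# Fit a linear trend on the logit scale; np.polyfit returns the
# highest-order coefficient first, so this unpacks slope then intercept.
slope, intercept = np.polyfit(years, logit(rates), 1)

# Extrapolate the pre-initiative trend to a later year and convert back.
predicted_2013 = inv_logit(slope * 2013.0 + intercept)
actual_2013 = 0.049  # hypothetical observed rate

# Deviation of the observed rate from the extrapolated trend,
# expressed in percentage points.
print(round((actual_2013 - predicted_2013) * 100, 2))
```

Working on the logit scale keeps an extrapolated trend from drifting below 0 or above 100 percent, which matters most when baseline rates sit near those boundaries.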

Table U.3. SCS Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015

3 Math All DiD –0.277*** –0.162 –0.129 –0.121* –0.033 –0.098

p-­value [0.002] [0.14] [0.165] [0.068] [0.749] [0.223]

Reading All DiD –0.166** –0.212** –0.145** –0.121** 0.026 –0.004

p-­value [0.026] [0.043] [0.04] [0.012] [0.773] [0.934]

4 Math All DiD –0.029 –0.165* –0.083 –0.022 –0.124 –0.091

p-­value [0.728] [0.071] [0.517] [0.894] [0.288] [0.363]

Reading All DiD –0.054 –0.103 –0.119 –0.162*** –0.114*** –0.065

p-­value [0.208] [0.113] [0.178] [0.005] [0.007] [0.196]

5 Math All DiD –0.166 –0.212** –0.322** –0.256 –0.043 –0.139

p-­value [0.103] [0.042] [0.020] [0.110] [0.837] [0.389]

Reading All DiD –0.132*** –0.184*** –0.085 –0.201** 0.037 –0.065

p-­value [0.001] [0.004] [0.327] [0.036] [0.777] [0.22]

6 Math All DiD –0.194*** –0.306*** –0.3*** –0.143 –0.299** –0.176

p-­value [0.001] [0.001] [0.001] [0.172] [0.034] [0.13]

Reading All DiD –0.089*** –0.192*** –0.192*** –0.105** –0.215*** –0.099

p-­value [0.002] [0.001] [0.007] [0.029] [0.003] [0.102]

7 Math All DiD –0.14*** –0.27*** –0.224** –0.097 –0.269** –0.131

p-­value [0.004] [0.001] [0.018] [0.353] [0.023] [0.237]

Reading All DiD –0.11** –0.243*** –0.16*** –0.153** –0.028 –0.135**

p-­value [0.035] [0.001] [0.006] [0.024] [0.54] [0.036]

8 Math All DiD –0.098 –0.238*** –0.288*** –0.22** –0.224 –0.173

p-­value [0.118] [0.001] [0.001] [0.046] [0.109] [0.181]

Reading All DiD –0.012 –0.159*** –0.08 –0.097 –0.078 –0.032

p-­value [0.708] [0.001] [0.197] [0.193] [0.182] [0.506]

3–8 Math All DiD –0.146*** –0.173*** –0.204*** –0.155* –0.174* –0.14

p-­value [0.002] [0.001] [0.003] [0.05] [0.093] [0.071]

Reading All DiD –0.103*** –0.133*** –0.132*** –0.144*** –0.011 –0.022

p-­value [0.001] [0.002] [0.008] [0.001] [0.792] [0.531]

Elementary school

Attendance, as a percentage

All DiD –0.29 –0.81*** –0.73** –1.27** –1.3*** –0.41

p-­value [0.188] [0.001] [0.011] [0.02] [0.005] [0.282]

Promotion, as a percentage

All DiD 1.02 –1.72*** –1.8*** 5.42 –1.85 –2.98

p-­value [0.123] [0.009] [0.001] [0.448] [0.856] [0.287]

HS Dropout rate, as a percentage

All DiD 0.29 –3.61** 4.48** –0.06 2.35 5.9

p-­value [0.88] [0.042] [0.038] [0.966] [0.532] [0.108]

Graduation rate, as a percentage

All DiD –9.27** –0.71 –6.26*** –11.45*** –14.01*** –15.66***

p-­value [0.016] [0.781] [0.007] [0.001] [0.001] [0.001]

Attendance, as a percentage

All DiD 1.27 0.66 1.1* 0.49 3.96*** 3.32**

p-­value [0.106] [0.381] [0.062] [0.528] [0.001] [0.011]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. For the graduation, dropout, attendance, and promotion rates, we used a logit model to estimate the predicted trends to account for the bounded range of these estimates. We could not calculate the initiative’s impact for black students or for other subgroups (e.g., low-income students) because Tennessee does not provide data on average performance by subgroup in each school, grade, and subject. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.

Table U.4. CMOs’ Combined Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015

3 Math All DiD –0.003 –0.067*** –0.01 0.059** N/A –0.109***

p-­value [0.841] [0.001] [0.623] [0.019] N/A [0.001]

Black DiD 0.006 –0.103*** –0.048 –0.039 N/A –0.221***

p-­value [0.82] [0.004] [0.327] [0.47] N/A [0.001]

FRPL DiD 0.002 –0.072*** –0.011 0.01 N/A –0.145***

p-­value [0.869] [0.001] [0.632] [0.709] N/A [0.001]

Hispanic DiD –0.065*** –0.061*** –0.089*** 0.017 N/A –0.118***

p-­value [0.001] [0.001] [0.001] [0.513] N/A [0.001]

Reading All DiD 0.109*** 0.168*** 0.175*** 0.109*** N/A –0.012

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.406]

Black DiD 0.148*** 0.007 0.213*** 0.093* N/A –0.202***

p-­value [0.001] [0.795] [0.001] [0.056] N/A [0.001]

FRPL DiD 0.138*** 0.15*** 0.191*** 0.113*** N/A –0.04**

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.01]

Hispanic DiD 0.06*** 0.19*** 0.099*** 0.146*** N/A –0.011

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.564]

4 Math All DiD 0.027** –0.068*** –0.167*** –0.034* N/A –0.232***

p-­value [0.036] [0.001] [0.001] [0.082] N/A [0.001]

Black DiD 0.034** 0.091*** –0.107** 0.09* N/A –0.309***

p-­value [0.029] [0.009] [0.032] [0.063] N/A [0.001]

FRPL DiD 0.047*** –0.127*** –0.23*** –0.069*** N/A –0.286***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.027 –0.172*** –0.244*** –0.125*** N/A –0.242***

p-­value [0.098] [0.001] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.173*** 0.105*** 0.019 0.119*** N/A –0.209***

p-­value [0.001] [0.001] [0.143] [0.001] N/A [0.001]

Black DiD 0.308*** 0.234*** –0.037 0.264*** N/A –0.213***

p-­value [0.001] [0.001] [0.217] [0.001] N/A [0.001]

FRPL DiD 0.185*** 0.075*** –0.021 0.085*** N/A –0.233***

p-­value [0.001] [0.001] [0.114] [0.001] N/A [0.001]

Hispanic DiD 0.149*** –0.011 –0.043*** –0.005 N/A –0.256***

p-­value [0.001] [0.46] [0.003] [0.734] N/A [0.001]

5 Math All DiD 0.047*** 0.06*** –0.16*** –0.166*** N/A –0.228***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Black DiD 0.526*** 0.383*** 0.07 –0.295*** N/A –0.345***

p-­value [0.001] [0.001] [0.157] [0.001] N/A [0.001]

FRPL DiD 0.024* 0.075*** –0.177*** –0.182*** N/A –0.225***

p-­value [0.064] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.004 0.009 –0.23*** –0.208*** N/A –0.244***

p-­value [0.846] [0.663] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.104*** 0.092*** –0.02 –0.011 N/A –0.141***

p-­value [0.001] [0.001] [0.114] [0.466] N/A [0.001]

Black DiD 0.229*** 0.275*** 0.103** –0.183*** N/A –0.306***

p-­value [0.001] [0.001] [0.01] [0.001] N/A [0.001]

FRPL DiD 0.089*** 0.074*** –0.071*** –0.061*** N/A –0.116***

p-­value [0.001] [0.001] [0.001] [0.002] N/A [0.001]

Hispanic DiD 0.049*** 0.057*** –0.089*** –0.032* N/A –0.118***

p-­value [0.001] [0.001] [0.001] [0.061] N/A [0.001]

6 Math All DiD –0.021 0.021 –0.05 –0.154* N/A –0.293***

p-­value [0.114] [0.86] [0.506] [0.089] N/A [0.001]

Black DiD –0.023 0.221*** 0.078 –0.17 N/A –0.011

p-­value [0.889] [0.005] [0.801] [0.255] N/A [0.896]

FRPL DiD –0.013 0.034 –0.062 –0.159*** N/A –0.241**

p-value [0.89] [0.68] [0.447] [0.001] N/A [0.024]

Hispanic DiD –0.032 0.038 –0.076 –0.176*** N/A –0.259**

p-­value [0.736] [0.638] [0.312] [0.001] N/A [0.024]

Reading All DiD –0.009 0.009 0.009 –0.052 N/A –0.126***

p-­value [0.804] [0.856] [0.846] [0.178] N/A [0.001]

Black DiD –0.032 0.193*** 0.114 –0.113 N/A –0.043

p-­value [0.766] [0.001] [0.588] [0.346] N/A [0.528]

FRPL DiD –0.022 –0.016 –0.035 –0.101** N/A –0.144**

p-­value [0.808] [0.703] [0.598] [0.048] N/A [0.035]

Hispanic DiD –0.012 0.007 –0.049 –0.119** N/A –0.165*

p-­value [0.904] [0.912] [0.499] [0.013] N/A [0.081]

7 Math All DiD 0.196*** 0.062 –0.079 0.03 N/A –0.244***

p-­value [0.001] [0.662] [0.46] [0.842] N/A [0.001]

Black DiD –0.237* –0.178 –0.062 –0.193 N/A –0.239**

p-­value [0.056] [0.523] [0.773] [0.176] N/A [0.018]

FRPL DiD 0.071 –0.054 –0.226*** –0.109 N/A –0.348***

p-­value [0.45] [0.644] [0.001] [0.111] N/A [0.001]

Hispanic DiD 0.111 –0.065 –0.21*** –0.109 N/A –0.355***

p-­value [0.219] [0.615] [0.004] [0.259] N/A [0.001]

Reading All DiD 0.044 0.019 –0.004 0.007 N/A –0.07

p-­value [0.589] [0.79] [0.943] [0.861] N/A [0.381]

Black DiD –0.303* –0.235 –0.08 –0.162 N/A –0.023

p-­value [0.1] [0.226] [0.619] [0.296] N/A [0.892]

FRPL DiD –0.014 –0.067 –0.117*** –0.102*** N/A –0.157***

p-­value [0.85] [0.238] [0.001] [0.001] N/A [0.006]

Hispanic DiD 0.011 –0.075 –0.087* –0.107*** N/A –0.138*

p-­value [0.893] [0.262] [0.077] [0.001] N/A [0.059]

8 Reading All DiD –0.034** 0.073 0.043 –0.155** N/A –0.222*

p-­value [0.029] [0.364] [0.635] [0.02] N/A [0.083]

Black DiD –0.331*** –0.383*** –0.246*** –0.347*** N/A –0.255***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

FRPL DiD –0.048 0.009 –0.062 –0.269*** N/A –0.29***

p-­value [0.386] [0.877] [0.315] [0.001] N/A [0.001]

Hispanic DiD –0.003 0.012 –0.048 –0.241*** N/A –0.302***

p-­value [0.956] [0.842] [0.454] [0.001] N/A [0.002]

3–8 Math All DiD 0.043 0.027 –0.079 –0.017 N/A –0.167***

p-­value [0.139] [0.804] [0.211] [0.827] N/A [0.001]

Black DiD –0.001 –0.003 –0.035 –0.179 N/A –0.158***

p-­value [0.997] [0.982] [0.803] [0.163] N/A [0.001]

FRPL DiD 0.035 –0.05 –0.162* –0.103* N/A –0.215***

p-­value [0.562] [0.662] [0.075] [0.049] N/A [0.001]

Hispanic DiD 0.026 –0.038 –0.148* –0.078 N/A –0.195***

p-­value [0.586] [0.739] [0.096] [0.201] N/A [0.001]

Reading All DiD 0.029 0.048 0.005 –0.038 N/A –0.079

p-­value [0.459] [0.474] [0.92] [0.245] N/A [0.252]

Black DiD 0.031 –0.022 –0.013 –0.067 N/A –0.123***

p-­value [0.807] [0.878] [0.913] [0.542] N/A [0.001]

FRPL DiD 0.045 0.012 –0.046 –0.099 N/A –0.138***

p-­value [0.557] [0.87] [0.511] [0.2] N/A [0.001]

Hispanic DiD 0.021 0.004 –0.064 –0.104 N/A –0.141***

p-­value [0.734] [0.952] [0.331] [0.123] N/A [0.001]

11 Reading All DiD 0.065 0.018 0.072 –0.006 N/A 0.186***

p-­value [0.21] [0.84] [0.417] [0.959] N/A [0.001]

Black DiD 0.153 –0.27*** –0.247* –0.214** N/A 0.356***

p-value [0.294] [0.001] [0.072] [0.02] N/A [0.001]

FRPL DiD –0.008 –0.147** –0.066 –0.16 N/A 0.104

p-­value [0.934] [0.037] [0.429] [0.138] N/A [0.193]

Hispanic DiD –0.044 –0.108 –0.027 –0.135 N/A 0.081

p-­value [0.63] [0.251] [0.743] [0.228] N/A [0.331]

HS Dropout rate, as a percentage All DiD 1.22 3.05* 1.4 1.46 0.53 2.63*

p-­value [0.283] [0.088] [0.286] [0.383] [0.703] [0.06]

Black DiD 4.26** 5.24** 2.08 2.82 1.96 4.45**

p-­value [0.033] [0.039] [0.442] [0.198] [0.454] [0.02]

Hispanic DiD 1.81 3.64** 1.34 0.97 0.79 2.57*

p-­value [0.312] [0.044] [0.295] [0.588] [0.524] [0.063]

Graduation rate, as a percentage All DiD –2.35* –0.48 –2.32*** –3.22*** –5.06*** –6.61***

p-­value [0.087] [0.723] [0.001] [0.001] [0.001] [0.001]

Black DiD 0.94 –3.12 –3.81 5.45 –0.21 –15.49***

p-­value [0.716] [0.371] [0.184] [0.169] [0.955] [0.001]

Hispanic DiD –2.14 1.24 –1.51 –3.15** –5.79*** –6.89***

p-­value [0.346] [0.452] [0.185] [0.03] [0.001] [0.002]

UC eligible, as a percentage All DiD 5.36** 7.43*** 7.49*** 1.23 6.43*** 7.26***

p-­value [0.035] [0.001] [0.004] [0.826] [0.001] [0.001]

Black DiD 7.35 7.05* 7.02** 7.28 14.58*** –0.92

p-­value [0.101] [0.057] [0.034] [0.375] [0.001] [0.87]

Hispanic DiD 6.33** 10.12*** 7.37*** –0.1 5.28*** 6.27***

p-­value [0.011] [0.001] [0.003] [0.985] [0.001] [0.005]

CAHSEE math All DiD 0.07 0.05 0.13 0.2* 0.24** 0.21***

p-­value [0.628] [0.697] [0.186] [0.072] [0.019] [0.001]

Black DiD –0.1 –0.21 0.15 –0.1 0.32* 0.18***

p-­value [0.244] [0.464] [0.195] [0.189] [0.062] [0.005]

FRPL DiD 0.04 0.09 0.16 0.2* 0.23** 0.22***

p-­value [0.768] [0.483] [0.104] [0.075] [0.014] [0.001]

Hispanic DiD 0.09 0.03 0.14 0.17 0.17* 0.2***

p-­value [0.381] [0.828] [0.159] [0.115] [0.085] [0.001]

CAHSEE reading All DiD –0.11 –0.08 –0.01 –0.04 0.1 –0.05

p-­value [0.216] [0.417] [0.925] [0.7] [0.221] [0.503]

Black DiD –0.24*** –0.36 0.02 –0.2 0.22*** 0.03

p-­value [0.001] [0.132] [0.846] [0.222] [0.001] [0.504]

FRPL DiD –0.1 –0.06 –0.01 –0.02 0.11 0

p-­value [0.361] [0.535] [0.913] [0.845] [0.157] [0.962]

Hispanic DiD –0.12 –0.08 –0.02 –0.02 0.1 0.02

p-­value [0.234] [0.441] [0.816] [0.885] [0.181] [0.744]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. For the dropout, graduation, and UC-eligible rates, we used a logit model to estimate the predicted trends to account for the bounded range of these estimates. Standardized testing for math was not administered in grade 8 until 2015, so we did not estimate effects on grade 8 math scores. In 2014, California began implementing a new standardized test and did not publish results for its first year. CAHSEE is an HS exit exam that students must pass to graduate; it can be taken up to three times, beginning in the second semester of grade 10. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.

Table U.5. Aspire Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015

3 Math All DiD –0.003 –0.067*** –0.01 0.059** N/A –0.109***

p-­value [0.841] [0.001] [0.623] [0.019] N/A [0.001]

Black DiD 0.006 –0.103*** –0.048 –0.039 N/A –0.221***

p-­value [0.82] [0.004] [0.327] [0.47] N/A [0.001]

FRPL DiD 0.002 –0.072*** –0.011 0.01 N/A –0.145***

p-­value [0.869] [0.001] [0.632] [0.709] N/A [0.001]

Hispanic DiD –0.065*** –0.061*** –0.089*** 0.017 N/A –0.118***

p-­value [0.001] [0.001] [0.001] [0.513] N/A [0.001]

Reading All DiD 0.109*** 0.168*** 0.175*** 0.109*** N/A –0.012

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.406]

Black DiD 0.148*** 0.007 0.213*** 0.093* N/A –0.202***

p-­value [0.001] [0.795] [0.001] [0.056] N/A [0.001]

FRPL DiD 0.138*** 0.15*** 0.191*** 0.113*** N/A –0.04**

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.01]

Hispanic DiD 0.06*** 0.19*** 0.099*** 0.146*** N/A –0.011

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.564]

4 Math All DiD 0.027** –0.068*** –0.167*** –0.034* N/A –0.232***

p-­value [0.036] [0.001] [0.001] [0.082] N/A [0.001]

Black DiD 0.034** 0.091*** –0.107** 0.09* N/A –0.309***

p-­value [0.029] [0.009] [0.032] [0.063] N/A [0.001]

FRPL DiD 0.047*** –0.127*** –0.23*** –0.069*** N/A –0.286***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.027 –0.172*** –0.244*** –0.125*** N/A –0.242***

p-­value [0.098] [0.001] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.173*** 0.105*** 0.019 0.119*** N/A –0.209***

p-­value [0.001] [0.001] [0.143] [0.001] N/A [0.001]

Black DiD 0.308*** 0.234*** –0.037 0.264*** N/A –0.213***

p-­value [0.001] [0.001] [0.217] [0.001] N/A [0.001]

FRPL DiD 0.185*** 0.075*** –0.021 0.085*** N/A –0.233***

p-­value [0.001] [0.001] [0.114] [0.001] N/A [0.001]

Hispanic DiD 0.149*** –0.011 –0.043*** –0.005 N/A –0.256***

p-­value [0.001] [0.46] [0.003] [0.734] N/A [0.001]

5 Math All DiD 0.047*** 0.06*** –0.16*** –0.166*** N/A –0.228***

p-value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Black DiD 0.526*** 0.383*** 0.07 –0.295*** N/A –0.345***

p-­value [0.001] [0.001] [0.157] [0.001] N/A [0.001]

FRPL DiD 0.024* 0.075*** –0.177*** –0.182*** N/A –0.225***

p-­value [0.064] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.004 0.009 –0.23*** –0.208*** N/A –0.244***

p-­value [0.846] [0.663] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.104*** 0.092*** –0.02 –0.011 N/A –0.141***

p-­value [0.001] [0.001] [0.114] [0.466] N/A [0.001]

Black DiD 0.229*** 0.275*** 0.103** –0.183*** N/A –0.306***

p-­value [0.001] [0.001] [0.01] [0.001] N/A [0.001]

FRPL DiD 0.089*** 0.074*** –0.071*** –0.061*** N/A –0.116***

p-­value [0.001] [0.001] [0.001] [0.002] N/A [0.001]

Hispanic DiD 0.049*** 0.057*** –0.089*** –0.032* N/A –0.118***

p-­value [0.001] [0.001] [0.001] [0.061] N/A [0.001]

6 Math All DiD –0.021 –0.116*** –0.14*** –0.332*** N/A –0.32***

p-­value [0.118] [0.001] [0.001] [0.001] N/A [0.001]

Black DiD 0.128*** 0.245*** 0.385*** –0.052 N/A 0.022

p-­value [0.001] [0.001] [0.001] [0.204] N/A [0.625]

FRPL DiD 0.053*** –0.067*** –0.103*** –0.273*** N/A –0.199***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.005 –0.101*** –0.16*** –0.342*** N/A –0.244***

p-­value [0.728] [0.001] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.034*** 0.007 0.015 –0.112*** N/A –0.171***

p-­value [0.001] [0.471] [0.158] [0.001] N/A [0.001]

Black DiD 0.085*** 0.19*** 0.35*** 0.015 N/A 0.009

p-­value [0.005] [0.001] [0.001] [0.687] N/A [0.835]

FRPL DiD 0.033** –0.065*** –0.067*** –0.168*** N/A –0.165***

p-­value [0.013] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.004 –0.051*** –0.127*** –0.262*** N/A –0.208***

p-­value [0.756] [0.001] [0.001] [0.001] N/A [0.001]

7 Math All DiD 0.203*** –0.042** –0.224*** –0.199*** N/A –0.336***

p-­value [0.001] [0.04] [0.001] [0.001] N/A [0.001]

Black DiD –0.149*** 0.072** 0.121** –0.081** N/A –0.381***

p-­value [0.001] [0.037] [0.044] [0.032] N/A [0.001]

FRPL DiD 0.04* –0.192*** –0.393*** –0.375*** N/A –0.493***

p-­value [0.059] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.105*** –0.236*** –0.358*** –0.384*** N/A –0.512***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.079*** 0.052*** –0.073*** 0.012 N/A –0.177***

p-­value [0.001] [0.001] [0.001] [0.379] N/A [0.001]

Black DiD –0.147*** –0.071** 0.054 –0.033 N/A –0.234***

p-­value [0.001] [0.015] [0.311] [0.381] N/A [0.001]

FRPL DiD –0.03 –0.08*** –0.227*** –0.156*** N/A –0.305***

p-­value [0.116] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.015 –0.108*** –0.183*** –0.137*** N/A –0.276***

p-­value [0.447] [0.001] [0.001] [0.001] N/A [0.001]

8 Reading All DiD –0.02 –0.023* –0.068*** –0.234*** N/A –0.377***

p-­value [0.096] [0.083] [0.001] [0.001] N/A [0.001]

Black DiD –0.331*** –0.383*** –0.246*** –0.347*** N/A –0.255***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

FRPL DiD –0.027* –0.119*** –0.193*** –0.373*** N/A –0.456***

p-­value [0.078] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.038** –0.087*** –0.15*** –0.322*** N/A –0.46***

p-value [0.02] [0.001] [0.001] [0.001] N/A [0.001]

3–8 Math All DiD 0.023*** –0.063*** –0.136*** –0.148*** N/A –0.24***

p-­value [0.005] [0.001] [0.001] [0.001] N/A [0.001]

Black DiD 0.077*** 0.069*** 0.023 –0.123*** N/A –0.192***

p-­value [0.001] [0.001] [0.244] [0.001] N/A [0.001]

FRPL DiD 0.017 –0.095*** –0.166*** –0.182*** N/A –0.256***

p-­value [0.213] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.012 –0.103*** –0.172*** –0.183*** N/A –0.241***

p-­value [0.392] [0.001] [0.001] [0.001] N/A [0.001]

Reading All DiD 0.086*** 0.078*** 0.009 –0.019* N/A –0.172***

p-­value [0.001] [0.001] [0.31] [0.059] N/A [0.001]

Black DiD 0.097*** 0.057*** 0.04 –0.013 N/A –0.147***

p-­value [0.001] [0.001] [0.136] [0.532] N/A [0.001]

FRPL DiD 0.09*** 0.024*** –0.026*** –0.053*** N/A –0.23***

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.001]

Hispanic DiD 0.047*** 0.006 –0.059*** –0.081*** N/A –0.237***

p-­value [0.001] [0.491] [0.001] [0.001] N/A [0.001]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. Aspire did not administer standardized math testing in grade 8 until 2015, so we did not estimate effects on grade 8 math scores. In 2014, California began implementing a new standardized test and did not publish results for its first year. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.

Table U.6. Green Dot Impact Estimates, by Grade, Subgroup, and Year

Grade Subject Subgroup Data Point 2010 2011 2012 2013 2014 2015

11 Reading All DiD 0.017 –0.067*** –0.007 –0.074*** N/A 0.192***

p-­value [0.205] [0.001] [0.685] [0.001] N/A [0.001]

Black DiD 0.125*** –0.108* –0.266*** –0.243*** N/A 0.419***

p-value [0.001] [0.068] [0.001] [0.001] N/A [0.001]

FRPL DiD –0.079*** –0.193*** –0.133*** –0.244*** N/A 0.052

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.104]

Hispanic DiD –0.132*** –0.205*** –0.098*** –0.236*** N/A 0.012

p-­value [0.001] [0.001] [0.001] [0.001] N/A [0.734]

HS Dropout rate, as a percentage All DiD 2.14** 6.05*** 2.43* 2.8* 2.39*** 3.63***

p-­value [0.015] [0.001] [0.067] [0.057] [0.006] [0.005]

Black DiD 1.09 7.69*** 5.2** 4.22** 6.21*** 5.61***

p-­value [0.575] [0.001] [0.017] [0.029] [0.001] [0.002]

Hispanic DiD 0.99 5.38*** 2.01 1.9 2.25** 3.46***

p-­value [0.34] [0.001] [0.132] [0.27] [0.014] [0.009]

Graduation rate, as a percentage All DiD –3.68*** –1.3*** –5.62*** –6.86*** –8.09*** –10.98***

p-­value [0.001] [0.007] [0.001] [0.001] [0.001] [0.001]

Black DiD 1.96 –1.18 –4.09*** 8.54*** –4.89*** –24.16***

p-­value [0.236] [0.404] [0.003] [0.001] [0.002] [0.001]

Hispanic DiD –3.6*** –0.99 –5.91*** –7.28*** –8.93*** –13***

p-­value [0.001] [0.391] [0.001] [0.001] [0.001] [0.001]

UC eligible, as a percentage All DiD –1.62** –0.06 –1.57*** –14.07*** –0.68 –1.45***

p-­value [0.044] [0.907] [0.003] [0.001] [0.122] [0.008]

Black DiD –11.22*** –3.94*** –6.77*** –24.45*** 2.94** –14.41***

p-­value [0.001] [0.001] [0.001] [0.001] [0.01] [0.001]

Hispanic DiD –1.08** 0.13 –2.99*** –16.89*** –2.69*** –4.78***

p-­value [0.015] [0.888] [0.001] [0.001] [0.001] [0.001]

CAHSEE math All DiD –0.07* –0.01 0.11*** 0.1** 0.25*** 0.27***

p-­value [0.062] [0.745] [0.005] [0.012] [0.001] [0.001]

Black DiD 0.04 –0.41*** 0.2*** 0.01 0.64*** 0.36***

p-­value [0.609] [0.001] [0.001] [0.846] [0.001] [0.001]

FRPL DiD –0.07** 0 0.14*** 0.07* 0.22*** 0.27***

p-­value [0.042] [0.962] [0.001] [0.076] [0.001] [0.001]

Hispanic DiD 0.01 –0.07* 0.09** 0.02 0.16*** 0.26***

p-­value [0.838] [0.053] [0.046] [0.648] [0.001] [0.001]

CAHSEE reading All DiD –0.22*** –0.19*** –0.09** –0.21*** –0.02 –0.09**

p-­value [0.001] [0.001] [0.03] [0.001] [0.524] [0.044]

Black DiD –0.13** –0.54*** 0.01 –0.23*** 0.35*** 0.14***

p-­value [0.026] [0.001] [0.884] [0.001] [0.001] [0.002]

FRPL DiD –0.21*** –0.19*** –0.08** –0.21*** –0.01 –0.04

p-­value [0.001] [0.001] [0.036] [0.001] [0.791] [0.292]

Hispanic DiD –0.26*** –0.23*** –0.11*** –0.23*** –0.02 –0.03

p-­value [0.001] [0.001] [0.009] [0.001] [0.42] [0.491]

NOTE: DiD rows report the difference-in-differences (DiD) estimate of the initiative’s effect on each outcome in each year. For more details on the DiD methodology, see Gutierrez, Weinberger, and Engberg, 2016. A difference value of 0.000 indicates a difference of less than 0.0005. For the dropout, graduation, and UC-eligible rates, we used a logit model to estimate the predicted trends to account for the bounded range of these estimates. In 2014, California began implementing a new standardized test and did not publish results for its first year. CAHSEE is an HS exit exam that students must pass to graduate; it can be taken up to three times, beginning in the second semester of grade 10. *** = significant at p < 0.01. ** = significant at p < 0.05. * = significant at p < 0.10.