building on the basics: the impact of high-stakes testing on ...students’ performance on...
TRANSCRIPT
![Page 1: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/1.jpg)
C E N T E R F O R C I V I C I N N O V A T I O NA T T H E M A N H A T T A N I N S T I T U T E
C C i
Civ
iC R
epo
RtN
o. 5
4 Ju
ly 2
008
Publ
ishe
d by
Man
hatt
an In
stitu
te
Building on the Basics: the impact of
high-stakes testing on student Proficiency in low-
stakes subjects
Marcus A. Winters, Ph.D.Senior Fellow, Manhattan Institute
Jay P. Greene, Ph.D.Senior Fellow, Manhattan Institute
Julie R. Trivitt, Ph.D.Assistant Professor, Arkansas Tech University
![Page 2: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/2.jpg)
![Page 3: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/3.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
exeCutive SummaRy
School systems across the nation have adopted policies that reward or sanction particular schools on the basis of their students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding such “high-stakes testing” policies is that they oblige schools to focus on subjects for which they are held accountable but to neglect the rest. Many have worried that the limited focus of these policies could have an unintended negative effect on student proficiency in other subjects, such as science, that are important to the development of human capital and thus to future economic growth.
This paper uses a regression discontinuity design utilizing student-level data to evaluate the impact of sanctions under Florida’s high-stakes testing policy on student proficiency in science. Under that state’s A+ program, every public school receives a letter grade from A to F that is based primarily upon its students’ performance on the state’s standardized math and reading exams. Students in Florida were also administered a standardized exam in science, but this test was low-stakes because its results held no consequences under the A+ program or any other formal accountability policy.
Previous research has found that the rewards and sanctions of receiving an F grade in the prior year led to improved gains in student proficiency in the high-stakes subjects of math and reading. This current paper is the first to evaluate the impact of the incentives under this high-stakes testing system on student proficiency in science. This paper adds to a sparse previous literature quantitatively evaluating whether high-stakes testing policies have “crowded out” learning in a low-stakes subject.
The primary findings of the study are:
• The F-grade sanction produced after one year a gain in student science proficiency of about a 0.08 standard deviation. These gains are similar to those in reading and appear smaller than the gains in math that were due to the F sanction.
• There is some evidence to suggest that student science proficiency increased primarily because student learning in math and reading enabled that increase. That is, learning in math and reading appear to contribute to learning in science.
![Page 4: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/4.jpg)
Civ
ic R
epor
t 54
July 2008
about the authoRS
MaRcus a. WinteRs is a senior fellow at the Manhattan Institute. He has conducted studies of a variety of education
policy issues including high-stakes testing, performance pay for teachers, and the effects of vouchers on the public
school system. His research has been published in the journals Education Finance and Policy, Teachers College Record,
and Education Next. His op-ed articles have appeared in numerous newspapers, including the Wall Street Journal,
Washington Post, and USA Today. He received a B.A. in political science from Ohio University in 2002 and a Ph.D. in
economics from the University of Arkansas in 2008.
JaY P. gReene is endowed chair and head of the Department of Education Reform at the University of Arkansas and a senior fellow at the Manhattan Institute. He conducts research and writes about topics such as school choice, high school graduation rates, accountability, and special education.
Dr. Greene’s research was cited four times in the U.S. Supreme Court’s opinions in the landmark Zelman v. Simmons-Harris case on school vouchers. His articles have appeared in policy journals such as The Public Interest, City Journal, and Education Next; in academic journals such as The Georgetown Public Policy Review, Education and Urban Society, and The British Journal of Political Science; as well as in newspapers such as the Wall Street Journal and the Washington Post. He is the author of Education Myths (Rowman & Littlefield, 2005). Dr. Greene has been a professor of government at the University of Texas at Austin and the University of Houston. He received a B.A. in history from Tufts University in 1988 and a Ph.D. from the Department of Government at Harvard University in 1995.
Julie R. tRivitt is an assistant professor of economics at Arkansas Tech University. She earned her Ph.D. in economics from the University of Arkansas in December 2006. Her focus was health economics and applied econometrics. Since completing her Ph.D., she has worked as a senior research associate in the Department of Education Reform at the University of Arkansas and as a consultant on education projects to the World Trade Center in Brescia, Italy. Her research agenda focuses on human capital, health, labor, and education economics.
![Page 5: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/5.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
introduction
Florida’s a+ accountability Program
data and Method
the impact of the F-grade sanction on student Proficiency in high-
and low-stakes subjects
understanding the effects of the F-grade sanction on
science Proficiency
summary and discussion
endnotes
References
CONTENTS1
3
4
5
6
7
8
9
![Page 6: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/6.jpg)
Civ
ic R
epor
t 54
July 2008
![Page 7: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/7.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
INTrOduCTION
Schoolsystemsacrossthenationhaveadoptedpoliciesthatrewardorsanctionparticularschoolsonthebasisoftheirstudents’performanceonstandardizedtests.Suchtestinghas been a dominant force in education policy since at
leastthe1990s.Morethanhalfthestateshadalreadyimplementedsomeformofhigh-stakestestbeforetheNoChildLeftBehindAct(NCLB)madeituniversalin2002.Wecallatestahigh-stakestestwhentherearemeaningfulconsequencesforschoolsorstudentsthatarebasedonhowstudentsperformonthetest.
Oneofthemostfrequentlyraisedconcernsregardinghigh-stakestestingpoliciesisthattheyobligeschoolstofocusonsubjectsforwhichtheyareheldaccountablebuttoneglect therest(NicholsandBerliner2007;Gunzenhauser2003;Groves2002;Patterson2002;MurilloandFlores2002;McNeil2000;Jonesetal.1999).Thevastmajorityofthesepoliciesbasetheirrewardsorsanctionsexclusivelyontheresultsofreadingandmathtests.Thoughsomepoliciesaremoreexpansivethanothers,fewthreatenmeaningfulconsequenceswhenstudentsfailtomeetstandardsinsubjectssuchasscience,his-tory,orthearts.FailuretoassurestudentmasteryofsubjectsotherthanbasicmathandreadingcouldhaveimportantimplicationsforthefutureofhumancapitalintheUnitedStates.
Ifschoolsreallocatetimeandresourcesawayfromimportantbutlow-stakessubjectsandtowardthehigh-stakessubjects,withthe
Marcus A. Winters, Jay P. Greene & Julie r. Trivitt
1
building on the baSiCS: the impaCt of
high-StakeS teSting on Student pRofiCienCy in
low-StakeS SubjeCtS
![Page 8: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/8.jpg)
Civ
ic R
epor
t 54
July 2008
examsthatcouldskewresults(Jacob2005;JacobandLevitt2003).PreviousresearchinFloridafoundthattheresultsofthatstate’shigh-stakesexamshavenotbeensystematicallymanipulatedandaregenerallyreliableindicatorsofstudentproficiency(Greene,Winters,andForster2004;WestandPeterson2006).
Florida’s high-stakes testing program is also worthstudying because its accountability system, unlikethatofmanyotheraccountabilitysystems,makes itpossibletousearigorous“regressiondiscontinuity”design,whichallowsforacausaltestoftheimpactoftheprogram’ssanctions.Beginninginthe2001–02schoolyear,schoolsreceivedlettergradesreflectingpointsearnedunderanelaboratesystemforcaptur-ing several aspects of a school’s performance. Asdescribedbelow,Florida imposesmeaningful sanc-tionsonlywhenaschoolreceivesafailinggrade.WefollowthestrategyofapreviouspaperbyRouseetal.(2007)thatusesthechangeinthepolicytocontrolfortheheterogeneityofschoolsthatreceiveafailingorpassinggrade.
Wefind that students attending schools designatedasfailingintheprioryearmadegreatergainsonthestate’sscienceexamthantheywouldhavedoneiftheirschoolhadnotreceivedtheFsanction.Thegainsthatstudentsmadeinscienceweresimilartothosethatpreviousresearch(whichwereplicatehere)hasfoundthatstudentsmadeinthehigh-stakessubjectsofmathandreading.ThesefindingssuggestthattheincentivesofFlorida’shigh-stakestestingprogramhavenotledtosignificantcrowdingoutofstudentknowledgeinthelow-stakessubjectofscience.
Atfirst,ourresultsmayseemcounterintuitive,inthathigh-stakestestinginonlycertainsubjectswouldbeexpectedtoleadschoolstofocusonthoseareas.Infact,encouragingschools toshift theirpriorities to-wardsubjectscommonlyrecognizedasacademicallyimportant(i.e.,mathandreading)isarguablyoneofthepurposesofthepolicy.
Therearetworeasonsthathigh-stakestestingmightinsteadhaveapositiveeffectonstudentachievementinlow-stakessubjects.First,thepressureofaccount-abilitytestingcouldleadschoolstoadoptreformsthat
resultthatstudentsachievedinthehigh-stakessub-jectsattheexpenseofproficiencyinthelow-stakessubjects,wewouldsaythatthepolicy“crowdedout”learninginthelow-stakessubjects.Itisimportanttonotethatthisdefinitionofcrowdingoutfocusesonlearningoutput,notteachinginputs.Inotherwords,ifschoolsincreasedtimespentonmathorreadingbydecreasingtimespentonscience,wewouldconsiderhigh-stakestestingofmathorreadingtohavecrowdedoutscienceteachingonlyifstudentsactuallylearnedlessscienceasaresult.
A substantial amount of anecdotal and qualitativeevidence suggests that schools and teachers haveresponded to high-stakes testing by adjusting theirteachingstyles(McNeil2000;NewYorkStateEduca-tion Department 2004) and by shifting focus awayfromlow-stakessubjects(CenteronEducationPolicy2006;Jonesetal.1999;KingandMathers1997;Gordon2002;Groves2002;MurilloandFlores2002).However,thereiscurrentlyverylittleempiricalevidenceoftheimpact of high-stakes testing policies onmeasuredstudentproficiencyinsubjectsthatarenotpartoftheaccountabilitysystem.
In the only quantitative evaluation of this topic ofwhichweareaware,Jacob(2005)foundthatChicago’shigh-stakestestingsystemledtosignificantlearninggainsinthelow-stakessubjectsofscienceandsocialstudies. However, he found that these gains weresmallerthanthoseinthehigh-stakessubjectsofmathandreading.
Inthispaper,weaddtothelimitedpreviousresearchby evaluating the effects on student proficiency inthelow-stakessubjectofscienceandthehigh-stakessubjectsofmathandreadingofahigh-stakestestingsysteminFloridathatemployssanctions.TherearetwoimportantreasonstoresearchthisquestioninasystemotherthanChicago’s.First,byevaluatingtheimpactof sanctions under high-stakes testing on studentproficiencyinlow-stakessubjectsinanotherschoolsystem,wecanhelpdeterminewhethertheresultsinChicagoarelimitedtothatareaorholdmoregener-ally.Second,itisimportanttoinvestigateoutcomesin another system, since some research has foundsystematic manipulations of Chicago’s high-stakes
2
![Page 9: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/9.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
3
improvetheiroverallquality.Forexample,aschoolcouldmoreeffectivelymotivateitsstudents,oritcouldimproverelationswithitsteachers.Thoughschools’purposemaybetoimprovestudentscoresinmathandreadingtoavoidthesanctionsofthehigh-stakestesting policy, general improvements of this kindmightproduceacross-the-boardincreasesinstudentachievement.Second,sanctionsunderhigh-stakestest-ingcouldimprovestudentachievementinlow-stakessubjectsiftheresultingmasteryofhigh-stakessubjectsfacilitatesmasteryofothersubjects.
Thoughatruetestoftheprevalenceofeitherofthesekindsofexplanationsisnotavailabletous,wehavediscovered evidence suggesting that student profi-ciencyinsciencehasincreasedunderthehigh-stakessanctions primarily because the improvements thatstudentshavemade inmath and readinghave en-hancedtheirabilitytolearnsciencematerialaswell.However,westressthatfutureresearchusingstrongerstrategiesthanareavailableheretoexplainapositiverelationshipbetweenhigh-stakestestingandstudentimprovementinlow-stakessubjectsisnecessary.
FlOrIdA’S A+ ACCOuNTAbIlITy PrOGrAM
Florida is among the nation’s leaders in high-stakes testing. Most agree that the state’s A+AccountabilityProgram(A+)isoneofthemost
aggressiveprogramsofitskind.Itwasclearlyatem-plateforthefederalNCLBlaw.
Eachyear,thestateadministersastandardizedtest,theFloridaComprehensiveAssessmentTest(FCAT),inmathandreadingtoallpublicschoolstudentsinthestatewhoareenrolledingrades3–10.Schoolsreceivelettergrades,fromAtoF,basedontheper-centageoftheirstudentsmeetingparticularachieve-mentlevelsandtheacademicprogressofstudentsincertainsubgroups.
Therearetwoimportantreasonsthatwemightexpectschoolsdeemedtobefailingtorespondpositively.ThosethathavereceivedanFgradeforthefirsttimemaybeshamedintoimprovingtheirperformance(Fi-
glioandRouse2005;Ladd2001;Carnoy2001;Harris2001).Thosethathavereceivedat leastonefailinggrademaydecidetoraisetheirperformancebecausetheyfearattritionoftheirstudentbody.ThismayoccurastheresultofapolicyofissuingOpportunitySchol-arships (vouchers) to students in schools thathavereceivedtwofailinggradeswithinafour-yearperiodthattheycanusetoattendanotherpublicschooloraprivateschoolwillingtoacceptthevoucherasafulltuitionpayment.1Inthispaper,wearenotparticularlyconcernedwithwhethertheseoranyotherphenom-enadriveincreasesinstudentperformanceineitherhigh-orlow-stakessubjects.
AchangeintheadministrationoftheprogramprovidedaninterestingavenueforresearchingFlorida’spolicy.Intheprogram’sinitialyears,schoolgradeswerebasedon the percentage of students earning level 2 (thesecond-lowestoffivelevels)oraboveontheread-ing,math,andwritingportionsoftheFCATandthepercentageofeligiblestudentstested.AschoolcouldavoidearninganFifatleast50%oftestedstudentsscoredatachievementlevel3inwriting,orif60%oftestedstudentsscoredatlevel2inreadingormathand90%ofeligiblestudentsweretested.Ifaschoolmetoneortwoofthesecriteria,itearnedaD.Ifitmetallthreeofthesecriteria,itearnedaC.SchoolswithparticularsubpopulationsmeetingallthreereceivedaB.ToearnanA,schoolshadtomeetmorestringentrequirementsfortheoverallstudentpopulationandeach subpopulation. The opinion was widespreadthatschoolshaddeterminedthatsatisfactoryscoresinwritingweretheeasiesttoachieveundertheoriginalschool-gradingformatandthattheteachingofwrit-inginstrugglingschoolsthereforestressedtechniquesgearedtothewritingportionoftheexam.
Startinginthe2001–02schoolyear,Floridaadoptedan accumulating point system to evaluate schools.Schoolsearnonepointforeachpercentofstudentswhoscoreinachievementlevels3,4,or5(thethreehighestoffive levels) inreadingandonepoint foreachpercentofstudentswhoscoreinlevels3,4,or5inmath.Schoolsearnonepointforeachpercentofstudentsscoring3.5oraboveinwriting,whichisgradedfrom1to6.Schoolsearnonepointforeachpercentofstudentswhomakelearninggainsinread-
![Page 10: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/10.jpg)
Civ
ic R
epor
t 54
July 2008
ingandonepointforeachpercentofstudentswhoreachahigherachievementlevelormaintaina3,4,or5 inmath.Schoolsalsoearnonepoint foreachpercentofthelowest-performingreaderswhomaketest-score improvements in the year in question.Aschool that earns fewer than 280points receives afailinggrade.Themultifariousnatureofthegradingprocesshasprobablymadedirectmanipulationofthesystemrelativelydifficult.
Beginninginthe2002–03schoolyear,Floridapublicschoolsalsowererequiredtotestforproficiencyinsciencewhentheyadministeredthemathandread-ingexams.ThesciencepartoftheFCATiscurrentlyadministeredtoallpublicschoolstudentsingrades5,8,and11.Theresultsofthescienceexamhavenowbeenincorporateddirectlyintotheaccountabilitypro-gram;butduringtheyearsofouranalysis,theyhadnoeffectontheschool’sgrade,nordidtheyrepresentanyotherformofofficialaccountability.
SeveralresearchershaveevaluatedtheimpactoftheA+programontheacademicgainsofpublicschoolstudents in math and reading (Rouse et al. 2007;GreeneandWinters2004;Chakrabarti2005;FiglioandRouse2005;WestandPeterson2006;Greene2001).Though there is some disagreement about whichaspectoftheaccountabilitypolicywaseffective(thethreatofvouchersortheshameofanFgrade),eachoftheseanalysesfoundthatthepolicyimprovedthemathandreadingproficiencyofstudentsinpublicschoolsdesignatedas failing.WeareawareofnopreviousresearchanalyzingtheimpactoftheA+programonsciencetestscores.
dATA ANd METhOd2
WeutilizeadatasetprovidedbytheFloridaDepartment of Education that containstestscoresinmath,reading,andscience
as well as demographic characteristics of the uni-verseofstudentsenrolledingrades3–10inFloridapublicschools.Wesupplement the individual-leveldatasetwithschool-levelinformation—specifically,theschool’spointtotalandlettergradeunderA+at
4
theendofthe2001–02schoolyear.Tosimplifythecomparisonofscoresindifferentsubjects,weconverttheFCATscoresofstudentswhowereinoursampleinto a scale scorewith amean of 0 and standarddeviationof1.
Inordertoalignourfindingswiththoseinthepre-vious literature, we utilize the comparison strategyimplementedina2007studyconductedbyRouseetal.thatevaluatedtheimpactofFlorida’sA+policyonstudentachievementinmathandreading.OursampleconsistsoftheuniverseofFloridapublicschoolstu-dentswhowereenrolledinthefifthgradein2002–03andwerepromotedattheendoftheprioryear.Thiswas the first class of fifth-grade students attendingaschool thathadreceiveda lettergradeunder therevisedpointsystemoftheA+policy.Wefocusononlythosestudentswithbothamathandreadingtestscorereportedin2001–02and2002–03.
Wesupplementtheindividual-leveldatawithadmin-istrativeinformationontheschool’sgradeandpointsearnedunder theA+systemduring thesummerof2002.Intheanalysesthatfollow,alongwithobservablecharacteristicsofthestudentandschoolwecontrolforboththeschool’slettergradeattheendofthe2001–02yearand the totalpointsearnedunder thegradingsystem.Theideahereisthatcontrollingforthepointsearnedbytheschoolaccountsfordifferencesinschoolperformance,andthustheremainingdifferences intheperformanceofstudentsatschoolsreceivinganFgrademustreflectresponsestotheincentivesthatexistundertheaccountabilitypolicy.
Weusethisgeneralcomparisonstrategytoperforma series of cross-sectional regressions. We are firstconcernedwithdiscoveringwhetherstudentsmadeacademicgainsinscienceduetotheFsanction,andwealsoconfirmthefindingofanimpactofthesanc-tiononstudentproficiencyinmathandreading.WethenevaluatetheextenttowhichanygainsmadebystudentsinscienceduetotheFsanctionweredrivenbyimprovementsintheoverallperformanceoftheschoolorasymbioticrelationshipbetweenlearninginthehigh-andlow-stakessubjects.
![Page 11: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/11.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
5
ThE IMPACT OF ThE F-GrAdE SANCTION ON STudENT PrOFICIENCy IN hIGh- ANd lOW-STAkES SubJECTS
WeadoptthestrategyofRouseetal.tomea-suretheimpactoftheF-gradesanctiononstudentproficiencyinscience.Asacheck
onourprocedure,wealsoattempttoreplicateinmathandreadingtheresultsofthispreviouspaper.
To evaluate the impact of the F-grade sanction onstudent proficiency, we estimated cross-sectionalregressionmodelsusing thestudent’s test scoreonthe fifth-grade test in 2002–03 in the subject beingevaluatedasthedependentvariable.Theregressioncontrols for a variety of observable characteristicsaboutthestudentandschool,includingthelettergradeandcubicfunction3ofthenumberofpointsearnedbytheschoolattheendofthe2001–02schoolyear.Thevariableofinterestisthatwhichindicateswhetherthechild’sschoolreceivedanFgradeintheprioryear.
In themathandreadinganalyses,wecontrol foracubicfunctionofthestudent’stestscoreinthatsub-jectattheendofthepreviousyear(whenthestudentwasinthefourthgrade),whichallowsustomeasureimprovementsinthestudent’smathandreadingprofi-ciencyandaccountforunobserveddifferencesamongstudents.Unfortunately,studentsdidnottakeascienceexaminthefourthgrade,soasimilarcontrolisnotavailableforthescienceevaluation.Instead,weusethecubicfunctionsofthestudents’fourth-gradescoresinmathandreadingtosubstitutefortheirscoresinsci-ence.Thisprocedureassumesthatstudentproficiencyin thesesubjects ishighlycorrelatedandthat therewasnodifferentialrelationshipinstudentknowledgeamongthesesubjectsinthefivecategoriesofschoolsbeforetheF-gradesanctionwasintroduced.
Theresultsofourestimationsofstudentproficiencyinmath,reading,andsciencearereportedinTable1.OurfindingsinmathandreadingareverysimilartothosereportedbyRouseetal.(2007).Ourestimation
dependent var: Reading Math science
Coef. t Coef. t Coef. t
Prior Reading 0.759 238.220 *** 0.539 133.430 ***
Prior Reading ^ 2 -0.009 -6.490 *** -0.008 -5.120 ***
Prior Reading ^ 3 -0.024 -40.290 *** -0.016 -23.400 ***
Prior Math 0.806 214.900 *** 0.329 79.810 ***
Prior Math ^ 2 -0.004 -2.490 ** 0.015 10.480 ***
Prior Math ^ 3 -0.029 -42.980 *** -0.009 -13.550 ***
a -0.003 -0.100 0.003 0.060 0.023 0.510
B -0.005 -0.180 0.005 0.110 0.006 0.160
c 0.006 0.320 0.002 0.090 0.003 0.120
F 0.086 2.940 *** 0.175 3.840 *** 0.087 2.160 **
R-squared 0.6949 0.6871 0.6588
n 152,003 152,003 151,604
Table 1. regressions evaluating the impact of F-grade sanction on student proficiency and gains in high- and low-stakes subjects
Estimated with OLS with robust standard errors clustered by school.Models additionally control for year, limited English-proficiency status, free or reduced-price lunch status, race, gender, disability classification, predicted score in summer of 2002 if the old grading system is kept, a cubic function for the number of points school earned in summer of 2002.
* Statistically significant at 10% level ** Statistically significant at 5% level *** Statistically significant at 1% level
![Page 12: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/12.jpg)
Civ
ic R
epor
t 54
July 2008
suggeststhatthescoresofstudentsenrolledinanF-gradedschoolexceededby0.09standarddeviationsinreadingand0.17standarddeviationsinmaththescoresofstudentsinD-gradedschools.Atthesametime,therewasnostatisticallysignificantdifferenceintheperformanceofA-,B-,C-,andD-gradedschools,strengthening the view that sanctions have an ef-fectonperformance.ThesimilarityofourresultstothosereportedbyRouseetal.suggeststhatwehavereproduced their procedure relatively well, whichshouldprovideadditionalconfidenceinourfindingsforscience.
Column3ofTable1 reports the results inscience.HerewefindthattheF-gradesanctionproducedaf-teroneyearagaininstudentproficiencyofabouta0.08standarddeviationrelativetostudentsinschoolsthatearnedaDgrade.Theresultissignificantatthe5%confidencelevel,meaningthatwecanbehighlyconfidentthattheFsanctionhadapositiveimpactonstudents’sciencetestscores.
ThesefindingssuggestthattheF-gradesanctionnotonly improved student learning in the high-stakessubjects;italsohadapositiveeffectonstudentprofi-ciencyinthelow-stakessubjectofscience.ItappearsthatthepositiveimpactoftheFsanctiononscienceproficiencywassimilartothatfoundinreadingbutsomewhatlowerthanthatfoundinmath.
uNdErSTANdING ThE EFFECTS OF ThE F-GrAdE SANCTION ON SCIENCE PrOFICIENCy
Ourfindingthat theF-gradesanction ledtosubstantial improvements in scienceprofi-ciencyseemsoddatfirstglance.Itisclear
thatunderFlorida’spolicy,point-maximizingpublicschoolshaveanincentivetofocusonthehigh-stakessubjects,evenifdoingsoistothedetrimentofthelow-stakessubjects.Qualitativeresearchandanecdotalevidencesuggest,moreover,thatthereallocationofresourcesprecipitatedbyhigh-stakestestinghascur-tailedgeneralstudentknowledge.
Therearetwopossiblewaysofexplaininghowhigh-stakestestingcouldincreaseperformanceinlow-stakessubjects.Oneisthatgainsinonesubjectmayfacilitatemasteryofanother.Wecallthisthe“correlationeffect.”Anotheristhatimplementationofhigh-stakestestingcouldleadtotheadoptionofpoliciesandattitudesthatimproveperformancegenerally.Forexample,high-stakestestingcouldleadschoolstoexpectimprovedstudentachievementacrosstheboard,tobeshamedintoimprovingtheiroverallperformance,torecognizeexcellenceinothersubjects,andsoon.Rouseetal.findthatschoolsrespondedtoreceivingtheF-gradesanctioninavarietyofways,includinglengtheningschoolperiods(blockscheduling)andincreasingtimeforcollaborativeplanningandclasspreparation.Suchchangesintheoverallschoolenvironmentcouldaffecttheteachingofscienceasmuchastheydotheteach-ingofmathorreading.Werefertothispossibilityasthe“systemiceffect.”
Unfortunately,wecannotproduceatruecausalesti-mationoftheprevalenceofsystemicandcorrelationeffects.Wecan,however,producesomeevidencesug-gestingtherelativeimportanceofeachbyanalyzingaregressionofstudentproficiencyinscience,controllingforobservablecharacteristicsusedinthepreviousre-gression,andincludingacontrolforthetest-scoregainthatthestudentmadeinmathandreadingfromthe2001–02tothe2002–03schoolyear.Heretheestimateof thevariablesfor thestudent’sgains inmathandreadingmeasurestheircontributiontogainsinscience;asexplainedearlier,thisisthecorrelationeffect.Thevariableforthegradeearnedbythestudent’sschoolin2001–02measures thegrade’s impactonscienceproficiency independently of the correlation effect;thisiswhatwecallthe“systemiceffect.”
TheresultsofthisestimationarefoundinTable2.Wefindastronglypositiverelationshipbetweensciencescoresandgainsinmathandreading,indicatingthelikelyexistenceofacorrelationeffect.Thevariablerepresenting the independent effect of the F-gradesanctionongainsinscienceisquitesmallandstatis-ticallyinsignificant,indicatingthelackofasystemiceffect.TheseresultssuggestthattheentiregainfoundinscienceduetotheF-gradesanctionislikelyduetothecorrelationeffect.
6
![Page 13: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/13.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
7
Table 2. Estimating systemic and correlation effectdependent var: science
Coef. t
Prior Reading 0.644 167.060 ***
Prior Reading ^ 2 0.003 2.190 **
Prior Reading ^ 3 -0.008 -11.840 ***
Prior Math 0.343 81.770 ***
Prior Math ^ 2 0.010 7.050 ***
Prior Math ^ 3 -0.001 -1.770 *
Read Gain 0.424 104.120 ***
Read Gain ^ 2 -0.011 -3.310 ***
Read Gain ^ 3 -0.003 -1.820 *
Math Gain 0.281 60.540 ***
Math Gain ^ 2 0.010 3.270 ***
Math Gain ^ 3 -0.004 -3.080 ***
A 0.023 0.670
B 0.009 0.290
C 0.002 0.080
F 0.001 0.020
R-Squared 0.7371
N 150,458
Estimated with OLS with robust standard errors clustered by school.Dependent variable is the student’s test score on the fifth-grade science exam. Models additionally control for year, limited English- proficiency status, free or reduced-price lunch status, race, gender, disability classification, predicted score in summer of 2002 if the old grading system is kept, a cubic function for the number of points school earned in summer of 2002.
* Statistically significant at 10% level** Statistically significant at 5% level*** Statistically significant at 1% level
SuMMAry ANd dISCuSSION
Inthispaper,wehaveevaluatedwhethertheF-grade sanction inFlorida’sA+programhas ledschoolstoincreasestudentlearninginthehigh-
stakessubjectsofmathandreadingtothedetrimentoflearningintheimportantbutlow-stakessubjectofscience.OurresultsindicatethattheF-gradesanctionledtosubstantialstudentgainsinthelearningofmath,reading,andscience.Finally,weproducedasimplemodeltoexplaintheimpactofhigh-stakestestingonstudent learning in low-stakessubjects.Weprovidesomeevidencesuggestingthatvirtuallyallthepositivefindingsinscienceareattributabletocomplementari-tiesinthelearningofmathandreading.
Itcouldbesaidthatstudentperformanceinscienceisnot themostauthoritative testof thepropositionthathigh-stakestestingcrowdsoutinstructioninothersubjects,sincescienceproficiencymaybemorede-pendentonreadingandmathproficiencythanothersubjectsare.Ineffect,suchcriticismwouldbeassum-ingthevalidityofthecorrelationeffect.
BecausestudentsinFloridadonottakestandardizedtestsinotherlow-stakessubjects,weareunabletotestthishypothesis.Itshouldnotbeforgotten,however,thatmuchofthediscussionofthecrowding-outeffectfocusesonitsimpactonsciencelearning.Neverthe-less,welookforwardtofutureworkevaluatingtheimpactofhigh-stakestestingonstudentlearninginlow-stakessubjectsotherthanscience.
![Page 14: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/14.jpg)
Civ
ic R
epor
t 54
July 2008
endnoteS
1. The voucher provision of this policy was recently overturned by the state’s supreme court, though it was in ef-
fect during all the years in which this study takes place.
2. A more technical treatment of the methods utilized in this paper is available online at
http://www.manhattan-institute.org/pdf/cr_54_tech_version.pdf.
3. This simply means that we included variables for the points, the points squared, and the points cubed. Use of
the cubic function allows for a more flexible model because it relaxes the assumption of linearity in measuring the
impact of school points on student proficiency. That is, only controlling for the number of points earned by the
school makes the strong assumption that every point has the same impact on science proficiency. That is, it would
assume that the impact on a student’s science proficiency of a school’s raising its score from 100 points to 110
points was identical to the impact of a school’s raising its score from 200 to 210 points. The cubic function allows
us to account for nonlinearities in this relationship. The same basic argument also holds for the control for a cubic
function of the child’s prior test scores, which are also discussed and used in this analysis.
8
![Page 15: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/15.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
RefeRenCeS
Carnoy, Martin. 2001. “Do School Vouchers Improve Student Performance?” American Prospect 12, no. 1 (special
report, January): 42–45.
Center on Education Policy (Washington, D.C.). 2006. “From the Capital [sic] to the Classroom Year 4 of the No Child
Left Behind Act.” Manuscript. Center. Washington, D.C.
Chakrabarti, R. 2005. “Do Public Schools Facing Vouchers Behave Strategically?: Evidence from Florida.” Manuscript.
Program on Education Policy and Governance.
Figlio, David N., and Cecilia Rouse. 2005. “Do Accountability and Voucher Threats Improve Low-Performing Schools?”
Journal of Public Economics 90 (January): 239–55.
Gordon, Jenny. 2002. “From Broadway to the ABCs: Making Meaning of Arts Reform in the Age of Accountability.”
Educational Foundations 16, no. 2 (spring): 33–53.
Greene, Jay P. 2001. “An Evaluation of the Florida A-Plus Accountability and School Choice Program.” Manuscript.
Manhattan Institute.
Greene, Jay P., and Marcus A. Winters. 2004. “Competition Passes the Test.” Education Next 4, no. 3: 66–71.
Greene, Jay P., Marcus A. Winters, and Greg Forster. 2004. “Testing High-Stakes Tests: Can We Believe the Results of
Accountability Tests?” Teachers College Record 106, no. 6: 1124–44.
Groves, Paula. 2002. “ ‘Doesn’t It Feel Morbid Here?’: High-Stakes Testing and the Widening of the Equity Gap.”
Educational Foundations 16, no. 2 (spring): 15–31.
Gunzenhauser, Michael G. 2003. “High-Stakes Testing and the Default Philosophy of Education.” Theory into Practice
42, no. 1 (winter): 51–58.
Harris, Doug. 2001. “What Caused the Effects of the Florida A+ Program: Ratings or Vouchers?,” in School Vouchers:
Examining the Evidence, ed. Martin Carnoy. Economic Policy Institute.
Jacob, Brian A. 2005. “Accountability, Incentives and Behavior: The Impact of High-Stakes Testing in the Chicago Public
Schools.” Journal of Public Economics 89, nos. 5–6 (June): 761–96.
Jacob, Brian A., and Steven D. Levitt. 2003. “Rotten Apples: An Investigation of the Prevalence and Predictors of
Teacher Cheating.” Quarterly Journal of Economics 118, no. 3 (August): 843–77.
Jones, M. Gail, Brett D. Jones, Belinda Hardin, and Lisa Chapman. 1999. “The Impact of High-Stakes Testing on
Teachers and Students in North Carolina.” Phi Delta Kappan 81, no. 3 (November): 199.
9
![Page 16: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/16.jpg)
Civ
ic R
epor
t 54
July 2008
10
King, Richard A., and Judith K. Mathers. 1997. “Improving Schools through Performance-Based Accountability and
Financial Rewards.” Journal of Education Finance 23, no. 2 (fall): 147–76.
Ladd, Helen F. 2001. “School-Based Educational Accountability Systems: The Promise and the Pitfalls.” National Tax
Journal 54, no. 2 (June): 385–400.
McNeil, Linda M. 2000. “Creating New Inequalities: Contradictions of Reform.” Phi Delta Kappan 81, no. 10
(June): 728–34.
Murillo, Enrique G., and Susana Y. Flores. 2002. “Reform by Shame: Managing the Stigma of Labels in High-Stakes
Testing.” Educational Foundations 16, no. 2 (spring): 93–108.
New York State Education Department. 2004. “The Impact of High-Stakes Exams on Students and Teachers.”
NYSED Policy Brief.
Nichols, Sharon Lynn, and David C. Berliner. 2007. Collateral Damage: How High-Stakes Testing Corrupts America’s
Schools. Cambridge, Mass.: Harvard Education Press.
Patterson, Jean A. 2002. “Exploring Reform as Symbolism and Expression of Belief.” Educational Foundations 16,
no. 2 (spring): 55–75.
Rouse, C. E., J. Hannaway, D. Goldhaber, and D. Figlio. 2007. “Feeling the Florida Heat?: How Low-Performing
Schools Respond to Voucher and Accountability Pressure.” National Center for Analysis of Longitudinal Data in
Education Research, Working Paper 13.
West, Martin R., and Paul E. Peterson. 2006. “The Efficacy of Choice Threats within School Accountability Systems:
Results from Legislatively Induced Experiments.” Economic Journal 116, no. 510 (March): C46–62.
![Page 17: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/17.jpg)
Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects
noteS
![Page 18: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/18.jpg)
Civ
ic R
epor
t 54
July 2008
noteS
![Page 19: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/19.jpg)
![Page 20: Building on the Basics: the impact of high-stakes testing on ...students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding](https://reader034.vdocuments.net/reader034/viewer/2022052020/6034320034a66107123e1c7b/html5/thumbnails/20.jpg)
The Center for Civic Innovation’s (CCI) mandate is to improve the quality of life in cities by
shaping public policy and enriching public discourse on urban issues. The Center sponsors
studies and conferences on issues including education reform, welfare reform, crime reduction,
fiscal responsibility, immigration, counter-terrorism policy, housing and development, and
prisoner reentry. CCI believes that good government alone cannot guarantee civic health,
and that cities thrive only when power and responsibility devolve to the people closest to
any problem, whether they are concerned parents, community leaders, or local police.
www.manhattan-institute.org/cci
The Manhattan Institute is a 501(C)(3) nonprofit organization. Contributions are tax-
deductible to the fullest extent of the law. EIN #13-2912529
CenteR foR CiviC innovation
Stephen Goldsmith, Advisory Board Chairman Emeritus
Howard Husock, Vice President, Policy Research
fellowS
Edward Glaeser Jay P. Greene
George L. Kelling Edmund J. McMahon
Peter Salins Fred Siegel
Marcus A. Winters