building on the basics: the impact of high-stakes testing on ...students’ performance on...

C E N T E R F O R C I V I C I N N O V A T I O NA T T H E M A N H A T T A N I N S T I T U T E

C C i

Civ

iC R

epo

RtN

o. 5

4 Ju

ly 2

008

Publ

ishe

d by

Man

hatt

an In

stitu

te

Building on the Basics: the impact of

high-stakes testing on student Proficiency in low-

stakes subjects

Marcus A. Winters, Ph.D.Senior Fellow, Manhattan Institute

Jay P. Greene, Ph.D.Senior Fellow, Manhattan Institute

Julie R. Trivitt, Ph.D.Assistant Professor, Arkansas Tech University

Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects

exeCutive SummaRy

School systems across the nation have adopted policies that reward or sanction particular schools on the basis of their students’ performance on standardized math and reading tests. One of the most frequently raised concerns regarding such “high-stakes testing” policies is that they oblige schools to focus on subjects for which they are held accountable but to neglect the rest. Many have worried that the limited focus of these policies could have an unintended negative effect on student proficiency in other subjects, such as science, that are important to the development of human capital and thus to future economic growth.

This paper uses a regression discontinuity design utilizing student-level data to evaluate the impact of sanctions under Florida’s high-stakes testing policy on student proficiency in science. Under that state’s A+ program, every public school receives a letter grade from A to F that is based primarily upon its students’ performance on the state’s standardized math and reading exams. Students in Florida were also administered a standardized exam in science, but this test was low-stakes because its results held no consequences under the A+ program or any other formal accountability policy.

Previous research has found that the rewards and sanctions of receiving an F grade in the prior year led to improved gains in student proficiency in the high-stakes subjects of math and reading. This current paper is the first to evaluate the impact of the incentives under this high-stakes testing system on student proficiency in science. This paper adds to a sparse previous literature quantitatively evaluating whether high-stakes testing policies have “crowded out” learning in a low-stakes subject.

The primary findings of the study are:

• The F-grade sanction produced after one year a gain in student science proficiency of about a 0.08 standard deviation. These gains are similar to those in reading and appear smaller than the gains in math that were due to the F sanction.

• There is some evidence to suggest that student science proficiency increased primarily because student learning in math and reading enabled that increase. That is, learning in math and reading appear to contribute to learning in science.

Civ

ic R

epor

t 54

July 2008

about the authoRS

MaRcus a. WinteRs is a senior fellow at the Manhattan Institute. He has conducted studies of a variety of education

policy issues including high-stakes testing, performance pay for teachers, and the effects of vouchers on the public

school system. His research has been published in the journals Education Finance and Policy, Teachers College Record,

and Education Next. His op-ed articles have appeared in numerous newspapers, including the Wall Street Journal,

Washington Post, and USA Today. He received a B.A. in political science from Ohio University in 2002 and a Ph.D. in

economics from the University of Arkansas in 2008.

JaY P. gReene is endowed chair and head of the Department of Education Reform at the University of Arkansas and a senior fellow at the Manhattan Institute. He conducts research and writes about topics such as school choice, high school graduation rates, accountability, and special education.

Dr. Greene’s research was cited four times in the U.S. Supreme Court’s opinions in the landmark Zelman v. Simmons-Harris case on school vouchers. His articles have appeared in policy journals such as The Public Interest, City Journal, and Education Next; in academic journals such as The Georgetown Public Policy Review, Education and Urban Society, and The British Journal of Political Science; as well as in newspapers such as the Wall Street Journal and the Washington Post. He is the author of Education Myths (Rowman & Littlefield, 2005). Dr. Greene has been a professor of government at the University of Texas at Austin and the University of Houston. He received a B.A. in history from Tufts University in 1988 and a Ph.D. from the Department of Government at Harvard University in 1995.

Julie R. tRivitt is an assistant professor of economics at Arkansas Tech University. She earned her Ph.D. in economics from the University of Arkansas in December 2006. Her focus was health economics and applied econometrics. Since completing her Ph.D., she has worked as a senior research associate in the Department of Education Reform at the University of Arkansas and as a consultant on education projects to the World Trade Center in Brescia, Italy. Her research agenda focuses on human capital, health, labor, and education economics.


introduction

Florida’s a+ accountability Program

data and Method

the impact of the F-grade sanction on student Proficiency in high-

and low-stakes subjects

understanding the effects of the F-grade sanction on

science Proficiency

summary and discussion

endnotes

References

CONTENTS1

3

4

5

6

7

8

9

Civ

ic R

epor

t 54

July 2008


INTrOduCTION

Schoolsystemsacrossthenationhaveadoptedpoliciesthatrewardorsanctionparticularschoolsonthebasisoftheirstudents’performanceonstandardizedtests.Suchtestinghas been a dominant force in education policy since at

leastthe1990s.Morethanhalfthestateshadalreadyimplementedsomeformofhigh-stakestestbeforetheNoChildLeftBehindAct(NCLB)madeituniversalin2002.Wecallatestahigh-stakestestwhentherearemeaningfulconsequencesforschoolsorstudentsthatarebasedonhowstudentsperformonthetest.

Oneofthemostfrequentlyraisedconcernsregardinghigh-stakestestingpoliciesisthattheyobligeschoolstofocusonsubjectsforwhichtheyareheldaccountablebuttoneglect therest(NicholsandBerliner2007;Gunzenhauser2003;Groves2002;Patterson2002;MurilloandFlores2002;McNeil2000;Jonesetal.1999).Thevastmajorityofthesepoliciesbasetheirrewardsorsanctionsexclusivelyontheresultsofreadingandmathtests.Thoughsomepoliciesaremoreexpansivethanothers,fewthreatenmeaningfulconsequenceswhenstudentsfailtomeetstandardsinsubjectssuchasscience,his-tory,orthearts.FailuretoassurestudentmasteryofsubjectsotherthanbasicmathandreadingcouldhaveimportantimplicationsforthefutureofhumancapitalintheUnitedStates.

Ifschoolsreallocatetimeandresourcesawayfromimportantbutlow-stakessubjectsandtowardthehigh-stakessubjects,withthe

Marcus A. Winters, Jay P. Greene & Julie r. Trivitt

1

building on the baSiCS: the impaCt of

high-StakeS teSting on Student pRofiCienCy in

low-StakeS SubjeCtS

Civ

ic R

epor

t 54

July 2008

examsthatcouldskewresults(Jacob2005;JacobandLevitt2003).PreviousresearchinFloridafoundthattheresultsofthatstate’shigh-stakesexamshavenotbeensystematicallymanipulatedandaregenerallyreliableindicatorsofstudentproficiency(Greene,Winters,andForster2004;WestandPeterson2006).

Florida’s high-stakes testing program is also worthstudying because its accountability system, unlikethatofmanyotheraccountabilitysystems,makes itpossibletousearigorous“regressiondiscontinuity”design,whichallowsforacausaltestoftheimpactoftheprogram’ssanctions.Beginninginthe2001–02schoolyear,schoolsreceivedlettergradesreflectingpointsearnedunderanelaboratesystemforcaptur-ing several aspects of a school’s performance. Asdescribedbelow,Florida imposesmeaningful sanc-tionsonlywhenaschoolreceivesafailinggrade.WefollowthestrategyofapreviouspaperbyRouseetal.(2007)thatusesthechangeinthepolicytocontrolfortheheterogeneityofschoolsthatreceiveafailingorpassinggrade.

Wefind that students attending schools designatedasfailingintheprioryearmadegreatergainsonthestate’sscienceexamthantheywouldhavedoneiftheirschoolhadnotreceivedtheFsanction.Thegainsthatstudentsmadeinscienceweresimilartothosethatpreviousresearch(whichwereplicatehere)hasfoundthatstudentsmadeinthehigh-stakessubjectsofmathandreading.ThesefindingssuggestthattheincentivesofFlorida’shigh-stakestestingprogramhavenotledtosignificantcrowdingoutofstudentknowledgeinthelow-stakessubjectofscience.

Atfirst,ourresultsmayseemcounterintuitive,inthathigh-stakestestinginonlycertainsubjectswouldbeexpectedtoleadschoolstofocusonthoseareas.Infact,encouragingschools toshift theirpriorities to-wardsubjectscommonlyrecognizedasacademicallyimportant(i.e.,mathandreading)isarguablyoneofthepurposesofthepolicy.

Therearetworeasonsthathigh-stakestestingmightinsteadhaveapositiveeffectonstudentachievementinlow-stakessubjects.First,thepressureofaccount-abilitytestingcouldleadschoolstoadoptreformsthat

resultthatstudentsachievedinthehigh-stakessub-jectsattheexpenseofproficiencyinthelow-stakessubjects,wewouldsaythatthepolicy“crowdedout”learninginthelow-stakessubjects.Itisimportanttonotethatthisdefinitionofcrowdingoutfocusesonlearningoutput,notteachinginputs.Inotherwords,ifschoolsincreasedtimespentonmathorreadingbydecreasingtimespentonscience,wewouldconsiderhigh-stakestestingofmathorreadingtohavecrowdedoutscienceteachingonlyifstudentsactuallylearnedlessscienceasaresult.

A substantial amount of anecdotal and qualitativeevidence suggests that schools and teachers haveresponded to high-stakes testing by adjusting theirteachingstyles(McNeil2000;NewYorkStateEduca-tion Department 2004) and by shifting focus awayfromlow-stakessubjects(CenteronEducationPolicy2006;Jonesetal.1999;KingandMathers1997;Gordon2002;Groves2002;MurilloandFlores2002).However,thereiscurrentlyverylittleempiricalevidenceoftheimpact of high-stakes testing policies onmeasuredstudentproficiencyinsubjectsthatarenotpartoftheaccountabilitysystem.

In the only quantitative evaluation of this topic ofwhichweareaware,Jacob(2005)foundthatChicago’shigh-stakestestingsystemledtosignificantlearninggainsinthelow-stakessubjectsofscienceandsocialstudies. However, he found that these gains weresmallerthanthoseinthehigh-stakessubjectsofmathandreading.

Inthispaper,weaddtothelimitedpreviousresearchby evaluating the effects on student proficiency inthelow-stakessubjectofscienceandthehigh-stakessubjectsofmathandreadingofahigh-stakestestingsysteminFloridathatemployssanctions.TherearetwoimportantreasonstoresearchthisquestioninasystemotherthanChicago’s.First,byevaluatingtheimpactof sanctions under high-stakes testing on studentproficiencyinlow-stakessubjectsinanotherschoolsystem,wecanhelpdeterminewhethertheresultsinChicagoarelimitedtothatareaorholdmoregener-ally.Second,itisimportanttoinvestigateoutcomesin another system, since some research has foundsystematic manipulations of Chicago’s high-stakes

2


3

improvetheiroverallquality.Forexample,aschoolcouldmoreeffectivelymotivateitsstudents,oritcouldimproverelationswithitsteachers.Thoughschools’purposemaybetoimprovestudentscoresinmathandreadingtoavoidthesanctionsofthehigh-stakestesting policy, general improvements of this kindmightproduceacross-the-boardincreasesinstudentachievement.Second,sanctionsunderhigh-stakestest-ingcouldimprovestudentachievementinlow-stakessubjectsiftheresultingmasteryofhigh-stakessubjectsfacilitatesmasteryofothersubjects.

Thoughatruetestoftheprevalenceofeitherofthesekindsofexplanationsisnotavailabletous,wehavediscovered evidence suggesting that student profi-ciencyinsciencehasincreasedunderthehigh-stakessanctions primarily because the improvements thatstudentshavemade inmath and readinghave en-hancedtheirabilitytolearnsciencematerialaswell.However,westressthatfutureresearchusingstrongerstrategiesthanareavailableheretoexplainapositiverelationshipbetweenhigh-stakestestingandstudentimprovementinlow-stakessubjectsisnecessary.

FlOrIdA’S A+ ACCOuNTAbIlITy PrOGrAM

Florida is among the nation’s leaders in high-stakes testing. Most agree that the state’s A+AccountabilityProgram(A+)isoneofthemost

aggressiveprogramsofitskind.Itwasclearlyatem-plateforthefederalNCLBlaw.

Eachyear,thestateadministersastandardizedtest,theFloridaComprehensiveAssessmentTest(FCAT),inmathandreadingtoallpublicschoolstudentsinthestatewhoareenrolledingrades3–10.Schoolsreceivelettergrades,fromAtoF,basedontheper-centageoftheirstudentsmeetingparticularachieve-mentlevelsandtheacademicprogressofstudentsincertainsubgroups.

Therearetwoimportantreasonsthatwemightexpectschoolsdeemedtobefailingtorespondpositively.ThosethathavereceivedanFgradeforthefirsttimemaybeshamedintoimprovingtheirperformance(Fi-

glioandRouse2005;Ladd2001;Carnoy2001;Harris2001).Thosethathavereceivedat leastonefailinggrademaydecidetoraisetheirperformancebecausetheyfearattritionoftheirstudentbody.ThismayoccurastheresultofapolicyofissuingOpportunitySchol-arships (vouchers) to students in schools thathavereceivedtwofailinggradeswithinafour-yearperiodthattheycanusetoattendanotherpublicschooloraprivateschoolwillingtoacceptthevoucherasafulltuitionpayment.1Inthispaper,wearenotparticularlyconcernedwithwhethertheseoranyotherphenom-enadriveincreasesinstudentperformanceineitherhigh-orlow-stakessubjects.

AchangeintheadministrationoftheprogramprovidedaninterestingavenueforresearchingFlorida’spolicy.Intheprogram’sinitialyears,schoolgradeswerebasedon the percentage of students earning level 2 (thesecond-lowestoffivelevels)oraboveontheread-ing,math,andwritingportionsoftheFCATandthepercentageofeligiblestudentstested.AschoolcouldavoidearninganFifatleast50%oftestedstudentsscoredatachievementlevel3inwriting,orif60%oftestedstudentsscoredatlevel2inreadingormathand90%ofeligiblestudentsweretested.Ifaschoolmetoneortwoofthesecriteria,itearnedaD.Ifitmetallthreeofthesecriteria,itearnedaC.SchoolswithparticularsubpopulationsmeetingallthreereceivedaB.ToearnanA,schoolshadtomeetmorestringentrequirementsfortheoverallstudentpopulationandeach subpopulation. The opinion was widespreadthatschoolshaddeterminedthatsatisfactoryscoresinwritingweretheeasiesttoachieveundertheoriginalschool-gradingformatandthattheteachingofwrit-inginstrugglingschoolsthereforestressedtechniquesgearedtothewritingportionoftheexam.

Startinginthe2001–02schoolyear,Floridaadoptedan accumulating point system to evaluate schools.Schoolsearnonepointforeachpercentofstudentswhoscoreinachievementlevels3,4,or5(thethreehighestoffive levels) inreadingandonepoint foreachpercentofstudentswhoscoreinlevels3,4,or5inmath.Schoolsearnonepointforeachpercentofstudentsscoring3.5oraboveinwriting,whichisgradedfrom1to6.Schoolsearnonepointforeachpercentofstudentswhomakelearninggainsinread-

Civ

ic R

epor

t 54

July 2008

ingandonepointforeachpercentofstudentswhoreachahigherachievementlevelormaintaina3,4,or5 inmath.Schoolsalsoearnonepoint foreachpercentofthelowest-performingreaderswhomaketest-score improvements in the year in question.Aschool that earns fewer than 280points receives afailinggrade.Themultifariousnatureofthegradingprocesshasprobablymadedirectmanipulationofthesystemrelativelydifficult.

Beginninginthe2002–03schoolyear,Floridapublicschoolsalsowererequiredtotestforproficiencyinsciencewhentheyadministeredthemathandread-ingexams.ThesciencepartoftheFCATiscurrentlyadministeredtoallpublicschoolstudentsingrades5,8,and11.Theresultsofthescienceexamhavenowbeenincorporateddirectlyintotheaccountabilitypro-gram;butduringtheyearsofouranalysis,theyhadnoeffectontheschool’sgrade,nordidtheyrepresentanyotherformofofficialaccountability.

SeveralresearchershaveevaluatedtheimpactoftheA+programontheacademicgainsofpublicschoolstudents in math and reading (Rouse et al. 2007;GreeneandWinters2004;Chakrabarti2005;FiglioandRouse2005;WestandPeterson2006;Greene2001).Though there is some disagreement about whichaspectoftheaccountabilitypolicywaseffective(thethreatofvouchersortheshameofanFgrade),eachoftheseanalysesfoundthatthepolicyimprovedthemathandreadingproficiencyofstudentsinpublicschoolsdesignatedas failing.WeareawareofnopreviousresearchanalyzingtheimpactoftheA+programonsciencetestscores.

dATA ANd METhOd2

WeutilizeadatasetprovidedbytheFloridaDepartment of Education that containstestscoresinmath,reading,andscience

as well as demographic characteristics of the uni-verseofstudentsenrolledingrades3–10inFloridapublicschools.Wesupplement the individual-leveldatasetwithschool-levelinformation—specifically,theschool’spointtotalandlettergradeunderA+at

4

theendofthe2001–02schoolyear.Tosimplifythecomparisonofscoresindifferentsubjects,weconverttheFCATscoresofstudentswhowereinoursampleinto a scale scorewith amean of 0 and standarddeviationof1.

Inordertoalignourfindingswiththoseinthepre-vious literature, we utilize the comparison strategyimplementedina2007studyconductedbyRouseetal.thatevaluatedtheimpactofFlorida’sA+policyonstudentachievementinmathandreading.OursampleconsistsoftheuniverseofFloridapublicschoolstu-dentswhowereenrolledinthefifthgradein2002–03andwerepromotedattheendoftheprioryear.Thiswas the first class of fifth-grade students attendingaschool thathadreceiveda lettergradeunder therevisedpointsystemoftheA+policy.Wefocusononlythosestudentswithbothamathandreadingtestscorereportedin2001–02and2002–03.

Wesupplementtheindividual-leveldatawithadmin-istrativeinformationontheschool’sgradeandpointsearnedunder theA+systemduring thesummerof2002.Intheanalysesthatfollow,alongwithobservablecharacteristicsofthestudentandschoolwecontrolforboththeschool’slettergradeattheendofthe2001–02yearand the totalpointsearnedunder thegradingsystem.Theideahereisthatcontrollingforthepointsearnedbytheschoolaccountsfordifferencesinschoolperformance,andthustheremainingdifferences intheperformanceofstudentsatschoolsreceivinganFgrademustreflectresponsestotheincentivesthatexistundertheaccountabilitypolicy.

Weusethisgeneralcomparisonstrategytoperforma series of cross-sectional regressions. We are firstconcernedwithdiscoveringwhetherstudentsmadeacademicgainsinscienceduetotheFsanction,andwealsoconfirmthefindingofanimpactofthesanc-tiononstudentproficiencyinmathandreading.WethenevaluatetheextenttowhichanygainsmadebystudentsinscienceduetotheFsanctionweredrivenbyimprovementsintheoverallperformanceoftheschoolorasymbioticrelationshipbetweenlearninginthehigh-andlow-stakessubjects.


5

ThE IMPACT OF ThE F-GrAdE SANCTION ON STudENT PrOFICIENCy IN hIGh- ANd lOW-STAkES SubJECTS

WeadoptthestrategyofRouseetal.tomea-suretheimpactoftheF-gradesanctiononstudentproficiencyinscience.Asacheck

onourprocedure,wealsoattempttoreplicateinmathandreadingtheresultsofthispreviouspaper.

To evaluate the impact of the F-grade sanction onstudent proficiency, we estimated cross-sectionalregressionmodelsusing thestudent’s test scoreonthe fifth-grade test in 2002–03 in the subject beingevaluatedasthedependentvariable.Theregressioncontrols for a variety of observable characteristicsaboutthestudentandschool,includingthelettergradeandcubicfunction3ofthenumberofpointsearnedbytheschoolattheendofthe2001–02schoolyear.Thevariableofinterestisthatwhichindicateswhetherthechild’sschoolreceivedanFgradeintheprioryear.

In themathandreadinganalyses,wecontrol foracubicfunctionofthestudent’stestscoreinthatsub-jectattheendofthepreviousyear(whenthestudentwasinthefourthgrade),whichallowsustomeasureimprovementsinthestudent’smathandreadingprofi-ciencyandaccountforunobserveddifferencesamongstudents.Unfortunately,studentsdidnottakeascienceexaminthefourthgrade,soasimilarcontrolisnotavailableforthescienceevaluation.Instead,weusethecubicfunctionsofthestudents’fourth-gradescoresinmathandreadingtosubstitutefortheirscoresinsci-ence.Thisprocedureassumesthatstudentproficiencyin thesesubjects ishighlycorrelatedandthat therewasnodifferentialrelationshipinstudentknowledgeamongthesesubjectsinthefivecategoriesofschoolsbeforetheF-gradesanctionwasintroduced.

Theresultsofourestimationsofstudentproficiencyinmath,reading,andsciencearereportedinTable1.OurfindingsinmathandreadingareverysimilartothosereportedbyRouseetal.(2007).Ourestimation

dependent var: Reading Math science

Coef. t Coef. t Coef. t

Prior Reading 0.759 238.220 *** 0.539 133.430 ***

Prior Reading ^ 2 -0.009 -6.490 *** -0.008 -5.120 ***

Prior Reading ^ 3 -0.024 -40.290 *** -0.016 -23.400 ***

Prior Math 0.806 214.900 *** 0.329 79.810 ***

Prior Math ^ 2 -0.004 -2.490 ** 0.015 10.480 ***

Prior Math ^ 3 -0.029 -42.980 *** -0.009 -13.550 ***

a -0.003 -0.100 0.003 0.060 0.023 0.510

B -0.005 -0.180 0.005 0.110 0.006 0.160

c 0.006 0.320 0.002 0.090 0.003 0.120

F 0.086 2.940 *** 0.175 3.840 *** 0.087 2.160 **

R-squared 0.6949 0.6871 0.6588

n 152,003 152,003 151,604

Table 1. regressions evaluating the impact of F-grade sanction on student proficiency and gains in high- and low-stakes subjects

Estimated with OLS with robust standard errors clustered by school.Models additionally control for year, limited English-proficiency status, free or reduced-price lunch status, race, gender, disability classification, predicted score in summer of 2002 if the old grading system is kept, a cubic function for the number of points school earned in summer of 2002.

* Statistically significant at 10% level ** Statistically significant at 5% level *** Statistically significant at 1% level

Civ

ic R

epor

t 54

July 2008

suggeststhatthescoresofstudentsenrolledinanF-gradedschoolexceededby0.09standarddeviationsinreadingand0.17standarddeviationsinmaththescoresofstudentsinD-gradedschools.Atthesametime,therewasnostatisticallysignificantdifferenceintheperformanceofA-,B-,C-,andD-gradedschools,strengthening the view that sanctions have an ef-fectonperformance.ThesimilarityofourresultstothosereportedbyRouseetal.suggeststhatwehavereproduced their procedure relatively well, whichshouldprovideadditionalconfidenceinourfindingsforscience.

Column3ofTable1 reports the results inscience.HerewefindthattheF-gradesanctionproducedaf-teroneyearagaininstudentproficiencyofabouta0.08standarddeviationrelativetostudentsinschoolsthatearnedaDgrade.Theresultissignificantatthe5%confidencelevel,meaningthatwecanbehighlyconfidentthattheFsanctionhadapositiveimpactonstudents’sciencetestscores.

ThesefindingssuggestthattheF-gradesanctionnotonly improved student learning in the high-stakessubjects;italsohadapositiveeffectonstudentprofi-ciencyinthelow-stakessubjectofscience.ItappearsthatthepositiveimpactoftheFsanctiononscienceproficiencywassimilartothatfoundinreadingbutsomewhatlowerthanthatfoundinmath.

uNdErSTANdING ThE EFFECTS OF ThE F-GrAdE SANCTION ON SCIENCE PrOFICIENCy

Ourfindingthat theF-gradesanction ledtosubstantial improvements in scienceprofi-ciencyseemsoddatfirstglance.Itisclear

thatunderFlorida’spolicy,point-maximizingpublicschoolshaveanincentivetofocusonthehigh-stakessubjects,evenifdoingsoistothedetrimentofthelow-stakessubjects.Qualitativeresearchandanecdotalevidencesuggest,moreover,thatthereallocationofresourcesprecipitatedbyhigh-stakestestinghascur-tailedgeneralstudentknowledge.

Therearetwopossiblewaysofexplaininghowhigh-stakestestingcouldincreaseperformanceinlow-stakessubjects.Oneisthatgainsinonesubjectmayfacilitatemasteryofanother.Wecallthisthe“correlationeffect.”Anotheristhatimplementationofhigh-stakestestingcouldleadtotheadoptionofpoliciesandattitudesthatimproveperformancegenerally.Forexample,high-stakestestingcouldleadschoolstoexpectimprovedstudentachievementacrosstheboard,tobeshamedintoimprovingtheiroverallperformance,torecognizeexcellenceinothersubjects,andsoon.Rouseetal.findthatschoolsrespondedtoreceivingtheF-gradesanctioninavarietyofways,includinglengtheningschoolperiods(blockscheduling)andincreasingtimeforcollaborativeplanningandclasspreparation.Suchchangesintheoverallschoolenvironmentcouldaffecttheteachingofscienceasmuchastheydotheteach-ingofmathorreading.Werefertothispossibilityasthe“systemiceffect.”

Unfortunately,wecannotproduceatruecausalesti-mationoftheprevalenceofsystemicandcorrelationeffects.Wecan,however,producesomeevidencesug-gestingtherelativeimportanceofeachbyanalyzingaregressionofstudentproficiencyinscience,controllingforobservablecharacteristicsusedinthepreviousre-gression,andincludingacontrolforthetest-scoregainthatthestudentmadeinmathandreadingfromthe2001–02tothe2002–03schoolyear.Heretheestimateof thevariablesfor thestudent’sgains inmathandreadingmeasurestheircontributiontogainsinscience;asexplainedearlier,thisisthecorrelationeffect.Thevariableforthegradeearnedbythestudent’sschoolin2001–02measures thegrade’s impactonscienceproficiency independently of the correlation effect;thisiswhatwecallthe“systemiceffect.”

TheresultsofthisestimationarefoundinTable2.Wefindastronglypositiverelationshipbetweensciencescoresandgainsinmathandreading,indicatingthelikelyexistenceofacorrelationeffect.Thevariablerepresenting the independent effect of the F-gradesanctionongainsinscienceisquitesmallandstatis-ticallyinsignificant,indicatingthelackofasystemiceffect.TheseresultssuggestthattheentiregainfoundinscienceduetotheF-gradesanctionislikelyduetothecorrelationeffect.

6


7

Table 2. Estimating systemic and correlation effectdependent var: science

Coef. t

Prior Reading 0.644 167.060 ***

Prior Reading ^ 2 0.003 2.190 **

Prior Reading ^ 3 -0.008 -11.840 ***

Prior Math 0.343 81.770 ***

Prior Math ^ 2 0.010 7.050 ***

Prior Math ^ 3 -0.001 -1.770 *

Read Gain 0.424 104.120 ***

Read Gain ^ 2 -0.011 -3.310 ***

Read Gain ^ 3 -0.003 -1.820 *

Math Gain 0.281 60.540 ***

Math Gain ^ 2 0.010 3.270 ***

Math Gain ^ 3 -0.004 -3.080 ***

A 0.023 0.670

B 0.009 0.290

C 0.002 0.080

F 0.001 0.020

R-Squared 0.7371

N 150,458

Estimated with OLS with robust standard errors clustered by school.Dependent variable is the student’s test score on the fifth-grade science exam. Models additionally control for year, limited English- proficiency status, free or reduced-price lunch status, race, gender, disability classification, predicted score in summer of 2002 if the old grading system is kept, a cubic function for the number of points school earned in summer of 2002.

* Statistically significant at 10% level** Statistically significant at 5% level*** Statistically significant at 1% level

SuMMAry ANd dISCuSSION

Inthispaper,wehaveevaluatedwhethertheF-grade sanction inFlorida’sA+programhas ledschoolstoincreasestudentlearninginthehigh-

stakessubjectsofmathandreadingtothedetrimentoflearningintheimportantbutlow-stakessubjectofscience.OurresultsindicatethattheF-gradesanctionledtosubstantialstudentgainsinthelearningofmath,reading,andscience.Finally,weproducedasimplemodeltoexplaintheimpactofhigh-stakestestingonstudent learning in low-stakessubjects.Weprovidesomeevidencesuggestingthatvirtuallyallthepositivefindingsinscienceareattributabletocomplementari-tiesinthelearningofmathandreading.

Itcouldbesaidthatstudentperformanceinscienceisnot themostauthoritative testof thepropositionthathigh-stakestestingcrowdsoutinstructioninothersubjects,sincescienceproficiencymaybemorede-pendentonreadingandmathproficiencythanothersubjectsare.Ineffect,suchcriticismwouldbeassum-ingthevalidityofthecorrelationeffect.

BecausestudentsinFloridadonottakestandardizedtestsinotherlow-stakessubjects,weareunabletotestthishypothesis.Itshouldnotbeforgotten,however,thatmuchofthediscussionofthecrowding-outeffectfocusesonitsimpactonsciencelearning.Neverthe-less,welookforwardtofutureworkevaluatingtheimpactofhigh-stakestestingonstudentlearninginlow-stakessubjectsotherthanscience.

Civ

ic R

epor

t 54

July 2008

endnoteS

1. The voucher provision of this policy was recently overturned by the state’s supreme court, though it was in ef-

fect during all the years in which this study takes place.

2. A more technical treatment of the methods utilized in this paper is available online at

http://www.manhattan-institute.org/pdf/cr_54_tech_version.pdf.

3. This simply means that we included variables for the points, the points squared, and the points cubed. Use of

the cubic function allows for a more flexible model because it relaxes the assumption of linearity in measuring the

impact of school points on student proficiency. That is, only controlling for the number of points earned by the

school makes the strong assumption that every point has the same impact on science proficiency. That is, it would

assume that the impact on a student’s science proficiency of a school’s raising its score from 100 points to 110

points was identical to the impact of a school’s raising its score from 200 to 210 points. The cubic function allows

us to account for nonlinearities in this relationship. The same basic argument also holds for the control for a cubic

function of the child’s prior test scores, which are also discussed and used in this analysis.

8


RefeRenCeS

Carnoy, Martin. 2001. “Do School Vouchers Improve Student Performance?” American Prospect 12, no. 1 (special

report, January): 42–45.

Center on Education Policy (Washington, D.C.). 2006. “From the Capital [sic] to the Classroom Year 4 of the No Child

Left Behind Act.” Manuscript. Center. Washington, D.C.

Chakrabarti, R. 2005. “Do Public Schools Facing Vouchers Behave Strategically?: Evidence from Florida.” Manuscript.

Program on Education Policy and Governance.

Figlio, David N., and Cecilia Rouse. 2005. “Do Accountability and Voucher Threats Improve Low-Performing Schools?”

Journal of Public Economics 90 (January): 239–55.

Gordon, Jenny. 2002. “From Broadway to the ABCs: Making Meaning of Arts Reform in the Age of Accountability.”

Educational Foundations 16, no. 2 (spring): 33–53.

Greene, Jay P. 2001. “An Evaluation of the Florida A-Plus Accountability and School Choice Program.” Manuscript.

Manhattan Institute.

Greene, Jay P., and Marcus A. Winters. 2004. “Competition Passes the Test.” Education Next 4, no. 3: 66–71.

Greene, Jay P., Marcus A. Winters, and Greg Forster. 2004. “Testing High-Stakes Tests: Can We Believe the Results of

Accountability Tests?” Teachers College Record 106, no. 6: 1124–44.

Groves, Paula. 2002. “ ‘Doesn’t It Feel Morbid Here?’: High-Stakes Testing and the Widening of the Equity Gap.”

Educational Foundations 16, no. 2 (spring): 15–31.

Gunzenhauser, Michael G. 2003. “High-Stakes Testing and the Default Philosophy of Education.” Theory into Practice

42, no. 1 (winter): 51–58.

Harris, Doug. 2001. “What Caused the Effects of the Florida A+ Program: Ratings or Vouchers?,” in School Vouchers:

Examining the Evidence, ed. Martin Carnoy. Economic Policy Institute.

Jacob, Brian A. 2005. “Accountability, Incentives and Behavior: The Impact of High-Stakes Testing in the Chicago Public

Schools.” Journal of Public Economics 89, nos. 5–6 (June): 761–96.

Jacob, Brian A., and Steven D. Levitt. 2003. “Rotten Apples: An Investigation of the Prevalence and Predictors of

Teacher Cheating.” Quarterly Journal of Economics 118, no. 3 (August): 843–77.

Jones, M. Gail, Brett D. Jones, Belinda Hardin, and Lisa Chapman. 1999. “The Impact of High-Stakes Testing on

Teachers and Students in North Carolina.” Phi Delta Kappan 81, no. 3 (November): 199.

9

Civ

ic R

epor

t 54

July 2008

10

King, Richard A., and Judith K. Mathers. 1997. “Improving Schools through Performance-Based Accountability and

Financial Rewards.” Journal of Education Finance 23, no. 2 (fall): 147–76.

Ladd, Helen F. 2001. “School-Based Educational Accountability Systems: The Promise and the Pitfalls.” National Tax

Journal 54, no. 2 (June): 385–400.

McNeil, Linda M. 2000. “Creating New Inequalities: Contradictions of Reform.” Phi Delta Kappan 81, no. 10

(June): 728–34.

Murillo, Enrique G., and Susana Y. Flores. 2002. “Reform by Shame: Managing the Stigma of Labels in High-Stakes

Testing.” Educational Foundations 16, no. 2 (spring): 93–108.

New York State Education Department. 2004. “The Impact of High-Stakes Exams on Students and Teachers.”

NYSED Policy Brief.

Nichols, Sharon Lynn, and David C. Berliner. 2007. Collateral Damage: How High-Stakes Testing Corrupts America’s

Schools. Cambridge, Mass.: Harvard Education Press.

Patterson, Jean A. 2002. “Exploring Reform as Symbolism and Expression of Belief.” Educational Foundations 16,

no. 2 (spring): 55–75.

Rouse, C. E., J. Hannaway, D. Goldhaber, and D. Figlio. 2007. “Feeling the Florida Heat?: How Low-Performing

Schools Respond to Voucher and Accountability Pressure.” National Center for Analysis of Longitudinal Data in

Education Research, Working Paper 13.

West, Martin R., and Paul E. Peterson. 2006. “The Efficacy of Choice Threats within School Accountability Systems:

Results from Legislatively Induced Experiments.” Economic Journal 116, no. 510 (March): C46–62.


noteS

Civ

ic R

epor

t 54

July 2008

noteS

The Center for Civic Innovation’s (CCI) mandate is to improve the quality of life in cities by

shaping public policy and enriching public discourse on urban issues. The Center sponsors

studies and conferences on issues including education reform, welfare reform, crime reduction,

fiscal responsibility, immigration, counter-terrorism policy, housing and development, and

prisoner reentry. CCI believes that good government alone cannot guarantee civic health,

and that cities thrive only when power and responsibility devolve to the people closest to

any problem, whether they are concerned parents, community leaders, or local police.

www.manhattan-institute.org/cci

The Manhattan Institute is a 501(C)(3) nonprofit organization. Contributions are tax-

deductible to the fullest extent of the law. EIN #13-2912529

CenteR foR CiviC innovation

Stephen Goldsmith, Advisory Board Chairman Emeritus

Howard Husock, Vice President, Policy Research

fellowS

Edward Glaeser Jay P. Greene

George L. Kelling Edmund J. McMahon

Peter Salins Fred Siegel

Marcus A. Winters

building on the basics: the impact of high-stakes testing on ...students’ performance on...

Documents