Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition

Nicole Yohalem and Alicia Wilson-Ahlstrom, The Forum for Youth Investment, with Sean Fischer, New York University, and Marybeth Shinn, Vanderbilt University

Published by The Forum for Youth Investment, January 2009


Page 1: Measuring Youth Program Quality: A Guide to Assessment Tools (forumfyi.org/files/MeasuringYouthProgramQuality_2ndEd.pdf)



Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition

© January 2009 The Forum for Youth Investment

About the Forum for Youth Investment

The Forum for Youth Investment is a nonprofit, nonpartisan “action tank” dedicated to helping communities and the nation make sure all young people are Ready by 21® – ready for college, work and life. Informed by rigorous research and practical experience, the Forum forges innovative ideas, strategies and partnerships to strengthen solutions for young people and those who care about them. A trusted resource for policymakers, advocates, researchers and practitioners, the Forum provides youth and adult leaders with the information, connections and tools they need to create greater opportunities and outcomes for young people.

The Forum was founded in 1998 by Karen Pittman and Merita Irby, two of the country’s top leaders on youth issues and youth policy. The Forum’s 25-person staff is headquartered in Washington, D.C. in the historic Cady-Lee House, with a satellite office in Michigan and staff in Missouri, New Mexico, Virginia and Washington.


Suggested Citation: Yohalem, N. and Wilson-Ahlstrom, A. with Fischer, S. and Shinn, M. (2009, January). Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition. Washington, D.C.: The Forum for Youth Investment.

© 2009 The Forum for Youth Investment. All rights reserved. Parts of this report may be quoted or used as long as the authors and the Forum for Youth Investment are recognized. No part of this publication may be reproduced or transmitted for commercial purposes without prior permission from the Forum for Youth Investment.

Please contact the Forum for Youth Investment at The Cady-Lee House, 7064 Eastern Ave, NW, Washington, D.C. 20012-2031, Phone: 202.207.3333, Fax: 202.207.3329, Web: www.forumfyi.org, Email: youth@forumfyi.org for information about reprinting this publication and information about other publications.



The authors would like to thank the following project advisors who helped develop the original scope of work, provided input into the interview protocol and report outline and reviewed the original report in draft form:

• Carol Behrer, Iowa Collaboration for Youth Development
• Priscilla Little, Harvard Family Research Project
• Elaine Johnson, National Training Institute for Community Youth Work
• Jeffrey Buehler, Missouri Afterschool State Network
• Bob Pianta, University of Virginia
• Marybeth Shinn, Vanderbilt University

We are especially grateful for the contributions of Sean Fischer and Marybeth Shinn, who took the lead on reviewing the technical properties of each instrument and drafted the technical sections of both the first (2007) and second editions of this report.

The developers of each of the tools described in this report also deserve thanks, for their willingness to share their materials, talk with us and review drafts. Thanks to Julie Goldsmith, Amy Arbreton, Beth Miller, Wendy Surr, Ellen Gannett, Judy Nee, Peter Howe, Suzanne Goldstein, Ajay Khashu, Liz Reisner, Ellen Pechman, Christina Russell, Rhe McLaughlin, Sara Mello, Thelma Harms, Charles Smith, Deborah Vandell and Kim Pierce.

Thanks to Karen Pittman for her guidance and suggestions throughout the project and to several Forum staff members, including Nalini Ravindranath and Laura Mattis, for their assistance in the layout, design and editing process.

Finally, thanks to the William T. Grant Foundation for supporting this work and in particular to Bob Granger, Vivian Tseng and Ed Seidman, whose ideas, suggestions and encouragement were critical in transforming this from an idea to a final product.


Contents

Introduction ... 6
Updated Content ... 7
Cross-Cutting Comparisons ... 10
At-a-Glance Summaries ... 19
    Assessing Afterschool Program Practices Tool ... 20
    Communities Organizing Resources to Advance Learning Observation Tool ... 21
    Out-of-School Time Observation Instrument ... 22
    Program Observation Tool ... 23
    Program Quality Observation Tool ... 24
    Program Quality Self-Assessment Tool ... 25
    Promising Practices Rating Scale ... 26
    Quality Assurance System® ... 27
    School-Age Care Environment Rating Scale ... 28
    Youth Program Quality Assessment ... 29
Individual Tool Descriptions ... 30
    Assessing Afterschool Program Practices Tool ... 31
    Communities Organizing Resources to Advance Learning Observation Tool ... 38
    Out-of-School Time Observation Instrument ... 43
    Program Observation Tool ... 48
    Program Quality Observation Tool ... 53
    Program Quality Self-Assessment Tool ... 59
    Promising Practices Rating Scale ... 63
    Quality Assurance System® ... 68
    School-Age Care Environment Rating Scale ... 72
    Youth Program Quality Assessment ... 77
References ... 85
Appendix ... 87


Introduction

The following tools are included in the guide at this time:

Assessing Afterschool Program Practices Tool (APT)
National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL)
Public/Private Ventures

Out-of-School Time Observation Tool (OST)
Policy Studies Associates, Inc.

Program Observation Tool (POT)
National AfterSchool Association

Program Quality Observation Scale (PQO)
Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA)
New York State Afterschool Network

Promising Practices Rating Scale (PPRS)
Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System® (QAS)
Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS)
Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA)
David P. Weikart Center for Youth Program Quality

With the after-school and youth development fields expanding and maturing over the past several years, program quality assessment has emerged as a central theme. This interest in program quality is shared by practitioners, policymakers and researchers in the youth-serving sector.

From a research perspective, more evaluations are including an assessment of program quality and many have incorporated setting-level measures (where the object of measurement is the program, not the participants) in their designs. At the policy level, decision-makers are looking for ways to ensure that resources are allocated to programs likely to have an impact and are increasingly building quality assessment and improvement expectations into requests for proposals and program regulations. At the practice level, programs, organizations and systems are looking for tools that help concretize what effective practice looks like and allow practitioners to assess, reflect on and improve their programs.

With this growing interest in program quality has come an increase in the number of tools available to help programs and systems assess and improve quality. Given the size and diversity of the youth-serving sector, it is unrealistic to expect that any one quality assessment tool will fit all programs or circumstances. While diversity in available resources is positive and reflects the evolution of the field, it also makes it important that potential users have access to good information to help guide their decision-making.

Over the last several years, we at the Forum have found ourselves regularly fielding questions related to program quality assessment, including what tools exist, what it takes to use them and what might work best under what conditions. The need to offer guidance to the field in terms of available resources has become increasingly clear.

This guide was designed to compare the purpose, structure, content and technical properties of several youth program quality assessment tools. It builds on work we began in this area five years ago, as well as recent work conducted by the Harvard Family Research Project


to document and compile quality standards for middle school programs (Westmoreland, H. & Little, P., 2006).

Criteria for Inclusion

With any compendium comes the challenge of determining what to include. Our first caveat is that we plan to continue revising this guide over time, in part because in its current form it is not inclusive of the universe of relevant tools and in part because a great deal of innovation is currently underway. Many of the tools included in the review will be revised or will undergo further field testing in the next 1-2 years.

Our criteria for inclusion in the guide were as follows:

• Tools that are or that include setting-level observational measures of quality. We are particularly interested in direct program observation as a means for gathering specific data about program quality and, in particular, staff practice. Therefore this review does not feature other methodological approaches to measuring quality (e.g., surveying participants, staff or parents about the program).

• Tools that are applicable in a range of school- and community-based program settings. We did not include tools that are designed to measure how well a specific model is being implemented (sometimes referred to as fidelity) or have limited applicability beyond specific organizations or approaches.

• Tools that include a focus on social processes within programs. Many of the tools in this guide address some static regulatory or licensing issues (e.g., policies related to staffing, health and safety). However, we are particularly interested in tools that address social processes or the interactions between and among people in the program.

• Tools that are research-based. All of the tools included are “research-based” in the sense that their development was informed by relevant child/youth development literature. Although we are particularly interested in instruments with established technical properties (e.g., reliability, validity), not all of those included fit this more rigorous definition of “research-based.”

Purpose and Contents of the Guide

We hope this compendium will provide useful guidance to practitioners, policymakers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. It focuses on the purpose and history, content, structure and methodology, technical properties and user considerations for each of the instruments included, as well as a brief description of how they are being used in the field. For each tool, we aim to address the following key questions:

Purpose and History. Why was the instrument developed – for whom and in what context? Is its primary purpose program improvement? Accreditation? Evaluation? For what kinds of programs, serving what age groups, is it appropriate?

Content. What kinds of things are measured by the tool? Is the primary focus on the activity, program or organization level? What components of the settings are emphasized – social processes, program resources, or the arrangement of those resources (Seidman, Tseng & Weisner, 2006)? How does it align with the National Research Council’s positive developmental settings framework¹ (2002)?

Structure and Methodology. How is the tool organized and how do you use it? How are data collected and by whom? How do the rating scales work and how are ratings determined? Can the tool be used to generate an overall program quality score?

Technical Properties. Is there any evidence that different observers interpret questions in similar ways (reliability)? Is there any evidence that the tool measures what it is supposed to measure (validity)? See the Appendix for a “psychometrics dictionary” that defines relevant terminology and explains why technical properties are an important consideration.

¹ This report included a list of “features of positive developmental settings” culled from frequently cited literature. It has contributed to the emerging consensus about the components of program quality.


User Considerations. How easy is the tool to access and use? Does it come with instructions that are understandable for practitioners as well as researchers? Is training available on the instrument itself or on the content covered by it? Are data collection, management and reporting services available? What costs are associated with using the tool?

In the Field. How is the tool being applied in specific programs or systems?

To ensure that the guide is useful to a range of audiences with different purposes and priorities, we have provided both in-depth and summary-level information in a variety of formats.

For each tool, we provide both a one-page “at-a-glance” summary as well as a longer description. The at-a-glance summaries or longer tool descriptions can stand alone as individual resources. Should you decide to use one of these instruments or want to take a closer look at two or three, you could pull these sections out and share them with key stakeholders.

We also provide cross-instrument comparison charts and tables for those who want to get a sense of what the landscape of program quality assessment tools looks like. The Cross-Cutting Observations section that follows compares the instruments across most of the categories listed above (purpose, content, structure, technical properties, user considerations). While definitions of quality do not differ dramatically across the instruments, there are notable differences in some of these other areas which we try to capture.


Updated Content

In this edition of the guide, we update the summaries of nine assessment tools featured in the original March 2007 edition, and add an additional tool – the Communities Organizing Resources to Advance Learning (CORAL) Observation Tool – developed by Public/Private Ventures. This edition also includes refined definitions of validity and a discussion regarding some of the limitations of traditional methods of establishing reliability.

Since our original publication, there has been a flurry of activity related to the development and use of the various tools. Almost all of the tool developers have continued to work on either technical or practical aspects of their assessment tools, as well as on related resources to support practitioner use of these tools.

These changes demonstrate continued investment on the part of developers in making tools more accessible and user-friendly to programs and systems trying to implement quality assessment and improvement. Changes that have been made or are in development since 2007 include:

• Further psychometric testing of the reliability and validity of measures (OST; YPQA)

• Development and/or expansion of resources to support the use of various tools (APT; POT; QSA; QAS)

• Development and/or expansion of the availability of web-based tools and resources (QAS; QSA; YPQA)

• Aligning quality assessment tools with other measures to create a package of compatible tools (APT)

• Restructuring of the framework and/or scales (APT; OST)

• Expanding access by translating a tool into different languages (SACERS)

• Development of brother/sister tools targeting different age groups (YPQA; SACERS)

We hope this compendium will provide useful guidance to practitioners, policymakers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. We look forward to updating the compendium again as this work advances.


Cross-Cutting Comparisons

TOOL DEVELOPERS KEY

APT: Assessing Afterschool Program Practices Tool
National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

CORAL: Communities Organizing Resources to Advance Learning Observation Tool
Public/Private Ventures

OST: Out-of-School Time Observation Tool
Policy Studies Associates, Inc.

POT: Program Observation Tool
National AfterSchool Association

PQO: Program Quality Observation Scale
Deborah Lowe Vandell and Kim Pierce

QSA: Program Quality Self-Assessment Tool
New York State Afterschool Network

PPRS: Promising Practices Rating Scale
Wisconsin Center for Education Research and Policy Studies Associates, Inc.

QAS: Quality Assurance System®
Foundations, Inc.

SACERS: School-Age Care Environment Rating Scale
Frank Porter Graham Child Development Institute and Concordia University, Montreal

YPQA: Youth Program Quality Assessment
David P. Weikart Center for Youth Program Quality

Although the individual tool descriptions include what we hope is useful information about several different program quality assessment instruments, their level of detail may be daunting, particularly without a sense of the broader landscape of resources. Some of the individualized information about each tool can be further distilled in ways that may help readers understand both the broader context of program quality assessment and where individual tools fall within that context. We were not able to collect completely comparable information about all instruments in every topic area, but in those cases where we were, we have summarized and compared that information in narrative and charts.

Figure 1: Target Age and Purpose
Figure 2: Common and Unique Content
Figure 3: Methodology
Figure 4: Strength of Technical Properties
Additional Technical Considerations
Figure 5: Technical Glossary
Figure 6: Training and Support for Users


Most of the tools included in this review were developed primarily for self-assessment and program improvement purposes. Some, however, were developed with program monitoring or accreditation as a key goal and several were developed exclusively for use in research. Many have their roots in early childhood assessment (SACERS, POT, PQO) while others draw more heavily on youth development and/or education literature (APT, CORAL, OST, PPRS, QAS, QSA, YPQA). While the majority of tools were designed to assess programs serving a broad range of children (often K–12 or K–8), some are tailored for more specific age ranges.

Figure 1: Target Age and Purpose

• Assessing Afterschool Program Practices Tool (APT) – Grades K–8 – Improvement; Research/Evaluation
• Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Grades K–5 – Improvement; Research/Evaluation
• Out-of-School Time Observation Tool (OST) – Grades K–12 – Research/Evaluation
• Program Observation Tool (POT) – Grades K–8 – Improvement; Monitoring/Accreditation
• Program Quality Observation Scale (PQO) – Grades 1–5 – Research/Evaluation
• Program Quality Self-Assessment Tool (QSA) – Grades K–12 – Improvement
• Promising Practices Rating Scale (PPRS) – Grades K–8 – Research/Evaluation
• Quality Assurance System (QAS) – Grades K–12 – Improvement
• School-Age Care Environment Rating Scale (SACERS) – Grades K–6 – Improvement; Monitoring/Accreditation; Research/Evaluation
• Youth Program Quality Assessment (YPQA) – Grades 4–12 – Improvement; Monitoring/Accreditation; Research/Evaluation


There is reasonable consensus across instruments about the core features of settings that matter for development. All of the tools included in this review measure six core constructs (at varying levels of depth): relationships, environment, engagement, social norms, skill-building opportunities and routine/structure. The content of most of the instruments aligns well with the National Research Council’s features of positive developmental settings framework (2002), which has helped contribute to the growing consensus around elements of quality that has emerged since then. In terms of what components of settings the tools emphasize (Seidman et al., 2006), all include a focus on social processes. Although only a subset emphasize program resources, several include items related to the arrangement of resources within the setting.

Figure 2: Common and Unique Content

ALL TOOLS MEASURE:
Relationships, Environment, Engagement, Social Norms, Skill-Building Opportunities, Routine/Structure

MEASURED BY SOME TOOLS:
• Management (CORAL, POT, QAS, QSA)
• Staffing (APT, YPQA, QSA, SACERS, POT)
• Youth Leadership/Participation (APT, YPQA, OST, QSA, PPRS)
• Linkages to Community (APT, YPQA, SACERS, QSA, QAS, POT)


² The time sampling method has observers go through a cycle of selecting individual participants (ideally at random) to observe for brief periods of time and document their experiences.

Many of the tools included in this review follow a similar structure. They tend to be organized around a core set of topics or constructs, each of which is divided into several items, which are then described by a handful of more detailed indicators. Some variation does exist, however. For example, the PQO includes a unique time sampling component.² While most tools are organized around features of quality, some are not.

For example, while the APT addresses a core set of quality features, the tool itself is organized around the program’s daily routine (e.g., arrival, transitions, pick-up). Observation is the primary data collection method for each of the instruments in this review, although several rely upon interview, questionnaire or document review as additional data sources.
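The construct → item → indicator hierarchy described above can be made concrete with a small sketch. This is purely illustrative: the report prescribes no software, the construct and item names below are invented for the example, and the simple averaging roll-up is only one possible scoring rule (some tools deliberately report no overall score).

```python
# Illustrative sketch of the construct -> item -> indicator hierarchy
# common to many of these tools. All names and the averaging rule are
# invented for illustration; each real instrument defines its own scales.

from statistics import mean

# One construct contains several items; each item is rated 1-5 by an
# observer after consulting its more detailed indicators.
tool = {
    "Supportive Environment": {
        "Staff use a warm tone with youth": 5,
        "Staff encourage youth to try new skills": 4,
    },
    "Engagement": {
        "Youth help plan activities": 3,
        "Youth reflect on what they learned": 2,
    },
}

# A simple roll-up: average item ratings within each construct, then
# average construct scores for an overall program score.
construct_scores = {c: mean(items.values()) for c, items in tool.items()}
overall = mean(construct_scores.values())

print(construct_scores)  # {'Supportive Environment': 4.5, 'Engagement': 2.5}
print(overall)           # 3.5
```

A hierarchy like this also makes it easy to report at whichever level a user cares about: item-level detail for staff reflection, construct-level scores for improvement planning.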

Figure 3: Methodology

Target Users: Program Staff (PS), External Observers (EO)
Data Collection Methods: Observation (O), Interview (I), Questionnaire (Q), Document Review (DR)

• Assessing Afterschool Program Practices Tool (APT): PS, EO; O, Q
• Communities Organizing Resources to Advance Learning Observation Tool (CORAL): EO; O
• Out-of-School Time Observation Tool (OST): EO; O
• Program Observation Tool (POT): PS, EO; O, I, DR
• Program Quality Observation Scale (PQO): EO; O
• Program Quality Self-Assessment Tool (QSA): PS; O, DR
• Promising Practices Rating Scale (PPRS): EO; O
• Quality Assurance System (QAS): PS, EO; O, Q, DR
• School-Age Care Environment Rating Scale (SACERS): PS, EO; O, I
• Youth Program Quality Assessment (YPQA): PS, EO; O, I


Most of the instruments have some information showing that if different observers watch the same program practices, they will score the instrument similarly (internal consistency and interrater reliability). Few, however, have looked at other aspects of reliability that are of interest when assessing the strength of a program quality measure. Several of the instruments have promising findings to consider in terms of validity – meaning they have made some effort to demonstrate that the instrument accurately measures what it is supposed to measure. See the accompanying glossary on page 15 and the Appendix for more detailed definitions of psychometric terms.

Figure 4: Strength of Technical Properties

[Table rating each instrument (APT, CORAL, OST, POT, PQO, QSA, PPRS, QAS, SACERS, YPQA) on: Score Distributions; Interrater Reliability; Test-Retest Reliability; Internal Consistency*; Convergent Validity; Concurrent/Predictive Validity; Validity of Scale Structure*]

Key:
(blank) = No evidence
✓✓✓ = Evidence of this property is strong by general standards
✓✓ = Evidence of this property is moderate by general standards, promising but limited or mixed (strong on some items or scales, weaker on others)
✓ = Evidence of this property is weaker than desired

* This type of evidence is only relevant for instruments with a lot of items that would be useful if organized into scales.
† Psychometric information is not based on the instrument in its current form, so its generalizability may be limited.


Figure 5: Technical Glossary

Score Distributions
What is it? The dispersion or spread of scores from multiple assessments for a specific item or scale.
Why is it useful? In order for items and scales (sets of items) to be useful, they should be able to distinguish differences between programs. If almost every program scores low on a particular scale, it may be that the items make it “too difficult” to obtain a high score and, as a result, don’t distinguish between programs on this dimension very well.

Interrater Reliability
What is it? How much assessments by different trained raters agree when observing the same program at the same time.
Why is it useful? It is important to use instruments that yield reliable information regardless of the whims or personalities of individual observers. If findings depend largely on who is rating the program (rater A is more likely to give favorable scores than rater B), it is hard to get a sense of the program’s actual strengths and weaknesses.

Test-Retest Reliability
What is it? The stability of an instrument’s assessments of the same program over time.
Why is it useful? If an instrument has strong test-retest reliability, then the score it generates should be stable over time. This is important because we want changes in scores to reflect real changes in program quality. The goal is to avoid situations where an instrument is either too sensitive to subtle changes that may hold little significance, or insensitive to important long-term changes.

Internal Consistency
What is it? The cohesiveness of items forming an instrument’s scales.
Why is it useful? Scales are sets of items within an instrument that jointly measure a particular concept. If, however, the items within a given scale are actually conceptually unrelated to each other, then the overall score for that scale may not be meaningful.

Convergent Validity
What is it? The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts.
Why is it useful? It is important to use an instrument that generates accurate information about what you are trying to measure. If two instruments are presumed to measure identical or highly similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other.

Concurrent/Predictive Validity
What is it? The extent to which an instrument is related to distinct theoretically important concepts and outcomes in expected ways.
Why is it useful? If an instrument accurately measures high program quality, then one can expect it to predict better outcomes for the youth participating in the program. The instrument’s findings should also be related to distinct, theoretically important variables and concepts in expected ways.

Validity of Scale Structure
What is it? The extent to which items statistically group together in expected ways to form scales.
Why is it useful? It is helpful to know exactly which concepts an instrument is measuring. Factor analysis can help determine if one scale actually incorporates more than one related concept or if different items can be combined because they are essentially measuring the same thing.
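Two of the glossary entries, interrater reliability and internal consistency, correspond to statistics that are easy to illustrate. The sketch below computes exact percent agreement and Cronbach's alpha on invented ratings; the data are made up for the example, and real reliability studies typically use larger samples and more robust indices (e.g., Cohen's kappa or intraclass correlations for agreement).

```python
# Toy illustrations of two glossary statistics. The ratings are invented;
# published reliability studies use larger samples and often more robust
# indices (Cohen's kappa, intraclass correlations) than shown here.

from statistics import pvariance

# --- Interrater agreement: two raters score the same 6 sessions (1-5).
rater_a = [4, 3, 5, 2, 4, 3]
rater_b = [4, 3, 4, 2, 4, 3]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"exact agreement: {agreement:.2f}")  # 0.83

# --- Internal consistency: Cronbach's alpha for a 3-item scale rated
# across 5 programs. alpha = k/(k-1) * (1 - sum(item var) / var(total))
items = [
    [4, 3, 5, 2, 4],  # item 1, one rating per program
    [4, 2, 5, 2, 3],  # item 2
    [5, 3, 4, 1, 4],  # item 3
]
k = len(items)
totals = [sum(col) for col in zip(*items)]  # total score per program
alpha = k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))
print(f"Cronbach's alpha: {alpha:.2f}")
```

High alpha here reflects that the three items rise and fall together across programs, which is exactly the cohesiveness the glossary entry describes.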


Additional Technical Considerations

Many instruments in this report have strong reliability and validity evidence using traditionally accepted techniques. However, as with any field, new methods are often introduced to advance our understanding of reliability and validity. In this section, we discuss some of the limitations of traditional methods that have been highlighted by researchers at the David P. Weikart Center for Youth Program Quality³ (YPQA developer), what these methods can and cannot tell us about an instrument’s reliability and validity, and how new methods are addressing these issues.

Variations in Quality Across Different Contexts

In an ideal world, scores we obtain from program quality instruments would always be perfectly accurate. Unfortunately, reality tends to be messier because many factors influence assessments. For example, different raters may not perceive observations in the same way and thus give different scores to the same questions. Different staff or different activities might get different scores, if we could observe them all, but typically we cannot do so. Another possible issue could be that program staff interact with children differently at the beginning of the year (when they do not know the children well yet) versus the end of the year. When these influences are unaccounted for when using an instrument, they are collectively known as “error variance.” When an instrument is reliable, its scores are not influenced by much error.

Unfortunately, traditional reliability methods, including the ones used by instruments in this report, do not account for all possible sources of variation in scores (thereby increasing the inaccuracy of the instrument), as discussed by Steve Raudenbush and his colleagues (for a readable treatment of this topic, see Martinez & Raudenbush, 2008). Charles Smith at the Weikart Center has done preliminary work that examines sources of variation for the YPQA, including whether ratings are different during earlier program sessions versus later program sessions. He has also examined whether it is enough to observe one type of activity within an agency versus observing a broad range of activities (readers who are interested in specific findings should refer to the technical summary of the YPQA, see pages 78-82).

When we know how an instrument is influenced by all these factors, we can take steps to reduce error. For example, if an instrument’s scores vary widely depending on which activities we are observing, then we should observe a wide range of activities. If scores depend on the time of day, then we should conduct observations at multiple times throughout the day. By accounting for these additional influences on program quality, we reduce error and obtain more accurate scores. At this point in time, the YPQA is the only instrument to have preliminary information on external sources of variation beyond interrater reliability.
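The intuition that observing more activities reduces error can be shown with a small simulation. This is only an illustration of the sampling logic, not of any tool's actual psychometrics: the "true" quality score and the activity-to-activity spread below are invented numbers.

```python
# Illustrative simulation: if scores vary by which activity is observed,
# averaging over more activities yields a steadier program-level estimate.
# TRUE_QUALITY and ACTIVITY_SPREAD are invented for illustration only.

import random
from statistics import mean, pstdev

random.seed(0)
TRUE_QUALITY = 3.5      # the program's "real" score on a 1-5 scale
ACTIVITY_SPREAD = 0.8   # how much single activities deviate from it

def observed_score(n_activities):
    """Average rating over n randomly sampled activities."""
    return mean(random.gauss(TRUE_QUALITY, ACTIVITY_SPREAD)
                for _ in range(n_activities))

for n in (1, 4, 16):
    estimates = [observed_score(n) for _ in range(2000)]
    # the spread of the estimate shrinks roughly with the square root of n
    print(f"{n:2d} activities observed -> estimate SD = {pstdev(estimates):.2f}")
```

The same logic applies to the other sources of variation mentioned above: sampling across raters, times of day and points in the program year all shrink the corresponding slices of error variance.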

Understanding Assumptions about Internal Consistency

Several instruments in this report have internal consistency information on their scales. As explained in other parts of this report, scales are composed of items meant to measure a particular concept. Measuring internal consistency is often the first step in evaluating whether the items form a meaningful domain by determining whether they are cohesive (readers who would like a more extensive explanation with examples should refer to the Appendix).

However, Smith points out that internal consistency is only appropriate when the items are reflections of a non-tangible concept (called reflective for short). As an example, consider the concept "Supportive Environment." Although this might be an important concept to assess, one cannot measure it the same way one would measure temperature or weight. Instead, researchers must rely on a set of questions to approximate how supportive the environment is for children. One analogy for a reflective concept could be an art sculpture – to truly appreciate it, one must look at the sculpture from multiple angles.

³ The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.


Similarly, to truly measure a reflective concept, one must examine it using a group of similar items (which provide the different angles). Theoretically speaking, researchers interested in developing a scale could probably generate hundreds of possible items to measure a particular concept. Of course, it is impractical to use them all, so researchers choose a manageable subset.

In contrast, internal consistency scores are not appropriate when concepts are formative, which means that a single concept is a composite of multiple, separate components (MacKenzie, Podsakoff, & Jarvis, 2005). A good analogy for this type of concept is a puzzle. To assess a formative concept, you need to gather all of the pieces and put them together. For example, imagine that we wish to measure "Program Resources." Unlike a reflective concept, this type of concept is a composite of several important components (the puzzle pieces).

Our items would inquire about things like money, time, space, and number of staff members. Each of these resources may be an important component of the overall concept and is essential to include in the scale if we are to obtain a clear picture and complete the puzzle. Unlike reflective concepts, researchers cannot choose a subset of items from a large list of possibilities. Rather, each item is an important component of the whole. Because the concept is a composite of separate and potentially unrelated parts, the cohesiveness of the items is not important, and therefore internal consistency procedures are not appropriate (as stated in the Appendix, internal consistency measures the relatedness of items, which assumes that the items are reflective).
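For a reflective scale, cohesiveness is most often checked with Cronbach's alpha, computed as alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores), where k is the number of items. Below is a minimal sketch using made-up ratings on a hypothetical four-item "Supportive Environment" scale; none of the numbers or item content come from the instruments reviewed in this report:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a reflective scale, given one list of
    scores per item (aligned across the programs being rated)."""
    k = len(item_scores)
    item_vars = sum(statistics.variance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_vars / statistics.variance(totals))

# Made-up ratings on a hypothetical four-item scale for six programs;
# the items rise and fall together, so alpha comes out high.
cohesive_items = [
    [3, 4, 2, 5, 4, 3],
    [3, 4, 2, 4, 4, 3],
    [2, 4, 2, 5, 3, 3],
    [3, 5, 1, 5, 4, 2],
]
print(round(cronbach_alpha(cohesive_items), 2))  # prints 0.94
```

A high alpha supports treating the items as angles on one reflective concept; for a formative concept like "Program Resources," a low alpha would tell us nothing, since its components need not correlate at all.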

The Weikart Center has been reexamining the YPQA scales to assess whether they are reflective versus formative. Although this work is still in progress, the results will have important implications for how we think about evaluating both reliability and validity for observation-based metrics.


Six of the ten instruments included in this review are free to users and available to download from the Internet; the other four have various costs associated with their use. In most, but not all, cases, training is available (at a fee) for those interested in using the tool. Many come with user-friendly manuals that explain how to use the instrument; in some cases these materials are still under development. In several cases, the developers of the tools also provide data collection, management and reporting services at additional cost to users. Details about such considerations are included in the individual tool descriptions.

| Tool | Cost | Training Available | Estimated Time Necessary to Train Observers to Generate Reliable Scores | Estimated Minimum Observation Time Needed to Generate Sound Data | Data Collection, Management and Reporting Available |
| --- | --- | --- | --- | --- | --- |
| Assessing Afterschool Program Practices Tool (APT) | Free* | Yes | 4-hour training plus 2 program observations | 1 afternoon (2-3 hours) | No |
| Communities Organizing Resources to Advance Learning Observation Tool (CORAL) | Free | No | 2 days | 3-4 hours | No |
| Out-of-School Time Observation Tool (OST) | Free | No† | 8-18 hours, depending on experience | 3 hours | No† |
| Program Observation Tool (POT) | $300 Advancing Quality Kit | Yes | 2.5-3 days | 3-5 hours (for self-assessment) | No |
| Program Quality Observation Scale (PQO) | Free | No† | 2 hours plus 2-4 observations & 2-4 time samples, depending on experience | 1.5 hours observation & .5 hours time sampling | No† |
| Program Quality Self-Assessment Tool (QSA) | Free | Yes | 2 hours†† | N/A | No |
| Promising Practices Rating Scale (PPRS) | Free | No† | 2 hours plus 2-4 observations, depending on experience | 2 hours | No† |
| Quality Assurance System (QAS) | $75 Annual Site License | Yes | 2-3 hours†† | 1 afternoon (2-3 hours) | Yes |
| School-Age Care Environment Rating Scale (SACERS) | $15.95 SACERS Booklet | Yes | 4-5 days | 3 hours | Yes |
| Youth Program Quality Assessment (YPQA) | $39.95 YPQA Starter Pack | Yes | 2 days | 4 hours | Yes |

* A fee structure may be developed over time, once additional materials are completed.
† Training and data services have only been made available in the context of specific research projects.
†† These are estimates of time necessary to prepare observers; developers of these tools have not trained "to reliability."


Detailed descriptions of the ten assessment tools are provided in the next section. Here we offer one-page summaries to copy and share. Each summary follows a common format.

Assessing Afterschool Program Practices Tool (APT) – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Public/Private Ventures

Out-of-School Time Observation Tool (OST) – Policy Studies Associates, Inc.

Program Observation Tool (POT) – National AfterSchool Association

Program Quality Observation Scale (PQO) – Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA) – New York State Afterschool Network

Promising Practices Rating Scale (PPRS) – Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System® (QAS) – Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS) – Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA) – David P. Weikart Center for Youth Program Quality


Assessing Afterschool Program Practices Tool (APT)

Developed by NIOST and the Massachusetts Department of Elementary & Secondary Education

Overview: The Assessment of Afterschool Program Practices Tool (APT) is designed to help practitioners examine and improve what they do in their program to support young people's learning and development. It examines those program practices that research suggests relate to youth outcomes (e.g., behavior, initiative, social relationships). A research version of the APT (the APT-R) was developed in 2003-2004. This more user-friendly self-assessment version was developed in 2005.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation

Program Target Age: Grades K–8

Relevant Settings: Both structured and unstructured programs that serve elementary and middle school students during the non-school hours.

Content: The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories – program climate, relationships, approaches and programming, partnerships and youth participation.

Structure: The 15 program features addressed by the APT are measured by two tools – the observation instrument (APT-O) and questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.

Methodology: Items that are observable within a given program session (typically one full afternoon) are assessed in the APT-O. The APT-Q is a questionnaire to gather information about planning, frequency and regularity of program offerings and opportunities, and frequency of connections with families and school. Both the APT-O and APT-Q have four-point scales, though flexibility is encouraged for users who find the scales not useful for their purposes. Depending on what part of the tool(s) is being used, the scales measure how characteristic an item is of the program, the consistency of an item or the frequency of an item. For each item, concrete descriptors illustrate what a score of 1, 2, 3 or 4 looks like.

Technical Properties: While no psychometric information is available for the current self-assessment version of the APT, some is available on the research version (APT-R) on which it is based. For the APT-R, interrater reliability was moderate and preliminary evidence of concurrent and predictive validity is available. NIOST has plans for further testing of the APT.

User Considerations:

Ease of Use

• "Cheat sheets" demonstrate the link between quality and outcomes.
• Instrument is extremely flexible in terms of administration, use of scales, number of observations, etc.
• The instrument is designed for users to make observations in just one program session.
• The instrument can be used as part of a package including an outcomes tool and data tracking system.

Available Supports

• Training on both the APT itself and the youth development principles embedded in the instrument is available through NIOST.
• Packaging and pricing information about training on the instrument is available from NIOST for organizations not already affiliated with the APT.

For More Information: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta


Communities Organizing Resources to Advance Learning Observation Tool (CORAL)

Overview: The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies on the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time. The tool was used in two ways: 1) observation of literacy instruction and 2) observation of programming in support of literacy. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–5

Relevant Settings: Structured literacy-based programs, both school- and community-based.

Content: The CORAL observation tool documents the connection between the quality of the program, fidelity to the Balanced Literacy Model and the academic outcomes of participants.

Structure: The CORAL observation tool is structured around five key constructs of quality – adult-youth relations, effective instruction, peer cooperation, behavior management and literacy instruction. The tool is divided into five parts. The first three – the activity description form, characteristics form and the activity check box form – are focused on describing the activity as well as participant and staff behavior. The remaining two components include an activity scale and an overall assessment form, and are completed after a 90-minute observation period.

Methodology: Each construct is based on a five-point rating scale. The activity description form, characteristics form and activity check box form are filled out before an activity is observed, and contain the most informative aspects of the activity. The activity scale and overall assessment form are completed after a 90-minute observation session.

Technical Properties: Evidence for score distributions and predictive validity is strong by general standards, and evidence for internal consistency and the validity of scale structure is promising but limited.

User Considerations:

Ease of Use

• Contains detailed instructions for conducting observations.
• Includes space for open-ended narratives.
• Scoring takes 3-4 hours, including completing the rating scales, related narratives and the overall assessment.

Available Supports

• Currently, training is limited to individuals involved in specific evaluations that employ the instrument.
• Public/Private Ventures' website features a free download of materials in their Afterschool Toolkit.

For More Information: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29


Out-of-School Time Program Observation Tool (OST)

Developed by Policy Studies Associates, Inc.

Overview: The Out-of-School Time Program Observation Tool (OST) was developed in conjunction with several research projects related to out-of-school time programming, with the goal of collecting consistent and objective data about the quality of activities through observation. Its design is based on several assumptions about high-quality programs – first, that certain structural and institutional features support the implementation of high-quality programs, and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth outcomes.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–12

Relevant Settings: Varied school- and community-based after-school programs.

Content: The OST documents and rates the quality of the following major components of after-school activities: interactions between youth and adults and among youth, staff teaching processes, and activity content and structures.

Structure: The first section of the OST allows for detailed documentation of activity type, number and demographics of participants, space used, learning skills targeted, type of staff and the environmental context. The remainder of the tool assesses the quality of activities along five key domains, including relationships, youth participation, staff skill building and mastery strategies, and activity content and structure.

Methodology: The OST observation instrument uses a seven-point scale to assess the extent to which each indicator is or is not present during an observation. Qualitative documentation, recorded on site, supplements the rating scales. Activity and quality indicator data from the OST observation instrument are used in conjunction with related survey measures.

Technical Properties: Evidence for interrater reliability is strong by general standards, as is evidence for score distributions and internal consistency. Evidence for concurrent validity and the validity of the scale structure is promising but limited.

User Considerations:

Ease of Use

• Free and available online.
• Tool includes an introduction and basic procedures for use.
• Includes some technical language but has been used by both researchers and practitioners.
• Raters must observe approximately 3 hours of programming to generate sound data.
• Observers can be trained to generate reliable observations through 8-16 hours of training, depending on level of experience.

Available Supports

• Training is limited to individuals involved in specific evaluations that employ the instrument.
• Additional non-observational measures related to after-school programming that can be used in conjunction with the OST are available from PSA.

For More Information: www.policystudies.com/studies/youth/OST%20Instrument.html


Program Observation Tool (POT)

Developed by the National AfterSchool Association

Overview: The Program Observation Tool is the centerpiece of the National AfterSchool Association's (NAA) program improvement and accreditation process and is designed specifically to help programs assess progress against the Standards for Quality School-Age Care. Developed in 1991 by NAA and the National Institute on Out-of-School Time, the tool was revised and piloted before the accreditation system began in 1998.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation

Program Target Age: Grades K–8

Relevant Settings: School- and center-based after-school programs.

Content: The Program Observation Tool measures 36 "keys of quality," organized into six categories. Five are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth – administration – is assessed through questionnaire/interview. The tool reflects NAA's commitment to holistic child development and its accreditation orientation.

Structure: The five quality categories that are the focus of the tool are measured using one instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items are assessed separately, through questionnaire/interview.

Methodology: The rating scale captures whether each indicator is true all of the time, most of the time, sometimes or not at all. Specific descriptions of what a 0, 1, 2 or 3 looks like are not provided, but descriptive statements help clarify the meaning of each indicator. Programs seeking accreditation must assign an overall program rating based on individual scores, and guidelines are provided for observers to reconcile and combine scores. For accreditation purposes, the program/activities and safety/nutrition categories are "weighted."

Technical Properties: No psychometric evidence is available on the POT itself, but there is information about the ASQ (Assessing School-Age Childcare Quality), from which the POT was derived. Overall, evidence for interrater and test-retest reliability is strong by general standards. Following revisions to the scales, evidence for internal consistency was also strong. Preliminary evidence of concurrent validity is also available for the ASQ.

User Considerations:

Ease of Use

• Accessible language and format developed with input from practitioners.
• When used for self-assessment, observation and scoring take roughly 3-5 hours.
• A self-study manual provides detailed guidance on instrument administration.
• The package costs approximately $300 (additional costs for full accreditation).

Available Supports

• The POT is part of an integrated set of resources for self-study and accreditation.
• The full accreditation package provides detailed guides, videos and other supports.
• Beginning in September 2008, accreditation is offered through the Council on Accreditation.
• NAA currently offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability).
• Some NAA state affiliates offer training for programs interested in self-assessment and improvement.

For More Information: http://naaweb.yourmembership.com/?page=NAAAccreditation


Program Quality Observation Scale (PQO)

Developed by Deborah Lowe Vandell & Kim Pierce

Overview: The Program Quality Observation Scale (PQO) was designed to help observers characterize the overall quality of an after-school program environment and to document individual children's experiences within programs. The PQO has been used in a series of research studies and has its roots in Vandell's observational work in early child care settings.

Primary Purpose: Research/Evaluation

Program Target Age: Grades 1–5

Relevant Settings: Varied school- and community-based after-school programs.

Content: The PQO focuses primarily on social processes and, in particular, three components of the quality of children's experiences inside programs: relationships with staff, relationships with peers and opportunities for engagement in activities.

Structure: The tool has two components – qualitative ratings focused on the program environment and staff behavior (referred to as "caregiver style") and time samples of children's activities and interactions. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed.

Methodology: All items are assessed through observation (although the PQO has always been used in tandem with other measures that rely on different kinds of data). Program environment and caregiver style ratings are made using a four-point scale, and users are given descriptions of what constitutes a 1, 2, 3 or 4 for three aspects of environment and four aspects of caregiver style. In the time sample of activities, activity type is recorded using 19 different categories, and interactions are assessed and coded along several dimensions.

Technical Properties: Evidence for interrater reliability, score distributions, internal consistency and convergent validity is strong by general standards, and evidence for test-retest reliability and concurrent/predictive validity is promising but mixed.

User Considerations:

Ease of Use

• Free and available for use.
• The PQO was developed with a research audience in mind. The manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• Qualitative ratings of environment and staff require a minimum of 90 minutes of observation time. Completing the time samples as outlined takes a minimum of 30 minutes for an experienced observer.

Available Supports

• Training has only been made available in the context of a specific research study.
• Data collection, management or reporting have only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PQO (e.g., physical environment questionnaire; staff, student and parent surveys).

For More Information: http://childcare.gse.uci.edu/des4.html


Program Quality Self-Assessment Tool (QSA)

Developed by the New York State Afterschool Network

Overview: The Program Quality Self-Assessment Tool (QSA) was developed exclusively for self-assessment purposes (use for external assessment and formal evaluation purposes is discouraged). The QSA is intended to be used as the focal point of a collective self-assessment process that involves all program staff. Soon after it was created in 2005, the state of New York began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.

Primary Purpose: Program Improvement

Program Target Age: Grades K–12

Relevant Settings: The full range of school- and community-based after-school programs. The QSA is particularly relevant for programs that intend to provide a broad range of services, as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers).

Content: The QSA is organized into 10 essential elements of effective after-school programs, including environment/climate; administration/organization; programming/activities; and youth participation/engagement, among others. A list of standards describes each element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns.

Structure: Each of the QSA's 10 essential elements is further defined by a summary statement, which is then followed by between 7 and 18 quality indicators. The four-point rating scale used in the QSA is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards.

Methodology: While most essential elements are assessed through observation, the more organizationally focused elements such as administration, measuring outcomes/evaluation and program sustainability/growth are assessed primarily through document review. Users are not encouraged to combine scores for each element to determine a global rating, because the tool is intended for self-assessment only.

Technical Properties: Beyond establishing face validity, the instrument's psychometric properties have not been researched.

User Considerations:

Ease of Use

• Practitioners led the development of the QSA; language and format are clear and user-friendly.
• The tool is free and downloadable and includes an overview and instructions.
• The tool is scheduled for a revision, which will target length and guidance on determining ratings.

Additional Supports

• The New York State Afterschool Network has developed a user guide, which provides a self-guided walk-through of the tool.
• Programs can contact the New York State Afterschool Network to receive referrals for technical assistance in using the instrument.
• Programs are encouraged to use the QSA in concert with other formal or informal evaluative efforts.
• NYSAN trainings are organized around the 10 elements featured in the instrument, so practitioners can easily find professional development opportunities that connect to the findings in their self-assessment.

For More Information: www.nysan.org


Promising Practices Rating Scale (PPRS)

Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.

Overview: The Promising Practices Rating Scale (PPRS) was developed in the context of a study of the relationship between participation in high-quality after-school programs and child and youth outcomes. The tool was designed to help researchers document type of activity, the extent to which promising practices are implemented within activities, and overall program quality. The PPRS builds directly on earlier work by Deborah Lowe Vandell and draws upon several other observation instruments included in this report.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–8

Relevant Settings: Varied school- and community-based after-school programs.

Content: The PPRS focuses primarily on social processes occurring at the program level (other tools in the PP assessment system are available to collect other kinds of information). The tool addresses activity type, implementation of promising practices and overall program quality. The practices at the core of the instrument include supportive relations with adults, supportive relations with peers, level of engagement, opportunities for cognitive growth, appropriate structure, over-control, chaos and mastery orientation.

Structure: The first part of the instrument focuses on activity context. Observers code things like activity type, space, skills targeted, and number of staff and youth involved. Observers then add a brief narrative description of the activity. The core of the PPRS is where observers document to what extent certain exemplars of promising practice are present in the program.

Methodology: All items in the scale are addressed through observation, with an emphasis first on activities and then more broadly on the implementation of promising practices by staff within the program. Each area of practice is divided into specific exemplars (positive and negative) with detailed indicators. Ratings are assigned at the overall practice level using a four-point scale. Observers then review their ratings of promising practices across multiple activities and assign an overall rating for each practice area and the overall program.

Technical Properties: Strong evidence for score distribution and internal consistency of the average overall score has been established. Promising but limited evidence of moderate interrater reliability and predictive validity has also been established.

User Considerations:

Ease of Use

• Free and available for use.
• The PPRS was developed with a research audience in mind. The manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• In the study the PPRS was developed for, formal observation time totaled approximately two hours per site, with additional hours spent reviewing notes and assigning ratings.

Available Supports

• Training has only been made available in the context of a specific study.
• Data collection, management or reporting has only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PPRS (e.g., physical environment questionnaire; staff, student and parent surveys).

For More Information: http://childcare.gse.uci.edu/des3.html


Quality Assurance System® (QAS)

Developed by Foundations, Inc.

Overview: The Quality Assurance System® (QAS) was developed to help programs conduct quality assessment and continuous improvement planning. Based on seven "building blocks" that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their particular focus. The QAS focuses on quality at the "site" level and addresses a range of aspects of quality, from interactions to program policies and leadership.

Primary Purpose: Program Improvement

Program Target Age: Grades K–12

Relevant Settings: A range of school- and community-based programs.

Content: The various components of quality that the QAS focuses on are considered "building blocks." The seven core building blocks include: program planning and improvement; leadership; facility and program space; health and safety; staffing; family and community connections; and social climate. Three additional "program focus building blocks" that reflect particular foci within programs are also available.

Structure: The QAS is divided into two parts. Part one – program basics – includes the seven core building blocks. For each, users are given a brief description of the importance of that aspect of quality, and then the building block is further subdivided into between five and eight elements, each of which gets rated. Part two of the tool – program focus – consists of the three additional building blocks, and its structure parallels that of part one. Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4).

Methodology: Filling out the QAS requires a combination of observation, interview and document review. Users follow a five-step process for conducting a site visit and collecting data, which includes observation of the program in action and a review of relevant documents. Once ratings for each element are entered into the computer, scores are generated for each building block – rather than a single score for the overall program – reflecting the tool's emphasis on identifying specific areas for improvement.

Technical Properties: Beyond establishing face validity, research about the instrument's psychometric properties has not been conducted.

User Considerations:

Ease of Use

• The QAS is flexible and customizable, with built-in user-friendly features.
• The instruction guide walks the user through basic steps for using the system.
• The $75 annual licensing fee covers two assessments and cumulative reports.
• Multi-site programs can generate site comparison reports.

Available Supports

• Foundations, Inc. offers online sessions and in-person trainings.
• Once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from staff.
• Programs that wish to have their assessment conducted by trained assessors can purchase this service under contract with Foundations, Inc.
• The QAS is available in a Web-based format, allowing users to enter data and immediately generate basic graphs and analyses.

For More Information: http://qas.foundationsinc.org/start.asp?st=1


DevelopedbyFrankPorterGrahamChildDevelopmentInstitute&ConcordiaUniversity,Montreal

Overview:TheSchool-AgeCareEnvironmentRatingScale(SACERS),publishedin1996andupdatedperiodically,isoneofaseriesofqualityratingscalesdevelopedbyresearchersattheFrankPorterGrahamChildDevelopmentInstitute.SACERSfocuseson“processquality”orsocialinteractionswithinthesetting,aswellasfeaturesrelatedtospace,scheduleandmaterialsthatsupportthoseinteractions.TheSACERScanbeusedbyprogramstaffaswellastrainedexternalobserversorresearchers.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation

Program Target Age: Grades K–8

Relevant Settings: A range of program environments including child care centers, school-based after-school programs and community-based organizations.

Content: SACERS is based on the notion that quality programs address three "basic needs": protection of health and safety, positive relationships and opportunities for stimulation and learning. The seven sub-scales of the instrument include space and furnishings; health and safety; activities; interactions; program structure; staff development; and a special needs supplement.

Structure: The SACERS scale includes 49 items, organized into seven sub-scales. All 49 items are rated on a seven-point scale, from "inadequate" to "excellent." Concrete descriptions of what each item looks like at different levels are provided. All of the sub-scales and items are organized into one booklet that includes directions for use and scoring sheets.

Methodology: While observation is the main form of data collection, several items are not likely to be observed during program visits. Raters are encouraged to ask questions of a director or staff person in order to rate these and are provided with sample questions. For many items, clarifying notes help the user understand what they should be looking for. Observers enter scores on a summary score sheet, which encourages users to compile ratings and create an overall program quality score.
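The roll-up from item ratings to an overall score can be illustrated with a short sketch. This is a hypothetical example only — the sub-scale names are drawn from the tool but the ratings are invented, and it assumes a simple unweighted mean, which may not match the official SACERS scoring rules:

```python
# Sketch: compiling SACERS-style item ratings (1-7) into sub-scale
# averages and an overall program quality score. Hypothetical data;
# the real instrument has 49 items across seven sub-scales.

def subscale_scores(ratings_by_subscale):
    """Return the mean item rating for each sub-scale."""
    return {
        name: sum(items) / len(items)
        for name, items in ratings_by_subscale.items()
    }

def overall_score(ratings_by_subscale):
    """Average the sub-scale means into one overall score."""
    means = subscale_scores(ratings_by_subscale)
    return sum(means.values()) / len(means)

ratings = {
    "Space and Furnishings": [5, 6, 4],   # each value is one item's 1-7 rating
    "Health and Safety":     [7, 6, 6],
    "Interactions":          [4, 5, 5, 6],
}

print(subscale_scores(ratings))
print(round(overall_score(ratings), 2))
```

A program would complete this kind of roll-up on the paper summary score sheet; the sketch simply shows the arithmetic involved.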

Technical Properties: Evidence for interrater reliability and internal consistency is strong by general standards. Convergent and concurrent validity evidence is limited but promising.

User Considerations: Ease of Use

• Accessible format and language.

• Includes full instructions for use, clarifying notes and a training guide.

• The cost of the SACERS booklet is $15.95.

• Suggested time needed: three hours to observe a program and complete the form.

• Guidance is offered on how to sample, observe and score to reflect multiple activities within a program.

Available Supports

• Additional score sheets can be purchased in packages of 30.

• Three- and five-day trainings on SACERS structure, rationale and scoring.

• Guidance on how to conduct your own training is provided in the booklet.

• Training to reliability takes 4-5 days, with reliability checks throughout.

• Access to a listserv through the Frank Porter Graham Institute Web site.

• Large-scale users can now use commercial software to enter/score data.

• With a Web-based reporting system, individual assessments can be routed to a supervisor for quality assurance and feedback, and aggregate analyses and reporting can be provided.

For More Information: www.fpg.unc.edu/~ecers/


Developed by the David P. Weikart Center for Youth Program Quality4

Overview: The Youth Program Quality Assessment (YPQA) was developed by the High/Scope Educational Research Foundation and has its roots in a long lineage of quality measurement rubrics for pre-school, elementary and now youth programs. The overall purpose of the YPQA is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff. While some structural and organizational management issues are included in the instrument, the YPQA is primarily focused on what the developers refer to as the "point of service" – the delivery of key developmental experiences and young people's access to those experiences.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation

Program Target Age: Grades 4–12

Relevant Settings: Structured programs in a range of school- and community-based settings.

Content: Because of the focus on the "point of service," the YPQA emphasizes social processes – or interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.

Structure: The YPQA assesses seven domains using two overall scales. Topics covered include engagement, interaction, supportive environment, safe environment, high expectations, youth-centered policies and practices and access.

Methodology: Items at the program offering level are assessed through observation. Organization-level items are assessed through a combination of guided interview and survey methods.

The scale used throughout is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, concrete descriptors illustrate what a score of 1, 3 or 5 looks like.

Technical Properties: Evidence for score distributions, test-retest reliability, convergent validity and validity of scale structure is strong. Evidence for interrater reliability is mixed, and evidence is promising but limited in terms of internal consistency and concurrent validity.

User Considerations: Ease of Use

• Language and format of the tool are accessible.

• Administration manual with definitions of terms and scoring guidelines.

• The tool can be ordered online.

• Raters must observe for roughly four hours to generate sound data.

• Observers can be trained to generate reliable observations in two days.

Available Supports

• One-day basic and two-day intermediate YPQA training are available, with additional technical assistance available upon request.

• Youth development training that is aligned with tool content is available.

• Online "scores reporter" and a Web-based data management system are available.

For More Information: www.highscope.org/content.asp?contentid=117

4 The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.


At-a-glance descriptions of the ten assessment tools are provided in the previous section. Here we offer more detailed descriptions. Each write-up follows a common format.

Assessing Afterschool Program Practices Tool (APT) – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Public/Private Ventures

Out-of-School Time Observation Tool (OST) – Policy Studies Associates, Inc.

Program Observation Tool (POT) – National AfterSchool Association

Program Quality Observation Scale (PQO) – Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA) – New York State Afterschool Network

Promising Practices Rating Scale (PPRS) – Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System (QAS) – Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS) – Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA) – David P. Weikart Center for Youth Program Quality


Purpose and History
The Assessing Afterschool Program Practices Tool (APT) is a set of observation and questionnaire tools designed to help practitioners examine and improve what they do in their after-school program to support young people's learning and development. It was specifically designed to examine those program practices that research suggests may be related to key youth outcomes (e.g., behavior, initiative, social relationships) and it is a core component of the Afterschool Program Assessment System (APAS).5

The research version of the APT (the APT-R) was developed in 2003-2004 for use in the Massachusetts Afterschool Research Study (MARS). Based on extensive field testing by grantees as well as some additional testing of the scales using MARS data, a more user-friendly self-assessment version of the tool was developed during 2005 for use by the Massachusetts Department of Elementary and Secondary Education 21st Century Community Learning Centers (21st CCLC) grantees and other programs interested in quality assessment. The self-assessment version is the focus of this description.

The instrument can be used to measure quality in a wide variety of program models that serve elementary and middle school students during the non-school hours. In addition to serving as a self-assessment tool, the APT defines desirable program practices in concrete terms that can be used to communicate with staff and others, help stimulate reflection and discussion regarding program strengths and weaknesses, guide the creation of professional development priorities and improvement goals and help gauge progress toward those goals.

The APT focuses on the experiences of youth in programs and is not intended to evaluate individual staff performance or produce definitive global quality scores for programs. While the APT includes a four-point rating scale, assigning ratings is not required; programs are encouraged to use the tool in ways that will yield the most useful information to guide program improvement.

Content
The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories:

• Program climate

• Relationships

• Approaches and programming

• Partnerships

• Youth participation

While it does address some broader organizational policy issues (e.g., connections with schools, staff-youth ratios), it was designed to focus primarily on things that program staff have control over and that are relevant across a range of different organizational contexts and facilities (e.g., schools, community centers).

The APT emphasizes some aspects of settings more than others and in particular places a strong emphasis on relationships, as research has shown that relationships have the greatest impact on youth outcomes. The primary focus is therefore on social processes – or interactions between people within the program. Several items help users observe and assess youth-adult relationships and interactions, as well as peer interactions and connections with families and school personnel. To a lesser extent than social processes, the APT also addresses programs' human and material resources and how those resources are organized or arranged within the setting.

In developing the APT, the authors reviewed relevant literature to identify program features that relate to outcomes and also looked at existing definitions and measures of program quality. One such definition commonly referenced in the field is the National Research Council's features of positive developmental settings, a framework which itself is focused primarily on social processes. Items on the APT address each of the eight features identified by the National Research Council (2002).

5 The APT was designed to address program practices that research suggests lead to youth outcomes measured by the Survey of Afterschool Youth Outcomes (SAYO) – an evaluation system developed by NIOST under contract with the MA Department of Education. The SAYO includes pre- and post-participation surveys for teachers and program staff and measures things like behavior in the classroom and program, initiative, engagement in learning, relations with peers and adults, homework, analysis and problem-solving and academic performance. For more information, see www.niost.org/training/APASbrochureforweb.pdf

Structure and Methodology
The 15 program features addressed by the APT are measured by one or both of two tools, the observation instrument (APT-O) and questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.

Although the 15 program features drive the content of the tool, the APT-O is organized by daily routine. Five sections are intended to follow what the developers consider a typical program day. While these sections most closely reflect the daily routine in elementary and middle school 21st CCLC programs, the tool is designed to be flexible and users are encouraged to use whichever sections are most relevant in whatever order makes sense. These five sections include both informal program times (arrival, transitions, pick-up) and formal program times (homework, activities).

Within each section, sub-sections focus on particular practices and behaviors during those time periods (for example, sub-sections under "homework" include homework organization, youth participation in homework time, staff effectively manage homework time and staff provide individualized homework support). Finally, each sub-section includes between two and eight specific items that can be observed and rated.

An important structural aspect of the APT is its explicit connection to a specific youth outcome measurement tool – the Survey of Afterschool Youth Outcomes (SAYO). Programs can use APT findings to look at how they may be contributing to specific outcome areas included in the SAYO, and users are provided with charts connecting particular APT items to specific outcome areas. Despite this linkage, the APT is also useful as a stand-alone tool.

Recently, a new set of resources, the APT-SAYO Links, was developed as quick guides for practitioners to understand the research base connecting APT program practices and specific SAYO outcome areas.

Program Feature                                            APT-O   APT-Q

Welcoming & Inclusive Environment                            ✓       ✓
Positive Behavior Guidance                                   ✓
High Program & Activity Organization                         ✓
Supportive Staff-Youth Relationships                         ✓       ✓
Positive Peer Relations                                      ✓
Staff/Program Supports Individualized Needs & Interests      ✓       ✓
Staff/Programming Stimulates Engagement & Thinking           ✓       ✓
Targeted SAYO Skill Building/Activities                      ✓       ✓
Youth are Positively Engaged in Program/Skill Building       ✓
Varied/Flexible Approaches to Programming                    ✓       ✓
Space is Conducive to Learning                               ✓
Connections with Families                                    ✓       ✓
Opportunities for Responsibility, Autonomy & Leadership              ✓
Connections with Schools                                             ✓
Program Supports Staff                                               ✓


The APT-O
The APT-O rating scale, should users decide to assign ratings to their observations, is a four-point scale designed to answer the question, "How true is it that this statement describes what I observed?" Definitions of each point on the scale differ slightly depending on whether you are observing a program or staff practice vs. youth behaviors. A detailed description of the rating scale, as well as other rating options and considerations, is included in the instruction manual. Some "conditional" items are included, which are only to be rated should they occur (e.g., when youth behavior is inappropriate, staff use simple reminders to redirect behavior).

Raters are asked to assign a 1-4 (or N/A) rating to each of the individual items within each sub-section. For most items, a specific description of what a "1" looks like is provided. The wording of the item itself constitutes a "4," since the question driving the ratings is, "How true is this?" The instruction manual provides general guidance (not item-specific) for how to think about 2s (e.g., the desired practice was only partially met, there was some minor evidence of negative expressions of the practice, or the practice was observed infrequently) and 3s (observed practice mostly reflected desired practice, or the desired practice was observed but perhaps not at all expected times). This year, NIOST will begin developing more specific anchors for 2s and 3s on the APT rating scale for each item. These new anchors will be field-tested, but not psychometrically tested, this academic year.

Sample items from the APT-O "Arrival Time" section (each item is rated on the "How true?" scale – 1, 2, 3, 4 or N/A – with space for notes):

1. There is an established arrival routine that seems familiar to staff and youth.

2. Activities are available to youth to become engaged in as they arrive (may include snack). (e.g., Wide variety of activities are available to arriving youth.)
1 = There are no activities available for arriving youth. Youth have nothing to do (e.g., stand around waiting for staff to begin programming).

3. Staff acknowledge children/youth when they arrive. (e.g., Offer a greeting, slap hands, ask "How's it going?", exchange hellos, etc.)
1 = Staff do not acknowledge any arriving youth.

4. Staff engage in 1:1 conversations with youth. (e.g., Talk about youth's day, ask about something they brought or made.)
1 = Staff are not seen conversing or interacting with individual youth.

The APT-Q
The APT Program Questionnaire (APT-Q) helps programs reflect upon the aspects of quality that are not necessarily observed, such as program planning, frequency of offerings, and connections with parents and schools. As is the case with the APT-O, flexibility is built into the questionnaire component of the tool. Users are encouraged to assign ratings only to the extent it is useful to do so and to review the various sections of the questionnaire before use to select those that best match a program's priorities. The APT-Q, which is divided into eight sections (see box), provides opportunities to rate the consistency and/or frequency of certain practices. It also provides lists of specific program practices that support various quality features (e.g., ways to create a welcoming and inclusive environment), encouraging users to check those that are in use in the program but at the same time offering a broad range of concrete, positive practices that can encourage program development and innovation.

The APT-Q includes three different rating scales. A four-point "how consistently" scale (rarely/never; once in a while/sometimes; often/a lot of the time; almost always/always) is used with the section focused on program planning and the use of specific planning practices. A six-point "how frequently" scale (rarely/never; a few times per year; about once per month; about once per week; more than once per week; usually every day) is used for two sections that look at program offerings and to what extent the program promotes responsibility, autonomy and leadership. A simpler four-point "how frequently" scale (about once per year; several times per year; about once per month; weekly or more often) is used for the sections that address how the program connects with families and schools.
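For quick reference, the three APT-Q rating scales just described can be encoded as ordered label lists. This is an illustrative encoding only — list positions reflect the order given in the text, not official point values:

```python
# The three APT-Q rating scales, encoded as ordered label lists.

HOW_CONSISTENTLY = [  # four-point scale, used for the program planning section
    "rarely/never",
    "once in a while/sometimes",
    "often/a lot of the time",
    "almost always/always",
]

HOW_FREQUENTLY_6 = [  # six-point scale: offerings; responsibility, autonomy and leadership
    "rarely/never",
    "a few times per year",
    "about once per month",
    "about once per week",
    "more than once per week",
    "usually every day",
]

HOW_FREQUENTLY_4 = [  # simpler four-point scale: connections with families and schools
    "about once per year",
    "several times per year",
    "about once per month",
    "weekly or more often",
]
```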

Technical Properties
The psychometric information that is available on the APT comes from the version used in the Massachusetts Afterschool Research Study (MARS), conducted by the Intercultural Center for Research in Education and the National Institute on Out-of-School Time (2005).6 The extent to which trained raters agree when observing the same program at the same time, or interrater reliability, was moderate, and preliminary evidence for concurrent and predictive validity suggests the APT-R yields accurate information about the concepts it measures. As mentioned in the previous section, the anchors for the 2 and 3 ratings will be developed with the intent of improving interrater reliability. If funding for further testing comes through, NIOST will be re-testing interrater reliability.

While the current self-assessment version of the tool has no psychometric data, NIOST is currently seeking funding to conduct further psychometric testing of the APT and SAYO tools, including the extent to which the two tools can work together as an integrated measurement system and allow practitioners to target key practices and track expected outcomes.

Interrater Reliability
Researchers examined interrater reliability for 78 programs in the MARS study and found that paired raters agreed on their ratings (within one score point) 85 percent of the time. If the range of response options for the research and self-assessment versions is similar, then we can expect, simply by chance, agreement between raters to be at least 62.5 percent, yielding a maximum kappa score of 0.60 (a high kappa is generally considered 0.70). Although interrater reliability has not yet been established for the self-assessment version of the APT, existing data suggest that agreement was moderately better than chance.

APT Program Questionnaire Sections

1. How you plan and design program offerings

2. Your program offerings

3. How your program promotes responsibility, autonomy and leadership

4. How your program creates a welcoming and inclusive environment

5. How your program supports youth as individuals

6. How your program connects with families

7. How your program partners with schools to support youth

8. How your program supports and utilizes staff to promote quality

6 The developers have conducted a detailed comparison of the two versions. Roughly half of the APT-R items in the current self-assessment version appear exactly as they were worded in the research version. Roughly one quarter of the original items were taken out, and roughly one quarter were revised slightly.
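The kappa arithmetic in the interrater reliability discussion above can be checked directly. Cohen's kappa corrects observed agreement for agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). Using the figures cited there (85 percent observed agreement, at least 62.5 percent chance agreement) reproduces the 0.60 ceiling:

```python
# Cohen's kappa: agreement corrected for chance.
# p_o = observed agreement rate, p_e = agreement rate expected by chance.

def cohens_kappa(p_o: float, p_e: float) -> float:
    return (p_o - p_e) / (1.0 - p_e)

# MARS figures: paired raters agreed (within one point) 85% of the time;
# chance agreement is at least 62.5%, which caps kappa at 0.60.
print(round(cohens_kappa(0.85, 0.625), 2))  # 0.6
```

Because p_e is a lower bound here, 0.60 is the most favorable kappa the 85 percent agreement figure can support, which is why the authors describe the evidence as moderate.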

Face Validity
The developers received extensive and systematic feedback on the APT, about usability as well as perceived validity. Twenty-six grantees representing over 100 program sites responded to a set of questions about the tool, most staff participated in focus groups and in-depth interviews were conducted with 12 grantees. Positive feedback from this range of key stakeholders suggested that the items make intuitive sense for measuring program quality. However, this is the weakest form of validity, as it is not based on empirical evidence.

Concurrent Validity
MARS authors compared findings from the APT-R to observations of program characteristics and found that certain items were related to program and staff characteristics in expected ways. For example, programs that scored high on items related to staff engagement and engaging activities tended to have smaller group sizes, stronger connections with schools, a higher staff-to-child ratio and a higher percentage of staff with college degrees. Programs that scored high on items relating to youth engagement tended to be well-paced, organized with clear routines, had a higher staff-to-child ratio and a higher percentage of staff with college degrees. Better family relations were related to stronger connections between programs and parents and the community. Programs that offered high quality homework time tended to offer more project-based learning activities, were more organized with clear routines, had a higher staff-to-child ratio and had more staff that were certified teachers. NIOST's proposed study includes concurrent validity testing of the APT using the YPQA.

This evidence of concurrent validity should be regarded as preliminary because many items were not related to program characteristics. For example, youth engagement in programs was unrelated to smaller group sizes, and engaging and challenging activities were not related to programs being well-paced and organized with clear routines. Because the authors did not explicitly identify which relationships were most important and which findings ran contrary to their expectations, we cannot be certain that the observed findings indicate unequivocally strong concurrent validity.

MARS authors also examined the association between APT-R scales and five student characteristics that would be expected from theory and prior research: students' improvement in their (1) behavior, (2) initiative, (3) homework, (4) relations with peers, and (5) relations with adults. The authors found that the APT-R Youth Engagement scale was related to all student characteristics in expected ways except Relations with Adults. Higher scores on APT-R Challenging Activities were related to lower scores on Relations with Peers, and higher scores on APT-R Relations with Families were marginally related to higher scores on Relations with Adults, but both APT-R scales were unrelated to other student characteristics. The APT-R Homework Time scale was unrelated to all student characteristics. MARS authors point out that the programs in their sample did not score high on Challenging Activities or Homework Time, indicating that the results could be different in a sample with more diverse programs on these dimensions. In addition, correlational evidence did not show any significant relationships between student characteristics and the APT-R scales concerning Appropriate Space and Staff Engagement. Given these findings, and keeping in mind that some, but not all, items used in the APT-R appear in the self-assessment version of the tool, this evidence should be regarded as preliminary, with additional evidence needed to firmly establish the concurrent validity of the APT.

Validity of Scale Structure
The authors created several scales in the APT-R using a statistical technique known as factor analysis. However, the extent to which these findings can be generalized to the self-assessment version is unclear given the differences between the two instruments. Testing of the validity of the scale structure is planned in NIOST's upcoming study.


User Considerations
Ease of Use
To date, the APT has been used by 36 school districts offering 21st CCLC programs in over 150 sites, and in a four-city pilot (Charlotte; Boston; Middlesex County, New Jersey; and Atlanta). In response to its broad use, NIOST has developed many products related to the APT that make it both user-friendly and adaptable to a variety of systems and settings.

One such product is a package of tools, including the APT, the SAYO and a data management tracking system which tracks individual youth's program participation (an example of such a system is Kidtrax). When combined, these tools are referred to as the Afterschool Program Assessment System (APAS). The system can be used to support on-line data collection, management, analysis and reporting, allowing programs to link quality data from the APT and youth outcomes data from the SAYO with information on daily attendance and program participation in a comprehensive, flexible and integrated fashion.

NIOST has developed other products to support the use of the APT in conjunction with its other tools as part of a full assessment system (although the APT may be used as a stand-alone tool). For example, in addition to the SAYO, NIOST has created a series of SAYO-APT Links or "cheat sheets" to support programs' efforts to link quality and outcomes. The SAYO-APT Links describe each SAYO youth outcome area and connect it with the related program practices found in the APT. Another product is a detailed, practitioner-friendly training notebook that guides programs in the use of the APT and related tools. Interested programs should contact NIOST for training options and costs; trainings are customized to meet the needs and interests of organizations.

In another development, NIOST is finalizing an online Youth Survey (SAYO-Y) designed to complement the APT and SAYO. The survey measures youth's program experiences in five areas (e.g., inclusive environment, choice & autonomy); youth's sense of competence in six areas (e.g., math, reading); and youth's future planning and expectations. This survey has been tested with over 6,000 Massachusetts youth participating in 21st CCLC programs and will be fully piloted again this year.

Flexibility is a hallmark of the APT, so although the developers provide some guidance as to when to conduct observations, for how long, etc., they emphasize that the APT-O can be used in many different ways and that decisions about how many observers, how many observations and whether to use numerical ratings should be driven by what users intend to do with the data in the end. The general intention behind the design is for an observer to observe one full program session (typically a full afternoon), taking notes during the observation and using time immediately afterwards to complete all relevant sections, including an open-ended "impressions" section at the end of the tool. Many programs using the APT as a self-assessment, however, have preferred to obtain at least two days of observation data. In its training sessions, NIOST guides participants in understanding how they may want to use the tools and, therefore, how they will tailor them to support their evaluation and assessment objectives.

Available Supports
Training on both the APT itself and processes for guiding program improvement is available through NIOST. Most recently, a two-day APAS training module has been developed to prepare site directors and others to use the SAYO and APT. Training for the APT alone is one full day. NIOST also provides a Quality Advisor training to help coaches and other technical specialists use the APT to work with programs. An online tutorial is also available to prepare sites to use the SAYO outcome tool.

In mid-2007, packaging and pricing information about training on the instrument became available for organizations that are interested but not already affiliated with the APT through statewide efforts in Massachusetts.

In The Field
The City of Cambridge, MA Agenda for Children is a city-wide out-of-school time initiative bringing together city departments, community-based organizations, businesses, funders and residents to positively impact the lives of Cambridge youth. Specifically, the initiative works to improve access to and the quality of Cambridge out-of-school time programs. In pursuit of that goal, the Cambridge Agenda for Children has been refining a multi-site program improvement process over three years, using the APT program improvement tool in that effort. Their Self-Assessment Support initiative supports programs to engage in observation and self-reflection, and to take targeted action based on what they see.

The APT gives program coordinators, site coordinators and front-line staff a common language for talking about their goals and detailed descriptions of effective practice.

For More Information
Information about the APT is available online at: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta

Contact
Kathy Schleyer, Training Operations Manager
National Institute on Out-of-School Time
Wellesley Centers for Women
Wellesley College
106 Central Street
Wellesley, MA
[email protected]


Purpose and History
The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies on the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time.

Under the CORAL initiative, after-school programming was provided to elementary school aged children in five cities in California. In each of these cities, programming was different and consisted of a variety of activities ranging in focus from science-based programs to art and cultural enrichment programming. All CORAL programs included the common core element of Balanced Literacy programming.

The observation tool was used in two ways: first, to observe Balanced Literacy instruction in CORAL after-school programs, and second, to observe the integration of literacy programming in a variety of other activities including homework help and academic enrichment programs ranging in focus from science to art and cultural enrichment. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.

The CORAL observation tool has five components:

• an activity description form, used to gather descriptive information on the observed activity prior to the observation;

• an activity characteristic form, completed during the observation, used to collect general information about the activity and type of instruction (i.e., length of time, number of participants, number of staff, teaching methods, etc.);

• an activity checkbox form that is divided into the five overarching categories for observation – this is the primary method for recording information during the activity;

• the activity scales form, completed after the observation, that is used to rate each of the constructs observed in the activity;

• and the overall assessment component (completed after three observations for literacy programs and after two observations for non-literacy programs), which measures both aspects of the activity and participant improvement in skill areas.

Because the CORAL initiative emphasized best practices in youth development (including positive adult/youth relationships and ongoing youth participation), P/PV developed the CORAL tool based on an observation tool used for evaluation of the San Francisco Beacon initiative. P/PV also designed a similar observational tool containing the updated academic components from the CORAL tool for the Philadelphia Beacon initiative.

Content
The CORAL observation tool was designed to help researchers collect data through an ongoing program observation process to measure the connection between the quality of the program, fidelity to the Balanced Literacy Model, and the academic outcomes of participants. As a result, the main sections of the tool focus on five components of quality: adult-youth relations, effective instruction, peer cooperation, behavior management, and literacy instruction. Each aspect of quality has several elements of quality that are rated and captured in subcategories, which are called constructs. A number of these core constructs (such as behavior management, youth/adult relationships, peer cooperation, etc.) are relevant for both formal program settings as well as informal, adult-supervised settings. The characteristics rated within each of the constructs are listed on page 38.

Because the focus of the CORAL observation tool is on fidelity to the Balanced Literacy Model – a structured literacy approach that uses a variety of modalities aimed at developing competent readers – it tends to focus on social processes and skill development in support of literacy gains, and less on program resources or the organization of those resources within a program. Literacy-focused activities are assessed for their fidelity to the model and their association with change in participant interest and motivation. Non-literacy-focused activities are assessed for their integration of literacy skills into the curriculum.

Structure and Methodology
The first three components of the CORAL observation tool – the activity description form, the activity characteristics form and the activity checkbox form – are focused on describing the activity as well as participant and staff behavior. The activity description form is completed before (or after) the activity, based on information gleaned from the instructor in a 10-15 minute conversation. The activity characteristics form, where general information is recorded, is completed during the first ten minutes of the activity. The activity checkbox form is a running list of the observations of behavior and activity characteristics, and is filled out by the observer while the activity is ongoing. In the activity checkbox form, observers can choose between several examples of positive and negative behavior for each of the constructs above. If they observe a behavior not captured in the examples, it can be recorded in the notes section and considered in the scoring after the observation is completed. Observations are conducted over a 90-minute period (in CORAL, the literacy activity took place over a 90-minute period, so the observation took place over its entire duration in order to be able to assess fidelity).

Each construct is rated on a five-point scale, with 1 representing the lowest score (definitely needs improvement) and 5 the highest (outstanding). The CORAL developers suggest that 5s be awarded sparingly, as the intent is to indicate that there is no room for improvement. Meanwhile, a 1 rating can be given for several reasons, including observed negative staff behaviors or an activity not fitting the appropriate construct. Activity ratings are assigned after each 90-minute observation, for a cumulative total of three observations for a literacy program and two for a non-literacy activity.

Observers complete the activity scales form, including the descriptive narrative, within 24 hours of each 90-minute observation. The checkbox form does not translate into a one-to-one score on the scales form. Instead, observers are required to consider the activities recorded on the checkbox form along with the duration/frequency of the observed behavior, the quality of the behavior and the importance of the behavior to the activity when deriving a score for each construct. Additionally, observers complete descriptive narrative summaries (using sample narratives as a guide) which contain the most informative aspects of the activity.

Characteristics rated within each construct:

Adult-Youth Relationships
• Adult support for the activity
• General adult responsiveness
• Emotional quality of the relationship

Behavior Management
• Appropriateness of behavioral demands
• Adult management
• Staff's inclusiveness of youth

Instruction
• Clarity
• Organization
• Motivation
• Challenge
• Connection to other material
• Connection between youth & material
• Cultural awareness
• Responsiveness to English language learners

Literacy
• Literacy-rich environment
• Read aloud
• Book talk/discussion/shout out
• Writing
• Independent reading
• Skill development activities/games
• Build vocabulary/spelling
• Connections between youth & text

Peer Cooperation
• Cooperative activity

Upon completion of the required series of observations, the overall assessment form is used to rate the overall quality of the activity. The form contains 11 narrative questions in which observers describe the strengths, weaknesses and areas for improvement of the activity, the cultural awareness of the instructor, modifications for varying linguistic needs, and the classroom environment. Observers are also asked to record the improvements they observed; this question draws on a pre-selected list of skills ranging from academic to visual arts and performance.

The developers suggest three or four hours for completing the rating scales, related narratives and the overall assessment.

For their research purposes, Public/Private Ventures additionally required observers to write a narrative description for each component and used the descriptions as part of their quantitative analysis and as a method of compiling "best practices" that exemplified specific constructs. In addition, observers were asked to identify in a narrative the aspects of each of these skills in which participants showed improvement.

Technical Properties
Technical evidence for the CORAL observation tool comes from its use during the evaluation of the CORAL Initiative, a two-year study that included 56 observations in 23 after-school programs in the first year and 43 observations in 21 after-school programs in the second year. The 90-minute literacy activities were observed two to four times each during the first year of the evaluation, and a minimum of three times each during the second. Evidence is drawn from the study's initial report after the first year (Arbreton, Goldsmith & Sheldon, 2005) as well as the final report, which draws on evidence from both years (Arbreton et al., 2008). Data from the non-literacy activity observations were used to create qualitative profiles of the activities, and no statistical analysis was conducted. For the literacy activities in the CORAL evaluation, statistical analysis was conducted to identify the relationship between program quality and participant academic gains. The technical properties described below pertain to P/PV's analysis of literacy observation data only.

Some users may be interested in summarizing data from the CORAL into scales. In their analysis from the CORAL initiative (Arbreton, Goldsmith, & Sheldon, 2005; Arbreton et al., 2008), the developers created four scales using items from the instrument's Activity Scales Form: (1) Adult-Youth Relations (average of items 1 through 3); (2) Instructional Quality (average of items 5 through 8); (3) Group Management (average of items 16 and 17); and (4) Connection Between Youth and Activities (average of items 9 and 10).

The developers also created an Overall Lesson Rating, which is an average of scores from three scales (Adult-Youth Relations, Instructional Quality & Group Management) and several items: Read Aloud (item 19), Book Talk (item 20), Writing (item 21), Independent Reading (item 22) & Connection to Youth (item 10).

Sample item:

Q8. Challenge (++)
The staff:
• encourage youth to push beyond their present level of competency.
• try to sustain the motivation of youth who are discouraged or reluctant to try.
• continuously move to the next step as soon as youth progress.
• reinforce and encourage youth's efforts in order to maintain their involvement.

Q8. Challenge (––)
The staff:
• discourage youth who tried to push beyond their present level of competency.
• miss opportunities to sustain the motivation of youth who are discouraged or reluctant to try.
• miss opportunities to move to the next step as soon as youth progressed (e.g., pace was too slow).
• do not reinforce or encourage youth's efforts in order to maintain their involvement.

N/A because:
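Since each of the scales described above is simply the mean of a few item ratings, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the developers' code; the item numbers follow the report's description of the Activity Scales Form, and the example ratings are invented.

```python
# Minimal sketch of the CORAL scale construction described above:
# each scale is the average of a handful of items from the Activity
# Scales Form. The `ratings` dict below holds invented example data,
# keyed by item number (each rated on the 1-5 scale).

def scale_score(ratings, items):
    """Average the listed item ratings into one scale score."""
    return sum(ratings[i] for i in items) / len(items)

SCALES = {
    "Adult-Youth Relations": [1, 2, 3],
    "Instructional Quality": [5, 6, 7, 8],
    "Group Management": [16, 17],
    "Connection Between Youth and Activities": [9, 10],
}

ratings = {1: 4, 2: 5, 3: 4, 5: 3, 6: 4, 7: 4, 8: 3,
           9: 2, 10: 3, 16: 5, 17: 4}

scores = {name: scale_score(ratings, items) for name, items in SCALES.items()}
# e.g. scores["Group Management"] == 4.5  (average of items 16 and 17)
```

The Overall Lesson Rating would then be a further average over three of these scale scores plus the individual literacy items (19-22 and 10).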

Score Distributions
Items and scales should be able to detect meaningful differences across settings, and therefore should exhibit a range of scores. Score distributions were examined for the four domain-specific scales as well as six individual items representing the Balanced Literacy strategies measured in the CORAL tool. Most items exhibited good score distributions, although the item "Skill Development Activities" (item 24) was on the low end of the scale for both years of measurement (average scores were 1.5 and 1.8, respectively, on a scale of 1 to 5).

Internal Consistency
Responses to items comprising scales should be highly related, suggesting that the items form meaningful domains. The internal consistency of the Overall Lesson Rating was quite strong, with an alpha of .94. Of the four scales measuring specific domains from the CORAL Initiative, three (Adult Support, Instructional Quality, and Group Management) exhibited excellent internal consistency, with Cronbach's alphas ranging from .84 to .88 (exceeding the recommended value of .70), suggesting that the scales are cohesive and composed of related items. However, the Connection to Youth and Activities scale was less cohesive (alpha = .54), suggesting that the two items composing this scale are only moderately related and that more items may be needed to fully capture this domain.
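Cronbach's alpha, the statistic reported here, compares the sum of the individual item variances to the variance of the total score: when items move together, the total varies much more than the items do individually and alpha approaches 1. A minimal Python sketch, using invented ratings rather than any CORAL data:

```python
# Cronbach's alpha for a set of items, the internal-consistency
# statistic discussed above. Rows = observations, columns = items.
# The example ratings are invented for illustration only.

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(rows[0])   # number of items
    n = len(rows)      # number of observations

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in rows]) for j in range(k)]
    total_var = var([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
]
alpha = cronbach_alpha(ratings)  # high, since the three items move together
```

A value above the conventional .70 threshold, as with three of the four CORAL scales, indicates that the items are cohesive enough to be averaged into one scale.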

Validity of Scale Structure
An analysis that examines scale structure validity for a single scale tests whether the scale truly measures a distinct and coherent domain (rather than several different concepts or domains). The developers examined whether several items and scales from the CORAL tool could be combined to represent an Overall Lesson Rating. Findings indicated that the Overall Lesson Rating represents a cohesive summary of multiple domains from the CORAL measure. The developers did not examine scale structure validity for other scales measuring specific domains (such as Instructional Quality).

Predictive Validity
If the CORAL tool truly measures effective literacy strategies, we can expect scores from the literacy strategy items to be related to gains in reading and English language skills. To examine the instrument's predictive validity, the authors examined the relationship between scores on the CORAL tool and outcomes for 234-383 children across the two years of the study.

As discussed in the CORAL initiative reports (Arbreton, Goldsmith, & Sheldon, 2005; Arbreton et al., 2008), the authors classified programs into five literacy profiles based on how the programs scored on the six items measuring Balanced Literacy strategies. Readers who are interested in more information on how the literacy profiles were created should consult the CORAL initiative reports cited in this document. At the end of the first year, children who attended programs with better literacy profiles had greater reading improvements on the Informal Reading Inventory (IRI) but were not more likely to have positive outcomes on the California Standards Test-English-Language Arts (CST-ELA). One reason the literacy profiles may have predicted better scores on the IRI but not the CST-ELA is that the IRI focuses largely on reading abilities and comprehension, whereas the CST-ELA is somewhat broader and also incorporates writing skills and word analysis. Classroom practices (as measured by the four scales) did not by themselves predict reading improvements on either the IRI or the CST-ELA over the first year of the evaluation.

At the end of the second year, the authors created a scale called the Overall Lesson Rating, which is a combination of items measuring literacy strategies as well as classroom practices. The Overall Lesson Rating predicted more positive outcomes on the CST-ELA (scores on the IRI were not examined in the second year). The findings from both years suggest that the CORAL tool successfully measures quality in programs with a strong literacy component.

User Considerations
Ease of Use
Although the CORAL tool can be used by anyone, it was designed exclusively for research purposes and has not been specifically adapted for practitioner use at this time. However, the tool does contain detailed instructions for conducting observations and completing the forms.

Available Supports
At this time, Public/Private Ventures is not offering training on use of the CORAL observation tool, though a two-day training was offered to the observers participating in the evaluation study. This training included a review of the observation materials, mock observations and write-ups on day one, and a field observation in which a new observer was paired with an experienced guide on day two. The trainee and guide compared notes to develop consistency. Trainees received ongoing monitoring and support.

In the Field
In the 2004-2005 school year, the CORAL observation tool was used as part of an evaluation of the programs' implementation of the Balanced Literacy Model. The evaluation was conducted by Public/Private Ventures. Live observations of the participants' experiences, the impact of balanced literacy on the academic gains of participants, and fidelity to the Balanced Literacy Model were at the center of the research. Observations were conducted during the program participants' third-grade and fourth-grade years, and each participant was observed three times by an observer assigned from a pool of observers.

In analyzing the data, researchers found that although the link between quality and academic gains was inconclusive, the findings indicate that the participants with the greatest academic gains were those who participated in higher quality programs. Additionally, all youth, including those reading below grade level, had greater gains in the higher quality programs than did their counterparts in lower quality programs. In the second year of observations, the researchers observed consistency in the implementation of the literacy model as well as higher quality implementation. In the same time period, researchers also observed reading gains that were 39 percent higher than the gains recorded in the first year.

For More Information:
Information about the CORAL observation tool is available at: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29

Contact:
Amy Arbreton, Senior Research Fellow
Public/Private Ventures
Lake Merritt Plaza
1999 Harrison Street, Suite 1550
Oakland, CA 94612
510.273.4600


Out-of-School Time Program Observation Tool
Developed by Policy Studies Associates, Inc.

Purpose and History
Policy Studies Associates (PSA) developed the Out-of-School Time Program Observation Tool (OST Observation Tool) over a five-year period, in conjunction with several research projects related to after-school programming, including a major study of promising practices in high-performing programs (Birmingham, Pechman, Russell & Mielke, 2005). A third edition of the tool, revised in 2008, was reviewed for this compendium. The instrument was recently used in studies of the New York City Department of Youth and Community Development's Out-of-School Time (OST) Programs for Youth and of the New Jersey After 3 initiative.

The tool was developed with research goals in mind – in particular the desire to collect consistent and objective data about the quality of after-school activities through observation. Its design is based on two assumptions about high-quality programs: first, that certain structural and institutional features support the implementation of high-quality programs, and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth development outcomes.

The OST Observation Tool can be used in varied after-school contexts, including school- or center-based programs, and with youth participants in kindergarten through 12th grade. While the tool can provide program staff with a framework for observing and reflecting on their practice, it was developed for, and thus far has primarily been used for, research purposes. In its current design, it is not intended to be used to assign overall quality scores for programs or staff.

Content
The OST Observation Tool was designed to provide researchers and other users with a framework for observing essential indicators of positive youth development. It focuses on three major components of programs: activity type, activity structures, and interactions between youth and adults and among youth. The first section captures a range of in-depth information about the type of activity being observed and the skills emphasized through that activity; the remainder focuses on what the youth development literature points to as critical components of programs.

Because of its developmental grounding and its focus on what young people experience inside of programs, the OST Observation Tool has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including relational issues and many items that speak specifically to instruction and learning. Beyond one item related to materials, the instrument does not focus on program resources or the organization of those resources within the setting.

The content of the OST Observation Instrument aligns very closely with the SAFE framework (Durlak and Weissberg, 2007), which outlines features of programs that contribute to positive outcomes for youth in out-of-school time programs. SAFE refers to out-of-school time activities that are:

• Sequenced: Content and instruction are designed to increasingly advance skills and knowledge and help youth achieve goals;

• Active: Activities lend themselves to active engagement in learning;

• Personally Focused: Activities strengthen relationships among youth and between staff and youth;

• Explicit: Activities explicitly target specific learning or developmental goals.

The 2008 version of the tool was updated to fully align with the SAFE framework. The changes are most obvious in the reorganization of the qualitative portion of the instrument, in which observational notes are recorded and synthesized. This framework was also used in the evaluators' analyses of their Year 2 findings for the New York and New Jersey studies, which demonstrated that the OST indicators map well to the SAFE framework.

Additional changes between the second and third editions involve the inclusion of the academic and technology features of programs. This new section features items related to literacy, math instruction and the use of technology, guiding users to note the presence or absence of activities that meet literacy, math or technology goals.

Structure and Methodology
The first part of the instrument, which focuses on activity type, provides observers with detailed definitions for documenting:

• Type of activity (e.g., tutoring, visual arts, music, sports, community service)

• Type of space (e.g., classroom, gym, library, auditorium, hallway, playground)

• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal)

• Number and education level of staff involved in the activity

• Environmental context (e.g., supervision, space, materials)

• Number, gender and grade level of participants

In addition, the instrument provides detailed descriptions of each of the four SAFE features for rating purposes. The above observations are recorded on a cover sheet that also includes other basic information about the observer, program, date, time, etc.

The remainder of the tool addresses five key youth development "domains," including relationships (youth- and staff-directed are considered separately), youth participation, skill building and mastery, and activity content and structure. Each domain is subdivided into four to seven specific indicators or practices. For each indicator, a detailed "exemplar" is offered to guide ratings. For example:

• Domain: Relationship-building: Youth.

• Indicator: All or most youth are friendly and relaxed with one another.

• Exemplar: Youth socialize informally. They are relaxed in their interactions with each other. They appear to enjoy one another's company.

The rating scale in the OST Observation Instrument asks users to assess the extent to which each indicator is or is not present during an observation. While the developers have experimented with both three- and five-point scales in various studies, the third edition of the instrument uses a seven-point rating scale, which gives more room for capturing subtleties, where 1 = not evident and 7 = highly evident and consistent (see below). A "5" rating is considered basic quality. Observers are instructed to first select the odd number that most closely reflects the level of evidence observed and then, if necessary, to move up or down to the adjacent even number if that more accurately reflects the presence of the indicator within the activity.

The seven-point rating scale is anchored as follows:

1 = Exemplar is not evident
3 = Exemplar is rarely evident
5 = Exemplar is moderately evident or implicit
7 = Exemplar is highly evident and consistent

Developers of the OST Observation Tool have structured it flexibly so that users can arrange scales differently for different purposes. Although definitive rules for constructing scales do not exist, in their report on the validation study (Pechman et al., 2008) the authors present four different methods for creating scales. The four sets of scales were used in different studies across several years and were each guided by separate theories and evaluation questions. The authors present the scale sets as different options for users to summarize data from the OST. Users who are interested in summarizing data should refer to the report of the validation study for information on which items compose specific scales.

The first scale set includes four scales: (1) Youth relationship-building and participation, (2) Staff-youth relationships, (3) Skill building and mastery, and (4) Activity content and structure. These scales were used in a study of Shared Features of High-Performing After-School Programs conducted on behalf of The After-School Corporation and the Southwest Educational Development Laboratory (Birmingham et al., 2005).

The second scale set is similar and also includes four scales: (1) Youth relationship-building, (2) Staff relationship-building, (3) Instructional methods, and (4) Activity content and structure. These scales were used in the first year of the evaluation of the New Jersey After 3 initiative (Kim et al., 2006).

The third scale set includes three scales: (1) Relationships, (2) Instructional strategies, and (3) Activity content and structure. These scales were used in the first year of the evaluation of the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Russell et al., 2006).

The fourth scale set is based on the SAFE framework, which emphasizes activities that are Sequenced, Active, Focused and Explicit, as described by Durlak and Weissberg (2007). The four scales in this set correspond to these four SAFE domains. These scales were used in the second-year evaluations of both the New Jersey After 3 initiative and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Walking Eagle et al., 2008; Russell et al., 2008).

Technical Properties
Psychometric information presented here for the OST Observation Instrument comes from the three studies previously mentioned (Shared Features of High-Performing After-School Programs, New Jersey After 3, and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative).

Score Distributions
Pechman and colleagues (2008) examined the score distributions of all scales in each of the four scale sets. The authors generally found good variability in the scores across raters' observations of 159 to 238 activities in 10 to 15 programs observed across the three studies. One exception was the Active scale from the SAFE scale set. The average score for this scale was somewhat low (1.9 on a scale of 1 to 7), making it difficult to determine whether the scale has difficulty capturing differences across programs or most programs simply do not keep youth very active. However, findings suggest that all other scales capture meaningful differences across a variety of activities and programs.

Interrater Reliability
Observers using this instrument reached high levels of agreement. Examining data from five assessments across three separate studies, pairs of researchers co-observed between 19 and 40 activities within 10 to 15 programs at each assessment. Using Pearson and intraclass correlations, researchers examined the interrater reliability for the overall score as well as the average score within each of the instrument's five domains. When available, intraclass correlations were generally above the recommended value of .50, indicating strong agreement. Strong agreement was also supported by the Pearson correlations, which were close to or above the recommended value of .70.⁷ Therefore, these findings suggest that trained raters can achieve high overall agreement across each of the five domains.⁸
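The Pearson statistic cited here is simply the correlation between two raters' paired scores for the same activities. A minimal Python sketch, with invented ratings rather than data from these studies:

```python
# Pearson correlation between two raters' scores for the same
# activities, the interrater-agreement statistic discussed above.
# The paired ratings below are invented for illustration only.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two raters scoring the same six activities on the 7-point scale.
rater_a = [5, 3, 6, 2, 7, 4]
rater_b = [5, 4, 6, 2, 6, 4]
r = pearson(rater_a, rater_b)  # close to 1: the raters largely agree
```

A value near the .70 benchmark or above, as reported for these studies, indicates that the two raters rank activities similarly; the intraclass correlation additionally penalizes raters whose scores agree in rank but differ in level.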

Internal Consistency
Internal consistency was strong for all scales across the four sets, with Cronbach's alphas exceeding the recommended value of .70. These findings suggest that each scale's items are highly related and form meaningful domains. The first scale set summarized TASC study data collected from 173 independent observations in 10 programs and had alpha levels ranging from .73 to .88. The second scale set summarized New Jersey After 3 data collected from 179 independent observations in 10 programs and had alpha levels ranging from .81 to .83. The third scale set summarized New York City OST study data collected from 238 independent observations in 15 programs and had alphas ranging from .80 to .87. The fourth scale set summarized both New York City OST study data and New Jersey After 3 data collected from a combined total of 358 observations and had alpha levels ranging from .84 to .88.

⁷ The one exception was for the Activity Content and Structure domain, which was lower for one out of the five assessments, but agreement on the other four assessments was strong.
⁸ The interrater reliabilities for specific items and scales were not reported and could be lower.

Concurrent Validity
OST developers examined the concurrent validity of the third scale set, drawing on 1,444 youth surveys from the DYCD OST initiative in New York City and the New Jersey After 3 initiative. Specifically, using Spearman's rho rank-order correlation coefficients, researchers examined the associations between the OST Relationships, Instructional Strategies, and Activity Content and Structure scales and responses from a separate youth survey on Interactions with Staff, Interactions with Peers, Sense of Belonging, Exposure to New Experiences, and Academic Benefits.

Higher scores on the Relationships scale were related to higher scores for Exposure to New Experiences, Interactions with Peers, and Interactions with Staff for both years of the study, as well as higher scores for Sense of Belonging in the first year and Academic Benefits in the second year. Higher scores on the Instructional Strategies scale were related to higher scores on Exposure to New Experiences, Sense of Belonging, and Interactions with Staff in the first year of the study but not the second. Instructional Strategies was not related to Interactions with Peers or Academic Benefits, and the Activity Content and Structure scale was not related to any youth experiences from the youth survey in either year.

Based on these findings, the available concurrent validity evidence is mixed. In addition, concurrent validity evidence does not currently exist for the other scale sets.

Validity of Scale Structure
An analysis examining scale structure validity tests whether items forming multiple scales truly measure distinct and coherent domains as expected. The instrument's developers conducted a factor analysis to examine the structural validity of the fourth scale set. Findings suggested that the items could be categorized into the four SAFE domains, although the Sequenced and Explicit domains were moderately related, suggesting that they are not entirely distinct from one another. In addition, the Active domain appeared to be a combination of two distinct categories, suggesting that this domain is not completely cohesive.

Although the instrument has some evidence of scale structure validity for the SAFE domains, evidence for the other three scale sets is currently unavailable.

User Considerations
Ease of Use
While the OST Observation Instrument is available online and is free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. The introduction to the tool includes an overview and review of basic procedures for conducting observations and completing the form, but the materials have not been tailored for practitioners at this time and use language (e.g., sampling, reliability) that may not be accessible to some audiences.

Its developers consider the OST Observation Tool to be highly efficient to complete in the field. Users observe 15 minutes of an activity and score it immediately in less than five minutes. Users are advised to observe a total of 8-10 activities over at least two afternoons (or approximately three hours of program observation) to adequately sample program offerings. Additional guidance about how to organize observations on site, sample activities appropriately and manage multiple observers is provided in the instrument's procedures section.

Available Supports
At this time, training related to the OST Observation Instrument is limited to individuals involved in a specific study that employs the instrument. Data collectors participate in trainings that provide a detailed overview of the instrument, its indicators and the theoretical framework. Following a review of the operational definitions for each category and group of indicators, researchers participate in practice rating sessions using videotaped samples of after-school activities to build interrater reliability prior to fieldwork. Additional reliability checks are conducted in the field and in follow-up meetings to ensure common interpretation of terms and items.

Researchers typically use the observation data collected with the OST instrument in conjunction with supplementary (but not formally linked) measures such as interviews, surveys and focus groups. As research continues, validity data will become available about the relationship between program quality features and youth outcomes as measured by some of these other instruments.

In the Field
In 2005, the New York City Department of Youth and Community Development (DYCD) contracted with PSA to conduct a comprehensive evaluation of its 536 OST programs serving 69,000 participants under its Out-of-School Time Programs for Youth initiative. Participating service providers, serving all grade levels, operated under one of three funding mechanisms: a) Option I, targeted toward a general pool of service providers operating programs in neighborhoods throughout New York City; b) Option II, for programs using a 30% private funding match; and c) Option III, for programs operated in collaboration with the Department of Parks and Recreation and offered at Parks sites.

The OST Observation Tool was used in a sample of 15 of the 536 OST programs as part of the overall evaluation. The evaluation combined this sampling data with other data sources, including a participation database and program director surveys, to round out the picture. The first-year evaluation findings identified avenues for improving the effectiveness of OST programming in several areas. For example, although programs successfully enrolled students in the first year, they struggled to maintain high youth participation rates, suggesting a need to establish program policies and activity offerings that encouraged regular participation. Additionally, while programs in the first year consistently provided safe and structured environments for participants, they experienced challenges in delivering innovative, content-based learning opportunities that engaged youth.

PSA’ssecondyearfindingscenteredonevidenceofprograms’effortstoimproveprogramqualityandscale.ThefindingssuggestthatOSTprogramsincreasedboththeirenrollmentandparticipationrates.Programsscaledupenrollmentfrom51,000youthinthepreviousyeartoservemorethan69,000youththroughoutNewYorkCity.RatesofindividualyouthparticipationalsoincreasedsubstantiallycomparedtoYear1,indicatingthatprogramsweresuccessfullyrecruitingandretainingparticipants.Inaddition,programsreportedthattheyimprovedthequalityandcapacityoftheirprogramstaffthroughimprovedhiringandprofessionaldevelopmentopportunities.

In year three, the evaluation will continue to collect data from OST programs to explore the associations among program quality features, youth participation patterns, and youth outcomes.

For More Information
The OST Observation Instrument is available online at: www.policystudies.com/studies/youth/OST%20Instrument.html

Contact
Christina Russell or Ellen Pechman
Policy Studies Associates
1718 Connecticut Avenue, NW
Washington, DC
[email protected]


Purpose and History
The Program Observation Tool is the centerpiece of the National AfterSchool Association's program improvement and accreditation process and is designed specifically to help programs assess progress against its Standards for Quality School-Age Care. The instrument was developed by the National AfterSchool Association (NAA) and the National Institute on Out-of-School Time in 1991 and was based on the Assessing School-Age Child Care Quality Program Observation Instrument developed by Susan O'Connor, Thelma Harms, Debby Cryer and Kathryn Wheeler. The instrument was revised in 1995 and piloted between 1995 and 1997. Additional revisions were then made before NAA's accreditation system became active in 1998.

The NAA Standards, on which the Program Observation Tool is based, are meant to provide a baseline of quality for after-school programs serving children and youth between ages 5 and 14. They are intended for use in group settings – primarily school- and center-based – where children participate regularly and where the goal is supporting and enhancing overall development.

Rooted in the frameworks of the early childhood and school-age care fields of the early 1990s, the instrument and the NAA standards reflect much of the thinking of the time, particularly in terms of licensing and monitoring, and have been used as part of a seven-step accreditation process for the past decade. Pre-dating the creation of the federal 21st Century Community Learning Centers program, the NAA standards have played a significant role in the field and have been adopted by a range of programs and systems across the country. There are now 20,000 copies of the standards book in print, and over 500 programs across the country are in some stage of the accreditation process.

In 2008, NAA shifted away from its role as an accrediting body and is now offering accreditation through the Council on Accreditation. NAA will complete the accreditation process through the end of 2009 for all agencies that applied and were at some stage of the process before September 2008. For agencies applying after September 2008, a program relationship manager from the Council will assist them through the process.

The Program Observation Tool will still be available for agencies interested in using it for self-assessment and improvement purposes (as has always been the case for agencies not seeking accreditation). NAA will retain the rights to the standards and materials, and continue to provide supports for technical assistance.

Content
The Program Observation Tool measures 36 "keys of quality" that are organized into six categories. Five of those categories are considered observable and are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth category – administration – is assessed through a questionnaire.

Because of NAA's commitment to supporting child development in a holistic way, the instrument measures a range of social processes – how children and staff within the setting interact. Because of the link to accreditation, it also focuses quite a bit on program resources and the arrangement (spatial, social and temporal) of those resources within the program. Unlike some of the other tools in this compendium, the Program Observation Tool also addresses program policies and procedures that are believed to influence quality.

The Program Observation Tool pre-dates the National Research Council's features of positive developmental settings framework (2002) by over a decade and draws more heavily on the early childhood literature than the youth development literature. However, it does address many of the NRC features, placing the least emphasis on "support for efficacy and mattering" and "skill building opportunities."

Developed by the National AfterSchool Association

Structure and Methodology
The five quality categories that are the focus of the Program Observation Tool are measured using one overall instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items (included in the Standards, but not the Observation Tool) are assessed separately, through questionnaire/interview.

The rating scale used throughout the Program Observation Tool (see example below) is intended to capture whether each indicator is true all of the time (3), most of the time (2), sometimes (1) or not at all (0). Although specific descriptions of what a 0, 1, 2, or 3 looks like for each indicator are not provided, between one and eight descriptive bullet statements are included under each indicator to clarify meaning.

Space is provided for observers to take notes on each indicator. At the bottom of each page, observers are encouraged to total their numerical scores for each quality key to achieve an overall program rating. Tally sheets and instructions are provided for multiple observers to reconcile and combine their scores. There are also two "weighted" categories – program/activities and safety/nutrition – in which programs must meet a certain threshold in order to be accredited.

6. Children and youth generally interact with one another in positive ways.
Guiding Questions: Do children seem to enjoy spending time together? Do they talk about friends at the program? Do they tend to include others from different backgrounds or with different abilities in their play?

a. Children appear relaxed and involved with each other. (Comments / Rating: 0 1 2 3)
• Group sounds are pleasant most of the time.

b. Children show respect for each other. (Comments / Rating: 0 1 2 3)
• Teasing, belittling or picking on particular children is uncommon.
• Children show sympathy for each other and help each other.

c. Children usually cooperate and work well together. (Comments / Rating: 0 1 2 3)
• Children willingly share materials and space.
• They suggest activities, negotiate roles and jointly work out rules.
• Children include others with developmental, physical or language differences in their play.
• Children often help each other.
• There is a strong sense of community.

d. When problems occur, children often try to discuss their differences and work out a solution. (Comments / Rating: 0 1 2 3)
• Children listen to each other's point of view and try to compromise (e.g. if two children want to use the same equipment, they may decide to take turns as a solution).
• Children know how to solve problems.
• Their solutions are generally reasonable and fair.
• They do not try to solve disagreements by bullying or acting aggressively.
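The tally arithmetic behind these ratings is simple enough to sketch. This is a minimal illustration, not NAA's official procedure: the indicator ratings below are invented, and combining observers by a plain average is an assumption (the actual tally sheets spell out how raters reconcile scores).

```python
def key_score(indicator_ratings):
    """Total the 0-3 ratings given to one quality key's four indicators."""
    assert all(0 <= r <= 3 for r in indicator_ratings)
    return sum(indicator_ratings)

# Hypothetical ratings for indicators a-d of one key, from two observers.
observer_1 = [3, 2, 3, 2]
observer_2 = [3, 3, 2, 2]

# Assumption: reconcile the two observers by averaging their key totals.
combined = (key_score(observer_1) + key_score(observer_2)) / 2
print(combined)  # 10.0
```

Summing a key's totals across all keys in a category would then give the category score an observer carries to the tally sheet.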


Technical Properties
Although no psychometric evidence is available on the Program Observation Tool itself, there is information available about the ASQ (Assessing School-Age Childcare Quality), from which the POT was derived. Users should note that the ASQ's psychometric properties may not be completely consistent with those of the POT.9 Overall, evidence for interrater and test-retest reliability is strong for the ASQ, meaning the assessments of the same program practices by different observers are consistent and assessments are stable over time. Following revisions to the scales, evidence of internal consistency, or the degree to which items fit together in meaningful ways, was strong. Validity data are limited, although preliminary evidence for concurrent validity suggests the instrument may yield accurate information about the concepts it measures.10

The field study which provides psychometric support for the ASQ involved a sample of 40 after-school programs in Massachusetts and North Carolina (Knowlton & Cryer, 1994). Two versions of ASQ scales were examined: original and revised. The revised version's scales are comparable to those in the POT: Human Relationships, Indoor Environment, Outdoor Environment, Activities, and Safety, Health and Nutrition. Of the original scales, only two overlapped with the POT, namely human relationships and activities. When appropriate, we state which set of scales exhibits specific properties.

Interrater Reliability
To examine interrater reliability, paired raters evaluated 40 programs using the measure. ASQ indicators are organized into 21 items, and those items are further organized into five scales. Knowlton and Cryer examined agreement among raters at both the item and scale levels. The kappa statistic measures the degree to which raters agree and corrects for cases where raters agree simply by chance. All items had kappa scores above .70, generally considered the threshold for high agreement. The authors also computed intraclass correlations, and all of the ASQ original scales and total score were near or above .70, showing good agreement on these scores.11 However, because only the original ASQ scales, not the revised versions, were examined, we can generalize only for those scales that are similar (Human Relationships, Activities – and the total score).
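The kappa statistic mentioned above can be illustrated with a short computation. The ratings below are invented; the point is only to show how kappa discounts the agreement two raters would reach by chance.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters would pick the same
    # category if each rated independently with their own frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters scoring ten indicators on the 0-3 scale.
a = [3, 2, 2, 3, 1, 0, 2, 3, 3, 1]
b = [3, 2, 1, 3, 1, 0, 2, 3, 2, 1]
print(round(cohens_kappa(a, b), 2))  # 0.72
```

Here the raters agree on 8 of 10 items (80 percent), but after removing the 28 percent agreement expected by chance, kappa is .72, just above the .70 threshold the field study used.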

Test-Retest Reliability
Ideally, instruments should be able to assess major changes over time but should exhibit stability in scores across multiple assessments in the short term. For the ASQ, 25 programs were reassessed two weeks after their initial assessment to determine the instrument's test-retest reliability. Knowlton and Cryer (1994) found that all items demonstrated acceptable stability, with kappa scores above .70.12 The authors also computed intraclass correlation coefficients to examine stability of the original scales and total score over time. All scales and the total score were above .70, but we can only generalize to the scales that overlap with the POT – Human Relationships and Activities – and the total score.

Internal Consistency
To determine whether items within the scales fit together in meaningful ways, Knowlton and Cryer examined the internal consistency of the original scales and the total score by computing a statistic called Cronbach's alpha. The alpha for one of the original scales (Safety) was very low, so the authors revised the scales (and the revisions more closely match the POT). Results from the revised scales and the total score demonstrated good internal consistency, with alphas near or above the recommended cutoff of .70.
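Cronbach's alpha can likewise be sketched. It compares the summed variance of the individual items to the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of totals). The item scores below are invented for illustration.

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbachs_alpha(items):
    """items: one list of scores per item, each covering the same programs."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total per program
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

# Three hypothetical items rated 0-3 across five programs.
items = [[2, 3, 3, 1, 2],
         [2, 3, 2, 1, 2],
         [3, 3, 2, 0, 2]]
print(round(cronbachs_alpha(items), 2))  # 0.9
```

When items rise and fall together across programs, the totals' variance dominates and alpha approaches 1; scales clearing the .70 cutoff are conventionally treated as internally consistent.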

Convergent Validity
To determine the extent to which the ASQ yields accurate information about the aspects of programs it is supposed to measure, Knowlton and Cryer (1994) compared the ASQ scores for 11 programs with subjective ratings by experts. Specifically, two experts ranked a set of programs in terms of overall quality within each of the five original ASQ domains using their own criteria. Using what is called the Spearman correlation, ASQ rankings were moderately to strongly related to the expert rankings, with the exception of the Safety and Health and Nutrition areas. This validity evidence should be regarded as preliminary, based on the small number of programs and experts included in the analysis and the fact that estimates were computed on the original, unrevised ASQ scales.

9 There are slightly more indicators in the ASQ (84) than in the POT (80). It is unclear how many indicators are identical or similar.
10 The technical section only evaluates evidence from the observational portion of the instrument, not the administration questionnaire.
11 Readers should note that Knowlton and Cryer also looked at the interrater reliability of the individual indicators that composed the items. Many indicators exhibited poor agreement. However, summing the indicators into items creates more reliable measures because it cancels out some of the measurement problems. For this reason, users should evaluate programs based on the items and scales, not the individual indicators.
12 Similar to tests on interrater reliability, the authors found that 40 percent of the indicators had poor short-term stability. However, the measurement problems associated with individual indicators likely cancel out when creating an item score. Again, users should examine the items and scales, not the indicators, when evaluating programs.
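The Spearman correlation used in this comparison is a correlation computed on ranks rather than raw scores. A small sketch with invented program scores, assuming no tied ranks (real analyses handle ties with averaged ranks):

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank  # 1 = lowest score
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical ASQ domain scores and expert quality ratings for six programs.
asq_scores = [58, 72, 65, 80, 55, 70]
expert_scores = [6, 8, 7, 9, 5, 8.5]
print(round(spearman_rho(asq_scores, expert_scores), 2))  # 0.94
```

A rho near 1 means the instrument orders programs almost exactly as the experts do, even if the two scoring scales differ; that ordering, not the raw scores, is what the validity check relies on.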

User Considerations
Ease of Use
The Program Observation Tool and NAA standards were developed with significant input from practitioners, resulting in accessible language and a user-friendly format.

Programs wishing to undertake the accreditation process can contact the Council on Accreditation (see contact information). For self-assessment purposes, observing the program and scoring the full instrument takes roughly 3-5 hours. The self-study manual provides very detailed guidance to program directors and staff on how long and how much of the program to observe, how to determine ratings and how to combine scores from different raters.

The observation tool is one of a package of products related to accreditation – the Advancing and Recognizing Quality Kit – which includes the standards book; the guide to program accreditation; self-study manuals that include the observation tool as well as staff, family and child/youth questionnaires; and a training video. The team leader's manual walks program directors or staff through the various steps of the accreditation process in detail and includes specific tools for developing an action plan for improvement based on observational data. As a package, these resources cost approximately $300. There are additional costs related to the full accreditation process.

Available Supports
It is important to reiterate that while this summary has focused specifically on the Program Observation Tool, that instrument is just one piece of an integrated set of resources related to self-study and accreditation. NAA offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability). Some NAA state affiliates offer local training related to the instrument for programs interested in using it for self-assessment and improvement.

In the Field
The University of Missouri's Adventure Club is a district-wide after-school initiative for elementary school students in Missouri's Columbia public school district. The National Afterschool Association's standards and observation tool, as well as the larger Advancing School-Age Quality (ASQ) process within which these are embedded, serve as the organizing framework for Adventure Club's 18 programs. Institutionalization of the standards has resulted in a common language and understanding of program quality that spans the individual staff, program and cross-site levels.

The Program Observation Tool is used by each of the 18 programs several times each year and is a core piece of the new staff orientation process, which includes conducting and discussing a program observation with more senior colleagues. Line staff are well-versed in the 36 "keys of quality," and each week during cross-site directors' meetings one key is the focus of in-depth discussion.

In addition to regular observations – by staff, administrators and parents – each program has an ASQ team made up of these stakeholders (parents, staff, administrators and sometimes children). Teams meet monthly or bi-monthly to review new observation data and revisit the program's improvement plan. "This is a continuous process – it doesn't start and stop each year. Each program developed a plan when we first started using the standards and those get revisited and updated several times a year based on ongoing observation," explained Chrissy Poertner, who coordinates the accreditation and improvement process for the 18 programs. Observation data and program improvement plans are also used to guide staff development.

Initially some staff expressed concern that the tool was long and would be cumbersome to work with, but Poertner says the overall response has been very positive, especially because everyone is involved in and owns the process. "These tools give staff a guide, and when you're out there working in the field the autonomy can feel overwhelming. Because we've created the buy-in and they are part of the improvement process, people respond really positively."

For More Information
Additional information about NAA's observation tool and accreditation process is available online at: http://naaweb.yourmembership.com/?page=NAAAccreditation

Contact
Judy Nee, President and CEO
The National AfterSchool Association
529 Main Street, Suite 214
Charlestown, [email protected]


Purpose and History
The Program Quality Observation Scale (PQO), funded by the National Institute of Child Health and Human Development (NICHD) as part of an initiative to study out-of-school time, was designed to help observers characterize the overall quality of an after-school program environment and to document individual children's experiences within programs. The tool has two components – qualitative ratings focused on the program environment and staff behavior, and time samples of children's activities and their interactions with staff and peers.

The PQO was developed for research purposes by Deborah Vandell and Kim Pierce and has been used in a series of studies, primarily looking at the quality of school- and center-based after-school programs serving first through fifth grade elementary-school children. The instrument has its roots in Vandell's observational work in early childhood care settings, including the NICHD Study of Early Childhood Care, and her work in after-school programs, including the Ecological Study of After-School Care funded by the Spencer Foundation.

The primary focus of the time sample procedure is on three components of individual children's experiences in programs – relationships with staff, relationships with peers and opportunities for engagement in activities. The qualitative ratings focus on all children's experiences in the program in terms of staff behavior and the program environment. The qualitative ratings of program environment are best suited for use in formal school- or center-based after-school programs, while the qualitative ratings of staff behaviors and the time sampling of children's activities and interactions are relevant in both formal program settings and informal, adult-supervised settings.

Content
The PQO was designed to help researchers understand the quality of children's experiences inside programs and focuses on three components of quality – relationships with staff, relationships with peers and opportunities for engagement in activities. As noted above, the instrument has two major components – qualitative ratings and time samples of children's activities and interactions. Ratings are made of the program environment and staff behavior, or what the developers call "caregiver style." The following three aspects of the program environment are rated:

• Programming flexibility
• Appropriateness and diversity of the available activities
• Chaos

Four characteristics of caregiver style are rated:

• Positive behavior management
• Negative behavior management
• Positive regard for children
• Negative regard for children

The time sample component of the tool is designed to record the activities and interactions of individual children within the program. There are 19 different activity categories for observers to select from (e.g., arts/crafts, large motor, snack, academic/homework). In addition, the tool provides observers with six different types of interactions to look for: positive, neutral and negative interactions with peers, and positive, neutral and negative interactions with staff.

Because the focus of the PQO is on children's experiences inside of programs, it tends to focus primarily on social processes and less on resources or the organization of those resources within programs. However, Vandell and colleagues have developed a number of related measures that do capture aspects of these other components, such as a physical environment scale. Developed long before the National Research Council's features of positive developmental settings framework (2002), some aspects of the PQO align well with that framework while others more clearly reflect its early childhood roots.

Developed by Deborah Lowe Vandell & Kim Pierce


Structure and Methodology
The first component of the PQO – the qualitative ratings – is focused on program environment and staff behavior or "caregiver style." Ratings are assigned based on a minimum of 90 minutes of continuous observation. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed (but could be adapted to rate all staff members collectively).

Program environment and caregiver style ratings are made using a four-point scale. Users are given descriptions of what constitutes a 1, 2, 3 or 4 rating for three distinct aspects of program environment – flexibility, activities and chaos – and four different aspects of caregiver style – positive behavior management, negative behavior management, positive regard and negative regard. A "4" rating means that particular aspect of the environment (or staff behavior) is highly characteristic of the program (see example below).

The time sampling component of the PQO is focused on the activities and interactions that individual children engage in at an after-school program. Activity type is recorded using 19 different categories. Interactions are assessed in terms of whether they are positive, neutral or negative and whether they happen with peers or with staff. In addition, staff interactions are further coded to note whether they are one-on-one, small group or large group.

Time sampling entails documenting the activities and interactions that a number of individual children have in a program for short periods of time. The developers of the PQO suggest that 30-minute time samples be conducted in 30-second intervals (for a total of 60 intervals). During each interval, the rater observes a child for 20 seconds and then spends 10 seconds recording or coding what they observed. Because time sample observations will sometimes involve fewer than 60 intervals, scores need to be adjusted for the total number of intervals actually observed. This time sampling component has been adjusted for use in different studies (for example with longer observation periods, fewer cycles, etc.).13
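The adjustment for incomplete observations amounts to expressing each code as a proportion of the intervals actually observed rather than of the planned 60, so sessions of different lengths remain comparable. A minimal sketch with invented activity codes:

```python
from collections import Counter

def adjusted_frequencies(codes):
    """Share of the intervals actually observed in which each
    activity code occurred (rather than a share of the planned 60)."""
    observed = len(codes)
    return {code: n / observed for code, n in Counter(codes).items()}

# A hypothetical session cut short at 45 of the planned 60 intervals.
codes = ["arts/crafts"] * 20 + ["snack"] * 10 + ["transition"] * 15
freqs = adjusted_frequencies(codes)
print(round(freqs["arts/crafts"], 2))  # 0.44
```

Without the adjustment, the same child in a 45-interval session would appear to do a third less of everything than in a full 60-interval session.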

Technical Properties
Available psychometric evidence supporting the PQO addresses score distribution, interrater reliability, test-retest reliability, convergent validity and concurrent validity, mostly from a report by Vandell and Pierce

Chaos
4 = Chaos and disorganization are highly characteristic, persisting across multiple activities and settings. The children are out of control. They may be fighting with one another, yelling, or behaving inappropriately, jumping on furniture, ruining materials, or just generally running around. Activities do not seem organized; disorder is evident.
3 = There is chaos and disorganization in the environment, but it is not characteristic of many children or all activities. A group of children may exhibit the behaviors that merit a rating of 4, or some activities and transition times may be chaotic and disorganized such that the progress or beginning of activities for some children is impeded.
2 = One or two children's behavior may be out of control, but in general, children's behavior is appropriate and reasonably controlled. Transitions and activities generally go smoothly, although there may be exceptions.
1 = No chaos or disorganization is observed in the environment. Children's behavior is appropriate, and activities and transitions proceed smoothly.

13 Readers should note that the developers did not design the PQO for self-assessment, but rather for a large study that required that time sample observations center on a single child of interest. The time sample component of the instrument could be modified for general use by observers randomly choosing children for each assessment. However, it is unclear if the available psychometric findings on the time sample observations will extend to this modified instrument. This caveat does not apply to the qualitative ratings, which were designed to measure the program as a whole and do not require modification for self-assessment.


(2006), based on multiple observations of after-school programs over several years in the NICHD Study of After-School Care and Children's Development. The study included a broad sample of 46 for- and nonprofit programs located in schools, child care centers and community centers. Each program was observed three or four times a year over a five-year period. Predictive validity evidence comes from a study conducted by Pierce, Bolt and Vandell (2008), which examines social and academic outcomes from 120 children enrolled in 46 after-school programs. Children were assessed during their 2nd and 3rd grades.

Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell and Pierce examined the average scores and ranges for overall observed quality and the individual qualitative rating scales obtained in formal programs in the Study of After-School Care. The overall quality score was created by averaging the individual qualitative ratings (after reversing the scores for Chaos, Negative Regard and Negative Behavior Management). For both the overall quality score and the individual ratings, annual composites are averages of all observations conducted within a school year.
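The composite described above can be sketched directly. The ratings below are invented, and the reversal rule (5 minus the score, which swaps 1 with 4 and 2 with 3 on the four-point scale) is the conventional reverse-scoring assumption rather than a formula documented in the PQO manual:

```python
# Hypothetical qualitative ratings on the PQO's four-point scale.
ratings = {
    "flexibility": 3, "activities": 4, "chaos": 2,
    "positive_behavior_mgmt": 3, "negative_behavior_mgmt": 1,
    "positive_regard": 4, "negative_regard": 1,
}
NEGATIVE = {"chaos", "negative_behavior_mgmt", "negative_regard"}

def overall_quality(ratings):
    """Average the seven ratings after reverse-scoring the negative
    domains so that a higher number always means better quality."""
    vals = [5 - v if k in NEGATIVE else v for k, v in ratings.items()]
    return sum(vals) / len(vals)

print(round(overall_quality(ratings), 2))  # 3.57
```

Reverse-scoring first is what lets one averaged number summarize the program: without it, high Chaos and high Positive Regard would cancel out instead of pulling the composite in opposite directions.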

The overall score and most of the qualitative ratings and time-sampled activities and interactions had wide variability, suggesting the instrument can detect differences among a variety of programs. Across multiple observations in several years of study, the full range of scores was obtained for most of the sampled activities. Among the qualitative ratings, low variability was found for Negative Behavior Management and Negative Regard for Children. However, the strong validity evidence suggests that the instrument is detecting meaningful differences in these domains despite their low frequencies.

Forchildren’sinteractions,thefullrangeofscoreswasobtainedforneutralinteractionswithstaffandwithpeers;aswouldbeexpected,therangewasmorerestrictedforinteractionsthatwereclearlypositiveornegative.

Interrater Reliability
The degree to which different raters agree when observing the same program was tested for both the qualitative ratings and time sampling components of the instrument. For the qualitative ratings, kappa coefficients were computed once a year over four years. All of the domains had scores above .70, the benchmark for strong interrater reliability, except Staff Negative Regard, for which the lowest coefficient was .59. The proportion of Negative Regard scores on which observers achieved exact agreement was high, however, suggesting that the moderate kappa score may be due to the relative infrequency with which negative regard was observed. The average kappa score for Staff Negative Regard was .82, suggesting that trained raters can reach acceptable agreement on all domains.

Agreement was also computed for all domains of the time sample observations except group size. Kappa scores at all time points were either above .70 or very close, indicating strong interrater reliability.

Internal Consistency
To determine whether items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach's alpha. Vandell and Pierce found that alpha levels for the annual overall observed quality scores averaged .81, well above the recommended cutoff of .70.

Test-Retest Reliability
In order to determine whether the quality composite and individual qualitative ratings generated by the PQO are stable over time, Vandell and Pierce correlated the ratings made in adjacent observations during the second year of the study, when the sample was largest and most representative of the range of programs in the community (N=45). Four observations were conducted in each program, approximately two months apart. Ratings from the first observation were correlated with those from the second; the ratings from the second observation were correlated with those from the third; and the ratings from the third observation were correlated with those from the fourth. Correlations


for overall observed quality were near or above .70, suggesting the instrument detects changes in program quality and is not overly sensitive to minor changes. Correlations for the individual ratings were lower, with the average for all domains ranging from .34 to .59. This suggests that programs are only somewhat stable in their scores for particular domains over periods of two months. It is unclear whether this reflects short-term variability (as might be seen from one day to the next) or meaningful changes over the course of two months.

Convergent Validity
To examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure, the authors compared findings for the qualitative ratings to findings from the SACERS (also reviewed in this report). If both instruments truly measure program quality, one can reasonably expect that the findings will be related. The following relationships between PQO and SACERS scales were examined: (1) PQO Programmatic Flexibility was positively related to SACERS Program Structure; (2) PQO Available Activities was positively related to SACERS Activities; and (3) PQO Staff Positive Regard and Positive Behavior Management were positively related to SACERS Interactions, and PQO Staff Negative Regard and Negative Behavior Management were negatively related to SACERS Interactions. These findings provide strong evidence that the instrument adequately measures program quality. Although convergent validity is supported for most qualitative items, we cannot infer the validity of the Chaos rating because there was no comparable SACERS question to compare it to.

Concurrent Validity
Another way to examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure is to compare the instrument's ratings to other distinct but theoretically important concepts. Developers compared the findings assessed by the PQO to structural features of after-school programs (Pierce, Hamm, Sisco & Gmeinder, 1995). Similar to what is reported in the early childhood literature, Positive Regard ratings were higher and Negative Regard scores were lower in nonprofit programs compared to for-profit programs, when child-staff ratios were smaller and when program staff had more formal education. Programming flexibility was higher in nonprofit compared to for-profit programs and when child-staff ratios were smaller. Ratings of available activities were higher in nonprofit programs.

Time-sampled activities and interactions were also associated with program characteristics as well as with child reports of their experiences in the programs. For example, children were observed to have more frequent positive/neutral interactions with staff and less frequent negative interactions with peers in programs with smaller group sizes; and smaller staff-child ratios were associated with children having more frequent positive/neutral interactions with staff and spending less time in transition (e.g., standing in line) and in large motor activities. Although this evidence is quite strong, it is unclear whether researchers had expected additional relationships that they did not find.

Predictive Validity
In a study conducted by Pierce, Bolt and Vandell (2008), researchers examined the relationships between three PQO scales (Staff-Child Relations, Available Activities and Programming Flexibility) and social and academic outcomes of 120 children enrolled in 46 after-school programs. Program observations were conducted several times per school year for two years, and children's outcomes were assessed at the end of each school year when they were in 2nd and 3rd grades, respectively. Better Staff-Child Relations were associated with higher reading scores (both grade levels), math scores (Grade 3 only) and better social skills (2nd grade boys only), but the scale was unrelated to work habits. Activities was related to better math grades and work habits (both for Grade 3 only), but the scale was unrelated to reading grades and social skills. Programming Flexibility was not related to any outcomes. Predictive validity appears mixed, but the evidence should be regarded as preliminary. The authors state that little research currently exists that examines the relationships of specific program strategies (rather than overall program quality) with children's academic and social outcomes.


User Considerations
Ease of Use
While the PQO is available for anyone to use, it is important to recognize that it was developed exclusively with a research audience in mind. While the manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general or practitioner use at this time and therefore include some concepts and language (e.g., adjusted frequencies, sampling, qualitative) that may not be particularly accessible for non-research audiences.

In the context of the studies the PQO was developed for, formal observation time at sites was fairly limited, but some additional time should be factored in for reviewing notes and assembling ratings. It is recommended that the qualitative ratings of environment and staff behavior be made based on a minimum of 90 minutes of observation. Completing the time sample process as outlined in the manual takes a minimum of 30 minutes (60 30-second cycles) for an experienced observer. Some guidance about how to conduct observations and develop ratings is provided in the manual.

Available Supports
At this time, training on how to use the PQO is not regularly available, but it has been conducted with data collectors involved in the studies the instrument was developed for. Trainings include reviewing the contents of the instrument and pairing new raters with trained raters to do an observation in the field, compare scores and build inter-observer agreement.

Observation data collected using the PQO have always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PQO could be used independently.

In the Field
In the Study of After-School Care and Children's Development, conducted by Deborah Vandell and Kim Pierce in the mid-1990s, live observation of children's experiences in programs was at the center of the research (Pierce, Hamm & Vandell, 1999). Observations were conducted during the program participants' first-grade year, and each child was observed three times by an individual observer who was randomly assigned from a pool of observers. The observers used both components of the PQO – the time sample procedure and qualitative ratings of the program environment and caregiver style. Other types of information were collected using different methods and measures.

In analyzing the data, the researchers looked for associations between the various measures of program quality and also at associations between program quality and children's adjustment at school. In terms of how aspects of program quality relate, staff positivity was negatively correlated with staff negativity, as one might expect. Staff positivity was higher in programs that were more flexible and offered more activities. Staff negativity was associated with less programmatic flexibility. They also found associations between the program quality indicators and children's adjustment in the first-grade classroom, primarily for boys. Staff positivity was associated with boys earning higher reading and math grades and exhibiting less externalizing behavior at school. Greater programming flexibility was associated with boys exhibiting better social skills at school. Greater availability of age-appropriate activities was associated with boys earning poorer reading and math grades, and exhibiting poorer work habits and more externalizing behavior at school.

Pierce, Bolt and Vandell (in press) recently examined associations between program quality indicators as measured by several PQO qualitative ratings (staff positive regard, activities, flexibility) and children's adjustment at school (grades, work habits, social skills) in Grades 2 and 3, controlling for child and family characteristics and child prior adjustment. The researchers found that greater staff positivity in after-school programs was associated with both boys and girls earning better reading grades in Grades 2 and 3 and better math grades in Grade 3. Boys also exhibited better social skills in Grade 2 when their after-school programs were characterized by greater staff positivity. Availability of multiple activities in the after-school programs was associated with boys and girls earning better math grades and exhibiting better work habits at school in Grade 3. Programming flexibility was not associated with child outcomes in Grades 2 and 3.

Vandell and Pierce (2001) also reported long-term associations between overall program quality, as measured by annual composites of the qualitative ratings, and children's outcomes. They looked at cumulative program quality (averaged across two years, three years and four years) in relation to children's adjustment at school. Controlling for child and family characteristics and for children's functioning at the end of first grade, the researchers found that children who experienced greater cumulative program quality in Grades 1–3 were reported by their teachers to have better academic grades at school. Girls whose after-school programs had higher cumulative quality across Grades 1–3 or 1–4 had better work habits and better social skills with peers at school in Grades 3 and 4.

For More Information
The PQO is available online at: http://childcare.gse.uci.edu/des4.html

Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, CA 92697
949.824.7840


Purpose and History
In 2003, the New York State Afterschool Network (NYSAN) began a two-year process of developing the Program Quality Self-Assessment Tool (QSA). A Quality Assurance Committee involving key stakeholders from practice, policy and research reviewed relevant literature, drafted the instrument, conducted field tests and incorporated feedback from practitioners across the state. Soon after the instrument was completed in 2005, New York State began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.

The QSA was developed exclusively for self-assessment purposes; programs are discouraged from using it for external assessment or formal evaluation. It is intended to be used in its entirety, ideally as the focal point of a collective self-assessment process that involves all program staff. The QSA is also used by new after-school programs during their initial development; specific items that are considered "foundational" indicators for the start-up stage are identified.

The QSA was designed to be used in the full range of school- and community-based after-school programs and is particularly relevant for programs that intend to provide a broad array of services, as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers). It was also designed to be used by programs serving a broad range of students, from kindergarten through high school.

Content
The Program Quality Self-Assessment Tool is organized into 10 essential elements of effective after-school programs (see below). Each element contains a list of standards of practice or quality indicators that describe the element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns:

• Environment/Climate
• Administration/Organization
• Relationships
• Staffing/Professional Development
• Programming/Activities
• Linkages Between Day and After-School
• Youth Participation/Engagement
• Parent/Family/Community Partnerships
• Program Sustainability/Growth
• Measuring Outcomes/Evaluation

Because the QSA was designed with an eye toward programs receiving 21st CCLC funding, there was an intentional effort to capture aspects of programming that, although they may not relate directly to academics, will enhance programs' ability to address students' educational needs. The developers are exploring options that would allow programs to address a subset of items based on their level of readiness; however, the ultimate goal is to assess the program or organization in its entirety.

Because of its broad focus, extending from the activity level to the organization as a whole, the QSA emphasizes several different components of program settings, including social processes, program resources and the organization or arrangement of those resources inside the program. Social processes addressed by the tool include relationships, climate and pedagogy. Resource issues include facilities and staffing requirements; arrangements such as effective transitions, policies and procedures, and relationships with schools are also addressed.

Structure and Methodology
Given the developers' commitment to child and youth development broadly defined, it is not surprising that the items included in the QSA reflect each of the features identified by the National Research Council (2002) as features of positive developmental settings.

EachoftheQSA’s10essentialelementsofeffectiveafter-schoolprogrammingisfurtherdefinedbya

DevelopedbytheNewYorkStateAfterschoolNetwork

Page 60: Measuring Youth Program Quality: A Guide to Assessment ...forumfyi.org/files/MeasuringYouthProgramQuality_2ndEd.pdf · guide their decision-making. Over the last several years, we

©January2009TheForumforYouthInvestment

MeasuringYouthProgramQuality:AGuidetoAssessmentTools,SecondEdition

60

summarystatement,whichisthenfollowedbybetween7and18qualityindicators–statementsaimedatillustratingwhataparticularelementlookslikeinpractice.Whilemostessentialelementsareassessedthroughobservation,themoreorganizationallyfocusedelementssuchasadministration,measuringoutcomes/evaluationandprogramsustainability/growthareassessedprimarilythroughdocumentreview.

The rating scale used in the QSA (see example below) is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards. Staff are asked to determine whether their performance in each indicator area is:

4 = Excellent / Exceeds Standards
3 = Satisfactory / Meets Standards
2 = Some Progress Made / Approaching Standard
1 = Must Address & Improve / Standard Not Met

Relationships: A QUALITY program develops, nurtures and maintains positive relationships and interactions among staff, participants, families and communities.

A Quality Program: (each indicator below is rated on a Performance Level of 1–4, with a Plan to Improve column offering three time frames: Right Now, This Year or Next Year)

• Has staff who respect and communicate with one another and are role models of positive adult relationships.
• Interacts with families in a comfortable, respectful and welcoming way.
• Treats participants with respect and listens to what they say.
• Teaches participants to interact with one another in positive ways.
• Teaches participants to make responsible choices and encourages positive outcomes.
• Is sensitive to the culture and language of the participants.
• Establishes meaningful community collaboration.
• Has scheduled meetings with its major stakeholders.
• Encourages former participants to contribute as volunteers or staff.


While some additional guidance is provided to staff in the tool's introduction about how to determine ratings, developers acknowledge that this is one of the areas they may revisit in the future, based on feedback from the field. Users are not encouraged to combine scores for each element or to determine a global rating, because the tool is intended for internal self-assessment purposes only. In addition to assigning a rating for each indicator, users are given space on the form to note and prioritize their plans for improvement.

Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree it measures important features of program quality), research related to the instrument's psychometric properties has not been conducted.

User Considerations
Ease of Use
Practitioners led the development of the QSA and represent its primary target audience. The language and format of the instrument are straightforward and user-friendly. The tool consists of one document, free and downloadable from the Web, that includes an overview, instructions and the instrument itself.

NYSAN has developed a new user guide, published in April 2008, to assist programs in utilizing the QSA. The guide provides guidance on how to engage staff in the assessment process, in addition to outlining the basic guidelines for administering the tool.

Programs are expected to go through the self-assessment process twice a year. Some in the field have concerns about the tool being too lengthy; this feedback will be taken into account in an upcoming revision process.

Additional Supports
The user guide, mentioned above, was created in consultation with a wide range of stakeholders – including NYSAN staff, a statewide Quality Assurance Committee, practitioner-based focus groups and an advisory group. The guide serves as a "self-guided walkthrough" of the QSA tool; the tool is embedded in the second half of the guide. NYSAN is currently developing phase two of the guide – an online version which will allow users to click on links to other web-based tools, articles and resources related to any one of the ten essential elements or the overall quality assessment and improvement process. The online version will also provide a descriptive example of optimal performance for every single indicator (the current hard-copy guide features only select examples). Programs can contact NYSAN to receive additional referrals for technical assistance in using the instrument.

While no centralized mechanism for collecting or analyzing results currently exists, with the development of the online version of the tool and user guide it will be possible to enter data by computer. This could lead to efficient opportunities to track and analyze data over time.

Although additional instruments are not provided with the tool, users are encouraged to consider QSA results as one important source of data to inform program planning, and to use them in concert with other formal or informal evaluative efforts such as participant, parent and staff surveys, staff meetings and community forums. In the future, users will be able to link to other tools from the online version of the QSA and guide.

All NYSAN training is now organized by the 10 elements featured in the tool, so practitioners can easily find professional development opportunities that connect with the results of their self-assessment. Regular trainings that are conducted twice a year with 21st CCLC grantees are also organized around the 10 elements.

In the Field
The Niagara Falls School District has funding through the 21st CCLC program to run after-school programs at four sites – three middle schools and one high school. While all after-school programs receiving 21st CCLC funding in the state of New York are required to conduct and submit QSA assessments twice a year, these programs in Niagara Falls have extended their use of the tool well beyond self-assessment. They see the QSA as central to staff and program development efforts.

Susan Ross, the Program Director within the school district, described how site coordinators use the tool. "We see the QSA as a staff development resource. About three weeks after the school year starts, site coordinators begin sitting down with all of their staff – teachers and community partners – and walking through the tool, one page per staff meeting. It gives us a collective sense of what's working and what we need to improve. It's a great focal point for discussions among staff."

Ross emphasized that one of the important benefits of this process is that it helps to level the playing field between staff from external community partner organizations and school teachers who work in after-school programs. "This really gives our partners an opportunity to feel their opinions are valued. Often when CBO staff come into schools they feel like guests as opposed to full-fledged partners. Through this process, they see their opinions are equally valued and that helps build overall staff morale."

Site directors and staff find the tool accessible and user-friendly. Ross summed up her assessment of the QSA in a matter-of-fact way. "We like it. It's easy to use, self-explanatory and understandable. In fact, I wouldn't change anything about it."

ForMoreInformationNYSAN’sProgramQualitySelf-AssessmentToolisavailableonlineat:www.nysan.org/content/document/detail/1991/

Contact
Ajay Khashu, Director of Research
NYSAN
925 Ninth Avenue
New York, NY
[email protected]


Purpose and History
The Promising Practices Rating Scale (PPRS) was developed for research purposes and is designed for use in school- and community-based after-school programs that serve elementary and middle school students. The tool allows observers to document the type of activity, the extent to which promising practices are implemented within activities and overall program quality.

The 2005 version of the PPRS, the version that is currently available, was developed by Deborah Vandell, Liz Reisner, Kim Dadisman, Kim Pierce and Ellen Pechman in the context of a specific study focused on the relationship between participation in "typically performing" programs and child and youth outcomes (Vandell, Reisner, Pierce, Brown, Lee, Bolt, Dadisman & Pechman, 2006; Vandell, Reisner & Pierce, 2007). Because of this, the tool was initially designed to verify whether or not programs were high-quality rather than to look at variations in quality across programs.

This instrument builds directly on earlier work by Vandell and colleagues focused at the elementary level (see the write-up of the Program Quality Observation Scale in this report) as well as the features of positive developmental settings identified by the National Research Council (2002). Its authors also drew upon several other observation instruments included in this report as they developed the exemplars of promising practices: the School-Age Care Environment Rating Scale, the Program Observation Tool and the OST Observation Tool designed by Policy Studies Associates.

Although the focus of this summary is the PPRS specifically, other components of the Promising Practices quality assessment system include interviews and questionnaires completed by program directors and staff. These tools obtain information about structural features of programs such as staff qualifications and ongoing training, material and financial resources and connections between the program and school, family and community.

Content
The PPRS provides researchers with a framework for observing essential indicators of high-quality programs. It addresses three different aspects of programming: activity type, implementation of promising practices and overall program quality. The first section, which closely mirrors the OST Observation Tool developed by Policy Studies Associates, focuses on documenting a range of in-depth information about the type of activity being observed and the skills emphasized through that activity. The promising practices ratings that constitute the core of the instrument focus on the following eight areas of quality:

• Supportive Relations with Adults
• Supportive Relations with Peers
• Level of Engagement
• Opportunities for Cognitive Growth
• Appropriate Structure
• Over-control
• Chaos
• Mastery Orientation

Because of its emphasis on what children and youth experience in programs, the PPRS has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including interactions between and among youth and staff and some aspects of instruction.

As mentioned above, the developers drew heavily on the Community Programs to Promote Youth Development report (National Research Council, 2002), so the features of positive development outlined in that report are quite visible within the tool's definition of promising practices. Although the PPRS itself does not include a focus on connections between the program and school, family or community (one of the features described in the NRC report), companion tools are available to capture this type of information.

Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.


Structure and Methodology
The first part of the PPRS, which focuses on the activity context, has observers watch an activity for 15 minutes and code several aspects of what they are observing, including:

• Activity type (e.g., tutoring, visual arts, music, sports, community service);
• Space (e.g., classroom, gym, library, cafeteria, auditorium, hallway, playground);
• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal);
• Number of staff involved in the activity; and
• Number, gender and grade level of participants.

These observations are recorded on a cover sheet that also includes other basic information about the observer, the program, date, time, etc.

Next, observers are asked to write down a brief narrative description of the activity they are observing, following a set of specific guiding questions (see below). This description supplements the activity context coding with a richer description of what is going on.

• What are youth doing?
• What kinds of materials are used?
• What kinds of instructional processes are used?
• What, if any, specific skills does the activity's leader(s) have that support the instruction involved in the activity he/she is conducting?
• What is the overall affective tone?
• To what extent are youth engaged?
• Describe observed promising practices as appropriate and raise concerns about quality, if there are any.

Level of Engagement (in intended experiences)

High:
Students appear engaged, focused and interested in their activities.
• Engaged in the focal activity and/or using free time appropriately.
• Appear to be interested in the activity.
• Follow staff directions in an agreeable manner.
Markers of engagement are appropriate to the activity (e.g., intense concentration witnessed during a computer activity, high levels of affect during sports activities); can be solitary or group activities.
Students contribute to discussions.
• Discuss back and forth and offer comments.
• Ask "on-task" questions.
• Are comfortable initiating conversation.

Low:
Students appear bored or distracted.
• Ignore staff who are talking to them.
• "Pretend" to listen.
• Wander aimlessly.
Markers of engagement are inappropriate to the activity (e.g., picking flowers while playing a sport activity).
Students do not contribute to discussions.
• Do not participate in discussions.
• Do not ask questions.

Rating Indicators:
1 = Most students are not engaged appropriately, may appear bored.
2 = Students are participating in activities but do not appear to be concentrating or affectively involved.
3 = Students are focused on activities with some evidence of affective involvement or sustained concentration.
4 = Students are concentrating on activities, focused, interacting pleasantly when appropriate and affectively involved in the activity.

The next section, and the core of the PPRS, is the Promising Practices Ratings section, where observers document to what extent certain exemplars of practice are present in the program. This section of the tool addresses the eight key areas of practice listed previously.

Each area of practice is subdivided into two to five specific exemplars, with more detailed indicators provided under each. Observers are given both positive and negative exemplars and indicators for each practice area in order to help guide determination of ratings (see the Level of Engagement example above).

In the PPRS, ratings are only assigned at the overall practice level (not for individual exemplars or indicators). Practices are either considered highly characteristic (4), somewhat characteristic (3), somewhat uncharacteristic (2) or highly uncharacteristic (1). Additional guidance as to what each of these terms means is provided in the instrument. At the bottom of the description of each practice area, observers are given tailored guidance as to what might lead to a 1, 2, 3 or 4 rating for that practice.

Finally, observers are asked to review their ratings of promising practices across multiple activities and assign an overall rating for each promising practice area. An overall program quality score is computed as the mean of the ratings on the eight scales, after reversing the scores for Over-control and Chaos. For each practice area there is space to write down notes to "justify" the overall rating assigned.
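The composite just described is simple arithmetic; the sketch below illustrates it in Python. The scale names are abbreviated for illustration and are not taken verbatim from the instrument.

```python
# Overall PPRS composite as described above: the mean of the eight
# practice-area ratings (1-4), with Over-control and Chaos reverse-scored
# so that higher always means better quality. Names are illustrative.

REVERSED = {"over_control", "chaos"}

def overall_quality(ratings: dict[str, int]) -> float:
    """ratings maps each of the eight practice areas to a 1-4 rating."""
    adjusted = {
        area: (5 - score) if area in REVERSED else score
        for area, score in ratings.items()
    }
    return sum(adjusted.values()) / len(adjusted)

example = {
    "supportive_relations_adults": 4,
    "supportive_relations_peers": 3,
    "level_of_engagement": 3,
    "cognitive_growth": 2,
    "appropriate_structure": 4,
    "over_control": 1,   # reverse-scored -> contributes 4
    "chaos": 2,          # reverse-scored -> contributes 3
    "mastery_orientation": 2,
}
print(overall_quality(example))  # 3.125
```

Reverse-scoring keeps the composite interpretable: a program rated low on Over-control and Chaos (the two negatively worded scales) is pushed toward the high end of the 1–4 range, consistent with the other six scales.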

Technical Properties
Available psychometric evidence supporting the PPRS addresses interrater reliability, score distribution, internal consistency and predictive validity, based on information from a study of 35 after-school elementary and middle school programs (Vandell, Reisner et al., 2006).

Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell, Reisner et al. (2006) examined the average scores for overall program quality and the individual rating scales obtained with the sample of high-quality elementary and middle school programs that participated in the Study of Promising After-School Programs at two time points. Generally, it is important to have a range of scores across programs, as that would suggest the measure can detect meaningful differences between programs. Because this sample included only high-quality programs, however, the scores naturally fell toward the positive extremes of each dimension. Score distributions on the PPRS obtained in 37 observations in programs of varying quality are more widely distributed, suggesting that the instrument detects meaningful differences among programs.

In that study, the authors theorized that scores would exhibit a wider range and would show low, moderate and high quality (as opposed to most programs scoring on the high end of the scale). As expected, scores for each of the scales generally had a wider distribution, with averages across the programs falling in the middle for most of the eight scales. Two scales, Opportunities for Cognitive Growth and Mastery Orientation, had low scores overall (averages were 1.65 and 1.78 on a scale of 1 to 4, respectively), although their scores in the Study of Promising Practices were closer to the center (average scores across two observation periods within programs serving elementary and middle school students, respectively, ranged from 2.6 to 2.9). The distribution differences for these two scales suggest that they may be better suited to differentiate among programs of higher quality. The Over-control scale was the only item that was consistently low for both the Study of Promising Practices and the follow-up study, which may simply suggest that staff in most programs do not exhibit a great deal of over-control. Taken together, the two studies provide strong evidence that the instrument captures meaningful differences across a variety of programs.


Interrater Reliability
The authors examined rater agreement for each of the instrument's eight items by calculating intraclass correlation coefficients between ratings of 24 programs made by two observers. Coefficients for the individual scales ranged from .58 for Opportunities for Cognitive Growth to .86 for Structure (average = .74). The intraclass correlation for the overall program quality score was .90. Interrater agreement represented by kappa scores in work conducted by other research teams in programs of varying quality ranged from .63 for Over-control to .94 for Supportive Relations with Adults (average = .77) across 37 observations made by two observers. These scores indicate acceptable interrater reliability, meaning the instrument's items are clear enough for raters to understand and agree on.

There is currently no information regarding interrater reliability for the overall program quality rating, which ranges on a three-point scale from low program quality to high program quality. Therefore, there is no evidence as to whether raters agree on the degree of quality that individual programs exhibit.

Additional Reliability Evidence
Additional rater agreement information was obtained by comparing two sets of ratings by the same rater conducted on consecutive days for each program. The authors found that the percent agreement for ratings of each feature over two days was between 81 percent and 97 percent, with an average of 90 percent. This translates into an average kappa score of 0.80, indicating that the average item's rating for Day 2 is not very different from Day 1.
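For readers unfamiliar with these statistics, the sketch below shows how percent agreement and Cohen's kappa can be computed for two sets of 1–4 ratings of the same features. The data are invented for illustration; this is not the study's actual computation, and the published figures reflect the study's own data.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of items on which the two sets of ratings match exactly."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance, given each set's category frequencies."""
    n = len(r1)
    observed = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability both sets independently pick the same category.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

day1 = [4, 3, 3, 2, 4, 1, 2, 3]  # hypothetical Day 1 ratings of 8 features
day2 = [4, 3, 2, 2, 4, 1, 2, 3]  # hypothetical Day 2 ratings of the same features
print(percent_agreement(day1, day2))        # 0.875
print(round(cohens_kappa(day1, day2), 2))   # 0.83
```

Kappa is always lower than raw percent agreement because it discounts matches that would occur by chance, which is why the study reports both figures.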

Internal Consistency
To determine whether the items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach's alpha. In the Study of Promising After-School Programs, alpha coefficients for the overall program quality score ranged from .74 to .77, indicating acceptable internal consistency.
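Cronbach's alpha itself is straightforward to compute. The following is a self-contained sketch with made-up data (a small programs-by-items matrix of ratings); it is illustrative only and does not reproduce the study's data or code.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# Values near 1 mean the items move together; .70+ is commonly read as acceptable.

def cronbach_alpha(scores):
    """scores: list of rows, one per program; columns are items (ratings)."""
    k = len(scores[0])                      # number of items
    cols = list(zip(*scores))               # per-item columns
    totals = [sum(row) for row in scores]   # per-program total scores

    def variance(xs):                       # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(variance(c) for c in cols)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

data = [        # 5 hypothetical programs rated on 4 items (1-4 scale)
    [3, 3, 4, 3],
    [2, 2, 2, 3],
    [4, 4, 4, 4],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(data), 2))  # 0.94
```

Alpha rises when programs that score high on one item tend to score high on the others, which is exactly the "items fit together" property the paragraph above describes.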

Predictive Validity
Initial evidence of predictive validity is available for the PPRS, which means that the instrument does predict youth outcomes that would be expected from prior theory or research. Specifically, Vandell, Reisner and colleagues (2005) found that youth attending high-quality programs (as measured by the PPRS) had better educational and behavioral outcomes by the end of the academic year than unsupervised youth who did not regularly attend any after-school program, including better academic performance, task persistence, social skills and pro-social behaviors with peers and less misconduct, substance abuse and aggressive behavior.14 Vandell, Reisner et al. (2006) reported similar findings for longer-term outcomes after two years of program participation. Improvements in math achievement scores have also been reported (Vandell, Reisner & Pierce, 2007).

Thefactthattheinstrument’sratingsrelatedtoexpectedoutcomesofferssomereassurancetousersthatitaccuratelymeasuresaspectsofprogramquality.However,thevalidityevidenceshouldbetakenaspreliminaryforseveralreasons.First,theauthorshavenotexaminedPPRSratingsoflow-qualityprograms.Noevidenceexiststhattheinstrumentdistinguishesbetweenexpertratingsoflow-andhigh-qualityprograms,orwhetherlow-qualityprogramratingspredictyouthoutcomesdifferentlythanhighqualityprogramratings.Itwouldalsobeusefultounderstandthepredictivevalidityofeachspecificscale(e.g.,levelofengagement,appropriatestructure)andtheoverallscore.

User Considerations
Ease of Use
While the PPRS is available online and free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. While the observation manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general use or for practitioner use at this time and therefore include some language (e.g., construct, exemplar) that may not necessarily be accessible for non-research audiences.

14 Results were found using two advanced statistical techniques known as cluster analysis and hierarchical linear modeling.


In the context of the study the PPRS was developed for, site visits were fairly time-intensive (spread over the course of two days). However, formal observation time totaled approximately two hours per site, with several additional hours spent reviewing notes and assigning ratings. Some additional guidance about how to conduct observations, develop ratings and complete the forms is provided in the manual.

Available Supports
At this time, training on how to use the PPRS is not regularly available, but training has been conducted with data collectors involved in research. Trainings have included reviewing the contents of the instrument and pairing new raters with trained raters to conduct an observation in the field, compare scores and build inter-observer agreement.

Observation data collected with the PPRS have always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PPRS could be used independently. Additional measures are also available at the same website as the PPRS.

In the Field
The Promising Practices Rating System was developed specifically for use in the Study of Promising After-School Programs, a national study funded by the C.S. Mott Foundation that focused on the short- and long-term impacts of high-quality after-school programs on the cognitive, academic, social and emotional development of children and youth in high-poverty communities. The research was led by Deborah Vandell of UC Irvine (formerly of the University of Wisconsin-Madison) and Elizabeth Reisner of Policy Studies Associates.

Two-day site visits to participating programs were conducted in fall 2002, spring 2003, fall 2003 and spring 2005 to assess the quality of each program. During site visits, researchers conducted observations using the PPRS on two afternoons, for a minimum of one hour per day. Observers focused on the activities of the target age groups (grades 3 and 4 and grades 6 and 7) and observed as many different types of activities as possible, with a minimum of 15 minutes per activity. At the end of the first day of the site visit, observers assigned tentative ratings to each of the eight practice areas; at the end of the second day, the final ratings were determined.

As analyses got underway, the authors revised their conceptual scheme based on the idea that sets of experiences should be taken into consideration, as opposed to labeling students as program vs. non-program. Elementary students with high rates of participation in quality after-school programs but low levels of participation in other after-school arrangements (the program-only cluster) outperformed the low-supervision cluster (self-care + limited activities) on every measure of academic and social competence assessed. The supervised-at-home group outpaced the self-care + activities cluster on all academic measures and social skills. Among middle school students, the program + activities cluster had better work habits than the low-supervision group, and both program groups (program + activities, program only) reported less misconduct and substance use compared to the low-supervision group. Similar results were found for program involvement across three years. Additional findings are available at http://childcare.gse.uci.edu/des3.html.

For More Information
The PPRS is available online at: http://childcare.gse.uci.edu/des3.html

Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, CA
[email protected]


Purpose and History
The Quality Assurance System® was developed by Foundations, Inc. to help after-school programs conduct quality assessment and continuous improvement planning. Based on Foundations, Inc.'s experience running after-school programs, offering professional development activities and providing technical assistance and publications for the field, the QAS was designed to help programs develop and sustain a commitment to quality.

In its first incarnation, the QAS was a simple checklist designed to assess the quality of after-school programs operated by the organization itself. Roughly five years ago, staff at Foundations reconstructed and expanded the tool for broader use, with input from practitioners both inside and outside of the organization. The QAS was developed to be general enough for use in a range of school- and community-based programs serving children and youth in grades pre-K–12.

Based on seven "building blocks" that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their focus. The QAS focuses on quality at the "site" level and addresses a range of aspects of quality, from interactions to program policies and leadership. Filling out the QAS requires a combination of observation, interview and document review. Scores are generated for each building block rather than for the overall program, reflecting the tool's emphasis on identifying specific areas for improvement.

Content
The various components of quality that the QAS focuses on are called "building blocks." The seven core building blocks, which describe what Foundations considers to be the fundamental features that underlie effective after-school programming, include:

• Program planning and improvement;
• Leadership;
• Facility and program space;
• Health and safety;
• Staffing;
• Family and community connections; and
• Social climate.

In addition to these seven, three "program focus building blocks" reflecting the particular goals or focus of a program are available for users to select from:

• Academics;
• Recreation; and
• Youth development.

The QAS puts roughly equal emphasis on three different components of settings: social processes, program resources and the arrangement or organization of those resources within programs. There are items on the QAS that address all of the features of positive developmental settings outlined by the National Research Council (2002), with somewhat more emphasis on features related to structure and skill-building than on features such as "support for efficacy and mattering" and "supportive relationships."

Structure and Methodology
The structure of the QAS is clear and straightforward. Part one – program basics – includes the seven core building blocks. For each one, users are given a brief description of the importance of that aspect of quality. The building block is further subdivided into five to eight specific elements, each of which is assigned a rating by assessors. For example, the elements of the social climate building block include: behavioral expectations, staff/participant interactions, diversity, social time and environment. For each element, more specific descriptions (also referred to as a "rubric") are provided. Part two of the tool – program focus – consists of the three additional building blocks, and its structure parallels that of part one. Programs are encouraged to use one, two or all three of the program focus building blocks in conducting their assessment.

Quality Assurance System®
Developed by Foundations, Inc.


Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4). For each element of a building block, specific descriptions of what might lead to a 1, 2, 3 or 4 rating are provided (see example below).

In terms of data collection, users are provided with a document checklist that identifies what kinds of specific documents might be useful in filling out the QAS, and they are encouraged to gather and examine such documents prior to observing the program. The "program profile" section of the tool asks users to upload important basic information about the program and can also be filled out, for the most part, prior to visiting.

Once on-site, the users' guide encourages observers to go through five steps:

• Meet people to establish rapport and hear from staff and youth about the program;
• Wander with purpose to develop a sense of the entire facility;

Example: Staffing building block rubric. Each element is scored 1 (Unsatisfactory), 2 (Needs Improvement), 3 (Satisfactory) or 4 (Outstanding).

5.1 Staff to Participant Ratio
1 – Insufficient staff are hired for the number of participants.
2 – Sufficient staff are hired for some levels of participation, but staffing is sometimes insufficient due to attendance fluctuations.
3 – Appropriate participant to staff ratios are maintained consistently.
4 – Staff number and attendance exceed required ratios.

5.2 Qualifications
1 – Fewer than half the staff have the required training and/or experience.
2 – More than half the staff have the required training and/or experience.
3 – All staff have the training and/or experience required by the program.
4 – Many staff members exceed the training and/or experience required by the program.

5.3 Professional Growth
1 – Professional development is not provided, nor is time allocated for staff to pursue individual professional growth.
2 – Some professional development opportunities are provided, but they are poorly attended.
3 – Staff attend professional development sessions at least twice a year.
4 – Staff identify professional development needs and attend professional development sessions more than twice a year.

5.4 Attendance
1 – Staff absenteeism is an ongoing problem (e.g., a significant number of staff are routinely absent).
2 – Staff absences are an occasional problem.
3 – Staff are reliable and absences are infrequent.
4 – Staff absences are rare.

5.5 Retention
1 – Staff turnover is identified as a problem.
2 – Staff turnover occasionally affects program offerings.
3 – Staff retention is not identified as a problem.
4 – Staff retention is excellent and provides stability.

Total Score


• Observe activities to see the program in action, the level of engagement and the nature of activities;
• Gather materials to ensure that all of the documents in the checklist and any other relevant materials are collected; and
• Take notes to ensure you have a running record of your observations and questions.

Once scores for each element are entered into the QAS, the program electronically generates overall building block scores. The program's quality profile then begins to emerge through summary graphs the software generates for each building block, as well as a program summary graph that contains scores for each building block assessed. The graphs and building block scores help users target areas for improvement as part of the assessment process. A follow-up QAS assessment enables users to identify areas of progress and then refine goal-setting and improvement planning.
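The guide does not publish the QAS's exact scoring algorithm, so the sketch below rests on an assumption: that a building block's score is the mean of its 1-4 element ratings, and that the follow-up comparison is a simple per-block difference. Names like `building_block_score` are illustrative, not part of the actual system.

```python
# Hypothetical aggregation: mean of 1-4 element ratings per building block,
# then a per-block change score between an initial and a follow-up assessment.

def building_block_score(element_ratings):
    """Average the 1-4 element ratings for one building block."""
    return sum(element_ratings) / len(element_ratings)

def improvement_report(initial, follow_up):
    """Per-block change between two assessments keyed by block name."""
    return {block: round(follow_up[block] - initial[block], 2)
            for block in initial}

initial = {"Staffing": building_block_score([2, 3, 2, 3, 2]),
           "Social climate": building_block_score([3, 3, 4, 3, 3])}
follow_up = {"Staffing": building_block_score([3, 3, 3, 4, 3]),
             "Social climate": building_block_score([3, 4, 4, 3, 4])}
print(improvement_report(initial, follow_up))  # → {'Staffing': 0.8, 'Social climate': 0.4}
```

In the actual system, such per-block scores feed the summary graphs that highlight which building blocks most need attention.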

Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree this measures important features of program quality), research related to the instrument's psychometric properties has not yet been conducted.

User Considerations
Ease of Use
The QAS is a straightforward, flexible tool with several built-in features that make it particularly user-friendly. The instruction guide is written in clear, accessible language and walks users through the necessary background and basic steps for using the system. The standard cost for the QAS has recently been reduced to $75 for an annual site license. This license is good for two official uses (or assessments) – which is what its developers suggest programs conduct annually, once toward the beginning of the year and once toward the end. After two uses the system generates a cumulative report comparing the initial and follow-up assessments. For programs with multiple sites, a cumulative report comparing site results is available with the initial assessment. When the QAS is used as part of a professional development package related to quality improvement, discounts are available.

Available Supports
Foundations, Inc. offers online sessions and in-person training options to assist organizations in using the tool. Multi-site organizations may contract for individualized technical assistance and training, which may include options for customization of the tool. Trainings addressing quality elements reflected in the building blocks are available online, in technical assistance, and in professional development sessions.

For self-assessment purposes, once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from Foundations, Inc. staff if they have questions while using the system. Programs that wish to have trained assessors conduct their assessment can purchase this service under contract with Foundations, Inc.

The QAS is available in a Web-based format, allowing users to enter data and immediately generate basic graphs and analyses. The site-specific reports generated are specifically designed to help site staff and leaders use the information to guide improvement planning.

In the Field
Foundations, Inc. is working with the U.S. Dream Academy, a national program serving elementary and middle school students who are children of prisoners. In 10 centers around the country, U.S. Dream Academy provides a comprehensive program including academic support, enrichment, and a one-on-one mentoring relationship. The U.S. Dream Academy chose to use the QAS and a technical assistance strategy with Foundations during 2008-2010 to build and support program quality, and to establish an ongoing process of continuous improvement. During initial meetings, U.S. Dream leaders and staff worked with Foundations to clarify quality indicators for the program and customize the tool. The processes of co-assessment and self-assessment are designed to build the capacity of sites


to target specific improvement goals and concrete steps, identify site strengths and innovations, and share strengths organization-wide. U.S. Dream Academy national headquarters will use the QAS findings to regularly identify where and how they can best support their centers.

Directors anticipate that the QAS and surrounding processes will direct staff to ask critical questions about their program environments and staff practice. At the same time, it will allow centers to highlight and share strengths and accomplishments across the organization, building internal resources for quality. After joint assessments are conducted with U.S. Dream staff and Foundations at each site, individual scores will be aggregated and presented at a national meeting. Each center will conduct a follow-up self-assessment at the end of the 2008-2009 school year, at which point they will be able to analyze the data and evaluate their own program development. The QAS tool also will be available the following school year to allow each site to continue the self-assessment process.

The QAS design, coupled with technical assistance processes, allows for customization of the tool. With the addition of an eleventh building block focused on U.S. Dream Academy's mentoring component, the tool encompasses the organization's full range of essential program components. C. Diane Wallace Booker, Executive Director of the U.S. Dream Academy, stated that "the beauty of QAS is that it is designed with evidence-based quality indicators yet is customizable and able to capture the unique elements of our program, and helped us to more clearly define what quality looks like for us versus any other afterschool program. Further, the process of guided self-assessment and continuous improvement planning is critical to our ongoing efforts to achieve impact in the lives of the children we serve." Establishing an ongoing process for quality-building tailored to the specifics of the program is particularly important for multi-site programs. As U.S. Dream Academy expands to serve more children, the sustained quality assurance component becomes ever more critical.

For More Information
Additional information about the QAS, including ordering information, is available online at http://qas.foundationsinc.org/start.asp?st=1 or by visiting www.afterschooled.org

Contact
Rhe McLaughlin, Associate Executive Director
Center for Afterschool Education, Foundations, Inc.
Moorestown West Corporate Center
2 Executive Drive, Suite 4
Moorestown, NJ
[email protected]


Purpose and History
The School-Age Care Environment Rating Scale (SACERS) is designed to assess before- and after-school care programs for elementary school age children (5- to 12-year-olds), as well as whole-day programs in communities with year-round schools. It focuses on "process quality" – the social and educational interactions going on in the setting – as well as program features related to space, schedule, materials and activities that support those interactions.

The SACERS was developed for self-assessment, program monitoring or program improvement planning, as well as for research and evaluation. It can be used by program staff as well as trained external observers or researchers. While self-described as appropriate for "group care programs," the SACERS has been used in a range of program environments beyond child care centers, including school-based after-school programs and community-based organizations such as YMCAs and Boys and Girls Clubs.

The SACERS, published in 1996 but updated periodically since then, is one of a series of program assessment instruments developed by researchers affiliated with the Frank Porter Graham Child Development Institute (FPG). As such, the SACERS is an adaptation of the Early Childhood Environment Rating Scale (ECERS) and is quite similar in format and mechanics to the ECERS, the Family Day Care Rating Scale (FDCRS) and the Infant/Toddler Environment Rating Scale (ITERS). Some states and localities have used several scales within the series to create continuity across accreditation or accountability systems, given the consistent orientation, language, format and scoring techniques.

Content
The SACERS measures process quality as well as corresponding structural features of programs. Its content reflects the notion that quality programs address three "basic needs" of children: protection of their health and safety, positive relationships, and opportunities for stimulation and learning. These three basic components of quality care are considered equally important. They manifest themselves in tangible, observable ways and constitute the key aspects of process quality included in the SACERS. The seven sub-scales of the SACERS include:

• Space and Furnishings;
• Health and Safety;
• Activities;
• Interactions;
• Program Structure;
• Staff Development; and
• Special Needs Supplement.

By addressing both process quality and structural features that relate to process quality (and other structural matters not directly related to process quality, such as health policy), the SACERS puts as much emphasis, if not more, on program resources and the organization of those resources as it does on social processes that occur within the setting. This reflects its roots in the assessment and monitoring of environments serving young children. There are items on the SACERS that address each of the features of positive developmental settings outlined by the National Research Council (2002), with the most emphasis (the largest number of relevant items) clustering under the "physical and psychological safety" feature.

Interactions Sub-Scale Items
• Greeting/Departing
• Staff-child Interactions
• Staff-child Communication
• Staff Supervision of Children
• Discipline
• Peer Interactions
• Interactions Between Staff & Parents
• Staff Interaction
• Relationships Between Program Staff & Classroom Teachers

Developed by Frank Porter Graham Child Development Institute & Concordia University, Montreal


Structure and Methodology
The structure of the SACERS is straightforward and consistent with the other tools in the Environment Rating Scales series. The scale includes 49 items in the seven sub-scales mentioned above (see box for the items in the "Interactions" sub-scale). All of the sub-scales and items are organized into one booklet that includes the items, directions for use and scoring sheets.

While observation is the main form of data collection the instrument is built around, there are several items that are not likely to be observed during program visits. While the SACERS does not separate those items out into a separate interview scale or form, raters are encouraged to ask questions of a director or staff person in order to rate these items, and they are provided with specific sample questions that will help them get the necessary information to complete the form.

All 49 items are rated on a seven-point scale, with one being "inadequate" and seven being "excellent." Concrete descriptions of what each item looks like at a one, three, five and seven are provided (see examples below). Notes for clarification that help the user understand what they should be looking for are also provided for many items. Observers compile their scores onto a summary score sheet, which encourages users to compile ratings and create an overall average program quality score.

The SACERS is meant to be used while observing one group at a time, for a period of three hours. A sample of one-third to one-half of groups (when programs have children divided into groups or classrooms) is required to establish a score for an entire program.
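That sampling rule can be sketched in code. The assumptions here are mine, not the developers' published procedure: the program score is taken as the mean of each observed group's average item rating on the 1-7 scale, over a randomly sampled one-third of groups.

```python
import math
import random

# Assumptions (not the SACERS developers' published procedure): observe a
# sampled one-third of groups and treat the program score as the mean of
# each observed group's average item rating on the 1-7 scale.

def group_average(item_ratings):
    """Mean of one group's 1-7 item ratings."""
    return sum(item_ratings) / len(item_ratings)

def sample_size(n_groups, fraction=1 / 3):
    # At least the requested fraction of groups, rounded up.
    return min(n_groups, math.ceil(n_groups * fraction))

def program_score(groups, fraction=1 / 3, seed=0):
    """groups: one list of item ratings per group; returns a program mean."""
    chosen = random.Random(seed).sample(groups, sample_size(len(groups), fraction))
    return sum(group_average(g) for g in chosen) / len(chosen)
```

For a program with six classrooms, `sample_size(6)` returns 2, i.e. a third of the groups rounded up.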

Technical Properties
In the case of the SACERS, psychometric evidence demonstrates that observations by different raters are consistent (interrater reliability) and that the instrument's scales consist of items that cluster together in meaningful ways (internal consistency). Preliminary evidence also exists for concurrent validity, suggesting the SACERS may be an accurate measure of

Example: Staff-Child Communication item, rated on the seven-point scale with descriptors at 1 (Inadequate), 3 (Minimal), 5 (Good) and 7 (Excellent).

Inadequate (1)
• Staff-child communication is used primarily to control children's behavior & manage routines.
• Children's talk not encouraged.

Minimal (3)
• Staff initiate brief conversations (e.g., ask questions that can be answered with yes/no; limited turn-taking in conversations).
• Limited response by staff to child-initiated conversations & questions.

Good (5)
• Staff-child conversations are frequent.
• Turn-taking in conversation between staff & child is encouraged (e.g., staff listen as well as talk).
• Language is used primarily by staff to exchange information with children & for social interactions.

Excellent (7)
• Children are asked "why, how, what if" questions which require longer, more complex answers.
• Staff make effort to talk with each child (e.g., listen to child's description of school day, including problems & successes).
• Staff verbally expand on ideas presented by children (e.g., add information, ask questions to encourage children to explore ideas).


program practices that predict related outcomes.15 The information presented here is reported by Harms, Jacobs and White (1996).

Interrater Reliability
To examine interrater reliability, or the degree to which different raters agree when observing the same program, paired raters assessed 24 programs using the measure. Researchers tested interrater reliability with the SACERS scales and total score using kappa scores and intraclass correlation coefficients. All reliability coefficients were near or above 0.70, suggesting strong agreement. In other words, with adequate training for raters, scores will not depend on which rater is evaluating a given program.
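For readers unfamiliar with the statistic, Cohen's kappa measures agreement between two raters after discounting the agreement expected by chance. A minimal implementation follows; the paired ratings are invented for illustration (the 24-program study's data are not reproduced here, and the intraclass correlations are not shown).

```python
from collections import Counter

# Cohen's kappa on invented paired ratings from the SACERS 1-7 scale.

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    chance = sum(count_a[c] * count_b[c]
                 for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - chance) / (1 - chance)

rater_a = [1, 3, 5, 5, 7, 3, 1, 5, 7, 3]
rater_b = [1, 3, 5, 7, 7, 3, 1, 5, 5, 3]
print(round(cohens_kappa(rater_a, rater_b), 2))  # → 0.73
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why the reported coefficients near or above 0.70 count as strong.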

Internal Consistency
Researchers examined how consistent individual item scores are within each respective SACERS scale, since all of the items within a particular scale are intended to measure a particular concept (e.g., Health and Safety). Internal consistency of the scales and the total score was strong, with alpha values ranging from .67 to .95. High internal consistency strengthens the argument that the items jointly represent the central concept of interest.
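Cronbach's alpha, the statistic behind these values, compares the summed variance of individual items to the variance of the scale totals. A minimal sketch on invented data (three items from one scale, each scored for four programs):

```python
# Cronbach's alpha on invented data:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(item_scores):
    """item_scores: one list of per-program scores for each item in a scale."""
    k = len(item_scores)
    totals = [sum(per_program) for per_program in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

items = [[3, 5, 5, 7],
         [3, 5, 7, 7],
         [1, 5, 5, 7]]
print(round(cronbach_alpha(items), 2))  # → 0.95
```

When items rise and fall together across programs (as here), the totals' variance dwarfs the items' individual variances and alpha approaches 1.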

Convergent Validity
Convergent validity is examined by comparing the findings from the instrument of interest to a similar assessment tool, to help demonstrate the instrument's ability to measure what it is supposed to measure. Findings from three of the SACERS scales were compared to ratings with Vandell and Pierce's Program Quality Observation Scale (by the authors and colleagues of the PQO, also reviewed in this report). Evidence indicated that each of these three SACERS scales (Program Structure, Activities and Interactions) was related to similar PQO items in expected ways. Specifically, Vandell and Pierce (1998) found the following relationships between PQO and SACERS scales in 46 after-school programs: (1) SACERS Program Structure was positively related to PQO Programming Flexibility, (2) SACERS Activities was positively related to PQO Available Activities, and (3) SACERS Interactions was positively related to PQO Staff Positive Regard and Positive Behavior Management, and negatively related to PQO Staff Negative Regard and Negative Behavior Management. Convergent validity evidence is unavailable on the other SACERS scales.
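Statements that one scale is "positively related" to another typically rest on a correlation between programs' scores on the two scales. A minimal Pearson correlation, for illustration only (the 1998 study's actual data are not reproduced here):

```python
# Pearson correlation coefficient between two lists of program-level scores.

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Invented scores: programs rated higher on one scale tend to rate
# higher on the other, so r comes out positive.
print(round(pearson_r([3, 4, 5, 6, 7], [2, 4, 3, 6, 7]), 2))
```

A positive r backs a "positively related" claim; a negative r backs findings like the Interactions/Negative Regard relationship above.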

Concurrent Validity
To determine whether the SACERS accurately measures program quality, developers examined whether the instrument's ratings were related to distinct, theoretically important concepts in expected ways. Additional concurrent validity evidence covers all of the scales and the total score. Specifically, because prior research suggests program quality is related to staff education/training, researchers expected that if the SACERS scales were adequately measuring quality, they would be positively related to staff education/training. As Harms, Jacobs and White (1996) expected, Space and Furnishings, Interactions, and Program Structure, as well as the overall SACERS score (which can be thought of as general program quality), were moderately, positively correlated with a measure of staff education and training. However, they did not report parallel correlations for the three additional scales (Health and Safety, Activities, or Staff Development); it is unclear whether they did not test these scales or whether they found them to be unrelated to staff education/training. The researchers also tested the validity of the scales by examining their relationship to staff-child ratio. As expected, they found that Health and Safety, Activities, and Staff Development were moderately related to staff-child ratio. They did not report correlations between the other scales and total score with staff-child ratio, and it is unclear whether they did not test these or whether they were uncorrelated with staff-child ratio.

Additional Validity Evidence
To explore the extent to which the SACERS adequately measures program quality, the developers asked nine

15 Except when noted, psychometric information is not available for the supplementary "special needs" items at the end of the instrument because none of the programs tested had exceptional children.


experts to rate how much each item in the instrument related to their definition of high quality. Using a five-point scale (with five being a very important aspect of quality), the minimum average score was around a four and experts rated most items close to a five. These scores suggest that items adequately measure aspects of quality. However, since experts were not asked whether any aspects of quality were absent from the instrument, this should not be taken as evidence that program quality as a whole is adequately represented.

User Considerations
Ease of Use
The SACERS is very easy to use in terms of accessibility of format and language (it is currently available in English, French, German, and most recently, Spanish). Full instructions for using the scale are included in the booklet along with the items themselves, notes clarifying many of the items, and a training guide with advice on preparing to use the scale, conducting a practice observation and determining interrater reliability. One blank score sheet is included in the booklet and additional score sheets can be ordered in packages of 30. The SACERS booklet is available for purchase through Teachers College Press at $15.95.

Developers suggest it takes approximately three hours to observe a program and complete the form (users are encouraged to check off indicators and make at least initial scoring decisions while observing). Acknowledging that quality can vary within the same center or program, the developers advise that the approach to observation and scoring reflect how programs are structured. If a program has children broken into several different classrooms, observers are encouraged to observe one-third to one-half of the groups in the program before creating an overall score.

Available Supports
Three- and five-day training workshops focused on the structure, rationale and scoring of the SACERS are available through the FPG Institute, as is additional information about the instrument and the other rating scales in the series. Specific guidance for how to conduct your own training with staff or other observers is provided in the SACERS booklet. Training to reliability takes an estimated 4-5 days, with reliability checks throughout.

FPG is currently soliciting input from users in the field to develop a practical manual for adult educators using any of the rating scales, which will include specific materials such as course syllabi and outlines. Forms have also been developed to assist with reporting and applying observations to program improvement plans. Users can sign up to join a listserv through the FPG Web site to interact with other users in the field and to hear about updates and other relevant developments.

Large-scale users of the rating scales can now work with a commercial software package – the ERS Data System – to enter and score their data. The Tablet PC version displays the items as seen in the print version and scores are made by tapping on the screen. Notes can also be written with a special pen; they are automatically translated into print text and can be incorporated into the summary reports. The software also has a module on interrater reliability which can be used to compare scores, reach consensus and determine reliability. Using the Web-based system, individual assessments can be automatically routed to a supervisor for quality assurance and feedback, and aggregate data analysis and organization- and program-level reporting can be provided.

Important information for updating the SACERS is available at www.fpg.unc.edu/~ecers, including additional Notes for Clarification and an expanded score sheet. Also, a revision of the SACERS is forthcoming, as is a Youth rating scale for programs serving middle and high-school age youth.

In the Field
The state of Tennessee passed legislation in 2001 requiring all licensed child care centers and family/group homes in the state to be assessed using the Environment Rating Scales (including SACERS). The resulting Child Care Evaluation and Report Card program has two parts, one mandatory and one voluntary, both


of which are structured around the Environment Rating Scales to assess the quality of care provided at specific facilities. In the mandatory part of the program, the ERS assessment is one of several components of an overall "report card" given to each provider that must be posted along with their annual license.

The voluntary part of the program ties the ERS-based assessment to reimbursements. In the Star-Quality Child Care program, overall assessment scores for participating providers are converted into one, two, or three stars, which in turn can increase the provider's state reimbursement by 5, 10 or 15 percent, respectively. To support participation in both the mandatory and voluntary programs, local Technical Assistance Units provide assistance, at no charge, to any provider that wants information on how to improve quality and thereby increase its assessment score.
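The star-to-reimbursement logic just described can be sketched as follows. The score cut-offs below are invented placeholders, not Tennessee's actual thresholds; only the 5/10/15 percent bonuses come from the description above.

```python
# Voluntary Star-Quality sketch: an overall ERS score maps to 0-3 stars,
# and stars raise the state reimbursement rate. The cut-offs are invented.

STAR_BONUS = {0: 0.00, 1: 0.05, 2: 0.10, 3: 0.15}

def stars(overall_score, cutoffs=(3.0, 4.0, 5.0)):
    """Count how many (hypothetical) cut-offs a 1-7 ERS score clears."""
    return sum(overall_score >= c for c in cutoffs)

def reimbursement(base_amount, overall_score):
    """Base reimbursement plus the percentage bonus for the star level."""
    return base_amount * (1 + STAR_BONUS[stars(overall_score)])

print(stars(4.5), reimbursement(1000, 4.5))  # 2 stars, a 10 percent increase
```

Tying the bonus to the same observational score used for the report card is what makes the technical assistance offer concrete: raising the assessed score directly raises the reimbursement rate.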

The Tennessee Department of Human Services (TDHS) works with the University of Tennessee and several other organizations to implement and manage this program. TDHS and UT's Social Work Office of Research and Public Service manage the program, and Tennessee State University prepares and delivers the initial training for assessors. Eleven resource centers around the state house an Assessment and Technical Assistance Unit. These units, which are responsible for conducting all the ERS-based assessments, hire and employ about 60 assessors statewide. Assessors receive ongoing training and frequent reliability checks by assessment specialists at UT.

The assessment process takes place in conjunction with license renewal. A database has been developed that provides access to regularly updated statistical and demographic information about the program's success in promoting, supporting and increasing quality child care across the state.

The SACERS and other scales in this series are part of many other state quality rating systems, including those in North Carolina, Mississippi, Arkansas and Pennsylvania.

For More Information
Additional information about the SACERS, supplementary materials and ordering information is available online at www.fpg.unc.edu/~ecers/

Contact
Thelma Harms, Director of Curriculum Development
Frank Porter Graham Child Development Institute
517 S. Greensboro Street
Carrboro, NC
[email protected]


Purpose and History
The Youth Program Quality Assessment (YPQA) is an instrument designed to evaluate the quality of youth-serving programs. While its practical uses include both program assessment and program improvement, its overall purpose is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff.

While some quality assessment tools and processes focus on the whole organization, the YPQA is primarily focused on what the developers refer to as the "point of service" – the delivery of key developmental experiences and young people's access to those experiences. While some structural and organizational management issues are included in the instrument, it focuses primarily on those features of programs that can be observed, that staff have control over and that staff can be empowered to change. While these social processes have not always been emphasized in licensing and regulatory processes, research suggests they are critical in influencing program quality and outcomes for youth. Given this focus, the YPQA is expected to assess program quality most accurately when users observe program offerings (programmatic experiences consisting of the same staff, children and learning purpose across multiple sessions).

The YPQA has its roots in a long lineage of quality measurement rubrics developed by the High/Scope Educational Research Foundation over the past several decades for pre-school, elementary and now youth programs. In its initial iteration, the instrument was developed specifically to assess implementation of the High/Scope participatory learning approach. In its current form, the tool is relevant for a wide range of community- and school-based youth-serving settings that serve grades 4–12. It has been used in a range of after-school, camp, youth development, prevention and juvenile justice programs. It is not necessarily appropriate for use in highly unstructured settings that lack facilitated activities.

Content
The YPQA measures factors at the Program Offering level and the Organizational level that affect quality at the "point of service." The seven major domains (called sub-scales in the tool) that are covered include Engagement, Interaction, Supportive Environment, Safe Environment, Youth-centered Policies and Practices, High Expectations and Access.

Because of the focus on the "point of service," the YPQA emphasizes social processes – the interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program, and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.

The content of the YPQA aligns well with the National Research Council's features of positive developmental settings (2002), with the least emphasis on what is referred to by the NRC as "integration of family, school and community efforts." The content of the YPQA has also been reviewed against, and appears compatible with, Jim Connell and Michelle Gambone's youth development framework (2002).

Structure and Methodology
The seven topics or domains covered by the YPQA are measured by two different overall scales (groups of related items) that require different data collection methods. The program offering items are included in Form A and are assessed through observation. Form B includes the organization level items, which essentially assess the quality of organizational support for the program offering level items that are the focus of Form A. Evidence for Form B is gathered through a combination of guided interview and survey methods.

The seven domains can be graphically represented by the "pyramid of program quality" (see below), which represents both an empirical reality and a unified

Developed by the David P. Weikart Center for Youth Program Quality16

16 The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.


framework for understanding and improving quality. From an empirical perspective, assessments using the YPQA thus far follow a distinct pattern – most programs score highest in safety and then progressively lower as you move up the levels of the pyramid through support, interaction and engagement. Programs that score high in engagement and interaction appear most able to influence positive youth outcomes (see technical properties for more detail on the validity study).

The scale used throughout the YPQA is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, very concrete descriptors are provided to

[Figure: The pyramid of program quality. From base to top, the four point-of-service levels and their components:]

Safe Environment – Psychological & Emotional Safety; Physically Safe Environment; Emergency Procedures & Supplies; Program Space & Furniture; Healthy Food & Drinks
Supportive Environment – Welcoming Atmosphere; Appropriate Session Flow; Active Engagement; Skill Building; Encouragement; Reframing Conflict
Interaction – Experience a Sense of Belonging; Be in Small Groups; Partner With Adults; Lead & Mentor
Engagement – Set Goals & Make Plans; Make Choices; Reflect

[Alongside the pyramid, the three organizational domains:]

High Expectations – Staff Development; Supportive Social Norms; High Expectations for Young People; Committed to Program Improvement
Youth Centered Policies & Practices – Staff Qualifications Support Positive Youth Development; Tap Youth Interests & Build Skills; Youth Influence Settings & Activities; Youth Influence Structure & Policy
Access – Staff Availability & Longevity; Program Schedules; Barriers Addressed; Families, Other Organizations, Schools


illustrate what a score of 1, 3 or 5 looks like (see example on next page). The scoring for Forms A and B is consistent, but in the case of Form B, evidence to drive the scoring is based on an interview as opposed to observations. Observers are encouraged to write down evidence or examples that support the score that has been applied.

Technical Properties
Extensive psychometric evidence about the YPQA is primarily available from three studies. The first, referred to as the Validation Study, examined the reliability and validity of the instrument's scales with a sample of 59 organizations, most of which were after-school programs (Smith & Hohmann, 2005). The findings suggest the instrument has many good psychometric properties. Three of the seven scales, however, did not perform well in one or more psychometric areas.

The second study, referred to as the Self-Assessment Pilot Study, included a sample of 24 sites and specifically examined the YPQA's use as a self-assessment tool for after-school programs (Smith, 2005). This is the only study mentioned in this report that asked programs to assess themselves rather than relying on independent researchers to collect data. This study examined the concurrent validity of the YPQA and found preliminary support for the total score and several scales. Similar to the first study, some scales exhibited problems with internal consistency.

The third study, referred to as the Palm Beach Quality Improvement System (QIS) Pilot Study, used a modified form of the YPQA known as the PBQ-PQA to assess program quality in 38 sites. The PBQ-PQA had similar, but not identical, scales compared to the YPQA (Smith, Akiva, Blazevski & Pelle, 2008).

In addition to these three studies, the developers also conducted additional interrater reliability analyses for the program offerings section of the instrument. They have also begun using techniques that provide more refined and detailed analyses of reliability and validity than traditional methods (see pages 16-17). In related work, CYPQ has just finished a validity study on a younger youth version of the PQA (grades K–4).

Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Smith and Hohmann (2005) examined average scores and spread for each of the scales and total scores for the Program Offerings and Organization items and found that all of the scales and the total score had good distributions except for Safe Environment and Access (which each had means of 4.4 out of a possible 5.0). Most programs scored very high on these scales, making it hard to capture reliable differences. For Safe Environment, it may be realistic to assume that nearly all programs are relatively safe, particularly since the scores from this scale were validated by findings from a youth survey (see section on concurrent validity). However, additional evidence is needed to determine whether nearly all programs are high on Access, or whether there are meaningful differences that are not being picked up because the items are "too easy." In the latter case, the items could be revised to better capture differences between programs.

Example item. II. Supportive Environment; II-I. Staff Support Youth in Building New Skills (observers record supporting evidence/anecdotes alongside each score; n/o = 1):

Indicator 1
1 – Youth are not encouraged to try out new skills or attempt higher levels of performance.
3 – Some youth are encouraged to try out new skills or attempt higher levels of performance but others are not.
5 – All youth are encouraged to try out new skills or attempt higher levels of performance.

Indicator 2
1 – Some youth who try out new skills with imperfect results, errors or failure are informed of their errors (e.g., "That's wrong.") and/or are corrected, criticized, made fun of, or punished by staff without explanation.
3 – Some youth who try out new skills receive support from staff who problem-solve with youth despite imperfect results, errors, or failure, and/or some youth are corrected with an explanation.
5 – All youth who try out new skills receive support from staff despite imperfect results, errors, or failure; staff allow youth to learn from and correct mistakes and encourage youth to keep trying to improve their skills.

Interrater Reliability
Recent analyses suggest that the current version of the tool paired with improved training techniques produces moderate to high levels of interrater reliability. For the Program Offering items, High/Scope researchers have captured four paired-rater datasets over the past two years for a total of 32 rater pairs using live and video methods for testing agreement. One of these datasets was produced independently by the Children's Institute at the University of Rochester. All raters used the current version of the YPQA. Researchers found that across the rater pairs there was an average of 78 percent perfect agreement at the indicator level, which translates to an average maximum kappa coefficient of 0.66, close to the 0.70 benchmark for high interrater reliability. Similarly, the average item-level maximum kappa for the Program Offering items was also high at 0.72.

Findings suggest that the current version of the tool paired with rater training produces acceptable levels of interrater reliability for three of the four scales in the Program Offerings section. Specifically, the Safety, Support, and Engagement scales had acceptable reliabilities ranging between 0.66 and 0.73. The Interaction scale had moderate reliability (0.54).

Information for the Organization items (scales five through eight) comes from an earlier validation study by Smith and Hohmann (2005). The authors compared pairs of raters who examined the same programs at the same points in time. They examined the percentage of agreement across these items and found that the highest possible kappa was 0.68, very close to the 0.70 benchmark for high reliability.

Smith and Hohmann (2005) also examined interrater reliabilities of the three Organization scales, which is important because users will ultimately draw most of their conclusions from the scales, not the individual items. They examined agreement using a statistic known as the intraclass correlation coefficient (ICC), which examines the degree to which differences among all ratings have to do with the difference between raters or differences among the programs themselves. The Youth Centered Policies and Practices, High Expectations and Access scales all had high interrater reliability (ICC = 0.51, 0.90 and 0.73, respectively).
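To make the ICC idea concrete, the one-way random-effects version can be computed directly from a small ratings table. This is a minimal sketch, not the exact computation used in the YPQA studies; the scores below are invented for illustration:

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1): the share of total rating variance
    attributable to real differences among programs rather than rater
    disagreement. `ratings` holds one [rater1, rater2, ...] list per program."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    means = [sum(r) / k for r in ratings]
    # Between-program and within-program mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for r, m in zip(ratings, means) for x in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Five hypothetical programs, each scored by two raters on a 1-5 scale.
scores = [[4, 4], [2, 3], [5, 5], [3, 3], [1, 2]]
print(round(icc_oneway(scores), 2))  # 0.9: ratings mostly track programs, not raters
```

An ICC near 1 means the spread in ratings reflects real program differences; an ICC near 0 means it mostly reflects rater disagreement.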

Internal Consistency
Internal consistency indicates how closely related scores are for theoretically similar items. The Validation Study found that most of the YPQA scales exhibited acceptable internal consistency except for Safe Environment and Access. As noted above, this may have to do with the distributions of scores. Two items from an internally consistent scale go together, so that when item A is rated as high, item B is rated as high, and when A is low, B is also low. However, if A is always high (because all programs do well on it), whether or not B is high, internal consistency will be low.
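The logic above is usually quantified with Cronbach's alpha. A minimal sketch with hypothetical scores, chosen to mimic the pattern described in the text (items that move together yield high alpha, while an item at the ceiling for every program pulls alpha down):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale; `items` is one score list per item,
    with positions corresponding to the same programs."""
    k = len(items)

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col) for col in zip(*items)]  # each program's scale total
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# Three items that rise and fall together across five programs ...
related = [[1, 3, 5, 3, 5], [1, 3, 5, 5, 5], [1, 1, 5, 3, 5]]
# ... versus the same scale with the third item at the ceiling for everyone.
ceiling = [[1, 3, 5, 3, 5], [1, 3, 5, 5, 5], [5, 5, 5, 5, 5]]
print(round(cronbach_alpha(related), 2))  # 0.95
print(round(cronbach_alpha(ceiling), 2))  # noticeably lower
```

The ceiling item contributes no covariance with the rest of the scale, so alpha falls even though nothing about the other items changed – the mechanism the text describes for Safe Environment and Access.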

One example of an item on which most organizations received the highest possible score in the Self-Assessment Pilot Study was, "The physical environment is safe and healthy for youth." If items such as this one are always high, we may not need to keep measuring them. However, if researchers believe that there is meaningful variation among programs, then these scales may need additional revision before we can be confident that their scores reliably measure the concepts that they are supposed to measure. Similarly, Smith (2005) found in the Self-Assessment Pilot Study that these two scales had low internal consistency, but that study also showed low internal consistency for two other scales: Youth Centered Policies and Practice and High Expectations for All Students and Staff. A possible explanation is that staff participating in the Self-Assessment Pilot Study were only given one day of training, whereas trained raters in the Validation Study may have been given more.

One additional explanation for why internal consistency may have been lower on some scales could be that the concepts forming these scales are formative rather than reflective. As explained in the section on Additional Technical Considerations (pages 16-17), internal consistency tests are only appropriate when items are reflective, meaning that they all reflect the same underlying concept. Such items are closely related to one another and each represents a unique "attempt" to measure the concept of interest. However, internal consistency should not be used when items are formative, meaning that different components together make up or form a coherent set. For example, the Safe Environment scale may be more formative than reflective. A program that provides healthy food and drinks (as assessed by one item) may not necessarily have appropriate emergency procedures and supplies present (another item on the scale). However, even though these two items tap different underlying concepts (nutrition, safety in emergencies) and may not be closely related, their combination provides an important index of how a program promotes safety and health.

YPQA developers have begun examining whether some YPQA scales are formative versus reflective, and they are currently exploring whether certain items can be combined to form new, reflective scales.

Test-Retest Reliability
The Validation Study examined how much scores changed on multiple ratings over a period of three months. Correlations between assessments ranged from 0.81 to 0.98, indicating that ratings do not fluctuate widely over short periods of time. Long-term stability was not assessed, so we cannot offer any evidence on whether the YPQA is sensitive enough to detect long-term change.
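Test-retest correlations of this kind are ordinary Pearson correlations between two waves of scores. A minimal sketch, using hypothetical scale scores for five programs rated twice:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two waves of scores for the same programs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale scores for five programs, assessed one month apart.
time1 = [2, 3, 4, 4, 5]
time2 = [2, 3, 3, 4, 5]
print(round(pearson_r(time1, time2), 2))  # 0.92: scores barely shift between waves
```

A value in this range would sit within the 0.81 to 0.98 band reported for the Validation Study, indicating stable short-term ratings.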

Validity of Scale Structure
Each of the scales in the YPQA is supposed to measure a separate concept. A factor analysis examines which items are similar to each other and which are different. Smith and Hohmann (2005) conducted a factor analysis at both observation periods and found preliminary evidence that the Program Offering items (scales two through four) grouped together in ways similar to the scales. Safe Environment was not included in the factor analysis; the authors acknowledge that the factor analysis did not support their expectations until they removed these items. Without the Safe Environment items, findings indicated that Supportive Environment and Opportunities for Interaction overlap and may not be entirely distinct. Validity support was strong for the Organization items (scales five through seven), which generally grouped together according to the theorized structure of the scales.

Convergent Validity
One way to examine whether an instrument actually measures aspects of program quality is to compare its scores to measures of identical or highly similar concepts. The Validation Study tested convergent validity by comparing all YPQA scales except Access and High Expectations to similar scales on a separate youth survey. For example, the Supportive Environment scale was compared to a Belonging scale on the youth survey. Correlational evidence indicates that the YPQA is moderately to strongly related to findings from the youth survey. The YPQA total scores for the observation and interview scales were also related to the youth survey total score. These results are encouraging for establishing validity.

In the Palm Beach Quality Improvement System Pilot Study, researchers examined the relationship between youth perceptions of program quality and modified versions of the four YPQA Form A domain scales. Scales on this form were similar to the original scales, but not identical. The authors found that youth perceptions of quality were related to the Interaction scale, but were unrelated to the Safe Environment, Supportive Environment and Engagement scales. Beyond being mixed, this validity evidence may not apply to the current YPQA scales since the YPQA and the instrument used in this study are not completely identical.

Concurrent Validity
Concurrent validity is established when an instrument's items and scales are related to distinct but theoretically important concepts that are measured in the same time period. The Validation Study measured the validity of the total program quality score (created by averaging the various scale scores) by examining its relationship to expert ratings of the programs that were being evaluated. Specifically, experts rated programs based on youth centeredness and availability of resources. It is reasonable to expect that if the YPQA is indeed measuring program quality, then the total score would be related to these two expert-rated concepts. Using Pearson correlations as a measure of relatedness, Smith and Hohmann (2005) found strong evidence that the YPQA total score is related to expert ratings for these two domains, lending additional support that the instrument is indeed measuring program quality. They also tested the validity of the global program quality scores by comparing programs with trained staff to programs without trained staff. As expected, the programs with trained staff had higher global quality scores than those without, again lending support that the instrument can validly measure overall program quality.

The Validation Study also examined how well the instrument was associated with student experiences assessed by a separate youth survey (Smith & Hohmann, 2005). The following relationships were examined between YPQA and youth survey scales: (1) YPQA total score with the youth survey measure of overall program experiences, (2) YPQA Engaged Learning with measures of giving back to the community, youth growth, interest in the program, and challenging experiences, and (3) YPQA Interaction Opportunities with a measure of decision making in the program.

The authors found strong evidence for concurrent validity in that all of their hypothesized relationships were supported except for two (the Engagement scale was not related to youths' interest in the program or challenging experiences). However, this evidence is limited in that theoretically important relationships involving Form A's Safe Environment and Supportive Environment scales and the three Form B scales were not examined.

The Self-Assessment Pilot Study examined concurrent validity by correlating findings from the Supportive Environment and Engagement scales and the Program Offerings total score with a youth survey measure of staff support. Findings indicated a strong relationship between Supportive Environment and the youth survey. The Engagement scale was related in expected ways to a measure of program governance on the youth survey, and the Program Offerings total score was related in expected ways to academic support and peer relations. None of these relationships were statistically significant, perhaps because the sample size was so small (12 programs). Thus, these relationships should be considered promising but not definitive.

The Palm Beach Quality Improvement System Pilot Study evaluated the concurrent validity of modified versions of the four Form A scales by examining their relationships with two scales from a youth survey: positive affect and challenging experiences. Higher positive affect scores were related to higher scores on YPQA Interaction Opportunities, but positive affect was unrelated to the other three scales. Higher scores on challenging experiences were related to higher scores on YPQA Engaged Learning, but challenging experiences were unrelated to the other three scales. In addition to the results being mixed, this validity evidence may not apply to the YPQA scales since the scales used in this study were not identical to them.

The concurrent validity evidence is promising but limited at this point. Additional support is needed for several of the instrument's scales.


Variations in Quality Across Different Contexts
Program quality may vary across different contexts, such as different offerings and how many sessions children and youth have had with one another. It is important to know if an instrument is sensitive to these types of differences, because if so, then users will need to conduct observations across a range of contexts. For example, if quality scores vary across different types of activities within a program, then users will need to observe a wide range of activities to obtain a complete picture of quality.

Developers have begun examining how the YPQA performs across three important program components: individual offerings, the content of these offerings, and how many sessions the children and staff have had together. Developers also examined variation across two combinations of these components. For example, does quality for some offerings stay constant throughout the year whereas quality for other offerings improves from the beginning to the end of the year? Are quality scores relatively similar in certain content areas regardless of which agencies are being observed whereas quality scores in other content areas vary across agencies? Currently, evidence is only available for the Interactions scale. Findings indicated that in addition to detecting quality differences across agencies, the Interactions scale was sensitive to differences across various types of offerings and content. However, measured quality did not vary by the number of sessions children and staff had together or across different combinations of programmatic components. These findings suggest that users of the YPQA should conduct observations across different types of offerings and content areas to obtain an accurate Interactions score.

In addition, even though agreement among raters was acceptable in other studies (as indicated in the section on interrater reliability), the developers found reliable differences among ratings given by different raters. This suggests that even good reliability among raters does not mean that rater effects can be ignored – a finding that probably extends to all the instruments in this compendium.

No evidence on how quality varies across contexts is currently available for the other YPQA scales, and it is also possible that the instrument is sensitive to other differences besides the ones already examined (e.g., time of day).

User Considerations
Ease of Use
The YPQA was developed with and for both practitioners and researchers; as a result, the language is accessible and the format and scoring process are user-friendly. The administration manual and the introductions to Forms A and B offer users a summary of the purpose and benefits of the tool, definitions of key terms used (e.g., scale, sub-scale, offering, item) and clear steps that walk users through the observation and scoring process. While training is recommended, the manuals themselves are self-explanatory. A "starter pack" that includes an administration manual, Form A and Form B can be ordered online for $39.95.

Users of the YPQA are encouraged to conduct a running record of what occurs during a relatively extensive program observation as opposed to capturing several short snapshots of programming, because developers believe activities have a certain flow that is important to try to observe. This is particularly important if the goal is to come up with a reliable and valid score for an individual program as opposed to aggregating a large sample of observations for research purposes. Developers estimate that generating a score for a program, based on both Forms A and B, takes a minimum of approximately six hours for a single staff person. Roughly four of those hours are typically spent observing/interviewing within the program and another two hours writing up and scoring the instrument.

Available Supports
In addition to an online training, the Weikart Center offers YPQA training periodically around the country. The one-day workshop, YPQA Basics, introduces the observation and evidence gathering method, familiarizes participants with each item and indicator and prepares staff to conduct the program self-assessment method of evidence gathering and scoring. The two-day YPQA Intermediate workshop covers all the material from the one-day workshop, gives participants substantial practice scoring the tool using written scenarios and video, brings participants to acceptable levels of interrater reliability and prepares staff to conduct the external assessment method of evidence gathering and scoring. The three-day workshop covers all the material from these two trainings and includes a site visit (during which the participants score a youth program) and an analysis of the scoring efforts.

In the past year, the Weikart Center has developed a set of management-focused trainings to assist site managers in leading their programs through a data-driven quality improvement process.

The Weikart Center also offers 12 youth development trainings that are aligned with the content of the YPQA. Following a self-assessment or evaluation process, for example, program directors can assemble a tailored staff training experience based on specific areas within the YPQA where the assessment showed work was needed.

An electronic "scores reporter" is currently available from the Weikart Center (and is free to those who purchase the instrument). A more sophisticated Web-based data management system is currently under development. This will allow individual programs or networks to join, go online to enter and analyze data and see their results at various levels of aggregation.

In the Field
The Rhode Island state 21st CCLC program has partnered with the Center for Youth Program Quality in a multi-year quality assessment process using a customized tool based on the research-validated PQA. The Rhode Island Program Quality Assessment (RIPQA), through a joint partnership of the Rhode Island After School Plus Alliance, the Providence After School Alliance and the Rhode Island Department of Education, is currently used by after-school programs across the city of Providence and throughout the state, including all 21st CCLC funded programs. Participating programs conduct an annual self-assessment using the RIPQA. To support their efforts, a Weikart Center-trained Quality Advisor jointly observes program offerings with site staff and then works one-on-one with agencies to develop quality improvement plans based on those observations.

As an additional component of this effort, the Weikart Center has also conducted a randomized field trial to test out their full training model. Based on 100 interviews with site supervisors, researchers have found that engaging providers in the observation and reflection process has been well-received across the board. The quality advisor and site-based technical support have been a very important part of the process, especially for those providers with limited capacity. Aggregated system-wide quality data are used to design and coordinate system-wide professional development offerings around the needs surfaced through assessment.

According to Elizabeth Devaney, Director of Quality Initiatives at the Providence After School Alliance, the quality improvement effort has "strengthened our position and ability to attract public and private resources to grow the system, and is an important strategy for sustainability going forward."

For More Information
Information about the YPQA, including ordering information, is available online at: www.highscope.org/content.asp?contentid=117

Contact
Charles Smith, Director
David P. Weikart Center for Youth Program Quality
Centennial Plaza Building, Suite 601
124 Pearl Street
Ypsilanti, MI
[email protected]


Arbreton, A., Goldsmith, J. & Sheldon, J. (2005). Launching literacy in after-school programs: Early lessons from the CORAL initiative. Philadelphia, PA: Public/Private Ventures.

Arbreton, A., Sheldon, J., Bradshaw, M., & Goldsmith, J. with Jucovy, L. & Pepper, S. (2008). Advancing achievement: Findings from an independent evaluation of a major after-school initiative. Philadelphia, PA: Public/Private Ventures.

Birmingham, J., Pechman, E., Russell, C., & Mielke, M. (2005). Shared features of high-performing after-school programs: A follow-up to the TASC evaluation. Washington, DC: Policy Studies Associates, Inc.

Connell, J., & Gambone, M. (2002). Youth development in community settings: A community action framework. Philadelphia, PA: Youth Development Strategies Inc.

Durlak, J. & Weissberg, R. (2007). The impact of after-school programs that promote personal and social skills. Chicago, IL: Collaborative for Academic, Social, and Emotional Learning.

Harms, T., Jacobs, E., & White, D. (1996). School-age care environment scale. New York, NY: Teachers College Press.

Intercultural Center for Research in Education, & National Institute on Out-of-School Time (2005). Pathways to success for youth: What works in afterschool: A report of the Massachusetts Afterschool Research Study (MARS). Boston, MA: United Way of Massachusetts Bay.

Kim, J., Miller, T., Reisner, E. & Walking Eagle, K. (2005). Evaluation of New Jersey After 3: First-year report on programs and participants. Washington, DC: Policy Studies Associates, Inc.

Knowlton, J., & Cryer, D. (1994). Field test of the ASQ program observation for reliability and validity. Chapel Hill, NC: Authors.

MacKenzie, S., Podsakoff, P., & Jarvis, C. (2005). The problem of measurement model misspecification in behavioral and organizational research and some recommended solutions. Journal of Applied Psychology, 90(4), 710-730.

Martinez, A., & Raudenbush, S. W. (2008). Measuring and improving program quality: Reliability and statistical power. In M. Shinn & H. Yoshikawa (Eds.), Toward positive youth development: Transforming schools and community programs (pp. 333-349). New York, NY: Oxford University Press, Inc.

National Research Council and Institute of Medicine. (2002). Community programs to promote youth development. Eccles, J. and Gootman, J., eds. Washington, DC: National Academy Press.

Pechman, E., Mielke, M., Russell, C., White, R. & Cooc, N. (2008). Out-of-school time observation instrument: Report of the validation study. Washington, DC: Policy Studies Associates, Inc.

Pierce, K. M., Bolt, D. M., & Vandell, D. L. (in press). Specific features of after-school program quality: Associations with children's functioning in middle childhood. American Journal of Community Psychology.

Pierce, K., Hamm, J., Sisco, C., & Gmeinder, K. (1995). A comparison of formal after-school program types. Poster session presented at the biennial meeting of the Society for Research in Child Development, Indianapolis, IN.

Pierce, K., Hamm, J., & Vandell, D. (1999). Experiences in after-school programs and children's adjustment in first-grade classrooms. Child Development, 70(3), 756-767.

Raudenbush, S., Martinez, A., Bloom, H., Zhu, P., & Lin, F. (2008). An eight-step paradigm for studying the reliability of group-level measures. Chicago: University of Chicago.

Russell, C., Mielke, M., & Reisner, E. (2008). Evaluation of the New York City Department of Youth and Community Development Out-of-School Time Programs for Youth Initiative: Results of efforts to increase program quality and scale in year 2. Washington, DC: Policy Studies Associates, Inc.

Russell, C., Reisner, E., Pearson, L., Afolabi, K., Miller, T., & Mielke, M. (2006). Evaluation of the Out-of-School Time Initiative: Report on the first year. Washington, DC: Policy Studies Associates.

Seidman, E., Tseng, V., & Weisner, T. (February 2006). Social setting theory and measurement. In William T. Grant Foundation Report and Resource Guide 2005-2006. New York, NY: William T. Grant Foundation.

Smith, C. (2005). Measuring quality in Michigan's 21st Century afterschool programs: The Youth PQA self-assessment pilot study. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., Akiva, T., Blazevski, J., & Pelle, L. (2008, January). Final report on the Palm Beach Quality Improvement System pilot: Model implementation and program quality improvement in 38 after-school programs. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., & Hohmann, C. (2005). Youth program quality assessment youth validation study: Findings for instrument validation. Ypsilanti, MI: High/Scope Educational Research Foundation.

Spielberger, J. & Lockaby, T. (2008). Palm Beach County's Prime Time initiative: Improving the quality of after-school programs. Chicago: Chapin Hall Center for Children at the University of Chicago.

Vandell, D. L., Reisner, E. R., & Pierce, K. M. (2007). Outcomes linked to high-quality afterschool programs: Longitudinal findings from the study of promising afterschool programs. Unpublished manuscript. Policy Studies Associates, Inc.

Vandell, D., Pierce, K., Brown, B., Lee, D., Bolt, D., Dadisman, K., Pechman, E., & Reisner, E. (2006). Developmental outcomes associated with the after-school contexts of low-income children and youth. Unpublished manuscript.

Vandell, D., & Pierce, K. (2006). Study of after-school care: Program quality observation. Retrieved online at www.wcer.wisc.edu/childcare/pdf/asc/program_quality_observation_manual.pdf.

Vandell, D., & Pierce, K. (2001, April). Experiences in after-school programs and child well-being. In J. L. Mahoney (Chair), Protective aspects of after-school activities: Processes and mechanisms. Paper symposium conducted at the biennial meeting of the Society for Research in Child Development, Minneapolis, MN.

Vandell, D., Reisner, E., Pierce, K., Brown, B., Lee, D., Bolt, D., & Pechman, E. (2006). The study of promising after-school programs: Examination of longer term outcomes after two years of program experiences. Wisconsin Center for Education Research, University of Wisconsin-Madison.

Vandell, D. & Pierce, K. (1998). Measures used in the study of after-school care: Psychometric properties and validity information. Unpublished manual, University of Wisconsin-Madison.

Walking Eagle, K., Miller, T., Reisner, E., LeFleur, J., Mielke, M., Edwards, S., Farber, M. (2008). Increasing opportunities for academic and social development in 2006-07: Evaluation of New Jersey After 3. Washington, DC: Policy Studies Associates, Inc.

Westmoreland, H. & Little, P. (2006). Exploring quality standards for middle school afterschool programs: What we know and what we need to know: A summit report. Cambridge, MA: Harvard Family Research Project. Retrieved online at www.gse.harvard.edu/hfrp/content/projects/afterschool/conference/summit-2005-summary.pdf.


Psychometrics: What are they and why are they useful?

The youth program Janice works for is interested in self-assessment and is looking for a tool that measures the overall quality of the program. After looking over several options, she settles on an instrument that seems easy to use, with questions that seem relevant to the organization's goals. Unfortunately, she encounters a number of problems once she starts using the instrument. First, the observers interpret questions very differently, leading to disputes over their assessments of quality. Second, the individual item scores don't seem to form a coherent picture of the program. Third, the findings are unrelated to youth outcomes that should be directly related to program quality. All of these issues make Janice question whether the instrument measures program quality as well as it should.

The instrument Janice chose looked useful on the surface, but its field performance was not particularly helpful. Psychometric information might have helped Janice understand the strengths and weaknesses of the instrument before she used it. Psychometrics are statistics that help researchers evaluate instruments' field performances. Psychometric information can be divided into several categories.

Reliability
An instrument's ability to generate consistent answers or responses. The most common analogy used to understand reliability is a game of darts. If a player's darts consistently land on the same location on the board, we would say that the dart player has excellent reliability (whether or not that place is the center of the board). The same is true for research instruments that yield predictable and consistent information. There are various types of reliability, discussed below.

Interrater Reliability
The extent to which trained raters agree when evaluating the same program at the same time.

For accurate program assessments, users should choose instruments that yield reliable information regardless of the whims or personalities of individual raters. When findings depend largely on who is rating the program (e.g., if Rater A is more likely to give favorable scores than Rater B), it is hard to get a sense of the program's actual strengths and weaknesses. For this reason, organizations should consider the interrater reliability of various measures even if only one rater will be rating the program. Poor interrater reliability often stems from ambiguous questions that leave a lot of room for individual interpretation, and such ambiguity is not always immediately apparent from looking at the instrument.

Several methods exist to measure interrater reliability. Many of the instruments in this report give the percentage that raters agree for a given item (allowing a one-point difference to count as agreement). While this method is common, it is not as useful as other statistics. When available, we instead report two other statistics known as kappa and intraclass correlation. Values of kappa near or above .70 indicate high reliability and this value is often considered the benchmark for a strong, reliable instrument. Other researchers state that kappa values starting at .60 indicate substantial/strong agreement, whereas values ranging from .40 to .59 indicate moderate agreement. Similar guidelines do not yet exist for the intraclass correlation, but this report considers values close to or above .50 to indicate high reliability.

The reason that percentage agreement does not sufficiently represent reliability is that it does not account for those instances where raters agree simply by chance, whereas kappa scores and intraclass correlations do. In many cases, what looks like high interrater agreement may actually have a low kappa score or intraclass correlation coefficient. When kappa scores or intraclass correlations are not available for an instrument, we provide an estimate of kappa. Readers should know that the estimate is the best possible score based on the available information, though it is possible the actual kappa score is much lower (indicating worse reliability).
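The gap between raw agreement and chance-corrected agreement is easy to see in a small computation. In this minimal sketch the ratings are invented: because both raters give mostly 5s, they agree 80 percent of the time by count, yet Cohen's kappa is only moderate since much of that agreement is expected by chance:

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which the two raters gave identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Agreement expected if each rater scored independently at their own base rates.
    p_exp = sum(ca[s] * cb[s] for s in set(a) | set(b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Two raters scoring ten offerings on the YPQA-style 1/3/5 scale.
rater_a = [5, 5, 5, 3, 5, 5, 1, 5, 5, 3]
rater_b = [5, 5, 5, 5, 5, 5, 1, 5, 5, 5]
print(percent_agreement(rater_a, rater_b))       # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.44: only moderate agreement
```

This is exactly the situation that arises with high-scoring scales such as Safe Environment: when nearly every score is a 5, raw agreement is inflated and kappa gives the more honest picture.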


It is important to note that interrater reliability statistics assume that all raters have been adequately trained on the instrument. Some instruments' developers offer training for raters. If you cannot receive formal training on an instrument, it is still important to train raters yourself before conducting an evaluation. Organizations can hold meetings to review each question individually and discuss what criteria are necessary to assign a score of 1, 2 or 3, etc. If possible, raters should go through "test evaluations" to practice using the instrument with scenarios that could occur in the program (ideally through videos, but such scenarios could also be written if detailed enough). When disagreement occurs on individual questions, raters should discuss why they chose to rate the program the way they did and come to a consensus. Practice evaluations will help raters get "on the same page" and have a mutual understanding of what to look for.

Test-Retest Reliability: The stability of an instrument's assessments of the same program over time. If several after-school programs are each assessed two times, one month apart, the respective scores at both assessments would differ very little if the instrument had strong test-retest reliability. The strength of an instrument's test-retest reliability depends on both the sensitivity of the instrument and how much the program changes over time. If instruments are too sensitive to subtle changes in a program, test-retest reliability will be low and scores may differ widely between assessments even though the subtle changes driving this difference may hold little practical significance. At the other extreme, instruments with extremely high test-retest reliability may be insensitive to important long-term changes. As is the case with interrater reliability, several methods to measure test-retest reliability exist, including percentage agreement, kappa and intraclass correlations, with the latter two being preferred.
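One common way to quantify test-retest consistency is a one-way intraclass correlation. The sketch below uses the standard one-way random-effects formula, ICC(1,1); the scores for four programs assessed twice are invented, and this is not a procedure prescribed by any instrument in this guide.

```python
def icc_oneway(time1, time2):
    """One-way random-effects intraclass correlation, ICC(1,1),
    for the same targets each measured on k = 2 occasions."""
    k = 2
    n = len(time1)
    pairs = list(zip(time1, time2))
    grand = sum(time1 + time2) / (n * k)
    means = [sum(p) / k for p in pairs]
    # Between-target and within-target mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for p, m in zip(pairs, means) for x in p) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Four programs scored one month apart; scores shift only slightly.
time1 = [2, 4, 4, 5]
time2 = [3, 4, 5, 5]
print(round(icc_oneway(time1, time2), 2))  # about 0.81
```

An ICC near 1.0 means programs keep their relative standing across occasions; identical scores at both time points would yield exactly 1.0.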

Very few of the instruments in this report have undergone testing for this type of reliability. Because the time span between assessments has been relatively short for these instruments, test-retest reliability should be high.

Internal Consistency: The cohesiveness of items forming the instrument's scales. An item is a specific question or rating, and a scale is a set of items within an instrument that jointly measure a particular concept. For example, an instrument might include 10 items that are supposed to measure the friendliness of program staff, and users would average or sum the 10 scores to get an overall "friendliness score." Because items forming a scale jointly measure the same concept, we can expect that the scores for each item will be related to all of the other items. For example, say that three of our "friendliness" items include: (1) How much does the staff member smile at children? (2) How much does the staff member compliment children? (3) How much does the staff member criticize children in a harsh manner? If the scale had high internal consistency, the scores for each question would "make sense" compared to the others (e.g., if the first question received a high score, we would expect that the second would also receive a high score and the third would receive a low score). In a scale with low internal consistency, the items' scores are unrelated to each other. Low internal consistency suggests the items may not fit together in a meaningful way, and therefore the overall score (e.g., average friendliness) may not be meaningful either.

The analogy of the dartboard is useful for understanding internal consistency. Think about the individual items as the darts: the aim of the thrower is meaningless if the darts land haphazardly across the board. In the same way, an overall score like average friendliness is meaningless if the items' scores do not relate to each other. The statistic that determines internal consistency is called Cronbach's alpha. For a scale to have acceptable internal consistency, its alpha should be near or over the conventional cutoff of 0.70. Whereas interrater and test-retest reliabilities are important information for all instruments, internal consistency is only relevant for instruments with scales.
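For readers who want to compute internal consistency themselves, Cronbach's alpha can be calculated directly from item-level scores. This is a minimal sketch with invented ratings; it assumes each inner list holds one item's scores across the same set of programs, and that any reverse-scored item (like the harsh-criticism question above) has already been recoded.

```python
from statistics import variance

def cronbachs_alpha(items):
    """Cronbach's alpha for a scale, given one list of scores per item.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-program totals
    item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Three "friendliness" items rated for three programs.
consistent = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]   # items move in lockstep
inconsistent = [[1, 2, 3], [1, 3, 2]]            # items barely related

print(round(cronbachs_alpha(consistent), 2))     # 1.0
print(round(cronbachs_alpha(inconsistent), 2))   # 0.67
```

The lockstep items produce the maximum alpha of 1.0, while the weakly related pair falls below the 0.70 benchmark.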

The Weikart Center (YPQA developer), among others (MacKenzie, S., Podsakoff, P., & Jarvis, C., 2005), has noted that internal consistency is only appropriate when the items are reflective of a larger concept rather than formative. For a more in-depth discussion of this requirement, readers should refer to the section on Additional Technical Considerations, found on pages 16-17 of this report.

Variation in Quality Across Different Contexts: Program quality may not be entirely uniform across different staff, different activities, or even different days of the week or months of the year. Even when two observers agree on the level of quality while both are observing precisely the same activity at the same time, they might come up with different ratings if they observe a different activity at a different time. Some instruments may also be particularly sensitive to some types of variation. As the Weikart Center and others have noted (Raudenbush, S., Martinez, A., Bloom, H., Zhu, P., & Lin, F., 2008), evidence about the ways that scores on a particular instrument vary within a program is important so that users know how to account for this variation (e.g., if an instrument's scores depend on the activity, then it is important to assess a wide range of activities in the program). For a more in-depth discussion of these issues, readers should refer to the section on Additional Technical Considerations, found on pages 16-17 of this report.

Validity17: An instrument's ability to measure what it is supposed to measure. If an instrument is supposed to measure program quality, then it would be valid if it yielded accurate information on this topic. However, researchers have devised several different methods for establishing validity. The most common analogy used to understand validity is again the game of darts. While reliability is about the player consistently throwing darts to the same location, validity relates to whether or not the player is hitting the bull's eye. The bull's eye is the topic an instrument is supposed to measure. While reliability is essential, it is also important to know if an instrument is valid (dart players that consistently miss the board entirely may be reliable, hitting the same spot over and over, but they are sure to lose the game!).

Sometimes an instrument may look like it measures one concept when in fact it measures something rather different, or nothing at all. For example, an instrument might claim to measure after-school program quality, but it would not be particularly valid if it focused solely on whether children liked the program and were having fun.

Validity can be tricky to assess because the concepts of interest (e.g., program quality) are often not tangible or concrete. Unlike the case of reliability, there is no specific number that tells us about validity. Instead, researchers use several methods that each assess different types of relationships, which together give us confidence that the instrument is measuring what we think it measures. Next, we describe the different subtypes of validity.

Face Validity: Individuals' opinions of an instrument's quality. This is the weakest form of validity because it does not involve direct testing of the instrument and is based on appearance only. One example of face validity in a medical context concerns taking a temperature. Today we know to do this with a thermometer. But think back a couple hundred years. At that time, feeling a patient's forehead would have seemed a much more valid measure of temperature than sticking a glass tube filled with mercury into the patient's mouth. How hot a forehead feels is a face valid measure of temperature, but few people today consider this method alone to be adequate. Instead, doctors rely on thermometers because they have been scientifically proven to be more accurate. Similarly, researchers and practitioners should consider other forms of validity, when available, before choosing an instrument.

Convergent Validity: The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts. If two instruments are presumed to measure the same or similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other. For example, imagine researchers have developed a new instrument (Instrument A) that is supposed to measure staff behavior management techniques in after-school programs. To determine its validity, researchers might compare Instrument A to Instrument B, which is already known to accurately measure staff's discipline strategies in after-school programs. Assuming that Instrument A is a valid measurement, we can expect that when Instrument B finds that programs rarely use appropriate discipline strategies, Instrument A will find that the same programs utilize poor behavior management techniques (and vice versa). If this were not the case, we would conclude that Instrument A probably does not adequately measure behavior management.

17 Researchers often refer to the type of validity discussed in this report as Construct Validity, because it addresses whether an instrument adequately measures a specific concept or construct. Although other forms of validity exist, they are not addressed in this report.
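In practice, convergent validity is often summarized with a correlation between the two instruments' scores across the same set of programs. The sketch below uses invented scores for the hypothetical Instruments A and B described above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

# Five programs rated by both instruments (1 = poor, 5 = excellent).
instrument_a = [1, 2, 3, 4, 5]
instrument_b = [1, 3, 2, 5, 4]
print(pearson_r(instrument_a, instrument_b))  # 0.8
```

A strong positive correlation like this supports convergent validity; a weak or negative one would suggest the new instrument is not measuring the intended concept.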

Concurrent and Predictive Validity: The extent to which an instrument is related to distinct, theoretically important concepts and outcomes in expected ways. If an instrument measures the quality of homework assistance in after-school programs, then children who attend high quality programs should have higher rates of homework completion (or perhaps grades) than children who attend low quality programs (assuming there is no difference between the children before starting the programs). Usually, theory and prior research findings help researchers determine which outcomes are most appropriate to examine with each instrument. Validity evidence is strongest when differences in the outcomes are detected after the initial program observations have been conducted (known as Predictive Validity). For example, imagine that two after-school programs are designed to improve children's grades, and that children attending these programs had similar grades at the beginning of the school year. After conducting program observations, researchers determined that one program was of high quality and the other was of low quality. If children attending the high quality program had higher grades at the end of the school year compared to the children attending the low quality program, this makes us more confident that the instrument accurately detected quality differences between the two programs.
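The grade comparison described above amounts to a simple end-of-year contrast. The sketch below uses invented grades; assuming the children started the year with similar grades, a sustained gap like this is consistent with predictive validity.

```python
from statistics import mean

# End-of-year grades (0-100) for children in two programs that the
# instrument rated as high and low quality at the start of the year.
high_quality_grades = [85, 88, 90]
low_quality_grades = [75, 78, 80]

gap = mean(high_quality_grades) - mean(low_quality_grades)
print(round(gap, 1))  # 10.0 points in favor of the high-quality program
```

A real analysis would also adjust for baseline differences between children and use far larger samples; this only illustrates the intuition.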

Sometimes observations and related concepts are measured in the same time period (known as Concurrent Validity), particularly when the related concepts are expected to change simultaneously. However, researchers generally prefer to see the hypothesized cause (program quality) come before the effect. When both are measured at the same time, it is more likely that there may be another explanation for the relationship.

Although similar in some ways, concurrent and predictive validity are separate from convergent validity. Whereas convergent validity compares two instruments that measure identical or highly similar concepts, concurrent and predictive validity refer to relationships between distinct concepts that we expect to be strong based on theory and prior research.

Validity of Scale Structure: The extent to which items statistically group together in expected ways to form scales. As already stated, scales are composed of several items that, when averaged or summed, create an overall score for a specific concept. Determining whether scales adequately measure the concepts they claim to measure can be difficult, though conducting a factor analysis is one helpful way to do so. Factor analysis verifies that items go together the way the developers thought they would by examining which items are similar to each other and which are different.

For example, imagine an instrument with two scales: Staff Communication Style and Staff Patience. Next, imagine that whenever staff are rated as having a harsh communication style toward children, they are also always rated as having little patience with children. Because of their high similarity, we would say that we are actually measuring one concept, not two, and it would make more sense to have one overall score (perhaps renamed Staff Attitudes Toward Children).


Factor analysis can also help determine if one scale actually incorporates more than one related concept. Imagine that we have an instrument with a scale called Homework Assistance, but our factor analysis finds that we actually have two separate concepts. We might discover that some items relate to Tutoring on Specific Subject Matter whereas another set relates to Teaching Study Skills. The validity of scale structure is important because we want to know exactly which concepts our instrument measures.
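A full factor analysis requires dedicated statistical software, but the intuition behind a scale splitting in two, where items that belong together correlate strongly with each other and weakly with everything else, can be checked with plain inter-item correlations. The item names and scores below are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

# Four "Homework Assistance" items scored across six programs.
items = {
    "tutoring_math":    [1, 2, 3, 4, 5, 6],
    "tutoring_reading": [2, 3, 4, 5, 6, 7],
    "study_planning":   [5, 1, 6, 2, 4, 3],
    "study_organizing": [4, 2, 6, 1, 5, 3],
}

# The two tutoring items track each other, the two study-skills items
# track each other, and across the clusters the correlations are near
# zero -- suggesting the scale actually contains two separate concepts.
print(round(pearson_r(items["tutoring_math"], items["tutoring_reading"]), 2))  # 1.0
print(round(pearson_r(items["study_planning"], items["study_organizing"]), 2))  # 0.89
print(round(pearson_r(items["tutoring_math"], items["study_planning"]), 2))    # -0.14
```

Factor analysis formalizes exactly this pattern-finding across all items at once.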

Score Distribution: The dispersion or spread of scores from multiple assessments for a specific item or scale, including features such as the average score, the range of observed values and their concentration around particular point(s). In order for items and scales to be useful, they should be able to distinguish differences between programs on a range of qualities. To achieve this, scores should not be "bunched up" at any particular place on the scale. For example, imagine that a particular instrument has a scale called Positive Child Behavior, and users must rate, from 1 to 5, how true statements like "Children never stop helping each other" and "Children thank staff at every opportunity" are for a large number of programs. If almost every program scored low on this particular scale, we might argue the items are making it "too difficult" to obtain a high score and do not meaningfully distinguish between programs on this dimension. One solution would be to revise the items to better reflect program differences. The two sample items above might be revised to say "Children help each other when needed" and "Children appreciate help from staff."

Several important statistics help researchers understand whether scores are bunching up at the ends, including the average score (sometimes called the mean) and how spread out the scores are. For example, a scale or item would not be very useful for distinguishing between programs if the average score across many different programs was a 4.8 out of a possible 5.0. In addition, a scale or item might have an average of 3.5, but it would be less useful if the scores only ranged between 3 and 4 instead of a larger spread between 1 and 5.
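The bunching problem described above can be spotted with nothing more than the mean, range and standard deviation. The sketch below flags a scale whose scores pile up near the top; the half-point cutoff is an illustrative rule of thumb, not an established standard, and the scores are invented.

```python
from statistics import mean, stdev

def summarize(scores, scale_min=1, scale_max=5):
    """Basic distribution summary plus a crude ceiling-effect flag."""
    summary = {
        "mean": round(mean(scores), 2),
        "min": min(scores),
        "max": max(scores),
        "sd": round(stdev(scores), 2),
    }
    # Rule of thumb (illustrative only): if the average sits within half
    # a point of the top of the scale, the item barely discriminates.
    summary["ceiling"] = summary["mean"] >= scale_max - 0.5
    return summary

bunched = [4.8, 4.9, 5.0, 4.7, 5.0]   # nearly every program scores high
spread = [1.0, 2.0, 3.0, 4.0, 5.0]    # scores cover the whole scale

print(summarize(bunched))  # ceiling flag is True
print(summarize(spread))   # ceiling flag is False
```

An item that triggers the flag (or shows a very small standard deviation) cannot separate strong programs from weak ones and is a candidate for revision.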
