Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition

Nicole Yohalem and Alicia Wilson-Ahlstrom, The Forum for Youth Investment, with Sean Fischer, New York University, and Marybeth Shinn, Vanderbilt University

Published by The Forum for Youth Investment, January 2009
© January 2009 The Forum for Youth Investment
About the Forum for Youth Investment
The Forum for Youth Investment is a nonprofit, nonpartisan "action tank" dedicated to helping communities and the nation make sure all young people are Ready by 21® – ready for college, work and life. Informed by rigorous research and practical experience, the Forum forges innovative ideas, strategies and partnerships to strengthen solutions for young people and those who care about them. A trusted resource for policy makers, advocates, researchers and practitioners, the Forum provides youth and adult leaders with the information, connections and tools they need to create greater opportunities and outcomes for young people.
The Forum was founded in 1998 by Karen Pittman and Merita Irby, two of the country's top leaders on youth issues and youth policy. The Forum's 25-person staff is headquartered in Washington, D.C. in the historic Cady-Lee House, with a satellite office in Michigan and staff in Missouri, New Mexico, Virginia and Washington.
Suggested Citation: Yohalem, N. and Wilson-Ahlstrom, A. with Fischer, S. and Shinn, M. (2009, January). Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition. Washington, D.C.: The Forum for Youth Investment.
© 2009 The Forum for Youth Investment. All rights reserved. Parts of this report may be quoted or used as long as the authors and the Forum for Youth Investment are recognized. No part of this publication may be reproduced or transmitted for commercial purposes without prior permission from the Forum for Youth Investment.
Please contact the Forum for Youth Investment at The Cady-Lee House, 7064 Eastern Ave, NW, Washington, D.C. 20012-2031; Phone: 202.207.3333; Fax: 202.207.3329; Web: www.forumfyi.org; Email: youth@forumfyi.org for information about reprinting this publication and information about other publications.
The authors would like to thank the following project advisors who helped develop the original scope of work, provided input into the interview protocol and report outline and reviewed the original report in draft form:
• Carol Behrer, Iowa Collaboration for Youth Development
• Priscilla Little, Harvard Family Research Project
• Elaine Johnson, National Training Institute for Community Youth Work
• Jeffrey Buehler, Missouri Afterschool State Network
• Bob Pianta, University of Virginia
• Marybeth Shinn, Vanderbilt University
We are especially grateful for the contributions of Sean Fischer and Marybeth Shinn, who took the lead on reviewing the technical properties of each instrument and drafted the technical sections of both the first (2007) and second editions of this report.

The developers of each of the tools described in this report also deserve thanks, for their willingness to share their materials, talk with us and review drafts. Thanks to Julie Goldsmith, Amy Arbreton, Beth Miller, Wendy Surr, Ellen Gannett, Judy Nee, Peter Howe, Suzanne Goldstein, Ajay Khashu, Liz Reisner, Ellen Pechman, Christina Russell, Rhe McLaughlin, Sara Mello, Thelma Harms, Charles Smith, Deborah Vandell and Kim Pierce.

Thanks to Karen Pittman for her guidance and suggestions throughout the project and to several Forum staff members, including Nalini Ravindranath and Laura Mattis, for their assistance in the layout, design and editing process.

Finally, thanks to the William T. Grant Foundation for supporting this work and in particular to Bob Granger, Vivian Tseng and Ed Seidman, whose ideas, suggestions and encouragement were critical in transforming this from an idea to a final product.
Introduction
Updated Content
Cross-Cutting Comparisons
At-a-Glance Summaries
    Assessing Afterschool Program Practices Tool
    Communities Organizing Resources to Advance Learning Observation Tool
    Out-of-School Time Observation Instrument
    Program Observation Tool
    Program Quality Observation Tool
    Program Quality Self-Assessment Tool
    Promising Practices Rating Scale
    Quality Assurance System®
    School-Age Care Environment Rating Scale
    Youth Program Quality Assessment
Individual Tool Descriptions
    Assessing Afterschool Program Practices Tool
    Communities Organizing Resources to Advance Learning Observation Tool
    Out-of-School Time Observation Instrument
    Program Observation Tool
    Program Quality Observation Tool
    Program Quality Self-Assessment Tool
    Promising Practices Rating Scale
    Quality Assurance System®
    School-Age Care Environment Rating Scale
    Youth Program Quality Assessment
References
Appendix
Introduction

The following tools are included in the guide at this time:

Assessing Afterschool Program Practices Tool (APT) – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education
Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Public/Private Ventures
Out-of-School Time Observation Tool (OST) – Policy Studies Associates, Inc.
Program Observation Tool (POT) – National AfterSchool Association
Program Quality Observation Scale (PQO) – Deborah Lowe Vandell and Kim Pierce
Program Quality Self-Assessment Tool (QSA) – New York State Afterschool Network
Promising Practices Rating Scale (PPRS) – Wisconsin Center for Education Research and Policy Studies Associates, Inc.
Quality Assurance System® (QAS) – Foundations, Inc.
School-Age Care Environment Rating Scale (SACERS) – Frank Porter Graham Child Development Institute and Concordia University, Montreal
Youth Program Quality Assessment (YPQA) – David P. Weikart Center for Youth Program Quality
With the after-school and youth development fields expanding and maturing over the past several years, program quality assessment has emerged as a central theme. This interest in program quality is shared by practitioners, policy makers and researchers in the youth-serving sector.

From a research perspective, more evaluations are including an assessment of program quality and many have incorporated setting-level measures (where the object of measurement is the program, not the participants) in their designs. At the policy level, decision-makers are looking for ways to ensure that resources are allocated to programs likely to have an impact and are increasingly building quality assessment and improvement expectations into requests for proposals and program regulations. At the practice level, programs, organizations and systems are looking for tools that help concretize what effective practice looks like and allow practitioners to assess, reflect on and improve their programs.

With this growing interest in program quality has come an increase in the number of tools available to help programs and systems assess and improve quality. Given the size and diversity of the youth-serving sector, it is unrealistic to expect that any one quality assessment tool will fit all programs or circumstances. While diversity in available resources is positive and reflects the evolution of the field, it also makes it important that potential users have access to good information to help guide their decision-making.

Over the last several years, we at the Forum have found ourselves regularly fielding questions related to program quality assessment, including what tools exist, what it takes to use them and what might work best under what conditions. The need to offer guidance to the field in terms of available resources has become increasingly clear.

This guide was designed to compare the purpose, structure, content and technical properties of several youth program quality assessment tools. It builds on work we began in this area five years ago, as well as recent work conducted by the Harvard Family Research Project
to document and compile quality standards for middle school programs (Westmoreland, H. & Little, P., 2006).
Criteria for Inclusion
With any compendium comes the challenge of determining what to include. Our first caveat is that we plan to continue revising this guide over time, in part because in its current form it is not inclusive of the universe of relevant tools and in part because a great deal of innovation is currently underway. Many of the tools included in the review will be revised or will undergo further field testing in the next 1-2 years.
Our criteria for inclusion in the guide were as follows:
• Tools that are, or that include, setting-level observational measures of quality. We are particularly interested in direct program observation as a means for gathering specific data about program quality and, in particular, staff practice. Therefore this review does not feature other methodological approaches to measuring quality (e.g., surveying participants, staff or parents about the program).

• Tools which are applicable in a range of school- and community-based program settings. We did not include tools that are designed to measure how well a specific model is being implemented (sometimes referred to as fidelity) or that have limited applicability beyond specific organizations or approaches.

• Tools that include a focus on social processes within programs. Many of the tools in this guide address some static regulatory or licensing issues (e.g., policies related to staffing, health and safety). However, we are particularly interested in tools that address social processes, or the interactions between and among people in the program.

• Tools which are research-based. All of the tools included are "research-based" in the sense that their development was informed by relevant child/youth development literature. Although we are particularly interested in instruments with established technical properties (e.g., reliability, validity), not all of those included fit this more rigorous definition of "research-based."
Purpose and Contents of the Guide
We hope this compendium will provide useful guidance to practitioners, policy makers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. It focuses on the purpose and history, content, structure and methodology, technical properties and user considerations for each of the instruments included, as well as a brief description of how they are being used in the field. For each tool, we aim to address the following key questions:

Purpose and History. Why was the instrument developed – for whom and in what context? Is its primary purpose program improvement? Accreditation? Evaluation? For what kinds of programs, serving what age groups, is it appropriate?

Content. What kinds of things are measured by the tool? Is the primary focus on the activity, program or organization level? What components of the settings are emphasized – social processes, program resources, or the arrangement of those resources (Seidman, Tseng & Weisner, 2006)? How does it align with the National Research Council's positive developmental settings framework[1] (2002)?

Structure and Methodology. How is the tool organized and how do you use it? How are data collected and by whom? How do the rating scales work and how are ratings determined? Can the tool be used to generate an overall program quality score?

Technical Properties. Is there any evidence that different observers interpret questions in similar ways (reliability)? Is there any evidence that the tool measures what it is supposed to measure (validity)? See the Appendix for a "psychometrics dictionary" that defines relevant terminology and explains why technical properties are an important consideration.
[1] This report included a list of "features of positive developmental settings" culled from frequently cited literature. It has contributed to the emerging consensus about the components of program quality.
User Considerations. How easy is the tool to access and use? Does it come with instructions that are understandable for practitioners as well as researchers? Is training available on the instrument itself or on the content covered by it? Are data collection, management and reporting services available? What costs are associated with using the tool?

In the Field. How is the tool being applied in specific programs or systems?
To ensure that the guide is useful to a range of audiences with different purposes and priorities, we have provided both in-depth and summary-level information in a variety of formats.

For each tool, we provide both a one-page "at-a-glance" summary as well as a longer description. The at-a-glance summaries or longer tool descriptions can stand alone as individual resources. Should you decide to use one of these instruments or want to take a closer look at two or three, you could pull these sections out and share them with key stakeholders.

We also provide cross-instrument comparison charts and tables for those who want to get a sense of what the landscape of program quality assessment tools looks like. The Cross-Cutting Comparisons section that follows compares the instruments across most of the categories listed above (purpose, content, structure, technical properties, user considerations). While definitions of quality do not differ dramatically across the instruments, there are notable differences in some of these other areas, which we try to capture.
Updated Content

In this edition of the guide, we update the summaries of nine assessment tools featured in the original March 2007 edition, and add an additional tool – the Communities Organizing Resources to Advance Learning (CORAL) Observation Tool – developed by Public/Private Ventures. This edition also includes refined definitions of validity and a discussion regarding some of the limitations of traditional methods of establishing reliability.
Since our original publication, there has been a flurry of activity related to the development and use of the various tools. Almost all of the tool developers have continued to work on either technical or practical aspects of their assessment tools, as well as on related resources to support practitioner use of these tools.
These changes demonstrate continued investment on the part of developers in making tools more accessible and user-friendly to programs and systems trying to implement quality assessment and improvement. Changes that have been made or are in development since 2007 include:
• Further psychometric testing of the reliability and validity of measures (OST; YPQA)
• Development and/or expansion of resources to support the use of various tools (APT; POT; QSA; QAS)
• Development and/or expansion of the availability of web-based tools and resources (QAS; QSA; YPQA)
• Aligning quality assessment tools with other measures to create a package of compatible tools (APT)
• Restructuring of the framework and/or scales (APT; OST)
• Expanding access by translating a tool into different languages (SACERS)
• Development of brother/sister tools targeting different age groups (YPQA; SACERS)
We hope this compendium will provide useful guidance to practitioners, policy makers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. We look forward to updating the compendium again as this work advances.
Cross-Cutting Comparisons

TOOL DEVELOPERS KEY
APT: Assessing Afterschool Program Practices Tool – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education
CORAL: Communities Organizing Resources to Advance Learning Observation Tool – Public/Private Ventures
OST: Out-of-School Time Observation Tool – Policy Studies Associates, Inc.
POT: Program Observation Tool – National AfterSchool Association
PQO: Program Quality Observation Scale – Deborah Lowe Vandell and Kim Pierce
QSA: Program Quality Self-Assessment Tool – New York State Afterschool Network
PPRS: Promising Practices Rating Scale – Wisconsin Center for Education Research and Policy Studies Associates, Inc.
QAS: Quality Assurance System® – Foundations, Inc.
SACERS: School-Age Care Environment Rating Scale – Frank Porter Graham Child Development Institute and Concordia University, Montreal
YPQA: Youth Program Quality Assessment – David P. Weikart Center for Youth Program Quality
Although the individual tool descriptions include what we hope is useful information about several different program quality assessment instruments, their level of detail may be daunting, particularly without a sense of the broader landscape of resources. Some of the individualized information about each tool can be further distilled in ways that may help readers understand both the broader context of program quality assessment and where individual tools fall within that context. We were not able to collect completely comparable information about all instruments in every topic area, but in those cases where we were, we have summarized and compared that information in narrative and charts.
Figure 1: Target Age and Purpose
Figure 2: Common and Unique Content
Figure 3: Methodology
Figure 4: Strength of Technical Properties
Additional Technical Considerations
Figure 5: Technical Glossary
Figure 6: Training and Support for Users
Most of the tools included in this review were developed primarily for self-assessment and program improvement purposes. Some, however, were developed with program monitoring or accreditation as a key goal and several were developed exclusively for use in research. Many have their roots in early childhood assessment (SACERS, POT, PQO) while others draw more heavily on youth development and/or education literature (APT, CORAL, OST, PPRS, QAS, QSA, YPQA). While the majority of tools were designed to assess programs serving a broad range of children (often K–12 or K–8), some are tailored for more specific age ranges.
Figure 1: Target Age and Purpose

Tool: Grades Served; Primary Purpose(s)
Assessing Afterschool Program Practices Tool (APT): Grades K–8; Improvement, Research/Evaluation
Communities Organizing Resources to Advance Learning Observation Tool (CORAL): Grades K–5; Improvement, Research/Evaluation
Out-of-School Time Observation Tool (OST): Grades K–12; Research/Evaluation
Program Observation Tool (POT): Grades K–8; Improvement, Monitoring/Accreditation
Program Quality Observation Scale (PQO): Grades 1–5; Research/Evaluation
Program Quality Self-Assessment Tool (QSA): Grades K–12; Improvement
Promising Practices Rating Scale (PPRS): Grades K–8; Research/Evaluation
Quality Assurance System (QAS): Grades K–12; Improvement
School-Age Care Environment Rating Scale (SACERS): Grades K–6; Improvement, Monitoring/Accreditation, Research/Evaluation
Youth Program Quality Assessment (YPQA): Grades 4–12; Improvement, Monitoring/Accreditation, Research/Evaluation
There is reasonable consensus across instruments about the core features of settings that matter for development. All of the tools included in this review measure six core constructs (at varying levels of depth): relationships, environment, engagement, social norms, skill-building opportunities and routine/structure. The content of most of the instruments aligns well with the National Research Council's features of positive developmental settings framework (2002), which has helped contribute to the growing consensus around elements of quality that has emerged since then. In terms of what components of settings the tools emphasize (Seidman et al., 2006), all include a focus on social processes. Although only a subset emphasize program resources, several include items related to the arrangement of resources within the setting.
Figure 2: Common and Unique Content

All tools measure: Relationships; Environment; Engagement; Social Norms; Skill-Building Opportunities; Routine/Structure

Measured by some tools:
Management (CORAL, POT, QAS, QSA)
Staffing (APT, YPQA, QSA, SACERS, POT)
Youth Leadership/Participation (APT, YPQA, OST, QSA, PPRS)
Linkages to Community (APT, YPQA, SACERS, QSA, QAS, POT)
Many of the tools included in this review follow a similar structure. They tend to be organized around a core set of topics or constructs, each of which is divided into several items, which are then described by a handful of more detailed indicators. Some variation does exist, however. For example, the PQO includes a unique time sampling component.[2] While most tools are organized around features of quality, some are not.

For example, while the APT addresses a core set of quality features, the tool itself is organized around the program's daily routine (e.g., arrival, transitions, pick-up). Observation is the primary data collection method for each of the instruments in this review, although several rely upon interview, questionnaire or document review as additional data sources.

[2] The time sampling method has observers go through a cycle of selecting individual participants (ideally at random) to observe for brief periods of time and document their experiences.
Figure 3: Methodology

Tool: Target Users; Data Collection Methods
Assessing Afterschool Program Practices Tool (APT): program staff, external observers; observation, questionnaire
Communities Organizing Resources to Advance Learning Observation Tool (CORAL): external observers; observation
Out-of-School Time Observation Tool (OST): external observers; observation
Program Observation Tool (POT): program staff, external observers; observation, interview, document review
Program Quality Observation Scale (PQO): external observers; observation
Program Quality Self-Assessment Tool (QSA): program staff; observation, document review
Promising Practices Rating Scale (PPRS): external observers; observation
Quality Assurance System (QAS): program staff, external observers; observation, interview, document review
School-Age Care Environment Rating Scale (SACERS): program staff, external observers; observation, interview
Youth Program Quality Assessment (YPQA): program staff, external observers; observation, interview
Most of the instruments have some information showing that if different observers watch the same program practices, they will score the instrument similarly (internal consistency and interrater reliability). Few, however, have looked at other aspects of reliability that are of interest when assessing the strength of a program quality measure. Several of the instruments have promising findings to consider in terms of validity – meaning they have made some effort to demonstrate that the instrument accurately measures what it is supposed to measure. See the accompanying glossary (Figure 5: Technical Glossary) and the Appendix for more detailed definitions of psychometric terms.
Figure 4: Strength of Technical Properties

For each instrument, evidence is rated across seven technical properties: score distributions, interrater reliability, test-retest reliability, internal consistency*, convergent validity, concurrent/predictive validity and validity of scale structure*. (Detailed findings appear in the individual tool descriptions.) No psychometric evidence was available for the Program Quality Self-Assessment Tool (QSA) or the Quality Assurance System (QAS).

Key:
(blank) = No evidence
✓✓✓ = Evidence of this property is strong by general standards
✓✓ = Evidence of this property is moderate by general standards, promising but limited or mixed (strong on some items or scales, weaker on others)
✓ = Evidence of this property is weaker than desired

* This type of evidence is only relevant for instruments with a lot of items that would be useful if organized into scales.
† Psychometric information is not based on the instrument in its current form, so its generalizability may be limited.
Figure 5: Technical Glossary

Score Distributions
What is it? The dispersion or spread of scores from multiple assessments for a specific item or scale.
Why is it useful? In order for items and scales (sets of items) to be useful, they should be able to distinguish differences between programs. If almost every program scores low on a particular scale, it may be that the items make it "too difficult" to obtain a high score and, as a result, don't distinguish between programs on this dimension very well.

Interrater Reliability
What is it? How much assessments by different trained raters agree when observing the same program at the same time.
Why is it useful? It is important to use instruments that yield reliable information regardless of the whims or personalities of individual observers. If findings depend largely on who is rating the program (rater A is more likely to give favorable scores than rater B), it is hard to get a sense of the program's actual strengths and weaknesses.

Test-Retest Reliability
What is it? The stability of an instrument's assessments of the same program over time.
Why is it useful? If an instrument has strong test-retest reliability, then the score it generates should be stable over time. This is important because we want changes in scores to reflect real changes in program quality. The goal is to avoid situations where an instrument is either too sensitive to subtle changes that may hold little significance, or insensitive to important long-term changes.

Internal Consistency
What is it? The cohesiveness of items forming an instrument's scales.
Why is it useful? Scales are sets of items within an instrument that jointly measure a particular concept. If, however, the items within a given scale are actually conceptually unrelated to each other, then the overall score for that scale may not be meaningful.

Convergent Validity
What is it? The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts.
Why is it useful? It is important to use an instrument that generates accurate information about what you are trying to measure. If two instruments are presumed to measure identical or highly similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other.

Concurrent/Predictive Validity
What is it? The extent to which an instrument is related to distinct theoretically important concepts and outcomes in expected ways.
Why is it useful? If an instrument accurately measures high program quality, then one can expect it to predict better outcomes for the youth participating in the program. The instrument's findings should also be related to distinct, theoretically important variables and concepts in expected ways.

Validity of Scale Structure
What is it? The extent to which items statistically group together in expected ways to form scales.
Why is it useful? It is helpful to know exactly which concepts an instrument is measuring. Factor analysis can help determine if one scale actually incorporates more than one related concept or if different items can be combined because they are essentially measuring the same thing.
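The agreement concepts in the glossary can be made concrete with a small worked example. The following Python sketch is purely illustrative (the ratings are invented and are not drawn from any of the instruments in this guide); it computes simple percent agreement and Cohen's kappa, a common chance-corrected agreement index, for two observers who rated the same ten program sessions.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of sessions on which the two raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the level of agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters give the same score at random,
    # given how often each rater uses each score category.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical scores on a 1-3 scale for ten observed sessions.
rater_a = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]
rater_b = [3, 2, 3, 1, 2, 2, 2, 1, 3, 2]

print(percent_agreement(rater_a, rater_b))          # 0.9
print(round(cohens_kappa(rater_a, rater_b), 2))     # 0.84
```

Because kappa discounts agreement that would occur by chance alone, it is lower than raw percent agreement; reviewers of the tools in this guide typically treat chance-corrected indices as the more informative evidence of interrater reliability.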
Additional Technical Considerations

Many instruments in this report have strong reliability and validity evidence using traditionally accepted techniques. However, as with any field, new methods are often introduced to advance our understanding of reliability and validity. In this section, we discuss some of the limitations of traditional methods that have been highlighted by researchers at the David P. Weikart Center for Youth Program Quality[3] (YPQA developer), what these methods can and cannot tell us about an instrument's reliability and validity, and how new methods are addressing these issues.
Variations in Quality Across Different Contexts
In an ideal world, scores we obtain from program quality instruments would always be perfectly accurate. Unfortunately, reality tends to be messier because many factors influence assessments. For example, different raters may not perceive observations in the same way and thus give different scores to the same questions. Different staff or different activities might get different scores, if we could observe them all, but typically we cannot do so. Another possible issue could be that program staff interact with children differently at the beginning of the year (when they do not know the children well yet) versus the end of the year. When these influences go unaccounted for in the use of an instrument, they are collectively known as "error variance." When an instrument is reliable, its scores are not influenced by much error.
Unfortunately, traditional reliability methods, including the ones used by instruments in this report, do not account for all possible sources of variation in scores (thereby increasing the inaccuracy of the instrument), as discussed by Steve Raudenbush and his colleagues (for a readable treatment of this topic, see Martinez & Raudenbush, 2008). Charles Smith at the Weikart Center has done preliminary work that examines sources of variation for the YPQA, including whether ratings are different during earlier program sessions versus later program sessions. He has also examined whether it is enough to observe one type of activity within an agency versus observing a broad range of activities (readers who are interested in specific findings should refer to the technical summary of the YPQA in the Individual Tool Descriptions section).
When we know how an instrument is influenced by all these factors, we can take steps to reduce error. For example, if an instrument's scores vary widely depending on which activities we are observing, then we should observe a wide range of activities. If scores depend on the time of day, then we should conduct observations at multiple times throughout the day. By accounting for these additional influences on program quality, we reduce error and obtain more accurate scores. At this point in time, the YPQA is the only instrument to have preliminary information on external sources of variation beyond interrater reliability.
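The payoff of sampling more contexts can be illustrated with a toy simulation. The numbers below are invented and tied to no particular instrument: each observed score is modeled as a program's true quality plus random rater and activity effects ("error variance"), and averaging observations across more activities pulls the estimate closer to the true value.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_QUALITY = 3.0  # the program's "real" score on a hypothetical 1-5 scale

def observe(n_activities):
    """Average the scores obtained from observing n different activities.

    Each observation is distorted by activity-to-activity and
    rater-to-rater variation.
    """
    scores = []
    for _ in range(n_activities):
        activity_effect = random.uniform(-1.0, 1.0)
        rater_effect = random.uniform(-0.5, 0.5)
        scores.append(TRUE_QUALITY + activity_effect + rater_effect)
    return sum(scores) / len(scores)

def mean_abs_error(n_activities, trials=2000):
    """Average distance between the observed score and the true quality."""
    return sum(abs(observe(n_activities) - TRUE_QUALITY)
               for _ in range(trials)) / trials

# Observing eight activities yields a markedly smaller error than observing one.
print(round(mean_abs_error(1), 2), round(mean_abs_error(8), 2))
```

The same logic motivates observing at multiple times of day, or with multiple raters, when those factors are known to influence scores.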
Understanding Assumptions about Internal Consistency
Several instruments in this report have internal consistency information on their scales. As explained in other parts of this report, scales are composed of items meant to measure a particular concept. Measuring internal consistency is often the first step in evaluating whether the items form a meaningful domain by determining whether they are cohesive (readers who would like a more extensive explanation with examples should refer to the Appendix).

However, Smith points out that internal consistency is only appropriate when the items are reflections of a non-tangible concept (called reflective for short). As an example, consider the concept "Supportive Environment." Although this might be an important concept to assess, one cannot measure it the same way one would measure temperature or weight. Instead, researchers must rely on a set of questions to approximate how supportive the environment is for children. One analogy for a reflective concept could be an art sculpture – to truly appreciate it, one must look at the sculpture from multiple angles.
[3] The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.
Similarly, to truly measure a reflective concept, one must examine it using a group of similar items (which provide the different angles). Theoretically speaking, researchers interested in developing a scale could probably generate hundreds of possible items to measure a particular concept. Of course, it is impractical to use them all, so researchers choose a manageable subset.

In contrast, internal consistency scores are not appropriate when concepts are formative, which means that a single concept is a composite of multiple, separate components (MacKenzie, Podsakoff, & Jarvis, 2005). A good analogy for this type of concept is a puzzle. To assess a formative concept, you need to gather all of the pieces and put them together. For example, imagine that we wish to measure "Program Resources." Unlike a reflective concept, this type of concept is a composite of several important components (the puzzle pieces). Our items would inquire about things like money, time, space, and number of staff members. Each of these resources may be an important component of the overall concept and is essential to include in the scale if we are to obtain a clear picture and complete the puzzle. Unlike reflective concepts, researchers cannot choose a subset of items from a large list of possibilities. Rather, each item is an important component of the whole. Because the concept is a composite of separate and potentially unrelated parts, the cohesiveness of the items is not important, and therefore internal consistency procedures are not appropriate (as stated in the Appendix, internal consistency measures the relatedness of items, which assumes that the items are reflective).
The Weikart Center has been reexamining the YPQA scales to assess whether they are reflective versus formative. Although this work is still in progress, the results will have important implications for how we think about evaluating both reliability and validity for observation-based metrics.
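The distinction matters in practice because internal consistency statistics such as Cronbach's alpha only behave sensibly for reflective scales: cohesive items produce a high alpha, while unrelated (formative-style) items can drive alpha toward zero or below. The Python sketch below uses invented scores from five hypothetical programs; it illustrates the statistic itself and is not an analysis of any tool in this guide.

```python
def variance(xs):
    """Population variance of a list of scores."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    all rated on the same set of programs."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-program totals
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three cohesive items that track the same underlying quality across programs.
cohesive = [[1, 2, 3, 4, 5],
            [2, 2, 3, 5, 5],
            [1, 3, 3, 4, 4]]

# Three items that vary independently of one another.
unrelated = [[1, 2, 3, 4, 5],
             [5, 1, 4, 2, 3],
             [3, 5, 1, 4, 2]]

print(round(cronbach_alpha(cohesive), 2))   # high: the items hang together
print(round(cronbach_alpha(unrelated), 2))  # low or negative: no cohesion
```

A formative scale (money, time, space, staffing) could easily look like the second set even though every item belongs in the composite, which is why a low alpha is uninformative for formative concepts.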
* A fee structure may be developed over time, once additional materials are completed.
† Training and data services have only been made available in the context of specific research projects.
†† These are estimates of time necessary to prepare observers; developers of these tools have not trained "to reliability."
Six of the ten instruments included in this review are free to users and available to download from the Internet; the other four have various costs associated with their use. In most, but not all cases, training is available (at a fee) for those interested in using the tool. Many come with user-friendly manuals that explain how to use the instrument; in some cases these materials are still under development. In several cases, the developers of the tools also provide data collection, management and reporting services at additional cost to users. Details about such considerations are included in the individual tool descriptions.
Cost
Trai
ning
Ava
ilabl
e
Estim
ated
Tim
eNe
cess
ary
to
Trai
nOv
erse
rver
sto
Gen
erat
e
Relia
bleSc
ores
Estim
ated
M
inim
um
Obse
rvat
ion
Tim
eNe
eded
to
Gene
rate
Sou
nd
Data
Data
Col
lect
ion,
M
anag
emen
tan
dRe
portin
gAv
aila
ble
AssessingAfterschoolProgramPracticesTool(APT) Free* Yes
4hourtrainingplus2programobservations
1afternoon(2-3hours)
No
CommunitiesOrganizingResourcestoAdvanceLearningObservationTool(CORAL) Free No 2days 3-4hours No
Out-of-SchoolTimeObservationTool(OST) Free No†8-18hours,dependingonexperience
3hours No†
ProgramObservationTool(POT)
$300AdvancingQualityKit
Yes 2.5-3days3-5hours(forself-assessment)
No
ProgramQualityObservationScale(PQO) Free No†
2hoursplus2-4observations&2-4timesamples,dependingonexperience
1.5hoursobservation&.5hourstimesampling
No†
ProgramQualitySelf-AssessmentTool(QSA) Free Yes 2hours†† N/A No
PromisingPracticesRatingScale(PPRS) Free No†
2hoursplus2-4observations,dependingonexperience
2hours No†
QualityAssuranceSystem(QAS)
$75AnnualSiteLicense
Yes 2-3hours††1afternoon(2-3hours)
Yes
School-AgeCareEnvironmentRatingScale(SACERS)
$15.95SACERSBooklet
Yes 4-5days 3hours Yes
YouthProgramQualityAssessment(YPQA) $39.95YPQAStarterPack
Yes 2days 4hours Yes
Detailed descriptions of the ten assessment tools are provided in the next section. Here we offer one-page summaries to copy and share. Each summary follows a common format.
Assessing Afterschool Program Practices Tool (APT) – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education
Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Public/Private Ventures
Out-of-School Time Observation Tool (OST) – Policy Studies Associates, Inc.
Program Observation Tool (POT) – National AfterSchool Association
Program Quality Observation Scale (PQO) – Deborah Lowe Vandell and Kim Pierce
Program Quality Self-Assessment Tool (QSA) – New York State Afterschool Network
Promising Practices Rating Scale (PPRS) – Wisconsin Center for Education Research and Policy Studies Associates, Inc.
Quality Assurance System® (QAS) – Foundations, Inc.
School-Age Care Environment Rating Scale (SACERS) – Frank Porter Graham Child Development Institute and Concordia University, Montreal
Youth Program Quality Assessment (YPQA) – David P. Weikart Center for Youth Program Quality
Developed by NIOST and the Massachusetts Department of Elementary & Secondary Education
Overview: The Assessing Afterschool Program Practices Tool (APT) is designed to help practitioners examine and improve what they do in their program to support young people’s learning and development. It examines those program practices that research suggests relate to youth outcomes (e.g., behavior, initiative, social relationships). A research version of the APT (the APT-R) was developed in 2003-2004. This more user-friendly self-assessment version was developed in 2005.
Primary Purpose(s): Program Improvement; Monitoring/Accreditation
Program Target Age: Grades K–8
Relevant Settings: Both structured and unstructured programs that serve elementary and middle school students during the non-school hours.
Content: The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories – program climate, relationships, approaches and programming, partnerships and youth participation.
Structure: The 15 program features addressed by the APT are measured by two tools – the observation instrument (APT-O) and questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.
Methodology: Items that are observable within a given program session (typically one full afternoon) are assessed in the APT-O. The APT-Q is a questionnaire to gather information about planning, frequency and regularity of program offerings and opportunities and frequency of connections with families and school. Both the APT-O and APT-Q have four-point scales, though flexibility is encouraged for users who find the scales not useful for their purposes. Depending on what part of the tool(s) is being used, the scales measure how characteristic an item is of the program, the consistency of an item or the frequency of an item. For each item, concrete descriptors illustrate what a score of 1, 2, 3 or 4 looks like.
Technical Properties: While no psychometric information is available for the current self-assessment version of the APT, some is available on the research version (APT-R) on which it is based. For the APT-R, interrater reliability was moderate and preliminary evidence of concurrent and predictive validity is available. NIOST has plans for further testing of the APT.
User Considerations:
Ease of Use
• “Cheat sheets” demonstrate link between quality and outcomes.
• Instrument is extremely flexible in terms of administration, use of scales, number of observations, etc.
• The instrument is designed for users to make observations in just one program session.
• The instrument can be used as part of a package including an outcomes tool and data tracking system.
Available Supports
• Training on both the APT itself and the youth development principles embedded in the instrument is available through NIOST.
• Packaging and pricing information about training on the instrument is available from NIOST for organizations not already affiliated with the APT.
For More Information: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta
Communities Organizing Resources to Advance Learning Observation Tool
Overview: The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies on the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time. The tool was used in two ways: 1) observation of literacy instruction and 2) observation of programming in support of literacy. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.
Primary Purpose: Research/Evaluation
Program Target Age: Grades K–5
Relevant Settings: Structured literacy-based programs, both school and community-based.
Content: The CORAL observation tool documents the connection between the quality of the program, fidelity to the Balanced Literacy Model and the academic outcomes of participants.
Structure: The CORAL observation tool is structured around five key constructs of quality – adult-youth relations, effective instruction, peer cooperation, behavior management and literacy instruction. The tool is divided into five parts. The first three – the activity description form, characteristics form and the activity check box form – are focused on describing the activity as well as participant and staff behavior. The second two components include an activity scale and an overall assessment form, and are completed after a 90-minute observation period.
Methodology: Each construct is based on a five-point rating scale. The activity description form, characteristics form and activity check box form are filled out before an activity is observed, and contain the most informative aspects of the activity. The activity scale and overall assessment form are completed after a 90-minute observation session.
Technical Properties: Evidence for score distributions and predictive validity is strong by general standards, and evidence for internal consistency and the validity of scale structure is promising but limited.
User Considerations:
Ease of Use
• Contains detailed instructions for conducting observations.
• Includes space for open-ended narratives.
• Scoring takes 3-4 hours, including completing the rating scales, related narratives and the overall assessment.
Available Supports
• Currently, training is limited to individuals involved in specific evaluations that employ the instrument.
• Public/Private Ventures’ website features a free download of materials in their Afterschool Toolkit.
For More Information: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29
Out-of-School Time Program Observation Tool
Developed by Policy Studies Associates, Inc.
Overview: The Out-of-School Time Program Observation Tool (OST) was developed in conjunction with several research projects related to out-of-school time programming, with the goal of collecting consistent and objective data about the quality of activities through observation. Its design is based on several assumptions about high-quality programs – first, that certain structural and institutional features support the implementation of high-quality programs and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth outcomes.
Primary Purpose: Research/Evaluation
Program Target Age: Grades K–12
Relevant Settings: Varied school- and community-based after-school programs.
Content: The OST documents and rates the quality of the following major components of after-school activities: interactions between youth and adults and among youth, staff teaching processes and activity content and structures.
Structure: The first section of the OST allows for detailed documentation of activity type, number and demographics of participants, space used, learning skills targeted, type of staff and the environmental context. The remainder of the tool assesses the quality of activities along five key domains including relationships, youth participation, staff skill building and mastery strategies and activity content and structure.
Methodology: The OST observation instrument uses a seven-point scale to assess the extent to which each indicator is or is not present during an observation. Qualitative documentation, recorded on site, supplements the rating scales. Activity and quality indicator data from the OST observation instrument is used in conjunction with related survey measures.
Technical Properties: Evidence for interrater reliability is strong by general standards, as is evidence for score distributions and internal consistency. Evidence for concurrent validity and the validity of the scale structure is promising but limited.
User Considerations:
Ease of Use
• Free and available online.
• Tool includes an introduction and basic procedures for use.
• Includes some technical language but has been used by both researchers and practitioners.
• Raters must observe approximately 3 hours of programming to generate sound data.
• Observers can be trained to generate reliable observations through 8-16 hours of training, depending on level of experience.
Available Supports
• Training is limited to individuals involved in specific evaluations that employ the instrument.
• Additional non-observational measures related to after-school programming are available from PSA that can be used in conjunction with the OST.
For More Information: www.policystudies.com/studies/youth/OST%20Instrument.html
Developed by the National AfterSchool Association
Overview: The Program Observation Tool is the centerpiece of the National AfterSchool Association’s (NAA) program improvement and accreditation process and is designed specifically to help programs assess progress against the Standards for Quality School-Age Care. Developed in 1991 by NAA and the National Institute on Out-of-School Time, the tool was revised and piloted before the accreditation system began in 1998.
Primary Purpose(s): Program Improvement; Monitoring/Accreditation
Program Target Age: Grades K–8
Relevant Settings: School and center-based after-school programs.
Content: The Program Observation Tool measures 36 “keys of quality,” organized into six categories. Five are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth – administration – is assessed through questionnaire/interview. The tool reflects NAA’s commitment to holistic child development and its accreditation orientation.
Structure: The five quality categories that are the focus of the tool are measured using one instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items are assessed separately, through questionnaire/interview.
Methodology: The rating scale captures whether each indicator is true all of the time, most of the time, sometimes or not at all. Specific descriptions of what a 0, 1, 2 or 3 looks like are not provided, but descriptive statements help clarify the meaning of each indicator. Programs seeking accreditation must assign an overall program rating based on individual scores, and guidelines are provided for observers to reconcile and combine scores. For accreditation purposes, the program/activities and safety/nutrition categories are “weighted.”
Technical Properties: No psychometric evidence is available on the POT itself, but there is information about the ASQ (Assessing School-Age Childcare Quality), from which the POT was derived. Overall, evidence for interrater and test-retest reliability is strong by general standards. Following revisions to the scales, evidence for internal consistency was also strong. Preliminary evidence of concurrent validity is also available for the ASQ.
User Considerations:
Ease of Use
• Accessible language and format developed with input from practitioners.
• When used for self-assessment, observation and scoring takes roughly 3-5 hours.
• A self-study manual provides detailed guidance on instrument administration.
• The package costs approximately $300 (additional costs for full accreditation).
Available Supports
• The POT is part of an integrated set of resources for self-study and accreditation.
• The full accreditation package provides detailed guides, videos and other supports.
• Beginning in September 2008, accreditation is offered through the Council on Accreditation.
• NAA currently offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability).
• Some NAA state affiliates offer training for programs interested in self-assessment and improvement.
For More Information: http://naaweb.yourmembership.com/?page=NAAAccreditation
Developed by Deborah Lowe Vandell & Kim Pierce
Overview: The Program Quality Observation Scale (PQO) was designed to help observers characterize the overall quality of an after-school program environment and to document individual children’s experiences within programs. The PQO has been used in a series of research studies and has its roots in Vandell’s observational work in early child care settings.
Primary Purpose: Research/Evaluation
Program Target Age: Grades 1–5
Relevant Settings: Varied school- and community-based after-school programs.
Content: The PQO focuses primarily on social processes and in particular, three components of the quality of children’s experiences inside programs: relationships with staff, relationships with peers and opportunities for engagement in activities.
Structure: The tool has two components – qualitative ratings focused on the program environment and staff behavior (referred to as “caregiver style”) and time samples of children’s activities and interactions. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed.
Methodology: All items are assessed through observation (although the PQO has always been used in tandem with other measures that rely on different kinds of data). Program environment and caregiver style ratings are made using a four-point scale and users are given descriptions of what constitutes a 1, 2, 3 or 4 for three aspects of environment and four aspects of caregiver style. In the time sample of activities, activity type is recorded using 19 different categories and interactions are assessed and coded along several dimensions.
Technical Properties: Evidence for interrater reliability, score distributions, internal consistency and convergent validity is strong by general standards and evidence for test-retest reliability and concurrent/predictive validity is promising but mixed.
User Considerations:
Ease of Use
• Free and available for use.
• The PQO was developed with a research audience in mind. Manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• Qualitative ratings of environment and staff require a minimum of 90 minutes observation time. Completing the time samples as outlined takes a minimum of 30 minutes for an experienced observer.
Available Supports
• Training has only been made available in the context of a specific research study.
• Data collection, management or reporting have only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PQO (e.g., physical environment questionnaire; staff, student and parent surveys).
For More Information: http://childcare.gse.uci.edu/des4.html
Developed by the New York State Afterschool Network
Overview: The Program Quality Self-Assessment Tool (QSA) was developed exclusively for self-assessment purposes (use for external assessment and formal evaluation purposes is discouraged). The QSA is intended to be used as the focal point of a collective self-assessment process that involves all program staff. Soon after it was created in 2005, the state of New York began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.
Primary Purpose: Program Improvement
Program Target Age: Grades K–12
Relevant Settings: The full range of school and community-based after-school programs. The QSA is particularly relevant for programs that intend to provide a broad range of services as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers).
Content: The QSA is organized into 10 essential elements of effective after-school programs, including environment/climate; administration/organization; programming/activities; and youth participation/engagement, among others. A list of standards describes each element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns.
Structure: Each of the QSA’s 10 essential elements is further defined by a summary statement which is then followed by between 7 and 18 quality indicators. The four-point rating scale used in the QSA is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards.
Methodology: While most essential elements are assessed through observation, the more organizationally focused elements such as administration, measuring outcomes/evaluation and program sustainability/growth are assessed primarily through document review. Users are not encouraged to combine scores for each element to determine a global rating, because the tool is intended for self-assessment only.
Technical Properties: Beyond establishing face validity, the instrument’s psychometric properties have not been researched.
User Considerations:
Ease of Use
• Practitioners led the development of the QSA; language and format are clear and user-friendly.
• The tool is free and downloadable and includes an overview and instructions.
• The tool is scheduled for a revision which will target length and guidance on determining ratings.
Additional Supports
• The New York State Afterschool Network has developed a user guide, which provides a self-guided walk-through of the tool.
• Programs can contact the New York State Afterschool Network to receive referrals for technical assistance in using the instrument.
• Programs are encouraged to use the QSA in concert with other formal or informal evaluative efforts.
• NYSAN trainings are organized around the 10 elements featured in the instrument, so practitioners can easily find professional development opportunities that connect to the findings in their self-assessment.
For More Information: www.nysan.org
Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.
Overview: The Promising Practices Rating Scale (PPRS) was developed in the context of a study of the relationship between participation in high-quality after-school programs and child and youth outcomes. The tool was designed to help researchers document type of activity, extent to which promising practices are implemented within activities and overall program quality. The PPRS builds directly on earlier work by Deborah Lowe Vandell and draws upon several other observation instruments included in this report.
Primary Purpose: Research/Evaluation
Program Target Age: Grades K–8
Relevant Settings: Varied school- and community-based after-school programs.
Content: The PPRS focuses primarily on social processes occurring at the program level (other tools in the PP assessment system are available to collect other kinds of information). The tool addresses activity type, implementation of promising practices and overall program quality. The practices at the core of the instrument include supportive relations with adults, supportive relations with peers, level of engagement, opportunities for cognitive growth, appropriate structure, over-control, chaos and mastery orientation.
Structure: The first part of the instrument focuses on activity context. Observers code things like activity type, space, skills targeted, number of staff and youth involved. Observers then add a brief narrative description of the activity. The core of the PPRS is where observers document to what extent certain exemplars of promising practice are present in the program.
Methodology: All items in the scale are addressed through observation, with an emphasis first on activities and then more broadly on the implementation of promising practices by staff within the program. Each area of practice is divided into specific exemplars (positive and negative) with detailed indicators. Ratings are assigned at the overall practice level using a four-point scale. Observers then review their ratings of promising practices across multiple activities and assign an overall rating for each practice area and the overall program.
Technical Properties: Strong evidence for score distribution and internal consistency of the average overall score has been established. Promising but limited evidence of moderate interrater reliability and predictive validity has also been established.
User Considerations:
Ease of Use
• Free and available for use.
• The PPRS was developed with a research audience in mind. Manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• In the study the PPRS was developed for, formal observation time totaled approximately two hours per site, with additional hours spent reviewing notes and assigning ratings.
Available Supports
• Training has only been made available in the context of a specific study.
• Data collection, management or reporting has only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PPRS (e.g., physical environment questionnaire; staff, student and parent surveys).
For More Information: http://childcare.gse.uci.edu/des3.html
Quality Assurance System®
Developed by Foundations, Inc.
Overview: The Quality Assurance System® (QAS) was developed to help programs conduct quality assessment and continuous improvement planning. Based on seven “building blocks” that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their particular focus. The QAS focuses on quality at the “site” level and addresses a range of aspects of quality from interactions to program policies and leadership.
Primary Purpose: Program Improvement
Program Target Age: Grades K–12
Relevant Settings: A range of school- and community-based programs.
Content: The various components of quality that the QAS focuses on are considered “building blocks.” The seven core building blocks include: program planning and improvement; leadership; facility and program space; health and safety; staffing; family and community connections; and social climate. Three additional “program focus building blocks” that reflect particular foci within programs are also available.
Structure: The QAS is divided into two parts. Part one – program basics – includes the seven core building blocks. For each, users are given a brief description of the importance of that aspect of quality and then the building block is further subdivided into between five and eight elements, each of which gets rated. Part two of the tool – program focus – consists of the three additional building blocks and its structure parallels that of part one. Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4).
Methodology: Filling out the QAS requires a combination of observation, interview and document review. Users follow a five-step process for conducting a site visit and collecting data, which includes observation of the program in action and a review of relevant documents. Once ratings for each element are entered into the computer, scores are generated for each building block – rather than a single score for the overall program – reflecting the tool’s emphasis on identifying specific areas for improvement.
Technical Properties: Beyond establishing face validity, research about the instrument’s psychometric properties has not been conducted.
User Considerations:
Ease of Use
• The QAS is flexible and customizable, with built-in user-friendly features.
• The instruction guide walks the user through basic steps for using the system.
• The $75 annual licensing fee covers two assessments and cumulative reports.
• Multi-site programs can generate site comparison reports.
Available Supports
• Foundations, Inc. offers online sessions and in-person trainings.
• Once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from staff.
• Programs that wish to have their assessment conducted by trained assessors can purchase this service under contract with Foundations, Inc.
• The QAS is available in a Web-based format allowing users to enter data and immediately generate basic graphs and analyses.
For More Information: http://qas.foundationsinc.org/start.asp?st=1
Developed by Frank Porter Graham Child Development Institute & Concordia University, Montreal
Overview: The School-Age Care Environment Rating Scale (SACERS), published in 1996 and updated periodically, is one of a series of quality rating scales developed by researchers at the Frank Porter Graham Child Development Institute. SACERS focuses on “process quality,” or social interactions within the setting, as well as features related to space, schedule and materials that support those interactions. The SACERS can be used by program staff as well as trained external observers or researchers.
Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation
Program Target Age: Grades K–8
Relevant Settings: A range of program environments including child care centers, school-based after-school programs and community-based organizations.
Content: SACERS is based on the notion that quality programs address three “basic needs”: protection of health and safety, positive relationships and opportunities for stimulation and learning. The seven sub-scales of the instrument include space and furnishings; health and safety; activities; interactions; program structure; staff development; and a special needs supplement.
Structure: The SACERS scale includes 49 items, organized into seven subscales. All 49 items are rated on a seven-point scale, from “inadequate” to “excellent.” Concrete descriptions of what each item looks like at different levels are provided. All of the sub-scales and items are organized into one booklet that includes directions for use and scoring sheets.
Methodology: While observation is the main form of data collection, several items are not likely to be observed during program visits. Raters are encouraged to ask questions of a director or staff person in order to rate these and are provided with sample questions. For many items, clarifying notes help the user understand what they should be looking for. Observers enter scores on a summary score sheet, which encourages users to compile ratings and create an overall program quality score.
Technical Properties: Evidence for interrater reliability and internal consistency is strong by general standards. Convergent and concurrent validity evidence is limited but promising.
User Considerations:
Ease of Use
• Accessible format and language.
• Includes full instructions for use, clarifying notes and a training guide.
• The cost of the SACERS booklet is $15.95.
• Suggested time needed: three hours to observe a program and complete the form.
• Guidance is offered on how to sample, observe and score to reflect multiple activities within a program.
Available Supports
• Additional score sheets can be purchased in packages of 30.
• Three- and five-day trainings on SACERS structure, rationale and scoring.
• Guidance on how to conduct your own training is provided in the booklet.
• Training to reliability takes 4-5 days, with reliability checks throughout.
• Access to a listserv through the Frank Porter Graham Institute Web site.
• Large-scale users can now use commercial software to enter/score data.
• With the Web-based reporting system, individual assessments can be routed to a supervisor for quality assurance and feedback, and aggregate analyses and reporting can be provided.
For More Information: www.fpg.unc.edu/~ecers/
Developed by the David P. Weikart Center for Youth Program Quality4
Overview: The Youth Program Quality Assessment (YPQA) was developed by the High/Scope Educational Research Foundation and has its roots in a long lineage of quality measurement rubrics for pre-school, elementary and now youth programs. The overall purpose of the YPQA is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff. While some structural and organizational management issues are included in the instrument, the YPQA is primarily focused on what the developers refer to as the “point of service” – the delivery of key developmental experiences and young people’s access to those experiences.
Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation
Program Target Age: Grades 4–12
Relevant Settings: Structured programs in a range of school- and community-based settings.
Content: Because of the focus on the “point of service,” the YPQA emphasizes social processes – or interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.
Structure: The YPQA assesses seven domains using two overall scales. Topics covered include engagement, interaction, supportive environment, safe environment, high expectations, youth-centered policies and practices and access.
Methodology: Items at the program offering level are assessed through observation. Organization level items are assessed through a combination of guided interview and survey methods. The scale used throughout is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, concrete descriptors illustrate what a score of 1, 3 or 5 looks like.
Technical Properties: Evidence for score distributions, test-retest reliability, convergent validity and validity of scale structure is strong. Evidence for interrater reliability is mixed and evidence is promising but limited in terms of internal consistency and concurrent validity.
User Considerations:
Ease of Use
• Language and format of the tool are accessible.
• Administration manual with definitions of terms and scoring guidelines.
• The tool can be ordered online.
• Raters must observe for roughly four hours to generate sound data.
• Observers can be trained to generate reliable observations in two days.
Available Supports
• One-day basic and two-day intermediate YPQA training are available, with additional technical assistance available upon request.
• Youth development training that is aligned with tool content is available.
• Online “scores reporter” and a Web-based data management system are available.
For More Information: www.highscope.org/content.asp?contentid=117
4 The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.
At-a-glance descriptions of the ten assessment tools are provided in the previous section. Here we offer more detailed descriptions. Each write-up follows a common format.
Assessing Afterschool Program Practices Tool (APT) – National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education
Communities Organizing Resources to Advance Learning Observation Tool (CORAL) – Public/Private Ventures
Out-of-School Time Observation Tool (OST) – Policy Studies Associates, Inc.
Program Observation Tool (POT) – National AfterSchool Association
Program Quality Observation Scale (PQO) – Deborah Lowe Vandell and Kim Pierce
Program Quality Self-Assessment Tool (QSA) – New York State Afterschool Network
Promising Practices Rating Scale (PPRS) – Wisconsin Center for Education Research and Policy Studies Associates, Inc.
Quality Assurance System (QAS) – Foundations, Inc.
School-Age Care Environment Rating Scale (SACERS) – Frank Porter Graham Child Development Institute and Concordia University, Montreal
Youth Program Quality Assessment (YPQA) – David P. Weikart Center for Youth Program Quality
Purpose and History
The Assessing Afterschool Program Practices Tool (APT) is a set of observation and questionnaire tools designed to help practitioners examine and improve what they do in their after-school program to support young people’s learning and development. It was specifically designed to examine those program practices that research suggests may be related to key youth outcomes (e.g., behavior, initiative, social relationships), and it is a core component of the Afterschool Program Assessment System (APAS).5
The research version of the APT (the APT-R) was developed in 2003-2004 for use in the Massachusetts Afterschool Research Study (MARS). Based on extensive field testing by grantees as well as some additional testing of the scales using MARS data, a more user-friendly self-assessment version of the tool was developed during 2005 for use by the Massachusetts Department of Elementary and Secondary Education 21st Century Community Learning Centers (21st CCLC) grantees and other programs interested in quality assessment. The self-assessment version is the focus of this description.
The instrument can be used to measure quality in a wide variety of program models that serve elementary and middle school students during the non-school hours. In addition to serving as a self-assessment tool, the APT defines desirable program practices in concrete terms that can be used to communicate with staff and others, help stimulate reflection and discussion regarding program strengths and weaknesses, guide the creation of professional development priorities and improvement goals, and help gauge progress toward those goals.
The APT focuses on the experiences of youth in programs and is not intended to evaluate individual staff performance or produce definitive global quality scores for programs. While the APT includes a four-point rating scale, assigning ratings is not required; programs are encouraged to use the tool in ways that will yield the most useful information to guide program improvement.
Content
The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories:
• Program climate
• Relationships
• Approaches and programming
• Partnerships
• Youth participation
While it does address some broader organizational policy issues (e.g., connections with schools, staff-youth ratios), it was designed to focus primarily on things that program staff have control over and that are relevant across a range of different organizational contexts and facilities (e.g., schools, community centers).
The APT emphasizes some aspects of settings more than others and in particular places a strong emphasis on relationships, as research has shown that relationships have the greatest impact on youth outcomes. The primary focus is therefore on social processes – or interactions between people within the program. Several items help users observe and assess youth-adult relationships and interactions, as well as peer interactions and connections with families and school personnel. To a lesser extent than social processes, the APT also addresses programs’ human and material resources and how those resources are organized or arranged within the setting.
In developing the APT, the authors reviewed relevant literature to identify program features that relate to outcomes and also looked at existing definitions and measures of program quality. One such definition commonly referenced in the field is the National Research Council’s features of positive developmental settings, a framework which itself is focused primarily on social processes. Items on the APT address each of the eight features identified by the National Research Council (2002).

Developed by NIOST and the Massachusetts Department of Elementary & Secondary Education

5 The APT was designed to address program practices that research suggests lead to youth outcomes measured by the Survey of Afterschool Youth Outcomes (SAYO) – an evaluation system developed by NIOST under contract with the MA Department of Education. The SAYO includes pre- and post-participation surveys for teachers and program staff and measures things like behavior in the classroom and program, initiative, engagement in learning, relations with peers and adults, homework, analysis and problem-solving, and academic performance. For more information, see www.niost.org/training/APASbrochureforweb.pdf
Structure and Methodology
The 15 program features addressed by the APT are measured by one or both of two tools, the observation instrument (APT-O) and questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.
Although the 15 program features drive the content of the tool, the APT-O is organized by daily routine. Five sections are intended to follow what the developers consider a typical program day. While these sections most closely reflect the daily routine in elementary and middle school 21st CCLC programs, the tool is designed to be flexible and users are encouraged to use whichever sections are most relevant in whatever order makes sense. These five sections include both informal program times (arrival, transitions, pick-up) and formal program times (homework, activities).
Within each section, sub-sections focus on particular practices and behaviors during those time periods (for example, sub-sections under “homework” include homework organization, youth participation in homework time, staff effectively manage homework time, and staff provide individualized homework support). Finally, each sub-section includes between two and eight specific items that can be observed and rated.
An important structural aspect of the APT is its explicit connection to a specific youth outcome measurement tool – the Survey of Afterschool Youth Outcomes (SAYO). Programs can use APT findings to look at how they may be contributing to specific outcome areas included in the SAYO, and users are provided with charts connecting particular APT items to specific outcome areas. Despite this linkage, the APT is also useful as a stand-alone tool.
Recently, a new set of resources, the APT-SAYO Links, was developed as quick guides for practitioners to understand the research base connecting APT program practices and specific SAYO outcome areas.
Program features and the tool(s) that measure them (APT-O, APT-Q or both):
• Welcoming & Inclusive Environment – APT-O, APT-Q
• Positive Behavior Guidance – APT-O
• High Program & Activity Organization – APT-O
• Supportive Staff-Youth Relationships – APT-O, APT-Q
• Positive Peer Relations – APT-O
• Staff/Program Supports Individualized Needs & Interests – APT-O, APT-Q
• Staff/Programming Stimulates Engagement & Thinking – APT-O, APT-Q
• Targeted SAYO Skill Building/Activities – APT-O, APT-Q
• Youth are Positively Engaged in Program/Skill Building – APT-O
• Varied/Flexible Approaches to Programming – APT-O, APT-Q
• Space is Conducive to Learning – APT-O
• Connections with Families – APT-O, APT-Q
• Opportunities for Responsibility, Autonomy & Leadership – APT-Q
• Connections with Schools – APT-Q
• Program Supports Staff – APT-Q
The APT-O
The APT-O rating scale, should users decide to assign ratings to their observations, is a four-point scale designed to answer the question, “How true is it that this statement describes what I observed?” Definitions of each point on the scale differ slightly depending on whether you are observing a program or staff practice vs. youth behaviors. A detailed description of the rating scale as well as other rating options and considerations are included in the instruction manual. Some “conditional” items are included, which are only to be rated should they occur (e.g., when youth behavior is inappropriate, staff use simple reminders to redirect behavior).
Raters are asked to assign a 1-4 (or N/A) rating to each of the individual items within each sub-section. For most items, a specific description of what a “1” looks like is provided. The wording of the item itself constitutes a “4” since the question driving the ratings is, “How true is this?” The instruction manual provides general guidance (not item-specific) for how to think about 2s (e.g., desired practice was only partially met, some minor evidence of negative expressions of the practice, or the practice is observed infrequently) and 3s (observed practice mostly reflected desired practice, or the desired practice was observed but perhaps not at all expected times). This year, NIOST will begin developing more specific anchors for 2s and 3s on the APT rating scale for each item. These new anchors will be field-tested, but not psychometrically tested, this academic year.
The APT-Q
The APT Program Questionnaire (APT-Q) helps programs reflect upon the aspects of quality that are not necessarily observed, such as program planning, frequency of offerings, and connections with parents and schools. As is the case with the APT-O, flexibility is built into the questionnaire component of the tool. Users are encouraged to assign ratings only to the extent it is useful to do so and to review the various sections of the questionnaire before use to select those that best match a program’s priorities. The APT-Q, which is divided into eight sections (see box), provides opportunities to rate the consistency and/or frequency of certain practices. It also provides lists of specific program practices that support various quality features (e.g., ways to create a welcoming and inclusive environment), encouraging users to check those that are in use in the program but at the same time offering a broad range of concrete, positive practices that can encourage program development and innovation.

Sample APT-O items: Arrival Time (each rated 1 2 3 4 N/A, with space for notes)
1. There is an established arrival routine that seems familiar to staff and youth.
2. Activities are available to youth to become engaged in as they arrive (may include snack). (e.g., Wide variety of activities are available to arriving youth.) 1 = There are no activities available for arriving youth. Youth have nothing to do (e.g., stand around waiting for staff to begin programming).
3. Staff acknowledge children/youth when they arrive. (e.g., Offer a greeting, slap hands, ask “How’s it going?”, exchange hellos, etc.) 1 = Staff do not acknowledge any arriving youth.
4. Staff engage in 1:1 conversations with youth. (e.g., Talk about youth’s day, ask about something they brought or made.) 1 = Staff are not seen conversing or interacting with individual youth.
The APT-Q includes three different rating scales. A four-point “how consistently” scale (rarely/never; once in a while/sometimes; often/a lot of the time; almost always/always) is used with the section focused on program planning and the use of specific planning practices. A six-point “how frequently” scale (rarely/never; a few times per year; about once per month; about once per week; more than once per week; usually every day) is used for two sections that look at program offerings and to what extent the program promotes responsibility, autonomy and leadership. A simpler four-point “how frequently” scale (about once per year; several times per year; about once per month; weekly or more often) is used for the sections that address how the program connects with families and schools.
Technical Properties
The psychometric information that is available on the APT comes from the version used in the Massachusetts Afterschool Research Study (MARS), conducted by the Intercultural Center for Research in Education and the National Institute on Out-of-School Time (2005).6 The extent to which trained raters agree when observing the same program at the same time, or interrater reliability, was moderate, and preliminary evidence for concurrent and predictive validity suggests the APT-R yields accurate information about the concepts it measures. As mentioned in the previous section, the anchors for the 2 and 3 ratings will be developed with the intent of improving interrater reliability. If funding for further testing comes through, NIOST will be re-testing interrater reliability.

While the current self-assessment version of the tool has no psychometric data, NIOST is currently seeking funding to conduct further psychometric testing of the APT and SAYO tools, including the extent to which the two tools can work together as an integrated measurement system and allow practitioners to target key practices and track expected outcomes.

Interrater Reliability
Researchers examined interrater reliability for 78 programs in the MARS study and found that paired raters agreed on their ratings (within one score point) 85 percent of the time. If the range of response options for the research and self-assessment versions is similar, then we can expect, simply by chance, agreement between raters to be at least 62.5 percent, yielding a maximum kappa score of 0.60 (a high kappa is generally considered 0.70). Although interrater reliability has not yet been established for the self-assessment version of the APT, existing data suggest that agreement was moderately better than chance.

APT Program Questionnaire Sections
1. How you plan and design program offerings
2. Your program offerings
3. How your program promotes responsibility, autonomy and leadership
4. How your program creates a welcoming and inclusive environment
5. How your program supports youth as individuals
6. How your program connects with families
7. How your program partners with schools to support youth
8. How your program supports and utilizes staff to promote quality

6 The developers have conducted a detailed comparison of the two versions. Roughly half of the APT-R items in the current self-assessment version appear exactly as they were worded in the research version. Roughly one quarter of the original items were taken out, and roughly one quarter were revised slightly.
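The chance-agreement and kappa figures above follow from simple arithmetic. A minimal sketch, assuming (as the text implies) a 4-point scale, agreement defined as ratings within one point, and uniformly distributed independent ratings for the chance estimate:

```python
def chance_within_one_point(k=4):
    # Probability that two independent, uniform ratings on a k-point scale
    # land within one point of each other: 10 of the 16 pairs when k = 4.
    pairs = [(i, j) for i in range(1, k + 1) for j in range(1, k + 1)]
    return sum(1 for i, j in pairs if abs(i - j) <= 1) / len(pairs)

def cohens_kappa(observed, chance):
    # kappa = (p_o - p_e) / (1 - p_e): agreement beyond chance,
    # scaled by the maximum possible agreement beyond chance.
    return (observed - chance) / (1 - chance)

p_e = chance_within_one_point()   # 0.625, the 62.5 percent cited above
kappa = cohens_kappa(0.85, p_e)   # (0.85 - 0.625) / 0.375 = 0.60
```

This reproduces the 0.60 kappa reported in the text from the 85 percent observed agreement and the 62.5 percent chance-agreement floor.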
Face Validity
The developers received extensive and systematic feedback on the APT, about usability as well as perceived validity. Twenty-six grantees representing over 100 program sites responded to a set of questions about the tool, most staff participated in focus groups, and in-depth interviews were conducted with 12 grantees. Positive feedback from this range of key stakeholders suggested that the items make intuitive sense for measuring program quality. However, this is the weakest form of validity as it is not based on empirical evidence.
Concurrent Validity
MARS authors compared findings from the APT-R to observations of program characteristics and found that certain items were related to program and staff characteristics in expected ways. For example, programs that scored high on items related to staff engagement and engaging activities tended to have smaller group sizes, stronger connections with schools, a higher staff-to-child ratio and a higher percentage of staff with college degrees. Programs that scored high on items relating to youth engagement tended to be well-paced and organized with clear routines, and had a higher staff-to-child ratio and a higher percentage of staff with college degrees. Better family relations were related to stronger connections between programs and parents and the community. Programs that offered high quality homework time tended to offer more project-based learning activities, were more organized with clear routines, had a higher staff-to-child ratio and had more staff that were certified teachers. NIOST’s proposed study includes concurrent validity testing of the APT using the YPQA.
This evidence of concurrent validity should be regarded as preliminary because many items were not related to program characteristics. For example, youth engagement in programs was unrelated to smaller group sizes, and engaging and challenging activities were not related to programs being well-paced and organized with clear routines. Because the authors did not explicitly identify which relationships were most important and which findings ran contrary to their expectations, we cannot be certain that the observed findings indicate unequivocally strong concurrent validity.
MARS authors also examined the association between APT-R scales and five student characteristics that would be expected from theory and prior research: students’ improvement in their (1) behavior, (2) initiative, (3) homework, (4) relations with peers, and (5) relations with adults. Authors found that the APT-R Youth Engagement scale was related to all student characteristics in expected ways except Relations with Adults. Higher scores on APT-R Challenging Activities were related to lower scores on Relations with Peers, and higher scores on APT-R Relations with Families were marginally related to higher scores on Relations with Adults, but both APT-R scales were unrelated to other student characteristics. The APT-R Homework Time scale was unrelated to all student characteristics. MARS authors point out that the programs in their sample did not score high on Challenging Activities or Homework Time, indicating that the results could be different in a sample with programs that are more diverse on these dimensions. In addition, correlational evidence did not show any significant relationships between student characteristics and the APT-R scales concerning Appropriate Space and Staff Engagement. Given these findings, and keeping in mind that some, but not all, items used in the APT-R appear in the self-assessment version of the tool, this evidence should be regarded as preliminary, with additional evidence needed to firmly establish the concurrent validity of the APT.
Validity of Scale Structure
The authors created several scales in the APT-R using a statistical technique known as factor analysis. However, the extent to which these findings can be generalized to the self-assessment version is unclear given the differences between the two instruments. Testing of the validity of the scale structure is planned in NIOST’s upcoming study.
User Considerations
Ease of Use
To date, the APT has been used by 36 school districts offering 21st CCLC programs in over 150 sites, and in a four-city pilot (Charlotte; Boston; Middlesex County, New Jersey; and Atlanta). In response to its broad use, NIOST has developed many products related to the APT that make it both user-friendly and adaptable to a variety of systems and settings.
One such product is a package of tools, including the APT, the SAYO and a data management tracking system which tracks individual youth’s program participation (an example of such a system is Kidtrax). When combined, these tools are referred to as the “Afterschool Program Assessment System (APAS).” The system can be used to support on-line data collection, management, analysis and reporting, allowing programs to link quality data from the APT and youth outcomes data from the SAYO with information on daily attendance and program participation in a comprehensive, flexible and integrated fashion.
NIOST has developed other products to support the use of the APT in conjunction with its other tools as part of a full assessment system (although the APT may be used as a stand-alone tool). For example, in addition to the SAYO, NIOST has created a series of SAYO-APT Links or “cheat sheets” to support programs’ efforts to link quality and outcomes. The SAYO-APT Links describe each SAYO youth outcome area and correspond it with the related program practice found in the APT. Another product is a detailed, practitioner-friendly training notebook that guides programs in the use of the APT and related tools. Interested programs should contact NIOST for training options and costs; trainings are customized to meet the needs and interests of organizations.
In another development, NIOST is finalizing an online Youth Survey (SAYO-Y) designed to complement the APT and SAYO. The survey measures youth’s program experiences in five areas (e.g., inclusive environment, choice & autonomy); youth’s sense of competence in six areas (e.g., math, reading); and youth’s future planning and expectations. This survey has been tested with over 6,000 Massachusetts youth participating in 21st CCLC programs and will be fully piloted again this year.
Flexibility is a hallmark of the APT, so although the developers provide some guidance as to when to conduct observations, for how long, etc., they emphasize that the APT-O can be used in many different ways and that decisions about how many observers, how many observations and whether to use numerical ratings should be driven by what users intend to do with the data in the end. The general intention behind the design is for an observer to observe one full program session (typically a full afternoon), taking notes during the observation and using the time immediately afterwards to complete all relevant sections, including an open-ended “impressions” section at the end of the tool. Many programs using the APT as a self-assessment, however, have preferred to obtain at least two days of observation data. In its training sessions, NIOST guides participants in understanding how they may want to use the tools and, therefore, how they will tailor them to support their evaluation and assessment objectives.
Available Supports
Training on both the APT itself and processes for guiding program improvement is available through NIOST. Most recently, a two-day APAS training module has been developed to prepare site directors and others to use the SAYO and APT. Training for the APT alone is one full day. NIOST also provides a Quality Advisor training to help coaches and other technical specialists use the APT to work with programs. An online tutorial is also available to prepare sites to use the SAYO outcome tool.
In mid-2007, packaging and pricing information about training on the instrument became available for organizations that are interested but not already affiliated with the APT through statewide efforts in Massachusetts.
In The Field
The City of Cambridge, MA Agenda for Children is a city-wide out-of-school time initiative bringing together city departments, community-based organizations, businesses, funders and residents to positively impact the lives of Cambridge youth. Specifically, the initiative works to improve access to and the quality of Cambridge out-of-school time programs. In pursuit of that goal, Cambridge Agenda for Children has been refining a multi-site program improvement process over three years, using the APT program improvement tool in that effort. Their Self-Assessment Support initiative supports programs to engage in observation and self-reflection, and take targeted action based on what they see.
The APT gives program coordinators, site coordinators and front-line staff a common language for talking about their goals and detailed descriptions of effective practice.
For More Information
Information about the APT is available online at: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta
Contact: Kathy Schleyer, Training Operations Manager, National Institute on Out-of-School Time, Wellesley Centers for Women, Wellesley College, 106 Central Street, Wellesley, [email protected]
Purpose and History
The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies on the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time.
Under the CORAL initiative, after-school programming was provided to elementary school aged children in five cities in California. In each of these cities, programming was different and consisted of a variety of activities ranging in focus from science-based programs to art and cultural enrichment programming. All CORAL programs included the common core element of Balanced Literacy programming.
The observation tool was used in two ways: first, to observe Balanced Literacy instruction in CORAL after-school programs, and second, to observe the integration of literacy programming in a variety of other activities including homework help and academic enrichment programs ranging in focus from science to art and cultural enrichment. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.
The CORAL observation tool has five components:
• an activity description form, used to gather descriptive information on the observed activity prior to the observation;
• an activity characteristic form, completed during the observation, used to collect general information about the activity and type of instruction (i.e., length of time, number of participants, number of staff, teaching methods, etc.);
• an activity checkbox form that is divided into the five overarching categories for observation – this is the primary method for recording information during the activity;
• the activity scales form, completed after the observation, that is used to rate each of the constructs observed in the activity;
• and the overall assessment component (completed after three observations for literacy programs and after two observations for non-literacy programs), which measures both aspects of the activity and participant improvement in skill areas.
Because the CORAL initiative emphasized best practices in youth development (including positive adult/youth relationships and ongoing youth participation), P/PV developed the CORAL tool based on an observation tool used for evaluation of the San Francisco Beacon initiative. P/PV also designed a similar observational tool containing the updated academic components from the CORAL tool for the Philadelphia Beacon initiative.
Content
The CORAL observation tool was designed to help researchers collect data through an ongoing program observation process to measure the connection between the quality of the program, fidelity to the Balanced Literacy Model, and the academic outcomes of participants. As a result, the main sections of the tool focus on five components of quality: adult-youth relations, effective instruction, peer cooperation, behavior management, and literacy instruction. Each aspect of quality has several elements of quality that are rated and captured in subcategories, which are called constructs. A number of these core constructs (such as behavior management, youth/adult relationships, peer cooperation, etc.) are relevant for both formal program settings as well as informal, adult-supervised settings. The characteristics rated within each of the constructs are listed below.
Because the focus of the CORAL observation tool is on fidelity to the Balanced Literacy Model – a structured literacy approach that uses a variety of modalities aimed at developing competent readers – it tends to focus on social processes and skill development in support of literacy gains, and less on program resources or the organization of those resources within a program. Literacy-focused activities are assessed for their fidelity to the model and their association with change in participant interest and motivation. Non-literacy-focused activities are assessed for their integration of literacy skills into the curriculum.
Structure and Methodology
The first three components of the CORAL observation tool – the activity description form, characteristics form and the activity checkbox form – are focused on describing the activity as well as participant and staff behavior. The activity description form is completed before (or after) the activity based on the information gleaned from the instructor in a 10-15 minute conversation. The activity characteristics form, where general information is recorded, is completed during the first ten minutes of the activity. The activity checkbox form is a running list of the observations of behavior and activity characteristics, and is filled out by the observer while the activity is ongoing. In the activity checkbox form, observers can choose between several examples of positive and negative behavior for each of the constructs above. If they observe a behavior not captured in the examples, it can be recorded in the notes section and considered in the scoring after the observation is completed. Observations are conducted over a 90-minute period (in CORAL, the literacy activity took place over a 90-minute period, so the observation took place over its entire duration in order to be able to assess fidelity).
Each construct is based on a five-point rating scale, with 1 representing the lowest score (definitely needs improvement) and 5 the highest (outstanding). The CORAL developers suggest that 5s be awarded sparingly, as the intent is to indicate that there is no room for improvement. Meanwhile, the 1 rating can be given for several reasons, including observed negative staff behaviors or due to an activity not fitting the appropriate construct. Activity ratings are assigned after each 90-minute observation, for cumulative observations of three for a literacy program and two for a non-literacy activity.
After the 90-minute observation, observers complete the activity scales form within 24 hours of each observation, including the descriptive narrative. The checkbox form does not translate into a one-to-one score on the scales form. Instead, observers are required to consider the activities recorded on the checkbox form along with the duration/frequency of the observed behavior, the quality of the behavior and the importance of the behavior to the activity when deriving a score for each construct. Additionally, observers complete descriptive narrative summaries (using sample narratives as a guide) which contain the most informative aspects of the activity.

The characteristics rated within each construct are:
Adult-Youth Relationships: adult support for the activity; general adult responsiveness; emotional quality of the relationship
Behavior Management: appropriateness of behavioral demands; adult management; staff’s inclusiveness of youth
Instruction: clarity; organization; motivation; challenge; connection to other material; connection between youth & material; cultural awareness; responsiveness to English language learners
Literacy: literacy-rich environment; read aloud; book talk/discussion/shout out; writing; independent reading; skill development activities/games; build vocabulary/spelling; connections between youth & text
Peer Cooperation: cooperative activity
Upon completion of the required series of observations, the overall assessment form is used to rate the overall quality of the activity. The form contains 11 narrative questions in which observers describe the strengths, weaknesses and areas of improvement for the activity, the cultural awareness of the instructor, modifications for varying linguistic needs, and the classroom environment. The observers are also asked to record the improvements they observed. The question contains a pre-selected list of skills which range from academic to visual arts and performance.
The developers suggest three or four hours for completing the rating scales, related narratives and the overall assessment.
For their research purposes, Public/Private Ventures additionally required observers to write a narrative description for each component and used the descriptions as part of their quantitative analysis and as a method of compiling “best practices” that exemplified specific constructs. In addition, observers were asked to identify in a narrative the aspects of each of these skills in which participants showed improvement.
Technical Properties
Technical evidence for the CORAL observation tool comes from its use during the evaluation of the CORAL Initiative, a two-year study that included 56 observations in 23 after-school programs in the first year and 43 observations in 21 after-school programs in the second year. The 90-minute literacy activities were observed two to four times each during the first year of the evaluation, and a minimum of three times each during the second. Evidence is drawn from the study’s initial report after the first year (Arbreton, Goldsmith & Sheldon, 2005) as well as the final report, which draws on evidence from both years (Arbreton et al., 2008). Data from the non-literacy activity observations were used to create qualitative profiles of the activities, and no statistical analysis was conducted. For the literacy activities in the CORAL evaluation, statistical analysis was conducted to identify the relationship between program quality and participant academic gains. The technical properties described below pertain to P/PV’s analysis of literacy observation data only.
Some users may be interested in summarizing data from the CORAL tool into scales. In their analysis from the CORAL initiative (Arbreton, Goldsmith & Sheldon, 2005; Arbreton et al., 2008), the developers created four scales using items from the instrument’s Activity Scales Form: (1) Adult-Youth Relation (average of items 1 through 3); (2) Instructional Quality (average of items 5 through 8); (3) Group Management (average of items 16 and 17); (4) Connection Between Youth and Activities (average of items 9 and 10).

Sample checkbox item: Q8. Challenge
Q8. Challenge (++). The staff:
• encourage youth to push beyond their present level of competency.
• try to sustain the motivation of youth who are discouraged or reluctant to try.
• continuously move to the next step as soon as youth progress.
• reinforce and encourage youth’s efforts in order to maintain their involvement.
Q8. Challenge (––). The staff:
• discourage youth who tried to push beyond their present level of competency.
• miss opportunities to sustain the motivation of youth who are discouraged or reluctant to try.
• miss opportunities to move to the next step as soon as youth progressed (e.g., pace was too slow).
• do not reinforce or encourage youth’s efforts in order to maintain their involvement.
N/A because:
Developers also created an Overall Lesson Rating, which is an average of scores from three scales (Adult-Youth Relation, Instructional Quality & Group Management) and several items: Read Aloud (item 19), Book Talk (item 20), Writing (item 21), Independent Reading (item 22) & Connection to Youth (item 10).
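The scale construction described here amounts to simple item averaging. A sketch with hypothetical 1-5 ratings (the item numbers follow the scale definitions above; equal weighting of the three scales and five items within the Overall Lesson Rating is our assumption, since the exact weighting is not specified here):

```python
# Hypothetical ratings keyed by Activity Scales Form item number (1-5 scale).
ratings = {1: 4, 2: 3, 3: 5, 5: 4, 6: 4, 7: 3, 8: 4, 9: 3, 10: 4,
           16: 5, 17: 4, 19: 3, 20: 4, 21: 2, 22: 3}

def scale_mean(item_numbers):
    # Each scale is the plain average of its constituent items.
    return sum(ratings[i] for i in item_numbers) / len(item_numbers)

adult_youth   = scale_mean([1, 2, 3])      # Adult-Youth Relation
instructional = scale_mean([5, 6, 7, 8])   # Instructional Quality
group_mgmt    = scale_mean([16, 17])       # Group Management
connection    = scale_mean([9, 10])        # Connection Between Youth and Activities

# Overall Lesson Rating: three scale scores plus items 19 (Read Aloud),
# 20 (Book Talk), 21 (Writing), 22 (Independent Reading), 10 (Connection to Youth).
components = [adult_youth, instructional, group_mgmt,
              ratings[19], ratings[20], ratings[21], ratings[22], ratings[10]]
overall_lesson_rating = sum(components) / len(components)
```

With these toy ratings the three scale scores are 4.0, 3.75 and 4.5, and the overall rating is their average together with the five listed items.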
Score Distributions
Items and scales should be able to detect meaningful differences across settings, and therefore should exhibit a range of scores. Score distributions were examined for the four domain-specific scales as well as six individual items representing the Balanced Literacy strategies measured in the CORAL tool. Most items exhibited good score distributions, although the item “Skill Development Activities” (item 24) was on the low end of the scale for both years of measurement (average scores were 1.5 and 1.8, respectively, on a scale of 1 to 5).
Internal Consistency
Responses to items comprising scales should be highly related, suggesting that the items form meaningful domains. The internal consistency of the Overall Lesson Rating was quite strong, with an alpha of .94. Of the four scales measuring specific domains from the CORAL Initiative, three (Adult Support, Instructional Quality, and Group Management) exhibited excellent internal consistency, with Cronbach’s alpha ranging from .84 to .88 (exceeding the recommended value of .70), suggesting that the scales are cohesive and composed of related items. However, the scale Connection to Youth and Activities was less cohesive (alpha = .54), suggesting that the two items composing this scale are only moderately related and more items may be needed to fully capture this domain.
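Cronbach’s alpha can be computed directly from item-level scores; it compares the sum of the per-item variances with the variance of respondents’ total scores. A self-contained sketch with toy data (not MARS or CORAL data):

```python
def cronbachs_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)        # number of items in the scale
    n = len(items[0])     # number of respondents

    def pvar(xs):
        # Population variance of a list of scores.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all items.
    totals = [sum(item[r] for item in items) for r in range(n)]
    return (k / (k - 1)) * (1 - sum(pvar(it) for it in items) / pvar(totals))

# Three hypothetical items rated 1-5 by five respondents; the items track
# each other closely, so alpha comes out high (about .90).
scores = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
alpha = cronbachs_alpha(scores)
```

When items move together, total-score variance dwarfs the summed item variances and alpha approaches 1; weakly related items (as with the two-item Connection to Youth and Activities scale) push alpha down.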
Validity of Scale Structure
An analysis that examines scale structure validity for a single scale tests whether the scale truly measures a distinct and coherent domain (rather than several different concepts or domains). Developers examined whether several items and scales from the CORAL tool could be combined to represent an Overall Lesson Rating. Findings indicated that the Overall Lesson Rating represents a cohesive summary of multiple domains from the CORAL measure. The developers did not examine scale structure validity for other scales measuring specific domains (such as Instructional Quality).
Predictive Validity
If the CORAL tool truly measures effective literacy strategies, we can expect that scores from the literacy strategy items will be related to gains in reading and English language skills. To examine the instrument's predictive validity, authors examined the relationship between scores on the CORAL tool and outcomes of 234-383 children across the two years of the study.
As discussed in the CORAL initiative reports (Arbreton, Goldsmith, & Sheldon, 2005; Arbreton et al., 2008), authors classified programs into five literacy profiles based on how the programs scored on the six items measuring Balanced Literacy strategies. Readers who are interested in more information on how the literacy profiles were created should consult the CORAL initiative reports cited in this document. At the end of the first year, children who attended programs with better literacy profiles had greater reading improvements on the Informal Reading Inventory (IRI) but were not more likely to have positive outcomes on the California Standards Test-English-Language Arts (CST-ELA). One reason literacy profiles may have predicted better scores on the IRI but not the CST-ELA is that the IRI focuses largely on reading abilities and comprehension, whereas the CST-ELA is somewhat broader and also incorporates writing skills and word analysis. Classroom practices (as measured by the four scales) did not by themselves predict reading improvements on either the IRI or the CST-ELA over the first year of the evaluation.
At the end of the second year, the authors created a scale called the Overall Lesson Rating that is a combination of items measuring literacy strategies as well as classroom practices. The Overall Lesson Rating predicted more positive outcomes on the CST-ELA (scores on the IRI were not examined in the second year). The findings from both years suggest that the CORAL tool successfully measures quality in programs with a strong literacy component.

© January 2009 The Forum for Youth Investment
Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition
User Considerations
Ease of Use
Although the CORAL tool can be used by anyone, it was designed exclusively for research purposes and has not been specifically adapted for practitioner use at this time. However, the tool does contain detailed instructions for conducting observations and completing the forms.
Available Supports
At this time, Public/Private Ventures is not offering training on use of the CORAL observation tool, though a two-day training was offered to the observers participating in the evaluation study. This training included a review of the observation materials, mock observations and write-ups on day one, and a field observation in which a new observer was paired with an experienced guide on day two. The trainee and guide compared notes to develop consistency. Trainees received ongoing monitoring and support.
In the Field
In the 2004-2005 school year, the CORAL observation tool was used as part of an evaluation of the programs' implementation of the Balanced Literacy Model. The evaluation was conducted by Public/Private Ventures. Live observations of the participants' experiences, the impact of balanced literacy on the academic gains of participants, and fidelity to the Balanced Literacy Model were at the center of the research. Observations were conducted during the program participants' third-grade and fourth-grade years, and each participant was observed three times by an observer assigned from a pool of observers.
In analyzing the data, researchers found that although the link between quality and academic gains was inconclusive, participants with the greatest academic gains were those who participated in higher-quality programs. Additionally, all youth, including those reading below grade level, had greater gains when in the higher-quality programs than did their counterparts in lower-quality programs. In the second year of observations, the researchers observed consistency in the implementation of the literacy model as well as higher-quality implementation. In the same time period, researchers also observed reading gains that were 39 percent higher than the gains recorded in the first year.
For More Information: Information about the CORAL observation tool is available at: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29
Contact: Amy Arbreton, Senior Research Fellow, Public/Private Ventures, Lake Merritt Plaza, 1999 Harrison Street, Suite 1550, Oakland, CA 94612, 510.273.4600
Developed by Policy Studies Associates, Inc.

Purpose and History
Policy Studies Associates (PSA) developed the Out-of-School Time Program Observation Tool (OST Observation Tool) over a five-year period, in conjunction with several research projects related to after-school programming, including a major study of promising practices in high-performing programs (Birmingham, Pechman, Russell & Mielke, 2005). A third edition of the tool, revised in 2008, was reviewed for this compendium. The instrument was recently used in studies of the New York City Department of Youth and Community Development's Out-of-School Time (OST) Programs for Youth and of the New Jersey After 3 Initiative.

The tool was developed with research goals in mind – in particular the desire to collect consistent and objective data about the quality of after-school activities through observation. Its design is based on two assumptions about high-quality programs: first, that certain structural and institutional features support the implementation of high-quality programs; and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth development outcomes.

The OST Observation Tool can be used in varied after-school contexts, including school- or center-based programs and with youth participants in kindergarten through 12th grade. While the tool can provide program staff with a framework for observing and reflecting on their practice, it was developed for, and thus far has primarily been used for, research purposes. In its current design, it is not intended to be used to assign overall quality scores for programs or staff.

Content
The OST Observation Tool was designed to provide researchers and other users with a framework for observing essential indicators of positive youth development. It focuses on three major components of programs: activity type, activity structures and interactions between youth and adults and among youth. The first section captures a range of in-depth information about the type of activity being observed and the skills emphasized through that activity; the remainder focuses on what the youth development literature points to as critical components of programs.

Because of its developmental grounding and its focus on what young people experience inside of programs, the OST Observation Tool has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including relational issues and many items that speak specifically to instruction and learning. Beyond one item related to materials, the instrument does not focus on program resources or the organization of those resources within the setting.

The content of the OST Observation Instrument aligns very closely with the SAFE framework (Durlak and Weissberg, 2007), which outlines features of programs that contribute to positive outcomes for youth in out-of-school time programs. SAFE refers to out-of-school time activities that are:
• Sequenced: Content and instruction are designed to increasingly advance skills and knowledge and help youth achieve goals;
• Active: Activities lend themselves to active engagement in learning;
• Focused: Activities strengthen relationships among youth and between staff and youth;
• Explicit: Activities explicitly target specific learning or developmental goals.

The 2008 version of the tool was updated to fully align with the SAFE framework. The changes are most obvious in the reorganization of the qualitative portion of the instrument in which observational notes are recorded and synthesized. This framework was also used in the evaluators' analyses of their Year 2 findings for the New York and New Jersey studies, which demonstrated that the OST indicators map well to the SAFE framework.
Additional changes between the second and third editions involve inclusion of the academic and technology features of programs. This new section features items related to literacy, math instruction and the use of technology, guiding users to note the presence or absence of activities that meet literacy, math or technology goals.
Structure and Methodology
The first part of the instrument, which focuses on activity type, provides observers with detailed definitions for documenting:
• Type of activity (e.g., tutoring, visual arts, music, sports, community service)
• Type of space (e.g., classroom, gym, library, auditorium, hallway, playground)
• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal)
• Number and education level of staff involved in the activity
• Environmental context (e.g., supervision, space, materials)
• Number, gender and grade level of participants

In addition, the instrument provides detailed descriptions of each of the four SAFE features for rating purposes. The above observations are recorded on a cover sheet that also includes other basic information about the observer, program, date, time, etc.
The remainder of the tool addresses five key youth development "domains": relationships (youth- and staff-directed are considered separately), youth participation, skill building and mastery, and activity content and structure. Each domain is subdivided into four to seven specific indicators or practices. For each indicator, a detailed "exemplar" is offered to guide ratings. For example:
• Domain: Relationship-building: Youth.
• Indicator: All or most youth are friendly and relaxed with one another.
• Exemplar: Youth socialize informally. They are relaxed in their interactions with each other. They appear to enjoy one another's company.
The rating scale in the OST Observation Instrument asks users to assess the extent to which each indicator is or is not present during an observation. While the developers have experimented with both three- and five-point scales in various studies, the third edition of the instrument uses a seven-point rating scale, which gives more room for capturing subtleties, where 1 = not evident and 7 = highly evident and consistent (see below). A "5" rating is considered basic quality. Observers are instructed to first select the odd number that most closely reflects the level of evidence observed and then, if necessary, to move up or down to the adjacent even number if that more accurately reflects the presence of the indicator within the activity.

1 = Exemplar is not evident
3 = Exemplar is rarely evident
5 = Exemplar is moderately evident or implicit
7 = Exemplar is highly evident and consistent

Developers of the OST Observation Tool have structured it flexibly so that users can arrange scales differently for different purposes. Although definitive rules for constructing scales do not exist, in their report on the validation study (Pechman et al., 2008), the authors present four different methods for creating scales. The four sets of scales were used in different studies across several years and were each guided by separate theories and evaluation questions. The authors present the scale sets as different options for users to summarize data from the OST. Users who are interested in summarizing data should refer to the report of the validation study for information on which items compose specific scales.
The first scale set includes four scales: (1) Youth relationship-building and participation, (2) Staff-youth relationships, (3) Skill building and mastery and (4) Activity content and structure. These scales were used in a study of Shared Features of High-Performing After-School Programs conducted on behalf of The After-School Corporation and the Southwest Educational Development Laboratory (Birmingham et al., 2005).

The second scale set is similar and also includes four scales: (1) Youth relationship-building, (2) Staff relationship-building, (3) Instructional methods and (4) Activity content and structure. These scales were used in the first year of the evaluation of the New Jersey After 3 initiative (Kim et al., 2006).

The third scale set includes three scales: (1) Relationships, (2) Instructional strategies and (3) Activity content and structure. These scales were used in the first year of the evaluation of the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Russell et al., 2006).

The fourth scale set is based on the SAFE framework, which emphasizes activities that are Sequenced, Active, Focused and Explicit, as described by Durlak and Weissberg (2007). The four scales in this set correspond to these four SAFE domains. These scales were used in the second-year evaluations of both the New Jersey After 3 initiative and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Walking Eagle et al., 2008; Russell et al., 2008).
Technical Properties
Psychometric information presented here for the OST Observation Instrument comes from the three studies previously mentioned (Shared Features of High-Performing After-School Programs, New Jersey After 3, and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative).
Score Distributions
Pechman and colleagues (2008) examined the score distributions of all scales in each of the four scale sets. The authors generally found good variability in the scores across raters' observations of 159 to 238 activities in 10 to 15 programs observed across the three studies. One exception was the Active scale from the SAFE scale set. The average score for this scale was somewhat low (1.9 on a scale of 1 to 7), making it difficult to determine whether the scale has difficulty capturing differences across programs or most programs simply do not keep youth very active. However, findings suggest that all other scales capture meaningful differences across a variety of activities and programs.
Interrater Reliability
Observers using this instrument reached high levels of agreement. Examining data from five assessments across three separate studies, pairs of researchers co-observed between 19 and 40 activities within 10 to 15 programs at each assessment. Using Pearson and intraclass correlations, researchers examined the interrater reliability for the overall score as well as the average score within each of the instrument's five domains. When available, intraclass correlations were generally above the recommended value of .50, indicating strong agreement. Strong agreement was also supported by the Pearson correlations, which were close to or above the recommended value of .70.⁷ Therefore, these findings suggest that trained raters can achieve high overall agreement across each of the five domains.⁸
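As a rough illustration of the interrater statistics discussed here (a hypothetical sketch, not PSA's analysis code), a Pearson correlation between two raters' paired scores for the same activities can be computed directly:

```python
def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same activities."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)
```

Note that Pearson's r is insensitive to a constant offset between raters (one rater scoring uniformly one point higher still yields r = 1), which is one reason intraclass correlations, which in some forms penalize such offsets, are often reported alongside it.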
Internal Consistency
Internal consistency was strong for all scales across the four sets, with Cronbach's alphas exceeding the recommended value of .70. These findings suggest that each of the scales' items are highly related and form meaningful domains. The first scale set summarized TASC study data collected from 173 independent observations in 10 programs and had alpha levels ranging from .73 to .88. The second scale set summarized New Jersey After 3 data collected from 179 independent observations in 10 programs and had alpha levels ranging from .81 to .83. The third scale set summarized New York City OST study data collected from 238 independent observations from 15 programs and had alphas ranging from .80 to .87. The fourth scale set summarized both New York City OST study data and New Jersey After 3 data collected from a combined total of 358 observations and had alpha levels ranging from .84 to .88.

⁷ The one exception was for the Activity Content and Structure domain, which was lower for one out of the five assessments, but agreement on the other four assessments was strong.
⁸ The interrater reliabilities for specific items and scales were not reported and could be lower.
Concurrent Validity
OST developers examined the concurrent validity of the third scale set drawing on 1,444 youth surveys from the DYCD OST initiative in New York City and the New Jersey After 3 initiative. Specifically, using Spearman's rho rank-order correlation coefficients, researchers examined the associations between the OST Relationships, Instructional Strategies, and Activity Content and Structure scales and responses from a separate youth survey on Interactions with Staff, Interactions with Peers, Sense of Belonging, Exposure to New Experiences, and Academic Benefits.

Higher scores on the Relationships scale were related to higher scores for Exposure to New Experiences, Interactions with Peers, and Interactions with Staff for both years of the study, as well as higher scores for Sense of Belonging in the first year and Academic Benefits in the second year. Higher scores on the Instructional Strategies scale were related to higher scores on Exposure to New Experiences, Sense of Belonging, and Interactions with Staff in the first year of the study but not the second. Instructional Strategies was not related to Interactions with Peers or Academic Benefits, and the Activity Content and Structure scale was not related to any youth experiences from the youth survey in either year.

Based on these findings, the available concurrent validity evidence is mixed. In addition, concurrent validity evidence does not currently exist for the other scale sets.
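Spearman's rho, the statistic behind these concurrent validity analyses, is simply a Pearson correlation computed on ranks, which makes it well suited to ordinal ratings like the seven-point OST scale. The sketch below is hypothetical illustration code (not the developers'), with average ranks assigned to ties:

```python
def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    def ranks(values: list[float]) -> list[float]:
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0.0] * len(values)
        i = 0
        while i < len(order):  # walk groups of tied values, assigning average ranks
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            for pos in range(i, j + 1):
                result[order[pos]] = (i + j) / 2 + 1
            i = j + 1
        return result

    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it depends only on rank order, rho is unaffected by any monotone rescaling of either variable.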
Validity of Scale Structure
An analysis examining scale structure validity tests whether items forming multiple scales truly measure distinct and coherent domains as expected. The instrument's developers conducted a factor analysis to examine the structural validity of the fourth scale set. Findings suggested that the items could be categorized into the four SAFE domains, although the Sequenced and Explicit domains were moderately related, suggesting that they are not entirely distinct from one another. In addition, the Active domain appeared to be a combination of two distinct categories, suggesting that this domain is not completely cohesive.

Although the instrument has some evidence of scale structure validity for the SAFE domains, evidence for the other three scale sets is currently unavailable.
User Considerations
Ease of Use
While the OST Observation Instrument is available online and is free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. The introduction to the tool includes an overview and review of basic procedures for conducting observations and completing the form, but the materials have not been tailored for practitioners at this time and use language (e.g., sampling, reliability) that may not be accessible to some audiences.

Its developers consider the OST Observation Tool to be highly efficient to complete in the field. Users observe 15 minutes of an activity and score it immediately in less than five minutes. Users are advised to observe a total of 8-10 activities over at least two afternoons (or approximately three hours of program observation) to adequately sample program offerings. Additional guidance about how to organize observations on site, sample activities appropriately and manage multiple observers is provided in the instrument's procedures section.
Available Supports
At this time, training related to the OST Observation Instrument is limited to individuals involved in a specific study that employs the instrument. Data collectors participate in trainings that provide a detailed overview of the instrument, its indicators and the theoretical framework. Following a review of the operational definitions for each category and group of indicators, researchers participate in practice rating sessions using videotaped samples of after-school activities to build interrater reliability prior to fieldwork. Additional reliability checks are conducted in the field and in follow-up meetings to ensure common interpretation of terms and items.
Researchers typically use the observation data collected with the OST instrument in conjunction with supplementary (but not formally linked) measures such as interviews, surveys and focus groups. As research continues, validity data will become available about the relationship between program quality features and youth outcomes as measured by some of these other instruments.
In the Field
In 2005, the New York City Department of Youth and Community Development (DYCD) contracted with PSA to conduct a comprehensive evaluation of its 536 OST programs serving 69,000 participants under its Out-of-School Time Programs for Youth initiative. Participating service providers, serving all grade levels, operated under one of three funding mechanisms: a) Option I, targeted toward a general pool of service providers operating programs in neighborhoods throughout New York City; b) Option II, for programs using a 30% private funding match; and c) Option III, for programs operated in collaboration with the Department of Parks and Recreation and offered at Parks sites.
The OST Observation Tool was used in a sample of 15 of the 536 OST programs as part of the overall evaluation. The evaluation combined this sampling data with other data sources, including a participation database and program director surveys, to round out the picture. The first-year evaluation findings identified avenues for improving the effectiveness of OST programming in several areas. For example, although programs successfully enrolled students in the first year, they struggled to maintain high youth participation rates, suggesting a need to establish program policies and activity offerings that encouraged regular participation. Additionally, while programs in the first year consistently provided safe and structured environments for participants, they experienced challenges in delivering innovative, content-based learning opportunities that engaged youth.
PSA's second-year findings centered on evidence of programs' efforts to improve program quality and scale. The findings suggest that OST programs increased both their enrollment and participation rates. Programs scaled up enrollment from 51,000 youth in the previous year to serve more than 69,000 youth throughout New York City. Rates of individual youth participation also increased substantially compared to Year 1, indicating that programs were successfully recruiting and retaining participants. In addition, programs reported that they improved the quality and capacity of their program staff through improved hiring and professional development opportunities.

In year three, the evaluation will continue to collect data from OST programs to explore the associations among program-quality features, youth participation patterns, and youth outcomes.
For More Information
The OST Observation Instrument is available online at: www.policystudies.com/studies/youth/OST%20Instrument.html
Contact: Christina Russell or Ellen Pechman, Policy Studies Associates, 1718 Connecticut Avenue, NW, Washington, [email protected]
Purpose and History
The Program Observation Tool is the centerpiece of the National AfterSchool Association's program improvement and accreditation process and is designed specifically to help programs assess progress against their Standards for Quality School-Age Care. The instrument was developed by the National AfterSchool Association (NAA) and the National Institute on Out-of-School Time in 1991 and was based on the Assessing School-Age Child Care Quality Program Observation Instrument developed by Susan O'Connor, Thelma Harms, Debby Cryer and Kathryn Wheeler. The instrument was revised in 1995 and piloted between 1995 and 1997. Additional revisions were then made before NAA's accreditation system became active in 1998.
The NAA Standards, which the Program Observation Tool is based on, are meant to provide a baseline of quality for after-school programs serving children and youth between ages 5 and 14. They are intended for use in group settings – primarily school- and center-based – where children participate regularly and where the goal is supporting and enhancing overall development.
Rooted in the frameworks of the early childhood and school-age care fields of the early 1990s, the instrument and the NAA standards reflect much of the thinking of the time, particularly in terms of licensing and monitoring, and have been used as part of a seven-step accreditation process for the past decade. Pre-dating the creation of the federal 21st Century Community Learning Centers program, the NAA standards have played a significant role in the field and have been adopted by a range of programs and systems across the country. There are now 20,000 copies of the standards book in print and over 500 programs across the country are in some stage of the accreditation process.
In 2008, NAA shifted away from its role as an accrediting body and now offers accreditation through the Council on Accreditation. NAA will complete the accreditation process through the end of 2009 for all agencies that applied and were at some stage of the process before September 2008. For agencies applying after September 2008, a program relationship manager from the Council will assist them through the process.

The Program Observation Tool will still be available for agencies interested in using it for self-assessment and improvement purposes (as has always been the case for agencies not seeking accreditation). NAA will retain the rights to the standards and materials, and continue to provide supports for technical assistance.
Content
The Program Observation Tool measures 36 "keys of quality" that are organized into six categories. Five of those categories are considered observable and are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth category – administration – is assessed through a questionnaire.
Because of NAA's commitment to supporting child development in a holistic way, the instrument measures a range of social processes – how children and staff within the setting interact. Because of the link to accreditation, it also focuses quite a bit on program resources and the arrangement (spatial, social and temporal) of those resources within the program. Unlike some of the other tools in this compendium, the Program Observation Tool also addresses program policies and procedures that are believed to influence quality.

The Program Observation Tool pre-dates the National Research Council's features of positive developmental settings framework (2002) by over a decade and draws more heavily on the early childhood literature than the youth development literature. However, it does address many of the NRC features, placing the least emphasis on "support for efficacy and mattering" and "skill building opportunities."
Structure and Methodology
The five quality categories that are the focus of the Program Observation Tool are measured using one overall instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items (included in the Standards, but not the Observation Tool) are assessed separately, through questionnaire/interview.
The rating scale used throughout the Program Observation Tool (see example below) is intended to capture whether each indicator is true all of the time (3), most of the time (2), sometimes (1) or not at all (0). Although specific descriptions of what a 0, 1, 2 or 3 looks like for each indicator are not provided, between one and eight descriptive bullet statements are included under each indicator to clarify meaning.

Space is provided for observers to take notes on each indicator. At the bottom of each page, observers are encouraged to total their numerical scores for each quality key to achieve an overall program rating. Tally sheets and instructions are provided for multiple observers to reconcile and combine their scores. In order to achieve accreditation, programs must meet a certain threshold in two "weighted" categories: program/activities and safety/nutrition.
6. Children and youth generally interact with one another in positive ways.
Guiding Questions: Do children seem to enjoy spending time together? Do they talk about friends at the program? Do they tend to include others from different backgrounds or with different abilities in their play?

a. Children appear relaxed and involved with each other. (Rating: 0 1 2 3)
• Group sounds are pleasant most of the time.

b. Children show respect for each other. (Rating: 0 1 2 3)
• Teasing, belittling or picking on particular children is uncommon.
• Children show sympathy for each other and help each other.

c. Children usually cooperate and work well together. (Rating: 0 1 2 3)
• Children willingly share materials and space.
• They suggest activities, negotiate roles and jointly work out rules.
• Children include others with developmental, physical or language differences in their play.
• Children often help each other.
• There is a strong sense of community.

d. When problems occur, children often try to discuss their differences and work out a solution. (Rating: 0 1 2 3)
• Children listen to each other's point of view and try to compromise (e.g., if two children want to use the same equipment, they may decide to take turns as a solution).
• Children know how to solve problems.
• Their solutions are generally reasonable and fair.
• They do not try to solve disagreements by bullying or acting aggressively.
Technical Properties
Although no psychometric evidence is available on the Program Observation Tool itself, there is information available about the ASQ (Assessing School-Age Child Care Quality), from which the POT was derived. Users should note that the ASQ's psychometric properties may not be completely consistent with those of the POT.⁹ Overall, evidence for interrater and test-retest reliability is strong for the ASQ, meaning the assessments of the same program practices by different observers are consistent and assessments are stable over time. Following revisions to the scales, evidence of internal consistency, or the degree to which items fit together in meaningful ways, was strong. Validity data are limited, although preliminary evidence for concurrent validity suggests the instrument may yield accurate information about the concepts it measures.¹⁰
The field study which provides psychometric support for the ASQ involved a sample of 40 after-school programs in Massachusetts and North Carolina (Knowlton & Cryer, 1994). Two versions of ASQ scales were examined: original and revised. The revised version's scales are comparable to those in the POT: Human Relationships, Indoor Environment, Outdoor Environment, Activities, and Safety, Health and Nutrition. Of the original scales, only two overlapped with the POT, namely human relationships and activities. When appropriate, we state which set of scales exhibits specific properties.
Interrater Reliability
To examine interrater reliability, paired raters evaluated 40 programs using the measure. ASQ indicators are organized into 21 items and those items are further organized into five scales. Knowlton and Cryer examined agreement among raters at both the item and scale levels. The kappa statistic measures the degree to which raters agree and corrects for cases where raters agree simply by chance. All items had kappa scores above .70, generally considered the threshold for high agreement. The authors also computed intraclass correlations, and all of the ASQ original scales and the total score were near or above .70, showing good agreement on these scores.¹¹ However, because only the original ASQ scales, not the revised versions, were examined, we can generalize only for those scales that are similar (Human Relationships, Activities – and the total score).
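The kappa statistic described above corrects raw percent agreement for the agreement two raters would reach by chance alone. A minimal sketch of Cohen's kappa for two raters (hypothetical illustration code, not Knowlton and Cryer's analysis):

```python
from collections import Counter

def cohen_kappa(rater1: list[int], rater2: list[int]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # chance agreement: probability both raters pick the same category independently
    expected = sum(counts1[c] * counts2[c] for c in set(counts1) | set(counts2)) / n ** 2
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why a .70 threshold is a demanding standard.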
Test-Retest Reliability
Ideally, instruments should be able to assess major changes over time but should exhibit stability in scores across multiple assessments in the short term. For the ASQ, 25 programs were reassessed two weeks after their initial assessment to determine the instrument's test-retest reliability. Knowlton and Cryer (1994) found that all items demonstrated acceptable stability, with kappa scores above .70.¹² The authors also computed intraclass correlation coefficients to examine stability of the original scales and total score over time. All scales and the total score were above .70, but we can only generalize to the scales that overlap with the POT – Human Relationships and Activities – and the total score.
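Intraclass correlations like those used here compare variance between programs to variance within a program's repeated assessments. A one-way random-effects version can be sketched as follows (hypothetical illustration code; published ICCs may use other model forms):

```python
def icc_oneway(ratings: list[list[float]]) -> float:
    """One-way random-effects ICC for rows of repeated ratings
    (one row per program, one column per assessment occasion)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # mean squares between programs and within a program's repeated ratings
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values near or above .70, as reported for the ASQ scales, indicate that differences between programs dominate short-term fluctuation within programs.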
Internal Consistency
To determine whether items within the scales fit together in meaningful ways, Knowlton and Cryer examined the internal consistency of the original scales and the total score by computing a statistic called Cronbach's alpha. The alpha for one of the original scales (Safety) was very low, so the authors revised the scales (and the revisions more closely match the POT). Results from the revised scales and the total score demonstrated good internal consistency, with alphas near or above the recommended cutoff of .70.
Convergent Validity
To determine the extent to which the ASQ yields accurate information about the aspects of programs it is supposed to measure, Knowlton and Cryer (1994) compared the ASQ scores for 11 programs with subjective ratings by experts. Specifically, two experts ranked a set of programs in terms of overall quality within each of the five original ASQ domains using their own criteria. Using what is called the Spearman correlation, ASQ rankings were moderately to strongly related to the expert rankings, with the exception of the Safety and Health and Nutrition areas. This validity evidence should be regarded as preliminary, based on the small number of programs and experts included in the analysis and the fact that estimates were computed on the original, unrevised ASQ scales.

⁹ There are slightly more indicators in the ASQ (84) than in the POT (80). It is unclear how many indicators are identical or similar.
¹⁰ The technical section only evaluates evidence from the observational portion of the instrument, not the administration questionnaire.
¹¹ Readers should note that Knowlton and Cryer also looked at the interrater reliability of the individual indicators that composed the items. Many indicators exhibited poor agreement. However, summing the indicators into items creates more reliable measures because it cancels out some of the measurement problems. For this reason, users should evaluate programs based on the items and scales, not the individual indicators.
¹² Similar to tests on interrater reliability, the authors found that 40 percent of the indicators had poor short-term stability. However, the measurement problems associated with individual indicators likely cancel out when creating an item score. Again, users should examine the items and scales, not the indicators, when evaluating programs.
UserConsiderationsEaseofUseTheProgramObservationToolandNAAstandardsweredevelopedwithsignificantinputfrompractitioners,resultinginaccessiblelanguageandauser-friendlyformat.
ProgramswishingtoundertaketheaccreditationprocesscancontacttheCouncilonAccreditation(seecontactinformation).Forself-assessmentpurposes,observingtheprogramandscoringthefullinstrumenttakesroughly3-5hours.Theself-studymanualprovidesverydetailedguidancetoprogramdirectorsandstaffonhowlongandhowmuchoftheprogramtoobserve,howtodetermineratingsandhowtocombinescoresfromdifferentraters.
The observation tool is one of a package of products related to accreditation – the Advancing and Recognizing Quality Kit – which includes the standards book; the guide to program accreditation; self-study manuals that include the observation tool as well as staff, family and child/youth questionnaires; and a training video. The team leader’s manual walks program directors or staff through the various steps of the accreditation process in detail and includes specific tools for developing an action plan for improvement based on observational data. As a package, these resources cost approximately $300. There are additional costs related to the full accreditation process.
Available Supports
It is important to reiterate that while this summary has focused specifically on the Program Observation Tool, that instrument is just one piece of an integrated set of resources related to self-study and accreditation. NAA offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability). Some NAA state affiliates offer local training related to the instrument for programs interested in using it for self-assessment and improvement.
In the Field
The University of Missouri-Adventure Club is a district-wide after-school initiative for elementary school students in Missouri’s Columbia public school district. The National Afterschool Association’s standards and observation tool, as well as the larger Advancing School-Age Quality (ASQ) process within which these are embedded, serve as the organizing framework for Adventure Club’s 18 programs. Institutionalization of the standards has resulted in a common language and understanding of program quality that spans the individual staff, program and cross-site levels.
The Program Observation Tool is used by each of the 18 programs several times each year and is a core piece of the new staff orientation process, which includes conducting and discussing a program observation with more senior colleagues. Line-staff are well-versed in the 36 “keys of quality,” and each week during cross-site directors’ meetings one key is the focus of in-depth discussion.
In addition to regular observations – by staff, administrators and parents – each program has an ASQ team made up of these stakeholders (parents, staff, administrators and sometimes children). Teams meet monthly or bi-monthly to review new observation data and revisit the program’s improvement plan. “This is a continuous process – it doesn’t start and stop each year. Each program developed a plan when we first started using the standards and those get revisited and updated several times a year based on ongoing observation,” explained Chrissy Poertner, who coordinates the accreditation and improvement process for the 18 programs. Observation data and program improvement plans are also used to guide staff development.
Initially some staff expressed concern that the tool was long and would be cumbersome to work with, but Poertner says the overall response has been very positive, especially because everyone is involved in and owns the process. “These tools give staff a guide, and when you’re out there working in the field the autonomy can feel overwhelming. Because we’ve created the buy-in and they are part of the improvement process, people respond really positively.”
For More Information
Additional information about NAA’s observation tool and accreditation process is available online at: http://naaweb.yourmembership.com/?page=NAAAccreditation
Contact
Judy Nee, President and CEO
The National AfterSchool Association
529 Main Street, Suite 214
Charlestown, [email protected]
Purpose and History
The Program Quality Observation Scale (PQO), funded by the National Institute of Child Health and Human Development (NICHD) as part of an initiative to study out-of-school time, was designed to help observers characterize the overall quality of an after-school program environment and to document individual children’s experiences within programs. The tool has two components – qualitative ratings focused on the program environment and staff behavior, and time samples of children’s activities and their interactions with staff and peers.
The PQO was developed for research purposes by Deborah Vandell and Kim Pierce and has been used in a series of studies, primarily looking at the quality of school- and center-based after-school programs serving first through fifth grade elementary-school children. The instrument has its roots in Vandell’s observational work in early childhood care settings, including the NICHD Study of Early Child Care, and her work in after-school programs, including the Ecological Study of After-School Care funded by the Spencer Foundation.
The primary focus of the time sample procedure is on three components of individual children’s experiences in programs – relationships with staff, relationships with peers and opportunities for engagement in activities. The qualitative ratings focus on all children’s experiences in the program in terms of staff behavior and the program environment. The qualitative ratings of program environment are best suited for use in formal school- or center-based after-school programs, while the qualitative ratings of staff behaviors and the time sampling of children’s activities and interactions are relevant in both formal program settings as well as informal, adult-supervised settings.
Content
The PQO was designed to help researchers understand the quality of children’s experiences inside programs and focuses on three components of quality – relationships with staff, relationships with peers and opportunities for engagement in activities. As noted above, the instrument has two major components – qualitative ratings and time samples of children’s activities and interactions. Ratings are made of the program environment and staff behavior, or what the developers call “caregiver style.” The following three aspects of the program environment are rated:
• Programming flexibility
• Appropriateness and diversity of the available activities
• Chaos
Four characteristics of caregiver style are rated:
• Positive behavior management
• Negative behavior management
• Positive regard for children
• Negative regard for children
The time sample component of the tool is designed to record the activities and interactions of individual children within the program. There are 19 different activity categories for observers to select from (e.g., arts/crafts, large motor, snack, academic/homework). In addition, the tool provides observers with six different types of interactions to look for: positive, neutral and negative interactions with peers, and positive, neutral and negative interactions with staff.
Because the focus of the PQO is on children’s experiences inside of programs, it tends to focus primarily on social processes and less on resources or the organization of those resources within programs. However, Vandell and colleagues have developed a number of related measures that do capture aspects of these other components, such as a physical environment scale. Developed long before the National Research Council’s features of positive developmental settings framework (2002), some aspects of the PQO align well with that framework while others more clearly reflect its early childhood roots.
Developed by Deborah Lowe Vandell & Kim Pierce
Structure and Methodology
The first component of the PQO – the qualitative ratings – is focused on program environment and staff behavior or “caregiver style.” Ratings are assigned based on a minimum of 90 minutes of continuous observation. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed (but could be adapted to rate all staff members collectively).
Program environment and caregiver style ratings are made using a four-point scale. Users are given descriptions of what constitutes a 1, 2, 3 or 4 rating for three distinct aspects of program environment – flexibility, activities and chaos – and four different aspects of caregiver style – positive behavior management, negative behavior management, positive regard and negative regard. A “4” rating means that particular aspect of the environment (or staff behavior) is highly characteristic of the program (see example below).
The time sampling component of the PQO is focused on the activities and interactions that individual children engage in at an after-school program. Activity type is recorded using 19 different categories. Interactions are assessed in terms of whether they are positive, neutral or negative and whether they happen with peers or with staff. In addition, staff interactions are further coded to note whether they are one-on-one, small group or large group.
Time sampling entails documenting the activities and interactions that a number of individual children have in a program for short periods of time. The developers of the PQO suggest that 30-minute time samples be conducted in 30-second intervals (for a total of 60 intervals). During each interval, the rater observes a child for 20 seconds and then spends 10 seconds recording or coding what they observed. Because time sample observations will sometimes involve fewer than 60 intervals, scores need to be adjusted for the total number of intervals actually observed. This time sampling component has been adjusted for use in different studies (for example with longer observation periods, fewer cycles, etc.).13
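For readers comfortable with a bit of code, the adjustment described above can be sketched in a few lines. This is an illustrative sketch only, not material from the PQO manual; the function and the activity category names are invented:

```python
from collections import Counter

def adjusted_frequencies(codes):
    """Turn raw interval tallies into adjusted frequencies.

    `codes` is the list of activity codes recorded during the intervals
    actually observed (sometimes fewer than the 60 planned). Dividing
    each tally by the number of observed intervals, rather than by 60,
    keeps scores comparable across sessions of different lengths.
    """
    observed = len(codes)
    if observed == 0:
        return {}
    return {code: count / observed for code, count in Counter(codes).items()}

# A sample cut short at 20 intervals: 12 coded academic/homework,
# 5 arts/crafts, 3 snack.
sample = ["academic"] * 12 + ["arts_crafts"] * 5 + ["snack"] * 3
freqs = adjusted_frequencies(sample)  # freqs["academic"] == 0.6
```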
Technical Properties
Available psychometric evidence supporting the PQO addresses score distribution, interrater reliability, test-retest reliability, convergent validity and concurrent validity, mostly from a report by Vandell and Pierce
Chaos

4 = Chaos and disorganization are highly characteristic, persisting across multiple activities and settings. The children are out of control. They may be fighting with one another, yelling, or behaving inappropriately, jumping on furniture, ruining materials, or just generally running around. Activities do not seem organized; disorder is evident.

3 = There is chaos and disorganization in the environment, but it is not characteristic of many children or all activities. A group of children may exhibit the behaviors that merit a rating of 4, or some activities and transition times may be chaotic and disorganized such that the progress or beginning of activities for some children is impeded.

2 = One or two children’s behavior may be out of control, but in general, children’s behavior is appropriate and reasonably controlled. Transitions and activities generally go smoothly, although there may be exceptions.

1 = No chaos or disorganization is observed in the environment. Children’s behavior is appropriate, and activities and transitions proceed smoothly.
13 Readers should note that the developers did not design the PQO for self-assessment, but rather for a large study that required that time sample observations center on a single child of interest. The time sample component of the instrument could be modified for general use by observers randomly choosing children for each assessment. However, it is unclear if the available psychometric findings on the time sample observations will extend to this modified instrument. This caveat does not apply to the qualitative ratings, which were designed to measure the program as a whole and do not require modification for self-assessment.
(2006) based on multiple observations of after-school programs over several years in the NICHD Study of After-School Care and Children’s Development. The study included a broad sample of 46 for- and nonprofit programs located in schools, child care centers and community centers. Each program was observed three or four times a year over a five-year period. Predictive validity evidence comes from a study conducted by Pierce, Bolt and Vandell (2008), which examines the social and academic outcomes of 120 children enrolled in 46 after-school programs. Children were assessed during their 2nd and 3rd grades.
Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell and Pierce examined the average scores and ranges for overall observed quality and the individual qualitative rating scales obtained in formal programs in the Study of After-School Care. The overall quality score was created by averaging the individual qualitative ratings (after reversing the scores for Chaos, Negative Regard and Negative Behavior Management). For both the overall quality score and the individual ratings, annual composites are averages of all observations conducted within a school year.
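The composite computation described above can be sketched briefly. The scale names and the example values below are hypothetical; only the averaging and reverse-scoring logic comes from the report, and the 5 − x reversal is an assumption that follows from the 1-4 scale:

```python
def overall_quality(ratings):
    """Average the seven qualitative ratings into one composite score.

    Ratings are on the 1-4 scale; the three negatively worded scales are
    reverse-scored (5 - x on a 1-4 scale) so higher always means better.
    """
    reversed_scales = {"chaos", "negative_regard", "negative_behavior_management"}
    scores = [5 - value if name in reversed_scales else value
              for name, value in ratings.items()]
    return sum(scores) / len(scores)

observation = {
    "flexibility": 3,
    "activities": 4,
    "chaos": 1,                          # reversed to 4
    "positive_behavior_management": 4,
    "negative_behavior_management": 1,   # reversed to 4
    "positive_regard": 4,
    "negative_regard": 1,                # reversed to 4
}
composite = overall_quality(observation)  # (3 + 4 * 6) / 7
```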
The overall score and most of the qualitative ratings and time-sampled activities and interactions had wide variability, suggesting the instrument can detect differences among a variety of programs. Across multiple observations in several years of study, the full range of scores was obtained for most of the sampled activities. Among the qualitative ratings, low variability was found for Negative Behavior Management and Negative Regard for Children. However, the strong validity evidence suggests that the instrument is detecting meaningful differences in these domains despite their low frequencies.
For children’s interactions, the full range of scores was obtained for neutral interactions with staff and with peers; as would be expected, the range was more restricted for interactions that were clearly positive or negative.
Interrater Reliability
The degree to which different raters agree when observing the same program was tested for both the qualitative ratings and time sampling components of the instrument. For the qualitative ratings, kappa coefficients were computed once a year over four years. All of the domains had scores above .70, the benchmark for strong interrater reliability, except Staff Negative Regard, for which the lowest coefficient was .59. The proportion of Negative Regard scores on which observers achieved exact agreement was high, however, suggesting that the moderate kappa score may be due to the relative infrequency with which negative regard was observed. The average kappa score for staff negative regard was .82, suggesting that trained raters can reach acceptable agreement on all domains.
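A small sketch shows why an infrequently observed behavior can yield only a moderate kappa even when raters almost always agree. The ratings below are invented for illustration; they are not data from the study:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    p_chance is the agreement expected from each rater's marginal code
    frequencies alone; kappa = (p_obs - p_chance) / (1 - p_chance).
    """
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((freq_a[code] / n) * (freq_b[code] / n)
                   for code in set(freq_a) | set(freq_b))
    return (p_obs - p_chance) / (1 - p_chance)

# Two raters score a rarely observed behavior: 18 of 20 ratings are
# 1 ("not characteristic") and the raters disagree only once. Raw
# agreement is 95%, yet kappa is only about .64, because chance
# agreement is already high when one code dominates.
a = [1] * 18 + [2, 2]
b = [1] * 18 + [2, 1]
kappa = cohens_kappa(a, b)
```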
Agreement was also computed for all domains of the time sample observations except group size. Kappa scores at all time points were either above .70 or very close, indicating strong interrater reliability.
Internal Consistency
To determine whether items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach’s alpha. Vandell and Pierce found that alpha levels for the annual overall observed quality scores averaged .81, well above the recommended cutoff of .70.
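For readers curious how Cronbach’s alpha is computed, a minimal sketch follows; the item scores are hypothetical, not data from the study:

```python
def cronbachs_alpha(items):
    """Cronbach's alpha for k equal-length item-score columns.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    using population variances. `items` is a list of k lists, one per item.
    """
    def var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))

# Three hypothetical rating items scored for five programs; the items
# rise and fall together, so alpha is high (about .93).
items = [
    [1, 2, 3, 4, 4],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 3, 4],
]
alpha = cronbachs_alpha(items)
```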
Test-Retest Reliability
In order to determine whether the quality composite and individual qualitative ratings generated by the PQO are stable over time, Vandell and Pierce correlated the ratings made in adjacent observations during the second year of the study, when the sample was largest and most representative of the range of programs in the community (N=45). Four observations were conducted in each program, approximately two months apart. Ratings from the first observation were correlated with those from the second; the ratings from the second observation were correlated with those from the third; and the ratings from the third observation were correlated with those from the fourth. Correlations for overall observed quality were near or above .70, suggesting the instrument detects changes in program quality and is not overly sensitive to minor changes. Correlations for the individual ratings were lower, with the average for all domains ranging from .34 to .59. This suggests that programs are only somewhat stable in their scores for particular domains over periods of two months. It is unclear whether this reflects short-term variability (as might be seen from one day to the next) or meaningful changes over the course of two months.
Convergent Validity
To examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure, the authors compared findings for the qualitative ratings to findings from the SACERS (also reviewed in this report). If both instruments truly measure program quality, one can reasonably expect that the findings will be related. The following relationships between PQO and SACERS scales were examined: (1) PQO Programmatic Flexibility was positively related to SACERS Program Structure; (2) PQO Available Activities was positively related to SACERS Activities; and (3) PQO Staff Positive Regard and Positive Behavior Management were positively related to SACERS Interactions, and PQO Staff Negative Regard and Negative Behavior Management were negatively related to SACERS Interactions. These findings provide strong evidence that the instrument adequately measures program quality. Although convergent validity is supported for most qualitative items, we cannot infer the validity of the Chaos rating because there was no comparable SACERS question to compare it to.
Concurrent Validity
Another way to examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure is to compare the instrument’s ratings to other distinct but theoretically important concepts. Developers compared the findings assessed by the PQO to structural features of after-school programs (Pierce, K. M., Hamm, J. V., Sisco, C., & Gmeinder, K., 1995). Similar to what is reported in the early childhood literature, Positive Regard ratings were higher and Negative Regard scores were lower in nonprofit programs compared to for-profit programs, when child-staff ratios were smaller and when program staff had more formal education. Programming flexibility was higher in nonprofit compared to for-profit programs and when child-staff ratios were smaller. Ratings of available activities were higher in nonprofit programs.
Time-sampled activities and interactions were also associated with program characteristics as well as with child reports of their experiences in the programs. For example, children were observed to have more frequent positive/neutral interactions with staff and less frequent negative interactions with peers in programs with smaller group sizes; and smaller staff-child ratios were associated with children having more frequent positive/neutral interactions with staff and spending less time in transition (e.g., standing in line) and in large motor activities. Although this evidence is quite strong, it is unclear whether researchers had expected additional relationships that they did not find.
Predictive Validity
In a study conducted by Pierce, Bolt and Vandell (2008), researchers examined the relationships between three PQO scales (Staff-Child Relations, Available Activities and Programming Flexibility) and the social and academic outcomes of 120 children enrolled in 46 after-school programs. Program observations were conducted several times per school year for two years, and children’s outcomes were assessed at the end of each school year, when they were in 2nd and 3rd grades, respectively. Better Staff-Child Relations was associated with higher reading scores (both grade levels), math scores (grade 3 only) and better social skills (2nd grade boys only), but the scale was unrelated to work habits. Available Activities was related to better math grades and work habits (both for grade 3 only), but the scale was unrelated to reading grades and social skills. Programming Flexibility was not related to any outcomes. Predictive validity appears mixed, but the evidence should be regarded as preliminary. The authors state that little research currently exists that examines the relationships of specific program strategies (rather than overall program quality) with children’s academic and social outcomes.
User Considerations
Ease of Use
While the PQO is available for anyone to use, it is important to recognize that it was developed exclusively with a research audience in mind. While the manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general or practitioner use at this time and therefore include some concepts and language (e.g., adjusted frequencies, sampling, qualitative) that may not be particularly accessible for non-research audiences.
In the context of the studies the PQO was developed for, formal observation time at sites was fairly limited, but some additional time should be factored in for reviewing notes and assembling ratings. It is recommended that the qualitative ratings of environment and staff behavior be made based on a minimum of 90 minutes of observation. Completing the time sample process as outlined in the manual takes a minimum of 30 minutes (60 30-second cycles) for an experienced observer. Some guidance about how to conduct observations and develop ratings is provided in the manual.
Available Supports
At this time, training is not regularly available on how to use the PQO, but it has been conducted with data collectors involved in the studies it was developed for. Trainings include reviewing the contents of the instrument and pairing new raters with trained raters to do an observation in the field, compare scores and build inter-observer agreement.
Observation data collected using the PQO have always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PQO could be used independently.
In the Field
In the Study of After-School Care and Children’s Development, conducted by Deborah Vandell and Kim Pierce in the mid-1990s, live observation of children’s experiences in programs was at the center of the research (Pierce, K., Hamm, J., & Vandell, D. L., 1999). Observations were conducted during the program participants’ first-grade year, and each child was observed three times by an individual observer who was randomly assigned from a pool of observers. The observers used both components of the PQO – the time sample procedure and qualitative ratings of the program environment and caregiver style. Other types of information were collected using different methods and measures.
In analyzing the data, the researchers looked for associations between the various measures of program quality and also at associations between program quality and children’s adjustment at school. In terms of how aspects of program quality relate, staff positivity was negatively correlated with staff negativity, as one might expect. Staff positivity was higher in programs that were more flexible and offered more activities. Staff negativity was associated with less programmatic flexibility. They also found associations between the program quality indicators and children’s adjustment in the first-grade classroom, primarily for boys. Staff positivity was associated with boys earning higher reading and math grades and exhibiting less externalizing behavior at school. Greater programming flexibility was associated with boys exhibiting better social skills at school. Greater availability of age-appropriate activities was associated with boys earning poorer reading and math grades, and exhibiting poorer work habits and more externalizing behavior at school.
Pierce, Bolt and Vandell (in press) recently examined associations between program quality indicators as measured by several PQO qualitative ratings (staff positive regard, activities, flexibility) and children’s adjustment at school (grades, work habits, social skills) in Grades 2 and 3, controlling for child and family characteristics and child prior adjustment. The researchers found that greater staff positivity in after-school programs was associated with both boys and girls earning better reading grades in Grades 2 and 3 and better math grades in Grade 3. Boys also exhibited better social skills in Grade 2 when their after-school programs were characterized by greater staff positivity. Availability of multiple activities in the after-school programs was associated with boys and girls earning better math grades and exhibiting better work habits at school in Grade 3. Programming flexibility was not associated with child outcomes in Grades 2 and 3.
Vandell and Pierce (2001) also reported long-term associations between overall program quality, as measured by annual composites of the qualitative ratings, and children’s outcomes. They looked at cumulative program quality (averaged across two years, three years and four years) in relation to children’s adjustment at school. Controlling for child and family characteristics and for children’s functioning at the end of first grade, the researchers found that children who experienced greater cumulative program quality in Grades 1-3 were reported by their teachers to have better academic grades at school. Girls whose after-school programs had higher cumulative quality across Grades 1-3 or 1-4 had better work habits and better social skills with peers at school in Grades 3 and 4.
For More Information
The PQO is available online at: http://childcare.gse.uci.edu/des4.html
Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, CA 92697
949.824.7840
Purpose and History
In 2003, the New York State Afterschool Network (NYSAN) began a two-year process of developing the Program Quality Self-Assessment Tool (QSA). A Quality Assurance Committee involving key stakeholders from practice, policy and research reviewed relevant literature, drafted the instrument, conducted field tests and incorporated feedback from practitioners across the state. Soon after the instrument was completed in 2005, New York State began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.
The QSA was developed exclusively for self-assessment purposes; programs are discouraged from using it for external assessment or formal evaluation. It is intended to be used in its entirety, ideally as the focal point of a collective self-assessment process that involves all program staff. The QSA is also used by new after-school programs during their initial development; specific items that are considered “foundational” indicators for the start-up stage are identified.
The QSA was designed to be used in the full range of school- and community-based after-school programs and is particularly relevant for programs that intend to provide a broad range of services, as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers). It was also designed to be used by programs serving a broad range of students, from kindergarten through high school.
Content
The Program Quality Self-Assessment Tool is organized into 10 essential elements of effective after-school programs (see below). Each element contains a list of standards of practice or quality indicators that describe each element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns:
• Environment/Climate
• Administration/Organization
• Relationships
• Staffing/Professional Development
• Programming/Activities
• Linkages Between Day and After-School
• Youth Participation/Engagement
• Parent/Family/Community Partnerships
• Program Sustainability/Growth
• Measuring Outcomes/Evaluation
Because the QSA was designed with an eye towards programs receiving 21st CCLC funding, there was an intentional effort to capture aspects of programming that, although they may not relate directly to academics, will enhance programs’ ability to address students’ educational needs. The developers are exploring options that would allow programs to address a subset of items based on their level of readiness; however, the ultimate goal is to assess the program or organization in its entirety.
Because of its broad focus extending from the activity level to the organization as a whole, the QSA emphasizes several different components of program settings, including social processes, program resources and the organization or arrangement of those resources inside the program. Social processes addressed by the tool include relationships, climate and pedagogy. Resource issues include facilities and staffing requirements, and arrangements such as effective transitions, policies and procedures and relationships with schools are also addressed.
Structure and Methodology
Because of its commitment to child and youth development broadly defined, it is not surprising that the items included in the QSA reflect each of the features identified by the National Research Council as features of positive developmental settings (2002).
Each of the QSA’s 10 essential elements of effective after-school programming is further defined by a summary statement, which is then followed by between 7 and 18 quality indicators – statements aimed at illustrating what a particular element looks like in practice. While most essential elements are assessed through observation, the more organizationally focused elements such as administration, measuring outcomes/evaluation and program sustainability/growth are assessed primarily through document review.

Developed by the New York State Afterschool Network
The rating scale used in the QSA (see example below) is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards. Staff are asked to determine whether their performance in each indicator area is:
4 = Excellent/Exceeds Standards
3 = Satisfactory/Meets Standards
2 = Some Progress Made/Approaching Standard
1 = Must Address & Improve/Standard Not Met
Relationships: A QUALITY program develops, nurtures and maintains positive relationships and interactions among staff, participants, families and communities.

A Quality Program:    Performance Level (1 2 3 4)    Plan to Improve (Right Now / This Year / Next Year)

• Has staff who respect and communicate with one another and are role models of positive adult relationships.
• Interacts with families in a comfortable, respectful and welcoming way.
• Treats participants with respect and listens to what they say.
• Teaches participants to interact with one another in positive ways.
• Teaches participants to make responsible choices and encourages positive outcomes.
• Is sensitive to the culture and language of the participants.
• Establishes meaningful community collaboration.
• Has scheduled meetings with its major stakeholders.
• Encourages former participants to contribute as volunteers or staff.
While some additional guidance is provided to staff in the tool’s introduction about how to determine ratings, developers acknowledge that this is one of the areas they may revisit in the future, based on feedback from the field. Users are not encouraged to combine scores for each element or to determine a global rating, because the tool is intended for internal self-assessment purposes only. In addition to assigning a rating for each indicator, users are given space on the form to note and prioritize their plans for improvement.
Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree this measures important features of program quality), research related to the instrument’s psychometric properties has not been conducted.
User Considerations
Ease of Use
Practitioners led the development of the QSA and represent its primary target audience. The language and format of the instrument are straightforward and user-friendly. The tool consists of one document, free and downloadable from the Web, that includes an overview, instructions and the instrument itself.
NYSAN has developed a new user guide, published in April 2008, to assist programs in utilizing the QSA. This tool provides guidance on how to engage staff in the assessment process in addition to outlining the basic guidelines for administering the tool.
Programs are expected to go through the self-assessment process twice a year. Some in the field have concerns about the tool being too lengthy; this feedback will be taken into account in an upcoming revision process.
Additional Supports
The user guide, mentioned above, was created in consultation with a wide range of stakeholders – including NYSAN staff, a state-wide Quality Assurance Committee, practitioner-based focus groups and an advisory group. The guide serves as a “self-guided walkthrough” of the QSA tool; the tool is embedded in the second half of the guide. NYSAN is currently developing phase two of the guide – an online version which will allow users to click on links to other web-based tools, articles and resources related to any one of the ten essential elements or the overall quality assessment and improvement process. The online version will also provide a descriptive example of optimal performance for every single indicator (the current hard copy guide features only select examples). Programs can contact NYSAN to receive additional referrals for technical assistance in using the instrument.
While no centralized mechanism for collecting or analyzing results currently exists, with the development of the online version of the tool and user guide, it will be possible to enter data by computer. This could lead to efficient opportunities to track and analyze data over time.
Although additional instruments are not provided with the tool, users are encouraged to consider QSA results one important source of data to inform program planning and to use the tool in concert with other formal or informal evaluative efforts such as participant, parent and staff surveys, staff meetings and community forums. In the future, users will be able to link to other tools from the online version of the QSA and guide.
All NYSAN training is now organized by the 10 elements featured in the tool, so practitioners can easily find professional development opportunities that connect with the results of their self-assessment. Regular trainings that are conducted twice a year with 21st CCLC grantees are also organized around the 10 elements.
In the Field
The Niagara Falls School District has funding through the 21st CCLC program to run after-school programs at four sites – three middle schools and one high school. While all after-school programs receiving 21st CCLC funding in the state of New York are required to conduct and submit QSA assessments twice a year, these programs in Niagara Falls have extended their use of the tool well beyond self-assessment. They see the QSA as central to staff and program development efforts.
Susan Ross, the Program Director within the school district, described how site coordinators use the tool. “We see the QSA as a staff development resource. About three weeks after the school year starts, site coordinators begin sitting down with all of their staff – teachers and community partners – and walking through the tool, one page per staff meeting. It gives us a collective sense of what’s working and what we need to improve. It’s a great focal point for discussions among staff.”
Ross emphasized that one of the important benefits of this process is that it helps to level the playing field between staff from external community partner organizations and school teachers who work in after-school programs. “This really gives our partners an opportunity to feel their opinions are valued. Often when CBO staff come into schools they feel like guests as opposed to full-fledged partners. Through this process, they see their opinions are equally valued and that helps build overall staff morale.”
Site directors and staff find the tool accessible and user-friendly. Ross summed up her assessment of the QSA in a matter-of-fact way. "We like it. It's easy to use, self-explanatory and understandable. In fact, I wouldn't change anything about it."
For More Information
NYSAN's Program Quality Self-Assessment Tool is available online at: www.nysan.org/content/document/detail/1991/
Contact
Ajay Khashu, Director of Research
NYSAN
925 Ninth Avenue
New York, [email protected]
Purpose and History
The Promising Practices Rating Scale (PPRS) was developed for research purposes and is designed for use in school- and community-based after-school programs that serve elementary and middle school students. The tool allows observers to document the type of activity, the extent to which promising practices are implemented within activities and overall program quality.
The 2005 version of the PPRS, the version that is currently available, was developed by Deborah Vandell, Liz Reisner, Kim Dadisman, Kim Pierce and Ellen Pechman in the context of a specific study focused on the relationship between participation in "typically performing" programs and child and youth outcomes (Vandell, D., Reisner, E., Pierce, K., Brown, B., Lee, D., Bolt, D., Dadisman, K. & Pechman, E., 2006; Vandell, D., Reisner, E. & Pierce, K., 2007). Because of this, the tool was initially designed to verify whether or not programs were high-quality rather than to look at variations in quality across programs.
This instrument builds directly on earlier work by Vandell and colleagues focused at the elementary level (see write-up of the Program Quality Observation Scale in this report) as well as the features of positive developmental settings identified by the National Research Council (2002). Its authors also drew upon several other observation instruments included in this report as they developed the exemplars of promising practices: the School-Age Care Environment Rating Scale, the Program Observation Tool and the OST Observation Tool designed by Policy Studies Associates.
Although the focus of this summary is the PPRS specifically, other components of the Promising Practices quality assessment system include interviews and questionnaires completed by program directors and staff. These tools obtain information about structural features of programs such as staff qualifications and ongoing training, material and financial resources and connections between the program and school, family and community.
Content
The PPRS provides researchers with a framework for observing essential indicators of high quality programs. It addresses three different aspects of programming: activity type, implementation of promising practices and overall program quality. The first section, which closely mirrors the OST Observation Tool developed by Policy Studies Associates, focuses on documenting a range of in-depth information about the type of activity being observed and the skills emphasized through that activity. The promising practices ratings that constitute the core of the instrument focus on the following eight areas of quality:
• Supportive Relations with Adults
• Supportive Relations with Peers
• Level of Engagement
• Opportunities for Cognitive Growth
• Appropriate Structure
• Over-control
• Chaos
• Mastery Orientation
Because of its emphasis on what children and youth experience in programs, the PPRS has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including interactions between and among youth and staff and some aspects of instruction.
As mentioned above, the developers drew heavily on the Community Programs to Promote Youth Development report (National Research Council, 2002), so the features of positive development outlined in that report are quite visible within the tool's definition of promising practices. Although the PPRS itself does not include a focus on connections between the program and school, family or community (one of the features described in the NRC report), companion tools are available to capture this type of information.
Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.
Structure and Methodology
The first part of the PPRS, which focuses on the activity context, has observers watch an activity for 15 minutes and code several aspects of what they are observing, including:
• Activity type (e.g., tutoring, visual arts, music, sports, community service);
• Space (e.g., classroom, gym, library, cafeteria, auditorium, hallway, playground);
• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal);
• Number of staff involved in the activity; and
• Number, gender and grade level of participants.
These observations are recorded on a cover sheet that also includes other basic information about the observer, the program, date, time, etc.
Next, observers are asked to write down a brief narrative description of the activity they are observing, following a set of specific guiding questions (see below). This description supplements the activity context coding with a richer description of what is going on.

• What are youth doing?
• What kinds of materials are used?
• What kinds of instructional processes are used?
• What, if any, specific skills does the activity's leader(s) have that support the instruction involved in the activity he/she is conducting?
Example: Level of Engagement (in intended experiences)

High
Students appear engaged, focused and interested in their activities.
• Engaged in the focal activity and/or using free time appropriately.
• Appear to be interested in the activity.
• Follow staff directions in an agreeable manner.
Markers of engagement are appropriate to activity (e.g., intense concentration witnessed during computer activity, high levels of affect during sports activities); can be solitary or group activities.
Students contribute to discussions.
• Discuss back and forth and offer comments.
• Ask "on-task" questions.
• Are comfortable initiating conversation.

Low
Students appear bored or distracted.
• Ignore staff who are talking to them.
• 'Pretend' to listen.
• Wander aimlessly.
Markers of engagement inappropriate to activity (e.g., picking flowers while playing a sport activity).
Students do not contribute to discussions.
• Do not participate in discussions.
• Do not ask questions.

Rating Indicators: 1 = Most students are not engaged appropriately, may appear bored. 2 = Students are participating in activities but do not appear to be concentrating or affectively involved. 3 = Students are focused on activities with some evidence of affective involvement or sustained concentration. 4 = Students are concentrating on activities, focused, interacting pleasantly when appropriate and are affectively involved in the activity.
• What is the overall affective tone?
• To what extent are youth engaged?
• Describe observed promising practices as appropriate and raise concerns about quality, if there are any.
The next section, and the core of the PPRS, is the Promising Practices Ratings section, where observers document to what extent certain exemplars of practice are present in the program. This section of the tool addresses the eight key areas of practice listed previously.
Each area of practice is subdivided into two to five specific exemplars, with more detailed indicators provided under each. Observers are given both positive and negative exemplars and indicators for each practice area in order to help guide determination of ratings (see the Level of Engagement example above).
In the PPRS, ratings are only assigned at the overall practice level (not for individual exemplars or indicators). Practices are either considered highly characteristic (4), somewhat characteristic (3), somewhat uncharacteristic (2), or highly uncharacteristic (1). Additional guidance as to what each of these terms means is provided in the instrument. At the bottom of the description of each practice area, observers are given tailored guidance as to what might lead to a 1, 2, 3 or 4 rating for that practice.
Finally, observers are asked to review their ratings of promising practices across multiple activities and assign an overall rating for each promising practice area. An overall program quality score is computed as the mean of the ratings on the 8 scales, after reversing the scores for over-control and chaos. For each practice area there is space to write down notes to "justify" the overall rating assigned.
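The scoring rule just described (mean of the eight 1-4 ratings, with the two negatively keyed scales reverse-scored) can be sketched in a few lines. The scale names and ratings below are illustrative, not data from the instrument or its studies:

```python
# Sketch of the PPRS-style overall program quality score: the mean of
# the eight practice-area ratings (each 1-4), after reverse-scoring
# Over-control and Chaos so that higher always means better quality.

NEGATIVE_SCALES = {"over_control", "chaos"}

def overall_quality(ratings):
    """Mean of the scale ratings, reversing negatively keyed scales."""
    adjusted = [
        (5 - score) if name in NEGATIVE_SCALES else score  # 1<->4, 2<->3
        for name, score in ratings.items()
    ]
    return sum(adjusted) / len(adjusted)

example = {
    "supportive_relations_adults": 4,
    "supportive_relations_peers": 3,
    "level_of_engagement": 3,
    "cognitive_growth": 2,
    "appropriate_structure": 3,
    "over_control": 1,   # low over-control is good -> reversed to 4
    "chaos": 2,          # reversed to 3
    "mastery_orientation": 2,
}
print(overall_quality(example))  # -> 3.0
```

Reverse scoring keeps all eight scales pointing the same direction before averaging, which is why a low Over-control rating raises rather than lowers the overall score.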
Technical Properties
Available psychometric evidence supporting the PPRS addresses interrater reliability, score distribution, internal consistency and predictive validity information from a study of 35 after-school elementary and middle school programs (Vandell, D., Reisner, E., et al., 2006).
Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell, Reisner, et al. (2006) examined the average scores for overall program quality and the individual rating scales obtained with the sample of high-quality elementary and middle school programs that participated in the Study of Promising After-School Programs at two time points. Generally, it is important to have a range of scores across programs, as that would suggest the measure can detect meaningful differences between programs. Because this sample included only high-quality programs, however, the scores naturally fell toward the positive extremes of each dimension. Score distributions on the PPRS obtained in 37 observations in programs of varying quality are more widely distributed, suggesting that the instrument detects meaningful differences among programs.
In that study, the authors theorized that scores would exhibit a wider range and would show low, moderate and high quality (as opposed to most programs scoring on the high end of the scale). As expected, scores for each of the scales generally had a wider distribution, with averages across the programs falling in the middle for most of the eight scales. Two scales, Opportunities for Cognitive Growth and Mastery Orientation, had low scores overall (averages were 1.65 and 1.78 on a scale of 1 to 4, respectively), although their scores in the Study of Promising Practices were closer to the center (average scores across two observation periods within programs serving elementary and middle school students, respectively, ranged from 2.6 to 2.9). The distribution differences for these two scales suggest that they may be better suited to differentiate among programs of higher quality. The Over-control scale was the only item that was consistently low for both the Study of Promising Practices and the follow-up study, which may simply suggest that staff in most programs do not exhibit a great deal of over-control. Taken together, the two studies provide strong evidence that the instrument captures meaningful differences across a variety of programs.
Interrater Reliability
The authors examined rater agreement for each of the instrument's eight items by calculating intraclass correlation coefficients between ratings of 24 programs made by two observers. Coefficients ranged from .58 for Opportunities for Cognitive Growth to .86 for Structure (average = .74) for the individual scales. The intraclass correlation for the overall program quality score was .90. Interrater agreement represented by kappa scores in work conducted by other research teams in programs of varying quality ranged from .63 for over-control to .94 for supportive relations with adults (average = .77) across 37 observations made by two observers. These scores indicate acceptable interrater reliability, meaning the instrument's items are clear enough for raters to understand and agree on.
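To make the kappa statistic cited above concrete, here is a minimal sketch of Cohen's kappa, the chance-corrected agreement index for two raters assigning categorical ratings. The two sets of ratings below are invented for illustration, not data from the studies cited:

```python
# Cohen's kappa: agreement between two raters, corrected for the
# agreement that would be expected by chance given each rater's
# marginal rating frequencies.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Two observers rating the same ten activities on a 1-4 scale
# (illustrative values only).
a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
b = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(round(cohens_kappa(a, b), 2))  # -> 0.71
```

Because kappa subtracts out chance agreement, it is lower than raw percent agreement (here 80 percent), which is why reliability reports often quote both figures.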
There is currently no information regarding interrater reliability for the overall program quality rating, which ranges on a three-point scale from low program quality to high program quality. Therefore, evidence does not exist to support whether raters agree on the degree of quality that individual programs exhibit.
Additional Reliability Evidence
Additional rater agreement information was obtained by comparing two sets of ratings by the same rater conducted on consecutive days for each program. The authors found that the percent agreement for ratings of each feature over two days was between 81 percent and 97 percent, with an average of 90 percent. This translates into an average kappa score of 0.80, indicating that the average item's rating for Day 2 is not very different from Day 1.
Internal Consistency
To determine whether the items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach's alpha. In the Study of Promising After-School Programs, alpha coefficients for the overall program quality score ranged from .74 to .77, indicating acceptable internal consistency.
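The alpha statistic itself is a simple function of item and total-score variances. Below is a minimal sketch of the standard Cronbach's alpha formula; the rating matrix is made up for illustration (rows are observations, columns are scales), not data from the study:

```python
# Cronbach's alpha = k/(k-1) * (1 - sum(item variances)/variance(totals)),
# where k is the number of items. Higher alpha means the items vary
# together, i.e. they plausibly measure one underlying construct.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(rows):
    k = len(rows[0])                   # number of items/scales
    items = list(zip(*rows))           # transpose to per-item columns
    totals = [sum(r) for r in rows]    # per-observation total scores
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Five observations rated on four scales (illustrative values only).
ratings = [
    [3, 3, 2, 3],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(ratings), 2))  # -> 0.96
```

In this toy matrix the four columns rise and fall together across observations, so alpha comes out high; reported values in the .74-.77 range indicate acceptable but less tightly clustered items.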
Predictive Validity
Initial evidence of predictive validity is available for the PPRS, which means that the instrument does predict youth outcomes that would be expected from prior theory or research. Specifically, Vandell, Reisner and colleagues (2005) found that youth attending high quality programs (as measured by the PPRS) had better educational and behavioral outcomes by the end of the academic year than unsupervised youth who did not regularly attend any after-school program, including better academic performance, task persistence, social skills and pro-social behaviors with peers and less misconduct, substance abuse and aggressive behavior.14 Vandell, Reisner et al. (2006) reported similar findings for longer term outcomes after two years of program participation. Improvements in math achievement scores have also been reported (Vandell, Reisner & Pierce, 2007).
The fact that the instrument's ratings related to expected outcomes offers some reassurance to users that it accurately measures aspects of program quality. However, the validity evidence should be taken as preliminary for several reasons. First, the authors have not examined PPRS ratings of low-quality programs. No evidence exists that the instrument distinguishes between expert ratings of low- and high-quality programs, or whether low-quality program ratings predict youth outcomes differently than high quality program ratings. It would also be useful to understand the predictive validity of each specific scale (e.g., level of engagement, appropriate structure) and the overall score.
User Considerations
Ease of Use
While the PPRS is available online and free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. While the observation manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general use or for practitioner use at this time and therefore include some language (e.g., construct, exemplar) that may not necessarily be accessible for non-research audiences.
14 Results were found using two advanced statistical techniques known as cluster analysis and hierarchical linear modeling.
In the context of the study the PPRS was developed for, site visits were fairly time-intensive (spread over the course of two days). However, formal observation time totaled approximately two hours per site, with several additional hours spent reviewing notes and assigning ratings. Some additional guidance about how to conduct observations, develop ratings and complete the forms is provided in the manual.
Available Supports
At this time, training is not regularly available on how to use the PPRS, but it has been conducted with data collectors involved in research. Trainings have included reviewing the contents of the instrument and pairing new raters with trained raters to do an observation in the field, compare scores and build inter-observer agreement.
Observation data collected with the PPRS has always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PPRS could be used independently. Additional measures are also available at the same website as the PPRS.
In the Field
The Promising Practices Rating System was developed specifically for use in the Study of Promising After-school Programs, a national study funded by the C.S. Mott Foundation that focused on the short- and long-term impacts of high-quality after-school programs on the cognitive, academic, social and emotional development of children and youth in high-poverty communities. The research was led by Deborah Vandell of UC Irvine (formerly of the University of Wisconsin-Madison) and Elizabeth Reisner of Policy Studies Associates.
Two-day site visits to participating programs were conducted in fall 2002, spring 2003, fall 2003 and spring 2005 to assess the quality of each program. During site visits, researchers conducted observations using the PPRS on two afternoons, for a minimum of one hour per day. Observers focused on the activities of the target age groups (grades 3 and 4 and grades 6 and 7) and observed as many different types of activities as possible, with a minimum of 15 minutes per activity. At the end of the first day of the site visit, observers assigned tentative ratings to each of the eight practice areas; at the end of the second day, the final ratings were determined.
As analyses got underway, the authors revised their conceptual scheme based on the idea that sets of experiences should be taken into consideration as opposed to labeling students as program vs. non-program. Elementary students with high rates of participation in quality after-school programs but low levels of participation in other after-school arrangements (the program-only cluster) outperformed the low supervision cluster (self-care + limited activities) on every measure of academic and social competence assessed. The supervised-at-home group outpaced the self-care + activities cluster on all academic measures and social skills. Among middle school students, the program + activities cluster had better work habits than the low supervision group, and both program groups (program + activities, program only) reported less misconduct and substance use compared to the low supervision group. Similar results were found for program involvement across three years. Additional findings are available at http://childcare.gse.uci.edu/des3.html.
For More Information
The PPRS is available online at: http://childcare.gse.uci.edu/des3.html
Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, [email protected]
Purpose and History
The Quality Assurance System® was developed by Foundations, Inc. to help after-school programs conduct quality assessment and continuous improvement planning. Based on Foundations, Inc.'s experience running after-school programs, offering professional development activities and providing technical assistance and publications for the field, the QAS was designed to help programs develop and sustain a commitment to quality.
In its first incarnation, the QAS was a simple checklist designed to assess the quality of after-school programs operated by the organization itself. Roughly five years ago, staff at Foundations reconstructed and expanded the tool for broader use, with input from practitioners both inside and outside of the organization. The QAS was developed to be general enough for use in a range of school- and community-based programs serving children and youth in grades pre-K–12.
Based on seven "building blocks" that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their focus. The QAS focuses on quality at the "site" level and addresses a range of aspects of quality, from interactions to program policies and leadership. Filling out the QAS requires a combination of observation, interview and document review. Scores are generated for each building block rather than the overall program, reflecting the tool's emphasis on identifying specific areas for improvement.
Content
The various components of quality that the QAS focuses on are called "building blocks." The seven core building blocks, which describe what Foundations considers to be the fundamental features that underlie effective after-school programming, include:
• Program planning and improvement;
• Leadership;
• Facility and program space;
• Health and safety;
• Staffing;
• Family and community connections; and
• Social climate.
In addition to these seven, three "program focus building blocks" reflecting the particular goals or focus of a program are available for users to select from:
• Academics;
• Recreation; and
• Youth development.
The QAS puts roughly equal emphasis on three different components of settings, including social processes, program resources and the arrangement or organization of those resources within programs. There are items on the QAS that address all of the features of positive developmental settings outlined by the National Research Council (2002), with somewhat more of an emphasis on things related to structure and skill-building than on features such as "support for efficacy and mattering" and "supportive relationships."
Structure and Methodology
The structure of the QAS is clear and straightforward. Part one – program basics – includes the seven core building blocks. For each one, users are given a brief description of the importance of that aspect of quality. The building block is further subdivided into five to eight specific elements, each of which is assigned a rating by assessors. For example, the elements of the social climate building block include: behavioral expectations, staff/participant interactions, diversity, social time and environment. For each element, more specific descriptions (also referred to as a "rubric") are provided. Part two of the tool – program focus – consists of the three additional building blocks, and its structure parallels that of part one. Programs are encouraged to use one, two or all three of the program focus building blocks in conducting their assessment.
Quality Assurance System®
Developed by Foundations, Inc.
Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4). For each element of a building block, specific descriptions of what might lead to a 1, 2, 3 or 4 rating are provided (see example below).
In terms of data collection, users are provided with a document checklist that identifies what kinds of specific documents might be useful in filling out the QAS and are encouraged to gather and examine such documents prior to observing the program. The "program profile" section of the tool asks users to upload important basic information about the program and can also be filled out, for the most part, prior to visiting.
Once on-site, the users' guide encourages observers to go through five steps:

• Meet people to establish rapport and hear from staff and youth about the program;
• Wander with purpose to develop a sense of the entire facility;
Example rubric: Staffing building block
Each element is scored 1 (Unsatisfactory), 2 (Needs Improvement), 3 (Satisfactory) or 4 (Outstanding).

5.1 Staff to Participant Ratio
1: Insufficient staff are hired for the number of participants.
2: Sufficient staff are hired for some levels of participation, but staffing is sometimes insufficient due to attendance fluctuations.
3: Appropriate participant to staff ratios are maintained consistently.
4: Staff number and attendance exceed required ratios.

5.2 Qualifications
1: Fewer than half the staff have the required training and/or experience.
2: More than half the staff have the required training and/or experience.
3: All staff have the training and/or experience required by the program.
4: Many staff members exceed training and/or experience required by the program.

5.3 Professional Growth
1: Professional development is not provided nor is time allocated for staff to pursue individual professional growth.
2: Some professional development opportunities are provided, but they are poorly attended.
3: Staff attend professional development sessions at least twice a year.
4: Staff identify professional development needs and attend professional development sessions more than twice a year.

5.4 Attendance
1: Staff absenteeism is an ongoing problem (e.g., significant number of staff routinely absent).
2: Staff absences are an occasional problem.
3: Staff are reliable and absences are infrequent.
4: Staff absences are rare.

5.5 Retention
1: Staff turnover is identified as a problem.
2: Staff turnover occasionally affects program offerings.
3: Staff retention is not identified as a problem.
4: Staff retention is excellent and provides stability.

Total Score
• Observe activities to see the program in action, the level of engagement and the nature of activities;
• Gather materials to ensure that all of the documents in the checklist and any other relevant materials are collected; and
• Take notes to ensure you have a running record of your observations and questions.
Once scores for each element are entered into the QAS, the program electronically generates overall building block scores. The program's quality profile then begins to emerge through summary graphs the software generates for each building block as well as a program summary graph that contains scores for each building block assessed. The graphs and building block scores help users target areas for improvement as part of the assessment process. A follow-up QAS assessment enables users to identify areas of progress and then refine goal-setting and improvement planning.
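The roll-up described above can be sketched as a simple aggregation: element ratings are averaged into a score per building block, and the lowest-scoring blocks point to improvement targets. The block and element names and all ratings below are hypothetical, not taken from the QAS itself:

```python
# Hypothetical QAS-style roll-up: element ratings (1-4) are averaged
# into a score per building block; sorting the block scores surfaces
# candidate areas for improvement planning.

element_ratings = {
    "Staffing": {"ratio": 3, "qualifications": 2, "growth": 2,
                 "attendance": 4, "retention": 3},
    "Social climate": {"expectations": 4, "interactions": 3,
                       "diversity": 3, "social time": 4, "environment": 3},
    "Health and safety": {"facility": 4, "procedures": 4, "supervision": 3},
}

block_scores = {
    block: sum(elems.values()) / len(elems)
    for block, elems in element_ratings.items()
}

# Lowest-scoring blocks first: likely improvement targets.
for block, score in sorted(block_scores.items(), key=lambda kv: kv[1]):
    print(f"{block}: {score:.2f}")
```

Reporting per-block rather than whole-program scores mirrors the tool's stated emphasis on identifying specific areas for improvement instead of producing a single overall grade.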
Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree this measures important features of program quality), research related to the instrument's psychometric properties has not yet been conducted.
User Considerations
Ease of Use
The QAS is a straightforward, flexible tool with several built-in features that make it particularly user-friendly. The instruction guide is written in clear, accessible language and walks users through the necessary background and basic steps for using the system. The standard cost for the QAS has recently been reduced to $75 for an annual site license. This license is good for two official uses (or assessments) – which is what its developers suggest programs conduct annually, once toward the beginning of the year and once toward the end. After two uses the system generates a cumulative report comparing the initial and follow-up assessments. For programs with multiple sites, a cumulative report comparing site results is available with the initial assessment. When the QAS is used as part of a professional development package related to quality improvement, discounts are available.
Available Supports
Foundations, Inc. offers online sessions and in-person training options to assist organizations in using the tool. Multi-site organizations may contract for individualized technical assistance and training, which may include options for customization of the tool. Trainings addressing quality elements reflected in the building blocks are available online, in technical assistance, and in professional development sessions.
For self-assessment purposes, once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from Foundations, Inc. staff if they have questions while using the system. Programs that wish to have trained assessors conduct their assessment can purchase this service under contract with Foundations, Inc.
The QAS is available in a Web-based format, allowing users to enter data and immediately generate basic graphs and analyses. The site-specific reports generated are specifically designed to help site staff and leaders use the information to guide improvement planning.
In the Field
Foundations, Inc. is working with the U.S. Dream Academy, a national program serving elementary and middle school students who are children of prisoners. In 10 centers around the country, U.S. Dream Academy provides a comprehensive program including academic support, enrichment, and a one-on-one mentoring relationship. The U.S. Dream Academy chose to use the QAS and a technical assistance strategy during 2008-2010 with Foundations to build and support program quality, and to establish an ongoing process of continuous improvement. During initial meetings, U.S. Dream leaders and staff worked with Foundations to clarify quality indicators for the program and customize the tool. The processes of co-assessment and self-assessment are designed to build the capacity of sites
to target specific improvement goals and concrete steps, identify site strengths and innovations, and share strengths organization-wide. U.S. Dream Academy national headquarters will use the QAS findings to regularly identify where and how they can best support their centers.
Directors anticipate that the QAS and surrounding processes will direct staff to ask critical questions about their program environments and staff practice. At the same time, it will allow centers to highlight and share strengths and accomplishments across the organization, building internal resources for quality. After joint assessments are conducted with U.S. Dream staff and Foundations at each site, individual scores will be aggregated and presented at a national meeting. Each center will conduct a follow-up self-assessment at the end of the 2008-2009 school year, at which point they will be able to analyze the data and evaluate their own program development. The QAS tool also will be available the following school year to allow each site to continue the self-assessment process.
The QAS design, coupled with technical assistance processes, allows for customization of the tool. With the addition of an eleventh building block focused on U.S. Dream Academy's mentoring component, the tool encompasses the organization's full range of essential program components. C. Diane Wallace Booker, Executive Director of the U.S. Dream Academy, stated that "the beauty of QAS is that it is designed with evidence-based quality indicators yet is customizable and able to capture the unique elements of our program, and helped us to more clearly define what quality looks like for us versus any other after-school program. Further, the process of guided self-assessment and continuous improvement planning is critical to our ongoing efforts to achieve impact in the lives of the children we serve." Establishing an ongoing process for quality-building tailored to the specifics of the program is particularly important for multi-site programs. As the U.S. Dream Academy expands to serve more children, the sustained quality assurance component becomes ever more critical.
For More Information
Additional information about the QAS, including ordering information, is available online at: http://qas.foundationsinc.org/start.asp?st=1 or by visiting www.afterschooled.org
Contact
Rhe McLaughlin, Associate Executive Director
Center for Afterschool Education
Foundations, Inc.
Moorestown West Corporate Center
2 Executive Drive, Suite 4
Moorestown, [email protected]
Purpose and History
The School-Age Care Environment Rating Scale (SACERS) is designed to assess before- and after-school care programs for elementary school age children (5- to 12-year-olds) as well as whole-day programs in communities with year-round schools. It focuses on "process quality," or social or educational interactions going on in the setting, as well as program features related to space, schedule, materials and activities that support those interactions.
The SACERS was developed for self-assessment, program monitoring or program improvement planning, as well as for research and evaluation. It can be used by program staff as well as trained external observers or researchers. While self-described as appropriate for "group care programs," the SACERS has been used in a range of program environments beyond child care centers, including school-based after-school programs and community-based organizations such as YMCAs and Boys and Girls Clubs.
The SACERS, published in 1996 but updated periodically since then, is one of a series of program assessment instruments developed by researchers affiliated with the Frank Porter Graham Child Development Institute (FPG). As such, the SACERS is an adaptation of the Early Childhood Environment Rating Scale (ECERS) and is quite similar in format and mechanics to the ECERS, the Family Day Care Rating Scale (FDCRS) and the Infant/Toddler Environment Rating Scale (ITERS). Some states and localities have used several scales within the series to create continuity across accreditation or accountability systems, given the consistent orientation, language, format and scoring techniques.
Content
The SACERS measures process quality as well as corresponding structural features of programs. Its content reflects the notion that quality programs address three "basic needs" of children: protection of their health and safety, positive relationships and opportunities for stimulation and learning. These three basic components of quality care are considered equally important. They manifest themselves in tangible, observable ways and constitute the key aspects of process quality included in the SACERS. The seven sub-scales of the SACERS include:
• Space and Furnishings;
• Health and Safety;
• Activities;
• Interactions;
• Program Structure;
• Staff Development; and
• Special Needs Supplement.
By addressing both process quality as well as structural features that relate to process quality (and other structural matters not directly related to process quality, such as health policy), the SACERS puts as much emphasis, if not more, on program resources and the organization of those resources as it does on social processes that occur within the setting. This reflects its roots in the assessment and monitoring of environments serving young children. There are items on the SACERS that address each of the features of positive developmental settings outlined by the National Research Council (2002), with the most emphasis (the largest number of relevant items) clustering under the "physical and psychological safety" feature.
Interactions Sub-Scale Items
• Greeting/Departing
• Staff-child Interactions
• Staff-child Communication
• Staff Supervision of Children
• Discipline
• Peer Interactions
• Interactions Between Staff & Parents
• Staff Interaction
• Relationships Between Program Staff & Classroom Teachers
Developed by Frank Porter Graham Child Development Institute & Concordia University, Montreal
Structure and Methodology
The structure of the SACERS is straightforward and consistent with the other tools in the Environment Rating Scales series. The scale includes 49 items in the seven subscales mentioned above (see box for the items in the "Interactions" sub-scale). All of the sub-scales and items are organized into one booklet that includes the items, directions for use and scoring sheets.
While observation is the main form of data collection the instrument is built around, there are several items that are not likely to be observed during program visits. While the SACERS does not separate those items out into a separate interview scale or form, raters are encouraged to ask questions of a director or staff person in order to rate these items and are provided with specific sample questions that will help them get the necessary information to complete the form.
All 49 items are rated on a seven-point scale, with one being "inadequate" and seven being "excellent." Concrete descriptions of what each item looks like at a one, three, five and seven are provided (see examples below). Notes for clarification that help the user understand what they should be looking for are also provided for many items. Observers compile their scores onto a summary score sheet, which encourages users to compile ratings and create an overall average program quality score.
The SACERS is meant to be used while observing one group at a time, for a period of three hours. A sample of one-third to one-half of groups (when programs have children divided into groups or classrooms) is required to establish a score for an entire program.
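The scoring procedure described above – rating each item 1-7 per observed group, then averaging across a sample of groups to reach a program score – can be sketched as follows. All group names and ratings below are hypothetical; this is a minimal illustration of the arithmetic, not part of the SACERS materials.

```python
# Minimal sketch of SACERS-style scoring: each observed group receives
# 1-7 ratings on each item; a group's score is the mean of its item
# ratings, and the program score is the mean across sampled groups.
# All ratings below are hypothetical.

def group_score(item_ratings):
    """Average of the 1-7 item ratings for one observed group."""
    return sum(item_ratings) / len(item_ratings)

def program_score(groups):
    """Average of group scores across the sampled groups."""
    return sum(group_score(r) for r in groups.values()) / len(groups)

# A program with children divided into classrooms; observers sample
# one-third to one-half of the groups, here two of three.
sampled = {
    "Room A": [5, 6, 4, 7, 5],
    "Room B": [3, 4, 4, 5, 4],
}

print(round(program_score(sampled), 2))  # mean of 5.4 and 4.0 -> 4.7
```

In practice the summary score sheet plays the role of the `sampled` dictionary here: item ratings are compiled per group and then combined into one overall average.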
Example Item: Staff-Child Communications (rated on the seven-point scale: 1 = inadequate, 3 = minimal, 5 = good, 7 = excellent)

Inadequate (1)
• Staff-child communication is used primarily to control children's behavior & manage routines.
• Children's talk not encouraged.

Minimal (3)
• Staff initiate brief conversations (e.g., ask questions that can be answered with yes/no; limited turn-taking in conversations).
• Limited response by staff to child-initiated conversations & questions.

Good (5)
• Staff-child conversations are frequent.
• Turn-taking in conversation between staff & child is encouraged (e.g., staff listen as well as talk).
• Language is used primarily by staff to exchange information with children & for social interactions.

Excellent (7)
• Children are asked "why, how, what if" questions which require longer, more complex answers.
• Staff make effort to talk with each child (e.g., listen to child's description of school day, including problems & successes).
• Staff verbally expand on ideas presented by children (e.g., add information, ask questions to encourage children to explore ideas).

Technical Properties
In the case of the SACERS, psychometric evidence demonstrates that observations by different raters are consistent (interrater reliability) and that the instrument's scales consist of items that cluster together in meaningful ways (internal consistency). Preliminary evidence also exists for concurrent validity, suggesting the SACERS may be an accurate measure of program practices that predict related outcomes.15 The information presented here is reported by Harms, Jacobs and White (1996).
Interrater Reliability
To examine interrater reliability, or the degree to which different raters agree when observing the same program, paired raters assessed 24 programs using the measure. Researchers tested interrater reliability with the SACERS scales and total score using kappa scores and intraclass correlation coefficients. All reliability coefficients were near or above 0.70, suggesting strong agreement. In other words, with adequate training for raters, scores will not depend on which rater is evaluating a given program.
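Cohen's kappa, one of the statistics mentioned above, corrects raw percent agreement for the agreement two raters would reach by chance alone. A minimal pure-Python sketch of the computation follows; the paired ratings are invented, not data from the Harms, Jacobs and White study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical scores on the same programs.

    kappa = (p_observed - p_chance) / (1 - p_chance), where p_chance is
    the agreement expected if the raters scored independently.
    """
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical 1-7 ratings by two raters on ten programs: they agree
# on eight of ten, disagreeing by one scale point on the other two.
a = [5, 6, 4, 7, 5, 3, 6, 5, 4, 6]
b = [5, 6, 4, 6, 5, 3, 6, 5, 4, 7]
print(round(cohens_kappa(a, b), 2))  # 0.74
```

Note that the raw agreement here is 0.80 but kappa is lower, since some of that agreement would occur by chance; this gap is why kappa is the preferred statistic.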
Internal Consistency
Researchers examined how consistent individual item scores are within each respective SACERS scale, since all of the items within a particular scale are intended to measure a particular concept (e.g., Health and Safety). Internal consistency of the scales and the total score was strong, with alpha values ranging from .67 to .95. High internal consistency strengthens the argument that the items jointly represent the central concept of interest.
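The alpha values reported above are Cronbach's alpha, which compares the sum of the individual item variances to the variance of the total scale score. A hedged sketch with made-up ratings for one three-item scale:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item,
    one score per program): alpha = k/(k-1) * (1 - sum of item variances
    / variance of the total score)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical 1-7 items that rise and fall together across five
# programs, as items within one SACERS scale are expected to do.
scale_items = [
    [7, 5, 3, 6, 4],
    [6, 5, 2, 6, 3],
    [7, 4, 3, 5, 4],
]
print(round(cronbach_alpha(scale_items), 2))  # 0.96
```

When the items covary strongly, as these do, the total-score variance dwarfs the summed item variances and alpha approaches 1.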
Convergent Validity
Convergent validity is examined by comparing the findings from the instrument of interest to a similar assessment tool, to help demonstrate the instrument's ability to measure what it is supposed to measure. Findings from three of the SACERS scales were compared to ratings with Vandell and Pierce's Program Quality Observation Scale (by the authors and colleagues of the PQO, also reviewed in this report). Evidence indicated that each of these three SACERS scales (Program Structure, Activities and Interactions) was related to similar PQO items in expected ways. Specifically, Vandell and Pierce (1998) found the following relationships between PQO and SACERS scales in 46 after-school programs: (1) SACERS Program Structure was positively related to PQO Programming Flexibility, (2) SACERS Activities was positively related to PQO Available Activities, and (3) SACERS Interactions was positively related to PQO Staff Positive Regard and Positive Behavior Management, and negatively related to PQO Staff Negative Regard and Negative Behavior Management. Convergent validity evidence is unavailable for the other SACERS scales.
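A convergent validity check like the SACERS-PQO comparison reduces to correlating two instruments' scores for the same programs. The sketch below computes a Pearson correlation from scratch; the scale scores are invented for illustration, not values from the Vandell and Pierce study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two instruments' scores for the
    same set of programs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for six programs on two scales that are supposed
# to measure similar constructs (e.g., each tool's "interactions"
# scale). A strong positive r is evidence of convergent validity.
scale_one = [4.2, 5.8, 3.1, 6.4, 5.0, 2.7]
scale_two = [3.9, 5.5, 3.4, 6.1, 4.6, 3.0]
print(round(pearson_r(scale_one, scale_two), 2))
```

A correlation near +1 means the two instruments rank programs almost identically; a negative correlation, as with the PQO "negative regard" items, is expected when one scale measures the absence of what the other measures.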
Concurrent Validity
To determine whether SACERS accurately measures program quality, developers examined whether the instrument's ratings were related to distinct, theoretically important concepts in expected ways. Additional concurrent validity evidence covers all of the scales and total score. Specifically, because prior research suggests program quality is related to staff education/training, researchers expected that if the SACERS scales were adequately measuring quality, they would be positively related to staff education/training. As Harms, Jacobs and White (1996) expected, Space and Furnishings, Interactions, and Program Structure, as well as the overall SACERS score (which can be thought of as general program quality), were moderately, positively correlated with a measure of staff education and training. However, they did not report parallel correlations for three additional scales (Health and Safety, Activities, or Staff Development); it is unclear whether they did not test these scales or whether they found them to be unrelated to staff education/training. The researchers also tested the validity of the scales by examining their relationship to staff-child ratio. As expected, they found that Health and Safety, Activities, and Staff Development were moderately related with child-staff ratio. They did not report correlations between the other scales and total score with staff-child ratio, and it is unclear whether they did not test these or whether they were uncorrelated with staff-child ratio.
Additional Validity Evidence
To explore the extent to which the SACERS adequately measures program quality, the developers asked nine experts to rate how much each item in the instrument related to their definition of high quality. Using a five-point scale (with five being a very important aspect of quality), the minimum average score was around a four and experts rated most items close to a five. These scores suggest that items adequately measure aspects of quality. However, since experts were not asked whether any aspects of quality were absent from the instrument, this should not be taken as evidence that program quality as a whole is adequately represented.

15 Except when noted, psychometric information is not available for the supplementary "special needs" items at the end of the instrument because none of the programs tested had exceptional children.
User Considerations
Ease of Use
The SACERS is very easy to use in terms of accessibility of format and language (it is currently available in English, French, German, and most recently, Spanish). Full instructions for using the scale are included in the booklet along with the items themselves, notes clarifying many of the items and a training guide with advice on preparing to use the scale, conducting a practice observation and determining interrater reliability. One blank score sheet is included in the booklet and additional score sheets can be ordered in packages of 30. The SACERS booklet is available for purchase through Teachers College Press at $15.95.
Developers suggest it takes approximately three hours to observe a program and complete the form (users are encouraged to check off indicators and make at least initial scoring decisions while observing). Acknowledging that quality can vary within the same center or program, the developers advise that the approach to observation and scoring reflect how programs are structured. If a program has children broken into several different classrooms, observers are encouraged to observe one-third to one-half of the groups in the program before creating an overall score.
Available Supports
Three- and five-day training workshops focused on the structure, rationale and scoring of the SACERS are available through the FPG Institute, as is additional information about the instrument and the other rating scales in the series. Specific guidance for how to conduct your own training with staff or other observers is provided in the SACERS booklet. Training to reliability takes an estimated 4-5 days, with reliability checks throughout.
FPG is currently soliciting input from users in the field to develop a practical manual for adult educators using any of the rating scales, which will include specific materials such as course syllabi and outlines. Forms have also been developed to assist with reporting and applying observations to program improvement plans. Users can sign up to join a listserv through the FPG Web site to interact with other users in the field and to hear about updates and other relevant developments.
Large-scale users of the rating scales can now work with a commercial software package – the ERS Data System – to enter and score their data. The Tablet PC version displays the items as seen in the print version, and scores are entered by tapping on the screen. Notes can also be written with a special pen; they are automatically translated into print text and can be incorporated into the summary reports. The software also has a module on interrater reliability which can be used to compare scores, reach consensus and determine reliability. Using the Web-based system, individual assessments can be automatically routed to a supervisor for quality assurance and feedback, and aggregate data analysis and organization- and program-level reporting can be provided.
Important information for updating the SACERS is available at www.fpg.unc.edu/~ecers, including additional Notes for Clarification and an expanded score sheet. Also, a revision of SACERS is forthcoming, as is a Youth rating scale for programs serving middle and high-school age youth.
In the Field
The state of Tennessee passed legislation in 2001 requiring all licensed child care centers and family/group homes in the state to be assessed using the Environment Rating Scales (including SACERS). The resulting Child Care Evaluation and Report Card program has two parts, one mandatory and one voluntary, both of which are structured around the Environment Rating Scales to assess the quality of care provided at specific facilities. In the mandatory part of the program, the ERS assessment is one of several components of an overall "report card" given to each provider that must be posted along with their annual license.
The voluntary part of the program ties the ERS-based assessment to reimbursements. In the Star-Quality Child Care program, overall assessment scores for participating providers are converted into one, two, or three stars, which in turn can increase the provider's state reimbursement by 5, 10 or 15 percent respectively. To support participation in both the mandatory and voluntary programs, local Technical Assistance Units provide assistance, at no charge, to any provider that wants information on how to improve quality and thereby increase its assessment score.
The Tennessee Department of Human Services (TDHS) works with the University of Tennessee and several other organizations to implement and manage this program. TDHS and UT's Social Work Office of Research and Public Service manage the program, and Tennessee State University prepares and delivers the initial training for assessors. Eleven resource centers around the state house an Assessment and Technical Assistance Unit. These units, which are responsible for conducting all the ERS-based assessments, hire and employ about 60 assessors statewide. Assessors receive ongoing training and frequent reliability checks by assessment specialists at UT.
The assessment process takes place in conjunction with license renewal. A database has been developed that provides access to regularly updated statistical and demographic information about the program's success in promoting, supporting and increasing quality child care across the state.
The SACERS and other scales in this series are part of many other state quality rating systems, including those in North Carolina, Mississippi, Arkansas and Pennsylvania.
For More Information
Additional information about the SACERS, supplementary materials and ordering information is available online at: www.fpg.unc.edu/~ecers/
Contact
Thelma Harms, Director of Curriculum Development
Frank Porter Graham Child Development Institute
517 S. Greensboro Street
Carrboro, NC
[email protected]
Purpose and History
The Youth Program Quality Assessment (YPQA) is an instrument designed to evaluate the quality of youth-serving programs. While its practical uses include both program assessment and program improvement, its overall purpose is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff.
While some quality assessment tools and processes focus on the whole organization, the YPQA is primarily focused on what the developers refer to as the "point of service" – the delivery of key developmental experiences and young people's access to those experiences. Although some structural and organizational management issues are included in the instrument, it focuses primarily on those features of programs that can be observed and that staff have control over and can be empowered to change. While these social processes have not always been emphasized in licensing and regulatory processes, research suggests they are critical in influencing program quality and outcomes for youth. Given this focus, the YPQA is expected to assess program quality most accurately when users observe program offerings (programmatic experiences consisting of the same staff, children and learning purpose across multiple sessions).
The YPQA has its roots in a long lineage of quality measurement rubrics developed by the High/Scope Educational Research Foundation over the past several decades for pre-school, elementary and now youth programs. In its initial iteration, the instrument was developed specifically to assess implementation of the High/Scope participatory learning approach. In its current form, the tool is relevant for a wide range of community- and school-based youth-serving settings that serve grades 4-12. It has been used in a range of after-school, camp, youth development, prevention and juvenile justice programs. It is not necessarily appropriate for use in highly unstructured settings that lack facilitated activities.
Content
The YPQA measures factors at the Program Offering level and the Organizational level that affect quality at the "point of service." The seven major domains (called sub-scales in the tool) that are covered include Engagement, Interaction, Supportive Environment, Safe Environment, Youth-centered Policies and Practices, High Expectations and Access.
Because of the focus on the "point of service," the YPQA emphasizes social processes – or interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.
The content of the YPQA aligns well with the National Research Council's features of positive developmental settings (2002), with the least emphasis on what is referred to by the NRC as "integration of family, school and community efforts." The content of the YPQA has also been reviewed against and appears compatible with Jim Connell and Michelle Gambone's youth development framework (2002).
Structure and Methodology
The seven topics or domains covered by the YPQA are measured by two different overall scales (groups of related items) that require different data collection methods. The program offering items are included in Form A and are assessed through observation. Form B includes the organization-level items, which essentially assess the quality of organizational support for the program offering-level items that are the focus of Form A. Evidence for Form B is gathered through a combination of guided interview and survey methods.
Developed by the David P. Weikart Center for Youth Program Quality16

16 The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.

The seven domains can be graphically represented by the "pyramid of program quality" (see below), which represents both an empirical reality and a unified framework for understanding and improving quality. From an empirical perspective, assessments using the YPQA thus far follow a distinct pattern – most programs score highest in safety and then progressively lower as you move up the levels of the pyramid through support, interaction and engagement. Programs that score high in engagement and interaction appear most able to influence positive youth outcomes (see technical properties for more detail on the validity study).

The scale used throughout the YPQA is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, very concrete descriptors are provided to illustrate what a score of 1, 3 or 5 looks like (see example below). The scoring for Forms A and B is consistent, but in the case of Form B, evidence to drive the scoring is based on an interview as opposed to observations. Observers are encouraged to write down evidence or examples that support the score that has been applied.

Pyramid of Program Quality (from base to peak):
• Safe Environment – Psychological & Emotional Safety; Physically Safe Environment; Emergency Procedures & Supplies; Program Space & Furniture; Healthy Food & Drinks
• Supportive Environment – Welcoming Atmosphere; Appropriate Session Flow; Active Engagement; Skill Building; Encouragement; Reframing Conflict
• Interaction – Experience a Sense of Belonging; Be in Small Groups; Partner With Adults; Lead & Mentor
• Engagement – Set Goals & Make Plans; Make Choices; Reflect

Organization-level scales:
• High Expectations – Staff Development; Supportive Social Norms; High Expectations for Young People; Committed to Program Improvement
• Youth Centered Policies & Practices – Staff Qualifications Support Positive Youth Development; Tap Youth Interests & Build Skills; Youth Influence Settings & Activities; Youth Influence Structure & Policy
• Access – Staff Availability & Longevity; Program Schedules; Barriers Addressed; Families, Other Organizations, Schools
Technical Properties
Extensive psychometric evidence about the YPQA is primarily available from three studies. The first, referred to as the Validation Study, examined the reliability and validity of the instrument's scales with a sample of 59 organizations, most of which were after-school programs (Smith & Hohmann, 2005). The findings suggest the instrument has many good psychometric properties. Three of the seven scales, however, did not perform well in one or more psychometric areas.
The second study, referred to as the Self-Assessment Pilot Study, included a sample of 24 sites and specifically examined the YPQA's use as a self-assessment tool for after-school programs (Smith, 2005). This is the only study mentioned in this report that asked programs to assess themselves rather than relying on independent researchers to collect data. This study examined the concurrent validity of the YPQA and found preliminary support for the total score and several scales. Similar to the first study, some scales exhibited problems with internal consistency.
The third study, referred to as the Palm Beach Quality Improvement System (QIS) Pilot Study, used a modified form of the YPQA known as the PBQ-PQA to assess program quality in 38 sites. The PBQ-PQA had similar, but not identical, scales compared to the YPQA (Smith, Akiva, Blazevski & Pelle, 2008).
In addition to these three studies, the developers also conducted additional interrater reliability analyses for the program offerings section of the instrument. They have also begun using techniques that provide more refined and detailed analyses of reliability and validity than traditional methods (see pages 16-17). In related work, CYPQ has just finished a validity study on a younger-youth version of the PQA (grades K-4).
Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Smith and Hohmann (2005) examined average scores and spread for each of the scales and total scores for the Program Offerings and Organization items and found that all of the scales and total score had good distributions except for Safe Environment and Access (which each had means of 4.4 out of a possible 5.0). Most programs scored very high on these scales, making it hard to capture reliable differences. For Safe Environment, it may be realistic to assume that nearly all programs are relatively safe, particularly since the scores from this scale were validated by findings from a youth survey (see section on concurrent validity). However, additional evidence is needed to determine whether nearly all programs are high on Access, or whether there are meaningful differences that are not being picked up because the items are "too easy." In the latter case, the items could be revised to better capture differences between programs.

Example Item: II. Supportive Environment – II-I. Staff Support Youth in Building New Skills (a "Supporting Evidence/Anecdotes" column is provided for observer notes; n/o = not observed, scored 1)

Indicator 1 (n/o = 1):
• (1) Youth are not encouraged to try out new skills or attempt higher levels of performance.
• (3) Some youth are encouraged to try out new skills or attempt higher levels of performance but others are not.
• (5) All youth are encouraged to try out new skills or attempt higher levels of performance.

Indicator 2 (n/o = 1):
• (1) Some youth who try out new skills with imperfect results, errors or failure are informed of their errors (e.g., "That's wrong.") and/or are corrected, criticized, made fun of, or punished by staff without explanation.
• (3) Some youth who try out new skills receive support from staff who problem-solve with youth despite imperfect results, errors, or failure, and/or some youth are corrected with an explanation.
• (5) All youth who try out new skills receive support from staff despite imperfect results, errors, or failure; staff allow youth to learn from and correct mistakes and encourage youth to keep trying to improve their skills.
Interrater Reliability
Recent analyses suggest that the current version of the tool paired with improved training techniques produces moderate to high levels of interrater reliability. For the Program Offering items, High/Scope researchers have captured four paired-rater datasets over the past two years, for a total of 32 rater pairs, using live and video methods for testing agreement. One of these datasets was produced independently by the Children's Institute at the University of Rochester. All raters used the current version of the YPQA. Researchers found that across the rater pairs there was an average of 78 percent perfect agreement at the indicator level, which translates to an average maximum kappa coefficient of .66, close to the .70 benchmark for high interrater reliability. Similarly, the average item-level maximum kappa for the Program Offering items was also high at 0.72.
Findings suggest that the current version of the tool paired with rater training produces acceptable levels of interrater reliability for three of the four scales in the Program Offerings section. Specifically, the Safety, Support, and Engagement scales had acceptable reliabilities ranging between 0.66 and 0.73. The Interaction scale had moderate reliability (0.54).
Information for the Organization items (scales five through eight) comes from an earlier validation study by Smith and Hohmann (2005). The authors compared pairs of raters who examined the same programs at the same points in time. They examined the percentage of agreement across these items and found that the highest possible kappa was 0.68, very close to the .70 benchmark for high reliability.
Smith and Hohmann (2005) also examined interrater reliabilities of the three Organization scales, which is important because users will ultimately draw most of their conclusions from the scales, not the individual items. They examined agreement using a statistic known as the intraclass correlation coefficient (ICC), which examines the degree to which differences among all ratings have to do with the difference between raters or differences among the programs themselves. The Youth Centered Policies and Practices, High Expectations and Access scales all had high interrater reliability (ICC = .51, .90 & .73 respectively).
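The intuition behind the ICC can be sketched as a one-way variance decomposition: the coefficient is high when the differences between programs dominate the disagreement between raters scoring the same program. The function below implements the simple one-way ICC(1), which is not necessarily the exact variant Smith and Hohmann used, and the paired ratings are invented.

```python
def icc_oneway(ratings):
    """One-way ICC(1) for a table of ratings: one row per program, one
    column per rater. ICC = (MSB - MSW) / (MSB + (k-1) * MSW), where MSB
    and MSW are the between- and within-program mean squares."""
    n = len(ratings)      # number of programs
    k = len(ratings[0])   # raters per program
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ssb = k * sum((m - grand) ** 2 for m in row_means)
    ssw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    msb = ssb / (n - 1)
    msw = ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical scores from two raters on five programs: the raters
# differ by at most one point while the programs span the scale, so
# between-program variance dominates and the ICC is high.
paired = [[4, 5], [7, 7], [2, 3], [6, 6], [3, 3]]
print(round(icc_oneway(paired), 2))  # 0.95
```

If the raters disagreed as much as the programs differed, MSB and MSW would be comparable and the ICC would fall toward zero.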
Internal Consistency
Internal consistency indicates how closely related scores are for theoretically similar items. The Validation Study found that most of the YPQA scales exhibited acceptable internal consistency except for Safe Environment and Access. As noted above, this may have to do with the distributions of scores. Two items from an internally consistent scale go together, so that when item A is rated as high, item B is rated as high, and when A is low, B is also low. However, if A is always high (because all programs do well on it), whether or not B is high, internal consistency will be low.

One example of an item on which most organizations received the highest possible score in the Self-Assessment Pilot Study was, "The physical environment is safe and healthy for youth." If items such as this one are always high, we may not need to keep measuring them. However, if researchers believe that there is meaningful variation among programs, then these scales may need additional revision before we can be confident that their scores reliably measure the concepts that
they are supposed to measure. Similarly, Smith (2005) found in the Self-Assessment Pilot Study that these two scales had low internal consistency, but it also showed low internal consistency for two other scales: Youth Centered Policies and Practice and High Expectations for All Students and Staff. A possible explanation is that staff participating in the Self-Assessment Pilot Study were only given one day of training, whereas trained raters in the Validation Study may have been given more.
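The mechanism described above – an item on which nearly every program scores at the ceiling contributes little to internal consistency – can be demonstrated directly with Cronbach's alpha. The function and the ratings below are a self-contained hypothetical sketch, not YPQA data.

```python
def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    for a list of item-score lists (one score per program per item)."""
    k, n = len(items), len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

varied  = [7, 5, 3, 6, 4, 2, 6, 5]  # item that spreads programs out
echo    = [6, 5, 3, 6, 4, 3, 7, 5]  # tracks the first item closely
ceiling = [5, 5, 5, 5, 5, 4, 5, 5]  # nearly every program at the top

# Two items that covary yield high alpha; swapping one for a
# near-constant "ceiling" item drags alpha down even though nothing
# about the programs themselves changed.
print(round(cronbach_alpha([varied, echo]), 2))     # 0.96
print(round(cronbach_alpha([varied, ceiling]), 2))  # 0.43
```

This is why a scale full of "too easy" items can show low alpha without the underlying concept being incoherent: an item with almost no variance cannot covary with anything.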
One additional explanation for why internal consistency may have been lower on some scales could be that the concepts forming these scales are formative rather than reflective. As explained in the section on Additional Technical Considerations (pages 16-17), internal consistency tests are only appropriate when items are reflective, meaning that they all reflect the same underlying concept. Such items are closely related to one another and each represents a unique "attempt" to measure the concept of interest. However, internal consistency should not be used when items are formative, meaning that different components together make up or form a coherent set. For example, the Safe Environment scale may be more formative than reflective. A program that provides healthy food and drinks (as assessed by one item) may not necessarily have appropriate emergency procedures and supplies present (another item on the scale). However, even though these two items tap different underlying concepts (nutrition, safety in emergencies) and may not be closely related, their combination provides an important index of how a program promotes safety and health.
YPQA developers have begun examining whether some YPQA scales are formative versus reflective, and they are currently exploring whether certain items can be combined to form new, reflective scales.
Test-Retest Reliability
The Validation Study examined how much scores changed on multiple ratings over a period of three months. Correlations between assessments ranged from 0.81 to 0.98, indicating that ratings do not fluctuate widely over short periods of time. Long-term stability was not assessed, so we cannot offer any evidence on whether the YPQA is sensitive enough to detect long-term change.
Validity of Scale Structure
Each of the scales in the YPQA is supposed to measure a separate concept. A factor analysis examines which items are similar to each other and which are different. Smith and Hohmann (2005) conducted a factor analysis at both observation periods and found preliminary evidence that the Program Offering items (scales two through four) grouped together in ways similar to the scales. Safe Environment was not included in the factor analysis, and the authors acknowledge that the factor analysis did not support their expectations until they removed these items. Without the Safe Environment items, findings indicated that Supportive Environment and Opportunities for Interaction overlap and may not be entirely distinct. Validity support was strong for the Organization items (scales five through seven), which generally grouped together according to the theorized structure of the scales.
Convergent Validity
One way to examine whether an instrument actually measures aspects of program quality is to compare its scores to measures of identical or highly similar concepts. The Validation Study tested convergent validity by comparing all YPQA scales except Access and High Expectations to similar scales on a separate youth survey. For example, the Supportive Environment scale was compared to a Belonging scale on the youth survey. Correlational evidence indicates that the YPQA is moderately to strongly related to findings from the youth survey. The YPQA total scores for the observation and interview scales were also related to the youth survey total score. These results are encouraging for establishing validity.
In the Palm Beach Quality Improvement System Pilot Study, researchers examined the relationship between youth perceptions of program quality and modified versions of the four YPQA Form A domain scales. Scales on this form were similar to the original scales, but not identical. Authors found that youth perceptions of quality were related to the Interaction scale, but were unrelated to the Safe Environment, Supportive Environment, and Engagement scales. Although this evidence is mixed, this validity evidence may not apply to the current YPQA scales since the YPQA and the instrument used in this study are not completely identical.
Concurrent Validity
Concurrent validity is established when an instrument's items and scales are related to distinct but theoretically important concepts that are measured in the same time period. The Validation Study measured the validity of the total program quality score (created by averaging the various scale scores) by examining its relationship to expert ratings of the programs that were being evaluated. Specifically, experts rated programs based on youth centeredness and availability of resources. It is reasonable to expect that if the YPQA is indeed measuring program quality, then the total score would be related to these two expert-rated concepts. Using Pearson correlations as a measure of relatedness, Smith & Hohmann (2005) found strong evidence that the YPQA total score is related to expert ratings for these two domains, lending additional support that the instrument is indeed measuring program quality. They also tested the validity of the global program quality scores by comparing programs with trained staff to programs without trained staff. As expected, the programs with trained staff had higher global quality scores than those without, again lending support that the instrument can validly measure overall program quality.
The Validation Study also examined how well the instrument was associated with student experiences assessed by a separate youth survey (Smith & Hohmann, 2005). The following relationships were examined between YPQA and youth survey scales: (1) YPQA total score with the youth survey measure of overall program experiences, (2) YPQA Engaged Learning with measures of giving back to the community, youth growth, interest in the program, and challenging experiences, and (3) YPQA Interaction Opportunities with a measure of decision making in the program.
The authors found strong evidence for concurrent validity in that all of their hypothesized relationships were supported except for two (the Engagement scale was not related to youths' interest in the program or challenging experiences). However, this evidence is limited in that theoretically important relationships involving Form A's Safe Environment and Supportive Environment scales and the three Form B scales were not examined.
The Self-Assessment Pilot Study examined concurrent validity by correlating findings from the Supportive Environment and Engagement scales and the Program Offerings total score with a youth survey measure of staff support. Findings indicated a strong relationship between Supportive Environment and the youth survey. The Engagement scale was related in expected ways to a measure of program governance on the youth survey, and the Program Offerings total score was related in expected ways to academic support and peer relations. None of these relationships was statistically significant, perhaps because the sample size was so small (12 programs). Thus, these relationships should be considered promising but not definitive.
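The sample-size caveat can be made concrete: the significance test for a Pearson correlation uses t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, so with only 12 programs even a sizable r can fall short of the critical value. The correlation of 0.45 below is purely illustrative, not a value from the study; the critical t values are standard two-tailed .05 cutoffs.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a Pearson correlation differs
    from zero: t = r * sqrt((n - 2) / (1 - r**2)), with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

CRITICAL_T_10DF = 2.228  # two-tailed, p < .05, df = 10 (n = 12)
CRITICAL_T_58DF = 2.002  # two-tailed, p < .05, df = 58 (n = 60)

r = 0.45  # a sizable-looking but hypothetical correlation
print(t_for_r(r, 12) > CRITICAL_T_10DF)  # fails to reach significance
print(t_for_r(r, 60) > CRITICAL_T_58DF)  # comfortably significant
```

The same correlation that is non-significant across 12 programs would be clearly significant across 60, which is why these pilot results are described as promising rather than definitive.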
The Palm Beach Quality Improvement System Pilot Study evaluated the concurrent validity of modified versions of the four Form A scales by examining their relationships with two scales from a youth survey: positive affect and challenging experiences. Higher positive affect scores were related to higher scores on YPQA Interaction Opportunities, but positive affect was unrelated to the other three scales. Higher scores on challenging experiences were related to higher scores on YPQA Engaged Learning, but challenging experiences were unrelated to the other three scales. In addition to the results being mixed, this validity evidence may not apply to the YPQA scales since they were not identical to the ones used in this study.
The concurrent validity evidence is promising but limited at this point. Additional support is needed for several of the instrument's scales.
Variations in Quality Across Different Contexts
Program quality may vary across different contexts, such as different offerings and how many sessions children and youth have had with one another. It is important to know if an instrument is sensitive to these types of differences, because if so, then users will need to conduct observations across a range of contexts. For example, if quality scores vary across different types of activities within a program, then users will need to observe a wide range of activities to obtain a complete picture of quality.
Developers have begun examining how the YPQA performs across three important program components: individual offerings, the content of these offerings, and how many sessions the children and staff have had together. Developers also examined variation across two combinations of these components. For example, does quality for some offerings stay constant throughout the year whereas quality for other offerings improves from the beginning to the end of the year? Are quality scores relatively similar in certain content areas regardless of which agencies are being observed, whereas quality scores in other content areas vary across agencies? Currently, evidence is only available for the Interactions scale. Findings indicated that in addition to detecting quality differences across agencies, the Interactions scale was sensitive to differences across various types of offerings and content. However, measured quality did not vary by the number of sessions children and staff had together or across different combinations of programmatic components. These findings suggest that users of the YPQA should conduct observations across different types of offerings and content areas to obtain an accurate Interactions score.
In addition, even though agreement among raters was acceptable in other studies (as indicated in the section on interrater reliability), the developers found reliable differences among ratings given by different raters. This suggests that even good reliability among raters does not mean that rater effects should be ignored – a finding that probably extends to all the instruments in this compendium.
No evidence on how quality varies across contexts is currently available for the other YPQA scales, and it is also possible that the instrument is sensitive to other differences besides the ones already examined (e.g., time of day).
User Considerations
Ease of Use
The YPQA was developed with and for both practitioners and researchers; as a result, the language is accessible and the format and scoring process are user-friendly. The administration manual and the introductions to Forms A and B offer users a summary of the purpose and benefits of the tool, definitions of key terms used (e.g., scale, sub-scale, offering, item) and clear steps that walk users through the observation and scoring process. While training is recommended, the manuals themselves are self-explanatory. A “starter pack” that includes an administration manual, Form A and Form B can be ordered online for $39.95.
Users of the YPQA are encouraged to conduct a running record of what occurs during a relatively extensive program observation, as opposed to capturing several short snapshots of programming, because developers believe activities have a certain flow that is important to try to observe. This is particularly important if the goal is to come up with a reliable and valid score for an individual program, as opposed to aggregating a large sample of observations for research purposes. Developers estimate that generating a score for a program, based on both Forms A and B, takes a minimum of approximately six hours for a single staff person. Roughly four of those hours are typically spent observing and interviewing within the program and another two hours writing up and scoring the instrument.
Available Supports
The Weikart Center offers YPQA training periodically around the country, and online training will soon be available. The one-day workshop, YPQA Basics, introduces the observation and evidence gathering method, familiarizes participants with each item and indicator and prepares staff to conduct the program self-assessment method of evidence gathering and scoring. The two-day YPQA Intermediate workshop covers all the material from the one-day workshop, gives participants substantial practice scoring the tool using written scenarios and video, brings participants to acceptable levels of interrater reliability and prepares staff to conduct the external assessment method of evidence gathering and scoring. The three-day workshop covers all the material from these two trainings and includes a site visit (during which the participants score a youth program) and an analysis of the scoring efforts.
In the past year, the Weikart Center has developed a set of management-focused trainings to assist site managers in leading their programs through a data-driven quality improvement process.
The Weikart Center also offers 12 youth development trainings that are aligned with the content of the YPQA. Following a self-assessment or evaluation process, for example, program directors can assemble a tailored staff training experience based on specific areas within the YPQA where the assessment showed work was needed.
An electronic “scores reporter” is currently available from the Weikart Center (and is free to those who purchase the instrument). A more sophisticated Web-based data management system is currently under development. This will allow individual programs or networks to join, go online to enter and analyze data and see their results at various levels of aggregation.
In the Field
The Rhode Island state 21st CCLC program has partnered with the Center for Youth Program Quality in a multi-year quality assessment process using a customized tool based on the research-validated PQA. The Rhode Island Program Quality Assessment (RIPQA), through a joint partnership of the Rhode Island AfterSchool Plus Alliance, the Providence After School Alliance and the Rhode Island Department of Education, is currently used by after-school programs across the city of Providence and throughout the state, including all 21st CCLC funded programs. Participating programs conduct an annual self-assessment using the RIPQA. To support their efforts, a Weikart Center-trained Quality Advisor works with programs to jointly observe program offerings with site staff and then works one-on-one with agencies to develop quality improvement plans based on those observations.
As an additional component of this effort, the Weikart Center has also conducted a randomized field trial to test its full training model. Based on 100 interviews with site supervisors, researchers have found that engaging providers in the observation and reflection process has been well-received across the board. The quality advisor and site-based technical support have been a very important part of the process, especially for those providers with limited capacity. Aggregated system-wide quality data are used to design and coordinate system-wide professional development offerings around the needs that surface through assessment.
According to Elizabeth Devaney, Director of Quality Initiatives at the Providence After School Alliance, the quality improvement effort has “strengthened our position and ability to attract public and private resources to grow the system, and is an important strategy for sustainability going forward.”
For More Information
Information about the YPQA, including ordering information, is available online at: www.highscope.org/content.asp?contentid=117
Contact: Charles Smith, Director
David P. Weikart Center for Youth Program Quality
Centennial Plaza Building, Suite 601
124 Pearl Street
Ypsilanti, MI
[email protected]
Arbreton, A., Goldsmith, J. & Sheldon, J. (2005). Launching literacy in after-school programs: Early lessons from the CORAL initiative. Philadelphia, PA: Public/Private Ventures.

Arbreton, A., Sheldon, J., Bradshaw, M., & Goldsmith, J. with Jucovy, L. & Pepper, S. (2008). Advancing achievement: Findings from an independent evaluation of a major after-school initiative. Philadelphia, PA: Public/Private Ventures.

Birmingham, J., Pechman, E., Russell, C., & Mielke, M. (2005). Shared features of high-performing after-school programs: A follow-up to the TASC evaluation. Washington, DC: Policy Studies Associates, Inc.

Connell, J., & Gambone, M. (2002). Youth development in community settings: A community action framework. Philadelphia, PA: Youth Development Strategies Inc.

Durlak, J. & Weissberg, R. (2007). The impact of after-school programs that promote personal and social skills. Chicago, IL: Collaborative for Academic, Social, and Emotional Learning.

Harms, T., Jacobs, E., & White, D. (1996). School-age care environment scale. New York, NY: Teachers College Press.

Intercultural Center for Research in Education, & National Institute on Out-of-School Time (2005). Pathways to success for youth: What works in afterschool: A report of the Massachusetts Afterschool Research Study (MARS). Boston, MA: United Way of Massachusetts Bay.

Kim, J., Miller, T., Reisner, E. & Walking Eagle, K. (2005). Evaluation of New Jersey After 3: First-year report on programs and participants. Washington, DC: Policy Studies Associates, Inc.

Knowlton, J., & Cryer, D. (1994). Field test of the ASQ program observation for reliability and validity. Chapel Hill, NC: Authors.

MacKenzie, S., Podsakoff, P., & Jarvis, C. (2005). The problem of measurement model misspecification in behavioral and organizational research and some recommended solutions. Journal of Applied Psychology, 90(4), 710-730.

Martinez, A., & Raudenbush, S. W. (2008). Measuring and improving program quality: Reliability and statistical power. In M. Shinn & H. Yoshikawa (Eds.), Toward positive youth development: Transforming schools and community programs (pp. 333-349). New York, NY: Oxford University Press.

National Research Council and Institute of Medicine. (2002). Community programs to promote youth development. Eccles, J. and Gootman, J., eds. Washington, DC: National Academy Press.

Pechman, E., Mielke, M., Russell, C., White, R. & Cooc, N. (2008). Out-of-school time observation instrument: Report of the validation study. Washington, DC: Policy Studies Associates, Inc.

Pierce, K. M., Bolt, D. M., & Vandell, D. L. (in press). Specific features of after-school program quality: Associations with children's functioning in middle childhood. American Journal of Community Psychology.

Pierce, K., Hamm, J., Sisco, C., & Gmeinder, K. (1995). A comparison of formal after-school program types. Poster session presented at the biennial meeting of the Society for Research in Child Development, Indianapolis, IN.

Pierce, K., Hamm, J., & Vandell, D. (1999). Experiences in after-school programs and children's adjustment in first-grade classrooms. Child Development, 70(3), 756-767.

Raudenbush, S., Martinez, A., Bloom, H., Zhu, P., & Lin, F. (2008). An eight-step paradigm for studying the reliability of group-level measures. Chicago, IL: University of Chicago.

Russell, C., Mielke, M., & Reisner, E. (2008). Evaluation of the New York City Department of Youth and Community Development Out-of-School Time Programs for Youth Initiative: Results of efforts to increase program quality and scale in year 2. Washington, DC: Policy Studies Associates, Inc.
Russell, C., Reisner, E., Pearson, L., Afolabi, K., Miller, T., & Mielke, M. (2006). Evaluation of the Out-of-School Time Initiative: Report on the first year. Washington, DC: Policy Studies Associates, Inc.

Seidman, E., Tseng, V., & Weisner, T. (2006, February). Social setting theory and measurement. In William T. Grant Foundation Report and Resource Guide 2005-2006. New York, NY: William T. Grant Foundation.

Smith, C. (2005). Measuring quality in Michigan's 21st Century afterschool programs: The Youth PQA self-assessment pilot study. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., Akiva, T., Blazevski, J., & Pelle, L. (2008, January). Final report on the Palm Beach Quality Improvement System pilot: Model implementation and program quality improvement in 38 after-school programs. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., & Hohmann, C. (2005). Youth Program Quality Assessment youth validation study: Findings for instrument validation. Ypsilanti, MI: High/Scope Educational Research Foundation.

Spielberger, J. & Lockaby, T. (2008). Palm Beach County's Prime Time initiative: Improving the quality of after-school programs. Chicago, IL: Chapin Hall Center for Children at the University of Chicago.

Vandell, D. L., Reisner, E. R., & Pierce, K. M. (2007). Outcomes linked to high-quality afterschool programs: Longitudinal findings from the study of promising afterschool programs. Unpublished manuscript. Policy Studies Associates, Inc.

Vandell, D., Pierce, K., Brown, B., Lee, D., Bolt, D., Dadisman, K., Pechman, E., & Reisner, E. (2006). Developmental outcomes associated with the after-school contexts of low-income children and youth. Unpublished manuscript.

Vandell, D., & Pierce, K. (2006). Study of after-school care: Program quality observation. Retrieved online at www.wcer.wisc.edu/childcare/pdf/asc/program_quality_observation_manual.pdf.

Vandell, D., & Pierce, K. (2001, April). Experiences in after-school programs and child well-being. In J. L. Mahoney (Chair), Protective aspects of after-school activities: Processes and mechanisms. Paper symposium conducted at the biennial meeting of the Society for Research in Child Development, Minneapolis, MN.

Vandell, D., Reisner, E., Pierce, K., Brown, B., Lee, D., Bolt, D., & Pechman, E. (2006). The study of promising after-school programs: Examination of longer term outcomes after two years of program experiences. Madison, WI: Wisconsin Center for Education Research, University of Wisconsin-Madison.

Vandell, D. & Pierce, K. (1998). Measures used in the study of after-school care: Psychometric properties and validity information. Unpublished manual, University of Wisconsin-Madison.

Walking Eagle, K., Miller, T., Reisner, E., LeFleur, J., Mielke, M., Edwards, S., & Farber, M. (2008). Increasing opportunities for academic and social development in 2006-07: Evaluation of New Jersey After 3. Washington, DC: Policy Studies Associates, Inc.

Westmoreland, H. & Little, P. (2006). Exploring quality standards for middle school afterschool programs: What we know and what we need to know: A summit report. Cambridge, MA: Harvard Family Research Project. Retrieved online at www.gse.harvard.edu/hfrp/content/projects/afterschool/conference/summit-2005-summary.pdf.
Psychometrics: What Are They and Why Are They Useful?
The youth program Janice works for is interested in self-assessment and is looking for a tool that measures the overall quality of the program. After looking over several options, she settles on an instrument that seems easy to use, with questions that seem relevant to the organization's goals. Unfortunately, she encounters a number of problems once she starts using the instrument. First, the observers interpret questions very differently, leading to disputes over their assessments of quality. Second, the individual item scores don't seem to form a coherent picture of the program. Third, the findings are unrelated to youth outcomes that should be directly related to program quality. All of these issues make Janice question whether the instrument measures program quality as well as it should.
The instrument Janice chose looked useful on the surface, but its field performance was not particularly helpful. Psychometric information might have helped Janice understand the strengths and weaknesses of the instrument before she used it. Psychometrics are statistics that help researchers evaluate instruments' field performance. Psychometric information can be divided into several categories.
Reliability
An instrument's ability to generate consistent answers or responses. The most common analogy used to understand reliability is a game of darts. If a player's darts consistently land on the same location on the board, we would say that the dart player has excellent reliability (whether or not that place is the center of the board). The same is true for research instruments that yield predictable and consistent information. There are various types of reliability, discussed below.
Interrater Reliability
The extent to which trained raters agree when evaluating the same program at the same time.
For accurate program assessments, users should choose instruments that yield reliable information regardless of the whims or personalities of individual raters. When findings depend largely on who is rating the program (e.g., if Rater A is more likely to give favorable scores than Rater B), it is hard to get a sense of the program's actual strengths and weaknesses. For this reason, organizations should consider the interrater reliability of various measures even if only one rater will be rating the program. Poor interrater reliability often stems from ambiguous questions that leave a lot of room for individual interpretation, and such ambiguity is not always immediately apparent from looking at the instrument.
Several methods exist to measure interrater reliability. Many of the instruments in this report give the percentage of times that raters agree on a given item (allowing a one-point difference to count as agreement). While this method is common, it is not as useful as other statistics. When available, we instead report two other statistics known as kappa and the intraclass correlation. Values of kappa near or above .70 indicate high reliability, and this value is often considered the benchmark for a strong, reliable instrument. Other researchers state that kappa values starting at .60 indicate substantial/strong agreement, whereas values ranging from .40 to .59 indicate moderate agreement. Similar guidelines do not yet exist for the intraclass correlation, but this report considers values close to or above .50 to indicate high reliability.
The reason that percentage agreement does not sufficiently represent reliability is that it does not account for those instances where raters agree simply by chance, whereas kappa scores and intraclass correlations do. In many cases, what looks like high interrater agreement may actually have a low kappa score or intraclass correlation coefficient. When kappa scores or intraclass correlations are not available for an instrument, we provide an estimate of kappa. Readers should know that the estimate is the best possible score based on the available information, though it is possible the actual kappa score is much lower (indicating worse reliability).
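The gap between raw agreement and chance-corrected agreement is easy to see with a small calculation. The sketch below (in Python, with invented ratings rather than data from any instrument in this guide) computes both statistics for two raters: because most ratings fall in a single category, raw agreement looks strong while Cohen's kappa is only moderate.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave identical ratings."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_obs = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category,
    # given each rater's own rating frequencies.
    p_chance = sum((count_a[c] / n) * (count_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (p_obs - p_chance) / (1 - p_chance)

# Two raters scoring ten items on a 1-3 scale. Most ratings are "3",
# so much of the apparent agreement could occur by chance.
a = [3, 3, 3, 3, 3, 3, 3, 3, 2, 1]
b = [3, 3, 3, 3, 3, 3, 3, 3, 1, 2]
print(round(percent_agreement(a, b), 2))  # 0.8
print(round(cohens_kappa(a, b), 2))       # 0.41
```

Raw agreement is 80%, but kappa lands around .41, i.e., only moderate agreement by the guidelines above.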
It is important to note that interrater reliability statistics assume that all raters have been adequately trained on the instrument. Some instruments' developers offer training for raters. If you cannot receive formal training on an instrument, it is still important to train raters yourself before conducting an evaluation. Organizations can hold meetings to review each question individually and discuss what criteria are necessary to assign a score of 1, 2 or 3, etc. If possible, raters should go through “test evaluations” to practice using the instrument with scenarios that could occur in the program (ideally through videos, but such scenarios could also be written if detailed enough). When disagreement occurs on individual questions, raters should discuss why they chose to rate the program the way they did and come to a consensus. Practice evaluations will help raters get “on the same page” and have a mutual understanding of what to look for.
Test-Retest Reliability
The stability of an instrument's assessments of the same program over time. If several after-school programs are each assessed two times, one month apart, the respective scores at both assessments would differ very little if the instrument had strong test-retest reliability. The strength of an instrument's test-retest reliability depends on both the sensitivity of the instrument and how much the program changes over time. If instruments are too sensitive to subtle changes in a program, test-retest reliability will be low and scores may differ widely between assessments even though the subtle changes driving this difference may hold little practical significance. At the other extreme, instruments with extremely high test-retest reliability may be insensitive to important long-term changes. As is the case with interrater reliability, several methods to measure test-retest reliability exist, including percentage agreement, kappa and intraclass correlations, with the latter two being preferred.
Very few of the instruments in this report have undergone testing for this type of reliability. Because the time span between assessments has been relatively short for these instruments, test-retest reliability should be high.
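The idea of score stability can be sketched in a few lines. The example below uses invented program scores from two assessment waves and a Pearson correlation as a simple stand-in for the preferred statistics (kappa, intraclass correlation): when the two waves track each other closely, the correlation is high.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical quality scores for five programs, assessed one month apart.
time1 = [2.5, 3.0, 4.5, 3.5, 4.0]
time2 = [2.7, 3.1, 4.4, 3.3, 4.1]
print(round(pearson_r(time1, time2), 2))  # stable scores -> high correlation
```

If the instrument were overly sensitive to day-to-day fluctuations, the second list would diverge from the first and this correlation would drop.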
Internal Consistency
The cohesiveness of items forming the instrument's scales. An item is a specific question or rating, and a scale is a set of items within an instrument that jointly measure a particular concept. For example, an instrument might include 10 items that are supposed to measure the friendliness of program staff, and users would average or sum the 10 scores to get an overall “friendliness score.” Because items forming a scale jointly measure the same concept, we can expect that the scores for each item will be related to all of the other items. For example, say that three of our “friendliness” items include: (1) How much does the staff member smile at children? (2) How much does the staff member compliment children? (3) How much does the staff member criticize children in a harsh manner? If the scale had high internal consistency, the scores for each question would “make sense” compared to the others (e.g., if the first question received a high score, we would expect that the second would also receive a high score and the third would receive a low score). In a scale with low internal consistency, the items' scores are unrelated to each other. Low internal consistency suggests the items may not fit together in a meaningful way and therefore the overall score (e.g., average friendliness) may not be meaningful either.
The analogy of the dartboard is useful when understanding internal consistency. Think about the individual items as the darts: the aim of the thrower is meaningless if the darts land haphazardly across the board. In the same way, an overall score like average friendliness is meaningless if the items' scores do not relate to each other. The statistic that determines internal consistency is called Cronbach's alpha. For a scale to have acceptable internal consistency, it should be near or over the conventional cutoff of 0.70. Whereas interrater and test-retest reliabilities are important information for all instruments, internal consistency is only relevant for instruments with scales.
The Weikart Center (YPQA developer), among others (MacKenzie, Podsakoff, & Jarvis, 2005), has noted that internal consistency is only appropriate when the items are reflective of a larger concept rather than formative. For a more in-depth discussion of this requirement, readers should refer to the section on Additional Technical Considerations, found on pages 16-17 of this report.
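Cronbach's alpha can be computed directly from item-level scores. The sketch below uses invented 1-5 ratings for the three “friendliness” items described earlier (the harsh-criticism item is assumed to be reverse-scored before entering the scale, so that high values always mean friendlier staff):

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per item, aligned by program."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-program scale totals
    item_var = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Invented ratings of six programs on three "friendliness" items.
smiles      = [5, 4, 2, 5, 3, 1]
compliments = [4, 4, 2, 5, 3, 2]
no_harsh    = [5, 3, 1, 4, 3, 1]  # reverse-scored criticism item
print(round(cronbach_alpha([smiles, compliments, no_harsh]), 2))  # 0.96
```

Because the three items rise and fall together across programs, alpha comes out well above the 0.70 cutoff; items that moved independently would drag it down.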
Variation in Quality Across Different Contexts
Program quality may not be entirely uniform across different staff, different activities, or even different days of the week or months of the year. Even when two observers agree on the level of quality when both are observing precisely the same activity at the same time, they might come up with different ratings if they observe a different activity at a different time. Some instruments may also be particularly sensitive to some types of variation. As the Weikart Center and others have noted (Raudenbush, Martinez, Bloom, Zhu, & Lin, 2008), evidence about the ways that scores on a particular instrument vary within a program is important so that users know how to account for this variation (e.g., if an instrument's scores depend on the activity, then it is important to assess a wide range of activities in the program). For a more in-depth discussion of these issues, readers should refer to the section on Additional Technical Considerations, found on pages 16-17 of this report.
Validity17
An instrument's ability to measure what it is supposed to measure. If an instrument is supposed to measure program quality, then it would be valid if it yielded accurate information on this topic. However, researchers have devised several different methods for establishing validity. The most common analogy used to understand validity, again, is the game of darts. While reliability is about the player consistently throwing darts to the same location, validity relates to whether or not the player is hitting the bull's-eye. The bull's-eye is the topic an instrument is supposed to measure. While reliability is essential, it is also important to know if an instrument is valid (dart players that consistently miss the board entirely may be reliable – they may hit the same spot over and over – but they are sure to lose the game!).
Sometimes an instrument may look like it measures one concept when in fact it measures something rather different, or nothing at all. For example, an instrument might claim to measure after-school program quality, but it would not be particularly valid if it focused solely on whether children liked the program and were having fun.
Validity can be tricky to assess because the concepts of interest (e.g., program quality) are often not tangible or concrete. Unlike the case of reliability, there is no specific number that tells us about validity. Instead, researchers rely on several complementary methods; these methods each assess different types of relationships that together give us confidence that the instrument is measuring what we think it measures. Next, we describe the different subtypes of validity.
Face Validity
Individuals' opinions of an instrument's quality. This is the weakest form of validity because it does not involve direct testing of the instrument and is based on appearance only. One example of face validity in a medical context concerns taking a temperature. Today we know to do this with a thermometer. But think back a couple hundred years. At that time, feeling a patient's forehead would have seemed a much more valid measure of temperature than sticking a glass tube filled with mercury into the patient's mouth. How hot a forehead feels is a face-valid measure of temperature, but few people today consider this method alone to be adequate. Instead, doctors rely on thermometers because they have been scientifically proven to be more accurate. Similarly, researchers and practitioners should consider other forms of validity, when available, before choosing an instrument.
Convergent Validity
The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts. If two instruments are presumed to measure the same or similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other. For example, imagine researchers have developed a new instrument (Instrument A) that is supposed to measure staff behavior management techniques in after-school programs. To determine its validity, researchers might compare Instrument A to Instrument B, which is already known to accurately measure staff's discipline strategies in after-school programs. Assuming that Instrument A is a valid measurement, we can expect that when Instrument B finds that programs rarely use appropriate discipline strategies, Instrument A will find that the same programs utilize poor behavior management techniques (and vice versa). If this were not the case, we would conclude that Instrument A probably does not adequately measure behavior management.

17 Researchers often refer to the type of validity discussed in this report as Construct Validity, because it addresses whether an instrument adequately measures a specific concept or construct. Although other forms of validity exist, they are not addressed in this report.
Concurrent and Predictive Validity
The extent to which an instrument is related to distinct, theoretically important concepts and outcomes in expected ways. If an instrument measures the quality of homework assistance in after-school programs, then children who attend high-quality programs should have higher rates of homework completion (or perhaps grades) than children who attend low-quality programs (assuming there is no difference between the children before starting the programs). Usually, theory and prior research findings help researchers determine which outcomes are most appropriate to examine with each instrument. Validity evidence is strongest when differences in the outcomes are detected after the initial program observations have been conducted (known as Predictive Validity). For example, imagine that two after-school programs are designed to improve children's grades, and that children attending these programs had similar grades at the beginning of the school year. After conducting program observations, researchers determined that one program was of high quality and the other was of low quality. If children attending the high-quality program had higher grades at the end of the school year compared to the children attending the low-quality program, this makes us more confident that the instrument accurately detected quality differences between the two programs.
Sometimes observations and related concepts are measured in the same time period (known as Concurrent Validity), particularly when the related concepts are expected to change simultaneously. However, researchers generally prefer to see the hypothesized cause (program quality) come before the effect. When both are measured at the same time, it is more likely that there may be another explanation for the relationship.
Although similar in some ways, concurrent and predictive validity are separate from convergent validity. Whereas convergent validity compares two instruments that measure identical or highly similar concepts, concurrent and predictive validity refer to relationships between distinct concepts that we expect to be strong based on theory and prior research.
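The convergent-validity check described above often boils down to asking whether programs rank the same way on both instruments. One simple way to sketch this (with invented program scores, not data from any instrument in this guide) is a Spearman rank correlation: if programs that score high on one instrument also score high on the other, the rank correlation is high.

```python
def ranks(xs):
    """Rank positions (1 = lowest); assumes no tied values, for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx = my = (n + 1) / 2  # mean of the ranks 1..n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical overall quality scores for six programs from two instruments
# intended to measure the same construct.
instrument_a = [2.1, 4.5, 3.0, 4.9, 1.5, 3.8]
instrument_b = [1.8, 4.2, 3.3, 4.7, 2.0, 3.5]
print(round(spearman_rho(instrument_a, instrument_b), 2))  # 0.94
```

A rank correlation this high is the pattern convergent validity predicts; a weak or negative value would cast doubt on the new instrument.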
Validity of Scale Structure
The extent to which items statistically group together in expected ways to form scales. As already stated, scales are composed of several items that, when averaged or summed, create an overall score for a specific concept. Determining whether scales adequately measure the concepts they claim to measure can be difficult, though conducting a factor analysis is one helpful way to do so. Factor analysis verifies that items go together the way the developers thought they would by examining which items are similar to each other and which are different.
For example, imagine an instrument with two scales: Staff Communication Style and Staff Patience. Next, imagine that whenever staff are rated as having a harsh communication style toward children, they are also always rated as having little patience with children. Because of their high similarity, we would say that we are actually measuring one concept, not two, and it would make more sense to have one overall score (perhaps renamed Staff Attitudes Toward Children).
Factor analysis can also help determine if one scale actually incorporates more than one related concept. Imagine that we have an instrument with a scale called Homework Assistance, but our factor analysis finds that we actually have two separate concepts. We might discover that some items relate to Tutoring on Specific Subject Matter whereas another set relates to Teaching Study Skills. The validity of scale structure is important because we want to know exactly which concepts our instrument measures.
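A full factor analysis requires specialized software, but inspecting item-to-item correlations gives a rough preview of what factor analysis formalizes. In this invented example, two Homework Assistance items about tutoring track each other closely but not the two items about study skills, hinting that the scale mixes two underlying concepts:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented ratings of five programs on four "Homework Assistance" items.
tutor_math  = [4, 2, 5, 1, 3]
tutor_read  = [4, 3, 5, 1, 2]
study_plan  = [1, 4, 2, 5, 3]
study_notes = [2, 5, 1, 4, 3]

print(round(pearson_r(tutor_math, tutor_read), 2))   # within the tutoring cluster: high
print(round(pearson_r(study_plan, study_notes), 2))  # within the study-skills cluster: high
print(round(pearson_r(tutor_math, study_plan), 2))   # across clusters: low or negative
```

Two tight clusters with weak cross-cluster correlations are exactly the pattern a factor analysis would report as two separate factors.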
Score Distribution
The dispersion or spread of scores from multiple assessments for a specific item or scale, including features such as the average score, the range of observed values and their concentration around particular point(s). In order for items and scales to be useful, they should be able to distinguish differences between programs on a range of qualities. To achieve this, scores should not be “bunched up” at any particular place on the scale. For example, imagine that a particular instrument has a scale called Positive Child Behavior and users must rate, from 1 to 5, how true statements like “Children never stop helping each other” and “Children thank staff at every opportunity” are for a large number of programs. If almost every program scored low on this particular scale, we might argue the items are making it “too difficult” to obtain a high score and do not meaningfully distinguish between programs on this dimension. One solution would be to revise the items to better reflect program differences. The two sample items above might be revised to say “Children help each other when needed” and “Children appreciate help from staff.”
Several important statistics help researchers understand whether scores are bunching up at the ends, including the average score (sometimes called the mean) and how spread out the scores are. For example, a scale or item would not be very useful for distinguishing between programs if the average score across many different programs was a 4.8 out of a possible 5.0. In addition, a scale or item might have an average of 3.5, but it would be less useful if the scores only ranged between 3 and 4 instead of a larger spread between 1 and 5.
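These distribution checks are quick to compute. The sketch below uses invented 1-5 ratings for two items across ten programs: the first shows a ceiling effect (mean near the top of the scale, very little spread), while the second spreads across the full range and can actually differentiate programs.

```python
def describe(scores):
    """Mean, standard deviation and observed range for a set of item scores."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return {"mean": round(mean, 2), "sd": round(sd, 2),
            "min": min(scores), "max": max(scores)}

# Invented 1-5 ratings of ten programs on two items.
ceiling_item = [5, 5, 5, 4, 5, 5, 4, 5, 5, 5]   # bunched at the top of the scale
useful_item  = [1, 3, 5, 2, 4, 3, 5, 1, 2, 4]   # spread across the whole scale

print(describe(ceiling_item))  # mean 4.8, sd 0.4: little room to distinguish programs
print(describe(useful_item))   # mean 3.0, sd 1.41: differentiates programs
```

The first item's mean of 4.8 with a standard deviation of 0.4 is exactly the bunched-up pattern described above; revising such items to be less easy to endorse restores their ability to separate stronger programs from weaker ones.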