
DOCUMENT RESUME

ED 402 558 CS 012 669

AUTHOR Valencia, Sheila W.; Au, Kathryn H.
TITLE Portfolios across Educational Contexts: Issues of Evaluation, Teacher Development, and System Validity. Reading Research Report No. 73.
INSTITUTION National Reading Research Center, Athens, GA.; National Reading Research Center, College Park, MD.
SPONS AGENCY Office of Educational Research and Improvement (ED), Washington, DC.
PUB DATE 97
CONTRACT 117A20007
NOTE 42p.
PUB TYPE Reports Research/Technical (143)
EDRS PRICE MF01/PC02 Plus Postage.
DESCRIPTORS Case Studies; Elementary Education; *Literacy; *Portfolio Assessment; Portfolios (Background Materials); Professional Development; Program Effectiveness; *Student Evaluation; Validity
IDENTIFIERS Bellevue School District WA; Kamehameha Early Education Program

ABSTRACT
A case study across 2 different elementary education settings examined (1) how well portfolios document literacy learning that is both authentic and aligned with curriculum; (2) teachers' ability to interpret and evaluate portfolio evidence from more than one site; and (3) what teachers learn about literacy instruction and assessment as a result of cross-site collaboration. The two programs were the Bellevue Literacy Portfolio Project (located in a suburb of Seattle, Washington) and the Kamehameha Elementary Education Program (a privately funded educational research and development effort in Hawaii). Results suggest that portfolios contained authentic artifacts of students' literacy experiences, although there was a substantial amount of evidence judged to be missing from the portfolios. Nevertheless, with a shared understanding of literacy learning, teachers were able to reach a high degree of agreement when rating portfolios from different sites and enhance their understanding of both learning and assessment through the cross-site evaluation process. Findings should not be interpreted simply as findings on portfolio assessment--they must be interpreted in light of a complete portfolio system in which attention is given to generating and collecting artifacts, supporting collaborative evaluation, and providing ongoing professional development. Supportive internal and external conditions must be present if portfolios are to become effective tools for literacy assessment and professional development. (Contains 48 references, and 5 tables and 3 figures of data.) (Author/RS)

***********************************************************************

Reproductions supplied by EDRS are the best that can be made from the original document.

***********************************************************************



Portfolios Across Educational Contexts: Issues of Evaluation, Teacher Development, and System Validity

Sheila W. Valencia
University of Washington

Kathryn H. Au
University of Hawaii

U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it. Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.

NRRC National Reading Research Center

READING RESEARCH REPORT NO. 73

Winter 1997



NRRC National Reading Research Center

Portfolios Across Educational Contexts: Issues of Evaluation, Teacher Development, and System Validity

Sheila W. Valencia
University of Washington

Kathryn H. Au
University of Hawaii

READING RESEARCH REPORT NO. 73
Winter 1997

The work reported herein is a National Reading Research Center Project of the University of Georgia and University of Maryland. It was supported under the Educational Research and Development Centers Program (PR/AWARD NO. 117A20007) as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The findings and opinions expressed here do not necessarily reflect the position or policies of the National Reading Research Center, the Office of Educational Research and Improvement, or the U.S. Department of Education.



NRRC National Reading Research Center

Executive Committee
Donna E. Alvermann, Co-Director
University of Georgia

John T. Guthrie, Co-Director
University of Maryland College Park

James F. Baumann, Associate Director
University of Georgia

Patricia S. Koskinen, Associate Director
University of Maryland College Park

Jamie Lynn Metsala, Associate Director
University of Maryland College Park

Penny Oldfather
University of Georgia

John F. O'Flahavan
University of Maryland College Park

James V. Hoffman
University of Texas at Austin

Cynthia R. Hynd
University of Georgia

Robert Serpell
University of Maryland Baltimore County

Betty Shockley-Bisplinghoff
Clarke County School District, Athens, Georgia

Linda DeGroff
University of Georgia

Publications Editors

Research Reports and Perspectives
Linda DeGroff, Editor
University of Georgia

James V. Hoffman, Associate Editor
University of Texas at Austin

Mariam Jean Dreher, Associate Editor
University of Maryland College Park

Instructional Resources
Lee Galda, University of Georgia

Research Highlights
William G. Holliday
University of Maryland College Park

Policy Briefs
James V. Hoffman
University of Texas at Austin

Videos
Shawn M. Glynn, University of Georgia

NRRC Staff
Barbara F. Howard, Office Manager
Kathy B. Davis, Senior Secretary
University of Georgia

Barbara A. Neitzey, Administrative Assistant
Valerie Tyra, Accountant
University of Maryland College Park

National Advisory Board
Phyllis W. Aldrich
Saratoga Warren Board of Cooperative Educational Services, Saratoga Springs, New York

Arthur N. Applebee
State University of New York, Albany

Ronald S. Brandt
Association for Supervision and Curriculum Development

Marsha T. DeLain
Delaware Department of Public Instruction

Carl A. Grant
University of Wisconsin-Madison

Barbara McCombs
Mid-Continent Regional Educational Laboratory (MCREL)

Luis C. Moll
University of Arizona

Carol M. Santa
School District No. 5, Kalispell, Montana

Anne P. Sweet
Office of Educational Research and Improvement, U.S. Department of Education

Louise Cherry Wilkinson
Rutgers University

Peter Winograd
University of Kentucky

Production Editor
Katherine P. Hutchison
University of Georgia

Dissemination Coordinator
Jordana E. Rich
University of Georgia

Text Formatter
Angela R. Wilson
University of Georgia

NRRC - University of Georgia
318 Aderhold
University of Georgia
Athens, Georgia 30602-7125
(706) 542-3674 Fax: (706) 542-3678
INTERNET: [email protected]

NRRC - University of Maryland College Park
3216 J. M. Patterson Building
University of Maryland
College Park, Maryland 20742
(301) 405-8035 Fax: (301) 314-9625
INTERNET: [email protected]


About the National Reading Research Center

The National Reading Research Center (NRRC) is funded by the Office of Educational Research and Improvement of the U.S. Department of Education to conduct research on reading and reading instruction. The NRRC is operated by a consortium of the University of Georgia and the University of Maryland College Park in collaboration with researchers at several institutions nationwide.

The NRRC's mission is to discover and document those conditions in homes, schools, and communities that encourage children to become skilled, enthusiastic, lifelong readers. NRRC researchers are committed to advancing the development of instructional programs sensitive to the cognitive, sociocultural, and motivational factors that affect children's success in reading. NRRC researchers from a variety of disciplines conduct studies with teachers and students from widely diverse cultural and socioeconomic backgrounds in pre-kindergarten through grade 12 classrooms. Research projects deal with the influence of family and family-school interactions on the development of literacy; the interaction of sociocultural factors and motivation to read; the impact of literature-based reading programs on reading achievement; the effects of reading strategies instruction on comprehension and critical thinking in literature, science, and history; the influence of innovative group participation structures on motivation and learning; the potential of computer technology to enhance literacy; and the development of methods and standards for alternative literacy assessments.

The NRRC is further committed to the participation of teachers as full partners in its research. A better understanding of how teachers view the development of literacy, how they use knowledge from research, and how they approach change in the classroom is crucial to improving instruction. To further this understanding, the NRRC conducts school-based research in which teachers explore their own philosophical and pedagogical orientations and trace their professional growth.

Dissemination is an important feature of NRRC activities. Information on NRRC research appears in several formats. Research Reports communicate the results of original research or synthesize the findings of several lines of inquiry. They are written primarily for researchers studying various areas of reading and reading instruction. The Perspective Series presents a wide range of publications, from calls for research and commentary on research and practice to first-person accounts of experiences in schools. Instructional Resources include curriculum materials, instructional guides, and materials for professional growth, designed primarily for teachers.

For more information about the NRRC's research projects and other activities, or to have your name added to the mailing list, please contact:

Donna E. Alvermann, Co-Director
National Reading Research Center
318 Aderhold Hall
University of Georgia
Athens, GA 30602-7125
(706) 542-3674

John T. Guthrie, Co-Director
National Reading Research Center
3216 J. M. Patterson Building
University of Maryland
College Park, MD 20742
(301) 405-8035



NRRC Editorial Review Board

Peter Afflerbach, University of Maryland College Park
Jane Agee, University of Georgia
JoBeth Allen, University of Georgia
Janice F. Almasi, University of Buffalo-SUNY
Patty Anders, University of Arizona
Harriette Arrington, University of Kentucky
Marlia Banning, University of Utah
Jill Barton, Elizabethtown College
Eurydice Bauer, University of Georgia
Janet Benton, Bowling Green, Kentucky
Irene Blum, Pine Springs Elementary School, Falls Church, Virginia
David Bloome, Vanderbilt University
John Borkowski, Notre Dame University
Fenice Boyd, University of Georgia
Karen Bromley, Binghamton University
Martha Carr, University of Georgia
Suzanne Clewell, Montgomery County Public Schools, Rockville, Maryland
Joan Coley, Western Maryland College
Michelle Commeyras, University of Georgia
Linda Cooper, Shaker Heights City Schools, Shaker Heights, Ohio
Karen Costello, Connecticut Department of Education, Hartford, Connecticut
Jim Cunningham, Gibsonville, North Carolina
Karin Dahl, Ohio State University
Marcia Delany, Wilkes County Public Schools, Washington, Georgia
Lynne Diaz-Rico, California State University-San Bernardino
Mark Dressman, New Mexico State University
Ann Duffy, University of Georgia
Ann Egan-Robertson, Amherst College
Jim Flood, San Diego State University
Dana Fox, University of Arizona
Linda Gambrell, University of Maryland College Park
Mary Graham, McLean, Virginia
Rachel Grant, University of Maryland College Park
Barbara Guzzetti, Arizona State University
Frances Hancock, Concordia College of Saint Paul, Minnesota
Kathleen Heubach, Virginia Commonwealth University
Sally Hudson-Ross, University of Georgia
Cynthia Hynd, University of Georgia
Gay Ivey, University of Georgia
David Jardine, University of Calgary
Robert Jimenez, University of Oregon
Michelle Kelly, University of Utah
James King, University of South Florida
Kate Kirby, Georgia State University
Linda Labbo, University of Georgia
Michael Law, University of Georgia
Donald T. Leu, Syracuse University


Susan Lytle, University of Pennsylvania
Bert Mangino, Las Vegas, Nevada
Susan Mazzoni, Baltimore, Maryland
Ann Dacey McCann, University of Maryland College Park
Sarah McCarthey, University of Texas at Austin
Veda McClain, University of Georgia
Lisa McFalls, University of Georgia
Randy McGinnis, University of Maryland
Mike McKenna, Georgia Southern University
Barbara Michalove, Fourth Street Elementary School, Athens, Georgia
Elizabeth B. Moje, University of Utah
Lesley Morrow, Rutgers University
Bruce Murray, Auburn University
Susan Neuman, Temple University
John O'Flahavan, University of Maryland College Park
Marilyn Ohlhausen-McKinney, University of Nevada
Penny Oldfather, University of Georgia
Barbara M. Palmer, Mount Saint Mary's College
Stephen Phelps, Buffalo State College
Mike Pickle, Georgia Southern University
Amber T. Prince, Berry College
Gaoyin Qian, Lehman College-CUNY
Tom Reeves, University of Georgia
Lenore Ringler, New York University
Mary Roe, University of Delaware
Nadeen T. Ruiz, California State University-Sacramento
Olivia Saracho, University of Maryland College Park
Paula Schwanenflugel, University of Georgia
Robert Serpell, University of Maryland Baltimore County
Betty Shockley-Bisplinghoff, Barnett Shoals Elementary School, Athens, Georgia
Wayne H. Slater, University of Maryland College Park
Margaret Smith, Las Vegas, Nevada
Susan Sonnenschein, University of Maryland Baltimore County


Bernard Spodek, University of Illinois
Bettie St. Pierre, University of Georgia
Steve Stahl, University of Georgia
Roger Stewart, Boise State University
Anne P. Sweet, Office of Educational Research and Improvement
Louise Tomlinson, University of Georgia
Bruce VanSledright, University of Maryland College Park
Barbara Walker, Eastern Montana University-Billings
Louise Waynant, Prince George's County Schools, Upper Marlboro, Maryland
Dera Weaver, Athens Academy, Athens, Georgia
Jane West, Agnes Scott College
Renee Weisburg, Elkins Park, Pennsylvania
Allan Wigfield, University of Maryland College Park
Shelley Wong, University of Maryland College Park
Josephine Peyton Young, University of Georgia
Hallie Yopp, California State University


About the Authors

Sheila Valencia is Associate Professor of Curriculum and Instruction at the University of Washington, Seattle. She received her doctorate in reading education from the University of Colorado, Boulder, and then returned to the public schools for 6 years as a district reading specialist. Her research focuses on literacy instruction and assessment, with a special emphasis on classroom-based assessment. Dr. Valencia has served on the advisory boards of several national, state, and professional assessment projects including the National Academy of Education, National Task Force on Assessment, and the Joint Committee on Assessment of NCTE and IRA.

Kathryn H. Au is Associate Professor in the College of Education at the University of Hawaii at Manoa. She received her doctorate in educational psychology from the University of Illinois at Urbana-Champaign, after working as a classroom teacher in the primary grades. Her research focuses on the literacy instruction of students of diverse cultural and linguistic backgrounds. She is president-elect of the National Reading Conference and was elected a vice president of the American Educational Research Association.


National Reading Research Center
Universities of Georgia and Maryland
Reading Research Report No. 73
Winter 1997

Portfolios Across Educational Contexts: Issues of Evaluation, Teacher Development, and System Validity

Sheila W. Valencia
University of Washington

Kathryn H. Au
University of Hawaii

Abstract. We report here a case study of literacy portfolios across two different settings. Specifically, we investigated (1) how well portfolios document literacy learning that is both authentic and aligned with curriculum; (2) teachers' ability to interpret and evaluate portfolio evidence from more than one site; and (3) what teachers learn about literacy instruction and assessment as a result of cross-site collaboration. Results suggest that portfolios contained authentic artifacts of students' literacy experiences, although there was a substantial amount of evidence judged to be missing from the portfolios. Nevertheless, with a shared understanding of literacy learning, teachers were able to reach a high degree of agreement when rating portfolios from different sites and enhance their understanding of both learning and assessment through the cross-site evaluation process. We suggest that the results should not be interpreted simply as findings on portfolio assessment. They must be interpreted in light of a complete portfolio system in which attention is given to generating and collecting artifacts, supporting collaborative evaluation, and providing ongoing professional development. Supportive internal and external conditions must be present if portfolios are to become effective tools for literacy assessment and professional development.

All across the United States, interest in portfolios is running high. Private foundations, states, and school districts are investing millions of dollars in portfolio projects they hope will enhance teaching, learning, and assessment practices (Pelavin, 1991). The need to establish ongoing, classroom-based assessment has become a predictable part of the assessment conversation, and portfolios appear to be a promising candidate for the job.

There are three major expectations for portfolios. First, portfolios are viewed as being more meaningful, authentic, and valid indicators of what students know and can do than more traditional assessments. They can be integrated with classroom instruction, reflect the rich and complex work that children actually do, and address broader, more important learning outcomes (Au, Scheu, Kawakami, & Herman, 1990; Calfee & Perfumo, 1996; Valencia, 1990; Wiggins, 1989a; Wolf, 1989). By including multiple indicators of student performance, portfolios also capture the variability and patterns, across tasks and time, that characterize true learning (Messick, 1994; Valencia, 1990; Wiggins, 1993).

Second, as a result of these authentic, classroom-based features, portfolios have the potential to enhance both teaching and learning. Because they are housed in the classroom and can be used regularly by teachers and students as part of the instructional program, portfolios have potential to provide more useful, meaningful, and accessible information than typical assessments. Teachers and students should become more reflective and knowledgeable as a consequence of using portfolios (Arter & Spandel, 1992; Darling-Hammond & Ancess, 1993; Johnston, 1989; Moss et al., 1992; Valencia & Calfee, 1991; Wiggins, 1989b; Wolf, 1989).

Third, some educators and psychometricians are hopeful that portfolios will provide useful assessment information for reporting to people outside the classroom or local context. This will require an acceptable level of interrater agreement as portfolio raters outside the local site examine and score portfolios. For some, this outside reporting is necessary to ensure the survival of portfolios as an assessment innovation (Calfee & Perfumo, 1996; Freedman, 1993; Valencia, 1991); for others, it is a way to enter classroom information into the policy arena (Chittenden & Spicer, 1993; LeMahieu, Eresh, & Wallace, 1992; Wixson, Valencia, & Lipson, 1994); and for others, the evaluation process itself is valued as a powerful mechanism for professional development (Wolf, LeMahieu, & Eresh, 1992).

Such an ambitious agenda for portfolio assessment poses enormous challenges for educators. It is a relatively simple task to collect evidence of student work, a much more complex one to assure that work represents authentic instances of learning, feeds back to improve teaching and learning, and yields useful assessment information (Aschbacher, 1994). The complexity is magnified and the task becomes more daunting as portfolio contents and processes vary, and participants represent a range of schools, districts, and educational contexts. As educators make site-based decisions to collect different types of portfolio artifacts, use different evaluation schemes, and participate in different levels of professional development, it may become more difficult to use portfolio data to analyze student achievement and to improve teaching and learning.

We report here a case study of literacy portfolios across two settings. Specifically, we explore: (1) the potential of literacy portfolios from different sites to capture curriculum outcomes that are both authentic and aligned with instruction; (2) teachers' ability to interpret and evaluate portfolio evidence from more than one site; and (3) what teachers learn about literacy instruction and assessment through the process of cross-site collaboration. By examining these issues, we hope to shed light on a conceptual framework for a portfolio system: the components and the external and internal conditions that are needed for portfolios to be effectively implemented and used. Specifically, we hypothesize that if the collection of portfolio artifacts, evaluation process, and professional development all receive concurrent support, and are situated in a shared understanding of literacy and a low-stakes environment, classroom-based portfolio assessment can be successfully implemented, used, and evaluated across sites. Furthermore, we hypothesize that the process of evaluating portfolios across sites provides teachers with an important "outside" perspective on instruction and assessment that transfers to their own classrooms.

Background

It is difficult to generalize about the role of portfolios in education because portfolios are defined and implemented differently within projects and sites (Calfee & Perfumo, 1996; Valencia & Calfee, 1991). In some settings, portfolios are conceptualized simply as collections of students' work. These collections provide evidence of the work students have done in the classroom, but are not the object of reflection and systematic analysis by students or teachers. In other settings, where portfolios have evolved beyond collections, portfolios serve a variety of purposes. For example, showcase portfolios are used to highlight the best work that students have done, documentation portfolios may be used to show students' growth over time with respect to specific outcomes, and evaluation portfolios may contain prespecified pieces that are each evaluated. In some settings, students' ownership of portfolios is considered of primary importance, and students, rather than teachers, decide what will go into the portfolios (Hansen, 1994; Howard, 1990; Tierney, Carter, & Desai, 1991). In other settings, portfolios are used for large-scale evaluation; teachers have guidelines about the kinds of work that should be present in each student's portfolio (Chittenden & Spicer, 1993; Koretz, Stecher, Klein, & McCaffrey, 1994; LeMahieu et al., 1992). And in others, decisions about portfolio contents are shared by the teacher and the student (Valencia & Place, 1994a). These issues of purpose and audience present important choices and implications for student and teacher ownership, portfolio contents, and use of portfolio information (Moss et al., 1992; Valencia & Calfee, 1991).

A similar range of possibilities is found in the various contexts in which portfolios are implemented. Some portfolios are simply implemented by individual teachers interested in portfolios; others are part of district or state-wide assessment systems. Some have no stakes or sanctions attached to their implementation or results while others have high public visibility and consequences. There is also great variability in the support provided to teachers for implementing and using portfolios, from minimal professional development to long-term support over years.

There is considerably less information about issues of evaluation and reporting of portfolio information than there is about implementation. Indeed, some would reject any type of evaluation or reporting, finding it undesirable, unnecessary, and potentially detrimental to the portfolio philosophy (Carini, 1975; Johnston, 1989). Others approach portfolio evaluation with interest in collaborative interpretation and narrative reporting of information. This approach cautions against relying too heavily on "traditional" interrater agreement. It emphasizes the value of critical dialogue about collections of student work among those most knowledgeable about the context in which the assessment occurs (Johnston, 1989; Moss, 1994; Moss et al., 1992). In contrast, a few have tried to apply evaluation rubrics to portfolios, concerning themselves with issues of interrater agreement and large-scale reporting (Chittenden & Spicer, 1993; Gearhart, Herman, Baker, & Whittaker, 1992, 1993; Koretz, McCaffrey, Klein, Bell, & Stecher, 1993; LeMahieu et al., 1992; Nystrand, Cohen, & Martinez, 1993). In general, they find questionable levels of interrater agreement and low correlations with on-demand tasks. Furthermore, interrater agreement may drop below acceptable levels for individual and program decisions unless individual items in the portfolio are scored one at a time and items are standardized (Moss, 1994). Alternatively, some evaluation and reporting has been attempted on a smaller scale, with an eye toward developing responsible evaluation and professional development rather than high-stakes reporting (Au, 1994; Valencia & Place, 1994a, 1994b). Psychometricians caution that to provide information that will satisfy those historically interested in standardized test scores will require a reconceptualization of the critical attributes of good assessment, including reliability and validity, and consideration of the different needs of various stakeholders (Haney, 1991; Linn, Baker, & Dunbar, 1991; Moss et al., 1992).

In short, there is presently no common philosophy underlying the use of portfolios, no one blueprint for how portfolios can or should be implemented in classrooms, and no one approach to professional development or to evaluation. As educators begin to explore the potential of portfolios, these various models, purposes, and contexts will be critical variables to consider. Unlike assessments of the past or even the newer on-demand performance assessments, the variability in portfolio assessment may introduce new challenges for their implementation, use, and aggregation of information within a site and, most especially, across different sites. If, however, the variability is valued and grounded in teachers' experience, knowledge, and professional development, we may be able to create portfolio systems that support teaching, learning, and assessment.

Method

The Context

The two programs participating in the study were the Bellevue Literacy Portfolio Project and the Kamehameha Elementary Education Program (KEEP). Similarities and differences between the sites are summarized in Table 1. Bellevue is a suburb of Seattle. The school district serves approximately 15,000 students, 20% of whom are minority. As a whole, test scores reflect achievement at about the 60th percentile in reading and writing. The district has a history of strong, ongoing professional development. KEEP was a privately funded educational research and development effort. At the time of this study, the program served 5,300 Native Hawaiian students in 9 public schools in low-income communities on 3 of the Hawaiian islands. The test scores for students in the KEEP program were typically below the

NATIONAL READING RESEARCH CENTER, READING RESEARCH REPORT NO. 73

25th percentile. Beyond these demographics, the two sites for this project have important similarities and differences that make them a best-case scenario for a cross-site study of portfolio assessment.

Table 1
Similarities and Differences Across Sites

Student population: KEEP 5,300; Bellevue 15,000
Diversity: KEEP 100% minority; Bellevue 20% minority
Achievement: KEEP lowest 25%ile; Bellevue 60%ile
Literacy philosophy: KEEP constructivist, literature-based, process writing; Bellevue constructivist, literature-based, process writing
Literacy outcomes: KEEP Reading Ownership (including variety), Reading Process (understanding, response, strategies), Writing Ownership, Writing Process (quality, variety, process); Bellevue Reading Ownership, Reading Variety, Reading Ability (understanding, response), Reading Strategies, Writing Ownership, Writing Ability (quality, variety), Writing Process, Self-Reflection/Evaluation
Literacy & portfolio professional development activities: KEEP at least 4 years; Bellevue 4 years
Teachers' portfolio implementation: KEEP at least 2 years; Bellevue at least 2 years
Portfolio development: KEEP designed by KEEP curriculum developers and consultants; Bellevue designed by teachers and district specialist
Portfolio requirements: KEEP emphasis on required pieces, some teacher choice; Bellevue emphasis on teacher choice and student choice, some required pieces
Portfolio use: KEEP used primarily for evaluation and administrative reports, some individual teacher use; Bellevue used primarily by classroom teachers with students, selected scoring for administrative reporting
Portfolio evaluation criteria: KEEP lists of specific benchmarks for each of the 4 outcomes; Bellevue descriptive analytic rubric aligned with 8 outcomes
Evaluation experience: KEEP some experience with scoring; Bellevue some experience with scoring
Portfolio participation: KEEP voluntary; Bellevue voluntary

Similarities. Both portfolio projects had been in place for several years and involved only teachers who had volunteered. At the time of this project, the Bellevue teachers had already worked with portfolios for 3 years: experimenting for 1 year and then implementing for 2 years. The project began with 32 volunteer teachers; by the third year, an additional 240 teachers also had participated in portfolio inservice activities. The 150 KEEP teachers had been introduced to portfolios in 1989 and had implemented portfolios in their classrooms for 2 years at the time of the project.

Project teachers at both sites were supported through on-going professional development that enabled them to learn about and share their work with portfolios. In Bellevue, a core group of approximately 30 teachers met almost monthly over 3 years. The school district charged this group with developing a model for portfolios. In addition, these teachers played a key role in disseminating portfolio information to their building colleagues. At KEEP, consultants based at each school provided teachers with knowledge about portfolios. During the first 3 years of implementation, consultants assumed major responsibility for implementing portfolios. Teachers took over responsibility for the portfolios after the third year.

Both portfolio projects grew out of efforts to change the overall language arts curriculum. These efforts resulted in new outcomes for student learning. In Bellevue, the district adopted new language arts student learning objectives (SLOs) in the fall of 1990. They were organized by age bands and broad literacy outcomes. KEEP previously had a comprehension-oriented curriculum, but changed in 1989 to a whole literacy curriculum with students' ownership of literacy as the overarching goal (Au et al., 1990). In both sites, the new curricula reflected a process-oriented, constructivist philosophy about literacy and literacy learning (Applebee, 1991; Weaver, 1990). In this philosophy, the affective and cognitive dimensions of literacy are considered to be equally important, and ownership of literacy is seen as the overarching goal. Students engage in authentic literacy activities, such as the reading of literature and writing for different purposes, that have meaning outside the classroom as well as within it. Literacy learning proceeds as a social process in which students learn from peers as well as from the teacher. Both the Bellevue and KEEP curricula included outcomes for student learning in ownership, reading, and writing. Teachers at both sites taught writing following a process approach (Calkins, 1994; Graves, 1983), engaging students in planning, drafting, revising, editing, and publishing. Literature-based instruction was the approach for teaching reading, with an emphasis on discussions of literature and writing in response to literature (Roser & Martinez, 1995; Short & Pierce, 1990).

In both sites, curriculum changes were accompanied by a desire to change the assessment system. Both sites had a commitment to large-scale evaluation as well as to individual student progress. In both sites, portfolios were seen as a potential tool for evaluation and for professional development. At the time portfolio assessment was introduced at Bellevue and KEEP, it was not known whether one system could in fact meet both purposes. However, the portfolio systems evolved with the intention of trying to do so. Each system had specific required portfolio artifacts and an evaluation system designed to evaluate progress toward the outcomes.

Differences. The sites had important differences as well. In addition to serving different student populations, the portfolio models were different. Each was designed to meet the unique needs of the site. The Bellevue portfolio was designed to serve three functions: to evaluate student progress toward each of eight literacy outcomes, to involve students in self-reflection and self-evaluation, and to document student growth over time (Valencia & Place, 1994a). Although Bellevue wanted to see if their portfolios could be used to report district-wide student achievement, the primary emphasis was on improving instruction and engaging students in learning at the classroom level. In contrast, the KEEP portfolio was designed primarily to meet the requirements of large-scale assessment. Students were judged to be at, above, or below grade level on benchmarks for each of six literacy outcomes (Au, 1994). Consequently, portfolios from Bellevue and KEEP required different types of evidence and different levels of student and teacher involvement.

Another important difference was the manner in which portfolios were introduced into each system. The Bellevue portfolio project used a bottom-up approach. The Bellevue system was designed by a group of teachers working with the District Language Arts Specialist and a consultant. Both the School Board and the Superintendent supported the project with teacher release days each month, and they gave a 5-year commitment to the project. The group worked on the system for 1 year, revising and refining it, before they agreed to implement it, although still on a voluntary basis. The actual portfolios included some required "common tools" but varied considerably across classrooms since individual teachers and students had a major role in the selection of portfolio artifacts. A small group of teachers participated in the development, training, and implementation of a portfolio evaluation system during the third and fourth years of the project. Only a subset of portfolios was scored each year, and the results were not formally reported or used for district evaluation. The voluntary nature of the project and the low stakes associated with it resulted in a wide range of teacher compliance. Some teachers experimented with portfolios throughout the project, implementing them as they desired. Others, especially those involved in the evaluation, were much more rigorous and tried to adhere to the portfolio requirements (Valencia & Place, 1994a).

In contrast, KEEP portfolio implementation was a top-down process. The KEEP portfolio system was developed as part of an overhaul of the instructional program triggered by inconsistent standardized achievement test results. Although KEEP administrators and consultants were under pressure to bring change about quickly, teachers' participation in portfolios was gradual and supported (Au, 1994). The portfolio assessment system was designed by KEEP curriculum developers and consultants, and classroom teachers were asked to implement the system. These consultants worked with the teachers on a weekly basis on issues of classroom organization, literacy instruction, and portfolio assessment. Initially, consultants did most of the collecting of portfolio information during their classroom visits; because of this consultant role and because of the nature of the KEEP learner benchmarks, observation checklists were a cornerstone of the portfolios. Consultants visited the classrooms and, with the help of the teacher, identified evidence of student performance on specific benchmarks and recorded it on the checklists. Some of the evidence came from student work; other evidence came from the consultants' classroom observations. Teachers and students had minimal engagement with the portfolios early on. In the fourth year of the project, however, teachers assumed more responsibility for keeping and selecting work in the portfolios, although there were still requirements for certain types of evidence (Au & Asam, in press).

This study investigated what appeared to be a best-case scenario for exploring portfolios across sites, in terms of both the similarities and differences in the projects. Both projects grew from similar research-based, constructivist philosophies on literacy learning and teaching, and teachers at each site knew how to create portfolios that provided rich representations of their students' literacy learning that aligned with their curricula. Teachers at both sites had considerable experience with portfolios and had moved beyond concern with the logistics of portfolio implementation. In addition, both sites had portfolio models that, from their inception, included an evaluation and reporting component. This interest in evaluation grew from concerns at the sites themselves, rather than being imposed by an external agency, such as a state or federal department of education. Therefore, no site was asked to believe or do anything that was not consistent with its original mission. This situation maximized the potential for capturing authentic indicators of students' reading and writing and for reliable evaluation across the two sites, despite differences in portfolio models.

This project was also considered to be a best-case scenario in terms of the opportunities available for studying professional development through portfolio evaluation. Due to the similarities in language arts philosophy, it was expected that teachers from the two projects would have shared understandings that would give them a common language for talking about portfolios, and the ability to understand and benefit from the contents of each other's portfolios. They would most likely have common concerns and be able to engage in meaningful, useful professional discussions. In addition, because both portfolio systems already included an evaluation component, teachers were accustomed to and comfortable with making judgments about student work.

Finally, the differences in geographic location, student population, and actual portfolio models make this case study a comparison of two like-minded but not identical systems. We expected and found considerable variability in the types of information in the portfolios.


In sum, the common literacy philosophy, years of experience implementing portfolios, and orientation to both classroom and outside reporting, combined with the differences in actual portfolio requirements and student populations, made this pair of sites the best case to see if artifacts across sites were meaningful, and if cross-site evaluation was desirable and feasible. These two sites also provided an excellent opportunity to look at the professional growth of experienced portfolio teachers as they learned about how portfolios worked in another context.

Participants

Four teachers (2 primary, 2 intermediate) from each site participated in this in-depth study. The 4 Bellevue teachers had from 5 to 29 years of teaching experience; 3 of the teachers had more than 13 years. All had been members of the core portfolio team and had participated in several professional development classes, workshops, and district committees. They credited these experiences and their teaching colleagues with their own professional growth. The KEEP teachers had from 21 to 28 years of teaching experience. They had been associated with KEEP for 4 to 16 years. As part of their involvement with KEEP, all the teachers had received individual feedback on their teaching from KEEP staff members, attended numerous workshops, engaged in professional reading, observed in other teachers' classrooms, and participated in teacher networks. All had implemented the whole literacy curriculum at a high level. Although all taught both a writers' and a readers' workshop, 3 of the teachers had focused their portfolio data collection on writing, while 1 had focused on reading. All 8 teachers were selected because their classrooms reflected a good understanding of their local literacy curriculum and portfolios. All had worked with portfolio assessment for at least 2 years.

Procedures

At the beginning of the study, the teachers were interviewed about their teaching backgrounds, professional development, language arts instruction, and portfolios. Teachers were interviewed again at the end of the study to determine what they had learned about portfolios and how their participation in the project had influenced their professional development. There were two cross-site meetings, one in January and one in April. The procedure for these meetings was similar (see Table 2). After getting acquainted, the visiting teachers observed in the home-site teachers' classrooms for a morning. The purpose of the observation was to orient the visitors to the particular cultural context of the classroom and to allow them to become acquainted with instructional practices reflected in the portfolios they were going to review. Following the observation, the visiting teachers met as a group with the local site director to discuss their observations.

The following 2 to 3 days, teachers evaluated portfolios from both sites using specific evaluation rubrics. At the first cross-site meeting, held at KEEP, teachers evaluated Bellevue and KEEP portfolios using the KEEP evaluation system. At the second meeting, held in Bellevue, they used the Bellevue rubric. In addition, they collaboratively developed a new set of Common criteria drawing from both sets of learning outcomes and evaluation systems, and from their experiences working together. They applied the Common criteria to portfolios from both sites. Eight portfolios were evaluated using each set of evaluation criteria. Finally, at the close of each meeting, there was a discussion and debriefing about what the teachers had learned about portfolios and gained from the experience of working with teachers from another site.

Table 2
Procedure for Cross-Site Collaboration and Portfolio Evaluation

Pre-meeting interviews
  ↓
Meeting at KEEP
  - classroom observations & discussions
  - orientation to portfolios from both sites (collaborative descriptive discussions)
  - training using KEEP evaluation criteria
  - evaluation of KEEP and Bellevue portfolios using KEEP evaluation criteria
  - collaborative cross-site discussions
  ↓
Meeting at Bellevue
  - classroom observations & discussions
  - training using Bellevue evaluation system
  - evaluation of KEEP and Bellevue portfolios using Bellevue evaluation criteria
  - teacher development of Common evaluation criteria
  - evaluation of KEEP and Bellevue portfolios using Common evaluation criteria
  - collaborative cross-site discussions
  ↓
Post-meeting interviews

For the purposes of portfolio evaluation, teachers worked in cross-site groups of 4: the 4 primary teachers comprised one group and the 4 intermediate teachers comprised the other. At the first cross-site meeting, teachers spent 2 hr familiarizing themselves with the portfolios from each site. Each teacher offered one of her portfolios for others to review. Using a descriptive process in which collaborative dialogue is featured (Moss, 1994; Valencia & Place, 1994b), each teacher observed as the others discussed their interpretations of her student's portfolio. The teacher entered the discussion at the end, offering her confirmations, additions, and disagreements to the conversation. After portfolios from both sites had been discussed, teachers synthesized the ways in which the portfolio evidence from the two sites was similar and different. Often, work was linked to what the teachers had observed in the classrooms the previous day. They discussed why certain pieces were included in the portfolios and the changes they might want to make in their own portfolios. This orientation occurred only at the first meeting. It was a critical step for clarifying the evidence and the learning outcomes. By the second meeting, teachers were very familiar with the portfolio models and evidence from both sites.

Training for the actual evaluation began with an overview of the evaluation system to be used, followed by practice rating one portfolio from each site. With the initial descriptive discussions as background, differences in interpretation of evidence and performance criteria were discussed and negotiated. Formal evaluation occurred next. Each teacher contributed one portfolio to be evaluated by the 4 members of her cross-site group, making a total of four portfolios to be evaluated by each group. Teachers evaluated each portfolio independently, spending approximately 20 to 30 min with each. They read through all the pieces in the portfolio, looking within and across pieces for evidence of students' abilities on each of the learning outcomes or benchmarks. Then they recorded a rating for each. These ratings were used in the analyses. After rating all the portfolios on her own, each teacher discussed her ratings with her same-site partner and then with the cross-site group of 4.

Data Sources

Portfolio evidence. The evaluated portfolios were reviewed at each site meeting and a table of contents compiled for each. The contents were analyzed to determine variability across sites, alignment with outcomes and portfolio requirements, and changes in portfolio contents over time. Pieces were coded according to the major disciplinary focus of each artifact: reading, writing, and other subjects (e.g., science, social studies). For example, reading response journals, summaries, book reports, and teachers' notes on students' oral reading (i.e., running records) were coded as reading artifacts. Writing artifacts included different types of writing (i.e., stories, poems, reports, journals), planning notes, rough drafts, final copies, and samples of daily oral language activities. Pieces were also coded according to the type of artifact: student work (e.g., reports, interest surveys, writing); anecdotal notes/checklists (e.g., teacher records, running records); other (e.g., photos, audiotapes, artwork); student-selected pieces/goals/self-evaluation (e.g., student entry slips on portfolio items, self-evaluation forms, personal goals); and parent input (questionnaires, comments). Some artifacts were double coded. For example, a piece of writing with a rough draft and a student entry slip describing why the student selected the piece for the portfolio was coded as writing and student-selected/self-reflection; a final draft of a book report was coded as reading and writing; a research report on salmon was coded as writing and as other content area; and running records were coded as reading and as anecdotal notes in order to reflect both the focus of the activity and the documentation mode. Most of the reading artifacts were written responses (i.e., a reading log with comments, literature response journal); however, the majority of these were coded only as reading because writing was not the focus of instruction, simply the mode of response.

Evaluation criteria. Each site had developed and used its own construct-centered evaluation system to accompany its outcomes and portfolio model (Messick, 1994). The Bellevue evaluation rubric was designed by participating teachers around its major Student Learning Outcomes: Reading Ownership, Reading Ability, Reading Variety, Reading Strategies, Writing Ownership, Writing Ability, Writing Process, and Self-Reflection/Evaluation. It included descriptors of performance for each outcome for ratings of 1, 3, and 5, although raters could assign scores in between (Valencia, 1996). This could be considered a "high inference" system. Descriptions do not specify the form or types of information that must be found in the portfolio. Instead, there are general criteria associated with each particular outcome and performance level. An example for reading comprehension (Reading Ability) is presented in Table 3. The evaluator looked for the required common tools in the Bellevue portfolios, such as reading logs and questions, reading summaries, and reading response journals, although these tools might have provided evidence on other outcomes as well. Additional evidence for reading comprehension might have been found in the form of self-reports, running records, reading activities, assignments, or anecdotal records. The evaluator wrote down supporting evidence and conclusions, compared the evidence with the rubric, and then assigned a rating of 1-5 or M for "missing." This procedure was followed for each outcome. In the Bellevue system, the specific descriptors associated with a rating of 3 were designed to represent typical performance for each outcome for students at the designated grade level. By using both descriptors and grade-level designation, teachers felt they would be able to provide useful information to many different audiences.

The KEEP evaluation system was designed by KEEP consultants and, similar to the Bellevue system, it was designed around its four major literacy outcomes: Reading Ownership, Reading Process, Writing Ownership, and Writing Process. Within each of these outcomes is a list of specific benchmarks, indicating the kind of evidence that should be found in the portfolios (Asam et al., 1993). There are between 7 and 21 benchmarks for each literacy outcome at the primary level, and 9 to 28 benchmarks for each at the intermediate level. The number of benchmarks increases by grade level. Specific evidence is required for each benchmark. For example, writing samples, including drafts and published versions, are required for benchmarks under Writing Process. This could be considered a "low inference" system compared with the Bellevue system, since it is quite specific about what counts as evidence. An example for comprehension is presented in Table 3. When examining the portfolio, the evaluator had to find evidence for each benchmark. For example, for "writes personal responses to literature," there needed to be copies of pages from a literature log. Some benchmarks were covered by the same piece of evidence. For example, a written story summary might have served as evidence for the student's ability to comprehend and write about the theme, as well as knowledge of story elements. However, the evaluator of a third grader's portfolio had to find evidence for all 19 benchmarks. On the basis of the evidence, the evaluator rated the student S (satisfactory, grade-level performance), D (developing, below grade-level performance), or M (missing, no evidence available) for each specific benchmark. For benchmarks that depended on the story summary as evidence, the evaluator referred to anchor pieces serving as examples of performance at or below grade level. If, after examining all the evidence, the student received an S on all benchmarks, the evaluator assigned an overall rating of "at grade level." If the student received one or more Ds, the rating given was "below grade level."

Table 3
Comparison Across Three Evaluation Criteria for Reading Comprehension (Intermediate Grade)

Bellevue:
(5) Personal response; synthesis; coherence; theme, major concepts; significant details; coherent/concise summary; applies to prior knowledge; grade-level text or above
(3) Some personal response; attempt to synthesize; main idea or problem; some details; literal focus; logical sequence; grade-level text or above
(1) Limited response; summaries are retellings; sketchy; basic facts; misinformation or misunderstandings; grade-level text or below
(M) No evidence in any of the artifacts; evidence missing

KEEP:
Written Response: Aesthetic
  - Writes personal responses to literature
  - Comprehends and writes about theme/author's message
  - Applies/connects theme to own life/experiences
  - Makes connections among different works of literature
  - Applies/connects content text information to own life/experiences
Written Response: Efferent
  - Reads nonfiction and shows understanding of content
  - Writes summary that includes story elements
  - Uses clear, meaningful language to express ideas in written responses or summaries
  - Reads different genres of fiction and shows understanding of genre characteristics
  - Understands elements of author's craft
Research Strategies
  - Obtains facts and ideas from a variety of informational texts
  - Uses a variety of reference materials
  - Uses graphic organizers
  - Uses a variety of library resources
  - Takes notes
  - Writes research report synthesizing information from multiple sources
  - Publishes research report or equivalent product
  - Uses a variety of outside resources

Common:
(5) Coherence; personal response; theme/big idea; story elements; significant details; connections; reads different types of material; has a wide range of reading interests (more than 3 types); grade-level text or above
(4) Not as much coherence or insight, but still has a personal response; theme/big idea; story elements; significant details; connections; reads more than 3 types of material; grade-level text or above
(3) Retelling present or poorly written summary; surface information; sketchy; mostly literal understanding; story elements; reads 2-3 types of material; grade-level text or above
(2) Incomplete retelling; little coherence; details; reads 1-2 types of material; grade-level text or below
(1) Minimal response; little or no reading; grade-level text or below
(M) No evidence in any of the artifacts; evidence missing
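The KEEP rule for collapsing per-benchmark ratings into an overall judgment can be sketched as follows. This is our formulation of the rule stated in the text, not KEEP's own materials; the report specifies only the all-S and one-or-more-D cases, so the case of S's mixed with M's alone is flagged rather than guessed.

```python
# Sketch of the KEEP overall rating rule: each benchmark is rated
# S (satisfactory), D (developing), or M (missing). "At grade level"
# requires S on every benchmark; any D yields "below grade level".
def overall_rating(benchmark_ratings):
    """Collapse per-benchmark S/D/M ratings into an overall judgment."""
    if any(r == "D" for r in benchmark_ratings):
        return "below grade level"
    if all(r == "S" for r in benchmark_ratings):
        return "at grade level"
    # The report does not spell out S's mixed with M's only; flag it.
    return "incomplete evidence"

print(overall_rating(["S"] * 19))          # at grade level
print(overall_rating(["S"] * 18 + ["D"]))  # below grade level
```

The strictness of the all-S requirement is visible here: a single D on any of the 19 intermediate benchmarks moves the overall rating below grade level.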

The Common evaluation system was developed collaboratively by the teachers at the second cross-site meeting. It more closely resembled Bellevue's system than KEEP's; it was oriented to just seven outcomes, rather than to a whole series of benchmarks. The outcomes were Reading Product, Reading Process, Reading Enjoyment, Writing Product, Writing Process, Writing Enjoyment, and Self-Evaluation/Reflection. Like the Bellevue system, the Common system used a descriptive 6-point scale, with a rating of 3 describing typical grade-level performance. The descriptors for the points on the scale were somewhat more specific than in the Bellevue system, but not as specific as in the KEEP system. Thus, the Common evaluation system required more inference than the KEEP system but less than the Bellevue system.

The treatment of reading comprehension in the Common evaluation system is similar to that in the Bellevue system, focusing on a broader outcome, Reading Product. However, the teachers also liked the specificity provided by the benchmarks in the KEEP system. They felt that rating portfolio evidence would be easier if descriptors could be made more precise and detailed than in the Bellevue system. For this reason, they began developing descriptors for ratings of 1 through 5 for the Common evaluation system (see Table 3). The teachers' intention was to integrate some of the KEEP benchmarks into the descriptors, but time did not permit them to accomplish this task. However, they spent considerable time discussing how concepts listed under the benchmarks would fit within each of the outcomes. So, although the written documentation was not fully developed (see Table 3), their discussion helped them clarify and specify their understanding of the criteria. They also suggested that it would be useful to specify required common tools to be used in the future for the new evaluation system.

Interrater agreement was calculated among evaluators across sites. Using the KEEP nominal scale of S, D, and M, agreement among raters required an exact match. Using the Bellevue and Common scales, ratings that were within 1 point were considered matches, as were matches in "missing" designations.
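The two agreement definitions described above can be made concrete with a short sketch. This is our own formulation for illustration, not the authors' analysis code: exact match on the nominal S/D/M scale, and within-1-point (or a shared "M") on the numeric Bellevue and Common scales.

```python
# Sketch of the two interrater agreement rules: nominal scales require an
# exact match; numeric scales count ratings within 1 point as agreement,
# with "M" (missing) matching only another "M".
def agree_nominal(r1, r2):
    return r1 == r2

def agree_numeric(r1, r2):
    if r1 == "M" or r2 == "M":
        return r1 == r2           # "missing" only matches "missing"
    return abs(r1 - r2) <= 1      # within 1 point counts as a match

def agreement_rate(pairs, agree):
    """Proportion of rating pairs on which two raters agree."""
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

# Illustrative rating pairs (invented data, not study results).
pairs = [(3, 4), (5, 3), (2, 2), ("M", "M"), (1, "M")]
print(agreement_rate(pairs, agree_numeric))  # 3 of 5 pairs agree -> 0.6
```

Note that the within-1 rule makes the numeric scales more forgiving than the nominal scale, which is one reason agreement figures from the two kinds of scales are not directly comparable.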

Professional development. Teachers' professional development was examined qualitatively by analyzing the information provided in the individual interviews conducted at the beginning and end of the study. In addition, audiotapes made at the meetings provided evidence of what teachers were learning. Audiotapes were made of all whole-group discussions, portfolio orientation sessions, and debriefing discussions after teachers observed each other's classrooms. Records of the contents of portfolios teachers submitted at the first and second meetings served as another source of information about changes in the teachers' understandings.

NATIONAL READING RESEARCH CENTER, READING RESEARCH REPORT NO. 73


Portfolios Across Educational Contexts 15

Table 4
Content Analysis of Portfolios

                                             Primary          Intermediate
                                          Bellevue  KEEP    Bellevue  KEEP
Average total number of pieces¹              18      18        22      23
Disciplinary Focus
  reading                                    13       7         6       2
  writing                                     8       8         8      11
  other subjects                              4       2         5       1
Type of Artifact
  student work                               16      15        20      14
  anecdotal notes/checklists                  1       4         0       6
  other (photo, drawing, audiotape)           2       1         1       2
  student-selected/self-assessment/
    reflection                                8       5        10       6
  parent entry/comment                        1       0         0       1

¹ All numbers are averages across the portfolios evaluated. Each portfolio artifact was coded according to the variables listed above. As discussed in "Data Sources," some artifacts were double coded. Therefore, column totals do not equal the average total number of pieces.

Findings and Discussion

Portfolio Content Analyses

The contents of the portfolios were used to determine if they included authentic, high quality samples of the curriculum outcomes, and to compare the types of artifacts across sites and over time. There had been no a priori agreement about the portfolio evidence, nor had there been any contact between teachers across sites before the first meeting. Table 4 presents a summary of the average number of types of artifacts included in all 8 portfolios analyzed at the first meeting. As noted above, artifacts were coded according to several criteria, which resulted in double coding of some artifacts.

The analysis revealed surprising similarity between Bellevue and KEEP portfolios in terms of the average number of pieces collected in each portfolio between September and January. However, the distribution of pieces across sites varied considerably and was consistent with the emphasis at each site. For example, Bellevue portfolios included many more reading pieces than KEEP portfolios, reflecting attention to collecting both reading and writing portfolio artifacts in the Bellevue portfolios. On the other hand, there were only a few more writing pieces in KEEP portfolios than Bellevue's, even though 3 of the 4 KEEP teachers emphasized collecting evidence from their writers' workshop. Teachers confirmed that it was easier to collect writing evidence than reading evidence, and suggested that without deliberate attention to collecting reading artifacts, it was difficult to represent reading in the portfolios. Consequently, an emphasis on reading in the portfolios resulted in more reading evidence, but a corresponding emphasis on writing in portfolios did not produce more writing evidence.

Several different types of artifacts were found in the portfolios. The most common, however, was student work. Within this category, student-generated written work predominated. For example, there were many samples of students' original writing such as stories, reports, responses to what they were reading, and daily journals. Often there were rough drafts and final published pieces. There were also several surveys and reading logs. Only a few, 2 out of 81, of the artifacts were lower-level, fill-in-the-blank types of activities. The overwhelming majority of student work in these portfolios represented authentic instances of students' literacy learning; artifacts of student performance resembled meaningful reading and writing behaviors and tasks. For example, there were pen pal letters, selected pages from learning journals, book logs, original poetry, and reading and writing interest surveys from the beginning of the school year. One teacher remarked,

I think one of the reasons that our project worked is because we were really looking at . . . very authentic evidence of kids' work. There weren't any worksheets in there, and I'm not just panning worksheets, . . . but I felt like we were looking at authentic evidence of what kids could produce.

There were slightly more student-selected pieces, goal setting artifacts, and portfolio visits in Bellevue portfolios, consistent with their portfolio model and Self-Reflection outcome. On the other hand, KEEP portfolios included substantially more teacher observation and anecdotal data. This is most likely because several KEEP benchmarks required completion of observation checklists and one required running records.

There was more consistency in both disciplinary focus and types of artifacts among portfolios from the same classroom than among portfolios from the same sites; we could easily identify portfolios from the same classroom. For example, one class had spent several weeks researching and writing state reports. As a result, the reading and writing portfolio artifacts in those portfolios were distinctive from other classes. Some of these artifacts served as required pieces; others were selected by the teacher or student as personal entries. Although teachers included items according to their specific site guidelines, they also included a variety of artifacts they, or their students, wanted to include. Even with this individuality, there was still a marked similarity among portfolios from the same site; Bellevue portfolios looked more like other Bellevue portfolios than like KEEP portfolios. Bellevue's portfolios were distinctive for their "common tools" (i.e., reading summaries, book logs, entry slips, portfolio visits), and KEEP portfolios were distinctive for their reading and writing checklists.

While reviewing each other's portfolios at the first cross-site meeting, teachers spontaneously discussed types of evidence that were new or interesting to them. For example, Bellevue teachers noted how KEEP teachers documented the planning stage of writing and how they used the benchmarks to help students self-evaluate their writing. Similarly, KEEP teachers were interested in Bellevue's common tools, use of entry slips, and the ways teachers documented reading as well as writing. A review of portfolio contents at the second cross-site meeting revealed that teachers from both sites had incorporated some of the teaching strategies and portfolio artifacts they had learned from each other. Bellevue portfolios contained more evidence of planning for writing, using rubrics with students, and making intertextual connections. KEEP portfolios contained more entry slips, reflections about learning, and information related to students' reading. In some cases, these new artifacts reflected an effort to document student learning that teachers reported they had observed but had not included in a portfolio. For example, most teachers reported holding reading conferences with students and having them plan for writing, but they had not focused on including conference notes or planning notes in the portfolios. In other cases, the new artifacts reflected a new instructional emphasis teachers had learned during observation, discussion, and evaluation. For example, at the first meeting, several teachers discussed the importance of having students make intertextual connections during literature discussions. Other teachers found these ideas new and intriguing. By the second meeting, several teachers had tried to help students make more meaningful intertextual connections and had included that evidence in their portfolios.

We should note, however, that teachers were as struck by the similarities in their philosophies and instructional emphasis as by the differences in the portfolio contents. As one Bellevue teacher put it, the contents of the KEEP portfolios

are somewhat different from ours, but they ask a lot of the same things . . . What the student produces is very similar. Our portfolios might look a little different, but you get the essence of the child, I think, in both of them.

A KEEP teacher noted:

It was so neat to find that the Bellevue teachers and KEEP teachers shared many common goals in literacy. We found many similar successes as well as challenges. How wonderful that we could examine portfolios separately and still come up with common agreement and understanding of an individual through [the] portfolio.


Table 5
Interrater Agreement: Site by Evaluation Criteria

                      KEEP Portfolios   Bellevue Portfolios   Total
KEEP Criteria               80                  74              77
Bellevue Criteria           81                  89              85
Common Criteria             87                  97              92

With a shared understanding of literacy learning and teaching behind them, and a familiarity with one another's portfolio artifacts, these teachers could see the commonality in learning outcomes and the uniqueness of individual children in portfolios from different sites and classrooms.

Overall, the content analyses suggest that portfolios contained high quality, authentic samples and records of students' reading and writing. They reflected the underlying outcomes and portfolio models of each site as well as the emphasis of individual teachers. However, evidence of some outcomes and some types of work, particularly reading outcomes and discussions, was apparently difficult to document using a portfolio, resulting in limited information about particular outcomes. Nevertheless, the process of reviewing and discussing portfolio contents provided teachers with ideas about both assessment and instruction that carried over into their own classrooms and their students' portfolios.

Evaluation Process and Results

Qualitative and quantitative data from the evaluation sessions provide insights about the feasibility and desirability of cross-site evaluation of classroom-based portfolios. Results indicate that teachers from different sites were able to learn and apply their own and other evaluation systems. Interrater agreement, averaged across outcomes, increased from the KEEP, to the Bellevue, to the Common criteria for portfolios from both sites (see Table 5). The relatively strong interrater agreement suggests promising possibilities for reporting results for groups of students across sites. Such agreement is remarkable because the portfolios did not contain similar prespecified pieces; they were developed to be most useful at the local district and individual classroom level. As a result, there was substantial variability in the contents across portfolios from different sites. Additionally, because most of the pieces in the portfolios were complex authentic examples of reading and writing, individual pieces often contained evidence of several outcomes or benchmarks. For example, a reading response journal might contain evidence of reading ownership, writing ownership, reading comprehension, and writing process. The evaluator would need to review each piece carefully so that all possible evidence was identified within a particular artifact. Furthermore, individual portfolio artifacts did not receive ratings; instead, the entire contents of each portfolio were reviewed and then a rating given for each reading and each writing outcome or benchmark. Evaluators reviewed 12 to 25 pieces within each portfolio. Finally, the teachers had only brief training for evaluation, and then spent only 30 min rating each portfolio for all the outcomes or benchmarks.

Findings from the evaluation process also provide a perspective on one specific aspect of validity: content or construct underrepresentation (Linn et al., 1991; Messick, 1994). Although most proponents suggest that portfolios provide valid assessment of literacy learning, three assumptions underlie this belief. First, it is assumed that all identified outcomes or benchmarks are adequately documented in the portfolio. Second, the evaluation criteria used to evaluate the portfolios must be aligned with these outcomes or benchmarks. Third, it is anticipated that teachers can agree on the evidence in a portfolio that can be used to evaluate each specific outcome or benchmark. These are difficult conditions to achieve when portfolios are generated and evaluated within a single site, and significantly more difficult when portfolios are generated and evaluated across different sites. In cross-site evaluation, it is important to determine if there is a fit between portfolio evidence and the evaluation criteria used within and across sites. Additionally, teachers must be able to interpret different types of complex portfolio evidence produced in different contexts to fit site-specific outcomes or benchmarks. To address these issues of fit, we examined the extent to which the actual portfolio artifacts were judged to represent evidence of the outcomes and benchmarks included in the various evaluation criteria. Specifically, we looked for missing data to determine if the portfolios from each site provided sufficient evidence to make judgments about student performance. If teachers cannot find or interpret artifacts needed to apply specific evaluation criteria, then the validity of the evaluation should be questioned.

Figure 1 presents the percent of missing data recorded for portfolios from each site according to each set of evaluation criteria. As teachers rated, they looked for evidence of specific outcomes or benchmarks in each portfolio. If an individual rater determined that there was no evidence in any of the artifacts, she recorded "missing." The percentage depicted in Figure 1 represents the percent of all the ratings assigned to each portfolio that were designated "missing." There was a substantial amount of missing evidence recorded by the teachers when using the KEEP evaluation criteria, less for the Bellevue criteria, and slightly less for the Common criteria.
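The percent-missing calculation just described is straightforward; as a rough sketch (our reconstruction for illustration only, with hypothetical ratings):

```python
def percent_missing(ratings):
    """Percent of all ratings assigned to a portfolio that were "missing".

    `ratings` holds one entry per rater per outcome or benchmark; a rater
    recorded "missing" when no artifact in the portfolio provided
    evidence for that outcome.
    """
    missing = sum(1 for r in ratings if r == "missing")
    return 100.0 * missing / len(ratings)

# Hypothetical example: 10 ratings for one portfolio, 3 recorded "missing"
ratings = [4, "missing", 3, 5, "missing", 2, 4, "missing", 5, 3]
print(percent_missing(ratings))  # -> 30.0
```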

In-depth analysis suggests that the missing data for the KEEP evaluation system were a result of several factors. First, the KEEP criteria required very specific indicators of performance and specific types of information (e.g., running records, observation checklists); if specific types of evidence were not found, "missing" was recorded. This could be considered a problem of "grain size," that is, needing to find very particular indicators of performance. Second, three KEEP teachers did not systematically collect portfolio information on reading and one did not collect information on writing. Although the KEEP teachers taught


[Figure 1 appears here: a graph comparing the percent of missing data for KEEP portfolios and Bellevue portfolios under each of the three evaluation criteria.]

Figure 1. Percent missing data across three evaluation criteria.

both reading and writing, they did not try to focus portfolio information on both. Consequently, there was a significant amount of missing data for reading in the KEEP portfolios when evaluating with the KEEP criteria. On the other hand, there was also a substantial amount of missing data in the Bellevue portfolios when the KEEP criteria were applied. This was because Bellevue teachers did not include the very specific kinds of evidence required for some of the KEEP benchmarks (e.g., running records, preplanning for writing, writing in multiple genres).

In contrast, there was relatively little missing data found when the Bellevue evaluation criteria were applied; this was most likely a reflection of the more global nature of the Bellevue descriptors. Missing evidence in KEEP portfolios was predominantly in the area of student self-reflection, which was not a KEEP benchmark. KEEP teachers had not emphasized student self-reflection in their teaching or in their assessment. As a result, there were few, if any, artifacts of self-reflection in the KEEP portfolios. There were also some missing data in the Bellevue portfolios in the area of Reading Strategies. Bellevue teachers had difficulty documenting this outcome even though it was part of their curriculum.

The Common evaluation system, which was a synthesis of the KEEP and Bellevue criteria


(adopting the global outcomes of the Bellevue rubric with some of the specificity of KEEP benchmarks), produced missing data resembling that identified for the Bellevue rubric. Missing data were found primarily in areas of self-reflection, writing process, and reading process. These areas replicated the missing data identified when the KEEP and Bellevue criteria were applied.

These findings suggest that missing data are a problem not only when applying "outside" evaluation criteria to local portfolios but also when applying the "inside," locally developed criteria. If teachers or students are unfamiliar with the evaluation criteria ahead of time, they may not select work required by the criteria. This does not necessarily mean that outcomes or benchmarks were not taught or learned, or that the portfolios could not or would not include this evidence. It simply suggests that there was not a perceived need to include it. For example, after reviewing and evaluating KEEP portfolios, one Bellevue teacher commented that she wanted to try to document small group discussions and research projects in her portfolios. Although these were already a part of her instructional program, there was no evidence in her students' portfolios to demonstrate these abilities. Alternatively, it is possible that evidence was missing because a particular outcome was NOT part of the class curriculum. An example would be self-reflection for the KEEP portfolios.

The nature of these missing data suggests that a specific evaluation system used with a classroom-based portfolio may not provide a valid measure of a particular child or of a particular classroom (e.g., no information placed in the portfolio; using evaluation criteria that were not intended for a particular site). To remedy this might require a fairly minor adaptation, for example, requesting that teachers include specific artifacts typically found in their classrooms. Both the KEEP and Bellevue teachers assured us this was possible and desirable. Alternatively, it might require a more dramatic intervention: a change in the curriculum (outcomes/benchmarks) or instructional emphasis, or revision of the evaluation criteria to align them with the actual curriculum.

Although there was a considerable amount of missing evidence, teachers across sites generally were able to agree on the presence or absence of evidence. On average, teachers reached 94% interrater agreement about whether evidence for each outcome could be found in the artifacts or whether evidence was missing. They were able to examine many different types of complex portfolio artifacts generated in different classrooms and agree on what counted as evidence of particular learning outcomes or benchmarks. It appears that the teachers shared a strong knowledge base in literacy that enabled them to interpret student work consistently. There were few instances (less than 6%) of teachers from one site finding sufficient evidence to assign a rating and other teachers finding no evidence.

Cross-site evaluation of portfolios also provides insight about teachers' abilities to use a different set of outcomes and criteria to evaluate their own and others' portfolios. In addition to the complexities outlined above, teachers had to step outside their own portfolio models, artifacts, instructional strategies, evaluation systems, and personal knowledge of


[Figure 2 appears here: a bar graph (0-100%) showing interrater agreement for KEEP and Bellevue portfolios under each of the three evaluation criteria, with score agreements and missing-missing agreements plotted separately.]

Figure 2. Interrater agreement across three evaluation criteria.

their students to evaluate the portfolio artifacts. We addressed this aspect of evaluation by examining if teachers' ratings of portfolios from their own site were different from ratings given by the "outside" teachers.

Our analysis suggests that teachers did not have a bias when interpreting and rating portfolios. Bellevue teachers and KEEP teachers did not consistently rate their own portfolios higher or lower than the teachers from the other site. KEEP teachers never rated their portfolios higher than Bellevue teachers did, and Bellevue teachers rated their own portfolios higher than KEEP teachers did less than 1% of the time. All the teachers appeared to be able to use the portfolio information, adjust their expectations to the criteria being used, and apply them equally to their own and others' portfolios. In light of concern about the fairness of new assessments (Linn et al., 1991), this finding is especially promising. Not only were the students from very different cultural backgrounds, but the teachers (raters) were as well.

Overall, the evaluation data suggest that these teachers learned to rate portfolios with a high degree of interrater agreement using several different, but philosophically similar, evaluation systems. Several factors most likely


contributed to this consistency. First, teachers shared common conceptual understandings about literacy learning that had been developed for 4 years at each site before this project began. These understandings were further enhanced through the classroom visits, cross-site portfolio discussions, and the evaluation process that were part of this study. This common knowledge was reflected in the teachers' abilities to interpret portfolio work from their own site and the other site. One teacher commented:

I think what surprised me most was how easily we could communicate with the teachers about portfolios. . . . We almost immediately got into good discussions about how the evidence was collected. I learned that we could talk to teachers from a very different setting about portfolios and come up with much agreement about the students' skills and performances. The KEEP teachers saw things in my portfolios that transcended the tools and "spoke" of the student.

Integral to this professional collegiality was the low stakes nature of the project. Teachers felt fortunate to be part of this working group, were generous with their colleagues, and eager to learn. One Bellevue teacher noted that,

With the KEEP (teachers), I think there was more opportunity to really look at what people were doing and then say, "Gee, I think I need to be doing that," whereas in a lot of our own meetings, . . . it's sometimes been competitive.

A second explanation for strong interrater agreement might be the inclusion of missing data. Figure 2 depicts the difference in interrater agreement when missing agreements are added or deleted from the calculations. Obviously, interrater agreement would have been substantially lower, especially using the KEEP rubric, had we not counted "missing-missing" agreements into our calculations. However, most evaluation systems do not use a "missing" category, but rather rate no evidence or minimal evidence with the lowest value on the evaluation scale. We believe that the ability to interpret complex work in portfolios, and to see (or not see) evidence of outcomes, are important issues for portfolio evaluation and professional development. This is especially true if portfolios are locally defined but evaluated across sites; the complexity and variability will always be present to complicate interpretation.

We also believe missing data shed light on the alignment of curriculum and assessment. A substantial amount of missing evidence suggests a lack of alignment, an underrepresentation of the targeted outcomes (Messick, 1994). Consequently, the validity of the assessment would be questioned. Our experience suggests, however, that teachers can become more focused on collecting portfolio evidence when they know what is expected and believe that evidence would be useful to them as well as others. Nevertheless, even with the best intentions, we found that some types of evidence were difficult for teachers to document.

Finally, the increase in interrater agreement across evaluation criteria was probably somewhat influenced by the progression from a very specific evaluation system (KEEP), to more global criteria (Bellevue), to the collaborative development of the Common criteria, which teachers felt combined the best of both systems. In addition, although 4 months separated


the two meetings, and teachers did not evaluate portfolios at their local sites during this time, they were interested and motivated to prepare for the next cross-site meeting. This, in turn, focused their attention on the portfolio contents, the outcomes and benchmarks, and the evaluation process, which might have improved their interrater agreement over time.

The evaluation process was judged by teachers to be a valuable part of portfolio implementation. Their comments suggest that collaboratively learning to identify evidence, and then rate that evidence, provided insight about assessment and instruction that they might otherwise not have gained. Because there were low stakes placed on the scores and high priority placed on professional development, teachers were not intimidated by the evaluation process or defensive about the results. They were able to take full advantage of the opportunities. They found the experience to be engaging, interesting, and professionally useful. Said one teacher,

I can't remember when an experience like this gave me as much to think about in terms of what I could adopt and adapt. The chance to re-examine what we had developed by looking in a very similar but different mirror is a very unusual opportunity and (one) to be highly prized.

Professional Development

A more detailed examination of professional development issues is made possible by looking closely at two teachers, one from Bellevue and one from KEEP. These two were selected because they typify the eight teachers in the study. They were articulate and eager to talk about their experiences. Through their words and experiences, we gain insights that are evident in the protocols of the other participating teachers. We analyzed the statements made by these teachers during interviews and meetings and looked at their students' portfolios for evidence of the influence of the project upon their professional development. The teachers' names are used with their permission.

Sue Bradley, who taught a third- through fifth-grade multiage classroom, had 12 years of teaching experience at the time of the project and had a strong background in the use of portfolios. Sue was an original member of the Bellevue portfolio group, had used portfolios in her classroom for 4 years, and had been one of the district's representatives to the New Standards Project. When Sue first observed the classroom of Nora Okamoto, a KEEP fifth-grade teacher, she immediately noticed similarities to her own teaching. "For example, writers' workshop, literature circles, response journals were all things that I instantly recognized in Nora's room." These similarities allowed her to focus on specific details of Nora's instruction, such as "how she paced her day, what she asked of students in and out of class, and visible signs around the room of the accountability that she had established for her students." The similarities in philosophy and instruction made it easy for Sue to communicate with Nora and the other KEEP teachers about portfolios. "There was little lead time needed to build a common understanding, and instead we almost immediately got into good discussions about how the evidence was collected."

After her visit to KEEP, Sue noted that she wanted to try more of the modeling of reading


and writing with her students that she had seen Nora do. In the classroom of another KEEP teacher, she was impressed by the student-led literature discussions and the emphasis on quality responses. She remarked on the KEEP teachers' constant use of the terms of the KEEP reading and writing benchmarks (e.g., author's craft, reading/writing connections, reading and writing in different genres), which gave students the language for communicating with teachers about their progress. Finally, she thought it was a good idea to involve students in assisting with record keeping.

Involvement in portfolio evaluation and development of the Common evaluation rubric, with the complex discussions that these entailed, broadened Sue's understanding of portfolios. She stated:

I think it's a healthy process because any time you have to teach somebody what you mean, it is like peer teaching in a classroom. It's not just giving, it's a very reciprocal situation. So when you have to explain what you meant in a certain situation to someone else, you're then clarifying your own thinking.

The discussions helped Sue to see the significance of various artifacts within another teacher's portfolio. She stated, "Because I had conversations with her (Nora), sat and wrestled with scores with her, I gained more of an understanding about what that chart (a portfolio artifact) represents."

At the conclusion of the project, Sue noted that she had started being more specific with her students about the kinds of evidence needed to document their involvement with the writing process. She described herself as "much more religious about looking for evidence of preplanning writing," to the point where she required her students to write out their plans instead of making this step optional. She had her students evaluate their own writing using an evaluation rubric, as a means of teaching them about standards for literacy performance. She also had become more aware in her own classroom of the quality of students' responses during literature discussions.

The portfolio content analysis served to verify that Sue had succeeded in making several changes. For example, one student's portfolio contained a plan for writing and a pen pal letter (to a student in Nora's class) with a prewriting web. Clearly, this student had become aware of gathering evidence for planning her writing. This portfolio included a piece of explanatory writing with two student-scored rubrics and a checklist on which the student had evaluated her progress as a writer. These artifacts indicated that the student was involved in activities to promote awareness of standards for literacy performance. The portfolio also contained the notes of a reading conference, showing Sue's concern with the quality of the student's responses to literature.

As noted above, Sue's counterpart at KEEP was Nora Okamoto. Nora had been teaching for 21 years and associated with KEEP for 4 years. At the time of the project, she had used portfolios in her classrooms for 2 years, but only as part of her writers' workshop, not her readers' workshop. In the first year, following the KEEP approach, she had familiarized the students with the fifth-grade benchmarks for writing. Then she had gone through the students' portfolios and tagged pieces showing evidence that the students had met the various

NATIONAL READING RESEARCH CENTER, READING RESEARCH REPORT NO. 73


26 Valencia & Au

benchmarks, a documentation procedure required in KEEP's monitoring of student achievement. During the second year, Nora decided to turn the responsibility for selecting and tagging evidence over to the students. She felt this approach had worked to address the only weakness she saw in using portfolio assessment: that it could be extremely time-consuming. Before meeting the Bellevue teachers, Nora indicated that she was in the process of developing a system to assess her students' progress in reading to parallel the system she had worked out for writing.

Contact with the Bellevue teachers accelerated Nora's thinking about how to expand her portfolios to encompass reading as well as writing. Nora found that immersing herself in another portfolio system "afforded me the opportunity to see a wide range of data collection tools that work well in assessing student achievement and growth." By studying the Bellevue portfolios, she gained many specific ideas about how to document growth in reading. Before her visit to Bellevue, Nora had not seen a multiage classroom or a school building organized into pods to encourage team teaching. However, like Sue, she was able to see beyond the differences in classroom and cultural contexts, noting Sue's students' involvement in many of the same activities she used in her own classroom, including literature circles, independent reading, and reading journals.

Nora indicated that involvement in portfolio evaluation and development of the common evaluation rubric had "broadened [her] perspective on portfolio contents and organization." She gained confidence in her ability to understand descriptors and evaluate individual

students' portfolios. She was impressed by the group's ability to reach agreement about the criteria for judging students' competence in reading and writing.

At the conclusion of the project, Nora observed that she had become aware of the importance of students' self-evaluation and reflection and regular portfolio visits, central features of the Bellevue portfolio system. She noted:

I've expanded on the student entry tags on the writing process steps to include comments about specific skills, and I now have students evaluate writing projects upon completion (self-evaluation and reflection). I plan to use colored entry slips and the portfolio visit questionnaire as part of the portfolio next year. The entry slips were helpful in understanding the significance of the work and why it was selected for the portfolio. Quarterly portfolio visits provided insight into students' thinking about their overall progress in that quarter.

Nora concluded that "participation in this project has had a positive impact on my learning and growth as a classroom teacher." She added that she had gained "methods to try, tools to use, and a better understanding of portfolios as a way to integrate teaching, learning, and assessment."

A content analysis of a sample of Nora's portfolios at the end of the school year indicated that she had already succeeded in making several changes. At the end of the second quarter, after her first meeting with the Bellevue teachers, Nora had her students add the following to their portfolios: a page from their reading journals; a web showing efferent and aesthetic responses to literature; and activity


sheets showing vocabulary development, story sequencing, and an analysis of story parts. These items were evident in students' portfolios in the fourth quarter. Entry slips similar to those used in Bellevue were placed on the items, indicating what students thought the items showed about their progress as readers and writers.

As these case examples indicate, the teachers in this project gained specific ideas for improving portfolio assessment in their classrooms. Their study, rating, and discussions of each other's portfolios heightened their concerns about documenting students' literacy learning in as complete and sensitive a manner as possible. Teachers seemed to be adding to their notions of what should be included in a portfolio and broadening their thinking about important dimensions of student performance. Their approach seemed to be additive; they did not discard any of the dimensions recognized as important in their own portfolio system, but added on dimensions highlighted in the other site's portfolios (for example, self-evaluation in the case of Bellevue and the planning of writing in the case of KEEP). In other words, they appeared to be expanding the criteria by which they assessed students' literacy learning, knowing that their instruction would have to be adjusted to meet these additional criteria. As Frederikson and Collins (1989) suggest, developing an evaluation system and learning to apply it can make the critical traits in performance clear to teachers as well as students.

Understanding a Portfolio Assessment System: The Components and Conditions

Any discussion of this case study must take into account the naturally occurring context,

both for its best-case perspective and for the realities of classroom-based research. Portfolios had been well-established, integral elements of these teachers' classrooms and, as such, they reflected the experience, knowledge, commitment, professional development, idiosyncrasies, and individualization each teacher brought to her own classroom and to the cross-site collaboration. Such is the nature of all classroom-based assessment and, indeed, all instruction. We found that rather than serving simply as assessment tools, effective portfolios were actually part of a complex and interactive portfolio system that served to support both teaching and learning. Our purpose, therefore, is not to generalize to other projects, list the specifications for portfolio contents, or suggest evaluation criteria, but to highlight the essential components and the internal and external conditions which create a successful portfolio assessment system.

The Components

The three components described in this study (portfolio evidence, the evaluation process, and professional development) are at the heart of such a portfolio system. All must be in place if portfolios are to improve instruction, student achievement, and classroom-based assessment. Each component has a profound influence on the others and on the overall effectiveness of the system (see Figure 3). A brief description of the interaction among these components follows.

Figure 3. Model of effective portfolio system.

Interaction of portfolio evidence and professional development. Our data and experiences suggest that as teachers closely examine students' work from their own and other classrooms, they clarify important learning outcomes and learn to interpret student performance based on multiple forms of evidence. Working collaboratively to discuss portfolio artifacts encourages teachers to reexamine their knowledge, assumptions, and misconceptions about teaching and assessment practices. Conversely, teachers' understandings of curriculum and instruction are reflected in portfolio evidence. We can see their strengths and weaknesses, their priorities, and the opportunities they provide for their students. As teachers grow and change through professional development, their portfolio evidence begins to change as well. This slow, iterative process is likely to produce meaningful, sustainable changes in both portfolio evidence and in underlying classroom instruction.

Interaction of portfolio evidence with the evaluation process. Using portfolios for evaluation forces teachers to rely on evidence for their decisions. It grounds evaluation in concrete data. Teachers also learn that students cannot be evaluated for learning that has not been documented or that has not been taught. When portfolio evidence is inadequate or unavailable, the evaluation process may be complicated and the results may be misleading or subject to question. Results will differ dramatically depending on the interpretation, sources, and amount of portfolio evidence that is missing. At the same time, the evaluation process can reveal gaps in portfolio evidence that, in turn, reflect gaps in teaching and classroom assessment. As a result, portfolio contents and related instruction are likely to change to match the evaluation criteria. It is clear, however, that genuine changes in portfolio evidence occur only when teachers find portfolios useful in their own classrooms.

Interaction between the evaluation process and professional development. The evaluation component is absent from many local portfolio projects. Many reject it because of negative associations with standardized test scores and the psychometric difficulties of portfolio evaluation. However, when teachers are involved in the development of evaluation criteria and the actual evaluation process, the nature of the interaction between evaluation and professional development changes from an antagonistic one to a symbiotic one. The evaluation process becomes a positive and valued vehicle for teachers to examine their instruction and expectations for children. Evaluation of portfolios anchors teachers' expectations beyond their local classrooms, and it reinforces the importance of making criteria clear to students. In addition, with practice, teachers become better evaluators, more able to identify evidence in complex student work and to make reliable judgments about the quality of that work. And, as noted above, once teachers are familiar with the evaluation criteria and types of evidence that are useful, they can adjust their portfolios and instruction accordingly.

Teachers who have developed a strong knowledge base in teaching and learning have little difficulty applying evaluation criteria to portfolios from their own and other sites; those who are less well-grounded would likely experience difficulty. Furthermore, teachers'

knowledge and professional development experiences influence their participation in the evaluation process. Experienced portfolio teachers use the process to ask questions, learn new strategies, and support their colleagues. They enrich the evaluation process and take advantage of the opportunity for professional growth.

Internal and External Conditions

Evidently, a push or pull on one component within this system affects the others. In addition, the nature of these interactions is influenced by the internal and external contexts in which they exist. Bird (1988) reminds us that "the potential of portfolio procedures depends as much on the political, organizational, and professional settings in which they are used as on anything about the procedures themselves" (p. 2).

Supportive internal conditions are essential to sustaining an effective portfolio system. In this study, the local conditions within each site supported the emerging portfolio systems. Stakes were low for teachers; district interest, support, and commitment were high. Local, long-term professional development encouraged gradual implementation of portfolios and development of an evaluation process. Conversations about implementation difficulties were important, not threatening, to teachers. Discussions about instruction were part of the process, not add-ons. In a supportive local context, teachers have ownership over their respective portfolio implementations, and they take responsibility and pride in them. Teachers are supported and feel supported in their efforts. The process is valued as much as the portfolio product.

The addition of a compatible cross-site (external) perspective strengthens each of the components: portfolio evidence, evaluation process, and professional development. Because, in this case, the external perspective was perceived as supportive, there was no "coaching" for the evaluation sessions or time spent by the teachers or students perfecting portfolio artifacts. There was no fear of results. With an external perspective, teachers have an opportunity to expand their understandings of important learning outcomes, explore new instructional techniques, and anchor their expectations for student performance against common public standards. An external perspective provides a sounding board, a point of comparison, for locally developed outcomes, strategies, and standards. It provides an opportunity to discover the inappropriate or unintentional skewing of expectations that sometimes comes from being too isolated or close to one's own work.

Without supportive internal and external conditions, the results of this study certainly would have been different. Had there been less flexibility in portfolio implementation, more requirements for portfolio contents, more time spent on evaluation, or greater familiarity with the evaluation criteria, we might have achieved higher interrater agreement or had less missing evidence. However, we are equally convinced that the benefits to professional development would not have been as great. The portfolio requirements and the evaluation process would have been too far removed from local classrooms to be meaningful and useful for teachers. Similarly, had there been less commonality

in literacy philosophy and knowledge base, teacher support, or portfolio experience across sites, our findings would not have been as positive. The external perspective was beneficial because it was compatible with the emphasis, experience, and needs of each site. Had there been less compatibility, we are certain that all the components would have been more problematic; portfolio evidence would have been more difficult to interpret, the evaluation process would have been more complex, interrater agreement would have been lower, and professional development would have been minimized.

We realize that some will take exception to our model of a classroom-based portfolio assessment system, with its emphasis on clearly stated outcomes and some required portfolio evidence (e.g., Carini, 1975; Hansen, 1994). Others will question the amount of support, cost, and time needed to implement such a project. Others will challenge the evaluation process and the credibility of results. And others will doubt the need for an external perspective for locally developed portfolios. However, with all the elements in place, we believe the process is a strong one: it has system validity (Frederikson & Collins, 1989; Messick, 1994). This assessment process promotes valued changes in teaching and learning; this is the ultimate goal of assessment.

Taken together, the process and products of this study point to the naiveté of mandating portfolio implementation and expecting immediate results. Certainly, states and school districts can mandate teachers' use of portfolios, but they cannot mandate their effectiveness for assessment nor their positive effects on


teaching and learning. The challenges of our experiences most assuredly will be magnified with mandated portfolio assessment, perhaps beyond the point of finding the process or the results beneficial to any stakeholders: teachers, students, parents, administrators, or policymakers. Instead, we argue for a portfolio system that includes attention to all the components (portfolio evidence, evaluation process, and professional development), all in supportive local and external contexts. Without such a system, portfolio assessment will be disappointing, ineffective, and doomed to failure. With such a system, it can contribute to high-quality education for all children.

Author Note. We are indebted to the teachers and administrators associated with the Bellevue and KEEP portfolio projects for their collaboration, commitment, and support.

References

Applebee, A. N. (1991). Environments for language teaching and learning: Contemporary issues and future directions. In J. Flood, J. M. Jensen, D. Lapp, & J. R. Squire (Eds.), Handbook of research on teaching the English language arts (pp. 549-556). New York: Macmillan.

Arter, J. A., & Spandel, V. (1992). Using portfolios of student work in instruction and assessment. Educational Measurement: Issues and Practice, 12(1), 36-44.

Aschbacher, P. R. (1994). Helping educators to develop and use alternative assessments: Barriers and facilitators. Educational Policy, 8, 202-223.

Asam, C., Au, K., Blake, K., Carroll, J., Jacobson, H., & Scheu, J. (1993). Literacy curriculum guide. Honolulu, HI: Kamehameha Schools Bernice Pauahi Bishop Estate, Early Education Division.

Au, K. H. (1994). Portfolio assessment: Experiences at the Kamehameha Elementary Education Program. In S. W. Valencia, E. H. Hiebert, & P. P. Afflerbach (Eds.), Authentic reading assessment: Practices and possibilities. Newark, DE: International Reading Association.

Au, K. H., & Asam, C. A. (in press). Improving the achievement of low-income students of diverse backgrounds. In M. Graves, B. Taylor, & P. van den Broek (Eds.), The first R: A right of all children. New York: Teachers College Press.

Au, K. H., Scheu, J. A., Kawakami, A. J., & Herman, P. A. (1990). Assessment and accountability in a whole literacy curriculum. The Reading Teacher, 43, 574-578.

Bird, T. (1988). The schoolteacher's portfolio: An essay on possibilities. In J. Millman & L. Darling-Hammond (Eds.), Handbook of teacher evaluation: Elementary and secondary personnel (pp. 241-256).

Calfee, R. C., & Perfumo, P. (Eds.). (1996). Writing portfolios: Policy and practice. Mahwah, NJ: Lawrence Erlbaum Associates.

Calkins, L. M. (1994). The art of teaching writing (2nd ed.). Portsmouth, NH: Heinemann.

Carini, P. (1975). Observation and description: An alternative methodology for the investigation of human phenomena. Grand Forks, ND: Center for Teaching and Learning, University of North Dakota.

Chittenden, E., & Spicer, W. (1993). The South Brunswick literacy portfolio project. Paper presented at the New Standards Project: English/Language Arts Portfolio Meeting, Minneapolis, MN.

Darling-Hammond, L., & Ancess, J. (1993). Authentic assessment and teachers' professional development. Resources for restructuring: Newsletter of the Center for Restructuring Education, Schools, and Teaching, 1-6.


Frederikson, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.

Freedman, S. W. (1993). Linking large-scale testing and classroom portfolio assessments of student writing. Educational Assessment, 1, 27-52.

Gearhart, M., Herman, J. L., Baker, E. L., & Whittaker, A. K. (1992). Writing portfolios at the elementary level: A study of methods for writing assessment (CSE Technical Report No. 337). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., Herman, J. L., Baker, E. L., & Whittaker, A. K. (1993). "Whose work is it?" A question for the validity of large-scale portfolio assessment (CSE Technical Report No. 363). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing.

Graves, D. (1983). Writing: Teachers and children at work. Exeter, NH: Heinemann.

Haney, W. (1991). We must take care: Fitting assessments to functions. In V. Perrone (Ed.), Expanding student assessment (pp. 142-163). Alexandria, VA: Association for Supervision and Curriculum Development.

Hansen, J. (1994). Literacy portfolios: Windows on potential. In S. W. Valencia, E. H. Hiebert, & P. P. Afflerbach (Eds.), Authentic reading assessment: Practices and possibilities (pp. 26-40). Newark, DE: International Reading Association.

Howard, K. (1990). Making the writing portfolio real. Quarterly of the National Writing Project and the Center for the Study of Writing and Literacy, 12(2), 4-7, 27.

Johnston, P. (1989). Constructive evaluation and the improvement of teaching and learning. Teachers College Record, 90, 509-528.

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5-16.

Koretz, D., McCaffrey, D., Klein, S., Bell, R., & Stecher, B. (1993). The reliability of scores from the 1992 Vermont Portfolio Assessment Program (Interim Report). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing.

LeMahieu, P. G., Eresh, J. T., & Wallace, R. C. (1992). Using student portfolios for a public accounting. The School Administrator, 49(11), 8-15.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

Moss, P. (1994). Can there be validity without reliability? Educational Researcher, 23(2), 5-12.

Moss, P. A., Beck, J. S., Ebbs, C., Matson, B., Muchmore, J., Steele, D., Taylor, C., & Herter, R. (1992). Portfolios, accountability, and an interpretive approach to validity. Educational Measurement: Issues and Practice, 11(3), 12-21.

Nystrand, M., Cohen, A. S., & Martinez, M. N. (1993). Addressing reliability problems in the portfolio assessment of college writing. Educational Assessment, 1, 53-70.

Pelavin, S. (1991). Performance assessments in the states. Washington, DC: Pelavin Associates.

Roser, N. L., & Martinez, M. G. (Eds.). (1995). Book talk and beyond: Children and teachers respond to literature. Newark, DE: International Reading Association.


Short, K. G., & Pierce, K. M. (Eds.). (1990). Talking about books: Creating literate communities. Portsmouth, NH: Heinemann.

Tierney, R. J., Carter, M. A., & Desai, L. (1991). Portfolio assessment in the reading-writing classroom. Norwood, MA: Christopher-Gordon.

Valencia, S. (1996). Portfolios in action. Manuscript in preparation.

Valencia, S. (1990). A portfolio approach to classroom reading assessment: The whys, whats, and hows. The Reading Teacher, 43, 338-340.

Valencia, S. W. (1991). Portfolios: Panacea or Pandora's box? In Educational performance testing. Chicago: Riverside Publishing Company.

Valencia, S. W., & Calfee, R. C. (1991). The development and use of literacy portfolios for students, classes, and teachers. Applied Measurement in Education, 4, 333-345.

Valencia, S. W., & Place, N. (1994a). Literacy portfolios for teaching, learning, and accountability: The Bellevue literacy assessment project. In S. W. Valencia, E. H. Hiebert, & P. P. Afflerbach (Eds.), Authentic reading assessment: Practices and possibilities (pp. 134-156). Newark, DE: International Reading Association.

Valencia, S. W., & Place, N. (1994b). Portfolios: A process for enhancing teaching and learning. The Reading Teacher, 47, 66-69.

Weaver, C. (1990). Understanding whole language: Principles and practices. Portsmouth, NH: Heinemann.

Wiggins, G. (1989a). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70, 703-713.

Wiggins, G. (1989b). Teaching to the (authentic) test. Educational Leadership, 46(7), 41.

Wiggins, G. P. (1993). Assessing student performance. San Francisco, CA: Jossey-Bass Publishers.

Wixson, K. K., Valencia, S. W., & Lipson, M. Y. (1994). Issues in literacy assessment: Facing the realities of internal and external assessment. Journal of Reading Behavior, 26, 315-337.

Wolf, D. P. (1989). Portfolio assessment: Sampling student work. Educational Leadership, 46(7), 35-39.

Wolf, D. P., LeMahieu, P. G., & Eresh, J. T. (1992). Good measure: Assessment in service to education. Educational Leadership, 49(8), 8-13.
