

LORENA LLOSA
New York University, New York City

Assessing English Learners’ Language Proficiency: A Qualitative Investigation of Teachers’ Interpretations of the California ELD Standards

This study investigates teachers’ use of the English Language Development (ELD) Classroom Assessment, an assessment of English proficiency used in a large urban school district in California. This classroom assessment, which consists of a checklist of the California ELD standards, is used to make high-stakes decisions about students’ progress from one ELD level to the next and serves as one criterion for reclassification. Ten elementary school teachers were interviewed and asked to produce verbal protocols while scoring the ELD Classroom Assessment of two of their students. Through six examples from the data, this paper shows that teachers do not interpret the ELD standards consistently and as a result the scores they assign on the ELD Classroom Assessment to different students have different meanings. The paper concludes by discussing several factors that might affect how teachers interpret standards and the implications of these findings for the use of standards-based classroom assessments within a high-stakes accountability system.

Introduction

The No Child Left Behind Act requires states and school districts to test the English proficiency of all English learners. States, districts, and schools must show annual increases in the number and percentage of students who become proficient in English, as well as in the number and percentage of students who make progress toward that goal. The language tests used to assess students’ English proficiency in this accountability system have a direct impact on the educational opportunities of English learners. In many states, these language assessments are aligned to a set of English Language Development (ELD) standards designed to assist teachers in moving English learners to fluency in English and proficiency in the English Language Arts (ELA) Content Standards. In addition to statewide standardized tests, many school districts are using standards-based classroom assessments to monitor the progress of English learners.

In a standards-based system, classroom assessments provide several advantages over standardized tests that are administered once a year. Unlike standardized tests that provide only one score or performance level that represents students’ mastery of the standards, classroom assessments have the potential to produce rich information that teachers can use to improve instruction and address the needs of individual students. Also, through classroom assessments a greater number of standards can be assessed based on a broad range of students’ performance over a period of time, rather than their performance on a few of the standards on the day of the test (Popham, 2003).

Yet many questions remain as to the reliability and validity of the use of classroom assessments in a high-stakes accountability system (Brindley, 1998, 2001; Rea-Dickins & Gardner, 2000). Brindley (1998, 2001), who researched the use of classroom assessments in Australia and the United Kingdom, found considerable variability in teachers’ interpretations of language ability. Similarly, Rea-Dickins and Gardner (2000) also uncovered in their study of teacher assessments in the United Kingdom several factors that threaten the validity of the inferences drawn from classroom assessments.

The CATESOL Journal 17.1 • 2005 • 7

The present study investigates these issues in the U.S. context by examining teachers’ use of the English Language Development (ELD) Classroom Assessment, a standards-based classroom assessment of English proficiency used in a large urban school district in California.

Research on the Use of Standards-Based Classroom Assessments

Brindley (1998) focuses on the problems that arise from the use of standards-based classroom assessments in the context of Australia and the United Kingdom. In particular, he questions the validity and reliability of classroom-based assessments used for accountability. Since individual teachers often devise their own assessment tasks to determine the extent to which their students are achieving the outcomes or standards, he suggests that “it is possible that the scores or ratings derived from a variety of different teacher-generated assessments of unknown validity and reliability are potentially invalid” (p. 64). Rea-Dickins and Gardner (2000) point out the following factors that may affect the validity of the inferences drawn about learners from classroom assessments: variability in the nature of the assessment activity; degree of task preparation; level and detail of rubrics; level and detail of spoken instructions provided by the teacher to individual learners; differences in difficulty levels of assessment activities; amount of assistance learners receive during an assessment; amount of time available to complete an activity; differences in content of the assessment activity; and differences in interlocutors (p. 236). Koretz (1998), who investigated the quality of performance data produced by large-scale portfolio assessment efforts in Kentucky and Vermont, also found that scores on portfolios varied as a function of raters, occasion, the rubrics used for scoring, and the tasks teachers assigned. These tasks, in turn, varied in terms of content, difficulty, and amount and type of assistance students received. Brindley (1998) concluded that “in high-stakes contexts, these inconsistencies could lead to unfair decisions which could adversely affect people’s lives” (p. 70).

The variability in teachers’ interpretation of language ability presents yet another threat to the validity and reliability of classroom assessments. Brindley (1998) reports that some studies have found that teachers and raters interpret language assessment criteria differently depending on their backgrounds, training, expectations, and preferences (e.g., Brown, 1995; Chalhoub-Deville, 1995; Gipps, 1994; Lumley & McNamara, 1995; North, 1993), even after being trained extensively (Lumley & McNamara, 1995; North, 1993).

The present study addresses some of the issues regarding classroom-based assessment identified by Brindley (1998) and others, particularly the variability in teachers’ interpretation of language ability, by investigating, through verbal protocol analysis, the way teachers rate students’ language ability as they score the ELD Classroom Assessment. The next section briefly reviews the literature on verbal protocol analysis, the methodology used in this study.

Verbal Protocol Analysis

Since the 1980s, the use of verbal reports to study cognitive processes has increased significantly in many areas of psychology, education, and cognitive science. Even though verbal reports were being used to investigate second language learning in the 1980s (Faerch & Kasper, 1987), this methodology was not introduced in studies of language assessment until the field recognized the importance of test-takers’ and raters’ processes for test validation (Bachman, 1990). Most studies of language assessment that use verbal reports follow the methodology described by Ericsson and Simon (Lumley, 2002).

According to Ericsson and Simon (1993), the two forms of verbal reports that more closely reflect cognitive processes are concurrent verbal reports and retrospective reports. Concurrent verbal reports—the methodology used in the present study—also known as “talk aloud” or “think aloud,” take place during an individual’s performance of a particular task. They are typically used to trace processes, understand problem solving, and shed light on an individual’s decision-making processes. Two potential limitations of concurrent verbal reports are the incompleteness of the data gathered—since people think faster than they speak, only thoughts that are verbalized can be analyzed—and the obtrusiveness of the procedure—when conducting concurrent reports, the act of verbalizing thoughts can influence the thoughts themselves. In spite of these limitations, the use of verbal protocols has proved to be invaluable in research that describes raters’ behavior in assessments of second language writing and reading (Cohen, 1994; Cumming, 1990; Cumming, Kantor, & Powers, 2001; Lumley, 2000; Lumley, 2002; Milanovic, Saville, & Shen, 1996; Vaughan, 1991; Weigle, 1994; Zhang, 1998). In the present study, verbal protocol analysis is used to examine teachers’ use of the ELD Classroom Assessment and their decision-making processes as they assess students’ language ability. The next section provides a detailed description of the ELD Classroom Assessment.

The ELD Classroom Assessment

The ELD Classroom Assessment, used in a large urban school district in California, documents an English learner’s progress toward each of the California ELD standards. The assessment is intended to be used for formative purposes, but it is also used to make high-stakes decisions about English learners’ progress from one ELD level to the next (there are five ELD levels), and it serves as one of the criteria for reclassification as Fluent English Proficient. There is a separate set of assessments for different grade levels: kindergarten through grade 2, grades 3 through 5, grades 6 through 8, and grades 9 through 12. Each ELD Classroom Assessment consists of a list of all the ELD standards for a given ELD level divided into three sections—Listening/Speaking, Reading, and Writing—following the same organizational structure as the California ELD Standards. The Reading standards are further divided into the following sections: Word Analysis, Fluency and Vocabulary, Comprehension, and Literary Response. The Writing standards are divided into Strategies and Applications, and Conventions. Teachers are expected to score student progress toward each standard using the following scale:

4 Advanced Progress: Exceeds the standards for the identified ELD level

3 Average Progress: Meets the standards for the identified ELD level

2 Partial Progress: Demonstrates some progress toward mastery of the standards

1 Limited Progress: Demonstrates little or no progress toward mastery of the standards

Using this scale, teachers assign a score to each standard by taking into account the overall performance of the student in the class. When the teacher determines that a student has mastered all the ELD standards at a given ELD level—that is, the student scored a 3 or a 4 on all standards—the student advances to the next ELD level. Teachers must also include in the ELD Classroom Assessment folder selected examples of student work that demonstrate mastery of the ELD standards and that justify scores recorded on the ELD Classroom Assessment. Teachers are expected to update the assessment each reporting period (three times a year) and periodically gather student work samples, such as observation checklists, writing samples, and reading records. The scores on the assessment, however, are not based exclusively on these work samples. Teachers base their scores on a student’s overall performance in class, and thus the ELD Classroom Assessment functions somewhat like a report card. Teachers are not systematically trained to score the ELD Classroom Assessment, but they are encouraged to discuss their students’ scores with other teachers to develop consistency in scoring across grades and ELD levels.
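The advancement rule described above—a student moves to the next ELD level only when every standard at the current level is scored 3 or 4—can be sketched in a few lines of code. This is purely an illustration of the decision logic; the standard identifiers and score profiles below are hypothetical, since the actual assessment is a paper checklist, not software.

```python
def advances_to_next_level(scores):
    """Return True only if every standard is scored 3 (meets) or 4 (exceeds).

    `scores` maps a standard identifier to its 1-4 progress score;
    a single 1 (Limited) or 2 (Partial) blocks advancement.
    """
    return all(score >= 3 for score in scores.values())


# Hypothetical score profiles for two ELD level 3 students:
mastered = {"LS.1": 3, "R.WA.1": 4, "W.SA.1": 3}   # all standards at 3 or 4
partial = {"LS.1": 3, "R.WA.1": 2, "W.SA.1": 4}    # one standard at Partial Progress

print(advances_to_next_level(mastered))  # True
print(advances_to_next_level(partial))   # False
```

Note how coarse the rule is: as the findings below show, a single 2 on one multi-part or ambiguously worded standard keeps a student at their current level, which is what makes teachers’ divergent interpretations consequential.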


Research Questions

The large-scale use of the ELD Classroom Assessment as part of a high-stakes accountability system allows for the investigation of several issues related to the assessment of English proficiency via standards-based classroom assessments. This paper focuses on the following research questions: When scoring the ELD Classroom Assessment, how do teachers make decisions about students’ English language proficiency as defined by the California ELD standards? More specifically, to what extent do teachers interpret the ELD standards consistently as they determine student mastery of the standards?

Method

The sample for this study consists of 10 fourth-grade teachers in five schools with large populations of English learners. Eight of the teachers were female and 2 were male. Six of the teachers were white, 2 were Asian American, 1 was Hispanic, and 1 was African American. The number of years working in their school district ranged from 1 to 26 years.

This sample is part of a larger study designed to investigate the constructs assessed by the California English Language Development Test (CELDT) and the ELD Classroom Assessment. For the larger study, 20 schools were randomly sampled among schools in the district that have a majority population of English learners (50% or more). Eight principals of the 20 schools invited to participate in the study agreed to participate. (Five principals declined to participate and the remaining principals did not respond.) In each of the eight schools, all fourth-grade teachers were invited to participate in the qualitative study. The first 10 teachers who accepted the invitation were included in the study and paid a stipend of $50 for their participation.

Meetings with each teacher were scheduled at the teachers’ convenience, typically after school, and lasted between 1 and 1.5 hours. Each teacher was asked to sign a consent form and for permission to record the session. During the session, the teacher was first trained on how to produce a “think-aloud” verbal report. The teacher was then asked to engage in a “think aloud” as he or she scored two students’ ELD Classroom Assessments. Because the majority of English learners in fourth grade are ELD levels 3 and 4, teachers were asked to score the assessments of students at one of those two ELD levels. (The instructions provided to teachers for producing the verbal protocols are included in the Appendix.)

These concurrent verbal protocols were audio-recorded and later transcribed. The data were analyzed qualitatively. All transcripts were read carefully for themes and patterns in the data. The verbal protocols were then grouped by ELD standard. For each ELD standard, teachers’ interpretations and strategies for scoring students on that standard were examined and then compared across teachers. Finally, themes and patterns in teacher interpretations and strategies for scoring were identified across all the ELD standards in the assessment.

Findings

The analysis of the data revealed that, when scoring the ELD Classroom Assessment, teachers did not always interpret the standards consistently. Of the 29 standards in the ELD level 3 Classroom Assessment, 16 (55%) were interpreted in different ways by the teachers. Different interpretations occurred most often when a standard contained ambiguous language or when a standard contained multiple parts. When confronted with ambiguous language, teachers would (a) assign their own meaning to the standard, (b) ignore the part that they perceived as ambiguous, or (c) ignore the entire standard. When faced with a standard that contained multiple parts, teachers would either (a) focus on one part only, or (b) find a way to “average” all parts. This process is illustrated in Figure 1:


Figure 1. How Teachers Interpret the Standards

This section presents six examples from the data that illustrate how teachers interpret the California ELD standards and make decisions about students’ language ability.

Example 1

One of the “Reading—Literary Response and Analysis” standards for ELD level 3 states: “Apply knowledge of language to derive meaning/comprehension from literary texts.” While scoring Mayra’s assessment, one of the teachers said:

Again, we went over this again today—going over our vocabulary in class. And she was very proficient at knowing that if a “–tion” is at the end of a word, that means that it’s a noun. So if I know it’s a noun I can start to understand what that word means. And so we actually went over that today and she did a great job at that. (Wrote down a score of 4.)

This teacher interpreted “knowledge of language” to mean knowledge of vocabulary, specifically knowledge of word parts (affixes such as suffixes) to derive the meaning of words. Another teacher decided to ignore the part of the standard that she perceived as ambiguous when scoring Raul’s assessment:

I’m never sure what they mean by “apply knowledge of language,” but he can definitely find meaning and comprehension in the text so for these kinds of things I just use my best judgment. So I’d give him a 4.

For Raul’s teacher, a score of 4 on this standard means that Raul can find meaning and comprehension in the text, whereas for Mayra’s teacher a score of 4 means that Mayra has knowledge of word parts and can use that knowledge to understand the meaning of words. Clearly, the scores of 4 for these two students mean different things. When faced with the ambiguous phrase, “apply knowledge of language,” in the standard, their teachers chose different strategies: Mayra’s teacher assigned her own meaning to the phrase, whereas Raul’s teacher ignored it. As a result, the students’ identical score on the standard represents mastery of different abilities.

Example 2

The following is another example of teachers’ strategies for dealing with a standard that contains ambiguous language. The “Reading—Word Analysis” standard for ELD level 3 states: “Use common English morphemes in oral and silent reading.”

This is what Manuel’s teacher said while scoring his assessment:

These are so hard because how do I know he’s applying knowledge of morphemes, especially in such a big class? Unless he says: “Oh, I’m noticing that this word has an ending and that this ending is changing the meaning of the word,” it’s really hard to tell. So what I do, if he’s doing well in reading, I say a 3. He is deriving meaning from what he’s reading.

Manuel’s teacher seemed confident about the meaning of the standard but was not sure how to gather evidence of the student’s mastery of this standard. Thus, she decided to base her score on Manuel’s general reading ability and his ability to derive meaning from text, which is a skill not mentioned in the standard. Another teacher, on the other hand, chose to completely ignore the standard when scoring Ernesto’s assessment:

I really don’t know for sure; they say in oral and silent reading. Oh, that’s a strange one. I would say a 3 because I can’t be sure. I don’t think anyone can be sure.

Ernesto’s teacher was unsure about the meaning of the standard, and to be “safe” and not penalize him, she assigned a score of 3. Manuel’s score of 3 on this particular standard means that he can read and derive meaning from text. Ernesto’s score of 3 is not very meaningful.

Example 3

The following example involves a standard that at first glance does not appear to be ambiguous but that lends itself to many interpretations upon closer inspection. One of the “Reading—Fluency and Systematic Vocabulary Development” standards for ELD level 3 reads: “Create a simple dictionary of frequently used words.” When scoring Jessica’s assessment, her teacher said the following:

She doesn’t do that yet. I mean, she doesn’t create one for herself. She doesn’t take the initiative to say, “Oh, that’s a new word. I should put that down.” She doesn’t do that. . . . Some do. I have quite a few students that do that, but in her case she doesn’t.

This teacher interpreted this standard as an activity that students are responsible for doing proactively: She expected Jessica to create a dictionary on her own. It is unclear whether the teacher has taught this activity as a strategy, but it does not seem that she requires students to create a dictionary. Another teacher, however, had a very different interpretation of the standard. While scoring Gustavo’s assessment, she said:

I should have them do that. Because I don’t, and so it’s a funny thing to be grading them on. It’s something that I should do.

This teacher understood the standard to mean that she is responsible for assigning the activity of creating a dictionary to her students, while the former teacher expected students to take the initiative to create a dictionary on their own. Neither Jessica nor Gustavo created a dictionary, but while Jessica was given a score of 2 for not taking the initiative to do it, Gustavo was not scored at all because the teacher did not expect him to do it.

Yet a third teacher, unsure of what the standard means, assigned an entirely different meaning to the standard. While scoring Cecilia’s assessment, she said:

She’d be able to do that with some help. She might not know, and I’m not sure exactly what they’re asking when they ask this, but I’m assuming that they mean like alphabetical skills and those kinds of things and the meanings of words, and she could possibly do that. But I’d probably give her a 2 because she would not be able to define as many words as, say, Alex would or somebody at the same ELD level.

This teacher focused on Cecilia’s alphabetical skills, knowledge of the meaning of words, and the ability to define words. She did not interpret this standard as requiring students to literally create a dictionary in the way the other two teachers did. Cecilia’s score of 2 has a very different meaning from Jessica’s score of 2.

The examples above paint a picture of the problems present when teachers are confronted with language in a standard that is—or is perceived as being—ambiguous. The next three examples show the strategies that teachers use when faced with standards that have multiple parts.

Example 4

The following standard from the “Writing—Strategies and Applications” section of the ELD level 3 Classroom Assessment states: “Independently create cohesive paragraphs that develop a central idea with consistent use of standard English grammatical form. (Some rules may not be in evidence.)” When scoring Edwin’s assessment, his teacher focused on one part of the standard—the part that requires independent action:

A 2. He’s not really independent. If I sit and I work with him, he does a lot. He can write a paragraph, but I have to really keep him on task and keep him focused. So independently he can’t do it. It would be a 2.

But Cinthia’s teacher focused on the second part of the standard when scoring her assessment—“creat[ing] cohesive paragraphs that develop a central idea”:

Right now I’d say a 2 only because she’s one that has problems with that in her writing. The cohesive paragraphs, trying to put that together, having a topic sentence and supporting one—it’s a complex process and we work on main ideas. She’s doing it, better and better with each writing we do, but it’s still . . . a work-in-progress situation, so . . . a 2.

Once again, Edwin’s and Cinthia’s identical scores mean different things. A score of 2 means that Edwin can write a paragraph, but only with assistance. In Cinthia’s case, a 2 means that she has difficulties writing a paragraph regardless of whether she is doing it independently or not. When scoring José on this same standard, another teacher chose to focus on the third part of the standard: “with consistent use of standard English grammatical form. (Some rules may not be in evidence)”:

[W]hat I read from this is that they’re mostly concerned that he can write a paragraph using standard English because writing a paragraph, a cohesive paragraph that develops the central idea, is difficult for any fourth-grader. So I would give him a 3 because I think they’re mostly focusing on the fact that he can use standard English.

José was assigned a score of 3, which means that unlike Cinthia and Edwin, who received a 2, he has mastered the standard. Yet according to the teacher’s statement, José cannot write a cohesive paragraph that develops a central idea; he can only write a paragraph using standard English.

In short, the data suggest that when a standard contains multiple parts, the extent to which a student is determined to have mastered that standard often reflects mastery of the part of the standard on which the teacher chose to focus.

Example 5

The following is another example of a standard with multiple parts and the ways in which a teacher struggles to consider all parts before assigning a score. The “Reading Comprehension” standard for ELD level 3 states: “Read and orally identify examples of fact/opinion and cause/effect in literature and content area texts.” While scoring Ruby’s assessment, a teacher expressed the following:

I would say she would get a 3 for fact—she could probably even get a 4 for fact and opinion. But the cause and effect is a little harder for her to grasp, and so we’ll just go with a 3. The cause and effect, they’re still getting them confused, which is the cause and which is the effect. . . . But the fact and opinion, all the kids know that.

In this example, the teacher admitted that Ruby exceeds the standard for understanding fact versus opinion, but she is not yet able to understand cause and effect. Because both skills are part of the same standard, the teacher felt compelled to “average out” the two skills and assigned Ruby a score of 3, even though she has not mastered cause and effect.


Example 6

The last example also shows how a teacher struggles to address all parts of the standard to come up with one score. One of the “Listening and Speaking” standards for ELD level 3 reads: “Actively participate in social conversations with peers and adults on familiar topics by asking and answering questions and soliciting information.” Alex’s teacher carefully tried to attend to each part of the standard to come up with a single score:

I would give him a 3. He’s usually a 2. Well, he would be a 4 with his peers. He has no problem with his peers. With adults, it would probably be a 2. But when it’s a familiar topic he does great. Because . . . [H]e had attended space camp, and he wants to be an astronaut. So when we had the story about that, he was sharing every day. I would’ve given him a 4 that week. But overall, I would say a 3.

As in the previous example, this one shows the problems teachers face when they are forced to assign one score to assess a student on a standard that has multiple parts. The resulting score is not a true reflection of the student’s mastery of the standard but rather a compromise or “average” of the multiple abilities assessed. These examples demonstrate that when a standard has multiple parts and only one score can be used to rate mastery of the standard, the score is not particularly meaningful. Students’ abilities might be better described with individual scores for each part of the standard.

In summary, the data show that teachers do not always interpret the ELD standards consistently when scoring the ELD Classroom Assessments. When faced with ambiguous language or standards with multiple parts, teachers resort to one or more strategies: They assign their own meaning to the standard, they ignore it in part or in whole, they focus on one part only, or they average all parts. As a result, the same scores on the assessment can represent different levels of mastery or types of proficiency for different students.

Discussion

Bachman and Palmer (1996) define validity as "the meaningfulness and appropriateness of the interpretations that we make on the basis of test scores" (p. 21). They explain that, to justify a particular score interpretation, evidence must be gathered to confirm that "the test score reflects the area(s) of language ability we want to measure, and very little else" (p. 21). The present study shows that the scores on the ELD Classroom Assessment reflect more than just students' language ability; scores also reflect teachers' interpretations of the standards. In addition, other factors not explored in this paper are reflected in the scores as well, including teachers' interpretation of the scoring criteria, students' personalities and behavior, external pressure to advance students to the next level, and teachers' beliefs about assessment and grading. Overall, the evidence gathered from the present study calls into question the validity of score interpretations.

So, why are teachers not interpreting and using the ELD standards consistently when assessing their students? There are at least three possible explanations.

One reason might have to do with the standards themselves. Content standards have often been criticized for being too broad, too ambitious, and too numerous for teachers to address effectively (Marzano & Kendall, 1997; Popham, 2003). Teachers in the present study often complained about the "vagueness" of the ELD standards, particularly when the language of the standards included "caveats" or "qualifications" of the ability—statements such as "may include some inconsistent use of" or "some rules may not be in evidence." Teachers also identified as ambiguous standards that included modifiers such as "more expanded vocabulary," "detailed," "more complex." Given the central role that standards play in instruction and assessment, studies should be conducted to examine the quality, clarity, and importance of content standards (Lane, 2004).

Another reason why teachers do not interpret the ELD standards consistently might be lack of professional development. Of the 10 teachers in the study, 3 had never received training on the ELD standards or the use of the ELD Classroom Assessment, while the remaining 7 vaguely remembered attending a professional development session some years before. This finding is consistent with those of a recent survey of teachers of English learners in California that found that, in the last 5 years, many teachers of English learners had received little or no professional development on issues related to English learners (Gándara, Maxwell-Jolly, & Driscoll, 2005). At the beginning of the school year, teachers receive a memo about the ELD Classroom Assessment that encourages them to discuss their scores with other grade-level teachers. But teachers often do not have the time to engage in these conversations. In fact, in his 1998 study, Stecher found that classroom assessments place additional burdens on teachers, who are already strapped for time.

A third possible explanation might have to do with alignment. Even though there is a perfect alignment between the California ELD standards and the ELD Classroom Assessment—after all, the standards listed in the assessment correspond directly to the California ELD standards—there is no direct alignment between the standards in the ELD Classroom Assessment and the instructional program. The Reading/Language Arts curriculum that teachers use is aligned to the English Language Arts (ELA) standards, not the ELD standards. Even though there might be some overlap, there is not a clear alignment between the ELD standards and what teachers are focusing on in the classroom. As evident in some of the examples discussed in this paper, the lack of alignment between what is covered in class and the standards measured in the assessment makes it difficult for teachers to determine whether a standard has been met, especially when they have not covered the content in class or engaged students in activities that would allow them to demonstrate mastery.

Regardless of the reasons, the fact that the extent to which a student is determined to have mastered a standard depends in part on a particular teacher's interpretation of that standard not only has implications for the validity of the ELD Classroom Assessment itself, but also calls into question the effectiveness of standards-based reform, at least as it concerns English learners. The goal of standards-based reform is to improve the quality of education by aligning the entire system—curriculum, assessment, professional development, and funding—to standards set at the state level. These standards are at the core of such a system, so, unless there is a clear and common understanding of the standards, the system cannot improve the quality of education. In areas such as English Language Arts and mathematics, where the pressures of accountability are greatest, the school district in this study adopted and supported the implementation of a curriculum aligned to California's set of standards. But with so much attention focused on English Language Arts and mathematics, the ELD standards are not a priority for teachers. This "second-class" status of ELD standards and ELD instruction might ultimately explain why teachers do not receive adequate professional development on the standards and as a result do not interpret standards consistently when assessing their students.

In spite of some of the issues raised in this paper, classroom assessments such as the ELD Classroom Assessment have the potential to be useful assessment instruments to promote student learning. For one, these assessments provide teachers with immediate and practical information about their students that can inform their instruction. In fact, 7 of the 10 teachers reported that they use the assessment in formative ways. A few use the assessment to identify students who are not doing well and provide those students with additional assistance, sometimes through an instructional aide. Others reported that scoring the assessment gives them new teaching ideas and reminds them of content and activities that they should cover in class. Among the many benefits of using classroom assessments, Stecher (1998) also found greater enthusiasm for teaching, higher expectations for students, and desired changes in educational goals, content, and instructional procedures.

Many of the issues raised in this paper might be resolved if teachers were trained and retrained on a regular basis on the ELD standards and the use of the ELD Classroom Assessment. Professional development should focus on the meaning of the standards, the scoring criteria, and examples of student work. Also, the alignment between the ELD Classroom Assessment and the instructional program needs to be explicitly articulated for teachers to take advantage of the connections that already exist between the instructional program and the assessment. The Map of Standards for English Learners created by WestEd (Carr & Lagunoff, 2003) is a helpful resource that maps the ELD standards to the ELA standards, but it provides minimal guidance for designing instructional activities. There is a need for specific instructional activities to help teachers address those ELD standards that are not matched directly to the ELA standards and as a result are not addressed by the English Language Arts curriculum.

In conclusion, several issues need to be resolved to ensure that standards-based classroom assessments are used effectively to promote student learning and for accountability. But their potential should not be wasted. As Lane (2004) and many other researchers have pointed out, we cannot rely on any one measure alone. The effective use of high-quality classroom assessments that reflect the standards should be an integral part of a "cohesive, balanced assessment system" aimed at promoting student learning (p. 13).

This research was supported by grants from the University of California Linguistic Minority Research Institute (UC LMRI) and the Spencer Foundation. Opinions reflect those of the author and do not necessarily reflect those of the grant agencies.

Author

Lorena Llosa completed her Ph.D. in Applied Linguistics at the University of California, Los Angeles in June 2005. She is now an assistant professor at New York University's Steinhardt School of Education. Her research interests include second and foreign language learning and teaching, language testing, program evaluation, and research methods.

References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, England: Oxford University Press.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford, England: Oxford University Press.

Brindley, G. (1998). Outcomes-based assessment and reporting in language learning programmes: A review of the issues. Language Testing, 15(1), 45-85.

Brindley, G. (2001). Outcomes-based assessment in practice: Some examples and emerging insights. Language Testing, 18(4), 393-407.

Brown, A. (1995). The effect of rater variables on the development of an occupation-specific language performance test. Language Testing, 12(1), 1-15.

Carr, J., & Lagunoff, R. (2003). The map of standards for English learners: Integrating instruction and assessment of English language development and English language arts standards in California (4th ed.). San Francisco: WestEd.

Chalhoub-Deville, M. (1995). Deriving oral assessment scales across different tests and rater groups. Language Testing, 12(1), 16-33.

Cohen, A. D. (1994). English for academic purposes in Brazil: The use of summary tasks. In C. Hill & K. Parry (Eds.), From testing to assessment: English as an international language. London: Longman.


Cumming, A. (1990). Expertise in evaluating second language composition. Language Testing, 7(1), 31-51.

Cumming, A., Kantor, R., & Powers, D. (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing tasks: An investigation into raters' decision making, and development of a preliminary analytic framework (TOEFL Monograph Series). Princeton, NJ: Educational Testing Service.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: The MIT Press.

Faerch, C., & Kasper, G. (1987). Introspection in second language research. Clevedon, Great Britain: Multilingual Matters.

Gándara, P., Maxwell-Jolly, J., & Driscoll, A. (2005). Listening to teachers of English language learners: A survey of California teachers' challenges, experiences, and professional development needs. Santa Cruz, CA: Center for the Future of Teaching and Learning.

Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London: The Falmer Press.

Koretz, D. (1998). Large-scale assessments in the US: Evidence pertaining to the quality of measurement. Assessment in Education, 5(3), 309-334.

Lane, S. (2004). Validity of high-stakes assessment: Are students engaged in complex thinking? Educational Measurement: Issues and Practice, 23(3), 6-14.

Lumley, T. (2000). The process of the assessment of writing performance: The rater's perspective. Unpublished doctoral thesis, University of Melbourne, Melbourne, Australia.

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246-276.

Lumley, T. J. N., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71.

Marzano, R. J., & Kendall, J. S. (1997). The fall and rise of standards-based education: A National Association of School Boards of Education (NASBE) issues in brief. Aurora, CO: Mid-Continent Research for Education and Learning.

Milanovic, M., Saville, N., & Shen, S. (1996). A study of the decision-making behaviour of composition markers. In M. Milanovic & N. Saville (Eds.), Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem (pp. 92-114). Cambridge, England: Cambridge University Press and University of Cambridge Local Examinations Syndicate.

North, B. (1993). The development of descriptors on scales of language proficiency. Washington, DC: The National Foreign Language Center.

Popham, J. W. (2003). The trouble with testing: Why standards-based assessment doesn't measure up. American School Board Journal, 190(2), 14-17.

Rea-Dickins, P., & Gardner, S. (2000). Snares and silver bullets: Disentangling the construct of formative assessment. Language Testing, 17(2), 215-243.

Stecher, B. M. (1998). The local benefits and burdens of large-scale portfolio assessment. Assessment in Education, 5(3), 335-351.

U.S. Department of Education. (n.d.). No Child Left Behind Act of 2001. Retrieved September 12, 2005, from http://www.ed.gov/nclb/landing.jhtml

Vaughan, C. (1991). Holistic assessment: What goes on in the rater's mind? In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111-125). Norwood, NJ: Ablex.

Weigle, S. C. (1994). Effects of training on raters of English as a second language compositions: Quantitative and qualitative approaches. Unpublished doctoral dissertation, University of California, Los Angeles.

Zhang, W. (1998). The rhetorical patterns found in Chinese EFL student writers' examination essays in English and the influence of these patterns on rater response. Unpublished doctoral thesis, Hong Kong Polytechnic University, People's Republic of China.

Appendix
Instructions Given to Teachers for Think-Aloud Task
(Adapted from Lumley, 2002)

“I am now going to ask you to score two of your students using the ELD Classroom Assessment. I would like you to score them as far as possible in the usual way, that is, just as you have done it in the previous reporting period. However, there will be one important difference this time: As I have previously mentioned, I am conducting a study of the processes used by teachers when they score the ELD Classroom Assessment, and I would now like you to talk and think aloud as you score these assessments, while this tape recorder records what you say.

First, you should state the student’s current ELD level. Then you should read each standard out loud as you start to score. Then, as you score each standard, you should vocalize your thoughts and explain why you give the scores you give.

It is important that you keep talking all the time, registering your thoughts all the time. If you spend time reading from the ELD Classroom Assessment, you should do that aloud also, so that I can understand what you are doing at that time. In order to make sure there are no lengthy silent pauses in your scoring, I propose to sit here and prompt you to keep talking if necessary. I will sit here while you rate and talk. I will say nothing more than give you periodic feedback such as ‘mhm,’ although I will prompt you to keep talking if you fall silent for more than 10 seconds.”
