Standardized Achievement Tests and English Language Learners: Psychometrics Issues

Jamal Abedi
Graduate School of Education and Information Studies
CRESST/University of California, Los Angeles

EDUCATIONAL ASSESSMENT, 8(3), 231–257
Copyright © 2002, Lawrence Erlbaum Associates, Inc.

Requests for reprints should be sent to Jamal Abedi, UCLA–CSE/CRESST, 300 Charles E. Young Drive North, GSE&IS Building, 3rd Floor, Los Angeles, CA 90095–1522. E-mail: jabedi@cse.ucla.edu

Using existing data from several locations across the U.S., this study examined the impact of students’ language background on the outcome of achievement tests. The results of the analyses indicated that students’ assessment results might be confounded by their language background variables. English language learners (ELLs) generally perform lower than non-ELL students on reading, science, and math, a strong indication of the impact of English language proficiency on assessment. Moreover, the level of impact of language proficiency on assessment of ELL students is greater in the content areas with higher language demand. For example, analyses showed that ELL and non-ELL students had the greatest performance differences in the language-related subscales of tests in areas such as reading. The gap between the performance of ELL and non-ELL students was smaller in science and virtually nonexistent in the math computation subscale, where language presumably has the least impact on item comprehension.

The results of our analyses also indicated that test item responses by ELL students, particularly ELL students at the lower end of the English proficiency spectrum, suffered from low reliability. That is, the language background of students may add another dimension to the assessment outcome that may be a source of measurement error in the assessment for English language learners.

Further, the correlation between standardized achievement test scores and external criterion measures was significantly larger for the non-ELL students than for the ELL students. Analyses of the structural relationships between individual items and between items and the total test scores showed a major difference between ELL and non-ELL students. Structural models for ELL students demonstrated lower statistical fit. The factor loadings were generally lower for ELL students, and the correlations between the latent content-based variables were also weaker for them.

We speculate that language factors may be a source of construct-irrelevant variance in standardized achievement tests (Messick, 1994) and may affect their construct validity.

Due to the rapidly changing demographics of the U.S. population, fairness and validity issues in assessment are becoming top priorities in the national agenda. Between 1990 and 1997, the number of U.S. residents not born in the United States increased by 30%, from 19.8 million to 25.8 million (Hakuta & Beatty, 2000). According to the Survey of the States’ Limited English Proficient Students and Available Educational Programs and Services 1999–2000 Summary Report, over 4.4 million limited English proficient¹ students were enrolled in public schools (National Clearinghouse for English Language Acquisition and Language Instruction Educational Programs, 2002). To provide fair assessment and uphold standards of instruction for every child in this country, both federal (e.g., No Child Left Behind Act of 2001) and state legislation now require the inclusion of all students, including ELLs, in large-scale assessments (Abedi, Lord, Hofstetter, & Baker, 2000; Mazzeo, Carlson, Voelkl, & Lutkus, 2000). Such inclusion requirements have prompted new interest in modifying assessments to improve the level of English language learners’ participation and to enhance the validity and equitability of inferences drawn from the assessments themselves.

Standardized, high-stakes achievement tests are frequently used for the assessment and classification of ELL students, as well as for accountability purposes. They shape instruction and student learning (Linn, 1995). About 40% of districts and schools use achievement tests for assigning ELL students to specific instructional services within a school, and over 70% of districts and schools use achievement tests to reclassify students from ELL status (Zehler, Hopstock, Fleischman, & Greniuk, 1994).

However, as most standardized, content-based tests (such as science and math tests) are administered in English and normed on native English-speaking test populations, they may inadvertently function as English language proficiency tests. English language learners may be unfamiliar with the linguistically complex structure of test questions, may not recognize vocabulary terms, or may mistakenly interpret an item literally (Duran, 1989; Garcia, 1991). They may also perform less well on tests because they read more slowly (Mestre, 1988).


¹The term English language learner (ELL) refers to students who are not native speakers of English and are not as proficient in English as native speakers. A subgroup of these students with a lower level of English proficiency is referred to as limited English proficient (LEP). The term LEP is used primarily by government-funded programs to classify students as well as by the National Assessment of Educational Progress (NAEP) for determining inclusion criteria. In this article we use ELL to refer to students who are not native English speakers and who are not reclassified as fluent in English.


Thus, language factors are likely to reduce the validity and reliability of inferences drawn about students’ content-based knowledge, as stated in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999):

For all test takers, any test that employs language is, in part, a measure of their language skills. This is of particular concern for test takers whose first language is not the language of the test. Test use with individuals who have not sufficiently acquired the language of the test may introduce construct-irrelevant components to the testing process. In such instances, test results may not reflect accurately the qualities and competencies intended to be measured. … Therefore it is important to consider language background in developing, selecting, and administering tests and in interpreting test performance. (p. 91)

As indicated earlier, a major criticism of standardized achievement tests is the exclusion of ELL students from the norming group for these tests. Linn (1995) refers to the issues associated with the inclusion of all students as one of the three most notable of the new features of this reform effort. The inclusion of all students in its assessments has also been among the major issues for NAEP (see, e.g., Mazzeo et al., 2000). Navarrette and Gustke (1996) expressed several concerns about the exclusion of ELL students from the norming groups of standardized achievement tests:

Not including students from linguistically diverse backgrounds in the norming group, not considering the match or mismatch between a student’s cultural and school experiences, and not ensuring for English proficiency have led to justified accusations of bias and unfairness in testing. (p. 2)

Findings from a series of studies conducted by the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) on the impact of students’ language background on their performance indicated that (a) student language background affects students’ performance in content-based areas such as math and science, and (b) the linguistic complexity of test items may threaten the validity and reliability of achievement tests, particularly for ELL students (see Abedi & Leon, 1999; Abedi, Leon, & Mirocha, 2001; Abedi & Lord, 2001; Abedi et al., 2000).

Thus, the literature on the assessment of ELLs clearly suggests that language factors confound the test results of English language learners. However, the literature is not clear on the level of impact that language factors may have on different content areas. That is, would the impact level of language on test outcomes differ across the different content areas? Another issue concerns the impact level of language factors on the validity and reliability of content-based assessments for ELLs. Available data from four large school sites in the nation enabled us to explore these issues in greater detail.

METHODOLOGY

Research Questions

1. Could the performance difference between ELL and non-ELL students be partly explained by language factors in the assessment?

2. Could the linguistic complexity of test items, as a possible source of measurement error, influence the reliability of the assessment?

3. Could the linguistic complexity of test items, as a possible source of construct-irrelevant variance, influence the validity of the assessment?

Data Sources

The data for this study were obtained from four locations across the U.S. To assure anonymity, these data sites are referred to as Sites 1 to 4. Item-level standardized achievement test data and background information were obtained for participating students. The background variables included gender, ethnicity, free/reduced price lunch participation, parent education, student ELL status, and students with disabilities (SD) status.

Table 1 summarizes some of the main characteristics of the four data sites. As the data in Table 1 show, there were similarities and differences among the four data sites. All sites used standardized tests for measuring students’ achievement in English and other content-based areas, but they differed in the type of test administered. Although all sites had an index of students’ English language proficiency status (ELL or bilingual status), and they all provided some student background information, they differed in the type of language proficiency index used and the type of background variables provided. These differences limited our ability to perform identical analyses at the different sites for cross-validation purposes. However, there were enough similarities in the data structures at the four different sites to allow for meaningful comparisons.

The following is a brief description of each of the four data sites.

Site 1. Site 1 is a large urban school district. Data on the Iowa Tests of Basic Skills (ITBS) were obtained for Grades 3 through 8 in 1999. No information was available on students’ ELL status; however, students were categorized as to whether or not they were receiving bilingual services. Among the 36,065 students in the Grade 3 population, 7,270 (about one in five) were receiving bilingual services. In Grade 6 there were 28,313 students in the population, with 3,341 (11.8%) receiving bilingual services. In Grade 8 there were 25,406 students in the population, and 2,306, fewer than one in ten (9.1%), were receiving bilingual services.

Site 2. Site 2 is a state with a very large number of ELL students. There were a total of 414,169 students in the Grade 2 population of the state, and 125,109 (30.2%) of these students were ELLs. In Grade 7 there were 349,581 students, of whom 73,993 (21.2%) were ELL students. In Grade 9 there were 309,930 students, and 57,991 (18.7%) were ELL students. Stanford Achievement Test, 9th edition (Stanford 9) test data were obtained for all students in Grades 2 to 11 who were enrolled in the statewide public schools for the 1997–1998 academic year.

Site 3. Site 3 is an urban school district. Stanford 9 test data were available for all students in Grades 10 and 11 for the 1997–1998 academic year. Accommodation data were obtained from the district and included both the type and number of accommodations received. There were 12,919 students in the Grade 10 population, and 431 (3.3%) of these students were ELLs. In Grade 11 there were 9,803 students in the population, of whom 339 (3.5%) were ELL students.

Site 4. Site 4 is a state with a large number of ELL students. Access was provided to Stanford 9 summary test data for all students in Grades 3, 6, 8, and 10 who were enrolled in the state’s public schools for the 1997–1998 academic year. There were a total of 13,810 students in the Grade 3 population of the state, and 1,065 (7.7%) of these students were ELLs. In Grade 6 there were 12,998 students in the population, of whom 813 (6.3%) were ELL students. In Grade 8 there were 12,400 students, and 807 (6.5%) were ELL students.

TABLE 1
Summary of Characteristics of the Four Data Sites

Data Site                         Site 1                  Site 2         Site 3                 Site 4
Location type                     Large urban district    Entire state   Large urban district   Entire state
Total number of students, K–12    430,914                 5,844,111      approx. 200,000        187,969
Percent of ELL, K–12              15.6                    24.1           N/A                    6.9
Language designation              Bilingual/nonbilingual  ELL/non-ELL    ELL/non-ELL            ELL/non-ELL
Grades data available             1–8                     2–11           10, 11                 3, 6, 8, 10
Achievement tests used            ITBS                    SAT9           SAT9                   SAT9
Language proficiency tests used   N/A                     LAS            N/A                    LAS
Accommodation data                N/A                     N/A            N/A                    N/A
Years data available              1999                    1998           1998                   1998

Note. ELL = English language learner; ITBS = Iowa Tests of Basic Skills; SAT9 = Stanford Achievement Test, 9th edition; LAS = Language Assessment Scales; N/A = not available.


Design and Statistical Approach

To provide responses to the research questions outlined previously, data from the four sites were analyzed. There were some differences in the type and format of the data across the four sites; however, similar analyses were performed on the four data sets, and the four sites were used as cross-validation samples.

The main hypothesis of this study focused on the possible impact of students’ language background on their performance. Therefore, the focus of the analyses was on the comparison between the level of performance of ELL and non-ELL students. However, to develop an understanding about the role of other contributing factors in the assessment of ELL students, comparisons were also made between students with respect to other background variables, such as family income and parent education. Students’ mean normal-curve equivalent (NCE) scores on different subscales of standardized achievement tests were compared across subgroups using analysis of variance and t tests in a multiple-comparison framework.
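Because the comparisons below are reported as group means, standard deviations, and sample sizes, the t statistics can be reproduced from summary statistics alone. The following sketch (in Python; our own illustration, not the study's code, and the function name is ours) shows a pooled two-sample t test, which is consistent with the degrees of freedom reported in the Results:

```python
# A minimal sketch of the pooled two-sample t test computed from
# summary statistics (means, SDs, and group sizes) alone.
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))             # SE of the mean difference
    return (mean1 - mean2) / se, df

# Grade 2 reading NCE scores, Site 2 (values reported in the Results section):
# non-ELL: M = 49.3, SD = 19.7, N = 252,696; ELL: M = 31.6, SD = 15.9, N = 97,862
t, df = pooled_t(49.3, 19.7, 252_696, 31.6, 15.9, 97_862)
print(f"t = {t:.1f}, df = {df}")
# ~251 with df = 350,556; the article reports t = 250.6, presumably
# computed from unrounded means and SDs.
```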

To examine the impact of language on the reliability of tests and on the level of measurement error, internal consistency coefficients were computed for different tests across categories by students’ ELL status and other background variables, such as family income and parent education. This approach was based on the assumption that test items within each strand or subscale were measuring the same construct; that is, they were unidimensional (see Cortina, 1993). To study the impact of language factors on the validity of tests, the structural equation approach was used (Bollen, 1989). Through the application of multiple-group factor analyses, the internal structural relationships of test items and the relationships of test scores with external criteria were examined.

It must be noted at this point that in some of our data sites, we had access to the data for the entire student population. Therefore, application of inferential statistical techniques was not necessary. However, to be consistent with the analyses for the other sites that provided data for subgroups of the population, as well as the entire population, we report statistical analyses for all four data sites. Findings from these analyses are presented next.

RESULTS

Three main research questions guided the analyses and reporting of the results. These questions were based on (a) issues concerning content-based performance differences between ELLs and non-ELLs due to language factors, (b) the impact of language factors on the reliability of the tests, and (c) the impact of language factors on the validity of the tests.


The results of analyses are reported in three sections: (a) performance differences between ELL and non-ELL students, (b) impact of language factors on reliability, and (c) impact of language factors on validity.

Performance Differences Between ELL and Non-ELL Students Due to Possible Impact of Language Factors

The results of analyses of data from the four sites consistently suggested that ELL students performed substantially lower than non-ELL students. However, the performance gap between ELL and non-ELL students was not the same across the content areas. In content areas with a higher level of language demand (e.g., reading and writing), the performance gap between ELL and non-ELL students was the highest, whereas in content areas with less language demand (e.g., math and science), the performance gap was much smaller and in some cases was almost nonexistent (e.g., math computation).

To present a picture of the performance gap trend between ELL and non-ELL students, we report the descriptive statistics on the site with the largest ELL population for two grades, an early elementary grade and a secondary school grade. To conserve space, we have summarized the results of the descriptive analyses for the other three sites.

Table 2 presents the number and percentage of students in Grades 2 and 9 in Site 2 who took the Stanford 9 tests in reading, math, and science, by student ELL and disability status.


TABLE 2
Site 2 Grades 2 and 9 Stanford 9 Frequencies for Students With a Normal Curve Equivalent Score

                    All Students        Reading             Math                Science
                    n         %         n         %         n         %         n         %
Grade 2
  SD only           17,506    4.2       15,051    4.1       16,720    4.2       NA        NA
  ELL only          120,480   29.1      97,862    26.5      114,519   28.4      NA        NA
  ELL and SD        4,629     1.1       3,537     1.0       4,221     1.0       NA        NA
  Non-ELL/Non-SD    271,554   65.6      252,696   68.5      267,397   66.4      NA        NA
  All students      414,169   100.0     369,146   100.0     402,857   100.0     NA        NA
Grade 9
  SD only           18,750    6.0       16,732    5.7       17,350    5.8       17,313    5.8
  ELL only          53,457    17.2      48,801    16.6      50,666    17.0      50,179    16.9
  ELL and SD        4,534     1.5       3,919     1.3       4,149     1.4       4,108     1.4
  Non-ELL/Non-SD    233,189   75.2      224,215   76.4      226,393   75.8      225,457   75.9
  All students      309,930   100.0     293,667   100.0     298,558   100.0     297,057   100.0

Note. SD = students with disabilities; ELL = English language learner.


As the data in Table 2 show, over 29% of all Grade 2 students at Site 2 who participated in Stanford 9 testing were ELL students. This percentage (29.1%) may not represent the actual percentage of ELL students at Site 2 because some ELL students did not participate in the assessment due to language barriers. The percentage of ELL students who participated in the Stanford 9 testing was 17.2% for Grade 9, which was substantially lower than for Grade 2 (29.1%). There were slight differences between the percentages of ELL students across the different content areas in this site.

The large number of ELL students in this site provided a unique opportunity to perform analyses at the subgroup level to examine the impact of students’ background variables on academic achievement. Table 3 presents means, standard deviations, and numbers of students in reading, math, and science for Stanford 9 test scores by subgroups of students. In addition to data by students’ ELL status, we included subgroup data by school lunch program (a proxy for family income) and parent education, which were highly confounded with students’ ELL status.

In general, the results of analyses reported in Table 3 indicate that:

• ELL students performed substantially lower than non-ELL students, particularly in content areas with more language demand, such as reading. For example, the mean reading score for ELL students in Grade 2 was 31.6 (SD = 15.9, N = 97,862), compared with a mean of 49.3 (SD = 19.7, N = 252,696) for non-ELL students. This difference was significant beyond the .01 nominal level (t = 250.6, df = 350,556, p < .001).²

• The performance gap between ELL and non-ELL students was smaller in the lower grades. For example, there was a 17.7-point difference between ELL and non-ELL students in Grade 2 reading mean scores, as compared with a 22-point difference for students in Grade 9.

• The performance gap between ELL and non-ELL students decreased when the level of language demand of test items decreased. For example, for Grade 9 students, the performance gap between ELL and non-ELL students in reading was 22 points, as compared to 15.4 points in math.

The results of analyses also show that other background variables affect test performance. Background variables such as family income (as measured by participation in the free/reduced price lunch program) and parent education may not be directly related to students’ ELL status, but are confounded with it.


²Since we are working with the population of students in this site, no statistical comparison is needed; even a minor difference would be real. However, following tradition, we conducted some statistical significance testing. To control for multiple comparisons, we used the Benjamini–Hochberg False Discovery Rate procedure. For a description of this procedure see Benjamini and Hochberg (1995).
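For readers who want to apply the same control, here is a minimal sketch of the Benjamini–Hochberg step-up procedure (our own illustration in Python, not code from the study; the p-values in the example are hypothetical):

```python
# Benjamini-Hochberg False Discovery Rate control across a family of tests.
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR q."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)                       # p-values ranked smallest first
    thresholds = q * np.arange(1, m + 1) / m    # step-up thresholds i/m * q
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below))           # largest rank i with p_(i) <= i/m * q
        reject[order[: k + 1]] = True           # reject all hypotheses up to rank k
    return reject

# Example: six subscale comparisons with hypothetical p-values.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27, 0.6], q=0.05))
# -> the first four are rejected; note 0.039 is "rescued" by the step-up rule.
```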


TABLE 3
Site 2 Grades 2 and 9 Stanford 9 Subsection Scores

                                        Grade 2                         Grade 9
Subgroup                                Reading   Math      Science    Reading   Math      Science
ELL status
  ELL                         M         31.6      37.7      NA         24.0      38.1      34.9
                              SD        15.9      19.7      NA         12.5      15.2      12.8
                              N         97,862    114,519   NA         48,801    50,666    50,179
  Non-ELL                     M         49.3      50.4      NA         46.0      53.5      49.2
                              SD        19.7      21.9      NA         18.0      19.4      16.1
                              N         252,696   267,397   NA         224,215   226,393   225,457
School lunch
  Free/reduced price          M         35.4      38.8      NA         32.0      42.5      39.4
                              SD        17.5      20.1      NA         16.2      16.4      14.3
                              N         106,999   121,461   NA         56,499    57,961    57,553
  No free/reduced price       M         47.0      48.5      NA         42.6      50.7      47.0
                              SD        20.6      22.4      NA         19.7      20.1      17.0
                              N         304,092   327,409   NA         338,285   343,480   341,663
Parent education
  Not high school graduate    M         30.1      34.7      NA         29.2      39.6      37.3
                              SD        15.3      19.1      NA         15.0      15.1      13.5
                              N         54,855    63,960    NA         69,934    71,697    71,183
  High school graduate        M         40.5      42.6      NA         35.6      44.1      41.7
                              SD        18.1      20.3      NA         17.0      17.1      14.9
                              N         93,031    101,276   NA         71,986    73,187    72,810
  Some college                M         48.8      50.3      NA         44.6      51.6      48.2
                              SD        18.6      20.6      NA         17.2      18.1      15.4
                              N         66,530    70,381    NA         70,364    70,971    70,687
  College graduate            M         56.5      58.4      NA         48.1      56.3      51.5
                              SD        18.5      20.6      NA         18.5      19.6      16.4
                              N         54,391    56,451    NA         87,654    88,241    87,956
  Post graduate studies       M         62.1      64.1      NA         57.6      65.8      58.8
                              SD        18.7      20.4      NA         19.6      20.7      17.1
                              N         25,571    26,367    NA         34,987    35,087    35,022

Note. ELL = English language learner.


Students who did not participate in the free/reduced price lunch program had higher mean scores in all subject areas than those who did participate in the program. For example, the average NCE score for reading for Grade 2 students who participated in the free/reduced price lunch program was 35.4 (SD = 17.5, N = 106,999), as compared with an average score of 47.0 (SD = 20.6, N = 304,092) for those who did not participate in the program. The difference was statistically significant (t = 177.8, df = 411,089, p < .001). For Grade 9 students participating in the free/reduced price lunch program, the average NCE score for reading was 32.0 (SD = 16.2, N = 56,499), as compared with an average of 42.6 (SD = 19.7, N = 338,285) for those who did not participate in the program. The difference between the performances of the two groups was statistically significant (t = 139.2, df = 394,755, p < .001).

The results also indicate that parent education has a substantial impact on Stanford 9 test scores. For example, the average NCE score for reading for Grade 2 students of parents with low education (not high school graduate) was 30.1 (SD = 15.3, N = 54,855), as compared with an average of 62.1 (SD = 18.7, N = 25,571) for students of parents with high education (post graduate education). This difference was statistically significant (t = 238.8, df = 80,424, p < .001). For Grade 9 students, the average NCE score for reading for the low parent education category was 29.2 (SD = 15.0, N = 69,934). For students with parents in the high education category, the average was 57.6 (SD = 19.6, N = 34,987). This difference was statistically significant (t = 238.4, df = 104,919, p < .001). The results of our analyses also suggest that family income and parent education are confounded with students’ ELL status. Table 4 presents frequencies and percentages of family income (free/reduced price lunch program) and parent education by ELL status.


TABLE 4
Site 2 Free/Reduced Price Lunch Status and Parent Education by ELL Status

           No Free/Reduced Price Lunch – Parent Education   Free/Reduced Price Lunch – Parent Education
           Not HS Grad   Post Grad   Total                  Not HS Grad   Post Grad   Total        Grand Total
Non-ELL    20,738        22,410      43,148                 9,763         980         10,743       53,891
           26.2%         28.2%       54.4%                  12.3%         1.3%        13.6%        68.0%
ELL        15,384        976         16,360                 8,648         358         9,006        25,366
           19.4%         1.2%        20.6%                  10.9%         0.5%        11.4%        32.0%
Total      36,122        23,386      59,508                 18,411        1,338       19,749       79,257
           45.6%         29.5%       75.1%                  23.2%         1.7%        24.9%        100.0%

Note. Percentages reported are based on the total number of students. Not HS Grad = not high school graduate; Post Grad = post graduate education; ELL = English language learner.


A chi-square of 12,096.72, which shows confounding of these variables, was significant beyond the .01 nominal level (χ² = 12,096.72, p < .001). A squared contingency coefficient of .132 presents a rough estimate of the proportion of common variance (or confounding) among the three variables. These results suggest that a greater percentage of ELL students are from families with lower income and lower education. For example, 95% of ELL students had parents with low education, whereas only 57% of non-ELL students had parents with low education. Thirty-six percent of all ELL students participated in the free/reduced price lunch program, as compared with only 20% of non-ELL students.
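In symbols, the squared contingency coefficient reported above is the square of Pearson’s coefficient of contingency; this worked check (ours, using the chi-square just reported and the Table 4 grand total) reproduces the value:

$$C^2 = \frac{\chi^2}{\chi^2 + N} = \frac{12{,}096.72}{12{,}096.72 + 79{,}257} \approx .132$$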

However, the results of analyses in this study suggest that among these background variables, language factors show a greater impact on assessment, much greater than family income or parent education.

To make a clearer comparison between the performance of subgroups of students (e.g., by ELL status, family income, and parent education) in different content areas, a Disparity Index (DI) was computed. For example, to compute the DI by students’ ELL status, the mean score for ELL students was subtracted from the mean for non-ELL students. The difference was then divided by the mean for ELL students, and the result was multiplied by 100. Table 5 shows the DI by student ELL status, as well as by school lunch program and parent education, for Grades 2 and 7, for Site 2, in four content areas.³ Similar results were obtained for other grades (see Abedi & Leon, 1999).
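As a concrete restatement of that definition, here is a one-function sketch (ours, not the study's code):

```python
# Disparity Index (DI) as defined above: the reference-minus-focal mean
# difference expressed as a percentage of the focal-group mean.
def disparity_index(mean_reference, mean_focal):
    """DI = 100 * (reference mean - focal mean) / focal mean."""
    return 100 * (mean_reference - mean_focal) / mean_focal

# Grade 2 reading NCE means from Table 3: non-ELL = 49.3, ELL = 31.6.
print(round(disparity_index(49.3, 31.6), 1))
# -> 56.0; Table 5 reports 55.8, presumably computed from unrounded means.
```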

As the data in Table 5 show, the average DI for ELL status over reading, math, language, and spelling for Grade 2 was 48.1 (i.e., over all four subject areas, non-ELL students outperformed ELL students by 48.1%). For Grade 7, the DI was 74.8. We also computed DIs by school lunch program and parent education. The DI for school lunch program for Grade 2 students was 29.6. That is, students who did not participate in the school lunch program outperformed students who participated in the program by 29.6%. For Grade 7, the DI was 35.2. We also compared the performance of students with the lowest level of parent education with students at the highest level of parent education. The DI for parent education for Grade 2 was 99.3; that is, children of parents with the highest level of education (post graduate education) outperformed children of parents with lower levels of education (“no education” or “elementary level education”) by 99.3%. The DI for Grade 7 by parent education was 83.5.

By comparing the math DI with the DIs of the language-related subscales (reading, language, and spelling), we can see the impact of language on students’ performance. The DIs for all categories (ELL status, school lunch, and parent education) were smaller for math and larger for reading. For example, for Grade 2 students, the DI (non-ELL vs. ELL) was 55.8 in reading (non-ELL students outperformed ELL students by 55.8%), 60.2 in language, and 42.8 in spelling, as compared with a DI of 33.5 in math. For Grade 7 students, the DIs (non-ELL vs. ELL) were 96.9 for reading, 70.7 for language, and 81.1 for spelling, compared to 50.4 for math. The DIs for school lunch program (nonparticipant vs. participant in free/reduced price lunch) for Grade 2 students were 32.7 for reading, 35.2 for language, and 25.3 for spelling, as compared with 25.1 for math.

³We have presented the results for Grade 7 rather than Grade 9 to cover a larger range of students in different grades.

However, the difference between the DIs for math and the language-related subscales was largest across the ELL categories. In Table 5, we included these DI differences under the “Difference” column. The DI difference by ELL status⁴ was 19.4 for Grade 2 and 32.5 for Grade 7, as compared with the school lunch program DI differences of 6.0 and 7.7, respectively, and the parent-education DI differences of 15.8 and 9.8, respectively. Once again, these data suggest that language factors may have a more profound impact on the assessment outcome than other background variables, such as family income and parent education, particularly for ELL students.

To shed light on the impact of language factors on assessment, analyses by math subscales were conducted; the results are presented next. Standardized achievement tests such as the Stanford 9 and ITBS include different math subscales that have varying degrees of language demand. These subscales range from testing math analytical skills, concepts and estimation, and problem solving, with a relatively higher level of language demand, to testing math computation, with a minimal level of language demand. If the hypothesis concerning the impact of language on content-based performance is tenable, then the performance difference between ELL and non-ELL students should be at the minimum level in content-based tests with a minimal level of language demand, such as math computation. This is exactly what the results of our analyses showed.


TABLE 5
Site 2 Grades 2 and 7 Disparity Indexes (DI) by ELL Status, Free/Reduced Price Lunch, and Parent Education

DI                      Reading   Math    Language   Spelling   Average   Difference
Grade 2
  ELL/Non-ELL           55.8      33.5    60.2       42.8       48.1      19.4
  Free/reduced lunch    32.7      25.1    35.2       25.3       29.6      6.0
  Parent education      106.3     84.9    118.5      87.5       99.3      15.8
Grade 7
  ELL/Non-ELL           96.9      50.4    70.7       81.1       74.8      32.5
  Free/reduced lunch    47.2      29.5    32.9       31.1       35.2      7.7
  Parent education      98.4      76.2    79.0       80.5       83.5      9.8

Note. ELL = English language learner.

⁴This “DI difference” was computed as follows: The three language-related DIs (reading, language, and spelling) were averaged, and the DI for math was then subtracted from this average.


Data obtained from some of the sites in this study included different subscale scores, including math computation. Table 6 presents the DIs for bilingual students compared with nonbilingual⁵ students by level and grade for math concepts and estimation, math problem solving, math computation, and reading in Site 1.

The results of the DI analyses shown in Table 6 present several interesting patterns:

1. The DIs indicated that the nonbilingual students generally outperformed the bilingual students. However, the magnitude of the DIs depends, to a great extent, on the level of language demand of the test items. The DI for test items with less language demand was smaller than for other items. For example, in Grade 3, bilingual students performed better on math computation, which has the lowest level of language demand.

2. Major differences between bilingual and nonbilingual students were found for students in Grades 3 and above. There seemed to be a positive relationship between the mean score differences and grade level, in that the difference increased as the grade level increased, up to Grade 5. Starting with Grade 6, the DI was still positive, but the rate of increase was not as systematic as before. For example, in Grade 3, nonbilingual over bilingual students had DIs of 5.3 in math concepts and estimation, 11.1 in math problem solving and data interpretation, –3.1 in math computation, and 23.4 in reading. In Grade 4, these indexes increased to 26.9 for math concepts and estimation, 19.3 for math problem solving and data interpretation, 6.9 for math computation, and 30.1 for reading. The indexes further increased in Grade 5 to 36.5 for math concepts and estimation, 32.7 for math problem solving and data interpretation, 12.6 for math computation, and 41.1 for reading.


⁵This site did not provide information on students’ ELL status. Instead, we used students’ bilingual status as a proxy for ELL status.

TABLE 6
Site 1 Disparity Indexes of Nonbilingual Over Bilingual Students on Math and Reading

Test     Primary   Math Concepts    Math Problem Solving      Math          Reading
Level    Grade     and Estimation   and Data Interpretation   Computation
9        3         5.3              11.1                      –3.1          23.4
10       4         26.9             19.3                      6.9           30.1
11       5         36.5             32.7                      12.6          41.1
12       6         27.5             30.9                      11.8          43.7
13       7         39.4             32.7                      12.9          39.6
14       8         30.5             31.7                      12.9          42.7
Average of all
levels/grades      27.7             26.4                      9.0           36.8


3. The largest gap between bilingual and nonbilingual students was in reading. The next largest gaps were in the content areas that appear to have more language demand. For example, the math concepts and estimation and the math problem solving and data interpretation subsections seem to have more language demand than the math computation subsection. Correspondingly, the DIs were higher for those subsections. The average DI for Grades 3 through 8 was 27.7 for math concepts and estimation. That is, the mean of the nonbilingual group in math concepts and estimation was 27.7% higher than the bilingual group mean. A similar trend was observed in math problem solving and data interpretation; the average DI for this subsection was 26.4. The average DI for math computation, however, was 9.0, which was substantially lower than the corresponding DIs for the other two math subsections. These results were consistent across the different data sites.

Table 7 reports the DIs, non-ELL versus ELL students, for reading, math total, and the math calculation and math analytical subscales for Grades 3, 6, and 8 at Site 4. Once again, the results of analyses clearly suggest the impact of language factors on students’ performance, particularly in areas with more language demand. For example, in reading, ELL students had the largest performance gap with non-ELL students. The average DI for reading across the three grades was 86.7, as compared with the average performance gap of 33.4 for math total. Among the math subscale scores, those with less language demand showed a smaller performance gap. The average DI was 41.0 for math analytical and 20.1 for math calculation. The math calculation DI was substantially less than the DI for reading (86.7) and for math analytical (41.0). However, it must be noted that language demand and cognitive complexity of test items may also be confounded. That is, items in the math calculation subscale may not only have less language demand, but they may also be less cognitively demanding than other math subscales, such as math problem solving. This is a caveat in our discussion of the impact of language on content-based assessments.


TABLE 7
Site 4 Disparity Indexes of Non-ELL Versus ELL Students in Reading and Subscales of Math

Grade                           Reading   Math Total   Math Calculation   Math Analytical
3                               53.4      25.8         12.9               32.8
6                               81.6      37.6         22.2               46.1
8                               125.2     36.9         25.2               44.0
Average over the three grades   86.7      33.4         20.1               41.0

Note. ELL = English language learner.


Possible Impact of Language Factors on Reliability of Assessments

In classical test theory, reliability is defined as the ratio of the true-score variance ($\sigma_T^2$) to the observed-score variance ($\sigma_X^2$) (Allen & Yen, 1979). The observed-score variance ($\sigma_X^2$) is the sum of two components, the true-score variance ($\sigma_T^2$) and the error variance ($\sigma_E^2$). In a perfectly reliable test, the error variance ($\sigma_E^2$) would be zero; therefore, the true-score variance ($\sigma_T^2$) would be equal to the observed-score variance.
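In symbols (a standard statement of the classical model, added here for clarity; it is implicit in the definitions above):

$$X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}.$$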

However, in measurement with human participants there is always an error component, whether large or small, which is referred to in classical test theory as the measurement error (see Allen & Yen, 1979; Linn & Gronlund, 1995; Salvia & Ysseldyke, 1998). Appropriate evaluation of the measurement error is important in any type of assessment, whether in a traditional, multiple-choice approach or in performance-based assessments (Linn, 1995; see also AERA, APA, & NCME, 1999). Many different sources (e.g., occasion, task, test administration conditions) may contribute to measurement error in traditional, closed-ended assessment instruments. In addition to these sources, the reliability of performance assessment measures suffers from yet another source of measurement error, variation in the scoring of open-ended items. More important, in the assessment of ELL students, language factors may be another serious source of measurement error, due to unnecessary linguistic complexity in content-based areas. In the classical approach to estimating the reliability of assessment tools, the level of contribution of different sources to measurement error may be indeterminable. Through the generalizability approach, one would be able to determine the extent of the variance each individual source (such as occasion, tasks, items, scorer, and language factors) contributes to the overall measurement error (see Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991).

To estimate the reliability of the standardized achievement tests used in this study and to investigate their measurement error, we considered different approaches. Since parallel forms or test–retest data were not available, we decided to use an internal consistency approach. The main limitation of the internal consistency approach, however, is the assumption of unidimensionality. For example, the literature has indicated that the alpha coefficient, which is a measure of internal consistency, is extremely sensitive to multidimensionality of test items (see, e.g., Abedi, 1996; Cortina, 1993). However, because the test items within each content area are assumed to measure the same construct, we believe this approach may be appropriate for estimating the reliability of the achievement tests used in this study.
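For concreteness, here is a minimal sketch of coefficient alpha (our own illustration, not code from the study; the simulated responses are hypothetical):

```python
# Coefficient alpha from a students x items matrix of 0/1 item scores.
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total test score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical data: responses driven by a single ability so items correlate.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))              # 500 students
difficulty = rng.normal(size=(1, 30))            # 30 items
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
responses = (rng.random((500, 30)) < p_correct).astype(int)

# In the study, alpha was computed separately for ELL and non-ELL groups
# and the coefficients compared.
print(round(cronbach_alpha(responses), 3))
```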

Because different data sites used different tests, and because within the individual sites different test forms were used in different grades, these analyses were performed separately for each site and each grade. Within each grade, we conducted the internal consistency analyses separately for ELL and non-ELL students. The results obtained from analyses at the different sites were consistent. Due to space limitations, only the results from Site 2, the site with the largest number of students, are presented. A complete report of the results of analyses can be found in Abedi et al. (2001).

Language (and perhaps other variables, such as socioeconomic status and opportunity to learn) may cause a restriction of range in the score distribution that may result in lower internal consistency.

Table 8 presents reliability (internal consistency) coefficients for the Stanford 9 data for Grade 2 students in Site 2. As the data in Table 8 show, non-ELL students had higher coefficients than the ELL students. There was also a slight difference between the alpha coefficients across the free/reduced price lunch categories. Nonparticipants in the free/reduced price lunch program had slightly higher alphas than the participating students. For example, the average reliability for the reading subscale for the nonparticipant group was .913, as compared with an average reliability of .893 for the participant group (a difference of .021); for ELL students the average reliability was .856, as compared with an average reliability of .914 for non-ELL students, a difference of .058 (non-ELL here refers to the English-only group). The results of our analyses, which are consistent across the different sites, indicate that the difference in internal consistency coefficients between ELL and non-ELL students is significantly larger than the difference between these coefficients across the free/reduced price lunch and parent education categories.

Table 9 presents the reliability (internal consistency) coefficients for Grade 9 students. Comparing the internal consistency coefficients for Grade 9 students with those for Grade 2 students (reported in Table 8) once again revealed that reliability coefficients for ELL students were lower than the coefficients for non-ELL students. This was particularly true for students in higher grades, where language has more impact on performance. In both Grade 2 and Grade 9, reliabilities were lower for ELL students. However, in Grade 9, the difference between reliability coefficients for ELL and non-ELL students was larger. For example, for Grade 2, the difference between reliability coefficients for ELL and non-ELL students was .058 in reading, .013 in math, and .062 in language, as compared with the ELL/non-ELL reliability differences of .109 for reading, .096 for math, and .120 for language in Grade 9. The difference between the overall reliability coefficients of ELL students and English-only students for Grade 9 was .167, which was substantially higher than the respective difference of .043 in Grade 2. Thus, the reliability gap between ELL and non-ELL students increases with the grade level. This may be due to the use of more complex language structures in higher grades.

The results of these analyses strongly suggest that students’ language background factors have a profound effect on their assessment outcomes, above and beyond other background characteristics such as family income and parent education.


TABLE 8
Site 2 Grade 2 Stanford 9 Subscale Reliabilities

                           Free Lunch Participation      Non-ELL Students
Subscale (No. of Items)    Yes          No               English Only   FEP          RFEP        ELL
Reading                    N = 209,262  N = 58,485       N = 34,505     N = 29,771   N = 3,471   N = 101,399
  Word study (48)          .917         .895             .916           .915         .920        .865
  Vocabulary (30)          .913         .897             .915           .906         .907        .857
  Reading comp. (30)       .908         .888             .910           .900         .899        .846
  Average reliability      .913         .893             .914           .907         .909        .856
Math                       N = 220,971  N = 63,146       N = 249,000    N = 31,444   N = 3,673   N = 118,740
  Problem solving (45)     .893         .881             .896           .886         .890        .871
  Procedures (28)          .892         .892             .891           .887         .895        .890
  Average reliability      .893         .887             .894           .887         .893        .881
Language                   N = 218,003  N = 62,028       N = 245,384    N = 31,035   N = 3,612   N = 111,752
  Total (44)               .890         .866             .891           .883         .892        .829

Note. ELL = English language learner; FEP = fluent English proficient; RFEP = redesignated fluent English proficient.


Validity

Research has indicated that complex language in content-based assessments for nonnative speakers of English may reduce the validity and reliability of inferences drawn about students’ content-based knowledge. For example, results from earlier CRESST language background studies (Abedi & Lord, 2001; Abedi, Lord, & Hofstetter, 1998; Abedi et al., 2000; Abedi, Lord, & Plummer, 1997) provided support for a strong link between language factors and content-based performance. The linguistic factors in content-based assessments (such as math and science) may be considered a source of construct-irrelevant variance because they are not conceptually related to the content being assessed (Messick, 1994):

With respect to distortion of task performance, some aspects of the task may require skills or other attributes having nothing to do with the focal constructs in question, so that deficiencies in the construct-irrelevant skills might prevent some students from demonstrating the focal competencies. (p. 14)

To examine the impact of students’ language background on the validity of standardized achievement tests, analyses were performed to compare criterion validity coefficients for ELL and non-ELL students and to examine differences between the structural relationships of the ELL and non-ELL groups.

Linguistic complexity of test items, as a possible source of construct-irrelevant variance, may be a threat to the validity of achievement tests, because it could be a source of measurement error in estimating the reliability of the tests. The intercorrelations between individual test items, the correlations between items and the total test score (the internal validity coefficients), and the correlations between item scores and total test scores and the external criteria (the students’ other achievement data) were computed. A significant difference across the ELL categories in the relationships between test items, between individual items and total test scores (internal validity), and between overall test scores and external criteria may be indicative of the impact of language on the validity of tests. Since language factors should not influence the performance of non-ELL students, these relationships may be stronger for non-ELL students.
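As an illustration of one of these quantities, here is a minimal sketch of the corrected item-total correlation (ours, not the study's code; the item is removed from the total so it does not correlate with itself):

```python
# Corrected item-total correlation: each item vs. the sum of the other items.
import numpy as np

def corrected_item_total(scores):
    """Return, for each item, its correlation with the total of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)                   # total test score per student
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

# Under the study's design, these coefficients would be computed separately
# for ELL and non-ELL students and the two sets of values compared.
```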

To examine the hypothesis regarding differences between ELL and non-ELL students in the structural relationships of the test items, a series of structural equation models were created for the Site 2 and Site 3 data. Fit indexes were compared across the ELL and non-ELL groups. The results generally indicated that the relationships between individual items, between items and the total test score, and between items and the external criteria were higher for non-ELL students than for ELL students.

In creating the structural models, test items in each content area (e.g., reading, science, and math) were grouped as “parcels.” Figure 1 presents item parcels and latent variables for reading, math, and science for Site 2. As Figure 1 shows, the 54 reading items were grouped into four parcels. Each parcel was constructed to systematically contain items with three degrees of item difficulty: easy, moderately difficult, and difficult items (for a description of item parcels and ways to create them, see Cattell & Burdsal, 1975). A reading latent variable was constructed based on these four parcels.

Similarly, item parcels and latent variables for math and science were created from the 48 math items and 40 science items by the same process. The correlations between the reading, math, and science latent variables were estimated. Models were tested on randomly selected subsamples to demonstrate the cross-validation of the results.
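A minimal sketch of difficulty-stratified parceling follows (ours, not the study's code; the article does not spell out its exact assignment rule, so the round-robin deal used here is one common way to mix easy, moderately difficult, and difficult items in each parcel):

```python
# Group items into parcels so each parcel spans the range of item difficulty.
import numpy as np

def difficulty_stratified_parcels(scores, n_parcels=4):
    """Return parcel scores: a students x n_parcels matrix of item-score sums."""
    scores = np.asarray(scores, dtype=float)
    p_values = scores.mean(axis=0)           # item difficulty = proportion correct
    order = np.argsort(p_values)             # items ranked from hardest to easiest
    parcels = [order[i::n_parcels] for i in range(n_parcels)]  # round-robin deal
    return np.column_stack([scores[:, idx].sum(axis=1) for idx in parcels])

# e.g., the 54 reading items would yield four parcel scores per student:
# reading_parcels = difficulty_stratified_parcels(reading_scores, n_parcels=4)
```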

Table 10 shows the results of the structural models for Grade 9 at Site 2. Correlations of item parcels with the latent factors were consistently lower for ELL students than they were for non-ELL students. This finding was true for all parcels regardless of which grade or which sample of the population was tested. For example, for Grade 9 ELL students, the correlations for the four reading parcels ranged from a low of .719 to a high of .779 across the two samples (see Table 10). In comparison, for non-ELL students, the correlations for the four reading parcels ranged from a low of .832 to a high of .858 across the two samples. The item parcel correlations were also larger for non-ELL students than for ELL students in math and science. Again, these results were consistent across the different samples.


FIGURE 1 Latent variable model for reading, science, and math.


The correlations between the latent factors were also larger for non-ELL students than they were for ELL students. This gap in latent factor correlations between non-ELL and ELL students was especially large when there was more language demand. For example, in Sample 1 for Grade 9, the correlation between the latent factors for math and reading for non-ELL students was .782, compared to just .645 for ELL students. When comparing the latent factor correlations between reading and science from the same population, the correlation was still larger for non-ELL students (.837) than for ELL students (.806), but the gap between the correlations was smaller. This was likely due to language demand differences. Multiple-group structural models were used to test whether the differences between non-ELL and ELL students mentioned previously were significant. There were significant differences for all constraints tested at the p < .05 level.


TABLE 10
Site 2 Grade 9 Stanford 9 Reading, Math, and Science Structural Modeling Results (df = 51)

                          Non-ELL (N = 22,782)       ELL (N = 4,872)
                          Sample 1     Sample 2      Sample 1     Sample 2
Factor loadings
  Reading comprehension
    Parcel 1              .852         .853          .723         .719
    Parcel 2              .841         .844          .734         .739
    Parcel 3              .835         .832          .766         .779
    Parcel 4              .858         .858          .763         .760
  Math factor
    Parcel 1              .818         .821          .704         .699
    Parcel 2              .862         .860          .770         .789
    Parcel 3              .843         .843          .713         .733
    Parcel 4              .797         .796          .657         .674
  Science factor
    Parcel 1              .678         .681          .468         .477
    Parcel 2              .679         .676          .534         .531
    Parcel 3              .739         .733          .544         .532
    Parcel 4              .734         .736          .617         .614
Factor correlations
  Reading vs. Math        .782         .779          .645         .674
  Reading vs. Science     .837         .839          .806         .802
  Science vs. Math        .870         .864          .796         .789
Goodness of fit
  Chi-square              488          446           152          158
  NFI                     .997         .998          .992         .992
  NNFI                    .997         .997          .993         .993
  CFI                     .998         .998          .995         .995

Note. Invariance tests were significant for all constraints tested with the multiple-group model (non-ELL/ELL). ELL = English language learner; NFI = Normed Fit Index; NNFI = Nonnormed Fit Index; CFI = Comparative Fit Index.

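The invariance tests just described compare a model in which a parameter is constrained to be equal across groups against one in which it is free. A minimal sketch of that chi-square difference test follows (ours, not the study's code; the fit statistics in the example are hypothetical):

```python
# Chi-square difference test for an equality constraint in a
# multiple-group structural model: if constraining a parameter to be
# equal across groups worsens fit by more than chance, the groups differ.
from scipy.stats import chi2

def chi_square_difference(chisq_constrained, df_constrained,
                          chisq_free, df_free):
    """p-value for the fit decrement caused by the equality constraints."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return chi2.sf(delta_chisq, delta_df)

# Hypothetical fit statistics: one loading constrained equal across the
# ELL and non-ELL groups vs. the unconstrained two-group model.
print(chi_square_difference(658.0, 52, 640.0, 51))  # ~2e-5 -> constraint rejected
```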

The results of simple-structure confirmatory factor analyses also showed differences in factor loadings and factor correlations between the ELL and non-ELL groups for the Site 3 data. The hypotheses of invariance of factor loadings and factor correlations between the ELL and non-ELL groups were tested. Specifically, we tested the following null hypotheses:

• Correlations between parcel scores and a reading latent variable are the same for the ELL and non-ELL groups.

• Correlations between parcel scores and a science latent variable are the same for the ELL and non-ELL groups.

• Correlations between parcel scores and a math latent variable are the same for the ELL and non-ELL groups.

• Correlations between content-based latent variables are the same for the ELL and non-ELL groups.

Table 11 summarizes the results of the structural models for the reading and math tests for Site 3 students in Grade 10. Table 11 includes fit indexes for the ELL and non-ELL groups, correlations between parcel scores and content-based latent variables (factor loadings), and correlations between latent variables. Hypotheses regarding the invariance of factor loadings and factor correlations between the ELL and non-ELL groups were tested, and significant differences at or below the .05 nominal level were identified; these differences are indicated by an asterisk next to the corresponding constraint. There were several significant differences between the ELL and non-ELL groups on the correlations between parcel scores and latent variables. For example, on the math subscale, differences in factor loadings between the ELL and non-ELL groups on Parcels 2 and 3 were significant. Table 11 also shows a significant difference between the ELL and non-ELL groups on the correlation between the reading and math latent variables.
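As a rough check on a difference of this kind, two independent correlations can be compared with Fisher's r-to-z transformation. This is only a back-of-the-envelope approximation, not the multiple-group test the study used, but plugging in the Table 11 reading versus math values illustrates that the gap is unlikely to be sampling noise:

```python
# Sketch: Fisher r-to-z comparison of two independent correlations
# (an approximation; the study itself tested this within the SEM).
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test of H0: rho1 == rho2 for independent samples."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * norm.sf(abs(z))

# Table 11, Model 1: non-ELL r = .719 (N = 8,947), ELL r = .624 (N = 303).
z, p = compare_correlations(0.719, 8947, 0.624, 303)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 2.96, p = .003
```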

Taken together, these results indicate that:

1. Findings from the two cross-validation samples are very similar and provide evidence of the consistency of the results.
2. Structural models show a better fit for non-ELL than for ELL students.
3. Correlations between parcel scores and the content-based latent variables are generally lower for ELL students.
4. Correlations between the content-based latent variables are lower for ELL students.

The results suggest that language factors may be a source of construct-irrelevant variance in the assessment of ELL students.


DISCUSSION

The purpose of this study was to examine the impact of students' language background on the outcome of their assessments. Three major research questions guided the analyses and reporting, and they form the basis for the discussion of the results of this study:

1. Could the performance difference between ELL and non-ELL students be partly explained by language factors in the assessment?

2. Could the linguistic complexity of test items, as a possible source of measurement error, influence the reliability of the assessment?

3. Could the linguistic complexity of test items, as a possible source of construct-irrelevant variance, influence the validity of the assessment?


TABLE 11
Site 3 Grade 10 Stanford 9 Reading and Math Structural Modeling Results (Parcels Ordered by Item Number)

Goodness of fit          Model 1 (df = 75)    Model 2 (df = 74)
  Chi-square                  2938                 2019
  NFI                         .916                 .943
  NNFI                        .902                 .933
  CFI                         .918                 .945

                                 Model 1                          Model 2
Factor loadings        Non-ELL          ELL            Non-ELL          ELL
                       (N = 8,947)      (N = 303)      (N = 8,947)      (N = 303)
Reading
  Parcel 1               .677             .683           .679             .685
  Parcel 2               .683             .612           .684             .613
  Parcel 3               .738             .695           .739             .696
  Parcel 4               .826             .816           .824             .812
  Parcel 5               .693             .723           .690             .720
Math
  Parcel 1               .735             .763           .752             .788
  Parcel 2               .659             .702*          .667             .716*
  Parcel 3               .623             .730*          .592             .685*
  Parcel 4               .724             .774           .722             .774
  Parcel 5               .389             .471           .330             .391
Factor correlation
  Reading vs. Math       .719             .624*          .723             .622*

Note. NFI = Normed Fit Index; NNFI = Nonnormed Fit Index; CFI = Comparative Fit Index; ELL = English language learner.
*Significant at or below the .05 level.


In response to Question 1, results from the analyses of data from several locations nationwide indicated that students' assessment results might be confounded with language background variables. Descriptive statistics comparing ELL and non-ELL student performance by subgroup and across different content areas revealed major differences between the performance of the two groups. The descriptive statistics included a disparity index (DI), capturing how far the performance of non-ELL students exceeded that of ELL students. This index showed major differences in performance between students with different language backgrounds: the higher the level of English language complexity in the assessment tool, the greater the DI (the performance gap between ELL and non-ELL students).
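The DI formula itself is not reproduced in this section, so the sketch below assumes a simple percentage-gap definition of the index; the subscale means are hypothetical and chosen only to mirror the pattern the text describes.

```python
# Sketch of a disparity index (DI); the percentage-gap definition here
# is an assumption, not necessarily the study's exact formula.
def disparity_index(mean_non_ell: float, mean_ell: float) -> float:
    """Percentage by which the non-ELL mean exceeds the ELL mean."""
    return 100 * (mean_non_ell - mean_ell) / mean_ell

# Hypothetical means: the more language-demanding subscale shows the
# larger DI, as described in the text.
print(disparity_index(705.0, 660.0))  # reading-like subscale: ~6.8
print(disparity_index(700.0, 690.0))  # computation-like subscale: ~1.4
```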

Access to student-level and item-level data from the sites provided an opportunity to conduct analyses on student subgroups that were formed based on their background variables, including language background. The exceptionally large numbers of students in some subgroups enabled us to conduct cross-validation studies to demonstrate consistency of results over different sites and grade levels. The high degree of consistency assured us of the validity and interpretability of the results.

Descriptive analyses revealed that ELL students generally perform lower than non-ELL students on the reading, science, and math subtests. The level of impact of language proficiency on the assessment of ELL students is greater in content areas with a higher level of language demand, a strong indication of the impact of English language proficiency on assessment. For example, analyses show that ELL and non-ELL students had the greatest performance differences in reading and the smallest performance differences in math, where language has less of an impact on the assessment.

In response to Question 2, the results of our analyses indicated that test items for ELL students, particularly ELL students at the lower end of the English proficiency spectrum, suffer from lower internal consistency. That is, the language background of students may add another dimension to the assessment in content-based areas. Thus, we speculate that language might act as a source of measurement error in such areas. It is therefore imperative that test publishers examine the impact of language factors on test reliability and publish reliability indexes separately for the ELL subpopulation.
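A sketch of how such group-specific reliability estimates could be produced, using Cronbach's coefficient alpha (Cortina, 1993) computed separately for each language-background group; the data layout, item generator, and ELL flag are all hypothetical:

```python
# Sketch: coefficient alpha reported separately by language background.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400
item_cols = [f"item_{i}" for i in range(1, 11)]
# Hypothetical dichotomous items driven by a common ability, so that
# they intercorrelate and alpha is meaningfully positive.
ability = rng.normal(size=n)
probs = 1 / (1 + np.exp(-(ability[:, None] + rng.normal(scale=0.5, size=(n, 10)))))
df = pd.DataFrame((rng.random((n, 10)) < probs).astype(int), columns=item_cols)
df["ell"] = rng.integers(0, 2, size=n)  # 1 = ELL (hypothetical flag)

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k - 1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

for flag, grp in df.groupby("ell"):
    print("ELL" if flag else "non-ELL", round(cronbach_alpha(grp[item_cols]), 3))
```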

To shed light on the issues concerning the impact of language factors on validity (Question 3), the concurrent validity of standardized achievement tests (Stanford 9 and ITBS) was examined using a latent-variable modeling approach. Standardized achievement latent variables were correlated with the external-criterion latent variables. The results suggest that (a) there is a strong correlation between the standardized achievement and external-criterion latent variables, (b) this relationship is stronger when latent variables rather than measured variables are used, and (c) the correlation between standardized achievement and external-criterion latent variables is significantly larger for the non-ELL population than for the ELL population.


We speculate that the lower correlation between the two latent variables for the ELL group stems from language factors. That is, language factors act as a source of construct-irrelevant variance (Messick, 1994).
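One classical way to connect these observations is the correction for attenuation (Allen & Yen, 1979): an observed correlation is bounded by the reliabilities of the two measures, so the lower internal consistency found for ELL examinees would by itself depress observed validity coefficients even if the true-score relationship were identical across groups. A sketch with illustrative numbers, not values from the study:

```python
# Sketch: classical correction for attenuation. All values illustrative.
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimated true-score correlation given the observed correlation
    and the reliabilities of the two measures."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Lower reliability shrinks what the same underlying relationship
# looks like at the observed-score level.
print(disattenuate(0.55, 0.80, 0.85))  # higher-reliability case: ~0.67
print(disattenuate(0.45, 0.65, 0.85))  # lower-reliability case:  ~0.61
```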

Analyses of the structural relationships between individual items and between items and the total test scores revealed a major difference between ELL and non-ELL students. Structural models for ELL students demonstrated lower statistical fit. Further, the factor loadings were generally lower for ELL students, and the correlations between the latent content-based variables were weaker for ELL students.

The results of this study suggest that ELL test performance may be explained partly by language factors. That is, linguistic complexity of test items unrelated to the content being assessed may be at least partly responsible for the performance gap between ELL and non-ELL students. Based on the findings of this study, we recommend that (a) the issues concerning the impact of language factors on the assessment of ELL students be examined further; (b) the psychometric characteristics of assessment tools be carefully reviewed for use with ELL students; and (c) in assessing ELL students, student language background variables always be included, and efforts be made to reduce the confounding effects of language background on the assessment outcome.

ACKNOWLEDGMENTS

This research was supported in part by the Office of Bilingual Education and Minority Languages Affairs under Contract R305B960002, as administered by the U.S. Department of Education. The findings and opinions expressed in this report do not reflect the position or policies of the Office of Bilingual Education and Minority Languages Affairs or the U.S. Department of Education.

I acknowledge the valuable contribution of colleagues in the preparation of this article. Seth Leon and Jim Mirocha provided assistance with the data analyses. Kathryn Morrison provided technical assistance in the preparation of this article. Joan Herman and Mary Courtney contributed to this article with their helpful comments and suggestions. I am grateful to Eva Baker and Joan Herman for their support of this work.

REFERENCES

Abedi, J. (1996). The interrater/test reliability system (ITRS). Multivariate Behavioral Research, 31, 409–417.

Abedi, J., & Leon, S. (1999). Impact of students' language background on content-based performance: Analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Leon, S., & Mirocha, J. (2001). Examining ELL and non-ELL student performance differences and their relationship to background factors: Continued analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14, 219–234.

Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students' NAEP math performance. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000). Impact of accommodation strategies on English language learners' test performance. Educational Measurement: Issues and Practice, 19(3), 16–26.

Abedi, J., Lord, C., & Plummer, J. R. (1997). Final report of language background as a variable in NAEP mathematics performance (CSE Tech. Rep. No. 429). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Cattell, R. B., & Burdsal, C. A. (1975). The radial parcel double factoring design: A solution to the item-vs.-parcel controversy. Multivariate Behavioral Research, 10, 165–179.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability of scores and profiles. New York: Wiley.

Duran, R. P. (1989). Assessment and instruction of at-risk Hispanic students. Exceptional Children, 56, 154–158.

Garcia, G. E. (1991). Factors influencing the English reading test performance of Spanish-speaking Hispanic children. Reading Research Quarterly, 26, 371–391.

Hakuta, K., & Beatty, A. (Eds.). (2000). Testing English language learners in U.S. schools. Washington, DC: National Academy Press.

Linn, R. L. (1995). Assessment-based reform: Challenges to educational measurement. Princeton, NJ: Educational Testing Service.

Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Mazzeo, J., Carlson, J. E., Voelkl, K. E., & Lutkus, A. D. (2000). Increasing the participation of special needs students in NAEP: A report on 1996 NAEP research activities. Washington, DC: National Center for Education Statistics.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

Mestre, J. P. (1988). The role of language comprehension in mathematics and problem solving. In R. R. Cocking & J. P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 201–220). Hillsdale, NJ: Lawrence Erlbaum Associates.

National Clearinghouse for English Language Acquisition and Language Instruction Educational Programs. (2002). Survey of the states' limited English proficient students and available educational programs and services. Washington, DC: Author.

Navarrette, C., & Gustke, C. (1996). A guide to performance assessment for linguistically diverse students. Albuquerque: New Mexico Highlands University.

No Child Left Behind Act of 2001, Pub. L. No. 107–110, 115 Stat. 1425 (2002).

Salvia, J., & Ysseldyke, J. E. (1998). Assessment (7th ed.). Boston: Houghton Mifflin.

Shavelson, R., & Webb, N. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

Zehler, A. M., Hopstock, P. J., Fleischman, H. L., & Greniuk, C. (1994). An examination of assessment of limited English proficient students (Special Issues Analysis Center Task Order D070 Report). Arlington, VA: Development Associates.
