The Golden Rule Bias Reduction Principle: A Practical Reform



John Weiss
National Center for Fair & Open Testing (FairTest)

In this article, the Executive Director of FairTest, an organization that has been prominent in promoting Golden Rule-type procedures, explains what he means by the “Golden Rule bias reduction principle” and why he advocates it as a practical reform to help ensure fairer tests.

Summer 1987

Just over a decade ago, Educational Testing Service researchers Donald Medley and Thomas Quirk (1974) examined ETS’s National Teacher Examination (NTE). Their goal was to determine whether the NTE unfairly discriminated against black test-takers. They undertook this research after noticing that

A reading of the items on a typical form of the [NTE] would have revealed little awareness on the part of the test’s constructors of the fact that this country contains a large minority of black citizens represented by many fine writers, artists and musicians. (Medley & Quirk, 1974, p. 235)

Professors Medley and Quirk had ETS’s test developers prepare an experimental section of the NTE that was administered to NTE test-takers as a nonscored section. Test-takers did not know that this section would not be scored. One third of the items on this experimental section

met the usual specifications as to content; one-third met the same specifications but reflected black culture rather than traditional culture represented by the first group; and one-third reflected modern (non-black) culture rather than traditional. . . . The “black” and “modern” items were not esoteric in the sense that they called for knowledge peculiar to one group; rather, the black and modern items dealt with material any reasonably well-informed person might be expected to know, but which happened to reflect the experiences, accomplishments, and concerns of black citizens (in the one case) and of citizens who lived in the last few decades (in the other). . . . Each experimental item underwent the normal procedures of the construction and review maintained for NTE items. (Medley & Quirk, 1974, pp. 236-237)

Medley and Quirk then examined the results of black and white NTE test-takers who took their experimental test, controlling for sex and urban/rural differences between test-takers. They found that the average black test-taker scored 4.4 points higher than his or her white counterpart on the “black” items and 13.4 points lower on the “traditional” items.

John Weiss is Executive Director of the National Center for Fair & Open Testing (FairTest), P.O. Box 1272, Harvard Square Station, Cambridge, MA 02238. He helped Congressman Michael J. Harrington draft the first federal Truth-in-Testing legislation in 1977.

This article is adapted from a paper presented at the April 1987 annual meeting of the National Council on Measurement in Education, Washington, DC.

The two ETS researchers then concluded:

Clearly, whatever else this test may measure, it has the potential of measuring a candidate’s racial background with considerable accuracy. . . . The discovery that an unsuspected factor, such as whether a personality referred to in a test is black or white, may have more to do with candidates’ scores than manifest content . . . is disturbing. (Medley & Quirk, 1974, p. 244)

Evidence recently made public through court cases in Alabama (Allen v. Alabama Board of Education, 1985), California (Larry P. v. Riles, 1986), Illinois (Golden Rule Insurance Company v. Washburn, 1984), New York (United States of America v. Nassau County, NY, 1986), and Texas (LULAC, GI Forum, and NAACP v. Texas, 1985) demonstrates that some testing companies, and the state agencies that authorize their activities, have not taken sufficient care to eliminate biased items from their exams.

An examination of university admission tests made public through New York’s 1979 Truth-in-Testing law also clearly documents that many exams either require knowledge of the activities and vocabulary of upper middle-class Americans or have a distractor answer that often fools minority and/or female test-takers (White, 1985a, 1985b). For example, recent Scholastic Aptitude Tests disclosed because of Truth-in-Testing expected students to be familiar with polo, golfing, tennis, pirouettes, minuets, property taxes, violins, melodeons, tympanists, and horseback riding. (Examples of such culturally loaded questions are available from FairTest.)

Students who lack this kind of culturally specific knowledge cannot obtain the high SAT score needed to enter most of America’s selective colleges and receive financial aid awards from many private foundations and government agencies (Rosser, 1987). For example, examine the following SAT item:

RUNNER:MARATHON
(A) envoy:embassy
(B) martyr:massacre
(C) oarsman:regatta
(D) referee:tournament
(E) horse:stable
(cited in Donlon, 1981-82, p. 20)

Fifty-three percent of the whites but just 22% of the blacks gave the wanted answer (C) (Donlon, 1982). Clearly this item does not measure students’ “aptitude” or logical reasoning ability, but instead measures knowledge of an upper middle-class recreational activity.

Safeguards need to be established to ensure that standardized tests measure relevant knowledge differences between test-takers and not irrelevant, culturally specific factors. One practical procedure currently available to make tests as fair as possible is the Golden Rule procedure. This is an objective technique, based on a November 1984 out-of-court agreement between the Educational Testing Service, the State of Illinois, and the Golden Rule Insurance Company (Golden Rule, 1984). The agreement settled a lawsuit charging that ETS’s Illinois Insurance Agent Licensing Exam unfairly discriminated against blacks and was not job-related.

Under Golden Rule, the same content areas will be covered as on previous tests and the exam will be of the same overall level of difficulty. The only difference is that within groups of equally valid items in the same content areas, test publishers must select those items that display the smallest differences between the correct answer rates of minority and majority test-takers. As Emory University Professor Martin Shapiro told the New York Times, “Once you have this method, not to use it is to knowingly use a more discriminatory procedure” (“Test Service,” 1984).
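The selection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure written into the Illinois agreement: the `Item` fields, the `validity_band` label used to mark items as "equally valid," and every item statistic below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    content_area: str
    validity_band: str   # hypothetical label: items sharing it are treated as equally valid
    p_majority: float    # proportion of majority test-takers answering correctly
    p_minority: float    # proportion of minority test-takers answering correctly

    @property
    def gap(self) -> float:
        """Absolute difference in correct-answer rates between the two groups."""
        return abs(self.p_majority - self.p_minority)


def select_items(pool, needed):
    """Pick the required number of items per (content area, validity band),
    taking the items with the smallest group gap first."""
    groups = defaultdict(list)
    for item in pool:
        groups[(item.content_area, item.validity_band)].append(item)

    chosen = []
    for key, count in needed.items():
        # Sort by gap; item_id breaks ties so the result is deterministic.
        ranked = sorted(groups[key], key=lambda it: (it.gap, it.item_id))
        chosen.extend(ranked[:count])
    return chosen


# Hypothetical pretest statistics, invented for illustration only.
pool = [
    Item("A1", "analogies", "mid", 0.53, 0.22),
    Item("A2", "analogies", "mid", 0.60, 0.55),
    Item("A3", "analogies", "mid", 0.58, 0.50),
    Item("R1", "reading", "mid", 0.70, 0.40),
    Item("R2", "reading", "mid", 0.72, 0.45),
]
needed = {("analogies", "mid"): 2, ("reading", "mid"): 1}

chosen = select_items(pool, needed)
print([item.item_id for item in chosen])
```

In this toy pool the content coverage is fixed by `needed`, so the rule only changes which items fill each slot. The reading slot is still filled even though both reading items show a large gap.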

To understand why the Golden Rule reform is needed, it is necessary to examine how test-makers currently construct exams, including teacher licensing tests. For each content area, test publishers develop a pool of potential questions. They then pretest these questions on a sample group of test-takers. Next, test publishers discard those items that they believe are ambiguous, biased, or otherwise flawed. From the remaining pool of items they employ a statistical technique, based on point-biserial correlations, to select for the final test those pretested questions that maximize the differences between test-takers. Questions that maximize differences between high- and low-scoring students may really be measuring test-takers’ knowledge of irrelevant, culturally specific information. By using such items, tests may discriminate against otherwise qualified individuals.
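For readers unfamiliar with the statistic, a point-biserial correlation relates a right/wrong item response to a continuous score such as the total test score. The sketch below uses the standard textbook formula; the article does not say exactly how publishers apply it (for example, whether the item itself is excluded from the total), and the response data are hypothetical.

```python
import math
import statistics


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores.

    r_pb = (M_right - M_wrong) / s * sqrt(p * (1 - p)), where s is the
    population standard deviation of the total scores and p is the
    proportion of test-takers who answered the item correctly.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    sd = statistics.pstdev(total_scores)
    if sd == 0:
        raise ValueError("total scores have no variance")
    p = len(right) / len(total_scores)
    return (statistics.mean(right) - statistics.mean(wrong)) / sd * math.sqrt(p * (1 - p))


# Hypothetical pretest data: six test-takers, two candidate items.
totals = [9, 8, 7, 5, 4, 3]
item_x = [1, 1, 1, 0, 0, 0]   # answered correctly only by the high scorers
item_y = [1, 0, 1, 0, 1, 0]   # answered correctly across the score range

r_x = point_biserial(item_x, totals)
r_y = point_biserial(item_y, totals)
print(round(r_x, 3), round(r_y, 3))
```

An item like `item_x` separates high and low scorers sharply and so earns the higher coefficient. The statistic alone does not reveal whether that separation reflects relevant knowledge or something culturally specific.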

The Golden Rule reform makes exams fairer, not easier. Nothing in Golden Rule changes the content specification of future exams. Nothing alters the ability of test administrators to set cut scores or passing rates. If group performances differ widely on all the items in a given content area, these questions will still be used on exams assembled under Golden Rule.

Educational Measurement: Issues and Practice


Setting the Record Straight

In recent months, several individuals have misinterpreted FairTest’s position on Golden Rule. To set the record straight, FairTest does not believe, as ETS President Greg Anrig asserts, “that group differences on test questions primarily are caused by ‘bias’ ” (Anrig, 1987; Weiss, 1987). Rather, we recognize that group score differences reflect a host of causes, including genuine knowledge differences, test-taking abilities, and the inclusion on tests of irrelevant and biased questions. The purpose of the Golden Rule reform is to help ensure that biased test questions are removed from exams.

Application of the Golden Rule settlement to ETS’s Illinois Insurance Licensing Test has had just this impact. According to data revealed to the Advisory Committee established to monitor the agreement’s implementation, the gap between the average scores of black and white test-takers closed by 25% under the first Golden Rule assembled forms (Shapiro, 1986). If many minorities score just below the passing cutoff, a 25% reduction in the disparate impact of the exam will lead to a substantial increase in the number of minority candidates who pass, without changing the exam’s validity or making it easier.
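The arithmetic behind that last claim can be illustrated with a small Python sketch. Every number below is hypothetical and does not reproduce the Illinois data. The sketch also assumes that scores are normally distributed, that both groups share one standard deviation, and that the gap narrows because the minority mean rises while the cutoff stays fixed.

```python
from statistics import NormalDist


def pass_rate(mean, sd, cutoff):
    """Share of a normally distributed group scoring above the cutoff."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)


# Hypothetical figures chosen only to show the mechanism.
majority_mean = 75.0
minority_mean = 65.0
sd = 10.0
cutoff = 70.0

gap_before = majority_mean - minority_mean
gap_after = 0.75 * gap_before                  # gap closed by 25%
new_minority_mean = majority_mean - gap_after  # assumes the minority mean rises

before = pass_rate(minority_mean, sd, cutoff)
after = pass_rate(new_minority_mean, sd, cutoff)
print(f"minority pass rate: {before:.1%} -> {after:.1%}")
```

Because the cutoff in this example sits close to the minority mean, where scores are densest, a modest shift in the mean moves a comparatively large share of candidates across it.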

Another misinterpretation of FairTest’s position on Golden Rule, made recently by University of Illinois Professor Robert Linn and National Evaluation System’s Attorney Michael Rebell, is that we want to simply extend the out-of-court settlement reached in Illinois (Linn & Drasgow, 1987; Rebell, 1986). FairTest has always recognized that the Illinois settlement has numerous special provisions that were either incorporated into the settlement to appease one of the parties in the case or are appropriate only for an insurance licensing exam. FairTest has only endorsed the Golden Rule principle, which is that “among questions of equal difficulty and validity in each content area, questions which display the least differences in passing rates between majority and minority test-takers should be used first” (National Center for Fair & Open Testing, 1987, p. 1).


Standardized multiple-choice exams have become our nation’s cradle-to-grave arbiter of social mobility. These exams are far too important to be left to the sole control of those who profit from their sales. The Golden Rule Bias Reduction Principle is a modest, practical proposal that will help ensure that the 40 million standardized multiple-choice tests annually administered to America’s students and job applicants are fair.

References

Allen v. Alabama Board of Education, 612 F. Supp. 1046 (M.D. Ala. 1985).

Anrig, G. (1987, January). Golden Rule: Second thoughts. APA Monitor, p. 3.

Donlon, T. (1981-1982). The SAT in a diverse society: Fairness and sensitivity. College Board News, 1-72, 16-21.

Golden Rule Insurance Company v. Washburn, 419-76, Illinois Circuit Court, 7th Jud. Cir. Ct. (1984). Consent Decree.

Larry P. v. Riles. Order modifying judgment (Sept. 25, 1986), C-71-2270, U.S. District Court, N. Calif.

Linn, R., & Drasgow, F. (1987, January). Implications of the Golden Rule Agreement. The Score (APA Newsletter), p. 4.

LULAC, GI Forum, NAACP v. State of Texas, 5th Circuit Court, Texas (1985, August 27). Memorandum decision in the U.S. District Court for Eastern Texas, Tyler Division, Case No. 85-2579.

Medley, D., & Quirk, T. (1974). The application of a factorial design to the study of cultural-bias in general culture items on the National Teacher Examination. Journal of Educational Measurement, 11, 235-245.

National Center for Fair & Open Testing. (1987). A Golden Rule bias reduction procedure sourcebook (2nd ed.). Cambridge, MA: FairTest.

Rebell, M. (1986). Disparate impact of teacher competency testing on minorities: Don’t blame the test-takers or the tests. Yale Law & Policy Review, 4, 375-403.

Rosser, P. (1987). Sex bias in college admissions tests: Why women lose out. Cambridge, MA: National Center for Fair & Open Testing.

Shapiro, M. (1986, December). Testing reform initiatives: 1987 & beyond. In J. Weiss (Chair), Strategies for testing reform: 1987 & beyond. Symposium conducted by the National Center for Fair & Open Testing, Washington, DC.

Test service accepts safeguards against bias. (1984, November 29). The New York Times, p. B17.

United States of America v. Nassau County, Civil Action No. 77 Civ. 1881 (FXA), 1986.

Weiss, J. (1987, April). Golden Rule: A response to Anrig. APA Monitor, p. 4.

White, D. (1985a). The GMAT exposed. Berkeley, CA: Testing for the Public.

White, D. (1985b). The LSAT exposed. Berkeley, CA: Testing for the Public.

Buros Institute Developments

The Buros Institute of Mental Measurements has scheduled its fifth annual Buros-Nebraska Symposium on Tests and Measurements for October 1 and 2, 1987 at the University of Nebraska-Lincoln. The theme for the symposium is “Assessment of the Teaching Function.” Jim Popham is the keynote speaker, and Edward Haertel, George Madaus, and Donald Medley are also scheduled to make presentations, with others to be added soon. Institute symposia always provide plenty of opportunity for small-group interaction with the speakers, and the program promises to be both comprehensive and thought-provoking. To receive additional information on the symposium as it becomes available, write to the Buros Institute, 135 Bancroft, University of Nebraska, Lincoln, NE 68588-0348.

The Buros Institute is also sponsoring a new journal, Applied Measurement in Education, to be published by Lawrence Erlbaum Associates. Publication of the first issue is expected for early 1988. Manuscripts should be submitted to the Buros Institute at the above address. We hope that all members of the measurement community will show active interest in the new journal through subscriptions and submission of manuscripts.
