
ACT Compass: Internet Version Reference Manual


©2012 by ACT, Inc. All rights reserved. NOTE: This document is protected by Federal copyright laws. It is available for user institutions to download for use with the COMPASS system. Reproduction of any of the COMPASS Reference Manual materials for other purposes is strictly prohibited without the express, written permission of ACT, Inc.


Introduction to the reference manual

This introduction describes how the newly revised (September 2012) COMPASS Reference Manual is organized, how COMPASS institutions can best use it, and where to find COMPASS support and other resources. Comprising 18 chapters of helpful information, the COMPASS Reference Manual may be accessed at this site or downloaded in its entirety or in part as needed. The manual is divided into three parts, each of which has its own table of contents:

• Part 1, How to use COMPASS (chapters 1–3)

• Part 2, COMPASS test content (chapters 1–8)

• Part 3, COMPASS technical information (chapters 1–7)

Part 1, How to use COMPASS. Part 1 focuses on information that may be most helpful for administrators and testing staff at colleges and universities. This part covers recommended placement and diagnostic models for COMPASS, supplemental and parallel uses, high school outreach approaches, early intervention and enrollment advising, progress and exit testing, and how to connect students to campus resources. Part 1 also provides important information on best practices for COMPASS test administrations, test security, test policies, and considerations for accommodating students with special testing needs. The chapters in Part 1 are as follows:

• Chapter 1: “Recommended uses of COMPASS”

• Chapter 2: “Models for using COMPASS”

• Chapter 3: “COMPASS test administration, security, and best practices”

Part 2, COMPASS test content. Part 2 includes chapters that describe the specific skills and subskills associated with each COMPASS/ESL test. Part 2 would be useful for faculty review so that those individuals who are closest to curriculum and instruction can better understand the content focus of the COMPASS Reading, Mathematics, and Writing Skills placement tests; direct writing assessments (i.e., COMPASS e-Write); and the English as a Second Language (ESL) proficiency tests. In addition, Part 2 provides information on how to create test packages that include all required or desired testing components in a single administration. It also explains how to set cutoff scores and how to interpret placement and diagnostic scores. The chapters in Part 2 are as follows:

• Chapter 1: “Reading tests”

• Chapter 2: “Mathematics tests”

• Chapter 3: “Writing Skills tests”

• Chapter 4: “English as a Second Language tests”

• Chapter 5: “Direct writing assessment”


• Chapter 6: “Creating test packages”

• Chapter 7: “Cutoff scores”

• Chapter 8: “Interpretation of scores”

Part 3, COMPASS technical information. Part 3 is intended for institutional research staff, administrators who wish to confirm the psychometric rigor of the exams, and others who are interested in all of the technical aspects of COMPASS assessments. Part 3 discusses test reliability, validity, and item and test development, as well as details regarding item calibrations and the adaptive testing model. The chapters in Part 3 are as follows:

• Chapter 1: “Technical characteristics of COMPASS tests”

• Chapter 2: “Validating uses of COMPASS tests”

• Chapter 3: “Development of COMPASS tests”

• Chapter 4: “Development of ESL tests”

• Chapter 5: “Development of COMPASS e-Write and ESL e-Write”

• Chapter 6: “Calibration of test items”

• Chapter 7: “Adaptive testing”

Note on tables, figures

Data tables included in the revised COMPASS Reference Manual are numbered according to the part, chapter, and order of appearance of the table in the chapter. For example, Table (3)2.2 is found in Part 3, Chapter 2, and is the second table. Figures are similarly numbered, such as Figure (2)6.4, which is found in Part 2, Chapter 6, and is the fourth figure.


Resources: Information for sites

ACT provides additional detailed user guides and resources to support COMPASS users at the following location: http://www.act.org/compass/resources.html. Go to this ACT website to access the following materials.

COMPASS “Getting Started” user guides

• Becoming a COMPASS Site

• Previewing COMPASS

• Licensing COMPASS

Administrative tasks

• Installing COMPASS

• Managing Your Account

• Administering a COMPASS Test

• Generating Reports

• Using the Remote Test Network

• Moving Single Student Record Data from COMPASS to Microsoft® Excel®

• Additional Available Reports

Test package setup and features

• Setting Up a Test Package

• Setting Up Local Measures

• Setting Up Bulletin Boards

• Setting Up Majors and Major Groups

• Setting Up Local Demographics

• Setting Up Campus Resources

• Setting Up Transfer Institutions

• Setting Up High Schools

COMPASS tutorials and reference materials

• How to Generate Reports Using COMPASS

• COMPASS Guide to Effective Student Placement and Retention in Language Arts


• COMPASS Guide to Effective Student Placement and Retention in Mathematics

• COMPASS Guide to Successful ESL Course Placement

• COMPASS Course Placement Service Interpretive Guide

• COMPASS Guide to Successful High School Outreach

• Guide to Enhancing Developmental Education with COMPASS Diagnostics

• Answers to Frequently Asked Questions about COMPASS e-Write & ESL e-Write

• Concordant ACT®, COMPASS®, and ASSET® Scores

• What Are ACT’s College Readiness Benchmarks?

COMPASS case studies

• Moraine Valley Community College and Richland College—ESL proficiency

• Wright State University—Remote testing

COMPASS support services

If you require other COMPASS support services, please contact the appropriate resource listed in the table below.

For this service: Ordering or billing status inquiries
Please contact: ACT Customer Service, 8:30 am−5:00 pm CST, M−F, 800.645.1992 (follow the prompts), [email protected]

For this service: Help with installation, launch, browser, and technical issues
Please contact: ACT Help Desk, 24 hours a day/7 days a week, 800.645.1992 (follow the prompts), [email protected]

For this service: Participating in or requesting data from ACT's research services (Entering Student Descriptive Report, Returning Student Retention Report, Course Placement Service, etc.)
Please contact: ACT Research Services, 8:30 am−5:00 pm CST, M−F, 800.645.1992 (follow the prompts), [email protected]

For this service: Ordering units; changes to site status or site information; license renewal or cancellation; having your remote test center listed in the directory
Please contact: ACT Customer Service, 8:30 am−5:00 pm EST, M−F, 800.645.1992 (follow the prompts), [email protected]

For this service: Implementation ideas, training, questions about operational features, or content-related issues
Please contact: Your ACT Regional Office, www.act.org/contacts/field.html


PART 1: HOW TO USE COMPASS

Chapter 1: Recommended uses of COMPASS
  Overview
  Placement testing
  Diagnostic testing
  Supplemental and parallel uses
  Educational progress and exit testing
  Research services: Entry-to-exit tracking and reporting
  Early intervention and enrollment advising

Chapter 2: Models for using COMPASS
  Overview
  Models for COMPASS setup
  Next steps: Connect students to campus resources
  Models for using ESL component

Chapter 3: COMPASS test administration, security, and best practices
  Overview
  Test administration policies explained
  Test security requirements
  Standard security procedures
  Test-retest policy
  Administering COMPASS to students with disabilities
  Best practices for the COMPASS test administrator


Chapter 1: Recommended uses of COMPASS

Overview

The ACT Computer-Adaptive Placement Assessment and Support System (COMPASS®), which includes English as a Second Language (ESL) testing modules, is a comprehensive assessment with advising, retention, and outcomes-oriented services. Postsecondary institutions use COMPASS for course placement and diagnostic testing. It is a flexible, integrated, and efficient system that provides a high degree of assessment accuracy in a short time. The COMPASS placement and diagnostic tests assess student ability in reading, mathematics, and writing skills; the ESL component tests non-native-English speakers in the areas of reading, listening, and grammar/usage. These multiple-choice tests are computer-adaptive and untimed, and they can be used in combination or as stand-alone assessments. The system also offers a direct writing assessment, COMPASS e-Write (or ESL e-Write for non-native-English speakers), that is evaluated instantly through an automated scoring system.

Using COMPASS expands opportunities for students and increases the likelihood that entering students will stay in school and succeed academically. The COMPASS system also has the capacity to gather information about each student’s plans, academic skills, and educational needs, and then to organize and communicate this information for faculty to use to appropriately place and advise students.

Generate results immediately

Postsecondary institutions can administer a COMPASS test to a student on demand and then provide the student with immediate results. The COMPASS system generates a Comprehensive Course Placement and Diagnostic Advising Report and a Student Advising Report, both of which a college can customize with campus-specific support and suggestions for the student, such as course-placement recommendations and “Where to Find Help” messages. The COMPASS system’s query capabilities allow authorized personnel to search for and quickly identify students with particular characteristics, plans, or needs, such as requests for help with study skills. The system also enables the efficient transmission of copies of any or all COMPASS information (through instant or batch approaches) to other campus database and analysis systems, including the campus student information system.
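To make the query and transmission capabilities above more concrete, the short sketch below filters a hypothetical exported record file for students who asked for help with study skills. It is illustrative only: the file name and column names are assumptions, not an actual COMPASS export layout, and an institution's real exports are defined by its own COMPASS setup and student information system.

```python
# Hypothetical sketch: filter an exported student-record file for follow-up.
# The file name and column names ("student_id", "needs_study_skills_help")
# are illustrative only, not an actual COMPASS export layout.
import csv


def students_requesting_study_help(export_path: str) -> list[str]:
    """Return the IDs of students who indicated they want help with study skills."""
    matches = []
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("needs_study_skills_help", "").strip().lower() == "yes":
                matches.append(row["student_id"])
    return matches


if __name__ == "__main__":
    for student_id in students_requesting_study_help("compass_export.csv"):
        print(student_id)
```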

The COMPASS system stores all student responses and test results, which an institution can use at any time to create various individual reports, lists, and summary reports within the software. COMPASS can create extensive summary reports describing student success, retention, and outcomes-oriented research at the subgroup, college, system, state, and national levels. These summaries span student experiences, including those of ESL students, from recruitment to entry to the college and then through courses in the first and second terms, and on to the point of exit from the college.

As a COMPASS user, each institution is entitled to receive the significant campus-based training and support activities that ACT offers. ACT will assist each college with plans to implement COMPASS and will provide technical support, research services, and research planning. Contact information for COMPASS support services is found in Part 1, “Introduction to the reference manual,” in this Reference Manual.

Placement testing

COMPASS has a maximum of eight possible placement scores: one each in Reading, Writing Skills, and COMPASS e-Write, and up to five scores in Mathematics, specifically, Numerical Skills/Prealgebra, Algebra, College Algebra, Trigonometry, and Geometry. On the basis of the standard COMPASS placement measures, an institution can determine whether a student is ready for college-credit courses or would benefit from starting with one or more developmental or preparation courses. The COMPASS system also allows the incorporation of scores from local placement measures.

To assess the learning needs of students whose first language is not English, COMPASS can measure these students’ skills in ESL Reading, ESL Grammar/Usage, ESL Listening, and ESL e-Write. These proficiency tests measure student ability from near-beginner to near-native-speaker levels. Each test can be used separately or combined with various COMPASS Mathematics tests as needed.

Educational Planning Form

To complement the information obtained by the placement assessments, the COMPASS system also provides the Educational Planning Form. This form is designed to collect more information about each student, such as his or her educational background, needs, plans, and goals. COMPASS provides a menu of preset demographic questions from which institutions can choose as they construct their own questionnaires. Institutions can also create up to 40 locally constructed information-gathering items (30 multiple-choice and 10 numerical-entry items). The Educational Planning Form can help college faculty to improve comprehensive advising, support student retention, and plan for student transfers.


Diagnostic testing

The COMPASS system offers 26 diagnostic measures in four areas: numerical skills/prealgebra, algebra, reading, and writing. A student’s performance on the diagnostic measures can provide up to 26 additional levels of information about that student’s abilities in specific subskills. Faculty can use this information to help determine an individual’s strengths and weaknesses in specific subareas of knowledge and skills. The results are printed on the Student Advising Report, which has a place for a message from faculty recommending specific remedial activities. The 26 COMPASS diagnostic measures are detailed in Part 2, Chapters 1, 2, and 3. (NOTE: There are no ESL diagnostics in the COMPASS program.)

Supplemental and parallel uses

For institutions using other placement assessments, COMPASS can be an excellent tool when more information (such as the diagnostic or ESL measures) would be helpful in making decisions about individual students. For example, a college using the ACT Assessment mathematics scale for placing students into mathematics courses could also administer a COMPASS Mathematics diagnostic test to students during their orientation on campus. This diagnostic would provide more current information about an individual’s skills, especially if these skills had bordered on two course recommendations at the time that the student took the ACT Assessment in high school. COMPASS can also supplement a local college placement test or ASSET®, an ACT paper-and-pencil placement test.

The flexibility of the COMPASS system means that it can be used in parallel with other testing approaches. For example, a college that administers tests to groups of students at regularly scheduled times can offer COMPASS as an alternative placement exam for students who are unable to attend the group testing sessions. One or more COMPASS testing stations can be reserved as needed to accommodate the schedules of individuals.

Educational progress and exit testing

Colleges can use COMPASS to assess the progress of students who are taking formal courses or who are engaged in informal learning experiences that parallel the measurement domains in the system. For example, students identified as having developmental needs in reading can be referred to a developmental reading course or tutoring program so they can bring their skills up to the desired level. Later, COMPASS can be administered to these same students a second time to determine whether their reading skills have improved enough for placement into the next curriculum level.


Faculty in an ESL program likewise can administer an ESL measure (such as Grammar/Usage or ESL e-Write) to students at the end of their last term of ESL instruction. Benchmark exit information then can be gathered to assist these students in their transition to college-level courses. ACT Research Services personnel will assist each institution with tracking students and their end-of-term grades to set an exit benchmark that is likely to predict their success in the next level course.

Research services: Entry-to-exit tracking and reporting

The COMPASS system offers four research services that prepare reports containing significant information about an institution’s students. The reports focus on the following: (1) description of entering students, (2) retention of returning students, (3) course-placement outcomes, and (4) follow-up of underprepared students. Another method for tracking and reporting is to use COMPASS in conjunction with ACT’s Collegiate Assessment of Academic Proficiency (CAAP) instrument.

Entering Student Descriptive Report

The Entering Student Descriptive Report describes the characteristics, needs, plans, and skills of students who initiate the “becoming a student” process at a college. The report has options for generating subgroup reports for groups of students of special interest, such as those students interested in pursuing a particular program of study, those from a particular high school, or those students with particular needs. The report is created at the campus, system, state, and national levels.

Returning Student Descriptive Report

The Returning Student Descriptive Report is designed for use in identifying the retention patterns occurring at an institution and the characteristics of the students corresponding to the patterns. The report has options for generating subgroup reports for groups of students of special interest, such as those students interested in pursuing a particular program of study, those from a particular high school, or those students with particular needs. The report is created at the campus, system, state, and national levels.

Course Placement Service Report

The Course Placement Service Report describes and evaluates the outcomes of course-placement practices on a specific campus and provides information to assist in the setting or fine-tuning of course placement advising cut scores. The report also offers an option for developing placement cut-score linkages between pairs of placement measures from different tests.


Use with CAAP tests

The advantage of assessing students with both COMPASS and CAAP tests is that the results of the students’ general education experiences can be documented over an extended time. That is, the longitudinal matching of COMPASS records at entry and CAAP records at exit will document changes that take place in student achievement during the college experience. (To learn more about using CAAP, please go to http://www.act.org/caap/.)

Early intervention and enrollment advising

The COMPASS license agreement allows postsecondary institutions to use the COMPASS assessment and advising services to reach out to students at area high schools, outreach centers, and related sites for the purpose of delivering early intervention and support. Through these services, institutions can help individuals who intend to attend college but who may need some guidance with planning, academic development, “brush-up” efforts, or dual enrollment. These services also can be delivered to individuals not committed to attending an institution of higher education but who may benefit from more information on such opportunities.


Chapter 2: Models for using COMPASS

Overview

COMPASS has extensive capabilities that allow it to serve as an authoring system for use at the campus level to customize both the delivery of placement and diagnostic tests and the reporting of results and recommendations to students and staff. Colleges can apply the COMPASS system according to the model of use most helpful to their needs. These models include the following:

• as a complete, self-contained placement and diagnostic assessment service;

• as a parallel, individualized placement assessment service to provide an alternative set of individualized, untimed measures to supplement the existing use of a group-oriented paper-and-pencil system, such as the ASSET system; and

• as a supplement to existing assessment services, adding information from the COMPASS diagnostics or ESL measures to an existing set of placement assessment information services, such as the ACT Assessment.

To make the design concepts and possible uses of COMPASS more concrete, this chapter first describes three hypothetical settings and how COMPASS is applied within them. It also describes how schools can immediately direct students to appropriate instructional materials when they are taking COMPASS diagnostic tests. The chapter concludes with two hypothetical settings of Intensive English Programs (IEP) and how they use the ESL component of COMPASS. Assistance and information on other ways that colleges can use COMPASS/ESL can be obtained through ACT’s regional offices or various COMPASS customer support services. Please refer to the listing of ACT offices found in Part 1, “Introduction to the reference manual,” in this Reference Manual, or email ACT at [email protected].

Models for COMPASS setup

Setting A: Large college or university

Each spring and summer, the admissions office of a large land-grant university with an open enrollment policy holds regular orientation sessions for new students visiting campus (a similar orientation is held the first three days of the fall session for new students unable to attend these regular orientations). The students are divided into groups of 30 for activities such as placement testing, immediate course placement, and program advising. Students typically arrive with ACT Assessment scores and are placed in courses based on these scores. For students not placed with their ACT scores, the COMPASS battery of tests is administered. Students start their COMPASS testing session as a group but leave individually upon finishing. As they leave, they receive printouts of their score reports to take with them to their assigned advisers (who also have access to the score reports electronically). This results in an immediate and smooth flow of students to their advisers. Test results are automatically and immediately uploaded to the university’s comprehensive campus student information system, which also handles electronic admissions, course registrations, and records. Because the university already has some of the demographic data it needs on the students, the COMPASS system is set to collect information that relates to course placement and program advising. The system collects only the minimum amount of demographic data needed to match the examinee’s testing record to the record in the university’s student information system.

COMPASS, with its “Where to Find Help” messaging capabilities, can direct new students to local services when test results indicate that a student needs extra attention and/or instruction in specific skill areas. The university has three entry-level or developmental courses in mathematics and two entry-level or developmental courses in writing. These courses lead to College Algebra and Freshman English, respectively, the initial credit-level courses in mathematics and writing. Two levels of developmental reading also are offered. To facilitate the academic success of the students, the university takes care to place students in the appropriate developmental courses when their skills indicate such a need.

Setting B: Large college

A college in a suburban area near a large city has a sizable number of underprepared and non-native-English-speaking students, as well as students ready for direct entry into Calculus I. To serve the underprepared component of the student population, the college has developed an extensive program of developmental courses in writing skills and reading. The institution also has developed an outreach program to work with local high schools to encourage students to further their education. Each spring the college invites high school juniors (for senior year high school course planning and/or dual enrollment) and high school seniors (for college course placement advising) to take COMPASS tests at the high school or college. New students who come to campus during the summer months, including adults who have been out of school for several years, complete the COMPASS assessment and, if needed, the ESL modules.

This college has a networked instructional computer facility to use for COMPASS testing and a computerized student information system to keep track of each student’s admission, assessment, enrollment, and financial records. The advising staff is adequate in size; the assessment center staff conducts the placement testing.


Design testing packages

The college’s director of advising and testing has designed a series of testing packages to meet specific campus needs. These include the following:

• a test package of Reading, Writing Skills, and Mathematics measures for new students whose first language is English;

• a separate ESL assessment package of comparable measures (Reading, Grammar/Usage, and Mathematics) for non-native-English-speaking students;

• separate COMPASS test packages for placement testing in the three curriculum areas individually; and

• test packages for diagnostic testing in Mathematics, Writing Skills, and Reading.

The college collects full demographic information from first-time examinees but collects only identifying information from students who are taking the diagnostic tests. Reports with campus-specific course placement and support services information are printed following the testing sessions, allowing advising to occur immediately. Some instructors of developmental courses use diagnostic scores at the beginning of their courses to identify specific instructional needs and instructor recommendations on each student’s score report. The students in the developmental courses must return to the testing center at the end of those courses to be reassessed by COMPASS to determine if they have improved enough to enter the standard-level courses. Proficiency-estimation scores are obtained both on initial testing and at retesting to maximize the information available about each student’s proficiency.

Setting C: Small college

At a small college, many of the students hold full- or part-time jobs and are taking courses for later transfer to a university or for job advancement or self-improvement. The college caters to these types of students by emphasizing flexible scheduling and individualized programs. For most of its testing needs, the college uses paper-and-pencil placement assessment services (that is, the ASSET program, an ACT paper-and-pencil assessment with scoring and reporting software). College staff favor the flexible testing model of COMPASS, which produces test results that can be used along with the ASSET services, particularly at times when no ASSET session is scheduled. The college also implements services to help students brush up on skills that they have lost. Thus, rather than having to take a complete developmental course, these students can focus on relearning and reviewing specific skills.

The college uses three stand-alone computers in the assessment center to administer COMPASS tests to students who walk in seeking immediate assistance in course selection. The students discuss their needs with an adviser and are given COMPASS tests appropriate to their needs. COMPASS allows administrators to configure an individualized test package for each student. That is, some students may need the entire COMPASS assessment; others may need only one of the tests. If, for example, a student has previous college credits in reading and writing courses but needs help selecting a mathematics course, only the COMPASS Mathematics component is administered. Examinees who indicate that they have difficulty reading course materials are given the Reading component only.

Students whose performance falls just below a cutoff score in a placement test or tests will be given the related COMPASS diagnostic test to help identify the specific skill areas that need improvement. To assist students in overcoming specific difficulties, the college recommends resources such as helpful chapters in textbooks, units to review on-line, or campus tutoring services. After students have made use of recommended resources, they are given the option to be retested to see if they are ready to advance to the next course. During the semester, instructors at this small college may send some of their students to the assessment center for diagnostic testing as well.

Next steps: Connect students to campus resources

With the COMPASS system, a college can provide a set of individualized messages to students based on their performance on the tests. The messages can describe the “next steps” the students need to take, whether recommended or required, for their academic success. These messages relate to all of the placement and diagnostic tests within the COMPASS system and can also relate to up to 10 sets of local campus tests. For students whose scores demonstrate a need for remediation, the next steps may focus on “what to do,” “what to study,” and “where to get needed help” on the campus (such as from a learning center, tutoring service, or Internet resources).

For students whose scores demonstrate an acceptable level of competence, the next steps may focus on which course to take. Messages printed on the COMPASS Individual Student Report can be up to eight lines long for each placement or diagnostic test taken. For each test, the college can specify several different levels of scores (such as 0 to 39, 40 to 69, and 70 to 99), then develop a different message tailored to students in each score range.
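As an illustration of the score-range idea described above (not part of the COMPASS software itself), the sketch below pairs each score range with a tailored "next steps" message and looks up the message for a given score. The ranges follow the example in the text; the message wording is invented for the example.

```python
# Illustrative only: map placement-score ranges to tailored "next steps" messages.
# The ranges follow the example in the text (0-39, 40-69, 70-99); the wording is
# invented here, not ACT-supplied content. Colleges define their own ranges and
# messages within the COMPASS system.
SCORE_RANGE_MESSAGES = [
    ((0, 39), "Please see the Learning Center about the developmental course sequence."),
    ((40, 69), "Review the recommended materials, then consider retesting before the term begins."),
    ((70, 99), "You are ready for the standard-level course; see your adviser to register."),
]


def next_steps_message(score: int) -> str:
    """Return the message for the range containing the score (empty string if none matches)."""
    for (low, high), message in SCORE_RANGE_MESSAGES:
        if low <= score <= high:
            return message
    return ""


print(next_steps_message(55))  # prints the message for the 40-69 range
```

The point of the sketch is simply that each score range maps to one tailored message; within COMPASS, a college supplies its own ranges and message text (up to eight lines per test).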

Score range group examples

Examples A and B, below, show how students are placed in score range groups. In Example A, students are divided into two groups according to their COMPASS placement tests:

Example A, Group 1: Students score several points below the cutoff score in a placement test and are informed that they may benefit from a review of relevant sections of selected math, writing, or reading resources before the next term begins. After such a review, they will have the option of retesting on the placement test to see whether their skills have improved enough for them to qualify for the next course.

Example A, Group 2: Students score just above a placement cutoff score and are informed that they may benefit from a review of relevant sections of materials recommended by the faculty before the term begins or during the first few weeks of a course. This extra review may enable them to have a faster start and possibly greater academic success in the course.

In Example B, students taking COMPASS Mathematics, Writing Skills, or Reading diagnostic tests are divided into three groups:

Example B, Group 1: Students score very low on a specific diagnostic test and are informed that they need to be instructed thoroughly on the topics in this area and allowed to review extensively.

Example B, Group 2: Students score in the middle levels of a specific diagnostic test, which shows that they understand some of the concepts in this academic area. They are informed that they need a strong review of other topics in this area.

Example B, Group 3: Students score at the upper level of a specific diagnostic test, which shows that they understand most or all of the concepts. They are informed that they still may benefit from a referral to additional resources for review of the topics in that area.

After a student has completed all of the recommended work associated with the needs identified by the score range group message, the student can take the appropriate COMPASS diagnostics and then be retested on the relevant COMPASS placement test or tests, certifying readiness for the next course.

Models for using ESL component

Setting D: University IEP

An Intensive English Program (IEP) affiliated with a large university has an average of 800 non-native-English-speaking students enrolled in a full-time program to develop their English language skills. These students attend IEP classes to prepare for admission to the university. The IEP organizes its courses into four levels of difficulty. Until a few years ago, the courses were divided into separate skill areas as well: reading, writing, listening, speaking, and grammar. Now the courses have been integrated somewhat to yield skill areas of reading and writing together and speaking and listening together; the grammar skills area remains separate.


To accurately place students into the appropriate level within the IEP, the director of advising and testing requires entering students to take three ESL modules of the COMPASS test (ESL Grammar/Usage, ESL Reading, and ESL Listening). The proficiency descriptors for the four levels in each module allow the faculty to correlate the courses in the curriculum with the ESL test level scores. The IEP Reading/Writing faculty also uses an in-house writing assessment for ESL students. The score from this assessment and the three ESL proficiency scores are combined on the score report, enabling advisers to make appropriate IEP placement decisions. The ESL modules are also administered at the end of the IEP courses to help determine a student’s readiness to move on to the next IEP course or to retake the university’s entrance test.

The demographic information gathered by the COMPASS system helps the director of advising and testing identify trends of student performance and retention. Certain characteristics can be selected such as age, skill level, educational background, native language, and so on. The IEP faculty chose to add a specific question about foreign language study to the demographic section of the test. They had noted that ESL students who had studied any foreign language (not necessarily English) for more than three years tended to make more rapid progress than did ESL students without this previous study. Thus, the director of advising and testing now uses the elicited demographic information about previous foreign language study whenever a particular student’s scores fall into a decision zone for placement.

Setting E: Private IEP

A private IEP in a major city offers ESL students a combination of group and individual instruction. Students enroll for as many four-week noncredit sessions as desired. The average enrollment is 100 students per session. Some students want to study English because they plan to enroll in U.S. universities and need to improve their language skills to achieve the necessary score on an entrance test. Others have immigrated to the United States and must improve their language skills to function better on the job and in the English-speaking environment in general.

Individual instruction is geared to the specific needs of the student; group instruction is offered depending on the number of students interested in a particular area. For example, a listening skills course will be offered when the number of students interested in improving their listening comprehension is high enough to warrant academic lectures. The IEP administers the ESL Listening Test to these students to determine their skill level and then offers suitable group courses. The IEP has five computers for student use. One of these computers is equipped with the COMPASS software as a stand-alone unit so that ESL modules can be administered as needed to students for a nominal fee. In some cases, the results of an ESL test are used to demonstrate progress of the student to various funding sources in the student’s home country.


Chapter 3: COMPASS test administration, security, and best practices

Overview

Test administration staff members at COMPASS user institutions are responsible for ensuring that the COMPASS tests given there are delivered securely and fairly in an appropriate testing environment. Test security and test validity depend on local staff following standard test administration procedures and policies. This chapter describes these required test administration policies and steps as they relate to issues such as confidentiality, standardized procedures, fair testing practices, safety, ACT requirements for the testing facility, setup of the computer stations and seating, and the requirements of the staff who administer the tests. Chapter 3 outlines general procedures for test security—steps that will occur before, during, and after the administration—and what to do if there is a security breach. Additionally, the chapter discusses policies for e-Write test security, retesting students, and administering tests to students with disabilities.

Pullout section

The last section of this chapter, “Best practices for the COMPASS test administrator,” may be printed out and given to the test administrator or proctor for easy reference. This section offers a step-by-step summary of important points to remember in administering COMPASS tests.

Test administration policies explained

For the COMPASS system to be able to successfully measure examinees’ academic skills, tests must be uniformly administered. As with all standardized testing, it is critical that procedures employed by one test administrator or proctor are identical to those employed by others. By strictly following ACT policies and procedures, test administrators and proctors play an important professional role in helping to ensure a fair and equitable testing environment. The following subsections explain necessary policies for COMPASS user institutions.

Confidentiality

COMPASS test administrators and proctors have access to information about examinees and test procedures that they are not permitted to share with anyone outside of ACT. This includes examinee information, test session activities, test administration procedures, manuals, documents, and test items.


Investigations

ACT will investigate cases of suspected or documented irregularities that occur in a COMPASS test administration session. All faculty and staff are obligated to cooperate fully with ACT in subsequent investigations and respond in a timely manner to ACT’s requests for information.

Equal treatment

Test administrators are required to administer and supervise COMPASS in a nondiscriminatory manner and in accordance with all applicable laws, including the Americans with Disabilities Act.

Standardized procedures

To protect test security and administer tests in a standardized manner, COMPASS test administrators are required to read materials provided by ACT, including this Reference Manual, and to follow the standardized procedures provided later in this chapter. Doing so is crucial to ensure the validity of COMPASS test results.

Fair testing practices

ACT endorses the Code of Fair Testing Practices in Education and the Code of Professional Responsibilities in Educational Measurement, which guide the conduct of people involved in educational testing. ACT is committed to ensuring that each of its testing programs upholds the guidelines in these codes. (For a copy of each Code at no charge, contact ACT Customer Services (68), P.O. Box 1008, Iowa City, IA 52243-1008, 319.337.1429.)

Right to terminate

ACT reserves the right to terminate its relationship with any institution without advance notice if ACT determines, in its sole discretion and for any reason, that termination is appropriate. (For the specific details about ACT’s expectations of COMPASS user institutions, please refer to the COMPASS license agreement.)

Safety

During the administration of a COMPASS test, the safety of staff and students is of utmost importance. If an examinee or other person becomes confrontational or disruptive, the test administrator or proctor should take reasonable steps to defuse the situation, including contacting security personnel or local law enforcement, if necessary. Details of the event should be documented on a COMPASS Irregularity Report. A COMPASS Irregularity Report may be obtained by emailing ACT at [email protected] and inserting “COMPASS Irregularity Report” in the subject line. A completed COMPASS Irregularity Report should then be submitted to ACT using the same email address and the same subject line convention.

Authorized and unauthorized observers

User institutions should be aware of the possibility of a visit by an observer with ACT identification. Such a visit may or may not be announced in advance. The administrator or proctor should approach any visitor attempting to observe a test being administered and ask for ACT identification or an ACT letter of authorization. If the observer cannot provide either, the administrator must deny admission to the visitor and contact ACT immediately.

Unauthorized persons—including parents, guardians, children, recruiters, employers, and members of the media—must not be allowed to enter, observe, or photograph test rooms or preliminary activities. Cameras, including cell phone cameras or cameras that are part of any portable device, are not allowed in the test room, except for security cameras used by the institution.

Facility requirements

Facilities must meet accessibility requirements and provide an appropriate physical environment for the students, with no peripheral distractions that could provide clues to the test questions.

Accessibility

Under the Americans with Disabilities Act, COMPASS must be offered in locations accessible to persons with disabilities, or alternative arrangements must be made for them.

Bulletin boards

In the test room, bulletin boards related to potential test questions, as well as charts and maps that provide strategies for solving problems or writing essays, must be removed or covered. Geographical maps and periodic tables need not be covered or removed.

Environment

Lighting, temperature, and ventilation must allow examinees to give their full attention to the test.

Room setup, seating

Computer testing stations must be set up in a way that prevents examinees from viewing any other computer screens. Desktop space must be large enough to accommodate the computer, scratch paper, and a pen or pencil.


Staff requirements

Test administrators and proctors have a professional responsibility to become familiar with how COMPASS tests are to be administered, as outlined in this chapter of the Reference Manual, and also to adhere to all of the standard and general procedures. The following subsections describe further requirements of the testing staff.

Attentiveness

Test administrators and proctors must remain attentive to their testing responsibilities throughout the entire administration. For staff, this means the following activities are not allowed: reading (except this manual), grading papers, using a computer (except as required to administer COMPASS), talking casually with other staff, or engaging in any activity in the test room not directly related to the administration. No one, including staff, may eat or drink in the test room (unless approved by the institution for medical reasons).

Test administrators and proctors must walk around the test room to monitor examinees. Walking around the room discourages prohibited behavior and makes staff available to answer questions, respond to illness, and ensure that the COMPASS program is functioning at all times.

Test preparation

Any person who is involved in COMPASS test preparation activities may not serve as a test administrator because of the potential for a conflict of interest.

Relatives testing

To avoid the appearance of a conflict of interest—and to protect the test administrator or proctor and his or her relatives or wards from allegations of impropriety—test administrators and proctors may not administer COMPASS to any relative or ward. Relatives and wards include one’s children, stepchildren, grandchildren, nieces, nephews, siblings, in-laws, spouses, and persons under one’s guardianship.

Test security requirements

Adhering to appropriate measures of test security goes hand in hand with adhering to standard test administration procedures. Both practices are critical for the valid measurement of student skills. Scores earned on a standardized test are valid only to the extent that they provide a true measure of an examinee’s performance. A student who obtains high scores by unethical means will be given inaccurate course placement advice, which will probably not aid the student in meeting his or her true academic needs; acts of cheating, furthermore, can waste campus and student resources.


The physical security of a test is the first line of defense against compromise. For example, test booklets are locked away when not in use and are tracked carefully when not in a locked cabinet. ACT has incorporated the concern for test security into the COMPASS administration procedures and the design of the software. Because COMPASS is a computer-based test (CBT), some test administration procedures occur automatically. For example, all examinees receive the same set of test directions and practice items on screen.

COMPASS test administrators and proctors who adhere to standard test administration procedures will minimize the opportunity for loss or theft of test items and maximize standardization in the computer test-taking environment. Following are the standard security procedures expected of all COMPASS user institutions.

Standard security procedures

General procedures

• The secure COMPASS testing interface must be accessed only by authorized and qualified staff who have been provided with system access.

• Faculty and staff who are provided access to COMPASS testing interfaces or who are responsible for test administration must be trained in the proper security procedures.

• Students must never be left alone with any COMPASS test in a situation that affords them the opportunity to copy test items.

• Examinees are not allowed to take their personal belongings into the testing area with them (such as backpacks, purses, books, cell phones, etc.), except for an approved calculator if it is allowed (individual colleges decide whether to allow calculator use in COMPASS Mathematics tests).

• All faculty and staff are obligated to cooperate fully with ACT in investigations and to respond in a timely manner to ACT’s requests for information.

Before test administration

• All testing staff must be thoroughly briefed in the administration and security procedures outlined in this chapter.

• Staff must ensure that examinees store their personal belongings outside the testing area.

• The test administrator or proctor must ensure that each examinee is properly identified and is tested using the appropriate test package.

• For any COMPASS test, the test administrator or proctor is to supply scratch paper and pens or pencils to examinees who request them. Examinees are not to bring in their own.


During test administration

• The test administrator or proctor must control access to the test room at all times. All identification and admission procedures must be followed to ensure that the person registered to test is the individual admitted to the test room.

• The test administrator or proctor must space the seating of the examinees in such a way as to prevent each one from being able to view the computer screens of others in the room. The test administrator or proctor must actively monitor the students throughout the examination.

• If examinees are present, the test room must never be left unattended, even momentarily, even if only one examinee is in the room.

• Generally, examinees are not allowed to leave their testing station in the middle of a test. In some situations, this rule cannot and should not be enforced (e.g., in cases of a severe illness, or rest periods for examinees who have certain documented disability needs).

• If a student is permitted to leave the testing station, the test administrator or proctor must ensure that the student does not contact others and that other examinees cannot use or look at the individual’s testing station. The test administrator or proctor must immediately suspend the testing session on that computer by clicking on the appropriate exit key. The test administrator or proctor must see that the individual either returns to finish the test session or is appropriately excused from additional testing.

After test administration

• When examinees finish testing, the test administrator or proctor collects any scratch paper that has been used. This scratch paper must be disposed of securely.

• Examinees may not leave the testing site with any testing materials, except, if approved by the institution, a copy of their own individual score reports.

• At the conclusion of testing, the test administrator or proctor must physically examine all of the computer screens one by one to ensure that each station has successfully concluded the testing session and that access to any COMPASS test already in progress has been withdrawn. When a testing session is complete, a screen will appear that reads, “STOP. Thank you. Your testing is complete.” After 30 seconds, this screen will revert to the COMPASS “Click here to go on” screen in readiness for the next examinee.

• If testing is finished for the day, the COMPASS program must be removed from the screen by pressing the exit key.


In the event of a security breach

A breach in the security of test materials may result in testing staff having to invalidate and cancel test scores and/or schedule a retest. ACT must be contacted immediately if there is reason to believe that someone has had unauthorized access to test questions or if other improprieties in testing are suspected. A security breach may be reported by emailing ACT at [email protected] or by calling ACT at 800.645.1992 and selecting the general assistance option. In exceptional situations, local testing staff may wish to file an anonymous report about a possible test compromise. In these cases, testing staff may choose to use the ACT Ethics and Compliance Reporting System, located at https://act.alertline.com/gcs/welcome. Staff can access this site for additional information on how to report a concern anonymously.

In addition to the physical security of test materials, precautions must be taken to ensure that the software and test items comprising a computer-based test (CBT) are well protected. One threat to test item security is posed by the repeated use of the same test: current examinees could disclose items and answers to future examinees and could even retake the test themselves. Conventional tests counter this threat by periodically releasing new forms. Because CBT item pools are expected to have much longer lives than a conventional test form, different methods of minimizing item exposure are necessary.

The COMPASS item pool is protected against unauthorized access in two ways:

(1) encryption

(2) password-protected access

The computer files holding the item pool are encrypted through a complex and proprietary set of procedures. ACT developed these procedures specifically to ensure the security of the item pools and to minimize the likelihood of compromise even by sophisticated computer programmers. Although no encryption is foolproof, it would take considerable time, effort, and skill to “crack” the system’s code. Because of ACT’s encryption, the only way to display the item pool is through the administration of an actual test. The password-protected access feature has two layers of password control and restricts test administration to authorized users only, who may change passwords whenever desired to further protect access. If the password barriers were somehow to be circumvented, an individual would have to take literally thousands of unauthorized tests before being able to view and record the majority of the approximately 3,000 items in the COMPASS pools.
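As a generic illustration of the protections described above, and not ACT's proprietary procedures, the sketch below shows password-based file encryption: a key is derived from a password, the file is encrypted with that key, and the contents cannot be recovered without both the password and the stored salt. The library (the third-party cryptography package), parameters, and file names are assumptions made for the example.

```python
# Generic illustration of password-protected file encryption. This is NOT ACT's
# proprietary scheme, only a sketch of the general technique the text describes.
# Requires the third-party "cryptography" package.
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def key_from_password(password: str, salt: bytes) -> bytes:
    """Derive a symmetric encryption key from a password and salt."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(password.encode("utf-8")))


def encrypt_file(plain_path: str, cipher_path: str, password: str) -> bytes:
    """Encrypt a file; returns the salt, which must be stored to decrypt later."""
    salt = os.urandom(16)
    fernet = Fernet(key_from_password(password, salt))
    with open(plain_path, "rb") as f:
        ciphertext = fernet.encrypt(f.read())
    with open(cipher_path, "wb") as f:
        f.write(ciphertext)
    return salt


def decrypt_file(cipher_path: str, password: str, salt: bytes) -> bytes:
    """Decrypt the file only if the correct password (and stored salt) is supplied."""
    fernet = Fernet(key_from_password(password, salt))
    with open(cipher_path, "rb") as f:
        return fernet.decrypt(f.read())
```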


COMPASS e-Write security features and guidelines

ACT advises that when administering COMPASS e-Write or ESL e-Write, users follow all of the standard security guidelines described above, as well as some additional guidelines unique to a direct writing assessment. COMPASS/ESL e-Write components are unique among COMPASS tests in that the system allows authorized site staff to reproduce a student's complete confidential response through the COMPASS "Reports" menu. This capability better enables a college to respond appropriately to student questions and allows for faculty reviews of student responses. It is important that e-Write materials not be subjected to the risk of compromise, however, because compromised prompts may become less able to distinguish a student's actual level of writing skills. To prevent compromise, ACT recommends enacting the following practices:

• Never permit examinees to have a copy of their response after they have submitted it at the end of their e-Write testing session.

• Handle and store printed copies of student responses to e-Write prompts with the same level of security as other secure test materials (e.g., in a locked file cabinet).

• Limit access to a student’s response to only the faculty or administrative staff with a legitimate “need to know.”

• Securely destroy students’ responses after any formal reviews unless the responses are needed for permanent records, in which case responses must be stored securely.

• Establish a policy for the initial handling of student score challenges. If the review of a particular student response and challenge suggests a problem with the scoring, send the response and challenge to ACT. Computer-scored responses will be rescored by at least two trained ACT raters and returned to the college. If the new score does not match the original score, the new score will replace the original one. If the challenge is for a response scored by ACT raters, a third ACT rater will score it and justify the score; the response, score, and justification will be returned to the college.


Test-retest policy

Retesting is appropriate when there is reason to believe that a score obtained from previous testing does not accurately reflect the examinee's true level of knowledge or skill. This call for retesting typically occurs when (1) factors other than the examinee's ability are believed to have influenced the previous testing outcome, or (2) the scores from earlier testing are no longer believed to reflect the student's current ability.

Situation 1: Performance influenced by factors other than ability

In Situation 1, some aspect of the previous testing session is believed to have caused the examinee to perform in a manner not indicative of his or her actual level of ability. For example, the examinee may have become physically ill during the test or for some reason was unable to put forth his or her best effort. An examinee who does not understand the testing procedures or who cheats on the test falls into this category. A testing session that is interrupted or improperly administered to the extent that it hinders student performance is another cause for retesting. In Situation 1, the examinee can be retested as soon as the disruptive situation has been resolved.

Situation 2: Significant change in examinee ability

In Situation 2, a significant change, whether an increase or a decrease, is believed to have occurred in the examinee's relevant knowledge and skills. For example, if the examinee has engaged in a learning activity since the previous COMPASS testing and the learning activity is likely to have meaningfully improved the examinee's relevant knowledge and skills, then COMPASS retesting may occur as soon as the learning activity is completed. In some cases, this type of retesting is referred to as post-testing, because it follows the conclusion of a learning experience (e.g., completion of a prerequisite course) that has occurred since the previous testing. The purpose of such retesting is often to determine if the examinee now meets intended prerequisite skill levels appropriate for placement into the next course. In some instances, an examinee's scores may be from testing that occurred several months earlier or even more than a year earlier. If it is reasonable to assume that the examinee's current level of knowledge and skills should be verified, COMPASS retesting can be conducted at any time but must fall within the COMPASS/ESL retest timeline guidelines listed below.

Retest timeline guidelines

1. Under no circumstances should a retest be given on the same day as the original test.

2. An examinee may sit for no more than three COMPASS/ESL tests in a 30-day period. (Both timeline rules are illustrated in the sketch below.)
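The two timeline rules can be checked mechanically when scheduling a retest. The following is a minimal sketch, not part of any COMPASS software; it assumes the 30-day rule is applied to the rolling 30-day window ending on the proposed test date, and all names are hypothetical.

    # Minimal scheduling check for the two retest timeline rules (hypothetical code).
    from datetime import date, timedelta

    def retest_allowed(prior_test_dates, proposed_date):
        """Return True only if the proposed sitting satisfies both timeline rules."""
        # Rule 1: never administer a retest on the same day as an earlier test.
        if proposed_date in prior_test_dates:
            return False
        # Rule 2: no more than three tests in a 30-day period, counting this sitting.
        window_start = proposed_date - timedelta(days=29)
        recent = [d for d in prior_test_dates if window_start <= d <= proposed_date]
        return len(recent) + 1 <= 3

    # Example: two sittings earlier in the month allow a third but not a fourth.
    history = [date(2012, 9, 3), date(2012, 9, 14)]
    print(retest_allowed(history, date(2012, 9, 20)))                        # True
    print(retest_allowed(history + [date(2012, 9, 20)], date(2012, 9, 25)))  # False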


Administering COMPASS to students with disabilities

As a publisher of numerous educational assessment instruments, including COMPASS, ACT has a long-standing commitment to the policies of the Americans with Disabilities Act (ADA) and to similar policies outlined in previous legislation, such as the Education for All Handicapped Children Act of 1975 and the Individuals with Disabilities Education Act (IDEA).

Universal design

COMPASS item formats (e.g., text and graphic representations) adhere to principles of universal design, which, according to the Center for Universal Design, "is the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design." Universal design principles for educational materials take into consideration various student abilities, disabilities, racial and ethnic backgrounds, reading skills, ages, and other characteristics. The basic principles of universal design were further influenced by work conducted by the National Center on Educational Outcomes (NCEO) in applying these principles to large-scale assessments (http://cehd.umn.edu/nceo/OnlinePubs/Synthesis44.html, 2004).

COMPASS follows the NCEO universal design recommendations in regard to accommodations, item text, and graphics. The default COMPASS screen resolution setting is 800 by 600 pixels, which results in text display that is generally equivalent to 16-point type in the Tahoma font used. Line length never exceeds 4 inches in the display of passages, items, and graphics, with an average of 8 to 12 words per line of text. COMPASS items use only essential graphics, and their presentation is balanced with the item text to promote visual focus and mitigate confusion. COMPASS graphics are black-and-white line drawings and simple designs; shading is avoided to enhance contrast and clarity. COMPASS multiple-choice answer options are presented vertically, with answer option check boxes that are easy to navigate and represented using a thicker line weight to further enhance ease of use. To the greatest extent possible, COMPASS items are developed and edited to ensure that their on-screen presentation avoids the need for scrolling.

Accommodations

Because testing students with documented disabilities is a complex task that requires diligent effort on the part of the test publisher and the institution, ACT provides information and guidance on appropriate accommodations. Among these accommodations are the following (each of which is readily applied to COMPASS test administration):

• individually administered tests;


• extended time limits (i.e., untimed testing) and rest breaks; and

• reader- or recorder- and proctor-assisted entry of responses.

COMPASS multiple-choice tests are computer-adaptive and untimed, thus providing one of the most common accommodations as a function of the testing model. The examinee with a documented disability can be tested individually and allowed to proceed at whatever pace is most advantageous for that student, allowing for breaks whenever these are needed.

Introductory tutorials for each COMPASS test provide an opportunity for examinees to experience the testing interface before beginning the actual test. That is, the tutorials show them the system features they will use to navigate through stimulus materials, move from item to item, and mark their responses. This helps ensure that students are prepared for the mechanics of the test session. COMPASS navigation features allow examinees to respond to the test questions by using either the computer mouse or the computer keyboard (e.g., tabbing to specific fields, selecting letters from the keyboard for answer options).

For students with limited or impaired vision and students with various types of sensorimotor or cognitive dysfunction, test administrators can provide a trained proctor, or amanuensis, to read the stimuli (e.g., passages) and test questions directly from the computer screen to the student; the amanuensis also can aid the student in navigating the test screens.

Although the standard COMPASS screen resolution allows for a text display generally equivalent to 16-point type, this may not be sufficient for examinees who prefer to use a computer rather than an alternate test format. In these cases, projection equipment or similar external magnifying devices can be used with the COMPASS system as another level of accommodation.

In addition, ACT is investigating the use of other assistive technologies to support compliance with ADA requirements and the Rehabilitation Act, Section 508. This investigation includes the use of commonly used screen reader products to support text-to-speech and Braille display of items. In light of various COMPASS program needs, ACT is in the process of exploring alternate platform and test delivery options as well.

Alternate formats

Because of differences between COMPASS as a computer-adaptive instrument and the traditional paper-and-pencil test format, some alternate assessment formats (e.g., Braille) are difficult to provide via computer. For examinees with a documented disability who request an alternate format for a placement test, ACT offers ASSET, a paper-and-pencil test available in various alternate formats. ASSET assesses student abilities in the areas of reading, writing, and multiple levels of mathematics. The ASSET Reading and Writing Skills tests align with the COMPASS Reading and Writing Skills Placement tests. The ASSET Mathematics Test includes numerical skills and elementary algebra components that align with the COMPASS Prealgebra/Numerical Skills Placement Test. The ASSET Mathematics Test also includes intermediate and college algebra components that comprise items measuring intermediate algebra, college algebra, and trigonometry skills and that align with the COMPASS Algebra, College Algebra, and Trigonometry Placement tests.

ACT has developed a set of concordance tables that relate COMPASS Placement Test scores to scores on the ASSET Reading Skills, Writing Skills, and Mathematics tests, allowing appropriate translation of student scores from paper-based forms and special formats. ACT offers Braille, large-type, and CD versions of ASSET, each of which is accompanied by its corresponding Special Format Test Administration Guidelines booklet for test administrators and proctors. As with any nonstandard test administration, users are advised to be cautious in their interpretation of scores resulting from any special administration of COMPASS or ASSET tests, and to use additional relevant information beyond test scores whenever possible in making educational decisions. Accommodated ASSET materials may be ordered by contacting the ACT national office using the general COMPASS Customer Support information found in Part 1, "Introduction to the reference manual," in this Reference Manual.


Best practices for the COMPASS test administrator

General policies and procedures (pullout section)

This section describes the procedures for administering COMPASS tests. As the test administrator or proctor, you must first become familiar with the policies and procedures outlined in Part 1, Chapter 3, "COMPASS test administration, security, and best practices," in the COMPASS Reference Manual. You must always ensure that the tests are administered under supervised, secure, standardized testing conditions. No test room may be left unattended while examinees or test materials are present, even if only one examinee is testing. Throughout testing, you must monitor examinees closely and document any irregularities that occur.

Admitting examinees

Admit examinees to the test room one by one, checking a government-issued photo ID. If an examinee cannot present acceptable ID, do not admit the examinee. If the name on the identification does not match documentation, determine whether there is a reasonable explanation. If so, admit the examinee. If the explanation is not adequate, deny the examinee admission. In either case, document the situation fully on the COMPASS Irregularity Report. If you cannot make a positive identification or if you suspect the student's ID has been forged or tampered with in any way, dismiss the student without allowing her or him to participate in testing; document the situation on the Irregularity Report.

General announcements to examinees

After verifying the student's ID, you must inform the student of COMPASS testing policies concerning (1) items that are prohibited in the test room, (2) calculator use, and (3) prohibited behavior. These are summarized below.

1. Prohibited items in the test room

a. Textbooks, foreign language or other dictionaries, notes, or other aids

b. Communication devices (including cell phones)

c. Recording devices (including cameras, scanners, recorders)

d. Media devices (including games, music, video, headphones)

e. Reading material

f. Food or drink (including water)


g. Tobacco in any form

h. Hats, at the discretion of the test administrator or proctor. Some hats obstruct the examinee’s eyes and may allow examinees to conceal prohibited behavior; however, not all hats will hinder the proctor’s ability to monitor examinees. Some examinees may wear hats for religious or medical reasons.

2. Calculators

Calculator use is permitted in COMPASS tests as long as the institution approves it (some colleges may opt not to allow it). Four-function and scientific calculators are provided in COMPASS. If students plan to use a calculator of their own, they are responsible for bringing an acceptable one (and an acceptable backup calculator, if applicable) to the test session. The types of calculator permitted are listed on the COMPASS website, http://www.act.org/compass/student/calculator.html. The test administrator or proctor is responsible for checking that the calculators brought are acceptable and ensuring that examinees adhere to the following rules:

a. Use a backup calculator only if the primary calculator fails.

b. Do not share calculators with any other examinee.

c. Do not use the calculator’s memory to store any test questions or responses.

3. Prohibited behavior

If examinees engage in prohibited behavior, they will be dismissed, and their test results will not be used. Prohibited behaviors include the following:

a. Looking at another examinee’s computer screen or scratch paper

b. Giving or receiving assistance

c. Using any device to share or exchange information at any time during the tests or during a break

d. Attempting to remove test materials (including test questions and answers) from the test room by any means

e. Not following instructions or abiding by the rules of the institution

f. Exhibiting confrontational, threatening, or unruly behavior

g. Creating a disturbance or allowing an alarm or phone to sound in the test room


Direct examinees to their seats

After you have given the general announcements (and checked calculators, if applicable), direct the examinee to a specific seat. Separate him or her from friends, relatives, and other students who arrived together so that they are not sitting near each other. Never allow examinees to choose their own seats. Random seating is recommended. Spread examinees out in the room as much as possible.

Policy on examinees who leave during a test and return

Generally, examinees are not allowed to leave their testing station after testing has begun. In some situations, this rule should not be enforced (e.g., severe physical illness). If you permit an examinee to leave the testing station, you must be able to ensure that

(1) the examinee does not contact others,

(2) his or her testing station is kept isolated from other examinees, and

(3) the examinee either returns to finish the test session or is appropriately excused from additional testing.

Only one examinee may leave the room at a time, unless there is another test administrator who can accompany the examinees. Do not leave a test room unsupervised at any time. If an examinee is excused in the middle of a test session, immediately suspend the testing session on that computer by hitting the appropriate exit key (i.e., Ctrl-Alt-Q). Then click Home to go to the Staff Login screen.

Distribute scratch paper

As the test administrator or proctor, you are to provide scratch paper and a pen or pencil to examinees after they arrive for their test session.

Getting started

Complete the following tasks before launching a COMPASS test.

1. Right-click on the desktop and click Properties. Click the Settings tab.

2. Under Screen resolution, change the setting to 800 × 600 pixels. Click OK.
Result: The Monitor Settings window displays.

3. Click Yes.

Complete the following steps to launch COMPASS tests.

1. On the desktop, double-click the Test Launcher icon.


2. On the Staff Login to COMPASS Test Launcher screen, type your Staff ID and password. (These will need to be assigned by the institution.) Click Login.

3. If there is more than one test center associated with the institution, select where the testing will take place. (NOTE: If there is only one test center at your site, you will not see this screen.)

4. Click a test package from the Launch Test Package list. Click GO.
Result: The Student Login screen displays and the workstation is ready for the examinee to begin testing. (NOTE: Examinees must have their student ID to begin testing.)

Resuming a test

If a test is interrupted in progress, the COMPASS system will allow the examinee to continue testing from the point of interruption. The testing session may be resumed by completing the following steps:

1. Repeat the Getting Started steps.

2. Click the same Test Package the examinee was in when the session was interrupted.

3. Have the examinee log in and click "Go on from where I was."

Monitoring students during the test

During the test session you are to monitor all of the examinees as they are seated at their testing stations. Be alert for instances of unusual or disruptive behavior that will need to be documented. That is, you must complete a COMPASS Irregularity Report whenever an individual or group irregularity occurs, and you must return the completed report to ACT program staff. In the report, identify the names of examinees who were dismissed from (or who left) the testing room without completing all of their tests.

NOTE: A COMPASS Irregularity Report may be obtained by emailing ACT at [email protected] and inserting “COMPASS Irregularity Report” in the subject line.

Group irregularities

A group irregularity is one that affects a group of examinees (such as an entire test room). If this occurs, follow the instructions below and remember to safeguard the security of the test materials at all times.

Interrupting a test. If you must interrupt a test because of circumstances beyond the examinees' control, instruct examinees to stop testing. Suspend the testing sessions by selecting the appropriate exit keys (i.e., Ctrl-Alt-Q). If there is time to properly exit out of the COMPASS system, please do so. Otherwise, ensure that no one is left unattended in the test room and that the room is secured. Resume the testing sessions when possible.

Disturbances and distractions. If a disturbance or distraction occurs that affects examinees’ concentration and it cannot be stopped, discontinue the examination. Report all disturbances and distractions on the Irregularity Report. Suspend the testing sessions and resume when possible.

Emergency evacuation. In the event of an emergency evacuation, your first concern is for the safety of examinees and yourself. If an emergency occurs, instruct the examinees to leave the building. If it is safe to do so, lock the test room. Suspend the testing sessions and resume when possible.

Missing or stolen test materials. If, at any time, test materials are unaccounted for, you must immediately contact ACT program staff. You may call COMPASS support at 800.645.1992 and select the general assistance option. ACT will advise you regarding what actions must be taken.

Power failure. If a power failure occurs, try to determine from local sources when power will be restored. Suspend the testing sessions and resume when possible.

Individual irregularities

An individual irregularity is one that affects a single person or several individuals involved in a single circumstance (e.g., communicating answers to each other). Follow the directions for each type of individual irregularity as described below.

Technical difficulties. If an examinee’s computer malfunctions, suspend the testing session by selecting the appropriate exit keys (i.e., Ctrl-Alt-Q), move the examinee to another computer, and resume the testing session, following the instructions in the section titled “Resuming a Test.”

Duplicating test materials. Test administrators, proctors, and examinees are not permitted to duplicate or record any part of COMPASS by copying, taking notes, photographing, scanning, or using any other means. You must collect all scratch paper from examinees and shred it at the end of the test session. In all cases, examinees observed using photographic, scanning, or recording devices are to be dismissed, the device cleared, and the test voided. Inform the examinee that the test is voided and include all necessary information on the COMPASS Irregularity Report. Call COMPASS support at 800.645.1992 and select the general assistance option to determine if any additional action is required.


Examinees who become ill. If an examinee becomes ill and cannot finish testing, reschedule the student’s test session. At the rescheduled session, the examinee must start the COMPASS tests over and cannot continue from the previous stopping point.

Irrational behavior. If an examinee acts in an irrational or violent manner, proceed as follows:

• Try to prevent other examinees from being interrupted, affected, or involved.

• Collect and retain the examinee’s scratch paper without physical force.

• Dismiss the examinee from the test room as quietly as possible, without physical force or contact.

• If necessary, call security or police to protect staff and other examinees’ safety.

• Inform the examinee that his or her test will be voided.

• Give a detailed explanation on the Irregularity Report.

Prohibited behavior. Throughout testing, you should walk quietly around the room to discourage and detect prohibited behavior. Attentiveness is a very effective deterrent. (Please refer to the seven types of prohibited behavior listed earlier in this section, “Best practices for the COMPASS test administrator.”) If an examinee is engaging in any prohibited behavior, proceed in a way that does not cause unnecessary disturbance. Treat the offender reasonably and firmly. If you are certain an examinee is engaging in prohibited behavior, dismiss the offender using the procedure outlined below. If you suspect but are not certain of the prohibited behavior, discreetly warn the examinee that certain behaviors are prohibited and continue close observation.

Dismissal for prohibited behavior. If you must dismiss an examinee, do so as follows:

1. Take action immediately without creating a disturbance. If taking immediate action would result in a disturbance, wait until the end of the test.

2. Collect scratch paper from the examinee.

3. If you believe an electronic device was used to store or exchange information, or to make an image of the test, clear the device.

4. Tell the examinee that

(a) you observed the prohibited behavior and that

(b) he or she is being dismissed because of the behavior.

5. Complete an Irregularity Report that includes the following:

(a) the time of the incident and the name (or names) and IDs of the examinees involved;

(b) the details of what you observed;


(c) the statements that you and the examinee or examinees made; and

(d) the names of any additional test administrators who observed the prohibited behavior.

6. Return the COMPASS Irregularity Report to ACT.

Ending a test session after prohibited behavior. If an examinee is excused for prohibited behavior, immediately suspend the testing session on that person’s computer by selecting the appropriate exit keys (i.e., Ctrl-Alt-Q). Then click Home to go to the Staff Login screen. Do not delete the information from the COMPASS system. If the institution decides to administer a retest for someone who was previously dismissed because of prohibited behavior, the test administrator must closely monitor that examinee’s entire retesting session without performing any other duties. If other examinees are testing at the same time, they must be supervised by another test administrator. All retesting requirements must also be followed.

After testing is completed

When an examinee completes a testing session, a computer screen will appear that reads: "STOP. Thank you. Your testing is complete." After 30 seconds, this screen will revert to the COMPASS "Click here to go on" screen in readiness for the same test package to be administered to the next examinee. COMPASS also will generate printed copies of the examinee's score report immediately after testing if that option has been activated by the institution. Students may take their score report with them. Remember to collect all scratch paper from examinees before they leave.

In both group and individual testing sessions, your next responsibility is to examine the computer screen to ensure that the testing station has successfully concluded the session and that access to any COMPASS test already in progress has been withdrawn. If testing is finished for the day or if no test administrator will be in the testing room, the COMPASS program must be removed from the screen by applying the appropriate exit protocol (i.e., Ctrl-Alt-Q).


PART 2: COMPASS TEST CONTENT

Chapter 1: Reading tests ..... 1
    Overview ..... 1
    Reading Placement Test ..... 1
    Reading Comprehension Diagnostic Test ..... 4
    Vocabulary Diagnostic Test ..... 5
    COMPASS Reader Profile ..... 5

Chapter 2: Mathematics tests ..... 6
    Overview ..... 6
    Use of calculators ..... 7
    Mathematics placement tests ..... 7
    Numerical Skills/Prealgebra Placement Test ..... 7
    Algebra Placement Test ..... 9
    College Algebra Placement Test ..... 11
    Geometry Placement Test ..... 12
    Trigonometry Placement Test ..... 13
    Mathematics diagnostic tests ..... 13

Chapter 3: Writing Skills tests ..... 21
    Overview ..... 21
    Writing Skills Placement Test ..... 21
    Writing Skills diagnostic tests ..... 26

Chapter 4: English as a Second Language tests ..... 29
    Overview ..... 29
    Philosophies of language learning and testing ..... 29
    Proficiency at 5 levels ..... 31
    ESL Grammar/Usage Proficiency Test ..... 32
    ESL Reading Proficiency Test ..... 41
    ESL Listening Proficiency Test ..... 45
    ESL e-Write Test ..... 48

Chapter 5: Direct writing assessment ..... 50
    Overview ..... 50
    Description of COMPASS e-Write and ESL e-Write ..... 50
    Description of 2–12 score scale ..... 51
    Description of 2–8 score scale ..... 57
    COMPASS e-Write report services ..... 61
    Description of ESL e-Write ..... 65
    ESL e-Write report services ..... 70

Chapter 6: Creating test packages ..... 73
    Overview ..... 73
    Purpose of test packages ..... 73
    Mathematics ..... 74
    COMPASS Reading and ESL Reading ..... 84
    Writing Skills, Direct Writing, and ESL Grammar/Usage ..... 85
    ESL Listening ..... 88

Chapter 7: Cutoff scores ..... 89
    Overview ..... 89
    Developing placement messages ..... 89
    Developing cutoff scores ..... 90
    Setting initial cutoff scores (Stage 1) ..... 91
    Validating cutoff scores (Stage 2) ..... 95
    Developing initial ESL cutoff scores ..... 96
    Multi-measure placement message ..... 97

Chapter 8: Interpretation of scores ..... 98
    Overview ..... 98
    Mathematics and Reading tests ..... 98
    Writing Skills test scores ..... 98
    ESL test scores ..... 99
    Multi-measure placement ..... 99


Chapter 1: Reading tests

Overview

The COMPASS Reading Test comprises placement and diagnostic components as well as a survey of student reading habits. Colleges use the COMPASS Reading Placement Test to determine whether a student has reading skills sufficient for enrolling in a standard entry-level college course or whether first taking a developmental reading course may be more beneficial for the student. The two diagnostic tests for COMPASS Reading are in the areas of reading comprehension and vocabulary; each of these diagnostics has several specific subareas. The COMPASS Reader Profile is a short survey designed to provide information about the amount and type of student reading.

Reading Placement Test

Assessing a student's reading comprehension (the ability to construct meaning from what is read) is the focus of items in the COMPASS Reading Placement Test. The placement test consists of a pool of 71 passages, each of which is accompanied by up to five reading comprehension items. ACT continually develops new reading passages and item sets for the COMPASS Reading component, and future pool updates will introduce these new passages and items into the operational pools.

Passage types

The majority of the 71 reading passages are excerpts from copyrighted material, and the remaining passages are original works written by contracted item writers. The average length of the passages is 240 standard words, with a range from 190 to 300 standard words. (A standard word is equal to six characters, including spaces, punctuation marks, letters, and numbers.) The reading level of all passages is approximately equal to what a student encounters in the first year of college; much of the excerpted passage material comes from books, essays, journals, and magazines commonly used in entry-level college courses. Passages are of five types: prose fiction, humanities, social sciences, natural sciences, and practical reading. These passage types are described in more detail as follows:

• Prose fiction passages emphasize the narration of events and revelation of character.

• Humanities passages describe or analyze ideas or works of art and craft.

• Social sciences passages present information gathered by research.


• Natural sciences passages present a science topic along with an explanation of its significance.

• Practical reading passages present information relevant to vocational or technical courses.

Item types

Each of the five passage types is accompanied by test items that measure reading comprehension in the general categories of (1) referring and (2) reasoning. Items within these general categories also fall into one or more subcategories that further specify a student's reading comprehension skills and knowledge. Table (2)1.1 shows the content and approximate percentage of passage types and reading comprehension items in the COMPASS Reading pool, as well as information on the total number of items in the pool. The subcategories are listed in the sections following the table.

TABLE (2)1.1 Contents, passages/items (%) and counts for COMPASS Reading pool

Content by passage type (% of pool)
Social sciences 31
Natural sciences 27
Prose fiction 12
Humanities 18
Practical reading 12
Total 100

Items by category (% of pool)
Referring 39
Reasoning 61
Total 100

Pool counts
Total number of passages 71
Total number of items 330
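The percentages in Table (2)1.1 are approximate, but they can be converted to approximate item counts. A quick worked example (not from the manual) for the 330-item pool:

    # Approximate item counts implied by the category percentages in Table (2)1.1.
    pool_size = 330
    item_categories = {"Referring": 0.39, "Reasoning": 0.61}
    for name, share in item_categories.items():
        print(f"{name}: about {round(share * pool_size)} items")
    # Referring: about 129 items; Reasoning: about 201 items.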

Referring items

Reading comprehension items in the referring category pose questions about material explicitly stated in the reading passage presented to the student. The main subcategories of skills that referring items assess are as follows:

• Recognizing the explicitly stated main idea of a passage with more than one paragraph

• Recognizing the explicitly stated main idea of a paragraph


• Locating explicit information in a passage that answers the questions who, what, when, where, why, and how

• Recognizing sequential relationships

• Recognizing cause-and-effect relationships

• Recognizing comparative relationships (greater than, less than, etc.)

• Recognizing explicit evidence presented in support of a claim

• Recognizing stated assumptions

• Recognizing the main idea of visual materials (e.g., charts, graphs, maps)

• Locating explicit information contained in visual materials (e.g., charts, graphs, maps)

• Recognizing comparative relationships between information/ideas presented in prose form and in graphic form (e.g., charts, graphs, maps)

Reasoning items

Reading comprehension items in the reasoning category assess a student's proficiency at making appropriate inferences, developing a critical understanding of the text, and determining the specific meanings of difficult, unfamiliar, or ambiguous words based on the context in which they are used. The main subcategories of skills that reasoning items assess are as follows:

• Inferring the main idea of a passage with more than one paragraph

• Inferring the main idea of a paragraph

• Showing how details are related to the main idea

• Inferring sequence

• Inferring cause-and-effect relationships

• Inferring unstated assumptions

• Drawing conclusions from the facts given

• Making comparisons using stated information

• Making appropriate generalizations

• Recognizing logical fallacies

• Recognizing stereotypes

• Recognizing various points of view

• Recognizing the scope of application of hypotheses, explanations, or conclusions

• Judging the relevance and appropriate application of new information


• Identifying the structure and/or strategy of an argument

• Recognizing relevant distinctions

• Distinguishing between supported and unsupported claims

• Determining specific meanings of words or short phrases in the context of a passage

• Determining how specific word choices shape the meaning and tone of the text

• Applying information, models, or theories from a passage to new situations

• Drawing conclusions from visual materials (e.g., charts, graphs, maps)

• Making comparisons using information in visual materials (e.g., charts, graphs, maps)

Reading Comprehension Diagnostic Test

The COMPASS Reading Comprehension Diagnostic Test is designed for developmental readers and provides teachers and students with specific information about students' reading comprehension skills. The test adaptively selects passages and items from a pool of twenty-nine 200-word passages across three content areas: humanities, social sciences, and natural sciences. Each passage has from four to seven test items. Two categories of reading comprehension skills are assessed: (1) explicit detail (locating information explicitly presented in text) and (2) implicit information (drawing conclusions from information implicit in the passage).

Explicit detail

This item type focuses on the student's ability to find details and answer specific questions by referring to explicit information in a passage. Each passage has at least two explicit-detail items for which the student may have to recognize significant details that

• answer questions of who, what, where, when, why, or how; and/or

• identify relationships of sequence, cause and effect, comparison and contrast, or evidence to support a claim.

Implicit information

Items in the implicit information category focus on the student's ability to make inferences and draw conclusions from a reading passage. At least two implicit-information items accompany each passage for which a student may have to

• infer sequence, cause-and-effect relationships, unstated assumptions, or how details support an important theme; and/or


• demonstrate critical understanding by drawing conclusions from given facts; making comparisons using stated information; making appropriate generalizations; and/or understanding and distinguishing several points of view.

Vocabulary Diagnostic Test

The COMPASS Vocabulary Diagnostic Test provides teachers and students with specific information about students' abilities to determine from context the meaning of vocabulary words (nouns, verbs, adjectives, and adverbs). The diagnostic test uses a multiple-choice format and consists of one- or two-sentence items adaptively selected from a pool of 97 items. In each item, one word is missing and students are asked to select the vocabulary word that works best within the context of the sentence or sentences presented. The approximate composition of the pool is 33 percent nouns, 33 percent verbs, 16 percent adjectives, and 17 percent adverbs. The pool also covers a broad range of difficulty as determined by The Living Word Vocabulary. Difficulty ranges are as follows:

• Low difficulty (grades 4 to 6)

• Moderate difficulty (grades 8 to 10)

• High difficulty (grade 12 and beyond)

COMPASS Reader Profile

The COMPASS Reader Profile is a survey designed to provide students and teachers with specific information about students' reading habits. As part of the COMPASS Reading assessment, students answer from 9 to 15 multiple-choice questions related to the following areas of interest:

• level of enjoyment of reading

• kinds of materials typically read

• amount of time spent reading

• main purpose for reading

• perceptions of their reading ability

The number of questions asked in the survey varies based on a student’s responses to other questions. COMPASS can generate student reader profiles as reports, but these are supplemental only and do not affect any reading test scores.


Chapter 2: Mathematics tests

Overview

The purpose of the COMPASS Mathematics placement and diagnostic tests is to differentiate among examinees on the basis of their mathematics achievement so that they can be directed to the appropriate level of standard college or developmental math courses. COMPASS offers five placement tests that cover the following major content domains:

(1) Numerical Skills/Prealgebra

(2) Algebra (Elementary and Intermediate Algebra, and Coordinate Geometry)

(3) College Algebra

(4) Plane Geometry

(5) Trigonometry

Each of the five content domains contains a pool of standard, five-option, multiple-choice items. Students can be tested for placement purposes in one or more of these content domains and for diagnostic purposes in the first two domains. The five domains are roughly hierarchical, particularly in the three algebra domains. Adjoining content domains overlap in some topic areas to reflect the content overlap that is built into college mathematics courses and to make the shift from one content domain to another minimally disruptive to the examinee.

The COMPASS Mathematics Tests require that students demonstrate their ability to read and understand mathematical terminology; to apply definitions, algorithms, theorems, and properties; and to interpret data. Students must also apply quantitative reasoning in a variety of ways such as discerning relationships between mathematical concepts, connecting and integrating mathematical concepts and ideas, and making generalizations. The content and complexity of COMPASS mathematics items include three general levels of cognition. Each content domain has a variety of items from these three cognitive levels:

• Knowledge and skills: Items can be solved by performing a sequence of basic operations.

• Direct application: Items assess students' ability to apply sequences of basic operations in real-world settings.

• Understanding concepts: Items test students' depth of understanding of one or more major concepts and may be based on new or novel settings.


Use of calculators

It is up to individual institutions whether to permit students to use calculators on the COMPASS Mathematics Tests. Effective with Version 3.0, ACT adjusted item parameters for COMPASS math items so that calculators could be allowed in the tests without changing the score interpretation. Colleges using COMPASS Windows Version 2.5 and earlier versions should not permit calculator use. The COMPASS Internet Version and COMPASS Windows Version 3.0 and subsequent versions all have a calculator built into the software.

For colleges permitting calculator use, students can access this calculator at any time during a math test by clicking a button on the computer screen; or students may bring their own four-function, scientific, or graphing calculators with them to the testing session, provided that the calculators meet specific requirements. For the latest list of prohibited calculators, students and institutions are invited to check the ACT website (http://www.act.org) or call toll free 800.498.6481 for a recorded message.

The test proctor is not expected or required to supply calculators but is responsible for ensuring that students who bring their own calculators

• are using an acceptable type;

• use their backup calculator only if the primary calculator fails;

• do not share their calculator with any other test taker; and

• do not use the calculator’s memory to store any test questions or responses.

Mathematics placement tests

The subsections below describe the content and topics included in each of the five COMPASS Mathematics placement tests. They also describe basic guidelines to help colleges determine which students should start in which test and how to route students from one test to the next. Part 2, Chapters 6 and 7, of the Reference Manual further explain the routing table, routing rules for creating test packages, and how to interpret the test scores obtained.

Numerical Skills/Prealgebra Placement Test

The COMPASS Numerical Skills/Prealgebra Placement Test is the most elementary of the five Mathematics placement tests. Typically, students are administered this test if they

(1) have had limited exposure to algebra,

(2) have performed poorly in previous algebra courses, or

(3) have not used their algebra skills in a long time.


Scores from the Numerical Skills/Prealgebra Placement Test may be used to place students into an elementary algebra course at the college level or to help determine whether students would benefit from a lower-level math course such as prealgebra, arithmetic, or a “refresher” course.

Items in the COMPASS Numerical Skills/Prealgebra Placement Test range in content from basic arithmetic concepts and skills (e.g., basic operations with integers, fractions, and decimals) to the knowledge and skills considered prerequisites for a first algebra course (e.g., understanding and use of exponents, absolute values, and percentages). Table (2)2.1 provides a summary of the specific content areas covered and the approximate percentage of items for each content area in the Numerical Skills/Prealgebra item pool, as well as information on the total number of items in the pool.

TABLE (2)2.1 Contents, items (%) and counts for COMPASS Numerical Skills/Prealgebra pool

Numerical Skills/Prealgebra test items by content area % of pool

Basic operations with integers 10

Basic operations with fractions 14

Basic operations with decimals 8

Exponents, square roots, and scientific notation 7

Ratios and proportions 14

Percentages 21

Conversions between fractions and decimals 1

Multiples and factors of integers 4

Absolute values of numbers 4

Order concepts (greater/less than, using integers, fractions, and decimals) 2

Estimation skills 1

Number theory and/or properties 1

Averages (means, medians, and modes) 11

Counting problems and simple probability 2

Total % 100

No. of items in pool

Total number 332

Students who do well on the Numerical Skills/Prealgebra Placement Test may be routed to one or more additional COMPASS Mathematics placement tests to determine whether they are prepared for an intermediate algebra or higher-level course. Students who do poorly are best routed to the Numerical Skills/Prealgebra Diagnostic Test or to the end of the Mathematics placement tests.

Algebra Placement Test

The COMPASS Algebra Placement Test is most appropriate for students who have recently completed a prealgebra or basic algebra course and for students whose current level of performance suggests a lack of readiness for a college-level algebra course. Students scoring high on the Numerical Skills/Prealgebra Placement Test or low on the College Algebra Placement Test should be routed to the Algebra Placement Test to clarify their current level of competence. Scores on this test may be used in conjunction with other available information to help guide decisions regarding placement in basic, intermediate, or college algebra courses and other math courses requiring a similar degree of mathematical competence.

The Algebra Placement Test is composed of items from three curricular areas: elementary algebra, coordinate geometry, and intermediate algebra. Each of these three areas is further subdivided into a number of more specific content areas. Table (2)2.2 provides a summary of the specific content areas covered and the approximate percentage of items for each content area in the COMPASS Algebra item pool, as well as information on the total number of items in the pool.


TABLE (2)2.2 Contents, items (%) and counts for COMPASS Algebra pool

Algebra test items by content area % of pool

Elementary algebra 60

Substituting values into algebraic expressions 10

Setting up equations for given situations 7

Basic operations with polynomials 8

Factoring of polynomials 6

Solving polynomial equations by factoring 5

Formula manipulation and field axioms 3

Linear equations in one variable (using integers, fractions, and decimals as coefficients) 11

Exponents and radicals 5

Linear inequalities in one variable 4

Number theory and/or properties 1

Intermediate algebra 17

Rational expressions 6

Exponents, radical expressions, and equations 3

Systems of linear equations in two variables 4

Quadratic formula and completing the square 2

Absolute value equations and inequalities 1

Fitting parameters to equations and models 1

Coordinate geometry 23

Linear equations/inequalities in two variables 8

Distance and midpoint formulas in the plane 3

Graphing conics (circle, parabola, etc.) 4

Graphing parallel and perpendicular lines 2

Graphing relations in the plane 4

Graphing systems of equations/inequalities and rational functions 2

Total % 100

No. of items in pool

Total number 300

Students who score high on the COMPASS Algebra Placement Test may be routed to the College Algebra or Geometry placement tests. Students who score low on the Algebra Placement Test may be routed to the COMPASS Algebra Diagnostic Test or the Numerical Skills/Prealgebra Placement Test.
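Routing between placement tests follows simple threshold rules of the kind described above. The sketch below is hypothetical: the cutoff values are placeholders, since actual cutoff scores are set by each institution (see Part 2, Chapter 7), and the function is not part of the COMPASS software.

    # Hypothetical routing sketch; cutoff values are placeholders, not ACT defaults.
    ALGEBRA_HIGH = 66   # institution-defined "score high" threshold (example value)
    ALGEBRA_LOW = 35    # institution-defined "score low" threshold (example value)

    def route_after_algebra(score):
        """Suggest the next COMPASS component after the Algebra Placement Test."""
        if score >= ALGEBRA_HIGH:
            return "College Algebra or Geometry Placement Test"
        if score < ALGEBRA_LOW:
            return "Algebra Diagnostic Test or Numerical Skills/Prealgebra Placement Test"
        return "No further routing; use the score with other information for placement"

    print(route_after_algebra(72))  # routes upward
    print(route_after_algebra(28))  # routes downward for diagnosis or prealgebra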

College Algebra Placement Test The COMPASS College Algebra Placement Test is most appropriate for students who have recently demonstrated proficiency in intermediate algebra courses. Students who score high in the Algebra Placement Test also may be routed to this test. Scores on the College Algebra Placement Test may be used in conjunction with other available information to help guide placement into intermediate or college algebra courses, geometry or trigonometry courses, or other college courses that require a similar level of mathematical competence.

COMPASS College Algebra items test students’ algebraic knowledge and skills in a variety of content areas such as functions, operations with matrices, and factorials. Table (2)2.3 provides a summary of the specific content areas covered and the approximate percentage of items for each content area in the COMPASS College Algebra item pool, as well as information on the total number of items in the pool.

TABLE (2)2.3 Contents, items (%) and counts for COMPASS College Algebra pool

College Algebra test items by content area (% of pool):

Functions: 36
Exponents and exponential functions (including logarithms): 25
Complex numbers: 6
Set theory and quadratic inequalities: 1
Arithmetic and geometric sequences and series: 12
Factorials: 3
Matrices (basic operations, equations, and determinants): 5
Systems of linear equations in three or more variables: 4
Logic and proof techniques: 2
Roots of polynomials: 6

Total %: 100
Total number of items in pool: 210

Students who score low on the COMPASS College Algebra Placement Test may be routed to the Algebra Placement Test. Students who score fairly high on the College Algebra Placement Test may be routed to the Geometry or Trigonometry placement tests, if such information is considered relevant to a particular placement decision.

Geometry Placement Test The COMPASS Geometry Placement Test assesses students’ understanding of concepts in Euclidean geometry and the ability to use spatial or geometric reasoning in problem solving. Scores in this test may provide useful information to supplement scores in the Algebra, College Algebra, and/or Trigonometry placement tests. Table (2)2.4 provides a summary of the specific content areas covered and the approximate percentage of items for each content area in the COMPASS Geometry item pool, as well as information on the total number of items in the pool.

TABLE (2)2.4 Contents, items (%) and counts for COMPASS Geometry pool

Geometry test items by content area (% of pool):

Angles (supplementary, complementary, adjacent, vertical, etc.): 12
Triangles (perimeter, area, Pythagorean Theorem, etc.): 58
Rectangles (perimeter, area, etc.): 7
Parallelograms and trapezoids (perimeter, area, etc.): 2
Circles (perimeter, area, arcs, etc.): 12
Hybrid (composite) shapes: 4
Three-dimensional concepts: 5

Total %: 100
Total number of items in pool: 178

Trigonometry Placement Test The COMPASS Trigonometry Placement Test assesses students’ understanding of trigonometric concepts and their application in problem solving. Scores in this test may be used along with scores in the COMPASS College Algebra Placement Test and other available information to help guide decisions regarding placement into college algebra, trigonometry, calculus, or other college-level courses that require similar mathematical proficiency. Table (2)2.5 provides a summary of the specific content areas covered and the approximate percentage of items for each content area in the COMPASS Trigonometry item pool, as well as information on the total number of items in the pool.

TABLE (2)2.5 Contents, items (%) and counts for COMPASS Trigonometry pool

Trigonometry test items by content area (% of pool):

Right-triangle trigonometry: 34
Special angles (multiples of 30 and 45 degrees): 10
Trigonometric identities: 15
Graphs of trigonometric functions: 20
Trigonometric equations and inequalities: 9
Trigonometric functions of arbitrary angles: 11
Inverse trigonometric functions and polar coordinates: 1

Total %: 100
Total number of items in pool: 146

Mathematics diagnostic tests The seven COMPASS Numerical Skills/Prealgebra (NS/PA) and eight COMPASS Algebra (Alg) diagnostic tests will provide a more detailed indication of a student’s math capabilities in specific content areas than will the COMPASS Mathematics placement test scores alone. These diagnostics are listed below by domain.

• NS/PA-1: Operations with Integers
• NS/PA-2: Operations with Fractions
• NS/PA-3: Operations with Decimals
• NS/PA-4: Positive Integer Exponents, Square Roots, and Scientific Notation
• NS/PA-5: Ratios and Proportions
• NS/PA-6: Percentages
• NS/PA-7: Averages (means, medians, and modes)
• ALG-1: Substituting Values into Algebraic Expressions
• ALG-2: Setting up Equations for Given Situations
• ALG-3: Basic Operations with Polynomials
• ALG-4: Factoring Polynomials
• ALG-5: Linear Equations in One Variable
• ALG-6: Exponents and Radicals
• ALG-7: Rational Expressions
• ALG-8: Linear Equations in Two Variables

Numeric scores are provided for each of the specific content areas. These scores may be interpreted as the expected proportion correct if a student had responded to all items in a specific content area. Each diagnostic test is designed to focus on particular knowledge and skills within the domains of numerical skills/prealgebra or algebra, but many of the tests also assess somewhat more generic skills. These common, generic skills can be conceived of as “strands” that are interwoven into the tests. Some questions are primarily computational, some are primarily application-oriented, and some may not fall clearly within either strand or may be about equal parts computational and application. Strands frequently overlap in difficulty, as presented in Figure (2)2.1 below. The least difficult test items in the diagnostic area are in the computation strand and the most difficult are in the application strand; there also are items of medium difficulty in each strand and between the strands.

FIGURE (2)2.1 Skill strands overlapping in difficulty
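
As a concrete illustration of reading the numeric diagnostic scores described above, the short sketch below treats a domain score as an expected proportion correct between 0 and 1; the function name, the score of 0.75, and the 40-item domain size are hypothetical, not values taken from COMPASS.

    # Hypothetical illustration only: read a diagnostic domain score as the expected
    # number of items a student would answer correctly if given every item in that
    # content area. The score scale and pool size below are assumed, not ACT's.
    def expected_items_correct(domain_score, items_in_domain):
        """domain_score is treated as an expected proportion correct (0.0 to 1.0)."""
        return domain_score * items_in_domain

    # A score of 0.75 on a hypothetical 40-item domain suggests about 30 items correct.
    print(expected_items_correct(0.75, 40))  # 30.0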

Item format and difficulty All of the items in the COMPASS Mathematics diagnostic tests are presented in a standard multiple-choice format with five options. Some questions have “Cannot be determined from the given information” as an option. The questions cover a range of difficulty levels depending on the particular concepts covered and the complexity of the problem. For example, a two-step problem tends to be easier than a four-step problem covering the same general content. The types of numbers being computed also affect the difficulty level of an item. For example, computing with negative numbers tends to be more difficult than performing the same operations with positive numbers. Computations with fractions also tend to be more difficult than computations with decimals.

Another influence on item difficulty is the amount of context that is embedded in a question. If the item is presented in an applied setting, it tends to be more difficult for students than if the mathematical relationships are extracted from the context. The level of thinking required also affects difficulty. Students are more likely to correctly perform a task that they have practiced before or are familiar with than one that is unfamiliar. The alternate answer choices available to the student also contribute to the difficulty level of an item. If there is a common error that students make when working a problem and the resulting wrong answer is one of the answer choices, then it is likely that fewer students will answer correctly than if the answer choice for that common error were replaced with one that is less common. An answer choice of “Cannot be determined from the given information” also tends to add difficulty to the test questions.

Diagnostic details The following sections provide specifics regarding each COMPASS Mathematics diagnostic test, offering in-depth information on the COMPASS components available to assist postsecondary institutions in pinpointing areas of instructional need.

Numerical Skills/Prealgebra: Operations with Integers. The Prealgebra Operations with Integers Diagnostic Test focuses on computing with and applying the operations of addition, subtraction, multiplication, and division. Some test questions involve positive integers and others involve signed integers. This diagnostic test has two primary strands of mathematical knowledge and reasoning skill—Computation and Application. The questions in the Computation strand present mathematical expressions with integers and require students to compute the result. Questions from the Application strand present real-world situations in which students apply their understanding of operations with integers. At the easier end of this strand are questions that can be modeled as one-step addition of positive integers. The strand progresses to higher levels of complexity, requiring students to combine operations and use negative integers.

Numerical Skills/Prealgebra: Operations with Fractions. The Prealgebra Operations with Fractions Diagnostic Test presents items involving fractions that require students to compute results or solve problems. It also includes a Computation strand and an Application strand overlapping in difficulty. The less difficult problems in the Computation strand require the student to add or subtract two fractions, only one of which needs to be rewritten to produce a common denominator. This strand progresses to increasingly more difficult items involving multiplication and mixed numbers and addition and subtraction items where both fractions must be rewritten to produce a common denominator. Expressions involving more than one operation and items involving division of fractions are the most difficult items in the Computation strand. The Application strand follows the same progression as the Computation strand. Real-world Application problems are interwoven with the Computation strand, with simple Application questions closely related in terms of difficulty to the underlying arithmetic model from the Computation strand. Items from the Application strand require more steps to solve.

Numerical Skills/Prealgebra: Operations with Decimals. The Prealgebra Operations with Decimals Diagnostic Test also includes Computation and Application strands. The Computation strand covers addition, subtraction, multiplication, and division of decimals. The Application strand covers a wide variety of problems in real-world settings. Many items use decimals to express dollars and cents. Questions asking students to compare unit prices for two different situations tend to be at the more difficult end of the strand. Some questions at the more difficult end can be worked using a linear equation approach. Some questions do not fit clearly into these strands. For example, items that ask test takers to identify a decimal from a list that represents the largest of the numbers listed or items that pose place-value questions, both of which are nominally part of the Computation strand, are typically at the upper end of the difficulty continuum.

Numerical Skills/Prealgebra: Positive Integer Exponents, Square Roots, and Scientific Notation. The Prealgebra Positive Integers Diagnostic Test focuses on exponents and square roots, with many items using the context of scientific notation. Students are expected to work with both positive and negative integers as exponents. This diagnostic test includes Computation and Application strands. In the Computation strand, some questions focus on the interpretation of exponentiation as the number of times the base is multiplied by itself. Scientific notation provides another common theme, looking at the relationship between decimal point and exponent. Some questions involve approximations to square roots, though seldom more precisely than to the nearest integer so that locating the number under the radical between squares of integers is sufficient. A few problems look at properties of square roots. The Application strand covers a variety of real-world situations, focusing on how exponents and roots are used. Most of the problems require a relatively simple conversion to an arithmetic expression and then an evaluation of that expression.
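
The bounding idea described above can be made concrete with a short sketch; the function name and the value 40 are illustrative and are not COMPASS items.

    import math

    # Approximate a square root to the nearest integer by locating the number
    # between consecutive integer squares, as described in the passage above.
    def nearest_integer_sqrt(n):
        k = math.isqrt(n)                      # largest k with k*k <= n
        return k if (n - k * k) <= ((k + 1) ** 2 - n) else k + 1

    # 36 < 40 < 49, so the square root of 40 lies between 6 and 7, closer to 6.
    print(nearest_integer_sqrt(40))            # 6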

Numerical Skills/Prealgebra: Ratios and Proportions. The Prealgebra Ratios Diagnostic Test asks students to express relations as specific ratios as well as to use equivalent ratios (which are related by proportions) to find desired quantities. Many of the questions in this test appear in a real-world context or in the context of geometric figures. Almost all the questions give information to determine a ratio, give the amount of one quantity being related by the ratio, and ask for the amount of the other quantity. The difficulty level of the questions varies with the complexity of the situation and the other mathematical skills that must be integrated to produce a solution to the problem. For example, questions with fractions other than halves in the quantities being related tend to be more difficult than questions that involve only integer quantities. The classic mixture problems cluster at the more difficult end of the Prealgebra Ratios Diagnostic Test. A few proportions are presented in simple number sentences or algebraic notation; these problems generally appear at the less difficult end of the spectrum.
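
A minimal sketch of the basic proportion pattern described above (a ratio is given, one quantity is known, the other is requested); the scenario, numbers, and function name are invented for illustration and are not COMPASS items.

    from fractions import Fraction

    # Given a ratio a:b and the actual amount of the "a" quantity, return the
    # corresponding amount of the "b" quantity.
    def solve_proportion(ratio_a, ratio_b, known_a):
        return Fraction(known_a * ratio_b, ratio_a)

    # If flour and sugar are mixed 3:5 and 12 cups of flour are used,
    # then 12 * 5/3 = 20 cups of sugar are needed.
    print(solve_proportion(3, 5, 12))          # 20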

Numerical Skills/Prealgebra: Percentages. The Prealgebra Percentages Diagnostic Test looks at percentages of numbers, as well as percent increases or decreases. Many of the questions appear in real-world settings. At the less difficult end of the item-level continuum, questions ask for straight percent calculations of numbers. These can become more complex, such as asking for a percent of a percent of a number, but such questions still tend to be less difficult than those with real-world settings. For the more difficult items in this test, there is often more than one “whole” from which the percentage could be computed. Students need to be attuned to what base is appropriate for a given situation.
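
A small worked illustration of the choice-of-base issue described above; the dollar amounts are invented and are not COMPASS items.

    # Percent change depends on which quantity is treated as the base ("whole").
    # A change between $80 and $100 is a 25% increase in one direction but only
    # a 20% decrease in the other, because the base differs.
    def percent_change(old, new):
        return (new - old) / old * 100

    print(percent_change(80, 100))   # 25.0  (base of 80)
    print(percent_change(100, 80))   # -20.0 (base of 100)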

Numerical Skills/Prealgebra: Averages (Means, Medians, and Modes). The Prealgebra Averages Diagnostic Test concentrates on properties of means, medians, and modes, and on ways these are computed. The word average is generally used in the question because it is more familiar to students than mean. Although computing modes and medians is less work for small numbers of data points than calculating means, students tend to confuse the terms; this tends to make questions about medians and modes more difficult. Many of the questions use real-world data or are posed in real-world contexts. The largest subset of questions in this diagnostic test is related to means. Questions that call for weighted averages tend to be more difficult than those that call for simple averages; most of these items use applied contexts. Clustering at the more difficult end of the diagnostic test are a few questions that ask about changes to the mean, median, and/or mode given changes to the dataset.
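
A worked contrast between a simple average and a weighted average of the kind described above; the section sizes and scores are invented for illustration.

    # Two class sections average 70 and 90, but with 30 and 10 students the
    # combined mean is weighted toward the larger section, not (70 + 90) / 2.
    def weighted_average(values, weights):
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    print(weighted_average([70, 90], [30, 10]))  # 75.0, not the simple average 80.0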

Algebra: Substituting Values into Algebraic Expressions. The Algebra Substituting Values Diagnostic Test focuses on one facet of what a variable means—a fundamental concept in algebra. Most of the items are presented in an algebraic context, although some questions have an applied setting at the more difficult end of the continuum. Most questions involve negative integers; those involving positive integers tend to be less difficult. The complexity of the algebraic expression determines much of the difficulty of the substitution task. Expressions at the less difficult end of the test content have two variables and involve only the basic four operations, with expressions involving one variable that is squared (e.g., x^2) being nearly as easy for students. Expressions progress to include denominators and exponents greater than 2. More difficult questions in this diagnostic test require more than just proficiency with substitution; they reflect how substitution is integrated with computation skills.

Algebra: Setting Up Equations for Given Situations. The Algebra Setting Up Equations Diagnostic Test consists of situations that ask students to choose the correct algebraic expression to model the situation. Almost half of the situations involve real-world contexts, and the rest ask about mathematical concepts such as rectangles, numbers, products, reciprocals, and averages. Questions at the less difficult end of the content spectrum can involve several steps, but the descriptions of the relations are simple and the steps fairly independent. As the level of difficulty increases, situations become more complex and the order of steps more important. At the more difficult end of the spectrum are questions that could be modeled as involving a system of two equations in two unknowns, ready for one variable to be eliminated by substitution of the other relation.
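
A minimal sketch of the two-equations-in-two-unknowns situation mentioned above, solved by substituting one relation into the other; the ticket scenario, prices, and function name are invented for illustration.

    # Model: x + y = total_tickets and adult_price*x + child_price*y = total_cost.
    # Substituting y = total_tickets - x into the cost relation eliminates y.
    def solve_ticket_problem(total_tickets, total_cost, adult_price, child_price):
        x = (total_cost - child_price * total_tickets) / (adult_price - child_price)
        y = total_tickets - x
        return x, y

    # Ten tickets costing $65 at $8 per adult and $5 per child: five of each.
    print(solve_ticket_problem(10, 65, 8, 5))  # (5.0, 5.0)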

Algebra: Basic Operations with Polynomials. The Algebra Basic Operations with Polynomials Diagnostic Test includes addition and subtraction of quadratic expressions at the less difficult end of the content continuum, progressing to multiplication of three binomials near the more difficult end. Some questions require that an expression be set up using a description. The most difficult question in this diagnostic test involves, after the initial subtraction is completed, observing that the denominator, a binomial, is half of the numerator.

Algebra: Factoring Polynomials. At the less difficult end of the Algebra Factoring Polynomials Diagnostic Test, items ask students to factor common terms out of a polynomial. Near this end of the spectrum, students are required to factor simple quadratic expressions with a single variable. As the diagnostic test difficulty progresses, questions become more complex; for example, giving an expression that becomes a simple quadratic only after a common term is factored out. Perfect squares and the difference of squares cluster at the more difficult end of this test. A few questions involve more than one variable and a few involve a denominator that is a common factor of the terms in the numerator.

Algebra: Linear Equations in One Variable. The first place that students see the power of algebra over arithmetic is in being able to solve applied problems by using a linear equation. The Algebra Linear Equations in One Variable Diagnostic Test focuses on solving linear equations in one variable and applying those techniques to solve problems set in real-world contexts. Questions in this diagnostic test can be viewed as falling into two strands—Symbolic and Application—that overlap in difficulty. The Symbolic strand uses the language of algebra to present equations to be solved. Each equation is either linear or can be transformed into a linear equation. At the less difficult end of this strand, equations require one or two steps to isolate the variable. The level of difficulty increases as the question increases in complexity. Equations that must be transformed in order to be linear cluster at the more difficult end of this strand. The Application strand poses problems in real-world contexts. The typical solution involves setting up an equation, solving the equation, and perhaps doing something further with the obtained solution value. At the less difficult end of this strand, items are simple enough that some students will solve them with arithmetic rather than algebra; other applications include time-speed-distance, ratio, percent change, and mixture. Some questions fall somewhere between the two strands. Several of these questions give number sentences about percent change, where the final amount is known and the original amount must be found.
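
The percent-change items described above, where the final amount is known and the original must be found, follow the relation original * (1 + rate) = final; the sketch below, with invented dollar amounts and a hypothetical function name, simply inverts that relation.

    # Recover the original amount from the final amount and the percent change.
    def original_amount(final_amount, percent_change):
        return final_amount / (1 + percent_change / 100)

    print(original_amount(75, 25))    # 60.0: $75 after a 25% increase began at $60
    print(original_amount(45, -25))   # 60.0: $45 after a 25% decrease began at $60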

Algebra: Exponents and Radicals. The Algebra Exponents and Radicals Diagnostic Test looks at a wide variety of topics related to exponents, including square roots and cube roots. All questions are in an arithmetic/algebraic setting. At the less difficult end, questions involve numbers, either directly with an arithmetic expression or indirectly by giving values to substitute into an algebraic expression. There are questions widely ranging in difficulty that involve the concept of transferring perfect squares out of a square root or perfect cubes out of a cube root. Negative exponents, some in the context of scientific notation, also appear at various levels of difficulty. Fractional exponents cluster toward the more difficult end of the content continuum.

Algebra: Rational Expressions. The Algebra Rational Expressions Diagnostic Test focuses on the equivalence of various forms of rational expressions. Most of these questions also require factoring skills. At the less difficult end, the expressions in the numerator and denominator involve only monomials. At the next level of complexity are expressions where the numerator is a quadratic and the denominator is a factor of the numerator. Next are expressions in which the numerator and denominator are quadratics that have a common factor. The complexity continues to increase, with the difficulty level affected by the difficulty of the factoring task involved. Some of the questions involve negative integer exponents, and some incorporate equations that can be transformed to have a rational expression on one side. Questions tend to be more difficult if there is a chance to cancel one term in the denominator with one term in the numerator, where there are other terms that are then not cancelled.

Algebra: Linear Equations in Two Variables. The Algebra Linear Equations in Two Variables Diagnostic Test focuses on relationships between several representations of linearity, including tables of values, equations, ordered pairs, graphs, and slope-intercept. At the less difficult end, when items ask for equations, those equations can be verified by substituting values from the given table of values or coordinates of points. Simple slope calculations are also at the less difficult end. Graphs in the (x,y) plane occur throughout this diagnostic test. Simply plotting points is fairly easy for most students; using a graph to find an equation tends to be harder than using an equation to identify a graph. Toward the more difficult end of the spectrum, questions require examinees to find a new point on a line given sufficient data to first construct the equation of the line. Relationships between parallel lines and perpendicular lines also fall near the more difficult end. Application problems occur throughout this diagnostic test. Less difficult questions tend to border on setting up equations. Some of those at the more difficult end might be solved algebraically as a system of two equations in two unknowns as easily as by constructing the equation of a single line and using that equation to answer the question.
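
A minimal sketch of the harder two-variable task described above: construct the equation of a line from given data, then use it to find a new point. The points and function name are invented for illustration.

    # The line through (1, 3) and (3, 7) is y = 2x + 1; the same equation then
    # gives the y-value at any other x, such as x = 5.
    def line_through(p1, p2):
        (x1, y1), (x2, y2) = p1, p2
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        return slope, intercept

    slope, intercept = line_through((1, 3), (3, 7))
    print(slope, intercept)           # 2.0 1.0
    print(slope * 5 + intercept)      # 11.0, so the point (5, 11) lies on the line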

Chapter 3: Writing Skills tests

Overview Assessing a student’s achievement in writing skills is among the most challenging of educational measurements because language use is as diverse as the people using it. As part of ACT’s effort to meet this challenge, ACT updated the COMPASS Internet Version pool of Writing Skills items in June 2012 and is in the process of adopting a new taxonomy of English Language Arts classifications that align with emerging content standards.

Colleges can use the COMPASS Writing Skills components to place students in standard freshman English or developmental English courses. Items in the COMPASS Writing Skills Placement Test assess student abilities in the categories of usage/mechanics (punctuation, basic grammar and usage, sentence structure) and rhetorical skills (strategy, organization, style). The COMPASS Writing Skills diagnostic tests measure students’ strengths and weaknesses in eight writing domains: punctuation, verb formation and agreement, usage, relationships of clauses, shifts in construction, organization, spelling, and capitalization.

Writing Skills Placement Test The COMPASS Writing Skills Placement Test measures a student’s ability to apply the appropriate standard written English conventions of grammar, usage, and mechanics to actual passages. That is, a student performs simulated editing tasks to passages presented on the computer screen. Each passage contains several embedded errors that the student must find and correct. This is done by clicking on a suspected error, which will highlight the segment of text that contains a test item. Only one test item is found within each highlighted segment. The item presents five multiple-choice options. As the student chooses to retain the original text (option A) or to revise it (options B, C, D, or E), the text will be retained or revised accordingly in the passage presented. The end result is the complete new version of the passage as the student revised it. After finishing the revision, the student is presented with one or two test items that pose questions related to the author’s strategy and the passage as a whole.

Passage topics appeal to a diverse variety of students entering two- or four-year colleges and fall into categories of natural sciences, social sciences, prose fiction, practical writing, and humanities. The writing styles may be persuasive, informative, or narrative and expressive.

Classification of items The items presented in the COMPASS Writing Skills Placement Test fall into the following two main classification categories and six subcategories, respectively:

(1) Usage/mechanics category: items testing punctuation, basic grammar and usage, and sentence structure;

(2) Rhetorical skills category: items testing strategy, organization, and style.

Table (2)3.1 shows the content and approximate percentage of items in the COMPASS Writing Skills item pool across these categories and subcategories, as well as information on the total number of items in the pool.

TABLE (2)3.1 Contents, items (%) and counts for Writing Skills pool

Content classification (% of pool):

Usage/Mechanics: 69
    Punctuation: 16
    Basic grammar and usage: 26
    Sentence structure: 27

Rhetorical Skills: 31
    Strategy: 8
    Organization: 8
    Style: 15

Total %: 100
Total number of items in pool: 228

Usage/Mechanics items Items in the usage/mechanics category are directed at gauging the student’s understanding of the surface-level characteristics of writing. The three subcategories of this item type are detailed below.

Punctuation Items in the punctuation subcategory test a student’s understanding of misplaced, omitted, or superfluous commas; colons; semicolons; dashes; parentheses; apostrophes; question marks; periods; and exclamation points.

A. Punctuating breaks in thought
• End of a sentence (period, exclamation point, question mark)
• Between clauses of compound sentences when conjunction is omitted or when clauses contain commas
• Before a conjunctive adverb joining clauses of a compound sentence
• Parenthetical elements (comma, dash, parentheses)

B. Punctuating relationships and sequence
• Avoiding ambiguity
• Indicating apposition
• Indicating possessives
• Items in a series
• Simple phrases and clauses in a series
• Unequivocally restrictive or nonrestrictive clauses and phrases

C. Avoiding unnecessary punctuation
• Between subject and predicate
• Between verb and object
• Between adjective and noun
• Between preposition and object
• Between noun and preposition
• Between the intensive and the antecedent
• Between two coordinate elements
• Between correlatives
• Within series already linked by conjunctions

Basic grammar and usage The items in the basic grammar and usage subcategory address a student’s understanding of agreement between subject and verb, pronoun and antecedent, and modifier and words modified; formation of verb tenses; pronoun case; formation of comparative and superlative adjectives and adverbs; and idiomatic usage.

A. Assuring grammatical agreement
• Predicate with subjects of varying complexity (including compound subjects, collective nouns)
• Predicate with subject in sentences beginning with “There” or “Where”
• Adjectives and adverbs with their corresponding nouns and verbs

B. Forming verbs
• Tenses of regular and irregular verbs
• Compound tenses

C. Using pronouns
• Using the proper form of possessives and distinguishing them from adverbs (“there”) and contractions (“it’s”)
• Using the appropriate case of a pronoun
• Forming comparatives and superlatives of adjectives and adverbs
• Using the appropriate comparative or superlative form depending on the context

D. Observing usage conventions
• Using the idioms of Standard English

Sentence structure The items in the sentence structure subcategory assess a student’s understanding of the relationships between and among clauses, the management and placement of modifiers, and unnecessary shifts in construction.

A. Relationships of clauses
• Avoiding faulty subordination and coordination
• Avoiding run-on sentences
• Avoiding comma splices
• Avoiding sentence fragments (except those rhetorically appropriate in exclamations, dialogue, etc.)

B. Using modifiers
• Constructing sentences so that antecedents are clear and unambiguous
• Placing modifiers so that they modify the appropriate element

C. Avoiding unnecessary shifts in construction
• Avoiding shifts in person
• Avoiding shifts in number
• Avoiding shifts in voice
• Avoiding shifts in tense
• Avoiding shifts in mood

Rhetorical Skills items Items in the rhetorical skills category assess a student’s understanding of the purposes and methods of effective writing. The three subcategories, strategy, organization, and style, are described in more detail below.

Strategy Strategy items deal with the appropriateness of expression in relation to audience and purpose; the effect of adding, revising, or deleting supporting material (e.g., strengthening compositions with appropriate supporting material); and the effective choice of opening, transitional, and closing sentences. These items focus on the processes of writing: the choices made and the strategies employed by a writer in the act of composing or revising. Strategy items may appear either in the passage text or as end-of-passage items.

A. Making decisions about the appropriateness of expression for audience and purpose

• Items of this type emphasize rhetorical strategies directed at particular readers, occasions, and assignments and also specify a particular rhetorical purpose (e.g., “If the writer had been assigned to write a 200-word essay for an academic history journal, would this essay fulfill the assignment?”).

B. Making decisions about adding, revising, or deleting supporting material
• Items of this type emphasize strengthening compositions by adding appropriate supporting facts and details. The items are dependent on the context of the passage and are not answerable simply on the basis of the information provided in the stem. The stem specifies the material being supported (e.g., an assertion or a description) and the primary desired effect.

C. Making decisions about cohesion devices: openings, transitions, and closings
• Selecting an effective statement relative to the essay as a whole
• Selecting an effective statement relative to a specific paragraph or paragraphs

Organization Organization items test a student’s ability to organize ideas and discern the relevancy of statements in context (order, coherence, unity).

A. Establishing logical order
• Beginning a paragraph in the appropriate place
• Choosing the appropriate transitional word or phrase
• Placing sentences in a logical location
• Ordering sentences or phrases in a logical way

B. Judging relevancy
• Omitting irrelevant material (or retaining relevant material)

Style The items in the style subcategory assess a student’s understanding of precision and appropriateness in the choice of words and images, rhetorically effective management of sentence elements, avoidance of ambiguous pronoun references, and economy in writing.

A. Managing sentence elements effectively
• Rhetorically effective subordination and combination
• Avoiding ambiguity of pronoun reference when the relationship is problematic

B. Editing and revising effectively
• Avoiding wordiness
• Avoiding redundancy

C. Choosing words to fit meaning and function
• Choosing words and images that are fresh; recognizing and avoiding clichés
• Avoiding silly comparisons or expressions

D. Maintaining the established level of style and tone

Writing Skills diagnostic tests Students with developmental writing skills or who receive a score below a user-defined cutoff score on the COMPASS Writing Skills Placement Test can be routed to one or more of the eight diagnostic writing skills domains for additional assessment of their specific strengths and weaknesses. Each diagnostic domain has its own separate score and provides information about a student’s skills in that particular skill area. The diagnostic items are adaptively selected and presented to students from an item pool of about 40 items (the pool contains 324 items in all). Item types include cloze, multiple choice, sentence recombination, and capitalization. All items are discrete and consist of one or more sentences that focus on one writing skill. The following sections describe each one of the eight COMPASS Writing Skills diagnostic tests.
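
A minimal sketch of the routing decision described above, assuming a locally chosen cutoff; the cutoff value and function name are hypothetical, since institutions define their own cutoff scores.

    # Hypothetical local cutoff; COMPASS cutoffs are user-defined, not fixed by ACT.
    WRITING_SKILLS_CUTOFF = 62

    def route_to_writing_diagnostics(placement_score):
        """Return True if the student should be routed to the diagnostic domains."""
        return placement_score < WRITING_SKILLS_CUTOFF

    print(route_to_writing_diagnostics(55))  # True: below the local cutoff
    print(route_to_writing_diagnostics(80))  # False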

Punctuation Items in the COMPASS Writing Skills Punctuation Diagnostic Test focus on misplaced, omitted, or superfluous commas; colons; semicolons; dashes; question marks; periods; and exclamation points. These items consist of one or two sentences with choices for altering one area of punctuation. The main content categories are as follows:

• Punctuating breaks in thought

• Punctuating relationships and sequences
• Avoiding unnecessary punctuation

Verb formation and agreement The items in the COMPASS Writing Skills Verb Formation and Agreement Diagnostic Test each consist of one or two sentences with choices for selecting appropriate verb forms. The items are of two main content categories:

• Assuring verb agreement
• Forming verbs

Usage The items in the COMPASS Writing Skills Usage Diagnostic Test focus on agreement between pronoun and antecedent, agreement between modifiers and the words being modified, formation of comparative and superlative adjectives and adverbs, and idiomatic usage. Each item consists of one or two sentences with choices for selecting appropriate pronouns, adjectives, adverbs, modifiers, and idiomatic expressions. The four main content categories are as follows:

• Assuring grammatical agreement

• Using pronouns

• Forming modifiers

• Observing usage conventions

Relationships of clauses The items in the COMPASS Writing Skills Relationships of Clauses Diagnostic Test focus on clauses and how they relate within sentences. The items consist of one or two sentences with options for selecting the appropriate manner of joining, splitting, or defining clauses in sentences. The main content categories are as follows:

• Avoiding faulty subordination and coordination

• Avoiding run-on sentences

• Avoiding sentence fragments

• Constructing sentences so that antecedents are clear and unambiguous

Shifts in construction Items in the COMPASS Writing Skills Shifts in Construction Diagnostic Test focus on establishing appropriate parallelism within sentences. The items consist of one or two sentences with options for selecting appropriate parallel structures. The main content categories are the following:

• Avoiding shifts in person

• Avoiding shifts in number

• Avoiding shifts in voice

• Avoiding shifts in tense

• Avoiding shifts in mood

Organization The items in the COMPASS Writing Skills Organization Diagnostic Test focus on establishing the order, coherence, and unity of ideas. The organization domain includes three different item types:

(1) a multiple-choice item consisting of one or two sentences with options for choosing appropriate order or transitions;

(2) a multiple-choice item presenting four different sentences from which students choose the appropriately structured sentence; and

(3) an item with four sentences that students must place in logical order.

The main content categories are as follows:

• Choosing the appropriate transitional word or phrase

• Placing sentences in a logical order

• Sequencing phrases in a logical order

Spelling The items in the COMPASS Writing Skills Spelling Diagnostic Test focus on commonly misspelled words. Each item consists of one sentence with options for spelling a commonly misspelled word. The main content categories are as follows:

• Recognizing the correct spelling of commonly misspelled words

• Recognizing the correct spelling of words with prefixes and suffixes

Capitalization The items in the COMPASS Writing Skills Capitalization Diagnostic Test focus on words with common capitalization errors. The items consist of one sentence with four underlined words. Students may move the cursor over each underlined word and change capitalization by clicking the mouse.

Chapter 4: English as a Second Language tests

Overview The English as a Second Language component of COMPASS (i.e., COMPASS/ESL, often referred to in this manual as “ESL”) comprises three computer-adaptive, untimed English proficiency tests (ESL Grammar/Usage, ESL Reading, and ESL Listening) and an optional direct writing assessment, ESL e-Write™. These assessments can be used separately or in combination to help a college or Intensive English Program (IEP) place ESL students in appropriate ESL classes or mainstream college courses. Each test spans five levels of English proficiency, ranging from beginning ability to near-native-speaker ability.

This chapter discusses philosophies of language learning and testing as they relate to the ESL tests; specifications of item types and details of the English skills proficiency descriptors used for each level of the ESL Grammar/Usage, Reading, and Listening tests; and general information about the ESL direct writing assessment. A more detailed description of ESL e-Write is found in Part 2, Chapter 5.

Philosophies of language learning and testing Students learn a new language in a variety of ways. Instructors anticipating this variety can provide students with what works well for them and what will best enable them to be successful. For example, some individuals may do best with the whole language method, spelling practice, or the natural approach. Others do best with one or more methods such as grammar-translation, audio-lingual practice, total physical response, extended silent reading, or careful textual analysis.

Approach to ESL testing ACT has based these ESL tests on several philosophies of language learning drawn from models proposed by Canale and Swain, Krashen, Chomsky, Rea, Bachman, and others. Implicit in the ACT approach is Krashen’s (1983) distinction between “acquisition” and “learning.” Krashen contends that most language is acquired through comprehensible input rather than through the study of rules. Krashen further contends that students who are consciously learning language will develop the ability in certain circumstances to monitor, or self-correct, their own language.

Canale and Swain’s (1980) model of communicative competence describes language as “communication” rather than as “rules learned.” Chomsky (1965) first made the distinction

between “competence” and “performance” in language, noting that competence is what the learner actually knows, whereas performance is what that same learner produces through writing, speaking, and answering questions. Rea (1985), however, points out that this distinction, although interesting for language teaching, is not relevant to language testing. This is because tests measure the student’s performance, and the student’s level of competence can only be inferred from the performance. Rea lists other distinctions in language testing:

• meaning-dependent testing versus meaning-independent testing (that is, language testing with attention to meaning versus with attention to form and manipulating the grammatical code as an abstract system);

• grammatical competence testing versus sociolinguistic competence testing (that is, “usage versus use,” or “what the rules say is correct in certain circumstances versus what people actually say”);

• integrative (contextualized) testing versus decontextualized testing (that is, testing skills within an authentic context versus testing skills in an abstracted form, one that has been removed from a context).

The items in the ESL Grammar/Usage, Reading, and Listening tests are meaning-dependent and integrative—or at least as much as is possible within the context of a multiple-choice format. These items also measure some of the language competence components described by Bachman’s (1990) model in Fundamental Considerations in Language Testing:

• organizational competence (producing or recognizing grammatical sentences, comprehending their propositional content, and ordering the sentences so that text is formed);

• grammatical competence (language usage including vocabulary, morphology, syntax, and phonology/graphology);

• textual competence (joining utterances to form a written or spoken text; cohesion and coherence);

• pragmatic competence (recognizing the linguistic signals used in communication and how they make connections between users and the context and between signs and their referents);

• illocutionary competence (understanding the meaning of various speech acts; being aware of strategies speakers can use to communicate effectively for a variety of purposes);

• sociolinguistic competence (using or recognizing language appropriate to a situation or context; exhibiting sensitivity to differences in dialect, variety of language or register; interpreting cultural references and figures of speech).

The ESL Reading Test items primarily measure an English language learner’s textual or organizational competencies, but some items also measure pragmatic aspects such as metaphor. At the higher levels in all three ESL tests, students are tested on their sensitivity to register and interpretation of cultural references and figures of speech. Although metaphorical language tends to be more difficult for second-language learners, there is value in testing it because of its prevalence in natural speech and in texts presented to students in some college content areas. ESL students will need some competence in metaphorical language before enrolling in standard university-level classes.

Proficiency at 5 levels The ESL Grammar/Usage, Reading, and Listening tests use proficiency descriptors to describe specific skills found at five levels of the score scale. The levels are also mapped to the test specifications for each test. Hence, the skills tested directly link to the levels of the score scale and the proficiency descriptors. Each student’s score report will include both a numeric domain score and a proficiency descriptor.

The ESL levels are designed to provide sufficient discrimination among the levels of proficiency; students scoring at the highest levels are likely ready to enter standard college courses (outside of an ESL program). The range of proficiency for the ESL component covers enough levels to provide an optimum number of well-defined course recommendations. In general, high proficiency correlates with the students’ comprehension of increasingly abstract, dense, and lengthy language stimuli. The ESL component does not use the terms beginning, intermediate, and advanced because these terms may have different meanings at different institutions. Instead, the levels are labeled using numbers (i.e., Pre-Level 1, Level 1, Level 2, Level 3, and Level 4). The five proficiency levels for each testing domain will be elaborated upon in further detail in this chapter under the individual domain heading.

ACT developed the ESL proficiency descriptors after considerable research into pertinent sources, including nationally recognized language proficiency descriptors put forth through the American Council on the Teaching of Foreign Languages (ACTFL) and the Teachers of English to Speakers of Other Languages (TESOL). Other sources were college benchmarks for ESL students; standards for preschool through twelfth grade English language learners; ESL syllabi from various universities; and documents describing learning outcomes tied to specific ESL college curricula—the most notable being “Course Competencies and Learning Outcomes” correlated to ESL classes at Miami-Dade Community College in Florida. For more information, please refer to Part 3, Chapter 4, “Development of ESL tests,” in this Reference Manual.

Even with the extensive research and detailed correlations that went into designing the ESL proficiency descriptors, it is important to understand that the order of language elements included in proficiency descriptors can be an approximation only. The ESL tests are not designed to be a diagnostic instrument.

ESL Grammar/Usage Proficiency Test The ESL Grammar/Usage Proficiency Test assesses students’ abilities to recognize and manipulate Standard American English in two main categories: (1) sentence elements and (2) sentence structure and syntax. The first category includes word and phrase elements of sentences such as verbs, subjects and objects, modifiers, function words, conventions (punctuation, capitalization, spelling), and word formation. The second category includes word order, the relationships between and among clauses, agreement, and how grammar relates to discourse beyond the sentence level. The emphasis and content in these two areas differ across levels of English proficiency.

It is not the students’ knowledge of English grammar terminology that is being tested but their understanding of English grammar and usage within a context. All items are presented in a multiple-choice format, and the examinee selects the best option by clicking on it. One item type in the ESL Grammar/Usage Proficiency Test uses a modified cloze format, with blanks in sentences and four options for filling in the blank. When a student clicks on an option, its text will appear on screen in the blank so that the student will be able to see the answer in context. Another item type is a question with four options, based on a reading passage.

Item classification categories Table (2)4.1 shows the two main classification categories and relative proportions of these items in the ESL Grammar/Usage Test item pool at four levels of proficiency.

TABLE (2)4.1 ESL Grammar/Usage pool content, items at 4 proficiency levels
(No. = number of items in pool; % = % of pool)

                                   Level 1      Level 2      Level 3      Level 4
ESL Grammar/Usage pool content     No.   %      No.   %      No.   %      No.   %
Sentence elements                  19    9      61    29     42    20     13    6
Sentence structure and syntax       5    2      21    10     33    16     13    6
Total                              24   12      82    40     75    36     26   12

The following section provides a complete list of the item types in the ESL Grammar/Usage component. These are given according to classification category, with subheadings denoting the classification code, bullets denoting the language skill tested, and examples showing how the skills could be tested. Brackets indicate incorrect words or phrases that could be used as incorrect answer options (distractors).

Category 1: Sentence elements

Verbs
• use of linking/to be verb as a main verb

example: She [missing is] my sister.

• present tense

example: The Earth is [revolve] around the Sun.

• past tense

examples: Cooper [goes] to school yesterday. Why [did he worked] so hard?

• future tense—will, be going to

examples: We will be [to have] the party tomorrow night. We are [go] to attend the lecture next Monday.

• continuous forms in present and past, including impossible continuous forms

examples: Tia is [to working] on her assignment right now. Jose was [attend] school in Boston last year while I was working. The soup [is smelling] good.

Cora [is having] brown eyes.

• present perfect tense, present perfect continuous

examples: Kaj has [live] in New York all his life. I have been [work] here since graduation. We [are worked] here already for five years.

• past perfect tense, past perfect continuous

examples: Jordan had [live] in New York until last year when he moved to California. Sabine had been [work] as a teacher since her graduation until recently when she took a different job.

• future perfect tense, future perfect continuous

examples: In August, Naomi will have [live] here for ten years. In June, Arnold will have been [work] on his house for five years.

• use of auxiliary verbs (be, have, do; matching auxiliaries in tag questions)

examples: Why [missing do] you live there? [Do you lived] there for a long time? He’s a nice man, [isn’t it]? I [missing am] running very fast. [Do you must] come on the bus?

• passive voice, including use of auxiliaries, incorrect choice of passive voice, and get passives

examples: The tree branch [missing was] broken by the storm. Tatum [was hoped] that she could win. The person was injured/got injured by the truck.

• irregular verbs

example: The children [broked] the window with their ball.

• modals: formation, meaning, including perfect modals

example: You (should/ought to/must) [visited] your family this weekend. You [could have pass] the test if you had studied. We should [working] now.

• gerunds/infinitives

example: He stopped eating/he stopped to eat. (Note difference in meaning) It is important [for stop] not. I asked her [to going] with me. He is always early. She expects [missing him] to be early again today.

• two-word verbs/phrasal verbs

example: The woman [ran over] a large bill at the department store. (ran up) Burgess [called his relatives on]. He [ran his friend into] at the store.

Subjects/objects
• pronoun case: nominative, objective, possessive

examples: He asked [she] to go along with him. He saw [she’s] mother downtown.

• gender of pronouns

example: The woman went shopping. [He] spent a lot of money.

• surrogate subjects

examples: Are only two good restaurants in town. (missing There) Is raining. (missing It) [There] is raining today.

• missing or superfluous objects

examples: I have a table. I bought [ ] from a friend who didn’t need [ ] anymore. (missing it)

The movie that I saw [it] was good.

• pronoun reference

example: Missy left home late, but before she went to class she had to go back to [his] house.

• contractions with pronouns: he’s

example: They[’s] late for class.

• count/non-count nouns

example: How many [luggages] did he bring with him?

• reflexive/recursive pronouns: e.g., himself, each other

examples: They tried it [themself]. [He saw each other.]

Modifiers (adjectives and adverbs)
• comparative and superlative adjectives; equatives: e.g., as . . . as; number: e.g., a few/few; a little/little; adjectives of place: e.g., this, that

examples: She is [most] beautiful than her sister.

When I first came here, I didn’t know anyone, but now I have [few] friends. (a few)

• possessive adjectives: e.g., my, her

example: This is [mine] book.

• adverbs of place, time, frequency: e.g., sometimes, often

examples: He was tired and wanted to go [to] home. She will go to the doctor [yesterday].

• intensifiers

example: After staying awake all night, he is [somewhat] tired. (very, extremely)

• misuse of adverbs/adjectives as verbs

example: I [sorried] for my mistake.

Function words
• articles

examples: Jenny went to see [the] Mr. Brown today. She saw a dog. [A] dog was very large.

• conjunctions: coordinating, subordinating, correlative

examples: It was cold [and] we had fun anyway. (but) [If] he was waiting, we had a good time. [Not only he went, and we did, too.]

• prepositions—including with passives

examples: He went [to] home. She lives [on] an apartment [in] Spruce Street.

Conventions
• punctuation: end of sentence, items in a series, parenthetical, misplaced, omitted, or superfluous

examples: Juanita [and] Joe and Chris went together. (series, missing commas) Illinois which is my favorite state is home to a great variety of people. (missing commas) The state, that I like best, is Illinois. (superfluous commas) As the child stepped into the street, a man yelled, “Stop[.]”

• capitalization

examples: In new york, they visited the statue of liberty. I saw the movie the wizard of oz.

• spelling: plurals, homonyms, transposition of letters

examples: Two [woman] sat down next to me. They lost [there] money on the train. How did you do [taht]?

Word formation
• various forms of the same word: e.g., instruct/instruction/instructor

example: He is the best [instruct] I have ever had.

Category 2: Sentence structure and syntax

Agreement
• subject/verb agreement

example: The people in the park [is] angry.

• pronoun/antecedent agreement

example: The student ate [their] lunch at noon.

Word order
• subject/verb word order

example: [Escaped the man] from his country.

• subject/object word order

example: [Much food will eat a hungry person.]

• placement of adjectives and adverbs, including not

examples: I have a new job. I work in the daytime and [I work not] at night anymore. I [yesterday] went to the office.

• question formation—yes/no questions, Wh-questions, use of do

examples: Did you [worked] all day yesterday? I see that I’m late for the movie. What time [the movie did start]? How [the family did like] the new car?

Clauses • word order using indirect objects

examples: He gave [to] me the assignment book. He gave the assignment book me. [missing to] [She recommended me the class.]

• relative clauses: who, whom, whose, which, that; restrictive/nonrestrictive


examples: The book, that I like most, is Robin Hood. (restrictive clause punctuated as a nonrestrictive clause)

I like people [they] are helpful. The man [sits next to me on the bus] is very kind.

• clauses of time, cause, and opposition

examples: [When you have worked], I was finishing the painting. I have not seen him [since he leaves this morning]. [Because homesick], I want to go home. Although he worked hard, [but] he didn’t finish.

• conditional clauses; clauses of purpose and result (e.g., so that, as a result of )

examples: What happens [if I went here]? If I were you, [I were working harder] to get a good grade.

• reduced adjective and adverb clauses, including restrictions on reductions

examples: [While lecturing to the class], I fell asleep. [Before left for school], I put on my warm coat.

• noun clauses, including indirect speech

examples: [She doesn’t understand] is clear. He asked me [what did I want?]

• participial adjectives; dangling modifiers; “squinters”

examples: I was so [boring] in class today. The teacher talked on and on. [Having stopped to make a phone call], the bus was missed. To fall down often can be embarrassing. (Often can be read in two ways here, modifying “to fall down” or modifying “can be embarrassing.”)

• relationships between clauses, including interaction among verb tenses

examples: I wanted to see the movie last night, but I didn’t want to go alone. If you [will go], I would have gone, too.

He called her after he [has found] her telephone number. He thought he was smart until he [goes] to college. They [are living] here since they were teenagers. We will go after we [will] finish.

• transitions between clauses: correct choice of transition word; combining sentences

example: The weather has been warm and beautiful. [Also], now it is starting to get cold.

• punctuation as it affects meaning in clauses and sentences; run-on sentences and sentence fragments; parenthetical expressions

examples: [It was such a nice day, we went on a picnic, there were ants.] (comma splice)

[Emily was late Melinda was early.] (run-on) [Before he came.] I was quite lonely. (fragment)

• parallelism; unnecessary shifts in construction

examples: They like skating, surfing, and [to sail]. I waited while he finished his homework and [that she wasn’t ready either].

Discourse functions of grammar • cohesive devices such as pronoun reference across sentences; lexical

repetition/synonyms; ellipsis/substitution

examples: pronoun reference Several students organized a music concert for their school. Because they thought they would enjoy the concert, they expected the teachers and other students would enjoy it, too. (Question: What does the word they refer to?) First he was a famous actor in Hollywood. Later, Ronald Reagan became president of the United States. (Question: What does the word he refer to?)

example: lexical repetition/synonyms That mutt was my best friend. He went everywhere I went and was the greatest dog that ever lived, a pooch of great intelligence. My friends thought he was an amazing animal, too. (Question: Which other words in these sentences refer to the dog? mutt, he, pooch, animal)

example: ellipsis The children will attend the program early, the adults late. (Question: What information is implied after adults?)

example: substitution He likes spaghetti a lot, and so do I. (Question: What do the words so do I stand for?) My friend got a new car, and I’d like one, too. (Question: What does the word one refer to?)

• foregrounding/backgrounding of sentence elements

examples: foregrounding What did Tosha ask for? What Tosha asked for was a new television set (or A new television set was what Tosha asked for).

examples: inaccurate foregrounding Who asked for a new television set? [The new television set was asked for by Tosha.]

examples: backgrounding Who broke the glass? The glass was unavoidably broken.

ESL Grammar/Usage Test proficiency descriptors

Pre-Level 1 (1–41) Although students scoring at this level (1–41) may have some limited knowledge of English grammar and usage, they have provided insufficient evidence that they possess the following skills: recognizing simple present tense, plurals, correct word order in simple sentences, and simple pronominal references.

Level 1 (42–62) Students at Level 1 typically can recognize simple present tense, plurals, correct word order in simple sentences, and simple pronominal references.

Level 2 (63–83) Students at Level 2 typically can recognize correct structuring of simple sentences using a variety of tenses including simple past and present, future, past and present continuous, and high-frequency irregular verbs. They also typically can recognize correct word order in statements, imperatives, simple yes-no questions, some Wh-questions, and sentences with simple relative clauses. Level 2 students know many of the conventions of capitalization and punctuation. They recognize correct uses of the basic auxiliary system, time markers, and appropriate end-of-sentence punctuation.

Level 3 (84–93) Students at Level 3 typically can recognize high-frequency uses of the present perfect and past perfect tenses and correct uses of most regular and irregular verbs, simple modal verbs, passive verbs, and participial adjectives. They can select correctly structured compound sentences as well as complex sentences using subordinating conjunctions. They can correct the punctuation in many run-on sentences or sentence fragments within a context. They often can recognize correct uses of gerunds, infinitives, and conditional clauses. Level 3 students can select appropriate transition words to join clauses and sentences, and they can recognize unnecessary shifts in construction and lack of parallelism at the word and phrase level. They can select correct uses of subordinate clauses, and they can recognize and correct some errors in more abstract kinds of writing, including prose intended for academic or occupational needs.


Level 4 (94–99) Students at Level 4 typically can select correct uses of nearly all of the verb forms of English. They can recognize unnecessary shifts in construction at the clause level. They can recognize accurate relationships among clauses and correctly formed interactions among verb tenses in related clauses. They can recognize correct word order, agreement, and the complex relationships between and among clauses at a near-native level, including correct uses of coordinating, subordinating, and correlative conjunctions, appropriate transition words, and various other cohesive devices at the level of discourse, not just at the clause or sentence level. They can select correct punctuation related to meaning. Level 4 students can recognize formal and informal registers, know when language is appropriate for a given context or situation, and understand how meaning can change with context. Low-frequency uses of language may still cause problems even for these advanced students.

ESL Reading Proficiency Test The ESL Reading Proficiency Test consists of reading passages accompanied by multiple-choice items. The test assesses students’ abilities to recognize and manipulate Standard American English in two main categories: (1) referring (reading explicitly stated material) and (2) reasoning (inferential reading). Background knowledge is not tested.

Most of the stimuli in the ESL Reading Proficiency Test are excerpts from authentic published materials that have been edited to be appropriate for English language learners. As a matter of fairness, the reading stimuli avoid subjects that could activate background knowledge for some students but not for others. The stimuli include passages ranging in length from several sentences to many paragraphs; sometimes stimuli will include tables, charts, graphs, or maps for which the student is required to perform various reading tasks such as interpretation or following directions. Graphics or photographs accompany some of the passages and items, thus supplying a context or a means to test reading comprehension.

The passages used in the ESL Reading Proficiency Test vary along two main continua: (1) context, to include vocabulary, and (2) grammatical structure. Materials at the lower end of the context continuum are limited to areas of common knowledge, such as food, transportation, and work, and they use basic vocabulary without idioms or metaphors. Reading passages at the higher levels of the context continuum incorporate academic and unfamiliar contexts and do include idiomatic and metaphorical language.

Although the reading process involves a number of important strategies—such as previewing and predicting, skimming and scanning, using prior knowledge, paraphrasing, and summarizing—the ESL Reading Test does not directly assess these strategies but tests them indirectly. For example, the items that ask students to locate specific information are testing scanning skills; items asking them to find main ideas and recognize appropriate paraphrases or summaries are requiring them to use strategies of skimming, paraphrasing, and summarizing.

Referring and reasoning test items are presented to students at all levels of proficiency; however, items at the lower levels place more emphasis on tasks of referring, and items at the higher levels emphasize reasoning tasks. In referring items, students must show a literal comprehension of main ideas, significant details, and explicitly stated relationships. The answers to referring items are directly given in the text of the passage. In the reasoning items, students must make inferences from the passage or use logic to find the answer. For example, the item may require inferring the meaning of vocabulary from context or demonstrating critical understanding of the text, paraphrasing some of it, and applying information to new situations. The student may be required to relate several statements in the passage or interpret entire sections of the text.

Table (2)4.2 shows the classification categories and relative proportions of these items in the ESL Reading Proficiency Test item pool at four levels of proficiency.

TABLE (2)4.2 ESL Reading pool content, items at 4 proficiency levels

ESL Reading pool content    Level 1                  Level 2                  Level 3                  Level 4
                            No. of items  % of pool  No. of items  % of pool  No. of items  % of pool  No. of items  % of pool
Referring                   21            10         34            16         29            14         18            9
Reasoning                   5             2.5        28            13.5       33            16         40            19
Total                       26            12.5       62            29.5       62            30         58            28

Category 1: Referring items The items in the referring classification category pose questions about information stated explicitly in the passage. They require the student to recognize the following:

• main ideas (that is, to recognize or restate the main idea or ideas of a paragraph or passage that are explicitly stated, including main ideas of directions; paraphrasing);

• significant details (that is, to locate details or pieces of information in the passage such as who, what, when, where, why, how);

• vocabulary related to pictures or signs and vocabulary defined in the text;

• relationships;

• explicitly stated sequences; steps in following directions;

• explicitly stated cause-and-effect relationships;


• explicitly stated comparisons/comparative relationships;

• explicit discourse features of language, especially simple pronoun reference.

Category 2: Reasoning items The items in the reasoning category pose questions about meaning that is implied in a passage, rather than explicitly stated, and questions that require reasoning about a passage. These items require the student to do the following:

• make inferences from the text;

• infer or synthesize the main idea of a paragraph or passage;

• relate supporting details and/or ideas to the main idea; recognize the organizational pattern of thesis and supporting statements or reasons;

• infer sequence; recognize organizational pattern of time sequence;

• infer cause-and-effect relationships; recognize cause-and-effect organizational patterns;

• infer the meaning of vocabulary from context, including idiomatic expressions at higher levels; choose the correct dictionary definition for the context (academic vocabulary at higher levels); recognize prefixes, suffixes, and roots from context; recognize synonyms and antonyms;

• critically understand the text;

• draw conclusions from facts given, including comparing facts to make choices; make appropriate generalizations based on the passage, including the author’s purpose;

• recognize multiple points of view, including comparing and contrasting ideas; detect bias and prejudice;

• recognize reasonable arguments and hypotheses as well as logical fallacies and ambiguities;

• distinguish between “pro” and “con” (“for” and “against”) statements;

• understand high-frequency cultural references;

• understand author’s tone, mood, or style; recognize formal and informal register; recognize the intended audience;

• determine meaning through understanding discourse functions of language: reference words, transitions, ellipses, substitution; foregrounding and backgrounding of information; and

• apply information from a passage to new situations outside the immediate scope of the passage; make analogies based on a reading passage.


ESL Reading proficiency descriptors

Pre-Level 1 (1–37) Although students scoring at this level (1–37) may have some limited reading skills in English, they have provided insufficient evidence that they recognize most of the letters of the English alphabet and a few sight words, especially those from the environment, such as common signs and words, phrases, or short sentences supported by pictures.

Level 1 (38–64) Students at Level 1 typically can recognize most letters of the English alphabet and a few sight words, especially those from the environment, such as common signs and words, phrases, or short sentences supported by pictures.

Level 2 (65–79) Students at Level 2 typically can read brief prose composed of short, simple sentences related to everyday needs (such as numbers, street signs, short informational signs, simple instructions). They can understand high-frequency structures, such as present, simple past, and simple future tenses. They usually understand some of the more common idioms and colloquial expressions. Level 2 students can compare facts to make choices (such as making a purchase), and they can draw simple conclusions from their reading.

Level 3 (80–91) Students at Level 3 typically can comprehend prose of several paragraphs on subjects within a familiar framework and with a clear underlying structure, and they can understand some main ideas in limited occupational or academic materials. Level 3 students can read news items, basic business letters, simple technical materials, classified ads, school bulletins, and academic text excerpts, and they can comprehend multi-step directions. They can use the reading strategies of skimming, scanning, and predicting to locate information and to help structure their reading for a variety of purposes. They can also use a variety of textual clues such as sentence connectors, transitions, and pronoun reference to comprehend the meaning and structure of a text. Level 3 students sometimes can understand the meanings of new words from context, distinguish between main and supporting ideas, and understand some common cultural references. They can make some inferences and generalizations from what they read, though complex inferences may still be difficult for them to make. However, they can often read texts equal in difficulty to those read by students at a more advanced level, though with less consistent comprehension. They possess some awareness of style and register.


Level 4 (92–99) Students at Level 4 typically can read for many purposes at a relatively normal rate with increasing comprehension. They can read increasingly abstract and grammatically complex materials; understand some uses of hypothesis, argument, and opinion in writing; differentiate between fact and opinion in academic and general writing. They can interpret, infer, make generalizations, relate ideas, and identify an author’s biases, tone, or mood. They can paraphrase an author’s implicit meaning or main points. Level 4 students have an emerging awareness of literary style. Materials they read with accuracy may include more complex newspaper articles as well as some periodicals, academic texts, technical documents, and library reference materials. Their reading exhibits a near-native-speaker proficiency but with less flexibility and a slower comprehension rate. Level 4 students do have some difficulty with unusually complex structures, low-frequency idioms or colloquial language, and obscure cultural references.

ESL Listening Proficiency Test The ESL Listening Proficiency Test assesses students’ abilities to recognize and manipulate Standard American English in two main categories:

(1) listening for explicitly stated information and

(2) listening for implicitly stated information.

Listening tasks increase in difficulty along several continua across several levels of proficiency. Specifically, these continua comprise the following:

• rate of speech;
• vocabulary;
• blending;
• reductions and elision of words;
• idiomatic and metaphorical language; and
• length of stimuli.

Stimuli in the ESL Listening Proficiency Test are based on authentic listening activities, and items test the types of information that listeners would have to comprehend in actual second-language situations. The test’s focus is on holistic, meaning-centered, discourse-level language comprehension. Smaller language units such as phonemes and individual words are tested only in the context of comprehending realistic and authentic speech. Aural stimuli differ in length and complexity by proficiency level. Beginning-level students respond to sentence-length prompts or short conversational exchanges focused on practical, universally familiar situations.

Intermediate-level students respond to somewhat longer prompts, and the speech begins to include idiomatic language, word-level reductions, and the blending of sounds. The primary content at the intermediate level is practical everyday experiences, though some items focus on academic language. Advanced-level students respond to more complex conversational exchanges with a number of word-level reductions, more linking and blending of sounds, and more use of American idioms. In addition, advanced-level students encounter short academic lectures designed to simulate language used in college classrooms.

A common problem with listening tests is that students may rely on skills other than listening comprehension for responding correctly. They may need to read written responses or hold a great deal of information in memory. Thus, a score for listening comprehension may be confounded with reading comprehension skills and working memory. To diminish the effects of less-relevant skills in the listening questions, the ESL Listening Test incorporates pictorial responses to aural prompts (primarily at low levels), note-taking options, and the facility to listen to recordings of questions. During the test, students also have an unlimited time to read—and to listen to, if they so choose—the questions associated with an aural stimulus before they activate the recording of the stimulus. All items are in a multiple-choice format, and the examinee selects options by clicking on them. Item types range from recognizing pictures that go with words at the lowest level to answering inferential questions about academic materials at the highest level. The majority of the questions at all levels test explicit information, but more advanced levels have a higher number of inferential questions.

Item Classification Categories Table (2)4.3 shows the classification categories and approximate percentage of these items in the ESL Listening Proficiency Test pool at four levels of proficiency. Further description about these categories follows the table.

TABLE (2)4.3 ESL Listening pool content, items at 4 proficiency levels

ESL Listening pool content  Level 1                  Level 2                  Level 3                  Level 4
                            No. of items  % of pool  No. of items  % of pool  No. of items  % of pool  No. of items  % of pool
Explicit                    20            11         58            31         52            27         28            15
Implicit                    0             0          13            7          10            5          9             4
Total                       20            11         71            38         62            32         37            19


Category 1: Listening for explicit information The items in the first category ask questions about information explicitly stated in the aural stimulus. These involve the following:

• recognizing main ideas;
• recognizing significant details;
• recognizing relationships; distinguishing between main ideas and details;
• determining sequence and relationships from discourse markers;
• recognizing numbers and dates.

Category 2: Listening for implicit information The items in the second category pose questions about meaning that is implied in an aural stimulus, rather than explicitly stated, and questions that require reasoning about aurally presented information. The test presents a lower percentage of these items, even at the advanced level, because it is difficult to make complex inferences based on spoken material. Following are the skills being tested:

• understanding main ideas;
• making inferences about omitted information;
• determining vocabulary from context clues; and
• recognizing register.

ESL Listening proficiency levels

Pre-Level 1 (1–41) Although students scoring at this level (1–41) may have some limited listening skills in English, they have provided insufficient evidence that they possess the following skills: understanding simple common words and learned phrases related to immediate needs (for example, greetings).

Level 1 (42–66) The understanding of students at Level 1 typically is limited to simple common words and learned phrases related to immediate needs (such as greetings). The students have little ability to comprehend even short utterances.

Level 2 (67–81) Students at Level 2 typically have the ability to understand brief questions and answers relating to personal information, the immediate setting, or predictable aspects of everyday need. They understand short conversations supported by context but usually require careful or slowed speech, repetitions, or rephrasing. Their comprehension of main ideas and details is still incomplete. They can distinguish common time forms, some question forms (Wh-, yes/no, tag questions), most common word-order patterns, and most simple contractions, but the students may have difficulty with tense shifts and more complex sentence structures.

Level 3 (82–91) Students at Level 3 typically are able to understand most discourse about personal situations and other everyday experiences, including conversations with basic academic and/or occupational subject matter. They typically can understand most exchanges that occur at a near-normal to normal conversational rate; they generally grasp main ideas and details, although comprehension is sometimes affected by length, topic familiarity, or cultural knowledge. They are able to understand different time frames and usually understand utterances using the perfect tenses, conditionals, modals, passives; they are aware of cohesive devices but may be unable to use them to enhance comprehension. Colloquial speech may cause difficulty. Level 3 students can detect emotional overtones but cannot reliably interpret mood, tone, or intent.

Level 4 (92–99) Students at Level 4 typically can understand linguistically complex discussions, including academic lectures and factual reports. Though the students may have occasional problems with colloquialisms, idiomatic language, or rapid native speech, they can use context clues to aid comprehension and also can understand most discourse markers. They have acquired the ability to comprehend implications, inferences, emotional overtones, differences in style, and shifts in register. Level 4 students can understand almost all reductions, elisions, and blends in the spoken language.

ESL e-Write Test ESL e-Write is a direct measure of student writing skills. Students are asked to provide a single writing sample in response to a prompt that has been developed to accommodate varying levels of language acquisition. ESL e-Write writing tasks describe a common aspect of everyday life; test takers are to state an opinion about this aspect of everyday life and support this opinion. ACT developed the prompt language to be specifically accessible for ESL students.

ESL e-Write analytic score scales The ESL e-Write assessment produces six scores: a single overall score that can be used for placement purposes and five analytic subscores that may be used for prescriptive purposes. The analytic scores provide specific domain information to students and institutions concerning an individual’s relative strengths and weaknesses. These analytic scores can be used to focus on possible areas in which the student needs more instruction. Each ESL e-Write analytic score is reported on a scale of 2–12, according to the electronic scoring. Human scoring, if used, reflects the combined evaluations of two trained raters. The ESL e-Write domains comprise the following: development, focus, organization, language use, and mechanics.

ESL e-Write overall score Because ESL students and their instructors require additional feedback on student-level performance, the method for calculating the overall ESL e-Write score is tied directly to the analytic scores. Each analytic score is assigned a domain weight, and the weighted analytic scores are then summed, resulting in an overall score that ranges from 2 to 12 in 1-point increments. The overall score is helpful to colleges that use it to determine general placement of ESL students in Standard English or ESL courses. Table (2)4.4 lists each of the ESL e-Write analytic score domains and the weighting model used to calculate the overall score. Part 2, Chapter 5, “Direct writing assessment,” provides a more detailed description of the ESL e-Write assessment and its scoring.

TABLE (2)4.4 ESL e-Write™ analytic scores and overall score weighting

Analytic score domains                                                                   Analytic score weights (%)

Development: The reasons, examples, and details that are used to support the stated
or implied position                                                                      35

Focus: The clarity and consistency with which the main idea(s) or point of view are
maintained                                                                               10

Organization: The clear, logical sequencing of ideas and the use of effective
transitional devices to show relationships among ideas                                   15

Language use: Variety and correctness of sentence structures and word choices, and
control of grammar and usage (e.g., word order, word forms, verb tense and agreement)    35

Mechanics: Errors in spelling, capitalization, and punctuation in relation to meaning     5

Overall score                                                                            100
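The arithmetic implied by Table (2)4.4 can be illustrated with a short sketch. This is an assumed, simplified calculation rather than ACT’s implementation: it treats the overall score as the weight-adjusted sum of the five analytic scores, and it assumes rounding to the nearest whole point, since the manual states only that the overall score is reported from 2 to 12 in 1-point increments.

DOMAIN_WEIGHTS = {
    "development": 0.35,
    "focus": 0.10,
    "organization": 0.15,
    "language_use": 0.35,
    "mechanics": 0.05,
}


def esl_e_write_overall(analytic_scores):
    """Weight each 2-12 analytic score by its domain weight and sum the results.

    Rounding to the nearest whole point is an assumption made for this sketch;
    the manual does not describe how fractional weighted sums are converted to
    reported scores.
    """
    weighted_sum = sum(DOMAIN_WEIGHTS[domain] * analytic_scores[domain]
                       for domain in DOMAIN_WEIGHTS)
    return round(weighted_sum)


# Example: a hypothetical student with analytic scores of 8, 6, 7, 8, and 9
# has a weighted sum of 7.7, reported here as an overall score of 8.
print(esl_e_write_overall({
    "development": 8,
    "focus": 6,
    "organization": 7,
    "language_use": 8,
    "mechanics": 9,
}))

Because the domain weights sum to 100 percent, any set of analytic scores in the 2–12 range produces a weighted sum that also falls between 2 and 12.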


Chapter 5: Direct writing assessment

Overview The COMPASS Internet Version offers a direct writing assessment to measure students’ writing production skills. This assessment is useful in helping colleges to place students in standard college or developmental English courses or English as a Second Language (ESL) classes. COMPASS e-Write™ and ESL e-Write™ (the component for non-native-English-speaking students) ask students to compose essays that respond to a writing prompt. ACT administers the writing components over the Internet and uses the IntelliMetric scoring engine, developed by Vantage Learning Technologies, to score them. The automated scoring system evaluates the essay responses with the accuracy of human raters and generates score reports immediately.

Alternatively, students who require or prefer the option to handwrite their responses in the traditional paper-and-pencil assessment mode may do so. The paper essays are scored by two ACT-trained raters using the same scoring rubrics applied by the automated system.

Description of COMPASS e-Write and ESL e-Write Colleges may use the e-Write component as a stand-alone measure or in combination with other COMPASS or ESL tests. For COMPASS e-Write, students are asked to produce a single writing sample in response to a specific prompt. The prompt describes a specific hypothetical situation and target audience. The student’s writing task is to take a position on the issue presented in the situation and explain to the target audience why this position is the better (or best) alternative. Each prompt specifies the basis upon which the target audience will make its decision. This is to assist the student in identifying which kinds of evidence would be most persuasive to the target audience. Situations and audiences defined in the writing prompts are constructed appropriately for the level of knowledge and experience of entering college students.

Non-native English-speaking students who take the direct writing assessment, ESL e-Write, have a similar writing task with prompts specific to the needs of second-language learners. ESL e-Write is described later in this chapter.

Score categories The purpose of the COMPASS e-Write scoring system is to assess a student’s performance of the required skills given a timed, first-draft writing situation. The direct writing model for COMPASS e-Write is designed to elicit responses that demonstrate a student’s ability to perform in five specific writing skills areas:

• focus
• content
• organization
• style
• conventions

Students receive one analytic subscore in each of these skill areas and one overall score used for course placement. The following two sections describe the two COMPASS e-Write scoring models, the preferred 2–12 scale developed in 2006 and the original 2–8 scale introduced in 2001, which is still used by some colleges. In addition, the following sections focus on the content associated with the holistic and analytic scoring rubrics for these e-Write components. Part 3, Chapter 5, of the Reference Manual provides in-depth information about the automated scoring processes and validation studies associated with COMPASS e-Write prompts.

Description of 2–12 score scale Overall score The purpose of the COMPASS e-Write 2–12 scoring system is to assess a student’s performance of the required skills given a timed, first-draft writing situation. This direct writing assessment uses a 6-point, modified-holistic rubric that was introduced in June 2006 with three associated prompts. In March 2012, three additional writing prompts were added to the 2–12 pool. Human scoring for COMPASS e-Write 2–12 requires that two trained raters independently read and score the response on a scale from 1 (lowest) to 6 (highest). The scores from the two raters for each response are summed for the reported score, which ranges from 2 to 12 in increments of 1 point. If the raters’ scores differ by 2 or more points, a third reader adjudicates and determines the reported score.
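As a concrete illustration of the human-scoring rule just described, the following sketch (an assumed illustration, not ACT’s scoring software) sums two rater scores on the 1–6 scale and flags responses whose ratings differ by 2 or more points for adjudication by a third reader. The manual does not specify how the adjudicated score is then derived, so that case is simply flagged here.

def report_e_write_2_12(rater1, rater2):
    """Sum two 1-6 rater scores, or signal that a third reader must adjudicate."""
    for score in (rater1, rater2):
        if not 1 <= score <= 6:
            raise ValueError("Each rater score must be on the 1-6 scale.")
    if abs(rater1 - rater2) >= 2:
        return None  # a third trained reader determines the reported score
    return rater1 + rater2


print(report_e_write_2_12(4, 5))  # reported score: 9
print(report_e_write_2_12(2, 5))  # None: the response goes to a third reader

The same summing-and-adjudication pattern applies to the 2–8 scale described later in this chapter, except that each rater scores on a 1–4 scale.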

Each COMPASS e-Write 2–12 score point reflects a student’s ability to perform the skills identified in the scoring guide. That is, responses are evaluated according to how well a student does the following:

• formulates a clear position on the issue defined in the prompt;
• supports that position with reasons and evidence appropriate to the position taken and the specified concerns of the audience;
• develops the argument in a coherent and logical manner;
• expresses ideas using clear, effective language.


A student receives lower scores for not adequately demonstrating the four skills described above, or receives an “unscoreable” code if the response cannot be scored. Descriptions of the score points on the overall score scale for the COMPASS e-Write 2–12 prompts are presented in Table (2)5.1.

TABLE (2)5.1 Score points on ACT rubric and COMPASS e-Write 2–12 overall score scale

Rubric definition level    e-Write score    COMPASS e-Write 2–12 overall score description

1 2 The response shows an inadequately developed sense of purpose, audience, and situation. These responses show a failed attempt to engage the issue defined in the prompt, and the response displays more than one of the following significant problems: focus on the stated position may be unclear or unsustained; support is lacking or not relevant; much of the style and language may be inappropriate for the occasion, with a very poor control of language: sentences may be poorly constructed and incomplete, word choice may be imprecise, or there may be so many severe errors in usage and mechanics that the writer’s ideas are very difficult to follow.

3 The response reflects some characteristics of a Level 2 response and some elements of a Level 4 response.

2 4 The response shows a poorly developed sense of purpose, audience, and situation. While the writer takes a position on the issue defined in the prompt, the response shows significant problems in one or more of the following areas, making the writer’s ideas often difficult to follow: focus on the stated position may be unclear or unsustained; support may be extremely minimal; organization may lack clear movement or connectedness; much of the style and language may be inappropriate for the occasion, with a weak control of language: sentences may be poorly constructed or incomplete, word choice may be imprecise, or there may be a pattern of errors in usage and mechanics that significantly interfere with meaning.

5 The response reflects some characteristics of a Level 4 response and some characteristics of a Level 6 response.

3 6 The response shows a partially developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and attempts to support that position, but with only a little elaboration or explanation. The writer maintains a general focus on the stated position, with minor digressions. Organization is clear enough to follow without difficulty. A limited control of language is apparent: word choice may be imprecise, sentences may be poorly constructed or confusing, and there may be numerous errors in usage and mechanics.

7 The response reflects some characteristics of a Level 6 response and some elements of a Level 8 response.


4 8 The response shows a developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with some elaboration or explanation. Focus on the stated position is clear and generally maintained. Organization is generally clear. A competency with language is apparent: word choice and sentence structures are generally clear and appropriate, though there may be some errors in sentence structure, usage, and mechanics.

9 The response reflects some characteristics of a Level 8 response and some characteristics of a Level 10 response.

5 10 The response shows a well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with moderate elaboration or explanation. Focus on the stated position is clear and consistent. Organization is unified and coherent. A command of language is apparent: word choice and sentence structures are generally varied, precise, and appropriate, though there may be a few errors in sentence structure, usage, and mechanics.

11 The response reflects some characteristics of a Level 10 response and some characteristics of a Level 12 response.

6 12 The response shows a thoughtful and well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with extensive elaboration or explanation. Focus on the stated position is sharp and consistently maintained. Organization is unified and coherent. Outstanding command of language is apparent: word choice is precise, sentences are well structured and varied, and there are few errors in usage and mechanics.

Analytic score scales The COMPASS e-Write 2–12 analytic score scales provide information to students and institutions about the students’ strengths and weaknesses. ACT developed five analytic scales, each using a 6-point system. These score points are described in Tables (2)5.2 through (2)5.6. In human scoring, each response is read by one trained rater who assigns scores in each of these five areas. Each score point on the 1–6 scale reflects a student’s ability to perform specific skills identified in the scoring guide for that particular writing domain.


TABLE (2)5.2 Analytic score scale: Focus

Score Description

1 The writing is not sufficient to maintain a point of view with any clarity or consistency. Focus is unclear due to one or more of the following reasons: the response is too short to provide sufficient evidence of focus; there is a lack of a stated position; digressions; or confusing language.

2 Focus is difficult to judge due to: the response being too short to provide sufficient evidence of focus, digressions that do not lead back to the stated position, or confusing language.

3 The main idea(s) and point of view are generally maintained. The writer maintains a general focus on the stated position; digressions usually lead back to the stated position.

4 The main idea(s) and point of view are maintained. The writer maintains a generally clear focus on the stated position; minor digressions eventually lead back to the stated position.

5 The main idea(s) and point of view are well maintained throughout the writing. The writer maintains a clear focus on the stated position.

6 The focus on the stated position is sharp and is clearly and consistently maintained throughout the writing.

TABLE (2)5.3 Analytic score scale: Content

Score Description

1 Support for ideas is extremely minimal or absent; specific details are lacking or not relevant; the writer does not adequately engage with the topic.

2 The writer supports ideas with extremely minimal elaboration; support may consist of unsupported assertions.

3 Only a little support is provided for the position taken; a few reasons may be given without much elaboration beyond one or two sentences for each reason; a main impression may be one of rather simple and general writing.

4 Support for the position is somewhat elaborated and detailed in well-developed paragraphs; specific examples may be given, but they are sometimes not well selected. Development may be a bit repetitious.

5 Support for the position is moderately elaborated in well-developed paragraphs; relevant, specific details and varied examples, sometimes from personal experience, are used. Development is clear, precise, and thorough.

6 Support for the position is extensively elaborated in well-developed and logically precise paragraphs; relevant, specific details and varied examples, sometimes from personal experience, are used. The writing gives a sense of completeness because the topic is quite thoroughly covered.


TABLE (2)5.4 Analytic score scale: Organization

Score Description
1 The writing is so minimal that little or no organization is apparent; there is no introduction, body, or conclusion; few or no transitions are used.
2 Organization may lack clear movement or connectedness; paragraphs may not be used; transitional words or phrases are rarely used.

3 Organization is clear enough to follow without difficulty; the introduction and conclusion, if present, may be undeveloped; transitions may be lacking, confusing or predictable (e.g., “first,” “second,” etc.); the overall effect may be one of “listing” with several supporting ideas given but with little or no elaboration.

4 Organization is generally clear; introduction and conclusion are appropriate; some transitions show relationships among ideas and are usually appropriate.

5 Organization is unified and coherent; introduction and conclusion are developed; ideas show a progression or appropriate transitions show relationships among ideas.

6 The organization is unified and coherent, with a well-developed introduction, body, and conclusion; sentences within paragraphs flow logically, ideas show a clear progression, and effective transitions are used consistently and appropriately, clearly showing relationships among ideas.

TABLE (2)5.5 Analytic score scale: Style

Score Description
1 Very poor control of language is apparent: several sentences may be fragmented and confusing; words may be inaccurate or missing.
2 Weak control of language is apparent: sentence structures are often flawed and incomplete, as several sentences may be fragmented and confusing; word choice is simple and may be incorrect, imprecise, or vague.

3 A control of language is apparent: more than one sentence may be fragmented or confusing; a few words may be inaccurate or missing, but word choice is usually appropriate; phrasing may be vague or repetitive.

4 A competency with language is apparent: sentences are clear, correct, and somewhat varied; word choice is appropriate and accurate.

5 A command of language is apparent: sentence structures are usually varied, and word choice is usually varied, specific, and precise.

6 Language use is interesting and engages the reader: an outstanding command of the language is apparent; sentences are varied in length and structure; word choice is varied, specific, and precise.


TABLE (2)5.6 Analytic score scale: Conventions

Score Description
1 The writing has severe errors of many kinds that interfere with meaning (e.g., sentence fragments, sentence splices, subject-verb agreement, plurals, inaccurate or missing words, punctuation, spelling).

2 The writing has a pattern of errors that may significantly interfere with meaning (e.g., sentence fragments, sentence splices, subject-verb agreement, plurals, inaccurate or missing words, punctuation, spelling).

3 The writing may have numerous errors that distract the reader, but they do not usually interfere with meaning (e.g., sentence fragments and splices, punctuation, missing words, spelling).

4 Some errors in grammar, usage, and mechanics may be apparent, such as a few words spelled inaccurately, missing apostrophes and commas, etc., and they may distract and occasionally interfere with meaning.

5 A few errors in grammar, usage, and mechanics may be apparent, such as occasional spelling inaccuracies, missing commas, etc., and they rarely distract or interfere with meaning.

6 A few errors in grammar, usage, and mechanics may be apparent, such as occasional spelling inaccuracies, missing commas, etc., and they do not distract or interfere with meaning.

Unscoreable responses The COMPASS e-Write system does not automatically score responses that deviate significantly from the patterns of results observed in the standard original training essays (that is, the patterns of results expected and programmed into the electronic scoring engine for each score level). Occasionally, a student may submit a response that is off topic, is too short to support a score, or for some other reason cannot be scored by computer. In this instance, the text of the student’s response is sent to ACT for scoring by trained human raters. Responses that are typically determined to be not scoreable include those that

• fail to address the topic defined by the prompt;

• are written in a language other than English;

• are too short to support a full evaluation; or

• refer to sensitive issues (such as violent or abusive situations, suicide).

Raters assign unscoreable codes: “91” for responses that are “off topic” and “92” for responses that are “unintelligible.” The score codes and the student responses are then incorporated into the COMPASS Internet Version, and the institution is notified via email that score reports for those students are available. (This procedure is followed for both the 2–8 and 2–12 score scales of COMPASS e-Write and for ESL e-Write responses.)
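The code assignment described above can be summarized in a brief sketch. This is a hypothetical illustration of the workflow, not ACT’s system: responses the engine cannot score with confidence are referred to human raters, who either assign a normal score or report one of the two unscoreable codes documented in this section.

def unscoreable_code(reason):
    """Return the unscoreable code a rater would report for the given judgment.

    Only the two codes documented in this section are modeled here; None
    indicates that the raters were able to assign a normal score instead.
    """
    codes = {
        "off topic": "91",
        "unintelligible": "92",
    }
    return codes.get(reason)


print(unscoreable_code("off topic"))       # 91
print(unscoreable_code("unintelligible"))  # 92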

Description of 2–8 score scale Overall score The original COMPASS e-Write assessment released in 2001 uses a 4-point, modified-holistic rubric and scores the essays on a 2–8 scale as determined by the electronic scoring system. For human scoring, the scores given reflect the combined evaluations of two trained raters. These raters independently read and score the response on a scale from 1 (lowest) to 4 (highest). The scores from the two raters for each response are summed for the reported score, which ranges from 2 to 8 in increments of 1 point. If the raters’ scores differ by 2 or more points, a third reader adjudicates and determines the reported score.

Each score point reflects a student’s ability to perform the skills identified in the scoring guide. Responses are evaluated according to how well a student

• formulates a clear position on the issue defined in the prompt;

• supports that position with reasons and evidence appropriate to the position taken and the specified concerns of the audience;

• develops the argument in a coherent and logical manner; and

• expresses ideas using clear, effective language.

A student receives lower scores for not adequately demonstrating the four skills described above (that is, not taking a position, not supporting it, not developing the argument, or not expressing ideas clearly and effectively). A student who does not respond to the prompt is assigned an “unscoreable” code rather than a score. Descriptions of the score points on the overall COMPASS e-Write 2–8 score scale are presented in Table (2)5.7.


TABLE (2)5.7 Description of score points on the ACT rubric and COMPASS e-Write 2–8 overall score scale

ACT rubric definition level    e-Write score    COMPASS e-Write 2–8 overall score scale

1 2 The response shows an inadequately developed sense of purpose, audience, and situation. Although the writer attempts to address the topic defined in the prompt, the response displays more than one of the following significant problems: Much of the style and language may be inappropriate for the occasion; focus may be unclear or unsustained; support is very minimal; sentences may be poorly constructed; word choice may be imprecise; or there may be many errors in usage and mechanics.

3 The response reflects some characteristics of a Level 2 response and some elements of a Level 4 response.

2 4 The response shows a partially developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and attempts to support that position, but there may be little elaboration or explanation. Focus may be unclear and not entirely sustained. Some effort to organize and sequence ideas is apparent, but organization may lack coherence. A limited control of language is apparent: word choice may be imprecise; sentences may be poorly constructed or confusing; and there may be many errors in usage and mechanics.

5 The response reflects some characteristics of a Level 4 response and some elements of a Level 6 response.

3 6 The response shows a developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt with some elaboration or explanation. Focus is clear and generally maintained. Organization is generally clear. A competency with language is apparent: word choice and sentences are generally clear though there may be some errors in sentence structure, usage, and mechanics.

7 The response reflects some characteristics of a Level 6 response and some characteristics of a Level 8 response.

4 8 The response shows a thoughtful and well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt with well-developed elaboration or explanation. Focus is clear and consistently maintained. Organization is unified and coherent. Good command of the language is apparent: word choice is precise; sentences are well structured and varied; and there are few errors in usage and mechanics.


Analytic score scales The purpose of the analytic score scales of COMPASS e-Write 2–8 is to provide additional information to students and institutions concerning students’ strengths and weaknesses. The analytic score scale uses a 4-point system. Electronic scoring is done instantaneously according to the standard results expected at various point values on the score scale (that is, the key criteria for each score). For human scoring, each essay response is read by a trained rater who assigns scores in each of these five areas; the raters use training (example) essays that show key criteria for each score. Each score point on the 1–4 scale reflects a student’s ability to perform the specific skills identified in the scoring guide for that particular domain. Descriptions of these score points are presented in Tables (2)5.8 through (2)5.12.

TABLE (2)5.8 Analytic score scale: Focus

Score Description
1 The writing is not sufficient to maintain a point of view with any clarity or consistency.
2 The main idea(s) and point of view are generally maintained, although the writing may ramble in places and there may be digressions from the main points.

3 The main idea(s) and point of view are maintained throughout the writing.

4 The focus is sharp and is clearly and consistently maintained throughout the writing.

TABLE (2)5.9 Analytic score scale: Content

Score Description

1 Support for ideas is minimal; specific details are lacking; the writer does not adequately engage with the topic.

2 Some support is provided for the position taken; a few reasons (one to three) may be given without much elaboration beyond two or three sentences for each reason; a main impression may be one of rather simple and general writing.

3 Support for the position is rather elaborated and detailed (three to four reasons) in quite well-developed paragraphs; specific examples may be given, but they are sometimes not well selected. Due to a lack of clarity in places the reader sometimes may be confused. Development may be a bit repetitious.

4 Support for the position is elaborated in well-developed paragraphs; specific details and examples, sometimes from personal experience, are used. The writing gives a sense of completeness because the topic is quite thoroughly covered.


TABLE (2)5.10 Analytic score scale: Organization

Score Description
1 The writing is so minimal that little or no organization is apparent; there is no introduction, body, or conclusion; few or no transitions are used.
2 A simple organizational structure is apparent; there is usually some introduction, if only a sentence or two; the introduction may not adequately introduce the topic because the main point(s) is not presented; transitions may be lacking, confusing or obvious (e.g., “first,” “second”); the overall effect may be one of “listing” with several supporting ideas given but with little or no elaboration.

3 An organizational structure is apparent, with a (usually) well-defined introduction, body, and conclusion. The topic or position taken may not be completely clear in the introduction; transitions are usually used to show relationships between ideas.

4 The organization is unified and coherent with a well-developed introduction, body, and conclusion; sentences within paragraphs flow logically, and transitions are (generally) used consistently.

TABLE (2)5.11 Analytic score scale: Style

Score Description

1 Language use is extremely simple; several sentences may be fragmented and confusing; words may be inaccurate or missing.

2 Language use is quite simple; more than one sentence may be fragmented or confusing; a few words may be inaccurate or missing.

3 Language use shows good control; sentences are clear, correct, and somewhat varied; word choice is appropriate and accurate.

4 Language use is interesting and engages the reader; sentences are varied in length and structure; word choice shows variety and is precise.

TABLE (2)5.12 Analytic score scale: Conventions

Score Description
1 The writing has errors of many kinds that may interfere with meaning (e.g., sentence fragments, sentence splices, subject-verb agreement, plurals, inaccurate or missing words, punctuation, spelling).

2 The writing may have errors that distract the reader, but they do not usually interfere with meaning (e.g., sentence fragments and splices, punctuation, missing words, spelling).

3 Errors are relatively minor, such as a few words spelled inaccurately, missing apostrophes and commas, etc.

4 Errors are infrequent and minor, such as occasional spelling inaccuracies, missing commas, etc.


Unscoreable responses As described in the COMPASS e-Write 2–12 section, the COMPASS e-Write system does not automatically score responses that deviate significantly from the patterns of standard results expected and programmed into the electronic scoring engine. COMPASS e-Write 2–8 score scale essays that are found to be “unscoreable” are submitted to ACT for human scoring or coding.

COMPASS e-Write report services

The student's COMPASS e-Write Standard Individual Report contains the six scores described earlier: one overall score and five analytic subscores. This standard report and the COMPASS database together support colleges with the following reporting services:

• customized course placement advice;

• instructional support suggestions;

• program advising services; and

• transfer-planning information.

Examples of a COMPASS e-Write response and a Standard Individual Report are provided in Figures (2)5.1 and (2)5.2, respectively. Because reports for both e-Write components are identical except for the analytical and overall scores reported, only COMPASS e-Write 2–12 reports are illustrated.

COMPASS e-Write response report

The COMPASS e-Write response is available to the college, allowing faculty the opportunity to review the student's writing and compare it to the assigned scores. In addition to the actual student response, this reporting feature lists the student's name and identification number, the COMPASS e-Write test date, the session ID, and a code that identifies the prompt for which the response was written. The system includes an option to print out the COMPASS e-Write responses. For test security reasons, students' responses are never to be returned to students. (NOTE: The example of a student response shown below has been converted to a random display to protect the security of the prompt.)


ACT Computer Placement Assessment and Support System
Date: 03/30/20##    Page 1    © 20## ACT, Inc.
Jones, Tom    Test Date: 03/30/##    ID: 111-11-1111

COMPASS e-Write 2–12    COMPASS Session ID: 6    Prompt: COMPASS 110

Dear City Council:

[Response text converted to a random display ("Xxx xxxxx xx xxxxx ...") to protect the security of the prompt.]

Sincerely,
Tom Jones

FIGURE (2)5.1 Example of student's COMPASS e-Write response


Standard Individual Report

The COMPASS Standard Individual Report includes the student's name and identification number, test session-level information, student background information, and COMPASS e-Write scores. For the COMPASS e-Write 2–12 prompts, the report provides an overall score on a 2–12 score scale and five analytic scores on a 1–6 score scale. (For COMPASS e-Write 2–8 prompts, the report provides an overall score on a 2–8 score scale and five analytic scores on a 1–4 score scale.) In addition to giving student, session, and score information, the Standard Individual Report includes the amount of time that elapsed as the student responded to the writing prompt. All of this detail, together with the student's actual COMPASS e-Write response, can provide a more complete analysis of the student's writing ability. The Standard Individual Report also includes placement recommendations that are specific to the institution. Although default placement messages are included as part of the e-Write system, local sites must revise these messages so that they align with local placement decisions.


ACT Computer Placement Assessment and Support System
Date: 03/30/20##    Page 1    © 20## ACT, Inc.
Jones, Tom    Test Date: 03/30/##    ID: 111-11-1111
2201 North Dodge St., Iowa City, IA 52243    Phone: (319) 555-1212

Session #: 6    Location: COMPASS Internet Version    Total Time: 01:05:26
Test package: COMPASS e-Write 2–12
Test Release: Report To Postsecondary Institutions: No    To High School: No

Student Background and Educational Plans (Time = 0:01:45)
[Background section of the sample report: the student's responses to items such as English as first language, veteran status, high school certification, quarter and semester credits since high school, education level since high school, high school GPA, institution attended after high school, enrollment term, year, and time, number of credits planned, career goal and certainty, interest region, major and major certainty, employment hours, education plans, transfer plans, reason for attending, earned certificate/degree, and expected GPA.]

COMPASS e-Write 2–12 (Major Group: General Recommendations)    COMPASS e-Write 2–12 ID: 3002

Domain      Score    Time (hours:minutes)
Holistic    11       00:56

Analytical subscores: Focus 6    Content 5    Organization 5    Style 6    Conventions 5

Recommendation: Sample placement message: As part of the COMPASS 2–12 e-Write setup process, the college must modify this message to fit local courses, support services, and preferences. Enroll in a second-semester developmental writing course (e.g., Fundamentals of Writing II, Basic Writing II, Essay Composition).

FIGURE (2)5.2 Example of Standard Individual Report for COMPASS e-Write 2–12 score scale


Description of ESL e-Write

ESL e-Write, the direct writing assessment component for students who are non-native English speakers, asks students to provide a single writing sample in response to a specific prompt. The assessment produces six scores: an overall score that can be used for placement and five analytic subscores that are combined and weighted to produce the overall score. The ESL e-Write model developed by ACT elicits responses that demonstrate a student's ability to perform skills in these five domains:

• development
• focus
• organization
• language use
• mechanics

The ESL e-Write writing prompts require students to

• take a position about a given issue;
• support the position with relevant reasons, examples, and details;
• organize and connect ideas in a clear and logical manner, maintaining a consistent focus on the main ideas throughout; and
• express those ideas using correct grammar, usage, and mechanics.

Presented in simple language that is accessible for ESL students, each prompt describes a common aspect of everyday life. Students are asked to state and support a personal opinion about this topic. Each prompt is designed so that students can answer it successfully without any specialized knowledge or background information. The following sections focus on the content associated with the analytic scoring model used for ESL e-Write. For in-depth information regarding the automated scoring processes and validation studies associated with all e-Write components, please refer to Part 3, Chapter 5, "Development of COMPASS e-Write and ESL e-Write," in this Reference Manual.

Analytic score scales

The purpose of the scoring system is to assess a student's performance of the required skills in a timed, first-draft writing situation. ESL e-Write uses a 6-point analytic rubric, with analytic scores presented on a scale of 2–12. The electronic scoring system evaluates the essay responses and generates score reports immediately. For the human scoring approach, two trained raters independently read and score a student's response on a scale from 1 (lowest) to 6 (highest) in each of the five domains. If the raters' domain scores differ by 2 or more points, a third reader adjudicates and determines the domain score. The scores from the two raters for each domain are combined and weighted for the overall reported score, which ranges from 2 to 12 in increments of 1 point. The following provides general descriptions of each of the domains.

• Development refers to the reasons, examples, and details that are used to support the stated or implied position.

• Focus refers to the clarity and consistency with which the main idea(s) or point of view is maintained.

• Organization refers to the clear, logical sequencing of ideas and the use of effective transitional devices to show relationships among ideas.

• Language Use refers to the variety and correctness of sentence structures and word choices, and control of grammar and usage (e.g., word order, word forms, verb tense, and agreement).

• Mechanics refers to errors in spelling, capitalization, and punctuation in relation to meaning.

Detailed descriptions of the score points are presented in Tables (2)5.13 through (2)5.17. Each score point on the 1–6 scale reflects a student’s ability to perform the specific skills identified in the scoring guide for that particular domain.

TABLE (2)5.13 ESL analytic score scale: Development

Score Description

1 Development is severely limited, and writing may be partially unrelated to the topic.

2 Development is limited and may include excessive repetition of prompt ideas and/or consistently simple ideas.

3 The topic is developed using very few examples, which may be general and somewhat repetitious, but they are usually relevant to the topic.

4 The topic is developed using reasons supported by a few examples and details.

5 The topic is developed using reasons supported by some specific examples and details. Evidence of critical thinking and/or insight may be displayed.

6 The topic is developed using sound reasoning, supported by interesting, specific examples and details in a full, balanced response. Evidence of critical thinking and/or insight may be displayed. Opposing viewpoints may be considered and/or refuted.


TABLE (2)5.14 ESL analytic score scale: Focus

Score Description

1 Focus cannot be judged due to the brevity of the response.

2 Focus may be difficult to judge due to the brevity of the response; any digressions generally do not lead back to the task.

3 Focus is usually maintained on the main idea(s); any digressions usually lead back to the task.

4 Focus is adequately maintained on the main idea(s); any minor digressions lead back to the task.

5 Focus is maintained clearly on the main idea(s).

6 A sharp focus is maintained consistently on the main idea(s).

TABLE (2)5.15 ESL analytic score scale: Organization

Score Description

1 Little or no organizational structure is apparent.

2 The essay shows an understanding of the need for organization. Transitional words are rarely if ever used. There is minimal evidence of a beginning, middle, and end to the essay.

3 Some organization may be evident. Transitions, if used, are generally simple and predictable. The introduction and conclusion, if present, may be undeveloped.

4 The essay demonstrates little evidence of the logical sequencing of ideas, but there is an adequate organizational structure and some transitions are used. There is an underdeveloped introduction and there may be no conclusion.

5 The essay demonstrates sequencing of ideas that is mostly logical, and appropriate transitions are used to show relationships among ideas. There is a somewhat developed introduction and there may be a brief conclusion.

6 The essay demonstrates logical sequencing of ideas, and transitions are used effectively to show relationships among ideas. There is a well-developed introduction and the essay may have a brief but clear conclusion.


TABLE (2)5.16 ESL analytic score scale: Language Use

Score Description

1 Sentences demonstrate little understanding of English word order, and word choice is often inaccurate. There are numerous errors in grammar and usage that frequently impede understanding.

2 Sentence structure is simple, with some errors evident in word order. Word choice is usually accurate but simple. Language control is inconsistent or weak, with many errors in grammar and usage, often making understanding difficult.

3 Most sentences are complete although some may not be correct or clear. Word choice is sometimes appropriate. Although a few errors may impede understanding, basic language control is evident and meaning is sometimes clear.

4 Some sentence variety is present, but some sentences may not be entirely correct or clear. Word choice is appropriate and varied. Although errors may be frequent, language control is adequate and meaning is usually clear.

5 A variety of kinds of sentences are present and usually correct. Word choice is varied and occasionally specific. Overall, language control is good and meaning is clear.

6 A wide variety of kinds of sentences are present and usually correct. Word choice is varied and specific. Although there may be a few minor errors, language control is competent and meaning is clear.

TABLE (2)5.17 ESL analytic score scale: Mechanics

Score Description

1 Errors are frequently severe and obscure meaning, or mechanics cannot be judged due to the brevity of the response.

2 Errors often distract and/or frequently interfere with meaning, or mechanics may be difficult to judge due to the brevity of the response.

3 Errors sometimes distract and they occasionally interfere with meaning.

4 Errors usually do not distract or interfere with meaning.

5 Some errors are evident but they do not distract or interfere with meaning.

6 Only minor errors, if any, are present and they do not distract or interfere with meaning.


Overall score

Because ESL students and their instructors have a unique need for more feedback on student-level performance relative to instruction than do students taking COMPASS e-Write, the method for deriving the overall score for ESL students differs. Like COMPASS e-Write, the ESL component uses the analytic score scales to provide skill domain information about a student's relative strengths and weaknesses in writing skills. The overall score for ESL e-Write, which can be used for general placement purposes, is derived from two parts: (1) the analytic scores assigned and (2) a weighting of these scores.

ACT’s research into ESL curriculum, instruction, and assessment, especially related to ESL direct writing assessments, shows that some domains carry more weight than others in terms of ESL instructional needs and students’ abilities to demonstrate language proficiency. Table (2)5.18 lists each of the ESL e-Write analytic domains and outlines the weighting model used to derive the overall score; the overall score formula immediately follows the table.

TABLE (2)5.18 ESL e-Write overall score weighting

Analytic score domain    Analytic score weight

Development (the reasons, examples, and details that are used to support the stated or implied position)    35%

Focus (the clarity and consistency with which the main idea(s) or point of view is maintained)    10%

Organization (the clear, logical sequencing of ideas and the use of effective transitional devices to show relationships among ideas)    15%

Language Use (variety and correctness of sentence structures and word choices, and control of grammar and usage, e.g., word order, word forms, verb tense, and agreement)    35%

Mechanics (errors in spelling, capitalization, and punctuation in relation to meaning)    5%

Overall score    100%


To derive the overall score, each ESL e-Write analytic score is multiplied by its weight, and the weighted analytic scores are summed. The general formula used is as follows:

Overall ESL e-Write Score = (Development × 0.35) + (Focus × 0.10) + (Organization × 0.15) + (Language Use × 0.35) + (Mechanics × 0.05)
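For illustration, the calculation can be sketched in a few lines of Python. This is not ACT's implementation; it simply applies the published weights to a set of domain scores that are assumed to already be on the 2–12 scale, and rounding to the nearest whole point is an assumption, since the manual states only that the overall score is reported in 1-point increments.

# Minimal sketch of the ESL e-Write overall score weighting (not ACT's code).
ESL_EWRITE_WEIGHTS = {
    "development": 0.35,
    "focus": 0.10,
    "organization": 0.15,
    "language_use": 0.35,
    "mechanics": 0.05,
}

def esl_ewrite_overall(domain_scores):
    """Weight and sum the five ESL e-Write domain scores (each on the 2-12 scale)."""
    weighted_sum = sum(ESL_EWRITE_WEIGHTS[domain] * score
                       for domain, score in domain_scores.items())
    return round(weighted_sum)  # reporting in whole points; rounding rule assumed

# Using the domain scores shown in the sample report in Figure (2)5.4
# (Development 11, Focus 10, Organization 9, Language Use 12, Mechanics 10):
print(esl_ewrite_overall({"development": 11, "focus": 10, "organization": 9,
                          "language_use": 12, "mechanics": 10}))  # -> 11

Applied to the sample report's domain scores, the weighted sum is 10.9, which rounds to the reported overall score of 11.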

Unscoreable responses

Just as in COMPASS e-Write, the automated scoring system does not score any ESL e-Write responses that deviate significantly from the patterns of expected results, or key criteria, for each score. Vantage automatically sends significantly deviating responses to ACT to be scored by human scorers. When scoring is completed, the responses and scoring results are incorporated into the COMPASS Internet Version software, and the institution is notified via email to print score reports.

ESL e-Write report services

The combination of the ESL e-Write Standard Individual Report and the COMPASS database supports the following reporting services:

• customized course placement advice
• instructional support suggestions
• program advising services
• transfer planning information

Examples of the ESL e-Write response and ESL e-Write Standard Individual Report are provided in Figures (2)5.3 and (2)5.4.

COMPASS ESL e-Write response report

The ESL e-Write response report, Figure (2)5.3, is available to the school, allowing institutional representatives the opportunity to review the student's writing and compare it to the assigned scores. In addition to the actual student response, this reporting feature lists the student's name and identification number, the ESL e-Write test date, the session ID, and a code that identifies the prompt to which the response was written. The system includes an option to print out the ESL e-Write responses. For test security reasons, students' responses are never to be returned to students. (NOTE: The example of a student's ESL e-Write response shown below has been converted to a random display to protect the security of the prompt.)

ACT Computer Placement Assessment and Support System
Date: 03/30/20##    Page 1    © 20## ACT, Inc.
Gomez, Tomás    Test Date: 03/30/##    ID: 111-11-1111

ESL e-Write ID: 3000    COMPASS Session ID: 4    Prompt: COMPASS 002

[Response text converted to a random display ("Xxx xxxxx xx xxxxx ...") to protect the security of the prompt.]

FIGURE (2)5.3 Example of a student's ESL e-Write response for the 2–12 score scale

The Standard Individual Report, Figure (2)5.4, includes the student's name and identification number, test session information, student background information, ESL e-Write scores (one overall score and five domain scores on the 2–12 score scale), and the length of time it took the student to respond to the writing prompt. All of this report information can be used with the scores and the student's response to provide a more complete analysis of the student's facility with standard written English. The Standard Individual Report also includes placement recommendations specific to the institution.


ACT Computer Placement Assessment and Support System
Date: 03/30/20##    Page 1    © 20## ACT, Inc.
Gomez, Tomás    Test Date: 03/30/##    ID: 111-11-1111
2201 North Dodge St., Iowa City, IA 52243    Phone: (319) 555-1212

Session #: 4    Location: COMPASS Internet Version    Total Time: 01:05:26
Test package: ESL e-Write
Test Release: Report To Postsecondary Institutions: No    To High School: No

Student Background and Educational Plans (Time = 0:01:45)
[Background section of the sample report: the student's responses to items such as English as first language, veteran status, high school certification, quarter and semester credits since high school, education level since high school, high school GPA, institution attended after high school, enrollment term, year, and time, number of credits planned, career goal and certainty, interest region, major and major certainty, employment hours, education plans, transfer plans, reason for attending, earned certificate/degree, and expected GPA.]

ESL e-Write (2–12) (Major Group: General Recommendations)    ESL e-Write ID: 3000

Domain     Score    Time (hours:minutes)
Overall    11       00:56

Domain subscores: Development 11    Organization 9    Focus 10    Language Use 12    Mechanics 10

Recommendation: Sample placement message: As part of the ESL e-Write setup process, the college must modify this message to fit local courses, support services, and preferences. Enroll in a High-to-Intermediate/Advanced ESL course (e.g., Advanced ESL Writing, Advanced ESL Grammar).

FIGURE (2)5.4 Example of Standard Individual Report for ESL e-Write for 2–12 score scale


Chapter 6: Creating test packages

Overview

The COMPASS program is flexible and can meet a variety of placement and diagnostic testing needs for institutions. Conventional paper-and-pencil tests and administration procedures do not offer this level of flexibility or configurability. An institution can control the characteristics of COMPASS and ESL testing sessions by creating unique test packages that address the needs of that institution. Specific directions for setting up COMPASS test packages are found in the "Test Setup" section of the Online Software Manual (i.e., "Help" interface) within the COMPASS system.

Purpose of test packages

COMPASS allows colleges to create specific test packages to accommodate a variety of testing alternatives, enabling users to design their own placement system. A college can control how students are routed through content domains and diagnostic areas and specify score ranges that align with local needs. A core characteristic of COMPASS is the system's extensive and adaptable framework for translating students' test performance into course placement decisions through the use of placement messages.

Institutions can create any number of test packages with different characteristics to suit their specific needs, or they can use the standard test packages that are supplied in the COMPASS software. The standard test packages possess the most generally applicable characteristics that would best serve as a template for customizing additional test packages.

The test package used to control a particular test session is selected from a menu at the time the session begins. Different students at adjacent computer stations can be tested with different test packages. For example, it is possible to place a science major and a humanities major in the appropriate classes using different cut scores on different placement tests.

The following sections describe considerations for setting up COMPASS test packages. The COMPASS Mathematics, Reading, Writing Skills, and English as a Second Language tests are each covered separately to focus on details that differ across these curricular areas.


Mathematics

Content domains

COMPASS offers Mathematics placement tests in five different content domains: numerical skills/prealgebra, algebra, college algebra, geometry, and trigonometry. Detailed diagnostic tests are also offered for both the numerical skills/prealgebra and algebra domains. The content and structure of the math item pools supporting the mathematics content domains and diagnostic areas are described in Part 2, Chapter 2, "Mathematics tests," in this Reference Manual.

Although the adaptive testing process is very efficient, administering all five content domains and diagnostic areas to every examinee would be time-consuming and either frustrating or tedious for examinees. Many students would take tests in completely unfamiliar or all-too-familiar content domains, and administering all possible domains is inefficient from a measurement standpoint. For example, there is little benefit to administering the college algebra domain to a student who has already done poorly in the numerical skills/prealgebra domain. Conversely, because a student who excels in college algebra is an unlikely candidate for placement in a developmental math course, administering the COMPASS Numerical Skills/Prealgebra diagnostic tests to this student would not contribute useful information.

Administering a test package that has a content range that is too narrow is also a potential problem. A student placed in a beginning algebra course on the basis of high performance in the numerical skills/prealgebra domain might have demonstrated a readiness for more advanced coursework if the algebra, college algebra, and trigonometry tests had also been administered. Placement in a calculus course would be appropriate for a student who demonstrates proficiency on the college algebra and trigonometry measures.

As a general rule, examinees should not be given test items that contribute little or no information from a measurement standpoint. All of the items that a student is presented should be relevant to the placement decision. Conventional paper-and-pencil tests fall short of this goal. This is a key benefit of using COMPASS routing for mathematics testing.

Advantages of routing

COMPASS not only tests multiple mathematics content domains independently, but also corrects the flaws of conventional paper-and-pencil placement tests. First, COMPASS optimizes measurement and solves the problem of inefficiency by administering items automatically selected by the software. The computer-based test controls how examinees are routed within and between the content domains and diagnostic assessments. This saves time because no break or interaction with an adviser is necessary between completing one content domain and starting the next. An examinee also may be unaware when testing in one domain has ended and testing in another has started. Test scoring and routing are immediate. However, academic advisers remain important parts of the COMPASS testing process because they must specify the cut scores used by the computer to route students through the tests.

To control how students are guided through the process, three sets of rules must be followed. The first set specifies the available content domain or diagnostic assessment that will be presented as the initial domain to a particular examinee. Proper selection of the initial domain facilitates efficient student placement. Next, the inter-domain routing rules must be specified. These rules state the circumstances under which examinees who have finished one domain are to be sent to another domain or to a diagnostic assessment. Finally, the rules governing the composition of the diagnostic assessments can be defined by selecting from a list of available diagnostic tests.

Specifying the initial domain

The basic idea behind adaptive testing is that selection of each new item to be presented is based on the examinee's prior performance. Examinees displaying capable math skills are administered increasingly more difficult items; conversely, examinees who are less capable are given easier items. Although the concept works well when an examinee's performance level has been established, one problem is what to do until that point is reached. Considerable testing time can be saved by starting examinees with items appropriate to their skill level. For example, it would make little sense to start a student in the college algebra domain if the student had a minimal mathematics background. Accordingly, COMPASS offers various options for selecting which mathematics content domain is presented first to an examinee.

Specify domain directly. One option is to specify the initial domain directly when defining a test package. Any mathematics content domain or diagnostic assessment can be selected as the initial domain. Specifying the initial domain or assessment is recommended in two circumstances:

(1) when testing students who are fairly homogeneous with respect to their mathematics skill level or

(2) when little or nothing is known about students before their testing.

In either case, the same carefully chosen initial domain will apply equally well to all students. For example, if all or most students being tested using a particular test package are known to have similar backgrounds that do not include previous algebra courses, it would be reasonable to start each student in the numerical skills/prealgebra domain. When nothing is known about students' backgrounds or capabilities, algebra seems a good choice given its central location in the mathematics domain hierarchy. From this starting point, students can be moved quickly up to college algebra or down to numerical skills/prealgebra. The default COMPASS standard test package uses this strategy.

Self-selecting initial domain. An alternative approach for use with students with unknown backgrounds is to allow examinees to select their own initial domain from the limited domain choices of numerical skills/prealgebra, algebra, or college algebra. Although some students may over- or under-estimate their capabilities and start in an inappropriate domain, the net result of self-selection is more likely to increase efficiency than is specifying the same initial domain for all examinees.

Reported test scores. A final option allows the initial domain to be determined on the basis of reported scores from other tests. Two testing programs support this purpose: ASSET and the ACT Assessment Program (AAP). Students can provide ASSET scores in numerical skills, elementary algebra, intermediate algebra, or college algebra. The rules used to determine the initial COMPASS math domain from these scores are described below, along with information about how the AAP Mathematics scores are mapped to initial domains. Table (2)6.1 is used to determine the appropriate initial content domain to be presented in the COMPASS Mathematics placement tests, based on a particular examinee’s AAP or ASSET Mathematics scores.

TABLE (2)6.1 Initial content domain based on AAP or ASSET scores

                                            Initial domain
Input AAP or ASSET scores        Numerical Skills/Prealgebra    Algebra    College Algebra
AAP Mathematics                  0–15                           16–25      26–36
ASSET Numerical Skills           0–46                           47–55      –
ASSET Elementary Algebra         0–34                           35–53      54–55
ASSET Intermediate Algebra       0–31                           32–51      52–55
ASSET College Algebra            –                              0–34       35–55

An examinee with a reported AAP Mathematics score between 0 and 15 would start testing in COMPASS Mathematics in the numerical skills/prealgebra domain. Examinees with scores between 16 and 25 would start in algebra. Examinees with scores between 26 and 36 would start in college algebra.


Because a student could have as many as four separate scores, using ASSET results to determine the starting domain can be complex. However, it is unlikely that any examinee will take more than two ASSET math tests. When more than one ASSET test score is available, it should first be determined whether the scores map to the same initial domain. For example, if an examinee scored 49 on numerical skills and 51 on elementary algebra, the initial domain determined by both scores would be COMPASS Algebra. No conflict is present. When multiple scores do conflict (for example, if an examinee scored 49 in numerical skills and 31 in elementary algebra), the conflict is resolved by selecting as the initial domain the highest domain determined by any available ASSET score. In the preceding example, the numerical skills score of 49 suggests algebra as the initial domain, and the 31 in elementary algebra suggests numerical skills/prealgebra. Because algebra is the higher domain, it would be selected.
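As an illustration, the Table (2)6.1 lookup and the "highest domain wins" rule can be sketched as follows. The function name and data structure are hypothetical, and the fallback to a directly specified default domain is an assumption based on the backup procedures described in the next paragraph; the actual COMPASS software is configured through its test-package setup screens, not through code.

# Minimal sketch (not part of the COMPASS software) of selecting the initial
# mathematics domain from reported AAP or ASSET scores per Table (2)6.1.
DOMAIN_ORDER = ["Numerical Skills/Prealgebra", "Algebra", "College Algebra"]

INITIAL_DOMAIN_TABLE = {
    "AAP Mathematics": [(0, 15, "Numerical Skills/Prealgebra"),
                        (16, 25, "Algebra"),
                        (26, 36, "College Algebra")],
    "ASSET Numerical Skills": [(0, 46, "Numerical Skills/Prealgebra"),
                               (47, 55, "Algebra")],
    "ASSET Elementary Algebra": [(0, 34, "Numerical Skills/Prealgebra"),
                                 (35, 53, "Algebra"),
                                 (54, 55, "College Algebra")],
    "ASSET Intermediate Algebra": [(0, 31, "Numerical Skills/Prealgebra"),
                                   (32, 51, "Algebra"),
                                   (52, 55, "College Algebra")],
    "ASSET College Algebra": [(0, 34, "Algebra"),
                              (35, 55, "College Algebra")],
}

def initial_math_domain(reported_scores, default="Algebra"):
    """Map reported AAP/ASSET scores to an initial COMPASS math domain.

    If several scores map to different domains, the highest domain wins.
    If no usable score is supplied, fall back to a directly specified
    default domain (an assumed backup; see the next paragraph).
    """
    candidates = []
    for test, score in reported_scores.items():
        for low, high, domain in INITIAL_DOMAIN_TABLE.get(test, []):
            if low <= score <= high:
                candidates.append(domain)
    if not candidates:
        return default
    return max(candidates, key=DOMAIN_ORDER.index)

# Example from the text: ASSET Numerical Skills 49 maps to Algebra, ASSET
# Elementary Algebra 31 maps to Numerical Skills/Prealgebra; Algebra is higher.
print(initial_math_domain({"ASSET Numerical Skills": 49,
                           "ASSET Elementary Algebra": 31}))  # -> Algebra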

Basing the initial domain on reported mathematics test scores is likely to be superior to either directly specifying the initial domain or allowing examinees to self-select. Therefore, when the relevant scores are available, ACT recommends that they be used for this purpose. Implementing this option does not require that test scores be available for every examinee. If an examinee does not have the required test scores, selection of the initial domain reverts to self-selection or to direct specification. Either alternative may be chosen as the backup procedure.

Routing rules for COMPASS Mathematics

The five content domains that constitute the COMPASS Mathematics placement tests are administered independently of each other. When testing in a given mathematics domain has concluded, the inter-domain routing rules determine if the test session is complete or if testing should proceed to another mathematics domain. When applicable, the routing rules also determine the specific domain to be administered next.

Routing rules are implemented by assigning an outcome to each possible score in each domain. The outcome may be to send the examinee to a new domain or to end the mathematics test session. When a student has finished testing in a domain, the obtained score will be referred to the set of routing rules that then determines the next course of action.

The goal of an effective set of routing rules is to test examinees in the domain most relevant to their placement decision. Ideally, testing will conclude in this domain as well. For example, if the decision for a student is between placement in an elementary or an intermediate algebra course, testing for this student should end with a placement recommendation following administration of the algebra domain.


Students can be routed from any COMPASS mathematics domain to any other. However, the domains are designed to form at least a partial hierarchy with prealgebra at one extreme and trigonometry at the other. Although the location of the geometry domain in this arrangement is less certain, the algebra and college algebra domains follow numerical skills/prealgebra fairly naturally. Ideally, examinees should be routed between adjacent domains in this ordering rather than jumping, for example, from numerical skills/prealgebra to college algebra.

Diagnostic assessments are handled somewhat differently from content domains. Students may be routed to appropriate diagnostic tests from either the numerical skills/prealgebra or algebra domains because the diagnostics available are most closely related to these domains. For example, routing to the Numerical Skills/Prealgebra diagnostic tests should be through the numerical skills/prealgebra domain. Similarly, students should be routed to the Algebra diagnostic tests on the basis of their performance in the algebra content domain.

The process of developing an effective set of routing rules is similar to that required for developing placement messages. Both begin with faculty review of the content of the COMPASS domains and then a comparison of that content with the curriculum and prerequisites for courses into which students may be placed. Using this comparison, guidelines are formulated for each domain. COMPASS provides a standard default test package with a routing table for the Mathematics placement tests. The routing rules in the default test package are based on empirical and anecdotal data from institutions using COMPASS. The routing table for the default test package is shown in Figure (2)6.1 and is discussed in detail in the following sections.


FIGURE (2)6.1 Standard default test package routing table

Numerical Skills/Prealgebra

As Figure (2)6.1 indicates, a student whose score meets or exceeds 55 on the Numerical Skills/Prealgebra Placement Test has sufficient skill for placement into an algebra course. The student should be routed to the algebra domain to determine whether an elementary or intermediate course is indicated or if more advanced testing is needed. A student scoring below 55 in numerical skills/prealgebra may be ready for placement into an algebra course but should not skip algebra; this numerical skills/prealgebra score would be used along with cut scores to determine placement in a course ranging from numerical skills to elementary algebra.

Algebra

Figure (2)6.1 also shows that a student scoring below 25 on the Algebra Placement Test is not yet ready for a course beyond algebra. Placing this student appropriately will require additional performance information from the numerical skills/prealgebra domain. A student scoring between 26 and 64 inclusive in algebra would likely be ready for a college algebra course but not a more advanced course. The algebra domain is ideal for deciding which college-level algebra course is most suitable. Students scoring above 65 should be routed to the college algebra domain to determine whether they should be placed in more advanced courses.

College Algebra

A student scoring below 30 on the College Algebra Placement Test is probably not yet ready for a college algebra course; the examinee would be routed to the algebra domain for further testing. However, the path to the trigonometry domain begins at a score of 50 because course sequencing is more variable at this level of math ability. Students take geometry and trigonometry at different points in their development, so less information can be inferred from their scores in other domains and more information is needed.

Geometry

The default COMPASS mathematics test package does not include any rule for routing students to or from the geometry domain. If students begin testing in any other math domain, they cannot be routed to geometry based on the routing rules in the default test package. Students may begin testing in the geometry domain, but they cannot be routed from it to other domains. To provide routing rules for the geometry domain, it is necessary to provide appropriate score ranges in the geometry section of the routing rules table. Figure (2)6.2 provides sample geometry routing rules.


FIGURE (2)6.2 Sample geometry routing rules

As shown in Figure (2)6.2, a student scoring below 50 on the COMPASS Geometry Placement Test is most likely unprepared for a trigonometry course and will be routed to the College Algebra Placement Test for further testing. A student who scores 50 or higher likely has reasonable geometry skills and should be routed to the trigonometry domain to determine appropriate course placement. Note that a student who is routed from the geometry domain to the college algebra domain may still be routed to the trigonometry domain by scoring 50 or higher on the College Algebra Placement Test.

Trigonometry

A student scoring 30 or below on the COMPASS Trigonometry Placement Test is likely not yet ready for a course requiring trigonometry as a prerequisite. Furthermore, the college algebra domain is probably the best for determining which course is most appropriate. Students whose trigonometry scores exceed 30 are best placed on the basis of their trigonometry score.


The routing table for the default math test package (Figure (2)6.1) illustrates two important characteristics of a routing rule set. First, the score ranges entered in each column of the table must be mutually exclusive (no overlap). Only a single action can be indicated for any given score in a domain. Second, the score ranges entered in a column need not be exhaustive or cover the full range of 1 to 99; an examinee obtaining a score that does not fall into any of the ranges listed in a column is simply finished testing. For example, an examinee who begins testing in the algebra domain and obtains a score of 60 is finished testing because this score falls within the range of 26 to 64, and no routing instructions are given for this range. The examinee would then be placed on the basis of the algebra score. For details on developing cutoff scores and placement messages, see Part 2, Chapter 7, "Cutoff scores," in this Reference Manual.

Examinees are also finished testing if the routing table directs them back to a domain in which they have already been tested (e.g., if an examinee begins testing in the algebra domain and obtains a score of 20). The algebra column of the routing table indicates that a student with this score would be sent next to the numerical skills/prealgebra domain. If a score of 85 is obtained there, the routing rules direct the examinee back to algebra. Because algebra has already been administered, the test session is considered complete.

Two conditions can end a COMPASS mathematics testing session: a score is obtained for which either (1) no routing instructions are given or (2) the routing instructions direct the student back to a domain in which the student has previously been tested. These conditions can be unified into a single general rule if one considers score ranges for which no routing instructions are offered to be the same as sending the student back to the domain just tested. For example, in the routing model shown earlier, the examinee who scores 60 in algebra can be regarded as having been routed back to algebra, which has already been administered. Therefore, the placement decision is made based on the score in the algebra domain.

Routing rules and placement domains

When a student has completed two or more mathematics content domains, the circumstances under which a test session ends will determine the domain that generates the placement recommendation. The domain used for placement is always the domain to which the student is last routed when the test session ends. For example, the testing of the student who scored 60 in algebra (see Figure (2)6.2, Sample geometry routing rules) ended when the student was sent back to the algebra domain; the algebra domain is then used for placement. Another example is a student who scores 20 in algebra, then is sent to numerical skills/prealgebra and scores 85; the test session would end because the score routed the student back to algebra, which would be the placement domain as well. The domain to which an examinee was last routed does not need to be the domain in which the examinee was last tested.
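A brief sketch can tie the routing discussion together. The score ranges below follow the prose descriptions of Figures (2)6.1 and (2)6.2; boundary values the prose leaves ambiguous (for example, an Algebra score of exactly 25 or 65) are assumptions, and the function only illustrates the termination and placement-domain rules described above, not the COMPASS implementation.

# Minimal sketch of the default math routing rules and session logic.
# domain -> list of (low, high, next domain); a score with no matching range
# ends the session in the current domain.
DEFAULT_MATH_ROUTING = {
    "Numerical Skills/Prealgebra": [(55, 99, "Algebra")],
    "Algebra": [(1, 25, "Numerical Skills/Prealgebra"),   # boundary at 25 assumed
                (65, 99, "College Algebra")],             # boundary at 65 assumed
    "College Algebra": [(1, 29, "Algebra"),
                        (50, 99, "Trigonometry")],
    "Geometry": [(1, 49, "College Algebra"),              # sample rules, Figure (2)6.2
                 (50, 99, "Trigonometry")],
    "Trigonometry": [(1, 30, "College Algebra")],
}

def run_math_session(initial_domain, take_test, routing=DEFAULT_MATH_ROUTING):
    """Route an examinee through the math domains and return the placement domain.

    `take_test(domain)` stands in for administering the adaptive test in a
    domain and returning its 1-99 score. The session ends when a score has no
    routing rule or when the rules point back to an already-tested domain; the
    placement domain is the domain the examinee was last routed to.
    """
    tested = []
    domain = initial_domain
    while True:
        score = take_test(domain)
        tested.append(domain)
        next_domain = domain  # no matching rule counts as "routed back here"
        for low, high, target in routing.get(domain, []):
            if low <= score <= high:
                next_domain = target
                break
        if next_domain in tested:
            return next_domain  # placement is based on this domain's score
        domain = next_domain

# Example from the text: start in Algebra and score 20, then score 85 in
# Numerical Skills/Prealgebra; the 85 routes back to Algebra, which has already
# been tested, so the session ends with Algebra as the placement domain.
scores = {"Algebra": 20, "Numerical Skills/Prealgebra": 85}
print(run_math_session("Algebra", lambda domain: scores[domain]))  # -> Algebra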

Routing rules and initial domains

For optimal effect, routing rules must be properly coordinated with the procedures selected for determining the initial domain. Students who begin in a mathematics domain that is too easy or too difficult for them should be given the opportunity to move on to a more appropriate domain. The exception is a test package designed to deliver only a diagnostic assessment from which no routing is possible. Unless it is known that a large proportion of students have math skills at the lower end, it may be best to start students in a content domain near the center of the hierarchy (algebra or college algebra) and allow the routing rules to move examinees up or down based on their performance. Students whose known skills or math backgrounds indicate little or no facility with algebra may be best started in the numerical skills/prealgebra domain.

A content domain can be excluded from presentation entirely if the routing rules do not provide students with a path to reach it. This capability is useful in narrowing the focus of the COMPASS Mathematics placement tests to one or two key domains. For example, a mathematics test session could consist exclusively of presentation of the geometry domain. Clearly, the decision to exclude certain domains from testing must be reconciled with the procedure selected to determine each student’s initial domain. If geometry alone is to be presented, geometry should also be directly specified as the initial domain for all students. In this case, allowing students to select their initial domain or using input test scores to select this domain would be counterproductive.

Assembling diagnostic assessments

One final means of controlling testing in the mathematics curricular area is by selecting the individual tests that comprise the Numerical Skills/Prealgebra and Algebra diagnostic tests. Seven diagnostic areas are available for Numerical Skills/Prealgebra: basic operations with integers; basic operations with fractions; basic operations with decimals; exponents, square roots, and scientific notation; ratios and proportions; percentages; and averages. The Algebra domain offers eight diagnostic areas: substituting values; setting up equations; basic operations with polynomials; factoring polynomials; linear equations in one variable; linear equations in two variables; exponents and radicals; and rational expressions. Any or all of the available tests of each type can be chosen to constitute the diagnostic assessment when defining a test package.


COMPASS Reading and ESL Reading

Unlike the COMPASS Mathematics placement tests, the COMPASS Reading and ESL Reading tests are administered from single, broad content domains. This is due to the nature of course offerings at most postsecondary institutions. For the reading area, fewer levels of standard entry-level courses are typically offered. The COMPASS Reading Placement Test is most frequently used to place students into either a developmental reading course or into regular courses in their degree program. The ESL Reading Proficiency Test is used to place non-native English-speaking students into various levels of ESL courses.

Sequencing of passage types

The COMPASS Reading Placement Test uses an item pool consisting of passages of 190 to 300 standard words. These passages are of five text types: prose fiction, humanities, social sciences, natural sciences, and practical reading. The test administrator is able to indicate the text types to be administered as well as the order in which they will be presented. If desired, the administrator can designate only a single text type or any combination of types. For more information on COMPASS Reading passages and items, please refer to Part 2, Chapter 1, "Reading tests," in this Reference Manual.

COMPASS Reading Diagnostics

If examinees score poorly on the COMPASS Reading Placement Test, they can be routed to the Reading diagnostic measures: the COMPASS Reading Comprehension and COMPASS Vocabulary diagnostic tests. Users may also choose to administer the COMPASS Reader Profile to learn about the student's reading habits. Scores on the diagnostics may help explain an examinee's low performance on passages in the COMPASS Reading Placement Test. The diagnostic items do not count toward a student's placement score; rather, they are intended to perform a diagnostic function. Students who take these items will have a separate statement appended to their placement scores indicating their performance on the diagnostics. These scores may suggest certain forms of instructional intervention that are appropriate for the student.

Routing rules

Contingent upon performance on the COMPASS Reading Placement Test, examinees can be routed to the diagnostic items. Typically, low-ability examinees would be routed to the diagnostic tests. A sample routing table for the Reading Placement Test is shown in Figure (2)6.3. Examinees can be routed from the Reading Placement Test into one or more of the three subtests: Reading Comprehension Diagnostic, Vocabulary Diagnostic, and the COMPASS Reader Profile.

ESL students cannot be tested first in COMPASS Reading. However, they can be started in ESL Reading, and those who score high (e.g., Level 4) on ESL Reading can be appropriately routed to COMPASS Reading. It is not possible to route students into ESL Reading from COMPASS Reading or to route an ESL student who scores low (e.g., Pre-Level 1 or Level 1) on the ESL Reading Proficiency Test to the COMPASS Reading diagnostics. The COMPASS Reading placement and diagnostic tests cannot be used to distinguish between native and non-native English speakers.

FIGURE (2)6.3 Sample routing table for COMPASS Reading Diagnostics
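As an illustration only, the kind of routing shown in Figure (2)6.3 might be represented as follows; the cutoff value and the decision to administer all three components to low scorers are hypothetical assumptions, since the actual cutoff scores are defined by each institution rather than taken from the figure.

# Hypothetical sketch of routing low-scoring examinees from the Reading
# Placement Test to the diagnostic components (cutoff value is illustrative).
READING_DIAGNOSTIC_CUTOFF = 70  # hypothetical placement-score threshold

def reading_followups(placement_score):
    """Return the diagnostic components to administer after the placement test."""
    if placement_score < READING_DIAGNOSTIC_CUTOFF:
        return ["Reading Comprehension Diagnostic",
                "Vocabulary Diagnostic",
                "COMPASS Reader Profile"]
    return []  # higher-scoring examinees finish with the placement test

print(reading_followups(55))  # -> all three diagnostic components
print(reading_followups(88))  # -> []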

Writing Skills, Direct Writing, and ESL Grammar/Usage

The COMPASS/ESL system has three writing placement components and eight writing diagnostics. The COMPASS Writing Skills Placement Test is a multiple-choice assessment intended for placing English-speaking students into appropriate levels of standard English writing courses. The second component is a direct writing assessment comprising COMPASS e-Write (2–12 or 2–8) or ESL e-Write (2–12 or 2–8); the latter is specifically designed for English language learners. The third COMPASS writing component is the ESL Grammar/Usage Proficiency Test, a multiple-choice test intended for placing ESL students into appropriate levels of ESL courses. The eight COMPASS Writing Skills diagnostic test domains are intended for use primarily with English-speaking students who do not score at or above the cutoff score for placement into a standard entry-level English course.

Using a computer-administered-and-scored format, the COMPASS Writing Skills Placement Test is designed to give an efficient assessment of specific writing skills to help determine students' readiness for entry-level college English courses. Students who perform poorly on the test will likely be placed into a developmental English or writing course; students who perform well will be placed into the standard entry-level course or higher. For details on the content of the COMPASS Writing Skills Placement Test, please refer to Part 2, Chapter 3, "Writing Skills tests."

COMPASS e-Write complements COMPASS Writing Skills by providing additional information about students’ generative writing capabilities, including specific writing domains, such as the ability to focus on and support specific content, organize writing appropriately, use an appropriate writing style, and follow established conventions for English grammar, usage, and mechanics. COMPASS e-Write is not designed for use with ESL students who score low on the ESL Grammar/Usage Proficiency Test, but it may be appropriate for ESL students who score high (e.g., Level 4) on it. For students whose scores on the ESL Grammar/Usage Proficiency Test are in the lower range (e.g., Level 1 or 2), however, ESL e-Write is the appropriate direct writing assessment option. To learn more about the scoring models and reports for COMPASS e-Write and ESL e-Write, please refer to Part 2, Chapter 5, “Direct writing assessment,” in this Reference Manual.

The prompts in ESL e-Write describe a common aspect of everyday life and are presented in language that is accessible for ESL students. These prompts do not require students to have background or specialized knowledge to successfully respond. ESL e-Write provides additional information on ESL students’ writing abilities relative to developing and supporting a stated or implied position, maintaining a clear and consistent focus, logically organizing ideas, demonstrating appropriate control of language use, and using appropriate writing conventions (e.g., mechanics).

The ESL Grammar/Usage domain is designed to assess the knowledge and skills of students for whom English is not their first language. This domain is intended for use in placing ESL students into appropriate ESL courses based on their English grammar and usage score. For detailed information about the nature and content of the ESL Grammar/Usage proficiency domain, refer to Part 2, Chapter 4, “English as a Second Language tests,” in this Reference Manual.

As with all of the placement domains, placement messages for the COMPASS Writing Skills and ESL Grammar/Usage tests are under the control of the test administrator and institution and should be tailored to meet the needs of the institution. For details on developing placement messages and setting cutoff scores for placement domains, refer to Part 2, Chapter 7, “Cutoff scores,” in this Reference Manual.


COMPASS Writing Skills diagnostics

The eight COMPASS Writing Skills diagnostic domains are: punctuation, verb formation and agreement, usage, relationships of clauses, shifts in construction, organization, spelling, and capitalization. When creating test packages, users may select any of these diagnostic domains to administer. All COMPASS Writing Skills diagnostic domains provide estimates of proficiency in the form of numerical scores that reflect the proportion of items that the examinee would likely answer correctly if all items in the domain were administered.

Routing rules

If Writing Skills diagnostic tests are to be administered depending on the student's score on the Writing Skills Placement Test, examinees can be routed to the Writing Skills diagnostics from the placement test. Typically, low-ability examinees would be routed to one or more of the eight diagnostics. A sample routing table is shown in Figure (2)6.4.

ESL students cannot be tested first in COMPASS Writing Skills. However, they can be started in the ESL Grammar/Usage Proficiency Test, and students who score high (i.e., Level 4) on it could be appropriately routed to COMPASS Writing Skills. It is not possible to route students into ESL Grammar/Usage from the COMPASS Writing Skills or COMPASS e-Write tests or to route students who score low on ESL Grammar/Usage into the Writing Skills diagnostics. The COMPASS Writing Skills placement and diagnostic tests and COMPASS e-Write cannot be used to distinguish between native and non-native English speakers.


FIGURE (2)6.4 Sample routing table for the COMPASS Writing Skills Diagnostic

ESL Listening

The ESL Listening Proficiency Test is a single broad skill domain. Depth and precision are controlled in ESL Listening by specifying test length: standard, extended, or maximum. Each test length option sets a minimum number of ESL Listening items (a lower limit on test length) and a maximum number of items (an upper limit), which prevents individual examinees from receiving very long tests. For more information on the ESL Listening Proficiency Test, please refer to Part 2, Chapter 4, “English as a Second Language tests,” in this Reference Manual.


Chapter 7: Cutoff scores

Overview

The COMPASS system is designed to provide maximum flexibility and institutional control. Different groups of students can be placed according to different sets of rules. For example, a student majoring in humanities and a student majoring in science can receive different course placement recommendations even if both students performed identically on the tests. The key to this placement framework is for an institution to integrate cutoff scores with placement messages. Cutoff scores are the points on the score scale, as determined by the institution, at which examinees are classified as either demonstrating or failing to demonstrate a particular level of proficiency required to enter a given course. Placement messages are assigned to an examinee’s score based on that score’s position relative to (i.e., above or below) the cutoff score. COMPASS tests use a scale score range that generally extends from 1 to 99. This chapter provides information about the development and use of placement messages and cutoff scores, with suggestions for their integration.

Developing placement messages

The core of the COMPASS framework is the placement message that is associated with a student’s score. Both the score and the placement message are displayed on a student’s score report. While the COMPASS software includes default placement messages for all COMPASS tests, these default messages are only intended to provide a model for users. To ensure that placement messages align with local needs, institutions should customize placement messages to either indicate assignment to a specific course (e.g., “Recommend placement in Math 101”) or convey a general impression of the student’s performance (e.g., “Recommend placement in an introductory math course to strengthen your algebra skills”).

An institution can establish major groups consisting of several different educational programs or majors. Based on the test score earned, an examinee from a particular major group will receive the specific placement message assigned for that particular group.

A different placement message is assigned for each user-defined score range in a given COMPASS placement test domain. For example, “Recommend placement in Math 099 to further develop numerical skills” may be assigned to the score range of 0 to 40 in the numerical skills/prealgebra content domain. Because the COMPASS Mathematics placement tests are administered in five different content domains, a separate set of placement messages must be specified for each domain. The COMPASS Reading and Writing Skills tests and the ESL Grammar/Usage, Reading, and Listening tests are each administered from a single, broad content domain, so only one set of placement messages needs to be specified for each of these tests. A complete set of placement messages, called a message group, covers all contingencies within a given content area by collectively assigning placement messages to each possible score point.

Specific messages for majors

The institution can link placement recommendations to each student’s educational program or major by defining several message groups for the same domain or content area; each message group can be tailored to a different major group of students. Science majors, for example, can be placed according to rules different from those used for humanities majors. It is unlikely that every educational program or major will need its own unique placement messages, so there generally will be fewer message groups than educational programs. Accordingly, individual educational programs are each assigned to one of a common set of message groups. For example, if biology, chemistry, and physics majors are all to be placed by the same set of rules (perhaps into a sequence of prerequisite math courses), these rules could be entered as the “Science” message group. The individual science majors would then be assigned to this major group. Similarly, history, sociology, and psychology majors may each be assigned to a “Social Sciences” message group.

In COMPASS Mathematics tests with placement messages defined separately for each of the five content domains, a complication arises when testing students in two or more domains. Theoretically, students could receive a separate placement message from each domain tested, and the recommendations made in these messages could conflict. The system, therefore, must sort out these possible conflicts and settle on a single placement message for each student. Ideally, placement should be based on performance in the mathematics content domain most representative of a student’s current level of knowledge and skills. Conflicting recommendations are best avoided by careful construction of message groups. The following sections discuss methods for constructing these groups through the appropriate selection of cutoff scores.

Developing cutoff scores

Developing cutoff scores is a two-stage process. The first stage involves setting initial cutoff scores for courses in the local curriculum, while the second stage involves validating the initial decisions with course outcome data of students placed into particular courses. There are several ways to set initial cutoff scores.


To best reflect the local curriculum, most institutions will require locally developed course placement messages for students. Over time, these messages will use cutoff scores derived by comparing locally compiled COMPASS scores with course outcome data. Some important considerations in determining cutoff scores are the following:

• the percentage of students that can be placed into the various courses (e.g., developmental and standard courses)

• an individual student’s probability of succeeding in the standard entry-level course

• the percentage of students that will be correctly placed, both in the developmental and the standard courses

Setting initial cutoff scores (Stage 1)

Setting a cutoff score is most problematic when a test is being used for the first time and no data exist relating student performance in class to student test scores. In such cases, an institution can use expert judgment, information from other tests, or cutoff scores currently in place at an institution with similar curricula and students. These methods are described below.

Expert judgment

For an institution using expert judgment to set initial cutoff scores, faculty members who are knowledgeable about course content can evaluate the content of the tests by examining the COMPASS Sample Items booklet for each domain or content area. The faculty members then would decide the percentage of items that a student needs to answer correctly to demonstrate sufficient proficiency for placement in various courses. Shepard (1984) describes methods that can be employed to determine a cutoff score using expert judgment.

Concordance of COMPASS with other measures

If another placement test is currently being used, institutions may wish to set COMPASS cutoffs that will result in placement outcomes similar to those derived from the current test. Because COMPASS tests and the currently used placement tests may not be strictly parallel, all methods for deriving comparable scores are psychometrically problematic to some degree. However, the following method is designed to yield COMPASS cutoff scores that will result in similar percentages of students placed in courses as are placed using the current test. Although this method is appropriate for converting cutoff scores from one test to another, it is not appropriate for converting an individual student’s score from one test to another test.


To employ this method, several conditions must be satisfied. First, the currently used test must be similar in content and function to the COMPASS test. Second, samples of examinees to whom both tests have been administered must be comparable (i.e., similar in the distribution of student abilities). If these conditions are not met, the COMPASS cutoff scores may not result in the same percentage of students placed in a course, or the students placed may differ in skills and abilities from those students placed using the current test. Furthermore, because the currently used test and COMPASS tests are not parallel forms of the same test, cutoff scores determined using this method should be considered preliminary estimates that should later be validated and modified as needed.

The steps required to compare COMPASS tests with tests currently used are outlined below, followed by a brief illustrative sketch of the calculation. The comparisons should be computed separately for each curricular area.

• Collect data from a previous administration of the placement test (or tests) currently used. Choose a representative group of at least 500 examinees tested for placement purposes. The selected group must provide adequate numbers of students at the score levels at which comparisons of cutoff scores will be made.

• Administer the similar COMPASS test (or tests) to a comparable group of students. Alternatively, the above step can be skipped and the similar COMPASS and current tests can be administered to the same group of students in alternating order: COMPASS test first for half of the students and the currently used test first for the remaining half.

• Compute the percentage of students who score below the established cutoff score for the test currently used.

• Determine the COMPASS test cutoff score that identifies the same cumulative percentage of students as the cutoff score for the current test. This is the corresponding cutoff score for COMPASS.
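A minimal sketch of the percentage-matching step, assuming synthetic score arrays in place of the locally collected data described above; the cutoff value and sample sizes are hypothetical.

    import numpy as np

    # Minimal sketch of the percentage-matching calculation described above.
    # The score arrays are synthetic stand-ins for locally collected data from
    # comparable groups; the cutoff value is hypothetical.
    rng = np.random.default_rng(0)
    current_test_scores = rng.normal(50, 15, size=500).clip(1, 99)   # currently used test
    compass_scores = rng.normal(55, 18, size=500).clip(1, 99)        # similar COMPASS test

    current_cutoff = 42   # established cutoff on the currently used test

    # Percentage of students scoring below the established cutoff on the current test.
    pct_below = 100 * np.mean(current_test_scores < current_cutoff)

    # COMPASS score that identifies the same cumulative percentage of students.
    compass_cutoff = np.percentile(compass_scores, pct_below)

    print(f"{pct_below:.1f}% score below the current cutoff")
    print(f"Corresponding preliminary COMPASS cutoff: {compass_cutoff:.0f}")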

If the currently used placement test is an ACT program (i.e., ASSET or AAP), ACT can provide concordance tables for use in setting corresponding cutoff scores for COMPASS Mathematics, Writing Skills, and Reading.

Using cutoff scores from similar institutions

It is also possible for new COMPASS institutions to set initial cutoff scores based on cutoff scores used by COMPASS user institutions that have similar curricula. Alternatively, institutions may choose to use the cutoff scores set in the default COMPASS test packages. The cutoff scores in the default test packages are based on ACT’s knowledge of typically used cutoff scores. For more information on cutoff scores based on data from current COMPASS users, refer to “Evidence of predictive validity,” in Part 3, Chapter 2, “Validating uses of COMPASS tests,” in this Reference Manual.

Using local norms

Norms (score distributions showing the proportions of students at or below each score) can be obtained from local or national administrations. Local norms are preferable because they are better predictors of the distributions of scores that will be observed in the institution. By definition, local norms are obtained from student scores at the institution. If local norms are not available and cannot be readily obtained, national norms can be helpful. When using national rather than local norms, however, the norm group may not be representative of the students who will actually be tested; caution should therefore be exercised, because the proportions of students at or below a given score in the norms may not match the proportions actually observed locally.

In the absence of COMPASS scores paired with course outcomes data, institutions may establish initial cutoff scores on the basis of administrative factors, such as the availability of instructional staff or facilities. In such cases, the proportion of students that can be placed at the developmental level is an important consideration. Score distributions from students at the institution (i.e., local norms) can be used under these conditions to provide preliminary cutoff points. For example, an institution may be limited to placing no more than 30% of its entering students in a prealgebra course. The local COMPASS score distribution for a given curricular area can be used to identify the test score below which 30% of the students have scored.
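A minimal sketch of the capacity-based use of local norms described above, with synthetic scores standing in for the institution's own distribution and the 30% figure mirroring the example in the text.

    import numpy as np

    # Minimal sketch of using a local score distribution to find the score below
    # which a target percentage of entering students fall. The synthetic scores
    # stand in for an institution's own COMPASS scores in one curricular area;
    # the 30% capacity figure mirrors the example in the text.
    rng = np.random.default_rng(1)
    local_scores = rng.normal(45, 20, size=2000).clip(1, 99)

    capacity_pct = 30
    preliminary_cutoff = np.percentile(local_scores, capacity_pct)

    print(f"Place students scoring below about {preliminary_cutoff:.0f} "
          f"({capacity_pct}% of the local distribution) in the prealgebra course")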

Cutoff scores based on locally determined score distributions are relatively easy to communicate and implement in a placement system. If cutoff scores are determined only on the basis of a desired number of students in the developmental course, however, some students may be placed at a level incompatible with their needs. For example, if a cutoff is set at the score below which 30% of the students score, but in fact 40% of the entering students lack the prerequisite skills for doing standard-level work, then 10% of the students will be inappropriately placed.

Using national norms

As an alternative to using local norms, institutions may estimate the numbers of students who would fall below particular cutoff scores by looking at distributions of scores from national norms. Knowing the proportions of students nationally that would likely fall below various cutoff scores provides a means of estimating the numbers of students that would be placed into classes based on various cutoff scores. For administrative reasons (such as availability of instructional staff or facilities), estimating the numbers of students that will be placed into various classes given certain cutoff scores is sometimes an important consideration. For example, an institution may not wish to, or be able to, accommodate more than 30% of incoming ESL students in the lowest-level ESL course. Although setting cutoff scores based on national score distributions is relatively easy to communicate and implement, basing cutoff scores on norms is less desirable than basing them on student readiness for a particular level of course. Entering Student Descriptive Reports are available from ACT, one for two-year and one for four-year schools. If local norms are not available, institutions may wish to review the score distributions obtained from these national samples of students for their type of school.

Cutoff scores versus decision zones

As an alternative to using a single cutoff score in the placement process, an institution can employ a decision-zone approach. In mathematics, for example, the college could identify a score range of 50 to 70 in the algebra domain as a decision zone for placement into an elementary algebra course. Students scoring above the decision zone would be placed into an elementary algebra course, and those scoring below the decision zone would be placed into a prealgebra course. Students scoring within the decision zone would be advised that their skills indicate borderline readiness for elementary algebra and that enrolling in the prealgebra course or using some other skill-building service could be options for them to improve skills required for elementary algebra. Another option might be enrolling directly into the elementary algebra course with the awareness that most of the students in it probably would have stronger skills in the prerequisite areas. The student and adviser should also consider other potential indicators of readiness, such as obtained GPA in previous math courses.
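A minimal sketch of the decision-zone logic just described, using the illustrative 50 to 70 algebra-domain zone; the course names and advisory wording are examples, not COMPASS defaults.

    # Minimal sketch of the decision-zone approach described above, using the
    # illustrative algebra-domain zone of 50 to 70. The course names and the
    # advisory wording are examples, not COMPASS defaults.

    def algebra_placement(score, zone_low=50, zone_high=70):
        if score > zone_high:
            return "Place into elementary algebra"
        if score < zone_low:
            return "Place into prealgebra"
        # Inside the decision zone: borderline readiness; advising is recommended,
        # and other indicators (e.g., prior math grades) should be considered.
        return ("Decision zone: advise student; options include prealgebra, "
                "a skill-building service, or elementary algebra")

    for s in (45, 60, 82):
        print(s, "->", algebra_placement(s))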

Summary of procedures for setting initial cutoff scores

As indicated previously, many procedures can be used to set cutoff scores. No single method is best for all institutions and situations. It is wise to consult a measurement expert to help choose and implement a procedure.

• If an institution has qualified judges available who are familiar with both the test and the course content and who can specify a borderline performance level, then cutoff scores can be derived initially on the basis of an estimated level of performance on the test(s) relative to the demands of the specific course(s).

• If a current test is in place or COMPASS will be used in conjunction with another test, an institution might wish to derive COMPASS cutoff scores that are comparable to the other test’s cutoff scores. However, methods for deriving comparable cutoff scores have many limitations because the tests and samples may differ. If this is the case, comparisons between the two tests may be misleading.

• Institutions may wish to set initial cutoff scores based on the cutoff scores currently in use at similar institutions.

• Another option would be using cutoff scores that are included in the default test administration packages that come in the COMPASS software.

• If an appropriate sample of examinees is available, the use of local norming studies would be a viable approach to establishing cutoff scores. Alternatively, looking at national norms can be helpful in considering initial cutoff scores.

All initial cutoff score methods involve a degree of subjectivity, which varies according to the amount and type of supporting information used in making the decision.

Validating cutoff scores (Stage 2)

When a procedure has been selected for establishing initial cutoff scores, it is essential that the institution monitor the effectiveness of the cutoff scores. Adjustments to the initial cutoff scores may be needed. Information provided in COMPASS research reports illustrates the effectiveness of the cutoff scores for course placement. Cutoff scores should help institutions place students in courses that are appropriate to their skill level and in which they have a high probability of success.

Probability of success

An approach to verifying the effectiveness of cutoff scores is based on the statistical relationship between test scores and the probability of success in the standard course. Traditionally, linear regression or correlation methods have been used to validate the relationship between test scores and course outcomes in placement situations. Strong linear relationships (as indicated by high correlations) between the test score and course grades suggest that the test is functioning well as a placement tool.

However, because correlations are sensitive to restriction of range, they tend to underestimate the strength of the relationship between the test scores and course grades when students already have been selected by pre-existing placement systems. In addition, a correlation indicates the average strength of the linear relationship across different values of the test score scale. The correlation does not focus on the cutoff point; thus, it is not possible to determine the accuracy of a cutoff score that identifies students who need developmental instruction.


Because of the shortcomings of these linear methods in the placement context, ACT has developed an alternative approach to evaluating placement systems that is based on decision theory (Sawyer, 1989). This method uses logistic regression to relate test scores to the probability of success in the standard course. For example, an institution might decide that a student needs a grade of C or higher to pass elementary algebra; students who dropped out or earned grades below C would be considered unsuccessful, and students who completed the course with grades of C or higher would be considered successful.

Using this probability information, an institution can then estimate the effects of various cutoff scores. The percent of correct and incorrect placement decisions can be estimated for each potential cutoff score, allowing the institution to evaluate cutoff scores based on the percentage of students who are correctly placed in both developmental and standard courses. Institutions using COMPASS can participate in the ACT Course Placement Service, which provides this type of information. (See “Research services: Entry-to-exit tracking and reporting” in the Introduction of this Reference Manual.)
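A minimal sketch of this kind of analysis, fitting a logistic regression of course success on test score and estimating correct-placement rates for candidate cutoffs; the data are synthetic and the calculation is illustrative only, not the operational ACT Course Placement Service computation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Minimal sketch of the decision-theory approach: relate test scores to the
    # probability of success (a grade of C or higher) with logistic regression,
    # then estimate the proportion of correct placement decisions for candidate
    # cutoff scores. The data are synthetic and purely illustrative.
    rng = np.random.default_rng(2)
    scores = rng.uniform(1, 99, size=1000)
    assumed_prob = 1 / (1 + np.exp(-(scores - 50) / 10))   # assumed score-success relationship
    success = rng.random(1000) < assumed_prob              # True = grade of C or higher

    model = LogisticRegression().fit(scores.reshape(-1, 1), success)
    prob_success = model.predict_proba(scores.reshape(-1, 1))[:, 1]

    for cutoff in (40, 50, 60):
        placed_standard = scores >= cutoff
        # Correct decisions: students placed in the standard course who succeed,
        # plus students placed in the developmental course who would not have.
        correct_rate = np.mean(np.where(placed_standard, prob_success, 1 - prob_success))
        print(f"Cutoff {cutoff}: estimated correct placement rate {correct_rate:.2f}")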

Developing initial ESL cutoff scores

The previous discussions about setting initial cutoff scores (Stage 1) and validating cutoff scores (Stage 2) are applicable to the process of setting and validating ESL cutoff scores. With regard to the role of expert judgment in setting initial cutoff scores for ESL courses, a key component of the ESL Grammar/Usage, Reading, and Listening multiple-choice tests is the set of proficiency descriptors. The proficiency descriptors provide concrete indications of students’ knowledge and skills for each of these three multiple-choice ESL tests.

The following list of steps summarizes a process for setting initial ESL Grammar/Usage, Reading, and Listening cutoff scores using the proficiency descriptors as an integral part of expert judgment.

• Step 1: Review ESL course descriptions and objectives.

• Step 2: Review descriptions of ESL Grammar/Usage, ESL Reading, and ESL Listening tests. In this Reference Manual, please refer to Part 2, Chapter 4, “English as a Second Language tests.”

- Study proficiency descriptors.

- Study sample items.

• Step 3: Decide which (one or more) ESL test will be used for each course.

• Step 4: For each ESL course, compare the intended levels of skills of an entering student to the ESL proficiency levels reported for each domain.


• Step 5: Decide which ESL proficiency level best describes the intended entering student for each course. Students who score at this proficiency level are probably best placed into this course.

• Step 6: Set initial cutoff scores for placement into the course at a domain score within the range of scores associated with the proficiency level.

• Repeat steps for each course.

• Initiate course placement based on cutoff scores.

• Collect faculty judgments regarding accuracy of placement. Collect course outcomes (i.e., grades). Use this information to adjust cutoff scores if necessary.

The expert judgment of ESL faculty should be used to determine the relationship between the proficiency descriptors of the levels in the three ESL modules (Grammar/Usage, Reading, and Listening) and the specific courses in the college’s ESL curriculum. The proficiency descriptors provide profiles of typical student abilities at various levels in the three skill areas. ESL faculty should use these profiles to determine which levels match the desired skill profiles for entering students in the various courses in the ESL curriculum.

Initial ESL Grammar/Usage, Reading, and Listening cutoff scores, however they are determined, must be validated (and modified if necessary) by monitoring the course outcomes of the placed ESL students. This Stage 2 validation is as essential for ESL cutoff scores as it is for other COMPASS measures.

Multi-measure placement message

Some institutions wish to make placement decisions based on more than one test score. COMPASS allows for the inclusion of a multi-measure placement message. Up to six different variables can be included in the placement message for COMPASS examinees. COMPASS allows specific self-reported high school grades, overall self-reported high school grades, and scores from other tests to be included as measures taken into account in placement decisions. For example, if an examinee scores near but below the cutoff score for placement in a particular course and also reports a high grade in a corresponding high school course and a high score in another relevant assessment measure, the multi-measure placement message feature allows these factors to be taken into account in the placement decision. For more information regarding construction and use of multi-measure placement messages, see the section on Test Setup in the online “Help” within the COMPASS system.
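A minimal sketch of a multi-measure rule of the kind described above; the actual rule is configured in the COMPASS software, and all thresholds, grades, and measure names here are hypothetical, institution-chosen values.

    # Minimal sketch of a multi-measure placement rule: a student just below the
    # test-score cutoff may still receive the higher placement when supporting
    # measures are strong. All thresholds, grades, and measure names here are
    # hypothetical, institution-chosen values.

    def placement_message(compass_score, hs_grade, other_test_score,
                          cutoff=70, near_band=5):
        if compass_score >= cutoff:
            return "Recommend placement in the standard course"
        near_cutoff = compass_score >= cutoff - near_band
        if near_cutoff and hs_grade in ("A", "B") and other_test_score >= 20:
            return "Recommend placement in the standard course (multiple measures)"
        return "Recommend placement in the developmental course"

    print(placement_message(67, "B", 22))   # near cutoff, strong supporting measures
    print(placement_message(67, "C", 14))   # near cutoff, weak supporting measures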


Chapter 8: Interpretation of scores

Overview

The COMPASS system reports results in different ways. A score scale ranging from 1 to 99 is used to describe performance. For most of the COMPASS tests, this scale conceptually represents the percentage of the entire item pool that examinees would be expected to answer correctly if they were to take all of the items. Although the reported scores represent a percentage correct and, therefore, range from a theoretical lower bound of 1% to a maximum possible value of 99%, the observed lower bound is likely to be greater than 1% (i.e., in the range from 15 to 20%) because examinees can get some items correct by guessing.

Mathematics and Reading test scores

Scores for the COMPASS Mathematics and Reading tests, both placement and diagnostic, are reported as estimates of the percentage of items in each administered content domain that the examinees would answer correctly if they were administered all items in the pool. The estimation is based on calibration information for all of the items in the pool and the responses to the particular set of items taken by the examinees. The scoring takes into account not only the number of correct responses to the items administered but also the item difficulty, the probability of getting the items correct by guessing, and the discriminating power of the items. Thus, examinees who are administered difficult items because of previous correct responses and who answer these difficult items correctly will receive higher scores than examinees who respond correctly to easier items after having answered more difficult items incorrectly.
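A minimal sketch of the idea behind such a score: given an ability estimate, the expected percentage correct is the average of the model-implied probability of a correct response over every item in the pool. A three-parameter logistic (3PL) model and the item parameters below are assumptions made for illustration only; the operational COMPASS calibration and scoring details are documented in Part 3.

    import numpy as np

    # Minimal sketch of an expected percent-correct score from an ability
    # estimate, under an assumed three-parameter logistic (3PL) model with
    # made-up item parameters. This is illustrative only.
    rng = np.random.default_rng(3)
    a = rng.uniform(0.7, 2.0, size=300)     # discrimination
    b = rng.normal(0.0, 1.0, size=300)      # difficulty
    c = rng.uniform(0.15, 0.25, size=300)   # pseudo-guessing

    def expected_percent_correct(theta):
        p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))
        return 100 * p.mean()

    for theta in (-2.0, 0.0, 2.0):
        print(f"ability {theta:+.1f} -> expected score about {expected_percent_correct(theta):.0f}")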

Writing Skills test scores

The Writing Skills placement and diagnostic tests are reported on a 1 to 99 score scale, but the placement test scores must be interpreted somewhat differently from how the other COMPASS test scores are interpreted. This is because of the test format, which asks examinees to read a passage, identify errors in writing, and select answer options that correct those errors. In theory, a score of “1” on the COMPASS Writing Skills Placement Test means that the examinee edited the passages and responded to the questions at the same level as, or lower than, the lowest-performing examinee in the calibration sample. An examinee who responded to the questions so that every editing item segment was incorrect would receive a score of 1. A score of 99 indicates that when the examinee completed the test, the passage contained no errors.


ESL test scores

Scores for the ESL Grammar/Usage, Reading, and Listening proficiency tests are reported as estimates of the percentage of items in each administered content domain or test pool that the examinees would answer correctly if they were administered all items in the pool. The estimation is based on calibration information for all of the items in the pool and the responses to the particular set of items taken by the examinees. The scoring takes into account not only the number of correct responses to the items administered but also the difficulty, the typical probability of getting the items correct by guessing, and the discriminating power of the items. Examinees who receive difficult items because of previous correct responses and who answer these difficult items correctly will receive higher scores than examinees who respond correctly to easier items after having answered more difficult items incorrectly.

In addition to the 1 to 99 scores reported for the ESL measures, proficiency levels are reported that describe the students’ English proficiencies. There are five levels: Pre-Level 1, Level 1, Level 2, Level 3, and Level 4. The ESL proficiency levels are detailed in Part 2, Chapter 4, “English as a Second Language tests,” in this Reference Manual.

Multi-measure placement

The COMPASS and ESL assessments all provide a single score upon which placement messages are typically based. In some cases, a college may find it useful to base a placement message on the combination of scores obtained from different measures. COMPASS provides the option for using a number of scores in combination to create a placement message. Up to six measures can be combined in a multi-measure placement message.

The measures that can be used are:

• other COMPASS placement test scores
• ASSET Test scores
• local measure scores (e.g., writing sample)
• numeric-response demographic items
• high school grades
• grades from post–high school courses
• overall high school GPA

Examples of measures used in combination for a placement message are provided in Table (2)8.1.


TABLE (2)8.1 Sample multi-measure placement combination

• English placement: COMPASS Reading, COMPASS Writing Skills, COMPASS e-Write, ACT English, local writing sample, high school English grade, post–high school English grade

• Reading placement: COMPASS Reading, overall high school GPA, vocabulary test, ACT Reading score

• Basic mathematics placement: COMPASS Numerical Skills/Prealgebra, high school business math grade, post–high school business math


PART 3: COMPASS TECHNICAL INFORMATION

Chapter 1: Technical characteristics of COMPASS tests
  Overview
  Description of dataset for COMPASS tests
  Description of dataset for ESL tests

Chapter 2: Validating uses of COMPASS tests
  Overview
  Measuring educational knowledge and skills
  Making course placement decisions
  Correlation coefficients

Chapter 3: Development of COMPASS tests
  Overview
  Test specifications
  Item development procedures

Chapter 4: Development of ESL tests
  Overview
  System and test specifications
  Item development procedures
  Differential item functioning analyses of ESL pools

Chapter 5: Development of COMPASS e-Write and ESL e-Write
  Overview
  Overview of ESL e-Write
  ESL e-Write prompt development
  ESL e-Write scoring and range-finding
  ESL e-Write scoring system study

Chapter 6: Calibration of test items
  Overview
  General item calibration description
  Mathematics, Reading and Writing Skills descriptions

Chapter 7: Adaptive testing
  Overview
  Components of an adaptive testing system

References


Chapter 1: Technical characteristics of COMPASS tests

Overview

Documenting the technical characteristics of the COMPASS program requires data to be intermittently collected, analyzed, and reported. Because COMPASS multiple-choice components are computer-adaptive, the kinds of data that need to be collected differ from the data needed for paper-and-pencil tests. For example, reporting the average number of items correctly answered (average raw score) is of little value for COMPASS tests because each examinee is administered a variable number of items.

This chapter presents technical information from two datasets. First, the chapter describes technical information about COMPASS Mathematics, Reading, and Writing Skills placement tests based on operational data collected from October 2004 through July 2008. Second, technical information for the ESL proficiency tests is presented based on data collected during a validity study conducted from fall 1999 through spring 2001.

Description of dataset for COMPASS tests

This section presents information about the COMPASS tests that focuses primarily on data collected from October 2004 through July 2008. This dataset contains data from the ACT National Entering Student Descriptive Report (ESDR) database and pertains to 1,708,019 college students who were administered one or more COMPASS tests. Of these students, 1,501,434 were from two-year colleges and 206,585 were from four-year colleges.

Demographic characteristics of examinees

Table (3)1.1 provides demographic information for the 1,708,019 students in the dataset. In the total sample, there were 17% more females than males and the majority of examinees were Caucasian, with African American and Mexican American examinees constituting the next largest ethnic groups. Differences between the two-year and the four-year college students were relatively small except for student age. Four-year schools had a higher proportion of students under age 20 than did the two-year institutions.


TABLE (3)1.1 Demographic characteristics of students in 2-year and 4-year colleges

Characteristics                              2-year (%)   4-year (%)   Total (%)
Age (100%)*
  Under 20                                       51           60         55.5
  20–29                                          29           26         27.5
  30–39                                          11            9         10
  40–49                                           6            4          5
  Over 50                                         2            2          2
Gender (97%)
  Female                                         57           57         57
  Male                                           40           40         40
Ethnic background (100%)
  African American / Black                       22           22         22
  American Indian / Alaskan Native                2            1          1.5
  Caucasian / White                              55           52         53.5
  Mexican American / Chicano / Latino             7            5          6
  Asian / Pacific Islander                        3            2          2.5
  Puerto Rican / Cuban / other Hispanic           3            7          5
  Filipino                                        1            0          0.5
  Other                                           5            5          5
  Prefer not to respond                           2            2          2

*The numbers in parentheses indicate the total percentage of students from each type of school who responded to each question.


Average obtained scores on COMPASS placement tests

Table (3)1.2 presents average obtained scores for each of the eight COMPASS placement tests: COMPASS e-Write, Writing Skills, Reading, Numerical Skills/Prealgebra, Algebra, College Algebra, Geometry, and Trigonometry. Separate scores are reported for two-year and four-year schools.

TABLE (3)1.2 Scores obtained 2004–2008 for COMPASS student samples (2-year N = 1,501,434; 4-year N = 206,585)

                                          2-year colleges                      4-year colleges
COMPASS placement test             Mean    Std. dev.    Students        Mean    Std. dev.    Students
Writing Skills                    62.92      28.91      1,009,542      66.03      27.71        73,968
COMPASS e-Write 2–12               6.76       1.52         27,025       7.49       1.58         2,590
COMPASS e-Write 2–8                5.35       0.91         75,318       5.34       0.89         9,408
Reading                           77.88      15.99      1,068,771      78.35      15.63        84,513
Numerical Skills/Prealgebra       45.37      20.83        878,881      47.60      20.91        80,166
Algebra                           33.81      18.47        803,266      37.78      19.89       123,956
College Algebra                   47.56      19.90        103,498      48.37      19.61        30,834
Geometry                          52.72      23.72         27,700      56.21      21.01           397
Trigonometry                      43.40      18.38         49,639      44.81      16.94        15,596


Expected reliabilities for COMPASS tests

Conventional formulas for computing internal consistency reliability do not directly apply to adaptive tests because individuals are administered different sets of test items. In an adaptive test, each examinee is measured with a slightly different reliability that depends on the particular items administered. The marginal reliability coefficient usually reported for adaptive tests takes this variation into account by averaging the individual reliabilities across examinees. The result is a coefficient that can be directly compared to values obtained for conventional tests using conventional formulas.

The marginal reliability coefficient can be computed through simulation studies, in which artificial data are generated in a manner that closely resembles actual examinee responses. The advantage of such studies is that the “true” abilities of the examinees are known in advance and can be directly compared with the “observed” results obtained through the testing process. The reliability estimates reported in Table (3)1.3 were obtained through simulation studies that covered a broad range of ability levels in the relevant content areas and were large enough to ensure stable results.
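A minimal sketch of one common way to summarize marginal reliability from such a simulation, in which each simulated examinee's ability estimate and its standard error are known; the simulated values below are made up for illustration and do not reproduce the COMPASS simulation studies.

    import numpy as np

    # Minimal sketch: summarize marginal reliability as
    # 1 - mean(error variance) / variance(estimated abilities).
    # Simulated values are illustrative only.
    rng = np.random.default_rng(4)
    true_theta = rng.normal(0, 1, size=10_000)
    se = rng.uniform(0.25, 0.45, size=10_000)         # per-examinee standard errors
    theta_hat = true_theta + rng.normal(0, se)        # simulated ability estimates

    marginal_reliability = 1 - np.mean(se**2) / np.var(theta_hat)
    print(f"Marginal reliability approximately {marginal_reliability:.2f}")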

COMPASS users have the option of administering COMPASS/ESL tests in either two test lengths (standard and maximum) or three (standard, extended, and maximum), depending upon the content area being tested. There is a tradeoff between test length and test reliability: longer tests are more reliable than shorter tests. Table (3)1.3 provides the minimum, maximum, and average number of items administered and the marginal reliability coefficient for the various test lengths of each COMPASS test. Three of the tests (the COMPASS Writing Skills and Reading placement tests and the Reading Comprehension Diagnostic Test) offer only standard and maximum test length options. Each of these tests contains one or more passages, accompanied by test items. The test length for “passage-based” tests can be adjusted only at the level of the passage and associated item set. COMPASS offers the two most clearly differentiated test-length options for these tests because the increase in measurement efficiency was small relative to the number of items. NOTE: Posterior variance criteria contribute to stopping rules for COMPASS tests, which results in some variability depending on the test length (e.g., comparable minimum and maximum test lengths, but slight differences in average test lengths).


TABLE (3)1.3 Test length options and corresponding reliability estimates for COMPASS Mathematics, Reading, and Writing Skills tests
(Min./Max./Avg. = minimum/maximum/average number of items administered; Rel. = marginal reliability)

                                    Standard length             Extended length             Maximum length
COMPASS test                     Min.  Max.  Avg.  Rel.      Min.  Max.  Avg.  Rel.      Min.  Max.  Avg.  Rel.
Mathematics placement
  Numerical Skills/Prealgebra      8    14   12.5  0.86        9    17   15.9  0.89       10    20   18.5  0.90
  Algebra                          8    14   12.3  0.85        9    17   15.5  0.88       10    20   18.0  0.89
  College Algebra                  8    14   11.9  0.87        9    17   14.4  0.88       10    20   16.5  0.89
  Geometry                         8    14    8.8  0.88        9    17   11.4  0.89       10    20   13.6  0.90
  Trigonometry                     8    14   12.1  0.85        9    17   15.7  0.87       10    20   18.1  0.88
Prealgebra diagnostics
  Integers                         5    12    9.6  0.73        5    12   11.0  0.77        5    15   13.8  0.80
  Fractions                        5    12    6.7  0.79        5    12    7.7  0.81        5    15    9.3  0.84
  Decimals                         5    12    8.7  0.73        6    12   10.1  0.77        8    15   13.3  0.82
  Positive Integer Exponents       5    12    7.1  0.74        5    12    8.8  0.78        6    15   10.3  0.80
  Ratios and Proportions           5    12    7.0  0.79        6    12   10.0  0.83        7    15   12.0  0.85
  Percentages                      5    12    7.8  0.80        5    12    9.3  0.82        6    15   11.5  0.85
  Means, Medians, Modes            5    12    8.7  0.73        6    12   10.5  0.76        8    15   13.7  0.80
Algebra diagnostics
  Substituting Alg. Values         5    12    8.3  0.75        6    12    9.5  0.78        7    15   12.2  0.81
  Setting Up Equations             5    12    8.6  0.75        6    12   10.6  0.78        8    15   13.9  0.81
  Basic Polynomials                5    12    8.2  0.80        6    12    9.8  0.82        7    15   12.5  0.85
  Factoring Polynomials            5    12    7.9  0.79        6    12    9.4  0.82        7    15   12.0  0.85
  Linear Equations 1               5    12    8.4  0.79        6    12   10.0  0.82        7    15   12.4  0.85
  Exponents and Radicals           5    12    8.2  0.80        5    12    9.4  0.82        6    15   11.8  0.85
  Rational Expressions             5    12    8.4  0.77        5    12   10.0  0.80        6    15   12.5  0.83
  Linear Equations 2               5    12    8.6  0.76        5    12    9.9  0.79        7    15   12.5  0.82
Reading placement                 10    21   22.1  0.87       NA    NA     NA    NA       17    25   27.1  0.90
Reading diagnostics
  Comprehension                    9    22   13.6  0.78       NA    NA     NA    NA        9    24   17.0  0.82
  Vocabulary                       5    14   13.4  0.79        5    15   14.4  0.82        5    18   17.2  0.84
Writing Skills placement     23 (10) 27 (20) 24.5 (23.5) 0.88  NA    NA     NA    NA  23 (10) 52 (35) 42.5 (30.0) 0.90
Writing Skills diagnostics
  Punctuation                      5    15   10.1  0.75        6    15   12.6  0.80        7    20   15.7  0.83
  Verb Formation/Agreement         5    15   13.3  0.70        5    15   14.4  0.73        7    20   19.3  0.77
  Usage                            5    15   14.2  0.70        5    15   14.6  0.72        5    20   19.2  0.76
  Relationships of Clauses         5    15   11.8  0.74        6    15   13.9  0.78        7    20   17.6  0.81
  Shifts in Construction           5    15   14.0  0.70        5    15   14.5  0.74        5    20   18.9  0.76
  Organization                     5    15   12.8  0.75        5    15   13.7  0.78        5    20   17.5  0.80
  Spelling                         5    15   12.5  0.73        5    15   14.2  0.77        7    20   18.6  0.81
  Capitalization                   5    15   12.1  0.78        5    15   13.8  0.82        6    20   17.9  0.84


Conditional standard errors of measurement

Another method of determining a test’s reliability is to calculate the test’s standard error of measurement (SEM). The SEM provides information about the difference that could be expected between a student’s actual obtained score and the average score that would be obtained if the student could be tested an infinite number of times under identical circumstances. However, the precision (or reliability) of students’ scores varies along the score scale; thus, the single value reflected by the SEM is likely to be more accurate at some points along the score scale and less accurate at others. A more useful indicator of score precision may be the conditional standard error of measurement (CSEM). The CSEM can be estimated for different values across the score scale, thereby helping users interpret likely reliability throughout the score scale.

CSEMs can be interpreted in much the same way as confidence intervals. For example, if a student receives a score of 90 on the COMPASS Numerical Skills/Prealgebra Placement Test, the CSEM at that point on the score scale is 4.8. Thus, it can be concluded that, although the student’s actual obtained score remains the best single estimate of the student’s “true” score, 68% of the time the student’s obtained score will fall between 85.2 and 94.8, and 95% of the time the score will fall between 80.4 and 99.6.
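A minimal sketch of the interval arithmetic behind this interpretation, using the worked example above (obtained score 90, CSEM 4.8).

    # Minimal sketch of the confidence-interval reading of a CSEM, using the
    # worked example above (obtained score 90, CSEM 4.8).
    score, csem = 90, 4.8

    interval_68 = (score - csem, score + csem)           # about 68% of the time
    interval_95 = (score - 2 * csem, score + 2 * csem)   # about 95% of the time

    print(f"68% interval: {interval_68[0]:.1f} to {interval_68[1]:.1f}")   # 85.2 to 94.8
    print(f"95% interval: {interval_95[0]:.1f} to {interval_95[1]:.1f}")   # 80.4 to 99.6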

Tables (3)1.4, (3)1.5, and (3)1.6 present CSEMs at 5-point intervals throughout the score scale for the COMPASS Writing Skills, Reading, and Mathematics placement tests based on the standard, extended, and maximum test length options, respectively.


TABLE (3)1.4 CSEMs for COMPASS Writing Skills, Reading, and Mathematics placement tests (standard length)

Scale score   Writing Skills   Reading   Prealgebra   Algebra   College Algebra   Geometry   Trigonometry

20 11.2 6.6 5.6 4.3 4.6 5.8 4.4

25 11.6 7.4 7.2 6.3 5.9 6.7 5.8

30 12.1 8.5 8.0 7.8 6.7 8.0 6.7

35 12.2 9.2 8.5 8.7 7.4 8.6 7.3

40 12.4 9.0 9.2 9.6 7.7 9.0 7.6

45 11.8 9.2 9.5 10.1 8.0 9.6 7.7

50 12.4 8.7 9.6 10.4 8.2 9.6 7.9

55 11.6 8.4 9.4 10.7 8.2 9.8 8.1

60 11.5 7.9 9.1 10.6 8.5 9.6 8.2

65 11.4 7.4 8.9 10.5 8.3 9.5 8.1

70 11.2 6.8 8.6 10.2 8.1 9.3 8.0

75 10.6 6.2 7.9 9.4 7.7 8.8 7.6

80 9.8 5.5 7.2 9.0 7.0 8.4 7.2

85 8.8 4.7 6.2 7.9 6.1 7.8 6.5

90 7.6 3.9 5.0 6.4 4.7 7.2 5.5

95 5.5 2.7 3.3 4.2 3.0 5.4 3.5


TABLE (3)1.5 CSEMs for COMPASS Mathematics placement tests (extended length)*

Scale score   Prealgebra   Algebra   College Algebra   Geometry   Trigonometry

20 4.4 4.1 4.3 5.2 4.2

25 5.9 5.8 5.5 6.4 5.3

30 7.1 6.9 6.5 7.2 6.3

35 7.7 8.1 7.1 8.1 6.6

40 8.2 8.5 7.5 8.4 7.0

45 8.5 9.2 7.6 8.8 7.2

50 8.6 9.6 7.8 9.2 7.2

55 8.6 9.7 7.9 9.3 7.4

60 8.3 9.7 8.0 9.1 7.4

65 8.0 9.6 7.8 9.1 7.3

70 7.6 9.4 7.6 8.5 7.3

75 7.1 8.8 7.1 8.2 6.9

80 6.4 8.1 6.4 7.6 6.3

85 5.7 7.0 5.6 6.9 5.7

90 4.4 5.7 4.3 5.9 4.7

95 2.9 3.9 2.6 4.5 3.0

*COMPASS Writing Skills and Reading do not have an extended test length option.


TABLE (3)1.6 CSEMs for COMPASS Writing Skills, Reading, and Mathematics placement tests (maximum length)

Scale score   Writing Skills   Reading   Prealgebra   Algebra   College Algebra   Geometry   Trigonometry

20 10.5 5.1 4.1 3.8 4.2 5.0 4.1

25 11.0 6.3 5.5 5.3 5.5 6.0 5.3

30 11.5 7.2 6.5 6.7 6.3 6.9 6.0

35 11.5 7.7 7.4 7.8 6.8 7.5 6.3

40 11.4 7.8 8.0 8.3 7.1 7.9 6.6

45 11.5 7.7 8.2 8.8 7.3 8.3 6.9

50 11.7 7.4 8.2 9.1 7.4 8.6 7.0

55 11.1 7.1 8.1 9.1 7.6 8.8 7.0

60 11.1 6.8 7.8 9.2 7.6 8.7 7.1

65 10.5 6.4 7.6 8.9 7.6 8.5 7.1

70 9.8 5.9 7.3 9.0 7.2 8.0 6.7

75 9.3 5.3 6.8 8.4 6.9 7.7 6.5

80 8.4 4.8 6.2 7.5 6.2 7.0 6.2

85 7.8 4.2 5.2 6.6 5.1 6.4 5.3

90 6.6 3.5 4.2 5.3 3.9 5.2 4.3

95 4.6 2.4 2.7 3.6 2.4 3.8 2.8


Description of dataset for ESL tests

The data reported in this section about the COMPASS ESL proficiency tests come from a validity study conducted from September 1999 through June 2001. Approximately 50 schools from across the United States contributed a total of 22,597 datasets for this study.

Demographic characteristics of students

Table (3)1.7 lists the languages that students in the ESL validity study selected as their primary language and the numbers and percentages of students who selected each language. In all, 28 languages were selected at least once; 14% of the students stated their primary language was not listed among the choices; and 10% of the students did not respond to the question.

TABLE (3)1.7 Languages listed as “primary” by students in the ESL pretest validity study

Primary language    No. of students    % of students
Arabic                    201                4
Armenian                   32                1
Chinese                   598               12
Dutch                       4               <1
French                     55                1
French Creole               4               <1
German                     24               <1
Greek                      10               <1
Gujarati                   36               <1
Hebrew                      7               <1
Hindi                      44                1
Hungarian                   8               <1
Italian                     9               <1
Japanese                  208                4
Korean                    305                6
Navaho                      2               <1
Persian                   104                2
Polish                     68                1
Portuguese                 66                1
Russian                   379                8
Spanish                 1,309               26
Tagalog                    85                2
Thai                       47                1
Vietnamese                684               14
Yiddish                     3               <1
Not Listed                705               14


Table (3)1.8 lists ESL students’ responses to three of the standard COMPASS/ESL demographic questions: gender, age, and ethnic background. In this study, females outnumbered males by a ratio of about 3 to 2. The largest age group was from 20 to 29, accounting for about 47% of the study participants. About 45% of the students indicated their ethnic background was Mexican American, Puerto Rican, Cuban, or other Hispanic.

TABLE (3)1.8 ESL student responses to the standard COMPASS/ESL demographic questions

Demographic characteristic                   % of students
Gender
  Male                                            39
  Female                                          61
Age
  Under 20                                        10
  20–29                                           47
  30–39                                           26
  40–49                                           13
  50 and over                                      4
Ethnic background
  African American / Black                         3
  American Indian / Alaskan Native                 0
  Caucasian / White                               11
  Mexican American / Chicano                      16
  Asian / Pacific Islander                        17
  Puerto Rican / Cuban / other Hispanic           29
  Filipino                                         2
  Other                                           18
  Prefer not to respond                            3

Participants in this study were also given the option to respond to another series of questions related to their previous exposure to the English language and their current level of education. Although the rate of non-response to these questions ranged from 10% to 31%, Table (3)1.9 may provide useful information about the educationally relevant backgrounds of these ESL students.


TABLE (3)1.9 Previous exposure to English and educational background of students participating in the ESL validity study

Question                                               Frequency       %
Have you studied English previously? (22% no response)
  No                                                       488         11
  Yes                                                    3,837         89
How many years? (31% no response)
  0–1 years                                              1,146         30
  1–2 years                                                699         18
  2–3 years                                                462         12
  3–4 years                                                317          8
  More than 4 years                                      1,212         32
Do you have a high school degree? (22% no response)
  No                                                       682         16
  Yes                                                    3,643         84
Do you have a college degree from another country? (22% no response)
  No                                                     3,026         70
  Yes                                                    1,299         30
Do you usually speak English at home? (10% no response)
  No                                                     3,342         67
  Yes                                                    1,666         33

Table (3)1.10 provides the average scores obtained by the ESL students who participated in the validity study.

TABLE (3)1.10 Average scores of ESL students in 1999–2001 validity study

ESL test Sample size Mean Standard deviation

Grammar/Usage 8,484 64.1 16.7

Reading 7,120 68.8 18.1

Listening 6,993 67.8 18.3

Estimated reliabilities for ESL tests

As with the COMPASS Reading and Writing Skills placement tests, the ESL Reading and ESL Grammar/Usage tests are passage-based and, therefore, are presented in only two test lengths (standard and maximum). However, the ESL Listening Test offers all three test lengths. Table (3)1.11 lists the minimum, maximum, and average numbers of items administered and the marginal reliability coefficient associated with each available test length option. The simulations conducted to determine the values shown in this table were of the same type and characteristics as those conducted for the COMPASS placement tests, described earlier (Table (3)1.3).

TABLE (3)1.11 Test length options and reliabilities for the ESL tests

                        Standard length               Extended length               Maximum length
ESL test            Min.  Max.  Avg.  Rel.        Min.  Max.  Avg.  Rel.        Min.  Max.  Avg.  Rel.
Grammar/Usage         8    19   12.2  0.85         N/A   N/A   N/A   N/A          10    22   17.3  0.90
Reading               8    19   12.7  0.86         N/A   N/A   N/A   N/A          10    24   18.4  0.91
Listening             8    15   10.8  0.85           9    18   12.5  0.87         10    18   14.3  0.89

Conditional standard errors of measurement

Tables (3)1.12, (3)1.13, and (3)1.14 present conditional standard errors of measurement (CSEMs) at 5-point intervals throughout the score range for the ESL Reading, Listening, and Grammar/Usage tests, for the standard, extended, and maximum test lengths, respectively.

TABLE (3)1.12 CSEM for the ESL tests (standard length)

Scale score   ESL Reading   ESL Listening   ESL Grammar/Usage
    25            7.8            8.2              7.9
    30            7.9            8.2              8.2
    35            8.6            8.4              8.3
    40            8.9            8.7              8.0
    45            9.1            8.8              7.6
    50            8.9            9.1              7.8
    55            8.8            9.2              7.3
    60            8.6            9.2              7.4
    65            8.1            8.8              7.3
    70            7.8            8.3              7.2
    75            7.2            7.8              6.8
    80            6.7            7.4              6.2
    85            5.9            6.8              5.7
    90            4.7            5.8              5.0


TABLE (3)1.13 CSEM for the ESL tests (extended length)*

Scale score   ESL Listening

25 7.7

30 7.6

35 7.8

40 7.8

45 8.0

50 8.5

55 8.8

60 8.7

65 8.3

70 7.7

75 7.4

80 6.9

85 6.2

90 5.2

*ESL Reading and Grammar/Usage tests do not have an extended test length option.


TABLE (3)1.14 CSEM for the ESL tests (maximum length)

Scale score ESL Reading ESL Listening ESL Grammar/Usage

25 5.4 6.4 5.9

30 6.1 6.9 6.2

35 6.6 6.9 6.6

40 7.0 7.1 6.3

45 7.0 7.6 6.1

50 7.0 7.6 6.3

55 7.0 7.9 6.1

60 7.0 8.0 6.1

65 7.0 7.8 6.0

70 6.5 7.3 6.0

75 5.9 6.7 5.6

80 5.4 6.3 5.3

85 4.6 5.7 4.8

90 3.7 4.7 4.1


Chapter 2: Validating uses of COMPASS tests

Overview

According to the Standards for Educational and Psychological Testing, the concept of validity refers to “the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores” (APA, NCME, AERA, 1999). Each particular use of test scores needs to be justified by an argument for validity. This chapter gives validity arguments for two of the principal uses of COMPASS: (1) measuring entering college students’ educational knowledge and skills and (2) assisting students and college officials in making course placement decisions.

Measuring educational knowledge and skills

A major aspect of the current validity evidence for the COMPASS tests relates to content validity. The basic concept for developing these tests is that the best way to predict students’ success in a given course is to measure, as directly as possible, the skills and knowledge students need to succeed in that course. A wide range of input on the nature and content of college curricula went into constructing the COMPASS tests, thus ensuring a strong match between test and course content.

COMPASS tests are developed according to detailed test specifications to ensure that the test content represents current instruction in the relevant courses. All COMPASS tests are reviewed to be certain that they match these specifications, and this process includes a content review by outside experts in the field being assessed.

Content validity for computer-adaptive tests differs somewhat from content validity in conventional tests. In adaptive testing, this concept applies to the representativeness of (1) the item pools from which the adaptive test items are drawn and (2) the adaptive tests that are computer-selected for each student. The COMPASS system of adaptive tests is designed to ensure that content validity is maintained both for the item pools and the individualized tests.

Making course placement decisions

COMPASS test scores are intended to be used in placing students into college courses. The elements of the validity argument supporting this use include the following:

• COMPASS tests measure the skills and knowledge that students need to succeed in specific courses.


• Students who have the skills and knowledge necessary to succeed in specific courses are likely to perform satisfactorily on the COMPASS tests, and students without those skills are not.

• Higher levels of proficiency on the COMPASS tests are related to higher levels of satisfactory performance in the course.

• If course placement is a valid use of these tests, then a significant, positive statistical relationship between COMPASS test scores and course grades would be expected.

In the past, the validity of using tests for placement has been examined using correlation coefficients and related indices. To provide more informative and useful validity evidence, ACT has developed an alternative methodology that uses placement validity indices (Sawyer, 1989). A discussion of the advantages and disadvantages of these two methodologies for making placement decisions follows.

Correlation coefficients

Correlation coefficients are a familiar traditional measure of validity; in placement testing, they document the relationship between test scores and course grades. However, the disadvantages of using correlation coefficients include the following:

1. By themselves, correlations offer little direct information about the effectiveness of test scores for placing students into courses, and correlations also are easily misinterpreted. At most institutions, students are placed into standard freshman English and mathematics courses using test scores and other information deemed important for success in a particular course. Students scoring above a specified cutoff score are placed into the course, and students scoring below the cutoff are placed into developmental, or remedial, courses. Thus, when course outcomes (i.e., grades) for the standard course are examined and associated with test scores, correlations between test scores and course grades can be developed only for students actually placed in the standard course. The range of the test scores is restricted because all scores below the cutoff score are unavailable (i.e., they belong to students not placed in the standard course).

Moreover, if the placement test effectively identifies high-risk students, then there will be few students in the standard course who earn poor grades; as a result, the range of course grades also will be restricted. The correlation coefficients will be lower than those that would be obtained if all tested students were allowed to enroll in the course. As the accuracy of placement increases, the correlation will decrease. Institutions may interpret a low correlation coefficient as evidence of invalidity when it could, in fact, be evidence of the exact opposite. (A brief simulation illustrating this restriction-of-range effect follows this list.)

2. Correlations show the strength of association between test scores and course grades, but the procedure makes several statistical assumptions that may not be warranted. For example, inferences made from correlational and linear regression results assume that the conditional distribution of grades is normal; the inferences also assume that conditional variances are equal and that the strength of the relationship between test scores and course grades remains constant throughout the score range (i.e., the relationship is linear). One or more of these assumptions is usually violated, particularly the assumption of normality.

3. Correlations do not take into consideration the costs of incorrect placement decisions. Certain financial costs are incurred whenever a student is provided developmental instruction. An institution’s (and a student’s) time, effort, and money are often wasted when a student who could have succeeded in the entry-level course is instead placed into an unnecessary developmental course. Similarly, costs are incurred when students are incorrectly placed into a standard-level course in which they are unable to succeed because they lack the necessary prerequisite skills.
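The restriction-of-range effect described in point 1 above can be illustrated with a short simulation. The sketch below is not part of the COMPASS methodology; it simply draws correlated score–grade pairs and compares the correlation in the full tested group with the correlation among students above a hypothetical cutoff. All numbers in it are illustrative assumptions.

```python
# Illustrative sketch: restriction of range at a placement cutoff lowers the
# observed score-grade correlation even though the relationship is unchanged.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_r = 0.50                                   # assumed population correlation

# Draw correlated (test score, course grade) pairs on a standardized scale.
cov = [[1.0, true_r], [true_r, 1.0]]
scores, grades = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

cutoff = 0.0                                    # hypothetical cutoff (the median score)
placed = scores >= cutoff                       # only these students enter the standard course

r_all = np.corrcoef(scores, grades)[0, 1]
r_placed = np.corrcoef(scores[placed], grades[placed])[0, 1]
print(f"correlation among all tested students:  {r_all:.2f}")    # close to 0.50
print(f"correlation among placed students only: {r_placed:.2f}") # noticeably lower
```

With these illustrative settings the correlation among placed students comes out well below the full-group value, even though nothing about the underlying score–grade relationship has changed.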

Placement validity indices

The alternate methodology for evaluating placement systems uses placement validity indices generated from logistic regression models and distributions of predictor variables to determine placement effectiveness. The advantages of this method, compared to traditional methods, are that it allows the strength of the relationship between test scores and course grades to vary by test score (i.e., it allows for curvilinear relationships) and it predicts a student’s probability of success in the standard-level course.

The goal of an effective placement program is for students to succeed in the standard-level course. Placement validity for developmental courses is relevant in terms of how well the courses prepare the student for success in the standard-level course. The accuracy of placement into developmental courses is not very meaningful in isolation, but it can be meaningful when interpreted in relation to success in the standard course. This is true regardless of the method used to develop validity evidence.

Typically, a student’s test score is used to recommend placement into a standard-level course. Students scoring below a selected cutoff score are placed into a developmental course. Thus, when evaluating the test score and course grade relationship for the standard course, the data pertain only to those students who enroll in the standard course (i.e., the test score range is restricted). Using logistic regression allows the researcher to estimate the probability of success (e.g., a grade of B or better or a grade of C or better) in the standard course for all tested students, yielding four estimated percentages:

1. The percentage of students who scored below the cutoff who would have failed the standard course had they enrolled in it (true negative).

2. The percentage of students who scored below the cutoff who would have succeeded in the standard course had they enrolled in it (false negative).

3. The percentage of students who scored at or above the cutoff who actually succeeded in the standard course (true positive).

4. The percentage of students who scored at or above the cutoff who actually failed in the standard course (false positive).

Placement validation using this methodology is accomplished by calculating the percentage of students correctly placed (percentage of correct decisions, or accuracy rate) given the cutoff score used to place students. The accuracy rate is the sum of the true positives and true negatives. Alternative cutoff scores can be evaluated by estimating the percentage of students who would be correctly placed using each alternative cutoff score.
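As a concrete, hedged illustration of the calculation just described, the sketch below fits a logistic model to synthetic data and computes the four estimated percentages and the resulting accuracy rate for several candidate cutoff scores. The data, the scikit-learn API choice, and the candidate cutoffs are all assumptions made for this example; none of the output represents COMPASS results.

```python
# Illustrative sketch of placement validity indices from a logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
scores = rng.uniform(20, 99, size=5000)                   # hypothetical test scores
p_true = 1 / (1 + np.exp(-(scores - 60) / 8))             # assumed "true" success curve
success = rng.random(5000) < p_true                       # 1 = earned B or higher (simulated)

model = LogisticRegression().fit(scores.reshape(-1, 1), success)
p_hat = model.predict_proba(scores.reshape(-1, 1))[:, 1]  # estimated P(success) per student

def placement_indices(cutoff):
    above = scores >= cutoff
    tp = p_hat[above].sum() / len(scores)         # est. share placed in standard course who succeed
    fp = (1 - p_hat[above]).sum() / len(scores)   # est. share placed in standard course who fail
    fn = p_hat[~above].sum() / len(scores)        # est. share held back who would have succeeded
    tn = (1 - p_hat[~above]).sum() / len(scores)  # est. share correctly placed in developmental course
    return tp, fp, fn, tn

for cutoff in (50, 60, 70):                       # hypothetical candidate cutoffs
    tp, fp, fn, tn = placement_indices(cutoff)
    print(f"cutoff {cutoff}: accuracy rate (TP + TN) = {100 * (tp + tn):.1f}%")

# Baseline with no placement test: everyone enters the standard course,
# so the accuracy rate is simply the overall estimated success rate.
print(f"no-test baseline accuracy = {100 * p_hat.mean():.1f}%")
```

Comparing each candidate cutoff’s accuracy rate with the no-test baseline gives the “increase in accuracy rate” quantity used later in this chapter.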

Evidence of predictive validity

Since the fall of 1993, COMPASS placement tests have been administered to entering freshmen at postsecondary institutions. These institutions have provided end-of-semester grades for their tested students, first, as part of a special validity study, and more recently, as users of the ACT Course Placement Service. This service gives schools information on the relationship between course grades and COMPASS scores at their particular institution. All of the data collected have been analyzed to supply criterion-related validity evidence for the COMPASS Writing Skills, Reading, and Mathematics tests. The analyses include only courses that have grades and test scores available for at least 40 students.

Logistic regression models were used to calculate estimated probabilities of success for standard-level mathematics, English, natural science, and social science courses that all had lower-level courses in which a student could be placed. These standard-level courses were English Composition, Arithmetic Skills, Accounting, Technical Mathematics, Elementary Algebra, Intermediate Algebra, College Algebra, Precalculus, Calculus, Biology, History, and Psychology. Course success was predicted from the relevant COMPASS test score, using as the success criterion a course grade of B or higher or a grade of C or higher. The estimated probabilities were used to calculate the estimated percentage of students who would be assigned to the lower-level mathematics class (for a particular cutoff score) and the estimated accuracy rates (the estimated percentage of students correctly placed).

The results of COMPASS user colleges’ participation in the Course Placement Service between January 1995 and November 2001 are summarized in Tables (3)2.1 and (3)2.2, shown below. (Course placement concepts and procedures are discussed in the “Developing placement rules” section of Part 2, Chapter 7, “Cutoff scores,” in this Reference Manual.) Table (3)2.1 analyses are based on students obtaining a grade of B or higher. Table (3)2.2 analyses are based on students obtaining a grade of C or higher.

In Tables (3)2.1 and (3)2.2, a cutoff score for a particular college is defined as the minimum score for which a student has a 50% chance of success in the indicated course. Success is defined as completing the course with a B or higher grade in Table (3)2.1 or a C or higher grade in Table (3)2.2. The cutoff score range and the median cutoff score in the tables pertain to the results summarized over colleges. The accuracy rate is the estimated percentage of students correctly placed with a college’s cutoff score. The percent ready for a course is the percentage of students whose COMPASS scores are at or above the median cutoff score, as documented in the “Fall 1999 COMPASS Composite Report.” The increase in accuracy rate for a given college is the difference between the estimated accuracy rate with a college’s cutoff score and the estimated accuracy rate that would occur if no placement assessment had been used.


TABLE (3)2.1 COMPASS cutoff scores and validity statistics for placement in first-year courses in college (B or higher course grade)

Course type              COMPASS test scored            No. of     Median        % ready     Median          Median increase
                                                        colleges   cutoff score  for course  accuracy rate   in accuracy rate

English
  Composition            Writing Skills                    68         71            44          66               19
  Composition            Reading                           28         81            50          60               10

Mathematics/Business
  Arithmetic             Numerical Skills/Prealgebra       26         36            54          70               16
  Elementary algebra     Numerical Skills/Prealgebra       38         62            19          67               25
  Intermediate algebra   Algebra                           29         48            19          71               25
  College algebra        Algebra                           23         71             6          72               43
  Precalculus            Algebra                            6         79             4          78               53
  Calculus               College Algebra                    6         59            23          65               24
  Accounting             Numerical Skills/Prealgebra        2         65            17          70               32
  Technical mathematics  Algebra                            2         40            27          75               17

Natural sciences/Social sciences
  Biology                Reading                            2         90            26          71               34
  History                Reading                            5         95            12          74               47
  Psychology             Reading                           11         90            26          68               31


TABLE (3)2.2 COMPASS cutoff scores and validity statistics for placement in first-year courses in college (C or higher course grade)

Course type              COMPASS test scored            No. of     Median        % ready     Median          Median increase
                                                        colleges   cutoff score  for course  accuracy rate   in accuracy rate

English
  Composition            Writing Skills                    39         29            83          67                2
  Composition            Reading                           12         55            90          67                2

Mathematics/Business
  Arithmetic             Numerical Skills/Prealgebra       16         31            63          72                4
  Elementary algebra     Numerical Skills/Prealgebra       24         40            47          63                6
  Intermediate algebra   Algebra                           17         28            50          68                5
  College algebra        Algebra                           19         48            19          67               20
  Precalculus            Algebra                            5         48            19          59               12
  Calculus               College Algebra                    4         43            54          68                9
  Accounting             Numerical Skills/Prealgebra        2         46            37          67               13

Natural sciences/Social sciences
  Biology                Reading                            2         75            63          70               20
  History                Reading                            4         88            32          69               29
  Psychology             Reading                            9         59            86          67                4

The goal of an effective course placement program is to match students with the instruction appropriate to their educational development. Under this definition, placement validity can be established by calculating the percentage of students correctly placed (accuracy rate) given the cutoff scores used to place students. Accuracy rates and increases in accuracy rates relative to using no cutoff score (i.e., placing all students in the standard-level course) provide strong validity evidence.

Thus, for example, the first row of Table (3)2.1 can be interpreted as follows: 68 institutions, each with an English composition course, tested at least 40 students using the COMPASS Writing Skills Placement Test. The median optimal cutoff score was 71. This optimal cutoff score is defined as the score that corresponds to a .50 probability that a student will get a grade of B or higher in the standard course (English Composition). When the optimal cutoff score was used, the median percentage of students placed in the standard-level course was 44%. The median accuracy rate, consisting of the percent of students appropriately placed in either the standard-level or the developmental English course, was 66%. This represents a 19% increase in appropriate placement over using no placement test.
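For reference, when success is modeled with a logistic regression on the test score x, the score corresponding to a .50 probability of success has a simple closed form. The coefficients b_0 and b_1 below are generic symbols introduced only for this illustration; they are not reported COMPASS values:

$$p(x) = \frac{1}{1 + e^{-(b_0 + b_1 x)}} = 0.50 \;\Longleftrightarrow\; b_0 + b_1 x = 0 \;\Longleftrightarrow\; x_{\text{cutoff}} = -\frac{b_0}{b_1}.$$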

Table (3)2.3 summarizes COMPASS cutoff scores for placement in different types of first-year courses.

TABLE (3)2.3 COMPASS cutoff score guide for placement in first-year college courses

Course type (no. of colleges)   COMPASS test scored            Score needed for 50% chance of . . .
                                                               B or higher        C or higher

English
  Composition (44)              Writing Skills                      71                 29
  Composition (22)              Reading                             81                 55

Mathematics/Business
  Arithmetic (15)               Numerical Skills/Prealgebra         36                 31
  Elementary algebra (23)       Numerical Skills/Prealgebra         62                 40
  Intermediate algebra (19)     Algebra                             48                 28
  College algebra (18)          Algebra                             71                 48
  Precalculus (4)               Algebra                             79                 48
  Calculus (2)                  College Algebra                     59                 43
  Accounting (1)                Numerical Skills/Prealgebra         65                 46
  Technical mathematics (2)     Algebra                             40                 Not available

Natural sciences/Social sciences
  Biology (2)                   Reading                             90                 75
  History (6)                   Reading                             95                 88
  Psychology (6)                Reading                             90                 59

A “cutoff score” is the minimum score for which it is estimated that a student has a 50% chance of earning a grade of B or higher (or C or higher) in a particular type of course (note: there were insufficient data to establish cutoff scores for earning a C or higher in technical mathematics courses). The B or higher cutoff scores are larger than the C or higher cutoff scores because in a given course, it is more difficult to earn a B than to earn a C.


These cutoff scores are typical results from the COMPASS user institutions participating in the ACT Course Placement Service between January 1995 and November 2001, and they can be used today by other COMPASS users as initial cutoff score estimates for their own institutions. It is recommended, however, that users then participate in the ACT Course Placement Service to obtain more accurate cutoff scores from their own data. (For more information about this chart or the service, go to http://www.act.org/research/services/crsplace/.)


Chapter 3: Development of COMPASS tests

Overview

Development of the COMPASS Reading, Writing, and Mathematics tests began with conceptual and informational meetings of ACT staff between 1985 and 1989, which resulted in a set of general specifications. These specifications were discussed and further refined in a series of advisory panel meetings. The first of these meetings, in August 1990, set the general direction and focus that guided the initial COMPASS development effort. The system has continued to grow and evolve in response to changing user needs, culminating in this newest version, which includes an English as a Second Language (ESL) component and a direct writing assessment called e-Write.

This chapter describes how COMPASS tests are developed. In the meetings that began in 1990, the ACT advisory panel determined content and technical aspects of COMPASS development. The panel formed separate task forces, one for each of the initial three content areas. Each task force clarified the test specifications and requirements for its content area and developed the items and item pools for use in COMPASS. Details of the early panel discussion topics and the task forces’ methods of developing COMPASS components are provided later in this chapter.

Some phases of the development process were conducted in a similar fashion across the three initial content areas, whereas other phases were conducted in a way unique to each content area. The chapter first discusses the phases of development that are common to all three initial content areas, then discusses development phases that differ by content areas. Table (3)3.1 lists all of the current COMPASS assessments.


TABLE (3)3.1 Item pools for current COMPASS placement tests

Content area      Assessment type   Domain

Mathematics       Placement         Numerical Skills/Prealgebra; Algebra; College Algebra; Geometry; Trigonometry
                  Diagnostic        Numerical Skills/Prealgebra: Operations with Integers; Operations with Fractions; Operations with Decimals; Positive Integer Exponents; Ratios and Proportions; Percentages; Averages (Mean, Medians, and Modes)
                                    Algebra: Substituting Values into Expressions; Setting up Equations; Basic Operations with Polynomials; Factoring Polynomials; Linear Equations in One Variable; Exponents and Radicals; Rational Expressions; Linear Equations in Two Variables

Reading           Placement         Reading Comprehension
                  Diagnostic        Reading Comprehension; Vocabulary
                  Reader Profile    General Reading Habits

Writing Skills    Placement         Usage and Mechanics; Rhetorical Skills
                  Diagnostic        Punctuation; Verb Formation and Agreement; Usage; Relationship of Clauses; Shifts in Construction; Organization; Spelling; Capitalization

COMPASS e-Write 2–12 and COMPASS e-Write 2–8      Student-generated essay


Test specifications

In August 1990, ACT staff met with an advisory panel of college faculty, counselors, and testing staff from 17 two- and four-year postsecondary institutions throughout the United States. The panel examined the general system specifications and determined that the COMPASS system would comprise placement, diagnostic, and supplemental information or advanced testing components in each of the three major content areas of reading, mathematics, and writing skills. It was also decided that users would be given a wide variety of options for customizing the COMPASS system to meet the varying needs of institutions.

To develop test specifications for the three initial content areas, each content area task force followed a common set of procedures. First, the task forces conducted literature reviews to determine the aspects of assessment in their particular content areas that were relevant to the design of COMPASS. Emphasis was placed on the specific topics covered in the curricula of entry-level courses at two- and four-year postsecondary institutions. Next, the task forces examined course catalogs from a diverse sample of 23 institutions throughout the country to obtain more information on specific topics covered in entry-level, remedial, and advanced courses in the three content areas. Finally, the task forces designed surveys specific to each of the content areas and sent them to a number of institutions across the country. Through these surveys, faculties specified the content that they considered most important for placement and diagnostic testing.

Each task force developed a set of proposed specifications based on the results of the literature reviews, the sampling of course catalogs, and the faculty surveys. Consultants in the content areas reviewed these specifications as a final quality check before ACT formally adopted the specifications.

Item development procedures

Selection and training of item writers

Development of new COMPASS items begins by identifying and selecting item writers, primarily from among postsecondary faculty in the three major content areas. Potential item writers receive an Item Writer’s Guide specific to their content area. The guides outline ACT specifications for item construction, including the format to use for test items, specific topic areas to cover, and examples of acceptable items. The guides specify requirements for (1) fair portrayal of population subgroups, (2) avoidance of subject matter potentially unfamiliar to members of a population subgroup, and (3) use of nonsexist language. Potential item writers submit work for review. ACT test development staff evaluate the work to ensure that each item meets the specifications—that it is fair to all examinees and written at the appropriate cognitive and interest level for the intended examinee groups. ACT contracts with individual item writers who are able to construct acceptable test items as assigned.

For certain parts of the Reading and Writing Skills tests, ACT also gives item writers a written passage upon which to base items. Assigning a small number of items (called a “unit”) to each item writer helps ensure the security of the testing materials. Whenever possible, ACT attempts to select a broad sample of item writers from several cultural, geographical, and ethnic groups and to recruit both male and female writers to enhance the diversity of materials. Item writers work closely with ACT content editors, who assist them in producing high-quality units that fully meet all specifications.

Internal and external item content and fairness reviews

Each test unit submitted by an item writer is reviewed by ACT staff for fairness, content accuracy, and general quality of composition. Test development staff will decide whether to accept the unit, ask for revisions, or reject it. Revised units may be accepted on resubmission if they meet the specifications. ACT staff will edit the accepted units so that they meet the specifications for content accuracy, item classification, item format, and language. Each item is checked by the content editor to ensure that the item has one and only one correct answer, that the incorrect alternatives (foils or distractors) of multiple-choice items are plausible but incorrect, and that the material fits the appropriate cognitive level. An English language editor also reviews all items for accuracy of grammar, usage, syntax, clarity, appropriateness of tone and voice, and adherence to COMPASS test style. During the internal editing process, all test materials are likewise reviewed for fair portrayal and balanced representation of groups. Multiple content and editorial reviews are performed to ensure that test items meet ACT standards.

ACT also commissions external consultants to review COMPASS items for both soundness (content) and fairness (sensitivity). The content of each item and passage is reviewed by one or more consultants who are experts (e.g., faculty) in the relevant subject area. They examine each item to ensure that it is appropriate for the intended examinee population in terms of content, knowledge, and skill level required to answer the question the item presents. ACT incorporates extensive procedures into the development process to guard against potential bias.

External fairness reviews

Five separate fairness panels external to ACT review every potential COMPASS test item to identify items that may need to be revised or eliminated to address possible sources of item bias or unfairness to students of any racial, ethnic, cultural, gender, socioeconomic, or linguistic background. Persons serving on the five COMPASS fairness review panels come from across the United States and represent the following backgrounds:

1. African American

2. Asian American

3. Hispanic, Latino, Latina American (Mexican American)

4. Native American

5. Women

ACT solicits names of potential panelists from nationally recognized advocacy groups, then selects from three to five individuals for each panel. The following outlines the ACT fairness review panel process:

• Panelists receive a packet containing all passages and items being considered for inclusion on the COMPASS tests, along with a set of guidelines for conducting the fairness review.

• After an individual review period, panelists participate in a panel-specific teleconference facilitated by ACT staff.

• During the teleconference, panelists review each passage and each item, raising any concerns or potential issues they see. ACT staff note the nature of identified problems, the seriousness of the problem as judged by the panelists, and the presence or absence of a consensus.

• After the teleconference, panelists return the secure test materials along with written comments to ACT. ACT staff then review all materials and comments and make final decisions about revising or eliminating items of fairness concern.

Criteria used to evaluate COMPASS items for fairness

Three primary criteria are used by the external panelists to evaluate items for fairness:

1. Content, context familiarity. COMPASS test items and related stimulus materials must be based on contexts that are likely to be equally familiar to all ethnic and gender groups. Concepts that are likely to be unfamiliar to students in typical, relevant instructional programs should be avoided. Culturally diverse materials, circumstances, and points of view are to be reflected in the test items in a manner that avoids problems with other aspects of fairness.

2. Fair portrayal. All groups should be portrayed accurately and fairly without reference to stereotypes or traditional roles of gender, age, race, ethnicity, religion, physical condition, or geographic origin. Comparative descriptions of different population groups should be avoided unless they are relevant for the knowledge and skills being measured.

3. Fairness in language. The characterization of any group must not be at the expense of that group; jargon, slang, and demeaning characterizations are not permitted, and references to ethnicity, color, marital status, religion, or gender should be made only when they are germane to the knowledge and skills being assessed.

Whereas the development and fairness review activities described here occur as part of all COMPASS item development, further examination of items is conducted later through differential item functioning (DIF) analyses. Generated on the basis of student response data, DIF analyses provide an empirical method for examining item-level results and comparing item functioning for different portions of the population. The section that follows provides in-depth information regarding DIF analyses for COMPASS placement tests.

Differential item functioning analyses

In 2008, a comprehensive DIF study of the COMPASS placement tests was conducted using item-level data extracts from the COMPASS Internet Version. This study resulted in substantial DIF n-counts for all COMPASS placement tests. In addition to the 2008 DIF study, ACT conducted follow-up DIF analyses in 2011 and 2012 for COMPASS items. The sections below describe the methodology, data, and results of the 2008 DIF study; a summary section provides information regarding subsequent COMPASS DIF analyses.

DIF methodology

DIF analyses offer a method for examining each item on a test to determine whether the item functions differently for members of different subgroups of the examinee population. A typical DIF analysis compares the performance of a focal (minority) group against a relevant base group. Members of the two groups are matched on the basis of a criterion measure of ability in the relevant subject area, usually the total score on the test.

For the purposes of DIF analyses, ACT uses the Mantel-Haenszel common-odds ratio (Holland & Thayer, 1988) as the statistical index. The Mantel-Haenszel index compares the odds of members of one group getting an item correct with the odds that (matched) members of the other group will get the item correct. Items identified by a DIF analysis as being potentially biased against either a focal group or a base group are reexamined, along with any relevant stimulus materials such as passages and tables, to determine whether the language or context of the items or related materials may be a contributing factor.

COMPASS placement test items present unique challenges to standard DIF analyses. One challenge is that the computer-adaptive nature of test administration means that few examinees receive the same items or even the same number of items. Furthermore, examinees who do get the same items may not get them in the same order. Conventional DIF procedures that match examinees on the basis of simple number-correct scores are, therefore, not applicable. Instead, with DIF analyses of COMPASS items, each examinee’s estimated true score is used as the matching criterion. This score is defined as the expected proportion of items an examinee would answer correctly if all items in a specific COMPASS placement test pool were administered.
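The idea of an “expected proportion correct over the whole pool” can be illustrated with a short sketch. A three-parameter logistic (3PL) item response function and made-up item parameters are assumed here purely for illustration; the manual does not specify the item model or parameter values at this point.

```python
# Illustrative sketch of an expected-true-score matching variable for one examinee.
import numpy as np

def p_correct(theta, a, b, c):
    """Assumed 3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

# Hypothetical item pool: discrimination (a), difficulty (b), guessing (c).
rng = np.random.default_rng(0)
n_pool = 200
a = rng.uniform(0.5, 2.0, n_pool)
b = rng.normal(0.0, 1.0, n_pool)
c = rng.uniform(0.10, 0.25, n_pool)

theta_hat = 0.4                              # ability estimated from the items actually taken
xi_hat = p_correct(theta_hat, a, b, c).sum() # expected number correct over the full pool
matching_score = xi_hat / n_pool             # expected proportion correct (the matching variable)
print(f"expected proportion of the pool answered correctly: {matching_score:.3f}")
```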

In the context of COMPASS DIF analyses, ACT completed three major ethnic group comparisons by reviewing the available demographic data for students who had been administered COMPASS tests.

1. Caucasian American/White (base group) versus Mexican American/Chicano/Latino and Puerto Rican/Cuban/Other Hispanic (focal group)

2. Caucasian American/White (base group) versus African American/Black (focal group)

3. Caucasian American/White (base group) versus Asian American/Pacific Islander and Filipino (focal group)

In addition, the possibility of DIF arising from gender differences (across all ethnic groups) was examined with females as the focal group and males as the base group. Group differences in performance on a given item were then examined for potential bias in either direction (i.e., toward the base or the focal group) for each set of comparisons.

ACT DIF studies are based on data collected during operational (i.e., “live”) test administrations of the COMPASS placement tests rather than on data gathered under more artificial circumstances. This is done to mitigate the possibility of data being affected by student motivation. As part of each COMPASS administration, both operational items (i.e., items that contribute to an examinee’s score) and pretest items are administered. Pretest items have undergone both internal and external fairness reviews but do not contribute to examinee scores because the intent of pretesting is to gather data under operational conditions to conduct item analysis and calibrations.

Only one pretest item or item set (stimulus passage and associated items) is administered per examinee for each of the COMPASS placement tests. However, while the number of pretest items presented for each examinee tested is relatively small, the exposure of pretest items is controlled within the COMPASS system so that all pretest items are presented within a particular range of ability level. This system-controlled presentation of pretest items means that there were sufficient data available for most of the pretest items in the COMPASS placement test item pools. This approach to “forcing” pretest item presentation and gathering pretest data is the preferred model for DIF analysis because the pretest stage is the most appropriate time to eliminate items from pools. Potentially biased pretest items can be removed before they are included in the computation of an examinee’s score.

In contrast to controls associated with presenting pretest items, the principal drawback to using data from operational administrations for operational items (i.e., items that contribute to examinees’ scores) is the time required to collect sufficient data to conduct the necessary analyses. This is particularly true for the COMPASS placement tests because of the following program features and constraints:

1. The COMPASS item pools are very large. Furthermore, items that would otherwise be administered frequently are protected against overexposure, an explicit design of a computer-adaptive test to protect item security and mitigate overuse; the use of item exposure parameters results in a more balanced use of the pool of items. Accordingly, very large numbers of tests must be administered before most items are presented to adequate numbers of members of small demographic groups.

2. Item pools are broad as well as deep, with each pool including items that cover a wide range of difficulty. Because the COMPASS placement tests are adaptive, difficult items are generally administered to better-prepared examinees, and easier items are administered to less-prepared examinees. Therefore, it is not sufficient that the COMPASS placement tests simply be administered to large numbers of examinees; these examinees collectively must also span a wide range of abilities.

3. Because of exposure controls used to manage the selection of operational items and the distribution of examinee ability levels, there are operational COMPASS items that did not achieve sufficient n-counts to include in the 2008 DIF study. Because DIF analyses are conducted as the required data become available, the performance of the most frequently administered items in the largest demographic groups is checked first.

DIF procedures

DIF analyses of COMPASS data are performed by a statistical procedure developed by Mantel and Haenszel (1959) and adapted by Holland and Thayer (1988). Under these procedures, examinees are grouped into performance strata on the basis of a matching variable, which is usually their test scores. Each examinee’s 1/0 (right/wrong) item scores and demographic group membership (focal or base) are used to construct a 2 × 2 contingency table for each of k levels of the matching variable.

The overall 2 × 2 × k table provides the data for the procedure. Specifically, at level k for a given item, the 2 × 2 table is of the form


Group          Correct = 1    Incorrect = 0    Total
Base group     A_k            B_k              n_Rk
Focal group    C_k            D_k              n_Fk
Total          m_1k           m_0k             T_k

where there are T_k examinees at stratum k, n_Rk in the base group and n_Fk in the focal group. The frequencies of correct response are A_k for the base group and C_k for the focal group, and B_k and D_k are the frequencies of incorrect response for the base and focal groups, respectively.

The Mantel-Haenszel common odds ratio is estimated by

$$\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k}.$$

Values of the odds ratio greater than 1.0 indicate DIF favoring the base group. Values less than 1.0 indicate DIF favoring the focal group. At ACT, values of the odds ratio that are < 0.5 or > 2.0 are flagged for further review.

A potential limitation of applying the Mantel-Haenszel procedure to adaptive tests concerns the matching variable. The raw score that is typically used in a paper-and-pencil test is not appropriate for an adaptive test because different examinees take tests of different length and composition. An alternative that was shown by Zwick, Thayer, and Wingersky (1993) to be effective is the expected true score based on the computer-adaptive model

$$\hat{\xi} = \sum_{i=1}^{n_{pool}} P_i(\hat{\theta}),$$

where $n_{pool}$ is the number of items in the pool and $\hat{\theta}$ is the ability estimate obtained from the items actually administered. Because COMPASS reports the percentage—rather than number—of the pool’s items that an examinee would be expected to answer correctly, this score, $\hat{\xi}/n_{pool}$, is used as the matching variable for all analyses.
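The following sketch applies the estimator above to made-up stratified counts, together with the < 0.5 / > 2.0 flagging rule quoted earlier. It is an illustration only, not ACT’s production code, and the counts are invented for the example.

```python
# Illustrative Mantel-Haenszel common odds ratio from stratified 2x2 counts.
import numpy as np

# One row per stratum k of the matching variable: [A_k, B_k, C_k, D_k]
# A_k/B_k = base-group correct/incorrect, C_k/D_k = focal-group correct/incorrect.
tables = np.array([
    [40, 60, 35, 65],   # low-scoring stratum (hypothetical counts)
    [70, 30, 66, 34],   # middle stratum
    [90, 10, 88, 12],   # high-scoring stratum
], dtype=float)

A, B, C, D = tables.T
T = tables.sum(axis=1)                                   # T_k = examinees in stratum k

alpha_mh = (A * D / T).sum() / (B * C / T).sum()

# Flagging rule described in the text: values < 0.5 or > 2.0 receive further
# review (these bands correspond to the A/B/C categories used later in this chapter).
if alpha_mh > 2.0:
    category = "C (favors the base group)"
elif alpha_mh < 0.5:
    category = "A (favors the focal group)"
else:
    category = "B (favors neither group)"
print(f"Mantel-Haenszel common odds ratio: {alpha_mh:.2f} -> category {category}")
```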


Data used in DIF analyses

Data for the analyses reported for the 2008 DIF study were collected as part of the operational use of COMPASS placement tests administered to students from May 2004 through May 2008, with a total of 973 institutions contributing data. Table (3)3.2 summarizes the institutions by size, geographic region, and type.

TABLE (3)3.2 Size and representation of institutions in 2008 DIF study

Description No. of schools

Institution size

Under 5,000 683

5,000–10,000 161

10,000–20,000 91

Over 20,000 38

Region of U.S.

Eastern 122

Northeastern 38

Southeastern 254

Midwestern 284

Western 102

Northwestern 50

Southwestern 122

Canada 1

Institution type by Carnegie classification *

Doctorate-granting universities 53

Master’s colleges and universities 110

Baccalaureate colleges 111

Associates colleges 633

Special focus institutions (faith, health, business) 53

Tribal 13

* Based on The Carnegie Foundation: http://www.carnegiefoundation.org/classifications/


The total numbers of examinee results used for the 2008 DIF analyses were as follows:

• 1,008,776 for the COMPASS Reading Placement Test
• 904,184 for the COMPASS Writing Skills Placement Test
• 853,188 for the COMPASS Numerical Skills/Prealgebra Placement Test
• 815,130 for the COMPASS Algebra Placement Test
• 138,682 for the COMPASS College Algebra Placement Test
• 28,256 for the COMPASS Geometry Placement Test
• 64,385 for the COMPASS Trigonometry Placement Test

Tables (3)3.3 through (3)3.9 present summaries of these examinees by ethnic and gender background for the COMPASS Reading, Writing Skills, Numerical Skills/Prealgebra, Algebra, Geometry, College Algebra, and Trigonometry tests, respectively.

TABLE (3)3.3 Sample sizes for each ethnic and gender group in COMPASS Reading Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                  141,927      14.57         84,292       8.65        226,219      23.22
American Indian/Alaskan Native¹          11,659       1.20          8,048       0.83         19,707       2.02
Caucasian American/White                309,325      31.75        230,040      23.61        539,365      55.36
Mexican American/Chicano/Latino          41,387       4.25         31,784       3.26         73,171       7.51
Asian American/Pacific Islander          14,339       1.47         13,638       1.40         27,977       2.87
Puerto Rican/Cuban/other Hispanic        16,141       1.66         10,586       1.09         26,727       2.74
Filipino                                  3,160       0.32          2,231       0.23          5,391       0.55
Other²                                   21,211       2.18         15,665       1.61         36,876       3.78
Prefer not to respond¹                    9,986       1.02          8,890       0.91         18,876       1.94
Total                                   569,135      58.41        405,174      41.59        974,309²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 34,467 examinees who did not respond to these items.


TABLE (3)3.4 Sample sizes for each ethnic and gender group in COMPASS Writing Skills Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                  120,277      13.79         72,717       8.33        192,994      22.12
American Indian/Alaskan Native¹          10,869       1.25          7,641       0.88         18,510       2.12
Caucasian American/White                279,300      32.01        216,829      24.85        496,129      56.86
Mexican American/Chicano/Latino          34,873       4.00         28,374       3.25         63,247       7.25
Asian American/Pacific Islander          11,910       1.37         11,850       1.36         23,760       2.72
Puerto Rican/Cuban/other Hispanic        13,106       1.50          9,109       1.04         22,215       2.55
Filipino                                  2,677       0.31          1,992       0.23          4,669       0.54
Other¹                                   18,666       2.14         14,138       1.62         32,804       3.76
Prefer not to respond¹                    9,499       1.09          8,680       0.99         18,179       2.08
Total                                   501,177      57.44        371,330      42.56        872,507²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 31,677 examinees who did not respond to these items.


TABLE (3)3.5 Sample sizes for each ethnic and gender group in COMPASS Numerical Skills/Prealgebra Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                  127,298      15.51         74,226       9.04        201,524      24.56
American Indian/Alaskan Native¹           9,908       1.21          6,697       0.82         16,605       2.02
Caucasian American/White                269,628      32.85        185,768      22.64        455,396      55.49
Mexican American/Chicano/Latino          31,498       3.84         23,648       2.88         55,146       6.72
Asian American/Pacific Islander          10,098       1.23          9,169       1.12         19,267       2.35
Puerto Rican/Cuban/other Hispanic        13,495       1.64          8,637       1.05         22,132       2.70
Filipino                                  2,292       0.28          1,560       0.19          3,852       0.47
Other¹                                   17,573       2.14         12,849       1.57         30,422       3.71
Prefer not to respond                     8,951       1.09          7,397       0.90         16,348       1.99
Total                                   490,741      59.80        329,951      40.20        820,692²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 32,496 examinees who did not respond to these items.


TABLE (3)3.6 Sample sizes for each ethnic and gender group in Algebra Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                   99,973      12.67         62,386       7.90        162,359      20.57
American Indian/Alaskan Native¹           6,730       0.85          5,787       0.73         12,517       1.59
Caucasian American/White                246,836      31.27        205,705      26.06        452,541      57.33
Mexican American/Chicano/Latino          32,865       4.16         26,716       3.38         59,581       7.55
Asian American/Pacific Islander          13,012       1.65         12,972       1.64         25,984       3.29
Puerto Rican/Cuban/other Hispanic        11,984       1.52          8,683       1.10         20,667       2.62
Filipino                                  2,498       0.32          2,178       0.28          4,676       0.59
Other¹                                   17,596       2.23         14,196       1.80         31,792       4.03
Prefer not to respond¹                    9,786       1.24          9,438       1.20         19,224       2.44
Total                                   441,280      55.90        348,061      44.10        789,341²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 25,789 examinees who did not respond to these items.


TABLE (3)3.7 Sample sizes for each ethnic and gender group in COMPASS College Algebra Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                    6,433       4.74          5,691       4.19         12,124       8.93
American Indian/Alaskan Native¹             841       0.62            801       0.59          1,642       1.21
Caucasian American/White                 40,922      30.16         45,202      33.31         86,124      63.47
Mexican American/Chicano/Latino           3,703       2.73          4,024       2.97          7,727       5.69
Asian American/Pacific Islander           5,187       3.82          5,761       4.25         10,948       8.07
Puerto Rican/Cuban/other Hispanic         1,720       1.27          1,581       1.17          3,301       2.43
Filipino                                    670       0.49            696       0.51          1,366       1.01
Other¹                                    4,185       3.08          3,857       2.84          8,042       5.93
Prefer not to respond¹                    1,917       1.41          2,506       1.85          4,423       3.26
Total                                    65,578      48.33         70,119      51.67        135,697²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 2,985 examinees who did not respond to these items.


TABLE (3)3.8 Sample sizes for each ethnic and gender group in COMPASS Geometry Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                    1,632       5.89          1,514       5.46          3,146      11.35
American Indian/Alaskan Native¹              66       0.24             70       0.25            136       0.49
Caucasian American/White                  8,183      29.51          8,104      29.23         16,287      58.74
Mexican American/Chicano/Latino             665       2.40            666       2.40          1,331       4.80
Asian American/Pacific Islander           1,058       3.82          1,255       4.53          2,313       8.34
Puerto Rican/Cuban/other Hispanic           732       2.64            622       2.24          1,354       4.88
Filipino                                    144       0.52            117       0.42            261       0.94
Other¹                                    1,021       3.68            918       3.31          1,939       6.99
Prefer not to respond¹                      461       1.66            498       1.80            959       3.46
Total                                    13,962      50.36         13,764      49.64         27,726²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 530 examinees who did not respond to these items.


TABLE (3)3.9 Sample sizes for each ethnic and gender group in COMPASS Trigonometry Test DIF analyses

Ethnicity                               Female                    Male                      Total
                                        n-count      % total      n-count      % total      n-count      % total
African American/Black                    1,892       3.00          2,132       3.38          4,024       6.38
American Indian/Alaskan Native¹             172       0.27            180       0.29            352       0.56
Caucasian American/White                 18,102      28.68         21,647      34.30         39,749      62.98
Mexican American/Chicano/Latino           1,472       2.33          1,783       2.83          3,255       5.16
Asian American/Pacific Islander           3,369       5.34          3,811       6.04          7,180      11.38
Puerto Rican/Cuban/other Hispanic           677       1.07            682       1.08          1,359       2.15
Filipino                                    312       0.49            339       0.54            651       1.03
Other¹                                    2,156       3.42          2,168       3.44          4,324       6.85
Prefer not to respond¹                      906       1.44          1,309       2.07          2,215       3.51
Total                                    29,058      46.04         34,051      53.96         63,109²    100.00

¹These examinees were not included in the DIF analysis because of the low n-count.
²This does not include 1,276 examinees who did not respond to these items.


Summary of DIF analyses

This section describes the DIF analyses that were conducted for the COMPASS placement tests. The first part of this section provides a description of the DIF analysis and item-review process. The final part of this section provides the results obtained from the analyses and subsequent review.

DIF analysis and item review process. Items were classified on the basis of their Mantel-Haenszel values into one of three categories: A, B, or C.

• A-category items, which had Mantel-Haenszel values of less than .5, were judged as favoring the focal group.

• B-category items had Mantel-Haenszel values between 0.5 and 2.0 and were judged as favoring neither the focal nor the reference groups.

• C-category items had Mantel-Haenszel values that exceeded 2.0 and were judged as favoring the base or reference group.

All items receiving A or C classifications were subjected to additional review in an attempt to discern possible reasons for their favoring one group over another. For the 2008 DIF study, five ACT staff members examined flagged items, as well as any relevant stimulus materials (e.g., passages, graphs, tables, figures), to identify any aspect of the item or associated materials that might possibly be responsible for the obtained DIF results. The ACT reviewers applied the same criteria (i.e., use of familiar content/context, fair portrayal, and fairness in language) that were used by the external fairness panels to ensure a cohesive and consistent approach to the review processes.

All reviewed items and written internal reviewer comments were collected and consolidated. Once written comments were compiled, the reviewers further examined all item-level comments. All comments were evaluated and an appropriate course of action was determined for each flagged item. The three options for disposition of each item were:

1. Remove the item and, if necessary, all related materials from the COMPASS item pools.

2. Remove the item, revise it, and recycle the revised item, along with all associated stimulus materials, through the item-development process (including reevaluating the item via both external fairness review panels and DIF analysis).

3. Continue to use the item in its current form.

All items flagged for DIF and reviewed by ACT staff as part of an internal review process were submitted to external fairness review panels in 2009 and 2010. That is, all items receiving A or C classifications were subjected to additional external fairness reviews as part of our standard five-panel review process for newly developed items. The five COMPASS fairness review panels (i.e., African Americans, Asian Americans, Hispanic/Latino/Latina Americans, Native Americans, Women) were asked to review these flagged items to discern possible reasons for their favoring a group.

External fairness panelists were asked to review flagged items to identify any aspect of the item that might be responsible for the DIF results. External fairness reviewers applied the same criteria as are used for all new item development fairness reviews, with (1) DIF background information to aid in their analyses and (2) additional specifics regarding the DIF results. External reviewer comments were evaluated to determine whether an alternate action for item disposition was needed. In the case of this external fairness review, external panelist feedback aligned with internal reviewer feedback and the original item disposition decisions.

Description of items reviewed. Item usage rates can vary dramatically within a computer-adaptive test pool; similar variation exists in the size of the examinee sample available to assess each item. For example, while frequently used items may have been administered to thousands of examinees in both the focal and base groups, less frequently used items may have been presented to only a handful of examinees from either group. Items were included in the DIF analyses only if the sample size exceeded 150 examinees for both the focal and base groups.

As indicated previously, a limited number of pretest items are also administered to examinees for each COMPASS placement test administration. Pretest items have been reviewed by external content and fairness panels and judged to be accurate and unbiased in terms of these soundness and fairness reviews. Pretest items are administered during operational test administrations to gather item-level statistics to inform decisions on moving these items to operational use. Pretest items do not contribute to examinee scores. Only one pretest item or item set (stimulus passage and associated items) is administered per examinee for each of the COMPASS placement tests.

For the 2008 COMPASS placement tests DIF study, there were sufficient data to analyze many of the pretest items along with operational items. At the pretest stage, items identified and confirmed to be potentially biased can be removed from the pool before they contribute to an examinee’s score. The updated DIF analyses included both operational and pretest items. The following descriptions and tables provide DIF information (e.g., comparisons, disposition of items) for each of the COMPASS placement tests.


Reading Placement Test. The COMPASS Reading Placement Test results presented in Table (3)3.10 are based on a total of 1,008,776 examinees. A total of 1,596 comparisons were made across gender and ethnicity comparison categories. Based on these comparisons, 28 comparisons (or about 1.7%) were flagged with Mantel-Haenszel values that exceeded the established criteria. Although a precise confidence level for the Mantel-Haenszel statistic is not known, experience indicates that chance alone will dictate that roughly 5% of all comparisons will fall into the A and C categories even when no DIF is present. The results presented below are lower than chance. The 28 flagged comparisons, involving 23 items (12 operational and 11 pretest items), were reviewed. For 7 of the 23 items (1 operational and 6 pretest items), the possibility of bias was deemed sufficient for one or more groups, and the determination was made to remove these items. These items were removed as part of a June 2012 COMPASS pool update.
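For clarity, the flag rate quoted above is simple arithmetic on the figures already given:

$$\frac{28}{1{,}596} \approx 0.0175 \approx 1.7\%,$$

which is well below the roughly 5% of comparisons that chance alone would be expected to place in the A and C categories.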

TABLE (3)3.10 Results of the DIF analysis for the COMPASS Reading Placement Test

Focal group                                    Base group                   Items analyzed    A      B      C    Focal group n (min./max./med.)    Base group n (min./max./med.)
Female                                         Male                              399          1    396      2    3,686 / 134,232 / 28,721          2,761 / 94,435 / 20,566
African American/Black                         Caucasian American/White          399          0    396      3    1,672 / 55,856 / 11,629           3,435 / 127,049 / 27,420
Mexican American/Puerto Rican/
  Cuban/other Hispanic                         Caucasian American/White          399          0    397      2    616 / 23,408 / 5,207              3,435 / 127,049 / 27,420
Asian American/Pacific Islander/Filipino       Caucasian American/White          399          1    379     19    231 / 7,647 / 1,736               3,435 / 127,049 / 27,420

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Writing Skills Placement Test. The COMPASS Writing Skills Placement Test analyses shown in Table (3)3.11 are based on a total of 904,184 examinees. A total of 2,331 comparisons were made across gender and ethnicity categories. Based on these comparisons, 89 comparisons (about 4%) were flagged with Mantel-Haenszel values that exceeded the criteria. These results are lower than chance. The 89 flagged comparisons, involving 75 items (26 operational and 49 pretest items), were reviewed. For 18 of the 75 items (6 operational and 12 pretest items), the possibility of bias was deemed sufficient for one or more groups. The determination was made to revise and re-pretest these items. This action was taken as part of a June 2012 COMPASS pool update.

TABLE (3)3.11 Results of the DIF analysis for the Writing Skills Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 603 | 1 / 601 / 1 | 293 / 148,235 / 29,152 | 228 / 110,437 / 21,624 |
| African American/Black | Caucasian American/White | 576 | 0 / 564 / 12 | 3,636 / 57,758 / 11,359 | 9,618 / 147,466 / 29,115 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 576 | 0 / 567 / 9 | 1,585 / 26,147 / 5,076 | 9,618 / 147,466 / 29,115 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 576 | 16 / 510 / 50 | 524 / 8,434 / 1,651 | 9,618 / 147,466 / 29,115 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Numerical Skills/Prealgebra Placement Test. The COMPASS Numerical Skills/Prealgebra Placement Test analyses shown in Table (3)3.12 are based on a total of 853,188 examinees. A total of 390 comparisons were made, of which 8 (about 2%) were flagged with Mantel-Haenszel values that exceeded the criteria, which is lower than would be expected by chance. The 8 flagged comparisons, involving 8 operational items, were reviewed. For one of these items, the possibility of bias was deemed sufficient for one or more groups, and the determination was made to remove the item. This operational item was removed as part of a June 2012 COMPASS pool update.

TABLE (3)3.12 Results of the DIF analysis for the Numerical Skills/Prealgebra Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 105 | 0 / 104 / 1 | 156 / 114,076 / 79,330 | 194 / 110,524 / 43,093 |
| African American/Black | Caucasian American/White | 100 | 1 / 98 / 1 | 169 / 52,559 / 29,375 | 170 / 152,616 / 56,274 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 95 | 0 / 95 / 0 | 207 / 17,675 / 13,160 | 557 / 152,616 / 58,251 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 90 | 4 / 85 / 1 | 163 / 9,017 / 3,009 | 2,551 / 152,616 / 59,781 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Algebra Placement Test. The COMPASS Algebra Placement Test analyses shown in Table (3)3.13 are based on a total of 815,130 examinees. A total of 407 comparisons were made, of which 4 (approximately 1%) were flagged with Mantel-Haenszel values that exceeded the criteria, which is lower than would be expected by chance. The 4 flagged comparisons, involving 4 operational items, were reviewed. During the review, no reasonable basis for possible DIF was found for any of these Algebra items, and the decision was made to return them to the pool in their current form.

TABLE (3)3.13 Results of the DIF analysis for the Algebra Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 109 | 0 / 108 / 1 | 156 / 109,527 / 50,707 | 156 / 88,311 / 38,124 |
| African American/Black | Caucasian American/White | 101 | 0 / 101 / 0 | 182 / 35,625 / 27,069 | 225 / 122,034 / 51,015 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 98 | 0 / 98 / 0 | 158 / 20,077 / 11,538 | 429 / 122,034 / 51,836 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 99 | 3 / 96 / 0 | 169 / 11,763 / 2,712 | 523 / 122,034 / 51,863 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


College Algebra Placement Test. The COMPASS College Algebra Placement Test analyses shown in Table (3)3.14 are based on a total of 138,682 examinees. A total of 321 comparisons were made, of which 1 (less than 1%) was flagged with a Mantel-Haenszel value that exceeded the criteria, which is lower than would be expected by chance. The single flagged comparison involved an operational item, which was reviewed. During the review, no reasonable basis for possible DIF was found for this College Algebra item, and the decision was made to return it to the pool in its current form.

TABLE (3)3.14 Results of the DIF analysis for the College Algebra Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 85 | 0 / 85 / 0 | 156 / 19,969 / 11,601 | 154 / 19,429 / 12,439 |
| African American/Black | Caucasian American/White | 79 | 0 / 78 / 1 | 190 / 4,897 / 1,967 | 465 / 24,319 / 15,827 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 77 | 0 / 77 / 0 | 155 / 3,959 / 1,923 | 1,461 / 24,319 / 15,952 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 80 | 0 / 80 / 0 | 248 / 3,722 / 1,965 | 408 / 24,319 / 15,415 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Geometry Placement Test. The COMPASS Geometry Placement Test analyses shown in Table (3)3.15 are based on a total of 28,256 examinees. A total of 280 comparisons were made, of which 5 (about 1.8%) were flagged with Mantel-Haenszel values that exceeded the criteria, which is lower than would be expected by chance. The five flagged operational items were reviewed per the process described previously. During the review, no reasonable basis for possible DIF was found for these items, and the decision was made to return them to the pool in their current form.

TABLE (3)3.15 Results of the DIF analysis for the Geometry Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 79 | 0 / 78 / 1 | 273 / 2,656 / 2,076 | 195 / 3,058 / 1,804 |
| African American/Black | Caucasian American/White | 73 | 0 / 72 / 1 | 182 / 697 / 468 | 387 / 3,361 / 2,479 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 67 | 0 / 67 / 0 | 171 / 512 / 382 | 1,372 / 3,361 / 2,503 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 61 | 1 / 58 / 2 | 152 / 701 / 321 | 649 / 3,361 / 2,568 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Trigonometry Placement Test. The COMPASS Trigonometry Placement Test analyses shown in Table (3)3.16 are based on a total of 64,385 examinees. A total of 352 comparisons were made, of which 2 (less than 1%) were flagged with Mantel-Haenszel values that exceeded the criteria, which is lower than would be expected by chance. The 2 flagged pretest items were reviewed. During the review, no reasonable basis for DIF was found for these Trigonometry pretest items, and the decision was made to return them to the pool in their current form.

TABLE (3)3.16 Results of the DIF analysis for the Trigonometry Placement Test

| Focal group | Base group | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 118 | 0 / 117 / 1 | 152 / 8,224 / 1,058 | 175 / 9,444 / 1,607 |
| African American/Black | Caucasian American/White | 60 | 0 / 60 / 0 | 167 / 1,312 / 900 | 1,292 / 11,161 / 8,812 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 62 | 0 / 62 / 0 | 158 / 1,569 / 1,079 | 1,276 / 11,161 / 8,762 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 112 | 1 / 111 / 0 | 151 / 2,396 / 551 | 727 / 11,161 / 3,110 |

A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Summary of DIF analyses for COMPASS item pools. The results reported for the comprehensive DIF study are based on data collected from 2004 to 2008 within the COMPASS Internet Version. As previously indicated, the use of exposure controls to manage operational item selection and the range of ability levels of examinees both affect item use and n-counts. Therefore, some test items did not achieve sufficient n-counts to be included in the 2008 DIF study. These DIF results included approximately 55% of all current operational and pretest items in the six COMPASS pools under investigation.

In 2011, ACT implemented routine DIF analyses as a critical step in pretest item calibrations, new pool construction, and pool evaluation and comparability activities. No additional instances of DIF were detected in the analyses conducted in 2011 and 2012. However, ACT will continue evaluating the COMPASS item pools in terms of item disposition and operational item exposures to balance the makeup of each pool (e.g., required content and range of difficulty) against the need to collect item-level data. As sufficient additional data become available for the COMPASS placement tests, these data will be analyzed and the results updated and reported.


Chapter 4: Development of ESL tests

Overview

In 1993, ACT began investigating an expansion of the COMPASS program to incorporate English as a Second Language (ESL) assessments. Progress toward designing ESL proficiency tests proceeded in several stages, as summarized in Table (3)4.1.

TABLE (3)4.1 History of ACT involvement in ESL

| Date | Stages of ESL development process |
|---|---|
| April 1993 | Survey sent to ESL coordinators to obtain information about tests used to assess ESL students at postsecondary institutions |
| February 1994 | Survey sent to ESL program administrators to identify ESL populations that might not be well served by existing tests |
| March 1994 | In-depth content survey sent to selected ESL professionals to determine issues relevant to ESL learning and assessment |
| July 1994 | Two-day panel meeting to discuss the appropriate purpose and structure of an ESL proficiency test |
| October 1995 | Panel meeting of consultants in grammar/usage, reading, and listening |
| November 1995 | Panel meeting of consultants in speaking and writing |
| December 1995 | Panel meeting of consultants to clarify the ESL market |

From the initial surveys, ACT learned that although all postsecondary institutions felt to some degree that existing ESL tests did not meet their needs, community colleges voiced the most dissatisfaction. Using the information from the surveys, the initial panel meeting, and internal research, ACT formulated a preliminary test “blueprint” and presented this to the three panels of ESL experts convened during October, November, and December of 1995. Based on feedback from these meetings, ACT created a preliminary item taxonomy and worked to connect the language and testing philosophy of ACT with the requirements of the ESL proficiency tests.

System and test specifications

To develop specifications for the ESL proficiency tests, ACT staff began surveying and consulting with ESL professionals. As a result of those consultations, ACT designed the test to

• meet the proficiency testing needs of students attending community and technical colleges and four-year institutions with ESL programs;


• be administered in individual modules that assess three critical skills: grammar/usage, reading, and listening;

• supply narrative proficiency descriptors in addition to numerical scores;

• contain an easy-to-navigate testing interface appropriate for the population; and

• be computer-adaptive.

ACT chose the computer-adaptive format to most effectively place students at all levels of proficiency. Using computer-adaptive tests with an ESL population is valuable for the following reasons:

1. Adaptive tests focus on administering items at appropriate levels of difficulty, which prevents examinees from taking tests that are far above or far below their abilities.

2. ESL students often enroll at various points in a semester and need to be tested on an individualized basis.

3. Students can proceed at their own pace yet have their assessment time monitored automatically, allowing institutions to use the amount of time the student spends taking the test as an additional factor in ESL course decisions.

As a group, ESL students have particular trouble performing reading or language tasks when testing time is limited (Miller, 1988). Ascher (1990) points out that bilingual speakers “process information more slowly in their less familiar language, which accounts for their slower speed in test-taking.” Thus, when time is a factor, the “test-wiseness” of the students sometimes becomes the primary skill measured. Removing time pressure is one major advantage of computer-adaptive testing. ESL administrators and faculty attending ACT meetings in the fall of 1995 indicated that the adaptive nature of COMPASS was an improvement over traditional paper-and-pencil tests. Even with an untimed test, however, adaptive testing provides the speed and accuracy of information that institutions require for their ESL course placement process.

In the fall of 1995, ACT staff met with a series of panels of ESL professionals to determine the proficiency descriptors for the three ESL modules: grammar/usage, reading, and listening. The proficiency descriptors were linked with nationally recognized benchmarks of English language proficiency, including

• the 1986 proficiency guidelines developed by members of the American Council on the Teaching of Foreign Languages (ACTFL);

• the Second Language Proficiency Descriptors considered by the California Association of Teachers of English to Speakers of Other Languages (TESOL) to assist in ESL articulation;


• the 1993 benchmarks written for the College Standards and Accreditation Council for a pilot project to develop standards in ESL education throughout 17 colleges in Ontario, Canada;

• the Language Benchmarks intended for adult learners of English as a Second Language drafted in 1993 by the [Canadian] National Working Group on Language Benchmarks; and

• the March 1996 draft version of TESOL’s ESL Standards for Pre-K through Grade 12 students.

ACT staff and consultants then developed item classification categories for each of the three modules, addressing the skills at each proficiency level. The next step was determining the proportion of various item types at each level. Then ACT staff began the process of developing passages and items to fit the test plan for each of the ESL proficiency tests.

Item development procedures

The following description of item development procedures—writing, editing, reviewing, and pretesting—applies to all three multiple-choice ESL proficiency tests.

Selection and training of item writers

ACT contracts with ESL item writers who are current or former ESL instructors at the postsecondary level and who represent a variety of institutions and geographical locations. The writers must adhere to the specifications given in the ESL Item Writer’s Guide for the construction of ESL items that test grammar/usage, reading, and listening skills. The guide outlines the item format, appropriate topics, and examples of acceptable items, as well as the requirements for sensitivity issues such as fair portrayal of population subgroups and use of nonsexist language. All potential item writers submit a work sample, and ACT selects item writers on the basis of these samples.

Item reviews

All passages and items in the ESL proficiency tests go through an internal ACT review; ESL content editors evaluate and modify items on the basis of soundness and fairness criteria. An English language editor also reviews all items for accuracy of grammar, usage, syntax, clarity, appropriateness of tone and voice, and adherence to COMPASS test style. After the thorough internal review, items undergo two separate reviews by external consultants for both soundness (content) and fairness (sensitivity).


Soundness reviews

ACT staff and a panel of ESL consultants review all passages and items in each content area to ensure that they meet the following criteria:

• stimulus passages match the content criteria in the proficiency descriptor of the assigned skill level;

• items measure the skill reflected in their item classification category;

• items have one and only one correct answer, and the other answer choices are plausible but incorrect;

• items are passage-dependent; that is, prior knowledge or common sense should not allow examinees to answer items without reading or listening to the stimulus;

• passages and items within one level are of comparable difficulty;

• the continuum of passage and item difficulty from lowest to highest levels is smooth.

Sensitivity reviews

Before they are pretested, passages and items are sent to three panels for fairness reviews in an effort to ensure that materials in the ESL proficiency tests will not offend, favor, or disadvantage any examinees because of a particular racial, ethnic, cultural, socioeconomic, or linguistic background. Additionally, panelists are asked to evaluate the content of the lower levels of the tests to determine whether these items are equally familiar to all examinee groups (this is further detailed in the ESL fairness guidelines in the next subsection).

ACT convenes three separate five-member panels: two composed of persons from Asian, Latino/Latina, and other backgrounds, and a third consisting of ESL teachers who have worked with students from many cultures. Each panelist receives a packet of passages and items, as well as a set of review guidelines. After an individual review period, panelists participate in a teleconference moderated by ACT staff. Each passage and item is reviewed, and any issues raised are discussed. ACT staff note the nature of any problem, its seriousness, and the presence or absence of a panel and staff consensus. After the teleconference, panelists return the materials with their written comments. ACT staff discuss the comments and panelist recommendations to determine whether to make revisions, drop a passage and/or item completely, or make no change.

ACT ESL fairness guidelines

ACT tests developed for native English speakers typically require that test materials reflect cultural diversity; tests are designed so that the examinee experiences are represented equally across backgrounds. Criteria for the ESL proficiency tests, however, are slightly different. Examinees taking ESL tests may be from any culture in the world. Thus, reflecting the diversity of the examinee pool would be a difficult task and one that could easily disadvantage many students. Test materials appropriate for ESL students from one language group or culture could portray situations that are unfamiliar to students from other language or culture groups. Therefore, rather than reflecting global diversity, the ESL proficiency tests include materials and topics that are likely to be familiar to the largest possible segment of the test-taking population.

For example, beginning and intermediate level items use topics such as buying food, working, family relationships, and functional transactions. However, even these universal contexts can include some unfamiliar elements, so ACT staff instruct fairness panelists to pay particular attention to what might be unfamiliar to one or more groups. Items at the advanced level are more academic and include some high-frequency idioms.

Topics that are sensitive or controversial in many cultures, such as religion or gender roles, present an additional challenge to developing ESL tests. Views on these issues vary greatly across cultures. To respect the variety of student views, ACT does not use material in its ESL proficiency tests that focuses on potentially culturally sensitive or controversial topics.

Testing can create a stressful situation for the examinee, so ACT takes care in the writing and selection of test materials to eliminate any source of unfairness that might distract examinees. Generally, unfairness can be eliminated from test materials through close adherence to the principles of fair portrayal and fair language.

• Fair portrayal. All groups should be portrayed accurately and fairly without reference to stereotypes or traditional roles with regard to gender, age, race, ethnicity, religion, physical condition, or geographic origin. Comparative descriptions of different population group attributes not germane to the knowledge and skills being measured should be avoided.

• No stereotypes or moral judgments. Presentations of cultural or ethnic differences must neither explicitly nor implicitly rely on stereotypes or make moral judgments. Comparative descriptions of different population groups and their members should pertain to the same attribute and be relevant for the knowledge and skills being measured.

• Fairness in language. The characterization of any group must not be at the expense of that group: jargon, slang, and demeaning characterizations are not permitted, and references to color, marital status, religion, or gender should be made only when they are germane to the knowledge and skills being measured.


Differential item functioning analyses of ESL pools

Differential item functioning (DIF) analyses are based on student response data and are most appropriately conducted after a test becomes operational. The comprehensive ESL proficiency test DIF analyses conducted in 2008 are based on item-level data extracts from the COMPASS Internet Version. The methodology, data, and results of the DIF study for the ESL Reading, ESL Listening, and ESL Grammar/Usage proficiency tests are described below.

DIF methodology

DIF analyses offer a method for examining each item on a test to determine whether the item functions differently for members of different subgroups of the examinee population. A typical DIF analysis compares the performance of a focal (minority) group against that of a relevant base group. Members of the two groups are matched on the basis of a criterion measure of ability in the relevant subject area, usually the total score on the test.

For the purposes of DIF analyses, ACT uses the Mantel-Haenszel common odds ratio (Holland & Thayer, 1988) as the statistical index. The Mantel-Haenszel index compares the odds that members of one group answer an item correctly with the odds that matched members of the other group do so. Items identified by a DIF analysis as being potentially biased against either a focal group or a base group are reexamined, along with any relevant stimulus materials such as passages and tables, to determine whether the language or context of the items or related materials may be a contributing factor.

The ESL proficiency test items present unique challenges to standard DIF analyses. One challenge is that the computer-adaptive nature of test administration means that few examinees receive the same items or even the same number of items. Furthermore, examinees who do get the same items may not get them in the same order. Conventional DIF procedures that match examinees on the basis of simple number-correct scores are, therefore, not applicable. Instead, these DIF analyses used each examinee’s estimated true score as the matching criterion. This score is defined as the expected proportion of items an examinee would answer correctly if all items in a specific pool were administered.

A second challenge for DIF analyses for the ESL test items is that it is difficult to identify appropriate groupings of ESL students and to identify a base group against which to compare the performance of any identified focal groups. Because the ESL proficiency tests are designed specifically for use by non-native English speakers, they generally assess a level of skills considered to be already present in standard native English speakers; ESL tests are not administered to native speakers (who would be likely to answer all or most items correctly).


To address these issues, ACT reviewed available demographic data for students who had been administered one or more of the ESL proficiency tests to determine the major categories of ethnicity that these students were selecting. Three major groups emerged from these data:

• Asian (including Filipino and other Pacific Islander examinees)

• Hispanic (including Mexican, Puerto Rican, Latino/Latina, and Cuban examinees)

• Caucasian/White examinees

These three groups were examined for the ESL proficiency test DIF analyses. These groups are similar to the categories used during item development to identify and convene fairness-review panels. Because no clear base group emerged from the data, ACT compared each group with each of the other two groups, using the Caucasian/White group as the nominal base group for two analyses (Caucasian/Asian and Caucasian/Hispanic) and using the Asian group as the base group for the Asian-Hispanic comparisons. In addition, the possibility of DIF arising from gender differences (across all ethnic groups) was examined with females as the focal group and males as the base group. As described in the detail that follows, group differences in performance on a given item were then examined for potential bias in either direction (i.e., toward the base or the focal group) for each set of comparisons.

ACT DIF studies are based on data collected during operational (i.e., “live”) test administrations of the ESL proficiency tests rather than on data gathered under more artificial circumstances. This approach mitigates the possibility that the data are affected by low examinee motivation. Use of operational data also helps to ensure that the fairness of items and test scores is evaluated under conditions identical to those in place during actual testing.

As part of each ESL test administration, both operational items (i.e., items that contribute to an examinee’s score) and pretest items are administered. Although pretest items have undergone both internal and external fairness reviews, they do not contribute to examinee scores because the intent of pretesting is to gather additional data under operational conditions for item analysis and calibration.

Only one pretest item or item set (stimulus passage and associated items) is administered per examinee for each of the three ESL proficiency tests. However, while the number of pretest items presented for each examinee tested is relatively small, the exposure of pretest items is controlled within the COMPASS system so that all pretest items are presented within a particular range of ability level. This system-controlled presentation of pretest items means that there were sufficient data available for many of the pretest items in the ESL item pools. This approach to “forcing” pretest item presentation and gathering pretest data is the preferred model for DIF analysis because the pretest stage is the most appropriate time to eliminate items from pools.


This approach allows ACT to identify potentially biased items during pretesting; these items can be removed from the pool before they are included in the computation of an examinee’s score.

In contrast to controls associated with presenting pretest items, the principal drawback to using data from administrations of operational items (i.e., items that contribute to examinees’ scores) is the time required to collect sufficient data to conduct the necessary analyses. This is especially true with the ESL proficiency tests as a result of the following program features and constraints:

• The three ESL item pools are large, with approximately 200 operational items each. Furthermore, items that would otherwise be quite frequently administered are protected against overexposure, an explicit design feature of a computer-adaptive test intended to protect item security and mitigate overuse; the use of item exposure parameters results in a more balanced use of the pool of items. Accordingly, very large numbers of tests must be administered before most items are presented to adequate numbers of members of small demographic groups.

• Item pools are broad as well as deep, with each pool including items that cover a wide range of difficulty. Because the ESL tests are adaptive, difficult items are generally administered to better-prepared examinees, while easier items are administered to less-prepared examinees. Therefore, it is not sufficient that the ESL proficiency tests simply be administered to large numbers of examinees; these examinees collectively must also span a wide range of abilities.

• Because exposure controls manage or limit the selection of operational items, and because the ability levels of ESL examinees are unevenly distributed, some operational ESL proficiency test items did not achieve sufficient n-counts to be included in the 2008 DIF study. Because DIF analyses are conducted as the required data become available, the performance of the most frequently administered items in the largest demographic groups is checked first.

Description of DIF procedures

DIF analyses of COMPASS/ESL data are performed by a statistical procedure developed by Mantel and Haenszel (1959) and adapted by Holland and Thayer (1988). Under these procedures, examinees are grouped into performance strata on the basis of a matching variable, usually their test scores. Each examinee’s 1/0 (right/wrong) item scores and demographic group membership (focal or base) are used to construct a 2 × 2 contingency table for each of k levels of the matching variable. The overall 2 × 2 × k table provides the data for the procedure.


Specifically, at level k for a given item, the 2 × 2 table is of the form

| Group | Correct (1) | Incorrect (0) | Total |
|---|---|---|---|
| Base group | $A_k$ | $B_k$ | $n_{Rk}$ |
| Focal group | $C_k$ | $D_k$ | $n_{Fk}$ |
| Total | $m_{1k}$ | $m_{0k}$ | $T_k$ |

where there are $T_k$ examinees at stratum $k$, with $n_{Rk}$ in the base group and $n_{Fk}$ in the focal group. The frequencies of correct response are $A_k$ for the base group and $C_k$ for the focal group, and $B_k$ and $D_k$ are the frequencies of incorrect response for the base and focal groups, respectively.

The Mantel-Haenszel common odds ratio is estimated by

$$\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k}$$

Values of the odds ratio greater than 1.0 indicate DIF favoring the base group. Values less than 1.0 indicate DIF favoring the focal group. At ACT, values of the odds ratio that are less than 0.5 or greater than 2.0 are flagged for further review.
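To make the estimator and the flagging rule concrete, the following sketch (an illustration only, not ACT's operational code) computes $\hat{\alpha}_{MH}$ from per-stratum 2 × 2 counts and assigns the A/B/C category used throughout this chapter; the stratum counts shown are hypothetical.

```python
# Illustrative sketch: Mantel-Haenszel common odds ratio from 2 x 2 x k counts.
# The Stratum fields follow the table above: A/B = base group correct/incorrect,
# C/D = focal group correct/incorrect. Values are invented for the example.
from dataclasses import dataclass
from typing import List

@dataclass
class Stratum:
    A: int  # base group, correct
    B: int  # base group, incorrect
    C: int  # focal group, correct
    D: int  # focal group, incorrect

    @property
    def T(self) -> int:  # total examinees in the stratum
        return self.A + self.B + self.C + self.D

def mantel_haenszel_odds_ratio(strata: List[Stratum]) -> float:
    """Estimate alpha_MH = sum_k(A_k * D_k / T_k) / sum_k(B_k * C_k / T_k)."""
    numerator = sum(s.A * s.D / s.T for s in strata if s.T > 0)
    denominator = sum(s.B * s.C / s.T for s in strata if s.T > 0)
    return numerator / denominator

def dif_category(alpha_mh: float) -> str:
    """A: favors the focal group; C: favors the base group; B: favors neither."""
    if alpha_mh < 0.5:
        return "A"
    if alpha_mh > 2.0:
        return "C"
    return "B"

# Three hypothetical ability strata for a single item
item_strata = [Stratum(A=40, B=10, C=35, D=15),
               Stratum(A=30, B=20, C=28, D=22),
               Stratum(A=15, B=35, C=12, D=38)]
alpha = mantel_haenszel_odds_ratio(item_strata)
print(round(alpha, 2), dif_category(alpha))
```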

A potential limitation of applying the Mantel-Haenszel procedure to adaptive tests concerns the matching variable. The raw score that is typically used in a paper-and-pencil test is not appropriate for an adaptive test because different examinees take tests of different length and composition. An alternative that was shown by Zwick, Thayer, and Wingersky (1993) to be effective is the expected true score based on the computer-adaptive model,

$$\hat{\xi} = \sum_{i=1}^{n_{pool}} P_i(\hat{\theta})$$

where $n_{pool}$ is the number of items in the pool, $P_i(\hat{\theta})$ is the model-based probability of a correct response to item $i$, and $\hat{\theta}$ is the ability estimate obtained from the items actually administered. Since COMPASS reports the percentage, rather than the number, of the pool’s items that an examinee would be expected to answer correctly, this score, $\hat{\xi}/n_{pool}$, is used as the matching variable for all analyses.
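The sketch below shows, under stated assumptions, how the percent-of-pool matching variable $\hat{\xi}/n_{pool}$ could be computed for one examinee. A three-parameter logistic item response function is assumed purely for illustration (this manual does not specify the operational COMPASS item model), and the item parameters are hypothetical.

```python
# Illustrative sketch: expected percent-correct over a pool, used as the DIF
# matching variable. A 3PL response function is an assumption for illustration.
import math
from typing import List, Tuple

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Assumed 3PL probability of a correct response to one item."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def expected_percent_correct(theta_hat: float,
                             pool: List[Tuple[float, float, float]]) -> float:
    """Return 100 * xi_hat / n_pool, where xi_hat = sum_i P_i(theta_hat)."""
    xi_hat = sum(p_correct(theta_hat, a, b, c) for (a, b, c) in pool)
    return 100.0 * xi_hat / len(pool)

# Hypothetical mini-pool of (a, b, c) item parameters
pool = [(1.0, -1.0, 0.20), (0.8, 0.0, 0.20), (1.2, 0.5, 0.25), (0.9, 1.5, 0.20)]
print(round(expected_percent_correct(0.3, pool), 1))
```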

Data used in DIF analyses

Data for the DIF analyses reported here were collected as part of the operational use of ESL proficiency tests administered from May 2004 through May 2008. A total of 188 institutions contributed data to these analyses. Table (3)4.2 presents a summary of these institutions by size, geographic region, and type of institution.

TABLE (3)4.2 Size and geographical representation of institutions in ESL proficiency tests DIF study

| Description | No. of schools |
|---|---|
| Institution size | |
| Under 5,000 | 118 |
| 5,000–10,000 | 37 |
| 10,000–20,000 | 23 |
| Over 20,000 | 10 |
| Region of U.S. | |
| Eastern | 18 |
| Northeastern | 13 |
| Southeastern | 28 |
| Midwestern | 62 |
| Western | 32 |
| Northwestern | 8 |
| Southwestern | 27 |
| Institution type by Carnegie classification* | |
| Doctorate-granting universities | 7 |
| Master’s colleges and universities | 23 |
| Baccalaureate colleges | 9 |
| Associates colleges | 145 |
| Special focus institutions (faith, health, business) | 4 |

* Based on The Carnegie Foundation: http://www.carnegiefoundation.org/classifications/

The total numbers of examinee item- and test-level results used for the 2008 DIF analyses were as follows:

• 58,040 for the ESL Reading Proficiency Test

• 56,545 for the ESL Grammar/Usage Proficiency Test

• 53,132 for the ESL Listening Proficiency Test

Tables (3)4.3, (3)4.4, and (3)4.5 present summaries of these examinees by ethnic and gender background for the ESL Reading, Grammar/Usage, and Listening proficiency tests, respectively. In each table, the “Ethnicity” column has several labels with a slash mark, such as “Asian American/Pacific Islander,” “Mexican American/Chicano/Latino,” “African American/Black,” and “Caucasian American/White.” These labels are included in a demographic item titled “Ethnic Background” that is automatically administered to all examinees who take a COMPASS test. For non-native-English-speaking examinees who are administered one or more of the ESL proficiency tests, it is assumed that those who check the “African American/Black” category are identifying themselves as “Black” rather than as “African American.” Similarly, ESL examinees selecting the “Caucasian American/White” category are assumed to be identifying themselves as “White” rather than as “Caucasian American.” The same reasoning applies to other categories with multiple ethnic descriptors.

TABLE (3)4.3 Sample sizes for each ethnic and gender group in ESL Reading Test DIF analyses

| Ethnicity | Females: n-count | Females: % of total | Males: n-count | Males: % of total | Total: n-count | Total: % of total |
|---|---|---|---|---|---|---|
| African American/Black | 1,580 | 3.02 | 1,891 | 3.61 | 3,471 | 6.64 |
| American Indian/Alaskan Native¹ | 113 | 0.22 | 101 | 0.19 | 214 | 0.41 |
| Caucasian American/White | 2,268 | 4.34 | 1,224 | 2.34 | 3,492 | 6.68 |
| Mexican American/Chicano/Latino(a) | 6,442 | 12.31 | 4,482 | 8.57 | 10,924 | 20.88 |
| Asian American/Pacific Islander | 4,927 | 9.42 | 3,589 | 6.86 | 8,516 | 16.28 |
| Puerto Rican/Cuban/other Hispanic | 7,036 | 13.45 | 4,022 | 7.69 | 11,058 | 21.14 |
| Filipino | 423 | 0.81 | 210 | 0.40 | 633 | 1.21 |
| Other¹ | 7,363 | 14.08 | 4,954 | 9.47 | 12,317 | 23.55 |
| Prefer not to respond¹ | 882 | 1.69 | 804 | 1.54 | 1,686 | 3.22 |
| Total | 31,034 | 59.33 | 21,277 | 40.67 | 52,311² | 100.00 |

¹ These examinees were not included in the DIF analysis because of the low n-count.
² This total does not include 5,729 examinees who did not respond to these items.


TABLE (3)4.4 Sample sizes for each ethnic and gender group in ESL Grammar/Usage Test DIF analyses

| Ethnicity | Females: n-count | Females: % of total | Males: n-count | Males: % of total | Total: n-count | Total: % of total |
|---|---|---|---|---|---|---|
| African American/Black | 1,567 | 3.07 | 1,851 | 3.63 | 3,418 | 6.71 |
| American Indian/Alaskan Native¹ | 110 | 0.22 | 107 | 0.21 | 217 | 0.43 |
| Caucasian American/White | 2,183 | 4.28 | 1,178 | 2.31 | 3,361 | 6.59 |
| Mexican American/Chicano/Latino(a) | 6,300 | 12.36 | 4,371 | 8.58 | 10,671 | 20.94 |
| Asian American/Pacific Islander | 4,554 | 8.94 | 3,481 | 6.83 | 8,035 | 15.77 |
| Puerto Rican/Cuban/other Hispanic | 7,032 | 13.80 | 4,017 | 7.88 | 11,049 | 21.68 |
| Filipino | 421 | 0.83 | 228 | 0.45 | 649 | 1.27 |
| Other¹ | 7,083 | 13.90 | 4,783 | 9.39 | 11,866 | 23.28 |
| Prefer not to respond¹ | 875 | 1.72 | 823 | 1.61 | 1,698 | 3.33 |
| Total | 30,125 | 59.11 | 20,839 | 40.89 | 50,964² | 100.00 |

¹ These examinees were not included in the DIF analysis because of the low n-count.
² This total does not include 5,581 examinees who did not respond to these items.


TABLE (3)4.5 Sample sizes for each ethnic and gender group in ESL Listening Test DIF analyses

| Ethnicity | Females: n-count | Females: % of total | Males: n-count | Males: % of total | Total: n-count | Total: % of total |
|---|---|---|---|---|---|---|
| African American/Black | 1,394 | 2.92 | 1,738 | 3.64 | 3,132 | 6.55 |
| American Indian/Alaskan Native¹ | 88 | 0.18 | 80 | 0.17 | 168 | 0.35 |
| Caucasian American/White | 1,800 | 3.77 | 977 | 2.04 | 2,777 | 5.81 |
| Mexican American/Chicano/Latino(a) | 5,720 | 11.97 | 4,036 | 8.44 | 9,756 | 20.41 |
| Asian American/Pacific Islander | 3,974 | 8.31 | 3,128 | 6.54 | 7,102 | 14.86 |
| Puerto Rican/Cuban/other Hispanic | 6,794 | 14.21 | 3,888 | 8.13 | 10,682 | 22.35 |
| Filipino | 367 | 0.77 | 188 | 0.39 | 555 | 1.16 |
| Other¹ | 6,348 | 13.28 | 4,555 | 9.53 | 10,903 | 22.81 |
| Prefer not to respond¹ | 1,438 | 3.01 | 1,289 | 2.70 | 2,727 | 5.70 |
| Total | 27,923 | 58.41 | 19,879 | 41.59 | 47,802² | 100.00 |

¹ These examinees were not included in the DIF analysis because of the low n-count.
² This total does not include 5,330 examinees who did not respond to these items.


Summary of DIF analyses

This section describes the DIF analyses that were conducted for the ESL Reading, Grammar/Usage, and Listening proficiency tests. The first part of this section provides a description of the DIF analysis and item-review process. The final part of this section provides the results obtained from the analyses and subsequent review.

Items were classified on the basis of their Mantel-Haenszel values into one of three categories: A, B, or C.

• A-category items, which had Mantel-Haenszel values of less than 0.5, were judged as favoring the focal group.

• B-category items had Mantel-Haenszel values between 0.5 and 2.0 and were judged as favoring neither the focal nor the reference groups.

• C-category items had Mantel-Haenszel values that exceeded 2.0 and were judged as favoring the base or reference group.

Internal item reviews. All items receiving A or C classifications were subjected to additional review in an attempt to discern possible reasons for their favoring one group over another. ACT staff examined flagged items, as well as any relevant stimulus materials (e.g., passages, graphs, tables, audio files, figures), to identify any aspect of the item or associated materials that might possibly be responsible for the obtained DIF results. The reviewers applied the same criteria (i.e., use of familiar content/context, fair portrayal, and fairness in language) that were used by the external fairness panels to ensure a cohesive and consistent approach to the review processes.

All reviewed items and written reviewer comments were collected and consolidated. Once written comments were compiled, reviewers further examined all item-level comments, evaluated them, and determined an appropriate course of action. The three options for disposition of each item were as follows:

1. Remove the item and, if necessary, all related materials from the ESL item pools.

2. Remove the item, revise it, and recycle the revised item, along with all associated stimulus materials, through the item-development process (including reevaluating the item via both external fairness review panels and DIF analysis).

3. Continue to use the item in its current form.

NOTE: As part of the ongoing ACT effort to promote and maintain fairness in its items and tests, all items that are returned to COMPASS pools continue to be examined along with all other items in later DIF analyses as additional data become available.


Description of items reviewed. Item usage rates can vary dramatically within a computer-adaptive test pool; similar variation exists in the size of the examinee sample available to evaluate each item. For example, while frequently used items may have been administered to thousands of examinees in both the focal and base groups, less frequently used items may have been presented to only a relative handful of examinees from either group. Items were included in the DIF analyses only if the sample size exceeded 150 examinees for both focal and base groups.
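As a hypothetical illustration of the inclusion rule just described (more than 150 examinees in both the focal and base groups), the sketch below screens a set of item-level n-counts before DIF analysis; the item IDs, field names, and counts are invented for the example.

```python
# Illustrative sketch: screening items for inclusion in a DIF analysis using a
# minimum per-group sample size. Data structure and values are hypothetical.
from typing import Dict, List

MIN_GROUP_N = 150

def items_eligible_for_dif(item_counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Return item IDs whose focal and base n-counts both exceed the minimum."""
    return [item_id for item_id, counts in item_counts.items()
            if counts.get("focal_n", 0) > MIN_GROUP_N
            and counts.get("base_n", 0) > MIN_GROUP_N]

# Hypothetical usage: only the first item qualifies
counts = {"R0123": {"focal_n": 612, "base_n": 431},
          "R0456": {"focal_n": 97, "base_n": 1048}}
print(items_eligible_for_dif(counts))
```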

A limited number of pretest items are administered for each ESL test administration, all pretest items having been reviewed by external content and fairness panels and judged to be accurate and unbiased. These items are administered during operational tests to gather item-level statistics to inform decisions on moving the pretest items to operational use. Pretest items do not contribute to examinee scores. Only one pretest item or item set (stimulus passage and associated items) is administered per examinee for each of the three ESL tests.

For the 2008 DIF analyses, there were sufficient data to analyze many of the pretest items along with operational items. At the pretest stage, items identified and confirmed to be potentially biased can be removed from the pool before they contribute to an examinee’s score. The 2008 DIF analyses included both operational and pretest items when n-counts were sufficient.

DIF analysis results for ESL pools

This subsection details the DIF information on item comparisons and disposition for each of the three ESL proficiency tests.


ESL Reading Proficiency Test. The ESL Reading Proficiency Test analyses presented in Table (3)4.6 are based on a total of 58,040 examinees. A total of 589 comparisons were made across the gender and three ethnicity comparison categories, of which 40 (about 7%) were flagged with Mantel-Haenszel values that exceeded the established criteria. Although a precise confidence level for the Mantel-Haenszel statistic is not known, experience indicates that chance alone will place roughly 5% of all comparisons in the A and C categories even when no DIF is present; the results presented here are therefore somewhat higher than chance. The 40 flagged comparisons, involving 33 items, were reviewed. During the three-person review, no reasonable basis for possible DIF was found for 5 of these items, and the decision was made to return them to the pool in their current form. The remaining items were removed from the ESL pool as part of a June 2012 pool update.

TABLE (3)4.6 Results of DIF analysis for the ESL Reading Proficiency Test

| Focal group* | Base group* | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 219 | 2 / 217 / 0 | 316 / 6,852 / 615 | 250 / 4,624 / 431 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 84 | 4 / 77 / 3 | 547 / 5,337 / 536 | 167 / 1,286 / 536 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 84 | 4 / 79 / 1 | 302 / 2,192 / 1,609 | 167 / 1,286 / 536 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Asian American/Pacific Islander/Filipino | 202 | 9 / 176 / 17 | 363 / 5,337 / 433 | 152 / 2,192 / 182 |

* The ethnicity labels (e.g., Caucasian American) are used for both COMPASS and ESL tests. ESL students are instructed to select the category that best represents their ethnicity.
A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


ESL Grammar/Usage Proficiency Test. The ESL Grammar/Usage Proficiency Test analyses shown in Table (3)4.7 are based on a total of 56,545 examinees. A total of 580 comparisons were made, of which 63 (11%) were flagged with Mantel-Haenszel values that exceeded the criteria; these results are higher than would be expected by chance. The 63 flagged comparisons, involving 56 items, were reviewed per the process described previously. For 14 of the 56 items, the possibility of bias was deemed sufficient for one or more groups, and the determination was made to revise and re-pretest these items (and any related stimulus materials as appropriate). During the three-person review, no reasonable basis for possible DIF was found for 3 of the items, and it was decided to return them to the pool in their current form. The remaining items were removed from the ESL pool as part of a June 2012 pool update.

TABLE (3)4.7 Results of DIF analysis for ESL Grammar/Usage Proficiency Test

| Focal group* | Base group* | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 227 | 1 / 226 / 0 | 256 / 6,414 / 615 | 175 / 4,545 / 415 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 80 | 1 / 75 / 4 | 325 / 5,469 / 909 | 236 / 1,112 / 465 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 80 | 3 / 70 / 7 | 174 / 2,129 / 1,389 | 236 / 1,112 / 465 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Asian American/Pacific Islander/Filipino | 193 | 22 / 146 / 25 | 308 / 5,469 / 441 | 152 / 2,129 / 179 |

* The ethnicity labels (e.g., Caucasian American) are used for both COMPASS and ESL tests. ESL students are instructed to select the category that best represents their ethnicity.
A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


ESL Listening Proficiency Test. The ESL Listening Proficiency Test analyses shown in Table (3)4.8 are based on a total of 53,132 examinees. A total of 413 comparisons were made, of which 12 (about 3%) were flagged with Mantel-Haenszel values that exceeded the criteria; these results are lower than might be expected due to chance alone. The 12 flagged comparisons, involving 11 items, were reviewed per the process described previously. During the three-person review, no reasonable basis for possible DIF was found for any of these items, and the decision was made to return them to the pool in their current form. As additional data become available, these items, along with all others, will be reanalyzed for possible DIF.

TABLE (3)4.8 Results of DIF analysis for ESL Listening Proficiency Test

| Focal group* | Base group* | No. of items analyzed | Mantel-Haenszel values (A / B / C) | Focal group n (min. / max. / med.) | Base group n (min. / max. / med.) |
|---|---|---|---|---|---|
| Female | Male | 211 | 0 / 210 / 1 | 247 / 6,356 / 457 | 183 / 4,603 / 329 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Caucasian American/White | 65 | 1 / 64 / 0 | 1,307 / 4,929 / 3,403 | 210 / 779 / 504 |
| Asian American/Pacific Islander/Filipino | Caucasian American/White | 65 | 1 / 61 / 3 | 639 / 1,638 / 1,396 | 210 / 779 / 504 |
| Mexican American/Puerto Rican/Cuban/other Hispanic | Asian American/Pacific Islander/Filipino | 72 | 6 / 66 / 0 | 312 / 4,929 / 3,303 | 151 / 1,638 / 1,394 |

* The ethnicity labels (e.g., Caucasian American) are used for both COMPASS and ESL tests. ESL students are instructed to select the category that best represents their ethnicity.
A: Items with Mantel-Haenszel values of less than 0.5 are judged as favoring the focal group.
B: Items with Mantel-Haenszel values between 0.5 and 2.0 are judged as favoring neither group.
C: Items with Mantel-Haenszel values exceeding 2.0 are judged as favoring the base group.


Summary of DIF analyses for ESL pools

The results reported for this DIF study are based on all data collected to date for the ESL proficiency tests as administered within the COMPASS Internet Version. As indicated previously, the use of exposure controls to manage operational item selection and the range of ability levels of ESL examinees both affect item use and n-counts; therefore, some operational and pretest items did not achieve sufficient n-counts to be included in this DIF study. However, DIF analyses are conducted as the data become available, and the performance of the most frequently administered items in the largest demographic groups is checked first. These DIF results include over 60% of all current operational and pretest items in the three ESL content areas.

ACT continues to collect data for these and all other ACT tests as part of its ongoing effort to ensure fairness in all ACT tests. ACT will continue evaluating the ESL item pools in terms of disposition of items and operational item exposures to balance the makeup of the pool (e.g., required content and range of difficulty) against the need to collect item-level data. As sufficient additional data become available for the ESL proficiency tests, these data will be analyzed and the results reported.


Chapter 5: Development of COMPASS e-Write and ESL e-Write

Overview

In 2001, ACT began developing an online direct writing assessment that could be incorporated into the COMPASS system to further assist campuses in their course placement process. Progress toward that goal included substantial research and development of online assessment software and intensive field testing with selected campuses. ACT introduced COMPASS e-Write, a direct writing assessment, in December 2001.

COMPASS e-Write prompt development

The general ACT approach to the development of the prompts used in COMPASS e-Write was guided by three principles:

1. There must be thorough and open participation by all relevant populations in the prompt development process.

2. The prompt development process must be carefully designed, technically sound, rigorously implemented, and appropriately validated.

3. The prompt development process must be comprehensible to all interested parties and easily implemented by the process participants.

Following these principles enabled ACT to develop prompts that

• are accessible to all students, regardless of gender, cultural, or ethnic background;

• address social and cultural issues within the general knowledge of the students, free of “weighting” toward students with certain experiences;

• elicit original writing rather than simple restatement of the topic; and

• elicit writing that will provide the basis for evaluation of the student’s ability to develop a central idea, synthesize concepts and ideas, present ideas cohesively and logically, and follow accepted practices of grammar, syntax, and mechanics.

The first major step in development was holding a prompt-writing workshop, which ACT has found to be an efficient method for producing high-quality prompts. The COMPASS workshop participants were experts in a variety of humanities disciplines who were able to produce sample prompts addressing all of the required specifications. Before the workshop, prompt writers had submitted professional vitae, signed confidentiality agreements, and received an ACT Prompt-Writing Guide. They were asked to develop topics before attending, following the prompt specifications in the guide as well as guidelines for the fair portrayal of various groups.

ACT Performance Assessment staff who are specialists in writing assessment conducted the workshop. The prompts generated during the workshop were revised as needed by ACT staff to ensure that all of them met the test specifications and to prepare them for field testing. Twice as many prompts as were needed were developed before reviews and field testing; this was to ensure the best possible quality for future operational prompts.

Over the course of development, ACT solicited reviews from individuals within five different focus groups: (1) African American, (2) Asian American, (3) Latino/Latina American, (4) Native American, and (5) women. In addition, writing experts reviewed the prompts for appropriateness and accessibility. ACT writing experts examined the feedback and comments from all of these individuals, incorporated the feedback into the prompts as appropriate, and prepared the prompts for field testing.

COMPASS e-Write range-finding and scoring

After collecting the field-test responses (essays), ACT staff reviewed them as part of a range-finding meeting. Along with external writing experts, the ACT staff selected responses that best represented the various score points of the COMPASS e-Write scale. The participating writing experts all held advanced degrees in writing and had experience teaching college-level writing courses and assessing student writing. The selected responses were used as part of the rater training process in preparation for scoring.

The raters selected for training in the scoring process had previous experience evaluating writing assessments and had a minimum of an undergraduate degree in English, education, or a related field. Raters received extensive training before starting the actual COMPASS e-Write scoring project, having been required first to demonstrate a predetermined level of accuracy. Raters were asked to report their overall impressions of the prompts and of the criteria they developed to score the responses. Raters’ comments about the efficacy of the prompts were used to select operational prompts. Each response was rated independently by two raters using the scoring features defined in the scoring guide. The holistic score assigned to each response was the sum of the two ratings. A third rater resolved any discrepancies of more than 1 point between the two raters.
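A minimal sketch of the double-rating flow described above: two independent ratings on the 4-point rubric are summed to form the holistic score, and a third rating is brought in when the first two differ by more than 1 point. How the third rating is combined with the original ratings is an assumption made here for illustration (the third rating is paired with the closer of the two originals); the manual states only that a third rater resolves the discrepancy.

```python
# Illustrative sketch: holistic scoring with third-rater adjudication.
# The adjudication rule below is an assumption for illustration only.
from typing import Optional

def holistic_score(r1: int, r2: int, r3: Optional[int] = None) -> int:
    """Sum two ratings on the 4-point rubric; adjudicate if they differ by > 1."""
    if abs(r1 - r2) <= 1:
        return r1 + r2
    if r3 is None:
        raise ValueError("Discrepancy greater than 1 point requires a third rating")
    # Assumed rule: keep the third rating plus whichever original rating is closer to it.
    closer = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    return closer + r3

print(holistic_score(3, 4))     # 7; ratings are adjacent, no adjudication needed
print(holistic_score(1, 4, 3))  # adjudicated under the assumed rule: 4 + 3 = 7
```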


COMPASS e-Write 2–8 prompt selection

Based on all quantitative and qualitative data, five prompts were originally recommended for inclusion in the COMPASS e-Write program. A sixth prompt was added in 2004, having gone through the same development cycle and selection process described in this section. Operational prompts were selected based on accessibility, range of scores for responses to each prompt, and raters’ observations about writers’ responses (e.g., the prompt’s success in eliciting reasoned, well-developed responses at the higher score levels).

Table (3)5.1 summarizes the technical results of the scoring of the six operational prompts by trained raters. In the table, "mean score" refers to the average of the sums of all pairs of raters for all responses to each prompt. "Standard deviation" refers to the spread of those scores around the mean. "Exact agreement" refers to exact matches in the scores assigned by the two raters, and "adjacent agreement" refers to ratings from the two raters that differ by no more than 1 point on the 4-point scale.

There was little variation in the mean scores or standard deviations for the distributions. The results provided in Table (3)5.1 indicate that the prompts were very similar in terms of overall performance and consistency of scoring. The means and standard deviations indicate that the six prompts are interchangeable and should be treated as equivalent forms of the assessment. The percent of exact agreement was also very consistent across prompts, ranging from 63% to 67%. These indices indicate that the raters were able to apply the scoring rubric in a very consistent manner.

TABLE (3)5.1 Descriptive statistics for prompt scores assigned by trained raters*

Prompt     Mean score   Standard deviation   Exact agreement (%)   Exact + adjacent agreement (%)
COMP101    5.75         .79                  66                    100
COMP102    5.76         .78                  65                    100
COMP103    5.74         .75                  66                    100
COMP104    5.73         .80                  67                    100
COMP105    5.72         .74                  65                    100
COMP109    5.69         .74                  63                    100

* Based on 4-point rubric
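As an illustration of how the indices reported in Table (3)5.1 (and in the later tables in this chapter) can be computed from paired ratings, the sketch below calculates the mean, standard deviation, and exact and adjacent agreement for a set of two-rater scores. The sample data are invented for demonstration and do not come from the COMPASS studies.

    import statistics

    # Invented example data: (rater 1, rater 2) ratings on the 4-point rubric.
    pairs = [(3, 3), (2, 3), (4, 4), (3, 4), (2, 2), (1, 2), (3, 2), (4, 3)]

    holistic = [r1 + r2 for r1, r2 in pairs]       # sum of the two ratings (2-8)
    mean_score = statistics.mean(holistic)
    std_dev = statistics.pstdev(holistic)

    exact = sum(1 for r1, r2 in pairs if r1 == r2)
    adjacent = sum(1 for r1, r2 in pairs if abs(r1 - r2) <= 1)

    print(f"Mean holistic score: {mean_score:.2f}")
    print(f"Standard deviation:  {std_dev:.2f}")
    print(f"Exact agreement:     {100 * exact / len(pairs):.0f}%")
    print(f"Exact + adjacent:    {100 * adjacent / len(pairs):.0f}%")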


COMPASS e-Write 2–12 prompt addition

To offer users broader choices in meeting testing needs, three COMPASS e-Write 2–12 prompts were initially added to COMPASS e-Write in June 2006, with three additional prompts deployed in March 2012. These COMPASS e-Write prompts are reported on a scale of 2–12, rather than on a scale of 2–8. The only difference between the two types of prompts is the number of score points on the scale used to score them—the original prompts use a 4-point scoring rubric; the additional COMPASS e-Write prompts use a 6-point rubric.

The 2–12 score scale prompts were developed following the same strict prompt-writing guidelines described previously for COMPASS e-Write 2–8 prompts. Furthermore, the same rigorous field-testing procedures used for COMPASS e-Write 2–8 prompts were used to identify suitable operational prompts to be scored on the 2–12 score scale.

COMPASS e-Write 2–12 prompt selection

Based on all available data (quantitative and qualitative), three COMPASS e-Write 2–12 prompts were originally integrated into COMPASS in 2006. Based on field-test studies, range-finding, and scoring conducted from 2006 to 2009, three more COMPASS e-Write 2–12 prompts were added in 2012. These operational prompts were selected based on accessibility, range of scores for responses to each prompt, and expert rater observations about writers' responses (e.g., the prompt's success in eliciting well-developed responses at the higher score levels).

Table (3)5.2 summarizes the technical results of the scoring of the six operational COMPASS e-Write 2–12 prompts by trained raters. In the table, "mean score" refers to the average of the sums of all pairs of raters for all responses to each prompt. "Standard deviation" refers to the spread of those scores around the mean. "Exact agreement" refers to exact matches in the scores assigned by the two raters, and "adjacent agreement" refers to ratings from the two raters that differ by no more than 1 point on the 6-point scale.

There was no significant variation in the mean scores or in the standard deviations for the distributions. The means and standard deviations outlined in Table (3)5.2 indicate that the six prompts are interchangeable and should be treated as equivalent forms of the assessment. The percent of exact agreement was also very consistent across prompts, ranging from 73% to 79%, indicating that raters were able to apply the scoring rubric in a very consistent manner.


TABLE (3)5.2 COMPASS e-Write 2–12 prompt holistic scores assigned by trained raters*

Prompt     Mean score   Standard deviation   Exact agreement (%)   Exact + adjacent agreement (%)
COMP110    6.76         1.84                 79                    100
COMP114    6.62         1.89                 79                    100
COMP115    6.75         1.69                 79                    100
COMP116    6.56         1.79                 74                    100
COMP118    6.85         1.69                 73                    100
COMP120    6.70         1.77                 76                    100

* Based on 6-point rubric

COMPASS e-Write automated scoring study

COMPASS e-Write uses the IntelliMetric automated scoring system developed by Vantage Learning Technologies. IntelliMetric is an artificial intelligence-based scoring system and must be "trained" (or calibrated) using sets of human rater-scored responses for each score point. These responses are used as a basis for the system to emulate human rater scoring. The IntelliMetric system internalizes the characteristics of the responses associated with each score point and applies this calibration to subsequent automated scoring. To verify the accuracy of IntelliMetric scoring for COMPASS, ACT and Vantage conducted COMPASS e-Write scoring studies focusing on the validity of the score dimensions. Studies show that IntelliMetric is an effective tool for scoring essay responses. The following sections describe these automated scoring studies.

Data source and preparation

The data used to develop the scoring models for the holistic scores were obtained from examinee responses to prompts accumulated from both COMPASS e-Write field tests and past operational administrations. A total of 12 COMPASS e-Write prompts were modeled. Approximately 300 responses total were used to calibrate and validate scores for each of the prompts. Two expert raters scored each response using 4-point rubrics for COMPASS e-Write 2–8 or 6-point rubrics for COMPASS e-Write 2–12. The average score across the two raters for each response was used as a basis for the automated scoring engine calibration.

For the development of the automated scoring models, approximately 300 responses to each of the prompts were used. Of these 300 responses, 50 were randomly drawn from the total dataset and set aside for use in validating the model. The remaining 250 responses were used to train the scoring system. Once calibration of the automated scoring model was completed, IntelliMetric scored the 50 validation essays "blind" (i.e., without any knowledge of the actual scores). Using the 50 unknown validation responses allowed ACT and Vantage to analyze automated scoring model results without artificially inflating results and contributing to false expectations for performance under operational conditions.
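The calibrate-then-validate protocol just described can be sketched as follows. The scoring engine itself is represented only by placeholder calls, since IntelliMetric's internals are proprietary; all names here are hypothetical.

    import random

    def split_for_validation(scored_responses, holdout=50, seed=42):
        """Randomly set aside a validation sample and return (train, validation).

        Mirrors the design described above: of roughly 300 human-scored
        responses per prompt, 50 are held out and the rest are used for
        calibration.
        """
        rng = random.Random(seed)
        shuffled = scored_responses[:]
        rng.shuffle(shuffled)
        return shuffled[holdout:], shuffled[:holdout]

    # Hypothetical usage, with each record being (essay_text, average_human_score):
    # train, validation = split_for_validation(scored_responses)
    # engine = calibrate_engine(train)                        # placeholder for engine training
    # blind_scores = [engine.score(text) for text, _ in validation]   # scored without the keys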

Agreement and correlation results

The frequency with which COMPASS e-Write system-generated scores agreed with scores assigned by expert raters was calculated to determine the extent to which automated scoring would yield scores similar to those assigned by human raters. In addition to exact and adjacent agreement rates, the Pearson correlation between system-assigned scores and scores assigned by expert raters was computed as a measure of the overall relationship between the two sets of scores. The Pearson correlation theoretically varies from –1 to +1. However, the value of this statistic observed under operational conditions is highly dependent on the variance in the data set: reduced variance will lead to a substantially lower estimate of the underlying correlation.
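The effect of restricted variance can be illustrated with a small simulation (not part of the ACT studies): two noisy measures of the same underlying writing ability are generated with a known correlation, and the Pearson correlation is recomputed after the sample is restricted to a narrow score range.

    import numpy as np

    rng = np.random.default_rng(0)
    true_scores = rng.normal(0, 1, 10_000)
    machine = true_scores + rng.normal(0, 0.5, 10_000)   # two noisy measures of
    human = true_scores + rng.normal(0, 0.5, 10_000)     # the same writing ability

    full_r = np.corrcoef(machine, human)[0, 1]

    # Restrict the sample to a narrow band of human scores (reduced variance).
    band = (human > -0.5) & (human < 0.5)
    restricted_r = np.corrcoef(machine[band], human[band])[0, 1]

    print(f"Correlation in the full sample:       {full_r:.2f}")        # about 0.8
    print(f"Correlation in the restricted sample: {restricted_r:.2f}")  # noticeably lower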

COMPASS e-Write 2–8 study results

Based on study results for COMPASS e-Write 2–8 (i.e., scored using a 4-point rubric), the rates of agreement between IntelliMetric scores and expert rater scores were within 1 point 100% of the time and exactly matched 66% to 88% of the time. The correlations of scores assigned by the COMPASS e-Write system and expert raters ranged from 0.67 to 0.83. These results are presented in Table (3)5.3. As a whole, the agreement rates meet or exceed the levels typically obtained by expert raters [see Table (3)5.1]. The Pearson correlation for the six 4-point prompts examined (0.67 to 0.83) is within an acceptable range and conforms to expectations for response scoring employing a 4-point rubric.

TABLE (3)5.3 Agreement and relationship between human and IntelliMetric scoring *

Prompt     Exact agreement (%)   Exact + adjacent agreement (%)   Correlation
COMP101    78                    100                              .73
COMP102    66                    100                              .69
COMP103    76                    100                              .67
COMP104    74                    100                              .67
COMP105    88                    100                              .83
COMP109    74                    100                              .75

* Based on 4-point rubric


COMPASS e-Write 2–12 study results

Based on study results for COMPASS e-Write 2–12 (i.e., scored using a 6-point rubric), the rates of agreement between IntelliMetric scores and expert rater scores were within 1 point 100% of the time and exactly matched 58% to 78% of the time. The correlations of scores assigned by IntelliMetric and expert raters ranged from 0.74 to 0.85.

These results are presented in Table (3)5.4. As is to be expected when comparing response scoring using a 4-point scale to scoring using a 6-point scale, the rate of agreement between system-generated scores and scores assigned by expert raters was slightly lower using the 6-point scale. However, the Pearson correlation for the 2–12 scale prompts (.74 to .85) is well within an acceptable range and conforms to expectations for response scoring employing a 6-point scale.

TABLE (3)5.4 Agreement and relationship between human and IntelliMetric scoring*

Prompt     Exact agreement (%)   Exact + adjacent agreement (%)   Correlation
COMP110    58                    100                              .74
COMP114    70                    100                              .85
COMP115    76                    100                              .75
COMP116    78                    100                              .81
COMP118    74                    100                              .83
COMP120    74                    100                              .81

* Based on 6-point rubric

Comparable scoring results

Overall, the results of COMPASS e-Write studies confirm Vantage research and verify that IntelliMetric is scoring responses at levels consistent with assessment industry standards. With an agreement of 100% on adjacent scores, COMPASS e-Write automated scoring achieves levels of performance at or above what is experienced with expert raters. In addition, ACT conducts routine quality monitoring by having human raters rescore samples of essays scored by the IntelliMetric scoring engine to ensure that the expected agreement is maintained.


COMPASS e-Write score scale concordance study

In 2004, ACT conducted a concordance study to provide users with comparisons of the two COMPASS e-Write score scales. For the purposes of this study, the same responses to the same prompt were scored by the same group of raters; the only variable in the study was the scoring rubric used. ACT Assessment Center staff scored 1,132 COMPASS e-Write responses using both the 2–8 scoring rubric and the 2–12 scoring rubric. Table (3)5.5 provides data comparisons for this concordance study. Figure (3)5.1 provides a graphic comparison of the two COMPASS e-Write score scales.

TABLE (3)5.5 COMPASS e-Write score scale comparisons

COMPASS e-Write 2–12                                 COMPASS e-Write 2–8
Score   % at or below     % within score    Score   % at or below     % within score
        (cumulative %)    category                  (cumulative %)    category
12          100.00             1.15           8         100.00             1.50
11           98.85             0.97
10           97.88             3.80           7          98.50             2.74
 9           94.08             3.80
 8           90.28            28.00           6          95.76            37.99
 7           62.28            10.69
 6           51.59            34.54           5          57.77            34.19
 5           17.05             4.77           4          23.5             16.87
 4           12.28             9.10           3           6.71             3.80
 3            3.18             0.71
 2            2.47             2.47           2           2.92             2.92


FIGURE (3)5.1 COMPASS e-Write score scale comparisons

[Stacked bar chart titled "COMPASS e-Write Special Study: Comparing 2–12 Scale to 2–8 Scale." The vertical axis shows the percentage of responses achieving each score (cumulative percent, 0% to 100%); one bar presents the 2–12 scale distribution and the other the 2–8 scale distribution, using the percentages reported in Table (3)5.5.]


The data from this study, represented in Figure (3)5.1, show the percentage of responses receiving specific scores on both score scales. The data represented within Table (3)5.5 provide general comparisons of the score-level frequency distributions on the two scales. For example, the frequencies of the scores of 11 and 12 on the COMPASS e-Write 2–12 score scale most closely align with the frequency of the score of 8 on the COMPASS e-Write 2–8 score scale. A score of 6 on the COMPASS e-Write 2–12 score scale most closely aligns with a score of 5 on the COMPASS e-Write 2–8 score scale. There is greater differentiation of scores 3–5 on the COMPASS e-Write 2–12 score scale; these scores tend to correspond to scores of 3 and 4 on the COMPASS e-Write 2–8 score scale.

This comparative information will be helpful in guiding COMPASS users' decisions as they transition to a new direct writing model. ACT recommends that as users adopt the new COMPASS e-Write 2–12 prompts and scoring model, they review institution-level results (e.g., frequency distributions) in conjunction with local cutoff scores and placement messages to ensure that these align with institution-level needs; cutoff scores and placement messages should be adjusted as deemed necessary.

Overview of ESL e-Write

In spring 2003, ACT began developing an online direct ESL writing assessment that could be incorporated into the COMPASS system to further assist campuses in their course placement process. Progress toward that goal included substantial research and development of online assessment software and intensive fieldwork with selected campuses in the fall of 2003 and the spring and fall of 2004. ACT introduced ESL e-Write, an ESL-specific direct writing assessment, in June 2006.

ESL e-Write prompt development

The development of the ESL e-Write prompts was guided by the same three principles applied to COMPASS e-Write:

1. There must be thorough and open participation by all relevant populations in the prompt development process.

2. The prompt development process must be carefully designed, technically sound, rigorously implemented, and appropriately validated.

3. The prompt development process must be comprehensible to all interested parties and easily implemented by the process participants.

These principles contribute to a development process that results in writing prompts that are accessible to all students and that address social and cultural issues within the general knowledge of ESL students. The prompts are free of bias toward students with certain experiences. They elicit original writing rather than simple restatement of the topic and elicit writing that will provide the basis for evaluation of the student's ability to develop a central idea, to synthesize concepts and ideas, to present ideas cohesively and logically, and to follow accepted practices of grammar, syntax, and mechanics.

The first major step in development was the prompt-writing workshop. Prompt writers were experts in English language acquisition, ESL teaching, and ESL writing who were able to produce prompts addressing all the specifications listed above. Prompt writers were asked to submit professional vitae and to sign confidentiality agreements.

ACT staff developed a Prompt-Writing Guide specific to the ESL requirements that was sent to prompt writers in advance of the workshop. Prompt writers were asked to develop prompt topics before attending the workshop. The guide included specific information about the writing sample and specifications for prompts. Guidelines were also included regarding the fair portrayal of various population groups.

ACT Performance Assessment staff who specialize in writing assessment conducted the prompt-writing workshop. The prompts generated through the workshop were revised as needed by ACT staff to ensure that all prompts met test specifications and to prepare the prompts for field testing. As a way to ensure the best possible quality for operational prompts in the future, twice as many prompts as needed were developed.

Over the course of prompt development, ACT solicited reviews from individuals representing five different focus groups: African Americans, Asian Americans, Latino/Latina Americans, Native Americans, and women. In addition, experts in ESL writing production reviewed the prompts for appropriateness and accessibility. ACT writing experts examined the feedback and comments from these individuals, incorporated this feedback into the prompts where appropriate, and prepared the prompts for field testing.

Approximately 100 students responded to each of the prompts during field testing. The prompts were spiraled for the field-test administrations, creating randomly equivalent groups of students. The field-test information was used to study the comparability of prompts and to select prompts for operational use.


ESL e-Write scoring and range-finding

After responses were collected from the field-test studies, they were reviewed as part of a range-finding meeting. ACT staff and writing experts worked together to select responses that best represented the various score points of the ESL e-Write scales. The writing experts held advanced degrees in ESL writing and had experience in teaching college-level writing courses and in assessing student writing. The selected responses were used as part of the rater-training process in preparation for scoring.

The raters selected for training in the scoring process had previous experience evaluating ESL writing assessments and had a minimum of an undergraduate degree in English, education, or a related field. Raters received extensive training prior to participation in the scoring project and were required to demonstrate a predetermined level of accuracy prior to scoring. Raters were asked to report their overall impressions of the prompts and of the criteria they developed to score the responses. Raters’ comments about the efficacy of the prompts and data resulting from the field test scoring were used to select operational prompts. Each response was rated independently by two raters on a scale of 1 to 6 in each of the five skill domains, using the scoring features defined in the scoring guide. A third rater resolved any discrepancies of more than 1 point between the two raters. Descriptions of the six score points in each domain are provided in Part 2, Chapter 5, “Direct writing assessment,” in this Reference Manual.

Selection of operational prompts for ESL e-Write

Based on all available data (quantitative and qualitative), three prompts were recommended for inclusion in the ESL e-Write program. Operational prompts were selected based on accessibility, range of scores for responses, and raters' observations about writers' responses (e.g., the prompt's success in eliciting reasoned, well-developed responses at the higher score levels).

Table (3)5.6 summarizes the technical results of the analytic scores assigned by trained raters for each domain for the three ESL prompts. Because the primary scoring model for the ESL prompts is based on the analytic scores, these are the categories represented within this table. In Table (3)5.6, "mean score" refers to the average of the sums of all pairs of raters for each analytic score for each domain across all three prompts. "Standard deviation" refers to the spread of those scores around the mean. "Exact agreement" refers to the average of exact matches in the scores assigned by the two raters, and "adjacent agreement" refers to the average of ratings from the two raters that differ by no more than 1 point on the 6-point analytic scale.

Although some variation occurred in the mean scores and the standard deviations for the analytic scores for the three prompts, this variation was not significant. The results provided in Table (3)5.6 indicate that the domain-level scoring for the ESL prompts was similar in terms of overall performance and consistency of scoring. The percent of exact agreement was consistent across analytic scores, ranging from 63% for Mechanics to 78% for Development. These indices indicate that the raters were able to apply the analytic scoring rubric in a consistent manner.

TABLE (3)5.6 Descriptive statistics for analytic scores assigned by trained raters

Domain         Mean score   Standard deviation   Exact agreement (%)   Exact + adjacent agreement (%)
Development    6.64         2.26                 78                    100
Focus          6.96         2.19                 64                    100
Organization   5.75         2.22                 68                    100
Language Use   6.65         2.07                 66                    100
Mechanics      7.14         2.08                 63                    100

ESL e-Write scoring system study

Scoring process overview

ESL e-Write uses the IntelliMetric scoring system developed by Vantage Learning Technologies (the same system that COMPASS e-Write uses). IntelliMetric emulates the process carried out by human raters by using an artificial intelligence-based scoring system that must be "trained" with a set of previously scored responses containing "known score" marker responses for each score point. These responses are used as a basis for the system to infer the rubric and the pooled judgments of the human raters. The IntelliMetric system internalizes the characteristics of the responses associated with each score point and applies this intelligence in subsequent scoring. A unique solution is created for each stimulus or prompt. This is conceptually similar to prompt-specific training for human raters. This allows IntelliMetric to achieve acceptable correlations with the scores of human raters and high matching percentages with scores awarded by human raters.

Previous studies have shown IntelliMetric to be an effective tool for scoring responses to various direct writing assessments. With this well established, models were produced for the three ESL prompts slated for operational use. The ESL e-Write study included modeling for the analytic scores assigned to each domain for each ESL prompt.

Data source and preparation

The data used to develop the analytic scoring models for ESL e-Write were obtained from responses to the prompts accumulated from pilot administrations. Five domains for three ESL e-Write prompts were modeled. Approximately 500 responses were used to calibrate ESL e-Write for each domain. Two expert raters scored each domain for each response on an analytic score scale ranging from 1 to 6. The average analytic score across the two raters for each domain within each response was used as a basis for the scoring system calibration.

The data for the analytic score calibrations were split into two sets. Five separate calibration models, one for each of the domain scores, were derived. The analyses were conducted "blind" to avoid the pitfall encountered in some scoring validation studies where the training and prediction are carried out on the same dataset: a failure to separate training and validation artificially inflates results and contributes to false expectations for performance under operational conditions. IntelliMetric predictions were therefore made without any knowledge of the actual scores; the scores for the validation responses were treated as unknown. IntelliMetric was trained on one set and then used to score the "unknown" responses in the other.

Analytic score scale study results

The rate of agreement of scores assigned by the automated ESL e-Write scoring system and those assigned by expert raters was examined. ESL e-Write analytic scores assigned by IntelliMetric agreed with scores assigned by the expert raters within 1 point nearly 100% of the time; exact agreement ranged from 61% for the Mechanics scale to 77% for Development. The correlation between IntelliMetric and expert raters ranged from 0.79 to 0.92.

These agreement rates and the Pearson correlations for the individual ESL domain scales are within acceptable ranges and conform to expectations for scoring. These results are presented in Table (3)5.7. The level of analytic score agreement and overall correlation for ESL e-Write based on IntelliMetric versus human scoring comparisons is equal to scoring study results for the COMPASS e-Write prompts.

TABLE (3)5.7 Summary data for analytic scales

Comparison between ACT raters and IntelliMetric scoring

Domain         Exact agreement (%)   Exact + adjacent agreement (%)   Correlation
Development    77                    100                              .92
Focus          66                     99                              .86
Organization   66                     99                              .86
Language Use   62                     97                              .79
Mechanics      61                     99                              .80

In the most recent audit of the accuracy of IntelliMetric scoring for ESL e-Write, a total of 250 operationally scored essays were selected for each of the three prompts, representing responses from January 2007 to January 2009. Table (3)5.8 compares the analytic scores of ACT expert raters to the scores assigned by IntelliMetric, averaged across the three prompts. These agreement rates and the Pearson correlations for the individual ESL e-Write domain scales are again verified to be within acceptable ranges and conform to expectations for scoring.

TABLE (3)5.8 Summary data for analytic scales: Spring 2009 audit

Comparison between ACT raters and ESL e-Write scoring

Domain         Exact agreement (%)   Exact + adjacent agreement (%)   Correlation
Development    70                    100                              .94
Focus          67                     98                              .92
Organization   65                     98                              .91
Language Use   58                     98                              .86
Mechanics      60                     98                              .88

Comparability of scoring approaches

The results of ACT's studies verify that IntelliMetric is scoring responses at levels consistent with assessment industry standards. With an exact plus adjacent agreement rate approaching 100%, ESL e-Write's computerized scoring achieves levels of agreement consistent with agreement statistics for two expert raters. ACT will continue to routinely monitor samples of student essays scored by the IntelliMetric scoring engine to provide ongoing verification of the agreement between system score results and expert rater scores. In the case of the new ESL e-Write prompts, trained ACT raters will be conducting additional human scoring for specific ranges of ESL e-Write responses (i.e., upper- and lower-range scores) to confirm the accuracy of assigned scores and further refine the ESL e-Write scoring system.

For students who are not prepared to enter their writing assessment online, ACT will continue to offer the traditional approach using paper-and-pencil assessment and scoring via two expert raters. The same scoring rubric is applied in both scoring approaches (the IntelliMetric model and the human raters), making the scores from both systems directly comparable for use by the institution in making ESL student placement decisions.


Chapter 6: Calibration of test items

Overview

Because tests in the COMPASS system are adaptively administered, few, if any, examinees will receive exactly the same items on a particular test. Therefore, items in the available item pool for each COMPASS test must be interpretable along some common metric. To meet this requirement, each of the COMPASS item pools is separately scaled using a three-parameter logistic item response theory (IRT) model (Lord, 1980). This scaling, known as item calibration, allows for the direct comparison of test performance of examinees even if their tests are composed of entirely different sets of items. Items for the COMPASS Mathematics, Reading, and Writing Skills tests and the ESL Reading, ESL Grammar/Usage, and ESL Listening components are calibrated separately because of their unique characteristics and requirements. The following sections describe the pretesting and calibration of COMPASS and ESL items.
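For reference, a common form of the three-parameter logistic model is

    P_j(\theta) = c_j + \frac{1 - c_j}{1 + \exp[-a_j(\theta - b_j)]}

where \theta is examinee proficiency, a_j is the item discrimination (slope), b_j is the item difficulty (location), and c_j is the lower asymptote (pseudo-guessing) parameter; these correspond to the IRT-A, IRT-B, and IRT-C values discussed later in this chapter. (Some presentations also include a scaling constant D of about 1.7 in the exponent.)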

General item calibration description

The general pretest item calibration process is similar for all COMPASS pools, although there are some differences between pools that use discrete multiple-choice items (e.g., any of the five math domains) and those pools that use item sets (e.g., Reading, which includes a passage and a set of associated items). Pretest items (i.e., items that don't "count" toward an examinee's score) are embedded in operational test sessions and are indistinguishable from operational items from the examinee's perspective, so that responses are not influenced by examinee motivation. For COMPASS pools that offer discrete items, a single pretest item is embedded in the test administration. For COMPASS pools that present sets of items, a single item set is embedded in the session.

The general approach is to target a minimum pretest sample size of at least 1,000 responses for each item so that sufficient examinee data are available. Item-level examinee response data are extracted from the COMPASS system. The item-level examinee extract is "cleaned" to remove any data that may prove problematic (e.g., sessions that were not completed).

ACT initially selects eligible items using BILOG. At the initial stage, BILOG is used to determine which items should be excluded from the fixed calibration since it provides more informative diagnostic tools than PARSCALE (e.g., item fit curves). The first BILOG calibration is conducted for all eligible items, which include both operational and pretest items with sufficient counts, to ensure that items are calibrated on the same scale.


The resulting IRT parameters are examined according to the following criteria:

• IRT-A parameter > 3.0

• IRT-B parameter < – 4.5 or > 4.5

• IRT-C parameter > 0.4

• Biserial correlation < 0.01

Using the initial BILOG calibration results and the criteria above, ACT identifies items that do not adhere to the established criteria. ACT reviews flagged items to determine the next steps, which can include excluding items due to their not meeting the criteria (e.g., IRT-A parameter too high, negative biserial correlation). In some cases, items with results that are on the border of the criteria above may be retained for a second BILOG calibration. After a second BILOG calibration is conducted, flagged items that do not adhere to the IRT-A, IRT-B, IRT-C, and biserial correlation criteria may be excluded from the fixed calibration.
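A minimal sketch of this screening step, applying the four flagging criteria to a table of calibration results, is shown below. The item identifiers and parameter values are hypothetical; the actual review is performed by ACT psychometric staff using BILOG output.

    def flag_item(a, b, c, biserial):
        """Return the reasons an item fails the screening criteria listed above."""
        flags = []
        if a > 3.0:
            flags.append("IRT-A parameter > 3.0")
        if b < -4.5 or b > 4.5:
            flags.append("IRT-B parameter outside [-4.5, 4.5]")
        if c > 0.4:
            flags.append("IRT-C parameter > 0.4")
        if biserial < 0.01:
            flags.append("biserial correlation < 0.01")
        return flags

    # Hypothetical calibration output: item id -> (a, b, c, biserial).
    calibration = {
        "MATH_0415": (1.12, -0.35, 0.18, 0.42),
        "MATH_0878": (3.40,  0.90, 0.22, 0.38),   # would be flagged on IRT-A
        "MATH_1203": (0.95,  1.40, 0.21, -0.05),  # would be flagged on biserial
    }
    for item_id, params in calibration.items():
        reasons = flag_item(*params)
        if reasons:
            print(item_id, "->", "; ".join(reasons))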

PARSCALE is used to perform fixed parameter calibration because it allows item parameters of the operational items to be fixed at their original values, while those of the pretest items are estimated. In addition, PARSCALE allows the prior parameters to be updated so that parameters of pretest items can be estimated more accurately than by methods used in other calibration programs (Kang & Petersen, 2009; Kim, 2006). The data requirements for fixed parameter calibration include the use of a control file with calibration specifications, a candidate response data file, a "not presented" file (for handling missing responses in the item matrix), and an item parameter file providing the fixed item parameters of operational items as input to the calibration. These files are used in an iterative process in which results are monitored for each PARSCALE fixed parameter calibration run (e.g., after adding "missing response" data).

Iterations of PARSCALE fixed parameter calibration results are closely monitored to identify convergence at the 0.01 criterion level. The iterative process, with the application of varying data characteristics (e.g., “missing response” data), allows for the examination of results based on permutations of possible response scenarios. In some cases, the introduction of data differences will result in similar outcomes. In other cases, results may show very different parameter estimates for individual items, indicating that these items have unstable estimates from different calibration runs and should, potentially, be excluded from use. Data summaries and graphic representations of results are used to pinpoint calibration results (and item-level flags) for each iteration for A-, B-, and C-IRT values. Once convergence occurs for the fixed parameter calibrations, all items included are reviewed for any results that fall outside the established criteria. This includes a review of both the estimated IRT parameters and the standard error of the estimated values. Those items where IRT-A, IRT-B, and IRT-C values are suspect are not eligible for operational use; these items may be refined and re-routed for pretesting.
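One way to operationalize the stability check described above is to compare parameter estimates for the same item across two calibration runs and flag large shifts. The tolerance and the item values below are invented for illustration only; they are not ACT's criteria.

    def unstable_items(run1, run2, tolerance=0.5):
        """Flag items whose a, b, or c estimates shift by more than `tolerance`
        between two calibration runs (each run maps item id -> (a, b, c))."""
        flagged = {}
        for item_id in run1.keys() & run2.keys():
            shifts = [abs(x - y) for x, y in zip(run1[item_id], run2[item_id])]
            if max(shifts) > tolerance:
                flagged[item_id] = shifts
        return flagged

    # Hypothetical estimates from two PARSCALE runs with different data handling.
    run_a = {"RD_101": (1.10, -0.20, 0.15), "RD_102": (0.85, 0.60, 0.22)}
    run_b = {"RD_101": (1.15, -0.25, 0.16), "RD_102": (1.70, 1.40, 0.30)}
    print(unstable_items(run_a, run_b))   # RD_102 shows unstable estimates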


Mathematics, Reading, and Writing Skills descriptions

COMPASS Mathematics

COMPASS Mathematics is composed of five distinct content domains: Numerical Skills/Prealgebra, Algebra, College Algebra, Geometry, and Trigonometry. The COMPASS Mathematics item pools consist of approximately 1,160 discrete operational test items. The Numerical Skills/Prealgebra and Algebra content domains are further subdivided into seven and eight diagnostic areas, respectively. These diagnostic areas are listed in Table (3)6.1.

TABLE (3)6.1 COMPASS Mathematics diagnostic areas

Numerical Skills/Prealgebra
• Basic operations with integers
• Basic operations with fractions
• Basic operations with decimals
• Positive integer exponents
• Ratios and proportions
• Percentages
• Averages (means, medians, and modes)

Algebra
• Substituting values into algebraic expressions
• Setting up equations for given situations
• Basic operations with polynomials
• Factoring polynomials
• Linear equations in one variable
• Exponents and radicals
• Rational expressions
• Linear equations in two variables

Each math domain and diagnostic area is considered a separate test and is independently administered and scored. Any or all of the five content domains or diagnostic tests may be presented to a given examinee. Additional information on the contents of the COMPASS Mathematics placement and diagnostic tests can be found in Part 2, Chapter 2, "Mathematics tests," in this Reference Manual.

Approximately 70% of the operational items in COMPASS math pools were originally obtained from the ACT Assessment Program (AAP) or were items developed for the ASSET program but never used. In April 2011, ACT added pretest items to COMPASS mathematics pools. Again, pretest data are gathered by administering the pretest items to COMPASS examinees in addition to their regular COMPASS mathematics placement tests. Once sufficient pretest item data were available, ACT extracted item-level response information for all items. Calibration and scale linking was done independently for each of the five content domains. Items with appropriate content and statistical characteristics were incorporated into the operational pools in June 2012. Additional COMPASS math items continue to be developed and pretested on an ongoing basis. As sufficient pretest item data become available, the pretest items will be calibrated, and items that perform well will be added to the operational pool.


COMPASS Reading

The Reading Placement Test pool consists of 71 passages classified into five types: natural sciences, social sciences, prose fiction, humanities, and practical reading. All passages were developed exclusively for use in COMPASS. As of June 2012, COMPASS Reading includes a total of 330 operational items.

When administered operationally, each passage is followed by about five reading comprehension items that ask examinees questions on what they have read. At least two of the reading comprehension items are classified as referring items, which can be answered by simple reference to information presented in the passage. Additionally, at least two items are classified as reasoning items because they require examinees to draw conclusions or make inferences beyond the information presented explicitly. For details on the development of Reading items, refer to Part 2, Chapter 1, “Reading tests” in this Reference Manual.

Initially, 34 passages were pretested by administering them to almost 8,000 examinees drawn from approximately 80 two- and four-year colleges. Each examinee was presented two passages and the associated items. Each passage was administered to roughly 500 examinees under a design that provided scaling links between all pretested passages. The pretest data were then used to scale items to a common metric by fitting a three-parameter logistic-response model with BILOG. For this pretesting, examinees were presented a total of eight items for each passage, and these item sets were trimmed to their operational sizes of about five items after reviewing the content and statistical properties of the items. The items that performed most satisfactorily remained in the sets.

In 2011, pretest item sets including approximately eight items each were added to COMPASS. These pretest item sets were administered to examinees at the rate of one set per test administration. Based on the data gathered, Reading pretest items were calibrated in 2011. In cases where items in a set did not perform well (or performed less satisfactorily than others), items were removed to reduce the number of items in a set to five. Based on this effort, ACT added 37 more operational item sets to the pool in June 2012. Additional Reading passages and items continue to be developed and pretested on an ongoing basis, with significant numbers of additional pretest items added in 2012. As sufficient pretest item data become available, the pretest items will be calibrated. Items that perform well will be added to the operational pool.


COMPASS Writing Skills

The Writing Skills Placement Test consists of 228 items associated with passages, and the format of this multiple-choice writing test is unique. Each item set contains two types of items: (1) passage segments that require an examinee to identify errors in writing and (2) summary items that focus on more global rhetorical skills. For the editing items, examinees are presented with a passage and are asked to read it while looking for problems in grammar, usage, and style. Upon finding an error, students can replace that portion of text with one of the answer options. Not all portions of the passage are editable, and not all segments that are editable contain errors. Examinees are advised of this format in the Writing Skills instructions before starting the test. Of the answer options, option "A" always reproduces the original text segment as it appears in the passage. If the segment selected by the examinee contains no error, the correct alternative is option "A." Allowing students to select and correct parts of the passage broadens the task from simple recognition of the most plausible alternative to a more generative error-identification exercise.
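A simplified data-structure sketch of the editing-item format just described follows. The passage segment, options, and key are invented; only the convention that option "A" reproduces the original segment, and is the key when the segment contains no error, comes from the description above.

    # One editable segment of a Writing Skills passage, represented as a dict.
    segment_item = {
        "original_text": "Each of the students have a laptop.",
        "options": {
            "A": "Each of the students have a laptop.",   # always the original segment
            "B": "Each of the students has a laptop.",
            "C": "Each of the students having a laptop.",
            "D": "Each of the student have a laptop.",
        },
        # The key is "A" only when the original segment contains no error;
        # this invented segment does contain an error, so the key is "B".
        "key": "B",
    }

    def is_correct(item, selected_option):
        return selected_option == item["key"]

    print(is_correct(segment_item, "B"))   # True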

In addition to the items that correspond to passage segments, the Writing Skills Placement Test has one or two multiple-choice rhetorical skills or strategy items that appear after the examinee is finished revising the passage. These items pose global questions related to the passage or portions of the passage. All Writing Skills passages were developed exclusively for COMPASS; for details, refer to Part 2, Chapter 3, “Writing Skills tests” in this Reference Manual.

Initially, Writing Skills passages were pretested by administering them to a total of almost 4,000 examinees drawn from nearly 80 two- and four-year colleges and universities. Every examinee was presented two randomly selected passages, resulting in about 400 examinees being tested on each passage. The data were used to scale items to a common metric with the BILOG program. In 2011–2012, Writing Skills item sets were recalibrated by ACT using both BILOG and PARSCALE to accommodate adjustments to sets (e.g., removal of flawed items identified as part of a DIF study); the adjusted item sets were deployed in June 2012. Additional Writing Skills Placement Test items continue to be developed and pretested on an ongoing basis, with significant numbers of additional pretest items added in 2012. As sufficient pretest item data become available, new items will be calibrated, and those that perform well will be moved to operational status.

ESL Reading, Grammar/Usage, and Listening

Item sets for all three modules of the ESL proficiency tests were developed exclusively for COMPASS. The content of the passages and/or graphics (or, in Listening, the aural stimuli) in all three modules reflects concrete, everyday experiences at the lower proficiency levels and more abstract, academic subjects at the higher levels. For the three ESL modules, short passages, graphics, or aural stimuli at the lowest proficiency levels have one or two associated items. Longer passages or aural stimuli at the higher levels have more associated items, with those at the highest level having five associated items. For a description of ESL item development, refer to Part 2, Chapter 4, "English as a Second Language tests," in this Reference Manual.

In the ESL Grammar/Usage module, item types fall into the two broad categories of Sentence Elements and Sentence Structure and Syntax. Sentence Elements items measure understanding of grammar and usage at the word or phrase level, while Sentence Structure and Syntax items measure understanding at the clause, sentence, paragraph, or passage level. At the lowest level, Sentence Elements items make up approximately 75% of the pool and Sentence Structure and Syntax items make up about 25%. The ratio gradually changes until, at the highest level, Sentence Elements items make up approximately 40% of the pool and Sentence Structure and Syntax items make up about 60%.

The ESL Reading module has two broad categories of items types, Referring and Reasoning. Referring items can be answered by reference to information presented in the passage, while Reasoning items require examinees to draw conclusions or make inferences beyond the information presented explicitly in the passage. At the lowest level, Referring items account for 75% of the item pool and Reasoning items the remaining 25%. This ratio gradually changes until, at the highest level, Referring items make up 25% of the item pool, and Reasoning items make up 75%.

In the ESL Listening module, items fall into two broad categories: Explicit and Implicit. Explicit items require examinees to recognize information mentioned explicitly in the aural stimulus, and Implicit items require examinees to draw conclusions or make inferences based on the stimulus. At the lowest level, Explicit items make up 80% of the item pool and Implicit items make up 20%. Because of the inherent difficulty of listening comprehension, the ratio of item types in the Listening module does not change significantly from low to high levels. At the highest level, Explicit items make up 70% of the item pool and Implicit items make up 30%.

A total of approximately 200 items in each module were pretested. Passages and items were pretested by administering them to examinees from 55 two- and four-year colleges and universities. The approximate numbers of different examinees tested were 2,400 for the Grammar/Usage module, 2,600 for the Reading module, and 2,300 for the Listening module. Each examinee was presented with items representing a wide range of English proficiency. For each module, from 6 to 8 passages with a total of 20 items were presented to each examinee. Each passage and its associated items were administered to 300 to 400 examinees under a design that provided scaling links between all items. The pretest data were then used to scale the items to a common metric by fitting a three-parameter logistic response model with BILOG.

Operational item pools were selected by reviewing content and statistical properties of items and selecting the sets of items that provided the best content balance and statistical characteristics. Additional items are being developed and pretested for the three multiple-choice ESL tests. As sufficient pretest data are collected, these items are calibrated, and items that perform well are added to the ESL pools.


Chapter 7: Adaptive testing

Overview

Adaptive testing is the selection, administration, and scoring of test items in a manner that matches the purposes of the test and the proficiency levels of the examinees. The goal is to most efficiently fulfill the purpose of the assessment (e.g., course placement) by administering the minimum number of test items needed to make an accurate decision. In adaptive testing, the items are selected during the process of testing, which means that individual examinees are not administered a fixed set or fixed number of items. Instead, each examinee will receive any combination of various item sets, different numbers of items, and varying orders of item presentation, all on the basis of his or her testing performance.

Proficiency estimation testing

The guiding philosophy of adaptive testing is to tailor the selection of test items to the goal of testing. One common testing approach is proficiency estimation, which seeks to obtain an accurate estimate of an examinee's proficiency in one or more content and skill areas and then to report the examinee's performance in terms of a particular score scale.

In the psychometric literature, a number of methods have been devised for achieving an accurate estimate of the examinee's proficiency level (score). Methods include the selection of items to maximize information at the most recent proficiency estimate (Weiss, 1982), as well as the selection of items to minimize the posterior variance in a Bayesian estimation model (Owen, 1975; Jensema, 1974). All of the procedures effectively match the difficulty of the test items to the proficiency level of the examinee. After each response, the examinee's proficiency estimate is updated, and the next test item to be administered is selected to match that estimate. Examinees with high levels of proficiency will generally receive more difficult items than will examinees with low levels of proficiency.

In COMPASS, the goal of obtaining an accurate estimate of each examinee's level of proficiency is met by selecting items to maximize the information at the examinee's most recent estimate of proficiency based on Owen's Bayesian estimation procedure (Owen, 1969). However, after the last item is administered, the examinee's proficiency is re-estimated using a maximum likelihood algorithm. This is done because of the better statistical properties of the maximum likelihood estimates of proficiency (Lord, 1980). All of the information functions and estimation procedures are based on the three-parameter logistic item response theory (IRT) model (Lord, 1980).


When a test is adapted to the proficiency level of each examinee, different examinees generally will receive different sets of test items; depending on the type of rule used for stopping the test, the examinees may receive different numbers of test questions as well. Despite the differences in test composition, scores from the tests are comparable because the test items have been previously calibrated and the calibrations linked so that all IRT item-parameter estimates are on the same scale.

For COMPASS, item-parameter estimates were obtained using marginal maximum likelihood as implemented in the BILOG (Zimowski, Muraki, Mislevy & Bock, 2003) and PARSCALE (Muraki & Bock, 2003) estimation programs. Because the item pools contain more items than could be conveniently administered to examinees in one sitting, the calibrations of separate item sets were linked. For details on item pool calibration and linking, please refer to Part 3, Chapter 6, “Calibration of test items,” in this Reference Manual.

Components of an adaptive testing system

An adaptive testing system requires a number of components to operate properly, including the following:

• an item pool that contains both items and IRT item-parameter estimates

• a procedure for selecting items from an item pool for administering to each examinee

• a method for estimating the level of performance for each examinee

• a procedure for determining when to stop administering test items

Figure (3)7.1 depicts the interrelationship of these components. The figure shows that when an adaptive test is started, a provisional performance level is set for the purpose of selecting the first item from the item pool. When that item is selected, administered, and scored, the examinee's proficiency is estimated, and on the basis of this estimate the next item is selected. The selection-administration-estimation cycle continues until an estimate with sufficient precision is reached or the preset maximum number of items is reached. Then the testing session ends, and the results are reported (i.e., printed or stored).


FIGURE (3)7.1 Flowchart for an adaptive test

[Flowchart: Start → Set initial performance estimate → Select appropriate item (drawing from the item pool) → Administer and score item → Estimate proficiency → Stop testing? If no, return to item selection; if yes, report results and end the testing session.]


Item pool

The item pool for an adaptive test consists of the text of the items and stimulus materials (e.g., reading passages, graphs, charts), the answer keys, and the estimates of item parameters used to obtain the estimates of proficiency. The item-parameter estimates that are stored with the item include the IRT-A, IRT-B, and IRT-C parameters from the three-parameter logistic model as obtained from BILOG (Zimowski, Muraki, Mislevy & Bock, 2003) and PARSCALE (Muraki & Bock, 2003). Parameter estimates must be accurate for an adaptive test to work well. Items must be available in the pool to match the capabilities of the examinees. Items also must be administered to large groups of examinees who are sufficiently varied in ability so that accurate parameter estimates are obtained. Such an administration will provide information for all features of an item characteristic curve (ICC), including its lower asymptote (C-parameter), point of inflection (B-parameter), and slope at the point of inflection (A-parameter). Typically, item calibration requires from 800 to 1,000 examinees who are broadly distributed over the proficiency range of interest; the ACT calibration sample target for pretest items is 1,000.

To match the capabilities of the examinees, an item pool must be sufficient in size and spread of difficulty to have a number of items that are appropriate for each examinee. When the goal of testing is to obtain accurate estimates of proficiencies, COMPASS requires about 200 items that cover a range of difficulty appropriate for the expected examinee population.

Item selection rules

Different criteria can be used to select items for adaptive administrations, such as:

• matching the item difficulty to the ability estimate

• selecting the item that reduces the posterior variance of the ability estimate the most, thereby providing the most information at the most recent ability estimate

The latter method, which has been almost universally adopted, is the method used by COMPASS. As shown in Figure (3)7.1, the adaptive test starts with an initial proficiency estimate (i.e., an initial theta value). If proficiency estimation in the form of a numerical score is the goal of the assessment, then the initial proficiency estimate is based on either a central (default) value or a self-selected value. Starting at the initial estimate, items are selected that provide maximum information at the latest estimate, contingent on an exposure-control feature that protects against overuse of items.
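As background for the maximum-information selection rule just described, the standard item information function for the three-parameter logistic model can be written as

    I_j(\theta) = a_j^2 \, \frac{1 - P_j(\theta)}{P_j(\theta)} \left[\frac{P_j(\theta) - c_j}{1 - c_j}\right]^2

where P_j(\theta) is the three-parameter logistic probability of a correct response (see Chapter 6). Under this rule, the next item administered is the not-yet-used item with the largest I_j at the examinee's most recent proficiency estimate, subject to the exposure-control feature noted above.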

When using passage-related sets of items for an assessment, as is done in COMPASS Reading and Writing Skills, the entire set is selected to provide maximum information at the current proficiency estimate. All items related to a passage are administered so that an examinee’s time spent studying the passage will be used efficiently.


Test score estimation

An adaptive test is scored following the response to each test item. For proficiency estimation, the score estimates are updated after each item is scored using Owen's Bayesian estimation procedure (Owen, 1969). After the last item is administered, proficiency is re-estimated using a maximum likelihood procedure.

Stopping rules

An adaptive test can stop in any of several ways. Typically, the adaptive test ceases to administer items and generates a score report when the desired level of accuracy has been reached or a preset maximum number of items has been administered. COMPASS offers users a choice of test lengths for each test. Each length option is associated with an expected reliability estimate, with longer tests generally having higher reliability estimates. Thus, users can choose to use shorter tests with slightly lower levels of reliability or tests that administer a few more items to achieve a somewhat higher estimated reliability. In all cases, if the desired accuracy is reached, the test will stop even if the maximum-allowed number of items has not been administered. However, the software will always administer at least the minimum number of items specified. Please refer to Part 3, Chapter 1, "Technical characteristics of COMPASS tests," in this Reference Manual for details on the three test-length options and expected reliabilities for all COMPASS tests.
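The components above (item pool, maximum-information selection, interim estimation, and a stopping rule) can be tied together in a compact sketch. This is a generic illustration of the adaptive cycle, not ACT's implementation: it substitutes a simple expected a posteriori (EAP) update on a grid for Owen's Bayesian procedure, and the item pool, thresholds, and test lengths are invented.

    import math
    import random

    def p3pl(theta, a, b, c):
        """Three-parameter logistic probability of a correct response."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def information(theta, a, b, c):
        """Fisher information of a 3PL item at proficiency theta."""
        p = p3pl(theta, a, b, c)
        return a * a * (1.0 - p) / p * ((p - c) / (1.0 - c)) ** 2

    def eap_update(responses, grid):
        """Expected a posteriori estimate and posterior SD over a theta grid.

        `responses` is a list of (item, score) pairs, where item = (a, b, c)
        and score is 1 (correct) or 0 (incorrect). A standard normal prior is
        assumed here purely for illustration.
        """
        posterior = [math.exp(-0.5 * t * t) for t in grid]          # prior weights
        for (a, b, c), score in responses:
            for i, t in enumerate(grid):
                p = p3pl(t, a, b, c)
                posterior[i] *= p if score else (1.0 - p)
        total = sum(posterior)
        mean = sum(t * w for t, w in zip(grid, posterior)) / total
        var = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior)) / total
        return mean, math.sqrt(var)

    def adaptive_test(pool, answer_model, min_items=8, max_items=25, target_sem=0.30):
        """Administer items from `pool` until the precision target or max length is hit."""
        grid = [i / 10.0 for i in range(-40, 41)]                   # theta grid from -4 to 4
        theta, sem, administered, responses = 0.0, float("inf"), set(), []
        while len(responses) < max_items:
            remaining = [i for i in range(len(pool)) if i not in administered]
            if not remaining:
                break
            nxt = max(remaining, key=lambda i: information(theta, *pool[i]))
            administered.add(nxt)
            score = answer_model(theta, pool[nxt])                  # simulated examinee response
            responses.append((pool[nxt], score))
            theta, sem = eap_update(responses, grid)
            if len(responses) >= min_items and sem <= target_sem:
                break
        return theta, sem, len(responses)

    # Demonstration with an invented 60-item pool and a simulated examinee.
    rng = random.Random(7)
    pool = [(rng.uniform(0.8, 2.0), rng.uniform(-3, 3), rng.uniform(0.1, 0.25))
            for _ in range(60)]
    true_theta = 1.2
    simulate = lambda _est, item: 1 if rng.random() < p3pl(true_theta, *item) else 0
    print(adaptive_test(pool, simulate))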


References

American Psychological Association. (1999). Standards for educational and psychological testing. Washington, DC: The Committee to Develop Standards of the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education.

Ascher, C. (1990). Assessing bilingual students for placement and instruction (Report No. 65). ERIC Clearinghouse on Urban Education, May 1990. (EDO-UD-90-5)

Bachman, L. F. (1990). Fundamental considerations in language testing. New York: Oxford University Press.

Bachman, L. F. (1991). What does language testing have to offer? TESOL Quarterly, 25, 671–704.

Buck, K., Byrnes, A., & Thompson, I. (1989, February). The ACTFL oral proficiency tester training manual. Yonkers, NY: ACTFL.

Byrnes, H., & Canale, M. (Eds.). (1987). Defining and developing proficiency: Guidelines, implementations and concepts. Lincolnwood, IL: National Textbook.

Canale, M. (1983). From communicative competence to communicative language pedagogy. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 2–25). New York: Longman.

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1–47.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

College Standards and Accreditation Council. (1993). CSAC Pilot Project Phase IIA, English as a second language: Final report. Toronto: Ontario CSAC ESL Project.

Dale, E., & O’Rourke, J. (1981). The living word vocabulary. Chicago: World Book-Childcraft International.

Dulay, H., Burt, M., & Krashen, S. (1982). Language two. New York: Oxford University Press.

ESL Intersegmental Project. (1996). California pathways: The second language student in public high schools, colleges, and universities. Sacramento: Intersegmental Council of Academic Senates in conjunction with the California Community Colleges Chancellor’s Office.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.

Hulin, C. L., Drasgow, F., & Parson, C. K. (1983). Item response theory: Applications to psychological measurement. Homewood, IL: Dow Jones-Irwin.

Jensema, C. J. (1974). The validity of Bayesian tailored testing. Educational and Psychological Measurement, 34, 757–766.

Kang, T., & Petersen, N. (2009). Linking item parameters to a base scale (ACT Research Report No. 2009-2). Iowa City, IA: ACT.

Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355–381.

Krashen, S. D., & Terrell, T. D. (1983). The natural approach: Language acquisition in the classroom. Hayward, CA: The Alemany Press.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.

Miller, S. (1988). Report on the testing of ESL students. Iowa City, IA: American College Testing.

Muraki, E., & Bock, R. D. (2003). PARSCALE 4 for Windows: IRT based test scoring and item analysis for graded items and rating scales [Computer software]. Skokie, IL: Scientific Software International, Inc.

National Working Group on Accreditation (Canada). (1993). Language benchmarks: English as a second language for adults. Citizenship and Immigration Canada.

Owen, R. J. (1969). A Bayesian approach to tailored testing (ETS Research Bulletin RB-69-92). Princeton, NJ: Educational Testing Service.

Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351–356.

Rea, P. M. (1985). Language testing and the communicative language teaching curriculum. In Y. P. Lee et al. (Eds.), New directions in language testing (pp. 15–32). Oxford: Pergamon Press.

Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237–257). New York: Academic Press.

Sawyer, R. (1989). Validating the use of ACT Assessment scores and high school grades for remedial course placement in college (ACT Research Report Series No. 89-4). Iowa City, IA: American College Testing.

Scarcella, R., Anderson, E. K., & Krashen, S. (1990). Developing communicative competence in a second language. Boston: Heinle & Heinle.

Shepard, L. A. (1984). Setting performance standards. In R. L. Berk (Ed.), A guide to criterion-referenced test construction (pp. 169–198). Baltimore: Johns Hopkins University Press.

Spray, J. A. (1993). Multiple category classification using a sequential probability ratio test (ACT Research Report Series No. 93-7). Iowa City, IA: American College Testing.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

Swain, M. (1985). Large-scale communicative language testing: A case study. In Y. P. Lee et al. (Eds.), New directions in language testing (pp. 35–46). Oxford: Pergamon Press.

Sympson, J. B. (1993, October). A procedure for linear polychotomous scoring of test items (TN-94-2). San Diego: Navy Personnel Research and Development Center.

Teachers of English to Speakers of Other Languages. (1996, March). ESL standards for Pre-K–12 students. Unpublished manuscript.

Wald, A. (1947). Sequential analysis. New York: Dover.

Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.

Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG 3 for Windows: Multiple-group IRT analysis and test maintenance for binary items [Computer software]. Skokie, IL: Scientific Software International, Inc.

Zwick, R., Thayer, D. T., & Wingersky, M. (1993). A simulation study of methods for assessing differential item functioning in computer adaptive tests (ETS Research Report No. RR 93-11). Princeton, NJ: Educational Testing Service.
