
Partnership for Accessible Reading Assessment

Item Characteristics, Student Characteristics, and Segmented Text

Ross Moen

December 7, 2007

NARAP GAC

Partnership for Accessible Reading Assessment (PARA): A collaboration between the University of Minnesota’s National Center on Educational Outcomes and Department of Curriculum & Instruction; CRESST, University of California, Davis; and Westat

www.readingassessment.info


Context for Current Studies

• Working Assumptions:

– Exploring options; we don’t already have the answers

– Seeking universal solutions; minimize accommodations

• Prior Studies

– Consult with reading experts (jointly with DARA) on the construct: Definition panel and focus groups leading to Principles and Guidelines Report

– Review literature on disabilities’ relation to reading: Disabilities Reports

– Examine test materials: Test Specifications Report

– Analyze test data: DIF/DDF for Pre-NCLB NRTs


Item Characteristics: Methods

• Data differed from previous item analyses:

– Instead of pre-NCLB NRTs, obtained test data from 3 states’ post-NCLB criterion-referenced reading tests

– Distinguished students with different kinds of disabilities


Item Characteristics: Results

• CRTs lacked NRTs’ end-of-test DIF/DDF increase

• Results varied by state and by type of disability

– Number of groups and items affected varied by state

– Which groups were affected varied by state

• DIF/DDF need not indicate bias against students with disabilities

– Low performing students without disabilities sometimes were more seduced by false foils

– Can be seen by examining response plots

– Leads to questions other than test bias


Results Varied

Disability                 State 1   State 2   State 3
SLD                        2         -         8
SL/I                       -         4         0
EMR                        11        -         -
EBD                        7         3         0
OHD                        2         -         -
Perceptual/Communication   -         0         -
Physical                   -         4         -


Foil “A” draws students without LD

[Figure: response plot of the probability (0 to 1) of choosing each option against Z score (-3 to 3). A0 through D0 = students without disabilities; A1 through D1 = students with LD; the correct response is C.]


Foil “A” again

[Figure: response plot of the probability (0 to 1) of choosing each option against Z score (-3 to 3). A0 through D0 = students without disabilities; A1 through D1 = students with LD; the correct response is B.]


Foil “C” then “A”

[Figure: response plot of the probability (0 to 1) of choosing each option against Z score (-3 to 3). A0 through D0 = students without disabilities; A1 through D1 = students with LD; the correct response is B.]


Item Characteristics Question

• How does the test behavior of students with a particular disability differ from that of other students?

– In one state, DIF/DDF was found only for students with learning disability (LD)

– Those students show a different test score distribution.


Score Distribution of Grade 3 Students Without Disability

[Figure: percent of students (0% to 9%) at each raw score (0 to 45); series: No Disability.]


Score Distribution of Grade 3 Students With Speech/Language

[Figure: percent of students (0% to 9%) at each raw score (0 to 45); series: SP.]


Score Distribution of Grade 3 Students With Emotional/Behavioral

[Figure: percent of students (0% to 9%) at each raw score (0 to 45); series: EBD.]


Score Distribution of Grade 3 Students With Learning Disability

[Figure: percent of students (0% to 9%) at each raw score (0 to 45); series: LD.]


Item Characteristics Question

• What are the implications of these findings?

– For designing accessible reading assessments

– For understanding students with disabilities


Student Characteristics of Less Accurately Measured Students (LAMS)

Who needs Accessible Reading Assessment?

• LAMS: Less Accurately Measured Students

• MAMS: More Accurately Measured Students


How Can We Identify LAMS?

Compare test results with (what?) other information:

• Match: MAMS

• Mismatch: LAMS


Compare Tests with Teacher Judgment?



LAMS Study Goals

• How well can teachers identify LAMS?

– Do they say they can?

– Can they distinguish reasons for LAMS?

– Can they provide supporting evidence?

– Do brief supplemental examinations match teacher judgments?

• What can we learn from teachers’ LAMS?

– What do they say they need or want?

– What do we observe in assessment situations?


LAMS Study Procedures

• Teachers completed questionnaire

– Provided four reasons; sought open-ended responses

– Stable questionnaire design over 2 phases

– 21 teachers at 10 sites completed 77 questionnaires

• Researchers met with teachers

– Structured interview & examine supporting evidence

– Phase 2 had 7 teachers at 5 sites

• Researchers met with students

– Structured interview and differentiated assessment

– Phase 2 had 17 students at 5 sites


Questionnaire: Reasons for Identifying Students as LAMS

Reason                                                  Count*   Percentage*
Fluency Limitations Obscure Comprehension Skills        32       41.6%
Some Comprehension Limitations Obscure other Skills     22       28.6%
Test Fails to Reveal Non-Tested Strengths               18       23.4%
Responds Poorly to Testing Circumstances or Materials   31       40.3%
Other                                                   5        6.5%

* Note: duplicate counts on 77 students sum to a total count of 108 and a total percentage of 140%
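The footnote's totals follow from each percentage being taken against the 77 students rather than against the 108 selections, so a student with several reasons is counted once per reason. A minimal sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative, not from the study):

```python
# Counts copied from the questionnaire table; 77 students were rated.
N_STUDENTS = 77
reason_counts = {
    "Fluency limitations obscure comprehension skills": 32,
    "Some comprehension limitations obscure other skills": 22,
    "Test fails to reveal non-tested strengths": 18,
    "Responds poorly to testing circumstances or materials": 31,
    "Other": 5,
}


def percent_of_students(count, n_students=N_STUDENTS):
    """Share of students for whom a reason was selected, in percent."""
    return 100.0 * count / n_students


percentages = {r: percent_of_students(c) for r, c in reason_counts.items()}
total_count = sum(reason_counts.values())
total_percentage = sum(percentages.values())

for reason, pct in percentages.items():
    print(f"{reason}: {pct:.1f}%")
print(f"Total count: {total_count}; total percentage: {total_percentage:.0f}%")
```

Because more than one reason could be given per student, the percentages are not expected to sum to 100%.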


Teacher Interview: Hindrances to Student Performance

Hindrance                              Hardly At All  A Little  Some   Quite a Bit  A Lot  Blank  Mean
Fluency limitations                    3              0         4      6            4      <0>    3.47
                                       17.6%          0.0%      23.5%  35.3%        23.5%  0.0%
Comprehension limitations              0              1         5      7            4      <0>    3.82
                                       0.0%           5.9%      29.4%  41.2%        23.5%  0.0%
Low motivation for the test            7              1         4      1            4      <0>    2.65
                                       41.2%          5.9%      23.5%  5.9%         23.5%  0.0%
Keeping attention focused on the test  3              5         5      2            2      <0>    2.71
                                       17.6%          29.4%     29.4%  11.8%        11.8%  0.0%
Getting worn out by the test           5              4         2      3            3      <0>    2.71
                                       29.4%          23.5%     11.8%  17.6%        17.6%  0.0%
Anxiety                                5              3         6      0            2      <1>    2.44
                                       29.4%          17.6%     35.3%  0.0%         11.8%  5.9%
Other:                                 0              1         0      2            7      <7>    4.50
                                       0.0%           5.9%      0.0%   11.8%        41.2%  41.2%
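The reported means can be reproduced from the counts if the five response categories are scored 1 ("Hardly At All") through 5 ("A Lot") and blank responses are left out of the denominator. That scoring is an assumption, since the slides do not state it, but it matches the figures in the table. A minimal sketch:

```python
def likert_mean(counts):
    """Mean rating from counts per category (scored 1..5), ignoring blanks.

    `counts` lists the number of responses for "Hardly At All" through
    "A Lot"; blank responses are simply not included in the list.
    """
    answered = sum(counts)
    if answered == 0:
        raise ValueError("no non-blank responses")
    weighted = sum(score * n for score, n in enumerate(counts, start=1))
    return weighted / answered


# Counts copied from the teacher interview table.
fluency = [3, 0, 4, 6, 4]  # 17 responses, 0 blank
anxiety = [5, 3, 6, 0, 2]  # 16 responses, 1 blank
other = [0, 1, 0, 2, 7]    # 10 responses, 7 blank

for label, counts in [("Fluency", fluency), ("Anxiety", anxiety), ("Other", other)]:
    print(f"{label}: {likert_mean(counts):.2f}")
```

Note that a row with many blanks, such as "Other", rests on far fewer responses than its mean alone suggests.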


Student Interview: Attitudes Toward Reading and Tests

Question                                        Hardly At All  A Little  Some   Quite a Bit  A Lot  Blank  Mean
How much do you read not for school?            1              4         7      1            3      <1>    3.06
                                                5.9%           23.5%     41.2%  5.9%         17.6%  5.9%
How much do you like reading?                   0              0         9      4            3      <1>    3.63
                                                0.0%           0.0%      52.9%  23.5%        17.6%  5.9%
How hard is reading for you?                    3              2         7      4            0      <1>    2.75
                                                17.6%          11.8%     41.2%  23.5%        0.0%   5.9%
How well do tests show your reading?            0              1         6      5            2      <3>    3.57
                                                0.0%           5.9%      35.3%  29.4%        11.8%  17.6%


Student Interview: Ways to Improve Test Performance

Suggestion                                      Hardly At All  A Little  Some   Quite a Bit  A Lot  Blank  Mean
Shorter reading passages                        0              2         4      7            1      <3>    3.50
                                                0.0%           11.8%     23.5%  41.2%        5.9%   17.6%
More interesting passages                       0              3         1      4            6      <3>    3.93
                                                0.0%           17.6%     5.9%   23.5%        35.3%  17.6%
Computer instead of paper and pencil            2              1         2      4            4      <4>    3.54
                                                11.8%          5.9%      11.8%  23.5%        23.5%  23.5%
Entire test read aloud by CD etc                1              1         7      2            3      <3>    3.36
                                                5.9%           5.9%      41.2%  11.8%        17.6%  17.6%
Computer pronounces or explains words you pick  0              0         1      6            7      <3>    4.43
                                                0.0%           0.0%      5.9%   35.3%        41.2%  17.6%
Other ideas you have                            0              1         0      1            5      <10>   4.43
                                                0.0%           5.9%      0.0%   5.9%         29.4%  58.8%


Qualitative Analysis (tentative): Teachers’ LAMS confirmed?

• Clear Bulls Eye: Consensus between researchers & teacher (n = 8)

• Seems Close: Weak confirmation (n = 4)

• Seems Close: Differ on why LAMS (n = 3)

• Borderline: Questionable (n = 2)

• Off Target: No evidence that student is LAM (n = 3)

[Figure: target diagram placing each student in one of the categories above. Student labels shown: Jackie, Matt, Jimmy, Ike, Al, Beth, Joan, Marie, Betty, Stanley, Spock, Zorro, Rose, Frank, Bruce, Mac, Henry, Mike, Karen, Jane.]


Segmenting Study

• Segmented Text related to “Chunking” Literature

– Reading is chunked into meaningful units to aid readers with working memory capacity constraints

– The literature refers to chunking of sentences

– Our “segmented text” refers to grouping passage segments with their corresponding items on the test page.

• Segmented text may reduce the need for accommodations by providing “built-in” test breaks


Segmenting: Participants

• 737 Grade 8 students from ten public schools in California

• 620 Students without disabilities

• 117 Students with disabilities:

– 107 specific learning disabilities

– 2 deaf/hard of hearing

– 3 autistic

– 2 speech/language impairment

– 4 other health impairments


Segmenting: Reading Test

• Three reading comprehension passages were obtained from publicly-released tests from two states outside of California.

• Two versions of the test were created: Original (version A) and Segmented (version B)

• Test designed to be completed in one classroom period (approx. 50 min.)


Segmenting: Passages

• All passages were informational.

• First passage was 700 words, other two passages were about 550 words each.

• Each passage had 8 multiple-choice items with 4 possible answer choices (24 total test items).


Segmenting: Adjustments

• Segments were grouped with corresponding test items

• Each passage was broken down into 3 to 4 segments; each segment contained 1-3 questions

• Inferential questions appeared at the end

• Test items appeared in the same order in both versions


Segmenting: Emotion/Mood Inventory

Asked students after each passage:

How does taking the test make you feel? Please circle all the words that describe how you feel. There is no right or wrong answer. If none of these words describe how you feel, please circle NONE.

good          tired
energetic     upset
bored         confident
frustrated    okay
happy         stressed
blanked out   interested
relaxed       bad
NONE


Segmenting: Motivation Scale

• Post-test (printed at the end of the test booklets)

• 10-item, 4-point Likert-type, combining “importance” and “effort” questions


Segmenting: Performance

• No significant differences in reading performance of either group due to segmenting

Groups            Mean    SD     n
SD/Original       9.94    3.32   52
SD/Segment        9.32    4.05   57
Non-SD/Original   13.89   4.58   301
Non-SD/Segment    13.88   4.67   292
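The slides do not say which significance test was used. As a rough check under that caveat, Welch's t statistic can be computed directly from the summary statistics in the table; choosing Welch's test here is an assumption made for illustration, not the study's documented method. In the group labels "SD" means students with disabilities, while the SD column is the standard deviation.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from two groups' summary statistics."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Original vs. segmented version, values copied from the table.
t_swd, df_swd = welch_t(9.94, 3.32, 52, 9.32, 4.05, 57)
t_non, df_non = welch_t(13.89, 4.58, 301, 13.88, 4.67, 292)

print(f"Students with disabilities: t = {t_swd:.2f}, df = {df_swd:.0f}")
print(f"Students without disabilities: t = {t_non:.2f}, df = {df_non:.0f}")
```

Both statistics fall well below the conventional two-sided 5% critical value of about 1.96, which is consistent with the slide's statement of no significant difference.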


Segmenting: Reliability Findings

• Unsegmented showed more reliability for students without disabilities (“Non-SD”)

• This reliability gap decreased on the segmented version (no longer significant).

• This suggests the segmented version may be more accessible for SD students

• (Caution: How much of this is attributable to standard deviation differences?)

Reliability limits validity, because r_xy < √(r_xx′) (Allen & Yen, p. 113)

Groups                     Reliability   Validity
SD/Original (n=53)         0.516         0.718
SD/Segment (n=62)          0.689         0.830
Non-SD/Original (n=312)    0.783         0.884
Non-SD/Segment (n=305)     0.788         0.888
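The Validity column appears to be the ceiling implied by the formula, that is, the square root of each reliability estimate, rather than an independently measured validity coefficient. This reading is inferred from the numbers, not stated on the slide. A short sketch that recomputes the ceiling (names are illustrative):

```python
import math

# Reliability estimates copied from the table.
reliability = {
    "SD/Original": 0.516,
    "SD/Segment": 0.689,
    "Non-SD/Original": 0.783,
    "Non-SD/Segment": 0.788,
}


def validity_ceiling(r_xx):
    """Largest validity coefficient compatible with reliability r_xx."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return math.sqrt(r_xx)


ceilings = {group: validity_ceiling(r) for group, r in reliability.items()}
for group, value in ceilings.items():
    print(f"{group}: {value:.3f}")
```

The recomputed values agree with the table to within rounding in the third decimal place, presumably because the slide's reliabilities are themselves rounded.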


Segmenting: Motivation Results

Summary of descriptive analyses for the motivation section

No significant differences

Group                                  Mean    SD     n
Students with disabilities, original   22.21   3.65   53
Students with disabilities, segmented  22.83   3.44   60
Students with disabilities, total      22.54   3.54   113
Non-disabled, original                 21.36   5.07   313
Non-disabled, segmented                22.16   4.23   296
Non-disabled, total                    21.75   4.69   609
Original version, total                21.48   4.89   366
Segmented version, total               22.27   4.12   356
Total                                  21.87   4.54   722
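The "total" rows are consistent with size-weighted averages of the four cell means. The sketch below recombines the cells from the table; because the inputs are already rounded to two decimals, agreement is only expected to rounding. Function and variable names are illustrative.

```python
def combine_means(groups):
    """Combine (mean, n) pairs into an overall mean weighted by group size."""
    total_n = sum(n for _, n in groups)
    weighted_sum = sum(mean * n for mean, n in groups)
    return weighted_sum / total_n, total_n


# (mean, n) for each cell, copied from the motivation table.
swd_original = (22.21, 53)
swd_segmented = (22.83, 60)
non_original = (21.36, 313)
non_segmented = (22.16, 296)

swd_total = combine_means([swd_original, swd_segmented])
non_total = combine_means([non_original, non_segmented])
original_total = combine_means([swd_original, non_original])
segmented_total = combine_means([swd_segmented, non_segmented])
overall = combine_means([swd_original, swd_segmented, non_original, non_segmented])

for label, (mean, n) in [
    ("Students with disabilities, total", swd_total),
    ("Non-disabled, total", non_total),
    ("Original version, total", original_total),
    ("Segmented version, total", segmented_total),
    ("Total", overall),
]:
    print(f"{label}: mean = {mean:.2f}, n = {n}")
```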


Questions

• Would segmenting have greater impact if the test were longer than 50 minutes?

• Would segmenting have greater impact for students with disabilities that involve working memory capacity issues?


Partnership for Accessible Reading Assessment (PARA): Calibration & Motivation Studies

Presentation to the General Advisory Committee, December 7, 2007

Deborah Dillon & David O’Brien

University of Minnesota

Partnership for Accessible Reading Assessment (PARA): A collaboration between the University of Minnesota’s National Center on Educational Outcomes and the Department of Curriculum & Instruction; CRESST, University of California, Davis; and Westat

www.readingassessment.info


Calibration Study

The purpose of the study is to scale or calibrate the measurement tools that will be used in a large-scale accessible reading assessment for students with disabilities. This process allows investigators to empirically determine the comparability of passages and items used in the reading assessment study by placing all passages and questions on a common, equal-interval measurement scale based on item response theory (IRT).


Research Questions

1. What is the difficulty of each reading passage (based on a passage total score, which, in turn, is based on performance on all passage comprehension items/questions) and each comprehension item/question?

2. How well can the reading passages be placed on a common interval measurement scale to allow scores from different passages (of equal or unequal difficulty) to be compared and equated?

3. Based on IRT item fit statistics, which multiple-choice items should be retained and which should be eliminated?

4. Which reading passages do students prefer to read?


Participants

A representative total sample of 1,200 students

– 600 from grades 3-5 (200 3rd graders, 200 4th graders, 200 5th graders) in 12-16 intact classrooms

– 600 students from grades 7-9 (200 7th graders, 200 8th graders, 200 9th graders) in 12-16 intact classrooms.

Students representing the full range of reading ability, including students with disabilities, are included in the study.


Design: Steps in the Calibration Process

1. Selected 40 passages, including 10 literary-fiction and 10 informational-exposition texts for each grade level (4th and 8th); the passages were rated as easy, medium, and hard in difficulty.

2. Commissioned the writing of 10 items for each passage, using the 2009 NAEP Reading Framework cognitive targets.


Design

• Testing procedures were employed to assure representation of passage text types while removing order effects

• Within classes students will be assigned to one of several possible test forms (a form is a set of passages with counterbalanced passage order)

• The test includes anchor passages (included in all forms), and non-anchor passages, from which several are selected and included in each form.
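A small sketch of how forms of this kind could be assembled: every form carries all anchor passages plus a subset of the non-anchor passages, and passage order is rotated so that each passage in a set appears once in each position. The rotation scheme, the passage labels, and the round-robin hand-out are all assumptions for illustration; the slides do not describe the actual counterbalancing scheme.

```python
from itertools import combinations


def rotations(passages):
    """All cyclic rotations of a passage list (simple counterbalancing)."""
    return [passages[i:] + passages[:i] for i in range(len(passages))]


def build_forms(anchors, non_anchors, n_non_anchor):
    """Forms = anchors plus each subset of non-anchor passages, in every rotation."""
    forms = []
    for subset in combinations(non_anchors, n_non_anchor):
        forms.extend(rotations(list(anchors) + list(subset)))
    return forms


def assign_forms(students, forms):
    """Deal forms out in turn within a class so each is used about equally often."""
    return {student: forms[i % len(forms)] for i, student in enumerate(students)}


# Hypothetical passage labels and class roster.
forms = build_forms(["Anchor"], ["P1", "P2", "P3"], n_non_anchor=2)
roster = [f"student_{i}" for i in range(1, 7)]
for student, form in assign_forms(roster, forms).items():
    print(student, form)
```

Because every form shares the anchor passage, responses to it give a common reference for linking the non-anchor passages.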


Experimental Design and Analysis

This preliminary item/passage psychometric calibration study will allow for:

1. the placement of all passages/questions on a common equal-interval measurement scale,

2. the development of passage scoring tables by which to assign subjects reading “ability” scores, and

3. the provision of a mechanism for equating scores across different passages.

This “item fit analysis” will determine which items will be retained and those that will be eliminated.
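To make the idea of a passage scoring table concrete, the sketch below assumes a one-parameter (Rasch) model, which is only one of several IRT models and is not named in the slides. Under that model the maximum-likelihood ability for a raw score is the value at which the expected passage score equals the raw score. The item difficulties are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """P(correct) under the Rasch model; theta and b are on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta, difficulties):
    """Expected number of items answered correctly at ability theta."""
    return sum(rasch_probability(theta, b) for b in difficulties)


def ability_for_raw_score(raw, difficulties, lo=-8.0, hi=8.0, tol=1e-6):
    """Solve expected_score(theta) == raw by bisection.

    Only defined for 0 < raw < number of items; zero and perfect scores
    have no finite estimate under this model.
    """
    if not 0 < raw < len(difficulties):
        raise ValueError("raw score must be strictly between 0 and the item count")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def scoring_table(difficulties):
    """Map each attainable non-extreme raw score to an ability estimate."""
    return {raw: ability_for_raw_score(raw, difficulties)
            for raw in range(1, len(difficulties))}


# Hypothetical item difficulties (logits) for one short passage.
passage_items = [-1.5, -0.5, 0.5, 1.5]
for raw, theta in scoring_table(passage_items).items():
    print(f"raw score {raw}: ability {theta:+.2f}")
```

Because abilities from different passages are expressed on the same scale, tables like this allow scores from passages of unequal difficulty to be compared.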


Motivation Study

Purpose: To examine whether improving the motivational characteristics of a large-scale reading assessment increases its accessibility for students with disabilities, and in so doing provides a more valid assessment of these students’ reading proficiency due to their increased engagement.


Research Questions

1. Is there an interaction effect between choice, type of text, and type of student?

2. Is there a correlation between students’ general motivation to read (e.g., as measured by the Motivation for Reading Questionnaire [MRQ]) and their performance on a large-scale reading assessment? Are participants who are more motivated to read (as measured by the MRQ) more likely to benefit from the choice option on a large-scale reading assessment?


Research Questions, cont.

3. Does the option of exercising choice in the selection of reading comprehension passages, which is hypothesized to improve student motivation and engagement on a large-scale assessment, produce significantly higher measured reading comprehension for all students?

4. Is there a significant difference in reading scores of students with disabilities versus general education students on large-scale reading assessments?

5. Is there a significant difference in student performance on text type (literary-fiction versus informational-exposition passages) on large-scale reading assessments?


Participants

280 students who are fluent in English

– 140 students from 4th grade

– 140 students from 8th grade

– targeted samples of students representing a range of disability groups are included

– students will be placed in a treatment condition based on stratified random assignment (i.e., students representing particular disabilities will be randomly assigned to the experimental and control conditions).


Design: Components of the Test

• The motivation assessment includes 2 literary-fiction and 2 informational-expository passages for both grade 4 & grade 8; passage order will be randomly assigned.

• Each passage will be followed by 5-6 multiple choice items.

• The assessment is untimed and will be completed on a computer-based platform.


Attending to Issues of Motivation

• General motivation will be measured prior to the test to obtain information on students’ feelings about “self as reader” (e.g., Motivation for Reading Questionnaire-MRQ).

• Situated motivation will be measured using questions woven into the test booklets for the choice and no-choice conditions (placed after the comprehension items); specific questions will tap

– students’ perceptions of the texts they read (e.g., difficulty; interest), and

– students’ sense of self-efficacy in reading and completing the items following the passage (the task).


Design

A counterbalanced stratified random assignment design will be used with experimental choice (C) groups that select reading passages for the assessment (“design your own assessment”) and control no choice (NC) groups that do not select passages
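Stratified random assignment of this kind can be sketched as follows: students are grouped by stratum (here, disability category), shuffled within each stratum, and split as evenly as possible between the choice (C) and no-choice (NC) conditions. The student IDs, stratum labels, and the rule for odd-sized strata are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_assignment(students, seed=None):
    """Assign students to "C" or "NC" at random within each stratum.

    `students` is a list of (student_id, stratum) pairs. Each stratum is
    split as evenly as possible; in an odd-sized stratum the extra
    student's condition is decided at random.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for student_id, stratum in students:
        by_stratum[stratum].append(student_id)

    assignment = {}
    for ids in by_stratum.values():
        rng.shuffle(ids)
        n_choice = len(ids) // 2
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_choice += 1
        for position, student_id in enumerate(ids):
            assignment[student_id] = "C" if position < n_choice else "NC"
    return assignment


# Hypothetical roster: (student id, stratum).
roster = [("s01", "LD"), ("s02", "LD"), ("s03", "LD"), ("s04", "LD"),
          ("s05", "EBD"), ("s06", "EBD"),
          ("s07", "none"), ("s08", "none"), ("s09", "none"), ("s10", "none")]
print(stratified_assignment(roster, seed=2007))
```

Fixing the seed makes the assignment reproducible, which helps when the allocation has to be audited later.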


Design: Procedures

Students in the experimental group are given choice (C) in selecting the passages they read in comparison to students in a control group who are not given choice in selecting passages (NC).

– students in the (C) & (NC) conditions read short descriptions for 6 informational-exposition and 6 literary-fiction passages;

– they rate the passages according to interest;

– students in the (C) condition select 2 passages from each genre to create their “own personal assessment.”


Design: Procedures, cont.

• Post-assessment interviews will be conducted with subsets of students from the control and experimental groups at both grade levels.

• Students from the various disabilities groups as well as regular education students will be selected for interviews (16 students from 4th grade and 16 from 8th grade)


Analysis

• The dependent measure is comprehension performance (Y); factors include choice condition (choice/ no choice), disability status (youth with disabilities/ youth without disabilities) & text type (literary-fiction/informational-exposition)

• A split-plot design will be used with two between-subjects factors (A = passage choice & B = disability status), one within-subjects factor (C = text type), one blocking variable (S = subject), & one covariate (X = motivation as assessed on the MRQ) at the between-subject level; A, B, C, and X are fixed effects, and S is a random effect
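One conventional way to write a linear model for a split-plot layout like this is sketched below. The subscripts and Greek symbols are introduced here for illustration (the slides name only the factors A, B, C, S, and X), and interactions involving the covariate are omitted for brevity.

```latex
Y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
         + \delta\,(X_{ijl} - \bar{X}) + S_{l(ij)}
         + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}
         + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}
```

Here i indexes passage choice (A), j disability status (B), k text type (C), and l the subject nested within a choice-by-disability cell. The terms in α, β, γ and the slope δ on the MRQ covariate are fixed effects, and S is the random subject effect against which the between-subjects terms are tested.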


Analysis, cont.

• Analysis of variance will be used to evaluate various effects; correlations of students’ performance on the comprehension test & responses on the MRQ and situated motivation questions will be calculated

• Various analytic deduction approaches will also be used to analyze the post assessment interview data and a mixed-design approach will be used to integrate the overall quantitative and qualitative findings.
