

Learning Disabilities Research & Practice, 25(2), 60-75
© 2010 The Division for Learning Disabilities of the Council for Exceptional Children

Creating a Progress-Monitoring System in Reading for Middle-School Students: Tracking Progress Toward Meeting High-Stakes Standards

Christine Espin, Teri Wallace, Erica Lembke, Heather Campbell, and Jeffrey D. Long
University of Minnesota

In this study, we examined the reliability and validity of curriculum-based measures (CBM) in reading for indexing the performance of secondary-school students. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers. Students completed 1-, 2-, and 3-minute reading-aloud and 2-, 3-, and 4-minute maze-selection tasks. The relation between performance on the CBMs and the state reading test was examined. Results revealed that both reading aloud and maze selection were reliable and valid predictors of performance on the state standards tests, with validity coefficients above .70. An exploratory follow-up study was conducted in which the growth curves produced by the reading-aloud and maze-selection measures were compared for a subset of 31 students from the original study. For these 31 students, maze selection reflected change over time, whereas reading aloud did not. This pattern of results was found for both lower- and higher-performing students. Results suggest that it is important to consider both performance and progress when examining the technical adequacy of CBMs. Implications for the use of measures with secondary-level students for progress monitoring are discussed.

In recent years, much attention has been directed to early intervention and prevention in reading. An alternative to a singular focus on early intervention is an approach in which early intervention is combined with continuous, long-term, intensive interventions for struggling readers. "Long term" in this approach refers to reading instruction that extends into the high school years. The goal of such an approach would be to diminish the magnitude of reading difficulties experienced by struggling readers and increase the likelihood of postgraduation success. Supporting the notion that long-term, intensive reading interventions may be needed for a select group of students are two sources of data: (1) results of early intervention studies and (2) results of secondary-school studies for students with learning disabilities.

Need for Long-Term, Intensive Intervention Efforts

Recent research on the effects of early identification and intervention programs has produced promising outcomes and demonstrated reductions in the magnitude and prevalence of reading failure (O'Connor, Fulmer, Harty, & Bell, 2005; O'Connor, Harty, & Fulmer, 2005; Vaughn, Linan-Thompson, & Hickman, 2003). However, these studies also have uncovered a small group of children who "fail to thrive" (Vaughn et al., 2003), even when given intensive and potentially powerful interventions. Such children either do not reach a level of performance that warrants placement into a typical instructional setting or do not maintain satisfactory levels of performance without continued intensive interventions. These students have reading difficulties that seem to be especially resistant to change (see Torgesen, 2000) and are often considered to have learning disabilities (LD).

Requests for reprints should be sent to Christine Espin, Wassenaarseweg 52, PO Box 9555, 2300 RB Leiden, The Netherlands. Electronic inquiries should be sent to espinca@fsw.leidenuniv.nl.

Research at the secondary-school level reveals that students with LD continue to experience reading difficulties well into their high school years. Secondary-school students with LD experience difficulties with phonological, language comprehension, and reading fluency skills (Fuchs, Fuchs, Mathes, & Lipsey, 2000; Vellutino, Fletcher, Snowling, & Scanlon, 2004; Vellutino, Scanlon, & Tanzman, 1994; Vellutino, Tunmer, Jaccard, & Chen, 2007). They typically perform at levels 4-6 years behind non-LD peers in reading and score in the lowest decile on reading achievement tests (Deshler, Schumaker, Alley, Warner, & Clark, 1982; Levin, Zigmond, & Birch, 1985; Warner, Schumaker, Alley, & Deshler, 1980). For example, on the 2007 National Assessment of Educational Progress (Lee, Grigg, & Donahue, 2007), 66 percent of students with disabilities in public schools scored below a Basic Level, compared to only 24 percent of students without disabilities. (A Basic Level implies partial mastery of the knowledge and skills needed for proficient work at a given grade level.)

Taken together, research on younger and older children with reading difficulties produces a picture of students whose reading difficulties begin early and persist throughout their school career. For such students, a program of intervention that begins early, and then continues throughout their school careers, is needed.

Reading Interventions at the Secondary-School Level

Two questions arise when considering reading interventions for secondary-school students with LD. The first is: At what level do students need to read to be successful after high school graduation? In recent years, this question often has been addressed through the development of state standards tests in reading. Such tests define, by design or default, the level of reading considered to be necessary for students to be successful at the secondary-school level. This is the case despite the fact that the extent to which many state tests reflect the type of reading necessary for success, either in school or in postsecondary settings, is unknown. However, given the high-stakes nature of state tests for schools, in terms of meeting No Child Left Behind standards, and for students, who are required to pass reading tests to graduate (as is the case in 23 states; Center on Education Policy, 2008), the tests are an important outcome for students and schools at the secondary-school level.

The second question is: How can we determine whether our reading interventions are effective? The reading progress of secondary-school students with LD might prove to be slow and incremental, but not necessarily unimportant. For example, improvement of even one grade level (to use a typical metric) in reading over the course of 4 years in high school might translate into large advantages in post-high school settings. Yet are there instruments that are sensitive to such slow and incremental growth? Are those instruments reliable and valid, and can they be tied to success on tasks of importance, such as performance on state reading tests or performance in postsecondary educational settings? One instrument that might potentially fulfill these requirements is curriculum-based measurement (CBM).

CBM

CBM is a system of measurement designed to allow teachers to monitor student progress and evaluate the effectiveness of instructional programs (Deno, 1985). The success of CBM relies on two key characteristics: practicality and technical adequacy (Deno, 1985). With respect to practicality, if the measures are to be given on a frequent basis, they must be time efficient; easy to develop, administer, and score; and must allow for the creation of multiple equivalent forms. With respect to technical adequacy, if the measures are to provide educationally useful information, they must be valid and reliable indicators of performance in an academic area. For a measure to be considered a valid indicator of performance, evidence must demonstrate that performance on the measure relates to performance in the academic domain more broadly.


In reading, the number of words read correctly in 1 minute is often used as a CBM indicator of general reading performance at the elementary-school level (Wayman, Wallace, Wiley, Ticha, & Espin, 2007). One-minute reading-aloud measures are time efficient and easy to develop, administer, and score, and they allow for the creation of multiple equivalent forms. Further, a large body of research supports the relation between the number of words read aloud in 1 minute and other measures of reading proficiency, including reading comprehension (see reviews by Marston, 1989; Wayman et al., 2007). Although most CBM reading research has focused on a reading-aloud measure, support also has been found for the technical adequacy of a maze-selection measure (see Wayman et al., 2007). In a maze-selection measure, every seventh word of a passage is deleted and replaced with a multiple-choice item consisting of the correct word plus two distracters. Students read through the text and choose the correct word for each multiple-choice item. Specific to the present study, both reading-aloud (Crawford, Tindal, & Stieber, 2001; Hintze & Silberglitt, 2005; McGlinchey & Hixson, 2004; Silberglitt & Hintze, 2005; Stage & Jacobsen, 2001) and maze-selection measures (Wiley & Deno, 2005) have been shown to predict performance on state standards tests.

Although research supports the technical adequacy of both reading aloud and maze selection, the majority of that research has been done at the elementary-school level (Wayman et al., 2007). Far less research has been conducted in reading at the secondary-school level, even though the results of cross-age studies suggest that the nature and type of CBM in reading might need to change as students become older and more proficient readers (Jenkins & Jewell, 1993; MacMillan, 2000; Yovanoff, Duesbery, Alonzo, & Tindal, 2005). Many of the studies that have been conducted in reading at the secondary-school level have focused on reading as it relates to learning in the content areas (e.g., Espin & Deno, 1993a, 1993b; Espin & Deno, 1994-1995; Fewster & MacMillan, 2002) rather than on the development of general reading proficiency. However, a small group of studies has focused on general reading proficiency.

Fuchs, Fuchs, and Maxwell (1988) examined the validity of reading aloud for students with mild disabilities across grades 4-8. Across-grade correlations between words read correctly (WRC) in 1 minute and scores on comprehension and word study subtests of a standardized achievement test were .91 and .80, respectively; however, because the study was not specifically focused on the secondary-school level, correlations were not reported separately for the secondary-school students in the study.

Three subsequent studies focused specifically on secondary-school students. Espin and Foegen (1996) examined the validity of three CBMs (reading aloud, maze selection, and vocabulary matching) on the comprehension, acquisition, and retention of expository text for students in grades 6-8. Comprehension, acquisition, and retention were measured with researcher-designed multiple-choice questions given immediately after reading (comprehension), immediately after instruction on the text (acquisition), and a week or more following instruction (retention). Correlations ranged from .54 to .65 and were similar for comprehension, acquisition, and retention measures. Brown-Chidsey, Davis, and Maya (2003) examined the reliability and validity of a 10-minute maze task (a somewhat long task by CBM standards) as an indicator of reading for students in grades 5-8. They found that scores generally differentiated students by grade level and special education status. Rasinski et al. (2005), in discussing the importance of reading fluency for high school students, reported correlations between WRC in 1 minute and scores on a state standards test of .53 for ninth-grade students. Descriptive data and methods were not reported in the article.

In sum, little research has been conducted at the secondary-school level on the development of CBM reading measures as indicators of general reading proficiency, and that which has been done has been limited in terms of measures and methodology or has not focused specifically on secondary-school students. What is more, the research to date has focused on the characteristics of the measures as performance or static measures, not as progress or growth measures. The validity and reliability of the measures may differ based on their intended use.

In this article, we examine the technical adequacy of CBM reading measures for secondary-school students. Specifically, the reliability and validity of CBMs as predictors of performance on a state standards test in reading are examined. Differences related to time frame and scoring procedure are examined. Reading-aloud and maze-selection measures are selected because of previous research demonstrating their practical and technical adequacy at the elementary-school level and their potential promise at the secondary-school level. Time frames are examined because longer samples of work might be needed at the middle-school level to obtain a distribution of student scores. For example, reading-aloud scores might bunch together at 1 minute but spread out at 3 minutes. Finally, scoring procedures are examined to determine the influence of errors on the reliability and validity of students' scores. For example, counting the number of correct selections on a maze task is less time consuming than counting the number of correct minus incorrect selections, but using a correct minus incorrect score may help to control for guessing.

Two research questions are addressed in the study:

(1) What are the reliability and validity of reading aloud and maze selection for predicting performance on a state standards test in reading?

(2) Do reliability and validity vary with time frame and scoring procedures?

Our primary focus was on the technical adequacy of CBMs as static measures, or indicators of performance at a single point in time. However, we were also able to collect progress measures on a small subsample of the original sample. Thus, we conducted an exploratory study in which we compared the growth rates produced by reading-aloud and maze-selection measures for this subsample of students.

STUDY 1: READING ALOUD AND MAZE SELECTION AS PERFORMANCE INDICATORS

Method

Setting and Participants

The study took place in two middle schools in an urban district of a large midwestern metropolitan area. The district enrolled over 47,000 students. Seventy-five percent of the students were from diverse cultural backgrounds, 24 percent received ESL services, 67 percent were eligible for free or reduced lunches, and 13 percent were in special education. The first school had 669 students in grades 6-8. Eighty-three percent of the students were from diverse cultural backgrounds, 35 percent received ESL services, 83 percent were eligible for free or reduced lunches, and 15 percent were in special education. The second school had 778 students in grades 6-8. Sixty-two percent of the students were from diverse cultural backgrounds, 18 percent received ESL services, 56 percent were eligible for free or reduced lunches, and 16 percent were in special education.

All eighth-grade students were invited to participate in the study to ensure a range of student performance levels. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers from the two schools. Fifty-eight percent of the participants were eligible for free or reduced lunches. Students were Caucasian (34 percent), Asian American (24 percent), African American (20 percent), Hispanic (19 percent), and Native American (3 percent). Nine percent of the students were receiving special education services for learning disabilities or mild disabilities (4 percent), speech and language (3 percent), emotional and behavior disorders (1 percent), or other health impaired (1 percent). Fifty-eight percent of the students spoke English at home. The rest spoke Spanish (18.5 percent), Hmong (16 percent), Laotian (4 percent), Vietnamese (1 percent), Cambodian (1 percent), Amharic (0.5 percent), Chinese (0.5 percent), and Somali (0.5 percent). The mean standard score on the state standards reading test for Sample 1 was 626.9. This compared to a state-wide mean score of 640.6 and a district-wide mean score of 607.3.

Note that the sample did not consist of struggling readers only, even though the primary purpose of the study was to identify performance and progress measures for struggling readers. To establish the reliability and validity of CBM, it was necessary to have a sample that represented a range of student ability levels, because validity and reliability coefficients could be negatively affected by a truncated distribution of scores. We had two options. One was to select students who were struggling readers across a range of grade levels, similar to the approach taken by Fuchs et al. (1988). A second was to work within one grade level, but to include students across a range of performance levels within that grade. Given that the purpose of the study was to tie the CBM to performance on a state standards test, and given that the state standards test was given in only one grade, we chose the latter approach. This approach is not unique. In a review of the CBM research in reading (Wayman et al., 2007), 28 of the 29 technical adequacy studies conducted at the elementary-school level used general education samples (13 studies) or mixed samples of general and special education (15 studies). Only 1 used an exclusively special education sample.
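The concern about a truncated distribution of scores can be illustrated with a small simulation. The sketch below is not drawn from the study: it generates synthetic predictor and criterion scores with an assumed population correlation of .80 (the values of `n`, `rho`, and `seed` are arbitrary choices), and then compares the correlation in the full sample with the correlation among only the lowest quarter of scorers on the predictor.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.8, seed=0):
    """Return (r in the full sample, r in the lowest quartile on the predictor).

    All scores are synthetic; rho is an assumed population correlation.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    pairs.sort(key=lambda p: p[0])
    low = pairs[: n // 4]  # a "struggling readers only" subsample
    r_full = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    r_low = pearson_r([p[0] for p in low], [p[1] for p in low])
    return r_full, r_low


r_full, r_low = simulate()
print(f"full sample r = {r_full:.2f}; lowest quartile r = {r_low:.2f}")
```

Under these assumptions the coefficient in the restricted subsample comes out noticeably smaller than in the full sample, which is the attenuation the sampling decision was meant to avoid.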

Measures

Predictor variables. Predictor variables were scores on two CBM tasks: reading aloud and maze selection. The reading-aloud and maze-selection tasks were drawn from human-interest stories published in the local daily newspaper and were selected on the basis of content, readability level, length, and scores on a pilot test conducted with four students who were not involved in the study. Passages whose content was determined to be too technical or culturally specific were not used. To ensure that students would not complete the CBM tasks before time expired, only passages that were longer than 800 words were selected. Readability was calculated using the Flesch-Kincaid formula (Kincaid, Fishburne, Rogers, & Chissom, 1975) via Microsoft Word and the Degrees of Reading Power (DRP; Touchstone Applied Science and Associates, 2006). Readability levels for the selected passages ranged from fifth to seventh grade, and DRP levels ranged from 51 to 61. Means (number of words read aloud in 3 minutes) and standard deviations from the pilot study for selected passages were 421.5 (SD = 80.5), 489.5 (SD = 117), 432.5 (SD = 140.5), and 401.7 (SD = 75).
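For readers unfamiliar with the Flesch-Kincaid grade-level formula, a minimal sketch follows. It uses the commonly published coefficients (0.39, 11.8, and 15.59) and takes word, sentence, and syllable counts as inputs, because syllable counting is implementation specific and the counts Microsoft Word produces may differ. The function name and the example counts are hypothetical and are not taken from the study passages.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from raw counts.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical 800-word passage with 50 sentences and 1,120 syllables.
grade = flesch_kincaid_grade(800, 50, 1120)
print(round(grade, 2))  # 7.17
```

With these illustrative counts the passage would be rated at roughly a seventh-grade level, the upper end of the range reported for the selected passages.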

The reading-aloud task was administered to students on an individual basis using standardized administration procedures. Students read aloud from the passage while the examiner followed along on a numbered copy of the same passage, making a slash through words read incorrectly or words supplied for the student. The examiner timed for 3 minutes using a stopwatch, marking progress at 1, 2, and 3 minutes. Reading aloud was scored for total words read (TWR) and WRC at 1, 2, and 3 minutes.
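The two reading-aloud scores can be sketched in a few lines. The data layout here is an assumption made for illustration: the examiner's record is represented as one True/False flag per word attempted (False for a slashed word, including words supplied for the student), together with the number of words attempted at each minute mark.

```python
def score_reading_aloud(word_correct, words_at_mark):
    """Return {minute: (TWR, WRC)} for each marked time frame.

    word_correct  -- list of bools, one per word attempted, in reading order
    words_at_mark -- dict mapping minute -> number of words attempted by then
    (Both structures are hypothetical stand-ins for the examiner's copy.)
    """
    scores = {}
    for minute, n_words in sorted(words_at_mark.items()):
        attempted = word_correct[:n_words]
        total_words_read = len(attempted)
        words_read_correct = sum(attempted)  # True counts as 1
        scores[minute] = (total_words_read, words_read_correct)
    return scores


# Toy record: 10 words attempted, errors on the 4th and 9th words.
record = [True, True, True, False, True, True, True, True, False, True]
marks = {1: 4, 2: 7, 3: 10}
print(score_reading_aloud(record, marks))
# {1: (4, 3), 2: (7, 6), 3: (10, 8)}
```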

Maze-selection passages were created from the same stories used for reading aloud. Every seventh word was deleted and replaced by the correct choice and two distracters. The distracters were within one letter in length of the correct word but started with different letters of the alphabet and comprised different parts of speech (see Fuchs, Fuchs, Hamlett, & Ferguson, 1992, for maze-construction procedures). The three word choices were underlined in bold print and were not split at the end of the sentence in order to preserve continuity for the reader.
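The construction rules above can be approximated in code. The sketch below is an illustration rather than the authors' procedure: it applies the every-seventh-word rule and the length and first-letter constraints, but it does not check part of speech, it assumes the passage has no punctuation attached to words, and it draws distracters from a hypothetical word pool. It also assumes that the two distracters should begin with letters different from each other as well as from the correct word.

```python
import random


def pick_distracters(correct, pool, rng, k=2):
    """Pick k distracters within one letter of the correct word's length,
    with first letters differing from the correct word and from each other."""
    candidates = [w for w in pool if abs(len(w) - len(correct)) <= 1]
    rng.shuffle(candidates)
    chosen, used_initials = [], {correct[0].lower()}
    for word in candidates:
        if word[0].lower() not in used_initials:
            chosen.append(word)
            used_initials.add(word[0].lower())
        if len(chosen) == k:
            return chosen
    raise ValueError(f"not enough suitable distracters for {correct!r}")


def build_maze(text, pool, every=7, seed=0):
    """Replace every `every`-th word with a three-choice item in random order."""
    rng = random.Random(seed)
    pieces = []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            choices = [word] + pick_distracters(word, pool, rng)
            rng.shuffle(choices)
            pieces.append("(" + " / ".join(choices) + ")")
        else:
            pieces.append(word)
    return " ".join(pieces)


# Hypothetical passage and distracter pool.
passage = "the old dog ran down to the river and then swam across the water"
pool = ["cat", "jump", "blue", "green", "sing", "happy", "under", "quick", "of", "my"]
print(build_maze(passage, pool))
```

In the toy passage the 7th and 14th words become multiple-choice items; a production version would also need the part-of-speech check and the layout rule that keeps the three choices on one line.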

The maze-selection task was administered to students in a group setting using standardized administration procedures. Students read silently for 4 minutes, making selections for each multiple-choice item. Examiners timed for 4 minutes and instructed students to mark their progress with a slash at 2, 3, and 4 minutes. Examiners monitored to ensure that students made the slashes. Maze selection was scored for correct maze choices (CMC) and correct minus incorrect choices (CMI) in 2, 3, and 4 minutes. As a control for guessing, and following the procedures used in previous research on maze selection (Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1992), maze scoring was stopped when three consecutive incorrect choices were made. A recent investigation comparing different maze-selection scoring procedures revealed no differences in criterion-related validity associated with using a two-in-a-row versus three-in-a-row incorrect rule (Wayman et al., 2009).
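The maze scoring rules can be sketched as follows. One detail is an assumption, because the text does not specify it: here the three consecutive incorrect choices that trigger the stop are themselves counted as incorrect, and everything after them is ignored. The input is a hypothetical list of True/False flags for the items a student reached within one time frame (for example, up to the 2-minute slash).

```python
def score_maze(responses, stop_after=3):
    """Return (CMC, CMI) for one time frame.

    responses  -- list of bools in passage order (True = correct choice)
    stop_after -- scoring stops once this many consecutive errors occur
    Assumption: the errors that trigger the stop count toward CMI.
    """
    correct = incorrect = run = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            run = 0
        else:
            incorrect += 1
            run += 1
            if run == stop_after:
                break  # ignore all later choices
    return correct, correct - incorrect


# Toy response string: the last two correct choices follow three straight
# errors, so they are not credited.
answers = [True, True, True, True, False, True, True,
           False, False, False, True, True]
print(score_maze(answers))  # (6, 2)
```

Passing `stop_after=2` gives the two-in-a-row variant mentioned above.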

Criterion variables. The criterion variable in this study was performance on the Minnesota Basic Standards Test (MBST) in reading, a high-stakes test required for graduation. The MBST was designed by the state of Minnesota to test the minimum level of reading skills needed for survival (MN Department of Education, 2001) and, at the time of the study, was administered annually in the winter to all eighth-grade students in Minnesota.¹ The untimed test comprised four or more passages of 500 words or more selected from newspaper and magazine articles. Passages were both narrative and expository and had average DRP levels ranging from 64 to 67. Each passage was followed by multiple-choice questions, with approximately 40 questions per test. The test was constructed so that 60 percent of the questions on the test were literal, 30 percent inferential, and 10 percent could be either. The test was machine-scored on a scale from 0 to 40, and then the raw score was converted to a scale score between 375 and 750. A passing scale score was 600, which corresponded to 75 percent correct (MN Department of Education, 2001). Students who did not pass the test were permitted to retake it two times each year. Students had to pass the test in order to graduate from high school.

The MBST Technical Manual (MN Department of Education, 2001) reported reliability and validity information for the MBST Reading test. Internal consistency measures for reliability were based on the Rasch model index of person separation. The Kuder-Richardson 20 internal consistency reliability estimate was .90. No alternate-form reliability was calculated. Content validity, according to the manual, was determined by the relationship of the reading test items to statewide content standards, as verified by educators, item developers, and experts in the field. Construct validity was measured by item point-biserial correlations (the correlation between students' raw scores on the MBST and their scores on individual test items). The mean point-biserial correlation was .38. There were no criterion-related validity statistics noted.
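The two item statistics named in the manual can be sketched with a toy response matrix. This is a generic illustration, not the manual's computation: it assumes dichotomous (0/1) item scores, uses the population variance of total scores in both formulas, and, following the description above, correlates each item with the full raw score (the item is not removed from the total). The 5-student, 4-item matrix is invented.

```python
import math


def kr20(matrix):
    """Kuder-Richardson 20 for a students-by-items matrix of 0/1 scores."""
    n, k = len(matrix), len(matrix[0])
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def point_biserial(matrix, item):
    """Correlation between one 0/1 item and the total raw score."""
    n = len(matrix)
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in totals) / n)
    passed = [t for row, t in zip(matrix, totals) if row[item] == 1]
    failed = [t for row, t in zip(matrix, totals) if row[item] == 0]
    if not passed or not failed:
        raise ValueError("item must have both correct and incorrect responses")
    p = len(passed) / n
    mean_gap = sum(passed) / len(passed) - sum(failed) / len(failed)
    return mean_gap / sd_total * math.sqrt(p * (1 - p))


# Invented responses: rows are students, columns are items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))               # 0.8
print(round(point_biserial(responses, 0), 2))  # 0.71
```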

Procedures

In the fall, students completed two maze passages in a group setting in their classrooms. On a subsequent day in the same week, students completed two reading-aloud passages individually. Type of measure (reading aloud vs. maze selection) and passage were counterbalanced across students, as was the order in which the students completed the passages within reading aloud or maze selection. Examples of each task were given to students prior to administration. The MBST was administered by teachers to students in February.

Sixteen graduate students administered and scored the reading-aloud and maze-selection measures. Prior to data collection, the graduate students were interviewed by members of the research team to ascertain their ability to work with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
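The agreement index described above is a simple ratio. A minimal sketch follows; the function name and the example scores are hypothetical, and the handling of two identical scores (including two zeros) as 100 percent agreement is an assumption added to avoid division by zero. Scores are assumed to be non-negative.

```python
def interscorer_agreement(score_a, score_b):
    """Percent agreement: smaller score divided by larger score, times 100."""
    if score_a < 0 or score_b < 0:
        raise ValueError("scores are assumed to be non-negative")
    if score_a == score_b:
        return 100.0  # also covers the case where both scores are zero
    return min(score_a, score_b) / max(score_a, score_b) * 100


# Hypothetical example: a data collector scores 118 WRC, the trainer 124.
print(round(interscorer_agreement(118, 124), 1))  # 95.2
```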

Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475-750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar for scoring method but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85 regardless of scoring method.

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time

Reading aloud                        1 minute          2 minutes         3 minutes
  Total words read                   125.88 (43.75)    250.46 (85.05)    373.31 (125.95)
  Words read correct                 119.82 (47.29)    238.54 (92.14)    355.27 (136.92)

Maze selection                       2 minutes         3 minutes         4 minutes
  Correct choices                    12.33 (7.12)      18.76 (10.87)     25.24 (14.53)
  Correct minus incorrect choices    11.18 (7.53)      17.17 (11.40)     23.10 (l 5 7)

Note. Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time

Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .93          .96          .95
  Words read correct                 .94          .96          .94

Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .80          .86          .88
  Correct minus incorrect choices    .79          .86          .96

Note. All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time

Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .76          .77          .76
  Words read correct                 .78          .79          .78

Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .75          .77          .80
  Correct minus incorrect choices    .77          .78          .81

Note. All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for the maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.

Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
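The idea of relating CBM scores to the likelihood of passing along the whole performance continuum can be illustrated with a simple tabulation. The sketch below is only an assumption about how such a table might be approximated: it groups students into score bands and reports the proportion passing in each band. It is not the method of Espin et al. (2008), which is not described here, and the band width, function name, and records are hypothetical.

```python
from collections import defaultdict

def probable_success_table(records, band_width=25):
    """Return {band_start: proportion passing} for (wrc, passed) records."""
    counts = defaultdict(lambda: [0, 0])  # band_start -> [n_passed, n_total]
    for wrc, passed in records:
        band = (wrc // band_width) * band_width  # lower edge of the score band
        counts[band][0] += 1 if passed else 0
        counts[band][1] += 1
    return {band: n_pass / n_total
            for band, (n_pass, n_total) in sorted(counts.items())}

# Hypothetical district data: (WRC in 1 minute, passed the state test?)
records = [(100, False), (104, False), (110, True), (118, False),
           (126, True), (131, False), (140, True), (148, True)]
table = probable_success_table(records)
```

With real district data, a teacher could read off the band a student currently falls in and choose a goal in a band with a higher proportion passing, as in the 100 WRC to 126 WRC example in the text.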


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques, such as Hierarchical Linear Modeling, now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth and seventh grade levels. Passages were on average 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70 and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection, from .69 to .90.

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks, no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages    Reading aloud,             Maze,
            words correct, 1 minute    correct choices, 3 minutes
1 and 2     .92                        .72
2 and 3     .91                        .84
3 and 4     .85                        .69
4 and 5     .88                        .80
5 and 6     .88                        .80
6 and 7     .86                        .85
7 and 8     .79                        .90
8 and 9     .84                        .83
9 and 10    .83                        .74

Note. n = 25 to 3[illegible]. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
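The agreement index described above is a simple ratio. A minimal sketch follows; the function name and the counts in the example are hypothetical, chosen only to show the arithmetic.

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements, as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no scored items to compare")
    return 100.0 * agreements / total

# Hypothetical check: two scorers agree on 194 of 200 scored words
example = percent_agreement(194, 6)
```

In this made-up case the index is 97 percent, the lower end of the range reported for the study.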

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \qquad (1)$$

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\hat{\beta}_1 = 0.84$, t(245) = 2.63, p = .009, and the intercept was significant, $\hat{\beta}_0 = 139.01$, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of .84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\hat{\beta}_1 = -0.41$, t(29.4) = -0.55, p = .58, but the intercept was significant, $\hat{\beta}_0 = 420.04$, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \qquad (2)$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 21.22$, t(25.3) = 17.22, p < .0001; $\hat{\beta}_1 = 2.88$, t(32.7) = 12.79, p < .0001; and $\hat{\beta}_2 = -7.26$, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
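The time coding of the spline model in Equation (2) can be made concrete with a short sketch. It computes only the fixed-effects (mean) prediction and ignores the random effects; the function names are hypothetical, and the estimates are read as 21.22, 2.88, and -7.26, which assumes the decimal placement implied by the surrounding text. The original analysis was run in SAS, so this is an illustration rather than the authors' code.

```python
def spline_time(wave, knot_wave=8):
    """Return (t, t_star) for a 1-based wave, using the coding given in the text:
    t = wave - 1, t_star = 0 through wave 7 and t - 7 from wave 8 onward."""
    t = wave - 1
    t_star = max(0, t - (knot_wave - 1))
    return t, t_star

def predicted_mean(wave, b0, b1, b2):
    """Fixed-effects prediction b0 + b1 * t + b2 * t_star for one wave."""
    t, t_star = spline_time(wave)
    return b0 + b1 * t + b2 * t_star

# Estimates for 3-minute maze selection (CMC3), decimal placement assumed
B0, B1, B2 = 21.22, 2.88, -7.26
predictions = {w: round(predicted_mean(w, B0, B1, B2), 2) for w in range(1, 11)}
```

Under this coding the second time variable is still zero at wave 8 itself, so the additional slope changes the predicted means only from wave 9 onward, where the net change per wave is the sum of the two slope estimates.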

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 13.58$, t(24.00) = 16.93, p < .0001; $\hat{\beta}_1 = 2.17$, t(32.7) = 12.85, p < .0001; and $\hat{\beta}_2 = -5.75$, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave but a decrease of 5.75 per wave

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. [Top panel: Linear Model, Words Read Correct, 1 Minute. Bottom panel: Linear Model, Words Read Correct, 3 Minutes. Series: observed (Obs) and predicted (Pred) means. Horizontal axis: Wave.]

TABLE 5
Detailed Results of the Growth Curve Analysis

Parameter    Estimate    SE         df      t value    p value

WRC1
  β0         138.1       5.5538     28.8    24.87      <.0001
  β1         0.839       0.319      245     2.63       0.0091

WRC3
  β0         420.04      15.7232    29      26.71      <.0001
  β1         -0.4057     0.7333     29.4    -0.55      0.5842

CMC2
  β0         13.5786     0.8022     24      16.93      <.0001
  β1         2.1674      0.1686     32.7    12.85      <.0001
  β2         -5.747      0.5642     225     -10.19     <.0001

CMC3
  β0         21.22       1.2321     25.3    17.22      <.0001
  β1         2.8808      0.2253     32.7    12.79      <.0001
  β2         -7.2639     0.6716     227     -10.82     <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ be the MBST score for the $i$th participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \qquad (3)$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\hat{\beta}_4 = 0.009$, t(31.3) = 2.35, p = 0.025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Top panel: Spline Model, Correct Maze Choices, 3 Minutes. Bottom panel: Spline Model, Correct Maze Choices, 2 Minutes. Series: observed (Obs) and predicted (Pred) means. Horizontal axis: Wave.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Top panel: Words Read Correctly in 1 Minute. Bottom panel: Correct Maze Choices in 3 Minutes, with the mean marked. Horizontal axis: Wave.]

addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze-selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [Horizontal axis: Wave.]

measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure
and scoring procedure              Time frame

Reading aloud                      1 minute    2 minutes      3 minutes
  Total words read                 .30         .32            .33
  Words read correct               .32         [illegible]    .34

Maze selection                     2 minutes   3 minutes      4 minutes
  Correct choices                  .52*        .50*           .48*
  Correct minus incorrect choices  .57*        .55*           .51*

n = 31. *p < .01.

Note. MBST = Minnesota Basic Standards Test.

others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
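The group-level equivalence check and the counterbalancing suggestion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made-up values, and the function names, the use of the median of passage means as the reference point, and the tolerance of 5 points are hypothetical choices, not procedures taken from the study.

```python
from statistics import mean, median
from typing import Dict, List


def flag_unequal_passages(scores: Dict[str, List[float]], tolerance: float) -> List[str]:
    """Return passages whose group mean is far from the typical passage mean.

    Assumption: "typical" is the median of the passage means, and "far"
    means an absolute difference greater than `tolerance`.
    """
    passage_means = {name: mean(values) for name, values in scores.items()}
    typical = median(passage_means.values())
    return [name for name, m in passage_means.items() if abs(m - typical) > tolerance]


def rotated_orders(passages: List[str]) -> List[List[str]]:
    """Build cyclic rotations so each passage appears once in each position."""
    return [passages[i:] + passages[:i] for i in range(len(passages))]


# Made-up maze scores for four students on four passages (not study data).
pilot_scores = {
    "passage_6": [19, 21, 20, 20],
    "passage_7": [20, 22, 18, 24],
    "passage_8": [30, 33, 29, 32],
    "passage_9": [15, 14, 17, 14],
}
flagged = flag_unequal_passages(pilot_scores, tolerance=5)

# Three administration orders for three parallel forms, one per student group.
orders = rotated_orders(["A", "B", "C"])
```

With these invented numbers the check flags passages 8 and 9, the same kind of spike-then-dip pattern discussed above. The rotation gives each form once in each serial position, which is one simple way to keep passage effects from being confounded with time.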

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-84.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


Taken together, research on younger and older children with reading difficulties produces a picture of students whose reading difficulties begin early and persist throughout their school career. For such students, a program of intervention that begins early, and then continues throughout their school careers, is needed.

Reading Interventions at the Secondary-School Level

Two questions arise when considering reading interventions for secondary-school students with LD. The first is: At what level do students need to read to be successful after high school graduation? In recent years, this question often has been addressed through the development of state standards tests in reading. Such tests define, by design or default, the level of reading considered to be necessary for students to be successful at the secondary-school level, this despite the fact that the extent to which many state tests reflect the type of reading necessary for success, either in school or in postsecondary settings, is unknown. However, given the high-stakes nature of state tests for schools, in terms of meeting No Child Left Behind standards, and for students, who are required to pass reading tests to graduate (as is the case in 23 states; Center on Education Policy, 2008), the tests are an important outcome for students and schools at the secondary-school level.

The second question is: How can we determine whether our reading interventions are effective? The reading progress of secondary-school students with LD might prove to be slow and incremental, but not necessarily unimportant. For example, improvement of even one grade level (to use a typical metric) in reading over the course of 4 years in high school might translate into large advantages in post-high school settings. Yet are there instruments that are sensitive to such slow and incremental growth? Are those instruments reliable and valid, and can they be tied to success on tasks of importance, such as performance on state reading tests or performance in postsecondary educational settings? One instrument that might potentially fulfill these requirements is curriculum-based measurement (CBM).

CBM

CBM is a system of measurement designed to allow teachers to monitor student progress and evaluate the effectiveness of instructional programs (Deno, 1985). The success of CBM relies on two key characteristics: practicality and technical adequacy (Deno, 1985). With respect to practicality, if the measures are to be given on a frequent basis, they must be time efficient; easy to develop, administer, and score; and must allow for the creation of multiple equivalent forms. With respect to technical adequacy, if the measures are to provide educationally useful information, they must be valid and reliable indicators of performance in an academic area. For a measure to be considered a valid indicator of performance, evidence must demonstrate that performance on the measure relates to performance in the academic domain more broadly.


In reading, the number of words read correctly in 1 minute is often used as a CBM indicator of general reading performance at the elementary-school level (Wayman, Wallace, Wiley, Ticha, & Espin, 2007). One-minute reading-aloud measures are time efficient and easy to develop, administer, and score, and they allow for the creation of multiple equivalent forms. Further, a large body of research supports the relation between the number of words read aloud in 1 minute and other measures of reading proficiency, including reading comprehension (see reviews by Marston, 1989; Wayman et al., 2007). Although most CBM reading research has focused on a reading-aloud measure, support also has been found for the technical adequacy of a maze-selection measure (see Wayman et al., 2007). In a maze-selection measure, every seventh word of a passage is deleted and replaced with a multiple-choice item consisting of the correct word plus two distracters. Students read through the text and choose the correct word for each multiple-choice item. Specific to the present study, both reading-aloud (Crawford, Tindal, & Stieber, 2001; Hintze & Silberglitt, 2005; McGlinchey & Hixson, 2004; Silberglitt & Hintze, 2005; Stage & Jacobsen, 2001) and maze-selection measures (Wiley & Deno, 2005) have been shown to predict performance on state standards tests.

Although research supports the technical adequacy of both reading aloud and maze selection, the majority of that research has been done at the elementary-school level (Wayman et al., 2007). Far less research has been conducted in reading at the secondary-school level, even though the results of cross-age studies suggest that the nature and type of CBM in reading might need to change as students become older and more proficient readers (Jenkins & Jewell, 1993; MacMillan, 2000; Yovanoff, Duesbery, Alonzo, & Tindal, 2005). Many of the studies that have been conducted in reading at the secondary-school level have focused on reading as it relates to learning in the content areas (e.g., Espin & Deno, 1993a, 1993b; Espin & Deno, 1994-1995; Fewster & MacMillan, 2002) rather than on the development of general reading proficiency. However, a small group of studies has focused on general reading proficiency.

Fuchs, Fuchs, and Maxwell (1988) examined the validity of reading aloud for students with mild disabilities across grades 4-8. Across-grade correlations between words read correctly (WRC) in 1 minute and scores on comprehension and word study subtests of a standardized achievement test were .91 and .80, respectively; however, because the study was not specifically focused on the secondary-school level, correlations were not reported separately for the secondary-school students in the study.

Three subsequent studies focused specifically on secondary-school students. Espin and Foegen (1996) examined the validity of three CBMs (reading aloud, maze selection, and vocabulary matching) on the comprehension, acquisition, and retention of expository text for students in grades 6-8. Comprehension, acquisition, and retention were measured with researcher-designed multiple-choice questions given immediately after reading (comprehension), immediately after instruction on the text (acquisition), and a week or more following instruction (retention). Correlations ranged from .54 to .65 and were similar for comprehension, acquisition, and retention measures. Brown-Chidsey, Davis, and Maya (2003) examined the reliability and validity of a 10-minute maze task (a somewhat long task by CBM standards) as an indicator of reading for students in grades 5-8. They found that scores generally differentiated students by grade level and special education status. Rasinski et al. (2005), in discussing the importance of reading fluency for high school students, reported correlations between WRC in 1 minute and scores on a state standards test of .53 for ninth-grade students. Descriptive data and methods were not reported in the article.

In sum, little research has been conducted at the secondary-school level on the development of CBM reading measures as indicators of general reading proficiency, and that which has been done has been limited in terms of measures and methodology, or has not focused specifically on secondary-school students. What is more, the research to date has focused on the characteristics of the measures as performance or static measures, not as progress or growth measures. The validity and reliability of the measures may differ based on their intended use.

In this article, we examine the technical adequacy of CBM reading measures for secondary-school students. Specifically, the reliability and validity of CBMs as predictors of performance on a state standards test in reading are examined. Differences related to time frame and scoring procedure are examined. Reading-aloud and maze-selection measures are selected because of previous research demonstrating their practical and technical adequacy at the elementary-school level and their potential promise at the secondary-school level. Time frames are examined because longer samples of work might be needed at the middle-school level to obtain a distribution of student scores. For example, reading-aloud scores might bunch together at 1 minute but spread out at 3 minutes. Finally, scoring procedures are examined to determine the influence of errors on the reliability and validity of students' scores. For example, counting the number of correct selections on a maze task is less time consuming than counting the number of correct minus incorrect selections, but using a correct minus incorrect score may help to control for guessing.

Two research questions are addressed in the study:

(1) What are the reliability and validity of reading aloud and maze selection for predicting performance on a state standards test in reading?

(2) Do reliability and validity vary with time frame and scoring procedures?

Our primary focus was on the technical adequacy of CBMs as static measures, or indicators of performance at a single point in time. However, we were also able to collect progress measures on a small subsample of the original sample. Thus, we conducted an exploratory study in which we compared the growth rates produced by reading-aloud and maze-selection measures for this subsample of students.

STUDY 1: READING ALOUD AND MAZE SELECTION AS PERFORMANCE INDICATORS

Method

Setting and Participants

The study took place in two middle schools in an urban district of a large midwestern metropolitan area. The district enrolled over 47,000 students. Seventy-five percent of the students were from diverse cultural backgrounds, 24 percent received ESL services, 67 percent were eligible for free and reduced lunches, and 13 percent were in special education. The first school had 669 students in grades 6-8. Eighty-three percent of the students were from diverse cultural backgrounds, 35 percent received ESL services, 83 percent were eligible for free and reduced lunches, and 15 percent were in special education. The second school had 778 students in grades 6-8. Sixty-two percent of the students were from diverse cultural backgrounds, 18 percent received ESL services, 56 percent were eligible for free or reduced lunches, and 16 percent were in special education.

All eighth-grade students were invited to participate in the study to ensure a range of student performance levels. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers from the two schools. Fifty-eight percent of the participants were eligible for free or reduced lunches. Students were Caucasian (34 percent), Asian American (24 percent), African American (20 percent), Hispanic (19 percent), and Native American (3 percent). Nine percent of the students were receiving special education services for learning disabilities or mild disabilities (4 percent), speech and language (3 percent), emotional and behavior disorders (1 percent), or other health impaired (1 percent). Fifty-eight percent of the students spoke English at home. The rest spoke Spanish (18.5 percent), Hmong (16 percent), Laotian (4 percent), Vietnamese (1 percent), Cambodian (1 percent), Amharic (0.5 percent), Chinese (0.5 percent), and Somali (0.5 percent). The mean standard score on the state standards reading test for Sample 1 was 626.9. This compared to a state-wide mean score of 640.6 and a district-wide mean score of 607.3.

Note that the sample did not consist of struggling readers only, even though the primary purpose of the study was to identify performance and progress measures for struggling readers. To establish the reliability and validity of CBM, it was necessary to have a sample that represented a range of student ability levels, because validity and reliability coefficients could be negatively affected by a truncated distribution of scores. We had two options. One was to select students who were struggling readers across a range of grade levels, similar to the approach taken by Fuchs et al. (1988). A second was to work within one grade level, but to include students across a range of performance levels within that grade. Given that the purpose of the study was to tie the CBM to performance on a state standards test, and given that the state standards test was given in only one grade, we chose the latter approach. This approach is not unique. In a review of the CBM research in reading (Wayman et al., 2007), 28 of the 29 technical adequacy studies conducted at the elementary-school level used general education samples (13 studies) or mixed samples of general and special education (15 studies). Only 1 used an exclusively special education sample.
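The concern about a truncated distribution can be illustrated with a small numerical sketch. The data below are invented for illustration (a criterion that tracks the predictor plus a fixed alternating error of 3 points), and the variable names are hypothetical; nothing here reproduces the study's data or analyses.

```python
from math import sqrt
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation computed directly from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up scores: the criterion follows the predictor with a +/- 3 point error.
predictor = list(range(1, 21))
criterion = [p + (3 if p % 2 == 0 else -3) for p in predictor]

# Correlation across the full range of performance.
r_full = pearson_r(predictor, criterion)

# Keep only the six lowest scorers, mimicking a sample of struggling readers.
r_truncated = pearson_r(predictor[:6], criterion[:6])
```

With these values the full-range correlation is about .90, while the correlation within the six lowest scorers drops to about .67, even though the error around the relation is identical in both cases. The measure has not become less accurate; the restricted range alone lowers the coefficient.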

Measures

Predictor variables. Predictor variables were scores on two CBM tasks: reading aloud and maze selection. The reading-aloud and maze-selection tasks were drawn from human-interest stories published in the local daily newspaper and were selected on the basis of content, readability level, length, and scores on a pilot test conducted with four students who were not involved in the study. Passages whose content was determined to be too technical or culturally specific were not used. To ensure that students would not complete the CBM tasks before time was expired, only passages that were longer than 800 words were selected. Readability was calculated using the Flesch-Kincaid formula (Kincaid, Fishburne, Rogers, & Chissom, 1975) via Microsoft Word, and the Degrees of Reading Power (DRP; Touchstone Applied Science and Associates, 2006). Readability levels for the selected passages ranged from fifth to seventh grade, and DRP levels ranged from 51 to 61. Means (number of words read aloud in 3 minutes) and standard deviations from the pilot study for selected passages were 421.5 (SD = 80.5), 489.5 (SD = 117), 432.5 (SD = 140.5), and 401.7 (SD = 75).
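For readers unfamiliar with the Flesch-Kincaid grade-level formula, a minimal sketch follows. It uses the standard published coefficients and takes word, sentence, and syllable counts as inputs; how Microsoft Word counts syllables and sentences internally is not specified here, so the counts in the example are hypothetical and the function name is an assumption for illustration.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts.

    Grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts for an 800-word passage (not taken from the study).
grade = flesch_kincaid_grade(total_words=800, total_sentences=50, total_syllables=1120)
```

With 16 words per sentence and 1.4 syllables per word, the formula gives a grade level of about 7.2. Because the formula depends only on sentence and word length, two passages with the same grade level can still differ in difficulty, which is one reason the pilot scores were also used in selection.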

The reading-aloud task was administered to students on an individual basis using standardized administration procedures. Students read aloud from the passage while the examiner followed along on a numbered copy of the same passage, making a slash through words read incorrectly or words supplied for the student. The examiner timed for 3 minutes using a stopwatch, marking progress at 1, 2, and 3 minutes. Reading aloud was scored for total words read (TWR) and WRC at 1, 2, and 3 minutes.
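The two reading-aloud scores can be sketched as a small function. The data layout is an assumption made for illustration: one flag per word in passage order (False for a slashed word) and, for each minute mark, the number of words the student had attempted. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def score_reading_aloud(word_correct: List[bool], minute_marks: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Return {minute: (TWR, WRC)} for each timing mark.

    word_correct[i] is False when word i was slashed by the examiner
    (read incorrectly or supplied). minute_marks[m] is the number of
    words attempted by the end of minute m.
    """
    scores = {}
    for minute, words_attempted in sorted(minute_marks.items()):
        if not 0 <= words_attempted <= len(word_correct):
            raise ValueError(f"mark for minute {minute} is outside the passage")
        total_words_read = words_attempted
        words_read_correctly = sum(word_correct[:words_attempted])
        scores[minute] = (total_words_read, words_read_correctly)
    return scores


# Tiny made-up record: 10 words, with words 4 and 8 slashed.
record = [True, True, True, False, True, True, True, False, True, True]
scores = score_reading_aloud(record, {1: 4, 2: 8, 3: 10})
```

In this toy record the scores are (4, 3) at 1 minute, (8, 6) at 2 minutes, and (10, 8) at 3 minutes. Marking progress during a single 3-minute reading lets all three time frames be scored from one administration.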

Maze-selection passages were created from the same stories used for reading aloud. Every seventh word was deleted and replaced by the correct choice and two distracters. The distracters were within one letter in length of the correct word but started with different letters of the alphabet and comprised different parts of speech (see Fuchs, Fuchs, Hamlett, & Ferguson, 1992, for maze-construction procedures). The three word choices were underlined in bold print and were not split at the end of the sentence in order to preserve continuity for the reader.
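The construction rules described above can be sketched in code. This is a minimal illustration, not the procedure of Fuchs et al. (1992): the function names, the word bank, and the sample sentence are hypothetical, the part-of-speech constraint is not checked, and "different letters" is interpreted here as meaning that the three choices all begin with different letters.

```python
import random


def pick_distracters(target, word_bank, rng):
    """Pick two distracters: length within one letter of the target,
    and first letters that differ from the target and from each other."""
    eligible = [
        w for w in word_bank
        if abs(len(w) - len(target)) <= 1 and w[0].lower() != target[0].lower()
    ]
    rng.shuffle(eligible)
    chosen, used_initials = [], {target[0].lower()}
    for w in eligible:
        if w[0].lower() not in used_initials:
            chosen.append(w)
            used_initials.add(w[0].lower())
        if len(chosen) == 2:
            return chosen
    return None  # not enough suitable distracters in the bank


def build_maze(words, word_bank, rng, step=7):
    """Replace every `step`-th word with a three-choice maze item."""
    items = []
    for position, word in enumerate(words, start=1):
        distracters = pick_distracters(word, word_bank, rng) if position % step == 0 else None
        if distracters is None:
            items.append(word)  # ordinary word, or no usable distracters
            continue
        choices = [word] + distracters
        rng.shuffle(choices)  # randomise the position of the correct choice
        items.append({"correct": word, "choices": choices})
    return items


# Hypothetical passage fragment and distracter bank (not from the study).
passage = "the boy ran to the old house and saw a big dog there today".split()
bank = ["cat", "jump", "green", "under", "fish", "lamp", "quick", "sing"]
maze = build_maze(passage, bank, random.Random(0))
```

In this 14-word fragment the seventh and fourteenth words ("house" and "today") become maze items; all other words are left unchanged.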

The maze-selection task was administered to students in a group setting using standardized administration procedures. Students read silently for 4 minutes, making selections for each multiple-choice item. Examiners timed for 4 minutes and instructed students to mark their progress with a slash at 2, 3, and 4 minutes. Examiners monitored to ensure that students made the slashes. Maze selection was scored for correct maze choices (CMC) and correct minus incorrect choices (CMI) in 2, 3, and 4 minutes. As a control for guessing, and following the procedures used in previous research on maze selection (Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1992), maze scoring was stopped when three consecutive incorrect choices were made. A recent investigation comparing different maze-selection scoring procedures revealed no differences in criterion-related validity associated with using a two-in-a-row versus three-in-a-row incorrect rule (Wayman et al., 2009).
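The scoring rule can be sketched as follows. The function name and the response sequence are hypothetical. The sketch also assumes that the three consecutive incorrect choices that trigger the stop are themselves counted as incorrect; the text does not say whether they are, so that detail should be treated as an assumption.

```python
def score_maze(responses, stop_after=3):
    """Score one maze probe.

    responses: booleans in passage order (True = correct choice), limited to
    the items completed within the time frame being scored.
    Returns (CMC, CMI). Scoring stops once `stop_after` consecutive incorrect
    choices have been made; later items are ignored.
    """
    correct = incorrect = run = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            run = 0
        else:
            incorrect += 1
            run += 1
            if run == stop_after:
                break  # assumption: the triggering run is counted as incorrect
    return correct, correct - incorrect


# Hypothetical response pattern: the last two correct choices are not scored.
cmc, cmi = score_maze([True, True, False, True, False, False, False, True, True])
```

Passing `stop_after=2` gives the two-in-a-row variant examined by Wayman et al. (2009).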

Criterion variables. The criterion variable in this study was performance on the Minnesota Basic Standards Test (MBST) in reading, a high-stakes test required for graduation. The MBST was designed by the state of Minnesota to test the minimum level of reading skills needed for survival (MN Department of Education, 2001) and, at the time of the study, was administered annually in the winter to all eighth-grade students in Minnesota.1 The untimed test comprised four or more passages of 500 words or more selected from newspaper and magazine articles. Passages were both narrative and expository and had average DRP levels ranging from 64 to 67. Each passage was followed by multiple-choice questions, with approximately 40 questions per test. The test was constructed so that 60 percent of the questions on the test were literal, 30 percent inferential, and 10 percent could be either. The test was machine-scored on a scale from 0 to 40, and then the raw score was converted to a scale score between 375 and 750. A passing scale score was 600, which corresponded to 75 percent correct (MN Department of Education, 2001). Students who did not pass the test were permitted to retake it two times each year. Students had to pass the test in order to graduate from high school.

The MBST Technical Manual (MN Department of Education, 2001) reported reliability and validity information for the MBST Reading test. Internal consistency measures for reliability were based on the Rasch model index of person separation. The Kuder-Richardson 20 internal consistency reliability estimate was .90. No alternate-form reliability was calculated. Content validity, according to the manual, was determined by the relationship of the reading test items to statewide content standards, as verified by educators, item developers, and experts in the field. Construct validity was measured by item point-biserial correlations (the correlation between students' raw scores on the MBST and their scores on individual test items). The mean point-biserial correlation was .38. There were no criterion-related validity statistics noted.

Procedures

In the fall, students completed two maze passages in a group setting in their classrooms. On a subsequent day in the same week, students completed two reading-aloud passages individually. Type of measure (reading aloud vs. maze selection) and passage were counterbalanced across students, as was the order in which the students completed the passages within reading aloud or maze selection. Examples of each task were given to students prior to administration. The MBST was administered by teachers to students in February.

Sixteen graduate students administered and scored the reading-aloud and maze-selection measures. Prior to data collection, the graduate students were interviewed by members of the research team to ascertain their ability to work with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
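The agreement index used during training (smaller score divided by larger score, multiplied by 100) is simple enough to show directly. The function name and the example scores are hypothetical; treating two scores of zero as full agreement is an assumption added to avoid division by zero.

```python
def ratio_agreement(score_a: float, score_b: float) -> float:
    """Inter-scorer agreement as (smaller score / larger score) * 100."""
    larger = max(score_a, score_b)
    if larger == 0:
        return 100.0  # assumption: two zero scores count as full agreement
    return min(score_a, score_b) / larger * 100


# Hypothetical example: a data collector scores 118 WRC, the trainer 122 WRC.
agreement = ratio_agreement(118, 122)
print(round(agreement, 1))
```

Under this index the hypothetical pair above would clear the 90 percent criterion reported for reading aloud.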

Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475-750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar for scoring method but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85 regardless of scoring method.
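Alternate-form reliability of this kind is a correlation between students' scores on two forms of the same measure. The sketch below assumes a Pearson product-moment correlation (the usual choice, although the text does not name the coefficient); the function name and the scores for five students are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on each form must vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical WRC scores for five students on two reading-aloud forms.
form_a = [98, 120, 135, 150, 171]
form_b = [104, 115, 140, 149, 165]
reliability = pearson_r(form_a, form_b)
```

The same function applied to CBM scores and MBST scale scores would give a predictive validity coefficient of the kind reported in Table 3.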

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-based measure and scoring procedure      Time frame

Reading aloud                        1 minute          2 minutes         3 minutes
  Total words read                   125.88 (43.75)    250.46 (85.05)    373.31 (125.95)
  Words read correct                 119.82 (47.29)    238.54 (92.14)    355.27 (136.92)

Maze selection                       2 minutes         3 minutes         4 minutes
  Correct choices                    12.33 (7.12)      18.76 (10.87)     25.24 (14.53)
  Correct minus incorrect choices    11.18 (7.53)      17.17 (11.40)     23.10 (l 5 7

Note. Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-based measure and scoring procedure      Time frame

Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .93          .96          .95
  Words read correct                 .94          .96          .94

Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .80          .86          .88
  Correct minus incorrect choices    .79          .86          .96

Note. All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Curriculum-based measure and scoring procedure      Time frame

Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .76          .77          .76
  Words read correct                 .78          .79          .78

Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .75          .77          .80
  Correct minus incorrect choices    .77          .78          .81

Note. All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for the maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.

Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade-level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Tichá, Espin, & Wayman, 2009). In addition, Tichá et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Tichá et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
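The idea of relating CBM scores to the likelihood of passing along the whole performance continuum can be illustrated with a simple tabulation. This sketch is not the method of Espin et al. (2008), which is not described here; it merely groups students into score bands and reports the observed pass rate in each band. The function name, the band width, and the records are all hypothetical.

```python
from collections import defaultdict


def pass_rate_by_band(records, band_width=25):
    """Observed pass rate per CBM score band.

    records: iterable of (wrc_score, passed) pairs.
    Returns {band_lower_bound: proportion_passing}, ordered by band.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [number passing, number of students]
    for wrc, passed in records:
        band = (wrc // band_width) * band_width
        counts[band][0] += int(passed)
        counts[band][1] += 1
    return {band: n_pass / n for band, (n_pass, n) in sorted(counts.items())}


# Hypothetical district data: (WRC in 1 minute, passed the state test?).
district = [
    (95, False), (98, False),
    (102, False), (105, False), (110, True), (120, False),
    (126, True), (130, True), (140, True), (149, True),
]
table = pass_rate_by_band(district)
```

With real district data, each band would need enough students for the pass rate to be stable; a teacher could then read off the band in which a student currently scores and choose a goal in a band with a higher observed pass rate.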


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques, such as Hierarchical Linear Modeling, now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task administered in a group setting by the classroom teacher and a reading-aloud task administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth- and seventh-grade levels. Passages were, on average, 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92 and for maze selection from .69 to .90.

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks, no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages     Reading aloud, words correct, 1 minute    Maze, correct choices, 3 minutes
1 and 2      .92                                       .72
2 and 3      .91                                       .84
3 and 4      .85                                       .69
4 and 5      .88                                       .80
5 and 6      .88                                       .80
6 and 7      .86                                       .85
7 and 8      .79                                       .90
8 and 9      .84                                       .83
9 and 10     .83                                       .74

Note. n = 25 to 3. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\hat{\beta}_1 = 0.84$, t(245) = 2.63, p = .009, and the intercept was significant, $\hat{\beta}_0 = 139.01$, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of 0.84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\hat{\beta}_1 = -0.41$, t(29.4) = -0.55, p = .58, but the intercept was significant, $\hat{\beta}_0 = 420.04$, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \tag{2}$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 21.22$, t(25.3) = 17.22, p < .0001; $\hat{\beta}_1 = 2.88$, t(32.7) = 12.79, p < .0001; and $\hat{\beta}_2 = -7.26$, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave but a decrease of 7.26 per wave at wave 8.
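To make the fixed-effects part of Equations (1) and (2) concrete, the sketch below computes model-predicted means by wave. It is an illustration only: the function names are hypothetical, the random effects are ignored, the spline term follows Equation (2) literally (an additional linear term that is zero through wave 8's time code and grows by one per wave afterward), and the coefficients are the rounded estimates as read from the text (0.84 and 139.01 for 1-minute WRC; 21.22, 2.88, and -7.26 for 3-minute CMC).

```python
def linear_mean(wave, b0, b1):
    """Predicted mean under Equation (1), with t = wave - 1."""
    t = wave - 1
    return b0 + b1 * t


def spline_mean(wave, b0, b1, b2, knot_wave=8):
    """Predicted mean under Equation (2).

    t = wave - 1; t_star = 0 before the knot and (t - 7) from wave 8 onward
    when knot_wave is 8, as defined in the text.
    """
    t = wave - 1
    t_star = max(0, t - (knot_wave - 1))
    return b0 + b1 * t + b2 * t_star


# Rounded estimates as read from the text; treat them as approximate.
wrc1_pred = [linear_mean(w, 139.01, 0.84) for w in range(1, 11)]
cmc3_pred = [spline_mean(w, 21.22, 2.88, -7.26) for w in range(1, 11)]

for wave, (wrc, cmc) in enumerate(zip(wrc1_pred, cmc3_pred), start=1):
    print(f"wave {wave:2d}: WRC1 = {wrc:6.2f}, CMC3 = {cmc:5.2f}")
```

Under this reading, the predicted 3-minute CMC mean rises by 2.88 per wave through wave 8 and then changes by 2.88 - 7.26 = -4.38 per wave, which is one way to interpret the "change of direction" described for the maze scores.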

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 13.58$, t(24.00) = 16.93, p < .0001; $\hat{\beta}_1 = 2.17$, t(32.7) = 12.85, p < .0001; and $\hat{\beta}_2 = -5.75$, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. Top panel: Linear Model, Words Read Correct, 1 Minute. Bottom panel: Linear Model, Words Read Correct, 3 Minutes. Horizontal axis: Wave.

TABLE 5
Detailed Results of the Growth Curve Analysis

Measure   Parameter   Estimate    SE         df      t value   p value
WRC1      β0          138.1       5.5538     28.8     24.87    <.0001
          β1            0.839     0.319      245       2.63     .0091
WRC3      β0          420.04     15.7232     29       26.71    <.0001
          β1           -0.4057    0.7333     29.4     -0.55     .5842
CMC2      β0           13.5786    0.8022     24       16.93    <.0001
          β1            2.1674    0.1686     32.7     12.85    <.0001
          β2           -5.747     0.5642     225     -10.19    <.0001
CMC3      β0           21.22      1.2321     25.3     17.22    <.0001
          β1            2.8808    0.2253     32.7     12.79    <.0001
          β2           -7.2639    0.6716     227     -10.82    <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (0.84 words per week) reading-aloud measure but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \tag{3}$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\hat{\beta}_4 = 0.009$, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
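As a rough reading of Equation (3), and ignoring the random effects, the slope over waves 1-7 for a student with MBST score $m_i$ is $\beta_1 + \beta_4 m_i$. Taking the interaction estimate as 0.009 (a rounded value, so the result is approximate), the difference in slope between the two MBST scores used for the predicted curves works out as:

```latex
\[
(\beta_1 + 700\,\beta_4) - (\beta_1 + 500\,\beta_4)
  = 200\,\beta_4
  \approx 200 \times 0.009
  = 1.8 \ \text{CMC per wave}.
\]
```

That is, under these assumptions a student scoring 700 on the MBST would be expected to gain roughly 1.8 more correct maze choices per wave than a student scoring 500.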

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (0.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.
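A growth rate expressed in units per week can be approximated for a single student with an ordinary least-squares slope. The sketch below is a simplified stand-in, not the mixed-effects growth-curve analysis used in the study; the function name and the example scores are hypothetical, and it assumes one score per week with no missing weeks.

```python
# Per-student weekly growth rate via ordinary least squares.
# This is a simplification of the mixed-effects models in the study.
def weekly_slope(scores):
    """Return the OLS slope of scores regressed on week number (1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two weekly scores are needed")
    weeks = list(range(1, n + 1))
    mean_week = sum(weeks) / n
    mean_score = sum(scores) / n
    # Sum of cross-products and sum of squares for the week variable.
    sxy = sum((w - mean_week) * (s - mean_score) for w, s in zip(weeks, scores))
    sxx = sum((w - mean_week) ** 2 for w in weeks)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical correct maze choices over four weeks.
    print(weekly_slope([10, 13, 16, 19]))
```

Per-student slopes of this kind are what fan out in a plot such as Figure 3 when higher-performing students grow faster than lower-performing students.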

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Two panels: "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; x-axis: wave; series: observed and predicted means.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Two panels: "Words Read Correctly in 1 Minute" (top) and "Correct Maze Choices in 3 Minutes" (bottom); x-axis: wave.]

addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [x-axis: wave; y-axis: correct maze choices.]

measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure     Time
Reading aloud                         1 minute    2 minutes    3 minutes
  Total words read                    .30         .32          .33
  Words read correct                  .32         [illegible]  .34
Maze selection                        2 minutes   3 minutes    4 minutes
  Correct choices                     .52         .50          .48
  Correct minus incorrect choices     .57         .55          .51

n = 31. p < .01 (marker placement in the table is illegible).
Note. MBST = Minnesota Basic Standards Test.

others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy

or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristics of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth, and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
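The group-administration check described above can be sketched in a few lines of Python. The sketch compares each passage's group mean with the median of all passage means and flags passages that deviate by more than a tolerance. The function name, the tolerance of 5 points, and the example scores are hypothetical choices made for illustration; they are not values or procedures from the study.

```python
from statistics import mean, median


def flag_nonequivalent_passages(scores_by_passage, tolerance=5.0):
    """Return passages whose group mean is far from the median passage mean.

    scores_by_passage maps a passage label to the scores obtained by one
    group of students on that passage. The tolerance is an assumed cutoff.
    """
    passage_means = {label: mean(scores) for label, scores in scores_by_passage.items()}
    center = median(passage_means.values())
    return sorted(
        label for label, value in passage_means.items()
        if abs(value - center) > tolerance
    )


if __name__ == "__main__":
    # Hypothetical correct-maze-choice scores for four passages.
    example = {
        "passage_6": [29, 31, 33],
        "passage_7": [30, 32, 34],
        "passage_8": [45, 47, 49],
        "passage_9": [24, 25, 26],
    }
    print(flag_nonequivalent_passages(example))
```

A check of this kind only separates passage effects from growth if the same group takes all passages close together in time or in counterbalanced order; otherwise a high mean may reflect genuine improvement.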

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-84.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989, March). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between curriculum-based measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


(retention). Correlations ranged from .54 to .65 and were similar for comprehension, acquisition, and retention measures. Brown-Chidsey, Davis, and Maya (2003) examined the reliability and validity of a 10-minute maze task (a somewhat long task by CBM standards) as an indicator of reading for students in grades 5-8. They found that scores generally differentiated students by grade level and special education status. Rasinski et al. (2005), in discussing the importance of reading fluency for high school students, reported correlations between WRC in 1 minute and scores on a state standards test of .53 for ninth-grade students. Descriptive data and methods were not reported in the article.

In sum, little research has been conducted at the secondary-school level on the development of CBM reading measures as indicators of general reading proficiency, and that which has been done has been limited in terms of measures and methodology or has not focused specifically on secondary-school students. What is more, the research to date has focused on the characteristics of the measures as performance or static measures, not as progress or growth measures. The validity and reliability of the measures may differ based on their intended use.

In this article, we examine the technical adequacy of CBM reading measures for secondary-school students. Specifically, the reliability and validity of CBMs as predictors of performance on a state standards test in reading are examined. Differences related to time frame and scoring procedure are examined. Reading-aloud and maze-selection measures are selected because of previous research demonstrating their practical and technical adequacy at the elementary-school level and their potential promise at the secondary-school level. Time frames are examined because longer samples of work might be needed at the middle-school level to obtain a distribution of student scores. For example, reading-aloud scores might bunch together at 1 minute but spread out at 3 minutes. Finally, scoring procedures are examined to determine the influence of errors on the reliability and validity of students' scores. For example, counting the number of correct selections on a maze task is less time consuming than counting the number of correct minus incorrect selections, but using a correct minus incorrect score may help to control for guessing.

Two research questions are addressed in the study:

(1) What are the reliability and validity of reading aloud and maze selection for predicting performance on a state standards test in reading?

(2) Do reliability and validity vary with time frame and scoring procedures?

Our primary focus was on the technical adequacy of CBMs as static measures, or indicators of performance at a single point in time. However, we were also able to collect progress measures on a small subsample of the original sample. Thus, we conducted an exploratory study in which we compared the growth rates produced by reading-aloud and maze-selection measures for this subsample of students.

STUDY 1: READING ALOUD AND MAZE SELECTION AS PERFORMANCE INDICATORS

Method

Setting and Participants

The study took place in two middle schools in an urban district of a large midwestern metropolitan area. The district enrolled over 47,000 students. Seventy-five percent of the students were from diverse cultural backgrounds, 24 percent received ESL services, 67 percent were eligible for free and reduced lunches, and 13 percent were in special education. The first school had 669 students in grades 6-8. Eighty-three percent of the students were from diverse cultural backgrounds, 35 percent received ESL services, 83 percent were eligible for free and reduced lunches, and 15 percent were in special education. The second school had 778 students in grades 6-8. Sixty-two percent of the students were from diverse cultural backgrounds, 18 percent received ESL services, 56 percent were eligible for free or reduced lunches, and 16 percent were in special education.

All eighth-grade students were invited to participate in the study to ensure a range of student performance levels. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers from the two schools. Fifty-eight percent of the participants were eligible for free or reduced lunches. Students were Caucasian (34 percent), Asian American (24 percent), African American (20 percent), Hispanic (19 percent), and Native American (3 percent). Nine percent of the students were receiving special education services for learning disabilities or mild disabilities (4 percent), speech and language (3 percent), emotional and behavior disorders (1 percent), or other health impaired (1 percent). Fifty-eight percent of the students spoke English at home. The rest spoke Spanish (18.5 percent), Hmong (16 percent), Laotian (4 percent), Vietnamese (1 percent), Cambodian (1 percent), Amharic (.5 percent), Chinese (.5 percent), and Somali (.5 percent). The mean standard score on the state standards reading test for Sample 1 was 626.9. This compared to a state-wide mean score of 640.6 and a district-wide mean score of 607.3.

Note that the sample did not consist of struggling readers only, even though the primary purpose of the study was to identify performance and progress measures for struggling readers. To establish the reliability and validity of CBM, it was necessary to have a sample that represented a range of student ability levels, because validity and reliability coefficients could be negatively affected by a truncated distribution of scores. We had two options. One was to select students who were struggling readers across a range of grade levels, similar to the approach taken by Fuchs et al. (1988). A second was to work within one grade level, but to include students across a range of performance levels within that grade. Given that the purpose of the study was to tie the CBM to performance on a state standards test, and given that the state standards test was given in only one grade, we chose the latter approach. This approach is not unique. In a review of the CBM research in reading (Wayman et al., 2007), 28 of the 29 technical adequacy studies conducted at the elementary-school level used general education samples (13 studies) or mixed samples of general and special education (15 studies). Only 1 used an exclusively special education sample.

Measures

Predictor variables. Predictor variables were scores on two CBM tasks: reading aloud and maze selection. The reading-aloud and maze-selection tasks were drawn from human-interest stories published in the local daily newspaper and were selected on the basis of content, readability level, length, and scores on a pilot test conducted with four students who were not involved in the study. Passages whose content was determined to be too technical or culturally specific were not used. To ensure that students would not complete the CBM tasks before time expired, only passages that were longer than 800 words were selected. Readability was calculated using the Flesch-Kincaid formula (Kincaid, Fishburne, Rogers, & Chissom, 1975) via Microsoft Word and the Degrees of Reading Power (DRP; Touchstone Applied Science and Associates, 2006). Readability levels for the selected passages ranged from fifth to seventh grade, and DRP levels ranged from 51 to 61. Means (number of words read aloud in 3 minutes) and standard deviations from the pilot study for selected passages were 421.5 (SD = 80.5), 489.5 (SD = 117), 432.5 (SD = 140.5), and 401.7 (SD = 75).

The reading-aloud task was administered to students on an individual basis using standardized administration procedures. Students read aloud from the passage while the examiner followed along on a numbered copy of the same passage, making a slash through words read incorrectly or words supplied for the student. The examiner timed for 3 minutes using a stopwatch, marking progress at 1, 2, and 3 minutes. Reading aloud was scored for total words read (TWR) and WRC at 1, 2, and 3 minutes.

Maze-selection passages were created from the same stories used for reading aloud. Every seventh word was deleted and replaced by the correct choice and two distracters. The distracters were within one letter in length of the correct word but started with different letters of the alphabet and comprised different parts of speech (see Fuchs, Fuchs, Hamlett, & Ferguson, 1992, for maze-construction procedures). The three word choices were underlined in bold print and were not split at the end of the sentence, in order to preserve continuity for the reader.

The maze-selection task was administered to students in a group setting using standardized administration procedures. Students read silently for 4 minutes, making selections for each multiple-choice item. Examiners timed for 4 minutes and instructed students to mark their progress with a slash at 2, 3, and 4 minutes. Examiners monitored to ensure that students made the slashes. Maze selection was scored for correct maze choices (CMC) and correct minus incorrect choices (CMI) in 2, 3, and 4 minutes. As a control for guessing, and following the procedures used in previous research on maze selection (Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1992), maze scoring was stopped when three consecutive incorrect choices were made. A recent investigation comparing different maze-selection scoring procedures revealed no differences in criterion-related validity associated with using a two-in-a-row versus three-in-a-row incorrect rule (Wayman et al., 2009).
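The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `score_maze` and the input format (a list of booleans in answer order) are hypothetical, and we assume that the three incorrect choices that trigger the stop are themselves counted as incorrect while everything after them is ignored. The source does not spell out that detail.

```python
def score_maze(responses, stop_after=3):
    """Score one maze probe.

    responses: list of booleans in the order the student answered
               (True = correct choice, False = incorrect choice).
    stop_after: number of consecutive incorrect choices that ends scoring
                (3 in the study; a two-in-a-row rule would use 2).
    Returns (CMC, CMI): correct maze choices, and correct minus incorrect.
    """
    correct = 0
    incorrect = 0
    run = 0  # length of the current run of consecutive incorrect choices
    for is_correct in responses:
        if is_correct:
            correct += 1
            run = 0
        else:
            incorrect += 1
            run += 1
            if run == stop_after:
                # Assumption: the triggering errors are counted, later items are not.
                break
    return correct, correct - incorrect
```

For a 2-, 3-, or 4-minute score, the same function would be applied to the responses made up to the corresponding slash mark.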

Criterion variables. The criterion variable in this study was performance on the Minnesota Basic Standards Test (MBST) in reading, a high-stakes test required for graduation. The MBST was designed by the state of Minnesota to test the minimum level of reading skills needed for survival (MN Department of Education, 2001) and, at the time of the study, was administered annually in the winter to all eighth-grade students in Minnesota.¹ The untimed test comprised four or more passages of 500 words or more selected from newspaper and magazine articles. Passages were both narrative and expository and had average DRP levels ranging from 64 to 67. Each passage was followed by multiple-choice questions, with approximately 40 questions per test. The test was constructed so that 60 percent of the questions on the test were literal, 30 percent inferential, and 10 percent could be either. The test was machine-scored on a scale from 0 to 40, and then the raw score was converted to a scale score between 375 and 750. A passing scale score was 600, which corresponded to 75 percent correct (MN Department of Education, 2001). Students who did not pass the test were permitted to retake it two times each year. Students had to pass the test in order to graduate from high school.

The MBST Technical Manual (MN Department of Education, 2001) reported reliability and validity information for the MBST Reading test. Internal consistency measures for reliability were based on the Rasch model index of person separation. The Kuder-Richardson 20 internal consistency reliability estimate was .90. No alternate-form reliability was calculated. Content validity, according to the manual, was determined by the relationship of the reading test items to statewide content standards as verified by educators, item developers, and experts in the field. Construct validity was measured by item point-biserial correlations (the correlation between students' raw scores on the MBST and their scores on individual test items). The mean point-biserial correlation was .38. There were no criterion-related validity statistics noted.

Procedures

In the fall, students completed two maze passages in a group setting in their classrooms. On a subsequent day in the same week, students completed two reading-aloud passages individually. Type of measure (reading aloud vs. maze selection) and passage were counterbalanced across students, as was the order in which the students completed the passages within reading aloud or maze selection. Examples of each task were given to students prior to administration. The MBST was administered by teachers to students in February.

Sixteen graduate students administered and scored the reading-aloud and maze-selection measures. Prior to data collection, the graduate students were interviewed by members of the research team to ascertain their ability to work with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
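The smaller-over-larger agreement index described above is simple enough to sketch directly. The function name `interscorer_agreement` and the example scores are hypothetical, and the handling of two zero scores (treated here as full agreement) is an assumption that the source does not address.

```python
def interscorer_agreement(score_a, score_b):
    """Percentage agreement between two scorers on one probe:
    smaller score divided by larger score, multiplied by 100."""
    if score_a < 0 or score_b < 0:
        raise ValueError("scores must be non-negative")
    larger = max(score_a, score_b)
    if larger == 0:
        # Assumption: two scores of zero count as full agreement.
        return 100.0
    return min(score_a, score_b) / larger * 100


# Hypothetical example: a data collector records 118 WRC, the trainer 125 WRC.
example = interscorer_agreement(118, 125)
```

Note that this index compares total scores only, so two scorers who disagree on which words were errors but arrive at the same total would still show 100 percent agreement.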

Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes, and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475-750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar for scoring method but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85 regardless of scoring method.
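As a rough illustration of the alternate-form reliability computation, the sketch below calculates a Pearson correlation between students' scores on two forms of the same measure. The scores are invented for illustration and are not data from the study, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented WRC-in-1-minute scores for five students on two passages (forms).
form_a = [98, 120, 135, 150, 110]
form_b = [102, 118, 140, 146, 115]
reliability = pearson_r(form_a, form_b)
```

The same function would be applied separately to each time frame and scoring procedure to fill in a table of coefficients.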

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time
Reading aloud                        1 minute          2 minutes         3 minutes
  Total words read                   125.88 (43.75)    250.46 (85.05)    373.31 (125.95)
  Words read correct                 119.82 (47.29)    238.54 (92.14)    355.27 (136.92)
Maze selection                       2 minutes         3 minutes         4 minutes
  Correct choices                    12.33 (7.12)      18.76 (10.87)     25.24 (14.53)
  Correct minus incorrect choices    11.?8 (7.53)      17.17 (11.40)     23.10 (15.?7)

Note. Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time
Reading aloud                        1 minute    2 minutes    3 minutes
  Total words read                   .93         .96          .95
  Words read correct                 .94         .96          .94
Maze selection                       2 minutes   3 minutes    4 minutes
  Correct choices                    .80         .86          .88
  Correct minus incorrect choices    .79         .86          .96

Note. All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                Time
Reading aloud                        1 minute    2 minutes    3 minutes
  Total words read                   .76         .77          .76
  Words read correct                 .78         .79          .78
Maze selection                       2 minutes   3 minutes    4 minutes
  Correct choices                    .75         .77          .80
  Correct minus incorrect choices    .77         .78          .81

Note. All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for the maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.

Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
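To make the idea of reporting pass rates along the whole performance continuum concrete, the sketch below tabulates the observed percentage of students passing within bands of WRC scores. This is a simple empirical tabulation offered as an assumption-laden illustration; it is not the procedure of Espin et al. (2008), which is not described here. The function name `probable_success_table`, the 25-WRC band width, and the records are all hypothetical.

```python
def probable_success_table(records, band_width=25):
    """Tabulate the percentage of students passing the state test by CBM score band.

    records: iterable of (wrc, passed) pairs, where wrc is words read correctly
             in 1 minute and passed is True/False for the state standards test.
    Returns {band_start: (n_students, percent_passing)}, ordered by band.
    """
    counts = {}
    for wrc, passed in records:
        band = (wrc // band_width) * band_width  # lower edge of the score band
        n, k = counts.get(band, (0, 0))
        counts[band] = (n + 1, k + (1 if passed else 0))
    return {band: (n, 100 * k / n) for band, (n, k) in sorted(counts.items())}


# Invented district records, for illustration only.
records = [(100, False), (110, True), (120, False), (124, False),
           (126, True), (140, True), (149, False)]
table = probable_success_table(records)
```

With real district data, bands containing few students would give unstable percentages, so band width would need to be chosen with sample size in mind.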


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques such as Hierarchical Linear Modeling now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth- and seventh-grade levels. Passages were, on average, 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection, from .69 to .90.

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages                                  1 and 2  2 and 3  3 and 4  4 and 5  5 and 6  6 and 7  7 and 8  8 and 9  9 and 10
Reading aloud, words correct, 1 minute    .92      .91      .85      .88      .88      .86      .79      .84      .83
Maze, correct choices, 3 minutes          .72      .84      .69      .80      .80      .85      .90      .83      .74

Note. n = 25 to 31. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i$ is the participant subscript ($i = 1, \ldots, N$) and $j$ is the wave subscript ($j = 1, \ldots, n_i$). In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
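To give a feel for what an individual's intercept and slope mean under this time coding, the sketch below fits a separate ordinary least squares line to one student's weekly scores, with time coded as wave minus 1. This is a deliberately simplified stand-in and not the analysis used in the study: the study fit a linear mixed effects model by restricted maximum likelihood, which pools information across students rather than fitting each student separately. The function name `ols_line` and the example scores are hypothetical.

```python
def ols_line(scores):
    """Fit y = intercept + slope * t by ordinary least squares for one student.

    scores: list of scores ordered by wave (wave 1 first); None marks a
            missed week. Time is coded t = wave - 1, so the intercept is the
            fitted status at wave 1.
    Returns (intercept, slope).
    """
    points = [(t, y) for t, y in enumerate(scores) if y is not None]
    if len(points) < 2:
        raise ValueError("need at least two observed waves")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sty / stt
    return mean_y - slope * mean_t, slope


# Invented weekly WRC scores for one student, with one missed week.
intercept, slope = ols_line([130, 132, None, 131, 135, 134])
```

Per-student slopes of this kind are noisier than the mixed-model estimates, which is one reason mixed models are preferred when only about 10 data points per student are available.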

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\hat{\beta}_1 = 0.84$, $t(245) = 2.63$, $p = .009$, and the intercept was significant, $\hat{\beta}_0 = 139.01$, $t(28.8) = 24.87$, $p < .0001$. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants ($N$). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., $t(28.8) = 2.63$, $p = .014$). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of 0.84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\hat{\beta}_1 = -0.41$, $t(29.4) = -0.55$, $p = .58$, but the intercept was significant, $\hat{\beta}_0 = 420.04$, $t(29) = 26.71$, $p < .0001$. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \tag{2}$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 21.22$, $t(25.3) = 17.22$, $p < .0001$; $\hat{\beta}_1 = 2.88$, $t(32.7) = 12.79$, $p < .0001$; and $\hat{\beta}_2 = -7.26$, $t(227) = -10.82$, $p < .0001$. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
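The fixed-effects part of the spline model can be evaluated directly to see the shape it implies. The sketch below computes predicted mean CMC in 3 minutes at each of the 10 waves from the rounded estimates reported for the 3-minute maze (21.22, 2.88, and -7.26). It takes the time coding literally as stated in the text, with the first time variable equal to wave minus 1 and the second equal to zero through wave 7 and to the first minus 7 thereafter; that literal reading, the function name `spline_predicted_mean`, and the use of rounded estimates are assumptions of this illustration. Random effects are ignored, so these are population-average predictions only.

```python
def spline_predicted_mean(wave, b0, b1, b2, knot_t=7):
    """Fixed-effects prediction from the piecewise-linear (spline) growth model.

    wave: measurement occasion, 1..10
    b0, b1, b2: intercept, slope for the first time variable, and coefficient
                for the second (post-knot) time variable
    knot_t: value of t at which the second time variable starts to increase
    """
    t = wave - 1                  # first time variable, t = j - 1
    t_star = max(0, t - knot_t)   # second time variable, zero up to the knot
    return b0 + b1 * t + b2 * t_star


# Rounded estimates reported for CMC in 3 minutes.
cmc3_predicted = [spline_predicted_mean(j, 21.22, 2.88, -7.26) for j in range(1, 11)]
```

Under this coding the predicted means rise through wave 8 and fall afterward, because the second coefficient is added to the ongoing slope once the second time variable becomes positive.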

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 13.58$, $t(24.00) = 16.93$, $p < .0001$; $\hat{\beta}_1 = 2.17$, $t(32.7) = 12.85$, $p < .0001$; and $\hat{\beta}_2 = -5.75$, $t(225) = -10.19$, $p < .0001$. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave, but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. [Top panel: Linear Model, Words Read Correct, 1 Minute (WRC1 observed and predicted, by wave). Bottom panel: Linear Model, Words Read Correct, 3 Minutes (WRC3 observed and predicted, by wave).]

TABLE 5
Detailed Results of the Growth Curve Analysis

Parameter    Estimate    SE         df      t value    p value
WRC1
  β0         138.1       5.5538     28.8    24.87      <.0001
  β1         0.839       0.319      245     2.63       .0091
WRC3
  β0         420.04      15.7232    29      26.71      <.0001
  β1         -0.4057     0.7333     29.4    -0.55      .5842
CMC2
  β0         13.5786     0.8022     24      16.93      <.0001
  β1         2.1674      0.1686     32.7    12.85      <.0001
  β2         -5.747      0.5642     225     -10.19     <.0001
CMC3
  β0         21.22       1.2321     25.3    17.22      <.0001
  β1         2.8808      0.2253     32.7    12.79      <.0001
  β2         -7.2639     0.6716     227     -10.82     <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (0.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $y_{ij} = b_{0i} + b_{1i}t_{ij} + e_{ij}$ or $y_{ij} = b_{0i} + b_{1i}t_{ij} + b_{2i}t_{ij}^{+} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the $i$th participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^{+} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij} \quad (3)$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\hat{\beta}_4$ = 0.009, t(31.3) = 2.35, p = 0.025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
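A minimal sketch of how predicted curves of this kind could be computed from a model with the structure of Equation (3) follows. Everything except the interaction coefficient is an assumption: the coefficient values are hypothetical placeholders rather than the estimates from this study, the interaction coefficient is read here as 0.009, and the spline term is assumed to be a truncated line that switches on after wave 7.

```python
def predicted_cmc(wave, mbst, coefs, knot=7):
    """Predicted correct maze choices for one wave and one MBST score.

    coefs = (b0, b1, b2, b3, b4) in the order of Equation (3).
    The spline term max(0, wave - knot) is an assumed form, not taken from the study.
    """
    b0, b1, b2, b3, b4 = coefs
    spline = max(0, wave - knot)
    return b0 + b1 * wave + b2 * spline + b3 * mbst + b4 * mbst * wave


# Hypothetical coefficients for illustration; only b4 = 0.009 reflects the reported value.
COEFS = (0.0, -3.0, -2.0, 0.03, 0.009)

curve_passing = [predicted_cmc(w, 700, COEFS) for w in range(1, 11)]
curve_not_passing = [predicted_cmc(w, 500, COEFS) for w in range(1, 11)]

# Before the knot, the weekly slopes differ by b4 * (700 - 500) = 1.8 choices per week.
slope_gap = (curve_passing[1] - curve_passing[0]) - (
    curve_not_passing[1] - curve_not_passing[0]
)
```

Under these assumptions, a 200-point difference in MBST score corresponds to a difference of 1.8 correct maze choices per week in the rate of growth across waves 1-7.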

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (0.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In


[Figure 2: two panels, "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; observed (Obs) and predicted (Pred) means plotted by wave.]

FIGURE 2 Three- and 2-minute maze selection (correct maze choices) observed and predicted means by wave.


[Figure 3: top panel, "Words Read Correctly in 1 Minute" by wave; bottom panel, "Correct Maze Choices in 3 Minutes" by wave, with the mean marked.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices).

addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the

[Figure 4: predicted correct maze choices in 3 minutes plotted by wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores.

measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to

TABLE 6 Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure     Time
Reading aloud                         1 minute    2 minutes     3 minutes
  Total words read                    .30         .32           .33
  Words read correct                  .32         [illegible]   .34
Maze selection                        2 minutes   3 minutes     4 minutes
  Correct choice                      .52**       .50**         .48**
  Correct minus incorrect choices     .57**       .55**         .51**

n = 31. **p < .01.

Note. MBST = Minnesota Basic Standards Test.

others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
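The first approach, administering the passages to a group and checking whether the group scores unusually high or low on particular passages, might be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the cutoff of 1.5 standard deviations, and the scores are hypothetical and are not taken from this study or from the cited sources.

```python
from statistics import mean, pstdev


def flag_passages(scores_by_passage, z_cutoff=1.5):
    """Return passages whose group mean lies far from the average passage mean.

    scores_by_passage maps a passage label to the scores of one group of students.
    z_cutoff is an arbitrary, hypothetical screening threshold.
    """
    means = {p: mean(s) for p, s in scores_by_passage.items()}
    values = list(means.values())
    grand, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # every passage produced the same group mean
    return [p for p, m in means.items() if abs(m - grand) / spread > z_cutoff]


# Hypothetical pilot scores from the same three students on five passages.
pilot = {
    "A": [20, 22, 24],
    "B": [21, 23, 25],
    "C": [22, 22, 22],
    "D": [23, 21, 22],
    "E": [35, 38, 41],
}
suspect = flag_passages(pilot)
```

A passage flagged in this way would be a candidate for replacement or closer inspection before being used to monitor progress.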

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.


Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, L., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136-148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


(Wayman et al., 2007), 28 of the 29 technical adequacy studies conducted at the elementary-school level used general education samples (13 studies) or mixed samples of general and special education (15 studies). Only 1 used an exclusively special education sample.

Measures

Predictor variables. Predictor variables were scores on two CBM tasks: reading aloud and maze selection. The reading-aloud and maze-selection tasks were drawn from human-interest stories published in the local daily newspaper and were selected on the basis of content, readability level, length, and scores on a pilot test conducted with four students who were not involved in the study. Passages whose content was determined to be too technical or culturally specific were not used. To ensure that students would not complete the CBM tasks before time expired, only passages that were longer than 800 words were selected. Readability was calculated using the Flesch-Kincaid formula (Kincaid, Fishburne, Rogers, & Chissom, 1975) via Microsoft Word, and the Degrees of Reading Power (DRP; Touchstone Applied Science and Associates, 2006). Readability levels for the selected passages ranged from fifth to seventh grade, and DRP levels ranged from 51 to 61. Means (number of words read aloud in 3 minutes) and standard deviations from the pilot study for selected passages were 421.5 (SD = 80.5), 489.5 (SD = 11.7), 432.5 (SD = 140.5), and 401.7 (SD = 7.5).

The reading-aloud task was administered to students on an individual basis using standardized administration procedures. Students read aloud from the passage while the examiner followed along on a numbered copy of the same passage, making a slash through words read incorrectly or words supplied for the student. The examiner timed for 3 minutes using a stopwatch, marking progress at 1, 2, and 3 minutes. Reading aloud was scored for total words read (TWR) and WRC at 1, 2, and 3 minutes.

Maze-selection passages were created from the same stories used for reading aloud. Every seventh word was deleted and replaced by the correct choice and two distracters. The distracters were within one letter in length of the correct word, but started with different letters of the alphabet and comprised different parts of speech (see Fuchs, Fuchs, Hamlett, & Ferguson, 1992, for maze-construction procedures). The three word choices were underlined in bold print and were not split at the end of the sentence in order to preserve continuity for the reader.
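The construction rule can be sketched in code. The following is a minimal illustration, not the procedure actually used to build the study materials: the function name and the distracter pool are hypothetical, the part-of-speech requirement is not checked, punctuation is ignored, and it is assumed that the two distracters should also differ from each other in first letter.

```python
import random


def build_maze(text, pool, every=7, seed=0):
    """Replace every `every`-th word with a tuple of three choices.

    Distracters come from `pool` (a hypothetical word list). Each must be within
    one letter in length of the correct word and start with a different letter.
    Part of speech is NOT checked in this sketch.
    """
    rng = random.Random(seed)
    out = []
    for position, word in enumerate(text.split(), start=1):
        if position % every != 0:
            out.append(word)
            continue
        usable = [d for d in pool
                  if abs(len(d) - len(word)) <= 1
                  and d[0].lower() != word[0].lower()]
        rng.shuffle(usable)
        chosen = []
        for d in usable:
            # Assumption: the two distracters also start with different letters.
            if all(d[0].lower() != c[0].lower() for c in chosen):
                chosen.append(d)
            if len(chosen) == 2:
                break
        if len(chosen) < 2:
            out.append(word)  # no suitable distracters; leave the word intact
            continue
        choices = [word] + chosen
        rng.shuffle(choices)  # randomize the position of the correct choice
        out.append(tuple(choices))
    return out


passage = "the quick brown fox jumps over lazy dogs while the cat sleeps very soundly"
pool = ["bird", "open", "table", "mirrors", "garden", "nothing"]
maze = build_maze(passage, pool)
```

In this example the 7th and 14th words ("lazy" and "soundly") become three-choice items, and all other words are left unchanged.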

The maze-selection task was administered to students in a group setting using standardized administration procedures. Students read silently for 4 minutes, making selections for each multiple-choice item. Examiners timed for 4 minutes and instructed students to mark their progress with a slash at 2, 3, and 4 minutes. Examiners monitored to ensure that students made the slashes. Maze selection was scored for correct maze choices (CMC) and correct minus incorrect choices (CMI) in 2, 3, and 4 minutes. As a control for guessing, and following the procedures used in previous research on maze selection (Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1992), maze scoring was stopped when three consecutive incorrect choices were made. A recent investigation comparing different maze-selection scoring procedures revealed no differences in criterion-related validity associated with using a two-in-a-row versus three-in-a-row incorrect rule (Wayman et al., 2009).
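The scoring rule can be sketched as follows. This is a minimal illustration with a hypothetical function name. One detail is an assumption, because the text does not specify it: the three incorrect choices that trigger the stop are counted as incorrect when computing CMI.

```python
def score_maze(responses, stop_after=3):
    """Score one student's maze responses up to a given time mark.

    responses: list of booleans in item order (True = correct choice).
    Scoring stops once `stop_after` consecutive incorrect choices occur.
    Returns (CMC, CMI). Assumption: the incorrect choices that trigger
    the stop are included in the incorrect count.
    """
    correct = incorrect = run = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            run = 0
        else:
            incorrect += 1
            run += 1
            if run == stop_after:
                break  # later choices are ignored as probable guessing
    return correct, correct - incorrect


# Hypothetical response string: scoring stops after the 7th item.
cmc, cmi = score_maze([True, True, False, True, False, False, False, True, True])
```

Setting `stop_after=2` gives the two-in-a-row variant mentioned above.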

Criterion variables. The criterion variable in this study was performance on the Minnesota Basic Standards Test (MBST) in reading, a high-stakes test required for graduation. The MBST was designed by the state of Minnesota to test the minimum level of reading skills needed for survival (MN Department of Education, 2001), and at the time of the study was administered annually in the winter to all eighth-grade students in Minnesota.¹ The untimed test comprised four or more passages of 500 words or more, selected from newspaper and magazine articles. Passages were both narrative and expository and had average DRP levels ranging from 64 to 67. Each passage was followed by multiple-choice questions, with approximately 40 questions per test. The test was constructed so that 60 percent of the questions on the test were literal, 30 percent inferential, and 10 percent could be either. The test was machine-scored on a scale from 0 to 40, and then the raw score was converted to a scale score between 375 and 750. A passing scale score was 600, which corresponded to 75 percent correct (MN Department of Education, 2001). Students who did not pass the test were permitted to retake it two times each year. Students had to pass the test in order to graduate from high school.

The MBST Technical Manual (MN Department of Education, 2001) reported reliability and validity information for the MBST Reading test. Internal consistency measures for reliability were based on the Rasch model index of person separation. The Kuder-Richardson 20 internal consistency reliability estimate was .90. No alternate-form reliability was calculated. Content validity, according to the manual, was determined by the relationship of the reading test items to statewide content standards, as verified by educators, item developers, and experts in the field. Construct validity was measured by item point-biserial correlations (the correlation between students' raw scores on the MBST and their scores on individual test items). The mean point-biserial correlation was .38. There were no criterion-related validity statistics noted.

Procedures

In the fall, students completed two maze passages in a group setting in their classrooms. On a subsequent day in the same week, students completed two reading-aloud passages individually. Type of measure (reading aloud vs. maze selection) and passage were counterbalanced across students, as was the order in which the students completed the passages within reading aloud or maze selection. Examples of each task were given to students prior to administration. The MBST was administered by teachers to students in February.

Sixteen graduate students administered and scored the reading-aloud and maze-selection measures. Prior to data collection, the graduate students were interviewed by members of the research team to ascertain their ability to work with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
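The agreement index used here (smaller score divided by larger score, multiplied by 100) is simple enough to state as code. The following Python sketch is illustrative only; the function name is hypothetical, and treating two scores of zero as 100 percent agreement is an assumption the text does not address.

```python
def interscorer_agreement(score_a, score_b):
    """Percent agreement between two scorers' totals for the same probe:
    the smaller score divided by the larger score, multiplied by 100."""
    smaller, larger = sorted((score_a, score_b))
    if larger == 0:
        # Assumption: two scores of zero are treated as perfect agreement.
        return 100.0
    return 100 * smaller / larger


# Hypothetical example: a data collector counts 118 words read correctly
# and the trainer counts 121 on the same sample.
agreement = interscorer_agreement(118, 121)  # about 97.5 percent
```

Because the index compares totals rather than item-by-item decisions, two scorers can reach high agreement while disagreeing on which individual words or choices were errors.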

Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes, and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475-750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar for scoring method but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85 regardless of scoring method.
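For readers less familiar with the computation, alternate-form reliability as used here is the Pearson correlation between students' scores on two parallel forms. A minimal Python sketch follows; the function name and the scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical words-read-correct scores for five students on two passages.
form_a = [98, 120, 135, 150, 171]
form_b = [102, 115, 140, 148, 165]
reliability = pearson_r(form_a, form_b)
```

The same function would apply to the predictive validity coefficients, with the mean of the two forms as one variable and the criterion score as the other.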

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                 Time

Reading aloud                         1 minute          2 minutes         3 minutes
  Total words read                    125.88 (43.75)    250.46 (85.05)    373.31 (125.95)
  Words read correct                  119.82 (47.29)    238.54 (92.14)    355.27 (136.92)

Maze selection                        2 minutes         3 minutes         4 minutes
  Correct choices                     12.33 (7.12)      18.76 (10.87)     25.24 (14.53)
  Correct minus incorrect choices     11.?8 (7.53)      17.17 (11.40)     23.10 (15.?7)

Note. Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                 Time

Reading aloud                         1 minute    2 minutes   3 minutes
  Total words read                    .93         .96         .95
  Words read correct                  .94         .96         .94

Maze selection                        2 minutes   3 minutes   4 minutes
  Correct choices                     .80         .86         .88
  Correct minus incorrect choices     .79         .86         .96

Note. All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Curriculum-Based Measurements
and scoring procedure                 Time

Reading aloud                         1 minute    2 minutes   3 minutes
  Total words read                    .76         .77         .76
  Words read correct                  .78         .79         .78

Maze selection                        2 minutes   3 minutes   4 minutes
  Correct choices                     .75         .77         .80
  Correct minus incorrect choices     .77         .78         .81

Note. All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for the maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.

Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
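To give a sense of what such a table contains, the Python sketch below tallies the percentage of students passing within bands of CBM scores. It is an assumption-laden illustration rather than the method of Espin et al. (2008), which is not described in this article; the function name, the 25-word band width, and the records are all hypothetical.

```python
from collections import defaultdict


def pass_rates_by_band(records, band_width=25):
    """Percentage of students passing the criterion test within each CBM band.

    records: iterable of (wrc, passed) pairs, where wrc is an integer number
             of words read correctly and passed is True or False.
    Returns a dict mapping the lower bound of each band to the percent passing.
    """
    counts = defaultdict(lambda: [0, 0])  # band start -> [n passed, n total]
    for wrc, passed in records:
        band = (wrc // band_width) * band_width
        counts[band][0] += int(passed)
        counts[band][1] += 1
    return {band: 100 * p / n for band, (p, n) in sorted(counts.items())}


# Hypothetical district records: (WRC in 1 minute, passed the state test?)
records = [
    (100, False), (110, True), (120, False), (124, False),
    (126, True), (130, False), (140, True), (149, True),
]
table = pass_rates_by_band(records)  # {100: 25.0, 125: 75.0}
```

With real district data, bands containing few students would give unstable percentages, so some smoothing or a fitted model would likely be preferable to raw tallies.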


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques, such as Hierarchical Linear Modeling, now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth and seventh grade levels. Passages were, on average, 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection, from .69 to .90.
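For reference, the Flesch-Kincaid grade level is a fixed formula over word, sentence, and syllable counts. The Python sketch below applies the standard published coefficients; the function name and the passage counts are hypothetical, and the sketch takes counts as inputs rather than attempting automatic syllable counting, which the article does not describe.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts for one passage.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


# Hypothetical counts for a 750-word passage with 50 sentences and
# 1,050 syllables; the result falls between the fifth and seventh grade.
grade = flesch_kincaid_grade(words=750, sentences=50, syllables=1050)  # about 6.78
```

Longer sentences and more syllables per word both push the estimate upward, which is why short-sentence newspaper stories can land several grades below the readers' enrolled grade.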

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks, no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages                                  1 and 2   2 and 3   3 and 4   4 and 5   5 and 6   6 and 7   7 and 8   8 and 9   9 and 10
Reading aloud, words correct, 1 minute    .92       .91       .85       .88       .88       .86       .79       .84       .83
Maze, correct choices, 3 minutes          .72       .84       .69       .80       .80       .85       .90       .83       .74

Note. n = 25 to 3[?]. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
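The structure of Equation (1) can be illustrated with a deliberately simplified two-stage calculation: fit an ordinary least-squares line to each student's scores with time coded as wave minus 1, then average the individual intercepts and slopes. This Python sketch is not the restricted maximum likelihood mixed model fitted in SAS for the study, and it will not reproduce the reported estimates or degrees of freedom; the function names and scores are hypothetical.

```python
def ols_line(waves, scores):
    """Least-squares intercept and slope for one student, with t = wave - 1
    so that the intercept is the fitted status at wave 1."""
    t = [w - 1 for w in waves]
    n = len(t)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two matching (wave, score) points")
    mean_t = sum(t) / n
    mean_s = sum(scores) / n
    stt = sum((a - mean_t) ** 2 for a in t)
    if stt == 0:
        raise ValueError("all observations are from the same wave")
    slope = sum((a - mean_t) * (s - mean_s) for a, s in zip(t, scores)) / stt
    intercept = mean_s - slope * mean_t
    return intercept, slope


def mean_growth(students):
    """Average the per-student intercepts and slopes (two-stage estimate)."""
    fits = [ols_line(waves, scores) for waves, scores in students]
    n = len(fits)
    mean_intercept = sum(f[0] for f in fits) / n
    mean_slope = sum(f[1] for f in fits) / n
    return mean_intercept, mean_slope


# Two hypothetical students; the second missed wave 3 (e.g., an absence).
students = [
    ([1, 2, 3, 4], [100, 101, 102, 103]),
    ([1, 2, 4], [150, 150.5, 151.5]),
]
b0, b1 = mean_growth(students)  # mean intercept 125.0, mean slope 0.75
```

A mixed model additionally weights students by the amount and precision of their data and estimates the variance of the individual deviations, which this averaging ignores.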

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\beta_1 = 0.84$, t(245) = 2.63, p = .009, and the intercept was significant, $\beta_0 = 139.01$, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of 0.84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\beta_1 = -0.41$, t(29.4) = -0.55, p = .58, but the intercept was significant, $\beta_0 = 420.04$, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

SensitivifV to growth fbr maze selection The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2 As illustrated in Figure 2 there was a change of direction at wave 8 for the n1aze-selection scores This presented a problem for the analyses as the major goal was to estimate linear growth over time After a close examination of the data and discussions with the teacher it was determined that this shift might be due to a passage effect To address this problem a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert Wand amp Carroll 2003 ch 3) That is we decided to model thenearMlinear growth apparent before wave 8 without deleting any of the data The spline growth curve model was

~=~+fi0+~0+~ w In Equation (2) f 0 is the intercept (constant across all 10

waves) fJ I is the linear slope over wave 1 to wave 7 (with tu = j - l) and (3 2 is the linear slope starting at wave 8 with ti = 0 for wave I through wave 7 and t = (tu - 7) starting at wave 8 Random effects terms were specified for each linear slope and the intercept that is E) = boi + hiij + h2t + eu Primary interest was on fgt 1 as this was the linear slope for waves 1-7

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\beta_0 = 21.22$, t(25.3) = 17.22, p < .0001; $\beta_1 = 2.88$, t(32.7) = 12.79, p < .0001; and $\beta_2 = -7.26$, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave but a decrease of 7.26 per wave at wave 8.
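The time coding in Equation (2) can be made concrete with a short Python sketch that builds the two predictors for a given wave and evaluates the fixed-effects part of the model. The function names are hypothetical, the knot follows the coding as printed in the text (the second predictor equals the first minus 7 from wave 8 onward), and the estimates are those reported for 3-minute maze selection; the output is an illustration of the coding, not a reproduction of the predicted means in Figure 2.

```python
def spline_terms(wave, knot_t=7):
    """Time predictors for the piecewise model.

    t      = wave - 1 (so the intercept is status at wave 1)
    t_star = 0 before the knot and t - knot_t afterwards, following the
             coding as printed in the text (an assumption about the knot).
    """
    t = wave - 1
    t_star = max(0, t - knot_t)
    return t, t_star


def predicted_mean(wave, b0, b1, b2, knot_t=7):
    """Fixed-effects prediction b0 + b1 * t + b2 * t_star for one wave."""
    t, t_star = spline_terms(wave, knot_t)
    return b0 + b1 * t + b2 * t_star


# Reported fixed-effect estimates for CMC in 3 minutes.
B0, B1, B2 = 21.22, 2.88, -7.26
curve = [predicted_mean(w, B0, B1, B2) for w in range(1, 11)]
```

Because the second predictor is zero through the early waves, the first slope is estimated from the near-linear portion of the data while later waves remain in the model.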

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\beta_0 = 13.58$, t(24.00) = 16.93, p < .0001; $\beta_1 = 2.17$, t(32.7) = 12.85, p < .0001; and $\beta_2 = -5.75$, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. [Panels: "Linear Model: Words Read Correct, 1 Minute" and "Linear Model: Words Read Correct, 3 Minutes"; x-axis: Wave; series: observed (Obs) and predicted (Pred) means.]

TABLE 5
Detailed Results of the Growth Curve Analysis

Measure   Parameter    Estimate    SE         df      t value   p value
WRC1      $\beta_0$    138.1       5.5538     28.8    24.87     <.0001
          $\beta_1$    0.839       0.319      245     2.63      .0091
WRC3      $\beta_0$    420.04      15.7232    29      26.71     <.0001
          $\beta_1$    -0.4057     0.7333     29.4    -0.55     .5842
CMC2      $\beta_0$    13.5786     0.8022     24      16.93     <.0001
          $\beta_1$    2.1674      0.1686     32.7    12.85     <.0001
          $\beta_2$    -5.747      0.5642     225     -10.19    <.0001
CMC3      $\beta_0$    21.22       1.2321     25.3    17.22     <.0001
          $\beta_1$    2.8808      0.2253     32.7    12.79     <.0001
          $\beta_2$    -7.2639     0.6716     227     -10.82    <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (0.84 words per week) reading-aloud measure but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the $i$th participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \tag{3}$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4 = 0.009$, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
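As a rough worked illustration of what a coefficient of this size implies, assume the reported estimate is read as $\beta_4 = 0.009$ and that Equation (3) holds exactly. The expected slope over waves 1-7 for a student with MBST score $m_i$ is then $\beta_1 + \beta_4 m_i$, and the difference between the two scores plotted in Figure 4 works out as follows:

```latex
\begin{aligned}
\text{slope}(m_i) &= \beta_1 + \beta_4 m_i \\
\text{slope}(700) - \text{slope}(500) &= \beta_4\,(700 - 500) \\
  &= 0.009 \times 200 = 1.8 \ \text{CMC per wave}
\end{aligned}
```

This is a back-of-the-envelope reading of the reported coefficient under the stated assumptions, not an additional result from the study.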

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (0.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Panels: "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; x-axis: Wave; series: observed (Obs) and predicted (Pred) means.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Panels: "Words Read Correctly in 1 Minute" and "Correct Maze Choices in 3 Minutes"; x-axis: Wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [x-axis: Wave.]

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps, for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure    Time
Reading aloud                        1 minute     2 minutes     3 minutes
  Total words read                   .30          .32           .33
  Words read correct                 .32          [illegible]   .34
Maze selection                       2 minutes    3 minutes     4 minutes
  Correct choices                    .52          .50           .48**
  Correct minus incorrect choices    .57          .55**         .51

n = 31. **p < .01.
Note. MBST = Minnesota Basic Standards Test.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
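The group-administration check suggested above could be sketched as follows. This is only an illustration in Python: the function name, the 10 percent tolerance, and the scores are hypothetical, and comparing each passage mean with the median passage mean is one simple choice among many, not a procedure taken from the study.

```python
from statistics import mean, median


def flag_uneven_passages(scores_by_passage, tolerance=0.10):
    """Return {passage: group mean} for passages whose group mean differs
    from the median passage mean by more than `tolerance` (a proportion)."""
    passage_means = {p: mean(s) for p, s in scores_by_passage.items()}
    reference = median(passage_means.values())
    if reference == 0:
        raise ValueError("median passage mean is zero; cannot compute proportions")
    return {
        p: m
        for p, m in passage_means.items()
        if abs(m - reference) / reference > tolerance
    }


# Hypothetical correct maze choices for the same three students on three passages.
scores = {
    "A": [18, 20, 22],
    "B": [19, 21, 20],
    "C": [25, 27, 29],  # noticeably easier for the whole group
}
flagged = flag_uneven_passages(scores)
print(flagged)
```

A passage flagged this way would be a candidate for revision or removal before it is used in a progress-monitoring set.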

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.


Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-84.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, L., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. L., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
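The agreement calculation described above (smaller score divided by larger score, multiplied by 100) can be written as a short Python sketch. The function name and the example scores are hypothetical, and treating two zero scores as full agreement is an assumption made here, not something stated in the text.

```python
def interscorer_agreement(score_a, score_b):
    """Percent agreement: the smaller score divided by the larger, times 100."""
    if score_a < 0 or score_b < 0:
        raise ValueError("scores must be non-negative")
    larger = max(score_a, score_b)
    if larger == 0:
        # Assumption: two scores of zero count as full agreement.
        return 100.0
    return min(score_a, score_b) / larger * 100


# Hypothetical words-read-correctly counts from a data collector and the trainer.
agreement = interscorer_agreement(118, 125)
print(f"{agreement:.1f} percent")  # prints "94.4 percent"
```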

Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes, and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475-750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar for scoring method, but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85 regardless of scoring method.
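As an illustration of the alternate-form reliability calculation, the Python sketch below computes a Pearson correlation between scores on two forms of the same measure. It assumes that the correlations reported here are Pearson coefficients, which the text does not state explicitly; the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical correct maze choices for five students on two alternate forms.
form_a = [10, 14, 18, 22, 30]
form_b = [12, 13, 20, 21, 29]
reliability = pearson_r(form_a, form_b)
print(f"alternate-form reliability: r = {reliability:.2f}")
```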

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements and scoring procedure    Time
Reading aloud                        1 minute          2 minutes         3 minutes
  Total words read                   125.88 (43.75)    250.46 (85.05)    373.31 (125.95)
  Words read correct                 119.82 (47.29)    238.54 (92.14)    355.27 (136.92)
Maze selection                       2 minutes         3 minutes         4 minutes
  Correct choices                    12.33 (7.12)      18.76 (10.87)     25.24 (14.53)
  Correct minus incorrect choices    11.[?]8 (7.53)    17.17 (11.40)     23.10 (15.[?]7)

Note. Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Curriculum-Based Measurements and scoring procedure    Time
Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .93          .96          .95
  Words read correct                 .94          .96          .94
Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .80          .86          .88
  Correct minus incorrect choices    .79          .86          .96

Note. All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Curriculum-Based Measurements and scoring procedure    Time
Reading aloud                        1 minute     2 minutes    3 minutes
  Total words read                   .76          .77          .76
  Words read correct                 .78          .79          .78
Maze selection                       2 minutes    3 minutes    4 minutes
  Correct choices                    .75          .77          .80
  Correct minus incorrect choices    .77          .78          .81

Note. All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for the maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.

Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data combined with practical considerations would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create such Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
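The idea of reporting pass rates along the whole performance continuum can be illustrated with a minimal sketch. It simply tabulates the percentage of students passing within bands of WRC scores; it is not the procedure of Espin et al. (2008), which is not described here and may differ (for example, by fitting a statistical model). The function name, the 25-point band width, and all data values are hypothetical and invented for illustration.

```python
from collections import defaultdict

BAND_WIDTH = 25  # hypothetical band width, chosen only for illustration


def probable_success_table(records, band_width=BAND_WIDTH):
    """Return {band_start: (n_students, percent_passing)} for CBM score bands.

    records: iterable of (wrc_score, passed) pairs, where passed is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # band_start -> [n_students, n_passed]
    for wrc, passed in records:
        band = (wrc // band_width) * band_width
        counts[band][0] += 1
        counts[band][1] += int(passed)
    return {
        band: (n, round(100 * n_pass / n))
        for band, (n, n_pass) in sorted(counts.items())
    }


# Hypothetical district data (WRC in 1 minute, passed state test); not from the study.
records = [
    (98, False), (102, False), (105, True), (110, False),
    (126, True), (128, False), (131, True), (140, True),
    (150, True), (155, True), (162, True), (170, False),
]
table = probable_success_table(records)
for band, (n, pct) in table.items():
    print(f"{band}-{band + BAND_WIDTH - 1} WRC: n={n}, {pct}% passed")
```

With real district data, a teacher could read across such a table to choose a goal that moves a student into a band with a noticeably higher pass rate, rather than aiming directly at a single cutoff.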


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, the development of more advanced statistical techniques such as Hierarchical Linear Modeling now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth and seventh grade levels. Passages were on average 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection from .69 to .90.
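As a rough illustration of how alternate-form reliabilities between adjacent passages might be computed, the sketch below correlates each passage's scores with those of the next passage, using only students who completed both. The use of a Pearson correlation and of pairwise-complete data are assumptions made here for illustration; the text does not state which coefficient was used. All names and scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def adjacent_reliabilities(scores_by_passage):
    """Correlate each passage with the next, using students present for both."""
    result = []
    for first, second in zip(scores_by_passage, scores_by_passage[1:]):
        shared = sorted(set(first) & set(second))
        result.append(
            pearson_r([first[s] for s in shared], [second[s] for s in shared])
        )
    return result


# Hypothetical WRC scores keyed by student id; invented for illustration.
passages = [
    {"s1": 120, "s2": 140, "s3": 160, "s4": 150},
    {"s1": 125, "s2": 138, "s3": 165, "s4": 149},
    {"s1": 118, "s2": 145, "s3": 158},  # s4 absent this week
]
reliabilities = adjacent_reliabilities(passages)
print([round(r, 2) for r in reliabilities])
```

Dropping absent students pair by pair, as done here, would explain a sample size that varies from one pair of passages to the next.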

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks, no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages     Reading aloud:             Maze:
             words correct, 1 minute    correct choices, 3 minutes
1 and 2      .92                        .72
2 and 3      .91                        .84
3 and 4      .85                        .69
4 and 5      .88                        .80
5 and 6      .88                        .80
6 and 7      .86                        .85
7 and 8      .79                        .90
8 and 9      .84                        .83
9 and 10     .83                        .74

n = 25 to 31.
Note. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
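The agreement calculation described above amounts to a one-line formula. A minimal sketch follows; the function name and the counts are hypothetical.

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements, as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no scored items to compare")
    return 100 * agreements / total


# Hypothetical counts for one checked sample: 145 of 148 scored words agree.
print(round(percent_agreement(145, 3), 1))
```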

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$, and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\beta_1$ = .84, t(245) = 2.63, p = .009, and the intercept was significant, $\beta_0$ = 139.01, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of .84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\beta_1$ = -0.41, t(29.4) = -0.55, p = .58, but the intercept was significant, $\beta_0$ = 420.04, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \tag{2}$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\beta_0$ = 21.22, t(25.3) = 17.22, p < .0001; $\beta_1$ = 2.88, t(32.7) = 12.79, p < .0001; and $\beta_2$ = -7.26, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
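To make the piecewise time coding concrete, the sketch below computes fixed-effects predicted means for the 3-minute maze from the reported estimates (intercept 21.22, first slope 2.88, second slope term -7.26). It follows the coding exactly as stated in the text, so the second time variable is 0 through the wave at which $t_{ij} = 7$ and increases by 1 per wave afterward; under that coding the net slope after the knot is the sum of the two slope terms. Random effects are ignored, and the function names are hypothetical.

```python
KNOT_T = 7  # value of t at which the second linear term starts, per the text


def time_codes(wave):
    """Return (t, t_star) for a wave number 1..10, following the stated coding."""
    t = wave - 1                   # t_ij = j - 1
    t_star = max(0, t - KNOT_T)    # 0 up to the knot, then t - 7
    return t, t_star


def predicted_mean(wave, b0=21.22, b1=2.88, b2=-7.26):
    """Fixed-effects prediction for 3-minute CMC; defaults are the reported estimates."""
    t, t_star = time_codes(wave)
    return b0 + b1 * t + b2 * t_star


for wave in range(1, 11):
    print(wave, round(predicted_mean(wave), 2))
```

Swapping in the 2-minute estimates as arguments gives the corresponding curve for the shorter task.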

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\beta_0$ = 13.58, t(24.00) = 16.93, p < .0001; $\beta_1$ = 2.17, t(32.7) = 12.85, p < .0001; and $\beta_2$ = -5.75, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave, but a decrease of 5.75 per wave


FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. [Panels: Linear Model, Words Read Correct 1 Minute; Linear Model, Words Read Correct 3 Minutes. Horizontal axis: Wave. Series: observed (Obs) and predicted (Pred) means.]

TABLE 5
Detailed Results of the Growth Curve Analysis

Measure  Parameter   Estimate   SE        df     t value   p value
WRC1     $\beta_0$   138.1      5.5538    28.8   24.87     <.0001
         $\beta_1$   0.839      0.319     245    2.63      .0091
WRC3     $\beta_0$   420.04     15.7232   29     26.71     <.0001
         $\beta_1$   -0.4057    0.7333    29.4   -0.55     .5842
CMC2     $\beta_0$   13.5786    0.8022    24     16.93     <.0001
         $\beta_1$   2.1674     0.1686    32.7   12.85     <.0001
         $\beta_2$   -5.747     0.5642    225    -10.19    <.0001
CMC3     $\beta_0$   21.22      1.2321    25.3   17.22     <.0001
         $\beta_1$   2.8808     0.2253    32.7   12.79     <.0001
         $\beta_2$   -7.2639    0.6716    227    -10.82    <.0001

Note. WRC1, WRC3 = Words read correctly in 1 and 3 minutes; CMC2, CMC3 = Correct maze choices in 2 and 3 minutes.

at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \tag{3}$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4$ = 0.009, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
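The size of the interaction term can be made more tangible with a small calculation. In Equation (3), the wave 1-7 slope for a student with MBST score $m_i$ is $\beta_1 + \beta_4 m_i$, so the difference in slopes between two MBST scores depends only on $\beta_4$ and the score gap. The sketch below assumes the reported coefficient is 0.009 per MBST scale-score point (an assumption about the units of the reported estimate); the function name is hypothetical.

```python
def slope_difference(beta4, mbst_high, mbst_low):
    """Difference in wave 1-7 slopes implied by the MBST-by-time term."""
    return beta4 * (mbst_high - mbst_low)


# beta4 = 0.009 is assumed to be per MBST scale-score point;
# 700 and 500 are the two scores used for the predicted curves in Figure 4.
diff = slope_difference(0.009, 700, 500)
print(round(diff, 2))
```

Under these assumptions, the two predicted curves in Figure 4 would differ in slope by a little under 2 correct maze choices per wave.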

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In


FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Panels: Spline Model, Correct Maze Choices 3 Minutes; Spline Model, Correct Maze Choices 2 Minutes. Horizontal axis: Wave. Series: observed (Obs) and predicted (Pred) means.]


FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Panels: Words Read Correctly in 1 Minute; Correct Maze Choices in 3 Minutes. Horizontal axis: Wave.]

addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [Horizontal axis: Wave.]

measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure     Time

Reading aloud                         1 minute   2 minutes   3 minutes
Total words read                      .30        .32         .33
Words read correct                    .32        [illegible] .34

Maze selection                        2 minutes  3 minutes   4 minutes
Correct choices                       .52**      .50**       .48**
Correct minus incorrect choices       .57**      .55**       .51**

n = 31. **p < .01.
Note. MBST = Minnesota Basic Standards Test.

others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristics of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
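One simple way to carry out the group-level check described above is sketched below: the same group of students completes every passage, and passages whose group mean lies far from the average passage mean are flagged for review. This is only an illustration; the standardized-distance rule, the threshold of 1.5, the function name, and all scores are assumptions made here and are not taken from the study or from Griffiths et al. (2009).

```python
from statistics import mean, pstdev


def flag_unequal_passages(scores_by_passage, threshold=1.5):
    """Flag passages whose group mean lies far from the average passage mean.

    scores_by_passage: {passage_id: [score, ...]} from one group of students.
    Returns {passage_id: z} for passages with |z| >= threshold.
    """
    passage_means = {p: mean(s) for p, s in scores_by_passage.items()}
    grand_mean = mean(list(passage_means.values()))
    spread = pstdev(list(passage_means.values()))
    if spread == 0:
        return {}  # all passages produced the same group mean
    return {
        p: round((m - grand_mean) / spread, 2)
        for p, m in passage_means.items()
        if abs(m - grand_mean) / spread >= threshold
    }


# Hypothetical maze scores for four students on four passages; invented values.
scores = {
    "A": [30, 32, 28, 31],
    "B": [29, 31, 30, 30],
    "C": [40, 42, 39, 41],  # noticeably easier passage in this made-up data
    "D": [31, 29, 30, 32],
}
flagged = flag_unequal_passages(scores)
print(flagged)
```

A flagged passage would then be inspected, replaced, or handled by counterbalancing, so that a shift in scores is less likely to be mistaken for student growth.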

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.


Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136-148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. M. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district levels.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80 but increasing for 3-minute (r = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).
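An alternate-form reliability coefficient of this kind is a correlation between students' scores on two parallel passages. The following is a minimal sketch, assuming the coefficient is a Pearson correlation; the function name and the five-student score lists are hypothetical and are not data from the study.

```python
from math import sqrt


def alternate_form_reliability(form_a, form_b):
    """Pearson correlation between paired scores on two alternate forms."""
    if len(form_a) != len(form_b) or len(form_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 students")
    n = len(form_a)
    mean_a = sum(form_a) / n
    mean_b = sum(form_b) / n
    # Sum of cross-products and sums of squared deviations around each mean.
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(form_a, form_b))
    sxx = sum((a - mean_a) ** 2 for a in form_a)
    syy = sum((b - mean_b) ** 2 for b in form_b)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on a form must not all be identical")
    return sxy / sqrt(sxx * syy)


# Hypothetical words-read-correctly scores for five students on two passages.
passage_1 = [98, 120, 135, 150, 171]
passage_2 = [102, 117, 140, 149, 168]
r = alternate_form_reliability(passage_1, passage_2)
```

A coefficient near 1 means that students are ranked almost identically by the two passages, which is the sense in which the forms are interchangeable.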

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper, material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating nearly identical correlations (in the .70s) to those we found between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
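The idea of reporting pass rates along the whole performance continuum can be sketched as a simple banded tabulation. This is an illustrative simplification only, and it is not the procedure described in Espin et al. (2008), which is not reproduced here; the function name, the 25-word band width, and the records are all hypothetical.

```python
from collections import defaultdict


def probable_success_table(records, band_width=25):
    """Proportion of students passing the state test within each CBM score band.

    records: iterable of (wrc_score, passed) pairs, one per student.
    Returns a dict mapping the lower bound of each band to its pass proportion.
    """
    counts = defaultdict(lambda: [0, 0])  # band start -> [n_passed, n_total]
    for score, passed in records:
        band_start = (score // band_width) * band_width
        counts[band_start][1] += 1
        if passed:
            counts[band_start][0] += 1
    return {band: n_passed / n_total
            for band, (n_passed, n_total) in sorted(counts.items())}


# Hypothetical district records: (WRC in 1 minute, passed the state test?).
district = [(100, False), (101, False), (110, True),
            (126, False), (130, True), (140, True)]
table = probable_success_table(district)
```

With real district data, a teacher could read off the pass rate for a student's current band and choose an annual goal in a band with a noticeably higher rate.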


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques, such as Hierarchical Linear Modeling, now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory, within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2: EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth and seventh grade levels. Passages were, on average, 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection from .69 to .90.

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity of treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages    Reading aloud: words correct, 1 minute    Maze: correct choices, 3 minutes
1 and 2     .92                                       .72
2 and 3     .91                                       .84
3 and 4     .85                                       .69
4 and 5     .88                                       .80
5 and 6     .88                                       .80
6 and 7     .86                                       .85
7 and 8     .79                                       .90
8 and 9     .84                                       .83
9 and 10    .83                                       .74

n = 25 to 31. Note. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
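The agreement calculation described above can be written out directly. This is a minimal sketch; the function name and the example counts are hypothetical.

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements, as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no scored items to compare")
    return 100.0 * agreements / total


# Hypothetical check: two scorers agree on 97 of 100 scored words.
example = percent_agreement(97, 3)
```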

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \qquad (1)$$

where $Y_{ij}$ is the score of participant $i$ at wave $j$, $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
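To make the roles of the time coding and the slope concrete, the sketch below fits a separate least-squares line to each student's scores with time coded as wave minus 1, then averages the slopes. This two-stage calculation is a swapped-in simplification and is not the restricted maximum likelihood mixed model fitted with SAS PROC MIXED in the study; it also assumes consecutive waves with no missing weeks. The function names and scores are hypothetical.

```python
def ols_slope(scores):
    """Least-squares slope of one student's scores on time, with t = wave - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two waves to estimate a slope")
    times = list(range(n))  # t = 0 at wave 1, so the intercept is status at wave 1
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def mean_slope(all_students):
    """Average of the per-student slopes (a rough stand-in for the mean slope)."""
    slopes = [ols_slope(scores) for scores in all_students]
    return sum(slopes) / len(slopes)


# Hypothetical weekly scores for two students over three waves.
classroom = [[10, 12, 14], [5, 5, 5]]
average_growth = mean_slope(classroom)
```

Unlike this sketch, the mixed model estimates the mean slope and the variance of individual slopes jointly, which is what allows a slope variance of zero to be detected.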

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\hat{\beta}_1 = 0.84$, t(245) = 2.63, p = .009, and the intercept was significant, $\hat{\beta}_0 = 139.01$, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of 0.84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\hat{\beta}_1 = -0.41$, t(29.4) = -0.55, p = .58, but the intercept was significant, $\hat{\beta}_0 = 420.04$, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \qquad (2)$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 21.22$, t(25.3) = 17.22, p < .0001; $\hat{\beta}_1 = 2.88$, t(32.7) = 12.79, p < .0001; and $\hat{\beta}_2 = -7.26$, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
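The two time predictors of a piecewise model like this one can be sketched as follows, using the fixed-effect estimates reported above for 3-minute maze selection (21.22, 2.88, and -7.26). The knot placement is an assumption: the code follows the coding as printed (first predictor equal to wave minus 1, second predictor equal to the first minus 7 and floored at zero), and the extracted text leaves some ambiguity about the exact wave at which the second predictor first becomes nonzero, so `knot` is exposed as a parameter. Function names are hypothetical.

```python
def spline_terms(wave, knot=7):
    """Return (t, t_star) for a wave: t = wave - 1, t_star = max(0, t - knot)."""
    t = wave - 1
    t_star = max(0, t - knot)  # stays at zero until t passes the knot
    return t, t_star


def predicted_mean(wave, b0=21.22, b1=2.88, b2=-7.26, knot=7):
    """Fixed-effects prediction b0 + b1*t + b2*t_star (random effects ignored)."""
    t, t_star = spline_terms(wave, knot)
    return b0 + b1 * t + b2 * t_star


# Once t_star is nonzero, the net change per wave is b1 + b2 = 2.88 - 7.26.
late_change = predicted_mean(10) - predicted_mean(9)
```

Because the second predictor is added on top of the first, the second coefficient is a change in slope, so the net slope after the knot is the sum of the two coefficients rather than the second coefficient alone.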

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\hat{\beta}_0 = 13.58$, t(24.00) = 16.93, p < .0001; $\hat{\beta}_1 = 2.17$, t(32.7) = 12.85, p < .0001; and $\hat{\beta}_2 = -5.75$, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave, but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

[Figure 1: two panels, "Linear Model: Words Read Correct, 1 Minute" and "Linear Model: Words Read Correct, 3 Minutes"; x-axis: Wave; plotted series: observed (Obs) and predicted (Pred) means.]

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave.

TABLE 5
Detailed Results of the Growth Curve Analysis

Measure   Parameter    Estimate    SE         df      t value    p value
WRC1      $\beta_0$    138.1       5.5538     28.8    24.87      <.0001
          $\beta_1$    0.839       0.319      245     2.63       .0091
WRC3      $\beta_0$    420.04      15.7232    29      26.71      <.0001
          $\beta_1$    -0.4057     0.7333     29.4    -0.55      .5842
CMC2      $\beta_0$    13.5786     0.8022     24      16.93      <.0001
          $\beta_1$    2.1674      0.1686     32.7    12.85      <.0001
          $\beta_2$    -5.747      0.5642     225     -10.19     <.0001
CMC3      $\beta_0$    21.22       1.2321     25.3    17.22      <.0001
          $\beta_1$    2.8808      0.2253     32.7    12.79      <.0001
          $\beta_2$    -7.2639     0.6716     227     -10.82     <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

In summary the results of the growth curve analysis for the reading-aloud and maze-selection measures demonshystrated that reading aloud showed minimal or no growth over time whereas maze selection showed significant and substanshytial growth except after week 7 This pattern of results held generally across scoring procedure and time frame Thus there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection With regard to time frame maze selection demonshystrated significant and substantial growth for both 2-minute (217 CMC) and 3-minute (288 CMC per week) time frames (although recall that these growth rates were obtained folshylowing a correction for a score shift) However results for reading aloud revealed a statistically significant but minimal growth rate for the ]-minute (84 words per wee-k) readingshyaloud measure but no significant grmvth for the 3-minute (or 2-minute) measure

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as for the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i}t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i}t_{ij} + b_{2i}t_{ij}^{+} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^{+} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij} \qquad (3)$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes showed significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4$ = 0.009, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
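To make the size of this interaction concrete, the following Python sketch computes the difference in weekly growth implied by the MBST-by-time coefficient for two MBST scores. It is an illustration rather than part of the original analysis: the function name is hypothetical, and the coefficient value of 0.009 is an assumed reading of the reported estimate. Because the difference depends only on the gap between the two MBST scores, it holds whether or not the MBST predictor was centered.

```python
def slope_difference(beta4: float, mbst_high: float, mbst_low: float) -> float:
    """Difference in growth per wave implied by the MBST-by-time coefficient.

    Under a model with an m_i * t_ij term, the slope for a student with MBST
    score m is (beta1 + beta4 * m), so two students differ in slope by
    beta4 * (m_high - m_low).
    """
    return beta4 * (mbst_high - mbst_low)


# Assumed reading of the reported estimate (CMC per week per MBST point).
BETA4 = 0.009

if __name__ == "__main__":
    diff = slope_difference(BETA4, 700, 500)
    print(f"Implied slope difference: {diff:.2f} CMC per week")
```

Under these assumptions, a student scoring 700 on the MBST would be predicted to gain about 1.8 more correct maze choices per week than a student scoring 500, which is consistent with the diverging curves described for Figure 4.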

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for the reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the 3-minute maze-selection task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Two panels, "Spline Model: Correct Maze Choices, 3 Minutes" (CMC3 Obs, CMC3 Pred) and "Spline Model: Correct Maze Choices, 2 Minutes" (CMC2 Obs, CMC2 Pred), each plotted against wave.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Two panels, "Words Read Correctly in 1 Minute" and "Correct Maze Choices in 3 Minutes," each showing individual student scores and the mean plotted against wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [Predicted curves plotted against wave.]

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

TABLE 6 Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure     Time

Reading aloud                         1 minute    2 minutes   3 minutes
Total words read                      .30         .32         .33
Words read correct                    .32         [illegible] .34

Maze selection                        2 minutes   3 minutes   4 minutes
Correct choices                       .52**       .50**       .48**
Correct minus incorrect choices       .57**       .55**       .51**

n = 31. **p < .01.

Note. MBST = Minnesota Basic Standards Test.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 1, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristics of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade, and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds, and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, L., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A follow-up study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136-148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. L., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, development of more advanced statistical techniques such as Hierarchical Linear Modeling now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.

STUDY 2 EXPLORATORY PROGRESS STUDY

Method

Participants

Participants in the exploratory progress study were selected from the original sample and were 31 (10 male, 21 female) students from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.

Procedures

Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability level was between the fifth and seventh grade levels. Passages were on average 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant; all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection, from .69 to .90.
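For readers unfamiliar with the Flesch-Kincaid grade level mentioned above, the following Python sketch shows the standard formula (Kincaid et al., 1975) applied to passage-level counts. It is offered only as an illustration: the function name and the example counts are hypothetical, and the study does not report how words, sentences, or syllables were counted for its passages.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts for one passage.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a 750-word passage (not taken from the study).
    grade = flesch_kincaid_grade(total_words=750, total_sentences=50, total_syllables=1050)
    print(f"Estimated grade level: {grade:.2f}")
```

With these hypothetical counts (15 words per sentence, 1.4 syllables per word), the estimate falls between the fifth and seventh grade levels, the range reported for the study passages.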

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks no data were collected).

Scoring

Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity-of-treatment checks were conducted at equal intervals three times during the course of the study to assess accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.

TABLE 4 Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages     Reading aloud:             Maze:
             words correct, 1 minute    correct choices, 3 minutes
1 and 2      .92                        .72
2 and 3      .91                        .84
3 and 4      .85                        .69
4 and 5      .88                        .80
5 and 6      .88                        .80
6 and 7      .86                        .85
7 and 8      .79                        .90
8 and 9      .84                        .83
9 and 10     .83                        .74

n = 25 to 3
Note. All correlations significant at p < .01.

Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
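The agreement index described above is a simple ratio. A minimal sketch follows, assuming agreements and disagreements are tallied per scored item; the function name and the example tallies are hypothetical, not study data.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Agreements divided by agreements plus disagreements, as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one scored item is required")
    return 100.0 * agreements / total


# Hypothetical tallies for one checked probe (not data from the study).
print(percent_agreement(agreements=97, disagreements=3))
```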

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve:

\[ y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1} \]

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
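To make the time coding concrete, the sketch below fits an ordinary least-squares line to one student's weekly scores with $t = j - 1$, so the intercept is the fitted status at wave 1 and the slope is the change per wave. This is a deliberately simpler stand-in, not the mixed-effects model estimated by restricted maximum likelihood in the study; the function name and scores are hypothetical, and the sketch assumes no missing waves.

```python
def ols_growth(scores):
    """Least-squares intercept and slope for one student's scores.

    scores[j - 1] is the score at wave j, so time is coded t = j - 1
    and the intercept is the fitted status at wave 1.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two waves are needed to estimate a slope")
    times = list(range(n))  # t = j - 1 for waves j = 1..n
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Hypothetical words-read-correct scores for one student (not study data).
intercept, slope = ols_growth([131, 133, 130, 136, 134, 137])
print(f"status at wave 1: {intercept:.1f}, change per wave: {slope:.2f}")
```

Unlike the mixed model, per-student least squares does not pool information across students, so individual slopes from short series will be noisier.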

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\beta_1 = 0.84$, $t(245) = 2.63$, $p = .009$, and the intercept was significant, $\beta_0 = 139.01$, $t(28.8) = 24.87$, $p < .0001$. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants ($N$). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., $t(28.8) = 2.63$, $p = .014$). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of .84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\beta_1 = -0.41$, $t(29.4) = -0.55$, $p = .58$, but the intercept was significant, $\beta_0 = 420.04$, $t(29) = 26.71$, $p < .0001$. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

\[ y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \tag{2} \]

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = (t_{ij} - 7)$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept; that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\beta_0 = 21.22$, $t(25.3) = 17.22$, $p < .0001$; $\beta_1 = 2.88$, $t(32.7) = 12.79$, $p < .0001$; and $\beta_2 = -7.26$, $t(227) = -10.82$, $p < .0001$. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave but a decrease of 7.26 per wave at wave 8.
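The piecewise time coding can be sketched directly from the definitions in Equation (2). The code below follows the coding as the text states it (second time variable equal to 0 through wave 7 and to the first time variable minus 7 from wave 8 on) and plugs in the 3-minute maze estimates reported above, with decimal placement as read from the text. Function names are hypothetical, and only the fixed-effects part of the model is shown.

```python
def spline_time(wave: int, knot_wave: int = 8):
    """Return (t, t_star) for a wave, using the coding given for Equation (2).

    t = wave - 1; t_star is 0 before the knot wave and t - (knot_wave - 1)
    from the knot wave onward (t - 7 when the knot is at wave 8).
    """
    t = wave - 1
    t_star = 0 if wave < knot_wave else t - (knot_wave - 1)
    return t, t_star


def predicted_mean(wave: int, b0: float, b1: float, b2: float) -> float:
    """Fixed-effects prediction b0 + b1 * t + b2 * t_star for one wave."""
    t, t_star = spline_time(wave)
    return b0 + b1 * t + b2 * t_star


# Estimates for 3-minute maze selection as reported in the text.
B0, B1, B2 = 21.22, 2.88, -7.26
for wave in range(1, 11):
    print(wave, round(predicted_mean(wave, B0, B1, B2), 2))
```

Under this coding the predicted mean rises by `B1` per wave until the second time variable becomes positive, after which the net change per wave is `B1 + B2`.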

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\beta_0 = 13.58$, $t(24.00) = 16.93$, $p < .0001$; $\beta_1 = 2.17$, $t(32.7) = 12.85$, $p < .0001$; and $\beta_2 = -5.75$, $t(225) = -10.19$, $p < .0001$. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

[Figure 1 panels: "Linear Model: Words Read Correct, 1 Minute" and "Linear Model: Words Read Correct, 3 Minutes"; observed (Obs) and predicted (Pred) means plotted against wave.]

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave.

TABLE 5 Detailed Results of the Growth Curve Analysis

Measure   Parameter    Estimate    SE         df      t value    p value
WRC1      $\beta_0$    138.1       5.5538     28.8    24.87      <.0001
          $\beta_1$    0.839       0.319      245     2.63       0.0091
WRC3      $\beta_0$    420.04      15.7232    29      26.71      <.0001
          $\beta_1$    -0.4057     0.7333     29.4    -0.55      0.5842
CMC2      $\beta_0$    13.5786     0.8022     24      16.93      <.0001
          $\beta_1$    2.1674      0.1686     32.7    12.85      <.0001
          $\beta_2$    -5.747      0.5642     225     -10.19     <.0001
CMC3      $\beta_0$    21.22       1.2321     25.3    17.22      <.0001
          $\beta_1$    2.8808      0.2253     32.7    12.79      <.0001
          $\beta_2$    -7.2639     0.6716     227     -10.82     <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.
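Each t value in Table 5 should equal the parameter estimate divided by its standard error, which gives a quick arithmetic check on the tabled figures. The sketch below applies that check to the values as read from Table 5; the decimal placement is an assumption recovered from the text, the names are hypothetical, and the tolerance allows for rounding in the published numbers.

```python
# (measure, parameter, estimate, standard error, reported t) as read from Table 5.
TABLE_5 = [
    ("WRC1", "b0", 138.1, 5.5538, 24.87),
    ("WRC1", "b1", 0.839, 0.319, 2.63),
    ("WRC3", "b0", 420.04, 15.7232, 26.71),
    ("WRC3", "b1", -0.4057, 0.7333, -0.55),
    ("CMC2", "b0", 13.5786, 0.8022, 16.93),
    ("CMC2", "b1", 2.1674, 0.1686, 12.85),
    ("CMC2", "b2", -5.747, 0.5642, -10.19),
    ("CMC3", "b0", 21.22, 1.2321, 17.22),
    ("CMC3", "b1", 2.8808, 0.2253, 12.79),
    ("CMC3", "b2", -7.2639, 0.6716, -10.82),
]


def t_ratio(estimate: float, std_error: float) -> float:
    """t statistic for a fixed effect: estimate divided by its standard error."""
    return estimate / std_error


def inconsistent_rows(rows, tolerance: float = 0.02):
    """Return (measure, parameter) pairs whose reported t differs from estimate / SE."""
    return [
        (measure, param)
        for measure, param, est, se, reported_t in rows
        if abs(t_ratio(est, se) - reported_t) > tolerance
    ]


print(inconsistent_rows(TABLE_5))  # an empty list means every row is self-consistent
```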

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the $i$th participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

\[ y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \tag{3} \]

In Equation (3), with $t_{ij}$ and $t^{*}_{ij}$ coded as in Equation (2), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4 = 0.009$, $t(31.3) = 2.35$, $p = .025$). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
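Because the MBST score multiplies time in Equation (3), the slope over waves 1-7 for a student with MBST score m is the baseline slope plus the interaction coefficient times m, so two students' slopes differ by the interaction coefficient times the difference in their scores. The sketch below works that difference out for the two MBST scores plotted in Figure 4, taking the interaction estimate as 0.009 (decimal placement read from the text). The function name is hypothetical, and the other coefficients of Equation (3) are not needed for the difference.

```python
def slope_difference(beta4: float, mbst_high: float, mbst_low: float) -> float:
    """Difference in per-wave growth implied by the MBST-by-time interaction.

    Under Equation (3) the slope for waves 1-7 is beta1 + beta4 * m, so the
    beta1 term cancels when two students' slopes are compared.
    """
    return beta4 * (mbst_high - mbst_low)


# Interaction estimate from the text; MBST scores are those used for Figure 4.
extra_growth = slope_difference(beta4=0.009, mbst_high=700, mbst_low=500)
print(f"additional correct maze choices per wave: {extra_growth:.2f}")
```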

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

[Figure 2 panels: "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; observed (Obs) and predicted (Pred) means plotted against wave.]

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave.

[Figure 3 panels: "Words Read Correctly in 1 Minute" and "Correct Maze Choices in 3 Minutes"; individual student scores and the mean plotted against wave.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices).

[Figure 4: predicted correct maze choices plotted against wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores.

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

TABLE 6 Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure      Time
Reading aloud                          1 minute     2 minutes     3 minutes
  Total words read                     .30          .32           .33
  Words read correct                   .32          [illegible]   .34
Maze selection                         2 minutes    3 minutes     4 minutes
  Correct choices                      .52          .50           .48
  Correct minus incorrect choices      .57          .55           .51

n = 31. p < .01.
Note. MBST = Minnesota Basic Standards Test.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
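One common way to counterbalance passage order is a cyclic Latin square, in which each passage appears once in every week position across the set of orders. The sketch below is an illustration of that general technique, not a procedure described by the authors; the function names, passage labels, and student identifiers are hypothetical.

```python
def cyclic_latin_square(passages):
    """Build len(passages) orders so each passage occupies each position once."""
    n = len(passages)
    return [[passages[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(student_ids, passages):
    """Rotate students through the Latin-square orders in turn."""
    orders = cyclic_latin_square(passages)
    return {sid: orders[k % len(orders)] for k, sid in enumerate(student_ids)}


# Hypothetical labels: 10 passages and 31 students, matching the study's counts.
passages = [f"P{k}" for k in range(1, 11)]
students = [f"S{k:02d}" for k in range(1, 32)]
schedule = assign_orders(students, passages)
print(schedule["S01"])
print(schedule["S02"])
```

A cyclic square balances which passage falls in which week, but not which passage precedes which, so it does not control carryover between adjacent passages. With 31 students and 10 orders, one order is also used by one more student than the others.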

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds, and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.


Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. B., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, L., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey_ M T amp Hixson M D 2004) Using curriculum-based measurement to predict performa11ce on state assessments in reading Schoof Aychology Reviev 33 193-203

Minnesota Department of Education 200 l ) Hinnesow Basic Ski1middot TI1middot1 Technical Manual Accountability_ProgramsAssessment_und_ T cs ting AssessmentsEST B ST_ Technical_ Reports index html

Muyskens P amp Marston D (2006) The relmionship between CurriwumshyBased Measurement and ourcomes on high-srake~ 1es1 witl1 secondary studenrs Minneapolis Public Schools Unpublished manuscript

OConnor R E Fulmer D Harty K R amp Bell K M (2005) Layers of reading intervention in kindergarten through third grade Changes in teacl1ing and student outcomes Journal f~( Learning Disabilities 38 440--445

OConnor RE Hary K R amp Fuimer D (2005) Tiers of intervention in kindergarten through third grade Journal oleaming Disahiilies 38 532-538

Rasinski T V Pudak N D McKeon C A Wilfong L G Friedauer J A amp Heim P (2005) Is reading fluency a key for successful high school reading Journal of Adolescent and Adult Li1eracy 48 22-27

Ruppert D Wand M P amp CaiTol R l (2003) Semiparamerric regression New York Cambridge University Press

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Tichá, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Tichá, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Tichá, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
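As an illustration only, the agreement index described above can be written as a short function. The function name and the counts in the usage line are hypothetical; they are not taken from the study's scoring records.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Agreements divided by agreements plus disagreements, as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one scored item is required")
    return 100.0 * agreements / total


# Hypothetical counts: two scorers agree on 194 of 200 scored items.
print(percent_agreement(194, 6))  # 97.0
```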

Results

The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i$ is the participant subscript, $i = 1, \ldots, N$, and $j$ is the wave subscript, $j = 1, \ldots, n_i$. In Equation (1), $\beta_0$ is the intercept (status at wave 1) and $\beta_1$ is the linear slope, with $t_{ij} = j - 1$ and $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$, which is the random effects structure, with $b_{0i}$ being the deviation of an individual's intercept from the mean intercept, $b_{1i}$ being the deviation of an individual's slope from the mean slope, and $e_{ij}$ being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
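A minimal sketch of how Equation (1) generates trajectories may help fix the notation. All parameter values below are hypothetical placeholders chosen for easy arithmetic, not estimates from this study, and the function names are illustrative only (the study itself fit the model with the SAS MIXED procedure).

```python
def time_code(wave: int) -> int:
    """Time metric from Equation (1): t_ij = j - 1, so wave 1 is coded 0."""
    return wave - 1


def mean_trajectory(beta0: float, beta1: float, waves) -> list:
    """Fixed-effects (group mean) prediction at each wave."""
    return [beta0 + beta1 * time_code(j) for j in waves]


def individual_trajectory(beta0, beta1, b0i, b1i, waves) -> list:
    """Expected scores for one student: the group parameters plus that
    student's random intercept (b0i) and random slope (b1i) deviations."""
    return [(beta0 + b0i) + (beta1 + b1i) * time_code(j) for j in waves]


# Hypothetical values (assumptions for illustration only).
waves = range(1, 11)  # 10 waves
group = mean_trajectory(beta0=100.0, beta1=2.0, waves=waves)
student = individual_trajectory(100.0, 2.0, b0i=-10.0, b1i=0.5, waves=waves)
print(group[0], group[-1])      # 100.0 118.0 (the intercept is the status at wave 1)
print(student[0], student[-1])  # 90.0 112.5
```

Because time is coded as $j - 1$, the intercept is interpretable as the expected score at the first wave rather than at a hypothetical wave 0.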

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, $\beta_1 = 0.84$, t(245) = 2.63, p = .009, and the intercept was significant, $\beta_0 = 138.1$, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, $\sum_i n_i$, rather than the number of participants ($N$). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of 0.84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, $\beta_1 = -0.41$, t(29.4) = -0.55, p = .58, but the intercept was significant, $\beta_0 = 420.04$, t(29) = 26.71, p < .0001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \varepsilon_{ij}. \tag{2}$$

In Equation (2), $\beta_0$ is the intercept (constant across all 10 waves), $\beta_1$ is the linear slope over wave 1 to wave 7 (with $t_{ij} = j - 1$), and $\beta_2$ is the linear slope starting at wave 8, with $t^{*}_{ij} = 0$ for wave 1 through wave 7 and $t^{*}_{ij} = t_{ij} - 7$ starting at wave 8. Random effects terms were specified for each linear slope and the intercept, that is, $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$. Primary interest was on $\beta_1$, as this was the linear slope for waves 1-7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant: $\beta_0 = 21.22$, t(25.3) = 17.22, p < .0001; $\beta_1 = 2.88$, t(32.7) = 12.79, p < .0001; and $\beta_2 = -7.26$, t(227) = -10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of $H_0\colon \beta_2 = 0$; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
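The time coding behind the spline model can be sketched as follows. This is an illustrative reading of Equation (2), not the authors' SAS code: the function names are hypothetical, and the three coefficients are the rounded fixed-effect estimates reported in the text for 3-minute maze selection.

```python
KNOT_T = 7  # t* starts counting at wave 8, where t = j - 1 = 7


def time_codes(wave: int) -> tuple:
    """Return (t, t*) for a wave: t = j - 1, and t* = 0 through wave 7,
    then t - 7 from wave 8 onward."""
    t = wave - 1
    t_star = max(t - KNOT_T, 0)
    return t, t_star


def predicted_mean(beta0: float, beta1: float, beta2: float, wave: int) -> float:
    """Fixed-effects part of the spline model: beta0 + beta1*t + beta2*t*."""
    t, t_star = time_codes(wave)
    return beta0 + beta1 * t + beta2 * t_star


# Rounded estimates reported in the text for CMC in 3 minutes.
B0, B1, B2 = 21.22, 2.88, -7.26
for wave in range(1, 11):
    print(wave, time_codes(wave), round(predicted_mean(B0, B1, B2, wave), 2))
```

Under this coding, with both $t_{ij}$ and $t^{*}_{ij}$ in the model as written, the predicted mean rises by $\beta_1$ per wave through wave 8 and changes by $\beta_1 + \beta_2$ per wave afterward, so $\beta_2$ acts as the shift in slope introduced at the knot.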

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant: $\beta_0 = 13.58$, t(24.00) = 16.93, p < .0001; $\beta_1 = 2.17$, t(32.7) = 12.85, p < .0001; and $\beta_2 = -5.75$, t(225) = -10.19, p < .0001. Thus, for the 2-minute maze there was an overall rate of increase of 2.17 CMC in 2 minutes per wave, but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

[Figure 1 panels: "Linear Model: Words Read Correct, 1 Minute" (top) and "Linear Model: Words Read Correct, 3 Minutes" (bottom); x-axis: Wave; series: observed (Obs) and predicted (Pred) means.]

FIGURE 1 One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave.

TABLE 5
Detailed Results of the Growth Curve Analysis

Parameter      Estimate    SE         df      t value    p value

WRC1
  $\beta_0$    138.1       5.5538     28.8    24.87      <.0001
  $\beta_1$    0.839       0.319      245     2.63       .0091

WRC3
  $\beta_0$    420.04      15.7232    29      26.71      <.0001
  $\beta_1$    -0.4057     0.7333     29.4    -0.55      .5842

CMC2
  $\beta_0$    13.5786     0.8022     24      16.93      <.0001
  $\beta_1$    2.1674      0.1686     32.7    12.85      <.0001
  $\beta_2$    -5.747      0.5642     225     -10.19     <.0001

CMC3
  $\beta_0$    21.22       1.2321     25.3    17.22      <.0001
  $\beta_1$    2.8808      0.2253     32.7    12.79      <.0001
  $\beta_2$    -7.2639     0.6716     227     -10.82     <.0001

Note. WRC1, WRC3 = Words read correctly in 1 and 3 minutes; CMC2, CMC3 = Correct maze choices in 2 and 3 minutes.
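The t values in Table 5 are the usual ratio of each estimate to its standard error, which gives a quick internal consistency check on the table. The sketch below assumes the decimal placement of the Table 5 entries as read here; the dictionary layout and function names are illustrative only.

```python
# (measure, parameter): (estimate, standard error, reported t value), read from Table 5.
TABLE_5 = {
    ("WRC1", "b0"): (138.1, 5.5538, 24.87),
    ("WRC1", "b1"): (0.839, 0.319, 2.63),
    ("WRC3", "b0"): (420.04, 15.7232, 26.71),
    ("WRC3", "b1"): (-0.4057, 0.7333, -0.55),
    ("CMC2", "b0"): (13.5786, 0.8022, 16.93),
    ("CMC2", "b1"): (2.1674, 0.1686, 12.85),
    ("CMC2", "b2"): (-5.747, 0.5642, -10.19),
    ("CMC3", "b0"): (21.22, 1.2321, 17.22),
    ("CMC3", "b1"): (2.8808, 0.2253, 12.79),
    ("CMC3", "b2"): (-7.2639, 0.6716, -10.82),
}


def t_ratio(estimate: float, se: float) -> float:
    """t statistic for a fixed effect: estimate divided by its standard error."""
    return estimate / se


def max_discrepancy(rows: dict) -> float:
    """Largest absolute gap between a recomputed and a reported t value."""
    return max(abs(t_ratio(est, se) - t) for est, se, t in rows.values())


# Small gaps are expected because the tabled values are themselves rounded.
print(round(max_discrepancy(TABLE_5), 4))
```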

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (0.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t^{*}_{ij} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the $i$th participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t^{*}_{ij} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij}. \tag{3}$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4 = 0.009$, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
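To make the size of this interaction concrete: under Equation (3), the wave 1-7 slope for a student with MBST score $m_i$ is $\beta_1 + \beta_4 m_i$, so two students who differ only in MBST score differ in slope by $\beta_4$ times the score difference. The sketch below applies the reported rounded estimate of $\beta_4$ to the two MBST scores used for Figure 4; the function name is hypothetical.

```python
def slope_difference(beta4: float, mbst_low: float, mbst_high: float) -> float:
    """Difference in wave 1-7 slopes implied by Equation (3) for two MBST scores.
    The beta1 term is common to both students and cancels out."""
    return beta4 * (mbst_high - mbst_low)


# Reported beta4 (rounded) and the two MBST scores plotted in Figure 4.
diff = slope_difference(0.009, 500, 700)
print(round(diff, 2))  # 1.8 additional correct maze choices per wave
```

Because the reported coefficient is rounded to three decimals, the result should be read as approximate.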

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (0.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

[Figure 2 panels: "Spline Model: Correct Maze Choices, 3 Minutes" (top) and "Spline Model: Correct Maze Choices, 2 Minutes" (bottom); x-axis: Wave; series: observed (Obs) and predicted (Pred) means.]

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave.

[Figure 3 panels: "Words Read Correctly in 1 Minute" (top) and "Correct Maze Choices in 3 Minutes" (bottom); x-axis: Wave; individual student lines with the mean.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices).

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Tichá et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

[Figure 4: predicted curves by wave for two MBST scores; x-axis: Wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores.

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Tichá et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure      Time

Reading aloud                          1 minute     2 minutes      3 minutes
  Total words read                     .30          .32            .33
  Words read correct                   .32          [illegible]    .34

Maze selection                         2 minutes    3 minutes      4 minutes
  Correct choice                       .52          .50            .48*
  Correct minus incorrect choices      .57          .55*           .51

n = 31. *p < .01.

Note. MBST = Minnesota Basic Standards Test.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 1, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
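The two suggestions above (screening passages by group performance and counterbalancing passage order) can be sketched roughly as follows. This is one possible operationalization, not a procedure from the study: the function names, the tolerance value, and all scores are hypothetical, and a real screening would need a defensible criterion for what counts as a deviant passage.

```python
from statistics import mean


def flag_outlying_passages(scores_by_passage: dict, tolerance: float) -> dict:
    """Return passages whose group mean differs from the mean of all passage
    means by more than `tolerance` (an assumed, user-chosen threshold)."""
    passage_means = {p: mean(s) for p, s in scores_by_passage.items()}
    grand_mean = mean(passage_means.values())
    return {
        p: m - grand_mean
        for p, m in passage_means.items()
        if abs(m - grand_mean) > tolerance
    }


def rotated_orders(passages: list) -> list:
    """Simple counterbalancing: each rotation starts with a different passage,
    so every passage appears once in every serial position across orders."""
    return [passages[i:] + passages[:i] for i in range(len(passages))]


# Hypothetical maze scores for the same three students on three passages.
scores = {"P1": [30, 32, 34], "P2": [31, 33, 35], "P3": [44, 46, 48]}
print(flag_outlying_passages(scores, tolerance=6))  # {'P3': 9}
print(rotated_orders(["P1", "P2", "P3"]))
```

Assigning different students to different rotations separates passage effects from growth over time, which a single fixed order cannot do.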

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade, and again in 10th grade.

2. We would like to note that there was an error in the Tichá et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds, and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second-grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Tichá, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Tichá, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Tichá, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Tichá, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


FIGURE 1. One- and 3-minute reading aloud (words read correctly): observed and predicted means by wave. [Panels: Linear Model, Words Read Correct, 1 Minute; Linear Model, Words Read Correct, 3 Minutes. Series: observed (Obs) and predicted (Pred) means. Horizontal axis: Wave.]

TABLE 5
Detailed Results of the Growth Curve Analysis

Measure   Parameter   Estimate    SE         df      t value   p value
WRC1      β0          138.1       5.5538     28.8     24.87    <.0001
          β1            0.839     0.319      245       2.63     .0091
WRC3      β0          420.04     15.7232     29       26.71    <.0001
          β1           -0.4057    0.7333     29.4     -0.55     .5842
CMC2      β0           13.5786    0.8022     24       16.93    <.0001
          β1            2.1674    0.1686     32.7     12.85    <.0001
          β2           -5.747     0.5642     22.5    -10.19    <.0001
CMC3      β0           21.22      1.2321     25.3     17.22    <.0001
          β1            2.8808    0.2253     32.7     12.79    <.0001
          β2           -7.2639    0.6716     22.7    -10.82    <.0001

Note. WRC1, WRC3 = words read correctly in 1 and 3 minutes; CMC2, CMC3 = correct maze choices in 2 and 3 minutes.

at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.
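As a quick arithmetic check on the growth estimates summarized above, each t value in Table 5 should equal the estimate divided by its standard error, up to rounding. The Python sketch below recomputes that ratio for each row. The numbers are the Table 5 entries as read here; the row labels, function names, and the rounding tolerance are assumptions introduced only for illustration.

```python
# Table 5 rows as read here: (label, estimate, standard error, reported t value).
# The labels are illustrative names, not the authors' notation.
TABLE5_ROWS = [
    ("WRC1 beta0", 138.1, 5.5538, 24.87),
    ("WRC1 beta1", 0.839, 0.319, 2.63),
    ("WRC3 beta0", 420.04, 15.7232, 26.71),
    ("WRC3 beta1", -0.4057, 0.7333, -0.55),
    ("CMC2 beta0", 13.5786, 0.8022, 16.93),
    ("CMC2 beta1", 2.1674, 0.1686, 12.85),
    ("CMC2 beta2", -5.747, 0.5642, -10.19),
    ("CMC3 beta0", 21.22, 1.2321, 17.22),
    ("CMC3 beta1", 2.8808, 0.2253, 12.79),
    ("CMC3 beta2", -7.2639, 0.6716, -10.82),
]


def t_ratio(estimate, se):
    """Return the t value implied by an estimate and its standard error."""
    return estimate / se


def check_rows(rows, tol=0.02):
    """Pair each label with the recomputed t and whether it matches the reported t."""
    results = []
    for label, estimate, se, reported_t in rows:
        t = t_ratio(estimate, se)
        results.append((label, round(t, 2), abs(t - reported_t) <= tol))
    return results


for label, t, ok in check_rows(TABLE5_ROWS):
    print(f"{label:12s} t = {t:7.2f}  consistent with reported value: {ok}")
```

The tolerance allows for the fact that the published estimates and standard errors are themselves rounded, so a recomputed ratio can differ from the printed t value in the second decimal place.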

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as the growth curves above (i.e., $y_{ij} = b_{0i} + b_{1i}t_{ij} + e_{ij}$ or $y_{ij} = b_{0i} + b_{1i}t_{ij} + b_{2i}t_{ij}^{+} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model,

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^{+} + \beta_3 m_i + \beta_4 m_i t_{ij} + e_{ij} \qquad (3)$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4$ = 0.009, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
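To make the size of this interaction concrete: in Equation (3) the weekly slope for a student with MBST score m is the time coefficient plus the interaction coefficient times m, so two students differ in weekly growth by the interaction coefficient times the difference in their MBST scores. The Python sketch below works this out for the two scores used in Figure 4. It reads the reported coefficient as 0.009, which is an assumption about where the decimal point falls, and the function name is hypothetical.

```python
# Interaction coefficient (MBST score x time) from Equation (3),
# read here as 0.009; the decimal placement is an assumption.
BETA4 = 0.009


def slope_difference(mbst_high, mbst_low, beta4=BETA4):
    """Difference in weekly growth (correct maze choices per week)
    implied by the MBST-by-time interaction for two MBST scores."""
    return beta4 * (mbst_high - mbst_low)


# The two illustrative scores used for the predicted curves in Figure 4.
diff = slope_difference(700, 500)
print(f"Implied difference in weekly growth: {diff:.1f} correct maze choices")
```

Under that reading, a 200-point difference in MBST score corresponds to a difference of about 1.8 correct maze choices per week over waves 1-7, which can be compared with the average 3-minute growth rate of 2.88 per week.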

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

FIGURE 2. Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Panels: Spline Model, Correct Maze Choices, 3 Minutes; Spline Model, Correct Maze Choices, 2 Minutes. Series: observed (Obs) and predicted (Pred) means. Horizontal axis: Wave.]

FIGURE 3. Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Panels: Words Read Correctly in 1 Minute; Correct Maze Choices in 3 Minutes. Horizontal axis: Wave.]

FIGURE 4. Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [Horizontal axis: Wave.]

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Tichá et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not (see Note 2).

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Tichá et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.
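One way to see what the gap between the two ranges of correlations means with only 31 students is to convert each correlation to a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compare it with the two-tailed critical value for 29 degrees of freedom. The Python sketch below does this for the endpoints of the two ranges quoted above. This is an illustrative calculation rather than an analysis from the study; the critical value of 2.756 (alpha = .01) comes from standard t tables, and the function name is hypothetical.

```python
import math

N_STUDENTS = 31      # size of the exploratory subsample
T_CRIT_01 = 2.756    # two-tailed critical t for df = 29 at alpha = .01 (standard tables)


def r_to_t(r, n=N_STUDENTS):
    """Convert a Pearson correlation to its t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Endpoints of the correlation ranges reported for the subsample.
examples = [
    ("reading aloud, lowest", 0.30),
    ("reading aloud, highest", 0.34),
    ("maze selection, lowest", 0.48),
    ("maze selection, highest", 0.57),
]

for label, r in examples:
    t = r_to_t(r)
    print(f"{label:24s} r = {r:.2f}  t = {t:.2f}  exceeds critical value: {t > T_CRIT_01}")
```

With this sample size, even the lowest maze-selection correlation clears the .01 threshold, whereas the highest reading-aloud correlation does not.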

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure        Time
Reading aloud                            1 minute    2 minutes      3 minutes
  Total words read                       .30         .32            .33
  Words read correct                     .32         [illegible]    .34
Maze selection                           2 minutes   3 minutes      4 minutes
  Correct choices                        .52**       .50**          .48**
  Correct minus incorrect choices        .57**       .55**          .51**

Note. MBST = Minnesota Basic Standards Test. n = 31. **p < .01.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9 related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample (see Note 3).

We suggest further examination of the effects of passage variability on growth, and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
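The group-administration check described above can be sketched in a few lines of Python. The scores below are invented for illustration, and the flagging rule (a passage whose group mean lies more than a chosen number of points from the average of all passage means) is an assumption, not a procedure taken from this study or from Griffiths et al. (2009).

```python
from statistics import mean

# Hypothetical data: passage id -> correct maze choices for the same group of students.
scores_by_passage = {
    "P1": [22, 25, 19, 30],
    "P2": [23, 24, 20, 29],
    "P3": [31, 35, 28, 40],  # the group scores noticeably higher on this passage
    "P4": [21, 26, 18, 31],
}


def flag_unequal_passages(scores, max_gap=5.0):
    """Return passages whose group mean differs from the average passage mean
    by more than max_gap points (max_gap is an illustrative threshold)."""
    passage_means = {passage: mean(values) for passage, values in scores.items()}
    overall = mean(passage_means.values())
    return sorted(
        passage
        for passage, passage_mean in passage_means.items()
        if abs(passage_mean - overall) > max_gap
    )


print(flag_unequal_passages(scores_by_passage))
```

A passage flagged in this way would be a candidate for replacement or revision before the set is used for progress monitoring. In practice the threshold would need to be set relative to the score scale and the expected weekly growth.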

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one that will need to be examined directly at both the middle- and high school levels, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs US Department of Education Field Initiated Research Projects CFDA 84324C We thank Mary Pickart for her contributions to this research and Stanshyley Deno for his insights We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript

NOTES

1 TI1e MBST in reading is being replaced by the Minnesota Comprehensive Assessment a more broad-based reading test that is given annually in 3rd through 8th grade and again in I 0th grade

2 We would like to note that there was an error in the Ticha et al (2009) article In the methods section the maze selection is said to be given first and the- reading aloud second Later in the discussion section the reading aloud is said to be given first and the maze selection second In fact the reading-aloud measures were given first and the maze-selection measures second

3 We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students In the reading-aloud measure this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students

REFERENCES

Ardoin S P Suldo S M Win J Aldrich S amp McDonald E (2005) Accuracy of readability esiima1esmiddot predictions of CBM performance School Aycwlogy Review 20 1-22

Brown-Chidsey R Davis L amp Maya C (2003) Sources of variance in curriculum-based measures of silent realtling Psyc10ngy in the School 40 363-377

7 4 ESPIN ET AL CREATING A READING PROGRESS MEASUREMENT SYSTEM

Center on Education Policy (2008 S1a1e high school c~rit exams A move toshyward end-of-cow1middote exams Wnshington DC US Government Printshying Office Retrieved on March 25 2009 from wwwcep-dcorg

Compton D L Appleton A amp HospM K (2004) Exploring the relalionshyship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders Learning Disabilities Research amp Practice 19 176-84

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136-148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


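TABLE 5 Detailed Results of the Growth Curve Analysis

| Measure | Parameter | Estimate | SE | df | t value | p value |
|---|---|---|---|---|---|---|
| WRC1 | $\beta_0$ | 138.1 | 5.5538 | 28.8 | 24.87 | <.0001 |
| WRC1 | $\beta_1$ | 0.839 | 0.319 | 245 | 2.63 | .0091 |
| WRC3 | $\beta_0$ | 420.04 | 15.7232 | 29 | 26.71 | <.0001 |
| WRC3 | $\beta_1$ | -0.4057 | 0.7333 | 29.4 | -0.55 | .5842 |
| CMC2 | $\beta_0$ | 13.5786 | 0.8022 | 24 | 16.93 | <.0001 |
| CMC2 | $\beta_1$ | 2.1674 | 0.1686 | 32.7 | 12.85 | <.0001 |
| CMC2 | $\beta_2$ | -5.747 | 0.5642 | 22.5 | -10.19 | <.0001 |
| CMC3 | $\beta_0$ | 21.22 | 1.2321 | 25.3 | 17.22 | <.0001 |
| CMC3 | $\beta_1$ | 2.8808 | 0.2253 | 32.7 | 12.79 | <.0001 |
| CMC3 | $\beta_2$ | -7.2639 | 0.6716 | 22.7 | -10.82 | <.0001 |

Note. WRC1, WRC3 = Words read correctly in 1 and 3 minutes; CMC2, CMC3 = Correct maze choices in 2 and 3 minutes.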

at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth, except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both the 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute reading-aloud measure (.84 words per week), but no significant growth for the 3-minute (or 2-minute) measure.
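As a reading aid, the significance statements in this summary can be traced back to Table 5 by dividing each slope estimate by its standard error. The short Python sketch below does only that arithmetic. The decimal placement of the values is an assumption made while reading the extracted table, the names `SLOPES` and `wald_ratio` are illustrative, and small last-digit differences from the tabled t values are expected because the published estimates are rounded. It is not the mixed-model software used in the study.

```python
# Slope estimates and standard errors as read from Table 5.
# ASSUMPTION: decimal points are placed by the reader of the extracted table.
SLOPES = {
    "WRC1": (0.839, 0.319),     # words read correctly, 1 minute
    "WRC3": (-0.4057, 0.7333),  # words read correctly, 3 minutes
    "CMC2": (2.1674, 0.1686),   # correct maze choices, 2 minutes
    "CMC3": (2.8808, 0.2253),   # correct maze choices, 3 minutes
}


def wald_ratio(estimate, standard_error):
    """Return the estimate divided by its standard error (the t ratio)."""
    return estimate / standard_error


# One ratio per measure: values far from zero correspond to the
# "significant growth" statements, values near zero to "no growth".
T_RATIOS = {name: wald_ratio(est, se) for name, (est, se) in SLOPES.items()}

for name, t in T_RATIOS.items():
    print(f"{name}: slope / SE = {t:.2f}")
```

Read this way, the maze-selection slopes are many standard errors above zero, the 1-minute reading-aloud slope is a few standard errors above zero, and the 3-minute reading-aloud slope is within one standard error of zero.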

One might conjecture that reading aloud might be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measures were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as in the growth curves above (i.e., $\varepsilon_{ij} = b_{0i} + b_{1i}t_{ij} + e_{ij}$ or $\varepsilon_{ij} = b_{0i} + b_{1i}t_{ij} + b_{2i}t_{ij}^{+} + e_{ij}$). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let $m_i$ = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^{+} + \beta_3 m_i + \beta_4 m_i t_{ij} + \varepsilon_{ij} \tag{3}$$

In Equation (3), $\beta_4$ represents the association between the MBST and the slopes for waves 1-7.

The number of CMC in 3 minutes resulted in significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST ($\beta_4$ = 0.009, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.
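A brief worked example may help in reading the size of the MBST-by-time coefficient. In Equation (3), the weekly slope over waves 1-7 for a student with MBST score m is the sum of the time coefficient and the interaction coefficient multiplied by m, so two students who differ in MBST score differ in slope by the interaction coefficient times that score difference. The Python sketch below applies this to the two MBST values plotted in Figure 4. It uses the rounded coefficient reported in the text, the names `BETA_4` and `slope_gap` are illustrative, and the result should be treated as approximate.

```python
# Reported MBST-by-time coefficient from the text (rounded value).
BETA_4 = 0.009  # change in weekly CMC growth per one-point change in MBST


def slope_gap(mbst_low, mbst_high, beta_4=BETA_4):
    """Difference in weekly growth (CMC per week, waves 1-7) implied by
    Equation (3) for two students with the given MBST scores."""
    return beta_4 * (mbst_high - mbst_low)


# The two MBST scores used for the predicted curves in Figure 4.
GAP_500_700 = slope_gap(500, 700)
print(f"Implied slope difference: {GAP_500_700:.1f} CMC per week")
```

Under these assumptions the two predicted curves separate by a little under two correct maze choices per week, which is the widening gap visible between the plotted lines.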

Discussion

The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the 3-minute maze-selection task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

FIGURE 2 Three- and 2-minute maze selection (correct maze choices): observed and predicted means by wave. [Two panels: "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; x-axis: Wave; series: observed and predicted CMC.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Two panels: "Words Read Correctly in 1 Minute" and "Correct Maze Choices in 3 Minutes"; x-axis: Wave; group mean shown with individual student lines.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [x-axis: Wave.]

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

TABLE 6 Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

| CBM measure and scoring procedure | Time | | |
|---|---|---|---|
| Reading aloud | 1 minute | 2 minutes | 3 minutes |
| Total words read | .30 | .32 | .33 |
| Words read correct | .32 | [illegible] | .34 |
| Maze selection | 2 minutes | 3 minutes | 4 minutes |
| Correct choices | .52* | .50* | .48* |
| Correct minus incorrect choices | .57* | .55* | .51* |

Note. MBST = Minnesota Basic Standards Test. n = 31. *p < .01.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristics of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
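The two suggestions above, screening passages by group performance and counterbalancing passage order, can be illustrated with a minimal Python sketch. It is an illustration only and not a procedure used in this study: the function names, the example scores, and the 1.5 standard-deviation flagging threshold are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def passage_means(scores_by_passage):
    """Mean score of the whole group on each passage."""
    return {p: mean(scores) for p, scores in scores_by_passage.items()}


def flag_passages(scores_by_passage, threshold=1.5):
    """Return passages whose group mean lies more than `threshold` standard
    deviations (computed across passage means) from the average passage mean.
    The threshold is an arbitrary, hypothetical screening value."""
    means = passage_means(scores_by_passage)
    values = list(means.values())
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # all passages produced identical group means
    return [p for p, m in means.items() if abs(m - center) / spread > threshold]


def rotated_orders(passages, n_groups):
    """Counterbalance by rotation: group g starts with passage g and cycles
    through the rest, so each passage appears in different positions."""
    k = len(passages)
    return [[passages[(g + i) % k] for i in range(k)] for g in range(n_groups)]


# Hypothetical maze scores for three students on five passages.
EXAMPLE_SCORES = {
    "P1": [20, 22, 21],
    "P2": [21, 23, 22],
    "P3": [22, 21, 23],
    "P4": [30, 32, 31],  # the group scores noticeably higher on this passage
    "P5": [21, 22, 20],
}

FLAGGED = flag_passages(EXAMPLE_SCORES)
ORDERS = rotated_orders(["P1", "P2", "P3"], 3)
print("Flagged passages:", FLAGGED)
print("Administration orders:", ORDERS)
```

A passage flagged in this way would be a candidate for revision or replacement before being used for progress monitoring, and the rotated orders ensure that any remaining difference between passages is not confounded with the week in which a passage is given.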

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures, and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.


Levin E K Zigmond N amp Birch J W (1985) A followup study of 52 learning disabled adolescents Journal (Learning Disabilities 8 2-7

MacMman P (2000) Simultaneous measuremenl of reading growth genshyder and relative-age effects Many-faceted Rasch applied to CBM reading scores Journal of Applied Measuremem I 393--408

Marston D ( t989) A curriculum-based measurement approach to assessing academic performance What it is and why do it ln M Shinn Ed) Curriculum-based measurement Assessing special children (pp 18-78) New York Guilford

McGlinchey_ M T amp Hixson M D 2004) Using curriculum-based measurement to predict performa11ce on state assessments in reading Schoof Aychology Reviev 33 193-203

Minnesota Department of Education 200 l ) Hinnesow Basic Ski1middot TI1middot1 Technical Manual Accountability_ProgramsAssessment_und_ T cs ting AssessmentsEST B ST_ Technical_ Reports index html

Muyskens P amp Marston D (2006) The relmionship between CurriwumshyBased Measurement and ourcomes on high-srake~ 1es1 witl1 secondary studenrs Minneapolis Public Schools Unpublished manuscript

OConnor R E Fulmer D Harty K R amp Bell K M (2005) Layers of reading intervention in kindergarten through third grade Changes in teacl1ing and student outcomes Journal f~( Learning Disabilities 38 440--445

OConnor RE Hary K R amp Fuimer D (2005) Tiers of intervention in kindergarten through third grade Journal oleaming Disahiilies 38 532-538

Rasinski T V Pudak N D McKeon C A Wilfong L G Friedauer J A amp Heim P (2005) Is reading fluency a key for successful high school reading Journal of Adolescent and Adult Li1eracy 48 22-27

Ruppert D Wand M P amp CaiTol R l (2003) Semiparamerric regression New York Cambridge University Press

Shin J Espin C A Deno S L amp McConnell S (2004) Use ofhierarchishycal linear modeling anltl curriculum-based measurement for assessing academic growth and in~iructional factors for students with learning difficulties Asia Pucific Education Review 5 36-48

Silberglitt B amp Hintze J (2005) Formative assesmem usmg CBM-R cut scores to track progress toward success on state-mandated achieveshyment tests A comparison of methods Journal ofPsvchoed11catio11al Assessment 23 304-325 middot

Srnge S A amp Jacobsen MA (200 ) Predicting student success on a staleshymandated performance-based assessment using oral reading fluency School Psychology Review 30 407-419

Stecker P M Fuchs L S amp Fuch~ D (2005) Using curriculum-based measurement to improve student achievement Review of re~earch Aychology in the Schools 42 795-819

Ticha R Espin C A amp Wayman M M (2009) Reading progress monishytoring for secondary-school students Reliability validity and sensitivshyity to growth of reading aloud and maze selection measures Learning Disahilities Research amp Praclice 24 132-middot142

Torgesen J K (2000) Individual differences in response to early intervenshytions in reading Tlie lingering problem of treatment resisters Learning Disahililies Research amp Prac1ice 15 55-64

Touchstone Applied Science and Associates (2006) Degrees of reading power Brewster NY Author

Vaughn S Linan-Thompson S amp Hickman P (2003) Response to inshystructi011 as a means of identifying sludents with readinglearning disshyabilities Exceptional Children 69 391-409

Vellutno F R fletcher J M Snowling M L amp Scanlon D (2004) Specific reading disability(dyslexiaj What have we learned in the past four decades Journal aChild Ayc10log1 and Aychialry 45 2--40

Vclutino F R Scanlon D M amp Tanzman M S (1994) Components or reading ability issues and problems in operationalizing word identifishycation phonological coding and orthographic coding In G R Lyon (Ed) Fnmes ()freferencesjhr the assessmen1 ofearning disabii1ies new Fiews on measurement issues (pp 279-332) Baltimore Brookes Publishing

Vcllutino F R Tunmer W EJaccard J J amp Chen R (2007) Components of reading ability Mtiltivariate evidence for a convergent skills model of reading development Scientific 1udies of Reading l J 3--32

Warner M M Schumaker JB Alley G R amp Deshler D D (1980) Learning disabled adolescents in the public schools Are they different from other low achievers Exceptional Ed11calio11 Quar1eriy 1(2) 27-36

Wayman M M Ticha R Wallace T Espin C A Wiley H l Du X amp Long J (2009) Companwm qf different sconmiddotng procedures Or CBM mazeselecion measures (Technical Report No 10) Minneapolis MN University of Minnesota Research Institute on Progress Monitoring

Wayman M M Wallace T Wiley H 1 Ticha R amp Espin C A_

(2007) Literature synthesis on curriculum-based measurement in readshying Journal of Special Educmion 41 85-120

About the Authors

LEARNING DISABILITIES RESEARCH 75

Wiley H I amp Deno S L (2005) Oral reading and maze measures as preshydictors of success for English learners on a state standards assessment Remedial and Special Education 26 207-214

Yovanoff P Duesbery L Alonzo J amp Tindal G (2005) Grade-level invariance ofa theoretical causal structure predicting reading compreshyhension with vocabulary and oral reading fluency Educational Meashysuremem hsues and Practice 24 4-2

Christine Espin is a professor in Education and Child Studies at Leiden University Leiden the Netherlands She is also an adjunct professor in Cognitive Sciences at the University of Minnesota Her research interests focus on the development of curriculum-based measurement (CBM) procedures inreading vritten expression and content-area learning for secondary students with learning disabilities and teachers use ofCBM data

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato Her research focuses on the development of general outcome measures for students with significant cognitive disabilities implementation of response to intervention and utilization of data in decision making at the student clasroom school and district level

Heather Campbell is an assistant professor of Education at St Olaf College in Northfield Minnesota She works with educationaJ opportunity programs at St Olaf and her research interests include the development of CBM procedures in written expression for English language learners

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri Her research interests focus on the development of CBM procedures in reading written expression and mathematics for students in early elementary grades as well as implementation of response to intervention in classrooms

Jeffrey D Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology University of Minnesota His interest is longitudinal data analysis

Page 11: Learning Disabilities Research · Learning Disabilities Research Learning Disabilities Research & Practice. 25(2), 60-75 0 2010 TI1e Division for Learning Disabilities of the Council

[Figure 2 panels: "Spline Model: Correct Maze Choices, 3 Minutes" and "Spline Model: Correct Maze Choices, 2 Minutes"; observed (Obs) and predicted (Pred) means plotted by wave.]

FIGURE 2 Three- and 2-minute maze selection (correct maze choices) observed and predicted means by wave.

[Figure 3 panels: "Words Read Correctly in 1 Minute" and "Correct Maze Choices in 3 Minutes"; individual student scores and the group mean plotted by wave.]

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices).

In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

[Figure 4: plotted by wave.]

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores.
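The "fanning out" of maze scores described above amounts to individual growth slopes that differ across students. The sketch below illustrates that idea with an ordinary least-squares slope per student. It is not the spline or mixed-model analysis used in the study, and the student labels and scores are hypothetical values chosen only for illustration.

```python
from statistics import mean


def ols_slope(waves, scores):
    """Least-squares slope of scores regressed on wave number."""
    w_bar, s_bar = mean(waves), mean(scores)
    sxx = sum((w - w_bar) ** 2 for w in waves)
    sxy = sum((w - w_bar) * (s - s_bar) for w, s in zip(waves, scores))
    return sxy / sxx


# Hypothetical correct-maze-choice scores over 10 waves (not study data).
waves = list(range(1, 11))
students = {
    "lower_performing": [10 + 0.5 * w for w in waves],
    "higher_performing": [20 + 1.5 * w for w in waves],
}

# One growth rate per student; unequal slopes produce the fanning-out pattern.
slopes = {name: ols_slope(waves, scores) for name, scores in students.items()}
for name, slope in slopes.items():
    print(f"{name}: {slope:.2f} correct choices per wave")
```

With slopes in hand, one could relate them to a criterion score such as the MBST, which is the kind of relation the text refers to when it argues against simple practice effects.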

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large cross-grade sample of just struggling readers.

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure      Time
Reading aloud                          1 minute     2 minutes     3 minutes
  Total words read                     .30          .32           .33
  Words read correct                   .32          [illegible]   .34
Maze selection                         2 minutes    3 minutes     4 minutes
  Correct choices                      .52          .50           .48*
  Correct minus incorrect choices      .57          .55*          .51

n = 31. *p < .01.
Note. MBST = Minnesota Basic Standards Test.
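To put the Table 6 correlations in context, the sketch below applies the usual t test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), to the lowest and highest values reported in the text for each measure, with n = 31. The authors do not report this calculation in this form. The critical values are assumptions taken from a standard two-tailed t table for 29 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, given Pearson r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical values for df = 29 from a standard t table (assumed here).
T_CRIT_05 = 2.045
T_CRIT_01 = 2.756

n = 31  # sample size given in the note to Table 6
reported = [
    ("reading aloud, lowest", 0.30),
    ("reading aloud, highest", 0.34),
    ("maze selection, lowest", 0.48),
    ("maze selection, highest", 0.57),
]

for label, r in reported:
    t = correlation_t(r, n)
    print(f"{label}: r = {r:.2f}, t(29) = {t:.2f}, "
          f"exceeds .05 cutoff: {t > T_CRIT_05}, exceeds .01 cutoff: {t > T_CRIT_01}")
```

Under these assumptions, the maze-selection values clear the .01 cutoff while the reading-aloud values fall short of the .05 cutoff, which is consistent with the text's description of reading aloud as the weaker predictor in this small subsample.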

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9 related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
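The two recommendations above, checking group scores per passage and counterbalancing passage order, can be sketched as follows. This is a minimal illustration, not a procedure taken from the study. It assumes all passages are given to the same group at roughly one time point, so that growth does not confound the comparison. The function names, the 1.5 standard deviation cutoff, and the scores are hypothetical.

```python
from statistics import mean, pstdev


def flag_passages(scores_by_passage, threshold=1.5):
    """Return passages whose group mean deviates from the mean of all
    passage means by more than `threshold` standard deviations."""
    means = {p: mean(s) for p, s in scores_by_passage.items()}
    grand = mean(list(means.values()))
    spread = pstdev(list(means.values()))
    if spread == 0:
        return []
    return sorted(p for p, m in means.items() if abs(m - grand) > threshold * spread)


def rotated_orders(passages):
    """Rotation-based counterbalancing: each passage occupies each
    position exactly once across the returned orders."""
    return [passages[i:] + passages[:i] for i in range(len(passages))]


# Hypothetical group scores per passage (not study data).
scores = {
    "A": [19, 20, 21],
    "B": [20, 21, 22],
    "C": [18, 19, 20],
    "D": [19, 20, 21],
    "E": [34, 35, 36],  # an unusually easy passage
}

flagged = flag_passages(scores)
orders = rotated_orders(["A", "B", "C"])
print("Passages to review:", flagged)
print("Counterbalanced orders:", orders)
```

Assigning students to the rotated orders in roughly equal numbers spreads any single passage's difficulty across all measurement occasions, so that an atypical passage is less likely to appear as a spike at one wave.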

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds, and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: US Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, US Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136-148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze-selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.


FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices). [Figure: two panels, Words Read Correctly in 1 Minute and Correct Maze Choices in 3 Minutes, each plotted against Wave.]

In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores. [Figure: x-axis is Wave.]

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud are consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a natural level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

TABLE 6 Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure      Time
Reading aloud                          1 minute    2 minutes     3 minutes
  Total words read                     .30         .32           .33
  Words read correct                   .32         [illegible]   .34
Maze selection                         2 minutes   3 minutes     4 minutes
  Correct choices                      .52         .50           .48*
  Correct minus incorrect choices      .57         .55*          .51

Note. MBST = Minnesota Basic Standards Test. n = 31. *p < .01.
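As a rough check on how coefficients of the size reported in Table 6 behave with a sample of 31, the sketch below approximates a two-sided p-value for a correlation using Fisher's z transformation. This is an illustrative assumption, not the procedure used in the study; the function name `fisher_z_p_value` is introduced here only for the example, and the coefficients passed in are the range endpoints quoted in the text.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0.

    Uses Fisher's z transformation: atanh(r) * sqrt(n - 3) is treated as
    approximately standard normal. This is an approximation, not an exact test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


N = 31  # subsample size reported for the exploratory study

# Range endpoints quoted in the text for each measure.
endpoints = {
    "reading aloud, lowest": 0.30,
    "reading aloud, highest": 0.34,
    "maze selection, lowest": 0.48,
    "maze selection, highest": 0.57,
}

for label, r in endpoints.items():
    print(f"{label}: r = {r:.2f}, approximate p = {fisher_z_p_value(r, N):.3f}")
```

Under this approximation, the maze-selection coefficients (.48 to .57) fall below p = .01, whereas the reading-aloud coefficients (.30 to .34) do not reach p = .05, which is consistent with the pattern described for this subsample.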

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graph in Figure 2, this same spike in scores on week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9 related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important but complicated task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristic of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
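The two recommendations above, comparing group means across passages and counterbalancing passage order, can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented, the helper names (`passage_means`, `flag_outlier_passages`, `counterbalanced_orders`) are hypothetical, and the 1.5-SD flagging rule is an arbitrary choice for the example rather than a criterion from the study.

```python
from statistics import mean, stdev


def passage_means(scores):
    """Group mean per passage; scores maps passage id -> list of student scores."""
    return {passage: mean(vals) for passage, vals in scores.items()}


def flag_outlier_passages(scores, threshold=1.5):
    """Flag passages whose group mean lies more than `threshold` standard
    deviations (SD of the passage means) from the average passage mean.
    The threshold is an arbitrary illustrative choice."""
    means = passage_means(scores)
    values = list(means.values())
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []
    return [p for p, m in means.items() if abs(m - center) > threshold * spread]


def counterbalanced_orders(passages):
    """Cyclic Latin square: row i is the passage list rotated by i positions,
    so every passage appears exactly once in every serial position."""
    k = len(passages)
    return [[passages[(i + j) % k] for j in range(k)] for i in range(k)]


# Hypothetical maze scores for three students on five passages.
example_scores = {
    "P1": [20, 25, 30],
    "P2": [21, 26, 29],
    "P3": [22, 24, 31],
    "P4": [35, 40, 44],  # deliberately easier passage in this made-up data
    "P5": [21, 25, 30],
}

print(flag_outlier_passages(example_scores))
for order in counterbalanced_orders(["P1", "P2", "P3"]):
    print(order)
```

With these invented scores only P4 is flagged. Assigning each rotation to a different group of students means that a passage effect is spread across administration occasions instead of being tied to a single week, which is the concern raised for passages 8 and 9.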

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.

Page 13: Learning Disabilities Research · Learning Disabilities Research Learning Disabilities Research & Practice. 25(2), 60-75 0 2010 TI1e Division for Learning Disabilities of the Council

72 ESPJN ET AL CREATING A READING PROGRESS MEASURFMENT SYSTEM

50

45 40 35

30

25

20 I 5

O

5

0 +-=+-------------CC+-------------------------------------------i 2 4 6 7 8 9 IO

Waw

FIGURE 4 Relationship between 3-minmc maze sckclion (correct maze choices) and Minnesota Basic Standards Test (MBST) scores

measures were administered In the cun-ent study maze alshyways was administered first and reading aloud second Pershyhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time PJthough the effect of order must be considered we would note that in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al 2009) a similar pattern of results was obtained with maze selection reflecting growth over time but reading aloud not2

Another plausible reason f0r differences in the measures may be related to the small convenience sample used ln this exploratory study Compared to the larger sample students in the exploratory study had relatively higher mean MBST scores (64617 vs 62690) Correlations between the CBM and MBST scores for this subsample were lower tlrnn for the larger sample ( see Table 6) ranging from 30 to 34 for readshying aloud and from 48 to 57 for maze selection (despite no restriction in the range of scores on the predictor and criteshyrion variables) In addition correlations for reading aloud are consistently and substantially lower than for maze selection Perhaps for this particular sample of students reading aloud did not function as a reasonable indicator of growth but maze selection did However in the follow-up study referred to earlier (Ticha et al 2009) similar results were obtained regarding grOvth rates produced by reading aloud and maze selection It is important to note that in the fo11ow-up study performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress and that both should be considered when examining the technical adeshyquacy of the measures Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students but it does suggest the need for further research examining the characteristics of the two measures for reflecting grmvth over time for older students It may be that although students as a group do not reach a ceiling in scores each individual student reaches a natural level of reading fluency that when compared to

TABLE 6 Correlations for Readlng Aloud and Maze Selection with MBST for

Study 2 Participants at Time of MBST

CBAf measure and scoi-ing pmcedure Time

Reading aloud I minuie 2 minutes 3 minutes Total words read 30 32 33 Words read correct 32 ll 34

Maze selection 2 minwes 3 minures 4 minutes Correct choice 52 50 48--- Correct minus 57 55deg 5 l

incorrect choices

II= 31 hplt Ol

Note MBST Minnesota Basic Standards Test

others reveals a general level of reading proficiency but does not change with time ln addition despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure it would be important to replicate the findings with a large cross-grade sample of just struggling readers

Although not the original intent of the study this exshyploratory study also provides us with data regarding betweenshypassage variability The order in which the measures were adminjstered was not counterbalanced across students preshyventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers Hmvshyever the design does allow us to examine characteristics of the passages themselves as growth measures As is evident in Figure 2 students displayed a steady rate of growth on the maze-selection measure until week 8 when there WdS a sudden spike in scores for nearly all students followed by a fairly low score in week 9 for nearly all students Even more interesting if one examines the reading-aloud graph in Figure 2 this same spike in scores on week 8 is not evishydent (recall that the same passages were used for reading aloud and maze selection each week) Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given Although fidelity checks on teacher adminishmiddotation of maze selection revealed that the teacher administered the passages correctly those checks were conducted only three times durshying the course of the study and were not done on the days that passages 8 and 9 were administered However when asked the teacher reported no particular problems with adshyministration on those days ( although one must still consider administration error a potential explanation for the pattern of resu Its)

There is however a potential somewhat troubling exshyplanation for the pattern seen at points 8 and 9 related to between-passage variability As discussed in Wayman et al (2007) determination of passage difficulty is an important but complicated task for progress monitoring The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability The task is complicated beshycause it is difficult to predict what may make a passage easy

or difficult The most common technique for determining passage equivalence use of readability formulas is not relishyable (see Ardoin Suldo Witt Aldrich amp McDonald 2005 Compton Appleton amp Hosp 2004) Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on readshying aloud Thus it would not be merely the characteristics of the passage itself but the characteristic of the passage as a maze passage that produced the variability for this particular sample3

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
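The two safeguards suggested here, checking group scores passage by passage and counterbalancing administration order, can be illustrated with a brief sketch. The Python code below is not part of the study's procedures; it is a minimal illustration under stated assumptions. The function names (flag_deviant_passages, rotated_orders), the 25% tolerance, and the example scores are all hypothetical. The first function compares each passage's group mean with the median of all passage means and flags passages that deviate by more than the tolerance; the second produces rotated administration orders so that each group of students begins with a different passage.

```python
from statistics import mean, median


def flag_deviant_passages(scores_by_passage, tolerance=0.25):
    """Return {passage: group mean} for passages that look non-equivalent.

    A passage is flagged when its group mean differs from the median of
    all passage means by more than `tolerance` (a proportion of that
    median). The 0.25 default is an arbitrary, hypothetical choice.
    """
    passage_means = {p: mean(s) for p, s in scores_by_passage.items()}
    center = median(passage_means.values())
    return {
        p: m
        for p, m in passage_means.items()
        if abs(m - center) > tolerance * center
    }


def rotated_orders(passages, n_groups):
    """Return one administration order per group by cyclic rotation.

    Group g starts with passage g (wrapping around), so each passage
    appears in each serial position equally often when the number of
    groups is a multiple of the number of passages.
    """
    k = len(passages)
    return [passages[g % k:] + passages[:g % k] for g in range(n_groups)]


# Hypothetical scores (e.g., correct maze choices) for three students
# on three passages; passage "p3" yields much higher scores.
example_scores = {
    "p1": [10, 12, 11],
    "p2": [11, 13, 12],
    "p3": [20, 22, 21],
}
print(flag_deviant_passages(example_scores))
print(rotated_orders(["p1", "p2", "p3"], 3))
```

A simple rotation of this kind balances the serial position of each passage but does not balance which passage precedes which; a fuller design would be needed if carryover between specific passages were a concern.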

CONCLUSION

We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level. We do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one which will need to be examined directly at both the middle- and high school level, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional, intensive reading instruction.

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

NOTES

1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second. Later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds, and thus may have had less of an effect on the overall score of the students.

REFERENCES

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1-22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363-377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from www.cep-dc.org

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303-323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deshler, D. D., Schumaker, J. B., Alley, G. R., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1-12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47-59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.

Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174-193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149-156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81-104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement using a reading maze task. Exceptional Children, 58, 436-450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20-28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13-23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372-386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429-432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8-75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007-496). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC. Retrieved January 16, 2008, from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007496

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A followup study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2-7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393-408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test technical manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440-445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532-538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22-27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 36-48.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304-325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795-819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132-142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55-64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279-332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27-36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85-120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207-214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4-12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.

Page 14: Learning Disabilities Research · Learning Disabilities Research Learning Disabilities Research & Practice. 25(2), 60-75 0 2010 TI1e Division for Learning Disabilities of the Council

or difficult The most common technique for determining passage equivalence use of readability formulas is not relishyable (see Ardoin Suldo Witt Aldrich amp McDonald 2005 Compton Appleton amp Hosp 2004) Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on readshying aloud Thus it would not be merely the characteristics of the passage itself but the characteristic of the passage as a maze passage that produced the variability for this particular sample3

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring especially when that progress monitoring is to be used as a part of a high-stakes decisionshymakingprocess Perhaps the best guarantee of passage equivshyalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages orto use the same passage for repeated testing (see Griffiths VanDerHeyden Skokut amp Lilies 2009 for a discussion of these approaches) If using parallel forms it would be wise to counterbalance the order in which the passages are administered especially if the goal is to establish nonnative growth rates on the measures

CONCLUSION

We examined the technical characteristics of two CBM readshying measures as indicators of performance for middle-school students We also conducted an exploratory study to examine the characteristics of the measures as progress measures Our goal was to develop measures that could be used to monishytor the progress of students with reading difficulties howshyever to examine technical characteristics of the measures we needed to include students across a range of performance levels Our results supported the use of both reading aloud and maze selection as indicators of perfiJrmance on a state standards test representing survival levels of reading perforshymance Reliability and validity were good for both measures and within the range of levels found in previous research at the elementary-school level Further few differences were found related to scoring procedure or time frame although reliability did increase somewhat for maze with an increase in time frame Given the results of the perfonnance study and talcing into account practical considerations we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance

Results of the exploratory progress study implied that it is important to consider technical adequacy of the measures as both performance and progress measures ln our study only maze selection revealed growth over time reading aloud did not Further growth on the maze-selection measure was related to performance on the MBST

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level First our results ap-

LEARNING DISABILITIES RESEARCH 73

ply only to the sample used in the study and replication with other samples is necessary Second our research addressed students at the middle-school level There is a need for reshysearch at the high school level We do not know whether our results would generalize to older studems Third our progress study was a pilot study Results must be repHcated with a larger representative sample Fourth once reliable and valid measures are developed it will be important to examine whether teacher use of the measures leads to improvement for struggling readers That of course is the ultimate goal of the research program and one which will need to be examined directly at both the middle- and high school level because one cannot assume that the positive results for use of the mea~ures at the elementary-school level (see Stecker Fuchs amp Fuchs 2005 for a review) will necessarily replicate at the secondary-school level Finally although our result support the use of the CBM reading measures as indicators of pershyformance in reading we would support the use of multiple measures for determination of students need for additional intensive reading instruction

ACKNOWLEDGMENTS

The research reported here was funded by the Office of Special Education Programs US Department of Education Field Initiated Research Projects CFDA 84324C We thank Mary Pickart for her contributions to this research and Stanshyley Deno for his insights We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript

NOTES

1 TI1e MBST in reading is being replaced by the Minnesota Comprehensive Assessment a more broad-based reading test that is given annually in 3rd through 8th grade and again in I 0th grade

2 We would like to note that there was an error in the Ticha et al (2009) article In the methods section the maze selection is said to be given first and the- reading aloud second Later in the discussion section the reading aloud is said to be given first and the maze selection second In fact the reading-aloud measures were given first and the maze-selection measures second

3 We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze selection item that created difficulties for all students In the reading-aloud measure this word was supplied after 3 seconds and thus may have had less of an effect on the overall score of the students

REFERENCES

Ardoin S P Suldo S M Win J Aldrich S amp McDonald E (2005) Accuracy of readability esiima1esmiddot predictions of CBM performance School Aycwlogy Review 20 1-22

Brown-Chidsey R Davis L amp Maya C (2003) Sources of variance in curriculum-based measures of silent realtling Psyc10ngy in the School 40 363-377

7 4 ESPIN ET AL CREATING A READING PROGRESS MEASUREMENT SYSTEM

Center on Education Policy (2008 S1a1e high school c~rit exams A move toshyward end-of-cow1middote exams Wnshington DC US Government Printshying Office Retrieved on March 25 2009 from wwwcep-dcorg

Compton D L Appleton A amp HospM K (2004) Exploring the relalionshyship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders Learning Disabilities Research amp Practice 19 176-84

Crawford L Tindal G amp Stieber S (2001) Using oral reading rnteto preshydict student perfom1ance on statewide achievement tests Edt1ca1ional Assessme111 7 303-323

Deno S L (1985) Curriculum-baselmcasurement The emerging alternashytive EXcep1io11al Children 52 219-232

Deshler D D Schumaker JB Alley G B Warner M M amp Clark F L ( l 982) Learning disabilities in adolescent and young aduh populations Research implications Focu on Exceptiona Children 15(1) l-12

Espin C A Deno S L Mart1yama G amp Cohen C (1989) The Basic Academic Skills Samples (BASS) An inslrumentJbr the screening and idemfficalion of children at riskforfailure in regular education class~ rooms Paper presented at the National Convention of the A1rn-rican Educational Research Association March

Espin C A amp Deno S L (1993a) Performance in reading from conshytem area text a~ an indicator of achievement Remedial and Special Educaiion 14 47-59

Espi11 C A amp Deno S L (J993b) Content-specific and general reading disabilities of secondary-level students Identi1lcation and educational relevance The Journal l( Special Education 27 321-337

Esp in C A amp Deno S L ( 1994-95) Curriculmn-based measures for secondary students Utility and task specificity or text-based reading and vocabulary measures for predicting performance on content-area tasks Dingnosfique 20 121-142

Esp in C A amp Foegen A ( 1996) Validity of general outcome measures for predicting secondary students performance on content-area tasks Exceptional Children 62 497-514

Esp in C A Wallace T Campbell H Lembke E S Long J D amp Ticha R (2008) Curricuium-baseltl measurement i11 writing Predicting 1he success of high-school smdents on state standards ests Erceptional Children 74 74middot-193

Fcwster A amp MacMillan P D (2002) School-based evidence for the validshyity of curriculum-based measurement of reading and writing Remedial a11d Special Educntion 23 49-156

Fitzmaurice G M Laird N M amp Ware J H (2004) Applied o11git1tdinal unalysis New York Wiley

Fuchs D Fuchs L S Mathes P G amp Lipsey M W (2000) Reading differences between ow-acl1ieving students with and without learning disabilities A meta-analvss In R Gersten E Schiller amp S Vaughn (Eds) Research synrhess in special education (pp 8 --104) Mahwah NJ Erlbaum

Fuchs L S Fuchs D Hamlett C L amp Ferguson C (1992) Effects of expert system consultation within curriculum-based measurement using a reading maze task Exceptionul Children 58 436-450

Fuchs L S Fuchs D amp Maxwell L (1988) The validity of Informal reading measures Remedial and Secial Educario11 9 20-28

Griffiths A J VanDerHeydeyn A M Skokut M amp Lilles E (2009) Progress monitoring in oral reading fluency within the context of RTL School Aychoogv Quarter(v 24 lJ-23

Hintze J M amp Silberglitt B (2005) A longitudinal examination of the diagnostic accuracy and predictive validity ofR-CBM and high-stakes testing School Psyc10ogr Review 34 372-386

Jenkins J R amp Jewell M ( J 993) Examining the validity of1wo rne11sures for formative teaching Reading aloud and maze Exceptional Children 59 429--432

Ken ward M G amp Roger J H ( 1997) Small sample inference for fixed e0Ccts from restricted maximum likdihood Biometrics 53 983-997

Kincaid JP Fishburne R P Rogers R L amp Chissom B S (975) OCtiVCllion of 11ew readahiliry Jhrmulas (A11Jm1UJed Readability Inshydex Fog Cmmr and Flesch Reading Ease Formula) Jhr Navy enlisred personnel Research Branch report 8-75) Memphis TN Naval Air Station

Lee L Grigg W amp Donahue P (2007) The Nations Report Card Reading 2007 (NCES 2007-496) Nationai Center for Education Statistics Jnsdtute of Education Sciences US Department of Education Washington DC Retrieved January 16 2008 frommiddot httpncesedgovpubscarchpubsinfoasppubid-2007496

Levin E K Zigmond N amp Birch J W (1985) A followup study of 52 learning disabled adolescents Journal (Learning Disabilities 8 2-7

MacMman P (2000) Simultaneous measuremenl of reading growth genshyder and relative-age effects Many-faceted Rasch applied to CBM reading scores Journal of Applied Measuremem I 393--408

Marston D ( t989) A curriculum-based measurement approach to assessing academic performance What it is and why do it ln M Shinn Ed) Curriculum-based measurement Assessing special children (pp 18-78) New York Guilford

McGlinchey_ M T amp Hixson M D 2004) Using curriculum-based measurement to predict performa11ce on state assessments in reading Schoof Aychology Reviev 33 193-203

Minnesota Department of Education 200 l ) Hinnesow Basic Ski1middot TI1middot1 Technical Manual Accountability_ProgramsAssessment_und_ T cs ting AssessmentsEST B ST_ Technical_ Reports index html

Muyskens P amp Marston D (2006) The relmionship between CurriwumshyBased Measurement and ourcomes on high-srake~ 1es1 witl1 secondary studenrs Minneapolis Public Schools Unpublished manuscript

OConnor R E Fulmer D Harty K R amp Bell K M (2005) Layers of reading intervention in kindergarten through third grade Changes in teacl1ing and student outcomes Journal f~( Learning Disabilities 38 440--445

OConnor RE Hary K R amp Fuimer D (2005) Tiers of intervention in kindergarten through third grade Journal oleaming Disahiilies 38 532-538

Rasinski T V Pudak N D McKeon C A Wilfong L G Friedauer J A amp Heim P (2005) Is reading fluency a key for successful high school reading Journal of Adolescent and Adult Li1eracy 48 22-27

Ruppert D Wand M P amp CaiTol R l (2003) Semiparamerric regression New York Cambridge University Press

Shin J Espin C A Deno S L amp McConnell S (2004) Use ofhierarchishycal linear modeling anltl curriculum-based measurement for assessing academic growth and in~iructional factors for students with learning difficulties Asia Pucific Education Review 5 36-48

Silberglitt B amp Hintze J (2005) Formative assesmem usmg CBM-R cut scores to track progress toward success on state-mandated achieveshyment tests A comparison of methods Journal ofPsvchoed11catio11al Assessment 23 304-325 middot

Srnge S A amp Jacobsen MA (200 ) Predicting student success on a staleshymandated performance-based assessment using oral reading fluency School Psychology Review 30 407-419

Stecker P M Fuchs L S amp Fuch~ D (2005) Using curriculum-based measurement to improve student achievement Review of re~earch Aychology in the Schools 42 795-819

Ticha R Espin C A amp Wayman M M (2009) Reading progress monishytoring for secondary-school students Reliability validity and sensitivshyity to growth of reading aloud and maze selection measures Learning Disahilities Research amp Praclice 24 132-middot142

Torgesen J K (2000) Individual differences in response to early intervenshytions in reading Tlie lingering problem of treatment resisters Learning Disahililies Research amp Prac1ice 15 55-64

Touchstone Applied Science and Associates (2006) Degrees of reading power Brewster NY Author

Vaughn S Linan-Thompson S amp Hickman P (2003) Response to inshystructi011 as a means of identifying sludents with readinglearning disshyabilities Exceptional Children 69 391-409

Vellutno F R fletcher J M Snowling M L amp Scanlon D (2004) Specific reading disability(dyslexiaj What have we learned in the past four decades Journal aChild Ayc10log1 and Aychialry 45 2--40

Vclutino F R Scanlon D M amp Tanzman M S (1994) Components or reading ability issues and problems in operationalizing word identifishycation phonological coding and orthographic coding In G R Lyon (Ed) Fnmes ()freferencesjhr the assessmen1 ofearning disabii1ies new Fiews on measurement issues (pp 279-332) Baltimore Brookes Publishing

Vcllutino F R Tunmer W EJaccard J J amp Chen R (2007) Components of reading ability Mtiltivariate evidence for a convergent skills model of reading development Scientific 1udies of Reading l J 3--32

Warner M M Schumaker JB Alley G R amp Deshler D D (1980) Learning disabled adolescents in the public schools Are they different from other low achievers Exceptional Ed11calio11 Quar1eriy 1(2) 27-36

Wayman M M Ticha R Wallace T Espin C A Wiley H l Du X amp Long J (2009) Companwm qf different sconmiddotng procedures Or CBM mazeselecion measures (Technical Report No 10) Minneapolis MN University of Minnesota Research Institute on Progress Monitoring

Wayman M M Wallace T Wiley H 1 Ticha R amp Espin C A_

(2007) Literature synthesis on curriculum-based measurement in readshying Journal of Special Educmion 41 85-120

About the Authors

LEARNING DISABILITIES RESEARCH 75

Wiley H I amp Deno S L (2005) Oral reading and maze measures as preshydictors of success for English learners on a state standards assessment Remedial and Special Education 26 207-214

Yovanoff P Duesbery L Alonzo J amp Tindal G (2005) Grade-level invariance ofa theoretical causal structure predicting reading compreshyhension with vocabulary and oral reading fluency Educational Meashysuremem hsues and Practice 24 4-2

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district level.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His interest is longitudinal data analysis.
