
TECHNICAL PAPER | JULY 1, 2015

Relating Star Reading® to the Mississippi K-3 Assessment Support System (MKAS2) 3rd Grade Reading Summative Assessment Performance


©Copyright 2015 Renaissance Learning, Inc. All rights reserved. (800) 338-4204 www.renaissance.com

Contents
Project purpose
Assessments
Method
Results
Appendix A: About Star Reading®
References

Figures
Figure 1. Linkage of MKAS2 to the Star Reading® scale
Figure 2a. Scatter plot of MKAS2 and Star Reading® scores: Concurrent sample
Figure 2b. Scatter plot of MKAS2 and Star Reading® scores: Predictive sample
Figure 3a. Fourfold table of classification diagnostic data
Figure 3b. Descriptions of classification diagnostic accuracy measures

Tables
Table 1. Concurrent and predictive sample information
Table 2. Probability of meeting or exceeding MKAS2 passing score
Table 3. Summary statistics from the concurrent sample


Project Purpose

Educators face many challenges; chief among them is deciding how to allocate limited resources to best serve diverse student needs. A good assessment system supports teachers by providing timely, relevant information that can help address key questions about which students are on track to meet important performance standards and which students may need additional help. Different educational assessments serve different purposes, but those that can identify students early in the school year as being at risk of missing academic standards can be especially useful because they can help inform instructional decisions that can improve student performance and reduce gaps in achievement. Assessments that can do that while taking little time away from instruction are particularly valuable. Indicating which students are on track to meet later expectations is one of the potential capabilities of a category of educational assessments called “interim” (Perie, Marion, Gong, & Wurtzel, 2007). Interim assessments are one of three broad categories of assessment:

• Summative – typically annual tests that evaluate the extent to which students have met a set of standards. Most common are state-mandated tests such as the Mississippi K-3 Assessment Support System (MKAS2) 3rd Grade Reading Summative Assessment.

• Formative – short and frequent processes embedded in the instructional program that support learning by providing feedback on student performance and identifying specific things students know and can do as well as gaps in their knowledge.

• Interim – assessments that fall in between formative and summative in terms of their duration and frequency. Some interim tests can serve one or more purposes, including informing instruction, evaluating curriculum and student responsiveness to intervention, and forecasting likely performance on a high-stakes summative test later in the year.

This project focuses on the application of interim test results, notably their power to inform educators about which students are on track to succeed on the year-end summative state test and which students might need additional assistance. Specifically, the purpose of this project is to explore the statistical linkage between the Renaissance Learning interim assessment Star Reading¹ and the MKAS2. If the linkage is sufficiently strong, it may be useful for:

1. The early identification of students at risk of failing to make yearly progress goals, which could help teachers decide to adjust instruction for selected students.

2. Forecasting the percentages of students likely to pass the state assessment sufficiently in advance to permit redirection of resources and to serve as an early warning system for administrators at the building and district level.

1 A technical manual is available for Star Reading by request to [email protected].


Assessments

Mississippi K-3 Assessment Support System (MKAS2)

This report is concerned with the Mississippi K-3 Assessment Support System (MKAS2) 3rd Grade Reading Summative Assessment. The MKAS2 reports scaled scores to describe a student’s location on the achievement continuum ranging from 600 to 1200. MKAS2 results are used to classify students into two achievement levels: Failing or Passing. The passing score is 926. Students who fail are allowed to retest, and those students who fail the test after three attempts are retained in 3rd grade. The analyses summarized in this report focus on students’ initial MKAS2 performance.

Renaissance Star Reading®

Renaissance Star Reading is a computer-administered, adaptive measure of general achievement in reading. The adaptive nature permits the test to be administered to students in grades 1 through 12. It is intended for use as an interim assessment that can be administered at multiple points throughout the school year for purposes such as screening, placement, progress monitoring, and outcomes assessment. Renaissance Learning recommends that Star tests be administered two to five times a year for most purposes, and more frequently when used in progress monitoring programs. Recent changes to the Star test item banks and software make it possible to test as often as weekly for short-term progress monitoring in programs such as RTI (response to intervention).

Star Reading is a standardized, nationally normed, computer-adaptive assessment. It places a minimal burden on teacher time, as it can be self-administered, is automatically scored by internal software, and generates a variety of reports. Furthermore, each student’s test is adapted according to his or her previous responses, which increases accuracy and reliability. Star Reading fully automates every aspect of a testing program, including test administration, scoring, record-keeping, and report preparation. A core component of Star assessment systems is a longitudinal database that contains permanent records of every test administered to a student, both within and across school years.

Method

Data collection

Analysis plans included the evaluation of correlations and statistical linkages between scores on the Mississippi K-3 Assessment Support System (MKAS2) and Star Reading. Such analyses require matched data, with student records that include both the MKAS2 and Star test scores. Mississippi provided Renaissance Learning with MKAS2 test scores for students who had taken Star Reading during the 2014–2015 school year. Each record in the resulting data file included a student’s first MKAS2 score as well as scores on any Star Reading tests taken during that same year.

Linkages between the Star Reading and MKAS2 score scales were developed by applying equipercentile linking analysis (Kolen & Brennan, 2004). The MKAS2 score scale was linked to the Star score scale, yielding a table of equivalent MKAS2 scores for each possible Star score. This type of analysis requires that students take both assessments at about the same time.

Sample characteristics

The matched Star-MKAS2 data were divided into two samples. Table 1 contains sample sizes and descriptive statistics for each sample. Linking was completed using a concurrent sample, which included all Star tests taken within 30 days before or after the MKAS2. The concurrent sample consisted of a total of 27,102 students with matched MKAS2 and Star Reading scores. Of the concurrent sample, approximately 10% of the students were reserved as a holdout sample, which was used exclusively to evaluate the linking and was not included in the sample used to compute it.

Star tests taken outside the +/- 30 day MKAS2 window were included in a predictive sample, which was used to evaluate the accuracy of using the linking results to predict MKAS2 performance from Star Reading data collected earlier in the school year. In the predictive sample, Star scaled scores were projected to the MKAS2 test date using national growth norms (Renaissance Learning, 2015). National growth norms are based on grade and initial performance, and are updated annually using a five-year period of data that includes millions of students. They provide typical growth rates for students based on their starting Star test score.

For each Star score in the predictive sample, the number of weeks between the Star administration date and the MKAS2 mid-date was calculated. The number of weeks between the two tests was then multiplied by the student’s expected weekly scaled score growth (based on national growth norms). The expected growth was added to the observed scaled score to determine the projected Star score at the time of the MKAS2. If a student had multiple Star Reading tests in the predictive sample, all of the projected scores were averaged.

Table 1. Concurrent and predictive sample information

Sample                  Sample size   MKAS2 mean   MKAS2 SD   Star Reading mean   Star Reading SD
Concurrent (linking)    24,404        966.6        56.6       376.2               149.4
Concurrent (holdout)    2,698         967.3        56.1       380.2               147.9
Predictive              25,305        978.9        58.2       420.0               138.9
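The projection step described in the Method section can be illustrated with a short Python sketch. It is not the code used for the study: the function names, the example dates, and the weekly growth rates are hypothetical, and the use of fractional weeks is an assumption (the report does not say whether weeks were rounded).

```python
from datetime import date
from statistics import mean


def project_score(observed_score, test_date, target_date, weekly_growth):
    """Project one Star scaled score forward to the target date.

    weekly_growth stands in for the expected weekly scaled score growth
    that the report takes from national growth norms (hypothetical here).
    """
    weeks_between = (target_date - test_date).days / 7  # assumption: fractional weeks
    return observed_score + weeks_between * weekly_growth


def projected_score_for_student(tests, target_date):
    """Average the projections from all of a student's earlier tests.

    tests is a list of (observed_score, test_date, weekly_growth) tuples.
    """
    return mean(project_score(s, d, target_date, g) for s, d, g in tests)


# Hypothetical student with two Star tests before the MKAS2 mid-date.
mkas2_mid_date = date(2015, 4, 15)  # illustrative date, not from the report
tests = [
    (300, date(2014, 9, 10), 1.5),   # 31 weeks before the mid-date
    (340, date(2015, 1, 14), 1.25),  # 13 weeks before the mid-date
]
projected = projected_score_for_student(tests, mkas2_mid_date)
```

In this example the two tests project to 346.5 and 356.25, and their average is the student's projected score at the time of the MKAS2.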

Results

Scale linkage

Equipercentile linking was used to develop linkages between Star Reading and MKAS2. The result of the analysis was equivalent MKAS2 scores for each possible Star score. These results allow the user to look up the MKAS2 score that corresponds to every possible Star Reading score (see Figure 1).


Figure 1. Linkage of MKAS2 to the Star Reading® scale

[Figure: equivalent MKAS2 score (vertical axis, 600 to 1200) plotted against Star Reading scaled score (horizontal axis, 100 to 1400).]
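As a rough illustration of the idea behind equipercentile linking, the Python sketch below maps a Star score to the MKAS2 score that has the same percentile rank in a matched sample. It is a deliberately simplified, unsmoothed version with hypothetical function names and toy data; the report cites Kolen & Brennan (2004) for the method but does not describe its exact procedure, so this should not be read as the study's implementation.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank: percent of scores below x plus half of those equal to x."""
    below = bisect_left(sorted_scores, x)
    equal = bisect_right(sorted_scores, x) - below
    return 100.0 * (below + 0.5 * equal) / len(sorted_scores)


def score_at_percentile(sorted_scores, pr):
    """Score at percentile rank pr, interpolating linearly between order statistics."""
    n = len(sorted_scores)
    pos = pr / 100.0 * n - 0.5          # inverse of the mid-percentile rank
    pos = min(max(pos, 0.0), n - 1.0)   # clamp to the observed score range
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


def equipercentile_link(star_scores, mkas2_scores, star_value):
    """Return the MKAS2 score with the same percentile rank as star_value."""
    star_sorted = sorted(star_scores)
    mkas2_sorted = sorted(mkas2_scores)
    return score_at_percentile(mkas2_sorted, percentile_rank(star_sorted, star_value))


# Toy data only; the study used 24,404 matched records for linking.
star = [100, 200, 300, 400, 500]
mkas2 = [800, 850, 900, 950, 1000]
equivalent = equipercentile_link(star, mkas2, 250)
```

Applying such a function to every possible Star score would produce a lookup table of the kind summarized in Figure 1. Operational linking typically adds smoothing and other refinements that are omitted here.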

Star score equivalent to the MKAS2 pass score

The principal purpose of the linkage between Star Reading and MKAS2 test scores was to identify the score on Star Reading that is approximately equivalent to the passing score on the MKAS2. The benchmark score of 926 on the MKAS2 was linked to a score of 253 on Star Reading.

Scatter plots

For both the concurrent and predictive samples, scatter plots were created to visualize the relationship between Star Reading and MKAS2 scores (see Figures 2a and 2b). Included in the plots are lines marking the MKAS2 passing cut score of 926 and the equivalent Star Reading cut score of 253.


Figure 2a. Scatter plot of MKAS2 and Star Reading® scores: Concurrent sample

[Figure: MKAS2 score (vertical axis, 700 to 1200) plotted against Star Reading scaled score (horizontal axis, 0 to 1400) for the concurrent sample.]

Figure 2b. Scatter plot of MKAS2 and Star Reading® scores: Predictive sample

[Figure: MKAS2 score (vertical axis, 700 to 1200) plotted against Star Reading scaled score (horizontal axis, 0 to 1400) for the predictive sample.]

Probability estimates

Once the Star Reading cut score of 253 was identified as corresponding to the MKAS2 passing score of 926, probability estimates were calculated based on the proportion of students in the concurrent sample who met the Star Reading cut score. Table 2 lists probability estimates as a function of Star Reading performance. For example, a third-grade student with a Star Reading scaled score of 175 at the time of the MKAS2 would have about a 24% chance of passing the MKAS2.
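One plausible way to produce banded probability estimates of this kind is to group matched records into Star score bands and compute the share of students in each band who met the MKAS2 passing score. The Python sketch below shows that calculation under stated assumptions: the function names and records are hypothetical, the bands are a uniform 10 points wide (the published table also uses wider bands at the extremes), and the report does not confirm that this is exactly how its estimates were computed.

```python
from collections import defaultdict


def band_for(score, width=10):
    """Return the (low, high) band containing score, e.g. 175 -> (171, 180)."""
    low = ((score - 1) // width) * width + 1
    return (low, low + width - 1)


def pass_rate_by_band(records, mkas2_cut=926, width=10):
    """Share of students in each Star band who met or exceeded the MKAS2 cut.

    records is an iterable of (star_score, mkas2_score) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [passed, total]
    for star, mkas2 in records:
        band = band_for(star, width)
        counts[band][0] += mkas2 >= mkas2_cut
        counts[band][1] += 1
    return {band: passed / total for band, (passed, total) in sorted(counts.items())}


# Hypothetical matched records, for illustration only.
records = [(175, 930), (172, 900), (180, 910), (178, 890), (255, 926), (251, 925)]
rates = pass_rate_by_band(records)
```

With these toy records, one of the four students in the 171 to 180 band passes, and one of the two students in the 251 to 260 band passes.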


Table 2. Probability of meeting or exceeding MKAS2 passing score

Star Reading Scaled Score    Probability of Passing MKAS2
1 - 60                       0%
61 - 70                      8%
71 - 80                      8%
81 - 90                      8%
91 - 100                     11%
101 - 110                    11%
111 - 120                    16%
121 - 130                    16%
131 - 140                    16%
141 - 150                    18%
151 - 160                    21%
161 - 170                    22%
171 - 180                    24%
181 - 190                    27%
191 - 200                    27%
201 - 210                    27%
211 - 220                    39%
221 - 230                    46%
231 - 240                    48%
241 - 250                    48%
251 - 260                    58%
261 - 270                    64%
271 - 280                    73%
281 - 290                    78%
291 - 300                    81%
301 - 310                    83%
311 - 320                    89%
321 - 330                    89%
331 - 340                    93%
341 - 350                    94%
351 - 360                    94%
361 - 370                    96%
371 - 380                    97%
381 - 390                    98%
391 - 400                    98%
401 - 500                    99%
500 - 1400                   100%

Correlations

Two correlations were obtained from the sample: one between the MKAS2 scores and concurrent Star scores, and another between MKAS2 scores and the MKAS2 score equivalents (obtained from the linking). The correlation between the MKAS2 and Star Reading was .83. The correlation between MKAS2 and MKAS2 score equivalents was similar, r = .85.
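For readers who want to reproduce correlations like these on their own matched data, the sketch below computes a Pearson correlation coefficient in plain Python. Treating the reported r values as Pearson correlations is an assumption, since the report does not name the coefficient; the function name and data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for three students.
r = pearson_r([1, 2, 3], [1, 3, 2])
```

The same function can be applied once to Star scores and observed MKAS2 scores, and once to MKAS2 score equivalents and observed MKAS2 scores.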


RMSEL and mean differences

Accuracy of the scale linkage was evaluated in two ways. First, the same scores used to complete the linking were used to compute the root mean squared error of linking (RMSEL). Second, the holdout sample (i.e., concurrent scores not used to complete the linking) was used to evaluate differences between observed MKAS2 scores and MKAS2 score equivalents. Table 3 displays these statistics.

Table 3. Summary statistics from the concurrent sample

Linking sample RMSEL: 20.91

Holdout sample difference scores
N        Mean     SD       Min     Max
2,698    -0.83    32.95    -251    306
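The holdout difference-score summary can be sketched in a few lines of Python. Two details are assumptions because the report does not state them: differences are taken as observed MKAS2 score minus MKAS2 score equivalent, and the SD is the sample standard deviation. The function name and data are hypothetical, and the RMSEL computation is not covered because its formula is not given in the report.

```python
from statistics import mean, stdev


def difference_summary(observed, equivalents):
    """Summarize difference scores for a holdout sample.

    Assumption: difference = observed MKAS2 score - MKAS2 score equivalent.
    """
    diffs = [o - e for o, e in zip(observed, equivalents)]
    return {
        "n": len(diffs),
        "mean": mean(diffs),
        "sd": stdev(diffs),  # assumption: sample (n - 1) standard deviation
        "min": min(diffs),
        "max": max(diffs),
    }


# Hypothetical holdout records.
summary = difference_summary([930, 950, 900], [925, 960, 900])
```

A mean difference close to zero, as in Table 3, suggests that the linked equivalents are not systematically higher or lower than the observed scores.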

Classification accuracy

The predictive sample was used in analyses exploring the accuracy of using Star Reading tests taken earlier in the school year to predict MKAS2 performance. Two correlations were calculated to summarize the predictive power of the Star test scores: the raw correlation between the projected Star Reading and observed MKAS2 scale scores, and the equated-score correlation between the MKAS2 score equivalents obtained from the linking and the observed MKAS2 scores. The predictive sample correlations were similar to the correlations presented earlier for the concurrent sample, indicating that projected Star scores are reliable estimates of MKAS2 performance. The raw correlation was .87 and the correlation between MKAS2 and MKAS2 score equivalents was .89.

Classification diagnostics were derived from counts of correct and incorrect classifications that could be made when using Star Reading scores to predict whether or not a student would pass the MKAS2. The types of classifications are summarized in Figure 3a and classification diagnostic formulas are outlined in Figure 3b.

Figure 3a. Fourfold table of classification diagnostic data

Star Reading estimate    MKAS2 result: Pass         MKAS2 result: Fail         Total
Pass                     True Positive (TP)         False Positive (FP)        Projected Pass (TP + FP)
Fail                     False Negative (FN)        True Negative (TN)         Projected Fail (FN + TN)
Total                    Observed Pass (TP + FN)    Observed Fail (FP + TN)    N = TP + FP + FN + TN


Figure 3b. Descriptions of classification diagnostic accuracy measures

Measure                                Formula           Interpretation
Overall classification accuracy        (TP + TN) / N     Percentage of correct classifications
Sensitivity                            TP / (TP + FN)    Percentage of passed students identified as such using Star
Specificity                            TN / (TN + FP)    Percentage of failed students identified as such using Star
Positive predictive value (PPV)        TP / (TP + FP)    Percentage of students Star finds passed who actually did
Negative predictive value (NPV)        TN / (FN + TN)    Percentage of students Star finds failed who actually did
Observed proficiency rate (OPR)        (TP + FN) / N     Percentage of students who passed
Projected proficiency rate (PPR)       (TP + FP) / N     Percentage of students Star finds passed
Proficiency status projection error    PPR - OPR         Difference between projected and observed proficiency rates
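The measures in Figure 3b translate directly into code. The Python sketch below computes all of them from the four cell counts of the fourfold table; the function name and the example counts are hypothetical and are not the study's data.

```python
def classification_diagnostics(tp, fp, fn, tn):
    """Compute the Figure 3b measures from fourfold-table counts."""
    n = tp + fp + fn + tn
    opr = (tp + fn) / n  # observed proficiency rate
    ppr = (tp + fp) / n  # projected proficiency rate
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "opr": opr,
        "ppr": ppr,
        "projection_error": ppr - opr,
    }


# Hypothetical counts for 100 students, for illustration only.
diagnostics = classification_diagnostics(tp=80, fp=10, fn=5, tn=5)
```

With these counts, 85 of 100 students are classified correctly, and the projected pass rate (90%) exceeds the observed pass rate (85%), so the projection error is positive, which corresponds to over-predicting the pass rate.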

Overall classification accuracy was high: students were correctly classified as either passing or failing 93% of the time. Sensitivity (i.e., the percentage of passing students correctly forecasted) was 98% and specificity (i.e., the percentage of failing students correctly forecasted) was 61%. The positive predictive value was 93%, meaning that when Star Reading forecasted students to pass, they actually did pass 93% of the time. The negative predictive value was 85%, meaning that when Star Reading forecasted that students would fail, they actually did fail 85% of the time. The difference between the projected and observed proficiency rates (i.e., proficiency status projection error) was 4%, indicating that projected Star Reading scores tended to slightly over-predict MKAS2 passing rates.

Finally, the area under the ROC curve (AUC) is a summary measure of diagnostic accuracy. The National Center on Response to Intervention has set an AUC of .85 or higher as indicating convincing evidence that an assessment can accurately predict another assessment result or outcome. The AUC was .96, well exceeding that standard and indicating that Star Reading scores did a very good job of discriminating between students who passed the MKAS2 and those who did not.

Conclusions and Applications

The equipercentile linking method was used to link Star Reading scaled scores to the Mississippi K-3 Assessment Support System (MKAS2) 3rd Grade Reading Summative Assessment scores. The result was estimates of equivalent MKAS2 scores for each Star Reading scaled score. Using the tables of linked scores, we identified that a Star Reading score of 253 corresponded to the MKAS2 passing score of 926.

Correlations indicated a strong relationship between the Star and MKAS2 tests. The correlation between MKAS2 and concurrent Star Reading scores (i.e., Star tests taken within +/- 30 days of the MKAS2 mid-date) was .83 and the correlation between MKAS2 and predictive Star scores (i.e., Star tests taken earlier and projected to the MKAS2 mid-date) was .87. When projecting Star Reading scores to estimate MKAS2 performance, students were correctly classified as either proficient or not 93% of the time.

The statistical linkages between Star Reading interim assessments and the MKAS2 provide a means of forecasting student achievement on the MKAS2 based on Star scores obtained earlier in the school year. Example Star Reading reports that utilize the Star-MKAS2 linking are provided in the Appendix. They include individualized Pathway to Proficiency reports, which compare each student’s Star performance to the growth trajectory that typically would lead to passing the MKAS2, as well as group-level performance reports that forecast the number of students who are expected to pass the MKAS2. Both types of reports can be used to help educators determine early and periodically which students are on track to pass and to make decisions accordingly.


Appendix A: About Star Reading®

The computer-adaptive Star Reading assessments serve multiple purposes including screening, progress monitoring, instructional planning, forecasting proficiency, standards mastery, and measuring growth. These highly reliable, valid, and efficient standards-based measures of student performance in reading and math provide valuable information regarding the acquisition of skills along a continuum of learning expectations. The assessments can be completed in about 20 minutes, and we recommend administering them two to five times a year for most purposes and more frequently (as often as weekly) when used in progress monitoring programs. Star Reading is highly rated for progress monitoring by the National Center on Intensive Intervention and received high ratings for screening and progress monitoring by the National Center on Response to Intervention.


All logos, designs, and brand names for Renaissance’s products and services, including but not limited to Star Assessments, Star Reading, and Renaissance, are trademarks of Renaissance Learning, Inc., and its subsidiaries, registered, common law, or pending registration in the United States. All other product and company names should be considered the property of their respective companies and organizations.

References

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. New York, NY: Springer Science+Business Media.

Perie, M., Marion, S., Gong, B., & Wurtzel, J. (2007). The role of interim assessments in a comprehensive assessment system. Aspen, CO: Aspen Institute.

Renaissance Learning. (2015). STAR Reading technical manual. Wisconsin Rapids, WI: Author. Available by request to [email protected]

Independent technical reviews of Star Reading®

U.S. Department of Education: National Center on Intensive Intervention. (2015). Review of progress monitoring tools [Review of STAR Reading]. Washington, DC: Author. Retrieved from http://www.intensiveintervention.org/chart/progress-monitoring

U.S. Department of Education: National Center on Response to Intervention. (2010). Review of progress monitoring tools [Review of STAR Reading]. Washington, DC: Author. Retrieved from https://web.archive.org/web/20120813035500/http://www.rti4success.org/pdf/progressMonitoringGOM.pdf

U.S. Department of Education: National Center on Response to Intervention. (2011). Review of screening tools [Review of STAR Reading]. Washington, DC: Author. Retrieved from http://www.rti4success.org/resources/tools-charts/screening-tools-chart