Effective Use of Benchmark Test and Item Statistics and Considerations When Setting Performance Levels
California Educational Research Association, Anaheim, California, December 1, 2011


Page 1: California Educational Research Association Anaheim, California December 1, 2011


Effective Use of Benchmark Test and Item Statistics and Considerations When Setting Performance Levels

California Educational Research Association

Anaheim, California

December 1, 2011

Page 2: California Educational Research Association Anaheim, California December 1, 2011


Review of Benchmark Test and Item Statistics

Objective

Extend the knowledge of the assessment team to:

1. Better understand test reliability and the influences of test composition and test length.

2. Better understand item statistics and use them to identify items in need of revision.

Page 3: California Educational Research Association Anaheim, California December 1, 2011

Reliability is a measure of the consistency of the assessment

Types of reliability coefficients (always range from 0 to 1):

Test-retest
Alternate forms
Split-half
Internal consistency (Cronbach's Alpha / KR-20)
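For reference (not on the slide), Cronbach's alpha for a test of k items is computed as

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)$$

where $\sigma_i^{2}$ is the variance of item $i$ and $\sigma_X^{2}$ is the variance of the total score; KR-20 is the special case for dichotomously scored (right/wrong) items.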


Page 4: California Educational Research Association Anaheim, California December 1, 2011

Reliability Influenced by Test Length

• Spearman-Brown formula estimates reliabilities of shorter tests
– Remember: The reliability of a score is an indication of how much an observed score can be expected to be the same if observed again.
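For reference, the standard Spearman-Brown prophecy formula behind these estimates is

$$r_{\text{new}} = \frac{k\,r_{\text{old}}}{1 + (k-1)\,r_{\text{old}}}, \qquad k = \frac{\text{length of the new test}}{\text{length of the original test}}$$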

NOTE: See handout from STAR Technical Manual for exact cluster reliabilities.


Page 5: California Educational Research Association Anaheim, California December 1, 2011

Reliability Influenced by Test Length

• Example: given a 75-item test with r = .95
– 40-item test has r = .91
– 35-item test has r = .90
– 30-item test has r = .88
– 25-item test has r = .86
– 20-item test has r = .84
– 10-item test has r = .72
– 5-item test has r = .56

NOTE: See handout from STAR Technical Manual for exact cluster reliabilities.
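A minimal Python sketch (not part of the presentation) that reproduces the estimates above using the Spearman-Brown formula:

```python
def spearman_brown(r_old: float, old_len: int, new_len: int) -> float:
    """Predicted reliability when a test of old_len items is shortened
    (or lengthened) to new_len items."""
    k = new_len / old_len
    return k * r_old / (1 + (k - 1) * r_old)

# Reproduce the slide's estimates for a 75-item test with r = .95
for n in (40, 35, 30, 25, 20, 10, 5):
    print(f"{n:2d} items: r = {spearman_brown(0.95, 75, n):.2f}")
```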


Page 6: California Educational Research Association Anaheim, California December 1, 2011

Reliability Statistics for CSTs (see handout)

Note that CST reliabilities range from .90 to .95

Note that cluster reliabilities are consistent with those predicted by Spearman-Brown formula

Page 7: California Educational Research Association Anaheim, California December 1, 2011

Validity is the degree to which the test measures what it was intended to measure

Types of test validity

A. Predictive or Criterion (How does it correlate with other measures?)

B. Content
1. How well does the test sample from the content domain?
2. How aligned are the items with regard to format and rigor?

Page 8: California Educational Research Association Anaheim, California December 1, 2011

Validity Is Influenced by Reliability

Impact of Lower Reliability on Validity

Remember: Validity is the agreement between a test score and the quality it is believed to measure.

Upper limit on validity coefficient is the square root of the reliability coefficient

75 item test = square root of .95 = .97
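This bound comes from the standard correction-for-attenuation relationship (added here for reference; it is not spelled out on the slide): the correlation between test X and criterion Y satisfies

$$r_{XY} \le \sqrt{r_{XX}\,r_{YY}} \le \sqrt{r_{XX}}$$

where $r_{XX}$ is the reliability of the test and $r_{YY}$ the reliability of the criterion.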


Page 9: California Educational Research Association Anaheim, California December 1, 2011

Validity Is Influenced by Reliability

Upper limit on the validity coefficient is the square root of the reliability coefficient:
75-item test = square root of .95 = .97
30-item test = square root of .88 = .94
25-item test = square root of .86 = .93
10-item test = square root of .72 = .85
5-item test = square root of .56 = .75


Page 10: California Educational Research Association Anaheim, California December 1, 2011

Coefficient of Determination (R squared)

Squaring the validity coefficient gives the "proportion of variance in the achievement construct accounted for by the test":
75-item test: .97 squared = .94
30-item test: .94 squared = .88
25-item test: .93 squared = .86
10-item test: .85 squared = .72
5-item test: .75 squared = .56
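A small Python sketch (added for illustration, not from the deck) that reproduces this slide and the previous one from the Spearman-Brown reliabilities:

```python
import math

# Reliabilities from the Spearman-Brown slide (75-item base test with r = .95)
reliabilities = {75: 0.95, 30: 0.88, 25: 0.86, 10: 0.72, 5: 0.56}

for n_items, rel in reliabilities.items():
    validity_ceiling = round(math.sqrt(rel), 2)   # upper limit on the validity coefficient
    r_squared = round(validity_ceiling ** 2, 2)   # variance accounted for, squaring the rounded ceiling as on the slide
    print(f"{n_items:2d} items: max validity = {validity_ceiling:.2f}, R^2 = {r_squared:.2f}")
```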


Page 11: California Educational Research Association Anaheim, California December 1, 2011

Using Item Statistics (p-value & point-biserials)

Apply item analysis statistics from the assessment reporting system (e.g., DataDirector, Edusoft, OARS, EADMS, etc.)

P-values (proportion of the group getting the item correct)
– Most should be between .30 and .80
– A very high p-value indicates the item may be too easy; a very low p-value may indicate a problem item

Point-biserials (correlation of the item with the total score)
– Most should be .30 or higher
– A very low or negative point-biserial generally indicates a problem with the item
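A minimal Python sketch (with made-up response data, not from the presentation) showing how these two statistics are typically computed from a scored 0/1 item matrix:

```python
import numpy as np

# Hypothetical scored responses: rows = students, columns = items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])

# p-value: proportion of the group answering each item correctly
p_values = scores.mean(axis=0)

# Point-biserial: correlation between each item score (0/1) and the total score.
# This version excludes the item itself from the total (the "corrected" point-biserial).
totals = scores.sum(axis=1)
point_biserials = np.array([
    np.corrcoef(scores[:, i], totals - scores[:, i])[0, 1]
    for i in range(scores.shape[1])
])

print("p-values:", p_values)
print("point-biserials:", point_biserials.round(2))
```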

Page 12: California Educational Research Association Anaheim, California December 1, 2011

Item Statistics for CSTs (see handout)

Note that the range of P-values is consistent with most being between .30 and .80

Note that median point-biserials are generally in the .40s

Page 13: California Educational Research Association Anaheim, California December 1, 2011
Page 14: California Educational Research Association Anaheim, California December 1, 2011

Algebra 1, Question 7 (PL: Basic)

                 District               Pilot Group
Choice    # of Students   Percent   # of Students   Percent
A               1691       36.81          220        37.48
B               1563       34.02          187        31.86
C                669       14.56           85        14.48
D                629       13.69           89        15.16
E                  4        0.09            2         0.34
BLANK             38        0.83            4         0.68
Total           4594      100             587       100

Point-biserial: 0.31 (District), 0.38 (Pilot Group)

Page 15: California Educational Research Association Anaheim, California December 1, 2011

Algebra 1, Question 19 (PL: Advanced Proficient)

                 District               Pilot Group
Choice    # of Students   Percent   # of Students   Percent
A                971       21.18          108        18.40
B               1028       22.42          125        21.29
C               1193       26.02          145        24.70
D               1148       25.04          155        26.41
E                  7        0.15            0         0.00
BLANK            238        5.19           54         9.20
Total           4585      100             587       100

Point-biserial: 0.23 (District), 0.19 (Pilot Group)

Page 16: California Educational Research Association Anaheim, California December 1, 2011

Algebra 2, Question 21 (PL: Beyond Advanced Proficient)

                 District               Pilot Group
Choice    # of Students   Percent   # of Students   Percent
A                286       23.50           45        24.32
B                248       20.38           37        20.00
C                354       29.09           63        34.05
D                260       21.36           35        18.92
E                  0        0.00            0         0.00
BLANK             69        5.67            5         2.70
Total           1217      100             185       100

Point-biserial: 0.19 (District), 0.24 (Pilot Group)

Page 17: California Educational Research Association Anaheim, California December 1, 2011

Geometry, Question 12 (PL: Proficient)

                 District               Pilot Group
Choice    # of Students   Percent   # of Students   Percent
A                247       13.46           42        15.91
B                603       32.86           90        34.09
C                703       38.31           99        37.50
D                273       14.88           31        11.74
E                  0        0.00            0         0.00
BLANK              9        0.49            2         0.76
Total           1835      100             264       100

Point-biserial: 0.10 (District), 0.10 (Pilot Group)

Page 18: California Educational Research Association Anaheim, California December 1, 2011


Maximizing Predictive Accuracy of District Benchmarks

Objective

Extend the knowledge of the assessment team to:

1. Better understand how performance level setting is key to predictive validity.

2. Better understand how to create performance level bands based on equipercentile equating.

Page 19: California Educational Research Association Anaheim, California December 1, 2011


Comparing District Benchmarks to CST Results

Common Methods for Setting Cutoffs on District Benchmarks:

Use default settings on assessment platform (e.g. 20%, 40%, 60%, 80%)

Ask curriculum experts for their opinion of where cutoffs should be set

Determine percent correct corresponding to performance levels on CSTs and apply to benchmarks

Page 20: California Educational Research Association Anaheim, California December 1, 2011


Comparing District Benchmarks to CST Results

There is a better way!

Page 21: California Educational Research Association Anaheim, California December 1, 2011


Comparing District Benchmarks to CST Results

“Two scores, one on form X and the other on form Y, may be considered equivalent if their corresponding percentile ranks in any given group are equal.” (Educational Measurement, Second Edition, p. 563)

Page 22: California Educational Research Association Anaheim, California December 1, 2011


Comparing District Benchmarks to CST Results

Equipercentile Method of Equating at the Performance Level Cut-points

Establishes cutoffs for the benchmarks at the same local percentile ranks as the cutoffs for the CSTs.

By applying the same local percentile cutoffs to each trimester benchmark, comparisons across trimesters within a grade level are more defensible.

Page 23: California Educational Research Association Anaheim, California December 1, 2011


Equipercentile Equating Method
Step 1 – Identify CST Scaled Score (SS) Cut-points

Page 24: California Educational Research Association Anaheim, California December 1, 2011


Equipercentile Equating Method

Step 2 – Establish Local Percentiles at CST Performance Level Cutoffs (from the scaled score frequency distribution)

Page 25: California Educational Research Association Anaheim, California December 1, 2011


Equipercentile Equating Method

Step 3 – Locate Benchmark Raw Scores Corresponding to the CST Cutoff Percentiles (from the benchmark raw score frequency distribution)
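A small Python sketch of Steps 2 and 3 (added for illustration; the cut points and score distributions below are made up, and real work would use the district's matched CST and benchmark files):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matched data: one CST scaled score and one benchmark raw score per student
cst_scaled = rng.normal(350, 60, size=2000).round()
benchmark_raw = rng.normal(40, 12, size=2000).clip(0, 70).round()

# Step 1 (given): CST scaled-score cut points (illustrative values only)
cst_cuts = {"Basic": 300, "Proficient": 350, "Advanced": 418}

# Step 2: local percentile rank of each CST cut point in the district's CST distribution
def local_percentile_rank(scores, cut):
    return 100.0 * np.mean(scores < cut)

# Step 3: benchmark raw score sitting at the same local percentile rank
benchmark_cuts = {
    level: int(round(np.percentile(benchmark_raw, local_percentile_rank(cst_scaled, cut))))
    for level, cut in cst_cuts.items()
}

print(benchmark_cuts)   # raw-score cutoffs to use for the benchmark performance bands
```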

Page 26: California Educational Research Association Anaheim, California December 1, 2011


Equipercentile Equating Method
Step 4 – Validate Classification Accuracy – Old Cutoffs

Biology, 2nd Semester, vs. 2006 CST

Benchmark (Old Cutoff)    CST: FBB     BB   Basic  Proficient  Advanced   Total
0-17   FBB                      57     72      25        1          0       155
18-34  BB                      118    297     511       60          4       990
35-48  Basic                    19     51     427      401         45       943
49-62  Proficient                1      5      27      141        207       381
63-70  Advanced                  0      0       0        0         20        20
Total                          195    425     990      603        276      2489

Correct Classification: Proficient & Advanced on CST = 42%
Correct Classification: Each Level on CST = 38%


Page 29: California Educational Research Association Anaheim, California December 1, 2011


Equipercentile Equating Method
Step 4 – Validate Classification Accuracy – New Cutoffs

Biology, 2nd Semester, vs. 2006 CST

Benchmark (New Cutoff)    CST: FBB     BB   Basic  Proficient  Advanced   Total
0-19   FBB                      89    107      53        4          0       253
20-26  BB                       59    142     148       12          0       361
27-40  Basic                    39    161     596      176          9       981
41-51  Proficient                8     12     181      354         82       637
52-70  Advanced                  0      3      12       57        185       257
Total                          195    425     990      603        276      2489

Correct Classification: Proficient & Advanced on CST = 77%
Correct Classification: Each Level on CST = 55%
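A short Python sketch (added for illustration) that reproduces the two percentages on this slide from the crosstab above, under the natural reading that the "Proficient & Advanced" figure is computed over students who scored Proficient or Advanced on the CST:

```python
import numpy as np

# Crosstab from the "new cutoffs" slide: rows = benchmark level, columns = CST level
# Level order: FBB, BB, Basic, Proficient, Advanced
crosstab = np.array([
    [89, 107,  53,   4,   0],
    [59, 142, 148,  12,   0],
    [39, 161, 596, 176,   9],
    [ 8,  12, 181, 354,  82],
    [ 0,   3,  12,  57, 185],
])

# "Each Level": students placed in the same level by the benchmark and the CST
each_level = np.trace(crosstab) / crosstab.sum()

# "Proficient & Advanced": of students Proficient or Advanced on the CST, the share
# the benchmark also classified as Proficient or Advanced
prof_adv = crosstab[3:, 3:].sum() / crosstab[:, 3:].sum()

print(f"Each Level: {each_level:.0%}")             # 55%
print(f"Proficient & Advanced: {prof_adv:.0%}")    # 77%
```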


Page 32: California Educational Research Association Anaheim, California December 1, 2011


Example: Classification Accuracy – Biology

                                Old     New
2nd Semester
  Proficient or Advanced        42%     77%
  Each Level                    38%     55%
1st Semester
  Proficient or Advanced        30%     77%
  Each Level                    31%     50%

Page 33: California Educational Research Association Anaheim, California December 1, 2011


Example: Classification Accuracy – Biology

                                Old     New
1st Quarter
  Proficient or Advanced        53%     71%
  Each Level                    41%     46%

Page 34: California Educational Research Association Anaheim, California December 1, 2011


Example: Classification Accuracy – Chemistry

                                Old     New
2nd Semester: Prof. & Adv.      63%     79%
2nd Semester: Each Level        47%     52%
1st Semester: Prof. & Adv.      74%     74%
1st Semester: Each Level        49%     50%
1st Quarter: Prof. & Adv.       83%     76%
1st Quarter: Each Level         48%     47%

Page 35: California Educational Research Association Anaheim, California December 1, 2011


Example: Classification Accuracy – Earth Science

                                Old     New
2nd Semester: Prof. & Adv.      48%     68%
2nd Semester: Each Level        43%     52%
1st Semester: Prof. & Adv.      33%     66%
1st Semester: Each Level        38%     47%
1st Quarter: Prof. & Adv.       42%     56%
1st Quarter: Each Level         34%     41%

Page 36: California Educational Research Association Anaheim, California December 1, 2011


Example: Classification Accuracy – Physics

                                Old     New
2nd Semester: Prof. & Adv.      57%     87%
2nd Semester: Each Level        37%     57%
1st Semester: Prof. & Adv.      60%     88%
1st Semester: Each Level        42%     50%
1st Quarter: Prof. & Adv.       65%     87%
1st Quarter: Each Level         47%     45%

Page 37: California Educational Research Association Anaheim, California December 1, 2011


Things to Consider Prior to Establishing the Benchmark Cutoffs

Will there be changes to the benchmarks after the CST percentile cutoffs are established?

If NO, then raw-score benchmark cutoffs can be established by linking the CST to the same year's benchmark administration (i.e., spring 2011 CST matched to 2010-11 benchmark raw scores).

If YES, then wait until the new benchmark is administered and then establish the raw-score cutoffs on that benchmark.

How many cases are available for establishing the CST percentiles? (too few cases could lead to unstable percentile distributions)

Page 38: California Educational Research Association Anaheim, California December 1, 2011


Things to Consider Prior to Establishing the Benchmark Cutoffs (Continued)

How many items comprise the benchmarks to be equated? (as the test gets shorter, it becomes more difficult to match the percentile cut-points established on the CSTs)

Page 39: California Educational Research Association Anaheim, California December 1, 2011


Summary: Equipercentile Equating Method

The method generally establishes a closer correspondence between the CST and the benchmarks.

When benchmarks are tightly aligned with the CSTs, the approach may be less advantageous (e.g., elementary math).

Comparisons between benchmark and CST performance can be made more confidently

Comparisons between benchmarks within the school year can be made more confidently

Page 40: California Educational Research Association Anaheim, California December 1, 2011


Coming Soon from Illuminate Education, Inc.!

Reports using the equipercentile methodology are being programmed to:

(1) establish benchmark cutoffs for performance bands

(2) create validation tables showing improved classification accuracy based on the method

Page 41: California Educational Research Association Anaheim, California December 1, 2011

Contact:

Tom Barrett, Ph.D.

President, Barrett Enterprises, LLC

Director, Owl Corps, School Wise Press

2173 Hackamore Place

Riverside, CA 92506

951-905-5367 (office)

951-237-9452 (cell)
