Vertical Scaling and the Development of Skills
Marty McCall, Northwest Evaluation Association
WERA/OSPI State Assessment Conference, SeaTac, WA
December 7, 2007
Examining constructs through vertical scales
What are scales, anyway?
Examples: temperature, length, volume, time
What do they have in common?
Achievement scales – Latent constructs
A framework for measuring student achievement. Scores refer to a point on the scale.
What is the meaning of the point on the scale?
Example: A score of 400 on the 4th grade Reading WASL.
What does it mean? How do you know?
What do you know about a score of 385 on the same test? How do you know?
Achievement scales
A framework for measuring the difficulty of test questions. Each item has a difficulty rating expressed as a point on the scale.
What is the meaning of the point on the scale?
A student with a score of 400 gets items with a difficulty of 400 right about half the time.
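Under the Rasch model, the probability of a correct response depends only on the difference between ability and item difficulty, so it is exactly .5 when the two are equal. The minimal sketch below assumes that scale scores are a linear transformation of the Rasch logit metric; the constants `SCALE_CENTER` and `POINTS_PER_LOGIT` are hypothetical and are not the WASL's actual scaling constants.

```python
import math

# Hypothetical scaling constants: NOT the actual WASL transformation.
SCALE_CENTER = 400.0      # scale score assumed to correspond to 0 logits
POINTS_PER_LOGIT = 50.0   # assumed scale-score points per logit


def to_logits(scale_score: float) -> float:
    """Convert a scale score to logits under the assumed linear transformation."""
    return (scale_score - SCALE_CENTER) / POINTS_PER_LOGIT


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch probability of a correct response; both arguments are scale scores."""
    diff = to_logits(ability) - to_logits(difficulty)
    return 1.0 / (1.0 + math.exp(-diff))


print(round(p_correct(400, 400), 3))  # ability equals difficulty: 0.5
print(round(p_correct(400, 385), 3))  # item 15 points easier: 0.574
```

Only the difference between ability and difficulty matters, so the assumed center cancels out. The assumed points-per-logit constant determines how far from .5 a 15-point gap moves the probability.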
Achievement scales
WASL scales were originally developed separately for each grade and subject.
Items were written specifically for each set of grade level standards.
The scale was developed using these items and students in the tested grade.
For each grade, the score that represents "meeting standard" was set at 400.
What are vertical scales?
Vertical scales:
- span ages or grades
- provide a common framework for measurement over time
- produce scores that show change over time
- place items taken at different times on the same scale
How do you interpret the difference between a 400 on the 4th grade WASL and a 400 on the 7th grade WASL?
Vertical scales articulate content across grades
In the development of vertical scales, the progression from early skills to later skills is used throughout the process.
- What are the foundational skills?
- How do they relate to later, more complex skills?
- The data give an empirical check on theory.
Who uses vertical scales?
- CTB/McGraw: TerraNova series; Comprehensive Test of Basic Skills (CTBS)
- Harcourt: Stanford Achievement Test; Metropolitan Achievement Test
- Statewide NCLB tests: all states using CTB or Harcourt tests; Mississippi, North Carolina, Oregon, Idaho
- Woodcock cognitive batteries
- NWEA: MAP tests
Why use vertical scales?
To model growth: Tests that are vertically scaled are intended to support valid inferences regarding growth over time.
--Patz, Yao, Chia, Lewis, & Hoskins (CTB/McGraw)
To study cognitive changes: “When people acquire new skills, they are changing in fundamental interesting ways. By being able to measure change over time it is possible to map phenomena at the heart of the educational enterprise.”
--John Willett
Modeling Growth
The original NCLB model was a status model. After intensive discussion, a growth component pilot has been added.
Why are growth models better than status models in evaluating school effectiveness?
Why did NCLB initially reject them?
Modeling Growth
Growth models share common characteristics:
- Measure change over time
- Take initial conditions into account
- Compare to some expectation of growth
Take initial conditions into account
Students with low scores grow more than those with high scores. (WASL research shows this as well.)
What happens if you don’t account for this?
What is expected growth? Normative, policy, or both.
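One common way to take initial conditions into account is to regress gains on initial scores and compare each student's observed gain with the gain expected for students who started at the same point. The sketch below does this with a simple least-squares line. The scores are made-up illustrative values, not WASL or MAP data. The fitted line is a normative expectation only; a policy target could replace it.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def growth_relative_to_expectation(initial, final):
    """Observed gain minus the gain predicted from the initial score."""
    gains = [f - i for i, f in zip(initial, final)]
    a, b = fit_line(initial, gains)
    residuals = [g - (a + b * i) for i, g in zip(initial, gains)]
    return residuals, b


# Hypothetical fall and spring scale scores for five students.
fall = [180, 190, 200, 210, 220]
spring = [195, 202, 210, 218, 226]

residual_growth, slope = growth_relative_to_expectation(fall, spring)
print(f"slope of gain on initial score: {slope:.2f}")  # negative: low scorers gain more
print([round(r, 2) for r in residual_growth])
```

If initial status were ignored, the lowest-scoring students here would look most effective simply because they started low. The residuals remove that effect.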
Modeling Cognitive Changes
- What skills are acquired first?
- What skills are precursors of others?
- What skills are components or features of others?
- As people change over time, what patterns are present in the data?
Why is there a concern in educational settings?
- Qualitative changes: experienced differently, described differently
- Perceived discontinuities
- Requires different measurement instruments in different areas of the scale
What makes a vertical achievement scale different from other scales?
Compare with physical scales, e.g., temperature:
- Qualitative changes: experienced differently, described differently
- Perceived discontinuities
- Requires different measurement instruments in different areas of the scale
What makes a vertical scale different from other scales?
What is different about achievement scales?
Physical scales:
- Measured directly
- No controversy over dimensional structure
Achievement scales:
- Latent, inferred
- Differences of opinion about dimensional structure
- Choice of metric determined by substantive belief
First ask the question: Is there a construct that grows over time?
Then look at structure.
Beliefs more conducive to vertical scaling
The construct embodies a complex ability—one that has many parts and relations between the parts
The mature ability (reading or doing algebra problems) involves many component skills working together
The ability itself is unlike any of its component skills.
Complex skills are emergent properties of simpler skills and in turn become components of still more complex skills
Why NOT use vertical scales?
Criticism centers on two major issues:
- Linking error
- Violations of dimensionality assumptions
Why NOT use vertical scales?
- Trying to merge two or more existing scales can be tricky (e.g., merging existing benchmark scales).
- Merging scales from tests given far apart in time can be difficult to interpret (e.g., Haertel’s analysis of NAEP scales).
- Fixed-form linking may be too weak for vertical scaling (e.g., Huynh, Meyer & Barton).
Issue #1: Linking creates error
What is linking?
Finding common information to associate students and items to the same scale
Common-item linking
Common-person linking
Finding the unknown from the known
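Under the Rasch model, common-item linking can be as simple as a mean shift. The same items are calibrated on two forms, and the average difference between their two difficulty estimates is the constant that places one calibration on the other's scale. The sketch below assumes Rasch calibrations in logits. The item IDs and difficulty values are hypothetical, and this mean-shift method is one generic approach, not necessarily the one NWEA uses. The standard error is the usual approximation: the SD of the differences divided by the square root of the number of common items.

```python
from math import sqrt
from statistics import mean, stdev


def mean_shift_link(anchor_on_base, anchor_on_new):
    """Return (shift, standard error) that moves the new calibration onto the base scale.

    Both arguments map item IDs to Rasch difficulties (logits) for the common items.
    """
    common = sorted(set(anchor_on_base) & set(anchor_on_new))
    diffs = [anchor_on_base[i] - anchor_on_new[i] for i in common]
    shift = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # shrinks as more common items are used
    return shift, se


# Hypothetical difficulties for four common items on grade 4 and grade 5 calibrations.
grade4 = {"item_a": 0.40, "item_b": 0.90, "item_c": 1.30, "item_d": 1.75}
grade5 = {"item_a": -0.55, "item_b": -0.10, "item_c": 0.35, "item_d": 0.70}

shift, se = mean_shift_link(grade4, grade5)
# A hypothetical grade 5 item at -0.20 logits, expressed on the grade 4 scale:
new_item_on_grade4_scale = -0.20 + shift
print(f"shift = {shift:.4f}, SE = {se:.4f}, new item = {new_item_on_grade4_scale:.4f}")
```

The known quantities (the common items' difficulties on both calibrations) determine the unknown one (the shift), which then applies to every item and student on the new calibration.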
Issue #1: Linking creates error
There is some error associated with all measurement, but current methods of vertical scaling greatly minimize it. These methods include:
--triangulation with multiple forms or common person links
--comprehensive and well-distributed linking blocks
--continuous adjacent linking
--fixed-parameter linking in an adaptive context
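Triangulating across multiple forms or common-person links reduces linking error because independent estimates of the same linking constant can be combined. An inverse-variance weighted average of those estimates has a smaller standard error than any single link. The sketch below is a generic statistical illustration. The link estimates and standard errors are hypothetical, and the presentation does not say that this particular weighting is used.

```python
from math import sqrt


def combine_links(estimates):
    """Inverse-variance weighted average of independent linking constants.

    `estimates` is a list of (shift, standard_error) pairs, e.g. from
    different linking forms or a common-person link (hypothetical inputs).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    combined = sum(w * s for w, (s, _) in zip(weights, estimates)) / total
    combined_se = sqrt(1.0 / total)  # smaller than any individual SE
    return combined, combined_se


links = [(0.99, 0.05), (0.95, 0.04), (1.02, 0.06)]  # hypothetical link estimates
shift, se = combine_links(links)
print(f"combined shift {shift:.3f}, SE {se:.3f}")
```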
How do people actually create and maintain vertical scales?
Harcourt – common-person linking for the SAT (Stanford Achievement Test) and comprehensive linking blocks
CTB – methods include concurrent calibration, non-equivalent groups with anchor test (NEAT) designs, and innovative linking methods
ETS – (the king of NEAT) – also uses an integrated IRT method (von Davier & von Davier)
How do we do it?
The scale establishment method is described extensively in Probability in the Measurement of Achievement by George Ingebo.
How do we do it? Extensive initial linking
[Figure: linking design diagram connecting test forms A, B, C, and D]
Fixed-Form Vertical Linking for non-adjacent grades
[Figure: a vertical linking block shared by the Benchmark X form and the Benchmark X+1 form]
Adaptive Continuous Vertical Linking
[Figure: adaptive testing spanning Benchmark X and Benchmark X+1]
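Fixed-parameter linking in an adaptive context can be sketched in two steps. First, examinee abilities are estimated from items whose difficulties are already on the scale and are held fixed. Second, a new field-test item's difficulty is estimated from those abilities, which places the item on the same scale without a separate linking step. The sketch below covers the Rasch model only and uses damped Newton-Raphson maximum likelihood. The data and function names are hypothetical, and an operational system would need more, such as handling of perfect scores and item-fit checks.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, iters=50):
    """MLE of ability with item difficulties held fixed (needs a mix of 0s and 1s)."""
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(responses, probs))  # d logL / d theta
        information = sum(p * (1 - p) for p in probs)
        theta += max(-1.0, min(1.0, gradient / information))  # damped Newton step
    return theta


def estimate_new_item(responses, thetas, iters=50):
    """MLE of a new item's difficulty with examinee abilities held fixed."""
    b = 0.0
    for _ in range(iters):
        probs = [rasch_p(t, b) for t in thetas]
        gradient = sum(p - u for u, p in zip(responses, probs))  # d logL / d b
        information = sum(p * (1 - p) for p in probs)
        b += max(-1.0, min(1.0, gradient / information))
    return b


# Hypothetical anchor items already on the scale (logits) and three examinees.
anchor_b = [-1.0, -0.5, 0.0, 0.5, 1.0]
anchor_responses = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 1, 0]]
new_item_responses = [1, 0, 1]  # the same examinees on a field-test item

thetas = [estimate_theta(r, anchor_b) for r in anchor_responses]
new_b = estimate_new_item(new_item_responses, thetas)
print([round(t, 2) for t in thetas], round(new_b, 2))
```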
Issue #2: Dimensionality
Reading and mathematics at grade 3 look very different from those subjects at grade 8. In addition, the curricular topics differ at each grade.
How can they be on the same scale?
The Assumption of Unidimensionality
A student’s response to an item is determined by his or her ability in the subject (construct) being tested.
When this single ability is taken into account, there is no correlation among item responses.
The underlying construct does not have statistical dimensions or factors.
Is this a convenient fiction?
Study of Dimensionality:
McCall & Hauser, "Item response theory and longitudinal modeling: The real world is less complicated than we fear." Presented at the MSDE/MARCES Conference; in press.
- Do content areas within grades form statistical dimensions?
- Does essential unidimensionality hold throughout the scale?
- Looking for a method to evaluate dimensionality in CATs
Study of Dimensionality:
We used reading and mathematics items following the state content design in grades 3 through 8: 252 items in each subject.
The items had been used in fixed-form tests within grades and had also been administered adaptively across grades.
This allowed us to compare the dimensionality of an item set used on both fixed-form and adaptive tests.
Do content areas within grades form statistical dimensions?
We used the method from Bejar (1980), “A procedure for investigating the unidimensionality of achievement tests based on item parameter estimates,” Journal of Educational Measurement, 17(4), 283–296.
Each item is calibrated twice: once using responses to all items on the test (the usual method), and again using only responses to items in the same goal area.
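The logic of the two-calibration comparison is this: if the test is unidimensional, the two sets of difficulty estimates should agree up to sampling error and a linear transformation, so they should fall close to a straight line when plotted against each other. The sketch below covers the comparison step only and assumes both calibrations have already been run. The difficulty values are hypothetical. Summarizing the fit with a correlation and the largest residual from a least-squares line is an illustrative choice, not Bejar's exact procedure.

```python
from statistics import mean


def compare_calibrations(full_test, goal_area):
    """Fit goal_area = a + b * full_test and report simple fit diagnostics.

    Both arguments are lists of difficulty estimates for the same items,
    in the same order (hypothetical values below).
    """
    mx, my = mean(full_test), mean(goal_area)
    sxy = sum((x - mx) * (y - my) for x, y in zip(full_test, goal_area))
    sxx = sum((x - mx) ** 2 for x in full_test)
    syy = sum((y - my) ** 2 for y in goal_area)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    residuals = [y - (intercept + slope * x) for x, y in zip(full_test, goal_area)]
    return r, slope, max(abs(e) for e in residuals)


full = [-1.2, -0.6, 0.1, 0.4, 1.3]    # calibrated with all items on the test
within = [-1.1, -0.7, 0.2, 0.3, 1.4]  # calibrated within the goal area only
r, slope, worst = compare_calibrations(full, within)
print(f"r = {r:.3f}, slope = {slope:.2f}, largest residual = {worst:.2f}")
```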
Does essential unidimensionality hold throughout the scale?
Dimensionality detection methods usually involve looking at common-form tests. Is there a good way to examine dimensionality in CATs?
We used Yen’s Q3 statistic to conduct an exploratory dimensionality study.
Pairs of responses from adaptive tests – NWEA’s Measures of Academic Progress
Over 49 million response pairs per subject. We limited the study to item pairs that had occurred together on at least 120 tests.
| | Reading | Math |
|---|---|---|
| Number of items | 252 | 252 |
| Number of valid item pairs | 25,713 | 20,449 |
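Each adaptive test administers a different subset of items, so Q3 can only be computed for item pairs that appear together on enough tests. The sketch below shows that filtering step. The 120-test minimum comes from the study. The data layout (one set of item IDs per test) and the names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

MIN_TESTS = 120  # minimum number of tests on which a pair must co-occur (from the study)


def valid_item_pairs(tests, min_tests=MIN_TESTS):
    """Return item pairs administered together on at least `min_tests` tests.

    `tests` is an iterable of collections of item IDs, one per administered test.
    """
    counts = Counter()
    for items in tests:
        # sorted() gives each unordered pair a single canonical key
        counts.update(combinations(sorted(set(items)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_tests}


# Hypothetical administrations: 130 tests share three items; 50 tests share a different pair.
tests = [{"r101", "r102", "r205"}] * 130 + [{"r101", "r300"}] * 50
pairs = valid_item_pairs(tests)
print(sorted(pairs))
```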
Basic concept: When the assumption of unidimensionality is satisfied, responses exhibit local independence. That is, when the effects of theta are taken into account, correlation between responses is zero.
Q3 is the correlation between residuals of response pairs.
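Let $u_{ik}$ be the score of the $k$th examinee on the $i$th item. $P_i(\theta_k)$ is given by the Rasch model:

$$P_i(\theta_k) = P(u_{ik} = 1 \mid \theta_k) = \frac{1}{1 + \exp[-(\theta_k - b_i)]}$$

where $\theta_k$ is the ability of examinee $k$ and $b_i$ is the difficulty of item $i$. The residual $d_{ik}$ is:

$$d_{ik} = u_{ik} - P_i(\theta_k)$$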
The correlation between the residuals, taken over examinees who have taken both item $i$ and item $j$, is:

$$Q_{3,ij} = r_{d_i d_j}$$

Fisher's r-to-z′ transformation makes the distribution of the correlations approximately normal:

$$z' = 0.5\,[\ln(1 + r) - \ln(1 - r)]$$

Q3 values tend to be negative (Kingston & Dorans).
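The residual, Q3, and z′ formulas combine into a short computation for one item pair. The sketch below assumes that Rasch difficulties and ability estimates are already available and that missing responses (items an examinee did not see, as on an adaptive test) are marked `None`. Only examinees who answered both items enter the correlation. The data are hypothetical.

```python
import math
from statistics import mean


def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


def q3(resp_i, resp_j, b_i, b_j, thetas):
    """Yen's Q3 for items i and j: correlation of residuals d = u - P over shared examinees."""
    d_i, d_j = [], []
    for u_i, u_j, theta in zip(resp_i, resp_j, thetas):
        if u_i is None or u_j is None:
            continue  # examinee did not take both items
        d_i.append(u_i - rasch_p(theta, b_i))
        d_j.append(u_j - rasch_p(theta, b_j))
    return pearson(d_i, d_j)


def fisher_z(r):
    """Fisher's r-to-z' transformation: 0.5 * [ln(1 + r) - ln(1 - r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


# Hypothetical ability estimates and responses (None = item not administered).
thetas = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2]
item_i = [0, 1, 0, 1, 1, None]
item_j = [0, 0, 1, None, 1, 1]
r = q3(item_i, item_j, b_i=0.2, b_j=-0.1, thetas=thetas)
z = fisher_z(r)
print(f"Q3 = {r:.3f}, z' = {z:.3f}")
```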
Pairs of responses from adaptive tests – NWEA’s Measures of Academic Progress
| | Reading | Math |
|---|---|---|
| Mean Fisher's z′ | -0.025 | -0.020 |
| Standard deviation of z′ | 0.041 | 0.050 |
These are very small Q3 values compared to what we had seen in the literature.
This indicates that the constructs are unidimensional within and across grades.
Good news, right?
We concluded that our scale was essentially unidimensional within each grade and that the vertical scale was unidimensional throughout.
But then we started thinking...
Is Q3 adequate for evaluating CAT dimensionality?
Adaptive tests seek the most informative items for the examinee, quickly homing in on items whose expected p-value is around .5.
The variance of the residuals may therefore be restricted, leading to low correlations.
NEW Study – establish plausible Q3 values to aid interpretation
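Plausible Q3 values can be established by simulation: generate responses that are unidimensional by construction, estimate abilities, and examine the resulting Q3 distribution. The sketch below assumes a 40-item fixed-form Rasch test with made-up generating values. It therefore mirrors a fixed-form unidimensional condition and does not model adaptive item selection. With estimated abilities, each examinee's residuals are constrained (for a non-extreme Rasch MLE, they sum to zero across items), which pushes Q3 slightly negative. A commonly cited approximation for the expected value is -1/(n - 1) for n items.

```python
import math
import random
from itertools import combinations
from statistics import mean

random.seed(2007)
N_EXAMINEES, N_ITEMS = 1000, 40  # assumed simulation sizes


def p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_theta(resp, bs, lo=-4.0, hi=4.0):
    """Rasch MLE of ability by damped Newton-Raphson; zero and perfect scores are clipped."""
    score = sum(resp)
    if score == 0:
        return lo
    if score == len(resp):
        return hi
    theta = 0.0
    for _ in range(30):
        probs = [p(theta, b) for b in bs]
        step = (score - sum(probs)) / sum(q * (1 - q) for q in probs)
        theta = max(lo, min(hi, theta + max(-1.0, min(1.0, step))))
    return theta


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


bs = [random.uniform(-2, 2) for _ in range(N_ITEMS)]
true_thetas = [random.gauss(0, 1) for _ in range(N_EXAMINEES)]
data = [[int(random.random() < p(t, b)) for b in bs] for t in true_thetas]

est = [mle_theta(row, bs) for row in data]
# Residuals d_ik = u_ik - P_i(theta_hat_k), stored item by item
resid = [[row[i] - p(th, bs[i]) for row, th in zip(data, est)] for i in range(N_ITEMS)]

q3_values = [corr(resid[i], resid[j]) for i, j in combinations(range(N_ITEMS), 2)]
print(f"pairs: {len(q3_values)}, mean Q3: {mean(q3_values):.4f}, "
      f"-1/(n-1): {-1 / (N_ITEMS - 1):.4f}")
```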
Criteria adopted
Using the standard deviation of the Q3 statistic for the unidimensional condition (.011), Q3 statistics were classified as large if they lay more than .022 (two standard deviations) from the mean for each condition.
For the simulated data, Q3 statistics are large if they fall outside the range:
-.047 < Q3 < -.0036
Most of the Condition 4 pairs with large positive Q3 statistics are items from the same half of the test. Pairs with large negative correlations are from different halves. Q3 can detect violations of local independence.
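The criterion amounts to a band of ±2 × .011 = ±.022 around each condition's mean Q3, and pairs outside the band are flagged as large. In the sketch below, the band width comes from the text, while the function names and example pair values are hypothetical. Because the reported means and SDs are rounded, cutoffs computed this way can differ from the table's in the last digit.

```python
UNIDIMENSIONAL_SD = 0.011     # SD of Q3 in the unidimensional condition (from the study)
BAND = 2 * UNIDIMENSIONAL_SD  # .022 on either side of the mean


def thresholds(condition_mean):
    """Low and high cutoffs for 'large' Q3 values in one condition."""
    return condition_mean - BAND, condition_mean + BAND


def flag_large(q3_by_pair, condition_mean):
    """Return the pairs whose Q3 falls outside the band, labeled by sign."""
    low, high = thresholds(condition_mean)
    flagged = {}
    for pair, value in q3_by_pair.items():
        if value < low:
            flagged[pair] = "large negative"
        elif value > high:
            flagged[pair] = "large positive"
    return flagged


low, high = thresholds(-0.025)  # rounded Condition 1 mean
flags = flag_large({("a", "b"): 0.01, ("a", "c"): -0.06, ("b", "c"): -0.02}, -0.025)
print(f"{low:.3f} < Q3 < {high:.3f}", flags)
```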
Criteria adopted for adaptive data
| Data | Mean | Standard deviation | Number of item pairs | Low threshold | High threshold |
|---|---|---|---|---|---|
| Adaptive reading data | -0.024 | 0.029 | 19,774 | -0.045 | -0.0018 |
| Adaptive math data | -0.018 | 0.037 | 13,864 | -0.039 | 0.0043 |
| Condition 1 data | -0.025 | 0.011 | 780 | -0.047 | -0.0036 |
Neither reading nor math showed patterns of local dependence corresponding to grade level. Reading did not show local dependence corresponding to content structure. Mathematics did show evidence of local dependence related to content structure.
What we have found regarding dimensionality:
New topics build on earlier ones and show up statistically as part of the construct
Although they may not be specified in later standards, early topics and skills are embedded in later ones (e.g., phonemics, number sense)
Essential unidimensionality holds throughout the scale with minor dimensions of interest
Thank you for your attention.
Marty McCall, Northwest Evaluation Association
5885 SW Meadows Road, Suite 200, Lake Oswego, Oregon 97035-3256
Phone: 503-624-1951, FAX: 503-639-7873