Robert L. Linn CRESST, University of Colorado at Boulder Presentation at the Ninth Annual Maryland Assessment Conference: The Concept of Validity : Revisions, New Directions and Application. College Park MD: University of Maryland. Sponsored by the Maryland State Department of Education and the Maryland Assessment Research Center for Education Success, October 9 and 10, 2008 The Concept of Validity in the Context of NCLB


Robert L. Linn

CRESST, University of Colorado at Boulder

Presentation at the Ninth Annual Maryland Assessment Conference: The Concept of Validity : Revisions, New Directions and Application. College Park MD: University of Maryland. Sponsored by the Maryland State Department of Education and the Maryland Assessment Research Center for Education Success, October 9 and 10, 2008

The Concept of Validity in the Context of NCLB

Validity

Points of Broad Consensus

• Validity is the most fundamental consideration in the evaluation of the appropriateness of claims about, and uses and interpretations of assessment results.

• Validity is a matter of degree rather than all or none.

Validity (continued)

Broad, but not universal, agreement (for an exception, see Lissitz & Samuelsen, 2007)

• It is the uses and interpretations of tests, rather than the test itself, that are validated.

• Validity may be relatively high for one use or interpretation of assessment results but quite low for another use or interpretation.

Validity (continued)

• A comprehensive validation program for state tests used for purposes of NCLB requires systematic analysis of the myriad uses, interpretations, and claims that are made.

• Evidence relative to particular uses, interpretations and claims needs to be accumulated and organized into relevant validity arguments (Kane, 2006).

1999 Test Standards

• “Validity is the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests.”

• “Validation logically begins with an explicit statement of the proposed interpretation of test scores, along with a rationale for the relevance of the interpretation to the proposed use.”

(AERA, APA, & NCME, 1999, p. 9).

Foundation for position in the Test Standards

Concept of validity in the Test Standards builds on the work of major validity theorists

• Cronbach (1971, 1980, 1988, 1989)

• Kane (1993)

• Messick (1975, 1989)

• Shepard (1993)

Kane (2006) Argument-Based Approach

• Interpretive Argument: specification of proposed interpretations and uses of

• Validation Argument: evaluation of the interpretive argument

• Builds on earlier work by Cronbach (1989), Kane (1992), Messick (1989), and Shepard (1993)

Validity Argument (Cronbach, 1988)

• Functional perspective

• Political perspective

• Operationalist perspective

• Economic perspective

• Explanatory perspective

Guiding Questions Shepard (1993)

• “What does the testing practice claim to do?

• What are the arguments for and against the intended aims of the test?

• What does the test do in the system other than in claims?” (p. 429)

NCLB Accountability

• States are required to administer tests of mathematics and reading or English language arts to all students in grades 3 through 8

• Science tests required for one grade in each of three levels: elementary, middle, and high school

NCLB Accountability (continued)

• States had to adopt academic achievement standards defining proficient performance and two other levels (usually called basic and advanced)

• States had to establish targets, known as annual measurable objectives (AMOs), set on trajectories that would lead to all students being at the proficient level or above by 2014

NCLB Targets

• Current Status: AMO is percent proficient each year that is set to be on a trajectory to 100% proficient or above by 2014

• Change: Safe harbor allows a school to make AYP if the percentage of students below proficient is reduced by at least 10% compared to the previous year
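The two targets on this slide can be sketched as simple arithmetic. The following is a minimal illustration, not any state's actual rule: the function names are hypothetical, the straight-line trajectory is an assumption (states were free to choose other trajectory shapes), and any additional safe harbor conditions are ignored.

```python
def linear_amo(baseline_pct, start_year, year, target_year=2014):
    """Percent-proficient target for `year`, assuming a straight line
    from the baseline percentage to 100% in the target year.
    (The linear shape is an assumption made for illustration.)"""
    if year >= target_year:
        return 100.0
    fraction_elapsed = (year - start_year) / (target_year - start_year)
    return baseline_pct + fraction_elapsed * (100.0 - baseline_pct)


def meets_safe_harbor(prev_pct_proficient, curr_pct_proficient,
                      required_reduction=0.10):
    """True if the share of students BELOW proficient fell by at least
    `required_reduction` (10%) relative to the previous year."""
    prev_below = 100.0 - prev_pct_proficient
    curr_below = 100.0 - curr_pct_proficient
    return curr_below <= prev_below * (1.0 - required_reduction)


# A school starting at 40% proficient in 2002 would face a 70% target in 2008.
target_2008 = linear_amo(40.0, 2002, 2008)

# Moving from 40% to 47% proficient cuts the below-proficient share from
# 60% to 53%, which is more than a 10% relative reduction.
safe = meets_safe_harbor(40.0, 47.0)
```

Note that the 10% is relative to the previous year's below-proficient share, so a school far below its AMO can still qualify with a fairly small gain in percent proficient.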

NCLB Targets (continued)

Disaggregated reporting for subgroups

• Economically disadvantaged students

• Major racial and ethnic groups

• Students with disabilities

• Students with limited English proficiency

NCLB Targets (continued)

Subgroup reporting

• Critical for monitoring the closing of gaps in achievement

• No real relevance for small schools with homogeneous student bodies

• However, it leads to many hurdles that large, diverse schools must meet

Multiple-Hurdle Approach

• NCLB uses multiple-hurdle approach

• Schools must meet multiple targets each year – participation and achievement separately for reading and mathematics for the total student body and for subgroups of sufficient size

Multiple-Hurdle Approach (continued)

• Many ways to fail to make AYP (miss any target), but only one way to make AYP (meet or exceed every target)

• Large schools with diverse student bodies at a relative disadvantage in comparison to small schools or schools with relatively homogeneous student bodies
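The multiple-hurdle logic can be sketched as a conjunction over every applicable target. This is an illustrative sketch only: the `GroupResult` type and function names are hypothetical, the minimum subgroup size of 30 is an assumed value (states set their own), the 95% participation threshold is used as the participation target, and safe harbor is ignored.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    group: str             # "all" for the total student body, else a subgroup name
    subject: str           # "reading" or "mathematics"
    n_students: int
    pct_tested: float      # participation rate
    pct_proficient: float


def missed_targets(results, amo, min_n=30, min_participation=95.0):
    """List every hurdle a school misses. Subgroups smaller than
    `min_n` (an assumed threshold) are not counted as hurdles."""
    missed = []
    for r in results:
        if r.group != "all" and r.n_students < min_n:
            continue  # subgroup too small to count
        if r.pct_tested < min_participation:
            missed.append((r.group, r.subject, "participation"))
        if r.pct_proficient < amo[r.subject]:
            missed.append((r.group, r.subject, "achievement"))
    return missed


def makes_ayp(results, amo, **kwargs):
    """A school makes AYP only if it misses no applicable target."""
    return not missed_targets(results, amo, **kwargs)


amo = {"reading": 60.0, "mathematics": 55.0}

# Small, homogeneous school: only the total student body counts (4 hurdles).
small_school = [
    GroupResult("all", "reading", 120, 98.0, 64.0),
    GroupResult("all", "mathematics", 120, 98.0, 58.0),
]

# Larger, diverse school with the same overall results plus one
# reportable subgroup that misses a single achievement target.
large_school = small_school + [
    GroupResult("students with disabilities", "reading", 45, 97.0, 61.0),
    GroupResult("students with disabilities", "mathematics", 45, 97.0, 40.0),
]

small_makes_ayp = makes_ayp(small_school, amo)
large_makes_ayp = makes_ayp(large_school, amo)
```

With identical results for the total student body, the larger school fails on a single subgroup target that the smaller school never has to face, which is the relative disadvantage the slide describes.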

Growth Models

• Growth Pilot Program: Percentage of students who are either proficient or on a growth trajectory toward proficient within three years

• Because growth counts toward AYP only when a student's trajectory reaches proficiency within a short period, few schools that would not make AYP under the status approach do so under the growth approach
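The growth pilot statistic can be sketched as follows. This is a simplified illustration under strong assumptions that are not in the presentation: scores are on one vertical scale with a single proficiency cut score, and a student's trajectory is projected by repeating the most recent annual gain. Approved state models differed in their details, and all names here are hypothetical.

```python
def on_track(prev_score, curr_score, cut_score, years=3):
    """Assumed projection rule: repeat the latest annual gain for
    `years` more years and check whether the cut score is reached."""
    annual_gain = curr_score - prev_score
    return curr_score + years * annual_gain >= cut_score


def pct_proficient_or_on_track(students, cut_score, years=3):
    """Percentage of students who are proficient now or projected to be
    proficient within `years`. `students` is a list of
    (previous_score, current_score) pairs."""
    if not students:
        return 0.0
    count = sum(
        1 for prev, curr in students
        if curr >= cut_score or on_track(prev, curr, cut_score, years)
    )
    return 100.0 * count / len(students)


students = [
    (390, 410),  # already proficient
    (340, 360),  # gaining 20 per year: projected 420, on track
    (300, 310),  # gaining 10 per year: projected 340, not on track
    (350, 345),  # declining, not on track
]

status_pct = 100.0 * sum(1 for _, curr in students if curr >= 400) / len(students)
growth_pct = pct_proficient_or_on_track(students, cut_score=400)
```

Under these assumptions only students who are already close to the cut score, or who are gaining quickly, are added to the count, which is consistent with the slide's point that the growth option changes AYP outcomes for few schools.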

Primary Use and Interpretation of Test Results for NCLB

• Use: Identification of schools as making or failing to make AYP
– Schools that fail to make AYP two or more years in a row are placed in the “needs improvement” category

• Interpretation: Schools that make AYP are better or more effective than schools that fail to make AYP

Multi-level Interpretations

• Validity of interpretations of individual student scores not equivalent to validity of interpretations of aggregate results (Zumbo & Forer, in press)

• Need to think in terms of validation at aggregate level (e.g., school or school district) as well as individual student level

Validation of School Quality Inference

• Validating the claim that if school A makes AYP it is of higher quality or more effective than school B that fails to make AYP requires elimination of plausible rival hypotheses for the difference in AYP status
– AYP differences due to higher achievement at school A than at school B in earlier years, e.g., when children enter school
– AYP differences due to differences in demographics
– Differences due to differences in parental support

Inferences from Growth Models

• Growth models rule out the alternate explanation of differences in prior achievement

• Nonetheless, causal inferences about school effectiveness are not justified by the growth approach to test-based accountability (Raudenbush, 2004; Rubin, Stuart, & Zanutto, 2004)

Growth Model Results

• Many rival explanations for between-school differences in growth besides differences in school quality or effectiveness

• Results better thought of as descriptive for generating hypotheses about school quality that need to be evaluated

School Characteristics and Instructional Practice

• School differences in achievement and in growth describe outcomes and can be the source of hypotheses about school effectiveness

• Accountability systems need to be informed by direct information about school characteristics and instructional practices

NCLB Peer Review

• Peer Review Purposes
1. Inform states about what would be useful evidence
2. Guide review teams who advise the Department

Validity Evidence for Peer Review

• Related to test content

• Based on relationships to other variables

• Based on student response processes

• Based on internal structure

• Alignment of assessments to content standards

• Based on consequences of assessments

Consequences and Validity

• “Perhaps the most contentious topic in validity is the role of consequences” (Brennan, 2006, p. 8).

• Although investigations of the consequences of test uses are commonly referred to as “consequential validity”, Messick did not use that designation.

Messick’s Facets of Validity

                      Test Interpretation    Test Use
Evidential Basis      Construct Validity     Construct Validity + Relevance/Utility
Consequential Basis   Value Implications     Social Consequences

Controversy

• Many experts (e.g., Popham, Mehrens, Green, Ebel, and, most recently, Lissitz and Samuelsen) have argued that consequences should not be considered part of validity, while others (e.g., Lane, Linn, Moss, Shepard, Brennan, and Kane) have argued that they should be considered part of validity.

Controversy (continued)

• Fairly broad agreement that it is important to look at positive and negative effects of test use as part of an overall evaluation, even if such an evaluation is considered beyond the scope of validation, per se.

Peer Review Guidance on Consequences

“In validating an assessment, the State must also consider the consequences of its interpretation and use. Messick (1989) points out that these are different functions and that the impact of an assessment can be traced either to an interpretation or to how it is used. Furthermore, as in all evaluative endeavors, States must attend not only to the intended outcomes, but also to unintended effects” (U.S. Department of Education, 2004, p. 33).

Test Standards

• Narrow view of consequences and validity
– Consequences that are directly due to the way in which the construct is measured
– Degree to which intended benefits are realized
– Excludes “evidence that may inform decisions about social policy but falls outside the realm of validity”

Test Standards

• 1.24 “When unintended consequences result from test use an attempt should be made to investigate whether such consequences arise from the test’s sensitivity to characteristics other than those it is intended to assess or to the test’s failure fully to represent the intended construct” (1999, p. 23).

Michael Kane

• “Consequences have always been a part of our conception of validity… Traditional definitions of validity in terms of how well a testing program achieves its goals… necessarily raise questions about consequences, positive and negative” (Kane, 2006, p. 54).

Consequences of Uses of NCLB Assessments

• Controversy regarding consequences as a component of validity, but not about the importance of evaluating consequences

• Frameworks
– Bill Mehrens
– Suzanne Lane and her colleagues

Mehrens Framework

• Curricular and instructional reform

• Teacher motivation and stress

• Student motivation and self concept

• Changes in student achievement

• Public awareness of student achievement

Lane et al. Framework

• Identification of a set of propositions about consequences that are central to an interpretive argument
– (e.g., school administrators and teachers are motivated to adapt instruction and curriculum to the content standards)
– (e.g., students are motivated to learn as well as to perform their best on the assessment)

• Teacher and student questionnaires and interviews regarding motivation and instructional practices

• Collection of multiple indicators of student achievement

Frameworks of Lane and Mehrens

• Applicable to the status approach to AYP as well as to growth model approach to AYP, and/or other types of accountability uses of growth models, e.g., value-added models.

• With growth models the emphasis on student learning may be greater than in a status approach to accountability.

Curricular and instructional reform

• Questionnaire studies are most common
– Teachers
– Principals

• Interviews
– Teachers
– Principals

• Qualitative studies

• Collection of instructional artifacts

Teacher motivation and stress - Student motivation and self concept

• Questionnaire studies are most common
– Teachers
– Students

• Interviews
– Teachers
– Students

• Qualitative studies

Student achievement

• Center on Education Policy
– Tracked trends on state tests before and after enactment of NCLB
– Tracked size of achievement gaps
– Compared trends in achievement and gaps on state tests to NAEP
– Generally modest increases in achievement and modest reductions in size of gaps
– Doesn’t prove effect of NCLB tests but generally consistent with intention

Alternate Assessments

• Inclusion of students with severe cognitive disabilities in alternate assessments intended to improve learning for those students

• Inclusion judged to be having positive effects on students participating in alternate assessments

• Need more evidence of influence on instruction for included students and effects on their learning

End-of-Course Tests

• Use of questionnaires, interviews, and collection of instructional artifacts to document changes in
– Rigor of courses and instruction
– Uniformity of instruction across schools
– Student course-taking patterns
– Student dropout rates

Conclusion

Two major validity issues yet to be addressed by states regarding their NCLB testing programs

1. Validity of inferences about school quality based on test-based AYP determinations for schools

2. Consequences of state testing programs used for purposes of NCLB

Neither issue is easy to address, but both are important to the justification of state testing programs used for NCLB

“Validation is doing your damnedest with your mind – no holds barred. Eddington, as you know said that about science” (Cronbach, 1988, p. 14).