lesson 3: validity - cs.cas.cz · review - reliability validity test validation studies further...

27
Review - Reliability Validity Test validation studies Further issues Conclusion Lesson 3: Validity Patrícia Martinková Department of Statistical Modelling Institute of Computer Science, Czech Academy of Sciences Institute for Research and Development of Education Faculty of Education, Charles University, Prague NMST570, October 16, 2018 Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 1/24

Upload: nguyennguyet

Post on 10-Jan-2019

251 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Lesson 3: Validity

Patrícia Martinková

Department of Statistical ModellingInstitute of Computer Science, Czech Academy of Sciences

Institute for Research and Development of EducationFaculty of Education, Charles University, Prague

NMST570, October 16, 2018

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 1/24

Page 2: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Table of contents

1. Review - Reliability

2. Validity

3. Test validation studies

4. Further issues

5. Conclusion

Page 3: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Review – Reliability

Latent variableMeasurement errorReliabilityTest-retest reliabilityAlternate formsInternal consistency

Split-half (first-second half, even-odd, random, average)Cronbach’s alphaKuder-Richardson formula

Inter-rater reliability

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 2/24

Page 4: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Review – Reliability

The degree to which an assessment tool produces stable andconsistent results.Assuming X = T + e, true score and error uncorrelatedDefined as squared correlation of the true and observed scoreRel (X) = ρX = cor 2(T,X) = ρ2

T,X

Equivalently: the ratio of the true score variance to total observedvariance ρX = var (T )

var (X) =σ2T

σ2X

=σ2X−σ2

e

σ2X

= 1− σ2e

σ2X

Correlation between two independent, equaly precise measurements,measuring the same construct ρX = ρX1,X2

ρX ∈ 〈0, 1〉

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 3/24

Page 5: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Review – Reliability of composite measurements

Goal is to provide multiple converging pieces of informationE.g. educational tests, scales, questionnaires, . . .

What is the relationship between reliability of composite measurementX =

∑mj=1Xj and reliability of its components?

Spearman-Brown prophecy formula (1910)

Assume m parallel measurements X1, . . . , Xm (independent, equallyprecise, with uncorrelated errors and uncorrelated with true scores). Thenreliability of each Xi is the same ρ and the reliability of compositemeasurement X is

ρX =m · ρ

1 + (m− 1)ρ

Remark: Adding parallel items increases reliability of total score.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 4/24

Page 6: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Generalized prophecy formula

Spearman-Brown prophecy formula (generalized)

Assume test composed of m1 parallel measurements X =∑m1

j=1Xj andits prolonged or shortened version composed of m2 parallel measurementsX =

∑m2j=1Xj . Then the relationship between their reliabilities is

ρm2 =m2m1· ρm1

1 + (m2m1− 1)ρm1

Proof (hint): Notice that

ρ1 =1m1· ρm1

1 + ( 1m1− 1)ρm1

=1m2· ρm2

1 + ( 1m2− 1)ρm2

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 5/24

Page 7: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Review - Reliability and Validity

high reliability does not ensure high validityvalidity is bounded by reliability

Low reliability thus low validity High reliability but low validity High reliability and high validity

cor (X1, X2) =cov (X1, X2)√

var (X1)√

var (X2)=

cov (T1, T2) + 0 + 0 + 0√var (T1)var (X1)

var (T1)

√var (T2)var (X2)

var (T2)

= cor (T1, T2)√

Rel (X1)Rel (X2)

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 6/24

Page 8: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Test validity

The degree to which evidence and theory support the interpretationsof test scoresThe degree to which test measures what it is supposed to measure

Content-relatedFace validityConstruct validityContent validity

Criterion-relatedConcurrentPredictiveIncremental

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 7/24

Page 9: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Content-related validity

Construct validityExtent to which a test captures a specific theoretical constructSubsumes other types of validity

convergent validity: associated with things it should bediscriminant validity: not associated with things it should not be

Needs empirical and theoretical evidenceanalyses of the internal structure (correlations between item answers)

Content validityDoes the test cover the domain to be measured?Needs careful selection of which items to include

Face validityDoes the test “apear to” measure what it aims to?(to a member of target population)

Advantage: respondent can use context to help interpret the questionDisadvantage: respondent might try to "bend & shape"their answers

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 8/24

Page 10: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Analysis of internal structure

Correlation between answers to individual itemsFactor analysisCluster analysis

McFarland, Price, Wenderoth, Martinková, et al. Development and Validation of theHomeostasis Concept Inventory. CBE Life Sciences Education, 16(2), ar35, 2017.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 9/24

Page 11: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Criterion-related validity

Concurrent validityCorrelation with other measures of the same construct that aremeasured at the same timeE.g. admission test and IQ test

Predictive validityCorrelation with other measures of the same construct that aremeasured laterE.g. admission test and subsequent GPA or study success

Incremental validityIncrease of predictive validity, adds information beyond that providedby an existing methodsUsually assessed by multiple regressionE.g. admission test adds to prediction of subsequent GPA abovehigh-school GPA

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 10/24

Page 12: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Criterion-related validity

CorrelationRegression

Linear, logisticMultiple (accounting for more characteristics)Hierarchical (accounting for hierarchical structure -countries/schools/classes)

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 11/24

Page 13: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Example 1: Physiology concept inventories

http://www.physiologyconcepts.org/

Biology Education Research Group (BERG, University of Washington)

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 12/24

Page 14: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Cell-cell communication (CCC) conceptual framework

Study develops and validates hierarchical CCCconceptual frameworkValidation based on responses of undergraduatebiology facultySubsequently can be used for development andconstruct validation of related test on CCC

Michael J, Martinková P, McFarland JL, Wright A, Cliff W,Modell H, Wenderoth MP. Validating a conceptual framework forthe core concept of “cell-cell communications”. Advances inPhysiology Education, Vol. 41 no. 2, pp. 260-265, 2017.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 13/24

Page 15: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Homeostasis concept inventory (HCI)

HCI developed based on Homeostasis conceptual framework (HCF)Items test knowledge of individual elements in HCF

McFarland, Price, Wenderoth, Martinková, et al. Development and Validation of theHomeostasis Concept Inventory. CBE Life Sciences Education, 16(2), ar35, 2017.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 14/24

Page 16: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Homeostasis concept inventory (HCI)

McFarland, Price, Wenderoth, Martinková, et al. Development and Validation of theHomeostasis Concept Inventory. CBE Life Sciences Education, 16(2), ar35, 2017.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 15/24

Page 17: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Homeostasis concept inventory (HCI)

McFarland, Price, Wenderoth, Martinková, et al. Development and Validation of theHomeostasis Concept Inventory. CBE Life Sciences Education, 16(2), ar35, 2017.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 16/24

Page 18: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Example 2: Assessment set for Multiple Sclerosis

Set of 11 clinical tests with 2-20 items/componentsReliability

Internal consistency (Cronbach’s alpha)Test-retestValidity

ValidityStability without treatmentChanges after treatmentCorrelations with EDSSCorrelations between individual clinical tests

Řasová K, Martinková P, Vyskotová J, Šedová M. Assessment set for evaluation ofclinical outcomes in multiple sclerosis - psychometric properties. Patient RelatedOutcome Measures, 3, pp. 59-70, 2012.

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 17/24

Page 19: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Further issues in validity and reliability

Restriction in rangeCorrection of validity estimateCorrection of reliability estimate

Effect of unreliability on validity

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 18/24

Page 20: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Restriction of range

Common problem in validation studiesRestriction in range of the predictor variableExample:

Observing only admitted studentsObserving only students who did not pass the first exam

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 19/24

Page 21: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Correction for range restriction

Correction for range restriction (see e.g. Wiberg & Sundstrom, 2009)

rXY =sXrx,y√

s2Xr2x,y + s2x − s2xr2x,y

=0.99 · 0.53√

0.992 · 0.532 + 0.462 + 0.462 · 0.532= 0.80

rxy = 0.53 observed correlation btw X and Y in restricted samplesx = 0.46 estimated SD of X in restricted samplesX = 0.99 estimated SD of X in original sample

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 20/24

Page 22: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Reliability – Restriction to range

Similar issue for reliability estimateHaving restricted sample, estimate of reliability may be unproperCorrection to restriction of range (see e.g. Fife et al., 2012)

ρX = 1− σ2x

σ2X

(1− ρx)

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 21/24

Page 23: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Correction for unreliability

Example: Ratings of teacher applicants

(SB) prophecy formula to estimate reliability of average rating:

IRRY =σ2A + σ2

S + σ2AS

σ2A + σ2

S + σ2AS + σ2

B/J + σ2e/J

Standard error of measures: σ2B/J + σ2

e/J

Attenuation formula to estimate corrected correlation with VA:

cor (Y ,VA) = cor (T,VA)√IRRY

Martinková et al (2018). Disparities in ratings of internal and external applicants...https://doi.org/10.1371/journal.pone.0203002

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 22/24

Page 24: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Conclusion

In this presentation, we havePresented most important aspects/types of test validity

Content-relatedConstruct validityContent validtyFace validity

Criterion-relatedConcurrent validityPredictive validityIncremental validity

Presented examples of test validation studiesPresented further issues in validity estimation

Correction for range restrictionCorrection for unreliability

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 23/24

Page 25: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Thank you for your attention!www.cs.cas.cz/martinkova

Page 26: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

References

Kane, MT (2006). Validation. In Brennan RL (ed.), (2006). EducationalMeasurement (4th edn.). Westport, CT: Praeger. pp. 65–110.

Michael J, Martinková P, McFarland JL, Wright A, Cliff W, Modell H,Wenderoth MP. Validating a conceptual framework for the core concept of“cell-cell communications”. Advances in Physiology Education, Vol. 41 no.2, pp. 260-265, 2017.

McFarland, Price, Wenderoth, Martinková, et al. (2017). Development andValidation of the Homeostasis Concept Inventory. CBE Life SciencesEducation, 16(2), ar35. doi 10.1187/cbe.16-10-0305

Řasová K, Martinková P, Vyskotová J, Šedová M. Assessment set forevaluation of clinical outcomes in multiple sclerosis - psychometricproperties. Patient Related Outcome Measures, 3, pp. 59-70, 2012.

Martinková P, Goldhaber D, & Erosheva E. (2018). Disparities in ratings ofinternal and external applicants: A case for model-based inter-raterreliability. PLOS ONE, 13(10): e0203002.https://doi.org/10.1371/journal.pone.0203002

Page 27: Lesson 3: Validity - cs.cas.cz · Review - Reliability Validity Test validation studies Further issues Conclusion Review–Reliability Latentvariable Measurementerror Reliability

Review - Reliability Validity Test validation studies Further issues Conclusion

Vocabulary

ValidityContent-related

Construct validityContent validtyFace validity

Criterion-relatedConcurrent validityPredictive validityIncremental validity

Correction for range restrictionCorrection for unreliability

Patrícia Martinková NMST570, L3: Validity Oct 16, 2018 24/24