Performance Assessments with MFRM


Page 1: Performance Assessments with MFRM


Producing Unbiased Performance Assessment Scores Using the Many-Facet Rasch Model

Ross Brown, Ph.D.
Measurement Incorporated

Page 2: Performance Assessments with MFRM


Background

• For more than two decades, the MRA Division of MI has used the many-facet Rasch model (MFRM) for the analysis of client performance assessments.

• Using MFRM for performance assessments offers benefits relating to measurement, fairness, administration, resources, and security.

Page 3: Performance Assessments with MFRM


Other Session Objectives

• Understand the psychometric properties of a many-facet Rasch measurement approach to performance assessment scoring

• Understand how stakeholder concerns regarding an MFRM approach can be addressed

• Understand the basics of setting up a performance assessment for an MFRM analysis, as well as setting a passing standard and equating that standard

Page 4: Performance Assessments with MFRM


Why Performance Assessments?

• Performance assessments complement written examinations, allowing testing organizations to assess candidates on higher-level decision-making abilities.

• Our clients often use performance assessments to measure candidates’ abilities to apply skills such as diagnosis, treatment, and management of complications in a clinical context, replicating real-world patient situations.

Page 5: Performance Assessments with MFRM


Performance Assessment Format

• Examiners rate candidate performance on standardized protocols (i.e., hypothetical patient scenarios) or on candidates’ actual patients.

• Candidates describe how they would diagnose and treat.

• Some other permutations exist.

• The methods we use for organizing and analyzing such performance assessments can be used in different ways and in different fields.

Page 6: Performance Assessments with MFRM


Benefits of MFRM: Fairness

• Different examiners have different levels of severity when they assign ratings to candidates.

• If candidate outcomes were determined based on raw scores alone, the severity of individual examiners could differentially affect candidates.

• MFRM allows for the severity of individual examiners to be accounted for before candidate scores are calculated.

Page 7: Performance Assessments with MFRM


Benefits of MFRM: Security

• Using an MFRM approach, different exam content (i.e., patient scenarios) can be used for different candidates, reducing the likelihood that candidates can accurately disclose information about the exam content to other candidates.

Page 8: Performance Assessments with MFRM


Benefits of MFRM: Fairness

• However, different exam content logically would have different levels of difficulty.

• If candidate outcomes were determined using raw scores, this differential difficulty could unfairly penalize or benefit individual candidates.

• Calculating candidate scores with the MFRM, however, takes into account the differences in the particular exam content on which different candidates are tested.

Page 9: Performance Assessments with MFRM


Structuring a Performance Assessment for MFRM Analysis

• Examiners interview candidates regarding exam material such as standardized patient scenarios.

• Examiners lead the discussion, asking pointed questions about how candidates would do such things as diagnose patients’ illnesses, treat patients, and manage complications.

Page 10: Performance Assessments with MFRM


Structuring a Performance Assessment

• Examiners use a rating scale, typically a four-point scale, to assign ratings to candidates’ responses.

• Candidates rotate between examiners who assess them on different protocols.

• Ratings are assigned to specific skills, related to the exam materials, such as diagnosis, treatment, and management of complications.

• Therefore, you have examiners rating candidates’ performance on skills within protocols.
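
To make the structure concrete, here is a minimal sketch (hypothetical; field names and values are illustrative, not from the source) of the kind of record this design produces, one per rating:

```python
from dataclasses import dataclass

@dataclass
class Rating:
    """One observation: an examiner's rating of one candidate
    on one skill within one protocol, on a four-point scale."""
    candidate: str
    examiner: str
    protocol: str
    skill: str    # e.g., "diagnosis", "treatment", "complications"
    rating: int   # 1 = lowest category, 4 = highest

# A candidate rotating between examiners on different protocols:
ratings = [
    Rating("C001", "E01", "P03", "diagnosis", 3),
    Rating("C001", "E01", "P03", "treatment", 4),
    Rating("C001", "E02", "P07", "diagnosis", 2),
]
```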

Page 11: Performance Assessments with MFRM


Linking Facet Elements

Facets of the performance assessment:

• Candidates
• Examiners
• Protocols
• Skills

Page 12: Performance Assessments with MFRM


Linking Facet Elements

• To quantify and account for differences in the severity of individual examiners and the difficulty of individual protocols, the performance assessment must be carefully structured so that there is overlap of examiners’ ratings on candidates and protocols.

• This overlap links the different facet elements and allows for differences between individual elements to be quantified and accounted for.
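
A minimal sketch of how this linkage requirement can be checked (hypothetical code, not the author's procedure; operational programs such as FACETS report disconnected subsets more thoroughly): treat every facet element as a node, join the elements that co-occur in an observation, and require that everything end up in one connected set.

```python
def is_linked(observations):
    """True if every facet element (candidate, examiner, protocol, skill)
    is connected to every other through chains of shared observations."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for cand, exam, proto, skill in observations:
        # Prefix each ID with its facet so labels cannot collide.
        elems = [("cand", cand), ("exam", exam), ("proto", proto), ("skill", skill)]
        for e in elems[1:]:
            union(elems[0], e)

    return len({find(e) for e in parent}) <= 1

# Two examiners linked through a common candidate:
obs = [("C1", "E1", "P1", "dx"), ("C1", "E2", "P2", "dx")]
print(is_linked(obs))  # True: one connected set, so severities are comparable
```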

Page 13: Performance Assessments with MFRM


Benefits: Resources and Administration

• No adjustments are necessary if all candidates perform the same skills on all protocols and are evaluated by the same examiners.

• Reality: This is usually too expensive or logistically impossible.

• MFRM: Candidates interact with some examiners on selected protocols; each candidate takes a parallel examination form.

Page 14: Performance Assessments with MFRM


Benefits: Resources and Administration

• The differences and biases in each of these examination forms must be accounted for to make the candidate ability estimates reasonably consistent, objective, and reproducible.

• Organizing a PA this way also affords benefits in terms of the resources required to conduct the PA and in how the PA is administered.

Page 15: Performance Assessments with MFRM


Benefits: Resources and Administration

• Candidates move through several pairs of examiners who assess them on several protocols.

• A lot of performance information is collected efficiently as several candidates are assessed simultaneously.

Page 16: Performance Assessments with MFRM


Benefits: Resources and Administration

• Like the regular Rasch model with only two facets (persons and items), the MFRM produces candidate ability estimates of known precision (error) and reproducibility (reliability).

• Testing organizations can scale their performance assessments so that they achieve the measurement precision and reliability they desire with the resources (time for administration, number of examiners) that they have available.
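
Rasch-style reliability of separation can be computed from the candidate measures and their standard errors as the share of observed variance that is not measurement error. A minimal sketch under that standard definition (the numbers are hypothetical; operational software applies refinements):

```python
from statistics import mean, pvariance

def separation_reliability(measures, standard_errors):
    """Proportion of observed variance in the measures that is 'true'
    (non-error) variance -- the Rasch analogue of reliability."""
    observed_var = pvariance(measures)
    error_var = mean(se ** 2 for se in standard_errors)
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var if observed_var > 0 else 0.0

# Hypothetical candidate measures (logits) and standard errors:
print(round(separation_reliability([-1.2, -0.3, 0.4, 1.5], [0.35] * 4), 2))  # ~0.87
```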

Page 17: Performance Assessments with MFRM


Psychometric Model

$$\log\!\left(\frac{P_{nmijk}}{P_{nmij(k-1)}}\right) = B_n - S_m - C_i - D_j - F_k$$

where

• $P_{nmijk}$ = probability of person n being rated in category k by examiner m on skill j in protocol i,
• $P_{nmij(k-1)}$ = probability of person n being rated in category (k − 1) by examiner m on skill j in protocol i,
• $B_n$ = the ability of candidate n,
• $S_m$ = the severity of examiner m,
• $C_i$ = the difficulty of protocol i,
• $D_j$ = the difficulty of skill j, and
• $F_k$ = the difficulty of the step up from category (k − 1) to category k.
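
A minimal Python sketch of the model above (hypothetical code, not the author's implementation; operational estimation is done with dedicated MFRM software). It converts the facet parameters into the probability of each rating category for one candidate-examiner-protocol-skill encounter, taking F as the list of step difficulties $F_1..F_{K-1}$:

```python
import math

def category_probs(B, S, C, D, F):
    """Probability of each rating category 1..K under
    log(P_k / P_(k-1)) = B - S - C - D - F_k."""
    adj = B - S - C - D            # ability adjusted for severity and difficulty
    logits = [0.0]                 # lowest category as the reference
    for f in F:                    # accumulate the adjacent-category log-odds
        logits.append(logits[-1] + adj - f)
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values (logits): an able candidate, moderate examiner severity
probs = category_probs(B=1.0, S=0.2, C=0.0, D=0.3, F=[-1.0, 0.0, 1.0])
print([round(p, 2) for p in probs])  # ~[0.06, 0.26, 0.43, 0.26] for categories 1-4
```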

Page 18: Performance Assessments with MFRM


Psychometric Model

• Probability of a performance: A function of the difference between candidate ability and skill difficulty, after adjustment for the severity of the examiner and the difficulty of the protocol.

• If, after adjustment, the candidate's ability is higher than the skill difficulty, the probability of an acceptable performance is greater than 50%.

• If, after adjustment, the skill difficulty is greater than the candidate's ability, the probability of an acceptable performance is less than 50%.
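
As a worked example with hypothetical numbers, collapsing the scale to a single acceptable/unacceptable step:

$$\log\frac{P}{1-P} = B_n - S_m - C_i - D_j = 0.5 \quad\Rightarrow\quad P = \frac{e^{0.5}}{1 + e^{0.5}} \approx 0.62$$

A candidate whose adjusted ability exceeds the skill difficulty by half a logit has about a 62% chance of an acceptable performance; a difference of zero gives exactly 50%.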

Page 19: Performance Assessments with MFRM


Psychometric Model: Ordering Facet Elements

• Ordering of the candidates, examiners, protocols, and skills on a linear scale provides a frame of reference for understanding the relationship of the facets of the PA:

• Candidate ability ($B_n$) from highest to lowest
• Skill difficulty ($D_j$) from most to least difficult
• Examiner severity ($S_m$) from most to least severe
• Protocol difficulty ($C_i$) from most to least difficult

Page 20: Performance Assessments with MFRM


Psychometric Model: Sums of Ratings

• Ratings given by examiners are the basic units of analysis.

• Skill difficulty is calculated from all ratings given to all candidates by all examiners on the skill.

• Protocol difficulty includes all ratings given to all candidates by all examiners on the protocol.

• Examiner severity includes the ratings given by the examiner on all skills across all protocols to all candidates encountered.

Page 21: Performance Assessments with MFRM


Psychometric Model: Logits

• Estimates are based on the probability of performance, given the nature of the facets of the examination encountered by a candidate.

• Log odds units or logits are used to construct an equal interval scale.

• All facet element calibration estimates (candidate ability, examiner severity, skill and/or protocol difficulty) are reported in logits, with a mean of zero.

Page 22: Performance Assessments with MFRM


Psychometric Model: Measurement Statistics

• Error
• Reliability
• Fit

Page 23: Performance Assessments with MFRM


Psychometric Model: Fit

• Estimates of the consistency of ratings across examiners, skills, and protocols are reported as the fit of the data to the model. Fit statistics flag inconsistent rating patterns on any of the facets.

• The model expects observed ratings to be consistent:

• More able candidates should earn higher ratings more frequently than less able candidates, from all examiners, on skills within the protocols.

• More difficult skills and protocols should draw lower ratings more frequently than easier skills and protocols, from all examiners.

Page 24: Performance Assessments with MFRM


Psychometric Model: Fit

• The fit statistic is the ratio of observed ratings to expected (model-predicted) ratings.

• A value of 1 indicates perfect fit; the range of acceptable fit is generally 0.5 to 1.5, although more stringent criteria have been suggested for high-stakes examinations.
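
A minimal sketch of a mean-square fit calculation under the usual Rasch definition (the average squared standardized residual; the ratings, expectations, and variances below are hypothetical and would come from the fitted model):

```python
def mean_square_fit(observed, expected, variances):
    """Outfit-style mean-square: mean squared standardized residual.
    1.0 means ratings are as consistent as the model expects; values far
    above 1.0 flag unexpected ratings, far below 1.0 overly predictable ones."""
    z2 = [(o - e) ** 2 / v for o, e, v in zip(observed, expected, variances)]
    return sum(z2) / len(z2)

# Hypothetical ratings by one examiner vs. model expectations:
obs = [3, 4, 1, 4]
exp = [3.1, 3.2, 1.9, 3.0]
var = [0.8, 0.6, 0.7, 0.9]  # model variance of each rating
ms = mean_square_fit(obs, exp, var)
print(round(ms, 2), "ok" if 0.5 <= ms <= 1.5 else "flag")  # 0.84 ok
```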

Page 25: Performance Assessments with MFRM


Fit Statistic: Examiners

• The fit statistics for examiners indicate the degree to which each examiner is internally consistent across candidates, skills, and protocols (intra-examiner consistency).

• The fit statistic allows examiners who award unexpectedly high or low ratings to some candidates on some skills or protocols to be identified.

Page 26: Performance Assessments with MFRM


Fit Statistic: Candidates, Protocols and Skills

• The fit statistic for each candidate, protocol and skill indicates inter-examiner consistency.

• Misfit indicates that some examiners deviated significantly from others when grading the skill or protocol for some candidates.

• Testing organizations can monitor this information and, if necessary, conduct additional analyses to identify which rating situations are producing the largest unexpected ratings.

Page 27: Performance Assessments with MFRM


Guidelines for Implementing a MFRM PA

• Development of the rating scale is critical.

• The scale allows for a “disciplined dialogue” among examiners about candidate performance.

• Rating scale example: Unacceptable, Deficient, Acceptable, and Excellent.

• Defining these terms and providing specific examples of candidate performance for each scale point is critical.
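
For illustration, such a scale definition can be captured as labeled anchors, as in the sketch below (the deck's example labels; the anchor descriptions are hypothetical placeholders, not the source's):

```python
# Hypothetical behavioral anchors for the deck's example scale labels.
RATING_SCALE = {
    1: ("Unacceptable", "Response endangers the patient or omits essential steps."),
    2: ("Deficient", "Response is incomplete or partly incorrect."),
    3: ("Acceptable", "Response is safe and substantially correct."),
    4: ("Excellent", "Response is complete, correct, and well justified."),
}

for point, (label, anchor) in RATING_SCALE.items():
    print(f"{point} - {label}: {anchor}")
```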

Page 39: Performance Assessments with MFRM


Thank You

If you have any questions, contact [email protected].

Please complete the session evaluation that has been distributed to you.