
Page 1

Introduction to plausible values

National Research Coordinators Meeting, Madrid, February 2010

Page 2

Content of presentation

• Rationale for scaling
• Rasch model and possible ability estimates
• Shortcomings of point estimates
• Drawing plausible values
• Computation of measurement error

Page 3

Rationale for IRT scaling of data

• Summarising data instead of dealing with many single items

• Raw scores or percent correct are sample-dependent

• Makes equating possible and can deal with rotated test forms

Page 4

The ‘Rasch model’

• Models the probability of responding correctly to an item as

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

• Likewise, the probability of NOT responding correctly is modelled as

$$P(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$

where $\theta_n$ is the ability of person $n$ and $\delta_i$ is the difficulty of item $i$.
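A minimal Python sketch of these probabilities (not from the original slides; names and values are illustrative):

```python
import numpy as np

def rasch_prob(theta, delta):
    """Probability of a correct response under the Rasch model,
    for person ability theta and item difficulty delta (in logits)."""
    return np.exp(theta - delta) / (1.0 + np.exp(theta - delta))

# A person whose ability equals the item difficulty has a 50% chance
# of answering correctly; one extra logit of ability raises it to ~73%.
print(rasch_prob(0.0, 0.0))  # 0.5
print(rasch_prob(1.0, 0.0))  # ~0.731
```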

Page 5

IRT curves

[Figure: item characteristic curve; probability of a correct response (0 to 1) plotted against ability from −4 to +4 logits]

Page 6

How might we impute a reasonable proficiency value?

• Choose the proficiency that makes the score most likely
  – Maximum Likelihood Estimate (MLE; sketched below)
  – Weighted Likelihood Estimate (WLE)
• Choose the most likely proficiency for the score
  – empirical Bayes (EAP)
• Choose a selection of likely proficiencies for the score
  – Multiple imputations (plausible values)
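As an illustration of the first option, a minimal grid-search MLE under the Rasch model, assuming known item difficulties (all names and values are invented for the example):

```python
import numpy as np

def mle_theta(score, deltas, grid=np.linspace(-5, 5, 2001)):
    """Maximum likelihood ability estimate for a given raw score,
    given known Rasch item difficulties (deltas).
    Under the Rasch model the raw score is a sufficient statistic, so the
    MLE is the theta whose expected score matches the observed score."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - deltas[None, :])))
    expected_scores = p.sum(axis=1)  # expected raw score at each theta
    return grid[np.argmin(np.abs(expected_scores - score))]

deltas = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])  # five illustrative items
print(mle_theta(3, deltas))  # finite estimate for an interior score
# Scores of 0 or 5 have no finite MLE: the grid endpoint is returned,
# which is the 'arbitrary treatment of perfects and zeroes' noted later.
```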

Page 7

Maximum Likelihood vs. Raw Score

[Figure: maximum likelihood proficiency estimate plotted against raw score (0 to 5)]

Page 8

The Resulting Proficiency Distribution

[Figure: the resulting discrete proficiency distribution, one spike per raw score (Score 0 to Score 6), proficiency on the logit scale]

Page 9

Characteristics of Maximum Likelihood Estimates (MLE)

• Unbiased at the individual level with sufficient information, BUT biased towards the ends of the ability scale
• Arbitrary treatment of perfect and zero scores required
• Discrete scale & measurement error lead to bias in population parameter estimates

Page 10

Characteristics of Weighted Likelihood Estimates

• Less biased than MLE
• Provides estimates for perfect and zero scores
• BUT discrete scale & measurement error lead to bias in population parameter estimates

Page 11

Plausible Values

• What are plausible values?

• Why do we use them?

• How to analyse plausible values?

Page 12

Purpose of educational tests

• Measure particular students (minimise measurement error of individual estimates)

• Assess populations (minimise error when generalising to the population)

Page 13

Posterior distributions for test scores on 6 dichotomous items

Page 14

Empirical Bayes – Expected A Posteriori estimates (EAP)
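A minimal sketch of an EAP computed on a grid, assuming a normal population prior (names and values are illustrative):

```python
import numpy as np

def eap_theta(responses, deltas, mu=0.0, sigma=1.0,
              grid=np.linspace(-5, 5, 401)):
    """Expected a posteriori (EAP) ability estimate: the mean of the
    posterior over a theta grid, with a N(mu, sigma^2) population prior."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - deltas[None, :])))
    likelihood = np.prod(np.where(responses, p, 1.0 - p), axis=1)
    prior = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    posterior = likelihood * prior
    posterior /= posterior.sum()           # normalise on the grid
    return np.sum(grid * posterior)        # posterior mean

deltas = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
print(eap_theta(np.array([1, 1, 1, 0, 0]), deltas))  # pulled towards the prior mean
```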

Page 15

Characteristics of EAPs

• Biased at the individual level, but population means are unbiased (NOT variances)

• Discrete scale, bias & measurement error lead to bias in population parameter estimates

• Requires assumptions about the distribution of proficiency in the population

Page 16

Plausible Values

[Figure: distributions of plausible values for each raw score (Score 0 to Score 6), proficiency on the logit scale]

Page 17

Characteristics of Plausible Values

• Not fair at the student level
• Produces unbiased population parameter estimates
  – if assumptions of scaling are reasonable
• Requires assumptions about the distribution of proficiency

Page 18

Estimating percentages below benchmark with Plausible Values

[Figure: proficiency distribution with the Level One cut-point marked]

The proportion of plausible values less than the cut-point is a superior estimator to one based on EAP, MLE or WLE values.
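A minimal sketch of that estimator, assuming a data set with M plausible values and a final student weight per record (hypothetical names):

```python
import numpy as np

def pct_below(pvs, cutpoint, weights):
    """Percent of the population below a benchmark, estimated as the
    average over plausible values of the weighted proportion below it.
    pvs: array of shape (n_students, M) with M plausible values each."""
    below = (pvs < cutpoint).astype(float)                     # (n, M) indicator
    per_pv = (below * weights[:, None]).sum(axis=0) / weights.sum()
    return 100 * per_pv.mean()                                 # average over the M PVs
```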

Page 19

Methodology of PVs

• Mathematically computing posterior distributions around test scores

• Drawing 5 random values for each assessed individual from the posterior distribution for that individual
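A simplified sketch of the drawing step. It assumes each individual's posterior has already been summarised by a normal mean and standard deviation; operational scaling draws from the actual posterior within the IRT model:

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_pvs(post_means, post_sds, m=5):
    """Draw m plausible values per student from each student's posterior,
    here approximated as a normal with the given mean and SD.
    Returns an array of shape (n_students, m)."""
    n = len(post_means)
    return rng.normal(post_means[:, None], post_sds[:, None], size=(n, m))

# e.g. three students with different posterior means and spreads:
pvs = draw_pvs(np.array([-0.4, 0.1, 1.2]), np.array([0.6, 0.5, 0.7]))
```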

Page 20

What is conditioning?

• Assuming a normal posterior distribution: $\theta \sim N(\mu, \sigma^2)$
• Model sub-populations: $\theta \sim N(\mu + \beta X, \sigma^2)$, with
  X = 0 for a boy
  X = 1 for a girl
• With further background variables: $\theta \sim N(\mu + \beta X + \gamma Y + \ldots + \omega Z, \sigma^2)$

[Figure: normal posterior densities for the two sub-populations]
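A simplified illustration of what conditioning does, assuming normal approximations throughout (operational scaling software does this inside the IRT model; every name below is invented):

```python
import numpy as np

def conditional_posterior(x, beta, sigma2, theta_hat, se2):
    """Illustrative conditioning step: combine a student's regression
    prediction x'beta (prior mean, variance sigma2) with a normal summary
    of the test evidence (point estimate theta_hat, error variance se2).
    Returns the posterior mean and variance (precision-weighted average)."""
    prior_mean = x @ beta
    post_var = 1.0 / (1.0 / sigma2 + 1.0 / se2)
    post_mean = post_var * (prior_mean / sigma2 + theta_hat / se2)
    return post_mean, post_var

# e.g. conditioning on an intercept and a gender dummy (X = 0 boy, 1 girl):
mean, var = conditional_posterior(np.array([1.0, 1.0]),   # girl
                                  np.array([0.0, 0.2]),   # mu, gender effect
                                  sigma2=0.8, theta_hat=0.5, se2=0.3)
```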

Page 21

Conditioning Variables

• Plausible values should only be analysed with data that were included in the conditioning (otherwise, results may be biased)
• Aim: maximise the information included in the conditioning, that is, use as many variables as possible
• To reduce the number of conditioning variables, factor scores from a principal component analysis were used in ICCS
• Use of classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data is needed)

Page 22

Plausible values

• A model with conditioning variables will improve the precision of the prediction of ability (population estimates ONLY)
• Conditioning provides unbiased estimates for modelled parameters
• Simulation studies comparing PVs, EAPs and WLEs show that:
  – population means give similar results
  – WLEs (or MLEs) tend to overestimate variances
  – EAPs tend to underestimate variances

Page 23

Calculation of measurement error

• As in the TIMSS or PIRLS data files, there are five plausible values for the cognitive test scales in ICCS

• Using five plausible values enables researchers to obtain estimates of the measurement error

Page 24

How to analyse PVs - 1

• Estimated mean is the AVERAGE of the mean for each PV

• Sampling variance is the AVERAGE of the sampling variance for each PV

$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M}\hat{\mu}_i$$

$$\hat{\sigma}^2_{(\hat{\mu})} = \frac{1}{M}\sum_{i=1}^{M}\hat{\sigma}^2_{(\hat{\mu}_i)}$$

where $\hat{\mu}_i$ is the statistic computed with the $i$-th plausible value, $\hat{\sigma}^2_{(\hat{\mu}_i)}$ its sampling variance, and $M$ the number of plausible values.

Page 25

How to analyse PVs - 2

• Measurement variance computed as:

$$\hat{\sigma}^2_{PV} = \frac{1}{M-1}\sum_{i=1}^{M}\left(\hat{\mu}_i - \hat{\mu}\right)^2$$

• Total standard error computed from measurement and sampling variance as:

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2_{(\hat{\mu})} + \left(1 + \frac{1}{M}\right)\hat{\sigma}^2_{PV}}$$
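A minimal Python sketch of the combination rules on pages 24 and 25 (the numbers are invented, M = 5):

```python
import numpy as np

def combine_pv_estimates(stats, sampling_vars):
    """Combine per-PV statistics following the slides' formulas.
    stats: the statistic computed on each of the M plausible values.
    sampling_vars: the sampling variance of the statistic for each PV."""
    m = len(stats)
    mu_hat = np.mean(stats)                           # average over PVs
    samp_var = np.mean(sampling_vars)                 # average sampling variance
    pv_var = np.sum((stats - mu_hat) ** 2) / (m - 1)  # measurement variance
    total_se = np.sqrt(samp_var + (1 + 1 / m) * pv_var)
    return mu_hat, total_se

# e.g. five PV means and their sampling variances:
print(combine_pv_estimates(np.array([502.1, 503.4, 501.8, 502.9, 503.0]),
                           np.array([2.4, 2.6, 2.3, 2.5, 2.4])))
```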

Page 26

How to analyse PVs - 3

$\hat{\mu}$ can be replaced by any statistic, for instance:
– SD
– Percentile
– Correlation coefficient
– Regression coefficient
– R-square
– etc.

Page 27

Steps for estimating both sampling and measurement error

• Compute the statistic for each PV for the fully weighted sample

• Compute the statistic for each PV for the 75 replicate samples

• Compute the sampling error (based on the previous steps)

• Compute the measurement error

• Combine the error variances to calculate the total standard error (sketched below)
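A sketch of the whole sequence under a jackknife-style replication design. The replicate-variance line is one common JRR form; the exact scale factor depends on the study's design, so treat it as illustrative:

```python
import numpy as np

def pv_standard_error(pv_stats_full, pv_stats_reps, m=5):
    """Follow the steps on this slide.
    pv_stats_full: statistic per PV on the fully weighted sample, shape (m,).
    pv_stats_reps: statistic per PV per replicate sample, shape (m, n_reps)."""
    # sampling variance for each PV from its replicate estimates
    samp_var_per_pv = ((pv_stats_reps - pv_stats_full[:, None]) ** 2).sum(axis=1)
    samp_var = samp_var_per_pv.mean()                         # average over PVs
    mu_hat = pv_stats_full.mean()
    pv_var = ((pv_stats_full - mu_hat) ** 2).sum() / (m - 1)  # measurement variance
    return np.sqrt(samp_var + (1 + 1 / m) * pv_var)           # total standard error
```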

Page 28

Questions or comments?