Generalizability Theory

Nothing more practical than a good theory!

This presentation was prepared by Prof. Zhao.

Overview of Presentation

Classes of reliability theories

Generalizability Theory: G-study and D-study

Illustrations

Three Reliability Theories

Classical Test Theory, Generalizability Theory, Item Response Theory




Generalizability Theory

As in classical test theory, the fundamental concept is that of parallel measures, but generalizability theory allows for multiple sources of error.

Generalizability concept: reliability depends on the inferences (generalizations) that the investigator wishes to make from the measurement data.

Illustration

Essay test: 7 vignette-based essay questions; 2 markers independently marked all questions for all examinees.

Reliability in a classical framework:

Cronbach’s alpha: 0.66; inter-rater reliability (kappa): 0.71
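The classical coefficients above are computed from raw scores. As a minimal sketch, assuming a complete examinee × question score matrix with no missing values, Cronbach's alpha is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s^2_i}{s^2_X}\right)$, where k is the number of questions, $s^2_i$ the variance of question i, and $s^2_X$ the variance of the total scores. The function name `cronbach_alpha` and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a complete examinees x items score matrix.

    scores: list of rows, one row per examinee, one column per item.
    Sample variances (n - 1 denominator) are used for items and totals alike.
    """
    k = len(scores[0])  # number of items (e.g., essay questions)
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for 4 examinees on 3 questions
print(round(cronbach_alpha([[3, 4, 5], [2, 2, 3], [4, 5, 5], [1, 2, 2]]), 3))  # 0.976
```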

Fundamental Equation

X = T + E, where X = observed score, T = true score, E = error score.

Reliability = $\dfrac{\sigma^2_T}{\sigma^2_X}$

The larger the variance of T relative to the variance of X, the higher the reliability.

Because T and E are assumed to be uncorrelated, $\sigma^2_X = \sigma^2_T + \sigma^2_E$, so:

Reliability = $\dfrac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$
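The fundamental equation can be checked by simulation. The sketch below assumes normally distributed, mutually independent true and error scores (an illustrative choice; the theory itself does not require normality) and compares the theoretical ratio $\sigma^2_T / (\sigma^2_T + \sigma^2_E)$ with the ratio observed in the simulated data. `simulate_reliability` is a hypothetical helper.

```python
import random
from statistics import pvariance


def simulate_reliability(var_t, var_e, n=20_000, seed=1):
    """Simulate X = T + E with independent normal T and E.

    Returns (theoretical, empirical): var_t / (var_t + var_e), and
    var(T) / var(X) computed from the simulated scores.
    """
    rng = random.Random(seed)
    t = [rng.gauss(0.0, var_t ** 0.5) for _ in range(n)]  # true scores
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]  # errors, independent of T
    x = [ti + ei for ti, ei in zip(t, e)]                 # observed scores
    theoretical = var_t / (var_t + var_e)
    empirical = pvariance(t) / pvariance(x)
    return theoretical, empirical


# First value is exactly 0.8; the second should lie close to it
print(simulate_reliability(var_t=4.0, var_e=1.0))
```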

Multiple sources of error variance: markers, essays, and unexplained (residual) variance.

Two steps in G analysis

1) G(eneralizability)-study: estimation of the sources of variance that influence the measurement (e.g., variance between examinees, essays, and markers)

2) D(ecision)-study: estimation of reliability indices as a function of concrete sample sizes (e.g., number of essays, number of markers)

G-study steps

Determine facets (factors of variance)

Determine design: random vs. fixed; crossed vs. nested

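Crossed vs nested designs

In a crossed design every object of measurement (e.g., every examinee) is observed under every level of the facet (e.g., the same markers); in a nested design each object is observed under its own, different levels of the facet.

[Figure: crossed design (objects 1-6, facet levels A and B) vs. nested design (objects 1-6, facet levels A-L)]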

G-study

Determine facets (factors of variance)

Determine design: random vs. fixed; crossed vs. nested

Collect data

Analysis of variance (ANOVA)

Estimation of variance components

Illustration 1

Essay test: 7 vignette-based open-ended questions; 100 students; one marker marked all essays for all students.

G-study questions: How many facets? Random or fixed facets? Nested or crossed?

Answer: one-facet design; random; crossed.

Sources of variance, Persons × Items (p × i) design: p (persons), i (items), and pi,e (person × item interaction, confounded with unexplained error)

Variance component estimation (one facet design)

An observed score for person p on item i ($X_{pi}$) can be decomposed as:

$X_{pi} = \mu$ [overall mean]
$\quad + (\mu_p - \mu)$ [person effect]
$\quad + (\mu_i - \mu)$ [item effect]
$\quad + (X_{pi} - \mu_p - \mu_i + \mu)$ [residual]

Each of these effects has a mean of zero and a variance ($\sigma^2$); these variances are the variance components.

The variance of all observed scores $X_{pi}$ across all persons and items is the sum of the variance components:

$\sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e}$
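The ANOVA step of the G-study can be sketched for this one-facet crossed design. Assuming a balanced, complete p × i score matrix, the observed mean squares are equated to their expected values under a random-effects model ($E[MS_{pi,e}] = \sigma^2_{pi,e}$, $E[MS_p] = \sigma^2_{pi,e} + n_i\sigma^2_p$, $E[MS_i] = \sigma^2_{pi,e} + n_p\sigma^2_i$) and solved for the components. This is a minimal illustration with hypothetical function names, not the code of GENOVA or SPSS; negative estimates, which can occur with small samples, are returned as-is rather than set to zero.

```python
def variance_components_pxi(scores):
    """Estimate sigma2(p), sigma2(i), sigma2(pi,e) for a crossed p x i design.

    scores: list of rows, one row per person, one column per item (balanced).
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    # Sums of squares of a two-way ANOVA without replication
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_res = sum(
        (scores[p][i] - person_means[p] - item_means[i] + grand) ** 2
        for p in range(n_p)
        for i in range(n_i)
    )
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


def percent_of_total(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {source: 100 * v / total for source, v in components.items()}


# Percentages for the components reported in the P x I variance components table
pct = percent_of_total({"p": 97.57, "i": 261.24, "pi,e": 371.97})
print({k: round(v, 2) for k, v in pct.items()})  # {'p': 13.35, 'i': 35.75, 'pi,e': 50.9}
```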

Variance components, P × I design

Source   Estimated variance component   Standard error   % of total variance
p         97.57                          19.02            13.35
i        261.24                         112.98            35.75
pi,e     371.97                          17.60            50.90


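Sources of variance, Items nested within Persons (i:p) design: p (persons) and i,pi,e (item effect, person × item interaction, and unexplained error, which cannot be separated when each person receives different items)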

Variance components, I:P design (the crossed-design components i and pi,e pooled: 261.24 + 371.97)

Source    Estimated variance component   % of total variance
p          97.57                          13.35
i,pi,e    633.21                          86.65
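The I:P components need not be re-estimated: they can be obtained from the crossed G-study by pooling the components that become confounded when each person answers different items. A minimal sketch of that bookkeeping, assuming components stored in a dict keyed by the source labels used in these tables (the helper name is hypothetical):

```python
def pool_to_nested(crossed):
    """Derive I:P variance components from crossed P x I components.

    With items nested within persons, the item effect cannot be separated
    from the person x item interaction and residual, so they are summed.
    """
    return {"p": crossed["p"], "i,pi,e": crossed["i"] + crossed["pi,e"]}


nested = pool_to_nested({"p": 97.57, "i": 261.24, "pi,e": 371.97})
total = sum(nested.values())
for source, v in nested.items():
    print(f"{source:7} {v:7.2f} {100 * v / total:6.2f}%")
```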

Sources of variance, Persons × Items × Judges (p × i × j) design: p, i, j, pi, pj, ij, and pij,e

Variance component estimation (two facet design)

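An observed score for person p on item i from judge j ($X_{pij}$) can be decomposed as:

$X_{pij} = \mu$ [overall mean]
$\quad + (\mu_p - \mu)$ [person effect]
$\quad + (\mu_i - \mu)$ [item effect]
$\quad + (\mu_j - \mu)$ [judge effect]
$\quad + (\mu_{pi} - \mu_p - \mu_i + \mu)$ [person × item effect]
$\quad + (\mu_{pj} - \mu_p - \mu_j + \mu)$ [person × judge effect]
$\quad + (\mu_{ij} - \mu_i - \mu_j + \mu)$ [item × judge effect]
$\quad + (X_{pij} - \mu_{pi} - \mu_{pj} - \mu_{ij} + \mu_p + \mu_i + \mu_j - \mu)$ [residual]

The variance of the observed scores $X_{pij}$ across all persons, items, and judges:

$\sigma^2(X_{pij}) = \sigma^2_p + \sigma^2_i + \sigma^2_j + \sigma^2_{pi} + \sigma^2_{pj} + \sigma^2_{ij} + \sigma^2_{pij,e}$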
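For a balanced, fully crossed p × i × j design the same ANOVA logic extends to seven mean squares. The sketch below (plain Python, hypothetical function name) computes them from a persons × items × judges array and solves the expected-mean-square equations of the random-effects model, for example $E[MS_p] = \sigma^2_{pij,e} + n_j\sigma^2_{pi} + n_i\sigma^2_{pj} + n_i n_j\sigma^2_p$. It assumes no missing scores and leaves negative estimates unchanged.

```python
from itertools import product


def variance_components_pij(x):
    """Variance components for a balanced, fully crossed p x i x j design.

    x[p][i][j]: score of person p on item i from judge j (no missing data).
    Returns a dict keyed p, i, j, pi, pj, ij, pij,e.
    """
    n_p, n_i, n_j = len(x), len(x[0]), len(x[0][0])
    P, I, J = range(n_p), range(n_i), range(n_j)
    grand = sum(x[p][i][j] for p, i, j in product(P, I, J)) / (n_p * n_i * n_j)

    # One- and two-way marginal means
    m_p = [sum(x[p][i][j] for i, j in product(I, J)) / (n_i * n_j) for p in P]
    m_i = [sum(x[p][i][j] for p, j in product(P, J)) / (n_p * n_j) for i in I]
    m_j = [sum(x[p][i][j] for p, i in product(P, I)) / (n_p * n_i) for j in J]
    m_pi = [[sum(x[p][i]) / n_j for i in I] for p in P]
    m_pj = [[sum(x[p][i][j] for i in I) / n_i for j in J] for p in P]
    m_ij = [[sum(x[p][i][j] for p in P) / n_p for j in J] for i in I]

    # Sums of squares and degrees of freedom for each source
    ss = {
        "p": n_i * n_j * sum((m_p[p] - grand) ** 2 for p in P),
        "i": n_p * n_j * sum((m_i[i] - grand) ** 2 for i in I),
        "j": n_p * n_i * sum((m_j[j] - grand) ** 2 for j in J),
        "pi": n_j * sum((m_pi[p][i] - m_p[p] - m_i[i] + grand) ** 2
                        for p, i in product(P, I)),
        "pj": n_i * sum((m_pj[p][j] - m_p[p] - m_j[j] + grand) ** 2
                        for p, j in product(P, J)),
        "ij": n_p * sum((m_ij[i][j] - m_i[i] - m_j[j] + grand) ** 2
                        for i, j in product(I, J)),
        "pij,e": sum((x[p][i][j] - m_pi[p][i] - m_pj[p][j] - m_ij[i][j]
                      + m_p[p] + m_i[i] + m_j[j] - grand) ** 2
                     for p, i, j in product(P, I, J)),
    }
    df = {
        "p": n_p - 1, "i": n_i - 1, "j": n_j - 1,
        "pi": (n_p - 1) * (n_i - 1), "pj": (n_p - 1) * (n_j - 1),
        "ij": (n_i - 1) * (n_j - 1),
        "pij,e": (n_p - 1) * (n_i - 1) * (n_j - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations of the random-effects model
    e = ms["pij,e"]
    return {
        "p": (ms["p"] - ms["pi"] - ms["pj"] + e) / (n_i * n_j),
        "i": (ms["i"] - ms["pi"] - ms["ij"] + e) / (n_p * n_j),
        "j": (ms["j"] - ms["pj"] - ms["ij"] + e) / (n_p * n_i),
        "pi": (ms["pi"] - e) / n_j,
        "pj": (ms["pj"] - e) / n_i,
        "ij": (ms["ij"] - e) / n_p,
        "pij,e": e,
    }


# Purely additive hypothetical data: no interactions, so those components are ~0
a, b, c = [0, 2], [1, 2, 3], [0, 4]
x = [[[a[p] + b[i] + c[j] for j in range(2)] for i in range(3)] for p in range(2)]
print({k: round(v, 6) for k, v in variance_components_pij(x).items()})
```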

Variance components, P × I × J design

Source   Estimated variance component   % of total variance
p         48.71                          10.57
i         25.12                           5.45
j         15.00                           3.26
pi       185.87                          40.33
pj        33.18                           7.20
ij        80.00                          17.36
pij,e     72.94                          15.83




Interpretation of scores

Norm-oriented perspective: scores have relative meaning, that is, meaning in relation to each other.

Domain-oriented perspective: scores have absolute meaning with respect to the domain of measurement.

Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores).







Illustration 1

Essay test: 7 vignette-based essay questions; 1 marker marked all questions for all examinees. Norm-referenced perspective.

Calculate the generalizability coefficient!

D-study ($n_i = 7$; norm-referenced), using the P × I variance components ($\sigma^2_p = 97.57$, $\sigma^2_i = 261.24$, $\sigma^2_{pi,e} = 371.97$):

$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 371.97/7} = 0.65$

Illustration 2


Same essay test (7 essays, 1 marker marking all questions for all examinees). Domain-referenced perspective.

Calculate the dependability coefficient!

D-study ($n_i = 7$; domain-referenced), using the same P × I variance components:

$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_i/n_i + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 261.24/7 + 371.97/7} = 0.52$

Illustration 3


Same essay test (1 marker marking all questions for all examinees). Domain-referenced perspective.

Calculate the dependability coefficient for a sample of 10 essays!

D-study ($n_i = 10$; domain-referenced), using the same P × I variance components:

$D = \dfrac{97.57}{97.57 + 261.24/10 + 371.97/10} = 0.61$

D-studies for several item samples (one marker)

N essays   Generalizability coefficient (G)   Dependability coefficient (D)
 1         0.21                               0.13
 5         0.57                               0.44
 7         0.65                               0.52
10         0.72                               0.61
15         0.80                               0.70
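The rows above follow from two error variances: relative error $\sigma^2_{pi,e}/n_i$ for G, and absolute error $(\sigma^2_i + \sigma^2_{pi,e})/n_i$ for D. A small sketch that regenerates the table from the G-study components (the function name is hypothetical):

```python
def one_facet_d_study(var_p, var_i, var_pie, n_i):
    """G (relative) and D (absolute) coefficients for a p x I D-study."""
    rel_error = var_pie / n_i            # affects the rank ordering of persons
    abs_error = (var_i + var_pie) / n_i  # also includes item difficulty
    g = var_p / (var_p + rel_error)
    d = var_p / (var_p + abs_error)
    return g, d


# Variance components from the P x I G-study
for n in (1, 5, 7, 10, 15):
    g, d = one_facet_d_study(97.57, 261.24, 371.97, n)
    print(f"{n:>2} essays: G = {g:.2f}, D = {d:.2f}")
```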

Illustration 4

Essay test: 7 vignette-based essay questions; 2 markers independently marked all questions for all examinees. Norm-referenced perspective.


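D-study ($n_i = 7$; $n_j = 2$; norm-referenced), using the P × I × J variance components ($\sigma^2_p = 48.71$, $\sigma^2_i = 25.12$, $\sigma^2_j = 15.00$, $\sigma^2_{pi} = 185.87$, $\sigma^2_{pj} = 33.18$, $\sigma^2_{ij} = 80.00$, $\sigma^2_{pij,e} = 72.94$):

$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi}/n_i + \sigma^2_{pj}/n_j + \sigma^2_{pij,e}/(n_i n_j)} = \dfrac{48.71}{48.71 + 185.87/7 + 33.18/2 + 72.94/(2 \times 7)} = 0.50$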
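The two-facet coefficients can be computed the same way once the relative and absolute error variances are written out. A minimal sketch (hypothetical function name) that reproduces G for this illustration and D for the domain-referenced version of the same design:

```python
def crossed_pij_d_study(v, n_i, n_j):
    """G (relative) and D (absolute) for a p x I x J D-study.

    v: G-study variance components keyed p, i, j, pi, pj, ij, pij,e.
    """
    rel_error = v["pi"] / n_i + v["pj"] / n_j + v["pij,e"] / (n_i * n_j)
    abs_error = rel_error + v["i"] / n_i + v["j"] / n_j + v["ij"] / (n_i * n_j)
    return v["p"] / (v["p"] + rel_error), v["p"] / (v["p"] + abs_error)


components = {"p": 48.71, "i": 25.12, "j": 15.00, "pi": 185.87,
              "pj": 33.18, "ij": 80.00, "pij,e": 72.94}
g, d = crossed_pij_d_study(components, n_i=7, n_j=2)
print(f"G = {g:.2f}, D = {d:.2f}")  # G = 0.50, D = 0.43
```

Varying `n_i` and `n_j` in the same call gives the "same marker for all essays" columns of a D-study summary.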

Illustration 5

Essay test: 7 vignette-based essay questions; 2 markers independently marked all questions for all examinees. Domain-referenced perspective.

Calculate the dependability coefficient!

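D-study ($n_i = 7$; $n_j = 2$; domain-referenced), using the same P × I × J variance components:

$D = \dfrac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_i}{n_i} + \dfrac{\sigma^2_j}{n_j} + \dfrac{\sigma^2_{pi}}{n_i} + \dfrac{\sigma^2_{pj}}{n_j} + \dfrac{\sigma^2_{ij}}{n_i n_j} + \dfrac{\sigma^2_{pij,e}}{n_i n_j}} = \dfrac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$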

Illustration 6

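Essay test: 7 vignette-based essay questions; each question was independently marked by its own pair of markers (different markers for each question), for all examinees.

Norm-referenced perspective

D-study ($n_i = 7$; $n_j = 2$; norm-referenced), (Judges:Items) × Persons design

Source                 Estimated variance component   % of total variance
p                       48.71                          10.57
i                       25.12                           5.45
j:i (= j + ij)          95.00                          20.62
pi                     185.87                          40.33
pj:i (= pj + pij,e)    106.12                          23.03

$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi}/n_i + \sigma^2_{pj:i}/(n_i n_j)} = \dfrac{48.71}{48.71 + 185.87/7 + 106.12/(2 \times 7)} \approx 0.59$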
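The relative-error formula for this design can be evaluated directly. A minimal sketch with a hypothetical function name; it assumes, as in the formula for this illustration, that $\sigma^2_{j:i}$ does not enter relative error, because every examinee meets the same markers on a given question:

```python
def p_by_j_within_i_g(v, n_i, n_j):
    """Relative G for a p x (J:I) D-study (different judges for each item).

    v: components keyed p, pi, pj:i, where pj:i pools pj and pij,e of the
    crossed G-study.
    """
    rel_error = v["pi"] / n_i + v["pj:i"] / (n_i * n_j)
    return v["p"] / (v["p"] + rel_error)


g = p_by_j_within_i_g({"p": 48.71, "pi": 185.87, "pj:i": 33.18 + 72.94}, n_i=7, n_j=2)
print(f"G = {g:.3f}")  # G = 0.588
```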

D-study summary table: G coefficients, norm-referenced score interpretation

Number of essays   Same marker for all essays    Different marker for each essay
                   One marker   Two markers      One marker   Two markers
 5                 0.36         0.44             0.39         0.46
 7                 0.41         0.50             0.47         0.54
10                 0.45         0.56             0.56         0.63
15                 0.49         0.61             0.65         0.72

Another reliability index

Reliability coefficient (G and D coefficients): scale-independent (0 to 1), but with a non-intuitive interpretation.

Standard error of measurement (SEM): intuitive interpretation, but scale-dependent.

Standard Error of Measurement



Reliability index = $\dfrac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$, where E = error score

Standard error of measurement: $SEM = \sqrt{\sigma^2_E} = \sigma_E$

Interpretation of SEM

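Suppose an examinee has a score of 60% and the SEM is 5:

±1 SEM: 60 ± 5, i.e., 55 to 65 (approximately a 68% CI)

±1.96 SEM: 1.96 × 5 ≈ 10, i.e., 50 to 70 (95% CI)

With a multiplier of 2.14: 2.14 × 5 ≈ 11, i.e., 49 to 71 (95% CI)

[Figure: each interval drawn on a score axis from 45 to 75]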
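The intervals above are score ± multiplier × SEM. As a sketch, assuming normally distributed measurement error, the multiplier can be taken from the standard normal quantile for the desired coverage; this does not reproduce the 2.14 multiplier, whose source the slide does not state. The function name is hypothetical.

```python
from statistics import NormalDist


def score_interval(score, sem, coverage=0.95):
    """Normal-theory interval: score +/- z * SEM, with z chosen for the coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return score - z * sem, score + z * sem


lo, hi = score_interval(60, 5)          # z is about 1.96
print(f"95% CI: {lo:.1f} to {hi:.1f}")  # 95% CI: 50.2 to 69.8
```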

D-study ($n_i = 7$; norm-referenced), P × I design ($\sigma^2_p = 97.57$, $\sigma^2_{pi,e} = 371.97$):

$G = \dfrac{97.57}{97.57 + 371.97/7} = 0.65$

$SEM = \sqrt{\sigma^2_{pi,e}/n_i} = \sqrt{371.97/7} = 7.29$

D-study ($n_i = 7$; $n_j = 2$; domain-referenced), P × I × J design:

$D = \dfrac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$

$SEM = \sqrt{25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = \sqrt{65.16} \approx 8.07$
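Both SEMs are simply square roots of the error variances used in the coefficients: relative error for the norm-referenced G, absolute error for the domain-referenced D. A short sketch that computes them directly from the variance components (variable names are illustrative):

```python
from math import sqrt

# Relative SEM, P x I design with n_i = 7: sqrt of the relative error variance
sem_relative = sqrt(371.97 / 7)

# Absolute SEM, P x I x J design with n_i = 7, n_j = 2: sqrt of the absolute error variance
v = {"i": 25.12, "j": 15.00, "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94}
n_i, n_j = 7, 2
abs_error = (
    (v["i"] + v["pi"]) / n_i
    + (v["j"] + v["pj"]) / n_j
    + (v["ij"] + v["pij,e"]) / (n_i * n_j)
)
sem_absolute = sqrt(abs_error)
print(f"{sem_relative:.2f} {sem_absolute:.2f}")  # 7.29 8.07
```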




Scenario CEX

A mini clinical evaluation exercise (mini-CEX) was developed in which examinees are periodically observed and rated on a rating form. An investigator analyzed a data set from 88 residents who were each observed on 4 occasions, each time by a different examiner (cf. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (Clinical Evaluation Exercise): a preliminary investigation. Annals of Internal Medicine 1995;123:795-799).

Variance components: p and o:p (o and op,e confounded)

$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{o:p}/4} = D$
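No numeric variance components are given for this scenario, so the sketch below takes them as inputs. It evaluates the O:P coefficient and, as a typical D-study question, solves the same formula for the number of occasions needed to reach a target coefficient. Both function names are hypothetical.

```python
def nested_coefficient(var_p, var_o_in_p, n_o):
    """G for an O:P design; equal to D here because all non-person
    variance is confounded in sigma2(o:p)."""
    return var_p / (var_p + var_o_in_p / n_o)


def occasions_needed(var_p, var_o_in_p, target):
    """Number of occasions n solving target = var_p / (var_p + var_o_in_p / n).

    Returns a real number; in practice round it up to a whole occasion.
    """
    return target * var_o_in_p / ((1 - target) * var_p)
```

With hypothetical components $\sigma^2_p = 1$ and $\sigma^2_{o:p} = 4$, four occasions give 0.5, and reaching 0.75 would take 12 occasions.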

Scenario OSCE I

An OSCE consisting of 15 stations was administered to 100 final-year students. Each station was scored by two independent examiners on a case-specific checklist; different examiners were used in each station.

Variance components: p, s, j:s, ps, pj:s

$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps}/15 + \sigma^2_{pj:s}/(2 \times 15)}$

Scenario OSCE II

An experimental OSCE was administered to 20 residents. Each resident was tested on a different day. For each resident, 3 stations were organized with the real patients available that day. Two examiners observed all residents in all stations and completed a generic rating scale.

Variance components: p, s:p, j, pj, js:p

$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{s:p}/3 + \sigma^2_j/2 + \sigma^2_{pj}/2 + \sigma^2_{js:p}/6}$

Scenario Clerkship Evaluation

An investigator wishes to evaluate the teaching quality of 10 clinical clerkships. She developed a questionnaire with 30 items on various quality aspects. The questionnaire was administered in all clerkships to 50 students.

Variance components: c, s:c, i, ci, si:c

$G = \dfrac{\sigma^2_c}{\sigma^2_c + \sigma^2_{s:c}/50 + \sigma^2_{ci}/30 + \sigma^2_{si:c}/(50 \times 30)}$

Note: it is doubtful that i is a random facet; items could instead be treated as fixed, or ignored.
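The note above changes the formula. As a sketch under standard G-theory conventions: if items are fixed, the clerkship × item interaction averaged over the fixed items becomes part of the universe-score (clerkship) variance instead of error. No numeric components are given for this scenario; the function name and any values passed to it are hypothetical.

```python
def clerkship_g(v, n_s, n_i, items_fixed=False):
    """Relative G for clerkships (c) rated by students nested in
    clerkships (s:c) on items (i): an (s:c) x i design.

    v: variance components keyed c, s:c, ci, si:c.
    If items_fixed is True, sigma2(ci)/n_i is counted as universe-score
    variance and removed from the error term.
    """
    if items_fixed:
        universe = v["c"] + v["ci"] / n_i
        rel_error = v["s:c"] / n_s + v["si:c"] / (n_s * n_i)
    else:
        universe = v["c"]
        rel_error = v["s:c"] / n_s + v["ci"] / n_i + v["si:c"] / (n_s * n_i)
    return universe / (universe + rel_error)
```

Treating items as fixed can only raise the coefficient, because the same quantity moves from the error term into the numerator.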

Further reading & software

Literature

Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley, 1972. The original monograph on generalizability theory; complete, but hardly accessible to most readers.

Brennan RL. Elements of Generalizability Theory. Iowa: ACT Publications, 1983. The resource book for most specialists; not easy for readers without statistical training.

Shavelson RJ, Webb NM. Generalizability theory: A primer. Newbury Park, CA: Sage Publications, 1991. A good and accessible introduction to generalizability theory for any reader.

Software

GENOVA: conducts G and D studies and provides ample statistical information. Runs on any PC. The program is relatively old and not user-friendly. Available from Dr. J. Crick, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3190, USA.

SPSS: the Variance Components procedure (under General Linear Model) estimates variance components, also for unbalanced designs. D-studies need to be done manually.
