ITC Conference, Winchester, 2002. Computer-based Testing: Usability of Psychometric Admeasurements. Dr. J. M. Müller, University of Tübingen, Germany. http://www.joergmmueller.de/default.htm

Page 1:

ITC Conference, Winchester, 2002

Computer-based Testing

Usability of Psychometric Admeasurements

Dr. J. M. Müller

University of Tübingen, Germany

http://www.joergmmueller.de/default.htm

Page 2:

Overview

1. Introduction: Formal test descriptions in practice

2. Definition of usability in the context of test description

3. Illustrating problems: Reliability

4. Criteria of usability: foundation, scaling, general attributes

5. Two examples of enhanced usability: NDR and PDR

6. Summary

Page 3:

Introduction: Psychometric admeasurements in practice today and tomorrow

1. Test users often use poor-quality tests (e.g. Piotrowski et al.; Wade & Baker, 1977). Psychometric knowledge (Moreland et al., 1995) / competence approach (Bartram, 1995, 1996)

2. What should be described? For CBT: criteria for software usability (ISO 9241/10, 1991; Willumeit, Gediga & Hamborg, 1995) and further criteria (platform independence, possibility of building one's own norm bank, protection)

3. How should it be described?

4. "Good practice" guidelines and standards are based on quality criteria (e.g. Standards for Educational and Psychological Testing, APA, 1999; International Guidelines for Testing, ITC, 2000)

Quality supply / Quality demand

Page 4:

Definition of Usability

Scope of usability: Usability in the context of psychological testing concerns all the kinds of information that test users need in order to describe a test for various purposes, and the ways of communicating that information. This includes test manuals as well as formal test descriptions with the help of psychometric admeasurements.

Aim of usability: Good usability means that any test user finds all necessary information quickly and in a proper standardized form, ready to use, so that the user can decide whether a test is an appropriate aid for the diagnostic question.

Frame of usability: Quality assurance in the context of psychological testing refers to test construction, test translation, test description and the use of tests in practice. Methods to enhance quality control can include guidelines for test use, standards for test description, etc. Usability is a strategy to enhance quality on the level of formal description.

Consequences of usability concern the reengineering of formal test description,

Page 5:

Indices of measurement of error

[Figure: indices of measurement error, grouped by type of construct and test theory]

Indices shown: Spearman correlation; % or SMC; Phi coefficient; retest Pearson correlation; Yule's Y; Cronbach's alpha; Kuder-Richardson Formula 20; Spearman-Brown prophecy formula; intraclass correlation; sensitivity TP/(TP+FN); specificity TN/(TN+FP); standard error of a score; Kappa (reclassification); model-fit likelihoods; information function; Kappa (interrater); standard error score

Measurement of error

Dimensional construct: CTT, IRT, Generalizability Theory

Categorical construct: nonspecific misclassification, specific misclassification

Page 6:

Relationships between indices of error of measurement

[Figure: relationships between the indices Spearman correlation; % or SMC; Phi coefficient; retest Pearson correlation; Yule's Y; Cronbach's alpha; Kuder-Richardson Formula 20; Spearman-Brown prophecy formula; intraclass correlation; sensitivity TP/(TP+FN); specificity TN/(TN+FP); standard error of a score; Kappa (reclassification); model-fit likelihoods; information function; Kappa (interrater); information criteria. Further labels in the figure: Standard error score; Reliability; Y / Kappa / Phi; Correlation; Phi; Kappa]

Page 7:

Top-down vs. bottom-up strategy to develop a coefficient

Practitioner's point of view / Scientist's point of view

Chain starting from test theory:

1. Test theory/statistic

2. Index: generic formula

3. Algorithm

4. Scale (correction)

5. Interpretation of the score (operational meaning)

Chain starting from the operational meaning:

1. Defining the operational meaning

2. Scale definition

3. Specification within a test theory

4. Index: defining the influencing factors

5. Index: generic formula

Page 8:

Rescaling reliability: Number of distinctive results (NDR)

(Wright & Masters, 1982; Lehrl & Kinzel, 1973; Müller, 2001)

[Figure: test score distribution over the range R, divided into successive critical differences; two scores x1 and x2 are marked]

Formula

$$k = |x_1 - x_2|_{0.05} = 1.96 \cdot s_x \cdot \sqrt{2} \cdot \sqrt{1 - r_{tt}}$$

R = test score range

k = critical difference

$$NDR = \frac{R}{k} = \frac{2}{\sqrt{2} \cdot \sqrt{1 - r_{tt}}}$$
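The critical difference and the ratio of score range to critical difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the critical difference is taken as 1.96 · s_x · √2 · √(1 − r_tt), and the score range R is assumed to span ±1.96 standard deviations, an assumption chosen because it reproduces the NDR values quoted in the deck for r = .50, .92 and .98.

```python
import math


def critical_difference(s_x, r_tt, z=1.96):
    """Smallest score difference treated as significant (z = 1.96 for the 5 % level)."""
    return z * s_x * math.sqrt(2.0) * math.sqrt(1.0 - r_tt)


def ndr(r_tt, s_x=1.0, z=1.96):
    """Number of distinctive results: score range R divided by critical difference k."""
    score_range = 2.0 * z * s_x  # assumption: R covers +/- z standard deviations
    return score_range / critical_difference(s_x, r_tt, z)


if __name__ == "__main__":
    for r in (0.50, 0.92, 0.98):
        print(f"r_tt = {r:.2f} -> NDR = {ndr(r):.1f}")
```

Under these assumptions s_x cancels out, so NDR depends only on the reliability; with a different definition of R the numbers would change accordingly.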

Page 9:

Criteria of usability for formal quality criteria

(modified from Müller, 2001, 2002a,b; Goodman & Kruskal, 1954)

Foundation

1. Unambiguous operational meaning

2. Unambiguous formal definition

3. Broad application area

4. Relevant dependencies

5. Independent of irrelevant factors

Scale definition

1. Meaningful scale unit, which implies:

• Interval scale

• Positive values

• Defined range of values

2. Comparable to the reference scale

3. Significant scale unit, which implies a minimum number of observations (Nmin)

Global attributes in use

1. Relevance

2. Informative (not redundant)

3. Predictable for the test user (nominal/actual value comparison)

4. Easy to learn

5. Easy to utilise

6. Fisher's (1925) criteria of estimation

Page 10:

NDR at work...

r = .50: NDR = 2

r = .92: NDR = 5

r = .98: NDR = 10

[Figure: distribution of the reliability coefficient] Conclusion: many precise tests

[Figure: distribution of the NDR coefficient] Conclusion: some precise tests

Page 11:

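Probability of distinctive results (PDR)

Formula

$$PDR = \frac{sD}{tD}$$

$$tD = \frac{n \cdot (n - 1)}{2}$$

$$sD = \sum_{i<j}^{n} s_{i,j}, \qquad s_{i,j} = \begin{cases} 1 & \text{if } |x_i - x_j| \ge k \\ 0 & \text{if } |x_i - x_j| < k \end{cases}$$

Complete score comparison of pairs: tD is the total number of pairs among the n test scores, sD the number of pairs whose scores differ by at least the critical difference k.

A rectangular distribution shows an 80 % probability of distinguishing two test scores.

A Gaussian distribution shows a 60 % probability of distinguishing two test scores.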
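The complete pairwise comparison behind PDR can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `pdr` is hypothetical, and treating a difference exactly equal to k as distinguishable is an assumption.

```python
from itertools import combinations


def pdr(scores, k):
    """Share of all score pairs whose absolute difference reaches the critical difference k."""
    pairs = list(combinations(scores, 2))  # tD = n * (n - 1) / 2 pairs
    if not pairs:
        raise ValueError("at least two scores are needed")
    # sD: pairs counted as distinguishable (assumption: |x_i - x_j| >= k)
    distinct = sum(1 for x_i, x_j in pairs if abs(x_i - x_j) >= k)
    return distinct / len(pairs)


if __name__ == "__main__":
    # Hypothetical raw scores, for illustration only
    print(pdr([1, 2, 3, 4], k=2))
```

Because every pair of observed scores enters the count, the result depends on the shape of the score distribution as well as on k.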

Page 12:

PDR: Simulation study

Performance to separate test scores with respect to reliability and score distribution

[Figure: axes labelled Reliability and PDR]

Page 13:

PDR: Example

Subscale 'Resignation', Stress-Coping-Questionnaire SVF-KJ (Hampel, Petermann & Dickow, 1999; N = 1123): r = 0.81, PDR = 41.6 %

Subscale 'Unsicherheit' (insecurity), Symptom Check List (Derogatis, 1977; German version: Franke, 1995; N = 875): r = 0.81, PDR = 30.6 %

Page 14:

Reviewing NDR and PDR

1. NDR and PDR can be derived in any test theoretical model – there is progress in the application area.

2. NDR and PDR have an easy to understand operational meaning

3. NDR and PDR are predictable for the test user for the nominal/actual value comparison

NDR and PDR serve as examples of how to develop more usable formal test descriptions

Page 15:

Summary

1. Usability is a possible strategy, with explicit and observable criteria, for improving formal test descriptions and for indirectly strengthening the role of guidelines and standards.

2. With NDR and PDR, two easy-to-understand coefficients have been proposed; their application is in progress in several test-theoretical models.

Page 16:

Thank you for your attention!

Page 17:

Medicine: Effect-size measures

Scientific coefficient (Cohen, 1988)

$$w = \sqrt{\sum_{i=1}^{m} \frac{(P_{1i} - P_{0i})^2}{P_{0i}}}$$

Practitioner's coefficient

$$NNT = \frac{1}{CER \cdot RRR}$$

CER = control event rate; RRR = relative risk reduction

NNT (number needed to treat): the number of patients who need to be treated to prevent 1 adverse outcome. Taken from EBM Glossary, Evidence Based Medicine, Volume 125, Number 1
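The contrast between the two effect-size measures can be made concrete with a short sketch. It assumes the standard definitions NNT = 1 / (CER · RRR) and Cohen's w = √(Σ (P1i − P0i)² / P0i); the function names and the example numbers are hypothetical.

```python
import math


def nnt(cer, rrr):
    """Number needed to treat from control event rate (CER) and relative risk reduction (RRR)."""
    if cer <= 0 or rrr <= 0:
        raise ValueError("CER and RRR must be positive")
    return 1.0 / (cer * rrr)


def cohens_w(p0, p1):
    """Cohen's w for m categories; p0 are the null proportions, p1 the alternative proportions."""
    return math.sqrt(sum((b - a) ** 2 / a for a, b in zip(p0, p1)))


if __name__ == "__main__":
    # Hypothetical values: 20 % control event rate, 25 % relative risk reduction
    print(nnt(0.20, 0.25))
    print(cohens_w([0.5, 0.5], [0.6, 0.4]))
```

NNT is expressed in patients, a unit a practitioner can act on directly, whereas w is a unit-free index that needs conventions to interpret.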

Page 18:

Measuring in technical fields: Solutions from engineering

There is a German standard, DIN 2257, on how to measure the physical length of an object and how to report the result. The standard allows as output only values with statistical evidence.

Page 19:

Criteria of usability for formal quality criteria, applied to NNT

Foundation

1. Unambiguous operational meaning

2. Unambiguous formal definition

3. Broad application area

4. Relevant dependencies

5. Independent of irrelevant factors

Scale definition

1. Meaningful scale unit, which implies:

• Interval scale

• Positive values

• Defined range of values

2. Comparable to the reference scale

3. Significant scale unit, which implies a minimum number of observations (Nmin)

Global attributes in use

1. Relevance

2. Informative (not redundant)

3. Predictable for the test user (nominal/actual value comparison)

4. Easy to learn

5. Easy to utilise

6. Fisher's (1925) criteria of estimation

Page 20:

Criteria of software usability (from Willumeit, Gediga & Hamborg, 1995)

Questionnaire on the basis of ISO9241/10 (IsoMetrics) to evaluate the following dimensions:

1. Suitability for the task

2. Self-descriptiveness

3. Controllability

4. Conformity with user expectations

5. Error tolerance

6. Suitability for individualization

7. Suitability for learning

Page 21:

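KR20 and Cronbach

Kuder-Richardson Formula KR20 (from Cronbach, 1951)

$$r_{tt} = \frac{n}{n - 1} \left( 1 - \frac{\sum_{i=1}^{n} p_i q_i}{s_t^2} \right)$$

i = item; n = number of items; p_i = proportion of responses scored 1; q_i = proportion of responses scored 0; s_t^2 = variance of the test score

Cronbach's alpha

$$\alpha = \frac{c}{c - 1} \left( 1 - \frac{\sum_{i=1}^{c} s_i^2}{s_{tot}^2} \right)$$

c = number of variables; s_i^2 = variance of variable i; s_tot^2 = variance of the sum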
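Both coefficients can be computed from a persons-by-items score matrix. The sketch below assumes the usual forms of KR20 and Cronbach's alpha and uses population variances throughout; the function names and the small data set are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of persons, each a list of item scores."""
    c = len(rows[0])  # number of variables (items)
    item_variances = [pvariance([row[i] for row in rows]) for i in range(c)]
    total_variance = pvariance([sum(row) for row in rows])
    return c / (c - 1) * (1 - sum(item_variances) / total_variance)


def kr20(rows):
    """Kuder-Richardson Formula 20 for items scored 0 or 1."""
    n = len(rows[0])  # number of items
    persons = len(rows)
    p = [sum(row[i] for row in rows) / persons for i in range(n)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_variance = pvariance([sum(row) for row in rows])
    return n / (n - 1) * (1 - sum_pq / total_variance)


if __name__ == "__main__":
    # Hypothetical responses of four persons to three dichotomous items
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(cronbach_alpha(data), kr20(data))
```

For dichotomous items the population variance of an item equals p_i · q_i, so the two functions return the same value on 0/1 data.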

Page 22:

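Formulas for the error of measurement in categorical constructs

Fourfold table

     B1  B2

A1   a   b

A2   c   d

Cohen's Kappa. Sixteen further measures of concordance between two measurements for binary data are compared in Conger & Ward (1984).

$$\kappa = \frac{p_0 - p_e}{1 - p_e}, \qquad p_0 = \frac{a + d}{N}, \qquad p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2}$$

Yule's fourfold interdependence measure

$$Y = \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}$$

Q coefficient

$$Q = \frac{ad - bc}{ad + bc}$$

Phi coefficient: depends on the distribution of the marginal sums; the significance test depends on N (Yates continuity correction, 1934)

$$\phi = \sqrt{\frac{\chi^2}{N}}, \qquad \chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$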
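The agreement measures for a fourfold table with cells a, b, c, d can be compared on the same data with a short sketch. It assumes the standard definitions of Cohen's Kappa, Yule's Y, Yule's Q and the Phi coefficient (via chi-square); the function name and the example frequencies are hypothetical.

```python
import math


def agreement_measures(a, b, c, d):
    """Kappa, Yule's Y, Yule's Q and Phi for a 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    p0 = (a + d) / n  # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    kappa = (p0 - pe) / (1 - pe)
    yule_y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
    yule_q = (a * d - b * c) / (a * d + b * c)
    observed = [[a, b], [c, d]]
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)  # unsigned Phi
    return {"kappa": kappa, "Y": yule_y, "Q": yule_q, "phi": phi}


if __name__ == "__main__":
    # Hypothetical frequencies with equal marginals
    print(agreement_measures(40, 10, 10, 40))
```

With equal marginals Kappa, Y and Phi coincide in this example while Q is larger, which illustrates why the choice of index matters for interpretation.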

Page 23:

Formulas for the error of measurement in categorical constructs

Fricke's agreement coefficient (Ü). SS: sum of squares within a person; max SS: maximum possible sum of squares within persons

$$Ü = 1 - \frac{SS}{\max SS}$$

$$Ü = \frac{a + d}{n}$$

      A  B  C

I     1  4  3

II    0  4  2

III   0  5  2

Point-biserial correlation

$$r_{pbis} = \frac{\bar{X}_R - \bar{X}}{s_x} \cdot \sqrt{\frac{p}{q}}$$

X̄ = arithmetic mean of all raw test scores; X̄_R = arithmetic mean of the raw scores of respondents with correct answers; s_x = standard deviation of the raw test scores of all respondents; N = number of all respondents; N_R = number of respondents with correct answers; p = N_R / N; q = 1 - p

Tetrachoric correlation

$$r_{tet} = \cos\left( \frac{180^\circ}{1 + \sqrt{ad / bc}} \right)$$
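The point-biserial correlation and the cosine approximation of the tetrachoric correlation can be sketched as below. The sketch assumes r_pbis = (X̄_R − X̄) / s_x · √(p/q) with the population standard deviation and r_tet = cos(180° / (1 + √(ad/bc))); function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, correct):
    """Point-biserial correlation between raw test scores and a 0/1 item response."""
    n = len(scores)
    scores_right = [s for s, ok in zip(scores, correct) if ok]
    p = len(scores_right) / n  # share of respondents answering correctly
    q = 1 - p
    return (mean(scores_right) - mean(scores)) / pstdev(scores) * math.sqrt(p / q)


def tetrachoric_cos(a, b, c, d):
    """Cosine approximation of the tetrachoric correlation for a 2x2 table."""
    return math.cos(math.radians(180.0 / (1.0 + math.sqrt(a * d / (b * c)))))


if __name__ == "__main__":
    # Hypothetical data: four respondents, two of whom solved the item
    print(point_biserial([1, 2, 3, 4], [0, 0, 1, 1]))
    print(tetrachoric_cos(40, 10, 10, 40))
```

The tetrachoric approximation is undefined when b or c is zero, so a real implementation would need to handle empty cells.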

Page 24:

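Formulas for the error of measurement in CTT and IRT, plus the prophecy formula

Spearman-Brown formula

$$r_{tt}^{(k)} = \frac{k \cdot r_{tt}}{1 + (k - 1) \cdot r_{tt}}$$

k = factor by which the test is lengthened

Rasch model

$$Var(E) = \frac{1}{\sum_{i=1}^{k} p_{vi} (1 - p_{vi})}$$

CTT

$$s_e = s_x \cdot \sqrt{1 - r_{tt}}$$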
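The three error formulas of this slide can be tried out numerically. The sketch assumes the standard Spearman-Brown prophecy formula, the CTT standard error of measurement s_e = s_x · √(1 − r_tt), and the Rasch error variance as the inverse of the summed item information; function names and example values are hypothetical.

```python
import math


def spearman_brown(r_tt, k):
    """Predicted reliability after lengthening the test by factor k."""
    return k * r_tt / (1 + (k - 1) * r_tt)


def standard_error(s_x, r_tt):
    """CTT standard error of measurement."""
    return s_x * math.sqrt(1 - r_tt)


def rasch_error_variance(solution_probabilities):
    """Error variance of a person parameter given the solution probabilities p_vi of k items."""
    return 1 / sum(p * (1 - p) for p in solution_probabilities)


if __name__ == "__main__":
    print(spearman_brown(0.5, 2))
    print(standard_error(10, 0.84))
    print(rasch_error_variance([0.5, 0.5, 0.5, 0.5]))
```

In CTT the error is one value for the whole test, whereas the Rasch error variance depends on how well the item difficulties match the individual person.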

Page 25:

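Some formulas for the error of measurement in metric constructs

Reliability (Kelley, 1921)

$$r_{tt} = \frac{s_w^2}{s_w^2 + s_e^2}$$

s_w^2 = true-score variance; s_e^2 = error variance

Pearson (1907) correlation; Bravais (1846)

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N \cdot s_x \cdot s_y}$$

Spearman's rho (1904)

$$Rho = 1 - \frac{6 \sum_{i} d_i^2}{N (N^2 - 1)}$$

Kendall's tau (1942); S = number of proversions minus number of inversions

$$\tau = \frac{S}{N (N - 1) / 2}$$

Fisher's Z transformation

$$Z = \frac{1}{2} \ln \frac{1 + r}{1 - r}$$

Example ranks:

1 2 3 4 5

3 2 3 5 4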
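The rank-based coefficients and the Z transformation can be checked on a small example. The sketch assumes the textbook formulas for Spearman's rho (valid without ties), Kendall's tau (S divided by the number of pairs) and Fisher's Z; the function names and the tie-free rank example are hypothetical and are not the rank rows shown on the slide.

```python
import math
from itertools import combinations


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho from two rank series without ties."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def kendall_tau(x_ranks, y_ranks):
    """Kendall's tau: concordant minus discordant pairs, divided by the number of pairs."""
    s = 0
    for (x1, y1), (x2, y2) in combinations(zip(x_ranks, y_ranks), 2):
        product = (x2 - x1) * (y2 - y1)
        s += 1 if product > 0 else -1 if product < 0 else 0
    n = len(x_ranks)
    return s / (n * (n - 1) / 2)


def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 3, 5, 4]  # hypothetical ranks without ties
    print(spearman_rho(x, y), kendall_tau(x, y), fisher_z(0.5))
```

The same pair of rank series yields different values for rho and tau, which is one reason the deck argues for coefficients with an explicit operational meaning.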

Page 26:

Non-linear relationship between reliability, NDR and the standard error score

[Figure: NDR and standard error score plotted against reliability]

Page 27:

Item Response Theory (Fischer & Molenaar, 1994)

1. Dichotomous Rasch model

2. Linear logistic test model

3. Linear logistic model for change

4. Dynamic generalization of the Rasch model

5. One-parameter logistic model

6. Linear logistic latent class analysis

7. Mixture distribution Rasch models

8. Polytomous Rasch models

9. Extended rating scale and partial credit models

10. Polytomous mixed Rasch models

11. ...

Page 28:

...more IRT (van der Linden & Hambleton, 1997)

1. Nominal categories model

2. Response model for multiple choice

3. Graded response model

4. Partial credit model

5. Generalized partial credit model

6. Logistic model for time-limit tests

7. Hyperbolic cosine IRT model for unfolding direct responses

8. Single-item response model

9. Response model with manifest predictors

10. A linear multidimensional model

11. ...

Page 29:

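Formulas of some IRT models

θ = person parameter; σ_i = item parameter; α_i = item discrimination

Rasch model

$$p(x_{Ai}) = \frac{\exp\big(x_{Ai} (\theta_A - \sigma_i)\big)}{1 + \exp(\theta_A - \sigma_i)}$$

Binomial model

$$p(x_A) = \frac{\exp(\theta_A)}{1 + \exp(\theta_A)}$$

Birnbaum model

$$p(x_{vi}) = \frac{\exp\big(x_{vi}\, \alpha_i (\theta_v - \sigma_i)\big)}{1 + \exp\big(\alpha_i (\theta_v - \sigma_i)\big)}$$

Unfolding model

$$p(x_{vi}) = \frac{\exp\big(-x_{vi} (\theta_v - \sigma_i)^2\big)}{1 + \exp\big(-(\theta_v - \sigma_i)^2\big)}$$

Latent class model

$$p(x) = \sum_{g=1}^{G} \pi_g \prod_{i=1}^{k} \pi_{ig}^{x_i} (1 - \pi_{ig})^{1 - x_i}$$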
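The Rasch and Birnbaum response functions can be evaluated with a few lines of Python. This is a sketch under assumptions: the Greek symbols are not recoverable from the slide, so `theta` (person parameter), `sigma` (item parameter) and `alpha` (discrimination) are naming choices made here, and the function names are hypothetical.

```python
import math


def rasch_probability(theta, sigma, x=1):
    """Probability of response x (1 = solved, 0 = not solved) under the Rasch model."""
    return math.exp(x * (theta - sigma)) / (1 + math.exp(theta - sigma))


def birnbaum_probability(theta, sigma, alpha, x=1):
    """Probability of response x under the Birnbaum model with discrimination alpha."""
    logit = alpha * (theta - sigma)
    return math.exp(x * logit) / (1 + math.exp(logit))


if __name__ == "__main__":
    print(rasch_probability(0.0, 0.0))          # person and item matched
    print(rasch_probability(1.0, 0.0))          # person one logit above the item
    print(birnbaum_probability(1.0, 0.0, 2.0))  # same distance, steeper item
```

With alpha fixed at 1 the Birnbaum function reduces to the Rasch function, which is why the Rasch model is often described as the special case with equal discrimination.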

Page 31:

Norm scales

Page 32:

SCL-90-R test score distribution

Page 33:

Simulation study about the relationship between measures of association

Linear relationship?

[Figure: panels plotting the measures of association Y, Kappa, Phi, Q and SMC against the correlation and against each other, based on the fourfold table with cells a, b (row A1) and c, d (row A2) and columns B1, B2]

Conditions: dichotomised normal distribution with equal marginals; skewed distribution with unequal marginals

Page 34:

Efficiency in measuring

Content: efficiency

Concept: The less effort you need for the same amount of information, the more efficient the test is. Efficiency = f(information, effort)

Index: E = amount of information / time

Estimates: information theory (Shannon & Weaver, 1949)

Page 35:

Amount of Information of a signal: Chess example

In the chess example you need at least 6 binary (50-50 chance) questions, that is, 6 bits.

Question 1: left or right?

Question 2: top or bottom?

Questions 3 to 6: [continued in the figure]

The scale unit 'bit' can be understood as the minimal or optimal number of questions needed to identify a signal out of a set of alternatives.
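The bit count and the efficiency index E = amount of information / time can be illustrated numerically. The sketch assumes that the chess example refers to locating one of the 64 squares of a board, which matches the 6 questions mentioned; the function names and the time value are hypothetical.

```python
import math


def bits_needed(alternatives):
    """Information in bits needed to single out one of several equally likely alternatives."""
    return math.log2(alternatives)


def efficiency(information_bits, time_units):
    """Efficiency index E: amount of information per unit of time."""
    return information_bits / time_units


if __name__ == "__main__":
    bits = bits_needed(64)  # assumption: 64 squares of a chess board
    print(bits)
    print(efficiency(bits, time_units=3))  # hypothetical: three minutes of testing time
```

Each well-chosen yes/no question halves the remaining alternatives, so 64 alternatives require log2(64) = 6 questions.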

Page 36:

Rasch variances are a measure of the variability of persons within a dimension

[Figure: chess players (A, B, C, ...) compared pairwise, each comparison labelled with winning odds of 1:2]

The difference in winning probabilities serves as the unit of dissimilarity.

Page 37:

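Interpretable Rasch Variances

1. Winning probabilities -> solution probabilities

2. Opponent -> test item (item parameter)

3. Playing strength -> person parameter

4. The difference in winning probability is defined via the logit of the Rasch model

$$p(x_{Ai}) = \frac{\exp\big(x_{Ai} (\theta_A - \sigma_i)\big)}{1 + \exp(\theta_A - \sigma_i)}$$

[Figure: probability of solving an item plotted against the person parameter, for item i with σ_i = 0 and item m with σ_m = 1; persons A, B and C are marked, together with the range between the minimum and maximum person parameter]

Difference to solve a question or task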

Page 38:

Empirical evidence of the range of person parameters in Rasch units

AID (Kubinger & Wurst)

Subtest               Standard form   Parallel form

Everyday knowledge    21.1            21.3

Reality awareness     13.3            13.1

Applied arithmetic    21.7            20.5

Trait                                   Author                Range

Verbal intelligence test                Metzler & Schmidt     11.4

Nonverbal intelligence                  Formann & Piswanger   8.2

Attitude toward sexual morality         Wakenhut              8.1

Attitude toward criminal law reform     Wakenhut              7.2

Complaints list                         Fahrenberg            6.4

Spatial ability                         Gittler               5.9

Handling of numbers in children         Rasch                 3.5

Page 39:

Usability criteria explanations

• Relevant dependencies: Example: Reliability and test length, stability, ...

• Irrelevant dependencies: Example: Reliability and test score distribution

• Displaying numbers: Integer, positive, predictable range

• Meaningful scale unit

• Familiarity: each new coefficient should be distinctly more usable than the traditional one

Page 40:

7. Linearity with respect to the unit-in-change

Explanation: 'Linearity with respect to the unit-in-change'

- In the case of measurement precision, this concerns the relationship between reliability and the error of measurement.

- In the case of agreement, this concerns the relationship between Yule's Y and the change in the cell frequency a or d.

[Figure: correlation/reliability plotted against the standard error of measurement; Yule's Y plotted against the frequency of cell a]

Page 41:

Evaluating the progress made by enhancing usability

1. Formal test criteria are used more frequently for test selection

2. Tests in practice are of higher quality

Page 42:

Ergonomics in psychological test selection

Ergonomics

• Configuration of environment: designing a tool to fit the hand.

• Software conception: developing a program to be used intuitively.

• Psychological diagnostics: restricting a test description so that the relevant information is ready to use.

Page 43:

Integrating ergonomics in the formal test description

Human interface techniques

Test user / Psychometric admeasurements / Test

Analysis of usage / Evaluation / Usability criteria

1. Formal test criteria are used more frequently for test selection

2. Tests in practice are of higher quality

Page 44:

Ergonomics and the development of criteria of usability

Requirement analysis (Mayhew, 1999)

User profile: test user

Task analysis: test selection

Platform capabilities/constraints: test theory

Page 45:

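Top-down vs. bottom-up strategy to develop a coefficient

Practitioner's point of view / Scientist's point of view

Chain starting from test theory:

1. Test theory/statistic

2. Index: generic formula

3. Algorithm

4. Scale (correction)

5. Interpretation of the score (operational meaning)

Chain starting from the operational meaning:

1. Defining the operational meaning

2. Scale definition

3. Specification within a test theory

4. Index: defining the influencing factors

5. Index: generic formula

Examples shown on the slide:

CTT

$$r_{tt} = \frac{s_w^2}{s_w^2 + s_e^2}$$

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N \cdot s_x \cdot s_y}$$

none

association, P-R-E

SEDTTS

NDR

$$NDR = \frac{R}{k}$$

f(me, score range, probability)

$$1.96 \cdot s_x \cdot \sqrt{2} \cdot \sqrt{1 - r_{tt}}$$

$$6 \cdot s_x$$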