Evaluating Survey Items and Scales Bonnie L. Halpern-Felsher, Ph.D. Professor University of California, San Francisco

Upload: rosa-peters

Post on 14-Dec-2015


TRANSCRIPT

Page 1:

Evaluating Survey Items and Scales

Bonnie L. Halpern-Felsher, Ph.D.

Professor

University of California, San Francisco

Page 2:

Question Evaluation Involves

• How well questions are understood and how easy they are to answer

• Affects quality of measurement

• How well do the answers relate to what we are actually trying to measure?

– Validity and reliability

Page 3:

Standards for Survey Items/Questions

• Content – are the questions asking the right things?

• Cognitive – do participants understand the questions, and are they able and willing to answer them?

• Usability – can the questions be completed easily?

Page 4:

Ways to Test

• Expert review – have experts in the field review the items and scale to ensure they meet the needed criteria
– Great for perspective on what the researcher needs, but might not reveal how to get the most accurate answers from respondents

Page 5:

Ways to Test

• Focus groups – hold discussions with people from the sample of interest to learn what terms are used, whether the questions make sense, and whether the content is complete
– Great for learning what people in general think, but not what specific individuals think

Page 6:

Ways to Test

• Cognitive interviews – pilot test the survey, asking respondents what the questions meant, why they gave certain answers, and whether they would change anything
– Great for showing how individuals understand the questions, but the results might not be generalizable

Page 7:

Looking For:

• Interviewer: does the interviewer read the question exactly as worded, read it with slight changes, or read it so that its meaning is altered?

Page 8:

Looking For:

• Respondent:
– Interrupts
– Asks for clarification
– Gives an adequate answer
– Gives an inadequate answer
– Answers “I don’t know”
– Refuses to answer

Page 9:

Ways to Test

• Field pretests – pilot test the survey in the same manner, and with the same kind of sample, as the real survey administration; examine the distribution of answers given; hold debriefings; determine the best administration method
– Tells how the instrument and procedures work under real circumstances and yields summary data, but is not flexible enough to probe and test variations

Page 10:

Ways to Test

• Randomized or split-ballot experiments – randomly assign different sets of items or wording of items to different groups and compare
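The logic of a split-ballot experiment can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the respondent pool, the two wording functions, and the fixed seed are all invented, and a real pretest would compare the groups with a proper significance test (e.g., chi-square) rather than raw shares.

```python
import random

def split_ballot(respondents, ask_wording_a, ask_wording_b, seed=0):
    """Randomly assign each respondent to wording A or B, then
    return the share answering 'yes' (1) under each wording."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    group_a, group_b = [], []
    for r in respondents:
        # coin flip decides which version of the question this person sees
        (group_a if rng.random() < 0.5 else group_b).append(r)
    share_a = sum(ask_wording_a(r) for r in group_a) / len(group_a)
    share_b = sum(ask_wording_b(r) for r in group_b) / len(group_b)
    return share_a, share_b
```

A large gap between the two shares suggests the wording itself, not the respondents, is driving the answers, since random assignment makes the groups comparable.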

Page 11:

Reliable & Valid Measures

• Reliability: Answers provided are consistent.

• Validity: Responses relate to some truth concerning what we are trying to describe.

• Items need to be consistently understood, administered, and communicated.

Page 12:

Reliability

Page 13:

Validity

• Translation Validity
– Does the measurement of the construct reflect the construct of interest well?
• Face validity

• Content validity

Page 14:

Validity

• Criterion-related validity
– Does the measure behave the way it should, given the theory or construct?
• Predictive validity

• Concurrent validity

• Convergent validity

• Discriminant validity

Page 15:

Translation Validity: Face Validity

• Does the measure “on the surface” or “on the face” seem like a good representation of the construct?

Page 16:

Translation Validity: Content Validity

• Ensuring that your measure taps each part of the construct that needs to be measured

Page 17:

Criterion-Related Validity: Predictive Validity

• Determine whether the measure can predict something it should predict
– E.g., doing well on a certain math exam should predict achievement in engineering

Page 18:

Criterion Validity: Concurrent Validity

• Can the measure distinguish between groups that it should be able to distinguish between?
– E.g., an assessment of manic depression should differentiate between manic-depressive patients and schizophrenic patients

Page 19:

Criterion Validity: Convergent Validity

• Degree to which the measure is similar to (converges with) measures of similar constructs
– E.g., a new measure of IQ should correlate highly with the Stanford-Binet IQ test

Page 20:

Criterion Validity: Discriminant Validity

• Degree to which the measure is NOT similar to (diverges from) other constructs not expected to be similar
– E.g., a measure of math skills should not be correlated with a measure of literacy skills

Page 21:

Reliability

• Inter-rater or inter-observer

• Test-retest

• Parallel Forms

• Internal Consistency

Page 22:

Inter-rater Reliability

• Degree to which different raters or observers give consistent estimates of the same issue or phenomenon
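A standard way to quantify this agreement is Cohen's kappa, which corrects raw percent agreement for the agreement two raters would reach by chance. A minimal sketch, with the ratings invented for illustration (not from the slides):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    # observed: fraction of items on which the two raters gave the same code
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance: agreement expected if each rater coded at random
    # in proportion to their own category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - chance) / (1 - chance)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why it is preferred over raw percent agreement.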

Page 23:

Test-Retest Reliability

• Give the same test to the same people twice and see how highly the two sets of scores correlate
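Test-retest reliability is typically summarized as the Pearson correlation between the two administrations. A self-contained sketch, with the two score lists invented for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from the same five people at time 1 and time 2
time1 = [10, 12, 15, 11, 14]
time2 = [11, 12, 14, 10, 15]
retest_r = pearson(time1, time2)
```

The same correlation also underlies parallel-forms reliability: there the two score lists come from two related forms rather than two administrations of one form.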

Page 24:

Parallel-Forms Reliability

• Administer two related (parallel) forms of the measure and see how highly they correlate

Page 25:

Internal Consistency

• Give the measure to a sample and estimate how well the items reflect the same construct
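The usual internal-consistency statistic is Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. A minimal sketch (population variances are used for simplicity, and the scores are invented for illustration):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.  item_scores is a list of per-item score lists,
    one inner list per item, each covering the same respondents."""
    k = len(item_scores)
    # total variance of each item, summed across items
    sum_item_var = sum(pvariance(item) for item in item_scores)
    # variance of each respondent's total (summed) score
    totals = [sum(per_person) for per_person in zip(*item_scores)]
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))
```

When the items move together (respondents' totals vary much more than the individual items do), alpha approaches 1; when the items are unrelated, alpha falls toward 0.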