
Locating and Assessing the Usefulness of Health Measures for Health Disparities Research

Anita L. Stewart, Ph.D.
University of California, San Francisco

Clinical Research with Diverse Communities
EPI 222, Spring
April 26, 2005

Outline

Locating measures
Basic psychometric properties
Rationale for multi-item measures
Additional measurement considerations in health disparities research
Steps in selecting measures for your study


Need Measures with Good Psychometric Properties

Measure assesses concept of interest
Low levels of missing data
Good variability
Evidence of reliability
Evidence of validity
Responsive to change (for interventions)

Inappropriate Measures can Result in:

Conceptual inadequacy
– Measuring wrong concept for your study
Poor data quality (e.g., missing data)
Poor variability
Poor reliability and validity
Inability to detect true associations among variables
– e.g., no measured change in outcome when change occurred

Good Variability

All (or nearly all) scale levels are represented

Distribution approximates bell-shaped normal

Indicators of Variability

Range of scores (possible, observed)
Mean, median, mode
Standard deviation (standard error)
Skewness, kurtosis
% at floor (lowest score)
% at ceiling (highest score)
Inter-quartile range
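The indicators listed on this slide can be computed with a few lines of code. The following is a minimal sketch using only the Python standard library. The function name `variability_summary`, the 0-10 scale, and the scores are hypothetical illustrations, not data from the lecture. Skewness uses a simple moment-based formula, and kurtosis is omitted for brevity.

```python
import statistics

def variability_summary(scores, possible_min, possible_max):
    """Summarize the distribution of scale scores (hypothetical helper)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    # Moment-based skewness: about 0 for a symmetric distribution
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "possible_range": (possible_min, possible_max),
        "observed_range": (min(scores), max(scores)),
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": sd,
        "se_mean": sd / n ** 0.5,
        "skewness": skewness,
        "iqr": q3 - q1,
        # Percent of respondents at the lowest / highest possible score
        "pct_floor": 100 * sum(x == possible_min for x in scores) / n,
        "pct_ceiling": 100 * sum(x == possible_max for x in scores) / n,
    }

# Made-up scores on a hypothetical 0-10 scale
scores = [0, 4, 5, 6, 6, 7, 8, 10, 10, 10]
summary = variability_summary(scores, possible_min=0, possible_max=10)
for name, value in summary.items():
    print(name, value)
```

A large percentage at the ceiling in a sketch like this would suggest the measure has little room to show improvement in that sample.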

Reliability

Extent to which an observed score is free of random error

Population-specific; reliability increases with:
– sample size
– variability in scores (dispersion)
– a person’s level on the scale

Reliability Coefficient

Typically ranges from .00 to 1.00
Higher scores indicate better reliability
Types of reliability tests
– Internal-consistency
– Test-retest
– Inter-rater
– Intra-rater

Internal Consistency Reliability: Cronbach’s Alpha

Requires multiple items supposedly measuring the same construct to calculate
Extent to which all items measure the same construct (same latent variable)
Internal consistency reliability is a function of:
– Number of items
– Average correlation among items
– Variability of items in your sample
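To make the dependence on number of items and item variability concrete, here is a minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the item responses are hypothetical. The sketch assumes complete data and items already scored in the same direction.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_variances = [statistics.variance(item) for item in items]
    # Total (summated) score for each respondent
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Made-up responses from five people to three items
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 5, 5]
item3 = [1, 3, 3, 4, 4]
alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 2))
```

Because the items in this made-up example move together closely, alpha comes out high. Less correlated items or fewer items would lower it.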

Minimum Standards for Internal Consistency Reliability

For group comparisons (e.g., regression, correlational analyses)
– .70 or above is minimum (Nunnally, 1978)
– .80 is optimal
– above .90 is unnecessary
For individual assessment (e.g., treatment decisions)
– .90 or above is the minimum; .95 is preferred (Nunnally, 1978)

Reliable Scale?

NO!
There is no such thing as a “reliable” scale
We only have accumulated “evidence” of reliability in a variety of populations in which it has been tested

Validity

Does a measure (or instrument) measure what it is supposed to measure?

And…Does a measure NOT measure what it is NOT supposed to measure?

Validation of Measures is an Iterative, Lengthy Process

Validity is not a property of the measure
– validity is a property of a measure for a particular purpose and sample
– validation studies for one purpose and sample may not serve another purpose or sample
Accumulation of evidence:
– Different samples
– Longitudinal designs

Three Major Forms of Measurement Validity

Content
Criterion
Construct

Construct Validity Basics

A process of answering the following questions:

What is the hypothesis?
What are the results?
Do the results support (confirm) the hypothesis?

Construct Validity: NOTE

Sometimes the hypothesis is that the measure will NOT be correlated with certain other measures, or will be less correlated with some than with others

THUS, observing a low or non-significant correlation can confirm construct validity
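As a small illustration of this logic, the sketch below states a hypothesis in advance (a new measure should correlate more strongly with an established measure of the same concept than with an unrelated measure) and then checks it. All names and scores are made up for illustration. The Pearson correlation is computed by hand to keep the example self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for six respondents
new_measure = [2, 4, 5, 7, 8, 9]
established_measure = [1, 5, 4, 6, 9, 8]   # same concept: expect a high correlation
unrelated_measure = [5, 3, 6, 4, 5, 4]     # different concept: expect a low correlation

r_convergent = pearson_r(new_measure, established_measure)
r_discriminant = pearson_r(new_measure, unrelated_measure)

# The hypothesis is supported when the pattern matches what was predicted,
# including the predicted LOW correlation with the unrelated measure.
supports_hypothesis = abs(r_convergent) > abs(r_discriminant)
print(round(r_convergent, 2), round(r_discriminant, 2), supports_hypothesis)
```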


Single- and Multi-Item Measures

A single-item measure consists of only one item

Response choices are interpretable
Example: How would you rate your health?
1 - Excellent
2 - Very good
3 - Good
4 - Fair
5 - Poor

Multi-Item Measures or Scales

Multi-item measures are created by combining two or more items into an overall measure or scale score

Summated score, scale score
– A score in which multiple items are “summed” or combined

Example of a 2-item Measure or Scale

How much of the time .... tired?

1 - All of the time

2 - Most of the time

3 - Some of the time

4 - A little of the time

5 - None of the time

How much of the time…. full of energy?

1 - All of the time

2 - Most of the time

3 - Some of the time

4 - A little of the time

5 - None of the time

Step 1: Reverse One Item So They Are All in the Same Direction

Reverse “energy” item so high score = more energy

1=5 All of the time
2=4 Most of the time
3=3 Some of the time
4=2 A little of the time
5=1 None of the time


Step 2: Sum the Two Items

Highest score = 10 (tired none of the time, full of energy all of the time)
Lowest score = 2 (tired all of the time, full of energy none of the time)
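The two scoring steps can be sketched in code as follows. The function names `reverse_score` and `energy_fatigue_score` are hypothetical. The sketch assumes both items use the 1-5 response choices shown above (1 = all of the time, 5 = none of the time).

```python
def reverse_score(value, low=1, high=5):
    """Step 1: flip a response so that 1 becomes 5, 2 becomes 4, and so on."""
    return low + high - value

def energy_fatigue_score(tired, energy):
    """Step 2: sum the 'tired' item and the reversed 'energy' item.

    Raw responses run from 1 (all of the time) to 5 (none of the time).
    A higher total means less tiredness and more energy.
    """
    for value in (tired, energy):
        if value not in (1, 2, 3, 4, 5):
            raise ValueError("responses must be integers from 1 to 5")
    return tired + reverse_score(energy)

# Tired none of the time (5), full of energy all of the time (1)
best = energy_fatigue_score(tired=5, energy=1)
# Tired all of the time (1), full of energy none of the time (5)
worst = energy_fatigue_score(tired=1, energy=5)
print(best, worst)  # 10 2
```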

Advantages of Multi-item measures

More scale values (enhances sensitivity)
– Moved from 2 items with 1-5 levels to 1 scale with 9 levels (2 – 10)
Improves score distribution (more normal)
Reduces number of variables needed to measure one concept
Improves reliability (reduces random error)
Can estimate a score if some items are missing
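One way to estimate a score when some items are missing is to prorate from the answered items. The sketch below is an illustration under stated assumptions, not a rule given in this lecture: the function name `prorated_sum` is hypothetical, and the requirement that at least half of the items be answered is a commonly used convention that a specific instrument's scoring manual may override. Items are assumed to be already scored in the same direction.

```python
def prorated_sum(responses, min_answered_fraction=0.5):
    """Estimate a summated score when some items are missing (None).

    The mean of the answered items stands in for each missing item.
    Returns None when too few items were answered to estimate a score.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_answered_fraction * len(responses):
        return None
    return sum(answered) / len(answered) * len(responses)

complete = prorated_sum([1, 2, 3, 4])               # nothing missing
one_missing = prorated_sum([4, None, 5, 3])         # 3 of 4 items answered
too_sparse = prorated_sum([4, None, None, None])    # only 1 of 4 answered
print(complete, one_missing, too_sparse)
```

This option does not exist for a single-item measure: if the one item is missing, the score is missing.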


Additional Measurement Issues: Health Disparities Research

Measurement adequacy and equivalence in diverse groups

Group Comparisons Are Even More Problematic

Health disparities studies involve comparing mean levels of health

Requires conceptual equivalence
Also, if psychometric properties are not comparable across groups…
– potential true differences may be obscured
– observed group differences may be inaccurate

Why Not Use Culture-Specific Measures?

Measurement goal is to identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias
Most health disparities studies require comparing mean scores across diverse groups
– need comparable measures

Issues Concerning Group Comparisons

Disparities in observed scores can be due to
– culturally- or group-mediated differences in true score (true differences)
-- OR --
– bias: systematic differences between group observed scores not attributable to true scores

Bias - A Special Concern

Measurement bias may make group comparisons invalid

Bias can be due to group differences in:

– the meaning of concepts or items

– the extent to which measures represent concepts

– cognitive processes of responding

– appropriateness of methods

Psychometric Adequacy in One Group

               Adequacy in 1 Group                    Equivalence Across Groups
Conceptual     Concept meaningful within one group    Concept equivalent across groups
Psychometric   Psychometric properties meet minimal   Psychometric properties invariant
               standards within one group             (equivalent) across groups

Psychometric Adequacy in a Diverse Group

Psychometric properties meet minimal standards
– Adequate reliability/reproducibility
– Confirmation of theoretically-based factor structure
– Construct validity evidence
– Responsiveness to change evidence

Psychometric Adequacy in a Diverse Group (cont.)

Measures have similar measurement properties in a diverse group as in original mainstream groups on which the measures were developed, i.e., similar
– reliability
– factor structure
– construct validity
– responsiveness to change

Psychometric Equivalence

               Adequacy in 1 Group                    Equivalence Across Groups
Conceptual     Concept meaningful within one group    Concept equivalent across groups
Psychometric   Psychometric properties meet minimal   Psychometric properties invariant
               standards within one group             (equivalent) across groups

Equivalence of Factor Structure: Psychometric Invariance

Psychometric invariance (equivalence)
– Important properties of theoretically-based factor structure of measurement model do not vary across groups

Methods for Assessing Equivalence of Factor Structure

Exploratory factor analysis
– Two or more groups
– Subjective comparison of factor structure
Confirmatory factor analysis
– Two or more groups
– Test for equivalence of factor structure
» test fit of theoretical model to data


The Problem

You are beginning a study
You know the concepts (variables) of interest
Question: Which measure of ________ should I use?
» A popular measure
» One that a colleague used successfully
» Create your own

Basic Steps in Selecting Appropriate Measures

1. Specify context (research question, target group)
2. Define concept for your study
3. Review potential measures for:
a) conceptual match to your definition
b) adequate psychometric properties in your target group
5. Pretest potential measures in your target group
6. Choose best ones based on pretest results OR
7. Adapt if necessary to address problems

1. Specify Context

A. Research question and how concept fits research

B. Nature of target population

C. Practical constraints

1A. Context: How Concept Fits Research Question

State problem or question being addressed
Describe purpose of measure

– Evaluate intervention (outcome)

– Describe population

– Covariate

– Independent variable

Outcome Measures of Interventions: Entire Study Depends on These

Requires special attention to selecting the best measure that …
– taps content areas that the intervention is likely to change
– has good variability at baseline, room to improve
– has excellent reliability and validity
– is appropriate and acceptable to target population
– is sensitive to change

Main Dependent Variable of Non-intervention Studies

Pay special attention to selecting the best measure that …
– taps full content of concept
– has good variability (variance to predict)
– has evidence of reliability and validity

– is appropriate and acceptable to target population

1B. Context: Nature of Population

Describe known characteristics of your target population
– Age (range, mean)
– Range of health states
» chronic conditions, frailty

– SES (e.g. educational level)

– % with literacy problems

– Racial/ethnic and language diversity

1C. Context: Practical Constraints

Time frame for completing study
Personnel available
– Research assistants, interviewers
Other costs
– Data entry, mailings, phone, coding
Preferred method of administration
Acceptable respondent burden

Step 2: Define Each Concept For Your Study

Define each concept from your perspective, taking into account
– Your study questions
– Your target population
For outcome concepts:
– Describe how the intervention or independent variables might affect it
– Describe specific types of changes you expect

Define Each Concept (cont.)

Include response dimension in definition (what is it about the concept you are interested in?)
– Frequency

– Intensity

– Proportion of time

– Whether they have condition/symptom

Example: Defining Pain in Your Study

Context: clinical intervention to minimize stomach pain

Define exactly how you expect to reduce pain:
– eliminate pain completely?
– reduce severity of pain when it occurs?
– reduce frequency of pain?
– change quality of pain?

Concept you aim to improve varies across these

Step 3. Review Potential Measures

Identify candidate measures for all domains or concepts in your framework

For health outcomes:
– Generic or condition-specific profiles of multiple domains OR measures of single domains
Redundancy OK for now
Do NOT develop your own questions unless it is absolutely necessary

Locating Specific Measures

Reference databases
– MEDLINE, PubMed, PsycINFO, others
Compendia of measures
– Books that compile and review various measures
Web is fast becoming the best resource
– Specific measures
– Web resources from measurement core

Identify researchers doing work in a field and contact them for their measures

Review Potential Measures for:

Conceptual appropriateness & relevance
– in your study
– in target group
Clear scoring rules
Psychometric adequacy in target group(s)
Practicality
Acceptability
– To respondents and interviewers

Conceptual Relevance

Example: you are interested in reports of perceived discrimination in the health care setting

In reviewing measures of discrimination, you find that most are about
– Discrimination over the lifecourse
– Discrimination in various life settings (work, school)
Not relevant for your purpose

Psychometric Adequacy for Your Study

In samples similar to yours:
– good variability (e.g., no floor or ceiling effects)
– low percent of missing data
– good reliability
– good validity
As an outcome for your planned intervention
– responsiveness, sensitivity to change in similar population
– able to detect expected magnitude of change

Limited Data on Measurement Properties of Many Measures

Not easy to find this information
Many studies do not report any psychometric properties
– Assume the properties from original study carry over

Limited Data on Measurement Properties of Many Measures (cont.)

Especially in diverse populations:

– Few studies test measures across diverse groups

– Even when diverse groups are included in research
» sample sizes usually too small to conduct measurement studies by subgroups

Review Measures for Practicality

Method of administration appropriate for your study

Scoring rules clearly documented, or computer scoring algorithm available

Measure available at cost you can afford
You are allowed to adapt it if necessary
Costs of administration within study resources

Practical Considerations

Once you have decided on the measures, you must think about:

• Obtaining permission
• Method of administration
• Data collection
• Scoring
• Availability of translations if needed

Practical - Scoring

Know ahead of time how you plan to score the items
– Count of “correct” answers?
– Sum Likert items into a summated scale?

Are scoring instructions or computer scoring programs available?

Can scoring programs be purchased from developers?

Do you have a scoring codebook?

Review Measures for Availability of Translations if Needed

If you need the questionnaire in another language, are there translations available?
– Official (published and tested)

– Unofficial (by some other researcher)

Translation Availability

Is the measure available in the language of your target populations?

Yes:
• Know the method of translation
• Assess adequacy or quality of translation
No:
• Perform double translation
• Use bilingual, bicultural translators

Review Measures for Acceptability

Acceptability is the ease with which a measure can be used in your setting and population

Acceptability to target population
– respondent burden (length, time needed), distress
– burden for sickest, oldest, least educated
– culturally sensitive
Acceptability to interviewers
– interviewer burden
– do they like administering the questionnaire?
– amount of training needed

Respondent Burden

Diverse populations may have more difficulty with instruments, take longer to complete

Perceived burden
– a function of item difficulty, distress due to content, perceived value of survey, expectations of length
– is as important as actual burden

5. Choose Best Measures to Pretest in Your Target Population

Select best measures for all concepts in your conceptual framework
– existing instrument in its entirety

– subscales of relevant domains (e.g., only those that meet your needs)

Pretest

Pretesting essential for priority measures (e.g., outcomes)
Pretest is to identify:
– problems with method of administration
– unacceptable respondent burden
– problems with questions or response choices
» Hard to understand, complex, vague
– words and phrases that do not mean to the target population what you intended

Types of Pretests

General pretest, small (N=10)
Cognitive interviewing (N=5-10 each group)
Large pretest (N=100)
– test measurement properties prior to major study

General Pretest (Small): Debriefing Pretest

Goal
– Find out how well subjects do with the procedures
– Estimate time needed to complete instrument
– Identify serious problems
Procedures
– Subjects answer entire questionnaire
– At end, debrief
– Close to true task

Debriefing Questions After Administration of Survey

Ask respondents:
Were any questions confusing?
Which words were hard to understand?
Which questions were difficult to answer? Which caused distress?
Was questionnaire too long?
Confusing instructions?

Problems with General Pretests

Respondents…
– often don’t understand the task
– don’t want to appear as if they didn’t understand
– have a hard time telling you anything was wrong
– find it easier to say everything was fine

Pretest Several Measures of Same Concept?

If you are unsure about which of several measures will be appropriate for your study
– pilot test all you are considering
– can use pilot test results to select best one
Saves time
– if test only one measure and it has many problems, have to repeat entire process for next candidate measure

Conduct Pretests in All Diverse Groups Being Included in Your Study

Important to recruit people from each of your target populations

– Won’t learn anything if you just recruit friends, persons easy to recruit

Cognitive Interviewing

Individual in-depth interviews using open-ended probes to assess
– how items are interpreted
– adequacy of response choices

Typically 1.5 hr interview

Cognitive Interviewing Helps You Learn About the 4 Steps in Answering Questions

Interpret and understand the question
– as intended by the researchers
Retrieve the information
– various schemas used to access memory
Judgment formation: formulate an answer
– calculate or judge the correct information
Edit response: decide what to report
– is answer embarrassing, socially undesirable?

Summary

Selecting best measures is critical to validity of research

Very little published information on measurement properties in diverse groups
– New area of focus and policy attention

– Raises issues of conceptual and psychometric adequacy and equivalence

Pretesting is the most important thing you can do

Conclusions

Methods described here are “ideal”
– Impractical for most researchers
Apply these methods to your most important measures
– e.g., outcomes, key independent variables
Keep learning
– Good, appropriate measures remain the foundation of excellent research
