
Three main points to be covered

• Nature, weakness, and (sometimes) strength of studies using group-level observations

• Cohort study as gold standard and its assumptions and limitations

• Concept of the study base linking case-control design to the cohort design

Studies making observations on groups of individuals vs. individuals

• Studies using group-level data are usually called ecological studies

• Two main weaknesses:
– ecological fallacy
– very limited control of confounding

• One (sometimes) strength:
– some exposures may be best measured at area or group level

Example from Szklo and Nieto of grouped data from cohorts in the Seven Countries Study

Ecological Fallacy

• Cannot tell whether the relationship between the predictor and the outcome at the group level holds at the individual level

• In this example: Are the individuals in the cohorts eating more saturated fat the same individuals experiencing more CHD deaths?

• Sometimes called confounding at the group level
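A minimal sketch of the ecological fallacy, using made-up numbers rather than the Seven Countries data. The three groups, the exposure and outcome values, and the helper names `pearson` and `group_means` are all assumptions for illustration. The group means rise together, yet within every group the individuals with higher exposure have the lower outcome.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (exposure, outcome) pairs for individuals in three groups.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}


def group_means(members):
    """Mean exposure and mean outcome for one group."""
    exposures = [e for e, _ in members]
    outcomes = [o for _, o in members]
    return sum(exposures) / len(exposures), sum(outcomes) / len(outcomes)


# Group-level analysis: one (mean exposure, mean outcome) point per group.
means = [group_means(m) for m in groups.values()]
group_level_r = pearson([e for e, _ in means], [o for _, o in means])

# Individual-level analysis: correlation computed separately inside each group.
within_group_r = {
    name: pearson([e for e, _ in m], [o for _, o in m])
    for name, m in groups.items()
}

print("group-level r:", group_level_r)
print("within-group r:", within_group_r)
```

With only the three group means in hand, an analyst could not distinguish this situation from one in which the individual-level relationship also runs in the positive direction.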

Confounding in group data

• If no ecological fallacy, still left with possible confounding: some third variable causing an increase in CHD deaths and also related to consumption of fat (e.g., exercise)

• Difficult to control for because measures may not be available

• Even if data available, don’t know relationship of confounding variable to other two variables at individual level

Example of the potential strength of measures at group level: Effect of Floods in Bangladesh in 1988 on Children

• Children aged 2 to 9 years sampled 6 months before the flood and 5 months after

• Outcomes: enuresis and aggressive behavior

• Individual-level predictor: individual danger of drowning

• No association seen at individual level

• At group level, before-and-after flood comparison showed significant difference

Situations where group level variables may be better

• Exposures without much within-group variability, or with a threshold effect (e.g., salt consumption in the U.S.)

• Herd immunity in studying infectious disease (vaccination levels may be more informative than individual behavior)

• Exposures that have powerful effects at group level (Bangladesh flood -- may also be example of a threshold effect)

Ecological Studies: Summary

• As text emphasizes, common view that they are only hypothesis-generating is inadequate

• Weakest design for establishing causality but has a role because inexpensive and easy to do

• For some situations and kinds of data may actually be superior

• Some variables can only be measured at group level (policies and laws, environment)

Cohort Study Design

• Mimics individual’s progress through life and accompanying disease risk

• Gold standard because exposure/risk factor is observed before the outcome occurs

• Randomized trial is a cohort design with exposure assigned rather than observed

• Other study designs can be understood by how they sample the experience of a cohort

Cohort study design: censored observations = losses to follow-up

Minimum loss to follow-up (1%)

Time of Cohort Follow-up vs. Time when measurements made

• Concurrent cohorts give most control because measurements are made at the same time as cohort assembly and follow-up (most texts call these prospective cohorts)

• Non-concurrent cohorts rely on obtaining measurements made in the past (most texts call these retrospective cohorts)

• Mixed cohorts obtain some measures made in the past and rest at same time as follow-up

Selecting a non-concurrent cohort from a current administrative data base

• Not a cohort study if you sample persons currently in the data base in order to ensure retrospective data from past years
– cross-sectional sample
– no loss to follow-up by definition

• Must sample individuals from some baseline in the past in the data base
– ascertain outcome and losses to follow-up from that time forward
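The distinction above can be sketched with a toy enrollment table. The `Member` record, its fields, and the years are hypothetical, not taken from any real administrative data base. Defining the cohort by presence at baseline keeps the members who later leave; selecting from people present at the end silently drops them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    member_id: int
    first_year: int  # first year enrolled in the data base
    last_year: int   # last year enrolled (inclusive)


# Hypothetical enrollment records.
members = [
    Member(1, 2010, 2020),
    Member(2, 2012, 2016),  # leaves before the end of follow-up
    Member(3, 2014, 2020),
    Member(4, 2015, 2017),  # leaves before the end of follow-up
    Member(5, 2018, 2020),  # joins after baseline
]

BASELINE, END = 2015, 2020


def enrolled_in(member, year):
    """True if the member was enrolled during the given year."""
    return member.first_year <= year <= member.last_year


# Correct: define the cohort by presence at baseline, then follow forward.
cohort = {m.member_id for m in members if enrolled_in(m, BASELINE)}
lost = {
    m.member_id
    for m in members
    if enrolled_in(m, BASELINE) and m.last_year < END
}

# Incorrect: start from people present at the end and require past data.
present_at_end_with_history = {
    m.member_id
    for m in members
    if enrolled_in(m, END) and enrolled_in(m, BASELINE)
}

print("cohort at baseline:", sorted(cohort))
print("lost to follow-up:", sorted(lost))
print("selected by presence at end:", sorted(present_at_end_with_history))
```

In this sketch the second selection contains nobody who was lost, which is why it behaves like a cross-sectional sample with no loss to follow-up by definition.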

Non-concurrent cohort study cannot be defined by presence at end of follow-up

[Figure labels: "Not the cohort"; "This is the cohort"]

Main Threats to Validity of a Cohort Study

• Subjects lost during follow-up

– Number of losses is less important than how losses are related to outcome and risk factor

• Ascertainment of outcome

– Ascertainment should be complete and unbiased with respect to risk factor

Subjects lost during follow-up

• If losses are random, only power is affected

• If disease incidence is the focus, losses related to the outcome bias the results

• If association of risk factor to disease is focus, losses bias results only if related to both outcome and the risk factor

• If losses introduce bias in the outcome, the censoring is called informative censoring
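The bullets above can be checked with expected counts in a made-up cohort. All counts and retention probabilities below are hypothetical, and the sketch uses the odds ratio as the measure of association, an assumption chosen because the claim about losses related to outcome alone holds exactly for it.

```python
# Expected counts in a hypothetical cohort: (cases, non-cases) by exposure.
full_cohort = {"exposed": (200, 800), "unexposed": (100, 900)}


def retain(cohort, keep):
    """Apply retention probabilities keep[(exposure, is_case)] to each cell."""
    return {
        exposure: (
            cases * keep[(exposure, True)],
            noncases * keep[(exposure, False)],
        )
        for exposure, (cases, noncases) in cohort.items()
    }


def odds_ratio(cohort):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    a, b = cohort["exposed"]
    c, d = cohort["unexposed"]
    return (a / b) / (c / d)


def overall_risk(cohort):
    """Proportion of all retained subjects who are cases."""
    cases = sum(c for c, _ in cohort.values())
    total = sum(c + n for c, n in cohort.values())
    return cases / total


# Losses unrelated to exposure or outcome: 80% retained in every cell.
random_loss = {
    ("exposed", True): 0.8, ("exposed", False): 0.8,
    ("unexposed", True): 0.8, ("unexposed", False): 0.8,
}
# Losses related to outcome only: cases are retained less often.
outcome_only = {
    ("exposed", True): 0.5, ("exposed", False): 0.9,
    ("unexposed", True): 0.5, ("unexposed", False): 0.9,
}
# Losses related to both: only exposed cases are retained less often.
outcome_and_exposure = {
    ("exposed", True): 0.5, ("exposed", False): 0.9,
    ("unexposed", True): 0.9, ("unexposed", False): 0.9,
}

for label, keep in [
    ("random", random_loss),
    ("outcome only", outcome_only),
    ("outcome and exposure", outcome_and_exposure),
]:
    observed = retain(full_cohort, keep)
    print(label, "OR:", odds_ratio(observed), "risk:", overall_risk(observed))
```

Under these assumptions, random losses leave both the overall risk and the odds ratio unchanged, outcome-related losses distort the risk but not the odds ratio, and losses tied to both exposure and outcome distort the odds ratio as well.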

Crucial issue is who is leaving cohort: what bias do the censored observations (losses to follow-up) introduce?

Same issue with ascertainment of outcome events.


Two Cohort Studies of HCV/HIV Coinfection and Risk of AIDS

• Swiss HIV Cohort (Greub, Lancet, 2000)
– 3111 patients, 1996-1999
– At least two visits
– Median follow-up 28 months
– HCV+ more rapid disease progression
– Adj RH = 1.7 (95% CI = 1.3 - 2.3)
– No loss to follow-up info

• Johns Hopkins Cohort (Sulkowski, JAMA, 2002)
– 1955 patients, 1995-2001
– At least two visits
– Median follow-up 25 months
– HCV not associated with disease progression
– Adj RH = 1.0 (95% CI = 0.9 - 1.2)
– No loss to follow-up info

Case-Control Design: Concept of the Study Base

• Study Base = the population that gave rise to the cases (Szklo and Nieto call it the “reference population”)

• Key concept that shows the link between case-control design and cohort design

• Case-control design using the study base concept is most easily understood in the setting of a cohort study

Nested Case-Control Study within a Cohort Study: Study Base = Cohort

Controls Sampled each time a Case is diagnosed = Incidence Density

Nested Case-Control Study

• In text example, 4 cases occur at 4 different points in time giving rise to 4 risk sets of cases and controls

• Controls for each case are selected at random in each risk set from cohort subjects under follow-up at the time

• It follows from the random selection that a control can later become a case

• Results can be just as valid as using entire cohort; gives unbiased estimate of rate ratio
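Risk-set (incidence density) sampling as described above might be sketched as follows. The six-subject cohort, the `Subject` record, and the function names are hypothetical, and the sketch assumes a closed cohort in which everyone enters follow-up at time 0.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    subject_id: int
    exit_time: float  # time of diagnosis or of censoring
    is_case: bool     # True if follow-up ended with diagnosis


# Hypothetical closed cohort; all subjects enter follow-up at time 0.
cohort = [
    Subject(1, 2.0, True),
    Subject(2, 3.0, False),
    Subject(3, 5.0, True),
    Subject(4, 6.0, False),
    Subject(5, 8.0, True),
    Subject(6, 10.0, False),
]


def risk_set(cohort, case):
    """Subjects other than the case still under follow-up at its diagnosis."""
    return [
        s for s in cohort
        if s.subject_id != case.subject_id and s.exit_time >= case.exit_time
    ]


def sample_controls(cohort, n_controls, rng):
    """For each case, draw up to n_controls at random from its risk set."""
    matched = {}
    for case in (s for s in cohort if s.is_case):
        eligible = risk_set(cohort, case)
        k = min(n_controls, len(eligible))
        matched[case.subject_id] = rng.sample(eligible, k)
    return matched


matched_sets = sample_controls(cohort, n_controls=2, rng=random.Random(0))
for case_id, controls in matched_sets.items():
    print("case", case_id, "controls", [c.subject_id for c in controls])
```

Subject 3 is eligible as a control for the case diagnosed at time 2.0 and is itself diagnosed at time 5.0, which is the sense in which a control can later become a case.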

Definition of a Primary Study Base

• Primary Study Base = population that gives rise to cases that can be defined before cases appear by a geographical area or some other identifiable entity like a health delivery system

• Nested within a cohort is a special case

Examples of Primary Study Bases

• Residents of San Francisco during 2001

• Members of the Kaiser Permanente system in the Bay Area during 2001

• Military personnel stationed at California bases during 2001

Example of Case-Control Incidence Density Sampling in a Primary Study Base

• Use cancer registry covering San Francisco County to identify all new cases of glioma during a defined time period

• At time each new glioma case is reported, randomly sample two controls from current residents of San Francisco

Incidence Density Sampling in a Primary Study Base (e.g., San Francisco County)

[Figure: nested case-control in an open cohort with new subjects entering; labels: "New residents", "Primary Study Base"]

Case-Control Incidence Density Sampling in a Primary Study Base

• Same as nested case-control sampling in a cohort study with exception that in-migration of new persons requires one additional assumption

• Just as losses to the study base should not bias the results, additions to the study base should not introduce bias

Case-Based Case-Control Study: The Secondary Study Base

• Secondary Study Base = population that gave rise to cases, identified after cases diagnosed; those persons who would have been among the cases if they had developed the disease during the time period of study

• Start with cases and then attempt to identify the hypothetical cohort that gave rise to them

Primary vs. Secondary Base

• Main problem with a primary base is often ascertainment of all cases

• Main problem with a secondary base is the definition of the base

Case-Based Case-Control Studies and the Secondary Study Base

• Source of cases is often one or more hospitals or other medical facilities

• Problem is identifying the population who would come to those institutions if they were diagnosed with the disease

• Careful consideration has to be given to factors causing someone to show up at that institution with that diagnosis

Case-control study starting with a sample of cases and identifying the secondary study base

Sampling can be incidence density just as in a primary study base

[Figure label: Secondary study base]

Case-Based Case-Control Studies

• Example: glioma cases seen at UCSF

• Difficult because referrals come from many areas

• One possible control group might be UCSF patients with a different neurologic disease

• Patients from a similar tertiary referral clinic are another possible control group

Text example of case-based case-control design shows sampling prevalent controls

[Figure label: Secondary Study Base]

Cross-Sectional Study Design

Case-based design using prevalent cases: essentially same as cross-sectional design

Example of case-based design using prevalent cases

• Sampling glioma patients under treatment in a hospital during study period

• Poor survival so patients in treatment will over-represent those who live longest

• The nature of the resulting bias is variable and not predictable
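A rough illustration of why patients in treatment over-represent long survivors. The two survival subgroups and their durations are invented for the sketch, and it assumes a steady state in which each subgroup's share of prevalent cases is proportional to its share of new cases times its mean duration.

```python
def prevalent_share(incident_share, mean_duration):
    """Share of each subgroup among prevalent cases at steady state.

    Each subgroup is weighted by (share of incident cases) x (mean duration).
    """
    weights = {g: incident_share[g] * mean_duration[g] for g in incident_share}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical: half of new cases survive 1 month, half survive 12 months.
incident_share = {"short_survivors": 0.5, "long_survivors": 0.5}
mean_duration_months = {"short_survivors": 1.0, "long_survivors": 12.0}

shares = prevalent_share(incident_share, mean_duration_months)
print(shares)
```

Under these assumed numbers the long survivors are half of the incident cases but the large majority of the cases found in treatment at any one time. Any exposure related to survival would be distorted accordingly.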

Study base and case-control design

Critical features of best case-control design:

- cases need to consist of all, or a random sample, of subjects in the study base experiencing the outcome

- controls need to consist of a sample of the study base that can be used to estimate the distribution of the exposure (risk factor) in the base
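The second feature can be illustrated with expected counts in a made-up study base. The person-time, case counts, and control numbers are hypothetical. When controls mirror the distribution of exposed and unexposed person-time in the base, the exposure odds ratio reproduces the rate ratio; when they do not, it does not.

```python
def rate_ratio(cases_exp, pt_exp, cases_unexp, pt_unexp):
    """Incidence rate in the exposed divided by the rate in the unexposed."""
    return (cases_exp / pt_exp) / (cases_unexp / pt_unexp)


def exposure_odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (cases_exp / cases_unexp) / (controls_exp / controls_unexp)


# Hypothetical study base: person-years and cases by exposure.
pt_exposed, pt_unexposed = 2000.0, 8000.0
cases_exposed, cases_unexposed = 40, 80

# Expected controls when sampling in proportion to person-time in the base.
n_controls = 240
controls_exposed = n_controls * pt_exposed / (pt_exposed + pt_unexposed)
controls_unexposed = n_controls - controls_exposed

base_rr = rate_ratio(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed)
valid_or = exposure_odds_ratio(
    cases_exposed, cases_unexposed, controls_exposed, controls_unexposed
)

# A hypothetical control group that over-represents exposed persons.
biased_or = exposure_odds_ratio(cases_exposed, cases_unexposed, 96.0, 144.0)

print("rate ratio in base:", base_rr)
print("OR, controls representative of base:", valid_or)
print("OR, controls over-representing exposure:", biased_or)
```

The cohort analysis needs the full person-time denominators, while the case-control analysis needs only a control sample that estimates their ratio.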

Summary Points

• Ecological studies weak in showing cause but have some valuable features

• Nature, not the size, of losses to follow-up crucial in cohort studies

• Key to case-control design is specifying and sampling the study base

• Case-control results can be as valid as cohort results if study properly designed and measurements made without bias
