case-control studies: statistical analysis

Greg Stoddard December 16, 2010 University of Utah School of Medicine Case-Control Studies: Statistical Analysis


Page 1: Case-Control Studies:  Statistical Analysis

Greg Stoddard

December 16, 2010

University of Utah School of Medicine

Case-Control Studies: Statistical Analysis

Page 2: Case-Control Studies:  Statistical Analysis

Rothman claims, “Properly carried out, case-control studies provide information that mirrors what could be learned from a cohort study, usually at considerably less cost and time.”

[Rothman KJ, Epidemiology: An Introduction, 2002, p.73]

Goal: contrast the statistical approaches of the two study designs to verify Rothman’s claim.

Page 3: Case-Control Studies:  Statistical Analysis

Diagrammatically,

Cohort study (start from exposure, observe disease):

E        D
         not-D

not-E    D
         not-D

Case-Control Study (start from disease, look back at exposure):

D        E
         not-E

not-D    E
         not-E

Page 4: Case-Control Studies:  Statistical Analysis

Data Layout (N = fixed, n = free to vary)

Cohort Study

           E        Not-E
D          a        b          n_D
Not-D      c        d          n_not-D
           N_E      N_not-E

Case-Control Study

           E        Not-E
D          a        b          N_D
Not-D      c        d          N_not-D
           n_E      n_not-E

Page 5: Case-Control Studies:  Statistical Analysis

Cohort Study

incidence proportion = disease cases / persons at risk

           E        Not-E
D          a        b          n_D
Not-D      c        d          n_not-D
           N_E      N_not-E

Case-Control Study

incidence proportion = (not estimable)

           E        Not-E
D          a        b          N_D
Not-D      c        d          N_not-D
           n_E      n_not-E

Page 6: Case-Control Studies:  Statistical Analysis

The incidence proportion not being estimable is not much of a shortcoming.

Given a study’s inclusion/exclusion criteria, the incidence proportion does not actually apply to a very wide patient population, anyway.

Page 7: Case-Control Studies:  Statistical Analysis

The goal is not to estimate incidence, but rather to assess an exposure-disease association.

We can do that just fine with relative measures of effect, the risk ratio and odds ratio.

Page 8: Case-Control Studies:  Statistical Analysis

Cohort Study

risk ratio = (a/N_E)/(b/N_not-E)

odds ratio = odds(D|E)/odds(D|not-E)
           = (a/c)/(b/d) = (ad)/(bc)

           E        Not-E
D          a        b          n_D
Not-D      c        d          n_not-D
           N_E      N_not-E

Case-Control Study

exposure odds ratio = odds(E|D)/odds(E|not-D)
                    = (a/b)/(c/d) = (ad)/(bc)

           E        Not-E
D          a        b          N_D
Not-D      c        d          N_not-D
           n_E      n_not-E

So, as long as either E or D is free to vary, you get the same relative effect measure, the odds ratio, with both study designs.
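A minimal sketch of this equality, using hypothetical counts that are not from the slides. It assumes an idealized case-control sample that keeps every case and exactly one in ten non-cases, so the exposure split among the non-cases is preserved without sampling noise.

```python
from fractions import Fraction

def disease_odds_ratio(a, b, c, d):
    """odds(D|E) / odds(D|not-E) = (a/c) / (b/d), as computed in a cohort study."""
    return Fraction(a, c) / Fraction(b, d)

def exposure_odds_ratio(a, b, c, d):
    """odds(E|D) / odds(E|not-D) = (a/b) / (c/d), as computed in a case-control study."""
    return Fraction(a, b) / Fraction(c, d)

# Hypothetical cohort counts (illustrative only).
a, b, c, d = 40, 20, 960, 980

cohort_or = disease_odds_ratio(a, b, c, d)

# Idealized case-control sample: all cases, one in ten non-cases.
case_control_or = exposure_odds_ratio(a, b, c // 10, d // 10)

print(float(cohort_or), float(case_control_or))
```

Both expressions reduce to (ad)/(bc), and the sampling fraction applied to the non-case row cancels out of the ratio.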

Page 9: Case-Control Studies:  Statistical Analysis

Cohort Study

If the disease is rare (<10% in both the E and Not-E groups), then a is small relative to c and b is small relative to d, so a + c ≈ c and b + d ≈ d.

The risk ratio is

$$RR = \frac{a/N_E}{b/N_{\text{not-}E}} = \frac{a/(a+c)}{b/(b+d)}$$

Substituting,

$$RR = \frac{a/(a+c)}{b/(b+d)} \approx \frac{a/c}{b/d} = \frac{ad}{bc} = OR$$

So, OR from case-control study approximates RR from cohort study, when the rare disease assumption is met.

           E        Not-E
D          a        b          n_D
Not-D      c        d          n_not-D
           N_E      N_not-E
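To see how quickly the approximation degrades, the sketch below fixes a true risk ratio of 2 and computes the matching odds ratio at several baseline risks. The risk ratio and the baseline risks are illustrative assumptions, not values from the slides.

```python
def odds_ratio_for(risk_unexposed, risk_ratio):
    """Odds ratio implied by a given unexposed risk and risk ratio."""
    risk_exposed = risk_unexposed * risk_ratio
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

# Illustrative baseline risks; the true risk ratio is held at 2 throughout.
for baseline_risk in (0.01, 0.05, 0.10, 0.40):
    print(baseline_risk, round(odds_ratio_for(baseline_risk, 2.0), 2))
```

At a 1% baseline risk the odds ratio stays close to 2, while at a 40% baseline risk it is three times the risk ratio.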

Page 10: Case-Control Studies:  Statistical Analysis

Why the 10%, or 0.10, incidence proportion is a good cutpoint for “rare disease” is illustrated nicely in a figure published in:

Zhang J, Yu KF. What’s the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998;280(19):1690-91.

Page 11: Case-Control Studies:  Statistical Analysis

Aside:

The formula in Zhang and Yu (1998) for converting an odds ratio to a risk ratio in cohort studies has been convincingly criticized as unreliable (Zou, 2004), so you should avoid using it.

[Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159(7):702-706.]

Page 12: Case-Control Studies:  Statistical Analysis

Checking our progress

How far have we gotten in verifying that a case-control study can mirror what can be learned in a cohort study?

Page 13: Case-Control Studies:  Statistical Analysis

Checking our progress

We have seen that the OR is the same in both study designs.

We have seen that the OR approximates the RR under the rare disease assumption, and so it has a straightforward interpretation.

Page 14: Case-Control Studies:  Statistical Analysis

Checking our progress

However, cohort studies rarely use the odds ratio, nor do they use the risk ratio.

Instead, cohort studies use survival analysis.

Why?

Page 15: Case-Control Studies:  Statistical Analysis

Risk Ratio Analysis

This type of analysis ignores time-at-risk. That is, it assumes an equal follow-up time for every study subject.

Page 16: Case-Control Studies:  Statistical Analysis

The risk ratio uses partial information (shown in blue) from the complete data in the life table.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0
Total       90      25                         110     20

Page 17: Case-Control Studies:  Statistical Analysis

Risk ratio analysis data

               Exposed      Not-Exposed
Disease        25 (50%)     20 (40%)
Not-Disease    25           30
N              50           50

Risk Ratio = (25/50)/(20/50) = 1.25

Chi-square test, p = 0.31

Analyzing these data in this way, we do not demonstrate a significant effect. In fact, this crude RR underestimates each of the day-specific RR estimates.
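The crude risk ratio and its test can be reproduced from the 2 x 2 counts. This is a sketch that assumes the slide's chi-square test is the Pearson statistic without continuity correction; the slide does not say which version was used.

```python
import math

exposed_cases, exposed_n = 25, 50
unexposed_cases, unexposed_n = 20, 50

risk_ratio = (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

# Pearson chi-square for a 2 x 2 table (assumed: no continuity correction).
a, b = exposed_cases, unexposed_cases
c, d = exposed_n - a, unexposed_n - b
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(risk_ratio, round(chi2, 2), round(p_value, 2))
```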

Page 18: Case-Control Studies:  Statistical Analysis

Rate Ratio Analysis

Let’s see if we can do better with a rate ratio analysis. It uses a person-time denominator, so in that sense, it relaxes the equal time-at-risk assumption of the risk ratio analysis.

Page 19: Case-Control Studies:  Statistical Analysis

The rate ratio uses partial information (shown in blue) from the complete data in the life table.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0
Total       90      25                         110     20

Page 20: Case-Control Studies:  Statistical Analysis

Rate ratio analysis data

               Exposed      Not-Exposed
Disease        25 (50%)     20 (40%)
Person-Days    90           110

Rate Ratio = (25/90)/(20/110) = 1.53

Binomial probability mid-p exact test for person-time data, p = 0.080

Analyzing these data in this way, we almost demonstrate a significant effect. Again, this crude rate ratio underestimates each of the day-specific risk ratio (rate ratio) estimates.
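A sketch of the rate ratio and a mid-p test for person-time data. It assumes the usual conditional setup: given the 45 total cases, the number of exposed cases is binomial under the null with probability equal to the exposed share of person-time. The slide does not state whether its p-value is one- or two-sided, so both are computed; the two-sided value uses the doubling convention, which is one choice among several.

```python
from math import comb

cases_exposed, person_days_exposed = 25, 90
cases_unexposed, person_days_unexposed = 20, 110

rate_ratio = (cases_exposed / person_days_exposed) / (
    cases_unexposed / person_days_unexposed
)

total_cases = cases_exposed + cases_unexposed
# Null probability that a case falls in the exposed group.
p0 = person_days_exposed / (person_days_exposed + person_days_unexposed)

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Mid-p: full weight on more extreme outcomes, half weight on the observed one.
upper_tail = sum(
    binom_pmf(k, total_cases, p0) for k in range(cases_exposed + 1, total_cases + 1)
)
mid_p_one_sided = upper_tail + 0.5 * binom_pmf(cases_exposed, total_cases, p0)
mid_p_two_sided = min(1.0, 2 * mid_p_one_sided)

print(round(rate_ratio, 2), mid_p_one_sided, mid_p_two_sided)
```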

Page 21: Case-Control Studies:  Statistical Analysis

Inefficient Use of Time in Rate Ratio Analysis

The rate ratio analysis failed to convey the information in the life table because it considers only the ratio of cases to total person-time (mean time x N), without distinguishing times to event from times to censoring.

Page 22: Case-Control Studies:  Statistical Analysis

person-time = total time for subjects
            = mean time x N

Suppose the individual times-at-risk for a sample are 10, 20, and 30. The person-time is computed as:

PT = total time for subjects = 10 + 20 + 30 = 60

which is equivalent to:

PT = mean time x N = (10 + 20 + 30)/3 x 3 = 20 x 3 = 60

Page 23: Case-Control Studies:  Statistical Analysis

So, a rate ratio analysis would find the following two scenarios equal, even though Group B outperforms Group A (let x----x denote time):

Group A

x-------------------------------------x (censored)
x-----x (died)
x--------x (died)
x--------------------------------------------x (censored)

Group B

x-------------------------------------x (died)
x-----x (censored)
x--------x (censored)
x--------------------------------------------x (died)
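The two scenarios can be put into numbers. The durations below are hypothetical stand-ins for the line lengths in the diagram; only the pattern (same four follow-up times, deaths swapped with censorings) matters.

```python
# (time at risk, died?) for each subject; durations are illustrative assumptions.
group_a = [(37, False), (5, True), (8, True), (44, False)]
group_b = [(37, True), (5, False), (8, False), (44, True)]

def incidence_rate(subjects):
    """Deaths divided by total person-time."""
    deaths = sum(1 for _, died in subjects if died)
    person_time = sum(time for time, _ in subjects)
    return deaths / person_time

def death_times(subjects):
    """Follow-up times of the subjects who died."""
    return sorted(time for time, died in subjects if died)

print(incidence_rate(group_a), incidence_rate(group_b))
print(death_times(group_a), death_times(group_b))
```

Both groups have 2 deaths over the same person-time, so the rates match, yet the deaths in Group B occur much later than those in Group A.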

Page 24: Case-Control Studies:  Statistical Analysis

Hazard Ratio Analysis (Survival Analysis)

This analysis uses time-at-risk in a very complete way, using all of the information in the life table.

Page 25: Case-Control Studies:  Statistical Analysis

From Cox regression, HR = 1.92, p = 0.032

The HR is identically the Mantel-Haenszel summary risk ratio.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0
Total       90      25                         110     20
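The summary can be checked by hand from the life table. The sketch below applies the standard Mantel-Haenszel risk ratio formula, treating each follow-up day as a stratum; it does not run a Cox regression.

```python
# (exposed begin N, exposed cases, non-exposed begin N, non-exposed cases) per day
life_table = [
    (50, 5, 50, 2),
    (30, 10, 40, 8),
    (10, 10, 20, 10),
]

numerator = 0.0
denominator = 0.0
for n_exp, cases_exp, n_unexp, cases_unexp in life_table:
    stratum_total = n_exp + n_unexp
    # Mantel-Haenszel weights for a stratified risk ratio.
    numerator += cases_exp * n_unexp / stratum_total
    denominator += cases_unexp * n_exp / stratum_total

mh_risk_ratio = numerator / denominator
print(round(mh_risk_ratio, 2))
```

The result rounds to 1.92, the same value reported for the HR from Cox regression.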

Page 26: Case-Control Studies:  Statistical Analysis

Aside

Showing a life table like this, and pointing out that the HR is just the weighted average of the day-specific risk ratios and so is a relative risk estimate, is a very clear way to explain the HR to a researcher.

Page 27: Case-Control Studies:  Statistical Analysis

Checking our progress

Recall, we are trying to verify that a case-control study can mirror what can be learned in a cohort study.

It appears, then, that we need to incorporate survival analysis into the case-control framework in order to keep up with what a cohort study can do.

Page 28: Case-Control Studies:  Statistical Analysis

It turns out we can use survival analysis in the case-control framework if we tweak the study design slightly.

The slight variant is called the case-cohort design (also called the density case-control design).

Page 29: Case-Control Studies:  Statistical Analysis

While presenting this design, I am going to show some simulation results. In this way, I can demonstrate that the case-cohort design really does perform as well as a cohort study design.

Page 30: Case-Control Studies:  Statistical Analysis

Dataset

The dataset comes from Breslow and Day [Breslow NE, Day NE. Statistical Methods in Cancer Research, Vol II: The Design and Analysis of Cohort Studies. Lyon, France: IARC, 1987.]

Men (n=679) employed in a nickel refinery in South Wales were investigated to determine whether the risk of developing carcinoma of the bronchi and nasal sinuses (ICD = 160), which studies in the 1930s had associated with the refining of nickel, was present in this cohort.

Page 31: Case-Control Studies:  Statistical Analysis

Modified Dataset

I also modified the dataset, to create a second dataset that does not meet the rare disease assumption, by duplicating the cases five times.

Page 32: Case-Control Studies:  Statistical Analysis

Treating this dataset as the “population” and analyzing it in full, we know the answer that a case-control design sampling from this cohort is supposed to reproduce.

Page 33: Case-Control Studies:  Statistical Analysis

The population relative measures are:

Population Relative    Actual Dataset with almost     Augmented Dataset with
Effect Measure         rare disease (3% in            frequent disease (15% in
                       unexposed, 12% in exposed)     unexposed, 40% in exposed)
Odds Ratio             3.76                           3.76
Risk Ratio             3.43                           2.65
Rate Ratio             4.76                           3.87
Hazard Ratio           5.02                           4.19
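The odds ratio and risk ratio rows can be recomputed from the cohort's 2 x 2 counts (46 and 10 tumor cases, 343 and 280 subjects without tumor, as tabulated with the classical case-control design). This sketch assumes that "duplicating the cases five times" means multiplying the tumor row by five; the rate ratio and hazard ratio need the individual follow-up times and are not reproduced here.

```python
def odds_and_risk_ratio(a, b, c, d):
    """Return (OR, RR) for a 2 x 2 table: a, b = cases; c, d = non-cases."""
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + c)) / (b / (b + d))
    return odds_ratio, risk_ratio

actual = odds_and_risk_ratio(46, 10, 343, 280)
# Assumption: the augmented dataset has five copies of every case.
augmented = odds_and_risk_ratio(5 * 46, 5 * 10, 343, 280)

print([round(x, 2) for x in actual])
print([round(x, 2) for x in augmented])
```

Scaling the case row leaves the odds ratio untouched but pulls the risk ratio toward 1, which is why the two columns share an OR and differ in RR.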

Page 34: Case-Control Studies:  Statistical Analysis

Classical Case-Control Study (controls are sampled from the population non-cases only)

Using a 2:1 sampling ratio

            Exposed to nickel    Not exposed to nickel    Total
Tumor       46                   10                       56     (use all 56 cases)
No Tumor    343                  280                      623    (sample 56 x 2 controls)
Total       389                  290                      679

Page 35: Case-Control Studies:  Statistical Analysis

Monte Carlo simulation, computing the OR from 1,000 samples, to get the long-run average of the OR.

(Each sample keeps all 56 subjects from the tumor row of the population 2 x 2 table, and then randomly samples 112 subjects from the no-tumor row of the population 2 x 2 table.)
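A minimal sketch of this simulation using only the 2 x 2 counts. The seed and helper name are arbitrary choices, and the original simulation may have differed in detail (for example, it was presumably run in Stata on the subject-level data).

```python
import random

random.seed(2010)  # arbitrary seed, chosen only for reproducibility

cases_exposed, cases_unexposed = 46, 10
# Exposure indicator (1 = exposed) for the 343 + 280 no-tumor subjects.
non_cases = [1] * 343 + [0] * 280

def one_sample_odds_ratio(n_controls=112):
    """Keep all 56 cases, draw controls without replacement, return the OR."""
    controls = random.sample(non_cases, n_controls)
    controls_exposed = sum(controls)
    controls_unexposed = n_controls - controls_exposed
    # A zero cell is possible in principle but vanishingly unlikely here.
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

odds_ratios = [one_sample_odds_ratio() for _ in range(1000)]
mean_or = sum(odds_ratios) / len(odds_ratios)
print(round(mean_or, 2))
```

The long-run average should land near the population OR of 3.76, slightly above it because the mean of a ratio is pulled upward by its right skew.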

Page 36: Case-Control Studies:  Statistical Analysis

The simulation results are:

Classical case-control design (sample controls from no-tumor subjects only)

Population Relative    Actual Dataset with almost     Augmented Dataset with
Effect Measure         rare disease (3% in            frequent disease (15% in
                       unexposed, 12% in exposed)     unexposed, 40% in exposed)
Odds Ratio             3.76 (OR=3.81)                 3.76 (OR=3.77)
Risk Ratio             3.43                           2.65
Rate Ratio             4.76                           3.87
Hazard Ratio           5.02                           4.19

Page 37: Case-Control Studies:  Statistical Analysis

Case-Cohort Study Design

- In this design, we keep the cases. Then, we sample our controls from the total row of the population 2 x 2 table.

- For those cases that get mixed in with the controls, we set their status variable to 0, the control value.

- We then calculate the OR in the usual way.

Page 38: Case-Control Studies:  Statistical Analysis

Case-Cohort Study (controls are sampled from the population total row, which includes both cases and controls)

Using a 2:1 sampling ratio

            Exposed to nickel    Not exposed to nickel    Total
Tumor       46  (a)              10  (b)                  56     (use all 56 cases)
No Tumor    343                  280                      623
Total       389 (c)              290 (d)                  679    (sample 56 x 2 controls)

The odds ratio is then a direct calculation of the risk ratio. Here c and d denote the total-row counts, and k is the sampling fraction:

OR = (a x kd)/(b x kc) = (kad)/(kbc) = (ad)/(bc), where k = (56 x 2)/679

RR = (a/c)/(b/d) = (ad)/(bc) = OR
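The cancellation of k can be verified with exact arithmetic. The sketch below uses the expected control counts (k times each total-row cell) rather than a random draw, so it shows what the design estimates on average, not the sampling variability.

```python
from fractions import Fraction

a, b = 46, 10      # exposed / unexposed tumor cases
c, d = 389, 290    # exposed / unexposed total-row counts
k = Fraction(56 * 2, 679)  # sampling fraction for the controls

expected_controls_exposed = k * c
expected_controls_unexposed = k * d

case_cohort_or = (a * expected_controls_unexposed) / (b * expected_controls_exposed)
risk_ratio = Fraction(a, c) / Fraction(b, d)

print(float(case_cohort_or), float(risk_ratio))
```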

Page 39: Case-Control Studies:  Statistical Analysis

The simulation results are:

Case-cohort design (sample controls from total row of population 2 x 2 table)

Population Relative    Actual Dataset with almost     Augmented Dataset with
Effect Measure         rare disease (3% in            frequent disease (15% in
                       unexposed, 12% in exposed)     unexposed, 40% in exposed)
Odds Ratio             3.76                           3.76
Risk Ratio             3.43 (OR=3.48)                 2.65 (OR=2.67)
Rate Ratio             4.76                           3.87
Hazard Ratio           5.02                           4.19

Page 40: Case-Control Studies:  Statistical Analysis

Case-Cohort Study Design

For the case-cohort design, the rare-disease assumption is not required for the OR to be an estimate of RR (Rothman and Greenland, 1998, p.110). We have demonstrated that to be the case.

[Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA.]

Page 41: Case-Control Studies:  Statistical Analysis

Case-Cohort Study Design

It is nice to be able to use the OR to directly estimate the RR, and not worry about the rare disease assumption at all.

It comes with a price, however. Since your controls are now “messy”, with cases mixed in, you do not have as clear a signal for the effect, so statistical power is reduced. You need to sample additional controls to make up the difference (to get back to the power of the classic case-control study).

Page 42: Case-Control Studies:  Statistical Analysis

Case-Cohort Study Design With Risk Set Sampling

In this design, you again keep all of the cases.

You then, again, sample controls from the total row of the population 2 x 2 table (that is, from cases and controls together). This time, however, you sample only from total-row subjects who have the same or longer time-at-risk than the case. This is called risk set sampling.

Page 43: Case-Control Studies:  Statistical Analysis

In this design, we also use a type of “total row” sampling. That is, we select our controls from the “Beginning N” columns of the life table.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0

Page 44: Case-Control Studies:  Statistical Analysis

For the 5+2 cases that occurred on day 1, we sample our controls from the 50+50 persons still at risk on day 1.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0

Page 45: Case-Control Studies:  Statistical Analysis

For the 10+8 cases that occurred on day 2, we sample our controls from the 30+40 persons still at risk on day 2. …and so on.

                     Exposed                           Non-Exposed
Follow-up   Begin   Disease   Day-Specific     Begin   Disease   Day-Specific     Day-Specific
day         N       Cases     Risk             N       Cases     Risk             Risk Ratio
1           50      5         0.10             50      2         0.04             2.5
2           30      10        0.33             40      8         0.20             1.7
3           10      10        1.00             20      10        0.50             2.0

Page 46: Case-Control Studies:  Statistical Analysis

We do this by forming risk sets. For every case, we form a risk set that includes all subjects with an equal or longer follow-up time. Then, if we are using a 2:1 sampling ratio, we sample 2 controls from that risk set and match them with that case.

This is identical to sampling on the correct row from the Beginning N column, as we did above.
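Risk set formation can be sketched in a few lines. The eight-subject cohort, the seed, and the function name below are hypothetical; the sketch also assumes the case itself is excluded from its own risk set, while other subjects who later become cases remain eligible as controls.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

# (subject id, follow-up time in days, is_case); hypothetical toy cohort.
cohort = [
    (1, 1, True), (2, 1, False), (3, 2, True), (4, 2, False),
    (5, 3, True), (6, 3, False), (7, 3, False), (8, 3, False),
]

def sample_risk_sets(subjects, controls_per_case=2):
    """For each case, draw controls from subjects with equal or longer follow-up."""
    matched_sets = []
    for case_id, case_time, is_case in subjects:
        if not is_case:
            continue
        risk_set = [
            sid for sid, time, _ in subjects
            if time >= case_time and sid != case_id
        ]
        n_draw = min(controls_per_case, len(risk_set))
        matched_sets.append((case_id, case_time, random.sample(risk_set, n_draw)))
    return matched_sets

for case_id, case_time, controls in sample_risk_sets(cohort):
    print(case_id, case_time, controls)
```

Each matched set then becomes one stratum in the conditional logistic regression.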

Page 47: Case-Control Studies:  Statistical Analysis

We have already seen that the OR from a case-cohort study design directly estimates the RR.

We are now doing a version of the case-cohort approach for each row of the life table.

We know that the HR is just the summary RR across the rows of the life table.

If we use conditional logistic regression, then, to account for the row-specific matching, it would seem the OR should directly estimate the HR.

Page 48: Case-Control Studies:  Statistical Analysis

Let’s see if that is true.

This time in the simulation, we will take the OR from the conditional logistic regression, rather than calculate it from a 2 x 2 table as we did for the previous simulations.

The mean of the 1,000 conditional logistic regression ORs will be our estimate of the HR.

Page 49: Case-Control Studies:  Statistical Analysis

The simulation results are:

Case-cohort design with risk set sampling.

We were close, but the estimates appear to be biased.

Population Relative    Actual Dataset with almost     Augmented Dataset with
Effect Measure         rare disease (3% in            frequent disease (15% in
                       unexposed, 12% in exposed)     unexposed, 40% in exposed)
Odds Ratio             3.76                           3.76
Risk Ratio             3.43                           2.65
Rate Ratio             4.76                           3.87
Hazard Ratio           5.02 (OR=5.42)                 4.19 (OR=4.43)

Page 50: Case-Control Studies:  Statistical Analysis

The way it is really done is to use risk set sampling followed by an actual Cox regression.

To adjust the standard error for the way the sampling was done, there are three approaches:

Prentice

Self and Prentice

Barlow

Page 51: Case-Control Studies:  Statistical Analysis

In Stata,

Prentice:

    stcascoh, alpha(.18)    // risk set sampling
    stcox nickel, robust

Self and Prentice:

    stcascoh, alpha(.18)    // risk set sampling with log weights (_wSelPre)
    stcox nickel, robust offset(_wSelPre)

Barlow:

    stcascoh, alpha(.18)    // risk set sampling with log weights (_wBarlow)
    stcox nickel, robust offset(_wBarlow)

Page 52: Case-Control Studies:  Statistical Analysis

The simulation results are:

Case-cohort design with risk set sampling (Prentice Method)

Estimates appear unbiased using this approach.

Population Relative    Actual Dataset with almost     Augmented Dataset with
Effect Measure         rare disease (3% in            frequent disease (15% in
                       unexposed, 12% in exposed)     unexposed, 40% in exposed)
Odds Ratio             3.76                           3.76
Risk Ratio             3.43                           2.65
Rate Ratio             4.76                           3.87
Hazard Ratio           5.02 (HR=5.08)                 4.19