A short introduction to epidemiology Chapter 9: Data analysis Neil Pearce Centre for Public Health Research Massey University Wellington, New Zealand


A short introduction to epidemiology

Chapter 9: Data analysis

Neil Pearce

Centre for Public Health Research

Massey University

Wellington, New Zealand


Chapter 9

Data analysis

• Basic principles

• Basic analyses

• Control of confounding


Basic principles

• Effect estimation

• Confidence intervals

• P-values


Testing and estimation

• The effect estimate provides an estimate of the effect (e.g. relative risk, risk difference) of exposure on the occurrence of disease

• The confidence interval provides a range of values within which it is plausible that the true effect lies

• The p-value is the probability that differences as large as, or larger than, those observed would arise by chance if the null hypothesis (of no association between exposure and disease) were true

• The principal aim of an individual study should be to estimate the size of the effect (using the effect estimate and confidence interval) rather than just to decide whether or not an effect is present (using the p-value)


Problems of significance testing

• The p-value depends on two factors: the size of the effect and the size of the study

• A very small difference may be statistically significant if the study is very large, whereas a very large difference may not be significant if the study is very small.

• The purpose of significance testing is to reach a decision based on a single study. However, decisions should be based on information from all available studies, as well as non-statistical considerations such as the plausibility and coherence of the effect in the light of current theoretical and empirical knowledge (see chapter 10).
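A minimal Python sketch of the first two points, using a hypothetical case-control table whose odds ratio is held at 2.0 while every cell is multiplied by a scale factor. The chi-square is the observed-minus-expected form for cell a used for the 2×2 table in this chapter, and the p-value comes from the chi-square distribution with 1 degree of freedom via P(χ² > x) = erfc(√(x/2)). The counts and the helper names `chi2_2x2` and `p_value_1df` are illustrative assumptions, not part of the slides.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square (1 df) for a 2x2 table: (a - Exp(a))^2 / Var(a), with all
    four margins treated as fixed. a, b = exposed/unexposed cases;
    c, d = exposed/unexposed controls."""
    m1, m0 = a + b, c + d   # cases, controls
    n1, n0 = a + c, b + d   # exposed, unexposed
    t = m1 + m0
    expected = n1 * m1 / t
    variance = n1 * n0 * m1 * m0 / (t ** 2 * (t - 1))
    return (a - expected) ** 2 / variance


def p_value_1df(chi2):
    """Two-sided p-value: P(chi-square with 1 df > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical table with OR = (20 * 10) / (10 * 10) = 2.0, scaled up k-fold.
base = (20, 10, 10, 10)
for k in (1, 5, 25):
    a, b, c, d = (k * x for x in base)
    odds_ratio = (a * d) / (b * c)  # stays 2.0 at every study size
    chi2 = chi2_2x2(a, b, c, d)
    print(f"x{k:<2}  OR = {odds_ratio:.2f}  chi2 = {chi2:6.2f}  p = {p_value_1df(chi2):.4f}")
```

The effect estimate is identical in every row; only the p-value changes, because it reflects the size of the study as well as the size of the effect.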



Basic analyses

• Measures of occurrence
  – Incidence proportion (risk)
  – Incidence rate
  – Incidence odds

• Measures of effect
  – Risk ratio
  – Rate ratio
  – Odds ratio


Example:

              Exposed (E)   Unexposed (Ē)   Total
Cases         a             b               M1
Controls      c             d               M0
Total         N1            N0              T


Example: Smoking and Ovarian Cancer

              Smokers (E)   Non-smokers (Ē)   Total
Cases         40            36                76
Controls      58            24                82
Total         98            60                158


$$OR = \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{40 \times 24}{36 \times 58} = 0.46$$


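$$\chi^2 = \frac{\left[\mathrm{Obs}(a) - \mathrm{Exp}(a)\right]^2}{\mathrm{Var}(a)} = \frac{\left(a - N_1 M_1/T\right)^2}{N_1 N_0 M_1 M_0/\left[T^2 (T-1)\right]} = \frac{\left(40 - 98 \times 76/158\right)^2}{98 \times 60 \times 76 \times 82/\left(158^2 \times 157\right)} = 5.45$$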
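The crude odds ratio and chi-square for the smoking and ovarian cancer table can be reproduced with a short Python sketch. It assumes the cell layout of the 2×2 table (a, b = exposed and unexposed cases; c, d = exposed and unexposed controls); the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad/bc."""
    return (a * d) / (b * c)


def chi2_2x2(a, b, c, d):
    """(Obs(a) - Exp(a))^2 / Var(a), with all four margins treated as fixed."""
    m1, m0 = a + b, c + d        # cases, controls
    n1, n0 = a + c, b + d        # exposed, unexposed
    t = m1 + m0
    exp_a = n1 * m1 / t
    var_a = n1 * n0 * m1 * m0 / (t ** 2 * (t - 1))
    return (a - exp_a) ** 2 / var_a


# Smoking and ovarian cancer: a=40, b=36 (cases); c=58, d=24 (controls)
a, b, c, d = 40, 36, 58, 24
print(f"OR   = {odds_ratio(a, b, c, d):.2f}")   # OR   = 0.46
print(f"chi2 = {chi2_2x2(a, b, c, d):.2f}")     # chi2 = 5.45
```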


This $\chi^2$ is based on the assumptions that the marginal totals of the table (N1, N0, M1, M0) are fixed and that the proportion of exposed cases is the same as the proportion of exposed controls (i.e. that the overall proportion exposed, N1/T, applies to both cases and controls).


The natural logarithm of the odds ratio has (under a binomial model) an approximate standard error of:

$$\mathrm{SE}[\ln(OR)] = \left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right)^{0.5}$$

An approximate 95% confidence interval for the odds ratio is then given by:

$$OR \times e^{\pm 1.96\,\mathrm{SE}}$$
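A sketch of this approximate confidence interval for the crude odds ratio, assuming the same cell layout; `or_with_ci` is a hypothetical helper, and z = 1.96 gives the 95% level.

```python
import math


def or_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate CI: OR * exp(-/+ z * SE[ln(OR)])."""
    or_est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_est, or_est * math.exp(-z * se), or_est * math.exp(z * se)


est, lower, upper = or_with_ci(40, 36, 58, 24)
print(f"OR = {est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The interval is symmetric on the log scale, which is why both limits come from multiplying or dividing the OR by the same factor exp(1.96 × SE).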



Control of confounding

There are two methods of calculating a summary effect estimate to control confounding:

• Pooling

• Standardisation


The unadjusted (crude) findings indicate a strong inverse association between smoking and ovarian cancer (OR = 0.46). Suppose, however, that we are concerned that the effect of smoking may be confounded by use of oral contraceptives (OC). This would occur if OC use affected the risk of ovarian cancer and was also associated with smoking. We then need to stratify the data into those who have used oral contraceptives and those who have not.

Example of pooling:


                 OC use: Yes                  OC use: No
Smoking          Yes     No     Total         Yes     No     Total
Cases            12      4      16            28      32     60
Controls         50      15     65            8       9      17
Total            62      19     81            36      41     77


In those who have used oral contraceptives, the odds ratio for smoking is:

$$OR = \frac{12 \times 15}{4 \times 50} = 0.90$$

In those who have not used oral contraceptives, the odds ratio for smoking is:

$$OR = \frac{28 \times 9}{32 \times 8} = 0.98$$
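The stratum-specific odds ratios can be checked in Python. The sketch assumes the cells read from the stratified table, in the order (a, b, c, d) = (exposed cases, unexposed cases, exposed controls, unexposed controls); it also confirms that adding the two strata cell by cell gives back the crude table.

```python
# Smoking and ovarian cancer, stratified by OC use: (a, b, c, d) per stratum.
strata = {
    "OC users": (12, 4, 50, 15),
    "OC non-users": (28, 32, 8, 9),
}


def odds_ratio(a, b, c, d):
    """OR = ad/bc for one 2x2 table."""
    return (a * d) / (b * c)


for name, cells in strata.items():
    print(f"{name}: OR = {odds_ratio(*cells):.2f}")

# Summing the strata cell by cell recovers the crude table.
crude = tuple(sum(cells) for cells in zip(*strata.values()))
print(f"crude cells {crude}: OR = {odds_ratio(*crude):.2f}")
```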


Thus, the crude OR for smoking (0.46) was biased downwards by confounding by OC use. When we remove this bias (by stratifying on OC use), the odds ratios increase and are close to 1.0.


In this example, the odds ratios are not exactly the same in each stratum. If they are very different (e.g. 1.0 in one stratum and 4.0 in the other stratum) then we would usually report the findings separately for each stratum. However, if the odds ratio estimates are reasonably similar then we usually wish to summarize our findings into a single summary odds ratio by taking a weighted average of the OR estimates in each stratum.


$$OR = \frac{\sum_i W_i \, OR_i}{\sum_i W_i}$$

where $OR_i$ = OR in stratum $i$ and $W_i$ = weight given to stratum $i$


One obvious choice of weights would be to weight each stratum by the inverse of its variance (precision-based estimates). However, this method of obtaining a summary odds ratio yields estimates which are unstable and highly affected by small numbers in particular strata.
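The slides do not spell out the precision-based weights. A common version, assumed here, is Woolf's method: pool ln(OR_i) with weights 1/Var[ln(OR_i)], where Var[ln(OR_i)] = 1/a_i + 1/b_i + 1/c_i + 1/d_i. Unlike the weighted average of the OR_i written above, this averages on the log scale. The sketch makes the small-numbers problem visible: every weight depends on 1/a_i, 1/b_i, 1/c_i and 1/d_i, so small cells give unstable stratum estimates and a single zero cell makes the summary undefined. The function name is hypothetical.

```python
import math


def inverse_variance_or(strata):
    """Summary OR pooled on the log scale with weights 1/Var[ln(OR_i)]
    (Woolf's method, used here as one concrete form of precision weights)."""
    num = den = 0.0
    for a, b, c, d in strata:
        var_ln_or = 1 / a + 1 / b + 1 / c + 1 / d   # fails if any cell is 0
        w = 1 / var_ln_or
        num += w * math.log((a * d) / (b * c))
        den += w
    return math.exp(num / den)


strata = [(12, 4, 50, 15), (28, 32, 8, 9)]   # OC users, OC non-users
print(f"inverse-variance OR = {inverse_variance_or(strata):.2f}")
```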


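A better set of weights was developed by Mantel and Haenszel. These use the weights $W_i = b_i c_i/T_i$:

$$OR_{MH} = \frac{\sum_i W_i \, OR_i}{\sum_i W_i} = \frac{\sum_i (b_i c_i/T_i)(a_i d_i/b_i c_i)}{\sum_i b_i c_i/T_i} = \frac{\sum_i a_i d_i/T_i}{\sum_i b_i c_i/T_i}$$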


Stratum 1 (OC users): a = 12, b = 4, c = 50, d = 15, T = 81

Stratum 2 (non-users): a = 28, b = 32, c = 8, d = 9, T = 77

$$OR_{MH} = \frac{12 \times 15/81 + 28 \times 9/77}{4 \times 50/81 + 32 \times 8/77} = 0.95$$
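A minimal Python sketch of the Mantel-Haenszel summary odds ratio, assuming each stratum is given as (a, b, c, d) with T = a + b + c + d; the function name is hypothetical.

```python
def mantel_haenszel_or(strata):
    """OR_MH = sum(a*d/T) / sum(b*c/T): the weighted average of stratum ORs
    with Mantel-Haenszel weights W_i = b_i*c_i/T_i."""
    num = den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return num / den


strata = [(12, 4, 50, 15),   # OC users
          (28, 32, 8, 9)]    # OC non-users
print(f"OR_MH = {mantel_haenszel_or(strata):.2f}")   # OR_MH = 0.95
```

Because the weights cancel the denominators of the stratum ORs, each stratum contributes a_i d_i/T_i and b_i c_i/T_i directly, so a zero cell in one stratum does not make the estimate undefined as long as neither sum is zero.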


This set of weights yields summary odds ratio estimates which are very close to being statistically optimal (they are very close to the maximum likelihood estimates) and are very robust in that they are not unduly affected by small numbers in particular strata (provided that the strata do not have any zero marginal totals).


We can calculate a corresponding chi-square:

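$$\chi^2_{MH} = \frac{\left[\sum_i \mathrm{Obs}(a_i) - \sum_i \mathrm{Exp}(a_i)\right]^2}{\sum_i \mathrm{Var}(a_i)} = \frac{\left(\sum_i a_i - \sum_i N_{1i} M_{1i}/T_i\right)^2}{\sum_i N_{1i} N_{0i} M_{1i} M_{0i}/\left[T_i^2 (T_i - 1)\right]}$$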


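Stratum 1 (OC users): a = 12, N1 = 62, N0 = 19, M1 = 16, M0 = 65, T = 81

Stratum 2 (non-users): a = 28, N1 = 36, N0 = 41, M1 = 60, M0 = 17, T = 77

$$\chi^2_{MH} = \frac{\left[(12 + 28) - \left(62 \times 16/81 + 36 \times 60/77\right)\right]^2}{62 \times 19 \times 16 \times 65/\left(81^2 \times 80\right) + 36 \times 41 \times 60 \times 17/\left(77^2 \times 76\right)} = 0.016$$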
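The Mantel-Haenszel chi-square can be sketched the same way. Each stratum is (a, b, c, d), and the expected value and variance of a_i follow the formula above; the function name is hypothetical.

```python
def mantel_haenszel_chi2(strata):
    """(sum a - sum Exp(a))^2 / sum Var(a) over strata, with
    Exp(a) = N1*M1/T and Var(a) = N1*N0*M1*M0 / (T^2 * (T - 1))."""
    obs = expected = variance = 0.0
    for a, b, c, d in strata:
        m1, m0 = a + b, c + d    # cases, controls
        n1, n0 = a + c, b + d    # exposed, unexposed
        t = m1 + m0
        obs += a
        expected += n1 * m1 / t
        variance += n1 * n0 * m1 * m0 / (t ** 2 * (t - 1))
    return (obs - expected) ** 2 / variance


strata = [(12, 4, 50, 15), (28, 32, 8, 9)]   # OC users, OC non-users
print(f"chi2_MH = {mantel_haenszel_chi2(strata):.3f}")   # chi2_MH = 0.016
```

With a single stratum this reduces to the chi-square for the crude 2×2 table.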


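The natural logarithm of the Mantel-Haenszel summary odds ratio has (under a binomial model) an approximate standard error of:

$$\mathrm{SE}[\ln(OR_{MH})] = \left[\frac{\sum_i P_i R_i}{2R_+^2} + \frac{\sum_i (P_i S_i + Q_i R_i)}{2R_+ S_+} + \frac{\sum_i Q_i S_i}{2S_+^2}\right]^{0.5}$$

where: $P_i = (a_i + d_i)/T_i$, $Q_i = (b_i + c_i)/T_i$, $R_i = a_i d_i/T_i$, $S_i = b_i c_i/T_i$, $R_+ = \sum_i R_i$ and $S_+ = \sum_i S_i$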


An approximate 95% confidence interval for the odds ratio is then given by:

$$OR_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
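A sketch of the summary odds ratio with its approximate 95% confidence interval, built from the P, Q, R and S quantities defined above. Strata are (a, b, c, d) tuples and `mh_or_ci` is a hypothetical name.

```python
import math


def mh_or_ci(strata, z=1.96):
    """OR_MH with an approximate CI from the P, Q, R, S standard error formula."""
    sum_pr = sum_ps_qr = sum_qs = r_plus = s_plus = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        p, q = (a + d) / t, (b + c) / t
        r, s = a * d / t, b * c / t
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
        r_plus += r
        s_plus += s
    or_mh = r_plus / s_plus
    var_ln_or = (sum_pr / (2 * r_plus ** 2)
                 + sum_ps_qr / (2 * r_plus * s_plus)
                 + sum_qs / (2 * s_plus ** 2))
    se = math.sqrt(var_ln_or)
    return or_mh, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)


est, lower, upper = mh_or_ci([(12, 4, 50, 15), (28, 32, 8, 9)])
print(f"OR_MH = {est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With a single stratum the variance reduces to 1/a + 1/b + 1/c + 1/d, the formula given for the crude table.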


Rate ratios:

                Exposed (E)   Unexposed (Ē)   Total
Cases           a             b               M1
Person-years    Y1            Y0              T


                Exposed (E)   Unexposed (Ē)
Cases           350           125
Person-years    100,000       100,000
Rate            0.00350       0.00125


$$RR = \frac{a/Y_1}{b/Y_0} = \frac{350/100{,}000}{125/100{,}000} = \frac{0.00350}{0.00125} = 2.80$$
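The crude rate ratio as a one-line Python function; the argument order (a, Y1, b, Y0) is an assumption of this sketch.

```python
def rate_ratio(a, y1, b, y0):
    """RR = (a/Y1) / (b/Y0): rate in the exposed over rate in the unexposed."""
    return (a / y1) / (b / y0)


print(f"RR = {rate_ratio(350, 100_000, 125, 100_000):.2f}")   # RR = 2.80
```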


Stratifying on tobacco

                   Tobacco: Yes             Tobacco: No
Alcohol            Yes         No           Yes         No
Cases              300         50           50          75
Person-years       75,000      25,000       25,000      75,000
Rate               0.00400     0.00200      0.00200     0.00100


$$RR = \frac{a/Y_1}{b/Y_0} = \frac{a\,Y_0}{b\,Y_1}$$


The summary Mantel-Haenszel rate ratio uses the weights $b_i Y_{1i}/T_i$ to yield:

$$RR_{MH} = \frac{\sum_i a_i Y_{0i}/T_i}{\sum_i b_i Y_{1i}/T_i}$$


$$RR_{MH} = \frac{300 \times 25{,}000/100{,}000 + 50 \times 75{,}000/100{,}000}{50 \times 75{,}000/100{,}000 + 75 \times 25{,}000/100{,}000} = 2.0$$
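A sketch reproducing the stratum-specific rate ratios and the Mantel-Haenszel rate ratio for the alcohol example stratified on tobacco. Each stratum is (a, Y1, b, Y0) with T = Y1 + Y0; the names are hypothetical.

```python
# Each stratum: (a, Y1, b, Y0) = cases and person-years with alcohol,
# then cases and person-years without alcohol.
strata = {
    "tobacco yes": (300, 75_000, 50, 25_000),
    "tobacco no": (50, 25_000, 75, 75_000),
}


def rate_ratio(a, y1, b, y0):
    """Stratum rate ratio (a/Y1) / (b/Y0), equal to a*Y0 / (b*Y1)."""
    return (a / y1) / (b / y0)


def mantel_haenszel_rr(strata):
    """RR_MH = sum(a*Y0/T) / sum(b*Y1/T), with T = Y1 + Y0 in each stratum."""
    num = sum(a * y0 / (y1 + y0) for a, y1, b, y0 in strata)
    den = sum(b * y1 / (y1 + y0) for a, y1, b, y0 in strata)
    return num / den


for name, cells in strata.items():
    print(f"{name}: RR = {rate_ratio(*cells):.2f}")
print(f"RR_MH = {mantel_haenszel_rr(strata.values()):.2f}")
```

Both tobacco strata give a rate ratio of 2.0, so the summary is also 2.0, compared with the crude rate ratio of 2.80.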


The equivalent Mantel-Haenszel chi-square is:

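$$\chi^2_{MH} = \frac{\left[\sum_i a_i - \sum_i \mathrm{Exp}(a_i)\right]^2}{\sum_i \mathrm{Var}(a_i)} = \frac{\left(\sum_i a_i - \sum_i M_{1i} Y_{1i}/T_i\right)^2}{\sum_i M_{1i} Y_{1i} Y_{0i}/T_i^2}$$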


This is very similar to the $\chi^2_{MH}$ for case-control studies, but it has some minor modifications to take account of the fact that we are using person-time data rather than binomial data.


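$$\chi^2_{MH} = \frac{\left[(300 + 50) - \left(350 \times 75{,}000/100{,}000 + 125 \times 25{,}000/100{,}000\right)\right]^2}{350 \times 75{,}000 \times 25{,}000/100{,}000^2 + 125 \times 25{,}000 \times 75{,}000/100{,}000^2} = 35.5$$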
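The person-time chi-square differs from the case-control version only in the expected value and variance of a_i. A minimal sketch, assuming strata given as (a, Y1, b, Y0) and a hypothetical function name:

```python
def mantel_haenszel_chi2_rates(strata):
    """Person-time chi2_MH: Exp(a) = M1*Y1/T and Var(a) = M1*Y1*Y0/T^2."""
    obs = expected = variance = 0.0
    for a, y1, b, y0 in strata:
        m1, t = a + b, y1 + y0   # total cases, total person-years
        obs += a
        expected += m1 * y1 / t
        variance += m1 * y1 * y0 / t ** 2
    return (obs - expected) ** 2 / variance


strata = [(300, 75_000, 50, 25_000),   # tobacco yes
          (50, 25_000, 75, 75_000)]    # tobacco no
print(f"chi2_MH = {mantel_haenszel_chi2_rates(strata):.1f}")   # chi2_MH = 35.5
```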


An approximate standard error for the natural log of the rate ratio is:

$$\mathrm{SE}[\ln(RR_{MH})] = \frac{\left[\sum_i M_{1i} Y_{1i} Y_{0i}/T_i^2\right]^{0.5}}{\left[\left(\sum_i a_i Y_{0i}/T_i\right)\left(\sum_i b_i Y_{1i}/T_i\right)\right]^{0.5}}$$


An approximate 95% confidence interval for the rate ratio is then given by:

$$RR_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
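A sketch of the summary rate ratio with the approximate standard error and 95% confidence interval above; strata are (a, Y1, b, Y0) and `mh_rr_ci` is a hypothetical helper.

```python
import math


def mh_rr_ci(strata, z=1.96):
    """RR_MH with approximate CI; each stratum is (a, Y1, b, Y0), T = Y1 + Y0."""
    num = den = var_num = 0.0
    for a, y1, b, y0 in strata:
        m1, t = a + b, y1 + y0
        num += a * y0 / t
        den += b * y1 / t
        var_num += m1 * y1 * y0 / t ** 2
    rr = num / den
    se = math.sqrt(var_num / (num * den))   # SE of ln(RR_MH)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


strata = [(300, 75_000, 50, 25_000),   # tobacco yes
          (50, 25_000, 75, 75_000)]    # tobacco no
rr, lower, upper = mh_rr_ci(strata)
print(f"RR_MH = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```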


Risk ratios:

                Exposed (E)   Unexposed (Ē)   Total
Cases           a             b               M1
Non-cases       c             d               M0
Total           N1            N0              T


$$RR_{MH} = \frac{\sum_i a_i N_{0i}/T_i}{\sum_i b_i N_{1i}/T_i}$$


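$$\chi^2_{MH} = \frac{\left(\sum_i a_i - \sum_i N_{1i} M_{1i}/T_i\right)^2}{\sum_i N_{1i} N_{0i} M_{1i} M_{0i}/\left[T_i^2 (T_i - 1)\right]}$$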


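An approximate standard error for the natural log of the risk ratio is:

$$\mathrm{SE}[\ln(RR_{MH})] = \frac{\left[\sum_i \left(M_{1i} N_{1i} N_{0i}/T_i^2 - a_i b_i/T_i\right)\right]^{0.5}}{\left[\left(\sum_i a_i N_{0i}/T_i\right)\left(\sum_i b_i N_{1i}/T_i\right)\right]^{0.5}}$$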


An approximate 95% confidence interval for the risk ratio is then given by:

$$RR_{MH} \times e^{\pm 1.96\,\mathrm{SE}}$$
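The slides give no worked example for risk ratios, so the following is only a sketch of the formulas above, assuming strata given as (a, b, c, d) with c and d the exposed and unexposed non-cases; `mh_risk_ratio_ci` is a hypothetical name.

```python
import math


def mh_risk_ratio_ci(strata, z=1.96):
    """RR_MH = sum(a*N0/T) / sum(b*N1/T) with an approximate CI from the
    standard error formula above. Each stratum is (a, b, c, d):
    exposed cases, unexposed cases, exposed non-cases, unexposed non-cases."""
    num = den = var_num = 0.0
    for a, b, c, d in strata:
        m1 = a + b
        n1, n0 = a + c, b + d
        t = n1 + n0
        num += a * n0 / t
        den += b * n1 / t
        var_num += m1 * n1 * n0 / t ** 2 - a * b / t
    rr = num / den
    se = math.sqrt(var_num / (num * den))   # SE of ln(RR_MH)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)
```

A quick sanity check: with a single stratum, RR_MH reduces to the crude risk ratio (a/N1)/(b/N0), and the variance reduces to 1/a - 1/N1 + 1/b - 1/N0.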


Standardization, in contrast to pooling, involves taking a weighted average of the rates in each stratum (e.g. age group) before taking the ratio of the two standardized rates. Standardization has many advantages in descriptive epidemiology involving comparisons between countries, regions, ethnic groups or gender groups. However, pooling (when done appropriately) has some superior statistical properties when comparing exposed and non-exposed groups in a specific study.
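To contrast standardization with pooling, here is a sketch of direct standardization applied to the alcohol and tobacco example. The choice of standard population is an assumption (the slides do not specify one): it uses the total person-years in each tobacco stratum, 100,000 in each. The alcohol-specific rates are averaged across tobacco strata with these weights, and only then is the ratio taken.

```python
# Alcohol-specific rates within each tobacco stratum: (alcohol yes, alcohol no).
rates = {
    "tobacco yes": (300 / 75_000, 50 / 25_000),
    "tobacco no": (50 / 25_000, 75 / 75_000),
}
# ASSUMED standard population: total person-years in each tobacco stratum.
standard_py = {"tobacco yes": 100_000, "tobacco no": 100_000}


def standardized_rate(column):
    """Directly standardized rate: stratum rates weighted by standard person-years."""
    total = sum(standard_py.values())
    return sum(rates[s][column] * w for s, w in standard_py.items()) / total


exposed = standardized_rate(0)     # alcohol yes
unexposed = standardized_rate(1)   # alcohol no
print(f"standardized rates: {exposed:.5f} vs {unexposed:.5f}")
print(f"standardized rate ratio = {exposed / unexposed:.2f}")
```

Here the standardized rate ratio is 2.0, the same as the Mantel-Haenszel estimate, because the rate ratio is 2.0 in both strata; when stratum-specific effects differ, the two approaches generally give different answers.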


Summary of Stratified Analysis

If we are concerned about confounding by a factor such as age, gender or smoking, then we need to stratify on this factor (or on all factors simultaneously if there is more than one potential confounder) and calculate the exposure effect separately in each stratum.

If the effect is very different in different strata then we would report the findings separately for each stratum.


If the effect is similar in each stratum then we can obtain a summary estimate by taking a weighted average of the effect in each stratum. If the adjusted effect is different from the crude effect, this means that the crude effect was biased by confounding.


Usually we need to adjust the findings for (i.e. stratify on) age, gender and some other factors. If we have five age groups and two gender groups then we need to divide the data into ten age-gender strata. If we have too many strata then we begin to get strata with zero marginal totals (e.g. with no cases or no controls). The analysis then begins to ‘break down’ and we have to consider using mathematical modelling.
