17. september2015 - publicifsv.sund.ku.dkpublicifsv.sund.ku.dk/~pka/mph15/mph-trans3-15.pdf ·...

Statistics for MPH: 3

17 September 2015

www.biostat.ku.dk/~pka/mph15

Analysis of cohort studies, rates, standardisation (Silva: 70-80, 110-133)

Per Kragh Andersen

1

From the 2nd week of statistics teaching:

– I should ideally

1. understand that the relative risk is a sensible measure of association between determinant and outcome in cohort and cross-sectional studies

2. be able to calculate the relative risk with corresponding confidence limits from a given table

3. be able to calculate the “chi-square” test statistic for the hypothesis “relative risk = 1” from a given table

4. slowly begin to get used to the concepts: null hypothesis, test statistic, distribution of the test statistic, P-value, significance level, acceptance/rejection of a hypothesis

5. understand that there is a close connection between confidence limits and hypothesis tests

2

From the 2nd week of statistics teaching:

– on the other hand, I do not necessarily need

1. to have understood how the formula for SD(ln RR) arises

2. to have understood how the standard deviation in the “Mantel-Haenszel test statistic” arises

3. to have understood why the value of the test statistic must be looked up in the “χ² table with 1 degree of freedom”

4. to have fully understood all the concepts mentioned under 4. above

3

Cohort studies

Classical example: Evans County Study. 609 white men, aged 40-76, followed for 7 years from 1960.

                            CHD
CHL > 260 mg/100 mL    Yes     No    Total
Yes                     14     91      105
No                      57    447      504
Total                   71    538      609

Summarizing data in this way, it has been assumed that everyone has been followed for 7 years. This is more the exception than the rule:

• People leave the study due to loss to follow-up

• People may enter the study at different time points

We wish to take the varying time at risk into account.

4

The 7-year CHD risk (“cumulative incidence”) expresses the proportion of the cohort at risk at entry having a CHD event during the 7 years of follow-up.

It is, therefore, ≥ the 5-year risk

and ≤ the 10-year risk

Alternative frequency measure: THE RATE, r(t), expressing risk per time unit at time t (∼ “speed” at t). For h > 0 (small):

$$r(t) \approx \frac{\mathrm{Prob}(\text{disease before time } t+h \mid \text{no disease at time } t)}{h}$$

We estimate a constant rate as (Silva, p.63):

$$r = \frac{\text{number of (CHD) events observed}}{\text{total person-time at risk}} = \frac{a}{y}$$

Alternative names: INCIDENCE RATE, INCIDENCE DENSITY, HAZARD (RATE)

5

Calculation of person-years

Calculation of person-years of observation for six individuals (see also Silva, p.62).

Subject  Date of birth    Date observation  Date observation  Death
                          began             ended             (0=no, 1=yes)
1        Jan. 4, 1897     July 11, 1954     Dec. 31, 1962     0
2        Sept. 5, 1884    Aug. 3, 1954      Nov. 25, 1960     1
3        Dec. 16, 1904    Oct. 25, 1954     Dec. 31, 1962     0
4        Jan. 16, 1899    Nov. 1, 1954      Dec. 31, 1962     0
5        Apr. 9, 1912     Feb. 19, 1957     Dec. 31, 1962     0
6        Feb. 22, 1910    Dec. 8, 1957      Aug. 18, 1959     1

Person-years of observation:

Subject  Years  Months  Days  ≈ Years
1        8      5       20    8.47
2        6      3       22    6.31
3        8      4       6     8.35
4        8      1       30    8.17
5        5      10      12    5.87
6        1      8       10    1.69
Total                         38.86

6

$$r = \frac{2}{38.86 \text{ years}} = 0.051 \text{ per year} = 51 \text{ per 1000 years}$$
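The rate calculation can be reproduced in a few lines. This is a minimal sketch, assuming the person-years and death indicators of the six individuals listed on the previous slide; the function name `incidence_rate` is chosen for illustration only.

```python
# Person-years contributed by the six individuals (from the slide's table)
person_years = [8.47, 6.31, 8.35, 8.17, 5.87, 1.69]
# Death indicator for the same six individuals (1 = died during follow-up)
deaths = [0, 1, 0, 0, 0, 1]


def incidence_rate(events, total_time):
    """Constant-rate estimate r = a / y (events per unit of person-time)."""
    return events / total_time


total_time = sum(person_years)  # y, total person-years at risk
events = sum(deaths)            # a, number of events observed
rate = incidence_rate(events, total_time)

print(f"y = {total_time:.2f} person-years, a = {events}")
print(f"r = {rate:.3f} per year = {1000 * rate:.0f} per 1000 years")
```

Changing the time unit only rescales the result (per year versus per 1000 years), which is why a rate, unlike a risk, has no upper limit.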

Note that while the risk is between 0 and 1, the rate has no upper limit (the value depends on the time unit used).

Sometimes approximations are used for person-years when dates are unavailable (see, e.g., Silva p.71).

What does the fraction 2/6 tell us? Not much! If it is to be thought of as a risk estimate, then it is quite unclear to which follow-up period it corresponds.

7

Confidence intervals for rates

Better (and more complicated) method:

1. Calculate $l = \ln(r)$

2. Calculate $L_2 = l + 1.96 \cdot \frac{1}{\sqrt{a}}$

   Calculate $L_1 = l - 1.96 \cdot \frac{1}{\sqrt{a}}$

3. The desired 95% confidence limits are: from $\exp(L_1)$ to $\exp(L_2)$

Example: $l = \ln(0.051) = -2.98$

$$L_2 = -2.98 + 1.96 \cdot \frac{1}{\sqrt{2}} = -1.59$$

$$L_1 = -2.98 - 1.96 \cdot \frac{1}{\sqrt{2}} = -4.36$$

95% confidence limits: from exp(−4.36) = 0.013 per year to exp(−1.59) = 0.204 per year.
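The three steps translate directly into code. The following is a sketch under the slide's assumption that SD(ln r) = 1/√a, where a is the number of events; the function name `rate_ci` and its parameters are illustrative, not part of the course material.

```python
import math


def rate_ci(events, person_time, z=1.96):
    """Log-based confidence limits for a rate r = events / person_time.

    SD(ln r) is taken as 1 / sqrt(events), as in the steps above.
    Returns (rate, lower limit, upper limit).
    """
    rate = events / person_time
    log_rate = math.log(rate)            # step 1: l = ln(r)
    half_width = z / math.sqrt(events)   # 1.96 * 1/sqrt(a)
    lower = math.exp(log_rate - half_width)  # exp(L1)
    upper = math.exp(log_rate + half_width)  # exp(L2)
    return rate, lower, upper


rate, lower, upper = rate_ci(events=2, person_time=38.86)
print(f"r = {rate:.3f} per year, 95% CI from {lower:.3f} to {upper:.3f}")
```

Because the code carries the unrounded rate through the calculation, the limits can differ in the third decimal from the hand calculation, which rounds ln(r) to −2.98 first.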

8

Relation between rate and risk

If the rate is constant, = r (per year), what is then the T-year risk (if there are no other “competing” events)?

Divide the interval from 0 to T into N small sub-intervals, each of length h = T/N.

Probability of no event before T

$$= (1 - r h) \cdot (1 - r h) \cdots (1 - r h) = (1 - r h)^N = \left(1 - \frac{r T}{N}\right)^N \approx \exp(-rT)$$

That is: T-year risk = 1 − exp(−rT).

If rT is small then T-year risk ≈ rT (Silva, p.77).

9

Example: r = 0.051 per year, T = 5 years

$$1 - \exp(-rT) = 1 - \exp(-0.051 \cdot 5) = 1 - \exp(-0.255) = 0.225$$

which is smaller than the approximation $r \cdot T = 0.051 \cdot 5 = 0.255$.

Note that it is quite ‘smart’ that, given a (constant) rate, it is possible to calculate the risk for any chosen follow-up period, T.
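The relation between rate and risk can be checked numerically. The sketch below assumes a constant rate and no competing events, as on the slides; the function names `risk_from_rate` and `risk_by_subintervals` are made up for this illustration.

```python
import math


def risk_from_rate(rate, T):
    """T-year risk under a constant rate: 1 - exp(-rate * T)."""
    return 1.0 - math.exp(-rate * T)


def risk_by_subintervals(rate, T, N):
    """Risk obtained by splitting [0, T] into N intervals of length h = T / N.

    The probability of no event in one interval is taken as (1 - rate * h).
    """
    h = T / N
    return 1.0 - (1.0 - rate * h) ** N


rate, T = 0.051, 5
print(f"1 - exp(-rT) = {risk_from_rate(rate, T):.3f}")
print(f"rT           = {rate * T:.3f}")
for N in (5, 50, 5000):
    print(f"N = {N:4d}: {risk_by_subintervals(rate, T, N):.4f}")
```

As N grows, the sub-interval product approaches 1 − exp(−rT), while rT alone overstates the risk unless rT is small.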

10

Corresponding measure of association

RATE RATIO

Sometimes called “relative risk”, which is bad terminology but not totally silly, because if the disease is “rare”:

$$\frac{\text{Risk}_1}{\text{Risk}_2} \approx \frac{\text{Rate}_1 \cdot T}{\text{Rate}_2 \cdot T} = \frac{\text{Rate}_1}{\text{Rate}_2}$$

95% confidence limits for the rate ratio, RR:

1. Calculate $\ln(RR)$

2. Calculate $L_2 = \ln(RR) + 1.96 \sqrt{\frac{1}{a_1} + \frac{1}{a_2}}$

   $L_1 = \ln(RR) - 1.96 \sqrt{\frac{1}{a_1} + \frac{1}{a_2}}$

3. The 95% confidence limits are: from $\exp(L_1)$ to $\exp(L_2)$

11

Example: Nurses health study (p. 179)

            BC cases                  Person-years
Age group   OC          Non-OC        OC            Non-OC
45-49       a1 = 204    a2 = 240      y1 = 94029    y2 = 128528

Table 8.12: Cohort study of breast cancer (BC) and oral contraceptive (OC) use.

            OC users                         Non-OC users
Rate        rO = 204/94029                   rN = 240/128528
               = 0.00217 per year               = 0.00187 per year
ln(rate)    lO = ln(0.00217) = −6.133        lN = ln(0.00187) = −6.283
L2          −6.133 + 1.96/√204 = −5.996      −6.157
L1          −6.270                           −6.410
exp(L1)     0.00189 per year                 0.00165 per year
exp(L2)     0.00249 per year                 0.00212 per year


12

Example (contd.)

$$RR = \frac{0.00217}{0.00187} = 1.162, \qquad \ln(RR) = 0.150$$

$$L_2 = 0.150 + 1.96 \sqrt{\frac{1}{204} + \frac{1}{240}} = 0.337$$

$$L_1 = 0.150 - 1.96 \sqrt{\frac{1}{204} + \frac{1}{240}} = -0.0366$$

exp(L1) = 0.964, exp(L2) = 1.400
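The rate ratio and its limits for the Nurses health study data can be sketched as follows, assuming SD(ln RR) = √(1/a1 + 1/a2) as in the steps given earlier. The function name `rate_ratio_ci` is illustrative only.

```python
import math


def rate_ratio_ci(a1, y1, a2, y2, z=1.96):
    """Rate ratio (group 1 vs. group 2) with log-based confidence limits.

    a1, a2: numbers of events; y1, y2: person-years at risk.
    Returns (rate ratio, lower limit, upper limit).
    """
    rr = (a1 / y1) / (a2 / y2)
    sd_log_rr = math.sqrt(1 / a1 + 1 / a2)
    lower = math.exp(math.log(rr) - z * sd_log_rr)  # exp(L1)
    upper = math.exp(math.log(rr) + z * sd_log_rr)  # exp(L2)
    return rr, lower, upper


# OC users vs. non-OC users, age group 45-49
rr, lower, upper = rate_ratio_ci(a1=204, y1=94029, a2=240, y2=128528)
print(f"RR = {rr:.3f}, 95% CI from {lower:.3f} to {upper:.3f}")
```

The interval contains 1, which agrees with the non-significant chi-square test for the same data.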

13

Exercise:

1. Calculate the crude rates for stomach cancer (SC) in Cali and Birmingham (with 95% confidence limits) based on the table in Silva, p.71.

2. Calculate the rate ratio with 95% confidence limits.

3. Calculate the chi-square test (to be defined in the following) for comparing the two rates.

14

Solution (1).

SC cases                Person-years
Cali    Birmingham      Cali                     Birmingham
620     3468            5 · 622922 = 3114610     4 · 2556200 = 10224800

            Cali                               Birmingham
Rate        rC = 620/3114610                   rB = 3468/10224800
               = 0.000199 per year                = 0.000339 per year
ln(rate)    ln(rC) = −8.522                    ln(rB) = −7.989
L2          −8.522 + 1.96/√620 = −8.443        −7.956
L1          −8.522 − 1.96/√620 = −8.601        −8.022
exp(L1)     0.000184 per year                  0.000328 per year
exp(L2)     0.000215 per year                  0.000351 per year



15

Solution (2).

$$RR = \frac{0.000199}{0.000339} = 0.59, \qquad \ln(RR) = -0.533$$

$$L_2 = -0.533 + 1.96 \sqrt{\frac{1}{620} + \frac{1}{3468}} = -0.447$$

$$L_1 = -0.533 - 1.96 \sqrt{\frac{1}{620} + \frac{1}{3468}} = -0.618$$

exp(L1) = 0.54, exp(L2) = 0.64.

16

Test for comparing two rates

Nurses health study

                Events (BC)    Person-years
OC-users        204 (a1)       94029 (y1)
Non-OC-users    240 (a2)       128528 (y2)
Total           444 (a)        222557 (y)

Here: a = a1 + a2, y = y1 + y2.

17

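“Mantel-Haenszel” idea:

Compare: OBServed = 204 = a1

with: EXPected $= y_1 \cdot \frac{a}{y} = 94029 \cdot \frac{444}{222557} = 187.6$

$$SD(a_1) = \sqrt{a \cdot \frac{y_1 \cdot y_2}{y \cdot y}} = \sqrt{444 \cdot \frac{94029}{222557} \cdot \frac{128528}{222557}} = 10.41$$

$$\text{Test statistic} = \left(\frac{OBS - EXP}{SD}\right)^2 = \left(\frac{204 - 187.6}{10.41}\right)^2 = 2.48$$

The chi-square table (1 d.f.) gives 0.10 < P < 0.25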
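The test can be sketched in code as below. The function name `two_rate_test` is hypothetical. Instead of a table lookup, the P-value uses the fact that a chi-square variable with 1 d.f. is a squared standard normal, so its upper tail probability equals `erfc(sqrt(x / 2))`.

```python
import math


def two_rate_test(a1, y1, a2, y2):
    """Chi-square test (1 d.f.) comparing two rates.

    Compares the observed events a1 in group 1 with the number expected
    if the total a = a1 + a2 were split in proportion to person-years.
    Returns (expected, sd, test statistic, P-value).
    """
    a = a1 + a2
    y = y1 + y2
    expected = y1 * a / y
    sd = math.sqrt(a * y1 * y2 / (y * y))
    statistic = ((a1 - expected) / sd) ** 2
    # Upper tail of chi-square with 1 d.f. = P(|Z| > sqrt(statistic))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, sd, statistic, p_value


expected, sd, statistic, p_value = two_rate_test(204, 94029, 240, 128528)
print(f"EXP = {expected:.1f}, SD = {sd:.2f}")
print(f"chi-square = {statistic:.2f}, P = {p_value:.3f}")
```

The computed P-value should fall inside the range read off the chi-square table.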

18

Solution to exercise (3).

OBS = 620

$$EXP = 3114610 \cdot \frac{4088}{13339410} = 954.5$$

$$SD = \sqrt{4088 \cdot \frac{3114610}{13339410} \cdot \frac{10224800}{13339410}} = 27.0$$

$$\left(\frac{OBS - EXP}{SD}\right)^2 = 152.9 \sim \text{chi-square (1 d.f.)}$$

P < 0.001

19

How can we adjust/standardise for age when comparing two groups?

Two types of standardisation exist (cf. Silva, p. 70-79):

– “DIRECT”

– “INDIRECT”

“Direct”: Compare weighted averages of the age-specific rates in the two groups, using some standard age distribution as weights.

“Indirect”: In each group, compare the observed number of events with what one would have expected, had the group been exposed to some standard age-specific rates.

Other ways of adjustment (also for factors other than age):

– STRATIFIED ANALYSIS

– REGRESSION ANALYSIS

20

Example: 1970 US mortality data (Kahn & Sempos, 1989).

         California (a)                      Maine (b)
Age      Pop. in   No. of    Rate per        Pop. in   No. of    Rate per
         1000      deaths    1000 ys.        1000      deaths    1000 ys.
< 15     5524      8751      1.6             286       535       1.9
15-24    3558      4747      1.3             168       192       1.1
25-34    2677      4036      1.5             110       152       1.4
35-44    2359      6701      2.8             109       313       2.9
45-54    2330      15675     6.7             110       759       6.9
55-64    1704      26276     15.4            94        1622      17.3
65-74    1105      36259     32.8            69        2690      39.0
75+      696       63840     91.7            46        4788      104.1
Total    19953     166285    8.3             992       11051     11.1

“Crude rates”

(a) California: 166285/19953000 = 8.3 per 1000 ys.

(b) Maine: 11051/992000 = 11.1 per 1000 ys.

RR = 1.34

21

The age distributions in California and Maine differ, so the apparent difference may be (partly) ascribed to this.

         California             Maine
Age      Pop. in 1000    %      Pop. in 1000    %
< 15     5524            28     286             29
15-24    3558            18     168             17
25-34    2677            13     110             11
35-44    2359            12     109             11
45-54    2330            12     110             11
55-64    1704            9      94              9
65-74    1105            6      69              7
75+      696             3      46              5
Total    19953           100    992             100

22

Example: 1970 US mortality data.

United States (Kahn & Sempos, 1989)

Age      Pop. in   Weight   No. of     Rate per
         1000               deaths     1000 ys.
< 15     57900     .285     103062     1.8
15-24    35441     .174     45261      1.3
25-34    24907     .123     39193      1.6
35-44    23088     .114     72617      3.1
45-54    23220     .114     169517     7.3
55-64    18590     .091     308373     16.6
65-74    12436     .061     445531     35.8
75+      7630      .038     736758     96.6
Total    203212    1.000    1920312    9.4

Directly standardised rates:

DSRa = 0.285 × 1.6 + 0.174 × 1.3 + · · · + 0.038 × 91.7 = 8.8 per 1000 ys.

DSRb = 0.285 × 1.9 + 0.174 × 1.1 + · · · + 0.038 × 104.1 = 9.9 per 1000 ys.

RR = 1.12
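Direct standardisation is a weighted average, which the following sketch makes explicit. It uses the age-specific rates for California and Maine and the US weights from the tables shown earlier; the function name `directly_standardised_rate` is chosen for illustration.

```python
# US 1970 standard age distribution (weights), youngest to oldest age group
weights = [0.285, 0.174, 0.123, 0.114, 0.114, 0.091, 0.061, 0.038]

# Age-specific death rates per 1000 years, same age groups
rates_california = [1.6, 1.3, 1.5, 2.8, 6.7, 15.4, 32.8, 91.7]
rates_maine = [1.9, 1.1, 1.4, 2.9, 6.9, 17.3, 39.0, 104.1]


def directly_standardised_rate(rates, weights):
    """Weighted average of age-specific rates with standard weights."""
    return sum(w * r for w, r in zip(weights, rates))


dsr_a = directly_standardised_rate(rates_california, weights)
dsr_b = directly_standardised_rate(rates_maine, weights)

print(f"DSR California = {dsr_a:.1f} per 1000 ys.")
print(f"DSR Maine      = {dsr_b:.1f} per 1000 ys.")
print(f"Standardised rate ratio = {dsr_b / dsr_a:.2f}")
```

The standardised ratio is clearly closer to 1 than the crude ratio of 1.34, reflecting Maine's older age distribution.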

23

Example from Silva, p.71.

Male stomach cancer cases:

         Cali                                               Birmingham
Age      Popu-     Person-    No. of     Rate per          Popu-      Person-     No. of     Rate per
         lation    years      cancers    100000 ys.        lation     years       cancers    100000 ys.
0-44     524220    2621100    39         1.5               1683600    6734400     79         1.2
45-64    76304     381520     266        69.7              581500     2326000     1037       44.6
65+      22398     111990     315        281.3             291100     1164400     2352       202.0
Total    622922    3114610    620        19.9              2556200    10224800    3468       33.9

Age distributions (%):

Age      Cali    Birmingham    Standard
0-44     84      66            74
45-64    12      23            19
65+      4       11            7
Total    100     100           100

24

The age distributions in Cali and Birmingham differ, so the difference between the crude rates may be (partly) ascribed to this.

Directly standardised rates:

DSRC = 0.74 × 1.5 + 0.19 × 69.7 + 0.07 × 281.3 = 34.04

DSRB = 0.74 × 1.2 + 0.19 × 44.6 + 0.07 × 202.0 = 23.50

(both per 100000 ys.)
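The same calculation can be carried out starting from the raw counts. This sketch assumes person-years are population × years of observation (5 years for Cali, 4 for Birmingham, as in the exercise solution); the dictionary layout and function names are illustrative.

```python
# Age bands: 0-44, 45-64, 65+
standard_weights = [0.74, 0.19, 0.07]

cali = {"cases": [39, 266, 315],
        "population": [524220, 76304, 22398],
        "years": 5}
birmingham = {"cases": [79, 1037, 2352],
              "population": [1683600, 581500, 291100],
              "years": 4}


def age_specific_rates(group, per=100_000):
    """Cases divided by person-years in each age band, per `per` years."""
    return [per * c / (p * group["years"])
            for c, p in zip(group["cases"], group["population"])]


def crude_rate(group, per=100_000):
    """All cases divided by all person-years, per `per` years."""
    return per * sum(group["cases"]) / (sum(group["population"]) * group["years"])


def dsr(group, weights):
    """Directly standardised rate: weighted average of age-specific rates."""
    return sum(w * r for w, r in zip(weights, age_specific_rates(group)))


dsr_cali = dsr(cali, standard_weights)
dsr_birmingham = dsr(birmingham, standard_weights)
crude_ratio = crude_rate(cali) / crude_rate(birmingham)
dsr_ratio = dsr_cali / dsr_birmingham

print(f"Crude rate ratio (Cali/Birmingham)        = {crude_ratio:.2f}")
print(f"DSR Cali = {dsr_cali:.2f}, DSR Birmingham = {dsr_birmingham:.2f}")
print(f"Standardised rate ratio (Cali/Birmingham) = {dsr_ratio:.2f}")
```

Since unrounded age-specific rates are used, the results may differ slightly in the second decimal from the hand calculation. With these data the crude ratio is below 1 while the standardised ratio is above 1, so age adjustment changes the direction of the comparison.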

25

Indirect standardisation

(a) California: Observed number of deaths = 166285

Expected number of deaths = 1.8 × 5524 + 1.3 × 3558 + · · · + 96.6 × 696 = 178253

Standardised mortality ratio:

$$SMR_a = \frac{166285}{178253} = 0.933 = \frac{93.3}{100}$$

(b) Maine: Obs. = 11051

Exp. = 10524

$$SMR_b = \frac{11051}{10524} = 1.050 = \frac{105.0}{100}$$
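The expected numbers and SMRs can be reproduced as in this sketch. It applies the US age-specific rates (per 1000 years) to the California and Maine populations (in thousands) from the tables shown earlier; the function names are illustrative.

```python
# US 1970 age-specific death rates per 1000 years (standard rates)
us_rates_per_1000 = [1.8, 1.3, 1.6, 3.1, 7.3, 16.6, 35.8, 96.6]

# Populations in thousands, same age groups (< 15, 15-24, ..., 75+)
pop_california = [5524, 3558, 2677, 2359, 2330, 1704, 1105, 696]
pop_maine = [286, 168, 110, 109, 110, 94, 69, 46]


def expected_deaths(std_rates_per_1000, pop_in_thousands):
    """Deaths expected if the standard rates applied to this population.

    A rate per 1000 times a population in thousands gives a count of deaths.
    """
    return sum(r * n for r, n in zip(std_rates_per_1000, pop_in_thousands))


def smr(observed, expected):
    """Standardised mortality ratio: observed / expected."""
    return observed / expected


exp_a = expected_deaths(us_rates_per_1000, pop_california)
exp_b = expected_deaths(us_rates_per_1000, pop_maine)
smr_a = smr(166285, exp_a)
smr_b = smr(11051, exp_b)

print(f"California: expected = {exp_a:.0f}, SMR = {smr_a:.3f}")
print(f"Maine:      expected = {exp_b:.0f}, SMR = {smr_b:.3f}")
```

An SMR below 1 means fewer deaths than expected under the standard rates; above 1 means more.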

26

If one wants to express the results as rates, then indirectly standardised rates may be computed:

ISRa = SMRa · 0.0094 = 8.8 per 1000 ys.

ISRb = SMRb · 0.0094 = 9.9 per 1000 ys.

where 0.0094 per year = 9.4 per 1000 ys. is the overall crude rate in the standard population.

Silva (p. 74) illustrates indirect standardisation by computing the number of cancer cases expected in Cali if the age-specific rates from Birmingham apply.

27

Discussion

– Standardisation is a classical technique for age-adjustment.

– For comparing groups statistically, stratified (and regression) analysis (to be discussed later) is superior.

– The average calculated in direct standardisation frequently provides an over-simplification for analysis (but it may be OK for graphical displays etc.).

– Indirect standardisation (SMR) may be used for comparing a sample with standard rates (for this, we need a confidence interval, see below), but for comparing two (or more) groups (samples), stratified (and regression) analysis is better.

28

Confidence interval for SMR

Best method (using ln!)

1. Calculate $\ln(SMR)$

2. Calculate $L_2 = \ln(SMR) + 1.96 \cdot \frac{1}{\sqrt{OBS}}$

   $L_1 = \ln(SMR) - 1.96 \cdot \frac{1}{\sqrt{OBS}}$

3. The desired 95% confidence limits are from $\exp(L_1)$ to $\exp(L_2)$

Example: Maine compared to US

1. $\ln(1.050) = 0.049$

2. $L_2 = 0.049 + 1.96 \cdot \frac{1}{\sqrt{11051}} = 0.068$,

   $L_1 = 0.049 - 1.96 \cdot \frac{1}{\sqrt{11051}} = 0.030$

3. exp(L1) = 1.030, exp(L2) = 1.070

Exercise: California!

29

Solution.

SMR for California:

$$\frac{166285}{178253} = 0.933$$

1. $\ln(SMR) = -0.069$

2. $L_1 = -0.069 - 1.96 \cdot \frac{1}{\sqrt{166285}} = -0.074$

   $L_2 = -0.069 + 1.96 \cdot \frac{1}{\sqrt{166285}} = -0.064$

3. exp(L1) = 0.929

   exp(L2) = 0.938
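Both SMR intervals (Maine and California) can be checked with a short function. This is a sketch assuming SD(ln SMR) = 1/√OBS as in the steps given earlier, with the expected numbers treated as fixed; the function name `smr_ci` is illustrative.

```python
import math


def smr_ci(observed, expected, z=1.96):
    """SMR with log-based confidence limits, SD(ln SMR) = 1 / sqrt(observed).

    Returns (SMR, lower limit, upper limit).
    """
    smr = observed / expected
    half_width = z / math.sqrt(observed)
    lower = math.exp(math.log(smr) - half_width)  # exp(L1)
    upper = math.exp(math.log(smr) + half_width)  # exp(L2)
    return smr, lower, upper


maine = smr_ci(observed=11051, expected=10524)
california = smr_ci(observed=166285, expected=178253)

for name, (smr, lower, upper) in (("Maine", maine), ("California", california)):
    print(f"{name}: SMR = {smr:.3f}, 95% CI from {lower:.3f} to {upper:.3f}")
```

The printed limits may differ in the third decimal from the hand calculations, which round ln(SMR) before exponentiating. Neither interval contains 1, and the California interval is much narrower because it is based on far more observed deaths.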

30
