Statistics for MPH: 3
17 September 2015
www.biostat.ku.dk/~pka/mph15
Analysis of cohort studies, rates, standardisation (Silva: 70-80, 110-133)
Per Kragh Andersen
1
From the 2nd week of statistics teaching:
– I should
1. understand that the relative risk is a sensible measure of association between determinant and outcome in cohort and cross-sectional studies
2. be able to calculate the relative risk with corresponding confidence limits from a given table
3. be able to calculate the “chi-square” test statistic for the hypothesis “relative risk = 1” from a given table
4. gradually begin to get used to the concepts: null hypothesis, test statistic, distribution of the test statistic, P-value, significance level, acceptance/rejection of a hypothesis
5. understand that there is a close connection between confidence limits and hypothesis tests
2
From the 2nd week of statistics teaching:
– on the other hand, I do not necessarily need
1. to have understood how the formula for SD(ln RR) arises
2. to have understood how the standard deviation in the “Mantel-Haenszel test statistic” arises
3. to have understood why the value of the test statistic must be looked up in the “χ2 table with 1 degree of freedom”
4. to have fully understood all the concepts mentioned under 4. above
3
Cohort studies

Classical example: Evans County Study. 609 white men, aged 40-76, followed for 7 years from 1960.

                          CHD
CHL > 260 mg/100 mL    Yes     No    Total
Yes                     14     91      105
No                      57    447      504
Total                   71    538      609

Summarizing data in this way, it has been assumed that everyone has been followed for 7 years - this is more an exception than the rule:
• People leave the study due to loss of follow-up
• People may enter the study at different time points
We wish to take the varying time at risk into account.
4
The 7-year CHD-risk (“cumulative incidence”) expresses the proportion of the cohort at risk at entry having a CHD-event during the 7 years of follow-up.
It is, therefore, ≥ the 5-year risk
and ≤ the 10-year risk
Alternative frequency measure: THE RATE, r(t), expressing risk per time unit at time t (∼ “speed” at t). For h > 0 (small):
$r(t) \approx \mathrm{Prob}(\text{disease before time } t+h \mid \text{no disease at time } t)/h.$
We estimate a constant rate as (Silva, p.63):
$r = \frac{\text{number of (CHD-) events observed}}{\text{total person-time at risk}} = \frac{a}{y}$
Alternative names: INCIDENCE RATE, INCIDENCE DENSITY, HAZARD (RATE)
5
Calculation of person-years

Calculation of person-years of observation for six individuals (see also Silva, p.62).

Subject  Date of birth    Observation began  Observation ended  Death (0=no, 1=yes)
1        Jan. 4, 1897     July 11, 1954      Dec. 31, 1962      0
2        Sept. 5, 1884    Aug. 3, 1954       Nov. 25, 1960      1
3        Dec. 16, 1904    Oct. 25, 1954      Dec. 31, 1962      0
4        Jan. 16, 1899    Nov. 1, 1954       Dec. 31, 1962      0
5        Apr. 9, 1912     Feb. 19, 1957      Dec. 31, 1962      0
6        Feb. 22, 1910    Dec. 8, 1957       Aug. 18, 1959      1

Subject  Years  Months  Days  ≈ Years
1        8      5       20    8.47
2        6      3       22    6.31
3        8      4       6     8.35
4        8      1       30    8.17
5        5      10      12    5.87
6        1      8       10    1.69
Total                         38.86
6
$r = \frac{2}{38.86 \text{ years}} = 0.051$ per year = 51 per 1000 years
Note that while the risk is between 0 and 1, the rate has no upper limit (the value depends on the time unit used).
Sometimes approximations are used for person-years when dates are unavailable (see, e.g., Silva p.71).
What does the fraction 2/6 tell us? Not much! If it is to be thought of as a risk estimate then it is quite unclear to which follow-up period it corresponds.
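The person-years sum and the rate can be reproduced with a short sketch. It starts from the years/months/days columns of the table and assumes a conversion of 12 months and 365 days per year; that conversion is an assumption that happens to match the tabulated values, not something the slides state. All variable names are illustrative.

```python
# Observation time per subject as (years, months, days), taken from the table.
follow_up = [(8, 5, 20), (6, 3, 22), (8, 4, 6), (8, 1, 30), (5, 10, 12), (1, 8, 10)]
deaths = [0, 1, 0, 0, 0, 1]  # 0 = no, 1 = yes

def to_years(years, months, days):
    # Assumed conversion: 12 months per year, 365 days per year.
    return years + months / 12 + days / 365

person_years = [to_years(*t) for t in follow_up]
total_py = sum(person_years)   # total person-time at risk, y
events = sum(deaths)           # number of events, a
rate = events / total_py       # r = a / y

print([round(py, 2) for py in person_years])
print(f"total = {total_py:.2f} years, rate = {rate:.3f} per year")
```

Multiplying the rate by 1000 gives the "per 1000 years" form used on the slide.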
7
Confidence intervals for rates
Better (and more complicated) method:
1. Calculate $l = \ln(r)$
2. Calculate $L_2 = l + 1.96 \cdot \frac{1}{\sqrt{a}}$
   Calculate $L_1 = l - 1.96 \cdot \frac{1}{\sqrt{a}}$
3. The desired 95% confidence limits are: from $\exp(L_1)$ to $\exp(L_2)$
Example: $l = \ln(0.051) = -2.98$
$L_2 = -2.98 + 1.96 \cdot \frac{1}{\sqrt{2}} = -1.59$
$L_1 = -2.98 - 1.96 \cdot \frac{1}{\sqrt{2}} = -4.36$
95% confidence limits from exp(−4.36) = 0.013 per year to exp(−1.59) = 0.204 per year.
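The three steps can be sketched as a small function. The function name `rate_ci` and the parameter `z` are illustrative choices; the calculation itself follows the log-scale recipe above, with SD(ln r) = 1/√a.

```python
import math

def rate_ci(events, person_time, z=1.96):
    """Rate with approximate 95% limits computed on the log scale."""
    rate = events / person_time
    log_rate = math.log(rate)
    half_width = z / math.sqrt(events)  # 1.96 * SD(ln r), with SD = 1/sqrt(a)
    return rate, math.exp(log_rate - half_width), math.exp(log_rate + half_width)

rate, lower, upper = rate_ci(2, 38.86)
print(f"{rate:.3f} per year ({lower:.3f} to {upper:.3f})")
```

Because the function keeps the unrounded rate rather than l = −2.98, the limits may differ from the hand calculation in the third decimal.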
8
Relation between rate and risk

If the rate is (constant) = r (per year) what is then the T-year risk (if there are no other “competing” events)?

[Figure: time axis from 0 to T divided into N intervals, each of length h = T/N]

Probability of no event before T
$= (1 - r \cdot h) \cdot (1 - r \cdot h) \cdots (1 - r \cdot h) = (1 - r \cdot h)^N$
$= \left(1 - r \cdot \frac{T}{N}\right)^N \approx \exp(-r \cdot T)$
That is: T-year risk $= 1 - \exp(-rT)$.
If rT is small then T-year risk ≈ rT (Silva, p.77).
9
Example: r = 0.051 per year, T = 5 years
$1 - \exp(-rT) = 1 - \exp(-0.051 \cdot 5) = 1 - \exp(-0.255) = 0.225$
compared with the approximation $r \cdot T = 0.051 \cdot 5 = 0.255$
Note that it is quite ‘smart’ that, given a (constant) rate, it is possible to calculate the risk for any chosen follow-up period, T.
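A minimal sketch of the rate-to-risk conversion, assuming a constant rate and no competing events as on the slides. It prints the exact risk next to the approximation r·T for a few follow-up periods, and checks numerically that the product form approaches the exponential. The function name and the chosen values of T and N are illustrative.

```python
import math

def risk_from_rate(rate, T):
    """T-year risk under a constant rate and no competing events."""
    return 1 - math.exp(-rate * T)

r = 0.051  # per year, from the example
for T in (1, 5, 10, 20):
    exact = risk_from_rate(r, T)
    approx = r * T
    print(f"T = {T:2d}: risk = {exact:.3f}, r*T = {approx:.3f}")

# The product form (1 - r*T/N)^N approaches exp(-r*T) as N grows.
N = 10_000
survival_product = (1 - r * 5 / N) ** N
print(f"{survival_product:.4f} vs {math.exp(-r * 5):.4f}")
```

The gap between the exact risk and r·T widens as rT grows, which is why the approximation is only recommended when rT is small.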
10
Corresponding measure of association: RATE RATIO

Sometimes called “relative risk”, which is bad terminology, but not totally silly because if the disease is “rare”:
$\frac{\text{Risk}_1}{\text{Risk}_2} \approx \frac{\text{Rate}_1 \cdot T}{\text{Rate}_2 \cdot T} = \frac{\text{Rate}_1}{\text{Rate}_2}$
95% confidence limits for the rate ratio, RR:
1. Calculate $\ln(RR)$
2. Calculate $L_2 = \ln(RR) + 1.96 \sqrt{\frac{1}{a_1} + \frac{1}{a_2}}$
   $L_1 = \ln(RR) - 1.96 \sqrt{\frac{1}{a_1} + \frac{1}{a_2}}$
3. The 95% confidence limits are: from $\exp(L_1)$ to $\exp(L_2)$
11
Example: Nurses health study (p. 179)

             BC cases               Person-years
Age group    OC         Non-OC      OC            Non-OC
45-49        a1 = 204   a2 = 240    y1 = 94029    y2 = 128528

Table 8.12: Cohort study of breast cancer (BC) and oral contraceptive (OC) use.

OC users:
$r_O = \frac{204}{94029} = 0.00217$ per year
$l_O = \ln(0.00217) = -6.133$
$L_2 = -6.133 + 1.96 \cdot \frac{1}{\sqrt{204}} = -5.996$
$L_1 = -6.270$
$\exp(L_1) = 0.00189$ per year, $\exp(L_2) = 0.00249$ per year

Non-OC users:
$r_N = \frac{240}{128528} = 0.00187$ per year
$l_N = \ln(0.00187) = -6.283$
$L_2 = -6.157$
$L_1 = -6.410$
$\exp(L_1) = 0.00165$ per year, $\exp(L_2) = 0.00212$ per year
12
Example (contd.)
$RR = \frac{0.00217}{0.00187} = 1.162$
$\ln(RR) = 0.150$
$L_2 = 0.150 + 1.96 \sqrt{\frac{1}{204} + \frac{1}{240}} = 0.337$
$L_1 = 0.150 - 1.96 \sqrt{\frac{1}{204} + \frac{1}{240}} = -0.0366$
$\exp(L_1) = 0.964$, $\exp(L_2) = 1.400$
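The rate ratio and its limits can be sketched as follows, using the event counts and person-years from the Nurses health study example. The function name `rate_ratio_ci` and its argument order are illustrative; the formula is the log-scale one given for the rate ratio, with SD(ln RR) = √(1/a1 + 1/a2).

```python
import math

def rate_ratio_ci(a1, y1, a2, y2, z=1.96):
    """Rate ratio (group 1 vs group 2) with approximate 95% limits on the log scale."""
    rr = (a1 / y1) / (a2 / y2)
    sd_log_rr = math.sqrt(1 / a1 + 1 / a2)
    lower = math.exp(math.log(rr) - z * sd_log_rr)
    upper = math.exp(math.log(rr) + z * sd_log_rr)
    return rr, lower, upper

# OC users (group 1) versus non-OC users (group 2), age group 45-49.
rr, lower, upper = rate_ratio_ci(204, 94029, 240, 128528)
print(f"RR = {rr:.3f} ({lower:.3f} to {upper:.3f})")
```

Since the interval contains 1, it is consistent with no difference between the two rates, in line with the close connection between confidence limits and hypothesis tests.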
13
Exercise:
1. Calculate the crude rates for stomach cancer (SC) in Cali and Birmingham (with 95% confidence limits) based on the table in Silva, p.71.
2. Calculate the rate ratio with 95% confidence limits.
3. Calculate the chi-square test (to be defined in the following) for comparing the two rates.
14
Solution (1).

              SC cases    Person-years
Cali          620         5 · 622922 = 3114610
Birmingham    3468        4 · 2556200 = 10224800

Cali:
$r_C = \frac{620}{3114610} = 0.000199$ per year
$\ln(r_C) = -8.522$
$L_2 = -8.522 + 1.96 \sqrt{\frac{1}{620}} = -8.443$
$L_1 = -8.522 - 1.96 \sqrt{\frac{1}{620}} = -8.601$
$\exp(L_1) = 0.000184$ per year, $\exp(L_2) = 0.000215$ per year

Birmingham:
$r_B = \frac{3468}{10224800} = 0.000339$ per year
$\ln(r_B) = -7.989$
$L_2 = -7.956$
$L_1 = -8.022$
$\exp(L_1) = 0.000328$ per year, $\exp(L_2) = 0.000351$ per year
15
Solution (2).
$RR = \frac{0.000199}{0.000339} = 0.59$, $\ln(RR) = -0.533$
$L_2 = -0.533 + 1.96 \sqrt{\frac{1}{620} + \frac{1}{3468}} = -0.447$,
$L_1 = -0.533 - 1.96 \sqrt{\frac{1}{620} + \frac{1}{3468}} = -0.618$,
$\exp(L_1) = 0.54$, $\exp(L_2) = 0.64$.
16
Test for comparing two rates

Nurses health study

                 Events (BC)    Person-years
OC-users         204 (a1)       94029 (y1)
Non-OC-users     240 (a2)       128528 (y2)
Total            444 (a)        222557 (y)

Here: a = a1 + a2, y = y1 + y2.
17
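“Mantel-Haenszel” idea:
Compare: OBServed $= 204 = a_1$
with: EXPected $= 94029 \cdot \frac{444}{222557} = y_1 \cdot \frac{a}{y} = 187.6$
$SD(a_1) = \sqrt{a \cdot \frac{y_1 \cdot y_2}{y \cdot y}} = \sqrt{444 \cdot \frac{94029}{222557} \cdot \frac{128528}{222557}} = 10.41$
Test statistic $= \left(\frac{OBS - EXP}{SD}\right)^2 = \left(\frac{204 - 187.6}{10.41}\right)^2 = 2.48$
The chi-square table (1 d.f.) gives 0.10 < P < 0.25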
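The OBS/EXP/SD calculation can be sketched in a few lines. The function name `two_rate_test` is illustrative. Instead of a table lookup, the sketch obtains the P-value from the standard identity that a chi-square variable with 1 d.f. is the square of a standard normal variable, so its upper tail equals erfc(√(x/2)); this replaces the table and is not part of the slides.

```python
import math

def two_rate_test(a1, y1, a2, y2):
    """Chi-square (1 d.f.) statistic comparing two rates via OBS, EXP and SD."""
    a, y = a1 + a2, y1 + y2
    expected = y1 * a / y                      # EXP = y1 * a / y
    sd = math.sqrt(a * y1 * y2 / (y * y))      # SD(a1)
    statistic = ((a1 - expected) / sd) ** 2
    # Upper tail of chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, sd, statistic, p_value

expected, sd, statistic, p_value = two_rate_test(204, 94029, 240, 128528)
print(f"EXP = {expected:.1f}, SD = {sd:.2f}, chi-square = {statistic:.2f}, P = {p_value:.3f}")
```

The same function applies to any pair of event counts and person-years, for example the Cali and Birmingham data from the exercise.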
18
Solution to exercise (3).
OBS = 620
$EXP = 3114610 \cdot \frac{4088}{13339410} = 954.5$
$SD = \sqrt{4088 \cdot \frac{3114610}{13339410} \cdot \frac{10224800}{13339410}} = 27.0$
$\left(\frac{OBS - EXP}{SD}\right)^2 = 152.9 \sim$ chi-square (1 d.f.)
P < 0.001
19
How can we adjust/standardise for age when comparing two groups?
Two types of standardisation exist (cf. Silva p. 70-79):
– “DIRECT”
– “INDIRECT”
“Direct”: Compare weighted averages of the age-specific rates in the two groups using some standard age-distribution as weights.
“Indirect”: In each group, compare the observed number of events with what one would have expected, had the group been exposed to some standard age-specific rates.
Other ways of adjustment (also for other factors than age):
– STRATIFIED ANALYSIS
– REGRESSION ANALYSIS
20
Example: 1970 US mortality data (Kahn & Sempos, 1989).

          California (a)                      Maine (b)
Age       Pop. in   No. of    Rate per        Pop. in   No. of    Rate per
          1000      deaths    1000 ys.        1000      deaths    1000 ys.
< 15      5524      8751      1.6             286       535       1.9
15-24     3558      4747      1.3             168       192       1.1
25-34     2677      4036      1.5             110       152       1.4
35-44     2359      6701      2.8             109       313       2.9
45-54     2330      15675     6.7             110       759       6.9
55-64     1704      26276     15.4            94        1622      17.3
65-74     1105      36259     32.8            69        2690      39.0
75+       696       63840     91.7            46        4788      104.1
Total     19953     166285    8.3             992       11051     11.1

“Crude rates”
(a) California: $\frac{166285}{19953000} = 8.3$ per 1000 ys.
(b) Maine: $\frac{11051}{992000} = 11.1$ per 1000 ys.
RR = 1.34
21
The age distributions in California and Maine differ so the apparent difference may be (partly) ascribed to this.

          California              Maine
Age       Pop. in 1000    %       Pop. in 1000    %
< 15      5524            28      286             29
15-24     3558            18      168             17
25-34     2677            13      110             11
35-44     2359            12      109             11
45-54     2330            12      110             11
55-64     1704            9       94              9
65-74     1105            6       69              7
75+       696             3       46              5
Total     19953           100     992             100
22
Example: 1970 US mortality data.

          United States
Age       Pop. in    Weight    No. of     Rate per
          1000                 deaths     1000 ys.
< 15      57900      .285      103062     1.8
15-24     35441      .174      45261      1.3
25-34     24907      .123      39193      1.6
35-44     23088      .114      72617      3.1
45-54     23220      .114      169517     7.3
55-64     18590      .091      308373     16.6
65-74     12436      .061      445531     35.8
75+       7630       .038      736758     96.6
Total     203212     1.000     1920312    9.4
(Kahn & Sempos, 1989)

Directly standardised rates:
$DSR_a = 0.285 \times 1.6 + 0.174 \times 1.3 + \cdots + 0.038 \times 91.7 = 8.8$ per 1000 ys.
$DSR_b = 0.285 \times 1.9 + 0.174 \times 1.1 + \cdots + 0.038 \times 104.1 = 9.9$ per 1000 ys.
RR = 1.12
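Direct standardisation is a weighted average, which the following sketch spells out for California and Maine. The weights and age-specific rates are copied from the tables; the list and function names are illustrative.

```python
# Standard weights (US 1970 age distribution) and age-specific rates per 1000 years,
# copied from the tables; age groups run from <15 to 75+.
weights = [0.285, 0.174, 0.123, 0.114, 0.114, 0.091, 0.061, 0.038]
rates_california = [1.6, 1.3, 1.5, 2.8, 6.7, 15.4, 32.8, 91.7]
rates_maine = [1.9, 1.1, 1.4, 2.9, 6.9, 17.3, 39.0, 104.1]

def directly_standardised_rate(weights, rates):
    """Weighted average of age-specific rates using the standard weights."""
    return sum(w * r for w, r in zip(weights, rates))

dsr_a = directly_standardised_rate(weights, rates_california)
dsr_b = directly_standardised_rate(weights, rates_maine)
print(f"DSR California = {dsr_a:.1f}, DSR Maine = {dsr_b:.1f}, ratio = {dsr_b / dsr_a:.2f}")
```

Using the same weights for both states is what removes the effect of their different age distributions from the comparison.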
23
Example from Silva, p.71.

Male stomach cancer cases:

         Cali                                          Birmingham
Age      Popu-     Person-    No. of     Rate per      Popu-      Person-     No. of     Rate per
         lation    years      cancers    100000 ys.    lation     years       cancers    100000 ys.
0-44     524220    2621100    39         1.5           1683600    6734400     79         1.2
45-64    76304     381520     266        69.7          581500     2326000     1037       44.6
65+      22398     111990     315        281.3         291100     1164400     2352       202.0
Total    622922    3114610    620        19.9          2556200    10224800    3468       33.9

Age distributions (%):

Age      Cali    Birmingham    Standard
0-44     84      66            74
45-64    12      23            19
65+      4       11            7
Total    100     100           100
24
The age distributions in Cali and Birmingham differ so the difference between the crude rates may be (partly) ascribed to this.
Directly standardised rates:
$DSR_C = 0.74 \times 1.5 + 0.19 \times 69.7 + 0.07 \times 281.3 = 34.04$
$DSR_B = 0.74 \times 1.2 + 0.19 \times 44.6 + 0.07 \times 202.0 = 23.50$
(both per 100000 ys.)
25
Indirect standardisation

(a) California: Obs. number of deaths = 166285
Expected number of deaths $= 1.8 \times 5524 + 1.3 \times 3558 + \cdots + 96.6 \times 696 = 178253$
Standardised mortality ratio:
$SMR_a = \frac{166285}{178253} = 0.933 = \frac{93.3}{100}$
(b) Maine: Obs. = 11051
Exp. = 10524
$SMR_b = \frac{11051}{10524} = 1.050 = \frac{105.0}{100}$
26
If one wants to express the results as rates then indirectly standardised rates may be computed:
$ISR_a = SMR_a \cdot 0.0094 = 8.8$ per 1000 ys.
$ISR_b = SMR_b \cdot 0.0094 = 9.9$ per 1000 ys.
where 0.0094 per year = 9.4 per 1000 ys. is the overall crude rate in the standard population.
Silva (p. 74) illustrates indirect standardisation by computing the number of cancer cases expected in Cali if the age-specific rates from Birmingham apply.
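The expected numbers, SMRs and indirectly standardised rates for California and Maine can be reproduced with a short sketch. Populations (in thousands) and US standard rates (per 1000 years) are copied from the earlier tables; the function and variable names are illustrative.

```python
# Populations in thousands by age group (<15 ... 75+) and US standard rates per 1000 years.
pop_california = [5524, 3558, 2677, 2359, 2330, 1704, 1105, 696]
pop_maine = [286, 168, 110, 109, 110, 94, 69, 46]
us_rates = [1.8, 1.3, 1.6, 3.1, 7.3, 16.6, 35.8, 96.6]
us_crude_rate = 9.4  # per 1000 years, overall rate in the standard population

def expected_deaths(pop_thousands, standard_rates):
    # (population in 1000) x (rate per 1000 years) summed over age groups.
    return sum(n * r for n, r in zip(pop_thousands, standard_rates))

exp_a = expected_deaths(pop_california, us_rates)
exp_b = expected_deaths(pop_maine, us_rates)
smr_a = 166285 / exp_a           # observed / expected, California
smr_b = 11051 / exp_b            # observed / expected, Maine
isr_a = smr_a * us_crude_rate    # indirectly standardised rate per 1000 years
isr_b = smr_b * us_crude_rate

print(f"California: EXP = {exp_a:.0f}, SMR = {smr_a:.3f}, ISR = {isr_a:.1f}")
print(f"Maine:      EXP = {exp_b:.0f}, SMR = {smr_b:.3f}, ISR = {isr_b:.1f}")
```

Note that only the standard rates and the study populations are needed for the expected numbers; the age-specific death counts in California and Maine are not used.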
27
Discussion
– Standardisation is a classical technique for age-adjustment.
– For comparing groups statistically, stratified (and regression) analysis (to be discussed later) is superior.
– The average calculated in direct standardisation frequently provides an over-simplification for analysis (but it may be OK for graphical displays etc.).
– Indirect standardisation (SMR) may be used for comparing a sample with standard rates (for this, we need a confidence interval, see below) but for comparing two (or more) groups (samples) stratified (and regression) analysis is better.
28
Confidence interval for SMR

Best method (using ln!)
1. Calculate $\ln(SMR)$
2. Calculate $L_2 = \ln(SMR) + 1.96 \cdot \frac{1}{\sqrt{OBS}}$
   $L_1 = \ln(SMR) - 1.96 \cdot \frac{1}{\sqrt{OBS}}$
3. The desired 95% confidence limits are from $\exp(L_1)$ to $\exp(L_2)$
Example: Maine compared to US
1. $\ln(1.050) = 0.049$
2. $L_2 = 0.049 + 1.96 \cdot \frac{1}{\sqrt{11051}} = 0.068$,
   $L_1 = 0.049 - 1.96 \cdot \frac{1}{\sqrt{11051}} = 0.030$
3. $\exp(L_1) = 1.030$, $\exp(L_2) = 1.070$
Exercise: California!
29
Solution.

SMR for California: $\frac{166285}{178253} = 0.933$
1. $\ln(SMR) = -0.069$
2. $L_1 = -0.069 - 1.96 \cdot \frac{1}{\sqrt{166285}} = -0.074$
   $L_2 = -0.069 + 1.96 \cdot \frac{1}{\sqrt{166285}} = -0.064$
3. $\exp(L_1) = 0.929$
   $\exp(L_2) = 0.938$
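Both SMR intervals (Maine and California) can be checked with one small function. The name `smr_ci` is illustrative; the calculation is the log-scale method above, with SD(ln SMR) = 1/√OBS and the expected number treated as fixed.

```python
import math

def smr_ci(observed, expected, z=1.96):
    """SMR with approximate 95% limits computed on the log scale."""
    smr = observed / expected
    half_width = z / math.sqrt(observed)  # 1.96 * SD(ln SMR), with SD = 1/sqrt(OBS)
    return smr, math.exp(math.log(smr) - half_width), math.exp(math.log(smr) + half_width)

maine = smr_ci(11051, 10524)
california = smr_ci(166285, 178253)
for name, (smr, lower, upper) in (("Maine", maine), ("California", california)):
    print(f"{name}: SMR = {smr:.3f} ({lower:.3f} to {upper:.3f})")
```

Neither interval contains 1, so mortality in both states differs from the US standard rates, in opposite directions.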
30