Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)

Upload: anna-lawson

Post on 29-Dec-2015


Page 1: Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)

Health and Disease in Populations 2001

 Sources of variation (2)

 Jane Hutton

(Paul Burton)

Page 2

Informal lecture objectives

Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data

Objective 2: To understand the concept of random variation

Page 3

...

Objective 3: Describe how ‘observed’ values provide knowledge of the ‘true’ values using:

a) tests of hypotheses about the true value

b) confidence intervals, which give a range that includes the ‘true’ value with a specified probability.

Page 4

Neural tube defects in Western Australia (1975-2000) – hypothetical data

Page 5

Hypothesis testing

1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis were true.

2. If this probability is very small, then either

a) something very unlikely has occurred; or

b) the hypothesis is wrong

3. It is then reasonable to conclude that the data are incompatible with the hypothesis.

The probability is called a ‘p-value’
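As an illustration of step 1, here is a minimal Python sketch of an exact two-sided p-value for the hypothesis that a coin is fair. The function name `two_sided_p_value` is hypothetical, and treating "as extreme or more extreme" as "outcomes no more probable than the one observed" is an assumption of this sketch, not something stated in the lecture.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for the hypothesis that the coin is fair."""
    # Probability of each possible number of heads if the hypothesis is true.
    probs = [comb(flips, k) / 2 ** flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up every outcome that is no more probable than the one observed
    # (small tolerance guards against floating-point ties).
    return sum(p for p in probs if p <= observed + 1e-12)


print(two_sided_p_value(3, 3))    # 0.25
print(two_sided_p_value(10, 10))  # 0.001953125
```

With only 3 flips the most extreme possible result still gives p = 0.25, whereas 10 heads in 10 flips gives a p-value of about 0.002.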

Page 6

Remember!

IMPORTANT: Think of the implications. Rejecting H0 is of little use without a conclusion.

p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051.

p=0.0001 and p=0.6 are easy to interpret.

False positive and false negative results can occur.

Statistical significance depends on sample size. Flip a coin 3 times: the minimum possible p-value is 0.25 (i.e. 2×1/8).

Statistically significant ≠ clinically important.

P values are widely used.

Page 7

Conclusions - range of values

Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:

a) Allowing us to test hypotheses about the true value

b) Providing confidence intervals, which give a range that includes the ‘true’ value with a specified probability.

Page 8

Any questions?

Page 9

Estimation

In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK, i.e. an incidence rate ratio (IRR) of 1.3.

H0 ‘rejected’ (p=0.01)

But, what is our ‘best guess’ at the true excess risk?

Page 10

Hypothesis    p-value   Rejected?
-20% risk     0.0001    Rejected
-10% risk     0.002     Rejected
Same risk     0.01      Rejected
+10% risk     0.1       Not rejected
+20% risk     0.4       Not rejected
+30% risk     0.5       Not rejected
+40% risk     0.2       Not rejected
+50% risk     0.1       Not rejected
+60% risk     0.01      Rejected

Informally:

Values outside the range [10% excess risk to 50% excess risk] are in some sense ‘inconsistent’ with the data.

The range [10% excess risk to 50% excess risk] probably includes the true value.
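The informal range can be read off mechanically from the list of p-values. The sketch below copies the hypothesised excess risks and p-values from this slide; the 0.05 cut-off (`ALPHA`) is an assumption, since the slide itself does not state the threshold used.

```python
# (hypothesised excess risk in %, p-value) pairs copied from the slide
results = [(-20, 0.0001), (-10, 0.002), (0, 0.01), (10, 0.1), (20, 0.4),
           (30, 0.5), (40, 0.2), (50, 0.1), (60, 0.01)]

ALPHA = 0.05  # assumed conventional cut-off for rejecting a hypothesis

# Keep only the hypotheses that the data do not reject.
not_rejected = [risk for risk, p in results if p >= ALPHA]
consistent_range = (min(not_rejected), max(not_rejected))

print(consistent_range)  # (10, 50)
```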

Page 11

The 95% confidence interval

A range which we can be 95% certain includes the true value of the underlying tendency.

The IRR for Warwick lies in (1.1, 1.5) with probability 95%.

Centred on the observed value (our best guess at the real underlying value).

So, the observed value always falls inside the 95% confidence interval.

Page 12

The 95% confidence interval

Fortunately, the link between hypothesis tests and confidence intervals means that we don’t have to calculate lots of p-values and check whether to reject each hypothesis. For this course, just use the ‘error factor’.

Instead, simply calculate the ‘observed value’ and a second quantity called the ‘error factor’ (e.f.). Then:

(observed value ÷ e.f.) is called the lower 95% confidence limit (CL)

(observed value × e.f.) is called the upper 95% confidence limit (CL)

The full range between the lower and upper 95% CLs is called the 95% confidence interval.

Page 13

An example

Observe 50 new cases of diabetes in a population of 2,000 people over 5 years.

Exposure = 2,000 × 5 = 10,000 person years

New cases = 50

Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{50}}\right) = 1.33$

Page 14

Diabetes example continued

Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{50}}\right) = 1.33$

Lower 95% CL = 0.005 ÷ 1.33 = 0.00376

Upper 95% CL = 0.005 × 1.33 = 0.00665

So, our best estimate of the true incidence is 5 cases per 1,000 person years, and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.
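The diabetes calculation can be sketched in Python as follows, assuming the error factor for a rate is exp(2 × √(1/cases)), which reproduces the 1.33 used on the slide. The function name `rate_ci` is hypothetical.

```python
from math import exp, sqrt


def rate_ci(cases: int, person_years: float) -> tuple[float, float, float]:
    """Return (rate, lower 95% CL, upper 95% CL) using the error factor."""
    rate = cases / person_years
    # Error factor for a rate: depends only on the number of cases.
    ef = exp(2 * sqrt(1 / cases))
    return rate, rate / ef, rate * ef


rate, lower, upper = rate_ci(50, 10_000)
print(f"{rate * 1000:.1f} per 1,000 p-y, "
      f"95% CI {lower * 1000:.1f} to {upper * 1000:.1f}")
```

Because the sketch does not round the error factor to 1.33 before using it, the upper limit comes out nearer 6.6 than the 6.7 shown on the slide; the difference is purely rounding.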

Page 15

As we get more data

We get more and more sure about the underlying value: the e.f. gets smaller and the 95% CI narrower.

Observe 200 new cases of diabetes in a population of 40,000 people over 1 year.

Estimated rate = 200/40,000 = 0.005 (same as before)

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{200}}\right) = 1.15$

Lower 95% CL = 0.005 ÷ 1.15 = 0.0043

Upper 95% CL = 0.005 × 1.15 = 0.0058

Best estimate is still 5 cases per 1,000 person years, but we are now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
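To see how the error factor shrinks as cases accumulate, the short sketch below evaluates it for 50 and 200 cases (the two examples in the lecture) and for 800 cases, which is a hypothetical extra value added only for illustration. The function name `error_factor` is likewise hypothetical.

```python
from math import exp, sqrt


def error_factor(cases: int) -> float:
    """Error factor for a rate based on the given number of cases."""
    return exp(2 * sqrt(1 / cases))


# 50 and 200 come from the lecture; 800 is an illustrative extra value.
for cases in (50, 200, 800):
    print(cases, round(error_factor(cases), 2))
```

Quadrupling the number of cases halves √(1/cases), so the interval narrows steadily but with diminishing returns.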

Page 16

Any questions?

Page 17

Confidence intervals

Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height etc.

NOT a range within which 95% of individual observations lie

Page 18

CASES   P-YRS   RATE
13      2,000   0.0065
10      2,000   0.005
6       2,000   0.003
14      2,000   0.007
7       2,000   0.0035

Total: 50 cases, 10,000 p-yrs.

Estimate = 0.005, 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years.

But rates in 3 individual years fall outside this range!!
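The point can be checked directly. This sketch takes the five yearly counts and the 95% CI limits (3.8 and 6.7 per 1,000 person years) from the slides and lists the yearly rates that fall outside the interval; the variable names are illustrative.

```python
# (cases, person-years) for each of the five years, from the slide
yearly = [(13, 2000), (10, 2000), (6, 2000), (14, 2000), (7, 2000)]

# 95% CI for the overall rate, per 1,000 person years, as quoted on the slide
ci_low, ci_high = 3.8, 6.7

rates_per_1000 = [1000 * cases / pyrs for cases, pyrs in yearly]
outside = [r for r in rates_per_1000 if r < ci_low or r > ci_high]

print(outside)  # [3.0, 7.0, 3.5]
```

The confidence interval describes uncertainty about the underlying rate, so individual yearly rates are free to fall outside it.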

Page 19

Another example

A sample of 50 students

Observed mean height = 1.675m

The 95% confidence interval for mean height is 1.65m to 1.70m

But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval. 

This is an important distinction 

Page 20

Inference on a rate ratio

Population 1: d1 cases in P1 person years

Population 2: d2 cases in P2 person years

Rate ratio = (d1/P1) ÷ (d2/P2)

Confidence interval and test:

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{d_1} + \tfrac{1}{d_2}}\right)$

Page 21

Estimation versus hypothesis testing

Estimation is more informative.

Estimation can incorporate a hypothesis test:

Hypothesis: the incidence of diabetes in population A is the same as that in B.

Data:
Population A: 12 cases in 2,000 patient years
Population B: 16 cases in 4,000 patient years

Rates:
A: 12/2,000 = 0.006
B: 16/4,000 = 0.004

Ratio of rates: A/B = 1.5

Page 22

Estimation vs hypothesis testing ….

Estimation can incorporate a hypothesis test:

Ratio of rates = 1 if rates are the same.

Ratio of rates: A/B = 1.5

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{12} + \tfrac{1}{16}}\right) = 2.15$

95% CI for rate ratio = 1.5 ÷ 2.15 = 0.70 to 1.5 × 2.15 = 3.23.

The range [0.70 to 3.23] includes 1.00: the data are consistent with the original hypothesis, so we cannot reject it (p>0.05). This does not prove it’s true!!
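A minimal Python sketch of the rate ratio calculation, assuming the error factor exp(2 × √(1/d1 + 1/d2)), which matches the 2.15 used for populations A and B. The function name `rate_ratio_ci` is hypothetical.

```python
from math import exp, sqrt


def rate_ratio_ci(d1: int, p1: float, d2: int, p2: float):
    """Return (rate ratio, lower 95% CL, upper 95% CL)."""
    ratio = (d1 / p1) / (d2 / p2)
    # Error factor for a rate ratio: depends on the case counts in both groups.
    ef = exp(2 * sqrt(1 / d1 + 1 / d2))
    return ratio, ratio / ef, ratio * ef


# Population A: 12 cases in 2,000 p-y; population B: 16 cases in 4,000 p-y
ratio, lower, upper = rate_ratio_ci(12, 2_000, 16, 4_000)
print(f"rate ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Using the unrounded error factor gives an upper limit of about 3.22 rather than the 3.23 on the slide; either way the interval includes 1.00.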

Page 23

Another example

80 deaths in 8,000 person-yrs (male)

50 deaths in 10,000 person-yrs (female)

RateM = 10 per 1,000 p-y; RateF = 5 per 1,000 p-y

Observed rate ratio (M/F) = 2.0

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{80} + \tfrac{1}{50}}\right) = 1.43$

95% CI: [2÷1.43 to 2×1.43] = [1.40 to 2.86]

Best estimate of the true rate ratio = 2.0, and we are 95% certain that the true rate ratio lies between 1.40 and 2.86. This range does not include 1.00, so we are able to reject the hypothesis of equality (p<0.05).

Page 24

Inference on an SMR

Observe O deaths

Expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population)

SMR = (O/E) × 100

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{O}}\right)$

Page 25

Example for SMR

On the basis of age-specific rates in the standard population, expect 50 deaths in the test population. Observe 60. (O=60, E=50)

SMR = (60/50)×100 = 120

$\text{e.f.} = \exp\left(2\sqrt{\tfrac{1}{60}}\right) = 1.29$

95% CI for SMR = 120 ÷/× 1.29 = 93 to 155. The CI includes 100, so the data are consistent with equality of death rate in the test and standard populations (p>0.05). But they are also consistent with e.g. a 50% excess, so this certainly doesn’t prove equality.
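The SMR example can be sketched the same way, assuming the error factor exp(2 × √(1/O)), which reproduces the 1.29 on the slide. The function name `smr_ci` is hypothetical.

```python
from math import exp, sqrt


def smr_ci(observed: int, expected: float):
    """Return (SMR, lower 95% CL, upper 95% CL), with 100 meaning equality."""
    smr = 100 * observed / expected
    # Error factor for an SMR: depends only on the observed number of deaths.
    ef = exp(2 * sqrt(1 / observed))
    return smr, smr / ef, smr * ef


smr, lower, upper = smr_ci(60, 50)
print(f"SMR {smr:.0f}, 95% CI {lower:.0f} to {upper:.0f}")  # SMR 120, 95% CI 93 to 155
```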

Page 26

Any questions?

Page 27

Summary

All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc) are subject to random variation

We always want to know about the underlying tendency = the true value of rates or risks

We use observed data to test hypotheses about the underlying value

We use observed data to estimate the underlying tendency

Page 28

Summary

In this course the best estimate of the true value of the underlying tendency is the observed value.

We express uncertainty by calculating error factors and deriving confidence intervals.

A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%.

It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions, you can only be uncertain what to conclude.

Page 29

Summary

Population A: rate=0.008; B: rate=0.002

Rate ratio = 4, e.f.=2, 95% CI [2 to 8]

All values in the 95% CI suggest A is higher than B, so we can safely conclude that A is higher than B. This is equivalent to saying the 95% CI does not include 1.00 (the null hypothesis), so the rate ratio is significantly different from 1.00 (p<0.05).

Page 30

Summary

Population A: rate=0.01; B: rate=0.005

Rate ratio = 2, 95% CI [0.5 to 8]

Values in the 95% CI are consistent with: A much higher than B; A somewhat lower than B; or both the same. Cannot really conclude anything too firmly.

In this case the 95% CI does include 1.00 (the null hypothesis), so the rate ratio is not significantly different from 1.00 (p>0.05) and we cannot reject the hypothesis of equality.

But this does not prove that the rates are equal
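The two summary cases amount to a simple decision rule, sketched below. The function name `interpret_ratio_ci` and the wording of its return values are illustrative, not taken from the lecture.

```python
def interpret_ratio_ci(lower: float, upper: float, null_value: float = 1.0) -> str:
    """Say whether a 95% CI for a rate ratio excludes the null value."""
    if lower > null_value:
        return "significantly above the null (p<0.05)"
    if upper < null_value:
        return "significantly below the null (p<0.05)"
    # The interval includes the null: equality is not rejected, but not proven.
    return "not significantly different from the null (p>0.05)"


print(interpret_ratio_ci(2, 8))    # rate ratio 4, 95% CI [2 to 8]
print(interpret_ratio_ci(0.5, 8))  # rate ratio 2, 95% CI [0.5 to 8]
```

The second case returns "not significantly different", which, as the slide stresses, is not the same as showing that the rates are equal.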

Page 31

Any questions?