health and disease in populations 2001 sources of variation (2) jane hutton (paul burton)

Health and Disease in Populations 2001

Sources of variation (2)

Jane Hutton

(Paul Burton)

Informal lecture objectives

Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data

Objective 2: To understand the concept of random variation

...

Objective 3: Describe how ‘observed’ values provide knowledge of the ‘true’ values using tests of hypotheses about the true value. Confidence intervals give a range which includes the ‘true’ value with a specific probability.

Neural tube defects in Western Australia (1975-2000) – hypothetical data

Hypothesis testing

1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis was true.

2. If this probability is very small, then either
a) something very unlikely has occurred; or
b) the hypothesis is wrong

3. It is then reasonable to conclude that the data are incompatible with the hypothesis.

The probability is called a ‘p-value’

Remember!

IMPORTANT: Think of the implications. Rejecting H0 is of little use without a conclusion

p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051

p=0.0001 and p=0.6 are easy to interpret

False positive and false negative results

Statistical significance depends on sample size. Flip a coin 3 times: minimum p=0.25 (i.e. 2×1/8)

Statistically significant ≠ clinically important

P values widely used
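The coin example above can be checked with a few lines of Python. This is only a sketch: it assumes a fair-coin hypothesis and a two-sided test in which the most extreme outcomes are all heads or all tails, and the function name `min_two_sided_p` is made up for illustration.

```python
def min_two_sided_p(n_flips: int) -> float:
    """Smallest two-sided p-value achievable when testing a fair coin.

    The most extreme results are all heads or all tails, each with
    probability (1/2)**n_flips, so the two-sided p-value is twice that
    (capped at 1).
    """
    return min(1.0, 2 * 0.5 ** n_flips)


# 3 flips is the slide's example; the other values are illustrative
for n in (3, 5, 6, 10):
    print(f"{n:>2} flips: minimum p = {min_two_sided_p(n):.4f}")
```

With 3 flips even the most extreme result cannot reach p<0.05; under these assumptions at least 6 flips are needed before a ‘significant’ result is possible at all.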

Conclusions - range of values

Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:

a) Allowing us to test hypotheses about the true value

b) Confidence intervals give a range which includes the ‘true’ value with a specific probability.

Any questions?

Estimation

In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK: an incidence rate ratio (IRR) of 1.3

H0 ‘rejected’ (p=0.01)

But, what is our ‘best guess’ at the true excess risk?

Hypothesis    p-value    Rejected?
-20% risk     0.0001     Rejected
-10% risk     0.002      Rejected
Same risk     0.01       Rejected
+10% risk     0.1        Not rejected
+20% risk     0.4        Not rejected
+30% risk     0.5        Not rejected
+40% risk     0.2        Not rejected
+50% risk     0.1        Not rejected
+60% risk     0.01       Rejected

Informally

Values outside the range [10% excess risk to 50% excess risk] are in some sense ‘inconsistent’ with the data

The range [10% excess risk to 50% excess risk] probably includes the true value
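The informal range can be read straight off the table of p-values: it is the set of hypothesised values that are not rejected. A minimal sketch, assuming the conventional 0.05 cut-off (the variable names are made up for illustration; the p-values are copied from the table above):

```python
# (hypothesised excess risk in %, p-value) pairs copied from the table
p_values = [
    (-20, 0.0001), (-10, 0.002), (0, 0.01),
    (10, 0.1), (20, 0.4), (30, 0.5), (40, 0.2), (50, 0.1),
    (60, 0.01),
]

ALPHA = 0.05  # assumed significance threshold

# Keep every hypothesised value that the data do not reject
not_rejected = [excess for excess, p in p_values if p >= ALPHA]
lower, upper = min(not_rejected), max(not_rejected)

print(f"Values consistent with the data: {lower}% to {upper}% excess risk")
```

Only the nine tabulated hypotheses are checked here, so the end points are as coarse as the table itself.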

The 95% confidence interval

A range which we can be 95% certain includes the true value of the underlying tendency.

The IRR for Warwick lies in (1.1, 1.5) with probability 95%.

Centred on the observed value (our best guess at the real underlying value). So, the observed value always falls inside the 95% confidence interval

The 95% confidence interval

Fortunately, the link between hypothesis tests and confidence intervals means that we don’t have to calculate lots of p-values and check whether to reject the hypothesis. For this course, just use the ‘error factor’.

Instead, simply calculate the ‘observed value’ and a second quantity called the ‘error factor’ (e.f.). Then:

(observed value ÷ e.f.) is called the lower 95% confidence limit (CL)

(observed value × e.f.) is called the upper 95% confidence limit (CL)

The full range between the lower and upper 95% CLs is called the 95% confidence interval

An example

Observe 50 new cases of diabetes in a population of 2,000 people over 5 years.

Exposure = 2,000 × 5 = 10,000 person years

New cases = 50

Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{50}}\right) = 1.33$

Diabetes example continued

Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{50}}\right) = 1.33$

Lower 95% CL = 0.005 ÷ 1.33 = 0.00376
Upper 95% CL = 0.005 × 1.33 = 0.00665

So, our best estimate of the true incidence is 5 cases per 1,000 person years and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.
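The diabetes calculation can be reproduced in Python. This is a sketch, not course material: it assumes the error factor for a rate is exp(2 × √(1/d)), with d the number of cases, which matches the slide's value of 1.33 for d = 50. The function names `error_factor` and `rate_ci` are made up for illustration.

```python
import math


def error_factor(cases: int) -> float:
    """Error factor for a rate, assuming e.f. = exp(2 * sqrt(1 / cases))."""
    return math.exp(2 * math.sqrt(1 / cases))


def rate_ci(cases: int, person_years: float) -> tuple[float, float, float]:
    """Return (rate, lower 95% CL, upper 95% CL) per person year."""
    rate = cases / person_years
    ef = error_factor(cases)
    return rate, rate / ef, rate * ef


rate, lower, upper = rate_ci(cases=50, person_years=2_000 * 5)
print(f"e.f. = {error_factor(50):.2f}")
print(f"rate = {1000 * rate:.1f} per 1,000 person years")
print(f"95% CI = {1000 * lower:.2f} to {1000 * upper:.2f} per 1,000 person years")
```

The code keeps the error factor unrounded, so its upper limit can differ in the last digit from the slide's hand calculation, which rounds e.f. to 1.33 first.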

As we get more data

We get more and more sure about the underlying value: e.f. gets smaller and the 95% CI narrower

Observe 200 new cases of diabetes in a population of 40,000 people over 1 year.

Estimated rate = 0.005 (same as before)

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{200}}\right) = 1.15$

Lower 95% CL = 0.005 ÷ 1.15 = 0.0043
Upper 95% CL = 0.005 × 1.15 = 0.0058

Best estimate still 5 cases per 1,000 person years, but now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
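To see how quickly the interval narrows, the same rate can be paired with increasing numbers of cases. A rough sketch, again assuming e.f. = exp(2 × √(1/d)); the 50 and 200 case counts are from the slides, while 800 and 3,200 are hypothetical values added purely for illustration.

```python
import math


def error_factor(cases: int) -> float:
    """Error factor for a rate, assuming e.f. = exp(2 * sqrt(1 / cases))."""
    return math.exp(2 * math.sqrt(1 / cases))


RATE = 0.005  # per person year, as in both diabetes examples

# 50 and 200 cases come from the slides; 800 and 3200 are hypothetical
for cases in (50, 200, 800, 3200):
    ef = error_factor(cases)
    low, high = 1000 * RATE / ef, 1000 * RATE * ef
    print(f"{cases:>5} cases: e.f. = {ef:.2f}, "
          f"95% CI = {low:.1f} to {high:.1f} per 1,000 person years")
```

Under this formula the error factor depends only on the number of cases, not on the person years, so quadrupling the cases roughly halves the width of the interval on the log scale.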

Any questions?

Confidence intervals

Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height etc.

NOT a range within which 95% of individual observations lie

CASES    P-YRS    RATE
13       2,000    0.0065
10       2,000    0.005
6        2,000    0.003
14       2,000    0.007
7        2,000    0.0035

Total: 50 cases, 10,000 p-yrs.

Estimate = 0.005, 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years.

But rates in 3 individual years fall outside this range!!
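The point can be verified directly from the yearly figures. A small sketch, assuming each row of the table is one year of follow-up and that the error factor is exp(2 × √(1/d)); the variable names are made up for illustration.

```python
import math

# Cases observed in each of the 5 years (2,000 person years per year)
yearly_cases = [13, 10, 6, 14, 7]
PERSON_YEARS_PER_YEAR = 2_000

total_cases = sum(yearly_cases)
total_person_years = PERSON_YEARS_PER_YEAR * len(yearly_cases)

# 95% CI for the overall rate via the assumed error factor
rate = total_cases / total_person_years
ef = math.exp(2 * math.sqrt(1 / total_cases))
lower, upper = rate / ef, rate * ef

# Yearly rates that fall outside the CI for the overall rate
yearly_rates = [cases / PERSON_YEARS_PER_YEAR for cases in yearly_cases]
outside = [r for r in yearly_rates if not lower <= r <= upper]

print(f"Overall 95% CI: {lower:.5f} to {upper:.5f} per person year")
print(f"Yearly rates outside the CI: {outside}")
```

The interval describes uncertainty about the single underlying rate, so there is no contradiction in several individual yearly rates lying outside it.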

Another example

A sample of 50 students

Observed mean height = 1.675m

The 95% confidence interval for mean height is 1.65m to 1.70m

But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval.

This is an important distinction

Inference on a rate ratio

Population 1: d1 cases in P1 person years

Population 2: d2 cases in P2 person years

Rate ratio = (d1/P1) ÷ (d2/P2)

Confidence interval and test

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{d_1} + \frac{1}{d_2}}\right)$

Estimation versus hypothesis testing

Estimation is more informative

Estimation can incorporate a hypothesis test:

Hypothesis: the incidence of diabetes in population A is the same as that in B.

Data:
Population A: 12 cases in 2,000 patient years
Population B: 16 cases in 4,000 patient years

Rates:
A: 12/2,000 = 0.006
B: 16/4,000 = 0.004

Ratio of rates: A/B = 1.5

Estimation vs hypothesis testing ….

Estimation can incorporate a hypothesis test:

Ratio of rates = 1 if rates are the same.

Ratio of rates: A/B = 1.5

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{12} + \frac{1}{16}}\right) = 2.15$

95% CI for rate ratio = 1.5 ÷ 2.15 = 0.70 to 1.5 × 2.15 = 3.23. The range [0.70 to 3.23] includes 1.00: data are consistent with the original hypothesis so cannot reject it (p>0.05). This does not prove it’s true!!

Another example

80 deaths in 8,000 person-yrs (male)
50 deaths in 10,000 person-yrs (female)

RateM = 10 per 1,000 p-y; RateF = 5 per 1,000 p-y

Observed rate ratio (M/F) = 2.0

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{80} + \frac{1}{50}}\right) = 1.43$

95% CI: [2÷1.43 to 2×1.43] = [1.40 to 2.86]

Best estimate of true rate ratio = 2.0, and 95% certain that true rate ratio lies between 1.40 and 2.86. This range does not include 1.00 so able to reject hypothesis of equality (p<0.05)
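Both rate ratio examples above follow the same recipe, which can be sketched in Python. The sketch assumes the error factor for a rate ratio is exp(2 × √(1/d1 + 1/d2)), which matches the slide values of 2.15 and 1.43; the function name `rate_ratio_ci` and the dictionary labels are made up for illustration.

```python
import math


def rate_ratio_ci(d1: int, p1: float, d2: int, p2: float):
    """Return (rate ratio, e.f., lower 95% CL, upper 95% CL).

    Assumes e.f. = exp(2 * sqrt(1/d1 + 1/d2)), where d1 and d2 are the
    case counts and p1 and p2 the person years in the two populations.
    """
    ratio = (d1 / p1) / (d2 / p2)
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))
    return ratio, ef, ratio / ef, ratio * ef


# (d1, P1, d2, P2) taken from the two worked examples
examples = {
    "diabetes, A vs B": (12, 2_000, 16, 4_000),
    "deaths, male vs female": (80, 8_000, 50, 10_000),
}

for label, counts in examples.items():
    ratio, ef, lower, upper = rate_ratio_ci(*counts)
    verdict = "includes 1" if lower <= 1 <= upper else "excludes 1"
    print(f"{label}: ratio = {ratio:.2f}, e.f. = {ef:.2f}, "
          f"95% CI = {lower:.2f} to {upper:.2f} ({verdict})")
```

Because the code does not round the error factor before dividing and multiplying, its limits may differ from the hand calculations in the second decimal place; the conclusions about whether 1.00 lies inside the interval are unchanged.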

Inference on an SMR

Observe O deaths

Expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population)

SMR = (O/E) × 100

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{O}}\right)$

Example for SMR

On basis of age-specific rates in standard population expect 50 deaths in test population. Observe 60. (O=60, E=50)

SMR = (60/50)×100 = 120

$\text{e.f.} = \exp\left(2\sqrt{\frac{1}{60}}\right) = 1.29$

95% CI for SMR = 120 ÷/× 1.29 = 93 to 155. CI includes 100 so data consistent with equality of death rate in test and standard populations (p>0.05). But also consistent with e.g. a 50% excess so certainly doesn’t prove equality.
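The SMR example works the same way as the rate examples. A minimal sketch, assuming the error factor for an SMR is exp(2 × √(1/O)), which matches the slide's 1.29 for O = 60; the function name `smr_ci` is made up for illustration.

```python
import math


def smr_ci(observed: int, expected: float) -> tuple[float, float, float]:
    """Return (SMR, lower 95% CL, upper 95% CL), all on the x100 scale.

    Assumes e.f. = exp(2 * sqrt(1 / observed)).
    """
    smr = 100 * observed / expected
    ef = math.exp(2 * math.sqrt(1 / observed))
    return smr, smr / ef, smr * ef


smr, lower, upper = smr_ci(observed=60, expected=50)
includes_100 = lower <= 100 <= upper

print(f"SMR = {smr:.0f}, 95% CI = {lower:.0f} to {upper:.0f}")
print("CI includes 100" if includes_100 else "CI excludes 100")
```

Only the observed count enters the error factor here; the expected count is treated as fixed, in line with the formula on the slide.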

Any questions?

Summary

All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc) are subject to random variation

We always want to know about the underlying tendency = the true value of rates or risks

We use observed data to test hypotheses about the underlying value

We use observed data to estimate the underlying tendency

Summary

In this course the best estimate of the true value of the underlying tendency is the observed value

We express uncertainty by calculating error factors and deriving confidence intervals

A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%.

It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions, you can only be uncertain what to conclude

Summary

Population A: rate=0.008; B: rate=0.002 Rate ratio = 4, e.f.=2, 95% CI [2 to 8]

All values in the 95% CI suggest A higher than B. Can safely conclude A higher than B. This is equivalent to saying the 95% CI does not include 1.00 (null hypothesis) so the rate ratio is significantly different from 1.00 (p<0.05)

Summary

Population A: rate=0.01; B: rate=0.005
Rate ratio = 2, 95% CI [0.5 to 8]

Values in the 95% CI are consistent with: A much higher than B; A somewhat lower than B; or both the same. Cannot really conclude anything too firmly.

In this case 95% CI does include 1.00 (the null hypothesis) so the rate ratio is not significantly different from 1.00 (p>0.05) so cannot reject hypothesis of equality

But this does not prove that the rates are equal

Any questions?
