
Instructor Resource

Chapter 5

Copyright © Scott B. Patten, 2015.

Permission granted for classroom use with Epidemiology for Canadian Students: Principles, Methods & Critical Appraisal (Edmonton: Brush Education Inc. www.brusheducation.ca).

Chapter 5. Random error from sampling

Objectives

• Identify and differentiate the 2 main sources of error in epidemiologic research: random error and systematic error.
• Describe the relationship between sampling and random error.
• Define confidence intervals and how to calculate them.

Objectives (continued)

• Describe the relationship between sample size and precision in a prevalence study.
• Differentiate estimation and statistical testing.
• Describe statistical testing and define key related concepts (significant versus nonsignificant tests, type I and type II error, statistical power).
• Explain the influence of sample size on statistical power.

Sources of error in epidemiological research

Sources of error include:
• random error (a.k.a. stochastic error)
• systematic error (a.k.a. bias)

A clear definition of bias comes from a clear understanding of what is meant by random error, which is why we are starting with random error.

PREVALENCE

• PREVALENCE is spelled in uppercase letters to indicate that the parameter is calculated from the population (not sampled) data.
• PREVALENCE is not an estimate: in the absence of measurement errors, it is the true population parameter.

Prevalence

• Prevalence is spelled in lowercase letters (prevalence) to indicate that the value is calculated from a sample.
• When calculated from a sample, prevalence is an estimate: repeating the process of sampling would result in different estimates.
• The different estimates are due to sampling variability.
• The difference between a true value and a sample-based estimate is a type of error: random error.
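Sampling variability can be seen in a short simulation. The sketch below is an illustration only: the population of 100,000 people, the PREVALENCE of 10%, the sample size of 500, and the helper name `sample_prevalence` are all made-up assumptions, not values from the text.

```python
import random

def sample_prevalence(population, n, rng):
    """Draw a simple random sample of size n and return the sample prevalence."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Hypothetical population: 100,000 people, 10,000 with the disease (1 = diseased).
population = [1] * 10_000 + [0] * 90_000
PREVALENCE = sum(population) / len(population)  # true population parameter

rng = random.Random(2015)  # fixed seed so the run is repeatable
estimates = [sample_prevalence(population, 500, rng) for _ in range(5)]

print(f"PREVALENCE (population): {PREVALENCE:.3f}")
for i, est in enumerate(estimates, start=1):
    # The gap between each estimate and PREVALENCE is random error.
    print(f"sample {i}: prevalence = {est:.3f}, random error = {est - PREVALENCE:+.3f}")
```

Each of the five samples comes from the same population, yet the estimates differ from one another and from PREVALENCE. Nothing about the population changed between samples; the differences come from sampling alone.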

Random samples

• In a random sample, the selection of subjects into the sample cannot be predicted.
• Each person’s disease status is an independent observation that reflects true prevalence of disease in the population through the law of large numbers.
• The sample prevalence therefore estimates the true value, but can differ from the true value due to random error.

Sampling terminology

• In a probability sample, the probability of selecting a person from the population is known.
• A simple random sample is a basic form of a probability sample: the probability of selecting each member of the population is the same.
• The probability of selection is a selection probability.
• In practice, sampling requires a list from which to select. This is a sampling frame.
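As a minimal sketch of these terms, the example below draws a simple random sample from a sampling frame. The frame of 1,000 made-up identifiers and the sample size of 50 are assumptions chosen for illustration.

```python
import random

# Hypothetical sampling frame: a list of 1,000 person identifiers.
sampling_frame = [f"person_{i:04d}" for i in range(1, 1001)]

n = 50
rng = random.Random(5)  # fixed seed for a repeatable illustration

# random.sample draws without replacement; every member of the frame
# has the same chance of being chosen.
sample = rng.sample(sampling_frame, n)

# In a simple random sample the selection probability is n / N for everyone.
selection_probability = n / len(sampling_frame)

print(sample[:5])
print(f"selection probability = {selection_probability:.2f}")
```

Because the selection probability is known (and here equal) for every person on the frame, this is a probability sample.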

Sampling terminology (continued)

• Inference describes the process of gaining information about a population based on data collected from a sample.
• The target population is the subject of inference: it is the population whose parameters are estimated through sampling.

Sampling terminology (continued)

• A source population is a subset of a target population: it is a smaller population within a larger target population from which a sample is drawn.
• A study population is a common term for a sample drawn from a source population. This is a confusing term because a “study population” is not a population; it is a sample.

Dealing with random error

• The law of large numbers predicts that larger samples lead to parameter estimates (e.g., prevalence) that more closely reflect the true population values.
• Therefore, epidemiological studies prefer large samples.
• Nevertheless, random error needs to be addressed during data analysis.
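The effect of sample size on random error can be sketched with a simulation. This is an illustration under stated assumptions: a true prevalence of 10%, sample sizes of 50, 500, and 5,000, and independent draws from a very large population (each person is simulated as a separate yes/no draw). The function name `simulate_estimates` is hypothetical.

```python
import random
import statistics

def simulate_estimates(true_prevalence, n, reps, rng):
    """Return `reps` sample prevalences, each from an independent sample of size n."""
    estimates = []
    for _ in range(reps):
        cases = sum(rng.random() < true_prevalence for _ in range(n))
        estimates.append(cases / n)
    return estimates

rng = random.Random(42)
true_prevalence = 0.10  # hypothetical true value
spread = {}
for n in (50, 500, 5000):
    estimates = simulate_estimates(true_prevalence, n, reps=200, rng=rng)
    # Standard deviation of the estimates: smaller means less random error.
    spread[n] = statistics.stdev(estimates)
    print(f"n = {n:>5}: estimates range {min(estimates):.3f} to {max(estimates):.3f}, "
          f"SD = {spread[n]:.4f}")
```

The spread of the estimates shrinks as the sample size grows, but it does not reach zero, which is why random error still has to be addressed in the analysis.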

Dealing with random error (continued)

• There are 2 general approaches:
  • confidence intervals
  • statistical tests
• Confidence intervals are the preferred approach.

Confidence intervals

• Confidence intervals define a range of plausible values for true population parameters, based on a desired level of confidence.
• Usually, 95% confidence is the desired level.
• A confidence interval consists of 2 numbers called confidence limits.
• The confidence interval comprises all values between the lower and upper confidence limits.
• You can be 95% confident that a 95% confidence interval captures the true population value.
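One way to read "95% confident" is that, over many repeated samples, about 95% of the intervals would capture the true value. The simulation below sketches this idea. It assumes a true prevalence of 20%, samples of 500, normal-approximation limits (estimate ± 1.96 × SE), and the usual standard error of a proportion, sqrt(p(1 − p)/n); none of these specifics come from the text.

```python
import math
import random

def covers_truth(true_prevalence, n, rng, z=1.96):
    """Draw one sample and report whether its approximate 95% CI contains the truth."""
    cases = sum(rng.random() < true_prevalence for _ in range(n))
    estimate = cases / n
    se = math.sqrt(estimate * (1 - estimate) / n)  # assumed SE of a proportion
    return estimate - z * se <= true_prevalence <= estimate + z * se

rng = random.Random(7)
true_prevalence = 0.20  # hypothetical true population value
reps = 1000
hits = sum(covers_truth(true_prevalence, 500, rng) for _ in range(reps))
coverage = hits / reps
print(f"{hits} of {reps} intervals captured the true value ({coverage:.1%})")
```

The observed share should be close to, though not exactly, 95%, because the simulation itself is subject to random error and the interval is an approximation.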

Confidence intervals (continued)

• The best type of confidence interval is an exact confidence interval.
• Others are based on approximations. For example, in a standard normal distribution, the range from −1.96 to +1.96 includes 95% of values, so if an estimate is normally distributed:

Lower 95% confidence limit = Estimate − (1.96 × SE)
Upper 95% confidence limit = Estimate + (1.96 × SE)

where SE is the standard error associated with the estimate.
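The approximate confidence limits can be computed directly. The sketch below applies them to a prevalence estimate and assumes the usual standard error of a proportion, SE = sqrt(p(1 − p)/n), which the slide does not state. The function name `wald_ci` and the survey counts are hypothetical.

```python
import math

def wald_ci(cases, n, z=1.96):
    """Approximate 95% confidence interval for a prevalence (normal approximation)."""
    estimate = cases / n
    # Assumed standard error of a proportion: sqrt(p * (1 - p) / n).
    se = math.sqrt(estimate * (1 - estimate) / n)
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical survey results: 10% prevalence at two different sample sizes.
for cases, n in [(40, 400), (160, 1600)]:
    estimate, lower, upper = wald_ci(cases, n)
    print(f"n = {n}: prevalence = {estimate:.3f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

With the same estimated prevalence, the interval at n = 1,600 is half as wide as the interval at n = 400: quadrupling the sample size halves the standard error. This approximation can behave poorly for small samples or prevalences near 0 or 1, which is one reason exact intervals are preferred.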

Statistical tests

• Instead of providing a range of values, statistical tests are designed to help answer the question, “Is exposure associated with disease?”
• They follow a series of steps.

Statistical tests (continued)

• Step 1: Formulate a null hypothesis (e.g., there is no association between exposure and disease).
• Step 2: Calculate the probability of observing, due to chance alone, an effect as large as or larger than the one observed, assuming that the null hypothesis is true.
• Step 3: If the probability in step 2 is small, the null hypothesis is rejected.
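The three steps can be sketched with one common test, a two-sided z-test comparing two proportions. This particular test is an assumption chosen for illustration (the slides do not name a specific test), and the counts and the function name `two_proportion_p_value` are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Two-sided p-value for a difference in risk, using a normal approximation."""
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    # Step 1: under the null hypothesis both groups share one pooled risk.
    pooled = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_unexposed))
    z = (p1 - p0) / se
    # Step 2: probability of a difference at least this large if the null is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: 30/200 exposed and 15/200 unexposed people have the disease.
p_value = two_proportion_p_value(30, 200, 15, 200)
# Step 3: reject the null hypothesis if the probability is small (here, below 0.05).
decision = "reject" if p_value < 0.05 else "do not reject"
print(f"p = {p_value:.3f}: {decision} the null hypothesis")
```

For these made-up counts the p-value falls below 0.05, so the null hypothesis of no association would be rejected.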

Statistical tests (continued)

• Statistical tests work by rejecting a hypothesis, not by proving a hypothesis.
• Null hypotheses are never rejected with certainty; they are just deemed unlikely.
• The decision that a result (or one more extreme) is unlikely is usually based on its probability (given the null hypothesis) being less than 5% (p < 0.05).

Statistical errors

• Statistical tests can make 2 types of errors:
  • rejecting a null hypothesis that is true (type I error)
  • failing to reject a null hypothesis that is false (type II error)

| | An association exists in the population (null hypothesis is false) | No association exists in the population (null hypothesis is true) |
| --- | --- | --- |
| Statistical test is significant | No error | Type I error |
| Statistical test is nonsignificant | Type II error | No error |

Statistical power

• Statistical power is the probability of rejecting a null hypothesis that is false.
• Power is calculated from:
  • sample size (larger = greater power)
  • effect size (bigger = greater power)
  • probability at which the null hypothesis is rejected (larger = greater power*)
• For continuous measures (e.g., comparing means), the standard deviation of the outcome also contributes to statistical power.

* This probability is usually set at the conventional 5% level and is not changed to increase power.
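The influence of each input on power can be sketched with a standard normal-approximation formula for comparing two proportions. The formula, the function name `approx_power`, and the risks used are assumptions for illustration; the approximation also ignores the small chance of rejecting in the wrong direction.

```python
import math
from statistics import NormalDist

def approx_power(p_exposed, p_unexposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = math.sqrt(p_exposed * (1 - p_exposed) / n_per_group
                   + p_unexposed * (1 - p_unexposed) / n_per_group)
    effect = abs(p_exposed - p_unexposed)
    # Probability that the test statistic lands beyond the critical value.
    return norm.cdf(effect / se - z_crit)

# Hypothetical risks: 15% in the exposed, 10% in the unexposed.
for n in (100, 400, 1600):
    print(f"n per group = {n:>4}: power = {approx_power(0.15, 0.10, n):.2f}")

# A bigger effect and a larger alpha both raise power as well.
print(f"bigger effect (20% vs 10%), n = 100: {approx_power(0.20, 0.10, 100):.2f}")
print(f"alpha = 0.10, n = 100: {approx_power(0.15, 0.10, 100, alpha=0.10):.2f}")
```

In practice the sample size is the input most often adjusted, since the effect size is a property of the population and the rejection probability is held at its conventional level.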

Probability of error

The probability of type I error is:
• the probability threshold at which the null hypothesis is rejected (the significance level, conventionally 5%)

The probability of type II error is:
• 1 − statistical power
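Both error probabilities can be checked by simulating many studies. The sketch below is an illustration under stated assumptions: a two-sided z-test for two proportions, 200 people per group, a risk of 10% in both groups when the null hypothesis is true, and risks of 20% versus 10% when it is false. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def is_significant(cases1, cases0, n, alpha=0.05):
    """Two-sided z-test for two proportions with equal group sizes n."""
    pooled = (cases1 + cases0) / (2 * n)
    if pooled in (0, 1):
        return False  # no variation in outcome, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (cases1 - cases0) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def rejection_rate(p_exposed, p_unexposed, n, reps, rng):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rejections = 0
    for _ in range(reps):
        cases1 = sum(rng.random() < p_exposed for _ in range(n))
        cases0 = sum(rng.random() < p_unexposed for _ in range(n))
        rejections += is_significant(cases1, cases0, n)
    return rejections / reps

rng = random.Random(123)
# Null hypothesis true (same risk in both groups): rejections are type I errors.
type_1_rate = rejection_rate(0.10, 0.10, n=200, reps=1000, rng=rng)
# Null hypothesis false (risks differ): the rejection rate estimates power.
power = rejection_rate(0.20, 0.10, n=200, reps=1000, rng=rng)
type_2_rate = 1 - power
print(f"type I error rate  = {type_1_rate:.3f} (expected to be close to 0.05)")
print(f"power              = {power:.3f}")
print(f"type II error rate = {type_2_rate:.3f}")
```

When the null hypothesis is true, roughly 5% of the simulated studies reject it, matching the rejection threshold. When it is false, the studies that fail to reject it are the type II errors.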
