Some Statistics and Epidemiology for the AKT
We’ll try to cover
1. General tips
2. Types of study, the ‘evidence hierarchy’, what you get from each study type
3. Looking at numbers: calculations and data representation
4. Screening: qualities of test, more calculations
Scope
• What I learnt
• Hopefully will give an approach for most questions
• Not everything! (See AKT content guide)
• PasTest / OnExamination
Some General Tips
• A few key formulae - write them down!
• Stats more than anything - ‘RTQ2’
• Flag and return
• If presented with a random chart… (You don’t have to fully understand a graph to get the information you need)
• Flag and return… (don’t stress)
2. Looking at Evidence
“It is a truth universally acknowledged that a medical intervention justified by observational data must be in want of verification through a randomised controlled trial”
BMJ 2003;327:1459-61
Qualitative vs. Quantitative
• Focus on quantitative here
• Qualitative:
– Generate ‘informed assertions’ or hypotheses
– May inform planning of quantitative research
– Focus groups, interviews, questionnaires
Example questions?
Quantitative study types
• Try to be able to identify type of study from a description of the study
• Therefore know what it can and can’t tell you
• Any study:
– What is the exposure?
– What is the outcome?
– What type of study is it?
Rates of liver cancer and HBV seroprevalence, by country.
Population Studies
• Population studies
– Examples?
– Uses
– Limitations
• Information from groups, not individuals
• Correlations
• Generate hypotheses, can’t test them
• Confounding
• The Ecological Fallacy
The Ecological Fallacy
US 2004 elections:
– Republicans won in the poorest 15 states
– Democrats won in 9 of the 11 wealthiest states
– So, are wealthy people more likely to vote for the Democrats?
NO! Assuming that a pattern seen across groups (wealthy states voting Democrat) also holds for the individuals in those groups (wealthy voters) = the ecological fallacy
Individual Studies
• Case reports, case series, cross sectional
• Generate hypotheses, can’t test them
• Cross sectional study: exposure and outcome measured at the same time
– No temporal relationship
– e.g. BMI (outcome) and time spent exercising (exposure)
Analytical Studies
• Generally the most useful for us: can answer questions (not just ask them)
• Observational
– No control over exposure
• Interventional
– Control over exposure
Observational Studies
• A type of analytical study
• Consider two types…
Women diagnosed with DVT are six times more likely to have a history of oral contraceptive usage
Case - control studies
• It’s all in the title…
• Take cases, find matched controls
• Exposure to risk factor is determined retrospectively
Case-control studies
Advantages
• Rare diseases
• Cheap and quick
Limitations
• Need matching
• Confounding
• Recall bias
Can calculate odds ratios
A study looking at new diagnoses of lung cancer in smokers vs. non-smokers.
Cohort studies
A type of observational, analytical study:
• Determine exposure e.g. measure tobacco use
• Follow up cohort over a set period of time e.g. 10 years
• Measure outcome e.g. new diagnoses of lung cancer
Cohort studies
Advantages
• Generate incidence
• Less danger of bias by poor selection of controls
• No recall bias
• Multiple exposures
Limitations
• Expensive
• Time consuming
• Loss to follow-up
• Rare conditions?
Can calculate relative risk
Rivaroxaban reduces the risk of ischaemic stroke over five years in patients with atrial fibrillation compared with placebo.
Interventional Studies
Basically a cohort study, but…
The difference is that the investigator intervenes:
• The investigator determines exposure
• Individuals are followed up for a period of time to determine the outcome.
Need to consider:
• Double blind versus single blind
• Placebo control
• Randomisation
RCT
Randomisation:
• Subjects are randomised into groups (drug vs. placebo, or drug A vs. drug B…)
• The aim is that the groups are identical in every way apart from exposure to the drug, to minimise confounders
• Minimise selection bias
• (Also crossover - added safeguard)
Control:
• The control group should be identical
• The control may receive a placebo intervention, which ideally looks and tastes the same
Blinding:
• The investigators don’t know who has received what when collecting and analysing data
• The subjects don’t know whether they have received placebo or not
RCTs
Advantages
• Minimise confounding and bias
• ‘Gold standard’
Limitations
• Extremely expensive
• Side effects may unblind
• Ethics
Can calculate relative risk
(remember it is a cohort study)
Meta-analysis
Not to be confused with a systematic review.
A meta-analysis:
• Aggregates data from multiple trials
• Complex analysis - probably less accessible…but generally analysed by people with less of an interest in deceiving you.
• ‘Gold standard’
Answering Questions
Using EBM in practice:
• Need a specific question:
– P (patient or population)
– I (intervention)
– C (comparison)
– O (outcome)
• Need to use a systematic means of searching for available evidence
• Need to be able to appraise the evidence
– Some useful tools if you have the time
• Or, get someone else to do it:
– CKS, Cochrane, BMJ Clinical Evidence
Answering Questions: The Evidence Hierarchy
Hierarchy of quality of evidence for quantitative studies
• I-1: Systematic review of RCTs
• I-2: RCT
• II-1: Cohort
• II-2: Case-control
• II-3: Uncontrolled experiment
• III: Expert committees, respected authorities
• IV: ‘Somebody once told me’, The Daily Mail, case reports, case series
BMJ 2003;327:1459-61
“Only two options exist. The first is that we accept that, under exceptional circumstances, common sense might be applied when considering the potential risks and benefits of interventions. The second is that we continue our quest for the holy grail of exclusively evidence based interventions and preclude parachute use outside the context of a properly conducted trial. The dependency we have created in our population may make recruitment of the unenlightened masses to such a trial difficult. If so, we feel assured that those who advocate evidence based medicine and criticise use of interventions that lack an evidence base will not hesitate to demonstrate their commitment by volunteering for a double blind, randomised, placebo controlled, crossover trial.”
3. Looking at Numbers
Looking at numbers
• Definitions
• Comparing risk between populations
• Looking at data: some other points
• Draw a table if you can!
A few definitions
• Rate
– Denominator = person-time at risk
– e.g. cases / 1000 at risk population / year
• Incidence
• Prevalence
Comparing risk between populations
• Quantifying risk is basis of measuring effect of an exposure or intervention
• Calculations depend on type of study:
– Risk: cohort
– Odds: case-control
• Once you have a measure of risk, you can compare between exposed and unexposed
– Risk ratio or Odds ratio
Risk
• Risk = proportion
• 6 out of 10 medical students are female
• The risk of being female is 0.6 if you are a medical student
Risk = affected / total
(in this case the exposure is being a medical student, the outcome is being female)
• In a case-control, you don’t know the total exposed, so you can’t calculate risk.
• In a cohort, you determine exposure at the start, so you can
Risk
• Remember, in a cohort study, you select according to exposure, and follow up to determine outcome
• Design a cohort study to look at car accidents in yellow and black cars
– Suggest that yellow cars prevent accidents
– Let’s suppose there are only two colours of car for simplicity
Risk
• Exposure: car being yellow
• Outcome: accident free after 1 year
• Plan: determine exposure at the start (record colour), follow up for one year and monitor for car accidents
• Draw a table
– 100 yellow cars, 10 had accidents
– 100 black cars, 20 had accidents
                          Outcome: no accidents at 1yr
Exposure: being yellow    Yes     No      Totals
Yes                       90      10      100
No                        80      20      100
Totals                    170     30
Risk
• We can calculate the risk in those exposed (yellow car) and those not exposed (black)
• Then we can calculate a ratio of the risks in the two groups
• Work out the formula…
Relative Risk
• The risk of being accident free at one year if you have a yellow car relative to if you have a black one = a ratio of the risks
RR = risk (exposed) / risk (unexposed)
• Or, how much more likely are you to be accident free if you drive a yellow car vs. a black one
• So work out the relative risk using our table
• If RR = 1, there is no difference in risk.
Relative Risk
• Risk in exposed = 90/100 = 0.9
• Risk in unexposed = 80/100 = 0.8
• Relative risk = 0.9/0.8 = 1.125
NB from these data you can also calculate an incidence, as mentioned earlier
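The same sum can be checked in a few lines of Python. This is only a sketch: the function names `risk` and `relative_risk` are made up for illustration, and exact fractions are used so that rounding does not get in the way.

```python
from fractions import Fraction


def risk(affected, total):
    """Risk = affected / total, kept as an exact fraction."""
    return Fraction(affected, total)


def relative_risk(affected_exposed, total_exposed, affected_unexposed, total_unexposed):
    """RR = risk (exposed) / risk (unexposed)."""
    return risk(affected_exposed, total_exposed) / risk(affected_unexposed, total_unexposed)


# Yellow cars: 90 of 100 accident free. Black cars: 80 of 100 accident free.
rr = relative_risk(90, 100, 80, 100)
print(float(rr))  # 1.125
```

An RR above 1 here means yellow cars were more likely to be accident free; an RR of exactly 1 would mean no difference.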
Absolute Risk Reduction
• ARR is a measure of the reduction in risk which an exposure causes
• This is basically what we want to know to decide whether something is effective
– Does it make a difference, and if so how much?
• The main use of ARR is to calculate the NNT
• Could you work out the formula knowing this information?
Absolute Risk Reduction
• Reduction in risk caused by an exposure
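ARR = Risk (exposed) - Risk (unexposed)
• In our example:
= Risk of accident free (yellow) - Risk of accident free (black)
= 0.9 - 0.8
= 0.1
• NB if the outcome is a bad one (e.g. having an accident), subtract the other way round so the reduction comes out positive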
Number Needed to Treat
• NNT (or NNH - harm) is a useful intuitive number
• Just learn the formula:
NNT = 1/ARR
• In this case 1/0.1 = 10
– So you would need to paint 10 cars yellow to prevent one accident.
Another example
• A trial looking at effectiveness of nicotine gum on smoking cessation
• 6328 smokers given gum, 1149 stopped
• 8380 smokers given placebo, 893 stopped
• Calculate the NNT. You are allowed a calculator for this one!
– Table
– Write out formulae
                          Outcome: smoking cessation
Exposure: nicotine gum    Yes     No      Totals
Yes                       1149    5179    6328
No                        893     7487    8380
Totals                    2042    12666
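If you want to check your answer, here is a minimal Python sketch of the same working. The function names are hypothetical, and rounding the NNT up to the next whole person is the usual convention rather than something stated on the slides.

```python
import math
from fractions import Fraction


def absolute_risk_reduction(events_treated, n_treated, events_control, n_control):
    """ARR = size of the difference in risk between the two groups."""
    return abs(Fraction(events_treated, n_treated) - Fraction(events_control, n_control))


def number_needed_to_treat(arr):
    """NNT = 1 / ARR, rounded up to a whole person (assumed convention)."""
    return math.ceil(1 / arr)


# Nicotine gum: 1149 of 6328 stopped. Placebo: 893 of 8380 stopped.
gum_arr = absolute_risk_reduction(1149, 6328, 893, 8380)
gum_nnt = number_needed_to_treat(gum_arr)
print(f"ARR = {float(gum_arr):.3f}, NNT = {gum_nnt}")  # ARR = 0.075, NNT = 14
```

Risk of stopping is about 0.182 with gum and 0.107 with placebo, so roughly 14 smokers need to be given gum for one extra person to stop. The yellow car example gives 10 with the same functions.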
Odds
• Remember, Relative Risk needs the total exposed, which you can’t know from a case-control study
• Odds and Odds Ratio give similar information for data from case-control studies
• Note that in rare conditions the OR approximates the RR (you can work it out if you want, I wouldn’t bother)
Odds
Odds = affected / unaffected
OR = odds (exposed) / Odds (unexposed)
Odds
• Consider a study looking retrospectively at asbestos exposure in patients with mesothelioma
• 100 cases of mesothelioma, 80 recalled asbestos exposure
• 100 controls, 50 recalled asbestos exposure
• Draw a table…
                              Outcome: mesothelioma
Exposure: asbestos exposure   Yes     No      Totals
Yes                           80      50      130
No                            20      50      70
Totals                        100     100
Odds
• Odds in exposed = 80/50 = 1.6
• Odds in unexposed = 20/50 = 0.4
• Odds ratio = 1.6 / 0.4 = 4
• I don’t find OR as intuitive as RR, and I think they are less useful
– So hopefully less likely to need to calculate them, but you might!
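The odds ratio working above, as a short Python sketch (the function name `odds_ratio` and its argument names are made up for illustration):

```python
from fractions import Fraction


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """OR = odds (exposed) / odds (unexposed), where odds = affected / unaffected."""
    odds_exposed = Fraction(exposed_cases, exposed_controls)
    odds_unexposed = Fraction(unexposed_cases, unexposed_controls)
    return odds_exposed / odds_unexposed


# Asbestos exposed: 80 cases, 50 controls. Not exposed: 20 cases, 50 controls.
asbestos_or = odds_ratio(80, 50, 20, 50)
print(asbestos_or)  # 4
```

Note that nothing here needs the total number of people exposed in the population, which is why odds work for case-control data when risk does not.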
Looking at numbers
• Some other useful things
– P values and confidence intervals, chance and error
– Normal distribution and skewness
– (Mean, median and mode)
– (Standard deviation, standard error)
Results Interpretation
• The point of any study is to inform us about things in the real world
• We need to consider:
– Could the results be due to chance or do they show a real effect?
– Could error have affected the results?
– Can these results be applied to the real world, i.e. are they generalisable?
Results Interpretation
• All a study can tell you is how likely or unlikely things are:
– “This study shows X causes Y”
– What we actually mean is “it is very likely that X causes Y”
• How can this be measured?
– How do we know if differences are due to chance or due to a real effect?
Significance
• A significant result is one that is unlikely to be due to chance alone
– P value is the probability of seeing results at least as extreme as those observed if chance alone were at work (i.e. if there were no real effect)
– p < 0.05 or p < 0.01 are generally taken to indicate significance
Significance
• The null hypothesis
– This is the assumption that there is no real effect, so any difference seen is due to chance alone
– If your p-value is less than your chosen level of significance (say 0.05) then you “reject the null hypothesis”
– This means you are saying that you don’t think the results are due to chance.
Significance
• Take a finding with a p < 0.01
– This means that, if there were no real effect, the chance of seeing results this extreme would be less than 1 in 100
– Or if there were no real effect and you did the same experiment 100 times, you would expect results this extreme less than once
• As your p value is less than your chosen level of significance, you reject the null hypothesis.
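One way to get a feel for “due to chance alone” is to simulate it. The sketch below is not how you would be expected to work out a p value in the exam; it is a simple shuffling (permutation) approach, chosen here for illustration, applied to the yellow and black car data with accidents as the event. The function name, number of trials and seed are all arbitrary assumptions.

```python
import random


def simulated_p_value(events_a, n_a, events_b, n_b, trials=5000, seed=0):
    """Estimate a two-sided p value by shuffling outcomes between the groups."""
    rng = random.Random(seed)
    total_events = events_a + events_b
    # Size of the observed difference, cross-multiplied to stay in whole numbers.
    observed = abs(events_a * n_b - events_b * n_a)
    # Null hypothesis: group makes no difference, so pool everyone's outcomes.
    pooled = [1] * total_events + [0] * (n_a + n_b - total_events)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        sim_a = sum(pooled[:n_a])      # events landing in group A by chance
        sim_b = total_events - sim_a   # the rest land in group B
        if abs(sim_a * n_b - sim_b * n_a) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# 10 accidents in 100 yellow cars vs. 20 accidents in 100 black cars.
p = simulated_p_value(10, 100, 20, 100)
print(f"Estimated p value: {p:.3f}")
```

The number printed is the fraction of shuffles in which chance alone produced a difference at least as big as the one observed, which is what the p value is describing.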
Significance
• P values depend on the amount of data available (which also determines the power of the study).
• The effect of chance in a small study is greater, hence for the same real effect it would tend to have a bigger p value.
• When planning a study it is necessary to do a power calculation to work out the sample size needed to have a good chance of detecting a real effect
– Let’s not go there… (get an actual statistician)
Confidence Intervals
• A confidence interval gives a range for a value of interest (such as relative risk)
– Remember, if RR = 1 that means no effect
• As with p values, a CI is calculated with a chosen significance level (usually 0.05 - giving a 95% CI)
• If the CI for a ratio (RR or OR) crosses 1, the result is not significant
• A CI gives the largest and smallest effects that are likely given the data
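For the curious, here is a sketch of a 95% CI for a relative risk, using the nicotine gum numbers from earlier. The slides do not say how a CI is calculated; this uses the common log-based approximation, and the function name is made up. You would not be asked to do this by hand.

```python
import math


def relative_risk_ci(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Return (RR, lower, upper) using the usual approximation on the log scale."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    # Standard error of ln(RR) for two independent groups.
    se_log_rr = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Nicotine gum: 1149 of 6328 stopped. Placebo: 893 of 8380 stopped.
rr, lower, upper = relative_risk_ci(1149, 6328, 893, 8380)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

This comes out at an RR of about 1.70 with an interval of roughly 1.57 to 1.85. The whole interval sits above 1, so the result is significant at the 0.05 level.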
Forest Plot
• Used in meta-analyses
• Plot confidence intervals
– Which studies produced a significant result?
– Look at the effect on the values of having extra power from aggregated data!
Fig 3 Meta-analysis of studies showing impact of opiate substitution treatment in relation to HIV transmission in people who inject drugs among all pooled studies and studies reporting only adjusted effect estimates.
MacArthur G J et al. BMJ 2012;345:bmj.e5945
Error
• Results from your data may (will) differ from ‘real life’ - this is error.
– Random
– Systematic
Error - Random
• Random error occurs because we are only using a selection to get our information, not the whole population (sampling)
– Beta error / Type II
– Alpha error / Type I
Error - Random
• Beta / Type II
– Conclude there is no association when there is one
– Due to a small study, not enough power, sample size too small
– The danger is that we remain unaware of real risk factors or effective treatments (also less likely to get published)
Error - Random
• Alpha / Type I
– Conclude there is an association when there isn’t one (much worse)
– Minimised by using a small P-value as our chosen level of significance.
– If the risk of harm due to a treatment was very high, we might use a smaller P-value
• Chosen significance level = risk of alpha error when there is no real effect. 0.05 = 5% risk
Error - Systematic
• Selection bias:
– Esp. case-control studies (cases and controls not comparable)
– If controls are selected using criteria related to risk factors under investigation
– e.g. selecting lung cancer controls from resp. clinic
– Rx: good study design and control selection
Error - Systematic
• Information bias:
– Mis-classification of disease or exposure status
– Recall bias is the main example
Error - Systematic
• Confounding
– Factor which is associated both with exposure and outcome, and isn’t taken into account
– Consider occupation and lung cancer rates
– Smoking may be more common in some occupations
– Rx: good study design, anticipating confounders and taking them into account
Validity
• Validity: extent to which a variable measures what it is supposed to measure
– Internal: study design, inferences about study population
– External: can the results be applied to non-study patients/populations (is the study generalisable?)
– NB statistical association ≠ causality
Skewness
Positive and negative skew in describing uni-modal data
4. Screening
Screening
• No time to cover here properly
• Learn the Wilson and Jungner criteria
– Condition
– Test
– Treatment
– Programme
• Learn about screening test evaluation
Screening Test Evaluation
Again, having a descriptive definition will help you to work out the formula:
• Sensitivity
• Specificity
• Positive Predictive Value
• Negative Predictive Value
• Likelihood Ratios
Screening Test Evaluation
• Sensitivity and specificity: refer to test
• Predictive values: refer to population
– Depend on disease prevalence
• Likelihood ratios: refer to individuals
Screening Test Evaluation
• My advice:
• Draw a table (surprise!)
– Consider doing this at the start of the exam
• Be able to work out the formulae from the table and your descriptive definitions
             TRUE Pos.         TRUE Neg.
TEST Pos.    TRUE POSITIVE     FALSE POSITIVE
TEST Neg.    FALSE NEGATIVE    TRUE NEGATIVE
Remember: ‘positive’ or ‘negative’ refers to the test result
             TRUE Pos.    TRUE Neg.    Totals
TEST Pos.    A            B            A+B
TEST Neg.    C            D            C+D
Totals       A+C          B+D
Sensitivity
• How good a test is at picking up cases
– What proportion of the cases of disease are picked up by the test?
– Cases of disease = A+C
– Picked up by the test (true positives) = A
– So…
Sensitivity = A / (A+C)
Specificity
• How specific is a positive test result to cases of the disease?
– A specific test would have a small proportion of false positives (B), which means it would have a high proportion of true negatives (D)…
– Out of all of those without the disease (B+D)
• Remember, specificity is independent of prevalence
– So…
Specificity = D / (B+D)
PPV
• Chance that a test +ve actually has the disease
– Test +ve who actually have disease (true positives) = A
– All test +ves = A+B
– So…
PPV = A / (A+B)
Remember this depends on prevalence, which has important implications for screening
NPV
• Chance that a test -ve really doesn’t have the disease
– Test negatives who really don’t have disease (true negatives) = D
– All test -ves = C+D
– So…
NPV = D / (C+D)
Remember this depends on prevalence, which has important implications for screening
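To see the four formulae side by side, and why prevalence matters, here is a small Python sketch. The counts are invented for illustration (a test with 90% sensitivity and 90% specificity used where the disease is common, then where it is rare), and the function name is hypothetical.

```python
from fractions import Fraction


def evaluate_test(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": Fraction(a, a + c),  # A / (A+C)
        "specificity": Fraction(d, b + d),  # D / (B+D)
        "ppv": Fraction(a, a + b),          # A / (A+B)
        "npv": Fraction(d, c + d),          # D / (C+D)
    }


# Made-up counts: same test, two different populations.
common = evaluate_test(a=90, b=90, c=10, d=810)    # 1,000 people, 10% prevalence
rare = evaluate_test(a=90, b=990, c=10, d=8910)    # 10,000 people, 1% prevalence

for label, result in (("10% prevalence", common), ("1% prevalence", rare)):
    print(label, {name: round(float(value), 3) for name, value in result.items()})
```

Sensitivity and specificity are 0.9 in both populations, but the PPV falls from 0.5 to about 0.08 when the disease is rare: most positives are then false positives, which is the implication for screening.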
Likelihood Ratios
• Useful when considering individuals
• +ve LR: How much more likely a person is to have a disease, if they test positive
• Allow you to take into account pre test probability.
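The slides don’t give a formula for likelihood ratios, so the sketch below uses the standard definitions as an assumption: +ve LR = sensitivity / (1 - specificity), and post-test odds = pre-test odds x LR. The function names and the example figures (90% sensitivity and specificity, 10% pre-test probability) are made up.

```python
from fractions import Fraction


def positive_likelihood_ratio(sensitivity, specificity):
    """+ve LR = sensitivity / (1 - specificity) (standard definition, assumed here)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test: 90% sensitive, 90% specific. Pre-test probability 10%.
lr_pos = positive_likelihood_ratio(Fraction(9, 10), Fraction(9, 10))
post = post_test_probability(Fraction(1, 10), lr_pos)
print(f"+ve LR = {lr_pos}, post-test probability = {float(post)}")  # +ve LR = 9, post-test probability = 0.5
```

So a positive result on this test takes someone from a 1 in 10 chance of disease to a 1 in 2 chance.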
So what to learn?
• AKT syllabus?
– At least read the list of terms with Wikipedia at your side
• Stuff covered here?
• Or key bits?
– Familiarity with study types
– How to calculate an NNT
– What P-values and ORs mean
– Wilson and Jungner criteria
– Screening test evaluation
– Mean/median/mode/SD and some graph types