brief outline of basic statistics - university of cape town · 2017. 8. 30. · brief outline of...

Brief outline of basic statistics S/Lecturer Maia Lesosky Division of Epidemiology & Biostatistics, University of Cape Town [email protected] Thanks to Landon Myer for slides


“These are the topics that need to be covered”

1. definitions of mean, median and standard deviation
2. interpretation of confidence intervals and p-values
3. sensitivity and specificity
4. positive and negative predictive values
5. interpretation of parametric and non-parametric tests commonly used: Student’s T-test, chi-square analysis, Fisher’s exact test, ANOVA
6. correlation
7. risk reduction and numbers needed to treat
8. relative risk and hazard ratio
9. regression analysis
10. interpretation of kappa values
11. survival analysis

Principles of this talk

•  More detail here than you need
–  Will go quickly, so you must stop me to ask questions
–  These are the basics; you will be asked to apply them
•  Many things here are gross simplifications (you want this)
•  Terminology varies → I have kept it as general as possible and noted synonyms

Outline

I. Measurements & distributions (20%) – describing distributions

II. Making comparisons (70%) – Statistical tests, p-values & CI – Regression, survival analysis

III. Evaluating measurements (10%) – Validity, reliability

I. Measurements & distributions

Measurements
•  Broadly 3 kinds of measurements (“variables”) in health sciences
–  Numeric measures: continuous, discrete
–  Categorical measures: polytomous (ordinal vs nominal), binary
–  Time-based measures: time-to-event, survival

Examples
•  Systolic blood pressure
•  Mortality
•  ALT
•  TNM staging
•  Gender/sex
•  Time to remission

Note: we can make categories out of continuous measures

[Figure: haemoglobin recorded three ways: as a continuous measure (g/dL), as a polytomous measure (low, medium, high) and as a binary measure (low, high). NB: the continuous version is a numeric measure; the polytomous and binary versions are categorical measures.]

Distributions

•  When we take measurements on many patients, we can describe measures as distributions

•  How we describe a distribution will depend on the kind of measure

Categorical distributions

•  Describe in frequency distributions

We can describe the distribution of a categorical variable in terms of counts or percentages: different units, same conclusions
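A minimal Python sketch of a frequency distribution, assuming a hypothetical list of disease-stage labels (the data and the function name `frequency_table` are illustrative, not from the slides). Each category is reported both as a count and as a percentage, the two units mentioned above.

```python
from collections import Counter

# Hypothetical disease-stage labels for 8 patients (illustrative data only)
stages = ["I", "II", "II", "III", "I", "II", "IV", "II"]

def frequency_table(values):
    """Return (category, count, percentage) rows, most common category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

for category, count, pct in frequency_table(stages):
    print(f"{category:>4}  n={count}  {pct:.1f}%")
```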

Continuous distributions

•  We also describe in terms of frequency distributions

•  But we have many “categories” (“bins”)

We draw summaries to describe the shape of the distribution

There are other ways to show shapes of distributions

‘Box-and-whisker’ plots

Many different possible shapes for continuous distributions

Skewed distributions
•  Negative (left) skew
–  Long “tail” of distribution is to the left
–  Bulk of observations shifted right
•  Positive (right) skew
–  Long “tail” of distribution is to the right
–  Bulk of observations shifted left

There are some “classic” distributions

Ways to describe a distribution

•  Measure of central tendency – Where the distribution clusters

•  Measure of dispersion – How spread out the distribution is

Measures of central tendency

•  Mean – Arithmetic mean, average value

•  Median – 50th percentile, middle value

•  Mode – Most commonly occurring value
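The three measures can be computed with Python's standard `statistics` module. The ALT values below are hypothetical; the single high value (120) pulls the mean well above the median, which previews why the median is often preferred for skewed data.

```python
import statistics

# Hypothetical ALT values (U/L) for 9 patients; illustrative only
alt = [18, 22, 22, 25, 27, 30, 34, 41, 120]

mean_alt = statistics.mean(alt)      # arithmetic mean: sum of values / n
median_alt = statistics.median(alt)  # middle value of the sorted data (50th percentile)
mode_alt = statistics.mode(alt)      # most commonly occurring value

print(f"mean={mean_alt:.1f}, median={median_alt}, mode={mode_alt}")
```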

Quantiles (regular intervals of a distribution)

•  Percentiles 1, 2, 3, 4, 5, 6, 7, 8 …… 94, 95, 96, 97, 98, 99, 100

•  Deciles 1-10, 11-20, 21-30 ….. 81-90, 91-100

•  Quintiles 1-20, 21-40, 41-60, 61-80, 81-100

•  Quartiles 1-25, 26-50, 51-75, 76-100

•  Tertiles 1-33, 34-66, 67-100
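Quantile cut points can be sketched with `statistics.quantiles`, where `n` is the number of equal-sized groups (4 for quartiles, 3 for tertiles, 10 for deciles). The data are hypothetical, and different software packages use slightly different interpolation rules, so cut points may not match other tools exactly.

```python
import statistics

# Hypothetical ALT values (U/L); illustrative only
alt = [18, 22, 22, 25, 27, 30, 34, 41, 120]

quartiles = statistics.quantiles(alt, n=4)  # 3 cut points: 25th, 50th, 75th percentiles
tertiles = statistics.quantiles(alt, n=3)   # 2 cut points: about the 33rd and 67th percentiles
deciles = statistics.quantiles(alt, n=10)   # 9 cut points: 10th, 20th, ..., 90th percentiles

print("quartile cut points:", quartiles)
print("tertile cut points: ", tertiles)
```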

Describing distributions

Note: if the data are Normally distributed, the mean is a good measure of central tendency. If the data are non-Normal, the median is a better measure of central tendency.

Measures of dispersion (‘spread’)

•  Range –  Minimum value to Maximum value

•  Variance –  Average squared distance between each point and the mean

•  Standard deviation –  Square root of variance

•  Interquartile range –  25th percentile to 75th percentile
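A short sketch of the four dispersion measures using the `statistics` module, on the same kind of hypothetical data. Note that `statistics.variance` is the sample variance, which divides the sum of squared deviations by n − 1 rather than n (`statistics.pvariance` divides by n).

```python
import statistics

# Hypothetical ALT values (U/L); illustrative only
alt = [18, 22, 22, 25, 27, 30, 34, 41, 120]

value_range = (min(alt), max(alt))    # minimum to maximum
variance = statistics.variance(alt)   # sample variance: sum of squared deviations / (n - 1)
sd = statistics.stdev(alt)            # standard deviation = square root of the variance
q1, _, q3 = statistics.quantiles(alt, n=4)
iqr = (q1, q3)                        # 25th to 75th percentile

print(f"range={value_range}, variance={variance:.1f}, SD={sd:.1f}, IQR={iqr}")
```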

Variance

3 distributions: same mean value, but different variances

We have a favourite distribution

Remember: standard deviation is just the square root of variance

Normal distribution ~ “Gaussian distribution”; the “standard normal distribution” is the special case with mean 0 and standard deviation 1

We like the Normal distribution because it has some well-defined features

In a Normal distribution, 95% of the data fall within 1.96 standard deviations of the mean value (here, 95.45% within 2 standard deviations)
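These figures can be checked with Python's `statistics.NormalDist` (Python 3.8+). The sketch uses the standard normal distribution, so "k standard deviations" simply means the interval from −k to +k; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def proportion_within(k):
    """Share of a Normal distribution lying within k SDs of its mean."""
    return z.cdf(k) - z.cdf(-k)

print(f"within 1.96 SD: {proportion_within(1.96):.4f}")
print(f"within 2 SD:    {proportion_within(2):.4f}")
print(f"z cutting off the central 95%: {z.inv_cdf(0.975):.3f}")
```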

•  This is where a 95% confidence interval comes from
–  A 95% confidence interval around a mean value is ± 1.96 standard errors (SE = SD/√n) around the mean value; ± 1.96 standard deviations instead describes where about 95% of the individual observations fall
–  Sometimes we are lazy and call it ± 2 SE around the mean
•  95% CI is a generic statistic
–  It will come up elsewhere (same concept, different application)
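A sketch contrasting the two intervals for a mean, assuming hypothetical blood-pressure data. The interval for the mean uses the standard error (SD/√n), whereas mean ± 1.96 SD describes where about 95% of individual observations fall. With only 10 observations, a t-distribution multiplier (about 2.26) would usually replace 1.96; 1.96 is kept here to match the text.

```python
import math
import statistics

# Hypothetical systolic blood pressures (mmHg) for 10 patients; illustrative only
sbp = [118, 125, 132, 121, 140, 128, 135, 122, 130, 127]

n = len(sbp)
mean = statistics.mean(sbp)
sd = statistics.stdev(sbp)
se = sd / math.sqrt(n)  # standard error of the mean

ci_95 = (mean - 1.96 * se, mean + 1.96 * se)            # uncertainty about the mean itself
reference_range = (mean - 1.96 * sd, mean + 1.96 * sd)  # spread of individual patients

print(f"mean = {mean:.1f}; 95% CI {ci_95[0]:.1f} to {ci_95[1]:.1f}")
print(f"~95% of patients between {reference_range[0]:.1f} and {reference_range[1]:.1f}")
```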

•  We often manipulate (“transform”) variables so that we can make them “Normal”
–  Common manipulations include logarithms, square roots, or squares
•  Eg, log HIV viral loads
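A minimal sketch of a log transformation, assuming hypothetical viral loads. On the log10 scale the values are far less skewed; back-transforming the mean of the logs gives the geometric mean, which is much smaller than the raw arithmetic mean that the largest value drags upwards.

```python
import math
import statistics

# Hypothetical HIV viral loads (copies/mL); strongly right-skewed on the raw scale
viral_loads = [40, 400, 1_000, 5_000, 20_000, 100_000, 1_000_000]

log_vl = [math.log10(v) for v in viral_loads]  # log10 copies/mL

raw_mean = statistics.mean(viral_loads)  # pulled upwards by the largest value
log_mean = statistics.mean(log_vl)
back_transformed = 10 ** log_mean        # geometric mean on the original scale

print([round(v, 2) for v in log_vl])
print(f"raw mean = {raw_mean:.0f}, geometric mean = {back_transformed:.0f}")
```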

There are many different standard distributions with well-defined features
•  Gaussian (Normal) is most common
•  Others
–  Z-distribution, T-distribution, F-distribution
–  Chi-squared (χ²) distribution
–  Binomial distribution (for categorical data)
–  Poisson distribution (for counts of things)

Parametric statistics

•  If the distribution of our measures follows a known distribution, we can make assumptions about our data based on rules of the known distribution
–  Eg, if our data are normally distributed, we know that about 95% of data fall within 2 standard deviations of the mean value
•  These kinds of statistics are parametric statistics

Non-parametric statistics

•  If our measures really don’t look like any known distribution, we can’t make assumptions about them based on any standard distribution
–  We have to work with the actual values of our measurements
•  These are non-parametric statistics

Example
There are parametric and non-parametric approaches to describing distributions
•  If data are normally distributed
–  Mean and standard deviation (or variance) used to describe distributions
•  If data are not normally distributed
–  Median and interquartile range (or just range) used to describe distribution

II. Making comparisons

•  Sometimes we want to compare 2 distributions to each other
–  Are the distributions different from each other?
–  Is there an association between the two measures?
•  We can ask this question about different combinations of
–  Continuous measures (Normal or non-normal distributions)
–  Categorical measures (polytomous or binary)

Example: comparison of 2 distributions

Serum cholesterol among women

Serum cholesterol among men

Question: Is cholesterol associated with gender?

Example: comparison of 2 binary measures

Patients without TB Patients with TB

Question: Is TB disease associated with death?

Statistical hypothesis testing

•  There are different statistical tests that are applied in different situations to answer the question
–  Are the distributions of one variable different according to another variable?
Which is the same thing as
–  Is there an association between one measure and another measure?
•  Different tests all give rise to p-values

Statistical test for every situation
•  Comparing 2 continuous variables to each other
–  Correlation coefficient
•  Comparing 2 categorical variables to each other
–  Chi-square test, Fisher’s exact test
•  Comparing a binary categorical variable to a continuous variable
–  Student’s T-test (parametric ~ if continuous variable is normally distributed)
–  Wilcoxon rank-sum test (= Mann-Whitney U-test) (non-parametric ~ if continuous variable not normally distributed)
•  Comparing a polytomous categorical variable to a continuous variable
–  ANOVA (parametric ~ if continuous variable is normally distributed)
–  Kruskal-Wallis test (extension of the Wilcoxon rank-sum test) (non-parametric ~ if continuous variable not normally distributed)

Correlation coefficient

•  Correlation coefficients (usually “r”) used to examine association between 2 continuous variables

This graph is sometimes called a “scatterplot”
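A plain-Python sketch of the Pearson correlation coefficient r, the most common correlation coefficient, assuming hypothetical paired measurements. r runs from −1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association). On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements: age (years) and systolic BP (mmHg)
age = [25, 32, 40, 47, 55, 61, 68]
sbp = [115, 118, 124, 129, 133, 140, 146]
r = pearson_r(age, sbp)
print(f"r = {r:.2f}")
```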

Chi-squared tests

•  Used to examine the association between 2 categorical variables

             Dead   Alive
no TB          26     128
TB+            67      91

Note: Chi-square tests are parametric (they rely on a large-sample approximation to the χ² distribution) and are used for larger sample sizes
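A minimal sketch of the Pearson chi-square test for a 2x2 table in plain Python (the function name is hypothetical). It uses the shortcut formula n(ad − bc)² divided by the product of the row and column totals, with no continuity correction, and turns the statistic into a p-value using the fact that a χ² variable with 1 degree of freedom is a squared standard normal. For the TB table above it gives χ² of about 24.3 and p < 0.001.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2 statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Table from the slide: rows = no TB / TB+, columns = dead / alive
chi2, p = chi_square_2x2(26, 128, 67, 91)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```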

Fisher’s exact tests

             Dead   Alive
no TB           2       6
TB+             7       5

For smaller sample sizes we replace chi-squared tests with Fisher’s exact tests (non-parametric)

They do the same thing but use different formulae and many more calculations

Small sample size ~ table contains <60 in total, or any expected cell count <5
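A plain-Python sketch of a two-sided Fisher's exact test for a 2x2 table (the function name is hypothetical). With the row and column totals held fixed, the top-left cell follows a hypergeometric distribution; the p-value adds up the probabilities of every possible table that is no more likely than the one observed. For the small TB table above this gives p of about 0.20; if SciPy is available, `scipy.stats.fisher_exact` should agree closely.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def table_prob(k):
        # probability that the top-left cell equals k, given the fixed margins
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    observed = table_prob(a)
    p = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        pk = table_prob(k)
        if pk <= observed * (1 + 1e-7):  # small tolerance so exact ties are counted
            p += pk
    return min(p, 1.0)

# Table from the slide: rows = no TB / TB+, columns = dead / alive
p_value = fisher_exact_2x2(2, 6, 7, 5)
print(f"p = {p_value:.3f}")
```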

•  Chi-squared tests and Fisher’s exact tests can be used to compare
–  2 binary variables to each other (2x2)
–  Binary versus polytomous (eg, 2x3)
–  Polytomous versus polytomous (eg, 4x5)

(Student’s) T-test

•  Used to compare the means of 2 normal distributions (parametric test)

•  Whether 2 distributions are different depends on the size of the difference in means AND how much variability is present
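A minimal sketch of Student's (pooled-variance) two-sample t statistic, assuming hypothetical cholesterol data. It makes the point above concrete: the numerator is the difference in means and the denominator reflects the variability within the groups. Only the statistic and its degrees of freedom are computed; turning t into a p-value needs the t-distribution (for example `scipy.stats.ttest_ind`, if SciPy is available).

```python
import math
import statistics

def students_t(x, y):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se_diff = math.sqrt(pooled_var * (1 / nx + 1 / ny))  # SE of the difference in means
    t = (statistics.mean(x) - statistics.mean(y)) / se_diff
    return t, nx + ny - 2

# Hypothetical serum cholesterol values (mmol/L); illustrative only
women = [4.8, 5.1, 5.4, 5.0, 5.6, 4.9, 5.3]
men = [5.2, 5.7, 5.9, 5.5, 6.1, 5.4, 5.8]
t, df = students_t(women, men)
print(f"t = {t:.2f} on {df} degrees of freedom")
```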

Wilcoxon rank-sum test (= Mann-Whitney U-test)
•  Non-parametric test
•  Compares 2 non-normal distributions
–  The non-parametric version of the t-test
•  “Comparing means”: t-test
•  “Comparing medians”: rank-sum test
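The rank-sum idea can be sketched in plain Python: pool both groups, rank all values (ties get the average of their ranks), then sum the ranks belonging to one group. The function names and data are hypothetical, and the sketch returns only the statistics, not a p-value (`scipy.stats.mannwhitneyu` is one option for that, if SciPy is available).

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Wilcoxon rank-sum W for group x and the corresponding Mann-Whitney U."""
    ranks = rank_with_ties(list(x) + list(y))
    w = sum(ranks[: len(x)])             # sum of the ranks belonging to group x
    u = w - len(x) * (len(x) + 1) / 2    # U = W minus its smallest possible value
    return w, u

# Hypothetical length of hospital stay (days) in two groups
group_a = [2, 3, 3, 4, 6, 9, 21]
group_b = [4, 5, 7, 8, 10, 14, 30]
print(mann_whitney_u(group_a, group_b))
```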

ANOVA
•  ANOVA = analysis of variance
•  Used to test for any difference in mean values for >2 distributions

Parametric – requires Normally distributed data

Is there any difference between these 3 distributions?
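A minimal sketch of the one-way ANOVA F statistic in plain Python (function name and data are hypothetical). F compares the variance between the group means with the variance within the groups; a large F suggests at least one group mean differs. Only F and its degrees of freedom are computed here; a p-value needs the F-distribution.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA: F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [statistics.mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical haemoglobin values (g/dL) in three treatment groups
f, df1, df2 = one_way_anova_f([11.2, 12.0, 11.5, 12.3],
                              [12.8, 13.1, 12.5, 13.4],
                              [11.9, 12.2, 12.6, 12.1])
print(f"F = {f:.2f} on ({df1}, {df2}) degrees of freedom")
```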

Kruskal-Wallis test
•  Extension of the Wilcoxon rank-sum test for comparing >2 groups at once
•  With only 2 groups it gives the same result as the (large-sample) Wilcoxon rank-sum (Mann-Whitney U) test
•  Non-parametric version of the ANOVA

•  Comparing 2 continuous variables to each other
–  Correlation coefficient
•  Comparing 2 categorical variables to each other
–  Chi-square test, Fisher’s exact test
•  Comparing a binary categorical variable to a continuous variable
–  Student’s T-test (parametric)
–  Wilcoxon rank-sum test (non-parametric)
•  Comparing a polytomous categorical variable to a continuous variable
–  ANOVA (parametric)
–  Kruskal-Wallis test (non-parametric)

Relative risks

•  Clinical research often seeks to understand whether patients with some pre-existing status (‘exposure’) may be more or less likely to develop some subsequent health outcome
–  Cohort studies
–  Randomised controlled trials

             Dead   Alive
Drug A         12      88
Drug B         37      63

•  Imagine a trial randomising 100 patients to receive drug A and 100 patients to receive drug B, then following them over time to observe survival

•  We could calculate a chi-square test here, but it is not very useful clinically (it only tells us about “statistical significance”)

•  Often we prefer to calculate the relative risk (risk ratio or rate ratio)

Relative risk
Proportion of all the exposed (here, drug A) patients developing the outcome
divided by
Proportion of all the unexposed (here, drug B) patients developing the outcome

             Dead   Alive
Drug A         12      88
Drug B         37      63

12 / (12 + 88) = 0.12
37 / (37 + 63) = 0.37
0.12 / 0.37 = 0.32

Interpreting the relative risk
•  Relative risk is how much more (or less) likely the health outcome is in one group relative to the other
–  Here, death is 0.32 times as likely (ie, less likely) in patients receiving drug A relative to patients receiving drug B
•  Note: if the risk of the outcome is the same in both arms, the relative risk is 1
•  If ‘exposure’ is protective, RR < 1
•  If ‘exposure’ is detrimental, RR > 1

Confidence intervals (again)
•  We can calculate confidence intervals (CI) around this relative risk
–  Here, the interval is (0.19 – 0.61)
•  The CI gives a range of values for the RR that the observed data (from the table) are consistent with
–  Narrow CI ~ precise estimate of RR (good)
–  Wide CI ~ imprecise estimate of RR (bad)
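A sketch of the relative risk with a 95% CI for the drug A vs drug B table. The interval here uses one common approach, assumed rather than taken from the slides: compute log(RR) ± 1.96 × SE on the log scale and exponentiate. It gives RR of about 0.32 (95% CI roughly 0.18 to 0.58), which may differ slightly from the interval quoted on the slide depending on the method used.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk comparing exposed (a events, b non-events) with
    unexposed (c events, d non-events), with a CI built on the log scale."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, (lower, upper)

# Drug A (exposed) vs drug B (unexposed); outcome = death
rr, (lower, upper) = relative_risk(12, 88, 37, 63)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```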

Absolute risk reduction (risk difference)
•  Like relative risk, but subtract instead of divide
Proportion of all the exposed (here, drug A) patients developing the outcome
minus
Proportion of all the unexposed (here, drug B) patients developing the outcome

12 / (12 + 88) = 0.12
37 / (37 + 63) = 0.37
0.12 - 0.37 = -0.25

             Dead   Alive
Drug A         12      88
Drug B         37      63

•  Absolute risk reduction (risk difference) tells us how much the risk of the health outcome differs between exposed and unexposed patients
–  Here, the risk of death drops by 0.25 (25 percentage points) when patients receive drug A compared to drug B
•  Note: if the risk of the outcome is the same in both arms, the risk reduction (risk difference) is 0
•  If ‘exposure’ is protective, risk difference < 0
•  If ‘exposure’ is detrimental, risk difference > 0

Numbers needed to treat
•  The average number of patients who need to receive an intervention (here, drug A) to prevent 1 outcome from happening
•  Calculated as 1 / (absolute risk reduction)
•  Here, 1 / 0.25 = 4
–  On average, 4 patients need to receive drug A instead of drug B to prevent 1 death
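The risk difference and NNT arithmetic above, as a small Python sketch (the function name is hypothetical). NNT uses the size of the risk difference; when the exposure increases risk, the same calculation is usually reported as a "number needed to harm". NNT is often rounded up to a whole patient in practice; the sketch leaves it unrounded.

```python
def risk_difference_and_nnt(a, b, c, d):
    """Risk difference (exposed minus unexposed) and number needed to treat.

    a, b = events / non-events in the exposed (treated) group
    c, d = events / non-events in the unexposed (comparison) group"""
    risk_difference = a / (a + b) - c / (c + d)
    # NNT = 1 / |risk difference|; undefined (infinite) when there is no difference
    nnt = 1 / abs(risk_difference) if risk_difference != 0 else float("inf")
    return risk_difference, nnt

rd, nnt = risk_difference_and_nnt(12, 88, 37, 63)  # drug A vs drug B, outcome = death
print(f"risk difference = {rd:.2f}, NNT = {nnt:.1f}")
```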

P-values

•  P-values provide a measure of “statistical significance” from any statistical test that compares 2 things

– “universal currency” of statistical comparison

– Helps us understand the role of chance in explaining an association

Interpreting p-values

•  P-values ~ probabilities ~ range from 0 - 1

•  P-value’s formal definition is based on hypothesis testing
–  It evaluates how compatible the data are with the null hypothesis (it is not the probability that the null hypothesis is true)
•  Null hypothesis ~ usually that there is no association between variables
–  P-value: the probability of observing data at least as extreme as the data in your study if the null hypothesis is true

Practical interpretation of p-values

•  Large p-value:

The association observed between the 2 variables in your data is consistent with the hypothesis of no association between variables

– “Association is not statistically significant”

– “No statistically significant difference”

•  Small p-value: the association observed between the 2 variables in your data is NOT consistent with the hypothesis of no association
–  “Association is statistically significant”
–  “Statistically significant difference”
•  Smaller p-value → association less consistent with a chance finding
–  “Statistically significant” = not consistent with chance

Small vs Large
•  We traditionally use 0.05 as a cut-off for a “statistically significant” p-value
•  This is an arbitrary rule-of-thumb
–  0.048 is not very different from 0.053
•  Another guide
–  >0.1 = not statistically significant
–  0.05-0.1 = approaching statistical significance
–  0.001-0.05 = statistically significant
–  <0.001 = highly statistically significant

Sample sizes
•  “Statistical significance” (the size of a p-value) is determined by a few things, most importantly
–  The size of the difference in the measure you are looking at AND
–  The number of patients (sample sizes) involved

For example

             Dead  Alive                    Dead  Alive
no TB           2      8        no TB         20     80
TB+             6      4        TB+           60     40

The proportions in the 2 tables are the same (calculate the risk ratios to see this)

But the p-value for the table on the left is 0.17, while the p-value for the table on the right is <0.001
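A self-contained sketch that reproduces this comparison. The 0.17 for the left table matches a two-sided Fisher's exact test (a reasonable choice for only 20 patients); applying the same test to the right table, where every cell is 10 times larger, gives p far below 0.001 even though the risk ratio is identical (3.0). The function names are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]] via hypergeometric tables."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    probs = [comb(col1, k) * comb(n - col1, row1 - k) / total
             for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1)]
    observed = comb(col1, a) * comb(n - col1, row1 - a) / total
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))

def risk_ratio(a, b, c, d):
    """Risk of death in the TB+ row divided by the risk in the no-TB row."""
    return (c / (c + d)) / (a / (a + b))

small = (2, 8, 6, 4)      # left table: rows no TB / TB+, columns dead / alive
large = (20, 80, 60, 40)  # right table: every cell multiplied by 10

rr_small, rr_large = risk_ratio(*small), risk_ratio(*large)
p_small, p_large = fisher_exact_2x2(*small), fisher_exact_2x2(*large)
print(f"risk ratios: {rr_small:.2f} vs {rr_large:.2f}")
print(f"p-values:    {p_small:.3f} vs {p_large:.2g}")
```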

P-values and CI
•  P-values and CI are closely related
–  Calculated from the same place
–  A small p-value suggests a narrow CI (more precise, good)
–  A large p-value suggests a wide CI (less precise, bad)
•  A 95% CI for an RR that does not include 1 means the corresponding p-value is <0.05 (‘statistically significant’)

Example: interpret the following
•  RR = 1.9; 95% CI = 1.4 – 2.8; p = 0.008
–  Outcome about 2x more common in exposed vs unexposed; narrow CI, statistically significant
•  RR = 0.8; 95% CI = 0.2 – 4.8; p = 0.37
–  Outcome slightly less common in exposed vs unexposed; wide CI, not statistically significant
•  RR = 1.02; 95% CI = 0.5 – 2.0; p = 0.98
–  Not much difference in frequency of outcome between exposed and unexposed; not statistically significant

Regression models

•  All the statistical tests we have looked at so far only look at the association between two variables at a time

•  But sometimes we want to look at the associations involved between >2 variables at once – Regression models commonly used for this

Concept of regression
•  Equation used to predict an outcome variable (Y) according to one or more predictor variables (X’s)
•  Basic equation for a line: Y = intercept + slope * X
–  Here we’re interested in the slope → the relationship between X and Y
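A minimal sketch of fitting the straight line Y = intercept + slope * X by ordinary least squares, assuming hypothetical age and blood-pressure data (the function name is hypothetical). The slope is the estimated change in Y per 1-unit change in X. On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates for Y = intercept + slope * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                    # change in Y per 1-unit change in X
    intercept = mean_y - slope * mean_x  # predicted Y when X = 0
    return intercept, slope

# Hypothetical data: age (years) and systolic BP (mmHg)
age = [25, 32, 40, 47, 55, 61, 68]
sbp = [115, 118, 124, 129, 133, 140, 146]
intercept, slope = fit_line(age, sbp)
print(f"SBP = {intercept:.1f} + {slope:.2f} * age")
```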

Linear regression

Note: There are many different kinds of regression

Application of regression models in medical research
•  Regression models are used to look at how multiple factors combine to predict a health outcome
–  Especially adjustment for confounding variables
•  An equation like Y = intercept + (slope1 * X) + (slope2 * R) + (slope3 * Z) would be used to understand how a certain health outcome (Y) is predicted by 3 different factors (X, R, Z)

Survival analysis
•  Survival analysis uses time-to-event measures
•  “Survival” can mean time until death
–  Or any other specific outcome: remission, cure, relapse, need for admission
–  Any binary outcome studied over time
•  Note: survival analysis comes from cohort studies or RCTs
–  Need to follow patients over time

Kaplan-Meier plots

Kaplan-Meier survival analyses to compare survival in 2 groups over time
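A plain-Python sketch of the Kaplan-Meier (product-limit) estimate that underlies these plots, assuming hypothetical follow-up data. At each time an event occurs, survival is multiplied by the proportion of patients still at risk who survive that time; censored patients (lost to follow-up or still alive at study end) simply leave the risk set. Comparing 2 groups means computing one curve per group; a formal comparison (for example a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each time where an event occurred.
    Patients censored at an event time are treated as still at risk at that time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group patients sharing this time
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # probability of surviving past time t
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve

# Hypothetical follow-up (months) for 8 patients; 1 = died, 0 = censored
months = [2, 3, 3, 5, 7, 8, 10, 12]
died = [1, 1, 0, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(months, died):
    print(f"month {t}: S(t) = {s:.3f}")
```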

Hazard ratios
•  There is a particular kind of regression model for survival analysis: Cox’s proportional hazards model
•  Model gives us hazard ratios
–  Like the distance between 2 survival curves
–  Interpreted exactly like relative risks
•  So how would you interpret:
–  HR > 1
–  HR < 1
–  HR = 1

III. Evaluating measurements


Evaluating a new test

•  We often want to know how well a certain test performs in detecting a condition of interest – We may be interested in screening for a condition or diagnosing it

•  Test of interest may be – Laboratory assay, radiological investigation

•  We want to know how the test performs – Identifying those with disease, and those without


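•  We study this by comparing the new test to an established gold standard (representing the truth)

|          | True pos            | True neg            | Total |
|----------|---------------------|---------------------|-------|
| Test pos | A                   | B (false positives) | A+B   |
| Test neg | C (false negatives) | D                   | C+D   |
| Total    | A+C                 | B+D                 |       |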


Ways of evaluating a new test

•  Sensitivity – The proportion of people who truly have disease who are detected correctly by the test (test positive)

•  Specificity – The proportion of people who truly do not have disease who are detected correctly by the test (test negative)

These are features of tests, but not actually what is needed by a clinician at the bedside


•  Positive predictive value – What proportion of people who have a positive test truly have disease

•  Negative predictive value – What proportion of people who have a negative test truly do not have disease

These are of greater clinical interest, but are problematic in reality: unlike sensitivity and specificity, they depend on how common the disease is (prevalence) in the population being tested
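Why predictive values are problematic can be sketched with Bayes' theorem: for a test with fixed sensitivity and specificity, PPV and NPV shift as prevalence changes. The sketch below assumes sensitivity 75% and specificity 175/300 (about 58%), the figures from the IFN-γ example later in these slides; the prevalence values looped over are illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sens * prevalence               # diseased and test positive
    false_pos = (1 - spec) * (1 - prevalence)  # not diseased but test positive
    true_neg = spec * (1 - prevalence)         # not diseased and test negative
    false_neg = (1 - sens) * prevalence        # diseased but test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENS, SPEC = 0.75, 175 / 300
for prevalence in (0.05, 0.25, 0.50):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With the same test, PPV rises and NPV falls as prevalence increases, so predictive values measured in one population may not carry over to another.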


•  Sensitivity = A / (A+C) – High sensitivity means few false negatives

•  Specificity = D / (B+D) – High specificity means few false positives


•  Positive Predictive Value = A / (A+B) – High PPV means few false positives

•  Negative Predictive Value = D / (C+D) – High NPV means few false negatives


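Example: raised IFN-γ in detecting culture-confirmed TB in smear-negative patients

|              | TB cult pos | TB cult neg | Total |
|--------------|-------------|-------------|-------|
| Raised IFN-γ | 75          | 125         | 200   |
| Normal IFN-γ | 25          | 175         | 200   |
| Total        | 100         | 300         | 400   |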

Calculate – Sens – Spec – PPV – NPV
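Working the calculation through with the counts in the table (A = 75, B = 125, C = 25, D = 175):

```latex
\begin{aligned}
\text{Sens} &= \frac{A}{A+C} = \frac{75}{100} = 75\% \\
\text{Spec} &= \frac{D}{B+D} = \frac{175}{300} \approx 58\% \\
\text{PPV}  &= \frac{A}{A+B} = \frac{75}{200} \approx 38\% \\
\text{NPV}  &= \frac{D}{C+D} = \frac{175}{200} \approx 88\%
\end{aligned}
```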


Interpretations: Sens, Spec

•  Sensitivity and specificity of raised IFN-γ in detecting culture-confirmed TB are 75% and 58%, respectively

– This means that 75% of patients with culture-confirmed TB will have raised IFN-γ (test pos)

– And that 58% of patients without culture-confirmed TB will not have raised IFN-γ (test neg)


Interpretations: PPV, NPV

•  PPV and NPV of raised IFN-γ in detecting culture-confirmed TB are 38% and 88%, respectively

– This means that 38% of patients with a raised IFN-γ will truly have culture-confirmed TB

– And that 88% of patients with a normal IFN-γ will truly not have culture-confirmed TB


General rules

•  Higher sensitivities usually mean lower specificities, e.g. when the cut-off for a positive result is changed (good tests are high on both)

•  High sens → few false negatives → high NPV – If a test has a high sens, a negative result helps rule out disease (SnOUT)

•  High spec → few false positives → high PPV – If a test has a high spec, a positive result helps rule in disease (SpIN)
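The sensitivity/specificity trade-off can be sketched by moving the cut-off of a test that gives a numeric result (such as an IFN-γ level). The scores below are hypothetical, invented only to show the pattern; a result at or above the cut-off is called positive.

```python
# Hypothetical test results (arbitrary units) in people with and without disease
DISEASED = [4, 6, 7, 8, 9]
NOT_DISEASED = [1, 2, 3, 5, 6]


def sens_spec_at(cutoff):
    """Sensitivity and specificity when 'score >= cutoff' counts as positive."""
    sens = sum(s >= cutoff for s in DISEASED) / len(DISEASED)
    spec = sum(s < cutoff for s in NOT_DISEASED) / len(NOT_DISEASED)
    return sens, spec


for cutoff in (4, 6, 8):
    sens, spec = sens_spec_at(cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the cut-off catches more true cases (higher sensitivity) at the cost of more false positives (lower specificity), and raising it does the reverse.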


Reliability

•  There are some situations in which there is no “gold-standard” to compare with

•  We then compare the reliability (repeatability) of measures

•  Examples: –  radiologists identifying lesions on scan – pathologists identifying malignancy on biopsy – psychiatrists making any diagnosis


•  In these situations we compare the repeatability of measures

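•  Here it’s not clear who the gold standard is → can’t really calculate sens, spec, etc.

|                         | Radiologist A: positive | Radiologist A: negative | Total |
|-------------------------|-------------------------|-------------------------|-------|
| Radiologist B: positive | 31                      | 19                      | 50    |
| Radiologist B: negative | 21                      | 79                      | 100   |
| Total                   | 52                      | 98                      | 150   |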


Kappa

•  Instead we want to see the amount of agreement between the raters

•  Kappa = the degree of agreement of 2 raters above chance – Measure of reliability (inter-rater, or test-retest when the same rater repeats the measurement) – Range: -1 (perfect disagreement) to 1 (perfect agreement)
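Cohen's kappa for the radiologist table can be sketched as follows: observed agreement is the proportion of scans on which the two radiologists agree, expected agreement is the agreement predicted by chance alone from each rater's marginal totals, and kappa = (observed - expected) / (1 - expected). The function name and argument order are illustrative.

```python
def cohens_kappa(both_pos, b_pos_a_neg, b_neg_a_pos, both_neg):
    """Cohen's kappa for 2 raters (A and B) making a yes/no call on the same cases."""
    n = both_pos + b_pos_a_neg + b_neg_a_pos + both_neg
    observed = (both_pos + both_neg) / n  # proportion of cases where raters agree
    # Chance agreement from each rater's marginal proportion of positives
    b_pos = (both_pos + b_pos_a_neg) / n
    a_pos = (both_pos + b_neg_a_pos) / n
    expected = b_pos * a_pos + (1 - b_pos) * (1 - a_pos)
    return (observed - expected) / (1 - expected)


# Counts from the radiologist table: 31, 19, 21, 79 (n = 150)
print(round(cohens_kappa(31, 19, 21, 79), 2))  # prints: 0.41
```

Here the radiologists agree on 73% of scans, but about 55% agreement would be expected by chance alone, so kappa is only about 0.41.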