Basic Statistical Concepts
Donald E. Mercante, Ph.D.
Biostatistics, School of Public Health
LSU-HSC

Posted on 22-Dec-2015




Two Broad Areas of Statistics

Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays

Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection


Descriptive Statistics

When computed for a population of values, numerical descriptors are called parameters.

When computed for a sample of values, numerical descriptors are called statistics.


Descriptive Statistics

Two important aspects of any population

Magnitude of the responses

Spread among population members


Descriptive Statistics

Measures of Central Tendency (magnitude)

Mean
- most widely used
- uses all the data
- best statistical properties
- susceptible to outliers

Median
- does not use all the data
- resistant to outliers
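A small illustration of the outlier point, using hypothetical readings (not from the source): replacing the largest value with a gross outlier pulls the mean sharply upward, while the median does not move.

```python
import statistics

# Hypothetical readings (illustrative only)
readings = [118, 120, 122, 125, 130]
# Same readings with the largest value replaced by a gross outlier
with_outlier = readings[:-1] + [300]

for label, data in (("original", readings), ("with outlier", with_outlier)):
    # The mean uses every value; the median depends only on the middle value
    print(label, statistics.mean(data), statistics.median(data))
# original 123 122
# with outlier 157 122
```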


Descriptive Statistics

Measures of Spread (variability)

Range
- simple to compute
- does not use all the data

Variance
- uses all the data
- best statistical properties
- measures the average squared distance of values from a reference point (the mean)
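The contrast can be sketched on two hypothetical data sets that share the same smallest and largest values: the range cannot tell them apart, while the sample variance, which is computed from every observation, can. Python's `statistics.variance` uses the n - 1 divisor; `data_range` is an illustrative helper.

```python
import statistics


def data_range(values):
    """Range = largest minus smallest value; only the two extremes are used."""
    return max(values) - min(values)


# Hypothetical data sets: same extremes, different interior values
set_a = [118, 120, 122, 125, 130]
set_b = [118, 129, 129, 129, 130]

for data in (set_a, set_b):
    # Sample variance: sum of squared deviations from the mean, divided by n - 1
    print(data_range(data), statistics.variance(data))
# 12 22
# 12 25.5
```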


Properties of Statistics

• Unbiasedness - on target
• Minimum variance - most reliable

• If an estimator possesses both properties, then it is a MINVUE = MINimum Variance Unbiased Estimator

• The sample mean and sample variance are UMVUEs (of µ and σ²) = Uniformly MINimum Variance Unbiased Estimators
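Unbiasedness can be checked exactly on a toy example. The sketch below assumes a hypothetical four-value population and enumerates every ordered sample of size 2 drawn with replacement (i.i.d. sampling), so averaging a statistic over all samples gives its exact expected value. The sample mean averages to µ, the n - 1 divisor makes the sample variance average to σ², and the divisor n underestimates σ² by the factor (n - 1)/n. Only unbiasedness is shown; minimum variance is not demonstrated here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population; Fractions keep every expectation exact
population = [Fraction(v) for v in (1, 2, 3, 6)]
N = len(population)
mu = sum(population) / N                             # population mean (a parameter)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (a parameter)

n = 2
# All ordered samples of size n drawn with replacement; each is equally likely
samples = list(product(population, repeat=n))


def sample_mean(s):
    return sum(s) / len(s)


def sum_sq_dev(s):
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s)


def expectation(stat):
    """Exact expected value of a statistic over all equally likely samples."""
    return sum(stat(s) for s in samples) / len(samples)


e_mean = expectation(sample_mean)                        # equals mu
e_s2 = expectation(lambda s: sum_sq_dev(s) / (n - 1))    # equals sigma2 (unbiased)
e_biased = expectation(lambda s: sum_sq_dev(s) / n)      # (n - 1)/n * sigma2 (biased low)

print(mu, sigma2, e_mean, e_s2, e_biased)  # 3 7/2 3 7/2 7/4
```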


Inferential Statistics

- Hypothesis Testing

- Interval Estimation


Hypothesis Testing

Specifying hypotheses:

H0: “null” or no effect hypothesis

H1: research or alternative hypothesis

Note: Only H0 (null) is tested.


Errors in Hypothesis Testing

                     Reality
Decision             H0 True               H0 False
Fail to Reject H0    Correct decision      Type II error (β)
Reject H0            Type I error (α)      Correct decision (power = 1 − β)


Hypothesis Testing

In parametric tests, actual parameter values are specified for H0 and H1:

H0: µ ≤ 120

H1: µ > 120


Hypothesis Testing

Another example of explicitly specifying H0 and H1:

H0: = 0

H1: ≠ 0


Hypothesis Testing

General framework:

• Specify null & alternative hypotheses
• Specify test statistic
• State rejection rule (RR)
• Compute test statistic and compare to RR
• State conclusion
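The five steps can be sketched in Python. As an assumption for illustration, the sketch pairs the hypotheses H0: µ ≤ 120 vs H1: µ > 120 with the standing SBP summary statistics (mean 140.8, s.d. 9.5, n = 12) used in the interval-estimation example. The level α = 0.05 and the one-sided critical value t(0.95, 11) = 1.796, taken from a standard t table, are also assumptions; the value is hard-coded to keep the sketch dependency-free (with SciPy, `scipy.stats.t.ppf(0.95, 11)` returns it).

```python
import math


def one_sample_t(xbar, s, n, mu0):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n)), with n - 1 df."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Step 1: hypotheses H0: mu <= 120 vs H1: mu > 120 (one-sided, upper tail)
mu0 = 120.0
# Step 2: test statistic is the one-sample t; summary data assumed from the SBP example
xbar, s, n = 140.8, 9.5, 12
# Step 3: rejection rule at alpha = 0.05: reject H0 if t > t(0.95, 11) = 1.796
t_crit = 1.796
# Step 4: compute the test statistic and compare it to the rejection rule
t = one_sample_t(xbar, s, n, mu0)
reject = t > t_crit
# Step 5: state the conclusion
print(f"t = {t:.2f}; {'reject' if reject else 'fail to reject'} H0 at alpha = 0.05")
# t = 7.58; reject H0 at alpha = 0.05
```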


Common Statistical Tests

Test Name                  Purpose
One-sample z- or t-test    Test the value of a mean
Two-sample z- or t-test    Compare two means
Paired t-test              Compare related means (test the mean of the within-pair differences)
ANOVA                      Test for differences among 2 or more means


Common Statistical Tests (cont.)

Test                                  Purpose
Test on binomial proportion(s)        Test whether a binomial proportion equals a specified value, or whether proportions equal each other
Test on correlation coefficient(s)    Test whether a correlation coefficient = 0, or whether correlation coefficients equal each other
Regression                            Test whether the slope = 0
RxC contingency table analysis        Test whether two categorical variables are related


Advanced Topics

Test                                 Purpose
Multivariate tests (e.g., MANOVA)    Test the values of several parameters simultaneously
Repeated measures / crossovers       Test means when subjects are repeatedly measured
Survival analysis                    Estimate and compare survival probabilities for one or more groups
Nonparametric tests                  Many are analogous to standard parametric tests


P-Values

p = probability of obtaining a result at least this extreme, given that the null hypothesis is true.

P-values are probabilities: 0 ≤ p ≤ 1

Computed from the distribution of the test statistic under H0
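The definition can be made concrete for the simplest case, a test statistic that is standard normal when H0 is true. This sketch assumes a z statistic and uses only `statistics.NormalDist` from the Python standard library; a t statistic would instead need the t distribution (for example `scipy.stats.t.sf`). The function name and the `alternative` labels are illustrative choices.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """P-value for a z statistic, using its N(0, 1) distribution under H0."""
    std_normal = NormalDist()  # distribution of z when the null is true
    if alternative == "two-sided":
        # Probability of a result at least as extreme in either direction
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative}")


print(round(z_test_p_value(1.96), 3))              # 0.05
print(round(z_test_p_value(1.645, "greater"), 3))  # 0.05
```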


Epidemiological Concepts

Rate: a proportion, specifically a fraction in which the numerator, c, is included in the denominator:

$$\text{Rate} = \frac{c}{c + d}$$

- Useful for comparing groups of unequal size

Example:

$$\text{Neonatal mortality rate} = \frac{\text{\# deaths} < 28 \text{ days old}}{\text{total \# live births}}$$


Epidemiological Concepts

Measures of Morbidity:

Incidence Rate: # new cases occurring during a given time interval divided by the population at risk at the beginning of that period.

Prevalence Rate: total # cases at a given time divided by the population at risk at that time.
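A minimal sketch of the two morbidity measures. The counts are hypothetical, chosen only to show that incidence counts new cases against the population at risk at the start of the period, while prevalence counts all existing cases at one point in time; results are scaled per 1,000 for readability, and the function names are illustrative.

```python
def incidence_rate(new_cases, at_risk_at_start):
    """# new cases during the interval / population at risk at the start of it."""
    return new_cases / at_risk_at_start


def prevalence_rate(total_cases, at_risk_now):
    """Total # cases at a given time / population at risk at that time."""
    return total_cases / at_risk_now


# Hypothetical counts for one calendar year (illustrative only)
inc = incidence_rate(new_cases=50, at_risk_at_start=9_800)
prev = prevalence_rate(total_cases=250, at_risk_now=10_000)
print(f"incidence:  {1000 * inc:.1f} per 1,000 at risk per year")
print(f"prevalence: {1000 * prev:.1f} per 1,000 at the survey date")
```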


Epidemiological Concepts

Most people think in terms of the probability (p) of an event as a natural way to quantify the chance that the event will occur: 0 ≤ p ≤ 1

- p = 0: the event will certainly not occur
- p = 1: the event is certain to occur

But there are other ways of quantifying the chance that an event will occur...


Epidemiological Concepts

Odds and Odds Ratio:

$$O = \text{Odds of an event} = \frac{\text{expected \# times the event will occur}}{\text{expected \# times the event will not occur}}$$

For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of an event.

In gambling, we say the odds are 5 to 2. This corresponds to the single number 5/2 = Odds.


Epidemiological Concepts

The relationship between probability & odds:

$$O = \frac{\text{prob of event}}{\text{prob of no event}} = \frac{p}{1 - p} \qquad\Longleftrightarrow\qquad p = \frac{O}{1 + O}$$


Epidemiological Concepts

Probability    Odds
.1             .11
.2             .25
.3             .43
.4             .67
.5             1.00
.6             1.50
.7             2.33
.8             4.00
.9             9.00

Odds < 1 correspond to probabilities < 0.5.

0 < Odds < ∞
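The two conversion formulas, written as small functions (names illustrative), reproduce the probability-odds table above and invert each other.

```python
def prob_to_odds(p):
    """O = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)


def odds_to_prob(odds):
    """p = O / (1 + O), the inverse transformation."""
    return odds / (1 + odds)


PROBS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
for p in PROBS:
    print(f"{p:.1f}  {prob_to_odds(p):.2f}")
```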


Example 1: Odds Ratio

Death sentence by race of defendant in 147 trials

           Blacks   Nonblacks   Total
Death        28        22         50
Life         45        52         97
Total        73        74        147


Example 2: Odds Ratio

Odds of death sentence = 50/97 = 0.52

For Blacks: O = 28/45 = 0.62

For Nonblacks: O = 22/52 = 0.42

Ratio of Black odds to Nonblack odds = 1.47

This is called the Odds Ratio:

$$OR = \frac{28/45}{22/52} = \frac{28 \times 52}{22 \times 45} = \frac{1456}{990} = 1.47$$
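The cross-product calculation can be written as a small function. The cell labels a, b, c, d are local to this sketch (columns: Blacks, Nonblacks; rows: death, life), and the function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as

                    group 1   group 2
        outcome        a         b
        no outcome     c         d

    OR = (a/c) / (b/d) = (a*d) / (b*c)
    """
    return (a * d) / (b * c)


# Death-sentence table: Blacks 28 death / 45 life; Nonblacks 22 death / 52 life
print(round(odds_ratio(a=28, b=22, c=45, d=52), 2))  # 1.47
```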


Logistic Regression

Odds ratios are directly related to the parameters of the logit (logistic regression) model.

Logistic regression is a statistical method that models binary (e.g., Yes/No; T/F; Success/Failure) data as a function of one or more explanatory variables.

We would like a model that predicts the probability of a success, i.e., P(Y = 1), using a linear function.


Logistic Regression

Problem: Probabilities are bounded by 0 and 1, but linear functions are inherently unbounded.

Solution: Transform P(Y = 1) = p to the odds, p/(1 − p), which removes the upper bound (odds range from 0 to ∞). Taking the log of the odds also removes the lower bound.

Setting this result equal to a linear function of the explanatory variables gives us the logit model.


Logistic Regression

Logit or Logistic Regression Model:

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$$

where p_i is the probability that y_i = 1.

The expression on the left is called the logit or log odds.


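Logistic Regression

Probability of success:

$$p_i = P(Y_i = 1) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki})}}$$

Odds ratio for each explanatory variable (the multiplicative change in the odds for a one-unit increase in X_j, with the other variables held fixed):

$$OR \text{ for } X_j = e^{\beta_j}$$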
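To connect the odds ratio back to the model parameters, the sketch below fits log(p/(1 − p)) = α + βx to the death-sentence table by maximum likelihood, coding x = 1 for Black and x = 0 for Nonblack defendants (a coding chosen for illustration). Newton-Raphson is used only to keep the example self-contained; in practice a statistics package (for example the `Logit` model in statsmodels) would do the fitting. With a single binary explanatory variable, e^β should match the cross-product odds ratio 1456/990 = 1.47.

```python
import math

# Grouped data from the death-sentence table: (x, deaths, trials).
# Illustrative coding: x = 1 for Black defendants, x = 0 for Nonblack defendants.
GROUPS = [(1, 28, 73), (0, 22, 74)]


def fit_logit(groups, iterations=25):
    """Fit log(p / (1 - p)) = alpha + beta * x by maximum likelihood (Newton-Raphson)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # information matrix (negative Hessian)
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step: solve the 2x2 system (information) * delta = score
        det = i00 * i11 - i01 * i01
        alpha += (i11 * g0 - i01 * g1) / det
        beta += (i00 * g1 - i01 * g0) / det
    return alpha, beta


alpha, beta = fit_logit(GROUPS)
print(round(math.exp(beta), 2))  # 1.47
```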


Screening Tests

Suppose a new screening test for herpes virus has been developed and the following summary for 1000 individuals has been compiled:

                     Has Herpes   Does Not Have Herpes
Screened Positive        45                10
Screened Negative         5               940


Screening Tests

How do we evaluate the usefulness of such a test?

Diagnostics:
- sensitivity
- specificity
- false positive rate
- false negative rate
- predictive value positive
- predictive value negative


Screening Tests

Generic Screening Test Table

                     With Disease   Without Disease   Total
Screened Positive         a               b            a+b
Screened Negative         c               d            c+d
Total                    a+c             b+d            N


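Screening Tests

$$\text{Sensitivity} = \frac{a}{a + c}$$

$$\text{Specificity} = \frac{d}{b + d}$$

$$\text{False positive rate} = \frac{b}{b + d}$$

$$\text{False negative rate} = \frac{c}{a + c}$$

$$\text{Yield or predictive value positive} = \frac{a}{a + b}$$

$$\text{Prevalence} = \frac{a + c}{N}$$

$$\text{Yield or predictive value negative} = \frac{d}{c + d}$$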


Screening Tests

Sensitivity = 45/50 = 90%

Specificity = 940/950 = 98.95%

False positive rate = 10/950 = 1.05%

False negative rate = 5/50 = 10%

Yield or predictive value positive = 45/55 = 81.82%

Prevalence = 50/1000 = 5%

Yield or predictive value negative = 940/945 = 99.47%
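The formulas can be checked against the herpes example with a short function. The name `screening_summary` and its dictionary keys are illustrative; the cell labels a, b, c, d follow the generic table (rows: screened positive/negative; columns: with/without disease).

```python
def screening_summary(a, b, c, d):
    """Diagnostics for a 2x2 screening table (rows: screened +/-, columns: disease yes/no)."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "false_negative_rate": c / (a + c),
        "predictive_value_positive": a / (a + b),
        "predictive_value_negative": d / (c + d),
        "prevalence": (a + c) / n,
    }


herpes = screening_summary(a=45, b=10, c=5, d=940)
for name, value in herpes.items():
    print(f"{name:26s} {100 * value:6.2f}%")
```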


Interval Estimation

Statistics such as the sample mean, median, variance, etc., are called point estimates. Point estimates:
- vary from sample to sample
- do not incorporate precision


Interval Estimation

Take as an example the sample mean:

X̄ → µ (popn mean)

Or the sample variance:

S² → σ² (popn variance)

(→ = "estimates")


Interval Estimation

Recall Example 1, a one-sample t-test on the population mean. The test statistic was

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

This can be rewritten to yield:


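Interval Estimation

$$P\left(-t_{1-\alpha/2,\,n-1} \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{1-\alpha/2,\,n-1}\right) = 1 - \alpha$$

which can be rearranged to give a (1 − α)100% confidence interval for µ:

$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Form: Estimate ± Multiple of Std Error of the Est.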


Interval Estimation

Example 1: Standing SBP

Mean = 140.8, s.d. = 9.5, N = 12

95% CI for µ, with t(0.975, 11) = 2.201: 140.8 ± 2.201 (9.5/√12) = 140.8 ± 6.036 = (134.8, 146.8)
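A sketch of the interval calculation, reproducing the standing SBP example. The multiplier t(0.975, 11) = 2.201 is passed in from a t table to avoid external dependencies (with SciPy installed, `scipy.stats.t.ppf(0.975, 11)` gives the same value); the function name is hypothetical.

```python
import math


def t_confidence_interval(xbar, s, n, t_mult):
    """Estimate ± multiple of the standard error: xbar ± t_mult * s / sqrt(n)."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    margin = t_mult * se
    return xbar - margin, xbar + margin


# Standing SBP example: mean 140.8, s.d. 9.5, N = 12, t(0.975, 11) = 2.201
lo, hi = t_confidence_interval(140.8, 9.5, 12, t_mult=2.201)
print(f"({lo:.1f}, {hi:.1f})")  # (134.8, 146.8)
```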