basic statistical concepts donald e. mercante, ph.d. biostatistics school of public health l s u - h...

Post on 22-Dec-2015

Basic Statistical Concepts

Donald E. Mercante, Ph.D.

Biostatistics

School of Public Health

L S U - H S C

Two Broad Areas of Statistics

Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays

Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection

Descriptive Statistics

When computed for a population of values, numerical descriptors are called

Parameters

When computed for a sample of values, numerical descriptors are called

Statistics

Descriptive Statistics

Two important aspects of any population

Magnitude of the responses

Spread among population members

Descriptive Statistics

Measures of Central Tendency (magnitude)

Mean
- most widely used
- uses all the data
- best statistical properties
- susceptible to outliers

Median
- does not use all the data
- resistant to outliers

Descriptive Statistics

Measures of Spread (variability)

Range
- simple to compute
- does not use all the data

Variance
- uses all the data
- best statistical properties
- measures the average squared distance of values from a reference point (the mean)
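A minimal sketch of the descriptors listed so far, using Python's standard `statistics` module. The data values are hypothetical and chosen so that one outlier pulls the mean but leaves the median almost untouched.

```python
import statistics

# Hypothetical readings; the last value is a deliberate outlier.
values = [118, 120, 122, 124, 126, 190]

mean = statistics.mean(values)            # uses all the data, pulled toward the outlier
median = statistics.median(values)        # middle of the sorted data, resistant to the outlier
value_range = max(values) - min(values)   # uses only the two extreme values
variance = statistics.variance(values)    # sample variance (divisor n - 1)

print(f"mean={mean:.2f} median={median} range={value_range} variance={variance:.2f}")
```

Dropping the outlier and rerunning shows the mean moving much more than the median.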

Properties of Statistics

• Unbiasedness - On target

• Minimum variance - Most reliable

• If an estimator possesses both properties then it is a MINVUE = MINimum Variance Unbiased Estimator

• Sample Mean and Variance are UMVUE = Uniformly Minimum Variance Unbiased Estimators

Inferential Statistics

- Hypothesis Testing

- Interval Estimation

Hypothesis Testing

Specifying hypotheses:

H0: “null” or no effect hypothesis

H1: research or alternative hypothesis

Note: Only H0 (null) is tested.

Errors in Hypothesis Testing

Decision             Reality: H0 True    Reality: H0 False
Fail to Reject H0    Correct decision    Type II error
Reject H0            Type I error        Correct decision

Hypothesis Testing

In parametric tests, actual parameter values are specified for H0 and H1.

H0: µ ≤ 120

H1: µ > 120

Hypothesis Testing

Another example of explicitly specifying H0 and H1.

H0: = 0

H1: ≠ 0

Hypothesis Testing

General framework:

• Specify null & alternative hypotheses

• Specify test statistic

• State rejection rule (RR)

• Compute test statistic and compare to RR

• State conclusion
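The five steps can be sketched for an upper-tailed test of whether a mean exceeds 120. This is only an illustration: it assumes a z-test with a known standard deviation, and the sample mean, sigma, and n below are hypothetical numbers, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tailed z-test of H0: mu <= mu0 against H1: mu > mu0."""
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Rejection rule: reject H0 when z exceeds the upper-alpha critical value.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return z, z_crit, z > z_crit

# Hypothetical inputs (assumptions for illustration only).
z, z_crit, reject = one_sided_z_test(sample_mean=125.0, mu0=120.0, sigma=10.0, n=25)

# Conclusion step.
print("reject H0" if reject else "fail to reject H0")
```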

Common Statistical Tests

Test Name                Purpose
One-sample (z) t-test    Test value of a mean
Two-sample (z) t-test    Compare two means
Paired t-test            Compare difference in means (compare related means)
ANOVA                    Test for differences in 2 or more means

Common Statistical Tests (cont.)

Test                                  Purpose
Test on binomial proportion(s)        Test whether binomial proportions = 0, or each other
Test on correlation coefficient(s)    Test whether correlation coefficients = 0, or each other
Regression                            Test whether slope = 0
RxC contingency table analysis        Test whether two categorical variables are related

Advanced Topics

Test                                Purpose
Multivariate Tests, e.g., MANOVA    Test value of several parameters simultaneously
Repeated Measures / Crossovers      Test means when subjects repeatedly measured
Survival Analysis                   Estimate and compare survival probabilities for one or more groups
Nonparametric Tests                 Many analogous to standard parametric tests

P-Values

p = Probability of obtaining a result at least this extreme given the null is true.

P-values are probabilities

0 ≤ p ≤ 1

Computed from the distribution of the test statistic
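As a rough illustration of "computed from the distribution of the test statistic", the sketch below turns a z statistic into a two-sided p-value. It assumes the test statistic is standard normal under the null; the function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under a standard normal null, of a result at least as extreme as z."""
    # Tail area beyond |z|, doubled to cover both tails.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)
print(f"p = {p:.4f}")
```

A statistic near 1.96 gives a p-value close to 0.05, which is why that cutoff appears in two-sided tests at the 5% level.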

Epidemiological Concepts

Rate: a proportion, specifically a fraction, where the numerator, c, is included in the denominator:

$$\text{rate} = \frac{c}{c + d}$$

- Useful for comparing groups of unequal size

Example:

$$\text{neonatal mortality rate} = \frac{\text{\# deaths} < 28 \text{ days old}}{\text{\# total live births}}$$

Measures of Morbidity:

Incidence Rate: # new cases occurring during a given time interval divided by population at risk at the beginning of that period.

Prevalence Rate: total # cases at a given time divided by population at risk at that time.
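A small sketch of the two morbidity measures as plain ratios. The function names and the counts are hypothetical, chosen only to show which population goes in each denominator.

```python
def incidence_rate(new_cases, at_risk_at_start):
    """New cases during the interval divided by the population at risk at its start."""
    return new_cases / at_risk_at_start

def prevalence_rate(total_cases, at_risk_now):
    """All existing cases at a given time divided by the population at risk at that time."""
    return total_cases / at_risk_now

# Hypothetical counts (assumptions for illustration only).
incidence = incidence_rate(new_cases=30, at_risk_at_start=10_000)
prevalence = prevalence_rate(total_cases=250, at_risk_now=10_000)

print(f"incidence = {incidence:.4f}, prevalence = {prevalence:.4f}")
```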

Epidemiological Concepts

Most people think in terms of probability (p) of an event as a natural way to quantify the chance an event will occur, so 0 ≤ p ≤ 1

0 = event will certainly not occur

1 = event certain to occur

But there are other ways of quantifying the chances that an event will occur….

Epidemiological Concepts

Odds and Odds Ratio:

For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of an event.

In gambling, we say, the odds are 5 to 2. This corresponds to the single number 5/2 = Odds.

Epidemiological Concepts

$$O = \text{Odds of an event} = \frac{\text{expected \# times an event will occur}}{\text{expected \# times the event will not occur}}$$

The relationship between probability & odds

Epidemiological Concepts

$$O = \frac{p}{1 - p} = \frac{\text{prob of event}}{\text{prob of no event}}$$

$$p = \frac{O}{1 + O}$$

Epidemiological Concepts

Probability Odds

.1 .11

.2 .25

.3 .43

.4 .67

.5 1.00

.6 1.50

.7 2.33

.8 4.00

.9 9.00

Odds < 1 correspond to probabilities < 0.5

0 < Odds < ∞
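The probability and odds columns are linked by O = p / (1 - p) and p = O / (1 + O). A short sketch of both conversions follows; the function names are hypothetical.

```python
def prob_to_odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability of an event with the given odds (requires odds >= 0)."""
    return odds / (1 + odds)

# Tabulate odds for probabilities 0.1 through 0.9.
for tenth in range(1, 10):
    p = tenth / 10
    print(f"{p:.1f}  {prob_to_odds(p):.2f}")
```

Applying `odds_to_prob` to any value from `prob_to_odds` recovers the original probability, up to floating-point rounding.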

Death sentence by race of defendant in 147 trials

         Blacks   Nonblacks   Total
Death      28        22         50
Life       45        52         97
Total      73        74        147

Example 1: Odds Ratio

Odds of death sentence = 50/97 = 0.52

For Blacks: O = 28/45 = 0.62

For Nonblacks: O = 22/52 = 0.42

Ratio of Black Odds to Nonblack Odds = 1.47

This is called the Odds Ratio

Example 2: Odds Ratio

$$OR = \frac{28/45}{22/52} = \frac{28 \times 52}{22 \times 45} = \frac{1456}{990} = 1.47$$
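The odds ratio for the death-sentence table can be checked with the cross-product form. This is a sketch; the function name and argument order are assumptions made for the example.

```python
def odds_ratio(events_1, events_2, nonevents_1, nonevents_2):
    """Odds ratio of group 1 relative to group 2 from the four cells of a 2x2 table."""
    # (events_1 / nonevents_1) / (events_2 / nonevents_2), written as a cross product.
    return (events_1 * nonevents_2) / (events_2 * nonevents_1)

# Cells from the death-sentence table: Blacks are group 1, Nonblacks are group 2.
or_black_vs_nonblack = odds_ratio(events_1=28, events_2=22, nonevents_1=45, nonevents_2=52)
print(f"OR = {or_black_vs_nonblack:.2f}")
```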

Odds ratios are directly related to the parameters of the logit (logistic regression) model.

Logistic Regression is a statistical method that models binary (e.g., Yes/No; T/F; Success/Failure) data as a function of one or more explanatory variables.

We would like a model that predicts the probability of a success, i.e., P(Y=1), using a linear function.

Logistic Regression

Problem: Probabilities are bounded by 0 and 1.

But linear functions are inherently unbounded.

Solution: Transform P(Y=1) = p to an odds, p/(1-p), which removes the upper bound. If we take the log of the odds, the lower bound is also removed.

Setting this result equal to a linear function of the explanatory variables gives us the logit model.

Logistic Regression

Logit or Logistic Regression Model

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}$$

Where $p_i$ is the probability that $y_i = 1$.

The expression on the left is called the logit or log odds.

Probability of success:

$$p_i = P(Y_i = 1) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik})}}$$

Odds Ratio for Each Explanatory Variable:

$$OR \text{ for } X_j = e^{\beta_j}$$
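A minimal sketch of how a fitted logit model turns a linear predictor into a probability, and how each slope maps to an odds ratio. The intercept, slopes, and covariate values below are hypothetical, not estimates from any data in these slides.

```python
from math import exp, log

def success_probability(intercept, slopes, xs):
    """P(Y = 1) under a logit model: inverse logit of the linear predictor."""
    linear = intercept + sum(b * x for b, x in zip(slopes, xs))
    return 1 / (1 + exp(-linear))

def odds_ratios(slopes):
    """Odds ratio per one-unit increase in each explanatory variable: e^beta."""
    return [exp(b) for b in slopes]

# Hypothetical coefficients and covariates (assumptions for illustration only).
intercept, slopes, xs = -1.0, [0.5, 0.0], [2.0, 3.0]

p = success_probability(intercept, slopes, xs)
log_odds = log(p / (1 - p))   # recovers the linear predictor
print(f"p = {p:.3f}, log odds = {log_odds:.3f}, ORs = {odds_ratios(slopes)}")
```

A slope of 0 gives an odds ratio of 1, meaning that variable does not change the odds of success.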

Suppose a new screening test for herpes virus has been developed and the following summary for 1000 individuals has been compiled:

                     Has Herpes   Does Not Have Herpes
Screened Positive        45               10
Screened Negative         5              940

Screening Tests

How do we evaluate the usefulness of such a test?

Diagnostics:

sensitivity

specificity

False positive rate

False negative rate

predictive value positive

predictive value negative

Screening Tests

Generic Screening Test Table

                     With Disease   Without Disease   Total
Screened Positive         a                b           a+b
Screened Negative         c                d           c+d
Total                    a+c              b+d           N

Screening Tests

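$$\text{Sensitivity} = \frac{a}{a + c}$$

$$\text{Specificity} = \frac{d}{b + d}$$

$$\text{False positive rate} = \frac{b}{b + d}$$

$$\text{False negative rate} = \frac{c}{a + c}$$

$$\text{Yield or predictive value (positive)} = \frac{a}{a + b}$$

$$\text{Yield or predictive value (negative)} = \frac{d}{c + d}$$

$$\text{prevalence} = \frac{a + c}{N}$$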

Screening Tests

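$$\text{Sensitivity} = \frac{45}{50} = 90\%$$

$$\text{Specificity} = \frac{940}{950} = 98.95\%$$

$$\text{False positive rate} = \frac{10}{950} = 1.05\%$$

$$\text{False negative rate} = \frac{5}{50} = 10\%$$

$$\text{Yield or predictive value (positive)} = \frac{45}{55} = 81.82\%$$

$$\text{Yield or predictive value (negative)} = \frac{940}{945} = 99.47\%$$

$$\text{prevalence} = \frac{50}{1000} = 5\%$$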
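The diagnostics for the herpes screening table can be recomputed from the four cell counts. This is a sketch using the generic a, b, c, d layout; the function name and dictionary keys are hypothetical.

```python
def screening_metrics(a, b, c, d):
    """Diagnostics from a 2x2 screening table.

    a = screened positive, with disease     b = screened positive, without disease
    c = screened negative, with disease     d = screened negative, without disease
    """
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "false_negative_rate": c / (a + c),
        "predictive_value_positive": a / (a + b),
        "predictive_value_negative": d / (c + d),
        "prevalence": (a + c) / n,
    }

# Cell counts from the herpes screening example.
metrics = screening_metrics(a=45, b=10, c=5, d=940)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Sensitivity and the false negative rate share a denominator and sum to 1, as do specificity and the false positive rate.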

Interval Estimation

Statistics such as the sample mean, median, variance, etc., are called point estimates
- vary from sample to sample
- do not incorporate precision

Interval Estimation

Take as an example the sample mean:

$\bar{X}$ estimates $\mu$ (popn mean)

Or the sample variance:

$S^2$ estimates $\sigma^2$ (popn variance)

Interval Estimation

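Recall Example 1, a one-sample t-test on the population mean. The test statistic was

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

This can be rewritten to yield:

$$P\left(-t_{1-\alpha/2} \le \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \le t_{1-\alpha/2}\right) = 1 - \alpha$$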

Which can be rearranged to give a (1-α)100% Confidence Interval for μ:

$$\bar{x} \pm t_{1-\alpha/2,\, n-1} \, \frac{s}{\sqrt{n}}$$

Form: Estimate ± Multiple of Std Error of the Est.

Interval Estimation

Example 1: Standing SBP

Mean = 140.8, s.d. = 9.5, N = 12

95% CI for μ:

140.8 ± 2.201 (9.5/sqrt(12))

140.8 ± 6.036

(134.8, 146.8)
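The standing SBP interval can be reproduced with a few lines. This sketch takes the critical value 2.201 (t with 11 degrees of freedom) directly from the example rather than computing it; the function name is hypothetical.

```python
from math import sqrt

def t_confidence_interval(mean, sd, n, t_crit):
    """Estimate plus or minus a multiple of the standard error of the mean."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Values from the standing SBP example; t_crit is the tabled value for 11 df.
lower, upper = t_confidence_interval(mean=140.8, sd=9.5, n=12, t_crit=2.201)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```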
