basics of statistics

Chap 9-1

Basic Statistics

Fundamentals of Hypothesis Testing:

One-Sample, Two-Sample Tests


What is Biostatistics?

Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation.

Biostatistics is the application of statistics and mathematical methods to the design and analysis of health, biomedical, and biological studies.


Different Tests of Significance

1. One-sample z-test or t-test
   a. Compares one sample mean with a population mean
2. Two-sample t-test
   a. Compares one sample mean with another sample mean
   b. Independent t-tests (independent samples)
   c. Dependent t-tests (dependent/paired samples)
3. One-way analysis of variance (ANOVA)
   a. Compares several sample means


How to Properly Use Biostatistics

Develop an underlying question of interest
Generate a hypothesis
Design a study (protocol)
Collect data
Analyze data
  Descriptive statistics
  Statistical inference


Relationship between population and sample

(Simple random sampling)


Sampling Techniques

Simple random sample
Systematic sampling
Stratified random sample
Convenience sampling
Cluster sampling

[Diagram: a population feeding the five sampling techniques, each labeled as giving either a bias-free or a biased sample]


Example

How are my 10 patients doing after I put them on an anti-hypertensive medication? Describe the results of your 10 patients.


Example

What is the in-hospital mortality rate after open heart surgery at SAL hospital so far this year? Describe the mortality.

What is the in-hospital mortality after open heart surgery likely to be this year, given results from last year? Estimate the probability of death for patients like those seen in the previous year.


Misuse of statistics

About 25% of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and misuse of statistical methods


What is a Hypothesis?

A hypothesis is a claim (assumption) about a population parameter.

Assessing the difference between the value of a sample statistic and the corresponding hypothesized parameter value is called hypothesis testing.

I claim that the mean CVD in India is at least 3!



Hypothesis Testing Process

Identify the population.
Null hypothesis: assume the population mean age is 50 ($H_0: \mu = 50$).
Take a sample: the sample mean is $\bar{X} = 20$.
Is $\bar{X} = 20$ likely if $\mu = 50$? No, not likely!
REJECT the null hypothesis.


Reason for Rejecting H0

[Figure: sampling distribution of $\bar{X}$, centered at $\mu = 50$ if $H_0$ is true, with the observed $\bar{X} = 20$ far out in the left tail]

It is unlikely that we would get a sample mean of this value ($\bar{X} = 20$) if in fact $\mu = 50$ were the population mean.

Therefore, we reject the null hypothesis that $\mu = 50$.


Components of Biostatistics

Biostatistics
  Descriptive statistics
  Statistical inference
    Estimation: confidence intervals
    Hypothesis testing: p-values


Normal Distribution

A variable is said to be normally distributed or to have a normal distribution if its distribution has the shape of a normal curve.


Normal distribution

bell-shaped

symmetrical about the mean (No skewness)

total area under curve = 1

approximately 68% of distribution is within one standard deviation of the mean

approximately 95% of distribution is within two standard deviations of the mean

approximately 99.7% of distribution is within 3 standard deviations of the mean

Mean = Median = Mode
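The 68%, 95% and 99.7% figures above can be checked numerically. A minimal sketch, assuming only Python's standard `statistics.NormalDist`; the helper name `area_within` is illustrative and not part of the slides.

```python
from statistics import NormalDist


def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

The exact area within two standard deviations is slightly above 95%, which is why two-tailed tests at the 0.05 level use 1.96 rather than 2.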


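Empirical Rule

About 68% of the area lies within 1 standard deviation of the mean
About 95% of the area lies within 2 standard deviations of the mean
About 99.7% of the area lies within 3 standard deviations of the mean

[Figure: normal curve with the horizontal axis marked at 1, 2 and 3 standard deviations on either side of the mean]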


Level of Significance, $\alpha$

Is designated by $\alpha$ (level of significance)
  Typical values are 0.01, 0.05, 0.10
Is selected by the researcher at the beginning
Provides the critical value(s) of the test


The z-Test for Comparing Population Means

Critical values for the standard normal distribution
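The table of critical values did not survive on this slide. As a hedged substitute, the sketch below computes standard normal critical values with Python's standard `statistics.NormalDist`; the function name `z_critical` and its `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha: float, tail: str) -> float:
    """Critical value of the standard normal distribution.

    tail is "left", "right" or "two". For "two" the positive value is
    returned and the rejection region is |Z| > value.
    """
    z = NormalDist()
    if tail == "left":
        return z.inv_cdf(alpha)
    if tail == "right":
        return z.inv_cdf(1 - alpha)
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha, "right")
    two = z_critical(alpha, "two")
    print(f"alpha = {alpha:.2f}: one-tail {one:.3f}, two-tail {two:.3f}")
```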


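Level of Significance and the Rejection Region

$H_0: \mu \ge 3$, $H_1: \mu < 3$: rejection region of area $\alpha$ in the left tail
$H_0: \mu \le 3$, $H_1: \mu > 3$: rejection region of area $\alpha$ in the right tail
$H_0: \mu = 3$, $H_1: \mu \ne 3$: rejection regions of area $\alpha/2$ in each tail

The critical value(s) mark the boundary of the rejection region(s).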



Hypothesis Testing

1. State the research question.
2. State the statistical hypothesis.
3. Set decision rule.
4. Calculate the test statistic.
5. Decide if result is significant.
6. Interpret result as it relates to your research question.


Rejection & Nonrejection Regions

                   Two-tailed test   Left-tailed test   Right-tailed test
Sign in Ha         $\ne$             <                  >
Rejection region   Both sides        Left side          Right side



The Null Hypothesis, H0

States the assumption (numerical) to be tested
  e.g.: The average number of CVD in India is at least three ($H_0: \mu \ge 3$)
Is always about a population parameter ($H_0: \mu \ge 3$), not about a sample statistic ($H_0: \bar{X} \ge 3$)


The Null Hypothesis, H0 (continued)

Begins with the assumption that the null hypothesis is true
  Similar to the notion of innocent until proven guilty


The Alternative Hypothesis, H1

Is the opposite of the null hypothesis
  e.g.: The average number of CVD in India is less than 3 ($H_1: \mu < 3$)
Never contains the "=" sign
May or may not be accepted


General Steps in Hypothesis Testing

e.g.: Test the assumption that the true mean number of CVD in India is at least three ($\sigma$ known)

1. State the H0: $H_0: \mu \ge 3$
2. State the H1: $H_1: \mu < 3$
3. Choose $\alpha$: $\alpha = 0.05$
4. Choose n: $n = 100$
5. Choose test: Z test


General Steps in Hypothesis Testing (continued)

6. Set up critical value(s): reject $H_0$ if $Z < -1.645$
7. Collect data: 100 persons surveyed
8. Compute test statistic and p-value: computed test statistic = -2, p-value = 0.0228
9. Make statistical decision: reject the null hypothesis
10. Express conclusion: the true mean number of CVD is less than 3 in the human population
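The critical value and p-value quoted in this example can be reproduced from the reported test statistic. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the variable names are illustrative, and the sample mean and $\sigma$ behind the statistic are not given in the slides, so the code starts from Z = -2.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = -2.0  # computed test statistic reported on the slide

z = NormalDist()
critical = z.inv_cdf(alpha)  # left-tail critical value, about -1.645
p_value = z.cdf(z_stat)      # P(Z <= -2) for a left-tailed test

reject = z_stat < critical
print(f"critical = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```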


p-Value Approach to Testing

Convert the sample statistic (e.g. $\bar{X}$) to a test statistic (e.g. Z, t or F statistic)
Obtain the p-value from a table or computer
Compare the p-value with $\alpha$
  If p-value $\ge \alpha$, do not reject $H_0$
  If p-value $< \alpha$, reject $H_0$


Comparison of Critical-Value & P-Value Approaches

Critical-Value Approach
Step 1 State the null and alternative hypotheses.
Step 2 Decide on the significance level, $\alpha$.
Step 3 Compute the value of the test statistic.
Step 4 Determine the critical value(s).
Step 5 If the value of the test statistic falls in the rejection region, reject $H_0$; otherwise, do not reject $H_0$.
Step 6 Interpret the result of the hypothesis test.

P-Value Approach
Step 1 State the null and alternative hypotheses.
Step 2 Decide on the significance level, $\alpha$.
Step 3 Compute the value of the test statistic.
Step 4 Determine the P-value.
Step 5 If $P < \alpha$, reject $H_0$; otherwise, do not reject $H_0$.
Step 6 Interpret the result of the hypothesis test.
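The two approaches differ only in steps 4 and 5 and lead to the same decision. The sketch below illustrates this for a Z test, assuming Python's standard `statistics.NormalDist`; the function `decide` and its `tail` argument are hypothetical names, and the inputs -2 and 1.50 are test statistics that appear in this deck's examples.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def decide(z_stat: float, alpha: float, tail: str) -> tuple[bool, bool]:
    """Return (reject by critical value, reject by p-value) for a Z test.

    tail: "left" (H1: <), "right" (H1: >) or "two" (H1: not equal).
    """
    if tail == "left":
        by_critical = z_stat < _Z.inv_cdf(alpha)
        p_value = _Z.cdf(z_stat)
    elif tail == "right":
        by_critical = z_stat > _Z.inv_cdf(1 - alpha)
        p_value = 1 - _Z.cdf(z_stat)
    elif tail == "two":
        by_critical = abs(z_stat) > _Z.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - _Z.cdf(abs(z_stat)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return by_critical, p_value < alpha


print(decide(-2.0, 0.05, "left"))   # both approaches reject
print(decide(1.5, 0.05, "right"))   # neither approach rejects
print(decide(1.5, 0.05, "two"))     # neither approach rejects
```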


Result Probabilities

Jury Trial (H0: Innocent)

             The Truth
Verdict      Innocent    Guilty
Innocent     Correct     Error
Guilty       Error       Correct

Hypothesis Test

                    The Truth
Decision            H0 True                    H0 False
Do Not Reject H0    $1 - \alpha$               Type II Error ($\beta$)
Reject H0           Type I Error ($\alpha$)    Power ($1 - \beta$)


Type I & II Errors Have an Inverse Relationship

If you reduce the probability of one error, the probability of the other increases, provided everything else is unchanged.
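This trade-off can be seen by computing $\beta$ for several values of $\alpha$ while holding everything else fixed. The sketch below is for a right-tailed Z test and rests on assumptions: $\mu_0 = 368$, $\sigma = 15$ and $n = 25$ are borrowed from the cereal example used later in this deck, while the true mean of 375 is a hypothetical value chosen only for illustration. The function name `type_ii_error` is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Beta for a right-tailed Z test of H0: mu <= mu0 when the true mean is mu_true."""
    z = NormalDist()
    # Under the true mean, the Z statistic is normal with this mean and SD 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # H0 is (wrongly) retained whenever Z falls at or below the critical value.
    return z.cdf(z.inv_cdf(1 - alpha) - shift)


# mu_true = 375 is a hypothetical value, not from the slides.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=368, mu_true=375, sigma=15, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering $\alpha$ moves the critical value further into the tail, so more samples from the true distribution fall short of it and $\beta$ grows.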


Critical Values Approach to Testing

Convert the sample statistic (e.g.: $\bar{X}$) to a test statistic (e.g.: Z, t or F statistic)
Obtain critical value(s) for a specified $\alpha$ from a table or computer
  If the test statistic falls in the critical region, reject $H_0$
  Otherwise, do not reject $H_0$


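One-tail Z Test for Mean ($\sigma$ Known)

Assumptions
  Population is normally distributed
  If not normal, requires large samples
  Null hypothesis has $\le$ or $\ge$ sign only

Z test statistic

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$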


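Rejection Region

$H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$: reject $H_0$ in the left tail. Z must be significantly below 0 to reject $H_0$.

$H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$: reject $H_0$ in the right tail. Small values of Z don't contradict $H_0$. Don't reject $H_0$!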


Example: One Tail Test

Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed $\bar{X} = 372.5$. The company has specified $\sigma$ to be 15 grams. Test at the $\alpha = 0.05$ level.

$H_0: \mu \le 368$
$H_1: \mu > 368$


Finding Critical Value: One Tail

What is Z given $\alpha = 0.05$?

Standardized Cumulative Normal Distribution Table (Portion)

Z     .04     .05     .06
1.6   .9495   .9505   .9515
1.7   .9591   .9599   .9608
1.8   .9671   .9678   .9686
1.9   .9738   .9744   .9750

An upper-tail area of $\alpha = .05$ leaves a cumulative area of .95, which falls midway between the table entries for Z = 1.64 and Z = 1.65.

Critical Value = 1.645


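Example Solution: One Tail Test

$H_0: \mu \le 368$
$H_1: \mu > 368$

$\alpha = 0.05$
n = 25
Critical Value: 1.645

Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$$

Decision: Do not reject $H_0$ at $\alpha = .05$ (1.50 < 1.645)

Conclusion: No evidence that the true mean is more than 368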
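The arithmetic of this one-tail example can be reproduced directly. A minimal sketch using the numbers given in the example and Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 372.5, 368.0, 15.0, 25, 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))   # (372.5 - 368) / (15 / 5)
critical = NormalDist().inv_cdf(1 - alpha)   # right-tail critical value
p_value = 1 - NormalDist().cdf(z_stat)       # upper-tail area beyond z_stat

print(f"Z = {z_stat:.2f}, critical = {critical:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z_stat > critical else "do not reject H0")
```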


p-Value Solution

Z value of sample statistic: 1.50
From Z table: look up 1.50 to obtain .9332
Use the alternative hypothesis to find the direction of the rejection region (here, the upper tail).

p-value is $P(Z \ge 1.50) = 1.0000 - .9332 = 0.0668$


p-Value Solution (continued)

(p-value = 0.0668) $\ge$ ($\alpha = 0.05$). Do not reject.

Test statistic 1.50 is in the do-not-reject region (below the critical value 1.645).


Example: Two-Tail Test

Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed $\bar{X} = 372.5$. The company has specified $\sigma$ to be 15 grams. Test at the $\alpha = 0.05$ level.

$H_0: \mu = 368$
$H_1: \mu \ne 368$


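Example Solution: Two-Tail Test

$H_0: \mu = 368$
$H_1: \mu \ne 368$

$\alpha = 0.05$
n = 25
Critical Value: ±1.96 (area .025 in each tail)

Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$$

Decision: Do not reject $H_0$ at $\alpha = .05$ (-1.96 < 1.50 < 1.96)

Conclusion: No evidence that the true mean is not 368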


p-Value Solution

p-value = 2 x 0.0668 = 0.1336

(p-value = 0.1336) $\ge$ ($\alpha = 0.05$). Do not reject.

Test statistic 1.50 is in the do-not-reject region (between -1.96 and 1.96).


Connection to Confidence Intervals

For $\bar{X} = 372.5$, $\sigma = 15$ and $n = 25$, the 95% confidence interval is:

$$372.5 - 1.96 \cdot 15 / \sqrt{25} \le \mu \le 372.5 + 1.96 \cdot 15 / \sqrt{25}$$

or

$$366.62 \le \mu \le 378.38$$

If this interval contains the hypothesized mean (368), we do not reject the null hypothesis. It does. Do not reject.
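The interval and the resulting decision can be computed as follows. A minimal sketch using the example's numbers and Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, mu0 = 372.5, 15.0, 25, 368.0
confidence = 0.95

# Two-sided multiplier: about 1.96 for 95% confidence.
z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_half * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
print("do not reject H0" if lower <= mu0 <= upper else "reject H0")
```

Checking whether 368 lies inside the 95% interval is equivalent to the two-tailed Z test at $\alpha = 0.05$.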


What is a t Test?

Commonly Used Definition: Comparing two means to see if they are significantly different from each other

Technical Definition: Any statistical test that uses the t family of distributions


Independent Samples t Test

Use this test when you want to compare the means of two independent samples on a given variable
  "Independent" means that the members of one sample do not include, and are not matched with, members of the other sample

Example:
  Compare the average height of 50 randomly selected men to that of 50 randomly selected women

[Diagram: independent mean #1 and independent mean #2, compared using a t test]


Dependent Samples t Test

Used to compare the means of a single sample or of two matched or paired samples

Example:
  If a group of students took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test


Comparing the Two t Tests

Independent Samples
  Tests the equality of the means from two independent groups
  Relies on the t distribution to produce the probabilities used to test statistical significance
  [Diagram: treatment group (person #1) compared with control group (person #2)]

Dependent Samples
  Tests the equality of the means between related groups or of two variables within the same group
  Relies on the t distribution to produce the probabilities used to test statistical significance
  [Diagram: person #1 before treatment compared with person #1 after treatment]
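The two tests use the same kind of statistic but different standard errors. The sketch below computes both t statistics with Python's standard `statistics` module. The scores are hypothetical numbers made up for illustration, not data from the slides, and the independent-samples statistic uses the unpooled standard error, which is an assumption about the variant intended.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for illustration only (not from the slides).
before = [10, 12, 14, 16]
after = [12, 13, 17, 18]

# Dependent (paired) samples: work with the within-person differences.
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))  # df = n - 1

# Independent samples: treat the two lists as unrelated groups.
se = sqrt(variance(before) / len(before) + variance(after) / len(after))
t_independent = (mean(after) - mean(before)) / se

print(f"paired t = {t_paired:.3f}, independent t = {t_independent:.3f}")
```

With these made-up numbers the mean difference is the same in both calculations, but the paired statistic is much larger because pairing removes the person-to-person variability from the standard error.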


Types

One sample: compare with population
Unpaired: compare with control
Paired: same subjects, pre-post
Z-test: large samples (n > 30)


Compare Means (or medians)

Example: Compare blood pressures of two or more groups, or compare BP of one group with a theoretical value.

1 Group:
  1. One-sample t test
  2. Wilcoxon signed-rank test
2 Groups:
  1. Unpaired t test
  2. Paired t test
  3. Mann-Whitney test
  4. Welch's corrected t test
  5. Wilcoxon matched pairs test
3-26 Groups (all with post tests):
  1. One-way ANOVA
  2. Repeated measures ANOVA
  3. Kruskal-Wallis test
  4. Friedman test

Raw data
Average data: Mean, SD, & N
Average data: Mean, SEM, & N


Is there a difference?

between you…means,

who is meaner?


Statistical Analysis

[Figure: control group mean and treatment group mean]

Is there a difference?


What does difference mean?

[Figure: three pairs of distributions with medium, high, and low variability]

The mean difference is the same for all three cases



What does difference mean? (continued)

[Figure: three pairs of distributions with medium, high, and low variability]

Which one shows the greatest difference?



t Test: $\sigma$ Unknown

Assumption
  Population is normally distributed
  If not normal, requires a large sample

t test statistic with n - 1 degrees of freedom

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$


Example: One-Tail t Test

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed $\bar{X} = 372.5$ and $s = 15$. Test at the $\alpha = 0.01$ level.

$\sigma$ is not given

$H_0: \mu \le 368$
$H_1: \mu > 368$


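Example Solution: One-Tail

$H_0: \mu \le 368$
$H_1: \mu > 368$

$\alpha = 0.01$
n = 36, df = 35
Critical Value: 2.4377

Test Statistic:

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$$

Decision: Do not reject $H_0$ at $\alpha = .01$ (1.80 < 2.4377)

Conclusion: No evidence that the true mean is more than 368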
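The t statistic in this example can be checked with a few lines of Python. This is a minimal sketch: the standard library has no t distribution, so the critical value 2.4377 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt

x_bar, mu0, s, n = 372.5, 368.0, 15.0, 36
t_critical = 2.4377  # from the slide: alpha = 0.01, df = 35, upper tail

t_stat = (x_bar - mu0) / (s / sqrt(n))  # 4.5 / 2.5
df = n - 1

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print("reject H0" if t_stat > t_critical else "do not reject H0")
```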


The t Table

Since the t distribution takes into account the changing shape of the distribution as n increases, there is a separate curve for each sample size (or degrees of freedom).

However, there is not enough space in the table to put all of the different probabilities corresponding to each possible t score.

The t table lists commonly used critical regions (at popular alpha levels).


Z-distribution versus t-distribution


Summary

We can use the z distribution for testing hypotheses involving one or two independent samples
  To use z, the samples must be independent and normally distributed
  The sample size must be greater than 30
  Population parameters must be known
