basics of statistics
DESCRIPTION
TRANSCRIPT
Chap 9-1
Basic Statistics
Fundamentals of Hypothesis Testing:
One-Sample, Two-Sample Tests
Chap 9-2
What is biostatistics
Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation.
Biostatistics is the application of statistics and mathematical methods to the design and analysis of health, biomedical, and biological studies.
Chap 9-3
Different Tests of Significance
1. One-Sample z-test or t-testa. Compares one sample mean versus a population
mean
2. Two-Sample t-testa. Compares one sample mean versus another
sample meana. Independent t-tests (equal samples)b. Dependent t-tests (dependent/paired samples)
3. One-way analysis of variance (ANOVA)a. Comparing several sample means
Chap 9-4
How to properly useBiostatistics
Develop an underlying question of interest
Generate a hypothesis Design a study (Protocol) Collect Data Analyze Data
Descriptive statistics Statistical Inference
Chap 9-5
Relationship between population and sample
(Simple random sampling)
Chap 9-6
Sampling Techniques
Population
Simple RandomSample
Systematic Sampling
Stratified RandomSample
ConvenienceSampling
Cluster Sampling
Bias free sample
Bias freesample
Biasedsample
Bias freesample
Biasedsample
Chap 9-7
Example
How are my 10 patients doing after I put them on an anti-hypertensive medications? Describe the results of your 10 patients
Chap 9-8
Example
What is the in hospital mortality rate after open heart surgery at SAL hospital so far this year Describe the mortality
What is the in hospital mortality after open heart surgery likely to be this year, given results from last year Estimate probability of death for patients
like those seen in the previous year.
Chap 9-9
Misuse of statistics
About 25% of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and misuse of statistical methods
Chap 9-10
What is a Hypothesis?
A hypothesis is a claim (assumption)about the populationparameter Difference
between the value of sample statistic and the corresponding hypothesized parameter value is called hypothesis testing.
I claim that mean CVD in the INDIA is atleast 3!
© 1984-1994 T/Maker Co.
Chap 9-11
Hypothesis Testing Process
Identify the Population
Assume thepopulation
mean age is 50.
( )
REJECT
Take a Sample
Null Hypothesis
No, not likely!X 20 likely if Is ?
0 : 50H
20X
Chap 9-12
Sampling Distribution of
= 50
It is unlikely that we would get a sample mean of this value ...
... Therefore, we reject the
null hypothesis
that m = 50.
Reason for Rejecting H0
20
If H0 is trueX
... if in fact this were the population mean.
X
Chap 9-13
Biostatistics
DescriptiveStatistical Inference
Estimation Hypothesis Testing
Confidence Intervals P-values
Components of Biostatistics
Chap 9-14
A variable is said to be normally distributed or to have a normal distribution if its distribution has the shape of a normal curve.
Normal Distribution
Chap 9-15
Normal distribution bell-shaped
symmetrical about the mean (No skewness)
total area under curve = 1
approximately 68% of distribution is within one standard deviation of the mean
approximately 95% of distribution is within two standard deviations of the mean
approximately 99.7% of distribution is within 3 standard deviations of the mean
Mean = Median = Mode
Chap 9-16
Empirical Rule
About 95% of the area lies within 2 standard
deviations
About 99.7% of the area lies within 3 standard deviations of the mean
68%
About 68% of the area lies within 1 standard deviation of the mean
3 2 32
Chap 9-17
Chap 9-18
Level of Significance,
Is designated by , (level of significance) Typical values are .01, .05, .10
Is selected by the researcher at the beginning
Provides the critical value(s) of the test
Chap 9-19
The z-Test for Comparing Population
MeansCritical values for standard normal
distribution
Chap 9-20
Level of Significance and the Rejection Region
H0: 3
H1: < 30
0
0
H0: 3
H1: > 3
H0: 3
H1: 3
/2
Critical Value(s)
Rejection Regions
I claim that mean CVD in the INDIA is atleast 3!
Chap 9-21
Hypothesis Testing
1. State the research question.2. State the statistical hypothesis.3. Set decision rule.4. Calculate the test statistic.5. Decide if result is significant.6. Interpret result as it relates to your
research question.
Chap 9-22
Rejection & Nonrejection Regions
Two-tailed test Left-tailed test Right-tailed
Sign in Ha < >
Rejection region Both sides Left side Right side=
I claim that mean CVD in the INDIA is atleast 3!
Chap 9-23
The Null Hypothesis, H0
States the assumption (numerical) to be tested e.g.: The average number of CVD in INDIA is
at least three ( ) Is always about a population parameter
( ), not about a sample statistic ( )
0 : 3H
0 : 3H 0 : 3H X
Chap 9-24
The Null Hypothesis, H0
Begins with the assumption that the null hypothesis is true Similar to the notion of innocent until
proven guilty
(continued)
Chap 9-25
The Alternative Hypothesis, H1
Is the opposite of the null hypothesis e.g.: The average number of CVD in
INDIA is less than 3 ( ) Never contains the “=” sign May or may not be accepted
1 : 3H
Chap 9-26
General Steps in Hypothesis Testing
e.g.: Test the assumption that the true mean number of of CVD in INDIA is at least three ( Known)
1. State the H0
2. State the H1
3. Choose
4. Choose n
5. Choose Test
0
1
: 3
: 3
=.05
100
Z
H
H
n
test
Chap 9-27
100 persons surveyed
Computed test stat =-2,p-value = .0228
Reject null hypothesis
The true mean number of CVD is less than 3 in human population.
(continued)
Reject H0
-1.645Z
6. Set up critical value(s)
7. Collect data
8. Compute test statistic and p-value
9. Make statistical decision
10. Express conclusion
General Steps in Hypothesis Testing
Chap 9-28
The z-Test for Comparing Population
MeansCritical values for standard normal
distribution
Chap 9-29
p-Value Approach to Testing Convert Sample Statistic (e.g. ) to
Test Statistic (e.g. Z, t or F –statistic) Obtain the p-value from a table or
computer
Compare the p-value with If p-value , do not reject H0
If p-value , reject H0
X
Chap 9-30
Comparison of Critical-Value & P-Value Approaches
Critical-Value Approach P-Value Approach
Step1 State the null and alternative hypothesis.
Step1 State the null and alternative hypothesis.
Step 2 Decide on the significance level,
Step 2 Decide on the significance level,
Step 3 Compute the value of the test statistic.
Step 3 Compute the value of the test statistic.
Step 4 Determine the critical value(s).
Step 4 Determine the P-value.
Step 5 If the value of the test statistic falls in the rejection region,
reject Ho; otherwise, do not reject
Ho.
Step 5 If P < , reject Ho;
otherwise do not reject Ho.
Step 6 Interpret the result of the hypothesis test.
Step 6 Interpret the result of the hypothesis test.
Chap 9-31
Result ProbabilitiesH0: Innocent
The Truth The Truth
Verdict Innocent Guilty Decision H0 True H0 False
Innocent Correct ErrorDo NotReject
H0
1 - Type IIError ( )
Guilty Error Correct RejectH0
Type IError( )
Power(1 - )
Jury Trial Hypothesis Test
Chap 9-32
Type I & II Errors Have an Inverse Relationship
If you reduce the probability of one error, the other one increases so that everything else is unchanged.
Chap 9-33
Critical Values Approach to Testing
Convert sample statistic (e.g.: ) to test statistic (e.g.: Z, t or F –statistic)
Obtain critical value(s) for a specifiedfrom a table or computer If the test statistic falls in the critical region,
reject H0
Otherwise do not reject H0
X
Chap 9-34
One-tail Z Test for Mean( Known)
Assumptions Population is normally distributed If not normal, requires large samples Null hypothesis has or sign only
Z test statistic
/X
X
X XZ
n
Chap 9-35
Rejection Region
Z0
Reject H0
Z0
Reject H0
H0: 0 H1: < 0
H0: 0 H1: > 0
Z Must Be Significantly Below 0
to reject H0
Small values of Z don’t contradict H0
Don’t Reject H0 !
Chap 9-36
Example: One Tail Test
Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed = 372.5. The company has specified to be 15 grams. Test at the 0.05 level.
368 gm.
H0: 368 H1: > 368
X
Chap 9-37
Finding Critical Value: One Tail
Z .04 .06
1.6 .9495 .9505 .9515
1.7 .9591 .9599 .9608
1.8 .9671 .9678 .9686
.9738 .9750
Z0 1.645
.05
1.9 .9744
Standardized Cumulative Normal Distribution Table
(Portion)
What is Z given = 0.05?
= .05
Critical Value = 1.645
.95
1Z
Chap 9-38
Example Solution: One Tail Test
= 0.5
n = 25
Critical Value: 1.645
Decision:
Conclusion:
Do Not Reject at = .05
No evidence that true mean is more than 368
Z0 1.645
.05
Reject
H0: 368 H1: > 368 1.50
XZ
n
1.50
Chap 9-39
p -Value Solution
Z0 1.50
P-Value =.0668
Z Value of Sample Statistic
From Z Table: Lookup 1.50 to Obtain .9332
Use the alternative hypothesis to find the direction of the rejection region.
1.0000 - .9332 .0668
p-Value is P(Z 1.50) = 0.0668
Chap 9-40
p -Value Solution(continued)
01.50
Z
Reject
(p-Value = 0.0668) ( = 0.05) Do Not Reject.
p Value = 0.0668
= 0.05
Test Statistic 1.50 is in the Do Not Reject Region
1.645
Chap 9-41
Example: Two-Tail Test
Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed = 372.5. The company has specified to be 15 grams. Test at the 0.05 level.
368 gm.
H0: 368
H1: 368
X
Chap 9-42
372.5 3681.50
1525
XZ
n
= 0.05
n = 25
Critical Value: ±1.96
Example Solution: Two-Tail Test
Test Statistic:
Decision:
Conclusion:
Do Not Reject at = .05
No Evidence that True Mean is Not 368Z0 1.96
.025
Reject
-1.96
.025
H0: 368
H1: 368
1.50
Chap 9-43
p-Value Solution
(p Value = 0.1336) ( = 0.05) Do Not Reject.
01.50
Z
Reject
= 0.05
1.96
p Value = 2 x 0.0668
Test Statistic 1.50 is in the Do Not Reject Region
Reject
Chap 9-44
For 372.5, 15 and 25,
the 95% confidence interval is:
372.5 1.96 15 / 25 372.5 1.96 15 / 25
or
366.62 378.38
If this interval contains the hypothesized mean (368),
we do not reject the null hypothesis.
I
X n
t does. Do not reject.
Connection to Confidence Intervals
Chap 9-45
What is a t Test?
Commonly Used Definition: Comparing two means to see if they are significantly different from each other
Technical Definition: Any statistical test that uses the t family of distributions
Chap 9-46
Independent Samples t Test Use this test when you
want to compare the means of two independent samples on a given variable
• “Independent” means that the members of one sample do not include, and are not matched with, members of the other sample
Example:• Compare the average
height of 50 randomly selected men to that of 50 randomly selected women
Compare using Compare using tt test test
Independent Independent MeanMean
#1#1
Independent Independent MeanMean
#2#2
Chap 9-47
Dependent Samples t Test
Used to compare the means of a single sample or of two matched or paired samples
Example: • If a group of students
took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test
Chap 9-48
Comparing the Two t TestsIndependent Samples Tests the equality of the
means from two independent groups (diagram below)
Relies on the t distribution to produce the probabilities used to test statistical significance
Dependent Samples Tests the equality of the means
between related groups or of two variables within the same group (diagram below)
Relies on the t distribution to produce the probabilities used to test statistical significance
Before treatmentBefore treatment
Person Person #1#1
After treatmentAfter treatment
Person Person #1#1
Treatment groupTreatment group
Person Person #1#1
Control groupControl group
Person Person #2#2
Chap 9-49
Types
One sample compare with population
Unpairedcompare with control
Pairedsame subjects: pre-post
Z-test large samples >30
Chap 9-50
Compare Means (or medians)Example: Compare blood presures of two or more groups,
or compare BP of one group with a theoretical value.
1 Group: 1. One Sample t test 2. Wilcoxon rank sum test 2 Groups: 1. Unpaired t test 2. Paired t test 3. Mann-Whitney t test 4. Welch’s corrected t test 5. Wilcoxon matched pairs test
Chap 9-51
3-26 Groups: 1. One-way ANOVA 2. Repeated measures ANOVA 3. Kruskal-Wallis test 4. Friedman test
(All with post tests) Raw data Average data Mean, SD, & NAverage data Mean, SEM, & N
Chap 9-52
Is there a difference?
between you…means,
who is meaner?
Chap 9-53
Statistical Analysis
controlgroupmean
treatmentgroupmean
Is there a difference?
Slide downloaded from the Internet
Chap 9-54
What does difference mean?
mediumvariability
highvariability
lowvariability
The mean differenceis the same for all
three cases
Slide downloaded from the Internet
Chap 9-55
What does difference mean?
mediumvariability
highvariability
lowvariability
Which one showsthe greatestdifference?
Slide downloaded from the Internet
Chap 9-56
t Test: Unknown
Assumption Population is normally distributed If not normal, requires a large sample
T test statistic with n-1 degrees of freedom
/
XtS n
Chap 9-57
Example: One-Tail t Test
Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed X = 372.5, ands 15. Test at the 0.01 level.
368 gm.
H0: 368 H1: 368
is not given
Chap 9-58
Example Solution: One-Tail
= 0.01
n = 36, df = 35
Critical Value: 2.4377
Test Statistic:
Decision:
Conclusion:
Do Not Reject at = .01
No evidence that true mean is more than 368t35
0 2.4377
.01
Reject
H0: 368 H1: 368
372.5 3681.80
1536
Xt
Sn
1.80
Chap 9-59
The t Table
Since it takes into account the changing shape of the distribution as n increases, there is a separate curve for each sample size (or degrees of freedom).
However, there is not enough space in the table to put all of the different probabilities corresponding to each possible t score.
The t table lists commonly used critical regions (at popular alpha levels).
Chap 9-60
Z-distribution versus t-distribution
Chap 9-61
The z-Test for Comparing Population
MeansCritical values for standard normal
distribution
Chap 9-62
Summary
We can use the z distribution for testing hypotheses involving one or two independent samples To use z, the samples are independent
and normally distributed The sample size must be greater than 30 Population parameters must be known
Chap 9-63