Biostatistics course
Part 8
Inferences of a mean
Dr. Sc. Nicolas Padilla Raygoza
Department of Nursing and Obstetrics
Division of Health Sciences and Engineering
Campus Celaya Salvatierra
University of Guanajuato, Mexico
Biosketch
Medical Doctor, Autonomous University of Guadalajara.
Pediatrician, certified by the Mexican Council of Certification on Pediatrics.
Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
Master of Sciences with emphasis in Epidemiology, Atlantic International University.
Doctorate of Sciences with emphasis in Epidemiology, Atlantic International University.
Professor Titular A, full time, University of Guanajuato.
Level 1, National Researcher System.
[email protected]
Competencies
The reader will apply a Z test to make inferences about a mean.
The reader will obtain a confidence interval for a mean.
The reader will apply a t test for a mean in a small sample.
The reader will obtain a confidence interval for a mean in a small sample.
Introduction
If we measure the stature of students of FEOC, we can obtain its mean and standard deviation:
Number of students: 269
Mean of stature: 161.6 cm
Standard deviation: 6.3 cm
Median: 159 cm
Range: 149 to 185 cm
Notation
For population parameters we use Greek letters; for sample statistics we use Roman letters.
Parameter            Population   Sample
Mean                 μ            X̄
Standard deviation   σ            s
Sampling distribution
If we take many samples of the same size from the same population, each sample can have a different mean and standard deviation.
If we plot these sample means, we obtain a sampling distribution.
If the sample size is large, the distribution of the sample means is approximately Normal, even if the data in the population are not Normally distributed.
Sampling distribution (contd…)
Stature (cm) n % Cumulative %
149 2 0.7 0.7
150 3 1.1 1.8
152 6 2.2 4.0
154 12 4.5 8.5
155 27 10.0 18.5
157 29 10.8 29.3
158 26 9.7 39.0
159 33 12.3 51.3
163 37 13.8 65.1
164 16 5.9 71.0
165 24 8.9 79.9
168 18 6.7 86.6
169 14 5.2 91.8
171 6 2.2 94.0
174 7 2.6 96.6
175 1 0.4 97.0
177 4 1.5 98.5
179 2 0.7 99.2
184 1 0.4 99.6
185 1 0.4 100.0
Total 269 100.0
Data from FEOC students. If we take another 999 samples of students, we can plot the distribution of their means.
[Figure: Sampling distribution of the mean for 1000 samples, n = 269. Horizontal axis: means of stature (cm); vertical axis: frequency.]
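The idea of drawing many samples and looking at the distribution of their means can be sketched in Python. Since the full population of statures is not available, this sketch assumes that the 269 observed statures (rebuilt from the frequency table) stand in for the population and resamples from them with replacement. The names `FREQ`, `statures`, `sample_means` and the seed are illustrative choices, not part of the course.

```python
import random
import statistics

# Frequency table of statures (cm) -> number of students, from the FEOC data.
FREQ = {149: 2, 150: 3, 152: 6, 154: 12, 155: 27, 157: 29, 158: 26,
        159: 33, 163: 37, 164: 16, 165: 24, 168: 18, 169: 14, 171: 6,
        174: 7, 175: 1, 177: 4, 179: 2, 184: 1, 185: 1}

# Expand the table into one value per student (269 values).
statures = [cm for cm, n in FREQ.items() for _ in range(n)]


def sample_means(data, n_samples=1000, seed=1):
    """Means of n_samples resamples (with replacement) of len(data) values.

    Assumption: the observed data stand in for the unknown population.
    """
    rng = random.Random(seed)
    size = len(data)
    return [statistics.fmean(rng.choices(data, k=size))
            for _ in range(n_samples)]


means = sample_means(statures)
print(f"mean of sample means: {statistics.fmean(means):.2f} cm")
print(f"SD of sample means:   {statistics.stdev(means):.2f} cm")
```

Under these assumptions, the standard deviation of the simulated means should be close to 6.3/√269 ≈ 0.38 cm, which is the standard error of the mean.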
95% Confidence Interval
Confidence intervals use probability theory to draw conclusions about a population from data obtained from a sample.
It is difficult to study a whole population; because of this, we study samples.
Methods for obtaining estimates and testing hypotheses are important for making inferences.
95% confidence intervals (contd…)
The confidence interval for a mean is calculated as:
$$\bar{X} \pm 1.96 \times SE$$
where $\bar{X}$ is the estimate obtained from the sample, 1.96 is the multiplier of the standard error for 95% confidence, and SE is the standard error.
We should expect the 95% confidence interval around the sample mean to include the population mean 95% of the time, if we were to draw thousands of samples.
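The 95% coverage statement can be checked by simulation. This is a minimal sketch that assumes a Normal population with mean 161.6 cm and standard deviation 6.3 cm (values borrowed from the FEOC sample purely for illustration) and counts how often the interval contains the population mean. The helper name `coverage` and its parameters are hypothetical.

```python
import math
import random
import statistics


def coverage(mu=161.6, sigma=6.3, n=269, n_samples=2000, z=1.96, seed=1):
    """Fraction of simulated 95% CIs that contain the population mean mu.

    Assumption: the population is Normal(mu, sigma); real data need not be.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # standard error
        if mean - z * se <= mu <= mean + z * se:
            hits += 1
    return hits / n_samples


result = coverage()
print(f"coverage: {result:.3f}")  # expected to be close to 0.95
```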
95% confidence intervals (contd…)
We calculate the 95% confidence interval for the first sample of 269 students from FEOC:
$\bar{X} = 161.6$
$SE = 6.3/\sqrt{269} = 0.38$
$95\%\ CI = 161.6 \pm 1.96 \times 0.38 = 161.6 \pm 0.74$, that is, 160.86 to 162.34
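The same calculation can be written as a small Python function. This is a sketch under the Normal approximation used here; the function name `mean_ci` and its arguments are illustrative.

```python
import math


def mean_ci(mean, sd, n, multiplier=1.96):
    """Confidence interval for a mean: mean ± multiplier × SE, SE = sd/√n."""
    se = sd / math.sqrt(n)
    margin = multiplier * se
    return mean - margin, mean + margin, se


low, high, se = mean_ci(161.6, 6.3, 269)
print(f"SE = {se:.2f}; 95% CI = {low:.2f} to {high:.2f}")
```

Because the sketch does not round the standard error to 0.38 before multiplying, its limits differ from the hand calculation in the second decimal place.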
95% confidence intervals (contd…)
We can use other levels of confidence; we only need to change the multiplier of the standard error. For example, for 90% use 1.645, for 95.4% use 2, and for 99% use 2.576.
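Multipliers for other confidence levels can be obtained from the standard Normal distribution instead of being memorized. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `multiplier` is an illustrative choice.

```python
from statistics import NormalDist


def multiplier(confidence):
    """Two-sided standard Normal multiplier for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {multiplier(level):.3f}")
```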
Hypothesis test for a mean
A hypothesis test checks whether our estimate is consistent with a specific value.
Our sample of 269 students had a mean stature of 161.6 cm, with a standard deviation of 6.3 cm and a standard error of 0.38 cm.
A similar study of students from the School of Accounting and Administration (FCA) obtained a mean stature of 167 cm.
How can we determine whether the stature of students from FEOC is equal to or different from that of students from FCA?
Mean of FEOC: 161.6 cm
Mean of FCA: 167 cm
The two values are clearly different.
However, we do not know whether the observed difference is real or due to sampling error, because 161.6 is only one of many estimates we could have obtained.
Hypothesis test for a mean (contd…)
To evaluate whether the observed difference is real, we state two hypotheses.
Null hypothesis: the means of both populations are the same (the first population is students from FEOC and the reference population is students from FCA).
The null hypothesis is written as H0. If the hypothesized mean is μ0 and the mean of the population under study is μ, then the null hypothesis is written as H0: μ = μ0.
Alternative hypothesis: the means of the two populations are not equal. Usually, it is written as H1: μ ≠ μ0.
Hypothesis test for a mean (contd…)
Once the null hypothesis is stated, we calculate the probability of obtaining the observed data if the null hypothesis is true.
To obtain this probability, we calculate a test statistic and compare it with the distribution implied by the null hypothesis.
In many cases this will be the Normal distribution.
Hypothesis test for a mean (contd…)
The general form of the test statistic compares the estimate observed in the sample with the value expected if the null hypothesis is true.
It also takes into account the variability in the population through the standard error.
This test statistic is called Z and is equal to:
$$Z = \frac{\bar{X} - \mu_0}{SE}$$
The test is therefore a standardized difference between $\bar{X}$ and $\mu_0$.
Example
The sample of students from FEOC:
Mean = 161.6 cm
s = 6.3 cm
95% CI = 160.86 to 162.34 cm
Null hypothesis: there is no difference between the mean statures of students from FEOC and FCA.
H0: μ = 167 cm
We use the Z test:
$$Z = \frac{\bar{X} - \mu_0}{SE(\bar{X})} = \frac{161.6 - 167}{0.38} = -14.21$$
A Z value this far from zero is very unlikely if the null hypothesis is true, so we reject H0.
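The Z test of this example can be sketched in Python. The function name `z_test` is illustrative, and the two-sided p value is computed under the Normal model with `math.erfc`, which stays accurate for very small tail areas.

```python
import math


def z_test(sample_mean, mu0, se):
    """Return (z, two-sided p value) for H0: mu = mu0 under a Normal model."""
    z = (sample_mean - mu0) / se
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard Normal Z.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = z_test(161.6, 167, 0.38)  # SE rounded to 0.38, as in the example
print(f"z = {z:.2f}, p < 0.001: {p < 0.001}")
```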
Small samples
If the sample size is small, we use the t distribution.
Its shape depends on the degrees of freedom, a measure of how small the sample is.
The degrees of freedom of a t distribution are equal to the sample size minus 1.
Small samples (contd…)
With fewer degrees of freedom, the t distribution has lower probability near the mean and higher probability in the tails.
As the sample size and the degrees of freedom increase, the t distribution becomes more similar to the Normal distribution.
There are published tables of selected values of the area under the t distribution, which we use when calculating confidence intervals and hypothesis tests.
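How the t distribution approaches the Normal distribution can be seen by listing a few table values. The sketch below hard-codes two-tailed 5% values of t as they appear in standard published tables (the dictionary name `T_CRIT_05` and the selection of degrees of freedom are illustrative) and compares them with the Normal multiplier.

```python
from statistics import NormalDist

# Two-tailed 5% critical values of t, copied from standard published tables.
T_CRIT_05 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228,
             20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980}

z_crit = NormalDist().inv_cdf(0.975)  # Normal multiplier, about 1.96

for df, t_crit in T_CRIT_05.items():
    print(f"df = {df:>3}: t = {t_crit:.3f} "
          f"(excess over Normal: {t_crit - z_crit:.3f})")
```

The excess over the Normal multiplier shrinks as the degrees of freedom grow.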
Small samples (contd…)
When the sample size is small (less than 100), the formulas for the confidence interval and the hypothesis test are:
95% CI: estimate ± multiplier × (standard error), where the estimate is the sample mean and the multiplier is the value of t corresponding to p = 0.05 with degrees of freedom equal to the sample size minus 1.
Hypothesis test: to test H0: μ = μ0 against H1: μ ≠ μ0, we calculate
$$t = \frac{\bar{X} - \mu_0}{SE}$$
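A small-sample calculation might look like the following sketch. The ten statures are hypothetical values invented for illustration, not data from the course, and the multiplier 2.262 is the two-tailed 5% value of t for 9 degrees of freedom taken from published tables.

```python
import math
import statistics

# Hypothetical statures (cm) of 10 students; not data from the course.
sample = [158, 163, 155, 169, 160, 164, 157, 171, 159, 165]
mu0 = 167          # hypothesized mean (cm)
T_CRIT = 2.262     # two-tailed 5% value of t for 9 degrees of freedom

n = len(sample)
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
t = (mean - mu0) / se                          # t statistic, df = n - 1

low, high = mean - T_CRIT * se, mean + T_CRIT * se
print(f"mean = {mean:.1f}, SE = {se:.2f}, t = {t:.2f}")
print(f"95% CI = {low:.1f} to {high:.1f}")
print("reject H0 at 5%" if abs(t) > T_CRIT else "do not reject H0 at 5%")
```

For a different sample size, the multiplier must be looked up again for the new degrees of freedom.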
P values
One or two tails?
We now know that the p value is the probability of obtaining a result at least as extreme as the one found in our sample, if the null hypothesis is true.
But what does extreme mean?
When the alternative hypothesis is H1: μ ≠ μ0, extreme results can occur by chance on either side of the hypothesized mean, μ0.
Because of this, we use the two-tailed tables of the Normal and t distributions.
P values (contd…)
Less commonly, the alternative hypothesis is H1: μ < μ0 or H1: μ > μ0.
Then, extreme values can occur only to the left or only to the right of the hypothesized mean.
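The difference between one-tailed and two-tailed p values can be illustrated with a short sketch for a Z statistic. The function names and the labels 'two-sided', 'less' and 'greater' are illustrative conventions, not part of the course.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def p_value(z, alternative="two-sided"):
    """p value of a Z statistic for H1: 'two-sided', 'less' or 'greater'."""
    if alternative == "two-sided":   # H1: mu != mu0, both tails
        return 2 * normal_cdf(-abs(z))
    if alternative == "less":        # H1: mu < mu0, left tail only
        return normal_cdf(z)
    if alternative == "greater":     # H1: mu > mu0, right tail only
        return normal_cdf(-z)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


for alt in ("two-sided", "less", "greater"):
    print(f"z = 1.96, H1 {alt}: p = {p_value(1.96, alt):.3f}")
```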
How small is a small p value?
Many people use a p value of 0.05 as the cut-off point.
This is an arbitrary value, but it is a sensible one. It means that we are prepared to reject the null hypothesis, when it is true, one time in 20.
Note that when a two-tailed test has a p value less than 0.05, the 95% confidence interval does not include the hypothesized value.
P values (contd…)
If we obtain a p value of 0.048, can we reject the null hypothesis?
If we obtain a p value of 0.052, can we not reject the null hypothesis?
When p values are between 0.03 and 0.07, the exact p value should be reported, because they are on the border of statistical significance.
Showing the results
We should show the results with their confidence intervals.
State clearly what the null and alternative hypotheses are.
Show the p value of each test; it is sufficient to say p < 0.001 when it applies.
Do not misinterpret p values: a small p value leads us to reject the null hypothesis, whereas a large p value only means that we do not reject it.