Biostatistics course Part 8 Inferences of a mean Dr. Sc Nicolas Padilla Raygoza Department of Nursing and Obstetrics Division of Health Sciences and Engineering Campus Celaya Salvatierra University of Guanajuato Mexico

Upload: gervais-baker

Post on 12-Jan-2016



Biostatistics course
Part 8

Inferences of a mean

Dr. Sc Nicolas Padilla Raygoza
Department of Nursing and Obstetrics
Division of Health Sciences and Engineering
Campus Celaya Salvatierra
University of Guanajuato, Mexico


Biosketch

Medical Doctor, Autonomous University of Guadalajara.
Pediatrician, certified by the Mexican Council of Certification on Pediatrics.
Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
Master of Science with emphasis in Epidemiology, Atlantic International University.
Doctor of Science with emphasis in Epidemiology, Atlantic International University.
Full Professor (Titular A), full time, University of Guanajuato.
National System of Researchers, Level 1.
[email protected]


Competencies

The reader will apply a Z test to make inferences about a mean.

The reader will obtain a confidence interval for a mean.

The reader will apply a t test for a mean in a small sample.

The reader will obtain a confidence interval for a mean in a small sample.


Introduction

If we measure the stature of the students of FEOC, we can obtain its mean and standard deviation:

Number of students: 269
Mean stature: 161.6 cm
Standard deviation: 6.3 cm
Median: 159 cm
Range: 149 to 185 cm


Notation

For population parameters we use Greek letters; for the corresponding sample statistics we use Roman letters.

Parameter            Population   Sample
Mean                 μ            X̄
Standard deviation   σ            s


Sampling distribution

If we take many samples of the same size from the same population, each sample can have a different mean and standard deviation.

If we plot these sample means, we obtain a sampling distribution.

If the sample size is large, the distribution of the sample means is approximately Normal, even when the distribution of the data in the population is not Normal.
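The behaviour described above can be checked with a small simulation. The sketch below is only an illustration: the skewed population (exponential, mean near 10), the sample size of 200 and the number of samples are all assumptions chosen for the example, not data from the course.

```python
import random
import statistics

# Hypothetical, strongly right-skewed population (exponential, mean near 10).
rng = random.Random(0)
population = [rng.expovariate(1 / 10) for _ in range(20_000)]

def sample_means(population, n, n_samples, rng):
    """Means of n_samples random samples (with replacement) of size n."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(n_samples)]

n = 200
means = sample_means(population, n, n_samples=1000, rng=rng)

pop_mean = statistics.fmean(population)
se = statistics.pstdev(population) / n ** 0.5  # theoretical standard error
# Share of sample means lying within 1.96 standard errors of the population mean.
within = sum(abs(m - pop_mean) <= 1.96 * se for m in means) / len(means)

print(f"population mean          : {pop_mean:.2f}")
print(f"mean of the sample means : {statistics.fmean(means):.2f}")
print(f"SD of the sample means   : {statistics.stdev(means):.2f}")
print(f"sigma / sqrt(n)          : {se:.2f}")
print(f"share within 1.96 SE     : {within:.3f}")
```

If the sample means are roughly Normal, their standard deviation should be close to sigma / sqrt(n) and about 95% of them should fall within 1.96 standard errors of the population mean, even though the population itself is far from Normal.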


Sampling distribution (contd…)

Stature (cm)     n       %    Cumulative %
149              2     0.7      0.7
150              3     1.1      1.8
152              6     2.2      4.0
154             12     4.5      8.5
155             27    10.0     18.5
157             29    10.8     29.3
158             26     9.7     39.0
159             33    12.3     51.3
163             37    13.8     65.1
164             16     5.9     71.0
165             24     8.9     79.9
168             18     6.7     86.6
169             14     5.2     91.8
171              6     2.2     94.0
174              7     2.6     96.6
175              1     0.4     97.0
177              4     1.5     98.5
179              2     0.7     99.2
184              1     0.4     99.6
185              1     0.4    100.0
Total          269   100.0

Data from FEOC students. If we took another 999 samples of students, we could plot the distribution of their means.
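As a cross-check, the summary statistics quoted in the Introduction can be recomputed from the frequency table above. This is a minimal sketch using only the counts shown in the table; the variable names are illustrative.

```python
import statistics

# Frequency table from the slide: stature (cm) -> number of students.
freq = {
    149: 2, 150: 3, 152: 6, 154: 12, 155: 27, 157: 29, 158: 26,
    159: 33, 163: 37, 164: 16, 165: 24, 168: 18, 169: 14, 171: 6,
    174: 7, 175: 1, 177: 4, 179: 2, 184: 1, 185: 1,
}

# Expand the table into one value per student.
values = [stature for stature, count in freq.items() for _ in range(count)]

n = len(values)
mean = statistics.fmean(values)
sd = statistics.stdev(values)      # sample standard deviation (n - 1 denominator)
median = statistics.median(values)

print(f"n = {n}, mean = {mean:.1f} cm, SD = {sd:.1f} cm, "
      f"median = {median} cm, range = {min(values)} to {max(values)} cm")
```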

[Figure: Sampling distribution of 1000 samples, n = 269. Horizontal axis: means of stature (cm); vertical axis: frequency.]


95% Confidence Interval

Statistical inference uses probability theory to draw conclusions about a population from data obtained from a sample.

It is difficult to study a whole population; for this reason we study samples.

Methods for obtaining estimates and testing hypotheses are the tools we use to make inferences.


95% confidence intervals (contd…)

The confidence interval for a mean is calculated as:

$$\bar{X} \pm 1.96 \times \mathrm{SE}$$

where $\bar{X}$ is the estimate obtained from the sample, 1.96 is the number of standard errors (the multiplier) for 95% confidence, and SE is the standard error.

If we drew thousands of samples, we would expect the 95% confidence interval around each sample mean to include the population mean in 95% of them.
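The long-run interpretation of the interval can be illustrated by simulation. The sketch below assumes, purely for illustration, a Normal population with mean 161.6 cm and standard deviation 6.3 cm (the sample values, treated here as if they were the true parameters) and draws 1000 samples of 269.

```python
import random
import statistics

def ci95(sample):
    """95% confidence interval for the mean: sample mean +/- 1.96 * SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * se, mean + 1.96 * se

def coverage(mu, sigma, n, n_samples, seed=0):
    """Share of simulated 95% intervals that contain the population mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = ci95(sample)
        hits += low <= mu <= high
    return hits / n_samples

# Assumed population parameters (illustrative only).
share = coverage(mu=161.6, sigma=6.3, n=269, n_samples=1000)
print(f"share of intervals containing the population mean: {share:.3f}")
```

The printed share should be close to 0.95; it will not be exactly 0.95 because the number of simulated samples is finite.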


95% confidence intervals (contd…)

We calculate the 95% confidence interval for the first sample of 269 students from FEOC:

X̄ = 161.6 cm

SE = 6.3/√269 = 0.38

95% CI = 161.6 ± 1.96 (0.38) = 161.6 ± 0.74 = 160.86 to 162.34
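The same arithmetic can be written as a short function. This is a sketch; the function name `mean_ci` is an illustrative choice. Because the code does not round the standard error to 0.38 before multiplying, its limits can differ from the hand calculation in the second decimal place.

```python
import math

def mean_ci(mean, sd, n, multiplier=1.96):
    """Return (SE, lower limit, upper limit) of a confidence interval for a mean."""
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = multiplier * se
    return se, mean - half_width, mean + half_width

se, low, high = mean_ci(mean=161.6, sd=6.3, n=269)
print(f"SE = {se:.2f}, 95% CI = {low:.2f} to {high:.2f}")
```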


95% confidence intervals (contd…)

We can build confidence intervals for other levels of confidence; we only need to change the multiplier of the standard error. For example, for 90% use 1.645, for 95.4% use 2, and for 99% use 2.576 (a multiplier of 3 corresponds to about 99.7%).
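The multiplier for any confidence level is a quantile of the standard Normal distribution, so it can be computed rather than memorised. A minimal sketch using `statistics.NormalDist` from the Python standard library (the helper name `z_multiplier` is illustrative):

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard Normal multiplier for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.954, 0.99, 0.997):
    print(f"{level:.1%} confidence -> multiplier {z_multiplier(level):.3f}")
```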


Hypothesis test for a mean

A hypothesis test checks whether our estimate is compatible with a specific value.

Our sample of 269 students had a mean stature of 161.6 cm, with a standard deviation of 6.3 cm and a standard error of 0.38 cm.

A similar study of students from the School of Accounting and Administration (FCA) obtained a mean stature of 167 cm.

How can we determine whether the stature of students from FEOC is equal to or different from that of students from FCA?

Mean of FEOC: 161.6 cm
Mean of FCA: 167 cm

The two means are obviously different. But we do not know whether the observed difference is real or due to sampling error, because 161.6 is only one of the many estimates we could have obtained.


Hypothesis test for a mean (contd…)

To evaluate whether the observed difference is real, we state two hypotheses.

Null hypothesis: the means of both populations are the same (the study population is students from FEOC and the reference population is students from FCA). It is written H0. If the hypothesized mean is μ0 and the mean of the population under study is μ, the null hypothesis is written H0: μ = μ0.

Alternative hypothesis: the means of the two populations are not equal. It is usually written H1: μ ≠ μ0.


Hypothesis test for a mean (contd…)

Once the null hypothesis is stated, we calculate the probability of obtaining data at least as extreme as those observed if the null hypothesis is true.

To obtain this probability, we calculate a test statistic and compare it with the distribution implied by the null hypothesis.

In many cases this is the Normal distribution.


Hypothesis test for a mean (contd…)

The general form of a test statistic compares the estimate observed in the sample with the value expected if the null hypothesis is true.

It also takes into account the variability in the population, through the standard error.

This test statistic is called Z:

$$Z = \frac{\bar{X} - \mu_0}{\mathrm{SE}}$$

The test is therefore a standardized difference between $\bar{X}$ and $\mu_0$.


Example

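The sample of students from FEOC:

Mean = 161.6 cm
s = 6.3 cm
95% CI = 160.86 to 162.34 cm

Null hypothesis: there is no difference between the mean statures of students from FEOC and FCA.

H0: μ = 167 cm

We use the Z test:

$$z = \frac{\bar{X} - \mu_0}{\mathrm{SE}(\bar{X})} = \frac{161.6 - 167}{0.38} = -14.21$$

A value this far from zero corresponds to p < 0.001, so the observed difference is very unlikely to be due to sampling error alone.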
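The calculation in this example can be sketched in code as follows. The function name `z_test` is illustrative. The code keeps the standard error unrounded, so it gives z of about -14.06 rather than the -14.21 obtained with SE rounded to 0.38; the conclusion is the same.

```python
import math

def z_test(sample_mean, mu0, sd, n):
    """Return (z, two-sided p value) for H0: mu = mu0, using the Normal distribution."""
    se = sd / math.sqrt(n)
    z = (sample_mean - mu0) / se
    # erfc keeps precision in the far tails, where 1 - cdf would round to 0.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

z, p = z_test(sample_mean=161.6, mu0=167, sd=6.3, n=269)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```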


Small samples

If the sample size is small, we use the t distribution.

Its shape depends on the degrees of freedom, which reflect how small the sample is.

The degrees of freedom of a t distribution are equal to the sample size minus 1.


Small samples

The fewer the degrees of freedom, the lower the probability of values close to the mean and the higher the probability of values in the tails.

That is, t distributions with few degrees of freedom have smaller probabilities near the mean and larger probabilities in the tails than the Normal distribution.

However, as the sample size and the degrees of freedom increase, the t distribution becomes more similar to the Normal distribution.

There are published tables of selected values of the area under the t distribution, which we use when calculating confidence intervals and hypothesis tests.
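The heavier tails of the t distribution can be seen by comparing densities directly. The sketch below evaluates the standard density formula of Student's t at the centre (0) and in the tail (3) for a few illustrative degrees of freedom, next to the standard Normal density; the function names are assumptions made for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard Normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (2, 5, 30, 200):
    print(f"df = {df:>3}: density at 0 = {t_pdf(0, df):.4f}, "
          f"density at 3 = {t_pdf(3, df):.4f}")
print(f"Normal  : density at 0 = {normal_pdf(0):.4f}, "
      f"density at 3 = {normal_pdf(3):.4f}")
```

With few degrees of freedom the density at 0 is lower and the density at 3 is higher than for the Normal distribution; as the degrees of freedom grow, both values approach the Normal ones.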


Small samples

When the sample size is small (less than 100), the formulas for the confidence interval and the hypothesis test are:

95% CI:
Estimate ± multiplier × (standard error)
The estimate is the sample mean.
The multiplier is the value of t corresponding to p = 0.05 with degrees of freedom equal to the sample size minus 1.

Hypothesis test:
To test H0: μ = μ0 against H1: μ ≠ μ0

$$t = \frac{\bar{X} - \mu_0}{\mathrm{SE}}$$
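The two formulas above can be sketched together for a small sample. The ten statures below are invented for illustration and are not data from the course. The critical values are two-sided 5% points copied from a standard published t table for a few degrees of freedom; the Python standard library has no t distribution, so a lookup table stands in for it here.

```python
import math
import statistics

# Two-sided 5% critical values of t from a standard published table (df -> t).
T_CRIT_05 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}

def t_summary(sample, mu0):
    """Return (mean, SE, t statistic, 95% CI) for a small sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (mean - mu0) / se
    t_crit = T_CRIT_05[n - 1]       # degrees of freedom = n - 1
    return mean, se, t_stat, (mean - t_crit * se, mean + t_crit * se)

# Hypothetical statures (cm) of 10 students, invented for illustration only.
sample = [158, 160, 162, 165, 159, 163, 161, 164, 157, 161]
mean, se, t_stat, (low, high) = t_summary(sample, mu0=167)

verdict = "reject" if abs(t_stat) > T_CRIT_05[len(sample) - 1] else "do not reject"
print(f"mean = {mean:.1f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"95% CI = {low:.2f} to {high:.2f}; {verdict} H0 at the 5% level")
```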


P values

One or two tails?

We now know that the p value is the probability of obtaining a result at least as extreme as the one found in our sample, if the null hypothesis is true.

But what does "extreme" mean? When the alternative hypothesis is H1: μ ≠ μ0, extreme results can occur by chance on either side of the hypothesized mean, μ0. For this reason we use two-tailed tables of the Normal and t distributions.


P values

On less common occasions the alternative hypothesis is one-sided: H1: μ < μ0 or H1: μ > μ0.

Then extreme values can occur only to the left, or only to the right, of the hypothesized mean.
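The difference between one-tailed and two-tailed p values can be made concrete with the Normal distribution. In this sketch the function names and the labels "two-sided", "less" and "greater" are illustrative choices, and z = -1.96 is used only as a convenient example value.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard Normal distribution at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def p_value(z, alternative="two-sided"):
    """p value of a Z statistic for the chosen alternative hypothesis."""
    if alternative == "two-sided":   # H1: mu != mu0, both tails count
        return math.erfc(abs(z) / math.sqrt(2))
    if alternative == "less":        # H1: mu < mu0, left tail only
        return normal_cdf(z)
    if alternative == "greater":     # H1: mu > mu0, right tail only
        return 0.5 * math.erfc(z / math.sqrt(2))
    raise ValueError(f"unknown alternative: {alternative!r}")

z = -1.96
for alt in ("two-sided", "less", "greater"):
    print(f"z = {z}, alternative = {alt:>9}: p = {p_value(z, alt):.3f}")
```

For a negative z, the left-tailed p value is half the two-tailed one, while the right-tailed p value is close to 1, because the result lies in the direction opposite to that alternative.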

How small is a small p value? Many people use a p value of 0.05 as the cut-off point. This value is arbitrary, but sensible: it means we are prepared to reject the null hypothesis, when it is in fact true, about one time in 20.

Note that when a two-sided test gives a p value less than 0.05, the 95% confidence interval does not include the hypothesized value.
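The link between the p value and the confidence interval can be checked numerically. The sketch below uses the FEOC sample mean (161.6) and standard error (0.38) from the earlier slides; apart from 167, the hypothesized values are invented to show cases on both sides of the cut-off.

```python
import math

def z_test_and_ci(sample_mean, mu0, se):
    """Return the two-sided p value for H0: mu = mu0 and the 95% CI."""
    z = (sample_mean - mu0) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (sample_mean - 1.96 * se, sample_mean + 1.96 * se)
    return p, ci

# 167 comes from the slides; the other hypothesized means are illustrative.
for mu0 in (167, 162.5, 162.0, 161.0, 160.5):
    p, (low, high) = z_test_and_ci(161.6, mu0, 0.38)
    inside = low <= mu0 <= high
    print(f"mu0 = {mu0:>5}: p = {p:.3g}, "
          f"95% CI {low:.2f} to {high:.2f} "
          f"{'includes' if inside else 'excludes'} mu0")
```

Whenever p is below 0.05 the interval excludes the hypothesized value, and whenever p is above 0.05 the interval includes it.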


P values

If we obtain a p value of 0.048, can we reject the null hypothesis?

If we obtain a p value of 0.052, can we not reject it?

When p values fall between 0.03 and 0.07, report the exact p value, because it lies on the border of statistical significance.


Showing the results

Results should be shown with their confidence intervals.

State clearly what the null and alternative hypotheses are.

Show the p value of each test; it is sufficient to report p < 0.001 when that applies.

Do not misinterpret p values:
A small p value leads us to reject the null hypothesis.
A large p value only means we fail to reject the null hypothesis; it does not show that it is true.

