Introduction to Biostatistics-145 Lectures4

Lectures of Stat-145 (Biostatistics). Text book: Biostatistics: Basic Concepts and Methodology for the Health Sciences, by Wayne W. Daniel. Prepared by: Sana A. Abunasrah

Upload: hanim-rahim

Post on 17-Feb-2016


Page 1: Introduction to Biostatistics-145 Lectures4

Lectures of Stat-145 (Biostatistics)

Text book: Biostatistics: Basic Concepts and Methodology for the Health Sciences

By Wayne W. Daniel

Prepared by: Sana A. Abunasrah

Page 2: Introduction to Biostatistics-145 Lectures4

Chapter 1: Introduction to Biostatistics

Page 3: Introduction to Biostatistics-145 Lectures4

Key words: statistics, data, biostatistics, variable, population, sample

Page 4: Introduction to Biostatistics-145 Lectures4

Introduction: Some Basic Concepts

Statistics is a field of study concerned with:

1- the collection, organization, summarization and analysis of data;

2- the drawing of inferences about a body of data when only a part of the data is observed.

Statisticians try to interpret and communicate the results to others.

Page 5: Introduction to Biostatistics-145 Lectures4

* Biostatistics:
The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, etc.
When the data analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.

Page 6: Introduction to Biostatistics-145 Lectures4

* Data:
• The raw material of statistics is data.
• We may define data as figures. Figures result from the process of counting or from taking a measurement.
• For example:
- When a hospital administrator counts the number of patients (counting).
- When a nurse weighs a patient (measurement).

Page 7: Introduction to Biostatistics-145 Lectures4

* Sources of Data:

We search for suitable data to serve as the raw material for our investigation.

Such data are available from one or more of the following sources:

1- Routinely kept records. For example:
- Hospital medical records contain immense amounts of information on patients.
- Hospital accounting records contain a wealth of data on the facility's business activities.

Page 8: Introduction to Biostatistics-145 Lectures4

2- External sources.
The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature; that is, someone else has already asked the same question.

Page 9: Introduction to Biostatistics-145 Lectures4

3- Surveys.
The source may be a survey, if the data needed can be obtained by asking people certain questions.
For example: If the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information.

Page 10: Introduction to Biostatistics-145 Lectures4

4- Experiments.
Frequently the data needed to answer a question are available only as the result of an experiment.
For example: If a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients.

Page 11: Introduction to Biostatistics-145 Lectures4

* A variable:
It is a characteristic that takes on different values in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.

Page 12: Introduction to Biostatistics-145 Lectures4

Types of variables: Quantitative and Qualitative

Quantitative Variables
A quantitative variable can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.

Qualitative Variables
Many characteristics are not capable of being measured. Some of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.

Page 13: Introduction to Biostatistics-145 Lectures4

Types of quantitative variables: Discrete and Continuous

A discrete variable is characterized by gaps or interruptions in the values that it can assume.
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child in an elementary school.

A continuous variable can assume any value within a specified relevant interval of values assumed by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.

Page 14: Introduction to Biostatistics-145 Lectures4

* A population:
It is the largest collection of values of a random variable for which we have an interest at a particular time.
For example: The weights of all the children enrolled in a certain elementary school.
Populations may be finite or infinite.

Page 15: Introduction to Biostatistics-145 Lectures4

* A sample:
It is a part of a population.
For example: The weights of only a fraction of these children.

Page 16: Introduction to Biostatistics-145 Lectures4

Exercises
• Question (6) – Page 17
• Question (7) – Page 17 (Situation A, Situation B)

Page 17: Introduction to Biostatistics-145 Lectures4

Chapter (2): Strategies for Understanding the Meanings of Data

Pages (19 – 27)

Page 18: Introduction to Biostatistics-145 Lectures4

Key words: frequency table, bar chart, range, width of interval, mid-interval, histogram, polygon

Page 19: Introduction to Biostatistics-145 Lectures4

Descriptive Statistics
Frequency Distribution for Discrete Random Variables

Example: Suppose that we take a sample of size 16 from children in a primary school and get the following data about the number of their decayed teeth:
3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1

To construct a frequency table:
1- Order the values from the smallest to the largest:
0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5
2- Count how many numbers are the same.

No. of decayed teeth    Frequency    Relative Frequency
0                       1            0.0625
1                       2            0.125
2                       4            0.25
3                       5            0.3125
4                       2            0.125
5                       2            0.125
Total                   16           1
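The two steps above (order the values, then count repeats) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is an assumption made for this example and does not come from the textbook.

```python
from collections import Counter

# Number of decayed teeth for the 16 sampled children (data from the slide).
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]

def frequency_table(values):
    """Return (value, frequency, relative frequency) rows in ascending order."""
    counts = Counter(values)          # step 2: count how many numbers are the same
    n = len(values)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]  # step 1: order

table = frequency_table(teeth)
for value, freq, rel in table:
    print(f"{value:>5} {freq:>10} {rel:>10.4f}")
print(f"Total {sum(f for _, f, _ in table):>10} {sum(r for _, _, r in table):>10.4f}")
```

The frequencies always add up to the sample size, and the relative frequencies add up to 1.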

Page 20: Introduction to Biostatistics-145 Lectures4

Representing the simple frequency table using the bar chart

We can represent the above simple frequency table using the bar chart.

[Figure: bar chart with frequency on the vertical axis (0 to 6) and number of decayed teeth on the horizontal axis (0 to 5); the bar heights are the frequencies 1, 2, 4, 5, 2, 2.]

Page 21: Introduction to Biostatistics-145 Lectures4

2.3 Frequency Distribution for Continuous Random Variables

For large samples, we can't use the simple frequency table to represent the data.

We need to divide the data into groups or intervals or classes.

So, we need to determine:

1- The number of intervals (k).
Too few intervals are not good because information will be lost.
Too many intervals are not helpful to summarize the data.
A commonly followed rule is that 6 ≤ k ≤ 15, or the following formula may be used:
k = 1 + 3.322 (log n), where log is the base-10 logarithm and n is the number of observations.

Page 22: Introduction to Biostatistics-145 Lectures4

2- The range (R).
It is the difference between the largest and the smallest observation in the data set.

3- The width of the interval (w).
Class intervals generally should be of the same width. Thus, if we want k intervals, then w is chosen such that w ≥ R / k.

Page 23: Introduction to Biostatistics-145 Lectures4

Example:
Assume that the number of observations equals 100; then
k = 1 + 3.322 (log 100) = 1 + 3.322 (2) = 7.6 ≈ 8.
Assume that the smallest value = 5 and the largest one of the data = 61; then
R = 61 – 5 = 56 and w = 56 / 8 = 7.
To make the summarization more comprehensible, the class width may be 5 or 10 or a multiple of 10.
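The arithmetic in this example can be checked with a short Python sketch. The helper names `number_of_intervals` and `interval_width` are assumptions made for illustration, and rounding k up to the next whole number is one reasonable reading of the "≈ 8" on the slide.

```python
import math

def number_of_intervals(n):
    """k = 1 + 3.322 * log10(n), rounded up to a whole number of intervals."""
    return math.ceil(1 + 3.322 * math.log10(n))

def interval_width(smallest, largest, k):
    """Smallest width satisfying w >= R / k, where R = largest - smallest."""
    return (largest - smallest) / k

k = number_of_intervals(100)      # 1 + 3.322 * 2 = 7.644, rounded up
w = interval_width(5, 61, k)      # R = 56
print(k, w)
```

In practice the computed width is then rounded to a convenient value such as 5 or 10, as the slide suggests.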

Page 24: Introduction to Biostatistics-145 Lectures4

Example 2.3.1
We wish to know how many class intervals to have in the frequency distribution of the data in Table 1.4.1 (pages 9-10) of the ages of 189 subjects who participated in a study on smoking cessation.

Solution: Since the number of observations equals 189,
k = 1 + 3.322 (log 189) = 1 + 3.322 (2.276) ≈ 9,
R = 82 – 30 = 52 and w = 52 / 9 = 5.778.

It is better to let w = 10; then the intervals will be in the form:

Page 25: Introduction to Biostatistics-145 Lectures4

Class interval    Frequency
30 – 39           11
40 – 49           46
50 – 59           70
60 – 69           45
70 – 79           16
80 – 89           1
Total             189

Sum of frequencies = sample size = n

Page 26: Introduction to Biostatistics-145 Lectures4

The Cumulative Frequency:
It can be computed by adding successive frequencies.

The Cumulative Relative Frequency:
It can be computed by adding successive relative frequencies.

The Mid-interval:
It can be computed by adding the lower bound of the interval to its upper bound and then dividing by 2.

Page 27: Introduction to Biostatistics-145 Lectures4

For the above example, the following table represents the cumulative frequency, the relative frequency, the cumulative relative frequency and the mid-interval. Entries marked "-" are left blank on the slide.

Class       Mid-        Frequency   Cumulative   Relative          Cumulative Relative
interval    interval    Freq (f)    Frequency    Frequency (R.f)   Frequency
30 – 39     34.5        11          11           0.0582            0.0582
40 – 49     44.5        46          57           0.2434            -
50 – 59     54.5        -           127          -                 0.6720
60 – 69     -           45          -            0.2381            0.9101
70 – 79     74.5        16          188          0.0847            0.9948
80 – 89     84.5        1           189          0.0053            1
Total                   189                      1

R.f = freq / n
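The columns of this table follow mechanically from the class intervals and their frequencies, so a short Python sketch can reproduce every entry, including the ones left blank. The variable names below are assumptions chosen for this illustration.

```python
from itertools import accumulate

# Class intervals and frequencies from the grouped table of 189 ages.
intervals = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [11, 46, 70, 45, 16, 1]

n = sum(freqs)                                        # sample size
midpoints = [(lo + hi) / 2 for lo, hi in intervals]   # (lower + upper) / 2
cum_freqs = list(accumulate(freqs))                   # running total of frequencies
rel_freqs = [f / n for f in freqs]                    # R.f = freq / n
cum_rel_freqs = [c / n for c in cum_freqs]            # cumulative frequency / n

for (lo, hi), mid, f, cf, rf, crf in zip(
    intervals, midpoints, freqs, cum_freqs, rel_freqs, cum_rel_freqs
):
    print(f"{lo} - {hi}  {mid:5.1f}  {f:3d}  {cf:3d}  {rf:.4f}  {crf:.4f}")
```

Dividing the cumulative counts by n avoids accumulating rounding error, so the fifth cumulative relative frequency prints as 0.9947 rather than the 0.9948 obtained by adding the rounded relative frequencies.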

Page 28: Introduction to Biostatistics-145 Lectures4

Example: From the above frequency table, complete the table, then answer the following questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40-69 years?
3- Relative frequency of subjects with age between 70-79 years?
4- Relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40-49 years?

Page 29: Introduction to Biostatistics-145 Lectures4

6- The percentage of subjects with age less than 60 years?
7- The range (R)?
8- Number of intervals (k)?
9- The width of the interval (w)?

Page 30: Introduction to Biostatistics-145 Lectures4

Representing the grouped frequency table using the histogram

To draw the histogram, the true class limits should be used. They can be computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit for each interval.

True class limits    Frequency
29.5 – < 39.5        11
39.5 – < 49.5        46
49.5 – < 59.5        70
59.5 – < 69.5        45
69.5 – < 79.5        16
79.5 – < 89.5        1
Total                189

[Figure: histogram with frequency on the vertical axis (0 to 80) and the horizontal axis marked at the mid-intervals 34.5, 44.5, 54.5, 64.5, 74.5, 84.5.]

Page 31: Introduction to Biostatistics-145 Lectures4

Representing the grouped frequency table using the polygon

[Figure: frequency polygon with frequency on the vertical axis (0 to 80) and the horizontal axis marked at the mid-intervals 34.5, 44.5, 54.5, 64.5, 74.5, 84.5.]

Page 32: Introduction to Biostatistics-145 Lectures4

Exercises
Pages: 31 – 34
Questions: 2.3.2(a), 2.3.5(a)
H.W.: 2.3.6, 2.3.7(a)

Page 33: Introduction to Biostatistics-145 Lectures4

Section (2.4): Descriptive Statistics

Measures of Central Tendency

Pages 38 - 41

Page 34: Introduction to Biostatistics-145 Lectures4

Key words: descriptive statistic, measure of central tendency, statistic, parameter, mean (μ), median, mode.

Page 35: Introduction to Biostatistics-145 Lectures4

The Statistic and The Parameter

• A Statistic:
It is a descriptive measure computed from the data of a sample.

• A Parameter:
It is a descriptive measure computed from the data of a population.

Since it is difficult to measure a parameter from the population, a sample of size n is drawn, whose values are $x_1, x_2, \ldots, x_n$. From these data, we measure the statistic.

Page 36: Introduction to Biostatistics-145 Lectures4

Measures of Central Tendency

A measure of central tendency is a measure which indicates where the middle of the data is.

The three most commonly used measures of central tendency are: the Mean, the Median, and the Mode.

The Mean:
It is the average of the data.

Page 37: Introduction to Biostatistics-145 Lectures4

The Population Mean:
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$, which is usually unknown, so we use the sample mean to estimate or approximate it.

The Sample Mean:
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Example: Here is a random sample of size 10 of ages, where $x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$, $x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.

$\bar{x} = (42 + 28 + \ldots + 37) / 10 = 36.6$

Page 38: Introduction to Biostatistics-145 Lectures4

Properties of the Mean:
• Uniqueness. For a given set of data there is one and only one mean.
• Simplicity. It is easy to understand and to compute.
• Affected by extreme values, since all values enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.
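The effect of an extreme value can be seen by computing the mean of both data sets, next to the middle ordered value (the median) for comparison. This is a minimal sketch using Python's standard `statistics` module; the variable names are assumptions made for the illustration.

```python
from statistics import mean, median

typical = [115, 110, 119, 117, 121, 126]   # no extreme values
with_outlier = [75, 75, 80, 80, 280]       # 280 is an extreme value

# Both sets have the same mean, but only the first is well described by it.
print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```

In the second set the mean is pulled toward 280, while the middle value stays at 80, close to most of the data.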

Page 39: Introduction to Biostatistics-145 Lectures4

The Median:
When ordering the data, it is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it.
* If n is odd, the median will be the middle observation. It will be the (n+1)/2 th ordered observation.
When n = 11, the median is the 6th observation.
* If n is even, there are two middle observations. The median will be the mean of these two middle observations. It will be the (n+1)/2 th ordered observation.
When n = 12, the median is the 6.5th observation, which is an observation halfway between the 6th and 7th ordered observations.

Page 40: Introduction to Biostatistics-145 Lectures4

Example:
For the same random sample, the ordered observations will be: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the 5.5th observation, i.e. (32 + 34)/2 = 33.

Properties of the Median:
• Uniqueness. For a given set of data there is one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as the mean is.

Page 41: Introduction to Biostatistics-145 Lectures4

The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example: For the same random sample, the value 28 is repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative data.
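The three measures of central tendency for the sample of 10 ages can be checked with Python's standard `statistics` module. This is a sketch for verification only; `multimode` is used because it returns every most-frequent value, which matches the remark that the mode is sometimes not unique.

```python
from statistics import mean, median, multimode

# Random sample of 10 ages from the slides.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

sample_mean = mean(ages)         # sum of the values divided by n
sample_median = median(ages)     # mean of the 5th and 6th ordered values, since n is even
sample_modes = multimode(ages)   # list of all values tied for the highest frequency

print(sample_mean, sample_median, sample_modes)
```

Note that when all values are different, `multimode` returns every value, whereas the slide's convention is to say that there is no mode.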

Page 42: Introduction to Biostatistics-145 Lectures4

Section (2.5): Descriptive Statistics

Measures of Dispersion

Pages 43 - 46

Page 43: Introduction to Biostatistics-145 Lectures4

Key words: descriptive statistic, measure of dispersion, range, variance, coefficient of variation.

Page 44: Introduction to Biostatistics-145 Lectures4

2.5. Descriptive Statistics – Measures of Dispersion:

• A measure of dispersion conveys information regarding the amount of variability present in a set of data.
• Note:
1. If all the values are the same → there is no dispersion.
2. If all the values are different → there is dispersion:
a) If the values are close to each other → the amount of dispersion is small.
b) If the values are widely scattered → the dispersion is greater.

Page 45: Introduction to Biostatistics-145 Lectures4

Ex. Figure 2.5.1 – Page 43

• ** Measures of dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).

Page 46: Introduction to Biostatistics-145 Lectures4

1. The Range (R):
• Range = Largest value − Smallest value = $x_L - x_S$
• Note: the range depends on only two values.
• Example 2.5.1 Page 40:
• Refer to Ex 2.4.2 Page 37.
• Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the range.
• Range = 66 − 38 = 28

Page 47: Introduction to Biostatistics-145 Lectures4

2. The Variance:
• It measures dispersion relative to the scatter of the values about their mean.
a) Sample Variance ($S^2$):
• $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
• Example 2.5.2 Page 40:
• Refer to Ex 2.4.2 Page 37.
• Find the sample variance of the ages, where $\bar{x} = 56$.
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \ldots + (50-56)^2] / 9$
• $= 810 / 9 = 90$
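The range and the sample variance of the ten ages can be verified step by step in Python. This is a sketch following the formula with n − 1 in the denominator; the variable names are assumptions, and `statistics.variance` is the standard-library function for the sample variance.

```python
from statistics import mean, variance

# Ages from Example 2.4.2, reused in Examples 2.5.1 and 2.5.2.
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

data_range = max(ages) - min(ages)                        # largest minus smallest value
x_bar = mean(ages)                                        # sample mean
squared_deviations = sum((x - x_bar) ** 2 for x in ages)  # numerator of S^2
s2 = squared_deviations / (len(ages) - 1)                 # divide by n - 1

print(data_range, x_bar, squared_deviations, s2)
print(variance(ages))                                     # library check of S^2
```

The sum of squared deviations is 810, and dividing by n − 1 = 9 gives the sample variance of 90.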

Page 48: Introduction to Biostatistics-145 Lectures4

b) Population Variance ($\sigma^2$):
• $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean.

3. The Standard Deviation:
• It is the square root of the variance $= \sqrt{\text{Variance}}$.
a) Sample standard deviation $= S = \sqrt{S^2}$
b) Population standard deviation $= \sigma = \sqrt{\sigma^2}$

Page 49: Introduction to Biostatistics-145 Lectures4

4. The Coefficient of Variation (C.V):

• It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement.

• $C.V = \dfrac{S}{\bar{X}} (100)$

• where S: sample standard deviation.

• $\bar{X}$: sample mean.

Page 50: Introduction to Biostatistics-145 Lectures4

Example 2.5.3 Page 46:

• Suppose two samples of human males yield the following data:

                      Sample 1        Sample 2
Age                   25-year-olds    11-year-olds
Mean weight           145 pounds      80 pounds
Standard deviation    10 pounds       10 pounds

Page 51: Introduction to Biostatistics-145 Lectures4

• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10/145) × 100 = 6.9
• C.V (Sample 2) = (10/80) × 100 = 12.5
• Then the weights of the 11-year-olds (Sample 2) show more variation.
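The comparison above can be reproduced with a one-line function. This is a minimal sketch; the function name `coefficient_of_variation` is an assumption made for the illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """C.V = (S / mean) * 100, a unit-free measure of relative dispersion."""
    return std_dev / mean * 100

cv_sample1 = coefficient_of_variation(10, 145)   # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)    # 11-year-olds

print(round(cv_sample1, 1), round(cv_sample2, 1))
```

Both samples have the same standard deviation (10 pounds), so the difference in C.V comes entirely from the different means.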

Page 52: Introduction to Biostatistics-145 Lectures4

Exercises

• Pages: 52 – 53
• Questions: 2.5.1, 2.5.2, 2.5.3
• H.W.: 2.5.4, 2.5.5, 2.5.6, 2.5.14
• * You can also solve the review questions on page 57:
• Q: 12, 13, 14, 15, 16, 19

Page 53: Introduction to Biostatistics-145 Lectures4

Chapter 3: Probability

The Basis of Statistical Inference

Page 54: Introduction to Biostatistics-145 Lectures4

Key words: probability, objective probability, subjective probability, equally likely, mutually exclusive, multiplicative rule, conditional probability, independent events, Bayes theorem

Page 55: Introduction to Biostatistics-145 Lectures4

3.1 Introduction

The concept of probability is frequently encountered in everyday communication. For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95 percent certain that a patient has a particular disease.

Most people express probabilities in terms of percentages.

But it is more convenient to express probabilities as fractions. Thus, we may measure the probability of the occurrence of some event by a number between 0 and 1.

The more likely the event, the closer the number is to one. An event that can't occur has a probability of zero, and an event that is certain to occur has a probability of one.

Page 56: Introduction to Biostatistics-145 Lectures4

3.2 Two Views of Probability: Objective and Subjective

*** Objective Probability
** Classical and Relative
Some definitions:
1. Equally likely outcomes: the outcomes that have the same chance of occurring.
2. Mutually exclusive: two events are said to be mutually exclusive if they cannot occur simultaneously, such that A ∩ B = Φ.

Page 57: Introduction to Biostatistics-145 Lectures4

The universal set (S): the set of all possible outcomes.
The empty set Φ: contains no elements.
The event E: a set of outcomes in S which has a certain characteristic.
Classical Probability: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait, E, the probability of the occurrence of event E is equal to m / N.
For example: in the rolling of a die, each of the six sides is equally likely to be observed. So, the probability that a 4 will be observed is equal to 1/6.
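The classical definition m / N can be written directly as a count of favourable outcomes over all outcomes. This is a small sketch assuming equally likely outcomes; the function name `classical_probability` is chosen only for this example.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return m / N: outcomes in the event over all equally likely outcomes."""
    m = len([outcome for outcome in sample_space if outcome in event])
    return Fraction(m, len(sample_space))

die = {1, 2, 3, 4, 5, 6}                       # universal set S for one roll
p_four = classical_probability({4}, die)       # event E: a 4 is observed
p_even = classical_probability({2, 4, 6}, die)

print(p_four, p_even)
```

Using `Fraction` keeps the result as an exact fraction, in line with the slide's remark that probabilities are conveniently expressed as fractions.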

Page 58: Introduction to Biostatistics-145 Lectures4

Relative Frequency Probability:
Def: If some process is repeated a large number of times, n, and if some resulting event E occurs m times, the relative frequency of occurrence of E, m/n, will be approximately equal to the probability of E: P(E) = m/n.

*** Subjective Probability:
Probability measures the confidence that a particular individual has in the truth of a particular proposition.
For example: the probability that a cure for cancer will be discovered within the next 10 years.

Page 59: Introduction to Biostatistics-145 Lectures4

3.3 Elementary Properties of Probability:

Given some process (or experiment) with n mutually exclusive events $E_1, E_2, E_3, \ldots, E_n$, then

1- $P(E_i) \ge 0$, i = 1, 2, 3, ..., n
2- $P(E_1) + P(E_2) + \cdots + P(E_n) = 1$
3- $P(E_i + E_j) = P(E_i) + P(E_j)$, where $E_i$, $E_j$ are mutually exclusive

Page 60: Introduction to Biostatistics-145 Lectures4

Rules of Probability

1- Addition Rule
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

2- If A and B are mutually exclusive (disjoint), then P(A ∩ B) = 0, and the addition rule is
P(A ∪ B) = P(A) + P(B)

3- Complementary Rule
P(A') = 1 – P(A), where A' (also written $\bar{A}$) is the complement event.

Consider Example 3.4.1, page 63.

Page 61: Introduction to Biostatistics-145 Lectures4

Table 3.4.1 in Example 3.4.1

Family history of mood disorders   Early ≤ 18 (E)   Later > 18 (L)   Total
Negative (A)                             28               35            63
Bipolar disorder (B)                     19               38            57
Unipolar (C)                             41               44            85
Unipolar and bipolar (D)                 53               60           113
Total                                   141              177           318

Page 62: Introduction to Biostatistics-145 Lectures4

Answer the following questions:

Suppose we pick a person at random from this sample.

1- What is the probability that this person will be 18 years old or younger?
2- What is the probability that this person has a family history of mood disorders Unipolar (C)?
3- What is the probability that this person has no family history of mood disorders Unipolar ($\bar{C}$)?
4- What is the probability that this person is 18 years old or younger or has no family history of mood disorders, Negative (A)?
5- What is the probability that this person is more than 18 years old and has a family history of mood disorders Unipolar and Bipolar (D)?
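As a rough check on the five questions above, the counts in Table 3.4.1 can be turned into probabilities directly. The sketch below is illustrative only: the dictionary layout and the helper names (`p_age`, `p_history`, `p_joint`) are assumptions made for this example and do not come from the textbook.

```python
from fractions import Fraction

# Counts transcribed from Table 3.4.1.
# Rows: family history (A, B, C, D); columns: age at onset (E = early, L = later).
counts = {
    "A": {"E": 28, "L": 35},   # Negative
    "B": {"E": 19, "L": 38},   # Bipolar disorder
    "C": {"E": 41, "L": 44},   # Unipolar
    "D": {"E": 53, "L": 60},   # Unipolar and bipolar
}
total = sum(sum(row.values()) for row in counts.values())  # 318 subjects


def p_age(age):
    """Marginal probability of an age group (column total / grand total)."""
    return Fraction(sum(row[age] for row in counts.values()), total)


def p_history(history):
    """Marginal probability of a family-history category (row total / grand total)."""
    return Fraction(sum(counts[history].values()), total)


def p_joint(history, age):
    """Joint probability of one cell of the table."""
    return Fraction(counts[history][age], total)


q1 = p_age("E")                                        # 18 or younger
q2 = p_history("C")                                    # Unipolar
q3 = 1 - p_history("C")                                # complementary rule
q4 = p_age("E") + p_history("A") - p_joint("A", "E")   # addition rule
q5 = p_joint("D", "L")                                 # joint probability

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4), ("Q5", q5)]:
    print(f"{name}: {value} = {float(value):.4f}")
```

Exact fractions are used so that the addition rule in question 4 can be compared against the table counts without rounding error.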

Page 63: Introduction to Biostatistics-145 Lectures4

Conditional Probability:

P(A|B) is the probability of A assuming that B has happened.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$$

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) \neq 0$$

Page 64: Introduction to Biostatistics-145 Lectures4

Example 3.4.2, page 64

From the previous Example 3.4.1 (page 63), answer the following:

Suppose we pick a person at random and find he is 18 years or younger (E). What is the probability that this person will be one who has no family history of mood disorders (A)?

Suppose we pick a person at random and find he has a family history of mood disorders (D). What is the probability that this person will be 18 years or younger (E)?
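One way to work these two questions, assuming the counts of Table 3.4.1 (28 subjects in both A and E, 53 in both D and E, 141 in E, and 113 in D out of 318), is to apply the conditional probability formula directly:

$$P(A \mid E) = \frac{P(A \cap E)}{P(E)} = \frac{28/318}{141/318} = \frac{28}{141} \approx 0.1986$$

$$P(E \mid D) = \frac{P(E \cap D)}{P(D)} = \frac{53/318}{113/318} = \frac{53}{113} \approx 0.4690$$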

Page 65: Introduction to Biostatistics-145 Lectures4

Calculating a Joint Probability:

Example 3.4.3, page 64

Suppose we pick a person at random from the 318 subjects. Find the probability that he will be Early (E) and will have no family history of mood disorders (A).

Page 66: Introduction to Biostatistics-145 Lectures4

Multiplicative Rule:

P(A ∩ B) = P(A|B) P(B)
P(A ∩ B) = P(B|A) P(A)

where
P(A): marginal probability of A.
P(B): marginal probability of B.
P(B|A): the conditional probability of B given A.

Page 67: Introduction to Biostatistics-145 Lectures4

Example 3.4.4, page 65

From the previous Example 3.4.1 (page 63), we wish to compute the joint probability of Early age at onset (E) and a negative family history of mood disorders (A) from a knowledge of an appropriate marginal probability and an appropriate conditional probability.
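A possible worked version, taking P(E) as the marginal probability and P(A|E) as the conditional probability (both read from the counts of Table 3.4.1), is:

$$P(E \cap A) = P(E)\,P(A \mid E) = \frac{141}{318} \cdot \frac{28}{141} = \frac{28}{318} \approx 0.0881$$

The result agrees with reading the joint count 28 directly from the table and dividing by 318.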

Exercise: Example 3.4.5, page 66
Exercise: Example 3.4.6, page 67

Page 68: Introduction to Biostatistics-145 Lectures4

Independent Events:

If A has no effect on B, we say that A and B are independent events. Then,

1- P(A ∩ B) = P(A) P(B)
2- P(A|B) = P(A)
3- P(B|A) = P(B)

Page 69: Introduction to Biostatistics-145 Lectures4

Example 3.4.7, page 68

In a certain high school class consisting of 60 girls and 40 boys, it is observed that 24 girls and 16 boys wear eyeglasses. If a student is picked at random from this class, the probability that the student wears eyeglasses, P(E), is 40/100 or 0.4.

What is the probability that a student picked at random wears eyeglasses given that the student is a boy?

What is the probability of the joint occurrence of the events of wearing eyeglasses and being a boy?
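A short worked sketch of both questions, writing B for the event "the student is a boy" (the symbol B is an assumption made here for illustration):

$$P(E \mid B) = \frac{P(E \cap B)}{P(B)} = \frac{16/100}{40/100} = 0.4 = P(E)$$

$$P(E \cap B) = P(B)\,P(E \mid B) = (0.4)(0.4) = 0.16$$

Because P(E|B) equals P(E), wearing eyeglasses and being a boy are independent in this class, and the joint probability matches the direct count 16/100.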

Page 70: Introduction to Biostatistics-145 Lectures4

Example 3.4.8, page 69

Suppose that of 1200 admissions to a general hospital during a certain period of time, 750 are private admissions. If we designate these as a set A, then compute P(A) and P($\bar{A}$).

Exercise: Example 3.4.9, page 76

Page 71: Introduction to Biostatistics-145 Lectures4

Marginal Probability:

Definition: Given some variable that can be broken down into m categories designated by $A_1, A_2, \ldots, A_i, \ldots, A_m$ and another jointly occurring variable that is broken down into n categories designated by $B_1, B_2, \ldots, B_j, \ldots, B_n$, the marginal probability of $A_i$, $P(A_i)$, is equal to the sum of the joint probabilities of $A_i$ with all the categories of B. That is,

$$P(A_i) = \sum_{j} P(A_i \cap B_j), \quad \text{for all values of } j$$

Example 3.4.9, page 76

Use the data of Table 3.4.1 and the rule of marginal probabilities to calculate P(E).
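Applied to Example 3.4.9, and assuming the joint counts of the Early (E) column of Table 3.4.1, the marginal probability can be worked as follows:

$$P(E) = P(E \cap A) + P(E \cap B) + P(E \cap C) + P(E \cap D) = \frac{28}{318} + \frac{19}{318} + \frac{41}{318} + \frac{53}{318} = \frac{141}{318} \approx 0.4434$$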

Page 72: Introduction to Biostatistics-145 Lectures4

Exercise:

Pages 76-77
Questions: 3.4.1, 3.4.3, 3.4.4
H.W.: 3.4.5, 3.4.7

Page 73: Introduction to Biostatistics-145 Lectures4

Bayes' Theorem

Pages 79-83

Page 74: Introduction to Biostatistics-145 Lectures4

Definition.1

The sensitivity of the symptom

This is the probability of a positive result given that the subject has the disease. It is denoted by P(T|D).

Definition.2

The specificity of the symptom

This is the probability of a negative result given that the subject does not have the disease. It is denoted by $P(\bar{T} \mid \bar{D})$.

Page 75: Introduction to Biostatistics-145 Lectures4

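Definition.3

The predictive value positive of the symptom

This is the probability that a subject has the disease given that the subject has a positive screening test result. It is calculated using Bayes' theorem through the following formula

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$$

where

$$P(T \mid \bar{D}) = 1 - P(\bar{T} \mid \bar{D}), \qquad P(\bar{D}) = 1 - P(D)$$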

Page 76: Introduction to Biostatistics-145 Lectures4

Definition.4

The predictive value negative of the symptom

This is the probability that a subject does not have the disease given that the subject has a negative screening test result. It is calculated using Bayes' theorem through the following formula

$$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D})\,P(\bar{D})}{P(\bar{T} \mid \bar{D})\,P(\bar{D}) + P(\bar{T} \mid D)\,P(D)}$$

where

$$P(\bar{T} \mid D) = 1 - P(T \mid D)$$

Page 77: Introduction to Biostatistics-145 Lectures4


Example 3.5.1 page 82

A medical research team wished to evaluate a proposed screening test for Alzheimer’s disease. The test was given to a random sample of 450 patients with Alzheimer’s disease and an independent random sample of 500 patients without symptoms of the disease. The two samples were drawn from populations of subjects who were 65 years or older. The results are as follows.

Test result             Yes (D)   No ($\bar{D}$)   Total
Positive (T)              436          5            441
Negative ($\bar{T}$)       14        495            509
Total                     450        500            950

Page 78: Introduction to Biostatistics-145 Lectures4

In the context of this example:

a) What is a false positive?
A false positive is when the test indicates a positive result (T) when the person does not have the disease ($\bar{D}$).

b) What is a false negative?
A false negative is when a test indicates a negative result ($\bar{T}$) when the person has the disease (D).

c) Compute the sensitivity of the symptom.

$$P(T \mid D) = \frac{436}{450} = 0.9689$$

d) Compute the specificity of the symptom.

$$P(\bar{T} \mid \bar{D}) = \frac{495}{500} = 0.99$$

Page 79: Introduction to Biostatistics-145 Lectures4

e) Suppose it is known that the rate of the disease in the general population is 11.3%. What are the predictive value positive of the symptom and the predictive value negative of the symptom?

The predictive value positive of the symptom is calculated as

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})} = \frac{(0.9689)(0.113)}{(0.9689)(0.113) + (0.01)(1 - 0.113)} = 0.925$$

The predictive value negative of the symptom is calculated as

$$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D})\,P(\bar{D})}{P(\bar{T} \mid \bar{D})\,P(\bar{D}) + P(\bar{T} \mid D)\,P(D)} = \frac{(0.99)(0.887)}{(0.99)(0.887) + (0.0311)(0.113)} = 0.996$$
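The calculations of Example 3.5.1 can be sketched in a few lines of code. The function name `predictive_values` and its argument names are assumptions chosen for this illustration; the counts and the 11.3% disease rate are those given in the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (predictive value positive, predictive value negative) via Bayes' theorem."""
    p_d = prevalence                      # P(D)
    p_not_d = 1 - prevalence              # P(not D)
    false_pos_rate = 1 - specificity      # P(T | not D)
    false_neg_rate = 1 - sensitivity      # P(not T | D)
    pv_pos = sensitivity * p_d / (sensitivity * p_d + false_pos_rate * p_not_d)
    pv_neg = specificity * p_not_d / (specificity * p_not_d + false_neg_rate * p_d)
    return pv_pos, pv_neg


# Counts from the table of Example 3.5.1.
sensitivity = 436 / 450    # P(T | D)
specificity = 495 / 500    # P(not T | not D)

pv_pos, pv_neg = predictive_values(sensitivity, specificity, prevalence=0.113)
print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.2f}")
print(f"PV+ = {pv_pos:.3f}, PV- = {pv_neg:.3f}")
```

Note that the predictive values depend on the assumed disease rate, not only on the test itself; rerunning the function with a different `prevalence` shows how strongly the predictive value positive changes.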

Page 80: Introduction to Biostatistics-145 Lectures4

Exercise:

Page 83
Questions: 3.5.1, 3.5.2
H.W.: Page 87: Q4, Q5, Q7, Q9, Q21

Page 81: Introduction to Biostatistics-145 Lectures4

Chapter 4: Probabilistic Features of Certain Data Distributions

Pages 93-111

Page 82: Introduction to Biostatistics-145 Lectures4

Key words

Probability distribution, random variable, Bernoulli distribution, binomial distribution, Poisson distribution

Page 83: Introduction to Biostatistics-145 Lectures4


The Random Variable (X):

When the values of a variable (height, weight, or age) can’t be predicted in advance, the variable is called a random variable.

An example is the adult height.

When a child is born, we can’t predict exactly his or her height at maturity.

Page 84: Introduction to Biostatistics-145 Lectures4

4.2 Probability Distributions for Discrete Random Variables

Definition: The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities.

Page 85: Introduction to Biostatistics-145 Lectures4

The Cumulative Probability Distribution of X, F(x):

It shows the probability that the variable X is less than or equal to a certain value, P(X ≤ x).

Page 86: Introduction to Biostatistics-145 Lectures4

Example 4.2.1, page 94:

Number of programs (x)   Frequency   P(X = x)   F(x) = P(X ≤ x)
1                            62       0.2088        0.2088
2                            47       0.1582        0.3670
3                            39       0.1313        0.4983
4                            39       0.1313        0.6296
5                            58       0.1953        0.8249
6                            37       0.1246        0.9495
7                             4       0.0135        0.9630
8                            11       0.0370        1.0000
Total                       297       1.0000

Page 87: Introduction to Biostatistics-145 Lectures4

See Figure 4.2.1, page 96
See Figure 4.2.2, page 97

Properties of the probability distribution of a discrete random variable:

1. $0 \le P(X = x) \le 1$
2. $\sum P(X = x) = 1$
3. $P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$
4. $P(X < b) = P(X \le b - 1)$

Page 88: Introduction to Biostatistics-145 Lectures4

Example 4.2.2, page 96 (use the table in Example 4.2.1):

What is the probability that a randomly selected family will be one who used three assistance programs?

Example 4.2.3, page 96 (use the table in Example 4.2.1):

What is the probability that a randomly selected family used either one or two programs?

Page 89: Introduction to Biostatistics-145 Lectures4

Example 4.2.4, page 98 (use the table in Example 4.2.1):

What is the probability that a family picked at random will be one who used two or fewer assistance programs?

Example 4.2.5, page 98 (use the table in Example 4.2.1):

What is the probability that a randomly selected family will be one who used fewer than four programs?

Example 4.2.6, page 98 (use the table in Example 4.2.1):

What is the probability that a randomly selected family used five or more programs?

Page 90: Introduction to Biostatistics-145 Lectures4


Example 4.2.7 page 98: (use table in example 4.2.1)

What is the probability that a randomly selected family is one who used between three and five programs, inclusive?
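Examples 4.2.2 through 4.2.7 all reduce to sums over the table of Example 4.2.1. The following sketch shows one way to compute them from the frequencies; the names `freq`, `pmf`, and `cdf` are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Frequencies from the table in Example 4.2.1:
# number of assistance programs -> number of families.
freq = {1: 62, 2: 47, 3: 39, 4: 39, 5: 58, 6: 37, 7: 4, 8: 11}
n_families = sum(freq.values())  # 297


def pmf(x):
    """P(X = x): relative frequency of families using exactly x programs."""
    return Fraction(freq.get(x, 0), n_families)


def cdf(x):
    """F(x) = P(X <= x): cumulative probability."""
    return sum((pmf(k) for k in freq if k <= x), Fraction(0))


answers = {
    "4.2.2  P(X = 3)": pmf(3),
    "4.2.3  P(X = 1 or X = 2)": pmf(1) + pmf(2),
    "4.2.4  P(X <= 2)": cdf(2),
    "4.2.5  P(X < 4) = P(X <= 3)": cdf(3),
    "4.2.6  P(X >= 5) = 1 - P(X <= 4)": 1 - cdf(4),
    "4.2.7  P(3 <= X <= 5) = F(5) - F(2)": cdf(5) - cdf(2),
}
for label, p in answers.items():
    print(f"{label}: {p} = {float(p):.4f}")
```

The last two entries use the complement and the cumulative-difference properties of a discrete distribution rather than summing individual probabilities.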

Page 91: Introduction to Biostatistics-145 Lectures4

4.3 The Binomial Distribution:

The binomial distribution is one of the most widely encountered probability distributions in applied statistics. It is derived from a process known as a Bernoulli trial.

Bernoulli trial: When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial.

Page 92: Introduction to Biostatistics-145 Lectures4

The Bernoulli Process

A sequence of Bernoulli trials forms a Bernoulli process under the following conditions:

1- Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure.

2- The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1-p, is denoted by q.

3- The trials are independent, that is the outcome of any particular trial is not affected by the outcome of any other trial

Page 93: Introduction to Biostatistics-145 Lectures4

The probability distribution of the binomial random variable X, the number of successes in n independent trials, is:

$$f(x) = P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

where $\binom{n}{x}$ is the number of combinations of n distinct objects taken x of them at a time:

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}, \qquad x! = x(x-1)(x-2)\cdots(1)$$

* Note: 0! = 1

Page 94: Introduction to Biostatistics-145 Lectures4

Properties of the binomial distribution

1. $f(x) \ge 0$
2. $\sum f(x) = 1$
3. The parameters of the binomial distribution are n and p
4. $\mu = E(X) = np$
5. $\sigma^2 = \operatorname{var}(X) = np(1 - p)$

Page 95: Introduction to Biostatistics-145 Lectures4

Example 4.3.1, page 100

If we examine all birth records from the North Carolina State Center for Health Statistics for year 2001, we find that 85.8 percent of the pregnancies had delivery in week 37 or later (full-term birth).

If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for full-term births?
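A worked sketch of this example, treating each record as a Bernoulli trial with n = 5, p = 0.858, and q = 1 - p = 0.142:

$$P(X = 3) = \binom{5}{3} (0.858)^{3} (0.142)^{2} = 10 \times 0.631629 \times 0.020164 \approx 0.1274$$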

Exercise: example 4.3.2 page 104

Page 96: Introduction to Biostatistics-145 Lectures4

Example 4.3.3, page 104

Suppose it is known that in a certain population 10 percent of the population is color blind. If a random sample of 25 people is drawn from this population, find the probability that

a) Five or fewer will be color blind.
b) Six or more will be color blind.
c) Between six and nine inclusive will be color blind.
d) Two, three, or four will be color blind.
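The four parts above are usually read from a cumulative binomial table. As an alternative sketch, they can be computed from the binomial formula itself; `binom_pmf` and `binom_cdf` are hypothetical helper names written for this illustration, built only on the standard library's `math.comb`.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))


n, p = 25, 0.10  # sample size and proportion color blind (Example 4.3.3)

part_a = binom_cdf(5, n, p)                        # five or fewer
part_b = 1 - binom_cdf(5, n, p)                    # six or more
part_c = binom_cdf(9, n, p) - binom_cdf(5, n, p)   # between six and nine inclusive
part_d = binom_cdf(4, n, p) - binom_cdf(1, n, p)   # two, three, or four

for label, value in [("a", part_a), ("b", part_b), ("c", part_c), ("d", part_d)]:
    print(f"({label}) {value:.4f}")
```

Parts (c) and (d) use the property P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a – 1), which is why the lower limits 6 and 2 appear as cumulative values at 5 and 1.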

Exercise: example 4.3.4 page 106

Page 97: Introduction to Biostatistics-145 Lectures4

4.4 The Poisson Distribution

If the random variable X is the number of occurrences of some random event in a certain period of time or space (or some volume of matter), the probability distribution of X is given by:

$$f(x) = P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

The symbol e is the constant equal to 2.7183. λ (lambda) is called the parameter of the distribution and is the average number of occurrences of the random event in the interval (or volume).

Page 98: Introduction to Biostatistics-145 Lectures4

Properties of the Poisson distribution

1. $f(x) \ge 0$
2. $\sum f(x) = 1$
3. $\mu = E(X) = \lambda$
4. $\sigma^2 = \operatorname{var}(X) = \lambda$

Page 99: Introduction to Biostatistics-145 Lectures4

Example 4.4.1, page 111

In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, Laake and Rottingen found that the occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per year in Norway. Find

1- The probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis.

Page 100: Introduction to Biostatistics-145 Lectures4

2- The probability that fewer than two patients receiving rocuronium in the next year will experience anaphylaxis.
3- The probability that more than two patients receiving rocuronium in the next year will experience anaphylaxis.
4- The expected value of the number of patients receiving rocuronium in the next year who will experience anaphylaxis.
5- The variance of the number of patients receiving rocuronium in the next year who will experience anaphylaxis.
6- The standard deviation of the number of patients receiving rocuronium in the next year who will experience anaphylaxis.

Page 101: Introduction to Biostatistics-145 Lectures4

Example 4.4.2, page 111: Refer to Example 4.4.1

1- What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
2- What is the probability that exactly one patient in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
3- What is the probability that none of the patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?

Page 102: Introduction to Biostatistics-145 Lectures4

4- What is the probability that at most two patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
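The questions of Examples 4.4.1 and 4.4.2 can be sketched with the Poisson formula and λ = 12. The helper names `poisson_pmf` and `poisson_cdf` are assumptions made for this illustration; only standard-library functions are used.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson random variable."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(0, x + 1))


lam = 12  # average number of incidents per year (Example 4.4.1)

# Example 4.4.1
exactly_three = poisson_pmf(3, lam)
fewer_than_two = poisson_cdf(1, lam)
more_than_two = 1 - poisson_cdf(2, lam)
mean, variance, std_dev = lam, lam, sqrt(lam)

# Example 4.4.2
at_least_three = 1 - poisson_cdf(2, lam)
exactly_one = poisson_pmf(1, lam)
none = poisson_pmf(0, lam)
at_most_two = poisson_cdf(2, lam)

print(f"P(X = 3)  = {exactly_three:.5f}")
print(f"P(X < 2)  = {fewer_than_two:.6f}")
print(f"P(X > 2)  = {more_than_two:.5f}")
print(f"mean = {mean}, variance = {variance}, sd = {std_dev:.4f}")
print(f"P(X <= 2) = {at_most_two:.6f}")
```

With λ = 12 the small counts are very unlikely, so "more than two" and "at least three" describe the same event and both come out close to 1.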

Exercises: Examples 4.4.3, 4.4.4 and 4.4.5, pages 111-113
Exercises: Questions 4.3.4, 4.3.5, 4.3.7, 4.4.1, 4.4.5

Page 103: Introduction to Biostatistics-145 Lectures4

4.5 Continuous Probability Distribution

Pages 114-127

Page 104: Introduction to Biostatistics-145 Lectures4

• Key words: Continuous random variable, normal distribution, standard normal distribution, t distribution

Page 105: Introduction to Biostatistics-145 Lectures4


• Now consider distributions of continuous random variables.

Page 106: Introduction to Biostatistics-145 Lectures4

Properties of continuous probability distributions:

1- Area under the curve = 1.
2- P(X = a) = 0, where a is a constant.
3- Area between two points a, b = P(a < X < b).

Page 107: Introduction to Biostatistics-145 Lectures4

4.6 The normal distribution:

• It is one of the most important probability distributions in statistics.

• The normal density is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$

• π, e: constants.
• µ: population mean.
• σ: population standard deviation.

Page 108: Introduction to Biostatistics-145 Lectures4

Characteristics of the normal distribution: Page 111

• The following are some important characteristics of the normal distribution:

1- It is symmetrical about its mean, µ.
2- The mean, the median, and the mode are all equal.
3- The total area under the curve above the x-axis is one.
4- The normal distribution is completely determined by the parameters µ and σ.

Page 109: Introduction to Biostatistics-145 Lectures4

5- The normal distribution depends on the two parameters µ and σ. µ determines the location of the curve (as seen in Figure 4.6.3), but σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve (as seen in Figure 4.6.4).

[Figure 4.6.3: three normal curves with different means, µ1 < µ2 < µ3]

[Figure 4.6.4: three normal curves with different standard deviations, σ1 < σ2 < σ3]

Page 110: Introduction to Biostatistics-145 Lectures4

Note that (as seen in Figure 4.6.2):

1. P(µ – σ < X < µ + σ) = 0.68
2. P(µ – 2σ < X < µ + 2σ) = 0.95
3. P(µ – 3σ < X < µ + 3σ) = 0.997

Page 111: Introduction to Biostatistics-145 Lectures4

The standard normal distribution:

• It is a special case of the normal distribution with a mean of 0 and a standard deviation of 1.

• The equation for the standard normal distribution is written as

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$$

Page 112: Introduction to Biostatistics-145 Lectures4

Characteristics of the standard normal distribution

1- It is symmetrical about 0.
2- The total area under the curve above the x-axis is one.
3- We can use Table (D) to find the probabilities and areas.

Page 113: Introduction to Biostatistics-145 Lectures4

"How to use tables of Z"

Note that the cumulative probabilities P(Z ≤ z) are given in tables for -3.49 < z < 3.49. Thus, P(-3.49 < Z < 3.49) ≈ 1.

For the standard normal distribution, P(Z > 0) = P(Z < 0) = 0.5.

Example 4.6.1:
If Z has a standard normal distribution, then P(Z < 2) is the area to the left of 2, and it equals 0.9772.

Page 114: Introduction to Biostatistics-145 Lectures4

Example 4.6.2:
P(-2.55 < Z < 2.55) is the area between -2.55 and 2.55. It equals
P(-2.55 < Z < 2.55) = 0.9946 – 0.0054 = 0.9892.

Example 4.6.2:
P(-2.74 < Z < 1.53) is the area between -2.74 and 1.53.
P(-2.74 < Z < 1.53) = 0.9370 – 0.0031 = 0.9339.

Page 115: Introduction to Biostatistics-145 Lectures4

Example 4.6.3:
P(Z > 2.71) is the area to the right of 2.71. So,
P(Z > 2.71) = 1 – 0.9966 = 0.0034.

Example:
P(Z = 0.84) is the area at the single point z = 0.84. Because Z is continuous, the area at a single point is zero. So,
P(Z = 0.84) = 0.
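The table look-ups in Examples 4.6.1 to 4.6.3 can be cross-checked numerically. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal cumulative probability; the function name `phi` is a label chosen for this illustration and is not the textbook's Table D.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


ex_4_6_1 = phi(2.0)                    # P(Z < 2)
ex_4_6_2a = phi(2.55) - phi(-2.55)     # P(-2.55 < Z < 2.55)
ex_4_6_2b = phi(1.53) - phi(-2.74)     # P(-2.74 < Z < 1.53)
ex_4_6_3 = 1.0 - phi(2.71)             # P(Z > 2.71)

for label, value in [
    ("P(Z < 2)", ex_4_6_1),
    ("P(-2.55 < Z < 2.55)", ex_4_6_2a),
    ("P(-2.74 < Z < 1.53)", ex_4_6_2b),
    ("P(Z > 2.71)", ex_4_6_3),
]:
    print(f"{label} = {value:.4f}")
```

Small differences in the fourth decimal place relative to a printed table are expected, because table entries are rounded before they are subtracted.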

Page 116: Introduction to Biostatistics-145 Lectures4

How to transform a normal distribution (X) to the standard normal distribution (Z)?

• This is done by the following formula:

$$z = \frac{x - \mu}{\sigma}$$

• Example:
• If X is normal with µ = 3, σ = 2, find the value of the standard normal Z if X = 6.

• Answer:

$$z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$$

Page 117: Introduction to Biostatistics-145 Lectures4

4.7 Normal Distribution Applications

The normal distribution can be used to model the distribution of many variables that are of interest. This allows us to answer probability questions about these random variables.

Example 4.7.1:
The 'Uptime' is a custom-made lightweight battery-operated activity monitor that records the amount of time an individual spends in the upright position. In a study of children ages 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3 hours. Find

Page 118: Introduction to Biostatistics-145 Lectures4

If a child is selected at random, then

1- The probability that the child spends less than 3 hours in the upright position in a 24-hour period:

$$P(X < 3) = P\left(\frac{X - \mu}{\sigma} < \frac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$$

2- The probability that the child spends more than 5 hours in the upright position in a 24-hour period:

$$P(X > 5) = P\left(\frac{X - \mu}{\sigma} > \frac{5 - 5.4}{1.3}\right) = P(Z > -0.31) = 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217$$

3- The probability that the child spends exactly 6.2 hours in the upright position in a 24-hour period:

$$P(X = 6.2) = 0$$

Page 119: Introduction to Biostatistics-145 Lectures4

4- The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period:

$$P(4.5 < X < 7.3) = P\left(\frac{4.5 - 5.4}{1.3} < \frac{X - \mu}{\sigma} < \frac{7.3 - 5.4}{1.3}\right) = P(-0.69 < Z < 1.46) = P(Z < 1.46) - P(Z < -0.69) = 0.9279 - 0.2451 = 0.6828$$

• H.W.: Ex. 4.7.2 – 4.7.3
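The four parts of Example 4.7.1 follow one pattern: standardize X, then read a cumulative probability. A minimal sketch of that pattern is given below. The names `phi` and `z_score` are assumptions for this illustration, and z is rounded to two decimals only to mimic a look-up in a printed Z table.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def z_score(x, mu, sigma):
    """Standardize x; rounded to two decimals, as when using a printed Z table."""
    return round((x - mu) / sigma, 2)


mu, sigma = 5.4, 1.3  # mean and standard deviation of upright time, in hours

p_less_than_3 = phi(z_score(3, mu, sigma))
p_more_than_5 = 1 - phi(z_score(5, mu, sigma))
p_exactly_6_2 = 0.0  # a continuous variable has zero probability at a single point
p_between = phi(z_score(7.3, mu, sigma)) - phi(z_score(4.5, mu, sigma))

print(f"z for x = 3: {z_score(3, mu, sigma)}, P(X < 3) = {p_less_than_3:.4f}")
print(f"z for x = 5: {z_score(5, mu, sigma)}, P(X > 5) = {p_more_than_5:.4f}")
print(f"P(4.5 < X < 7.3) = {p_between:.4f}")
```

Dropping the rounding in `z_score` gives slightly different values in the third or fourth decimal place, which is the usual gap between table-based and direct computation.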

Page 120: Introduction to Biostatistics-145 Lectures4

6.3 The t Distribution (pages 167-173):

1- It has a mean of zero.
2- It is symmetric about the mean.
3- It ranges from –∞ to ∞.

Page 121: Introduction to Biostatistics-145 Lectures4

4- Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails.

5- It depends on the degrees of freedom (n – 1).

6- The t distribution approaches the standard normal distribution as (n – 1) approaches ∞.

Page 122: Introduction to Biostatistics-145 Lectures4

Examples

Here t(ν, p) denotes the value that has area p to its left under the t distribution with ν degrees of freedom.

t(7, 0.975) = 2.3646

t(24, 0.995) = 2.7969

If P(T(18) > t) = 0.975, then t = -2.1009

If P(T(22) < t) = 0.99, then t = 2.508

Page 123: Introduction to Biostatistics-145 Lectures4

• Exercise:

• Questions: 4.7.1, 4.7.2
• H.W.: 4.7.3, 4.7.4, 4.7.6

Page 124: Introduction to Biostatistics-145 Lectures4

Chapter 6
Using sample data to make estimates about population parameters (pp. 162-172)

Page 125: Introduction to Biostatistics-145 Lectures4

Key words:

Point estimate, interval estimate, estimator, confidence level, α, confidence interval for the mean μ, confidence interval for two means, confidence interval for a population proportion P, confidence interval for two proportions

Page 126: Introduction to Biostatistics-145 Lectures4

6.1 Introduction:

Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.

Suppose that an administrator of a large hospital is interested in the mean age of patients admitted to his hospital during a given year.

1. It would be too expensive to go through the records of all patients admitted during that particular year.

2. He consequently elects to examine a sample of the records, from which he can compute an estimate of the mean age of patients admitted to his hospital that year.

Page 127: Introduction to Biostatistics-145 Lectures4

• For any parameter, we can compute two types of estimate: a point estimate and an interval estimate.

A point estimate is a single numerical value used to estimate the corresponding population parameter.

An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, we feel includes the parameter being estimated.

The Estimate and The Estimator:

The estimate is a single computed value, but the estimator is the rule that tells us how to compute this value, or estimate.

For example, x̄ = Σ xi / n is an estimator of the population mean, μ. The single numerical value that results from evaluating this formula is called an estimate of the parameter μ.
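The distinction between estimator and estimate can be illustrated with a short sketch. The function below plays the role of the estimator (the rule), and the number it returns for one particular sample is the estimate. The ages are hypothetical values invented only for illustration.

```python
def sample_mean(values):
    """Estimator: the rule x-bar = (sum of the x_i) / n."""
    return sum(values) / len(values)

# Hypothetical ages of six sampled patients (illustrative data only)
ages = [34, 51, 47, 29, 62, 41]

# Estimate: the single number obtained by applying the rule to this sample
estimate = sample_mean(ages)
print(estimate)
```

A different sample would give a different estimate, while the estimator itself stays the same.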

Page 128: Introduction to Biostatistics-145 Lectures4

6.2 Confidence Interval (C.I.) for a Population Mean:

Suppose researchers wish to estimate the mean of some normally distributed population. They draw a random sample of size n from the population and compute x̄, which they use as a point estimate of μ.

Because random sampling involves chance, x̄ cannot be expected to be equal to μ.

The value of x̄ may be greater than or less than μ.

It would be much more meaningful to estimate μ by an interval.

Page 129: Introduction to Biostatistics-145 Lectures4

The 100(1-α) percent confidence interval (C.I.) for μ:

We want to find two values L and U between which μ lies with high probability, i.e.

P( L ≤ μ ≤ U ) = 1-α

Page 130: Introduction to Biostatistics-145 Lectures4

For example:

When α = 0.01, then 1-α = 0.99
When α = 0.05, then 1-α = 0.95

Page 131: Introduction to Biostatistics-145 Lectures4

We have the following cases:

a) When the population is normal

1) When the variance is known and the sample size is large or small, the C.I. has the form:

P( x̄ - Z(1-α/2) σ/√n < μ < x̄ + Z(1-α/2) σ/√n ) = 1-α

2) When the variance is unknown and the sample size is small, the C.I. has the form:

P( x̄ - t(1-α/2),n-1 s/√n < μ < x̄ + t(1-α/2),n-1 s/√n ) = 1-α

Page 132: Introduction to Biostatistics-145 Lectures4

b) When the population is not normal and n is large (n > 30)

1) When the variance is known, the C.I. has the form:

P( x̄ - Z(1-α/2) σ/√n < μ < x̄ + Z(1-α/2) σ/√n ) = 1-α

2) When the variance is unknown, the C.I. has the form:

P( x̄ - Z(1-α/2) s/√n < μ < x̄ + Z(1-α/2) s/√n ) = 1-α
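The choice between a Z and a t critical value in the cases above can be summarized as a small decision function. This is only a sketch: the function name and its arguments are hypothetical, and the branch for a normal population with unknown variance and large n (which the cases above do not list) is assumed to use Z.

```python
def critical_value_type(population_normal, variance_known, n):
    """Return 'z' or 't' following the confidence-interval cases above."""
    large = n > 30
    if population_normal:
        # Normal population: Z when sigma is known (any n).
        # Assumption: Z is also used when sigma is unknown but n is large.
        if variance_known or large:
            return "z"
        # Sigma unknown and n small: t with n - 1 degrees of freedom.
        return "t"
    if large:
        # Non-normal population with large n: Z, using sigma or s.
        return "z"
    raise ValueError("non-normal population with small n is not covered by these cases")

print(critical_value_type(True, True, 10))    # normal, sigma known, n = 10
print(critical_value_type(True, False, 19))   # normal, sigma unknown, n = 19
print(critical_value_type(False, False, 35))  # not normal, sigma unknown, n = 35
```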

Page 133: Introduction to Biostatistics-145 Lectures4

Example 6.2.1 (page 167):

Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately x̄ = 22.

Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate μ. (α = 0.05)

Page 134: Introduction to Biostatistics-145 Lectures4

Solution:

1-α = 0.95 → α = 0.05 → α/2 = 0.025
x̄ = 22, variance = σ² = 45 → σ = √45, n = 10

The 95% confidence interval for μ is given by:

P( x̄ - Z(1-α/2) σ/√n < μ < x̄ + Z(1-α/2) σ/√n ) = 1-α

Z(1-α/2) = Z0.975 = 1.96 (refer to table D)

Z0.975 (σ/√n) = 1.96 (√45 / √10) = 4.1578

22 ± 1.96 (√45 / √10) → (22 - 4.1578, 22 + 4.1578) → (17.84, 26.16)

Exercise: example 6.2.2, page 169
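The interval for this example can be reproduced numerically. The sketch below assumes the known-variance Z formula and obtains Z(1-α/2) from the standard-library `statistics.NormalDist` instead of table D; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean(xbar, sigma, n, alpha):
    """100(1 - alpha)% C.I. for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z(1 - alpha/2), about 1.96 for alpha = 0.05
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6.2.1: x-bar = 22, variance = 45, n = 10, alpha = 0.05
lower, upper = z_interval_mean(xbar=22, sigma=sqrt(45), n=10, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```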

Page 135: Introduction to Biostatistics-145 Lectures4

Example:

The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean.

Solution:

Note that the population is not normal, n = 35 is large (n > 30), and σ is unknown, with s = 0.511.

1-α = 0.90 → α = 0.1 → α/2 = 0.05 → 1-α/2 = 0.95

Page 136: Introduction to Biostatistics-145 Lectures4

Then the 90% confidence interval for μ is given by:

P( x̄ - Z(1-α/2) s/√n < μ < x̄ + Z(1-α/2) s/√n ) = 1-α

Z(1-α/2) = Z0.95 = 1.645 (refer to table D)

Z0.95 (s/√n) = 1.645 (0.511/√35) = 0.1421

0.718 ± 1.645 (0.511/√35) → (0.718 - 0.1421, 0.718 + 0.1421) → (0.576, 0.860)

Exercise: example 6.2.3, page 164

Page 137: Introduction to Biostatistics-145 Lectures4

Example 6.3.1 (page 174):

Suppose researchers studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables they measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9.

We assume that the sample was taken from an approximately normally distributed population. Calculate the 95% confidence interval for the mean strength.

Page 138: Introduction to Biostatistics-145 Lectures4

Solution:

1-α = 0.95 → α = 0.05 → α/2 = 0.025
x̄ = 250.8, standard deviation = S = 130.9, n = 19

The 95% confidence interval for μ is given by:

P( x̄ - t(1-α/2),n-1 s/√n < μ < x̄ + t(1-α/2),n-1 s/√n ) = 1-α

t(1-α/2),n-1 = t0.975,18 = 2.1009 (refer to table E)

t0.975,18 (s/√n) = 2.1009 (130.9/√19) = 63.1

250.8 ± 2.1009 (130.9/√19) → (250.8 - 63.1, 250.8 + 63.1) → (187.7, 313.9)

Exercises: 6.2.1, 6.2.2, 6.3.2 (page 171)
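The same computation can be sketched in code. Python's standard library has no t distribution, so this sketch assumes the critical value t0.975,18 = 2.1009 is read from table E and passed in as a parameter; the function name is hypothetical.

```python
from math import sqrt

def t_interval_mean(xbar, s, n, t_crit):
    """C.I. for mu when sigma is unknown; t_crit is t(1 - alpha/2), n - 1 from table E."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6.3.1: x-bar = 250.8, s = 130.9, n = 19, t(0.975, 18) = 2.1009
lower, upper = t_interval_mean(xbar=250.8, s=130.9, n=19, t_crit=2.1009)
print(round(lower, 1), round(upper, 1))
```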

Page 139: Introduction to Biostatistics-145 Lectures4

6.4 Confidence Interval (C.I.) for the Difference Between Two Population Means:

If we draw two samples from two independent populations and we want the confidence interval for the difference between the two population means, then we have the following cases:

a) When the populations are normal

1) When the variances are known and the sample sizes are large or small, the C.I. has the form:

(x̄1 - x̄2) - Z(1-α/2) √(σ1²/n1 + σ2²/n2) < μ1 - μ2 < (x̄1 - x̄2) + Z(1-α/2) √(σ1²/n1 + σ2²/n2)

Page 140: Introduction to Biostatistics-145 Lectures4

2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form:

(x̄1 - x̄2) - t(1-α/2),(n1+n2-2) Sp √(1/n1 + 1/n2) < μ1 - μ2 < (x̄1 - x̄2) + t(1-α/2),(n1+n2-2) Sp √(1/n1 + 1/n2)

where

Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / (n1 + n2 - 2)

Page 141: Introduction to Biostatistics-145 Lectures4

3) When the variances are unknown and the sample sizes are large, the sample variances S1² and S2² are used in place of σ1² and σ2², and the C.I. has the form:

(x̄1 - x̄2) - Z(1-α/2) √(S1²/n1 + S2²/n2) < μ1 - μ2 < (x̄1 - x̄2) + Z(1-α/2) √(S1²/n1 + S2²/n2)

Page 142: Introduction to Biostatistics-145 Lectures4

Example 6.4.1 (page 174):

A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of x̄1 = 4.5 mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex were found to have a mean value of x̄2 = 3.4. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I. for μ1 - μ2.

Solution:

1-α = 0.95 → α = 0.05 → α/2 = 0.025 → Z(1-α/2) = Z0.975 = 1.96

(x̄1 - x̄2) ± Z(1-α/2) √(σ1²/n1 + σ2²/n2) = (4.5 - 3.4) ± 1.96 √(1/12 + 1.5/15)

= 1.1 ± 1.96 (0.4282) = 1.1 ± 0.84 = (0.26, 1.94)
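This two-sample interval can be checked with a short sketch. It assumes the known-variance Z formula for two independent normal populations and takes Z(1-α/2) from `statistics.NormalDist`; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_two_means(xbar1, xbar2, var1, var2, n1, n2, alpha):
    """100(1 - alpha)% C.I. for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z * standard_error, diff + z * standard_error

# Example 6.4.1: means 4.5 and 3.4, variances 1 and 1.5, n1 = 12, n2 = 15
lower, upper = z_interval_two_means(4.5, 3.4, 1.0, 1.5, 12, 15, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```

Because the interval does not contain zero, the data are consistent with a real difference between the two population means at this confidence level.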

Page 143: Introduction to Biostatistics-145 Lectures4

Example 6.4.1 (page 178):

The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse issues among people with severe mental disorders. A retrospective chart review was carried out on 50 patients; the researchers were interested in the number of inpatient treatment days for psychiatric disorder during the year following the end of the program. Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I. for the difference between the means of the populations represented by the two samples.

Page 144: Introduction to Biostatistics-145 Lectures4

Solution:

1-α = 0.99 → α = 0.01 → α/2 = 0.005 → 1-α/2 = 0.995

n1 + n2 - 2 = 18 + 10 - 2 = 26

t(1-α/2),(n1+n2-2) = t0.995,26 = 2.7787, then the 99% C.I. for μ1 - μ2 is

(x̄1 - x̄2) ± t(1-α/2),(n1+n2-2) Sp √(1/n1 + 1/n2)

where

Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / (n1 + n2 - 2) = [ (17 × 9.3²) + (9 × 11.5²) ] / (18 + 10 - 2) = 102.33

then

(4.7 - 8.8) ± 2.7787 √102.33 √(1/18 + 1/10) = -4.1 ± 11.086 = (-15.186, 6.986)

Exercises: 6.4.2, 6.4.6, 6.4.7, 6.4.8 (page 180)
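The pooled-variance computation can be sketched as follows. The critical value t0.995,26 = 2.7787 is assumed to come from table E and is passed in as a parameter; the function names are hypothetical.

```python
from math import sqrt

def pooled_variance(s1, s2, n1, n2):
    """Sp^2 = [(n1 - 1) S1^2 + (n2 - 1) S2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """C.I. for mu1 - mu2 with unknown but equal variances; t_crit from table E."""
    sp = sqrt(pooled_variance(s1, s2, n1, n2))
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Schizophrenia group: n = 18, mean 4.7, sd 9.3; bipolar group: n = 10, mean 8.8, sd 11.5
sp2 = pooled_variance(9.3, 11.5, 18, 10)
lower, upper = pooled_t_interval(4.7, 8.8, 9.3, 11.5, 18, 10, t_crit=2.7787)
print(round(sp2, 2), round(lower, 3), round(upper, 3))
```

Here the interval contains zero, so at the 99% level the data do not rule out equal population means.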

Page 145: Introduction to Biostatistics-145 Lectures4

6.5 Confidence Interval for a Population Proportion (P):

A sample is drawn from the population of interest, and the sample proportion is computed as

p̂ = a / n = (no. of elements in the sample with some characteristic) / (total no. of elements in the sample)

This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the following formula:

P̂ ± Z(1-α/2) √( P̂ (1 - P̂) / n )

Page 146: Introduction to Biostatistics-145 Lectures4

Example 6.5.1

The Pew Internet Life Project reported in 2003 that 18% of internet users have used the internet to search for information regarding experimental treatments or medicine. The sample consisted of 1220 adult internet users, and information was collected from telephone interviews. We wish to construct a 98% C.I. for the proportion of internet users who have searched for information about experimental treatments or medicine.

Page 147: Introduction to Biostatistics-145 Lectures4

Solution:

1-α = 0.98 → α = 0.02 → α/2 = 0.01 → 1-α/2 = 0.99
Z(1-α/2) = Z0.99 = 2.33, n = 1220, p̂ = 18/100 = 0.18

The 98% C.I. is

P̂ ± Z(1-α/2) √( P̂ (1 - P̂) / n ) = 0.18 ± 2.33 √( 0.18 (1 - 0.18) / 1220 )

= 0.18 ± 0.0256 = (0.1544, 0.2056)

Exercises: 6.5.1, 6.5.3 (page 187)
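A short sketch of the same interval follows. It assumes the large-sample Z formula for a proportion and computes Z(1-α/2) with `statistics.NormalDist` (about 2.326 instead of the rounded table value 2.33), so the limits agree with the hand computation to about three decimals. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, alpha):
    """Large-sample 100(1 - alpha)% C.I. for a population proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 6.5.1: p-hat = 0.18, n = 1220, 98% confidence
lower, upper = proportion_interval(p_hat=0.18, n=1220, alpha=0.02)
print(round(lower, 4), round(upper, 4))
```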

Page 148: Introduction to Biostatistics-145 Lectures4

6.6 Confidence Interval for the Difference Between Two Population Proportions:

Two samples are drawn from two independent populations of interest, and the sample proportion of the characteristic of interest is computed for each sample. An unbiased point estimator for the difference between the two population proportions is P̂1 - P̂2.

A 100(1-α)% confidence interval for P1 - P2 is given by

(P̂1 - P̂2) ± Z(1-α/2) √( P̂1 (1 - P̂1) / n1 + P̂2 (1 - P̂2) / n2 )

Page 149: Introduction to Biostatistics-145 Lectures4

Example 6.6.1

Connor investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet in the internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to the internet café in the two sampled populations.

Page 150: Introduction to Biostatistics-145 Lectures4

Solution:

1-α = 0.99 → α = 0.01 → α/2 = 0.005 → 1-α/2 = 0.995
Z(1-α/2) = Z0.995 = 2.58, nF = 68, nM = 255

p̂F = aF / nF = 31/68 = 0.4559, p̂M = aM / nM = 53/255 = 0.2078

The 99% C.I. is

(P̂F - P̂M) ± Z(1-α/2) √( P̂F (1 - P̂F) / nF + P̂M (1 - P̂M) / nM )

= (0.4559 - 0.2078) ± 2.58 √( 0.4559 (1 - 0.4559) / 68 + 0.2078 (1 - 0.2078) / 255 )

= 0.2481 ± 2.58 (0.0655) = (0.07914, 0.4171)
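The two-proportion interval can be verified with the sketch below. It assumes the large-sample Z formula given in this section and uses the unrounded Z0.995 (about 2.576, where the hand computation uses 2.58), so the limits match only to about three decimals. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(a1, n1, a2, n2, alpha):
    """Large-sample 100(1 - alpha)% C.I. for P1 - P2 from counts a1 of n1 and a2 of n2."""
    p1, p2 = a1 / n1, a2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * standard_error, diff + z * standard_error

# Example 6.6.1: 31 of 68 females and 53 of 255 males, 99% confidence
lower, upper = two_proportion_interval(31, 68, 53, 255, alpha=0.01)
print(round(lower, 4), round(upper, 4))
```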

Page 151: Introduction to Biostatistics-145 Lectures4

Exercises:

Questions: 6.2.1, 6.2.2, 6.2.5, 6.3.2, 6.3.5, 6.4.2, 6.5.3, 6.5.4, 6.6.1

Page 152: Introduction to Biostatistics-145 Lectures4

Chapter 7
Using sample statistics to test hypotheses about population parameters

Pages 215-233

Page 153: Introduction to Biostatistics-145 Lectures4

Key words:

Null hypothesis H0, alternative hypothesis HA, testing hypothesis, test statistic, P-value

Page 154: Introduction to Biostatistics-145 Lectures4

Hypothesis Testing

One type of statistical inference, estimation, was discussed in Chapter 6.

The other type, hypothesis testing, is discussed in this chapter.

Page 155: Introduction to Biostatistics-145 Lectures4

Definition of a hypothesis

It is a statement about one or more populations. It is usually concerned with the parameters of the population, e.g. the hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days.

Page 156: Introduction to Biostatistics-145 Lectures4

Definition of statistical hypotheses

They are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques.

There are two hypotheses involved in hypothesis testing:

Null hypothesis H0: It is the hypothesis to be tested.

Alternative hypothesis HA: It is a statement of what we believe is true if our sample data cause us to reject the null hypothesis.

Page 157: Introduction to Biostatistics-145 Lectures4

7.2 Testing a hypothesis about the mean of a population:

We have the following steps:

1. Data: determine the variable, sample size (n), sample mean (x̄), and population standard deviation σ, or sample standard deviation (s) if σ is unknown.

2. Assumptions: We have two cases:

Case 1: Population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large).

Case 2: Population is not normal with known or unknown variance (n is large, i.e. n ≥ 30).

Page 158: Introduction to Biostatistics-145 Lectures4

3. Hypotheses: we have three cases

Case I: H0: μ = μ0
HA: μ ≠ μ0
e.g. we want to test that the population mean is different from 50

Case II: H0: μ = μ0
HA: μ > μ0
e.g. we want to test that the population mean is greater than 50

Case III: H0: μ = μ0
HA: μ < μ0
e.g. we want to test that the population mean is less than 50

Page 159: Introduction to Biostatistics-145 Lectures4

4. Test Statistic:

Case 1: Population is normal or approximately normal

σ² is known (n large or small): Z = (X̄ - μ0) / (σ/√n)

σ² is unknown, n large: Z = (X̄ - μ0) / (s/√n)

σ² is unknown, n small: T = (X̄ - μ0) / (s/√n)

Case 2: If the population is not normally distributed and n is large

i) If σ² is known: Z = (X̄ - μ0) / (σ/√n)

ii) If σ² is unknown: Z = (X̄ - μ0) / (s/√n)

Page 160: Introduction to Biostatistics-145 Lectures4

5. Decision Rule:

i) If HA: μ ≠ μ0
Reject H0 if Z > Z(1-α/2) or Z < -Z(1-α/2) (when using the Z-test)
or
Reject H0 if T > t(1-α/2),n-1 or T < -t(1-α/2),n-1 (when using the T-test)

ii) If HA: μ > μ0
Reject H0 if Z > Z(1-α) (when using the Z-test)
or
Reject H0 if T > t(1-α),n-1 (when using the T-test)

Page 161: Introduction to Biostatistics-145 Lectures4

iii) If HA: μ < μ0
Reject H0 if Z < -Z(1-α) (when using the Z-test)
or
Reject H0 if T < -t(1-α),n-1 (when using the T-test)

Note:
Z(1-α/2), Z(1-α), Zα are tabulated values obtained from table D.
t(1-α/2), t(1-α), tα are tabulated values obtained from table E with (n-1) degrees of freedom (df).

Page 162: Introduction to Biostatistics-145 Lectures4

6. Decision:

If we reject H0, we can conclude that HA is true.

If, however, we do not reject H0, we may conclude only that H0 may be true.

Page 163: Introduction to Biostatistics-145 Lectures4

Text Book : Basic Concepts and MeText Book : Basic Concepts and Methodology for the Health Sciences thodology for the Health Sciences

163163

An Alternative Decision Rule using theAn Alternative Decision Rule using the p - value Definition p - value Definition The The p-valuep-value is defined as the smallest value of is defined as the smallest value of

α for which the null hypothesis can be α for which the null hypothesis can be rejected.rejected.

If the p-value is less than or equal to α ,we If the p-value is less than or equal to α ,we reject the null hypothesisreject the null hypothesis (p ≤ (p ≤ αα))

If the p-value is greater than α ,we If the p-value is greater than α ,we do not do not reject the null hypothesis reject the null hypothesis (p > (p > αα))

Page 164: Introduction to Biostatistics-145 Lectures4

Example 7.2.1 (page 223)

Researchers are interested in the mean age of a certain population. A random sample of 10 individuals drawn from the population of interest has a mean of 27. Assuming that the population is approximately normally distributed with variance 20, can we conclude that the mean is different from 30 years? (α = 0.05)

If the p-value is 0.0340, how can we use it in making a decision?

Page 165: Introduction to Biostatistics-145 Lectures4

Solution

1- Data: variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05

2- Assumptions: the population is approximately normally distributed with variance 20

3- Hypotheses:
H0: μ = 30
HA: μ ≠ 30

Page 166: Introduction to Biostatistics-145 Lectures4

4- Test Statistic:

Z = (X̄ - μ0) / (σ/√n) = (27 - 30) / √(20/10) = -2.12

5- Decision Rule:

The alternative hypothesis is HA: μ ≠ 30.
Hence we reject H0 if Z > Z(1-0.05/2) = Z0.975 or Z < -Z(1-0.05/2) = -Z0.975

Z0.975 = 1.96 (from table D)

Page 167: Introduction to Biostatistics-145 Lectures4

6- Decision:

We reject H0, since -2.12 is in the rejection region.

We can conclude that μ is not equal to 30.

Using the p-value, we note that p-value = 0.0340 < 0.05; therefore we reject H0.
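The test statistic and p-value for this example can be computed with a short sketch. It assumes a Z-test with known variance and derives the p-value from the standard normal distribution in `statistics.NormalDist`; the function name and the `alternative` labels are hypothetical conveniences, not notation from the textbook.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":      # HA: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":      # HA: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":         # HA: mu < mu0
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# Example 7.2.1: x-bar = 27, mu0 = 30, variance = 20, n = 10, alpha = 0.05
z, p_value = z_test_mean(xbar=27, mu0=30, sigma=sqrt(20), n=10)
reject_h0 = p_value <= 0.05
print(round(z, 2), round(p_value, 3), reject_h0)
```

Comparing the p-value with α and comparing Z with the critical values ±1.96 lead to the same decision.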

Page 168: Introduction to Biostatistics-145 Lectures4

Example 7.2.2 (page 227)

Referring to example 7.2.1, suppose that the researchers have asked: Can we conclude that μ < 30?

1. Data: see previous example
2. Assumptions: see previous example
3. Hypotheses:
H0: μ = 30
HA: μ < 30

Page 169: Introduction to Biostatistics-145 Lectures4

4. Test Statistic:

Z = (X̄ - μ0) / (σ/√n) = (27 - 30) / √(20/10) = -2.12

5. Decision Rule: Reject H0 if Z < Zα, where Zα = -1.645 (from table D)

6. Decision: Reject H0; thus we can conclude that the population mean is smaller than 30.

Page 170: Introduction to Biostatistics-145 Lectures4

Example 7.2.4 (page 232)

Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140. Use α = 0.01.

Page 171: Introduction to Biostatistics-145 Lectures4

Solution

1. Data: Variable is systolic blood pressure, n = 157, x̄ = 146, s = 27, α = 0.01.

2. Assumption: population is not normal, σ² is unknown

3. Hypotheses:
H0: μ = 140
HA: μ > 140

4. Test Statistic:

Z = (X̄ - μ0) / (s/√n) = (146 - 140) / (27/√157) = 6 / 2.1548 = 2.78

Page 172: Introduction to Biostatistics-145 Lectures4

5. Decision Rule: we reject H0 if Z > Z(1-α) = Z0.99 = 2.33 (from table D)

6. Decision: We reject H0. Hence we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140.
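The critical-value form of the decision rule for this one-sided test can be sketched as follows. The sketch assumes the large-sample Z statistic with s in place of σ and takes Z(1-α) from `statistics.NormalDist` rather than table D.

```python
from math import sqrt
from statistics import NormalDist

# Example 7.2.4: H0: mu = 140 against HA: mu > 140
xbar, mu0, s, n, alpha = 146, 140, 27, 157, 0.01

# Test statistic: Z = (x-bar - mu0) / (s / sqrt(n))
z = (xbar - mu0) / (s / sqrt(n))

# Upper-tail critical value Z(1 - alpha)
z_crit = NormalDist().inv_cdf(1 - alpha)

# Decision rule for HA: mu > mu0
reject_h0 = z > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```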

Page 173: Introduction to Biostatistics-145 Lectures4

7.3 Hypothesis Testing: The Difference Between Two Population Means:

We have the following steps:

1. Data: determine the variable, sample sizes (n1, n2), sample means, and population standard deviations, or sample standard deviations (s) if σ is unknown, for the two populations.

2. Assumptions: We have two cases:

Case 1: Populations are normally or approximately normally distributed with known or unknown variances (sample sizes may be small or large).

Case 2: Populations are not normal with known variances (n is large, i.e. n ≥ 30).

Page 174: Introduction to Biostatistics-145 Lectures4

3. Hypotheses: we have three cases

Case I: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0
e.g. we want to test that the mean for the first population is different from the second population mean.

Case II: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 > μ2 → μ1 - μ2 > 0
e.g. we want to test that the mean for the first population is greater than the second population mean.

Case III: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 < μ2 → μ1 - μ2 < 0
e.g. we want to test that the mean for the first population is less than the second population mean.

Page 175: Introduction to Biostatistics-145 Lectures4

4. Test Statistic:

Case 1: The two populations are normal or approximately normal

σ² is known (n1, n2 large or small):

Z = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √(σ1²/n1 + σ2²/n2)

σ² is unknown (n1, n2 small), population variances equal:

T = [ (X̄1 - X̄2) - (μ1 - μ2) ] / ( Sp √(1/n1 + 1/n2) )

where

Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / (n1 + n2 - 2)

σ² is unknown (n1, n2 small), population variances not equal:

T = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √(S1²/n1 + S2²/n2)

Page 176: Introduction to Biostatistics-145 Lectures4

Case 2: If the populations are not normally distributed, n1 and n2 are large (n1 ≥ 30, n2 ≥ 30), and the population variances are known:

Z = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √(σ1²/n1 + σ2²/n2)

Page 177: Introduction to Biostatistics-145 Lectures4

5. Decision Rule:
i) If H_A: μ₁ ≠ μ₂ → μ₁ - μ₂ ≠ 0
Reject H₀ if Z > Z_{1-α/2} or Z < -Z_{1-α/2} (when using the Z-test)
or
Reject H₀ if T > t_{1-α/2, (n₁+n₂-2)} or T < -t_{1-α/2, (n₁+n₂-2)} (when using the T-test)

ii) If H_A: μ₁ > μ₂ → μ₁ - μ₂ > 0
Reject H₀ if Z > Z_{1-α} (when using the Z-test)
or
Reject H₀ if T > t_{1-α, (n₁+n₂-2)} (when using the T-test)

Page 178: Introduction to Biostatistics-145 Lectures4

iii) If H_A: μ₁ < μ₂ → μ₁ - μ₂ < 0
Reject H₀ if Z < -Z_{1-α} (when using the Z-test)
or
Reject H₀ if T < -t_{1-α, (n₁+n₂-2)} (when using the T-test)

Note:
Z_{1-α/2}, Z_{1-α}, Z_α are tabulated values obtained from Table D.
t_{1-α/2}, t_{1-α}, t_α are tabulated values obtained from Table E with (n₁+n₂-2) degrees of freedom (df).

6. Conclusion: reject or fail to reject H₀.
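As a rough illustration of the Z-test decision rules, the tabulated values from Table D can also be reproduced with Python's standard library. The function names and the strings used for the alternative hypothesis are assumptions made for this sketch. The t critical values from Table E are not covered, since the standard library has no t distribution.

```python
from statistics import NormalDist

def z_critical(alpha, two_sided):
    """Return Z_{1-alpha/2} for a two-sided test, Z_{1-alpha} otherwise."""
    p = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(p)

def reject_h0(z, alpha, alternative):
    """Apply decision rule i), ii) or iii) to an observed Z statistic.

    alternative is one of 'two-sided', 'greater', 'less' (labels assumed here).
    """
    if alternative == "two-sided":
        crit = z_critical(alpha, two_sided=True)
        return z > crit or z < -crit
    crit = z_critical(alpha, two_sided=False)
    if alternative == "greater":
        return z > crit
    if alternative == "less":
        return z < -crit
    raise ValueError("unknown alternative: " + alternative)
```

For example, `z_critical(0.05, True)` gives approximately 1.96 and `z_critical(0.05, False)` approximately 1.645, the values used in the examples of this chapter.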

Page 179: Introduction to Biostatistics-145 Lectures4

Example 7.3.1 page 238
Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome, from a normal distribution with variance 1, and on 15 normal individuals, from a normal distribution with variance 1.5. The means are X̄₁ = 4.5 mg/100 ml and X̄₂ = 3.4 mg/100 ml, and α = 0.05.

Solution:
1. Data: The variable is serum uric acid level; n₁ = 12, n₂ = 15, σ₁² = 1, σ₂² = 1.5, α = 0.05.

Page 180: Introduction to Biostatistics-145 Lectures4

2. Assumption: The two populations are normal; σ₁² and σ₂² are known.

3. Hypotheses:
H₀: μ₁ = μ₂ → μ₁ - μ₂ = 0
H_A: μ₁ ≠ μ₂ → μ₁ - μ₂ ≠ 0

4. Test Statistic:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}}} = 2.57$$

5. Decision Rule: Reject H₀ if Z > Z_{1-α/2} or Z < -Z_{1-α/2}, where
Z_{1-α/2} = Z_{1-0.05/2} = Z_{0.975} = 1.96 (from Table D).

6. Conclusion: Reject H₀ since 2.57 > 1.96.
Equivalently, the p-value = 0.0102 < α = 0.05, so we reject H₀.
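The two-sided p-value quoted for Example 7.3.1 can be checked with a short Python sketch. It assumes only the summary figures given in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 7.3.1
mean1, mean2 = 4.5, 3.4      # sample means (mg/100 ml)
var1, var2 = 1.0, 1.5        # known population variances
n1, n2 = 12, 15
alpha = 0.05

# Z statistic under H0: mu1 - mu2 = 0
z = ((mean1 - mean2) - 0) / sqrt(var1 / n1 + var2 / n2)

# Two-sided p-value: probability of a value at least as extreme as |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value < alpha
```

Here `z` comes out close to 2.57 and `p_value` close to 0.0102, so the p-value approach and the critical-value approach lead to the same decision.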

Page 181: Introduction to Biostatistics-145 Lectures4

Example 7.3.2 page 240
The purpose of a study by Tam was to investigate wheelchair maneuvering in individuals with lower-level spinal cord injury (SCI) and healthy controls (C). Subjects used a modified wheelchair that incorporated a rigid seat surface to facilitate the specified experimental measurements. The data for measurements of the left ischial tuberosity (the thigh bones and how they are affected by the wheelchair) for SCI and control C are shown below.

C:    131  115  124  131  122  117   88  114  150  169
SCI:   60  150  130  180  163  130  121  119  130  143

Page 182: Introduction to Biostatistics-145 Lectures4

We wish to know if we can conclude, on the basis of the above data, that the mean of the left ischial tuberosity measurements for controls (C) is lower than the mean for SCI. Assume normal populations with equal variances, α = 0.05, p-value > 0.10.

Page 183: Introduction to Biostatistics-145 Lectures4

Solution:
1. Data: n_C = 10, n_SCI = 10, S_C = 21.8, S_SCI = 32.3, α = 0.05, X̄_C = 126.1, X̄_SCI = 133.1 (calculated from the data).

2. Assumption: The two populations are normal; σ₁² and σ₂² are unknown but equal.

3. Hypotheses:
H₀: μ_C = μ_SCI → μ_C - μ_SCI = 0
H_A: μ_C < μ_SCI → μ_C - μ_SCI < 0

4. Test Statistic:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(126.1 - 133.1) - 0}{\sqrt{\dfrac{756.04}{10} + \dfrac{756.04}{10}}} = -0.569$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(21.8)^2 + 9(32.3)^2}{10 + 10 - 2} = 756.04$$

Page 184: Introduction to Biostatistics-145 Lectures4

5. Decision Rule: Reject H₀ if T < -t_{1-α, (n₁+n₂-2)}, where
t_{1-α, (n₁+n₂-2)} = t_{0.95, 18} = 1.7341 (from Table E).

6. Conclusion: Fail to reject H₀ since -0.569 > -1.7341.
Equivalently, fail to reject H₀ since the p-value > 0.10 > α = 0.05.
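The pooled-variance calculation in Example 7.3.2 can be sketched in Python from the summary statistics. The critical value 1.7341 is taken from the slide (Table E) rather than computed, and the variable names are assumptions. Because the sketch starts from the rounded standard deviations 21.8 and 32.3, the pooled variance comes out near 759 rather than the 756.04 shown above, and T agrees with the slide to about two decimal places.

```python
from math import sqrt

# Summary statistics from Example 7.3.2
mean_c, mean_sci = 126.1, 133.1
s_c, s_sci = 21.8, 32.3
n_c, n_sci = 10, 10
t_table = 1.7341   # t_{0.95, 18} as read from Table E on the slide

# Pooled variance and test statistic under H0: mu_C - mu_SCI = 0
sp2 = ((n_c - 1) * s_c ** 2 + (n_sci - 1) * s_sci ** 2) / (n_c + n_sci - 2)
t_stat = ((mean_c - mean_sci) - 0) / sqrt(sp2 * (1 / n_c + 1 / n_sci))

# Left-tailed test: reject H0 only if T falls below -t_table
reject = t_stat < -t_table
```

Since `t_stat` is roughly -0.57, which is not below -1.7341, `reject` is `False`, in line with the conclusion "fail to reject H₀".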

Page 185: Introduction to Biostatistics-145 Lectures4

Example 7.3.3 page 241
Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode and blood pressure measured by a sphygmomanometer. Physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (Group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (Group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean stiffness index.

We wish to know if we can conclude that, in general, persons with thrombosis have on average higher IgG levels than persons without thrombosis, at α = 0.01, p-value = 0.0559.

Page 186: Introduction to Biostatistics-145 Lectures4

Solution:
1. Data: n₁ = 53, n₂ = 54, S₁ = 44.89, S₂ = 34.85, α = 0.01.

Group           Mean IgG level   Sample size   Standard deviation
Thrombosis      59.01            53            44.89
No thrombosis   46.61            54            34.85

2. Assumption: The two populations are not normal; σ₁² and σ₂² are unknown and the sample sizes are large.

3. Hypotheses:
H₀: μ₁ = μ₂ → μ₁ - μ₂ = 0
H_A: μ₁ > μ₂ → μ₁ - μ₂ > 0

4. Test Statistic:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(59.01 - 46.61) - 0}{\sqrt{\dfrac{44.89^2}{53} + \dfrac{34.85^2}{54}}} = 1.59$$

Page 187: Introduction to Biostatistics-145 Lectures4

5. Decision Rule: Reject H₀ if Z > Z_{1-α}, where
Z_{1-α} = Z_{0.99} = 2.33 (from Table D).

6. Conclusion: Fail to reject H₀ since 1.59 < 2.33.
Equivalently, fail to reject H₀ since p = 0.0559 > α = 0.01.
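For the thrombosis data, both routes to the decision (critical value and p-value) can be sketched together in Python. The sample standard deviations stand in for the unknown population values, as in the solution above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the table (thrombosis vs. no thrombosis)
mean1, s1, n1 = 59.01, 44.89, 53
mean2, s2, n2 = 46.61, 34.85, 54
alpha = 0.01

# Large-sample Z statistic under H0: mu1 - mu2 = 0
z = ((mean1 - mean2) - 0) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Right-tailed test: compare with Z_{1-alpha} and compute the one-sided p-value
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

reject_by_critical_value = z > z_crit
reject_by_p_value = p_value < alpha
```

Both flags come out `False`, matching the slide: Z is about 1.59, below the critical value of about 2.33, and the p-value of roughly 0.056 exceeds α = 0.01.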

Page 188: Introduction to Biostatistics-145 Lectures4

7.5 Hypothesis Testing: A Single Population Proportion

Testing hypotheses about a population proportion (P) is carried out in much the same way as for a mean, when the conditions necessary for using the normal curve are met. We have the following steps:

1. Data: sample size (n), sample proportion (p̂), P₀, where

$$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$$

2. Assumptions: normal distribution.

Page 189: Introduction to Biostatistics-145 Lectures4

3. Hypotheses: we have three cases
Case I: H₀: P = P₀
H_A: P ≠ P₀
Case II: H₀: P = P₀
H_A: P > P₀
Case III: H₀: P = P₀
H_A: P < P₀

4. Test Statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$

When H₀ is true, Z is distributed approximately as the standard normal.

Page 190: Introduction to Biostatistics-145 Lectures4

5. Decision Rule:
i) If H_A: P ≠ P₀
Reject H₀ if Z > Z_{1-α/2} or Z < -Z_{1-α/2}

ii) If H_A: P > P₀
Reject H₀ if Z > Z_{1-α}

iii) If H_A: P < P₀
Reject H₀ if Z < -Z_{1-α}

Note: Z_{1-α/2}, Z_{1-α}, Z_α are tabulated values obtained from Table D.

6. Conclusion: reject or fail to reject H₀.

Page 191: Introduction to Biostatistics-145 Lectures4

2. Assumptions: p̂ is approximately normally distributed.

3. Hypotheses:
H₀: P = 0.063
H_A: P > 0.063

4. Test Statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{0.08 - 0.063}{\sqrt{\dfrac{0.063(0.937)}{301}}} = 1.21$$

5. Decision Rule: Reject H₀ if Z > Z_{1-α}, where
Z_{1-α} = Z_{1-0.05} = Z_{0.95} = 1.645.

Page 192: Introduction to Biostatistics-145 Lectures4

6. Conclusion: Fail to reject H₀ since Z = 1.21 < Z_{1-α} = 1.645.
Equivalently, the p-value = 0.1131 > α, so we fail to reject H₀.

Page 193: Introduction to Biostatistics-145 Lectures4

Example 7.5.1 page 259
Wagen collected data on a sample of 301 Hispanic women living in Texas. One variable of interest was the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The article cites population estimates for IFG among Hispanic women in Texas as 6.3 percent. Is there sufficient evidence to indicate that the population of Hispanic women in Texas has a prevalence of IFG higher than 6.3 percent? Let α = 0.05.

Solution:
1. Data: n = 301, p₀ = 6.3/100 = 0.063, a = 24, q₀ = 1 - p₀ = 1 - 0.063 = 0.937, α = 0.05.

$$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$$
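The one-sample proportion test for Example 7.5.1 can be sketched in Python as below. The helper name `one_proportion_z` is an assumption for illustration. The slide rounds p̂ to 0.08 before computing Z, so the sketch shows both the rounded and the unrounded statistic; the unrounded one is somewhat smaller, and the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

n, a = 301, 24     # sample size and number of women with IFG
p0 = 0.063         # hypothesized population proportion
alpha = 0.05

def one_proportion_z(p_hat, p0, n):
    """Z statistic for H0: P = p0, using p0 in the standard error."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_slide = one_proportion_z(round(a / n, 2), p0, n)  # p_hat rounded to 0.08, as on the slide
z_exact = one_proportion_z(a / n, p0, n)            # p_hat left unrounded

# Right-tailed test at level alpha
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z_slide)
reject = z_slide > z_crit
```

With the rounded proportion, `z_slide` is about 1.21, below the critical value of about 1.645, so H₀ is not rejected.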

Page 194: Introduction to Biostatistics-145 Lectures4

7.6 Hypothesis Testing: The Difference Between Two Population Proportions

Testing hypotheses about two population proportions (P₁, P₂) is carried out in much the same way as for the difference between two means, when the conditions necessary for using the normal curve are met. We have the following steps:

1. Data: sample sizes (n₁ and n₂), sample proportions (p̂₁, p̂₂), number with the characteristic in the two samples (x₁, x₂), and the pooled proportion

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

2. Assumption: The two populations are independent.

Page 195: Introduction to Biostatistics-145 Lectures4

3. Hypotheses: we have three cases
Case I: H₀: P₁ = P₂ → P₁ - P₂ = 0
H_A: P₁ ≠ P₂ → P₁ - P₂ ≠ 0
Case II: H₀: P₁ = P₂ → P₁ - P₂ = 0
H_A: P₁ > P₂ → P₁ - P₂ > 0
Case III: H₀: P₁ = P₂ → P₁ - P₂ = 0
H_A: P₁ < P₂ → P₁ - P₂ < 0

4. Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$$

When H₀ is true, Z is distributed approximately as the standard normal.

Page 196: Introduction to Biostatistics-145 Lectures4

5. Decision Rule:
i) If H_A: P₁ ≠ P₂
Reject H₀ if Z > Z_{1-α/2} or Z < -Z_{1-α/2}

ii) If H_A: P₁ > P₂
Reject H₀ if Z > Z_{1-α}

iii) If H_A: P₁ < P₂
Reject H₀ if Z < -Z_{1-α}

Note: Z_{1-α/2}, Z_{1-α}, Z_α are tabulated values obtained from Table D.

6. Conclusion: reject or fail to reject H₀.

Page 197: Introduction to Biostatistics-145 Lectures4

Example 7.6.1 page 262
Noonan is a genetic condition that can affect the heart, growth, blood clotting, and mental and physical development. Noonan examined the stature of men and women with Noonan. The study contained 29 male and 44 female adults. One of the cut-off values used to assess stature was the third percentile of adult height. Eleven of the males fell below the third percentile of adult male height, while 24 of the females fell below the third percentile of female adult height. Does this study provide sufficient evidence for us to conclude that, among subjects with Noonan, females are more likely than males to fall below the respective third percentile of adult height? Let α = 0.05.

Solution:
1. Data: n_M = 29, n_F = 44, x_M = 11, x_F = 24, α = 0.05.

$$\bar{p} = \frac{x_M + x_F}{n_M + n_F} = \frac{11 + 24}{29 + 44} = 0.479, \qquad \hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379, \qquad \hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$$

Page 198: Introduction to Biostatistics-145 Lectures4

2. Assumption: The two populations are independent.

3. Hypotheses: Case II
H₀: P_F = P_M → P_F - P_M = 0
H_A: P_F > P_M → P_F - P_M > 0

4. Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}} = \frac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$$

5. Decision Rule: Reject H₀ if Z > Z_{1-α}, where Z_{1-α} = Z_{1-0.05} = Z_{0.95} = 1.645.

6. Conclusion: Fail to reject H₀ since Z = 1.39 < Z_{1-α} = 1.645.
Equivalently, the p-value = 0.0823 > α, so we fail to reject H₀.
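The two-proportion calculation for Example 7.6.1 can be sketched in Python using the counts from the example. The variable names are illustrative, and the pooled proportion is used in the standard error as in the test statistic above.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 7.6.1: subjects below the third percentile of adult height
x_f, n_f = 24, 44   # females
x_m, n_m = 11, 29   # males
alpha = 0.05

p_hat_f = x_f / n_f
p_hat_m = x_m / n_m
p_bar = (x_f + x_m) / (n_f + n_m)   # pooled proportion under H0

# Z statistic under H0: P_F - P_M = 0
se = sqrt(p_bar * (1 - p_bar) / n_f + p_bar * (1 - p_bar) / n_m)
z = ((p_hat_f - p_hat_m) - 0) / se

# Right-tailed test (H_A: P_F > P_M)
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)
reject = z > z_crit
```

The statistic is about 1.39 with a one-sided p-value near 0.082, so `reject` is `False` at α = 0.05.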

Page 199: Introduction to Biostatistics-145 Lectures4

Exercises:
Questions: Pages 234-237
7.2.1, 7.8.2, 7.3.1, 7.3.6, 7.5.2, 7.6.1

H.W: 7.2.8, 7.2.9, 7.2.11, 7.2.15, 7.3.7, 7.3.8, 7.3.10, 7.5.3, 7.6.4

Page 200: Introduction to Biostatistics-145 Lectures4

Chapter 9
Statistical Inference and the Relationship Between Two Variables

Prepared By: Dr. Shuhrat Khan

Page 201: Introduction to Biostatistics-145 Lectures4

REGRESSION, CORRELATION AND ANALYSIS OF VARIANCE

• Regression, correlation and analysis of covariance are all statistical techniques that use the idea that one variable, say Y, may be related to one or more other variables through an equation. Here we consider the relationship of two variables only, in a linear form, which is called linear regression and linear correlation, or simple regression and correlation. The relationships between more than two variables, called multiple regression and correlation, will be considered later.

• Simple regression uses the relationship between the two variables to obtain information about one variable by knowing the values of the other. The equation showing this type of relationship is called the simple linear regression equation. The related method of correlation is used to measure how strong the relationship between the two variables is.

EQUATION OF REGRESSION

Page 202: Introduction to Biostatistics-145 Lectures4

Line of Regression

• Simple Linear Regression:
• Suppose that we are interested in a variable Y, but we want to know about its relationship to another variable X, or we want to use X to predict (or estimate) the value of Y that might be obtained without actually measuring it, provided the relationship between the two can be expressed by a line. X is usually called the independent variable and Y is called the dependent variable.
• We assume that the values of variable X are either fixed or random. By fixed, we mean that the values are chosen by the researcher: either an experimental unit (patient) is given this value of X (such as the dosage of a drug), or a unit (patient) is chosen which is known to have this value of X.
• By random, we mean that units (patients) are chosen at random from all the possible units, and both variables X and Y are measured.
• We also assume that for each value x of X there is a whole range or population of possible Y values, and that the mean of the Y population at X = x, denoted by μ_{y/x}, is a linear function of x. That is,
• μ_{y/x} = α + βx

Dependent variable and independent variable: two random variables, or a bivariate random variable.

Page 203: Introduction to Biostatistics-145 Lectures4

ESTIMATION

We select a sample of n observations (xᵢ, yᵢ) from the population, with the goals:
• Estimate α and β.
• Predict the value of Y at a given value x of X.
• Make tests to draw conclusions about the model and its usefulness.

• We estimate the parameters α and β by 'a' and 'b' respectively, using the sample regression line:
• Ŷ = a + bx
• where a and b are calculated from the sample.

Page 204: Introduction to Biostatistics-145 Lectures4

ESTIMATION AND CALCULATION OF CONSTANTS 'a' AND 'b'

$$b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x}$$

Page 205: Introduction to Biostatistics-145 Lectures4

EXAMPLE
• Investigators at a sports health centre are interested in the relationship between oxygen consumption and exercise time in athletes recovering from injury. Appropriate mechanics for exercising and measuring oxygen consumption are set up, and the results are presented below:

Page 206: Introduction to Biostatistics-145 Lectures4

x variable: exercise time (min)    y variable: oxygen consumption
0.5                                620
1.0                                630
1.5                                800
2.0                                840
2.5                                840
3.0                                870
3.5                                1010
4.0                                940
4.5                                950
5.0                                1130

Page 207: Introduction to Biostatistics-145 Lectures4

Calculations
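As a minimal sketch of these calculations for the exercise-time and oxygen-consumption data, the following Python assumes the usual least-squares estimates, b = S_xy / S_xx and a = ȳ - b·x̄. The variable names are illustrative and not taken from the lecture.

```python
# Data from the example: exercise time (min) and oxygen consumption
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [620, 630, 800, 840, 840, 870, 1010, 940, 950, 1130]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Corrected sums of products and squares
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

# Least-squares slope and intercept of the sample regression line
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

def predict(x_new):
    """Predicted oxygen consumption at exercise time x_new (in minutes)."""
    return a + b * x_new
```

For these data the slope is about 97.8 and the intercept about 594, so the fitted line is roughly Ŷ = 594 + 97.8x.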

Page 208: Introduction to Biostatistics-145 Lectures4

Pearson's Correlation Coefficient
• With the aid of Pearson's correlation coefficient (r), we can determine the strength and the direction of the relationship between X and Y variables,
• both of which have been measured, and both of which must be quantitative.
• For example, we might be interested in examining the association between height and weight for the following sample of eight children:

Page 209: Introduction to Biostatistics-145 Lectures4

Heights and weights of 8 children

Child     Height (inches) X   Weight (pounds) Y
A         49                  81
B         50                  88
C         53                  87
D         55                  99
E         60                  91
F         55                  89
G         60                  95
H         50                  90
Average   54                  90

Page 210: Introduction to Biostatistics-145 Lectures4

Scatter plot for the 8 children
[Figure: scatter plot of weight (pounds, vertical axis from 0 to 120) against height (inches, horizontal axis from 0 to 70) for the eight children in the table.]

Page 211: Introduction to Biostatistics-145 Lectures4

Table: The Strength of a Correlation

Value of r (positive or negative)   Meaning
0.00 to 0.19                        A very weak correlation
0.20 to 0.39                        A weak correlation
0.40 to 0.69                        A modest correlation
0.70 to 0.89                        A strong correlation
0.90 to 1.00                        A very strong correlation

Page 212: Introduction to Biostatistics-145 Lectures4

FORMULA FOR CORRELATION COEFFICIENT (r)

• With Pearson's r,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

• The sum Σ(X - X̄)(Y - Ȳ) means that we add the products of the deviations to see if the positive products or the negative products are more abundant and sizable. Positive products indicate cases in which the variables go in the same direction (that is, both taller and heavier than average, or both shorter and lighter than average);
• negative products indicate cases in which the variables go in opposite directions (that is, taller but lighter than average, or shorter but heavier than average).

Page 213: Introduction to Biostatistics-145 Lectures4

Computational Formula for Pearson's Correlation Coefficient r

$$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$$

where SP (sum of the products), SSx (sum of the squares for x) and SSy (sum of the squares for y) can be computed as follows:

$$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \qquad SS_x = \sum X^2 - \frac{(\sum X)^2}{n}, \qquad SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

Page 214: Introduction to Biostatistics-145 Lectures4

Child    X     Y     X²     Y²     XY
A        12    12    144    144    144
B        10     8    100     64     80
C         6    12     36    144     72
D        16    11    256    121    176
E         8    10     64    100     80
F         9     8     81     64     72
G        12    16    144    256    192
H        11    15    121    225    165
∑        84    92    946   1118    981
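As a rough check on the table, the column totals can be fed into the computational formula. The sketch below assumes the usual forms SP = Σxy − (Σx)(Σy)/n, SSx = Σx² − (Σx)²/n and SSy = Σy² − (Σy)²/n; the function name `pearson_r_from_sums` is illustrative and not taken from the lecture.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from column totals (assumed computational formula)."""
    sp = sum_xy - sum_x * sum_y / n      # SP: sum of products
    ss_x = sum_x2 - sum_x ** 2 / n       # SSx: sum of squares for x
    ss_y = sum_y2 - sum_y ** 2 / n       # SSy: sum of squares for y
    return sp / math.sqrt(ss_x * ss_y)

# Totals from the last row of the table for children A to H
r_children = pearson_r_from_sums(n=8, sum_x=84, sum_y=92,
                                 sum_x2=946, sum_y2=1118, sum_xy=981)
print(round(r_children, 3))
```

With these totals SP = 15, SSx = 64 and SSy = 60, so r comes out at roughly 0.24, a weak positive correlation on the scale given earlier.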

Page 215: Introduction to Biostatistics-145 Lectures4

Table 2: Chest circumference and birth weight of 10 babies

x (cm)    y (kg)    x²         y²       xy
22.4      2.00      501.76     4.00     44.8
27.5      2.25      756.25     5.06     61.88
28.5      2.10      812.25     4.41     59.85
28.5      2.35      812.25     5.52     66.98
29.4      2.45      864.36     6.00     72.03
29.4      2.50      864.36     6.25     73.5
30.5      2.80      930.25     7.84     85.4
32.0      2.80      1024.0     7.84     89.6
31.4      2.55      985.96     6.50     80.07
32.5      3.00      1056.25    9.00     97.5
TOTAL
292.1     24.8      8607.69    62.42    731.61

Page 216: Introduction to Biostatistics-145 Lectures4

Checking for significance

• There appears to be a strong correlation between chest circumference and birth weight in babies.
• We need to check that such a correlation is unlikely to have arisen by chance in a sample of ten babies.
• Tables are available that give the significant values of the correlation coefficient at two probability levels.
• First we need to work out the degrees of freedom. They are the number of pairs of observations less two, that is (n − 2) = 8.
• Looking at the table, we find that our calculated value of 0.86 exceeds the tabulated value of 0.765 at 8 df and p = 0.01. Our correlation is therefore statistically highly significant.
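The check above can be sketched in a few lines. This is only an illustration: it recomputes r from the raw x and y columns of Table 2, assuming the usual computational formula for SP, SSx and SSy, and it takes the critical value 0.765 as quoted on the slide rather than looking it up.

```python
import math

# Raw data from Table 2 (chest circumference in cm, birth weight in kg)
x = [22.4, 27.5, 28.5, 28.5, 29.4, 29.4, 30.5, 32.0, 31.4, 32.5]
y = [2.00, 2.25, 2.10, 2.35, 2.45, 2.50, 2.80, 2.80, 2.55, 3.00]
n = len(x)

sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
r = sp / math.sqrt(ss_x * ss_y)

df = n - 2                 # number of pairs less two
critical_r = 0.765         # tabulated value at 8 df, p = 0.01 (from the slide)
significant = abs(r) > critical_r

print(f"r = {r:.2f}, df = {df}, significant at 0.01: {significant}")
```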

Page 217: Introduction to Biostatistics-145 Lectures4

Chapter 12
Analysis of Frequency Data
An Introduction to the Chi-Square Distribution

Prepared By: Dr. Shuhrat Khan

Page 218: Introduction to Biostatistics-145 Lectures4

TESTS OF INDEPENDENCE

• A test of independence is used to test whether two criteria of classification are independent. For example, we may ask whether socioeconomic status and area of residence of people in a city are independent.
• We divide our sample according to status (low, medium, high income, etc.), and the same sample is categorized according to area (urban, rural, suburban, slums, etc.).
• Put the first criterion (socioeconomic status) in columns, one column per category, and the second criterion (area of the city) in rows, one row per category.

Page 219: Introduction to Biostatistics-145 Lectures4

The Contingency Table

Table: Two-Way Classification of Sample

Second         First Criterion of Classification →
Criterion ↓    1      2      3      …      c      Total
1              N11    N12    N13    …      N1c    N1.
2              N21    N22    N23    …      N2c    N2.
3              N31    N32    N33    …      N3c    N3.
.              .      .      .      …      .      .
.              .      .      .      …      .      .
r              Nr1    Nr2    Nr3    …      Nrc    Nr.
Total          N.1    N.2    N.3    …      N.c    N

Page 220: Introduction to Biostatistics-145 Lectures4

Observed versus Expected Frequencies

• $O_{ij}$: The frequencies in the ith row and jth column of a contingency table are called observed frequencies; they result from cross-classifying the sample according to the two criteria.
• $e_{ij}$: The expected frequencies, under the assumption that the two criteria are independent, are calculated by multiplying the two marginal totals of a cell (its row total and its column total) and then dividing by the total frequency.

Formula:

$$ e_{ij} = \frac{(N_{i.})(N_{.j})}{N} $$

Page 221: Introduction to Biostatistics-145 Lectures4

Chi-square Test

• After calculating the expected frequencies, prepare a table of expected frequencies and compute the chi-square statistic:

$$ \chi^2 = \sum_{i=1}^{k} \left[ \frac{(o_i - e_i)^2}{e_i} \right] $$

where the summation is over all r × c = k cells.
• D.F.: the degrees of freedom for using the table are (r−1)(c−1), at the α level of significance.
• Note that the test is always one-sided.

Page 222: Introduction to Biostatistics-145 Lectures4

Example 12.4.1 (page 613)

The researchers are interested in determining whether preconception use of folic acid and race are independent. The data are:

Observed Frequencies Table

          Use of Folic Acid
Race      Yes     No      Total
White     260     299     559
Black      15      41      56
Other       7      14      21
Total     282     354     636

Expected Frequencies Table

Race      Yes                         No                          Total
White     (282)(559)/636 = 247.86     (354)(559)/636 = 311.14     559
Black     (282)(56)/636 = 24.83       (354)(56)/636 = 31.17       56
Other     (282)(21)/636 = 9.31        (354)(21)/636 = 11.69       21
Total     282                         354                         636
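The expected frequencies in the table follow directly from the marginal totals (row total times column total, divided by N). A minimal sketch, using the observed counts above and illustrative variable names:

```python
# Observed counts: race -> (Yes, No) for preconception use of folic acid
observed = {"White": (260, 299), "Black": (15, 41), "Other": (7, 14)}

row_totals = {race: sum(cells) for race, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
n = sum(row_totals.values())

# e_ij = (row total)(column total) / N
expected = {
    race: tuple(row_totals[race] * col_totals[j] / n for j in range(2))
    for race in observed
}

for race, cells in expected.items():
    print(race, [round(e, 2) for e in cells])
```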

Page 223: Introduction to Biostatistics-145 Lectures4

Calculations and Testing

• Data: See the given table.
• Assumption: Simple random sample.
• Hypotheses: H0: race and use of folic acid are independent. HA: the two variables are not independent. Let α = 0.05.
• Test statistic: the chi-square statistic given earlier.
• Distribution: when H0 is true, the test statistic follows a chi-square distribution with (r−1)(c−1) = (3−1)(2−1) = 2 d.f.
• Decision rule: Reject H0 if the computed value of $\chi^2$ is greater than $\chi^2_{\alpha,(r-1)(c-1)} = 5.991$.
• Calculations:

$$ \chi^2 = \frac{(260-247.86)^2}{247.86} + \frac{(299-311.14)^2}{311.14} + \cdots + \frac{(14-11.69)^2}{11.69} = 9.091 $$

Page 224: Introduction to Biostatistics-145 Lectures4

Conclusion

• Statistical decision: We reject H0 since 9.08960 > 5.991.
• Conclusion: We conclude that H0 is false, and that there is a relationship between race and preconception use of folic acid.
• P value: Since 7.378 < 9.08960 < 9.210, 0.01 < p < 0.025.
• We also reject the hypothesis at the 0.025 level of significance but do not reject it at the 0.01 level.
• Solve Ex 12.4.1 and 12.4.5 (p. 620 and p. 622).
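The whole test can be reproduced as a short sketch. It uses the observed folic acid counts and the critical value 5.991 quoted in the lecture; the closed-form p value `exp(-x/2)` is an addition here and is valid only because the test has exactly 2 degrees of freedom.

```python
import math

# Rows: White, Black, Other; columns: Yes, No
observed = [[260, 299], [15, 41], [7, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991                  # chi-square table, alpha = 0.05, 2 d.f.
reject_h0 = chi_square > critical_value

# For 2 d.f. only, the upper-tail probability is exp(-x/2)
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
```

The unrounded statistic is about 9.09, and the p value of about 0.011 falls inside the interval 0.01 < p < 0.025 read from the table.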

Page 225: Introduction to Biostatistics-145 Lectures4

ODDS RATIO

• In a retrospective study, samples are selected from those who have the disease, called 'cases', and those who do not have the disease, called 'controls'. The investigator looks back (takes a retrospective look) at the subjects and determines which ones have (or had) and which ones do not have (or did not have) the risk factor.
• The data are classified into a 2×2 table. To compare cases and controls with respect to the risk factor, the ODDS RATIO is calculated.
• ODDS are defined to be the ratio of the probability of success to the probability of failure.
• The estimate of the population odds ratio is

$$ \widehat{OR} = \frac{a/b}{c/d} = \frac{ad}{bc} $$

Page 226: Introduction to Biostatistics-145 Lectures4

ODDS RATIO

where a, b, c and d are the numbers given in the following table:

                Sample
Risk Factor     Cases     Controls    Total
Present         a         b           a + b
Absent          c         d           c + d
Total           a + c     b + d

We may construct a 100(1−α)% CI for OR by the formula:

$$ \widehat{OR}^{\,1 \pm (z_{\alpha/2}/\sqrt{X^2})} $$

Page 227: Introduction to Biostatistics-145 Lectures4

Example 12.7.2 for Odds Ratio

Example 12.7.2 (page 640): The data relate to the obesity status of children aged 5-6 and the smoking status of their mothers during pregnancy.

                                     Obesity status
Smoking status (during pregnancy)    Cases    Non-cases    Total
Smoked throughout                    64       342          406
Never smoked                         68       3496         3564
Total                                132      3838         3970

Hence the OR for the table is:

$$ \widehat{OR} = \frac{(64)(3496)}{(342)(68)} = 9.62 $$
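The estimate can be reproduced with a small helper. The function name `odds_ratio` is illustrative; the cell labels a, b, c, d follow the 2×2 layout used in the lecture (risk factor present or absent by cases or controls).

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio (a/b)/(c/d) = ad/bc for a 2x2 table."""
    return (a * d) / (b * c)

# a = smoked, obese; b = smoked, non-obese; c = never smoked, obese; d = never smoked, non-obese
or_hat = odds_ratio(a=64, b=342, c=68, d=3496)
print(round(or_hat, 2))
```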

Page 228: Introduction to Biostatistics-145 Lectures4

Confidence Interval for Odds Ratio

The (1−α)100% confidence interval for the odds ratio is:

$$ \widehat{OR}^{\,1 \pm (z_{\alpha/2}/\sqrt{X^2})} $$

where

$$ X^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} $$

and n = a + b + c + d.

For Example 12.7.2 we have a = 64, b = 342, c = 68, d = 3496; therefore:

$$ X^2 = \frac{3970[(64)(3496)-(342)(68)]^2}{(132)(3838)(406)(3564)} = 217.68 $$

Its 95% CI is:

$$ 9.62^{\,1 \pm (1.96/\sqrt{217.6831})} $$

or (7.12, 13.00).
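A minimal sketch of this interval, assuming n = a + b + c + d and taking z = 1.96 for a 95% interval as in the lecture; the variable names are illustrative.

```python
import math

a, b, c, d = 64, 342, 68, 3496
n = a + b + c + d                      # 3970 children in total

or_hat = (a * d) / (b * c)
x2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

z = 1.96                               # two-sided 95% confidence
exponent_half_width = z / math.sqrt(x2)
lower = or_hat ** (1 - exponent_half_width)
upper = or_hat ** (1 + exponent_half_width)

print(f"X^2 = {x2:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the exponent, it is not symmetric around the point estimate 9.62.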

Page 229: Introduction to Biostatistics-145 Lectures4

Interpretation of Example 12.7.2 Data

• The 95% confidence interval (7.12, 13.00) means that we are 95% confident that the population odds ratio is somewhere between 7.12 and 13.00.
• Since the interval does not contain 1, and in fact contains only values larger than 1, we conclude that, in the population, obese children (cases) are more likely than non-obese children (non-cases) to have had a mother who smoked throughout the pregnancy.
• Solve Ex 12.7.4 (page 646).

Page 230: Introduction to Biostatistics-145 Lectures4

Interpretation of ODDS RATIO

• The sample odds ratio provides an estimate of the population relative risk in the case of a rare disease.
• The odds ratio can assume values between 0 and ∞.
• A value of 1 indicates no association between the risk factor and disease status.
• A value greater than 1 indicates increased odds of having the disease among subjects in whom the risk factor is present.

Page 231: Introduction to Biostatistics-145 Lectures4

Chapter 13
Special Techniques for Use When Population Parameters and/or Population Distributions Are Unknown
pages 683-689

Prepared By: Dr. Shuhrat Khan

Page 232: Introduction to Biostatistics-145 Lectures4

NON-PARAMETRIC STATISTICS

• The t-test, z-test, etc. were all parametric tests, as they were based on assumptions of normality or known variances.
• When we make no assumptions about the sampled population or about the population parameters, the tests are called non-parametric and distribution-free.

Page 233: Introduction to Biostatistics-145 Lectures4

ADVANTAGES OF NON-PARAMETRIC STATISTICS

• Testing hypotheses about simple statements (not involving parameter values), e.g.:
  • The two criteria are independent (test for independence).
  • The data fit a given distribution well (goodness-of-fit test).
• Distribution free: Non-parametric tests may be used when the form of the sampled population is unknown.
• Computationally easy.
• Analysis is possible for ranking or categorical data (data which are not based on a measurement scale).

Page 234: Introduction to Biostatistics-145 Lectures4

The Sign Test

• This test is used as an alternative to the t-test when the normality assumption is not met.
• The only assumption is that the distribution of the underlying variable (data) is continuous.
• The test focuses on the median rather than the mean.
• The test is based on signs: pluses and minuses.
• The test is used for one sample as well as for two samples.

Page 235: Introduction to Biostatistics-145 Lectures4

Example (One-Sample Sign Test)

Scores of 10 mentally retarded girls:

Girl     1    2    3    4    5    6    7    8    9    10
Score    4    5    8    8    9    6    10   7    6    6

We wish to know if the median of the population is different from 5.

Solution:
• Data: The data are the scores of 10 mentally retarded girls.
• Assumption: The measurements are of a continuous variable.

Page 236: Introduction to Biostatistics-145 Lectures4

Continued...

• Hypotheses: H0: The population median is 5. HA: The population median is not 5. Let α = 0.05.
• Test statistic: The test statistic for the sign test is either the observed number of plus signs or the observed number of minus signs. The nature of the alternative hypothesis determines which of these test statistics is appropriate. In a given test, any one of the following alternative hypotheses is possible:
  HA: P(+) > P(−)   one-sided alternative
  HA: P(+) < P(−)   one-sided alternative
  HA: P(+) ≠ P(−)   two-sided alternative

Page 237: Introduction to Biostatistics-145 Lectures4

Continued...

• If the alternative hypothesis is HA: P(+) > P(−), a sufficiently small number of minus signs causes rejection of H0. The test statistic is the number of minus signs.
• If the alternative hypothesis is HA: P(+) < P(−), a sufficiently small number of plus signs causes rejection of H0. The test statistic is the number of plus signs.
• If the alternative hypothesis is HA: P(+) ≠ P(−), either a sufficiently small number of plus signs or a sufficiently small number of minus signs causes rejection of the null hypothesis. We may take as the test statistic the less frequently occurring sign.

Page 238: Introduction to Biostatistics-145 Lectures4

Continued...

• Distribution of test statistic: We assign a plus sign to those scores that lie above the hypothesized median and a minus sign to those that fall below. A score equal to the hypothesized median is recorded as 0 and dropped from the analysis. When H0 is true, the number of plus (or minus) signs follows a binomial distribution with p = 0.5.
• Decision rule: Let k = the smaller of the number of plus signs and the number of minus signs. Here k = 1, the number of minus signs. For HA: P(+) > P(−), reject H0 if, when H0 is true, the probability of observing k or fewer minus signs is less than or equal to α.

Girl                            1    2    3    4    5    6    7    8    9    10
Score relative to median = 5    −    0    +    +    +    +    +    +    +    +

Page 239: Introduction to Biostatistics-145 Lectures4

Continued...

• For HA: P(+) > P(−), reject H0 if, when H0 is true, the probability of observing k or fewer minus signs is less than or equal to α.
• For HA: P(+) < P(−), reject H0 if, when H0 is true, the probability of observing k or fewer plus signs is less than or equal to α.
• For HA: P(+) ≠ P(−), reject H0 if (given that H0 is true) the probability of obtaining a value of k as extreme as or more extreme than was actually computed is less than or equal to α/2.
• Calculation of test statistic: The probability of observing k or fewer minus signs, given a sample of size n and parameter p, is found by evaluating the following expression, where q = 1 − p:

$$ P(X \le k \mid n, p) = \sum_{x=0}^{k} {}_{n}C_{x}\, p^{x} q^{n-x} $$

Page 240: Introduction to Biostatistics-145 Lectures4

Continued...

For our example we would compute

$$ {}_{9}C_{0}(0.5)^{0}(0.5)^{9-0} + {}_{9}C_{1}(0.5)^{1}(0.5)^{9-1} = 0.00195 + 0.01758 = 0.0195 $$

• Statistical decision: In Appendix Table B we find P(k ≤ 1 | 9, 0.5) = 0.0195.
• Conclusion: Since 0.0195 is less than 0.025, we reject the null hypothesis and conclude that the median score is not 5.
• p value: The p value for this test is 2(0.0195) = 0.0390, because it is a two-sided test.
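The one-sample sign test above can be sketched end to end. The code assumes, as in the example, that scores equal to the hypothesized median are dropped and that the count of the less frequent sign is binomial with p = 0.5 under H0; variable names are illustrative.

```python
from math import comb

scores = [4, 5, 8, 8, 9, 6, 10, 7, 6, 6]
hypothesized_median = 5

plus = sum(s > hypothesized_median for s in scores)
minus = sum(s < hypothesized_median for s in scores)
n = plus + minus                 # scores equal to the median are discarded
k = min(plus, minus)             # less frequently occurring sign

# P(X <= k | n, 0.5): cumulative binomial probability
p_one_tail = sum(comb(n, x) for x in range(k + 1)) * 0.5 ** n
p_value = min(1.0, 2 * p_one_tail)   # two-sided alternative

print(f"n = {n}, k = {k}, P(X <= k) = {p_one_tail:.4f}, p = {p_value:.4f}")
```

The unrounded two-sided p value is 0.0391; the lecture's 0.0390 comes from doubling the rounded table value 0.0195.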

Page 241: Introduction to Biostatistics-145 Lectures4

SIGN TEST----Paired Data

• This is used as an alternative to the t-test for paired observations when the underlying assumptions of the t-test are not met.
• Null hypothesis to be tested: the median difference is zero, or P(Xi > Yi) = P(Yi > Xi).
• Subtract Yi from Xi. If Yi is less than Xi, the sign of the difference is (+); if Yi is greater than Xi, the sign of the difference is (−), so that H0: P(+) = P(−) = 0.5.
• Test statistic: as before, k is the number of occurrences of the less frequent sign (plus or minus).

Page 242: Introduction to Biostatistics-145 Lectures4

SIGN TEST----Example 13.3.2

A dental research team matched 24 patients into 12 pairs on age, sex and intelligence. Six months later, a random evaluation showed the following scores (a low score indicates a higher level of hygiene):

Pair no.              1     2     3     4     5     6     7     8     9     10    11    12
Instructed            1.5   2.0   3.5   3.0   3.5   2.5   2.0   1.5   1.5   2.0   3.0   2.0
Not instructed        2.0   2.0   4.0   2.5   4.0   3.0   3.5   3.0   2.5   2.5   2.5   2.5
Sign of difference    −     0     −     +     −     −     −     −     −     −     +     −

1. Data: Scores of dental hygiene. In each pair, one member was instructed how to brush and the other remained uninstructed.
2. Assumption: The distribution of the variable is continuous.
3. Hypotheses: H0: The median of the differences is zero [P(+) = P(−) = 0.5]. HA: The median of the differences is negative [P(+) < P(−)].

Page 243: Introduction to Biostatistics-145 Lectures4

Continued...

Let α be 0.05.
4. Test statistic: The test statistic is the number of plus signs, which is the less frequently occurring sign, i.e. k = 2.
5. Distribution: k is binomial with n = 11 (as one observation is discarded) and p = 0.5.
6. Decision rule: Reject H0 if P(k ≤ 2 | 11, 0.5) ≤ 0.05.
7. Calculations:

$$ P(k \le 2 \mid 11, 0.5) = \sum_{k=0}^{2} {}_{11}C_{k}\,(0.5)^{k}(0.5)^{11-k} $$

Table B or direct calculation shows that the probability is 0.0327, which is less than 0.05, so we must reject H0.
8. Conclusion: The median difference is negative, and the instructions are beneficial.
9. p value: Since it is a one-sided test, the p value is p = 0.0327.
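The paired version can be sketched the same way, using the scores from Example 13.3.2. The pair with a zero difference is dropped, and the one-sided p value is the binomial probability of observing k or fewer plus signs; variable names are illustrative.

```python
from math import comb

instructed     = [1.5, 2.0, 3.5, 3.0, 3.5, 2.5, 2.0, 1.5, 1.5, 2.0, 3.0, 2.0]
not_instructed = [2.0, 2.0, 4.0, 2.5, 4.0, 3.0, 3.5, 3.0, 2.5, 2.5, 2.5, 2.5]

# Sign of (instructed - not instructed) for each pair
plus = sum(x > y for x, y in zip(instructed, not_instructed))
minus = sum(x < y for x, y in zip(instructed, not_instructed))
n = plus + minus          # the tied pair is discarded
k = plus                  # HA: P(+) < P(-), so count the plus signs

# P(k <= 2 | 11, 0.5)
p_value = sum(comb(n, x) for x in range(k + 1)) / 2 ** n

alpha = 0.05
print(f"n = {n}, k = {k}, p = {p_value:.4f}, reject H0: {p_value <= alpha}")
```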

Page 245: Introduction to Biostatistics-145 Lectures4

EXAMPLE 1

Cardiac output (liters/minute) was measured by thermodilution in a simple random sample of 15 postcardiac surgical patients in the left lateral position. The results were as follows:

4.91  4.10  6.74  7.27  7.42  7.50  6.56  4.64  5.98  3.14  3.23  5.80  6.17  5.39  5.77

We wish to know if we can conclude on the basis of these data that the population mean is different from 5.05.

Solution:
1. Data: As given above.
2. Assumptions: We assume that the requirements for the application of the Wilcoxon signed-ranks test are met.
3. Hypotheses: H0: µ = 5.05. HA: µ ≠ 5.05. Let α = 0.05.

Page 246: Introduction to Biostatistics-145 Lectures4

EXAMPLE 1

4. Test statistic: The test statistic will be T+ or T−, whichever is smaller; it is called the test statistic T.
5. Distribution of test statistic: Critical values of the test statistic are given in Table K of the Appendix.
6. Decision rule: We will reject H0 if the computed value of T is less than or equal to 25, the critical value for n = 15 and α/2 = 0.0240, the closest value to 0.0250 in Table K.
7. Calculation of test statistic: The calculation of the test statistic is shown in the table.
8. Statistical decision: Since 34 is greater than 25, we are unable to reject H0.

Page 247: Introduction to Biostatistics-145 Lectures4

Cardiac output   di = xi - 5.05   Rank of |di|   Signed rank of |di|
4.91             -0.14             1              -1
4.10             -0.95             7              -7
6.74             +1.69            10             +10
7.27             +2.22            13             +13
7.42             +2.37            14             +14
7.50             +2.45            15             +15
6.56             +1.51             9              +9
4.64             -0.41             3              -3
5.98             +0.93             6              +6
3.14             -1.91            12             -12
3.23             -1.82            11             -11
5.80             +0.75             5              +5
6.17             +1.12             8              +8
5.39             +0.34             2              +2
5.77             +0.72             4              +4

T+ = 86, T- = 34, T = 34
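The signed-rank calculation in the table can be sketched in a few lines of Python. This is a minimal illustration, not part of the textbook; the function name `wilcoxon_signed_rank` is our own, and the sketch assumes that zero differences are dropped and that tied absolute differences share their average rank.

```python
def wilcoxon_signed_rank(values, hypothesized):
    """Return (T_plus, T_minus, T) for a one-sample Wilcoxon signed-rank test."""
    # Differences from the hypothesized value; zero differences are dropped.
    diffs = [x - hypothesized for x in values if x != hypothesized]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of (numerically) tied absolute differences.
        j = i
        while (j + 1 < len(order)
               and abs(abs(diffs[order[j + 1]]) - abs(diffs[order[i]])) < 1e-9):
            j += 1
        average_rank = (i + 1 + j + 1) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)


cardiac_output = [4.91, 4.10, 6.74, 7.27, 7.42, 7.50, 6.56, 4.64,
                  5.98, 3.14, 3.23, 5.80, 6.17, 5.39, 5.77]
t_plus, t_minus, t = wilcoxon_signed_rank(cardiac_output, 5.05)
print(t_plus, t_minus, t)
```

For the cardiac output data this reproduces T+ = 86, T- = 34, and T = 34 from the table.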

Page 248: Introduction to Biostatistics-145 Lectures4


EXAMPLE 1

8. Statistical decision. Since 34 is greater than 25, we are unable to reject H0.
9. Conclusion. We conclude that the population mean may be 5.05.
10. p value. From Table K we see that the p value is p = 2(0.0757) = 0.1514.
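The tabulated probability 0.0757 can be checked by building the exact null distribution of the signed-rank statistic. The sketch below is our own illustration, not taken from the textbook; the function name `signed_rank_cdf` is hypothetical, and the code assumes no ties and no zero differences, so that under H0 each rank 1, ..., n is equally likely to carry a plus or a minus sign.

```python
def signed_rank_cdf(t, n):
    """P(T+ <= t) under H0 for n untied, nonzero differences."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1, ..., n} that sum to s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Each of the 2**n sign assignments is equally likely under H0.
    return sum(counts[: int(t) + 1]) / 2 ** n


one_sided = signed_rank_cdf(34, 15)
two_sided = 2 * one_sided
print(round(one_sided, 4), round(two_sided, 4))
```

By symmetry the same probability applies to T-, so doubling the one-sided value gives the two-sided p value used in step 10.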

Page 249: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

A researcher designed an experiment to assess the effects of prolonged inhalation of cadmium oxide. Fifteen laboratory animals served as experimental subjects, while 10 similar animals served as controls. The variable of interest was hemoglobin level following the experiment. The results are shown in Table 2.
We wish to know if we can conclude that prolonged inhalation of cadmium oxide reduces hemoglobin level.

Page 250: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

TABLE 2. HEMOGLOBIN DETERMINATIONS (GRAMS) FOR 25 LABORATORY ANIMALS

EXPOSED ANIMALS (X)   UNEXPOSED ANIMALS (Y)
14.4                  17.4
14.2                  16.2
13.8                  17.1
16.5                  17.5
14.1                  15.0
16.6                  16.0
15.9                  16.9
15.6                  15.0
14.1                  16.3
15.3                  16.8
15.7
16.7
13.7
15.3
14.0

Page 251: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

Solution:
1. Data. See table above.
2. Assumptions. We presume that the assumptions of the Mann-Whitney test are met.
3. Hypothesis.
H0: Mx ≥ My
HA: Mx < My
where Mx is the median of a population of animals exposed to cadmium oxide and My is the median of a population of animals not exposed to the substance. Suppose we let α = 0.05.

Page 252: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

4. Test Statistic. The test statistic is

$$T = S - \frac{n(n+1)}{2}$$

where n is the number of sample X observations and S is the sum of the ranks assigned to the sample observations from the population of X values. The choice of which sample's values we label as X is arbitrary.

Page 253: Introduction to Biostatistics-145 Lectures4


TABLE 2. ORIGINAL DATA AND RANKS

X      Rank    Y      Rank
13.7    1
13.8    2
14.0    3
14.1    4.5
14.1    4.5
14.2    6
14.4    7
               15.0    8.5
               15.0    8.5
15.3   10.5
15.3   10.5
15.6   12
15.7   13
15.9   14
               16.0   15
               16.2   16
               16.3   17
16.5   18
16.6   19
16.7   20
               16.8   21
               16.9   22
               17.1   23
               17.4   24
               17.5   25

Sum of the X ranks = S = 145

Page 254: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

5. Distribution of test statistic. The critical values are given in Table K.
6. Decision rule. Reject H0: Mx ≥ My if the computed T is less than wα, where wα is the critical value of T found in Table K for n, the number of X observations; m, the number of Y observations; and α, the chosen level of significance.
If the hypotheses were of the type

H0: Mx ≤ My
HA: Mx > My

reject H0: Mx ≤ My if the computed T is greater than w1-α, where w1-α = nm - wα.

Page 255: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

For the two-sided test situation with

H0: Mx = My
HA: Mx ≠ My

reject H0: Mx = My if the computed value of T is either less than wα/2 or greater than w1-α/2, where wα/2 is the critical value of T for n, m, and α/2 given in Appendix II Table K, and w1-α/2 = nm - wα/2.
For this example the decision rule is to reject H0 if the computed value of T is smaller than 45, the critical value of the test statistic for n = 15, m = 10, and α = 0.05 found in Table K.
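The three decision rules can be collected into one small helper. This is only a sketch under our own naming (`mann_whitney_reject`, `w_lower`, and `alternative` are hypothetical names, not from the textbook); the lower critical value must still be read from the table, wα for a one-sided test and wα/2 for a two-sided test. The T values tried at the end are made up purely to exercise the rule.

```python
def mann_whitney_reject(t, n, m, w_lower, alternative):
    """Return True if H0 is rejected under the decision rules above.

    w_lower is the tabulated lower critical value: w_alpha for a
    one-sided test, w_(alpha/2) for a two-sided test.
    """
    # Upper critical value obtained from the lower one: nm - w_lower.
    w_upper = n * m - w_lower
    if alternative == "less":        # HA: Mx < My
        return t < w_lower
    if alternative == "greater":     # HA: Mx > My
        return t > w_upper
    if alternative == "two-sided":   # HA: Mx != My
        return t < w_lower or t > w_upper
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Values from this example: n = 15, m = 10, tabulated w_alpha = 45.
n, m, w_alpha = 15, 10, 45
w_upper = n * m - w_alpha  # 150 - 45 = 105

# Hypothetical T values, chosen only to illustrate the rule.
print(mann_whitney_reject(40, n, m, w_alpha, "less"))
print(mann_whitney_reject(60, n, m, w_alpha, "less"))
```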

Page 256: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

7. Calculation of test statistic. We have S = 145, so that

$$T = 145 - \frac{15(15+1)}{2} = 25$$

8. Statistical decision. When we enter Table K with n = 15, m = 10, and α = 0.05, we find the critical value wα to be 45. Since 25 is less than 45, we reject H0.
9. Conclusion. We conclude that Mx is smaller than My. This leads us to the conclusion that prolonged inhalation of cadmium oxide does reduce the hemoglobin level.

Since 22 < 25 < 30, where 22 and 30 are the tabulated critical values for 0.001 and 0.005, we have for this test 0.005 > p > 0.001.
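As a check on the arithmetic, the rank sum S and the statistic T can be recomputed from the raw hemoglobin data. The sketch below is our own illustration (the function name `mann_whitney_t` is hypothetical); it assumes that tied observations receive the average of the ranks they occupy, as in the ranks table.

```python
def mann_whitney_t(x, y):
    """Return (S, T): the rank sum of the X sample and T = S - n(n + 1)/2."""
    combined = sorted(x + y)
    # Map each distinct value to the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    s = sum(rank_of[value] for value in x)
    n = len(x)
    return s, s - n * (n + 1) / 2


exposed = [14.4, 14.2, 13.8, 16.5, 14.1, 16.6, 15.9, 15.6,
           14.1, 15.3, 15.7, 16.7, 13.7, 15.3, 14.0]
unexposed = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

s, t = mann_whitney_t(exposed, unexposed)
print(s, t)
```

With the exposed animals as the X sample, this gives S = 145 and T = 25, matching step 7.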

Page 257: Introduction to Biostatistics-145 Lectures4


EXAMPLE 2

When either n or m is greater than 20 we cannot use Appendix Table K to obtain critical values for the Mann-Whitney test. When this is the case we may compute

$$z = \frac{T - mn/2}{\sqrt{nm(n+m+1)/12}}$$

and compare the result, for significance, with critical values of the standard normal distribution.
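The large-sample formula is easy to evaluate directly. The following sketch is illustrative only; the function names are our own, and the standard normal probability is obtained from `math.erf` rather than from a table. For lack of a larger data set it is applied to the Example 2 values (n = 15, m = 10, T = 25), where the table would normally be used, simply to show that the approximation points the same way.

```python
import math


def mann_whitney_z(t, n, m):
    """Normal approximation: z = (T - mn/2) / sqrt(nm(n + m + 1)/12)."""
    mean = m * n / 2
    standard_deviation = math.sqrt(n * m * (n + m + 1) / 12)
    return (t - mean) / standard_deviation


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Example 2 values, used here only for illustration.
z = mann_whitney_z(25, 15, 10)
p_lower_tail = standard_normal_cdf(z)  # one-sided p for HA: Mx < My
print(round(z, 2), round(p_lower_tail, 4))
```

The resulting z is about -2.77, and the lower-tail probability falls between 0.001 and 0.005, in line with the bounds obtained from the table.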