Introduction to Biostatistics
DR. SYED SANOWAR ALI
CENTRAL TENDENCY
The centre of the distribution
Or
The most typical case
Measures of CENTRAL TENDENCY
Given a data set, a measure of the CENTRAL TENDENCY is a value about which the observations tend to cluster.
In other words, a measure of the CENTRAL TENDENCY is a value around which a data set is centered.
Measures of CENTRAL TENDENCY
The three most common measures are
• Mean
• Median
• Mode
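Mean: It is the arithmetic average, the value that is closest to all the other values in a distribution in the sense that the sum of squared differences from it is the smallest possible.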
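Sample mean: $\bar{x} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{\sum X}{n}$
Population mean: $\mu = \dfrac{X_1 + X_2 + \cdots + X_N}{N} = \dfrac{\sum X}{N}$
∑ = summation
$\bar{x}$ = X bar
µ = mu
N = total number of values in population
n = total number of values in sample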
Find the mean of the following five salaries: 6000, 10000, 14000, 50000, 10000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 14000, 50000
• Step 2. Add all of the observed values in the distribution. 6000 + 10000 + 10000 + 14000 + 50000 = 90000
• Step 3. Divide the sum by the number of observations. 90000 / 5 = 18000
• Therefore, the mean salary is 18000.
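The salary calculation above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `arithmetic_mean` is chosen here only for the example. Sorting (Step 1) does not change the sum, so the sketch goes straight to Steps 2 and 3.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)          # Step 2: add all observed values
    return total / len(values)   # Step 3: divide by the number of observations


salaries = [6000, 10000, 14000, 50000, 10000]
mean_salary = arithmetic_mean(salaries)
print(mean_salary)  # 18000.0
```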
Properties of Mean
1. One computes the mean by using all the values of the data.
2. The mean is used in computing other statistics, such as variance.
3. The mean for the data set is unique and not necessarily one of the data values.
4. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
Median
Median is the middle value of a set of data that has been put into rank order. The median is also the 50th percentile of the distribution.
Example A: Odd Number of Observations
Find the median of the following: 6000, 10000, 14000, 50000, 10000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 14000, 50000
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (5 + 1) / 2 = 6 / 2 = 3
• Therefore, the median will be the value at the third observation.
• Step 3. Identify the value at the middle position. Third observation = 10000
Example A: Even Number of Observations
Find the median of the following: 6000, 10000, 14000, 50000, 10000, 12000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 12000, 14000, 50000
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (6 + 1) / 2 = 7 / 2 = 3.5
• Step 3. Identify the value at the middle position. The median equals the average of the values of the third (value = 10000) and fourth (value = 12000) observations: Median = (10000 + 12000) / 2 = 11000
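Both median examples can be reproduced with one short function. The sketch below is illustrative only; the name `median_value` is an assumption made for this example. The slides count positions from 1, while Python lists count from 0, so position (n + 1) / 2 becomes index n // 2 for an odd n.

```python
def median_value(values):
    """Return the middle value of the data after ranking it."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)      # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2                  # Step 2: locate the middle (zero-based index)
    if n % 2 == 1:
        return ordered[mid]       # odd n: a single middle observation
    # even n: average the two observations on either side of the middle
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_salaries = [6000, 10000, 14000, 50000, 10000]
even_salaries = [6000, 10000, 14000, 50000, 10000, 12000]
print(median_value(odd_salaries))   # 10000
print(median_value(even_salaries))  # 11000.0
```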
Properties of Median
1. The median is used when one must find the center or middle value.
2. The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution.
3. The median is affected less than the mean by extremely high or extremely low values.
Mode is the value that occurs most often in a set of data. It can be determined simply by tallying the number of times each value occurs.
Mode
In this case salary 10000 is the value that occurs most frequently.
The mode is 10000.
It should be noted that there can be more than one mode for a data set.
Properties of Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest to compute.
3. The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
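The tallying approach can be sketched with `collections.Counter` from the Python standard library. The function name `modes` and the blood-group list are made up for illustration and do not come from the slides; the salary list is the one used earlier. The function returns every value that shares the highest count, which covers the case of more than one mode and works on nominal data as well as numbers.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in first-seen order."""
    if not values:
        return []
    counts = Counter(values)              # tally how often each value occurs
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


salaries = [6000, 10000, 14000, 50000, 10000]
print(modes(salaries))       # [10000]

# Hypothetical nominal data with two modes
blood_groups = ["A", "B", "A", "O", "B"]
print(modes(blood_groups))   # ['A', 'B']
```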
Find the mean of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 30, 31
• Step 2. Add all of the observed values in the distribution. 15 + 22 + 27 + 30 + 31 = 125
• Step 3. Divide the sum by the number of observations. 125 / 5 = 25.0
• Therefore, the mean incubation period is 25.0 days.
Example B: Even Number of Observations
Suppose a sixth case of hepatitis A was reported. Find the median of the following incubation periods for hepatitis A: 27, 31, 15, 30, 22, and 29 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 29, 30, and 31 days
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (6 + 1) / 2 = 7 / 2 = 3½
• Step 3. Identify the value at the middle position. The median equals the average of the values of the third (value = 27) and fourth (value = 29) observations: Median = (27 + 29) / 2 = 28 days
Example B: Find the mode of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 30, and 31 days
• Step 2. Identify the value that occurs most often. None
• Note: When no value occurs more than once, the distribution is said to have no mode.
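The "no mode" convention is worth encoding explicitly, because library functions may treat this case differently. The sketch below assumes Python 3.8 or later for `statistics.multimode`; the helper name `modes_or_none` is hypothetical. When every value occurs once, `multimode` returns all of the values, whereas the helper follows the note above and returns an empty list.

```python
from collections import Counter
from statistics import multimode


def modes_or_none(values):
    """Return the mode(s), or an empty list when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values(), default=0)
    if highest <= 1:
        return []                 # no value occurs more than once: no mode
    return [value for value, count in counts.items() if count == highest]


incubation_days = [27, 31, 15, 30, 22]
no_mode_result = modes_or_none(incubation_days)
library_result = multimode(incubation_days)
print(no_mode_result)   # []
print(library_result)   # [27, 31, 15, 30, 22]
```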
The number of doses of diphtheria-pertussis-tetanus (DPT) vaccine that each of seventeen 2-year-old children in a particular village received:
0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4
Two children received no doses; two children received 1 dose; three received 2 doses; six received 3 doses; and four received all 4 doses.
Therefore, the mode is 3 doses, because more children received 3 doses than any other number of doses.
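The tally for the DPT example can be checked with a frequency count. This is a small sketch using `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

doses = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4]

tally = Counter(doses)                          # doses received -> number of children
mode_doses, mode_count = tally.most_common(1)[0]

for number_of_doses in sorted(tally):
    print(number_of_doses, "dose(s):", tally[number_of_doses], "children")
print("mode:", mode_doses, "doses, received by", mode_count, "children")
```

The loop prints one line per dose count, which mirrors the tally in the text, and the last line reports the most frequent value.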
Which measure of CT should you use?
The Mean is by far the most common measure of CT. It uses all of the information in the sample. This measure is very good when the distribution is symmetrical.
Mean, Median and Mode
Data: 4000, 4500, 5000, 5500, 6000, 6000, 6500, 7000, 7500 and 8000
Mean = 6000
Median = 6000
Mode = 6000
Mean, Median and Mode = Same
[Figure: Normal Distribution or Curve of Salary, with Mean, Median and Mode at the same point]
Which measure of CT should you use?
If the distribution is skewed or there are extreme values, the Mean is artificially pulled towards the extreme value.
Age example: 19, 20, 21, 22, 49. Mean = 26.2 yrs.
Marks example: 05, 55, 57, 63, 66. Mean = 49.2
Which measure of CT should you use?
Age: 19, 20, 21, 22, 49. Mean = 26.2 yrs.
Right skewed or Positively skewed
Which measure of CT should you use?
Marks: 05, 55, 57, 63, 66. Mean = 49.2
Left skewed or Negatively skewed
Which measure of CT should you use?
• If the distribution is skewed or there are extreme values, the Median proves to be a better measure of the CT.
• Median is resistant to extreme observations.
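The age and marks examples show how far an extreme value drags the mean while the median barely moves. The sketch below uses the standard `statistics` module on the two data sets from the slides. Comparing the mean with the median is only a rough indicator of the direction of skew, not a formal test.

```python
import statistics

ages = [19, 20, 21, 22, 49]    # one extremely high value
marks = [5, 55, 57, 63, 66]    # one extremely low value

for label, data in (("Age", ages), ("Marks", marks)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        shape = "mean pulled up (right / positive skew)"
    elif mean < median:
        shape = "mean pulled down (left / negative skew)"
    else:
        shape = "mean equals median"
    print(f"{label}: mean={mean:.1f}, median={median}, {shape}")
```

For the ages the mean is 26.2 against a median of 21, and for the marks the mean is 49.2 against a median of 57, so in both cases the median stays close to the bulk of the data.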
Which measure of CT should you use?
• Mode is commonly used as a measure of popularity that reflects the CT of opinion.
• Examples:
1. Most preferred pain killer
2. Most preferred model of washing machine
3. Most popular candidate
Most fighting cricket team
• Pakistan = 1
• Australia = 2
• India = 3
• England = 4
1, 2, 4, 1, 2, 1, 3, 1, 4, 1,
2, 1, 3, 2, 4, 4, 1, 1, 1, 4,
3, 1, 1, 4, 2, 1, 1, 2, 1, 2,
1, 4, 1, 1, 3, 2, 4, 1, 4, 1
Which measure of CT should you use?
Mean (2.075)
Median (2)
Mode (1)
[Bar chart: number of responses for each team code (19, 8, 4, 9), with the tallest bar marked MODE]
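The three figures for the cricket survey can be reproduced from the coded responses. The sketch below assumes the survey has 40 responses, that is, the four rows of ten codes with the transcript's repeated rows removed; the names `TEAM_CODES` and `votes` are chosen for this example. Because the codes are only labels for teams, the mean and median of the codes have no real interpretation, while the mode identifies the most popular answer.

```python
import statistics
from collections import Counter

TEAM_CODES = {1: "Pakistan", 2: "Australia", 3: "India", 4: "England"}

# Assumed: 40 distinct responses (duplicated rows in the transcript dropped)
responses = [
    1, 2, 4, 1, 2, 1, 3, 1, 4, 1,
    2, 1, 3, 2, 4, 4, 1, 1, 1, 4,
    3, 1, 1, 4, 2, 1, 1, 2, 1, 2,
    1, 4, 1, 1, 3, 2, 4, 1, 4, 1,
]

mean_code = statistics.mean(responses)      # arithmetic on labels: not meaningful
median_code = statistics.median(responses)  # also not meaningful for nominal codes
mode_code = statistics.mode(responses)      # the most frequently chosen team code

votes = Counter(TEAM_CODES[code] for code in responses)
print(mean_code, median_code, mode_code)
print(votes.most_common())
print("Most popular:", TEAM_CODES[mode_code])
```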
Measurement of Dispersion
OR
Measurement of Variation
Range
The range is the simplest measure of variation to find. It is simply the highest value minus the lowest value.
RANGE = MAXIMUM - MINIMUM
Since the range only uses the largest and smallest values, it is greatly affected by extreme values; that is, it is not resistant to change.
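A short sketch makes the sensitivity of the range concrete. It reuses the five salaries from the mean example; the function name `value_range` is an assumption for illustration. Dropping the single extreme salary shrinks the range from 44000 to 8000.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


salaries = [6000, 10000, 14000, 50000, 10000]
without_extreme = [s for s in salaries if s != 50000]

full_range = value_range(salaries)
trimmed_range = value_range(without_extreme)
print(full_range)      # 44000
print(trimmed_range)   # 8000
```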
Variance (σ²)
The Variance is defined as: the average of the squared differences from the Mean.
$\sigma^2 = \dfrac{\sum (X_i - \bar{x})^2}{N - 1}$ (if sample size ≤ 30)
$\sigma^2 = \dfrac{\sum (X_i - \bar{x})^2}{N}$ (for larger samples)
Standard deviation (σ)
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance.
$\sigma = \sqrt{\sigma^2}$
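The variance and standard deviation formulas can be worked through on the salary data used earlier. The sketch below is illustrative; the function names and the `sample` flag are assumptions made for this example. With `sample=True` the sum of squared differences is divided by N - 1, otherwise by N.

```python
import math


def variance(values, sample=True):
    """Average squared difference from the mean (divisor N - 1 or N)."""
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_differences = sum((x - mean) ** 2 for x in values)
    return squared_differences / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample=sample))


salaries = [6000, 10000, 14000, 50000, 10000]
var_n_minus_1 = variance(salaries)                # divisor N - 1
var_n = variance(salaries, sample=False)          # divisor N
sd_n_minus_1 = standard_deviation(salaries)
print(var_n_minus_1, var_n)   # 328000000.0 262400000.0
print(round(sd_n_minus_1, 1))
```

The squared differences from the mean of 18000 add up to 1312000000, which gives the two variances shown; the standard deviation with the N - 1 divisor is roughly 18110.8.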
Coefficient of variation (Cv)
The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
$C_v = \dfrac{\text{Standard Deviation}}{\text{Mean}} \times 100$
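As an illustration, the coefficient of variation lets the salary data and the hepatitis A incubation periods be compared even though their units and means are very different. The sketch below uses `statistics.stdev`, which divides by n - 1; the slides do not say which divisor to use here, so that choice is an assumption, as is the function name.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


salaries = [6000, 10000, 14000, 50000, 10000]
incubation_days = [27, 31, 15, 30, 22]

cv_salaries = coefficient_of_variation(salaries)
cv_incubation = coefficient_of_variation(incubation_days)
print(f"Salaries:   {cv_salaries:.1f}%")
print(f"Incubation: {cv_incubation:.1f}%")
```

The salaries vary by about as much as their own mean (a Cv of roughly 100%), while the incubation periods vary by only about a quarter of theirs.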