Introduction to Medical Statistics
Sun Jing
Health Statistics Department
Contents: Introduction; basic concepts in medical statistics
A small game to practice the basic concepts
Vocabulary
Medical Statistics 医学统计学
Statistics
The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).
The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J. M. Last).
Medical Statistics
Application of mathematical statistics in the field of medicine
Homogeneity and Variation
Homogeneity: all individuals have similar values or belong to the same category.
Example: all individuals are Chinese women, middle-aged (30~40 years old), working in a textile mill ---- homogeneity in nationality, gender, age and occupation.
Variation: differences among individuals, e.g., in height, weight, …
Toss a coin: the marked face may land up or down ---- variation!
Treat patients suffering from pneumonia with the same antibiotic: some of them recover and others do not ---- variation!
If there is no variation, there is no need for statistics.
Can you give an example of variation in medical field?
Population and sample
Population: the whole collection of individuals that one intends to study.
Sample: a representative part of the population.
Question: Which of the following is a “population”?
All the cases with hepatitis B collected in a hospital in Guangzhou.
All the deaths found from the permanent residents in a city.
All the rats for testing the toxicity of a medicine.
Randomization
Randomization: an important way to make the sample representative.
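A minimal Python sketch of simple random sampling, in which every individual in the population has the same chance of being selected. The population of 1000 numbered IDs and the sample size of 50 are hypothetical; `random.Random.sample` draws without replacement.

```python
import random

# Hypothetical population: 1000 individuals identified by ID numbers 1..1000
population = list(range(1, 1001))

# The seed is fixed only so that the example is reproducible
rng = random.Random(2024)

# Simple random sampling without replacement: every individual
# has the same probability (50/1000) of being selected
sample = rng.sample(population, k=50)

print(sorted(sample)[:10])  # first 10 selected IDs, in ascending order
```

Changing or removing the seed gives a different, equally valid random sample.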
Probability
Probability measures the possibility of occurrence of a random event.
A: a random event; P(A): the probability of the random event A
P(A) = 1 if an event always occurs.
P(A) = 0 if an event never occurs.
For any event, 0 ≤ P(A) ≤ 1.
Random: by chance!
Random event: an event that may or may not occur in a single experiment.
Before the experiment, nobody can be sure whether the event will occur.
Question: Please give some examples of random event.
Estimation of Probability ---- Frequency
Although a single result cannot be predicted, a regular pattern emerges over a large number of experiments.
Number of observations: n (large enough)
Number of occurrences of random event A: m
P(A) ≈ m/n
(m/n is the frequency, or relative frequency, of A)
Question: Please give some examples of
the probability of a random event
and the frequency of that random event.
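To illustrate P(A) ≈ m/n, here is a minimal simulation sketch, assuming a fair coin whose true probability of heads is 0.5. The function name `estimate_probability` and its parameters are hypothetical; it counts the occurrences m of event A in n simulated tosses and returns the frequency m/n.

```python
import random

def estimate_probability(n, p_true=0.5, seed=1):
    """Estimate P(A) by the relative frequency m/n over n simulated trials.

    Each trial is a simulated coin toss; event A ("heads") occurs with
    true probability p_true.
    """
    rng = random.Random(seed)
    m = sum(1 for _ in range(n) if rng.random() < p_true)  # occurrences of A
    return m / n

# The frequency m/n tends to settle near P(A) = 0.5 as n grows
for n in (10, 100, 1_000, 100_000):
    print(n, estimate_probability(n))
```

With small n the frequency can be far from 0.5; with large n it is usually very close.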
Parameter and statistic
Parameter: a measure of a population, or a measure of the distribution of a population.
Parameters are usually denoted by Greek letters, such as μ and π. Parameters are usually unknown.
To learn a parameter of a population, we need a sample.
Statistic: a measure of a sample, or a measure of the distribution of a sample.
Statistics are usually denoted by Latin letters, such as s and p.
Questions: Please give an example of a parameter and of a statistic. Does a parameter vary? Does a statistic vary?
5. Sampling Error
Error: the difference between an observed value and the true value.
Three kinds of error:
(1) Systematic error (fixed)
(2) Measurement error (random)
(3) Sampling error (random)
Sampling error
Statistics from different samples of the same population differ from each other!
Statistics differ from the parameter!
Sampling error exists in any sampling study.
It cannot be avoided, but it can be estimated.
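A minimal simulation sketch of sampling error, assuming a hypothetical population of 10,000 values drawn from a normal distribution (mean 170, SD 6, e.g. heights in cm). Several samples of size 30 are taken; their means (statistics) differ from each other and from the population mean μ (the parameter).

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 10,000 values (e.g., heights in cm);
# its mean is the parameter mu, which in practice is usually unknown
population = [rng.gauss(170, 6) for _ in range(10_000)]
mu = statistics.mean(population)

# Draw 5 samples of n = 30 and compute the statistic (sample mean) for each
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"parameter mu = {mu:.2f}")
for i, xbar in enumerate(sample_means, 1):
    # xbar - mu is the sampling error of this particular sample
    print(f"sample {i}: mean = {xbar:.2f}, sampling error = {xbar - mu:+.2f}")
```

The sampling errors are random: rerunning with another seed changes them, but they never disappear.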
8.2 Types of data
1. Numerical Variable and Measurement Data
A variable that describes a characteristic of individuals quantitatively
-- Numerical Variable
The data of a numerical variable
-- Measurement Data
2. Categorical Variable and Enumeration Data
A variable that describes the category of each individual according to one of its characteristics
-- Categorical Variable
The number of individuals in each category
-- Enumeration Data
Special case of a categorical variable:
ordinal variable and rank data
The possible categories have a natural order
-- Ordinal variable
The data of an ordinal variable, which represent only the order of individuals
-- Rank data
Examples
Which type of variable does each of the following belong to?
RBC count (4.58 × 10^6/μL)
Diastolic/systolic blood pressure (8/12 kPa)
Percentage of individuals with blood type A (20%)
Protein in urine (++)
Transition rate of cells (90%)
2. Measures of Average
(1) Arithmetic mean: calculated by summing all the observations in a set of data and dividing by the total number of observations.
Based on observed data
Example: Blood sugar 6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
$$\bar{X} = \frac{6.2 + 5.4 + \cdots + 5.9}{8} = \frac{46.4}{8} = 5.8$$
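The blood sugar example can be checked with a short Python sketch: the mean is computed once from the definition (sum divided by n) and once with the standard library's `statistics.mean`.

```python
import statistics

# Blood sugar values from the example above
blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]

# Mean = (sum of all observations) / (number of observations)
mean_manual = sum(blood_sugar) / len(blood_sugar)
mean_lib = statistics.mean(blood_sugar)

print(round(mean_manual, 2), round(mean_lib, 2))  # 5.8 5.8
```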
(2) Geometric mean
Example 9-4 See Table 9-4
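$$G = \sqrt[n]{X_1 X_2 \cdots X_n}$$
$$G = \lg^{-1}\left(\frac{\lg X_1 + \lg X_2 + \cdots + \lg X_n}{n}\right) = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$$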
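A minimal sketch of the logarithmic formula for G. The data of Table 9-4 are not reproduced here, so the reciprocal antibody titres below (1:10 to 1:160) are hypothetical. The mean of lg X is computed and then converted back with the antilog; `statistics.geometric_mean` (Python 3.8+) is shown as a cross-check.

```python
import math
import statistics

# Hypothetical reciprocal antibody titres (1:10, 1:20, ..., 1:160);
# these are NOT the data of Table 9-4
titres = [10, 20, 40, 80, 160]

# G = lg^-1( sum(lg X) / n ): average the logarithms, then take the antilog
log_mean = sum(math.log10(x) for x in titres) / len(titres)
g_manual = 10 ** log_mean

# The standard library offers the same measure directly
g_lib = statistics.geometric_mean(titres)

print(round(g_manual, 4), round(g_lib, 4))  # 40.0 40.0
```

The geometric mean suits data like titres, whose values grow in multiples; here the arithmetic mean (62) would be pulled up by the largest titre.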
(3) Median
Rank the observations from the smallest to the largest; the median is the value in the middle. It is also called the 50th percentile: half of the values are greater than it and the other half are less than it.
Based on raw data
Example 1: (7 values)
120,123,125,127,128,130,132
Median =127
Example 2: (8 values)
118,120,123,125,127,128,130,132
Median=(125+127)/2=126
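The two examples can be reproduced with a short Python sketch. The helper `median` is a hypothetical name implementing the rule above (middle value for odd n, average of the two middle values for even n); `statistics.median` gives the same results.

```python
import statistics

def median(values):
    """Median of raw data: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

example1 = [120, 123, 125, 127, 128, 130, 132]       # 7 values
example2 = [118, 120, 123, 125, 127, 128, 130, 132]  # 8 values

print(median(example1), median(example2))                        # 127 126.0
print(statistics.median(example1), statistics.median(example2))  # 127 126.0
```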
3. Measures of Variability
(1) Range
Range = Maximum - Minimum
It is based on only two observations and ignores the
observations between the two extremes.
The larger the number of observations, the
larger the range tends to be.
(2) Inter-quartile range
Lower quartile: 25th percentile
Upper quartile: 75th percentile
Inter-quartile range = difference between the two quartiles
= Upper quartile - Lower quartile
= 13.120 - 8.083 = 5.037
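The data behind 13.120 and 8.083 are not shown, so this sketch reuses the blood sugar values from the arithmetic-mean example. It computes the range and the inter-quartile range with `statistics.quantiles`. Note the assumption: `method="inclusive"` is only one of several quartile conventions, and other software may report slightly different quartiles for the same data.

```python
import statistics

data = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]  # blood sugar example

# Range = maximum - minimum (uses only the two extreme values)
value_range = max(data) - min(data)

# statistics.quantiles with n=4 returns the three cut points [Q1, median, Q3];
# the "inclusive" method is one of several interpolation conventions
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # inter-quartile range = upper quartile - lower quartile

print(round(value_range, 2), round(q1, 3), round(q3, 3), round(iqr, 3))
```

Because the IQR uses the middle half of the data, it is far less sensitive to extreme values than the range.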
(3) Variance and Standard Deviation
Variance: the mean of the squared deviations. For a sample, it is calculated by subtracting the mean of the data set from each observation, squaring these deviations, adding them up, and dividing by one less than the number of observations (n - 1).
Population variance:
$$\sigma^2 = \frac{\sum (X-\mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum (X-\bar{X})^2}{n-1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}$$
Standard deviation (SD): the square root of the variance.
$$\sigma = \sqrt{\frac{\sum (X-\mu)^2}{N}}$$
$$s = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}}$$
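A minimal sketch of the sample variance and SD, reusing the blood sugar values from the arithmetic-mean example (the slides give no separate data set here). It checks that the definitional formula and the computational shortcut agree, and compares them with `statistics.variance` / `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

data = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]  # blood sugar example
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations divided by n - 1
ss = sum((x - mean) ** 2 for x in data)
s2_def = ss / (n - 1)

# Computational shortcut: (sum X^2 - (sum X)^2 / n) / (n - 1)
s2_short = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

s = math.sqrt(s2_def)  # standard deviation = square root of the variance

# statistics.variance / stdev use the n - 1 divisor;
# statistics.pvariance / pstdev would divide by n (population formulas)
print(round(s2_def, 4), round(s2_short, 4), round(s, 4))
print(round(statistics.variance(data), 4), round(statistics.stdev(data), 4))
```

Here the squared deviations sum to 0.72, so s² = 0.72/7 ≈ 0.1029 and s ≈ 0.3207.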
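(4) Coefficient of Variation
$$CV = \frac{s}{\bar{X}} \times 100\%$$
CV is the ratio of the standard deviation to the arithmetic mean, multiplied by 100. Because it has no unit, it can be used to compare the variability of variables measured in different units.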
Example 9-10 Variation of height and variation of weight
Variable   Mean (1)      Standard deviation (2)   Coefficient of variation (%) (3) = (2)/(1) × 100
Height     171.21 (cm)   5.34 (cm)                3.12
Weight     59.72 (kg)    4.16 (kg)                6.97
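A short Python sketch reproducing column (3) of Example 9-10 from the mean and SD in columns (1) and (2). The function name `coefficient_of_variation` is a hypothetical helper.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100; unit-free, so it can compare
    variables measured in different units (cm vs kg)."""
    return sd / mean * 100

# Summary values from Example 9-10
height_cv = coefficient_of_variation(5.34, 171.21)  # cm / cm
weight_cv = coefficient_of_variation(4.16, 59.72)   # kg / kg

print(f"Height CV = {height_cv:.2f}%")  # Height CV = 3.12%
print(f"Weight CV = {weight_cv:.2f}%")  # Weight CV = 6.97%
```

Although the SD of height (5.34 cm) is larger than that of weight (4.16 kg), the CV shows that weight varies more relative to its mean.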
Absolute measure: the number counted in each category (frequency). Absolute measures can hardly be used to compare different populations.
Relative measure
Three kinds of relative measures:
(1) Frequency (proportion)
(2) Intensity (rate)
(3) Ratio
(1) Relative Frequency
$$\text{Relative frequency (proportion)} = \frac{\text{number of units with a certain condition}}{\text{total number of units that could possibly have the condition}}$$
It is a proportion, or relative frequency!
Eg3.1 In an alcohol-drinking survey of 2327 people aged 15 to 65, 347 were found to be alcohol abusers. Estimate the relative frequency of alcohol abuse. According to the formula, the relative frequency
of alcohol abuse = 347/2327 × 100% = 14.9%.
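The calculation in Eg3.1 as a minimal Python sketch; `relative_frequency` is a hypothetical helper name.

```python
def relative_frequency(with_condition, total):
    """Proportion (%) = units with the condition / all units that could have it x 100."""
    return with_condition / total * 100

# Eg3.1: 347 alcohol abusers among 2327 respondents aged 15 to 65
p = relative_frequency(347, 2327)
print(f"{p:.1f}%")  # 14.9%
```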
Proportion (constitute rate): a part considered in relation to the whole.
E.g., proportion by sex
proportion by age
proportion of deaths by disease
Table 3.1 Proportions of deaths from 5 diseases in 2001
Disease               Number of deaths   Proportion (%)
Malignant tumor       50                 33.33
Circulation system    40                 26.67
Respiration system    30                 20.00
Digestive system      20                 13.33
Infectious disease    10                 6.67
Total                 150                100.00
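A short sketch rebuilding the proportion column of Table 3.1 from the death counts. Each proportion is a part divided by the whole, so the column always sums to 100%.

```python
# Deaths by cause, Table 3.1
deaths = {
    "Malignant tumor": 50,
    "Circulation system": 40,
    "Respiration system": 30,
    "Digestive system": 20,
    "Infectious disease": 10,
}

total = sum(deaths.values())
# Each proportion = part / whole x 100
proportions = {cause: n / total * 100 for cause, n in deaths.items()}

for cause, pct in proportions.items():
    print(f"{cause:<20} {deaths[cause]:>4} {pct:6.2f}")
print(f"{'Total':<20} {total:>4} {sum(proportions.values()):6.2f}")
```

A proportion says nothing about how often deaths occur in the population; for that, an intensity (rate) with a person-time denominator is needed.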
Example 1
P(Myopia | First grade) = 15.16%
P(Myopia | Second grade) = 15.89%
P(Myopia | Third grade) = 18.36%
P(First grade | Myopia) = 35.08%
P(Second grade | Myopia) = 35.60%
P(Third grade | Myopia) = 29.32%
Question: Which grade has the most serious condition of myopias?
Table 10-1 Prevalence rates and constitute of myopia in a junior high school
Grade          Number of students tested   Number of students with myopia   Prevalence rate (%)   Constitute among myopias (%)
First grade    442                         67                               15.16                 35.08
Second grade   428                         68                               15.89                 35.60
Third grade    305                         56                               18.36                 29.32
Total                                      191                                                    100.00
Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade)
Constitute among myopias describes: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia)
Which grade has the most serious condition of myopia?
Answer: P(Myopia | Third grade) is the maximum ---- the third grade has the highest prevalence of myopia.
P(Second grade | Myopia) is the maximum ---- among the students with myopia, the second grade contributes the largest absolute number.
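The difference between the two measures lies only in the denominator. This sketch computes both from the counts in Table 10-1: prevalence divides by the students tested in each grade, constitute divides by all 191 students with myopia.

```python
# Counts from Table 10-1
tested = {"First": 442, "Second": 428, "Third": 305}
myopic = {"First": 67, "Second": 68, "Third": 56}
total_myopic = sum(myopic.values())  # 191 students with myopia

# Prevalence rate P(Myopia | grade): denominator = students tested in that grade
prevalence = {g: myopic[g] / tested[g] * 100 for g in tested}
# Constitute P(grade | Myopia): denominator = all students with myopia
constitute = {g: myopic[g] / total_myopic * 100 for g in tested}

for g in tested:
    print(f"{g:<7} prevalence {prevalence[g]:5.2f}%  constitute {constitute[g]:5.2f}%")

print("Highest prevalence:", max(prevalence, key=prevalence.get))        # Third
print("Largest share of cases:", max(constitute, key=constitute.get))   # Second
```

The third grade has fewer students tested, so it contributes fewer cases even though its prevalence is highest; using constitute to judge "seriousness" would therefore be misleading.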
(2) Intensity
Example: A smoking population was followed up for 562833 person-years, and 346 lung cancer cases were found.
The incidence rate of lung cancer in the smoking population is:
Incidence rate = 346/562833
= 61.47 per 100,000 person-years
$$\text{Incidence rate in a certain year} = \frac{\text{number of patients occurring during the year}}{\text{person-years exposed to the risk of the disease during the year}}$$
$$\text{Intensity in a certain period} = \frac{\text{number of events appearing in the period}}{\text{total person-years observed in the period}}$$
In general,
Denominator: sum of the person-years observed in the period
Numerator: total number of events appearing in the period
Unit: persons per person-year, or 1/year
Nature: the relative frequency per unit of time.
Eg3.2 In an infection survey, the researchers observed
500 patients in a hospital for a total of 12500 observed
days (person-days). They found that 59
patients acquired an infection in the hospital. Calculate the
daily infection rate in the hospital.
According to the formula, the daily infection rate =
59/12500 = 0.00472 = 0.472% per person-day; that is, about 0.472%
of patients may become infected each day in this hospital.
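Both intensity examples follow the same pattern: events divided by person-time, optionally scaled to a convenient base. A minimal sketch; `person_time_rate` and its `per` parameter are hypothetical names.

```python
def person_time_rate(events, person_time, per=1):
    """Intensity = number of events / total person-time observed, scaled to `per` units."""
    return events / person_time * per

# Lung cancer in the smoking population: 346 cases over 562833 person-years
incidence = person_time_rate(346, 562_833, per=100_000)

# Eg3.2: 59 hospital infections over 12500 person-days
daily_infection = person_time_rate(59, 12_500)

print(f"{incidence:.2f} per 100,000 person-years")  # 61.47 per 100,000 person-years
print(f"{daily_infection:.5f} per person-day")      # 0.00472 per person-day
```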
$$\text{Mortality rate in a certain year} = \frac{\text{number of deaths during the year}}{\text{person-years exposed to the risk of death during the year}}$$
Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.
(3) Ratio
Ratio: a number divided by another related number.
Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD/mean
Ratio of time spent per clinic visit: large hospital : community health station = 81.9 min : 18.6 min = 4.40
Ratio: the quotient of any two values. It represents how many times one value is of the other.
E.g., sex ratio
ratio of hospital beds in two hospitals
relative risk
Eg3.3 There were 14750 doctors in a city with
a population of 8100000 in 2000. Find the number
of doctors per 1000 people in this city.
According to the formula, the number of doctors
per 1000 people = 14750/8100 = 1.82; that is, the city has
about 1.82 doctors per 1000 people.
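A short sketch of the two ratio calculations above; `ratio` is a hypothetical helper. For Eg3.3 the population is first expressed in units of 1000 people, which is why the denominator becomes 8100.

```python
def ratio(a, b):
    """Ratio = a / b: how many times a is of b."""
    return a / b

# Time per clinic visit: large hospital vs community health station (minutes)
visit_ratio = ratio(81.9, 18.6)

# Eg3.3: doctors per 1000 people = doctors / (population / 1000)
doctors_per_1000 = ratio(14_750, 8_100_000 / 1000)

print(f"{visit_ratio:.2f}")       # 4.40
print(f"{doctors_per_1000:.2f}")  # 1.82
```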
13.1 Principles of research design
1. Control
2. Balance
3. Randomization
4. Replication
2. Balance: The experimental group and control group are almost the same in all aspects except the treatment.
[Figure: treatment group (subject + treatment + others → effect of treatment + effect of others) vs. control group (subject + control + others → effect of control + effect of others)]
3. Randomization
Many factors are known to influence the results but are very difficult to deal with directly: randomization is the best choice!
Example: To improve the homogeneity of subjects, collect a number of students of the same age and gender, then randomly allocate them into two groups so that the groups are balanced in height and weight.
Randomization is the prerequisite of statistical inference.
Randomization ≠ casual (haphazard) selection
Randomization means that every subject in the population has the same probability of being selected for the study.
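For the student example, a minimal sketch of random allocation into two groups, assuming 20 hypothetical student IDs and an even split. Shuffling the list gives every student the same chance of landing in either group; `randomize_two_groups` is a hypothetical helper name, and the seed is fixed only for reproducibility.

```python
import random

def randomize_two_groups(subjects, seed=None):
    """Randomly allocate subjects into two groups of (nearly) equal size.

    Shuffling gives every subject the same chance of ending up in either
    group, so factors that are hard to control tend to balance out.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical IDs of 20 students of the same age and gender
students = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomize_two_groups(students, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Unlike letting the researcher pick subjects "casually", the allocation here is fixed by the random mechanism, not by anyone's judgment.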
4. Replication
One meaning of replication:
the results can be reproduced in different labs and by different researchers.
Another meaning of replication:
the study should be performed on a large enough sample.
Altman & Dore checked 90 papers: 39% stated their sample size and the reason for it. The sample sizes of 27% of the papers were too small to support a conclusion.
Experimental design
1. Why?
To plan and arrange subject selection, treatment
assignment, data collection and statistical analysis,
so as to ensure validity, reproducibility and economy.
2. Types of research
• Experiment: animal experiment, clinical trial,
community intervention trial
• Survey
Both need good design!
Survey design
1. Survey
Observes an existing process
Without intervention
Needs good design
Examples of surveys: health condition survey, epidemiological survey, etiologic survey, clinical follow-up survey, sanitary survey, …….
Regression coefficient
Regression coefficient: measures the quantitative
dependence of the variable Y on X (in linear regression, the expected change in Y for a one-unit increase in X).
Correlation coefficient
Correlation coefficient (also called the product-moment
correlation coefficient): measures the strength and direction of the linear
relationship between two variables.
“Regression” has become the statistical term for the
quantitative dependence between variables, and has given rise to
new statistical concepts such as the “regression equation” and the
“regression coefficient”.
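A minimal sketch contrasting the two coefficients on a small set of hypothetical paired observations. It uses the least-squares formulas b = Sxy/Sxx and a = Ȳ - bX̄ for the regression line, and r = Sxy/√(Sxx·Syy) for the product-moment correlation; `regression_and_correlation` is a hypothetical helper name.

```python
import math

def regression_and_correlation(x, y):
    """Return (a, b, r) for the least-squares line Y = a + bX and Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                   # regression coefficient: change in Y per unit X
    a = my - b * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient, between -1 and 1
    return a, b, r

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r = regression_and_correlation(x, y)
print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.3f}")  # Y = 2.20 + 0.60X, r = 0.775
```

b carries the units of Y per unit of X and describes dependence; r is unit-free and describes only the strength and direction of the linear association. Both share the sign of Sxy.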