Introduction to Medical Statistics Sun Jing Health Statistics Department

Upload: kathlyn-hoover

Post on 16-Jan-2016

Introduction to Medical Statistics

Sun Jing

Health Statistics Department

Contents

Introduction to basic concepts in medical statistics

Small game to practice basic concepts

Vocabulary

Medical Statistics 医学统计学

Statistics

The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).

The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J. M. Last).

Medical Statistics

Application of mathematical statistics in the field of medicine.

Homogeneity and Variation

Homogeneity: All individuals have similar values or belong to the same category.

Example: all individuals are Chinese, women, middle-aged (30 to 40 years old), and work in a textile mill: homogeneity in nationality, gender, age and occupation.

Variation: differences between individuals, for example in height or weight.

Toss a coin: the marked face may land up or down. That is variation!

Treat patients suffering from pneumonia with the same antibiotic: some of them recover and others do not. That is variation!

If there were no variation, there would be no need for statistics.

Question: Can you give an example of variation in the medical field?

Population and sample

Population: The whole collection of individuals that one intends to study.

Sample: A representative part of the population.

Questions: Which one is a "population"?

All the cases of hepatitis B collected in a hospital in Guangzhou.

All the deaths among the permanent residents of a city.

All the rats used for testing the toxicity of a medicine.

Randomization

Randomization: An important way to make the sample representative.

Probability

Measures the possibility of occurrence of a random event.

A: random event

P(A): probability of the random event A

P(A) = 1 if an event always occurs.

P(A) = 0 if an event never occurs.

Random

By chance!

Random event: an event that may or may not occur in one experiment.

Before an experiment, nobody can be sure whether the event will occur or not.

Question: Please give some examples of random events.

Over a large number of experiments, however, some regularity must appear.

Estimation of Probability: Frequency

Number of observations: n (large enough)

Number of occurrences of random event A: m

$$P(A) \approx \frac{m}{n}$$

(Frequency or relative frequency)

Question: Please give some examples of the probability of a random event and of the frequency of that random event.
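The idea that the frequency m/n approaches P(A) when n is large enough can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the true probability of 0.5 (a fair coin) and the fixed seed are assumptions made for the illustration.

```python
import random


def relative_frequency(n, p_true=0.5, seed=0):
    """Simulate n trials of a random event A and return the frequency m/n.

    p_true is the (normally unknown) probability P(A); it is set by hand
    here only so that the simulation can be run.
    """
    rng = random.Random(seed)
    # m counts how many times event A occurred in n trials
    m = sum(1 for _ in range(n) if rng.random() < p_true)
    return m / n


# The frequency fluctuates for small n and settles near P(A) for large n.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few observations the frequency can be far from 0.5; with many observations it usually lies close to it, which is why n must be "large enough".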

Parameter and statistic

Parameter: A measure of the population, or a measure of the distribution of the population.

Parameters are usually denoted by Greek letters such as μ and π.

-- Parameters are usually unknown.

To learn about the parameter of a population, we need a sample.

Statistic: A measure of the sample, or a measure of the distribution of the sample.

Statistics are usually denoted by Latin letters such as s and p.

Questions: Please give an example of a parameter and of a statistic. Does a parameter vary? Does a statistic vary?

5. Sampling Error

Error is the difference between the observed value and the true value.

Three kinds of error:

(1) Systematic error (fixed)

(2) Measurement error (random)

(3) Sampling error (random)

Sampling error

The statistics of different samples from the same population differ from each other!

The statistics differ from the parameter!

Sampling error exists in any sampling research.

It cannot be avoided, but it can be estimated.
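Sampling error can be seen directly by drawing several samples from one population and comparing each sample mean with the population mean. The sketch below uses a hypothetical population of 10,000 simulated heights; the numbers 170 and 6, the sample size of 50 and the seed are all assumptions chosen for the illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated heights in cm (assumed values)
population = [rng.gauss(170, 6) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter, known here only because we built the population


def sample_mean(size):
    """Draw a simple random sample without replacement and return its mean."""
    return statistics.mean(rng.sample(population, size))


# Five samples from the same population give five different statistics.
sample_means = [sample_mean(50) for _ in range(5)]
errors = [m - mu for m in sample_means]  # sampling error of each sample mean

for m, e in zip(sample_means, errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

The sample means differ from each other and from the parameter, yet they all stay in its neighbourhood, which is what makes the error estimable.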

8.2 Types of data

1. Numerical Variable and Measurement Data

A variable that describes a characteristic of individuals quantitatively

-- Numerical Variable

The data of a numerical variable

-- Measurement Data

2. Categorical Variable and Enumeration Data

A variable that describes the category of individuals according to one of their characteristics

-- Categorical Variable

The number of individuals in each category

-- Enumeration Data

Special case of categorical variable: Ordinal variable and rank data

There exists an order among all possible categories

-- Ordinal variable

The data of an ordinal variable, which represent only the order of individuals

-- Rank data

Examples

Which type of variable does each of these belong to?

RBC (4.58 × 10^6/l)

Diastolic/systolic blood pressure (8/12 kPa)

Percentage of individuals with blood type A (20%)

Protein in urine (++)

Transition rate of cells (90%)

2. Measures for Average

(1) Arithmetic mean: it is calculated by summing all the observations in a set of data and dividing by the total number of observations.

Based on observed data

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum X}{n}$$

Example: Blood sugar 6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9

$$\bar{X} = \frac{6.2 + 5.4 + \cdots + 5.9}{8} = \frac{46.4}{8} = 5.8$$
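The blood sugar example can be checked in a few lines. This is a minimal sketch; the variable names are chosen for the illustration only.

```python
# Blood sugar values from the example
blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]

total = sum(blood_sugar)  # sum of all observations
n = len(blood_sugar)      # number of observations
mean = total / n          # arithmetic mean

print(f"sum = {total:.1f}, n = {n}, mean = {mean:.1f}")
# prints: sum = 46.4, n = 8, mean = 5.8
```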

(2) Geometric mean

$$G = \sqrt[n]{X_1 X_2 \cdots X_n}$$

$$G = \lg^{-1}\left(\frac{\lg X_1 + \lg X_2 + \cdots + \lg X_n}{n}\right) = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$$

Example 9-4 See Table 9-4
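The two forms of the geometric mean, the n-th root of the product and the antilogarithm of the mean logarithm, give the same value. The sketch below shows this on hypothetical titre-like data; Table 9-4 is not reproduced in this transcript, so the values are assumptions and not the table's data.

```python
import math


def geometric_mean_root(values):
    """G as the n-th root of the product of the observations."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_log(values):
    """G as the antilogarithm (base 10) of the mean of the logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


# Hypothetical data that double from one value to the next (not Table 9-4)
titres = [10, 20, 40, 80, 160]

print(geometric_mean_root(titres))
print(geometric_mean_log(titres))
```

Both calls return a value of about 40, the middle term of this doubling series. The logarithmic form is usually preferred for long series because the raw product can become very large.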

(3) Median

Rank the observed values from the smallest to the largest; the median is the value in the middle. It is also called the 50th percentile: half of the values are greater than it and the other half are less than it.

(3) Median

Based on raw data

Example 1: (7 values)

120, 123, 125, 127, 128, 130, 132

Median = 127

Example 2: (8 values)

118, 120, 123, 125, 127, 128, 130, 132

Median = (125 + 127)/2 = 126
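The rule used in the two examples (middle value for an odd number of observations, mean of the two middle values for an even number) can be written out as follows. The function name `median` is chosen for the sketch; Python's standard `statistics.median` follows the same rule.

```python
import statistics


def median(values):
    """Return the median of raw data."""
    ordered = sorted(values)  # rank from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the single middle value
        return ordered[mid]
    # even n: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [120, 123, 125, 127, 128, 130, 132]
example_2 = [118, 120, 123, 125, 127, 128, 130, 132]

print(median(example_1))  # 127
print(median(example_2))  # 126.0
```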

3. Measures for variability

(1) Range

Range = Maximum - Minimum

It is based on only two observations and ignores the observations between the two extremes.

The greater the number of observations, the greater the range tends to be.

(2) Interquartile range

Lower quartile: 25th percentile

Upper quartile: 75th percentile

Difference between the two quartiles

= Upper quartile - Lower quartile

= 13.120 - 8.083 = 5.037

(3) Variance and Standard Deviation

Variance is calculated by subtracting the mean of a set of data values from each of the observations, squaring these deviations, adding them up, and dividing by one less than the number of observations in the data set.

The mean of the squared deviations (population variance):

$$\sigma^2 = \frac{\sum (X - \mu)^2}{n}$$

Sample variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$$

(3) Variance and Standard Deviation

Standard deviation (SD): the square root of the variance.

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$$
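As an illustration, the sample variance and standard deviation can be computed for the blood sugar values used earlier in the arithmetic mean example, once from the definition (sum of squared deviations divided by n - 1) and once from the equivalent shortcut form based on the sum of X and the sum of X squared. Applying the formulas to this data set is an assumption of the sketch; the slides do not work this example.

```python
import math
import statistics

blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]
n = len(blood_sugar)
mean = sum(blood_sugar) / n

# Definition: sum of squared deviations from the mean, divided by n - 1
sum_sq_dev = sum((x - mean) ** 2 for x in blood_sugar)
variance_def = sum_sq_dev / (n - 1)

# Shortcut form: (sum of X^2 - (sum of X)^2 / n) / (n - 1)
sum_x = sum(blood_sugar)
sum_x2 = sum(x * x for x in blood_sugar)
variance_short = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance_def)

print(f"s^2 = {variance_def:.4f}, s = {sd:.4f}")
```

Both forms give the same variance up to floating-point rounding, and `statistics.variance` and `statistics.stdev` from the standard library use the same n - 1 divisor.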

(4) Coefficient of Variation

$$CV = \frac{s}{\bar{X}} \times 100\%$$

CV: the ratio of the standard deviation to the arithmetic mean, multiplied by 100.

Example 9-10 Variation of height and variation of weight

          Mean (1)       Standard deviation (2)   Coefficient of variation (%) (3) = (2)/(1) × 100
Height    171.21 (cm)    5.34 (cm)                3.12
Weight    59.72 (kg)     4.16 (kg)                6.97
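The two coefficients of variation in Example 9-10 can be reproduced from the means and standard deviations given there. The function name is chosen for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV in percent: standard deviation / mean * 100."""
    return sd / mean * 100


cv_height = coefficient_of_variation(5.34, 171.21)  # both in cm
cv_weight = coefficient_of_variation(4.16, 59.72)   # both in kg

print(f"CV of height = {cv_height:.2f}%")  # 3.12%
print(f"CV of weight = {cv_weight:.2f}%")  # 6.97%
```

Because the units cancel, the CV allows the variation of height (cm) and weight (kg) to be compared directly: relative to its mean, weight varies about twice as much as height here.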

Absolute measure: The numbers counted for each category (frequencies).

Absolute measures can hardly be used for comparison between different populations.

Relative measure

Three kinds of relative measures:

Frequency (Proportion)

Intensity (Rate)

Ratio

Relative Frequency

$$\text{Relative frequency (proportion)} = \frac{\text{number of units with a certain condition}}{\text{total number of units that could possibly have the condition}}$$

It is a proportion, or relative frequency!

Eg3.1 In an alcohol drinking survey of 2327 people aged between 15 and 65, 347 were found to be alcohol abusers. Estimate the relative frequency of alcohol abuse.

According to the formula, the relative frequency of alcohol abuse = 347/2327 × 100% = 14.9%.

Proportion (constituent ratio): A part considered in relation to the whole.

E.g., proportion by sex

proportion by age

proportion of deaths by disease

Table 3.1 Proportions of deaths from 5 diseases in 2001

Disease               Number of deaths   Proportion (%)
Malignant tumor       50                 33.33
Circulation system    40                 26.67
Respiration system    30                 20.00
Digestive system      20                 13.33
Infectious disease    10                 6.67
Total                 150                100.00
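The proportion column of Table 3.1 follows from dividing each count by the total. A minimal sketch, with dictionary keys taken from the table and variable names chosen for the illustration:

```python
# Number of deaths per disease, from Table 3.1
deaths = {
    "Malignant tumor": 50,
    "Circulation system": 40,
    "Respiration system": 30,
    "Digestive system": 20,
    "Infectious disease": 10,
}

total = sum(deaths.values())

# Each part in relation to the whole, in percent
proportions = {disease: count / total * 100 for disease, count in deaths.items()}

for disease, pct in proportions.items():
    print(f"{disease:<20} {deaths[disease]:>4} {pct:>7.2f}")
print(f"{'Total':<20} {total:>4} {sum(proportions.values()):>7.2f}")
```

Since every part is divided by the same whole, the proportions always sum to 100%.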

Example 1

Table 10-1 Prevalence rates and constituent ratios of myopia in a junior high school

Grade          Students tested   Students with myopia   Prevalence rate (%)   Constituent ratio among students with myopia (%)
First grade    442               67                     15.16                 35.08
Second grade   428               68                     15.89                 35.60
Third grade    305               56                     18.36                 29.32
Total                            191                                          100.00

P(Myopia | First grade) = 15.16%

P(Myopia | Second grade) = 15.89%

P(Myopia | Third grade) = 18.36%

P(First grade | Myopia) = 35.08%

P(Second grade | Myopia) = 35.60%

P(Third grade | Myopia) = 29.32%

Question: Which grade has the most serious myopia problem?

Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade)

Constituent ratios among students with myopia describe: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia)

Which grade has the most serious myopia problem?

Answer:

P(Myopia | Third grade) is the maximum -- the third grade has the highest prevalence of myopia.

P(Second grade | Myopia) is the maximum -- among the students with myopia, the absolute number of second-grade students is the highest.
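The difference between the two conditional probabilities becomes concrete when both are computed from the counts in Table 10-1. A minimal sketch; the variable names are chosen for the illustration.

```python
# Counts from Table 10-1
tested = {"First grade": 442, "Second grade": 428, "Third grade": 305}
myopia = {"First grade": 67, "Second grade": 68, "Third grade": 56}

total_myopia = sum(myopia.values())

# Prevalence rate: P(Myopia | grade), denominator = students tested in that grade
prevalence = {g: myopia[g] / tested[g] * 100 for g in tested}

# Constituent ratio: P(grade | Myopia), denominator = all students with myopia
constituent = {g: myopia[g] / total_myopia * 100 for g in tested}

highest_prevalence = max(prevalence, key=prevalence.get)
largest_share = max(constituent, key=constituent.get)

for g in tested:
    print(f"{g}: prevalence {prevalence[g]:.2f}%, share of cases {constituent[g]:.2f}%")
print("Highest prevalence:", highest_prevalence)
print("Largest share of cases:", largest_share)
```

Only the prevalence rate answers the question of which grade is most seriously affected, because its denominator is the number of students at risk in that grade. The constituent ratio also depends on how many students each grade has.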

(2) Intensity

$$\text{Incidence rate in a certain year} = \frac{\text{number of new cases occurring during the year}}{\text{person-years exposed to the risk of the disease during the year}}$$

Example: A smoking population was followed up for 562833 person-years, and 346 lung cancer cases were found.

The incidence rate of lung cancer in the smoking population is:

Incidence rate = 346/562833 = 61.47 per 100,000 person-years
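The lung cancer figure can be verified by dividing the number of events by the person-years of observation and scaling to 100,000. The helper name `rate_per_100k` is an assumption made for this sketch.

```python
def rate_per_100k(events, person_years):
    """Intensity (rate): events per 100,000 person-years of observation."""
    return events / person_years * 100_000


# 346 lung cancer cases over 562833 person-years of follow-up
incidence = rate_per_100k(346, 562833)

print(f"{incidence:.2f} per 100,000 person-years")  # 61.47
```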

In general,

$$\text{Intensity in a certain period} = \frac{\text{number of events appearing in the period}}{\text{total person-years observed in the period}}$$

Numerator: Total number of events appearing in the period

Denominator: Sum of the person-years observed in the period

Unit: person/person-year, or 1/year

Nature: the relative frequency per unit of time.

Eg3.2 In an infection survey, the researchers observed 500 patients in a hospital for a total of 12500 person-days. They found that 59 patients were infected in the hospital. Calculate the daily infection rate in the hospital.

According to the formula, the daily infection rate = 59/12500 = 0.00472 = 0.472%; that is, about 0.472% of patients may be infected each day in this hospital.

$$\text{Mortality rate in a certain year} = \frac{\text{number of deaths during the year}}{\text{person-years exposed to the risk of death during the year}}$$

Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

(3) Ratio

A ratio is one number divided by another related number.

Examples:

Sex ratio of students in this class: No. of males : No. of females = 52%

Coefficient of variation: CV = SD/mean

Ratio of time spent per clinic visit: Large hospital : Community health station = 81.9 min : 18.6 min = 4.40

Ratio: The quotient of any two values. It shows how many times as large one value is as the other.

E.g., sex ratio

ratio of the numbers of sickbeds in two hospitals

relative risk

Eg3.3 In 2000 there were 14750 doctors in a city with a population of 8100000. Find the number of doctors per 1000 people in this city.

According to the formula, the number of doctors per 1000 people = 14750/8100 = 1.82; that is, there are about 1.82 doctors per 1000 people.

13.1 Principles of research design

1. Control

2. Balance

3. Randomization

4. Replication

2. Balance: The experimental group and the control group are almost the same in all aspects except the treatment.

[Diagram: Treatment + Others -> Subject -> Effect of treatment + Effect of others; Control + Others -> Subject -> Effect of control + Effect of others]

3. Randomization

We know that many factors may influence the results, but they are very difficult to deal with. Randomization is the best choice!

Example: To improve the homogeneity of the subjects, collect a number of students of the same age and gender; then randomly assign them to two groups so that the groups are balanced in height and weight.
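Random assignment to two groups can be sketched as shuffling the list of subjects and splitting it in half. The student identifiers, the group size of 20 and the seed are hypothetical, and this is only one simple way to randomize, not a method prescribed by the slides.

```python
import random


def randomize_two_groups(subjects, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)      # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects: 20 students of the same age and gender
students = [f"student_{i:02d}" for i in range(1, 21)]
treatment, control = randomize_two_groups(students, seed=2016)

print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who goes where, factors such as height and weight tend to be balanced between the groups on average, including factors the researcher did not measure.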

Randomization is the prerequisite of statistical inference.

Randomization ≠ Casual

Randomization means that all subjects in the population have the same probability of being sampled for research.

4. Replication

One meaning of replication:

The results can be reproduced in different labs and by different researchers.

Another meaning of replication:

The study should be performed on a large enough sample.

Altman & Dore checked 90 papers: 39% mentioned their sample size and why. The sample sizes of 27% of the papers were too small to draw a conclusion.

Experimental design

1. Why?

To plan and arrange subject selection, treatment assignment, data collection and statistical analysis.

To ensure validity, reproducibility and economy.

2. Types of research

• Experiment: animal experiment, clinical trial, community intervention trial

• Survey

Both need good design!

Survey design

1. Survey

Observe the existing process

Without intervention

Well designed

Examples of surveys:

Health condition survey

Epidemiological survey

Etiologic survey

Clinical follow-up survey

Sanitary survey

...

regression coefficient

The regression coefficient measures the quantitative dependence of the variable Y on X.

correlation coefficient

The correlation coefficient (also called the product-moment correlation coefficient) measures the strength and direction of the linear relationship between two variables.

"Regression" has become the statistical term for the quantitative dependence between variables, and has given rise to new statistical concepts such as the "regression equation" and the "regression coefficient".
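To make the two coefficients concrete, the sketch below computes the product-moment correlation coefficient r and the least-squares regression coefficient b (with intercept a) from the sums of squares and cross-products. The slides give no formulas or data for this part, so the standard textbook definitions are assumed, and the x and y values are hypothetical.

```python
import math


def correlation_and_regression(x, y):
    """Return (r, b, a): correlation coefficient, regression slope, intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means
    l_xx = sum((xi - mean_x) ** 2 for xi in x)
    l_yy = sum((yi - mean_y) ** 2 for yi in y)
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = l_xy / math.sqrt(l_xx * l_yy)  # strength and direction of linear relation
    b = l_xy / l_xx                    # change in Y per unit change in X
    a = mean_y - b * mean_x            # intercept of the regression equation
    return r, b, a


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r, b, a = correlation_and_regression(x, y)
print(f"r = {r:.3f}, regression equation: Y = {a:.2f} + {b:.2f} X")
```

The sign of r always agrees with the sign of b, but the two answer different questions: r has no unit and lies between -1 and 1, while b is expressed in units of Y per unit of X.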