introduction to medical statistics sun jing health statistics department

Post on 16-Jan-2016


Introduction to Medical Statistics

Sun Jing

Health Statistics Department

Contents

Introduction: basic concepts in Medical Statistics

Small game to practice basic concepts

Vocabulary

Medical Statistics 医学统计学

Statistics  

The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).

The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results ( JM Last).

Medical Statistics  

Application of mathematical statistics in the field of medicine.

Homogeneity: All individuals have similar values or belong to the same category.

Example: all individuals are Chinese, women, middle-aged (30~40 years old), and work in a textile mill ---- homogeneity in nationality, gender, age and occupation.

Homogeneity and Variation

Variation: the differences in height, weight…

Toss a coin: the marked face may land up or down ---- variation!

Treat patients suffering from pneumonia with the same antibiotic: some of them recover and others do not ---- variation!

If there is no variation, there is no need for statistics.

Can you give an example of variation in the medical field?


Population: The whole collection of individuals that one intends to study.

Sample: A representative part of the population.

Population and sample

Questions: Which one is “population”?

All the cases with hepatitis B collected in a hospital in Guangzhou.

All the deaths recorded among the permanent residents of a city.

All the rats for testing the toxicity of a medicine.

 

Randomization: An important way to make the sample representative.

Randomization

Probability

A measure of how likely a random event is to occur.

A: a random event

P(A): the probability of the random event A

P(A)=1 , if an event always occurs.

P(A)=0, if an event never occurs.

Random

By chance!

Random event: an event that may or may not occur in one experiment.

Before an experiment, nobody can be sure whether the event will occur or not.

Question: Please give some examples of random events.

There must be some regularity over a large number of experiments.

Number of observations: n (large enough)

Number of occurrences of random event A: m

P(A) ≈ m/n

(Frequency or Relative frequency)

Question: Please give some examples of the probability of a random event, and of the frequency of that random event.

Estimation of Probability ---- Frequency
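The idea that the relative frequency m/n settles near P(A) when n is large can be tried out with a small simulation. The sketch below is only an illustration: the simulated coin, the function name `relative_frequency`, and the fixed seed are assumptions, not part of the slides.

```python
import random

def relative_frequency(n, p=0.5, seed=0):
    """Simulate n tosses; return m/n, where m counts how often event A (heads) occurred."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable (an assumption)
    m = sum(1 for _ in range(n) if rng.random() < p)
    return m / n

# The relative frequency should drift towards P(A) = 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With small n the frequency can be far from 0.5; with large n it is usually close, which is why a frequency is used as an estimate of a probability.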

Parameter and statistic

Parameter: A measure of the population, or a measure of the distribution of the population.

A parameter is usually denoted by a Greek letter such as μ or π.

-- Parameters are usually unknown.

To learn about the parameter of a population, we need a sample.

Statistic: A measure of the sample, or a measure of the distribution of the sample.

A statistic is usually denoted by a Latin letter such as s or p.

Questions: Please give an example of a parameter and of a statistic. Does a parameter vary? Does a statistic vary?

5. Sampling Error

The difference between observed value and true value.

Three kinds of error:

(1)   Systematic error (fixed)

(2)   Measurement error (random)

(3) Sampling error (random)

Sampling error

The statistics of different samples from the same population differ from each other!

The statistics differ from the parameter!

Sampling error exists in any research based on sampling.

It cannot be avoided, but it can be estimated.
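Sampling error can be made visible by drawing several samples from one population and comparing their means. This is a minimal sketch under stated assumptions: the population of 10,000 simulated heights, the sample size of 50, and all names are hypothetical and chosen only for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (simulated, not real data).
population = [rng.gauss(170, 6) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter (population mean)

def sample_mean(n):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    return statistics.fmean(rng.sample(population, n))

means = [sample_mean(50) for _ in range(5)]
errors = [m - mu for m in means]  # sampling error of each sample mean

print("parameter:", round(mu, 2))
for m, e in zip(means, errors):
    print("statistic:", round(m, 2), "sampling error:", round(e, 2))
```

Each sample gives a different mean, and none equals μ exactly, yet all the errors stay within a predictable size.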

8.2 Types of data

1. Numerical Variable and Measurement Data

A variable that describes a characteristic of individuals quantitatively

-- Numerical Variable

The data of a numerical variable

-- Measurement Data

2. Categorical Variable and Enumeration Data

A variable that describes the category of individuals according to one of their characteristics

-- Categorical Variable

The number of individuals in each category

-- Enumeration Data

Special case of categorical variable: Ordinal variable and rank data

A categorical variable whose possible categories have a natural order

-- Ordinal variable

The data of an ordinal variable, which represent only the order of individuals

-- Rank data

Examples

Which type of variable does each of these belong to?

RBC (4.58 × 10^6/l)

Diastolic/systolic blood pressure (8/12 kPa)

Percentage of individuals with blood type A (20%)

Protein in urine (++)

Transition rate of cell (90%)

2. Measures for Average

(1) Arithmetic Mean: it is calculated by summing all the observations in a set of data and dividing by the total number of observations.

Based on observed data

Example: Blood sugar 6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

$$\bar{X} = \frac{6.2 + 5.4 + \cdots + 5.9}{8} = \frac{46.4}{8} = 5.8$$
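The arithmetic for the blood sugar example can be checked in a few lines. This is only a sketch; the variable names are illustrative.

```python
# Blood sugar values from the example above.
blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]

total = sum(blood_sugar)          # sum of all observations
mean = total / len(blood_sugar)   # divide by the number of observations

print(round(total, 1), round(mean, 2))  # prints: 46.4 5.8
```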

(2) Geometric mean

Example 9-4 See Table 9-4

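$$G = \sqrt[n]{X_1 X_2 \cdots X_n}$$

$$G = \lg^{-1}\left(\frac{\lg X_1 + \lg X_2 + \cdots + \lg X_n}{n}\right) = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$$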
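Both forms of the geometric mean, the n-th root of the product and the antilog of the mean logarithm, give the same value. Table 9-4 is not reproduced in this transcript, so the sketch below uses hypothetical values (10, 20, 40, 80, 160), chosen only because they multiply out neatly.

```python
import math

# Hypothetical values (NOT the data of Table 9-4).
values = [10, 20, 40, 80, 160]
n = len(values)

# Form 1: n-th root of the product.
g_root = math.prod(values) ** (1 / n)

# Form 2: antilog (base 10) of the mean of the logarithms.
mean_log = sum(math.log10(x) for x in values) / n
g_log = 10 ** mean_log

print(round(g_root, 4), round(g_log, 4))
```

The logarithmic form is the one usually used by hand or for long series, because the raw product can become very large.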

(3) Median

Rank the observed values from the smallest to the largest; the median is the value in the middle. It is also called the 50th percentile: half the values are greater than it and the other half are less than it.

Based on raw data

Example 1: (7 values)

120,123,125,127,128,130,132

Median =127 

Example 2: (8 values)

118,120,123,125,127,128,130,132

Median=(125+127)/2=126
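The two cases above (odd and even number of values) can be written as one small function. This is a sketch for illustration; the function name `median` is an assumption, and Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Return the middle value of the ranked data (mean of the two middle values if n is even)."""
    ordered = sorted(values)           # rank from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd n: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(median([120, 123, 125, 127, 128, 130, 132]))       # Example 1
print(median([118, 120, 123, 125, 127, 128, 130, 132]))  # Example 2
```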

3. Measures for variability

(1) Range

Range= Maximum - Minimum

Because it is based on only two observations, the range ignores all the observations between the two extremes.

The greater the number of observations, the greater the range tends to be.

(2) Interquartile range

Lower Quartile: 25th percentile

Upper Quartile: 75th percentile

Difference between the two Quartiles

= Upper Quartile - Lower Quartile

= 13.120 – 8.083 = 5.037
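The data behind the quartiles 13.120 and 8.083 are not included in this transcript, so the sketch below reuses the eight values from the median example instead. It relies on `statistics.quantiles` from the Python standard library with its default method; other textbooks and packages use slightly different quartile conventions, so the quartiles may differ a little from a hand calculation.

```python
import statistics

# Eight values reused from the median example (an assumption for illustration).
values = [118, 120, 123, 125, 127, 128, 130, 132]

value_range = max(values) - min(values)          # Range = Maximum - Minimum
q1, q2, q3 = statistics.quantiles(values, n=4)   # lower quartile, median, upper quartile
iqr = q3 - q1                                    # interquartile range

print(value_range, q1, q3, iqr)
```

Unlike the range, the interquartile range does not depend on the two most extreme observations.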

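(3) Variance and Standard Deviation

The variance is the mean of the squared deviations. For a population it is calculated from the deviations of each value from μ (n here is the number of individuals in the population):

$$\sigma^2 = \frac{\sum (X - \mu)^2}{n}$$

For a sample, the variance is calculated by subtracting the mean of the data from each observation, squaring these deviations, adding them up, and dividing by one less than the number of observations in the data set:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$$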

Standard deviation (SD): the square root of the variance.

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$$
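The sample variance can be computed either from the squared deviations or from the sums of X and X², and the two routes should agree. The sketch below checks this on the blood sugar values from the arithmetic mean example; reusing those data here is an assumption made for illustration.

```python
import math

# Blood sugar values from the arithmetic mean example.
x = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]
n = len(x)
mean = sum(x) / n

# Definition: sum of squared deviations divided by n - 1.
s2_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# Computational form: (sum of X^2 - (sum of X)^2 / n) / (n - 1).
s2_shortcut = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)  # standard deviation

print(round(s2_definition, 4), round(s2_shortcut, 4), round(s, 4))
```

The sum of squared deviations for these data is 0.72, so the variance is 0.72/7 and the standard deviation is its square root.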

(4) Coefficient of Variation

$$CV = \frac{s}{\bar{X}} \times 100\%$$

CV is the ratio of the standard deviation to the arithmetic mean, multiplied by 100.

Example 9-10 Variation of height and variation of weight

          Mean (1)       Standard deviation (2)    Coefficient of variation (%) (3) = (2)/(1)
Height    171.21 (cm)    5.34 (cm)                 3.12
Weight    59.72 (kg)     4.16 (kg)                 6.97
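The coefficients of variation in Example 9-10 can be reproduced from the means and standard deviations in the table. This is a small sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Return CV as a percentage: standard deviation / mean * 100."""
    return sd / mean * 100

cv_height = coefficient_of_variation(5.34, 171.21)  # both in cm, so the unit cancels
cv_weight = coefficient_of_variation(4.16, 59.72)   # both in kg, so the unit cancels

print(round(cv_height, 2), round(cv_weight, 2))  # prints: 3.12 6.97
```

Because the units cancel, the CV allows height (cm) and weight (kg) to be compared: here weight varies relatively more than height.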

Absolute measure: The numbers counted for each category (frequencies).

The absolute measure can hardly be used for comparison between different populations.

Relative measure

Three kinds of relative measures:

Frequency (Proportion)

Intensity (Rate)

Ratio

Relative Frequency

$$\text{Relative frequency (proportion)} = \frac{\text{number of units with a certain condition}}{\text{total number of units that could possibly have the condition}}$$

It is a proportion, or relative frequency!

Eg3.1 In an alcohol drinking survey with a sample of 2327 people aged between 15 and 65, 347 were found to be alcohol abusers. Estimate the relative frequency of alcohol abuse.

According to the formula, the relative frequency of alcohol abuse = 347/2327 × 100% = 14.9%.

Proportion (constitute rate): A part considered in relation to the whole.

E.g. proportion by sex, proportion by age, proportion of mortality by disease.

Disease               Number of deaths    Proportion (%)
Malignant tumor       50                  33.33
Circulation system    40                  26.67
Respiration system    30                  20.00
Digestive system      20                  13.33
Infectious disease    10                  6.67
Total                 150                 100.00

Table 3.1 Proportions of deaths from 5 diseases in 2001
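The proportions in Table 3.1 are each count divided by the total of 150 deaths. A short sketch that recomputes them follows; the dictionary layout and names are illustrative.

```python
# Numbers of deaths by disease, as given in Table 3.1.
deaths = {
    "Malignant tumor": 50,
    "Circulation system": 40,
    "Respiration system": 30,
    "Digestive system": 20,
    "Infectious disease": 10,
}

total = sum(deaths.values())
# Proportion (%) = part / whole * 100
proportions = {disease: count / total * 100 for disease, count in deaths.items()}

for disease, p in proportions.items():
    print(f"{disease:<20}{deaths[disease]:>5}{p:>8.2f}")
print(f"{'Total':<20}{total:>5}{sum(proportions.values()):>8.2f}")
```

The proportions always add up to 100%, so a rise in one category forces a fall in the others; this is why a proportion cannot be read as a risk.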

Example 1

P(Myopia | First grade) = 15.16%

P(Myopia | Second grade) = 15.89%

P(Myopia | Third grade) = 18.36%

P(First grade | Myopia) = 35.08%

P(Second grade | Myopia) = 35.60%

P(Third grade | Myopia) = 29.32%

Question: Which grade has the most serious condition of myopias?

Table 10-1 Prevalence rates and constitute of myopia in a junior high school

Grade           Number of students tested    Number of students with myopia    Prevalence rate (%)    Constitute among myopias (%)
First grade     442                          67                                15.16                  35.08
Second grade    428                          68                                15.89                  35.60
Third grade     305                          56                                18.36                  29.32
Total                                        191                                                      100.00

Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade).

Constitute among myopias describes: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia).

Which grade has the most serious condition of myopia?

Answer: P(Myopia | Third grade) = Maximum -- The third grade has the highest prevalence of myopia.

P(Second grade | Myopia) = Maximum -- Among the students with myopia, the absolute number of second grade students is the highest.
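The difference between the two columns of Table 10-1 is only a difference of denominator, which a short calculation makes concrete. The sketch below uses the counts from the table; the variable names are illustrative.

```python
# Counts from Table 10-1.
tested = {"First grade": 442, "Second grade": 428, "Third grade": 305}
myopia = {"First grade": 67, "Second grade": 68, "Third grade": 56}

total_myopia = sum(myopia.values())

# Prevalence rate: P(Myopia | grade), denominator = students tested in that grade.
prevalence = {g: myopia[g] / tested[g] * 100 for g in tested}

# Constitute: P(grade | Myopia), denominator = all students with myopia.
constitute = {g: myopia[g] / total_myopia * 100 for g in tested}

highest_prevalence = max(prevalence, key=prevalence.get)
largest_share = max(constitute, key=constitute.get)

for g in tested:
    print(g, round(prevalence[g], 2), round(constitute[g], 2))
print("highest prevalence:", highest_prevalence)
print("largest share among myopias:", largest_share)
```

The second grade contributes the most cases only because it has more cases in absolute terms; the risk of myopia is highest in the third grade.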

(2) Intensity

Example: A smoking population was followed up for 562833 person-years, and 346 lung cancer cases were found.

The incidence rate of lung cancer in the smoking population is:

Incidence rate = 346/562833 = 61.47 per 100,000 person-years

$$\text{Incidence rate in a certain year} = \frac{\text{Number of patients occurring during the year}}{\text{person-years exposed to the risk of disease during the year}}$$

$$\text{Intensity in a certain period} = \frac{\text{Number of events appearing in the period}}{\text{Total person-years observed in the period}}$$

In general,

Denominator: Sum of the person-years observed in the period

Numerator: Total number of events appearing in the period

Unit: person/person-year, or 1/year

Nature: the relative frequency per unit of time.

Eg3.2 In an infection survey, the researchers observed 500 patients in a hospital for a total of 12500 person-days. They found that 59 patients were infected in the hospital. Calculate the daily infection rate in the hospital.

According to the formula, the daily infection rate = 59/12500 = 0.00472 = 0.472%; that is, about 0.472% of patients may become infected per day in this hospital.
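Both intensity examples, the lung cancer incidence rate and the daily infection rate, divide a number of events by the total person-time observed. A minimal sketch follows; the function name `intensity` and its `per` parameter are assumptions introduced for illustration.

```python
def intensity(events, person_time, per=1):
    """Events per unit of person-time, scaled by `per` (e.g. per 100,000 person-years)."""
    return events / person_time * per

# Lung cancer: 346 cases over 562833 person-years, per 100,000 person-years.
lung_cancer_rate = intensity(346, 562833, per=100_000)

# Eg3.2: 59 infections over 12500 person-days, expressed as a percentage per day.
daily_infection_rate = intensity(59, 12500, per=100)

print(round(lung_cancer_rate, 2))      # prints: 61.47
print(round(daily_infection_rate, 3))  # prints: 0.472
```

The denominator is person-time rather than a head count, so patients observed for longer contribute more to it.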

$$\text{Mortality rate in a certain year} = \frac{\text{Number of deaths during the year}}{\text{person-years exposed to the risk of death during the year}}$$

Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

(3) Ratio

A ratio is one number divided by another related number.

Examples:

Sex ratio of students in this class: No. of males : No. of females = 52%

Coefficient of variation: CV = SD/mean

Ratio of time spent per clinic visit: Large hospital : Community health station = 81.9 min : 18.6 min = 4.40

Ratio: The quotient of any two values. It expresses how many times as large one value is as the other.

E.g. sex ratio, ratio of sickbeds between two hospitals, relative risk.

Eg3.3 In 2000 there were 14750 doctors in a city with a population of 8100000. Find the number of doctors per 1000 people in this city.

According to the formula, the number of doctors per 1000 people = 14750/8100 = 1.82; that is, there are about 1.82 doctors per 1000 people.
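The ratio examples above reduce to a single division, optionally scaled to a convenient base such as 1000 people. A short sketch with illustrative variable names:

```python
# Eg3.3: doctors per 1000 people.
doctors = 14750
population = 8_100_000
doctors_per_1000 = doctors / population * 1000

# Ratio of time spent per clinic visit: large hospital vs community health station.
time_ratio = 81.9 / 18.6

print(round(doctors_per_1000, 2))  # prints: 1.82
print(round(time_ratio, 2))        # prints: 4.4
```

Unlike a proportion, the numerator of a ratio does not have to be part of the denominator: doctors are compared with residents, and one kind of visit with another.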

13.1 Principles of research design

1. Control

2. Balance

3. Randomization

4. Replication

2. Balance: The experimental group and the control group are almost the same in all aspects except the treatment.

[Diagram]

Treatment + Others -> Subject -> Effect of treatment + Effect of others

Control + Others -> Subject -> Effect of control + Effect of others

3. Randomization

Many factors are known to influence the results but are very difficult to control directly; randomization is the best choice for dealing with them!

Example: To improve the homogeneity of subjects, recruit a number of students of the same age and gender, then randomly assign them to two groups so that the groups are balanced in height and weight.

Randomization is the prerequisite of statistical inference.

Randomization ≠ Casual

Randomization means that all subjects in the population have the same probability of being sampled for the research.
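Random assignment to two groups, as in the student example, can be done by shuffling the list of subjects and splitting it in half. This is a minimal sketch under stated assumptions: the subject labels, the function name `randomize`, and the fixed seed are hypothetical, and real trials often use more elaborate schemes such as block or stratified randomization.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels.
students = [f"student_{i:02d}" for i in range(1, 21)]
treatment_group, control_group = randomize(students, seed=2016)

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Because chance alone decides the assignment, known and unknown factors such as height and weight tend to be balanced between the groups.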

4. Replication

One meaning of replication:

The results can be reproduced in different labs and by different researchers.

Another meaning of replication:

The study should be performed on a large enough sample.

Altman & Dore checked 90 papers: 39% mentioned their sample size and the reason for it.

The sample sizes of 27% of the papers were too small to support a conclusion.

Experimental design

1. Why?

To plan and arrange subject selection, treatment assignment, data collection and statistical analysis.

To ensure validity, reproducibility and economy.

2. Types of research

• Experiment: animal experiment, clinical trial,

community intervention trial

• Survey

Both need good design!

Survey design

1. Survey

• Observes an existing process

• Without intervention

• Needs good design

Examples of surveys:

• Health condition survey

• Epidemiological survey

• Etiologic survey

• Clinical follow-up survey

• Sanitary survey

• ……

regression coefficient

The regression coefficient measures the quantitative dependency of the variable Y on X.

correlation coefficient

The correlation coefficient (also called the product-moment correlation coefficient) measures the strength and direction of the linear relationship between two variables.

“Regression” has become a statistical term for the quantitative dependency between variables, and has given rise to new statistical concepts such as the “regression equation” and the “regression coefficient”.
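The slides give no formulas for these two coefficients, so the sketch below uses the standard least-squares slope and product-moment correlation, computed from sums of squares and products. The data (age and systolic blood pressure) are hypothetical and serve only as an illustration, as do all names in the code.

```python
import math

def regression_and_correlation(xs, ys):
    """Return (a, b, r): intercept, regression coefficient of Y on X, correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    lxx = sum((x - mean_x) ** 2 for x in xs)                         # sum of squares of X
    lyy = sum((y - mean_y) ** 2 for y in ys)                         # sum of squares of Y
    lxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))   # sum of products
    b = lxy / lxx                   # regression coefficient: change in Y per unit of X
    a = mean_y - b * mean_x         # intercept of the regression equation Y = a + bX
    r = lxy / math.sqrt(lxx * lyy)  # correlation coefficient, between -1 and 1
    return a, b, r

# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [30, 35, 40, 45, 50, 55, 60]
sbp = [118, 121, 125, 124, 131, 135, 138]

a, b, r = regression_and_correlation(age, sbp)
print(round(a, 2), round(b, 3), round(r, 3))
```

The regression coefficient b carries the units of Y per unit of X, while r is unit-free; the two always share the same sign because both are built on the same sum of products.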
