1. Introduction to biostatistics


Upload: razif-shahril

Post on 12-Jan-2017


Page 1: 1. Introduction to biostatistics

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

BIOSTATISTICS (HFS3283)

INTRODUCTION TO BIOSTATISTICS

Dr. Mohd Razif Shahril

School of Nutrition & Dietetics

Faculty of Health Sciences

Universiti Sultan Zainal Abidin

Page 2: 1. Introduction to biostatistics

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN

Synopsis

“This course exposes students to different types of data, measures of central tendency and dispersion, probability, normal distribution and inferential analysis in biostatistics. It also covers various basic statistical analyses including descriptive, parametric and nonparametric tests, association and prediction models commonly used in research. This course provides hands-on experience for students to perform statistical analysis using SPSS and interpret their outcomes in answering specific research questions.”

Page 3: 1. Introduction to biostatistics

Learning Outcomes

At the end of this course, students should be able to:

• Describe concepts of descriptive, inferential, parametric and non-parametric tests in biostatistics.

• Describe concepts of categorical data analysis, association, prediction, reliability and validity in biostatistics.

• Choose statistical analysis of data based on types of variables and objective of analysis using SPSS and interpret their outcomes.

Page 4: 1. Introduction to biostatistics

Teaching & Learning Strategy

Blended Mode Learning (Face to Face + e-Learning)

• Interactive Lecture – offline + online
• Hands-on Practical
• Project
• Self-Directed Learning (Literature, Video)

Page 5: 1. Introduction to biostatistics

Brief Course Outline

1. Introduction to biostatistics
2. Inferential statistics
3. Independent t-test
4. ANOVA & Paired t-test
5. Nonparametric analysis
6. Categorical data analysis: Chi square & Fisher exact test
7. Categorical data analysis: OR and RR
8. Correlation
9. Regression
10. Validity analysis
11. Reliability analysis

Page 6: 1. Introduction to biostatistics

Course Schedule

WEEK / DATE and ACTIVITIES

Week 1, Monday (25/01/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: INTRODUCTION TO BIOSTATISTICS
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: INTRODUCTION TO SPSS

Week 2, Monday (01/02/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: INFERENTIAL STATISTICS
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: DESCRIPTIVE ANALYSIS & NORMALITY

Week 3, Monday (08/02/2016)
– CHINESE NEW YEAR (PUBLIC HOLIDAY)

Week 4, Monday (15/02/2016)
– Lecture (8.00am - 10.00am), Dr. Sharifah Wajihah Wafa: INDEPENDENT T-TEST
– Practical (10.00am - 12.00pm), Dr. Sharifah Wajihah Wafa: SPSS: INDEPENDENT T-TEST

Week 5, Monday (22/02/2016)
– Lecture (8.00am - 10.00am), Dr. Sharifah Wajihah Wafa: ANOVA & PAIRED T-TEST
– Practical (10.00am - 12.00pm), Dr. Sharifah Wajihah Wafa: SPSS: ANOVA & PAIRED T-TEST

Page 7: 1. Introduction to biostatistics

Course Schedule (cont.)

WEEK / DATE and ACTIVITIES

Week 6, Monday (29/02/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: NONPARAMETRIC ANALYSIS
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: NONPARAMETRIC ANALYSIS

Week 7, Monday (07/03/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: CATEGORICAL DATA ANALYSIS: CHI SQUARE & FISHER EXACT TEST
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: CHI SQUARE & FISHER EXACT TEST

Week 7, Thursday (10/03/2016)
– TEST (8.30am - 10.00am), Dr. Mohd Razif Shahril

Week 8, Monday (14/03/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: CATEGORICAL DATA ANALYSIS: OR AND RR
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: OR AND RR

MID SEMESTER BREAK (20/03/2016 – 26/03/2016)

Page 8: 1. Introduction to biostatistics

Course Schedule (cont.)

WEEK / DATE and ACTIVITIES

Week 9, Monday (28/03/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: CORRELATION
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: CORRELATION

Week 10, Monday (04/04/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: REGRESSION
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: REGRESSION

Week 11, Monday (11/04/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: RELIABILITY
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: INTERNAL CONSISTENCY & INTRACLASS CORRELATION

Week 12, Monday (18/04/2016)
– Lecture (8.00am - 10.00am), Dr. Mohd Razif Shahril: VALIDITY
– Practical (10.00am - 12.00pm), Dr. Mohd Razif Shahril: SPSS: FACTOR ANALYSIS

Page 9: 1. Introduction to biostatistics

Course Schedule (cont.)

WEEK / DATE and ACTIVITIES

Week 13, Monday (25/04/2016)
– Project (8.00am - 11.00am), Dr. Mohd Razif Shahril: REPORT PREPARATION

Week 14, Monday (02/05/2016)
– Project (8.00am - 11.00am), Dr. Mohd Razif Shahril: REPORT SUBMISSION

STUDY WEEK (06/05/2016 – 09/05/2016)

END OF SEMESTER EXAMINATION (10/05/2016 – 26/05/2016)

Location: Makmal Komputer Khadijah (Khadijah Computer Lab)

Page 10: 1. Introduction to biostatistics

Assessment

• Continuous Assessment
– Test (30%)
– Project Report (1500 words) (30%)
• Summative Assessment
– End of Semester Examination (40%)

Page 11: 1. Introduction to biostatistics

Main References

Page 12: 1. Introduction to biostatistics

ATTENDANCE

LESS THAN 80% ATTENDANCE: BARRED FROM EXAMINATION

[NO MC (MEDICAL CERTIFICATE) FROM PRIVATE CLINICS!]

Page 13: 1. Introduction to biostatistics

Topic Learning Outcomes

At the end of this lecture, students should be able to:

• define data and types of data.
• define descriptive statistics, variables and scales.
• explain types of central tendency and dispersion measurements.

Page 14: 1. Introduction to biostatistics

What is DATA?

• Definition: A collection of items of information

• Types of data

– Qualitative

– Quantitative

Page 15: 1. Introduction to biostatistics

What is DATA? (cont)

• Qualitative data
– Observations or information characterized by measurement on a categorical scale (dichotomous, nominal or ordinal scale).
– Data that describe a quality of the subject studied.
  • E.g. gender, ethnicity, death or survival, nationality etc.
– Generally described in terms of percentages or proportions.
– Mostly displayed using contingency tables, pie charts and bar charts.
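As a rough illustration of describing qualitative data with counts and percentages, here is a minimal Python sketch. The `ethnicity` list is a made-up sample, not data from the course, and the course itself uses SPSS rather than Python.

```python
from collections import Counter

# Hypothetical sample: ethnicity recorded for 10 respondents (made-up values).
ethnicity = ["Malay", "Chinese", "Malay", "Indian", "Malay",
             "Chinese", "Malay", "Malay", "Indian", "Chinese"]

counts = Counter(ethnicity)   # frequency of each category
n = len(ethnicity)            # total number of observations

# Percentage of respondents falling in each category
percentages = {category: 100 * count / n for category, count in counts.items()}

# Print a simple frequency table, most common category first
for category, count in counts.most_common():
    print(f"{category}: n = {count} ({percentages[category]:.1f}%)")
```

The same counts are what a pie chart or bar chart of this variable would display.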

Page 16: 1. Introduction to biostatistics

What is DATA? (cont)

• Quantitative data
– Data in numerical quantities such as continuous measurements or counts.
– Observations for which the differences between numbers have meaning on a numerical scale.
– They measure the quantity of something.
– Types of numerical scales:
  • Continuous scale (e.g. age, height)
  • Discrete scale (e.g. number of pregnancies)
– Described in terms of means and standard deviations.
– Frequency tables and histograms are most often used to display this type of information.

Page 17: 1. Introduction to biostatistics

How to analyse DATA?

• Using STATISTICS!
– A small representative ‘sample’ is used to study a big ‘population’
• Why?
– It is expensive to conduct a very large study
– It is impossible to collect information from everyone in the population
• Types of statistics
– Descriptive statistics
– Inferential statistics

Page 18: 1. Introduction to biostatistics

How to analyse DATA? (cont)

• Population: a complete collection of data on the group under study
• Sample: a collection of sampling units selected from the population
• Sampling unit: a member of the population

Figure labels: Population (N = 20,000), Sample (n = 200), Sampling unit (n = 1)
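The relationship between population, sample and sampling unit can be sketched in Python as follows. The slide gives only the sizes (N = 20,000 and n = 200). The use of simple random sampling, the integer IDs and the fixed seed are assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable (an arbitrary choice)

N = 20_000                     # population size, as on the slide
population = list(range(N))    # hypothetical IDs, one per sampling unit

n = 200                        # sample size, as on the slide
# Simple random sampling without replacement (assumed; the slide names no method)
sample = random.sample(population, n)

print(len(sample), len(set(sample)))      # every sampled unit is distinct
print(f"sampling fraction = {n / N:.1%}")
```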

Page 19: 1. Introduction to biostatistics

Descriptive statistics

• Describe the frequency and distribution of data collected from a sample, in order to characterize the population that the sample represents.
• E.g.
– Percentage of patients attending a diabetes clinic
– Gender, age group and education level of the patients
– Patients' waiting time for a doctor's consultation
– Patients' fasting glucose and HbA1c levels
– Etc.

Page 20: 1. Introduction to biostatistics

Variable

• A variable is a characteristic that can take on different values for different members of the group under study
– E.g. a group of university students will be found to differ in gender, height, attitudes, intelligence and many other ways. These characteristics are called variables.
• Categories of variable
– Continuous vs. Discrete
– Independent vs. Dependent

Page 21: 1. Introduction to biostatistics

Variable (cont.)

• Continuous variable
– Can take on any value on the measurement scale under study
– Does not fit into a finite number of categories
– Referred to as measurement data
– E.g. weight, height, age, blood pressure etc.
• Discrete variable
– Takes only designated values or integer values, i.e. 1, 2, 3…
– Fits into limited categories
– Referred to as count data (dichotomous / multichotomous)
  • E.g. dichotomous: Male-Female, Yes-No
  • E.g. multichotomous: Malay-Chinese-Indian, Man Utd-Arsenal-Chelsea-Man City-Liverpool

Page 22: 1. Introduction to biostatistics

Variable (cont.)

• Independent variable (IV)
– Manipulated in accordance with the purpose of the investigation
– Set by the researcher
• Dependent variable (DV)
– Consequence of the independent variable
– Affected by the independent variable
– Outcome

Page 23: 1. Introduction to biostatistics

Scales

• Types of scales
– Nominal – classifies observations into categories that cannot be numerically arranged (no order)
– Ordinal – assigns order to categories so that one category is higher than another
– Interval / ratio – sequential ranking of values (as in ordinal scales), with equal and meaningful differences between consecutive values; a ratio scale also has a true zero

Page 24: 1. Introduction to biostatistics

Measurement of central tendency

Mean

Mode

Median

Page 25: 1. Introduction to biostatistics

Mode

• The most frequently occurring value in a set of discrete data
• There can be more than one mode if two or more values are equally common
• E.g.
– Raw data: 1,3,4,7,2,5,9,4,6,7,8,9,3,4,9,6,4,5,2,1,6,6,7,4,3
– Ordered data: 1,1,2,2,3,3,3,4,4,4,4,4,5,5,6,6,6,6,7,7,7,8,9,9,9
– Mode = 4 (it occurs five times, more often than any other value)
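The mode of the 25 values in this example can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the course itself uses SPSS. The short second list is a made-up example showing what happens when two values tie.

```python
from statistics import mode, multimode

# The 25 values from the slide's example
data = [1, 3, 4, 7, 2, 5, 9, 4, 6, 7, 8, 9, 3,
        4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3]

print(mode(data))        # single most frequent value
print(multimode(data))   # list of all most frequent values

# Made-up data in which 1 and 2 are equally common: there are two modes
print(multimode([1, 1, 2, 2, 3]))
```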

Page 26: 1. Introduction to biostatistics

Median

• The value halfway through the ordered data set
• Generally a good descriptive measure of location, which works well for skewed data or data with outliers
• E.g. (n=25)
– Raw data: 3,4,7,2,5,1,9,4,6,7,8,9,3,4,9,6,4,5,2,1,6,6,7,4,3
– Ordered data: 1,1,2,2,3,3,3,4,4,4,4,4,5,5,6,6,6,6,7,7,7,8,9,9,9
– Median = 5 (the 13th of the 25 ordered values)
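A short Python sketch of the same calculation: sort the values, pick the middle one, and compare with `statistics.median`. The final lines replace the largest value with 900, a made-up outlier, to illustrate why the median is described as working well for data with outliers.

```python
from statistics import median

# The 25 values from the slide's example
data = [3, 4, 7, 2, 5, 1, 9, 4, 6, 7, 8, 9, 3,
        4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3]

ordered = sorted(data)
n = len(ordered)
middle_value = ordered[n // 2]   # n is odd, so index 12 holds the 13th value

print(n, middle_value, median(data))

# Replace the largest value with a made-up extreme value: the median is unchanged
with_outlier = ordered[:-1] + [900]
print(median(with_outlier))
```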

Page 27: 1. Introduction to biostatistics

Mean

• The sample mean is an estimator of the population mean.
• Its value depends equally on all of the data, which may include outliers.
• E.g. (n=10) (3, 4, 7, 2, 5, 7, 5, 5, 1, 2)

Mean = (3+4+7+2+5+7+5+5+1+2) / 10 = 41 / 10 = 4.1
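The arithmetic can be reproduced in Python as a quick check: the ten listed values sum to 41, so the mean is 4.1. The extra value of 100 is a made-up outlier, added only to illustrate how strongly a single extreme value pulls the mean compared with the median.

```python
from statistics import mean, median

# The ten values listed in the example
data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]

print(sum(data), len(data), mean(data))   # total, n, and mean

# Add one made-up extreme value and compare the two measures
with_outlier = data + [100]
print(round(mean(with_outlier), 2), median(with_outlier))
```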

Page 28: 1. Introduction to biostatistics

Measurement of dispersion

• Used to describe the variability (spread and dispersion) in a given sample
• Dispersion measurements:
– Range
– Percentiles
– Variance
– Standard deviation
– Standard error
– Interquartile range

Page 29: 1. Introduction to biostatistics

Measurement of dispersion (cont.)

• Range – difference between the highest and lowest values.
• Percentiles – indicate the % of individuals whose values are equal to or below a given value.
• Variance – provides information about how individuals differ within the sample.
• Standard deviation (SD) – gives information about the spread/variability of scores around the mean.
• Standard error (SE) – indicates the certainty (precision) of the mean itself.
• Interquartile range (IQR) – the distance between the 1st and 3rd quartiles.

Page 30: 1. Introduction to biostatistics

Range

• The difference between the smallest and largest values in a set of observations
• Range = (the largest value – the smallest value)
– E.g. 3,5,6,7,9,10
– Range = 10 – 3 = 7
• Uses only the extreme values and ignores the other values in the data set.
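A minimal Python sketch of the range calculation. The helper name `value_range` and the second data set are made up for illustration. The second set has very different middle values but the same range, which shows that only the two extremes matter.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

data = [3, 5, 6, 7, 9, 10]            # the example from the slide
print(value_range(data))

# Made-up data with the same extremes but different values in between
print(value_range([3, 3, 3, 10, 10, 10]))
```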

Page 31: 1. Introduction to biostatistics

Variance

• Measures spread or dispersion within a set of sample data.
• E.g. for n observations X1, X2, X3, ..., Xn with sample mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

• Therefore, the sample variance is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
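As a worked illustration, the sketch below computes a sample variance step by step. It assumes the usual definition: the sum of squared deviations from the sample mean, divided by n − 1. The data are the ten values from the mean example, reused here for convenience. The result is compared with `statistics.variance`, which uses the same n − 1 denominator.

```python
import math
from statistics import mean, variance

data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]   # ten values reused from the mean example

n = len(data)
x_bar = mean(data)                                      # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in data]   # (Xi - mean)^2 for each value
sample_variance = sum(squared_deviations) / (n - 1)     # divide by n - 1

print(round(sample_variance, 2))
print(math.isclose(sample_variance, variance(data)))    # agrees with the library
```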

Page 32: 1. Introduction to biostatistics

Standard deviation (SD)

• Measure of spread or dispersion of a set of data
• Calculated by taking the square root of the variance
• The more widely the values are spread out, the larger the standard deviation
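A small Python sketch of these points, using `statistics.stdev` (the sample standard deviation, with an n − 1 denominator). The first data set reuses the ten values from the mean example. The `tight` and `wide` lists are made-up values chosen to contrast a narrow and a wide spread.

```python
import math
from statistics import stdev, variance

data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]   # ten values reused from the mean example

sd = stdev(data)
print(round(sd, 2))
print(math.isclose(sd, math.sqrt(variance(data))))   # SD is the square root of the variance

# Two made-up samples: values close together versus values far apart
tight = [4, 5, 4, 5, 4, 5]
wide = [1, 9, 1, 9, 1, 9]
print(round(stdev(tight), 2), round(stdev(wide), 2))
```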

Page 33: 1. Introduction to biostatistics

Standard Error (of the Mean)

• The SEM quantifies the precision of the mean.
• A small SEM indicates that the sample mean is likely to be quite close to the true population mean.
• A large SEM indicates that the sample mean is likely to be far from the true population mean.
• A small SEM can be due to a large sample size rather than due to tight data.
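The slide does not state a formula. The sketch below assumes the standard definition SEM = SD / √n. The helper name `sem` is hypothetical, and the data are the ten values from the mean example. The last line keeps the SD fixed and changes only the sample size, to illustrate how a large sample alone makes the SEM small.

```python
import math
from statistics import stdev

data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]   # ten values reused from the mean example

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

sd = stdev(data)
print(f"SD = {sd:.2f}, SEM = {sem(sd, len(data)):.2f}")

# Same spread (SD = 2.0) with n = 25 and n = 400: the larger sample has the smaller SEM
print(sem(2.0, 25), sem(2.0, 400))
```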

Page 34: 1. Introduction to biostatistics

Interquartile Range (IQR)

• IQR is the distance between the 1st and 3rd quartiles.
• It is not sensitive to extreme values (outliers).
• Thus, it is usually reported together with the median for skewed distributions.
• Formula: IQR = (Q3 – Q1)
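A hedged Python sketch of the formula, using the 25 values from the mode and median examples. Quartiles can be computed under several conventions, and statistical packages do not all agree on small samples. This sketch uses the default method of `statistics.quantiles`. For this particular data set the alternative `method="inclusive"` gives the same quartiles.

```python
from statistics import quantiles

# The 25 values from the mode/median examples
data = [1, 3, 4, 7, 2, 5, 9, 4, 6, 7, 8, 9, 3,
        4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)   # Q2 is the median
print(iqr)
```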

Page 35: 1. Introduction to biostatistics

Histogram

• Displays the frequency distribution of one variable
• Very similar to the bar charts that are used for categorical data
• Consists of a set of columns with no space between them
– On the horizontal axis, the variable under consideration
– On the vertical axis, the scale of frequency of occurrence
– The area of each column represents the frequency of its class, and thus the total area of all columns equals the total frequency
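The counting behind a histogram can be sketched in plain Python: group the values into equal-width classes and count how many fall in each. The `ages` list and the class width of 10 years are made-up choices for illustration. The printed `#` bars stand in for the columns.

```python
from collections import Counter

# Hypothetical ages (years) of 20 patients; values are made up for illustration
ages = [23, 25, 31, 34, 35, 38, 41, 42, 44, 45,
        46, 47, 49, 52, 53, 55, 58, 61, 64, 72]

width = 10  # class width in years (an arbitrary choice)

# Map each age to the lower bound of its class, e.g. 34 -> 30, then count per class
frequencies = Counter((age // width) * width for age in ages)

# One text "column" per class; the class frequencies add up to the total frequency
for lower in sorted(frequencies):
    count = frequencies[lower]
    print(f"{lower}-{lower + width - 1}: {'#' * count} ({count})")
```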

Page 36: 1. Introduction to biostatistics

Histogram (cont.)

Page 37: 1. Introduction to biostatistics

Histogram (cont.)

• Unimodal – bell-shaped curve
• Symmetric about its mean – mirror image
• Mean = Median = Mode

Page 38: 1. Introduction to biostatistics

Histogram (cont.)

Page 39: 1. Introduction to biostatistics

Histogram (cont.)

Page 40: 1. Introduction to biostatistics

Boxplot (box and whisker plot)

• It is a graphical display based on percentiles of ordered data
• The plot provides information about the central tendency and the variability of the distribution
• It allows detection of outliers and assessment of the symmetry of the data set
– Outliers: values lying more than 1.5 × IQR below the 1st quartile or above the 3rd quartile, shown as circles
– Extreme outliers: values lying more than 3 × IQR below the 1st quartile or above the 3rd quartile, shown as asterisks
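The outlier rules can be sketched in Python as follows. This sketch assumes the common boxplot convention in which distances are measured from the edges of the box (Q1 and Q3) in multiples of the IQR. The data values and the helper name `classify` are made up, and quartiles come from `statistics.quantiles` with its default method, so other software may place the cut-offs slightly differently.

```python
from statistics import quantiles

# Made-up data with two unusually large values
data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 9, 15, 30]

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

def classify(x):
    """Label a value by how far it lies outside the box, in multiples of the IQR."""
    distance = max(q1 - x, x - q3)   # zero or negative when x is inside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "outlier"
    return "regular"

outliers = [x for x in data if classify(x) == "outlier"]
extremes = [x for x in data if classify(x) == "extreme outlier"]
print(q1, q3, iqr)
print(outliers, extremes)
```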

Page 41: 1. Introduction to biostatistics

Boxplot (cont.)

Page 42: 1. Introduction to biostatistics

Note

• In descriptive presentation use Mean (SD) or Median (IQR)
– Normally distributed data: use Mean (SD)
– Skewed data: use Median (IQR)
• In inferential results use Mean (SE)
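The descriptive part of this rule can be sketched as a small Python helper. The function name `describe` and its `skewed` flag are hypothetical: the sketch assumes the analyst has already judged the shape of the distribution (for example from a histogram), and it does not test normality itself. Quartiles use the default method of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles, stdev

def describe(values, skewed):
    """Summarize as 'Median (IQR)' if skewed, otherwise as 'Mean (SD)'."""
    if skewed:
        q1, _, q3 = quantiles(values, n=4)
        return f"Median (IQR) = {median(values)} ({q3 - q1})"
    return f"Mean (SD) = {mean(values):.2f} ({stdev(values):.2f})"

# Ten values from the mean example, treated here as roughly symmetric
print(describe([3, 4, 7, 2, 5, 7, 5, 5, 1, 2], skewed=False))

# The 25 values from the median example, treated here as skewed for illustration
print(describe([1, 3, 4, 7, 2, 5, 9, 4, 6, 7, 8, 9, 3,
                4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3], skewed=True))
```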

Page 43: 1. Introduction to biostatistics

Thank You