BIOL2608 Biometrics 2011-2012 Computer lab session II Basic concepts in statistics

Upload: cornelia-peters

Post on 17-Dec-2015




Measures of central tendency

• Also known as measures of location

• Indicate the location of the population/sample along the measurement scale

• Useful for describing and comparing populations

[Figure: measurements plotted along a scale from 10.0 to 16.0 cm]


Mean (= Arithmetic mean)

• Commonly called average

• Sum of all measurements in the population/sample divided by the population/sample size

Mean = (10.5 + 11.5 × 2 + 12.0 + 12.5 + 13.0 × 3 + 13.5 × 2 + 14.0 + 14.5 + 15.0) / 13 = 12.88 cm
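
The same calculation can be checked with a short Python sketch. The list name `lengths_cm` is an assumption introduced here for illustration; the values are the 13 measurements shown on the slides.

```python
# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# Arithmetic mean: sum of all measurements divided by the sample size.
sample_mean = sum(lengths_cm) / len(lengths_cm)

print(f"n = {len(lengths_cm)}, mean = {sample_mean:.2f} cm")  # n = 13, mean = 12.88 cm
```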


Median

• Middle measurement in an ordered dataset

10.5 11.5 11.5 12.0 12.5 13.0 13.0 13.0 13.5 13.5 14.0 14.5 15.0

Median = the middle (7th) of the 13 ordered measurements = 13.0 cm
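
As a minimal sketch, the median can be found by sorting the data and taking the middle value. The name `lengths_cm` is illustrative. With an even number of measurements there is no single middle value, and the usual convention (followed by Python's `statistics.median`) is to average the two middle values.

```python
from statistics import median

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

ordered = sorted(lengths_cm)
middle_index = len(ordered) // 2   # 0-based index 6 is the 7th value when n = 13

print(ordered[middle_index])       # 13.0
print(median(lengths_cm))          # 13.0, same result from the standard library
```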


Quartile

• Divides an ordered dataset into four equal fractions

– 1/4 of the data is smaller than the 1st quartile (Q1)

– 1/4 lies between Q1 and Q2

– 1/4 lies between Q2 and Q3

– 1/4 is bigger than Q3

10.5 11.5 11.5 12.0 12.5 13.0 13.0 13.0 13.5 13.5 14.0 14.5 15.0

Q1 = 11.63 cm; Q2 = Median = 13.0 cm; Q3 = 13.88 cm
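
When a quartile falls between two observations it has to be interpolated, and software packages differ in the rule they use. The slide does not state which rule produced Q1 = 11.63 and Q3 = 13.88, so the sketch below, which uses the two rules offered by Python's `statistics.quantiles`, is not expected to reproduce those values exactly. The median is the same under both rules. The name `lengths_cm` is illustrative.

```python
from statistics import quantiles

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# n=4 asks for the three cut points Q1, Q2 and Q3.
q_exclusive = quantiles(lengths_cm, n=4, method="exclusive")
q_inclusive = quantiles(lengths_cm, n=4, method="inclusive")

print(q_exclusive)  # [11.75, 13.0, 13.75]
print(q_inclusive)  # [12.0, 13.0, 13.5]
```

With only 13 measurements the choice of rule shifts Q1 and Q3 by a few tenths of a centimetre, so it is worth noting which rule a given package applies.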


Percentile

• Divides an ordered dataset into 100 equal fractions

– 25th percentile = 1st quartile

– 50th percentile = 2nd quartile = median

– 75th percentile = 3rd quartile
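
The correspondence between percentiles and quartiles can be checked numerically. This is a sketch using Python's `statistics.quantiles` with its default interpolation rule, which is an assumption and may differ from the rule used in the lab software; `lengths_cm` is an illustrative name.

```python
from statistics import quantiles

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# n=100 returns the 99 cut points P1..P99; index k - 1 holds the k-th percentile.
percentiles = quantiles(lengths_cm, n=100)
quartiles = quantiles(lengths_cm, n=4)

for k, q in zip((25, 50, 75), quartiles):
    print(f"P{k} = {percentiles[k - 1]:.2f} cm, matching quartile = {q:.2f} cm")
```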


Measures of dispersion and variability

• Indicate how the measurements spread around the centre of the distribution

[Figure: Sample A and Sample B plotted along a scale from 10.0 to 16.0 cm]


Variance and standard deviation

                          Sample A    Sample B
Variance (s²)             1.17 cm²    2.67 cm²
Standard deviation (s)    1.08 cm     1.63 cm
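
The raw values of Samples A and B are not given in this transcript, so the sketch below uses the 13 measurements from the earlier slides instead; its results are not expected to match the table. It relies on Python's `statistics.variance` and `statistics.stdev`, which divide by n − 1 (the sample formulae). The name `lengths_cm` is illustrative.

```python
from statistics import stdev, variance

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

s2 = variance(lengths_cm)  # sample variance, in squared units (cm^2)
s = stdev(lengths_cm)      # sample standard deviation = square root of s2 (cm)

print(f"s^2 = {s2:.2f} cm^2, s = {s:.2f} cm")  # s^2 = 1.63 cm^2, s = 1.28 cm
```

The standard deviation is often easier to interpret than the variance because it is in the same units as the measurements.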


Population or sample?


• Population

– Entire collection of measurements in which one is interested

– Often large and hard to obtain all measurements

• Sample

– Subset of all measurements in the population

[Figure: sampling draws a sample from a population of very large size; inference uses the sample to draw conclusions about the population]

Commonly used symbols

                      Population    Sample
Mean                  μ             x̄
Size                  N             n
Variance              σ²            s²
Standard deviation    σ             s


Estimation of mean

• Confidence interval

– Allows us to express the precision of the estimate of the population mean (μ) from the sample mean (x̄)

– When we say that, at the 95% confidence level, μ = x̄ ± y, it means that we are 95% confident that μ lies between x̄ − y and x̄ + y
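
The slide does not say how the half-width y is obtained. A common approach, assumed here, is y = t × s / √n, where t is the two-tailed critical value of Student's t distribution with n − 1 degrees of freedom. The sketch below applies this to the 13 measurements from the earlier slides; the critical value is taken from a standard t table, and all variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

n = len(lengths_cm)
x_bar = mean(lengths_cm)
standard_error = stdev(lengths_cm) / sqrt(n)

# Two-tailed critical value of Student's t for alpha = 0.05 and
# n - 1 = 12 degrees of freedom, from a standard t table (assumed method).
T_CRIT_95_DF12 = 2.179

y = T_CRIT_95_DF12 * standard_error  # half-width of the interval

print(f"95% CI: {x_bar:.2f} ± {y:.2f} cm ({x_bar - y:.2f} to {x_bar + y:.2f} cm)")
# 95% CI: 12.88 ± 0.77 cm (12.11 to 13.66 cm)
```

A larger sample or a smaller standard deviation gives a smaller y, that is, a more precise estimate of μ.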


Estimation of variance and standard deviation

• NOTE:

– Variance and standard deviation for a population are calculated using formulae slightly different from those for a sample
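
The formulae themselves did not survive in this transcript. The standard definitions, written with μ, N, σ for the population and n, s for the sample, and with x̄ for the sample mean and x_i for an individual measurement (symbols assumed here), are:

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N},
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

\sigma = \sqrt{\sigma^2},
\qquad
s = \sqrt{s^2}
```

The sample formula divides by n − 1 rather than n, which compensates for the tendency of a sample to underestimate the spread of the population it came from.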


Normal distribution

• A very common bell-shaped statistical distribution of data, which allows us to carry out a range of statistical analyses


Normality check

• 6 criteria:

– Mean & Median: Mean = Median


Histogram

Bin: the ideal bin size is obtained by dividing the range by the ideal number of bins (number of bins = 5 log n, where n is the sample size)
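
As a minimal sketch, the rule can be applied to the 13 measurements from the earlier slides. Two details are assumptions, since the slide states neither: the logarithm is taken to base 10, and the number of bins is rounded to the nearest whole number. All variable names are illustrative.

```python
from math import log10

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

n = len(lengths_cm)
# Number of bins = 5 log n (base 10 and rounding are assumptions).
num_bins = round(5 * log10(n))

low, high = min(lengths_cm), max(lengths_cm)
bin_width = (high - low) / num_bins  # range divided by the number of bins

# Count the measurements in each bin; the maximum goes into the last bin.
counts = [0] * num_bins
for x in lengths_cm:
    index = min(int((x - low) / bin_width), num_bins - 1)
    counts[index] += 1

print(num_bins, bin_width)  # 6 0.75
print(counts)               # [1, 2, 2, 3, 3, 2]
```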


Skewness

• Negative skew

– Longer left tail

– Data concentrated on the right

• Positive skew

– Longer right tail

– Data concentrated on the left


Kurtosis

• Measure of “peakedness” and “tailedness”

• Positive kurtosis (leptokurtic)

– More acute peak around mean

– Longer, fatter tails

• Negative kurtosis (platykurtic)

– Lower, wider peak around mean

– Shorter, thinner tails
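
Skewness and kurtosis can both be computed from the central moments of the data. The sketch below uses the simple moment coefficients, g1 = m3 / m2^1.5 and g2 = m4 / m2² − 3, where a normal distribution has a value of 0 for each. This choice is an assumption: statistical packages often report bias-adjusted versions, so their output may differ slightly. The function and variable names are illustrative.

```python
from statistics import fmean


def central_moment(data, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    centre = fmean(data)
    return fmean([(x - centre) ** k for x in data])


def skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment coefficient of excess kurtosis, g2 = m4 / m2 ** 2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

g1 = skewness(lengths_cm)
g2 = excess_kurtosis(lengths_cm)
print(f"skewness = {g1:.2f}, excess kurtosis = {g2:.2f}")
print("both within ±1:", abs(g1) <= 1 and abs(g2) <= 1)
```

A negative g1 indicates a longer left tail and a positive g1 a longer right tail; a positive g2 indicates a leptokurtic shape and a negative g2 a platykurtic one.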


Box plot


P-P Plot / Q-Q Plot
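
A Q-Q plot pairs each ordered observation with the value expected at the same rank under a normal distribution; for normally distributed data the dots fall close to a straight line. The sketch below computes those pairs for the 13 measurements from the earlier slides. The plotting positions (i − 0.5) / n are one common convention and are an assumption here, as are the variable names.

```python
from statistics import NormalDist

# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

ordered = sorted(lengths_cm)
n = len(ordered)

# Expected standard normal quantile for each rank i = 1..n,
# using plotting positions (i - 0.5) / n (an assumed convention).
standard_normal = NormalDist()  # mean 0, standard deviation 1
theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (theoretical, observed) pair is one dot on the Q-Q plot.
for z, x in zip(theoretical, ordered):
    print(f"{z:6.2f}  {x:5.1f}")
```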


Normality check

• 6 criteria:

Criterion               Indication of normality
Mean & Median           Mean = Median
Histogram               Like a bell shape
Skewness & Kurtosis     Within ± 1
Box plot                Symmetric
P-P plot / Q-Q plot     Dots follow the inclined straight line
Goodness of fit test    K-S one-sample test; p > 0.05


K-S one-sample test
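
The Kolmogorov-Smirnov (K-S) one-sample test compares the cumulative distribution of the sample with that of a reference distribution; its test statistic D is the largest vertical gap between the two. The sketch below computes D against a normal distribution fitted with the sample's own mean and standard deviation, which is an assumption about how the lab software is set up. Converting D into a p-value requires the K-S distribution, which is normally left to statistical software or tables and is not attempted here. The function and variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(data, dist):
    """Largest vertical gap between the empirical CDF of data and dist.cdf."""
    ordered = sorted(data)
    n = len(ordered)
    gaps = []
    for i, x in enumerate(ordered, start=1):
        f = dist.cdf(x)
        gaps.append(i / n - f)        # empirical step above the fitted curve
        gaps.append(f - (i - 1) / n)  # fitted curve above the empirical step
    return max(gaps)


# The 13 measurements (cm) from the slides; the variable name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# Reference: normal distribution with the sample mean and standard deviation.
fitted = NormalDist(mean(lengths_cm), stdev(lengths_cm))
d = ks_statistic(lengths_cm, fitted)
print(f"D = {d:.3f}")
```

A small D means the sample follows the fitted normal curve closely; under the criterion on the slides, normality is not rejected when the software reports p > 0.05.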


Related Readings

• Zar, J. H. (1999). Biostatistical Analysis, 4th edition. New Jersey: Prentice-Hall.

– Chapters 2, 3, 4, 6