introduction to statistics rss6 2014

Introduction to Statistics Amr Albanna, MD, MSc

Upload: rss6

Post on 04-Jul-2015


Category:

Health & Medicine



TRANSCRIPT

Page 1: Introduction to statistics RSS6 2014

Introduction to Statistics

Amr Albanna, MD, MSc

Page 2: Introduction to statistics RSS6 2014

Content

• Scales of Measurement

– Categorical Variables

– Numerical Variables

• Displays of Categorical Data

– Frequencies

– Bar Graph

– Pie Chart

• Numerical Measures of Central Tendency

– Mean

– Median

– Mode

• Numerical Measures of Spread

• Association

• Correlation

• Regression

Page 3: Introduction to statistics RSS6 2014

Scales of Measurement

• Categorical Variables:

– Nominal: Categorical variable with no order (e.g. blood type A, B, AB or O).

– Ordinal: Categorical variable with an order (e.g. pain: "none", "mild", "moderate", or "severe").

• Numerical Variables:

– Interval: Quantitative data where differences are meaningful but ratios are not (e.g. calendar years 2009 and 2010: the one-year difference is meaningful, the ratio 2010/2009 is not).

– Ratio: Quantitative data where ratios are meaningful (e.g. weight: 200 lbs is twice as heavy as 100 lbs).

Page 4: Introduction to statistics RSS6 2014

Categorical Variables

• Displays of Categorical Data

– Frequencies

– Bar Graph

– Pie Chart

Page 5: Introduction to statistics RSS6 2014

Categorical Variables

Variable (Sex)    Frequency    Proportion
Male              609          0.61
Female            391          0.39
Total             1000         1.00

[Figure: bar graph and pie chart of frequency by sex (Male, Female); bar graph frequency axis from 0 to 700]
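
The frequency table can be reproduced with a short sketch. It assumes the raw data are simply 609 "Male" and 391 "Female" labels, as the table's counts imply; the variable names are illustrative and not from the slides.

```python
from collections import Counter

# Rebuild the raw observations from the table's counts (609 male, 391 female).
observations = ["Male"] * 609 + ["Female"] * 391

counts = Counter(observations)   # frequency of each category
total = sum(counts.values())     # total number of observations

# Proportion = category frequency / total; the proportions sum to 1.
proportions = {sex: n / total for sex, n in counts.items()}

for sex, n in counts.items():
    print(f"{sex:<8}{n:>6}{proportions[sex]:>8.2f}")
print(f"{'Total':<8}{total:>6}{sum(proportions.values()):>8.2f}")
```

The proportions column sums to 1 (or 100 if expressed as percentages), which is a quick check on any frequency table.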

Page 6: Introduction to statistics RSS6 2014

Bar Graph

Page 7: Introduction to statistics RSS6 2014

Numerical Variables

Central Tendency

Numerical Spread

Page 8: Introduction to statistics RSS6 2014

Measures of Central Tendency

• The 3 M's

– Mean

– Median

– Mode

Page 9: Introduction to statistics RSS6 2014

Measures of Central Tendency

Sample Mean

The sample mean, $\bar{x}$, is the sum of all values in the sample divided by the total number of observations, n, in the sample.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Page 10: Introduction to statistics RSS6 2014

Example: Sample Mean

• Mean systolic blood pressure

Scenario 1:

Mean = (120 + 135 + 115 + 110 + 105 + 140)/6 = 725/6 ≈ 121

Subjects BP

1 120 (x1)

2 135 (x2)

3 115 (x3)

4 110 (x4)

5 105 (x5)

6 140 (x6)

Page 11: Introduction to statistics RSS6 2014

Sample Mean

• The mean is affected by extreme observations and is not a resistant measure.

Scenario 2:

Mean = (120 + 135 + 115 + 110 + 105 + 140 + 280)/7 = 1005/7 ≈ 144

Subjects BP

1 120 (x1)

2 135 (x2)

3 115 (x3)

4 110 (x4)

5 105 (x5)

6 140 (x6)

7 280 (x7)
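
A minimal sketch of the two scenarios, using the blood pressure values from the tables. The function name `sample_mean` is an illustrative choice, not something from the slides.

```python
def sample_mean(values):
    """Sum of all values divided by the number of observations, n."""
    return sum(values) / len(values)

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]   # same subjects plus one extreme reading

mean_1 = sample_mean(scenario_1)  # 725 / 6
mean_2 = sample_mean(scenario_2)  # 1005 / 7

print(round(mean_1), round(mean_2))
```

A single reading of 280 pulls the mean above every one of the original six values, which is what "not resistant" means in practice.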

Page 12: Introduction to statistics RSS6 2014

Median

• The sample median, M, is the number such that half the values in the sample are smaller and the other half are larger.

• Use the following steps to find M.

– Sort the data (arrange in increasing order).

– Is the size of the data set n even or odd?

– If odd: M = value in the exact middle.

– If even: M = the average of the two middle numbers.
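
The steps above can be sketched as a small function. The name `sample_median` and the toy inputs are illustrative assumptions.

```python
def sample_median(values):
    """Median by the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    middle = n // 2
    if n % 2 == 1:                    # odd n: the value in the exact middle
        return ordered[middle]
    # even n: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = sample_median([3, 1, 2])      # sorted: 1, 2, 3
even_example = sample_median([4, 1, 3, 2])  # sorted: 1, 2, 3, 4
print(odd_example, even_example)
```

Sorting comes first: picking the middle of the unsorted list gives a position in the input, not the median.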

Page 13: Introduction to statistics RSS6 2014

Example: Sample Median

• Median systolic BP:

Scenario 1: 120, 135, 115, 110, 105, 140. Sorted: 105, 110, 115, 120, 135, 140. Median = (115 + 120)/2 = 117.5

Scenario 2: 120, 135, 115, 110, 105, 140, 280. Sorted: 105, 110, 115, 120, 135, 140, 280. Median = 120

• The median is only slightly affected by extreme observations and is a resistant measure.
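
As a check on the two scenarios, here is a short sketch using Python's standard `statistics.median`, which sorts the data before taking the middle value. The variable names are illustrative.

```python
import statistics

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]   # one extreme reading added

# sorted scenario 1: 105, 110, 115, 120, 135, 140 -> average of 115 and 120
median_1 = statistics.median(scenario_1)
# sorted scenario 2: 105, 110, 115, 120, 135, 140, 280 -> middle value
median_2 = statistics.median(scenario_2)

print(median_1, median_2)
```

Adding the reading of 280 moves the median by only 2.5 units, whereas the mean of the same data moves by more than 20.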

Page 14: Introduction to statistics RSS6 2014

Mode

• The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode).

• This is the only measure of center which can also be used for categorical data.

• The population mode is the highest point on the population distribution.
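
A minimal sketch of the sample mode, using `statistics.multimode` from the Python standard library (Python 3.8+). Both data sets below are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical categorical sample: the mode works where mean and median do not.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]

# Hypothetical numerical sample with two equally frequent values.
readings = [105, 110, 110, 120, 120]

# multimode returns every most-frequent value, in order of first appearance.
blood_modes = statistics.multimode(blood_types)
reading_modes = statistics.multimode(readings)

print(blood_modes, reading_modes)
```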

Page 15: Introduction to statistics RSS6 2014

Symmetric Data Distribution

[Figure: histogram of a symmetric distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6)]

Page 16: Introduction to statistics RSS6 2014

Rightward Skewness of Data

[Figure: histogram of a right-skewed distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6); marked from left to right: Mode, Median, Mean]

Page 17: Introduction to statistics RSS6 2014

Leftward Skewness of Data

[Figure: histogram of a left-skewed distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6); marked from left to right: Mean, Median, Mode]
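
The ordering of the three measures under skew can be illustrated with a small sketch. The two samples are invented for this purpose and are not the data behind the slides' histograms; the ordering shown is typical of skewed data, not guaranteed for every data set.

```python
import statistics

# Made-up right-skewed sample: most values are low, with a long tail to the right.
right_skewed = [10, 10, 10, 20, 20, 30, 50]
# Mirror image of the same sample: long tail to the left.
left_skewed = [60 - x for x in right_skewed]

def centers(values):
    """Return (mode, median, mean) for a sample."""
    return (statistics.mode(values),
            statistics.median(values),
            statistics.mean(values))

r_mode, r_median, r_mean = centers(right_skewed)
l_mode, l_median, l_mean = centers(left_skewed)

print(r_mode, r_median, round(r_mean, 1))   # mode < median < mean
print(l_mode, l_median, round(l_mean, 1))   # mean < median < mode
```

The mean is pulled furthest toward the tail, the mode stays at the peak, and the median sits between them.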

Page 18: Introduction to statistics RSS6 2014

Numerical Measures of Spread

• Range

• Sample Variance

• Inter Quartile Range (IQR)

Page 19: Introduction to statistics RSS6 2014

Numerical Measures of Spread

Range: The range of the data set is the difference between the highest value and the lowest value.

– Range = highest value - lowest value

– Easy to compute BUT ignores a great deal of information.

– Obviously the range is affected by extreme observations and is not a resistant measure.
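
A minimal sketch of the range, reusing the systolic blood pressure values from the earlier mean example. The function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]   # one extreme reading added

range_1 = value_range(scenario_1)  # 140 - 105
range_2 = value_range(scenario_2)  # 280 - 105

print(range_1, range_2)
```

Only the two extreme values enter the calculation, so one unusual observation changes the range from 35 to 175.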

Page 20: Introduction to statistics RSS6 2014

Numerical Measures of Spread

• Variance: equal to the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample.
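
The definition can be written out step by step. This is a sketch that assumes the six blood pressure values from the earlier mean example; `sample_variance` is an illustrative name, and the result is cross-checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

bp = [120, 135, 115, 110, 105, 140]
s2 = sample_variance(bp)

# Cross-check against the standard library implementation.
matches_stdlib = math.isclose(s2, statistics.variance(bp))
print(round(s2, 2), matches_stdlib)
```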

Page 21: Introduction to statistics RSS6 2014

Numerical Measures of Spread

• Percentile: The pth percentile of a distribution is the value at or below which p percent of the observations fall.

Page 22: Introduction to statistics RSS6 2014

Numerical Measures of Spread

• The most commonly used percentiles are the quartiles.

1st quartile Q1 = 25th percentile.

2nd quartile Q2 = 50th percentile.

3rd quartile Q3 = 75th percentile.

Page 23: Introduction to statistics RSS6 2014

Numerical Measures of Spread

Inter Quartile Range (IQR)

A simple measure of spread, giving the range covered by the middle half of the data, is the interquartile range (IQR), defined below.

IQR = Q3 - Q1

The IQR is a resistant measure of spread.
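
A sketch of the IQR on the blood pressure values from the earlier examples. It uses `statistics.quantiles` with its default "exclusive" method; this is one of several quartile conventions, so other software may report slightly different quartiles for the same data.

```python
import statistics

def iqr(values):
    """IQR = Q3 - Q1, using the statistics module's default quartile method."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]   # one extreme reading added

iqr_1 = iqr(scenario_1)
iqr_2 = iqr(scenario_2)

print(iqr_1, iqr_2)
```

The extreme reading of 280 shifts the IQR by only 2.5 units, while the range of the same data grows from 35 to 175.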

Page 24: Introduction to statistics RSS6 2014

Numerical Measures of Spread

Outliers: extreme observations that fall well outside the overall pattern of the distribution.

• An outlier may be the result of:

– A recording error,

– An observation from a different population,

– An unusual extreme observation (biological diversity)
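
The slides do not state a rule for identifying outliers. One widely used convention, assumed here and not taken from the slides, flags values more than 1.5 × IQR below Q1 or above Q3. The sketch applies it to the blood pressure values from the earlier examples; `flag_outliers` and the parameter `k` are illustrative names.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < lower or x > upper]

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]

outliers_1 = flag_outliers(scenario_1)
outliers_2 = flag_outliers(scenario_2)

print(outliers_1, outliers_2)
```

Flagging a value only marks it for review; whether it is a recording error, a member of a different population, or a genuine extreme still has to be judged from context.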

Page 25: Introduction to statistics RSS6 2014

Numerical Measures of Spread

Page 26: Introduction to statistics RSS6 2014

Association Between Variables

• Explanatory (exposure) variable “X”

• Response (outcome) variable “Y”

Page 27: Introduction to statistics RSS6 2014

Association Between Variables

Page 28: Introduction to statistics RSS6 2014

Association Between Variables

Page 29: Introduction to statistics RSS6 2014

Association Between Variables

Page 30: Introduction to statistics RSS6 2014

Measurement of Correlation

Page 31: Introduction to statistics RSS6 2014

Correlation is NOT Association

Page 32: Introduction to statistics RSS6 2014

Regression