examining distributions - university of virginia graphs and pie charts describe the distribution of...

Examining Distributions - Introduction Chapter 1

Upload: ngodien

Post on 15-Mar-2018


Page 1: Examining Distributions - University of Virginia graphs and pie charts describe the distribution of a categorical variable. A Pareto chart is a bar graph with categories ordered by

Examining Distributions - Introduction

Chapter 1

Variables

  A variable records characteristics of individuals (i.e., objects of interest) in its values.

  A variable’s distribution describes the counts or relative proportions of its values.

Examining Distributions - Describing Distributions with Graphs

Section 1.1

Some graphical statistics

• Bar graphs and pie charts describe the distribution of a categorical variable.

• A Pareto chart is a bar graph with categories ordered by decreasing frequency.

• Histograms are essentially bar graphs of a quantitative variable.

• Stemplots are back-of-the-envelope histograms drawn with the digits of quantitative values.

• Time plots graph time series values by time.
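As a rough illustration of the Pareto ordering described on this slide, the sketch below counts a categorical variable and lists its categories by decreasing frequency. The `responses` data and the `pareto_order` helper are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical categorical observations (not from the slides).
responses = ["bus", "car", "car", "bike", "car", "bus", "walk", "car", "bus"]

def pareto_order(values):
    """Return (category, count) pairs sorted by decreasing frequency."""
    return Counter(values).most_common()

for category, count in pareto_order(responses):
    # A text bar stands in for the bar of a Pareto chart.
    print(f"{category:<5} {'#' * count} ({count / len(responses):.0%})")
```

The counts give the distribution of the variable; dividing by the number of observations gives the relative proportions.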

Histograms

Use equal bar-widths and “eyeball” for best picture

December 2004 state unemployment rates.

(Raw data in Table 1.1 of text.)
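The equal bar-width idea can be sketched as follows. The `rates` values are made up for illustration (the actual unemployment rates are in Table 1.1 of the text and are not reproduced in this transcript), and `equal_width_counts` is a hypothetical helper name.

```python
def equal_width_counts(values, low, width, n_bins):
    """Count values in n_bins classes [low + k*width, low + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - low) // width)
        if 0 <= k < n_bins:          # ignore values outside the plotted range
            counts[k] += 1
    return counts

# Hypothetical rates in percent, NOT the data from Table 1.1.
rates = [3.1, 3.4, 4.2, 4.5, 4.8, 5.0, 5.3, 5.5, 5.9, 6.2, 7.5]
counts = equal_width_counts(rates, low=3.0, width=1.0, n_bins=5)

for k, c in enumerate(counts):
    # Every class has the same width, so bar heights are directly comparable.
    print(f"{3.0 + k:.1f}-{4.0 + k:.1f} | {'*' * c}")
```

Trying a few values of `width` and `n_bins` is the code equivalent of "eyeballing" for the best picture.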

Interpreting histograms

Visualize a smooth curve highlighting the overall pattern.

(Figure annotation: “Too much detail”)

Look for shape, center, and spread.

Distribution shapes

Symmetric distribution

Right-skewed distribution

Complex, multimodal distribution

Interpreting histograms

Look for deviations, like outliers.

(Figure labels: Alaska, Florida)

Stemplot

December 2004 state unemployment rates.

(Raw data in Table 1.1 of text.)

(Figure: stemplot with columns labeled “Stem” and “Leaves”, and a version with split stems.)
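A stemplot can be sketched in a few lines of code. The example below assumes values with one decimal place, uses the integer part as the stem and the tenths digit as the leaf, and does not split stems. The `stemplot` helper and the `rates` values are hypothetical; the slide's data are in Table 1.1 of the text.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative values with one decimal place.

    The stem is the integer part and the leaf is the tenths digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)                    # e.g. 4.3 -> 43
        leaves[tenths // 10].append(tenths % 10)  # stem 4, leaf 3
    stems = range(min(leaves), max(leaves) + 1)   # keep empty stems too
    return [f"{s} | {''.join(str(d) for d in leaves.get(s, []))}" for s in stems]

# Hypothetical rates in percent, NOT the data from Table 1.1.
rates = [3.1, 3.4, 4.2, 4.5, 4.8, 5.0, 5.3, 7.5]
lines = stemplot(rates)
print("\n".join(lines))
```

Keeping the empty stem (6 in this data) preserves the gap that a histogram would also show.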

Examining Distributions - Describing Distributions with Numbers

Section 1.2

Measure of center: the mean

Heights (in.) of 25 women

Measure of center: the median

Step 1: Sort x1, …, xn.

Step 2.a: If n is odd, M = middle value. Example: M = 3.4

Step 2.b: If n is even, M = avg. of two middle values. Example: M = (3.3+3.4)/2 = 3.35
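The median steps can be written out directly. This is a minimal sketch: the two samples are hypothetical and were chosen only so that their middle values match the 3.4 and 3.3, 3.4 shown on the slide.

```python
def median(values):
    """Median following the slide's steps: sort, then take the middle."""
    xs = sorted(values)                 # Step 1: sort
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                      # Step 2.a: odd n, single middle value
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2  # Step 2.b: even n, average the two

# Hypothetical samples (the slide's raw data are not in this transcript).
odd_sample = [4.1, 0.6, 3.4, 2.2, 6.1]
even_sample = [4.1, 0.6, 3.4, 3.3, 2.2, 6.1]

print(median(odd_sample))
print(median(even_sample))
```

Python's standard library offers the same calculation as `statistics.median`.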

Comparisons

(Figure panels: Left skew, Right skew, Symmetry)

Observe:

• The mean is “pulled” by outliers.

• The median is resistant to outliers.
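A small numerical check of these two observations, using made-up values: adding one extreme observation moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

sample = [2.2, 3.3, 3.4, 4.1, 4.4]      # hypothetical values
with_outlier = sample + [40.0]          # same data plus one extreme value

mean_shift = mean(with_outlier) - mean(sample)
median_shift = median(with_outlier) - median(sample)

print(f"mean:   {mean(sample):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(sample):.2f} -> {median(with_outlier):.2f}")
```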

Measure of spread: the quartiles

The first quartile, Q1, is the median of values below M.

The third quartile, Q3, is the median of values above M.

(Figure values: Q1 = 2.2, M = 3.4, Q3 = 4.35)

Five-number summary and boxplot

Min = 0.6, Q1 = 2.2, M = 3.4, Q3 = 4.35, Max = 6.1
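The five-number summary can be sketched with the quartile rule stated earlier (Q1 and Q3 are the medians of the values below and above the position of M). The function name and the data are hypothetical. Statistical software often uses slightly different quartile conventions, so results may differ a little from this rule.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max); needs at least two observations.

    Q1 and Q3 are the medians of the values below and above the position
    of the overall median. For odd n the median itself is left out of
    both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data (the slide's raw values are not in this transcript).
data = [0.6, 2.2, 2.9, 3.4, 4.0, 4.7, 6.1]
print(five_number_summary(data))
```

A boxplot draws exactly these five numbers: the box spans Q1 to Q3 with a line at M, and the whiskers reach Min and Max.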

Measure of spread: the standard deviation

Heights (in.) of 25 women

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Note: Calculate by computer
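In the spirit of "calculate by computer", here is a minimal sketch of the sample standard deviation, assuming the usual n - 1 divisor. The `heights` values are hypothetical, not the 25 heights from the slide, and `sample_sd` is an illustrative name. The result is cross-checked against the standard library's `statistics.stdev`.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    xbar = sum(values) / n                                      # the mean
    variance = sum((x - xbar) ** 2 for x in values) / (n - 1)   # s squared
    return math.sqrt(variance)

# Hypothetical heights in inches (NOT the slide's data).
heights = [61.0, 63.5, 64.0, 65.5, 68.0]

print(sample_sd(heights))
print(statistics.stdev(heights))   # library version, should agree
```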

Summarizing distributions

Five-number summary (Min, Q1, M, Q3, Max): resistant

Error bars: not resistant

Examining Distributions - The Normal Distributions

Section 1.3

Density curves

A density curve is a mathematical idealization of a histogram.

(Figure panels: Actual, Idealization)

“Area under the curve” ≈ proportion of observations.

Other idealizations

(Figure panels: Histogram, Density curve)

The median halves the “area under the curve”.

The mean is the balance point.

Examples

Have easy mathematical formulas

No easy formula

Normal distributions

The normal curves:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ (an “exponential” function)

Properties:

• Symmetric, single-peaked, and bell-shaped.

• Indexed by µ and σ, denoted N(µ, σ)

• µ ± σ mark inflection points.

Impact of µ and σ

Same µ, different σ

Different µ, same σ

The 68-95-99.7 Rule

If x is N(µ, σ):

• 68% of obs. within µ ± σ

• 95% of obs. within µ ± 2σ

• 99.7% of obs. within µ ± 3σ
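The rule's percentages are rounded. As a quick check, the sketch below computes the exact areas for the standard Normal curve using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # N(0, 1)

# Proportion of observations within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within mu ± {k} sigma: {p:.4f}")
```

Because any N(µ, σ) can be standardized to N(0, 1), the same proportions hold for every normal distribution.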

Standardization

A z-score measures the location of x from µ in units of σ: z = (x – µ) / σ.

Key property: If x is N(µ, σ) then z is N(0, 1), the “Standard Normal” distribution.

Benefit: To calculate an “area under the curve” for N(µ, σ), translate to a z-score and use N(0, 1).

Example calculation: heights

Problem: Heights, x, is N(64.5, 2.5).

For what proportion of individuals is x < 67?

Solution:

Ask: How far is c = 67 from µ = 64.5 in units of σ = 2.5?

(c – µ) / σ = (67 – 64.5) / 2.5 = 1

Translate: z = (x – µ) / σ is N(0, 1)

For what proportion of individuals is z < 1?

Calculate: normsdist(1) = 0.84
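The same calculation can be sketched in Python, assuming `statistics.NormalDist` (Python 3.8 or later) as a stand-in for the spreadsheet function `normsdist`. It computes the proportion both ways: directly from N(64.5, 2.5), and after translating to a z-score.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)       # x is N(64.5, 2.5)

z = (67 - heights.mean) / heights.stdev        # distance from mu in units of sigma
p_standard = NormalDist().cdf(z)               # proportion with z < 1 under N(0, 1)
p_direct = heights.cdf(67)                     # proportion with x < 67, no translation

print(f"z = {z:.2f}")
print(f"via z-score: {p_standard:.4f}")
print(f"direct:      {p_direct:.4f}")
```

Both routes give the same area, which is the point of the key property on the standardization slide.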

Example calculation: heights (cont)

68-95-99.7 rule:

Proportion with -1 < z < 1 is 0.68

Equally divide the remaining 0.32 between z < -1 and z > 1 (0.16 each)

Proportion with z < 1 is 0.16 + 0.68 = 0.84

Calculation of “area between”

Problem: Proportion with c1 < z < c2

Solution: (prop. with z < c2) – (prop. with z < c1)

Example: Proportion with 1.4 < z < 2.2.

normsdist(2.2) – normsdist(1.4)

= 0.9861 – 0.9192 = 0.0669
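A minimal sketch of the "area between" calculation, again assuming `statistics.NormalDist` in place of `normsdist`; `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_between(c1, c2, dist=NormalDist()):
    """Proportion with c1 < z < c2: area left of c2 minus area left of c1."""
    return dist.cdf(c2) - dist.cdf(c1)

area = proportion_between(1.4, 2.2)
print(f"{area:.4f}")
```

Passing a different `dist`, such as `NormalDist(64.5, 2.5)`, applies the same subtraction to any N(µ, σ) without standardizing first.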

Backward calculations

Problem: For what c is p the proportion with z < c?

Solution: c = normsinv(p)

Examples:

normsinv(0.84) ≈ 1

normsinv(0.16) ≈ -1

(Figure: standard Normal curve with areas 0.16, 0.68, 0.16)
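The backward calculation corresponds to the inverse of the cumulative distribution function. The sketch below assumes `NormalDist.inv_cdf` from the Python standard library as the counterpart of `normsinv`. Because 0.84 and 0.16 are rounded proportions, the results are close to, but not exactly, 1 and -1.

```python
from statistics import NormalDist

standard_normal = NormalDist()              # N(0, 1)

c_upper = standard_normal.inv_cdf(0.84)     # c with proportion 0.84 below it
c_lower = standard_normal.inv_cdf(0.16)     # c with proportion 0.16 below it

print(f"{c_upper:.3f} {c_lower:.3f}")
```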

Example calculation: mpg

Problem: MPG, x, of compact cars is N(25.7, 5.88).

For what c do 10% of compact cars have x > c?

Solution: First, normsinv(0.90) = 1.28

Translate: z = (x – µ) / σ is N(0, 1)

10% of compact cars have z > 1.28 = (c – µ) / σ

Solve: 1.28 = (c – 25.7) / 5.88

⇒ c = 25.7 + (1.28)(5.88) = 33.2
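The mpg example can be reproduced as a sketch with `statistics.NormalDist`, which is assumed here as a replacement for `normsinv`. It follows the slide's route (find the z cutoff, then solve for c) and compares it with a direct inverse calculation on N(25.7, 5.88).

```python
from statistics import NormalDist

mpg = NormalDist(mu=25.7, sigma=5.88)     # x is N(25.7, 5.88)

# 10% above c means 90% below c, so look up the 0.90 point of N(0, 1).
z_cut = NormalDist().inv_cdf(0.90)

# Undo the standardization: z = (c - mu) / sigma  =>  c = mu + z * sigma.
c = mpg.mean + z_cut * mpg.stdev

# Direct route, without translating to z first.
c_direct = mpg.inv_cdf(0.90)

print(f"z cutoff = {z_cut:.2f}, c = {c:.1f}")
```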

Examining Relationships - Scatterplots

Section 2.1

Examining relationships

Often, individuals are measured in more than one variable.

Follow the same approach as before:

• Plot data and calculate numerical summaries

• Look for overall patterns and deviations

• Consider suitability of mathematical models (later)

Examining relationships

Additional considerations:

• Do some variables tend to vary together?

• Do some variables explain variability in another?

Definitions:

• A response variable measures or records an outcome of a study. (Also: y, dependent variable.)

• An explanatory variable explains changes in the response variable. (Also: x, independent variable.)

Scatterplots

A scatterplot is a graph of two quantitative variables measured on the same set of individuals.

If appropriate:

• response variable on y-axis

• explanatory variable on x-axis

Example

Beers Drank   Blood Alcohol
5             0.10
2             0.03
9             0.19
7             0.10
3             0.07
3             0.02
4             0.07
5             0.09
8             0.12
3             0.04
5             0.06
5             0.05
6             0.10
7             0.09
1             0.01
4             0.05

Interpretation: form

Linear

Nonlinear

No relationship

Interpretation: direction

Negative: high x ↔ low y, low x ↔ high y

Positive: high x ↔ high y, low x ↔ low y

Interpretation: strength

A stronger relationship has points falling more closely to a clear form.

(Figure panels: Perfect linear, Less strong)

Outlier

An outlier (of the relationship) is a point that falls off the trend.

(Figure labels: “Outlier in x and y but not of the relationship”, “Outlier of the relationship”)

Examining Relationships - Correlation

Section 2.2

Measure of direction and strength: correlation

Beers Drank   Blood Alcohol
5             0.10
2             0.03
9             0.19
7             0.10
3             0.07
3             0.02
4             0.07
5             0.09
8             0.12
3             0.04
5             0.06
5             0.05
6             0.10
7             0.09
1             0.01
4             0.05

Note: Calculate by computer
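The slide's formula for r did not survive in this transcript, so the sketch below assumes the standard Pearson correlation coefficient and applies it to the beers and blood alcohol values listed on the slide. The function name `pearson_r` is illustrative.

```python
import math

# Data as listed on the slide.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.10, 0.07, 0.02, 0.07, 0.09,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

For these values r comes out a little under 0.9, a strong positive linear relationship. On Python 3.10 or later, `statistics.correlation` performs the same calculation.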

Examples

Properties

  -1 ≤ r ≤ 1, always

  Response and explanatory variables are interchangeable

  Unitless, and independent of variables’ units.

  r is not resistant.

Properties (cont.)

  Interprets only linear relationships

Linear: r is appropriate

Non-linear: r may mislead
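The properties of r listed on the last two slides can be checked numerically. This is only a sketch on made-up data, assuming the standard Pearson formula; `pearson_r` and all values below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, nearly linear data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                       # roles of x and y exchanged
r_rescaled = pearson_r([a * 100 for a in x], y)   # x in different units
r_outlier = pearson_r(x + [20], y + [0.0])        # one extreme point added

# An exact but non-linear relationship: y = x squared on a symmetric range.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [a * a for a in xs])

print(f"{r:.3f} {r_swapped:.3f} {r_rescaled:.3f} {r_outlier:.3f} {r_curve:.3f}")
```

Swapping or rescaling the variables leaves r unchanged, a single outlier changes it drastically, and the perfectly predictable curved relationship gives r = 0. The last case is why r may mislead when the form is not linear.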