Data Presentation Agenda

Upload: chars

Post on 06-Feb-2016

TRANSCRIPT

Page 1: Data Presentation Agenda

Part 1: Data Presentation1-1/35

Data Presentation Agenda

Data and Data Types
Representing Data: pie chart, bar chart
Summarizing Data: box plot, histogram
    Central tendency
    Spread
    Distribution (shape)

Page 2: Data Presentation Agenda

Data = A Set of Facts

A picture of some aspect of the world

Pizza Sales by Type

What do the data tell you?

How can you use the information?

What additional information would make these data more informative?

Page 3: Data Presentation Agenda

Data Types and Measurement

Quantitative
    Discrete = count: Number of car accidents by city by time
    Continuous = measurement: Housing prices

Qualitative
    Categorical: Shopping mall, car brand, trip mode
    Ordinal: Survey data on attitudes; “How do you feel about…?” Strongly disagree, Disagree, Neutral, Agree, Strongly agree
    Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.

Frameworks
    Cross section
    Time series

Page 4: Data Presentation Agenda

Problem with Ordered Survey Response Data

Stern Students’ Ranking of Subway Safety (1994)*

Safety   Count   Percent   Cum Pct
1        17      27.87      27.87
2        15      24.59      52.46
3        17      27.87      80.33
4        10      16.39      96.72
5         2       3.28     100.00
N = 61

Scale: 1 = Very Unsatisfactory, 2 = Unsatisfactory, 3 = OK, 4 = Satisfactory, 5 = Very Satisfactory

Is there an objective meaning to “3” on some standard scale?
Does everyone’s “1” or “2” or “3” … mean the same thing?

* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
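
The Percent and Cum Pct columns of the safety table follow directly from the counts. A minimal Python sketch of that arithmetic, using the counts shown on the slide (the variable names are illustrative, not from the source):

```python
# Counts of responses 1..5 from the subway safety survey on the slide.
counts = {1: 17, 2: 15, 3: 17, 4: 10, 5: 2}

total = sum(counts.values())  # number of respondents

rows = []
running = 0
for rating, count in counts.items():
    running += count                      # cumulative count so far
    percent = 100 * count / total
    cum_pct = 100 * running / total
    rows.append((rating, count, round(percent, 2), round(cum_pct, 2)))

print(f"{'Safety':>6} {'Count':>5} {'Percent':>8} {'Cum Pct':>8}")
for rating, count, percent, cum_pct in rows:
    print(f"{rating:>6} {count:>5} {percent:>8.2f} {cum_pct:>8.2f}")
```

Counts and percentages use only the ordering of the categories. Averaging the codes 1 to 5 would also assume the categories are equally spaced, which is the concern the slide raises.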

Page 5: Data Presentation Agenda

Representing Data

In raw form
Transformed to a visual form
Summarized graphically
Summarized statistically

Page 6: Data Presentation Agenda

Pie Chart

[Figure: Pie Chart of Percent vs Type. Pizza Pies Sold, by Type]

Type                  Percent
Pepperoni             21.8%
Plain                 32.5%
Mushroom              16.2%
Sausage                5.8%
Pepper and Onion       7.3%
Mushroom and Onion     9.2%
Garlic                 2.3%
Meatball               5.0%

Page 7: Data Presentation Agenda

Data Representation

[Figure: BAR CHART (Chart of Number vs Type; vertical axis: Number, 0 to 4000) shown beside PIE CHART (Pie Chart of Percent vs Type). Both charts cover the same eight pizza types: Pepperoni, Plain, Mushroom, Sausage, Pepper and Onion, Mushroom and Onion, Garlic, Meatball.]

Same data. Which is easier to understand?
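
One way to explore the question is to redraw the pie chart percentages as a sorted bar chart. The sketch below prints a plain-text version; the percentages are the ones on the pie chart labels, while the sort order and the one-character-per-point scale are illustrative choices, not taken from the slide.

```python
# Percent of pizzas sold by type, as read from the pie chart labels.
share = {
    "Pepperoni": 21.8, "Plain": 32.5, "Mushroom": 16.2, "Sausage": 5.8,
    "Pepper and Onion": 7.3, "Mushroom and Onion": 9.2,
    "Garlic": 2.3, "Meatball": 5.0,
}

# Sort from largest to smallest so the ranking is visible at a glance.
ranked = sorted(share.items(), key=lambda item: item[1], reverse=True)

for name, pct in ranked:
    bar = "#" * round(pct)  # roughly one '#' per percentage point
    print(f"{name:<20} {pct:>5.1f}% {bar}")
```

Bars on a common baseline make it easier to compare similar values, such as Sausage and Meatball, than wedge angles do.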

Page 8: Data Presentation Agenda

2013 data. Source: Bloomberg

Page 9: Data Presentation Agenda

Page 10: Data Presentation Agenda

Raw Data on Housing Prices and Incomes

Page 11: Data Presentation Agenda

A Box Plot Describes the Distribution of Values in a Set of Data

[Figure: Box and Whisker Plot for House Price Listings. Average House Listing Price by State; vertical axis: Listing, 100000 to 900000; the point for Hawaii is labeled.]

Page 12: Data Presentation Agenda

Making a Box Plot for Per Capita Income

Maximum = 31136
3rd Quartile = 24933
Median = 22610
1st Quartile = 21677
Minimum = 17043

Interquartile Range = IQR = 24933 - 21677 = 3256
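
The five numbers above can be computed from any sample. A minimal sketch using Python's standard `statistics` module follows; the sample is hypothetical (the raw income data are not in the transcript). Software packages interpolate quartiles in slightly different ways, so results may differ a little from tool to tool.

```python
import statistics

# Hypothetical sample (not the income data from the slide).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1  # interquartile range

print(min(sample), q1, median, q3, max(sample), iqr)

# The same subtraction with the quartiles reported on the slide.
slide_iqr = 24933 - 21677
print(slide_iqr)
```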

Page 13: Data Presentation Agenda

Box and Whisker Plot

Median

75th Percentile

25th Percentile

Interquartile range = IQR

Lower whisker: Larger of (Minimum, 25th Percentile - 1.5 IQR)

Upper whisker: Smaller of (Maximum, 75th Percentile + 1.5 IQR)

Outliers

HOG, pp. 39-43

What is an outlier?
Why do we believe a particular point is an outlier?
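
A common working answer is the 1.5 IQR rule: a point is flagged as an outlier when it lies more than 1.5 IQR below the 25th percentile or above the 75th percentile. The sketch below applies that rule to the per capita income summary from the previous slide. Placing the fences at the quartiles is an assumption here (the usual Tukey convention), and the variable names are illustrative.

```python
# Five-number summary for per capita income, taken from the previous slide.
minimum, q1, median, q3, maximum = 17043, 21677, 22610, 24933, 31136

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # assumed convention: fences start at the quartiles
upper_fence = q3 + 1.5 * iqr

# Whisker ends never extend beyond the data.
lower_whisker = max(minimum, lower_fence)
upper_whisker = min(maximum, upper_fence)

print(iqr, lower_fence, upper_fence)
print(lower_whisker, upper_whisker)
print("maximum lies beyond the upper fence:", maximum > upper_fence)
```

Under this rule the minimum sits inside the lower fence, while the maximum falls beyond the upper fence and would be drawn as an outlier. Many packages draw the whisker to the most extreme observation inside the fence rather than to the fence itself, which needs the raw data.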

Page 14: Data Presentation Agenda

A Frequency Distribution

Page 15: Data Presentation Agenda

Histogram for House Price Listings

[Figure: Histogram of Listing. Horizontal axis: Listing, 200000 to 900000; vertical axis: Frequency, 0 to 14.]

HOG, pp. 16-18

A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.
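
A histogram is a frequency count over equal-width bins. The sketch below shows the binning step on a hypothetical set of listing prices; the values and the bin width of 100000 are assumptions for illustration, not the data behind the slide.

```python
from collections import Counter

# Hypothetical listing prices in dollars (illustrative values only).
listings = [210_000, 240_000, 255_000, 260_000, 275_000, 310_000,
            330_000, 345_000, 420_000, 480_000, 610_000, 890_000]

bin_width = 100_000  # assumed bin width

# Assign each price to the lower edge of its bin, then count per bin.
frequency = Counter((price // bin_width) * bin_width for price in listings)

# Print one row per bin, including empty bins between the extremes.
for lower in range(min(frequency), max(frequency) + bin_width, bin_width):
    count = frequency.get(lower, 0)
    print(f"{lower:>7} to {lower + bin_width - 1:>7}: {'*' * count}")
```

In this made-up sample most prices fall in the lowest bins and a few stretch far to the right, the same kind of skewness noted for the listings.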

Page 16: Data Presentation Agenda

Distribution of House Price Listings

[Figure: Histogram of Listing (horizontal axis: Listing, 200000 to 900000; vertical axis: Frequency, 0 to 14) shown together with the box and whisker plot of Average House Listing Price by State (vertical axis: Listing, 100000 to 900000).]

Asymmetry (skewness) in the histogram of listing prices…

… shows up in the box and whisker plot. Note the long whisker at the top of the figure.
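
Skewness can also be checked numerically. In a right-skewed sample the mean is pulled above the median, and the moment coefficient of skewness is positive. A minimal sketch on hypothetical values (not the listing data):

```python
import statistics

# Hypothetical right-skewed sample (illustrative values only).
prices = [200, 220, 240, 250, 260, 280, 300, 350, 500, 900]

mean = statistics.fmean(prices)
median = statistics.median(prices)
sd = statistics.pstdev(prices)  # population standard deviation

# Moment coefficient of skewness: average cubed standardized deviation.
skewness = sum(((x - mean) / sd) ** 3 for x in prices) / len(prices)

print(mean, median, round(skewness, 2))
```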

Page 17: Data Presentation Agenda

A Caution About Graphical Data Summaries

Graphical tools can be very badly behaved when:

(1) The data have only a few observations.

(2) There are wild observations in the data set.

The box and whisker plot is distorted (and dominated) by one wildly errant observation.
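
The effect of a single wild observation is easy to reproduce. The sketch below compares a small hypothetical sample with and without one errant value. The range, which sets the plot's axis, and the mean change sharply, while the median barely moves.

```python
import statistics

# Hypothetical small sample, then the same sample with one wild value.
clean = [21, 22, 23, 24, 25]
with_wild = clean + [250]

for label, data in (("clean", clean), ("with wild value", with_wild)):
    spread = max(data) - min(data)  # range covered by the plot's axis
    print(label, statistics.mean(data), statistics.median(data), spread)
```

Because the axis must stretch to include the wild value, the box for the remaining observations is squeezed into a thin sliver of the plot.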

Page 18: Data Presentation Agenda

Summary

What story does the data presentation tell?
    Data in raw form tell no story.
    Visual representation of data tells something about the data.

Data reduction and summary representation: What do we learn?
    Location
    Spread
    Shape of the distribution

What tool is most informative?
    Reduction to a small number of features
    Visual displays of data
        Pie chart
        Box and whisker plots
        Histograms
        Time series plots

“There are lies, damned lies and statistics.” (Benjamin Disraeli)

Page 19: Data Presentation Agenda

The Visual Data Do Tell the Story: Napoleon’s March to Moscow

Page 20: Data Presentation Agenda

Source: Bloomberg. August 2013

Page 21: Data Presentation Agenda

Source: Bloomberg. August 2013

Page 22: Data Presentation Agenda

Page 23: Data Presentation Agenda

Page 24: Data Presentation Agenda

Page 25: Data Presentation Agenda

Probability of Survival to Age 50, Female at Birth: U.S. and 20 Other Wealthy Countries