introduction to data analytics part four: principles of data visualization david schuff

27
Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Upload: norman-henry

Post on 16-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Introduction to Data Analytics

Part Four: Principles of Data Visualization

David Schuff

Page 2: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

What makes a good chart?

Minard’s map of Napoleon’s campaign into Russia, 1869Reprinted in Tufte (2009), p. 41

Page 3: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

What makes a good chart?

http://www.popvssoda.com/countystats/total-county.html

Page 4: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

What makes a good chart?

Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International Conference on Human Factors in Computing Systems.

This is from an academic conference

paper.

What are the problems with

this chart?

Page 5: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Some basic principles (adapted from Tufte 2009)

• The chart should tell a story1• The chart should have graphical

integrity2• The chart should minimize

graphical complexity3Tufte’s fundamental principle:Above all else show the data

Page 6: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Principle 1: The chart should tell a story

Graphics should be clear on their own

The depictions should enable meaningful comparison

The chart should yield insight beyond the text

“If the statistics are boring, then you’ve got the wrong numbers.” (Tufte 2009)

Page 7: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Examples?http://www.evl.uic.edu/aej/491/week03.html

http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/

Page 8: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Telling a Story

http

://e

cono

mix

.blo

gs.n

ytim

es.c

om/2

009/

05/0

5/ob

esi

ty-

and

-the

-fa

stne

ss-o

f-fo

od/

http

://f

low

ingd

ata.

com

/201

1/0

1/1

9/s

tate

s-w

ith-t

he-m

ost-

and-

leas

t-fir

earm

s-m

urde

rs/

Page 9: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Principle 2: The chart should have graphical integrity

• Basically, it shouldn’t “lie” (mislead the reader)

• Tufte’s “Lie Factor”:

Should be ~ 1

< 1 = understated effect

> 1 = exaggerated effect

Page 10: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Examples of the “lie factor”

𝐿𝐹=5.3 /0.627.5 /18

=8.831.53

=5.77

𝐿𝐹=4280% ( h𝑐 𝑎𝑛𝑔𝑒𝑖𝑛𝑣𝑜𝑙𝑢𝑚𝑒)454% ( h𝑐 𝑎𝑛𝑔𝑒 𝑖𝑛𝑝𝑟𝑖𝑐𝑒)

=9.4Reprinted from Tufte (2009), p. 57 & p. 62

Page 11: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

A more recent, basic example

http://20bits.com/articles/politics-and-tuftes-lie-factor/

The original graphic from Real Clear Politics, 2008.

(Look at the y-axis)The adjusted graphic.

Page 12: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Other tips to avoid “lying”

Adjust for inflation

Make sure the context is presented

80

90

100

110

120

130

140

2003 2004 2005 2006 2007 2008 2009 2010Year

Hypothetical Industries, Inc.

Revenue

Adjusted Revenue

350

360

370

380

390

400

410

2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010Th

efts

per 1

0000

0 ci

tizen

s

Hypothetical City Crime

vs.

Page 13: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Principle 3: The chart should minimize graphical complexity

Key concepts

Sometimes a table is

betterData-ink Chartjunk

Generally, the simpler the better…

Page 14: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

When a table is better than a chart• For a few data points, a table can do just as well…

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by SalespersonSalesperson Total Sales

Peacock $225,763.68

Leverling $201,196.27

Davolio $182,500.09

Fuller $162,503.78

Callahan $123,032.67

King $116,962.99

Dodsworth $75,048.04

Suyama $72,527.63

Buchanan $68,792.25

The table carries more information in less space and is more precise.

Page 15: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

The Ultimate Table: The Box Score

• Large amount of information in a very small space

• So why does this work?• Depends on the

reader’s knowledge of the data

Page 16: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

The Business Box Score?

• Applying the same concept to our salesforce example.

• How does this help? How could it hurt?

Sales Performance – March 2011

Salesperson TS WD BD NC DOR

Peacock 225 3 40 20 28

Leverling 201 2 45 18 27

Davolio 182 5 38 22 28

Fuller 162 2 22 16 20

Callahan 123 1 15 14 15

King 116 0.5 20 12 18

Dodsworth 75 0.3 12 10 20

Suyama 72 0 8 10 8

Buchanan 68 0 8 8 12

Key:TS – total salesWD – worst dayBD – best dayNC – number of customersDOR – days on the road

Page 17: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Data Ink• The amount of “ink” devoted to data in a chart• Tufte’s Data-Ink ratio:

Should be ~ 1

< 1 = more non-data related ink in graphic

= 1 implies all ink devoted to data

Tufte’s principle:Erase ink whenever possible

Page 18: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Being conscious of data ink

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime

200

270

320 330

370350

400370

2003 2004 2005 2006 2007 2008 2009 2010

Hypothetical City Crime

Lower data-ink ratio(worse)

Higher data-ink ratio(better)

Page 19: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

What makes a good chart?

020000400006000080000

100000120000140000160000

2011 Total Sales

Order Date

Sum of Extended Price

020000400006000080000

100000120000140000160000

2011 Total Sales

Order Date

Sum of Extended Price

Sometimes it’s really a matter of

preference.

These both minimize data

ink.

Why isn’t a table better here?

Page 20: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

3-D Charts

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by Salesperson

Evaluate this from a data-ink perspective.How does it affect the clarity of the chart?

Page 21: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Chartjunk: Data Ink “gone wild”

Unnecessary visual clutter that doesn’t provide additional insight

Distraction from the story the chart is supposed to convey

When the data-ink ratio is low, chartjunk is likely to be high

Page 22: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime

Example: Moiré effects (Tufte 2009)

Creates illusion of movement

Stands out, in a bad way

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by Salesperson

Page 23: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Example: The Grid

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s pe

r 100

000

citiz

ens

Hypothetical City Crime

Why are these examples of chartjunk?

What could you do to

remedy it?

Page 24: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Data Ink Working Against Us

Evaluate this chart in terms of Data Ink.

Are there better

visualizations?

Page 25: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Data Ink Working For Us

Evaluate this chart in terms of Data Ink.

Imagine this as a bar chart.

As a table!!

Page 26: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Stacked Bar Charts are Often Trouble

• Original chart from the BBC website

• Why is this so difficult to read?

• What would be a better way to visualize it?

http://j-walkblog.com/index.php?/weblog/posts/bad_charts/

Page 27: Introduction to Data Analytics Part Four: Principles of Data Visualization David Schuff

Key Questions: Can you answer…

• What are three aspects of a good graphic?

• How can a chart “lie”?

• What is the Data Ink ratio and how does it relate to Chartjunk?

• When is a table better than a chart?