
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Graphing to visualize data

Satish Raghunath, rsatish@alum.rpi.edu

Shiv Kalyanaraman, Google: “Shiv RPI”

shivkuma@ecse.rpi.edu http://www.ecse.rpi.edu/Homepages/shivkuma

Overview

- Issues with graphing
- Types of graphs
- Examples of graph usage and what you get out of them
- Art: how to choose which graph to use?
- Graphing tools
- Pitfalls and mistakes in graphing
- Advanced: visualization
- In-class work: reviewing graphing use in selected technical papers

Thoughts on Presentation Styles

Primary purpose: illustrate to help understand

“The goal of simulation is intuition, not numbers.” (R. W. Hamming)

Corollary: don’t dump data on the reader. Distill it into presentations that give insight instead.

Descriptive Statistics

Involves:
- Collecting data
- Presenting data
- Characterizing data
- Understanding data: distill insights!

[Figure: chart with quarters Q1 to Q4 on the x-axis and a $ axis from 0 to 50]

Statistics obtained from data: X̄ = 30.5, S² = 113

Insights: somewhat skewed, bell shape; perhaps a Poisson distribution would fit?
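The two statistics on this slide are the sample mean X̄ and the sample variance S². Below is a minimal Python sketch that computes both with the standard `statistics` module. The slide's underlying data set is not given, so the values used here are hypothetical. `describe` is an illustrative helper name, not part of any library.

```python
import statistics

def describe(samples):
    """Return (mean X-bar, sample variance S^2, sample standard deviation S)."""
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)  # sample variance: divides by n - 1
    return mean, var, var ** 0.5

# Hypothetical data, not the data set behind the slide
data = [10, 20, 30, 40, 50]
mean, var, sd = describe(data)
print(f"X-bar = {mean:.1f}, S^2 = {var:.1f}, S = {sd:.2f}")
```

`statistics.variance` uses the n - 1 divisor, which matches the usual definition of S². Use `statistics.pvariance` if you want the population variance instead.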

To graph or not to graph

Use graphs when:
- Trends in the data are not obvious
- It is hard to explain the X-Y relationship in words

Consider tables if:
- The number of data points is small
- The reader might find the exact values of the data points useful

Summary Table: Frequencies

1. Lists categories and the number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), percentages, or both

Each row is a category:

Major        Count
Accounting     130
Economics       20
Management      50
Total          200
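The tallying step corresponds to counting responses per category, and the percentage column is each count divided by the total. Below is a minimal Python sketch using `collections.Counter`. The list of individual responses is synthesized from the slide's counts (Accounting 130, Economics 20, Management 50). `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(responses):
    """Tally categorical responses; return (category, count, percent) rows, largest first."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]

# One entry per student, rebuilt from the counts in the table above
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50
rows = frequency_table(responses)
for major, count, pct in rows:
    print(f"{major:<12}{count:>6}{pct:>8.1f}%")
print(f"{'Total':<12}{len(responses):>6}")
```

`Counter.most_common()` orders rows by decreasing count. Iterate over a fixed category list instead if you want the rows in a different order, such as alphabetical.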

Example Tables from Networking

SACK (Multiple Sources)

LT-TCP (Multiple Sources)

What kind of graph?

- Pie charts to depict a “fraction of a whole”
- Bar charts when data points are few and a table is not suitable
- Line plots when there are many data points
- Box plots if statistical inference is drawn: they show the 1st, 2nd, and 3rd quartiles for each point
- Scatter plots and 3-D plots only if necessary: AVOID complex graphs

Pie Chart

[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%]

1. Shows the breakdown of a quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°) × (percent); e.g., Econ.: (360°)(10%) = 36°
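The angle rule (360° × percent) can be applied to every slice at once. Below is a minimal Python sketch using the majors counts from the summary table. `pie_angles` is a hypothetical helper name.

```python
def pie_angles(counts):
    """Map each category to its slice angle: 360 degrees times its fraction of the total."""
    total = sum(counts.values())
    return {cat: 360.0 * n / total for cat, n in counts.items()}

majors = {"Accounting": 130, "Economics": 20, "Management": 50}
for major, angle in pie_angles(majors).items():
    print(f"{major}: {angle:.0f} degrees")
```

The angles always sum to 360°, so the slices cover the whole pie. Economics gets the 36° from the worked example above.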

Pie Chart Networking Example

Source: http://www.caida.org/~bhuffake/papers/skitviz/

Another example: VPN classification

Bar Chart

[Figure: horizontal bar chart of frequency by major (Acct., Econ., Mgmt.), frequency axis from 0 to 150]

- Horizontal bars for categorical variables
- Bar length shows frequency or % (percent used also)
- 1/2 to 1 bar width between bars
- Equal bar widths
- Zero point on the frequency axis

Networking Example Bar Chart

Example Analysis with Bar Charts

LT-TCP is able to:
- reduce timeouts drastically
- keep the queue non-empty, maximizing throughput and capacity utilization
- minimize the use of FEC to the level needed

Histogram: for “distributions”

[Figure: histogram; x-axis: lower class boundary (0, 15, 25, 35, 45, 55); y-axis: frequency (count) from 0 to 5, or equivalently relative frequency or percent]

Class          Freq.
15 but < 25      3
25 but < 35      5
35 but < 45      2

Bars touch.
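Each class in the table is a half-open interval ("15 but < 25"). This is why histogram bars touch: every value falls into exactly one class. Below is a minimal Python sketch of the binning step using `bisect`. The raw data values are hypothetical and were chosen only so that they reproduce the slide's class frequencies (3, 5, 2). `histogram` is a hypothetical helper name.

```python
from bisect import bisect_right

def histogram(values, edges):
    """Count values into half-open classes [edges[i], edges[i+1]), i.e. "a but < b".

    `edges` must be sorted ascending. Returns (lower, upper, count, relative_frequency)
    per class; values outside [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1  # class whose lower bound is <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [(edges[i], edges[i + 1], c, c / total if total else 0.0)
            for i, c in enumerate(counts)]

# Hypothetical raw data reproducing the table's frequencies
data = [16, 20, 24, 25, 28, 30, 31, 34, 38, 44]
for lo, hi, count, rel in histogram(data, [15, 25, 35, 45]):
    print(f"{lo} but < {hi}: {count} ({rel:.0%})")
```

Using `bisect_right` puts a value equal to a boundary (such as 25) into the upper class. This matches the "but <" convention.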

Recall: Real Example Histogram

- What is the fairness between TCP goodputs when we use different queuing policies?
- What is the confidence interval around your estimates of mean file size?

Note: a “distribution” need not be just a probability/frequency distribution.

[Figure: per-flow throughput (Mbps) vs. flow number (flows 1 to 31) under two queuing policies: FQ (y-axis 0 to 2 Mbps) and RED (y-axis 0 to 10 Mbps)]
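The slides do not name a fairness metric for comparing per-flow goodputs. One common choice, assumed here, is Jain's fairness index, J = (Σxᵢ)² / (n · Σxᵢ²). It equals 1 when every flow gets the same throughput and 1/n when a single flow gets everything. Below is a minimal Python sketch. The per-flow goodputs are hypothetical, not the measured FQ/RED data.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); ranges from 1/n to 1."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("all throughputs are zero")
    return total * total / (n * sum_sq)

# Hypothetical per-flow goodputs in Mbps (not the measured data in the figure)
fq_like = [1.0, 1.1, 0.9, 1.0]
red_like = [4.0, 0.5, 0.2, 0.3]
print(f"FQ-like: {jain_fairness(fq_like):.3f}, RED-like: {jain_fairness(red_like):.3f}")
```

A single number like this complements the per-flow plot. The plot shows *which* flows lose out, while the index summarizes *how unequal* the allocation is.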

Dot Chart or Scatterplots

[Figure: dot chart of frequency by major (Acct., Econ., Mgmt.), frequency axis from 0 to 150]

- Like a horizontal bar chart
- Horizontal lines for categorical variables
- Line length shows frequency or % (percent used also)
- Equal spacing
- Zero point

Scatter Plots

Scatter plots with trends

WiFi Analysis: Scatter Plots http://www.sigcomm.org/sigcomm2004/papers/p442-aguayo1111.pdf

Line Charts Example: Comparative Performance

Note: also plots confidence intervals!
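Each confidence-interval bar on a line chart like this is typically a mean plus or minus a margin computed from repeated runs. Below is a minimal Python sketch using a normal (z) approximation. For small numbers of runs, a Student t quantile would give a somewhat wider and more appropriate interval; the standard library does not provide t quantiles, so that refinement is left out here. The measurements are hypothetical, and `mean_confidence_interval` is a hypothetical helper name.

```python
import math
import statistics

def mean_confidence_interval(samples, confidence=0.95):
    """Two-sided CI for the mean using a normal (z) approximation.

    Assumes independent samples and an n large enough for the sample mean to be
    roughly normal; for small n, a Student t quantile would be more appropriate.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical throughput measurements (Mbps) from repeated simulation runs
runs = [10, 12, 11, 13, 9, 10, 12, 11]
low, high = mean_confidence_interval(runs)
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```

Plotting `low` and `high` as error bars around each mean shows whether the differences between two systems are larger than the run-to-run noise.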

Line Plots for Distributions: Example

Hop count and RTT distributions

Source: http://www.caida.org/~bhuffake/papers/skitviz/

Recall: Distribution Shape

1. Describes how data are distributed
2. Measures of shape: skew measures (lack of) symmetry

- Left-skewed: Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed: Mode < Median < Mean
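The ordering of mean, median, and mode gives a quick hint about skew direction. It is only a rule of thumb and can fail for multimodal or strongly discrete data. Below is a minimal Python sketch on a hypothetical right-skewed sample, where a few large values pull the mean above the median. `shape_summary` is a hypothetical helper name.

```python
import statistics

def shape_summary(samples):
    """Return (mean, median, mode); their ordering hints at the direction of skew."""
    return (statistics.fmean(samples),
            statistics.median(samples),
            statistics.mode(samples))

# Hypothetical right-skewed sample: a long tail of large values
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]
mean, median, mode = shape_summary(right_skewed)
print(f"mean = {mean}, median = {median}, mode = {mode}")
```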

Box Plot

Graphical display of data using the 5-number summary: X(smallest), Q1, Median, Q3, X(largest)

[Figure: box plot drawn over an axis from 4 to 12]
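Below is a minimal Python sketch of computing the 5-number summary that a box plot draws. It uses `statistics.quantiles` with the `"inclusive"` method. Quartile conventions differ between tools (spreadsheets, MATLAB, and so on), so Q1 and Q3 may differ slightly elsewhere. The data is hypothetical and spans the slide's 4 to 12 axis. `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(samples):
    """Return (smallest, Q1, median, Q3, largest) using 'inclusive' quartiles."""
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return min(samples), q1, q2, q3, max(samples)

data = [4, 5, 6, 7, 8, 9, 10, 11, 12]  # hypothetical sample
print(five_number_summary(data))
```

The box spans Q1 to Q3 with a line at the median, and the whiskers reach the smallest and largest values. Many plotting tools instead cut the whiskers off at 1.5 × IQR and mark the remaining points as outliers.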

3D Graphs Example

Illustrates a complex parameter response surface ...

3D Plots: Network Example: Code Red Worm Analysis

Sources:
http://www.prism.uvsq.fr/users/qst/Tomography/Articles_jmf/renesys_bgp_instabilities2001.pdf
http://www.caida.org/outreach/isma/0112/talks/andyo/index.pdf
http://www.renesys.com/resource_library/Renesys-NANOG23.pdf

Contd…

Tools: Gnuplot

- To use with data-generating programs for repetitive plotting, e.g., to generate a plot of throughput for every 1-hour interval in the last week
- http://www.gnuplot.info
- TIP: Export gnuplot plots as a “.fig” file and edit them in xfig for greater flexibility
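A data-generating program can do the aggregation and emit both a data file and a gnuplot script, so the plot is regenerated the same way each time. Below is a minimal Python sketch that buckets hypothetical (timestamp, bytes) samples into 1-hour intervals. It then builds a two-column data file and a small gnuplot script targeting the fig terminal, which requires a gnuplot build with fig support. All helper names, file names, and sample values are hypothetical.

```python
from collections import defaultdict

HOUR = 3600  # bucket width in seconds

def hourly_throughput(samples):
    """Aggregate (unix_time_s, bytes) samples into 1-hour buckets.

    Returns sorted (bucket_start_s, throughput_mbps) pairs, where throughput is
    the average rate over the whole hour.
    """
    totals = defaultdict(int)
    for t, nbytes in samples:
        totals[int(t) // HOUR * HOUR] += nbytes
    return [(start, totals[start] * 8 / HOUR / 1e6) for start in sorted(totals)]

def gnuplot_data(rows):
    """Two whitespace-separated columns, one row per hour."""
    return "".join(f"{start} {mbps:.6f}\n" for start, mbps in rows)

def gnuplot_script(data_file, out_file="throughput.fig"):
    """A small gnuplot script; 'set terminal fig' needs a build with fig support."""
    return (
        "set terminal fig\n"
        f"set output '{out_file}'\n"
        "set xlabel 'Time (s)'\n"
        "set ylabel 'Throughput (Mbps)'\n"
        f"plot '{data_file}' using 1:2 with linespoints title 'hourly throughput'\n"
    )

# Hypothetical samples: (timestamp in seconds, bytes transferred)
rows = hourly_throughput([(0, 450_000_000), (1800, 450_000_000), (3600, 900_000_000)])
print(gnuplot_data(rows))
print(gnuplot_script("throughput.dat"))
```

To use it, save the two strings as `throughput.dat` and `throughput.gp`, then run `gnuplot throughput.gp`. The resulting `.fig` file can be opened in xfig for manual touch-ups.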

Tools: XmGrace

- For more intricate details (e.g., creating error bars, different shades for bar charts); GUI-driven, very user-friendly
- http://plasma-gate.weizmann.ac.il/Grace/
- Exports images to EPS (good for LaTeX documents), PNG (good for PowerPoint), etc.
- Can also run on Windows on top of Cygwin!

Tools: MATLAB

For complex 3-D and other statistical plots such as box plots and scatter plots, and in general when enormous quantities of data are involved.

http://www.mathworks.com

Tools: Excel Data Presentations

Open Excel to a new worksheet and enter the following data set:

Blue    34
White   68
Red     25
Green   50

Explore simple data presentation possibilities…
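The same quick exploration can be done outside Excel. Below is a minimal Python sketch that prints each value, its percentage of the total, and a text bar scaled to the largest value. It uses the Blue/White/Red/Green data set above. It is an illustration, not a replacement for a real charting tool. `text_bar_chart` is a hypothetical helper name.

```python
def text_bar_chart(data, width=40):
    """Render (label, value) pairs as horizontal text bars scaled to the largest value."""
    largest = max(value for _, value in data)
    total = sum(value for _, value in data)
    lines = []
    for label, value in data:
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<6} {value:>3} {100 * value / total:5.1f}% {bar}")
    return lines

colors = [("Blue", 34), ("White", 68), ("Red", 25), ("Green", 50)]
print("\n".join(text_bar_chart(colors)))
```

Scaling to the largest value makes the longest bar fill the chosen width. Because the bars start at zero, their lengths remain proportional to the data.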

Graphs: things to watch out for

- Purpose: illustrate the entire time series or response distribution
- Label the x- and y-axes
- Check what units the x- and y-axes are in (not “goats” or “sheep”!)
- Check whether either scale is logarithmic (this changes the meaning)
- Check where the origin (or zero point) is for each axis!
- After understanding WHAT is being plotted, close your eyes and ask: what would different patterns on this graph imply (relative to what I want to understand)?
- See if the relative performance is over- or under-emphasized (if two systems are being compared)
- Several examples in the Jain textbook

Errors in Presenting Data

1. Using ‘Chart Junk’

2. No Relative Basis in Comparing Data Batches

3. Compressing the Vertical Axis

4. No Zero Point on the Vertical Axis

‘Chart Junk’

[Figure: Minimum Wage, bad vs. good presentation. Bad presentation: decorated chart labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good presentation: the same values plotted by year (1960 to 1990) on a $ axis from 0 to 4.]

No Relative Basis

[Figure: A’s by class (FR, SO, JR, SR). Bad presentation: raw frequencies on an axis from 0 to 300. Good presentation: percentages on an axis from 0% to 30%.]
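Giving a comparison a relative basis means dividing each group's count by that group's size before plotting. Below is a minimal Python sketch. The A-grade counts and enrollments are hypothetical, chosen to fit the slide's axes. In this example freshmen have the most A's in absolute terms, but seniors have the highest rate. `percent_of_group` is a hypothetical helper name.

```python
def percent_of_group(counts, group_sizes):
    """Express each group's count as a percentage of that group's size."""
    return {g: 100.0 * counts[g] / group_sizes[g] for g in counts}

# Hypothetical numbers: A grades and enrollment per class year
a_grades = {"FR": 300, "SO": 200, "JR": 100, "SR": 60}
enrolled = {"FR": 1500, "SO": 1000, "JR": 400, "SR": 200}
print(percent_of_group(a_grades, enrolled))
```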

Compressing Vertical Axis

[Figure: Quarterly Sales (Q1 to Q4). Good presentation: $ axis from 0 to 50. Bad presentation: $ axis from 0 to 200, which compresses the visible variation.]

No Zero Point on Vertical Axis

[Figure: Monthly Sales (J, M, M, J, S, N). Good presentation: $ axis from 0 to 60. Bad presentation: $ axis from 36 to 45, with no zero point.]
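A quick calculation shows why a missing zero point misleads. Bar heights are measured from the axis baseline, so two values that differ by about 15% can look several times apart when the axis starts close to them. Below is a minimal Python sketch. The sales values of $39 and $45 are hypothetical, and the baseline of 36 matches the truncated axis described above. `apparent_ratio` is a hypothetical helper name.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times taller b's bar looks than a's when the axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)

# Hypothetical monthly sales of $39 and $45
print(apparent_ratio(39, 45))               # axis from 0: the true ratio
print(apparent_ratio(39, 45, baseline=36))  # axis from 36: exaggerated
```

With the axis starting at 36, the $45 bar is drawn three times as tall as the $39 bar, even though the true ratio is only about 1.15.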

Graphing Practices: In pictures

Graphing Practices…

Graphing Practices…

Graphing Practices…

Checklist: In textbook

More Complex Visualizations

Internet topology aspects: CAIDA skitter project
http://www.caida.org/tools/measurement/skitter/visualizations.xml

More…

The End
