data visualization seminar ncdc, april 27 2011 todd pierce module 5 types of graphs
![Page 1: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/1.jpg)
Data Visualization Seminar
NCDC, April 27 2011
Todd Pierce
Module 5 Types of Graphs
![Page 2: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/2.jpg)
Best Practices Time Series
(sources: Colin Ware and Stephen Kosslyn)
![Page 3: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/3.jpg)
Time Series Graphs
• Most graphics show values changing over time
– time gives us a context for understanding data
– random sample of 4000 newspaper graphics 1874-1989 found 75% of them had time series
– Time series can be shown best by line graphs, but sometimes other graphs work better
![Page 4: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/4.jpg)
Time Series Graphs
• Patterns
– Trend: overall tendency of values to increase, decrease, or stay stable during a time period; trend lines can show this (but see later caveats)
– Variability: average degree of change from one point in time to the next in a time period; but be careful, if the y scale is narrow or does not start at zero, variability may be overstated
– Rate of change: percent difference between one value and the next; rates of change may be increasing faster than the raw data values would indicate
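The rate-of-change pattern can be made concrete with a small calculation. The sketch below is only an illustration: the function name `percent_changes` and the sample series are assumptions, not part of the seminar material. It shows that equal absolute steps do not mean equal rates of change.

```python
def percent_changes(values):
    """Return the percent change from each value to the next one."""
    changes = []
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous * 100.0)
    return changes


# Hypothetical series: every step adds 10, yet the rate of change shrinks.
series = [100, 110, 120, 130]
rates = percent_changes(series)

for step, rate in enumerate(rates, start=1):
    print(f"step {step}: {rate:.2f}%")
```

Plotting `rates` instead of `series` is one way to see the rate of change directly rather than inferring it from the raw values.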
![Page 5: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/5.jpg)
Time Series Graphs
• Patterns
– Co-variation: changes in one time series are reflected as changes in another, either immediately or later; changes can be in the same or different directions; if changes are not immediate, we have leading or lagging indicators
– Cycles: patterns that repeat at regular intervals (for example daily, weekly, or seasonally) rather than occurring once
– Exceptions: values that fall far outside the norm
![Page 6: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/6.jpg)
Time Series Graphs
![Page 7: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/7.jpg)
Time Series Graphs
• Line Graphs: show how quantitative values have changed over a continuous time period; show pattern or shape of change over time; show exceptions
– Lines make visible the sequential flow of values over time
– Lines trace the connection from one value to the next
– Lines show extent and direction of change through slope
– If we want to compare magnitudes of values at a point in time, we should add dots to the lines
![Page 8: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/8.jpg)
Time Series Graphs
• Bar Graphs: emphasize individual values and allow for comparisons of specific values at points in time
– Visual weight of bars and their separation makes us focus on individual values rather than the overall patterns
• Dot Plots: useful when sampling at irregular intervals
– A line connecting sporadic values implies smooth transitions between values
– More regular sampling might show a different picture
– Use dots instead of lines to avoid false conclusions
![Page 9: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/9.jpg)
Time Series Graphs
• Box Plots: show distribution of values over time by showing the center (median), min and max for each time interval
– see Distribution Analysis for more information
• Animated Scatterplots: show correlation analysis over time, such as Gapminder
– see Correlation Analysis for more information
– Great for telling a story, not so good for analysis – hard to track individual dots
– Must be combined with trails to show patterns of change over time, and small multiples (trellis display) to compare patterns of changes for multiple items
![Page 10: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/10.jpg)
Time Series Graphs
• Best Practices
– Aggregating to different time intervals: combine data into different time spans (month, week, year, day) to see different patterns emerge
– Viewing time periods in context: extend the time period – trends that look significant in a small time span may not be over longer periods
– Grouping related time intervals: add vertical lines or shading on the time axis to show for example each quarter or when the weekends are
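Aggregating to different time intervals amounts to re-bucketing the same observations with a different key. The following is a minimal sketch using only the Python standard library; the helper `aggregate` and the daily readings are hypothetical and chosen just to show the idea.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(daily, key):
    """Sum (date, value) pairs into buckets chosen by key(date)."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[key(day)] += value
    return dict(totals)


# Hypothetical data: one reading of 1.0 per day for 60 days from 1 Jan 2011.
start = date(2011, 1, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(60)]

# Same data, two different time spans.
by_month = aggregate(daily, lambda d: (d.year, d.month))
by_week = aggregate(daily, lambda d: d.isocalendar()[:2])  # (ISO year, ISO week)

print(by_month)
```

Swapping the `key` function is all it takes to move between daily, weekly, monthly, or yearly views of the same series.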
![Page 11: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/11.jpg)
Time Series Graphs
• Best Practices
– Using running averages to enhance perception of high level patterns: trend lines can mislead if they don’t take into account values just outside the time period; better to look at running averages of current value and a few previous values – this smoothing can reduce variability that throws off trend lines
– Omitting missing values from a display: rather than have the line dip to zero, either skip the value (show a broken line) or show the line lighter or dashed; do not confuse a valid zero value with a missing value
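Both practices above can be sketched together: a trailing running average over the current value and a few previous values, with missing observations skipped instead of being treated as zero. This is a sketch under assumptions: the function name `running_average`, the window of 3, and the use of `None` as the missing-value marker are illustrative choices, not something the slides prescribe.

```python
def running_average(values, window=3):
    """Trailing mean of the current value and up to window-1 previous values.

    None marks a missing observation; it is skipped, not counted as zero.
    """
    smoothed = []
    for i in range(len(values)):
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        smoothed.append(sum(recent) / len(recent) if recent else None)
    return smoothed


# Hypothetical readings: the None is a gap, the 0 is a real measurement.
readings = [4, 6, None, 8, 0, 10]
smooth = running_average(readings)
print(smooth)
```

Note that the valid zero pulls the average down while the missing value does not, which mirrors the distinction the slide draws between the two.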
![Page 12: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/12.jpg)
Time Series Graphs
• Best Practices
– Optimizing a graph’s aspect ratio: change the aspect ratio to get a lumpy profile instead of a flat or spiky profile, to allow for optimal comparison of slopes
– Using log scales and percentages to compare rates of change: variations in numerical magnitudes may hide true rates of change – use log scales, or percent change from previous value or from a baseline value, to see true rates of change
– Overlapping time scales to compare cyclical patterns: instead of showing for example all three years in one line, show each year as a different line over the 12 months, to allow comparisons from year to year for a given month
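To see why a log scale helps with comparing rates of change, consider two made-up series of very different magnitude. On a log scale a constant growth rate becomes a constant step, so the steps can be compared directly. The series and the helper `log_steps` below are assumptions for illustration only.

```python
import math


def log_steps(values):
    """Difference in log10 between each value and the next."""
    return [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]


small = [10, 20, 40, 80]            # doubles each period
large = [1000, 1500, 2250, 3375]    # grows 50% each period

small_steps = log_steps(small)
large_steps = log_steps(large)

# Raw increases are far bigger for the large series, but its log steps are smaller.
print(small_steps)
print(large_steps)
```

The large series gains thousands of units while the small one gains tens, yet the log steps show that the small series is the one growing faster.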
![Page 13: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/13.jpg)
Time Series Graphs
• Best Practices
– Using cycle plots to examine trends and cycles together: compare cycles and see trends across multiple cycles
– Shifting time to compare leading and lagging indicators: shift the time axis on one graph so it aligns with the other and see patterns
– Stacking line graphs to compare multiple values: if multiple time series have very different units or scale ranges, put them in stacked line graphs with the same time axis
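Shifting time to align a leading and a lagging indicator can also be explored numerically before plotting. One possible approach, sketched here as an assumption rather than as the method used in the seminar, is to try several shifts and keep the one with the highest Pearson correlation. The names `pearson` and `best_lag` and both series are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def best_lag(leading, lagging, max_lag):
    """Shift the lagging series back by 0..max_lag periods and score each shift."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]
        ys = lagging[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Hypothetical indicators: the second repeats the first two periods later.
leading = [1, 3, 2, 5, 4, 6, 5, 8]
lagging = [0, 0, 1, 3, 2, 5, 4, 6]

lag, scores = best_lag(leading, lagging, max_lag=3)
print(lag, round(scores[lag], 3))
```

Once a plausible lag is found, shifting one graph's time axis by that amount lets the two lines be compared shape for shape.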
![Page 14: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/14.jpg)
Time Series Graphs
• Best Practices
– Expressing time as 0-100% to compare asynchronous processes: if activities have different start dates, set each start date to 0% and show later dates as a percentage of total activity time, to compare values at similar points in total activity length
– Maintaining consistency through time: must adjust for inflation in currency over time; and account for how information gathering or the definition of values changed over time
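Expressing time as 0-100% is a simple rescaling of each activity's own time axis. The sketch below assumes each activity is a list of `(time, value)` pairs sorted by time; the function name `to_percent_time` and the two projects are made up for illustration.

```python
def to_percent_time(points):
    """Rescale (time, value) pairs so time runs from 0 to 100 percent."""
    start = points[0][0]
    span = points[-1][0] - start
    return [((t - start) / span * 100.0, v) for t, v in points]


# Hypothetical activities with different start days and durations.
project_a = [(0, 5), (10, 20), (20, 40)]       # starts day 0, lasts 20 days
project_b = [(100, 8), (150, 25), (200, 35)]   # starts day 100, lasts 100 days

a_scaled = to_percent_time(project_a)
b_scaled = to_percent_time(project_b)
print(a_scaled)
print(b_scaled)
```

After rescaling, both activities have observations at 0%, 50%, and 100% of their duration, so values at the same relative point can be compared.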
![Page 15: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/15.jpg)
Time Series Graphs
• Do’s and Don’t’s
– Change salience of lines if needed to show relative importance.
– Ensure crossing or nearby lines are discriminable.
– If using points on lines, make points at least twice as thick as the lines.
– Vary the lengths of dashes in dashed lines by at least a ratio of 2 to 1.
– Use different, discriminable symbols for points on different lines.
![Page 16: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/16.jpg)
Time Series Graphs
• Do’s and Don’t’s
– Do not fill in the areas between two lines – it’s not an area graph.
– In a mixed line and bar display, make one more salient and important.
– Put labels of all lines in same part of graph (else it draws attention to certain lines – also less busy).
– Put labels at end of lines (so labels and lines group with each other).
– Label any critical data points explicitly rather than labeling all points.
![Page 17: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/17.jpg)
Best Practices Part-to-Whole and Ranking Analysis
(sources: Colin Ware and Stephen Kosslyn)
![Page 18: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/18.jpg)
Part-to-Whole and Ranking
• Comparing parts to a whole and ranking them by value – for example the expenses of each department of a company as a % of total expenses, ranked in order
![Page 19: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/19.jpg)
Part-to-Whole and Ranking
• Patterns
– Uniform – all values roughly the same
– Uniformly different – values change from one to the next by roughly the same amount
– Non-uniformly different – differences from one value to the next vary significantly
![Page 20: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/20.jpg)
Part-to-Whole and Ranking
• Patterns
– Increasingly different – differences from one value to the next increase
– Decreasingly different – differences from one value to the next decrease
– Alternating differences – differences from one value to the next begin small then shift to large and finally back to small
– Exceptional – one or more values are very different from the rest
![Page 21: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/21.jpg)
Part-to-Whole and Ranking
![Page 22: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/22.jpg)
Part-to-Whole and Ranking
![Page 23: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/23.jpg)
Part-to-Whole and Ranking
• Part to whole is usually shown with pie charts – bad idea!
• Makes us compare areas or angles, both of which humans do poorly
• If pie uses a legend, eye must bounce between chart and legend
– You can label pie wedges directly with name and % value – but this is no better than a table – why use a graph if we must resort to printed values to make sense of it?
![Page 24: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/24.jpg)
Part-to-Whole and Ranking
Acceptable
Bad
![Page 25: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/25.jpg)
Part-to-Whole and Ranking
Bad
![Page 26: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/26.jpg)
Part-to-Whole and Ranking
Bad
![Page 27: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/27.jpg)
Part-to-Whole and Ranking
Acceptable?
![Page 28: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/28.jpg)
Part-to-Whole and Ranking
• Instead, use a bar graph
– One exception – if values cluster close together, the bar differences are small and hard to see
– So narrow the scale (zoom in) so the differences look bigger
– But then use a dot plot – dots or lines instead of bars – so we don’t misjudge the bar lengths (bar length is only meaningful when the scale starts at zero)
![Page 29: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/29.jpg)
Part-to-Whole and Ranking
• Use a Pareto chart to show the cumulative contributions of each part to a whole
– a line graph plus a bar chart shows how the parts sum to 100%
– Pareto charts summarize and display the relative importance of the differences between groups of data
– they distinguish the "vital few" from the "useful many"
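The numbers behind a Pareto chart are the parts sorted from largest to smallest (the bars) plus a running cumulative percentage (the line). The sketch below uses invented department expenses and a hypothetical helper `pareto_table` to show the calculation.

```python
def pareto_table(counts):
    """Return (name, value, cumulative percent) rows, largest part first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for name, value in ranked:
        running += value
        rows.append((name, value, 100.0 * running / total))
    return rows


# Hypothetical expenses by department.
expenses = {"HR": 15, "Sales": 50, "Legal": 5, "IT": 30}
rows = pareto_table(expenses)

for name, value, cumulative in rows:
    print(f"{name:6s} {value:4d} {cumulative:6.1f}%")
```

The value column drives the bars and the cumulative column drives the line, which always ends at 100%.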
![Page 30: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/30.jpg)
Part-to-Whole and Ranking
• Vilfredo Pareto, an Italian economist working around the turn of the 20th century, studied the distribution of wealth, finding that about 20% of people controlled about 80% of a society's wealth.
• This same distribution has been observed in other areas and has been termed the Pareto Principle or 80/20 rule.
![Page 31: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/31.jpg)
Part-to-Whole and Ranking
![Page 32: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/32.jpg)
Part-to-Whole and Ranking
![Page 33: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/33.jpg)
Part-to-Whole and Ranking
• Best Practices
– Grouping categorical values in ad hoc manner: group very small categories into one called ‘other’, or regroup similar categories into one master category for better analysis
– Using Pareto charts with percentile scales: group values into percentile intervals (top 10%, next 10%, etc.) and use a Pareto line – can lead to new insights
– Using line graphs to view ranking changes through time: use line graphs to show changes in ranking (such as a salesperson’s sales rank) over time – the lines show the relative ranking but not the actual values – inspired by bump charts from racing
![Page 34: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/34.jpg)
Part-to-Whole and Ranking
• Best Practices
– Re-expressing values to solve quantitative scaling problems: sometimes the small values on a bar chart are hard to see relative to the large values – so re-express the number using the square root, or a logarithm, if it reduces the range from highest to lowest; can also use an inverse scale (divide each value by the largest value or some other value such as a million)
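A quick way to check whether re-expression helps is to compare the ratio of the largest to the smallest value before and after. The values and the helper `spread_ratio` below are hypothetical, chosen only to illustrate the effect of square roots and logarithms.

```python
import math


def spread_ratio(values):
    """Ratio of the largest value to the smallest one."""
    return max(values) / min(values)


# Hypothetical values spanning several orders of magnitude.
values = [4, 100, 10000, 1000000]

sqrt_scaled = [math.sqrt(v) for v in values]
log_scaled = [math.log10(v) for v in values]

print(spread_ratio(values))       # raw range
print(spread_ratio(sqrt_scaled))  # compressed by the square root
print(spread_ratio(log_scaled))   # compressed further by the logarithm
```

If the ratio drops enough for the smallest bars to become visible, the re-expressed scale is worth using, with the axis clearly labeled as square root or log.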
![Page 35: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/35.jpg)
Part-to-Whole and Ranking
• Do’s and Don’t’s: Bar Charts
– Do not insist on minimizing ink.
– Mark corresponding bars in same color or symbol for multiple parameters.
– Arrange corresponding bars in same order for multiple parameters.
– Ensure overlapping bars do not look like stacked bars – offset the bars.
– Leave space between bar clusters for multiple parameters.
– Do not extend bars beyond the end of the scale.
![Page 36: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/36.jpg)
Part-to-Whole and Ranking
• Do’s and Don’t’s: Pie Charts
– Draw radii from the center of the circle.
– Explode a maximum of 25% of the wedges.
– Arrange wedges in a simple increasing progression.
– Place labels in wedges provided they can be easily read.
– Place labels next to all wedges if they cannot fit inside wedges (otherwise reader will think ones outside wedge are more important).
![Page 37: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/37.jpg)
Best Practices Deviation Analysis
(sources: Colin Ware and Stephen Kosslyn)
![Page 38: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/38.jpg)
Deviation Analysis
• Examining how a set of values deviate from a reference point (a budget, average, or price in time)
– Usually use a bar graph with two bars per entity – the actual and expected, such as for a budget
– However this makes the user subtract values in their head
– Better to have the graph 0 line be the expected reference, and the bars show the amount over or under (the deviation)
![Page 39: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/39.jpg)
Deviation Analysis
• Comparisons
– Current target, future target
– Same point in time in past
– Immediately prior period
– Standard or norm
– Other items in same category or same market
![Page 40: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/40.jpg)
Deviation Analysis
![Page 41: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/41.jpg)
Deviation Analysis
![Page 42: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/42.jpg)
Deviation Analysis
• Best shown as bar or line graphs with reference line at 0 or 100%
– If at 0, values expressed as positive and negative deviations in dollars or percents
– If at 100%, values expressed as percentages of the reference value
– Best to use a line graph when doing comparisons over time, from one period to the next; if comparing entities such as areas or companies, use a bar graph
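The two reference-line conventions correspond to two small calculations: deviations around 0 (in units or percent) and values as a percentage of the reference (around 100%). The budget and actual figures below are invented for illustration, and the variable names are assumptions.

```python
# Hypothetical budget (the reference) and actual spending per department.
budget = {"Sales": 200, "IT": 100, "HR": 50}
actual = {"Sales": 230, "IT": 90, "HR": 50}

# Reference line at 0: positive and negative deviations.
deviation = {d: actual[d] - budget[d] for d in budget}
deviation_pct = {d: 100.0 * (actual[d] - budget[d]) / budget[d] for d in budget}

# Reference line at 100%: each value as a percentage of its reference.
pct_of_budget = {d: 100.0 * actual[d] / budget[d] for d in budget}

for d in budget:
    print(d, deviation[d], deviation_pct[d], pct_of_budget[d])
```

Plotting `deviation` or `deviation_pct` as bars around a zero line spares the reader the mental subtraction that paired actual and budget bars require.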
![Page 43: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/43.jpg)
Deviation Analysis
• Best Practices
– Expressing deviations as percentages: helps normalize multiple data sets to the same units to allow for better comparison – works best if values are mostly <= 100% and nothing exceeds 500%
– Comparing deviations to other points of reference: besides showing the reference line, show other lines such as acceptable deviations from the norm, or standard deviations from the mean
![Page 44: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/44.jpg)
Best Practices Distribution Analysis
(sources: Colin Ware and Stephen Kosslyn)
![Page 45: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/45.jpg)
Distribution Analysis
• Seeing how numerical values are distributed from low to high, and comparing how multiple sets of values are distributed
• “The median isn’t the message” (Stephen Jay Gould)
– knowing the average or median value hides the full range of values
– even knowing the max and min values hides the number of values at each numerical value in a range of data
![Page 46: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/46.jpg)
Distribution Analysis
• Characteristics of distributions of values
– Spread: the difference between the max and min values – the full range of values
– Center: estimate of the middle of a set of values – the mean (average) or median
– Shape: where values are located in a spread – skewed to a side? Evenly distributed?
• Distribution summaries:
– 3 value: low, median, high
– 5 value: low, 25th %ile, median, 75th %ile, high
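The 5 value summary can be computed with the Python standard library. This is a sketch under assumptions: percentile conventions differ between tools, and the `method="inclusive"` option of `statistics.quantiles` used here is just one of them; the function name `five_value_summary` and the sample data are hypothetical.

```python
import statistics


def five_value_summary(values):
    """Return (low, 25th percentile, median, 75th percentile, high)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical data set.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_value_summary(ages)
print(summary)
```

Dropping the two percentile entries from the result gives the 3 value summary (low, median, high).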
![Page 47: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/47.jpg)
Distribution Analysis
• Patterns - Shape:
– Curved or flat?
– If curved, curved upward (bell curve) or downward (opposite of bell curve)?
– If curved upward, one peak, two peaks (bi-modal), or more?
– If single peaked, symmetrical or skewed left or right?
– Concentrations? Noticeably high peaks, that may not be the absolute peak
– Gaps? Areas of low or no values
![Page 48: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/48.jpg)
Distribution Analysis
Gaussian distribution
![Page 49: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/49.jpg)
Distribution Analysis
![Page 50: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/50.jpg)
Distribution Analysis
Bimodal distribution for graduating lawyer salaries
![Page 51: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/51.jpg)
Distribution Analysis
• Patterns - Outliers:
– values way beyond the norm
– good rule of thumb – take the distance between the 75th and 25th percentile values, multiply that by 1.5, and then subtract that from the 25th percentile to mark the lower bound and add it to the 75th percentile to mark the upper bound
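The rule of thumb for outliers translates directly into code. In this sketch the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumed convention (other tools may compute percentiles slightly differently); the function name `outlier_bounds` and the salary figures are made up.

```python
import statistics


def outlier_bounds(values):
    """Lower and upper bounds: 1.5 times the midspread beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    midspread = q3 - q1
    return q1 - 1.5 * midspread, q3 + 1.5 * midspread


# Hypothetical values with one suspiciously large entry.
salaries = [1, 2, 3, 4, 5, 6, 7, 8, 40]
lower, upper = outlier_bounds(salaries)
outliers = [v for v in salaries if v < lower or v > upper]
print(lower, upper, outliers)
```

Values outside the two bounds are candidates for a closer look, not automatic deletions.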
![Page 52: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/52.jpg)
Distribution Analysis
• Histograms: single distribution display
– Bar graph with X axis showing value ‘bins’ like age groups, and y axis showing number of values falling in each bin
– Bars touch to show continuous distribution between bins
– Enhanced if you can show the 3 value or 5 value marks on the X axis – otherwise no good way to determine the center and spread, just the shape
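The counts behind a histogram come from assigning each value to a bin on the X axis. The sketch below uses equal-width bins and, as a design assumption, folds anything outside the range into the first or last bin; the function name `histogram`, its parameters, and the ages are all hypothetical.

```python
def histogram(values, low, width, bins):
    """Count values per equal-width bin starting at low.

    Values outside the range are folded into the first or last bin.
    """
    counts = [0] * bins
    for v in values:
        index = int((v - low) // width)
        index = min(max(index, 0), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical ages, binned as 0-9, 10-19, 20-29, 30-39, 40 and over.
ages = [3, 12, 15, 21, 22, 25, 27, 34, 38, 95]
counts = histogram(ages, low=0, width=10, bins=5)
print(counts)
```

Changing `width` and `bins` and re-running is a quick way to see how the apparent shape depends on the bin size.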
![Page 53: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/53.jpg)
Distribution Analysis
![Page 54: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/54.jpg)
Distribution Analysis
• Box Plots: multiple distribution display
– Box shows median and 25th/75th percentiles (midspread)
– Whiskers show high and low values (spread)
– Could also have whiskers stop at 5th/95th percentiles and show outliers as dots
![Page 55: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/55.jpg)
Distribution Analysis
![Page 56: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/56.jpg)
Distribution Analysis
![Page 57: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/57.jpg)
Distribution Analysis
• Best Practices
– Keeping intervals consistent: each X axis bin should span an equal range of values; but it is OK to group outliers at one or both ends into one bin
– Selecting the best interval: if bins are too large, patterns are lost and the graph is too general; if bins are too small, the graph is too jagged and patterns cannot be seen
– Using measures that are resistant to outliers: certain measures such as the mean and the standard deviation can be greatly changed by the presence or absence of outliers; the median is very resistant to outliers and hence is preferred
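The resistance of the median to outliers is easy to demonstrate. The numbers below are invented; the point is only how differently the mean and the median react when one extreme value is added.

```python
import statistics

# Hypothetical measurements, then the same set with one extreme value added.
typical = [30, 32, 35, 36, 40]
with_outlier = typical + [400]

mean_before, mean_after = statistics.mean(typical), statistics.mean(with_outlier)
median_before, median_after = statistics.median(typical), statistics.median(with_outlier)

print(mean_before, mean_after)      # the mean jumps
print(median_before, median_after)  # the median barely moves
```

The same comparison can be repeated with `statistics.stdev` to see the standard deviation react in the way the slide describes.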
![Page 58: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/58.jpg)
Best Practices Correlation Analysis
(sources: Colin Ware and Stephen Kosslyn)
![Page 59: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/59.jpg)
Correlation Analysis
• Examining how numerical values relate to and affect one another; helps to track down causes
– Does one value vary systematically with another value?
– If so, in what manner, degree, direction, and why?
![Page 60: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/60.jpg)
Correlation Analysis
• Correlation between two variables can mean
– One variable causes another
– Neither variable affects the other – instead both are caused by one or more other variables (spurious correlation – due to these lurking variables)
– Neither variable affects the other – instead another variable connects them in causation
– The apparent correlation is an error due to bad or insufficient data
![Page 61: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/61.jpg)
Correlation Analysis
• Describing correlations
– Direction: positive or negative (refers to slope on graph)
– Strength: amount of grouping along the trend line – the stronger the grouping, the more likely the variables are related; if values are scattered the correlation is weak or not present
– Shape: linear or curved (curvilinear)
![Page 62: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/62.jpg)
Correlation Analysis
• Patterns - Shape
– Linear or curved? If linear, a given increase in one variable is matched by a constant change in the other variable; if curved, the change is not constant
– One direction or two? Does curve go up or down only, or both?
– Logarithmic (values go up or down at ever decreasing rate of change) or exponential (values go up or down at ever increasing rate of change)?
![Page 63: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/63.jpg)
Correlation Analysis
• Patterns - Shape
– Curved upward or downward? Shaped like an S?
– Concentrations? (can be due to overlapping distributions creating multiple clusters)
– Gaps? (only useful to examine when there is a correlation)
– Outliers? Values very far from the fit line showing the trend
![Page 64: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/64.jpg)
Correlation Analysis
![Page 65: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/65.jpg)
Correlation Analysis
![Page 66: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/66.jpg)
Correlation Analysis
![Page 67: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/67.jpg)
Correlation Analysis
• Statistical summaries of correlation
– Linear correlation coefficient (r): direction and strength of correlation, from r=+1 (perfect positive) to r=-1 (perfect negative); the value of r that counts as significant differs from one analysis to another
– Coefficient of determination (r2): strength of correlation – equal to r squared – so values range from 0 to 1; value indicates the proportion of variation in the dependent variable that can be attributed to the independent variable (from 0 to 100%)
• Visual displays on a graph are still needed because very different sets of data can have the same statistical values (see next slide)
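For reference, r and r squared can be computed by hand from the definitions. The sketch below implements the Pearson formula directly; the function name `pearson_r` and the five data points are assumptions for illustration, not data from the seminar.

```python
def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

Because many differently shaped point clouds can produce the same two numbers, the values are best read alongside the scatterplot rather than instead of it.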
![Page 68: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/68.jpg)
Correlation Analysis
from Few
![Page 69: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/69.jpg)
Correlation Analysis
• Correlation displays
– Scatterplots: use x and y axes to show two variables, then plot all the points
– Scatterplot matrices: show all combinations of two variables from a set of multiple variables; let you see how multiple variables are related
– Table lenses: horizontal bars (or dots) show values in a column; multiple columns show multiple variables; columns are compared to the left most column to see how values correlate
![Page 70: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/70.jpg)
Correlation Analysis
• Best Practices
– Optimizing aspect ratio and quantitative scales: make width and height of graph equal, and have axes go from just below lowest value to just above highest value of each variable
– Removing fill color to reduce over plotting: just show outline to avoid overlaps
– Comparing data to reference regions: shade the reference or normed region to see outliers
– Visually distinguishing data sets when divided into groups: either through easily distinguished hues, or by symbols (best to use are circle, square, triangle, plus, and X)
![Page 71: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/71.jpg)
Correlation Analysis
• Best Practices
– Using trend lines to enhance perception of correlation’s shape, strength, and outliers:
  • line of best fit is the one for which the vertical distance of each point from the line, squared and then summed, is the least amount; can be linear or curved
  • line shouldn’t match every point! look for overall trend
  • can be used (if r squared for the line is high) to estimate values for missing data points
  • use with caution for predicting values though – how do we know if we’re in the middle of an upward trend or just at the top of an S curve and about to go down?
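The definition of the line of best fit (least sum of squared vertical distances) leads to a closed-form solution in the straight-line case. The sketch below covers only that linear case; the function names `fit_line` and `squared_error` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = squared_error(xs, ys, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(best, 2))
```

Nudging the slope or intercept away from the fitted values and recomputing `squared_error` shows the sum getting larger, which is what "least amount" means here. Computing the line this way also avoids fitting by eye.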
![Page 72: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/72.jpg)
Correlation Analysis
• Best Practices
– Using multiple trend lines to see categorical differences: may be useful if multiple trends show up
– Removing the rough to see the smooth more clearly: removing outliers can make graph more compact and show the trend (the smooth) better
– Using trellis and crosstab displays: to reduce complexity and over-plotting
– Using grid lines to enhance comparisons between scatterplots: helps focus on particular areas from one graph to the next by using lines as reference
![Page 73: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/73.jpg)
Correlation Analysis
• Do’s and Don’t’s
– Do not indicate overlapping points with different symbols – vary the size with number of points at given location.
– Ensure error bars do not make less stable points (with longer bars) look bigger.
– Ensure best fit lines are salient and distinguishable.
– Do not fit a line by eye.
– If using more than one best fit line, label each directly.
![Page 74: Data Visualization Seminar NCDC, April 27 2011 Todd Pierce Module 5 Types of Graphs](https://reader034.vdocuments.net/reader034/viewer/2022051619/56649e795503460f94b793d6/html5/thumbnails/74.jpg)
Next Module
• We are done with graphs and charts
• What about maps?