byjeff rey heer, michael bostock, and vadim ogievetsky …this minimal-istic representation uses the...
TRANSCRIPT
june 2010 | vol. 53 | no. 6 | communications of the acm 59
Doi:10.1145/1743546.1743567
Article development led by queue.acm.org
A survey of powerful visualization techniques, from the obvious to the obscure.
BY JeffReY heeR, michaeL BostocK, anD VaDim oGieVetsKY
a tour through the Visualization zoo
help engage more diverse audiences in exploration and analysis. The challenge is to create effective and engaging visu-alizations that are appropriate to the data.
Creating a visualization requires a number of nuanced judgments. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color. The challenge is that for any given data set the number of visual encodings—and thus the space of possible visualization designs—is extremely large. To guide this process, computer scientists, psy-
of valuable information on how we conduct our businesses, governments, and personal lives. To put the informa-tion to good use, we must find ways to explore, relate, and communicate the data meaningfully.
The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual in-ferences and improve comprehension, memory, and decision making. By mak-ing data more accessible and appeal-ing, visual representations may also
ThAnKs To ADVA n Ces in sensing, networking, and data management, our society is producing digital information at an astonishing rate. According to one estimate, in 2010 alone we will generate 1,200 exabytes—60 million times the content of the Library of Congress. Within this deluge of data lies a wealth
60 communications of the acm | june 2010 | vol. 53 | no. 6
practice
chologists, and statisticians have stud-ied how well different encodings facili-tate the comprehension of data types such as numbers, categories, and net-works. For example, graphical percep-tion experiments find that spatial po-sition (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally prefer-able to visual variables such as angle, one-dimensional length, two-dimen-sional area, three-dimensional volume, and color saturation. Thus, it should be no surprise that the most common data graphics, including bar charts, line charts, and scatter plots, use posi-tion encodings. Our understanding of graphical perception remains incom-plete, however, and must appropriately be balanced with interaction design and aesthetics.
This article provides a brief tour through the “visualization zoo,” show-casing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Suma-tran tiger. Analogously, we cover some of the more exotic (but practically use-ful) forms of visual data representation, starting with one of the most common, time-series data; continuing on to sta-tistical data and maps; and then com-pleting the tour with hierarchies and networks. Along the way, bear in mind that all visualizations share a common “DNA”—a set of mappings between data properties and visual attributes such as position, size, shape, and col-or—and that customized species of vi-sualization might always be construct-ed by varying these encodings.
Each visualization shown here is accompanied by an online interactive example that can be viewed at the URL displayed beneath it. The live examples were created using Protovis, an open source language for Web-based data visualization. To learn more about how a visualization was made (or to copy and paste it for your own use), see the online version of this article available on the ACM Queue site at http://queue.
-1.0x
0.0x
1.0x
2.0x
3.0x
4.0x
5.0x
Gai
n /
Los
s Fa
ctor
S&P 500MSFT
AMZN
IBM
AAPL
GOOG
Jan 2005
time-series Data: figure 1a. index chart of selected technology stocks, 2000–2010.
Source: yahoo! Finance; http://hci.stanford.edu/jheer/files/zoo/ex/time/index-chart.html
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Agriculture
Business services
Construction
Education and Health
FinanceGovernmentInformation
Leisure and hospitality
ManufacturingMining and ExtractionOtherSelf-employedTransportation and Utilities
Wholesale and Retail Trade
time-series Data: figure 1b. stacked graph of unemployed u.s. workers by industry, 2000–2010.
Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/stack.html
Self-employed Agriculture
Other Leisure and hospitality
Education and Health Business services
Finance Information
Transportation and Utilities Wholesale and Retail Trade
Manufacturing Construction
Mining and Extraction Government
time-series Data: figure 1c. small multiples of unemployed u.s. workers, normalized by industry, 2000–2010.
Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/multiples.html
time-series Data: figure 1d. horizon graphs of u.s. unemployment rate, 2000–2010.
Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/horizon.html
practice
june 2010 | vol. 53 | no. 6 | communications of the acm 61
acm.org/detail.cfm?id=1780401/. All example source code is released into the public domain and has no restric-tions on reuse or modification. Note, however, that these examples will work only on a modern, standards-compli-ant browser supporting scalable vector graphics (SVG). Supported browsers in-clude recent versions of Firefox, Safari, Chrome, and Opera. Unfortunately, In-ternet Explorer 8 and earlier versions do not support SVG and so cannot be used to view the interactive examples.
time-series Data Sets of values changing over time—or, time-series data—is one of the most common forms of recorded data. Time-varying phenomena are central to many domains such as finance (stock prices, exchange rates), science (temperatures, pollution levels, electric potentials), and public policy (crime rates). One of-ten needs to compare a large number of time series simultaneously and can choose from a number of visualizations to do so.
Index Charts. With some forms of time-series data, raw values are less im-portant than relative changes. Consider investors who are more interested in a stock’s growth rate than its specific price. Multiple stocks may have dra-matically different baseline prices but may be meaningfully compared when normalized. An index chart is an inter-active line chart that shows percentage changes for a collection of time-series data based on a selected index point. For example, the image in Figure 1a shows the percentage change of select-ed stock prices if purchased in January 2005: one can see the rocky rise enjoyed by those who invested in Amazon, Ap-ple, or Google at that time.
Stacked Graphs. Other forms of time-series data may be better seen in aggregate. By stacking area charts on top of each other, we arrive at a visual summation of time-series values—a stacked graph. This type of graph (some-times called a stream graph) depicts aggregate patterns and often supports drill-down into a subset of individual series. The chart in Figure 1b shows the number of unemployed workers in the U.S. over the past decade, subdivided by industry. While such charts have prov-en popular in recent years, they do have some notable limitations. A stacked
graph does not support negative num-bers and is meaningless for data that should not be summed (temperatures, for example). Moreover, stacking may make it difficult to accurately interpret trends that lie atop other curves. Inter-active search and filtering is often used to compensate for this problem.
Small Multiples. In lieu of stacking, multiple time series can be plotted within the same axes, as in the index chart. Placing multiple series in the same space may produce overlapping curves that reduce legibility, however. An alternative approach is to use small multiples: showing each series in its own chart. In Figure 1c we again see the number of unemployed workers, but normalized within each industry category. We can now more accurately see both overall trends and seasonal patterns in each sector. While we are considering time-series data, note that small multiples can be constructed for just about any type of visualization: bar charts, pie charts, maps, among others. This often produces a more effective vi-sualization than trying to coerce all the data into a single plot.
Horizon Graphs. What happens when you want to compare even more time series at once? The horizon graph is a technique for increasing the data density of a time-series view while pre-serving resolution. Consider the five graphs shown in Figure 1d. The first one is a standard area chart, with posi-tive values colored blue and negative values colored red. The second graph “mirrors” negative values into the same region as positive values, doubling the data density of the area chart. The third chart—a horizon graph—doubles the data density yet again by dividing the graph into bands and layering them to create a nested form. The result is a chart that preserves data resolution but uses only a quarter of the space. Al-though the horizon graph takes some time to learn, it has been found to be more effective than the standard plot when the chart sizes get quite small.
statistical Distributions Other visualizations have been de-signed to reveal how a set of numbers is distributed and thus help an analyst better understand the statistical prop-erties of the data. Analysts often want to fit their data to statistical models, ei-
ther to test hypotheses or predict future values, but an improper choice of mod-el can lead to faulty predictions. Thus, one important use of visualizations is exploratory data analysis: gaining in-sight into how data is distributed to inform data transformation and mod-eling decisions. Common techniques include the histogram, which shows the prevalence of values grouped into bins, and the box-and-whisker plot, which can convey statistical features such as the mean, median, quartile boundaries, or extreme outliers. In addition, a number of other techniques exist for assessing a distribution and examining interac-tions between multiple dimensions.
Stem-and-Leaf Plots. For assessing a collection of numbers, one alternative to the histogram is the stem-and-leaf plot. It typically bins numbers accord-ing to the first significant digit, and then stacks the values within each bin by the second significant digit. This minimal-istic representation uses the data itself to paint a frequency distribution, re-placing the “information-empty” bars of a traditional histogram bar chart and allowing one to assess both the overall distribution and the contents of each bin. In Figure 2a, the stem-and-leaf plot shows the distribution of completion rates of workers completing crowd-sourced tasks on Amazon’s Mechani-cal Turk. Note the multiple clusters: one group clusters around high levels of completion (99%–100%); at the oth-er extreme is a cluster of Turkers who complete only a few tasks (~10%) in a group.
Q-Q Plots. Though the histogram and the stem-and-leaf plot are common tools for assessing a frequency distribu-tion, the Q-Q (quantile-quantile) plot is a more powerful tool. The Q-Q plot com-pares two probability distributions by graphing their quantiles against each other. If the two are similar, the plotted values will lie roughly along the central diagonal. If the two are linearly related, values will again lie along a line, though with varying slope and intercept.
Figure 2b shows the same Mechani-cal Turk participation data compared with three statistical distributions. Note how the data forms three distinct components when compared with uni-form and normal (Gaussian) distribu-tions: this suggests that a statistical model with three components might
62 communications of the acm | june 2010 | vol. 53 | no. 6
practice
be more appropriate, and indeed we see in the final plot that a fitted mixture of three normal distributions provides a better fit. Though powerful, the Q-Q plot has one obvious limitation in that its effective use requires that viewers possess some statistical knowledge.
SPLOM (Scatter Plot Matrix). Other visualization techniques attempt to represent the relationships among multiple variables. Multivariate data occurs frequently and is notoriously hard to represent, in part because of the difficulty of mentally picturing data in more than three dimensions. One technique to overcome this problem is to use small multiples of scatter plots showing a set of pairwise relations among variables, thus creating the SP-LOM (scatter plot matrix). A SPLOM en-ables visual inspection of correlations between any pair of variables.
In Figure 2c a scatter plot matrix is used to visualize the attributes of a da-tabase of automobiles, showing the re-lationships among horsepower, weight, acceleration, and displacement. Addi-tionally, interaction techniques such as brushing-and-linking—in which a selection of points on one graph high-lights the same points on all the other graphs—can be used to explore pat-terns within the data.
Parallel Coordinates. As shown in Figure 2d, parallel coordinates (||-co-ord) take a different approach to visu-alizing multivariate data. Instead of graphing every pair of variables in two dimensions, we repeatedly plot the data on parallel axes and then connect the corresponding points with lines. Each poly-line represents a single row in the database, and line crossings between dimensions often indicate inverse cor-relation. Reordering dimensions can aid pattern-finding, as can interactive querying to filter along one or more di-mensions. Another advantage of paral-lel coordinates is that they are relatively compact, so many variables can be shown simultaneously.
mapsAlthough a map may seem a natural way to visualize geographical data, it has a long and rich history of design. Many maps are based upon a carto-graphic projection: a mathematical function that maps the 3D geometry of the Earth to a 2D image. Other maps
0 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 6 7 8 8 8 8 8 8 9
1 0 0 0 0 1 1 1 1 2 2 3 3 3 3 4 4 4 4 5 5 6 7 7 8 9 9 9 9 9
2 0 0 1 1 1 5 7 8 9
3 0 0 1 2 3 3 3 4 6 6 8 8
4 0 0 1 1 1 1 3 3 4 5 5 5 6 7 8 9
5 0 2 3 5 6 7 7 7 9
6 1 2 6 7 8 9 9 9
7 0 0 0 1 6 7 9
8 0 0 1 2 3 4 4 4 4 4 4 4 5 6 7 7 7 9
9 1 3 3 5 7 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
statistical Distributions: figure 2a. stem-and-leaf plot of mechanical turk participation rates.
Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/stem-and-leaf.html
Uniform Distribution
0%
50%
100%
0% 50% 100%
Gaussian Distribution0% 50% 100%
Fitted Mixture of 3 Gaussians0% 50% 100%T
urke
r T
ask
Gro
up C
ompl
etio
n %
statistical Distributions: figure 2b. Q-Q plots of mechanical turk participation rates.
Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/qqplot.html
cylinders displacement weight horsepower acceleration mpg year
3 68 cubic inch 1613 lbs 46 hp 8 (0 to 60mph) 9 miles/gallon 70
8 455 cubic inch 5140 lbs 230 hp 25 (0 to 60mph) 47 miles/gallon 82
statistical Distributions: figure 2d. Parallel coordinates of automobile data.
Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/parallel.html
horsepower
20003000
40005000 10 15 20
100200
300400
50
100
150
200
2000
3000
4000
5000
weight
2000
3000
4000
5000
10
15
20
acceleration
10
15
20
50 100150
200
100
200
300
400
20003000
40005000
10 15 20
displacement
United States European Union Japan
statistical Distributions: figure 2c. scatter plot matrix of automobile data.
Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/splom.html
practice
june 2010 | vol. 53 | no. 6 | communications of the acm 63
knowingly distort or abstract geo-graphic features to tell a richer story or highlight specific data.
Flow Maps. By placing stroked lines on top of a geographic map, a flow map can depict the movement of a quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate information: path points, direction, line thickness, and color can all be used to present dimensions of information to the viewer. Figure 3a is a modern interpretation of Charles Mi-nard’s depiction of Napoleon’s ill-fated march on Moscow. Many of the greatest flow maps also involve subtle uses of distortion, as geography is bended to accommodate or highlight flows.
Choropleth Maps. Data is often col-lected and aggregated by geographi-cal areas such as states. A standard approach to communicating this data is to use a color encoding of the geo-graphic area, resulting in a choropleth map. Figure 3b uses a color encoding to communicate the prevalence of obe-sity in each state in the U.S. Though this is a widely used visualization tech-nique, it requires some care. One com-mon error is to encode raw data values (such as population) rather than using normalized values to produce a densi-ty map. Another issue is that one’s per-ception of the shaded value can also be affected by the underlying area of the geographic region.
Graduated Symbol Maps. An alterna-tive to the choropleth map, the gradu-ated symbol map places symbols over an underlying map. This approach avoids confounding geographic area with data values and allows for more dimensions to be visualized (for example, symbol size, shape, and color). In addition to simple shapes such as circles, gradu-ated symbol maps may use more com-plicated glyphs such as pie charts. In Figure 3c, total circle size represents a state’s population, and each slice indi-cates the proportion of people with a specific BMI rating.
Cartograms. A cartogram distorts the shape of geographic regions so that the area directly encodes a data variable. A common example is to redraw every country in the world sizing it propor-tionally to population or gross domes-tic product. Many types of cartograms have been created; in Figure 3d we use the Dorling cartogram, which represents
0°-10°-20°-30°
18 Oct24 Oct09 Nov
14 Nov24 Nov
28 Nov01 Dec06 Dec07 Dec
maps: figure 3a. flow map of napoleon’s march on moscow, based on the work of charles minard.
http://hci.stanford.edu/jheer/files/zoo/ex/maps/napoleon.html
WY
WI
WV
WA
OR
NV CO
SD
NDMT
ID
UT
NMAZ
CA
NE
KS
OK
TX LA
MS AL GA
ARTN
SC
NC
FL
MN
IA
MO
IL
KY
IN
MI
OH
VA
DEMD
PANJ
NY
CT RIMA
NH
ME
VT
14 - 17%
17 - 20%
20 - 23%
23 - 26%
26 - 29%
29 - 32%
32 - 35%
maps: figure 3b. choropleth map of obesity in the u.s., 2008.
Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/choropleth.html
Obese
Overweight
Normal
maps: figure 3c. Graduated symbol map of obesity in the u.s., 2008.
Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/symbol.html
WY
WI
WV
WA
OR
NVCO
SD
NDMT
ID
UT
NMAZCA
NE
KS
OK
TX
LA
MS
AL
GA
ARTN
SC
NC
FL
MN
IA
MO
IL
KY
IN
MI
OH
VA
DEMD
PA
NJ
NY
CT
RI
MA
NH
ME
VT
10M
1M
5M
100K14 - 17%17 - 20%20 - 23%23 - 26%26 - 29%29 - 32%32 - 35%
maps: figure 3d. Dorling cartogram of obesity in the u.s., 2008.
Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/cartogram.html
64 communications of the acm | june 2010 | vol. 53 | no. 6
practice
each geographic region with a sized circle, placed so as to resemble the true geographic configuration. In this ex-ample, circular area encodes the total number of obese people per state, and color encodes the percentage of the to-tal population that is obese.
hierarchiesWhile some data is simply a flat collec-tion of numbers, most can be organized into natural hierarchies. Consider: spa-tial entities, such as counties, states, and countries; command structures for businesses and governments; soft-ware packages and phylogenetic trees. Even for data with no apparent hierar-chy, statistical methods (for example, k-means clustering) may be applied to organize data empirically. Special visu-alization techniques exist to leverage hierarchical structure, allowing rapid multiscale inferences: micro-observa-tions of individual elements and mac-ro-observations of large groups.
Node-link diagrams. The word tree is used interchangeably with hierarchy, as the fractal branches of an oak might mirror the nesting of data. If we take a two-dimensional blueprint of a tree, we have a popular choice for visualizing hierarchies: a node-link diagram. Many different tree-layout algorithms have been designed; the Reingold-Tilford al-gorithm, used in Figure 4a on a package hierarchy of software classes, produces a tidy result with minimal wasted space.
An alternative visualization scheme is the dendrogram (or cluster) algorithm, which places leaf nodes of the tree at the same level. Thus, in the diagram in Fig-ure 4b, the classes (orange leaf nodes) are on the diameter of the circle, with the packages (blue internal nodes) in-side. Using polar rather than Cartesian coordinates has a pleasing aesthetic, while using space more efficiently.
We would be remiss to overlook the indented tree, used ubiquitously by operating systems to represent file directories, among other applications (see Figure 4c). Although the indented tree requires excessive vertical space and does not facilitate multiscale infer-ences, it does allow efficient interactive exploration of the tree to find a specific node. In addition, it allows rapid scan-ning of node labels, and multivariate data such as file size can be displayed adjacent to the hierarchy.
flare
analytics
cluster
Agglom
erativeCluster
Com
munityS
tructureH
ierarchicalCluster
MergeE
dge
graph
Betw
eennessCentrality
LinkDistance
MaxFlow
MinC
utS
hortestPaths
SpanningT
ree
optimization
AspectR
atioBanker
animate
Easing
FunctionSequence
ISchedulable
Parallel
Pause
Scheduler
Sequence
Transition
TransitionE
ventT
ransitionerT
ween
interpolate
ArrayInterpolator
ColorInterpolator
DateInterpolator
InterpolatorM
atrixInterpolatorN
umberInterpolator
ObjectInterpolator
PointInterpolator
RectangleInterpolator
data
DataField
DataS
chema
DataS
etD
ataSource
DataT
ableD
ataUtil
converters
Converters
Delim
itedTextC
onverterG
raphMLC
onverterID
ataConverter
JS
ON
Converter
display
DirtyS
priteLineS
priteR
ectSprite
TextS
prite
flex
FlareVis
physics
DragForce
GravityForce
IForceN
BodyForce
Particle
Sim
ulationS
pringS
pringForce
query
AggregateE
xpressionA
ndA
rithmetic
Average
BinaryE
xpressionC
omparison
Com
positeExpression
Count
DateU
tilD
istinctE
xpressionE
xpressionIteratorFn If IsALiteralM
atchM
aximum
Minim
umN
otO
rQ
ueryR
angeS
tringUtil
Sum
Variable
Variance
Xorm
ethods
_ addandaveragecountdistinctdiveq fn gt gteiff isalt ltem
axm
inm
odm
ulneqnotor orderbyrangeselectstddevsubsumupdatevariancew
herexor
scale
IScaleM
apLinearS
caleLogS
caleO
rdinalScale
QuantileS
caleQ
uantitativeScale
RootS
caleS
caleS
caleType
Tim
eScale
util
Arrays
Colors
Dates
Displays
FilterG
eometry
IEvaluable
IPredicate
IValueP
roxyM
athsO
rientationP
ropertyS
hapesS
ortS
tatsS
tringsheap
FibonacciHeap
HeapN
ode
math
DenseM
atrixIM
atrixS
parseMatrix
palette
ColorP
aletteP
aletteS
hapePalette
SizeP
alette
vis
Visualization
axis
Axes
Axis
AxisG
ridLineA
xisLabelC
artesianAxes
controls
AnchorC
ontrolC
lickControl
Control
ControlList
DragC
ontrolE
xpandControl
HoverC
ontrolIC
ontrolP
anZoomC
ontrolS
electionControl
TooltipC
ontrol
data
Data
DataList
DataS
priteE
dgeSprite
NodeS
priteS
caleBinding
Tree
TreeB
uilderrender
Arrow
Type
EdgeR
endererIR
endererS
hapeRenderer
events
DataE
ventS
electionEvent
TooltipE
ventV
isualizationEvent
legendLegendLegendItemLegendR
ange
operator
IOperator
Operator
OperatorList
OperatorS
equenceO
peratorSw
itchS
ortOperator
distortionB
ifocalDistortion
Distortion
FisheyeDistortion
encoder
ColorE
ncoderE
ncoderP
ropertyEncoder
ShapeE
ncoderS
izeEncoder
filterFisheyeT
reeFilterG
raphDistanceFilter
VisibilityFilter
labelLabelerR
adialLabelerS
tackedAreaLabeler
layout
AxisLayout
BundledE
dgeRouter
CircleLayout
CircleP
ackingLayoutD
endrogramLayout
ForceDirectedLayout
IcicleTreeLayout
IndentedTreeLayout
LayoutN
odeLinkTreeLayout
PieLayout
RadialT
reeLayoutR
andomLayout
StackedA
reaLayoutT
reeMapLayout
hierarchies: figure 4a. Radial node-link diagram of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/tree.html
933KB47KB14KB25KB
6KB97KB29KB23KB
4KB29KB
1KB1KB0KB
10KB2KB9KB2KB1KB
87KB30KB
161KB422KB
16KB33KB
flareanalytics
clustergraphoptimization
animatedatadisplayflexphysics
DragForceGravityForceIForceNBodyForceParticleSimulationSpringSpringForce
queryscaleutilvis
Visualizationaxis
43KB107KB
6KB35KB
179KB1KB2KB5KB4KB2KB1KB
13KB14KB11KB16KB
105KB6KB3KB9KB
11KB4KB8KB4KB3KB
controlsdataeventslegendoperator
IOperatorOperatorOperatorListOperatorSequenceOperatorSwitchSortOperatordistortionencoderfilterlabellayout
AxisLayoutBundledEdgeRouterCircleLayoutCirclePackingLayoutDendrogramLayoutForceDirectedLayoutIcicleTreeLayoutIndentedTreeLayout
hierarchies: figure 4c. indented tree layout of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/indent.html
flare
anal
ytic
s
clus
ter
Agg
lom
erat
iveC
lust
erC
omm
unity
Str
uctu
reH
iera
rchi
calC
lust
erM
erge
Edg
egr
aph
Bet
wee
nnes
sCen
tral
ityLi
nkD
ista
nce
Max
Flow
Min
Cut
Sho
rtes
tPat
hsS
pann
ingT
ree
optim
izat
ion
Asp
ectR
atio
Ban
ker
anim
ate
Easi
ngFu
nctio
nSeq
uenc
e
ISch
edul
able
Para
llel
Paus
eSc
hedu
ler
Sequ
ence
Tran
sitio
nTr
ansi
tionE
vent
Tran
sitio
ner
Twee
n
interp
olat
e
Array
Inte
rpol
ator
Color
Inte
rpol
ator
DateI
nter
pola
tor
Inte
rpol
ator
Mat
rixIn
terp
olat
or
Number
Inte
rpolat
or
ObjectInte
rpolato
r
PointInte
rpolato
r
RectangleInterpolator
data
DataField
DataSchema
DataSet
DataSource
DataTable
DataUtil
converters
Converters
DelimitedTextConverter
GraphMLConverter
IDataConverter
JSONConverter
display
DirtySprite
LineSprite
RectSprite
TextSprite
flexFlareVis
physics
DragForceGravityForce
IForceNBodyForceParticleSimulationSpringSpringForce
query
AggregateExpressionAndArithmeticAverageBinaryExpressionComparisonCompositeExpression
CountDateUtilDistinctExpressionExpressionIterator
FnIfIsALiteralMatchMaximumMinimum
NotOrQueryRangeStringUtil
SumVariableVariance
Xor
methods _addandaverage
count
distinct
diveqfngtgteiffisaltltemax
min
mod
mul
neqnotororderbyrangeselectstddevsubsumupdatevariancew
here
xor
scal
e
ISca
leM
apLi
near
Sca
leLo
gSca
leO
rdin
alS
cale
Qua
ntile
Sca
le
Qua
ntita
tiveS
cale
Roo
tSca
leS
cale
Sca
leT
ype
Tim
eSca
le
util
Arr
ays
Col
ors
Dat
es
Dis
play
sFi
lter
Geo
met
ry
IEva
luab
le
IPre
dica
te
IVal
uePr
oxy
Mat
hs
Orient
atio
n
Prop
erty
Shap
esSort
Stats
String
s
heap
Fibonac
ciHea
p
HeapNode
math
DenseMatri
xIMatri
x
SparseMatrix
palette
ColorPalettePalette
ShapePalette
SizePalette
vis
Visualization
axis
AxesAxisAxisGridLine
AxisLabel
CartesianAxes
controls
AnchorControlClickControlControlControlListDragControlExpandControlHoverControlIControlPanZoomControlSelectionControlTooltipControl
data
DataDataList
DataSpriteEdgeSpriteNodeSprite
ScaleBindingTree
TreeBuilderrender
ArrowType
EdgeRenderer
IRenderer
ShapeRenderer events
DataEvent
SelectionEvent
TooltipEvent
VisualizationEventlegend
Legend
LegendItem
LegendRange
operator
IOperator
Operator
OperatorList
OperatorSequence
OperatorSwitch
SortOperator
distortion
BifocalDistortion
Distortion
FisheyeDistortion
encoder
ColorEncoderEncoder
PropertyEncoder
ShapeEncoder
SizeEncoder
filter
FisheyeTreeFilter
GraphDistanceFilter
VisibilityFilter
label
Labeler
RadialLabeler
StackedAreaLabeler
layout
AxisLayout
BundledEdgeR
outerC
ircleLayout
CircleP
ackingLayoutD
endrogramLayout
ForceDirectedLayout
IcicleTreeLayout
IndentedTreeLayout
LayoutN
odeLinkTreeLayoutP
ieLayoutR
adialTreeLayout
Random
LayoutS
tackedAreaLayout
TreeM
apLayout
hierarchies: figure 4b. cartesian node-link diagram of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/cluster-radial.html
practice
june 2010 | vol. 53 | no. 6 | communications of the acm 65
Adjacency Diagrams. The adjacency diagram is a space-filling variant of the node-link diagram; rather than draw-ing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in Figure 4d is similar to the first node-link diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a node-link diagram.
The sunburst layout, shown in Fig-ure 4e, is equivalent to the icicle lay-out, but in polar coordinates. Both are implemented using a partition layout, which can also generate a node-link diagram. Similarly, the previous cluster layout can be used to generate a space-filling adjacency diagram in either Car-tesian or polar coordinates.
Enclosure Diagrams. The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rect-angles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. The example shown in Figure 4f uses padding (in blue) to emphasize enclosure; an alternative saturation encoding is sometimes used. Squarified treemaps use approxi-mately square rectangles, which offer better readability and size estimation than a naive “slice-and-dice” subdivi-sion. Fancier algorithms such as Vo-ronoi and jigsaw treemaps also exist but are less common.
By packing circles instead of sub-dividing rectangles, we can produce a different sort of enclosure diagram that has an almost organic appear-ance. Although it does not use space as efficiently as a treemap, the “wast-ed space” of the circle-packing layout, shown in Figure 4g, effectively reveals the hierarchy. At the same time, node sizes can be rapidly compared using area judgments.
NetworksIn addition to organization, one aspect
flare
analytics
cluster
graph
optimization
animate
interpolate
data
converters
display
flex
physics
query
methods
scale
util heap
math
palette
vis
axis
controls data
render
events
legend
operatordistortion
encoder
filterlabel
layout
hierarchies: figure 4g. nested circles layout of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/pack.html Source: The Flare Toolkit http://flare.prefuse.org
flar
e
anal
ytic
s
clus
ter
grap
h
anim
ate
Eas
ing
Tra
nsiti
on
Tra
nsiti
oner
inte
rpol
ate
Inte
rpol
ator
data
conv
erte
rsG
raph
MLC
onve
rter
disp
lay
Dir
tyS
prite
Tex
tSpr
ite
phys
ics
NB
odyF
orce
Sim
ulat
ion
quer
y
Que
ry
met
hods
scal
e
util
Arr
ays
Col
ors
Dat
es
Dis
play
s
Geo
met
ry
Mat
hs
Sha
pes
Str
ings
heap
Fibo
nacc
iHea
p
mat
h
pale
tte
vis
Vis
ualiz
atio
n
axis
Axi
s
cont
rols
Too
ltip
Con
trol
data
Dat
a
Dat
aLis
t
Dat
aSpr
ite
Nod
eSpr
ite
Sca
leB
indi
ng
Tre
eBui
lder
rend
er
lege
ndLe
gend
Lege
ndR
ange
oper
ator
dist
ortio
n
enco
der
filte
r
labe
lLa
bele
r
layo
ut
Cir
cleL
ayou
t
Cir
cleP
acki
ngLa
yout
Forc
eDir
ecte
dLay
out
Nod
eLin
kTre
eLay
out
Rad
ialT
reeL
ayou
t
Sta
cked
Are
aLay
out
Tre
eMap
Layo
ut
hierarchies: figure 4d. icicle tree layout of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/icicle.html
flare
anal
ytic
scl
uste
r
grap
hM
axFl
owM
inC
utan
imat
eEa
sing
Tran
sitio
nTra
nsiti
oner
interpolate
Interpolator
data
converters
GraphMLConverter
display
DirtySprite
TextSprite
physics
NBodyForce
Simulation
query
Query
methods
scale
util
Arrays
Colors
Dates
Displays
Geom
etry
Maths
Shapes
Str
ings
heap
Fibo
nacc
iHea
pm
ath
pale
tte
vis
Visu
aliz
atio
n
axis
Axi
sco
ntrols
SelectionContro
l
TooltipContro
l
data
Data
DataListDataSprite
NodeSprite
ScaleBinding
TreeBuilder
render
legend
Legend
LegendRange
operator
distortion
encoder
filter
label
Labeler
layout
CircleLayoutCirclePackingLayout
ForceDirectedLayout
LayoutN
odeLinkTreeLayout
RadialT
reeLayout
StackedA
reaLayout
TreeM
apLayout
hierarchies: figure 4e. sunburst (radial space-filling) layout of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/sunburst.html
flare
analyticstics
clusteromerativemunityStru
HierarchicalCluste
MergeEdgeanalytgraph
weennessCenLinkDistanc
MaxFlowMinChortestPat
SpanningTree optimizationAspectRatioBanker
animateEasing
FunctionSequence chedul
Parallel
Pause
Scheduler
Sequence
Transition
sition
Transitioner
Tween
interpolateyInterp
lorInterpoInter
Interpolator
trixInterpo
berInter
ectInterp
tInter
angleInter
data
DataFie
DataSchema
DataSe
DataSourceataTa
DataUtil
data
converters
nver
itedTextCo
aphMLConve
ataConv
SONConver display
DirtySprite
LineSprite
RectSprit
TextSprite
flexFlareVis
physics
ragFor
avityFor
Fo
NBodyForce
Particle
Simulation
Spring
SpringForc
query
egateExpr
And
Arithmetic
Average
naryExpress
Comparison
mpositeExpre
Coun
DateUtil
Distinc
Expression
pressionIter
Fn
If IsA
Literal
Match
aximnim
Not
Or
Query
Range
StringUtil
Sum
VariablVariance
Xor
methods
_
add
anderacoun
stin
div eq
fn
gt
gteiff
isa
ltlte
maxmin
modmul
neq not
orrder
range
elec
tddesub
sum
pdarian
wher
xor
scale
IScaleMap
inearSca
LogScale
OrdinalScale
uantileSc
uantitativeSca
RootScal
Scale
ScaleType
TimeScale
flare
utilArrays
Colors
Dates
Displays
Filter
Geometry
valuredialuePr
Maths
ientat
Property
Shapes
Sort
Stats
Strings
heapFibonacciHeap pN
mathDenseMatrix
IMatrSparseMatri
flare
palette
ColorPalette
Palette
hapePalezePale
vis
Visualization
axis
Axes
Axis
GridisLaCartesianAxes
viscontrols
nchorCont
ClickContr
Contro
ControlList
DragControl
ExpandControHoverContro
Cont
anZoomCont
SelectionControl
TooltipControl
data
Data DataList
DataSprite
dgeSpr
NodeSprite ScaleBinding
TreeTreeBuilder
renderrowT
EdgeRendereend
peRend
eventsDataEven
lectionEv
oltipEv
lizatio
legend
Legend
egendIteLegendRange
operator
Operat
Operator
OperatorList
OperatorSequenc
eratorSw
SortOperat
distortionifocalDistort
DistortionsheyeDistort
encoder
lorEnco
EncoderpertyEnc
apeEnco
izeEncod
filterheyeTreeFaphDistanceF
VisibilityFilte
label
Labeler
RadialLabelkedAreaL
operatorlayout
AxisLayout ledEdgeR
CircleLayout
CirclePackingLayo
ndrogramLa
ForceDirectedLayou
cicleTreeLay
dentedTreeLa
Layout
NodeLinkTreeLayou
PieLayout
RadialTreeLayout
om
StackedAreaLayout
TreeMapLayou
hierarchies: figure 4f. treemap layout of the flare package hierarchy.
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/treemap.html
66 communications of the acm | june 2010 | vol. 53 | no. 6
practice
of data that we may wish to explore through visualization is relationship. For example, given a social network, who is friends with whom? Who are the central players? What cliques ex-ist? Who, if anyone, serves as a bridge between disparate groups? Abstractly, a hierarchy is a specialized form of net-work: each node has exactly one link to its parent, while the root node has no links. Thus node-link diagrams are also used to visualize networks, but the loss of hierarchy means a different al-gorithm is required to position nodes.
Mathematicians use the formal term graph to describe a network. A central challenge in graph visualiza-tion is computing an effective layout. Layout techniques typically seek to po-sition closely related nodes (in terms of graph distance, such as the number of links between nodes, or other met-rics) close in the drawing; critically, unrelated nodes must also be placed far enough apart to differentiate rela-tionships. Some techniques may seek to optimize other visual features—for example, by minimizing the number of edge crossings.
Force-directed Layouts. A common and intuitive approach to network lay-out is to model the graph as a physical system: nodes are charged particles that repel each other, and links are damp-ened springs that pull related nodes together. A physical simulation of these forces then determines the node posi-tions; approximation techniques that avoid computing all pairwise forces enable the layout of large numbers of nodes. In addition, interactivity allows the user to direct the layout and jiggle nodes to disambiguate links. Such a force-directed layout is a good starting point for understanding the structure of a general undirected graph. In Figure 5a we use a force-directed layout to view the network of character co-occurrence in the chapters of Victor Hugo’s classic novel, Les Misérables. Node colors de-pict cluster memberships computed by a community-detection algorithm.
Arc Diagrams. An arc diagram, shown in Figure 5b, uses a one-dimen-sional layout of nodes, with circular arcs to represent links. Though an arc diagram may not convey the overall structure of the graph as effectively as a two-dimensional layout, with a good ordering of nodes it is easy to identify
networks: figure 5a. force-directed layout of Les Misérables character co-occurrences.
http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/force.html
Myr
iel
Nap
oleo
nM
lle. B
aptis
tine
Mm
e. M
aglo
ireC
ount
ess
de L
oG
ebor
and
Cha
mpt
erci
erC
rava
tteC
ount
Old
Man
Laba
rreVa
ljean
Mar
guer
iteM
me.
de
RIs
abea
uG
erva
isTh
olom
yes
List
olie
rFa
meu
ilBl
ache
ville
Favo
urite
Dah
liaZe
phin
eFa
ntin
eM
me.
The
nard
ier
Then
ardi
erC
oset
teJa
vert
Fauc
hele
vent
Bam
atab
ois
Perp
etue
Sim
plic
eSc
auffl
aire
Wom
an 1
Judg
eC
ham
pmat
hieu
Brev
etC
heni
ldie
uC
oche
paille
Pont
mer
cyBo
ulat
ruel
leEp
onin
eAn
zelm
aW
oman
2M
othe
r In
noce
ntG
ribie
rJo
ndre
tteM
me.
Bur
gon
Gav
roch
eG
illeno
rman
dM
agno
nM
lle. G
illeno
rman
dM
me.
Pon
tmer
cyM
lle. V
aubo
isLt
. Gille
norm
and
Mar
ius
Baro
ness
TM
abeu
fEn
jolra
sC
ombe
ferre
Prou
vaire
Feui
llyC
ourfe
yrac
Baho
rel
Boss
uet
Joly
Gra
ntai
reM
othe
r Pl
utar
chG
ueul
emer
Babe
tC
laqu
esou
sM
ontp
arna
sse
Tous
sain
tC
hild
1C
hild
2Br
ujon
Mm
e. H
uche
loup
networks: figure 5b. arc diagram of Les Misérables character co-occurrences.
http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/arc.html
Child 1
Chi
ld 1
Child 2
Chi
ld 2
Mother Plutarch
Mot
her
Plu
tarc
h
Gavroche
Gav
roch
e
Marius
Mar
ius
Mabeuf
Mab
euf
Enjolras
Enj
olra
s
Combeferre
Com
befe
rre
Prouvaire
Pro
uvai
re
Feuilly
Feu
illy
Courfeyrac
Cou
rfey
rac
Bahorel
Bah
orel
Bossuet
Bos
suet
Joly
Joly
Grantaire
Gra
ntai
re
Mme. Hucheloup
Mm
e. H
uche
loup
Jondrette
Jond
rette
Mme. Burgon
Mm
e. B
urgo
n
Boulatruelle
Bou
latr
uelle
Cosette
Cos
ette
Woman 2
Wom
an 2
Gillenormand
Gill
enor
man
d
Magnon
Mag
non
Mlle. Gillenormand
Mlle
. G
illen
orm
and
Mme. Pontmercy
Mm
e. P
ontm
ercy
Mlle. Vaubois
Mlle
. V
aubo
is
Lt. Gillenormand
Lt.
Gill
enor
man
d
Baroness T
Bar
ones
s T
Toussaint
Tou
ssai
nt
Mme. Thenardier
Mm
e. T
hena
rdie
r
Thenardier
The
nard
ier
Javert
Jave
rt
Pontmercy
Pon
tmer
cy
Eponine
Epo
nine
Anzelma
Anz
elm
a
Gueulemer
Gue
ulem
er
Babet
Bab
et
Claquesous
Cla
ques
ous
Montparnasse
Mon
tpar
nass
e
Brujon
Bru
jon
Marguerite
Mar
guer
ite
Tholomyes
Tho
lom
yes
Listolier
List
olie
r
Fameuil
Fam
euil
Blacheville
Bla
chev
ille
Favourite
Fav
ourit
e
Dahlia
Dah
lia
Zephine
Zep
hine
Fantine
Fan
tine
Perpetue
Per
petu
e
Labarre
Laba
rre
Valjean
Val
jean
Mme. de R
Mm
e. d
e R
Isabeau
Isab
eau
Gervais
Ger
vais
Bamatabois
Bam
atab
ois
Simplice
Sim
plic
e
Scaufflaire
Sca
uffla
ire
Woman 1
Wom
an 1
Judge
Judg
e
Champmathieu
Cha
mpm
athi
eu
Brevet
Bre
vet
Chenildieu
Che
nild
ieu
Cochepaille
Coc
hepa
ille
Myriel
Myr
iel
Napoleon
Nap
oleo
n
Mlle. Baptistine
Mlle
. B
aptis
tine
Mme. Magloire
Mm
e. M
aglo
ire
Countess de Lo
Cou
ntes
s de
Lo
Geborand
Geb
oran
d
Champtercier
Cha
mpt
erci
er
Cravatte
Cra
vatte
Count
Cou
nt
Old Man
Old
Man
Fauchelevent
Fau
chel
even
t
Mother Innocent
Mot
her
Inno
cent
Gribier
Grib
ier
networks: figure 5c. matrix view of Les Misérables character co-occurrences.
http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/matrix.htmlSource: http://www-personal.umich.edu/~mejn/netdata
practice
june 2010 | vol. 53 | no. 6 | communications of the acm 67
cliques and bridges. Further, as with the indented-tree layout, multivariate data can easily be displayed alongside nodes. The problem of sorting the nodes in a manner that reveals underlying cluster structure is formally called seriation and has diverse applications in visualiza-tion, statistics, and even archaeology.
Matrix Views. Mathematicians and computer scientists often think of a graph in terms of its adjacency matrix: each value in row i and column j in the matrix corresponds to the link from node i to node j. Given this representa-tion, an obvious visualization then is: just show the matrix! Using color or sat-uration instead of text allows values as-sociated with the links to be perceived more rapidly.
The seriation problem applies just as much to the matrix view, shown in Figure 5c, as to the arc diagram, so the order of rows and columns is im-portant: here we use the groupings generated by a community-detection algorithm to order the display. While path-following is more difficult in a matrix view than in a node-link dia-gram, matrices have a number of com-pensating advantages. As networks get large and highly connected, node-link diagrams often devolve into giant hairballs of line crossings. In matrix views, however, line crossings are im-possible, and with an effective sort-ing one quickly can spot clusters and bridges. Allowing interactive group-ing and reordering of the matrix facili-tates even deeper exploration of net-work structure.
conclusionWe have arrived at the end of our tour and hope the reader has found the ex-amples both intriguing and practical. Though we have visited a number of visual encoding and interaction tech-niques, many more species of visualiza-tion exist in the wild, and others await discovery. Emerging domains such as bioinformatics and text visualization are driving researchers and designers to continually formulate new and creative representations or find more powerful ways to apply the classics. In either case, the DNA underlying all visualizations remains the same: the principled map-ping of data variables to visual features such as position, size, shape, and color.
As you leave the zoo and head back
into the wild, try deconstructing the various visualizations crossing your path. Perhaps you can design a more ef-fective display?
Additional Resources
Few, S. Now I See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press, 2009.
Tufte, E. The Visual Display of Quantitative Information. Graphics Press, 1983.
Tufte, E.Envisioning Information. Graphics Press, 1990.
Ware, C.Visual Thinking for Design. Morgan Kaufmann, 2008.
Wilkinson, L.The Grammar of Graphics. Springer, 1999.
Visualization Development Tools
Prefuse: Java API for information visualization.
Prefuse Flare: ActionScript 3 library for data visualization in the Adobe Flash Player.
Processing: Popular language and IDE for graphics and interaction.
Protovis: JavaScript tool for Web-based visualization.
The Visualization Toolkit: Library for 3D and scientific visualization.
Related articles on queue.acm.org
A Conversation with Jeff heer, Martin Wattenberg, and Fernanda Viégashttp://queue.acm.org/detail.cfm?id=1744741
Unifying Biological Image Formats with hDF5Matthew T. Dougherty, Michael J. Folk, Erez Zadok, Herbert J. Bernstein, Frances C. Bernstein, Kevin W. Eliceiri, Werner Benger, Christoph Besthttp://queue.acm.org/detail.cfm?id=1628215
Jeffrey heer is an assistant professor of computer science at Stanford university, where he works on human-computer interaction, visualization, and social computing. He led the design of the Prefuse, Flare, and Protovis visualization toolkits.
Michael Bostock is currently a Ph.D. student in the Department of Computer Science at Stanford university. before attending Stanford, he was a staff engineer at google, where he developed search quality evaluation methodologies.
Vadim Ogievetsky is a master’s student at Stanford university specializing in human-computer interaction. He is a core contributor to Protovis, an open-source Web-based visualization toolkit.
© 2010 ACM 0001-0782/10/0600 $10.00
all visualizations share a common “Dna”—a set of mappings between data properties and visual attributes such as position, size, shape, and color—and customized species of visualization might always be constructed by varying these encodings.