byjeff rey heer, michael bostock, and vadim ogievetsky …this minimal-istic representation uses the...

june 2010 | vol. 53 | no. 6 | communications of the acm 59

Doi:10.1145/1743546.1743567

Article development led by queue.acm.org

A survey of powerful visualization techniques, from the obvious to the obscure.

BY JeffReY heeR, michaeL BostocK, anD VaDim oGieVetsKY

a tour through the Visualization zoo

help engage more diverse audiences in exploration and analysis. The challenge is to create effective and engaging visu-alizations that are appropriate to the data.

Creating a visualization requires a number of nuanced judgments. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color. The challenge is that for any given data set the number of visual encodings—and thus the space of possible visualization designs—is extremely large. To guide this process, computer scientists, psy-

of valuable information on how we conduct our businesses, governments, and personal lives. To put the informa-tion to good use, we must find ways to explore, relate, and communicate the data meaningfully.

The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual in-ferences and improve comprehension, memory, and decision making. By mak-ing data more accessible and appeal-ing, visual representations may also

ThAnKs To ADVA n Ces in sensing, networking, and data management, our society is producing digital information at an astonishing rate. According to one estimate, in 2010 alone we will generate 1,200 exabytes—60 million times the content of the Library of Congress. Within this deluge of data lies a wealth

60 communications of the acm | june 2010 | vol. 53 | no. 6

practice

chologists, and statisticians have stud-ied how well different encodings facili-tate the comprehension of data types such as numbers, categories, and net-works. For example, graphical percep-tion experiments find that spatial po-sition (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally prefer-able to visual variables such as angle, one-dimensional length, two-dimen-sional area, three-dimensional volume, and color saturation. Thus, it should be no surprise that the most common data graphics, including bar charts, line charts, and scatter plots, use posi-tion encodings. Our understanding of graphical perception remains incom-plete, however, and must appropriately be balanced with interaction design and aesthetics.

This article provides a brief tour through the “visualization zoo,” show-casing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Suma-tran tiger. Analogously, we cover some of the more exotic (but practically use-ful) forms of visual data representation, starting with one of the most common, time-series data; continuing on to sta-tistical data and maps; and then com-pleting the tour with hierarchies and networks. Along the way, bear in mind that all visualizations share a common “DNA”—a set of mappings between data properties and visual attributes such as position, size, shape, and col-or—and that customized species of vi-sualization might always be construct-ed by varying these encodings.

Each visualization shown here is accompanied by an online interactive example that can be viewed at the URL displayed beneath it. The live examples were created using Protovis, an open source language for Web-based data visualization. To learn more about how a visualization was made (or to copy and paste it for your own use), see the online version of this article available on the ACM Queue site at http://queue.

-1.0x

0.0x

1.0x

2.0x

3.0x

4.0x

5.0x

Gai

n /

Los

s Fa

ctor

S&P 500MSFT

AMZN

IBM

AAPL

GOOG

Jan 2005

time-series Data: figure 1a. index chart of selected technology stocks, 2000–2010.

Source: yahoo! Finance; http://hci.stanford.edu/jheer/files/zoo/ex/time/index-chart.html

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

Agriculture

Business services

Construction

Education and Health

FinanceGovernmentInformation

Leisure and hospitality

ManufacturingMining and ExtractionOtherSelf-employedTransportation and Utilities

Wholesale and Retail Trade

time-series Data: figure 1b. stacked graph of unemployed u.s. workers by industry, 2000–2010.

Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/stack.html

Self-employed Agriculture

Other Leisure and hospitality

Education and Health Business services

Finance Information

Transportation and Utilities Wholesale and Retail Trade

Manufacturing Construction

Mining and Extraction Government

time-series Data: figure 1c. small multiples of unemployed u.s. workers, normalized by industry, 2000–2010.

Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/multiples.html

time-series Data: figure 1d. horizon graphs of u.s. unemployment rate, 2000–2010.

Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/horizon.html

practice


acm.org/detail.cfm?id=1780401/. All example source code is released into the public domain and has no restric-tions on reuse or modification. Note, however, that these examples will work only on a modern, standards-compli-ant browser supporting scalable vector graphics (SVG). Supported browsers in-clude recent versions of Firefox, Safari, Chrome, and Opera. Unfortunately, In-ternet Explorer 8 and earlier versions do not support SVG and so cannot be used to view the interactive examples.

time-series Data Sets of values changing over time—or, time-series data—is one of the most common forms of recorded data. Time-varying phenomena are central to many domains such as finance (stock prices, exchange rates), science (temperatures, pollution levels, electric potentials), and public policy (crime rates). One of-ten needs to compare a large number of time series simultaneously and can choose from a number of visualizations to do so.

Index Charts. With some forms of time-series data, raw values are less im-portant than relative changes. Consider investors who are more interested in a stock’s growth rate than its specific price. Multiple stocks may have dra-matically different baseline prices but may be meaningfully compared when normalized. An index chart is an inter-active line chart that shows percentage changes for a collection of time-series data based on a selected index point. For example, the image in Figure 1a shows the percentage change of select-ed stock prices if purchased in January 2005: one can see the rocky rise enjoyed by those who invested in Amazon, Ap-ple, or Google at that time.

Stacked Graphs. Other forms of time-series data may be better seen in aggregate. By stacking area charts on top of each other, we arrive at a visual summation of time-series values—a stacked graph. This type of graph (some-times called a stream graph) depicts aggregate patterns and often supports drill-down into a subset of individual series. The chart in Figure 1b shows the number of unemployed workers in the U.S. over the past decade, subdivided by industry. While such charts have prov-en popular in recent years, they do have some notable limitations. A stacked

graph does not support negative num-bers and is meaningless for data that should not be summed (temperatures, for example). Moreover, stacking may make it difficult to accurately interpret trends that lie atop other curves. Inter-active search and filtering is often used to compensate for this problem.

Small Multiples. In lieu of stacking, multiple time series can be plotted within the same axes, as in the index chart. Placing multiple series in the same space may produce overlapping curves that reduce legibility, however. An alternative approach is to use small multiples: showing each series in its own chart. In Figure 1c we again see the number of unemployed workers, but normalized within each industry category. We can now more accurately see both overall trends and seasonal patterns in each sector. While we are considering time-series data, note that small multiples can be constructed for just about any type of visualization: bar charts, pie charts, maps, among others. This often produces a more effective vi-sualization than trying to coerce all the data into a single plot.

Horizon Graphs. What happens when you want to compare even more time series at once? The horizon graph is a technique for increasing the data density of a time-series view while pre-serving resolution. Consider the five graphs shown in Figure 1d. The first one is a standard area chart, with posi-tive values colored blue and negative values colored red. The second graph “mirrors” negative values into the same region as positive values, doubling the data density of the area chart. The third chart—a horizon graph—doubles the data density yet again by dividing the graph into bands and layering them to create a nested form. The result is a chart that preserves data resolution but uses only a quarter of the space. Al-though the horizon graph takes some time to learn, it has been found to be more effective than the standard plot when the chart sizes get quite small.

statistical Distributions Other visualizations have been de-signed to reveal how a set of numbers is distributed and thus help an analyst better understand the statistical prop-erties of the data. Analysts often want to fit their data to statistical models, ei-

ther to test hypotheses or predict future values, but an improper choice of mod-el can lead to faulty predictions. Thus, one important use of visualizations is exploratory data analysis: gaining in-sight into how data is distributed to inform data transformation and mod-eling decisions. Common techniques include the histogram, which shows the prevalence of values grouped into bins, and the box-and-whisker plot, which can convey statistical features such as the mean, median, quartile boundaries, or extreme outliers. In addition, a number of other techniques exist for assessing a distribution and examining interac-tions between multiple dimensions.

Stem-and-Leaf Plots. For assessing a collection of numbers, one alternative to the histogram is the stem-and-leaf plot. It typically bins numbers accord-ing to the first significant digit, and then stacks the values within each bin by the second significant digit. This minimal-istic representation uses the data itself to paint a frequency distribution, re-placing the “information-empty” bars of a traditional histogram bar chart and allowing one to assess both the overall distribution and the contents of each bin. In Figure 2a, the stem-and-leaf plot shows the distribution of completion rates of workers completing crowd-sourced tasks on Amazon’s Mechani-cal Turk. Note the multiple clusters: one group clusters around high levels of completion (99%–100%); at the oth-er extreme is a cluster of Turkers who complete only a few tasks (~10%) in a group.

Q-Q Plots. Though the histogram and the stem-and-leaf plot are common tools for assessing a frequency distribu-tion, the Q-Q (quantile-quantile) plot is a more powerful tool. The Q-Q plot com-pares two probability distributions by graphing their quantiles against each other. If the two are similar, the plotted values will lie roughly along the central diagonal. If the two are linearly related, values will again lie along a line, though with varying slope and intercept.

Figure 2b shows the same Mechani-cal Turk participation data compared with three statistical distributions. Note how the data forms three distinct components when compared with uni-form and normal (Gaussian) distribu-tions: this suggests that a statistical model with three components might


practice

be more appropriate, and indeed we see in the final plot that a fitted mixture of three normal distributions provides a better fit. Though powerful, the Q-Q plot has one obvious limitation in that its effective use requires that viewers possess some statistical knowledge.

SPLOM (Scatter Plot Matrix). Other visualization techniques attempt to represent the relationships among multiple variables. Multivariate data occurs frequently and is notoriously hard to represent, in part because of the difficulty of mentally picturing data in more than three dimensions. One technique to overcome this problem is to use small multiples of scatter plots showing a set of pairwise relations among variables, thus creating the SP-LOM (scatter plot matrix). A SPLOM en-ables visual inspection of correlations between any pair of variables.

In Figure 2c a scatter plot matrix is used to visualize the attributes of a da-tabase of automobiles, showing the re-lationships among horsepower, weight, acceleration, and displacement. Addi-tionally, interaction techniques such as brushing-and-linking—in which a selection of points on one graph high-lights the same points on all the other graphs—can be used to explore pat-terns within the data.

Parallel Coordinates. As shown in Figure 2d, parallel coordinates (||-co-ord) take a different approach to visu-alizing multivariate data. Instead of graphing every pair of variables in two dimensions, we repeatedly plot the data on parallel axes and then connect the corresponding points with lines. Each poly-line represents a single row in the database, and line crossings between dimensions often indicate inverse cor-relation. Reordering dimensions can aid pattern-finding, as can interactive querying to filter along one or more di-mensions. Another advantage of paral-lel coordinates is that they are relatively compact, so many variables can be shown simultaneously.

mapsAlthough a map may seem a natural way to visualize geographical data, it has a long and rich history of design. Many maps are based upon a carto-graphic projection: a mathematical function that maps the 3D geometry of the Earth to a 2D image. Other maps

0 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 6 7 8 8 8 8 8 8 9

1 0 0 0 0 1 1 1 1 2 2 3 3 3 3 4 4 4 4 5 5 6 7 7 8 9 9 9 9 9

2 0 0 1 1 1 5 7 8 9

3 0 0 1 2 3 3 3 4 6 6 8 8

4 0 0 1 1 1 1 3 3 4 5 5 5 6 7 8 9

5 0 2 3 5 6 7 7 7 9

6 1 2 6 7 8 9 9 9

7 0 0 0 1 6 7 9

8 0 0 1 2 3 4 4 4 4 4 4 4 5 6 7 7 7 9

9 1 3 3 5 7 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9

10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

statistical Distributions: figure 2a. stem-and-leaf plot of mechanical turk participation rates.

Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/stem-and-leaf.html

Uniform Distribution

0%

50%

100%

0% 50% 100%

Gaussian Distribution0% 50% 100%

Fitted Mixture of 3 Gaussians0% 50% 100%T

urke

r T

ask

Gro

up C

ompl

etio

n %

statistical Distributions: figure 2b. Q-Q plots of mechanical turk participation rates.

Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/qqplot.html

cylinders displacement weight horsepower acceleration mpg year

3 68 cubic inch 1613 lbs 46 hp 8 (0 to 60mph) 9 miles/gallon 70

8 455 cubic inch 5140 lbs 230 hp 25 (0 to 60mph) 47 miles/gallon 82

statistical Distributions: figure 2d. Parallel coordinates of automobile data.

Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/parallel.html

horsepower

20003000

40005000 10 15 20

100200

300400

50

100

150

200

2000

3000

4000

5000

weight

2000

3000

4000

5000

10

15

20

acceleration

10

15

20

50 100150

200

100

200

300

400

20003000

40005000

10 15 20

displacement

United States European Union Japan

statistical Distributions: figure 2c. scatter plot matrix of automobile data.

Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/splom.html

practice


knowingly distort or abstract geo-graphic features to tell a richer story or highlight specific data.

Flow Maps. By placing stroked lines on top of a geographic map, a flow map can depict the movement of a quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate information: path points, direction, line thickness, and color can all be used to present dimensions of information to the viewer. Figure 3a is a modern interpretation of Charles Mi-nard’s depiction of Napoleon’s ill-fated march on Moscow. Many of the greatest flow maps also involve subtle uses of distortion, as geography is bended to accommodate or highlight flows.

Choropleth Maps. Data is often col-lected and aggregated by geographi-cal areas such as states. A standard approach to communicating this data is to use a color encoding of the geo-graphic area, resulting in a choropleth map. Figure 3b uses a color encoding to communicate the prevalence of obe-sity in each state in the U.S. Though this is a widely used visualization tech-nique, it requires some care. One com-mon error is to encode raw data values (such as population) rather than using normalized values to produce a densi-ty map. Another issue is that one’s per-ception of the shaded value can also be affected by the underlying area of the geographic region.

Graduated Symbol Maps. An alterna-tive to the choropleth map, the gradu-ated symbol map places symbols over an underlying map. This approach avoids confounding geographic area with data values and allows for more dimensions to be visualized (for example, symbol size, shape, and color). In addition to simple shapes such as circles, gradu-ated symbol maps may use more com-plicated glyphs such as pie charts. In Figure 3c, total circle size represents a state’s population, and each slice indi-cates the proportion of people with a specific BMI rating.

Cartograms. A cartogram distorts the shape of geographic regions so that the area directly encodes a data variable. A common example is to redraw every country in the world sizing it propor-tionally to population or gross domes-tic product. Many types of cartograms have been created; in Figure 3d we use the Dorling cartogram, which represents

0°-10°-20°-30°

18 Oct24 Oct09 Nov

14 Nov24 Nov

28 Nov01 Dec06 Dec07 Dec

maps: figure 3a. flow map of napoleon’s march on moscow, based on the work of charles minard.

http://hci.stanford.edu/jheer/files/zoo/ex/maps/napoleon.html

WY

WI

WV

WA

OR

NV CO

SD

NDMT

ID

UT

NMAZ

CA

NE

KS

OK

TX LA

MS AL GA

ARTN

SC

NC

FL

MN

IA

MO

IL

KY

IN

MI

OH

VA

DEMD

PANJ

NY

CT RIMA

NH

ME

VT

14 - 17%

17 - 20%

20 - 23%

23 - 26%

26 - 29%

29 - 32%

32 - 35%

maps: figure 3b. choropleth map of obesity in the u.s., 2008.

Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/choropleth.html

Obese

Overweight

Normal

maps: figure 3c. Graduated symbol map of obesity in the u.s., 2008.

Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/symbol.html

WY

WI

WV

WA

OR

NVCO

SD

NDMT

ID

UT

NMAZCA

NE

KS

OK

TX

LA

MS

AL

GA

ARTN

SC

NC

FL

MN

IA

MO

IL

KY

IN

MI

OH

VA

DEMD

PA

NJ

NY

CT

RI

MA

NH

ME

VT

10M

1M

5M

100K14 - 17%17 - 20%20 - 23%23 - 26%26 - 29%29 - 32%32 - 35%

maps: figure 3d. Dorling cartogram of obesity in the u.s., 2008.

Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/cartogram.html


practice

each geographic region with a sized circle, placed so as to resemble the true geographic configuration. In this ex-ample, circular area encodes the total number of obese people per state, and color encodes the percentage of the to-tal population that is obese.

hierarchiesWhile some data is simply a flat collec-tion of numbers, most can be organized into natural hierarchies. Consider: spa-tial entities, such as counties, states, and countries; command structures for businesses and governments; soft-ware packages and phylogenetic trees. Even for data with no apparent hierar-chy, statistical methods (for example, k-means clustering) may be applied to organize data empirically. Special visu-alization techniques exist to leverage hierarchical structure, allowing rapid multiscale inferences: micro-observa-tions of individual elements and mac-ro-observations of large groups.

Node-link diagrams. The word tree is used interchangeably with hierarchy, as the fractal branches of an oak might mirror the nesting of data. If we take a two-dimensional blueprint of a tree, we have a popular choice for visualizing hierarchies: a node-link diagram. Many different tree-layout algorithms have been designed; the Reingold-Tilford al-gorithm, used in Figure 4a on a package hierarchy of software classes, produces a tidy result with minimal wasted space.

An alternative visualization scheme is the dendrogram (or cluster) algorithm, which places leaf nodes of the tree at the same level. Thus, in the diagram in Fig-ure 4b, the classes (orange leaf nodes) are on the diameter of the circle, with the packages (blue internal nodes) in-side. Using polar rather than Cartesian coordinates has a pleasing aesthetic, while using space more efficiently.

We would be remiss to overlook the indented tree, used ubiquitously by operating systems to represent file directories, among other applications (see Figure 4c). Although the indented tree requires excessive vertical space and does not facilitate multiscale infer-ences, it does allow efficient interactive exploration of the tree to find a specific node. In addition, it allows rapid scan-ning of node labels, and multivariate data such as file size can be displayed adjacent to the hierarchy.

flare

analytics

cluster

Agglom

erativeCluster

Com

munityS

tructureH

ierarchicalCluster

MergeE

dge

graph

Betw

eennessCentrality

LinkDistance

MaxFlow

MinC

utS

hortestPaths

SpanningT

ree

optimization

AspectR

atioBanker

animate

Easing

FunctionSequence

ISchedulable

Parallel

Pause

Scheduler

Sequence

Transition

TransitionE

ventT

ransitionerT

ween

interpolate

ArrayInterpolator

ColorInterpolator

DateInterpolator

InterpolatorM

atrixInterpolatorN

umberInterpolator

ObjectInterpolator

PointInterpolator

RectangleInterpolator

data

DataField

DataS

chema

DataS

etD

ataSource

DataT

ableD

ataUtil

converters

Converters

Delim

itedTextC

onverterG

raphMLC

onverterID

ataConverter

JS

ON

Converter

display

DirtyS

priteLineS

priteR

ectSprite

TextS

prite

flex

FlareVis

physics

DragForce

GravityForce

IForceN

BodyForce

Particle

Sim

ulationS

pringS

pringForce

query

AggregateE

xpressionA

ndA

rithmetic

Average

BinaryE

xpressionC

omparison

Com

positeExpression

Count

DateU

tilD

istinctE

xpressionE

xpressionIteratorFn If IsALiteralM

atchM

aximum

Minim

umN

otO

rQ

ueryR

angeS

tringUtil

Sum

Variable

Variance

Xorm

ethods

_ addandaveragecountdistinctdiveq fn gt gteiff isalt ltem

axm

inm

odm

ulneqnotor orderbyrangeselectstddevsubsumupdatevariancew

herexor

scale

IScaleM

apLinearS

caleLogS

caleO

rdinalScale

QuantileS

caleQ

uantitativeScale

RootS

caleS

caleS

caleType

Tim

eScale

util

Arrays

Colors

Dates

Displays

FilterG

eometry

IEvaluable

IPredicate

IValueP

roxyM

athsO

rientationP

ropertyS

hapesS

ortS

tatsS

tringsheap

FibonacciHeap

HeapN

ode

math

DenseM

atrixIM

atrixS

parseMatrix

palette

ColorP

aletteP

aletteS

hapePalette

SizeP

alette

vis

Visualization

axis

Axes

Axis

AxisG

ridLineA

xisLabelC

artesianAxes

controls

AnchorC

ontrolC

lickControl

Control

ControlList

DragC

ontrolE

xpandControl

HoverC

ontrolIC

ontrolP

anZoomC

ontrolS

electionControl

TooltipC

ontrol

data

Data

DataList

DataS

priteE

dgeSprite

NodeS

priteS

caleBinding

Tree

TreeB

uilderrender

Arrow

Type

EdgeR

endererIR

endererS

hapeRenderer

events

DataE

ventS

electionEvent

TooltipE

ventV

isualizationEvent

legendLegendLegendItemLegendR

ange

operator

IOperator

Operator

OperatorList

OperatorS

equenceO

peratorSw

itchS

ortOperator

distortionB

ifocalDistortion

Distortion

FisheyeDistortion

encoder

ColorE

ncoderE

ncoderP

ropertyEncoder

ShapeE

ncoderS

izeEncoder

filterFisheyeT

reeFilterG

raphDistanceFilter

VisibilityFilter

labelLabelerR

adialLabelerS

tackedAreaLabeler

layout

AxisLayout

BundledE

dgeRouter

CircleLayout

CircleP

ackingLayoutD

endrogramLayout

ForceDirectedLayout

IcicleTreeLayout

IndentedTreeLayout

LayoutN

odeLinkTreeLayout

PieLayout

RadialT

reeLayoutR

andomLayout

StackedA

reaLayoutT

reeMapLayout

hierarchies: figure 4a. Radial node-link diagram of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/tree.html

933KB47KB14KB25KB

6KB97KB29KB23KB

4KB29KB

1KB1KB0KB

10KB2KB9KB2KB1KB

87KB30KB

161KB422KB

16KB33KB

flareanalytics

clustergraphoptimization

animatedatadisplayflexphysics

DragForceGravityForceIForceNBodyForceParticleSimulationSpringSpringForce

queryscaleutilvis

Visualizationaxis

43KB107KB

6KB35KB

179KB1KB2KB5KB4KB2KB1KB

13KB14KB11KB16KB

105KB6KB3KB9KB

11KB4KB8KB4KB3KB

controlsdataeventslegendoperator

IOperatorOperatorOperatorListOperatorSequenceOperatorSwitchSortOperatordistortionencoderfilterlabellayout

AxisLayoutBundledEdgeRouterCircleLayoutCirclePackingLayoutDendrogramLayoutForceDirectedLayoutIcicleTreeLayoutIndentedTreeLayout

hierarchies: figure 4c. indented tree layout of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/indent.html

flare

anal

ytic

s

clus

ter

Agg

lom

erat

iveC

lust

erC

omm

unity

Str

uctu

reH

iera

rchi

calC

lust

erM

erge

Edg

egr

aph

Bet

wee

nnes

sCen

tral

ityLi

nkD

ista

nce

Max

Flow

Min

Cut

Sho

rtes

tPat

hsS

pann

ingT

ree

optim

izat

ion

Asp

ectR

atio

Ban

ker

anim

ate

Easi

ngFu

nctio

nSeq

uenc

e

ISch

edul

able

Para

llel

Paus

eSc

hedu

ler

Sequ

ence

Tran

sitio

nTr

ansi

tionE

vent

Tran

sitio

ner

Twee

n

interp

olat

e

Array

Inte

rpol

ator

Color

Inte

rpol

ator

DateI

nter

pola

tor

Inte

rpol

ator

Mat

rixIn

terp

olat

or

Number

Inte

rpolat

or

ObjectInte

rpolato

r

PointInte

rpolato

r

RectangleInterpolator

data

DataField

DataSchema

DataSet

DataSource

DataTable

DataUtil

converters

Converters

DelimitedTextConverter

GraphMLConverter

IDataConverter

JSONConverter

display

DirtySprite

LineSprite

RectSprite

TextSprite

flexFlareVis

physics

DragForceGravityForce

IForceNBodyForceParticleSimulationSpringSpringForce

query

AggregateExpressionAndArithmeticAverageBinaryExpressionComparisonCompositeExpression

CountDateUtilDistinctExpressionExpressionIterator

FnIfIsALiteralMatchMaximumMinimum

NotOrQueryRangeStringUtil

SumVariableVariance

Xor

methods _addandaverage

count

distinct

diveqfngtgteiffisaltltemax

min

mod

mul

neqnotororderbyrangeselectstddevsubsumupdatevariancew

here

xor

scal

e

ISca

leM

apLi

near

Sca

leLo

gSca

leO

rdin

alS

cale

Qua

ntile

Sca

le

Qua

ntita

tiveS

cale

Roo

tSca

leS

cale

Sca

leT

ype

Tim

eSca

le

util

Arr

ays

Col

ors

Dat

es

Dis

play

sFi

lter

Geo

met

ry

IEva

luab

le

IPre

dica

te

IVal

uePr

oxy

Mat

hs

Orient

atio

n

Prop

erty

Shap

esSort

Stats

String

s

heap

Fibonac

ciHea

p

HeapNode

math

DenseMatri

xIMatri

x

SparseMatrix

palette

ColorPalettePalette

ShapePalette

SizePalette

vis

Visualization

axis

AxesAxisAxisGridLine

AxisLabel

CartesianAxes

controls

AnchorControlClickControlControlControlListDragControlExpandControlHoverControlIControlPanZoomControlSelectionControlTooltipControl

data

DataDataList

DataSpriteEdgeSpriteNodeSprite

ScaleBindingTree

TreeBuilderrender

ArrowType

EdgeRenderer

IRenderer

ShapeRenderer events

DataEvent

SelectionEvent

TooltipEvent

VisualizationEventlegend

Legend

LegendItem

LegendRange

operator

IOperator

Operator

OperatorList

OperatorSequence

OperatorSwitch

SortOperator

distortion

BifocalDistortion

Distortion

FisheyeDistortion

encoder

ColorEncoderEncoder

PropertyEncoder

ShapeEncoder

SizeEncoder

filter

FisheyeTreeFilter

GraphDistanceFilter

VisibilityFilter

label

Labeler

RadialLabeler

StackedAreaLabeler

layout

AxisLayout

BundledEdgeR

outerC

ircleLayout

CircleP

ackingLayoutD

endrogramLayout

ForceDirectedLayout

IcicleTreeLayout

IndentedTreeLayout

LayoutN

odeLinkTreeLayoutP

ieLayoutR

adialTreeLayout

Random

LayoutS

tackedAreaLayout

TreeM

apLayout

hierarchies: figure 4b. cartesian node-link diagram of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/cluster-radial.html

practice


Adjacency Diagrams. The adjacency diagram is a space-filling variant of the node-link diagram; rather than draw-ing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in Figure 4d is similar to the first node-link diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a node-link diagram.

The sunburst layout, shown in Fig-ure 4e, is equivalent to the icicle lay-out, but in polar coordinates. Both are implemented using a partition layout, which can also generate a node-link diagram. Similarly, the previous cluster layout can be used to generate a space-filling adjacency diagram in either Car-tesian or polar coordinates.

Enclosure Diagrams. The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rect-angles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. The example shown in Figure 4f uses padding (in blue) to emphasize enclosure; an alternative saturation encoding is sometimes used. Squarified treemaps use approxi-mately square rectangles, which offer better readability and size estimation than a naive “slice-and-dice” subdivi-sion. Fancier algorithms such as Vo-ronoi and jigsaw treemaps also exist but are less common.

By packing circles instead of sub-dividing rectangles, we can produce a different sort of enclosure diagram that has an almost organic appear-ance. Although it does not use space as efficiently as a treemap, the “wast-ed space” of the circle-packing layout, shown in Figure 4g, effectively reveals the hierarchy. At the same time, node sizes can be rapidly compared using area judgments.

NetworksIn addition to organization, one aspect

flare

analytics

cluster

graph

optimization

animate

interpolate

data

converters

display

flex

physics

query

methods

scale

util heap

math

palette

vis

axis

controls data

render

events

legend

operatordistortion

encoder

filterlabel

layout

hierarchies: figure 4g. nested circles layout of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/pack.html Source: The Flare Toolkit http://flare.prefuse.org

flar

e

anal

ytic

s

clus

ter

grap

h

anim

ate

Eas

ing

Tra

nsiti

on

Tra

nsiti

oner

inte

rpol

ate

Inte

rpol

ator

data

conv

erte

rsG

raph

MLC

onve

rter

disp

lay

Dir

tyS

prite

Tex

tSpr

ite

phys

ics

NB

odyF

orce

Sim

ulat

ion

quer

y

Que

ry

met

hods

scal

e

util

Arr

ays

Col

ors

Dat

es

Dis

play

s

Geo

met

ry

Mat

hs

Sha

pes

Str

ings

heap

Fibo

nacc

iHea

p

mat

h

pale

tte

vis

Vis

ualiz

atio

n

axis

Axi

s

cont

rols

Too

ltip

Con

trol

data

Dat

a

Dat

aLis

t

Dat

aSpr

ite

Nod

eSpr

ite

Sca

leB

indi

ng

Tre

eBui

lder

rend

er

lege

ndLe

gend

Lege

ndR

ange

oper

ator

dist

ortio

n

enco

der

filte

r

labe

lLa

bele

r

layo

ut

Cir

cleL

ayou

t

Cir

cleP

acki

ngLa

yout

Forc

eDir

ecte

dLay

out

Nod

eLin

kTre

eLay

out

Rad

ialT

reeL

ayou

t

Sta

cked

Are

aLay

out

Tre

eMap

Layo

ut

hierarchies: figure 4d. icicle tree layout of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/icicle.html

flare

anal

ytic

scl

uste

r

grap

hM

axFl

owM

inC

utan

imat

eEa

sing

Tran

sitio

nTra

nsiti

oner

interpolate

Interpolator

data

converters

GraphMLConverter

display

DirtySprite

TextSprite

physics

NBodyForce

Simulation

query

Query

methods

scale

util

Arrays

Colors

Dates

Displays

Geom

etry

Maths

Shapes

Str

ings

heap

Fibo

nacc

iHea

pm

ath

pale

tte

vis

Visu

aliz

atio

n

axis

Axi

sco

ntrols

SelectionContro

l

TooltipContro

l

data

Data

DataListDataSprite

NodeSprite

ScaleBinding

TreeBuilder

render

legend

Legend

LegendRange

operator

distortion

encoder

filter

label

Labeler

layout

CircleLayoutCirclePackingLayout

ForceDirectedLayout

LayoutN

odeLinkTreeLayout

RadialT

reeLayout

StackedA

reaLayout

TreeM

apLayout

hierarchies: figure 4e. sunburst (radial space-filling) layout of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/sunburst.html

flare

analyticstics

clusteromerativemunityStru

HierarchicalCluste

MergeEdgeanalytgraph

weennessCenLinkDistanc

MaxFlowMinChortestPat

SpanningTree optimizationAspectRatioBanker

animateEasing

FunctionSequence chedul

Parallel

Pause

Scheduler

Sequence

Transition

sition

Transitioner

Tween

interpolateyInterp

lorInterpoInter

Interpolator

trixInterpo

berInter

ectInterp

tInter

angleInter

data

DataFie

DataSchema

DataSe

DataSourceataTa

DataUtil

data

converters

nver

itedTextCo

aphMLConve

ataConv

SONConver display

DirtySprite

LineSprite

RectSprit

TextSprite

flexFlareVis

physics

ragFor

avityFor

Fo

NBodyForce

Particle

Simulation

Spring

SpringForc

query

egateExpr

And

Arithmetic

Average

naryExpress

Comparison

mpositeExpre

Coun

DateUtil

Distinc

Expression

pressionIter

Fn

If IsA

Literal

Match

aximnim

Not

Or

Query

Range

StringUtil

Sum

VariablVariance

Xor

methods

_

add

anderacoun

stin

div eq

fn

gt

gteiff

isa

ltlte

maxmin

modmul

neq not

orrder

range

elec

tddesub

sum

pdarian

wher

xor

scale

IScaleMap

inearSca

LogScale

OrdinalScale

uantileSc

uantitativeSca

RootScal

Scale

ScaleType

TimeScale

flare

utilArrays

Colors

Dates

Displays

Filter

Geometry

valuredialuePr

Maths

ientat

Property

Shapes

Sort

Stats

Strings

heapFibonacciHeap pN

mathDenseMatrix

IMatrSparseMatri

flare

palette

ColorPalette

Palette

hapePalezePale

vis

Visualization

axis

Axes

Axis

GridisLaCartesianAxes

viscontrols

nchorCont

ClickContr

Contro

ControlList

DragControl

ExpandControHoverContro

Cont

anZoomCont

SelectionControl

TooltipControl

data

Data DataList

DataSprite

dgeSpr

NodeSprite ScaleBinding

TreeTreeBuilder

renderrowT

EdgeRendereend

peRend

eventsDataEven

lectionEv

oltipEv

lizatio

legend

Legend

egendIteLegendRange

operator

Operat

Operator

OperatorList

OperatorSequenc

eratorSw

SortOperat

distortionifocalDistort

DistortionsheyeDistort

encoder

lorEnco

EncoderpertyEnc

apeEnco

izeEncod

filterheyeTreeFaphDistanceF

VisibilityFilte

label

Labeler

RadialLabelkedAreaL

operatorlayout

AxisLayout ledEdgeR

CircleLayout

CirclePackingLayo

ndrogramLa

ForceDirectedLayou

cicleTreeLay

dentedTreeLa

Layout

NodeLinkTreeLayou

PieLayout

RadialTreeLayout

om

StackedAreaLayout

TreeMapLayou

hierarchies: figure 4f. treemap layout of the flare package hierarchy.

http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/treemap.html


practice

of data that we may wish to explore through visualization is relationship. For example, given a social network, who is friends with whom? Who are the central players? What cliques ex-ist? Who, if anyone, serves as a bridge between disparate groups? Abstractly, a hierarchy is a specialized form of net-work: each node has exactly one link to its parent, while the root node has no links. Thus node-link diagrams are also used to visualize networks, but the loss of hierarchy means a different al-gorithm is required to position nodes.

Mathematicians use the formal term graph to describe a network. A central challenge in graph visualiza-tion is computing an effective layout. Layout techniques typically seek to po-sition closely related nodes (in terms of graph distance, such as the number of links between nodes, or other met-rics) close in the drawing; critically, unrelated nodes must also be placed far enough apart to differentiate rela-tionships. Some techniques may seek to optimize other visual features—for example, by minimizing the number of edge crossings.

Force-directed Layouts. A common and intuitive approach to network lay-out is to model the graph as a physical system: nodes are charged particles that repel each other, and links are damp-ened springs that pull related nodes together. A physical simulation of these forces then determines the node posi-tions; approximation techniques that avoid computing all pairwise forces enable the layout of large numbers of nodes. In addition, interactivity allows the user to direct the layout and jiggle nodes to disambiguate links. Such a force-directed layout is a good starting point for understanding the structure of a general undirected graph. In Figure 5a we use a force-directed layout to view the network of character co-occurrence in the chapters of Victor Hugo’s classic novel, Les Misérables. Node colors de-pict cluster memberships computed by a community-detection algorithm.

Arc Diagrams. An arc diagram, shown in Figure 5b, uses a one-dimen-sional layout of nodes, with circular arcs to represent links. Though an arc diagram may not convey the overall structure of the graph as effectively as a two-dimensional layout, with a good ordering of nodes it is easy to identify

networks: figure 5a. force-directed layout of Les Misérables character co-occurrences.

http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/force.html

Myr

iel

Nap

oleo

nM

lle. B

aptis

tine

Mm

e. M

aglo

ireC

ount

ess

de L

oG

ebor

and

Cha

mpt

erci

erC

rava

tteC

ount

Old

Man

Laba

rreVa

ljean

Mar

guer

iteM

me.

de

RIs

abea

uG

erva

isTh

olom

yes

List

olie

rFa

meu

ilBl

ache

ville

Favo

urite

Dah

liaZe

phin

eFa

ntin

eM

me.

The

nard

ier

Then

ardi

erC

oset

teJa

vert

Fauc

hele

vent

Bam

atab

ois

Perp

etue

Sim

plic

eSc

auffl

aire

Wom

an 1

Judg

eC

ham

pmat

hieu

Brev

etC

heni

ldie

uC

oche

paille

Pont

mer

cyBo

ulat

ruel

leEp

onin

eAn

zelm

aW

oman

2M

othe

r In

noce

ntG

ribie

rJo

ndre

tteM

me.

Bur

gon

Gav

roch

eG

illeno

rman

dM

agno

nM

lle. G

illeno

rman

dM

me.

Pon

tmer

cyM

lle. V

aubo

isLt

. Gille

norm

and

Mar

ius

Baro

ness

TM

abeu

fEn

jolra

sC

ombe

ferre

Prou

vaire

Feui

llyC

ourfe

yrac

Baho

rel

Boss

uet

Joly

Gra

ntai

reM

othe

r Pl

utar

chG

ueul

emer

Babe

tC

laqu

esou

sM

ontp

arna

sse

Tous

sain

tC

hild

1C

hild

2Br

ujon

Mm

e. H

uche

loup

networks: figure 5b. arc diagram of Les Misérables character co-occurrences.

http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/arc.html

Child 1

Chi

ld 1

Child 2

Chi

ld 2

Mother Plutarch

Mot

her

Plu

tarc

h

Gavroche

Gav

roch

e

Marius

Mar

ius

Mabeuf

Mab

euf

Enjolras

Enj

olra

s

Combeferre

Com

befe

rre

Prouvaire

Pro

uvai

re

Feuilly

Feu

illy

Courfeyrac

Cou

rfey

rac

Bahorel

Bah

orel

Bossuet

Bos

suet

Joly

Joly

Grantaire

Gra

ntai

re

Mme. Hucheloup

Mm

e. H

uche

loup

Jondrette

Jond

rette

Mme. Burgon

Mm

e. B

urgo

n

Boulatruelle

Bou

latr

uelle

Cosette

Cos

ette

Woman 2

Wom

an 2

Gillenormand

Gill

enor

man

d

Magnon

Mag

non

Mlle. Gillenormand

Mlle

. G

illen

orm

and

Mme. Pontmercy

Mm

e. P

ontm

ercy

Mlle. Vaubois

Mlle

. V

aubo

is

Lt. Gillenormand

Lt.

Gill

enor

man

d

Baroness T

Bar

ones

s T

Toussaint

Tou

ssai

nt

Mme. Thenardier

Mm

e. T

hena

rdie

r

Thenardier

The

nard

ier

Javert

Jave

rt

Pontmercy

Pon

tmer

cy

Eponine

Epo

nine

Anzelma

Anz

elm

a

Gueulemer

Gue

ulem

er

Babet

Bab

et

Claquesous

Cla

ques

ous

Montparnasse

Mon

tpar

nass

e

Brujon

Bru

jon

Marguerite

Mar

guer

ite

Tholomyes

Tho

lom

yes

Listolier

List

olie

r

Fameuil

Fam

euil

Blacheville

Bla

chev

ille

Favourite

Fav

ourit

e

Dahlia

Dah

lia

Zephine

Zep

hine

Fantine

Fan

tine

Perpetue

Per

petu

e

Labarre

Laba

rre

Valjean

Val

jean

Mme. de R

Mm

e. d

e R

Isabeau

Isab

eau

Gervais

Ger

vais

Bamatabois

Bam

atab

ois

Simplice

Sim

plic

e

Scaufflaire

Sca

uffla

ire

Woman 1

Wom

an 1

Judge

Judg

e

Champmathieu

Cha

mpm

athi

eu

Brevet

Bre

vet

Chenildieu

Che

nild

ieu

Cochepaille

Coc

hepa

ille

Myriel

Myr

iel

Napoleon

Nap

oleo

n

Mlle. Baptistine

Mlle

. B

aptis

tine

Mme. Magloire

Mm

e. M

aglo

ire

Countess de Lo

Cou

ntes

s de

Lo

Geborand

Geb

oran

d

Champtercier

Cha

mpt

erci

er

Cravatte

Cra

vatte

Count

Cou

nt

Old Man

Old

Man

Fauchelevent

Fau

chel

even

t

Mother Innocent

Mot

her

Inno

cent

Gribier

Grib

ier

networks: figure 5c. matrix view of Les Misérables character co-occurrences.

http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/matrix.htmlSource: http://www-personal.umich.edu/~mejn/netdata

practice


cliques and bridges. Further, as with the indented-tree layout, multivariate data can easily be displayed alongside nodes. The problem of sorting the nodes in a manner that reveals underlying cluster structure is formally called seriation and has diverse applications in visualiza-tion, statistics, and even archaeology.

Matrix Views. Mathematicians and computer scientists often think of a graph in terms of its adjacency matrix: each value in row i and column j in the matrix corresponds to the link from node i to node j. Given this representa-tion, an obvious visualization then is: just show the matrix! Using color or sat-uration instead of text allows values as-sociated with the links to be perceived more rapidly.

The seriation problem applies just as much to the matrix view, shown in Figure 5c, as to the arc diagram, so the order of rows and columns is im-portant: here we use the groupings generated by a community-detection algorithm to order the display. While path-following is more difficult in a matrix view than in a node-link dia-gram, matrices have a number of com-pensating advantages. As networks get large and highly connected, node-link diagrams often devolve into giant hairballs of line crossings. In matrix views, however, line crossings are im-possible, and with an effective sort-ing one quickly can spot clusters and bridges. Allowing interactive group-ing and reordering of the matrix facili-tates even deeper exploration of net-work structure.

conclusionWe have arrived at the end of our tour and hope the reader has found the ex-amples both intriguing and practical. Though we have visited a number of visual encoding and interaction tech-niques, many more species of visualiza-tion exist in the wild, and others await discovery. Emerging domains such as bioinformatics and text visualization are driving researchers and designers to continually formulate new and creative representations or find more powerful ways to apply the classics. In either case, the DNA underlying all visualizations remains the same: the principled map-ping of data variables to visual features such as position, size, shape, and color.

As you leave the zoo and head back

into the wild, try deconstructing the various visualizations crossing your path. Perhaps you can design a more ef-fective display?

Additional Resources

Few, S. Now I See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press, 2009.

Tufte, E. The Visual Display of Quantitative Information. Graphics Press, 1983.

Tufte, E.Envisioning Information. Graphics Press, 1990.

Ware, C.Visual Thinking for Design. Morgan Kaufmann, 2008.

Wilkinson, L.The Grammar of Graphics. Springer, 1999.

Visualization Development Tools

Prefuse: Java API for information visualization.

Prefuse Flare: ActionScript 3 library for data visualization in the Adobe Flash Player.

Processing: Popular language and IDE for graphics and interaction.

Protovis: JavaScript tool for Web-based visualization.

The Visualization Toolkit: Library for 3D and scientific visualization.

Related articles on queue.acm.org

A Conversation with Jeff heer, Martin Wattenberg, and Fernanda Viégashttp://queue.acm.org/detail.cfm?id=1744741

Unifying Biological Image Formats with hDF5Matthew T. Dougherty, Michael J. Folk, Erez Zadok, Herbert J. Bernstein, Frances C. Bernstein, Kevin W. Eliceiri, Werner Benger, Christoph Besthttp://queue.acm.org/detail.cfm?id=1628215

Jeffrey heer is an assistant professor of computer science at Stanford university, where he works on human-computer interaction, visualization, and social computing. He led the design of the Prefuse, Flare, and Protovis visualization toolkits.

Michael Bostock is currently a Ph.D. student in the Department of Computer Science at Stanford university. before attending Stanford, he was a staff engineer at google, where he developed search quality evaluation methodologies.

Vadim Ogievetsky is a master’s student at Stanford university specializing in human-computer interaction. He is a core contributor to Protovis, an open-source Web-based visualization toolkit.

© 2010 ACM 0001-0782/10/0600 $10.00

all visualizations share a common “Dna”—a set of mappings between data properties and visual attributes such as position, size, shape, and color—and customized species of visualization might always be constructed by varying these encodings.

byjeff rey heer, michael bostock, and vadim ogievetsky …this minimal-istic representation uses the...

Documents