Graphical displays Dan Carr Volume 2, pp 933–960 in Encyclopedia of Environmetrics (ISBN 0471 899976) Edited by Abdel H. El-Shaarawi and Walter W. Piegorsch John Wiley & Sons, Ltd, Chichester, 2002



Graphical displays

The purpose of environmental visualization via graphical displays is to facilitate scientific and public understanding of environmental status, trends, and processes. Such understanding is incremental owing to:

• the complexity of the environment;
• the difficulty of parsimoniously conceptualizing this complexity;
• the logistic and political impediments to collecting adequate, representative data;
• and the limits of human perception and cognition for understanding multivariate summaries.

This entry can only hint at the range of challenges faced in attempts to characterize the environment and communicate using graphical summaries. It can only touch on the background knowledge that helps experienced readers to assess and interpret environmental graphics. However, the background is important since environmental visualization goes far beyond routine production and superficial interpretation of environmental graphics.

After a brief introduction, the focus herein is on graphical design principles and graphical templates for representing environmental summaries. The quality of environmental graphics depends on many factors: conceptualization, data collection, modeling, summarization, and, as emphasized here, sound graphical design.

Our understanding of quantitative graphical design continues to evolve. Since one design principle is to use familiar templates, a tension arises between using familiar templates and introducing new templates that offer richer, more focused or accurately communicated content. This entry includes some new templates and uses design principles to motivate their inclusion. The sequence of static templates presented is far from complete. Further, a printed publication has difficulty in doing justice to interactive graphics and no such attempt is made here. This entry attempts to help the reader fill in omissions by providing references into the literature.

Another omission is a discussion of integrating environmental graphics into extended documents. Interested readers can refer to Stone et al. [99] and Wahlström et al. [109] as examples of excellence. The reader may also want to adapt methods and graphics illustrated in the award-winning Atlas of United States Mortality (Pickle et al. [89]) to environmental applications.

Environmental Complexity

The environment is complex. Spatial and temporal processes at different scales influence environmental status and trends. Large external influences include solar radiation and lunar gravity inducing tides. Tiny ocean plankton are crucial to the food chain (see Community food webs). Mankind is busy rearranging the molecular composition of the earth’s land surface, ocean, and atmosphere. Computer models that simulate the earth processes should, at some scale, account for changes potentially induced by mankind’s actions, including those motivated by simulations. Our environment hosts the full range of processes to visualize: self-balancing processes, processes sensitive to the flap of a butterfly’s wings, and processes influenced by political decisions.

Concepts and Indicators

The development of concepts to characterize environmental status and trends is itself an ongoing process. The seemingly simple status question, ‘how many lakes are there in the nation?’, depends on the definition of a lake. What width, depth, duration and other properties must a body of water have to be a lake? Obtaining consensus can be difficult. The concept of a lake is relatively simple compared with many other environmental concepts. For example, the notion of ecoregions [4–6, 86, 87] is generally accepted as useful but much more difficult to characterize than lakes. Which multivariate descriptors and thresholds should be involved? The process of relating existing ecoregion definitions to other variables continues [29].

The variables available typically start out as field measurements. Environmental scientists are inclined to retain some measurements, such as pH, as they stand. However, scientists often transform the field measurements to produce more useful variables. The percentage of time that a body of water has dissolved oxygen values below a critical threshold may be more instructive than the dissolved oxygen values by themselves. A big challenge is to convert a collection of field measurements into indicators, such as an indicator of biological integrity. (A discussion of indicators appears in [63].) Indicators need to be meaningful across different environmental habitats and their composition often needs to vary from region to region. Efforts continue toward the development of concepts, indicators, and indices that summarize multivariate relationships. Often the goal is to provide a broad univariate characterization of environmental health or safety for humans. Can a single index of air quality capture the human health implications of varying air toxins (such as lead, ozone, and volatile organic compounds)? The well-known US consumer price index was a compromise that took on a life of its own as an index. Environmental scientists continue to seek analogous simplifications whose utility as communication devices outweighs the dangers of their simplification.

Data, Estimates, Comparison and Interpretation

The sources of environmental data include field samples, survey samples (see Sampling, environmental), satellite imagery (see Remote sensing), administrative records and computer simulation. To be useful such data must be transformed into estimates that are scientifically meaningful. Each type of data comes with its own set of issues to address in the process of producing estimates that are worthy of evaluation. Common issues include calibrating instruments, scaling variables, estimating variables as surrogates for the desired variables of interest, adjusting for covariates, assessing representative coverage of the population of interest, and validating simulated or indirect estimates.

Graphics can be no better than the estimates presented. The reader should be concerned whenever quantitative graphics fail to show confidence bounds for estimates. The lack of confidence bounds is often a warning that estimates have not been assessed with respect to accuracy (bias) and precision (variability).

In many cases the available data are inadequate to address the question of interest. In such cases the presentation of tables and graphs derived from unrepresentative or marginally related data promotes the illusion of serious scientific monitoring and assessment. Appropriate interpretation of graphics depends upon understanding the meta-data, the data about the underlying data and resulting estimates. The meta-data provide important information about data quality. The concern about accuracy is not limited to statistical estimates but extends to spatial databases [61].

The heart of graphics is comparison. Quality graphics help the reader to make meaningful comparisons. Consider Figure 1, which shows time series of CO2 production per capita for energy used in OECD (Organization for Economic Cooperation and Development) nations. First note that confidence bounds are not present. This complicates making comparisons. When comparing estimates without confidence bounds, the reader should immediately wonder if the estimates are worthy of comparison. In the absence of confidence bounds, the reader is tempted to make assumptions that may not be justified. A first plausible assumption is that a nation’s estimates are comparable over time. (A comparison of 1995 and 1997 OECD compendium values shows that some nations continue to refine their historical estimates.) A much more questionable assumption in studying Figure 1 is that estimates from different nations are comparable. The methodology that nations employ to obtain estimates can vary greatly, especially in situations involving Third World nations. The reader should interpret the ranking of nations (by 1995 values) in Figure 1 cautiously, and not only because some values are very close. While the OECD works toward harmonization of estimates from member nations, the goal is often not in hand. The reader should be aware that nations, like people, are motivated to put themselves in a good light. In discussing maps, Wood [118] says that every map serves an agenda. The same is true for published tables and graphs. Scientific comparison is difficult in the presence of differing degrees of ‘good light’ estimates.

One class of methods for making estimates look better involves transformations that change the reader’s perspective. Showing values per capita in Figure 1 is more favorable to the US than showing the total CO2 production values. While the top panel in Figure 1 is taller to accommodate a bigger range of values, the US per capita values are roughly on the same scale as other OECD nations. Since the US has a high gross domestic product, an even more favorable view shows CO2 production relative to the gross domestic product. This suggests that US energy production is more efficient. The agenda influences the choice of transformation.

[Figure 1: multiple panel time series plot, one curve per nation. x-axis: Year (1980 to 1995). y-axis: Annual CO2 emissions from energy use (units = tons per person). Nations shown: Denmark, Australia, Canada, United States, Luxembourg, Germany, Belgium, Netherlands, Czech Republic, Japan, United Kingdom, Ireland, Finland, Norway, Korea, New Zealand, Poland, Greece, Italy, Austria, Iceland, Switzerland, France, Spain, Sweden, Turkey, Mexico, Portugal, Hungary.]

Figure 1 Time series sorted and grouped into panels. Values and ranking are subject to question


Comparability issues are not limited to differences among nations, but arise whenever researchers employ different methods. Historically, the US Environmental Protection Agency’s (EPA) STORET database accumulated statistics on hundreds of thousands of water quality samples each year. Even after the EPA made efforts to provide the data in the same units, comparability issues remained due to different sample collection, chemical analysis, and recording procedures. Those interested in US water quality are inclined to focus attention on the subset of water quality data from the US Geological Survey (USGS), because of the consistent methodology and high standards. Integrating environmental information from multiple sources that use different methods is problematic (see Meta-analysis). Consequently many data are never used beyond the original study even when made readily available in public databases.

Changes over time produce comparability problems. Political entities and boundaries change. How should the unification of West and East Germany be handled in Figure 1? Measurement and calculation methods tend to improve over time. Researchers are not inclined to make statistical adjustments so that analyses can span estimates based on older data and methods. A common attitude is that the new estimates are exciting and so much better than previous estimates that they should serve as benchmarks for the future assessment of trends. This postpones the estimation of short-term trends.

High-quality, representative environmental data are often difficult and expensive to collect. For example, atmospheric scientists want a detailed snapshot of the whole earth’s atmosphere. The logistics and expense of such a massive simultaneous collection effort represent a major barrier. Statistical researchers have developed a representative spatial sampling methodology that could produce estimates with uncertainty bounds for a host of variables [97, 98]. Olsen et al. [83] provide an overview of its use in national monitoring. Studies [65, 88] have used the methodology for various regions within the US, but the methodology is not employed at a national scale in the US, presumably due to the expense. Also, it is naive to be unaware of corporate and political interests opposed to the collection of environmental data, while privacy considerations make it difficult to access the collected data even as advances in databases and web technology would seem to improve access. The lack of available, high-quality, representative data is a common problem in environmental visualization.

Figure 2, a multiple panel bar plot of EPA’s toxic release inventory estimates, raises numerous comparison and interpretation issues [18]. Note first the absence of confidence bounds. The estimates derive from federally mandated company self-reports. Without substantial efforts to obtain external validation measurements, the reader is not in a position to assess the accuracy of the self-reported estimates. Assuming comparability across states, the reader infers that the state of Louisiana has a surface water problem, but how much better or worse might it really be? EPA publications make it clear that the survey does not include all release sources nor all kinds of toxic releases. This suggests that the real totals are higher.

Another interpretation problem with Figure 2 is that the unit of measurement, total pounds, is an index comprised of many different things. This index is not well suited for purposes of communicating risk and the personal implications of being part of a toxin-distributing society. If the index decreases 2% from one year to the next, is that good? What does a percentage decrease in pounds mean to the US in terms of genetic mutations (see Mutagenesis, environmental), morbidity and mortality in plant, animal and human populations? The public may infer that fewer pounds are better than more pounds, but that can be wrong due to the changing mixture of toxins involved. Even if the mixture keeps the same proportions, continuing accumulations of toxins may lead to increased risk. Interpretation of Figure 2 is problematic, even though the multiple panel bar chart with panel sizing to make comparable scales is an excellent template.

Research in environmental visualization should develop indices to help people to understand the risks involved. An index of toxic pounds per person makes the index more personal. While such an index may draw attention to the implications of living in the society, it does not translate the index into quantitative implications. The task of understanding and communicating risk is challenging. For very simplistic scenarios it is possible to do some calculations. In terms of one effect, mortality, scientists could use LD50s for mice (the dose that is lethal to half the exposed mice; see Biological assay) to calculate how many mice could be killed by direct exposure assuming additive effects. This scenario ignores such things as exposure pathways, transformations of the toxins along the pathways, differing within-species susceptibilities, and toxic interactions. Species differences in susceptibility complicate scaling results to other species where the LD50s are not known.

[Figure 2: multiple panel bar plot titled ‘TRI releases and transfers for 1987. Totals by state and distribution class. Grand total = 7 billion pounds’. Units: 100 million pounds. Panels: Total, Air, Transfer, Underground, Land, Sewage, Water; a ‘(Reduced scale)’ note accompanies the panels. Rows are states in groups of five: Texas, Louisiana, Ohio, Florida, Tennessee; Michigan, Illinois, Indiana, Utah, Pennsylvania; California, Virginia, New York, Missouri, New Jersey; Mississippi, Georgia, North Carolina, Alabama, Kansas; Kentucky, Wisconsin, South Carolina, Arkansas, Arizona; Massachusetts, West Virginia, Minnesota, Connecticut, Oklahoma; Iowa, Maryland, Washington, Alaska, Oregon; Montana, Wyoming, New Mexico, Colorado, Maine; Nebraska, New Hampshire, Idaho, Delaware, Rhode Island; Hawaii, South Dakota, Vermont, Nevada, North Dakota.]

Figure 2 A multiple panel bar plot with perceptual grouping. Panel widths vary to make bar lengths comparable. Estimates are self-reported. Interpretation in terms of toxicity is also problematical

Knowledge about actual effects of exposure to multiple toxins is limited, in part due to the overwhelming combinations to be studied. It is interesting to note that the majority of USGS water samples containing one pesticide actually contain more than one. At the same time, the EPA safety standards address pesticides individually, as if the presence of multiple toxins does not change response thresholds (see Joint action models).

A mortality index does not incorporate outcomes such as mutations, cancer incidence (see Carcinogenesis, environmental), or the loss of genetic variety that goes with species extinction. Any index that begins to communicate risk is likely to be controversial. Much is known about mapping hazards [82] but developing understandable, scientifically and politically acceptable indices remains a challenge.

The quality of graphics depends heavily upon the quality of data summaries or estimates being represented. There are many ways of producing estimates for summaries. Designed sampling studies provide one source of estimates. Models operating on the data that happen to be available provide a more common source of estimates. If the available data are not adequately representative of the underlying phenomena, then model estimates, however sophisticated in terms of handling spatial and temporal correlations, can miss the mark. Often direct data are not available for regions of interest, so analysts use models developed in other locations or circumstances and covariates for the regions of interest to produce estimates. The available covariates may not be adequate for adapting models to different situations. Privacy issues and the expense of collecting observational data have motivated the development of many simulation models. Sometimes researchers validate simulation results against observational data and sometimes they do not or cannot. With some understanding of estimate limitations and interpretation problems in hand, the next step is to describe graphics templates developed to communicate estimate descriptions and summaries.

Templates for Environmental Graphics

Environmental complexity motivates the use of multivariate graphics templates. Univariate and bivariate graphics provide starting points, as building blocks for more multivariate graphics.

Univariate Guidelines

Cleveland and McGill [39] discuss human perceptual accuracy of extraction and indicate preferred methods for univariate encoding. Their research subjects judged relative magnitudes of graphically encoded variables. Their results ranked the graphical encoding methods into three classes described here as best, good, and poor.

The two best encoding methods represent variables using position along a common scale as shown in Figure 3 and position along identical nonaligned scales. That humans do well in judging the position of a point relative to a scale should come as no surprise. Marr [77] notes the ‘quintessential fact of human vision – that it tells us about shape and space and spatial arrangement’. Locating the position of objects is a fundamental visual task. Map makers have long used position along a scale as the fundamental encoding for spatial coordinates. MacEachren’s [74] review of the perception literature attests to the power and primacy of positional encoding.

Length, angle, and orientation are good encodings. Figure 4 shows that transforming line segments into a standard position converts the task of judging length into a task of judging the position of one endpoint against a scale. While this is not necessarily what people do, the example suggests that judging line length is more complicated than judging position.

Figure 5 shows angle encoding. Rotation of the angles puts them in a position for comparison against equivalent angular scales shown in gray. The transformation suggests that while angle comparisons work reasonably well, they are more complicated than direct comparison against angle scales.

[Figure 3: points positioned along a single scale running from 0.0 to 1.0.]

Figure 3 The best continuous univariate encoding – position along a scale. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

[Figure 4: line segments shown in ‘Various positions’ and in ‘Standard position’ against a scale running from 0.0 to 1.2.]

Figure 4 A good continuous univariate encoding – line length. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

[Figure 5: angles shown in ‘Various orientations’ and in ‘Standard orientation’.]

Figure 5 A good continuous univariate encoding – angle. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

Area, volume, point density, and color saturation are poor encodings. The reader familiar with experimental results involving Stevens’ law will not be surprised about poor results for the area and volume encodings. Stevens’ law states that the perceived magnitude of a stimulus follows a power law:

$$p(x) = a x^{b} \qquad (1)$$

where x is the magnitude of the true stimulus (i.e. length, area, volume), and where the constants a and b depend on the type of stimulus. Based on values cited in Baird and Noma [7], Table 1 provides ranges of the characteristic exponents b for length, area, and volume. That is to say, people’s perception of length tends to be directly proportional to object length. However, we tend to judge area and volume nonlinearly. Consider comparing areas, one of 4 square units and the other of 1 square unit. With an exponent of 0.75, the ratio of perceived magnitudes is not 4 to 1, but 2.8 to 1. We underjudge the large areas relative to small areas. If everyone had the same exponent, graphical encoding could adjust for systematic human bias. However, the range of values for b in Table 1 indicates substantial variability from person to person. Providing a set of reference symbols in a legend helps people calibrate to the intended interpretation, but the best strategy is to use better encodings whenever possible.
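The arithmetic in the area example can be checked with a short sketch. It assumes only the power law of equation (1); the function name `perceived_ratio` is hypothetical, and the constant a cancels when two magnitudes are compared as a ratio.

```python
def perceived_ratio(true_ratio: float, exponent: float) -> float:
    """Ratio of perceived magnitudes under p(x) = a * x**b.

    The constant a cancels: (a * x1**b) / (a * x2**b) = (x1 / x2)**b.
    """
    return true_ratio ** exponent


# Area exponents at the ends of the Table 1 range, plus the 0.75 used in the text.
for b in (0.6, 0.75, 0.9):
    ratio = perceived_ratio(4.0, b)
    print(f"b = {b:.2f}: a 4-to-1 area ratio is perceived as about {ratio:.1f} to 1")
```

Running the loop over the ends of the exponent range shows how much the perceived ratio can differ from person to person, which is the reason the text recommends better encodings over a fixed bias correction.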

Weber’s law, a fundamental law in human perception, also has important ramifications in terms of accurate human decoding. A simple example gives the basic notion of the law. The probability of detecting that a 1.01 in. line is longer than a 1 in. line is about the same as the probability of detecting that a 1.01 ft line is longer than a 1 ft line. In absolute terms 0.01 in. is much smaller than 0.01 ft. The use of a finer resolution scale allows more accurate judgments on an absolute scale. A common application is to put tick marks on a ruler to help us make more accurate assessments. The graphical equivalent [36] is to use grid lines to provide a finer resolution scale for more precise comparisons. In interactive graphics, zooming in provides a finer scale. Computer–human interface implementations often provide sliders that allow the reader to change the reference scale to make more accurate judgments (see, for example, [1], [50] and [92]).

Table 1

Encoding    Exponent range
Length      (0.9, 1.1)
Area        (0.6, 0.9)
Volume      (0.5, 0.8)


We render most graphics on a plane. We could show values of a continuous variable as points on a line, but there are good reasons for not doing so. For categorical variables, using bar charts and pie charts to show percentages is common, and dot plots could also be used. From a perceptual accuracy and labeling convenience viewpoint, bar charts and dot plots are preferable. Decoding bar charts and dot plots involves judging position along a common scale while pie chart decoding compares angles at different positions. The bar chart vs. pie chart controversy is old. The merit of pie charts is that the reader assumes that percentages add to 100. Bar charts and dot plots can handle this with a footnote if the context does not make it obvious. The labeling alone for all of these forms demands a planar or higher-dimensional representation.

Bivariate Guidelines and Examples

Tufte [102] notes that it took over 5000 years for mankind to generalize from the early use of clay tablet maps to representing other kinds of point pairs with a scatterplot. It is an excellent representation since the two orthogonal axes allow two coordinates to be encoded independently as position along a common scale. While the popular press in the US still considers the scatterplot too complicated for the general public, in the sciences the scatterplot is the standard for representing continuous bivariate data. Common bivariate activities include assessing univariate distributions, comparing univariate distributions, and looking for functional relationships.

We humans do not assess data density accurately even when points are plotted on a line or on a plane. Overplotting just makes things worse. Consequently, it is advantageous to compute data density and show it as directly as possible. Figure 6 illustrates the construction of a kernel density estimate (see Meteorological extremes) based on a sample of five univariate values. The locations of the white triangles relative to the x-axis indicate the magnitudes of observed values. The basic idea is that each observed value is a surrogate for values in a neighborhood. We then associate a relative likelihood with a neighborhood about the value. Figure 6 shows the five likelihoods (or kernels) as bell-shaped curves, one in each of the upper panels. For each location where we want to estimate the data density (the x locations of white lines in Figure 6) we simply average the five likelihoods at that location, one for each observed value. The white lines indicate the locations of the density estimates. In the bottom panel each white line is the average height of all the white lines directly above it. (When panels show no white lines directly above, they contribute zero to the average.) The construction is straightforward.

Scott [93] provides the theory behind density estimation along with many graphical examples for univariate and higher-dimensional density estimates. For a valid density estimate, the kernel needs to integrate to 1. The hard part is in deciding how wide to make the kernel. Scott describes methods for making this decision.
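The averaging construction described for Figure 6 can be sketched in a few lines. This is a minimal illustration, not the code behind the figure: the five observations, the Gaussian kernel, and the bandwidth of 4.0 are assumptions chosen for the example, and the function names are hypothetical. Choosing the bandwidth well is the hard part that Scott addresses.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density; it integrates to 1, as a valid kernel must."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_density(grid, observations, bandwidth):
    """Average one scaled kernel per observation at every grid location."""
    grid = np.asarray(grid, dtype=float)
    obs = np.asarray(observations, dtype=float)
    # Row i holds the kernel centered on observation i, evaluated on the grid.
    kernels = gaussian_kernel((grid[None, :] - obs[:, None]) / bandwidth) / bandwidth
    return kernels.mean(axis=0)


# Hypothetical sample of five univariate values (not the data behind Figure 6).
observations = [78.0, 84.0, 90.0, 93.0, 102.0]
grid = np.linspace(40.0, 140.0, 2001)
density = kernel_density(grid, observations, bandwidth=4.0)

# A Riemann sum over the grid should be close to 1 for a valid density estimate.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"approximate area under the estimate: {area:.4f}")
```

Dividing each kernel by the bandwidth keeps its area at 1 after rescaling, so the average of the kernels also integrates to 1.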

In the environmental sciences, cumulative distribution plots and quantile plots are commonly used to describe the distributions of populations. The two types of plots are essentially the same, being transposes of each other. For quantile plots the x-axis shows cumulative probabilities and the y-axis shows sorted observed values. Figure 7 is a quantile plot. The construction plots cumulative probabilities against the sorted values from a random sample. While integrating an estimated density function up to a value approximates a cumulative probability, there are two more common approaches to calculating cumulative probabilities: the order-statistics approach and the empirical approach. The order-statistics approach here follows Cleveland [35] and uses the expression $(i - 0.5)/n$ for $i = 1, \ldots, n$ to calculate the cumulative probabilities, where n is the sample size. While often used, the empirical approach yields probabilities that imply future values will never be more extreme than those already observed. With the order-statistics approach, the guessed probability is 0.5/n that a smaller value could be observed and 0.5/n that a larger value could be observed. This is usually a minor detail for large samples. (The construction can be adapted if the observations are not equally representative of the population of interest. See Cook et al. [42] for a definition of a spatial cumulative distribution function and an application to a crown defoliation index.) To finish the construction, Cleveland interpolates between the point pairs. This produces a piecewise linear curve.
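A minimal sketch of the order-statistics construction follows, assuming equally representative observations. The sample values and the function names are hypothetical; `numpy.interp` supplies the piecewise linear interpolation between the point pairs.

```python
import numpy as np


def quantile_plot_points(sample):
    """Return (cumulative probabilities, sorted values) using (i - 0.5)/n."""
    y = np.sort(np.asarray(sample, dtype=float))
    n = y.size
    p = (np.arange(1, n + 1) - 0.5) / n
    return p, y


def quantile(sample, prob):
    """Read a quantile off the piecewise linear curve through the point pairs."""
    p, y = quantile_plot_points(sample)
    if not (p[0] <= prob <= p[-1]):
        # Outside the plotted range the curve is undefined; do not extrapolate.
        raise ValueError("probability lies outside the range covered by the plot")
    return float(np.interp(prob, p, y))


# Hypothetical sample of eight values.
sample = [2.1, -0.4, 0.3, 1.2, -1.5, 0.0, 0.8, -0.9]
p, y = quantile_plot_points(sample)
median = quantile(sample, 0.5)
print(f"smallest plotted probability: {p[0]}, estimated median: {median:.2f}")
```

With n = 8 the smallest plotted probability is 0.5/8, which leaves room on each side for values more extreme than those observed. Plotting `p` on the x-axis against `y` on the y-axis gives the quantile plot.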

[Figure 6: stacked panels of bell-shaped kernels over a shared x-axis running from 70 to 110, each panel with a vertical scale from 0.00 to 0.14; the bottom panel shows the averaged density estimate.]

Figure 6 The construction of a kernel density estimate

[Figure 7: quantile plot. x-axis: Cumulative Probabilities (0.1 to 0.9). y-axis: Quantiles (−2 to 2). Annotated point: p = 0.5, q = −.03.]

Figure 7 A quantile plot. The coordinates for the y-axis are typically the sorted observations. The coordinates for the x-axis are the corresponding cumulative probabilities. The quantile corresponding to the cumulative probability of 0.5 is also known as the median. Quantiles for other cumulative probabilities can be found graphically or by linear interpolation

Interpretation is straightforward. For any probability covered on the x-axis it is possible to determine a quantile. To obtain the 0.5 quantile (or estimated median) go straight up from 0.5 on the x-axis to the curve and then straight across to the y-axis and read the value. Starting with 0.25 and 0.75 yields corresponding quantiles also known as the 1st and 3rd quartiles, or 25th and 75th percentiles, respectively. Similarly one can go from quantiles to cumulative probabilities. Since scientists use such plots to describe collections of data in a database as well as environmental populations, the hardest interpretation task is often to decide if inference about an environmental population is appropriate.

Quantile or cumulative distribution plots are use-ful for characterizing environmental populations. Forexample, quantile plots can indicate the fraction oflakes in a defined region that have eutrophicationvalues below a given threshold. Quantile plots arehelpful on maps and provide a frame of referencefor observing change over time. For maps, Carr andOlsen [23] highlight selected cumulative probabil-ity and quantile pairs using a parallel coordinateapproach to save space. They note the importance of

Page 11: Graphical displays - Information Technology Servicesmason.gmu.edu/~dcarr/lib/EnvironmentalGraphics.pdf · environmental graphics. After a brief introduction, the focus herein is on

10 Graphical displays

Sept. 1982

Sept. 1992

0.1 0.2 0.5 0.8

NDVI from South America

Figure 8 A variation on boxplots. The median: a longhorizontal line. First and third quartiles: ends of thickerboxes. Adjacent values: ends of thinner boxes. Outliers:open circles (none present). Test intervals for differentmedians: white lines inside boxes. NDVI: normalizeddifference vegetation index

considering which distribution to show. For example, in mortality rate mapping, geographic information system (GIS) defaults will typically base the legend on the number of regions, and not other variables associated with regions. Showing the fraction of people living in regions with human mortality rates below given values is much more to the point than showing the fraction of regions with mortality rates below given values. Thoughtful selection of the distribution can lead to more meaningful quantile plots.
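The distinction between counting regions and counting people can be made concrete with a small sketch. The rates and populations below are hypothetical, and the function name fraction_below is introduced here only for illustration.

```python
def fraction_below(values, threshold, weights=None):
    """Fraction of total weight carried by units whose value is below threshold."""
    if weights is None:
        weights = [1.0] * len(values)      # every region counts equally
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v < threshold)
    return below / total

# Hypothetical regions: mortality rate per 100 000 and resident population.
rates = [50.0, 60.0, 80.0, 90.0]
population = [1_000, 2_000, 500_000, 497_000]

by_region = fraction_below(rates, 70.0)               # 2 of 4 regions
by_people = fraction_below(rates, 70.0, population)   # 3 000 of 1 000 000 people
```

Here half of the regions fall below the threshold, yet they contain only 0.3% of the people, so the two choices of distribution give very different cumulative fractions.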

The boxplot is a distribution caricature that has achieved wide acceptance. Although it is used to represent individual distributions, the common use is to compare distributions. Figure 8 provides an example of a set of boxplots. The features shown include the median, quartiles, adjacent values, and outliers. Cleveland [35] describes the determination of adjacent values and outliers. Variations [57, 78] may show extrema rather than adjacent values and outliers. The design variation in Figure 8 uses a white line [19] to provide comparison intervals for the medians. If two comparison intervals do not overlap, then the medians are significantly different.
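As a rough illustration of the boxplot features, the following sketch computes them for one batch of hypothetical data. It assumes the widely used convention that fences lie 1.5 interquartile ranges beyond the quartiles; the exact rule in [35] and the quartile interpolation method may differ, and the function name boxplot_summary is introduced here.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Median, quartiles, adjacent values, and outliers for one batch."""
    data = sorted(values)
    # n=4 returns the three quartile cut points; method="inclusive"
    # interpolates linearly between the sorted observations.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    step = k * (q3 - q1)               # assumed rule: k interquartile ranges
    lower_fence, upper_fence = q1 - step, q3 + step
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "adjacent": (inside[0], inside[-1]),   # most extreme values inside the fences
        "outliers": outliers,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this batch the value 30 lies beyond the upper fence and is flagged as an outlier, while the adjacent values are 1 and 9.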

Q–Q plots provide the preferred graphic to make detailed continuous distribution comparisons [35]. For theoretical distributions, the cumulative distribution function F(·) provides the correspondence between the probability and quantile pairs via p = F(q). In simple cases the quantile function Q(·) is the inverse of F(·) and Q(p) = q. Familiar (p, q) pairs from the standard normal distribution are (0.5, 0) and (0.975, 1.96). Comparison of two distributions,

[Figure 9 plot: two-sample Q–Q plot; x-axis, Batch 1; y-axis, Batch 2; thin line = robust fit; thick line = same distribution line]

Figure 9 A two-sample Q–Q plot. A good straight line fit suggests similar distributional shapes. Given similar shapes, the slope shows the ratio of scale parameters such as standard deviations. Given a slope of one, the intercept shows the difference of location parameters such as the two means. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

denoted 1 and 2, proceeds by plotting quantile pairs [Q1(p), Q2(p)] over a range of probabilities, such as from 0.05 to 0.95 in steps of 0.05. For two distributions of observed data, the calculations described for the quantile plots (above) are appropriate. Figure 9 shows a Q–Q plot for two batches of data. The x-axis shows quantiles from Batch 1 and the y-axis shows quantiles from Batch 2. Sometimes statisticians choose Q1(p) to be from a theoretical family of distributions, such as the normal family, to see if parametric modeling is reasonable using the family of distributions.
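The construction of quantile pairs for two batches of observed data might be sketched as follows. The sketch assumes that the i-th sorted observation is assigned cumulative probability (i − 0.5)/n and that other quantiles come from linear interpolation; this is one common convention and may not match the calculation used for Figure 9. The data and function names are hypothetical.

```python
def quantile(sorted_data, p):
    """Quantile by linear interpolation, with plotting positions (i - 0.5) / n."""
    n = len(sorted_data)
    pos = p * n - 0.5                  # fractional 0-based index into the sorted data
    if pos <= 0:
        return sorted_data[0]
    if pos >= n - 1:
        return sorted_data[-1]
    lo = int(pos)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[lo + 1] - sorted_data[lo])

def qq_pairs(batch1, batch2, probs):
    """Pairs [Q1(p), Q2(p)] for each probability p."""
    s1, s2 = sorted(batch1), sorted(batch2)
    return [(quantile(s1, p), quantile(s2, p)) for p in probs]

probs = [round(0.05 * k, 2) for k in range(1, 20)]   # 0.05, 0.10, ..., 0.95
batch1 = [float(v) for v in range(1, 21)]            # hypothetical Batch 1
batch2 = [2.0 * v + 1.0 for v in batch1]             # same shape, rescaled and shifted
pairs = qq_pairs(batch1, batch2, probs)
```

Because Batch 2 is a linear transformation of Batch 1, the pairs fall on a straight line with slope 2 and intercept 1.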

A strong merit of Q–Q plots is that in simple cases they have a nice interpretation. If points fall on a straight line, then the distributions have the same shape (basically, the same moments higher than two). This is the case in Figure 9 since the robust fit thin line matches the quantiles quite well.

The slope and intercept of the approximating straight line tell us about the discrepancies in the second moment (scale parameter) and first moment (location). The slope of the thin line tells us about


the ratio of the scale estimates (for example, standard deviations). The thick line in the figure is the reference line for identical distributions. The lines are not quite parallel in Figure 9 so standard deviations are not quite the same. Graphical fitting of the scale ratio and location difference can start by guessing the ratio and dividing this into the y-axis quantiles until the lines are parallel. When the lines are parallel, the vertical distance between the two lines gives the difference in location (or means) given the Batch 2 rescaling. In Figure 9 the lines are nearly parallel so a reasonable guess is that the distributions differ in location by about 0.5.
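The reading of slope and intercept can be checked numerically on theoretical distributions. The sketch below uses two hypothetical normal distributions and an ordinary least squares line rather than the robust fit shown in Figure 9; the parameter values are assumptions chosen for illustration.

```python
from statistics import NormalDist, mean

probs = [0.05 * k for k in range(1, 20)]
dist1 = NormalDist(mu=0.0, sigma=1.0)      # hypothetical Batch 1 distribution
dist2 = NormalDist(mu=0.5, sigma=1.2)      # hypothetical Batch 2 distribution
q1 = [dist1.inv_cdf(p) for p in probs]
q2 = [dist2.inv_cdf(p) for p in probs]

# Ordinary least squares line through the quantile pairs (not a robust fit).
x_bar, y_bar = mean(q1), mean(q2)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(q1, q2))
sxx = sum((x - x_bar) ** 2 for x in q1)
slope = sxy / sxx                          # ratio of scale parameters
intercept = y_bar - slope * x_bar          # location difference at x = 0

# Graphical fitting step: divide the y-axis quantiles by the scale ratio;
# the rescaled points are then parallel to the identity reference line.
rescaled = [y / slope for y in q2]
location_shift = mean(r - x for r, x in zip(rescaled, q1))
```

The slope recovers the ratio of standard deviations (1.2) and the intercept recovers the difference in means (0.5). After rescaling, the vertical distance to the reference line is 0.5/1.2, the location difference expressed on the rescaled Batch 2 axis.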

Q–Q plots avoid the visually deceptive procedure of superimposing two cumulative distribution functions or two survival curves. As Figure 10 suggests, humans are really poor at judging the distance between curves. Our visual processing assesses the closest differences between curves rather than the correct vertical distances [36]. Adding grid lines can help, but it is often better to plot the difference explicitly or make comparisons using Q–Q plots.

It is straightforward to associate quantiles from three or more distributions based on the same cumulative probability. Jones and Cook [68] have generalized Q–Q plots to higher dimensions and application of this is worth considering.

Before-and-after comparisons are common in science. The general idea is to control for the variation in experimental units by studying the change in experimental unit values. This differs from Q–Q plots in that the study unit is the basis for pairing rather than cumulative probabilities. Figure 11 shows a paired comparison plot for two low-resolution satellite images of the same region. The traditional reference line for equality is a 45° line through the origin. Figure 11 also shows a mean and difference plot as proposed by John Tukey and described in Cleveland [35]. The x-axis shows the mean of the paired values and the y-axis shows the difference. This transformation rotates the plot so the reference line for identical values is a horizontal line at zero. Making transformations to simplify the visual reference is an important graphical design principle. Statisticians often study the variability in distributions of data. They find that the variability of differences often increases as the mean increases. To assess the variability of differences, they sometimes plot the square root of absolute difference vs. the mean and then fit a smooth line. Cleveland calls this a spread–location

[Figure 10 plot: two curves and their difference]

Figure 10 Explicit difference of two curves. Humans tend to see closest differences between curves, not differences in the y direction. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

plot. This is one way to use the scatterplot in studying functional relationships. The general topic is discussed below.
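The two transformations for paired data are simple to compute. The following sketch uses hypothetical before and after values and takes the difference as after minus before, matching the 1992 − 1982 convention in Figure 11; the function names are introduced here.

```python
import math

def mean_difference(before, after):
    """Rotate paired values to (mean, difference) coordinates."""
    return [((a + b) / 2.0, a - b) for b, a in zip(before, after)]

def spread_location(before, after):
    """Pairs of (mean, square root of absolute difference)."""
    return [(m, math.sqrt(abs(d))) for m, d in mean_difference(before, after)]

before = [0.10, 0.20, 0.40, 0.60]   # hypothetical 1982 values
after = [0.10, 0.24, 0.31, 0.76]    # hypothetical 1992 values
md = mean_difference(before, after)
sl = spread_location(before, after)
```

An unchanged unit, such as the first pair, lands on the horizontal reference line at zero in the mean and difference coordinates.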

Before proceeding to functional relationships it may be helpful to comment on the extension of Q–Q plots to multiple distributions. There are two basic approaches. The first shows all paired comparisons using a scatterplot matrix, and the second establishes a common reference distribution and makes all comparisons against the common reference distribution. An example of the latter appears in Figure 15 below.

Functional Relationships and Smoothing

When y is considered a function of x, common practice is to enhance scatterplots of (x, y) pairs by adding a smooth curve. To avoid the considerable human variability in sketching an eyeballed fit, the standard procedure is to model data using a computational procedure that others can replicate.


[Figure 11 plots: binned pixel-based comparison with smooth for two NDVI images. Map panels: Sept. 1992 and Sept. 1982 NDVI. Scatterplot panel: x-axis, Mean of 1992 and 1982 NDVI; y-axis, Difference: 1992 − 1982 NDVI]

Figure 11 Paired comparison of grid cells using differences and averages. Hexagon binning and symbols avoid overplotting. Interpretation issues include equal angle cells rather than equal area cells and mixed grid cell composition. NDVI: mean normalized difference vegetation index

Figure 12 shows a scatterplot with a smooth line generated using LOESS (LOcal regrESSion) (see [35], or the entry on nonparametric regression model for more details). LOESS smooths the data using weighted local regression. That is, the regression uses data local to x0 to predict a value at x0. Points closest to x0 receive the greatest weight. The use of many local regressions produces a set of pairs (x, y) that comprise the smooth curve. Each regression in the smooth shown in Figure 12 used a linear model in x


[Figure 12 plot: 1950−1959 state mortality rates; x-axis, Latitude in degrees; y-axis, Melanoma mortality per million; o = Ocean state, x = Land state]

Figure 12 A smooth for scatterplots. An explicit smooth suggests the same functional relationship to different people. Reproduced from the Encyclopedia of Biostatistics, Vol. 4, pp. 2864–2886, by permission of John Wiley & Sons, Ltd. 1998

and included the closest 60% of the observations to the prediction point x0. Those with the data [52] and the algorithm can reproduce the smooth. The smooth in Figure 12 draws further attention to the distinction between ocean and land states and additional modeling is appropriate. A first step might be to smooth the ocean and land states separately.
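A minimal sketch of weighted local linear regression in the spirit of LOESS follows. It assumes tricube weights and distinct x values, and it omits the robustness iterations and other refinements of the full procedure in [35], so it is an illustration rather than the algorithm behind Figure 12. The function name and data are hypothetical.

```python
import math

def local_linear_smooth(x, y, span=0.6):
    """Fit a tricube-weighted straight line around each x0 and evaluate it at x0."""
    n = len(x)
    k = max(2, math.ceil(span * n))        # number of nearest points per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1]                   # distance to the k-th nearest point (assumed > 0)
        # Tricube weights: largest near x0, zero at distance h and beyond.
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(yw + slope * (x0 - xw))
    return fitted

x = [float(i) for i in range(10)]          # hypothetical predictor values
y = [3.0 + 2.0 * xi for xi in x]           # exactly linear response
smooth = local_linear_smooth(x, y, span=0.6)
```

On exactly linear data each local line reproduces the data, which gives a simple check that the weighted regression is computed correctly.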

Smoothing is an extremely important visual enhancement technique. It helps us to see the structure through the noise. The decomposition of data into smooth and residual parts is fundamental in statistical modeling. Hastie and Tibshirani [64] provide a good introduction to smoothing methods. Their description includes generalized additive models that cover more situations than LOESS.

Numerous smoothers are available. Historically, many researchers used cubic splines as smoothers. Cubic splines have a continuous second derivative and that is sufficient to make curves appear smooth to humans. The elegant mathematical formulation behind splines increased their popularity in segments of the statistical community (see Splines in nonparametric regression). However, there is no a priori best smoother. New methods, such as the wavelet smoothing in Bruce and Gao [15], keep appearing in statistical software. Different smoothers have different merits. Recently developed wavelet smoothers are better than many smoothers (but not necessarily all smoothers) at tracking discontinuities in the functional form. The older local median smoothers still do well at handling discontinuities.

Smoothers typically have some form of smoothing parameter that needs to be estimated or specified by the user. With computational power at hand, cross-validation methods have become increasingly popular as a community standard. This reduces the judgment burdens on the analyst, but of course does not guarantee a match between an empirical curve and a hypothesized true but unknown underlying curve. Hastie and Tibshirani [64] discuss cross-validation for moderate-sized applications. Golub and von Matt [60] discuss generalized cross-validation for large-scale problems.
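The idea of choosing a smoothing parameter by cross-validation can be illustrated with a deliberately simple smoother. The sketch below uses a nearest-neighbor running mean, not LOESS or splines, and leave-one-out prediction error; the neighborhood sizes, the data, and the function names are assumptions made for illustration.

```python
def knn_mean(x, y, x0, k, skip=None):
    """Average of y over the k points nearest to x0, optionally leaving one out."""
    idx = [i for i in range(len(x)) if i != skip]
    idx.sort(key=lambda i: abs(x[i] - x0))
    nearest = idx[:k]
    return sum(y[i] for i in nearest) / len(nearest)

def loo_cv_score(x, y, k):
    """Mean squared leave-one-out prediction error for neighborhood size k."""
    errors = [(y[i] - knn_mean(x, y, x[i], k, skip=i)) ** 2 for i in range(len(x))]
    return sum(errors) / len(errors)

def choose_k(x, y, candidates):
    """Neighborhood size with the smallest leave-one-out score."""
    return min(candidates, key=lambda k: loo_cv_score(x, y, k))

x = [float(i) for i in range(20)]                 # hypothetical predictor values
y = [0.0 if xi < 10 else 10.0 for xi in x]        # step function with one discontinuity
best_k = choose_k(x, y, [2, 10, 18])
```

For this step function the smallest neighborhood wins because wide neighborhoods average across the discontinuity. With noisy data the balance typically shifts toward larger neighborhoods.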

Multivariate Visualization

Environmental visualization is inherently multivariate. Environmental scientists are interested in the relationships between many attributes and the attributes have space–time coordinates. The purpose of multivariate graphics is to show multivariate patterns and to facilitate comparisons (see Multivariate data visualization). As in low dimensions, the patterns concern population distributions or models with at least one dependent variable. After converting attributes and space–time coordinates to images for evaluation, human comparisons typically fall into three categories: comparison of external images with


each other; comparisons of external images with external visual references; and comparison of external images with the analyst’s internal references. These internal references include scientific knowledge, statistical expectations, and process models. The visualization investigation process often seeks to convert internal references into external visual references subject to further manipulation. With external images and references available, the next step often involves transformation to simpler forms in terms of our perceptual–cognitive processing abilities.

Multivariate graphics must deal not only with the noise that obscures patterns but also with the challenge of conveying the structure in large datasets with relationships that are much higher than two-dimensional (2-D). Advances in remote sensing provide difficult challenges for visualization. Imagine trying to view 30 m resolution land cover of the continental US [108]. A back of the envelope calculation suggests this will take over 7000 workstation images each with 1024 × 1280 pixels. This only addresses intensities for one spectral band. Researchers have used hyperspectral image analysis to partition pixels into the constituents of their mixtures, so pixel description is bound to become increasingly detailed. Modeling is becoming increasingly important to reduce the information to structure that is suitable for human visualization and understanding.

Databases providing geospatial frameworks for modeling continue to evolve. Frameworks for analysis include the US National Hydrography Dataset (NHD), a comprehensive set of digital spatial data that contains information about surface water features, digital elevation, groundwater flow and age (see Groundwater monitoring), soils, climate, and land cover. For example, the NHD at 1/100 000 resolution is gradually being upgraded to 1/24 000 resolution. The inclusion of smaller streams and their connections will impact the study of contaminant transport and fate using such computer modeling programs as SPARROW (SPAtially Referenced Regression On Watershed attributes) [91]. Urban planners will be able to take more streams into account. Frameworks influence visualization in many ways.

The spatially detailed presentation of estimates can be controversial. Obtaining good small area estimates is often problematic, but because interest increases as observations get closer to home, there is pressure to produce local estimates. As an interesting small area example, EPA staff modeled 1990 long-term cumulative concentrations of 148 hazardous air pollutants (HAPs) for the 60 803 US census tracts in the 48 contiguous states [100]. Public officials suggested that the EPA should not release the estimates because the underlying 1990 data were old and limited. The public could be unduly concerned and decisions to move to apparently safer places could be misguided. Ostensibly, the decision not to distribute the estimates centered on estimate quality and the difficulties in communicating this quality.

Graphical Design Principles

The above description begins to demonstrate the enormity of the multivariate visualization challenge. At the same time, Kosslyn [70] warns that ‘The spirit is willing but the mind is weak.’ We should approach the challenge prepared to do battle. As Cleveland [35] says, ‘tools matter’. Our tools are design principles and templates. Some of our tools include:

• distributional caricatures such as boxplots to help us deal with large datasets;
• map caricatures that let us show small multiples;
• modeling to reduce noise and complexity;
• layering and separating to manage the information flow;
• partitioning and sorting to promote and simplify comparisons;
• linking to peek into higher dimensions.

The basic formats for comparison graphics include juxtaposition, superposition, or the direct display of differences. The art of multivariate graphics is to select the methods and enhancements that work best in view of the phenomenon’s complexity and in view of human perceptual and cognitive limitations.

This entry cannot begin to cover all the representation tools and principles. A few pointers to the literature may help the reader explore several facets of multivariate graphics. MacEachren [73] provides a readily accessible primer on symbolization and design. The classic covering a wide variety of visual symbols and signs is Bertin [11]. Grinstein and Levkowitz [62] cover perceptual issues in visualization. Kosslyn [70] provides a gentle introduction to the application of human perception and cognition in graph design, along with excellent references into the


literature. MacEachren [74] gives an extended treatment on how maps work.

Foley et al. [53] provide an extensive overview of computer graphics methods. The methods are most immediately relevant to low-dimensional visualization. Wegman and Carr [112] cover selected computer graphics methods and address issues in perception and connections to statistical graphics.

Gnanadesikan [59] covers many of the basics in multivariate statistics, and numerous texts have followed. The multivariate analysis literature deals with important methods such as clustering, classification, factor analysis, discriminant analysis, and dimension reduction that are not described here.

Early work in multivariate statistical graphics provides a continuing source of ideas. Fienberg [51] provides an early review. Barnett [8] contains a stimulating collection of papers. The work of John Tukey (see Cleveland [37]) had a profound influence on statistical graphics and is a third resource worth revisiting.

Cleveland’s recent books [35, 36] capture much of his protracted efforts to guide scientists toward superior statistical graphics. Cleveland and McGill [38] provide an early survey on dynamic multivariate graphics that foreshadows the visualization revolution in computer science. Tufte [102–104] puts principles to work and draws attention to works of elegance and beauty that appear on the printed page.

Much literature is available on the use of color. A good starting point is Brewer [12] and Levkowitz [71]. Humans are very sensitive to a dark-to-light scale that is referred to in the literature by terms such as value, lightness, or brightness. This is an ordered scale and very important in visual interpretation. Friedhoff and Benzon [55] describe three visual processing channels, especially a high-resolution dark-to-light channel. Humans get their shape information and many depth cues (linear perspective, interposition, shadow, and detail perspective) through this dark-to-light channel. Tufte [103] and others warn that when rainbow colors represent an ordered variable, lightness jumps create unintended edges and patterns that can be confusing. Brewer [13] cites numerous papers opposed to the use of the spectral ordering, but found that when using spectral color to represent a few ordered classes the approach did well in usability studies after reducing the brightness of yellow to be more consistent with the neighboring spectral colors.

In addition to brightness, the literature describes two other color dimensions, saturation and hue, along with many other trivariate descriptions of color. A saturation scale goes from an achromatic color such as medium gray, to a saturated color such as vivid red. This scale is also ordered so it can represent an ordered variable. However, saturation supports fewer distinctions than a dark-to-light scale. The hue dimension can be thought of as a circle that includes points between the colors red, yellow, green, cyan, blue and magenta. Hue is not an ordered scale and is good for distinguishing six or fewer categories. Wilkinson [116] cites literature indicating that humans perceive hue and brightness as integral dimensions, so we should not use them to encode two variables. Additional color choice considerations apply to people with impaired color vision. Brewer et al. [14] and Olson and Brewer [85] provide guidance.

As more work is done in computing environments, issues around the computer–human interface become increasingly important. Card et al. [16] edited a book of readings that gathers many important concepts.

The computing revolution has increased access to and usage of visualization methodology by all disciplines. For example, people routinely get maps from the internet to help in their travels. However, the progress in quantitative graphics has been slow in terms of common application. The simple, elegant dot plots promoted by Cleveland in the mid-1980s are hard to find in publications. This is due partly to the limited options available in widely used spreadsheet graphics. The presence of Wilkinson’s [116] book The Grammar of Graphics and the corresponding JAVA implementation suggests that a graphics revolution is about to take place. Environmental scientists may soon find it easy to produce graphics that follow some of the templates illustrated here and in other documents generated by those with special resources.

The evolving literature on human perception and cognition provides the foundation for graphical design principles. In terms of quantitative graphics, the grand design goal may be stated as to reduce the cognitive effort required to make appropriate comparisons and decisions. Since most people can work with four items of information, this entry elaborates this goal into four broad categories of quantitative design principles:


• use encodings that have high perceptual accuracy of extraction;
• provide context for appropriate interpretation;
• strive for simple appearance;
• involve the reader.

The organizing categories contain some conflicting guidelines. For example, a long list of caveats may provide the context for appropriate interpretation but conflict with simple appearance and reader involvement. Balancing among the guidelines remains something of an art form. The communication objectives influence the balance.

Communication Objectives

Multivariate graphics can have many different communication objectives. Four common objectives are to provide an overview, to tell a story, to suggest hypotheses, and to criticize a model. In providing an overview, coverage is important. Hiding details is often crucial to achieve clarity in the coverage shown. Similarly, in telling a story the predetermined message must shine through. Tufte [104] is an important resource on the topic of visual explanations. Scientists often fail to tell simple stories because they are reluctant to suppress caveats and a host of details that qualify the basic results. Interactive web graphics [30] can alleviate the archival side of this problem by showing the basic graphics and providing ready access to meta-data, supplemental documents and gigabyte-sized databases. It still takes careful design, however, to lure readers to the details.

This entry leans toward graphics discovery objectives that include suggesting hypotheses and criticizing models. For discovery, it is often crucial to see through the known and miscellaneous sources of variation. In the context of mortality mapping, Tukey [105] said, ‘the unadjusted plot should not be made’. Today mortality maps begin to control for known variation by being sex- and race-specific. The maps control for age either by limiting the age range or by statistical adjustments. Typically there are also known risk factors that warrant further adjustments. After such controls and adjustments, inverse variance weighted smoothing can be used in the attempt to bring out the central structure in the remaining noise.
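The core of inverse variance weighting is a weighted average in which precise estimates count for more. The sketch below combines a region's estimate with those of two neighbors; the rates, the variances, and the choice of neighborhood are hypothetical, and a full smoother would repeat this for every region with its own neighborhood.

```python
def inverse_variance_mean(estimates, variances):
    """Combine estimates with weights proportional to 1 / variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_variance = 1.0 / total          # valid if the estimates are independent
    return pooled, pooled_variance

# Hypothetical adjusted rates for a region and two neighbors.
rates = [12.0, 20.0, 16.0]
variances = [1.0, 4.0, 4.0]
smoothed, smoothed_var = inverse_variance_mean(rates, variances)
```

The precise first estimate dominates, so the smoothed value of 14 stays closer to 12 than the unweighted mean of 16 would.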

In terms of discovery, balanced visual emphasis of the variables helps the data to speak. Of course a happenstance emphasis of some variables over others occasionally leads to insight, but even then the careful analyst will move toward the symmetrical position of trying all permutations of the variables.

Functions of Two or More Variables

Multivariate visualization involves showing densities, functional relationships, and maps that have more than two coordinates. The density of bivariate points constitutes a third coordinate. Bivariate kernel density estimation is similar to univariate estimation in that it averages local likelihoods. The basic difference is that the local neighborhood is bivariate. The result is a surface z = f(x, y). Estimating functional relationships of the form z = f(x, y) also follows the pattern established with one less variable. The domain (x, y) is not limited to spatial coordinates, and statisticians sometimes consider attributes divorced from spatial indices. Stepping up one dimension higher leads to the study of hypersurfaces. Scott [93] provides graphics showing contours of hypersurfaces. Methods for modeling surfaces are similar whether or not the domain consists of spatial coordinates. However, there are some important issues to address, such as spatial correlation. The interested reader can refer to Cressie [43] or the entries on spatial covariance or kriging.
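A bivariate kernel density estimate can be sketched in a few lines. The version below assumes a Gaussian product kernel with separate bandwidths hx and hy; the bandwidths and the points are hypothetical, and other kernels are equally legitimate.

```python
import math

def kde2d(points, x0, y0, hx, hy):
    """Bivariate kernel density estimate at (x0, y0) with a Gaussian product kernel."""
    total = 0.0
    for xi, yi in points:
        u = (x0 - xi) / hx
        v = (y0 - yi) / hy
        total += math.exp(-0.5 * (u * u + v * v))   # local contribution of one point
    # Average the contributions and normalize so the surface integrates to one.
    return total / (len(points) * 2.0 * math.pi * hx * hy)

points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.3), (4.0, 4.0)]   # hypothetical data
z_near = kde2d(points, 0.0, 0.0, 0.5, 0.5)    # inside the cluster
z_far = kde2d(points, 2.0, 2.0, 0.5, 0.5)     # between cluster and isolated point
```

Evaluating kde2d over a grid of (x0, y0) values gives the surface z = f(x, y) that can then be contoured or rendered.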

Maps

Maps are an important part of environmental visualization. Spatial indices provide a basis for computers and people to organize and access information. Cartographers have developed map projections that are useful for many different purposes [96]. For environmental visualization there are usually good reasons to use equal area projections. Olsen et al. [84] discuss the application of equal area global grids to sampling. The US National Center for Health Statistics uses an Albers equal area projection and the US EPA uses a Lambert equal area projection. NASA (the US National Aeronautics and Space Administration) bases its storage of satellite information and level-three satellite products on latitude and longitude. This has the merit of being familiar but equal angle grids make the North Pole as wide as the equator. This is not directly suitable for polar or global modeling.


Rather, the information must be regridded with the attendant information losses.
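The distortion in equal angle grids can be quantified by computing the ground area of a latitude and longitude cell. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation; the function name is introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean radius of a spherical Earth

def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude cell on a sphere, in square kilometers."""
    phi1, phi2 = math.radians(lat_south), math.radians(lat_north)
    dlon = math.radians(dlon_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(phi2) - math.sin(phi1))

equator_cell = cell_area_km2(0.0, 1.0, 1.0)    # one-degree cell touching the equator
polar_cell = cell_area_km2(89.0, 90.0, 1.0)    # one-degree cell touching the pole
ratio = equator_cell / polar_cell
```

Under these assumptions a one-degree cell at the equator covers over one hundred times the area of a one-degree cell at the pole, although both occupy the same space on an equal angle grid.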

Representing attributes on maps is the subject of numerous books; see, for example, Dent [46], Monmonier [81], MacEachren [74] and Slocum [95]. One common representation is the choropleth map (see Landscape pattern metrics). Regions on a map are colored and the color indicates the region’s membership in a class. The classes may be based on a categorical variable or on breaking a continuous variable into class intervals. Discriminating and keeping track of many different colors is not easy, so general guidance is to limit the number of classes to six or fewer.
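Breaking a continuous variable into class intervals can be done in many ways. The sketch below assumes quantile-based breaks, which put roughly equal numbers of regions in each class; equal-width or hand-chosen breaks are other options. The rates and function names are hypothetical.

```python
import bisect

def quantile_breaks(values, n_classes):
    """Interior break points giving roughly equal numbers of regions per class."""
    data = sorted(values)
    n = len(data)
    return [data[(k * n) // n_classes] for k in range(1, n_classes)]

def classify(value, breaks):
    """Index of the class interval containing value (0 is the lowest class)."""
    return bisect.bisect_right(breaks, value)

# Hypothetical regional rates.
rates = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 8.0, 9.9, 10.3, 12.5, 13.0, 15.6]
breaks = quantile_breaks(rates, 4)          # four classes, within the six-class guidance
classes = [classify(r, breaks) for r in rates]
```

Each class index would then be mapped to one color in an ordered scheme.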

Authors such as Dent [46] describe various limitations of choropleth maps. The maps are not particularly informative when the variable represented is highly correlated with a region’s area. Standard guidance is to represent rates whose denominators adjust for variables related to area. For example, showing pesticide application per unit area or deaths per 100 000 people is reasonable.

Another common difficulty occurs when region boundaries have little to do with the spatial structure of the variable. The spatial variation of the variable within a region may be considerable. The single estimate for a region may be a ratio with a large denominator and have a small standard error. The calculated uncertainty for the region may provide few clues about the spatial variability. The problem of obtaining different values at different geographical scales is known as the modifiable areal unit problem [54, 117] (see Sample support).

Another problem with choropleth maps is the difficulty in representing the facets of estimate quality. Typical choropleth maps discard confidence bounds for the estimates and other indicators of estimate quality. Some things can be done, however. MacEachren et al. [76] use light and dark stripes to mark regions whose estimates have low reliability. MacEachren [73] discusses other representations for uncertainty.

Cartograms provide a controversial approach for representing spatially indexed estimates. Cartograms distort the spatial relationships to provide equal representation based on a variable such as human population. Dorling [48] provides numerous examples. The approach is readily applicable to populations of birds, mammals, lakes and so on. The distorted spatial relationships make it difficult for people to associate other information they have stored mentally based on spatial landmarks. Dykes [49] juxtaposes traditional maps with cartograms to ameliorate the problem. Since many people get used to particular map views, cartograms are not likely to see widespread use, but some people find them helpful to see patterns.

Cartographic representations are available for point features, linear features, and surfaces [73, 79]. Rather than represent local values, a common mapping approach estimates surfaces of the form z = f(x, y). The surface can be represented as contours on a map. Given a surface value, z0, a contour line consists of pairs (x, y) that satisfy the equation z0 = f(x, y). A typical contour plot shows approximate contour lines for several values of z. Labeled contour lines do not have much visual impact, however. Several methods can improve this. An easy approach is to communicate contour values by contour line thickness. Another option is to fill the regions between contour lines with color. (The colors should be ordered.) Interestingly, there are no confidence bounds for contour lines. Consequently Carr et al. [28] estimate values on a hexagon grid, define class intervals based on the distribution of estimated values, choose colors for the hexagons based on the intervals, and call the result a hexagon mosaic map. If the data and modeling provide justification, then confidence bounds can be calculated for the estimated values. Of course the same can be done for square grids. However, hexagons have merits over squares of the same area [17, 93].

Many surface representations are available such as color-draped perspective wireframes and full rendered color surfaces with highlights. See, for example, Cleveland [35]. Wegman and Luo [113] note that specular reflection highlights local density anomalies. Tufte [104] shows that the pairing of contour and surface plots can aid understanding. The surface tends to provide a good overall impression (except for what is hidden) while the contours help to locate features, such as local extrema, on the plane. Wireframes and translucent surfaces can be superimposed on a map. Sometimes researchers stack three distinct views, a tipped map, contour plot, and surface, so they appear to be aligned in three dimensions. Perspective views appeal to many people and have also been used to show local values, such as the height of three-dimensional bars in animated flyovers. However, perspective foreshortening complicates accurate decoding of values and comparisons.


Researchers have used various extensions. Tufte [103] uses small multiples very effectively to show changes over time. Researchers often use animation to show temporal change. This can reveal rapid change, but it is hard to remember the old views needed to make comparisons over longer intervals. There is little reason to believe that people can visualize the difference between surfaces any better than they can the differences between curves. Consequently, showing the explicit differences is useful in both small multiples and in animation. Carr [20] reports on an early release of satellite data where images of sea surface levels were not properly registered. The previously undetected problem became immediately evident when animating the difference between consecutive images.

More complicated extensions include the simultaneous display of two surfaces. This is possible using translucence. Another approach is to have surface height encode one variable and surface color the other.

Of special interest is the fast nonparametric shift histogram technique of Scott and Whittaker [94] for estimating surfaces. This can incorporate sampling weights and handle three or four variables in addition to the spatial coordinates. For instance, it can show a smoothed surface conditioned by other variables.
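The general idea behind shifted-histogram smoothing can be sketched in one dimension. The version below is a generic averaged shifted histogram with sampling weights, written under the assumption that a one-dimensional toy conveys the principle; it is not the implementation of Scott and Whittaker [94], and the function name and parameters are hypothetical.

```python
import math


def ash_1d(xs, weights, lo, hi, n_fine, m):
    """Weighted averaged shifted histogram on n_fine narrow bins over [lo, hi).

    m is the number of shifted histograms averaged; each coarse bin is m
    narrow bins wide. Returns one density estimate per narrow bin.
    """
    delta = (hi - lo) / n_fine
    fine = [0.0] * n_fine
    for x, w in zip(xs, weights):
        k = math.floor((x - lo) / delta)
        if 0 <= k < n_fine:  # points outside [lo, hi) are ignored
            fine[k] += w
    total = sum(weights)
    h = m * delta  # coarse bin width
    density = []
    for k in range(n_fine):
        acc = 0.0
        for i in range(1 - m, m):
            j = k + i
            if 0 <= j < n_fine:
                # Triangular weights arise from averaging the m shifted histograms.
                acc += (1 - abs(i) / m) * fine[j]
        density.append(acc / (total * h))
    return density


# Hypothetical observations with sampling weights.
density = ash_1d([4.2, 5.0, 5.3, 6.1], [1, 1, 2, 1], lo=0.0, hi=10.0, n_fine=20, m=4)
```

Only bin counts and a short weighted sum are needed, which is why this family of estimators is fast; extending the binning to two spatial coordinates gives a smoothed surface.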

There are many ways of representing estimates and spatial indices. This entry calls attention to three additional static approaches: plotting glyphs, linking plots and maps, and juxtaposed maps.

Glyphs

Glyphs provide one way to represent multivariate observations. Estimated values control the glyph parameters. We can think of bar charts, pie charts and boxplots as glyphs that represent local distributions. We can also use tiny scatterplots with smooths as glyphs. Thus, glyphs can show local functional relationships. A circle is a simple commonly used glyph. We do not decode circle area very accurately. (Using a legend with a few reference values helps the reader to judge glyph values more accurately.) Carr [21] notes that for multivariate glyphs it is hard to assess the multivariate distance between glyphs so that geometrical pattern-finding breaks down. Carr et al. [29] suggest several variations for using simple ray angle glyphs. For large datasets and maps, Carr [17] and Carr et al. [26, 28] use hexagonal binning to provide symbol congestion control. The ray glyph provides a summary for each hexagonal region and with only one symbol per hexagon, overplotting is not a problem. When rays represent estimated values with confidence intervals, the authors represent the confidence intervals using arcs. Small reference wheels at the base of the ray provide an unobtrusive angular scale for comparison. The angle can be judged accurately in the context of comparison against a scale. Carr et al. [26] develop an angular boxplot glyph. Carr [17] uses a bivariate ray glyph to show two dependent variables. A ray pointing to the right encodes one variable (for small values the ray points down and for large values it points up) and a ray pointing to the left encodes the other. Chambers et al. [32] describe many other such graphical representations including a closely related metroglyph that represents both wind direction and speed. See also [2].
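Hexagonal binning with one summary per cell can be sketched as follows. The sketch assigns each point to the nearest centre on two interleaved rectangular lattices, which together form a hexagonal tessellation; using the cell mean as the summary, and the names `hex_cell` and `summarize`, are assumptions made for illustration.

```python
import math
from collections import defaultdict


def hex_cell(x, y, size):
    """Lattice index of the hexagon containing (x, y).

    Hexagon centres sit on two interleaved rectangular lattices with
    horizontal spacing `size`. Cells are labelled in half-step units, so the
    first lattice has even indices and the offset lattice has odd indices.
    """
    dx, dy = size, size * math.sqrt(3)
    i_a, j_a = round(x / dx), round(y / dy)            # nearest centre, lattice A
    i_b, j_b = math.floor(x / dx), math.floor(y / dy)  # nearest centre, lattice B
    dist_a = (x - i_a * dx) ** 2 + (y - j_a * dy) ** 2
    dist_b = (x - (i_b + 0.5) * dx) ** 2 + (y - (j_b + 0.5) * dy) ** 2
    if dist_a <= dist_b:
        return (2 * i_a, 2 * j_a)
    return (2 * i_b + 1, 2 * j_b + 1)


def summarize(points, size):
    """Mean of z for each occupied hexagon: one glyph value per cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, z in points:
        cell = hex_cell(x, y, size)
        sums[cell][0] += z
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical (x, y, value) observations.
summary = summarize([(0.1, 0.1, 2.0), (-0.1, 0.05, 4.0), (0.5, 0.9, 10.0)], size=1.0)
```

Each occupied cell then receives a single symbol, such as a ray whose angle encodes the summary value, so the number of symbols is bounded by the number of hexagons rather than the number of observations.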

Many glyphs have a long history. Some glyphs, such as Chernoff faces [33, 34], have attracted considerable attention. Takacs [101] provides information about human face recognition that suggests some glyph enhancements. Some, such as the trees of Kleiner and Hartigan [69], have seen little use. Star glyphs, a polar variant of a parallel coordinate plot, occasionally appear. What survives in the long term remains to be seen.

With numerous methods available for producing stereoscopic graphics, stereo rays [22] provide yet another way to show two variables in addition to two spatial coordinates. Few researchers have seriously tackled the visualization of six-dimensional data. A notable exception is Bayly et al. [9]. They successfully used colored stereo ellipsoids to evaluate problems in improving an electrostatic potential model. Their article includes color side-by-side stereo figures. In a long sequence of efforts Bayly (personal communication) failed to obtain insight using a wide variety of encodings. Their eventual success suggests trying a variant of their glyph that devotes two of the coordinates to representing a position on a map. The image of small translucent or wireframe dirigibles comes to mind.

As indicated above, symbol congestion limits the number of glyphs that can be placed on maps. Puckett [91] partly addresses this issue by placing the pie charts (glyphs) around the map and drawing lines from each pie chart to the spatial location. This may suffice for local lookup but complicates local spatial comparisons of the distributions. Cuffney et al. [44] provide an interesting glyph composed of six rectangles indicating the status of metal, nonpesticide agricultural intensity (NPAI), fish, invertebrates and algae at monitoring stations. The rectangles of red, yellow, green and white represent the impairment status of severe, moderate, unimpaired, and no data, respectively. The height of six lines could have shown the original continuous values and color could still indicate the class membership. Carr and Olsen [24] show 159 variables using line height. It is possible to represent many variables as a glyph with a sound encoding, position along a scale. However, variable labeling and symbol congestion challenges remain.

Linked Plots

Linking points across plots provides a way to connect variables that are represented in different plots. Linking provides a weaker binding of the multivariate observations than glyphs. Linking methods include linking by lines, colors, names, pointers, and spatial linking by juxtaposition. The following discussion emphasizes line linking and color linking.

Diaconis and Friedman [47] discuss M and N plots that link points in different plots with lines. For example, they represent four-dimensional data using two two-dimensional scatterplots. There is nothing that prevents one plot from being a map with spatial coordinates. The first plot represents the two coordinates and the second plot represents the remaining two coordinates. A line between a bivariate point in one plot and a bivariate point in the second plot indicates that the two bivariate points really represent one four-coordinate point. Their general description includes linking across multiple plots of varying dimensionality. For example a four-dimensional representation might link a one-dimensional plot to a two-dimensional plot to a one-dimensional plot.

Parallel coordinate plots are the only variation of M and N plots that has caught on. The parallel coordinate plot for $p$ dimensions is a sequence of $p$ univariate plots. The representation connects $p$ coordinates with $p - 1$ line segments. An early example appears in Bertin [11]. Inselberg [66] and Wegman [110] introduce the mathematical and statistical aspects of parallel coordinate plots. They and Inselberg and Dimsdale [67] describe the point–line duality and other mathematical relationships that provide a basis for extended interpretation. For example, Inselberg has used the representation to find the closest distance between two lines in four dimensions. Interpretation of some patterns requires significant background. Other patterns are easy. For example, Wegman notes that one can readily assess the correlation between adjacent variables. Many crossing segments between adjacent axes indicate a high negative correlation and many parallel segments indicate a high positive correlation. Wegman and Luo [114] also use parallel coordinates for high-dimensional clustering. Even for two variables, parallel coordinates are not very good at communicating a detailed functional relationship. Parallel coordinates seem particularly well suited for showing time series or multispectral intensities. In both cases the data units are the same.
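The link between crossing segments and negative association can be checked numerically. In the sketch below, two observations produce crossing segments between adjacent axes exactly when their order on one axis is the reverse of their order on the other; the function name and the sample data are hypothetical, and the crossing fraction is offered as an illustration rather than as a statistic proposed in the source.

```python
from itertools import combinations


def crossing_fraction(u, v):
    """Fraction of observation pairs whose segments cross between two adjacent axes."""
    pairs = 0
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(zip(u, v), 2):
        pairs += 1
        # Segments cross when the two observations swap order between axes.
        if (u1 - u2) * (v1 - v2) < 0:
            crossings += 1
    return crossings / pairs


increasing = crossing_fraction([1, 2, 3, 4], [10, 20, 30, 40])  # no crossings
decreasing = crossing_fraction([1, 2, 3, 4], [40, 30, 20, 10])  # every pair crosses
```

Because only the ordering on each axis matters, rescaling an axis does not change the count, which is one reason the crossing pattern is easy to read regardless of the units.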

Increasingly, exploratory analyses use interactive color brushing [10] to highlight elements represented both in statistical plots and maps. Common plots used in this fashion are scatterplots and parallel coordinate plots. Cook et al. [42] provide a good entry point into the domain of dynamic graphics and GIS. This work builds on the software Xgobi, which supports higher-dimensional exploration using grand tour [3] and projection pursuit [41]. The grand tour or projection pursuit combined with scatterplot or parallel coordinate plots can show evolving linear combinations of variables [111]. The projection pursuit algorithms progressively modify the linear combinations to bring out different features in the plot. Furnas and Buja [58] indicate that we can learn about structure dimensionality through projection and sectioning.
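A single frame of such a projection display can be sketched as follows. The code draws one random two-dimensional projection plane and projects the data onto it; a grand tour would move smoothly through a sequence of planes, and projection pursuit would choose planes by optimizing an index, neither of which is attempted here. The function names and the seeded generator are hypothetical.

```python
import math
import random


def random_plane(p, rng):
    """Two orthonormal p-vectors spanning a random projection plane (p >= 2)."""
    a = [rng.gauss(0, 1) for _ in range(p)]
    b = [rng.gauss(0, 1) for _ in range(p)]
    norm_a = math.sqrt(sum(x * x for x in a))
    a = [x / norm_a for x in a]
    # Gram-Schmidt step: remove the component of b along a, then normalize.
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]
    norm_b = math.sqrt(sum(y * y for y in b))
    b = [y / norm_b for y in b]
    return a, b


def project(rows, plane):
    """Coordinates of each observation in the plane: two linear combinations."""
    a, b = plane
    return [(sum(x * w for x, w in zip(row, a)),
             sum(x * w for x, w in zip(row, b))) for row in rows]


rng = random.Random(0)  # seeded only to make the illustration repeatable
plane = random_plane(4, rng)
points_2d = project([[1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.9, 0.3]], plane)
```

Each plotted coordinate is a linear combination of the original variables, which is the sense in which these displays show evolving linear combinations.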

Figure 13 introduces a relatively new template that Carr and Pierson [25] and Carr et al. [27] called linked micromap (LM) plots. This extends the idea of linking statistical plots and maps as promoted by Monmonier [80]. The basic template is composed of three parallel sequences: small generalized maps, region labels, and statistical panels. This design implements the idea of small multiples and parallelism recommended by Tufte. The design partitions the units of study into small perceptual groups and highlights different study units in different panels to encourage selective focus. In Figure 13, adapted from Carr et al. [28], the study units are level 2 Omernik ecoregions. Finding all parts of disjoint ecoregions is easy due to the selective focus and color link. The boxplots summarize the spatial variation in the half million grid cells that partition the US. Each grid cell has estimates produced by Daly et al. [45] giving the 30-year average precipitation and average number of growing degree days. The distribution of cell values within ecoregions shows considerable variability. If the two variables are closely related to the concept of ecoregion and influence ecoregion definition, then the expectation is that variability will rapidly decrease when considering level-three and higher ecoregions that provide a much finer partition of the US. Note the linking of statistical estimates and spatial coordinates using cyclic colors. There is a learning curve for understanding the color when reading a sequence of micromaps. The highlighted regions in the sequence of panels can reveal more detailed patterns than typical classed choropleth maps, especially in examples with many panels.

Figure 13 An LM plot with boxplots showing spatial variation. The ecoregions may not be familiar but the horizontal color linking makes it easy to find them, even the ones that are disjoint. [Columns: micromaps; labels for the 21 Omernik level II ecoregions, sorted by median growing degree days; boxplots of precipitation (inches, log scale); boxplots of growing degree days (growing degree days/100). Values are yearly averages from 1961 to 1990; the spatial distribution is based on 1/2 million grid cells.]

There are many variations on LM plots. Carr et al. [27] show the use of dot plots with confidence bounds in the background. Carr et al. [29] use line heights to represent percentages for the 159 land classes defined by Loveland et al. [72]. The boxplots in Carr et al. [31] reveal the variation in mortality rates for local health service areas that contrast with the seemingly stable state estimates. Layouts showing values for counties are now available for several states. Statistical panels can show time series and even bivariate boxplots (see Boxplot, bivariate). The micromaps can show sites and river segments. Interactive extensions can involve zooming into LM plots to show progressively revealed detail.
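The data preparation behind an LM plot, sorting the study units and partitioning them into small perceptual groups, can be sketched in a few lines. The region names, the values, and the group size are hypothetical; the source does not prescribe a particular group size.

```python
from statistics import median


def lm_panels(region_values, group_size):
    """Sort regions by median and split them into small perceptual groups."""
    ordered = sorted(region_values, key=lambda name: median(region_values[name]))
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


# Hypothetical regions, each with a few cell values.
regions = {
    "A": [3, 5, 4],
    "B": [1, 2, 9],
    "C": [7, 8, 6],
    "D": [0, 1, 1],
    "E": [5, 5, 6],
}
panels = lm_panels(regions, group_size=2)
```

Each inner list corresponds to one micromap panel; within a panel, each region would receive its own color so that the map, the label, and the statistical summary are linked.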

Conditioned Plots

The simplest form of conditioned plots partitions the estimates (or data) into sets based on classes of conditioning variables. The different sets appear in different juxtaposed panels. The visualization task is then to study how the distributions or functional relationships shown in the panels vary across the conditioning panels. This approach is very similar to the nested plots of Tukey and Tukey [107]. An early exposition on conditioned plots (or coplots) appears in Cleveland et al. [40]. Conditioned plots are typically two-dimensional plots, but they can be three-dimensional wireframe plots or other higher-dimensional plots. People readily understand one- and two-way conditioning and the corresponding layout of panels. Thus, conditioned plots prove a reasonable way to study relationships involving three to five dimensions.

Conditioned views do not have to partition the data strictly to produce different panels. Cleveland et al. [40] introduced the notion of shingles that allows the same observations to appear in more than one panel. This is helpful when smoothing a scatterplot because it increases the number of points in the plots and addresses poor smoothing at the panel edges.
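Overlapping conditioning intervals can be sketched as follows. The construction gives each interval roughly the same number of observations and lets neighbouring intervals share a stated fraction of them; it is a simplified equal-count scheme written for illustration, with hypothetical function names, and is not claimed to match the exact algorithm of Cleveland et al. [40].

```python
def shingle_intervals(values, number, overlap=0.5):
    """Overlapping (low, high) intervals with roughly equal counts."""
    x = sorted(values)
    n = len(x)
    # Points per interval so that `number` intervals with this overlap span the data.
    r = n / (number * (1 - overlap) + overlap)
    intervals = []
    for j in range(number):
        start = j * (1 - overlap) * r
        low = x[max(round(1 + start) - 1, 0)]
        high = x[min(round(r + start) - 1, n - 1)]
        intervals.append((low, high))
    return intervals


def panel_members(values, interval):
    """Observations shown in the panel for one interval (bounds inclusive)."""
    low, high = interval
    return [v for v in values if low <= v <= high]


data = list(range(1, 13))  # hypothetical conditioning values 1..12
intervals = shingle_intervals(data, number=3, overlap=0.5)
```

With these settings the value 5 falls in both the first and the second interval, so it is plotted in two panels, which is the behaviour that distinguishes shingles from a strict partition.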

Carr et al. [31] developed the coplot idea in the context of maps and call the resulting template conditioned choropleth (CC) maps. Figure 14 provides an example. The data for Figure 14 include many different estimates that describe each equal-area hexagon. These estimates include the number of species found for birds, mammals, insects and reptiles. Additional estimates include the average elevation, the number of different land classes and so on. For Figure 14 the variable chosen for study is the number of bird species. The conditioning variables chosen are number of different land classes and the number of different mammal species. None of the variables is categorical. Consequently, the analyst chooses a transformation that uses class intervals to convert the variables into ordered classes. The bar with five colors at the top shows class boundaries and class colors that partition the hexagons based on the number of birds. The chosen boundaries put 20% of the hexagons in each class. In terms of the number of land cover classes, the four numbers at the bottom of the plot define boundaries that partition the hexagons into three roughly equal classes, each with about one-third of the hexagons. Similarly, the numbers on the left partition the hexagons based on the number of mammal species. The panel is a 3 × 3 layout based on the classes of the conditioning variable. In a single panel only those hexagons appear that satisfy the row and column conditioning constraints. Each hexagon only appears once in the full plot.
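The panel assignment just described can be sketched as a small function. The hexagon records and all cut points below are hypothetical values chosen for illustration; they are not read from Figure 14.

```python
from bisect import bisect_right


def class_index(value, interior_cuts):
    """Ordered class of a value; a value on a cut point joins the upper class."""
    return bisect_right(interior_cuts, value)


def cc_panels(hexagons, row_cuts, col_cuts, color_cuts):
    """Assign each hexagon to one (row, column) panel and one color class."""
    panels = {}
    for hex_id, birds, land_classes, mammals in hexagons:
        key = (class_index(mammals, row_cuts), class_index(land_classes, col_cuts))
        panels.setdefault(key, []).append((hex_id, class_index(birds, color_cuts)))
    return panels


# Hypothetical records: (id, bird species, land classes, mammal species).
hexagons = [
    ("h1", 80, 5, 25),
    ("h2", 95, 10, 35),
    ("h3", 110, 20, 45),
    ("h4", 101, 9, 28),
]
panels = cc_panels(hexagons,
                   row_cuts=[30, 38],                # two cuts give three rows
                   col_cuts=[8, 12],                 # two cuts give three columns
                   color_cuts=[82, 92, 100, 106])    # four cuts give five colors
```

Moving a cut point, as with the partitioning sliders of an interactive version, only changes the arguments; every hexagon still lands in exactly one panel.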

The visual task in Figure 14 is to compare the distributions. The little box in each panel shows the mean for the panel. The box background color classifies the mean. The diagonal color pattern indicates the condition–variable interaction. Figure 15 shows more detailed comparisons using Q–Q plots. These plots compare the distribution of values in each panel with the composite across all panels. This is superior for comparing distributions. However, Figure 14 keeps the spatial context and allows the analyst to generate hypotheses about the spatial patterns observed.
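The points of such a Q–Q plot can be computed by pairing quantiles of a panel's subset with the same quantiles of the composite. The sketch below uses linear interpolation between order statistics and evenly spaced probabilities; both choices, and the function names, are assumptions made for illustration.

```python
def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qq_pairs(subset, composite, n_points=5):
    """(composite quantile, subset quantile) pairs: x and y of a Q-Q plot."""
    s, c = sorted(subset), sorted(composite)
    probs = [(k + 0.5) / n_points for k in range(n_points)]
    return [(quantile(c, p), quantile(s, p)) for p in probs]


# Hypothetical values: the subset is the composite shifted upward by 10.
composite = [1, 2, 3, 4, 5, 6, 7, 8, 9]
subset = [v + 10 for v in composite]
pairs = qq_pairs(subset, composite)
```

A pure shift puts every point the same distance above the line y = x; differences in spread or shape would instead bend or tilt the point pattern, which is how the plot reveals more than a change in the mean.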

Carr et al. [31] show a layout for an interactive version where the analyst controls the class boundaries with partitioning sliders. This kind of interactivity encourages the reader’s involvement, which was one of the four guiding principles. Sophisticated researchers may prefer to move directly to modeling and the study of structure and residuals. Exploration using a few variables and a few classes and sliders is limiting. There are recursive partitioning models that have built trees from over two million potential explanatory variables. However, tools like CC maps are easy to understand and can draw people towards more sophisticated modeling.

Figure 14 A conditioned choropleth (CC) map. The units are equal area hexagons covering the Mid-Atlantic Region. The plot omits fractional hexagons with less than one-half of the area inside the region. Values in the tabs are the average number of bird species present based on the hexagons in each panel. Conditioning on associated or causal variables reduces variation. Analysts may then see other spatial patterns. [Title: Number of bird species in hexagons partitioned by number of land classes and number of mammal species. Color legend: bird species (per cent of hexagons). Rows: number of mammal species, panel row constraint intervals. Columns: number of land classes, panel column constraint intervals.]

Laying out multiway panels in rows and columns across many pages is often a good start. However, there are many facets to graphical design and general purpose algorithms have not yet captured all the current graphical design expertise. For example, Carr and Olsen [24] have found the spanning tree traversal described by Friedman and Rafsky [56] very useful for multivariate sorting. Methods for simplifying visual appearance remain applicable. These include the key strategies of perceptual grouping of information, sorting, presenting the information in layers, the removal of redundant information and the purposeful use of white space [24, 70]. The graphics make it easy to apply thoughtful sorting, but the analyst still has to do the thinking.
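One way to turn a spanning tree into a multivariate ordering is sketched below: build a minimal spanning tree with Prim's algorithm, then list the observations in depth-first order. The start node and the order in which branches are visited are arbitrary choices here, so this is a generic illustration and not the specific traversal rule of Friedman and Rafsky [56]; the function name is hypothetical.

```python
import math


def mst_order(points):
    """Ordering of observations from a depth-first walk of a minimal spanning tree."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each node
    parent = [None] * n
    children = {i: [] for i in range(n)}
    best[0] = 0.0           # start from observation 0 (arbitrary choice)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Depth-first preorder traversal of the tree.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


# Hypothetical observations lying along a line, listed out of order.
order = mst_order([(0, 0), (3, 0), (1, 0), (2, 0)])
```

Observations that are close in the multivariate space tend to end up adjacent in the ordering, which is what makes the result useful for arranging rows or panels.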

The difficulty of seeing patterns across many levels of conditioning factors and pages needs to be recognized. Humans are not good at integrating low-dimensional relationships into higher-dimensional or overview patterns. When the information appears across pages, the limits of our short-term memories compound the difficulty. When across-panel insights occur, they are likely to be based on panels juxtaposed closely in space or time. Careful attention to the choice of layout is often the key to obtaining multivariate insights.

Figure 15 Two-way conditioned Q–Q plots. Q–Q plots reveal distributional differences of subsets, not just changes in the mean. Comparison with pooled data automatically implies some similarity. However, a single distribution provides a convenient framework for comparison. [Title: Number of bird species present. Each panel: Q–Q plot of the subset (y-axis) against the composite (x-axis), both axes running from 30 to 120. Rows: number of mammal species, panel row constraint intervals. Columns: number of land classes, panel column constraint intervals.]

Closing Remarks

The tools for environmental visualization continue to advance. The 1997 special issue of Computers and Geosciences, on exploratory cartographic visualization, edited by MacEachren and Kraak [75], contains articles on a variety of topics such as the representation of uncertainty, encoding for characterizing landscapes, and dynamic methods. This entry does not individually list all the instructive articles in that collection and certainly does not begin to capture all the literature that is available. Undoubtedly the presentation here is slanted toward the literature the author knows best. The tools will continue to evolve along with both our understanding of the environment and ourselves. The tools help us to think about the available data.

Sometimes we need to think about the data that we are not seeing. Sometimes there are barriers to scientific inquiry that those who see beyond the data should address.

Increasingly the size, detail, and complexity of environmental datasets will overwhelm our visualization capacity. In thinking about the future, Tukey [106] coined the term cognostics (diagnostics interpreted by a computer rather than a human). The idea was to compute features of merit and have a computer rank plots by their potential interest to humans. This is frightening; for example, algorithms can miss little details like the hole in the ozone layer. However, humans miss a lot because our looking is not automated and not optimized for understanding.
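The ranking idea can be illustrated with a deliberately simple feature of merit. The sketch scores every pair of variables by the absolute Pearson correlation and lists candidate scatterplots from most to least interesting; the choice of feature, the function names, and the data are hypothetical, and the columns are assumed to be non-constant.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def rank_pairs(columns):
    """Variable pairs ordered by a feature of merit, here |correlation|."""
    scored = [(abs(pearson(columns[a], columns[b])), a, b)
              for a, b in combinations(sorted(columns), 2)]
    return sorted(scored, reverse=True)


# Hypothetical variables.
columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8.5], "z": [3, 1, 4, 1]}
ranking = rank_pairs(columns)
```

A single numerical feature can rank thousands of plots quickly, but it sees only what it measures, which is the risk the text describes.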

Some may think environmental visualization is easy. It is not. Rather, it is a huge intellectual challenge that spans developing concepts, collecting data thoughtfully, modeling, grappling with complexity, and dealing with our own limits.

Acknowledgments

Portions of this work were supported by NSF Grants 9819945 and 9983461 and EPA Cooperative Agreement CR825564-01-2. This entry has not been reviewed by the agencies, does not necessarily reflect agency views and no endorsement should be inferred.

References

[1] Ahlberg, C. & Schneiderman, B. (1994). Visual information seeking: tight coupling of dynamic query filters with starfield displays, Proceedings of the ACM CHI International Conference on Human Factors in Computing (Chi’94), Boston, pp. 303–317.

[2] Anderson, E. (1960). A semi-graphical method for the analysis of complex problems, Technometrics 2, 287–292.

[3] Asimov, D. (1985). The grand tour: a tool for viewing multidimensional data, SIAM Journal of Scientific and Statistical Computing 6, 128–143.

[4] Bailey, R.G. (1995). Description of the Ecoregions of the United States, USDA Forest Service, Washington, Miscellaneous Publications No. 1391 (review).

[5] Bailey, R.G. (1995). Ecosystem Geography, Springer-Verlag, New York.

[6] Bailey, R.G. (1998). Ecoregions Map of North America: Explanatory Note, USDA Forest Service, Washington, Miscellaneous Publications No. 1548.

[7] Baird, J.C. & Nome, E. (1978). Fundamentals of Scaling and Psychophysics, Wiley, New York.

[8] Barnett, V., ed. (1981). Interpreting Multivariate Data, Wiley, New York.

[9] Bayly, C.I., Cieplak, P., Cornell, W.D. & Kollman, P.A. (1993). A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model, Journal of Physical Chemistry 97, 10 269–10 280.

[10] Becker, R.A. & Cleveland, W.S. (1987). Brushing scatterplots, Technometrics 29, 127–142.

[11] Bertin, J. (1967, 1983). Semiologie Graphique, Gauthier-Villars, Paris. Semiology of Graphics, Translated by W.J. Berg, The University of Wisconsin Press, Madison.

[12] Brewer, C.A. (1994). Color use guidelines for mapping and visualization, in Visualization in Modern Cartography, A.M. MacEachren & D.R.F. Taylor, eds, Pergamon/Elsevier, Oxford, pp. 123–147.

[13] Brewer, C.A. (1997). Spectral schemes: controversial color use on maps, Cartography and Geographic Information Systems 24, 203–220.

[14] Brewer, C.A., MacEachren, A.M., Pickle, L.W. & Herrmann, D. (1997). Mapping mortality: evaluating color schemes for choropleth maps, The Annals of the Association of American Geographers 87, 411–438.

[15] Bruce, A. & Gao, H. (1996). Applied Wavelet Analysis with S-PLUS, Springer-Verlag, New York.

[16] Card, K.J., Mackinlay, D. & Schneiderman, B., eds (1999). Information Visualization Using Vision to Think, Morgan Kaufmann, San Francisco.

[17] Carr, D.B. (1991). Looking at large data sets using binned data plots, in Computing and Graphics in Statistics, A. Buja & P. Tukey, eds, Springer-Verlag, New York, pp. 7–39.

[18] Carr, D.B. (1994). Converting Tables to Plots, Technical Report No. 101, Center for Computational Statistics, George Mason University, Fairfax.

[19] Carr, D.B. (1994). A colorful variation on boxplots, Statistical Computing and Graphics Newsletter 5, 19–23.

[20] Carr, D.B. (1996). Perspectives on the analysis of massive data sets, Computing Science and Statistics, Proceedings of the Twenty-seventh Symposium on the Interface, Interface Foundation of North America, Fairfax, pp. 410–419.

[21] Carr, D.B. (1998). Multivariate graphics, in Encyclopedia of Biostatistics, Vol. 4, P. Armitage & T. Colton, eds, Wiley, New York, pp. 2864–2886.

[22] Carr, D.B. & Nicholson, W.L. (1988). EXPLOR4: a program for exploring four-dimensional data, in Dynamic Graphics for Statistics, W.S. Cleveland & M.E. McGill, eds, Wadsworth, Belmont, pp. 309–329.

[23] Carr, D.B. & Olsen, A.R. (1995). Parallel coordinate plots for representing distribution summaries in map legends, Proceedings of the Seventeenth International Cartography Association Conference, Tenth General Assembly of the ICA, pp. 733–742.

[24] Carr, D.B. & Olsen, A.R. (1996). Simplifying visual appearance by sorting: an example using 159 AVHRR classes, Statistical Computing and Graphics Newsletter 7, 10–16.

[25] Carr, D.B. & Pierson, S.M. (1996). Emphasizing statistical summaries and showing spatial context with micromaps, Statistical Computing and Graphics Newsletter 7, 16–23.

[26] Carr, D.B., Littlefield, R.J., Nicholson, W.L. & Littlefield, J.S. (1987). Scatterplot matrix techniques for large N, Journal of the American Statistical Association 82, 424–436.

[27] Carr, D.B., Olsen, A.R., Courbois, J.P., Pierson, S.M. & Carr, D.A. (1998). Linked micromap plots: named and described, Statistical Computing and Graphics Newsletter 9, 24–32.

[28] Carr, D.B., Olsen, A.R. & White, D. (1992). Hexagon mosaic maps for display of univariate and bivariate geographical data, Cartography and Geographic Information Systems 19, 228–236, 271.

[29] Carr, D.B., Olsen, A.R., Pierson, S.M. & Courbois, J.P. (2000). Using linked micromap plots to characterize Omernik ecoregions, Data Mining and Knowledge Discovery 4, 43–67.

[30] Carr, D.B., Valliant, R. & Rope, D. (1996). Plot interpretation and information webs: a time-series example from the Bureau of Labor Statistics, Statistical Computing and Graphics Newsletter 7, 19–26.

[31] Carr, D.B., Wallin, J.F. & Carr, D.A. (2000). Two new templates for epidemiology applications: linked micromap plots and conditioned choropleth maps, Statistics in Medicine 19, 2521–2538.

[32] Chambers, J.M., Cleveland, W.S., Kleiner, B. & Tukey, P.A. (1983). Graphical Methods for Data Analysis, Wadsworth and Brooks/Cole, Pacific Grove.

[33] Chernoff, H. (1973). Using faces to represent points in K-dimensional space graphically, Journal of the American Statistical Association 68, 361–368.

[34] Chernoff, H. & Haseeb, R.M. (1975). Effect on classification error of random permutation of features in representing multivariate data by faces, Journal of the American Statistical Association 70, 548–554.

[35] Cleveland, W.S. (1993). Visualizing Data, Hobart Press, Summit.

[36] Cleveland, W.S. (1994). The Elements of Graphing Data, Hobart Press, Summit.

[37] Cleveland, W.S., ed. (1988). The Collected Works of John W. Tukey: Vol. V, Graphics 1965–1985, Wadsworth and Brooks/Cole, Pacific Grove.

[38] Cleveland, W.S. & McGill, M.E., eds (1988). Dynamic Graphics for Statistics, Chapman & Hall, New York.

[39] Cleveland, W.S. & McGill, R. (1984). Graphical perception: theory, experimentation, and application to the development of graphics methods, Journal of the American Statistical Association 79, 531–554.

[40] Cleveland, W.S., Grosse, E. & Shyu, W.M. (1992). Local regression models, in Statistical Models, J.M. Chambers & T.J. Hastie, eds, Wadsworth and Brooks/Cole, Pacific Grove.

[41] Cook, D., Buja, A., Cabrera, J. & Hurley, C. (1995). Grand tour and projection pursuit, Journal of Computational and Graphical Statistics 4, 155–171.

[42] Cook, D., Symanzik, J., Majure, J. & Cressie, N. (1997). Dynamic graphics in a GIS: more examples using linked software, Computers and Geosciences 23, 371–385.

[43] Cressie, N.A.C. (1993). Statistics for Spatial Data, Wiley, New York.

[44] Cuffney, T.F., Meador, M.R., Porter, S.D. & Gurtz, M.E. (1997). Distribution of fish, benthic invertebrate, and algal communities in relation to physical and chemical conditions, Yakima River Basin, Washington, 1990, US Geological Survey Water Resources Investigation Report 96-4280, US Geological Survey, Denver.

[45] Daly, C., Neilson, R.P. & Phillips, D.L. (1994). A statistical–topographic model for mapping climatological precipitation over mountainous terrain, Journal of Applied Meteorology 33, 140–158.

[46] Dent, B.D. (1990). Cartography – Thematic Map Design, Brown, Dubuque.

[47] Diaconis, P. & Friedman, J.H. (1980). M and N plots, Pub-2495, Stanford Linear Accelerator Center, Stanford University, Stanford.

[48] Dorling, D. (1995). A New Social Atlas of Britain, John Wiley, Chichester.

[49] Dykes, J.A. (1997). Exploring spatial data representation with dynamic graphics, Computers and Geosciences 23, 345–371.

[50] Eick, S.G., Steffen, J. & Sumner, E. (1992). Seesoft – a tool for visualization software, IEEE Transactions on Software Engineering 18, 957–968.

[51] Fienberg, S.E. (1979). Graphical methods in statistics, The American Statistician 33, 165–178.

[52] Fisher, L.D. & van Belle, G. (1993). Biostatistics: A Methodology for The Health Sciences, Wiley, New York.

[53] Foley, J.D., van Dam, A., Feiner, W.K. & Hughes, J.F. (1990). Computer Graphics: Principles and Practice, Addison-Wesley, Reading.

[54] Fotheringham, A.S. & Wong, D.W.S. (1991). The modifiable areal unit problem in multivariate statistical analysis, Environment and Planning, Series A 23, 1025–1044.

[55] Friedhoff, R.M. & Benzon, W. (1991). The Second Computer Revolution, Visualization, W.H. Freeman, New York.

[56] Friedman, J.H. & Rafsky, L.C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests, The Annals of Statistics 7, 697–717.

[57] Frigge, M., Hoaglin, D.C. & Iglewicz, B. (1989). Some implementations of the boxplot, The American Statistician 43, 50–54.

[58] Furnas, G.W. & Buja, A. (1994). Prosection views: dimensional inference through sections and projections, Journal of Computational and Graphical Statistics 3, 323–353.

[59] Gnanadesikan, R. (1977).Methods of Statistical DataAnalysis of Multivariate Observations, Wiley, NewYork.

[60] Golub, G.H. & von Matt, U. (1997). Generalizedcross-validation for large-scale problems,Journal ofComputational and Graphical Statistics6, 1–34.

[61] Goodchild, M. & Gopal, S., eds (1989).Accuracy ofSpatial Databases, Taylor & Francis, New York.

[62] Grinstein, G. & Levkowitz, H., eds (1995).Percep-tual Issues in Visualization, Springer-Verlag, NewYork.

[63] Hammond, A., Adriaanse, A., Rodenburg, E., Byran, D.& Woodward, R. (1995).Environmental Indicators: ASystematic Approach to Measuring and Reporting onEnvironmental Policy Performance in the Context ofSustainable Development, World Resources Institute,Washington.

[64] Hastie, T.J. & Tibshirani, R.J. (1990).GeneralizedAdditive Models, Chapman & Hall, New York.

[65] Herlihy, A.T., Larsen, D.P., Paulsen, S.G., Urquhart,N.S. & Rosenbaum, B.J. (2000). Designing a spatiallybalanced, randomized site selection process for regionalstream surveys: the EMAP Mid-Atlantic pilot study,Environmental Monitoring and Assessment63, 95–113.

[66] Inselberg, A. (1985). The plane with parallel coordi-nates,The Visual Computer1, 69–96.

[67] Inselberg, A. & Dimsdale, B. (1994). Multidimensionallines II: proximity and applications,SIAM Journal ofApplied Mathematics54, 578–596.

[68] Jones, P.G. & Cook, D. (1995). Multivariate Q-Q plots based on quantile contours, Computing Science and Statistics 27, 269–278.

[69] Kleiner, B. & Hartigan, J.A. (1981). Representing points in many dimensions by trees and castles, Journal of the American Statistical Association 76, 260–276.

[70] Kosslyn, S.M. (1994). Elements of Graph Design, W.H. Freeman, New York.

[71] Levkowitz, H. (1997). Color Theory & Modeling for Computer Graphics, Visualization & Multimedia, Kluwer, Dordrecht.

[72] Loveland, T.R., Merchant, J.W., Reed, B.C., Brown, J.F., Ohlen, D.O., Olson, P. & Hutchinson, J. (1995). Seasonal land cover regions of the United States, The Annals of the Association of American Geographers 85, 339–355.

[73] MacEachren, A.M. (1994). Some Truth with Maps: A Primer on Symbolization & Design, Association of American Geographers, Washington.

[74] MacEachren, A.M. (1995). How Maps Work, Guilford Press, New York.

[75] MacEachren, A.M. & Kraak, M.J., eds (1997). Computers & Geosciences 23(4).

[76] MacEachren, A.M., Brewer, C.A. & Pickle, L.W. (1998). Visualizing georeferenced data: representing reliability of health statistics, Environment and Planning, Series A 30, 1547–1561.

[77] Marr, D. (1985). Vision: the philosophy and the approach, in Issues in Cognitive Modeling, M. Aitkenhead & M.M. Slack, eds, Lawrence Erlbaum, London, pp. 103–126.

[78] McGill, R., Tukey, J.W. & Larsen, W.A. (1978). Variations of box plots, The American Statistician 32, 12–16.

[79] Mitas, L., Brown, W.M. & Mitasova, H. (1997). Role of dynamic cartography in simulations of landscape processes based on multivariate fields, Computers & Geosciences 23, 437–446.

[80] Monmonier, M. (1988). Geographical representations in statistical graphics: a conceptual framework, 1988 Proceedings of the Section on Statistical Graphics, American Statistical Association, Alexandria, pp. 1–10.

[81] Monmonier, M. (1993). Mapping It Out, University of Chicago Press, Chicago.

[82] Monmonier, M. (1998). Cartographies of Danger – Mapping Hazards in America, University of Chicago Press, Chicago.

[83] Olsen, A.R., Sedransk, J., Edwards, D., Gotway, C.A. & Liggett, W., et al. (1999). Statistical issues for monitoring ecological and natural resources in the United States, Environmental Monitoring and Assessment 54, 1–45.

[84] Olsen, A.R., Stevens, D.L. Jr & White, D. (1998). Application of global grids in environmental sampling, Computing Science and Statistics 30, 279–284.

[85] Olson, J.M. & Brewer, C.A. (1997). An evaluation of color selections to accommodate map users with color vision impairments, The Annals of the Association of American Geographers 87, 103–134.

[86] Omernik, J.M. (1987). Ecoregions of the conterminous United States, The Annals of the Association of American Geographers 77, 18–25.

[87] Omernik, J.M. (1995). Ecoregions: a spatial framework for environmental management, in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making, W.S. Davis & T.P. Simon, eds, Lewis, Boca Raton, pp. 49–62.

[88] Peterson, S.A., Larsen, D.P., Paulsen, S.G. & Urquhart, N.S. (1998). Regional lake trophic patterns in the northeastern United States: three approaches, Environmental Management 22, 789–801.

[89] Pickle, L.W., Mingle, M., Jones, G.K. & White, A.A. (1997). Atlas of United States Mortality, National Center for Health Statistics, Hyattsville.

[90] Preston, S.D., Smith, R.A., Schwartz, G.E., Alexander, R.B. & Brakebill, J.W. (1998). Spatially referenced regression modeling of nutrient loading in the Chesapeake Bay watershed, Proceedings of the First Federal Interagency Hydrologic Modeling Conference; Bridging the Gap Between Technology and Implementation of Surface Water Quality and Quality Models in the Next Century, Meeting: First Federal Interagency Hydrologic Modeling Conference, Las Vegas, pp. 1.143–1.150.

[91] Puckett, L.J. (1999). Nonpoint and point sources of nitrogen in major watersheds of the United States, US Geological Survey Water-Resources Investigations Report 94-4001, Reston.

[92] Rao, R. & Card, S.K. (1994). The table lens: merging graphical and symbolic representation in an interactive focus + context visualization for tabular information, Proceedings of the ACM Chi International Conference on Human Factors in Computing, pp. 318–322.

[93] Scott, D.W. (1992). Multivariate Density Estimation; Theory, Practice and Visualization, Wiley, New York.

[94] Scott, D.W. & Whittaker, G. (1996). Multivariate applications of the ash in regression, Communications in Statistics–Theory and Methods 25, 2521–2530.

[95] Slocum, T. (1998). Thematic Cartography and Visualization, Prentice-Hall, New York.

[96] Snyder, J.P. (1982). Map Projections Used by the US Geological Survey, US Government Printing Office, Washington.

[97] Stevens, D.L. Jr (1997). Variable density grid-based sampling designs for continuous spatial populations, Environmetrics 8, 167–195.

[98] Stevens, D.L. Jr & Olsen, A.R. (1999). Spatially restricted surveys over time for aquatic resources, Journal of Agricultural, Biological, and Environmental Statistics 4, 415–428.

[99] Stone, D., Lijelun, L., Reiersen, L., Nilson, A., et al. (1997). Arctic pollution issues: a state of the Arctic environmental report, AMAP, Box 8100 Dep. N-0032 Oslo, Norway.

[100] Symanzik, J., Carr, D.B., Axelrad, D.A., Wang, J., Wong, D. & Woodruff, T.J. (1999). Interactive tables and maps – a glance at EPA’s cumulative exposure project web page, 1999 Proceedings of the Section on Statistical Graphics, American Statistical Association, Alexandria.

[101] Takacs, B. (1996). Perception and recognition of human faces, Ph.D. Thesis, George Mason University, Fairfax.

[102] Tufte, E.R. (1983). The Visual Display of Quantitative Information, Graphics Press, Cheshire.

[103] Tufte, E.R. (1990). Envisioning Information, Graphics Press, Cheshire.

[104] Tufte, E.R. (1997). Visual Explanations, Graphics Press, Cheshire.

[105] Tukey, J.W. (1979). Statistical mapping: what should not be plotted, Proceedings of the 1976 Workshop on Automated Cartography, Department of Health, Education and Welfare Publication No. (PHS) 79-1254, pp. 18–26.

[106] Tukey, J.W. (1983). Another look at the future, Computer Science and Statistics, Proceedings of the Fourteenth Symposium on the Interface, Vol. 14, K.W. Heiner, R.S. Sacher & J.W. Wilkinson, eds, Springer-Verlag, New York, pp. 2–8.

[107] Tukey, J.W. & Tukey, P.A. (1983). Some graphics for studying four-dimensional data, Computer Science and Statistics, Proceedings of the Fourteenth Symposium on the Interface, Vol. 14, K.W. Heiner, R.S. Sacher & J.W. Wilkinson, eds, Springer-Verlag, New York, pp. 60–66.

[108] Vogelmann, J.E. & Wickham, J.D. (2000). Implementation Strategy for Production of National Land-Cover Data (NLCD) from the Landsat 7 Thematic Mapper Satellite, EPA 600/R-00/051, Office of Research and Development, US Environmental Protection Agency, Washington.

[109] Wahlström, E., Hallanaro, E. & Manninen, S., eds (1996). The Future of the Finnish Environment, Edita, Helsinki.

[110] Wegman, E.J. (1990). Hyperdimensional analysis using parallel coordinates, Journal of the American Statistical Association 85, 664–675.

[111] Wegman, E.J. (1991). The grand tour in K dimensions, Computer Science and Statistics, Proceedings of the Twenty-second Symposium on the Interface, pp. 127–136.

[112] Wegman, E.J. & Carr, D.B. (1993). Statistical graphics and visualization, in Handbook of Statistics, Computational Statistics, Vol. 9, C.R. Rao, ed., North-Holland, New York, pp. 857–958.

[113] Wegman, E.J. & Luo, Q. (1994). Visualizing Densities, Technical Report No. 100, Center for Computational Statistics, George Mason University, Fairfax.

[114] Wegman, E.J. & Luo, Q. (1997). High-dimensional clustering using parallel coordinates and the grand tour, Computing Science and Statistics 28, 361–368.

[115] Wiken, E.B. (1986). Terrestrial ecozones of Canada, Environment Canada, Ottawa, Ontario, Canada, Ecological Land Classification Series No. 19.

[116] Wilkinson, L. (1999). The Grammar of Graphics, Springer-Verlag, New York.

[117] Wong, D.W.S. & Amrhein, C.G. (1996). Research on the MAUP: old wine in a new bottle or real breakthrough?, Geographical Systems 3, 73–76.

[118] Wood, D.W. (1992). The Power of Maps, Guilford Press, New York.

(See also Influence diagrams; Nelder plots; Risk perception; Statistical computing in environmental science)

DAN CARR