www.thefunctionalart.com © ALBERTO CAIRO ALL RIGHTS RESERVED. COPIES AND DISTRIBUTION OF THIS DOCUMENT ARE NOT ALLOWED WITHOUT PERMISSION This material has not been copy-edited yet, so you may find not only typos, but also a few errors 1 © ALBERTO CAIRO


Chapter 6

Thematic Maps: Statistics and Cartography Meet

If the goal of any infographic is to tell a very complex story using limited space (or time, if we consider multimedia), thematic maps are probably the purest and most successful form of information graphics.

To define thematic maps we have to resort to tautology: they are those maps that have a theme, a topic, a specific set of data to focus on. They eliminate some kinds of features to better stress the ones that are relevant to the story. They are a hybrid between traditional mapping and statistical representation intended to portray (and also reveal) patterns of spatial distribution of one or more magnitudes.

The International Cartographic Association defines thematic maps as those that are designed “to demonstrate particular features or concepts.” The emphasis in locator maps and general-reference maps is placed on locating an event. The emphasis of thematic maps is placed on locating phenomena that usually come in the form of quantitative data. The goal of a thematic map is to tie a set of data to its geographical locations.

Unfortunately, not many newspapers outside of the US have taken thematic maps seriously, and even inside the country there are still not enough publications driven by high cartographic standards. It is true that it is very common to see very basic colored maps where each region uses a different shade of the same hue depending on the prevalence of a magnitude: the percentage of foreigners in the different regions of Spain, for example, or the crime rate in the counties of North Carolina. But that’s all.

It is not common to find pieces as elaborate as the ones reproduced on this page and the next. Both were published by The New York Times, a newspaper that has made thematic mapping one of its main tools for news reporting. In the last ten years, The New York Times has achieved a level of sophistication in the use of thematic cartography unparalleled by any other newspaper in the world.

Creating a thematic map can be time-consuming, but the basic rules that guide its design are not necessarily hard to understand. In this chapter we will discuss some of the concepts that were presented in the previous one, and we will see some others. It is also necessary to remember some basic statistical tools in order to grasp how the data should be gathered and processed. Those tools will come in handy when we see statistical charts later in this book, so it is a good idea to start warming up.

Patterns and Data

The first thing to understand is that for any thematic map the whole is bigger than the sum of its parts. That’s the case of the map above, which shows the distribution of the more than 1 million people that applied for help after Hurricane Katrina struck New Orleans. The map uses ZIP codes to distribute proportionally scaled circles that represent the number of applications received from those places.

The story seems to be clear: there are many applications and they are concentrated mainly in the Gulf of Mexico area. However, the map also tells the story of a diaspora: thousands of applications were received from Dallas and Houston, cities that hosted many refugees, but also, surprisingly, from areas as far from New Orleans as Los Angeles, Detroit, Seattle or Boston.

There is another level of information here: the counties from which families did not file applications are colored white. The ones that sent applications are colored in grey. It is a complex story told in just six columns of a newspaper. In this piece, the designers did not want to focus on every single number, but to make the pattern rise: most of those affected remained in the Gulf Area. Those who fled went to very different places, probably to where their families came from. The whole is bigger than the sum of its parts.

A preliminary lesson that we can extract from the Katrina map and the one on the previous page, an outstanding background piece about Israel and the West Bank, is that the main components of any thematic map are a traditional geographical map (the base) and one or more data overlays. These overlays should be designed so the reader would not have any difficulty differentiating them.

There are numerous types of thematic maps whose effectiveness has already been tested. However, thematic mapping is a flexible area with a lot of room for creativity and new ways of displaying data, so don’t be afraid to push the boundaries of the categories that I will explain on the next pages. It is also possible to combine different kinds of thematic maps into the same piece. The possibilities are unlimited.

Turn the page and get your first glance of the realm of thematic maps.

The Katrina Diaspora. © The New York Times. Reproduced with permission.

1. Choropleth Maps

These are probably the most common kind of thematic map in news publications because their meaning is usually obvious to the reader. They are created by using different shades of color to represent the proportions of a certain magnitude in different areas. They are appropriate for showing ratios, percentages and other derived data, and usually don’t work very well for displaying raw amounts.

In other words: a choropleth map is good for representing the percentage of Hispanic population or the number of abortions per thousand pregnancies, but not for the total number of Hispanics or the total number of abortions in certain areas. The reason is that color intensities are unconsciously associated with concentration of a variable in a certain area, and not with total amounts or values.

2. Dot Maps

These are used to map discrete phenomena (see next page), and their main goal is to reveal the density pattern of those phenomena. Every dot or point on the map can represent a single unit or, in the case of small-scale maps that show entire countries or continents, groups of units.

3. Proportional Symbol Maps

If choropleth maps are suited to showing derived (or “standardized”) data such as percentages or ratios, proportional symbol maps usually work better with the raw data itself. They are also called “graduated quantitative point symbol” maps because they use objects scaled up or down depending on the amount of a variable in each place. The most widely used symbol is the circle, although it is not uncommon to see square-based maps or even triangle-based maps.

4. Isopleth Maps

These share one characteristic with choropleth maps: they also use shades of color (or color lines) to show the density of phenomena in a certain area. However, the boundaries of each shaded area are not determined by political or geographical boundaries, but by “lines of equal value” (called “isolines”). A good example is the very common weather map with the lines that show areas that will have the same temperature the next day.

5. Flow Maps

These represent movement across the map using lines with thicknesses proportional to the values used. The most famous example of this kind of map is Charles Joseph Minard’s map of Napoleon’s Russian campaign, discussed in the history of maps chapter. They are very difficult to design, as no software tool will generate one automatically: they have to be drawn manually and every thickness has to be calculated individually, which is both complicated and time-consuming. However, when they are well done, they are usually strong, informative and attractive pieces.

6. Cartograms or Value-by-Area

Some designers argue that cartograms cannot even be considered “maps” at all, but pure diagrams. They are called “value-by-area” maps because each area of the map is scaled up or down depending on the density or the value of the phenomena shown.

The main challenge when you’re designing these displays is to keep the shapes of the different areas recognizable, as they are heavily distorted in most cases. As in the case of flow maps, no GIS software will do the work for you. If you want to create a cartogram, you will have to do it manually.

Figure: five sample thematic maps.
Choropleth Map: percentage of people who have eaten boiled octopus in the last year (classes: 0-25%, 26-50%, 51-75%, 76-100%).
Dot Map: every dot represents a restaurant that has served boiled octopus in the last year.
Proportional Symbol Map: people that have eaten boiled octopus in the last year (symbol sizes: 250,000; 500,000; 1,000,000).
Isopleth Map: cooking temperature for the boiled octopus (classes: 100-105°C, 106-110°C, 111-115°C, 116-120°C).
Flow Map: tons of boiled octopus exported in Spain.

Statistics Terms Crucial for Thematic Mapping

In order to be considered “information,” raw data needs to be processed and classified according to certain rules. It is important to understand some key concepts before you even start creating thematic maps. The first word I am going to ask you to memorize is phenomena.

Any single object or feature susceptible to being geographically displayed is a phenomenon: cities and towns, murder rate in the area of Washington D.C., country boundaries, roads, Irish pubs in Pamplona (Spain), or restaurants on Franklin Street.

Phenomena can be discrete or continuous, depending on whether they happen at a certain spatial point or they are located everywhere. “Restaurants on Franklin Street” is an example of discrete phenomena, whereas “murder rates in North Carolina” is a continuous one because not a single city or town in the area is empty of this feature.

Some cartographers also talk about “abrupt” and “smooth” phenomena. Discrete phenomena have abrupt boundaries: they can be located in that place and nowhere else. This distinction is important, as different kinds of phenomena are better displayed with different kinds of thematic maps, and, more importantly, because even if we are displaying continuous phenomena, in many cases we will be forced to set artificial, conventional boundaries to classify the data, as we will see later.

Figure: two maps of Pamplona, Spain. The first map (discrete phenomena) marks individual Irish pubs and Scottish pubs as points. The second map (continuous phenomena) shows the percentage of Irish pubs over the total of bars (classes: 0-10%, 11-20%, 21-30%).

In statistical analysis, the phenomena that can be associated with quantitative data are called variables: population density, average household income, real estate prices. Each individual observation of those variables is called a value (the actual “data”). If the variable is “population density”, the values can be 536 people/sq. mile, 459 people/sq. mile, and so on.

All thematic maps can be classified according to the data that they use. They can show raw data or derived data of some sort. The first map above is an example of a raw-data, discrete-phenomena map: each point represents a single bar. The second one, on the other hand, uses derived data (percentages). Derived data, also called standardized data, represent relationships between features: babies born per 1,000 people a year, the percentage of kids that have purchased an ice cream in the last month, or population per square mile, as in this list:

Region                 Population    Square miles   Population per square mile
                       (raw data)    (raw data)     (derived-standardized data)
Gallia Lugdunensis     1,327,420     12,234         108.5
Maxima Sequanorum      1,234,900     13,491          91.5
Viennensis             1,221,478     15,430          79.2
Aquitanica               988,765     10,200          96.9
Novempopulana            956,218     17,239          55.5
Narbonensis              942,390     14,295          65.9
Baetica                  925,117     13,445          68.8
Carthaginiensis          909,998     15,776          57.7
Tarraconensis            854,325     19,001          45.0
Gallaecia                842,127     12,394          67.9
Lusitania                825,321     11,438          72.2
Maxima Caesariensis      743,400      9,456          78.6
Britannia Prima          701,558     15,431          45.5
Flavia Caesariensis      688,433     13,234          52.0
Samnium                  651,129     11,092          58.7
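The step from raw data to derived data in the list above is a single division. As a minimal sketch in Python (the function name `density` and the choice of the first three regions are mine, purely for illustration), it could look like this:

```python
# Raw data for the first three regions of the list above:
# (population, square miles). Only a sample, to keep the sketch short.
regions = {
    "Gallia Lugdunensis": (1_327_420, 12_234),
    "Maxima Sequanorum": (1_234_900, 13_491),
    "Viennensis": (1_221_478, 15_430),
}


def density(population, square_miles):
    """Derived (standardized) value: people per square mile."""
    return population / square_miles


# Round to one decimal, as the list does.
densities = {
    name: round(density(population, area), 1)
    for name, (population, area) in regions.items()
}

for name, value in densities.items():
    print(f"{name}: {value} people per square mile")
```

The rounded results agree with the last column of the list (108.5, 91.5 and 79.2).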

Derived/standardized data can be classified according to the relationships they create. The ones that you will find most often are averages, ratios and densities. A fourth group, potentials, will not be discussed in this book, as they are not so widely used in newspapers and magazines.

1. Averages

An average is a mathematically calculated value that is used to characterize a set of data. If you want to show off for your friends you can refer to averages as “measures of central tendency,” which takes longer to say but is also pretty self-explanatory. In a way, a “measure of central tendency” works like the axis every other value rotates around.

There are three different kinds of averages used in thematic mapping: the mode, the median and the mean. Each one of them is appropriate for different kinds of sets of data.

The mode is the most frequently found value in any distribution of data:

36, 25, 47, 328, 25, 66, 13, 32, 25, 25, 47, 390, 458, 335, 25, 13, 32, 25, 400, 25
The most frequent observation is 25.

Acacia, Basil, Cactus, Hawthorn, Basil, Thistle, Rose, Narcissus, Parsley, Basil
The most frequent observation is Basil.

(If two values are equally prevalent, the distribution strictly has two modes and is called bimodal; taking the mean between two tied numeric values is only a rough shortcut.)

The mode is useful for maps that show the most frequent kind of phenomenon that can be found in an area; for example, the dominant vegetation species or the most widely spoken language. For the purposes of this book the mode is the least relevant of the three averages.

In any distribution, the median value lies exactly in the middle of the ordered list. That is, half of the values will be higher than the median and the other half will be lower.

Figure: Likelihood of having eaten boiled octopus in the last month in the regions (“comarcas”) of A Coruña, Spain (October 2006). The 18 regions are ranked from 1 (not likely) to 18 (very likely). Because the number of regions is even, the median falls between ranks 9 and 10; you can average them: 9.5. Source: invented data.

Figure: Percentage of people who have eaten boiled octopus in the last month in the regions (“comarcas”) of A Coruña, Spain (October 2006). Map classes: 11.0%-20.0%, 21.0%-28.6%, 28.7%-40.0%, 41.0%-50.0%. Source: invented data.

Region            Percentage
A Barbanza        47%
A Barcala         43%
A Coruña          22%
Arzúa             15%
Bergantiños       33%
Betanzos          18%
Eume              50%
Ferrol            28%
Fisterra          21%
Muros             33%
Noia              36%
O Sar             28%
Ordes             16%
Ortegal           37%
Santiago          42%
Terra de Melide   15%
T. de Soneira     20%
Xallas            11%

Mean = sum of all values / number of values = 515 / 18 = 28.6%

The mean is what in the common language we know as “average,” although this can be misleading (the median and the mode are also averages, after all). Calculating it is pretty intuitive: add all the values on your list and divide that sum by the total number of values.
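The three averages can be checked quickly with Python's standard `statistics` module. This is only a sketch using the invented figures from this section; the variable names are mine.

```python
import statistics

# The two small distributions used to illustrate the mode.
observations = [36, 25, 47, 328, 25, 66, 13, 32, 25, 25,
                47, 390, 458, 335, 25, 13, 32, 25, 400, 25]
plants = ["Acacia", "Basil", "Cactus", "Hawthorn", "Basil",
          "Thistle", "Rose", "Narcissus", "Parsley", "Basil"]

mode_number = statistics.mode(observations)
mode_plant = statistics.mode(plants)

# multimode() lists every tied value instead of picking just one.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

# Ordinal data: the 18 regions ranked from 1 to 18.
ranks = list(range(1, 19))
median_rank = statistics.median(ranks)

# Percentages for the 18 regions of the second A Coruña map.
percentages = [47, 43, 22, 15, 33, 18, 50, 28, 21,
               33, 36, 28, 16, 37, 42, 15, 20, 11]
mean_percentage = round(statistics.mean(percentages), 1)

print(mode_number, mode_plant, tied_modes, median_rank, mean_percentage)
```

With these inputs the mode is 25 (or Basil), the median rank is 9.5 and the mean is 28.6, which matches the figures in this section.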


When do you use the median and when the mean for your set of data? The median is usually used for ordinal sets of data, those that have some kind of hierarchy but don’t indicate how far each value is separated from the next. The first A Coruña map above shows this quite clearly: the regions of A Coruña, on the northwestern coast of Spain, are ranked according to the likelihood that people have eaten boiled octopus in the last month. But the map does not indicate whether being first means that people in that region eat 10%, 20% or 30% more octopus than the ones in the region that is ranked second.

On the other hand, the mean is used for sets of data where you know how broad the differences between the values in the array are. The second map of A Coruña is a good example of this.

2. Ratios

A ratio is a way of standardizing data by relating each value to the whole or to the units into which the set is divided. There are two main kinds: rates and proportions.

A rate is a derived quantity that relates two different categories. For example, children per marriage, bits downloaded per second by a P2P program, etc. The formula for calculating rates is quite simple:

$\text{Rate} = n_a / n_b$, where $n_a$ is the first category and $n_b$ is the second.

Figure: Boiled octopi cooked per restaurant on Jan. 27, 2006, in the regions (“comarcas”) of A Coruña, Spain. Map classes: 3.9-5.4, 5.5-7.0, 7.1-8.6, 8.7-10.2, 10.3-11.4. Source: invented data.

Region          Total octopus   Total restaurants   Ratio
A Barbanza      4,245           945                  4.5
A Barcala       3,389           876                  3.9
A Coruña        4,890           434                 11.3
Arzúa           2,229           578                  3.9
Bergantiños     3,450           559                  6.2
Betanzos        3,943           859                  4.6
Eume            3,661           459                  8.0
Ferrol          4,319           444                  9.7
Fisterra        3,235           672                  4.8
Muros           3,675           323                 11.4
Noia            3,233           500                  6.5
O Sar           4,011           399                 10.1
Ordes           2,934           687                  4.3
Ortegal         3,801           802                  4.7
Santiago        3,121           768                  4.1
T. de Melide    4,001           698                  5.7
T. de Soneira   3,198           589                  5.4
Xallas          4,545           576                  7.9

A proportion relates categories of the same nature. In the case below, the map shows the number of nine-legged octopi related to the number of octopi purchased in each region. If a proportion is multiplied by 100, you will obtain a percentage, which is probably the most common kind of ratio that you will find on thematic maps:

$\text{Proportion} = n_a / N$, where $n_a$ is the category (nine-legged) and $N$ is the total octopi.

$\text{Percentage} = n_a / N \times 100$

Figure: Percentage of nine-legged boiled octopi purchased on Jan. 27, 2006, in the regions (“comarcas”) of A Coruña, Spain. Map classes: 2.1%-2.5%, 2.6%-3.0%, 3.1%-3.5%, 3.6%-4.0%, 4.1%-4.6%. Source: invented data.

Region          Total octopus   Nine-legged   Percentage
A Barbanza      4,245           110           2.6%
A Barcala       3,389           108           3.2%
A Coruña        4,890           105           2.1%
Arzúa           2,229           103           4.6%
Bergantiños     3,450           126           3.7%
Betanzos        3,943           103           2.6%
Eume            3,661           108           3.0%
Ferrol          4,319           150           3.5%
Fisterra        3,235           115           3.6%
Muros           3,675           142           3.9%
Noia            3,233           121           3.7%
O Sar           4,011           132           3.3%
Ordes           2,934           113           3.9%
Ortegal         3,801           104           2.7%
Santiago        3,121           124           4.0%
T. de Melide    4,001           131           3.3%
T. de Soneira   3,198           120           3.8%
Xallas          4,545           132           2.9%
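The rate, proportion and percentage formulas translate directly into code. A minimal sketch in Python, using the invented A Barbanza figures from the two octopus maps (the function and variable names are illustrative, not part of any library):

```python
def rate(n_a, n_b):
    """Rate: relates two different categories (octopi per restaurant)."""
    return n_a / n_b


def proportion(n_a, total):
    """Proportion: relates a category to the whole it belongs to."""
    return n_a / total


def percentage(n_a, total):
    """Percentage: a proportion multiplied by 100."""
    return proportion(n_a, total) * 100


# A Barbanza: total octopi, total restaurants, nine-legged octopi.
total_octopi, total_restaurants, nine_legged = 4245, 945, 110

octopi_per_restaurant = round(rate(total_octopi, total_restaurants), 1)
nine_legged_pct = round(percentage(nine_legged, total_octopi), 1)

print(octopi_per_restaurant, nine_legged_pct)
```

For A Barbanza this gives 4.5 octopi per restaurant and 2.6% nine-legged octopi. A density is computed the same way as a rate, with the geographic area as the second category.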

3. Densities

You will use densities when the main goal of the map is to show the pattern of concentration or dispersion of the phenomena in different areas. Densities are pretty similar to rates in the sense that they relate two different categories to obtain the derived data. With densities, however, the second category involved is always a measure of geographical area: population per square mile.

$\text{Density} = N / A$, where $N$ is the value observed and $A$ is the geographic area of the region.

Figure: People per square kilometer that have purchased boiled octopus in the last month in A Coruña, Spain (October 2006). Map classes: 2.6-5.7, 5.8-8.9, 9.0-12.2, 12.3-15.8, 15.9-. Source: invented data.

Region          Purchasers   Square kilometers   Density
A Barbanza      5,143        420                 12.2
A Barcala       2,384        501                  4.8
A Coruña        7,294        435                 16.8
Arzúa           1,249        485                  2.6
Bergantiños     4,420        426                 10.4
Betanzos        8,933        485                 18.4
Eume            2,441        421                  5.8
Ferrol          4,349        429                 10.1
Fisterra        6,234        452                 13.8
Muros           2,645        492                  5.4
Noia            5,232        482                 10.9
O Sar           4,113        477                  8.6
Ordes           3,439        479                  7.2
Ortegal         2,108        434                  4.9
Santiago        1,312        481                  2.7
T. de Melide    2,100        434                  4.8
T. de Soneira   2,891        488                  5.9
Xallas          5,210        435                 12.0

Choropleth Maps

The very strange name of this map type has Greek origins: choros means “place” and plethos means “multitude” or “quantity.” These maps are commonly known as shaded maps because of their main construction feature: matching different colors (or shades of color) to regions (from now on, enumeration units), which usually correspond to political or administrative subdivisions of the territory, according to a scale of values.

Choropleth maps are perfect for data that correspond with artificial or conventional boundaries (countries, regions, counties, etc.) and that can be considered evenly distributed across each one of those enumeration units. That is one of the reasons it is not usually a good idea to use a choropleth to map raw data: raw data is unevenly distributed, as a general rule. It is concentrated in some areas of the region and dispersed in others, whereas standardized data (percentages, densities) correspond to the region as a whole.

There are two things that we have to consider when designing choropleth maps: a) how many enumeration units (regions) you will display and b) how many subdivisions (classes) you are going to use to divide the data set. As a general rule, the more enumeration units and classes used, the better, as the data will be more accurately represented.

To see how the message conveyed by an election map depends on those two variables, I have used three maps of the 2004 presidential election. The first map on the right uses only two classes: Kerry wins or Bush wins. No nuance here: the data is either blue or red, left or right, Democrat or Republican. Fortunately, reality is not black or white. There are usually many shades of grey in between, and the second map takes this into consideration. On the second map, the data is better classified: it uses the same number of enumeration units but more classes (data subdivisions).

The problem is that the enumeration units are too big: again, there is not enough room for nuance here. A state either supported Bush or Kerry. That is not totally wrong; after all, any candidate that gets the majority of votes wins the state, but it is still a limited picture. What about if we use counties as enumeration units? The results are much more interesting and accurate.

The-more-the-better rule has severe limits, of course, as too much detail can make a piece illegible. In order to decide how many geographic enumeration units to use, you have to first consider how big the map is going to be. In deciding how many classes the map legend should include you have to think about how many color shades the average eye can identify. Let’s focus first on data classification.

How to Classify Data

Classifying data means that you will have to devise subdivisions that split the value scales into intervals: 0-20, 21-30, 31-40... Those interval ranges are called classes. Some thematic map legends look beautifully intuitive:

0%-20% 21%-40% 41%-60% 61%-80% 81%-100%

I wish we were able to split data sets into classes of equal range like the ones above more often, but unfortunately that’s not the case. What if most of the values of your distribution are concentrated at the bottom of the scale? Look at these two maps and then take a look at the data used to create them:

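Figure: two choropleth maps of the same data, “% who use pink wigs,” in 17 Spanish regions (numbered 1 to 17 on the maps). The first map (labeled “No”) uses classes of equal range: 0-20%, 21-40%, 41-60%, 61-80%, 81-100%. The second map (labeled “A bit better”) uses classes of unequal range: 0-10%, 11-20%, 21-30%, 31-75%, 76-100%.

No.   Region             % who use pink wigs
1     Galicia             6%
2     Asturias            7%
3     Cantabria           8%
4     País Vasco          8%
5     Navarra            12%
6     La Rioja           14%
7     Aragón             16%
8     Cataluña           19%
9     Castilla-León      22%
10    Madrid             24%
11    Extremadura        25%
12    Cast.-La Mancha    33%
13    Andalucía          35%
14    Murcia             36%
15    Valencia           38%
16    Baleares           75%
17    Canarias           98%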

As the data is concentrated at the bottom of the scale and only two of the regions have high values, the upper classes are almost unused on the first map: the 41-60% interval is empty, and the 61-80% and 81-100% intervals contain a single region each. Moreover, a poor classification has a worse consequence: the overall transition pattern is not as visible as in the second map. The second map uses classes of unequal ranges and, as a result, the map looks richer and more interesting.
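One way to compare the two classifications before drawing anything is to count how many regions fall into each class. The following Python sketch does that for the 17 values listed above; the helper `class_counts` is hypothetical, and I am assuming that the last class of the second legend starts at 76%.

```python
from bisect import bisect_left

# "% who use pink wigs" for the 17 regions, as listed in the data table.
values = [6, 7, 8, 8, 12, 14, 16, 19, 22, 24, 25, 33, 35, 36, 38, 75, 98]

# Upper limit of each class in the two legends.
equal_limits = [20, 40, 60, 80, 100]      # 0-20, 21-40, 41-60, 61-80, 81-100
informal_limits = [10, 20, 30, 75, 100]   # 0-10, 11-20, 21-30, 31-75, 76-100


def class_counts(data, upper_limits):
    """Count how many values fall into each class.

    A value belongs to the first class whose upper limit is greater
    than or equal to it.
    """
    counts = [0] * len(upper_limits)
    for value in data:
        counts[bisect_left(upper_limits, value)] += 1
    return counts


equal_counts = class_counts(values, equal_limits)
informal_counts = class_counts(values, informal_limits)

print("Equal ranges:  ", equal_counts)
print("Unequal ranges:", informal_counts)
```

With the values as transcribed here, the equal ranges hold 8, 7, 0, 1 and 1 regions, while the unequal ranges hold 4, 4, 3, 5 and 1, a much more balanced legend.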

As an information graphics journalist, it is very likely that you will have to process raw data every day. Data sets are more and more available, both in print publications and on websites. The primary impulse if you are absent-minded (as I am) is to take all the values, divide them into equal intervals and there you go: a nice choropleth map. This is encouraged by a very common work method in newsrooms that stresses the importance of finishing graphics as quickly as possible, without paying too much attention to “small technical details.” Small details, as we have just seen, can make a huge difference.

There are two ways of approaching data classification: visual or “informal” and mathematical or “formal.” On the maps above I used the informal approach to shape the story that I wanted the map to tell. I simply took a look at the data and identified some breaks that could serve as the boundaries of the five classes. Is this the right way to proceed? Some statisticians would say no, as we are relying on pure guesses. But it is a quick way of devising intervals if you are in a breaking-news situation and you don’t have time to stop and do the math. However, on a regular basis, you should opt to calculate the breaks more precisely. Let’s roll up our sleeves and sharpen our pencils.


A device that is commonly used in cartography to pick a method of classification is the histogram. A histogram is a simple bar chart that displays the frequency of each value or set of values. It is an excellent way to see which areas the data are concentrated in. In the case of the data below, if we quickly create the bar chart with any software tool we will find out that the values that are most frequently observed lie within the 7% and 13% range. Below and above that range, the frequency (number of observations) decreases abruptly.

% who use pink wigs

Province        %     Province        %     Province        %
Alava           3%    Girona         10%    Navarra        13%
Albacete        4%    Granada        10%    Ourense        13%
Alicante        5%    Guadalajara    10%    Palencia       13%
Almería         5%    Guipúzcoa      10%    Pontevedra     13%
Asturias        6%    Huelva         10%    Salamanca      14%
Avila           6%    Huesca         11%    Segovia        14%
Badajoz         7%    Baleares       11%    Sevilla        15%
Barcelona       7%    Jaén           11%    Soria          15%
Burgos          7%    A Coruña       11%    Tarragona      16%
Cáceres         8%    La Rioja       11%    Tenerife       17%
Cádiz           8%    Las Palmas     11%    Teruel         18%
Cantabria       8%    León           12%    Toledo         19%
Castellón       9%    Lleida         12%    Valencia       20%
Ciudad Real     9%    Lugo           12%    Valladolid     21%
Córdoba         9%    Madrid         12%    Vizcaya        22%
Cuenca          9%    Málaga         12%    Zamora         23%
                      Murcia         12%    Zaragoza       24%

Figure: histogram of the data above. Vertical axis: number of observations of each value (0 to 6). Horizontal axis: values, from 3% to 24%.
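Any software tool will do for building the histogram. As one possible sketch, here is a text-only version in Python that counts how often each value is observed in the 50 provinces above (`collections.Counter` is part of the standard library; the variable names are mine):

```python
from collections import Counter

# "% who use pink wigs" in the 50 provinces listed above.
values = [3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9,
          10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11,
          12, 12, 12, 12, 12, 12,
          13, 13, 13, 13, 14, 14, 15, 15, 16, 17, 18, 19,
          20, 21, 22, 23, 24]

# Frequency: how many times each value is observed.
frequency = Counter(values)

# One row per value, one '#' per observation.
for value in range(min(values), max(values) + 1):
    print(f"{value:>2}% {'#' * frequency[value]}")

# Number of observations inside the 7%-13% range.
in_range = sum(count for value, count in frequency.items() if 7 <= value <= 13)
print(f"{in_range} of {len(values)} observations lie between 7% and 13%")
```

In this data set, 31 of the 50 observations fall between 7% and 13%, which is the concentration the histogram makes visible.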

Keeping the histogram in mind, let’s examine the different methods of classification of data:

1. Equal Intervals

This method is also called “equal steps” or “constant intervals.” Boundaries calculated according to this method will enclose equal ranges of data: 0-10, 11-20, 21-30, 31-40, and so on. The main problem with equal intervals distributions is that they don’t consider which values are more frequently observed in the data set. Therefore, this method works well only if the histogram created with the data does not show big differences in the observations of the values. Or, in the ideal case, if it’s nearly rectangular.

Some software tools will calculate the intervals for you but it is good to learn how to do it manually. Follow the steps below:

These are the raw data:

Province        %     Province        %     Province        %
Alava           5%    Granada        32%    Ourense        64%
Albacete        6%    Guadalajara    35%    Palencia       68%
Alicante        7%    Guipúzcoa      37%    Pontevedra     69%
Almería         7%    Huelva         39%    Salamanca      71%
Asturias        9%    Huesca         39%    Segovia        77%
Avila          12%    Baleares       41%    Sevilla        78%
Badajoz        13%    Jaén           46%    Soria          78%
Barcelona      13%    A Coruña       47%    Tarragona      79%
Burgos         14%    La Rioja       48%    Tenerife       81%
Cáceres        18%    Las Palmas     49%    Teruel         81%
Cádiz          21%    León           51%    Toledo         83%
Cantabria      21%    Lleida         52%    Valencia       84%
Castellón      24%    Lugo           53%    Valladolid     90%
Ciudad Real    26%    Madrid         54%    Vizcaya        92%
Córdoba        27%    Málaga         57%    Zamora         94%
Cuenca         29%    Murcia         63%    Zaragoza       96%
Girona         32%    Navarra        64%

And this is the histogram:

Figure: histogram of the data above. Vertical axis: frequency, or how many times a value is observed (0 to 6). Horizontal axis: values (I organized them in categories to make things easier): 0-10%, 11-20%, 21-30%, 31-40%, 41-50%, 51-60%, 61-70%, 71-80%, 81-90%, 91-100%.

Steps to calculate the equal intervals

1. Calculate the range (R) of the data:
   R = highest value - lowest value = 96 - 5 = 91

2. Decide the number of classes you are going to use (NC). We are going to use 6.

3. Obtain the class interval (CI):
   CI = R / NC = 91 / 6 = 15.2

4. Calculate each class upper limit:
   Limit 1 = lowest value + (1 × CI) = 5 + (1 × 15.2) = 20.2
   The other limits can be calculated the same way or by simply adding the class interval (15.2) to the previous limit:
   Limit 2 = 5 + (2 × 15.2) = 35.4
   Limit 3 = 5 + (3 × 15.2) = 50.6
   Limit 4 = 5 + (4 × 15.2) = 65.8
   Limit 5 = 5 + (5 × 15.2) = 81.0
   Limit 6 = 5 + (6 × 15.2) = 96.2

When you finish, use the upper limits to create the legend:

5-20.2%, 20.3-35.4%, 35.5-50.6%, 50.7-65.8%, 65.9-81.0%, 81.1-96.0%*

*The last limit is substituted by the highest value.
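The four steps above can be sketched in a few lines of Python. The function name `equal_interval_limits` is hypothetical, and rounding the class interval to one decimal simply mirrors the worked example (91 / 6 is about 15.17, rounded here to 15.2):

```python
def equal_interval_limits(lowest, highest, n_classes):
    """Return the range, the class interval and the class upper limits."""
    # Step 1: range of the data.
    value_range = highest - lowest
    # Steps 2 and 3: class interval, rounded to one decimal as in the text.
    class_interval = round(value_range / n_classes, 1)
    # Step 4: upper limit of each class.
    limits = [round(lowest + k * class_interval, 1)
              for k in range(1, n_classes + 1)]
    return value_range, class_interval, limits


value_range, class_interval, limits = equal_interval_limits(5, 96, 6)
print(value_range, class_interval, limits)
```

Because the interval was rounded up, the last limit (96.2) overshoots the highest value slightly, which is why the legend substitutes the highest value for it.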

Another way of calculating intervals that are equal in size is using the standard deviation of the data set. The standard deviation is a measure of how far the values of a distribution are from the mean, or, in specialized jargon, of how dispersed the data are. A large deviation means that the data are, as a general rule, much bigger or smaller than the mean. On the other hand, if the standard deviation is small, it means that the values are clustered around the mean.

Finding the standard deviation of a small data set is relatively easy, although a calculator will come in handy. However, doing it manually for longer lists is time-consuming, and you will need the aid of some kind of software tool, such as Microsoft Excel. The standard deviation is often useful for thematic maps with values that are going to be organized around the mean, because each class limit can be calculated just by adding the standard deviation to the mean or subtracting it from the mean.

This is how to calculate the standard deviation:

These are the raw data (18 values):

A Barbanza, A Barcala, A Coruña, Arzúa, Bergantiños, Betanzos, Eume, Ferrol, Fisterra, Muros, Noia, O Sar, Ordes, Ortegal, Santiago, T. de Melide, T. de Soneira, Xallas

5%, 11%, 17%, 20%, 24%, 29%, 35%, 41%, 47%, 52%, 55%, 61%, 66%, 72%, 77%, 82%, 88%, 95%

Calculating the standard deviation (SD)

Mean = Sum of all values / Number of values = 877 / 18 = 48.7%

This is the formula:

SD = √( Sum of (Every value - Mean)² / (Number of values - 1) )

Translated to actual numbers...

SD = √( [ (5 - 48.7)² + (11 - 48.7)² + (17 - 48.7)² + (20 - 48.7)² + (24 - 48.7)² + ... and it goes on, and on ] / 17 )

And the result of the square root of that huge sum is...

SD = 27.70603 = 27.7

(This is the value you get when you divide by 17, the number of values minus one, as Excel's STDEV function does. If you divide by 18, the number of values, the result is 26.9.)

We can use the standard deviation to calculate the intervals. The scale will show above- and below-the-average values:

Mean - SD: 48.7 - 27.7 = 21.0%
Mean: 48.7%
Mean + SD: 48.7 + 27.7 = 76.4%

If we need more intervals, it is acceptable to manipulate the SD a bit, for example, by dividing it by 2: 27.7 / 2 = 13.85

Below the average: 7.15%, 21.0%, 34.85%
Mean: 48.7%
Above the average: 62.55%, 76.4%, 90.25%
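
If you prefer to let software do the arithmetic, Python's standard `statistics` module is enough. The sketch below uses the 18 raw values from the example; the helper `sd_breaks` is a name I made up for illustration. Note that the figure's result of 27.7 matches the sample standard deviation (dividing by the number of values minus one), while dividing by the number of values gives a slightly smaller figure.

```python
import statistics

values = [5, 11, 17, 20, 24, 29, 35, 41, 47, 52, 55, 61, 66, 72, 77, 82, 88, 95]

mean = statistics.mean(values)              # 877 / 18
sd_sample = statistics.stdev(values)        # divides by (n - 1)
sd_population = statistics.pstdev(values)   # divides by n

def sd_breaks(mean, sd, steps):
    """Class limits at mean +/- k * sd for each k in steps, in ascending order."""
    below = [round(mean - k * sd, 2) for k in reversed(steps)]
    above = [round(mean + k * sd, 2) for k in steps]
    return below + [round(mean, 2)] + above

print(round(mean, 1), round(sd_sample, 1), round(sd_population, 1))
# 48.7 27.7 26.9

# Half-SD steps, as in the second scale of the example.
print(sd_breaks(48.7, 27.7, [0.5, 1, 1.5]))
# [7.15, 21.0, 34.85, 48.7, 62.55, 76.4, 90.25]
```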

Equal-interval scales are a very common choice in newspapers and magazines because they are easily understood by readers. The main reason is that there are no gaps or missing values in the intervals: the lower limit of an interval is one unit (or a fraction) bigger than the upper limit of the previous one. This is really intuitive.

However, as I said before, not every data set is as nicely organized around the mean as the ones that I used for the examples shown. In some data sets the data dispersion is crazy. Just try to think of categorizing these figures in an equal-interval scale: 5, 6, 7, 7, 8, 9, 10, 11, 11, 11, 12, 13, 47, 49, 97, 100. It is obvious that there are huge gaps in the data: between 13 and 47 and between 49 and 97. Any categorization scale should be adjusted based on how the values are distributed.

2. Quantiles

The quantile method of classification has the same limitation as the equal-intervals method: it does not work very well when the data are widely dispersed, like the figures shown in the previous paragraph. The main difference between equal intervals and quantiles is that in the latter we focus our attention not on the values, but on their frequency.

In a quantile classification you will never see an empty class, or classes that contain many more or many fewer values than the others: each class includes the same number of observations.

The main challenge of the quantile method is that gaps between the classes may exist (20%-36%, 39%-48%, 51%-60%, and so on), and that can puzzle some readers.

Calculating a quantile scale is quite simple. To determine how many observations will lie in each class:

1. Decide how many classes you need. I usually try to avoid using more than six, as I will explain later. So let’s use six.
2. Divide the number of observations by the number of classes.

Values (36):

1 3 5 7 9 9 11 13 16 18 21 24
27 30 33 36 39 41 45 48 52 52 59 64
67 74 77 81 84 89 92 94 94 96 96 98

Calculate the range of each class:

Range of each class = Number of values / Number of classes = 36 / 6 = 6 values in each class

Place 6 observations per class:

Class 1 (1-9): 1 3 5 7 9 9
Class 2 (11-24): 11 13 16 18 21 24
Class 3 (27-41): 27 30 33 36 39 41
Class 4 (45-64): 45 48 52 52 59 64
Class 5 (67-89): 67 74 77 81 84 89
Class 6 (92-98): 92 94 94 96 96 98


Raw data: flu cases per 1,000 people (50 regions)

Alava 5, Albacete 6, Alicante 7, Almería 7, Asturias 9, Avila 12, Badajoz 13, Barcelona 13, Burgos 16, Cáceres 18, Cádiz 21, Cantabria 21, Castellón 24, Ciudad Real 26, Córdoba 28, Cuenca 30, Girona 32, Granada 32, Guadalajara 35, Guipúzcoa 37, Huelva 39, Huesca 39, Baleares 41, Jaén 46, A Coruña 48, La Rioja 48, Las Palmas 51, León 51, Lleida 53, Lugo 53, Madrid 55, Málaga 57, Murcia 63, Navarra 65, Ourense 65, Palencia 68, Pontevedra 71, Salamanca 71, Segovia 78, Sevilla 78, Soria 80, Tarragona 80, Tenerife 82, Teruel 83, Toledo 83, Valencia 87, Valladolid 90, Vizcaya 92, Zamora 94, Zaragoza 96

Range of each class = 50 / 6 = 8.33 values in each class

We are going to create classes including 8 or 9 values. The goal will be to make the classes equally big and to place equal values inside the same classes.

Class 1 (5-13): 5 6 7 7 9 12 13 13
Class 2 (16-30): 16 18 21 21 24 26 28 30
Class 3 (32-46): 32 32 35 37 39 39 41 46
Class 4 (48-63): 48 48 51 51 53 53 55 57 63
Class 5 (65-80): 65 65 68 71 71 78 78 80 80
Class 6 (82-96): 82 83 83 87 90 92 94 96

There are certain cases in which you may face some problems. For example, the number of values in each class could be a fractional figure, such as 6.7. Or two equal values may fall in two different classes. In those scenarios, use your common sense and don’t be rigid! Try to make class sizes more or less equal and move the two equal values to the same class by placing the break somewhere else.
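
Here is one way the quantile procedure, including the "common sense" adjustment for equal values, could be automated. It is a sketch, not a standard algorithm: the function name `quantile_classes` and the rule of moving a break to the nearest position that does not split equal values (preferring the lower position in case of doubt) are my own assumptions. It also assumes the data contain enough distinct values to fill every class.

```python
def quantile_classes(values, num_classes):
    """Split sorted values into classes of (nearly) equal size.

    A break that would separate two equal values is moved to the nearest
    position that does not (the lower one when both are equally near).
    """
    data = sorted(values)
    n = len(data)
    breaks = []
    for k in range(1, num_classes):
        ideal = round(k * n / num_classes)
        # Candidate break positions, ordered by distance from the ideal one.
        candidates = sorted(range(1, n), key=lambda i: (abs(i - ideal), i))
        breaks.append(next(i for i in candidates if data[i - 1] != data[i]))
    edges = [0] + breaks + [n]
    return [data[a:b] for a, b in zip(edges, edges[1:])]

# Flu cases per 1,000 people in the 50 regions of the example.
flu = [5, 6, 7, 7, 9, 12, 13, 13, 16, 18, 21, 21, 24, 26, 28, 30,
       32, 32, 35, 37, 39, 39, 41, 46, 48, 48, 51, 51, 53, 53, 55, 57, 63,
       65, 65, 68, 71, 71, 78, 78, 80, 80, 82, 83, 83, 87, 90, 92, 94, 96]

for group in quantile_classes(flu, 6):
    print(f"{group[0]}-{group[-1]} ({len(group)} values)")
```

With these data the sketch produces the same six classes as the example: 5-13, 16-30, 32-46, 48-63, 65-80 and 82-96, with 8 or 9 values each.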

3. (Natural) Breaks

This method is a half-objective, half-subjective way of dividing the data up into classes. It is half-objective because you will need to create some kind of chart so that you can see how the data are organized. It is half-subjective because what you will do with that chart is identify “natural breaks” that limit logical clusters of data and use them as boundaries for the different classes. In the example below, we would place the breaks where the steep slopes of the bars or the big gaps in the data occur. The breaks method is also called “classes based on similarities” for obvious reasons: those values that have similar numbers of observations should be grouped together.

If the histogram looks like this...

[Figure: histogram in which each mark equals 1 observation of that value. Horizontal axis: values from 0% to 30%. Arrows identify places on the distribution where “natural breaks” occur due to the steep slopes in the chart: between 5 and 6%, between 9 and 10%, between 13 and 14%, between 18 and 19%, and between 22 and 23%.]

A 6-class scale based on those “natural breaks” might look like this: 0%-5%, 6%-9%, 10%-13%, 14%-18%, 19%-22%, 23%-30%
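
The visual judgment cannot really be automated, but a crude stand-in for "place the breaks at the big gaps" is easy to sketch. The function name `largest_gap_breaks` is hypothetical, and the rule (cut at the widest gaps between consecutive values) covers only the gaps, not the slopes of the histogram. The data are the widely dispersed figures mentioned in the equal-intervals discussion.

```python
def largest_gap_breaks(values, num_classes):
    """Split sorted values at the (num_classes - 1) widest gaps."""
    data = sorted(values)
    # Gap i is the distance between data[i] and data[i + 1].
    gaps = sorted(range(len(data) - 1),
                  key=lambda i: data[i + 1] - data[i],
                  reverse=True)
    cut_after = sorted(gaps[:num_classes - 1])
    edges = [0] + [i + 1 for i in cut_after] + [len(data)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]

figures = [5, 6, 7, 7, 8, 9, 10, 11, 11, 11, 12, 13, 47, 49, 97, 100]
for group in largest_gap_breaks(figures, 3):
    print(group[0], "-", group[-1])
# 5 - 13
# 47 - 49
# 97 - 100
```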

4. Optimal

This category comprises several different techniques that are intended to create classes that are internally homogeneous and that are adapted to the dispersion of the data. To calculate classes using these techniques you will need specialized GIS software, but their rationale can be explained. You start by creating an arbitrary set of classes; it could be an equal-interval or a quantile scale. Then the values are moved among classes, trying to achieve a better arrangement.

The most famous optimal method is the Fisher-Jenks Algorithm. If you want to know more about this and other optimal classification systems, refer to the bibliography at the end of this section.
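
To make the idea of "internally homogeneous" classes concrete, here is a brute-force sketch. It is not the Fisher-Jenks algorithm and it is not how GIS packages work internally: it simply tries every possible set of breaks and keeps the one with the smallest sum of squared deviations of the values from the mean of their own class, which is only feasible for very small data sets. The function names are my own.

```python
from itertools import combinations

def within_class_ssd(group):
    """Sum of squared deviations of a class from its own mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)

def best_classes(values, num_classes):
    """Try every set of break positions and keep the most homogeneous one."""
    data = sorted(values)
    best, best_score = None, float("inf")
    for cuts in combinations(range(1, len(data)), num_classes - 1):
        edges = (0, *cuts, len(data))
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        score = sum(within_class_ssd(g) for g in groups)
        if score < best_score:
            best, best_score = groups, score
    return best

figures = [5, 6, 7, 7, 8, 9, 10, 11, 11, 11, 12, 13, 47, 49, 97, 100]
print([(g[0], g[-1]) for g in best_classes(figures, 3)])
# [(5, 13), (47, 49), (97, 100)]
```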

5. Unclassed

There is another way of classifying data for a choropleth map: not classifying it at all. The purest choropleth map is the one that has a class for each value. If you use this technique, you will assign a color shade to every data value. The resulting map pattern will be more subtle, smoother. Not all data sets are good for this kind of classification, as in many cases you will need to use abrupt limits between classes of values so the message will be clear. But it is still another available choice.
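
Assigning a shade to every value is a simple linear interpolation. The sketch below, with the made-up function name `gray_shade`, maps the lowest value to white and the highest to black; a real map would use whatever hue and range of lightness suit your printing conditions.

```python
def gray_shade(value, lowest, highest):
    """Map a value to a hex gray: lowest -> white, highest -> black."""
    if highest == lowest:
        return "#808080"            # no variation in the data: use a mid gray
    t = (value - lowest) / (highest - lowest)   # position between 0.0 and 1.0
    level = round(255 * (1 - t))
    return "#{0:02x}{0:02x}{0:02x}".format(level)

print(gray_shade(0, 0, 100))     # #ffffff
print(gray_shade(100, 0, 100))   # #000000
```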

[Figure: dummy unclassed choropleth map with a continuous legend running from 0% to 100% (marks at 0%, 25%, 50%, 75% and 100%).]


Some recommendations regarding the design of choropleth maps:

1. As with any other infographic, ask yourself what the story is. That will drive all your further decisions.

2. Make sure that the legend is clear and that the differences between the shades of color used are noticeable enough. If you work for a newspaper, you will have to exaggerate these differences a bit more, as the paper used to print it absorbs more ink than glossy magazine paper. The consequence is that colors tend to look dull. Before you start making complex choropleth maps, do some print tests to identify which hues look better printed.

3. Don’t use more than six classes unless it is necessary. The average human eye finds it difficult to distinguish more than six shades of a hue. Place the highest values at the top of the scale in the legend.

4. As a general rule, the information you will get from the wires and agencies will be raw data. You will have to standardize it; don’t use raw data in a choropleth map. You will have to pick a classification method, as well. Use equal-intervals when the distribution histogram looks more or less rectangular. If that is not the case, choose optimal (if you use GIS software) or natural breaks (if you don’t). Use quantiles only if you think that the reader will notice that there are gaps between the class limits.

5. Pay attention to the enumeration units (regions) you use. If they are very big and you use too few, your map will not be accurate enough. If you use too many and they are tiny, the map can be confusing. Remember the election maps we saw a few pages ago.
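
Standardizing, as recommended in point 4, usually just means turning raw counts into rates. A minimal sketch with invented numbers (the two regions and their figures are hypothetical):

```python
def rate_per_1000(count, population):
    """Standardize a raw count as a rate per 1,000 inhabitants."""
    return 1000 * count / population

# Hypothetical regions: (raw cases, population)
regions = {"Big city": (900, 600_000), "Small town": (450, 90_000)}
rates = {name: rate_per_1000(c, p) for name, (c, p) in regions.items()}
print(rates)   # {'Big city': 1.5, 'Small town': 5.0}
```

The big city has twice as many cases, but the small town's rate is more than three times higher. A choropleth map of the raw counts would tell the wrong story.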

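[Figure: two legends, marked OK and NO. One has ten classes (0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100) and the other has five (0-20, 21-40, 41-60, 61-80, 81-100).]

The map above is OK, but the information would be clearer if you published it alongside the one below.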

Proportional Symbol Maps

There are few infographics as beautiful as a well-designed, data-rich, complex proportional symbol map. How much did each county count in the 2004 Presidential Elections? A combination of geographic location with a technique that traditionally has belonged to the statistical display discipline might reveal the pattern. The pattern is clear at first glance:


Unfortunately, it is not uncommon to see maps like the one on the left. In the 80s, when Mac computers and vector illustration programs became more available and easier to use, the proliferation of cute symbols and unnecessarily elaborate drawings in newspapers and magazines was a concern. In this case, the complexity of the symbols used to portray the data makes it impossible for the reader to perceive a clear pattern. It is obvious that more octopi are purchased in coastal areas, but how many more?

A proportional symbol map is based on scaling a particular kind of object according to the data. Each symbol is then placed on the center of the geographical location where each data observation was made.

The most popular shape on proportional symbol maps is the circle. It is very compact and solid and it looks pleasant to the eye because of the lack of corners. Smoothness is a plus. Squares are also quite popular, although, as they have straight edges and 90° angles, they tend to look “edgy” and not as appealing to the eye as circles. Triangles are less common, but they can be an option for certain stories.

Two other options are volumetric, 3D-like shapes, which are intriguing and could be appropriate for stories about the “volume” of a phenomenon, and pictographic symbols, as in the octopus map above. My rule of thumb is to avoid both, though, as I think that they are too distracting: the detail of the illustration gets in the way of the data. Also, complex shapes and volumes are harder to compare than simple circles and squares. Our goal is to make the overall pattern emerge. We won’t succeed at that if our readers are not even able to compare symbol sizes properly.

There is another reason to prefer squares and circles: they are easier to scale.

1. Scaling Proportional Symbols

It looks deceptively simple. I draw a circle or a square, I select it, I pick the scale tool and type the scaling amount: 200%. I duplicate the second circle and do the transformation again. And, voilà! I will have three symbols that represent 1,000, 2,000 and 4,000 units! Let’s move on to the next section!

[Figure: pictographic symbol map. Boiled octopi purchased in A Coruña (October 2006). Legend: 250, 500, 1,000. Source: Invented Data.]

[Figure: a circle and a square representing 1,000 units, each scaled by 200% twice. The resulting symbols are labeled 1,000, 2,000 and 4,000.]

Hmm, there’s something strange here. Just to make sure that the second square is actually twice the size of the first one, I am going to place the first square inside the second one. Hey! This is wrong! The second square fits not two copies of the first square, but four! And the third square can fit sixteen copies of the first one. It seems that I have to remake my map legend:

NO: 1,000, 2,000, 4,000

OK: 1,000, 4,000, 16,000

The problem is that when you use the scale tool in a graphic design program you are changing the object’s two dimensions: height and width. And, as you probably remember, the area of a square is the result of multiplying height by width. If your square’s sides are 2 by 2 meters, its area will be 4 square meters. If you scale that object up 200% the result will be 4 by 4, so the area will be 16: four times the original area, not two. Circles are as tricky as squares. Luckily for us, there are some simple formulas that will come in handy.

First of all, remember that the area of a circle is equal to πr², where π is the dreadful 3.1416 number and r represents the radius of the circle. To use this formula you will have to know the radius of the biggest circle on the map:

Rm is the radius of the biggest circle: Rm = 1.1 cm. It represents 2,600 units.

If we want to calculate the circle that represents 1,100 units (Ra):

πRa² / πRm² = New value / Biggest value = 1,100 / 2,600

Which can be reduced to:

Ra = √(1,100 / 2,600) × Rm

And then...

Ra = √0.423 × 1.1 = 0.72 cm

The circle representing 1,100 units will have a radius of 0.72 cm.


If you want to use squares, the formula is analogous:

Sa = √(Value represented by Sa / Highest value) × Sm

Sm is the length of the side of the biggest square (known value) and Sa is the length of the side of the square you want to calculate.
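
Both formulas are the same calculation, so one small function covers circles and squares. This is a sketch; the name `scaled_size` is mine, and the numbers are the ones used in the circle example.

```python
import math

def scaled_size(value, biggest_value, biggest_size):
    """Radius (or side) of the symbol for `value`, so that areas match the data.

    `biggest_size` is the radius (or side) of the symbol that represents
    `biggest_value`.
    """
    return biggest_size * math.sqrt(value / biggest_value)

# The circle example: the biggest circle (2,600 units) has a radius of 1.1 cm.
print(round(scaled_size(1100, 2600, 1.1), 2))   # 0.72

# Doubling the value multiplies the side by about 1.41, not by 2.
print(round(scaled_size(2000, 1000, 1.0), 2))   # 1.41
```

In other words, if you insist on using the scale tool, type roughly 141% rather than 200% to get a symbol with twice the area.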

2. Circles or Squares?

If you are already convinced that you want to use a simple, geometrical shape, and not the cartoony octopus, an important thing to decide is whether you need circles or squares. Squares have an obvious advantage: their proportional areas are very well perceived. It is easier to judge the comparative sizes of two squares than the sizes of two circles. Our eye-brain system prefers straight lines to smooth curves when the main goal is to detect relative sizes.

A major disadvantage of squares is that they don’t look as great as circles, especially on crowded displays. A map packed with squares usually looks clunky and static. That effect disappears if you replace the squares with circles.

[Figure: two versions of the same display, each showing Value a, Value b and Value c.]

Besides, the main goal of a proportional symbol map is not to allow the readers to compare every single symbol to the next one, but to allow them to perceive the overall geographical pattern of the data.

Therefore, circles are generally the best choice. This does not mean that we should forget two important issues that have to do with the tricks and visual illusions that our perception system plays on us:

1. Neighboring circles affect how we judge symbol areas. This effect is known as the Ebbinghaus illusion. Given two circles of the same size surrounded by others, the one surrounded by larger circles will appear to be smaller. And vice versa:

Both grey circles have exactly the same area. However, the first one seems to be smaller and to recede, while the second one looks larger than it really is and tends to advance towards us. The reason lies in how our visual perception works. As we saw in the first section of this book, we don’t perceive individual units. Instead, our perception relies on how units relate to each other. The way we perceive the size of an object depends on the surrounding elements.

2. Readers don’t estimate the size of symbols accurately when they see many of them displayed simultaneously. This is a well-known phenomenon that especially affects circles. The average reader underestimates the sizes of large circles in relation to smaller ones. If you place two circles and the second one is twice the area of the first one, the average reader will think that the big circle is quite a bit smaller than it really is. Many cartographers argue that this fact should have consequences for the way we calculate the size of the symbols we use.

The traditional way of calculating symbol areas matches the proportions of the data with the sizes of the symbols that represent them.


This is known as the square root method of circle symbol scaling. If the data are 1 and 2, the second circle will be twice the area of the first one.

Another technique, proposed to compensate for this unconscious misestimation, is called the psychological scaling method or perceptual scaling, and it involves slightly exaggerating the size of larger circles. Unfortunately, no consensus has arisen about which formula is best for calculating these slightly exaggerated circles. Just keep in mind that if there is a lot of variation in your map you might want to cheat a little bit in order to make the message clearer.

[Figure: two sets of circles representing the values 5, 20, 50, 100 and 300. One is labeled “Square root scale” and the other “Perceptual scale.”]
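
Since there is no agreed formula, any code can only be a sketch with an adjustable parameter. In the hypothetical function below, an exponent of 0.5 reproduces the square root method. A slightly larger exponent (the value of about 0.57 attributed to James Flannery's experiments is often quoted, but treat it here as an assumption, not a recommendation from this text) makes the smaller circles a bit smaller in relation to the biggest one, which amounts to exaggerating the larger circles.

```python
def symbol_radius(value, biggest_value, biggest_radius, exponent=0.5):
    """Radius for `value`, relative to the biggest symbol on the map.

    exponent = 0.5 gives areas strictly proportional to the data;
    a larger exponent widens the difference between small and big symbols.
    """
    return biggest_radius * (value / biggest_value) ** exponent

for v in (5, 20, 50, 100, 300):
    square_root = symbol_radius(v, 300, 1.0)
    perceptual = symbol_radius(v, 300, 1.0, exponent=0.57)  # assumed exponent
    print(v, round(square_root, 3), round(perceptual, 3))
```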

3. Where to Place the Symbols

As a general rule, the symbol should be placed where that particular value is observed. If the location is a “point” (a town, a city), this is quite easy: just center the symbol on the point. If the symbol represents a value associated with an area, you should put it in the visual center of the region. There is an exception to this rule: when too much overlap occurs it is acceptable to displace the symbols slightly off-center.

Overlaps can be a problem: some areas are so crowded that it would be difficult to see what’s going on, and the symbols might be so big that they will hide each other. Research in visual perception has revealed that there is a direct relationship between too much overlap (“cut-out” circles) and estimation mistakes.

There are two ways of addressing this problem: use semitransparent symbols, so you can see through them, or use solid symbols and make sure that smaller symbols are always in front of larger ones. However, this might not be enough and you could be forced to scale all the symbols down to reduce overlap.

[Figure: three versions of a group of overlapping symbols, labeled NO, Better and Better.]

4. How to Design the Map Legend

Although there are no fixed rules about how to organize the legend, or key box, of a proportional symbol map, two main styles have been invented: nested (the smaller symbols are placed within the larger ones) and linear (the symbols are placed next to each other). Linear legends can have a vertical or horizontal orientation. Use as many symbols as needed, as the number of elements in the legend will depend on the range of the data, its dispersion and variance. As a general rule, though, try not to use more than three or four.

Which symbols to include in the map legend depends on the data shown. In some cases, the best option is to include the highest and lowest values and some values in between. In others, you should include values that are relevant to the information, those that you want to highlight or those that are more frequently observed in the data set. In any case, the goal of the legend should be to minimize perception errors.

When you organize the legend, remember that it is possible to classify the data using some of the techniques that we covered in the choropleth maps section.

[Figure: examples of a nested legend and of linear legends (vertical and horizontal), each showing the values 500, 300, 100 and 30, and of a legend showing classified data: 0-50, 51-150, 151-350, 351-550.]


Isopleth Maps

An isopleth map is built by connecting points where observations of equal value are located using lines, called isolines. You can think of them as choropleth maps that try to match the natural flow of smooth phenomena by using curved lines. As with choropleth maps, isopleth maps are better for displaying standardized data. The most obvious example of an isopleth is the weather map in any newspaper. The shaded areas identify regions of approximately equal temperature or of equal air pressure.

The map below is a simplified explanation of how isopleth maps are created: you locate the points of equal values and then you connect them using slightly smoothed curves.

[Figure: map with points labeled 33, 38 and 46; the points that share a value are connected by an isoline. These points identify places where the same value was observed.]

5. Multivariate Maps and Redundant Symbolization

Proportional symbol maps are extremely flexible, as they allow you to show the patterns of multiple phenomena without increasing the complexity of the display significantly. Even though these kinds of maps are called “multivariate,” the “multi” part of the word is misleading. You should never show more than two (“bivariate”) or three different kinds of data on them. Combining more data sets can be confusing:

Some cartographers have proposed the use of redundant symbols on maps. In this kind of infographic, a single attribute is represented by two visual variables: it could be circle size and circle shade, for example. The rationale of this decision is easy to see: if readers have trouble discriminating scaled symbols, especially in crowded areas, why not give them another visual clue that could organize the information better? Redundant symbols can be an interesting option when too much overlap occurs:

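[Figure: two map legends. Bivariate map: circle size shows the number of butcheries (50, 30, 20, 10, 3, 1) and circle shade shows the % of people who eat meat (dummy class labels, all reading 50-60%). Redundant symbols: circle size and shade both show the number of butcheries (50, 30, 20, 10, 3).]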


Dot Maps

You can think of dot maps as though they were pointillist paintings. Georges Seurat, the founder of Neo-Impressionism and the most famous artist who used this technique, created his work by using thousands of color dots that, on a certain level, work like the pixels in your computer monitor. When seen from a distance, the dots are not perceived as discrete units, but as a part of a pattern of color: the painting has a nice texture thanks to the dots, and the figures and shapes stand out.

In its purest form, a point in a dot map represents one observation of the phenomenon the story talks about. The point is placed in the exact place where the value is observed and the overall pattern of density is estimated by the concentration or dispersion of dots in the different areas. This kind of map is usually called one-to-one. One dot corresponds to one observation: one farm, one oil field, one restaurant, one out-of-wedlock birth. In this case, the dot should be drawn at the center of each enumeration unit (whether it is a city or a region).

In the real world, though, it is not very common to find such a clear pattern of values. In most cases, the data will be dispersed and you will need to use some kind of interpolation technique, for example triangulation, which is much simpler than its name might suggest, although it is time-consuming if you do it manually. This is how it works:

[Figure: the same set of control points (values from 14 to 53) repeated for each step below. Legend: below 24, 25, 35, 45.]

1. Locate the control points. Each point on this map shows the value associated with that area (it can be temperatures, number of cars sold per 1,000 persons, etc.).
2. Connect the control points using straight lines to create a grid between them.
3. Interpolate along the connection lines. You will have to calculate where each isoline crosses each line depending on the values used in the map legend. In this case: 25 (red), 35 (green) and 45 (blue). Each intersection point will lie closer to one end of the line or to the other, depending on the values at both ends.
4. Smooth the manually drawn lines. After this step you can add different shades of color.
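
The interpolation step is the only one that involves arithmetic, and it is just a proportion. The sketch below assumes that the value changes evenly (linearly) along the straight line between two control points; the function name `crossing_point` and the coordinates are invented for the example.

```python
def crossing_point(p1, v1, p2, v2, level):
    """Point on the segment p1-p2 where the interpolated value equals `level`.

    p1 and p2 are (x, y) positions of two control points, v1 and v2 their
    values. Returns None if the isoline does not cross this segment.
    """
    if v1 == v2 or not (min(v1, v2) <= level <= max(v1, v2)):
        return None
    t = (level - v1) / (v2 - v1)          # 0.0 at p1, 1.0 at p2
    return (p1[0] + t * (p2[0] - p1[0]),
            p1[1] + t * (p2[1] - p1[1]))

# Two control points with values 16 and 28, 12 units apart.
print(crossing_point((0, 0), 16, (12, 0), 28, 25))   # (9.0, 0.0)
print(crossing_point((0, 0), 16, (12, 0), 28, 35))   # None
```

The 25 isoline crosses three quarters of the way towards the point with the value 28, because 25 is three quarters of the way from 16 to 28.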

Georges Seurat’s Sunday Afternoon on the Island of La Grande Jatte (Art Institute of Chicago)


A common dot map

Mapmakers had no computers to help them in the 50s, when this map was produced. That meant that the cartographer had to draw every single dot, a job that requires a lot of patience indeed.

United States Census Bureau. Portfolio of U.S. Census Maps, 1950.


Dots can represent more than a value of one: each dot could be 100 farms, 300 oil fields and so on. As long as the density pattern is clear, any value is acceptable. In this case, the dot map needs a legend to explain how many units the dot represents. There are no set rules about this, but some basic guidelines have been suggested:

• Choose a value that can be easily understood: 100 is better than 113.
• Scale the dots so that they will visually blend in the region that has the highest density.

It is not mandatory for the dots to be located in the exact place where the phenomena occur in every case. It can happen that it is not clear where that place is: you know that 153 Asian restaurants are open in the state of Ohio, but you don’t know where all of them are exactly. It would be acceptable to just draw 153 dots within the boundaries of the state and do the same with the neighboring ones. In this case, dots are used as shading symbols especially suitable for showing continuous phenomena. They are used instead of the plain colors of a choropleth map, which are not able to suggest smooth transitions.

The determination of dot size is crucial for this kind of map. The dots should be neither too large (that would result in their undesired dominance) nor too small (that would make them barely visible).
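
Scattering dots inside a region is easy to sketch in code. In this simplified example the region is a plain rectangle given as (min x, min y, max x, max y); a real state boundary would need a point-in-polygon test, which is left out. The function name `scatter_dots`, the rectangle and the fixed random seed are assumptions made for illustration.

```python
import random

def scatter_dots(count, dot_value, bounds, seed=0):
    """Return random (x, y) dots inside a rectangle, one per `dot_value` units."""
    min_x, min_y, max_x, max_y = bounds
    rng = random.Random(seed)             # fixed seed: same layout every run
    num_dots = round(count / dot_value)
    return [(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
            for _ in range(num_dots)]

# 153 restaurants, one dot each, inside a 10 x 6 rectangle.
dots = scatter_dots(153, 1, (0.0, 0.0, 10.0, 6.0))
print(len(dots))                                        # 153

# 1,200 farms at 100 farms per dot.
print(len(scatter_dots(1200, 100, (0.0, 0.0, 10.0, 6.0))))   # 12
```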

[Figure: examples of dot blending, labeled “Not so good” and “Good,” and three examples of dot size, labeled NO, NO and OK.]


Cartograms

Also called value-by-area maps or diagrammatic maps. Their underlying principle is easy to understand: scale the enumeration units (remember: regions, countries, counties) proportionally to the values they are associated with. Both raw and standardized data can be encoded using this method.

Two basic types have been defined: contiguous and non-contiguous. In a contiguous cartogram, boundary relations are kept, which forces the cartographer to strongly distort the mapped areas. Contiguity is respected to make it easier for the reader to identify the enumeration units by placement.

A nice example of a contiguous cartogram is the one used by The New York Times in 2004 to show the “weight” of each state in the National Elections according to the number of electoral votes each one has. The journalist chose to represent each vote as a square. As you can see, the gross shapes of the areas are roughly respected, and they emerge by stacking squares. Our perception identifies objects using their silhouettes, so this factor is crucial in making the map understandable.

In non-contiguous cartograms, the shapes of the enumeration units are respected. The areas are separated, but the gross spatial relationships between them are roughly kept. This has a consequence: big gaps appear where the boundaries between the areas were before.
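
For a non-contiguous cartogram, the scaling can be sketched as follows. This is one possible approach, not a standard recipe: the unit with the highest value per unit of area keeps its original size and every other unit shrinks around its own center, which is why gaps open up between them. The function name and the two invented units are assumptions.

```python
import math

def linear_scale_factors(areas, values):
    """Scale factor for each unit so that its area becomes proportional to its value.

    The factor applies to width and height, hence the square root.
    """
    densities = {name: values[name] / areas[name] for name in areas}
    reference = max(densities.values())   # the densest unit keeps its size
    return {name: math.sqrt(densities[name] / reference) for name in areas}

# Two hypothetical units with the same value but very different sizes.
areas = {"A": 100.0, "B": 400.0}
values = {"A": 50.0, "B": 50.0}
print(linear_scale_factors(areas, values))   # {'A': 1.0, 'B': 0.5}
```

Unit B is scaled to half its width and height, so its area drops from 400 to 100 and matches unit A, as their equal values require.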

Flow Maps

Also called linear cartograms, flow maps have a double purpose: to show movements or interchange relationships between areas and to visually represent how many units are actually displaced or traded.

The quantitative side of these maps is represented by the use of lines of different weights: the thicker the line, the higher the value it represents.

Three types of flow maps have been described: radial, network and distributive, depending on their main characteristics. Radial flow maps have one or more focal points, and all the lines radiate from them. Network maps show the spatial relationships between a set of places (called “nodes”). The map on the left is a distributive flow chart.

Most guidelines that can be mentioned for the design of flow maps are shared with other types of maps (especially proportional symbol maps). As you can end up with a very busy display, make sure that the lines are neither so thick that they obscure the geographical data nor so thin that they are not visible. Thinner lines should be placed in front of thicker ones. The legend plays a key role in this kind of map, so keep it simple and clear.

Arrowheads are only necessary if the direction of the movement is relevant to the content of the map.
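
The "thicker line, higher value" rule, together with the advice about lines that are too thick or too thin, could be sketched like this. The function name and the default widths (in points) are arbitrary assumptions; pick limits that suit the size of your map.

```python
def line_width(value, biggest_value, max_width=12.0, min_width=0.5):
    """Line weight proportional to the value, but never thinner than min_width."""
    return max(min_width, max_width * value / biggest_value)

print(line_width(1000, 1000))   # 12.0
print(line_width(500, 1000))    # 6.0
print(line_width(10, 1000))     # 0.5 (would be 0.12 without the lower limit)
```

Keep in mind that any line drawn at the minimum width no longer represents its value exactly, so the legend should say so.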


© The New York Times. Reproduced with permission.

An Inconvenient Truth, Al Gore.


Recommended Reading

These are the books that I recommend regarding the content of chapters 5 and 6:

The classic Elements of Cartography (6th ed., John Wiley & Sons, 1995) might be overwhelming for a beginner, as it was conceived as a textbook and contains tons of information about creating first-hand maps based on aerial or satellite pictures, which is not really relevant for visual journalists; it’s more likely that you will create derivative maps, which are based on someone else’s work. It’s highly recommended reading, though.

Much more accessible for newbies is the work of John Krygier and Denis Wood, Making Maps: A Visual Guide to Map Design for GIS (The Guilford Press, 2005) which contains a huge amount of useful advice about making maps more legible and organized.

Mark Monmonier’s How to Lie with Maps (2nd ed.,The University of Chicago Press, 1996), besides being incredibly interesting for exemplifying how a visual display can be deceptive without being plainly wrong, includes three chapters that summarize cartography’s rules of thumb in a very concise way. Monmonier’s body of work is an example of how science can be made accessible to the public.

Alan M. MacEachren’s How Maps Work (The Guilford Press, 2004) is to the date, the most thorough overview of how maps are created, perceived and interpreted. Its interdisciplinary approach and the broad range of topics covered make this book a breathtaking scholarly tour de force.

Terry A. Slocum’s Thematic Cartography and Visualization (Prentice Hall, 1999) and Borden Dent’s Cartography: Thematic Map Design (5th ed., McGraw-Hill, 1998) are must-haves in any newspaper infographics department.

Recommended Websites

The University of Colorado at Boulder’s website about map projections is one of the most comprehensive sites I have ever found: http://www.colorado.edu/geography/gcraft/notes/mapproj/mapproj_f.html

Carlos A. Furuti’s website on projections: http://www.progonos.com/furuti/MapProj/Normal/TOC/cartTOC.html

Rutgers University’s Introduction to Thematic Mapping course resources: http://mapmaker.rutgers.edu/355/links.html

The Map Maker tool from Charles Sturt University (Australia) allows you to generate projections of any area of the world and save the resulting files in vector or bitmap format. Indispensable for busy journalists and designers.

http://life.csu.edu.au/cgi-bin/gis/Map

The website of the Perry-Castañeda Library at the University of Texas is one of the most important sources for copyright-free maps. In this extensive collection you can find bitmap maps (which you’ll need to trace in any vector illustration program) and PDF files. It also includes many historical maps.

http://www.lib.utexas.edu/maps/

The Geography Network is a great source for maps and data. As its name indicates, it is a “network” formed by professionals and companies devoted to cartography.

http://www.geographynetwork.com/

Places to search for data

The National Atlas - http://nationalatlas.gov/

The U.S. Census Bureau - http://www.census.gov/ It contains data that can be downloaded for use with GIS mapping software.

The U.S. Geological Survey - http://www.usgs.gov/

Some software developers

ESRI - http://www.esri.com is the best known GIS company. They own the very famous ArcGIS software.

Two overviews of the many free and open-source GIS packages available (GRASS, MapServer, Geoserver, etc.): http://freegis.org/ and http://www.maptools.org/

A generator of flow maps: http://graphics.stanford.edu/~dphan/code/flowmap/
