sds podcast episode 271: making the public … · 2019-06-19 · you just need to go to...

SDS PODCAST EPISODE 271: MAKING THE PUBLIC GRAPHICALLY LITERATE


Post on 14-Jul-2020


TRANSCRIPT



Kirill Eremenko: This is episode number 271 with the legend of visual

journalism, Alberto Cairo.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name

is Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur, and each week we bring you inspiring

people and ideas to help you build your successful

career in data science. Thanks for being here today

and now let's make the complex simple.

Kirill Eremenko: This episode is brought to you by SuperDataScience,

our online membership platform for learning data

science at any level. We've got over two and a half

thousand video tutorials, over 200 hours of content

and 30 plus courses with new courses being added on

average once per month. You can get access to all of

this today just by becoming a SuperDataScience

member. There are no strings attached. You just need to

go to superdatascience.com and sign up there, cancel at

any time. In addition with your membership, you get

access to any new courses that we release plus all the

bonuses associated with them. Of course there are

many additional features that are in place or are being

put in place as we speak, such as a Slack channel for

members where you can already today connect with

other data scientists all over the world or in your

location, and discuss different topics such as artificial

intelligence, machine learning, data science,

visualization and more, or just hang out in the pizza

room and have random chats with fellow data

scientists.

Kirill Eremenko: Also, another feature of the SuperDataScience

platform is the office hours, where every week we invite


valuable guests in the space of data science and

interrogate them about their techniques, about their

methodologies in the space of data science, and you

actually get a presentation from the guest and you get

an opportunity for Q&A at the end. In some of our

office hours, we just present some of the most valuable

techniques that our hosts think are going to be

valuable to you. All of that and more you get as part of

your membership at SuperDataScience, so don't hold

off, sign up today at www.superdatascience.com,

secure your membership and take your data science

skills to the next level.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies

and gentlemen. Super, super pumped to have you

back here on the show today because the guest for

today, I've been hunting this man down for months.

We've been inviting Alberto or trying to get a spot in

Alberto's super busy schedule for months now, and

finally it's happened. I just got off the phone with

none other than Alberto Cairo, and we had an

amazing, amazing chat about data visualization. If

you're not familiar with who Alberto is, Alberto is a

journalist, he's a speaker, an author. He's also the

Knight Chair in Visual Journalism at the University of

Miami. The Knight Chair means that he's endowed by

the Knight Foundation, which recognizes and puts

certain journalists into leading positions as tenured

professors in academia. There's only a handful of

Knight Chairs in the US, maybe a couple of dozen, and

Alberto Cairo is one of them.


Kirill Eremenko: All of these credentials should speak for themselves as

to what kind of calibre of a journalist and data

visualization expert Alberto is. He's presented at

numerous conferences and he's actually published two

books already. You might be actually familiar with

them. The first one is called The Functional Art: An

Introduction to Information Graphics and

Visualization, came out in 2012. The second one is

The Truthful Art: Data, Charts, and Maps for

Communication, came out in 2016. What's exciting is

that Alberto's third book is coming out, it's called How

Charts Lie: Getting Smarter about Visual Information.

It's coming out in October this year, October 2019,

and you can actually already pick it up on pre-order.

We talked about Alberto's book and you get some very

useful insights from this book for your visualization

practices, and also for understanding visualizations

better.

Kirill Eremenko: Plus, we talked about plenty of other things on this

podcast. Here's a couple of teasers of what you're

about to experience. Why do people misinterpret

visualizations? Simpson's paradox, the ecological

fallacy, four kinds of literacy, being conscious about

visualizations, exploratory data analysis versus

communicating results, how to design effective

visualizations, and ethics in data visualization. Those

are just a few topics that we touched on. As you can

imagine, it's going to be a value-packed podcast.

Without further ado, I bring you Alberto Cairo, the

legend of data visualization.


Kirill Eremenko: Welcome back to the SuperDataScience podcast, ladies

and gentlemen. Today I'm super excited because I've

got a legendary guest on the show, Alberto Cairo

calling in from Miami. Alberto, how are you going

today?

Alberto Cairo: Hey, doing good. How are you?

Kirill Eremenko: I'm doing very well, and super pumped to be talking to

you. I watched your presentation at Microsoft

yesterday as we were chatting just before the podcast,

and my God, you have some very interesting

approaches to visualization. I'm very excited to dig into

these today.

Alberto Cairo: Likewise. Thanks for having me.

Kirill Eremenko: Yeah. No, pleasure's mine. How is Miami this time of

the year? I saw on your Twitter feed that you're

spending ... you're finally taking some time away from

all the presentations and conferences, and I guess

spend some time with family. Are you looking forward

to that? How's that going to be?

Alberto Cairo: Oh yeah, I'm so super looking forward to that. One

thing that I usually joke about Miami is that I am

originally from Spain, from Northwestern Spain, a

region called Galicia, and Galicia is very rainy and

dark and windy and cold, and Miami can be rainy

sometimes particularly during the summer because

clouds build up during the day and you get a

downpour at the end of the day, but most of the time

it is warm and sunny. I got used to this weather very

quickly and I love it here, and I'm looking forward to

those three months of staying at home, no trouble. But


I will have tons of work. I mean I'm not planning to

basically rest, so I will be working on tons of stuff. It's

only that I can do it in my backyard, next to the swing,

which is the luxury that I have.

Kirill Eremenko: Yeah, no, that's very exciting. But it doesn't get too hot

in ... I've only been in Miami briefly and then I went to

Florida Keys. I was wondering, it doesn't get too hot?

Because in Spain, for instance, in summer, last year, I

think it was like 37 degrees Celsius or something like

that.

Alberto Cairo: Oh yeah. If you go to the south of Spain, you can get

to 40 degrees Celsius or even more, 40, 42. Miami

doesn't get that warm. However, what happens is that

you have crazy humidity. You need to hydrate all day

basically. But if you do that, you're fine. I mean, if you

always carry water with you, which is advisable, then

you're fine. But you need to like this kind of weather. I

mean, if you are a cold weather person, you will suffer

mightily, mightily here. But I'm a warm weather

person, so I really enjoy Miami.

Kirill Eremenko: Yeah, yeah. I understand. Indeed, it's really humid. As

soon as you get off the plane, you start sweating like

crazy.

Alberto Cairo: Yeah. Exactly, yeah.

Kirill Eremenko: Which part of Miami?

Alberto Cairo: I live in a neighborhood called Kendall, which is in

Southwestern Miami. I am not close to the coast, to

Miami beach. I'm closer to the Everglades, which is the

large natural park, the swamp. It's over here. I usually


joke that I'm closer to the alligators than I am to the

dolphins.

Kirill Eremenko: Or the sharks.

Alberto Cairo: Or the sharks, yes.

Kirill Eremenko: Okay, got you. Okay. Well, very cool. Very excited for

your time in Miami for the next few months to have a

rest. A well deserved rest because as we were chatting

before the podcast, you've got your third book coming

out in October. Once that happens, you're going to be

on the move going to conferences pretty much every

day. As you said, you can see it as a problem or as a

huge opportunity.

Alberto Cairo: Yeah. It's a problem-

Kirill Eremenko: How are you feeling?

Alberto Cairo: Yeah, it's a problem or an opportunity. Yeah, the book

that comes out in October, it's actually my first book

for the general public. The title is How Charts Lie,

although perhaps a more appropriate title would be

how we lie to ourselves with charts. The way that it is

written, it's very informal, very nontechnical. It's an

introduction to how to become a better reader of

charts. Not a better designer, but a better reader

because it's for the general public, it's not for

designers. It's how to correctly interpret all the line

charts and bar graphs and data maps that we see

every day in social media and the news media, how to

extract the right meaning from them. I don't know.

Perhaps it will ... I don't know. It will sell well, it will

attract lots of attention. Who knows? Yeah. I already


have several speaking engagements lined up for the

fall in relationship to it, just to help with the

promotional efforts.

Kirill Eremenko: No, that's very exciting, and totally agree that a book

for the general public, well, especially from somebody

of your level in the space of visualization, it's

necessary because there's people who want to hear

from you, but maybe they're not technical, they don't

have the technical background to understand certain

concepts or to keep up with certain concepts. A book

for the general public I think is a great idea. What are

some of the main things that you cover off in this

book? What are some of the main themes?

Alberto Cairo: Yeah. What I did in the new book was to basically ask

myself, if I had not learned anything myself, about

data visualization by studying or practicing it, what

are the most elementary skills or pieces of knowledge

that I need to have in order to be a critical, not

designer, but a critical reader of these kinds of

products in news media? Right? Obviously, I cover

things such as the main principles of data

visualization that you can read about in any more

technical books, like the ones that I wrote in the past,

such as the Truthful Art, for example, right?

Alberto Cairo: Principles such as visual encoding, what is visual

encoding? Right? Visual encoding basically is getting

your data and then mapping your data onto objects,

and then changing some properties of those objects in

proportion to the data that you're trying to represent.

It could be the length of the object or the height of the


object or the color of the object and so on and so forth.

Those properties we call them encodings. Right?
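
[Editor's sketch] The idea of visual encoding described above, mapping data onto objects and varying a property of each object in proportion to the data, can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `encode_as_lengths`, the `max_width` parameter, and the three data values are hypothetical and do not come from the episode or from any of the books mentioned. The encoding chosen here is length, drawn as a row of `#` characters.

```python
def encode_as_lengths(values, max_width=40):
    """Map each value to a bar length proportional to that value."""
    largest = max(values.values())
    return {label: round(value / largest * max_width)
            for label, value in values.items()}


# Hypothetical data, invented for illustration only.
data = {"Region A": 10, "Region B": 20, "Region C": 40}
bar_lengths = encode_as_lengths(data)

for label, length in bar_lengths.items():
    # The bar's length is the encoding; the trailing number is an annotation.
    print(f"{label} {'#' * length} {data[label]}")
```

Swapping length for another property, such as height or color intensity, would change the encoding while leaving the underlying data untouched.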

Kirill Eremenko: Mm-hmm (affirmative)

Alberto Cairo: In the past I taught these skills to people who wanted

to work in data visualization. What I do in the new

book is to try to explain these very elementary

principles to people who are not going to be graphic

designers or visualization designers or data scientists,

but who are going to be consumers of those kinds of

products. So they need to be prepared to read them

correctly, and in order to read them correctly, you

need to understand data visualization at the

symbolical level, so understanding the principle of

mapping data onto objects at the grammatical level,

meaning that you need to learn about encodings. In

the third level, which is the core of the book actually,

it's the semantics level.

Alberto Cairo: Once you are able to understand the mechanics of a

graphic, how to read it, right? Then you need to be

able to interpret it, right? It's at the semantics level.

What is the information that that graphic is carrying,

how to extract the right insights, or the right

inferences from the chart that you are seeing. I think

that these skills are of great value for anybody.

Right? The problem is that the literature about data

visualization, and this includes my own previous

books, they are aimed too much at people who want to

specialize in the field. We don't really share some

knowledge, right? We have basically the same, similar

levels of knowledge. Right? There are challenges that

... Basically what is happening is that there is an


increase in the sophistication that visualization

designers have, but there is not the same increase in

sophistication in the readers who consume these types

of data visualizations, right?

Alberto Cairo: There's a growing gap between, let's say, the

communities made of visualization designers, data

scientists, statisticians, et cetera. We are developing

new methods every day, we are making all these fields

advance very quickly and improve very quickly and

create new tools and so on and so forth, but the

general public is falling behind, right? My interest in

the past few years has been, how can we help the

general public bring themselves up to speed with all

these new techniques? Obviously, I cannot write about

data science. I'm not a statistician, I'm not a data

scientist, but I'm a visualization designer. I asked

myself, what can I do to help my dad, for example,

who's a medical doctor, not trained in statistics, not

trained in data visualization, what can I do to help my

dad bring himself up to speed with data visualization?

Alberto Cairo: I wrote the book that way. If I had to explain to a

nontechnical person what data visualization is about,

why it is so important, why it can be so powerful, but

at the same time how dangerous it can be as well, if

you don't use it correctly. How would I write that

book? That's the frame of mind that I put myself into

to write this new book.

Kirill Eremenko: I totally understand. I like how you say in your talks,

that good data visualizations have two really powerful

qualities, that they're persuasive, and they're memorable,

right? If you see a good visualization, you not only

understand, hopefully understand properly if it's

communicated properly, what the

underlying insights are, but also you're able to

memorize it because it's an image and you can see it in

your head and you can maybe describe those insights

later to somebody else. I think perhaps those are the

two reasons why more and more publications, such as

the Wall Street Journal, New York Times and so on,

they're moving to visualization.

Kirill Eremenko: The amount of infographics and visual

representations of information, whether it's about

elections or about population statistics or about crime

rates and things like that. The amount of infographics

out there is crazy, and now they're getting interactive

and they're getting more and more exciting and

interesting on these publications. That's very

interesting.

Alberto Cairo: There is a reason. There is a reason for this increase,

which is that if you ask people who work in data

journalism departments or graphics departments in

news publications such as the ones that you

mentioned, Wall Street Journal or 538 or the New York

Times or ProPublica or many others, the Financial

Times, all of these publications are considered the gold

standard in using data visualization in the news. They

will all tell you the same thing, which is that if a

data visualization is well designed, and it covers a

topic that the public is interested in obviously, it will

become extremely, extremely, extremely popular. I

mean, some of the most popular pieces of content


published in the past decade by some of these media

publications have been data visualization.

Alberto Cairo: The most popular, and this is a factoid that I usually

talk about in some of the talks in relationship to the

new book, How Charts Lie, one of the things I say is

that the most popular piece of content ever published

by the newyorktimes.com, the New York Times is the

most important, most serious newspaper in the United

States, and one of the most important newspapers in

the world, the most popular piece of content ever

published by the New York Times online is a data

visualization.

Kirill Eremenko: Oh wow.

Alberto Cairo: It's a data visualization that is commonly called ...

Yeah, it's commonly called the dialect map. You can

Google it up. The dialect map, New York Times. The

actual title is How You, Y'all and Youse Guys Talk, or

something like that. I don't remember exactly what the

title is, but everybody knows it as the dialect map.

Basically it's a tool that asks you several questions.

How do you pronounce this word in English? Or how

do you refer to this particular phenomenon or this

particular animal in English? What word do you use

for that? Based on your responses to the questions

that are posed to you, basically what you start seeing

is a bunch of maps that predict where you'll probably

live or where you're from, right? Based on some of

your-

Kirill Eremenko: In the United States, right?


Alberto Cairo: In the United States, although recently they created a

version for the United Kingdom.

Kirill Eremenko: Oh wow.

Alberto Cairo: Yeah. It's a lot of fun. That project is the ... The

reasons why this project is so popular or viral it has to

do with how interesting the topic is, but also because

it's a visually ... it's a visual tool, right? And it is so

well designed and so well done, and it's the most

popular piece of content ever published by the

Newyorktimes.com.

Kirill Eremenko: Why would you say people like visualization so much?

Alberto Cairo: Well, I mean it appeals to us, visualization, because

first of all, it's visual and we are visual creatures. We

prefer to see things rather than to read things. We've

basically evolved to be visual creatures. I mean, a huge

part of our brain is devoted to processing visual

information. Then another virtue of data

visualization is that, as I mentioned before, I mean, it's

persuasive and it's memorable when it is very well

designed. The way that I usually put this in talks and

in the new book is that if I did a visualization which

was well designed and it reveals certain insights

coming from the data, once you see those insights, you

cannot unsee them anymore. Basically they stick to

your brain. It's like they are very memorable. That's

another reason. Visualization is much more

memorable if it is well designed, right? Sometimes

than text alone, right?

Alberto Cairo: By the way, visualization is not just visualizing things,

visualization is very often the combination of visuals


with words that supplement those visuals, right? The

best data visualizations are usually combinations of

visual objects with words that reinforce each other. We

call it the annotation layer in the world of data

visualization. They are also beautiful objects, right?

We human beings like beauty, beautiful, to see

beautiful objects and enjoy, right? Good visualizations

are highly enjoyable. There's maybe a bunch of

reasons. This may be just some of them, a few of them.

Kirill Eremenko: They trigger an emotion, right? Like that example-

Alberto Cairo: Yeah, yeah, absolutely.

Kirill Eremenko: ... that you gave about the hockey stick, right?

Alberto Cairo: They can be joyful, right? As very common ... we say

commonly these days, they may spark joy, right?

Kirill Eremenko: Yeah.

Alberto Cairo: They-

Kirill Eremenko: Or they can terrify, right? You have that-

Alberto Cairo: They can terrify you. I mean, they can terrify you, they

can surprise you, they can ... I don't know. They can

be emotional. The same way that a good text can be,

right? Texts can also elicit emotion sometimes, but

there is something more visceral, something more

direct in the use of visual objects to do that.

Kirill Eremenko: Yeah. Therefore, because a visual creates this imprint

and creates ... As you said, if you see it, you cannot

unsee it, it's a bit dangerous or sometimes sad when

visuals are, as you put it, either misused or

misinterpreted, and people see the wrong thing or are


shown the wrong thing and therefore now they cannot

unsee the wrong thing, and that creates a whole

[crosstalk 00:20:46]

Alberto Cairo: It's so persuasive, so powerful that they can overpower

you, all right? They can basically become a means that

controls your thoughts. That's a whole other reason

why I wrote this new book, right? To basically warn

people about how careful we need to be when reading

visualizations, right? There are many examples of that.

One example that I have in the book is ... which I use

by the way to explain one of the core principles of

reading data visualization, which is that when you see

a data visualization, one of the key things that you

need to do is to come up with the right description of

what you are seeing, right? I do this with a scatter plot,

which I borrow from a friend of mine, Heather Krause,

who is a statistician. It's a scatter plot that shows the

positive association, a positive correlation, between

cigarette consumption and life expectancy, country by

country. When you take a look at the country level

data, the association between cigarette consumption

per capita and life expectancy is positive, right?

Kirill Eremenko: Wow.

Alberto Cairo: Imagine this scatter plot. Now the way that I, that you

would describe, that we commonly describe that kind

of chart, and I know this because I have done this

myself, is to say if you see the x-axis, cigarette

consumption per capita and the y axis, the vertical

axis, life expectancy per capita, and you see that one

of them is positively correlated with the other, the way

that we usually describe that kind of chart is, the more


cigarettes we consume the longer we live, right? But

that is not the right description. If you describe the

chart like that, you are biasing people's perception of

that chart and you are biasing your own perception of

that chart. Because what you're only maybe

considering is that you're looking at the data

aggregated at the national level, and that can be very

dangerous because it could be an example of

Simpson's paradox, right?

Alberto Cairo: The phenomenon that data that gets aggregated at

a certain level may display patterns that may disappear

or reverse completely once you disaggregate the data

at lower levels of aggregation. It's a perfect example to

explain these phenomena. I do this in the book.

Because once you disaggregate the data at the regional

level, at the local level, and you go down to the

individual level, you will see that the relationship that

was positive before, more cigarettes more life

expectancy, reverses completely; more cigarettes, less

life expectancy. Why the reversal? The reversal is

related to wealth, right? The wealthier a country is, the

more cigarettes people in that country can consume.

The wealthier a country is, the more cigarettes per

capita you have. But at the same time, the wealthier a

country is, the higher the life expectancy is as well,

because people can pay for better health care, right?

Alberto Cairo: Basically what you're seeing there is a spurious

correlation between the ... Well, it's not really

spurious. The correlation really exists, but it only

exists at the national level, not at the individual level,

which is the level that you are interested in. If you


want to know, for example, whether smoking cigarettes

is good for you, you should not look at data at the

national level because the correlation that you see at

the national level may not reproduce at the individual

level.
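
[Editor's sketch] The reversal described above can be reproduced with a small synthetic data set. This is a minimal sketch, not the data behind the chart discussed in the episode: every number is invented, and the names `pearson`, `people`, `national_r`, and `within_r` are hypothetical. The assumption built into the data is that wealth raises both average cigarette consumption and average life expectancy across countries, while inside each country the people who smoke more live shorter lives.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic individuals: (wealth level of the country, cigarettes, lifespan).
# Invented for illustration; wealth acts as the confounding variable.
people = []
for wealth in range(1, 6):
    for deviation in (-2, -1, 0, 1, 2):
        cigarettes = 3 * wealth + deviation      # richer countries smoke more
        lifespan = 60 + 5 * wealth - 2 * deviation  # heavier smokers live less
        people.append((wealth, cigarettes, lifespan))

# National level: one point per country (the country averages).
countries = sorted({w for w, _, _ in people})
avg_cigs = [sum(c for w, c, _ in people if w == k) / 5 for k in countries]
avg_life = [sum(l for w, _, l in people if w == k) / 5 for k in countries]
national_r = pearson(avg_cigs, avg_life)

# Individual level, looking inside each country separately.
within_r = {
    k: pearson([c for w, c, _ in people if w == k],
               [l for w, _, l in people if w == k])
    for k in countries
}

print("national-level correlation:", round(national_r, 2))
print("within-country correlations:",
      {k: round(r, 2) for k, r in within_r.items()})
```

In this toy setup the country averages line up with a strongly positive correlation, while the correlation inside every single country is strongly negative, which is the kind of pattern that makes it unsafe to infer anything about individuals from nationally aggregated data.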

Kirill Eremenko: Got you. At the national level, every point on the chart

is a country, at the individual level-

Alberto Cairo: Exactly. Every [inaudible 00:24:10] at individual level.

So the x-axis is cigarette consumption, the y-axis will

be life expectancy, and you just see a positive association.

The more cigarette ... the bigger the cigarette

consumption is, the further to the right a point is, the

further up the point needs to be as well.

Kirill Eremenko: Yeah, yeah. No, that's very interesting. Or that other ...

There was another example in one of your talks I had

in my mind just now, that had the same thing that if

you ... it depends on how you interpret it, right? How

you ... Oh, the chocolate and Nobel prize winners. That's

the example.

Alberto Cairo: Yeah, the chocolate and [crosstalk 00:24:51]

Kirill Eremenko: Can you tell us about that?

Alberto Cairo: Right.

Kirill Eremenko: I love that example.

Alberto Cairo: Yeah, that's an example that I don't use in How Charts

Lie. I use it in the previous book, in The Truthful Art.

Basically it's like, if you take a look at a scatter plot,

it's very similar. Imagine a scatter plot at the

national level, each dot is a country. Then on the

x-axis, you plot chocolate consumption per capita. So

the farther to the right a country is, the more

chocolate per capita that country consumes, and then

the further up on the y-scale, on the vertical axis, the

larger the number of Nobel prizes per ten million

people you have. There is a very strong positive

association, a correlation. It's linear. It's a linear

association between chocolate consumption per capita

and Nobel prizes per capita ... per ten million people,

right? The more chocolate consumption ... The bigger

the chocolate consumption, the bigger, the larger the

number of Nobel prizes.

Alberto Cairo: But obviously you cannot infer that there's a

relationship between those two things. That's the first

thing, right? The classic correlation is not causation,

right? But we need to go beyond that, right? The

correlation is not causation is a mantra that we have

been repeating for decades now, and it's basic

knowledge, it's an elementary knowledge. We need to

keep repeating it because it's very easy to infer

causation based on some mere correlation, but we

need to go beyond that, and that's what I need to ... I

try to do in the new book; explaining concepts again

such as Simpson's paradox or the ecological fallacy,

right? The ecological fallacy being inferring

something about yourself, for example, based on data

that is aggregated at the national level or the regional

level, right?

Alberto Cairo: You cannot infer something about yourself, whether

cigarette consumption is good for you, right?

Individually, based on data that you're seeing at the


national level, right? Because there may be

confounding variables that you're not taking into

consideration, for example, wealth in this particular

case. I am emphasizing all of these examples so much

in our conversation today and also in the new book,

because this is a mistake that I have made myself,

because I was careless about data, right? Describing

the cigarette consumption chart or life expectancy

chart as the more we smoke, the longer we live. Well,

that's not true. The way to describe a scatter plot

showing the positive correlation between cigarette

consumption and life expectancy would be to say that

there is a positive association between cigarette

consumption and life expectancy, but that doesn't

mean that one of the variables causes the other, and

this relationship may disappear once we start

disaggregating the data.

Alberto Cairo: We need to warn people about these kinds of

phenomena when we present it to them. At the same

time, a reader of charts needs to be prepared not to just

look at the graphic and move away, but to read the

graphic carefully and think about the chart because if

you don't pay attention to the chart, right, you will

probably be misled by the chart, you will extract the

wrong inferences from it. Charts, maps, graphs, et

cetera, they are not meant to be seen, they are meant

to be read like a piece of text. You need to read them

and think about them carefully. Right? Otherwise, you

would probably be misled by them.

Kirill Eremenko: Got you, and I really like what you say about why

people misinterpret charts and how we can ... what is

missing in that puzzle. When you talk about the four

kinds of literacy, so the normal literacy as in reading,

the one we're used to-

Alberto Cairo: Reading and writing.

Kirill Eremenko: Yeah. Articulacy, numeracy-

Alberto Cairo: Articulacy.

Kirill Eremenko: Numeracy and the graphicacy. Do you mind telling us

a bit about those?

Alberto Cairo: Sure. Sure, sure.

Kirill Eremenko: What are the last two?

Alberto Cairo: Yeah. These are not terms that I have invented. They

have been around for many, many years. I learned

about all these in books such as Innumeracy, which is

a very famous book about how to interpret numbers

correctly, and also a book called Mapping It Out, by a

cartographer called Mark Monmonier. In Mapping It

Out, Monmonier says that, and I agree with that, that

in order to consider ourselves educated citizens

nowadays, we need to be able to do more than just

merely read and write. That's basic literacy, right? We

need that obviously.

Alberto Cairo: We cannot abandon that obviously. But we also need

articulacy, which is the ability to express ourselves

correctly through spoken words. On top of that, we

need numeracy. Numeracy is basically the elementary

skill, being able to think critically about numbers. I

usually equate it, compare it to some sort of sixth

sense in the back of your brain, that it starts ringing

when you see a number in news media that doesn't

sound right.

Kirill Eremenko: Like a BS meter, a bullshit meter.

Alberto Cairo: Yes. Yeah, but it's not conscious. It's sort of a sixth

sense, that you see a number in the media and say,

"There is something dubious about this number.

There's something wrong about it. I don't know what it

is, but it doesn't sound right." That's numeracy at

work. Numeracy is a skill that can be developed. You

can be educated in that, right? You don't need to

become a statistician or data scientist to have

elementary numeracy. Right? Obviously if you want to

become really, really numerate, it is better if you

formally study statistics and data science. But I've

come to believe that any regular citizen, like myself,

I'm not a statistician, I'm a journalist and a graphic

designer. I have come to believe that any citizen can

educate themselves, herself or himself in basic

numeracy.

Alberto Cairo: Then on top of that, you have graphicacy, which is

graphical literacy, right? The ability to interpret, to

read and interpret correctly maps and charts and

graphs and any sort of visual that represents the

numbers, right? How to extract the right meaning from

them, and it all begins with attention. You need to

basically put yourself in the frame of mind that says

that what you're seeing is not an illustration, it is a

visual argument. In order to understand that visual

argument, you need to pay attention to it, right? Then

you need to apply some elementary principles of chart

reading that I explain in the book and in talks, et

cetera, such as don't read too much into a chart. A

chart shows only what it shows and nothing else,

right? Because we tend to project what we want to

believe onto the charts that we see every day in news

media and that's very, very dangerous, right?

Alberto Cairo: Double check the sources. Where did the data come

from? Right? You need to ask yourself whether the

numbers that are displayed on the chart are

measuring what they say that they are measuring.

This is another critical thing to do sometimes, right?

So is it measuring the right thing, and what methods

were used to measure these particular phenomena?

Right? These things don't take longer than five or 10

minutes, and they can take you a long way to avoid

most of the cases in which you can be misled by a

chart that you see in news media.

Kirill Eremenko: Yeah. I really liked your principles of graphical

literacy. Definitely, it's not something that is taught

at school. If you don't mind, let's go over them. I think

listeners will get a lot of value. Maybe starting with the

foundational one that you call number zero: is your

data measuring what you think it is measuring?

Alberto Cairo: Yeah, [crosstalk 00:32:14] measuring what you think

they're measuring. Yes.

Kirill Eremenko: That's a very important question, right? Have you seen

examples of when charts are created-

Alberto Cairo: Oh, yeah.

Kirill Eremenko: ... with the wrong data?

Alberto Cairo: Yeah, I have seen. I have seen examples of charts

measuring the wrong thing and saying that they are

measuring the right thing. Yeah. I don't know. But, for

example ... I don't know, not adjusting for inflation, for

example. Right? How many times have we seen stories

in news media saying, "The latest Marvel movie, the

latest superhero movie, is the highest grossing movie

of all time." Right? Then you take a look at the data

and you realize that the data is not adjusted for

inflation. That statement is not true obviously,

because you're basically using the absolute values,

when you should be using the adjusted values in order

to make that comparison. That happens all the time,

and sometimes we don't pay enough attention, and

therefore we are misled by those charts. Right? I have

plenty of examples of this in the book. The one that is

most popular with people at conferences is that I

once saw a map displaying the number of heavy metal

bands all over Europe-

Kirill Eremenko: Oh, yes. That one.

Alberto Cairo: Yeah, you saw that in the talk. That's a good chart by

the way. It's not a bad chart. But it's an example of

how to double check the source, because I actually

double checked the source in that particular case,

because when I saw the map, number of heavy metal

bands per million people per country, I asked myself,

"Well, what is the source of this chart calling heavy

metal? Are all the bands they are counting really

heavy metal, or do they belong to other musical genres,

et cetera?" Before tweeting the map and popularizing

the map on social media, I actually went to the

source and made sure that they are actually counting

... that they had a more or less strict definition of what

heavy metal is. Obviously it's very hard to define, but

you can set some boundaries in there and basically

assess whether they are counting heavy metal bands,

or they are also including ... I don't know, pop or rock

bands or hard rock bands that are not really, really

heavy metal.

Alberto Cairo: I took a look at the source. I use these fun examples

in talks and in the books to explain to people how

important it is to spend at least one minute or a

couple of minutes double checking that, verifying that,

before you put that chart that you have seen in social

media in your own feed, for example. Because the

chart may be wrong, and if the chart is wrong, then

what you're doing is spreading misinformation, right?

We should ... We all have a responsibility as citizens

not to spread misinformation, or at least try not to

spread misinformation. We all make mistakes, right?

We all spread misinformation, but if we only spent one

minute or two thinking about what we are seeing, it

will be less likely that we will spread misinformation

among our peers, or family or friends in social media.
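
To make the earlier box-office example concrete, here is a rough Python sketch of adjusting for inflation before comparing. The film names, grosses, and price-index values are all made up for illustration; a real comparison would use a published price index.

```python
# Hypothetical price index (base year 2019 = 100); not real CPI figures.
PRICE_INDEX = {1980: 40.0, 2019: 100.0}

# Hypothetical films: (title, release year, nominal gross in millions).
FILMS = [
    ("Older film", 1980, 400.0),
    ("Newer film", 2019, 850.0),
]


def adjust_for_inflation(nominal, year, base_year=2019):
    """Convert a nominal amount from `year` into `base_year` money."""
    return nominal * PRICE_INDEX[base_year] / PRICE_INDEX[year]


# Ranking by raw (nominal) values versus ranking by adjusted values.
top_nominal = max(FILMS, key=lambda film: film[2])
top_adjusted = max(FILMS, key=lambda film: adjust_for_inflation(film[2], film[1]))

print("Highest nominal gross:", top_nominal[0])
print("Highest inflation-adjusted gross:", top_adjusted[0])
```

With these invented numbers the two rankings disagree, which is the situation in which a "highest grossing of all time" headline based on absolute values would mislead.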

Kirill Eremenko: Yeah. That's a common problem these days in the

world we live in, where people just catch onto

something they hear and they start spreading it. It's

very evident, for instance, in the political space where

something happens and people think it's really bad,

they start spreading, and they don't know the full

story, they don't know what actually happened. Then

when the full story emerges, it is completely different,

and by then all this defamation has already

happened. People are calling each other-

Alberto Cairo: Look, it happens to all of us. This is something that I

make very clear in talks and in the book, it happens to

all of us. It has happened to me, it will keep happening

to me in the future. However, it is less likely that it will

happen to me today than it was, say, five years ago or

10 years ago. Right? It was more likely before just

because now I'm a little bit more conscious about how

I consume media, how prone I am to be misled by

numbers or by stories or by charts. I try to be a little

bit more careful, and if we all try to be a little bit more

careful, we would not be able to avoid 100% of

problems or cases in which we may be misled by a

number or by a chart, but if we only avoid, say half of

them, that means half less misinformation around

there, right?

Kirill Eremenko: Yeah. With the hard rock bands, as far as I remember

from your talk, they had Bon Jovi in that ...

Alberto Cairo: No, they didn't. No, they didn't. That's the key thing.

That's what I explained in the talk and also in the

book, that the reason why I double checked the source

of that chart is that if you look into the literature

about the history of heavy metal or even if you go to

the Wikipedia page about heavy metal, you will see

that there are some bands that are mentioned in there

whose status as heavy metal is a little bit dubious.

For example, I think that the Wikipedia page mentions

Poison, which is a glam rock band from the '80s and

'90s.

Alberto Cairo: I doubt that that band can be really called heavy

metal. It's like if you ... I mean, heavy metal, what is ...

Heavy metal is Metallica, or is a ...I don't know, Slayer

or Judas Priest and all these bands, or Iron Maiden,

right? Poison is a fine rock band, but it's certainly not

heavy metal, I would say. They don't mention Bon

Jovi. None of these bands that I have sometimes seen

being categorized as heavy metal appear in

the source. I mean, the source only counts the

subgenres of heavy metal.

Kirill Eremenko: Yeah. I guess that's your journalistic investigative

mind. It's interesting to see you coming from a

journalism background because then you can apply

this curiosity, this investigative approach to digging in

and being ... double checking all the facts. How would

you say that somebody can just develop that without

being a journalist, without the background that you

have?

Alberto Cairo: Through practice. It's also practice. As I said before, I

mean, I am a little bit better at doing this today than I

was say 10 years ago. The way that I wrote both How

Charts Lie and my previous book, The Truthful Art,

was trying to remember how I was 10 years ago or 15

years ago. What didn't I know 10 or 15 years ago that I

should have known? I try to basically summarize all

that into some key principles. Take a look at the

source, ask yourself whether the source is counting

what it says it is counting, make sure that

the data is displayed on correct scales, that they are

not distorting the scales of the chart. Ask yourself

whether the chart that you're seeing is showing

sufficient or insufficient information, right?

Alberto Cairo: Is it showing the right amount of detail in order for you

to figure out what's going on, right? Try not to

project your own beliefs onto the chart that you are

seeing because a chart shows only what it shows and

nothing else. Be really, really careful because we are

prone, all prone to doing that, right? Try to curb your

own impulses a little bit to see your own views

confirmed by the data that you are seeing. Take a look

at whether the patterns that you are ... that the chart

is displaying are really there or not, right? You'll ask

yourself, be a little bit more attentive. Only by doing

that, as I said before, you will not be able to avoid all

cases in which you may be misled by a chart, but you

will avoid many, and by doing that you will become a

better chart reader.

Kirill Eremenko: Or creator, right? That's-

Alberto Cairo: Or a creator, right.

Kirill Eremenko: ... very important as well.

Alberto Cairo: Yeah. Yeah. It's very important as well, because many

of these problems or many of the mistakes that we

make when reading charts, they're very common, even

among practitioners like myself, like journalists or

graphic designers, et cetera, that sometimes we are a

little bit careless with the data that we handle. I speak

based on my own experience. I mean, I take a look

back, 10, 15 years ago and I see some charts that if it

were today, I would have never had [inaudible

00:40:23] such as pie charts in 3D and with shadows

and shades or highlights and things like that that

totally distorted the data, or the scatter plot that I

mentioned before, which I described as "the more

cigarettes we consume, the longer we live." But no,

that's the wrong description for that chart. That's not

how to describe that chart, because that's not what

the chart is showing and so on and so forth.

Kirill Eremenko: Got you. I'll probably jump here to your fifth

principle of graphical literacy because it fits in really

well. When you build visualizations, you recommend

building narratives and using text. Specifically, I

really liked what you said about beginning with the text:

rather than just starting to throw a visualization

together, once you know what you want to display,

think of a long sentence that will describe the

visualization, and then break it down into pieces and

visualize that. Could you tell us a bit more about this

approach, please?

Alberto Cairo: Sure. Sure, sure. But before I do that, I need to also

emphasize that visualization can be used with multiple

purposes in mind. When you take a look, for example,

at the classical data science cycle diagram, right?

That you can read about in books such as Hadley

Wickham's R for Data Science and many others.

Visualization comes in two different steps in that cycle,

because visualization can be used to either explore

data and discover things from the data, and we call

that exploratory data analysis, obviously. Right? It can

also be used to communicate your findings, right?

What I specialize in is the second use of

visualization. I'm not an expert in exploratory data

analysis, right? There are many people who work in

these fields, people who work in scientific

visualization, and in data science, and specialize in

visualization for exploration.

Alberto Cairo: What I specialize in is helping scientists and other

kinds of experts communicate their results. When

you already know what you want to say, once you have

come up with the conclusions of your study, and you

want to communicate those conclusions, how do you do

it? Then when I teach these principles to specialists, I

describe that technique that you have just mentioned,

which is a little trick that I learned throughout the

years, to never begin with the visualization itself but

always begin with a very long description of what you

want to say, right? An elevator speech, or what you

want to describe.

Alberto Cairo: This is not a technique that I have invented. I need to

credit the sources for this technique because I

shamelessly stole it from some friends of mine. I heard

about this technique from Juan Velasco, who used to

be the graphics director at National Geographic

Magazine, he's a friend of mine, and also Javier

Zarracina, who is the graphics director at vox.com,

both long time visualization designers. Very, very

talented, very nice people.

Kirill Eremenko: Both from Spain, right?

Alberto Cairo: Both from Spain, yeah. There's some sort of Spanish

Mafia in the world of visualization in journalism. They

are both from Spain, yes. Anyway, they both described

this technique one day in a conference that I attended,

a couple of conferences that I attended, and it all

begins by writing a very long sentence of what you

want to say. What is the story? What is the narrative

that you're trying to convey? Right? Begin always with

that, begin with a very long sentence. Oh, my study

focuses on this and that, I discovered this and that, the

exceptions are this and that, the limitations are this

and that, and you write a very long sentence about

that, and my conclusions are such and such, and

possible alternative explanations may be such and

such.

Alberto Cairo: You begin with a very long sentence, and then what

you do is to split up that sentence into its natural

components. You try to find the natural breaks in that

sentence, and then you split it up into four, five, six

different components. Each one of those components

may become the headline of a different section in your

visualization or in your scientific poster or in your

whatever it is that you're writing, your article, right?

Those will be the main themes, the main topics in your

design, and they may become the titles of the sections

for your design. Then what you do is to design the

visualizations that support the assertions that you're

making in those pieces of the sentence, right? You put

your visualizations underneath each one of the pieces

of the sentence. By doing that, you're basically, first of

all, providing the elevator speech itself.

Alberto Cairo: If people don't want to really dig very deeply into your

visualizations, they can still read the long sentence

because the long sentence is, after all, the headlines

of your sections, so they can get away ... they can

just read that, right? And get the gist of your story.

But then, if they want to really double check whether

what you are saying is right or not, they can take a

look at your visualizations, at your charts or graphs,

your maps, whatever visualizations that you're

designing.

Kirill Eremenko: That's a very powerful approach. On top of that, I

would like to probably talk a little bit about building

narrative into visualization. In this day and age, one

thing is just to create one image, which can be very

useful and insightful, but sometimes and more often

we see these infographics that combine multiple

images and a whole story behind them. In one of your

talks, I really enjoyed that whole story you built

around the population of Brazil as you were doing

some research or visualization on how the population

of Brazil has changed from 2000 to 2010. But then

once you added additional charts about the fertility

rate, you were able to tell a much clearer story. If you

don't mind, could you tell us a bit about that and how

that played out and the whole thing-

Alberto Cairo: Yes.

Kirill Eremenko: ... behind that?

Alberto Cairo: Yeah, [inaudible 00:46:15] It's actually quite weird to

do a podcast about visualization because you need to

verbally describe the chart. But this is an example that

appears in my first book, The Functional Art, and it's a

story that I published when I was working for a media

organization in Brazil. I lived in Brazil for a few years.

We published this very large poster about population

pattern changes in Brazil. It's a story made of several

graphics, and the first thing that you see is basically a

map and a bunch of bar graphs that show you the

population increase between 2000 and 2010, right?

Basically, the population of Brazil increased

everywhere, right? At the national level, at the regional

level, at the local level, with some exceptions. There

are several regions that lost population rather than

gaining population. But in general, the population of

Brazil grew between the two years.

Alberto Cairo: Well, that's interesting per se, right? But we decided to

start, in collaboration with demographers ... I rarely do

these kinds of projects alone because I'm not an expert

on anything, right? In collaboration with

demographers and some political scientists, we started

digging a little bit deeper into the data provided by the

Brazilian Census Bureau. One critical piece of data

that appeared in the news releases that we were getting

and the data that we were getting, was that Brazil's

fertility rate, which is the number of children per

woman in a country, was strangely or surprisingly

different from what was expected, right? When you

think about fertility rates, when you think about rich

nations, for example, rich nations tend to have low

fertility rates, right? If you think about Germany or

Spain or whatever, western nations, relatively high

income in general, they tend to have fertility rates that

are around 1.5 children per woman, 1.8 children per

woman and so on and so forth.

Alberto Cairo: They are relatively low. If you go to very poor nations,

right? For example, Afghanistan or Yemen, fertility

rates are very high, five children per woman, six

children per woman. Some African nations also have

very high fertility rates. I think that Nigeria is around

four right now. That's the average, right? Then if you

go to the middle of the spectrum, middle of the income

spectrum countries such as Brazil for example, right?

Fertility rates are usually between 2.5 and three point

something children per woman. That's the benchmark

of these kinds of nations, right? But when you take a

look at the data, that is not true. I mean, the fertility

rate of Brazil, if you ask Brazilians themselves, right? I

know this because I did it. If you ask Brazilian

journalists, "What do you think is the current

fertility rate of Brazil?", you will get numbers such as 2.5

or three children per woman.

Alberto Cairo: Just because we have this idea of Brazil in mind as a

nation that is still in development, right? Or a nation

that is still very poor, and certainly there's a high

degree of poverty in Brazil, but that is not true over the

entirety of the country. Brazil is a continent, right?

When you take a look at the data, you will discover

that fertility rates in Brazil have dropped very

dramatically in the past 50 years, and the current

fertility rate of Brazil is around 1.8 children per

woman. That was a second piece of content that we

put in that poster that we designed. Because

obviously, if you have such a low fertility rate, 1.8,

that's below the replacement rate. The replacement

rate is the minimum number of children per woman, or

fertility rate, that a country needs to have in order to

keep the population stable.

Alberto Cairo: If your fertility rate drops below 2.1, which is this

magical number, right? Your population will become

older, and will start shrinking in the future, just

because you do not have enough children. If your

fertility rate drops below that number, your population

will become older, and in the future will start

shrinking. If you ask Brazilian demographers about

future population patterns in Brazil, they will tell you

that Brazil's population is predicted to become

older and to start shrinking around 2030 or something

like that. That's a problem. Why? Because well, Brazil

has a public health care system, it has retirement,

obviously public social security like the United States.

These population patterns would put a lot of pressure

on Brazil's public finances. How can you face that?

Well, there are several things that political scientists

have recommended to face this future situation.
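
The replacement-rate reasoning can be sketched in a few lines of Python. The 2.1 threshold and the 1.8 figure come from the conversation; the generation-to-generation ratio below is a deliberately crude approximation (it ignores migration, mortality changes, and age structure) and is not how demographers produce the projections mentioned here.

```python
REPLACEMENT_RATE = 2.1  # children per woman, as quoted in the conversation


def is_below_replacement(fertility_rate, replacement_rate=REPLACEMENT_RATE):
    """True when the fertility rate cannot keep the population stable."""
    return fertility_rate < replacement_rate


def crude_generation_ratio(fertility_rate, replacement_rate=REPLACEMENT_RATE):
    """Rough size of the next generation relative to the current one.

    Assumption: each generation scales by fertility_rate / replacement_rate.
    """
    return fertility_rate / replacement_rate


# 1.8 is the rate quoted for Brazil; 2.5 is the kind of guess journalists gave.
for label, rate in [("Brazil (quoted)", 1.8), ("Journalists' guess", 2.5)]:
    ratio = crude_generation_ratio(rate)
    trend = "shrinks" if is_below_replacement(rate) else "holds or grows"
    print(f"{label}: {rate} children per woman -> next generation {trend} (x{ratio:.2f})")
```

The gap between the guessed rate and the quoted one is what flips the long-run trend from growth to shrinkage in this toy calculation.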

Alberto Cairo: If you think about it, what I have done over here is

basically to use the technique that I explained before.

My very long sentence would be, "Brazil's population

has grown bigger, but the fertility rate is way below

expected. As a consequence of this, Brazil's

population will become older, and it will start

shrinking in the future. This will be a problem. Here's

how to face these problems." That's a very long

sentence. You split it up into its components, and then

you pair each one of these headlines, these little

titles, with the graphics that show the evidence for the

assertion that you're making. What we did was to use

maps and bar graphs to show population change, a line

chart to show the drop of fertility rates in Brazil in

comparison to other countries all over the world.

Alberto Cairo: We used a population pyramid to compare Brazil's

population today versus Brazil's population based on

age groups in 2050. A line chart to show Brazil's

population growing but then starting to shrink in

2030, and so on and so forth. Basically, it's a good

example, I believe, to illustrate how this narrative

principle works, right? It doesn't work always, but

when it does, when you can structure your

information this way, it can be really, really powerful.
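
As a sketch of the long-sentence technique, the Brazil example can be written down as an ordered list of headline and chart pairs. The headlines paraphrase the pieces of the sentence quoted above and the chart types are the ones named in the conversation; the data structure itself, and the placeholder for the final section, are assumptions made for illustration.

```python
# Each piece of the long sentence becomes a section headline, paired with
# the chart that supports it. The chart for the last section was not
# specified in the conversation, so it is left as a placeholder.
STORY = [
    ("Brazil's population has grown bigger", "maps and bar graphs"),
    ("but the fertility rate is way below expected", "line chart"),
    ("Brazil's population will become older", "population pyramid"),
    ("and it will start shrinking in the future", "line chart"),
    ("this will be a problem, and here's how to face it", "(not specified)"),
]


def elevator_speech(story):
    """Join the headlines back into the long sentence readers can skim."""
    return ", ".join(headline for headline, _ in story) + "."


def outline(story):
    """One line per section: the headline and the chart placed under it."""
    return [f"{i}. {headline} -> {chart}" for i, (headline, chart) in enumerate(story, 1)]


print(elevator_speech(STORY))
print("\n".join(outline(STORY)))
```

Reading only the headlines gives the gist of the story; each chart sits under the claim it is meant to support.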

Kirill Eremenko: It also takes care of the audience, because if you just

showed a chart where you're showing how the

population of Brazil grows from 2000 to 2010, people

might ... even though the chart's showing the correct

insights, people might misinterpret it and extrapolate

that the population is going to keep growing, and by

2020, it's going to-

Alberto Cairo: Or they may miss important features of the data,

right? That's why I emphasized before, the importance

of using text in data visualization. Again, we call this

the annotation layer in data visualization. Let's say

that you are doing a line chart showing progress in

sales in your company, and there is a sudden spike at

a particular point in time, you'd better put an

annotation in there because otherwise people will

wonder, why is there this spike over here? What's

going on? Because you need to try to explain it. Put an

annotation in there, right? That annotation layer is

really, really relevant in data visualization. Pairing,

again. Pairing the visuals with the copy, with the text

that you can write to emphasize the important points

in the data, to supplement the data a little bit, to

reinforce the main messages that you're trying to

convey, or to avoid misinterpretations, right? Also, to

avoid misinterpretations of the data that you're

presenting.
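
A small Python sketch of the annotation idea, using only the standard library. The monthly sales figures and the note text are invented; the function simply finds the largest month-over-month jump so that a label can be attached to it. In practice the resulting position and text would be handed to whatever charting tool draws the line.

```python
# Hypothetical monthly sales; the jump in June is the "sudden spike".
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"]
SALES = [100, 104, 101, 107, 110, 180, 115]


def largest_jump(values):
    """Index of the point with the biggest increase over the previous one."""
    increases = [values[i] - values[i - 1] for i in range(1, len(values))]
    return increases.index(max(increases)) + 1


def build_annotation(months, sales, note):
    """Describe where the annotation goes and what it says."""
    i = largest_jump(sales)
    return {"x": months[i], "y": sales[i], "text": f"{months[i]}: {note}"}


annotation = build_annotation(MONTHS, SALES, "one-off bulk order (hypothetical)")
print(annotation)
```

The code can locate the spike, but the explanation in the note still has to come from someone who knows why it happened.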

Kirill Eremenko: In that sense, I really like the grammar of graphics,

how it describes the multiple layers of

visualization, starting from the axes all the

way to different colors and including annotation. Once

you understand, basically as they call it in the book,

the grammar of graphics, it really helps-

Alberto Cairo: The layer-

Kirill Eremenko: Layers.

Alberto Cairo: [inaudible 00:53:45] grammar of graphics. Yeah. This

is another one of those concepts that I try to explain to

the general public in the new book, in How Charts Lie.

I talk about the grammar of graphics. Obviously, I do it

in a much less technical way than Leland Wilkinson

did in his famous book, The Grammar of Graphics, or

Hadley Wickham does when talking about ggplot2, but

I still describe it. I still teach this principle in the new

book.

Kirill Eremenko: Definitely. That's very interesting. Unfortunately, we

won't have time to go into the rest of the principles of

graphical literacy. For our listeners, if you'd like to

learn more about them, I highly recommend picking

up Alberto's book, which is available for pre-order,

right Alberto?

Alberto Cairo: Yeah, it's already available everywhere for pre-order;

Amazon, Barnes & Noble, independent bookstores. It's

basically everywhere. [crosstalk 00:54:38] Yeah. Yeah,

it comes out on October the 15th, but yeah, you can

order it now.

Kirill Eremenko: Guys, girls, go get that book. It's going to be epic. I'm

definitely going to pick up a copy. In the remaining five

or so minutes, I wanted to just quickly touch on

something I'd love to get your opinion on, and that is

ethics in visualization. We already spoke a little bit

about being conscious about what you reshare, how

you read charts and double check the data behind

that, and I think with how we're moving more into a

technological world, with more and more screens

around us, with soon wearable devices and things like

that, ethics is going to be super important. What is

your stance on ethics in visualization? What

recommendations can you give to practitioners

listening to this?

Alberto Cairo: Oh Wow. That would take another entire book to talk

about. I may write about that in the future. I have that

in the pipeline, to write a book about how to handle

data, and particularly when you are going to visualize

it. I don't have very formed thoughts at the moment

because again, I may use this new book to think

clearly about these sorts of principles. But there's lots

of people writing about these things already, not from

the point of view of visualization but more from the

point of view of data science in general. I'm a

bookworm, I would like to recommend books. I would

recommend, for example, Cathy O'Neil's Weapons of

Math Destruction. I think that is a good introduction

to thinking about the implications of the data that we

handle every day, how to handle it carefully, clearly,

and ethically. I think that is a good introduction to

that.

Alberto Cairo: If you like something a little bit more controversial and

aggressive, which I really, really enjoy and that makes

you think, even if you disagree with the book

sometimes, because it's so aggressive, I would really,

really, really recommend Mike Monteiro's new book.

His new book, I believe, is called Ruined By Design. He

has the word design in the title, but it's a book about

data science. It's a book about technologists,

how technologists gather data, how the data is

handled or mishandled, how careful we need to be

with the tools that we create and think about the

possible consequences of the tools that we create and

that we put out for the public to use, and so on and so

forth. Mike is a very passionate speaker. He's also a

very passionate writer.

Alberto Cairo: Again, you may not agree with everything that he says

in the book, but it's one of those books that even if you

disagree with it sometimes, it makes you think deeply,

and it makes you stop and think, "Is this guy right?

Am I doing things correctly?" Ethics begins with that;

with doubt. With doubting about your own decisions

and making ... have a dialogue with the book itself.

The book makes you think clearly. Those are two of my

favorite books to start thinking about how to use data

ethically, and visualization as an extension of that.

But there are many others. For example, Meredith

Broussard, she has a book titled Artificial

Unintelligence, which I really enjoyed. This is by MIT

Press if I'm not wrong.

Alberto Cairo: Virginia Eubanks, she has another book titled

Automating Inequality, which is about how algorithms

may promote or may perpetuate societal inequality.

That's another book that made me think. Again, none

of them covers visualization graphics in general, but

you cannot understand visualizations separately from

the data that visualization is representing. Any book or

any thoughts about the ethics in data visualization,

necessarily needs to begin with thinking about the

data themselves.

Kirill Eremenko: Well, totally, I love it. You're definitely a bookworm.

That's so many interesting books that I've just been

writing down. Yeah, now I'm very curious about this

one, Ruined By Design by Mike Monteiro.

Alberto Cairo: You should really read it. I mean, it will make you feel

angry sometimes, I think, but for a very, very good

reason. I think that he makes a very good case. I

think.

Kirill Eremenko: That's wonderful. Well, on that note, Alberto, thank

you so much for coming on the show, sharing all your

insights. It's been a huge pleasure. Before I let you go,

what are some of the best ways to get in touch with your

work? Of course, in addition, or apart from purchasing

your book, which I highly recommend to everybody if

you love this podcast, go and get Alberto's new book,

How Charts Lie. In addition to that, what are some

other ways that people can follow you and get access

to all these great things that you're creating?

Alberto Cairo: Sure. The best ways, I use Twitter quite a lot. My

handle is very easy to remember, it's my first name and

last name. So it's Alberto Cairo, @AlbertoCairo. You

can find me on Twitter. I'm also on Facebook and on

LinkedIn. I'm both in LinkedIn and Facebook, but I

use Twitter most of the time, as a way to promote

things that other people do, graphics that other people

design, articles that I read, papers that I have

discovered, books that I'm reading, whatever. I use it

as a platform to share, basically, things that I see and

that I enjoy. I also have a web blog. The web blog is the

title of my first book, The Functional Art. It's

thefunctionalart.com. That's my web blog, and that's

the platform that I use to write a little bit more

extensively about things that I see or so. Those are the

best ways, I would say.

Kirill Eremenko: Got you.

Alberto Cairo: My E-mail address is very easy to find, in any of these

platforms.

Kirill Eremenko: Fantastic. Also, everybody listening, Alberto, you have

a huge 45 and a half thousand followers on Twitter.

Yeah, it's a great community to be part of, I guess, to

follow-

Alberto Cairo: Yeah, and it's a-

Kirill Eremenko: [crosstalk 01:00:36] his insights.

Alberto Cairo: It's a fun community, as well. There is one virtue that

the visualization community has, which is that it's

very welcoming to newcomers. If you want to get

started in data visualization, you just need to basically

get started. Start designing your graphics, putting it

out there, asking people for advice, asking people for

feedback, and most people, or 99.9% of the people who

I know in the visualization community are very

constructive, welcoming, friendly, and it's a great

community to work in.

Kirill Eremenko: For sure. I find that to be true across all of data

science. It's surprisingly, and inspiringly so, such

a wonderful community of helpful-

Alberto Cairo: Yeah, absolutely.

Kirill Eremenko: ... people.

Alberto Cairo: The [inaudible 01:01:21] community is very similar to

the visualization, as far as I have seen. Yeah.

Kirill Eremenko: Fantastic. Well, once again, Alberto, thank you so

much for coming on the show and sharing all these

amazing insights. Super, super excited to chat, and

good luck for the book launch and for all the touring that

you're going to do in a couple months from now.

Alberto Cairo: Thank you so much for having me again. It was a

pleasure.

Kirill Eremenko: There you have it, ladies and gentlemen. Thank you so

much for being part of today's episode of the

SuperDataScience podcast. That was Alberto Cairo.

What an epic person. What an epic expert in the space

of data visualization. I got a ton from this podcast, got

so many takeaways, and I hope you did too. Just from

this conversation, you can tell the depth of thinking

that goes into Alberto's visualizations. You're going

to find all of the infographics that we talked about in

the show notes for this episode at

www.superdatascience.com/271. That's

superdatascience.com/271, and just have a look

through them. Look at, for instance, the cigarettes

versus life expectancy, or the Brazil visualization that

we were talking about, or the Nobel Prize and

chocolates visualizations.

Kirill Eremenko: Just look at all of these different visualizations that

you'll find there, and notice the depth of thinking that

went into creating them, and you will recognize a lot of

the things that Alberto was actually talking about on

this podcast, from understanding if your data is

measuring the right thing that you wanted to be

measuring and that you think it's measuring, to

building narratives and creating a narrative structure

in your visualization and conveying those insights in a

certain way so that people can better understand

them. Also, if you see Alberto's visualizations on the

Internet, you'll find that they're definitely very

persuasive and very memorable. Of course, if you enjoyed

this podcast, make sure to pick up Alberto's new book,

which is called How Charts Lie: Getting Smarter about

Visual Information. It's coming out in October 2019,

but you can already pick up a copy now. You can pre-

order a copy on Amazon or Barnes & Noble, on

Amazon UK, or wherever you're shopping for your

books.

Kirill Eremenko: Highly recommend putting on a pre-order so that you

get it fresh once they're live. What I really like about

this book, as Alberto described it, is that it's for the

general public, and that means if you're not that deep

into data visualization, you're going to get a great

head start. But if you're already a data scientist, and

you're already visualizing a lot of things and you're

pretty experienced in this space, it will help you see

visualization through the eyes of your audience, and

understand what kind of issues they're going through,

what kind of challenges they're facing. I think it's a

very valuable skill to empathize with the people that

you're creating this for, for your audience. That can be

very, very powerful.

Kirill Eremenko: Of course, as usual, if you know anybody who can

benefit from this podcast, somebody who's interested

in visualization, somebody who's a fan of Alberto

Cairo, or somebody who's dabbling on the verge of

getting into visualization or not, send them this

podcast, give them this gift of insights into what the

world of data visualization's all about, and you might

even help them change their lives, change their

careers, and progress forward. Share the love, share

this link, superdatascience.com/271, with anybody

who you think could benefit from it. On that note,

thank you so much for being here today, make sure to

follow Alberto on Twitter and any other social media,

and I look forward to seeing you back here next time.

Until then, happy analyzing.