NYC Data Science Meetup: Computational Social Science

Computational Social Science Jake Hofman Microsoft Research November 6, 2014 @jakehofman (Microsoft Research) Computational Social Science November 6, 2014 1 / 62

Upload: jakehofman

Post on 12-Jul-2015

TRANSCRIPT

Page 1: NYC Data Science Meetup: Computational Social Science

Computational Social Science

Jake Hofman

Microsoft Research

November 6, 2014

@jakehofman (Microsoft Research) Computational Social Science November 6, 2014 1 / 62

Page 2: NYC Data Science Meetup: Computational Social Science

MSR NYC

http://research.microsoft.com/en-us/labs/newyork/

Page 3: NYC Data Science Meetup: Computational Social Science

Questions

Many long-standing questions in the social sciences are notoriously difficult to answer, e.g.:

• “Who says what to whom in what channel with what effect?” (Lasswell, 1948)

• How do ideas and technology spread through cultures? (Rogers, 1962)

• How do new forms of communication affect society? (Singer, 1970)

• ...

Page 4: NYC Data Science Meetup: Computational Social Science

Conventional methods

Typically difficult to observe the relevant information via conventional methods

(Katz & Lazarsfeld, 1955)

Page 5: NYC Data Science Meetup: Computational Social Science

Large-scale data

Recently available electronic data provide an unprecedented opportunity to address these questions at scale

[Figure: three panels labeled Demographic, Behavioral, and Network]

Page 6: NYC Data Science Meetup: Computational Social Science

Computational social science

An emerging discipline at the intersection of the social sciences, statistics, and computer science

Page 7: NYC Data Science Meetup: Computational Social Science

Computational social science

An emerging discipline at the intersection of the social sciences, statistics, and computer science

(motivating questions)

Page 8: NYC Data Science Meetup: Computational Social Science

Computational social science

An emerging discipline at the intersection of the social sciences, statistics, and computer science

(fitting large, potentially sparse models)

Page 9: NYC Data Science Meetup: Computational Social Science

Computational social science

An emerging discipline at the intersection of the social sciences, statistics, and computer science

(parallel processing for filtering and aggregating data)

Page 10: NYC Data Science Meetup: Computational Social Science

[Screenshot: first page of "Computational Social Science" by Lazer et al., Science 323, 721 (6 February 2009), with the deck "A field is emerging that leverages the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behaviors."]

“... a computational social science is emerging that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale ...”

http://sciencemag.org/content/323/5915/721

Page 11: NYC Data Science Meetup: Computational Social Science

[Screenshot: first page of "Computational Social Science" by Lazer et al., Science 323, 721 (6 February 2009)]

“... shares with other nascent interdisciplinary fields (e.g., sustainability science) the need to develop a paradigm for training new scholars ...”

http://sciencemag.org/content/323/5915/721

Page 12: NYC Data Science Meetup: Computational Social Science

The clean real story

“We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover all the tracks, to not worry about the blind alleys or to describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work ...”

- Richard Feynman, Nobel Lecture, 1965 (http://bit.ly/feynmannobel)

Page 13: NYC Data Science Meetup: Computational Social Science

Outline

Search predictions

[Figure: weekly Billboard rank and search rank for "Right Round", Mar-09 to Aug-09]

Web diversity

[Figure: daily per-capita pageviews by income, race, education, age, and sex]

Information diffusion

Page 14: NYC Data Science Meetup: Computational Social Science

Predicting consumer activity with Web search

with Sharad Goel, Sebastien Lahaie, David Pennock, Duncan Watts

[Figure: weekly Billboard rank and search rank for "Right Round", Mar-09 to Aug-09]

Page 15: NYC Data Science Meetup: Computational Social Science

Search predictions

Motivation

Does collective search activity provide useful predictive signal about real-world outcomes?

[Figure: weekly Billboard rank and search rank for "Right Round", Mar-09 to Aug-09]

Page 16: NYC Data Science Meetup: Computational Social Science

Search predictions

Motivation

Past work mainly focuses on predicting the present (Varian, 2009) and ignores baseline models trained on publicly available data

[Figure: flu level (percent) by date, 2004 to 2010; series: Actual, Search, Autoregressive]

Page 17: NYC Data Science Meetup: Computational Social Science

Search predictions

Motivation

We predict future sales for movies, video games, and music

[Figure: search volume vs. time to release (days, -30 to 30) for "Transformers 2" (a) and "Tom Clancy's HAWX" (b); weekly Billboard rank and search rank for "Right Round" (c), Mar-09 to Aug-09]

Page 18: NYC Data Science Meetup: Computational Social Science

Search predictions

Search models

For movies and video games, predict opening weekend box office and first month sales, respectively:

$\log(\text{revenue}) = \beta_0 + \beta_1 \log(\text{search}) + \epsilon$

For music, predict following week’s Billboard Hot 100 rank:

$\text{billboard}_{t+1} = \beta_0 + \beta_1\,\text{search}_t + \beta_2\,\text{search}_{t-1} + \epsilon$
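The log-log search model for movies and video games can be sketched in a few lines. This is only an illustration, assuming ordinary least squares with a single predictor; the function names (`fit_loglog`, `predict_revenue`) and the numbers are made up for the example and are not taken from the talk.

```python
import math

def fit_loglog(search, revenue):
    """Least-squares fit of log(revenue) = b0 + b1 * log(search)."""
    x = [math.log(s) for s in search]
    y = [math.log(r) for r in revenue]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Closed-form slope and intercept for one-predictor regression.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict_revenue(b0, b1, search):
    """Map a search volume back to the revenue scale."""
    return math.exp(b0 + b1 * math.log(search))

# Made-up illustrative data: revenue follows an exact power law in search.
search_volume = [1e3, 1e4, 1e5, 1e6]
revenue = [100.0 * s ** 0.8 for s in search_volume]

b0, b1 = fit_loglog(search_volume, revenue)
```

Fitting on the log scale means the slope b1 reads as an elasticity: a 1% increase in search volume goes with roughly a b1% increase in predicted revenue.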

Page 19: NYC Data Science Meetup: Computational Social Science

Search predictions

Search volume

Page 20: NYC Data Science Meetup: Computational Social Science

Search predictions

Search models

Search activity is predictive for movies, video games, and music weeks to months in advance

[Figure: actual vs. predicted outcomes from the search models. Panels a-c: Movies and Video Games (actual vs. predicted revenue in dollars, log scale; video games marked as sequel or non-sequel) and Music (actual vs. predicted Billboard rank, 0 to 100). Panels d-f: model fit (0.4 to 0.9) vs. time to release (weeks, -6 to 0) for Movies, Video Games, and Music.]

Page 21: NYC Data Science Meetup: Computational Social Science

Search predictions

Baseline models

For movies, use budget, number of opening screens and Hollywood Stock Exchange:

$\log(\text{revenue}) = \beta_0 + \beta_1 \log(\text{budget}) + \beta_2 \log(\text{screens}) + \beta_3 \log(\text{hsx}) + \epsilon$

Page 22: NYC Data Science Meetup: Computational Social Science

Search predictions

Baseline models

For video games, use critic ratings and predecessor sales (sequels only):

$\log(\text{revenue}) = \beta_0 + \beta_1\,\text{rating} + \beta_2 \log(\text{predecessor}) + \epsilon$

Page 23: NYC Data Science Meetup: Computational Social Science

Search predictions

Baseline models

For music, use an autoregressive model with the previously available rank:

$\text{billboard}_{t+1} = \beta_0 + \beta_1\,\text{billboard}_{t-1} + \epsilon$
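The music models predict next week's rank from lagged weekly series, so the main bookkeeping is lining up each target with the right lags. The sketch below builds those rows; the function name, the dictionary keys, and the sample ranks are hypothetical, and it assumes both series are ordered by week with no gaps.

```python
def lagged_rows(billboard, search):
    """Build one row per week t, pairing the target billboard[t+1] with
    search[t], search[t-1] (search model) and billboard[t-1] (baseline)."""
    rows = []
    # t starts at 1 so that t-1 exists, and stops early so that t+1 exists.
    for t in range(1, len(billboard) - 1):
        rows.append({
            "target": billboard[t + 1],
            "search_t": search[t],
            "search_t_minus_1": search[t - 1],
            "billboard_t_minus_1": billboard[t - 1],
        })
    return rows

# Made-up weekly ranks for illustration only.
billboard_rank = [40, 35, 30, 22, 15]
search_rank = [50, 38, 28, 20, 12]

rows = lagged_rows(billboard_rank, search_rank)
```

Note that the baseline uses the rank at t-1 rather than t, matching the equation above: the rank for week t is treated as not yet available when the prediction is made.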

Page 24: NYC Data Science Meetup: Computational Social Science

Search predictions

Baseline + combined models

Baseline models are often surprisingly good

[Figure: actual vs. predicted outcomes for the baseline models (panels a-c) and the combined models (panels d-f). Panels: Movies and Video Games (revenue in dollars, log scale; video games marked as sequel or non-sequel) and Music (Billboard rank, 0 to 100).]

Page 25: NYC Data Science Meetup: Computational Social Science

Search predictions

Model comparison

For movies, search is outperformed by the baseline and of little marginal value

[Figure: model fit (0.4 to 1.0) of the Baseline, Search, and Combined models for non-sequel games, sequel games, music, movies, and flu]

Page 26: NYC Data Science Meetup: Computational Social Science

Search predictions

Model comparison

For video games, search helps substantially for non-sequels, less so for sequels

[Figure: model fit (0.4 to 1.0) of the Baseline, Search, and Combined models for non-sequel games, sequel games, music, movies, and flu]

Page 27: NYC Data Science Meetup: Computational Social Science

Search predictions

Model comparison

For music, the addition of search yields a substantially better combined model

[Figure: model fit (0.4 to 1.0) of the Baseline, Search, and Combined models for non-sequel games, sequel games, music, movies, and flu]
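A comparison of baseline, search, and combined models can be sketched as three least-squares fits scored on the same outcome. The code below is a hedged illustration only: it assumes in-sample R² as the fit measure (the slides do not say how "model fit" is computed), uses a small hand-rolled solver, and runs on made-up numbers.

```python
def ols_fit(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """In-sample R^2 (an assumed fit measure for this sketch)."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up data: the outcome depends on a baseline feature and a search feature.
baseline_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
search_x = [2.0, 1.0, 4.0, 3.0, 6.0, 4.0]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.05]
y = [1 + 2 * b + 3 * s + e for b, s, e in zip(baseline_x, search_x, noise)]

designs = {
    "baseline": [[b] for b in baseline_x],
    "search": [[s] for s in search_x],
    "combined": [[b, s] for b, s in zip(baseline_x, search_x)],
}
fits = {name: r_squared(X, y, ols_fit(X, y)) for name, X in designs.items()}
```

In-sample, the combined model can never fit worse than either nested model, so the informative quantity is how much it improves on the baseline; a held-out comparison would be the more careful check.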

Page 28: NYC Data Science Meetup: Computational Social Science

Search predictions

Summary

• Relative performance and value of search varies across domains

• Search provides a fast, convenient, and flexible signal across domains

• “Predicting consumer activity with Web search”, Goel, Hofman, Lahaie, Pennock & Watts, PNAS 2010

Page 29: NYC Data Science Meetup: Computational Social Science

Outline

Search predictions

[Figure: weekly Billboard rank and search rank for "Right Round", Mar-09 to Aug-09]

Web diversity

[Figure: daily per-capita pageviews by income, race, education, age, and sex]

Information diffusion

Page 30: NYC Data Science Meetup: Computational Social Science

Demographic diversity on the Web

with Irmak Sirer and Sharad Goel (ICWSM 2012)

[Figure: daily per-capita pageviews (0 to 70) by group. Income: over vs. under $25k; Race: Black & Hispanic vs. White; Education: no college vs. some college; Age: over vs. under 65; Sex: female vs. male]

Page 31: NYC Data Science Meetup: Computational Social Science

Motivation

Previous work is largely survey-based and focuses on group-level differences in online access

Page 32: NYC Data Science Meetup: Computational Social Science

Motivation

“As of January 1997, we estimate that 5.2 million African Americans and 40.8 million whites have ever used the Web, and that 1.4 million African Americans and 20.3 million whites used the Web in the past week.”

- Hoffman & Novak (1998)

Page 33: NYC Data Science Meetup: Computational Social Science

Motivation

Focus on activity instead of access

How diverse is the Web?

To what extent do online experiences vary across demographic groups?

Page 34: NYC Data Science Meetup: Computational Social Science

Data

• Representative sample of 265,000 individuals in the US, paid via the Nielsen MegaPanel (special thanks to Mainak Mazumdar)

• Log of anonymized, complete browsing activity from June 2009 through May 2010 (URLs viewed, timestamps, etc.)

• Detailed individual and household demographic information (age, education, income, race, sex, etc.)

Page 35: NYC Data Science Meetup: Computational Social Science

Data

# ls -alh nielsen_megapanel.tar

-rw-r--r-- 100G Jul 17 13:00 nielsen_megapanel.tar

• Normalize pageviews to at most three domain levels, sans www, e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com

• Restrict to top 100k (out of 9M+ total) most popular sites (by unique visitors)

• Aggregate activity at the site, group, and user levels
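The first two preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, the normalization rule (drop the path, drop a leading www, keep the last three labels) is inferred from the two examples on the slide, and the sample records are made up.

```python
from collections import defaultdict

def normalize_site(url, max_levels=3):
    """Reduce a URL to at most `max_levels` domain levels, without a leading www."""
    # Drop the scheme if present, then the path, query string, and port.
    host = url.split("://", 1)[-1]
    host = host.split("/", 1)[0].split("?", 1)[0].split(":", 1)[0].lower()
    labels = [label for label in host.split(".") if label]
    if labels and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels[-max_levels:])

def top_sites(pageviews, n):
    """Rank normalized sites by number of unique visitors and keep the top n."""
    visitors = defaultdict(set)
    for user_id, url in pageviews:
        visitors[normalize_site(url)].add(user_id)
    # Sort by descending unique visitors, breaking ties alphabetically.
    ranked = sorted(visitors, key=lambda site: (-len(visitors[site]), site))
    return ranked[:n]

# Made-up (user_id, url) records for illustration.
sample_pageviews = [
    ("u1", "www.yahoo.com"),
    ("u2", "yahoo.com/news"),
    ("u1", "us.mg2.mail.yahoo.com/neo/launch"),
    ("u1", "www.yahoo.com"),
]
most_popular = top_sites(sample_pageviews, 1)
```

Counting unique visitors with a set per site keeps repeat visits by the same user from inflating a site's popularity.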

Page 39: NYC Data Science Meetup: Computational Social Science

Aggregate usage patterns

How do users distribute their time across different categories?

[Figure: fraction of total pageviews (0.05 to 0.25) by category: Social Media, E-mail, Games, Portals, Search]

All groups spend the majority of their time in the top five most popular categories

Page 40: NYC Data Science Meetup: Computational Social Science

Aggregate usage patterns

How do users distribute their time across different categories?

[Figure: fraction of pageviews in category (0.05 to 0.30) vs. user rank by daily activity (10% to 90%), for Social Media, E-mail, Games, Portals, and Search]

Highly active users devote nearly twice as much of their time to social media relative to typical individuals

Page 41: NYC Data Science Meetup: Computational Social Science

Group-level activity

How does browsing activity vary at the group level?

[Figure: daily per-capita pageviews (0 to 70) by group. Income: over vs. under $25k; Race: Black & Hispanic vs. White; Education: no college vs. some college; Age: over vs. under 65; Sex: female vs. male]

Large differences exist even at the aggregate level (e.g. women on average generate 40% more pageviews than men)
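A group-level activity measure like daily per-capita pageviews can be sketched as below. The definition used here (a group's total pageviews divided by its number of panelists times the number of days observed) is an assumption for illustration, as are the function name, the group labels, and the toy numbers.

```python
from collections import defaultdict

def daily_per_capita_pageviews(pageviews, group_of, num_days):
    """pageviews: iterable of user ids, one entry per pageview.
    group_of: dict mapping every panelist's user id to a group label.
    Returns pageviews per person per day for each group."""
    members = defaultdict(set)
    for user_id, group in group_of.items():
        members[group].add(user_id)
    totals = defaultdict(int)
    for user_id in pageviews:
        totals[group_of[user_id]] += 1
    # Panelists with no pageviews still count in the denominator.
    return {group: totals[group] / (len(users) * num_days)
            for group, users in members.items()}

# Toy panel: two users in group_a, one in group_b, observed for two days.
group_of = {"u1": "group_a", "u2": "group_a", "u3": "group_b"}
pageview_log = ["u1"] * 28 + ["u2"] * 28 + ["u3"] * 20

rates = daily_per_capita_pageviews(pageview_log, group_of, num_days=2)
```

Keeping inactive panelists in the denominator matters: dividing by active users only would measure activity among those online rather than activity per group member.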

Page 42: NYC Data Science Meetup: Computational Social Science

Group-level activity

How does browsing activity vary at the group level?

[Figure: daily per-capita pageviews (0 to 70) by group: Income (Over $25k, Under $25k), Race (Black & Hispanic, White), Education (No College, Some College), Age (Over 65, Under 65), Sex (Female, Male)]

Younger and more educated individuals are both more likely to access the Web and more active once they do


Group-level activity

All demographic groups spend the majority of their time in the same categories

[Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search), in panels by Age (5 to 80), Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]


Group-level activity

Older, more educated, male, wealthier, and Asian Internet users spend a smaller fraction of their time on social media

[Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search), in panels by Age (5 to 80), Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]


Group-level activity

Lower social media use by these groups is often accompanied by higher e-mail volume

[Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search), in panels by Age (5 to 80), Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]


Group-level activity

[Figure: female-to-male pageview ratio (axis ticks 0.5, 1, 2) by site category, with categories running from Apparel/Beauty, Family Resources, and Pets at one end to Online Trading, Sports, and Adult at the other]


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: average pageviews per month (0 to 12) for News, Health, and Reference, in panels by Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]

Post-graduates spend three times as much time on health sites as adults with only some high school education


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: average pageviews per month (0 to 12) for News, Health, and Reference, in panels by Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]

Asians spend more than 50% more time browsing online news than do other race groups


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: average pageviews per month (0 to 12) for News, Health, and Reference, in panels by Education (Grammar School to Post Graduate Degree), Sex (Female, Male), Income ($0-25k to $150k+), and Race (Other, Hispanic, Black, White, Asian)]

Even when less educated and less wealthy groups gain access to the Web, they utilize these resources relatively infrequently


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: average pageviews per month (0 to 12) by education (High School Graduate to Post Graduate Degree), in panels for News, Health, and Reference; series: Asian, Black, Hispanic, White]

Controlling for other variables, effects of race and gender largely disappear, while education continues to have a large effect

$$p_i = \sum_j \alpha_j x_{ij} + \sum_j \sum_k \beta_{jk} x_{ij} x_{ik} + \sum_j \gamma_j x_{ij}^2 + \epsilon_i$$


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: average pageviews per month (0 to 12) on Health sites by education (High School Graduate to Post Graduate Degree); series: Female, Male]

However, women spend considerably more time on health sites compared to men


Revisiting the digital divide

How does usage of news, health, and reference vary with demographics?

[Figure: distribution of monthly pageviews on health sites (20 to 100); series: Female, Male]

However, women spend considerably more time on health sites compared to men, although means can be misleading


Individual-level prediction

How well can one predict an individual’s demographics from their browsing activity?

• Represent each user by the set of sites visited

• Fit linear models [4] to predict majority/minority for each attribute on 80% of users

• Tune model parameters using a 10% validation set

• Evaluate final performance on held-out 10% test set

[4] http://bit.ly/svmperf
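The 80% / 10% / 10% division of users described in the bullets above could look roughly like the following. This is a minimal sketch using only the Python standard library; the function name `split_users` and the fixed `seed` are illustrative assumptions, not details from the talk.

```python
import random


def split_users(user_ids, seed=0):
    """Shuffle users and split them 80/10/10 into train, validation, and test sets."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)  # seeded only so the sketch is reproducible
    n_train = int(0.8 * len(ids))
    n_valid = int(0.1 * len(ids))
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]    # whatever remains is held out
    return train, valid, test


train, valid, test = split_users(range(1000))
print(len(train), len(valid), len(test))  # 800 100 100
```

Models would be fit on `train`, tuned on `valid`, and scored once on `test`, so the reported performance is never measured on data used for fitting or tuning.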


Individual-level prediction

Reasonable (~70-85%) accuracy and AUC across all attributes

[Figure: accuracy and AUC (0.5 to 1) for College/No College, Under/Over $50,000 Household Income, White/Non-White, Female/Male, Over/Under 25 Years Old]


Individual-level prediction

Highly-weighted sites under the fitted models

Attribute | Large positive weight | Large negative weight
Female | winster.com, lancome-usa.com | sports.yahoo.com, espn.go.com
White | marlboro.com, cmt.com | mediatakeout.com, bet.com
College Educated | news.yahoo.com, linkedin.com | youtube.com, myspace.com
Over 25 Years Old | evite.com, classmates.com | addictinggames.com, youtube.com
Household Income Under $50,000 | eharmony.com, tracfone.com | rownine.com, matrixdirect.com

Table 2: A selection of the most predictive (i.e., most highly weighted) sites for each classification task.

[Figure 7: AUC (left panel) and accuracy (right panel), each from 0.5 to 1, for College/No College, Under/Over $50,000 Household Income, White/Non-White, Female/Male, Over/Under 25 Years Old]

Figure 7: Summary of model performance, indicated by solid circles, for all demographic attributes. Population skew is given by x’s for comparison. Note that higher AUC closely corresponds to lower Jaccard similarity, as shown in Figure 6.

Linear SVMs generate predictions of the form

$$\hat{y}(x_i) = w \cdot x_i + b$$

where the predicted class is defined by the sign of $\hat{y}(x_i)$. To guard against overfitting, SVMs seek the weight vector $w$ that maximally separates the positive and negative examples in the training set. Specifically, SVMs optimize the loss function

$$L(y, \hat{y}) = C \sum_i [1 - y_i \hat{y}(x_i)]_+ + \|w\|^2$$

where $[x]_+ = (|x| + x)/2$ indicates the positive part, and $C$ is a tunable parameter that balances model fit against generalization. Users are randomly divided into an 80% training set on which models are fit, a 10% validation set used to select the optimal parameter $C$ for each demographic attribute, and a 10% held-out test set on which we evaluate and report final performance.
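To make the loss function above concrete, the following sketch evaluates it for a given weight vector on a toy data set. It only computes the objective; it does not train a model, and it is not the SVMperf implementation referenced in the slides. The names `positive_part` and `svm_objective` and the toy numbers are illustrative assumptions.

```python
def positive_part(x):
    """[x]_+ = (|x| + x) / 2, i.e. max(x, 0)."""
    return (abs(x) + x) / 2


def svm_objective(w, b, X, y, C):
    """C * sum of hinge losses over examples + squared norm of w.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    total_hinge = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b  # w . x_i + b
        total_hinge += positive_part(1 - y_i * score)
    return C * total_hinge + sum(w_j ** 2 for w_j in w)


# Toy example: two binary "site visited" features, three users.
X = [[1, 0], [0, 1], [0, 0]]
y = [1, -1, 1]
print(svm_objective([1.0, -1.0], 0.0, X, y, C=2))  # 4.0
```

In the toy example the first two users are classified with margin 1 and contribute no hinge loss, the third sits on the decision boundary and contributes 1, so the objective is 2 * 1 + (1 + 1) = 4.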

Figure 7 summarizes our results for all five classification tasks. The right panel displays the accuracy of predictions, showing reasonable performance across all demographic dimensions, with slightly higher accuracies for age, sex, and race (80%, 76%, and 82%, respectively) than for education and income (70% and 68%). To help put these numbers in perspective, Figure 7 also includes the overall population skew for each demographic attribute, indicated by x’s (e.g., 57% of the online population is female, while 76% is comprised of adults).

Given the substantial demographic skew, we also present AUC, or area under the ROC curve, in the left panel of Figure 7, a measure that effectively re-normalizes the majority and minority classes to have equal size. Intuitively, AUC is the probability that a model scores a randomly selected positive example higher than a randomly selected negative one (e.g., the probability that the model correctly distinguishes between a randomly selected female and male). Though an uninformative rule would correctly discriminate between such pairs 50% of the time, predictions based on browsing histories are relatively reliable, ranging from 74% to 85%. Thus, whether we measure performance in terms of accuracy or AUC, we find that browsing activity provides a strong signal for inferring individual-level demographic attributes.
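The pairwise interpretation of AUC given above translates directly into code. The sketch below compares every positive example with every negative one, which is quadratic in the number of examples and meant only to illustrate the definition; counting ties as half a win is a common convention assumed here, not something stated in the excerpt.

```python
def pairwise_auc(scores, labels):
    """Probability that a random positive example outscores a random negative one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: ties count as half
    return wins / (len(pos) * len(neg))


# Two positives (0.9, 0.3) and two negatives (0.8, 0.1): 3 of 4 pairs are ordered correctly.
print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Because the score depends only on how positives rank against negatives, it is unaffected by how skewed the two classes are, which is why it complements raw accuracy here.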

A benefit of linear models is the interpretability of the weight vector $w$. In Table 2, we report a sample of the most predictive (i.e., largest positively and negatively weighted) sites for each attribute. For example, visiting the popular cosmetics company lancome-usa.com strongly indicates that a user is female, while visits to the sports sites sports.yahoo.com or espn.go.com are highly predictive of being male. Interestingly, and perhaps less apparent, the collaborative gaming community site winster.com is also among the highest weighted female-predictive sites; closer inspection reveals that the site was created by a northern California housewife as an alternative to gaming destinations that cater to young males. Analogously, visits to Country Music Television (cmt.com) are a strong indicator of being White, while visits to Black Entertainment Television (bet.com) are a strong non-White indicator. Though visits to highly weighted sites provide strong cues, we note that many such sites are frequented by a relatively small fraction of the population. Thus, model performance is likely enhanced by the many weak signals from visits to popular but less discriminating sites.

We next examine whether demographic differences in online activity, as measured by predictive quality, persist as we restrict to increasingly popular sites. As shown in Figure 8, models fit on as few as the top 1,000 sites perform only marginally worse than those fit on all 114,000 domains (far right); in other words, even on these top sites, demographic differences are relatively large. For example, in predicting sex using the top 1,000 sites, AUC decreases only four percentage points, from 75% to 71%. That visits to popular (and relatively heterogeneous) sites are quite informative is a testament to the aggregate strength of weak signals.


Individual-level prediction

Substantially better performance when restricted to “stereotypical” users (~80-90%)

[Figure: AUC and accuracy (0.70 to 0.95) vs. fraction of users (0.0 to 1.0), for Age, Sex, Race, Education, Income]


Individual-level prediction

Similar performance even when restricted to top 1k sites

[Figure: AUC and accuracy (0.5 to 0.9) vs. number of domains (10^2 to 10^5), for Age, Sex, Race, Education, Income]


Site-level skew

[Figure: density of sites by visitor composition (0.0 to 1.0), in panels: Proportion Female Visitors, Proportion White Visitors, Proportion College Educated Visitors, Proportion Adult Visitors, Proportion of Visitors With Household Incomes Under $50,000]

Many sites have skew close to the overall mean, but there are also popular, highly skewed sites


Individual-level prediction

Proof of concept browser demo

http://bit.ly/surfpreds


Summary

• Highly active users spend disproportionately more of their time on social media and less on e-mail relative to the overall population

• Access to research, news, and healthcare is strongly related to education, not as closely to ethnicity

• User demographics can be inferred from browsing activity with reasonable accuracy

• “Who Does What on the Web”, Goel, Hofman & Sirer, ICWSM 2012


Outline

Search predictions

Web diversity

Information diffusion


The structural virality of online diffusion
with Ashton Anderson, Sharad Goel, Duncan Watts (201?)


“Going Viral”?


“Therefore we ... wish to proceed with great care as is proper, and to cut off the advance of this plague and cancerous disease so it will not spread any further ...” [5]

- Pope Leo X, Exsurge Domine (1520)

[5] http://www.economist.com/node/21541719


“Going Viral”?

Rogers (1962), Bass (1969)


“Going viral”?


Data

• Examined one year of tweets from July 2011 to July 2012

• Restricted to 1.4 billion tweets containing links to top news, videos, images, and petitions sites

• Aggregated tweets by URL, resulting in 1 billion distinct “events”

• Crawled friend list of each adopter

• Inferred “who got what from whom” to construct diffusion trees

• Characterized size and structure of trees


The Structural Virality of Online Diffusion

[Figure: adopters A, B, C, D, E ordered by time]

Group posts by URL


The Structural Virality of Online Diffusion

[Figure: adopters A, B, C, D, E ordered by time]

Label each friend who previously adopted as a potential parent


The Structural Virality of Online Diffusion

[Figure: adopters A, B, C, D, E ordered by time]

Select each node’s most recent adopting friend as its parent


The Structural Virality of Online Diffusion

[Figure: adopters A, B, C, D, E arranged by generations]

Characterize size and structure of trees
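The tree-construction steps on these slides (group posts by URL, label previously adopting friends as potential parents, pick the most recent one) can be sketched as follows for a single URL. This is an illustrative reading of the slides, not the authors' code; the function name `build_diffusion_forest`, the input shapes, and the toy data are assumptions.

```python
def build_diffusion_forest(adoptions, friends):
    """Assign each adopter of one URL a parent.

    adoptions: list of (user, time) pairs, one per adopter (assumed unique users).
    friends:   dict mapping a user to the set of users they follow.
    Returns a dict user -> parent, with None for users who have no
    previously adopting friend (the roots of the diffusion trees).
    """
    adopted_at = {}
    parent = {}
    for user, t in sorted(adoptions, key=lambda a: a[1]):
        # Potential parents: friends who adopted before this user did.
        candidates = [f for f in friends.get(user, ()) if f in adopted_at]
        # Choose the most recent adopter among them, if any.
        parent[user] = max(candidates, key=lambda f: adopted_at[f]) if candidates else None
        adopted_at[user] = t
    return parent


adoptions = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]
friends = {"B": {"A"}, "C": {"A", "B"}, "D": {"B"}, "E": {"A", "C"}}
print(build_diffusion_forest(adoptions, friends))
# {'A': None, 'B': 'A', 'C': 'B', 'D': 'B', 'E': 'C'}
```

In the toy example C follows both A and B, and is attached to B because B adopted more recently; the resulting parent map is what the size and structure measures are then computed on.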


Information diffusion
Cascade size distribution

[Figure: CCDF (0.00001% to 10%) vs. cascade size (1 to 10,000)]

Focus on the rare hits that get at least 100 adoptions


Quantifying structure

Measure the average distance between all pairs of nodes [6]

$$\nu(T) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}$$

[6] Wiener (1947); correlated with other possible metrics


Quantifying structure

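Measure the average distance between all pairs of nodes [6]

$$\nu(T) = \frac{2n}{n-1} \left[ \frac{1}{n} \sum_{S \in \mathcal{S}} |S| - \frac{1}{n^2} \sum_{S \in \mathcal{S}} |S|^2 \right]$$

where $\mathcal{S}$ is the set of subtrees of $T$ and $|S|$ is the number of nodes in subtree $S$.

[6] Wiener (1947); correlated with other possible metrics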
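The two expressions for structural virality (the pairwise-distance average and the subtree-size form) can be checked against each other on small trees. The sketch below assumes a single tree given as a child-to-parent map with at least two nodes; the function names are hypothetical, and the subtree form takes the sum over the subtree rooted at every node.

```python
from collections import deque


def virality_pairwise(parent):
    """Average shortest-path distance over all ordered pairs of distinct nodes."""
    adj = {u: set() for u in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    n = len(adj)
    total = 0
    for src in adj:  # breadth-first search from every node
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def virality_subtrees(parent):
    """Same quantity computed from the sizes of the subtrees rooted at each node."""
    n = len(parent)
    children = {u: [] for u in parent}
    root = None
    for child, par in parent.items():
        if par is None:
            root = child
        else:
            children[par].append(child)
    order, stack = [], [root]
    while stack:  # parents are appended to `order` before their children
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = {}
    for u in reversed(order):  # so children are sized before their parents
        size[u] = 1 + sum(size[c] for c in children[u])
    s1 = sum(size.values())
    s2 = sum(s * s for s in size.values())
    return (2 * n / (n - 1)) * (s1 / n - s2 / n ** 2)


star = {"A": None, "B": "A", "C": "A", "D": "A"}   # broadcast-like
chain = {"A": None, "B": "A", "C": "B", "D": "C"}  # viral-like
for tree in (star, chain):
    print(round(virality_pairwise(tree), 4), round(virality_subtrees(tree), 4))
# 1.5 1.5
# 1.6667 1.6667
```

The subtree form needs only one pass over the tree, whereas the pairwise form runs a search from every node, which matters for cascades with thousands of adopters. Even on four nodes the chain scores higher than the star, matching the intuition that multi-generation spreading is more "viral" than a single broadcast.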


Information diffusion
Size and virality by category

Remarkable structural diversity across categories

[Figure: CCDF (0.001% to 100%) vs. cascade size (100 to 10,000), and CCDF (0.001% to 100%) vs. structural virality (3 to 30), for Videos, Pictures, News, Petitions]


Information diffusion
Structural diversity

Size is a relatively poor predictor of structure

[Figure: structural virality vs. cascade size, in panels for Petitions, News, Pictures, and Videos]


Simulations

Simulate cascades with a simple SIR model [7], varying infectivity and degree skew

662 CHAPTER 21. EPIDEMICS

[Figure 21.2, panels (a) to (d): a contact network on nodes r, s, t, u, v, w, x, y, z at successive steps of an epidemic]

Figure 21.2: The course of an SIR epidemic in which each node remains infectious for a number of steps equal to $t_I = 1$. Starting with nodes y and z initially infected, the epidemic spreads to some but not all of the remaining nodes. In each step, shaded nodes with dark borders are in the Infectious (I) state and shaded nodes with thin borders are in the Removed (R) state.

Extensions to the SIR model. Although the contact network in the general SIR model can be arbitrarily complex, the disease dynamics are still being modeled in a simple way. Contagion probabilities are set to a uniform value $p$, and contagiousness has a kind of “on-off” property: a node is equally contagious for each of the $t_I$ steps while it has the disease.

However, it is not difficult to extend the model to handle more complex assumptions. First, we can easily capture the idea that contagion is more likely between certain pairs of nodes by assigning a separate probability $p_{v,w}$ to each pair of nodes $v$ and $w$ for which $v$ links to $w$ in the directed contact network. Here, higher values of $p_{v,w}$ correspond to closer contact and more likely contagion, while lower values indicate less intensive contact. We can also choose to model the infectious period as random in length, by assuming that an infected node has a probability $q$ of recovering in each step while it is infected, while leaving

[8]

[7] Kermack & McKendrick (1927)

[8] Easley & Kleinberg (2010)
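A minimal version of the SIR process described in the excerpt, with a uniform contagion probability $p$ and an infectious period of one step ($t_I = 1$), might look like the sketch below. It runs on a contact network supplied by the caller; generating networks with a particular degree skew, as the slide mentions, is not shown. The function name `simulate_sir` and the toy network are assumptions for illustration.

```python
import random


def simulate_sir(contacts, seeds, p, rng=None):
    """Simulate one SIR cascade with t_I = 1.

    contacts: dict mapping a node to the nodes it can infect (directed).
    seeds:    initially infected nodes.
    p:        probability that an infectious node infects each susceptible contact.
    Returns a dict node -> infecting node (None for seeds), i.e. the cascade tree(s).
    """
    rng = rng or random.Random()
    parent = {s: None for s in seeds}
    infectious = list(seeds)
    while infectious:
        next_wave = []
        for u in infectious:
            for v in contacts.get(u, ()):
                # Only susceptible nodes can be infected; each contact gets one chance.
                if v not in parent and rng.random() < p:
                    parent[v] = u
                    next_wave.append(v)
        # After one step the current wave is removed and the new wave is infectious.
        infectious = next_wave
    return parent


contacts = {"y": ["x", "t"], "z": ["w"], "x": ["r"], "t": ["s"], "w": ["u", "v"]}
cascade = simulate_sir(contacts, ["y", "z"], p=0.5, rng=random.Random(1))
print(len(cascade), "nodes infected")
```

The returned parent map has the same shape as an inferred diffusion tree, so the size and structural-virality measures used on the observed cascades can be applied to simulated ones as well.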


Simulations

This reproduces the observed marginal distributions of size andstructure

[Figure: scatter plot of structural virality (y-axis, log scale, ticks at 3, 10, 30, 100) against cascade size (x-axis, log scale, ticks from 100 to 100,000)]

... but fails to account for the variance in structure given size
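For readers who want to compute the y-axis quantity themselves, here is a small sketch. It assumes the definition used in the structural virality paper cited in the summary of this talk: the average shortest-path distance between all pairs of nodes in a diffusion tree. The function name and the child-to-parent input format are illustrative choices, not the authors' code.

```python
from collections import deque


def structural_virality(parent):
    """Average pairwise distance between nodes of a diffusion tree.

    parent: dict mapping each non-root node to the node it adopted from
            (hypothetical input format; the root appears only as a value).
    """
    # Build an undirected adjacency list from the child -> parent edges.
    adjacency = {}
    for child, par in parent.items():
        adjacency.setdefault(child, set()).add(par)
        adjacency.setdefault(par, set()).add(child)
    n = len(adjacency)
    if n < 2:
        raise ValueError("need at least two nodes")
    total = 0
    for source in adjacency:
        # Breadth-first search gives distances from source to every node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # total counts every ordered pair once, so divide by n * (n - 1).
    return total / (n * (n - 1))


broadcast = structural_virality({1: 0, 2: 0, 3: 0})  # star: one hub, three leaves
chain = structural_virality({1: 0, 2: 1, 3: 2})      # path of four nodes
```

For the same size of four nodes, the star (a pure broadcast) scores lower than the chain (a pure person-to-person spread), which is why the measure can separate cascades that size alone cannot. The all-pairs search is quadratic in the number of nodes, so it is only practical for small trees.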


Page 87: NYC Data Science Meetup: Computational Social Science


Page 88: NYC Data Science Meetup: Computational Social Science


Page 89: NYC Data Science Meetup: Computational Social Science


Page 90: NYC Data Science Meetup: Computational Social Science


Page 91: NYC Data Science Meetup: Computational Social Science


Page 92: NYC Data Science Meetup: Computational Social Science

Information diffusion

Summary

• Most cascades fail, resulting in fewer than two adoptions, on average

• Of the hits that do succeed, we observe a wide range of diverse diffusion structures

• It’s difficult to say how something spread given only its popularity

• “The structural virality of online diffusion”, Anderson, Goel, Hofman & Watts (Under review.)


Page 93: NYC Data Science Meetup: Computational Social Science

Outline

Search predictions

[Figure: "Right Round", rank (10 to 40) by week, Mar-09 to Aug-09, Billboard vs. Search]

Web diversity

[Figure: daily per-capita pageviews (0 to 70) by demographic group: Income (over/under $25k), Race (Black & Hispanic/White), Education (no college/some college), Age (over/under 65), Sex (female/male)]

Information diffusion


Page 94: NYC Data Science Meetup: Computational Social Science

Conclusion


Page 95: NYC Data Science Meetup: Computational Social Science

Lessons learned

Data jeopardy

Regardless of scale, it’s difficult to find the right questions to ask of the data


Page 96: NYC Data Science Meetup: Computational Social Science

Lessons learned

Hacking

Cleaning and normalizing data is a substantial amount of the work


Page 97: NYC Data Science Meetup: Computational Social Science

Lessons learned

Modeling

Understanding human activity is often useful for detecting malicious activity


Page 98: NYC Data Science Meetup: Computational Social Science

Lessons learned

Modeling

Simple methods (e.g., linear models) work surprisingly well, especially with lots of (diverse) data


Page 99: NYC Data Science Meetup: Computational Social Science

Thanks. Questions?

[email protected]

Also, we’re hiring: bit.ly/msrnyc_appsci

bit.ly/msrnyc_eng
