Information-Theoretic Tools for Social Media, Part II. Greg Ver Steeg and Aram Galstyan


Post on 06-Aug-2020


Page 1: Information-Theoretic Tools for Social Media, Part II (gregv/part3.pdf)

Information-Theoretic Tools for Social Media

Part II

Greg Ver Steeg and Aram Galstyan

Page 2:

• Information Theory Basics
  – Entropy, MI, discrete IT estimators
  – Entropy estimation demo
  – Example: predicting verdicts from text

• Social network dynamics
  – Entropic measures for time series
  – Transfer entropy & Granger causality
  – Examples

Coffee Break (4:00-4:30)

• Content on social networks
  – Representing content
  – Continuous IT estimators

Page 3:

URLs, memes, hash tags, re-tweets spread from node to node on a network

[Figure: network with nodes marked "Infected" and "Not Infected"]

Structural measures: page-rank, centrality
- Node dynamics ignored

Dynamic measures: e.g. number of retweets a user gets
- Requires explicit info encoding

Information-theoretic approach:
- Use node dynamics
- Model-free: no encoding assumed
- Predictive interpretation

Page 4:

Information in human speech

[Figure: Alice and Bob communicating over a "Noisy Channel"]

Alice: I’m going to Boston!

Bob: Lobster is hard to eat!

Page 5:

Information in human speech

[Figure: Alice and Bob communicating over a "Noisy Channel"]

Alice: I’m going to Boston!

Bob: I like borscht.

How much information is communicated?

Page 6:

Information in human speech

• Mutual information between Alice and Bob’s statements:

$H(A : B) = \sum_{A,B} P(A,B) \log \frac{P(A,B)}{P(A)\,P(B)}$

(Sum over all possible statements!)

• Includes such hard-to-quantify probabilities as:

Pr(Alice says “I’m going...”, then Bob says “I like borscht”)

• And this is different for each pair of people!
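For a vocabulary small enough to count, the mutual information sum can be evaluated directly from observed (Alice, Bob) statement pairs. The following is a minimal sketch, not the authors' code; the function name and the two-statement toy data are hypothetical, and the result is in nats.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of H(A:B) in nats from a list of (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # P(A,B) * log[P(A,B) / (P(A) P(B))], with probabilities as counts / n
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi

# Hypothetical toy data: Bob's reply always matches Alice's statement.
copied = [("boston", "boston"), ("borscht", "borscht")] * 50
# Hypothetical toy data: Bob's reply is unrelated to Alice's statement.
unrelated = [("boston", "boston"), ("boston", "borscht"),
             ("borscht", "boston"), ("borscht", "borscht")] * 25

mi_copied = mutual_information(copied)        # equals log 2 for this data
mi_unrelated = mutual_information(unrelated)  # equals 0 for this data
```

With real statements the joint table is far too large to fill this way, which is the point the slide makes.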

Page 7:

You’re so 10 dimensional

[Figure: a dimensionality scale with ticks at 10^??, 10^5, 10^3, 10^2, 10 and 1. Labels along the scale: "Possible human statements"; "Bag of words"; "Compressed bag-of-words (LDA topic models)"; "Entropy from histogram"; "Entropy from kernel density estimators"; "Non-parametric (direct) entropy estimators"; "Average Twitter user*" (*effectively)]

Page 8:

TE for Content Dynamics

[Figure: timelines for User X and User Y, marking the past tweet for X, the past tweet for Y, and X’s future tweet]

• N samples of tweet exchanges

• Convert to an abstract representation:

$X^P = (0, 0.3, \ldots)^T$, $Y^P = (0.7, 0.2, \ldots)^T$, $X^F = (0.6, 0.4, \ldots)^T$

• Estimate transfer entropy: a measure of Y’s predictivity of X

$TE_{Y \to X} = H(X^F : Y^P \mid X^P)$
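To make the conditional mutual information H(XF : YP | XP) concrete, here is a minimal sketch for the simplest case of discrete symbol sequences, using plug-in (counting) probabilities. This is an illustration only: the tutorial works with continuous topic vectors and non-parametric estimators, and the function name and toy sequences below are hypothetical.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in TE from y to x, H(XF : YP | XP), in nats, for equal-length sequences."""
    # Each sample is (X's future symbol, Y's past symbol, X's past symbol).
    triples = [(x[t + 1], y[t], x[t]) for t in range(len(x) - 1)]
    n = len(triples)
    c_all = Counter(triples)
    c_future_past = Counter((xf, xp) for xf, _, xp in triples)
    c_source_past = Counter((yp, xp) for _, yp, xp in triples)
    c_past = Counter(xp for _, _, xp in triples)
    te = 0.0
    for (xf, yp, xp), c in c_all.items():
        # p(xf, yp, xp) * log[ p(xf, yp | xp) / (p(xf | xp) p(yp | xp)) ]
        ratio = c * c_past[xp] / (c_future_past[(xf, xp)] * c_source_past[(yp, xp)])
        te += (c / n) * math.log(ratio)
    return te

# Hypothetical toy data: X repeats whatever Y said one step earlier.
random.seed(0)
y = [random.randint(0, 1) for _ in range(2000)]
x = [0] + y[:-1]

te_y_to_x = transfer_entropy(x, y)  # large: Y's past predicts X's future
te_x_to_y = transfer_entropy(y, x)  # near zero: X's past says nothing new about Y
```

The asymmetry between the two directions is what makes the measure usable for directed edges.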

Page 9:

In content space, X’s and Y’s tweets

[Figure: tweets by x and y plotted as points in content space, in three regions: "Tweets about health care reform", "Tweets about taxes", "Tweets about the 2012 election"]

Low transfer entropy: X is already predictable

High transfer entropy: x’s tweet was more predictable from y’s recent tweet than from his own past tweets

Page 10:

$IT_{Y \to X} = H(X^F : Y^P \mid X^P)$

[Figure: timelines for User X and User Y, marking the past tweet for X, the past tweet for Y, and X’s future tweet]

• N samples of tweet exchanges

• Convert to an abstract representation:

$X^P = (0, 0.3, \ldots)^T$, $Y^P = (0.7, 0.2, \ldots)^T$, $X^F = (0.6, 0.4, \ldots)^T$

• Estimate transfer entropy: a measure of Y’s predictivity of X

Page 11:

Convert to an abstract representation

Easiest: we’ll use LDA topic model vectors from gensim. Best?

Example tweet: "HOLY FLYING COWS FROM SPACE WHY DID THIS SONG DO BAD IF IT'S SO INCREDIBLE."

Topic vector: $(0.01, 0.32, 0.61, 0.04, \ldots)^T$

Page 12:

Topic Modeling

• A bag-of-words representation for text

• A document is a sequence of N words, denoted by d = (w1, w2, …, wN)

• A corpus is a collection of M documents, denoted by D = {d1, d2, …, dM}

Page 13:

Latent Dirichlet Allocation

• Latent Dirichlet allocation (LDA) is a generative probabilistic model of a document corpus.

• Generative process for each document d in a corpus D:

1. Choose N ~ Poisson(ξ) – number of words in d

2. Choose θ ~ Dir(α) – the weights of different topics in d

3. For each of the N words wn

(a) Choose a topic zn ~ Multinomial(θ)

(b) Choose a word wn from p(wn|zn, β), a multinomial probability conditioned on the topic zn

4. Inference and Learning

(a) Topics and associated word probabilities

(b) Topic mixture of each document
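Steps 1 to 3 of the generative process can be sketched with the standard library alone. This is an illustrative sampler, not gensim's implementation and not an inference routine; the vocabulary, the topic-word table `beta` and the values of ξ and α are hypothetical toy choices.

```python
import math
import random

def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_dirichlet(rng, alpha):
    """Dirichlet draw obtained by normalizing independent Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def generate_document(rng, xi, alpha, beta, vocab):
    """beta[k][v] is p(word v | topic k). Returns (theta, words)."""
    n_words = sample_poisson(rng, xi)        # step 1: N ~ Poisson(xi)
    theta = sample_dirichlet(rng, alpha)     # step 2: theta ~ Dir(alpha)
    words = []
    for _ in range(n_words):                 # step 3
        z = rng.choices(range(len(alpha)), weights=theta)[0]  # (a) topic z_n
        w = rng.choices(vocab, weights=beta[z])[0]            # (b) word w_n
        words.append(w)
    return theta, words

# Hypothetical two-topic toy model over a four-word vocabulary.
rng = random.Random(0)
vocab = ["tax", "vote", "health", "care"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits "tax" / "vote"
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits "health" / "care"
theta, words = generate_document(rng, xi=8.0, alpha=[0.5, 0.5], beta=beta, vocab=vocab)
```

Inference (step 4) runs this process in reverse: given only the words, it recovers `beta` and each document's `theta`, and `theta` is the topic vector used as the abstract representation of a tweet.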

Page 14:

Twitter Study

• 1 month of tweets
• ~2k users, snowball sampling, constrained to Middle East
• 768k tweets
• PREPROCESSING:
  • No RTs
  • [a-zA-Z] only, lowercased
  • No punctuation
  • No stop words
• LDA topic model with gensim (Python package available online)

Data courtesy of Sofus Macskassy, “On the study of social interactions in twitter.”
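The preprocessing list can be sketched as a single function. Several details are assumptions because the slide does not spell them out: the stop-word set below is a tiny hypothetical one, retweets are detected by an "RT @" marker, and non-letter characters are treated as token separators (so URLs fall apart into letter runs).

```python
import re

# Assumption: a tiny illustrative stop-word list; the slides do not name the list used.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "it", "for"}

def preprocess(tweets):
    """Drop retweets, keep only [a-zA-Z] tokens, lowercase, remove stop words."""
    docs = []
    for text in tweets:
        # Assumption: a retweet is any tweet containing "RT @user".
        if re.search(r"\bRT\b\s*@", text):
            continue
        # Keeping only letter runs also discards punctuation and digits.
        tokens = re.findall(r"[a-z]+", text.lower())
        docs.append([t for t in tokens if t not in STOP_WORDS])
    return docs

docs = preprocess(["Lobster is hard to eat!", "RT @bob I'm going to Boston!"])
```

The resulting token lists are the bag-of-words documents that a topic model such as gensim's LDA would be trained on.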

Page 15:

$IT_{Y \to X} = H(X^F : Y^P \mid X^P)$

[Figure: timelines for User X and User Y, marking the past tweet for X, the past tweet for Y, and X’s future tweet]

• N samples of tweet exchanges

• Convert to an abstract representation:

$X^P = (0, 0.3, \ldots)^T$, $Y^P = (0.7, 0.2, \ldots)^T$, $X^F = (0.6, 0.4, \ldots)^T$

• Estimate transfer entropy: a measure of Y’s predictivity of X

Page 16:

Estimate transfer entropy

Non-parametric entropy estimators
• No binning of data
• No estimating probability density
• Nice convergence properties

$IT_{Y \to X}$

$X^P, Y^P, X^F = (0.6, 0.4, \ldots)^T, (0.1, 0.3, \ldots)^T, (0.2, 0.8, \ldots)^T$

~100 samples of ~100-dim topic vectors!

(Luckily, most users’ activity is effectively low-d…)

Page 17:

Entropy from continuous random variables

• We need probability distributions, but usually we only have samples

[Figure: samples, a kernel function, and the learned distribution]

$H(X) = E[\log 1/p(x)]$

Page 18:

Advantage of bin-less estimator

Differential entropy for a Gaussian in 3 dimensions, as a function of N, the number of samples

[Figure: curves labeled "Binned methods" and "Binless method"]

From Victor, “Binless strategies for estimation of information for neural data”

Page 19:

Intro to bin-less entropy estimator

One way to write entropy:

$H(x) = E_x[-\log p(x)]$

Given some samples $x_i \sim p(x)$,

$H(x) \approx -\frac{1}{N} \sum_i \log p(x_i)$

Now we don’t need p(x) globally, only local estimates…
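The sample-average form of the entropy can be checked on a case where p(x) is known. The sketch below draws from a one-dimensional Gaussian, averages -log p(x_i), and compares against the closed form 0.5 log(2πeσ²). The value of σ, the sample size and the seed are arbitrary illustrative choices.

```python
import math
import random

def gaussian_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sample_entropy(samples, density):
    """H(x) ~ -(1/N) * sum_i log p(x_i), in nats."""
    return -sum(math.log(density(x)) for x in samples) / len(samples)

rng = random.Random(0)
sigma = 2.0
xs = [rng.gauss(0.0, sigma) for _ in range(5000)]

estimate = sample_entropy(xs, lambda x: gaussian_pdf(x, sigma))
exact = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
```

Here the true density is plugged in at each sample; the bin-less estimator replaces it with a local estimate built from nearest-neighbor distances.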

Page 20:

Intro to bin-less entropy estimator

$H(x) \approx -\frac{1}{N} \sum_i \log p(x_i)$

Instead, we’ll estimate the density p(x) at each point xi

[Figure: scatter plot of samples, both axes from -2 to 2, with one point labeled p(xi)]

Page 21:

Intro to bin-less entropy estimator

$H(x) \approx -\frac{1}{N} \sum_i \log p(x_i)$

Instead, we’ll estimate the density p(x) at each point xi

[Figure: the same scatter plot, both axes from -2 to 2, with a ball of radius ri drawn around point i; k = 3; Lp norm, p = 2]

$p(x_i) = \frac{\text{\% points in ball } i}{\text{Volume of ball } i}$

$p(x_i) \approx \frac{3/N}{\pi r_i^2}$

$H(x) \propto \frac{d}{N} \sum_i \log r_i$

Page 22:

$\hat{H}(X) = \frac{d}{N} \sum_i \log r_i + \psi(N) - \psi(k) + \log c_d$

See this famous paper for a more detailed account of how we get this correction term

You could estimate mutual information:

$I(X : Y) = E\left[\log \frac{p(x, y)}{p(x)\,p(y)}\right]$

$I(X : Y) = H(X) + H(Y) - H(X,Y)$
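The entropy formula with its correction terms can be sketched in a few lines. This is a brute-force O(N²) illustration, not NPEET's implementation. It assumes r_i is the Euclidean distance from point i to its k-th nearest neighbor, c_d is the volume of the d-dimensional unit ball, ψ is the digamma function (evaluated here only at integers), and no two points coincide.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_entropy(points, k=3):
    """k-nearest-neighbor entropy estimate in nats; points is a list of tuples."""
    n, d = len(points), len(points[0])
    # log of the unit-ball volume c_d for the Euclidean norm
    log_cd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_r_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_r_sum += math.log(dists[k - 1])  # r_i: distance to the k-th neighbor
    return (d / n) * log_r_sum + digamma_int(n) - digamma_int(k) + log_cd

# Check against a standard 1-D Gaussian, whose entropy is 0.5 * log(2 * pi * e).
rng = random.Random(1)
pts = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
h_est = knn_entropy(pts, k=3)
h_true = 0.5 * math.log(2 * math.pi * math.e)
```

Summing three such estimates as H(X) + H(Y) - H(X,Y) gives a mutual information estimate, although the errors of the three terms do not cancel; the dedicated estimator on the next slide addresses that.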

Page 23:

Mutual information estimator

[Figure: K-nn box around point i in the (x, y) plane; labels: "∝ log", "K-nn", ny(i), nx(i)]

If the points in this box were due to uncorrelated chance:

$\frac{k}{N} = \frac{n_x}{N} \cdot \frac{n_y}{N}$

$\frac{N k}{n_x n_y} = 1$

$\log N + \log k - \log n_x - \log n_y = 0$

$I(X,Y) = E\left[\psi(N) + \psi(k) - \psi(n_x + 1) - \psi(n_y + 1)\right]$
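The final formula can be sketched for paired scalar samples. This is a brute-force illustration rather than NPEET's code. It assumes the max-norm in the joint space (so the neighborhood is a box), takes n_x and n_y to be the number of other points strictly closer than the k-th neighbor distance along each axis, and evaluates ψ only at integers.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_mutual_information(xs, ys, k=3):
    """k-NN mutual information estimate (nats) for paired scalar samples."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        dx = [abs(xs[i] - xs[j]) for j in range(n) if j != i]
        dy = [abs(ys[i] - ys[j]) for j in range(n) if j != i]
        # Half-width of the box: max-norm distance to the k-th joint neighbor.
        eps = sorted(max(a, b) for a, b in zip(dx, dy))[k - 1]
        n_x = sum(1 for a in dx if a < eps)  # points inside the vertical strip
        n_y = sum(1 for b in dy if b < eps)  # points inside the horizontal strip
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(n) + digamma_int(k) - total / n

# Correlated Gaussians with correlation rho have MI = -0.5 * log(1 - rho^2).
rng = random.Random(2)
rho = 0.8
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]

mi_correlated = knn_mutual_information(xs, ys)
mi_independent = knn_mutual_information(xs, noise)
mi_true = -0.5 * math.log(1 - rho ** 2)
```

For independent data the strip counts are large relative to k, the ψ terms cancel on average, and the estimate hovers around zero, matching the "uncorrelated chance" argument on the slide.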

Page 24:

But for topic models?

• Nice trick in a few dimensions, but if we pick a topic model with 125 topics,

$X^P, Y^P, X^F \in \mathbb{R}^{125}$

• Leads to a 375 dimensional space! We are estimating information transfer with as few as 100 samples!

• Ok, but is it REALLY 375 dimensional?
  • (answer: no! most people don’t use most topics)

• If not, does it matter that we wrote it that way?
  • (answer: no! The estimator relies on distances only)

Page 25:

Example

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 & 3 & 1 \\ 3 & 4 & 1 \\ 1 & 1 & 2 \end{pmatrix} \right)$

H(X : Y | Z) = 0.357

H(X : Y) = 0.413

The example in “test.py” of NPEET
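Both numbers follow from the covariance matrix in closed form, since for jointly Gaussian variables the entropies reduce to log-determinants: H(X : Y) = 0.5 log(|Σx||Σy| / |Σxy|) and H(X : Y | Z) = 0.5 log(|Σxz||Σyz| / (|Σz||Σxyz|)). The sketch below evaluates these in nats; the helper names are hypothetical and this is not the contents of NPEET's test.py.

```python
import math

def det(m):
    """Determinant of a 1x1, 2x2 or 3x3 matrix given as nested lists."""
    if len(m) == 1:
        return m[0][0]
    if len(m) == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def sub(cov, idx):
    """Covariance sub-matrix restricted to the variables in idx."""
    return [[cov[i][j] for j in idx] for i in idx]

cov = [[4, 3, 1], [3, 4, 1], [1, 1, 2]]  # variable order: x, y, z
X, Y, Z = 0, 1, 2

mi = 0.5 * math.log(det(sub(cov, [X])) * det(sub(cov, [Y])) / det(sub(cov, [X, Y])))
cmi = 0.5 * math.log(det(sub(cov, [X, Z])) * det(sub(cov, [Y, Z]))
                     / (det(sub(cov, [Z])) * det(cov)))

print(round(mi, 3), round(cmi, 3))  # 0.413 0.357
```

These exact values are the targets that a sample-based estimator should converge to.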

Page 26:

Example

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 & 3 & 1 \\ 3 & 4 & 1 \\ 1 & 1 & 2 \end{pmatrix} \right)$

$\mathbf{x}^{(i)} = (x^{(i)}, 0.01, -0.02, 0.014, \ldots)$

$\mathbf{y}^{(i)} = (y^{(i)}, -0.02, -0.03, 0.04, \ldots)$

$\mathbf{z}^{(i)} = (z^{(i)}, 0.04, 0.017, 0, \ldots)$

A total of 400 dimensions with small random noise.

Page 27:

Convergence of estimators

[Figure: estimated $\hat{H}(X : Y \mid Z)$ (vertical axis, -0.2 to 1.0) against the number of samples (horizontal axis: 10, 25, 50, 100, 200, 400, 800). Series: "Permute labels of samples", "CMI estimator", "CMI estimator with 400 extraneous dimensions"]

Page 28:

Number of active topics per user

[Figure: histogram of users (vertical axis, 0 to 100) by number of active topic dimensions (horizontal axis, 0 to 40); 770 total users]

Require: standard deviation of tweets for active topic dimension > 0.05
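The activity criterion can be sketched as a per-user count of topic dimensions whose spread across that user's tweets exceeds the threshold. Whether the slides used the population or the sample standard deviation is not stated; the sketch assumes the population form, and the three-topic vectors are hypothetical.

```python
import statistics

def active_topic_dimensions(topic_vectors, threshold=0.05):
    """Count topic dimensions whose standard deviation across tweets exceeds threshold."""
    columns = zip(*topic_vectors)  # one column per topic dimension
    return sum(1 for col in columns if statistics.pstdev(col) > threshold)

# Hypothetical user with three tweets over a three-topic model:
# the first two topics vary from tweet to tweet, the third is never used.
user_tweets = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.0), (0.5, 0.5, 0.0)]
n_active = active_topic_dimensions(user_tweets)  # 2
```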

Page 29:

Convergence for some real-world data

[Figure: estimated $\hat{H}(X : Y \mid Z)$ (vertical axis, -0.2 to 0.3) against the number of samples (horizontal axis: 10, 25, 50, 100, 200, 400, 800). Series: "Permute labels of samples", "Random tweets", "Conversation (noy->lihi)", "Copied tweets"]

Page 30:

Results

Page 31:

$IT_{Y \to X} = H(X^F : Y^P \mid X^P)$

[Figure: timelines for User X and User Y, marking the past tweet for X, the past tweet for Y, and X’s future tweet]

• N samples of tweet exchanges

• Convert to an abstract representation:

$X^P = (0, 0.3, \ldots)^T$, $Y^P = (0.7, 0.2, \ldots)^T$, $X^F = (0.6, 0.4, \ldots)^T$

• Estimate transfer entropy: a measure of Y’s predictivity of X

Page 32:

Histogram of transfer entropy

[Figure: histogram of the number of pairs (edges; log-scale vertical axis, 1 to 10000) against transfer entropy (content transfer; horizontal axis, 0.0 to 1.2). Annotation: "Very high transfer entropy pairs!"]

Page 33:

The “Friend” Network

[Figure: network graph with nodes labeled geekword, sheikhali, karachibourse, sportsencounter, jawad1978, hussainhamad, dawn_com, paknews, abdullah1213, murtaza352, ibneadamvision, zaheerahmed_pk, x_meganews, mzaila, friendxpoint, aadyeel, shakirhusain, yasir_sudkb, zeeshan_ahmed, mabdullahkhan, abidbeli, rabiagarib, drawab, farhanmasood, ali360, jahanzaib81, rumaisam, faisalkapadia, tweetsmarter, laeeqrind, abdullahnasim]

Page 34:

High TE edges (based on content dynamics)

[Figure: network graph over the same users as the “Friend” network on page 33]

Page 35:

High TE edges (based on content dynamics)

[Figure: network graph over the same users as the “Friend” network on page 33]

Page 36:

High TE edges (based on content dynamics)

[Figure: network graph over the same users as the “Friend” network on page 33]

Page 37:

geekword: #Skype for #Windows gets deep rooted #Facebook Integration http://bit.ly/cb7UOj #SocialNetwork
sheikhali: #Skype for #Windows gets deep rooted #Facebook Integration http://bit.ly/cb7UOj #SocialNetwork
sheikhali: @l3v5y nice one
geekword: #Windows Phone 7 to get copy/paste feature in early 2011 http://bit.ly/a9AfF5 #Wp7 #Microsoft #gadgets
sheikhali: #Windows Phone 7 to get copy/paste feature in early 2011 http://bit.ly/a9AfF5 #Wp7 #Microsoft #gadgets
geekword: #Windows Phone 7 makes a guest appearance on #HTC #HD2 http://bit.ly/aUJmJp #WP7
sheikhali: #Windows Phone 7 makes a guest appearance on #HTC #HD2 http://bit.ly/aUJmJp #WP7
geekword: Where to watch #Apple’s Back to the Mac event streamed live http://goo.gl/o/843kl #gadgets #newsreviews #macbookair
sheikhali: How to watch live streaming of #Apple's Back to the #Mac Event http://bit.ly/bGJ4w2 #gadgets #Macbook
sheikhali: @geekword trending post: #Ultrasn0w #iOS 4.1 #unlock for #iPhone 3G(S) will go live two days after the iOS 4.2 release http://bit.ly/9QKcNB
geekword: #PwnageTool 4.1 unleashed brings iOS 4.1/3.2.2 #jailbreak for your #iDevice http://bit.ly/cn50Qu #Apple #jbiPhone
sheikhali: #PwnageTool 4.1 unleashed brings iOS 4.1/3.2.2 #jailbreak for your #iDevice http://bit.ly/cn50Qu #Apple #jbiPhone
geekword: @tweetmeme How to watch live streaming of #Apple's Back to the #Mac Event http://bit.ly/bGJ4w2 #gadgets #Macbook
sheikhali: @tweetmeme How to watch live streaming of #Apple's Back to the #Mac Event http://bit.ly/bGJ4w2 #gadgets #Macbook
geekword: #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 http://bit.ly/bz6dv8 #jbiPhone #Howto
sheikhali: #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 http://bit.ly/bz6dv8 #jbiPhone #Howto
geekword: @tweetmeme #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 http://bit.ly/bz6dv8 #jbiPhone #Howto
sheikhali: @tweetmeme #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 http://bit.ly/bz6dv8 #jbiPhone #Howto

- No follows
- No retweets
- Random order leads to bi-directed transfer

Page 38:

Page 39:

- No following
- No mentions
- No RT
- Different URL
- Different hash
- Different wording

Page 40:

Asymmetric: temporally, only one order occurs (mza then zah)

It’s predictable, but is it causal?

[Figure labels: "Seconds", "Hours", 0.24, 0.01]

Page 41:

Social influence

Previous examples were predictable but not social

• Can we use mentions to check if we capture social behavior?

• Mentions != Social

• We constrain to a subset of users who use mentions in conversation

aya_bieber3: @justinbieber africa but not israel :(
aya_bieber3: @justinbieber i'm excited to see this video ♥ i love u
aya_bieber3: @justinbieber notice ur amazing isralis fans? (: ♥
aya_bieber3: @justinbieber i just want u to notice me or to ur fans in israel! but.. i guess u'll never do it :(
aya_bieber3: @justinbieber haha we have the same number of followers !! ♥
aya_bieber3: @justinbieber I will never say never until ull tweet me !!!
aya_bieber3: @justinbieber I will never say never until ull tweet me !! ♥
aya_bieber3: @justinbieber we have the same number of followers haha
aya_bieber3: @justinbieber I love uuuu <3
aya_bieber3: @justinbieber heyy justin how r u? ((:
aya_bieber3: @justinbieber it's weird but all the times u noticed me (2 times haha not really notice) were when i didn't mean u to do that (: love uu ♥
aya_bieber3: @justinbieber u know i love u? (:

Page 42:

Reconstructing mention graph

tabankhamosh: PEPCO plants in the seraiki belt belch un-scrubbed pollutants into the air, sicken the citizens & dump sludge in the water table. #taraqqi
enggandy: NEW VERSION OF TWITTER IS HERE …
shahidsaeed: @tabankhamosh ISI officers will be sent to PEPCO, WAPDA, NEPRA, PTA, PR, PIA, PSM, NBP, etc and fix everything. ISI zindabad
tabankhamosh: Revenge of the Seraikis = The Punjabi strain of the Taliban. #oy #vey
fzzzkhan: u got it? :O RT @ENggANdY NEW VERSION OF TWITTER IS HERE ..
shahidsaeed: @tabankhamosh Seraikis meant for cultivating fields so Punjab speaking South-Punjabi people fight Jihad in Kashmir,Chechya,Phillipines,etc
enggandy: @fzzzKhan YEAH I THINK SO ... YOU GOT IT ??? SPLIT SCREEN VERSION ?
fzzzkhan: @ENggANdY no :( m still waiting for it

[Figure: mention graph with node pairs enggandy, fzzzkhan and tabankhamosh, shahidsaeed]

Page 43:

Reconstructing mention graph

[Figure: mention edges; 93 edges, 38 nodes]

Top 4 edges according to transfer entropy are correct:
"tabankhamosh", "shahidsaeed", 0.110
"noy_shahar", "lihifarag", 0.0987
"enggandy", "fzzzkhan", 0.0976
"noy_shahar", "reutgolan", 0.0975

Metric: probability that a true edge has higher transfer entropy than a false edge

AUC = 0.648

Null model AUC = 0.5 (w/ SE = 3.5%)
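The metric can be computed directly as a pairwise comparison between the transfer entropies of true mention edges and those of non-edges. The sketch below is a brute-force version that counts ties as one half, which is an assumption. The two true-edge scores are taken from the list on this slide, while the false-edge scores are hypothetical.

```python
def edge_auc(true_scores, false_scores):
    """Probability that a true edge outscores a false edge; ties count one half."""
    wins = 0.0
    for t in true_scores:
        for f in false_scores:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5
    return wins / (len(true_scores) * len(false_scores))

true_te = [0.110, 0.0987]   # two of the true mention edges listed on the slide
false_te = [0.05, 0.10]     # hypothetical transfer entropies for non-edges
auc = edge_auc(true_te, false_te)  # 3 of the 4 comparisons are won: 0.75
```

A score of 0.5 means the ranking is no better than chance, which is why the null model is the reference point.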

Page 44:

Top transfer entropy examples

Page 45:

Top transfer entropy examples

[Figure: mention graph, 93 edges, 38 nodes, highlighting Noy_shahar, reutgolan and lihifarag: "Tri-lingual friends"]

Page 46:

Summary

[Figure: mention graph, 93 edges, 38 nodes]

•  Go beyond followers, RT, #hash, URL, like: use the text!

•  Transfer entropy: Recover predictive links from content dynamics

Challenges/Future work:

•  Better/different content representation, e.g. style instead of content (LIWC?)

•  Estimation: Trade-off between richness of data representation and amount of data

Page 47:

• Information Theory Basics
  – Entropy, MI, discrete IT estimators
  – Entropy estimation demo
  – Example: predicting verdicts from text

• Social network dynamics
  – Entropic measures for time series
  – Transfer entropy & Granger causality
  – Examples

Coffee Break (4:00-4:30)

• Content on social networks
  – Representing content
  – Continuous IT estimators

Wrap up!

Page 48:

I.T. a flexible, powerful approach for complex systems…

Page 49:

Estimation is the main problem, but not insurmountable…

Page 50:

Useful measures like transfer entropy

Page 51:

Useful measures like transfer entropy

Page 52:

Links

• Tutorial website (slides/references):
  – http://www.isi.edu/~galstyan/icwsm13/

• Software
  – NPEET: www.isi.edu/~gregv/npeet.html
  – Thoth: www.thoth-python.org

• Contact: {gregv,galstyan}@isi.edu