Informa(on-‐Theore(c Tools for Social Media
Part II
Greg Ver Steeg and Aram Galstyan
• Informa(on Theory Basics – Entropy, MI, Discrete IT es(mators – Entropy es(ma(on demo – Example: predic(ng verdicts from text
• Social network dynamics – Entropic measures for (me series – Transfer entropy & Granger causality – Examples
Coffee Break (4:00-‐4:30) • Content on social networks
– Represen(ng content – Con(nuous IT es(mators
3
URLs, memes, hash tags, re-‐tweets spread from node to node on a network
Infected
Not Infected
Structural measures: page-‐rank, centrality -‐ Node dynamics ignored
Dynamic measures: E.g. number of retweets a user gets -‐ Requires explicit info encoding
Informa(on-‐Theore(c approach: -‐ Use node dynamics -‐ Model-‐free: no encoding assumed -‐ Predic<ve interpreta<on
I’m going to Boston!
Lobster is hard to eat!
Alice Bob
Noisy Channel
Informa(on in human speech
I’m going to Boston! I like borscht.
Alice Bob
How much information is communicated?
Noisy Channel
Informa(on in human speech
Informa(on in human speech
• Mutual informa(on between Alice and Bob’s statements:
• Includes such hard to quan(fy probabili(es as:
• And, this is different for each pair of people!
6
H(A : B) =
X
A,B
P (A,B) log
P (A,B)
P (A)P (B)
Sum over all possible statements!
Pr(Alice says “I’m going...”, then Bob says “I like borscht”)
You’re so 10 dimensional
7
Dimen
sion
ality
10??
105
102
10
1
Possible human statements
Bag of words
Compressed bag-‐of-‐words (LDA topic models) 103
Entropy from histogram Entropy from kernel density es(mators
Non-‐parametric (direct) entropy es(mators Average Twi[er user* *Effec<vely
Past tweet for X
Past tweet for Y
X’s future tweet User X
User Y Time
XP =
0
@00.3. . .
1
A
Y P =
0
@0.70.2. . .
1
A
XF =
0
@0.60.4. . .
1
A
Time User X
User Y
• N samples of tweet exchanges
• Convert to an abstract representa(on
• Es(mate transfer entropy: a measure of Y’s predic(vity of X
TE for Content Dynamics
TEY!X = H(XF : Y P|XP)
In Content Space, X’s and Y’s tweets
Tweets about health care reform
Tweets about taxes
Tweets about the 2012 election
x x x
x x
x y
y y y
y
y
Low Transfer Entropy – X is already predictable
y x
High transfer entropy : x’s tweet was more predictable from y’s, recent tweet than from his own past tweets
y x
ITY!X = H(XF : Y P|XP)
Past tweet for X
Past tweet for Y
X’s future tweet User X
User Y Time
XP =
0
@00.3. . .
1
A
Y P =
0
@0.70.2. . .
1
A
XF =
0
@0.60.4. . .
1
A
Time User X
User Y
• N samples of tweet exchanges
• Convert to an abstract representa(on
• Es(mate transfer entropy: a measure of Y’s predic(vity of X
Convert to an abstract representa(on
Easiest: we’ll use LDA topic model vectors from gensim. Best?
HOLY FLYING COWS FROM SPACE WHY DID THIS SONG DO BAD IF IT'S SO INCREDIBLE.
0
BBBB@
0.010.320.610.04. . .
1
CCCCA
Topic Modeling • A bag-of-words representation for text
• A document is a sequence of N words denote by d = (w1,w2,…,wN)
• A corpus is a collection of M documents denoted by D = {d1,d2,…,dM}
Latent Dirichlet Alloca(on • Latent Dirichlet allocation (LDA) is a generative probabilistic
model of a document corpus. • Generative process for each document d in a corpus D:
1. Choose N ~ Poisson(ξ) – number of words in d
2. Choose θ ~ Dir(α) – the weights of different topics in d
3. For each of the N words wn
(a) Choose a topic zn ~ Multinomial(θ)
(b) Choose a word wn from p(wn|zn, β), a multinomial probability conditioned on the topic zn
4. Inference and Learning
(a) Topics and associated word probabilities
(b) Topic mixture of each document
Twi[er Study
• 1 month of tweets • ~2k users, snowball sampling, constrained to Middle East • 768k tweets • PREPROCESSING:
• No RTs • [a-zA-Z] only, lowercased • No punctuation • No stop words
• LDA topic model with gensim (Python package available online)
Data courtesy of Sofus Macskassy, “On the study of social interactions in twitter.”
ITY!X = H(XF : Y P|XP)
Past tweet for X
Past tweet for Y
X’s future tweet User X
User Y Time
XP =
0
@00.3. . .
1
A
Y P =
0
@0.70.2. . .
1
A
XF =
0
@0.60.4. . .
1
A
Time User X
User Y
• N samples of tweet exchanges
• Convert to an abstract representa(on
• EsCmate transfer entropy: a measure of Y’s predic(vity of X
Non-parametric entropy estimators • No binning of data • No estimating probability density • Nice convergence properties
ITY!XXP, Y P, XF =
0
@0.60.4. . .
1
A ,
0
@0.10.3. . .
1
A ,
0
@0.20.8. . .
1
A
~100 samples of ~100-dim topic vectors!
(Luckily, most users’ activity is effectively low-d…)
Es(mate transfer entropy
Entropy from con(nuous random variables
• We need probability distribu(ons, usually we only have samples
Learned Distribu(on Samples
Kernel func(on
H(X) = E(log 1/p(x))
Advantage of bin-‐less es(mator Differential entropy for a Gaussian in 3 dimensions, as a function of N, the number of samples
Binned methods
Binless method
From Victor, “Binless strategies for estimation of information for neural data”
Intro to bin-‐less entropy es(mator One way to write entropy: Given some samples xi~p(x), Now we don’t need p(x) globally, only local
estimates…
H(x) = Ex
[� log p(x)]
⇡ � 1
N
X
i
log p(xi)
Intro to bin-‐less entropy es(mator H(x) ⇡ � 1
N
X
i
log p(xi)
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê Ê
Ê Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
ÊÊÊ
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
-2 -1 1 2
-2
-1
1
2 Instead, we’ll estimate the density p(x) at each point xi
p(xi)
Intro to bin-‐less entropy es(mator H(x) ⇡ � 1
N
X
i
log p(xi)
Instead, we’ll estimate the density p(x) at each point xi
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
Ê
Ê Ê
Ê Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
Ê
Ê
ÊÊÊ
Ê
Ê
Ê
ÊÊ
Ê
Ê
Ê
-2 -1 1 2
-2
-1
1
2
ri i
p(xi) =
% points in ball i
Volume of ball i
k = 3
p(xi) ⇡ 3/N
⇡r
2i
/ d
N
X
i
log ri
Lp norm, p=2
See this famous paper for a more detailed account of how we get this correc(on term You could es(mate mutual informa(on:
ˆH(X) =
d
N
X
i
log ri + (N)� (k) + log cd
I(X : Y ) = E"
\p(x, y)
p(x), p(y)
#I(X : Y ) = H(X) + H(Y )� H(X,Y )
Mutual informa(on es(mator
/ log
K-nn
ny(i)
nx(i)
If the points in this box were due to uncorrelated chance:
k
N=
nx
N
ny
NNk
nx
ny
= 1
logN + log k � log nx
� log ny
= 0
I(X,Y ) = E [ (N) + (k)� (nx
+ 1)� (ny
+ 1)]
But for topic models? • Nice trick in a few dimensions, but if we pick a topic model
with 125 topics,
• Leads to a 375 dimensional space! We are estimating information transfer with as few as 100 samples!
• Ok, but is it REALLY 375 dimensional? • (answer: no! most people don’t use most topics)
• If not, does it matter that we wrote it that way? • (answer: no! The estimator relies on distances only)
XP, Y P, XF 2 R125
Example
0
@x
y
z
1
A ⇠ N
0
@
0
@000
1
A,
0
@4 3 13 4 11 1 2
1
A
1
A
H(X : Y |Z) = 0.357
H(X : Y ) = 0.413The example in “test.py” of NPEET
Example 0
@x
y
z
1
A ⇠ N
0
@
0
@000
1
A,
0
@4 3 13 4 11 1 2
1
A
1
A
x
(i) = (x(i), 0.01,�0.02, 0.014, . . . ...)
y
(i) = (y(i),�0.02,�0.03, 0.04, . . . ...)
z
(i) = (z(i), 0.04, 0.017, 0, . . . ...)
A total of 400 dimensions with small random noise.
Convergence of es(mators
Ê
ÊÊ Ê Ê Ê Ê
‡ ‡ ‡ ‡ ‡ ‡ ‡
ÏÏ Ï Ï Ï Ï
10 25 50 100 200 400 800-0.2
0.0
0.2
0.4
0.6
0.8
1.0
Samples
H`HX:Y
»ZL
Permute labels of samples
CMI estimator
CMI estimator with 400 extraneous dimensions
Number of ac(ve topics per user
770 total users
0 10 20 30 400
20
40
60
80
100
Active Topic Dimensions
Users
Require: standard devia(on of tweets for ac(ve topic dimension > 0.05
Convergence for some real-‐world data
Ê
‡‡ ‡ ‡ ‡
Ï Ï Ï Ï Ï ÏÚ
Ú Ú Ú Ú
10 25 50 100 200 400 800-0.2
-0.1
0.0
0.1
0.2
0.3
Samples
H`HX:Y
»ZL
Ê Ê ÊÊ Ê
10 200012
Permute labels of samples
Random tweets
Conversation (noy->lihi)
Copied tweets
Results
ITY!X = H(XF : Y P|XP)
Past tweet for X
Past tweet for Y
X’s future tweet User X
User Y Time
XP =
0
@00.3. . .
1
A
Y P =
0
@0.70.2. . .
1
A
XF =
0
@0.60.4. . .
1
A
Time User X
User Y
• N samples of tweet exchanges
• Convert to an abstract representa(on
• EsCmate transfer entropy: a measure of Y’s predic(vity of X
Histogram of transfer entropy
0.0 0.2 0.4 0.6 0.8 1.0 1.2Content Transfer
1.
10.
100
1000
10000
Edges
Very high transfer entropy pairs!
# Pairs
Transfer Entropy
geekword
sheikhali
karachibourse
sportsencounter
jawad1978
hussainhamad
dawn_com
paknews
abdullah1213
murtaza352
ibneadamvision
zaheerahmed_pk
x_meganews
mzaila
friendxpoint
aadyeel
shakirhusain
yasir_sudkb
zeeshan_ahmed
mabdullahkhan
abidbeli
rabiagarib
drawab
farhanmasood
ali360
jahanzaib81
rumaisam
faisalkapadia
tweetsmarter
laeeqrind
abdullahnasim
The “Friend” Network
geekword
sheikhali
karachibourse
sportsencounter
jawad1978
hussainhamad
dawn_com
paknews
abdullah1213
murtaza352
ibneadamvision
zaheerahmed_pk
x_meganews
mzaila
friendxpoint
aadyeel
shakirhusain
yasir_sudkb
zeeshan_ahmed
mabdullahkhan
abidbeli
rabiagarib
drawab
farhanmasood
ali360
jahanzaib81
rumaisam
faisalkapadia
tweetsmarter
laeeqrind
abdullahnasim
High TE edges (based on content dynamics)
geekword
sheikhali
karachibourse
sportsencounter
jawad1978
hussainhamad
dawn_com
paknews
abdullah1213
murtaza352
ibneadamvision
zaheerahmed_pk
x_meganews
mzaila
friendxpoint
aadyeel
shakirhusain
yasir_sudkb
zeeshan_ahmed
mabdullahkhan
abidbeli
rabiagarib
drawab
farhanmasood
ali360
jahanzaib81
rumaisam
faisalkapadia
tweetsmarter
laeeqrind
abdullahnasim
High TE edges (based on content dynamics)
geekword
sheikhali
karachibourse
sportsencounter
jawad1978
hussainhamad
dawn_com
paknews
abdullah1213
murtaza352
ibneadamvision
zaheerahmed_pk
x_meganews
mzaila
friendxpoint
aadyeel
shakirhusain
yasir_sudkb
zeeshan_ahmed
mabdullahkhan
abidbeli
rabiagarib
drawab
farhanmasood
ali360
jahanzaib81
rumaisam
faisalkapadia
tweetsmarter
laeeqrind
abdullahnasim
High TE edges (based on content dynamics)
geekword: #Skype for #Windows gets deep rooted #Facebook Integra(on h[p://bit.ly/cb7UOj #SocialNetwork sheikhali: #Skype for #Windows gets deep rooted #Facebook Integra(on h[p://bit.ly/cb7UOj #SocialNetwork sheikhali: @l3v5y nice one geekword: #Windows Phone 7 to get copy/paste feature in early 2011 h[p://bit.ly/a9AfF5 #Wp7 #Microsom #gadgets sheikhali: #Windows Phone 7 to get copy/paste feature in early 2011 h[p://bit.ly/a9AfF5 #Wp7 #Microsom #gadgets geekword: #Windows Phone 7 makes a guest appearance on #HTC #HD2 h[p://bit.ly/aUJmJp #WP7 sheikhali: #Windows Phone 7 makes a guest appearance on #HTC #HD2 h[p://bit.ly/aUJmJp #WP7 geekword: Where to watch #Apple’s Back to the Mac event streamed live h[p://goo.gl/o/843kl #gadgets #newsreviews #macbookair sheikhali: How to watch live streaming of #Apple's Back to the #Mac Event h[p://bit.ly/bGJ4w2 #gadgets #Macbook sheikhali: @geekword trending post: #Ultrasn0w #iOS 4.1 #unlock for #iPhone 3G(S) will go live two days amer the iOS 4.2 release h[p://bit.ly/9QKcNB geekword: #PwnageTool 4.1 unleashed brings iOS 4.1/3.2.2 #jailbreak for your #iDevice h[p://bit.ly/cn50Qu #Apple #jbiPhone sheikhali: #PwnageTool 4.1 unleashed brings iOS 4.1/3.2.2 #jailbreak for your #iDevice h[p://bit.ly/cn50Qu #Apple #jbiPhone geekword: @tweetmeme How to watch live streaming of #Apple's Back to the #Mac Event h[p://bit.ly/bGJ4w2 #gadgets #Macbook sheikhali: @tweetmeme How to watch live streaming of #Apple's Back to the #Mac Event h[p://bit.ly/bGJ4w2 #gadgets #Macbook geekword: #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 h[p://bit.ly/bz6dv8 #jbiPhone #Howto sheikhali: #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 h[p://bit.ly/bz6dv8 #jbiPhone #Howto geekword: @tweetmeme #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 h[p://bit.ly/bz6dv8 #jbiPhone #Howto sheikhali: @tweetmeme #Guide to #jailbreak iOS 4.1 using #PwnageTool 4.1 h[p://bit.ly/bz6dv8 #jbiPhone #Howto
-‐No follows -‐No retweets -‐Random order leads to bi-‐directed transfer
No following No men(ons No RT Different URL Different Hash Different wording
Asymmetric: Temporally, only one order occurs (mza then zah) It’s predictable but is it causal?
Seconds
Hours
0.24
0.01
Social influence Previous examples were predictable but not social • Can we use men(ons to check if we capture social behavior?
• Men(ons != Social
• We constrain to a subset of users who use men(ons in conversa(on
aya_bieber3: @jus(nbieber africa but not israel :( aya_bieber3: @jus(nbieber i'm excited to see this video ♥ i love u aya_bieber3: @jus(nbieber no(ce ur amazing isralis fans? (: ♥ aya_bieber3: @jus(nbieber i just want u to no(ce me or to ur fans in israel! but.. i guess u'll never do it :( aya_bieber3: @jus(nbieber haha we have the same number of followers !! ♥ aya_bieber3: @jus(nbieber I will never say never un(l ull tweet me !!! aya_bieber3: @jus(nbieber I will never say never un(l ull tweet me !! ♥ aya_bieber3: @jus(nbieber we have the same number of followers haha aya_bieber3: @jus(nbieber I love uuuu <3 aya_bieber3: @jus(nbieber heyy jus(n how r u? ((: aya_bieber3: @jus(nbieber it's weird but all the (mes u no(ced me (2 (mes haha not really no(ce) were when i didn't mean u to do that (: love uu ♥ aya_bieber3: @jus(nbieber u know i love u? (:
Reconstruc(ng men(on graph tabankhamosh: PEPCO plants in the seraiki belt belch un-‐scrubbed pollutants into the air, sicken the ci(zens & dump sludge in the water table. #taraqqi enggandy: NEW VERSION OF TWITTER IS HERE … shahidsaeed: @tabankhamosh ISI officers will be sent to PEPCO, WAPDA, NEPRA, PTA, PR, PIA, PSM, NBP, etc and fix everything. ISI zindabad tabankhamosh: Revenge of the Seraikis = The Punjabi strain of the Taliban. #oy #vey fzzzkhan: u got it? :O RT @ENggANdY NEW VERSION OF TWITTER IS HERE .. shahidsaeed: @tabankhamosh Seraikis meant for cul(va(ng fields so Punjab speaking South-‐Punjabi people fight Jihad in Kashmir,Chechya,Phillipines,etc enggandy: @fzzzKhan YEAH I THINK SO ... YOU GOT IT ??? SPLIT SCREEN VERSION ? fzzzkhan: @ENggANdY no :( m s(ll wai(ng for it
enggandyfzzzkhan
tabankhamoshshahidsaeed
93 edges, 38 nodes
Reconstruc(ng men(on graph Top 4 edges according to transfer entropy are correct: "tabankhamosh", "shahidsaeed”, 0.110 "noy_shahar", "lihifarag”, 0.0987 "enggandy", "fzzzkhan”, 0.0976 "noy_shahar", "reutgolan”, 0.0975
Metric: Probability that a true edge has higher transfer entropy than a false edge
AUC = 0.648
Null model AUC = 0.5 (w/ SE = 3.5%)
Men(on edges
Top transfer entropy examples
Top transfer entropy examples
93 edges, 38 nodes
Noy_shahar
reutgolan lihifarag Tri-‐lingual friends
93 edges, 38 nodes
Summary
• Go beyond followers, RT, #hash, URL, like: use the text!
• Transfer entropy: Recover predictive links from content dynamics
Challenges/Future work:
• Better/different content representation, e.g. style instead of content (LIWC?)
• Estimation: Trade-off between richness of data representation and amount of data
• Informa(on Theory Basics – Entropy, MI, Discrete IT es(mators – Entropy es(ma(on demo – Example: predic(ng verdicts from text
• Social network dynamics – Entropic measures for (me series – Transfer entropy & Granger causality – Examples
Coffee Break (4:00-‐4:30) • Content on social networks
– Represen(ng content – Con(nuous IT es(mators
Wrap up!
I.T. a flexible, powerful approach for complex systems…
Es(ma(on is the main problem, but not insurmountable…
Useful measures like transfer entropy
Useful measures like transfer entropy
Links
• Tutorial website: (Slides/references) – h[p://www.isi.edu/~galstyan/icwsm13/
• Somware – NPEET: www.isi.edu/~gregv/npeet.html – Thoth: www.thoth-‐python.org
• Contact: {gregv,galstyan}@isi.edu