Download - Predictability of conversation partners
Predictability of
conversation partners
1Taro Takaguchi, 1Mitsuhiro Nakamura,2Nobuo Sato, 2Kazuo Yano,
and 1,3Naoki Masuda
1 Univ. of Tokyo
2 Central Research Lab., Hitachi, Ltd.
3 JST PRESTO
Phys. Rev. X 1, 011008 (2011)
Conventional assumptions
Mobility patterns in the physical space
Temporal order ofselecting interaction partners
by random (or Lévy) walk by Poisson process
Actual behavior: to what degree random / deterministic ?
Modeling human behavior such as
Predictability of mobility patterns (Song et al., 2010)
Mobility patterns are largely predictable.
Method: measuring the entropy of the sequence of locations of each cell-phone user
How about other components of human behavior?
Predictability of interaction partners
interacts with
- Characterize individuals in a social organization- Related to epidemics and information spreading
“partner sequence”
time
2 2 23 3 1
discard the timing of events
Data set: face-to-face interaction log in an office
Recording from two offices in Japanese companies- subjects: 163 individuals- period: 73 days- total conversation events: 51,879 times
Name tag with an infrared module(The Business Microscope system)
We used the data collected by World Signal Center, Hitachi, Ltd., Japan.(株)日立製作所ワールドシグナルセンタにより収集されたデータセットを使用した。
http://www.hitachi-hitec.com/jyouhou/business-microscope/
Details of data set
- A conversation event = communication between close tags
- Conversation events: undirected
- Time resolution = one minute
-Multiple events initiated within the same minute
→ determine their order at random
Three entropy measures (cf. Song et al., 2010)
Random entropy: interaction in a totally random manner
Uncorrelated entropy: heterogeneity among
Conditional entropy: second-order correlation
- for individual who has partners in the recording period,
: the probability to talk with
: to talk with immediately after with
cf. Entropy
Ex.) coin toss
“The uncertainty of the random variable”
: always tail
Example 1: random
1
2
3
i
Suppose that
Example 2: predictable
1
2
3
i
Suppose that
(same as example 1)
Quantifying the predictability
The information about the NEXT partnerthat is earned by knowing the PREVIOUS partner
Interpretation:
Mutual information of the partner sequence
predictable random
Aim of the study
Examine the predictability of conversation partners
- similar to that of mobility patterns
“Predictability” = the mutual information of the partner sequence
Data set: face-to-face interaction log from two offices in Japan
events
interacts with
2 2 23 3 1
i
Histogram of the entropies
H i
nu
mb
er o
f in
div
idu
als
1 2 3 4 5 6 7
0
10
20
30
40
H i0
H i1
H i2
(a)
2.0 2.5 3.0 3.5 4.0
2.0
2.5
3.0
3.5
4.0
H i1
(b)
H i2
, regardless of the value of
→ Partner sequences are predictable to some extent.
Significance of
Null hypothesis:“ is positive only because of the small data size.”
Compare with the value obtained from shuffled partner sequences
1 50 100 1500
1
2
3
i in the ascending order of I i
Ii
(a)
Empirical
Error bars: 99% confidential intervals of shuffled sequences
Main cause of the predictability
The bursty human activity pattern:characterized by the long-tailed inter-event intervals
“ tends to talk with within a short period after talking with .”
time
of a typical individual
Test 1 (1)
time
2 3 1
Partner sequence in a given day
with1
2
3
with
with
Purpose: examine the contribution of the bursty activity
extract the sequences with each partner
111 3 2 2 3 32 1
Test 1 (2)
with1
2
3
with
with
Next, shuffle the intervals within each partner
merge
Calculate of this shuffled sequence1
2
2 3 1111 32 2 3 32 1
time
Result of Test 1
1 50 100 1500
1
2
3
i in the ascending order of I i
I iburst
(a)
Generate 100 shuffled sequence
Empirical
Error bars: mean ± sd for the shuffled sequences
Contribution of burstiness
Test 2
events
2 2 23 3 12 3
of the merged sequenceCalculate
Purpose: examine the predictability after omitting the burstiness
merge into one
Result of Test 2
1 50 100 1500
1
2
3
i in the ascending order of I imerge
I imerge
(b)
Shuffle the merged sequence with replacementconditioned that no partner ID appears successively
Original
Error bars: 99% confidential intervals of shuffled sequences
is significantly large.
→ Predictability remains without the effect of the burstiness.
Summary so far
Partner sequence: predictable to some extent
Main cause of predictability: the bursty activity pattern
- Predictability remains after omitting the burstiness.
2.0 2.5 3.0 3.5 4.0
2.0
2.5
3.0
3.5
4.0
H i1
(b)
H i2
Predictability depends on individuals
I i
num
ber
of
ind
ivid
ual
s
0.5 1.0 1.5 2.0
0
10
20
30
40
Depends on ’s position in the social network?
Histogram of
Conversation network
Node : individual
Link : a pair of individuals having at least one event
Link weight : The total number of events for the pair
Undirected and weighted network
Degree
Strength
Mean weight
Node attributions
Correlation between and node attributions
No correlation Negative
Negative
Hypotheses
- Fix , small → abundance of weak links (with small weights)
□ Hypothesis about links
- “Strength of weak ties” (Granovetter, 1973)
□ Hypothesis about nodes
Strength of weak ties (Granovetter, 1973)
Weak links tend to connect different communities.Weak links offer non-redundant contacts.
“bridge”
Hypotheses
- Fix , small → abundance of weak links (with small weights)
□ Hypothesis about links
- Weak links connect different communities.
□ Hypothesis about nodes
• Individuals connecting different communities → large
• Individuals concealed in a community → small
Hypothesis about links
Relative neighborhood overlap of a link (Onnela et al., 2007)
Purpose: quantify the degree of being concealed in a community
Hypothesis about links
P cum(w)
(a)
<O
>w
0.2 0.4 0.6 0.8 1.0
0.30
0.35
0.40
0.45
“Strength of weak ties” hypothesis;a link with large tends to have a large→ increases with
averaged over the links with
fraction of the links with
:
Consistent with the hypothesis
Hypotheses
- Fix , small → abundance of weak links (with small weights)
□ Hypothesis about links
- Weak links connect different communities.
□ Hypothesis about nodes
• Individuals connecting different communities → large
• Individuals concealed in a community → small
Hypothesis about nodes
Clustering coefficient of a node (Watts and Strogatz, 1998)
Purpose: quantify the degree of being concealed in a community
Abundance of triangles
- 3-clique percolation community (Palla et al., 2005)
- Hierarchical structure (Ravász and Barabási, 2003)Nodes connecting communities → small
Clustering coefficient with link-weight threshold
: after eliminating the links with
Original CN
125
7
11 3
52
10
Expectation: triangles persist around individuals with small
for a fixed
Hypothesis about nodes
■: Pearson correlation coefficient between and
○: Partial correlation coefficient between
and
with and fixed
Hypotheses
- Fix , small → abundance of weak links (with small weights)
□ Hypothesis about links
- Weak links connect different communities.
□ Hypothesis about nodes
• Individuals connecting different communities → large
• Individuals concealed in a community → small
P cum(w)
(a)
<O
>w
0.2 0.4 0.6 0.8 1.0
0.30
0.35
0.40
0.45
Conclusions
Predictability of partner sequence
- Face-to-face interaction log in offices in Japanese companies
Conversation partners: predictable to some extent
- A main cause: the bursty activity pattern
- Significant predictability after omitting the burstiness
Dependence on the individual’s position in the conversation network
- “Strength of weak ties”
- INTER-community individuals → large
- INTRA-community individuals → small
Full paper is available
Taro Takaguchi, Mitsuhiro Nakamura, Nobuo Sato, Kazuo Yano, and Naoki Masuda,“Predictability of conversation partners”,Phys. Rev. X 1, 011008 (2011).