ANALYZING THE STRUCTURE AND DYNAMICS OF

MULTILAYER NETWORKS: AN APPLICATION CENTRIC STUDY

Soumajit Pramanik

Thesis submitted to the Indian Institute of Technology, Kharagpur

for award of the degree

of

Doctor of Philosophy

by

Soumajit Pramanik

Under the supervision of

Prof. Bivas Mitra

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

June 2019

©2019 Soumajit Pramanik. All rights reserved.

APPROVAL OF THE VIVA-VOCE BOARD

Date: / / 20

Certified that the thesis entitled “Analyzing the Structure and Dynamics of Multilayer Networks: An Application Centric Study” submitted by Soumajit Pramanik to the Indian Institute of Technology, Kharagpur, for the award of the degree of Doctor of Philosophy has been accepted by the external examiners and that the student has successfully defended the thesis in the viva-voce examination held today.

Prof. Niloy Ganguly (Member of DSC)

Prof. Animesh Mukherjee (Member of DSC)

Prof. Raja Datta (Member of DSC)

Prof. Bivas Mitra (Supervisor)

Prof. Arnab Bhattacharya (External Examiner)

Prof. Dipanwita Roychowdhury (Chairman (HOD, CSE))

CERTIFICATE

This is to certify that the thesis entitled “Analyzing the Structure and Dynamics of Multilayer Networks: An Application Centric Study”, submitted by Soumajit Pramanik to the Indian Institute of Technology, Kharagpur, for the award of the degree of Doctor of Philosophy, is a record of bona fide research work carried out by him under my supervision and guidance. The thesis, in my opinion, is worthy of consideration for the award of the degree of Doctor of Philosophy of the Institute. To the best of my knowledge, the results embodied in this thesis have not been submitted to any other University or Institute for the award of any other Degree or Diploma.

Bivas Mitra

Assistant Professor

CSE, IIT Kharagpur

Date:

DECLARATION

I certify that

a. The work contained in this thesis is original and has been done by myself under the general supervision of my supervisors.

b. The work has not been submitted to any other Institute for any degree or diploma.

c. I have followed the guidelines provided by the Institute in writing the thesis.

d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the Institute.

e. Whenever I have used materials (data, theoretical analysis, figures, and text) from other sources, I have given due credit to them by citing them in the text of the thesis and giving their details in the references.

f. Whenever I have quoted written materials from other sources, I have put them under quotation marks and given due credit to the sources by citing them and giving required details in the references.

Soumajit Pramanik

Author’s Biography

Soumajit Pramanik received his B.Tech. degree in 2011 from the Department of Computer Science and Engineering, St. Thomas’ College of Engineering and Technology, Kolkata. He then joined the Indian Statistical Institute, Kolkata, to pursue an M.Tech. degree in Computer Science, which he completed in 2013. Since then, he has been pursuing a PhD degree in the Department of Computer Science and Engineering at the Indian Institute of Technology Kharagpur. During his PhD tenure, he received the SAP Labs India doctoral fellowship (2014-2017) and several student travel grants to participate in international conferences. His research interests include social networks, multilayer networks, machine learning and information retrieval.

Publications from the Thesis

Journals

1. Soumajit Pramanik, Rajarshi Haldar, Anand Kumar, Sayan Pathak and Bivas Mitra, “Deep Learning driven Venue Recommender for Event-based Social Networks”, Transactions on Knowledge and Data Engineering (TKDE), IEEE, April, 2019.

2. Soumajit Pramanik, Mohit Sharma, Maximilien Danisch, Qinna Wang, Jean-Loup Guillaume and Bivas Mitra, “Easy-Mention: A model driven mention recommendation heuristic to boost your tweet popularity”, International Journal of Data Science and Analytics (JDSA), pages 1-17, Springer, March, 2018.

3. Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Jean-Loup Guillaume and Bivas Mitra, “Modeling cascade formation in Twitter amidst mentions and retweets”, Social Network Analysis and Mining (SNAM), 7(1):41, Springer, August, 2017.

Conferences

1. Soumajit Pramanik, Surya Teja Gora, Ravi Sundaram, Niloy Ganguly and Bivas Mitra, “On the Migration of Researchers across Scientific Domains”, The 13th International AAAI Conference On Web And Social Media (ICWSM), Munich, Germany, June 11-14, 2019.

2. Soumajit Pramanik, Raphael Tackx, Anchit Navelkar, Jean-Loup Guillaume and Bivas Mitra, “Discovering Community Structure in Multilayer Networks”, The 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 611-620, Tokyo, Japan, October 19-21, 2017.

3. Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Sumanth Bandi, Anand Kumar, Jean-Loup Guillaume and Bivas Mitra, “On the Role of Mentions on Tweet Virality”, The 3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 204-213, Montreal, Canada, October 17-19, 2016.

4. Soumajit Pramanik, Midhun Gundapuneni, Sayan Pathak and Bivas Mitra, “Can I foresee the success of my Meetup group?”, The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 366-373, San Francisco, CA, USA, August 18-21, 2016.

5. Soumajit Pramanik, Midhun Gundapuneni, Sayan Pathak and Bivas Mitra, “Predicting Group Success in Meetup”, The 10th International AAAI Conference On Web And Social Media (ICWSM), pages 663-666, Cologne, Germany, May 17-20, 2016.

6. Soumajit Pramanik, Pranay Hasan Yerra and Bivas Mitra, “Whom-to-Interact: Does Conference Networking Boost Your Citation Count?”, 2nd ACM IKDD Conference on Data Sciences (CoDS), pages 39-48, Bangalore, India, March 18-21, 2015.

Posters and doctoral symposiums

1. Soumajit Pramanik, Maximilien Danisch, Qinna Wang, Jean-Loup Guillaume and Bivas Mitra, “Analyzing the Impact of Mentioning in Twitter”, NetSci, Zaragoza, Spain, June 1-5, 2015. (Poster)

2. Soumajit Pramanik, Pranay Hasan Yerra and Bivas Mitra, “Analyzing the Impact of Interactions on Information Flow in Citation Network”, NetSci, Zaragoza, Spain, June 1-5, 2015. (Poster)

3. Soumajit Pramanik and Bivas Mitra, “Influence of Interactions on the Evolution of Scientific Citations and Collaborations: A Multiplex Network Approach”, European Conference on Complex Systems (ECCS), Lucca, Italy, September 22-26, 2014. (Poster)

4. Soumajit Pramanik and Bivas Mitra, “Influence of Interaction Events on the Evolution of Scientific Citations and Collaborations”, Doctoral Symposium, 8th ACM International Conference on Distributed Event-Based Systems (DEBS), Mumbai, India, May 26-29, 2014.

5. Soumajit Pramanik and Bivas Mitra, “Influence of Interactions on the Evolving Citation Network”, Xerox Research Center India (XRCI) Open, Bangalore, India, March 14, 2014. (Awarded “Best Poster”)

Under revision and communicated

1. Soumajit Pramanik, Raphael Tackx, Prishni Rateria, Jean-Loup Guillaume and Bivas Mitra, “Towards developing a modularity based approach for detecting communities in multilayer networks”, Information Sciences, Elsevier, October, 2018. (Communicated)

2. Tyll Krueger, Bivas Mitra, Tomasz Ozanski and Soumajit Pramanik, “Epidemic spreading on directed networks and Twitter cascades”, Theoretical Computer Science (TCS), Elsevier, September, 2018. (Communicated)

ABSTRACT

Real-world systems mostly comprise multiple interrelated subsystems and hence can be accurately represented as multilayer networks. Understanding the characteristics of these multilayer networks is essential for discerning the true behavior of the underlying real-world systems. In this thesis, we investigate the following structural and dynamical behaviors of multilayer networks through the lens of data science and network science.

(a) Community structure: We propose a community detection algorithm for multilayer networks. The crux of this algorithm is a novel multilayer modularity index QM. The proposed algorithm is parameter-free, scalable and adaptable to complex network structures. More importantly, it can simultaneously detect communities consisting of only a single type of node (and edge), as well as communities containing multiple types. We evaluate the performance of the proposed community detection algorithm both in a controlled environment (with synthetic benchmark communities) and on empirical datasets (Yelp, Meetup and Digg); in both cases, the proposed algorithm outperforms the competing state-of-the-art algorithms.

(b) Information diffusion: In this work, we investigate the information diffusion dynamics in the context of multilayer networks. We specifically concentrate on the popular online microblogging network Twitter, where information propagates via two modalities: retweeting and mentioning. We develop an analytical framework for cascade formation that takes both retweet and mention activities into account. The proposed framework CMF analytically computes the cascade size, which depicts tweet popularity, and discovers the presence of a critical retweet rate, below which mentioning in a tweet significantly helps in cascade formation. Additionally, taking cues from the model, we propose a mention recommendation system, Easy-Mention, which outperforms the state-of-the-art mention recommendation strategies.

(c) Node movement across layers: We conduct an empirical study to understand the dynamics of node movement across the layers of multilayer networks. For this purpose, we concentrate on the knowledge social network, where each layer contains the researchers working in a particular field as nodes, and a link between a pair of nodes signifies collaboration between the corresponding researchers. Due to shifts in research interest, researchers often migrate from one field (layer) of research to another field (layer), which constitutes node movement across the layers of the multilayer network. We investigate the key factors regulating a researcher’s (node’s) decision to migrate to a specific research field (layer) and observe the effect of such migration on her career and the respective field (layer). We observe that, in general, publication quantity and quality, collaborator profile and the fields’ popularity contribute to a researcher’s decision to migrate between fields.

(d) Entity recommendation: Most classical recommendation systems perform poorly when recommending entities in systems with multiple interacting entities, which we observe in most real systems. For instance, in event-based social networks such as Meetup, a collection of entities (say, events, groups, venues and members) interact with each other both online and offline. In such a scenario, we present a deep learning based venue recommendation system, DeepVenue, which provides context driven venue recommendations for the Meetup event-hosts to host their events. For hosting an event, the proposed DeepVenue model computes a score for each candidate venue and returns the list of top k venues ranked by these scores. Our rigorous evaluation shows that DeepVenue significantly outperforms the baseline algorithms. Specifically, for 84% of events, the correct hosting venue appears in the top 5 of DeepVenue’s recommended list.
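
The ranking and evaluation step described in (d) can be illustrated with a minimal Python sketch. It assumes that a score is already available for every candidate venue; the function names, venue identifiers and scores below are hypothetical and chosen only for illustration, and DeepVenue's actual scoring model is not reproduced here. The sketch shows how a top k list is obtained from the scores and how the fraction of events whose true venue falls within the top k can be measured.

```python
from typing import Dict, List


def top_k_venues(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k venue ids with the highest scores, best first."""
    # Sort by descending score; break ties by venue id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [venue for venue, _ in ranked[:k]]


def hit_rate_at_k(events: List[dict], k: int = 5) -> float:
    """Fraction of events whose true venue appears in the top-k list."""
    if not events:
        return 0.0
    hits = sum(
        1 for e in events if e["true_venue"] in top_k_venues(e["scores"], k)
    )
    return hits / len(events)


# Toy data, made up for illustration (not taken from the thesis).
toy_events = [
    {"true_venue": "v2", "scores": {"v1": 0.9, "v2": 0.7, "v3": 0.1}},
    {"true_venue": "v3", "scores": {"v1": 0.8, "v2": 0.6, "v3": 0.2}},
]

print(top_k_venues(toy_events[0]["scores"], k=2))  # ['v1', 'v2']
print(hit_rate_at_k(toy_events, k=2))              # 0.5
```

Under this reading, the 84% figure reported above corresponds to the hit rate at k = 5 over the evaluated events.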

In summary, in this thesis we analyze multilayer networks from the perspective of different structural and functional properties with the help of state-of-the-art network science and data science techniques. Furthermore, we develop several applications, such as Easy-Mention and DeepVenue, for the benefit of the users belonging to multiple real multilayer networks.

Keywords: multilayer networks, community, information diffusion, mention recommendation, entity recommendation, venue recommendation, node movement, scientific migration, Twitter, Meetup, knowledge social network.

Contents

Table of Contents

List of Figures

List of Tables

1 Introduction
  1.1 Background
  1.2 Problem descriptions and motivations
  1.3 Objectives
  1.4 Contributions
    1.4.1 Community detection in multilayer networks
    1.4.2 Information diffusion in multilayer networks
    1.4.3 Node movement in multilayer networks
    1.4.4 Entity recommendation in multilayer networks
  1.5 Organization

2 Literature survey
  2.1 Community detection in multilayer networks
    2.1.1 Limitations and scope of work
  2.2 Information diffusion in multilayer networks
    2.2.1 Information propagation in Twitter
    2.2.2 Analyzing and boosting tweet popularity
    2.2.3 Mentioning activities in Twitter
    2.2.4 Limitations and scope of work
  2.3 Node movement in multilayer networks
    2.3.1 Scientific migration
    2.3.2 Researcher mobility
    2.3.3 Interdisciplinarity in research
    2.3.4 Limitations and scope of work
  2.4 Entity recommendation in multilayer networks
    2.4.1 Recommendations in event and location based social networks
    2.4.2 Recommendations in heterogeneous networks
  2.5 Conclusion

3 Discovering community structure in multilayer networks
  3.1 Introduction
  3.2 Representation and problem statement
    3.2.1 Representation
    3.2.2 Problem statement
  3.3 Multilayer modularity index
    3.3.1 Desired properties of multilayer communities: Intuition
    3.3.2 Development of modularity index
    3.3.3 Generalization
  3.4 Synthetic dataset
    3.4.1 Synthetic dataset generation
    3.4.2 Synthetic dataset evaluation
  3.5 Evaluation of modularity index
    3.5.1 Varying synthetic parameters
    3.5.2 Perturbing synthetic networks
    3.5.3 Comparing against baseline multilayer modularity
  3.6 Multilayer community detection
    3.6.1 ‘Multilayer Louvain’ algorithm
    3.6.2 Complexity and convergence
  3.7 Community evaluation: Synthetic network
    3.7.1 Experimental setup
    3.7.2 Evaluation against ground truth
    3.7.3 Evaluation against competing algorithms
  3.8 Community evaluation: Empirical network
    3.8.1 Yelp dataset
    3.8.2 Meetup dataset
    3.8.3 Digg dataset
  3.9 Conclusion

4 Information diffusion in multilayer networks
  4.1 Introduction
  4.2 Dataset and representation
    4.2.1 Dataset
    4.2.2 Multiplex representation
    4.2.3 Dissecting mentioning
  4.3 Development of analytical framework (CMF)
    4.3.1 Model description
    4.3.2 Model parameters
    4.3.3 Mention strategies
    4.3.4 Analytical representation of CMF framework
    4.3.5 Special case: Random mention
    4.3.6 Special case: Smart mention
  4.4 Experimental setup
    4.4.1 Simulation setup and parameter settings
    4.4.2 Follower networks
  4.5 Evaluation of the framework
    4.5.1 Correctness of the simulation setup
    4.5.2 Validation with cascade size RU
    4.5.3 Validation with critical retweet rate
  4.6 Observations
    4.6.1 Comparing smart versus random mention
    4.6.2 Importance of follower network
  4.7 Easy-Mention: Recommendation heuristic
    4.7.1 Development of Easy-Mention
    4.7.2 Time complexity
  4.8 Evaluation of Easy-Mention
    4.8.1 Experimental setup
    4.8.2 Performance evaluation
  4.9 Conclusion

5 Node movement in multilayer networks
  5.1 Introduction
  5.2 Dataset and migrator identification
    5.2.1 Data description
    5.2.2 Defining migration
    5.2.3 Detection of migrating authors
    5.2.4 Validation
    5.2.5 First glimpse
  5.3 Migration dynamics
    5.3.1 Field proximity: discovery of domains
    5.3.2 Domain migration
    5.3.3 Mass migration
  5.4 Motivating factors behind migration
    5.4.1 Individual centric factors
    5.4.2 Field centric factors
    5.4.3 Collaboration centric factors
  5.5 Effects of migration
    5.5.1 Individual-centric effect
    5.5.2 Field-centric effect
    5.5.3 Collaboration-centric effects
  5.6 Conclusion

6 Entity recommendation in multilayer networks
  6.1 Introduction
  6.2 Terminologies and problem definition
    6.2.1 Meetup preliminaries and terminologies
    6.2.2 Problem statement
  6.3 DeepVenue: Venue recommendation model
    6.3.1 Key idea
    6.3.2 Architecture
    6.3.3 Training of DeepVenue model
    6.3.4 Venue recommendation
  6.4 Representation of Meetup entities
    6.4.1 Meetup venue representation
    6.4.2 Meetup event representation
    6.4.3 Meetup group representation
  6.5 Dataset and experimental setup
    6.5.1 Data preparation
    6.5.2 Evaluation procedure and metrics
    6.5.3 Baseline algorithms
  6.6 Evaluation of DeepVenue
    6.6.1 Overall model evaluation
    6.6.2 Recommending new venues
    6.6.3 Category specific model evaluation
    6.6.4 Complexity analysis
  6.7 Dissecting DeepVenue
    6.7.1 Model variants
    6.7.2 Impact of model parameters on DeepVenue
    6.7.3 Analyzing model deficiency: Recommending suitable alternate venues
  6.8 Conclusion

7 Conclusion and Future Work
  7.1 Summary of contributions
    7.1.1 Discovering community structure in multilayer networks
    7.1.2 Information diffusion in multilayer networks
    7.1.3 Node movement in multilayer networks
    7.1.4 Entity recommendation in multilayer networks
  7.2 Future perspectives
    7.2.1 Discovering community structure in multilayer networks
    7.2.2 Information diffusion in multilayer networks
    7.2.3 Node movement in multilayer networks
    7.2.4 Entity recommendation in multilayer networks

Bibliography

A Appendix

Index

List of Figures

1.1 Example of an interdependent multilayer network.
1.2 Communities in multilayer networks.
1.3 Multilayer representation of the Twitter social network where information propagates via both mention and follow links.
1.4 Snapshots of the multilayer representation of the knowledge social network at times t1 and t2. Between these two snapshots, the node U6 migrates from field F2 to field F1.
1.5 Multilayer structure of EBSNs.

3.1 A sample multilayer (Yelp) network.
3.2 Network configurations with two different types of ground truth communities.
3.3 Change of NMI values with µ, p, p1 and p2 for ‘CompMod’, ‘MetaFac’ and ‘InfoCom’ on 2-layer networks with 100 nodes in each layer, generated with maximum degree kimax = 10 and average degree 〈ki〉 = 6.
3.4 Change of QM values of the ground-truth communities for different α, µ, p, p1 and p2 values on 2-layer networks with 100 nodes in each layer, generated with maximum degree kimax = 10 and average degree 〈ki〉 = 6.
3.5 Change in QM with varying PI for different perturbation strategies applied on a 2-layer synthetic network generated with α = 0.8, µ = 0.05, p = 0.8, p1 = 0.8 and p2 = 0.0. It contains 100 nodes in each layer where each node has maximum degree kimax = 10 and average degree 〈ki〉 = 6.
3.6 Comparative results of QM and mQ on the configurations in Fig. 3.2.
3.7 Drop in modularity gain along with fraction of nodes moved in the first pass of Multilayer Louvain for a 300 × 300 synthetic multilayer network.
3.8 QM values of the ground truth communities and the communities detected by Multilayer Louvain (‘ML’) for different α, µ, p, p1 and p2 values. We also show the NMI and ARI metric values for the communities detected by ‘ML’.
3.9 QM values of the communities detected by Multilayer Louvain (‘ML’) and other state-of-the-art algorithms for different α, µ, p, p1 and p2 values.
3.10 QM values of the communities detected by Multilayer Louvain (‘ML’) versus Louvain and merging based (‘Merge’) baseline algorithms for different α, p and p1 values.

3.11 Heatmaps showing QM values of the communities detected by classical Louvain and Multilayer Louvain (‘ML’) for different dt/dm and db/dm values.
3.12 QM values of the communities detected by Multilayer Louvain (‘ML’) optimizing QM versus baseline Multilayer Louvain (‘ML’) optimizing mQ for different α, p and p1 values.
3.13 Precision and F1 Score (avg. over all visitors) for communities obtained from different algorithms for various recommendation lengths on Yelp network.
3.14 Precision and F1 Score (averaged over all groups for Meetup) for communities obtained from different algorithms on Meetup and Digg networks.

4.1 Retweet count distribution of all tweets in ‘Algeria’ dataset; Inset depicts the retweet count distribution of all users in ‘Algeria’ dataset.
4.2 Example of mention-follow multiplex.
4.3 Mention dependency for tweets and hashtags in ‘Algeria’ and ‘Egypt’ datasets.
4.4 Distribution of users’ probabilities of retweeting a tweet received via follow links and mention links in ‘Algeria’ dataset.
4.5 Distribution of number of mentions for tweets containing at least one mention in ‘Egypt’ dataset; Inset shows the same for ‘Algeria’ dataset.
4.6 Analytical representation of the CMF model - an example in (a) and the schematic diagram in (b).
4.7 Follower (indegree) and Followee (outdegree) count distributions for ‘Egypt’ and ‘Algeria’ datasets.
4.8 Matching ground truth tweet popularities with the simulation setup (with same αv, βv, λTv and initiator) and comparing with smart mention strategy for ‘Algeria’ and ‘Egypt’ datasets.
4.9 Comparison of analytical CMF model and Monte Carlo simulation w.r.t. RU for ‘Egypt’ network.
4.10 Comparison of analytical CMF model and Monte Carlo simulation w.r.t. RU for random Kronecker network.
4.11 Effect of varying average α and average β (see Inset) on RU (random mentioning) for ‘Algeria’ dataset.
4.12 Matching the epidemic thresholds obtained analytically and from simulation for random and smart mention strategies in ‘Algeria’ dataset. ‘UB’ and ‘LB’ in the figure represent the corresponding estimated upper and lower bounds respectively.
4.13 Critical β for different synthetic topologies for random and smart mentions.
4.14 Critical α for scale-free networks (for β = 0.04) and critical β for Kronecker networks (for α = 0.1) along with their corresponding analytically estimated values. ‘UB’ and ‘LB’ in the figures represent the corresponding estimated upper and lower bounds respectively.

4.15 Smart mentioning versus random mentioning w.r.t. RU for ‘Algeria’ dataset. Inset: Effect of varying average α on FM for ‘Algeria’ dataset.
4.16 In scale-free networks, effect of different exponents on RU for random and smart mention strategies with α = 0.4 and λ = 2.
4.17 In Kronecker networks, effect of different topologies on RU for random mention strategy with α = 0.4 and λ = 2.
4.18 Probabilities of mentioning users with different relations (reciprocal followers are denoted as ‘Friends’ here) and their probabilities of retweeting in ‘Egypt’ dataset. Inset shows the annotators’ major reasons of labeling users as spammers for ‘Egypt’ dataset.
4.19 Comparison of the tweet popularity distribution from the ‘Algeria’ dataset and the model. The inset shows the same for the ‘Egypt’ dataset.
4.20 CCDF of retweet counts of tweets using different mention strategies for ‘Algeria’ and ‘Egypt’ datasets.
4.21 Comparison of retweet rates (in ‘Algeria’ dataset) of users mentioned by competing recommendation algorithms.
4.22 Run time comparison of Easy-Mention and Whom-To-Mention recommendation algorithms with respect to number of candidate users considered.

5.1 Multilayer representation of the knowledge social network.
5.2 Example of a migrator and a non-migrator. Each color represents publication counts in a field.
5.3 Preliminary analysis of the ‘Filtered’ dataset.
5.4 Domain migration - (a) Validation of detected domains (b) Intra and inter domain migration trends.
5.5 Mass migration.
5.6 Reasons of migration - (a) Individual publication count, Inset: Individual publication quality; (b) Field productivity, Inset: Field impact.
5.7 Career Phase at which researchers prefer to migrate within same domain, cross domain and overall.
5.8 Correlation between collaboration and migration over the years (Pearson correlation coefficient 0.91).
5.9 Collaboration centric factors affecting migration.
5.10 Effects of migration - (a) Individual publication count, Inset: Individual publication quality; (b) Effects on hike of overall publication count and citation count.
5.11 Effects of migration - (a) Improvement in top venue publication; (b) Impact of collaborators on quality of publication venues of migrators; (c) Gain of departing field.
5.12 Effects of migration - (a) Gain of fields with high joining and leaving rates (b) Pressure on joining field’s infrastructure as indicated by the Program Committee sizes of top conferences.
5.13 Window-wise impact of migration on the collaborations of the migrators.


6.1 Multilayer structure of EBSNs. . . . . . 144

6.2 Pictorial and graphical (multilayer) representations of venue recommendation problem in EBSN. . . . . . 146

6.3 Schematic diagram of the DeepVenue framework for estimating popularity of a given event e∗ (organized by group g∗) if hosted at venue v∗. It consists of three modules namely, venue module, event module and group module. Each module aims to assess the suitability of hosting the target event e∗ at a candidate venue v∗ from a different perspective. . . . . . 151

6.4 Detailed ‘Venue Module’ and ‘Event Module’ of the DeepVenue framework. ‘II’ symbolizes concatenation. . . . . . 153

6.5 Deep learning framework for learning venue representations via pair-wise venue similarity prediction. In (b), $R_{v_1} = \{Q^{l}_{v_1}, Q^{n}_{v_1}\}$ and $R_{v_2} = \{Q^{l}_{v_2}, Q^{n}_{v_2}\}$ denote the combinations of review vectors and attribute vectors for venues $v_1$ and $v_2$ respectively. Similarly, $\mathbf{v}_1$ and $\mathbf{v}_2$ denote the learnt representations of venues $v_1$ and $v_2$ respectively. . . . . . 159

6.6 Fractions of events hosted from different categories of groups in different categories of venues. . . . . . 161

6.7 Graph constructed by the SERGE algorithm to recommend venues for hosting event E3. . . . . . 167

6.8 History length N versus recall and MIR values for DeepVenue and SERGE. . . . . . 175

6.9 The left Y axis shows how the recall and MIR values of the DeepVenue model vary with the event similarity threshold αE; the right Y axis depicts the number of events selected for the experiments depending on αE. . . . . . 176

6.10 Improvement of recall with score-based relaxation in venue similarity. . . . . . 179

6.11 Improvement of recall with embedding-based relaxation in venue similarity. . . . . . 179

List of Tables

4.1 Mapping the terminologies of epidemic propagation and tweet propagation. . . . . . 79

4.2 Summarizing the model parameters and their corresponding interpretations. . . . . . 87

4.3 Estimating maximum eigen values and critical β thresholds for random mention with α = 0.3 and λ = 2 from network’s structural parameters . . . . . 96

4.4 Examples of content and behavioral attributes used for spammer detection . . . . . 100

4.5 Metric values for different mentioning strategies applied on “Algeria” and “Egypt” datasets. Importantly, the metric values corresponding to Easy-Mention are statistically higher than Whom-To-Mention (t-test confirms the statistical significance with p-value < 0.05). . . . . . 106

5.1 Field-wise work count distribution and corresponding acronyms. . . . . . 115

6.1 Notations. . . . . . 149

6.2 Entropy values for different similarity metrics for different categories of venues in Chicago (considering only venues hosting at least 5 events). Selected metrics are marked in bold. . . . . . 162

6.3 Model performance varying number of layers. . . . . . 162

6.4 Model performance varying activation functions where the activation functions from left to right are applied from bottom to top layers (showing four of all the possible variations). . . . . . 163

6.5 Evaluation results of deep learning based venue embedding with respect to SVM, DT and LR. . . . . . 164

6.6 Overall model’s evaluation results. . . . . . 170

6.7 Overall model’s evaluation results for ‘new’ venues. . . . . . 170

6.8 Category wise models’ evaluation results. . . . . . 171

6.9 Percentage of total events and events hosted at new venues for different categories of groups. . . . . . 171

6.10 Comparison of runtime complexity of all models with DeepVenue. . . . . . 173

6.11 Results for different model variants of DeepVenue. . . . . . 175


Chapter 1

Introduction

1.1 Background

Network theory is an important tool for describing and analyzing complex systems throughout the social, biological, physical, information, and engineering sciences. Classical network theory typically deals with the monoplex network [Newman, 2003] (also termed a single layer network), where a single type of node represents the entities and a single type of link connects a pair of nodes in the network. However, realistic systems are heterogeneous, hence nodes and links may have different types. For example, in a social network, people are linked by different types of ties: friendship, family relationship, professional relationship etc. Hence, treating all the nodes and links in an equivalent manner severely limits the scope of analysis of a real world network. For instance, consider event based social networks (such as Meetup and Plancast), which have gained huge popularity in recent times. In such a system, like-minded people form online groups and host offline events which they can physically attend to interact with each other and participate in various activities. Evidently, such a system exhibits two distinct types of interactions among the members - online and offline - which a single layer (monoplex) network fails to capture. As a second example, consider the collaboration network among scientific researchers across different fields of research, which can be represented as a complex network. Close inspection reveals multiple underlying dynamics occurring in such a network. For instance, researchers may change their collaborations over time, and they may also migrate from one field of research to another. Evidently, single layer networks are unable to correctly capture the aforesaid complex interactions. Hence, the vanilla network representations need to be extended to model the structure and dynamics of realistic systems.

Figure 1.1: Example of an interdependent multilayer network.

Recently, the literature on ‘multilayer’ networks [Kivelä et al., 2014] has attracted considerable attention. Multilayer networks can represent most types of complex systems that consist of multiple networks and include disparate and/or multiple interactions between entities. There are mainly two popular variations of multilayer networks - multiplex networks and interdependent networks. In multiplex networks, (mostly) identical nodes are present in every layer of the network, but the links connecting the nodes in each layer signify a different relationship [Gomez et al., 2013, Granell et al., 2013]. All the inter-layer coupling links in such multiplex networks connect the copies of those identical nodes across layers. On the other hand, in interdependent networks, the nodes in every layer are distinct and the function or activity of a node in one layer depends on the activity of the linked nodes in other layers [Danziger et al., 2014, Buldyrev et al., 2010]. For example, Fig. 1.1 shows the interdependence between the power-grid network (containing power-stations as nodes) and the communication network (containing communication servers as nodes). The power-stations supply power to the servers, which in turn control the functionalities and communication among the power-stations (by sending control signals). Hence, the functionality of the nodes in each layer of this network depends on the functionality of the nodes belonging to the other layer.

Since most complex natural systems can be realistically represented as multilayer networks, it is essential to study the different structural and dynamical properties of multilayer networks in order to properly understand the real systems. For instance, in many real systems, it is important to identify the community structures (i.e. relatively densely connected sections of the networks) in order to perform multiple data-mining tasks (search, recommendation) on them [Fortunato, 2010a]. Similarly, in real social networks (Twitter and Facebook), the diffusion of information plays an important role in information virality [Weng et al., 2013, Guille et al., 2013]. The diffusion of a post via multiple modalities (say, simultaneous diffusion of posts via Twitter and Facebook; in case of Twitter, propagation of a post via both retweet and mention activities) makes the diffusion modeling problem highly challenging. State of the art endeavours on multilayer networks have developed a robust theoretical foundation to model and analyze multiple facets of multilayer dynamics. For instance, tools like tensors [De Domenico et al., 2013] and supra-adjacency matrices [Sánchez-García et al., 2014] provide powerful and flexible representations of multilayer networks. However, there is ample scope to conduct an empirical data driven investigation on the structure and dynamics of real systems through the lens of multilayer networks.
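As a rough illustration of the supra-adjacency representation mentioned above, the following Python sketch assembles the supra-adjacency matrix of a two-layer multiplex network: the layer adjacency matrices occupy the diagonal blocks, and a coupling weight links each node to its own copy in the other layer. The function name, the coupling weight `omega` and the toy layers are assumptions made for illustration; they are not taken from the cited works.

```python
def supra_adjacency(layers, omega=1.0):
    """Build the supra-adjacency matrix of a multiplex network.

    layers: list of square adjacency matrices (lists of lists), one per
            layer, all defined over the same n nodes.
    omega:  weight of the coupling link between copies of the same node
            in two different layers (an illustrative assumption).
    """
    n = len(layers[0])
    size = n * len(layers)
    supra = [[0.0] * size for _ in range(size)]
    for a, adj_a in enumerate(layers):
        for b in range(len(layers)):
            for i in range(n):
                for j in range(n):
                    if a == b:
                        # Diagonal block: intra-layer links of layer a.
                        supra[a * n + i][b * n + j] = float(adj_a[i][j])
                    elif i == j:
                        # Off-diagonal block: node i coupled to its own copy.
                        supra[a * n + i][b * n + j] = omega
    return supra


# Toy multiplex with 3 nodes and 2 layers (hypothetical data).
friendship = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
work = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
supra = supra_adjacency([friendship, work], omega=0.5)
```

Row and column `a * n + i` refer to node `i` in layer `a`, so the same indexing can be reused when reading entries of the matrix back.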

Importantly, network science tools alone are not sufficient for this investigation; rather, we also need to rely on data science and machine learning techniques to conduct the experiments. Along this line, we identify the multilayer structure in several typical real world networks and crawl them systematically to collect real data for our empirical analysis. This enables us to encounter various research problems of social and knowledge networks that are pertinent to the context of multilayer systems, and gives us the scope to apply various network science as well as data science tools to develop models and heuristics to solve those problems. In the following, we state the problems addressed in this thesis.

1.2 Problem descriptions and motivations

In the following, we briefly describe the broad problems we choose to investigate in this thesis, along with the key motivating factors behind choosing them. The first three problems address different structural and functional aspects of multilayer networks whereas the fourth one relates to an application aspect of multilayer networks. Overall, the problems are chosen such that they encompass the most frequently encountered issues related to real multilayered systems.

1. Detecting the community structure of a network is an important task for any real system. It helps not only to understand the structure and behaviour of the system but also to perform various data mining tasks on top of it. As a result, there exists a huge literature on detecting communities in traditional single layer networks. However, unlike in single layer networks, the edge densities across the layers of a multilayer network are not homogeneous. Hence, it is not possible to obtain the optimal community structure by combining the layers of a multilayer network and applying a classical community detection algorithm developed for single layer networks. Consequently, it is essential to develop community detection algorithms specifically for multilayer networks which take care of the inhomogeneity of the edge densities across the layers.

2. Understanding the information diffusion dynamics on real social networks is a critical problem at present. It helps not only to interpret the popularity of certain information content but also to develop methods for popularizing or de-popularizing (say, stopping rumours) a specific information content. Over the last decade, a series of diffusion models have been developed for this purpose. However, most of these models are suited for analyzing only a single modality of information flow in the network, whereas in multilayer networks, information flows via multiple modalities simultaneously. Hence, the existing methodologies are not suitable for understanding the interplay of multiple modalities of information flow in realistic multilayer systems.

3. Another interesting dynamic of real multilayer networks is the movement of nodes across the layers of the network, which is frequently observed in many realistic scenarios. For instance, in a knowledge social network where researchers are connected with each other via collaboration links and each field of research is represented as a network layer, we observe that researchers often migrate (move) from one field of research (layer) to another over their career periods. Similarly, in multimodal transportation systems, while traveling from one place to another, commuters often change their modes of transport (bus, train) as per convenience. In this scenario, each layer represents the respective transportation modality, and nodes and links represent stops/stations and their respective connectivity. Hence, in order to deal with such scenarios, it is essential to investigate this kind of phenomenon in detail. However, as such situations are unique to the multilayer network paradigm, we cannot apply existing single layer techniques to deal with them. Therefore, it is necessary to develop a novel analysis framework to study the reasons, dynamics and impacts of such node movements in multilayer networks.

4. The explosive growth of information available online frequently overwhelms users (information overload). A recommender system is a useful information filtering tool for guiding users in a personalized way to discover products or services they might be interested in from a large space of possible options. In general, classical recommendation approaches work well for systems with two entities (say, users and items) where recommended item lists are generated based on user preferences, item features, past user-item interactions and additional information such as temporal and spatial data. However, they are insufficient to perform well on real multilayer networks dealing with three or more types of entities. For instance, in event based social networks such as Meetup, a collection of entities (say, events, groups, venues and members) interact with each other both online and offline. In such a scenario, if we wish to recommend suitable events/groups to members or suitable venues to host events successfully, it is essential to develop recommender systems specifically trained for working on multi-entity multilayer networks.

In the context of an empirical study, it is difficult to identify a single dataset that covers all the aforementioned aspects simultaneously. Hence, we rely on multiple datasets, each relevant to an individual problem. For instance, we choose the popular online social network (OSN) ‘Twitter’ for analyzing the information diffusion dynamics, whereas we choose the event based social network (EBSN) ‘Meetup’ to deal with the multi-entity recommendation problem. Next, we articulate the precise objectives of this thesis.

1.3 Objectives

The principal objective of this thesis is to analyze various structural and functional properties of multilayer networks. We primarily focus on the following four objectives:

1. Community detection in multilayer networks: The first objective of this thesis is to develop a framework for detecting communities in multilayer networks. The community detection problem revolves around finding groups of nodes in a network that are more densely connected to each other than to other nodes in the network [Fortunato, 2010a]. It has been extensively studied for unipartite and bipartite networks [Fortunato, 2010a, Barber, 2007]. However, only a few methods and results extend to multilayer networks, in spite of the fact that numerous applications (such as search and recommendation) might benefit from such methods.

2. Information diffusion in multilayer networks: Our next objective is related to a functional aspect of the multilayer network. We aim to propose an information diffusion model motivated by the fact that, unlike in a single layer network, information in a multilayer network of users can propagate via numerous means. Furthermore, we wish to outline a theoretical framework to study the diffusion process under such circumstances for various underlying topologies.

3. Node movement in multilayer networks: In several real multilayer networks, nodes can move from one layer of the network to another over time. This is a unique phenomenon observed in multilayer networks which is totally absent in single layer networks. Our third objective is therefore to explore this kind of temporal movement of nodes in the backdrop of multilayer networks and analyze the potential reasons for and impacts of such movements on the individual nodes as well as on the overall structure of the network.

4. Entity recommendation in multilayer networks: In network theory, developing recommender systems is a widely studied and challenging problem. In real multilayer networks, the complexity of the problem is compounded by the simultaneous existence of numerous types of entities and relationships in the network. Hence, our last objective is to study the entity recommendation problem in the context of multi-entity multilayer networks.


Figure 1.2: Communities in multilayer networks. (Legend: intra-layer links and inter-layer links; layers: Layer 1 and Layer 2.)

1.4 Contributions

In the following, we describe the exact problems addressed in this thesis and the corresponding contributions.

1.4.1 Community detection in multilayer networks

Community detection in complex multilayer networks is an important research problem. The communities in multilayer networks help to identify functionally cohesive sub-units and reveal complex interactions between multi-type nodes and heterogeneous links. They are also found to be beneficial for different data mining tasks such as context-sensitive search, prediction and recommendation [Lin et al., 2009]. Formally, the problem of multilayer community detection is to divide a multilayer network $G$ into a set of disjoint cohesive modules $C_1, C_2, \ldots, C_K$ that together cover the nodes in $G$, such that each module $C_i$ comprises a group of nodes densely connected inside and loosely connected outside the community. The key challenges of this problem are two-fold - (a) dealing with a multilayer network which contains multiple types of links (of different densities) and nodes and (b) detecting both cross layer and single layer communities simultaneously without any additional parameter.
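For reference, the notion of "densely connected inside and loosely connected outside" is commonly quantified on single layer networks by the Newman-Girvan modularity, $Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)$, where $m$ is the number of edges, $k_i$ the degree of node $i$ and $c_i$ its community. The sketch below computes this classical single layer quantity on a toy graph. It is not the multilayer modularity index proposed in this thesis, and the function name and example graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
               (no self-loops assumed).
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    intra_edges = defaultdict(int)   # edges with both ends in a community
    degree_sum = defaultdict(int)    # total degree of nodes in a community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    # Equivalent community-wise form: Q = sum_c [ l_c / m - (d_c / 2m)^2 ]
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, partition)
```

Placing all six nodes in one community yields a modularity of zero, whereas the two-triangle partition scores higher, which matches the intuition behind the definition.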

In this subproblem, we propose a community detection algorithm ‘Multilayer Louvain’ for multilayer networks which is able to detect communities comprising a single type as well as multiple types of nodes, depending on the network structure (see Fig. 1.2). The major contribution of this work is a modularity index $Q_M$ for characterizing communities in multilayer networks, which can be used for detecting the communities as well. We evaluate the proposed modularity both as a community scoring metric and as an optimization criterion, and we compare the performance of the developed community detection algorithm against the competing algorithms.

Contributions

Summarizing, the key contributions of this work are the following:

1. We develop a methodology to construct synthetic multilayer networks with ground truth communities and evaluate it rigorously.

2. We propose a modularity index $Q_M$ for characterizing communities in multilayer networks.

3. We develop an algorithm ‘Multilayer Louvain’ incorporating the modularity index $Q_M$ which can detect communities in multilayer networks.

4. Finally, we evaluate the proposed multilayer algorithm on both synthetic and empirical datasets (‘Yelp’, ‘Meetup’ and ‘Digg’) and demonstrate that it outperforms the state of the art baselines in correctly discovering the communities.

1.4.2 Information diffusion in multilayer networks

In this subproblem, we analyze the information diffusion dynamics in multilayer networks. For this purpose, we choose the popular social network ‘Twitter’ and model it as a multilayer network. In recent times, Twitter has become one of the most influential micro-blogging systems for spreading and sharing breaking news, personal updates and spontaneous ideas [González-Bailón et al., 2011]. In Twitter, propagation of a tweet or hashtag (unit of information) from one user to another occurs mainly via two activities: ‘retweeting’ and ‘mentioning’ [Kato et al., 2012]. Interestingly, in the case of a retweet, information is simply relayed to all the followers of the retweeting user, whereas the mention utility allows information to spread far beyond the neighborhood and improves its visibility by making it available to the appropriate set of users.

Figure 1.3: Multilayer representation of the Twitter social network where information propagates via both mention and follow links.

Hence, in this subproblem, we investigate how information spreads over that network through these multiple channels of propagation (see Fig. 1.3). In particular, as mentions get listed in a separate tab, they gain higher attention than regular posts. As a result, the mention utility has the potential to play a significant role in the cascading behavior of tweets and hashtags. Hence, in this work we specifically focus on studying the role of the mention utility in popularizing a tweet. Furthermore, we also analyze whether proper mentioning can be utilized for popularizing useful tweets posted by a normal and trustworthy user. The key challenges of this work are (a) to propose metrics to measure the utility of mentions in tweet propagation, (b) to model the information diffusion in Twitter considering both follow and mention links and (c) to analyze how the model can be exploited for building a mention recommendation system.
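To make the two propagation channels concrete, the following toy simulation lets a tweet reach users both through follow links (a retweet is relayed to the retweeter's followers) and through mention links (users mentioned by the author see the tweet directly). It is only a sketch under simplifying assumptions, namely a single retweet probability `beta` shared by all users and mentions issued only by the original author. It is not the CMF framework proposed in this thesis, and all names and data are hypothetical.

```python
import random


def simulate_cascade(followers, author, mentioned, beta, rng):
    """Return the set of users who retweet the author's tweet.

    followers: dict mapping a user to the list of her followers
               (follow layer).
    mentioned: users mentioned in the tweet (mention layer).
    beta:      probability that an exposed user retweets (assumed
               identical for every user).
    rng:       random.Random instance, passed in for reproducibility.
    """
    # Users exposed initially: the author's followers plus mentioned users.
    frontier = list(followers.get(author, [])) + list(mentioned)
    seen = {author}
    retweeters = set()
    while frontier:
        user = frontier.pop()
        if user in seen:
            continue
        seen.add(user)
        if rng.random() < beta:
            retweeters.add(user)
            # A retweet relays the tweet to the retweeter's followers.
            frontier.extend(followers.get(user, []))
    return retweeters


# Hypothetical follow layer: b follows a, d follows c.
followers = {"a": ["b"], "b": [], "c": ["d"], "d": []}
rng = random.Random(0)
without_mention = simulate_cascade(followers, "a", [], beta=1.0, rng=rng)
with_mention = simulate_cascade(followers, "a", ["c"], beta=1.0, rng=rng)
```

With `beta=1.0`, mentioning `c` lets the tweet reach `d`, who lies outside the author's follower neighborhood; lowering `beta` makes each exposure succeed only with that probability.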

Contributions:


1. We start with a comprehensive data study to motivate the importance of the mention utility for the popularity of a tweet. This study enables us to identify the important features of the mentioned users contributing to tweet popularity: her follower count, activity (retweet) rate, her profile similarity with the post etc.

2. We represent the tweet propagation process as a multilayer network [Granell et al., 2013] and propose an analytical framework CMF to model the flow of tweets.

3. Simulation of the model with suitable parameters shows good agreement with the empirical tweet popularity observed in the dataset. Moreover, the simulation model identifies a critical threshold on the retweet rate which demarcates the phase transition that results in the (re)tweet cascading process. Our analytical framework quantifies this observed critical threshold.

4. Finally, taking cues from this model, we propose a simple mention recommendation heuristic ‘Easy-Mention’ which outperforms the Whom-to-Mention benchmark algorithm [Wang et al., 2013a].

1.4.3 Node movement in multilayer networks

Analogous to human migration [Massey and Zenteno, 1999], scientific researchers migrate from one field of research to another, primarily driven by the desire to excel in their careers, which demands a continuous flow of highly appreciated publications. Migration often helps in inculcating novel ideas through encounters with new challenges and provides opportunities associated with venturing into new research topics [Uzzi et al., 2013, Youn et al., 2015]. Multiple co-operative activities in the knowledge social system [Allen et al., 2007]¹ instigate migration of researchers (across disciplines) from various perspectives. For instance, (a) strong and diverse collaboration among researchers across various fields may initiate the migration; (b) a researcher’s performance is determined by the amount of high quality peer-reviewed publications she can produce, and a failure to do so at a certain phase may make the researcher contemplate migration; (c) further, the citation process largely determines the impact of an article² and in turn designates outstanding researchers. This helps a researcher to assess her own performance in a research field and take a decision on migration. Beyond this, a variety of microscopic factors can be identified that drive a researcher’s choice of the new research field, ranging from age [Jones and Weinberg, 2011] to gender [Duch et al., 2012, Ding et al., 2006], and from funding opportunities [Defazio et al., 2009] to her attitude and abilities [Osborne et al., 2003]. Given the broad impact of field migration on individual careers and its clear implications for science and innovation policy [Kuhn and Hawkins, 1963, Merton, 1973, Rzhetsky et al., 2015, Wiggins and Crowston, 2010], it is an interesting research problem to investigate the shift in the research interests of individual researchers in this co-working scenario.

¹ In the knowledge social system, researchers are the agents, and collaborations, citations, peer reviewing etc. indicate various relationships.

² Here, highly cited articles indicate influential work.

Figure 1.4: Snapshots of the multilayer representation of the knowledge social network at times t1 and t2: (a) snapshot at time t1; (b) snapshot at time t2. Each panel shows researcher nodes grouped into layers for fields F1, F2 and F3 and connected by collaboration links. Between these two snapshots, the node U6 migrates from field F2 to field F1.

We can easily represent the knowledge social system as a multilayer ‘knowledge social network’ where each layer consists of the scientists working in a particular field of research and all scientists within and across layers are connected by collaboration (co-authorship) links. In such a scenario, migration of researchers across multiple fields can be seamlessly represented as temporal movement of nodes across layers of this multilayer network (see Fig. 1.4). In this subproblem, we pose the following two research questions -

• What kind of co-working dynamics present in the Scientometrics ecosystem lead a researcher to migrate from one field to another? These factors can be individual-centric or can be a fallout of the (rising/falling) state of the respective research fields. Moreover, the collaborators and their research fields may play an important role in driving researchers to migrate and in their choice of the research field to shift to.

• What are the short-term and long-term impacts of migration on a researcher’s own research profile/performance, as well as on the overall Scientometrics ecosystem? Precisely, we concentrate on the gain and loss observed in the departing and joining fields, as well as on the formation of new collaborations after migration and their role in shaping her research profile/performance.
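The layer-movement view of migration described above can be sketched in a few lines of Python: given a researcher-to-field assignment at two time points, a migrator is any researcher whose layer changes. The snapshot data below is hypothetical except for the move of U6 from F2 to F1, which mirrors Fig. 1.4. Assigning each researcher to exactly one field per snapshot is a simplifying assumption, and this is not the classification algorithm proposed in this thesis.

```python
def detect_migrations(fields_t1, fields_t2):
    """Return {researcher: (old_field, new_field)} for researchers whose
    layer (field) differs between two snapshots.

    Researchers missing from the second snapshot are ignored.
    """
    migrations = {}
    for researcher, old_field in fields_t1.items():
        new_field = fields_t2.get(researcher)
        if new_field is not None and new_field != old_field:
            migrations[researcher] = (old_field, new_field)
    return migrations


# Hypothetical field assignments at times t1 and t2.
snapshot_t1 = {"U1": "F1", "U2": "F1", "U6": "F2", "U7": "F2", "U9": "F3"}
snapshot_t2 = {"U1": "F1", "U2": "F1", "U6": "F1", "U7": "F2", "U9": "F3"}
migrators = detect_migrations(snapshot_t1, snapshot_t2)
```

Here `migrators` contains only U6, with the pair of departing and joining fields that the two research questions are concerned with.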

Contributions:


1. We propose a systematic classification algorithm for identifying migrating researchers in computer science.

2. In order to explain their choice of fields during migration (movement across layers), we group the fields into ‘research domains’ and examine the temporal flow of migrating researchers across them.

3. Subsequently, we introduce and investigate the role of various motivating factors behind a researcher’s decision to move from one field of research (layer) to another, such as her own publication quantity and quality, the popularity of her current research field, the influence of her collaborators etc.

4. Moreover, we reveal the impact of such movement on the researchers’ publication and citation profiles as well as on the evolution of the corresponding research fields.

1.4.4 Entity recommendation in multilayer networks

In order to fulfill this objective, we focus on an event based social networking (EBSN) portal called ‘Meetup’ which is widely used for hosting events in various localities around the world [Macedo et al., 2015]. Meetup members join different online groups of their choice and can enjoy face to face social interactions by participating in offline events organized by those groups. Identifying and recommending a suitable venue to organize a successful Meetup event (one attracting a large population) is an essential problem for the event hosts [Qiao et al., 2014, Macedo et al., 2015, Li et al., 2014]. In order to solve this subproblem, we represent Meetup as a multilayer network containing groups, events and venues in three individual layers (see Fig. 1.5). Within each layer, we connect the entities based on their similarities. Furthermore, we add two types of cross-layer links - (a) group-event links connecting events with their host groups and (b) event-venue links connecting events with their host venues. Evidently, the aforementioned problem of venue recommendation can be posed as an entity recommendation problem in this multi-entity multilayer setup.

Figure 1.5: Multilayer structure of EBSNs. (Layers: venues V1 to V4, events E1 to E5 and groups G1 to G3; cross-layer links: event-venue links and group-event links.)
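A minimal sketch of this three-layer structure, assuming plain dictionaries for the two kinds of cross-layer links, is given below. It walks group-event and event-venue links to list the venues a group has already used, ordered by how often it used them. The link data is hypothetical (it does not reproduce Fig. 1.5), and this frequency heuristic is only an illustration of traversing the layers, not the DeepVenue model.

```python
from collections import Counter

# Cross-layer links (hypothetical data).
group_events = {"G1": ["E1", "E2"], "G2": ["E3"], "G3": ["E4", "E5"]}
event_venue = {"E1": "V1", "E2": "V1", "E3": "V2", "E4": "V3", "E5": "V4"}


def past_venues(group, group_events, event_venue):
    """Venues that hosted the group's past events, most frequent first."""
    counts = Counter(
        event_venue[event]
        for event in group_events.get(group, [])
        if event in event_venue
    )
    return [venue for venue, _ in counts.most_common()]


g1_venues = past_venues("G1", group_events, event_venue)
g3_venues = past_venues("G3", group_events, event_venue)
```

A new event has no hosting history of its own, so a lookup of this kind can only go through its host group, which is one reason the intra-layer similarity links matter in the multilayer formulation.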

The major challenge in performing venue recommendation in such a scenario is ‘data sparsity’, i.e. the lack of qualitative and quantitative data related to the Meetup venues. Additionally, most user-item recommender systems [Lops et al., 2011, Mnih and Salakhutdinov, 2008] learn user preferences based on past user-item interaction histories. Unique to our problem is the lack of any history for a particular event, since by definition each event instance is a singleton entity.


Contributions:


1. We propose a principled approach to map venues from Meetup to Yelp and retrieve the relevant venue related quantitative and qualitative information present there.

2. We build a deep learning based model to learn embeddings of venues, groups and events for calculating their pairwise similarities.

3. Finally, we develop a Deep learning based Venue recommendation engine (DeepVenue) which recommends a ranked list of venues for hosting a target event.

1.5 Organization

The rest of the thesis is organized into six chapters.

• Chapter 2 presents a detailed literature survey on various structural (community de-

tection), functional (information diffusion, node movement) aspects of multilayer

networks and their applications to real-world systems (entity recommendation).

• Chapter 3 is dedicated to our first objective of detecting communities in multilayer

networks.

• Chapter 4 centers around our second objective, i.e., modeling information diffusion

in multilayer networks and its applications.

• Chapter 5 presents our third objective of analyzing movement of nodes across layers

of multilayer networks over time.

• Chapter 6 deals with our final objective of developing entity recommendation sys-

tems for multilayer networks.

• Chapter 7 concludes the thesis by summarizing the contributions and pointing

to future directions that have opened up from this work.

Chapter 2

Literature survey

In this chapter, we review relevant studies related to the objectives of this thesis. Specifically,

we concentrate on the following four directions. We begin by discussing the state-of-the-art

community detection techniques proposed for multilayer networks. Next, we provide

a detailed description of the existing information diffusion models for multilayer networks.

In the third section, we review the existing literature on the movement of nodes in the

multilayer knowledge social network. Finally, we dedicate the fourth section to summarizing

different entity recommendation strategies in the context of multilayer networks.

2.1 Community detection in multilayer networks

Community detection deals with discovering densely connected groups of nodes in net-

works. Due to its wide applicability, the problem of community detection has been ex-

tensively studied in the context of single layer networks during the last decade [Fortunato,

2010a, Danon et al., 2005, Newman, 2006, Newman and Girvan, 2004]. Researchers have

proposed numerous methods, such as modularity maximization [Clauset et al., 2004, Blon-

del et al., 2008], spectral clustering [Newman, 2013] and statistical inference [Reichardt

and Bornholdt, 2006] for detecting communities in single-layer networks. However, detecting

communities in multilayer networks is much more challenging, as a discovered community

may contain nodes of a single type or of multiple types.

Most of the recent endeavors in this area have concentrated on multiplex networks [Mucha

et al., 2010, Kuncheva and Montana, 2015], where all layers share an identical set of nodes

but may have multiple types of interactions. For multiplex networks, some approaches

propose new quality metrics [Mucha et al., 2010] to measure the goodness of the detected

communities, whereas others utilize random walks [Kuncheva and Montana,

2015] or frequent-pattern mining techniques [Berlingerio et al., 2013] to obtain structurally

similar components. In principle, most of these algorithms transform the problem into

classical community detection in a monoplex network, leveraging the fact that in a multiplex

network, one-to-one cross-layer links connect the copies of the same node in multiple

layers. Unfortunately, the presence of heterogeneous nodes across multiple layers and

cross-layer dependency links makes the aforementioned solutions inadequate for multilayer

networks.

Subsequently, piecemeal attempts have been made to detect communities in multilayer

networks; novel methodologies have been introduced, such as the Dirichlet process [Sun

et al., 2014], tensor factorization [Lin et al., 2009], subspace clustering [Dong et al., 2014],

non-negative matrix factorization [Comar et al., 2012, Cheng et al., 2013], information

compression [Liu et al., 2016], high-order bi-clustering [Bekkerman et al., 2005, Banerjee

et al., 2007] etc. For instance, in [Sun et al., 2014], Sun et al. proposed a hierarchical

Dirichlet process mixture model-based evolution model which detects the co-evolution of

multityped objects in the form of multityped cluster evolution. On the other hand, Liu et

al. proposed an information compression based method [Liu et al., 2016] for community

detection in multi-partite, multi-relational networks. The idea was to convert the commu-

nity detection problem into a problem of finding an efficient compression of the network’s

structure using the minimum description length (MDL) principle [Rissanen, 1978]. Tradi-

tionally, bi-clustering methods were developed for simultaneously clustering both the rows

and the columns of a two-way matrix [Dhillon, 2001, Madeira and Oliveira, 2004, Pensa

et al., 2005]. This makes them extremely suitable for clustering bipartite-type relational

data, such as heterogeneous networks with two types of nodes. As a result, bi-clustering

methods have been extensively studied in many different applications, such as text min-

ing [Dhillon, 2001], gene expression analysis [Cho et al., 2004], and image retrieval [Qiu,

2004]. Furthermore, substantial attention has been devoted to extending traditional bi-clustering

algorithms [Dhillon, 2001, Madeira and Oliveira, 2004, Pensa et al., 2005] to

multi-way clustering, in which entities of more than two classes are clustered simultaneously

[Bekkerman et al., 2005, Banerjee et al., 2007]. Recent endeavors have also been

directed towards developing modularity indices for multilayer networks. For instance,

composite modularity [Liu et al., 2014] calculates the modularity of a multi-relational net-

work as the integration of modularities calculated for each single-relational subnetwork.

Similarly, another modularity definition is proposed in [Song et al., 2015] in the context of

gene-chemical interaction network.
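
To make the notion of a modularity index concrete, the following minimal sketch computes the standard Newman-Girvan modularity of a partition for an undirected, unweighted edge list, and then combines per-relation scores. The function names are hypothetical, and the plain arithmetic mean used for the combination is an assumption made purely for illustration; it is not claimed to be the integration rule of [Liu et al., 2014] or the definition in [Song et al., 2015].

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q for an undirected, unweighted edge list.

    Q = sum over communities c of (e_c / m) - (d_c / (2 m))^2, where e_c is the
    number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra_edges = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra_edges[cu] = intra_edges.get(cu, 0) + 1
    return sum(intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def averaged_layer_modularity(edges_by_relation, community_of):
    """Mean of per-relation modularities (the averaging is an assumption)."""
    if not edges_by_relation:
        return 0.0
    scores = [modularity(edges, community_of)
              for edges in edges_by_relation.values()]
    return sum(scores) / len(scores)


# Two triangles joined by a single bridge edge, split into two communities.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
toy_q = modularity(toy_edges, toy_partition)
```

For the toy graph, each community has 3 internal edges and total degree 7 out of 2m = 14, so Q = 2 (3/7 - 1/4) = 5/14.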

2.1.1 Limitations and scope of work

In the following we summarize the limitations of the aforementioned approaches - (a) First

of all, some of the aforesaid algorithms only work on a specific type of multilayer networks.

For instance, Comar et al. developed a method based on non-negative matrix factoriza-

tion [Comar et al., 2012] which is restricted to a specific subclass of networks containing

only two types of nodes and three types of edges. On the other hand, the Dirichlet process

based model developed by Sun et al. [Sun et al., 2014] can only be applied to a special

type of heterogeneous network, called star networks. Clearly, methods of this type are

not applicable to general multilayer networks with any possible structure. (b) Secondly,

some of them are forced to detect communities comprising only single type of nodes or

only multiple types of nodes, hence introducing bias. For instance, Liu et al. proposed

an information compression based method [Liu et al., 2016] for community detection in

multi-partite, multi-relational networks. The idea was to convert the community detection

problem into a problem of finding an efficient compression of the network’s structure using

the minimum description length (MDL) principle [Rissanen, 1978]. However, the informa-

tion compression based and modularity based algorithms proposed by Liu et al. [Liu et al.,

2016, Liu et al., 2014] detect only communities with a single type of node in a multilayer

network. Similarly, high-order bi-clustering algorithms [Bekkerman et al., 2005, Banerjee

et al., 2007], where objects of each type are grouped separately, also suffer from the limitation

that each cluster consists of objects of only one type [Huang and Gao, 2014]. On

the contrary, Lin et al.’s tensor factorization based method [Lin et al., 2009] assumes that

each community contains nodes from every layer of the network which in turn implies that


nodes of different types have the same number of communities. Unfortunately, such

situations are rarely observed in real-world scenarios. (c) Thirdly, most of them require the

desired number of communities to be fixed a priori [Lin et al., 2009, Dong

et al., 2014], limiting their capability to discover the true set of communities. (d) Finally,

none of them provides a proper framework to generate benchmark communities for generic

multilayer networks.

The detailed exploration of prior art reveals that there is both a need and scope

to develop a multilayer community detection algorithm that is free from (a) any external

parameter, such as the total number of communities, and (b) any bias towards communities

with only a single type or only multiple types of nodes. Developing a suitable modularity

index should be the first step in this direction.

2.2 Information diffusion in multilayer networks

In this work, we specifically focus on how information propagates via follow and mention

links in Twitter. The state-of-the-art literature in this domain can be summarized in

three segments: (a) modeling information diffusion via retweets, (b)

various attempts to analyze and boost the popularity of tweets, and (c) recent endeavors

incorporating mentions in tweets. The details follow.

2.2.1 Information propagation in Twitter

Diffusion on social networks classically involves the following two propagation models:

linear threshold [Granovetter, 1978] and independent cascade [Goldenberg et al., 2001].

The linear threshold model associates a threshold with each node; a node gets infected if the

number of infected neighbors exceeds that threshold. On the other hand, the independent

cascade model associates a fixed spreading probability with each graph edge and allows

each node to attempt infecting another node only once. Further studies [Galuba et al.,

2010, Dickens et al., 2012a] have generalized these models. In continuation, [Kwak et al.,

2010] treated retweet trees as communication channels of information diffusion and analyzed

the tweets of top trending topics, whereas [Lerman and Ghosh, 2010] studied the

distribution of retweet cascades on Twitter. In parallel, popular epidemic-like models

such as SIS (Susceptible-Infected-Susceptible) and SIR (Susceptible-Infected-Recovered) have

also been explored to model information contagion in Twitter [Li et al., 2013, Abdullah and Wu,

2011, Jin et al., 2013]. Models of this type allow individuals the flexibility of

cyclically changing their dynamical states based on whether they are exposed to the infor-

mation, have actively participated in the spreading process or are immune to it. Recently,

the evolution of text mining in Twitter led to deep analysis of the topical retweet cascades

and virality prediction [Cheng et al., 2014, Kwak et al., 2010, Suh et al., 2010]. There

exist stochastic models (epidemic models [Li et al., 2013] and Markov chains [Dickens

et al., 2012b]), machine learning models [Xu and Yang, 2012, Gupta et al., 2012] and point process

based models [Zhao et al., 2015, Bao et al., 2015] that explain retweeting activity and

model cascade formation.
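
The two classical propagation models described at the start of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an undirected adjacency dictionary, a single probability p is shared by all edges, the threshold is compared against the count of infected neighbors (as worded above, rather than a weighted fraction), and the function names are hypothetical.

```python
import random


def independent_cascade(neighbors, seeds, p, rng=None):
    """Each newly infected node gets one chance to infect each neighbor."""
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors.get(u, ()):
                # A single attempt per (u, v); success with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def linear_threshold(neighbors, seeds, threshold):
    """A node becomes infected once its infected-neighbor count exceeds
    its own threshold; repeat until no node changes state."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in neighbors:
            if v in active:
                continue
            infected = sum(1 for u in neighbors[v] if u in active)
            if infected > threshold[v]:
                active.add(v)
                changed = True
    return active


# Toy path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

On the toy path, independent_cascade(path, ["a"], 1.0) reaches every node, while linear_threshold(path, ["a", "c"], {n: 1 for n in path}) infects only b in addition to the seeds, because d has a single infected neighbor.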

Diffusion of tweets via both follow and mention modalities can be effectively modeled in

the multiplex network framework, as the sets of users in the two layers largely overlap.

In recent times, there has been considerable interest in developing diffusion

models for multiplex networks. Initially, a few simplistic models such as [Buono et al.,

2014, Zhao et al., 2014, Cozzo et al., 2013] analyzed epidemics in multilayer networks

as a single contagion process spreading across multiple layers. Subsequently,

recent endeavors modeled the simultaneous epidemic spread across the multiple layers of

the network. For example, [Darabi Sahneh and Scoglio, 2014] proposed a simple exten-

sion of standard SIS framework to model competitive spreading over a two-layer network.

The major contribution of this work was identifying and quantifying extinction, coexis-

tence, and absolute dominance of the competitive epidemic process via defining survival

thresholds and absolute-dominance thresholds. Interestingly, Guo et al. [Guo et al., 2015]

and Granell et al. [Granell et al., 2013] demonstrated the interplay between simultaneous

spread of an epidemic disease and awareness against it, in the framework of multiplex

networks. Utilizing a microscopic Markov chain approach, they identified a phase transition

and captured the evolution of the epidemic threshold depending on the

topological structure of the multiplex and its interrelation with the awareness process.
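
As a rough illustration of a single contagion spreading across the layers of a multiplex, the sketch below runs a discrete-time SIS process in which every node has one shared infection state and each layer has its own infection probability. All names, the synchronous update rule and the toy "follow" and "mention" layers are assumptions for illustration only; this does not reproduce any of the cited models.

```python
import random


def sis_multiplex_step(layers, infected, beta, mu, rng):
    """One synchronous SIS step on a multiplex with a shared node state.

    layers: layer name -> adjacency dict; beta: layer name -> infection
    probability per contact; mu: recovery probability per step.
    """
    newly_infected = set()
    for name, adjacency in layers.items():
        for u in sorted(infected):  # sorted for reproducible random draws
            for v in adjacency.get(u, ()):
                if v not in infected and rng.random() < beta[name]:
                    newly_infected.add(v)
    recovered = {u for u in sorted(infected) if rng.random() < mu}
    # Recovered nodes become susceptible again (SIS, no immunity).
    return (infected - recovered) | newly_infected


def simulate_sis(layers, seeds, beta, mu, steps, seed=0):
    """Return the number of infected nodes after each step."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_multiplex_step(layers, infected, beta, mu, rng)
        history.append(len(infected))
    return history


# Toy multiplex: u1 - u2 in the follow layer, u2 - u3 in the mention layer.
toy_layers = {
    "follow": {"u1": ["u2"], "u2": ["u1"]},
    "mention": {"u2": ["u3"], "u3": ["u2"]},
}
```

In the toy multiplex, u3 can only be reached through the mention layer, so setting the mention infection probability to zero caps the outbreak at two nodes even when the follow layer transmits with certainty.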


2.2.2 Analyzing and boosting tweet popularity

With the advent of text mining, research in the domain of information propagation in Twit-

ter started progressing in two clearly distinguishable tracks. On one hand, several studies

have been carried out for understanding the dynamics behind the popularity of tweets. For

example, [Suh et al., 2010] used a generalized linear model to understand what features

influence the chance of a tweet being retweeted by anyone. In a similar vein, in [Petrovic

et al., 2011] and [Malhotra et al., 2012], researchers investigated the role of content

and contextual features of tweets and identified factors that are significantly associated

with retweet rate and tweet popularity. Additionally, a few studies [Bakshy et al., 2011b, Zaman

et al., 2010] analyzed the problem at the individual level, predicting the existence of

a retweet between a particular pair of users. On the other hand, several studies have been

conducted on influence models [Chen et al., 2009, Bakshy et al., 2011a], and different recommendation

systems have been proposed. For example, [Uysal and Croft, 2011] proposed

methods to recommend useful tweets that users are really interested in and more likely to

retweet: given a tweet, they rank users based on retweet probability. Importantly, [Cha

et al., 2010] revealed that follower count is not necessarily the best metric to measure the

influence. Subsequently, considering the influential users in Twitter as potential informa-

tion brokers, researchers proposed models to identify them and maximize the information

propagation [Borge-Holthoefer et al., 2012, Chen et al., 2009]. Notably, all the aforemen-

tioned models consider retweets as the only mode of tweet propagation.

2.2.3 Mentioning activities in Twitter

Mentioning is mainly considered as a medium of attracting the attention of influential peo-

ple regarding a tweet so that the popularity of the corresponding content increases. How-

ever, mentioning one influential user in a tweet does not ensure that she reposts it. This

latter part depends on several factors, including the information content of the post and the profile of the

tweeting user. Standard influence models in general do not capture properties of this type

while computing the influence of individual users. This has motivated

the development of mention recommendation algorithms to identify suitable users

to mention. For instance, [Wang et al., 2013a] formulated the task as a learning-to-rank

problem and proposed the Whom-to-Mention heuristic, which uses features (such as user interest

match, content dependent user relationship and user influence) and trains a machine learned

ranking function to extract the best users to mention. Similar recommendation heuristics

can be found in [Zhou et al., 2015] and [Tang et al., 2014]. However, instead of being lim-

ited to maximizing the spread of a microblog, [Gong et al., 2015] proposed a novel topical

translation based method to predict the users whom the authors try to mention.

2.2.4 Limitations and scope of work

The literature discussed above suffers from the following limitations: (a) most of the information

diffusion models proposed for Twitter ignore the mention links. (b) Due to the

absence of a network structure realizing mentioning activities, the aforementioned multiplex

models of diffusion are inadequate for Twitter. (c) In the case of mention recommendation,

most of the existing recommendation heuristics are based on empirical observations and

lack a comprehensive understanding of the role of mentions in tweet diffusion.

The detailed survey of the relevant literature implies that there is ample scope to develop

an information diffusion model for Twitter that considers both the retweeting and mentioning

modalities. There is also scope for developing a novel mention recommendation system

which would not only take cues from the existing works on popularizing tweets but also rely

on the understanding of the dynamics obtained from the aforementioned diffusion model.

2.3 Node movement in multilayer networks

The dynamics behind node movement across the layers of multilayer networks are rarely

studied in the literature (except for a few works in the domain of transportation networks [Lotero

et al., 2016]). While addressing this objective, we specifically focus on analyzing the

knowledge social network where layers represent fields of research and nodes in each layer

represent the researchers working in the corresponding field. Precisely, we study the rea-

sons and impacts of scientific migration i.e. the movement of researchers across the layers

(i.e. fields of research) of the knowledge social network over time. In the following, we


discuss the relevant literature.

2.3.1 Scientific migration

To the best of our knowledge, understanding of how scientists choose and shift their research

focus over time remains scant. During the 1980s, there was a surge in

research on field mobility and field migration [Vlachý, 1981, Le Pair, 1980, Van Houten

et al., 1983, Hargens, 1986]. Field mobility in those works has been discussed as the driv-

ing force for the exploration of new territories in the landscape of science [Urban, 1982]. In

those endeavors, experiments relied on personal interviews and surveys to trace academic

careers (primarily in physics domain); evidently those approaches were mostly restricted to

small case studies and could not be generalized [Van Houten et al., 1983]. In recent times,

a few piecemeal attempts have been made to study field migration at the level of individuals.

In [Hellsten et al., 2007], Hellsten et al. developed a new way of approaching field mobility

via self-citations, whereas in [Chakraborty et al., 2015], Chakraborty et al. proposed a

model to mimic the field selection process of researchers. Recently, Jia et al. [Jia

et al., 2017] aimed to model the research interest evolution of scientific researchers using

a simple random walk. They exploited the Physics and Astronomy Classification Scheme

(PACS) codes used by American Physical Society (APS) to classify topics in Physics and

identified three fundamental factors - recency, heterogeneity and subject proximity which

shape the interest evolution of researchers.

2.3.2 Researcher mobility

Rather than movements across research fields, in literature ‘mobility’ of a researcher is typ-

ically perceived as movement across countries or universities [Scellato et al., 2015, Deville

et al., 2014, Franzoni et al., 2014]. For instance, in [Franzoni et al., 2014] Franzoni et al.

showed that migrant scientists exhibit higher productivity compared to domestic scientists,

irrespective of their prior experience of international mobility. This superior performance is

potentially caused by the gain achieved from knowledge recombination and specialization

matching as a result of mobility. On the other hand, Barabasi et al. studied the pattern of

career movement at an institutional level [Deville et al., 2014]. The authors observed that

going from elite to lower-rank institutions is, on average, associated with a modest decrease in

scientific performance. However, transitioning into elite institutions does not result in sub-

sequent performance gain. In other words, it is not the institution that creates the research

impact; it is the individual researchers that make an institution [Fortunato et al., 2018].

2.3.3 Interdisciplinarity in research

In the scientometrics domain, plenty of attempts have been made to study the impact of

interdisciplinary research [Yegros-Yegros et al., 2015, Wagner et al., 2011, Chakraborty

e
