RICE UNIVERSITY

Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems

by

Alan E. Mislove

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy

Approved, Thesis Committee:

Peter Druschel, Chair, Professor of Computer Science

T. S. Eugene Ng, Assistant Professor of Computer Science

Krishna P. Gummadi, Assistant Professor of Computer Science

Houston, Texas

April, 2009

Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems

Alan E. Mislove

Abstract

Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively and have led to useful algorithms such as PageRank. In contrast, online social networks contain few links between content; instead, links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems.

In this thesis, we use novel measurement techniques to study online social networks at scale, and we use the resulting insights to design new information systems. First, we examine the structure and growth patterns of online social networks, focusing on how users connect to one another. We conduct the first large-scale measurement study of multiple online social networks, capturing information about over 50 million users and 400 million links. Our analysis identifies a common structure across multiple networks, characterizes the underlying processes that shape the network structure, and exposes the networks' rich community structure.

Second, we leverage our understanding of the properties of online social networks to design new information systems. Specifically, we build two distinct applications that leverage different properties of online social networks. We present and evaluate Ostra, a novel system for preventing unwanted communication that leverages the difficulty of establishing and maintaining relationships in social networks. We also present, deploy, and evaluate PeerSpective, a system for enhancing Web search using the natural community structure in social networks. Each of these systems has been evaluated either on data from real online social networks or in a deployment with real users.

Acknowledgments

First and foremost, I would like to thank my advisors, Peter Druschel and Krishna P. Gummadi, for their help, advice, and mentoring during my graduate career. Without their support and guidance, none of the work presented in this thesis would have been possible. Moreover, I am deeply indebted to them both for showing me how to do successful research, how to mentor students, and how to communicate research results effectively. I suspect that this debt will only grow over time, as I use these skills in my own research career.

I would also like to thank Eugene Ng for his service on my thesis committee. His insight and advice proved very useful during the preparation of this thesis and in my search for a tenure-track job. I am also grateful to have worked with Bobby Bhattacharjee, whose advice and enthusiasm played no small part in my decision to continue a career in academia.

I am extremely grateful to have worked with and mentored numerous talented students during my research career. Working with Bimal, Malveeka, and Hema was a pleasure, and the excitement and energy they each brought to their research were both refreshing and invigorating. I hope that I am lucky enough to work with students of a similar caliber in the future.

I am deeply indebted to Brigitta Hansen, Claudia Richter, and Belia Martinez, whose assistance with many administrative matters proved invaluable. They all made living in Germany while finishing a Ph.D. at Rice a much easier experience.

I would also like to thank my colleagues and friends in Saarbrücken: Ansley, Animesh, Atul, Andreas, Jeff, Jim, Rodrigo, Andrey, Derek, Rose, Marcel, Max, Nuno, Pedro, Mia, and Ashu. They all made MPI-SWS a wonderful place to be, and my time in Germany is an experience that I will always treasure.

I am also grateful for my close friendship with Rebecca. Her contagious excitement and enthusiasm were always refreshing, and I benefited greatly from her insight and advice. Additionally, I am grateful for my friendship with Stephanie; our travels and adventures often provided a needed break from research.

Finally, I would like to express my deep gratitude to my family, and especially my parents, for their love and support during the ups and downs of graduate school. I am grateful beyond words for all that they have given me.

Contents

Abstract

Acknowledgments

List of Illustrations

List of Tables

1 Introduction

1.1 Background, related work, and methodology

1.2 Network structure and growth

1.3 Communities in online social networks

1.4 Ostra: Leveraging relationships

1.5 PeerSpective: Leveraging shared interest

2 Background

2.1 What are online social networks?

2.1.1 Definition and purpose

2.1.2 A brief history

2.1.3 Mechanisms and policies

2.1.4 A new form of information exchange

2.2 Why study online social networks?

2.2.1 Trust

2.2.2 Shared interest

2.2.3 Content exchange

2.2.4 Other disciplines

2.3 How do we analyze complex networks?

2.3.1 Preliminaries

2.3.2 Radius and diameter

2.3.3 Degree distribution

2.3.4 Joint degree distribution

2.3.5 Scale-free behavior

2.3.6 Assortativity

2.3.7 Clustering coefficient

2.3.8 Betweenness centrality

2.3.9 Modularity

2.3.10 Connected components

2.3.11 Classes of studied networks

2.3.12 Preferential attachment

3 Related Work

3.1 Complex network structure

3.1.1 Social networks

3.1.2 Other information networks

3.2 Complex network growth

3.2.1 Growth models

3.2.2 Observations of network growth

3.3 Detecting communities

3.3.1 Classical community detection

3.3.2 Global community detection

3.3.3 Local community detection

3.3.4 Observations of communities

3.4 Preventing unwanted communication

3.4.1 Content-based filtering

3.4.2 Originator-based filtering

3.4.3 Imposing a cost on the sender

3.4.4 Content rating

3.4.5 Leveraging relationships

3.5 Personalized web search

4 Measurement Methodology

4.1 Challenges in crawling large graphs

4.1.1 Crawling the entire large WCC

4.1.2 Using only forward links
