
Towards Designing a Light Weight Social Network Analysis Tool

by

Veerasingam Visithan

August 2011

Contents

1 Introduction

2 Background
2.1 Networks
2.2 Social Networks
2.2.1 The Definition of Online Social Networks
2.3 Social Network Analysis (SNA)
2.4 Metrics of Interest in Social Network Analysis
2.4.1 Density of a Network
2.4.2 Degree of a Vertex
2.4.3 Components
2.4.4 Shortest Path and Shortest Path Length
2.4.5 Centrality
2.4.5.1 Degree Centrality
2.4.5.2 Betweenness Centrality
2.4.5.3 Closeness Centrality
2.4.6 Community
2.4.6.1 Weak Link Removing Method
2.4.6.2 Removal of Broker Nodes
2.5 Modularity Division Algorithm
2.6 Network Analysis Software
2.7 Visualization of a Network

3 Tool Development
3.1 Outline
3.2 Implementation of Algorithms
3.2.1 Algorithm for Computing the Density of a Network
3.2.2 Algorithm for Finding the Degree of Node
3.2.3 Algorithm for Finding the Neighbors of Node
3.2.4 Algorithm for Finding the Shortest Path Length
3.2.5 Algorithm for Finding the Connected Components
3.2.6 Algorithm for Finding the Shortest Path
3.2.7 Algorithm for Finding the Degree Centrality
3.2.8 Algorithm for Finding the Betweenness Centrality
3.2.9 Algorithm for Finding the Mutual Neighbors of Two Nodes
3.2.10 Algorithm for Community Detection (Weak Link Removing Algorithm)

4 Results and Performance Comparison
4.1 Results
4.1.1 The Overview of a Network
4.1.2 Degree of a Node
4.1.3 Connected Components of a Network
4.1.4 Centrality in a Network
4.1.5 Shortest Paths and Shortest Path Lengths
4.1.6 Community in a Network
4.2 Performance Comparison
4.2.1 Visualization Speed
4.2.2 Centrality
4.2.3 Discussion

5 Conclusions and Future Work

Bibliography

List of Figures

2.1 An example of social network visualization
2.2 An example of connected components of a network
2.3 A case of centrality measures applied on an example network
2.4 Application of network clustering
2.5 Application of network clustering
2.6 Network Visualization using NetworkX
2.7 Visualization of a network using our tool

4.1 Overview of a network
4.2 Calculation of degree of all nodes of a network
4.3 Connected components in a network
4.4 Calculation of centrality of a node in a network
4.5 Shortest path calculation and visualization of given two nodes
4.6 Communities detection in a network and visualization of communities
4.7 Performing visualization using our tool
4.8 Comparing degree centrality performance using our tool and NetworkX
4.9 Comparing betweenness centrality performance using our tool and NetworkX

List of Tables

1.1 A listing of network analysis tools

2.1 Categories of reasons for people to join social networks

Chapter 1

Introduction

Social network analysis has gained much popularity in recent years, and the computer science research community has paid much attention to designing and developing newer algorithms, developing software tools and formulating theoretical foundations. There are many reasons for this increased interest. Monitoring, automated instrumenting, analyzing and evaluating Internet or computer-based social activities can help researchers and authorities extract useful information. This extracted information can assist authorities in monitoring and preventing peer-to-peer file exchanges, unnecessary chatting, blogging and the outbreak of viral infections. It can also assist intelligence agencies seeking to discover terrorist networks, malicious attacks, etc.

Social network analysis began with the detection and interpretation of patterns of social ties between the actors of networks. It has its origins in sociology and mathematics, but it is now practiced across a wide range of other disciplines. Social network analysis allows us to find the density of a network and the degree of its nodes, recurring patterns of connectivity, and the clustering or community structure present in the network, and to study how the behavior of network members is affected by their positions. It also allows us to mathematically and computationally determine the centrality of nodes, the connected components of a network, and the shortest paths and shortest path lengths between any two nodes. Social network analysis has been applied in areas as diverse as psychology, health, business organization, and electronic communications in order to characterize and observe human behavior, the spread of diseases, business production, network monitoring, resource allocation, etc.

Social networks are often large, complex and continuously evolving, and this poses a great deal of challenges in developing tools for social network analysis. Information visualization can be an effective approach to help social science researchers both to explore relationships between actors and to present their findings to others. For both of these purposes, making the visual representations of social networks more understandable is the most critical job.


Network visualization has been an important tool for researchers from the very beginning of social network analysis. In visualization, a social network is drawn as a graph: each actor is represented by a node, and the relations among the actors are represented by links. Using visualization tools, we can easily find the most central node and the neighbors of a node in a network.

Visualization is the act of representing a network as a graph of vertices (nodes) and edges (links). In social network analysis, a node represents an actor and a link between two nodes represents a relationship between the two actors.

A challenging problem in the analysis of graph structures is the dense subgraph problem: given a sparse graph, the objective is to identify a set of meaningful dense subgraphs. This problem has attracted much attention in recent years due to the increased interest in studying various complex networks, such as the World Wide Web (an information network), social networks, and biological networks. The dense subgraphs are often interpreted as "communities" or "clusters", based on the basic assumption that a network system consists of a number of communities, between which the connections are much fewer than those inside the same community. Detecting meaningful subgraphs within a network is called community detection. There are several existing approaches to community detection, such as the Kernighan-Lin algorithm, spectral partitioning, hierarchical clustering, and weak link removal clustering.

Other important properties of a social network are the shortest path and the path length between any two given nodes, the density of the network, the degree of a node, and the common neighbors of any two given nodes.

Social network analysis tools have been used to analyze interactions within social network sites. Several network analysis tools exist, and Table 1.1 lists some of the most popular.

In this work we have developed a light weight social network analysis tool using the Java programming language. Java was selected for its platform independence. This tool has been developed with a vision to assist researchers and authorities in computationally visualizing and analyzing a social network and extracting useful information. We have developed this tool such that it can read a text file containing relationship information among network nodes in a three-column format and then visually present the network to the user. The user can then apply operations such as betweenness calculations, community identification, etc. to the loaded network and see the result of each operation visually. Our tool also allows a user to store the visual result at each stage as a picture in formats such as .png, .jpg and .gif, and to store the calculated network as a text file in the .txt format. The tool is also capable of handling large networks consisting of thousands of nodes and edges.

Product        Main Functionality
AllegroGraph   Graph database. RDF with Gruff visualization tool
NetworkX       Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks
Pajek          Analysis and visualization of large scale networks
R              Social network analysis within the versatile and popular R environment
NetMiner       All-in-one software for network analysis and visualization
Graphviz       Graph visualization software
iPoint         Analysis and visualization of social network trends, geo-location, age, gender and sentiment
CFinder        Finding and visualizing communities

Table 1.1: Network analysis tools. Source: Wikipedia

The rest of this document is organized as follows. In Chapter 2 we describe the design background of our tool, metrics in social network analysis and details of existing social network analysis tools. Chapter 3 describes the outline of the tool, the algorithms used in our tool, the enhancements we have made to existing algorithms and the implementation of the algorithms in our tool. In Chapter 4, we present the performance comparison of our tool with existing tools and provide a discussion of the performance comparisons. In Chapter 5 we list our conclusions and outline our future work.

Chapter 2

Background

2.1 Networks

A network is composed of a set of vertices or nodes that represent the selected units, and a set of lines or links that represent ties between units. Each link has two nodes at its end points, and can be directed or undirected. Additional data about nodes or lines are usually known as their properties or attributes, for example the name or label of a node, the weight of a link, position, etc. Put simply, a network consists of a graph and data [4].

Network = Graph + Data.

Data attached to nodes and links can be obtained from measurements or computed from other related factors. Formally, a network N = (V, E, P, W) consists of:

1. A graph G = (V, E), where V is the set of nodes and E is the set of links

2. P = the properties of the nodes

3. W = the data attached to each link
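As a rough illustration of this definition, the following Java sketch stores a network as a graph plus data. The class and member names (`SimpleNetwork`, `nodeProperties`, `linkWeights`) are hypothetical and are not taken from the tool; links are assumed to be undirected, and node identifiers are assumed not to contain the `|` character.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical container for N = (V, E, P, W); all names are illustrative only.
class SimpleNetwork {
    // V: the node identifiers
    final Set<String> nodes = new LinkedHashSet<>();
    // E and W: each undirected link is stored once under a canonical key, mapped to its weight
    final Map<String, Double> linkWeights = new HashMap<>();
    // P: one property (here a label) per node
    final Map<String, String> nodeProperties = new HashMap<>();

    void addNode(String id, String label) {
        nodes.add(id);
        nodeProperties.put(id, label);
    }

    void addLink(String a, String b, double weight) {
        nodes.add(a);
        nodes.add(b);
        linkWeights.put(key(a, b), weight);
    }

    // Order the end points so that (a, b) and (b, a) refer to the same link.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    int nodeCount() { return nodes.size(); }

    int linkCount() { return linkWeights.size(); }
}
```

The two counters correspond to the two numbers that describe the size of a network: the number of nodes and the number of links.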

The size of a network can be expressed by two numbers: the number of nodes and the number of links. In the real world, there exist several kinds of networks, such as:

• Information networks: consist of connections between information objects.

– Network of citations between academic papers, World Wide Web (network of Web pages containing information with links from one page to another), semantic networks (how words or concepts link to each other).

• Technological networks: typically designed for the distribution of services.

– Infrastructure networks: e.g., Internet (connections of routers or administrative domains), power grid, transportation networks (road, rail, airline, mail)

– Temporary networks: e.g., ad hoc communication networks, sensor networks, autonomous vehicles

• Biological networks: a number of biological systems can also be represented as networks.

– Food web, protein interaction network, network of metabolic pathways

• Social and economic networks:

– Friendship networks, business relations between companies, intermarriages between families, labor markets, Facebook.

2.2 Social Networks

Networking and social relationships are key components of human life, and they have historically been bound by limitations of space and time; these restrictions have been partially removed by the evolution of the Internet and the spread of its use. In particular, the appearance of Web technologies and their evolution towards Web 2.0 enable people to organize themselves into Online Social Networks in the same manner they organize themselves in social networks in real life [2].

A social network consists of a number of actors or individuals connected by some kind of relationship. Actors can be individuals, groups, or organizations. Relationships can be of any kind: financial, friendship, professional, etc. Networks can also include actors' relationships with other kinds of entities, such as events that multiple people may attend, or interests that multiple people may share. In Online Social Networks, individuals can exchange information by posting conversations, by browsing or by posting questions. These information resources are socially helpful because they allow people to easily establish contacts among themselves.

In fact, Online Social Networks are fostered by a combination of hardware and software tools that allow people to easily, inexpensively, immediately, universally, and reliably share information by interacting with each other using data communication on networks.

The difference between earlier social networks and Online Social Networks lies mainly in the mechanism used by the members to communicate with each other: regular social networks are based on face-to-face interaction, whereas online social networks employ information and communication technology tools to facilitate interaction and promote communication among people anywhere, anytime.


The notion of an interaction refers to the mingling within a relationship during which people can share their interests and experiences with others.

The interest in online Social Network Analysis has been growing massively in recent years. Psychologists, anthropologists, sociologists, economists, and statisticians have made significant contributions, making it a truly interdisciplinary research area. This growth coincides with the increasing development of methods used to collect and visualize network data in order to analyze relationships between people, groups, organizations and other knowledge-processing entities on the net.

2.2.1 The Definition of Online Social Networks

Romm et al. [16] define Online Social Networks as groups of people who communicate with each other via electronic media and share common interests unconstrained by their geographical location, physical interaction, or ethnic origin.

Ridings et al. [15] define Online Social Networks as groups of people with common interests and practices that communicate regularly and for some duration in an organized way over the Internet through a common location or mechanism.

Hagel et al. [10] take a business perspective and define Online Social Networks as groups of people drawn together by an opportunity to share a sense of community with like-minded strangers having a common business interest.

Balasubramanian et al. [3] take an economic perspective and define Online Social Networks as entities characterized by 'an aggregation of people, who are rational utility-maximizers, who interact without physical collocation in a social exchange process with a shared objective'.

The main issue in studying Online Social Networks is to analyze the motivations that lead people to join them. As shown in Table 2.1, there are different motivations that lead people to join Online Social Networks.

Category              Description                        Dimensions
Exchange information  Obtain information about a topic   Motivation, skills, digital divide, confidence, well-being
Social aspect         Obtain emotional support           Discussion groups, education, leisure entertainment
Friendship            To make friends                    Shared knowledge, collective experience, self-esteem
Recreation            For entertainment                  Shopping, ordering, investments, bill paying

Table 2.1: Categories of reasons for people to join social networks


Information exchange is the most important factor in the success of Virtual Social Networks. The possibility of sharing a lot of information in Virtual Social Networks allows discussions on different questions and problems. Individuals can either give information (by posting conversations) or get information (by browsing, or by soliciting information through posted questions or comments).

2.3 Social Network Analysis (SNA)

Online Social Network Analysis is the study of online social structure: detecting and interpreting patterns of social relations among actors or individuals.

The analysis allows us to find the density of a network and the degree of its nodes, its clustering (community), its connected components and the shortest path between two actors, and to study how the behavior of network members is affected by their positions and connections (centrality).

There are two basic kinds of network analysis, reflecting two basically different kinds of data [2]:

1. Socio-centric (complete networks)

2. Ego-centric (personal networks)

The socio-centric method maps the relations among actors that are regarded, for analytical purposes, as bounded social collectives. This is most appropriate for tightly bound networks. In contrast, the ego-centric method maps the relations of a key individual. This is most appropriate for loosely bound networks.

Social networks are also characterized by a distinctive methodology encompassing techniques for collecting data, statistical analysis, visual representation, etc.

Social networks have also been used to examine how organizations interact with each other, characterizing the many informal connections that link executives together, as well as associations and connections between individual employees at different organizations. For example, power within organizations often comes more from the degree to which an individual within a network is at the center of many relationships than from the actual job title. Social networks also play a key role in hiring, in business success, and in job performance. Networks provide ways for companies to gather information, deter competition, and collude in setting prices or policies.


Figure 2.1: Example of social network visualization. This figure was obtained from our tool.

2.4 Metrics of Interest in Social Network Analysis

2.4.1 Density of a Network

In network analysis, density is the number of links in a network expressed as a proportion of the maximum possible number of links. It is inversely related to network size: the larger the social network, the lower the density, because the number of possible links increases rapidly with the number of nodes, whereas the number of ties each person can maintain is limited. Maximum density is found in a complete simple network, that is, a simple network in which all pairs of nodes are linked by an edge.

Density = Number of edges / Number of possible edges

Number of possible edges = ((Number of nodes-1)*Number of nodes)/2

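If a network has a density of 0.025, then only 2.5 percent of all possible edges are present. Since the size of a network is expressed by its number of nodes and its number of links, the density together with the number of nodes is useful for characterizing the size of a social network.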
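As a small worked example of the two formulas above (the numbers are chosen purely for illustration), consider a simple undirected network with 10 nodes and 9 edges:

```latex
\text{Number of possible edges} = \frac{(10 - 1) \times 10}{2} = 45,
\qquad
\text{Density} = \frac{9}{45} = 0.2
```

In this illustrative network, 20 percent of all possible edges are present.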


2.4.2 Degree of a Vertex

The degree of a vertex is the number of lines incident with it. Degree is a discrete attribute of a vertex (it is always an integer). In a directed graph, a vertex has two types of degree: in-degree and out-degree. The in-degree of a vertex is the number of arcs it receives; the out-degree is the number of arcs it sends. In a simple undirected network, the degree of a vertex is equal to the number of nodes adjacent to it.

In a social network, degree indicates the popularity of a vertex. The node with the maximum degree is called the most popular node. A node with high degree has better access to information and better opportunities to spread information [14, 11, 1].
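The in-degree and out-degree definitions above can be sketched in Java as follows. This is only an illustration: the class name `DirectedDegrees` and the encoding of an arc as a two-element array `{source, target}` are assumptions, not part of the tool.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: in-degree and out-degree of every vertex of a directed network.
class DirectedDegrees {
    // In-degree: the number of arcs a vertex receives.
    static Map<String, Integer> inDegree(List<String[]> arcs) {
        Map<String, Integer> in = new HashMap<>();
        for (String[] arc : arcs) {
            in.merge(arc[1], 1, Integer::sum); // the target receives this arc
            in.putIfAbsent(arc[0], 0);         // list the source even if it receives nothing
        }
        return in;
    }

    // Out-degree: the number of arcs a vertex sends.
    static Map<String, Integer> outDegree(List<String[]> arcs) {
        Map<String, Integer> out = new HashMap<>();
        for (String[] arc : arcs) {
            out.merge(arc[0], 1, Integer::sum); // the source sends this arc
            out.putIfAbsent(arc[1], 0);         // list the target even if it sends nothing
        }
        return out;
    }
}
```

For an undirected network, the degree of a vertex can be obtained in the same way by counting every link once for each of its two end points.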

2.4.3 Components

Sometimes a network is cut up into pieces. Isolated sections of the network may be regarded as cohesive subgroups because the nodes within a section are connected, whereas nodes in different sections are not. Each such section is a connected component, that is, a subgraph in which every pair of nodes is joined by a path. Each vertex belongs to exactly one component, and a network may have several connected components.
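One common way to find connected components is a breadth-first search started from every node that has not been visited yet. The sketch below assumes an undirected network given as a set of node names and a list of two-element arrays `{a, b}`; the class name `ComponentsSketch` is hypothetical, and the code is not necessarily how the tool implements this measure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: connected components of an undirected network via breadth-first search.
class ComponentsSketch {
    static List<Set<String>> components(Set<String> nodes, List<String[]> edges) {
        // Build a symmetric adjacency list; isolated nodes get an empty list.
        Map<String, List<String>> adj = new HashMap<>();
        for (String n : nodes) {
            adj.put(n, new ArrayList<>());
        }
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        List<Set<String>> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String start : adj.keySet()) {
            if (!visited.add(start)) {
                continue; // already assigned to an earlier component
            }
            Set<String> component = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String u = queue.poll();
                component.add(u);
                for (String v : adj.get(u)) {
                    if (visited.add(v)) { // true only the first time v is seen
                        queue.add(v);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }
}
```

Because a node is marked as visited the first time it is reached, every vertex ends up in exactly one component.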

2.4.4 Shortest Path and Shortest Path Length

A shortest path between two nodes is a minimal length path between them. Methods to find a shortest path were discovered and analyzed as early as the late 1950s and early 1960s by Bellman, Bock and Cameron, Caldwell, Dantzig, Dijkstra, Floyd, and probably others [17, 7].

In unweighted graphs, the distance between two nodes s and t (source and target, respectively) is defined as follows. If s and t are connected by an edge, their distance is 1. If they are not directly connected, the distance is defined by the length of a shortest path between s and t, which is a sequence of adjacent edges. In a weighted graph, the length of a path is defined by the sum of the weights of the edges on the path. Consequently, shortest paths are defined with respect to these weights. Note that, in weighted graphs, even if two nodes are connected by an edge, depending on its weight, the edge is not necessarily part of any shortest path. In a social network, a weight merely indicates a relationship between two nodes, so we can treat a social network as an unweighted graph.
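For unweighted graphs, a shortest path can be found with a breadth-first search. The following Java sketch is one minimal way to do this; the class `BfsPathSketch`, its helper `adjacency`, and the representation of edges as two-element arrays are assumptions made for illustration and may differ from the tool's own implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: shortest path between two nodes of an unweighted, undirected graph.
class BfsPathSketch {
    // Build a symmetric adjacency list from edges given as {a, b} pairs.
    static Map<String, List<String>> adjacency(String[][] edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    // Returns the nodes on one shortest path from s to t, or an empty list if t is unreachable.
    // The shortest path length is the size of the returned list minus one.
    static List<String> shortestPath(Map<String, List<String>> adj, String s, String t) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(s, null); // the source has no predecessor
        queue.add(s);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            if (u.equals(t)) {
                break;
            }
            for (String v : adj.getOrDefault(u, Collections.<String>emptyList())) {
                if (!parent.containsKey(v)) {
                    parent.put(v, u);
                    queue.add(v);
                }
            }
        }
        List<String> path = new ArrayList<>();
        if (!parent.containsKey(t)) {
            return path; // no path exists
        }
        for (String v = t; v != null; v = parent.get(v)) {
            path.add(v); // walk back from the target to the source
        }
        Collections.reverse(path);
        return path;
    }
}
```

For weighted graphs, breadth-first search is not sufficient, and an algorithm such as Dijkstra's would be needed instead.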


Figure 2.2: Example of connected components. This network has three connected components. This figure was obtained from our tool by applying the connected components measure.

2.4.5 Centrality

Centrality is one of the oldest concepts in network analysis. Most social networks contain people or organizations that are central. Because of their position, they have better access to information and better opportunities to spread information. This is known as the ego-centered approach to centrality. Viewed from a socio-centered perspective, the network as a whole is more or less centralized.

Centrality measures have been applied in a variety of research works, for instance to investigate influence patterns in inter-organizational networks, to study power or competence in organizations, to analyze the structure of terrorist and criminal networks, to analyze employment opportunities, and in many other fields.

We can measure centrality using several methods, such as degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality [5, 14, 1, 8].


Figure 2.3: A case of centrality measures applied on an example network

2.4.5.1 Degree Centrality

Degree centrality measures the number of direct connections that an individual node has to other nodes within a network. Nodes with high degree centrality have been shown to be more active and influential.

2.4.5.2 Betweenness Centrality

Betweenness centrality measures the extent to which a node can act as an intermediary or broker to other nodes. The more times that a particular node lies on paths that exist between other pairs of nodes in the network, the higher the betweenness centrality is for that node. Nodes that have a high betweenness centrality may act as brokers between subgroups and they may have stronger membership in surrounding communities [7].
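One widely used way to compute betweenness centrality on an unweighted graph is Brandes' algorithm, sketched below in Java. This is an illustration under stated assumptions and may differ from the tool's own algorithm: the graph is undirected, only shortest paths are counted, each unordered pair of nodes is counted once, and the scores are not normalized. The class name `BetweennessSketch` and its helper `adjacency` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of Brandes' algorithm for unweighted, undirected graphs.
class BetweennessSketch {
    // Build a symmetric adjacency list from edges given as {a, b} pairs.
    static Map<String, List<String>> adjacency(String[][] edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    // Every neighbor must also appear as a key of adj (true for maps built by adjacency()).
    static Map<String, Double> betweenness(Map<String, List<String>> adj) {
        Map<String, Double> cb = new HashMap<>();
        for (String v : adj.keySet()) {
            cb.put(v, 0.0);
        }
        for (String s : adj.keySet()) {
            Deque<String> stack = new ArrayDeque<>();           // nodes in order of discovery
            Map<String, List<String>> pred = new HashMap<>();   // predecessors on shortest paths
            Map<String, Double> sigma = new HashMap<>();        // number of shortest paths from s
            Map<String, Integer> dist = new HashMap<>();        // distance from s, -1 if unseen
            Map<String, Double> delta = new HashMap<>();        // dependency of s on each node
            for (String v : adj.keySet()) {
                pred.put(v, new ArrayList<>());
                sigma.put(v, 0.0);
                dist.put(v, -1);
                delta.put(v, 0.0);
            }
            sigma.put(s, 1.0);
            dist.put(s, 0);
            Deque<String> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                stack.push(v);
                for (String w : adj.get(v)) {
                    if (dist.get(w) < 0) {                       // w is seen for the first time
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) {        // v lies on a shortest path to w
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        pred.get(w).add(v);
                    }
                }
            }
            // Accumulate dependencies, farthest nodes first.
            while (!stack.isEmpty()) {
                String w = stack.pop();
                for (String v : pred.get(w)) {
                    double share = sigma.get(v) / sigma.get(w) * (1.0 + delta.get(w));
                    delta.put(v, delta.get(v) + share);
                }
                if (!w.equals(s)) {
                    cb.put(w, cb.get(w) + delta.get(w));
                }
            }
        }
        // In an undirected graph every pair was visited from both ends, so halve the totals.
        cb.replaceAll((node, value) -> value / 2.0);
        return cb;
    }
}
```

On a three-node path a, b, c, for example, only b lies between another pair of nodes, so b receives a score of 1 and the two end points receive 0.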

2.4.5.3 Closeness Centrality

Closeness centrality measures how many steps on average it takes for an individual node to reach every other node in the network. In principle, nodes with high closeness centrality should be able to connect more efficiently or easily with other nodes.
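A minimal Java sketch of this measure is given below. It assumes an unweighted, undirected graph and uses one common convention, namely the reciprocal of the average shortest path length from a node to the nodes it can reach, so that a smaller average number of steps gives a higher score. The names `ClosenessSketch` and `adjacency` are hypothetical, and the tool may use a different normalization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: closeness of a node as 1 / (average distance to reachable nodes).
class ClosenessSketch {
    // Build a symmetric adjacency list from edges given as {a, b} pairs.
    static Map<String, List<String>> adjacency(String[][] edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    static double closeness(Map<String, List<String>> adj, String source) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        long totalSteps = 0; // sum of shortest path lengths from the source
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : adj.getOrDefault(u, Collections.<String>emptyList())) {
                if (!dist.containsKey(v)) {
                    int d = dist.get(u) + 1;
                    dist.put(v, d);
                    totalSteps += d;
                    queue.add(v);
                }
            }
        }
        int reached = dist.size() - 1; // reachable nodes other than the source
        // An isolated node reaches nobody; report 0 for it by convention.
        return totalSteps == 0 ? 0.0 : (double) reached / totalSteps;
    }
}
```

On a three-node path a, b, c, the middle node b reaches both others in one step and gets closeness 1, while an end point needs 1 and 2 steps and gets 2/3.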

2.4.6 Community

A community within a network is a densely connected group of nodes. Several methods exist for detecting community structure: the weak link removing method, the removal of broker nodes (broker nodes have high betweenness centrality), the modularity method, the eigenvector method, and so on.


2.4.6.1 Weak Link Removing Method

In this method, we remove the nodes or edges that fall below a threshold value. The threshold value should lie between the minimum weight and the maximum weight of the network.

(a) A network before clustering (b) The network after clustering. Three clusters appear.

Figure 2.4: Network clustering using the weak link removal method. These figures were obtained from our tool.
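The edge-filtering step of the weak link removing method can be sketched as follows. The `Edge` class with `from`, `to` and `weight` fields mirrors the edge properties described in Section 3.2.2, but the class and method names here are hypothetical; the tool's own algorithm is the subject of Section 3.2.10 and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: drop every edge whose weight is below a threshold.
class WeakLinkSketch {
    static final class Edge {
        final String from;
        final String to;
        final double weight;

        Edge(String from, String to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    // Returns a new list containing only the edges with weight >= threshold.
    static List<Edge> removeWeakLinks(List<Edge> edges, double threshold) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (e.weight >= threshold) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

After the weak links are removed, the connected components of the remaining network can be read off as the clusters.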

2.4.6.2 Removal of Broker Nodes

A broker node is a node that has a maximum betweenness centrality value. In this method, we remove the broker nodes, that is, the nodes whose betweenness centrality is above the threshold value, together with their edges. The threshold value should lie between the minimum and the maximum betweenness centrality value of the network.

(a) A network before clustering

Figure 2.5: Network clustering using the broker node removal method. These figures were obtained from our tool.


2.5 Modularity Division Algorithm

A given group of vertices in a network is considered a community if the number of edges within the group is significantly larger than expected by chance. Thus, we define the modularity of a division of a network into groups as the number of edges falling within groups minus the expected number in a random graph with the same node degrees [13, 6, 12].

Let G be a random graph with n vertices (i.e., the edges of G are placed at random). Let $k_i$ be the degree of vertex i in G. Then the expected number of edges between vertices i and j is approximately $k_i k_j / M$, where $M = \sum_i k_i$ [13, 6, 12].

Let A be the adjacency matrix of G, i.e., $A_{ij} = 1$ if vertices i and j are connected, and $A_{ij} = 0$ otherwise.

We assume that $A_{ij} \in \{0, 1\}$ (i.e., no parallel edges) and that $A_{ii} = 0$ (i.e., no loops). The mathematical formulation of the modularity Q is given by the sum of $(A_{ij} - k_i k_j / M)$ over all pairs of vertices i, j that fall in the same group.

Let us initially focus on the question of whether any good division of the network into just two communities exists. For a particular division of G into two groups, let $s_i = 1$ if vertex i belongs to group 1 and $s_i = -1$ if it belongs to group 2. Observe that

$$\frac{1}{2}(s_i s_j + 1) = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same group,} \\ 0 & \text{otherwise.} \end{cases}$$

Thus we can express the modularity as

$$Q = \frac{1}{2} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{M} \right) (s_i s_j + 1)$$

Note that $\sum_{i,j} A_{ij} = \sum_i k_i = M$, so the terms that do not involve $s_i s_j$ cancel: $\sum_{i,j} (A_{ij} - k_i k_j / M) = M - M^2 / M = 0$. Hence

$$Q = \frac{1}{2} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{M} \right) s_i s_j$$

Let s be a column vector whose elements are $s_i$. Let B be the matrix with elements

$$B_{ij} = A_{ij} - \frac{k_i k_j}{M}$$

Then

$$Q = \frac{1}{2} s^T B s$$

Here B is called the modularity matrix. B is a real symmetric matrix; thus B is diagonalizable with n real eigenvalues. Observe that the elements of each row (and column) of B sum to zero. Thus B always has the eigenvector (1, 1, 1, . . . ) with eigenvalue zero. Note that the vector (1, 1, 1, . . . ) corresponds to the trivial division of G with all vertices in a single group.

Let $\beta_1, \beta_2, \ldots, \beta_n$ be the eigenvalues of B, where $\beta_1 \geq \beta_2 \geq \ldots \geq \beta_n$. Let $u_1, u_2, \ldots, u_n$ be the corresponding normalized eigenvectors of B. Thus $u_i^T u_j$ equals 1 if i = j, and 0 otherwise.

We proceed by writing s as a linear combination of the normalized eigenvectors $u_i$ of B, so that $s = \sum_{i=1}^{n} a_i u_i$ with $a_i = u_i^T s$. Then we find

$$Q = \frac{1}{2} \sum_i a_i u_i^T B \sum_j a_j u_j = \frac{1}{2} \sum_{i=1}^{n} (u_i^T s)^2 \beta_i$$

Since $\beta_1$ is the largest eigenvalue, Q is made large by choosing s as close to parallel to $u_1$ as the constraint $s_i = \pm 1$ allows; a common choice in this spectral approach is $s_i = 1$ if the i-th element of $u_1$ is positive and $s_i = -1$ otherwise.

This method divides a network into two communities. If we apply it recursively, we can divide the network into more than two communities.
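The modularity of a given two-way division can be evaluated directly from the definitions in this section. The Java sketch below is only an illustration: the class name `ModularitySketch` is hypothetical, the graph is passed as a symmetric 0/1 adjacency matrix, and the division as a vector of +1 and -1 entries. It evaluates the sum of $(A_{ij} - k_i k_j / M) s_i s_j$ over all i, j and halves it; it does not search for the best division.

```java
// Illustrative sketch: modularity Q = 1/2 * sum_ij (A_ij - k_i k_j / M) s_i s_j
// for a given division s of an undirected graph with adjacency matrix a.
class ModularitySketch {
    static double modularity(int[][] a, int[] s) {
        int n = a.length;
        int[] k = new int[n]; // k[i] is the degree of vertex i
        int m = 0;            // M, the sum of all degrees
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i] += a[i][j];
                m += a[i][j];
            }
        }
        if (m == 0) {
            return 0.0; // a graph without edges has no community structure to score
        }
        double q = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double b = a[i][j] - (double) k[i] * k[j] / m; // element B_ij
                q += b * s[i] * s[j];
            }
        }
        return q / 2.0;
    }
}
```

For two triangles joined by a single edge, putting each triangle in its own group gives Q = 5 under this definition, whereas the trivial division with all vertices in one group gives Q = 0.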

2.6 Network Analysis Software

There exist many network analysis tools for visualizing and analyzing network data, such as Pajek, NetworkX [9], NetMiner, R, and Network Workbench. Network analysis software allows researchers to investigate large networks such as terrorist or criminal networks, disease transmission networks, the Internet, etc.

Pajek is a widely used tool for drawing networks. Pajek also has significant analytical capabilities, and can be used to calculate most centrality measures, identify structural holes, block-model, and so on. Macros can be recorded to perform repetitive tasks. Data can be sent directly to R to calculate additional statistics.

NetworkX (NX) is a rich integrated tool set for graph creation, manipulation, analysis, and visualization. The user interface is through scripting or the command line provided by Python. NX includes a large set of key algorithms, metrics and graph generators. Visualization is provided through pylab and Graphviz. NX is an open-source project.

2.7 Visualization of a Network

The human eye is trained for pattern recognition. Therefore, network visualizations help to trace and present patterns of social network relations. Visualization can be defined as any technique used to create images, diagrams, or animations in order to communicate a message. The visualization can be performed by using graphs made up of nodes and connection lines.


Figure 2.6: Network Visualization using NetworkX

Figure 2.7: Visualization of a network using our tool

Chapter 3

Tool Development

3.1 Outline

We developed our tool as an application for the visualization, analysis and manipulation of networks. It has been developed in the Java programming language and supports multiple platforms. It provides a highly customizable graphical representation of networks based on local properties. Nodes can be aggregated and arranged in the space manually. Network data can be entered via text files. In our tool, we have included the most popular network metric calculations, such as density, degree, degree centrality, betweenness centrality, neighbor detection, shortest path and shortest path length, and cluster analysis.

Our tool allows us to search for a single individual node or its neighbors, and also enables us to find the most central node of the network. Clusters are automatically arranged into groups, and each group is shown in a different color. Another feature of this tool is the visualization of the shortest path. Once our calculations are over and the results are obtained, we can save the visualization as an image in different formats such as jpg, png and gif. The data resulting from the analysis can also be plotted as a graph.

3.2 Implementation of Algorithms

In our tool we have implemented several algorithms that are used for the calculation of network metrics: the density of a network, the degree of a node, the neighbors of a node, the shortest path and shortest path length, the connected components, the centrality of a node, the communities in a network, etc.


3.2.1 Algorithm for Computing the Density of a Network

This algorithm computes the density of a network. Its input is the number of nodes and the number of edges in the network, and its output is the density of the network. The running time of the algorithm is constant, O(1).

Algorithm 1 Find the density of a network
Input: Number of nodes n and number of edges e of a network
Output: The density of the network
    npe is the number of possible edges
    npe ← n ∗ (n − 1)/2
    density ← e/npe
    return density
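The density computation can be sketched in Java as follows. This is a minimal illustration rather than the tool's own code: the class name DensitySketch, the method name, and the guard for networks with fewer than two nodes are assumptions added here.

```java
// Minimal sketch of the density computation for an undirected simple graph.
// Class and method names are illustrative assumptions, not taken from the tool.
final class DensitySketch {

    /** Returns e / (n(n-1)/2), or 0 when fewer than two nodes exist. */
    static double density(int nodes, int edges) {
        if (nodes < 2) {
            return 0.0; // no edge is possible, so avoid dividing by zero
        }
        // Floating-point arithmetic keeps the ratio from being truncated
        // and avoids int overflow for large node counts.
        double possibleEdges = (double) nodes * (nodes - 1) / 2.0;
        return edges / possibleEdges;
    }

    public static void main(String[] args) {
        // Node and edge counts reported in Figure 4.1.
        System.out.println(density(297, 2361));
    }
}
```

With the 297 nodes and 2361 edges reported in Figure 4.1, the ratio is 2361/43956, which agrees with the density of 0.0537 given there.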

3.2.2 Algorithm for Finding the Degree of Node

Algorithm 2 finds the degree of a given node. Its input is a node and a list of edges, and its output is the degree of that node. Algorithm 3 finds the degree of all nodes. Its input is a list of edges, and its output is a table mapping each node to its degree.

In these algorithms, we use the ArrayList and Hashtable data structures. The ArrayList contains the edges of the network. Each edge is an object whose properties are the from node, the to node, and the weight of the edge. The Hashtable stores the output as a table: the key is the name of a node and the value is the degree of that node.

List<Edge> edge = new ArrayList<Edge>();

Map<String, Integer> degree = new Hashtable<String, Integer>();

The running time of both algorithms is O(m), where m is the number of edges in the network. The memory requirement of Algorithm 2 is constant, O(1), and that of Algorithm 3 is O(m).

Algorithm 2 Find the degree of a node
Input: n is a node, edgelist is a List
Output: Degree of the given node
    degree ← 0
    for all edge in edgelist do
        if edge.from equals n then
            degree ← degree + 1
        else if edge.to equals n then
            degree ← degree + 1
        end if
    end for
    return degree


Algorithm 3 Find the degree of all nodes
Input: edgelist is a List
Output: A Hashtable that contains the nodes and their degrees
    degree is a Hashtable
    for all edge in edgelist do
        if edge.from is already in degree then
            degree[edge.from] ← degree[edge.from] + 1 {degree[edge.from] is an integer count}
        else
            degree.put(edge.from, 1)
        end if
        if edge.to is already in degree then
            degree[edge.to] ← degree[edge.to] + 1 {degree[edge.to] is an integer count}
        else
            degree.put(edge.to, 1)
        end if
    end for
    return degree
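The two degree algorithms can be sketched in Java as shown here. The Edge class is a hypothetical stand-in for the tool's edge object (only the from, to, and weight properties are taken from the text), and the class and method names are assumptions made for illustration.

```java
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

// Sketch of the degree computations over an edge list.
// DegreeSketch and Edge are hypothetical names, not the tool's classes.
final class DegreeSketch {

    /** Hypothetical edge type with the three properties described in the text. */
    static final class Edge {
        final String from;
        final String to;
        final double weight;

        Edge(String from, String to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    /** Degree of one node: count the edges that touch it, O(m) time, O(1) space. */
    static int degreeOf(String node, List<Edge> edges) {
        int degree = 0;
        for (Edge e : edges) {
            if (e.from.equals(node) || e.to.equals(node)) {
                degree++;
            }
        }
        return degree;
    }

    /** Degree of every node: one pass over the edges, counting both endpoints. */
    static Map<String, Integer> degrees(List<Edge> edges) {
        Map<String, Integer> degree = new Hashtable<String, Integer>();
        for (Edge e : edges) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the stored count.
            degree.merge(e.from, 1, Integer::sum);
            degree.merge(e.to, 1, Integer::sum);
        }
        return degree;
    }
}
```

Map.merge replaces the explicit "already in the table" test of the pseudocode; the behavior is the same for networks without self-loops.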

3.2.3 Algorithm for Finding the Neighbors of Node

Algorithm 4 finds the neighbors of a given node. Its input is a node and a list of edges, and its output is the set of neighbors of that node. Algorithm 5 finds the neighbors of all nodes. Its input is a list of edges, and its output is a table mapping each node to its neighbors.

In these algorithms, we use the ArrayList, Hashtable, and Set data structures. The key of the Hashtable is the name of a node and the value is a Set that contains the neighbors of that node.

Map<String, Set<String>> neighbor = new Hashtable<String, Set<String>>();

Set<String> n = new HashSet<String>();

The running time of both algorithms is O(m), where m is the number of edges in the network. The memory requirement of Algorithm 4 is constant, O(1), and that of Algorithm 5 is O(m).
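As a sketch of how the neighbor table might be built, the following Java code makes one pass over the edges and records each endpoint as a neighbor of the other. For brevity each edge is represented as a two-element String array {from, to}; this representation and the class name NeighborSketch are assumptions, not the tool's edge class.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of building the node -> neighbors table from an edge list.
// Each edge is assumed to be a String[] of the form {from, to}.
final class NeighborSketch {

    static Map<String, Set<String>> neighborsOfAll(List<String[]> edges) {
        Map<String, Set<String>> neighbor = new Hashtable<String, Set<String>>();
        for (String[] e : edges) {
            // Create the Set on first sight of a node, then add the other endpoint.
            neighbor.computeIfAbsent(e[0], k -> new HashSet<String>()).add(e[1]);
            neighbor.computeIfAbsent(e[1], k -> new HashSet<String>()).add(e[0]);
        }
        return neighbor;
    }
}
```

The resulting table is the structure that the later algorithms take as their adjacency input.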

3.2.4 Algorithm for Finding the Shortest Path Length

Algorithm 6 finds the shortest path length between a given node and all other nodes in the same component. The input of Algorithm 6 is a source node and an adjacency matrix. We implement the adjacency matrix as a Hashtable, which is identical to the output of Algorithm 5.

In this algorithm, we use the Hashtable and Set data structures. The key of the Hashtable is the name of a node and the value is the shortest path length from the source to that node.


Algorithm 4 Find the neighbors of a node
Input: n is a node, edgelist is a List
Output: Neighbors of the given node
    S is a Set
    for all edge in edgelist do
        if edge.from equals n then
            add edge.to into the Set S
        else if edge.to equals n then
            add edge.from into the Set S
        end if
    end for
    return S

Algorithm 5 Find the neighbors of all nodes
Input: edgelist is a List
Output: A Hashtable that contains the nodes and their neighbors
    neighbor is a Hashtable (the key is a node and the value is a Set that contains the neighbors of the node)
    for all edge in edgelist do
        if edge.from is already in neighbor then
            neighbor[edge.from].add(edge.to) {neighbor[edge.from] is a Set}
        else
            neighbor.put(edge.from, {edge.to}) {a new Set containing edge.to}
        end if
        if edge.to is already in neighbor then
            neighbor[edge.to].add(edge.from) {neighbor[edge.to] is a Set}
        else
            neighbor.put(edge.to, {edge.from}) {a new Set containing edge.from}
        end if
    end for
    return neighbor

Map<String, Integer> ht = new Hashtable<String, Integer>();

Set<String> nextlevel = new HashSet<String>();

Set<String> thislevel = new HashSet<String>();

The running time of Algorithm 6 is O(m + n), where m is the number of edges and n is the number of nodes in the network.
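The level-by-level search behind the shortest path length can be sketched in Java as follows. The class name PathLengthSketch is an assumption, and the adjacency input is assumed to be a node-to-neighbor-set table such as the one produced by Algorithm 5.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Sketch of the breadth-first, level-by-level shortest path length search.
// Names are illustrative; adj maps each node to the Set of its neighbors.
final class PathLengthSketch {

    static Map<String, Integer> shortestPathLength(Map<String, Set<String>> adj,
                                                   String source) {
        Map<String, Integer> splength = new Hashtable<String, Integer>();
        Set<String> nextLevel = new HashSet<String>();
        nextLevel.add(source);
        int level = 0;
        while (!nextLevel.isEmpty()) {
            Set<String> thisLevel = nextLevel;
            nextLevel = new HashSet<String>();
            for (String item : thisLevel) {
                // Only the first visit to a node fixes its distance.
                if (!splength.containsKey(item)) {
                    splength.put(item, level);
                    Set<String> nbrs = adj.get(item);
                    if (nbrs != null) {
                        nextLevel.addAll(nbrs);
                    }
                }
            }
            level++;
        }
        return splength;
    }
}
```

Nodes outside the source's component never enter nextLevel, so they are simply absent from the returned table.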

3.2.5 Algorithm for Finding the Connected Components

A network may contain isolated parts (connected components). Algorithm 7 identifies these parts. The input of Algorithm 7 is an adjacency matrix, which we implement as a Hashtable identical to the output of Algorithm 5.

Algorithm 6 Shortest path length
Input: Adjacency matrix (A) of a network and a source node. A is a Hashtable.
Output: The shortest path length from the source node to every other node
    level ← 0
    splength is a Hashtable
    nextlevel and thislevel are two Sets
    add the source into nextlevel
    while size of nextlevel > 0 do
        add all elements in nextlevel into thislevel
        clear nextlevel
        for all item in thislevel do
            if splength does not contain key item then
                put item as key and level as value into splength
                S is a Set
                S ← A[item]
                add all elements in S into nextlevel
            end if
        end for
        level ← level + 1
        clear thislevel
    end while
    return splength

In Algorithm 7, we use the Hashtable and Set data structures. The key of the Hashtable is the index of a component and the value is a Set that contains the nodes belonging to that component.

Map<Integer, Set<String>> components = new Hashtable<Integer, Set<String>>();

The running time of Algorithm 7 is O(m + n), where n is the number of nodes and m is the number of edges in the network.
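A possible Java sketch of the component search follows. Instead of calling the shortest path length routine, it runs its own breadth-first traversal from every node that has not been reached yet; this substitution, the class name ComponentSketch, and the use of ArrayDeque are choices made for this illustration. The adjacency input is assumed to list every node as a key.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Sketch of connected-component detection over a node -> neighbors table.
final class ComponentSketch {

    static Map<Integer, Set<String>> components(Map<String, Set<String>> adj) {
        Map<Integer, Set<String>> components = new Hashtable<Integer, Set<String>>();
        Set<String> seen = new HashSet<String>();
        int count = 1;
        for (String start : adj.keySet()) {
            if (seen.contains(start)) {
                continue; // already placed in an earlier component
            }
            Set<String> members = new HashSet<String>();
            Deque<String> queue = new ArrayDeque<String>();
            queue.add(start);
            seen.add(start);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                members.add(v);
                for (String w : adj.getOrDefault(v, Collections.<String>emptySet())) {
                    if (seen.add(w)) { // add() returns true only for new nodes
                        queue.add(w);
                    }
                }
            }
            components.put(count++, members);
        }
        return components;
    }
}
```

Every node and edge is visited once across all traversals, which matches the O(m + n) bound stated for Algorithm 7.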

3.2.6 Algorithm for Finding the Shortest Path

Algorithm 8 finds the shortest path from a source node to the other nodes in the same component. The input of Algorithm 8 is a source node, a destination node, and the adjacency matrix as a Hashtable, which is identical to the output of Algorithm 5. The output of Algorithm 8 is the shortest path from the source node to the other nodes.

In Algorithm 8, we use the List, PriorityQueue, Hashtable, and Set data structures. The List and PriorityQueue are used for temporary storage. The key of the Hashtable is a node and the value is an ArrayList containing the nodes on the shortest path from the source node to the key node.


Algorithm 7 Connected Components
Input: Adjacency matrix (A) of a network. A is a Hashtable whose keys are Strings and whose values are Sets of Strings.
Output: A Hashtable that contains the connected components
    count ← 1
    components and c are Hashtables
    temp and d are Sets
    for all key in A.keySet() do
        if key is not in temp then
            c ← shortestpathlength(A, key)
            d ← c.keySet()
            components.put(count, d)
            add all elements in d into temp
            count ← count + 1
        end if
    end for
    clear temp
    return components

Map<String, ArrayList<String>> path = new Hashtable<String, ArrayList<String>>();

ArrayList<String> seen = new ArrayList<String>();

PriorityQueue<String> queue = new PriorityQueue<String>();

The running time of Algorithm 8 is O(m + n), where m is the number of edges and n is the number of nodes in the network.
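For an unweighted network, the shortest path search can be sketched in Java as a breadth-first traversal that remembers each node's predecessor. Note that this sketch swaps in a FIFO ArrayDeque where the text lists a PriorityQueue, because a PriorityQueue<String> polls elements in natural string order rather than in insertion order; it also returns only the path to the destination. The class name, the predecessor map, and these choices are assumptions of the illustration, not the tool's implementation.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of an unweighted shortest path search (breadth-first with predecessors).
final class ShortestPathSketch {

    /** Returns the nodes on one shortest path, or an empty list if unreachable. */
    static List<String> shortestPath(Map<String, Set<String>> adj,
                                     String source, String destination) {
        Map<String, String> previous = new HashMap<String, String>();
        Set<String> seen = new HashSet<String>();
        Deque<String> queue = new ArrayDeque<String>();
        seen.add(source);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(destination)) {
                break; // destination reached, no need to explore further
            }
            for (String w : adj.getOrDefault(v, Collections.<String>emptySet())) {
                if (seen.add(w)) {
                    previous.put(w, v);
                    queue.add(w);
                }
            }
        }
        if (!seen.contains(destination)) {
            return Collections.emptyList();
        }
        // Walk the predecessor links back from the destination to the source.
        LinkedList<String> path = new LinkedList<String>();
        for (String at = destination; at != null; at = previous.get(at)) {
            path.addFirst(at);
        }
        return path;
    }
}
```

Storing one predecessor per node keeps the memory use at O(n); the full path is rebuilt only for the node that is asked for.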

3.2.7 Algorithm for Finding the Degree Centrality

Algorithm 9 finds the degree centrality of all nodes in a network. Its input is the number of nodes and the adjacency matrix as a Hashtable, which is identical to the output of Algorithm 5. Its output is the degree centrality of all nodes.

In this algorithm, we use the Hashtable data structure. The key of the Hashtable is the name of a node and the value is the degree centrality of that node.

The running time of Algorithm 9 is O(n), where n is the number of nodes in the network.
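A minimal Java sketch of the degree centrality computation is given here. It divides each node's neighbor count by n, as Algorithm 9 does; the class and method names are assumptions.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Sketch of degree centrality: neighbor count divided by the number of nodes n.
final class DegreeCentralitySketch {

    static Map<String, Double> degreeCentrality(Map<String, Set<String>> adj, int n) {
        Map<String, Double> dc = new Hashtable<String, Double>();
        for (Map.Entry<String, Set<String>> entry : adj.entrySet()) {
            // Cast to double so the division is not truncated to an integer.
            dc.put(entry.getKey(), entry.getValue().size() / (double) n);
        }
        return dc;
    }
}
```

When comparing values against NetworkX, note that its degree_centrality function normalizes by n − 1 rather than n, so the two sets of numbers differ by a constant factor.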

3.2.8 Algorithm for Finding the Betweenness Centrality

This algorithm finds the betweenness centrality of all nodes in a network. Its input is the adjacency matrix as a Hashtable, which is identical to the output of Algorithm 5. Its output is the betweenness centrality of all nodes [5].


Algorithm 8 Single-source Dijkstra shortest path algorithm
Input: Adjacency matrix (A) of a network, a source node, and a destination node. A is a Hashtable whose keys are Strings and whose values are Sets of Strings.
Output: The shortest paths from the source node to the other nodes
    path is a Hashtable
    seen is a List
    queue is a PriorityQueue
    set is a List
    v is a String variable
    add the source into set
    path.put(source, set)
    add the source into seen
    add the source into queue
    while the size of queue > 0 do
        v ← the front element of queue {queue.poll()}
        if v equals destination then
            break the loop
        end if
        nset is a Set
        nset ← neighbors of v {A.get(v)}
        for all value in nset do
            if seen does not contain value then
                add value into seen
                add value into queue
                path.put(value, copy of path.get(v) with value appended)
            end if
        end for
    end while
    return path

Algorithm 9 Degree centrality
Input: Adjacency matrix (A) of a network and n. A is a Hashtable whose keys are Strings and whose values are Sets of Strings. n is the number of nodes.
Output: Degree centrality of all nodes
    d is a double variable
    dc is a Hashtable
    for all element in A.keys() do
        d ← size of the Set A.get(element)
        d ← d/n
        dc.put(element, d)
    end for
    return dc

In this algorithm, we use the Hashtable data structure. The key of the Hashtable is the name of a node and the value is the betweenness centrality of that node.

Map<String, Double> betweenness = new Hashtable<String, Double>();


Map<String, Double> sigma = new Hashtable<String, Double>();

Map<String, Double> delta = new Hashtable<String, Double>();

PriorityQueue<String> Q = new PriorityQueue<String>();

Stack<String> S = new Stack<String>();

The running time of this algorithm is O(mn), where m is the number of edges and n is the number of nodes in the network.
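The accumulation scheme of Brandes [5] for unweighted networks can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the tool's code: the class name is hypothetical, a FIFO ArrayDeque is used for the breadth-first phase, and the adjacency table is assumed to be symmetric with every node present as a key.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Brandes-style betweenness centrality for an unweighted graph.
final class BetweennessSketch {

    static Map<String, Double> betweenness(Map<String, Set<String>> adj) {
        Map<String, Double> cb = new Hashtable<String, Double>();
        for (String v : adj.keySet()) {
            cb.put(v, 0.0);
        }
        for (String s : adj.keySet()) {
            // Per-source state: predecessors, path counts, distances, dependencies.
            Deque<String> stack = new ArrayDeque<String>();
            Map<String, List<String>> pred = new HashMap<String, List<String>>();
            Map<String, Double> sigma = new HashMap<String, Double>();
            Map<String, Double> delta = new HashMap<String, Double>();
            Map<String, Integer> dist = new HashMap<String, Integer>();
            for (String v : adj.keySet()) {
                pred.put(v, new ArrayList<String>());
                sigma.put(v, 0.0);
                delta.put(v, 0.0);
            }
            sigma.put(s, 1.0);
            dist.put(s, 0);
            Deque<String> queue = new ArrayDeque<String>();
            queue.add(s);
            while (!queue.isEmpty()) { // breadth-first phase
                String v = queue.poll();
                stack.push(v);
                for (String w : adj.get(v)) {
                    if (!dist.containsKey(w)) { // w found for the first time
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) { // shortest path via v
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        pred.get(w).add(v);
                    }
                }
            }
            while (!stack.isEmpty()) { // accumulation in order of decreasing distance
                String w = stack.pop();
                double coeff = (1.0 + delta.get(w)) / sigma.get(w);
                for (String v : pred.get(w)) {
                    delta.put(v, delta.get(v) + sigma.get(v) * coeff);
                }
                if (!w.equals(s)) {
                    cb.put(w, cb.get(w) + delta.get(w));
                }
            }
        }
        return cb;
    }
}
```

Because every unordered pair of nodes is counted once from each endpoint, the scores for an undirected network are twice the conventional value; halving them gives the usual undirected betweenness.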

3.2.9 Algorithm for Finding the Mutual Neighbors of Two Nodes

This algorithm finds the mutual neighbors of two given nodes. Its input is two nodes and the adjacency matrix as a Hashtable, which is identical to the output of Algorithm 5. Its output is a Set that contains the mutual neighbors of the given nodes.

In this algorithm, we use the Hashtable and Set data structures. The running time of this algorithm is O(n), where n is the smaller of the neighbor counts of node1 and node2.
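One way to reach a running time bounded by the smaller neighbor count is to iterate over the smaller neighbor set and probe the larger one, as in the following Java sketch. The class and method names are assumptions, and hash-based Sets with constant-time lookups are assumed.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of mutual-neighbor detection: intersect the two neighbor sets.
final class MutualNeighborSketch {

    static Set<String> mutualNeighbors(Map<String, Set<String>> adj,
                                       String node1, String node2) {
        Set<String> f1 = adj.getOrDefault(node1, Collections.<String>emptySet());
        Set<String> f2 = adj.getOrDefault(node2, Collections.<String>emptySet());
        // Iterate over the smaller set so the loop runs min(|f1|, |f2|) times.
        Set<String> smaller = f1.size() <= f2.size() ? f1 : f2;
        Set<String> larger = (smaller == f1) ? f2 : f1;
        Set<String> mutual = new HashSet<String>();
        for (String f : smaller) {
            if (larger.contains(f)) {
                mutual.add(f);
            }
        }
        return mutual;
    }
}
```

The input sets are left unmodified, which matters when the adjacency table is shared with other computations.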

3.2.10 Algorithm for Community Detection (Weak Link Removing Algorithm)

Algorithm 12 is used to identify the communities in a network. Its input is an edge list and a threshold value. Its output is an edge list that contains only the edges whose weight is not less than the threshold value.

In Algorithm 12, we use the List data structure. The list contains the edges of the network. Each edge has several properties: the from node, the to node, and the weight of the edge. The running time is O(m), where m is the number of edges in the network.
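The weak link removal step can be sketched in Java as shown here. The Edge class is a hypothetical stand-in for the tool's edge object, and the class and method names are assumptions. The sketch removes edges through an explicit Iterator, because calling remove on an ArrayList from inside a for-each loop over the same list typically throws a ConcurrentModificationException.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of weak link removal: drop every edge whose weight is below a threshold.
final class WeakLinkSketch {

    /** Hypothetical edge type with the properties described in the text. */
    static final class Edge {
        final String from;
        final String to;
        final double weight;

        Edge(String from, String to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    /** Removes weak edges in place; the list must support removal. */
    static void removeWeakLinks(List<Edge> edges, double threshold) {
        Iterator<Edge> it = edges.iterator();
        while (it.hasNext()) {
            if (it.next().weight < threshold) {
                it.remove(); // safe removal during iteration
            }
        }
    }
}
```

Edges whose weight equals the threshold are kept, in line with the strict comparison in Algorithm 12.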


Algorithm 10 Betweenness Centrality
Input: Adjacency matrix A of a network. A is a Hashtable.
Output: Betweenness centrality of all nodes
    betweenness, P, sigma, delta and D are Hashtables
    Q is a PriorityQueue
    S is a Stack
    for all n in NodeList do
        betweenness.put(n, 0.0)
    end for
    for all node in NodeList do
        S ← empty Stack
        D ← empty Hashtable
        for all n in NodeList do
            P.put(n, empty List)
            sigma.put(n, 0.0)
            delta.put(n, 0.0)
        end for
        sigma.put(node, 1.0)
        D.put(node, 0.0)
        Q.add(node)
        while the size of Q > 0 do
            v ← front element of Q
            S.add(v)
            dv ← D.get(v)
            sigmav ← sigma.get(v)
            for all neighbors w of v do
                if D does not contain key w then
                    Q.add(w)
                    D.put(w, dv + 1)
                end if
                if D.get(w) equals dv + 1 then
                    s1 ← sigma.get(w)
                    sigma.put(w, s1 + sigmav)
                    append v to P[w]
                end if
            end for
        end while
        while the size of the Stack S > 0 do
            w ← S.pop()
            coeff ← (1.0 + delta.get(w))/sigma.get(w)
            for all v in P.get(w) do
                d1 ← delta.get(v)
                d1 ← d1 + coeff ∗ sigma.get(v)
                delta.put(v, d1)
            end for
            if w not equal to node then
                aa ← betweenness.get(w)
                aa ← aa + delta.get(w)
                betweenness.put(w, aa)
            end if
        end while
    end for
    return betweenness


Algorithm 11 Find the Mutual Neighbors of Two Nodes
Input: Two nodes (node1 and node2) in the network and the adjacency matrix (A) of the network. A is a Hashtable.
Output: Set of mutual neighbors
    f1, f2 and mf are Sets
    f1 ← A.get(node1)
    f2 ← A.get(node2)
    for all f in f1 do
        if f2 contains f then
            mf.add(f)
        end if
    end for
    return mf

Algorithm 12 Weak Link Removing Algorithm
Input: E is an edge List and a threshold value
Output: E containing only the edges whose weight is not less than the threshold
    for all edge in E do
        if edge.weight < threshold then
            E.remove(edge)
        end if
    end for
    return E

Chapter 4

Results and Performance Comparison

4.1 Results

In this section we describe the functionality of our tool in detail, starting from the top level.

4.1.1 The Overview of a Network

We can view the number of nodes and edges in the network, the density of the network, the number of connected components, the most popular nodes, and the maximum and average degree. When network data are represented visually, the same data can also be viewed in numerical form in the side panel.

4.1.2 Degree of a Node

We can obtain the degree of all nodes in a network, as well as the maximum degree and average degree of the network, using our tool. The side panel shows the degrees of all nodes as a tree structure (see Figure 4.2). If we select a node from the tree, the selected node and its neighbors are highlighted in a different color.

4.1.3 Connected Components of a Network

We can easily identify the isolated groups (connected components) in the network using our tool and count them. The connected components are automatically grouped, and each group is shown in a different color.

Figure 4.1: Overview of the network. Number of nodes = 297, Number of Edges = 2361, Density = 0.0537, Number of Connected Components = 1, Most popular nodes = 305, Maximum Degree = 134, Average Degree = 14.478114.

Figure 4.2: Degree of all nodes and their neighbors.

4.1.4 Centrality in a Network

We can compute different kinds of centrality using our tool, such as degree centrality and betweenness centrality. Both are displayed as a table that lists each node with its centrality value. We can save the result of the centrality computations as a text file.


Figure 4.3: Connected components in a network. In this network, 13 connected components were found.

4.1.5 Shortest Paths and Shortest Path Lengths

We can find the shortest path from a source node to a destination node using our tool. Given a source node and a destination node, the tool displays the shortest paths from the source node to all other nodes, and the numerical result can be saved as a text file. The tool can also visualize the shortest path from the source node to the destination node.

4.1.6 Community in a Network

We can identify the communities in a network using our tool. The tool provides two algorithms for community detection: the weak link removal algorithm and the removal of broker nodes. We can visualize the communities and view the details of the nodes in each community. We can save these details as a text file.

4.2 Performance Comparison

We compare the performance of our tool with NetworkX. NetworkX (NX) is a rich, integrated tool set for graph creation, manipulation, analysis, and visualization. Its user interface is scripting or the command line, provided through Python.


(a) Degree Centrality (b) Betweenness Centrality

Figure 4.4: Centrality Measure using this tool

4.2.1 Visualization Speed

We first compare the visualization speed of our tool and NetworkX.

Figure 4.7 shows the comparison of visualization speed for data sets with different numbers of edges. Our tool performs well as the number of edges increases.

4.2.2 Centrality

Next, we compare the time taken to compute node centrality. We compare the degree centrality and betweenness centrality computations of our tool and NetworkX.


Figure 4.5: Visualization of the shortest path from node 87 to node 5, and of the shortest paths from node 87 to all other nodes in the same connected component.

Figure 4.6: Result of community detection using the weak link removal algorithm. 18 communities were detected with a threshold value of 14.

Figure 4.8 shows the comparison for degree centrality: NetworkX takes more time than our tool. Figure 4.9 shows the comparison for betweenness centrality: both tools take approximately equal time.


Figure 4.7: Visualization speed comparison between NetworkX and This Tool

Figure 4.8: Comparing degree centrality performance using our tool and NetworkX


Figure 4.9: Comparing betweenness centrality performance using our tool and NetworkX

4.2.3 Discussion

In social network analysis, handling large data sets is a problem, and many tools do not support them. Our tool can support large data sets because we use the Java Collections Framework, which is scalable, to store the data.

In the visualization speed comparison, our tool performs better than NetworkX as the number of edges increases. This indicates that our tool can handle large data sets.

The degree centrality algorithm of our tool performs better than NetworkX's degree centrality algorithm. Our betweenness centrality algorithm performs comparably to NetworkX's, with both tools taking approximately equal time (Figure 4.9).

Chapter 5

Conclusions and Future Work

In this project, we have developed a lightweight tool for social network analysis. The tool has been implemented using the Java programming language and the NetBeans 6.9 IDE.

The tool implements many algorithms for computing the metrics of social network analysis, such as the degree of a node, the neighbors of a node, degree centrality, and community detection based on weak link removal.

We have carried out a comparative study of the performance of our tool against another well-known tool, NetworkX.

Our results show that our tool performs well on large data sets and that our degree centrality algorithm runs faster than NetworkX's. The visualization speed of our tool is also much higher than that of NetworkX.

Our future work includes developing further centrality algorithms, such as closeness centrality, stress centrality, and eigenvector centrality. We also plan to develop further algorithms for detecting community structure, such as spectral algorithms and dynamic algorithms. Finally, we plan to enhance the visualization component of our tool by displaying node properties such as the image and location of a node.


Bibliography

[1] A. Abraham, A. E. Hassanien, and V. Snasel. Computational Social Network Analysis. 2010.

[2] A.D. Andrea, F. Ferri, and P. Grifoni. An overview of methods for virtual social networks analysis. 2010.

[3] S. Balasubramanian and V. Mahajan. The economic leverage of the virtual community. Technical report, 2001.

[4] V. Batagelj. Social network analysis, large-scale. Technical report, University of Ljubljana, Ljubljana, Slovenia, 2008.

[5] U. Brandes. A faster algorithm for betweenness centrality. 2001.

[6] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. 2004.

[7] Z. Dong, G. Song, K. Xie, and J. Wang. An experimental study of large-scale mobile social network. 2009.

[8] M.G. Everett and S.P. Borgatti. The centrality of groups and classes. Journal of Mathematical Sociology, 1999.

[9] Aric Hagberg, Daniel A. Schult, and Pieter J. Swart. NetworkX. http://networkx.lanl.gov.

[10] J. Hagel and A. Armstrong. Net gain: expanding markets through virtual communities. Technical report, 1997.

[11] J. Han and M. Kamber. Data Mining: Concepts and Techniques, Second Edition. Diane Cerra, 2006.

[12] M. E. J. Newman. Finding community structure in networks using eigenvector of matrices. pages 4–7, 2006.

[13] M. E. J. Newman. Modularity and community structure in network. In Proceedings of the National Academy of Sciences, USA, 2006.


[14] W. Nooy, A. Mrvar, and V. Batagelj. Exploratory Network Analysis with Pajek. Cambridge University Press, New York, 2005.

[15] C. M. Ridings, D. Gefen, and B. Arinze. Some antecedents and effects of trust in virtual communities. Technical report, 2002.

[16] C. Romm, C. Pliskin, and R. Clarke. Virtual communities and society: toward an integrative three phase model. Technical report, 1997.

[17] C. Sommer. Approximate Shortest Path and Distance Queries in Networks. PhD thesis, Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, 2010.