![Page 1: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/1.jpg)
Team No. 24
Integrating Network Discovery and Community Detection
Nikhil Daliya - 201301142
Athresh G - 201505565
![Page 2: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/2.jpg)
Overview
Integrating network discovery and community detection routines for nodes in the given network, and identifying the characteristics of the nodes (constant or rapidly changing) in the network.
![Page 3: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/3.jpg)
Dataset: Railway
The Railway network, proposed by [Ghosh et al. 2011], consists of nodes representing railway stations in India. Two stations s_i and s_j are connected by an edge if there exists at least one train route on which both s_i and s_j are scheduled halts. Here the communities are the states/provinces of India, since the number of trains within each state is much higher than the number of trains between two states.
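The edge rule above can be sketched in a few lines of Python. This is only an illustration of the construction, not the dataset's actual loader: the function name `build_station_graph` and the two sample routes are hypothetical.

```python
from itertools import combinations


def build_station_graph(routes):
    """Return an undirected adjacency map: station -> set of linked stations."""
    adjacency = {}
    for halts in routes:
        for station in halts:
            adjacency.setdefault(station, set())
        # Every pair of scheduled halts on the same route gets an edge.
        for a, b in combinations(set(halts), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical routes, each listed as its scheduled halts.
routes = [
    ["Hyderabad", "Warangal", "Vijayawada"],
    ["Vijayawada", "Chennai"],
]
graph = build_station_graph(routes)
```

In this toy input, Hyderabad and Chennai stay unconnected because no single route halts at both.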
![Page 4: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/4.jpg)
Dataset: Football
The Football network, proposed by [Girvan and Newman 2002a], contains the network of American football games between Division IA colleges during the regular season of Fall 2000. The vertices in the graph represent teams (identified by their college names) and edges represent regular-season games between the two teams they connect.
![Page 5: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/5.jpg)
Dataset: Football
The teams are divided into conferences (indicating communities) containing around 8-12 teams each. Games are more frequent between members of the same conference than between members of different conferences. Teams that are geographically close to one another but belong to different conferences are more likely to play one another than teams separated by large geographic distances.
![Page 6: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/6.jpg)
Application
● Exploring adversarial networks (such as terrorist networks).
● Clustering in social networks.
● Politeness policies on crawling websites make it difficult to mine the whole network on social networking sites. Space and bandwidth limits further constrain the size of the network that can be mined.
![Page 7: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/7.jpg)
Challenges
● Dynamic discovery of the network makes clustering the nodes harder.
● Identifying the characteristic of each node (constant, changing or rapidly changing) is a difficult problem.
● The dataset grows rapidly with network discovery, and keeping track of each node's probability distribution over the different communities is a challenging task.
![Page 8: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/8.jpg)
Tools Used
● Third-party package ( https://sites.google.com/site/santofortunato/inthepress2 ) for generating synthetic graphs as input.
● Languages to be used: Python and Java. Packages such as pandas, NumPy, scikit-learn, NetworkX and igraph will be used as needed.
● matplotlib package for plotting the results for better visualization and understanding.
![Page 9: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/9.jpg)
Implementation
● We have used two main modules: ChooseNode, which chooses the node to be merged into the network in each iteration, and UpdateCommunity, which updates the communities (clusters) based on the chosen node.
● Spectral clustering is applied to the initial set of target nodes.
![Page 10: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/10.jpg)
Implementation
During ChooseNode we use two measures to choose the node for the update.
Ncut measure: minimizes the similarity across a cut while simultaneously maximizing the similarity within the same community.
Modularity: the fraction of edges that fall within the given communities in excess of the fraction expected if edges were placed at random.
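As a rough illustration of the two measures, the sketch below computes both for a fixed partition of an unweighted, undirected graph. It assumes the standard textbook forms (modularity as the sum over communities of e_c/m - (d_c/2m)^2, and Ncut as the sum over communities of cut(c)/vol(c)); how the project scores candidate nodes with them is not specified here. The toy graph and the function names are hypothetical.

```python
def modularity(edges, community_of):
    """Fraction of intra-community edges minus the fraction expected at random."""
    m = len(edges)
    intra, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def ncut(edges, community_of):
    """Sum over communities of (edges leaving the community) / (its total degree)."""
    cut, volume = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu != cv:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1
    return sum(cut.get(c, 0) / vol for c, vol in volume.items())


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, community_of)
cut_value = ncut(edges, community_of)
```

A good partition pushes modularity up and Ncut down, which is why the two can be used together when ranking candidates.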
![Page 11: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/11.jpg)
Implementation
I/P:
● Initial clustering, initial network, cost and budget.
O/P:
● Final network and the clusters formed from the nodes we have discovered.
● List of rapidly changing nodes in the network.
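The inputs and outputs listed above suggest a budgeted loop along the following lines. This is a minimal sketch under stated assumptions, not the project's code: `discover`, `choose_highest_degree` and `majority_label` are hypothetical names, the stand-in chooser picks by known degree rather than by Ncut or modularity, the stand-in updater is one majority-vote sweep over neighbours, and counting label flips is only one possible way to flag changing nodes.

```python
def discover(network, communities, query, choose_node, update_community, cost, budget):
    """Query one node per iteration until the budget cannot cover another query."""
    network = {n: set(nbrs) for n, nbrs in network.items()}
    communities = dict(communities)
    label_changes, expanded = {}, set()
    while budget >= cost:
        node = choose_node(network, communities, expanded)
        if node is None:
            break
        budget -= cost
        expanded.add(node)
        for neighbor in query(node):  # reveal the node's true neighbours
            network.setdefault(node, set()).add(neighbor)
            network.setdefault(neighbor, set()).add(node)
        updated = update_community(network, communities)
        for n, label in updated.items():
            if n in communities and communities[n] != label:
                label_changes[n] = label_changes.get(n, 0) + 1
        communities = updated
    return network, communities, label_changes


def choose_highest_degree(network, communities, expanded):
    """Stand-in for ChooseNode: unexpanded node with the most known edges."""
    candidates = sorted(n for n in network if n not in expanded)
    return max(candidates, key=lambda n: len(network[n]), default=None)


def majority_label(network, communities):
    """Stand-in for UpdateCommunity: adopt the most common neighbour label."""
    updated = dict(communities)
    for node in sorted(network):
        labels = [communities[nb] for nb in network[node] if nb in communities]
        if labels:
            updated[node] = max(sorted(set(labels)), key=labels.count)
    return updated


# Hypothetical hidden graph and partially known starting network.
hidden = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
          "d": {"c", "e"}, "e": {"d"}}
known = {"a": {"b"}, "b": {"a"}, "d": {"e"}, "e": {"d"}}
initial = {"a": 0, "b": 0, "d": 1, "e": 1}
final_network, final_communities, label_changes = discover(
    known, initial, hidden.__getitem__, choose_highest_degree,
    majority_label, cost=1, budget=3)
```

Nodes whose count in `label_changes` exceeds some threshold could then be reported as rapidly changing; the threshold itself would be a further assumption.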
![Page 12: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/12.jpg)
Results and Analysis
- We have used Average Clustering Purity (ACP) and Average Clustering Entropy (ACE) to measure the effectiveness of our algorithm.
- Both measures are based on the fraction of nodes in a particular cluster that belong to the same class.
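One common way to compute these two scores is sketched below. It assumes an unweighted average over clusters and the natural logarithm for entropy; the exact weighting and normalization behind the reported numbers are not stated in the slides, so treat those choices, the function name and the sample labels as assumptions.

```python
from collections import Counter
from math import log


def purity_and_entropy(cluster_of, class_of):
    """Return (ACP, ACE) averaged over clusters, without size weighting."""
    members = {}
    for node, cluster in cluster_of.items():
        members.setdefault(cluster, []).append(class_of[node])
    purities, entropies = [], []
    for classes in members.values():
        counts = Counter(classes)
        fractions = [c / len(classes) for c in counts.values()]
        purities.append(max(fractions))  # share of the dominant class
        entropies.append(-sum(p * log(p) for p in fractions))
    k = len(members)
    return sum(purities) / k, sum(entropies) / k


# Hypothetical labels: cluster 0 holds three "A" nodes and one "B" node,
# cluster 1 holds two "B" nodes.
cluster_of = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1}
class_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
acp, ace = purity_and_entropy(cluster_of, class_of)
```

Higher purity and lower entropy both indicate clusters that line up more closely with the ground-truth classes.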
![Page 13: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/13.jpg)
Results and Analysis
Railway Dataset:
Total no. of target nodes: 80
Average cluster purity: 0.79
Average cluster entropy: 0.17
Rapidly changing nodes: 6, 47, 84, 91
![Page 14: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/14.jpg)
Results and Analysis
Railway Dataset:
![Page 15: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/15.jpg)
Results and Analysis
Railway Dataset:
![Page 16: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/16.jpg)
Results and Analysis
Football Dataset:
Total no. of target nodes: 48
Average cluster purity: 0.91
Average cluster entropy: 0.11
Changing nodes: 51, 63, 49
![Page 17: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/17.jpg)
Results and Analysis
Football Dataset:
![Page 18: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/18.jpg)
References
Research paper: On Integrating Network and Community Discovery
http://hanj.cs.illinois.edu/pdf/wsdm15_jliu.pdf
![Page 19: Integrating Network Discovery and Community Detection (IRE IIITH) Team 24](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecae701a28abf1048b46bf/html5/thumbnails/19.jpg)
Thank You !!!!