
Discovering Important Nodes through Graph Entropy

Jitesh Shetty, Jafar Adibi [KDD ’05]

Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang

Date: 2008/09/18


Outline

• Introduction
• Order In Networks
• Graph Entropy
• Experimental Result
• Conclusions


Introduction

• A new challenge in the area of Link Discovery (LD) and Social Network Analysis
  – To exploit communication-pattern information and text information within knowledge discovery processes
  – Such as discovering hidden organizational structure and selecting interesting, prominent members


Introduction

• Email logs
  – Of prime importance and relevance in the study of information flow within an organization
  – An evidence database for law enforcement and intelligence organizations to detect hidden groups within an organization that are engaged in illegal activities

• Graph entropy
  – Used to determine the most prominent, interesting people


Order In Networks

• A graph model might not be the best representation of some organizations
  – Such as drug dealers, terrorist organizations, and threat groups

• Graph models usually ignore the hierarchy of these organizations
  – They are composed of leaders and followers


Order In Networks

• Example


Graph Entropy (1/6)

• To find prominent people in a network
  – Aggregate the links between them and discover which node has the most effect on the network
  – An entropy model can identify the entity that has the most effect on the graph entropy

• Transform the problem space into a multigraph
  – Each node represents an entity; each link represents an action between entities
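The multigraph transformation and the link aggregation described above can be sketched in Python. This is a minimal illustration, assuming each email is recorded as a hypothetical `(sender, recipient)` pair; the function names `build_multigraph` and `aggregate_links` are illustrative and do not come from the slides.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical record format: one (sender, recipient) pair per email sent.
Email = tuple[str, str]


def build_multigraph(emails: Iterable[Email]) -> dict[str, list[str]]:
    """Multigraph as adjacency lists: one entry per email, so parallel links are kept."""
    graph: dict[str, list[str]] = defaultdict(list)
    for sender, recipient in emails:
        graph[sender].append(recipient)
    return dict(graph)


def aggregate_links(graph: dict[str, list[str]]) -> Counter:
    """Collapse parallel links into one count per directed (sender, recipient) pair."""
    counts: Counter = Counter()
    for sender, recipients in graph.items():
        for recipient in recipients:
            counts[(sender, recipient)] += 1
    return counts
```

For the log A→B, A→B, B→C, C→A, the multigraph keeps both A→B links, and aggregation turns them into a single pair with count 2. Keeping the raw multigraph separate from the aggregated counts preserves every individual event for later dependency analysis.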


Graph Entropy (2/6)


Graph Entropy (3/6)

• Let G = (V, E) be a graph, and let P be a probability distribution on the vertex set V(G)

• P(A email B) =
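The slide's formula for P(A email B) did not survive extraction, so the sketch below does not reconstruct it. Instead it assumes a simple estimate, P(A email B) = (emails from A to B) / (all emails), takes graph entropy to be the Shannon entropy (in bits) of that link distribution, and scores each node by the absolute change in entropy when the node and all its links are removed. This follows the idea that the important node is the one with the most effect on graph entropy, but all three choices and every function name here are assumptions for illustration, not the authors' definitions.

```python
import math
from collections import Counter
from typing import Iterable


def link_probabilities(emails: Iterable[tuple[str, str]]) -> dict[tuple[str, str], float]:
    # ASSUMPTION: P(A email B) = (# emails A -> B) / (total # emails).
    counts = Counter(emails)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def graph_entropy(emails: list[tuple[str, str]]) -> float:
    """Shannon entropy (bits) of the assumed link distribution; 0.0 for an empty log."""
    if not emails:
        return 0.0
    probs = link_probabilities(emails)
    return -sum(p * math.log2(p) for p in probs.values())


def rank_by_entropy_effect(emails: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Score each node by how much removing it (and its links) changes the entropy."""
    base = graph_entropy(emails)
    nodes = {n for pair in emails for n in pair}
    scores = []
    for v in nodes:
        # Drop every email sent or received by v, then recompute the entropy.
        remaining = [(s, r) for s, r in emails if v not in (s, r)]
        scores.append((v, abs(base - graph_entropy(remaining))))
    # Largest entropy change first; ties broken by node name for determinism.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

For example, with the emails A→B, A→C, B→C, C→A (entropy 2 bits), removing A or C leaves a single link (0 bits) while removing B leaves two links (1 bit), so A and C rank above B.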


Graph Entropy (4/6)

• A major concern in the LD domain is that data elements are not independent
  – Ex: the links "A sends email to B" and "B sends email to C" are dependent on each other, meaning B may forward A's email to C

• Three approaches to discover dependency
  1. Examine the similarity of emails
  2. Check


Graph Entropy (5/6)

  3. Exploit a Markov Blanket type of model
     – Assume an event (link) between two nodes depends only on those two nodes' events
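The forwarding example and the Markov Blanket assumption can be combined into a small dependency check. This sketch illustrates the idea under stated assumptions and is not the authors' procedure: each event carries a hypothetical integer timestamp, an event may depend only on events that share one of its endpoints, and a later email B→C (with C different from A) is flagged as a candidate forward of an earlier email A→B. Content similarity (the first approach) is not checked here. `Event`, `markov_blanket`, and `candidate_forwards` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One link in the multigraph: a single email from sender to recipient."""
    time: int  # hypothetical integer timestamp
    sender: str
    recipient: str


def markov_blanket(event: Event, events: list[Event]) -> list[Event]:
    """Other events touching either endpoint of `event`: the only ones it may depend on."""
    ends = {event.sender, event.recipient}
    return [e for e in events if e is not event and ({e.sender, e.recipient} & ends)]


def candidate_forwards(events: list[Event]) -> list[tuple[Event, Event]]:
    """Pairs (A->B, B->C) where B's later email to someone other than A may forward A's email."""
    pairs = []
    for first in events:
        for second in markov_blanket(first, events):
            if (second.sender == first.recipient
                    and second.time > first.time
                    and second.recipient != first.sender):
                pairs.append((first, second))
    return pairs
```

With the events A→B at time 1, B→C at time 2, D→E at time 3, and B→C at time 0, only (A→B@1, B→C@2) is flagged: D→E lies outside the blanket, and B→C@0 happens before A's email.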


Graph Entropy (6/6)


Experiment

• Enron Email Dataset
  – 151 users, mostly senior management of Enron
  – Contains 252,759 email messages
  – Almost all users use folders to organize their emails


Experiment


Experiment

• Created an Enron dictionary
  – Normalized all emails using the Porter stemming algorithm
  – Compared the vectors using the Jaccard coefficient

• Ordered emails by timestamp
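The preprocessing steps above can be sketched as follows. The stemmer is a deliberately crude suffix stripper standing in for the Porter algorithm (in practice a real implementation, such as NLTK's `PorterStemmer`, would be used), and the `(timestamp, body)` record format and all function names are hypothetical. Jaccard similarity on term sets equals the Jaccard coefficient on binary term vectors over the dictionary, so the sets can be compared directly.

```python
import re

# Crude stand-in for the Porter stemmer: strips a few common English suffixes.
# This is NOT the Porter algorithm; it only illustrates the normalization step.
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "s")


def normalize(text: str) -> set[str]:
    """Lowercase, tokenize, and crudely stem an email body into a set of terms."""
    terms = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        for suffix in SUFFIXES:
            # Keep at least a 3-letter stem so short words are left alone.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        terms.add(word)
    return terms


def build_dictionary(bodies: list[str]) -> list[str]:
    """Sorted vocabulary of all normalized terms across the email bodies."""
    return sorted(set().union(*(normalize(b) for b in bodies)))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def order_by_timestamp(emails: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Sort (timestamp, body) records chronologically."""
    return sorted(emails, key=lambda e: e[0])
```

For instance, "forwarding the report" and "forwarded report" normalize to {forward, the, report} and {forward, report}, giving a Jaccard similarity of 2/3.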


Experiment


Conclusions

• Defined and addressed the problem of finding important nodes and the closed groups around them

• Used event-based entropy to find influential nodes in a graph, and showed that the entropy model can act as a good means of detecting influential nodes
