Mining Social Network for Personalized Email Prioritization

Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon


Post on 31-Dec-2015


TRANSCRIPT

Page 1: Mining Social Network for Personalized Email Prioritization

Mining Social Network for Personalized Email Prioritization

Language Technologies Institute, School of Computer Science

Carnegie Mellon University

Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon

Page 2: Mining Social Network for Personalized Email Prioritization

Outline

- Problem Description
- Approaches
- Experiments
- Contributions

Page 3: Mining Social Network for Personalized Email Prioritization

Problem Description

- Email overload is a severe problem
- Identifying the importance of each email will alleviate email overload
- Challenges
  - No access to other people's emails and labels
  - Personalized labeling is time-consuming
  - The same message may have different priority labels for different recipients
- We want to make the most of the sparse training data by using each user's social network

Sparse Training Data

Page 4: Mining Social Network for Personalized Email Prioritization

Outline

- Problem Description
- Approaches
  - Social Clustering
  - Social Importance
  - Semi-supervised Importance Propagation
- Experiments
- Conclusion and Future Work

Page 5: Mining Social Network for Personalized Email Prioritization

Social Clustering – Motivation

- Personal email inbox
  - Lots of unlabeled emails
  - No privacy issue
- Observations
  - The sender can be important
  - Some senders do not appear in the training set at all, or appear in only a few instances
  - Sender information needs to be generalized
  - Let's find similar senders from the social network

Page 6: Mining Social Network for Personalized Email Prioritization

Social Clustering – Contact Network

- Personal contact network G = (V, E)
- The whole network is constructed from the personal inbox

[Figure: example contact network; each node is an agent/person]

Page 7: Mining Social Network for Personalized Email Prioritization

Social Clustering – Newman Clustering

- Newman clustering algorithm [Newman, 04]
  - Finds social cliques or cohesive social groups
  - Based on edge betweenness: the number of shortest paths that go through the edge / the total number of shortest paths
  - Drops edges starting from the highest edge betweenness
  - Hard clustering

[Figure: example graph with edge betweenness values (9 on the connecting edge, 4 on four other edges), split into Group A and Group B]
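The edge-dropping idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`edge_betweenness`, `components`, `split_once`) and the six-node example graph are assumptions made here, and betweenness is computed as the standard per-pair fraction of shortest paths, which gives the same edge ranking as a globally normalized ratio.

```python
from collections import deque, defaultdict


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    """
    score = defaultdict(float)
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        dist = {source: 0}
        paths = defaultdict(float)
        paths[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on the way.
        credit = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    # Every unordered pair was counted once from each endpoint.
    return {edge: s / 2.0 for edge, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in group:
                continue
            group.add(v)
            stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups


def split_once(adj):
    """Drop highest-betweenness edges until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the single edge 3-4 (illustrative data only).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
groups = split_once(graph)
```

In this toy graph the connecting edge 3-4 carries all nine cross-group shortest paths, so it is removed first and the two triangles become separate hard clusters.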

Page 8: Mining Social Network for Personalized Email Prioritization

Social Clustering – Validations

Clusters are coherent!

Page 9: Mining Social Network for Personalized Email Prioritization

Social Clustering – Feature Incorporation

- Extended vector space
  - text:
  - social network:
  - combined:
- The combined vector space is used as an enriched feature set for the email prioritizer
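The transcript does not preserve the formulas for the text, social network, and combined vectors. One plausible reading, shown here purely as an assumption, is that the social part is a one-hot indicator of the sender's cluster appended to the text features. The function name `combine_features` and the sample values are hypothetical.

```python
def combine_features(text_vec, sender_cluster, num_clusters):
    """Append a one-hot cluster indicator to a text feature vector.

    Assumption: the social features are cluster memberships; a sender
    with no known cluster (None) gets an all-zero social part.
    """
    social_vec = [0.0] * num_clusters
    if sender_cluster is not None:
        social_vec[sender_cluster] = 1.0
    return list(text_vec) + social_vec


text_vec = [0.0, 1.5, 0.3]  # hypothetical term weights
combined = combine_features(text_vec, sender_cluster=1, num_clusters=2)
```

With this layout, two senders who never share a word in their emails still share a feature whenever they fall into the same cluster, which is the kind of generalization over senders that the motivation slide asks for.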

Page 10: Mining Social Network for Personalized Email Prioritization

Social Importance – Motivations

- Social importance
  - A person at the center of a cluster might be more important than others
- Betweenness
  - Edge betweenness: used for Newman clustering
  - Vertex betweenness
    - The degree to which a person is a communication bottleneck in the social network
    - Contact points within the network
    - Might be an important person
- We may try other kinds of social importance metrics too

Page 11: Mining Social Network for Personalized Email Prioritization

Social Importance – Metrics

- Degree (in, out, total) [Wasserman and Faust, 94]
- Clique Counts (ClqCnt) [Wasserman and Faust, 94]
  - The number of clique sub-graphs that contain a node v
- Betweenness (BetCent) [Freeman, 77]
- HITS Authority (Authority) [Kleinberg, 99]
  - λ: the greatest eigenvalue
  - r: the eigenvector, similar to PageRank scores
- Neighborhood Connectivity ("Clustering Coefficient", ClustCoef) [Boykin and Roychowdhury, 05]
  - Measures the connectivity among the neighbors of a node v
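Three of these metrics are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the authors' code: edges are taken to be directed (sender, recipient) pairs, the clustering coefficient uses an undirected neighbour map, and the authority score is obtained by plain power iteration, normalized to sum to 1. All function names and the sample data are hypothetical.

```python
def degrees(edges, node):
    """Return (in, out, total) degree for directed (sender, recipient) edges."""
    indeg = sum(1 for s, r in edges if r == node)
    outdeg = sum(1 for s, r in edges if s == node)
    return indeg, outdeg, indeg + outdeg


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def hits_authority(edges, nodes, iterations=50):
    """Power iteration for HITS authority scores, normalised to sum to 1."""
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A hub is as good as the authorities it points to.
        hub = {n: 0.0 for n in nodes}
        for s, r in edges:
            hub[s] += auth[r]
        # An authority is as good as the hubs pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for s, r in edges:
            new_auth[r] += hub[s]
        total = sum(new_auth.values()) or 1.0
        auth = {n: a / total for n, a in new_auth.items()}
    return auth


# Illustrative data only.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
nodes = ["a", "b", "c"]
undirected = {"a": {"b", "c"}, "b": {"a", "c"},
              "c": {"a", "b", "d"}, "d": {"c"}}

deg_c = degrees(edges, "c")
cc_c = clustering_coefficient(undirected, "c")
authority = hits_authority(edges, nodes)
```

Each metric yields one number per person, so the values for an email's sender can be used directly as extra features.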

Page 12: Mining Social Network for Personalized Email Prioritization

Social Importance – Validations

Correlation coefficients with priority levels

$\mathrm{PCC}(f, y)$

- $y_i \in [1..5]$: priority level of the $i$-th email
- $f_i$: feature value of the $i$-th email's sender
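The validation on this slide correlates a sender-level feature with the priority labels. Assuming PCC stands for the Pearson correlation coefficient, a minimal sketch looks like the following. The function name `pearson`, the sample numbers, and the choice to return 0.0 when either input is constant (where the coefficient is undefined) are assumptions made for this example.

```python
import math


def pearson(f, y):
    """Pearson correlation between feature values f and priority levels y."""
    n = len(f)
    mean_f = sum(f) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_f) * (b - mean_y) for a, b in zip(f, y))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_f == 0 or var_y == 0:
        return 0.0  # convention chosen here; the coefficient is undefined
    return cov / math.sqrt(var_f * var_y)


# Hypothetical data: sender degree for four emails and their priority labels.
sender_degree = [1, 2, 3, 4]
priority = [2, 3, 4, 5]
pcc = pearson(sender_degree, priority)
```

A value near +1 or -1 would suggest that the feature tracks priority closely, while a value near 0 would suggest that it carries little linear signal on its own.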

Page 13: Mining Social Network for Personalized Email Prioritization

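SIP – Motivations

- Semi-supervised Importance Propagation (SIP)
- Can we propagate importance labels?
- Bipartite graph; labels only on emails

[Figure: bipartite graph of agents/persons and emails; some emails carry importance labels (4, 3, 2), while the remaining emails and all persons are unlabeled (?)]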

Page 14: Mining Social Network for Personalized Email Prioritization

SIP – Email Network

- $A$: sender to emails ($N \times M$)
- $B^{T}$: email to recipients ($M \times N$)
- $x_k$: $k$-th importance labels for emails ($M \times 1$)
- $y_k = B x_k$ ($N \times 1$)

[Figure: bipartite graph of agents/persons and emails, as on the previous slide]
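The first propagation step, y_k = B x_k, moves label mass from emails to the people attached to them. The sketch below assumes that B is an N x M 0/1 matrix with B[i][j] = 1 when person i is a recipient of email j, and that x_k is an indicator vector marking the emails labeled with importance level k. Both encodings are assumptions, since the slide only gives the matrix shapes.

```python
def propagate_to_people(B, x):
    """Compute y = B x for an N x M matrix B (list of rows) and length-M x."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in B]


# Hypothetical example: 2 people, 3 emails.
B = [[1, 1, 0],   # person 0 received emails 0 and 1
     [0, 1, 1]]   # person 1 received emails 1 and 2
x_k = [1, 0, 0]   # only email 0 is labeled with level k
y_k = propagate_to_people(B, x_k)
```

Here person 0 picks up a score for level k because they received the one email carrying that label, while person 1 receives nothing yet.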

Page 15: Mining Social Network for Personalized Email Prioritization

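SIP – Algorithm

$$y_k^{t+1} = C\,y_k^{t} = A B^{T} y_k^{t} = A B^{T} B\,x_k^{t} = C^{t} B\,x_k, \qquad C = A B^{T}$$

Problems of the above propagation

- $C$ may not be irreducible
- $y_k^{t+1}$ is insensitive to $x_k$ (not personalized)

Apply Personalized PageRank: normalize $y_k^{1}$ to $y_k^{1\prime}$ and column-wise normalize $C$ to $C'$

$$E_k = (1-\alpha)\,C' + \alpha\,U_k, \quad \text{where } U_k = y_k^{1\prime}\,\mathbf{1}^{T} \text{ and } \alpha \in [0,1]$$

$$y_k^{t+1} = E_k\,y_k^{t}$$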
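The personalized propagation can be sketched as a power iteration. This is a minimal illustration under several assumptions: the update is taken to be y <- E y with E = (1 - alpha) C' + alpha y1' 1^T, where C' is the column-normalized person-to-person matrix and y1' is the normalized initial score vector; the symbol name `alpha`, its default of 0.15, the iteration count, and the handling of all-zero columns are choices made here, not values from the slides.

```python
def normalize(vec):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(vec)
    if not total:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def column_normalize(C):
    """Make every column of the square matrix C sum to 1."""
    n = len(C)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(C[i][j] for i in range(n))
        for i in range(n):
            # Assumption: an all-zero column is replaced by a uniform one.
            out[i][j] = C[i][j] / col_sum if col_sum else 1.0 / n
    return out


def sip_scores(C, y1, alpha=0.15, iterations=100):
    """Iterate y <- (1 - alpha) C' y + alpha * y1' * (1^T y)."""
    C_norm = column_normalize(C)
    restart = normalize(y1)
    y = restart[:]
    n = len(y)
    for _ in range(iterations):
        total = sum(y)  # 1^T y, which stays at 1 up to rounding
        y = [(1 - alpha) * sum(C_norm[i][j] * y[j] for j in range(n))
             + alpha * restart[i] * total
             for i in range(n)]
    return y


# Hypothetical 3-person network where everyone is linked to everyone else.
C = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
scores = sip_scores(C, y1=[1, 0, 0])
```

Without the restart term, this symmetric network would converge to equal scores for all three people regardless of the labels. With it, person 0, who is the only one holding initial label mass, keeps the highest score, which is the personalization the slide is after.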

Page 16: Mining Social Network for Personalized Email Prioritization

Outline

- Problem Description
- Approaches
- Experiments
- Contributions

Page 17: Mining Social Network for Personalized Email Prioritization

Experiments – Data Collection

- Collected data
  - 25 subjects were recruited from Carnegie Mellon University
  - 7 users submitted more than 200 emails
  - 1 faculty member, 2 staff members, 4 students

[Figure: timeline with the training period followed by the testing period]

Page 18: Mining Social Network for Personalized Email Prioritization

Experiments – Metrics

- Mean Absolute Error (MAE)
  - An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level
  - MAE accounts for the size of each error (e.g., 1 vs. 5 is a larger error than 4 vs. 5)
  - It ranges from 0 to 4 when we use five importance levels
- Micro-MAE
  - Pool the test instances from all users to obtain a joint test set
- Macro-MAE
  - Compute each user's MAE first, then take the average of the per-user MAEs
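The difference between the two averages is easiest to see on a small example. The sketch below follows the definitions on this slide; the function names and the (predicted, true) pairs are made up for illustration.

```python
def mae(pairs):
    """Mean absolute error over (predicted, true) priority pairs."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def micro_mae(per_user):
    """Pool all users' test instances, then compute a single MAE."""
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    return mae(pooled)


def macro_mae(per_user):
    """Compute each user's MAE first, then average over users."""
    return sum(mae(pairs) for pairs in per_user.values()) / len(per_user)


# Hypothetical predictions on a 1..5 priority scale.
per_user = {
    "user_a": [(5, 5), (4, 5), (1, 5), (3, 3)],  # absolute errors 0, 1, 4, 0
    "user_b": [(2, 4)],                          # absolute error 2
}
micro = micro_mae(per_user)
macro = macro_mae(per_user)
```

Micro-MAE weights every email equally, so users with many test emails dominate it, whereas Macro-MAE weights every user equally. On this toy data the two values differ because user_b has only one test email.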

Page 19: Mining Social Network for Personalized Email Prioritization

Experiments – Setups

- Features: four subsets
  - Basic Features (BF): from, to, cc, title, body
  - Newman Clustering (NC)
  - Social Importance (SI)
  - Semi-supervised Importance Propagation (SIP)
- Ten rounds of random shuffling of the training data
- Linear SVM
  - 10-fold cross-validation for parameter tuning
  - Regularization parameter tuned over $[10^{-3}, 10^{3}]$

Page 20: Mining Social Network for Personalized Email Prioritization

Experiments – Results

Page 21: Mining Social Network for Personalized Email Prioritization

Contributions

- The first study on personalized email prioritization
  - Using statistical classification and clustering
  - Based on fine-grained personal judgments from multiple users
- Enriched representation through the personal social network
  - Social Clustering
  - Social Importance Estimation
  - Semi-supervised Importance Propagation
- Fully personalized methodology
- Technical development and evaluation