Mining Email Social Networks
Christian Bird, Alex Gourley, Prem Devanbu, Michael Gertz, Anand Swaminathan
University of California, Davis
Presented By: Arnamoy Bhattacharyya
Communication & Co-ordination (C&C) activities are central to large software projects
Difficult to observe and study in traditional (closed-source, commercial) settings
The email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants
CHATTERERS & CHANGERS
A mailing list in an OSS project is a public forum
Anyone can post messages to the list.
Posted messages are visible to all the mailing list subscribers.
Posters include developers, bug-reporters, contributors (who submit patches, but don't have commit privileges) and ordinary users.
A response b to a message a is an indication that:
The sender of b (Sb) found that the sender of a (Sa) had something interesting to say
It is also an indication of Sa’s status, i.e., Sb indicates that s/he found Sa's email worth reading, and worthy of response.
However, the vast majority of individuals participating on the email list sent very few messages, and received very few replies to their messages
OF DOGS AND DEVELOPERS
“On the Internet, nobody knows you're a dog” - Peter Steiner
The same individual can use different email aliases
For example, developer Ian Holsman uses 7 different email aliases
Ignoring these aliases would confound later steps of data analysis
Unmasking Aliases
Most emails include a header that identifies the sender, of this form:
From: "Bill Stoddard" <[email protected]>
Crawl messages and extract all headers to produce a list of <Name,email> identifiers (IDs)
Execute a clustering algorithm that measures the similarity between every pair of IDs
Manually post-process the resulting clusters to remove further aliases
Set the cluster similarity threshold quite low: it is easier to split big clusters than to unify two disparate clusters from a very large set.
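The header-extraction step could be sketched as follows. This is a minimal illustration, not the authors' crawler: it assumes the `From:` header values have already been read from the archive, the function name `extract_ids` is hypothetical, and the addresses are made-up `example.org` placeholders.

```python
from email.utils import parseaddr


def extract_ids(from_headers):
    """Turn raw From: header values into a sorted list of unique (name, email) IDs."""
    ids = set()
    for header in from_headers:
        name, address = parseaddr(header)  # splits '"Name" <addr>' into its parts
        if address:  # skip headers that could not be parsed
            ids.add((name, address.lower()))
    return sorted(ids)


# Hypothetical header values, for illustration only.
headers = [
    '"Bill Stoddard" <bill@example.org>',
    'Bill Stoddard <BILL@example.org>',
    'stoddard@example.org',
]
for identifier in extract_ids(headers):
    print(identifier)
```

Lower-casing the address collapses trivially different spellings before clustering; whether the original study did this is not stated on the slides.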
THE CLUSTERING ALGORITHM
1. Normalize name
→ Remove all punctuation and suffixes (e.g. "jr")
→ Turn all whitespace into a single space
→ Remove generic terms like "admin", "support" from the name
→ Split the name into first name and last name (using whitespace and commas as cues)
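The normalization steps above might look roughly like this in Python. It is a sketch under stated assumptions: the suffix list and generic-term list are illustrative (the slides only name "jr", "admin" and "support"), the function name `normalize_name` is hypothetical, and middle names are simply dropped.

```python
import re

SUFFIXES = {"jr", "sr", "ii", "iii"}  # assumption: only "jr" appears on the slide
GENERIC_TERMS = {"admin", "support"}  # the two examples from the slide


def normalize_name(raw):
    """Return a (first, last) pair of lower-cased name parts."""
    # Comma cue: "Last, First" is reordered to "First Last".
    if "," in raw:
        last_part, _, first_part = raw.partition(",")
        raw = f"{first_part} {last_part}"
    # Remove punctuation, then let split() collapse runs of whitespace.
    cleaned = re.sub(r"[^\w\s]", "", raw.lower())
    words = [w for w in cleaned.split()
             if w not in SUFFIXES and w not in GENERIC_TERMS]
    if not words:
        return ("", "")
    first = words[0]
    last = words[-1] if len(words) > 1 else ""
    return (first, last)


print(normalize_name("Stoddard, Bill"))
print(normalize_name("Bill  Stoddard, Jr."))
```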
2. Name Similarity:
Use a scoring algorithm between:
→ The full names
→ The first name and last name separately
→ Consider names similar if the full names are similar, or if both first and last names are similar
E.g. Andy Smith <-> Andrew Smith
Deepa Patel !<-> Deepa Ratnaswamy
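The slides do not name the scoring algorithm, so the sketch below swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in. The threshold of 0.75 and the function names are assumptions chosen for illustration; names are expected to be already normalized to lower case.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # hypothetical cut-off, not taken from the slides


def score(a, b):
    """Similarity in [0, 1]; SequenceMatcher is a stand-in scoring algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def names_similar(first_a, last_a, first_b, last_b, threshold=THRESHOLD):
    """Similar if the full names match, or if both name parts match separately."""
    full_a = f"{first_a} {last_a}"
    full_b = f"{first_b} {last_b}"
    if score(full_a, full_b) >= threshold:
        return True
    return (score(first_a, first_b) >= threshold
            and score(last_a, last_b) >= threshold)


print(names_similar("andy", "smith", "andrew", "smith"))
print(names_similar("deepa", "patel", "deepa", "ratnaswamy"))
```

With this stand-in, the two slide examples come out as a match and a non-match respectively, but a different scoring function or threshold could shift borderline pairs.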
3. Names-email Similarity:
→ If the email contains both first and last names: match
Arnamoy Bhattacharyya <-> [email protected]
→ If the email contains the initial of one part of the name and the entirety of the other part: match
Erin Bird <-> ebird
Erin Bird <-> erinb
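A rough sketch of the two name-to-email rules, using plain substring checks on the part of the address before the "@". The function name `name_matches_email` and the `example.org` addresses are hypothetical, and substring matching is an assumption: very short names could match by accident.

```python
def name_matches_email(first, last, email):
    """Apply the two name-vs-email rules to the address base (text before '@')."""
    first, last = first.lower(), last.lower()
    base = email.split("@", 1)[0].lower()
    if not first or not last:
        return False
    # Rule 1: the address contains both the first and the last name.
    if first in base and last in base:
        return True
    # Rule 2: initial of one part plus the whole of the other part.
    return (first[0] + last) in base or (first + last[0]) in base


print(name_matches_email("Erin", "Bird", "ebird@example.org"))
print(name_matches_email("Erin", "Bird", "erinb@example.org"))
print(name_matches_email("Erin", "Bird", "jsmith@example.org"))
```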
4. Email Similarity:
→ If the Levenshtein edit distance between two email address bases (not including the domain, after the "@") is small: match
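For reference, the Levenshtein distance can be computed with the standard dynamic-programming recurrence. The cut-off for "small" is not given on the slide, so `max_distance=2` below is an assumption, as are the function names and the placeholder addresses.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def emails_similar(email_a, email_b, max_distance=2):
    """Compare only the address bases, ignoring the domain after '@'."""
    base_a = email_a.split("@", 1)[0].lower()
    base_b = email_b.split("@", 1)[0].lower()
    return levenshtein(base_a, base_b) <= max_distance


print(emails_similar("bill.stoddard@one.example", "billstoddard@two.example"))
```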
5. Cumulative ID similarity:
→ The similarity between two IDs is the maximum of all the similarity scores above
E.g.
Name similarity: 3
Names-email similarity: 5
Email similarity: 2
If the threshold is 4, the pair is considered a match (the maximum, 5, exceeds the threshold)
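The worked example above amounts to taking a maximum and comparing it to the threshold. A tiny sketch, with hypothetical function names and the scores from the slide:

```python
def cumulative_similarity(name_score, name_email_score, email_score):
    """The ID similarity is the largest of the individual similarity scores."""
    return max(name_score, name_email_score, email_score)


def is_match(name_score, name_email_score, email_score, threshold):
    """Two IDs are clustered together when the cumulative score reaches the threshold."""
    return cumulative_similarity(name_score, name_email_score, email_score) >= threshold


# Scores from the slide: 3, 5 and 2, with a threshold of 4.
print(cumulative_similarity(3, 5, 2))
print(is_match(3, 5, 2, threshold=4))
```

Whether a score exactly equal to the threshold counts as a match is not specified on the slide; `>=` is an assumption here.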
The vast majority of people send only one message, while a few send a great many
Out-degree - # of different people from whom an individual has received responses
Higher out-degree <-> higher status
![Page 28: Mining Email Social Networks](https://reader033.vdocuments.net/reader033/viewer/2022042813/54803ec0b37959582b8b5af6/html5/thumbnails/28.jpg)
In-degree - # of different people to whom an individual has replied
Indicates the level of engagement of an individual in the mailing list and the breadth of his/her interests
The distributions show a small-world character
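Given the definitions on these slides (out-degree counts distinct people who responded to an individual, in-degree counts distinct people the individual replied to), both measures can be derived from a list of reply pairs. The sketch below assumes the archive has already been reduced to hypothetical `(replier, original_sender)` tuples with aliases resolved; the names are made up.

```python
from collections import defaultdict


def degrees(replies):
    """replies: iterable of (replier, original_sender) pairs, one per response."""
    responders = defaultdict(set)  # person -> distinct people who replied to them
    targets = defaultdict(set)     # person -> distinct people they replied to
    for replier, sender in replies:
        if replier == sender:
            continue  # ignore self-replies
        responders[sender].add(replier)
        targets[replier].add(sender)
    people = set(responders) | set(targets)
    return {p: {"out": len(responders.get(p, ())),
                "in": len(targets.get(p, ()))}
            for p in people}


# Hypothetical reply data for illustration.
replies = [("bob", "alice"), ("carol", "alice"), ("bob", "alice"), ("alice", "bob")]
for person, d in sorted(degrees(replies).items()):
    print(person, d)
```

Using sets means repeated replies between the same pair count once, matching the "different people" wording.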
High correlation (0.97) between messages sent and replies received (out-degree)
The correlation may have other explanations:
1. People who only post relevant messages get many responses to their messages
2. Only people who receive replies from several people keep sending messages (survival effect)
Each link indicates at least 150 messages sent
C&C ACTIVITY AND DEVELOPMENT ACTIVITY
How does email activity relate to software development activity?
73 committers:
1. A correlation of 0.80 between the number of messages sent by an individual and the number of source changes they make:
more software development work <-> more C&C activity
2. A correlation of 0.57 between the number of messages sent by an individual and the number of document changes they make:
source code activities require much more co-ordination effort than documentation effort
Are developers more likely to play the role of gatekeepers or brokers in the complete email social network?
Betweenness (BW):
High betweenness <-> the person is a kind of broker or gatekeeper
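The slides do not show how betweenness was computed. Under the standard definition (for each node, the sum over pairs of other nodes of the fraction of shortest paths that pass through it), it can be sketched with Brandes' algorithm on an unweighted directed graph. The adjacency-dict input format and the toy graphs are assumptions for illustration; every neighbor must also appear as a key.

```python
from collections import deque


def betweenness(graph):
    """graph: dict mapping each node to a list of nodes it links to (directed)."""
    bw = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, walking back from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bw[w] += delta[w]
    return bw


# Toy network: every message between x, y and z has to pass through "hub".
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(betweenness(star))
```

In the toy star network only the hub gets a non-zero score, which is the broker or gatekeeper pattern described above.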
Developers are higher in status than non-developers
Relative Status of Developers
Do the most active developers have the highest status among developers?
Source changes are not as highly correlated with document changes <-> not all developers are engaged in both to the same degree
Source changes show the strongest rank correlation with the social network status <-> the most active developers play the strongest role of communicators, brokers, and gatekeepers
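A rank correlation of this kind can be illustrated with Spearman's coefficient, which is the Pearson correlation of the ranks. The slides do not say which rank statistic was used, so Spearman is an assumption here, and the per-developer numbers below are invented purely for illustration.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented data: source changes and betweenness for five developers.
source_changes = [120, 15, 60, 300, 45]
status = [0.30, 0.02, 0.10, 0.55, 0.12]
print(spearman(source_changes, status))
```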
Conclusion
The level of activity on the mailing list is strongly correlated with source code change activity, and to a lesser extent with document change activity.
Social network measures such as in-degree, out-degree and betweenness indicate that developers who actually commit changes play much more significant roles in the email community than non-developers.
Even within the select group of developers, there is a strong correlation between the social network importance and level of source code change activity.
Questions?