E-mail Mining: Extracting Collaborative Activities from E-Mail Akiko Murakami Koichi Takeda


Page 1: E-mail Mining: Extracting Collaborative Activities from E-Mail Akiko Murakami Koichi Takeda

E-mail Mining: Extracting Collaborative Activities from E-Mail

Akiko Murakami

Koichi Takeda

Page 2

Contents

Overview of our text mining work
  Text mining for individual text
  Text mining for discussion text
  Text mining for e-mail

Discussion on e-mail mining
  Pair-mail
  Three levels of e-mail mining targets

Preliminary study of e-mail mining

Page 3

Text Mining

Text mining has become one of the most influential areas of natural language processing research.

Text mining has been extended to various domains:
  CRM (Customer Relationship Management)
  Biomedical domain
  Web pages
  Discussion records
  Patents

Page 4

Text Mining for Individual Text

IBM TAKMI (Nasukawa, Nagano, 1999)

Original Data
  Call Taker: James
  Date: Aug. 30, 2002
  Duration: 10 min.
  CustomerID: ADC00123
  Q: cust sys has stopped working.
  A: checked cust bios and it need updated. …
  (Unstructured Data)

Meta Data
  Structured Data
    [Call Taker] James
    [Date] 2002/08/30
    [Duration] 10 min.
    [CustomerID] ADC00123
  Linguistic Analysis
    [Noun] Customer
    [Software] BIOS
    [Subj...Verb] customer system..stop
    [SW..Problem] BIOS..need

[Figure labels: Linguistic Analysis (Tagging, Dependency Analysis, Named Entity Extraction, Intention Analysis); Category Dictionary; Synonym Dictionary; Category; Item; Mining; Visualization & Interactive Mining]

Mining target: individual text
Mining unit:
  > texts
  > category-labeled items extracted from text using NLP

Page 5

TAKMI Client GUI

[Screenshot: TAKMI client GUI. Panels: Mining History, Document List, Distribution Analysis View, Other Mining Views]

Page 6

Text Mining for Discussion Records

Discussion Mining (Murakami, Nagao, 2001)

[Figure: Mail A, Mail B, and Mail C. Labels: Quotation from Mail A, Comment on the quotation, Quotation from Mail B, Comment on the quotation, Thread Summary, Linguistic Annotation]

Mining target: discussion records
Mining unit:
  > summarized texts based on thread structure
  > mail graph structures

Page 7

Discussion Mining

Page 8

Text Mining for E-mail

Private e-mail data

Various structured data as mail messages:
  Sender (From), Receiver (To, cc, bcc), Time Stamp, Mail unique ID, Referential ID, etc.

Independent and relational documents are mixed in e-mail data.
  F.Y.I., invitation, CFP, etc.
  Mailing list, inquiry, request, etc.
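The structured fields listed above can be read directly from message headers. The following is a minimal sketch using Python's standard `email` package. The sample message, the addresses, and the function name `structured_fields` are assumptions made for illustration, and the mapping of "Mail unique ID" to `Message-ID` and "Referential ID" to `In-Reply-To`/`References` is one plausible reading of the slide, not something the slides state.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, made up for this sketch.
RAW = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Date: Thu, 01 May 2003 10:00:00 +0900
Message-ID: <m2@example.com>
In-Reply-To: <m1@example.com>
References: <m1@example.com>
Subject: Re: draft

Thank you for the comments.
"""


def structured_fields(raw):
    """Return the structured part of one mail message as a dict."""
    msg = message_from_string(raw)

    def addrs(name):
        # Keep only the address part of each "Name <address>" entry.
        return [addr for _, addr in getaddresses(msg.get_all(name, []))]

    return {
        "sender": addrs("From"),
        "to": addrs("To"),
        "cc": addrs("Cc"),
        "timestamp": parsedate_to_datetime(msg["Date"]),
        "message_id": msg["Message-ID"],            # assumed "Mail unique ID"
        "in_reply_to": msg["In-Reply-To"],          # assumed "Referential ID"
        "references": (msg["References"] or "").split(),
    }


fields = structured_fields(RAW)
print(fields["sender"], fields["in_reply_to"])
```

A message with no `In-Reply-To` header yields `None` for that field, which is one simple way to tell independent mail from relational mail.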

Page 9

Properties of e-mail messages

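[Figure: map of e-mail message properties on two axes, Private to Public and Independent to Relative. Message types shown: private mail without c.c., private mail with c.c., F.Y.I., spam, memo, mailing list, schedule. Mining approaches shown: Text Mining, Discussion Mining, E-mail Mining. Other labels: Discussion, BBS, ...; Discourse; Paper, Report, ...]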

Page 10

E-mail mining

Not suitable for annotation
  Scalability needs to be considered.

AND

Lack of information such as discourse structure
  Threads are shorter than those in discussion records.
  There are fewer participants than in discussions.

A new concept of the e-mail mining target is required.

Page 11

Pair-mail

A pair-mail is formed by a reference link (reply-to information).

Each reply-to link forms a pair-mail.
It contains reference-type information based on the contents of the previous and next mail:
  Question/Answer, Imperative/Action, Action/Regards, etc.

[Figure: two mails linked by a reply-to arrow]
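The pair-mail definition above can be sketched as a lookup from each reply to the mail it answers. This is a minimal illustration, not the authors' implementation: the `Mail` and `PairMail` classes, the field names, and the sample messages are all hypothetical, and the reply-to link is assumed to be available as the parent's message ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mail:
    message_id: str
    in_reply_to: Optional[str]  # ID of the mail being answered, if any
    body: str


@dataclass(frozen=True)
class PairMail:
    previous: Mail
    reply: Mail


def build_pair_mails(mails):
    """Create one PairMail per reply-to link whose target is in the mailbox."""
    by_id = {m.message_id: m for m in mails}
    pairs = []
    for m in mails:
        parent = by_id.get(m.in_reply_to) if m.in_reply_to else None
        if parent is not None:
            pairs.append(PairMail(previous=parent, reply=m))
    return pairs


mails = [
    Mail("m1", None, "Could you send the draft?"),
    Mail("m2", "m1", "Here it is."),
    Mail("m3", "m2", "Thank you."),
    Mail("m4", "missing", "Reply to a mail that is not in this mailbox."),
]
pairs = build_pair_mails(mails)
print([(p.previous.message_id, p.reply.message_id) for p in pairs])
```

Replies whose parent is not in the mailbox are skipped here, since a pair needs both contents to assign a reference type such as Question/Answer.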

Page 12

Mining Target -mining units-

Three levels of mining target in mail data:

1st level: e-mail
  An individual e-mail as a single substance.

2nd level: pair-mail
  A pair of e-mails linked by a reply-to relation.

3rd level: thread
  A chain of e-mail messages.

[Figure axis: Scalability, from High to Low]
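To make the three mining units concrete, the sketch below derives all three from one table of reply-to links. The `reply_to` mapping and the message IDs are made up for illustration, and following reply-to links up to a root is only one assumed way to group mails into threads.

```python
from collections import defaultdict

# Hypothetical mailbox: message ID -> ID of the mail it replies to (or None).
reply_to = {"m1": None, "m2": "m1", "m3": "m2", "m4": None, "m5": "mX"}


def group_threads(reply_to):
    """Group message IDs by the root reached when following reply-to links."""
    def root(mid):
        seen = set()
        # Stop at a mail with no parent, an unknown parent, or a cycle.
        while reply_to.get(mid) in reply_to and mid not in seen:
            seen.add(mid)
            mid = reply_to[mid]
        return mid

    threads = defaultdict(list)
    for mid in reply_to:
        threads[root(mid)].append(mid)
    return dict(threads)


# 1st level: individual e-mails
emails = list(reply_to)
# 2nd level: pair-mails, one per reply-to link with a known parent
pair_mails = [(parent, child) for child, parent in reply_to.items()
              if parent in reply_to]
# 3rd level: threads
threads = group_threads(reply_to)

print(len(emails), len(pair_mails), len(threads))
```

Levels 1 and 2 need only a single pass over the mails, while level 3 requires following chains of links.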

Page 13

Preliminary study

Page 14

Examples of mail mining

Mail data for one month (May 2003)

Business-related mail
  Discussion with the co-author of my paper
  Meeting invitations
  Mail magazines and mailing list messages are received in another account.

Including messages I sent

Volume: 380 mail messages (19 mail messages per working day)

Page 15

Thread Properties

Thread structure is extracted from the header information (Reference ID).

The average thread length is 1.60 mail messages (238 threads).
  However, most mail messages are of the individual type.

The average length excluding individual mail is 3.09 mail messages (68 threads).

Most threads are shorter than 3 messages.
  There are only 16 long threads (over 4 messages).

The average number of participants in a long thread (more than 4 messages) is 3.5.
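The thread figures above can be cross-checked with simple arithmetic. The sketch below assumes that the 238 threads cover all 380 messages and that each individual mail counts as a thread of length 1. The slides do not state this, but the reported averages are consistent with it. The function `thread_stats` and the toy length list are hypothetical.

```python
def thread_stats(lengths, long_over=4):
    """Summarize thread lengths (number of messages per thread)."""
    multi = [n for n in lengths if n > 1]  # threads with at least one reply
    return {
        "threads": len(lengths),
        "avg_all": sum(lengths) / len(lengths),
        "multi_threads": len(multi),
        "avg_multi": sum(multi) / len(multi) if multi else 0.0,
        "long_threads": sum(1 for n in lengths if n > long_over),
    }


# Toy data, made up for illustration only.
stats = thread_stats([1, 1, 1, 2, 3, 5])

# Consistency check against the reported figures (assumptions stated above).
total_messages = 380
all_threads = 238
multi_threads = 68
singletons = all_threads - multi_threads                     # individual mails
avg_all = total_messages / all_threads                       # reported as 1.60
avg_multi = (total_messages - singletons) / multi_threads    # reported as 3.09

print(singletons, round(avg_all, 2), round(avg_multi, 2))
```

Under these assumptions, 170 of the 238 threads are individual mails, which matches the remark that most mail messages are of the individual type.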

Page 16

Changes in numbers of thread participants

[Figure: chart of the number of participants (y-axis, 0 to 8) by mail thread (x-axis, 1 to 11). Series: Total Participants, C.C., B.C.C.]

Expansion of the number of participants → general information

No member in the c.c. field → special topics between sender and receiver

Considering pair-mail properties (e.g., the shift in the number of participants) helps to extract the relevant information.
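The shift in the number of participants within a pair-mail can be computed by comparing the address sets of the two mails. This is a rough sketch under assumed data: the dictionary layout (`from`, `to`, `cc`), the names, and the function `participant_shift` are hypothetical, and the two flags only mirror the two cues on the slide rather than a tested classifier.

```python
def participants(mail):
    """All addresses involved in one mail: sender, receivers, and c.c."""
    return {mail["from"], *mail["to"], *mail["cc"]}


def participant_shift(previous, reply):
    """Describe how the set of participants changes from previous to reply."""
    before, after = participants(previous), participants(reply)
    return {
        "added": after - before,
        "dropped": before - after,
        "delta": len(after) - len(before),  # > 0 means the audience expanded
        "cc_empty": not reply["cc"],        # True when the reply has no c.c.
    }


# Hypothetical pair-mail.
previous = {"from": "alice", "to": ["bob"], "cc": []}
reply = {"from": "bob", "to": ["alice"], "cc": ["carol", "dave"]}

shift = participant_shift(previous, reply)
print(shift["delta"], sorted(shift["added"]), shift["cc_empty"])
```

Following the slide, a positive `delta` would hint at general information, while `cc_empty` would hint at a topic specific to the sender and receiver.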

Page 17

Pair-mail Extraction

Extract pair-mails whose second mail contains a certain expression, e.g., a gratitude expression such as “Thank you”.

These pair-mails contain some relation to the expression. In this example, the “gratitude” expression is a result of some “action” in the previous mail.

[Figure: a pair-mail with the labels “Action” and “thank you...”]
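The extraction step described above can be approximated with a pattern match on the second mail of each pair. This is only a sketch: the list of gratitude expressions in `GRATITUDE`, the function name, and the sample pairs are assumptions, and the slides do not say how the expressions were actually matched.

```python
import re

# Assumed list of gratitude expressions; the real list is not given.
GRATITUDE = re.compile(r"\b(thank you|thanks)\b", re.IGNORECASE)


def extract_gratitude_pairs(pairs):
    """Keep (previous_body, reply_body) pairs whose reply expresses gratitude."""
    hits = []
    for previous_body, reply_body in pairs:
        match = GRATITUDE.search(reply_body)
        if match:
            hits.append((previous_body, reply_body, match.group(0)))
    return hits


# Hypothetical pair-mails as (previous mail body, reply body).
pairs = [
    ("I uploaded the revised draft.", "Thank you! I will read it today."),
    ("Is the meeting at 3 pm?", "Yes, room 204."),
    ("I booked the room.", "Great, thanks."),
]

for previous_body, reply_body, expression in extract_gratitude_pairs(pairs):
    print(f"{expression!r} <- {previous_body}")
```

The previous mail of each hit is then the place to look for the action that prompted the gratitude.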

Page 18

Result of pair-mail extraction

[Figure: chart of pair-mail extraction results. Legend as extracted: action described in mail, data, information in previous mail, real world action, platitudinous expression, attachment text]

Most of the expressions are found in the previous mail as an attachment, so data cleansing is required.

In the rest of the results, we can find the action described in the previous mail. About 40% is gratitude for actions described in mail (8% is for information), and 10% is for real-world actions.

5% are platitudinous expressions.

Extracted 106 pair-mails

Page 19

Summary

Text mining for e-mail
  Text mining for individual and relational text

Introduce a new mining unit
  Three levels of e-mail mining targets: single mail, pair-mail, thread

Preliminary study of e-mail mining
  Pair-mail information is important in threads.
  Data cleansing is needed.
    Remove signatures, attachments, etc.
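A minimal sketch of the cleansing step, assuming plain-text bodies: it cuts everything after the conventional "-- " signature delimiter and drops lines quoted with ">". Treating quoted lines as the carried-over text of the previous mail is an assumption of this sketch, as are the function name and the sample body. Real mail would also need MIME attachment handling, which is not shown.

```python
import re

# Conventional signature delimiter line: two hyphens, optional trailing space.
SIGNATURE_DELIMITER = re.compile(r"^-- ?$")


def cleanse_body(body):
    """Drop the signature block and quoted lines from a plain-text mail body."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break                      # everything below is the signature
        if line.lstrip().startswith(">"):
            continue                   # quoted text from the previous mail
        kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical reply body.
body = (
    "Thank you.\n"
    "> please send the file\n"
    "See you.\n"
    "-- \n"
    "Alice\n"
    "Example Corp.\n"
)
print(cleanse_body(body))
```

Running the expression match on the cleansed body rather than the raw body avoids hits that come from the quoted previous mail or from a signature.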