the study on mining temporal patterns and related applications in dynamic social...


Upload: thanh-hieu

Post on 29-Jun-2015


Page 1: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Yi-Cheng Chen 陳以錚

1

Mining Temporal Pattern and Related Applications

Page 2: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Curriculum Vitae

Basic Information
  Birthday – Aug. 31, 1978

Education
  Dept. of CSE, YZU (B.S. 2000)
  Dept. of CS, NTUST (M.S. 2002)
  Dept. of CSIE, NCTU (Ph.D. 2012)

Advisors: Prof. Suh-Yin Lee (李素瑛), Prof. Wen-Chih Peng (彭文志)

Ph.D. Dissertation: A Study on Time Interval-based Sequential Patterns Mining

2

Page 3: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Outline

Current Research
  Temporal Pattern Mining
  Social Network Analysis
  Smart Home Application
  Cloud Computing

3

Page 4: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Why Data Mining? Commercial Viewpoint

Lots of data is being collected
  Web data, e-commerce
  Purchases at department stores
  Bank/credit card transactions

Computers have become cheaper and more powerful

Competitive pressure is strong
  Provide better, customized services for an edge (e.g. in Customer Relationship Management)

4

Page 5: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Why Data Mining? Scientific Viewpoint

Data collected and stored at enormous speeds (GB/hour)
  Remote sensors on a satellite
  Telescopes scanning the skies
  Microarrays generating gene expression data
  Scientific simulations generating terabytes of data

Traditional techniques are infeasible for raw data

Data mining may help scientists
  In classifying and analyzing data
  In hypothesis formation

5

Page 6: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Data Mining

We are buried in data, but looking for knowledge

Data mining
  Knowledge discovery in databases
  Extraction of interesting knowledge (rules, regularities, patterns) from data in large databases

6

Page 7: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

7

Temporal Pattern Mining

Page 8: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

8

Sequential Pattern Mining

Point-based sequential pattern mining
  Customer analysis, network intrusion detection, finding tandem repeats in DNA sequences, …
  Simple relations between time points
    Three relations (before, equal, after)

[Figure: time point-based purchase sequences over the items diaper, milk, and beer]

With min_sup = 2, (ab)dc is a frequent sequential pattern

Page 9: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Interval Data Everywhere!!

Interval data
  Data has duration time
  Clinical data, library data, appliance usage data

Applications
  Diagnosis system, recommendation system, smart home

9

[Figure: a database (DB) connected to three applications: Diagnosis System, Recommendation, Smart Home]

Page 10: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

10

Temporal Pattern Mining

Interval-based sequential pattern mining
  Library reader analysis, patient disease analysis, stock fluctuation, ...
  Complex relations
    Allen’s 13 temporal relations

[Figure: time interval-based patient sequences over the events chest pain, fever, and cough]

With min_sup = 4, [pattern shown in the figure] is a frequent temporal pattern

Page 11: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

11

Allen Relationship

Allen’s 13 temporal relations describe the relationship between any two event intervals (a binary relation) [ACM 1983]
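To make the binary nature of these relations concrete, here is a minimal sketch that classifies a pair of intervals into one of the 13 relations. The function name `allen_relation` and the relation labels are illustrative choices, not taken from the slides, and each interval is assumed to satisfy start < end.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Each interval is a (start, end) pair with start < end (assumed).
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # The intervals intersect without sharing an endpoint or nesting.
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((1, 4), (2, 5)))
print(allen_relation((5, 7), (2, 8)))
```

Because every relation involves exactly two intervals, describing a pattern of k intervals this way needs one relation per pair, which is where the ambiguity and space concerns raised later in the talk come from.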

Page 12: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

12

Real example

Some temporal patterns generated from the NCTU library

Page 13: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

13

Motivation

Representation
  Allen’s relations are binary relations
  Expressing the relations among more than 3 intervals
    Ambiguity problem
    Space usage

Efficient algorithms
  Mining temporal patterns *
  Mining closed temporal patterns
  Incrementally maintaining discovered temporal patterns and closed temporal patterns

Related applications
  Social network
  Smart home

Page 14: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

14

Proposed Method

Coincidence representation
  Segment intervals into disjoint slices
  Nonambiguous and compact representation

Endpoint representation
  Global information of a sequence
  Nonambiguous and compact representation

TPMiner (Temporal Pattern Miner)
  Pattern-growth approach
    Without candidate generation and test
  Two components
    RPrefixSpan
    Pruning strategies

Page 15: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

15

Coincidence representation

Segment intervals into disjoint slices
  Four kinds of event slice: start slice (+), intermediate slice (*), finish slice (-), and intact slice (no mark)

Coincidence
  Slices occurring simultaneously

Space usage (for a k-pattern)
  Best: k, worst: 2k space

[Figure: event intervals A, B, C, D, E on a time axis, cut into coincidences]

Coincidence representation: (A+) (A- B+) (B-) (C+) (C* D) (C-) (E)

Page 16: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

16

Incision strategy

A data structure, endtime_list
  Sort and merge
  Trace endtime_list one by one

Event intervals (symbol, start time, finish time):
(A, 1, 4)
(B, 2, 5)
(C, 2, 8)
(D, 3, 5)
(E, 5, 7)

endtime_list after sort:
type  symbol  time
s     A       1
s     B       2
s     C       2
s     D       3
f     A       4
f     B       5
f     D       5
s     E       5
f     E       7
f     C       8

endtime_list after merge:
type  symbol  time
s     A       1
s     BC      2
s     D       3
f     A       4
f     BD      5
s     E       5
f     E       7
f     C       8

Coincidence representation, obtained by tracing endtime_list one by one:
(A+) (B+ C+) (A- D+) (B- D-) @ (E) (C-)
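The sort-and-merge step can be sketched as follows, using the five event intervals from this slide. The function name `build_endtime_list` and the tuple layout are assumptions made for illustration; the slide only names the endtime_list structure. The sketch also assumes that, at equal times, finish entries are ordered before start entries, which matches the order of the merged list on the slide.

```python
from itertools import groupby


def build_endtime_list(intervals):
    """Build a sorted, merged endtime list from (symbol, start, finish) triples.

    Returns (type, symbols, time) entries, where type is 's' (start) or
    'f' (finish) and symbols concatenates all events sharing that type and time.
    """
    entries = []
    for symbol, start, finish in intervals:
        entries.append((start, "s", symbol))
        entries.append((finish, "f", symbol))
    # Sort by time; at equal times 'f' sorts before 's', then by symbol.
    entries.sort()
    merged = []
    for (time, kind), group in groupby(entries, key=lambda e: (e[0], e[1])):
        symbols = "".join(symbol for _, _, symbol in group)
        merged.append((kind, symbols, time))
    return merged


intervals = [("A", 1, 4), ("B", 2, 5), ("C", 2, 8), ("D", 3, 5), ("E", 5, 7)]
for entry in build_endtime_list(intervals):
    print(entry)
```

Tracing the merged list entry by entry is what produces the coincidences; that tracing step is not reproduced here because the slide does not spell out its rules.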

Page 17: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

17

Endpoint Representation

Sequence of ordered time points of events
  +: start time, -: finish time

Nonambiguous

Space usage (for a k-pattern)
  2k space

[Figure: event intervals A, B, C, D on a time axis]

Endpoint representation: A+ (B+ C+) A- (B- C- D+) D-
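A minimal sketch of how an endpoint representation could be derived from raw intervals: every event contributes a start endpoint (+) and a finish endpoint (-), and endpoints that share a time point are grouped together. The function name and the list-of-lists output are illustrative assumptions; the input reuses the five intervals from the incision strategy slide.

```python
from itertools import groupby


def endpoint_representation(intervals):
    """Convert (symbol, start, finish) triples into time-ordered endpoint groups."""
    points = []
    for symbol, start, finish in intervals:
        points.append((start, symbol + "+"))
        points.append((finish, symbol + "-"))
    points.sort()
    # Endpoints occurring at the same time form one group.
    return [[label for _, label in group]
            for _, group in groupby(points, key=lambda p: p[0])]


intervals = [("A", 1, 4), ("B", 2, 5), ("C", 2, 8), ("D", 3, 5), ("E", 5, 7)]
sequence = endpoint_representation(intervals)
print(sequence)
# Every event appears exactly twice, so k events always yield 2k endpoints.
print(sum(len(group) for group in sequence))
```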

Page 18: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

18

Example Database

Page 19: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

19

TPMiner – RPrefixSpan (1/2)

Every item is disjoint
  The relations among slices are simple
  Before, equal, and after (like time-point data)

RPrefixSpan
  Borrows the idea of PrefixSpan
  Scan the local database to find frequent slices
  Append and extend the pattern
  Project the database

Pruning strategy
  Reduce the search space
  Pre-pruning and post-pruning
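The scan, extend, and project loop borrowed from PrefixSpan can be sketched as below. This is a deliberately simplified illustration, not RPrefixSpan itself: each sequence is a flat sequence of single items, coincidences (several slices at the same time) are not modelled, and no pruning is applied. The function name `prefixspan` is an assumption.

```python
from collections import Counter


def prefixspan(db, min_sup, prefix=()):
    """Yield (pattern, support) pairs found by recursive prefix projection."""
    # Scan the (projected) database: count each item once per sequence.
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    for item, support in sorted(counts.items()):
        if support < min_sup:
            continue
        # Append the frequent item to extend the current pattern.
        pattern = prefix + (item,)
        yield pattern, support
        # Project: keep the part of each sequence after the first occurrence.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        yield from prefixspan(projected, min_sup, pattern)


database = ["abd", "acd", "ad"]
for pattern, support in prefixspan(database, min_sup=3):
    print(pattern, support)
```

No candidate patterns are generated and tested against the full database; each recursion only looks at the projected postfixes, which is the pattern-growth property the slides rely on.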

Page 20: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

20

TPMiner – RPrefixSpan (2/2)

[Figure: RPrefixSpan flow. Scan database D to find the frequent items e1, e2, ..., ei, ..., en; transform the sequences and project D into D|e1, D|e2, ..., D|ei, ..., D|en; recursively project each database (D|e1..., D|e2..., ..., D|en...) while appending to and extending the pattern; collect all mined patterns as the frequent temporal patterns.]

Page 21: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

21

Pruning Strategy – Pre-pruning

Utilize the concept of slice and coincidence
  Start slices and finish slices occur in pairs
  Only the frequent finish slices that have the corresponding start slices in their prefixes need to be projected

[Figure: scanning the projected database D|A+ yields the frequent local slices A-, B+, B-, C. The candidate extensions are A+ A- (D|A+ A-), A+ B+ (D|A+ B+), A+ B- (D|A+ B-), and A+ C (D|A+ C). A+ B- is a non-qualified pattern, so this non-promising projection can be pre-pruned.]
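The pre-pruning rule can be sketched as a filter over the frequent local slices: a finish slice is kept only if the prefix contains its start slice and the event has not already finished. The function name, the string encoding of slices ("A+" for start, "A-" for finish, "C" for intact), and the flat-list prefix are assumptions made for illustration.

```python
def prune_extensions(prefix, frequent_slices):
    """Drop finish slices whose start slice is not open in the prefix.

    prefix is a flat list of slices such as ["A+"]; frequent_slices lists
    the frequent local slices found in the projected database.
    """
    started = {s[:-1] for s in prefix if s.endswith("+")}
    finished = {s[:-1] for s in prefix if s.endswith("-")}
    open_events = started - finished
    return [s for s in frequent_slices
            if not s.endswith("-") or s[:-1] in open_events]


# Prefix A+ with frequent local slices A-, B+, B-, C:
print(prune_extensions(["A+"], ["A-", "B+", "B-", "C"]))
```

With prefix A+, the slice B- is dropped because no B+ precedes it, so the projection for that extension is never built.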

Page 22: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

22

Pruning Strategy – Post-pruning

Utilize the concept of slice and coincidence
  A start slice always appears before its finish slice
  Only collect the significant postfixes
    With respect to a prefix, all finish slices in a significant postfix have corresponding start slices in the prefix

A coincidence database D:
S1: (B+)(D+)(E)(D-)(B-)
S2: (B+)(B- D+)(E)(D-)
S3: (B)(A)(D+)(E)(D-)
...

Projected database D|E:
S1: (D-)(B-)
S2: (D-)
S3: (D-)

These are insignificant sequences, so the projected database can be post-pruned.
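A minimal sketch of the post-pruning check, applied to the projected database for prefix (E) from this slide. One interpretation is assumed here and is not confirmed by the slides: a finish slice counts as matched if its start slice appears either in the prefix or earlier in the same postfix. The function names and the list-of-lists encoding of coincidences are illustrative.

```python
def is_significant(prefix, postfix):
    """Check that every finish slice in the postfix has a matching start slice."""
    open_events = set()
    for coincidence in prefix:
        for s in coincidence:
            if s.endswith("+"):
                open_events.add(s[:-1])
            elif s.endswith("-"):
                open_events.discard(s[:-1])
    for coincidence in postfix:
        for s in coincidence:
            if s.endswith("+"):
                # Assumption: a start inside the postfix also opens the event.
                open_events.add(s[:-1])
            elif s.endswith("-"):
                if s[:-1] not in open_events:
                    return False
                open_events.discard(s[:-1])
    return True


def post_prune(prefix, projected):
    """Keep only the significant postfixes of a projected database."""
    return {sid: p for sid, p in projected.items() if is_significant(prefix, p)}


projected = {"S1": [["D-"], ["B-"]], "S2": [["D-"]], "S3": [["D-"]]}
print(post_prune([["E"]], projected))
print(post_prune([["D+"], ["E"]], projected))
```

For the prefix (E) every postfix starts with an unmatched D-, so the whole projected database is discarded, which is the situation the slide illustrates.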

Page 23: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

23

Experimental Results (1/2)

[Figure: three plots against minimum support (%), decreasing from 1 to 0.5. Legend: H-DFS, ARMADA, TPrefixSpan, IEMiner, TPMiner-CR, TPMiner-ER.
(a) The performance of six algorithms: execution time (sec), dataset D200k – C40 – N10k.
(b) The number of temporal patterns: number of generated patterns, dataset D200k – C40 – N10k.
Memory usage (MB), dataset N10k – C20 – N10k.]

Page 24: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

24

Experimental Results (2/2)

[Figure: four plots of execution time (sec) against minimum support (%), decreasing from 1 to 0.5.
(a) Influence of the pre-pruning strategy: TPMiner-CR vs. TPMiner-CR without pre-pruning strategy.
(b) Influence of the post-pruning strategy: TPMiner-CR vs. TPMiner-CR without post-pruning strategy.
(c) Influence of the subset-pruning strategy: TPMiner-CR vs. TPMiner-CR without subset-pruning strategy.
(d) Influence of all proposed pruning strategies: TPMiner-CR vs. TPMiner-CR without any pruning strategy.]

Page 25: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

25

Related Applications

Page 26: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

26

Smart Home Application

[Figure: smart home system architecture, with five numbered stages.
(1) Sensor data log: appliance usage in the home (for example, air conditioner and light) is logged to a cloud database.
(2) Pattern Mining: usage patterns P1, P2, P3, … are mined from the log.
(3) Behavior Detection: the current behavior is compared with the usage patterns.
(4) Abnormal Detection.
(5) System Alarm & Remote Control: a home server raises alarms and supports remote control, switching devices (ID2, ID3, ID4, such as the light) on and off through a D-Link controller.]

Page 27: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

27

Dynamic Social Network (1/2)

Dynamic social network
  A sequence of interaction graphs
  Nodes and edges vary with time

A lossless transformation
  Graph sequence → interval sequence

[Figure: interaction graphs G1, G2, G3, G4, … over the nodes A, B, C, D, E, observed at times t1, t2, t3, …, and the resulting event sequence table with the columns SID, event symbol, start time, and finish time, including rows such as B (1, 3), C (1, 3), D (2, 4), and E (4, 6).]
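One way to picture the graph-sequence-to-interval-sequence transformation is to treat each edge as an event whose interval spans the consecutive snapshots in which the edge is present. This is a sketch under that assumption; the slides do not state whether event symbols stand for edges or nodes, and the function name and the inclusive finish time are illustrative choices.

```python
def graphs_to_intervals(snapshots):
    """Turn a list of edge sets (one per time step, starting at 1) into intervals.

    Returns sorted (edge, start, finish) triples, where finish is the last
    time step at which the edge was continuously present.
    """
    intervals = []
    open_since = {}
    for t, edges in enumerate(snapshots, start=1):
        # Normalise undirected edges so ("B", "A") equals ("A", "B").
        edges = {tuple(sorted(e)) for e in edges}
        for edge in list(open_since):
            if edge not in edges:
                intervals.append((edge, open_since.pop(edge), t - 1))
        for edge in edges:
            open_since.setdefault(edge, t)
    last = len(snapshots)
    for edge, start in open_since.items():
        intervals.append((edge, start, last))
    return sorted(intervals)


snapshots = [{("A", "B")}, {("A", "B"), ("B", "C")}, {("B", "C")}]
print(graphs_to_intervals(snapshots))
```

The transformation loses nothing in this setting: each snapshot can be rebuilt by collecting the edges whose intervals cover that time step.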

Page 28: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

28

Dynamic Social Network (2/2)

Reduce the complexity of graphs
Avoid isomorphism testing

Dynamic social network analysis
  Pattern mining
  Classification
  Recommendation system
  Network sampling
  Clustering

Page 29: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

29

Social Network Analysis

Page 30: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

30

Social Network Analysis

A graph representation
  Nodes and edges

Page 31: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

31

Influence Maximization

Page 32: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

32

Advertisement Budget

Advertisement spending on worldwide social networking sites
  2008: $23.3 million
  2010: $23.6 billion
  2011: almost $25.5 billion

Page 33: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

33

Influence Maximization

Word-of-mouth effect in social networks

Influence maximization problem
  Select initial users (seeds) so that the number of users that adopt the product or innovation is maximized

[Figure: seed selection in a social network]

Page 34: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

34

Motivation

Characteristic of social networks
  Community structure

Community and degree heuristic (CDH)
  Utilize community information
  Avoid influence overlapping

[Figure: example networks of numbered nodes grouped into communities]

Page 35: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

35

Proposed Algorithm – CDH

Framework of CDH

Page 36: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

36

CDH – Adjust Step

Adjust the selected fundamental nodes
  Seeds selected from a large community may activate more inactive nodes than seeds from a small community
  Replace the fundamental node of a small community if doing so activates more inactive nodes

Finally, output the result as the selected seed nodes

[Figure: communities C1, C2, C3, …, Ck. The largest-degree node in Ck is deleted and replaced by the second-largest-degree node in C1.]
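The adjust step can be sketched roughly as follows. Several things are assumed because the slides do not specify them: the fundamental node of a community is taken to be its largest-degree node, only one replacement is attempted, and the number of nodes a seed would activate is approximated by a caller-supplied `gain` function that defaults to node degree. The function name `select_seeds` is hypothetical.

```python
def select_seeds(communities, degree, gain=None):
    """Pick one seed per community, then try one replacement (adjust step).

    communities: list of node lists; degree: dict mapping node -> degree;
    gain: estimate of how many nodes a seed activates (assumed: its degree).
    """
    if gain is None:
        gain = lambda node: degree[node]
    ranked = [sorted(c, key=lambda n: degree[n], reverse=True)
              for c in communities if c]
    # Fundamental nodes: the largest-degree node of each community.
    seeds = [r[0] for r in ranked]
    largest = max(ranked, key=len)
    if len(largest) > 1 and len(seeds) > 1:
        # Candidate: second-largest-degree node of the largest community.
        candidate = largest[1]
        others = [s for s in seeds if s != largest[0]]
        weakest = min(others, key=gain)
        if gain(candidate) > gain(weakest):
            seeds[seeds.index(weakest)] = candidate
    return seeds


communities = [[1, 2, 3, 4], [5, 6]]
degree = {1: 5, 2: 4, 3: 2, 4: 1, 5: 1, 6: 1}
print(select_seeds(communities, degree))
```

In practice the gain would come from a diffusion model rather than raw degree; degree is used here only to keep the sketch self-contained.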

Page 37: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

37

Experimental Results - Facebook

Page 38: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

38

Dynamic Recommendation

Page 39: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Recommendation System
  Predicts the ratings or preferences
  Uses a model built from the characteristics

39

[Figure: example recommendations on (a) amazon.com and (b) youtube.com]

Page 40: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Collaborative Filtering (CF)
1. Calculate the similarity between the active user and the other users
   • Pearson’s correlation, cosine similarity, conditional probability, etc.
2. Predict the ratings of items that have not been rated by the active user
3. Output the top-k items according to the predicted ratings
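The three steps can be sketched with a small user-based collaborative filter. This is a generic textbook formulation, not necessarily the exact variant used in the talk: similarity is Pearson's correlation over co-rated items with each user's mean taken over all of that user's ratings, and the prediction is the active user's mean plus a similarity-weighted average of neighbour deviations. The rating data and all function names are made up for illustration.

```python
from math import sqrt


def mean(ratings):
    return sum(ratings.values()) / len(ratings)


def pearson(ra, rb):
    """Step 1: Pearson's correlation over the items both users rated."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    ma, mb = mean(ra), mean(rb)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = (sqrt(sum((ra[i] - ma) ** 2 for i in common))
           * sqrt(sum((rb[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0


def predict(ratings, active, item):
    """Step 2: predict the active user's rating of an unrated item."""
    ra = ratings[active]
    num = den = 0.0
    for user, r in ratings.items():
        if user == active or item not in r:
            continue
        w = pearson(ra, r)
        num += (r[item] - mean(r)) * w
        den += abs(w)
    return mean(ra) + (num / den if den else 0.0)


def top_k(ratings, active, k):
    """Step 3: return the k unrated items with the highest predictions."""
    unrated = {i for r in ratings.values() for i in r} - set(ratings[active])
    ranked = sorted(unrated, key=lambda i: predict(ratings, active, i), reverse=True)
    return ranked[:k]


ratings = {
    "A": {"i1": 4, "i2": 1, "i3": 4},
    "B": {"i2": 2, "i4": 4},
    "C": {"i1": 3, "i2": 3, "i4": 2},
}
print(predict(ratings, "A", "i4"))
print(top_k(ratings, "A", 1))
```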

40

Page 41: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

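Example user–item rating matrix over items i1, i2, i3, i4 (the last value in each row is the user's average rating; the column positions of the unrated items are not preserved):

user A: 4, 1, 4; avg. 3
user B: 2, 4; avg. 3
user C: 3, 3, 2; avg. 2

$w_{a,b} = \frac{(1-3)(2-3)}{\text{normalize}}$

$w_{a,c} = \frac{(1-3)(3-2) + (4-3)(3-2)}{\text{normalize}}$

$p_{a,i} = 3 + \frac{(4-3) \cdot w_{a,b} + (2-2) \cdot w_{a,c}}{\text{normalize}}$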

41

Page 42: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Motivation

Dynamic! Dynamic! Dynamic!

Why do we need dynamic methods?
  All things vary with time

Dynamic collaborative filtering
  Considers the influence of time in the calculation
  Without considering time, the prediction results might be out of date

42

Page 43: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Dynamic Similarity based on Collaborative Filtering (DSCF)

Rating log DB from t0 to t-1 (user -> item : rating (time)):
1 -> 1193 : 5 (2012.5.18)
5 -> 661 : 3 (2012.3.5)
3 -> 914 : 3 (2012.6.27)
1 -> 3408 : 4 (2012.3.18)
…

Rating log DB from t-1 to t (user -> item : rating (time)):
9 -> 6610 : 5 (2012.7.8)
2 -> 6610 : 3 (2012.7.15)
…

[Figure: a similarity matrix Msim is computed from each rating log. The matrix for t0 to t-1 is weighted by (1-α) when it is combined with the matrix for t-1 to t to give the matrix Msim for t0 to t.]
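The time-decayed similarity update can be illustrated with a small sketch. The exact combination rule is not legible in the transcript, so the form below is an assumption: the accumulated similarity is weighted by (1-α) and the similarity from the newest period by α. The function name and the dictionary-of-pairs representation are also illustrative.

```python
def decay_update(msim_old, sim_recent, alpha):
    """Blend accumulated and recent user-user similarities.

    Assumed rule: new = (1 - alpha) * old + alpha * recent, applied per pair.
    Pairs missing from one input are treated as similarity 0.0.
    """
    pairs = set(msim_old) | set(sim_recent)
    return {
        pair: (1 - alpha) * msim_old.get(pair, 0.0)
        + alpha * sim_recent.get(pair, 0.0)
        for pair in pairs
    }


msim_old = {("u1", "u2"): 0.8}
sim_recent = {("u1", "u2"): 0.2, ("u1", "u3"): 0.4}
print(decay_update(msim_old, sim_recent, alpha=0.25))
```

Applying the update once per period means that a similarity observed n periods ago is scaled by (1-α) n times, so older evidence fades gradually instead of being dropped at a fixed cutoff.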

43

Page 44: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Advanced DSCF
  α (similarity decay value, SDV) might not be consistent over time
  Each user might have his/her own SDV at different time points
  Feedback: compare the predicted values with the actual values

44

Page 45: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

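45

[Figure: prediction and feedback loop for the active user. Predict: the unknown rating p_{a,i} of the active user a for item i is computed from the ratings r of the k neighbours j = 1, …, k, with weights that combine similarity terms through factors involving (1-α) and msim_{a,j}. Recommend: items are recommended from the predictions. Feedback: the actual rating A_{a,i} is fed back, using a formula of the same form as the prediction.]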

Page 46: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

Experimental Results

46

Page 47: The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

47