Mining Trajectory Profiles for Discovering User Communities

Speaker: Chih-Wen Chang, National Chiao Tung University, Taiwan

2009.11.03

Chih-Chieh Hung, Chih-Wen Chang, Wen-Chih Peng

Outline

• Motivation
• Goal
• Framework
  – Preprocess
  – Construct User’s Profiles
  – Formulate Distance function
  – Identify Community
• Experiments
• Conclusion

Motivation (1/2)

• With the rapid development of positioning techniques, users can easily collect their trajectories
  – GPS loggers, smartphones, and navigation devices

Motivation (2/2)

• Many GPS community sites are established
  – Users can share their own trajectories
  – Users can search trajectories

[Figure: screenshots of the My Tracks and EveryTrail sites, with a trajectory query]

Goal

• Mine user communities from raw trajectories
  – User communities
    • Sets of users who have similar moving behaviors
• Applications
  – Find new friends
  – Recommendation
  – Rank of trajectories

[Figure: each user has a profile; the distance between users is measured on these profiles, and users are grouped into Community 1 and Community 2]

1. Construct user’s profile
2. Formulate distance function
3. Identify user communities

Framework

Preprocess

Construct User’s Profile

Measure Distance Between Users

Identify Community

Preprocessing

• Step 1: Find frequent regions
  – Input: all trajectories of users
  – Output: frequent regions
  – Density-based approach
• Step 2: Transform trajectories into sequences of frequent region ids
  – Example: T1 : <A, B, D>
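The two preprocessing steps can be sketched as follows. The slides only say "density-based approach", so this sketch substitutes a simple grid count: a cell is a frequent region when it holds at least `min_points` GPS points. The function names, `cell_size`, `min_points`, and the lettered region ids are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

def find_frequent_regions(trajectories, cell_size=1.0, min_points=3):
    """Step 1 (assumed grid variant): map dense grid cells to region ids."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size))
        for trajectory in trajectories
        for x, y in trajectory
    )
    dense_cells = sorted(cell for cell, n in counts.items() if n >= min_points)
    # Hypothetical naming: A, B, C, ... in sorted cell order (fine for a few regions).
    return {cell: chr(ord("A") + i) for i, cell in enumerate(dense_cells)}

def to_region_sequence(trajectory, regions, cell_size=1.0):
    """Step 2: replace points by region ids, skipping points outside any
    frequent region and collapsing consecutive repeats."""
    sequence = []
    for x, y in trajectory:
        region = regions.get((int(x // cell_size), int(y // cell_size)))
        if region is not None and (not sequence or sequence[-1] != region):
            sequence.append(region)
    return sequence

trajectories = [
    [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (5.5, 5.5)],
    [(0.4, 0.4), (1.2, 0.2), (1.7, 0.9)],
    [(0.5, 0.6), (1.1, 0.1)],
]
regions = find_frequent_regions(trajectories)
print(to_region_sequence(trajectories[0], regions))
```

A real density-based method would let region shapes follow the data instead of fixed cells; the grid is used here only to keep the example short.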

Construct User’s Profiles (1/2)

• User’s Profile
  – Probabilistic Suffix Tree (abbreviated as PST)
    • Find and organize trajectory patterns
    • Record the probability of next movements

[Figure: a PST; each node is a frequently moving sequence with a conditional table of next possible movements]

Construct User’s Profiles (2/2)

• Construct PST
  – Level by level
  – Two operations:
    • Create a child node
      – The count of the Before symbol > MinSup
    • Add a symbol into the related conditional table
      – The count of the After symbol > MinSup

Example (MinSup = 0.2), sequence: ABEABAACBADFHJHIEDH

[Figure: PST with root, first-level nodes A:0.5 and B:0.375, and child node AB:0.25]

Before symbol   A : 2   2/3 × 0.375 = 0.25
After symbol    A : 1   1/2 = 0.5
                E : 1   1/2 = 0.5

Node B

SID   Count   C. Prob.
A     1       0.5
E     1       0.5
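A minimal sketch of the level-by-level construction, under stated assumptions: a pattern's support is its occurrence count divided by the sequence length, a child node is created by prepending a "before" symbol to a kept node, and every observed "after" symbol goes into the conditional table without a further MinSup filter. The slides do not spell out these normalizations, so the function `build_pst` and its parameters are illustrative and will not necessarily reproduce the numbers in the figure.

```python
from collections import Counter

def build_pst(sequence, min_sup=0.2, max_depth=3):
    """Return {pattern: (support, {next_symbol: conditional_probability})}.

    `sequence` is a string of single-character region ids.
    """
    n = len(sequence)
    pst = {}
    kept = {""}  # the root (empty pattern) is always present
    for depth in range(1, max_depth + 1):
        counts = Counter(sequence[i:i + depth] for i in range(n - depth + 1))
        kept_here = set()
        for pattern, count in counts.items():
            support = count / n  # assumed normalization
            # The parent of a node is the pattern without its first ("before") symbol.
            if pattern[1:] not in kept or support <= min_sup:
                continue
            followers = Counter(
                sequence[i + depth]
                for i in range(n - depth)
                if sequence[i:i + depth] == pattern
            )
            total = sum(followers.values())
            table = {symbol: c / total for symbol, c in followers.items()}
            pst[pattern] = (support, table)
            kept_here.add(pattern)
        if not kept_here:
            break
        kept = kept_here
    return pst

for pattern, (support, table) in build_pst("ABABAB", min_sup=0.4, max_depth=2).items():
    print(pattern, round(support, 3), table)
```

Stopping as soon as a level adds no node keeps the tree small, which is the property the later storage comparison relies on.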

Formulate Distance function (1/3)

• Determine the distance between users
  1. Transform the PST into a Moving Sequence List
     – Each element in the moving sequence list is a branch of the PST, with the probability of each node

L1 [1..2] = <[(A,0.5)],[(B,0.375)(AB,0.33)]>
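The transformation can be sketched as enumerating root-to-leaf branches. This sketch assumes the PST is given as a plain mapping from pattern to probability, that the parent of a node is the pattern without its first symbol, and that branches are listed in sorted leaf order; the name `moving_sequence_list` and that ordering are assumptions.

```python
def moving_sequence_list(pst):
    """Turn {pattern: probability} into a list of branches.

    Each branch lists the (pattern, probability) pairs on the path from
    the first-level node down to a leaf.
    """
    # A node is a leaf when no other node has it as parent (pattern minus first symbol).
    leaves = [p for p in pst if not any(q[1:] == p for q in pst)]
    branches = []
    for leaf in sorted(leaves):
        # Suffixes of the leaf, shortest first, are exactly the nodes on its branch.
        path = [leaf[i:] for i in range(len(leaf) - 1, -1, -1)]
        branches.append([(node, pst[node]) for node in path])
    return branches

print(moving_sequence_list({"A": 0.5, "B": 0.375, "AB": 0.33}))
# [[('A', 0.5)], [('B', 0.375), ('AB', 0.33)]]
```

With the node probabilities from the slide, this yields the same two elements as L1 [1..2].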

Formulate Distance function (2/3)

2. Define the distance between PSTs
   – Find the minimal dist(Li[1..m], Lj[1..n])
   – Use three editing operations

• Insertion

  Before:  L1={m1:0.3,m2:0.2,m3:0.3}   L2={m1:0.3,m2:0.2}
  Insert:  L1={m1:0.3,m2:0.2,m3:0.3}   L2={m1:0.3,m2:0.2,m3:0.3}
  Cost = 0.3

Formulate Distance function (3/3)

• Deletion

  Before:  L1={m1:0.2,m2:0.3}   L2={m1:0.2,m2:0.3,m3:0.3}
  Delete:  L1={m1:0.2,m2:0.3}   L2={m1:0.2,m2:0.3,____}
  Cost = 0.3

• Replacement

  Before:   L1={m1:0.2,m2:0.2,m3:0.2}   L2={m1:0.2,m2:0.2,m4:0.3}
  Replace:  L1={m1:0.2,m2:0.2,m3:0.2}   L2={m1:0.2,m2:0.2,m3:0.2}
  Cost = 0.3 + 0.2 = 0.5

[Figure: the corresponding trees T1 and T2 for each operation]
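The three operations fit a standard edit-distance recurrence with probability-weighted costs. The sketch below reads the costs off the slide examples: inserting or deleting an element costs its probability, and replacing one element by another costs the sum of both probabilities. Treating each list as a flat sequence of (pattern, probability) pairs and charging |p − q| when the patterns match are assumptions; the slides only show matches with equal probabilities.

```python
def edit_distance(l1, l2):
    """Minimal total cost of turning list l2 into list l1.

    Each list is a sequence of (pattern, probability) pairs.
    """
    m, n = len(l1), len(l2)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + l1[i - 1][1]
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + l2[j - 1][1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            (a, p), (b, q) = l1[i - 1], l2[j - 1]
            # Same pattern: pay only the probability gap (assumption);
            # different patterns: replacement costs both probabilities.
            substitute = abs(p - q) if a == b else p + q
            dist[i][j] = min(
                dist[i - 1][j] + p,        # element of l1 missing in l2 (insertion)
                dist[i][j - 1] + q,        # extra element in l2 (deletion)
                dist[i - 1][j - 1] + substitute,
            )
    return dist[m][n]

l1 = [("m1", 0.2), ("m2", 0.2), ("m3", 0.2)]
l2 = [("m1", 0.2), ("m2", 0.2), ("m4", 0.3)]
print(round(edit_distance(l1, l2), 3))  # replacement example from the slide
```

On the three slide examples this recurrence gives 0.3, 0.3, and 0.5 respectively.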

Identify Community (1/4)

• User community
  – The same community: δMLS(Ti,Tj) < thresholdδ
  – The number of communities is minimal
• Transform the relation between PSTs into a graph
  – A vertex represents a user
  – An edge exists between two vertices when δMLS(Ti,Tj) < thresholdδ

[Figure: example graph with vertices O1, O2, O3, O4, O5]

Identify Community (2/4)

• Model as a minimum clique problem
  – A clique is a set of pair-wise adjacent vertices

[Figure: example graph with vertices O1, O2, O3, O4, O5]
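The graph construction and the grouping into cliques can be sketched as below. Covering a graph with the fewest cliques is NP-hard in general, so this sketch uses a simple greedy heuristic that does not guarantee the minimal number of communities; the slides do not say which algorithm the authors use. The function names and the sample distances are hypothetical.

```python
def build_graph(distances, threshold):
    """distances: {(user_a, user_b): distance}. Edge when distance < threshold."""
    users = sorted({user for pair in distances for user in pair})
    adjacent = {user: set() for user in users}
    for (a, b), d in distances.items():
        if d < threshold:
            adjacent[a].add(b)
            adjacent[b].add(a)
    return adjacent

def greedy_clique_partition(adjacent):
    """Split the vertices into cliques, one community per clique (heuristic)."""
    unassigned = set(adjacent)
    communities = []
    while unassigned:
        # Seed with the unassigned vertex that has the most unassigned neighbours.
        seed = max(sorted(unassigned), key=lambda u: len(adjacent[u] & unassigned))
        clique = {seed}
        for candidate in sorted(adjacent[seed] & unassigned):
            # Keep the clique property: candidate must be adjacent to every member.
            if all(candidate in adjacent[member] for member in clique):
                clique.add(candidate)
        communities.append(clique)
        unassigned -= clique
    return communities

distances = {
    ("O1", "O2"): 0.10, ("O1", "O3"): 0.20, ("O2", "O3"): 0.15,
    ("O3", "O4"): 0.90, ("O4", "O5"): 0.10,
}
print(greedy_clique_partition(build_graph(distances, threshold=0.3)))
```

Because every pair inside a clique is adjacent, every pair of users in a returned community satisfies the distance threshold.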

Identify Community (3/4)

• Select a representative PST for each community
  – Represent all PSTs in the same community
  – Advantages
    • Reduce the storage overhead
    • Speed up query processing
    • Identify the communities of new users

[Figure: a new user’s PST is compared with the representative PSTs to decide which community to add it into]

Identify Community (4/4)

• Two factors
  1. Size of the representative PST
     ▪ The number of tree nodes, denoted as N(Ti)
  2. Distance between the selected PST and the others in the same community
     ▪ The error sum, denoted as ES
       - Sum of the distances between the selected PST and the others
• Representative PST
  – Minimize
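The two factors can be computed per candidate as sketched below. The objective that the slide minimizes is not preserved in this transcript, so the sketch takes the combination as a caller-supplied `score` function; the one used in the example (`n + 10 * es`) is purely hypothetical, as are the function names and sample values.

```python
def error_sum(candidate, members, distance):
    """ES: sum of distances between the candidate and the other members."""
    return sum(distance(candidate, other) for other in members if other != candidate)

def select_representative(members, size, distance, score):
    """Pick the member minimizing score(N(Ti), ES) for a caller-chosen score."""
    return min(
        members,
        key=lambda t: score(size(t), error_sum(t, members, distance)),
    )

# Hypothetical community of three PSTs, identified by name.
sizes = {"T1": 5, "T2": 3, "T3": 4}
pair_distance = {
    frozenset(("T1", "T2")): 0.2,
    frozenset(("T1", "T3")): 0.4,
    frozenset(("T2", "T3")): 0.1,
}
best = select_representative(
    members=["T1", "T2", "T3"],
    size=lambda t: sizes[t],
    distance=lambda a, b: pair_distance[frozenset((a, b))],
    score=lambda n, es: n + 10 * es,  # assumed trade-off, not from the slides
)
print(best)
```

Keeping the score pluggable avoids committing to a particular weighting between tree size and error sum.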

Experiments (1/4)

• Simulator Model
  – Use real trajectories from CarWeb to simulate the group mobility of users
    • Total: 2400 trajectories

Experiments (2/4)

• Compare to the Generalized Sequential Pattern mining algorithm (GSP)
  – Set of sequential patterns, e.g. sp1, sp2, ..., spn
  – Trajectory profile of a user represented as a vector Vi over these patterns
  – Distance function between profiles
    • Cosine similarity measurement:

      \[ \mathrm{similarity}(V_i, V_j) = \frac{V_i \cdot V_j}{|V_i|\,|V_j|} \]

    • Example:

      \[ \frac{\langle 1,1,0,0\rangle \cdot \langle 0,1,1,1\rangle}{|\langle 1,1,0,0\rangle|\,|\langle 0,1,1,1\rangle|} = \frac{1}{\sqrt{2}\,\sqrt{3}} \]
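The baseline profile and its similarity measure can be sketched in a few lines. The helper `profile_vector`, which marks with 1 the patterns a user exhibits, is an assumption about how the vectors are built; the slide only shows the resulting binary vectors.

```python
import math

def profile_vector(all_patterns, user_patterns):
    """Binary vector: 1 where the user exhibits the sequential pattern."""
    return [1 if pattern in user_patterns else 0 for pattern in all_patterns]

def cosine_similarity(v_i, v_j):
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # convention for an empty profile (assumption)
    return dot / (norm_i * norm_j)

patterns = ["sp1", "sp2", "sp3", "sp4"]
v1 = profile_vector(patterns, {"sp1", "sp2"})
v2 = profile_vector(patterns, {"sp2", "sp3", "sp4"})
print(cosine_similarity(v1, v2))  # 1 / (sqrt(2) * sqrt(3)), about 0.408
```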

Experiments (3/4)

• Impact of Trajectory Profiles
  – GSP profiles are always larger than PST profiles, especially when MinSup is smaller than 0.15

[Figure: two panels, Storage and Prediction]

Experiments (4/4)

• Impact of the thresholdδ and MinSup
  – A smaller thresholdδ yields a larger number of communities

[Figure: two panels, Storage and Prediction]

Conclusion

• Explore the problem of mining communities from trajectories
  – Preprocess
    • Find frequent regions
    • Replace trajectories by region ids
  – Construct User’s Profile
    • Build probabilistic suffix tree (abbreviated as PST)
  – Measure Distance Between Users
    • Formulate distance function
  – Identify Community
    • Cluster users by distance function
    • Select representative PSTs

THANK YOU!
