

1041-4347 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2016.2637898, IEEE Transactions on Knowledge and Data Engineering

TRANSACTIONS ON KNOWLEDGE DISCOVERY AND ENGINEERING 1

Trajectory Community Discovery and Recommendation by Multi-source Diffusion Modeling

Siyuan Liu, Shuhui Wang

Abstract—In this paper, we detect communities from trajectories. Existing algorithms for trajectory clustering usually rely on a simplex representation and a single proximity-related metric. Unfortunately, additional information markers (e.g., social interactions or semantics in the spatial layout) are ignored, which makes it impossible to fully discover the communities in a trajectory database. This is especially true for human-generated trajectories, where additional fine-grained markers (e.g., movement velocity at certain locations, or the sequence of semantic spaces visited) are especially useful in capturing latent relationships among community members. To overcome this limitation, we propose TODMIS, a general framework for Trajectory-based cOmmunity Detection by diffusion modeling on Multiple Information Sources. TODMIS combines additional information with raw trajectory data and constructs the diffusion process on multiple similarity metrics. It also learns consistent graph Laplacians by constructing the multi-modal diffusion process and optimizing the heat kernel coupling on each pair of similarity matrices from multiple information sources. Then, dense sub-graph detection is used to discover the set of distinct communities (including community size) on the coupled multi-graph representation. Finally, based on the community information, we propose a novel model for online recommendation. We evaluate TODMIS and our online recommendation methods using different real-life datasets. Experimental results demonstrate the effectiveness and efficiency of our methods.

Index Terms—Community detection, trajectory, multiple information sources, semantic information.


1 INTRODUCTION

Community detection is essential for social behavior analysis and recommendation. Previous literature focuses on community detection in social connections and graph partitioning. In practice, however, privacy concerns usually make it very hard to capture connections in human society. Instead, we are able to capture human trajectories with Wi-Fi and GPS devices. The question is how to detect communities in trajectories. Intuitively, community detection is usually achieved by clustering. The objective of trajectory clustering is to identify clusters from a set of trajectories of moving objects, where the trajectories in a specific cluster exhibit similarity in one or more movement-related features [42], [16]. Examples of trajectory data include vehicle position data, animal movement data and human behavior tracking data. Consequently, there is an increasing interest in performing data mining and behavior analysis over such trajectory datasets [9].

Our first interest lies in developing an efficient and effective trajectory-based grouping algorithm that can accommodate various latent attributes embedded within the raw trajectory data. We are motivated by a couple of important use cases:

1) Social Recommendation: Knowing the size of a group of people visiting a mall or browsing through a store together will allow merchants to tailor discounts and promotions specifically targeted to the group.

2) Online and Offline Behavior Analysis: By combining knowledge of the real-world physical inter-connections among people with their online social media data, computational social scientists may be able to create versatile methodologies of human social interaction.

• Siyuan Liu is with Smeal College of Business, Pennsylvania State University.

• Shuhui Wang is with Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences.

In this paper, we study community discovery from trajectories, which aims to identify groups of objects from trajectory data based on additional behaviorally-driven markers of individual and collective movement. The difference between a cluster and a community (or group) is that a cluster is a set of objects related purely through spatial proximity, whereas a community is a set of objects whose proximity or movement similarity is likely a manifestation of some underlying mutual interaction or shared relationship.

1.1 Prior Literature and Research Challenge

Previous studies normally perform trajectory clustering based on only a single information source, such as location data [8], [16]. When shared location is viewed as the sole determinant of community relationship, real relationships may be missed and non-existent communities may be falsely identified. In the social graph community detection literature [17], [36], [13], a community is usually defined over a link-based graph capturing direct pair-wise interactions; such explicit interaction markers are hard to obtain directly in many practical environments due to privacy concerns or technological limitations. Hence, we focus on inferring groups based on trajectory-related information (e.g., spatial dispersion, temporal duration, movement velocity) of individual users and the semantic information of the space.


In our study, we assume that trajectories are generated and observed in physical spaces, with an individual’s movement in the space being encoded by multiple information markers. For example, the trajectories in a mall contain information on what kind of stores the customers visit, how long they stay, the transition likelihood between stores, and how fast they walk. The key observation is that it is hard to detect the real community-driven behavior from analysis performed over a single timescale or over a single feature. Indeed, apparently unrelated trajectories may turn out to have strong similarity when viewed at certain timescales or in terms of certain features (information markers). Therefore, our approach is to develop a unified framework for community discovery, given a set of trajectories and semantic information of the sites visited by those trajectories. This unified framework leverages the different information markers embedded within the basic trajectory data.

Furthermore, when choosing features for trajectory analysis, previous approaches usually leverage information from an individual trajectory or from long-term movement trends, but not both. We propose to encode both the global statistical information and the individual information into the semantic feature, and then extract similarity measures through the use of kernels that operate on the probability distribution of semantic-level movement. This approach avoids the “curse of dimensionality” and overcomes the difficulty in measuring the similarity among features of unequal length. More importantly, it jointly models the complicated inter-connections of trajectories among members of a community.

1.2 Our Approach and Contribution

In this paper, we propose an approach, Trajectory-based cOmmunity Discovery using Multiple Information Sources (TODMIS). TODMIS consists of three distinct phases:

• An explicit modeling of trajectory similarities along four distinct dimensions (or information markers): semantic properties of the locations, temporal duration of the trajectory, spatial proximity to other objects and movement velocity at different timescales. The similarities are computed by applying appropriate kernels for each dimension to extract the key relevant features.

• A multi-modal diffusion process construction and a computation of a weighted similarity measure that linearly combines the coupled similarity measures along each dimension.

• The application of conventional graph clustering techniques (dense sub-graph detection algorithms), over a graph with edges measuring the true pair-wise similarity score, to identify the unknown number of different communities (of varying sizes).

Based on the detected communities, we observe that similar trajectories reveal similar user interests, which can be used for social recommendation. Thus, in our work, we propose an online recommendation method based on trajectory community information.

Our key contributions are summarized as follows.

1) Multi-Dimensional Diffusion Process for Community Detection: We propose a new unified model for trajectory-based community detection using multiple dimensions. Our approach is the first to model the behavior of different dimensions by a multi-dimensional diffusion process. With the pair-wise heat kernel coupling process, the similarity matrices of different dimensions are jointly learned to maximize the consistency of all the diffusion processes. Different dimensions are linearly combined into a single multi-attribute weighted similarity score for each pair of objects. Applying the model to different application scenarios requires only the tuning of the relative weights for different dimensions.

2) Novel Similarity Metrics: First, we propose a model for extracting the semantic features from the trajectories. This feature measures the stationary distribution of the object’s residency probability on different semantic sites (e.g., types of stores in a mall), on which a semantic kernel is applied to determine semantic similarity between trajectories. Second, to compute the spatial proximity of trajectory pairs under realistic conditions (such as inaccurate location measurements and highly-crowded indoor spaces), we modify Global Alignment Kernels [6] to incorporate the inverse of crowd density (at a particular location). Third, we design a novel velocity-based similarity measure that computes velocity at multiple temporal resolutions (both coarse and fine-grained), which enables us to measure the velocity similarity on both the whole trajectory and different partial trajectories. This approach allows us to go beyond spatial proximity and explicitly incorporate proximity measures under different movement rates (e.g., stationary, moving slowly or fast).

3) Online Recommendation via Trajectory Community: First, we propose a memory-based online recommendation model using the detected trajectory communities. The recommendation process averages user ratings weighted by a similarity obtained from the detected trajectory communities. Second, based on the activity level of users and the distribution of their trajectories, we propose strategies that generate different candidate sets of users for recommendation, each suited to a different emphasis.

4) Experiments on Multiple Real-life Datasets: We evaluate TODMIS and our online recommendation methods against prior trajectory-based clustering, link-based community detection and recommendation techniques on three distinct datasets: customer behavior in a shopping mall, student behavior in a campus building and taxi driver behavior in a city. Experimental results demonstrate that our approaches accurately discover real grouping behaviors, recommend the information of greatest interest to the target objects in all three cases, and outperform existing algorithms.

2 TODMIS OVERVIEW

First, we define a trajectory database $X = [x_1, ..., x_k, ..., x_N]^\top$, where each trajectory $x_k$ is a structured element with $n_k$ sequential points, each point containing information from multiple aspects, namely, the two-dimensional spatial coordinate sequence $c_k$, the site annotation of the sequence $s_k$, the time marker $e_k$ and the velocity $v_k$ associated with each spatial coordinate. Second, we have the following problem definition.

Problem definition: Given a set of trajectories, we aim to discover a set of trajectory groups, which we call communities in this paper. The trajectories in a given community demonstrate similar behaviors when evaluated under different measures (both spatial and semantic).

Definition 1. Given $N$ trajectories, we define a trajectory affinity graph $G(\alpha)$ with $N$ vertices, denoted by $x_i$, $i = 1, ..., N$. The weight of the edge between two vertices represents the similarity of the two trajectories.

The similarity between the vertices in the graph is a weighted combination of a set of similarities calculated from multiple information sources, i.e., the semantic kernel $K_s$, the temporal kernel $K_\beta$, the spatial kernel $K_p$ and the velocity kernel $K_h$. We denote the weighted combined similarity matrix by $K$, where each element $K(x_k, x_{k'})$ is the similarity of a pair of vertices in the graph. For the sake of simplicity, let $K(k, k')$ denote $K(x_k, x_{k'})$ in this paper. It is defined as:

$$K(k, k') = \sum_{m=1}^{M_K} \alpha_m K_m(k, k'), \quad \alpha_m \ge 0, \quad \sum_{m=1}^{M_K} \alpha_m = 1,$$
$$K(k, k) = 1, \quad k, k' = 1, ..., N \tag{1}$$

where $K_m(\cdot, \cdot)$ denotes the element of one type of kernel (similarity) from the kernel set. $K(k, k) = 1$ denotes that a trajectory has the maximum similarity with itself. The weights $\alpha_m$ are pre-assigned and reflect the specific interests of the problem domain.

For example, when one prefers to find trajectories with similar semantic, temporal and spatial relations, with an emphasis on the semantic relation, the weights can be: $\alpha_1 = 1/2$, $\alpha_2 = 1/4$, $\alpha_3 = 1/4$, $\alpha_4 = 0$. To acquire the best performance of community discovery, we can fine-tune the kernel weights $\alpha$ on a given validation set.
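The weighted combination in Eq. (1) can be illustrated with a minimal Python sketch. The function names `combine_kernels` and `toy_kernel` and all numeric kernel values are hypothetical and chosen only for illustration; the weights are the example weights from the text, and the kernels are assumed to be ordered as semantic, temporal, spatial, velocity.

```python
import numpy as np

def combine_kernels(kernels, alpha):
    """Weighted combination K = sum_m alpha_m * K_m from Eq. (1)."""
    alpha = np.asarray(alpha, dtype=float)
    if len(kernels) != alpha.size:
        raise ValueError("need exactly one weight per kernel")
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(a * np.asarray(K_m, dtype=float) for a, K_m in zip(alpha, kernels))

def toy_kernel(off_diagonal):
    """Hypothetical 3x3 similarity matrix with unit diagonal (toy data only)."""
    K = np.full((3, 3), off_diagonal, dtype=float)
    np.fill_diagonal(K, 1.0)
    return K

# Assumed order: semantic, temporal, spatial, velocity (toy values, not from the paper).
K_s, K_beta, K_p, K_h = toy_kernel(0.8), toy_kernel(0.4), toy_kernel(0.6), toy_kernel(0.1)
alpha = [1 / 2, 1 / 4, 1 / 4, 0]  # the example weights from the text
K = combine_kernels([K_s, K_beta, K_p, K_h], alpha)
```

Because every input kernel has a unit diagonal and the weights sum to 1, the combined matrix also satisfies $K(k, k) = 1$. Setting $\alpha_4 = 0$ simply removes the velocity kernel from the combination.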

Trajectory community discovery framework: Our algorithm consists of four phases: 1) modeling the similarity of trajectories by applying kernels along four dimensions: semantic-level movement, temporal, spatial (proximity) and velocity; 2) performing multi-view heat kernel coupling to learn the aligned kernels; 3) deriving an overall similarity measure between every pair of trajectories in $G(\alpha)$ through a weighted combination of multiple information sources; and 4) detecting trajectory communities and their features from these similarity values. The framework is illustrated in Figure 1. Based on this framework, the problem of trajectory-based community discovery is equivalent to the computation of cliques on $G(\alpha)$. Note that, unlike the traditional trajectory clustering approach, each trajectory is not necessarily assigned a community ID, as some trajectories in real-life data show unique motion patterns that are totally distinct from the rest. We refer to these trajectories without any community ID as “outliers” in our study.

3 MODELING SEMANTIC INFORMATION

Given the trajectory database $X$, we extract the sequential site-visiting description of each individual user, where the number of spatially distinct sites is $M$. One site corresponds to one place (e.g., a jewellery shop or a point of interest) in the physical world, denoted by a polygon region on the map. A trajectory point is on a site when the point lies inside the site's polygon. We skip the technical details of site recognition [28] since it is not the focus of this paper.

Given $M$ distinct sites, each trajectory can be represented by a site transition sequence with the corresponding temporal intervals. For example, the site visiting record of trajectory $k$ is $s_1 \to s_3 \to s_8 \to s_4$. Let $t_k(s_a)$ denote the time the $k$-th trajectory spends at site $s_a$, and let $t_k(s_a, s_b)$ denote the time taken to move from site $s_a$ to site $s_b$. Then the temporal interval vector of the $k$-th trajectory is recorded as $[t_k(1), t_k(1,3), t_k(3), t_k(3,8), t_k(8), t_k(8,4)]$.

We measure the traversal statistics of the sites on $X$, and use these statistics to measure the semantic correlation of user trajectories. To this end, we construct the Markov state transition matrix $A \in \mathbb{R}^{M \times M}$, where $A(s_a, s_b)$ represents the overall transition probability from site $s_a$ to $s_b$. Note that $A(s_c, s_c)$ represents the probability of staying at $s_c$. The matrix $A$ represents the overall state transition probability on a given trajectory database. One major concern is how to calculate the entries $A(s_a, s_b)$ so that they capture the site transition behavior in big trajectory data well. We propose three solutions accordingly. Note that in each case, $A$ is initialized as a zero matrix.

Transition frequency accumulation. This is the simplest strategy: it approximates the transition probability by accumulating transition frequencies. We collect all the transition pairs from the whole trajectory database. Then we count the occurrence frequency of each transition pair by scanning the database. For example, if we have $s_1 \to s_3$, then $A(1,3) = A(1,3) + 1$. Finally, after we finish the transition frequency counting on the whole database, we row-normalize $A$ to ensure $\sum_{s_b} A(s_a, s_b) = 1$. The row normalization turns the transition frequencies in $A$ into transition probabilities.

Modeling temporal interest and convenience. In many applications, the temporal interval spent at a site or during a site transition may indicate the corresponding level of interest, or illustrate the convenience of moving from site $s_a$ to site $s_b$, showing the semantic relation of the two sites. Each trajectory contains the time spent at each site (denoting the user’s interest in the site, referred to as Interest) and the time taken to move between sites (denoting the convenience between sites, referred to as Convenience). For a given trajectory, Interest and Convenience usually interact with each other. If one spends more time in transition, he/she usually spends less time at each site, since the total shopping time is more or less fixed. Both Interest and Convenience should therefore be considered in the computation of matrix $A$, as neither of them alone represents the complete information of a trajectory. This resembles a Markov random walk with self-loops.

To calculate $A$ by considering temporal intervals, we first normalize the time interval vectors of all user trajectories so that the total time taken for each trajectory is 1. For example, we normalize the $k$-th vector so that $t_k(1) + t_k(1,3) + t_k(3) + t_k(3,8) + t_k(8) + t_k(8,4) = 1$. Second, accumulation is done on the normalized time interval vectors of the database. For example, if we have a transition pair $(s_1, s_3)$ with $t_k(1,3)$, then $A(1,3) = A(1,3) + t_k(1,3)$. We proceed similarly for the time spent at the sites, say $s_1$ and $s_5$, where $A(1,1) = A(1,1) + t_k(1)$ and $A(5,5) = A(5,5) + t_k(5)$. Finally, we normalize each row of $A$ to ensure $\sum_{s_b} A(s_a, s_b) = 1$.

[Figure 1 (block diagram): Trajectories; Semantic kernel, Temporal kernel, Spatial kernel, Velocity kernel; Heat kernel coupling; Weighted combination; Dense sub-graph detection.]

Fig. 1. The framework of the proposed approach (TODMIS). TODMIS consists of four phases: 1) modeling the similarity of trajectories by applying kernels along four dimensions: semantic-level movement, temporal, spatial (proximity) and velocity; 2) performing multi-view heat kernel coupling to learn the aligned kernels; 3) deriving an overall similarity measure between every pair of trajectories through a weighted combination of multiple information sources; 4) detecting trajectory communities from these similarity values.
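The two accumulation strategies described above (transition frequency and normalized time) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the trajectory layout (a dict with `sites`, `stay` and `move` keys, using 0-based site indices) is hypothetical, and leaving rows without any observed mass as all-zero is an assumption that the text does not address.

```python
import numpy as np

def row_normalize(A):
    """Scale each row to sum to 1; rows with no mass stay all-zero (an assumption)."""
    sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, sums, out=np.zeros_like(A), where=sums > 0)

def transition_matrix(trajectories, num_sites, use_time=False):
    """Build A by frequency accumulation or by normalized-time accumulation.

    Each trajectory is a hypothetical dict with 0-based site indices:
      'sites': visited sites in order,
      'stay':  time spent at each visited site,
      'move':  time taken between consecutive sites (len(sites) - 1 entries).
    """
    A = np.zeros((num_sites, num_sites))
    for traj in trajectories:
        sites = traj["sites"]
        if not use_time:
            for a, b in zip(sites[:-1], sites[1:]):
                A[a, b] += 1.0                      # A(a, b) = A(a, b) + 1
            continue
        stay = np.asarray(traj["stay"], dtype=float)
        move = np.asarray(traj["move"], dtype=float)
        total = stay.sum() + move.sum()             # total time is normalized to 1
        for a, t in zip(sites, stay / total):
            A[a, a] += t                            # stay time feeds the self-loop
        for a, b, t in zip(sites[:-1], sites[1:], move / total):
            A[a, b] += t                            # transition time feeds A(a, b)
    return row_normalize(A)

# Toy trajectories (made-up values).
trajs = [
    {"sites": [0, 2], "stay": [2.0, 1.0], "move": [1.0]},
    {"sites": [0, 1], "stay": [1.0, 1.0], "move": [2.0]},
]
A_freq = transition_matrix(trajs, num_sites=3)
A_time = transition_matrix(trajs[:1], num_sites=3, use_time=True)
```

In the frequency variant the diagonal of $A$ stays zero unless a trajectory records a site-to-itself transition, whereas the time-based variant fills the diagonal with stay times, which is what gives the random walk its self-loops.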

Dealing with site uncertainty. In many indoor scenes, the sites are very close to each other. Due to the noise of GPS sensors, the location information may be inaccurate. Given an estimated location center and its location error circle, users may actually be staying at any of the multiple sites covered by the uncertainty range, with different probabilities. Therefore, we propose a probabilistic weighted accumulation strategy to deal with site uncertainty. The probabilistic weighting scheme is shown in Fig. 2, where the crosses denote the estimated location centers, and the dashed circles represent the uncertainty of the signals (localization error bounds). We use Nadaraya-Watson kernel regression with the Gaussian kernel to approximate the probabilistic site membership, which we denote by $o$, according to the distances $\theta$ of the sites to the estimated location center. For example, the probabilistic site membership of Coordinate 1 for sites $a$, $b$ and $c$ in Fig. 2 is:

$$o_a = \frac{\exp(-\theta_a^2/\sigma)}{\sum_i \exp(-\theta_i^2/\sigma)}, \quad o_b = \frac{\exp(-\theta_b^2/\sigma)}{\sum_i \exp(-\theta_i^2/\sigma)}, \quad o_c = \frac{\exp(-\theta_c^2/\sigma)}{\sum_i \exp(-\theta_i^2/\sigma)},$$
$$o_a + o_b + o_c = 1. \tag{2}$$

where $\sigma$ denotes the uncertainty range of the sensor. Similar calculations can be done for Coordinate 2. Given a single transition from Coordinate 1 to Coordinate 2 for User $k$ with transition time denoted by $t_k(c_1, c_2)$, we conduct a set of probabilistic weighted accumulations as follows:

$$A(s_i, s_j) = A(s_i, s_j) + t_k(c_1, c_2)\, o_i\, o_j, \quad i \in \{a, b, c\}, \; j \in \{d, e, f\}. \tag{3}$$
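Eqs. (2) and (3) can be sketched in Python as follows. The function names, the site indices, the distances and the value of $\sigma$ are all hypothetical toy choices; the sketch only mirrors the two formulas and does not include the final row normalization of $A$.

```python
import numpy as np

def site_membership(theta, sigma):
    """Soft membership o_i = exp(-theta_i^2 / sigma) / sum_j exp(-theta_j^2 / sigma), Eq. (2)."""
    theta = np.asarray(theta, dtype=float)
    w = np.exp(-theta ** 2 / sigma)
    return w / w.sum()

def accumulate_uncertain(A, sites_from, o_from, sites_to, o_to, t):
    """In-place update A(s_i, s_j) += t * o_i * o_j over all candidate site pairs, Eq. (3)."""
    A[np.ix_(sites_from, sites_to)] += t * np.outer(o_from, o_to)
    return A

sigma = 2.0                                        # assumed uncertainty range
o1 = site_membership([1.0, 1.0, 1.0], sigma)       # Coordinate 1: sites a, b, c equally far
o2 = site_membership([0.5, 1.5, 2.0], sigma)       # Coordinate 2: site d is the closest
A = np.zeros((6, 6))                               # sites a..f mapped to indices 0..5
accumulate_uncertain(A, [0, 1, 2], o1, [3, 4, 5], o2, t=0.25)
```

Since both membership vectors sum to 1, the total mass added to $A$ equals the transition time $t_k(c_1, c_2)$; the soft assignment only spreads that mass over the candidate site pairs.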

Discussion. We adopt Eqn. 2 and 3 to calculate $A$, and ensure that $A$ is row normalized after scanning the whole database. By normalizing the total time spent on each trajectory to 1 with respect to the site stay/transition behavior and location inaccuracy, our model is able to balance the influence of trajectories with significantly different lengths and time spans. By normalizing the rows of $A$, our model is able to balance the influence of the most visited sites and the less visited sites in the Markov random walk process. The resulting $A$ reflects user interests, transition convenience and semantic relations among sites by modeling the uncertainty in both time spent and site membership.

[Figure 2: two estimated location centers (crosses), Coordinate 1 and Coordinate 2; sites a, b, c at distances $\theta_a$, $\theta_b$, $\theta_c$ from Coordinate 1 and sites d, e, f at distances $\theta_d$, $\theta_e$, $\theta_f$ from Coordinate 2.]

Fig. 2. Location uncertainty. A soft assignment of the coordinates to the sites can be used to alleviate the negative effect of hard assignment to the nearest sites.

Representative distribution of a user’s site visits: Given a user trajectory and $A$, we calculate the stable distribution, which represents the probability of the user appearing at each site. We denote this stable distribution for the $k$-th trajectory by $r_k \in \mathbb{R}^M$. We collect the set of sites where the user appears, and count the number of visits. For example, by considering the temporal interval information, we calculate the personalized appearance vector of trajectory $k$: $\rho^o_k = [t_k(1), 0, t_k(3), t_k(4), 0, 0, 0, t_k(8), 0, 0, ...]^\top$, $\rho^o_k \in \mathbb{R}^M$.

Furthermore, the values of $\rho^o_k$ can be represented in a probabilistic form based on the probabilistic site membership $o$ to deal with location inaccuracies. This extension is straightforward and omitted here. We normalize $\rho^o_k$ over all of its elements to ensure $\sum_i \rho^o_k(i) = 1$. Finally, for each trajectory, we apply an iterative process to calculate the stationary distribution $r_k$:

1. FOR $t = 1$ to $T_A$: $r_k = \eta A^\top \cdot r_k + (1 - \eta) \rho^o_k$

2. Normalize $r_k$ so that $\sum_{m=1}^{M} r_k(m) = 1$.

The process is inspired by the famous personalized PageRank method in [12]. $A^\top$ represents the transpose of matrix $A$. The term $A^\top \cdot r_k$ represents the global site transition information of all trajectories, $\rho^o_k$ represents the local individual information of each trajectory, and $\eta$ is the weight between them. To analyze the property of $r_k$, we have the following lemma.

Lemma 1. Each semantic feature $r_k$ converges to a unique analytical solution when $T_A \to \infty$.¹

¹ Typically, setting $T_A$ to 20 comfortably guarantees convergence.


Remark: If rank(A) = M − 1 and 0 ≤ η < 1, for any εsatisfying ||ρok − ρok′ ||2 ≤ ε, then there exists δ so that ||rk −rk′ ||2 ≤ δ.

The proof is provided in our conference version [23]. From the remark we see that, if the personalized patterns $\rho^o_k$ and $\rho^o_{k'}$ are similar, their semantic features will be similar as well. By using $r_k$ as the stable distribution over the sites, the semantic divergence between the $k$-th and $k'$-th trajectories can be easily calculated. The non-zero dimensions in $r_k$ not only tell how long a user has stayed at a certain site, but also provide the probability that user $k$ would visit a site that does not appear in the initial site occurrence vector $\rho^o_k$. The proposed stationary feature encodes richer semantic information by using the global statistical information, and thus helps to better discriminate among the semantic behaviors of different people.

Another issue to be considered is the semantic annotation of the sites. While each element of the probability vector $r_k$ denotes a site, it is possible for multiple sites to be semantically equivalent: e.g., there may be three coffee shops in a shopping mall. To accommodate this equivalence, we apply a simple post-processing step on each stationary distribution $r_k$. We merge and accumulate the probabilities of those sites with the same semantic meaning into one dimension, so that the $M$-dimensional stationary distribution feature $r_k$ is reduced to $M'$ dimensions, where $M' < M$. Note that, after the post-processing step, $r_k$ is still guaranteed to be a stable probability distribution with $\sum_{i=1}^{M'} r_k(i) = 1$.
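The merging step can be illustrated with a short sketch. The semantic labels and probabilities below are hypothetical; the only point is that accumulating probabilities per label preserves the total mass.

```python
def merge_equivalent_sites(r, labels):
    """Sum the probabilities of sites that share the same semantic label."""
    merged = {}
    for prob, label in zip(r, labels):
        merged[label] = merged.get(label, 0.0) + prob
    return merged


# Hypothetical example: three coffee shops and one bookstore (M = 4, M' = 2).
r_k = [0.1, 0.2, 0.3, 0.4]
site_labels = ["coffee", "coffee", "bookstore", "coffee"]
merged = merge_equivalent_sites(r_k, site_labels)
print(merged)
```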

Given semantic residency distributions $r_k$ and $r_{k'}$, their semantic similarity can be computed by the Radial Basis Function (RBF) kernel: $K_s(k, k') = \exp(-\lambda \|r_k - r_{k'}\|^2)$, where $\lambda$ denotes the bandwidth parameter of the RBF kernel. We calculate the $N \times N$ pairwise distance matrix $D$, where $D(i, j) = \|r_i - r_j\|^2$. A heuristic setting of $\lambda$ is the inverse of the average of all the elements in $D$, i.e., $\lambda_0 = \frac{1}{\mathrm{mean}(D)}$. Given $\lambda_0$, we select $\lambda = [0.8\lambda_0, 0.9\lambda_0, \lambda_0, 1.1\lambda_0, 1.2\lambda_0]$ to obtain a set of kernel matrices $K_s$, and finally the kernel with the best performance is selected.
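A minimal sketch of the bandwidth heuristic follows. It assumes the squared Euclidean distance in both the kernel and the matrix $D$, and the three feature vectors are hypothetical; selecting the "best" kernel would additionally require a validation measure that is not shown here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semantic_kernels(R, scales=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return lambda_0 and one RBF kernel matrix per candidate bandwidth."""
    N = len(R)
    D = [[sq_dist(R[i], R[j]) for j in range(N)] for i in range(N)]
    lam0 = 1.0 / (sum(map(sum, D)) / (N * N))     # inverse of mean(D)
    kernels = {}
    for s in scales:
        lam = s * lam0
        kernels[s] = [[math.exp(-lam * D[i][j]) for j in range(N)] for i in range(N)]
    return lam0, kernels


# Hypothetical stationary distributions of three trajectories over three sites.
R = [[0.6, 0.3, 0.1],
     [0.5, 0.4, 0.1],
     [0.1, 0.1, 0.8]]
lam0, kernels = semantic_kernels(R)
K_s = kernels[1.0]
```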

4 MODELING TEMPORAL, SPATIAL AND VELOCITY INFORMATION

4.1 Modeling temporal information

For each trajectory $k$, we extract a vector $w_k$ of four time stamps: $w_k(1)$ records the first showing time; $w_k(2)$ is the first meeting time stamp for two visitors in a group; $w_k(3)$ denotes the ending time; $w_k(4)$ is the day index. The time stamp trajectory features describe the temporal activity pattern of real-life communities. For example, young people may prefer to go shopping or play together in the afternoon and evening, while the elderly may prefer to get up earlier and collectively go for exercise. Moreover, teenagers may be seen to spend an appreciable amount of time waiting for the arrival of their friends, whereas the elderly may be more punctual. To appropriately measure the temporal similarity, the temporal kernel can be formulated as:

\[ K_\beta(k, k') = e^{-\beta_1 \sum_{i=1}^{3} (w_k(i) - w_{k'}(i))^2} \, e^{-\beta_2 (w_k(4) - w_{k'}(4))^2} \tag{4} \]

where $\beta_1$ and $\beta_2$ denote the bandwidth factors. While these need to be adjusted for the specific trajectory mining scenario, the adjustment process is not very complicated. One feasible approach for setting $\beta_1$ and $\beta_2$ is as follows:

1) Given $N$ trajectories, calculate the average temporal pattern distances as:

\[ d_{\beta_1} = \frac{1}{N^2} \sum_{k'=1}^{N} \sum_{k=1}^{N} \sum_{i=1}^{3} (w_k(i) - w_{k'}(i))^2 \tag{5} \]

\[ d_{\beta_2} = \frac{1}{N^2} \sum_{k'=1}^{N} \sum_{k=1}^{N} (w_k(4) - w_{k'}(4))^2 \tag{6} \]

2) To acquire the best performance, one can conduct parameter tuning near the initial setting. For example:

\[ \beta_1 = \left[ \frac{0.8}{d_{\beta_1}}, \frac{0.9}{d_{\beta_1}}, \frac{1}{d_{\beta_1}}, \frac{1.1}{d_{\beta_1}}, \frac{1.2}{d_{\beta_1}} \right], \quad \beta_2 = \frac{1}{d_{\beta_2}} \tag{7} \]

We then select the best parameters that maximize the performance of community discovery.
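The temporal kernel of Eqn. 4 and the initial bandwidths of Eqns. 5 to 7 can be sketched as follows. The time stamp vectors are hypothetical (hours of the day plus a day index), and only the central candidate $1/d_{\beta_1}$ is used; the other candidates are obtained by rescaling it.

```python
import math


def initial_bandwidths(W):
    """Return (1/d_beta1, 1/d_beta2) from the average temporal pattern distances."""
    N = len(W)
    d1 = sum((wk[i] - wq[i]) ** 2 for wk in W for wq in W for i in range(3)) / N ** 2
    d2 = sum((wk[3] - wq[3]) ** 2 for wk in W for wq in W) / N ** 2
    return 1.0 / d1, 1.0 / d2


def temporal_kernel(wk, wq, beta1, beta2):
    """Product of a Gaussian on the three time stamps and one on the day index."""
    time_part = sum((wk[i] - wq[i]) ** 2 for i in range(3))
    day_part = (wk[3] - wq[3]) ** 2
    return math.exp(-beta1 * time_part) * math.exp(-beta2 * day_part)


# Hypothetical vectors: [first showing, first meeting, ending, day index].
W = [[9.0, 9.5, 12.0, 1.0],
     [9.2, 9.6, 12.5, 1.0],
     [18.0, 18.5, 21.0, 3.0]]
beta1, beta2 = initial_bandwidths(W)
k01 = temporal_kernel(W[0], W[1], beta1, beta2)
k02 = temporal_kernel(W[0], W[2], beta1, beta2)
```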

4.2 Modeling spatial information

We can directly use the Global Alignment Kernel (GAK) [6] to measure the spatial similarity between two trajectories. We denote the similarity of spatial information as $K_p$. According to the analysis in [6], the GAK is positive definite.

However, GAK only measures the spatial closeness of individual trajectories, which may be incapable of discovering the true communities if a large mass of additional trajectories exhibit similar proximity. For example, if Alice, Bob and 100 other people are waiting in a concourse area, then the spatial similarity between the trajectories of Alice and Bob should not be too significant, because this concourse may be the only way for people to traverse the space. However, if Alice and Bob are the only two people in the group study room on a college campus, and everyone else is in a different classroom, then these two trajectories should be viewed as significantly similar. Based on this intuitive observation, we propose a new Global Alignment Kernel with Inverse Proportion (GAK-IP), inspired by TF-IDF in IR research, that weighs the spatial similarity in inverse proportion to how many other people are located within a similar distance range.

To describe this in detail, we first explain how an unmodified GAK works. Consider two time series (trajectories):

\[ c_x = (c_x(1), \ldots, c_x(n)), \quad c_y = (c_y(1), \ldots, c_y(m)) \tag{8} \]

where each $c_x(i)$ or $c_y(j)$ can be the two-dimensional spatial coordinates or the indicator of a certain site. An alignment $\pi = (\pi_1, \pi_2)$ is a pair of increasing integral vectors of length $p \le n + m - 1$. Since such an alignment may not be unique, we write $\mathcal{A}(n, m)$ for the set of all alignments between two time series of lengths $n$ and $m$. The Global Alignment Kernel is defined as the exponential soft-minimum of all alignment distances:

\[ K^0_p(c_x, c_y) \overset{\mathrm{def}}{=} \sum_{\pi \in \mathcal{A}(n,m)} e^{-D_\pi(c_x, c_y)} \tag{9} \]


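where $D_\pi(c_x, c_y) \overset{\mathrm{def}}{=} \sum_{i=1}^{|\pi|} \varphi(c_x(\pi_1(i)), c_y(\pi_2(i)))$ and $\varphi(c_x, c_y) = \|c_x - c_y\|^2$. It is equivalent to the following formulation:

\[ K^0_p(c_x, c_y) \overset{\mathrm{def}}{=} \sum_{\pi \in \mathcal{A}(n,m)} \prod_{i=1}^{|\pi|} \kappa(c_x(\pi_1(i)), c_y(\pi_2(i))), \qquad \kappa(c_x(\pi_1(i)), c_y(\pi_2(i))) = e^{-\|c_x(\pi_1(i)) - c_y(\pi_2(i))\|^2} \tag{10} \]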
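The sum over all alignments in Eqn. 10 does not have to be enumerated explicitly. A minimal sketch, assuming the standard dynamic-programming recursion from the GAK literature (alignments start at the first pair, end at the last pair, and advance one or both indices by one at each step), and assuming $\varphi$ is the squared Euclidean distance; the sample points are hypothetical:

```python
import math


def kappa(a, b):
    """Local kernel exp(-||a - b||^2) between two points."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)))


def gak(cx, cy):
    """Unmodified Global Alignment Kernel via dynamic programming."""
    n, m = len(cx), len(cy)
    # K[i][j] sums the products of kappa over all alignments of the prefixes
    # cx[:i] and cy[:j]; K[0][0] = 1 seeds the recursion.
    K = [[0.0] * (m + 1) for _ in range(n + 1)]
    K[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            K[i][j] = kappa(cx[i - 1], cy[j - 1]) * (
                K[i - 1][j] + K[i][j - 1] + K[i - 1][j - 1])
    return K[n][m]


# Hypothetical 2-D trajectories.
cx = [(0.0, 0.0), (1.0, 0.0)]
cy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(gak(cx, cy))
```

The table has $(n+1)(m+1)$ entries, so the cost is $O(nm)$ per trajectory pair instead of exponential in the trajectory lengths.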

In our modified version, GAK-IP, we revise the definition of the distance as follows:

\[ D_\pi(c_x, c_y) \overset{\mathrm{def}}{=} \sum_{i=1}^{|\pi|} w_{\pi(i)} \, \varphi(c_x(\pi_1(i)), c_y(\pi_2(i))), \tag{11} \]

where $w_{\pi(i)}$ denotes the weight calculated by considering the number of people within a range or on the same site. We denote the number of people on the site $\pi(i)$ at the time when $c_x$ and $c_y$ appear together as $n_{\pi(i)} \ge 1$. Then $w_{\pi(i)}$ can be calculated as:

\[ w_{\pi(i)} = \frac{|\pi| \ln(1 + n_{\pi(i)})}{\sum_{i=1}^{|\pi|} \ln(1 + n_{\pi(i)})}. \tag{12} \]

Note that using the natural logarithm reduces the disproportionate impact of a site where the pair are co-resident with a very large number of objects (a high $n_{\pi(i)}$), which would otherwise make $K_p$ highly dependent on a single crowded location.
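For a single, given alignment, the weights of Eqn. 12 and the weighted distance of Eqn. 11 can be sketched as below. The alignment, the coordinates and the crowd counts are hypothetical, and $\varphi$ is assumed to be the squared Euclidean distance.

```python
import math


def gak_ip_weights(counts):
    """Eqn. 12: log-damped crowd weights, scaled so that they sum to |pi|."""
    logs = [math.log(1.0 + n) for n in counts]
    total = sum(logs)
    return [len(counts) * value / total for value in logs]


def weighted_alignment_distance(cx, cy, alignment, counts):
    """Eqn. 11 for one alignment, given as a list of (i, j) index pairs."""
    weights = gak_ip_weights(counts)
    dist = 0.0
    for w, (i, j) in zip(weights, alignment):
        dist += w * sum((u - v) ** 2 for u, v in zip(cx[i], cy[j]))
    return dist


# Hypothetical case: the first aligned step happens in a crowded concourse
# (100 people), the second in a study room shared by only two people.
counts = [100, 2]
weights = gak_ip_weights(counts)
cx = [(0.0, 0.0), (5.0, 5.0)]
cy = [(1.0, 0.0), (5.0, 6.0)]
d = weighted_alignment_distance(cx, cy, [(0, 0), (1, 1)], counts)
```

Because a crowded step receives a larger weight on its distance term, the same spatial gap counts as less similar in a crowd than in a nearly empty room.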

Lemma 2. GAK-IP is positive definite.

The proof is provided in our previous work [23].

4.3 Modeling velocity information

The information encoded in the velocity pattern of moving objects is also critical for real-life trajectory analysis. For example, younger taxi drivers tend to drive fast, while experienced drivers keep a safe speed. However, we face two challenges when modeling the velocity pattern. The first is that trajectories have non-uniform lengths, which makes it difficult to directly measure their pairwise similarity from the velocity aspect. The second challenge is that velocity characteristics vary across objects, times and locations, so it is hard to directly construct their velocity pattern correlation. To analyze the velocity consistency of the trajectories while addressing these two challenges, we design a temporal pyramid kernel that considers different temporal resolutions, inspired by the image-level similarity calculation of [15] in the image classification domain.

Each trajectory $k$ has a velocity vector $v_k$, and these vectors have unequal lengths. We uniformly quantize the velocity into $L$ levels.² Given $v_k$ with length $l_k$, we calculate the normalized histogram $h_k(0)$ on $v_k$. Then we equally divide $v_k$ into two parts, $v_k \to [v_k(1), v_k(2)]$, where both $v_k(1)$ and $v_k(2)$ are velocity vectors of length $\frac{l_k}{2}$. We calculate the histograms $h_k(1)$ and $h_k(2)$ on $v_k(1)$ and $v_k(2)$, respectively, and normalize them so that $\sum h_k(1) + \sum h_k(2) = 1$. We then equally divide $v_k(1)$ and $v_k(2)$ into two parts again and calculate the histograms in the same way. This process is repeated until a predefined level is reached. We concatenate all the histograms with predefined weights.

² In fact, the quantization scheme can be application dependent.
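The pyramid construction can be sketched as follows. Several details are assumptions made for illustration: velocities lie in $[0, v_{max}]$, segments are split by integer index arithmetic, each level is jointly normalized to sum to one, and the level weights are all set to 1.

```python
def pyramid_histogram(v, n_bins, v_max, depth, level_weights):
    """Concatenate per-level velocity histograms; level l has 2**l segments."""
    features = []
    for level in range(depth + 1):
        parts = 2 ** level
        level_hist = []
        for p in range(parts):
            # Integer arithmetic avoids floating-point drift in the split points.
            segment = v[p * len(v) // parts:(p + 1) * len(v) // parts]
            hist = [0.0] * n_bins
            for speed in segment:
                b = min(int(speed / v_max * n_bins), n_bins - 1)  # uniform quantization
                hist[b] += 1.0
            level_hist.extend(hist)
        total = sum(level_hist)                 # equals len(v) at every level
        features.extend(level_weights[level] * h / total for h in level_hist)
    return features


# Two hypothetical velocity sequences of different lengths.
v1 = [0.5, 1.5, 2.5, 3.5, 0.2, 0.4]
v2 = [3.9, 3.8, 3.7, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
f1 = pyramid_histogram(v1, n_bins=4, v_max=4.0, depth=2, level_weights=(1.0, 1.0, 1.0))
f2 = pyramid_histogram(v2, n_bins=4, v_max=4.0, depth=2, level_weights=(1.0, 1.0, 1.0))
```

Both feature vectors have the same length, $L(2^{depth+1} - 1)$, regardless of the original trajectory lengths, which is what makes the pairwise comparison in the next step possible.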

Based on this, we can extract a velocity histogram $h_k$ of equal length $D_h$ with coarse-to-fine temporal resolution. The similarity between user trajectories $k$ and $k'$ can be calculated with either the histogram intersection or the Chi-Square kernel. The histogram intersection and the Chi-Square kernel are, respectively:

\[ K_h(k, k') = \sum_{i=1}^{D_h} \min(h_k(i), h_{k'}(i)), \qquad K_h(k, k') = \exp\left( -\frac{1}{2\varpi} \sum_{i=1}^{D_h} \frac{(h_k(i) - h_{k'}(i))^2}{h_k(i) + h_{k'}(i)} \right) \tag{13} \]

where $\varpi$ denotes the bandwidth parameter for the Chi-Square kernel. $\varpi$ may be tuned as follows:

1) Given $N$ trajectories, calculate the average velocity pattern distance as:

\[ d_h = \frac{1}{2N^2} \sum_{k'=1}^{N} \sum_{k=1}^{N} \sum_{i=1}^{D_h} \frac{(h_k(i) - h_{k'}(i))^2}{h_k(i) + h_{k'}(i)}. \tag{14} \]

2) Set the initial value $\varpi_0$ as $1/d_h$.

3) To acquire the best performance, we can conduct parameter tuning near the initial setting (e.g., setting the candidate parameter set as $\varpi = [0.8\varpi_0, 0.9\varpi_0, \varpi_0, 1.1\varpi_0, 1.2\varpi_0]$).
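Both velocity kernels and the initial bandwidth can be sketched directly from Eqns. 13 and 14. The histograms are hypothetical, and bins where both histograms are zero are skipped, which is an assumption made here to avoid a 0/0 term.

```python
import math


def histogram_intersection(h, g):
    return sum(min(a, b) for a, b in zip(h, g))


def chi2_sum(h, g):
    """Sum of (h_i - g_i)^2 / (h_i + g_i), skipping bins that are empty in both."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)


def chi2_kernel(h, g, varpi):
    return math.exp(-chi2_sum(h, g) / (2.0 * varpi))


def initial_bandwidth(H):
    """varpi_0 = 1 / d_h, with d_h the average distance of Eqn. 14."""
    N = len(H)
    d_h = sum(chi2_sum(h, g) for h in H for g in H) / (2.0 * N * N)
    return 1.0 / d_h


# Hypothetical normalized velocity histograms.
H = [[0.5, 0.3, 0.2, 0.0],
     [0.4, 0.4, 0.2, 0.0],
     [0.0, 0.1, 0.3, 0.6]]
varpi0 = initial_bandwidth(H)
candidates = [s * varpi0 for s in (0.8, 0.9, 1.0, 1.1, 1.2)]
k01 = chi2_kernel(H[0], H[1], varpi0)
k02 = chi2_kernel(H[0], H[2], varpi0)
```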

5 TRAJECTORY COMMUNITY DISCOVERY FROM MULTI-SOURCE SIMILARITY MEASUREMENT

Now we have several types of kernels representing the similarity from different information sources. The kernel set includes the semantic kernel $K_s$, the temporal kernel $K_\beta$, the spatial kernel $K_p$ and the velocity kernel $K_h$. Based on the similarity definition in Eqn. 1, we detect the trajectory communities from the trajectory database. Note that the trajectory data is collected from an open environment with diversified customer behaviors, so not every trajectory has to be assigned a community label. To this end, we propose two strategies for community discovery.

5.1 Weighted Combination

A direct way to combine multiple information sources for community discovery is to obtain a unified behavior similarity matrix $K(\alpha)$ by a weighted combination of all the similarity matrices as follows:

\[ K(\alpha) = \alpha_1 K_s + \alpha_2 K_\beta + \alpha_3 K_p + \alpha_4 K_h \tag{15} \]

where $\sum_k \alpha_k = 1$ and $\forall k, \alpha_k \ge 0$. Note that, in dense sub-graph detection, the diagonal elements of each independent similarity matrix and of $K(\alpha)$ are all set to zero to avoid self-loops in the graph. The weighted combination in Eqn. 15 is similar to traditional multi-feature combination models [43], [1]. It has the following properties that limit its flexibility in processing real data. First, the optimal behavior similarities (i.e., entries in $K(\alpha)$) among users are approximated by adjusting the weights $\alpha_k$, $k = 1, \ldots, 4$. The


structures of the spectral graphs (i.e., similarity matrices) on different information sources are significantly different. Therefore, the neighborhood structure of a direct combination of multiple similarity matrices is drastically different from the similarity matrix of each information source, leading to numerically unstable community detection solutions. Second, from the Markov random walk perspective, only the first-order conditional dependency (i.e., one-step Markov state transition) is considered in Eqn. 15. Considering that the formation of communities is a collective behavior evolution process over a long time range, the one-step conditional dependency tends to be too simple to model the behavior similarity among a large number of users.
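For reference, the weighted combination of Eqn. 15, including the zeroed diagonal, amounts to the following sketch. The four 2-by-2 kernel matrices are hypothetical placeholders for $K_s$, $K_\beta$, $K_p$ and $K_h$.

```python
def combine_kernels(kernels, alpha):
    """Convex combination of similarity matrices with the diagonal set to zero."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    K = [[0.0] * n for _ in range(n)]
    for a, K_m in zip(alpha, kernels):
        for i in range(n):
            for j in range(n):
                if i != j:                      # no self-loops in the graph
                    K[i][j] += a * K_m[i][j]
    return K


# Hypothetical kernels for two trajectories, combined with uniform weights.
K_s = [[1.0, 0.8], [0.8, 1.0]]
K_beta = [[1.0, 0.4], [0.4, 1.0]]
K_p = [[1.0, 0.2], [0.2, 1.0]]
K_h = [[1.0, 0.6], [0.6, 1.0]]
K_alpha = combine_kernels([K_s, K_beta, K_p, K_h], [0.25, 0.25, 0.25, 0.25])
```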

5.2 Multi-source Diffusion Modeling

The problem of community detection with multiple similarity matrices is closely related to traditional multi-view clustering [41]. Sindhwani et al. [37] used a convex combination of Laplacians in the 'co-regularization' framework, which is similar in spirit to the weighted combination in Eqn. 15. Eynard et al. [7] proposed to find a common eigenspace of multiple Laplacians by joint approximate diagonalization. Bronstein et al. [4] studied the closest commuting operators (CCO) and showed their equivalence to joint diagonalization [7]. Specifically, two data views can be aligned by minimizing the divergence of the heat kernels on two data manifolds [3], which models the intra-modal relation by a diffusion process.

In the community detection scenario, beyond the pair-wise coupling of the diffusion process, the similarity structure in each information source should be jointly aligned and maintained as much as possible to guarantee the maximal consistency of different data manifolds and effective information fusion. We first define the diffusion process on the data manifolds provided by different behavior similarity matrices with the following standard partial differential equation:

\[ L_m f_m(t) + \frac{\partial}{\partial t} f_m(t) = 0, \quad f_m(0) = f^0_m, \quad m = 1, \ldots, M_K. \tag{16} \]

where $L_m$ denotes the graph Laplacian on the $m$-th behavior similarity measurement. Correspondingly, instead of the Euclidean distance, the similarity among different users on each information view is measured by the diffusion distance:

\[ d^t_m(x_k, x_{k'}) = \left( \sum_{q=1}^{K} \left( (H^t_m)_{kq} - (H^t_m)_{k'q} \right)^2 \right)^{\frac{1}{2}} \tag{17} \]

which measures the "reachability" of vertex (user) $x_{k'}$ from vertex $x_k$ in time $t$. The heat kernel $H^t_m$ is defined as $H^t_m = e^{-tL_m}$. Through the diffusion process, the behavior similarity in each information source can be modeled in a more appropriate manner. Therefore, the true community information can be identified more precisely.
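The heat kernel and the diffusion distance can be illustrated on a tiny graph. This sketch makes several assumptions that the text does not fix: it uses the unnormalized Laplacian (degree matrix minus similarity matrix), it takes the diffusion distance as the Euclidean distance between two rows of the heat kernel, and it approximates the matrix exponential by a truncated Taylor series with scaling and squaring so that only the standard library is needed. The similarity values are hypothetical.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def graph_laplacian(K):
    """Unnormalized Laplacian L = D - K, ignoring the diagonal of K."""
    n = len(K)
    degree = [sum(K[i][j] for j in range(n) if j != i) for i in range(n)]
    return [[degree[i] if i == j else -K[i][j] for j in range(n)] for i in range(n)]


def heat_kernel(L, t, squarings=10, terms=20):
    """Approximate exp(-t L) by a Taylor series on a scaled matrix, then square."""
    n = len(L)
    X = [[-t * L[i][j] / 2 ** squarings for j in range(n)] for i in range(n)]
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in H]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, X)]   # X^k / k!
        H = [[H[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        H = mat_mul(H, H)
    return H


def diffusion_distance(H, k, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(H[k], H[q])))


# Hypothetical graph: users 0 and 1 are strongly tied, user 2 is weakly tied to both.
K = [[0.0, 1.0, 0.1],
     [1.0, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
H = heat_kernel(graph_laplacian(K), t=1.0)
d01 = diffusion_distance(H, 0, 1)
d02 = diffusion_distance(H, 0, 2)
```

In this example the strongly tied pair ends up much closer in diffusion distance than the weakly tied pair, because heat injected at user 0 reaches user 1 quickly.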

Second, each user has a corresponding node on each information source. For example, the $k$-th node of every kernel $m = 1, \ldots, M_K$ corresponds to exactly the same user. Therefore, the diffusion processes should be naturally aligned. In other words, either the 'heat flows' on each graph $L_m$ should be identical, or the projections of the solutions of the corresponding functions should be equal. Since the former condition requires that the graphs of different information sources are isometric [34], it is too strict for modeling real-world user behaviors. We adopt the latter condition, i.e., the weak coupling condition, which only enforces the projection consistency [3]. To jointly couple the heat kernels on all the information sources, we learn the graph Laplacian matrices by minimizing the objective function:

\[ \min_{\tilde{L}_m \in \mathcal{L}} \sum_{m=1}^{M_K} \|\tilde{L}_m - L_m\|_F^2 + \beta \sum_{m=1}^{M_K - 1} \sum_{m'=m+1}^{M_K} \sum_{\tau=1}^{\Upsilon} \|e^{-t_\tau \tilde{L}_m} - e^{-t_\tau \tilde{L}_{m'}}\|_F^2 \tag{18} \]

where $\tilde{L}_m$ denotes the learned graph Laplacian on the $m$-th kernel, which is regularized to be not too dissimilar to the original graph Laplacian $L_m$, and $t_\tau$, $\tau = 1, \ldots, \Upsilon$, represent the discrete time stamps. The diffusion processes at each $t_\tau$ are expected to be maximally aligned. The parameter $\beta$ is the weight that controls the relative importance of the diffusion process consistency at multiple time stamps. In this paper, we empirically set $\beta = 10^5$ to guarantee the robustness of the diffusion process alignment. Eqn. 18 can be solved by simple gradient-based techniques [3].

Based on the learned graph Laplacians, the aligned unified behavior similarity matrix $K'(\alpha)$ is given by:

\[ K'(\alpha) = \alpha_1 K'_s + \alpha_2 K'_\beta + \alpha_3 K'_p + \alpha_4 K'_h \tag{19} \]

where $K'_s$, $K'_\beta$, $K'_p$ and $K'_h$ are recovered from the learned graph Laplacians $\tilde{L}_m$.

5.3 Community Detection

Based on $K(\alpha)$ (Eqn. 15) or $K'(\alpha)$ (Eqn. 19), we use the dense sub-graph detection method proposed by [22], which robustly detects a set of highly connected sub-graphs (cliques) from the graph, where the nodes represent the trajectories and the weights of the edges represent the pair-wise similarity among trajectories [47].

The method represents a set of vertices by a probabilistic cluster, which is a unit vector in the space of the standard simplex. Then, a quadratic function is introduced to measure the average edge weight among them, and the dominant set is defined as the sub-graph with the largest average edge weight. In this paper, we refer to such an average edge weight as the dominance of a sub-graph. In more detail:

1) The probabilistic cluster is defined as $z \in \Delta^N$, where $\Delta^N = \{z \mid z \in \mathbb{R}^N, z \ge 0, \|z\|_1 = 1\}$ is the space of the standard simplex and $N$ is the total number of vertices. In fact, $z$ is a unit mapping vector; the value of $z_i$, the $i$-th dimension of $z$, is the probability that the probabilistic cluster $z$ contains the $i$-th vertex. In particular, if $z = I_i$, whose $i$-th dimensional value is 1, then it represents a probabilistic cluster that contains only the $i$-th vertex with probability $z_i = 1$. Any $i$-th vertex with $z_i = 0$ is not included in the cluster.

2) The dominance of the probabilistic cluster $z$ is defined in Eqn. 20, where $K$ is the symmetric similarity matrix of the consistency graph $G$ representing $K(\alpha)$ (Eqn. 15)


or $K'(\alpha)$ (Eqn. 19), where the diagonal elements are set to 0:

\[ g(z) = z^\top K z \tag{20} \]

The dense sub-graph seeking problem can be formulated as a standard quadratic optimization problem [22]. It can be solved by the replicator dynamics method, where $z$ is the probabilistic cluster and $t$ indicates the index of the iteration:

\[ \max_z \; g(z) = z^\top K z, \quad \text{s.t. } z \in \Delta^N \]

\[ z_i(t+1) = z_i(t) \frac{(Kz(t))_i}{z(t)^\top K z(t)}, \quad i = 1, \ldots, N \tag{21} \]

For the graph $G(\alpha)$ with $N$ vertices, the probabilistic cluster is initialized as $z(0) = \{z_i(0) = \frac{1}{N} \mid z(0) \in \Delta^N\}$. This iteration is easy to implement and cheap to compute; according to the experimental results, it converges in 4 iterations on average for a graph with fewer than 100 vertices. After detecting one dense sub-graph using the above process (i.e., finding an optimal solution $z^*$ by using the method in Eqn. 21 [22]), we exclude the vertices with non-zero values in $z^*$ and start the optimization process again on the reduced graph, where the new affinity matrix $K'$ is the sub-matrix of $K$. This process is repeated until the vertex set becomes empty. For each $z^*$, we define the average similarity of the currently detected dense sub-graph as $(z^*)^\top K z^*$. The whole procedure of TODMIS is demonstrated in Algorithm 1.

Algorithm 1 TODMIS algorithm
Input: Data: $X$; Parameters: $\alpha$, $\eta$, $\lambda$, $\beta_1$, $\beta_2$, $\varpi$
Output: $z^*_{c'}$, $c' = 1, \ldots, C$
1: Construct $K_s$ on $X$ with parameters $\eta$ and $\lambda$.
2: Construct $K_\beta$ on $X$ with parameters $\beta_1$ and $\beta_2$.
3: Construct $K_p$ on $X$ with GAK-IP.
4: Construct $K_h$ on $X$ with parameter $\varpi$.
5: Learn $[K'_s, K'_\beta, K'_p, K'_h]$ and construct $K'_0$ based on $\alpha$.
6: $c' = 1$.
7: while the vertex set in $G(\alpha)$ is not empty do
8:     Get $z^*_{c'}$ by solving Eqn. 21 on $K'_{c'-1}$.
9:     Exclude the vertices with non-zero elements in $z^*_{c'}$ from $G(\alpha)$.
10:    Form $K'_{c'}$ by extracting the sub-matrix from $K'_{c'-1}$.
11:    $c' = c' + 1$
12: end while
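Steps 6 to 12 of Algorithm 1, together with the replicator dynamics of Eqn. 21, can be sketched as below. Two details are assumptions of this sketch rather than part of the text: entries of $z^*$ below a small threshold `support_eps` are treated as zero (the iteration only approaches zero asymptotically), and a remainder with no positive similarity is returned as one final group. The similarity matrix is hypothetical.

```python
def replicator_dynamics(K, max_iter=1000, tol=1e-10):
    """Iterate z_i <- z_i * (K z)_i / (z^T K z) from the uniform start."""
    n = len(K)
    z = [1.0 / n] * n
    for _ in range(max_iter):
        Kz = [sum(K[i][j] * z[j] for j in range(n)) for i in range(n)]
        g = sum(z[i] * Kz[i] for i in range(n))      # dominance z^T K z
        if g <= 0.0:                                  # no positive similarity left
            break
        new_z = [z[i] * Kz[i] / g for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if change < tol:
            break
    return z


def detect_communities(K, support_eps=1e-4):
    """Peel off one dense sub-graph at a time until no vertex remains."""
    remaining = list(range(len(K)))
    communities = []
    while remaining:
        sub = [[K[i][j] for j in remaining] for i in remaining]
        z = replicator_dynamics(sub)
        members = [remaining[i] for i, zi in enumerate(z) if zi > support_eps]
        communities.append(members)
        remaining = [v for v in remaining if v not in members]
    return communities


# Hypothetical similarity matrix: a tight group {0, 1, 2} and a pair {3, 4}.
K = [[0.00, 0.90, 0.90, 0.05, 0.05],
     [0.90, 0.00, 0.90, 0.05, 0.05],
     [0.90, 0.90, 0.00, 0.05, 0.05],
     [0.05, 0.05, 0.05, 0.00, 0.80],
     [0.05, 0.05, 0.05, 0.80, 0.00]]
communities = detect_communities(K)
print(communities)
```

Note that the number and size of the communities are not supplied as inputs; both fall out of the peeling loop.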

According to [22], the time complexity depends on the number of edges, which is $O(N^2)$ for a fully connected graph. However, for processing a large trajectory database, a sparse nearest neighbor graph can be constructed instead of a fully connected graph. Therefore, the time complexity can be reduced to $O(qN)$, which is linear with respect to the number of trajectories $N$ and the number of nearest neighbors $q$.

6 ONLINE RECOMMENDATION

Based on the detected trajectory communities, we observe that similar persons have similar interests, and thus we can recommend items (such as academic activities on campus, coupons in a shopping mall and taxi services in the city) to certain objects (such as students, customers and taxi drivers, i.e., the users in the recommendation scenario). In this section, we investigate online recommendation via trajectory community information. There are two important questions to be answered: one is how to improve the current recommendation technologies based on the detected trajectory communities, and the other is how to generate the candidate set of users who are willing to follow the recommendation.

6.1 Trajectory Community based Recommendation

Based on the detected trajectory communities, we employ a memory-based collaborative filtering algorithm [39] for online recommendation. The user-based recommendation approach mainly contains two critical steps: calculating the similarity of users and producing recommendations for the target users. For many practical recommender systems, the calculation of user similarity is intractable due to incomplete information, such as data sparsity. Fortunately, in this paper, the user similarity can be calculated in advance based on the detected trajectory communities.

Let $T_i = \{t^i_1, \cdots, t^i_{N_i}\} \subseteq X$ denote the trajectory set of user $i$; similarly, $T_{i'} = \{t^{i'}_1, \cdots, t^{i'}_{N_{i'}}\} \subseteq X$ denotes the trajectory set of user $i'$, and $C_i = \{c^i_1, \cdots, c^i_K\}$ and $C_{i'} = \{c^{i'}_1, \cdots, c^{i'}_L\}$ are their corresponding sets of trajectory communities. One simple strategy to calculate the similarity of user $i$ and user $i'$ is then

\[ S(i, i') = \frac{|C_i \cap C_{i'}|}{|C_i \cup C_{i'}|}. \tag{22} \]

Based on the user similarity, the rating prediction of user $i$ on item $j$ can be calculated by

\[ \hat{r}_{ij} = \frac{\sum_{i'=1}^{N_i} r_{i'j} S(i, i')}{\sum_{i'=1}^{N_i} S(i, i')}, \tag{23} \]

where $\hat{r}_{ij}$ denotes the predicted rating of user $i$ on item $j$, $N_i$ denotes the number of neighbor users of user $i$, and $r_{i'j}$ denotes the observed rating of user $i'$ on item $j$. In practical recommender systems, $r_{i'j}$ is one measure of user $i'$'s interest, such as whether product $j$ was purchased, or the money or time spent in store $j$.
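Eqns. 22 and 23 translate into a few lines of code. In this sketch the neighbor set of a user is assumed to be every other user with an observed rating on the item, and all user names, community identifiers and ratings are hypothetical.

```python
def user_similarity(C_i, C_j):
    """Eqn. 22: Jaccard overlap between two sets of community identifiers."""
    C_i, C_j = set(C_i), set(C_j)
    union = C_i | C_j
    return len(C_i & C_j) / len(union) if union else 0.0


def predict_rating(target, item, communities, ratings):
    """Eqn. 23: similarity-weighted average of the neighbors' observed ratings."""
    numerator = denominator = 0.0
    for other, observed in ratings.items():
        if other == target or item not in observed:
            continue
        s = user_similarity(communities[target], communities[other])
        numerator += observed[item] * s
        denominator += s
    # None signals that no similar neighbor has rated the item.
    return numerator / denominator if denominator > 0 else None


# Hypothetical users, community memberships and ratings.
communities = {"u1": {1, 2}, "u2": {1, 2, 3}, "u3": {2}, "u4": {9}}
ratings = {"u2": {"coupon": 4.0}, "u3": {"coupon": 2.0}, "u4": {"coupon": 5.0}}
prediction = predict_rating("u1", "coupon", communities, ratings)
print(prediction)
```

User u4 shares no community with u1, so its rating carries zero weight in the prediction.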

Because the similarities of all trajectories and their community clustering have been calculated in the model training phase, recommendation only needs to calculate the weighted average of user ratings according to the similarities of the target users, which is an online procedure. Because the trajectories are incremental, the period of model updating can be pre-assigned according to user experience (such as ten days) or the available computing resources.

6.2 User Ranking

Persons follow the recommendation with different probabilities due to their different profiles, different past behaviors and different application contexts. One crucial part here is how to choose the right persons to receive recommendations. The recommended candidate set of users can be generated based on the detected trajectory communities. Let $T_i = \{t^i_1, \cdots, t^i_{N_i}\} \subseteq X$ denote the trajectory set of user $i$ and, correspondingly, $C_i = \{c^i_1, \cdots, c^i_{N_i}\}$; thus the strategy is


to choose the users with the largest number of trajectories (i.e., $|T_i|$ is largest among the trajectory sets of all users, where $|\cdot|$ is the number of elements in a set) or the most trajectory communities. On the other hand, some users have little overlap with the popular trajectory communities to which many trajectories belong; however, these outlier-like users usually show higher loyalty to certain items. In certain scenarios, such as products for a minority audience, selected outlier-like users can be considered as the candidate set to increase the novelty of recommendation.

7 EXPERIMENTAL EVALUATION

Real-life data sets: We conduct our experiments on three real-life data sets:

1) Campus student tracking data (Campus): We collected 1,000 students' trajectories with a time duration of up to one week in one level of a university campus building. Based on the students' real activities (ground truth data), we are able to label 40 real groups (i.e., trajectory communities). Based on the detected student communities on campus, social activities can be recommended to different groups of students.

2) Customer shopping behavior data (Mall): We collected 5,000 customers' trajectories (over an observation duration of two days) in one level of a big shopping mall in Singapore. In this data set, we introduced 16 controlled groups (ground truth) to test whether TODMIS can identify these groups. Based on the detected shopping groups in the mall, coupons on specific products and services can be recommended to certain customers for a win-win situation.

3) Taxi driver tracking data (City): We collected 10,000 taxi drivers' trajectories (over an observation duration of one week) from a big city in China. In this data set, there are 650 groups (ground truth from the taxi company), such that each group is assigned a region to traverse by the company. Here, places with a high probability of picking up passengers and real-time road congestion information can be sent to taxi drivers in groups for sharing information [24], [25]. The whole city has more than 20,000 road segments and around 10,000 taxis belonging to more than 100 taxi companies; each company partitions its drivers into several taxi groups (10 to 50 groups). The tracking records include taxi ID, instant speed, driving direction, location, company ID, taxi type ID and group ID (associated with company ID). From the data, we observe that taxi drivers do exhibit a grouping pattern (community): given a region (described as a set of road segments), they always traverse this region in a given time period. Thus, in this experiment, we create a grid-based representation of the city (based on the road segments) and investigate the effectiveness of TODMIS by applying it to the collected trajectories of the drivers, and then comparing the identified groups with the real groups defined by the company (ground truth).

To compare our detected trajectory communities (trajectory-based) against a traditional social network-based clustering approach, we also collected the social contact information of the above taxi drivers, consisting of the phone call records (from a communicator on the taxi), including caller taxi ID, callee taxi ID, start time, end time, start location and end location. To evaluate the recommendation methods, in the three scenarios, we observe how the objects followed different recommendation methods. More details are provided in Section 4.

Experiment environment: Our experiments and latency observations are conducted on a standard server (Linux) with four Intel Core Quad CPUs (Q9550, 2.83 GHz) and 32 GB main memory, and implemented in Java 1.7 using 64-bit addressing.

Baseline methods: 1) Trajectory community detection evaluation (Section 7.1 to Section 7.3). For the trajectory data, we compare our method with the BU algorithm [40], M-Atlas [30], [10] and Multifeature [1]. The BU algorithm is used to detect traveling companions from streaming trajectories. M-Atlas is used to create and navigate a catalog of the mobility behaviors of a territory (GPS data). Multifeature is used to estimate common patterns of behavior and isolate outliers in video data. For the social connection data, we compare our method with DSHRINK (one of the latest community detection algorithms), a distance-based clustering algorithm for detecting communities in incomplete information networks with missing edges [21]. Unless otherwise specified, we report the community detection results of TODMIS using the aligned kernel combination (Eqn. 19). 2) Recommendation evaluation (Section 4). We compare our method with Supervised Random Walk (SRW) [2], which is a graph-based link prediction approach utilizing the supervised information of edges and the node (user) attributes, and FUSE [5], which is a content-based recommendation method.

The parameters for the various baseline methods will be described shortly.

Evaluation metrics: We use the total execution time (at different scales) to evaluate the computational efficiency. We utilize precision and recall to evaluate effectiveness. The detailed definitions of the above evaluation metrics are explained in the experimental results. The parameters of all the kernels for TODMIS are tuned strictly according to the methods described in the previous sections.

7.1 Effectiveness evaluation

In this experiment, precision is defined as the fraction of retrieved communities that are relevant to the search. Recall is defined as the fraction of the communities relevant to the query that are successfully retrieved. The F1 score can be easily calculated based on precision and recall, and we omit it in this paper.
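As an illustration only, the two definitions can be computed as below once a rule for matching a detected community to a ground-truth group is fixed. The matching rule here (Jaccard overlap of at least `min_jaccard`, exact match by default) is an assumption of this sketch; the text does not state which rule is used. The sample communities are hypothetical and assumed non-empty.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)


def precision_recall(detected, truth, min_jaccard=1.0):
    """Fraction of detected communities that match a ground-truth group (precision)
    and fraction of ground-truth groups that are matched by a detection (recall)."""
    detected = [frozenset(c) for c in detected]
    truth = [frozenset(g) for g in truth]
    relevant = sum(1 for c in detected
                   if any(jaccard(c, g) >= min_jaccard for g in truth))
    recovered = sum(1 for g in truth
                    if any(jaccard(c, g) >= min_jaccard for c in detected))
    precision = relevant / len(detected) if detected else 0.0
    recall = recovered / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical result: one of three detected communities matches exactly.
detected = [[1, 2, 3], [4, 5], [6]]
truth = [[1, 2, 3], [4, 5, 7]]
precision, recall = precision_recall(detected, truth)
```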

Influence of heat kernel coupling: We experimentally compare the influence of the heat kernel coupling in Eqn. 18. We use the average weight $\alpha_m = 0.25$, $m = 1, \ldots, 4$, to combine all the kernels for a fair comparison. In the Mall setting with 1,000 trajectories, the precision of TODMIS without and with heat kernel coupling is 0.61 and 0.64, respectively. The recall of TODMIS without and with heat kernel coupling is 0.58 and 0.61, respectively. In the City scenario with 2,000 trajectories, the precision of TODMIS without and with heat kernel coupling is 0.61 and 0.63, respectively. The recall of TODMIS without and with heat kernel coupling is 0.59 and 0.60, respectively. In both settings, the heat kernel coupling on multiple information sources produces more consistent and aligned similarity representations


[Figure 3: four panels plotting precision (y-axis, 0 to 0.8) against the number of trajectories (x-axis). (a) Precision in campus test, 200 to 1,000 trajectories (Campus); (b) Precision in mall test, 1,000 to 5,000 trajectories (Mall); (c) Precision in city test, 2,000 to 10,000 trajectories (City); (d) Precision in social test, 2,000 to 10,000 trajectories/nodes (City). Legends: (a) TODMIS (0.3, 0.4, 0.1, 0.2), TODMIS (0.25, 0.25, 0.25, 0.25), BU, M-Atlas, Multifeature; (b) TODMIS (0.5, 0.2, 0.1, 0.2), TODMIS (0.25, 0.25, 0.25, 0.25), BU, M-Atlas, Multifeature; (c) TODMIS (0.3, 0.2, 0.2, 0.3), TODMIS (0.25, 0.25, 0.25, 0.25), BU, M-Atlas, Multifeature; (d) TODMIS (0.3, 0.2, 0.2, 0.3), TODMIS (0.25, 0.25, 0.25, 0.25), TODMIS (0.4, 0.3, 0.3, 0), DSHRINK.]

Fig. 3. Precision evaluation (TODMIS performs best in trajectory and social connection based community detection).

Fig. 4. Recall evaluation (TODMIS performs best in trajectory and social connection based community detection). Each panel plots recall against the number of trajectories. (a) Recall in campus test (200 to 1,000 trajectories): TODMIS (0.3, 0.4, 0.1, 0.2), TODMIS (0.25, 0.25, 0.25, 0.25), BUM-Atlas and Multifeature. (b) Recall in mall test (1,000 to 5,000 trajectories): TODMIS (0.5, 0.2, 0.1, 0.2), TODMIS (0.25, 0.25, 0.25, 0.25), BUM-Atlas and Multifeature. (c) Recall in city test (2,000 to 10,000 trajectories): TODMIS (0.3, 0.2, 0.2, 0.3), TODMIS (0.25, 0.25, 0.25, 0.25), BUM-Atlas and Multifeature. (d) Recall in social test (2,000 to 10,000 trajectories/nodes, City): TODMIS (0.3, 0.2, 0.2, 0.3), TODMIS (0.25, 0.25, 0.25, 0.25), TODMIS (0.4, 0.3, 0.3, 0) and DSHRINK.

on different behavior similarity measurement spaces, and thus better performance can be obtained.

Single kernel: We conduct experiments to validate the advantage of using multiple information sources over a single information source. We compare TODMIS using the average kernel weight (α_m = 0.25, m = 1, ..., 4, which is widely accepted in the multiple-feature fusion paradigm) with TODMIS using a single kernel, i.e., setting α_m = 1 for each m = 1, ..., 4 in turn. Note that we perform the heat kernel coupling in Eqn. 18 to learn the aligned kernels, and use the single aligned kernel for evaluation. For example, in the Mall setting with 1,000 trajectories, the precision of TODMIS using the average weight is 0.64, which outperforms the best single-kernel version (the best precision is 0.47, achieved by the semantic kernel K′_s). The recall of TODMIS using the average weight is 0.61, which outperforms the best recall of the single-kernel version (0.42, achieved by K′_s). For the City scenario with 2,000 trajectories, the precision of TODMIS using the average weight is 0.63, which outperforms the best single-kernel version (the best precision is 0.38, achieved by the velocity kernel K′_h). The recall of TODMIS using the average weight is 0.60, which outperforms the best recall of the single-kernel version (0.36, achieved by K′_h). Our experiments thus provide strong evidence of the benefit of using multiple information sources.
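The difference between the average-weight and single-kernel settings can be sketched as follows. The sketch assumes that the kernels are combined as a weighted sum with weights that sum to 1; Eqn. 19 is not reproduced in this section, so the exact form, the function name `combine_kernels` and the toy matrices are assumptions for illustration only.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of equally sized square similarity matrices (lists of lists).

    Assumes a convex combination: one weight per kernel, weights summing to 1.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, alpha in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += alpha * kernel[i][j]
    return combined


# Hypothetical 2x2 aligned kernels for two trajectories.
k_a = [[1.0, 0.2], [0.2, 1.0]]
k_b = [[1.0, 0.6], [0.6, 1.0]]

average = combine_kernels([k_a, k_b], [0.5, 0.5])    # average-weight setting
single = combine_kernels([k_a, k_b], [1.0, 0.0])     # single-kernel setting
```

With four kernels, the average-weight setting corresponds to weights (0.25, 0.25, 0.25, 0.25), and each single-kernel setting places weight 1 on one kernel and 0 on the others.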

Multiple kernels: We use the combination of the aligned kernels (Eqn. 19) for experimental comparison. Based on the results in Figure 3 and Figure 4, TODMIS achieves better precision and recall than the baseline methods. Note that even if we simplistically weigh the semantic, temporal, spatial and velocity measures equally, TODMIS still performs better. From the parameter tuning, we find that different measures impact the precision and recall in different scenarios. For the Campus scenario, we observe that the grouping performance is more sensitive to the semantic and temporal similarity measures (rather than the spatial and velocity measures). Intuitively, this is due to the fact that groups perform different

Fig. 5. Efficiency evaluation. TODMIS is efficient and scalable in different scenarios. Each panel plots time cost (seconds) against the number of trajectories. (a) Test in mall (1,000 to 5,000 trajectories): TODMIS (0.5, 0.2, 0.1, 0.2), BUM-Atlas and Multifeature. (b) Test in city (2,000 to 10,000 trajectories): TODMIS (0.3, 0.2, 0.2, 0.3), BUM-Atlas and Multifeature.

Fig. 6. Sensitivity evaluation. TODMIS is tolerant to different trajectory granularities. Panels (a) Precision and (b) Recall are plotted against the number of trajectories (City, 2,000 to 10,000), with one curve each for 200 m, 400 m, 600 m, 800 m and 1000 m.

activities in different spaces (e.g., lectures in lecture halls, project work in group study rooms) and for different durations (e.g., lectures for 1.5 hours, project work for longer durations), whereas the spatial and velocity information is less discriminative (especially because the floor is very crowded). In the Mall scenario, the semantic similarity measure (each store is treated as a distinct semantic label) is obviously important as, intuitively, members of a group are likely to visit the same stores in a similar sequence. However, unlike the Campus scenario, we cannot perform any significant state-space reduction during post-processing, as it is unlikely to find two outlets of the same store in the mall. Similar to the Campus scenario, the spatial and velocity similarity measures in the Mall are less significant for clustering.

For the City scenario, the observations are quite different: for taxi traffic, the semantic information of different sites and


Fig. 7. Recommendation evaluation (TODMIS performs the best). Each panel compares TODMIS, FUSE and SRW against the number of objects (Mall, 200 to 1,000). (a) Precision in mall test. (b) Recall in mall test. (c) Time cost in mall test (running time in seconds).

the velocity information of drivers are more discriminative than the other two measures (although the temporal and spatial measures do play a role). Note that TODMIS is less accurate in our experiment on taxi drivers’ trajectories, compared to the Campus and Mall scenarios. The reason is that even though we utilize the road segment to identify each taxi driver’s trajectory, taxi drivers do not traverse the roads in groups (unlike students on campus or shoppers in the mall). Nevertheless, TODMIS still achieves promising effectiveness, as shown in Figure 3 (c) and Figure 4 (c). In Figure 3 (d) and Figure 4 (d), we report the experimental results of community detection based on social links (phone calls between drivers). The precision and recall tests are based on the real group (community) information from the company. Our method, even without the velocity kernel, is still better than DSHRINK (which is close to, but not better than, TODMIS without the velocity kernel). From the results, we notice that the community based on real social connections is actually not the same as the community information from taxi companies. On investigating the reasons for the poor performance of DSHRINK, we found that 4.5% of taxi drivers actually make fewer than 5 calls per day and 18% of taxi drivers make more outgoing calls to other communities (in the same company) than to their own community. TODMIS captures not only the calling behavior between drivers, but also their semantic, temporal, spatial and velocity similarities from the actual movement, which are a better indicator of the community than the call records.

7.2 Efficiency evaluation

In Figure 5, we report the total execution time of the clustering algorithm as a function of the number of trajectories. We see that TODMIS’s computational efficiency is close to Multifeature but better than the other two baseline methods. Moreover, TODMIS shows a close-to-linear relationship between execution time and the number of trajectories.

7.3 Sensitivity evaluation

The spatial granularity of trajectory data has an impact on the quality of the clustering result (community discovery). To study this, we run TODMIS on the City scenario with different trajectory (spatial grid) granularities. Specifically, we conduct experiments with different road segment lengths (where each road segment defines a sample point in the trajectory data). As shown in Figure 6, TODMIS is tolerant to different trajectory granularities (different road segment lengths). More specifically, the results for segment lengths of 200 meters and 400 meters are very similar, but the precision begins to decrease as the resolution gets coarser (segment lengths of 600 meters and higher). Intuitively, we note that 55.4% of the true road segments within our test area have lengths between 200 and 400 meters, implying that larger segment lengths (e.g., 600 meters) cause TODMIS to miss features distinct to each individual true segment. Overall, however, we find that TODMIS is robust to different trajectory granularity values.

7.4 Recommendation evaluation

To evaluate recommendation methods, we select 1,000 objects in the three scenarios and run the different recommendation methods separately. To save space, we only report the evaluation results for the shopping mall, shown in Figure 7; the results in the other scenarios are similar and lead to the same conclusion.

In Figure 7, we report the evaluation results in the shopping mall scenario. Precision is defined as the fraction of objects following our recommendation that are relevant to the search. Recall is defined as the fraction of the objects relevant to those who followed the recommendation that are successfully retrieved. The F1 score can be easily calculated from precision and recall, and we omit it in this paper to save space. TODMIS in Figure 7 refers to our online recommendation method in Section 6.

Based on (a) and (b) in Figure 7, we conclude that TODMIS provides better recommendations than the two baseline methods. Note that the recommendation results of TODMIS improve as the number of objects grows, whereas those of the two baseline methods get worse. The reason is that TODMIS is designed on the observation that the more similar the trajectories of objects, the more similar their interests, while SRW and FUSE do not model the trajectory information and environment information very well. This result also confirms the correctness of the above observation. In Figure 7 (c), TODMIS takes much less time than the two baseline methods during the recommendation process, and the difference becomes more pronounced as the number of objects increases. The reason is that our method reuses the information learned from one set of objects’ trajectories for the recommendation of another set of objects, while the two baseline methods have no such merit. Even though the time cost grows slowly with the number of objects, the recommendation for each object can be computed in very little time, which is feasible for online recommendation. In summary, our method works very well in the recommendation process.


Fig. 8. Trajectory community analysis. Voronoi cells inside the icon encode the activity type, and the cell size encodes the number of records of an activity. The outer bar presents the counts of different activities on weekends; the inner bar shows the activity counts on weekdays. (Labels visible in the figure: Supper, Lunch.)

7.5 Findings and discussions

Trajectory community analysis: Based on our retrieved trajectory communities, we can analyze objects’ social behaviors. More specifically, in the Mall scenario, we noted that more than 80% of the identified communities (groups) spent more than one hour on the specific level of the shopping mall and visited more than five stores. Among them, 95% traversed together, i.e., visited the same stores concurrently. Additionally, if we analyze their activity distribution, as shown in Figure 8, most of their social behavior (time spent together) was spent on lunch and dinner (orange and red areas). Of course, this distribution is undoubtedly affected by the distribution of stores in the mall: on the specific floor, the majority of the stores are fashion (clothes), beauty (hair and nail) and shoes/bags stores, and only three are restaurants and cafes. However, the majority of social behaviors still happened in the restaurants and cafes.

It should be clear that the ability to identify groups, and their preferred activities, offers significant opportunities for targeted mobile advertising. To understand this further, Table 1 reports a preliminary analysis of shopping behavior and preferences, segmented by gender and group vs. individual characteristics. Expense is computed via a survey where each individual/group provided a list of items they were planning to buy during their visit. Based on the 16 groups of customers with full background information, we find that 1) in general, a group of customers stays longer, visits more stores and spends more money than a single customer, and 2) while the mixed groups (male and female together) stay for a shorter duration, they spend more money and visit more stores. Clearly, these preliminary results provide evidence that mobile advertisements should be carefully targeted to differentiate between individuals and groups (and different group types).
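The first finding can be checked with simple arithmetic on the Table 1 values. The sketch below takes unweighted means over the three community rows and the two single-customer rows; since Table 1 does not report how many customers fall in each row, the unweighted average is an assumption of this illustration, and the names `TABLE_1` and `column_means` are hypothetical.

```python
# Rows of Table 1 as (Time, Expense, Variety).
TABLE_1 = {
    "Female community": (2.4, 194.4, 3.7),
    "Male community": (0.7, 33.2, 2.3),
    "Mixed community": (2.0, 284.7, 5.8),
    "Single female": (1.8, 26.1, 1.4),
    "Single male": (0.3, 46.3, 2.7),
}


def column_means(rows):
    """Unweighted mean of each column over the named Table 1 rows."""
    values = [TABLE_1[name] for name in rows]
    return tuple(sum(column) / len(values) for column in zip(*values))


groups = column_means(["Female community", "Male community", "Mixed community"])
singles = column_means(["Single female", "Single male"])

# True when the community rows exceed the single rows on every column.
groups_exceed_singles = all(g > s for g, s in zip(groups, singles))
```

Under this unweighted reading, the community rows average higher than the single-customer rows on all three columns, and the mixed community row has the largest Expense and Variety values while its Time value is below that of the female community row.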

For the City scenario, we noticed that around 14% of the taxis cannot be clustered; we call these “single taxis”. Compared with the single taxis, the drivers belonging to a group earned 17%±2% more revenue, while traveling 18%±2% less distance (without a fare). Thus, the community can help drivers earn more revenue and reduce cost (e.g., gas and depreciation).

Outliers in trajectories: Since trajectories are collected

TABLE 1
Shopping behavior analysis. Time is the duration spent, Expense is the money spent and Variety is the number of stores visited.

Composition        Time   Expense   Variety
Female community   2.4    194.4     3.7
Male community     0.7    33.2      2.3
Mixed community    2.0    284.7     5.8
Single female      1.8    26.1      1.4
Single male        0.3    46.3      2.7

from real-life scenarios, it is obvious that some outliers will be included in the database. These outliers behave differently from all the other trajectories within the detected communities, so they cannot be included in any of them. For example, among the taxi drivers’ trajectories, a taxi may encounter a traffic accident and stay at a certain location for a very long time to wait for the traffic police, or some drivers may interrupt their business due to an unpredictable emergency. These trajectories behave differently and are dissimilar to other trajectories in all the information sources. Therefore, they are assigned to the “background” by our algorithm, and the average similarity of the background is very low (usually less than 0.2).
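A minimal sketch of how low-similarity groups could be separated from communities is given below. It uses the reported value of 0.2 as an illustrative cutoff; the paper states this value as an observation about the background, not as a parameter of the algorithm, so the thresholding rule and the names `average_similarity` and `split_background` are assumptions.

```python
def average_similarity(similarity, members):
    """Mean pairwise similarity among `members` (indices into a square matrix)."""
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    if not pairs:
        return 0.0
    return sum(similarity[i][j] for i, j in pairs) / len(pairs)


def split_background(similarity, subgraphs, threshold=0.2):
    """Split candidate sub-graphs into communities and background.

    A sub-graph whose average pairwise similarity falls below `threshold`
    is treated as background (illustrative rule, not the paper's procedure).
    """
    communities, background = [], []
    for members in subgraphs:
        if average_similarity(similarity, members) >= threshold:
            communities.append(members)
        else:
            background.append(members)
    return communities, background


# Hypothetical similarity matrix: trajectories 0 and 1 are close, 2 and 3 are not.
S = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.05],
    [0.1, 0.1, 0.05, 1.0],
]
communities, background = split_background(S, [[0, 1], [2, 3]])
```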

Automatically learning kernel weights: Besides being set manually, the weights for different kernels can be learned automatically towards a certain objective. For example, given labels describing whether trajectories belong to the same community or to different communities, we can learn the kernel weights to maximize the performance of community discovery. In the future, we plan to apply techniques such as metric learning on multiple kernels [43] to learn similarity weights better.
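As a rough illustration of learning weights from pairwise labels, the sketch below runs an exhaustive grid search over weight vectors that sum to 1 and keeps the one that best separates same-community pairs from different-community pairs. This is a stand-in for illustration only, not the metric learning approach of [43]; the 0.5 decision threshold, the grid resolution and all function names are assumptions.

```python
from itertools import product


def simplex_grid(num_kernels, steps):
    """Yield all weight vectors with entries k/steps that sum to 1."""
    for combo in product(range(steps + 1), repeat=num_kernels):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def pair_accuracy(kernels, weights, labeled_pairs, threshold=0.5):
    """Fraction of labeled pairs (i, j, same_community) classified correctly
    by thresholding the weighted sum of kernel similarities."""
    correct = 0
    for i, j, same in labeled_pairs:
        sim = sum(w * k[i][j] for w, k in zip(weights, kernels))
        correct += (sim >= threshold) == same
    return correct / len(labeled_pairs)


def search_weights(kernels, labeled_pairs, steps=4):
    """Return the grid point with the highest pair accuracy (first one on ties)."""
    return max(simplex_grid(len(kernels), steps),
               key=lambda w: pair_accuracy(kernels, w, labeled_pairs))


# Hypothetical kernels: each one alone misjudges half of the labeled pairs.
k1 = [[1, 0.9, 0.1, 0], [0.9, 1, 0, 0.7], [0.1, 0, 1, 0.2], [0, 0.7, 0.2, 1]]
k2 = [[1, 0.3, 0.7, 0], [0.3, 1, 0, 0.1], [0.7, 0, 1, 0.9], [0, 0.1, 0.9, 1]]
labels = [(0, 1, True), (2, 3, True), (0, 2, False), (1, 3, False)]
best = search_weights([k1, k2], labels)
```

The grid search scales exponentially with the number of kernels, so it is only practical for a handful of kernels such as the four used here.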

Noise tolerance and heterogeneity: Noise is ubiquitous due to inaccurate signal sensing and localization. In our algorithm, the risk of model degradation brought by noise can be alleviated by the weighted combination of the information in multiple kernels. For example, if the noise level of the spatial kernel Kp is unexpectedly high, its influence can be reduced by decreasing its weight, and the missing information can be complemented by other kernels such as the semantic kernel. Moreover, the trajectories are collected from different people in real life; hence, the community behavior can be influenced by many factors, such as age, nationality, culture and gender. Therefore, the compactness and average similarity levels are not consistent from community to community. Our proposed method is robust to this kind of heterogeneity, since we can choose the sub-graphs with top-ranked average similarity levels as the discovered communities, or simply choose the top c′ sub-graphs where there is a significant drop between the average similarity levels of the c′-th and (c′ + 1)-th sub-graphs.
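The second selection rule can be sketched as follows. The sketch interprets "significant drop" as the largest gap between consecutive ranked average similarity levels; that interpretation, the function name `select_by_largest_drop` and the sample values are assumptions for illustration.

```python
def select_by_largest_drop(avg_similarities):
    """Return c' such that the gap between the c'-th and (c'+1)-th ranked
    average similarity levels is the largest one in the ranking."""
    ranked = sorted(avg_similarities, reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    drops = [ranked[k] - ranked[k + 1] for k in range(len(ranked) - 1)]
    return drops.index(max(drops)) + 1


# Hypothetical average similarity levels of five candidate sub-graphs.
levels = [0.31, 0.82, 0.75, 0.28, 0.78]
c_prime = select_by_largest_drop(levels)  # keeps the top 3 sub-graphs here
```

A fixed minimum gap could be used instead of the largest gap when several comparable drops are present.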

Community detection on moving objects: Compared to the Campus and Mall scenarios, community detection for the City scenario is different from several perspectives. First, taxis are located by GPS, which is not as accurate as indoor localization by Wi-Fi. Therefore, location correction by weighted soft assignment is particularly important in taxi trajectory processing. Second, different taxi drivers traverse the road network at different speeds, usually resulting in a very large speed range. Thus the velocity information is more


discriminative than other cues. Third, taxi drivers do not traverse the roads in groups (unlike students on campus or shoppers in the mall). These observations indicate that the spatial information should be combined with other cues to produce a more robust behavior similarity measurement.

8 RELATED WORK

Community detection: Communities in networks/graphs are groups of vertices within which connections are dense, but between which connections are sparser. There are mainly four types of methods [31], [17], [36], [13]: hierarchical clustering, similarity on edge betweenness scores, counts of short loops and voltage differences in resistor networks. These methods focus on detection given a network structure and social-link distance between nodes, which are hard to capture from trajectories. If we construct such input based on spatial and temporal information, the detection results are not promising (Figure 3 (d) and Figure 4 (d)).

Trajectory clustering: Trajectory clustering is a diverse research area, with significant diversity in clustering measures and end goals. Trajectories have been studied using a variety of measures, including efficient search for activity trajectories [49], probability function of time [8], density-based distance function [10], uncertainty measurement of trajectories [35] and convoy query using trajectory simplification [14]. Different similarity measures (time and location distances) and clustering methodologies have their strengths and weaknesses [29]. In contrast to most prior work, our method is able to handle multiple information sources (not just movement trajectories but also the semantics of the underlying space) and apply a general metric-based learning framework to the clustering problem. Trajectory-based clustering has been used for different broad objectives, such as discovering common sub-trajectories [16], identifying spatial structures [32], semantic region modeling [44], moving object clustering with non-consecutive timestamps [20], trajectory pattern modelling of various group incidents with high efficiency [50], as well as application-specific objectives such as vehicle motion analysis [19] and discovering object groups that travel together (the objects spatially close at a snapshot). Recently, a set of papers has studied moving-together pattern discovery in trajectories [53], [51] and keyword search in semantic trajectories [48]. Other work on trajectory data has studied phenomena such as traffic accidents or other outlier activity [26], evaluating the overall health of a road network design [52], and providing dynamic taxi ride sharing service [27]. But such work is based purely on spatial locations, making it hard to extend it to incorporate semantic, velocity or other information that may contain distinctive markers of real community interaction.

Spatial and semantic information analysis: There are three types of approaches for addressing semantic analysis of trajectories: 1) segmentation-based methods, which segment a scene into semantic regions [45]; 2) conceptual models, which structure movement data into countable semantic units [38]; 3) segmentation and annotation-based approaches, which transform the raw mobility data into stops and moves [46]. But the previous analyses do not consider multiple information sources and are difficult to generalize to incorporate additional similarity metrics. Our approach is different from not only the previous techniques, but also the problem of finding semantically similar individuals, who may or may not be visiting the same semantic sites within the same time period (for purposes such as profile-building of individuals). Furthermore, on the feature design for trajectory analysis, previous approaches usually extract either individual information from the trajectory data or long-term movement trends, but not both. For example, GAK [6] measures the similarity between trajectories by applying the dynamic time warping kernels to trajectories. On the other hand, the global statistical information measures the trend and probability of how a user may act, and is more robust to noise and missing information in individual trajectories. Thus both the global statistical and individual information are needed jointly to model the complicated inter-connection of trajectories.
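The alignment idea behind dynamic time warping can be illustrated with the classic DTW recurrence. Note that this is plain DTW over sequences of 2-D points with Euclidean step costs, given only as a sketch; it is not the global alignment kernel of [6], which sums over all alignments rather than keeping only the cheapest one. The function name `dtw_distance` and the sample trajectories are hypothetical.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic dynamic time warping cost between two sequences of (x, y) points."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment cost of the first i and j points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat a point of traj_b
                                    cost[i][j - 1],      # repeat a point of traj_a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# The same path sampled at different rates aligns at zero cost.
same_path = dtw_distance([(0, 0), (1, 0), (2, 0)],
                         [(0, 0), (1, 0), (1, 0), (2, 0)])
# A parallel path offset by one unit accumulates one unit per aligned point.
shifted = dtw_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)])
```

Such a measure captures individual trajectory shape only, which is why the text argues that it should be complemented by global statistical information.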

Behavior-based recommendation: Link, content and location can be viewed as the results of users’ different behaviors, but little previous work builds trajectory community models to provide online recommendation. In recommender systems, behavior models have been proposed for different purposes, such as effects of behavior monitoring and perceived system benefit [33], encapsulation of user behavior and news content [18], effect of context-aware recommendations on customer purchasing behavior and trust [11], and utility query recommendation by mining users’ search behaviors [54]. Considering the behavior heterogeneity in our work, it is hard to directly employ the previous approaches for comprehensive behavior modeling of users’ desires. Our behavior modeling is based on heterogeneous information fusion, which jointly considers attributes in trajectories and contents in the environment in a unified behavior model.

9 CONCLUSION

In this paper, we proposed TODMIS, a trajectory-based community detection technique consisting of three phases: 1) modeling semantic information of locations, temporal, spatial and velocity information with kernels; 2) creating a unified similarity measure via a multi-modal diffusion process and weighted combination of multiple information markers; 3) identifying communities via dense sub-graph mining. Furthermore, we proposed an online recommendation method based on trajectory communities. Extensive experiments were conducted on three real-life data sets: customers in a shopping mall, students in a campus building and taxi drivers in a city. The results demonstrated that TODMIS and our online recommendation method outperformed other clustering algorithms in detecting the correct groups and other recommendation methods in recommending information to targets from different trajectory data.

Acknowledgments. This research was supported by National Basic Research Program of China (973 Program): 2012CB316400, National Natural Science Foundation of China: 61572488, 61672497 and 61303160, and Basic Research Program of Shenzhen: JCYJ20140610152828686. The work of S. Liu was also supported by the Google Faculty Research Award. The authors thank Ramayya Krishnan, Kasthuri Jayarajah, Archan Misra, Stephen E. Fienberg and Lionel Ni for valuable discussions and support.


REFERENCES

[1] N. Anjum and A. Cavallaro. Multifeature object trajectory clustering for video analysis. IEEE TCSVT, 2008.

[2] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM’11, pages 635–644, 2011.

[3] M. M. Bronstein and K. Glashoff. Heat kernel coupling for multiple graph analysis. arXiv:1312.3035v1, 2013.

[4] M. M. Bronstein, K. Glashoff, and T. A. Loring. Making laplacians commute. arXiv:1307.6549, 2013.

[5] W. Chen, W. Hsu, and M.-L. Lee. Making recommendations from multiple domains. In KDD’13, pages 892–900, 2013.

[6] M. Cuturi. Fast global alignment kernels. In ICML’11.

[7] D. Eynard, K. Glashoff, M. Bronstein, and A. Bronstein. Multimodal diffusion geometry by joint diagonalization of laplacians. arXiv:1209.2295, 2012.

[8] S. Gaffney and P. Smyth. Trajectory clustering with mixtures of regression models. In SIGKDD’99.

[9] Y. Ge, H. Xiong, Z. Zhou, H. Ozdemir, J. Yu, and K. C. Lee. Top-eye: top-k evolving trajectory outlier detection. In CIKM’10.

[10] F. Giannotti, M. Nanni, D. Pedreschi, F. Pinelli, C. Renso, S. Rinzivillo, and R. Trasarti. Unveiling the complexity of human mobility by querying and mining massive trajectory data. VLDB J., 2011.

[11] M. Gorgoglione, U. Panniello, and A. Tuzhilin. The effect of context-aware recommendations on customer purchasing behavior and trust. In RecSys’11, pages 85–92, 2011.

[12] D. Horowitz and S. D. Kamvar. The anatomy of a large-scale social search engine. In WWW’10.

[13] J. Huang, H. Sun, J. Han, H. Deng, Y. Sun, and Y. Liu. Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In CIKM’10.

[14] H. Jeung, M. L. Yiu, X. Zhou, C. S. Jensen, and H. T. Shen. Discovery of convoys in trajectory databases. PVLDB’08, pages 1068–1080, 2008.

[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR (2)’06.

[16] J.-G. Lee, J. Han, and K.-Y. Whang. Trajectory clustering: a partition-and-group framework. In SIGMOD’07.

[17] J. Leskovec, K. J. Lang, and M. W. Mahoney. Empirical comparison of algorithms for network community detection. In WWW’10.

[18] L. Li and T. Li. News recommendation via hypergraph learning: encapsulation of user behavior and news content. In WSDM’13, pages 305–314, 2013.

[19] X. Li, W. Hu, and W. Hu. A coarse-to-fine strategy for vehicle motion trajectory clustering. In ICPR (1)’06.

[20] Z. Li, B. Ding, J. Han, and R. Kays. Swarm: Mining relaxed temporal moving object clusters. PVLDB’10, pages 723–734, 2010.

[21] W. Lin, X. Kong, P. S. Yu, Q. Wu, Y. Jia, and C. Li. Community detection in incomplete information networks. In WWW’12.

[22] H. Liu, L. J. Latecki, and S. Yan. Fast detection of dense subgraphs with iterative shrinking and expansion. IEEE TPAMI, 2013.

[23] S. Liu, S. Wang, K. Jayarajah, A. Misra, and R. Krishnan. Todmis: mining communities from trajectories. In CIKM’13, pages 2109–2118, 2013.

[24] S. Liu, Y. Yue, and R. Krishnan. Adaptive collective routing using gaussian process dynamic congestion models. In KDD’13, pages 704–712, 2013.

[25] S. Liu, Y. Yue, and R. Krishnan. Non-myopic adaptive route planning in uncertain congestion environments. IEEE TKDE, 2015.

[26] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and X. Xing. Discovering spatio-temporal causal interactions in traffic data streams. In SIGKDD’11.

[27] S. Ma, Y. Zheng, and O. Wolfson. T-share: A large-scale dynamic taxi ridesharing service. In ICDE’13.

[28] O. Martinez Mozos et al. Semantic place labeling with mobile robots. PhD thesis, University of Freiburg, Germany, 2008.

[29] B. Morris and M. M. Trivedi. Learning trajectory patterns by clustering: Experimental studies and comparative evaluation. In CVPR’09.

[30] M. Nanni and D. Pedreschi. Time-focused clustering of trajectories of moving objects. J. Intell. Inf. Syst., 2006.

[31] M. Newman. Detecting community structure in networks. EPJ B, 2004.

[32] R. T. Ng and J. Han. Clarans: A method for clustering objects for spatial data mining. IEEE TKDE, 2002.

[33] M. Nowak and C. Nass. Effects of behavior monitoring and perceived system benefit in online recommender systems. In CHI’12, pages 2243–2246, 2012.

[34] M. Ovsjanikov, Q. Merigot, F. Memoli, and L. Guibas. One point isometric matching with the heat kernel. Computer Graphics Forum, 29(5):1555–1564, 2010.

[35] N. Pelekis, I. Kopanakis, E. E. Kotsifakos, E. Frentzos, and Y. Theodoridis. Clustering uncertain trajectories. Knowl. Inf. Syst., 2011.

[36] C. Shi, P. S. Yu, Y. Cai, Z. Yan, and B. Wu. On selection of objective functions in multi-objective community detection. In CIKM’11.

[37] V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularization approach to semi-supervised learning with multiple views. In ICML Workshop on Learning with Multiple Views, 2005.

[38] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. F. de Macedo, F. Porto,and C. Vangenot. A conceptual view on trajectories. Data Knowl. Eng.,2008.

[39] X. Su and T. M. Khoshgoftaar. A survey of collaborative filteringtechniques. Advances in AI, 2009.

[40] L. A. Tang, Y. Zheng, J. Yuan, J. Han, A. Leung, C.-C. Hung, and W.-C.Peng. On discovery of traveling companions from streaming trajectories.In ICDE’12.

[41] W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. InICDM’09, 2009.

[42] M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similarmultidimensional trajectories. In ICDE’02.

[43] J. Wang, H. Do, A. Woznica, and A. Kalousis. Metric learning withmultiple kernels. In NIPS’11.

[44] X. Wang, K. T. Ma, G. W. Ng, and W. E. L. Grimson. Trajectory analysisand semantic region modeling using nonparametric hierarchical bayesianmodels. IJCV, 2011.

[45] X. Wang, K. Tieu, and E. Grimson. Learning semantic scene models by trajectory analysis. In ECCV (3)’06.

[46] Z. Yan, D. Chakraborty, C. Parent, S. Spaccapietra, and K. Aberer. Semantic trajectories: Mobility data computation and annotation. ACM TIST, 2012.

[47] Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient subgraph similarity search on large probabilistic graph databases. PVLDB’12, pages 800–811, 2012.

[48] B. Zheng, N. Yuan, K. Zheng, X. Xie, S. Sadiq, and X. Zhou. Approximate keyword search in semantic trajectory database. In ICDE’15, pages 975–986, 2015.

[49] K. Zheng, S. Shang, N. Yuan, and Y. Yang. Towards efficient search for activity trajectories. In ICDE’13, pages 230–241, 2013.

[50] K. Zheng, Y. Zheng, N. J. Yuan, and S. Shang. On discovery of gathering patterns from trajectories. In ICDE’13, pages 242–253, 2013.

[51] K. Zheng, Y. Zheng, N. J. Yuan, S. Shang, and X. Zhou. Online discovery of gathering patterns over trajectories. IEEE TKDE, 2014.

[52] Y. Zheng, Y. Liu, J. Yuan, and X. Xie. Urban computing with taxicabs. In UbiComp’11.

[53] Y. Zheng, N. J. Yuan, K. Zheng, and S. Shang. On discovery of gathering patterns from trajectories. In ICDE’13, pages 242–253, 2013.

[54] X. Zhu, J. Guo, X. Cheng, and Y. Lan. More than relevance: high utility query recommendation by mining users’ search behaviors. In CIKM’12, pages 1814–1818, 2012.

Siyuan Liu is an assistant professor at the Smeal College of Business, Pennsylvania State University. He received his first Ph.D. from the Department of Computer Science and Engineering at Hong Kong University of Science and Technology, and his second Ph.D. from the University of Chinese Academy of Sciences. His research interests include trajectory mining, social behavior analytics, and mobile marketing.

Shuhui Wang received the B.S. degree from Tsinghua University in 2006 and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, in 2012. He is an associate professor with the Institute of Computing Technology, Chinese Academy of Sciences. He is also with the Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences. His research interests include large-scale Web data mining, visual semantic analysis, and machine learning.