
Noname manuscript No. (will be inserted by the editor)

CAPS: Context Aware Personalized POI Sequence Recommender System

Ramesh Baral and Tao Li and XiaoLong Zhu

Abstract The revolution of World Wide Web (WWW) and smart-phone technologies has been the key factor behind the remarkable success of social networks. With the ease of availability of check-in data, location-based social networks (LBSN) (e.g., Facebook1, etc.) have been heavily explored in the past decade for Point-of-Interest (POI) recommendation. Though many POI recommenders have been defined, most of them have focused on recommending a single location or an arbitrary list that is not contextually coherent. It has been cumbersome to rely on such systems when one needs a contextually coherent list of locations that can be used for various day-to-day activities, e.g., itinerary planning.

This paper proposes a model termed CAPS (Context Aware Personalized POI Sequence Recommender System) that generates contextually coherent POI sequences relevant to user preferences. To the best of our knowledge, CAPS is the first attempt to formulate contextual POI sequence modeling by extending the Recurrent Neural Network (RNN) and its variants. CAPS extends RNN by incorporating multiple contexts into the hidden layer and by incorporating the global context (sequence features) into the hidden layers and output layer. It extends the variants of RNN (e.g., Long Short-Term Memory (LSTM)) by incorporating multiple contexts and global features in the gate update relations. The major contributions of this paper are: (i) it models the contextual POI sequence problem by incorporating personalized user preferences through multiple constraints (e.g., categorical, social, temporal, etc.), (ii) it extends RNN to incorporate the contexts of individual items and of the whole sequence, and also extends the gated functionality of the variants of RNN to incorporate the multiple contexts, and (iii) it evaluates the proposed models against two real-world data sets.

Keywords Information Retrieval, Context Aware Recommendation, POI Recommendation, Social Networks

1 INTRODUCTION

The POI recommendation systems have been very popular in the last few years. Such systems exploit explicit historical check-in data and some implicit aspects or contexts to generate a list of POIs that are of potential interest to users. Generally, most of the POI recommenders recommend a single POI or an arbitrary list of POIs (Yuan et al. (2013); Zhang and Chow (2015)) that satisfy the personalized user preferences but may not be contextually coherent. Often, users need a list of items that are contextually coherent as per their preferences. Such a coherent list of recommendations can be formulated as a sequence modeling problem where the sequence of items needs to adhere to the users' preferences by addressing constraints that are applicable to the consecutive items and also to the whole sequence. For instance, in the case of itinerary recommendation, the distance between consecutive places, total trip time, tentative start/end time, start/end place/category, etc. can be the major concerns.

In comparison to general item recommendation, POI recommendation is more challenging as it is sensitive to various aspects (e.g., categorical, social, spatial, temporal, etc.) (Yuan et al. (2013); Zhang and Chow (2015); Baral

Ramesh Baral
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
E-mail: rbara012@fiu.edu

Tao Li
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
E-mail: taoli@cs.fiu.edu

XiaoLong Zhu
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
E-mail: xzhu009@fiu.edu

1 www.facebook.com

arXiv:1803.01245v1 [cs.IR] 3 Mar 2018


and Li (2016); Baral et al. (2016)) and the impact of those aspects might vary with the preference of the user (Baral and Li (2017)). The POI sequence recommendation becomes even more interesting and challenging for the following major reasons:

1. brute-force approaches that compute all permutations of the POIs amount to an NP-hard problem and are not useful,
2. deriving a preferred list from an arbitrary list may not be optimal because the preferred list needs to satisfy multiple constraints; for instance, the places in a trip should be contextually coherent and need to satisfy many constraints, such as spatial (near/far places), temporal (relevant places for a time of day, for instance bars can be relevant in the evening or at night), social (the trip might be family or friend focused), the time budget constraint of the traveler, start and end POIs, etc.,
3. the contexts can be user dependent and can vary dynamically in real time (e.g., the same context may not always be relevant to a user).

There are few studies that have focused on POI sequence recommendation. Most of the existing systems focus on either frequency-based user interest or average visit durations, but the incorporation of major constraints (such as the distance between consecutive POIs, social constraints, categorical constraints, and temporal constraints) is less explored and is of great interest in the recommendation community. This paper attempts to fill the gap by incorporating multiple constraints for POI sequence modeling.

We can envision some implicit relevance between POI sequence modeling and word sequence modeling. First, the smallest elements (check-ins and words) have some contextual coherence with their neighbors. Second, most of the subsequences (check-ins made within a time duration, and a sentence) comprise contextually coherent elements and also have contextual relations with their neighbors (check-ins of the preceding/succeeding time duration, and neighboring sentences). Third, the ordered sequence of elements (chronological check-ins of a user, and a text) follows some pattern which can be modeled to learn the relation between subsequences (e.g., predict the next check-in and predict the next word).

Besides the aforementioned relevances, contextual POI sequence modeling is more challenging due to the following reasons: (i) POI sequence modeling can have a longer contextual impact (e.g., unlike the language model where n-grams can capture most of the context, the set of check-ins can be unique every day) and must adhere to personalized preferences, (ii) unlike language modeling tasks (where there is mostly a one-to-one mapping between the input and output items, e.g., language translation problems), POI sequence modeling can have a many-to-many relation (e.g., depending on the context, a user who has visited a set of locations can select multiple destinations). Sometimes the POI sequences can repeat (for instance, a user can depict the same check-in pattern after some time interval), which is rare in language modeling (e.g., in next sentence prediction, the input sentences are mostly unique), and (iii) the real-time contexts can complicate the problem formulation using traditional sequence models.

Inspired by the wide popularity of RNN and its variants in sequence modeling (e.g., language modeling (machine translation) and image caption generation) (Mikolov et al. (2010); Mikolov and Zweig (2012); Ghosh et al. (2016); Bengio et al. (2003)), we attempt to cope with the challenges of contextual POI sequence modeling by incorporating the local contexts (context valid for a subsequence) and global contexts (context valid for the whole sequence). The local contexts (referred to as context from now on in the paper) are incorporated into the recurrent module and the global contexts (referred to as features of the sequence from now on in the paper) are fed to all the layers in the network. We also extend the gating mechanism of LSTM to incorporate the context and feature for personalized POI sequence modeling. To the best of our knowledge, the proposed model is the first one to address multi-context POI sequence modeling using extended RNN and its variants.

The major contributions of this paper are: (i) it introduces all the major personalized user constraints (such as temporal, categorical, spatial, etc.) as influential contexts in POI sequence modeling, (ii) it extends RNN and its variants to explicitly incorporate the relevant contexts, and (iii) it evaluates the proposed models against two real-world datasets.

2 RELATED RESEARCH

We categorize the relevant studies using two broad notions - general data mining-based studies, and neural network-based studies.

2.1 General Data Mining Approaches

Most of the existing studies were focused on the analysis of landmark visit patterns by exploiting the geo-spatial and temporal traces from check-ins. However, those studies mostly focused on the analysis only and ignored the synthesis or recommendation of new paths. The Orienteering approach from Feillet et al. (2005) (a variant of the Traveling Salesman Problem) was the dominating technique for earlier studies. This approach was focused on the objective of finding a

subset of places that could maximize the reward under some constraints, such as a maximum travel cost (e.g., the time limit is not exceeded). The other widely exploited techniques were collaborative filtering (Zhang et al. (2015); Yu et al. (2016)), content-based approaches (Bohnert et al. (2012)), the Apriori principle (Lu et al. (2012); Yu et al. (2016)), matrix factorization (Ge et al. (2011)), topic modeling (Liu et al. (2011, 2014)), and tree traversal approaches (Zhang et al. (2015)). Dunstall et al. (2003) proposed a trip planner that matched explicit user inputs against a database to get the best matching itineraries. Tai et al. (2008) mined itinerary data and generated sequences by implicitly incorporating users' interests via the images on Flickr2. However, they did not focus on constructing an itinerary from scratch and also did not address major aspects, such as the temporal influence on users and locations.

Fig. 1: Check-in distribution in the Weeplaces (a) and Gowalla (b) datasets.

A content-based filtering approach from Bohnert et al. (2012) exploited visitors' explicit ratings to recommend tours in a museum. Unlike in a real trip, the order of recommended items and the distance between consecutive items were not considered in this system. The greedy algorithm-based approaches (De Choudhury et al. (2010); Vansteenwegen et al. (2011)) matched user profiles (consisting of preferred location categories, types, start/end time, and keywords) to relevant places. Lu et al. (2012) mined user check-in data and exploited an Apriori-based algorithm. Though it was computationally intensive, it was able to find optimal trips under multiple constraints, such as social relationship, temporal property, and categorical diversity, simultaneously, using parallel computing. Zhang et al. (2015) exploited the uncertain travel time between two POIs and the POI availability constraints and proposed a tree-based algorithm to solve the personalized trip recommendation problem. A recent study from Wang et al. (2016) proposed a personalized trip recommender that was able to handle crowd constraints (i.e., avoid peak hours of POIs) by extending the Ant Colony Optimization algorithm. Zheng and Xie (2011) proposed a HITS-based inference model to evaluate the degrees of user experiences and location interests based on GPS trajectory data.

A heuristic-based algorithm from Chen et al. (2015) exploited user preferences and traveling times based on different traffic conditions using trajectory patterns derived from taxi GPS traces. The heuristic search-based travel route planning system from Yu et al. (2016) exploited only user preferences and spatio-temporal constraints. Jiang et al. (2016) proposed a ranking-based system to recommend personalized travel sequences in different seasons by merging textual data and viewpoint information extracted from images. However, they did not use the constraints for the social and temporal preferences of users, and the temporal popularity of places. Another recent study from Lim et al. (2017) proposed time-based user interest and demonstrated advantages over the use of frequency-based popularity measures. They exploited geo-tagged images and incorporated contexts, such as visit duration, users' preferences, and start and end points. However, their model did not address the categorical, temporal, and social constraints on the POI sequence. A probabilistic model from Chen et al. (2016) incorporated POI categories and user behavior history to generate trajectories. They used Rank-SVM to rank the items and used a Markov model to predict the transition between POIs.

Most of the existing studies have exploited only a few contexts and have focused on personalized POI visit durations. Unlike others, our study exploits multiple contexts (temporal, spatial, categorical, and social) to generate the POI sequence. In comparison to the existing studies, this paper presents the following major differences: (i) it exploits multiple contexts (temporal, spatial, categorical, and social) to model the contextual POI sequence problem, (ii) it extends the RNN by incorporating multiple contexts to efficiently model the personalized contexts in POI sequence generation, and extends the gating mechanism of the variants of RNN to incorporate multiple contexts for efficient POI sequence modeling, and (iii) it evaluates the generated sequences with various relevant metrics (e.g., pairs-F1 score, diversity, displacement (see Section 4.2 for details)) on two real-world datasets.

2 https://www.flickr.com


2.2 Neural Network-based Approaches

Recurrent Neural Networks and their variants (e.g., Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), etc.) have been quite effective in sequence modeling problems (for instance, speech recognition and machine translation) (Mikolov and Zweig (2012); Twardowski (2016); Sutskever et al. (2011)) as they can exploit the dependencies between spatially correlated items (such as words in a sentence and pixels in images). Though some of the neural network-based models focused on predicting trajectories, they were more focused on collision avoidance between humans (Alahi et al. (2016)). The Social LSTM from Alahi et al. (2016) extended the approach from Graves (2013) by incorporating the spatial distance between neighboring objects and predicted the trajectory of human movement. The existing models were mainly focused on collision avoidance and ignored the constraints (such as user preferences and POI contexts) relevant to POI sequence generation.

Though RNNs have been quite popular in sequence modeling for image and text, relatively few studies have exploited them in the recommendation domain. A recent study from Wu et al. (2017) used Recurrent Recommender Networks (RRN) to predict future behavioral trajectories. They exploited an LSTM (Hochreiter and Schmidhuber (1997))-based model to capture the user-movie preference dynamics, in addition to a more traditional low-rank factorization. Hidasi et al. (2015) trained a GRU (Cho et al. (2014))-based model with a ranking loss to predict the next item in the user session on an e-commerce system. However, they did not consider personalization. A contextual network from Smirnova and Vasile (2017) used three different techniques (concatenation, multiplication, and both) to integrate the context embedding with the input embedding and was used for next item prediction. Though some neural network-based recommenders exist, none of them focused on multi-context personalized POI sequence modeling.

3 Methodology

In this section, we define our proposed approach. First, we define the basic concepts that are frequently used in the paper and then we elaborate on the incorporation of contexts into RNN and its variants.

Context: A context of a check-in represents the current and previous scenarios which have a direct or indirect influence on the selection of the next POI. Such a context (for instance, the current time, category of the current place, category of the previous place, popularity score of the current place, and so forth) can be represented as a high-dimensional vector.

Context-aware POI sequence: Given a set of contexts C = {c_i}, i ∈ K, where each c_i is a context of type i, our objective is to predict or generate a sequence of POIs that is most relevant to the given contexts and matches the user preferences.

Given a user u who has made n check-ins, we define her travel history as an ordered sequence H_u = (V_1, V_2, ..., V_n), where V_i is a check-in activity and can be represented by the triplet V_i = (l_i, a_i, d_i) indicating the location (l_i) of the check-in, the arrival time (a_i) at l_i, and the departure time (d_i) from l_i respectively. The travel sequences can be split into small subsequences (for instance, the check-ins made within a time interval, e.g., 8 hours, one day, etc.). As in earlier studies (Lim et al. (2015); De Choudhury et al. (2010)), we use a time interval of 8 hours.
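For concreteness, the following is a minimal sketch of how such 8-hour subsequences could be derived from a user's check-in log using Pandas (one of the libraries listed in Section 4.4); the column names and the toy data are illustrative assumptions, not the schema of the actual datasets.

```python
import pandas as pd

def split_into_subsequences(checkins: pd.DataFrame, hours: int = 8):
    """Split one user's chronologically sorted check-ins (assumed columns:
    'location', 'arrival', 'departure') into subsequences whose total time
    span does not exceed `hours`."""
    checkins = checkins.sort_values("arrival")
    window = pd.Timedelta(hours=hours)
    subsequences, current, start = [], [], None
    for row in checkins.itertuples(index=False):
        if start is None or row.arrival - start > window:
            if current:
                subsequences.append(current)
            current, start = [], row.arrival
        current.append((row.location, row.arrival, row.departure))
    if current:
        subsequences.append(current)
    return subsequences

# Toy usage with three check-ins, two of which fall in the same 8-hour window.
df = pd.DataFrame({
    "location": ["l1", "l2", "l3"],
    "arrival": pd.to_datetime(["2011-05-01 09:00", "2011-05-01 12:30", "2011-05-02 10:00"]),
    "departure": pd.to_datetime(["2011-05-01 10:00", "2011-05-01 13:30", "2011-05-02 11:00"]),
})
print(split_into_subsequences(df))  # -> [[('l1', ...), ('l2', ...)], [('l3', ...)]]
```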

Visit duration of POI : The average visit duration (stay time (ST)) of a POI i is defined as:

$$ST(i) = \frac{1}{|U|} \sum_{\substack{u \in U \\ V_u \in H_u,\ V_u.l = i}} \frac{1}{|V_{u,i}|} \sum_{l \in V_{u,i}} \big(a_{l+1} - TT(l, l+1) - a_l\big), \qquad (1)$$

where U is the set of all users, V_{u,l} is the set of visits made by the user u to location l, and TT(a, b) is the travel time between POI a and POI b. We can use a log-normal distribution to compute the travel time between two POIs visited consecutively. The stay time is 0-1 normalized and is represented as ST'(i).


We define the user interest in a place in terms of the aggregate stay time (AST) at that place. This term in turn relies on the visit frequency and the stay time at that place, and the stay time at places of the same category.

$$AST(u, i)_{cat} = (1 - \alpha) * F_{st'}(u, i) + \alpha * G_{st'}(u, i) \sum_{\substack{l \in V_u \\ l.cat = i.cat}} \frac{ST'(l)}{V'_{u,l}},$$

$$F_{st'}(u, i) = \frac{ST'(i)}{V'_{u,i}} \ \text{if}\ |V_{u,i}| > 0, \ \text{i.e.,}\ u \ \text{has some visits to location}\ i; \quad F_{st'}(u, i) = 0 \ \text{otherwise},$$

$$G_{st'}(u, i) = \frac{1}{\sum_{\substack{l \in V_u \\ l.cat = i.cat}} 1} \ \text{if}\ \exists V_{u,l} \wedge l.cat = i.cat, \ \text{i.e.,}\ u \ \text{has some check-ins on category}\ i.cat; \quad G_{st'}(u, i) = 0 \ \text{otherwise}, \qquad (2)$$

where V_u is the set of visits by user u, V_{u,l} is the set of visits by user u to location l, V'_{u,l} = \frac{|V_{u,l}|}{|V_u|} is the normalized visit count of user u to location l, l.cat is the category of location l, and α is a constant tuning factor estimated using the fraction of check-ins that are of the same category as location i. After incorporating the social impact, the average stay time at a location i can be defined as:

$$AST(u, i) = (1 - \psi_1) * AST(u, i)_{cat} + \psi_1 * J(F_u) \sum_{k \in F_u} AST(k, i)_{cat},$$

$$J(F_u) = \frac{1}{|F_u|} \ \text{if}\ |F_u| > 0, \ \text{i.e.,}\ u \ \text{has some friends}; \quad J(F_u) = 0 \ \text{otherwise}, \qquad (3)$$

where F_u represents the set of friends of user u, and ψ_1 is a constant tuning parameter estimated using the fraction of check-ins of the user u that are common to her friends. Similarly, the average stay time of a user u on a location category 'cat' can be defined as:

$$AST(u)_{cat} = (1 - \gamma_1) * \Big(\sum_{\substack{i \in V_u \\ i.cat = cat}} AST(u, i)_{cat}\Big) + \gamma_1 * \Big(\sum_{\substack{j \in F_u,\ k \in V_j \\ k.cat = cat}} AST(j, k)_{cat}\Big), \qquad (4)$$

where γ_1 is a tuning factor estimated using the fraction of check-ins of user u that are common to her friends and have category 'cat'. AST_{cat} is the aggregate of the average stay on the category 'cat' from all users and AST^t_{cat} gives the measure for time t.

Preference score of POI : For a user u, the preference score (PS) of a place l at time t is defined using the followingrelation:

$$PS(u, l, t) = \beta * \Big\{(1 - \theta) * P(u, l) * |V_{u,l,t}| + \theta * Q(u, l) \sum_{\substack{l' \in L \\ l'.cat = l.cat}} \frac{|V_{u,l',t}|}{|V_{u,l'}|}\Big\} + (1 - \beta) * AST(u, l),$$

$$P(u, l) = \frac{1}{|V_{u,l}|} \ \text{if}\ |V_{u,l}| > 0; \quad P(u, l) = 0 \ \text{otherwise},$$

$$Q(u, l) = \frac{1}{\sum_{\substack{l' \in V_u \\ l'.cat = l.cat}} 1} \ \text{if}\ \exists V_{u,l'} \wedge l'.cat = l.cat, \ \text{i.e.,}\ u \ \text{has some check-ins on category}\ l.cat; \quad Q(u, l) = 0 \ \text{otherwise}, \qquad (5)$$

where V_{u,l,t} is the set of visits made by user u to location l at time t, L is the set of all locations, θ can be estimated as in Equation 2, and β is a tuning factor estimated using TF-IDF (term frequency-inverse document frequency) (Salton and McGill (1986)). This relation addresses the trade-off between visit frequency and stay time, which is crucial to reward the preferred check-ins with low frequency but reasonable stay time. The generalized preference score PS(l, t) can be easily derived from the above relation by considering the visit frequencies of all the users and the stay time of all the users at this location at time t. Furthermore, the usage of the categorical and the social contribution can handle the cold-start items (items with no check-in information) and cold-start users (users with no check-ins) to some extent


as the relations defined above incorporate some contributions from categorical and social factors. We also handle the scenario in which places with higher preference scores are compromised due to other constraints, such as travel time and cost. To address the trade-off between constraints and preference score, we define a consolidated preference score as:

$$P(u, l, t) = PS(u, l, t) * \Big(1 - \frac{1}{m} \sum_{i=1}^{m} Constraint_i(l, p)\Big), \qquad (6)$$

where Constraint_i(l, p) is a normalized numeric measure of the i-th constraint between the user's current location p and the target location l. For instance, the spatial constraint is the measure of the distance between locations p and l, which is min-max normalized by the minimum and maximum distance traveled by any user to reach location l from any other location. The above mentioned preference metrics are used as features and attributes (defined later) when we train our prediction model.
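A minimal sketch of the consolidated score of Eq. (6) is given below; the constraint values are assumed to be pre-normalized to [0, 1], and the function and variable names are illustrative.

```python
def consolidated_preference(ps_value, constraints):
    """Consolidated preference score P(u, l, t) of Eq. (6).

    `ps_value` is the preference score PS(u, l, t) of Eq. (5) and
    `constraints` is a list of constraint measures Constraint_i(l, p),
    each already normalized to [0, 1] (e.g., min-max normalized travel
    distance between the current location p and the candidate l).
    """
    if not constraints:
        return ps_value
    penalty = sum(constraints) / len(constraints)   # (1/m) * sum_i Constraint_i(l, p)
    return ps_value * (1.0 - penalty)

# Toy usage: a candidate with PS = 0.8, a fairly distant location (0.6 after
# min-max normalization) and a mild temporal constraint (0.1).
print(consolidated_preference(0.8, [0.6, 0.1]))     # 0.8 * (1 - 0.35) = 0.52
```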

3.1 Network Design

We exploit RNNs because they are well suited to our problem for the following reasons: (i) the POI check-in patterns are relatively casual and vary across users, thus a long-term dependency is required, which can be modeled by RNNs (or their variants like GRU and LSTM); (ii) the check-in trends are associated with many contextual attributes, such as social, categorical, temporal, and spatial. The capability of RNNs to incorporate additional attributes (e.g., Mikolov et al. (2010); Mikolov and Zweig (2012)) is another reason behind selecting them.

Fig. 2: Overview of the proposed system architecture. (a) POI sequence modeling with a basic RNN, where no contextual information is used in the hidden and output layers. (b) Contextual POI sequence modeling with the extended RNN, where the contextual information is explicitly fed to the hidden and output layers.

3.1.1 Preliminaries of proposed Network

RNNs are well suited to the sequence modeling problem because they can efficiently model the generative process of sequential information, can summarize the information in hidden states, and can generate new sequences by using the probability distribution specified by the hidden states. The hidden state h_t from the input sequence (x_1, x_2, ..., x_t) is recursively updated as h_t = f(h_{t-1}, x_t), where f(., .) is a non-linear transformation function (e.g., h_t = tanh(U h_{t-1} + V x_t), where U and V are weight matrices). The overall probability of a sequence \vec{x} = x_1, x_2, ..., x_N can be defined as:

$$p(\vec{x}) = \prod_{t=1}^{N} p(x_t \mid h_{t-1}). \qquad (7)$$

Training such a regular RNN by maximizing the above function suffers from the vanishing gradient problem (i.e., a problem with gradient-based learning methods where the gradient (error signal) decreases exponentially with the value of n (the number of layers) so that the front layers train very slowly, see Hochreiter (1998) for further detail). While architectural enhancements, such as the long short-term memory (LSTM) from Hochreiter and Schmidhuber (1997) and the Gated Recurrent Unit (GRU) from Cho et al. (2014), are believed to address the vanishing gradient problem, the context-enriched models (Mikolov et al. (2010); Mikolov and Zweig (2012)) are also claimed to be an effective solution. We formulate the contextual POI sequence modeling by incorporating the contexts into RNN and its variants.


Fig. 3: Unraveled view of the contextual POI sequence model, representing the propagation of contextual information through different layers of the network.

The basic POI sequence modeling with a regular RNN is illustrated in Figure 2(a). The proposed contextual POI sequence modeling is illustrated in Figure 2(b). The unraveled view of the contextual RNN (Figure 2(b)) is illustrated in Figure 3. The input vector and the output vector both have the dimension of the number of POIs. The input vector w(t) is the check-in at time t (which is a one-hot encoding), and the output layer produces a probability distribution over the POIs, given the context, feature, and the previous check-in. The hidden layers of RNNs are responsible for maintaining the history of check-ins that form the sequence. Inspired by Mikolov and Zweig (2012), we extend the vanilla RNN to incorporate the context and feature inputs (f(t)). The context and the attribute of the current input, the current input itself, and the features representing the whole sequence are fed to the network. The features are propagated to the hidden as well as to the output layers. This helps us retain the important features, of which the network remains aware at each and every iteration (see Figure 2(b)).

3.1.2 Basic POI Sequence Modeling

In this model, we use the regular RNN, which is popular for modeling sequential data. The capability of maintaining the session information in the recurrent states is often considered its strength. Although RNNs are theoretically claimed to be poor candidates for problems with long session relations (due to the vanishing gradient problem), architectural extensions (e.g., LSTM) or contextual extensions (e.g., Mikolov and Zweig (2012)) are among the state-of-the-art techniques in the language modeling domain.

With respect to POI sequence modeling, RNNs can use the probability distribution specified by the hidden state to predict the next POI. The hidden state summarizes the information of the sequence observed so far (e.g., l_1, l_2, ..., l_{t-1}). A hidden layer depends on the current input (x_t) and the state at the previous step (h_{t-1}), i.e., h_t = f(W_{xh} x_t + W_{hh} h_{t-1}), where W_{xh} and W_{hh} are the weights for the current input and for the value from the previous hidden layer respectively. The probability of a POI sequence can then be defined using the chain rule:

$$p(\vec{l}) = \prod_{t=1}^{T} p(l_t \mid h_{t-1}), \qquad (8)$$

where T is the number of timesteps to be considered. The negative log likelihood is used as the sequence loss to train the network:

$$L(x) = -\sum_{t=1}^{T} \log\big(p(l_t \mid h_{t-1})\big). \qquad (9)$$

For the defined network, the partial derivative of the loss can be computed using backpropagation through time (Williams and Zipser (1995)) and the network can be trained using gradient descent. Such a sequence modeling is illustrated in Figure 2(a). This network predicts the next POI by learning from the previously fed POI sequences only and does not consider other information, such as POI attributes, the current context, and other relevant features. The contextual model defined in the next subsection addresses these aspects.
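For illustration, a minimal NumPy sketch of this basic model (the recurrence of this subsection and the loss of Eq. (9)) is given below; the toy vocabulary size, random weights, and hidden dimension are assumptions, and the actual experiments were implemented in TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_poi, n_hidden = 6, 16                      # toy vocabulary of 6 POIs

# Weight matrices of the plain recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1}).
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_poi))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_poi, n_hidden))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def sequence_nll(poi_sequence):
    """Negative log-likelihood of Eq. (9): -sum_t log p(l_t | h_{t-1})."""
    h = np.zeros(n_hidden)
    nll = 0.0
    for prev, nxt in zip(poi_sequence, poi_sequence[1:]):
        x = np.eye(n_poi)[prev]              # one-hot encoding of the previous POI
        h = np.tanh(W_xh @ x + W_hh @ h)     # hidden state update
        p = softmax(W_hy @ h)                # distribution over the next POI
        nll -= np.log(p[nxt])
    return nll

print(sequence_nll([0, 3, 2, 5]))            # loss of one toy check-in sequence
```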


3.1.3 Contextual POI Sequence Modeling using RNN

This model is termed CAPS-RNN and represents the contextual POI sequence modeling as illustrated in Figure 2(b) and Figure 3. CAPS-RNN extends the regular RNN by incorporating the context into the hidden layers and the feature (f(t)) into the hidden and output layers. This extension has the following major advantages: (i) as regular RNNs have no provision for explicit context propagation and as POI sequence modeling requires the contexts to be remembered for a longer span, the extension helps us propagate the relevant contexts to make sure that the next predicted subsequence adheres to the context, (ii) regular RNNs suffer from the vanishing gradient problem (i.e., during backpropagation, the gradients decrease exponentially with the value of n (the number of layers) and the front layers train very slowly, see Hochreiter (1998) for further detail). The explicit feeding of contexts to the hidden layer and of the features to the hidden and output layers helps alleviate the vanishing gradients while retaining the relevant context (see Mikolov and Zweig (2012) for detail).

The feature contains information that is relevant to the current input and also to the whole sequence; for instance, with respect to POI sequence modeling, it can be the time budget constraint, the start/end place categories, the geographical vicinity of interest, etc. These are the auxiliary information that can have a significant impact on the prediction tasks. The context represents the aspects that are relevant to the current input (for instance, the current time and the previous check-ins which impact the current check-in). In addition to the context and feature, the attribute vector, which represents any information that is relevant to the current item (e.g., the location's category, the location's hours, the location's popularity, the location's preference score, and the location's average stay time), is also incorporated into the network. For the sake of simplicity, the attribute and context vectors can be combined together.

After we train the network (using stochastic gradient descent), the output vector y(t) gives the probability measure of different POIs at time t, given the previous POI, the context, and the feature vector. The hidden and output layers are updated using the following relations:

$$c(t) = H\big(U w(t) + W (c(t-1) \odot A(t)) + F f(t)\big), \qquad (10)$$

$$y_t = V c(t) + G f(t), \qquad (11)$$

$$y(t) = g(y_t), \qquad (12)$$

where H(.) is a non-linear function (e.g., tanh), A(t) is the context/attribute vector of the current item at time t (see Eqn. 14), f(t) is the sequence feature vector (see Eqn. 13), and g(a_i) = \frac{\exp(a_i)}{\sum_j \exp(a_j)} is the softmax function. The sequence feature vector contains the information relevant to the whole sequence. We use a feature vector of the following format:

$$f(t) = \langle cat_{start}, cat_{end}, loc_{start}, loc_{end}, loc_{dist}, time_{start}, time_{end} \rangle, \qquad (13)$$

where cat_{start} is the category of the starting place, cat_{end} is the category of the ending place, loc_{start} is the starting place, loc_{end} is the ending place, loc_{dist} is the distance between consecutive places, time_{start} is the starting hour, and time_{end} is the ending hour of the POI sequence. All the non-numeric elements of this vector are label encoded before being fed to the network. The attribute vector contains the information relevant to the current item and the current context. We use the following format for the attribute vector:

$$A(l, t) = \langle ST'(l), AST(l), AST^t_{cat}, PS(l), l.cat, l_T, l_{dist} \rangle, \qquad (14)$$

where l_T = l_1, l_2, ..., l_T is the temporal popularity of location l for each hour, 'cat' = l.cat is the category of location l, and l_{dist} is the distance of location l from the previous location in the sequence. All the non-numeric elements of this vector are label encoded. The feature vector and the attribute vector can be used to incorporate additional features and attributes if required. The network is trained to learn the weight matrices (U, V, W, F, and G) and to maximize the likelihood of the training data (Mikolov (2012); Bengio et al. (2003)). The probability of a POI sequence l for the network can be defined as:

$$p(l) = \prod_{t=1}^{T} p(l_{t+1} \mid y_t). \qquad (15)$$
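The following NumPy sketch illustrates one forward step of CAPS-RNN (Eqs. (10)-(12)). It assumes, for illustration only, that the attribute/context vector A(t) has already been projected to the hidden dimension so that the element-wise product in Eq. (10) is well defined; the sizes and random weights are placeholders rather than the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n_poi, n_hidden, n_feat = 6, 8, 7            # toy sizes

# Weight matrices named as in Eqs. (10)-(11).
U = rng.normal(scale=0.1, size=(n_hidden, n_poi))
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
F = rng.normal(scale=0.1, size=(n_hidden, n_feat))
V = rng.normal(scale=0.1, size=(n_poi, n_hidden))
G = rng.normal(scale=0.1, size=(n_poi, n_feat))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def caps_rnn_step(w_t, c_prev, A_t, f_t):
    """One step of Eqs. (10)-(12): the hidden state mixes the previous state
    (gated element-wise by the attribute/context vector A_t) with the current
    one-hot check-in w_t and the sequence feature f_t; the output layer also
    sees f_t directly."""
    c_t = np.tanh(U @ w_t + W @ (c_prev * A_t) + F @ f_t)   # Eq. (10)
    y_lin = V @ c_t + G @ f_t                               # Eq. (11)
    return c_t, softmax(y_lin)                              # Eq. (12)

w_t = np.eye(n_poi)[2]             # current check-in (one-hot)
c_prev = np.zeros(n_hidden)        # previous hidden state
A_t = rng.uniform(size=n_hidden)   # attribute/context vector of Eq. (14), pre-aligned to the hidden size
f_t = rng.uniform(size=n_feat)     # sequence feature vector of Eq. (13), label-encoded/normalized
c_t, y_t = caps_rnn_step(w_t, c_prev, A_t, f_t)
print(y_t.sum())                   # probabilities over the next POI sum to 1
```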

3.1.4 Contextual POI Sequence modeling using LSTM

This model is termed CAPS-LSTM. The core idea behind LSTM is the introduction of a memory state and multiple gating functions to control the writing, reading, and removal (forgetting) of the information written on the memory state. The information is propagated by applying these gates to the input data and the data from the previous memory states. We incorporate the explicit context into each LSTM cell because each cell models a subsequence and each subsequence can have a potentially unique context. This explicit context to each LSTM cell and the feature of the sequence to all LSTM cells help us propagate the relevant context for the sequence modeling. Due to the space constraint, we only provide the


update equations of our extended LSTM, which incorporates the contextual information. The hidden layer of the contextual LSTM is updated using the following relations:

$$i_t = \sigma(W_i x_t + W_{hi}(h_{t-1} \odot A_t) + W_{ci} c_{t-1} + b_i + W_f F),$$
$$f_t = \sigma(W_f x_t + W_{hf}(h_{t-1} \odot A_t) + W_{cf} c_{t-1} + b_f + W_f F),$$
$$z_t = \tanh(W_z x_t + W_{hc}(h_{t-1} \odot A_t) + b_z + W_f F),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot z_t,$$
$$o_t = \sigma(W_o x_t + W_{ho}(h_{t-1} \odot A_t) + W_{co} c_{t-1} + b_o + W_f F),$$
$$h_t = o_t \odot \tanh(c_t), \qquad (16)$$

where c_t is the memory state, z_t is the module that transforms information from the input space x_t to the memory space, and h_t is the information read from the memory state. The input gate i_t controls information from the input z_t to the memory state, the forget gate f_t controls the information in the memory state to be forgotten, and the output gate o_t controls the information read from the memory state. The memory state c_t is updated through a linear combination of the input filtered by the input gate and the previous memory state filtered by the forget gate. The term W_f is the feature weight matrix, F is the feature vector, A_t is the attribute vector as in the contextual RNN model, and ⊙ is the element-wise product operator. The relevant weight matrices W and biases b are subscripted accordingly.
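A minimal NumPy sketch of one CAPS-LSTM step (Eq. (16)) follows. Since the notation above reuses W_f for both the forget-gate input weight and the feature weight matrix, the sketch names them Wfg and WfF to keep them apart, and it again assumes that A_t is aligned to the hidden dimension; these are illustrative assumptions, not the exact parameterization of the trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_feat = 6, 8, 7

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mat(rows, cols):
    return rng.normal(scale=0.1, size=(rows, cols))

# Weights of Eq. (16); WfF is the shared feature weight matrix and the
# W_c* terms act as peephole connections to the previous memory state.
Wi, Whi, Wci, bi = mat(n_hid, n_in), mat(n_hid, n_hid), mat(n_hid, n_hid), np.zeros(n_hid)
Wfg, Whf, Wcf, bf = mat(n_hid, n_in), mat(n_hid, n_hid), mat(n_hid, n_hid), np.zeros(n_hid)
Wz, Whc, bz = mat(n_hid, n_in), mat(n_hid, n_hid), np.zeros(n_hid)
Wo, Who, Wco, bo = mat(n_hid, n_in), mat(n_hid, n_hid), mat(n_hid, n_hid), np.zeros(n_hid)
WfF = mat(n_hid, n_feat)

def caps_lstm_step(x_t, h_prev, c_prev, A_t, Fv):
    """One CAPS-LSTM step following Eq. (16): every gate sees the current input,
    the previous hidden state modulated element-wise by the attribute/context
    vector A_t, and the global sequence feature Fv (via the shared matrix WfF)."""
    hA, feat = h_prev * A_t, WfF @ Fv
    i_t = sigmoid(Wi @ x_t + Whi @ hA + Wci @ c_prev + bi + feat)   # input gate
    f_t = sigmoid(Wfg @ x_t + Whf @ hA + Wcf @ c_prev + bf + feat)  # forget gate
    z_t = np.tanh(Wz @ x_t + Whc @ hA + bz + feat)                  # candidate memory
    c_t = f_t * c_prev + i_t * z_t                                  # memory state update
    o_t = sigmoid(Wo @ x_t + Who @ hA + Wco @ c_prev + bo + feat)   # output gate
    return o_t * np.tanh(c_t), c_t                                  # h_t, c_t

x_t = np.eye(n_in)[1]               # one-hot check-in
h, c = np.zeros(n_hid), np.zeros(n_hid)
A_t = rng.uniform(size=n_hid)       # context/attribute vector (aligned to the hidden size)
Fv = rng.uniform(size=n_feat)       # sequence feature vector
h, c = caps_lstm_step(x_t, h, c, A_t, Fv)
print(h.shape, c.shape)
```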

3.1.5 Sequence generation

After the model is trained, it can generate personalized POI sequences. The basic idea is to train the network on sequence data one step at a time. The sampled output from the network is then fed as the input for the next step. If we have K different POIs, and the k-th POI is checked in at time t, then the input x_t is a one-hot encoded vector with only the k-th entry set to 1. The output of the network is a multinomial distribution which is parameterized using a softmax function and can be defined as:

$$p(x_{t+1} = k \mid y_t) = y^k_t = \frac{\exp(y^k_t)}{\sum_{k'=1}^{K} \exp(y^{k'}_t)}. \qquad (17)$$

From the generated sequences, the top-k scorers (by the sum of the preference scores of all places in a sequence) are recommended to the user.
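The following sketch shows how such a roll-out and top-k selection could be organized; step_fn, A_fn, score_fn, and eot_id are hypothetical placeholders standing in for the trained contextual model, the attribute construction of Eq. (14), the preference score of Eqs. (5)-(6), and an end-of-trip token respectively.

```python
import numpy as np

def sample_sequence(step_fn, start_poi, n_poi, A_fn, f_vec, eot_id, max_len=25, rng=None):
    """Roll a trained contextual model forward one POI at a time (Eq. (17)).

    `step_fn(x, state, A_t, f_vec)` is assumed to return (probs, new_state),
    where `probs` is the softmax distribution over the next POI; `A_fn(poi)`
    builds the attribute/context vector of the POI just emitted. Sampling
    stops at the end-of-trip token `eot_id` or after `max_len` steps.
    """
    rng = rng or np.random.default_rng()
    seq, state, current = [start_poi], None, start_poi
    for _ in range(max_len):
        x = np.eye(n_poi)[current]              # one-hot input of the last POI
        probs, state = step_fn(x, state, A_fn(current), f_vec)
        current = rng.choice(n_poi, p=probs)    # sample the next POI
        if current == eot_id:
            break
        seq.append(current)
    return seq

def top_k_sequences(candidates, score_fn, k=5):
    """Keep the k candidate sequences with the highest summed preference score."""
    return sorted(candidates, key=lambda s: sum(score_fn(p) for p in s), reverse=True)[:k]
```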

4 Evaluation

In this section, we describe the dataset, the evaluation baselines, evaluation metrics, and the experimental results.

4.1 Dataset

We used two real-world datasets collected from two popular LBSNs - Gowalla and Weeplaces (Liu et al. (2013)). These datasets are well defined and have the attributes relevant to the context of this paper, such as (i) the location category, (ii) geospatial coordinates, (iii) friendship information, and (iv) check-in time. The statistics of the datasets are summarized in Table 1.

Dataset     Check-ins     Users     Locations     Links     Location Categories
Gowalla     36,001,959    319,063   2,844,076     337,545   629
Weeplace    7,658,368     15,799    971,309       59,970    96

Table 1: Statistics of the datasets.

After discarding incomplete records, the 5 most checked-in categories (and their check-in counts) were: (i) Home/Work/Other: Corporate/Office (437,824), (ii) Food: Coffee Shop (267,589), (iii) Nightlife: Bar (248,565), (iv) Shop: Food & Drink: Grocery/Supermarket (161,016), and (v) Travel: Train Station (152,114) for Weeplaces, and (i) Corporate Office (1,750,707), (ii) Coffee Shop (1,063,961), (iii) Mall (958,285), (iv) Grocery (884,557), and (v) Gas & Automotive (863,199) for the Gowalla dataset. The work or home-related category (Home/Work/Other: Corporate/Office) was popular from 6 am to 6 pm, with the


highest check-ins (42,019) made at 1 pm. Similarly, the bars had their highest check-in count (21,806) at 2 am and their lowest (15,209) at 5 am. Most of the check-ins were made between 12 pm and 6 pm and were either in Home or Work related categories.

The check-in distribution of all users in the Weeplaces and Gowalla datasets is illustrated in Figure 1. Figure 4(a) and Figure 4(b) show that the frequency of daily check-ins is <50 for most of the users. This implies that most of the users have a daily sequence length that is reasonable, and exploiting the proposed model within this sequence length is enough to evaluate its performance. Among the many factors influencing the check-in trend of the users, the distance measure is one of the major factors. Figure 5 illustrates the inverse relation between the distance of a location and the likelihood of check-ins (i.e., most users preferred near places (≤1 K.m.)). This implies that most of the users prefer to visit near places and hence the POIs within a sequence should also be within a reasonable distance. The three days' check-in distribution of the three users with the most check-ins in the Weeplaces dataset is illustrated in Figure 6. Three different color marks are used to distinguish the check-ins of different days. The overlapping check-ins on different days are overlapped on the map and are not distinguishable. This figure also illustrates that most of the users prefer check-ins at nearby locations and most of the users have a reasonably small sequence length.
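The distance statistics behind Figure 5 (and the displacement metric of Section 4.2) can be obtained from consecutive check-in coordinates; the sketch below uses the haversine great-circle distance, which is an assumption on our part since the exact distance measure is not spelled out in the text.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

def consecutive_distances(checkins):
    """Distances (km) between consecutive check-ins, each given as (lat, lon)."""
    return [haversine_km(*a, *b) for a, b in zip(checkins, checkins[1:])]

# Toy usage: three nearby check-ins around downtown Miami.
seq = [(25.7743, -80.1937), (25.7752, -80.1900), (25.7617, -80.1918)]
print([round(d, 2) for d in consecutive_distances(seq)])
```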

Fig. 4: (a) and (b) show the distribution of daily check-ins in the Weeplaces and Gowalla datasets respectively (the X-axis represents the number of daily check-ins and the Y-axis represents the fraction of users with the respective number of check-ins).

Fig. 5: Spatial impact on Check-ins

Unlike other studies, we did not split the dataset into different cities because we found that most of the users have check-ins spanning multiple cities, and splitting the dataset into multiple cities can disrupt the coherence of the places the users have in the dataset. The check-ins were chronologically sorted and, for every user, the check-ins were split into subsequences of every 8 hours.


Fig. 6: Check-in distribution of the 3 highest check-in days of the top three users in the Weeplaces dataset ((a) top user, (b) 2nd top user, (c) 3rd top user). Three different color marks are used for three different days. Repeated check-ins are overlapped and not distinguished. All three users have check-ins within a reasonable distance and the sequences are of reasonable length.

4.2 Evaluation Metrics

It is difficult to measure the correctness of the order of items using simple precision, recall, and F-score. We use the pairs-F1 metric defined in Chen et al. (2016). It considers both the POI identity and its order by using the F1 score of every pair of POIs in a sequence and is defined as:

$$pairs\text{-}F1 = \frac{2 * P_{PAIR} * R_{PAIR}}{P_{PAIR} + R_{PAIR}}, \qquad (18)$$

where P_{PAIR} and R_{PAIR} are the precision and recall of the ordered POI pairs respectively.

We also evaluate the diversity (Bradley and Smyth (2001)) of the locations in the sequence. The diversity of locations is measured using their categorical similarity (i.e., Similarity = 1 if two places are of the same category and Similarity = 0 otherwise). We use the following relation to define the diversity of items in the list:

$$Diversity(c_1, c_2, ..., c_n) = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} \big(1 - Similarity(c_i, c_j)\big)}{\frac{n}{2} * (n - 1)} \qquad (19)$$

We use the displacement metric to measure the distance (in K.m.) between the predicted POIs and the actual POIs in the sequence. It is defined as:

$$Displacement(seq_a, seq_e) = \sum_{i=1}^{k} |Distance(seq_{a_i}, seq_{e_i})|, \qquad (20)$$

where seqa and seqe are the actual and estimated sequences respectively.
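A compact sketch of the three metrics is given below; the similarity and distance_km callables are placeholders (exact category match and a dummy distance in the toy usage), and the diversity denominator follows the reconstructed form n(n-1)/2 of Eq. (19).

```python
from itertools import combinations

def pairs_f1(actual, predicted):
    """Pairs-F1 of Eq. (18): F1 over ordered POI pairs of the two sequences."""
    a_pairs = set(combinations(actual, 2))
    p_pairs = set(combinations(predicted, 2))
    if not a_pairs or not p_pairs:
        return 0.0
    common = len(a_pairs & p_pairs)
    if common == 0:
        return 0.0
    precision, recall = common / len(p_pairs), common / len(a_pairs)
    return 2 * precision * recall / (precision + recall)

def diversity(categories, similarity):
    """Diversity of Eq. (19): average categorical dissimilarity over POI pairs."""
    n = len(categories)
    if n < 2:
        return 0.0
    dissim = sum(1 - similarity(ci, cj) for ci, cj in combinations(categories, 2))
    return dissim / (n * (n - 1) / 2)

def displacement(actual, estimated, distance_km):
    """Displacement of Eq. (20): summed distance between aligned POIs."""
    return sum(abs(distance_km(a, e)) for a, e in zip(actual, estimated))

# Toy usage with POI ids, exact-match category similarity, and a dummy distance.
print(pairs_f1([1, 2, 3, 4], [1, 3, 2, 4]))
print(diversity(["Food", "Park", "Food"], lambda x, y: 1.0 if x == y else 0.0))
print(displacement([1, 2], [1, 3], lambda a, e: 0.0 if a == e else 1.5))
```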

4.3 Evaluation Baselines

We compared our proposed method with the following baseline models:

1. POI Popularity: This is a naive approach that relies on the popularity of POIs. For a given location, an area within a predefined radius is used to find the most popular POI (i.e., the most visits in the locality) within that area which is not already included in the list. The radius is dynamically updated by a predefined factor when no location is found in the area.

2. Markov Chain-based approach: We use a first-order Markov Chain to generate the POI sequences. The Laplace-smoothed state-transition matrix and initial probability matrix are derived from the check-in data and are personalized for each user (a minimal sketch of this baseline is given after this list).

3. Apriori-based approach: The most frequently checked-in place of a user is considered as the starting point because the model does not have a provision for user inputs. From the starting point, we select the places that are within some threshold distance (ε) to get the candidate 1-sets. The candidate 1-sets are used to get the other candidate sets. All the candidates that do not satisfy the constraints are pruned. The constraint checking procedure and candidate generation procedure continue until a trip of the desired length is obtained or the candidate sets are exhausted. Every trip that has more than 8 hours of travel time is pruned. Basically, we adopt a greedy pruning approach to get rid of the less preferred routes. Among the available candidate sets, we select the one that satisfies the following criteria: (i) has a higher trip score, and (ii) has a lower travel time (see Eqn 6). The trip score is calculated by adding


the preference scores (see Eqn. 5 and Eqn. 6) of all the places in the trip. The top-k trips with the highest scores are recommended to the user. Some of the existing studies (Lu et al. (2012); Yu et al. (2016)) have also exploited the Apriori-based approach.

4. HITS-based approach (Zheng and Xie (2011)): In this approach, the locations are hierarchically organized into clusters/regions and the hub scores of users and the authority scores of places are generated relative to the regions. This facilitates modeling the popularity of locations and users within a region. The inference is made by using the adjacency matrix between users and locations with respect to the region. The score of a sequence is determined using the hub scores of the users who visited the sequence and the authority scores of the places, weighted by the probability that people would consider the sequence. The top-k popular sequences are recommended.
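A minimal sketch of the Markov baseline (item 2 above) with Laplace smoothing is given below; the estimation and sampling details are illustrative and may differ from the exact procedure used in the experiments.

```python
import numpy as np

def markov_baseline(user_sequences, n_poi, alpha=1.0, rng=None):
    """Per-user first-order Markov baseline with Laplace smoothing.

    `user_sequences` is a list of that user's check-in subsequences (POI ids);
    returns the smoothed initial-probability vector, the transition matrix,
    and a generator that samples a POI sequence of a given length.
    """
    rng = rng or np.random.default_rng()
    init = np.full(n_poi, alpha)
    trans = np.full((n_poi, n_poi), alpha)        # Laplace (add-alpha) smoothing
    for seq in user_sequences:
        if seq:
            init[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a, b] += 1
    init /= init.sum()
    trans /= trans.sum(axis=1, keepdims=True)

    def generate(length):
        poi = rng.choice(n_poi, p=init)
        out = [poi]
        for _ in range(length - 1):
            poi = rng.choice(n_poi, p=trans[poi])
            out.append(poi)
        return out

    return init, trans, generate

# Toy usage with 4 POIs and two historical subsequences of one user.
init, trans, gen = markov_baseline([[0, 1, 2], [0, 2, 3]], n_poi=4)
print(gen(5))
```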

4.4 Experimental settings

We used 5-fold cross validation to measure the performance of the models. An Ubuntu 14.04.5 LTS machine with 32 GB RAM and a Quad-core Intel(R) Core(TM) i7-3820 CPU @ 3.60 GHz was used to evaluate the models. The same configuration with a Tesla K20c 6 GB GPU was used to evaluate the neural network-based models. We used Python as the programming platform, Numpy as the mathematical computation library, Pandas as the data analysis library, and TensorFlow as the neural network library.

Users with fewer than 25 check-ins were ignored. The context and feature vectors were estimated from the historical check-ins of the users. For each user, the 7 most frequently checked-in places were taken as starting points, and 10 sequences per starting point were generated. The average metrics over these 10 sequences were observed. The POI-Popularity and Apriori models used a distance threshold of 2 K.m. The contextual LSTM used 512 hidden states, and the contextual RNN used 5 layers and 256 nodes. The input sequence length was set to 25, the data was fed in batches of size 50, the embedding vectors were of size 384, and the experiment was repeated for 100 epochs. The learning rate was set to 0.002, and the gradients were clipped at 5 to prevent exploding gradients.

4.5 Experimental Results and Discussions

The pair-wise precision, recall, and F-score of the different models are reported in Table 2 and Table 3. The diversity- and displacement-based performance is reported in Table 4 and Table 5.

Weeplaces Dataset
Models            Precision_PAIR    Recall_PAIR    Pair-F1
POI-Popularity    0.30000           0.16666        0.21428
Apriori           0.46079           0.23088        0.30762
POI-Markov        0.49411           0.24711        0.32945
HITS              0.49981           0.27336        0.35342
Vanilla RNN       0.49788           0.27618        0.35528
LSTM              0.51557           0.27500        0.35868
CAPS-RNN          0.62422           0.41970        0.50192
CAPS-LSTM         0.67771           0.43100        0.52690*

Table 2: Pair F-Score performance of different models on the Weeplaces dataset (* implies the observed difference was statistically significant at the 95% confidence level)

The popularity-based model performed worst among all the models. It generated almost identical sequences for all the users and hence was not relevant to personalized preferences. This is likely due to its ignorance of the personalized preferences of the user. Its diversity measure was also quite low, which means the POIs in the generated sequences covered only a few categories. The high displacement metric indicates that the predicted POIs were far from the actual ones.

The Apriori-based model performed better than the popularity-based model. Although the Apriori-based model pruned irrelevant candidate sequences, it also did not capture the personalization aspect. This might be the reason behind its low performance. Its diversity and displacement measures were also better than those of the popularity-based model.

3 https://www.python.org
4 http://www.numpy.org
5 http://pandas.pydata.org
6 https://www.tensorflow.org


Gowalla Dataset
Models            Precision_PAIR    Recall_PAIR    Pair-F1
POI-Popularity    0.36442           0.20010        0.25834
Apriori           0.46922           0.24276        0.31997
POI-Markov        0.49993           0.24981        0.33314
HITS              0.50653           0.27993        0.36058
Vanilla RNN       0.51001           0.27896        0.36065
LSTM              0.53333           0.44000        0.48219
CAPS-RNN          0.60914           0.43000        0.50412
CAPS-LSTM         0.67112           0.44462        0.53487*

Table 3: Pair F-Score performance of different models on the Gowalla dataset (* implies the observed difference was statistically significant at the 95% confidence level)

Weeplaces Dataset
Models            Diversity    Displacement (K.m.)
POI-Popularity    1.20000      23.30785
Apriori           1.90000      13.00000
POI-Markov        2.50000      11.72130
HITS              4.00000      10.55233
Vanilla RNN       6.22000      10.27620
LSTM              6.73000      10.00023
CAPS-RNN          7.11820      8.22990
CAPS-LSTM         7.09120      7.77014

Table 4: Diversity and displacement performance (on sequence length of 25) of the models on the Weeplaces dataset

Gowalla Dataset
Models            Diversity    Displacement (K.m.)
POI-Popularity    3.20000      25.22877
Apriori           3.33500      13.00000
POI-Markov        3.50000      11.22113
HITS              4.00000      11.11224
Vanilla RNN       5.88441      11.00111
LSTM              7.22533      10.33333
CAPS-RNN          8.44765      7.77669
CAPS-LSTM         8.45001      7.71001

Table 5: Diversity and displacement performance (on sequence length of 25) of the models on the Gowalla dataset

The first-order Markov model relied on only the previous check-in to determine the next location and hence was not able to fully model the check-in sequence generation process. However, its pair-F-score, diversity, and displacement metrics were better than those of the popularity-based and Apriori-based models, which is due to the personalization implied by the separate initial-probability and state-transition tables for each user.

The HITS-based model slightly outperformed the Markov-based model on all three metrics. As it relies on the segregation of places into regions and on finding the authority and hub scores of the places and users within those regions, its performance depends on the region generation approach. We used a radius of 10 K.m. from a specified location to generate such regions. Its performance with radii of 5 K.m. and 15 K.m. was on par with the popularity-based model.

Though not sophisticated, the performance of the vanilla RNN was on par with the HITS model. This might be because of its capability to retain information on previous items of the sequence. The regular LSTM performed slightly better than the vanilla RNN due to its implicit ability to cope with the vanishing gradient problem.

The CAPS-LSTM model was the best performer on both datasets. Its performance was slightly better than CAPS-RNN in all cases except the diversity metric on the Weeplaces dataset, where CAPS-RNN was slightly better. The improvement of CAPS-LSTM was more pronounced on the larger dataset (i.e., Gowalla). This is expected because the Gowalla dataset is larger and has more social (friendship) relations, which favors neural networks that need a lot of training data for better performance. We used thresholds of 6, 8, and 10 hours and found that the threshold of 8 hours performed better than the others. This might, however, vary with the dataset used. In terms of execution time, CAPS-LSTM was slower as it used multiple epochs and required a lot of training time. The execution times of the different models were in the order: CAPS-LSTM ≥ CAPS-RNN > LSTM ≥ vanilla RNN > HITS > Apriori ≥ Markov > Popularity-based.

To summarize, CAPS-LSTM performed best, followed by CAPS-RNN, on all three metrics. This evaluation result supports our claim that extending RNN and its variants by incorporating item-wise contexts and sequence-wise features can be effective for sequence modeling.


Fig. 7: (a) and (b) show the trend in diversity with increasing sequence length on the Weeplaces and Gowalla datasets respectively (X-axis: sequence length; Y-axis: diversity; compared models: Popularity, Apriori, Markov, HITS, RNN, LSTM, CAPS-RNN, CAPS-LSTM).

Fig. 8: (a) and (b) show the trend in displacement with increasing sequence length on the Weeplaces and Gowalla datasets respectively (X-axis: sequence length; Y-axis: displacement in K.m.; compared models: Popularity, Apriori, Markov, HITS, RNN, LSTM, CAPS-RNN, CAPS-LSTM).

4.6 Impact of sequence length

Figure 7 illustrates how diversity changes with increasing sequence length. The diversity metric measures the extent to which different categories are included in the recommendation; a good recommendation should include items from different categories that satisfy the user preferences. We can see that CAPS-RNN and CAPS-LSTM show a better diversity trend as the sequence length increases. Longer sequences are more likely to contain locations of different categories, and this effect is clearly reflected in all neural network-based models on both datasets. The popularity model always recommends popular items and does not account for diversity, hence its diversity is the lowest of all the models. Although the Apriori, Markov, and HITS models perform better than the popularity-based model, their performance is lower than that of the neural network-based models. The increase in diversity was quite slow as we increased the sequence length. This might be because the fine-grained categories cover only a few places while most of the broader categories cover many items, so beyond some sequence length the newly added places overlap with the categories of places already in the sequence.
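The precise definition of the diversity metric is given in the evaluation setup; as a minimal sketch, assuming diversity is measured as the number of distinct categories covered by a recommended sequence, it can be computed as follows (names are illustrative).

```python
def diversity(recommended_pois, poi_category):
    """Number of distinct categories covered by a recommended POI sequence.

    recommended_pois: list of POI ids in recommendation order.
    poi_category: dict mapping a POI id to its category label.
    """
    return len({poi_category[poi] for poi in recommended_pois})
```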

Figure 8 illustrates the impact of sequence length on displacement. As the popularity model recommends only the popular items no matter how far apart they are, it has the highest displacement among all the models. Displacement increases with increasing sequence length. CAPS-RNN and CAPS-LSTM show a similar effect of sequence length on displacement. The other models have reasonable displacement until the sequence length reaches 25; for all models, the increase in displacement is sharp beyond that length. This is also supported by the check-in trend presented in Figure 4, which shows that the daily check-in frequency of a user is roughly 25; as the sequence length grows beyond this, the additional check-ins are most likely made on another day (possibly in another city or somewhere farther from the places checked in on previous days). As CAPS-RNN and CAPS-LSTM handle such subsequences better, they outperform the other models.
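The displacement metric is likewise defined in the evaluation setup; assuming it is the average great-circle distance between consecutive POIs of a recommended sequence, a minimal sketch is given below (the haversine helper and the function names are our own choices).

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def average_displacement(coords):
    """Average distance between consecutive POIs in a recommended sequence.

    coords: list of (lat, lon) tuples in recommendation order.
    """
    if len(coords) < 2:
        return 0.0
    hops = [haversine_km(*a, *b) for a, b in zip(coords, coords[1:])]
    return sum(hops) / len(hops)
```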

5 Case study on generated trajectories

In this section, we provide a case study on POI sequences generated by the popularity-based approach and by CAPS-LSTM on the Gowalla dataset. We select sequences of length 5 for the two users with the most check-ins, 'thadd-fiala' and 'boon-yap' (referred to as u1 and u2 in the rest of the paper), and analyze the relevance of the sequences for them. For u1, a sequence of length 5 from the popularity model is {'sycamore-place-lofts-cincinnati', 'pg-gardens-cincinnati', 'lytle-park-cincinnati', 'piatt-park-cincinnati', 'sycamore-place-at-st-xavier-park-apartments-cincin'}, and the respective categories are {'Home/Work/Other:Home', 'Parks & Outdoors:Plaza / Square', 'Parks & Outdoors:Park', 'Parks & Outdoors:Plaza / Square', 'Home/Work/Other:Home'}. Similarly, for user u2 a length-5 sequence is {'starbucks-boston', 'mbta-south-station-boston', 'boston-common-boston', 'dunkin-donuts-boston', 'mbta-park-street-station-boston'} and the respective categories are {'Food:Coffee Shop', 'Travel:Train Station', 'Parks & Outdoors:Park', 'Food:Donuts', 'Travel:Train Station'}. Most of the recommended places are the popular ones, and the generated sequences have low diversity; for both users there are three different categories in the generated sequences. With increasing sequence length, the diversity shows some increasing trend (see Figure 7) but it remains the lowest among the models on both datasets.

With CAPS-LSTM, a sequence generated for user u1 is {'sycamore-place-lofts-cincinnati', 'pg-gardens-cincinnati', 'piatt-park-cincinnati', 'lytle-park-cincinnati', 'lpk-cincinnati'} and the categories are {'Home/Work/Other:Home', 'Parks & Outdoors:Plaza / Square', 'Parks & Outdoors:Plaza / Square', 'Parks & Outdoors:Park', 'Home/Work/Other:Corporate / Office'}. For user u2, a sequence is {'starbucks-boston', 'mbta-park-street-station-boston', 'boston-common-boston', 'digitas-boston-boston', 'hubspot-cambridge'} and the categories are {'Food:Coffee Shop', 'Travel:Train Station', 'Parks & Outdoors:Park', 'Nightlife:Speakeasy / Secret Spot', 'Home/Work/Other:Corporate / Office'}. We can observe that for both users there are four different categories in the sequence and the recommendation is more contextual. With increasing sequence length, the diversity shows some increasing trend (see Figure 7), and this trend is strongest for CAPS-LSTM among all the models.
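As an illustrative check of the diversity difference for u1, counting distinct category labels over the two sequences listed above reproduces the three-versus-four contrast between the two models:

```python
# Category labels of the length-5 sequences generated for user u1, as listed above.
popularity_u1 = ['Home/Work/Other:Home', 'Parks & Outdoors:Plaza / Square',
                 'Parks & Outdoors:Park', 'Parks & Outdoors:Plaza / Square',
                 'Home/Work/Other:Home']
caps_lstm_u1 = ['Home/Work/Other:Home', 'Parks & Outdoors:Plaza / Square',
                'Parks & Outdoors:Plaza / Square', 'Parks & Outdoors:Park',
                'Home/Work/Other:Corporate / Office']

print(len(set(popularity_u1)))  # 3 distinct categories
print(len(set(caps_lstm_u1)))   # 4 distinct categories
```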

With the popularity-based model, the average displacement of the above sequences was 19.36 km for user u1 and 20.03 km for user u2. With CAPS-LSTM, the average displacement was 5.02 km for user u1 and 5.61 km for user u2. This shows that CAPS-LSTM addresses the distance constraint better for both users. With increasing sequence length, the displacement grew for both models and followed the trend shown in Figure 8.

Limitations. Deep models need large amounts of training data, and the valid check-in sequences (sequences with a minimum number of check-ins) that we used might still be insufficient to exploit the full potential of the proposed model. As the model needs to be trained offline, it might not, in its current state, be efficient for real-time prediction. The threshold of 8 hours might not be equally applicable to all users and all types of trips, and might vary across datasets.

6 Conclusion and Future Work

We formulated the contextual personalized POI sequence modeling problem by extending recurrent neural networks and their variants. We incorporated different contexts (such as social, temporal, categorical, and spatial) by feeding them to the hidden layer and the output layer. We propagated the sequence-wide feature vector to all layers of the network and thereby retained the contextual information that was valid throughout the sequence. We evaluated the proposed model on two real-world datasets and demonstrated that the contextual models can perform better than the regular models; the proposed model performed slightly better on the larger dataset. There are many interesting directions to explore. We would like to incorporate textual attributes (e.g., tags, tips, and review text) and visual information (e.g., images of places) to model the preferences of users and the popularity of places. We would also like to explore other datasets for sequence modeling.
