
Deep and Broad Learning on Content-aware POI Recommendation

Fengjiao Wang∗, Yongzhi Qu†, Lei Zheng∗, Chun-Ta Lu∗ and Philip S. Yu∗

∗Department of Computer Science

University of Illinois at Chicago

Chicago, Illinois, 60607, USA

Email: {fwang27,lzheng21,clu29,psyu}@uic.edu

†Wuhan University of Technology, China

Email: {quwong}@whut.edu.cn

Abstract—POI recommendation has attracted a lot of research attention recently. Several key factors need to be modeled for effective POI recommendation: POI properties, user preference, and the sequential momentum of check-ins. The challenge lies in how to synergistically learn from multi-source heterogeneous data. Previous work models multi-source information in a flat manner, using either embedding-based methods or sequential prediction models in a cross-related space, which cannot generate mutually reinforcing results. In this paper, a deep and broad learning approach based on a Deep Content-aware POI Recommendation (DCPR) model is proposed to structurally learn POI and user characteristics. The proposed DCPR model includes three collaborative layers: a CNN layer for POI feature mining, an RNN layer for sequential dependency and user preference modeling, and an interactive layer based on matrix factorization to jointly optimize the overall model. Experiments over three data sets demonstrate that the DCPR model achieves significant improvement over state-of-the-art POI recommendation algorithms and other deep recommendation models.

Keywords—Spatial Temporal Modeling; Embedding; POI Recommendation;

I. INTRODUCTION

As location-based applications rapidly gain popularity, a large volume of online content with geo-tagged information (check-ins) is created daily. Check-ins, as a direct channel connecting the online and offline worlds, aid the development of many personalized and location-based information services, such as personalized advertisement [1], local event promotion [2], [3], and city management improvement [4]. One of the core tasks underlying these services is Point of Interest (POI) recommendation, since it not only helps users enrich their urban experiences but also facilitates the analysis of crowd mobility and communication.

Most of the prominent approaches to POI recommendation can be divided into three categories: 1) collaborative filtering, 2) sequential pattern modeling, and 3) context-aware recommendation. They are designed to learn three types of information, respectively: user preference, check-in sequences, and text information. Recently, some state-of-the-art models, such as PRME [5] and FPMC [6], learn two types of information simultaneously by modeling user preference and sequential patterns together. However, most extended variants of the prominent approaches still rely on the original architecture and integrate other information as side information.

These algorithms have several drawbacks. First, existing POI recommendation algorithms mainly focus on information about users, such as user preference and users' check-in sequences, while ignoring the characteristics of POIs. Second, current algorithms typically model different sources of information with the same metric, such as distances in PRME and transition probabilities in FPMC. However, a single metric of this kind may not be suitable for handling different forms of dependencies. Third, they usually model consecutive dependencies but ignore long-term dependencies in check-in sequences. Moreover, the above-mentioned models are all shallow models, which cannot capture the highly non-linear nature of sequential patterns.

Recently, researchers have taken the content information of POIs into consideration. Content information can be helpful in various ways. For instance, a user may look up a POI's reviews or tips beforehand to decide whether she/he is interested in visiting the place. Therefore, in reality, POIs' reviews or tips can be part of the inputs that affect a user's check-in decision. Besides, content information can help identify semantically similar POIs; e.g., 'burgers' often appears in the reviews and descriptions of fast food shops. As shown in recent works [7], [8], [9], integrating content information can help alleviate the sparsity problem in POI recommendation. However, most of these works are based on traditional topic models that use bag-of-words features and ignore word order. Sentences with similar N-grams but totally different semantic meanings are hard to differentiate for bag-of-words based techniques [10]. Therefore, previous methods may not fully uncover the semantic information of POIs. Moreover, topic models are prone to scalability problems and cannot handle new users and new POIs.

Due to the success of deep neural networks, researchers have also applied deep models to POI recommendation tasks. Among them, Recurrent Neural Networks (RNNs) are especially suitable for sequential prediction. Recently, [11] showed the superior performance of RNNs on sequential click prediction. By concurrently modeling spatial and temporal patterns in LBSNs through the transition matrices of an RNN, [12] achieves a promising performance improvement over matrix factorization based and Markov chain based algorithms.

In order to broadly fuse different sources of information (user preference, check-in sequences, and text information), in this paper we propose a new deep and broad learning model named Deep Content-aware POI Recommendation (DCPR) to learn effective representations of POIs and users for the POI recommendation task. In particular, we design a multi-layer deep architecture that consists of multiple deep neural networks. The composition of multiple layers of deep neural networks first maps the data (POIs associated with text information) into a highly non-linear latent space (the POI space); the user representations are then learned in this latent space by modeling user preference and check-in sequences with deep neural networks.

2017 IEEE 3rd International Conference on Collaboration and Internet Computing

0-7695-6303-1/17/31.00 ©2017 IEEE DOI 10.1109/CIC.2017.00054

Specifically, to capture content-aware features and to address the sparsity problem and long-term sequential pattern mining, the proposed model utilizes a convolutional neural network (CNN) to capture the semantic information and common opinions of POIs while preserving the word order of the original documents. Then, a long short-term memory network (LSTM) is employed to store user preference by modeling check-in sequences, collectively learning user preference from similar users. The LSTM and the CNN are connected in a structural manner: the LSTM learns user preference and sequential patterns with prior knowledge of POIs' semantic information by taking the representation vectors produced by the CNN layer as input. Finally, the personalized ranking layer on top jointly optimizes the latent representations produced in the first two layers (the convolutional CNN layers and the recurrent LSTM layers), refining the learned latent features towards more accurate patterns and better recommendations. This architecture makes DCPR an end-to-end trainable deep model.

The contributions of this paper are summarized as follows.

1) We propose a deep and broad learning approach based on a deep content-aware model (DCPR) in which content-based POI features and user-specific sequential patterns are learned synergistically. The hierarchical model can jointly learn a multi-source heterogeneous network and is robust to sparsity.

2) We propose a structural pair deep learning model, in which the first deep learning model effectively learns an embedding space with latent representations of POIs, and the second deep learning model learns the global structure of the constructed embedding space, which has physical meaning, to mine users' mobility patterns. Both the deep representation learning and the deep mobility mining are optimized by a unified ranking-based objective function.

3) The proposed model is extensively evaluated on three real LBSN datasets. The results demonstrate that it outperforms state-of-the-art sequential modeling methods and deep recommendation models in POI recommendation tasks.

4) The proposed deep learning framework can be employed to solve a generic class of problems involving heterogeneous network learning.

The rest of the paper is organized as follows: Section 2 gives the details of the problem definition. Section 3 illustrates the proposed architecture and mathematical formulation. Section 4 shows the experimental results as well as the discussion. Section 5 presents a review of the state-of-the-art research. Section 6 concludes the paper.

II. PROBLEM FORMULATION

In this section, we introduce the problem formulation. Consider a location-based service with a set of users $U = \{u_1, u_2, \ldots, u_N\}$ and a set of POIs $P = \{p_1, p_2, \ldots, p_M\}$, where $N$ is the total number of users and $M$ is the total number of POIs. Each user in $U$ is associated with a list of check-ins in chronological order. For instance, user $u_i$'s check-in list is denoted as $C_{u_i} = \{c^1_{u_i}, c^2_{u_i}, \ldots, c^n_{u_i}\}$. The $k$-th check-in $c^k_{u_i}$ in the list $C_{u_i}$ is defined as $c^k_{u_i} = (u_i, p_l)$, which means that user $u_i$ checked in at POI $p_l$ at the $k$-th time stamp. Each POI in $P$ is associated with a list of reviews or tips. For example, for POI $p_l$, its list of reviews/tips is denoted as $R_{p_l} = \{r^1_{p_l}, r^2_{p_l}, \ldots, r^m_{p_l}\}$, where $r^j_{p_l}$ is the $j$-th review/tip in POI $p_l$'s review/tip list. Given $G = (U, P, C, R)$, which consists of the users $U$, the POIs $P$, all users' check-ins $C$, and all POIs' reviews/tips $R$, the task is to recommend a certain number of POIs to each user based on the user's previous check-ins.

In this paper, we utilize a ranking-based loss [13] to train the deep neural networks. For a ranking-based loss, each training sample usually contains a positive item and a negative item. For the proposed problem, each training sample is built from a sequence of check-in POIs performed by a user: the positive item is the POI checked in after the sequence, while the negative item is a POI uniformly sampled from the POIs that are not in the user's training sequence. A user, a positive POI, and a negative POI thus form one training sample. The training data $D_s$ is defined as

$$D_s := \{(u_i, p_j, p_{j'}) \mid u_i \in U \wedge p_j \in P^+_i \wedge p_{j'} \in P^-_i\} \quad (1)$$

where $u_i$, $p_j$, and $p_{j'}$ are uniformly sampled from $U$, $P^+_i$, and $P^-_i$, respectively. $P^+_i$ denotes the list of positive POIs for user $u_i$, while $P^-_i$ represents the list of negative POIs for user $u_i$.
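The sample construction described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes check-ins are given as a dict from a hypothetical user id to a chronologically ordered list of POI ids, and it treats every prefix of the training sequence as one input context (an assumption; the text only says the positive is the POI checked in after the sequence). The negative is drawn uniformly from POIs absent from that user's training sequence.

```python
import random


def make_training_samples(checkins, all_pois, seed=0):
    """Build (user, prefix, positive POI, negative POI) training samples.

    checkins: dict mapping a user id to that user's chronologically ordered
        list of POI ids (training portion only).
    all_pois: list of all POI ids in the dataset.
    """
    rng = random.Random(seed)
    samples = []
    for user, seq in checkins.items():
        visited = set(seq)
        # Negatives are drawn uniformly from POIs absent from the training sequence.
        negatives = [p for p in all_pois if p not in visited]
        if not negatives:
            continue
        # Each prefix seq[:k] is an input sequence; seq[k] is the POI checked in next.
        for k in range(1, len(seq)):
            samples.append((user, seq[:k], seq[k], rng.choice(negatives)))
    return samples


if __name__ == "__main__":
    demo = {"u1": ["p1", "p2", "p3"], "u2": ["p2", "p4"]}
    for sample in make_training_samples(demo, ["p1", "p2", "p3", "p4", "p5"]):
        print(sample)
```

The triple $(u_i, p_j, p_{j'})$ of Eq. (1) corresponds to the first, third, and fourth fields; the prefix is what the sequence model would consume.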

III. THE PROPOSED ARCHITECTURE

In this section, we introduce the proposed model DCPR, which effectively learns embeddings of users and POIs for POI recommendation through a deep network architecture. DCPR collectively models broad information on check-in sequences and text information with deep neural networks in a hierarchical manner, and it is coupled with probabilistic matrix factorization [14] to provide top-N recommendations for users. The advantages of the proposed DCPR model are two-fold. First, DCPR is an end-to-end deep model that can learn more representative embeddings of users and POIs. Second, the proposed model explains how check-in behaviors are formed by modeling text information and check-in sequences in a hierarchical order.

A. Architecture

The architecture of the proposed framework is illustrated in Fig. 1. From bottom to top, it consists of three components: POI context extraction, user preference and check-in sequence modeling, and personalized ranking.

[Fig. 1 diagram labels: Reviews/tips, Word Embeddings, CNN, POI Embeddings (POI Representation Learning); recurrent cells R and hidden states h, User Embeddings (User Representation Learning); positive POI, negative POI, Ranking Loss (Check-in Behavior Learning).]

Fig. 1: Network Architecture. The architecture contains three components: 1) POI representation learning; 2) user representation learning; 3) check-in behavior learning.

At the bottom of DCPR, the POI context extraction component learns the semantic information of POIs and generates latent representations from reviews/tips by employing a CNN. Above the POI context extraction component is the check-in sequence modeling component, which is responsible for modeling check-in sequences to learn latent representations of users by utilizing an LSTM. In the check-in sequence modeling component, the rectangle R stands for a recurrent cell and h denotes a hidden state of the LSTM. The POI embeddings learned by the CNN from reviews/tips represent POIs' properties and can help explain users' check-ins. Compared to previous models that ignore the textual content, such as [6], [5], this helps the check-in sequence modeling component learn more effective latent representations of users. Furthermore, the above-mentioned two components are directly connected and organized in a hierarchical order. At the top of DCPR is the personalized ranking component, which optimizes the latent representations of users and POIs following the fashion of probabilistic matrix factorization [14].

B. POI representation learning

Given all POIs' reviews/tips, we aim to learn latent representations of POIs to facilitate POI recommendation. Intuitively, when a user searches for a POI online, he/she is likely to browse some of its reviews/tips to get a sense of the POI's properties and the general opinions about it. To mimic this online behavior and accurately model POIs from their textual content, we propose to model POIs from their reviews/tips.

To summarize all reviews/tips belonging to one POI, we first concatenate all reviews/tips of the POI into one document. Formally, for the $q$-th POI $p_q$, its list of reviews/tips is concatenated into one document $d_q$. The document $d_q$ contains the semantic information and common opinions of the $q$-th POI. This helps to construct a meaningful solution space and facilitates the prediction of users' future check-ins. It also helps learn users' historical behavior more effectively and boosts prediction performance.

[Fig. 2 diagram labels: Document Embeddings (example input "Japanese spots in Convoy area"), Convolutional Layer with Multiple Filters, Max Pooling Layer, Fully Connected Layer.]

Fig. 2: The structure of the POI representation learning component.

Given a document $d_q$ of POI $p_q$, before feeding it to the POI context extraction component, we first apply a word embedding function, denoted as $\Phi$, to each word of $d_q$. $\Phi$ maps each word into an $n$-dimensional vector.

Assume there are $N$ words in document $d_q$; then the embedding matrix $\Pi_q$ of document $d_q$ is represented as:

$$\Pi_q = \Phi(w_1) \oplus \Phi(w_2) \oplus \cdots \oplus \Phi(w_n) \oplus \cdots \oplus \Phi(w_N) \quad (2)$$

where $\Phi$ is the word embedding function, $\Pi_q$ denotes the embedding matrix of document $d_q$, and $\oplus$ is the concatenation operator. Note that the $n$-th column of $\Pi_q$ corresponds to the embedding of the $n$-th word in document $d_q$.
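A small sketch of how $d_q$ and $\Pi_q$ in Eq. (2) might be assembled is shown below. It assumes naive lower-cased whitespace tokenization and a plain dictionary standing in for the embedding function $\Phi$; both, as well as the zero vector used for out-of-vocabulary words, are assumptions of this illustration. The matrix is stored as a list whose $n$-th entry is $\Phi(w_n)$.

```python
def build_document(reviews):
    """Concatenate all reviews/tips of one POI into a single token list d_q."""
    tokens = []
    for review in reviews:
        tokens.extend(review.lower().split())  # naive whitespace tokenizer
    return tokens


def embedding_matrix(tokens, phi, dim):
    """Return Pi_q as a list whose n-th entry is Phi(w_n), following Eq. (2).

    phi: dict from word to a length-`dim` list, standing in for a learned or
        pretrained embedding table. Unknown words map to a zero vector here.
    """
    zero = [0.0] * dim
    return [list(phi.get(word, zero)) for word in tokens]


doc = build_document(["Great sushi", "Sushi spot in Convoy"])
phi = {"great": [1.0, 0.0], "sushi": [0.0, 1.0]}
print(len(embedding_matrix(doc, phi, dim=2)))  # 6 entries, one per word
```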

Following the embedding function, three inner layers of the CNN, namely a convolution layer, a max-pooling layer, and a fully connected layer, are built to learn feature vectors of POIs. The structure of the POI representation learning component is illustrated in Fig. 2. Next, we explain these three layers in detail.

Convolutional layers apply the convolution operator to document embeddings to generate new features. A convolution operation corresponds to a neuron in a neural network. It applies a filter $K_j \in \mathbb{R}^{h \times t}$ to a window of $h$ words to generate a new feature. For example, applying the convolution operation to document $d_q$ produces the feature $z^q_j$, defined as follows:

$$z^q_j = f(\Pi_q * K_j + b_j) \quad (3)$$

where $z^q_j$ is the new convolution feature produced by filter $K_j$, $\Pi_q$ is the embedding matrix of the $q$-th document that the convolution operation works on, the symbol $*$ is the convolution operator, $b_j$ is the bias term, and $f$ is the activation function. Rectified Linear Units (ReLUs) [15] are used as activation units. It has been shown that using ReLUs as activation units in CNNs effectively shortens the training time of neural networks [16]. The ReLU function is

$$f(x) = \max\{0, x\} \quad (4)$$

Following the convolution layer, a max pooling operation is applied to the newly produced features as in Eq. (5):

$$l_j = \max\{z^1_j, z^2_j, \ldots, z^{N-h+1}_j\} \quad (5)$$

Here, $z^1_j, \ldots, z^{N-h+1}_j$ are the values of the feature map produced by filter $K_j$ at each of the $N-h+1$ positions of an $h$-word window over the $N$-word document, and $l_j$ denotes the pooled feature corresponding to filter $K_j$. For all of the filters, the features produced by the max pooling layer are

$$L = \{l_1, l_2, l_3, \ldots, l_{n_1}\} \quad (6)$$

where $n_1$ denotes the number of filters (neurons). The output of the max pooling layer is fed into a fully connected layer as:

$$x_q = f(W \times L + g), \quad (7)$$

where $W$ is the weight matrix of the fully connected layer, $g$ is its bias term, and $x_q \in \mathbb{R}^{n_2 \times 1}$ is the latent feature vector of the $q$-th POI. The fully connected layer is designed to learn non-linear combinations of the features extracted by the convolution and max pooling operations.
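The three layers of Eqs. (3)-(7) can be sketched in plain Python for a single POI document. This is an illustrative sketch under stated assumptions: the document matrix is a list of $N$ word vectors, each filter $K_j$ spans $h$ words and the full embedding width (so $t$ is taken to equal the embedding dimension), convolution is unpadded, and the helper names are hypothetical. A practical implementation would use a deep learning framework and learn $K_j$, $b_j$, $W$, and $g$ by backpropagation.

```python
def relu(x):
    """Eq. (4)."""
    return max(0.0, x)


def conv_feature_map(doc, kernel, bias):
    """Eq. (3): slide an h-word filter over the document matrix.

    doc: list of N word vectors; kernel: list of h vectors of the same width.
    Returns the N - h + 1 activations z_j^1, ..., z_j^{N-h+1}.
    """
    h = len(kernel)
    if len(doc) < h:
        raise ValueError("document shorter than filter; pad it first")
    feature_map = []
    for i in range(len(doc) - h + 1):
        s = bias
        for a in range(h):
            s += sum(d * k for d, k in zip(doc[i + a], kernel[a]))
        feature_map.append(relu(s))
    return feature_map


def poi_features(doc, kernels, biases, W, g):
    """Eqs. (3)-(7): convolution + ReLU, max pooling per filter, fully connected layer."""
    # Eqs. (5)-(6): one pooled value l_j per filter K_j.
    L = [max(conv_feature_map(doc, K, b)) for K, b in zip(kernels, biases)]
    # Eq. (7): x_q = f(W L + g), one output per row of W.
    return [relu(sum(w * l for w, l in zip(row, L)) + g_k) for row, g_k in zip(W, g)]
```

With the experimental settings reported later (100 filters of length 3 and 50 fully connected neurons), `kernels` would hold 100 filters of 3 rows each and `W` would have 50 rows.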

C. User representation learning

In this section, we aim to model a user's interests from the user's past POI sequence. Traditional approaches represent each POI with a one-hot encoding and lose the rich semantic information in the textual content. In order to utilize the semantic information, we build a check-in sequence modeling component that uses the POI representations from the POI representation learning component. Given the list of POIs visited by user $i$ and their corresponding embeddings, the sequence modeling component generates a vector as the embedding of user $i$.

The check-in sequence modeling component employs long short-term memory networks (LSTM) [17] to model check-in sequences with long-term dependencies. The architecture of the LSTM is illustrated in Fig. 3.

[Fig. 3 diagram labels: sigmoid (σ) gates, tanh units, element-wise × and + operations; inputs $C_{t-1}$, $h_{t-1}$, $X_t$; outputs $C_t$, $h_t$.]

Fig. 3: Basic structure of LSTM.

The basic structure of the LSTM involves a special gating mechanism consisting of a memory cell, an input gate, an output gate, and a forget gate. These parts work collaboratively to store and access information in the memory cell, a unique part introduced in the LSTM specifically to handle the long-term dependency problem. The equations of this mechanism are listed as follows.

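$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (8)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (9)$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_C) \quad (10)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (11)$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (12)$$

$$h_t = o_t * \tanh(C_t) \quad (13)$$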

where $W_f$, $W_i$, $W_c$, and $W_o$ are weight matrices, $b_f$, $b_i$, $b_C$, and $b_o$ are bias terms, and $*$ denotes element-wise multiplication. Equation (8) works in the forget gate layer, which calculates how much information should be discarded from the memory cell. In Eq. (8), $h_{t-1}$ denotes the hidden state at the previous time stamp, $x_t$ stands for the input at time stamp $t$, $b_f$ is the bias term of the forget gate, $f_t$ determines how much information should be kept in the memory cell, and $\sigma$ is the sigmoid function. In our scenario, the input $x_t$ is the embedding vector of the POI checked in at this time stamp. Equation (8) decides what information to forget from the memory cell of the previous time stamp, while Eqs. (9) and (10) determine what new information should be stored in the new memory cell. Equation (9) works in the input gate layer, which decides which values will be updated according to the previous hidden state and the input, and Eq. (10) deploys a tanh layer to create a vector of new candidate values $\tilde{C}_t$. Equation (11) updates the memory cell values at this time stamp. Equations (12) and (13) calculate the new hidden state from the new memory cell, the hidden state at the previous time stamp, and the input at this time stamp.
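Equations (8)-(13) translate directly into a single LSTM step. The sketch below is a plain-Python illustration with hypothetical parameter names; it assumes the LSTM starts from zero hidden and cell states and that the last hidden state of a user's check-in sequence serves as the user representation $h_i$, which the text does not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (8)-(13).

    p holds weight matrices "Wf", "Wi", "Wc", "Wo" (hidden x (hidden + input))
    and bias vectors "bf", "bi", "bC", "bo".
    """
    hx = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["Wf"], p["bf"], hx)]           # Eq. (8)
    i = [sigmoid(z) for z in affine(p["Wi"], p["bi"], hx)]           # Eq. (9)
    c_tilde = [math.tanh(z) for z in affine(p["Wc"], p["bC"], hx)]   # Eq. (10)
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # Eq. (11)
    o = [sigmoid(z) for z in affine(p["Wo"], p["bo"], hx)]           # Eq. (12)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                 # Eq. (13)
    return h, c


def encode_user(poi_embeddings, p, hidden_size):
    """Run the LSTM over a user's POI embeddings; return the last hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x_t in poi_embeddings:
        h, c = lstm_step(x_t, h, c, p)
    return h
```

In DCPR the inputs `poi_embeddings` would be the CNN outputs $x_q$ of the POIs in the user's sequence, which is what connects the two components.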

D. Check-in behavior learning

[Fig. 4 panels: (a) Data distribution@4sq, (b) Data distribution@Yelp.]

Fig. 4: Global distribution of POIs' locations in the Foursquare and Yelp datasets.

Ranking-based loss functions have attracted much attention lately [6], [5] since they directly optimize the ranking order of POIs. Essentially, recommending POIs amounts to ranking the list of POIs so that POIs with a high probability of being visited by the user appear at the top. Inspired by Bayesian Personalized Ranking [18], we model the conditional probability over POI $j$'s latent features with a Gaussian distribution as

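$$p(v_j \mid x_j, \lambda_v) = \mathcal{N}(v_j \mid x_j, \lambda_v^{-1} I) \quad (14)$$

where $I$ is a $K \times K$ identity matrix. Similarly, the conditional probability over user $i$'s latent representation with a Gaussian distribution is defined as

$$p(u_i \mid h_i, \lambda_u) = \mathcal{N}(u_i \mid h_i, \lambda_u^{-1} I) \quad (15)$$

where $I$ is also a $K \times K$ identity matrix. Here, $x_j$ is the CNN output for POI $j$ from Eq. (7), and $h_i$ is the LSTM representation of user $i$. The goal is to maximize the difference between the positive POI and the negative POI. The probability of this difference given user $i \in U$, positive POI $j \in P^+_i$, and negative POI $j' \in P^-_i$ is defined as

$$p(r_{i,j,j'} \mid u_i, v_j, v_{j'}) = \sigma(u_i^T v_j - u_i^T v_{j'}) \quad (16)$$

where $u_i$, $v_j$, and $v_{j'}$ are the latent features of user $i$, POI $j$, and POI $j'$, and $\sigma$ is the sigmoid function.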

For optimization, we use maximum a posteriori (MAP) estimation. Maximizing the posterior probability of $u$, $v$, and the parameters of the deep neural networks is equivalent to minimizing the negative log-posterior:

$$\mathcal{L} = -\sum_{(i,j,j') \in D_s} \Big\{ \log \sigma(u_i^T v_j - u_i^T v_{j'}) - \frac{\lambda_u}{2} (u_i - h_i)^T (u_i - h_i) - \frac{\lambda_v}{2} (v_j - x_j)^T (v_j - x_j) \Big\} \quad (17)$$

The first term of Eq. (17) enforces user preference by maximizing the difference between the product of the user factors with the positive POI embedding and the product of the user factors with the negative POI embedding. The second and third terms force $u_i$ and $v_j$ to stay close to user $i$'s LSTM representation $h_i$ and POI $j$'s CNN features $x_j$, respectively. The Stochastic Gradient Descent algorithm [19] is utilized to minimize the loss function.
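As a minimal sketch of Eq. (17) for one training triple, the code below computes the loss term and performs one SGD update of $u_i$, $v_j$, and $v_{j'}$. It assumes $h_i$ and $x_j$ are held fixed during the step; in the full model, gradients would also flow back into the LSTM and CNN, which this illustration omits. The gradients follow from $\partial(-\ln\sigma(d))/\partial d = -(1-\sigma(d))$ with $d = u_i^T v_j - u_i^T v_{j'}$.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """log(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def triple_loss(u, v_pos, v_neg, h, x_pos, lam_u, lam_v):
    """Per-triple term of Eq. (17)."""
    d = dot(u, v_pos) - dot(u, v_neg)
    reg_u = 0.5 * lam_u * sum((a - b) ** 2 for a, b in zip(u, h))
    reg_v = 0.5 * lam_v * sum((a - b) ** 2 for a, b in zip(v_pos, x_pos))
    return softplus(-d) + reg_u + reg_v  # -ln sigma(d) == softplus(-d)


def _update(vec, grad, lr):
    return [w - lr * g for w, g in zip(vec, grad)]


def sgd_step(u, v_pos, v_neg, h, x_pos, lam_u, lam_v, lr):
    """One SGD update of u_i, v_j, v_j' with h_i and x_j held fixed."""
    d = dot(u, v_pos) - dot(u, v_neg)
    g = sigmoid(-d)  # equals 1 - sigma(d)
    grad_u = [-g * (p - n) + lam_u * (a - b) for p, n, a, b in zip(v_pos, v_neg, u, h)]
    grad_pos = [-g * a + lam_v * (p - x) for a, p, x in zip(u, v_pos, x_pos)]
    grad_neg = [g * a for a in u]  # Eq. (17) places no prior term on v_j'
    return _update(u, grad_u, lr), _update(v_pos, grad_pos, lr), _update(v_neg, grad_neg, lr)
```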

IV. EXPERIMENTS

To test whether the proposed architecture can effectively model users' check-in sequences and extract semantic information from text, in this section we evaluate the performance of the proposed framework and state-of-the-art baselines with various metrics and case studies.

TABLE I: Datasets

Dataset Foursquare Yelp TIST

Users 74,140 30,367 266,909

POIs 104,844 25,728 3,680,126

Check-ins 418,081 146,456 33,263,631

A. Experiment Setting

We conduct extensive experiments to evaluate the proposed DCPR algorithm on the following three datasets. The statistics of each dataset are summarized in Table I. For the Foursquare and Yelp datasets, check-ins are collected from tweets in the Twitter network; each tweet's source indicates whether it is a Foursquare check-in, a Yelp check-in, or something else. Foursquare tips are collected from the Foursquare network, and Yelp reviews are collected from the Yelp website. To remove users and POIs with too few check-ins, we filter out users with fewer than 5 check-ins and POIs with fewer than 3 visits in both the Foursquare and Yelp datasets. The TIST dataset is a public dataset that was originally used for monitoring and visualizing global check-in behaviors [20]. Note that this dataset contains only user check-ins and no text information. For the TIST dataset, we remove users with fewer than 20 check-ins and POIs with fewer than 20 visits. The distributions of POIs' locations in the Foursquare and Yelp datasets are shown in Fig. 4. The Yelp dataset contains more check-ins in the United States, while the Foursquare dataset includes check-ins spread worldwide, which makes the learning task more challenging. We omit a detailed description of the TIST dataset; please refer to [20] for more details.
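The frequency filter described above might look like the following sketch. The text does not say whether filtering is applied once or repeated until no user or POI falls below the thresholds; this version performs a single pass, which is an assumption. Thresholds default to the Foursquare/Yelp setting (5 check-ins per user, 3 visits per POI); the TIST setting would pass 20 for both.

```python
from collections import Counter


def filter_checkins(checkins, min_user=5, min_poi=3):
    """Drop check-ins of users with fewer than `min_user` check-ins and of
    POIs with fewer than `min_poi` visits (single pass).

    checkins: list of (user, poi) pairs.
    """
    user_counts = Counter(user for user, _ in checkins)
    poi_counts = Counter(poi for _, poi in checkins)
    return [(user, poi) for user, poi in checkins
            if user_counts[user] >= min_user and poi_counts[poi] >= min_poi]
```

Because removing a POI can push a user below the threshold (and vice versa), a repeated variant would loop until the output stops changing.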

We compare the proposed approach with two state-of-the-art POI recommendation algorithms (FPMC, PRME), one traditional recommendation algorithm (FM), and two deep models (RNN, CDL).

1) FPMC [6] refers to the factorized personalized Markov chains model, which constructs a transition tensor to model the probability of a user's next behavior based on previous behaviors. A factorization model is used to decompose the tensor to estimate the probability. The factorization model is able to learn information among similar users and similar items.

2) PRME [5] stands for the personalized ranking metric embedding model. It learns two embeddings in two separate spaces: one embedding is based on sequential transition probability, while the other is based on user preferences. Each user's top-N recommendation is based on a linear combination of the learned embeddings.

3) FM [21] refers to the Factorization Machine. It models pairwise interactions between all features. Note that, for the proposed problem, three types of features are constructed for FM: one-hot encodings of users, combinations of one-hot encodings of POIs in check-in sequences, and one-hot encodings of the POI checked in after the sequences.

4) RNN [11] is a state-of-the-art deep model for sequential prediction based on recurrent neural networks.

5) CDL [22] jointly models text information with deep representation learning and user feedback with collaborative filtering.

6) DCPR is the method proposed in this paper.

[Fig. 5 panels: (a) Precision@TopN (4sq), (b) Recall@TopN (4sq), (c) F1-score@TopN (4sq), (d) Precision@TopN (Yelp), (e) Recall@TopN (Yelp), (f) F1-score@TopN (Yelp); x-axis N from 0 to 20; legends list FPMC, PRME, FM, RNN, DCPR (4sq) and FPMC, PRME, FM, RNN, CDL, DCPR (Yelp).]

Fig. 5: Performance on Foursquare and Yelp datasets.

To evaluate the performance of the different approaches, for each user we take the first 80% of check-ins as training data and the remaining 20% as testing data. For the FPMC algorithm, the training data is further divided into 80% for training and 20% for validation. The learning rate is set to 0.005, the regularization parameter is set to 0.03, and the factorization dimension is set to 20. For the PRME algorithm, the parameter α and the latent dimension are set to 0.02 and 60, respectively, following the setting in the original paper. For the RNN algorithm, the dimension of POIs' embeddings is set to 50, the number of neurons in the recurrent layer is set to 64, and cross entropy is employed as the loss function. For the proposed DCPR algorithm, the embedding dimension of POIs is set to 50. For the convolution layer, the number of filters is set to 100 and the filter length is set to 3. The number of neurons in both the fully connected layer and the recurrent layer is set to 50. Note that we use different latent dimensions for different comparison algorithms to optimize the performance in each case.
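The per-user chronological split can be sketched as follows. Rounding the cut point down with `int` is an assumption of this illustration; the text only specifies the 80%/20% proportions. Applying the same function to the training part reproduces the further 80%/20% training/validation split used for FPMC.

```python
def chronological_split(sequence, train_ratio=0.8):
    """Split one user's time-ordered check-ins into (train, test) parts."""
    cut = int(len(sequence) * train_ratio)  # rounding down is an assumption
    return sequence[:cut], sequence[cut:]


checkins = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
train, test = chronological_split(checkins)  # 8 training, 2 test check-ins
fit, valid = chronological_split(train)      # FPMC: further 80%/20% split
print(len(train), len(test), len(fit), len(valid))
```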

Three metrics are used to evaluate the performance of the compared methods. The output of each method is a ranked list of all POIs, ordered from high to low by the likelihood of the POI being checked in during the testing period. The first metric, Precision@N, measures the percentage of correct predictions in the top-N ranked list. The second metric, Recall@N, measures the percentage of the top-N ground truth set that is correctly predicted. Note that the top-N ground truth set is constructed based on the time difference between the training check-in sequence and the testing check-ins: the smaller the time difference, the higher the POI is ranked in the top-N ground truth list. The third metric, F1-score@N, is the harmonic mean of the two above-mentioned metrics and provides a comprehensive evaluation of the compared methods.
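One way to compute the three metrics for a single user is sketched below. The construction of the top-N ground-truth set is an interpretation: the sketch takes the first $N$ distinct test POIs in chronological order, so that test check-ins closer in time to the training sequence rank higher. Averaging the per-user values over all users would give dataset-level scores.

```python
def topn_metrics(ranked, test_pois, n):
    """Precision@N, Recall@N and F1-score@N for one user.

    ranked: POI ids sorted by predicted likelihood, highest first.
    test_pois: the user's test check-ins in chronological order. The top-N
        ground-truth set is taken to be the first N distinct POIs (assumption).
    """
    truth = list(dict.fromkeys(test_pois))[:n]  # dedupe while keeping time order
    hits = len(set(ranked[:n]) & set(truth))
    precision = hits / n
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```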

B. Performance Comparison

Fig. 5 shows the performance of POI recommendation on the Foursquare and Yelp datasets in terms of Precision@N, Recall@N, and F1-score@N, with N varying from 1 to 20. We make four observations as follows.

[Fig. 6 panels: (a) Precision@TopN (TIST), (b) Recall@TopN (TIST), (c) F1-score@TopN (TIST); x-axis N from 0 to 20; curves for FPMC and DCPR.]

Fig. 6: Performance on TIST dataset.

[Fig. 7 panels: (a) Precision@Top5 (4sq), (b) Recall@Top5 (4sq), (c) Fscore@Top5 (4sq), (d) Precision@Top5 (Yelp), (e) Recall@Top5 (Yelp), (f) Fscore@Top5 (Yelp); x-axis training size from 50% to 80%; curves for FPMC, PRME, FM, RNN, DCPR, plus CDL on Yelp.]

Fig. 7: Performance comparison with varied training size.

• DCPR outperforms the other compared methods on both datasets in almost all cases, as shown in Fig. 5. Although PRME achieves slightly better results on the Yelp dataset when N = 1, the proposed DCPR algorithm performs best in most cases. PRME performs slightly better there because it utilizes a metric embedding technique to model sequential transition probability. The metric embedding technique is designed to learn the transition probability between consecutive check-ins, but it cannot model long-term sequential influences. In contrast, the proposed DCPR algorithm employs a recurrent structure specifically designed to model long-term dependencies; therefore, DCPR wins for almost all values of N.

• FM usually performs well in rating prediction tasks, but it achieves inferior results compared to the other methods in the POI recommendation task. Although FM captures all pairwise interactions between all features, the model is incapable of differentiating the importance of different feature interactions. Therefore, it is not able to focus on important feature interactions and ignore insignificant ones. In comparison, the proposed DCPR has different parts that specialize in modeling specific types of information and jointly learns the importance of each part in one loss function. Therefore, it achieves superior results compared to FM.

• The proposed DCPR outperforms the other two deep neural network based models. It can be seen from Fig. 5 that DCPR achieves much higher accuracy than both the RNN algorithm and CDL. Even though RNN models check-in sequences, long-term dependencies may not be captured by its standard recurrent network. Also, RNN ignores text information and thus loses another source of information for tackling the problem. Although the CDL algorithm learns deep representations of content information, it is not capable of modeling sequential influence. The proposed DCPR algorithm models text information and check-in sequences simultaneously, so it outperforms RNN and CDL by a large margin.

• Comparing the performance of the methods on the three metrics, we observe that Precision@N and Recall@N always change monotonically with N in all three datasets, while F1-score@N shows a non-monotonic trend: it first increases and then decreases. It is worth noticing that DCPR almost always achieves the biggest improvement when the comparison algorithms are at their highest F1-score. For example, on the Foursquare dataset (Fig. 5c), when N = 4, DCPR achieves a 13% improvement over FPMC and a 128% improvement over FM. Interestingly, on the Foursquare and Yelp datasets, almost all algorithms perform best when N = 4. This is probably because the Foursquare and Yelp datasets contain more users with short sequences.

From Fig. 5, we can conclude that FPMC is the best-performing baseline method. Therefore, for the large-scale TIST dataset, we only compare the performance of the proposed DCPR algorithm with FPMC in Fig. 6. Since the TIST dataset does not contain text information, we adapt the proposed DCPR algorithm to generate embeddings for POIs directly, omitting the convolutional feature generation process. We can see that, for all three metrics, DCPR always outperforms FPMC by a large margin. For instance, for the F1-score@N metric, when N equals 12, DCPR achieves a 15.2% improvement over FPMC. For the F1-score@N metric, both algorithms perform best when N = 15. This is probably because the TIST dataset includes users with longer sequences than the Foursquare and Yelp datasets. It shows that the proposed DCPR is robust to varying sequence lengths. Also, compared to its performance on the Foursquare and Yelp datasets, the proposed DCPR algorithm achieves the largest improvement on the TIST dataset. This is probably because the proposed DCPR is especially good at modeling long-term dependencies, and the average sequence length of the TIST dataset is much longer than that of the other two datasets.

The robustness of the proposed algorithm is also tested by varying the size of the training check-ins in the Foursquare and Yelp datasets, with N = 5 for illustration purposes. As can be seen in Fig. 7, the proposed DCPR always outperforms the other compared algorithms. For instance, in the Yelp dataset, when the size of the training data increases from 50% to 80%, FPMC's Recall@Top5 increases by 7.54%, while DCPR's Recall@Top5 increases by 14.92%. These results on the size of the training set indicate that DCPR is capable of producing high-quality embedding vectors of users and POIs.

(a) Precision@Top5-4sq (b) Recall@Top5-4sq

(c) Precision@Top5-Yelp (d) Recall@Top5-Yelp

Fig. 8: Gain over FPMC on Foursquare and Yelp datasets.

Besides evaluating the proposed approach on the whole dataset with different metrics at the macro level, we also present a comprehensive study of the compared approaches at the micro level. Specifically, we study the performance of the proposed algorithm on different user groups, where users are clustered according to the length of their check-in sequences. As an illustrative example, Fig. 8 shows the gain of DCPR over FPMC in Precision@5 and Recall@5. We pick users with moderately long sequences in the overall population for the Foursquare and Yelp datasets. In addition, the population density of each group is shown to provide an in-depth understanding of the performance of the different algorithms; it is indicated by the size of the orange markers in Fig. 8. First of all, DCPR achieves more than 10% improvement in every user group. Interestingly, for both datasets, the highest improvement is always achieved for users with 11 or 12 check-ins. For instance, when the sequence length equals 11, the proposed algorithm achieves nearly 50% improvement over FPMC on the Foursquare dataset and nearly 70% on the Yelp dataset. A possible reason for this observation is that a sequence reaching too far into the past may contain more noise, while a sequence that is too short does not capture enough behavioral information.
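The micro-level study above can be sketched as a simple group-by over users. This minimal Python example assumes that per-user metric values (e.g. Precision@5) are already available for both DCPR and the baseline, that a group's gain is the relative difference of its summed (equivalently, mean) metric, and that population density is each group's share of users. The function name and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def gain_by_sequence_length(
    records: Iterable[Tuple[int, float, float]],
) -> Dict[int, Tuple[float, float]]:
    """Relative gain of a model over a baseline per sequence-length group.

    records: one (sequence_length, model_metric, baseline_metric) triple per user.
    Returns {length: (gain_percent, population_share)}.
    """
    # Per group: [sum of model metric, sum of baseline metric, number of users]
    totals: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for length, model_score, baseline_score in records:
        group = totals[length]
        group[0] += model_score
        group[1] += baseline_score
        group[2] += 1
    num_users = sum(group[2] for group in totals.values())
    result = {}
    for length in sorted(totals):
        model_sum, baseline_sum, count = totals[length]
        if baseline_sum > 0:
            gain = (model_sum - baseline_sum) / baseline_sum * 100.0
        else:
            gain = float("nan")  # gain undefined when the baseline scores zero
        result[length] = (gain, count / num_users)
    return result
```

The population share returned for each group corresponds to what the marker size encodes in Fig. 8.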

C. Sensitivity Analysis

We perform the sensitivity analysis in Fig. 9 on two parameters: the number of convolution kernels n1 and the number of latent recurrent features n2. Due to space limitations, all results are based on the Yelp dataset. The upper two panels show results for n1, while the bottom two panels show results for n2; the first column shows Precision@5 and the second column shows Recall@5. As can be seen, when n1 increases from 5 to 50, both metrics increase, but beyond 50 they remain almost unchanged. When n2 increases from 5, the performance increases drastically until n2 reaches 50, after which it stays roughly constant. Therefore, for the proposed DCPR algorithm, we set the number of convolution kernels to 100 and the number of recurrent features to 50.

(a) Precision@5, n1 (b) Recall@5, n1

(c) Precision@5, n2 (d) Recall@5, n2

Fig. 9: Sensitivity analysis.

V. RELATED WORK

A. POI Recommendation

As in traditional recommender systems, matrix factorization techniques have been introduced into POI recommendation [23], [24]. Unlike item recommender systems, which employ explicit user feedback such as ratings, POI recommendation utilizes implicit user behavior (check-ins) as user feedback. Other implicit information has also been introduced, such as the locations of checked-in POIs, the temporal information of check-ins, and social networks. Some recent works focus on leveraging geographical influence [23], [24], social influence [24], and temporal effects. [24] combines users' preference, social influence, and geographical influence based on a matrix factorization framework. [23] proposes the GeoMF model, which jointly models geographical information and user preference. [25] introduces a ranking-based loss into the GeoMF model.
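To illustrate how matrix factorization can be adapted to implicit check-in feedback, here is a minimal, generic Python sketch. It is not GeoMF [23] or the model of [24]: it assumes that observed check-ins are positive examples with target 1, that uniformly sampled unvisited POIs are negatives with target 0, and that the factors are fit by SGD on a regularized squared error. All names and hyperparameter values are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def train_implicit_mf(
    checkins: Dict[int, Set[int]],
    num_pois: int,
    k: int = 8,
    epochs: int = 200,
    lr: float = 0.05,
    reg: float = 0.01,
    seed: int = 0,
) -> Tuple[Dict[int, List[float]], List[List[float]]]:
    """Pointwise MF on implicit check-ins with one sampled negative per positive."""
    rng = random.Random(seed)
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(k)] for u in checkins}  # user factors
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_pois)]  # POI factors

    def sgd_step(u: int, l: int, target: float) -> None:
        pu, ql = P[u], Q[l]
        err = target - sum(a * b for a, b in zip(pu, ql))
        for f in range(k):
            pf, qf = pu[f], ql[f]
            pu[f] += lr * (err * qf - reg * pf)
            ql[f] += lr * (err * pf - reg * qf)

    for _ in range(epochs):
        for u, visited in checkins.items():
            for l in visited:
                sgd_step(u, l, 1.0)  # observed check-in: positive implicit feedback
                j = rng.randrange(num_pois)
                if j not in visited:
                    sgd_step(u, j, 0.0)  # sampled unvisited POI: treated as negative
    return P, Q


def predict(P: Dict[int, List[float]], Q: List[List[float]], u: int, l: int) -> float:
    """Predicted preference of user u for POI l (inner product of factors)."""
    return sum(a * b for a, b in zip(P[u], Q[l]))
```

With toy check-ins such as `{0: {0, 1}, 1: {2, 3}}`, the learned scores should rank each user's visited POIs above the other user's POIs. Ranking-based variants such as [25] replace this pointwise squared error with a pairwise loss.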

Sequential pattern mining has gained much attention lately in personalized recommendation [6], [26]. Rendle et al. [6] propose the FPMC model, which constructs a personalized transition probability tensor based on Markov chains and then estimates this tensor with a factorization model. FPMC has been extended by incorporating geographical constraints [27]. Embedding techniques [5], [28] have attracted much research attention lately since they are capable of learning better representations for various tasks. The Personalized Ranking Metric Embedding (PRME) [5] model learns embeddings in two separate spaces, which model the sequential transition probability and user preference, respectively. A Bayesian personalized ranking loss is introduced to combine the learned embeddings to predict future check-ins. Instead of learning POI representations only from previous check-ins, [28] proposes to learn representations from surrounding check-ins, inspired by skip-gram. [29] combines the skip-gram model with a Bayesian personalized ranking loss. Even though PRME also models sequential patterns and user preference, a simple linear combination of embeddings cannot explain the complex interaction between these two factors.
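The FPMC scoring rule and the pairwise ranking loss mentioned above can be sketched as follows. This minimal Python example assumes the formulation of [6]: a user-POI preference term plus a transition term averaged over the previously visited POIs, each an inner product of factor vectors, trained with a BPR-style loss [18] on (visited, unvisited) pairs. The argument names are hypothetical; PRME instead scores candidates by distances in its two metric embedding spaces, which this sketch does not cover.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fpmc_score(
    v_u_i: Sequence[float],
    v_i_u: Sequence[float],
    v_i_l: Sequence[float],
    prev_l_i: List[Sequence[float]],
) -> float:
    """FPMC-style score of candidate POI i for user u.

    v_u_i, v_i_u: user and POI factors of the user-POI (matrix factorization) part.
    v_i_l: candidate POI factor of the transition (Markov chain) part.
    prev_l_i: transition factors of the previously visited POIs
              (for next-POI prediction this is often a single POI).
    """
    preference = dot(v_u_i, v_i_u)
    transition = sum(dot(v_i_l, v_l) for v_l in prev_l_i) / len(prev_l_i)
    return preference + transition


def bpr_loss(score_pos: float, score_neg: float) -> float:
    """Pairwise BPR loss -ln(sigmoid(score_pos - score_neg)), numerically stable."""
    d = score_pos - score_neg
    # -ln(sigmoid(d)) = ln(1 + exp(-d)); split on the sign of d to avoid overflow
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

Minimizing `bpr_loss` over sampled (visited, unvisited) pairs pushes visited POIs above unvisited ones in the ranking, rather than fitting absolute scores.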

B. Context-Aware Recommendation

Although spatial, temporal, and social information have been investigated in POI recommendation, text information is relatively less explored [30], [8], [9]. Text information includes reviews, tags, tips, categories, etc. [8] proposes a topic and location aware probabilistic matrix factorization model using POI-associated tags. First, users' interests with respect to semantic topics are learned from the text information of POIs through the Latent Dirichlet Allocation (LDA) model. Then, the learned topic interests of users are compared with the topic distributions of POIs to find potential POIs using probabilistic matrix factorization. Meanwhile, word-of-mouth opinions are considered in this factor-based model. Yang et al. [7] employ sentiment analysis techniques to extract users' preferences from text information (tips); the preferences inferred from contents are then considered simultaneously with the preferences learned from users' check-in behavior. The factor analysis framework has also been extended to model geographical influence [24]. Similar to the LDA model, [9] proposes a spatial topic model that simultaneously models spatial and content information in Twitter networks. [31] investigates personal and local preferences from POIs' contents. [30] exploits the contents associated with POIs and the comments written by users with weighted matrix factorization. [32] models personal preferences and sequential influence with a latent probabilistic generative model.

The above-mentioned models learn text similarity based only on lexical similarity. However, two reviews can be semantically similar even when they have low lexical overlap, since English vocabulary is very diverse. These works ignore semantic meaning, which plays an important role in understanding POIs. In addition, topic modeling-based approaches are easily affected by the sparsity problem and cannot cope with new users and POIs.
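A small example makes the lexical-overlap argument concrete. This Python sketch uses word-level Jaccard overlap as a simple stand-in for lexical similarity (an assumption; the cited models rely on topic models rather than Jaccard). The two hypothetical reviews both praise food and service, yet share no words.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens of a review."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


review_a = "Tasty noodles and friendly staff"
review_b = "Delicious ramen, welcoming waiters"
print(jaccard(review_a, review_b))  # 0.0: no shared words, although both praise food and service
```

Any purely lexical measure rates these two reviews as unrelated, which is the gap that learned semantic representations aim to close.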

C. Deep Learning for Recommendation

Lately, neural network based methods have attracted much attention, not only because they generate useful representations for various learning tasks but also because they deliver state-of-the-art performance on natural language processing and other sequential modeling tasks [11], [12]. Among them, Recurrent Neural Networks (RNNs) are especially good at modeling sequences [33], [34]. For example, [11] shows the superior performance of RNNs on sequential click prediction. By concurrently modeling spatial and temporal patterns in LBSNs through the transition matrices of an RNN, [12] achieves promising improvements over matrix factorization-based and Markov chain-based algorithms.
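The recurrence that lets an RNN carry check-in history forward can be sketched in a few lines. This minimal Python example assumes an Elman-style cell, h_t = tanh(W x_t + U h_{t-1} + b) with h_0 = 0, and assumes each check-in has already been mapped to an embedding vector x_t. It is a generic illustration rather than the model of [12] or DCPR; gated cells such as LSTM [17] are commonly used to mitigate the long-term dependency problem [34]. The weight names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_hidden_states(inputs: List[Vector], W: Matrix, U: Matrix, b: Vector) -> List[Vector]:
    """Hidden states h_t = tanh(W x_t + U h_{t-1} + b) for a check-in sequence.

    inputs: check-in embedding vectors x_1..x_T (oldest first).
    W: input-to-hidden weights; U: hidden-to-hidden weights; b: bias.
    """
    h = [0.0] * len(b)  # h_0 = 0
    states = []
    for x in inputs:
        wx, uh = matvec(W, x), matvec(U, h)
        h = [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]
        states.append(h)
    return states
```

Setting U to zero removes the dependence on earlier check-ins, which is a quick way to see what the recurrent term contributes.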


Researchers have started to employ neural network based models in traditional recommender systems [35], [22]. [35] proposes an item recommendation algorithm that jointly models users and items from reviews using deep neural networks.

As discussed above, while existing studies model either sequential patterns in check-in sequences or review text in item recommender systems, they do not address both challenges simultaneously. Instead of learning sequential patterns with Markov chain-based models, the proposed DCPR model learns personalized sequential behaviors with the aid of an advanced deep model. Instead of relying only on topic modeling-based models to handle review text, DCPR learns the semantic meaning and sentiment of reviews with a deep CNN model.
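The CNN component can be illustrated with the common text-CNN pattern. This minimal Python sketch assumes 1-D convolution over word embeddings, a ReLU activation [15], and max-over-time pooling; the exact DCPR layer configuration is not restated in this section, so the function and argument names are hypothetical. Each kernel yields one feature, so the number of kernels would play the role of n1 in the sensitivity analysis.

```python
from typing import List

Vector = List[float]


def conv1d_max_pool(
    word_vecs: List[Vector], kernels: List[List[Vector]], biases: List[float]
) -> Vector:
    """One feature per kernel: max over positions of ReLU(<kernel, window> + bias).

    word_vecs: d-dimensional word embeddings of one document, in order.
    kernels: each kernel is a list of `width` d-dimensional rows.
    Documents shorter than a kernel's width yield 0.0 for that kernel.
    """
    features = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(word_vecs) - width + 1):
            window = word_vecs[start:start + width]
            activation = bias + sum(
                w * x for krow, wrow in zip(kernel, window) for w, x in zip(krow, wrow)
            )
            best = max(best, max(0.0, activation))  # ReLU, then max-over-time pooling
        features.append(best)
    return features
```

Because pooling takes the maximum over positions, the output length depends only on the number of kernels, not on the review length, which is what lets variable-length reviews feed a fixed-size POI representation.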

VI. CONCLUSION

This paper proposed a deep content-aware POI recommendation (DCPR) algorithm to tackle the problem of POI recommendation. Broad learning from multiple sources of information is utilized to solve this challenging problem. Specifically, the text information associated with POIs and users' check-in sequences are modeled simultaneously. Furthermore, two different types of deep neural networks are combined in an architectural framework, each learning one information source, and finally a ranking-based loss is introduced to learn users' overall check-in behaviors. The proposed DCPR model learns from different information sources discriminatively; therefore, it can synergistically learn multi-source heterogeneous data. In this sense, it is a deep and broad learning model. Evaluations on three real-world datasets demonstrate the effectiveness of the proposed approach. For future work, other side information such as temporal and geographical information can be included in the proposed framework. In addition, the proposed deep framework can be further extended to event recommendation.

ACKNOWLEDGMENT

This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, and by NSFC 61672313.

REFERENCES

[1] A. Agarwal, K. Hosanagar, and M. D. Smith, “Location, location, location: An analysis of profitability of position in online advertising markets,” JMR’11.

[2] R. Lee and K. Sumiya, “Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection,” in SIGSPATIAL Workshop’10.

[3] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes Twitter users: Real-time event detection by social sensors,” in WWW’10.

[4] C. Xia, R. Schwartz, K. Xie, A. Krebs, A. Langdon, J. Ting, and M. Naaman, “CityBeat: Real-time social media visualization of hyper-local city data,” in WWW’14.

[5] S. Feng, X. Li, Y. Zeng, G. Cong, Y. M. Chee, and Q. Yuan, “Personalized ranking metric embedding for next new POI recommendation,” in IJCAI’15.

[6] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, “Factorizing personalized Markov chains for next-basket recommendation,” in WWW’10.

[7] D. Yang, D. Zhang, Z. Yu, and Z. Wang, “A sentiment-enhanced personalized location recommendation system,” in HT’13.

[8] B. Liu and H. Xiong, “Point-of-interest recommendation in location based social networks with topic and location awareness,” in SDM’13.

[9] B. Hu and M. Ester, “Spatial topic modeling in online social media for location recommendation,” in RecSys’13.

[10] H. M. Wallach, “Topic modeling: Beyond bag-of-words,” in ICML’06.

[11] Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu, “Sequential click prediction for sponsored search with recurrent neural networks,” in AAAI’14.

[12] Q. Liu, S. Wu, L. Wang, and T. Tan, “Predicting the next location: A recurrent model with spatial and temporal contexts,” in AAAI’16.

[13] W. W. Cohen, R. E. Schapire, and Y. Singer, “Learning to order things,” J. Artif. Int. Res., 1999.

[14] R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in NIPS’08.

[15] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in ICML’10.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NIPS’12.

[17] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., 1997.

[18] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian personalized ranking from implicit feedback,” in UAI’09.

[19] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Muller, “Efficient backprop,” in Neural Networks: Tricks of the Trade, 1998.

[20] D. Yang, D. Zhang, L. Chen, and B. Qu, “NationTelescope: Monitoring and visualizing large-scale collective behavior in LBSNs,” J. Network and Computer Applications, 2015.

[21] S. Rendle, “Factorization machines with libFM,” ACM Trans. Intell. Syst. Technol., 2012.

[22] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in KDD’15.

[23] D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui, “GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation,” in KDD’14.

[24] B. Liu, Y. Fu, Z. Yao, and H. Xiong, “Learning geographical preferences for point-of-interest recommendation,” in KDD’13.

[25] X. Li, G. Cong, X.-L. Li, T.-A. N. Pham, and S. Krishnaswamy, “Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation,” in SIGIR’15.

[26] J.-D. Zhang, C.-Y. Chow, and Y. Li, “LORE: Exploiting sequential influence for location recommendations,” in SIGSPATIAL’14.

[27] C. Cheng, H. Yang, M. R. Lyu, and I. King, “Where you like to go next: Successive point-of-interest recommendation,” in IJCAI’13.

[28] X. Liu, Y. Liu, and X. Li, “Exploring the context of locations for personalized location recommendations,” in IJCAI’16.

[29] S. Zhao, T. Zhao, I. King, and M. R. Lyu, “Geo-Teaser: Geo-temporal sequential embedding rank for point-of-interest recommendation,” in WWW Companion’17.

[30] H. Gao, J. Tang, X. Hu, and H. Liu, “Content-aware point of interest recommendation on location-based social networks,” in AAAI’15.

[31] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, “LCARS: A location-content-aware recommender system,” in KDD’13.

[32] W. Wang, H. Yin, S. W. Sadiq, L. Chen, M. Xie, and X. Zhou, “SPORE: A sequential personalized spatial item recommender system,” in ICDE’16.

[33] A. Graves, “Generating sequences with recurrent neural networks,” CoRR’13.

[34] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” in A Field Guide to Dynamical Recurrent Neural Networks, 2001.

[35] L. Zheng, V. Noroozi, and P. S. Yu, “Joint deep modeling of users and items using reviews for recommendation,” in WSDM’17.
