
Personalized Travel Recommendation by Mining People Attributes from Community-Contributed Photos

An-Jung Cheng†, Yan-Ying Chen†, Yen-Ta Huang†, Winston H. Hsu†

and Hong-Yuan Mark Liao‡

† Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
‡ Institute of Information Science, Academia Sinica, Taipei, Taiwan

{anon,yanying,hyd0916}@[email protected], [email protected]

ABSTRACT

Leveraging community-contributed data (e.g., blogs, GPS logs, and geo-tagged photos) for travel recommendation is an active research topic because such explosively growing data carry rich contexts and trip activities. In this work, we focus on personalized travel recommendation by leveraging freely available community-contributed photos. We propose to personalize travel recommendation by further considering specific user profiles or attributes (e.g., gender, age, race). Instead of mining photo logs only, we argue for leveraging the people attributes automatically detected in the photo contents. Using information-theoretic measures, we demonstrate that such people attributes are informative and effective for travel recommendation, and in particular provide a promising aspect for personalization. We effectively mine the demographics of different locations (or landmarks) and travel paths. We introduce a probabilistic Bayesian learning framework that further enables mobile recommendation on the spot. We experiment on four million photos collected for eight major worldwide cities. The experiments confirm that people attributes are promising and orthogonal to prior works that use travel logs only, and that they can further improve prior travel recommendation methods, especially for difficult predictions, by leveraging user contexts in mobile devices.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; H.2.8 [Database Applications]: [Data mining, Spatial databases and GIS]

General Terms

Algorithms, Experimentation, Human Factors

Keywords

Personalized tourist recommendation, geo-tagged photos, route planning

*Area chair: Daniel Gatica-Perez

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM’11, November 28–December 1, 2011, Scottsdale, Arizona, USA.
Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.


Figure 1: Community-contributed photos are freely available and come with rich context information (e.g., geo-tags, time). We can mine the travel patterns of the users by associating their trips with the major landmarks. Meanwhile, we can indirectly mine demographic information (e.g., gender, race, age) from the photos along with the trips by automatically detecting people attributes in the photos. In this example, the width of an arrow denotes the travel frequency between the two locations. The colored regions are proportional to the percentages of genders: male (blue) and female (red). Such mined demographic information is promising for personalized travel recommendation and planning; for example, suggesting the best route for Asian travelers in Manhattan or the next location for a family group after visiting Rockefeller Center.

1. INTRODUCTION

With the prosperity of social media and the success of many photo-sharing websites, such as Flickr and Picasa, the volume of community-contributed photos has increased drastically. Such large-scale user-contributed photos contain rich metadata such as tags, time, and geo-locations (or geo-tags). These overwhelming amounts of context data, though noisy, are tremendously useful for many multimedia applications including annotation, searching, advertising and recommendation [15].

Among all the applications, travel recommendation has attracted many researchers because of its importance and its intrinsic relationship with people's everyday lives. In general, a typical travel recommendation system consists of two aspects: generic recommendation and personalized recommendation [19]. Generic recommendation provides suggested travel information for the destination given by a user who is planning a trip; it answers questions like "I want to go to New York, what are the must-see attractions?". Personalized recommendation, on the other hand, takes the user's profile into account so that it can provide a more appropriate recommendation result matching the user's preferences. Both aspects support route planning before the journey.

Millions of human-sensors [16] capture different aspects of spatio-temporal information. In order to mine travel knowledge automatically, a focus of recent interest is the use of user-contributed resources, including textual travelogues (i.e., blogs or logs) ([10, 9, 8]) and photos taken during the trips ([2, 14]). Discovering and summarizing knowledge from this huge amount of user-contributed multimedia resources provides us with useful tourist information.

However, the previous works solely consider the travel logs and ignore the rich people attributes in the photo contents. Such rich people attributes can be automatically detected and provide another important aspect regarding travel demographics (e.g., gender, race, age). Figure 1 illustrates the real attribute-oriented travel movements in our dataset in Manhattan¹. Rather than the plain travel frequencies between certain locations, we can further investigate their demographic distributions over the people attributes, which are promising for personalized travel recommendation and planning. Meanwhile, we observed that such (detected) people attributes are correlated with traveling between locations (cf. Section 3.2).

Additionally, based on the mined people attributes and preferred travel patterns between locations, the approach can be deployed on mobile devices, where the user profile and context information (e.g., geo-locations) can be easily obtained from mobile sensors. This further enables "location-aware" services: recommending the next travel location from the user's current location, or even delivering context-related advertisements or services ([4, 13]).

In this work, we focus on a personalized recommendation framework that provides not only a context-aware recommendation system (i.e., mobile travel recommendation) but also a route planning application for use before the journey. The personalization is achieved by adopting specific user profiles with the automatically detected people attributes (e.g., gender, age and race) along with the trips. That is, the questions we want to answer are, for example, "For a male, what is the suggested travel sequence in Rome?" (route planning) and "I am a female, I am now at Central Park in Manhattan, what is the next suggested destination?" (mobile travel recommendation). Although users are reluctant to reveal their true identities or profiles in community-contributed photos, we can indirectly mine the demographic information about these trips from their associated photos.

In this sense, we crawl 4 million geo-tagged photos belonging to eight major worldwide cities, apply the people attribute detectors to the generated photo trips, and then provide a probabilistic recommendation model based on the user's profile and the travel log information. Experiments show that the people attributes are very informative and helpful for both mobile travel recommendation and route planning. To summarize, the contributions of this paper are:

¹ Note that the photos in the paper are courtesy of Flickr users under the Creative Commons License.

• To the best of our knowledge, this is the first research work that uses the additional context in the photo, i.e., people attributes, to support a personalized recommendation framework. We leverage these automatically detected people attributes in large-scale photo collections for social media mining and uncover the differences in travel behaviors across demographic groups.

• We propose a probability-based personalized travel recommendation model based on the user's attributes and the knowledge mined from travel logs. Such a scheme is promising for application in mobile environments. (Section 7)

• We conduct experiments on eight major cities in the world and show that using people attributes has the potential to improve personalized travel recommendation, especially at locations where people have diverse choices for the next stop. (Section 9)

The remainder of this paper is structured as follows. In the next section, we discuss the related work. Section 3 presents some attribute-related characteristics of photo trips, and Section 4 gives our system overview. Section 5 introduces the people attribute detection scheme. The proposed recommendation algorithms and applications are described in Sections 6, 7 and 8. The experimental results and discussions are reported in Section 9, followed by the conclusion in Section 10.

2. RELATED WORK

Trip mining and recommendation have been shown to be important in recent years. Generally, the data sources for learning to recommend can be roughly classified into three categories: GPS trajectory data, travelogues (i.e., blogs), and geo-tagged photos.

GPS trajectory data, which are obtained by GPS receivers, were mainly used at the early stage. Zheng et al. [18, 19, 20] utilize GPS trajectory data to extract interesting locations and classical travel sequences, and provide a personalized friend and location recommender using the similarity of users in terms of their location histories. The main obstacle for trajectory-based methods is that the data are not easy to obtain from a large number of people.

[10, 9, 8] provide location-based travel recommendation by analyzing blogs to obtain trip-related knowledge. [10] emphasizes the mining of city landmarks by a graph-based method. [9] proposes a probabilistic topic model which discovers topics from travelogues and then represents locations with appropriate topics for further destination recommendation and summarization. The goal of [8] is to automatically recognize and rank landmarks for travelers. They use geo-tag information, photo metadata and user knowledge in Yahoo! Travel Guide to identify and rank landmarks for any location specified by the travelers. Travelogue-based methods have difficulty determining the exact location of travelogues, which are usually unstructured and contain much noisy metadata. They only serve as destination recommendation, which merely shows information about a location.

Table 1: Relationship between people attributes and travel patterns, measured by entropy and mutual information, in 5 locations sampled from 5 major worldwide cities. Taking the nearest five locations (indexed by j) into consideration when starting from location i, the entropy for the uniform choice is $H(L_{i \to j}) = 2.3219$. We show how the entropy (the uncertainty in choosing the next location to visit) is reduced by introducing knowledge mined from community-contributed photos. Here F denotes the visiting frequencies between locations i and j calculated from the photo trips (cf. Figure 4(d)); A denotes the (detected) people attributes along with the photo trips. We can see that the people attributes (A) are promising for predicting travel locations $L_{i \to j}$. See more explanations in Section 3.2.

| City | Location i | $H(L_{i \to j})$ | $H(L_{i \to j} \mid F)$ | $H(L_{i \to j} \mid A, F)$ | $I(L_{i \to j}; A \mid F)$ |
|---|---|---|---|---|---|
| Manhattan | Madison Square | 2.3219 | 1.9911 | 1.4582 | 0.5329 |
| Rome | Castel Sant’Angelo | 2.3219 | 1.8909 | 1.0068 | 0.8841 |
| Berlin | Marx-Engels Forum | 2.3219 | 0.8281 | 0.1883 | 0.6397 |
| Barcelona | Casa de la Caritat | 2.3219 | 0.9751 | 0.3907 | 0.5844 |
| Taipei | Xinyi Square | 2.3219 | 1.9796 | 0.8566 | 1.1229 |

Recently, there has been an increasing tendency to adopt information from geo-tagged photos. Crandall et al. [7] systematically use large-scale photo collections to discover important landmarks. They evaluate on many cities and indicate that time-stamped and geo-tagged photos can be used to construct the typical pathways of people's movements. Arase et al. [2] define the problem of photo trip pattern mining and show an application with which users can search frequent trip patterns given some preferences (e.g., destination, visit duration, trip theme). Although they also provide a similar route planning framework, their recommendation operates at the inter-city level, while our work operates at the intra-city level for a one-day trip recommendation in a specific city. Most importantly, we can further recommend travel routes satisfying the designated user profiles (e.g., gender, age, race) by leveraging people attributes in the user-contributed photos; for example, the best one-day trip in Manhattan for Asian girls.

Another similar work is [14], which uses geo-tagged photos and textual blog information for landmark generation, path discovery, and route planning. They merge incomplete paths inside a destination to suggest typical travel paths and stay times to tourists. For route planning, they model the task as a graph analysis problem and use dynamic programming as the solver. We follow similar processes and extract travel patterns from the pictures, but we want to emphasize that the major difference between our work and the related research described above is that we bring in the concept of people attributes in travel photos and combine this demographic information with the movements of the photographer in a personalized travel recommendation framework. Besides, we have fundamentally different definitions of travel patterns. That is, the travel patterns in this study are measured from the face detection results and further augmented by the face attributes (e.g., gender, age, race) extracted from the photos. From this novel aspect, we can mine more human attributes for effective travel personalization. For example, we can obtain the travel preferences of Caucasian males or Asian kids. Additionally, in the previous research (e.g., [14, 2]), the travel patterns are defined over travel objects such as landmarks and events only. We will demonstrate in the next section that such automatically detected people attributes are informative for mobile travel recommendation and route planning.

3. OBSERVATIONS

3.1 Rich People Attributes in User-Contributed Photos

Using geo-tagged and time-stamped photos from social media as a resource for travel information mining has attracted a lot of researchers in recent years (e.g., [7, 2, 14, 5]). These large amounts of photo trajectories not only reveal the users' travel movements but are also promising for mining the demographic information about the locations by detecting people attributes in the photos. The authors of [1] have shown that users are willing to share photos for organization and (social) communication purposes, especially travel photos. Meanwhile, such photos can be treated as the social pixels captured by travelers' cameras among travel locations (cf. Figure 1) [16]. It is promising to leverage such freely available user-contributed photos and further detect important people attributes. However, the previous works focus only on the statistics of the travel logs, such as the popularity of locations, and neglect the important and rich dimension of people attributes.

Meanwhile, in [11] Kumar et al. exploited community-contributed photos for learning facial attributes (e.g., gender, age and race), which contributes to mining people attributes in large-scale media. Such rich attributes are shown to have reasonable accuracy (>80%) for further applications. Therefore, we propose an approach that enables personalized travel recommendation by mining the parameterizing factors directly and automatically from community-contributed photos, with particular emphasis on the (automatically) detected people attributes in the photos.

3.2 Correlation between Travel Patterns and People Attributes

Intuitively, we know that some landmarks are favored by females and some by males. The same holds for other attributes such as race and age. Here we use entropy and mutual information [6] to measure the correlation between predicting the next travel location and people attributes. The mutual information I(X;Y) between two random variables X and Y is defined as follows:

$$I(X;Y) = H(X) - H(X \mid Y), \qquad (1)$$

where the entropy H(X) measures the uncertainty of a random variable X. The entropy reduction obtained by introducing another random variable (i.e., Y) is the mutual information I(X;Y) between the two random variables. Intuitively, the reduction measures how much the random variable Y helps in predicting X.

Using entropy and mutual information, as illustrated in Table 1, we demonstrate the relationship between people attributes and travel patterns in 5 locations sampled from 5 major worldwide cities. Taking the nearest five locations (indexed by j) into consideration when starting from location i, the entropy for the uniform choice is $H(L_{i \to j}) = 2.3219$. We show how the entropy (the uncertainty in choosing the next location to visit) is reduced by introducing knowledge mined from community-contributed photos. Here F denotes the visiting frequencies between locations i and j calculated from the photo trips (cf. Figure 4(d)); A denotes the (detected) people attributes along with the photo trips; we use gender (i.e., male and female) only in this measurement. Thus, $H(L_{i \to j} \mid F)$² denotes the entropy of predicting the next location given the mined travel frequencies (F). It is apparent from $H(L_{i \to j} \mid F)$ that the mined travel frequency (F) helps in choosing (recommending) the next travel location from the current one. Further introducing the people attributes (A) makes the recommendation more accurate; that is, a salient entropy reduction is shown in $H(L_{i \to j} \mid A, F)$. Note that $I(L_{i \to j}; A \mid F) = H(L_{i \to j} \mid F) - H(L_{i \to j} \mid A, F)$, as depicted in the last column.

Taking Madison Square in Manhattan as an example (cf. the first row in Table 1), the mutual information $I(L_{i \to j}; A \mid F)$ is 0.5329 bits, a reduction of about 25% of the entropy when the people attributes (A) are given. Intuitively, if there are 4 random choices for the next destination, then after knowing the people attribute (e.g., male only), the number of choices drops to about 3. Therefore, we can see that the people attributes (A) are clearly correlated with predicting travel locations $L_{i \to j}$. In this paper, we will show how to leverage such automatically mined people attributes for personalized travel recommendation.
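The entropy quantities above can be illustrated with a minimal Python sketch. The helper names (`entropy`, `conditional_entropy`) and the toy observations are hypothetical and are not taken from our dataset; the sketch only shows how an entropy reduction of this kind could be estimated from counted (next location, attribute) pairs, and how an entropy in bits maps to an "effective number of choices" of 2^H.

```python
import math
from collections import Counter, defaultdict

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(pairs):
    """H(X | Y) in bits, estimated from a list of (x, y) observations."""
    total = len(pairs)
    by_y = defaultdict(Counter)
    for x, y in pairs:
        by_y[y][x] += 1
    h = 0.0
    for counts in by_y.values():
        n_y = sum(counts.values())
        h += (n_y / total) * entropy(c / n_y for c in counts.values())
    return h

# Uniform choice among five candidate next locations.
h_uniform = entropy([1 / 5] * 5)

# Hypothetical toy observations: (next location, gender) for trips leaving one location.
toy = ([("B", "male")] * 6 + [("C", "male")] * 2
       + [("B", "female")] * 1 + [("C", "female")] * 7)
next_counts = Counter(x for x, _ in toy)
h_next = entropy(c / len(toy) for c in next_counts.values())
h_next_given_gender = conditional_entropy(toy)
mutual_info = h_next - h_next_given_gender  # entropy reduction due to gender

# Effective number of equally likely choices for the Madison Square row of Table 1.
choices_given_f = 2 ** 1.9911    # close to 4
choices_given_af = 2 ** 1.4582   # between 2 and 3
```

Under these assumptions, `h_uniform` reproduces the uniform-choice entropy of five candidates, and the two `choices_*` values correspond to the "4 choices down to about 3" reading of the Madison Square row.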

4. SYSTEM OVERVIEW

Figure 2 shows the architecture of our system, which is divided into four main parts. First, in order to mine the travel information within each city, we crawl photos from on-line photo-sharing websites (i.e., Flickr). We then use a mean-shift based method on the geo-locations of these photos to generate the important locations in each city (Section 6.2) for the subsequent user trip mining process. For the trip attribute pattern mining, the faces in the photos are first detected. The attributes are then generated by applying people attribute detectors (Section 5).

Travel paths and locations mined from community-contributed photos also carry rich people attributes from the detected people. For instance, if many faces at a location are detected as male, this destination is probably favored by males. This phenomenon can also be found in travel paths. We can further identify the demographic information (via people attributes) within travel paths by analyzing the associated photos. By mining the travel patterns in users' one-day trips (Section 6.3), we propose two personalized travel recommendation applications, mobile travel recommendation (Section 7) and route planning (Section 8), which are realized by a probabilistic Bayesian model and dynamic programming techniques.

² The detailed derivations can be found in Section 7.

Figure 2: System Diagram: There are four main components. The first part contains the data crawling process and destination generation (Section 6.2). After that, people attribute (i.e., gender, age, race) detection is applied to each photo (Section 5). The next step is the trip generation process, in which the attribute information of each trip is inferred from the detected faces in the trip (Section 6.3). Combining the people attributes and the travel logs, we can build a personalized tourist recommender, such as mobile travel recommendation (Section 7) and route planning (Section 8).

Concept Definition

We first define some concepts used in the following sections:

Photo collections: All the geo-tagged photos in the database (i.e., all the points in Figure 4(a)).

Destination: Destinations are the popular or important locations in a city, e.g., Brooklyn Bridge in Manhattan, the Colosseum in Rome, and Brandenburg Gate in Berlin (i.e., the red points in Figure 4(c)).

Trajectory: A collection of time-series geo-tagged photos for a specific photographer.

Path & Route: A path refers to a sequence of time-series geo-tagged photos traversed by a user in one specific day (i.e., Figure 4(b)). After quantizing the geo-tagged photos to destinations, the result is defined as a route (i.e., Figure 4(c)).

Trip: An individual user's trip is the route with attribute information (i.e., Figure 4(d)).

Trip Segment: A trip segment is a tuple (start location, end location, attributes) and is a part of the trip. We take the trip segments as the major knowledge source (or observed data) for mobile travel recommendation and route planning.

5. PEOPLE ATTRIBUTE DETECTION

In this work, we utilize nine people attributes, including two gender attributes (female, male), four age attributes (kid, teen, middle-aged, elder) and three race attributes (Caucasian, African, Asian), to profile the travel preferences of users. Currently, we focus only on facial attributes since they carry rich information about people and can be learned through an adaptive framework [11]. Considering the scalability needed to deal with an increasing number of facial attributes, [11] proposes a generic framework for learning varying facial attributes adaptively. To begin with, we crawl user-contributed photos from Flickr and extract the facial region with a face detector. Further, the face images are decomposed into face components for learning and constructing a mid-level feature bank (by SVM) to describe varying facial attributes. Finally, Adaboost is used to select the most important mid-level feature set for the designated facial attribute.

[Figure 3 diagram: (a) crawling photos from Flickr (user-contributed photos) and face detection; (b) manual annotation and learning mid-level features from HoG, Color, Gabor, and LBP descriptors; (c) ensemble learning over mid-level features to output attributes. Attribute types: Gender (Female, Male); Age (Kid, Teen, Middle-ager, Elder); Race (White, African, Asian).]

Figure 3: The framework of facial attribute detection: (a) crawling the user-contributed photos from social media and extracting the facial region by face detection to prepare the training images; (b) manually annotating the facial images for learning mid-level features by SVM; (c) utilizing Adaboost to learn the best combination of mid-level features for describing different facial attributes respectively.

5.1 Preprocessing

To construct a training dataset with rich and diverse images close to real life, we crawl training images from social media (Flickr), which contains abundant user-contributed photos. The facial region is extracted from the whole image by face detection (cf. Figure 3(a)), and facial feature detection [17] is then applied to locate important parts of a face such as the eyes, philtrum and mouth. Features (e.g., edge, color, texture) extracted from different facial components have different discriminating abilities across facial attributes. Therefore, we construct a mid-level feature bank based on those facial components to provide better generalization capability for dealing with various facial attributes.

5.2 Mid-level Features

The preprocessed facial images are annotated manually as the training data for learning the mid-level features. A mid-level feature (Figure 3(b)) is an SVM classifier [3] trained on one of several low-level features (e.g., Gabor filter, HoG, Color, Local Binary Patterns) extracted from one of several face components (e.g., whole face, eyes, philtrum, mouth). Those combinations constitute a feature bank that provides the mid-level features possibly required by facial attributes. Compared with low-level features, mid-level features carry better semantic meaning for describing facial attributes through multiple modalities. Note that, unlike image object classification, randomly selecting images from other categories (attributes) as negative training data is impractical since attributes are often partially correlated, such as female and Asian. Hence we simply group the attributes into several attribute types; for example, the attribute type "age" includes kid, teen, middle-ager and elder. Since the attributes within the same attribute type are mutually exclusive, they provide negative training images with better discriminating capability. For example, we acquire the negative training images for the kid attribute from the rest of the attributes belonging to the same attribute type (e.g., teen, middle-ager and elder).

5.3 Ensemble Learning for Different Attributes

For a facial attribute, the best combination of mid-level features is learned by Adaboost (Figure 3(c)). Here, we treat each mid-level feature as a weak classifier. The combined strong classifier represents the most important parts for that attribute; for example, <whole face, Gabor> is most effective for the female attribute while <whole face, color> is most effective for the African attribute. The important mid-level feature set for the designated facial attribute is adaptively selected and weighted through the boosting scheme. In experiments on the benchmark data [11], the approach can effectively detect facial attributes and achieves more than 80% accuracy on average. Meanwhile, the framework is generic across facial attributes and thus provides better scalability for profiling users with more attributes.
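The boosting step can be sketched as follows. This is a minimal, illustrative implementation of discrete AdaBoost over a fixed bank of weak classifiers, not the detector of [11]: the function names and the toy predictions are hypothetical, and each weak classifier stands in for one mid-level feature (an SVM on a <component, low-level feature> pair) whose +1/-1 outputs are assumed to be precomputed.

```python
import math

def adaboost(weak_outputs, labels, rounds):
    """Discrete AdaBoost over a fixed bank of weak classifiers.

    weak_outputs[k][i] is the +1/-1 prediction of weak classifier k on sample i;
    labels[i] is the +1/-1 ground truth. Returns a list of (index, weight) pairs.
    """
    n = len(labels)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        errors = [sum(wi for wi, o, y in zip(w, outs, labels) if o != y)
                  for outs in weak_outputs]
        k = min(range(len(errors)), key=errors.__getitem__)
        err = min(max(errors[k], 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # no weak classifier is better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((k, alpha))
        # Re-weight samples so that misclassified ones gain weight.
        w = [wi * math.exp(-alpha * y * o)
             for wi, o, y in zip(w, weak_outputs[k], labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, weak_outputs, i):
    """Sign of the weighted vote of the selected weak classifiers on sample i."""
    score = sum(alpha * weak_outputs[k][i] for k, alpha in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy bank: three mid-level features evaluated on six faces.
labels = [1, 1, 1, -1, -1, -1]           # +1 = attribute present
weak_outputs = [
    [1, 1, -1, -1, -1, -1],              # e.g. <whole face, Gabor>
    [1, -1, 1, -1, -1, 1],               # e.g. <eyes, HoG>
    [-1, 1, 1, 1, -1, -1],               # e.g. <mouth, LBP>
]
ensemble = adaboost(weak_outputs, labels, rounds=3)
predictions = [predict(ensemble, weak_outputs, i) for i in range(len(labels))]
```

In this toy case no single weak classifier is error-free, yet the weighted vote of the three selected classifiers labels every sample correctly, which is the behavior the boosting scheme relies on.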

6. PHOTO TRIP GENERATION

Trip generation is a challenging problem because of the various trip types (e.g., itinerary durations) and the different travel behaviors across demographic groups. We adopt common frameworks, e.g., [14, 2], to mine the trips from the (noisy) user-contributed geo-tagged photos and further consider the people attributes in the photos. We conduct three main procedures, path extraction, destination discovery, and trip attribute pattern mining, to extract the observations (trip segments between destinations) that are later incorporated into the recommendation model.

We first use the time information in the photos to segment the trajectory of each user. These segments are limited to one day and can be seen as one-day paths in the given city. The next step is destination discovery, because we want to map these uninformative paths to a higher level, i.e., routes based on the detected destinations. The next challenge is to add the attribute knowledge to the routes, resulting in informative trips. These trips can be separated into trip segments, which then become the inputs to our proposed travel recommendation.

We emphasize that people may visit places as a couple or group and usually not alone. However, at this early stage of proving the novel concept of leveraging people attributes in photos for personalized recommendation, we focus on individual travelers first. We believe the approach can be extended to group travelers with further investigation; for example, by conducting face clustering on the original travel patterns.

6.1 Path Extraction

We sort the photos of each user by capture time. The one-day paths are then detected by separating the sorted photos according to their date information. We also apply a soft assignment to photos at the boundary between two days: each boundary photo belongs to two paths, the path of its original date and the path of the previous date, in order to handle the case in which a user's trip runs past midnight. The resulting paths are shown in Figure 4 (b).
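A minimal sketch of this path extraction is given below. The text does not specify which photos count as boundary photos, so the cut-off `BOUNDARY_HOUR` (photos taken before 3 a.m.) is purely an assumption for illustration, as are the function name and the `(user, timestamp, lat, lng)` record layout.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a photo taken before this hour also joins the previous day's path.
BOUNDARY_HOUR = 3

def extract_paths(photos):
    """Group (user, timestamp, lat, lng) photos into one-day paths.

    Returns {(user, date): [photo, ...]} with each path sorted by capture time.
    """
    paths = defaultdict(list)
    for photo in sorted(photos, key=lambda p: (p[0], p[1])):
        user, taken = photo[0], photo[1]
        paths[(user, taken.date())].append(photo)
        if taken.hour < BOUNDARY_HOUR:
            # Soft assignment: the boundary photo also extends the previous day's path.
            paths[(user, taken.date() - timedelta(days=1))].append(photo)
    return dict(paths)

# Hypothetical photos of one user whose evening out runs past midnight.
photos = [
    ("u1", datetime(2011, 5, 1, 10, 0), 40.7829, -73.9654),
    ("u1", datetime(2011, 5, 1, 23, 30), 40.7580, -73.9855),
    ("u1", datetime(2011, 5, 2, 0, 40), 40.7587, -73.9787),
    ("u1", datetime(2011, 5, 2, 14, 0), 40.7061, -73.9969),
]
paths = extract_paths(photos)
```

With this toy input, the photo taken at 00:40 on May 2 appears both at the end of the May 1 path and at the start of the May 2 path.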

6.2 Destination Discovering and Route Mapping

Because the generated paths are too noisy and complicated to mine directly, the next task is to refine these paths to a more informative level, routes, by quantizing the photos into destinations. We first apply the mean-shift clustering procedure to the geo-tagged photos of each city, where each photo is represented by its latitude and longitude coordinates, to discover the locations with high photo densities. Empirically, for the mean-shift method, we use a flat kernel function and set the bandwidth parameter to 0.001, which is roughly the radius of a landmark, as mentioned in [12, 5, 7]. We use clusters containing at least 0.1% of the photos in the city and no fewer than 100 photos as destinations. The red points in Figure 4 (c) illustrate the destinations detected in Manhattan via this procedure.

[Figure 4 panels: (a) Photo Collection, (b) Path, (c) Route, (d) Observation. Example photo annotations: {Female, Elder, Caucasian}, {lat1, lng1}; {Female, Elder, Caucasian}, {lat2, lng2}.]

Figure 4: The photo trip generation process illustrated on the Manhattan collection. (a) The collection of geo-tagged photos; each dot is the place where a photo was taken. After separating each user's photos by date, the resulting travel path, which connects the photo sequence of the same user, is shown in (b). However, these noisy path trajectories are not informative. In (c), we first cluster the geo-tagged photos into destinations, which represent locations of interest in the city and are marked as red points; we then quantize the paths into these discovered destinations as routes. The two photos are examples of geo-tagged photos along the trips, which contain the people attributes (i.e., gender, age, race) and the geo-location information (i.e., Lat for latitude, Lng for longitude) and can be further utilized to mine the relationship between attributes and travel behaviors. The trips with people attributes are shown in (d), which finally become the learning data for our recommendation model. (Best seen in color)

The path sequences of all users can now be translated into destination sequences, i.e., routes, by assigning each photo to the nearest destination if the distance between them is under a given threshold.
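The destination discovery and route mapping steps can be sketched in plain Python as below. This is only an illustrative, quadratic-time flat-kernel mean shift, not the implementation used in our system; it assumes Euclidean distance on raw latitude/longitude values, merges converged modes that lie within one bandwidth of each other, and collapses consecutive photos at the same destination into one route entry. All function names, the distance threshold `max_dist`, and the toy coordinates are hypothetical.

```python
import math

def mean_shift(points, bandwidth, iters=50, tol=1e-9):
    """Flat-kernel mean shift: move each mode to the mean of the points within bandwidth."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth] or [m]
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        moved = max(math.dist(a, b) for a, b in zip(modes, shifted))
        modes = shifted
        if moved < tol:
            break
    return modes

def discover_destinations(points, bandwidth=0.001, min_photos=100, min_fraction=0.001):
    """Keep cluster centres supported by enough photos (absolute and relative thresholds)."""
    centers, counts = [], []
    for m in mean_shift(points, bandwidth):
        for k, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                counts[k] += 1
                break
        else:
            centers.append(m)
            counts.append(1)
    needed = max(min_photos, min_fraction * len(points))
    return [c for c, n in zip(centers, counts) if n >= needed]

def to_route(path, destinations, max_dist):
    """Quantize a path of (lat, lng) points into a sequence of destination indices."""
    route = []
    for p in path:
        if not destinations:
            break
        k = min(range(len(destinations)), key=lambda d: math.dist(p, destinations[d]))
        # Assumption: photos too far from any destination are dropped,
        # and consecutive photos at the same destination are collapsed.
        if math.dist(p, destinations[k]) <= max_dist and (not route or route[-1] != k):
            route.append(k)
    return route

# Hypothetical toy data: two dense spots and one isolated photo.
spot_a = [(40.7580 + 0.0001 * i, -73.9855) for i in range(5)]
spot_b = [(40.7484, -73.9857 + 0.0001 * i) for i in range(4)]
outlier = (40.7000, -74.0000)
points = spot_a + spot_b + [outlier]
destinations = discover_destinations(points, min_photos=3)
route = to_route([spot_a[0], spot_a[1], spot_b[0], outlier], destinations, max_dist=0.001)
```

In the toy call, `min_photos` is lowered to 3 only so that the tiny example yields destinations; the defaults mirror the thresholds stated in the text.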

6.3 Trip Attribute Pattern Mining

A photographer can be seen as a human-sensor [16]; hence each detected face in the photos reveals not only the trip but also an observed set of people attributes. Besides, a detected face can be regarded as an individual sharing the same movements as the photographer. That is, we propagate the photographer's movements to all the faces in the photos taken. For instance, suppose a route sequence (A → B → C) is generated by a user, where A, B, and C are the destinations traversed sequentially. All the detected faces in this route share the same route sequence, which results in as many trips as there are faces. Although these trips have the same movement sequence, they might have different attributes depending on the detection results for the faces. Besides, all the trips generated from this route sequence contain trip segments (e.g., (A → B) and (B → C) in this example). These trip segments are treated as the atomic observation data for learning and recommendation later.

Formally, we define the trip $T_u$ containing $n$ destinations for a specific person (i.e., detected face) $u \in U$, where $U$ is the set of users, as $T_u = (L_u^1 \to L_u^2 \to \dots \to L_u^n, A_u)$. Each $L_u^i \in L$ ($L$ is the set of destinations) is a destination traversed in sequence by user $u$, and $A_u$ is a tuple containing the detected attributes such as gender and age. For example, in Figure 4(c), the two photos have the same (detected) attributes (female, elder, Caucasian), where the dimensions represent gender, age, and race, respectively.
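The propagation from a photographer's route to per-face trips and trip segments can be sketched as follows. The function names and the representation of a trip as a `(route, attributes)` pair are assumptions chosen for illustration.

```python
def trips_from_route(route, face_attributes):
    """One trip per detected face: the shared route plus that face's attribute tuple."""
    return [(tuple(route), attrs) for attrs in face_attributes]


def trip_segments(trips):
    """Split every trip into atomic (L_i, L_j, attributes) observations."""
    observations = []
    for route, attrs in trips:
        for li, lj in zip(route, route[1:]):
            observations.append((li, lj, attrs))
    return observations
```

For the route (A → B → C) with two detected faces, this yields two trips and four observations, two for each of the segments (A → B) and (B → C).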

7. TOURIST RECOMMENDATION MODEL

For personalized recommendation, we tackle the following scenario: “I am a male, and I am at Central Park now. What is the next suggested destination for me?” The inputs of this framework are the target user $u$'s profile $A_u$, composed of known people attributes, and $u$'s present location $L_i$; the output is the recommended next destination $L_j$.

To recommend the optimal next destination $L^*$, this problem can be formulated as follows:

$$L^* = \arg\max_{L_j} P(L_j \mid A_u, L_i, \Theta), \quad \Theta = \{F, A\}, \tag{2}$$

where $\Theta$ is our learned tourist model, composed of the travel frequencies between the destinations ($F$) and the attributes detected in photos ($A$). Intuitively, we recommend the destination that most tourists visit and that matches the target user's profile $A_u$. That is why we need to mine both $F$ and $A$, the existing travel preferences among the destinations, from the large-scale community-contributed photos.

So far, we have the trips with attribute patterns $T_{u \in U} = (L_u^1 \to L_u^2 \to \dots \to L_u^n, A_u)$. Each trip $T_{u \in U}$ can be regarded as a sequence of trip segments, as described above, and these segments form the observation dataset of our recommendation model.

This section describes how we construct the learned tourist knowledge, which takes two factors into account: (1) the sequence correlation between destinations ($P(L_{i \to j})$, also denoted $P(L_{ij})$), and (2) the people attribute patterns on the movements ($P(A \mid L_{ij})$).

7.1 Bayesian Learning Model

We adopt a Bayesian learning model as our recommendation model because Bayesian models have been shown to be effective in recommendation systems and, most importantly, can be applied to real-time mobile recommendation and immediate route planning services. This is our first attempt, and the model can be replaced by any other recommendation model. By Bayes' theorem, the probability that the location $L_j$ is the suggested destination, given a start location $L_i$ and the attribute value $A_u$ of a specific user $u$, is:

$$P(L_{i \to j} \mid A_u) = \frac{P(L_{i \to j}, A_u)}{P(A_u)} = \frac{P(L_{i \to j})\,P(A_u \mid L_{i \to j})}{P(A_u)}, \tag{3}$$

where $P(L_{i \to j})$ is the probability of moving from location $L_i$ to location $L_j$. Intuitively, this is equivalent to choosing $L_i$ as the start location (with probability $P(L_i)$) and then transiting to $L_j$ (with probability $P(L_j \mid L_i)$), so it can be jointly formulated as $P(L_i)\,P(L_j \mid L_i)$.

Therefore, Equation 3 becomes

$$P(L_{i \to j} \mid A_u) = \frac{P(L_i)\,P(L_j \mid L_i)\,P(A_u \mid L_{i \to j})}{P(A_u)}. \tag{4}$$

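In other words, based on the target user's profile and current location, and given the observation data $\Theta = \{F, A\}$, the optimal prediction of the next location is the destination $L^*$ that maximizes Equation 4. Because the current location $L_i$ and the profile $A_u$ are fixed by the query, $P(L_i)$ and $P(A_u)$ do not depend on the candidate $L_j$ and can be dropped from the maximization:

$$L^* = \arg\max_{L_j} \{P(L_j \mid L_i)\,P(A_u \mid L_{i \to j})\}. \tag{5}$$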

We assume independence between the different types of attributes (e.g., gender and race are independent), so that the joint probability $P(A_u \mid L_{i \to j})$ can be expressed as the product of the marginals:

$$L^* = \arg\max_{L_j} \Big\{ P(L_j \mid L_i) \prod_{a_u \in A_u} P(a_u \mid L_{i \to j}) \Big\}. \tag{6}$$

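Here $a_u$ is one element of the tuple $A_u$ (e.g., gender). $P(a_u \mid L_{i \to j})$ and $P(L_j \mid L_i)$ in the product above are easily estimated from the frequencies in the training corpus (i.e., the observation data):

$$P(a_u \mid L_{i \to j}) = \frac{\mathrm{count}(L_{i \to j} \wedge A_u = a_u)}{\mathrm{count}(L_{i \to j})}, \tag{7}$$

$$P(L_j \mid L_i) = \frac{\mathrm{count}(L_{i \to j})}{\sum_{j \in L} \mathrm{count}(L_{i \to j})}. \tag{8}$$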

Here $\mathrm{count}(L_{i \to j})$ is the total number of observations that start at $L_i$ and end at $L_j$. Similarly, $\mathrm{count}(L_{i \to j} \wedge A_u = a_u)$ is the total number of observations that start at $L_i$ and end at $L_j$ with the attribute $a_u$.
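The two frequency estimates can be computed in a single pass over the observations, as in the sketch below. The function names and the choice to represent the attributes of an observation as a dictionary such as `{"gender": "male"}` are assumptions made for illustration.

```python
from collections import Counter


def fit_counts(observations):
    """observations: iterable of (L_i, L_j, attrs), attrs being a dict of attribute values."""
    seg = Counter()       # count(L_i -> L_j)
    seg_attr = Counter()  # count(L_i -> L_j and attribute == value)
    out = Counter()       # sum over j of count(L_i -> L_j)
    for li, lj, attrs in observations:
        seg[(li, lj)] += 1
        out[li] += 1
        for name, value in attrs.items():
            seg_attr[(li, lj, name, value)] += 1
    return seg, seg_attr, out


def p_transition(seg, out, li, lj):
    """Equation 8; returns 0.0 when L_i was never observed as a start location."""
    return seg[(li, lj)] / out[li] if out[li] else 0.0


def p_attribute(seg, seg_attr, li, lj, name, value):
    """Equation 7; returns 0.0 when the segment L_i -> L_j was never observed."""
    n = seg[(li, lj)]
    return seg_attr[(li, lj, name, value)] / n if n else 0.0
```

Unobserved events receive probability zero here, which is exactly the situation the background smoothing of Section 7.2 addresses.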

7.2 Background Smoothing

We cannot ignore the possibility that the user is at, or wants to go to, a location that lies beyond our observed travel logs. For example, if a male is now at $L_i$ and no male in the historical data travels from $L_i$ to any other location, we cannot predict the next location for this user because every candidate has zero probability. Therefore, to deal with unobserved events, we add a smoothing factor to the probabilistic model. Intuitively, we take destination popularity into consideration for the location correlation. This information is obtained from the destination generation process for each location $L_j$. The background probability is defined by

$$P_{C.L}(L_j) = \frac{\text{number of photos in the destination } L_j}{\text{number of photos in the city } C}. \tag{9}$$

Figure 5: Mobile recommendation performance (P@1) of different methods on eight worldwide cities (Manhattan, Rome, Barcelona, Prague, Hong Kong, Berlin, Taipei, Venice). The plotted methods are Baseline R(1), Baseline R(5), Attribute (A), Frequency (F), and Combined (A+F). The knowledge-based methods, i.e., frequency (F), attribute (A), and combined (F+A), perform better than naively choosing the nearest location (R(1)) or one of the five nearest (R(5)). Best seen in color.

For the attribute probability, we use the background attribute information of city $C$ to smooth the attribute effect:

$$P_{C.A}(a_k) = \frac{\text{number of people with the attribute } a_k}{\text{number of people in this city } C}. \tag{10}$$

Note that the background attribute information is based on all detected faces, which may include faces from trips that stay in a single location (and thus generate no trip segment) and faces in photos that are not quantized into any destination.
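Both background distributions are plain relative frequencies. The sketch below assumes, for illustration only, that each photo is represented by its destination id (or `None` when it was not quantized into any destination) and that each detected face is represented by one attribute value; the function names are hypothetical.

```python
from collections import Counter


def background_location(photo_destinations):
    """Equation 9: share of the city's photos that fall in each destination."""
    total = len(photo_destinations)
    counts = Counter(d for d in photo_destinations if d is not None)
    return {d: n / total for d, n in counts.items()}


def background_attribute(face_values):
    """Equation 10: share of all detected faces in the city with each attribute value."""
    total = len(face_values)
    return {value: n / total for value, n in Counter(face_values).items()}
```

Because the denominator of Equation 9 counts every photo in the city, the location background probabilities need not sum to one.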

We adopt a linear interpolation of the maximum likelihood model with the collection background information, using the coefficients $\alpha$ and $\beta$ to control the influence of the background model.

Substituting Equations 9 and 10 into Equation 6, we obtain the following final estimate for the joint probability:

$$L^* = \arg\max_{L_j} \Big\{ \big((1-\alpha)\,P(L_j \mid L_i) + \alpha\,P_{C.L}(L_j)\big) \prod_{a_k \in A_u} \big((1-\beta)\,P(a_k \mid L_{ij}) + \beta\,P_{C.A}(a_k)\big) \Big\}. \tag{11}$$

To avoid floating-point underflow when computing products of probabilities, all of the computations are in log-space.
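A log-space evaluation of Equation 11 might look like the following sketch. The dictionary layouts, the function name `recommend`, the default values of `alpha` and `beta`, and the decision to skip the current location as a candidate are all assumptions made for illustration.

```python
import math


def _log(p):
    """log(p), with log(0) mapped to -inf instead of raising an error."""
    return math.log(p) if p > 0.0 else -math.inf


def recommend(li, profile, candidates, p_trans, p_attr, bg_loc, bg_attr,
              alpha=0.3, beta=0.3):
    """Return (best L_j, log score) under the smoothed objective of Equation 11.

    p_trans[(L_i, L_j)] and p_attr[(L_i, L_j, name, value)] hold the frequency
    estimates; bg_loc[L_j] and bg_attr[(name, value)] hold the background models.
    """
    best, best_score = None, -math.inf
    for lj in candidates:
        if lj == li:
            continue  # assumption: never recommend the current location
        score = _log((1 - alpha) * p_trans.get((li, lj), 0.0)
                     + alpha * bg_loc.get(lj, 0.0))
        for name, value in profile.items():
            score += _log((1 - beta) * p_attr.get((li, lj, name, value), 0.0)
                          + beta * bg_attr.get((name, value), 0.0))
        if score > best_score:
            best, best_score = lj, score
    return best, best_score
```

Calling the function with an empty `profile` uses only the first factor of the objective, and setting `alpha = beta = 0` recovers the unsmoothed estimate of Equation 6.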

Because the term $((1-\alpha)\,P(L_j \mid L_i) + \alpha\,P_{C.L}(L_j))$ considers the “frequency” of movements between locations, recommendation using this probability is regarded as the frequency-based method (F). Similarly, recommendation using the term $\prod_{a_k \in A_u} ((1-\beta)\,P(a_k \mid L_{ij}) + \beta\,P_{C.A}(a_k))$ is the attribute-based method (A). The combination of F and A (i.e., Equation 11) is denoted as “F+A”.

8. ROUTE PLANNING

The route planning problem can be stated as follows: starting from location $L_s$ and ending at location $L_e$, we want to go through $N$ interesting destinations, given a designated profile (or attributes) $A_u$ (e.g., male).

Table 2: The 4 million geo-referenced photos collected from eight major cities in the world.

| City | Images | Trips with face | Face in trips | Trip Segments |
|---|---|---|---|---|
| Manhattan | 718,194 | 1,621 | 5,871 | 26,734 |
| Rome | 494,483 | 523 | 1,684 | 12,289 |
| Barcelona | 538,027 | 467 | 1,523 | 6,108 |
| Prague | 289,304 | 304 | 857 | 6,326 |
| Hong Kong | 446,813 | 379 | 1,393 | 4,316 |
| Berlin | 530,298 | 294 | 778 | 3,639 |
| Taipei | 579,033 | 585 | 3,242 | 3,658 |
| Venice | 334,513 | 199 | 606 | 3,140 |
| Total | 3,930,665 | 4,372 | 15,954 | 66,210 |

We take a graph-based method to solve the problem. A graph is generally composed of nodes, edges, and the weights on the edges. Nodes: each destination in the city is a node in the graph. Edges: an edge connects two nodes. From our learning model, we can calculate the score of an edge from the paths connecting the two destinations, parameterized by the user profile $A_u$. Because we want to go through popular landmarks rather than obscure ones, $P(L_i)$ should be taken into consideration. The other factor is the distance between candidate destinations; users do not prefer a circuitous journey, so we need a penalty function to reduce the overall route distance. Besides, we do not want to visit a destination twice, so we add the constraint that if a destination has been visited, the weight of the edge leading to it (from the current destination) is 0. Thus the weights of the edges can be formulated as:

$$s_{ij} = \begin{cases} 0, & \text{if } j \text{ was visited} \\ P(L_i)\,P(L_{ij} \mid A_u) + \dfrac{\omega}{1 + \mathrm{dist}(L_i, L_j)}, & \text{otherwise} \end{cases} \tag{12}$$

where $\mathrm{dist}(L_i, L_j)$ is the geo-distance between destinations $L_i$ and $L_j$. The former term can be derived from Section 7. The latter term, $\frac{\omega}{1 + \mathrm{dist}(L_i, L_j)}$, is a penalty function of distance, and $\omega$ is the weight for this penalty function.

Because of the smoothing technique mentioned in Section 7.2, the graph is fully connected and directed. The route planning problem is to recommend the best travel route from $L_s$ to $L_e$ via $N$ other destinations. We define the route $R$ as:

$$R = \{r_0, r_1, \dots, r_N, r_{N+1}\}, \tag{13}$$

where $r_i$ is the $i$-th destination of the route $R$. By definition, $r_0 = L_s$ and $r_{N+1} = L_e$. In addition, $s_{ij}$ is the score from destination $i$ to destination $j$. We look for a route connecting the starting and ending destinations that maximizes the total score over the traversed edges. Thus, the route we want to find is:

$$R^* = \arg\max_R \Big\{ \sum_{k=0}^{N} s_{r_k r_{k+1}} \Big\}. \tag{14}$$

Therefore, the route problem can be treated like a shortest-path problem: in our constructed graph, the path with the highest score is the route we recommend. As mentioned in Section 7.2, we can obtain $L^*$, the highest-scoring next destination from the current one given $A_u$ (the designated user profile). Hence we can solve this problem by dynamic programming to derive the best route.
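One possible realization of this search is a memoized recursion over the state (current destination, set of visited destinations, number of remaining stops), sketched below. This is an illustrative formulation and not necessarily the dynamic program used in our system: the function names and the default `omega` are hypothetical, and the zero weight for visited destinations in Equation 12 is implemented as a hard exclusion.

```python
import math
from functools import lru_cache


def edge_score(p_li, p_move, dist, omega=0.1):
    """Equation 12 for an unvisited target: P(L_i) * P(L_ij | A_u) + omega / (1 + dist)."""
    return p_li * p_move + omega / (1.0 + dist)


def plan_route(nodes, start, end, n_stops, score):
    """Best route from start to end through exactly n_stops distinct other nodes.

    `score(i, j)` returns the edge score s_ij for an unvisited destination j.
    Returns (total score, route as a tuple of nodes).
    """
    others = tuple(n for n in nodes if n != start and n != end)

    @lru_cache(maxsize=None)
    def best(cur, visited, remaining):
        if remaining == 0:
            return score(cur, end), (end,)
        top_total, top_tail = -math.inf, ()
        for nxt in others:
            if nxt in visited:
                continue  # visited destinations are excluded outright
            total, tail = best(nxt, visited | {nxt}, remaining - 1)
            total += score(cur, nxt)
            if total > top_total:
                top_total, top_tail = total, (nxt,) + tail
        return top_total, top_tail

    total, tail = best(start, frozenset(), n_stops)
    return total, (start,) + tail
```

The number of states grows exponentially with the number of destinations in the worst case, so this sketch is only practical for the small number of stops considered in the experiments.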

9. EXPERIMENTS

Our dataset was collected by downloading images and photo metadata using the Flickr API for major cities such as Manhattan, Rome, and Barcelona. Each city's boundary is determined using Wikimapia (www.wikimapia.org), an online collaborative map service containing user-contributed geographic coordinates. We first issue a roughly rectangular boundary query and then use the precise boundary information to remove the photos outside the city. Table 2 summarizes the dataset statistics. These real travel logs are practical for evaluating the performance (by accuracy) of our recommendation algorithm. Although there are about 4 million images in total, we only experiment on the trips that contain detected attributes, i.e., trips with detected faces. In total, there are about 16K faces in the 4,372 trips, as shown in the third and fourth columns of Table 2. The observations $(L_i, L_j, A)$ are generated by separating each trip into segments and duplicating these segments according to the number of faces detected in the trip. That is, faces belonging to the same trip share the same observations but have different attribute information (Section 6). Without loss of generality, we discard the trip segments that are not traversed by at least 2 different people.

Regarding the processing time of our framework, the back-end processes, including people attribute detection, trip generation, and learning the prediction model, are all off-line procedures that can be split into separate data partitions and tasks and processed in a distributed computation framework (e.g., Hadoop/MapReduce). The on-line recommendation scenarios are efficient thanks to the proposed Bayesian probability model. For the on-line time cost, we measured on four randomly selected cities using non-optimized MATLAB code. On average, personalized mobile recommendation takes 0.0032 seconds. For route planning, we issued 5 queries per city, with randomly selected start and end locations and a fixed 3 destinations in between; the average computation time is 3.5076 seconds.

9.1 Mobile Travel Recommendation

The first experiment is designed to verify the performance of our probability model on travel recommendation. We adopt 5-fold cross-validation. That is, the observations (i.e., the combinations $(L_i, L_j, A)$) are first split into 5 folds; 5 iterations of training and testing are then performed, each training on 4 folds and testing on the remaining one. Instances in the testing fold are excluded if their initial locations are not seen in the training set, because they cannot benefit from the recommendation model. For the background smoothing factors $\alpha$ and $\beta$, we conduct a sensitivity test that varies the parameters between 0 and 1 in increments of 0.1. Performance is measured by the average cross-validation accuracy, defined as precision at one (P@1) for the recommended destination. We set the parameters to the values that performed best in this experiment.
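The evaluation protocol can be sketched as follows. The callables `fit`, `predict`, and `evaluate` are placeholders for a concrete model, and the shuffling seed and the round-robin fold assignment are assumptions; only the 5 folds, the exclusion of unseen start locations, the P@1 measure, and the 0.1 grid come from the description above.

```python
import random


def cross_validated_p_at_1(observations, fit, predict, k=5, seed=0):
    """Mean P@1 over k folds.

    fit(train) returns a model; predict(model, L_i, attrs) returns one destination.
    """
    items = list(observations)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        model = fit(train)
        seen_starts = {li for li, _, _ in train}
        # Test instances whose start location never occurs in training are excluded.
        usable = [(li, lj, a) for li, lj, a in test if li in seen_starts]
        if usable:
            hits = sum(predict(model, li, a) == lj for li, lj, a in usable)
            accuracies.append(hits / len(usable))
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


def best_smoothing(evaluate):
    """Grid search over alpha, beta in {0.0, 0.1, ..., 1.0}; evaluate(alpha, beta) -> P@1."""
    grid = [i / 10 for i in range(11)]
    return max(((a, b) for a in grid for b in grid), key=lambda ab: evaluate(*ab))
```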

In our experiment, we consider three kinds of people attributes, i.e., gender, age, and race, where gender ∈ {male, female}, age ∈ {elder, middle-aged, teenager, kid}, and race ∈ {Caucasian, African, Asian}. For each kind of attribute, we apply the detectors to an input face and assign the face to the class with the highest score. Take gender as an example: if the score from the male detector is higher than that from the female detector, the face is classified as male.
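This assignment rule amounts to an argmax over the detector scores of each attribute kind. In the sketch below, the nested dictionary of scores and the function name `classify_face` are assumptions made for illustration; the detectors themselves are not shown.

```python
def classify_face(scores):
    """Pick, for each attribute kind, the class whose detector score is highest.

    scores: e.g. {"gender": {"male": 0.7, "female": 0.2}, "age": {...}, "race": {...}}
    """
    return {kind: max(per_class, key=per_class.get)
            for kind, per_class in scores.items()}
```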

To measure the frequency of attributes in Equation 7, we only need to count how many people in the same movement sequence have the same attribute value $a_u$ (e.g., $a_u$ = male).

9.1.1 Impact of Background Smoothing

Table 3: Improvements in P@1 for mobile travel recommendation across eight worldwide cities by exploiting the mined travel patterns and people attributes. R(5) is the baseline that recommends a destination chosen randomly among the 5 nearest destinations. F represents the method that considers only the visiting frequency between locations. F+A combines the people attributes mined from photos with the F-based method. Both kinds of mined knowledge, travel frequencies and people attributes, are helpful for mobile travel recommendation. We further categorize the city locations into classes determined by the prediction entropy $H(L_{ij}) = H(P(L_{i \to j}))$, measured on the collection and predicted from the travel frequencies only. A higher entropy $H(L_{ij})$ means more diverse and harder choices when recommending the next travel destination. As observed, the relative gain of F+A over F (cf. the last column) increases for the difficult predictions.

| City | Entropy | R(5) | F | F+A | F vs. F+A (%) |
|---|---|---|---|---|---|
| Manhattan | $H(L_{ij}) \ge 0$ | 0.13 | 0.39 | 0.39 | 0.5 |
| Manhattan | $H(L_{ij}) \ge 1$ | 0.13 | 0.35 | 0.35 | 0.6 |
| Manhattan | $H(L_{ij}) \ge 2$ | 0.11 | 0.28 | 0.28 | 0.03 |
| Rome | $H(L_{ij}) \ge 0$ | 0.10 | 0.33 | 0.35 | 6.5 |
| Rome | $H(L_{ij}) \ge 1$ | 0.10 | 0.28 | 0.30 | 8.5 |
| Rome | $H(L_{ij}) \ge 2$ | 0.09 | 0.23 | 0.25 | 8.5 |
| Barcelona | $H(L_{ij}) \ge 0$ | 0.13 | 0.48 | 0.51 | 6.4 |
| Barcelona | $H(L_{ij}) \ge 1$ | 0.12 | 0.38 | 0.41 | 9.6 |
| Barcelona | $H(L_{ij}) \ge 2$ | 0.12 | 0.26 | 0.30 | 14.7 |
| Prague | $H(L_{ij}) \ge 0$ | 0.14 | 0.42 | 0.44 | 6.1 |
| Prague | $H(L_{ij}) \ge 1$ | 0.14 | 0.38 | 0.41 | 6.4 |
| Prague | $H(L_{ij}) \ge 2$ | 0.10 | 0.30 | 0.30 | -0.2 |
| Hong Kong | $H(L_{ij}) \ge 0$ | 0.14 | 0.57 | 0.58 | 1.6 |
| Hong Kong | $H(L_{ij}) \ge 1$ | 0.14 | 0.41 | 0.43 | 4.6 |
| Hong Kong | $H(L_{ij}) \ge 2$ | - | - | - | - |
| Berlin | $H(L_{ij}) \ge 0$ | 0.14 | 0.51 | 0.52 | 2.6 |
| Berlin | $H(L_{ij}) \ge 1$ | 0.12 | 0.39 | 0.41 | 6.0 |
| Berlin | $H(L_{ij}) \ge 2$ | 0.08 | 0.28 | 0.32 | 13.5 |
| Taipei | $H(L_{ij}) \ge 0$ | 0.18 | 0.57 | 0.58 | 1.3 |
| Taipei | $H(L_{ij}) \ge 1$ | 0.16 | 0.43 | 0.44 | 1.7 |
| Taipei | $H(L_{ij}) \ge 2$ | - | - | - | - |
| Venice | $H(L_{ij}) \ge 0$ | 0.15 | 0.40 | 0.41 | 3.5 |
| Venice | $H(L_{ij}) \ge 1$ | 0.14 | 0.37 | 0.38 | 4.5 |
| Venice | $H(L_{ij}) \ge 2$ | 0.11 | 0.29 | 0.30 | 3.5 |

The trade-off parameters $\alpha$ and $\beta$ in the proposed algorithm control the relative contributions of the background destination popularity and the people attribute distributions in each city. We find that for most of the cities, e.g., Manhattan, Rome, and Barcelona, the optimal $\alpha$ is below 0.4, except for Hong Kong, which has a higher $\alpha$ value (0.8). This is probably caused by the sparse observation data for Hong Kong, where most of the destinations have few detected faces. For the other cities, the trips spread out well over the destinations, so the next destination can be predicted accurately from the training corpus.

9.1.2 The Recommendation Performance

Figure 5 shows the performance of the destination recommendation. We compare our knowledge-based methods with two baselines that randomly recommend the next destination among the top 1 (i.e., R(1)) or top 5 (i.e., R(5)) nearest destinations. The baselines are intuitive and appropriate because people usually choose a nearby location as their next destination. As observed, the knowledge-based methods (i.e., frequency and combined) outperform the two baselines by over 20% with respect to P@1. The attribute-only method sometimes performs worse than the top-1 baseline because of the dense location distribution in these cities. However, incorporating the attributes into the frequency factors generally increases the performance, which indicates that the attributes indeed bring additional benefits beyond the travel frequencies between destinations.
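The two baselines can be written as one function parameterized by the number of nearest destinations, as in the sketch below. The function name, the dictionary of coordinates, and the Euclidean distance are assumptions made for illustration.

```python
import math
import random


def nearest_baseline(current, destinations, k, rng=random):
    """R(k): pick uniformly at random among the k destinations nearest to `current`.

    destinations: {name: (lat, lng)}; k = 1 always returns the nearest destination.
    """
    cx, cy = destinations[current]
    ranked = sorted(
        (name for name in destinations if name != current),
        key=lambda n: math.hypot(destinations[n][0] - cx, destinations[n][1] - cy),
    )
    return rng.choice(ranked[:k])
```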

9.2 The Impact of People Attributes

In Figure 5, we can see that with the help of the location frequency mined from the historical travel logs, the performance is much better than that of the two baseline methods. The attribute-only method also improves, but not as much as the frequency-only method. This is reasonable, since the visiting sequences between major landmarks are stable and people with different attributes are all likely to travel between them. Therefore, we look at a more difficult subset of testing locations where the uncertainty over the choice of the next destination is large, i.e., where the conditional entropy $H(L_{ij})$ is above a certain threshold. A location with a higher movement (or prediction) entropy $H(L_{ij})$ means that the choices of the next stop are more diverse for the people who leave from this location. For these cases, the attribute information is more useful for improving recommendation performance.
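The movement entropy of a start location can be computed from the empirical distribution of observed next destinations, as sketched below. The use of base-2 logarithms (entropy in bits) is an assumption; the function name is hypothetical.

```python
import math
from collections import Counter


def movement_entropy(next_destinations):
    """Shannon entropy (in bits) of the next-destination distribution of one start location.

    next_destinations: the observed L_j of every trip segment leaving that location.
    """
    counts = Counter(next_destinations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this assumption, a location whose visitors split evenly between two next destinations has an entropy of 1 bit, and an even split over four next destinations gives 2 bits.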

Table 3 presents the performance gain between different methods. We compare the R(5) baseline, which chooses randomly among the five nearest destinations, with the two knowledge-based methods. As shown in Table 3, the improvement over the baseline shrinks as the entropy increases. The gain of the combined method (F+A) over the frequency method (F) usually increases when the testing subset becomes harder to predict ($H(L_{ij}) \ge 0$ to $H(L_{ij}) \ge 1$). Some entries in Table 3 are missing because of the lack of testing sets.

9.3 Route Planning

In the route planning framework, we take the graph-based method to solve this problem, and the scores (over edges) are all based on the destination recommendation steps above, which were shown to be effective in the experiment (Section 9.1).

Initially, we recruited eight testers to use the designed route planning system for travel planning in cities they are familiar with. The ratings are satisfactory and consistent with their expectations. Here we list 3 cases from 3 major cities and analyze the results (footnote 4).

In the case of the routes from Central Park to Battery Park in Manhattan (see Figure 6(a)), the Metropolitan Museum of Art is so famous that it is a must-see landmark for people regardless of gender, whereas the Intrepid Sea, Air & Space Museum, a military and maritime history museum, is apparently male-oriented. Note that the routes are suggested according to the user profiles (i.e., male and female) and the learned travel preferences between destinations, automatically mined from the community-contributed photos.

In the case of the routes from Pincio to Piramide di Caio Cestio in Rome (see Figure 6(b)), females prefer going to Fontana dei Quattro Fiumi, while males like visiting the Vatican Museum. In this case, we hypothesize that the total distance might matter; that is, the route through the Vatican is longer.

Footnote 4: Note that our system assumes that a higher score between two destinations reflects the fact that the route between them is accessible. We do not directly consider whether there are physical barriers (e.g., a river) or traffic barriers along the route.

Figure 6: Route planning samples with designated profiles (male vs. female) in 3 cities: (a) Manhattan, (b) Rome, and (c) Berlin. The blue route is for males, and the red one is for females. Both routes in the same city have the same fixed starting and ending locations (or destinations). In addition, they must go through 3 other destinations along the way. Our route planning system can recommend different routes according to the given user profile, such as gender or age. In these results, the Intrepid Sea, Air & Space Museum (i.e., Intrepid Sea Museum in (a)) and Berliner Cigarren Club tend to be suggested to males. (Best seen in color.)

In the case of the path from Stiftung Brandenburger Tor to Oberbaum Bridge in Berlin (see Figure 6(c)), Berliner Cigarren Club seems male-favored, while Hackesches Hof Theater is suggested for the female route. A leisurely route with a theater visit appears more attractive to females than the cigarette museum. Besides, the Mauermuseum on the male route is at the famous crossing point on the Berlin Wall.

Generally speaking, some landmarks are very popular with both males and females, but the recommended routes between the same starting and ending destinations show that other landmarks are favored mainly by males or by females. With such attribute knowledge, our system is able to provide personalized route planning given the user's profile (i.e., gender, race, age).

10. CONCLUSIONS

In this work, we propose a probabilistic personalized travel recommendation model that uses knowledge automatically mined from travel photo logs together with people attributes automatically detected in the photo contents. Through information-theoretic measures and experiments on eight major cities, we confirm that people attributes are effective for mining the demographics of travel landmarks and paths and are very helpful for personalized travel recommendation and route planning. Moreover, people attributes are orthogonal to the travel logs alone and yield more satisfactory results, especially in more challenging recommendations. For future work, we are conducting experiments on more people attributes and adjusting the probabilistic model for such diverse attributes to support more personalization capabilities. Besides, more competitive recommender models need to be investigated as well. We also want to extend our model with more contexts, such as travel durations and traveling seasons. We believe such location- and individual-aware models are promising for further applications such as advertisement.

11. REFERENCES

[1] M. Ames et al. Why we tag: motivations for annotation in mobile and online media. In ACM CHI, 2007.

[2] Y. Arase et al. Mining people's trips from large scale geo-tagged photos. In ACM MM, 2010.

[3] C.-C. Chang et al. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[4] A.-J. Cheng et al. GPS, compass, or camera?: investigating effective mobile sensors for automatic search-based image annotation. In ACM MM, 2010.

[5] M. Clements et al. Using Flickr geotags to predict user travel behaviour. In ACM SIGIR, 2010.

[6] T. M. Cover et al. Elements of Information Theory. Wiley, 1991.

[7] D. J. Crandall et al. Mapping the world's photos. In WWW, 2009.

[8] Y. Gao et al. W2Go: a travel guidance system by automatic landmark ranking. In ACM MM, 2010.

[9] Q. Hao et al. Equip tourists with knowledge mined from travelogues. In WWW, 2010.

[10] R. Ji et al. Mining city landmarks from blogs by graph modeling. In ACM MM, 2009.

[11] N. Kumar et al. FaceTracer: a search engine for large collections of images with faces. In ECCV, 2008.

[12] Y. Li et al. Landmark classification in large-scale image collections. In IEEE ICCV, 2009.

[13] D. Liu et al. Location sensitive indexing for image-based advertising. In ACM MM, 2009.

[14] X. Lu et al. Photo2Trip: generating travel routes from geo-tagged photos for trip planning. In ACM MM, 2010.

[15] T. Mei et al. Knowledge discovery from community-contributed multimedia. IEEE MultiMedia, 17, October 2010.

[16] V. K. Singh et al. Social pixels: genesis and evaluation. In ACM MM, 2010.

[17] P. Viola et al. Rapid object detection using a boosted cascade of simple features. In IEEE CVPR, 2001.

[18] Y. Zheng et al. Mining interesting locations and travel sequences from GPS trajectories. In WWW, 2009.

[19] Y. Zheng et al. Learning travel recommendations from user-generated GPS traces. ACM Trans. Intell. Syst. Technol., 2, January 2011.

[20] Y. Zheng et al. Recommending friends and locations based on individual location history. ACM Trans. Web, 2011.
