the importance of diversity in profile-based recommendations: a … · 2015. 10. 28. · at rúa...


The importance of Diversity in Profile-based recommendations: A Case Study in Tourism

Fernando Sanchez-Vilas
CITIUS
University of Santiago de Compostela
Santiago de Compostela, Spain
[email protected]

Jasur Ismoilov
CITIUS
University of Santiago de Compostela
Santiago de Compostela, Spain
[email protected]

Eduardo Sanchez∗
CITIUS
University of Santiago de Compostela
Santiago de Compostela, Spain
[email protected]

ABSTRACT
The paper explores the concept of similarity between two users measured in the user profile space rather than in the traditional rating space. The study aims to discover the most relevant user profiles for providing recommendations to any given target profile. Profiles closer to the target were found to be the most accurate individual predictors in terms of prediction error. However, the best results were obtained when profiles far away from the target one were also included. This striking result is explained in light of the diversity prediction theorem.

Keywords
User profile, Diversity Prediction Theorem, tourism, gastronomy.

1. INTRODUCTION
A search on a tourism website typically involves a user filling in a form to choose the item of interest (hotel, travel package and so on) as well as the relevant dates of the trip. The response usually includes the prices and availability of the items fulfilling the request, which helps the consumer make an informed decision. In the Tourism 2.0 era, the decision-making process has been further facilitated by specialized websites that include additional feedback provided by travelers who previously experienced the evaluated item. This feedback comes in different flavors (reviews, ratings and comments) and serves to further clarify the quality of services or to uncover issues or problematic situations. However, the relevance of this additional experience-based information is not the same for all future users. Recently, some advanced websites have started to pave the way for personalization services by classifying user feedback on the basis of user profiles, such as Families, Couples and Business profiles. The motivation for this move is that a touristic product could be a wonderful experience for, say, a Family, but not appropriate for a Couple. The perceived trend is to increase tourist satisfaction by providing better personalization services on the basis of relevant consumer attributes, both personal and contextual [1].

∗To whom correspondence should be addressed.

A number of recommender strategies based on consumer attributes have been proposed in the literature. They can be classified according to the scheme followed to generate the predictions: (1) probabilistic predictors and (2) rule-based predictors. Among the former, an algorithm proposed by Ono et al. (2007) uses contextual information, including user, item and context attributes, to recommend movies. The interaction scenario is modeled by: (1) a set of user profile attributes (U), such as age and sex; (2) a set of context attributes (S), such as mood and location; (3) a set of item attributes (C), such as film genre and director; and (4) a set of film ratings enhanced with contextual information. Their approach applies Bayesian networks to obtain the probability $P(V \mid u, s, c)$ of a rating for the target user $U = u$, specific context $S = s$ and candidate movie $C = c$. The Bayesian paradigm is also used in another recommender system, which distinguishes between a fixed profile stated by the user and an adaptive profile built dynamically from user activity [3]. A naive Bayes network was also applied to generate recommendations adapted to contexts that had previously been predicted by means of behavioral pattern analysis [4]. In the field of tourism, Costa et al. (2012) propose a probabilistic classifier and an agent-based approach in which a set of user attributes, a set of user context attributes and some item context attributes are defined: the restaurant category, the price, the schedule, the kind of day, the distance, the timeOfDay, the day of the week and the goal of the user interaction. The recommendations are conditioned by the user's goal at any given moment. Among the latter, a restaurant recommender system made up of a set of semantic rules was proposed by Vargas et al. (2011). The rules depend on user properties and context information. The key aspect of this work is the selection of the most relevant context attributes from an original set of 23 restaurant attributes, 21 user attributes and 2 environmental attributes.

In this paper we aim to explore the value added by user attributes under the traditional k-Nearest Neighbors (k-NN) scheme [7, 8]. User attributes are used to characterize user profiles, so that k-NN predictors can work with a similarity measure built on top of a user profile space rather than a rating space. In what follows, we present the hypothesis underlying the research work, the experiment carried out to test the hypothesis, the exploratory analysis that provided the key findings to develop the profile-based algorithms, the evaluation of those algorithms, and finally, the application of the diversity prediction theorem to understand a striking result obtained with our algorithms.

Figure 1: TapasPassport of the Santiago(é)Tapas contest. The official information of the contest, including the set of available tapas as well as the location of the restaurants, was published in the TapasPassport.

2. HYPOTHESIS
The line of thought behind this work is that k-Nearest Neighbors predictors should work better in the user profile space, i.e. the user attribute space, than in the rating space. The rationale is that the similarity metric behind the traditional user-based approach, a k-NN predictor in rating space, does not really measure the similarity in tastes between two users. In other words, the fact that two users present the same set of ratings on a number of items does not mean that the two share the same tastes on those items. A coincidence in ratings does not imply a coincidence in tastes: ratings measure satisfaction, an outcome of the experience process, while tastes measure preference over the attribute space of a choice set. In short, we believe that closeness in profile space is a better indicator that two users have similar tastes than closeness in rating space.

3. EXPERIMENTAL DESIGN
3.1 Santiago(é)Tapas contest

In the context of the RECTUR project, an experiment was carried out with real users during Santiago(é)Tapas, a gastronomic contest that takes place every year in Santiago de Compostela. In 2011, the fourth edition was held with a total of 56 participating restaurants, each proposing and preparing up to three tapas that were sold at a price of 2 euros. The experiment was designed to gather relevant data while preserving the spirit of the contest. Participants were local users as well as Spanish and international tourists. A TapasPassport with the official information about the contest was made available to all participants (Fig. 1). It contained: (i) the contest guidelines and other information for the participants, (ii) the restaurant locations, (iii) the tapas offered at each restaurant, and (iv) an official seal to demonstrate that a participant had visited the minimum number of restaurants required to obtain the contest gifts. Restaurant staff had to sign the TapasPassport to certify that its holder had visited the place.

After consuming a tapa, participants were asked to evaluate their experience by filling in the vote shown in Fig. 2. Users had to provide two ratings ranging from 0 to 5: (i) a rating of the tapa, and (ii) a rating of the overall experience (service, atmosphere of the place, etc.). In addition, they were informed about our research experiment and asked to extend their feedback by providing information about the temporal and social context in which the experience took place.

3.2 RECTUR Dataset
The data gathered in the experiment were collected in the RECTUR dataset. It is assumed that the choice of a tapa depends on the user's preferences over the levels of the tapa attributes, which in turn depend on the user attributes and the context elements. The consumption of a tapa at a given moment defines an Interaction of a User with a Tapa, which elicits a satisfaction response quantified as a user Rating. Table 1 shows some relevant figures of the experiment.

[Figure 2 content: voting card of SANTIAGO(é)TAPAS, IV Concurso de Tapas de Santiago de Compostela (13 to 29 May, www.santiagoetapas.com). Fields: name and surname*, ID/passport*, place of origin*, phone/e-mail**, tapa number, tapa rating, overall experience rating (treatment, premises), date, tasting tapas with (alone, partner, group) and time of day (midday, evening). Voting also entered participants in a prize draw for 2 free Air Berlin flights. A data-protection notice from Turismo de Santiago states that fields marked * are compulsory to take part in the contest and fields marked ** to take part in the prize draw.]
Figure 2: Santiago(é)Tapas Vote. The participants had to fill in the tapa vote with both the tapa rating and the overall experience rating.

To avoid overloading contest participants with a long list of feedback questions, only a subset of the attributes of the full research model was included in the evaluation process. These attributes were selected with the help of experts in the field of gastronomy. Figure 2 shows the tapa vote that was finally designed to gather the user's experience after consuming a tapa (a user-tapa interaction).

For each tapa, we gathered the following attributes:

• Type: Meat, Fish, Vegetables, etc. The main ingredient defined the type of the tapa.

• Character: Traditional or Daring. Traditional tapas are those that follow popular well-known recipes, while daring tapas are creative and provide innovative recipes.

• Restaurant: The restaurant that offers the tapa was also categorized in terms of its location, atmosphere and style.

• Average Rating: The average of the ratings provided by consumers.

The consumers, in turn, were characterized with the following attributes:

• Origin: Differences are expected between local, Spanish and foreign users, as local users have a deeper knowledge of both the restaurants and the gastronomy.

• Character: Users are classified as either daring or traditional based on the tapas they have consumed. A group of experts classified the tapas offered in the contest into two groups, traditional and daring, considering attributes such as the ingredients and the presentation of each tapa.

• Experience: User domain knowledge increases with the consumption of new tapas. For this reason, it was assumed that the accuracy of a user's ratings increases with her experience.

At each tapa consumption, the user was in a context described as follows:

• Social context, defined by the user's company.

• Temporal context, defined by the time frame in which she consumes the tapa: the hour, the day, and the kind of day (working day or holiday).

• Climatological context, defined by the weather conditions.

• Location context, defined by the position of the user when she decides to start a tapa consumption.
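The user and context attributes above can be captured in a small record type. The following Python sketch is one possible, hypothetical encoding of a single user-tapa interaction: the class and field names are ours, only the social and temporal context gathered on the tapa vote are included, the 0 to 5 rating range comes from the vote description, and the attribute levels follow Table 2. It also enumerates the 2 × 3 × 3 = 18 user profiles used in the exploratory analysis.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Character(Enum):
    TRADITIONAL = "traditional"
    DARING = "daring"


class Origin(Enum):
    LOCAL = "local"
    SPANISH = "spanish"
    FOREIGN = "foreign"


class Experience(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class UserProfile:
    character: Character
    origin: Origin
    experience: Experience


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one user-tapa interaction and its two ratings."""
    user_id: str
    tapa_id: int
    profile: UserProfile
    company: str          # social context, e.g. "alone", "partner" or "group"
    time_of_day: str      # temporal context, e.g. "midday" or "evening"
    tapa_rating: int      # rating of the tapa, 0 to 5
    overall_rating: int   # rating of the overall experience, 0 to 5

    def __post_init__(self):
        # Reject ratings outside the 0-5 scale used on the tapa vote.
        for name in ("tapa_rating", "overall_rating"):
            value = getattr(self, name)
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")


# Every combination of attribute levels is one profile: 2 x 3 x 3 = 18.
ALL_PROFILES = [UserProfile(c, o, e) for c, o, e in product(Character, Origin, Experience)]

example = Interaction(
    "u1", 7, UserProfile(Character.DARING, Origin.FOREIGN, Experience.HIGH),
    "partner", "evening", 4, 5,
)
```

Making the profile a frozen dataclass keeps it hashable, so it can be used directly as a dictionary key when grouping users by profile.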


Table 1: Experiment info.

Participating restaurants    56
Different tapas offered      109
Tapas consumed               35,000

4. EXPLORATORY ANALYSIS
4.1 Methods
Exploratory data analysis was developed by John Tukey in the field of statistics to encourage researchers to explore the data informally in order to discover patterns or relationships between different variables [9]. The idea behind this approach is to generate hypotheses that could lead to further experiments and/or specific confirmatory analyses.

We found this strategy suitable for exploring our idea about the usefulness of user profile information for generating better recommendations. On the basis of the available user attributes and attribute levels (see Table 2 for details), 18 profiles were identified (2 character levels × 3 origin levels × 3 experience levels), and every user of the RECTUR dataset was assigned to one of those profiles. Thereafter, a number of target users, together with their collections of tapa ratings, were chosen. For each tapa rating, a prediction was generated by averaging only the ratings of the users in profile k (k ranging from 1 to 18). The Mean Absolute Error (MAE) was used to measure the accuracy of the prediction for each profile k. The MAE is estimated by comparing the prediction with the real rating value as follows:

$$\mathrm{MAE} = \frac{1}{n_k}\sum_{i=1}^{n_k}\left|\hat{r}_i - r_i\right| \qquad (1)$$

where $n_k$ is the number of users in profile $k$, $\hat{r}_i$ is the predicted rating, and $r_i$ is the real rating provided by the target user.
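A minimal Python sketch of this exploratory predictor, assuming ratings are stored as nested dictionaries (a hypothetical layout, not the RECTUR format). It predicts each target rating by the plain average of the ratings given to that tapa by users of profile k, excludes the target user from the pool, and averages the absolute errors over the target ratings for which a prediction exists, which is how we read Equation 1. The user names and ratings are made up.

```python
from statistics import fmean


def profile_mae(target_user, target_ratings, ratings_by_item, profile_of, k):
    """MAE of predicting each target rating by the mean rating of profile-k users.

    target_ratings:  {item: rating given by the target user}
    ratings_by_item: {item: {user: rating}}
    profile_of:      {user: profile label}
    """
    errors = []
    for item, true_rating in target_ratings.items():
        peer_ratings = [
            rating
            for user, rating in ratings_by_item.get(item, {}).items()
            if user != target_user and profile_of.get(user) == k
        ]
        if not peer_ratings:
            continue  # nobody in profile k rated this tapa: no prediction
        errors.append(abs(fmean(peer_ratings) - true_rating))
    return fmean(errors) if errors else float("nan")


ratings_by_item = {
    "t1": {"tom": 4, "ana": 4, "ben": 2, "eva": 4},
    "t2": {"tom": 3, "ana": 3, "ben": 5},
}
profile_of = {"tom": "A", "ana": "A", "ben": "A", "eva": "B"}
target = {"t1": 4, "t2": 3}  # ratings of the target user "tom"
for k in ("A", "B"):
    print(k, profile_mae("tom", target, ratings_by_item, profile_of, k))
```

Here profile A misses both ratings by one point (MAE 1.0), while profile B predicts the only tapa it rated exactly (MAE 0.0); a profile with no overlapping ratings yields NaN rather than a misleading zero.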

4.2 Results
The results of the exploratory analysis are shown in Table 3. For brevity, only five target user profiles are presented. For each target profile, the best and the worst predictors in terms of MAE are presented. From this sample it can be observed that the best predictors correspond to the profiles closest to the target profile in terms of attribute values: in every case shown, the best predictor is either the target profile itself or a profile that differs from it in a single attribute. There seems to be a correlation between the accuracy of the prediction and the similarity between profiles in the user attribute space. This finding motivated the development and evaluation of profile-based algorithms.

5. PROFILE-BASED ALGORITHM
5.1 Methods
The similarity between user profiles is determined by means of a distance measure in the user profile space. The contribution of each rating to the final prediction is weighted by this profile similarity between the target user and the user who provided the rating, instead of by the traditional rating similarity between users.

Table 2: Attributes and levels of user profiles.

Attribute    Levels
Character    Traditional, Daring
Origin       Local, Spanish, Foreign
Experience   Low, Medium, High

Table 3: Exploratory analysis: MAE per user profile. Abbreviations stand for: DAR (Daring), TRA (Traditional), FOR (Foreign), SPA (Spanish), MED (Medium).

Target          Predictor              MAE
DAR,FOR,HIGH    Best:  DAR,FOR,HIGH    0.50
                Worst: TRA,FOR,LOW     1.21
DAR,SPA,MED     Best:  DAR,SPA,MED     0.00
                Worst: DAR,FOR,HIGH    1.50
TRA,FOR,HIGH    Best:  DAR,FOR,HIGH    0.67
                Worst: DAR,SPA,MED     1.50
TRA,FOR,MED     Best:  DAR,FOR,MED     0.50
                Worst: TRA,SPA,HIGH    1.66
TRA,SPA,MED     Best:  TRA,SPA,HIGH    0.33
                Worst: TRA,FOR,MED     1.25

A profile distance metric has been defined on the basis of the distances between profile attributes, which are shown in Tables 4, 5 and 6. To compute the profile distance between a user with profile $i$ and a user with profile $j$, the following equation was used:

$$d_{i,j} = \sum_{a \in A} d^{a}_{i,j} \qquad (2)$$

where $A$ is the set of attributes and $d^{a}_{i,j}$ is the distance between profiles $i$ and $j$ with respect to attribute $a$. Finally, the similarity between profiles is calculated as follows:

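$$s_{i,j} = \frac{1}{1 + d_{i,j}} \qquad (3)$$

where $d_{i,j}$ is the distance between profiles $i$ and $j$. When $d_{i,j} = 0$, the similarity takes its maximum value, $s_{i,j} = 1$, and it decreases monotonically as the distance between profiles increases; for example, $d_{i,j} = 1$ gives $s_{i,j} = 0.5$.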
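Equations 2 and 3 can be sketched in Python as below. The sketch assumes an ordinal per-attribute distance, the absolute difference between level positions, which gives 0 for identical levels, 1 between adjacent levels and 2 between Local and Foreign or between Low and High. This matches Table 4 and most entries of Tables 5 and 6; where those tables print 1.0 for Foreign/Foreign or two different values for Medium/High, the sketch uses 0 and 1 respectively, so it is a hypothetical encoding rather than the authors' lookup tables.

```python
# Hypothetical ordinal encoding of the attribute levels of Table 2.
LEVELS = {
    "character": ("daring", "traditional"),
    "origin": ("local", "spanish", "foreign"),
    "experience": ("low", "medium", "high"),
}


def attribute_distance(attribute, level_a, level_b):
    """Per-attribute distance d^a: absolute difference of level positions (assumption)."""
    order = LEVELS[attribute]
    return abs(order.index(level_a) - order.index(level_b))


def profile_distance(profile_i, profile_j):
    """Equation 2: sum of the per-attribute distances over all attributes."""
    return sum(attribute_distance(a, profile_i[a], profile_j[a]) for a in LEVELS)


def profile_similarity(profile_i, profile_j):
    """Equation 3: s = 1 / (1 + d), equal to 1 for identical profiles."""
    return 1.0 / (1.0 + profile_distance(profile_i, profile_j))


target = {"character": "daring", "origin": "foreign", "experience": "high"}
other = {"character": "traditional", "origin": "foreign", "experience": "low"}
print(profile_distance(target, other), profile_similarity(target, other))  # 3 0.25
```

Under this encoding the largest possible profile distance is 1 + 2 + 2 = 5, which agrees with the range of distances (0 to 5) reported in Table 7.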

The prediction was estimated in two different ways: (1) a basic weighted average of the neighboring users' ratings (Equation 4), and (2) a compensated weighted average of the neighboring users' ratings (Equation 5) [10]. The equations for both schemes are:

$$\hat{r}_{j,k,l} = \frac{\sum_{i \in \mathrm{profile}_k} s_{j,i}\, r_{i,k,l}}{\sum_{i \in \mathrm{profile}_k} s_{j,i}} \qquad (4)$$


Table 4: Distances between Character values.

u_k/u_l       Daring   Traditional
Daring        0.0      1.0
Traditional   1.0      0.0

Table 5: Distances between Origin values.

u_k/u_l   Local   Spanish   Foreign
Local     0.0     1.0       2.0
Spanish   1.0     0.0       1.0
Foreign   2.0     1.0       1.0

$$\hat{r}_{j,k,l} = \bar{r}_l + \frac{\sum_{i \in \mathrm{profile}_k} s_{j,i}\,(r_{i,k,l} - \bar{r}_i)}{\sum_{i \in \mathrm{profile}_k} s_{j,i}} \qquad (5)$$

where $\hat{r}_{j,k,l}$ is the predicted rating for user $j$ in profile $k$ for item $l$, $s_{j,i}$ the similarity computed with Equation 3, $r_{i,k,l}$ the rating of user $i$ of profile $k$ on item $l$, and $\bar{r}_i$ the average of the ratings of user $i$.
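Both weighting schemes can be sketched in a few lines of Python. The function names and the tuple layouts are hypothetical. Because Equation 5 does not define its leading term $\bar{r}_l$ in this section, the sketch does not compute it and instead takes it as a `baseline` argument supplied by the caller.

```python
def basic_prediction(neighbors):
    """Equation 4: similarity-weighted average of the neighbors' ratings.

    neighbors: list of (similarity, rating) pairs.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    total_weight = sum(s for s, _ in neighbors)
    return sum(s * r for s, r in neighbors) / total_weight


def compensated_prediction(baseline, neighbors):
    """Equation 5: baseline plus the similarity-weighted average of mean-centered ratings.

    baseline:  the term written as r-bar_l in Equation 5, supplied by the caller.
    neighbors: list of (similarity, rating, neighbor_mean_rating) triples.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    total_weight = sum(s for s, _, _ in neighbors)
    offset = sum(s * (r - mean) for s, r, mean in neighbors) / total_weight
    return baseline + offset


print(basic_prediction([(1.0, 4.0), (0.5, 1.0)]))                       # 3.0
print(compensated_prediction(3.5, [(0.5, 4.0, 3.0), (0.5, 2.0, 2.5)]))  # 3.75
```

The compensated form subtracts each neighbor's own mean rating, so a neighbor who rates everything generously or harshly contributes only its deviation from its usual level.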

5.2 Results
The first analysis focused on estimating the MAE of predictors based on all user profiles at the same distance $d_{i,j}$ from the target user. Results for increasing distances from the target user are shown in Table 7 and plotted in Figure 3. Lower MAEs correspond to closer distances, i.e. higher similarities according to Equation 3: the lowest MAE (0.862) is obtained at distance 1, and from there the error grows with distance until reaching its maximum value (1.036) at distance 5. These results confirm the exploratory analysis performed in the previous section. However, the traditional user-based algorithm still outperforms the best profile-based predictor, the one with user profiles at distance d = 1 (see Table 10).

The second analysis aimed to assess the impact of aggregating the users at different profile distances. In short, the prediction at distance d was generated with all users in profiles at distances lower than or equal to d. The results of this aggregation scheme are shown in Table 8. A striking pattern is found here: the MAE decreases as the profile distance increases. This behavior was confirmed under the compensated weighting prediction of Equation 5 (results in Table 9). As a consequence, profile-based algorithms with aggregation and compensated weighting prediction can work slightly better than traditional user-based approaches (an MAE of 0.77 versus 0.78; see Table 10).
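The aggregation scheme can be sketched as follows: for each maximum distance D, all neighbors whose profile distance is at most D are pooled and combined with the basic weighting of Equation 4, using the similarity of Equation 3 as the weight. The neighbor list and the true rating are hypothetical; in this toy case the far-away neighbor pulls the prediction towards the true rating, which mirrors, but does not reproduce, the trend in Table 8.

```python
def aggregated_predictions(neighbors, max_distance=5):
    """Basic weighted prediction (Eq. 4) pooling all neighbors at profile distance <= D,
    for D = 0, 1, ..., max_distance.

    neighbors: list of (profile_distance, rating) pairs; weights follow Eq. 3.
    """
    predictions = {}
    for d_max in range(max_distance + 1):
        pool = [(1.0 / (1.0 + d), r) for d, r in neighbors if d <= d_max]
        if pool:  # no prediction while the pool is still empty
            total_weight = sum(s for s, _ in pool)
            predictions[d_max] = sum(s * r for s, r in pool) / total_weight
    return predictions


true_rating = 4.0
neighbors = [(0, 3.0), (1, 4.0), (3, 5.0)]  # hypothetical (distance, rating) pairs
for d_max, pred in aggregated_predictions(neighbors).items():
    print(f"0..{d_max}: prediction={pred:.3f}, abs error={abs(pred - true_rating):.3f}")
```

Adding the neighbor at distance 3 lowers the error from 0.667 to 0.429 even though that neighbor's own rating is one point off, because its error has the opposite sign to that of the closest neighbor.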

The open question is why aggregating less accurate predictors, i.e. those at higher distances that individually show higher MAE, results in a decrease in MAE. The next section focuses on answering this question.

Table 6: Distances between Experience values.

u_k/u_l   Low   Medium   High
Low       0.0   1.0      2.0
Medium    1.0   0.0      2.0
High      2.0   1.0      0.0

Table 7: MAE results for predictors based on user profiles at increasing distances: basic weighting prediction.

Distance   MAE       Average predictors
0          0.87236   16.30981
1          0.86204   20.56839
2          0.86851   16.14271
3          0.93652   7.27814
4          1.02836   3.03665
5          1.03636   2.33515

Table 8: MAE results for predictors based on the aggregation of user profiles at increasing distances: basic weighting prediction.

Distance   MAE       Average predictors
0          0.87236   16.30981
0+1        0.8424    35.9854
0+1+2      0.83232   51.86515
0+...+3    0.83149   58.70979
0+...+4    0.83132   60.21792
0+...+5    0.83112   60.59597

Table 9: MAE results for predictors based on the aggregation of user profiles at increasing distances: compensated weighting prediction.

Distance   MAE       Average predictors
0          0.7981    16.30981
0+1        0.78246   35.9854
0+1+2      0.77478   51.86515
0+...+3    0.77316   58.70979
0+...+4    0.77296   60.21792
0+...+5    0.77288   60.59597

Table 10: Comparison between profile-based and user-based algorithms.

Algorithm                                 MAE
Profile-based (Best predictor at d=1)     0.86
Profile-based (Aggr. + Basic Weight)      0.83
Profile-based (Aggr. + Comp. Weight)      0.77
User-based (k=500, minxy=3, x=3)          0.78

Figure 3: Plot of MAE for different profile distances.

6. DIVERSITY PREDICTION THEOREM
Lu Hong and Scott Page propose what they called the Diversity Prediction Theorem, which states that the squared error of a collective prediction equals the average squared error of the individual predictions minus the predictive diversity [11]. This theorem builds on a well-known statistical principle, the bias-variance tradeoff, and its formulation is shown in Equation 6:

$$\mathrm{SqE}(\hat{r}_{j,k,l}) = \mathrm{SqE}(r_{i,k,l}) - \mathrm{PDiv}(r_{i,k,l}) \qquad (6)$$

where $\hat{r}_{j,k,l}$ is the global (collective) predicted value for user $j$ in profile $k$ for item $l$. It is calculated as the arithmetic mean of the individual predictions, as shown in Equation 7:

$$\hat{r}_{j,k,l} = \frac{1}{K \times n_k}\sum_{k=1}^{K}\sum_{i=1}^{n_k} r_{i,k,l} \qquad (7)$$

where $K$ is the total number of user profiles, $n_k$ the number of users with profile $k$, and $r_{i,k,l}$ the individual prediction of user $i$.

$\mathrm{SqE}(\hat{r}_{j,k,l})$ is the squared error of the global prediction. It is calculated as shown in Equation 8, where $r_{j,k,l}$ is the true value.

$$\mathrm{SqE}(\hat{r}_{j,k,l}) = \left(\hat{r}_{j,k,l} - r_{j,k,l}\right)^2 \qquad (8)$$

$\mathrm{SqE}(r_{i,k,l})$ is the average squared error of the individual predictions used to compute the global prediction, each individual prediction being simply the rating given by user $i$ of profile $k$ to item $l$. The error is calculated as shown in Equation 9:

$$\mathrm{SqE}(r_{i,k,l}) = \frac{1}{K \times n_k}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(r_{i,k,l} - r_{j,k,l}\right)^2 \qquad (9)$$

$\mathrm{PDiv}(r_{i,k,l})$ is the predictive diversity of the individual predictions used to compute the global prediction, i.e. their average squared deviation from the global prediction. It is calculated as shown in Equation 10:

$$\mathrm{PDiv}(r_{i,k,l}) = \frac{1}{K \times n_k}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(r_{i,k,l} - \hat{r}_{j,k,l}\right)^2 \qquad (10)$$

The theorem states that the error of the collective prediction can be explained not only in terms of the error of the individual predictions, but also in terms of their diversity. This offers a possible explanation of the results shown in Tables 8 and 9: the aggregation of predictors that individually show higher MAEs could be compensated if the diversity of the predictions increases at a higher rate than their individual error. If this were the case here, the theorem would explain our results.
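The three terms of Equation 6 can be computed directly. The Python sketch below takes the collective prediction to be the plain mean of all individual predictions (one reading of Equation 7) and uses a hypothetical set of ratings; the function name is ours.

```python
from statistics import fmean


def diversity_decomposition(predictions, truth):
    """Return the three terms of the Diversity Prediction Theorem (Equation 6).

    predictions: individual predictions (here, neighbors' ratings of the item)
    truth:       the rating actually given by the target user
    """
    collective = fmean(predictions)                                    # Eq. 7
    collective_error = (collective - truth) ** 2                       # Eq. 8
    individual_error = fmean([(p - truth) ** 2 for p in predictions])  # Eq. 9
    diversity = fmean([(p - collective) ** 2 for p in predictions])    # Eq. 10
    return collective_error, individual_error, diversity


ratings = [4.0, 2.0, 5.0, 3.0]  # hypothetical individual predictions
ce, ie, div = diversity_decomposition(ratings, truth=4.0)
print(ce, ie, div, ie - div)  # 0.25 1.5 1.25 0.25
```

The identity holds for any set of predictions whose collective prediction is their plain mean, so adding a less accurate predictor lowers the collective error whenever it raises the diversity by more than it raises the average individual error.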

6.1 Results
Applying the Diversity Prediction Theorem to predictors based on user profiles at increasing distances yields the results shown in Table 11. Although the values of $\mathrm{SqE}(\hat{r}_{j,k,l})$ differ from the MAE values of Table 7, they show a similar positive correlation with distance. However, the values of $\mathrm{SqE}(r_{i,k,l})$ and $\mathrm{PDiv}(r_{i,k,l})$ indicate that the error increase is not due to lower accuracy of the individual predictors at higher distances, but to their poorer diversity: the average individual error stays between 2.14 and 2.27, whereas the predictive diversity falls from 0.98 at distance 1 to 0.30 at distance 5.
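As a quick arithmetic check, Equation 6 can be verified row by row on the values reported in Table 11: for each distance, the collective error should equal the average individual error minus the predictive diversity, up to the five-decimal rounding of the table. The Python sketch below copies those rows; the constant and function names are hypothetical.

```python
# (distance, SqE of collective prediction, SqE of individual predictions, PDiv),
# copied from Table 11 (basic weighting prediction).
TABLE_11 = [
    (0, 1.26678, 2.14481, 0.87803),
    (1, 1.19857, 2.17998, 0.98142),
    (2, 1.22020, 2.19183, 0.97163),
    (3, 1.48458, 2.27204, 0.78746),
    (4, 1.89438, 2.24763, 0.35325),
    (5, 1.85749, 2.15699, 0.29950),
]


def decomposition_gaps(rows):
    """Return, per distance, collective SqE minus (individual SqE - PDiv)."""
    return {d: collective - (individual - pdiv) for d, collective, individual, pdiv in rows}


gaps = decomposition_gaps(TABLE_11)
for d, gap in gaps.items():
    print(f"d={d}: gap={gap:+.5f}")
```

Every gap is at most 0.00001 in absolute value, so the reported collective errors are fully accounted for by the individual errors and the diversities.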

Table 11: Diversity prediction analysis for predictors based on user profiles at increasing distances: basic weighting prediction.

Distance   SqE($\hat{r}_{j,k,l}$)   SqE($r_{i,k,l}$)   PDiv($r_{i,k,l}$)
0          1.26678                  2.14481            0.87803
1          1.19857                  2.17998            0.98142
2          1.2202                   2.19183            0.97163
3          1.48458                  2.27204            0.78746
4          1.89438                  2.24763            0.35325
5          1.85749                  2.15699            0.2995

Table 12: Diversity prediction analysis for predictors based on the aggregation of user profiles at increasing distances: basic weighting prediction.

Distance   SqE($\hat{r}_{j,k,l}$)   SqE($r_{i,k,l}$)   PDiv($r_{i,k,l}$)
0          1.26678                  2.14481            0.87803
0+1        1.14378                  2.15894            1.0139
0+1+2      1.10859                  2.15455            1.04469
0+...+3    1.10515                  2.16903            1.06132
0+...+4    1.10491                  2.16971            1.0619
0+...+5    1.1046                   2.16927            1.06234

The unexpected results shown in Tables 8 and 9 are explained by applying the Diversity Prediction Theorem to predictors based on the aggregation of user profiles. Table 12 shows the results for increasing distances. The error $\mathrm{SqE}(\hat{r}_{j,k,l})$ clearly decreases as the aggregation extends to user profiles at higher distances. This is explained by the values of $\mathrm{PDiv}(r_{i,k,l})$, which indicate that the diversity of the predictors also increases with higher distances, but at a higher rate than the errors of the individual predictors $\mathrm{SqE}(r_{i,k,l})$: from distance 0 to 0+...+5, the predictive diversity grows by about 0.18 (from 0.878 to 1.062), while the average individual error grows by only about 0.02 (from 2.145 to 2.169).

7. DISCUSSION

The first point to be discussed is the explanatory power of the Diversity Prediction Theorem. To test the aggregation effect in other algorithms, we applied the theorem to the traditional user-based algorithm. Table 13 shows that the error SqE(r_{j,k,l}) again decreases as users with lower similarity to the target user are added to the prediction equation: it falls from 1.19533 at a similarity threshold of 1 to a minimum of 1.09228 at 0.96, and then levels off at around 1.10. As with our profile-based algorithms, this result can be explained by the increase of PDiv(r_{i,k,l}), which grows from 0.9027 to 1.05666 as the threshold is lowered from 1 to 0.60.
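To see how a similarity threshold like the one in Table 13 changes the pool, the sketch below treats each neighbour whose similarity to the target user is at or above the threshold as an individual predictor, using a mean-centred prediction in the style of Resnick et al. [10], and applies a similarity-weighted form of the theorem. This is an assumption about how the user-based algorithm and its weighting work, not the paper's exact method. The functions `pool_by_threshold` and `weighted_decomposition` are hypothetical, the similarities, means, and ratings are made up, and similarities are given directly instead of being computed (for example by Pearson correlation).

```python
from math import fsum


def pool_by_threshold(target_mean, neighbours, threshold):
    """Keep neighbours whose similarity is at least `threshold`; each one yields a
    mean-centred individual prediction (Resnick-style) weighted by its similarity."""
    chosen = [n for n in neighbours if n[0] >= threshold]
    preds = [target_mean + (rating - mean) for _, mean, rating in chosen]
    weights = [sim for sim, _, _ in chosen]
    return preds, weights


def weighted_decomposition(preds, weights, truth):
    """Weighted Diversity Prediction Theorem: with weights summing to 1,
    (c - truth)^2 = sum w (p - truth)^2 - sum w (p - c)^2."""
    total = fsum(weights)
    w = [x / total for x in weights]                                  # normalise to sum to 1
    c = fsum(wi * p for wi, p in zip(w, preds))                       # collective prediction
    err_j = (c - truth) ** 2                                          # collective squared error
    err_i = fsum(wi * (p - truth) ** 2 for wi, p in zip(w, preds))    # weighted mean individual error
    div = fsum(wi * (p - c) ** 2 for wi, p in zip(w, preds))          # weighted prediction diversity
    return c, err_j, err_i, div


# Hypothetical neighbours of one target user: (similarity, neighbour mean, rating on item).
NEIGHBOURS = [
    (0.99, 3.5, 4.0),
    (0.97, 3.0, 4.0),
    (0.93, 4.0, 4.0),
    (0.85, 3.2, 2.5),
    (0.72, 3.8, 3.0),
]
TARGET_MEAN, TRUE_RATING = 3.6, 3.5   # hypothetical target-user mean and held-out rating

for th in (0.99, 0.95, 0.90, 0.80, 0.70):
    preds, weights = pool_by_threshold(TARGET_MEAN, NEIGHBOURS, th)
    c, err_j, err_i, div = weighted_decomposition(preds, weights, TRUE_RATING)
    print(f"th={th:.2f} n={len(preds)} pred={c:.3f} "
          f"SqE_j={err_j:.4f} SqE_i={err_i:.4f} PDiv={div:.4f}")
```

The weighted identity holds exactly because, once the weights are normalised to sum to 1, the weighted deviations of the individual predictions from the collective prediction sum to zero. It requires positive weights, which a positive similarity threshold guarantees.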

The second point concerns our original hypothesis. The results show that profile-based algorithms outperform traditional user-based algorithms only when an aggregation scheme and a compensated weighting prediction are used. Aggregation works only when individual predictors with high prediction errors bring diversity to the pool of predictors. If this condition is satisfied, aggregating such predictors can substantially improve the final prediction.
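The condition stated above can be written compactly. Suppressing the k, l indices, let r^* denote the true rating (our notation, not the paper's), r_1, ..., r_n the individual predictions of a pool P, and r_j their mean; the first line is the Diversity Prediction Theorem [11] in its unweighted form. Writing it for a pool P and an enlarged pool P' and subtracting gives the second line: aggregation lowers the collective error exactly when the gain in diversity exceeds the increase in mean individual error.

```latex
\begin{aligned}
(r_j - r^{*})^2 &= \frac{1}{n}\sum_{i=1}^{n} (r_i - r^{*})^2 \;-\; \frac{1}{n}\sum_{i=1}^{n} (r_i - r_j)^2,
  \qquad r_j = \frac{1}{n}\sum_{i=1}^{n} r_i, \\[4pt]
\mathrm{SqE}_{P'}(r_j) < \mathrm{SqE}_{P}(r_j)
  &\iff \mathrm{PDiv}_{P'}(r_i) - \mathrm{PDiv}_{P}(r_i) \;>\; \mathrm{SqE}_{P'}(r_i) - \mathrm{SqE}_{P}(r_i).
\end{aligned}
```

For example, in Table 12 the step from distance 0 to 0+1 raises PDiv(r_{i,k,l}) by 1.0139 - 0.87803 = 0.13587 but SqE(r_{i,k,l}) only by 2.15894 - 2.14481 = 0.01413, and SqE(r_{j,k,l}) correspondingly falls from 1.26678 to 1.14378.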

Table 13: Diversity Prediction Analysis for the User-Based Algorithm.

User sim. threshold   SqE(r_{j,k,l})   SqE(r_{i,k,l})   PDiv(r_{i,k,l})
1                     1.19533          2.09803          0.9027
0.99                  1.14034          2.08211          0.94185
0.98                  1.11217          2.08478          0.972741
0.97                  1.09727          2.08696          0.98983
0.96                  1.09228          2.0948           1.00263
0.95                  1.0924           2.10321          1.01078
0.94                  1.09589          2.11771          1.02157
0.93                  1.0983           2.12571          1.02696
0.92                  1.09905          2.13274          1.03312
0.91                  1.09871          2.13686          1.03751
0.90                  1.10166          2.14358          1.04101
0.89                  1.10268          2.14762          1.04392
0.88                  1.10262          2.14945          1.04578
0.87                  1.10197          2.1498           1.04681
0.86                  1.10179          2.15093          1.04811
0.85                  1.10062          2.15116          1.04958
0.84                  1.10023          2.15226          1.0511
0.83                  1.09975          2.15284          1.0522
0.82                  1.10031          2.15422          1.0529
0.81                  1.10057          2.15531          1.05368
0.80                  1.10059          2.15545          1.05377
0.79                  1.1007           2.15607          1.05425
0.78                  1.10048          2.15584          1.05428
0.77                  1.10067          2.15626          1.05446
0.76                  1.1013           2.15743          1.05484
0.75                  1.10089          2.15739          1.05531
0.74                  1.10109          2.15806          1.05571
0.73                  1.10103          2.15827          1.05598
0.72                  1.10097          2.15818          1.05598
0.71                  1.10078          2.15797          1.05602
0.70                  1.10158          2.1592           1.05618
0.69                  1.10138          2.15905          1.05629
0.68                  1.10154          2.15928          1.0563
0.67                  1.10168          2.15954          1.05635
0.66                  1.10156          2.15957          1.05654
0.65                  1.10158          2.15957          1.05652
0.64                  1.10154          2.15958          1.05658
0.63                  1.10165          2.15975          1.05659
0.62                  1.1016           2.15955          1.05646
0.61                  1.10154          2.15942          1.05641
0.60                  1.10157          2.15972          1.05666

In conclusion, profile-based algorithms can be particularly useful in the field of tourism, in which users can show different profiles under different contexts. As future work, we believe that preference learning, i.e. the discovery of user preferences associated with different user contexts, will be key to improving predictions and generating better recommendations.

Acknowledgments

This research was sponsored by the Ministry of Science and Innovation of Spain under grant TIN2014-56633-C3-1-R, as well as by EMALCSA/Coruna Smart City under grant CSC-14-13.


8. REFERENCES

[1] Joan Borras, Antonio Moreno, and Aida Valls. Intelligent tourism recommender systems: A survey. Expert Systems with Applications 41(16), pages 7370–7389, 2014.

[2] Chihiro Ono, Mori Kurokawa, Yoichi Motomura, and Hideki Asoh. A Context-Aware Movie Preference Model Using a Bayesian Network for Recommendation and Promotion. Proceedings of the 11th International Conference on User Modeling, pages 257–267, 2007.

[3] Benedikt Engelbert, K. Morisse, and K. C. Hamborg. Evaluation and user acceptance issues of a Bayesian classifier based TV Recommendation System. Proc. of the RecSys 2012 Workshop on CARS, 2012.

[4] Takashi Shiraki, C. Ito, and T. Ohno. Large Scale Evaluation of Multi-Mode Recommender System Using Predicted Contexts with Mobile Phone Users. Proc. of the RecSys 2011 Workshop on CARS, pages 4–8, 2011.

[5] Hernani Costa, Barbara Furtado, D. Pires, Luis Macedo, and Amilcar Cardoso. Context and Intention-Awareness in POIs Recommender Systems. Proc. of the RecSys 2012 Workshop on CARS, pages 1–5, 2012.

[6] B. Vargas-Govea, G. Gonzalez-Serna, and R. Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. Proc. of the RecSys 2011 Workshop on CARS, 2011.

[7] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. Proc. of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1998.

[8] Fabian P. Lousame and Eduardo Sanchez. A taxonomy of collaborative-based recommender systems. Web Personalization in Intelligent Environments. Springer Berlin Heidelberg, pages 81–117, 2009.

[9] John W. Tukey. Exploratory data analysis. Reading, Mass., 1977.

[10] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pages 175–186, 1994.

[11] Lu Hong and S. E. Page. The Foundations of Collective Wisdom. Collective Wisdom: Principles and Mechanisms, pages 1–33, 2011.