recommender systems in the linked data era

Recommender Systems in the Linked Data era ROBERTO MIRIZZI, PHD [email protected]

Upload: roberto-mirizzi

Post on 18-Dec-2014

DESCRIPTION

The ultimate goal of a recommender system is to suggest interesting and not obvious items (e.g., products to buy, people to connect with, movies to watch, etc.) to users, based on their preferences. The advent of the Linked Open Data (LOD) initiative in the Semantic Web gave birth to a variety of open knowledge bases freely accessible on the Web. They provide a valuable source of information that can improve conventional recommender systems, if properly exploited. Here I present several approaches to recommender systems that leverage Linked Data knowledge bases such as DBpedia. In particular, content-based and hybrid recommendation algorithms will be discussed. For full details about the presented approaches please refer to the full papers mentioned in this presentation.

TRANSCRIPT

Page 1: Recommender Systems in the Linked Data era

Recommender Systems in the Linked Data era

ROBERTO MIRIZZI, [email protected]

Page 2: Recommender Systems in the Linked Data era

Outline

What is a Recommender System?
◦ A definition
◦ Types

What is Linked Data?
◦ LOD
◦ DBpedia

Some Recommender Systems (RS):
◦ A content-based RS (memory-based)
◦ A mobile content-based RS (memory-based)
◦ A content-based RS (model-based)
◦ A hybrid RS (model-based)

Page 3: Recommender Systems in the Linked Data era

What is a Recommender System?

Page 4: Recommender Systems in the Linked Data era

Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA, 7/12/2013

What is a Recommender System?

Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.

[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

Input Data:

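A set of users $U = \{u_1, \dots, u_M\}$

A set of items $I = \{i_1, \dots, i_N\}$

The preference matrix $R = [r_{u,i}]$

Problem Definition:

Given user $u$ and target item $i$

Predict the preference $r_{u,i}$

[Figure: preference matrix in which the unknown entries are marked "?"]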
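As a minimal sketch of the problem definition above, the preference matrix can be held as a nested list in which `None` marks the entries still to be predicted. The user names, item names and ratings are made up for illustration; nothing here comes from the talk's data.

```python
# Hypothetical toy data: None marks an unknown preference r_{u,i}.
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]
R = [
    [5, None, 3, None],
    [None, 4, None, 1],
    [2, None, None, 5],
]

def unknown_pairs(R, users, items):
    """Return the (user, item) pairs whose preference has to be predicted."""
    return [
        (users[u], items[i])
        for u, row in enumerate(R)
        for i, rating in enumerate(row)
        if rating is None
    ]

targets = unknown_pairs(R, users, items)
```

Every recommender discussed in the rest of the deck can be read as a different way of filling in the pairs listed in `targets`.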

Page 5: Recommender Systems in the Linked Data era

Recommender Systems: types

Content-based (CB): recommendations are based on the assumption that if a user liked items with particular features in the past, they will likely prefer items with similar characteristics.

[Figure: example item features: animation, fairytale, ogre, castle]

Collaborative-filtering (CF): recommendations are based on the assumption that users with similar histories are likely to have similar tastes and needs.

Hybrid: combinations of content-based and collaborative-filtering techniques.

Page 6: Recommender Systems in the Linked Data era

What is Linked Data?

Page 7: Recommender Systems in the Linked Data era

What is Linked Data?

A collection of interrelated datasets on the Web

Principles:

1. Use HTTP URIs to identify things

2. Leverage standards such as RDF and SPARQL to provide information about things

3. Link related things by relationships

[http://linkeddata.org/]
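The three principles can be illustrated with a small sketch in which every thing is an HTTP URI and every statement is a (subject, predicate, object) triple. The two namespaces are the ones DBpedia uses; the specific resource and property names are assumptions chosen to match the movie example used later in the talk.

```python
# Things are identified by HTTP URIs (principle 1).
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# RDF-style statements about things (principle 2) that link them
# to other things (principle 3). The concrete names are illustrative.
triples = [
    (DBR + "Ocean's_Eleven", DBO + "starring", DBR + "George_Clooney"),
    (DBR + "Ocean's_Eleven", DBO + "starring", DBR + "Brad_Pitt"),
    (DBR + "Ocean's_Eleven", DBO + "director", DBR + "Steven_Soderbergh"),
]

def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

cast = objects(triples, DBR + "Ocean's_Eleven", DBO + "starring")
```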

Page 9: Recommender Systems in the Linked Data era

DBpedia: a Nucleus for a Web of Open Data

http://dbpedia.org

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.

DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.

[Auer et al., DBpedia: A Nucleus for a Web of Open Data. ISWC+ASWC 2007]

[Bizer et al., DBpedia - A crystallization point for the Web of Data. Journal of Web Semantics, 2009]

Page 10: Recommender Systems in the Linked Data era

Querying DBpedia: SPARQL

DBpedia exposes a SPARQL endpoint (http://dbpedia.org/sparql) to query the dataset.

Results can be provided in several formats (e.g., JSON, XML, N-Triples).

SPARQL is an RDF query language. Its queries consist of triple patterns, conjunctions, disjunctions and optional patterns.
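A hedged sketch of what such a query might look like: one triple pattern plus one optional pattern, wrapped in a request URL for the endpoint above. The resource IRI, the `dbo:starring` and `dbo:birthDate` properties, and the `format` request parameter are assumptions for illustration. The code only builds the URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Illustrative query: one triple pattern and one OPTIONAL pattern.
# The resource IRI and both properties are assumptions.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor ?birth WHERE {
  <http://dbpedia.org/resource/Ocean's_Eleven> dbo:starring ?actor .
  OPTIONAL { ?actor dbo:birthDate ?birth }
}
"""

def build_request_url(query, result_format="application/sparql-results+json"):
    """Build a GET URL for the endpoint; the request itself is not sent here."""
    # "format" is assumed to be accepted by the endpoint for choosing the output.
    return ENDPOINT + "?" + urlencode({"query": query, "format": result_format})

url = build_request_url(QUERY)
```

Fetching `url` with any HTTP client would then return the bindings for `?actor` and, where available, `?birth`.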

Page 11: Recommender Systems in the Linked Data era

A graph of knowledge

Why don’t we use all this information to foster recommender systems?

[Figure: knowledge graph connecting Ocean’s Eleven and Ocean’s Twelve to George Clooney, Brad Pitt, Catherine Zeta-Jones and Steven Soderbergh, and to the categories 2000s crime films, American criminal comedy films, Crime films and Crime]

Page 12: Recommender Systems in the Linked Data era

A graph of knowledge

Why don’t we use all this information to foster recommender systems?

[Figure: the same knowledge graph (Ocean’s Eleven, Ocean’s Twelve, George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, American criminal comedy films, Crime films, Crime), now with two "likes" edges added]

Page 13: Recommender Systems in the Linked Data era

A content-based RS (memory-based)

Page 14: Recommender Systems in the Linked Data era

The good old Vector Space Model

[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]

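The Vector Space Model is an algebraic model that represents both text documents and queries as vectors of index-term weights $w_{t,d}$, which are positive and non-binary.

$$v_d = [w_{1,d}, w_{2,d}, \dots, w_{N,d}]^T$$

$$w_{t,d} = tf_{t,d} \cdot idf_t$$

$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$

$$idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$$

$$sim(d_j, q) = \frac{d_j \cdot q}{\|d_j\| \, \|q\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \; \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$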
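The TF-IDF weighting and the cosine similarity of the Vector Space Model can be sketched in a few lines of plain Python. The three toy documents are invented for illustration, and sparse vectors are stored as dictionaries from term to weight.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: dict name -> list of terms. Returns dict name -> {term: weight}."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[name] = {
            t: (n / total) * math.log(n_docs / df[t]) for t, n in counts.items()
        }
    return vectors

def cosine(v, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in v.items())
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_v == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_v * norm_q)

# Hypothetical toy corpus.
docs = {
    "d1": ["ogre", "castle", "fairytale"],
    "d2": ["ogre", "swamp"],
    "d3": ["heist", "casino"],
}
vecs = tf_idf_vectors(docs)
```

Documents that share no term get similarity 0, and a term that occurs in every document gets weight 0 because its idf is log(1).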

Page 15: Recommender Systems in the Linked Data era

Semantic Vector Space Model (i)

[Figure: RDF graph in which Ocean’s Eleven and Ocean’s Twelve are connected through the properties starring, director, subject/broader and genre to George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal… and Crime]

Each item is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …)

Page 16: Recommender Systems in the Linked Data era

Semantic Vector Space Model (ii)

[Figure: the starring property linking the actors George Clooney [gc] (38 movies), Catherine Z. Jones [czj] (22 movies) and Brad Pitt [bp] (35 movies) to the movies Ocean’s Eleven [o11] (13 actors) and Ocean’s Twelve [o12] (15 actors)]

starring                George Clooney [gc]    Catherine Z. Jones [czj]    Brad Pitt [bp]
Ocean’s Eleven [o11]    $w_{gc,o_{11}}$        $w_{czj,o_{11}}$            $w_{bp,o_{11}}$
Ocean’s Twelve [o12]    $w_{gc,o_{12}}$        $w_{czj,o_{12}}$            $w_{bp,o_{12}}$

$$w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$$

We can now compute the scalar product between the two vectors to get their similarity…
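The weights in the table can be sketched with the counts shown on the slide. Three things are assumptions here, not statements from the talk: the catalogue size `N_MOVIES`, the cast assignment in `cast`, and the choice tf = 1 / (number of actors in the movie), which simply reuses the term-frequency definition of the classic Vector Space Model.

```python
import math

N_MOVIES = 1000  # hypothetical catalogue size, not given in the slides

movies_per_actor = {"gc": 38, "czj": 22, "bp": 35}  # counts from the slide
actors_per_movie = {"o11": 13, "o12": 15}           # counts from the slide
# Assumed cast: which of the three actors star in which movie.
cast = {"o11": {"gc", "bp"}, "o12": {"gc", "czj", "bp"}}

def weight(actor, movie):
    """tf-idf weight of `actor` for `movie` along the starring property."""
    if actor not in cast[movie]:
        return 0.0
    tf = 1 / actors_per_movie[movie]                     # assumed tf definition
    idf = math.log(N_MOVIES / movies_per_actor[actor])   # rarer actors weigh more
    return tf * idf

actors = ("gc", "czj", "bp")
vectors = {movie: [weight(a, movie) for a in actors] for movie in cast}
```

Under these assumptions an actor who appears in fewer movies contributes a larger weight, which is the usual idf intuition carried over from terms to property values.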

Page 17: Recommender Systems in the Linked Data era

Semantic Vector Space Model (iii)

$$sim^{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} \, w_{gc,o_{11}} + w_{czj,o_{12}} \, w_{czj,o_{11}} + w_{bp,o_{12}} \, w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \; \sqrt{w_{gc,o_{11}}^2 + w_{czj,o_{11}}^2 + w_{bp,o_{11}}^2}}$$

…and then combine all the similarities for each property:

$$sim(o_{12}, o_{11}) = \alpha_{starring} \, sim^{starring}(o_{12}, o_{11}) + \alpha_{director} \, sim^{director}(o_{12}, o_{11}) + \alpha_{subject} \, sim^{subject}(o_{12}, o_{11})$$

Soon we will see how to compute the $\alpha_p$ coefficients.
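A minimal sketch of the two steps on this slide: a cosine similarity per property, then a weighted sum over properties. All weight vectors and the per-property coefficients in `alpha` are hypothetical placeholders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equally long dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical per-property weight vectors for the two movies.
item_vectors = {
    "o11": {"starring": [0.25, 0.00, 0.26], "director": [1.0], "subject": [0.4, 0.0, 0.3]},
    "o12": {"starring": [0.22, 0.25, 0.22], "director": [1.0], "subject": [0.0, 0.5, 0.3]},
}

# Hypothetical per-property coefficients.
alpha = {"starring": 0.5, "director": 0.2, "subject": 0.3}

def sim_property(a, b, prop):
    """Similarity of two items along a single property."""
    return cosine(item_vectors[a][prop], item_vectors[b][prop])

def sim(a, b):
    """Overall similarity: linear combination of the per-property similarities."""
    return sum(alpha[p] * sim_property(a, b, p) for p in alpha)

score = sim("o12", "o11")
```

Because the coefficients here sum to 1, an item compared with itself scores 1; that normalization is a convenience of this sketch, not something stated in the slides.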

Page 18: Recommender Systems in the Linked Data era

Ready for our first Content-based RS

Given a user profile, defined as:

$$profile(u) = \{ \langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j, \; r_j = -1 \text{ otherwise} \}$$

or, when ratings are graded rather than binary (e.g., on a Likert scale), as:

$$profile(u) = \{ \langle m_j, r_j \rangle \}$$

We predict the rating using a Nearest Neighbor Classifier (memory-based), where the similarity measure is a linear combination of local similarities:

$$r(u, m_i) = \frac{\sum_{m_j \in profile(u)} r_j \cdot \frac{\sum_p \alpha_p \, sim^p(m_j, m_i)}{P}}{|profile(u)|}$$

Here $P$ is the number of properties.

[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS 2012) – best paper]
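The memory-based prediction can be sketched as follows, assuming the per-property similarities are already available in a lookup table. The profile, the similarity values and the coefficients are hypothetical, and `predict` is an illustrative name rather than code from the paper.

```python
def predict(profile, target, sims, alpha):
    """
    profile: list of (movie, rating) pairs, rating in {1, -1}
    sims:    sims[p][(m_j, m_i)] = similarity of m_j and m_i along property p
    alpha:   per-property coefficients
    """
    if not profile:
        return 0.0
    n_properties = len(alpha)
    total = 0.0
    for m_j, r_j in profile:
        # Linear combination of the local similarities, averaged over properties.
        combined = sum(
            alpha[p] * sims[p].get((m_j, target), 0.0) for p in alpha
        ) / n_properties
        total += r_j * combined
    return total / len(profile)

# Hypothetical data: the user likes "o11" and dislikes "shrek".
profile = [("o11", 1), ("shrek", -1)]
sims = {
    "starring": {("o11", "o12"): 0.9, ("shrek", "o12"): 0.0},
    "director": {("o11", "o12"): 1.0, ("shrek", "o12"): 0.0},
}
alpha = {"starring": 1.0, "director": 1.0}

score = predict(profile, "o12", sims, alpha)
```

A positive score means the target is, on balance, closer to the liked movies than to the disliked ones.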

Page 19: Recommender Systems in the Linked Data era

How do we compute the $\alpha_p$ coefficients?

We need to identify the best possible values for the coefficients $\alpha_p$, that is, the weights associated with each property. There are plenty of ways to do that.

Depending on the nature of the user ratings (Likert or binary), we can treat rating prediction as a regression problem (linear regression) or as a classification problem (logistic regression), and minimize a loss function $J(\alpha)$.

In the former case we can minimize the least-squares loss function, and in the latter case the cross-entropy loss function. In both cases we can use gradient descent:

$$\alpha_p \leftarrow \alpha_p - \eta \, \frac{\partial J}{\partial \alpha_p}$$

where $\eta$ is the learning rate.

Another possible approach is to use a genetic algorithm to minimize a non-smooth loss function, such as the number of misclassification errors.
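For the binary case, learning the per-property coefficients by gradient descent on the cross-entropy loss might look like the sketch below. It assumes each training example is a row of per-property similarities between a rated movie and the user's profile, with like/dislike mapped to 1/0; the data, the learning rate and the number of epochs are illustrative, and no bias term is used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_alpha(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss (no bias term)."""
    alpha = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(alpha)
        for x, t in zip(X, y):
            err = sigmoid(sum(a * v for a, v in zip(alpha, x))) - t
            for p, v in enumerate(x):
                grad[p] += err * v / n
        # Update rule: alpha_p <- alpha_p - lr * dJ/dalpha_p
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

# Hypothetical training set: columns are similarities along two properties.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
y = [1, 1, 0, 0]  # 1 = liked, 0 = not liked

alpha = fit_alpha(X, y)
```

In this toy data the first property separates liked from disliked movies, so its coefficient ends up positive while the second one ends up negative.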

Page 20: Recommender Systems in the Linked Data era

A mobile content-based RS (memory-based)

Page 21: Recommender Systems in the Linked Data era

Let’s go Mobile (e.g., recommend movies in theaters)

This time the user profile is context-dependent and is defined as:

$$profile(u, cmp) = \{ \langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j \text{ with companion } cmp, \; r_j = -1 \text{ otherwise} \}$$

The prediction combines two parts, contextual pre-filtering and contextual post-filtering:

$$r(u, m_i, cmp) = \beta_{preFilter} \cdot r_{preFilter}(u, m_i, cmp) + \beta_{postFilter} \cdot r_{postFilter}(u)$$

$$r_{preFilter}(u, m_i, cmp) = \frac{\sum_{m_j \in profile(u, cmp)} r_j \cdot sim(m_j, m_i)}{|profile(u, cmp)|}$$

$$r_{postFilter}(u) = \frac{h + c + cl + ar + ap}{5}$$

h (hierarchy): 1 if the theater is in the same city, 0 otherwise
c (cluster): 1 if the theater is a multiplex, 0 otherwise
cl (co-location): 1 if the theater is close to other POIs, 0 otherwise
ar (association-rule): 1 if the ticket price is known, 0 otherwise
ap (anchor-point proximity): 1 if the theater is close to the user's home or office, 0 otherwise

[Vito Claudio Ostuni, Giosia Gentile, Tommaso Di Noia, Roberto Mirizzi, Davide Romito, Eugenio Di Sciascio. Mobile Movie Recommendations with Linked Data. Human-Computer Interaction & Knowledge Discovery @ CD-ARES’13 (HCI-KDD 2013)]
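The two-part prediction can be sketched as below. The mixing weights `w_pre` and `w_post`, the similarity table and the theater flags are hypothetical values chosen for illustration; the five flag names follow the criteria listed on the slide.

```python
def pre_filter(profile, target, sim):
    """Average of r_j * sim(m_j, target) over the companion-specific profile."""
    if not profile:
        return 0.0
    return sum(r_j * sim(m_j, target) for m_j, r_j in profile) / len(profile)

def post_filter(theater):
    """Fraction of the five binary context criteria that the theater satisfies."""
    flags = ("h", "c", "cl", "ar", "ap")
    return sum(theater[f] for f in flags) / len(flags)

def predict(profile, target, theater, sim, w_pre=0.5, w_post=0.5):
    # w_pre and w_post are assumed mixing weights for the two parts.
    return w_pre * pre_filter(profile, target, sim) + w_post * post_filter(theater)

# Hypothetical data: the profile collected for one companion type.
profile_with_friends = [("o11", 1), ("shrek", -1)]
similarities = {("o11", "o12"): 0.8, ("shrek", "o12"): 0.1}

def sim(a, b):
    return similarities.get((a, b), 0.0)

theater = {"h": 1, "c": 1, "cl": 0, "ar": 1, "ap": 0}

score = predict(profile_with_friends, "o12", theater, sim)
```

Only the pre-filtering part depends on the companion, because the profile it averages over is the one recorded for that companion.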

Page 22: Recommender Systems in the Linked Data era

A content-based RS (model-based)

Page 23: Recommender Systems in the Linked Data era

Time for a Model-based CB-RS

[Figure: the feature vectors of Ocean’s Eleven [o11] and Ocean’s Twelve [o12]. The features are grouped by property: starring (George Clooney [gc], Catherine Z. Jones [czj], Brad Pitt [bp]), director (Steven Soderbergh [ss]) and subject (2000s crime films [2cf], Crime films [cf], American criminal comedy [acc]). The entry for feature $f$ and movie $m$ is the weight $w_{f,m}$, e.g. $w_{gc,o_{11}}$ or $w_{ss,o_{12}}$.]

This time each item is represented by a feature vector, where each feature corresponds to a property value.

The user profile is defined as:

$$profile(u) = \{ \langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j, \; r_j = -1 \text{ otherwise} \}$$

[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito. Exploiting the Web of Data in Model-based Recommender Systems. 6th ACM Conference on Recommender Systems (RecSys 2012)]
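A minimal sketch of how such feature vectors could be assembled: one feature per (property, value) pair. For simplicity the entries are binary presence flags rather than the tf-idf style weights shown on the slide, and the assignment of values to the two movies is hypothetical.

```python
# Hypothetical property values, loosely following the Ocean's example.
item_properties = {
    "o11": {"starring": ["gc", "bp"], "director": ["ss"], "subject": ["2cf", "cf"]},
    "o12": {"starring": ["gc", "czj", "bp"], "director": ["ss"], "subject": ["2cf", "cf", "acc"]},
}

def build_index(items):
    """Assign a column to every (property, value) pair seen in the catalogue."""
    features = sorted(
        {(p, v) for props in items.values() for p, values in props.items() for v in values}
    )
    return {feature: column for column, feature in enumerate(features)}

def to_vector(props, index):
    """Binary feature vector: 1 where the item has that property value."""
    vector = [0] * len(index)
    for p, values in props.items():
        for v in values:
            vector[index[(p, v)]] = 1
    return vector

index = build_index(item_properties)
X = {movie: to_vector(props, index) for movie, props in item_properties.items()}
```

The resulting vectors are sparse and grow with the number of distinct property values, which is the setting the next slide argues suits an SVM.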

Page 24: Recommender Systems in the Linked Data era

Training the system with an SVM classifier

[https://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png]

Support Vector Machines (SVMs) are known to work well for text classification. Our problem of learning the user profile has a lot in common with it, such as the sparsity of the feature vectors and the high dimensionality of the input space.

Main advantages:

1. Feature selection is often not needed (SVMs are robust to over-fitting and scale up well)

2. No need to tune the per-property weights as in the memory-based approach

We then fit a logistic model to the SVM output to obtain a ranked list of items.
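The last step, fitting a logistic model on top of the classifier output, can be sketched without any SVM library by treating the decision values as given numbers. This is in the spirit of Platt scaling, but it is only an illustration: the scores, labels, learning rate and item names are all made up, and the talk does not say how the fit was actually done.

```python
import math

def fit_logistic(scores, labels, lr=0.1, epochs=5000):
    """Fit p = 1 / (1 + exp(-(a*s + b))) to (score, label) pairs by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - t) * s / n
            grad_b += (p - t) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical SVM decision values on rated items (1 = liked, 0 = not liked).
train_scores = [-2.0, -1.0, 0.3, -0.2, 1.0, 2.0]
train_labels = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(train_scores, train_labels)

# Hypothetical decision values for unseen candidate items.
candidates = {"item_a": 1.4, "item_b": -0.7, "item_c": 0.1}
probs = {item: 1.0 / (1.0 + math.exp(-(a * s + b))) for item, s in candidates.items()}
ranking = sorted(probs, key=probs.get, reverse=True)
```

The logistic fit turns raw margins into values in (0, 1) that can be compared across items and sorted into a recommendation list.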

Page 25: Recommender Systems in the Linked Data era

A hybrid RS (model-based)

Page 26: Recommender Systems in the Linked Data era

Let’s continue with a Hybrid RS

[Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from Implicit Feedback leveraging Linked Open Data. 7th ACM Conference on Recommender Systems (RecSys 2013)]

We want to recommend items i to user u, exploiting both the LOD knowledge base and other users’ interactions.

The ultimate goal of this recommender system is to place the items most likely to be relevant for the user in the top-N positions, in the presence of implicit feedback.

Given the nature of the problem, the user profile is defined as:

$$profile(u) = \{ i \mid i \text{ is relevant for } u \}$$

Page 27: Recommender Systems in the Linked Data era

Path-based features

We define $x_{ui} \in \mathbb{R}^D$ as the feature vector encoding all the interactions between user $u$ and item $i$. Each component of this vector represents the relevance score between $u$ and $i$ with respect to a particular feature, and is defined as:

$$x_{ui}(j) = \frac{\#path_{ui}(j)}{\sum_{d=1}^{D} \#path_{ui}(d)}$$

where $\#path_{ui}(j)$ is the number of paths of type $j$ between $u$ and $i$.

The paths can be content-based, collaborative or hybrid.
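The normalization behind each component of the feature vector is simple enough to sketch directly. The path counts are hypothetical, and returning an all-zero vector when no path exists is a choice of this sketch rather than something the slides specify.

```python
def path_features(path_counts):
    """
    path_counts[j] = number of paths of type j between user u and item i.
    Returns the vector whose j-th entry is that count divided by the total.
    """
    total = sum(path_counts)
    if total == 0:
        # No path connects u and i: fall back to an all-zero vector.
        return [0.0] * len(path_counts)
    return [count / total for count in path_counts]

# Hypothetical counts for D = 3 path types between one user and one item.
x_ui = path_features([2, 1, 1])
```

Each entry is therefore the share of all user-item paths that belong to one path type, so the entries sum to 1 whenever at least one path exists.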

Page 28: Recommender Systems in the Linked Data era

Learning the ranking function

To predict the ranking and form the top-N recommendation lists, we treat the task as a learning-to-rank problem and adopt a point-wise approach. In particular, we use a combination of Random Forests and Gradient Boosted Regression Trees (GBRT).

Page 29: Recommender Systems in the Linked Data era

Thank you!