recommender systems in the linked data era

Recommender Systems in the Linked Data Era

ROBERTO MIRIZZI


Outline

What is a Recommender System?
◦ A definition
◦ Types

What is Linked Data?
◦ LOD
◦ DBpedia

Some Recommender Systems (RS):
◦ A content-based RS (memory-based)
◦ A mobile content-based RS (memory-based)
◦ A content-based RS (model-based)
◦ A hybrid RS (model-based)

What is a Recommender System?

Recommender Systems in the Linked Data Era (HP Labs, Palo Alto, CA, 7/12/2013)

What is a Recommender System?

Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.

[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

Input Data:

A set of users $U = \{u_1, \ldots, u_M\}$

A set of items $I = \{i_1, \ldots, i_N\}$

The preference matrix $R = [r_{u,i}]$, in which many entries are unknown

Problem Definition:

Given user $u$ and target item $i$

Predict the preference $r_{u,i}$


Recommender Systems: types

Content-based (CB): recommendations are based on the assumption that if in the past a user liked a set of items with particular features, they are likely to prefer items with similar characteristics

[Figure: example item features: animation, fairytale, ogre, castle]

Collaborative-filtering (CF): recommendations are based on the assumption that users with a similar history are more likely to have similar tastes/needs

Hybrid: recommendations combine content-based and collaborative-filtering techniques
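To make the CB and CF assumptions concrete, here is a toy sketch. The items, users and likes are hypothetical (movie_a reuses the features from the figure), and Jaccard overlap is only a stand-in similarity measure, not the one used later in these slides. CB compares items by their features; CF compares users by the items they liked.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# CB view: each item is described by its features (hypothetical data).
item_features = {
    "movie_a": {"animation", "fairytale", "ogre", "castle"},
    "movie_b": {"animation", "fairytale", "ogre"},
    "movie_c": {"crime", "heist", "casino"},
}

# CF view: each user is described by the items they liked (hypothetical data).
likes = {
    "alice": {"movie_a", "movie_b"},
    "bob": {"movie_a", "movie_c"},
    "carol": {"movie_a", "movie_b"},
}


def cb_similarity(i: str, j: str) -> float:
    """CB assumption: items sharing features are similar."""
    return jaccard(item_features[i], item_features[j])


def cf_user_similarity(u: str, v: str) -> float:
    """CF assumption: users with a similar history have similar tastes."""
    return jaccard(likes[u], likes[v])


print(cb_similarity("movie_a", "movie_b"))    # 0.75
print(cb_similarity("movie_a", "movie_c"))    # 0.0
print(cf_user_similarity("alice", "carol"))   # 1.0
```

A CB system would suggest movie_b to someone who liked movie_a because of shared features; a CF system would suggest to carol whatever alice liked, regardless of features.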

What is Linked Data?


What is Linked Data?

A collection of interrelated datasets on the Web

Principles:
1. Use HTTP URIs to identify things
2. Leverage standards such as RDF and SPARQL to provide information about things
3. Link related things by relationships

[http://linkeddata.org/]



DBpedia: a Nucleus for a Web of Open Data

http://dbpedia.org

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.

DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.

[Auer et al., DBpedia: A Nucleus for a Web of Open Data. ISWC+ASWC 2007]
[Bizer et al., A crystallization point for the Web of Data. Journal of Web Semantics, 2009]


Querying DBpedia: SPARQL

DBpedia exposes a SPARQL endpoint (http://dbpedia.org/sparql) to query the dataset.

Results can be provided in several formats (e.g., JSON, XML, N-Triples).

SPARQL is an RDF query language. Its queries consist of triple patterns, conjunctions, disjunctions and optional patterns.

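As an illustration, here is a minimal standard-library sketch of querying the DBpedia endpoint. The query itself (actors linked to the Ocean's Eleven resource through `dbo:starring`) is an assumption chosen to match the running example. The request follows the SPARQL 1.1 Protocol: the query travels in the `query` parameter of a GET request, and the result format is negotiated through the `Accept` header.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query (an assumption, not taken from the slides).
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor WHERE {
  <http://dbpedia.org/resource/Ocean's_Eleven> dbo:starring ?actor .
}"""


def build_request(query: str) -> Request:
    """Builds a SPARQL 1.1 Protocol 'query via GET' request.

    The query is URL-encoded into the `query` parameter; SPARQL JSON
    results are requested through the Accept header.
    """
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run(query: str) -> list:
    """Sends the query (requires network access) and returns the result bindings."""
    with urlopen(build_request(query), timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run(QUERY)` needs network access; each returned binding looks like `{"actor": {"type": "uri", "value": ...}}`, and the actual rows depend on the current DBpedia release.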


A graph of knowledge

Why don’t we use all this information to improve recommender systems?

[Figure: a knowledge graph connecting Ocean’s Eleven and Ocean’s Twelve to George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, American criminal comedy films, Crime films and Crime]


A graph of knowledge

[Figure: the same graph, with two "likes" edges added to represent user preferences for the movies]

A content-based RS (memory-based)


The good old Vector Space Model

[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]

The Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index-term weights $w_{t,d}$, which are non-negative and non-binary.

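$$\mathbf{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$$

$$w_{t,d} = tf_{t,d} \cdot idf_t, \qquad tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}, \qquad idf_t = \log \frac{|D|}{|\{d' \in D \mid t \in d'\}|}$$

$$sim(d_j, q) = \frac{\mathbf{d}_j \cdot \mathbf{q}}{\|\mathbf{d}_j\| \, \|\mathbf{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$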
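The formulas above can be sketched directly in Python with sparse dictionaries as vectors. The corpus and the query are hypothetical toy data. Terms that never occur in the corpus get an idf of 0, which is an assumption of this sketch.

```python
import math
from collections import Counter


def tf(doc):
    """tf_{t,d} = n_{t,d} / sum_k n_{k,d}: term counts normalized by document length."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def idf(corpus):
    """idf_t = log(|D| / |{d' in D : t in d'}|)."""
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log(len(corpus) / n) for t, n in df.items()}


def tfidf(doc, idf_t):
    """Sparse vector {t: w_{t,d}} with w_{t,d} = tf_{t,d} * idf_t (unseen terms -> 0)."""
    return {t: w * idf_t.get(t, 0.0) for t, w in tf(doc).items()}


def cosine(a, b):
    """sim(d_j, q) = (d_j . q) / (||d_j|| ||q||) on sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus and query.
corpus = [["ocean", "heist", "casino"], ["ocean", "heist", "europe"], ["ogre", "castle"]]
query = ["heist", "casino"]

idf_t = idf(corpus)
q_vec = tfidf(query, idf_t)
scores = [cosine(tfidf(doc, idf_t), q_vec) for doc in corpus]
ranking = sorted(range(len(corpus)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1, 2]
```

Document 0 shares both query terms (including the rare "casino"), document 1 shares only the frequent "heist", and document 2 shares nothing, so its cosine is exactly 0.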


Semantic Vector Space Model (i)

[Figure: graph around Ocean’s Eleven and Ocean’s Twelve, with nodes George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal… and Crime, and edges labeled starring, director, subject/broader and genre]

Each item is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …)


Semantic Vector Space Model (ii)

[Figure: the starring matrix, with one row per movie (Ocean’s Eleven [o11], 13 actors; Ocean’s Twelve [o12], 15 actors) and one column per actor (George Clooney [gc], 38 movies; Catherine Z. Jones [czj], 22 movies; Brad Pitt [bp], 35 movies)]

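$$w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$$

$$\mathbf{o}_{11} = [w_{gc,o_{11}}, w_{czj,o_{11}}, w_{bp,o_{11}}], \qquad \mathbf{o}_{12} = [w_{gc,o_{12}}, w_{czj,o_{12}}, w_{bp,o_{12}}]$$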

We can now compute the normalized scalar product (cosine) of the two vectors to get their similarity…
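A worked sketch of the starring weights, using the counts shown in the figure. Three assumptions are made here. First, the total number of movies |D| is a hypothetical 10,000. Second, tf is 1 / (number of actors in the movie), since each actor occurs once. Third, only the three actors on the slide are used as dimensions, and Catherine Zeta-Jones is linked only to Ocean's Twelve.

```python
import math

N_MOVIES = 10_000  # hypothetical |D|: total number of movies in the dataset

# Counts shown in the figure.
movies_per_actor = {"gc": 38, "czj": 22, "bp": 35}
actors_per_movie = {"o11": 13, "o12": 15}

# Assumed links for the three actors on the slide:
# Catherine Zeta-Jones appears in Ocean's Twelve, not in Ocean's Eleven.
starring = {"o11": {"gc", "bp"}, "o12": {"gc", "czj", "bp"}}
ACTORS = ("gc", "czj", "bp")


def weight(actor, movie):
    """w_{actor,movie} = tf * idf, with tf = 1 / #actors of the movie
    and idf = log(|D| / #movies of the actor)."""
    if actor not in starring[movie]:
        return 0.0
    tf = 1 / actors_per_movie[movie]
    idf = math.log(N_MOVIES / movies_per_actor[actor])
    return tf * idf


def vector(movie):
    """Movie vector over the three actor dimensions (gc, czj, bp)."""
    return [weight(a, movie) for a in ACTORS]


def cosine(x, y):
    """Normalized scalar product of two dense vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


sim_starring = cosine(vector("o12"), vector("o11"))
print(round(sim_starring, 3))
```

Every component of a movie vector shares the same tf factor, so tf cancels out in the cosine. With these three dimensions, the result therefore depends only on the idf values and on which actors the two movies share.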


Semantic Vector Space Model (iii)

$$sim^{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} w_{gc,o_{11}} + w_{czj,o_{12}} w_{czj,o_{11}} + w_{bp,o_{12}} w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \, \sqrt{w_{gc,o_{11}}^2 + w_{czj,o_{11}}^2 + w_{bp,o_{11}}^2}}$$

…and then combine all the similarities for each property:

$$sim(o_{12}, o_{11}) = \alpha_{starring} \, sim^{starring}(o_{12}, o_{11}) + \alpha_{director} \, sim^{director}(o_{12}, o_{11}) + \alpha_{subject} \, sim^{subject}(o_{12}, o_{11})$$

Soon we will see how to compute the $\alpha_p$ coefficients.
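A minimal sketch of the combination step. Each item is stored as {property: {value: weight}}. The per-property similarity is the cosine restricted to that property, and the overall similarity is the weighted sum over properties. The item data (weights set to 1.0 for readability) and the weights alpha are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine between two sparse {value: weight} vectors."""
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def semantic_similarity(item_a, item_b, alpha):
    """sim(a, b) = sum_p alpha_p * sim^p(a, b), where sim^p is the cosine
    between the two items' weight vectors for property p."""
    return sum(a_p * cosine(item_a.get(p, {}), item_b.get(p, {}))
               for p, a_p in alpha.items())


# Hypothetical items and weights.
o11 = {"starring": {"gc": 1.0, "bp": 1.0}, "director": {"ss": 1.0},
       "subject": {"2cf": 1.0, "cf": 1.0}}
o12 = {"starring": {"gc": 1.0, "czj": 1.0, "bp": 1.0}, "director": {"ss": 1.0},
       "subject": {"2cf": 1.0, "acc": 1.0}}
alpha = {"starring": 0.5, "director": 0.3, "subject": 0.2}

print(semantic_similarity(o12, o11, alpha))
```

Here the starring term contributes 0.5 · 2/√6, the identical director contributes 0.3, and the half-overlapping subjects contribute 0.2 · 0.5.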


Ready for our first Content-based RS

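Given a user profile, defined as:

$$profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j, \; r_j = -1 \text{ otherwise}\}$$

or as:

$$profile(u) = \{\langle m_j, r_j \rangle\}$$, where $r_j$ is the rating that $u$ gave to $m_j$

We predict the rating using a Nearest Neighbor classifier (memory-based), where the similarity measure is a linear combination of local (per-property) similarities, and $P$ is the number of properties:

$$\tilde{r}(u, m_i) = \frac{\sum_{m_j \in profile(u)} r_j \cdot \frac{\sum_p \alpha_p \, sim^p(m_j, m_i)}{P}}{|profile(u)|}$$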

[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS 2012) – best paper]
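The nearest-neighbor prediction can be sketched as follows. For brevity, Jaccard overlap on the property values stands in for the tf-idf cosine sim^p. The items, the profile and the weights alpha are hypothetical.

```python
def jaccard(a, b):
    """Stand-in for sim^p: overlap of the values of one property."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(profile, target, items, alpha):
    """r~(u, m_i) = sum_{<m_j, r_j> in profile} r_j * (sum_p alpha_p sim^p(m_j, m_i) / P) / |profile|."""
    if not profile:
        return 0.0
    P = len(alpha)
    total = 0.0
    for m_j, r_j in profile:
        local = sum(a_p * jaccard(items[m_j].get(p, set()), items[target].get(p, set()))
                    for p, a_p in alpha.items())
        total += r_j * local / P
    return total / len(profile)


# Hypothetical items described by two properties.
items = {
    "o11": {"starring": {"gc", "bp"}, "director": {"ss"}},
    "o12": {"starring": {"gc", "bp", "czj"}, "director": {"ss"}},
    "toon": {"starring": {"voice1"}, "director": {"dir2"}},
    "toon_2": {"starring": {"voice1"}, "director": {"dir2"}},
}
profile_u = [("o11", 1), ("toon", -1)]   # binary profile: liked o11, not toon
alpha = {"starring": 0.5, "director": 0.5}

print(predict(profile_u, "o12", items, alpha))     # positive: close to a liked movie
print(predict(profile_u, "toon_2", items, alpha))  # negative: close to a disliked movie
```

Positive predictions come from neighbors the user liked and negative ones from neighbors the user disliked. The division by |profile(u)| keeps scores comparable across users with profiles of different sizes.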


How do we compute the $\alpha_p$ coefficients?

We need to identify the best possible values for the coefficients $\alpha_p$, that is, the weights associated with each property. There are several ways to do that.

Depending on the nature of the user ratings (Likert or binary), we can treat rating prediction as a regression problem (linear regression) or as a classification problem (logistic regression), and minimize a loss function $J(\alpha)$.

In the former case we can minimize the least squares loss function, and in the latter case we can minimize the cross-entropy loss function. In both cases we can use gradient descent:

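$$\alpha_p \leftarrow \alpha_p - \eta \, \frac{\partial J(\alpha)}{\partial \alpha_p}$$

where $\eta$ is the learning rate.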

Another possible approach is to use a genetic algorithm to minimize a non-smooth loss function, such as the number of misclassification errors.
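Because the nearest-neighbor prediction is linear in the weights, it can be written as r̃ = Σ_p α_p x_p. Here x_p = Σ_j r_j sim^p(m_j, m_i) / (P · |profile(u)|) is a per-property score for the (user, item) pair. Fitting α_p with the least-squares loss and gradient descent can then be sketched as below. The feature rows and targets are synthetic, generated from known weights so that recovery can be checked.

```python
def fit_alpha(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on J(alpha) = 1/(2n) * sum_k (alpha . x_k - y_k)^2.

    dJ/dalpha_p = 1/n * sum_k (alpha . x_k - y_k) * x_{k,p}
    """
    n, P = len(X), len(X[0])
    alpha = [0.0] * P
    for _ in range(epochs):
        grad = [0.0] * P
        for x, target in zip(X, y):
            err = sum(a * xp for a, xp in zip(alpha, x)) - target
            for p in range(P):
                grad[p] += err * x[p] / n
        # alpha_p <- alpha_p - eta * dJ/dalpha_p
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


# Synthetic data: each row holds per-property scores x_p for one (user, item) pair.
true_alpha = [0.6, 0.3, 0.1]
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y = [sum(a * xp for a, xp in zip(true_alpha, x)) for x in X]

alpha_hat = fit_alpha(X, y)
print([round(a, 3) for a in alpha_hat])  # [0.6, 0.3, 0.1]
```

For binary ratings, the same loop works with logistic regression: pass the linear score through a sigmoid and use the cross-entropy gradient, which keeps the same (prediction − target) · x_p form.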

A mobile content-based RS (memory-based)


Let’s go Mobile (e.g., recommend movies in theaters)

[Vito Claudio Ostuni, Giosia Gentile, Tommaso Di Noia, Roberto Mirizzi, Davide Romito, Eugenio Di Sciascio. Mobile Movie Recommendations with Linked Data. Human-Computer Interaction & Knowledge Discovery @ CD-ARES’13 (HCI-KDD 2013)]

This time the user profile is context-dependent and is defined as:

$$profile(u, cmp) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j \text{ with companion } cmp, \; r_j = -1 \text{ otherwise}\}$$

The prediction combines two parts, contextual pre-filtering and contextual post-filtering:

$$\tilde{r}(u, m_i, cmp) = \alpha_{preFilter} \, \tilde{r}_{preFilter}(u, m_i, cmp) + \alpha_{postFilter} \, \tilde{r}_{postFilter}(u)$$

where

$$\tilde{r}_{preFilter}(u, m_i, cmp) = \frac{\sum_{m_j \in profile(u, cmp)} r_j \cdot sim(m_j, m_i)}{|profile(u, cmp)|}$$

$$\tilde{r}_{postFilter}(u) = \frac{h + c + cl + ar + ap}{5}$$

and the post-filtering indicators are:
◦ h (hierarchy): 1 if the theater is in the same city, 0 otherwise
◦ c (cluster): 1 if the theater is a multiplex, 0 otherwise
◦ cl (co-location): 1 if the theater is close to other POIs, 0 otherwise
◦ ar (association rule): 1 if the ticket price is known, 0 otherwise
◦ ap (anchor-point proximity): 1 if the theater is close to the user's home or office, 0 otherwise
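A minimal sketch of the two-part prediction, built on several assumptions. profile(u, cmp) is read as "the movies u watched with companion cmp", and the two parts are combined linearly with weights a_pre and a_post set to a hypothetical 0.5 each. The viewing history, the similarity values and the theater indicators are all made up for illustration.

```python
def context_profile(history, cmp):
    """profile(u, cmp): movies watched with companion `cmp` (assumed reading);
    r_j = 1 if liked, -1 otherwise."""
    return [(movie, 1 if liked else -1)
            for movie, companion, liked in history if companion == cmp]


def pre_filter_score(profile, target, sim):
    """r~_preFilter(u, m_i, cmp) = sum_j r_j * sim(m_j, m_i) / |profile(u, cmp)|."""
    if not profile:
        return 0.0
    return sum(r_j * sim(m_j, target) for m_j, r_j in profile) / len(profile)


def post_filter_score(h, c, cl, ar, ap):
    """r~_postFilter(u) = (h + c + cl + ar + ap) / 5, each indicator 0 or 1."""
    return (h + c + cl + ar + ap) / 5


def predict(history, cmp, target, sim, indicators, a_pre=0.5, a_post=0.5):
    """Linear combination of the two parts; a_pre and a_post are hypothetical weights."""
    pre = pre_filter_score(context_profile(history, cmp), target, sim)
    return a_pre * pre + a_post * post_filter_score(**indicators)


# Hypothetical data: (movie, companion, liked).
history = [("o11", "friends", True), ("toon", "family", True), ("o12", "friends", False)]
SIMS = {("o11", "o13"): 0.8, ("o12", "o13"): 0.4, ("toon", "o13"): 0.0}


def sim(a, b):
    """Hypothetical precomputed item-item similarity."""
    return SIMS.get((a, b), 0.0)


theater = {"h": 1, "c": 0, "cl": 1, "ar": 1, "ap": 0}
print(predict(history, "friends", "o13", sim, theater))
```

With cmp = "friends", the family movie is filtered out before scoring. This is the point of pre-filtering: the context selects which part of the history counts. The theater indicators then adjust the score afterwards.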

A content-based RS (model-based)


Time for a Model-based CB-RS

[Figure: Ocean’s Eleven [o11] and Ocean’s Twelve [o12] linked via starring to George Clooney [gc], Catherine Z. Jones [czj] and Brad Pitt [bp], via director to Steven Soderbergh [ss], and via subject to 2000s crime films [2cf], Crime films [cf] and American criminal comedy [acc]]

$$\mathbf{o}_{11} = [w_{gc,o_{11}}, w_{czj,o_{11}}, w_{bp,o_{11}}, w_{ss,o_{11}}, w_{2cf,o_{11}}, w_{cf,o_{11}}, w_{acc,o_{11}}]$$

$$\mathbf{o}_{12} = [w_{gc,o_{12}}, w_{czj,o_{12}}, w_{bp,o_{12}}, w_{ss,o_{12}}, w_{2cf,o_{12}}, w_{cf,o_{12}}, w_{acc,o_{12}}]$$

This time each item is represented by a feature vector, where each feature corresponds to a property value.

The user profile is defined as:

$$profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j, \; r_j = -1 \text{ otherwise}\}$$

[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito. Exploiting the Web of Data in Model-based Recommender Systems. 6th ACM Conference on Recommender Systems (RecSys 2012)]
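To show the representation, here is a sketch that turns each item into a sparse feature vector with one feature per (property, value) pair, then pairs it with r_j from profile(u) to form the training set for a classifier. Binary weights replace the w values of the slide for simplicity, and the items and profile are hypothetical.

```python
def feature_vector(item):
    """Sparse vector {"property=value": weight}; binary weights in this sketch."""
    return {f"{prop}={value}": 1.0 for prop, values in item.items() for value in values}


def training_set(profile, items):
    """One (x_j, r_j) example for each <m_j, r_j> in profile(u), with r_j in {1, -1}."""
    return [(feature_vector(items[m_j]), r_j) for m_j, r_j in profile]


# Hypothetical items, following the figure's property values.
items = {
    "o11": {"starring": {"gc", "bp"}, "director": {"ss"}, "subject": {"2cf", "cf", "acc"}},
    "o12": {"starring": {"gc", "czj", "bp"}, "director": {"ss"}, "subject": {"2cf", "cf", "acc"}},
}
profile_u = [("o11", 1), ("o12", -1)]  # hypothetical profile

data = training_set(profile_u, items)
print(sorted(data[0][0]))
```

Prefixing each value with its property keeps identical values under different properties as distinct features. The resulting vectors are sparse and high-dimensional, which matches the text-classification setting discussed next.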


Training the system with an SVM classifier

[https://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png]

Support Vector Machine (SVM) is known to work well for text classification. Our problem of learning the user profile has much in common with it, such as the sparse nature of the feature vectors and the high dimensionality of the input space.

Main advantages:
1. Feature selection is often not needed (SVMs are robust to over-fitting and scale up well)
2. No need to tune parameters like the property weights used before

We then fit a logistic model to the SVM output to obtain a ranked list of items.
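A simplified sketch of fitting a logistic model to SVM outputs, in the spirit of Platt scaling. It fits P(y=1 | f) = 1 / (1 + exp(A·f + B)) to decision values f by plain gradient descent on the cross-entropy. Platt's original procedure also smooths the targets and uses a Newton-type solver, and neither is done here. The decision values are hypothetical; in practice they would come from a trained SVM.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=5000):
    """Fits P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on the cross-entropy."""
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # d(negative log-likelihood)/dz = y - p, with z = A*f + B
            grad_a += (y - p) * f / n
            grad_b += (y - p) / n
        A -= lr * grad_a
        B -= lr * grad_b
    return A, B


def rank_items(items, scores, A, B):
    """Returns (item, probability) pairs sorted by decreasing probability."""
    probs = {item: 1.0 / (1.0 + math.exp(A * f + B)) for item, f in zip(items, scores)}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SVM decision values on training items (1 = liked, 0 = not liked).
train_scores = [-2.0, -1.0, -0.5, -0.2, 0.3, 0.8, 1.5, 2.0]
train_labels = [0, 0, 1, 0, 0, 1, 1, 1]
A, B = fit_sigmoid(train_scores, train_labels)

# Hypothetical decision values for candidate items.
ranked = rank_items(["m1", "m2", "m3"], [0.4, -1.2, 1.7], A, B)
print([item for item, _ in ranked])  # ['m3', 'm1', 'm2']
```

Because the fitted A is negative, the sigmoid is monotone in the SVM score, so the ranking order is preserved. The added benefit is a calibrated score in [0, 1] for each item.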

A hybrid RS (model-based)


Let’s continue with a Hybrid RS

[Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from Implicit Feedback leveraging Linked Open Data. 7th ACM Conference on Recommender Systems (RecSys 2013)]

We want to recommend items i to user u, exploiting both the LOD knowledge base and other users’ interactions.

The ultimate goal of this recommender system is to rank, in the top-N positions, the items that are likely to be relevant for the user, using implicit feedback only.

Given the nature of the problem, the user profile is defined as:

$$profile(u) = \{i \mid i \text{ is relevant for } u\}$$


Path-based features

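We define $\mathbf{x}_{ui} \in \mathbb{R}^D$ as the feature vector encoding all the interactions between user $u$ and item $i$. Each component of this vector represents the relevance score between $u$ and $i$ with respect to a particular feature (a path type $j$), and is defined as:

$$x_{ui}(j) = \frac{\#path_{ui}(j)}{\sum_{d=1}^{D} \#path_{ui}(d)}$$

where $\#path_{ui}(j)$ is the number of paths of type $j$ connecting $u$ and $i$.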

The paths can be content-based, collaborative or hybrid.
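A sketch of computing these path-based features on a tiny graph. The assumptions are as follows. A path type is the sequence of edge labels, with "^-1" marking an edge walked backwards. Only simple paths of at most three edges are counted. The graph mixing user feedback and DBpedia-style properties is hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(triples):
    """Adjacency list; each edge can also be walked backwards (label + '^-1')."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p + "^-1", s))
    return adj


def count_paths(adj, u, i, max_len=3):
    """#path_ui(j) for each path type j over simple paths from u to i of at most max_len edges."""
    counts = Counter()

    def walk(node, labels, visited):
        if node == i:
            counts[tuple(labels)] += 1
            return
        if len(labels) == max_len:
            return
        for label, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, labels + [label], visited | {nxt})

    walk(u, [], {u})
    return counts


def path_features(counts):
    """x_ui(j) = #path_ui(j) / sum_d #path_ui(d)."""
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()} if total else {}


# Hypothetical graph: implicit feedback (likes) plus item properties.
triples = [
    ("u", "likes", "o11"), ("v", "likes", "o11"), ("v", "likes", "o12"),
    ("o11", "starring", "gc"), ("o12", "starring", "gc"),
    ("o11", "starring", "bp"), ("o12", "starring", "bp"),
    ("o11", "director", "ss"), ("o12", "director", "ss"),
]
x_ui = path_features(count_paths(build_graph(triples), "u", "o12"))
for path_type, value in sorted(x_ui.items()):
    print(path_type, value)
```

In this toy graph, the path through the other user v plays the role of a collaborative path. The paths through starring and director are content-based. A path that mixed both kinds of edges would be hybrid.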


Learning the ranking function

In order to predict the ranking and form the top-N recommendation lists, we address the learning-to-rank problem with a point-wise approach. In particular, we use a combination of Random Forests and Gradient Boosted Regression Trees (GBRT).

Thank you!
