Linked Open Data to Support Content-Based Recommender Systems

Page 1: Linked Open Data to support content based Recommender Systems

I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems September 5-7, 2012 Graz, Austria

LINKED OPEN DATA TO SUPPORT CONTENT-BASED RECOMMENDER SYSTEMS

Tommaso Di Noia (1), Roberto Mirizzi (2), Vito Claudio Ostuni (1), Davide Romito (1), Markus Zanker (3)

(1) Politecnico di Bari, Via Orabona 4, 70125 Bari, Italy

(2) HP Labs, 1501 Page Mill Road, Palo Alto, CA 94304, USA

(3) Alpen-Adria-Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria

Outline

What are (Content-based) Recommender Systems? The main drawback: limited content analysis

Vector Space Model for Linked Open Data (LOD): the Vector Space Model adapted to RDF graphs

A Semantic Content-based Recommender System: a memory-based algorithm that uses a LOD-based item similarity measure

Evaluation: precision and recall experiments with MovieLens

Conclusion

Recommender Systems

Input data:
A set of users $U = \{u_1, \ldots, u_N\}$
A set of items $I = \{i_1, \ldots, i_M\}$
The rating matrix $R = [r_{u,i}]$

Problem definition: given a user u and a target item i, predict the rating $r_{u,i}$.

A definition: "Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user." [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
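As a minimal sketch of the input data above, assuming a Python dict keyed by (user, item) pairs as the sparse representation of R (the users, items and ratings are made up for illustration), the pairs whose rating has to be predicted are simply the missing entries of the matrix:

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: U, I and a sparse rating matrix R (1-5 scale).
users: List[str] = ["u1", "u2"]
items: List[str] = ["i1", "i2", "i3"]

# Only observed ratings r_{u,i} are stored; absent keys are unknown ratings.
R: Dict[Tuple[str, str], int] = {
    ("u1", "i1"): 5,
    ("u1", "i2"): 1,
    ("u2", "i2"): 4,
}


def pairs_to_predict(users: List[str], items: List[str],
                     ratings: Dict[Tuple[str, str], int]) -> List[Tuple[str, str]]:
    """Return the (user, target item) pairs whose rating r_{u,i} is unknown."""
    return [(u, i) for u in users for i in items if (u, i) not in ratings]


print(pairs_to_predict(users, items, R))
# -> [('u1', 'i3'), ('u2', 'i1'), ('u2', 'i3')]
```

A dict is used here only because rating matrices are typically very sparse; any sparse-matrix structure would serve the same purpose.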

Content-based Recommender Systems

CB-RSs recommend items to a user based on the items' descriptions and on the profile of the user's interests. (*)

[Figure: the Recommender System takes as input the user profile (e.g. Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the items' descriptions (Item1, Item2, …, Item100) and returns the Top-N recommendations (e.g. Item7, Item15, Item11, …).]

(*) Pazzani, M. J., & Billsus, D. Content-Based Recommendation Systems. The Adaptive Web. Lecture Notes in Computer Science vol. 4321, 325-341, 2007

Main CB RS Drawback: Limited Content Analysis

Need for domain knowledge! We need rich descriptions of the items!

No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like. (*)

(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners

The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.

A Linked Data based Solution

Use Linked Data to mitigate the limited content analysis issue

Plenty of structured data available; no Content Analyzer required.

LINKED DATA as a structured information source for item descriptions

Rich item descriptions

Let’s use all this ontological knowledge to build smarter CB RSs

Computing similarity in LOD datasets

Vector Space Model for LOD (i)

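[Figure: vector space model illustration, source: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]

Quick recap on the Vector Space Model: it is an algebraic model for representing both text documents and queries as vectors of index-term weights $w_{t,d}$, which are positive and non-binary.

$$\vec{v}_d = (w_{1,d}, w_{2,d}, \ldots, w_{N,d})^T$$

$$w_{t,d} = tf_{t,d} \cdot idf_t, \qquad tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}, \qquad idf_t = \log \frac{|D|}{|\{d' \in D \mid t \in d'\}|}$$

$$sim(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{\|\vec{d_j}\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$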
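A minimal Python sketch of the recap above, assuming documents are already tokenized into lists of terms (the helper names and toy documents are hypothetical): it computes $w_{t,d} = tf_{t,d} \cdot idf_t$ with the normalized term frequency and logarithmic idf given above, then the cosine similarity between two weight vectors.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tf_idf_vectors(docs: List[List[str]]) -> List[Vector]:
    """One sparse vector per document: w_{t,d} = tf_{t,d} * idf_t."""
    n_docs = len(docs)
    # Document frequency: number of documents d' containing term t.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)           # n_{t,d}
        total = sum(counts.values())  # sum_k n_{k,d}
        vectors.append({t: (n / total) * math.log(n_docs / df[t])
                        for t, n in counts.items()})
    return vectors


def cosine(v: Vector, q: Vector) -> float:
    """sim(d_j, q): dot product divided by the product of the Euclidean norms."""
    dot = sum(w * q.get(t, 0.0) for t, w in v.items())
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_v == 0.0 or norm_q == 0.0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_v * norm_q)


docs = [["linked", "data", "recommender"],
        ["linked", "data", "graph"],
        ["movie", "recommender"]]
vs = tf_idf_vectors(docs)
print(round(cosine(vs[0], vs[1]), 3), round(cosine(vs[0], vs[2]), 3))
```

Sparse dicts are used so that only the terms actually present in a document are stored; a term that occurs in every document gets idf 0 and therefore does not contribute to the similarity.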

Vector Space Model for LOD (ii)

[Figure: RDF graph around the movies Righteous Kill and Heat, linked via starring (Robert De Niro, Al Pacino, Brian Dennehy), director (Jon Avnet), subject/broader (Serial killer films, Heist films, Crime films) and genre (Drama). The graph is then flattened into a matrix with one row per movie (Righteous Kill, Heat) and one column per linked resource, and into a starring-only matrix with columns Robert De Niro, Al Pacino, Brian Dennehy.]

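Vector Space Model for LOD (iii)

Property: starring. Movies: Righteous Kill (m1), Heat (m2). Actors: Al Pacino (a1), Robert De Niro (a2), Brian Dennehy (a3).

$$w_{actor_x,\,movie_y} = tf_{actor_x,\,movie_y} \cdot idf_{actor_x}$$

                        Al Pacino (a1)    Robert De Niro (a2)    Brian Dennehy (a3)
Righteous Kill (m1)     $w_{a_1,m_1}$     $w_{a_2,m_1}$          $w_{a_3,m_1}$
Heat (m2)               $w_{a_1,m_2}$     $w_{a_2,m_2}$          0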
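The same weighting can be applied to RDF triples, one property at a time. The sketch below is an illustration under stated assumptions, not the authors' code: $tf_{x,m}$ is taken to be 1 when the triple (m, starring, x) exists and 0 otherwise, $idf_x = \log(M / \text{number of movies linked to } x)$, and the toy triples are hand-written (two extra movies are included so that the actors shared by both movies do not get a zero idf), not extracted from DBpedia.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def property_vectors(triples: List[Triple], prop: str) -> Dict[str, Dict[str, float]]:
    """Per-movie weights w_{x,m} = tf_{x,m} * idf_x for one property.

    Assumptions: tf_{x,m} = 1 if (m, prop, x) is a triple (0 otherwise);
    idf_x = log(M / number of movies linked to x via prop), where M is the
    number of movies appearing as subjects.
    """
    linked: Dict[str, Set[str]] = defaultdict(set)  # movie -> resources
    for s, p, o in triples:
        if p == prop:
            linked[s].add(o)
    movies = {s for s, _, _ in triples}
    M = len(movies)
    df: Dict[str, int] = defaultdict(int)  # resource -> number of movies
    for resources in linked.values():
        for x in resources:
            df[x] += 1
    return {m: {x: 1.0 * math.log(M / df[x]) for x in linked.get(m, set())}
            for m in movies}


# Hand-written toy triples (a small subset of each cast, not a DBpedia dump).
triples = [
    ("Righteous_Kill", "starring", "Al_Pacino"),
    ("Righteous_Kill", "starring", "Robert_De_Niro"),
    ("Righteous_Kill", "starring", "Brian_Dennehy"),
    ("Heat", "starring", "Al_Pacino"),
    ("Heat", "starring", "Robert_De_Niro"),
    ("The_Godfather", "starring", "Al_Pacino"),
    ("The_Godfather", "starring", "Marlon_Brando"),
    ("Taxi_Driver", "starring", "Robert_De_Niro"),
]
starring = property_vectors(triples, "starring")
print(starring["Heat"])  # no Brian_Dennehy key: w_{a3,m2} = 0
```

Missing keys play the role of the zero entries in the matrix, so rare resources (here Brian Dennehy) receive a higher weight than actors shared by many movies.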

Vector Space Model for LOD (iv)

$$sim(m_1, m_2) = sim^{starring}(m_1, m_2) + sim^{director}(m_1, m_2) + sim^{subject}(m_1, m_2) + \dots$$

$$sim^{starring}(m_1, m_2) = \frac{w_{a_1,m_1} w_{a_1,m_2} + w_{a_2,m_1} w_{a_2,m_2} + w_{a_3,m_1} w_{a_3,m_2}}{\sqrt{w_{a_1,m_1}^2 + w_{a_2,m_1}^2 + w_{a_3,m_1}^2} \, \sqrt{w_{a_1,m_2}^2 + w_{a_2,m_2}^2 + w_{a_3,m_2}^2}}$$

Since Brian Dennehy does not star in Heat ($w_{a_3,m_2} = 0$), the third term of the numerator is zero.
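A minimal sketch of the similarity above: one cosine per property, summed over properties. The per-property vectors and their weights are hypothetical numbers chosen only to make the arithmetic easy to follow, and `movie_similarity` is an illustrative helper name.

```python
import math
from typing import Dict

Vector = Dict[str, float]        # resource -> weight w_{x,m}
Description = Dict[str, Vector]  # property p -> vector for that property


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(x, 0.0) for x, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def movie_similarity(m1: Description, m2: Description) -> float:
    """sim(m1, m2) = sum over properties p of sim^p(m1, m2)."""
    props = set(m1) | set(m2)
    return sum(cosine(m1.get(p, {}), m2.get(p, {})) for p in props)


# Hypothetical weights and resources, chosen only for easy arithmetic.
righteous_kill = {
    "starring": {"a1": 1.0, "a2": 1.0, "a3": 2.0},
    "director": {"Jon_Avnet": 1.0},
    "subject": {"Crime_films": 1.0},
}
heat = {
    "starring": {"a1": 1.0, "a2": 1.0},  # w_{a3,m2} = 0
    "director": {"Michael_Mann": 1.0},
    "subject": {"Crime_films": 1.0, "Heist_films": 1.0},
}
print(round(movie_similarity(righteous_kill, heat), 4))
# starring 1/sqrt(3) + director 0 + subject 1/sqrt(2) ≈ 1.2845
```

Because each property gets its own cosine, a property with many resources (e.g. subject) cannot dominate the others just by having longer vectors.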

Semantic Content-based Recommender

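Given a user profile, defined as:

$$profile(u) = \{\langle m_j, v_j \rangle \mid v_j = 1 \text{ if } u \text{ likes } m_j,\ v_j = -1 \text{ otherwise}\}$$

we predict the rating using a Nearest Neighbor classifier in which the similarity measure is a linear combination of local property similarities, where $\alpha_p$ is the weight of property p and P is the number of properties:

$$\tilde{r}(u, m_i) = \frac{\sum_{m_j \in profile(u)} v_j \cdot \frac{\sum_{p} \alpha_p \cdot sim^p(m_j, m_i)}{P}}{|profile(u)|}$$

If the predicted score $\tilde{r}(u, m_i)$ is greater than or equal to 0, we suggest the movie $m_i$ to the user u.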
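The prediction rule can be sketched as follows. This is a minimal illustration, assuming the per-property similarities $sim^p$ are already available as functions (here backed by a hypothetical lookup table with made-up values) and that the $\alpha_p$ weights are given; `predict`, `recommend` and the movies `m3`, `m4` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

Profile = List[Tuple[str, int]]        # (m_j, v_j), v_j in {+1, -1}
PropSim = Callable[[str, str], float]  # sim^p(m_j, m_i)


def predict(profile: Profile, m_i: str, alphas: Dict[str, float],
            sims: Dict[str, PropSim]) -> float:
    """r~(u, m_i) = 1/|profile(u)| * sum_j v_j * (sum_p alpha_p * sim^p(m_j, m_i)) / P."""
    if not profile:
        raise ValueError("empty profile: no neighbours to vote")
    P = len(sims)
    total = 0.0
    for m_j, v_j in profile:
        local = sum(alphas[p] * sim(m_j, m_i) for p, sim in sims.items()) / P
        total += v_j * local
    return total / len(profile)


def recommend(profile: Profile, candidates: List[str], alphas: Dict[str, float],
              sims: Dict[str, PropSim]) -> List[str]:
    """Suggest every candidate whose predicted score is >= 0."""
    return [m for m in candidates if predict(profile, m, alphas, sims) >= 0]


# Hypothetical precomputed similarities (symmetric, looked up by unordered pair).
TABLE = {
    "starring": {frozenset({"Heat", "Righteous_Kill"}): 0.6},
    "subject": {frozenset({"Heat", "Righteous_Kill"}): 0.7,
                frozenset({"m3", "m4"}): 0.2},
}


def lookup(p: str) -> PropSim:
    return lambda a, b: TABLE[p].get(frozenset({a, b}), 0.0)


sims = {p: lookup(p) for p in TABLE}
alphas = {"starring": 0.5, "subject": 1.0}
profile = [("Heat", 1), ("m4", -1)]  # u likes Heat, dislikes m4
print(recommend(profile, ["Righteous_Kill", "m3"], alphas, sims))
# -> ['Righteous_Kill']
```

Here Righteous Kill scores 0.25 (pulled up by the liked Heat), while m3 scores -0.05 (pushed down by the disliked m4).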

Training the system (i)

In order to identify the best possible values for the coefficients $\alpha_p$ (i.e., the weights associated with the properties), we train the system via a genetic algorithm.

Fitness function: minimize the number of misclassification errors $e_i$ on the training data (the user profile):

$$\min_{(\alpha_{p_1}, \alpha_{p_2}, \alpha_{p_3}, \dots)} \sum_{i=1}^{|profile(u)|} e_i$$

[Figure: the training data of user u, i.e. the user profile (Item1, 1; Item2, -1; Item5, 1; …), goes through the optimization step, which returns the optimal values (α_{p1}, α_{p2}, α_{p3}, …).]
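A compact genetic-algorithm sketch for learning the $\alpha_p$ values. The slide fixes only the fitness (the number of misclassified profile items); everything else here is an assumption: each profile item is classified from the remaining ones (leave-one-out, so an item cannot vote for itself), $\alpha_p$ is kept in [0, 1], and the GA uses truncation selection, averaging crossover and Gaussian mutation with arbitrary population and generation sizes. The toy similarity table is hypothetical.

```python
import random
from typing import Dict, List, Tuple

Profile = List[Tuple[str, int]]               # (movie, v) with v in {+1, -1}
SimTable = Dict[str, Dict[frozenset, float]]  # p -> {unordered pair: sim^p}


def sim_p(table: SimTable, p: str, a: str, b: str) -> float:
    return table[p].get(frozenset({a, b}), 0.0)


def errors(alphas: List[float], props: List[str], profile: Profile, table: SimTable) -> int:
    """Fitness: misclassified profile items, each predicted from the others (assumed)."""
    wrong = 0
    for k, (m_i, v_i) in enumerate(profile):
        rest = profile[:k] + profile[k + 1:]
        if not rest:
            continue
        score = sum(
            v_j * sum(a * sim_p(table, p, m_j, m_i) for a, p in zip(alphas, props)) / len(props)
            for m_j, v_j in rest
        ) / len(rest)
        predicted = 1 if score >= 0 else -1
        wrong += int(predicted != v_i)
    return wrong


def genetic_alphas(props, profile, table, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)

    def fitness(ind: List[float]) -> int:
        return errors(ind, props, profile, table)

    population = [[rng.random() for _ in props] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)           # fewest errors first
        parents = population[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 + rng.gauss(0.0, 0.1) for x, y in zip(a, b)]
            children.append([min(1.0, max(0.0, c)) for c in child])  # alpha_p in [0, 1]
        population = parents + children
    return min(population, key=fitness)


# Toy profile: liked movies share actors, disliked ones share a director with liked ones.
props = ["starring", "director"]
table = {
    "starring": {frozenset({"a", "b"}): 1.0, frozenset({"c", "d"}): 1.0},
    "director": {frozenset({"a", "c"}): 1.0, frozenset({"b", "d"}): 1.0},
}
profile = [("a", 1), ("b", 1), ("c", -1), ("d", -1)]
best = genetic_alphas(props, profile, table)
print(best, errors(best, props, profile, table))
```

In this toy profile only weights with $\alpha_{starring} > \alpha_{director}$ classify every item correctly, so the GA should end with zero errors and a larger starring weight.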

Training the system (ii)

In some cases (e.g., the new user problem) the user may not have rated any item yet, so the user profile is empty and we cannot learn the $\alpha_p$ coefficients.

Look at Amazon.com: use Amazon's collaborative results to capture movie similarities. We collected a set of 1,000 movies from Amazon and, for each of them, looked at the corresponding recommendation list.

[Figure: Amazon's first suggestion pairs Righteous Kill with Heat.]

Increment the weights $\alpha_p$ associated with the properties the two movies have in common. E.g., Righteous Kill and Heat share actors but no director, so we increase the weight of the property starring.
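The weight-update heuristic can be sketched as below, assuming each movie is described by a mapping from property to the set of linked resources and that the Amazon (movie, first suggestion) pairs have already been collected; `alphas_from_pairs` is a hypothetical name and the descriptions are a small hand-written subset.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Description = Dict[str, Set[str]]  # property -> linked resources


def alphas_from_pairs(pairs: Iterable[Tuple[str, str]],
                      descriptions: Dict[str, Description]) -> Counter:
    """Add 1 to alpha_p for every property p on which a pair shares a resource."""
    alphas: Counter = Counter()
    for m1, m2 in pairs:
        d1, d2 = descriptions.get(m1, {}), descriptions.get(m2, {})
        for p in set(d1) & set(d2):
            if d1[p] & d2[p]:  # at least one resource in common
                alphas[p] += 1
    return alphas


descriptions = {
    "Righteous_Kill": {"starring": {"Al_Pacino", "Robert_De_Niro", "Brian_Dennehy"},
                       "director": {"Jon_Avnet"}},
    "Heat": {"starring": {"Al_Pacino", "Robert_De_Niro"},
             "director": {"Michael_Mann"}},
}
# (movie, Amazon's first suggestion) pairs; only the example pair here.
pairs = [("Righteous_Kill", "Heat")]
print(alphas_from_pairs(pairs, descriptions))
# -> Counter({'starring': 1})
```

The slide does not say whether the counts are normalized afterwards; dividing by the number of pairs would be one possible choice.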

Experiment settings (i)

MovieLens 1M dataset. One-to-one mapping between MovieLens and DBpedia using SPARQL queries and the Levenshtein distance: 3,654 of the 3,952 movies were matched.
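The slide does not detail how the Levenshtein distance was used in the mapping; the sketch below only illustrates the idea, assuming candidate DBpedia labels have already been retrieved (e.g. via SPARQL) and that a MovieLens title is matched to its closest candidate when the case-insensitive edit distance is at most a hypothetical threshold `max_dist`.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_match(title: str, candidates: List[str], max_dist: int = 3) -> Optional[str]:
    """Closest candidate label (case-insensitive), or None if it is too far."""
    scored = sorted((levenshtein(title.lower(), c.lower()), c) for c in candidates)
    if scored and scored[0][0] <= max_dist:
        return scored[0][1]
    return None


print(levenshtein("kitten", "sitting"))                            # -> 3
print(best_match("Heat (1995)", ["Heat (1995)", "Heist (2001)"]))  # -> Heat (1995)
print(best_match("Heat (1995)", ["Toy Story (1995)"]))             # -> None
```

Returning None instead of the nearest label keeps the mapping one-to-one and conservative, which is consistent with some movies remaining unmatched.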

Binarization of the 1-5 rating scale:

$$profile(u) = \{\langle m_j, v_j \rangle \mid v_j = 1 \text{ if } r(u, m_j) \geq r_u,\ v_j = -1 \text{ otherwise}\}$$

Evaluation goal: top-N recommendations. Metrics: Precision@N and Recall@N, with N = 1, 2, …, 20:

$$P@N = \frac{|Rec@N \cap TestSet|}{N} \qquad R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$$
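Binarization and the two metrics can be sketched as follows. Assumptions: the per-user threshold $r_u$ is taken to be the user's mean rating, the test set holds the test items the user liked, and the ranked list stands in for the recommender's output; all data are made up.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def binarize(ratings: Dict[str, int], threshold: float) -> Dict[str, int]:
    """v_j = 1 if r(u, m_j) >= threshold, -1 otherwise."""
    return {m: 1 if r >= threshold else -1 for m, r in ratings.items()}


def precision_recall_at(ranked: List[str], test_set: Set[str], n: int) -> Tuple[float, float]:
    """P@N = |Rec@N ∩ TestSet| / N and R@N = |Rec@N ∩ TestSet| / |TestSet|."""
    hits = len(set(ranked[:n]) & test_set)
    recall = hits / len(test_set) if test_set else 0.0
    return hits / n, recall


ratings = {"m1": 5, "m2": 2, "m3": 4, "m4": 1}     # made-up 1-5 ratings
print(binarize(ratings, mean(ratings.values())))  # threshold r_u = 3 (assumed: mean)
ranked = ["m7", "m3", "m9", "m1"]                  # hypothetical Rec@N list
test_set = {"m1", "m3", "m5"}                      # hypothetical liked test items
for n in (1, 2, 4):
    print(n, precision_recall_at(ranked, test_set, n))
```

In an actual evaluation the loop would run over N = 1, …, 20 and the values would be averaged over users.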

Experiment settings (ii)

Extracted graph: 53,840 actors, 18,149 directors, 29,352 distinct writers and 27,035 categories from DBpedia; 667 genres from Freebase; 26 genres from LinkedMDB.

Properties: dcterms:subject + skos:broader + DBpedia Ontology + Freebase + LinkedMDB genres.

Alpha-coefficients evaluation

The α-coefficients obtained with the genetic algorithm give the best performance.

Property subset evaluation

Using subject+broader performs better than using subject alone or subject plus further levels of broader categories.

The best results are achieved with subject+broader+genres.

Too many levels of broader categories introduce noise.

Evaluation against other approaches

Our solution outperforms a Linked Data approach (LDSD, Linked Data Semantic Distance) and other content-based approaches that do not leverage LOD.

Conclusion & Future directions

The large amount of data available in Linked Data datasets can be successfully exploited to overcome the limited content analysis problem.

We have presented a semantic version of the classical vector space model to compute item similarities.

The evaluation against a historical dataset (MovieLens) shows high precision and recall, which supports the validity of our approach.

We are currently working on:
testing the approach in different domains;
improving the recommendations with a hybrid approach (content-based and collaborative filtering).

Q & A

We acknowledge partial support from HP IRP 2011, grant CW267313.