linked open data-enabled strategies for top-n recommendations

Linked Open Data-enabled Strategies for Top-N Recommendations Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group) CBRecSys 2014 Workshop on New Trends in Content-based Recommender Systems Foster City (CA, United States) October 6, 2014


DESCRIPTION

Linked Open Data-enabled Strategies for Top-N Recommendations - Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro - 1st Workshop on New Trends in Content-based Recommender Systems, co-located with ACM Recommender Systems 2014

TRANSCRIPT

Page 1: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data-enabled Strategies for Top-N Recommendations

Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

CBRecSys 2014 Workshop on New Trends in Content-based Recommender Systems

Foster City (CA, United States), October 6, 2014

Page 2: Linked Open Data-enabled Strategies for Top-N Recommendations

Outline

• Background
  • Content-based RecSys (CBRS)
  • Limitations
• Linked Open Data
  • What?
  • Introducing LOD in CBRS
• Experiments
• Conclusions

Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis. Linked Open Data-enabled Strategies for Top-N Recommendation. CBRecSys 2014 Workshop, Silicon Valley (US), 6.10.2014

Page 3: Linked Open Data-enabled Strategies for Top-N Recommendations

Content-based Recommender Systems

Suggest items similar to those the user liked in the past (I bought Converse shoes, I’ll continue buying similar sport shoes)

Page 4: Linked Open Data-enabled Strategies for Top-N Recommendations

Content-based Recommender Systems
Limitations

Limited content (in several domains)

Page 5: Linked Open Data-enabled Strategies for Top-N Recommendations

Content-based Recommender Systems
Limitations

Poor semantics

Page 6: Linked Open Data-enabled Strategies for Top-N Recommendations

Problem

How can we boost Content-based Recommender Systems with semantics (and with more content)?

Page 7: Linked Open Data-enabled Strategies for Top-N Recommendations

Semantics in CBRS
State of the art

• Ontologies
• Encyclopedic Knowledge
• Linked Open Data
• Distributional Semantics
• Folksonomies

Page 8: Linked Open Data-enabled Strategies for Top-N Recommendations

Top-down approaches
What is the difference?

Approach                   Formal Semantics   Large-scale
Folksonomies               X                  X
Ontologies                 V                  X
Encyclopedic Knowledge     X                  V
Distributional Semantics   X                  V
Linked Open Data           V                  V

Page 9: Linked Open Data-enabled Strategies for Top-N Recommendations


Linked Open Data merge the vastness of encyclopedic knowledge with the formal semantics typical of ontologies

Page 10: Linked Open Data-enabled Strategies for Top-N Recommendations


We focus on the introduction of Linked Open Data in Content-based Recommender Systems

Page 11: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data

What are we talking about?

Page 12: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data
Definition

Methodology to publish, share and link structured data on the Web

Page 13: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
What is it?

A (large) set of interconnected semantic datasets

Page 14: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
What kind of datasets?

Page 15: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
DBpedia

http://dbpedia.org

Page 16: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
DBpedia

DBpedia is the structured mapping of Wikipedia. It is the core of the LOD cloud.

http://dbpedia.org

Page 17: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
Example: unstructured content from Wikipedia

“Foster City is a town in United States located in California” (example from Wikipedia page)

Page 18: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
How are these data represented?

Semantic Web cake

Information from the LOD cloud is represented in RDF

Page 19: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
How are these data represented?

“Foster City is a town in United States located in California” (example from Wikipedia page)

Nodes:
• Foster City: http://dbpedia.org/resource/Foster_City,_California
• California: http://dbpedia.org/resource/California
• United States: http://dbpedia.org/resource/United_States

Properties (edges):
• dbpedia-owl:country
• dbpedia-owl:isPartOf

Page 20: Linked Open Data-enabled Strategies for Top-N Recommendations


Data coming from the LOD cloud have a formal semantics represented in RDF
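As a minimal sketch of the representation above, the Foster City example can be written as subject-predicate-object triples. Plain Python tuples stand in for an RDF library here, and the pairing of each property with its object (isPartOf -> California, country -> United States) is an assumption read off the figure rather than something the slides state explicitly.

```python
# dbpedia-owl is the prefix for the DBpedia ontology namespace.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Each statement is a (subject, predicate, object) triple.
# Assumption: which property points to which object is inferred from the slide.
triples = [
    (DBR + "Foster_City,_California", DBO + "isPartOf", DBR + "California"),
    (DBR + "Foster_City,_California", DBO + "country", DBR + "United_States"),
]


def objects_of(subject, predicate, graph):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects_of(DBR + "Foster_City,_California", DBO + "country", triples))
```

Because both the nodes and the properties are URIs, the same lookup works for any resource in the cloud, which is what gives the data its formal semantics.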

Page 21: Linked Open Data-enabled Strategies for Top-N Recommendations

Our checklist

Can Linked Open Data boost content-based recommender systems?

• More Semantics: V
• More Content: ?

Page 22: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
How many data?

Page 23: Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data (cloud)
How many data?

1048 datasets and 58 billion triples (source: http://stats.lod2.eu)

Page 24: Linked Open Data-enabled Strategies for Top-N Recommendations

Our checklist

Can Linked Open Data boost content-based recommender systems?

• More Semantics: V
• More Content: V

Page 25: Linked Open Data-enabled Strategies for Top-N Recommendations

Our checklist

Can Linked Open Data boost content-based recommender systems?

• More Semantics: V
• More Content: V

…but

Page 26: Linked Open Data-enabled Strategies for Top-N Recommendations

Research Question

Page 27: Linked Open Data-enabled Strategies for Top-N Recommendations

Approach

We propose two methodologies to introduce LOD-based features into CBRS:

• Direct Access to DBpedia
• Entity Linking algorithms

Page 28: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia

The simplest way to introduce LOD-based features

1. Domain-dependent features are manually defined (e.g. book recommendation -> genre, author, publisher, subject, etc.)
2. SPARQL queries extract the features’ values

(We assume that each item to be recommended is already in the LOD cloud)
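The two steps above can be sketched as follows: a hand-written property list for the book domain, and a function that builds one SPARQL query per item. The property names (`dbo:author`, `dbo:literaryGenre`, `dbo:publisher`, `dcterms:subject`), the helper `build_feature_query`, and the item URI are illustrative assumptions, not the exact list used by the authors; the sketch only builds the query text and does not contact any endpoint.

```python
# Step 1: manually defined, domain-dependent properties (assumed list for books).
BOOK_PROPERTIES = [
    "dbo:author",
    "dbo:literaryGenre",
    "dbo:publisher",
    "dcterms:subject",
]


def build_feature_query(item_uri, properties):
    """Step 2: build a SPARQL query returning (property, value) pairs for one item."""
    values = " ".join(properties)
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?property ?value WHERE {\n"
        f"  VALUES ?property {{ {values} }}\n"
        f"  <{item_uri}> ?property ?value .\n"
        "}"
    )


# Assumed URI for the running example; the real resource name may differ.
query = build_feature_query(
    "http://dbpedia.org/resource/The_Great_and_Secret_Show", BOOK_PROPERTIES
)
print(query)
```

Changing domain means rewriting `BOOK_PROPERTIES` by hand, which is the domain dependence that the later analysis of this approach lists as a drawback.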

Page 29: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia

Example: The Great and Secret Show (Clive Barker’s book)

Page 30: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia

e.g. Book Recommendation: author, genre, publisher, subject

Page 31: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia

Each item is represented through the set of the (manually defined) features extracted from the LOD cloud.

Page 32: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia

9 LOD-based features: author (Clive Barker), genre (Fantasy Literature), publisher (William Collins), series (Books of the Art), subject (1980s fantasy novels, William Collins books, Novels by Clive Barker, British Fantasy Novels)

Page 33: Linked Open Data-enabled Strategies for Top-N Recommendations

Direct Access to DBpedia
Analysis

Pros:
- Very straightforward approach
- SPARQL queries can be easily built

Cons:
- Properties are manually defined
- Approach is strongly domain-dependent
- Does not exploit unstructured information

Page 34: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

• Entity Linking Algorithms
  • Input: free text (item descriptions, in our setting)
  • Output: identification of the most relevant entities mentioned in the text
• State of the art
  • tag.me (1)
  • DBpedia Spotlight (2)
  • Wikipedia Miner (3)

(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
(3) http://wikipedia-miner.cms.waikato.ac.nz


Page 36: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

• Entity Linking Algorithms
  • Input: free text
    • in this setting: textual description of the items (e.g. Wikipedia abstract)
  • Output: identification of the most relevant entities mentioned in the text

from Tagme

Page 37: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

Entity Linking - output

Very human-readable representation! Free n-grams and entity recognition, free sense disambiguation

Page 38: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

Entity Linking - output

Each entity is a reference to a DBpedia node (e.g. http://dbpedia.org/resource/Harry_D'Amour), not a simple textual feature!

Page 39: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

The LOD-based representation can be enriched through broader categories (encoded in the dcterms:subject property) by exploiting SPARQL queries

Page 40: Linked Open Data-enabled Strategies for Top-N Recommendations

Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms

The final representation of each item is obtained by merging the DBpedia nodes identified in the text with those the dcterms:subject property refers to (broader categories)

Features = DBpedia nodes + broader categories
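The merge described above amounts to a set union. The sketch below assumes the entity linker has already returned a list of DBpedia nodes and that the dcterms:subject values have been fetched into a dictionary; the function `item_features` and all sample identifiers are hypothetical and only illustrate the shape of the data.

```python
def item_features(linked_entities, subjects_of):
    """Union of the entities found in the text and their dcterms:subject categories."""
    features = set(linked_entities)
    for entity in linked_entities:
        features.update(subjects_of.get(entity, ()))
    return features


# Hypothetical entity-linking output and category lookup (illustrative values only).
entities = ["dbr:Clive_Barker", "dbr:Harry_D'Amour"]
subjects = {
    "dbr:Clive_Barker": ["dbc:English_horror_writers"],
    "dbr:Harry_D'Amour": ["dbc:Fictional_private_investigators"],
}

features = item_features(entities, subjects)
print(sorted(features))
```

Using a set means a category shared by several entities is stored once; a weighted variant could count occurrences instead.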

Page 41: Linked Open Data-enabled Strategies for Top-N Recommendations

Entity Linking Algorithms
Analysis

Pros:
- Very general approach
- Exploit unstructured information
- May introduce unexpected (but relevant) features

Cons:
- Strong features engineering (which ones are the best?)
- Threshold score of Entity Linking algorithms is difficult to set

Page 42: Linked Open Data-enabled Strategies for Top-N Recommendations

LOD-based features in CBRS

Page 43: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Research Hypothesis

1. What is the contribution of the Linked Open Data features to the accuracy of recommendation algorithms?

2. Does the representation based on Linked Open Data outperform existing state-of-the-art recommendation algorithms?

Page 44: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Description of the dataset

• Book recommendation
• ESWC 2014 Challenge Dataset (*)
• 6,733 books
• 6,181 users
• 72,372 binary ratings
  • 11.71 ratings/user
  • Very sparse dataset!
  • Only 5.37 positive ratings/user!

(*) http://challenges.2014.eswc-conferences.org/index.php/RecSys
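The sparsity claim can be checked directly from the three counts on the slide. This is a small worked calculation, assuming sparsity is measured as the share of empty cells in the full user-by-book rating matrix (the slides do not define it).

```python
n_books, n_users, n_ratings = 6733, 6181, 72372

ratings_per_user = n_ratings / n_users
# Fraction of the user-item matrix that actually holds a rating.
density = n_ratings / (n_books * n_users)
sparsity = 1.0 - density

print(f"ratings/user: {ratings_per_user:.2f}")
print(f"sparsity: {sparsity:.2%}")
```

The ratings-per-user figure matches the 11.71 reported on the slide, and fewer than 0.2% of the possible user-book pairs carry a rating.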

Page 45: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Feature combinations

• Content (crawled from Wikipedia + NLP processing)
• LOD (direct access to DBpedia)
• Entity Linking (Tagme)
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All

7 combinations for each run
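The seven configurations are exactly the non-empty subsets of the three feature groups (2^3 - 1 = 7). A short sketch that enumerates them, with the group labels taken from the list above:

```python
from itertools import combinations

FEATURE_GROUPS = ["Content", "LOD", "Entity Linking"]

# Every non-empty subset of the three groups, from singletons up to "All".
feature_sets = [
    combo
    for size in range(1, len(FEATURE_GROUPS) + 1)
    for combo in combinations(FEATURE_GROUPS, size)
]

for combo in feature_sets:
    print(" + ".join(combo))
```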

Page 46: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Setup

• Evaluation of the effectiveness of LOD-based features across six different recommendation algorithms
• Vector Space Models
  • VSM
  • BM25
  • eVSM (*)
• Classifiers
  • Random Forests
  • Linear Regression
• Graph-based Approaches
  • PageRank with Priors

(*) C. Musto: Enhanced vector space models for content-based recommender systems. RecSys 2010: 361-364

Page 47: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: Vector Space Models

User profile (built upon the features describing the items the user liked) used as query

Cosine Similarity to get the most similar items
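A minimal sketch of this design, assuming raw feature counts as weights (the slides do not specify the weighting scheme, so TF-IDF or BM25 weighting is left out). The helper names `build_profile`, `cosine` and `top_n`, and the toy catalogue, are hypothetical.

```python
import math
from collections import Counter


def build_profile(liked_items, item_features):
    """Sum the feature counts of the liked items into one profile vector."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_features[item])
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(weight * b.get(feature, 0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(profile, candidates, item_features, n):
    """Rank candidate items by similarity to the profile and keep the best n."""
    scored = [(cosine(profile, Counter(item_features[i])), i) for i in candidates]
    return [item for _, item in sorted(scored, reverse=True)[:n]]


# Toy catalogue with made-up features.
catalogue = {
    "b1": ["fantasy", "barker"],
    "b2": ["fantasy", "barker", "horror"],
    "b3": ["cooking"],
}
profile = build_profile(["b1"], catalogue)
recommended = top_n(profile, ["b2", "b3"], catalogue, n=1)
print(recommended)
```

The same functions work unchanged whether the features are plain keywords, LOD properties, or linked entities, which is what makes the seven feature sets directly comparable.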

Page 48: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: Classifiers

Random Forests learn a classification model which is used to predict the class (positive/negative) of unlabeled items. The model is based on the features coming from labeled items.

Linear Regression also uses “basic” features (e.g. positive and negative ratings, average rating of the user, ratio between positive and negative ratings, etc.) to learn the model.

Page 49: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)

Graph-based representation: users, items = nodes; positive feedback = edges

PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections. Equal probability is assigned to all the nodes, by default.

Page 50: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)

Graph-based representation: users, items = nodes; positive feedback = edges

PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections

PageRank with Priors introduces a bias towards some nodes (in our setting, the items the user liked)
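The bias can be sketched with a plain power iteration in which the random jump follows a prior vector instead of a uniform one. This is an illustrative implementation, not the authors' code: the damping factor 0.85, the iteration count, the equal prior weight on each liked item, and the function name `pagerank_with_priors` are all assumptions.

```python
def pagerank_with_priors(edges, priors, damping=0.85, iterations=50):
    """Power iteration where the random jump follows `priors` instead of a uniform vector."""
    neighbours = {}
    for a, b in edges:  # undirected graph: feedback links user and item both ways
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = list(neighbours)

    # Normalise the priors into a probability distribution over the graph nodes.
    total = sum(priors.get(n, 0.0) for n in nodes)
    jump = {n: priors.get(n, 0.0) / total for n in nodes}

    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * jump[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbours[n])
            for m in neighbours[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy graph: u1 liked i1 and i2, u2 liked i2 and i3.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")]
# Bias the walk towards the items u1 liked (equal weights are an assumption).
rank = pagerank_with_priors(edges, priors={"i1": 1.0, "i2": 1.0})
print(sorted(rank.items(), key=lambda kv: kv[1], reverse=True))
```

With uniform priors this reduces to ordinary PageRank; concentrating the prior on one user's liked items makes the resulting scores specific to that user.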

Page 51: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)

Several strategies to build the graph are compared:

1. no-LOD. Graph only models users and items
2. small-LOD. Graph expanded with new nodes by adding basic properties (subject, genre, publisher, author, etc.) of the items, as well as their relationships
3. big-LOD. Graph is further expanded by introducing more nodes (e.g. other resources of the same genre, other resources written by the authors, etc.), as well as their relationships

Rationale: the introduction of new nodes and connections coming from the LOD cloud can improve the effectiveness of PageRank.
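The difference between the first two strategies can be sketched as an edge-list builder. The function `build_graph` and the sample identifiers are hypothetical; the big-LOD variant would add further edges in the same way, from each property value to the other resources that share it.

```python
def build_graph(feedback, item_properties=None):
    """Return undirected edges; with `item_properties`, add one node per property value."""
    edges = [(user, item) for user, item in feedback]  # no-LOD: users and items only
    if item_properties:  # small-LOD style expansion
        for item, values in item_properties.items():
            edges.extend((item, value) for value in values)
    return edges


# Made-up positive feedback and LOD properties.
feedback = [("u1", "book:A"), ("u2", "book:B")]
properties = {"book:A": ["author:X"], "book:B": ["author:X"]}

no_lod = build_graph(feedback)
small_lod = build_graph(feedback, properties)
print(len(no_lod), len(small_lod))
```

In the no-LOD graph the two books are disconnected, while in the expanded graph they are linked through the shared author node, so a random walk starting from one can reach the other.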

Page 52: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)

PRP is run and items in the test set are ranked according to their PageRank

Page 53: Linked Open Data-enabled Strategies for Top-N Recommendations

Experimental Evaluation
Recap

6 algorithms:
• VSM
• BM25
• eVSM
• Linear Regression
• Random Forests
• PageRank with Priors

7 sets of features:
• Content
• LOD
• Entity Linking
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All

Page 54: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features.

Page 55: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL

Feature set        F1      vs. CONTENT
CONTENT            54,42
LOD                53,79
ENTITY             54,62
CONTENT+LOD        54,59   +0,17
CONTENT+ENTITY     54,47   +0,05
LOD+ENTITY         54,69
ALL                54,36

LOD-based features improve F1-measure

Page 56: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL

Statistically significant improvement: +0,17 and +0,05

paired t-test (p<0.01)
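For reference, the statistic behind a paired t-test can be computed from the per-pair differences between two configurations. The slides do not say what the pairs are (per user, per fold, or per run), so the scores below are invented purely to show the calculation, and `paired_t_statistic` is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, variant):
    """t statistic of the per-pair differences; compare with a t table at n-1 degrees of freedom."""
    diffs = [v - b for b, v in zip(baseline, variant)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Invented paired scores for two feature sets (not the paper's data).
baseline_scores = [54.0, 54.5, 54.2, 54.8, 54.4]
variant_scores = [54.3, 54.6, 54.5, 54.9, 54.8]

t = paired_t_statistic(baseline_scores, variant_scores)
print(f"t = {t:.3f} with {len(baseline_scores) - 1} degrees of freedom")
```

Pairing matters because both configurations are evaluated on the same units, so the test looks at the consistency of the differences rather than at the overall spread of the scores.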

Page 57: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL

Best: LOD+Entity Linking (No Content!): +0,27

paired t-test (p<0.01)

Page 58: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: BM25

Feature set        F1
CONTENT            54,43
LOD                53,43
ENTITY             53,9
CONTENT+LOD        54,56
CONTENT+ENTITY     54,51
LOD+ENTITY         53,91
ALL                54,6

Worst (again): LOD alone (-1,00%)

Page 59: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: BM25

Best (again): LOD+Entity Linking (With Content!): +0,17

paired t-test (p<0.01)

Page 60: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: EVSM

Feature set        F1      vs. CONTENT
CONTENT            52,9
LOD                52,06
ENTITY             53,37   +0,47
CONTENT+LOD        52,8
CONTENT+ENTITY     53,07   +0,17
LOD+ENTITY         53,04   +0,14
ALL                53,02   +0,12

Introduction of LOD-based features leads to an improvement again

paired t-test (p<0.01)

Page 61: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR VSMS

VSM, BM25, eVSM

1. LOD features alone are always the worst configuration.
2. At least one LOD-based representation built on Entity Linking always improves over the content alone.

Page 62: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1
Impact of LOD-based features :: RANDOM FORESTS

Feature set        F1
CONTENT            53,52
LOD                53,34
ENTITY             53,68
CONTENT+LOD        53,75
CONTENT+ENTITY     53,76
LOD+ENTITY         53,77
ALL                53,86

Similar outcomes: all but LOD alone lead to improvement (+0,36)

Page 63: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: RANDOM FORESTS

Content does matter: LOD+entity+content is the best

Page 64: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: LINEAR REGRESSION

Entity-based representation is the best one

Bar chart (axis 55 to 56): CONTENT 55,59; LOD 55,59; ENTITY 55,67; CONTENT+LOD 55,5; CONTENT+ENTITY 55,64; LOD+ENTITY 55,61; ALL 55,57

Gain of ENTITY over CONTENT: +0,08; paired t-test (p<0.01)

Page 65: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: LINEAR REGRESSION

BTW, smaller improvements (due to basic features?)

Page 66: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: LESSONS LEARNED FOR CLASSIFIERS

LR, RF

1. LOD features alone never overcome the content.

2. (At least) a LOD-based representation based on Entity Linking always improves on the content alone.

Page 67: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: LESSONS LEARNED FOR CLASSIFIERS

Same outcomes (algorithm-independent behaviour)

Page 69: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: PAGERANK WITH PRIORS

The more LOD-based data, the better the accuracy

Bar chart (axis 53 to 57): NO-LOD 54,28; SMALL-LOD 54,73; BIG-LOD 55,44

Gains over NO-LOD: SMALL-LOD +0,45; BIG-LOD +1,16

paired t-test (p<0.001)

Page 70: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 1

Impact of LOD-based features :: PAGERANK WITH PRIORS

Drawback: more nodes produce an exponential growth of computational costs (from 3 hours to 120 hours to run the experiment!)
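As a rough illustration of why extra LOD nodes both help and cost more, the sketch below runs PageRank with priors by power iteration on a tiny undirected graph. It is not the authors' implementation: the graph, the node names, the damping factor and the prior placed entirely on one liked item are all assumptions made for the example.

```python
def pagerank_with_priors(graph, priors, damping=0.85, iterations=50):
    """Power iteration where the teleport step follows the given prior weights.

    `graph` maps each node to a non-empty list of neighbours.
    `priors` maps nodes to non-negative weights; missing nodes get weight 0.
    """
    nodes = list(graph)
    total = sum(priors.get(n, 0.0) for n in nodes)
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    scores = dict(prior)
    for _ in range(iterations):
        # Teleport mass goes to the prior nodes instead of being uniform.
        new_scores = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            share = damping * scores[n] / len(graph[n])
            for neighbour in graph[n]:
                new_scores[neighbour] += share
        scores = new_scores
    return scores


# Hypothetical graph: a user, three items and two property nodes
# standing in for features extracted from the LOD cloud.
graph = {
    "user": ["item_a"],
    "item_a": ["user", "genre_x"],
    "genre_x": ["item_a", "item_b"],
    "item_b": ["genre_x", "genre_y"],
    "genre_y": ["item_b", "item_c"],
    "item_c": ["genre_y"],
}

scores = pagerank_with_priors(graph, priors={"item_a": 1.0})
ranking = sorted(["item_b", "item_c"], key=scores.get, reverse=True)
print(ranking)
```

Without the property nodes, item_b and item_c would not be reachable from the liked item at all. Every node and edge added to the graph also adds work to each iteration, which is in line with the running-time drawback noted on the slide.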

Page 71: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 2

Comparison to state of the art

SPRANK (Semantic Path Ranking) [*]
BPRMF (Bayesian Personalized Ranking) [+]
U2U_CF (User to User CF)
I2I_CF (Item to Item CF)

[*] V. Ostuni, T. Di Noia, E. Di Sciascio, R. Mirizzi: Top-N recommendations from implicit feedback leveraging Linked Open Data. RecSys 2013.

[+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.

Page 72: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 2

Comparison to state of the art

Our best-performing configurations are considered as baseline

Bar chart (axis 51 to 56): VSM 54,69; LR 55,67; PRP 55,44; SPRANK 52,27; BPRMF 54,12; U2U_CF 52,28; I2I_CF 52,24

Chart annotation: baselines

Page 73: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 2

Comparison to state of the art

Classical CF techniques perform poorly (sparsity?)

Page 74: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 2

Comparison to state of the art

+3,4% over LOD-based state-of-the-art algorithm
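The arithmetic behind this gain can be checked with a short sketch. It assumes the chart values map to the labels as VSM 54,69; LR 55,67; PRP 55,44; SPRANK 52,27; BPRMF 54,12; U2U_CF 52,28; I2I_CF 52,24, a mapping inferred from the deltas reported on the slides rather than stated in them. Under that assumption the gain is a difference in percentage points between the best configuration and each baseline.

```python
# Scores as read from the chart (decimal commas written as points).
# The label-to-value mapping is an assumption, see the note above.
ours = {"VSM": 54.69, "LR": 55.67, "PRP": 55.44}
baselines = {"SPRANK": 52.27, "BPRMF": 54.12, "U2U_CF": 52.28, "I2I_CF": 52.24}

# Pick our best configuration and compare it with every baseline.
best_name = max(ours, key=ours.get)
gains = {name: round(ours[best_name] - score, 2) for name, score in baselines.items()}

for name, gain in gains.items():
    print(f"{best_name} vs {name}: +{gain:.2f} points")
```

Under this reading, the 3,4 reported on the slide corresponds to LR against SPRANK.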

Page 75: Linked Open Data-enabled Strategies for Top-N Recommendations

Experiment 2

Comparison to state of the art

Our approaches overcome Matrix Factorization

Chart annotations: +0,57; +0,32; +1,55

Page 76: Linked Open Data-enabled Strategies for Top-N Recommendations

Conclusions

Page 77: Linked Open Data-enabled Strategies for Top-N Recommendations

Lessons Learned

INVESTIGATION ABOUT THE EFFECTIVENESS OF LINKED OPEN DATA IN CONTENT-BASED RECOMMENDATION TASKS

Two solutions have been proposed.
Direct access to DBpedia and Entity Linking algorithms

Evaluation.
Research question: What is the impact of LOD-based features on VSM, classifiers and graph-based algorithms?
All recommendation approaches significantly benefit from the introduction of LOD-based features
Our best-performing configurations overcome both collaborative and LOD-based state-of-the-art algorithms

Page 78: Linked Open Data-enabled Strategies for Top-N Recommendations

Future Research

Evaluation against different datasets and stronger baselines;

Better (automatic) tuning of parameters and integration of more LOD-based data sources;

Evaluation of Novelty, Diversity and Serendipity on LOD-based Recommendations

Page 79: Linked Open Data-enabled Strategies for Top-N Recommendations

Questions?

Cataldo Musto, Ph.D

[email protected]