Similarity on DBpedia

UIMR

PhD student: Samantha Lam

Supervisor: Conor Hayes

Similarity

How similar are the following films:

(Unsatisfactory) answer: it depends!

DBpedia Graph

Films - nodes - on DBpedia.

Some things about DBpedia:

Big, rich, dense Knowledge Base

→ 3.77m nodes, 400m edges (EN)

Lots of prior work (as we shall see...)

But very heterogeneous - vocabularies, categories

It is a graph

Similarity in general

Cognitive Science - Tversky (1977) - psychology - featural.

E.g. film: genre, language, director

Modelling of human thought and semantic relations: how do we relate things to each other? (Collins & Quillian 1969)
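The featural view can be sketched with Tversky's ratio model, which weighs shared features against features unique to each item. The feature labels and the weights alpha and beta below are illustrative assumptions, not values from the talk.

```python
def tversky_index(a, b, alpha=0.5, beta=0.5):
    """Tversky ratio model over two feature sets.

    alpha and beta weight the features found only in a and only in b.
    alpha = beta = 1 gives the Jaccard index, alpha = beta = 0.5 the Dice coefficient.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Hypothetical film features (genre, language, director), for illustration only.
film_x = {"genre:animation", "language:Japanese", "director:D1"}
film_y = {"genre:animation", "language:Japanese", "director:D2"}
film_z = {"genre:science fiction", "language:English", "director:D3"}

print(tversky_index(film_x, film_y))  # two of three features shared
print(tversky_index(film_x, film_z))  # no features shared
```

Which features to include, and how to weight them, is exactly where the "it depends" answer comes from.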

Semantic

The notion of semantic networks is derived from the hierarchical semantic memory model [Collins & Quillian, 1969]

Semantic Similarity

Different techniques:

Word frequency: Latent semantic analysis (doesn’t actually use semantic net structure)

Rada (1989) - average shortest path length

Resnik (1999) - information content of the least common subsumer (LCS)

Unfortunately...

Word frequency N/A

Often assume a hierarchical/tree structure of the taxonomy/ontology. (Both Rada and Resnik assume the taxonomy is an is-A hierarchy)
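The two taxonomy-based measures above can be sketched on a toy is-A tree. The concepts and the corpus counts are hypothetical, and the path measure is shown for a single pair of concepts rather than averaged over sets of concepts.

```python
import math

# Hypothetical is-A taxonomy: child -> parent. "work" is the single root.
PARENT = {
    "film": "work",
    "book": "work",
    "anime film": "film",
    "science fiction film": "film",
    "novel": "book",
}

# Hypothetical corpus counts per concept (needed for information content).
COUNTS = {
    "work": 0, "film": 2, "book": 1,
    "anime film": 3, "science fiction film": 3, "novel": 1,
}


def ancestors(concept):
    """Chain from the concept up to the root, the concept itself included."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def least_common_subsumer(a, b):
    """Deepest concept that subsumes both a and b (assumes a single-rooted tree)."""
    b_chain = set(ancestors(b))
    for concept in ancestors(a):
        if concept in b_chain:
            return concept
    return None


def path_length(a, b):
    """Path-based distance: number of is-A edges between a and b."""
    lcs = least_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def information_content(concept):
    """-log p(concept), where p counts the concept and all its descendants."""
    total = sum(COUNTS.values())
    subtree = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subtree / total)


def resnik(a, b):
    """Information-content similarity: IC of the least common subsumer."""
    return information_content(least_common_subsumer(a, b))


print(path_length("anime film", "science fiction film"))  # 2 edges, via "film"
print(path_length("anime film", "novel"))                 # 4 edges, via "work"
print(resnik("anime film", "science fiction film"))       # IC of "film"
```

Both functions climb a single parent chain, which is why they break down once a concept has several parents or the links are not is-A links.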

Semantic Similarity

Remember, DBpedia is not as ‘neat’:

(Image source: http://www.visualdataweb.org/relfinder/)

On DBpedia/Wikipedia

Recent applications:

Gabrilovich & Markovitch (2007) - Explicit Semantic Analysis (ESA): express text as a weighted vector of Wikipedia articles

Milne & Witten (2008) - the Wikipedia Link-based Measure - similarity of neighbours

Passant (2010) - Linked Data Semantic Distance ← uses paths!

Mirizzi et al. (2012) - use DBpedia for movie recommendation with a Vector Space Model
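The vector-based approaches in this list compare items as weighted vectors. A minimal sketch, assuming cosine similarity as the comparison and using made-up article weights:

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {dimension: weight}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical weights of Wikipedia articles for three pieces of text.
text_a = {"Anime": 0.8, "Film": 0.6}
text_b = {"Film": 0.6, "Cyberpunk": 0.8}
text_c = {"Cooking": 1.0}

print(cosine(text_a, text_b))  # overlap only on "Film"
print(cosine(text_a, text_c))  # no shared dimension
```

Note that this comparison never looks at how the articles are linked to each other, which is the gap the path-based measure addresses.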

Similarity

Important:

Properties can be related to each other

(Figure labels: link type 1, e.g. influenced; node type 1, e.g. director; link type 2, e.g. collaborated with; node type 2, e.g. film)

Network Similarity

Social Network Analysis

Established field - notions of influence, centrality, rank etc.

Often applied to small networks

Note: Ranking is often based on similarity

Network Similarity

Homogeneous network measures:

PageRank - Brin & Page (1998) - random surfer with teleportation

SimRank - Jeh & Widom (2002) - iteratively ‘inherits’ the similarity of neighbours

σact - Thiel & Berthold (2010) - node similarities from spreading activation with a decay factor
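As an illustration of how a homogeneous measure propagates scores, here is a minimal SimRank sketch. The graph, the decay constant of 0.8 and the fixed iteration count are assumptions chosen for the example.

```python
def simrank(in_neighbours, decay=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbours maps every node to the list of nodes that link to it.
    Two distinct nodes are similar if the nodes pointing at them are similar.
    """
    nodes = list(in_neighbours)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbours[a], in_neighbours[b]
                if not ia or not ib:
                    # Nodes without in-links have no evidence of similarity.
                    new[a][b] = 0.0
                    continue
                total = sum(sim[x][y] for x in ia for y in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical graph: "hub" links to a and b, "other" links to c.
graph = {"hub": [], "other": [], "a": ["hub"], "b": ["hub"], "c": ["other"]}
scores = simrank(graph)
print(scores["a"]["b"])  # a and b share their only in-neighbour
print(scores["a"]["c"])  # a and c have unrelated in-neighbours
```

Every link is treated the same way here, regardless of what kind of relation it stands for.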

Network Similarity

Heterogeneous network measures:

PathSim - Sun & Han (2009) - count instances of a ‘meta-path’ (specific link pattern)
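A minimal sketch of the meta-path idea, assuming a symmetric film-category-film meta-path and the usual PathSim normalisation (twice the number of path instances between x and y, divided by the path instances from each item back to itself). The film and category names are placeholders.

```python
def path_count(links, x, y):
    """Number of film-category-film path instances between x and y."""
    return sum(n * links[y].get(c, 0) for c, n in links[x].items())


def pathsim(links, x, y):
    """PathSim for a symmetric meta-path: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = path_count(links, x, x) + path_count(links, y, y)
    return 2 * path_count(links, x, y) / denom if denom else 0.0


# Hypothetical film -> {category: number of links} data.
links = {
    "film_a": {"cat_1": 1, "cat_2": 1},
    "film_b": {"cat_2": 1, "cat_3": 1},
    "film_c": {"cat_4": 1},
}

print(pathsim(links, "film_a", "film_b"))  # one shared category out of two each
print(pathsim(links, "film_a", "film_c"))  # no shared category
```

The meta-path is fixed by hand in this sketch; choosing or learning it is the open part.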

Network Similarity

Applicability to DBpedia:

PageRank, SimRank - N/A - assume homogeneous links!

Spreading Activation - possible with constraints

Apply PathSim - but how to learn such meta-paths?

Another idea:

Count node-disjoint paths.

Why? View each path as one distinct ‘reason’.
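Counting node-disjoint paths can be sketched as a unit-capacity maximum-flow problem in which every node is split into an "in" and an "out" half, so that each intermediate node is used by at most one path. The graph below is hypothetical and undirected; this is one possible way to do the counting, not necessarily the procedure used in the trial.

```python
from collections import defaultdict, deque


def count_node_disjoint_paths(edges, source, target):
    """Maximum number of paths from source to target sharing no intermediate node."""
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities

    def add_arc(u, v):
        cap[u][v] += 1
        cap[v][u] += 0  # make sure the reverse arc exists in the residual graph

    nodes = {n for edge in edges for n in edge}
    for n in nodes:
        add_arc((n, "in"), (n, "out"))  # each node may be used once
    for u, v in edges:
        add_arc((u, "out"), (v, "in"))  # undirected edge: both directions
        add_arc((v, "out"), (u, "in"))

    start, goal = (source, "out"), (target, "in")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {start: None}
        queue = deque([start])
        while queue and goal not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if goal not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = goal
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


# Hypothetical graph: film_a and film_b share a genre and a director;
# the route via studio_z reuses genre_x, so it adds no further 'reason'.
edges = [
    ("film_a", "genre_x"), ("genre_x", "film_b"),
    ("film_a", "director_y"), ("director_y", "film_b"),
    ("film_a", "studio_z"), ("studio_z", "genre_x"),
    ("film_c", "studio_z"),
]

print(count_node_disjoint_paths(edges, "film_a", "film_b"))  # two independent routes
print(count_node_disjoint_paths(edges, "film_c", "film_b"))  # everything goes through studio_z
```

The sketch puts no bound on path length and ignores link types; on a graph as dense as DBpedia both would probably need constraining.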

Similarity

          Totoro   GITS   Matrix
Totoro        44      1        0
GITS           1     35        2
Matrix         0      2       58

Totoro – GITS:

Category:Anime films

GITS – Matrix:

Category:Brain-computer interfacing in fiction

Matrix → Category:The Matrix (franchise) → Category:Media franchises ← GITS

Similarity

How similar are the following films:

Answer: it still depends - on the path you take

Summary

Similarity, useful concept in many areas, hard to define

how are films similar?

DBpedia, richly linked KB

film information available here

→ Problem: How to define similarity on DBpedia?

Past methods - don’t exploit linkedness

Network analysis methods can aid this

test trial with node-disjoint paths: GITS is more similar to Matrix than to Totoro

Ongoing/Future Work

Mining DBpedia as Network

Analyse structured and related data

Similarity as complement to – reasoning, retrieval, querying

Also useful in NLP, recommender systems, knowledge discovery

→ Examples: work we do in UIMR

Ioana Hulpus (2011/2012)

Graph-based topic analysis with the support of Linked Data

Benjamin Heitmann (2011/2012)

Spreading activation for cross-domain recommendation

Challenges/Discussion

Challenges:

Topology of DBpedia graph

Standard SNA measures for homogeneous networks, e.g. density, degree distribution - how to apply to DBpedia?

What does a path actually mean?

Which subgraphs to use?

How do metrics vary with different subgraphs, e.g. different ontologies/categories?

Scalability (not a problem, but a challenge)

Evaluation - how do we confirm something is similar?

Thanks for listening! Questions/Suggestions?
