Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence



DESCRIPTION

Slides for our final paper

TRANSCRIPT

Page 1: Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence

Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence

Sean Golliher, Nathan Fortier, Logan Perreault

December 12, 2013

Page 2

Property Matching Problem

Databases with different properties:

Page 3

def: Query Expansion

Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations.
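As a minimal sketch of what query expansion could look like on linked data, assume a seed predicate has already been matched to equivalent predicates in other datasets. The hypothetical helper `expand_query` below then rewrites a single triple pattern into a SPARQL `UNION` over the seed predicate and its matches. The function name and the example IRIs are illustrative assumptions, not part of the original work.

```python
def expand_query(seed_predicate, matched_predicates):
    """Return a SPARQL SELECT that unions the seed predicate with its matches.

    Hypothetical helper: predicate IRIs are plain strings; duplicates are
    dropped while preserving order, so the seed pattern always comes first.
    """
    predicates = []
    for p in [seed_predicate, *matched_predicates]:
        if p not in predicates:
            predicates.append(p)
    patterns = ["{ ?s <%s> ?o }" % p for p in predicates]
    return "SELECT ?s ?o WHERE { " + " UNION ".join(patterns) + " }"


# Hypothetical IRIs standing in for an "actor" property in DB1 and its match in DB2.
print(expand_query("http://example.org/db1/actor",
                   ["http://example.org/db2/starring"]))
```

Each matched predicate adds one alternative graph pattern, so results from both datasets come back under the same `?s` and `?o` variables.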

Page 4

Societal Cloud

Page 5

Cloud Diagram (TRIZ Problem Solving)

Page 6

Cloud Diagram Broken

Page 7

Property Matching Problem

How do we find all actors in both databases?

Don’t want to manually inspect all databases

Can we use the SPARQL query language to infer across all datasets?

SELECT ?p WHERE { s ?p o }

We can only compare the total sizes of the returned triple sets
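To illustrate why size-only comparison is weak, the sketch below counts triples per predicate (a rough in-memory analogue of a SPARQL `GROUP BY ?p` count) in two tiny hypothetical datasets. It then keeps every DB2 predicate whose count equals that of the seed predicate. The data and helper names are invented for illustration.

```python
from collections import Counter


def predicate_counts(triples):
    """Triples per predicate, like SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p."""
    return Counter(p for _, p, _ in triples)


def size_only_matches(p1, db1, db2):
    """DB2 predicates whose triple count equals the count of p1 in DB1."""
    target = predicate_counts(db1)[p1]
    return sorted(p for p, n in predicate_counts(db2).items() if n == target)


# Hypothetical toy datasets (subject, predicate, object).
db1 = [("m1", "actor", "a1"), ("m2", "actor", "a2")]
db2 = [("f1", "starring", "x1"), ("f2", "starring", "x2"),
       ("f1", "director", "y1"), ("f2", "director", "y2")]
print(size_only_matches("actor", db1, db2))  # ['director', 'starring']
```

Both `director` and `starring` tie with `actor`, so counts alone cannot tell which predicate actually links to actors.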

Page 8

Original Bayesian Approach

Problems with Bayesian Approach

Had to create, and track, a large vocabulary for training

Smoothing issues with very sparse text

Underflow issues: small confidence values

Complexity of likelihood was growing: n different features in feature set X and c classes + tunable parameters.
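The underflow problem can be reproduced in a few lines. This sketch uses a made-up likelihood of 1e-3 for each of 400 features. It multiplies the per-feature likelihoods directly and, for comparison, sums their logarithms, which is the usual remedy in naive Bayes implementations. The numbers are hypothetical and are not taken from the authors' classifier.

```python
import math


def naive_likelihood(probs):
    """Multiply per-feature likelihoods directly, as a naive implementation would."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_likelihood(probs):
    """The same quantity on a log scale; a sum of logs does not underflow here."""
    return sum(math.log(p) for p in probs)


probs = [1e-3] * 400  # hypothetical: 400 features, each with likelihood 0.001
print(naive_likelihood(probs))  # 0.0, since 1e-1200 is below the smallest double
print(log_likelihood(probs))    # about -2763.1
```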

Page 9

KL-Divergence

Original paper from 1951 entitled “On Information and Sufficiency”

Also referred to as “relative entropy”

A system gains entropy when it moves to a state with more possible arrangements, for example, from a liquid to a gas.

Used in a 2003 paper for text categorization: “Using KL-Distance for Text Categorization”

Elegant and efficient method for plagiarism detection

Page 10

KL-Divergence

Measure of divergence of information between two distributions:

$$D(P \,\|\, Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

Not symmetric: in general, D(P ‖ Q) ≠ D(Q ‖ P)
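Here is a direct Python transcription of the definition above. It assumes discrete distributions given as aligned probability lists and uses the natural logarithm, since the slides do not fix the base. Evaluating it in both directions on a two-outcome example shows the asymmetry.

```python
import math


def kl_divergence(p, q):
    """D(P || Q) for aligned discrete distributions, using the natural log."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qx == 0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # about 0.511
print(kl_divergence(q, p))  # about 0.368
```

The infinite result when Q(x) = 0 while P(x) > 0 is the same problem the ε substitution addresses in the worked example.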

Page 11

KL-Divergence Example

Page 12

KL-Divergence Example

Table: Generic vocabularies generated by fixing on predicates

d1 d2 d3
subject1 subject3 subject1
object1 object4 object1
object2 object2
subject2 subject2 subject4
object3 object3 object3
object3

ex:

$$D(d_1 \,\|\, d_2) = \frac{1}{5} \log \frac{1/5}{0} + \frac{1}{5} \log \frac{1/5}{0} + \cdots + \frac{2}{5} \log \frac{2/5}{1/4}$$

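The normalized frequency of subject1 is 1/5 in d1 and 0 in d2, so the first term, (1/5) log((1/5)/0), is undefined; for now we substitute a small ε value for the zero.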
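Here is a small sketch of the ε workaround. It assumes documents are plain token lists, P(x) is the normalized term frequency, and the sum runs over the terms of d1 as in the example above. The value of `EPSILON` and the sample documents are hypothetical. Substituting ε means the smoothed d2 no longer sums exactly to 1, so this is a heuristic rather than a proper smoothing scheme.

```python
import math
from collections import Counter

EPSILON = 1e-6  # hypothetical value; the slides do not fix ε


def term_probabilities(tokens):
    """Normalized term frequencies: count of each term divided by document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def kl_with_epsilon(d1, d2, eps=EPSILON):
    """D(d1 || d2) summed over the terms of d1; terms absent from d2 get probability eps."""
    p = term_probabilities(d1)
    q = term_probabilities(d2)
    return sum(px * math.log(px / q.get(term, eps)) for term, px in p.items())


# Hypothetical token lists (not the table above).
d1 = ["actor", "actor", "film", "award"]
d2 = ["actor", "film", "film", "director"]
print(kl_with_epsilon(d1, d2))  # finite, dominated by "award" being missing from d2
```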

Page 13

Algorithm Description

Page 14

Formal Problem Statement

Given:

Two databases DB1 and DB2

A predicate p1 ∈ DB1

An object type S1 such that some triple “s p1 o” exists in DB1, where s ∈ S1

Find a predicate p2 in DB2 that is equivalent to p1

Page 15

High Level Description

Create a document d1 containing the labels of all objects linked by p1

Find an object type S2 ∈ DB2 where S1 is equivalent to S2

For each predicate p2 used by S2, create a document d2 containing the labels of all objects linked by p2

Remove stop words and language tags from d1 and d2

For each document, compute the normalized KL-Divergence, KLD∗(d1, d2)

Return the predicate corresponding to the document with the lowest KL-Divergence
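The document-building and cleaning steps might be sketched as follows. The sketch assumes each database is a list of (subject, predicate, object) tuples, S1 is given as a set of subject identifiers, and object labels are RDF literal strings such as `"Tom Hanks"@en`. The stop-word list, the language-tag regular expression, and the helper names are all hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "in"}  # hypothetical, tiny list
LANG_TAG = re.compile(r"@[A-Za-z]+(?:-[A-Za-z0-9]+)*$")  # e.g. @en, @en-US


def clean_label(label):
    """Drop a trailing language tag, lower-case, tokenize, and remove stop words."""
    label = LANG_TAG.sub("", label.strip())
    tokens = re.findall(r"\w+", label.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_document(triples, predicate, subjects, labels):
    """Cleaned label tokens of every object o with (s, predicate, o) and s in subjects."""
    doc = []
    for s, p, o in triples:
        if p == predicate and s in subjects:
            doc.extend(clean_label(labels.get(o, o)))
    return doc


# Hypothetical data: two films typed as S1, with label literals for their objects.
triples = [("film1", "starring", "p1"), ("film1", "director", "p2"),
           ("film2", "starring", "p3")]
labels = {"p1": '"Tom Hanks"@en', "p2": '"Robert Zemeckis"@en', "p3": '"The Rock"@en'}
print(build_document(triples, "starring", {"film1", "film2"}, labels))  # ['tom', 'hanks', 'rock']
```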

Page 16

Algorithm 1 FindPredicate(DB1, DB2, p1, S1)

Create document d1 containing labels of all objects linked by p1
Find an object type S2 ∈ DB2 where S1 is equivalent to S2
for each predicate pi used by S2 do
    Create document di containing labels of all objects linked by pi
end for
Remove stop words and language tags from d1 and each di
min ← 1
for each predicate pi used by S2 do
    k ← KLD∗(d1, di)
    if k < min then
        min ← k
        pmap ← pi
    end if
end for
return pmap
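Algorithm 1's selection loop can be written as a plain argmin with a cutoff. This sketch assumes the candidate documents di have already been built and cleaned, and are passed in as a dictionary keyed by predicate. The divergence function is a parameter so that KLD∗ (or any stand-in) can be plugged in. `threshold=1.0` mirrors the slide's `min ← 1`, so the function returns `None` when no candidate scores below 1. The Jaccard distance in the usage example is only a placeholder, not the method from the slides.

```python
def find_predicate(d1, candidate_docs, divergence, threshold=1.0):
    """Return the predicate whose document is least divergent from d1.

    candidate_docs maps each predicate pi used by S2 to its cleaned document di.
    A predicate scoring >= threshold is never selected (mirrors min <- 1).
    """
    best_score = threshold
    p_map = None
    for p_i, d_i in candidate_docs.items():
        k = divergence(d1, d_i)
        if k < best_score:
            best_score = k
            p_map = p_i
    return p_map


def jaccard_distance(a, b):
    """Placeholder divergence for the demo: 1 - |A and B| / |A or B| on term sets."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


# Hypothetical cleaned documents.
d1 = ["tom", "hanks", "meg", "ryan"]
candidates = {
    "starring": ["tom", "hanks", "meg", "ryan", "rock"],
    "director": ["robert", "zemeckis"],
}
print(find_predicate(d1, candidates, jaccard_distance))  # starring
```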

Page 17

Computing KL-Divergence

KL-Divergence is computed as

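$$KLD(d_i, d_j) = \sum_{k \in V} \big(P(t_k, d_i) - P(t_k, d_j)\big) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)} \qquad (1)$$

Where V is the vocabulary of terms and

$$P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{x \in d_i} tf(t_x, d_i)} \qquad (2)$$

If $t_k$ does not occur in $d_i$ then $P(t_k, d_i) \leftarrow \varepsilon$

KL-Divergence is then normalized as follows:

$$KLD^*(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)} \qquad (3)$$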
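Equations (1) to (3) translate into Python roughly as follows. This rests on several assumptions. Documents are token lists, and tf is a plain term count (Algorithm 2's similarity-based tf could be substituted). V is the union of the two documents' terms, and `EPSILON` is a hypothetical value. KLD(di, 0) is read as the divergence of di from an empty document in which every term falls back to ε. Eq. (1) sums both directions of the KL-Divergence, so unlike D(P ‖ Q) it is symmetric in di and dj. The normalization in eq. (3) is what makes KLD∗ direction-dependent.

```python
import math
from collections import Counter

EPSILON = 1e-6  # hypothetical; the slides leave ε unspecified


def probabilities(tokens):
    """Eq. (2): tf(t, d) over the total term count of d, using plain counts as tf."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def kld(p, q, eps=EPSILON):
    """Eq. (1) over V = terms of both documents; missing terms get probability eps."""
    total = 0.0
    for t in set(p) | set(q):
        p_i = p.get(t, eps)
        p_j = q.get(t, eps)
        total += (p_i - p_j) * math.log(p_i / p_j)
    return total


def kld_star(doc_i, doc_j, eps=EPSILON):
    """Eq. (3): KLD(di, dj) normalized by KLD(di, 0), reading 0 as an empty document."""
    if not doc_i:
        raise ValueError("doc_i must contain at least one term")
    p = probabilities(doc_i)
    q = probabilities(doc_j)
    return kld(p, q, eps) / kld(p, {}, eps)


print(kld_star(["tom", "hanks"], ["tom", "hanks"]))         # 0.0
print(kld_star(["tom", "hanks"], ["robert", "zemeckis"]))   # about 2.0
```

Under this reading, a document that shares no terms with d1 scores above 1 (about 2 for the two disjoint two-term documents above). That would be consistent with Algorithm 1 starting from min ← 1, but the slides do not state this explicitly.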

Page 18

Algorithm 2 tf(tk, di)

tf ← 0
for each term tx in di do
    if sim(tk, tx) > τ then
        tf ← tf + 1
    end if
end for
return tf
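Here is Algorithm 2 in Python, as a sketch. The slides do not say which similarity function sim or which threshold τ was used. `difflib.SequenceMatcher.ratio()` from the standard library and `tau=0.8` are assumptions chosen only to make the example runnable. Near-matches such as "actors" also count toward tf("actor"), so this version of tf tolerates small spelling and inflection differences between labels.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Stand-in string similarity in [0, 1]; the slides do not specify sim."""
    return SequenceMatcher(None, a, b).ratio()


def tf(t_k, d_i, tau=0.8):
    """Algorithm 2: count the terms in d_i whose similarity to t_k exceeds tau."""
    count = 0
    for t_x in d_i:
        if sim(t_k, t_x) > tau:
            count += 1
    return count


print(tf("actor", ["actor", "actors", "film"]))  # 2
```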

Page 19

Experimental Results

Page 20

Experimental Results

Page 21

Experimental Results

Page 22

Experimental Results

Page 23

Experimental Results

Page 24

Experimental Results

Page 25

Questions?
