PhD candidate Andrea Cimmino Improving Link Discovery using context-aware link specifications Supervised by David Ruiz, University of Seville, Spain Carlos R. Rivero, Rochester Institute of Technology, USA LDOW 2016

Page 1: Improving Link Discovery using context-aware link specificationsevents.linkeddata.org/ldow2016/slides/ldow2016-slides... · 2016. 5. 3. · PhD candidate Andrea Cimmino Improving

PhD candidate Andrea Cimmino

Improving Link Discovery using context-aware link specifications

Supervised by David Ruiz, University of Seville, Spain

Carlos R. Rivero, Rochester Institute of Technology, USA

LDOW 2016

Page 2

Hi! My name is Andrea

BARI, SEVILLE, ROCHESTER

Page 3

Roadmap

Problem statement

Results

Future work

Page 4

To be or not to be … the same

DATASET 1, DATASET 2

name: “Wei Wang”

name: “Wei Wang”, email: “[email protected]”

name: “Wei Wang”, email: “[email protected]”

The same?

Page 5

To be or not to be … the same

DATASET 1, DATASET 2

name: “Wei Wang”

full-name: “Wei Wang”, email: “[email protected]”

full-name: “Wei Wang”, email: “[email protected]”

The same?

Link Specification (LSAR): Levenshtein(name, full-name) ≤ 0.42
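
A link specification like LSAR can be read as a yes/no test over a pair of records. The sketch below is only an illustration under stated assumptions: it takes Levenshtein to mean the raw edit distance (the slides do not say whether the distance is normalised), the function names levenshtein and ls_ar are hypothetical, and the e-mail addresses are made-up placeholders because the real ones are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Raw edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def ls_ar(source: dict, target: dict, threshold: float = 0.42) -> bool:
    """Hypothetical LSAR: link when the name distance is within the threshold."""
    return levenshtein(source["name"], target["full-name"]) <= threshold


# Placeholder records (assumed data, mirroring the shape of the slide example).
author = {"name": "Wei Wang"}
researchers = [
    {"full-name": "Wei Wang", "email": "first@example.org"},
    {"full-name": "Wei Wang", "email": "second@example.org"},
]

links = [r["email"] for r in researchers if ls_ar(author, r)]
print(links)  # ['first@example.org', 'second@example.org']
```

Under these assumptions both candidates pass the test, because the names are identical. A name-only rule has no way to tell the two researchers apart.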

Page 6

To be or not to be … the same

[Diagram: Article, Paper and Award nodes; writes, leads and supports edges; LSAR link]

Some publications in common?

Page 10

To be or not to be … the same

[Diagram: Article, Paper and Award nodes; writes, leads and supports edges; LSAR link]

Some publications in common?

1. RDF, OWL
2. ≠ Vocabularies
3. Rule generation
4. Context

Page 11

Overlap Factor

[Diagram: owl:sameAs (LSAR), FOR ALL; owl:sameAs (LSAP), EXISTS]

Context-Aware Link Specification:

FOR ALL Levenshtein(name, full-name) ≤ 0.42 AND
EXISTS Levenshtein(title, title) < 1.20
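
One possible reading of this specification: every pair accepted by the name rule is kept only if at least one pair of publications in their contexts also matches on title. The sketch below follows that reading and is not the authors' implementation. The names ls_ar, ls_ap and cals are hypothetical, Levenshtein is assumed to be the raw edit distance, the assignment of titles to people is invented for illustration, and the overlap factor is not modelled.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Raw edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Assumed toy data: each person carries the titles reachable from it.
authors = {
    "a1": {"name": "Wei Wang",
           "titles": ["Efficient computation …", "Holistic Twig…"]},
}
researchers = {
    "r1": {"full-name": "Wei Wang", "titles": ["Efficient computation …"]},
    "r2": {"full-name": "Wei Wang", "titles": ["Direct Oxidative Conversion…"]},
}


def ls_ar(author: dict, researcher: dict) -> bool:
    """Name rule: Levenshtein(name, full-name) <= 0.42."""
    return levenshtein(author["name"], researcher["full-name"]) <= 0.42


def ls_ap(article_title: str, paper_title: str) -> bool:
    """Title rule: Levenshtein(title, title) < 1.20."""
    return levenshtein(article_title, paper_title) < 1.20


def cals(author: dict, researcher: dict) -> bool:
    """Keep a name match only if some article/paper pair also matches."""
    if not ls_ar(author, researcher):
        return False
    return any(ls_ap(t1, t2)
               for t1, t2 in product(author["titles"], researcher["titles"]))


links = [(a, r) for a, r in product(authors, researchers)
         if cals(authors[a], researchers[r])]
print(links)  # [('a1', 'r1')]
```

With this toy data the name rule alone would link a1 to both r1 and r2; the context check keeps only the pair that shares a publication title.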

Page 12

Applying LSAR

name: “Wei Wang”

The same? full-name: “Wei Wang”, email: “[email protected]”

The same? full-name: “Wei Wang”, email: “[email protected]”

Page 13

Applying LSAR

name: “Wei Wang”

The same? full-name: “Wei Wang”, email: “[email protected]” (owl:sameAs)

The same? full-name: “Wei Wang”, email: “[email protected]” (owl:sameAs)

Legend: wrongly linked, correctly linked

Page 14

Applying CALS

name: “Wei Wang”

The same? full-name: “Wei Wang”, email: “[email protected]”

The same? full-name: “Wei Wang”, email: “[email protected]”

date: “2007”, title: “Efficient computation …”, year: “2007”

title: “Direct Oxidative Conversion…”, date: “2012”

title: “Holistic Twig…”

title: “Efficient computation …”

Page 15

Applying CALS

name: “Wei Wang”

full-name: “Wei Wang”, email: “[email protected]”

full-name: “Wei Wang”, email: “[email protected]”

date: “2007”, title: “Efficient computation …”, year: “2007”

title: “Direct Oxidative Conversion…”, date: “2012”

title: “Holistic Twig…”

title: “Efficient computation …”

owl:sameAs

Legend: wrongly linked, correctly linked

Page 16

Roadmap

Problem statement

Results

Future work

Page 17

Experiments

♦ Scenarios

Scenario 1 – DBLP-NSF

DBLP                 NSF
Author   764         Researcher  235
Article  47,225      Award       235
                     Paper       6,877
owl:sameAs Author ~ Researcher: 188

Scenario 2 – DBLP-DBLP

DBLP
Author   58
Article  5,284
owl:sameAs Author ~ Author: 62

Page 18

DBLP-NSF improving precision

Link Specification (LS1): 0.83
Context-Aware Link Specification (CALS): 1.00

LS1: Jaro(name, full-name) < Threshold

CALS: for all BEST(LS1) and exists Jaro(title, title) < Threshold

Page 19

DBLP-DBLP improving recall

Link Specification (LS1): 0.83
Context-Aware Link Specification (CALS): 1.00

LS1: Jaro(name, name) < Threshold

CALS: for all Jaro(title, title) < Threshold

Page 20

DBLP-NSF GenLink evaluation results

LS for DBLP-NSF
ID      Examples (+/-)   Link
LSN1    (+1, -1)         Author ~ Researcher
LSN5    (+5, -5)         Author ~ Researcher
LSN10   (+5, -5)         Author ~ Researcher
LST1    (+1, -1)         Article ~ Paper
LST5    (+5, -5)         Article ~ Paper
LST10   (+10, -10)       Article ~ Paper

CALS for DBLP-NSF for link Author ~ Researcher
ID                               P      R
for all LSN1 and exists LST1     0.94   1.0
for all LSN5 and exists LST5     1.0    0.38
for all LSN10 and exists LST10   1.0    0.95
Best improvement: 0.24

LS for DBLP-NSF
ID      P      R
LSN1    0.76   1.0
LSN5    0.76   1.0
LSN10   0.76   1.0

Page 21

DBLP-DBLP GenLink evaluation results

LS for DBLP-DBLP
ID      Examples (+/-)   Link
LSN1    (+1, -1)         Author ~ Author
LSN5    (+5, -5)         Author ~ Author
LSN10   (+5, -5)         Author ~ Author
LST1    (+1, -1)         Article ~ Article
LST5    (+5, -5)         Article ~ Article
LST10   (+10, -10)       Article ~ Article

CALS for DBLP-DBLP for link Author ~ Author
ID              P      R
for all LST1    1.00   0.84
for all LST5    1.00   0.84
for all LST10   1.00   0.84
Best impr.: 0.58

LS
ID      P      R
LSN1    1.00   0.26
LSN5    1.00   0.30
LSN10   1.00   0.26
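
For readers less familiar with the P and R columns, the sketch below shows how precision and recall are usually computed from a set of discovered links and a reference set, and then recomputes the two reported improvements from the table values. The function name precision_recall and the toy link sets are assumptions made for illustration; they are not data from the experiments.

```python
def precision_recall(found: set, truth: set) -> tuple:
    """Precision = correct / found, recall = correct / expected."""
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Assumed toy links: a plain LS adds one wrong link, a CALS removes it.
truth = {("a1", "r1"), ("a2", "r2"), ("a3", "r3")}
ls_links = {("a1", "r1"), ("a1", "r9"), ("a2", "r2"), ("a3", "r3")}
cals_links = {("a1", "r1"), ("a2", "r2"), ("a3", "r3")}

print(precision_recall(ls_links, truth))    # (0.75, 1.0)
print(precision_recall(cals_links, truth))  # (1.0, 1.0)

# Improvements recomputed from the values reported in the tables.
print(round(1.00 - 0.76, 2))  # DBLP-NSF, precision: 0.24
print(round(0.84 - 0.26, 2))  # DBLP-DBLP, recall: 0.58
```

The last two lines suggest that the reported best improvements are differences between the CALS value and the lowest LS value for the same measure.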

Page 22

Roadmap

Problem statement

Results

Future work

Page 23

Current work

WWW2017 Australia

[Diagram: Article, Paper and Award nodes; writes, leads, supports, co-leads and co-author edges; LSAR, LSAR1 and LSAR2 links]

Page 24

Future work

DATASETS, SETUP, TECHNIQUES

Experiments and Analyses track

Page 25

Future Future work

[Diagram: Article, Paper and Award nodes; writes, leads, supports, co-leads and co-author edges; owl:sameAs links]

Page 26

Andrea Cimmino [email protected]

http://tdg-seville.info/acimmino

THANKS! Queries?

Page 27

Features

♦ R1: Input RDF, not OWL.
♦ R2: Handle different schemas/vocabularies
♦ R3: Rule based (LS)
♦ R4: Context aware
♦ R5: Efficient context

Page 28

Related Work

Technique R1 R2 R3 R4 R5

[Comparison table; only the “-” and “~” marks survive, without their column positions]

RiMOM: - - -
Nikolov et al.: -
AgreementMaker: - -
GenLink: - -
CODI: - - - -
EAGLE: - -
LOGMAP: - - -
Zhishi.links: - - -
SLINT+: - - -
SignoProsik: ~ - ~
SERIMI: - - -
Song and Heflin: - - -
PARIS: - - - -
Hassanzadeh et al.: - -

Page 29

DBLP-NSF GenLink LS

dblp:name, nsf:name
LSN1    Jaccard ≤ 0.37
LSN5    Jaccard ≤ 0.37
LSN10   Jaccard ≤ 0.21

dblp:title, nsf:title
LST1    Levenshtein ≤ 29.48
LST5    Levenshtein ≤ 0.59
LST10   Levenshtein ≤ 7.05

Page 30

DBLP-DBLP GenLink LS

dblp:name, nsf:name
LSN1    Jaccard ≤ 0.15
LSN5    Levenshtein ≤ 1.48
LSN10   Levenshtein ≤ 1.15

dblp:title, nsf:title
LST1    Levenshtein ≤ 1.76
LST5    Levenshtein ≤ 1.46
LST10   Levenshtein ≤ 1.76

Page 31

Link Specification model

Page 32

Link Specification extended (context)

[Class diagram. Classes: LinkSpecification, C-ALinkSpecification, C-ACondition, ConditionComposite, C-ASameAsCondition, LeafNode. Attributes shown: source: Class, target: Class, f: Aggregation, oF: OverlapFactor, prop: ObjectProperty, dataset: {SRC, TRG}. Multiplicities shown: 2, *]