(Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th, 2016


Page 1: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,

(Open) Information Extraction: Where are we going?

Claudio Delli Bovi
July 18th, 2016


About me

[email protected]

Second-year PhD student

http://wwwusers.di.uniroma1.it/~dellibovi

LCL group @ Sapienza

Advisor: prof. Roberto Navigli

bn:17381128n

Focus (so far): Disambiguation, (Open) Information Extraction


Outline

BabelNet and friends: some background
Research work @ LCL Sapienza

DefIE: OIE from textual definitions
Delli Bovi, Telesca, Navigli: TACL 2015

KB-Unify: KB disambiguation and unification
Delli Bovi, Espinosa-Anke, Navigli: EMNLP 2015


Linguistic Computing Laboratory (LCL) @ Sapienza University of Rome

● Part of the Computer Science Department of Sapienza, focused on Natural Language Processing

● Some projects we have been involved in:

○ MultiJEDI (1.3M €): ERC Starting Grant
○ LIDER (1.5M €): EU CSA
○ Google Focused Research Award (300k $)


http://multijedi.org/


● To the best of our knowledge, the largest multilingual encyclopedic dictionary and semantic network (almost 14M entries in 271 languages and 380M semantic connections)

● Initially created as an integration of Wikipedia and WordNet, now BabelNet is a merger of many different resources (Wiktionary, Wikidata, OmegaWiki, VerbNet, ImageNet, …)


● The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation

● BabelNet is composed of Babel Synsets, concepts or entities lexicalized (“WordNet-style”) in many languages and featuring:

○ is-a relations
○ domain and categories
○ images and definitions
○ translations


BabelNet and friends

Babelfy: A graph-based algorithm for multilingual joint Word Sense Disambiguation and Entity Linking, based on BabelNet

The Wikipedia Bitaxonomy: An iterative algorithm for the automatic creation of a “bitaxonomy” for Wikipedia pages and categories

… and much more!


BabelNet and my research

● BabelNet (especially in its early stages) was conceived as a lexico-semantic resource more than an actual knowledge base:

○ semantic connections are mostly lexical relations from WordNet or unspecified “relatedness edges” derived from Wikipedia hyperlinks

[Figure: unlabeled “semantically related” edges, e.g. between Atom Heart Mother and Pink Floyd, and between Neil Armstrong and NASA]

● Construct from BabelNet a proper knowledge base with labeled relations (X is album by Y, X worked at Y, ... )

● Use Open Information Extraction!


(Open) Information Extraction

OIE is great, but…

Sparsity: many relation phrases express the same relationship (e.g. synonyms, paraphrases)

Ambiguity: arguments (and relation phrases) are ambiguous!


DefIE: OIE from textual definitions

The idea: instead of targeting massive and noisy corpora (like the web) and then trying to find a smart way to cope with the noise, target smaller but “denser” (and virtually noise-free) corpora of definitional knowledge.

Apply OIE techniques to extract as much information as possible!

The tools:

- An underlying inventory/knowledge base (to which arguments and relation patterns will be connected)

- A WSD/EL system (to disambiguate concepts and entity mentions across the input text)

- A syntactic parser (to construct meaningful relation patterns and avoid sparsity)

BabelNet (http://babelnet.org) as the underlying inventory/knowledge base: 14 million entries, both lexicographic and encyclopedic knowledge

Babelfy (http://babelfy.org) as the WSD/EL system: unified graph-based approach to EL and WSD; unsupervised, based on BabelNet

C&C (http://svn.ask.it.usyd.edu.au/trac/candc) as the syntactic parser: log-linear parser and supertagger based on CCG, (theoretically) suited to long-distance dependencies

DefIE: How it works

1. Extracting relation instances

Textual definition d: “Atom Heart Mother is the fifth album by English band Pink Floyd.”

Parsing: the definition is parsed into a dependency graph Gd, with edges labeled subj, comp, mod, det, prep and pobj.

Disambiguation: the sense mappings Sd link mentions in the definition to BabelNet synsets:

Atom Heart Mother: bn:02070902n
Pink Floyd: bn:03292767n
album: bn:00002488n
English: bn:00102248a
band: bn:00008280n

[Figure: syntactic-semantic graph for d, in which each disambiguated mention is a single node annotated with its synset. Nodes: is, Atom Heart Mother (bn:02070902n), album (bn:00002488n), Pink Floyd (bn:03292767n), English (bn:00102248a), band (bn:00008280n), fifth, the, by. Edge labels: subj, comp, mod, det, prep, pobj.]

Two relation instances are extracted from the graph:

Extraction 1: Atom Heart Mother (bn:02070902n), Pink Floyd (bn:03292767n), connected through album (bn:00002488n)

Extraction 2: Atom Heart Mother (bn:02070902n), album (bn:00002488n)
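The extraction step can be sketched in Python under one explicit assumption: that a relation pattern is read off the shortest path between two disambiguated nodes of the graph. The edge list below is a hand-written reconstruction of the example figure, and `shortest_path` and `relation_pattern` are hypothetical helper names, not DefIE's actual code.

```python
from collections import deque

# Hypothetical, hand-written edges (head, label, dependent) for the example
# definition; the exact attachments are an assumption made for illustration.
EDGES = [
    ("is", "subj", "Atom Heart Mother"),
    ("is", "comp", "album"),
    ("album", "mod", "fifth"),
    ("album", "det", "the"),
    ("album", "prep", "by"),
    ("by", "pobj", "Pink Floyd"),
    ("Pink Floyd", "mod", "English"),
    ("Pink Floyd", "mod", "band"),
]


def shortest_path(edges, source, target):
    """Breadth-first search over the graph, ignoring edge direction."""
    neighbours = {}
    for head, _label, dep in edges:
        neighbours.setdefault(head, []).append(dep)
        neighbours.setdefault(dep, []).append(head)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two nodes are not connected


def relation_pattern(edges, left, right):
    """Replace the two arguments with X and Y along the shortest path."""
    path = shortest_path(edges, left, right)
    if path is None:
        return None
    return " ".join(["X"] + path[1:-1] + ["Y"])


pattern_1 = relation_pattern(EDGES, "Atom Heart Mother", "Pink Floyd")
pattern_2 = relation_pattern(EDGES, "Atom Heart Mother", "album")
```

With these toy edges, the first pair yields the pattern "X is album by Y" and the second yields "X is Y", matching the two extractions shown for the example definition.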

Relation instances are grouped by relation; the first arguments of a relation form its domain and the second arguments its range:

R1: ⟨Atom Heart Mother, album⟩, ⟨Pink Floyd, band⟩, ⟨Seattle, city⟩

R2 (pattern through album, bn:00002488n): ⟨Atom Heart Mother, Pink Floyd⟩, ⟨Mutter, Rammstein⟩, ⟨Can’t Get Enough, Barry White⟩
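As a rough illustration of this grouping, the Python sketch below collects the domain and range of each relation from a flat list of instances. The pattern strings used as keys ("X is Y", "X is album by Y") are illustrative labels for R1 and R2, and `group_by_relation` is a hypothetical helper.

```python
from collections import defaultdict

# Toy relation instances: (pattern, left argument, right argument).
# The pattern strings are assumed labels for R1 and R2.
INSTANCES = [
    ("X is Y", "Atom Heart Mother", "album"),
    ("X is Y", "Pink Floyd", "band"),
    ("X is Y", "Seattle", "city"),
    ("X is album by Y", "Atom Heart Mother", "Pink Floyd"),
    ("X is album by Y", "Mutter", "Rammstein"),
    ("X is album by Y", "Can't Get Enough", "Barry White"),
]


def group_by_relation(instances):
    """Collect, for every pattern, the list of domain and range arguments."""
    relations = defaultdict(lambda: {"domain": [], "range": []})
    for pattern, left, right in instances:
        relations[pattern]["domain"].append(left)
        relations[pattern]["range"].append(right)
    return dict(relations)


relations = group_by_relation(INSTANCES)
```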


2. Relation typing and scoring

DefIE: How it works

For each relation R:

Substitute each domain and range argument with its hypernym h (using the BabelNet taxonomy) and generate a probability distribution over semantic types for the two sets

Compute the entropy of R from these distributions, then compute the score of R from:

- the domain and range entropy of R
- the length of the relation pattern of R
- the total number of extracted instances for R

[Equations not preserved in the transcript]
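Since the formulas did not survive in the transcript, the following Python sketch only illustrates the ingredients. `type_entropy` is the standard Shannon entropy of a distribution over hypernyms. `illustrative_score` is a hypothetical combination chosen purely for illustration (more instances raise it, higher entropy and longer patterns lower it) and should not be read as the formula from the talk. The hypernym list is toy data.

```python
import math
from collections import Counter


def type_entropy(hypernyms):
    """Shannon entropy (in bits) of the distribution over semantic types."""
    counts = Counter(hypernyms)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def illustrative_score(n_instances, entropy, pattern_length):
    """HYPOTHETICAL score, not the formula from the slide: it grows with the
    number of instances and shrinks with entropy and pattern length."""
    return n_instances / ((entropy + 1.0) * pattern_length)


# Toy hypernyms for the range arguments of one relation (assumed data).
range_types = ["band", "band", "band", "musician"]
h = type_entropy(range_types)
score = illustrative_score(n_instances=4, entropy=h, pattern_length=3)
```

A relation whose arguments all share one semantic type has entropy 0, so under this illustrative scheme it is penalized least.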


3. Relation taxonomization

DefIE: How it works

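Hypernym generalization: the pattern “is album by” (concept cj) generalizes “is studio album by” (concept ci) because cj belongs to the hypernym set H(ci), here: album, work of art, creation.

Substring generalization: the pattern “is album by” (node nj) generalizes “is studio album by” (node ni) because nj is the head of ni: studio (modifier) + album (head).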
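The two generalization tests can be sketched in Python as follows. This is a simplified illustration, not the actual procedure: patterns are plain token lists, the hypernym table is toy data mirroring the slide, and the head of a noun phrase is assumed to be its last word.

```python
# Toy hypernym sets H(c); the entry mirrors the example on the slide.
HYPERNYMS = {
    "studio album": {"album", "work of art", "creation"},
}


def _single_difference(general, specific):
    """Return the only differing (general, specific) pair, or None."""
    if len(general) != len(specific):
        return None
    diffs = [(g, s) for g, s in zip(general, specific) if g != s]
    return diffs[0] if len(diffs) == 1 else None


def hypernym_generalizes(general, specific, hypernyms):
    """True if `general` is `specific` with one concept replaced by a hypernym."""
    diff = _single_difference(general, specific)
    return diff is not None and diff[0] in hypernyms.get(diff[1], set())


def substring_generalizes(general, specific):
    """True if `general` keeps only the head (assumed: last word) of a
    multi-word modifier + head phrase in `specific`."""
    diff = _single_difference(general, specific)
    if diff is None:
        return False
    words = diff[1].split()
    return len(words) > 1 and words[-1] == diff[0]


specific = ["is", "studio album", "by"]
general = ["is", "album", "by"]
by_hypernym = hypernym_generalizes(general, specific, HYPERNYMS)
by_substring = substring_generalizes(general, specific)
```

In this example both tests succeed, which is why the same pair of patterns appears on the slide under both kinds of generalization.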

DefIE: Setup

Dataset: the whole set of English textual definitions in BabelNet 2.5

4 357 327 items from 5 different sources (Wikipedia, WordNet, Wikidata, Wiktionary, OmegaWiki)

DefIE: Results

                           DefIE         NELL         PATTY         ReVerb        WiSeNet
# Relations                255 881       298          1 631 531     664 746       245 935
Avg. extractions           81.68         7 013.03     9.68          22.16         9.24
# Extractions              20 352 903    2 089 883    15 802 946    14 728 268    2 271 807
# Entities                 2 398 982     1 996 021    1 087 907     3 327 425     1 636 307
# Edges in the taxonomy    44 412        -            20 339        -             -


Other evaluations:

- Precision and coverage of relations

- Novelty of information

- Quality of relation taxonomization

- Quality of entity linking/disambiguation

- Impact of definition sources


Where from here?

- Relation clustering (as in PATTY and WiSeNet)

- Multilinguality

- Relational learning and KB completion

- Harvest definitions from the web

- Adapt to “general” text

DefIE: Future work


KB-Unify: Knowledge base unification via sense embeddings and disambiguation

The idea: unify the output of different Open Information Extraction systems, covering both linked resources (PATTY, WiSeNet, ...) and unlinked resources (NELL, ReVerb, ...).


The tools:

- A WSD/EL system (to disambiguate unlinked resources): Babelfy

- A unified sense inventory S (to make the various resources “speak to each other”): BabelNet

- A unified vector space VS (to associate a vector with each item of S): SensEmbed (Iacobacci et al., 2015), a sense-based embedding model that trains the popular word2vec architecture (skip-gram) on a sense-annotated corpus

KB-Unify: How it works

A bird’s-eye view:

- use BabelNet mappings to redefine each linked resource

- disambiguate each unlinked resource using BabelNet as sense inventory (more on this later!)

Disambiguation

Two basic intuitions:

1. Among all triples in the target knowledge base, some of them (even if ambiguous) will be easier to disambiguate, e.g. ⟨Armstrong, works for, NASA⟩;

2. In general, the disambiguation strategy should vary according to the degree of specificity of each relation.

Group the set of unlinked triples by relation.

For each relation r:

● Extract and disambiguate a subset of high-confidence seed argument pairs for r;

● Estimate the specificity of r by looking at the distribution of its disambiguated seeds in the vector space VS;

● Disambiguate the remaining argument pairs of r with Babelfy, either triple-by-triple (if r is general) or all at once (if r is specific).
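The first two steps can be sketched in Python as below. This is only an outline under stated assumptions: `confidence` is a hypothetical callable standing in for the disambiguation confidence of the WSD/EL system, the threshold value is arbitrary, and the confidence numbers are toy data.

```python
from collections import defaultdict


def group_triples(triples):
    """Group (subject, relation, object) triples by relation."""
    by_relation = defaultdict(list)
    for subj, rel, obj in triples:
        by_relation[rel].append((subj, obj))
    return dict(by_relation)


def select_seeds(pairs, confidence, threshold):
    """Split argument pairs into seeds (confidence >= threshold) and the rest.

    `confidence` is a hypothetical stand-in for the score a WSD/EL system
    would assign to the disambiguation of a pair.
    """
    seeds = [p for p in pairs if confidence(p) >= threshold]
    rest = [p for p in pairs if confidence(p) < threshold]
    return seeds, rest


# Toy unlinked triples and toy confidence values (assumed data).
TRIPLES = [
    ("Armstrong", "works for", "NASA"),
    ("Smith", "works for", "Apple"),
    ("Mutter", "is album by", "Rammstein"),
]
TOY_CONFIDENCE = {
    ("Armstrong", "NASA"): 0.9,
    ("Smith", "Apple"): 0.3,
    ("Mutter", "Rammstein"): 0.8,
}

grouped = group_triples(TRIPLES)
seeds, rest = select_seeds(grouped["works for"], TOY_CONFIDENCE.get, threshold=0.5)
```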

Identifying seed argument pairs

[Figure: seed argument pairs are selected by disambiguation confidence; labels on the slide: ζdis, Seed, Disambiguation Confidence]

Ranking relations by specificity

[Equations not preserved in the transcript: domain/range centroids, domain/range variances, specificity threshold δspec]

Disambiguation with Relation Context

[Figure: unlinked triples are ranked by specificity using the disambiguated seeds and split at the threshold δspec; Babelfy then disambiguates general relations triple-by-triple and specific relations relation-by-relation]
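A minimal Python sketch of the idea, assuming that specificity is judged from how tightly the seed vectors cluster around their centroid. The variance definition (mean squared Euclidean distance), the averaging of domain and range variances, and the names `is_specific` and `strategy` are all assumptions for illustration; the exact formulas from the talk are not preserved.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def variance(vectors):
    """Mean squared Euclidean distance of the vectors from their centroid."""
    c = centroid(vectors)
    return sum(sum((x - m) ** 2 for x, m in zip(v, c)) for v in vectors) / len(vectors)


def is_specific(domain_vecs, range_vecs, delta_spec):
    """Hypothetical rule: a relation is specific when the average of its
    domain and range variances stays below the threshold delta_spec."""
    spread = (variance(domain_vecs) + variance(range_vecs)) / 2
    return spread < delta_spec


def strategy(domain_vecs, range_vecs, delta_spec):
    """Pick the disambiguation strategy named on the slide."""
    if is_specific(domain_vecs, range_vecs, delta_spec):
        return "relation-by-relation"
    return "triple-by-triple"


# Toy 2-d seed vectors (assumed data): one tight cluster, one spread-out set.
tight = [[1.0, 0.0], [1.0, 0.0]]
spread_out = [[1.0, 0.0], [-1.0, 0.0]]
specific_choice = strategy(tight, tight, delta_spec=0.5)
general_choice = strategy(spread_out, spread_out, delta_spec=0.5)
```

The intuition behind this design is that the arguments of a specific relation (e.g. albums and bands) sit close together in the sense vector space, so the whole relation provides useful context for disambiguating each of its triples.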


Relation alignment


Evaluation

Experimental setup:

KB-Unify: Experiments

Disambiguation

Specificity ranking

Cross-resource relation alignment


KB-Unify: Future work

Where from here?

- Less “naïve” relation alignment procedure

- Iterative algorithm for disambiguation and alignment (EM-style)

- Unify OIE-based KBs with hand-curated resources (Wikidata, DBpedia, etc.)


Wrap up and Conclusion


DefIE: A full-fledged OIE pipeline targeted to textual definitions, with explicit semantic characterization of both arguments and relation patterns

KB-Unify: An approach to knowledge base disambiguation and unification based on a shared sense inventory and a sense-based vector space model


Take-home message(s):

Web-scale OIE is absolutely great, but…

1. Definitional knowledge is important: sometimes it is worth stepping back and analyzing where valuable information is extracted from (quality vs. quantity)

2. Making sense of the output is important: semantic analysis can be used to let different OIE outputs “speak to each other” and benefit from mutual enrichment


Thank you!

xkcd, “Extended Mind”