LEILA – Learning to Extract Information by Linguistic Analysis


Page 1: LEILA – Learning to Extract Information by Linguistic Analysis

LEILA - Learning to Extract Information by Linguistic Analysis 1Fabian M. Suchanek

LEILA – Learning to Extract Information by Linguistic Analysis

presented at the 2nd Workshop on Ontology Learning and Population (OLP2)

Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum

(Max-Planck Institute for Computer Science, Saarbrücken/Germany)

Page 2: LEILA – Learning to Extract Information by Linguistic Analysis

Overview

• Motivation
• The LEILA System
• Plan of Attack
• System Architecture
• Experiments
• Conclusion

Page 3: LEILA – Learning to Extract Information by Linguistic Analysis

Motivation

Query: Meat dish    [Google Search] [I'm feeling hungry]

Web page: "This page has been created to enlighten the public about the Wiener Schnitzel. [...]"

?

Page 4: LEILA – Learning to Extract Information by Linguistic Analysis

Motivation

To know that a Schnitzel is a meat dish, we need an ontology.

• Use hand-crafted ontologies (like WordNet)
  (but: low coverage, high cost, fast aging)
• Or: gather ontological data from Web documents

Page 5: LEILA – Learning to Extract Information by Linguistic Analysis

Goal

Given
• a binary target relation (e.g. subclassOf)
• a set of Web documents
extract all pairs of entities that are in the target relation.

Page 6: LEILA – Learning to Extract Information by Linguistic Analysis

Related Work

X is a Y

A Schnitzel is a meat dish from Austria.

Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)

Page 7: LEILA – Learning to Extract Information by Linguistic Analysis

Related Work

X is a Y

A Schnitzel, also called Wiener Schnitzel, is a meat dish.

Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
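The two example sentences show why a purely textual pattern is brittle. As a rough illustration (the regular expression below is an assumption made for this sketch and is not taken from any of the cited systems), a surface pattern for "X is a Y" finds the pair in the first sentence but misses it once a clause is inserted:

```python
import re

# Surface pattern "A X is a Y": X is one word, Y is one or two words.
# This regex is an illustrative assumption, not the cited systems' pattern language.
TEXT_PATTERN = re.compile(r"^An? (?P<x>\w+) is an? (?P<y>\w+(?: \w+)?)\b")


def match_text_pattern(sentence):
    """Return the (X, Y) pair if the sentence fits the surface pattern, else None."""
    m = TEXT_PATTERN.match(sentence)
    return (m.group("x"), m.group("y")) if m else None


plain = "A Schnitzel is a meat dish from Austria."
interrupted = "A Schnitzel, also called Wiener Schnitzel, is a meat dish."

print(match_text_pattern(plain))        # the pair is found
print(match_text_pattern(interrupted))  # the inserted clause breaks the match
```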

Page 8: LEILA – Learning to Extract Information by Linguistic Analysis

Related Work

┌────────────────Subject─────────────────┐    ┌───Obj───┐
A Schnitzel, also called Wiener Schnitzel, is a meat dish.

Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)

Idea: Learn linguistic patterns!
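One way to picture a linguistic pattern is as a path of grammatical links between the two entities rather than as a string of words. The sketch below uses hand-written, hypothetical link triples (a real system would obtain them from a parser, and the link labels are assumptions); it only illustrates that the path from subject to object stays the same when a clause is inserted.

```python
from collections import deque

# Hypothetical, hand-written linkages as (word, word, link label) triples.
# Both the attachments and the labels are assumptions for illustration.
PLAIN = [
    ("A", "Schnitzel", "det"),
    ("Schnitzel", "is", "subj"),
    ("is", "dish", "obj"),
    ("a", "dish", "det"),
    ("meat", "dish", "mod"),
]
# Same sentence with the inserted clause "also called Wiener Schnitzel".
INTERRUPTED = PLAIN + [
    ("Schnitzel", "called", "participle"),
    ("also", "called", "advmod"),
    ("called", "Wiener_Schnitzel", "comp"),
]


def link_path(edges, start, goal):
    """Return the link labels on the shortest path from start to goal (BFS)."""
    neighbours = {}
    for a, b, label in edges:
        neighbours.setdefault(a, []).append((b, label))
        neighbours.setdefault(b, []).append((a, label))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        word, labels = queue.popleft()
        if word == goal:
            return labels
        for nxt, label in neighbours.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, labels + [label]))
    return None


print(link_path(PLAIN, "Schnitzel", "dish"))
print(link_path(INTERRUPTED, "Schnitzel", "dish"))
```

Both calls yield the same label sequence, which is the property a surface text pattern lacks.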

Page 9: LEILA – Learning to Extract Information by Linguistic Analysis

Plan of Attack

Input: (Web documents), (Target relation) subclassOf
Output: (Output pairs)
    Schnitzel    meat dish
    Koala        mammal

Page 10: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.

Page 11: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

The Schnitzel (200g) is best enjoyed with Ösibräu.
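The change from "0.0314946089 stones" to "200g" in the example sentence suggests a unit normalization step. A minimal sketch of such a step, assuming only this one unit and a simple regular expression (the function name and the rounding rule are illustrative, not the system's actual code):

```python
import re

GRAMS_PER_STONE = 6350.29318  # 1 stone = 14 lb = 6.35029318 kg

# Matches a decimal number followed by "stone" or "stones".
STONES = re.compile(r"(\d+(?:\.\d+)?) stones?")


def normalize_stones(text):
    """Rewrite '<number> stone(s)' as a weight in grams, rounded to whole grams."""
    def to_grams(match):
        grams = float(match.group(1)) * GRAMS_PER_STONE
        return f"{round(grams)}g"
    return STONES.sub(to_grams, text)


print(normalize_stones("The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu."))
```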

Page 12: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

The Schnitzel (200g) is best enjoyed with Oesibraeu.
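The rewriting of "Ösibräu" as "Oesibraeu" in the example sentence corresponds to the usual German transliteration of umlauts into ASCII. A small sketch, assuming the standard ä/ö/ü/ß convention (the function name is hypothetical):

```python
# Standard German transliteration: ä -> ae, ö -> oe, ü -> ue, ß -> ss.
TRANSLITERATION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def to_ascii(text):
    """Replace German umlauts and sharp s by their ASCII spellings."""
    return text.translate(TRANSLITERATION)


print(to_ascii("The Schnitzel (200g) is best enjoyed with Ösibräu."))
```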

Page 13: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

The Schnitzel is best enjoyed with Oesibraeu.

The Schnitzel ( 200 g )
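Here the parenthesis has been lifted out of the sentence and tokenized as a fragment of its own. The following sketch reproduces that result for the example; as a simplifying assumption it attaches the parenthesis to all the text preceding it, whereas a real preprocessor would presumably pick only the preceding noun phrase.

```python
import re

# A parenthesis without nesting, together with the whitespace before it.
PAREN = re.compile(r"\s*\(([^()]*)\)")
# Tokens: numbers, runs of letters, or single punctuation marks.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]+|[^\w\s]")


def split_parenthetical(sentence):
    """Return (sentence without the parenthesis, tokenized fragment or None)."""
    m = PAREN.search(sentence)
    if m is None:
        return sentence, None
    main = sentence[:m.start()] + sentence[m.end():]
    head = sentence[:m.start()]  # assumption: everything before the parenthesis
    fragment = TOKEN.findall(head + " ( " + m.group(1) + " )")
    return main, " ".join(fragment)


main, fragment = split_parenthetical("The Schnitzel (200g) is best enjoyed with Oesibraeu.")
print(main)
print(fragment)
```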

Page 14: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

[Figure: linkage of the preprocessed text; link labels shown: det, subj, participle, advmod, comp, and five adj links]

The Schnitzel is best enjoyed with Oesibraeu.

The Schnitzel ( 200 g )

Page 15: LEILA – Learning to Extract Information by Linguistic Analysis

Preprocessing

Input: (Web documents), (Target relation) subclassOf
Output: (Output pairs)
    Schnitzel    meat dish
    Koala        mammal

Page 16: LEILA – Learning to Extract Information by Linguistic Analysis

Algorithm

(Seed pairs)
    +  dog  mammal
       ...
    -  dog  nag
       ...
(Web documents)
    A dog is a mammal.
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal

Page 17: LEILA – Learning to Extract Information by Linguistic Analysis

Algorithm

(Seed pairs)
    +  dog  mammal
       ...
    -  dog  nag
       ...
(Web documents)
    This dog is a nag.
(Positive patterns)
    A X is a Y.
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal

Page 18: LEILA – Learning to Extract Information by Linguistic Analysis

Algorithm

(Seed pairs)
    +  dog  mammal
       ...
    -  dog  nag
       ...
(Web documents)
(Positive patterns)
    A X is a Y.
(Negative patterns)
    This X is a Y.
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal
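The step from seed pairs to positive and negative patterns can be sketched on plain strings. This is a simplification under stated assumptions: the sketch abstracts surface sentences, whereas the talk proposes linguistic patterns, and the function names are hypothetical.

```python
EXAMPLES = {("dog", "mammal")}        # positive seed pairs (+)
COUNTEREXAMPLES = {("dog", "nag")}    # negative seed pairs (-)
SENTENCES = ["A dog is a mammal.", "This dog is a nag."]


def abstract(sentence, pair):
    """Replace the two words of the pair by X and Y; None if either is missing."""
    x, y = pair
    words = sentence.rstrip(".").split()
    if x not in words or y not in words:
        return None
    return " ".join("X" if w == x else "Y" if w == y else w for w in words) + "."


def collect_patterns(sentences, examples, counterexamples):
    """Sentences containing an example give positive patterns, counterexamples negative ones."""
    positive, negative = set(), set()
    for sentence in sentences:
        for pair in examples:
            pattern = abstract(sentence, pair)
            if pattern:
                positive.add(pattern)
        for pair in counterexamples:
            pattern = abstract(sentence, pair)
            if pattern:
                negative.add(pattern)
    return positive, negative


positive, negative = collect_patterns(SENTENCES, EXAMPLES, COUNTEREXAMPLES)
print(positive, negative)
```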

Page 19: LEILA – Learning to Extract Information by Linguistic Analysis

Algorithm

(Seed pairs)
    +  dog  mammal
       ...
    -  dog  nag
       ...
(Web documents)
    A Schnitzel is a meat dish.
(Generalized positive patterns)
    A X is a Y.
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal
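Applying a learned positive pattern to unseen sentences then produces the output pairs. The sketch below again works on surface strings as a stand-in for linguistic patterns; translating a pattern into a regular expression is an assumption made for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Turn a surface pattern such as 'A X is a Y.' into a compiled regex."""
    parts = []
    for word in pattern.rstrip(".").split():
        if word == "X":
            parts.append(r"(?P<x>\w+)")
        elif word == "Y":
            parts.append(r"(?P<y>\w+(?: \w+)*)")  # Y may span several words
        else:
            parts.append(re.escape(word))
    return re.compile("^" + " ".join(parts) + r"\.$")


def extract_pairs(sentences, positive_patterns):
    """Collect every (X, Y) pair produced by a positive pattern."""
    pairs = set()
    for regex in map(pattern_to_regex, positive_patterns):
        for sentence in sentences:
            m = regex.match(sentence)
            if m:
                pairs.add((m.group("x"), m.group("y")))
    return pairs


documents = ["A Schnitzel is a meat dish.", "This dog is a nag."]
print(extract_pairs(documents, {"A X is a Y."}))
```

The second sentence does not fit the positive pattern, so the counterexample pair is not extracted.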

Page 20: LEILA – Learning to Extract Information by Linguistic Analysis

LEILA: System Architecture

(Seed pairs)
    dog  mammal
    ...
    dog  nag
    ...
(Web documents)
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal

[Architecture diagram; components shown:]
• Seed pair data sets
• LEILA
• LinkParser (Sleator, CMU)
• Preprocessing, stemming
• kNN Learner
• SVMLight (Joachims, Cornell U)

Page 21: LEILA – Learning to Extract Information by Linguistic Analysis

Gold Standard for Evaluation

(Web documents)
    A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe.
(Target relation)
(Ideal pairs)
    Schnitzel    meat dish
(Output pairs)
    Schnitzel    meat dish
    Koala        mammal
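Given the ideal pairs as a gold standard, precision and recall of the output pairs follow from their standard definitions. A minimal sketch with made-up pair sets (the exact evaluation protocol is described in the paper and may differ in detail):

```python
def precision_recall(output_pairs, ideal_pairs):
    """Precision = correct / output, recall = correct / ideal."""
    output_pairs, ideal_pairs = set(output_pairs), set(ideal_pairs)
    correct = output_pairs & ideal_pairs
    precision = len(correct) / len(output_pairs) if output_pairs else 0.0
    recall = len(correct) / len(ideal_pairs) if ideal_pairs else 0.0
    return precision, recall


# Illustrative data only, not taken from the experiments.
output = {("Schnitzel", "meat dish"), ("Koala", "mammal"), ("dog", "nag")}
ideal = {("Schnitzel", "meat dish"), ("Koala", "mammal"),
         ("Koala", "animal"), ("dog", "mammal")}

precision, recall = precision_recall(output, ideal)
print(precision, recall)
```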

Page 22: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different relations

birthDate

Seed pairs are given by a function that decides whether a word pair is
• an example (here: list of birth dates from www.famousbirthdays.com)
• a counterexample (here: can be deduced from examples)
• a candidate (here: all pairs of a name and a date)
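Such a deciding function could look as follows for birthDate. This is a sketch under assumptions: the seed list is a one-entry stand-in for the real list, dates are reduced to years, and the rule "a known person paired with a different date is a counterexample" is one reading of "can be deduced from examples".

```python
from enum import Enum


class Verdict(Enum):
    EXAMPLE = "example"
    COUNTEREXAMPLE = "counterexample"
    CANDIDATE = "candidate"


# Hypothetical stand-in for the seed list of known birth dates (years only).
KNOWN_BIRTH_DATES = {"Wolfgang Amadeus Mozart": "1756"}


def judge(name, date):
    """Classify a (name, date) pair for the birthDate relation."""
    known = KNOWN_BIRTH_DATES.get(name)
    if known is None:
        return Verdict.CANDIDATE       # unknown person: any name/date pair is a candidate
    if known == date:
        return Verdict.EXAMPLE         # matches the seed list
    return Verdict.COUNTEREXAMPLE      # a person has only one birth date


print(judge("Wolfgang Amadeus Mozart", "1756"))
print(judge("Wolfgang Amadeus Mozart", "1791"))
print(judge("Ludwig van Beethoven", "1770"))
```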

Page 23: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different relations

birthDate

Patterns: X (born in Y), X was born in Y, ...

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%

(see paper for details on the experiments)

Page 24: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different relations

synonymy

Examples: all WordNet synsets
Counterexamples: all words that are not in a synset
Candidates: all pairs of proper names
Patterns: X or Y, X (or Y), ...

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%
synonymy          Wikip geography    73% ± 7%    64% ± 7%

Page 25: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different relations

instanceOf

Examples: all direct WordNet hyponyms
Counterexamples: all words that are not hyponyms of each other
Candidates: all pairs of a proper name and a WordNet concept
Patterns: an X is a Y, X is unusual among the Y, ...

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%
synonymy          Wikip geography    73% ± 7%    64% ± 7%
instanceOf        Wikip composers    58% ± 3%    41% ± 3%

Page 26: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different relations

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%
synonymy          Wikip geography    73% ± 7%    64% ± 7%
instanceOf        Wikip composers    58% ± 3%    41% ± 3%
                  Wikip random       33% ± 3%    33% ± 3%
                  Google composers   28% ± 3%    17% ± 2%

(see paper for details on the experiments)

Page 27: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different competitors

[Bar chart: precision and recall for four comparisons; results in %, LEILA in red]
• Snowball, headquarters, Snowball’s corpus
• TextToOnto, Text2Onto, instanceOf, Wikip composers
• CV-System, instanceOf, CV’s corpus
• CV-System, instanceOf, Wikip composers

(see paper for explanations, conditions and details!)

Page 28: LEILA – Learning to Extract Information by Linguistic Analysis

Conclusion

Our system LEILA
• can learn arbitrary binary relations from Web documents
• uses a deep linguistic analysis
• compares favorably with other systems

See http://www.mpi-inf.de/~suchanek

Page 29: LEILA – Learning to Extract Information by Linguistic Analysis

Results with different competitors

headquarters on Snowball’s corpus (Snowball and LEILA): 34% ± 8% / 30% ± 7% and 90% ± 6% / 50% ± 7% (precision / recall)

System       Relation     Corpus            Precision   Recall
TextToOnto   instanceOf   Wikip composers   39% ± 9%    4% ± 1%
Text2Onto    instanceOf   Wikip composers   50%         2% ± 1%
CV-System    instanceOf   CV’s              32% ± 5%    32% ± 5%
LEILA        instanceOf   CV’s              26% ± 7%    15% ± 4%
CV-System    instanceOf   Wikip composers   22%         4% ± 2%
LEILA        instanceOf   Wikip composers   58% ± 3%    41% ± 3%

(see paper for explanations, conditions and details!)

Page 30: LEILA – Learning to Extract Information by Linguistic Analysis

Pattern Generalization – kNN

[Figure: labelled patterns as points, with a new pattern to be classified]
    +  A X is a Y.
    +  X such as Y
    -  This X is a Y.
    ?  A X is a big Y

(See our paper at KDD for details)
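The idea of generalizing by nearest neighbours can be sketched as follows. Everything beyond the four patterns is an assumption of this sketch: it compares patterns by Jaccard similarity over their lowercased tokens and takes a majority vote, while the actual feature representation and distance are those described in the paper.

```python
from collections import Counter

# Labelled patterns as shown on the slide.
LABELLED = [
    ("A X is a Y", "+"),
    ("X such as Y", "+"),
    ("This X is a Y", "-"),
]


def tokens(pattern):
    return set(pattern.lower().split())


def jaccard(a, b):
    """Token overlap between two patterns (assumed similarity measure)."""
    a, b = tokens(a), tokens(b)
    return len(a & b) / len(a | b)


def knn_label(pattern, labelled, k=1):
    """Label a new pattern by majority vote among its k most similar patterns."""
    ranked = sorted(labelled, key=lambda item: jaccard(pattern, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(knn_label("A X is a big Y", LABELLED, k=1))
```

With this measure the new pattern is closest to "A X is a Y" and is therefore labelled positive.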

Page 31: LEILA – Learning to Extract Information by Linguistic Analysis

Pattern Generalization – SVM

[Figure: labelled patterns as points, separated into a positive (+) and a negative (-) region, with a new pattern to be classified]
    +  A X is a Y.
    +  X such as Y
    -  This X is a Y.
    ?  A X is a big Y

(See our paper at KDD for details)