12/15/2014
ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Roberto Navigli
http://lcl.uniroma1.it
BabelNet, Babelfy and Beyond
AI*IA 2014 Tutorial – 12th December 2014
Roberto Navigli
ERC Starting Grant MultiJEDI No. 259234
Tutorial Outline
1. Foundations in Semantic Processing
2. BabelNet: the largest multilingual semantic resource
3. Babelfy: Multilingual Word Sense Disambiguation and
Entity Linking
4. Beyond: what comes next?
BabelNet, Babelfy and Beyond – AI*IA 2014 Tutorial
Roberto Navigli
Projects thanks to which this tutorial exists
MultiJEDI (1.3M euros): ERC Starting Grant
LIDER (1.5M euros): EU CSA
Google Focused Research Award (200k$)
Also starring
Simone Ponzetto, Tiziano Flati, David Jurgens, Andrea Moro, Daniele Vannella, Taher Pilehvar, Francesco Cecconi
Part 1:
Foundations
Understanding a simple phrase
Barack Obama peruses the internet.
Natural language is ambiguous
Listen to some rock!
Multilingual Semantic Processing with BabelNet – LREC 2014 Tutorial
Roberto Navigli and David Jurgens
I cannot hear anything…
BabelNet goes to the Multilingual Semantic Web. Roberto Navigli and David Jurgens.
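The Multilingual, Big-Picture Goal
“Underground rock concert” / “언더그라운드락콘서트” (Korean: “underground rock concert”)
“Underground rock formation” / “지하암석” (Korean: “underground rock”)
[Figure: each phrase goes through a Black Box that outputs a [semantic representation], which then feeds NLP Applications]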
The General Problem
POLYSEMY
• The most frequent words have several
meanings!
• Our job: model meaning from a
computational perspective
Monosemous vs. Polysemous words
• Monosemous words have only one meaning
– Examples: plant life, internet
• Polysemous words have more than one meaning
– Example: bar
– “a room or establishment where alcoholic drinks are served”
– “a counter where you can obtain food or drink”
– “a rigid piece of metal or wood”
– “musical notation for a repeating pattern of musical beats”
How do we represent and encode semantics?
[Figure: the same pipeline as in “The Multilingual, Big-Picture Goal”: the phrases go through the Black Box, whose semantic representation feeds NLP Applications]
What comes out of the black box?
How do we represent and encode semantics?
• Thesauri
• Groups words according to similar meaning
• Relations between groups (e.g., narrower meanings)
• Roget’s Thesaurus (1911)
• Machine Readable Dictionaries
• Enumerates all meanings of a word
• Includes definitions, morphology, example usages, etc.
• Oxford Dictionary of English, LDOCE, Collins, etc.
• Computational Lexicons
• Repositories of structured knowledge about a word's semantics and
syntax
• Include relations like hypernymy, meronymy, or entailment
• WordNet
Senses and Relations in WordNet
• Each meaning is encoded as a synset (synonym set), which is a
collection of synonymous senses
• Semantic relations between synsets
– Hypernymy (car#n#1 is-a motor vehicle#n#1)
– Meronymy (car#n#1 has-a car door#n#1)
– Entailment, similarity, attribute, etc.
• Lexical relations between word senses
– Antonymy (good#a#1 antonym of bad#a#1)
– Pertainymy (dental#a#1 pertains to tooth#n#1)
– Nominalization (service#n#2 nominalizes serve#v#4)
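To make the distinction between synset-level and sense-level relations concrete, here is a minimal Python sketch. It is not WordNet's storage format nor any library's API: the `Synset` class, its fields and the string ids are illustrative assumptions. Semantic relations point from one synset to other synsets, while lexical relations are kept in a separate table keyed by individual senses.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Synset:
    """A WordNet-style synset: one meaning, shared by a set of synonymous senses."""
    synset_id: str                  # toy id in the word#pos#sense style used in these slides
    senses: list[str]               # the synonymous lexicalizations
    # Semantic relations link whole synsets, e.g. {"is-a": ["motor vehicle#n#1"]}
    relations: dict[str, list[str]] = field(default_factory=dict)

    def related(self, relation: str) -> list[str]:
        """Return the targets of a semantic relation, or [] if there are none."""
        return self.relations.get(relation, [])

car = Synset(
    "car#n#1",
    ["car", "auto", "automobile", "machine", "motorcar"],
    {"is-a": ["motor vehicle#n#1"], "has-part": ["car door#n#1"]},
)

# Lexical relations link individual word senses rather than whole synsets,
# so they live in a separate table keyed by (sense, relation).
lexical_relations: dict[tuple[str, str], str] = {
    ("good#a#1", "antonym"): "bad#a#1",
    ("dental#a#1", "pertains-to"): "tooth#n#1",
    ("service#n#2", "nominalizes"): "serve#v#4",
}
```

Keeping the two kinds of relations apart mirrors the slide: a hypernym holds for every member of {car, auto, automobile, machine, motorcar}, whereas an antonym holds only for a specific sense.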
[Figure: an excerpt of the WordNet graph. Concepts (synsets) such as {car, auto, automobile, machine, motorcar}, {motor vehicle}, {self-propelled vehicle}, {wheeled vehicle}, {tractor}, {convertible}, {golf cart, golfcart}, {wagon, waggon}, {locomotive, engine, locomotive engine, railway locomotive}, {air bag}, {accelerator, accelerator pedal, gas pedal, throttle}, {car window}, {brake}, {wheel} and {splasher} are connected by is-a and has-part semantic relations.]
WordNet [Miller et al., 1990; Fellbaum, 1998]
Wordnets in other Languages
• EuroWordNet (Vossen, 1998)
• BalkaNet (Tufis et al., 2004)
• Multilingual Central Repository (Atserias et al., 2003)
• GermaNet (Hamp and Feldweg, 1997)
• SloWNet (Fišer and Sagot, 2008)
• WOLF (Sagot and Fišer, 2008)
• Hungarian WN (Miháltz et al., 2008)
• Japanese WN (Isahara et al., 2008)
• …
• Currently 73 unique wordnets: http://globalwordnet.org/wordnets-in-the-world/
[Figure: existing wordnets, e.g., WordNet, MultiWordNet, WOLF, MCR, GermaNet and BalkaNet]
An ideal resource for Multilingual Semantic
Processing
• Capable of representing the meaning of a piece of text as
word senses in any language
• broad coverage of different senses, including
language-specific senses
• currently problematic for many language-specific
wordnets
• Encodes semantic and syntactic relationships between
the synsets
• Highly beneficial for NLP applications
• Encodes definitions and usages for synsets
Motivations at the intersection of NLP
• Question Answering
• Semantic Information Retrieval
• Cross-lingual Document Retrieval
• Semantically-enhanced Machine Translation
• Computer-assisted translation
• Language learning/teaching
• Linguistically-grounded Multilingual Knowledge
Representation
• Semantic annotation
• (Linguistic) Linked Open Data
• Computer vision: Vision & Language
Part 2a:
Making Multilingual Knowledge
Objective and motivation
Goal:
• A large repository of knowledge in a multilingual setting
Motivations:
• A common ground for language technologies that brings
together:
• Multilinguality
• Encyclopedic knowledge
• Lexicographic knowledge
• Semantic relations
• Textual definitions
• Domain information
• …
How many meanings for «balloon»?
[Figure: the senses of balloon in WordNet compared with those in Wikipedia]
Core Challenges
1. Integrating and unifying heterogeneous resources
2. Managing many different languages
3. Having a wide range of semantic relations between
concepts and named entities
4. Maintaining high accuracy
This is where the ERC (and our project) comes
into play
A 5-year ERC Starting Grant (2011-2016)
on Multilingual Word Sense Disambiguation
Key Objective 1: create knowledge for all languages
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
[Figure: existing wordnets, e.g., WordNet, MultiWordNet, WOLF, MCR, GermaNet and BalkaNet]
Goal: Creating a Multilingual Semantic Network
Start from two large complementary resources:
WordNet: full-fledged taxonomy
Wikipedia: multilingual and continuously updated
[Figure: the WordNet excerpt around {car, auto, automobile, machine, motorcar}, with concepts connected by is-a and has-part relations]
Get the best from both worlds
[Figure: the WordNet excerpt around {car, auto, automobile, machine, motorcar}: concepts connected by semantic relations]
WordNet [Miller et al., 1990; Fellbaum, 1998]
Playing with senses
[Figure: Wikipedia pages as concepts, connected by hyperlinks, i.e., (unspecified) semantic relations]
Wikipedia [The Web Community, 2001-today]
BabelNet: concepts and semantic relations (1)
Concepts and relations in BabelNet are harvested from
WordNet and Wikipedia:
WordNet → BabelNet:
• synsets → concepts
• lexico-semantic relations → semantic relations
Wikipedia → BabelNet:
• pages → concepts
• hyperlinks → semantic relations
An example of mapping
Creation of the Wikipedia disambiguation
contexts
ctx(Balloon (aircraft)) = { }
Creation of the Wikipedia disambiguation contexts
ctx(Balloon (aircraft)) = { aircraft }
(from the sense label)
Creation of the Wikipedia disambiguation contexts
ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy,
airship, …, gondola }
(from hyperlinks)
Creation of the Wikipedia disambiguation contexts
ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy,
airship, …, gondola, ballooning, hydrogen, aeronautics }
(from categories)
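The three sources of ctx(w) can be sketched as follows. This is a minimal illustration under simplifying assumptions: the helper `disambiguation_context` is hypothetical, page data is passed in by hand rather than read from a Wikipedia dump, and category labels are kept whole, whereas the example above (e.g., hydrogen) suggests that words are extracted from categories in a more refined way.

```python
from __future__ import annotations

def disambiguation_context(title: str, links: list[str], categories: list[str]) -> set[str]:
    """Collect ctx(w) for a Wikipedia page w from three sources (lower-cased)."""
    ctx: set[str] = set()
    # 1. Sense label: the parenthesised part of the title, e.g. "aircraft".
    if title.endswith(")") and "(" in title:
        ctx.add(title[title.rindex("(") + 1 : -1].strip().lower())
    # 2. Hyperlinks: titles of the pages that w links to.
    ctx.update(link.lower() for link in links)
    # 3. Categories: naively kept as whole labels (simplifying assumption).
    ctx.update(cat.lower() for cat in categories)
    return ctx
```

For instance, `disambiguation_context("Balloon (aircraft)", ["Aerostat", "Buoyancy", "Airship", "Gondola"], ["Ballooning", "Aeronautics"])` yields the set {aircraft, aerostat, buoyancy, airship, gondola, ballooning, aeronautics}, a subset of the context shown above.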
Building BabelNet: Mapping Wikipedia to
WordNet
Given a Wikipage w and its disambiguation context ctx(w):
For each WordNet sense s of w, calculate score(s, w) as follows:
The Wikipedia page context in the WordNet
graph
balloon#n#1
ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy,
airship, …, gondola }
The Wikipedia page context in the WordNet
graph
airship#n#1
aerostat#n#1
aircraft#n#1
buoyancy#n#1
gondola#n#1
balloon#n#1
The Wikipedia page context in the WordNet
graph
airship#n#1
aerostat#n#1
aircraft#n#1
buoyancy#n#1
gondola#n#1
balloon#n#1
balloon#n#1 -> aircraft#n#1
balloon#n#1 -> aircraft#n#1 -> airship#n#1
balloon#n#1 -> gondola#n#1
balloon#n#1 -> gondola#n#1 -> flight#n#1 -> buoyancy#n#1
balloon#n#1 -> aerostat#n#1
The Wikipedia page context in the WordNet graph
(same paths as above)
0.35
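The path enumeration above can be sketched as a bounded depth-first search over the WordNet graph. The weighting is an assumption on our part: each simple path from the sense to a context synset contributes exp(-(length - 1)), so shorter paths count more; the slides do not show the exact formula, and this sketch does not attempt to reproduce the 0.35 value. `path_score`, `best_sense` and the toy graph `wn` are hypothetical names.

```python
from __future__ import annotations
import math

def path_score(graph: dict[str, list[str]], sense: str, ctx_synsets: set[str],
               max_len: int = 3) -> float:
    """Sum exp(-(len - 1)) over simple paths of at most max_len edges that start at
    `sense` and end at a synset of the page context (assumed weighting)."""
    score = 0.0
    stack = [[sense]]                       # depth-first search over partial paths
    while stack:
        path = stack.pop()
        for nxt in graph.get(path[-1], []):
            if nxt in path:                 # keep paths simple (no cycles)
                continue
            new_path = path + [nxt]
            length = len(new_path) - 1      # number of edges
            if nxt in ctx_synsets:
                score += math.exp(-(length - 1))
            if length < max_len:
                stack.append(new_path)
    return score

def best_sense(graph: dict[str, list[str]], senses: list[str], ctx_synsets: set[str]) -> str:
    """Map the page to the WordNet sense with the highest path score."""
    return max(senses, key=lambda s: path_score(graph, s, ctx_synsets))

# Toy graph with only the edges needed for the paths listed on the slides.
wn = {
    "balloon#n#1": ["aircraft#n#1", "gondola#n#1", "aerostat#n#1"],
    "aircraft#n#1": ["airship#n#1"],
    "gondola#n#1": ["flight#n#1"],
    "flight#n#1": ["buoyancy#n#1"],
}
ctx = {"aircraft#n#1", "aerostat#n#1", "airship#n#1", "buoyancy#n#1", "gondola#n#1"}
```

On this toy graph the five listed paths give 3 + e^-1 + e^-2 ≈ 3.50 for balloon#n#1, while a sense with no outgoing edges scores 0, so `best_sense(wn, ["balloon#n#2", "balloon#n#1"], ctx)` picks balloon#n#1.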
BabelNet: concepts and semantic relations (2)
We encode knowledge as a labeled directed graph:
Each vertex is a Babel synset
Each edge is a semantic relation between synsets:
is-a (balloon is-a aircraft)
part-of (gasbag part-of balloon)
instance-of (Einstein instance-of physicist)
…
unspecified/relatedness (balloon related-to flight)
Example Babel synset: balloon (EN), Ballon (DE),
aerostato (ES), aerostato (IT),
pallone aerostatico (IT),
mongolfière (FR)
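A Babel synset, as described above, can be modelled as a graph vertex that holds lexicalizations keyed by language plus labeled outgoing edges. The sketch below is a toy data structure, not the BabelNet Java API; the class name and the `toy:` identifiers are invented for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class BabelSynset:
    """Toy vertex of a BabelNet-like graph: lexicalizations in many languages
    plus outgoing labeled edges stored as (relation label, target synset id)."""
    synset_id: str                                   # hypothetical id, not a real BabelNet id
    lexicalizations: dict[str, list[str]]            # language code -> word senses
    edges: list[tuple[str, str]] = field(default_factory=list)

    def senses_in(self, lang: str) -> list[str]:
        """Lexicalizations in one language ([] if the language is not covered)."""
        return self.lexicalizations.get(lang, [])

    def targets(self, relation: str) -> list[str]:
        """Ids of the synsets reached through edges with the given label."""
        return [target for label, target in self.edges if label == relation]

balloon = BabelSynset(
    "toy:balloon",
    {"EN": ["balloon"], "DE": ["Ballon"], "ES": ["aerostato"],
     "IT": ["aerostato", "pallone aerostatico"], "FR": ["mongolfière"]},
    [("is-a", "toy:aircraft"), ("related-to", "toy:flight")],
)
```

Because every lexicalization hangs off the same vertex, a relation such as is-a is stated once and holds for all languages at the same time.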
Building BabelNet: Translating Babel synsets
1. Exploiting Wikipedia interlanguage links
[Figure: interlanguage links from Balloon (aircraft) to pallone aerostatico (IT), globo aerostático (ES) and Ballon (DE)]
Building BabelNet: Translating Babel synsets
2. Filling the lexical translation gaps using a Machine
Translation system to translate the English lexicalizations of
a concept
On August 27, 1783 in Paris, Franklin witnessed the
world's first hydrogen [[Balloon (aircraft)|balloon]]
flight.
Le 27 Août, 1783 à Paris, Franklin vu le premier vol en
ballon d'hydrogène.
Google Translate
Building BabelNet: Translating Babel synsets
2. Filling the lexical translation gaps using a Machine
Translation system to translate the English lexicalizations of
a concept
For each word sense s, we translate:
sentences from SemCor (a corpus annotated with WordNet
senses) which contain s
sentences from Wikipedia linked to the Wikipage of s
The most frequent translation of s is selected for each target
language
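Once each translated sentence has been processed, selecting the most frequent translation reduces to counting. The sketch assumes that the word-level alignment between the sense-tagged English word and its counterpart in each machine-translated sentence is already available (obtaining it is outside this sketch); `most_frequent_translation` is a hypothetical helper.

```python
from __future__ import annotations
from collections import Counter

def most_frequent_translation(aligned: list[tuple[str, str]]) -> dict[str, str]:
    """aligned: (target language, translation of the sense-tagged word) pairs,
    one per translated sentence containing sense s. Returns, for each language,
    the translation seen most often."""
    counts: dict[str, Counter] = {}
    for lang, word in aligned:
        counts.setdefault(lang, Counter())[word] += 1
    return {lang: c.most_common(1)[0][0] for lang, c in counts.items()}
```

Fed with the Italian concordance below (four occurrences of wikificazione, two of the untranslated wikification), it would return wikificazione for IT.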
The most frequent translation of a word in a given
meaning
left context term right context
wikification may refer to: the…
geoinformatics services' and ' wikification of GIS by the masses'
the process may be called wikification (as in ...
which is then called " wikification and to the related problem
reason needs copyediting, wikification , reduction of POV, work on references
huge amount of cleanup, wikification , etc. Version of 12 Nov
left context term right context
wikificazione potrebbe riferirsi a: il…
servizi geoinformatici' e ' wikification di GIS dalle masse'
il processo chiamato wikificazione (come in ...
che è quindi chiamato wikificazione e al problema correlato…
ragione richiede copyediting, wikification , riduzione di POV, lavoro su reference
grandi quantità di pulizia, wikificazione , ecc. Versione del 12 Novembre
The most frequent translation of a word in a given
meaning
BabelNet [Navigli and Ponzetto, AIJ 2012]
A wide-coverage multilingual semantic network
including both encyclopedic (from Wikipedia) and
lexicographic (from WordNet) entries
[Figure: concepts from WordNet; named entities (NEs) and specialized concepts from Wikipedia; concepts integrated from both resources]
Integrating WordNet with Wikipedia…
Is that all?!?
WordNet
Open Multilingual WordNet
[Bond and Foster, 2013]
• http://compling.hss.ntu.edu.sg/omw/
• 22 languages
• Mappings to the Princeton WordNet synsets
• More than 600,000 lexicalizations
Francis Bond and Kyonghee Paik. 2012. A survey of wordnets and their
licenses. In Proc. of GWC 2012
Francis Bond and Ryan Foster. 2013. Linking and extending an open
multilingual wordnet. In Proc. of ACL
OmegaWiki (http://www.omegawiki.org)
• Hundreds of languages
• About 50,000 entries («synsets»)
Some statistics for OmegaWiki
Wiktionary (http://www.wiktionary.org)
• A collaborative dictionary!
• Hundreds of languages
• About 3.7M entries
Some statistics for Wiktionary
Wikidata (http://www.wikidata.org)
• A collaborative knowledge base!
• Hundreds of languages
• About 15M entries
But how do we integrate all these resources?
Alignment Approaches
Usually measure the similarity of two concepts
[Figure: the WordNet concept plant#n#1]
Alignment Approaches
Usually measure the similarity of two concepts
And align two concepts if their similarity exceeds
a threshold
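The threshold-based scheme can be sketched generically, with the similarity function left pluggable. Assumptions: `align` is a hypothetical helper, each concept of resource A is linked to at most its single best candidate in resource B, and any concept names and scores used with it would be invented for illustration.

```python
from __future__ import annotations
from typing import Callable

def align(concepts_a: list[str], concepts_b: list[str],
          sim: Callable[[str, str], float], threshold: float) -> dict[str, str]:
    """Link each concept of resource A to its most similar concept of resource B,
    but only when that best similarity exceeds the threshold."""
    alignment: dict[str, str] = {}
    if not concepts_b:
        return alignment
    for a in concepts_a:
        best = max(concepts_b, key=lambda b: sim(a, b))
        if sim(a, best) > threshold:
            alignment[a] = best
    return alignment
```

The threshold trades precision for recall: concepts whose best candidate is only weakly similar stay unaligned instead of being forced onto a wrong match.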
SemAlign: Cross-resource Concept Alignment
[Pilehvar and Navigli, ACL 2014]
We combine two different similarity measures:
Alignment Approaches
Gloss similarity (definition similarity)
• Strong baseline
• Falls short when:
– totally different wordings are used for the same concept
– we lack quality glosses
Example of two glosses for the same concept:
– “An area within a building enclosed by walls and floor and ceiling.”
– “A room is any distinguishable space within a structure.”
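A bag-of-words overlap shows why gloss similarity breaks down when the wordings differ. This is a simple Jaccard stand-in, not the definitional similarity actually used by SemAlign, and the stopword list is a hypothetical minimal one.

```python
from __future__ import annotations
import re

# A tiny, hypothetical stopword list; real systems use larger lists and lemmatization.
STOPWORDS = {"a", "an", "the", "and", "or", "by", "is", "any", "within", "of", "in"}

def content_words(gloss: str) -> set[str]:
    """Lower-cased alphabetic tokens of a gloss, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOPWORDS}

def gloss_similarity(g1: str, g2: str) -> float:
    """Jaccard overlap between the content words of two glosses."""
    w1, w2 = content_words(g1), content_words(g2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0

gloss_1 = "An area within a building enclosed by walls and floor and ceiling."
gloss_2 = "A room is any distinguishable space within a structure."
```

The two room glosses share no content word, so `gloss_similarity(gloss_1, gloss_2)` is 0.0 even though both describe the same concept.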
SemAlign: structural similarity
Structural similarity
SemAlign: structural similarity
[Figure: the WordNet sense “1. paper -- a material made of cellulose pulp derived mainly from wood or rags or certain grasses.” and a candidate Wikipedia concept, each placed in its own semantic network (WordNet and Wikipedia) among related concepts such as sheet, cellulose, fiber and material]
SemAlign: Core
Structural Similarity with Personalized PageRank
[Pilehvar and Navigli, ACL 2014]
Personalized PageRank
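Personalized PageRank can be computed by power iteration: a random surfer follows an out-edge with probability alpha and otherwise jumps back to the seed node(s). The sketch below is a generic textbook formulation with assumed defaults (alpha = 0.85, 50 iterations), not SemAlign's code; `signature` simply sorts nodes by probability to obtain a semantic signature.

```python
from __future__ import annotations

def personalized_pagerank(graph: dict[str, list[str]], seeds: set[str],
                          alpha: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power iteration for a random walk that follows an out-edge with probability
    alpha and restarts from a (uniformly chosen) seed with probability 1 - alpha."""
    nodes = set(graph) | set(seeds) | {v for outs in graph.values() for v in outs}
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            # Dangling nodes hand their mass back to the seeds, so total mass stays 1.
            targets = outs if outs else list(seeds)
            share = alpha * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        rank = nxt
    return rank

def signature(rank: dict[str, float]) -> list[str]:
    """Semantic signature: nodes sorted by decreasing PPR probability."""
    return sorted(rank, key=rank.get, reverse=True)
```

Seeding the walk at paper in a small graph such as paper–cellulose–fiber–sheet concentrates probability on paper's close neighbours, and nodes unreachable from the seed end up with probability 0.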
SemAlign: signature comparison
Structural similarity
Semantic Signature Comparison
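Comparing Semantic Signatures: Weighted Overlap [Pilehvar et al., ACL 2013]
• We calculate the following formula:
$$\mathrm{WO}(S_1, S_2) = \frac{\sum_{h \in O} \left(r_h^1 + r_h^2\right)^{-1}}{\sum_{i=1}^{|O|} (2i)^{-1}}$$
• where O is the set of dimensions shared by the signatures S_1 and S_2, and r_h^k is the rank of dimension h in signature S_k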
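Weighted Overlap, as we understand it from Pilehvar et al. (2013), compares two signatures through the ranks of the dimensions O they share: WO(S1, S2) = Σ_{h∈O} (r_h^1 + r_h^2)^{-1} / Σ_{i=1}^{|O|} (2i)^{-1}. The denominator is the value reached when the shared dimensions occupy the same top ranks in both signatures, so WO lies in [0, 1]. The sketch assumes each signature is given as a list of dimensions already sorted by decreasing probability; `weighted_overlap` is a hypothetical name.

```python
from __future__ import annotations

def weighted_overlap(sig1: list[str], sig2: list[str]) -> float:
    """Weighted Overlap of two semantic signatures, each a list of dimensions
    sorted by decreasing probability (the first element has rank 1)."""
    rank1 = {d: i + 1 for i, d in enumerate(sig1)}
    rank2 = {d: i + 1 for i, d in enumerate(sig2)}
    overlap = rank1.keys() & rank2.keys()
    if not overlap:
        return 0.0
    numerator = sum(1.0 / (rank1[d] + rank2[d]) for d in overlap)
    # Best case: the |O| shared dimensions sit at ranks 1..|O| in both signatures.
    denominator = sum(1.0 / (2 * i) for i in range(1, len(overlap) + 1))
    return numerator / denominator
```

For instance, ["a", "b", "c"] and ["b", "a", "d"] share a and b at swapped ranks, giving (1/3 + 1/3) / (1/2 + 1/4) = 8/9: ranks matter, not just set overlap.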
BabelNet 3.0 is now out: http://babelnet.org
BabelNet goes at a faster pace than I can cope with
Key fact!
Anatomy of BabelNet 3.0
271 languages covered (including Latin!)
List at http://babelnet.org/stats
Anatomy of BabelNet 3.0
271 languages covered (including Latin!)
13.8M Babel synsets
6.4M concepts, 7.4M Named Entities
117M word senses
355M semantic relations (26 edges per synset on avg.)
11M synset-associated images
40M textual definitions
• Seamless integration of:
• WordNet 3.0
• Wikipedia
• Wikidata
• Wiktionary
• OmegaWiki
• Open Multilingual WordNet [Bond and Foster, 2013]
• Translations for all open-class parts of speech
• 1.1B RDF triples available via SPARQL endpoint
New 3.0 version out!
BabelNet 1.1.1 2.0 2.5 3.0
1. From six to 50 to 271 languages;
2. From two resources to six;
3. From 5M to 9.3M to 13.8M synsets;
4. From 50M to 68M to 117M word senses;
5. From 140M to 262M to 355M semantic relations.
WordNet+OpenMultilingualWordNet+Wikipedia+…
+OmegaWiki+automatic translations…
+textual definitions
More definitions+Wikipedia categories+…
+images
Evaluations: I (might) have to go fast here!
WordNet-Wikipedia mapping accuracy
Overall quality of the mapping: ~90%
Note: this concerns only those 50k synsets in the intersection
We are not alone in the (resource) universe!
BabelNet: a Very Large Multilingual Ontology. Roberto Navigli
We are not alone in the (resource) universe!
• DBpedia [Bizer et al. 2009]: a resource obtained from structured information in Wikipedia
– «Describes 3.77M things»
– Core of the Linked Open Data Cloud
• YAGO [Suchanek et al. 2007]
– «Contains 10M entities and 120M facts about these entities»
– Links Wikipedia categories to WordNet synsets
• MENTA [de Melo and Weikum, 2010]
– A «multilingual taxonomy with 5.4M entities»
• WikiNet [Nastase and Strube, 2013]
– Semantic network connecting Wikipedia entities
– «3M concepts and 38+M relations»
• Freebase (http://freebase.com): collaborative effort
– Structured data; started from Wikipedia, MusicBrainz, ChefMoz, etc.
Hands-on Session: the BabelNet Java API 3.0
Part 2b:
Structuring Knowledge
[Figure: the WordNet excerpt around {car, auto, automobile, machine, motorcar}, with concepts connected by is-a and has-part relations]
(The nominal part of) WordNet is structured as
a taxonomy!
The Wikipedia structure
• Article pages: ~4M
• Category pages: ~700K
Two noisy graphs with no explicit hypernym relation.
The Wikipedia structure: an example
[Figure: a page graph and a category graph. Pages include, e.g., Mickey Mouse, Donald Duck and Superman; categories include, e.g., Disney comics characters, Fictional characters by medium and Comics by genre. Other nodes: Funny Animal, Cartoon, Disney comics, Disney character, Fictional characters, The Walt Disney Company]
Our goal
To automatically create a Wikipedia Bitaxonomy
for Wikipedia pages and categories in a
simultaneous fashion.
KEY IDEA: the page and category levels are mutually beneficial for inducing a wide-coverage and fine-grained integrated taxonomy
The Wikipedia Bitaxonomy: an example
[Figure: the pages and categories of the previous example (e.g., Mickey Mouse, Donald Duck, Disney comics characters, Fictional characters), now connected by is-a edges within both the page and the category taxonomies]
A 3-phase method
Starting from two noisy graphs
1. The WiBi Page taxonomy
Assumption
• The first sentence of a page is a good definition
(also called gloss)
The WiBi Page taxonomy
1. [Syntactic step]
Extract the hypernym lemma
from a page definition using
a syntactic parser;
2. [Semantic step]
Apply a set of linking
heuristics to disambiguate
the extracted lemma.
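Example: “Scrooge McDuck is a character […]”
• Syntactic step: dependency arcs nn(McDuck, Scrooge), nsubj(character, McDuck), cop(character, is) → hypernym lemma: character
• Semantic step: link the lemma character to the appropriate page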
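The syntactic step can be illustrated on the example above with a hand-written dependency parse using the same Stanford-style labels (nn, nsubj, cop); no parser is invoked. The rule used here, taking the word that governs both an nsubj and a cop dependent, is one simple heuristic for copular definitions and an assumption of this sketch; WiBi's actual extraction may handle more patterns. `hypernym_lemma` is a hypothetical name.

```python
from __future__ import annotations
from typing import Optional

def hypernym_lemma(tokens: list[str], arcs: list[tuple[str, int, int]]) -> Optional[str]:
    """arcs are (label, head index, dependent index) with 0-based token indices.
    In a copular definition "X is a Y", the hypernym Y heads both an nsubj and a cop arc."""
    heads_nsubj = {head for label, head, _ in arcs if label == "nsubj"}
    heads_cop = {head for label, head, _ in arcs if label == "cop"}
    candidates = sorted(heads_nsubj & heads_cop)
    return tokens[candidates[0]] if candidates else None

# Hand-built parse of "Scrooge McDuck is a character".
tokens = ["Scrooge", "McDuck", "is", "a", "character"]
arcs = [("nn", 1, 0), ("nsubj", 4, 1), ("cop", 4, 2), ("det", 4, 3)]
```

Here `hypernym_lemma(tokens, arcs)` returns character, which the semantic step would then disambiguate; a sentence without a copula yields None.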
The story so far
1. Noisy page graph → Page taxonomy
2. The Bitaxonomy algorithm
The Bitaxonomy algorithm
The information available in the two taxonomies
is mutually beneficial
• At each step exploit one taxonomy to update
the other and vice versa
• Repeat until convergence
The Bitaxonomy algorithm
Starting from the page taxonomy
[Figure: pages Real Madrid F.C., Atlético Madrid and Football team; categories Football clubs in Madrid, Football clubs and Football teams]
The Bitaxonomy algorithm
Exploit the cross links to infer hypernym relations in the category taxonomy
[Figure: same pages and categories as above]
The Bitaxonomy algorithm
Take advantage of cross links to infer back is-a relations in the page taxonomy
[Figure: same pages and categories as above]
The Bitaxonomy algorithm
Use the relations found in the previous step to infer new hypernym edges
[Figure: same pages and categories as above]
The Bitaxonomy algorithm
Mutual enrichment of both taxonomies until convergence
[Figure: same pages and categories as above]
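The mutual enrichment loop can be sketched as a fixpoint computation. The projection rules are simplified assumptions of ours, not WiBi's actual scoring: a page edge p is-a q yields a category edge c is-a d when p belongs to c and q is the main article of d, and a category edge c is-a d yields p is-a (main article of d) for every page p in c. `bitaxonomy` and the `main_page` mapping are hypothetical.

```python
from __future__ import annotations

def bitaxonomy(page_isa: set[tuple[str, str]], cat_isa: set[tuple[str, str]],
               page_cats: dict[str, set[str]], main_page: dict[str, str],
               max_iters: int = 10) -> tuple[set[tuple[str, str]], set[tuple[str, str]]]:
    """page_isa / cat_isa: (hyponym, hypernym) edges; page_cats: page -> categories;
    main_page: category -> its main article. Returns the enriched edge sets."""
    cat_of_main = {p: c for c, p in main_page.items()}
    page_isa, cat_isa = set(page_isa), set(cat_isa)   # do not mutate the inputs
    for _ in range(max_iters):
        # Pages -> categories: p is-a q, p in c, q is the main page of d  =>  c is-a d
        new_cat = {(c, cat_of_main[q]) for p, q in page_isa if q in cat_of_main
                   for c in page_cats.get(p, ()) if c != cat_of_main[q]}
        # Categories -> pages: c is-a d, p in c  =>  p is-a main page of d
        new_page = {(p, main_page[d]) for c, d in cat_isa | new_cat if d in main_page
                    for p, cats in page_cats.items() if c in cats and p != main_page[d]}
        if new_cat <= cat_isa and new_page <= page_isa:
            break  # convergence: neither taxonomy gained an edge
        cat_isa |= new_cat
        page_isa |= new_page
    return page_isa, cat_isa
```

With the football example, the edge Real Madrid F.C. is-a Football team first yields Football clubs in Madrid is-a Football teams, which in turn yields Atlético Madrid is-a Football team; the next pass adds nothing, so the loop stops.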
The story so far
2
3. The WiBi category taxonomy refinement
Category taxonomy refinement
Some categories are affected by structural problems.
[Figure: categories Comics characters by protagonist, Comics characters and Garfield characters; annotation: “No pages associated!”]
Category taxonomy evaluation: coverage
+50% categories covered!
WiBi: Experimental Setup
We created 2 datasets:
o 1000 randomly sampled pages;
o 1000 randomly sampled categories.
Each item was annotated with the most suitable
generalization (lemma+page or category).
Other resources to compare with
WikiNet
MENTA
WikiTaxonomy
Page Taxonomy Comparison
Category Taxonomy Comparison
“Football in Catalonia” is-a “entity#n#1”
“Human height” is-a “entity#n#1”
Part 3:
Addressing ambiguity
Motivation
• Web content is available in many languages
• Information should be extracted and processed
independently of the source/target language
• This could be done automatically by means of high-
performance multilingual text understanding
Motivation
One of the key challenges of multilingual text
understanding concerns the effective treatment of one of
the fundamental aspects of language:
Ambiguity!
Word Sense Disambiguation and Entity Linking
Thomas and Mario are strikers playing in Munich
Entity Linking: The task
of discovering mentions
of entities within a text
and linking them to a
knowledge base.
WSD: The task aimed at
assigning meanings to word
occurrences within text.
Word Sense Disambiguation in a Nutshell
strikers
(target word)
“Thomas and Mario are strikers playing in Munich”
(context)
WSD
system
knowledge
sense of target word
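As one concrete instance of the knowledge-based approach, here is a simplified Lesk-style sketch: choose the sense whose gloss shares the most words with the context. The two glosses for striker are made up for illustration, and this is a classic baseline rather than the method of any system presented in this tutorial.

```python
from __future__ import annotations
import re

def words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def simplified_lesk(context: str, glosses: dict[str, str]) -> str:
    """Return the sense whose gloss shares the most words with the context."""
    ctx = words(context)
    return max(glosses, key=lambda sense: len(ctx & words(glosses[sense])))

# Hypothetical mini sense inventory for "striker".
STRIKER = {
    "striker (football)": "a forward player in football whose job is playing in attack",
    "striker (labor)": "a worker who is on strike against an employer",
}
```

On “Thomas and Mario are strikers playing in Munich” the football gloss overlaps on playing and in, while the labor gloss shares nothing, so the football sense wins. Such overlap counts are brittle, which is one reason richer knowledge (e.g., semantic networks) helps.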
Main references
A complete survey of the field: Navigli R. Word Sense Disambiguation: a Survey. ACM
Computing Surveys, 41(2), ACM Press, 2009, pp. 1-69.
WSD book: Agirre E. and Edmonds P. Word Sense Disambiguation:
Algorithms and Applications, New York, USA, Springer, 2006.
An earlier survey: Ide N. and Véronis J. Word Sense Disambiguation: The
State of The Art. Computational Linguistics, 24(1), 1998, pp. 1-40.
WSD: main approaches
• Supervised WSD
– Frames the problem as a classification task
– Relies on hand-labeled training sets
• Knowledge-based WSD
– Uses knowledge resources to identify the best senses for words in context
– Typically, it does not need a training phase and relies on an existing inventory of senses
• Word Sense Discrimination / Induction
– Unsupervised WSD: clustering
– Does not need manually-tagged datasets
– Can make the task more difficult to evaluate
Supervision: labeled data vs.
knowledge
Entity Linking in a Nutshell
Thomas
(target mention)
“Thomas and Mario are strikers playing in Munich”
(context)
EL
system
knowledge
Named Entity
Entity Linking
EL encompasses a set of similar tasks:
• Named Entity Disambiguation: linking entity mentions
in a text to a knowledge base;
• Wikification: automatically annotating a text by linking
its relevant fragments to the appropriate Wikipedia articles.
Entity Linking
State-of-the-art approaches are based on the following
concepts:
• Collective vs. independent disambiguation of mentions;
• Enforcing semantic coherence among the chosen
named entities;
• Efficiency: there are orders of magnitude more named
entities than word senses!
State-of-the-art EL systems
• AIDA (Hoffart et al., 2011): a graph-based framework for the
exploitation of similarity measures between candidate entities;
• KORE (Hoffart et al., 2012): a graph-based similarity measure
integrated with key phrases contained within the context to
disambiguate entities;
• Tagme (Ferragina and Scaiella, 2012): a combination of the
Milne-Witten measure (hyperlink-based similarity on Wikipedia) with the
commonness of an entity;
• Wikifier (Cheng and Roth, 2013): a global and local approach
based on the TF-IDF score combined with hyperlinks in Wikipedia;
• DBpedia Spotlight (Mendes et al., 2011): a generative model
based on counts obtained from manually disambiguated Wikipedia
hyperlinks (high precision, low recall).
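The Tagme entry combines two Wikipedia-derived signals. A minimal Java sketch of both follows, assuming the in-link sets and link counts have already been extracted from a Wikipedia dump. The class and method names are hypothetical, and how Tagme actually weighs and combines the two signals is not shown here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two signals named for Tagme. Inputs (in-link sets,
// link counts) are assumed to be precomputed from a Wikipedia dump.
class WikiRelatedness {

    /**
     * Milne-Witten relatedness, clamped to [0, 1]:
     * 1 - (log max(|A|,|B|) - log |A intersect B|) / (log |W| - log min(|A|,|B|)),
     * where A and B are the sets of articles linking to each article and |W| is
     * the total number of articles.
     */
    static double milneWitten(Set<String> inLinksA, Set<String> inLinksB, long totalArticles) {
        Set<String> common = new HashSet<>(inLinksA);
        common.retainAll(inLinksB);
        if (common.isEmpty()) {
            return 0.0; // no shared in-links: treat the articles as unrelated
        }
        double maxSize = Math.max(inLinksA.size(), inLinksB.size());
        double minSize = Math.min(inLinksA.size(), inLinksB.size());
        double distance = (Math.log(maxSize) - Math.log(common.size()))
                / (Math.log(totalArticles) - Math.log(minSize));
        return Math.max(0.0, Math.min(1.0, 1.0 - distance));
    }

    /** Commonness: share of the anchor's link occurrences that point to this entity. */
    static double commonness(long linksToEntity, long linksWithAnchor) {
        return linksWithAnchor == 0 ? 0.0 : (double) linksToEntity / linksWithAnchor;
    }
}
```

Two articles with identical in-link sets get relatedness 1.0, and articles sharing no in-links get 0.0; commonness is simply a conditional frequency, e.g. 3 of 4 links with a given anchor pointing to the entity gives 0.75.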
The multilingual aspect of disambiguation
• In both tasks, WSD and EL, knowledge-based
approaches have been shown to perform well
• What about multilinguality?
• What kinds of resources are available?
Open Multilingual WordNet
BabelNet as a Multilingual Inventory for:
Concepts
Calcio in Italian can denote different concepts:
Named Entities
The string Mario can refer to different things, such as
the video game character, a soccer player (Mario Gómez)
or even a music album
Calcio / Kick in BabelNet 2.5
Calcio / Calcium in BabelNet 2.5
Calcio / Soccer in BabelNet 2.5
Word Sense Disambiguation in a Nutshell
striker
(target word)
“Thomas and Mario are strikers playing in Munich”
(context)
WSD
system
knowledge
sense of target word
Entity Linking in a Nutshell
Thomas
(target mention)
“Thomas and Mario are strikers playing in Munich”
(context)
Entity Linking
system
Named Entity
knowledge
Disambiguation and Entity Linking together!
BabelNet is a huge multilingual inventory
for both word senses and named entities!
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Objective 2: use all languages to disambiguate one
So what?
Babelfy: A Joint approach to WSD and EL
[Moro et al., TACL 2014]
• Based on Personalized PageRank, the state-of-the-art
method for graph-based WSD.
However, running it on huge graphs for each new input is too expensive.
• Idea: precompute semantic signatures for the nodes!
• The semantic signature of a node is the set of its most
relevant nodes in the graph, computed using random
walks with restart
Andrea Moro and Alessandro Raganato and Roberto Navigli. 2014. Entity
Linking meets Word Sense Disambiguation: a Unified Approach.
Transactions of the Association for Computational Linguistics (TACL), 2.
http://babelfy.org
Babelfy: A Joint approach to WSD and EL
[Moro et al., TACL 2014]
1. Precompute semantic signatures;
2. Given an input text, select all the possible candidate
meanings from BabelNet by matching mentions with
BabelNet lexicalizations;
3. Connect all the candidate meanings by using semantic
signatures;
4. Extract a dense subgraph containing semantically
coherent candidates;
5. Select the most connected candidate for each fragment
of text.
Step 1: Semantic Signatures
a. Start from one target vertex of the semantic network;
b. Randomly select a neighbor of the current vertex or
restart from the target vertex;
c. Keep the counts of hitting frequencies;
d. Take the most visited vertices.
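Steps a to d can be sketched as a Monte Carlo random walk with restart. This is a toy Java illustration under stated assumptions: the graph is a small adjacency-list map rather than BabelNet, and the step count, restart probability and top-k cut-off are illustrative parameters, not the values used by Babelfy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of steps a-d: a Monte Carlo random walk with restart over a
// toy graph given as adjacency lists. Parameter values are illustrative only.
class SemanticSignatures {

    static List<String> signature(Map<String, List<String>> graph, String target,
                                  int steps, double restartProb, int topK, long seed) {
        Random rnd = new Random(seed);
        Map<String, Integer> hits = new HashMap<>();
        String current = target; // a. start from the target vertex
        for (int i = 0; i < steps; i++) {
            List<String> neighbors = graph.getOrDefault(current, List.of());
            if (neighbors.isEmpty() || rnd.nextDouble() < restartProb) {
                current = target; // b. restart from the target vertex
            } else {
                current = neighbors.get(rnd.nextInt(neighbors.size())); // b. random neighbor
            }
            if (!current.equals(target)) {
                hits.merge(current, 1, Integer::sum); // c. count hitting frequencies
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        ranked.sort((x, y) -> Integer.compare(hits.get(y), hits.get(x))); // most visited first
        return new ArrayList<>(ranked.subList(0, Math.min(topK, ranked.size()))); // d. top-k
    }
}
```

On a toy graph where striker links to soccer player and offside, vertices close to striker are hit most often, while a vertex unreachable from striker (say, calcium) never enters its signature. Precomputing these lists once is what makes the later steps cheap at query time.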
Step 1: Semantic Signatures
[Figure: semantic signature example with the nodes striker, offside, athlete, sport, soccer player]
Step 2: Find all possible meanings of words
1. Exact Matching (good for WSD, bad for EL)
Thomas and Mario are strikers playing in Munich
[Figure: candidate meanings labeled Thomas, Norman Thomas, Seth]
They both have Thomas as one of their lexicalizations.
Step 2: Find all possible meanings of words
2. Partial Matching (good for EL)
Thomas and Mario are strikers playing in Munich
[Figure: candidate meanings labeled Thomas, Norman Thomas, Seth, plus Thomas Müller]
Thomas Müller is also retrieved: it has Thomas as a
subsequence of one of its lexicalizations.
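The two matching strategies can be contrasted with a small Java sketch. It assumes a toy lexicon mapping candidate meanings to lexicalizations (not the BabelNet API), a simplistic lowercase-and-split tokenizer, and reads "subsequence" as the mention's tokens occurring in order inside a lexicalization. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of candidate retrieval by exact vs. partial matching
// against a toy lexicon (meaning -> lexicalizations), not BabelNet itself.
class CandidateMatcher {

    // Lowercase and split on whitespace (a simplistic tokenizer, assumed for illustration).
    static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Exact matching: the mention equals a lexicalization, ignoring case.
    static boolean exactMatch(String mention, String lexicalization) {
        return tokens(mention).equals(tokens(lexicalization));
    }

    // Partial matching: the mention's tokens occur, in order, among the lexicalization's tokens.
    static boolean partialMatch(String mention, String lexicalization) {
        List<String> m = tokens(mention);
        int matched = 0;
        for (String tok : tokens(lexicalization)) {
            if (matched < m.size() && tok.equals(m.get(matched))) matched++;
        }
        return matched == m.size();
    }

    // Collect every candidate meaning with at least one matching lexicalization.
    static Set<String> candidates(String mention, Map<String, List<String>> lexicon, boolean partial) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : lexicon.entrySet()) {
            for (String lex : entry.getValue()) {
                if (partial ? partialMatch(mention, lex) : exactMatch(mention, lex)) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

With a toy lexicon in which Thomas (novel) and Seth Thomas both list Thomas as a lexicalization, exact matching for the mention Thomas returns just those two, while partial matching also returns Thomas Müller. This is why partial matching raises recall for Entity Linking at the price of more ambiguity.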
Step 2: Find all possible meanings of words
“Thomas and Mario are strikers playing in Munich”
• Thomas: Thomas (novel), Seth Thomas, Thomas Müller
• Mario: Mario Gómez, Mario (Album), Mario (Character)
• strikers: Striker (Movie), Striker (Video Game), striker (Sport)
• Munich: Munich (City), FC Bayern Munich, Munich (Song)
Ambiguity!
Step 3: Connect all the candidate meanings
Thomas and Mario are strikers playing in Munich
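One way to sketch Step 3 in Java, assuming each candidate is a (fragment, synset) pair and semantic signatures are given as sets of synset identifiers: an edge goes from u to v when v's synset belongs to u's signature and u and v come from different fragments. The different-fragment condition and all names here are assumptions for illustration; the record requires Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: candidates are (fragment, synset) pairs; an edge u -> v is
// added when v's synset is in u's semantic signature and the two candidates come
// from different text fragments. Toy data only.
class CandidateGraph {

    record Candidate(String fragment, String synset) {}

    static Map<Candidate, Set<Candidate>> build(List<Candidate> candidates,
                                                Map<String, Set<String>> signatures) {
        Map<Candidate, Set<Candidate>> edges = new LinkedHashMap<>();
        for (Candidate u : candidates) {
            Set<Candidate> out = new LinkedHashSet<>();
            Set<String> sig = signatures.getOrDefault(u.synset(), Set.of());
            for (Candidate v : candidates) {
                // Connect only candidates of different fragments via the signature.
                if (!u.fragment().equals(v.fragment()) && sig.contains(v.synset())) {
                    out.add(v);
                }
            }
            edges.put(u, out);
        }
        return edges;
    }
}
```

Because signatures are precomputed, building this graph is a set-membership test per candidate pair rather than a fresh random walk over BabelNet.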
Step 4: Extract a dense subgraph
Thomas and Mario are strikers playing in Munich
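A simplified Java sketch of how a dense subgraph might be extracted from the candidate graph. It uses a greedy heuristic chosen for illustration: repeatedly remove the lowest-degree candidate of the fragment that still has the most candidates, and keep the intermediate candidate set with the highest average degree. This is an assumption, not Babelfy's exact procedure; the graph is taken as a symmetric adjacency map and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified densest-subgraph heuristic (an illustration, not Babelfy's
// code): drop the least connected candidate of the most ambiguous fragment, and keep
// the intermediate candidate set with the highest average degree.
class DenseSubgraph {

    // Degree of v counting only neighbors that are still in the subgraph.
    static int degree(String v, Set<String> alive, Map<String, Set<String>> adj) {
        int d = 0;
        for (String n : adj.getOrDefault(v, Set.of())) {
            if (alive.contains(n)) d++;
        }
        return d;
    }

    static double averageDegree(Set<String> alive, Map<String, Set<String>> adj) {
        if (alive.isEmpty()) return 0.0;
        int sum = 0;
        for (String v : alive) sum += degree(v, alive, adj);
        return (double) sum / alive.size();
    }

    /** fragmentOf maps each candidate to its text fragment; adj must be symmetric. */
    static Set<String> densest(Map<String, String> fragmentOf, Map<String, Set<String>> adj) {
        Set<String> alive = new LinkedHashSet<>(fragmentOf.keySet());
        Set<String> best = new LinkedHashSet<>(alive);
        double bestAvg = averageDegree(alive, adj);
        while (true) {
            // Group the remaining candidates by fragment.
            Map<String, List<String>> byFragment = new TreeMap<>();
            for (String v : alive) {
                byFragment.computeIfAbsent(fragmentOf.get(v), k -> new ArrayList<>()).add(v);
            }
            // Pick the most ambiguous fragment (most remaining candidates, at least 2).
            List<String> mostAmbiguous = null;
            for (List<String> group : byFragment.values()) {
                if (group.size() > 1 && (mostAmbiguous == null || group.size() > mostAmbiguous.size())) {
                    mostAmbiguous = group;
                }
            }
            if (mostAmbiguous == null) break; // every fragment is down to one candidate
            // Remove its candidate with the lowest degree in the current subgraph.
            String weakest = mostAmbiguous.get(0);
            for (String v : mostAmbiguous) {
                if (degree(v, alive, adj) < degree(weakest, alive, adj)) weakest = v;
            }
            alive.remove(weakest);
            double avg = averageDegree(alive, adj);
            if (avg > bestAvg) {
                bestAvg = avg;
                best = new LinkedHashSet<>(alive);
            }
        }
        return best;
    }
}
```

On the running example, if Thomas Müller, Mario Gómez and FC Bayern Munich are mutually connected while Thomas (novel), Mario (Album) and Munich (Song) are isolated, the isolated candidates are pruned first and the densest set kept is the connected triangle. The returned set may still hold several candidates per fragment in general, which is why Step 5 is needed.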
Step 5: Select the most reliable meanings
• We take into account both the lexical coherence, in
terms of the number of fragments a candidate relates to,
and the semantic coherence, using a graph centrality
measure among the candidate meanings.
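The combination of lexical and semantic coherence can be sketched as a per-candidate score. In this hypothetical Java version, lexical coherence is the share of the other fragments a candidate is connected to, semantic coherence is plain degree centrality in the candidate graph, and the two are multiplied; the formula and names are assumptions for illustration, not Babelfy's published scoring.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of Step 5: score = lexical coherence x degree, and keep the
// best-scoring candidate for each fragment. The exact formula is an assumption.
class MeaningSelector {

    static Map<String, String> select(Map<String, String> fragmentOf, Map<String, Set<String>> adj) {
        int fragmentCount = new TreeSet<>(fragmentOf.values()).size();
        Map<String, String> chosen = new TreeMap<>();
        Map<String, Double> bestScore = new HashMap<>();
        for (Map.Entry<String, String> e : fragmentOf.entrySet()) {
            String v = e.getKey(), f = e.getValue();
            int degree = 0;
            Set<String> linkedFragments = new HashSet<>();
            for (String n : adj.getOrDefault(v, Set.of())) {
                String nf = fragmentOf.get(n);
                if (nf == null) continue; // ignore nodes that are not candidates
                degree++;
                if (!nf.equals(f)) linkedFragments.add(nf);
            }
            // Lexical coherence: how many of the other fragments v relates to.
            double lexical = fragmentCount > 1
                    ? (double) linkedFragments.size() / (fragmentCount - 1) : 1.0;
            // Semantic coherence: degree centrality in the candidate graph.
            double score = lexical * degree;
            if (!chosen.containsKey(f) || score > bestScore.get(f)) {
                chosen.put(f, v);
                bestScore.put(f, score);
            }
        }
        return chosen;
    }
}
```

Multiplying the two terms means a candidate must both touch many fragments and be well connected to win; a hub linked to a single fragment scores lower than a candidate spread across the whole sentence.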
Step 5: Select the most reliable meanings
Thomas and Mario are strikers playing in Munich
Step 5: Select the most reliable meanings
“Thomas and Mario are strikers playing in Munich”
• Thomas: Thomas (novel), Seth Thomas, Thomas Müller
• Mario: Mario Gómez, Mario (Album), Mario (Character)
• strikers: Striker (Movie), Striker (Video Game), striker (Sport)
• Munich: Munich (City), FC Bayern Munich, Munich (Song)
Experimental Results: Fine-grained (Multilingual) Disambiguation
Datasets: Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12
Experimental Results: Coarse-grained Word Sense Disambiguation
SemEval-2007 task 7 dataset:
Experimental Results: Entity Linking
http://babelfy.org
Multilingual Word Sense Disambiguation and Entity Linking – COLING 2014 Tutorial
Roberto Navigli and Andrea Moro
Babelfy: RESTful API
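// Obtain a Babelfy client that queries the online service
Babelfy bfy = Babelfy.getInstance(AccessType.ONLINE);
String inputText = "hello world, I'm a computer scientist";
// "key" is a placeholder for the access key; PARTIAL matching is the EL-friendly strategy of Step 2
Annotation annotations = bfy.babelfy("key", inputText, Matching.PARTIAL, Language.EN);
System.out.println("inputText: " + inputText);
System.out.println("annotations:");
// For each annotated fragment, print its text, then the id and content of the linked BabelNet synset
for (BabelSynsetAnchor annotation : annotations.getAnnotations()) {
    System.out.println(annotation.getAnchorText());
    System.out.println("\t" + annotation.getBabelSynset().getId() + "\t" + annotation.getBabelSynset());
}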
Hands-on Session: Babelfy
Annotating with BabelNet: all in one!
Annotating with BabelNet implies annotating with WordNet
and Wikipedia
(now also OmegaWiki, Open Multilingual WordNet,
Wiktionary and Wikidata!)
Key fact!
Publishing Structured Data as Linked Data
Hands-on Session: RDF & SPARQL
Go to:
http://babelnet.org:8084/sparql/
Retrieve all the RDF information of a synset
● For instance, given the synset:
● http://babelnet.org/2.0/s00000356n
DESCRIBE <http://babelnet.org/2.0/s00000356n>
Retrieve the senses of a given lemma in a certain language
● Given a word in a language, e.g. home in English,
retrieve all its senses and the corresponding
multilingual synsets:
SELECT DISTINCT ?sense ?synset WHERE {
?entries a lemon:LexicalEntry .
?entries lemon:sense ?sense .
?sense lemon:reference ?synset .
?entries rdfs:label "home"@EN .
}
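The queries in this session can also be sent programmatically. The sketch below follows the standard SPARQL 1.1 Protocol (the query text passed as the query parameter of an HTTP GET) and uses the JDK's java.net.http client (Java 11+). It assumes the endpoint address given above is still reachable and predefines the lemon: and rdfs: prefixes used in these queries; if it does not, add PREFIX declarations to the query text. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a SPARQL 1.1 Protocol query sent via HTTP GET.
// Assumes the endpoint is online and predefines the prefixes the query uses.
class SparqlClient {

    // Build "<endpoint>?query=<form-encoded query text>".
    static URI queryUri(String endpoint, String query) {
        return URI.create(endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    // Send the query and return the raw response body, requested as SPARQL JSON results.
    static String run(String endpoint, String query) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(queryUri(endpoint, query))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, calling SparqlClient.run("http://babelnet.org:8084/sparql/", q) with q set to the lemma query above would return the result set as a JSON string, which can then be handed to any JSON parser.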
Retrieve the translations of a given sense
● Given a sense, we want to obtain all its
translations:
● For instance, given the sense:
– http://babelnet.org/2.0/home_EN/s00044488n
SELECT ?translation WHERE {
<http://babelnet.org/2.0/home_EN/s00044488n> lexinfo:translation
?translation .
}
Retrieve license information about a sense
● For instance, given the sense:
– http://babelnet.org/2.0/home_EN/s00044488n
SELECT ?license WHERE {
<http://babelnet.org/2.0/home_EN/s00044488n> dcterms:license ?license .
}
Retrieve textual definitions in all languages
● For instance, given the synset:
● http://babelnet.org/2.0/s00000356n
SELECT DISTINCT ?language ?gloss ?license ?sourceurl WHERE {
<http://babelnet.org/2.0/s00000356n> bn-lemon:definition ?definition .
?definition lemon:language ?language .
?definition bn-lemon:gloss ?gloss .
?definition dcterms:license ?license .
?definition dc:source ?sourceurl .
}
Retrieve a synset’s hypernyms
● For instance, given the synset:
● http://babelnet.org/2.0/s00000356n
SELECT ?broader WHERE {
<http://babelnet.org/2.0/s00000356n> skos:broader ?broader
}
Conclusion
To summarize
• I have taken you through a tour of:
A very large multilingual semantic network: BabelNet
A state-of-the-art WSD and EL system: Babelfy
Actually there’s much much
much more!
Fei, thanks for this crazy photo!
Next feature in BabelNet! Semantic Predicates (SPred)
[Flati and Navigli, ACL 2013]
cup of *
Classes shown for cup of *: wine, coffee, beverage, water
• Earl Grey tea, Green tea, Indian tea, Black tea, Tea, …
• Water, Seawater, …
• Coffee, Turkish coffee, Drip coffee, Espresso, Cappuccino, Caffè latte, Decaffeinated coffee, …
• Wine, Sack, White wine, Red wine, Claret, Kosher wine, Madeira wine, Wine in China, …
Classes sorted by relevance!
[Figure: Bored worker | Enthusiastic player | Enthusiastic worker]
We want people to play, play, and play!
Real videogames «with a purpose»
Having fun with annotations
[Vannella et al., ACL 2014; Jurgens and Navigli, TACL 2014]
Case Study #1: Validate and Extend
Semantic Relations in BabelNet [Vannella et al., ACL 2014]
• Given a pair of concepts, decide if they are
related
• “doctor” and “medicine”
• “doctor” and “USA”
• Used for both new and existing relations
Game #1: Data Design
• Pick a target BabelNet synset
• Pick a related synset and a lemma from that synset
(validation case)
• Show the gloss of the target synset as a clue
• We know what synset the other word comes from
• Generate true negative data automatically by picking
random words related to other synsets
• Low probability of being related
• Lets us measure player accuracy
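The data design above can be sketched in Java as follows. Item and field names are hypothetical; negatives are sampled without replacement from lemmas of other synsets and, as in the design, assumed to be unrelated, so player accuracy is measured as the share of those items the player rejects. A record is used, so Java 16 or later is assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: validation items pair a target synset with a lemma from a
// related synset; automatic negatives use lemmas of random other synsets, which are
// assumed unrelated and therefore let us score players.
class GameItems {

    record Item(String target, String lemma, boolean knownNegative) {}

    static List<Item> build(String target, List<String> relatedLemmas,
                            List<String> otherLemmas, int negatives, long seed) {
        Random rnd = new Random(seed);
        List<Item> items = new ArrayList<>();
        for (String lemma : relatedLemmas) {
            items.add(new Item(target, lemma, false)); // validation case
        }
        List<String> pool = new ArrayList<>(otherLemmas);
        Collections.shuffle(pool, rnd); // sample negatives without replacement
        for (String lemma : pool.subList(0, Math.min(negatives, pool.size()))) {
            items.add(new Item(target, lemma, true)); // automatic negative
        }
        Collections.shuffle(items, rnd); // do not reveal which items are negatives
        return items;
    }

    /** Share of the automatic negatives that a player correctly judged as unrelated. */
    static double accuracyOnNegatives(List<Item> items, Map<Item, Boolean> saidRelated) {
        int total = 0, correct = 0;
        for (Item it : items) {
            Boolean answer = saidRelated.get(it);
            if (!it.knownNegative() || answer == null) continue;
            total++;
            if (!answer) correct++;
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

For the example pair above, build("doctor", List.of("medicine"), List.of("USA", "banana", "volcano"), 2, seed) mixes the validation item doctor/medicine with two random negatives; a player who rejects both negatives scores 1.0.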
Game #1: Infection (a zombie survival game)
Game #1: Presenting a clue to the player
Game #1: Gameplay
Game #1: Results
• ~250 players made 6.5K annotations in a two-week period
• Better performance than crowdsourcing, with zero cost*
• Gamers spotted 67.8% of true positive relations
compared with 16.9% on CrowdFlower
Players were very
accurate at spotting
false negative items!
Case Study #2: Validate and Extend
Image-Sense Associations in BabelNet [Vannella et al., ACL 2014]
• Given a concept and an image, decide if the
image depicts the concept
Game #2: The Knowledge Towers (Action RPG)
Game #2: Showing the concept hint
Game #2: Gameplay
Game #2: Results
• ~200 players made 6.3K annotations in a two-week period
• Better performance than crowdsourcing, with zero cost*
• Gamers spotted 82.5% of true positive images compared
with 59.5% on CrowdFlower
Players were very
accurate at spotting
false negative items!
Thanks or… grazie (Italian for “thanks”)!
http://lcl.uniroma1.it
http://babelnet.org
http://babelfy.org
Google group: babelnet-group