The Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen
Irion Technologies/Vrije Universiteit Amsterdam
6th International Plain Language Conference, October 11-14th, 2007, Amsterdam
6th International PLAIN language Conference
11-14th October, Amsterdam
Overview
• Problem: effective language and communication
  - From human to human
  - From human to machine
  - From machine to machine
  - From human to machine and back to human, maybe via other machines...
• Solution: anchoring language to universal meaning
  - Wordnets: networks of words related through meaning
  - The Global Wordnet Grid: wordnets for languages connected to each other through an ontology
• Future:
  - Equal access to the knowledge and information on the Internet for all people, regardless of language and background
  - Systems that start to understand language
Problem
Language is inherently vague and ambiguous
• Communication through language mediates between the expectations of the Speaker and the Hearer => half a word is enough
• Language is not fully descriptive but minimally sufficient:
  - Do not bother the Hearer with information that is already known => rely on background knowledge
  - Use a minimal set of words and expressions to avoid memory overload => words and expressions have multiple meanings
Concept in our head
Plato with beard
"gavagai"
W.V.O.Quine (1964): inscrutability of reference
rabbit with carrots and rosemary
divine appearance announcing spring
sweet pet wanna hug
Understanding is fundamentally impossible
Full understanding is fundamentally impossible. BUT?
• People do communicate...
• People even communicate with computers...
• As long as language is effective: meaning = to have the desired effect!
• Link language to useful content!
What is effective computer-mediated language?
• Computers store information and knowledge in textual form:
  - People search for information and knowledge by 'querying' computers
  - Effective Computer-Mediated Communication (CMC) = find what you need and nothing else
• Computers analyze information and knowledge:
  - Collect data and send alerts, reports and facts
• Computers connect people:
  - Support communication between people by analyzing communication or translating languages
[Diagram: an Information Seeker turns a Concept into a Query, and an Information Provider turns a Concept into Information. Both are expressions in language, made of words, which are in turn strings. The two sides meet in an Index of Strings (ape, ..., energy, ..., mass, ..., zebra).]
[Diagram: the Information Seeker's query is "my cell phone ..."; the Information Provider's text is "... mobile". The Index of Strings contains "mobile".]
Conceptual match, linguistic mismatch
[Diagram: the Information Seeker's query is "my cell phone ..."; the Information Provider's text is "... nerve cells". The Index of Strings contains "cell".]
Conceptual mismatch, linguistic match
[Diagram: the Information Seeker's query is "police cell ..."; the Information Provider's text is "... nerve cells". The Index of Strings contains "cell".]
Conceptual mismatch, linguistic match
[Diagram: the Information Seeker's query is "neuron ..."; the Information Provider's text is "... nerve cells". The Index of Strings contains "cell".]
Conceptual match, linguistic mismatch
Recall & Precision
[Diagram: the query "cell" is sent to a search engine over a database with all documents, including documents on "cell phone", "mobile phones", "nerve cell" and "police cell". The set of found documents and the set of relevant documents overlap in an intersection.]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
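The two ratios can be made concrete with sets of document identifiers. This is a minimal sketch, not the evaluation setup of Blair & Maron; the function name `recall_precision` and the relevance judgments are invented for illustration.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for one query, given two sets of documents."""
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Assumption: the user who types "cell" is looking for phones.
relevant = {"cell phone", "mobile phones"}
# A plain string match on "cell" finds every document containing that string.
found = {"cell phone", "nerve cell", "police cell"}

recall, precision = recall_precision(found, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case the string match misses "mobile phones" (lower recall) and returns two unwanted senses of "cell" (lower precision).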
Useless dialogues with Alice-bot
It is useful to anchor meaning!
Anchoring already takes place all over the world through standardization:
• measures and units: meter, liter, kilo
• terminological databases, legal definitions, contracts
• international cooperation
• ontologies: definition of the meaning of concepts in a formal knowledge representation system (first-order logic), so that a computer can reason with it
Solution
How can we anchor the meaning of words?
• We can anchor words to each other: a semantic network or wordnet
• We can anchor words to logical implications: a formal ontology
Relational model of meaning
[Diagram: two panels, each showing the words man, woman, boy, girl, cat, kitten, dog, puppy and animal.]
Princeton WordNet
• Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
• Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
• Semantic relations between concepts
• Covers over 100,000 concepts and over 120,000 English words
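The synset idea can be sketched as a small data structure. This is an illustration only, not the Princeton WordNet file format or API: the class `Synset`, the function `hypernym_chain` and the simplified four-step hierarchy are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Synset:
    """A set of synonyms that together represent one concept."""
    words: tuple                          # first word is used as the label
    hypernym: Optional["Synset"] = None   # the more general concept, if any


def hypernym_chain(synset):
    """Walk upward from a synset to the most general concept."""
    chain = []
    while synset is not None:
        chain.append(synset.words[0])
        synset = synset.hypernym
    return chain


# Simplified fragment; the real hierarchy has more intermediate levels.
conveyance = Synset(("conveyance", "transport"))
vehicle = Synset(("vehicle",), hypernym=conveyance)
motor_vehicle = Synset(("motor vehicle", "automotive vehicle"), hypernym=vehicle)
car = Synset(("car", "auto", "automobile", "machine", "motorcar"), hypernym=motor_vehicle)

# Word index: every synonym leads to the same synset object.
index = {}
for s in (conveyance, vehicle, motor_vehicle, car):
    for word in s.words:
        index.setdefault(word, []).append(s)

print(hypernym_chain(index["auto"][0]))
```

Because "auto", "automobile" and "motorcar" all point to one synset, a relation stated once for the concept holds for every synonym.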
Wordnet: a network of semantically related words
[Diagram: the following synsets, linked by semantic relations]
{conveyance; transport}
{vehicle}
{motor vehicle; automotive vehicle}
{car; auto; automobile; machine; motorcar}
{bumper}
{car door}
{car window}
{car mirror}
{armrest}
{doorlock}
{hinge; flexible joint}
{cruiser; squad car; patrol car; police car; prowl car}
{cab; taxi; hack; taxicab}
[Diagram: an Inter-Lingual-Index of English-based records (Car, Train, Vehicle, ...) sits in the center. It is linked to Domains (Transport: Road, Air, Water) and to the ontologies DOLCE and SUMO (Object, Device, TransportDevice). Each wordnet is linked to the index:
  English Words: vehicle; car, train
  Czech Words: dopravní prostředník; auto, vlak
  French Words: véhicule; voiture, train
  Estonian Words: liiklusvahend; auto, killavoor
  German Words: Fahrzeug; Auto, Zug
  Spanish Words: vehículo; auto, tren
  Italian Words: veicolo; auto, treno
  Dutch Words: voertuig; auto, trein]
Wordnet family
• Princeton WordNet (Fellbaum 1998): 115,000 concepts
• EuroWordNet (Vossen 1998): 8 languages
• BalkaNet (Tufis 2004): 6 languages
• Global Wordnet Association: all languages
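The role of the Inter-Lingual-Index can be sketched as a lookup table: each wordnet links its words to a shared record, and any language pair is reached through that record. This is a simplified illustration with one word per record and language; the dictionary layout and the function `translate` are assumptions, not the EuroWordNet database design.

```python
# Shared index record -> word per language (fragment taken from the diagram).
ILI = {
    "Vehicle": {"en": "vehicle", "nl": "voertuig", "fr": "véhicule", "de": "Fahrzeug"},
    "Car": {"en": "car", "nl": "auto", "fr": "voiture", "de": "Auto"},
    "Train": {"en": "train", "nl": "trein", "fr": "train", "de": "Zug"},
}


def translate(word, source, target):
    """Go from a source-language word to its index record, then to the target language."""
    for record, words in ILI.items():
        if words.get(source) == word:
            return words.get(target)
    return None  # the word is not linked to any record


print(translate("trein", "nl", "de"))
print(translate("voiture", "fr", "en"))
```

With a shared index, each new wordnet needs only its own links to the index rather than a separate mapping to every other language.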
Wordnets as autonomous language-specific structures
[Diagram comparing two hierarchies.
  Dutch Wordnet nodes: voorwerp {object}; lepel {spoon}; werktuig {tool}; tas {bag}; bak {box}; blok {block}; lichaam {body}.
  Wordnet1.5 nodes: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box.]
Complex equivalence relations
1. Multiple Targets (1:many)
   Dutch wordnet: schoonmaken (to clean) matches 4 senses of clean in WordNet1.5:
   • make clean by removing dirt, filth, or unwanted substances from
   • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   • remove in making clean; "Clean the spots off the rug"
   • remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
   Dutch wordnet: versiersel near_synonym versiering
   Target record: decoration
3. Multiple Targets and Sources (many:many)
   Dutch wordnet: toestel near_synonym apparaat
   Target records: machine; device; apparatus; tool
Complex equivalence relations
Gaps in the English WordNet:
• genuine, cultural gaps: concepts unknown in English culture:
  - Dutch: klunen, to walk on skates over land from one frozen body of water to the other
• pragmatic gaps: the concept is known but is not expressed by a single lexicalized form in English:
  - Dutch: kunstproduct = artifact substance <=> artifact object
From EuroWordNet to Global WordNet
• Global Wordnet Association: http://www.globalwordnet.org
• Biennial conference: India (2002), Czech Republic (2004), Korea (2006), Hungary (2008), ....
• Currently, wordnets exist for more than 40 languages, including: Arabic, Bantu, Basque, ...., Chinese, Bulgarian, Estonian, Hebrew, ...., Icelandic, Japanese, Kannada, Korean, Latvian, Latin, ...., Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, ...., Zulu
• Many languages are genetically and typologically unrelated
Some downsides
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another:
  - not linked
  - linked to different versions: 1.5, 1.6, 1.7, 2.0 and now 3.0, 3.1
  - linked with different relations
• Proprietary rights restrict free access and usage
• A lot of the semantics is duplicated
• Complex and obscure equivalence relations due to linguistic differences between English and other languages
Next step: Global WordNet Grid
[Diagram: an Inter-Lingual Ontology (Object, Device, TransportDevice) sits in the center. Each wordnet is linked to the ontology:
  English Words: vehicle; car, train
  Czech Words: dopravní prostředník; auto, vlak
  French Words: véhicule; voiture, train
  Estonian Words: liiklusvahend; auto, killavoor
  German Words: Fahrzeug; Auto, Zug
  Spanish Words: vehículo; auto, tren
  Italian Words: veicolo; auto, treno
  Dutch Words: voertuig; auto, trein]
The Ontology: main features
• Formal, artificial ontology serves as universal index of concepts
• List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations:
  - Lexicalization in a language is not sufficient to warrant inclusion in the ontology
  - Lexicalization in all or many languages may be sufficient
  - Ontological observations will be used to define the concepts in the ontology
• Concepts are related in a type hierarchy
• Concepts are defined with axioms: Knowledge Interchange Format (KIF), based on first-order predicate calculus and atomic elements
Concepts by ontological observations
• Types and Roles among the hyponyms of dog in Wordnet: husky, lapdog; toy dog; hunting dog; working dog; dalmatian, coach dog, carriage dog; basenji; pug, pug-dog; Leonberg; Newfoundland; Great Pyrenees; spitz; griffon, Brussels griffon, Belgian griffon; corgi, Welsh corgi; poodle, poodle dog; Mexican hairless; pooch, doggie, doggy, barker, bow-wow; cur, mongrel, mutt
• Current WordNet treatment:
  (1) a husky is a kind of dog
  (2) a husky is a kind of working dog
• What’s wrong? (2) is defeasible, (1) is not:
  *This husky is not a dog => RIGID TYPE
  This husky is not a working dog => ROLE, NON-RIGID
Ontology versus wordnet
• Hierarchy of disjoint types: Canine: PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
• Wordnet:
  - NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP
    ((instance x Poodle))
  - LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP
    ((instance x Canine) and (role x GuardingProcess))
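The distinction between names for types and labels for roles can be sketched as a small data structure. This is an illustration under assumptions: the class `Definition`, its methods and the `LEXICON` table are hypothetical, and the generated strings only imitate the KIF-like notation on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    type_: str                  # rigid type in the ontology, e.g. "Canine"
    role: Optional[str] = None  # non-rigid role, e.g. "GuardingProcess"

    def names_type(self):
        """True if the word simply names a type, False if it labels a role."""
        return self.role is None

    def to_kif(self, var="x"):
        clauses = [f"(instance {var} {self.type_})"]
        if self.role is not None:
            clauses.append(f"(role {var} {self.role})")
        return "(" + " and ".join(clauses) + ")"


# Words from three wordnets mapped to shared definitions.
LEXICON = {
    ("poodle", "EN"): Definition("Poodle"),
    ("poedel", "NL"): Definition("Poodle"),
    ("pudoru", "JP"): Definition("Poodle"),
    ("watchdog", "EN"): Definition("Canine", "GuardingProcess"),
    ("waakhond", "NL"): Definition("Canine", "GuardingProcess"),
    ("banken", "JP"): Definition("Canine", "GuardingProcess"),
}


def equivalents(word, lang):
    """Words in any wordnet that share the definition of the given word."""
    target = LEXICON[(word, lang)]
    return sorted(w for (w, l), d in LEXICON.items()
                  if d == target and (w, l) != (word, lang))


print(LEXICON[("watchdog", "EN")].to_kif())
print(equivalents("watchdog", "EN"))
```

Only `Poodle` and `Canine` would need to exist in the type hierarchy; "watchdog" is expressed as a combination of a type and a role and adds no new type.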
Properties of the Ontology
Minimal: terms are distinguished by essential properties only
Comprehensive: includes all distinct concept types of all Grid languages
Allows definitions via KIF of all words that express non-rigid, non-essential properties of types
Logically valid, allows inferencing
Ontology versus Wordnet
• Not added to the type hierarchy:
  {straathond}NL (a dog that lives in the streets)
  ((instance x Canine) and (habitat x Street))
• Added to the type hierarchy:
  {klunen}NL (to walk on skates from one frozen body of water to the next over land)
  KluunProcess => WalkProcess
  Axioms: (and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc…
  National dishes, customs, games, ....
Ontology versus Wordnet
• Refer to sets of types in specific circumstances, or to concepts that are dependent on these types; next to {rivierwater}NL (river water) there are many others:
  - {theewater}NL (water used for making tea)
  - {koffiewater}NL (water used for making coffee)
  - {bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena: gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints
KIF expression for gender marking
{teacher}EN ((instance x Human) and (agent x TeachingProcess))
{Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
{Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))
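One use of such definitions is to compute how words in different languages relate. The sketch below treats a word as covering another when both share the role and the second word's type is the same as, or a subtype of, the first. The `SUPERTYPE` fragment and the function names are assumptions made for illustration; the slides do not specify this procedure.

```python
# Assumed fragment of the type hierarchy: child type -> parent type.
SUPERTYPE = {"Man": "Human", "Woman": "Human"}

# (word, language) -> (type of x, process in which x is the agent)
DEFS = {
    ("teacher", "EN"): ("Human", "TeachingProcess"),
    ("Lehrer", "DE"): ("Man", "TeachingProcess"),
    ("Lehrerin", "DE"): ("Woman", "TeachingProcess"),
}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or lies below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False


def covers(general, specific):
    """True if every referent of `specific` is also a referent of `general`."""
    g_type, g_process = DEFS[general]
    s_type, s_process = DEFS[specific]
    return g_process == s_process and is_subtype(s_type, g_type)


print(covers(("teacher", "EN"), ("Lehrerin", "DE")))  # English word is more general
print(covers(("Lehrer", "DE"), ("teacher", "EN")))    # the reverse does not hold
```

The equivalence between "teacher" and the two German words then follows from the shared definitions rather than from a hand-made link between the wordnets.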
KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y)
buy: subj(y), direct obj(z), indirect obj(x)
FinancialTransaction:
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient z e))
The same process, but a different perspective through subject and object realization: "marry" is expressed by two verbs in Russian; apprendre in French can mean both teach and learn
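The perspective idea can be sketched by mapping the grammatical functions of each verb onto the participants of one shared event. This is an illustrative sketch: the table `PERSPECTIVE`, the function `to_event` and the example participants are invented, and the patient is assumed to be the direct object.

```python
# For each verb: grammatical function -> participant role in the shared event.
PERSPECTIVE = {
    "sell": {"subj": "source", "direct obj": "patient", "indirect obj": "destination"},
    "buy": {"subj": "destination", "direct obj": "patient", "indirect obj": "source"},
}


def to_event(verb, arguments):
    """Turn a verb with its syntactic arguments into a language-neutral event."""
    roles = PERSPECTIVE[verb]
    event = {"instance": "FinancialTransaction"}
    for function, filler in arguments.items():
        event[roles[function]] = filler
    return event


# "Mary sells the book to John" versus "John buys the book from Mary".
sold = to_event("sell", {"subj": "Mary", "direct obj": "book", "indirect obj": "John"})
bought = to_event("buy", {"subj": "John", "direct obj": "book", "indirect obj": "Mary"})

print(sold == bought)
```

Both sentences reduce to the same event record, so a system that reasons over the event does not need to know which verb was used.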
Advantages of the Global Wordnet Grid
• Shared and uniform world knowledge:
  - universal inferencing
  - uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to the knowledge:
  - better criteria for expressing knowledge
  - better criteria for understanding variation
Future
Language technology: a hole in one!
[Diagram labels: Synonyms, Semantic network, thesaurus; golfclub(s); Tiger Woods; golf sticks; golfclubs; Linguistic analysis; Golf at the club; clubs for golf]
Index concepts rather than words
Meaning of a word in context:
• Domain of the document: Juventus => football
• Topic of the paragraph: transfer scandal => business, crime
• Phrase: linguistically motivated combination of words: [wing player] => football player, in [police cell] => jail
• Topic of the query: Can I order chicken wings? => food
• Phrase: [chicken wings] => dish
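The difference between an index of strings and an index of concepts can be sketched as follows. This is a toy illustration: the concept names, the word-to-concept table and the one-phrase documents are invented, and the sketch assumes that phrase detection and disambiguation have already been done.

```python
from collections import defaultdict

# Hypothetical lexicon: phrase -> concept.
CONCEPTS = {
    "cell phone": "MobilePhone",
    "mobile": "MobilePhone",
    "nerve cell": "Neuron",
    "neuron": "Neuron",
    "police cell": "Jail",
}

# Each document is reduced to one already-recognized phrase.
DOCUMENTS = {"d1": "mobile", "d2": "nerve cell", "d3": "police cell"}

string_index = defaultdict(set)
concept_index = defaultdict(set)
for doc_id, phrase in DOCUMENTS.items():
    for token in phrase.split():
        string_index[token].add(doc_id)
    concept_index[CONCEPTS[phrase]].add(doc_id)


def search_strings(query):
    """Return documents that share at least one string with the query."""
    hits = set()
    for token in query.split():
        hits |= string_index.get(token, set())
    return hits


def search_concepts(query):
    """Return documents that express the same concept as the query."""
    return concept_index.get(CONCEPTS.get(query), set())


print(search_strings("cell phone"), search_concepts("cell phone"))
print(search_strings("neuron"), search_concepts("neuron"))
```

For "cell phone" the string index returns the nerve cell and police cell documents, while the concept index returns the document that says "mobile". For "neuron" only the concept index finds the nerve cell document.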
Expansion from a type to roles
[Diagram: a network around dog with the nodes watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy and bitch. One part is labeled "Expansion with clear hyponymy".]
Expansion from a role to types and other roles
[Diagram: a network around dog with the nodes watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy and bitch. One part is labeled "Expansion with clear hyponymy".]
[Diagram labels: Ontology; Texts; Objects in reality; Thought; Expression: 携帯電話 (keitaidenwa, mobile phone); Knowledge & information]
Useful and effective behavior:
- reason over knowledge
- collect information and data
- deliver services and be helpful
Automotive ontology: (http://www.ontoprise.de)
Dialogue system
[Diagram: a Dialogue Manager with a User Model (Intention, Satisfaction, Emotion) and an Information State (Positive, Negative, Relations), connected to Question Analysis, Topic Detection, a Search Engine and Text Analysis of a Website (repair, information, accessories, products). Words (mobile, head phone) are mapped to Concepts.]
Example dialogue:
• Can I help you?
• My head phone is broken.
• I want to buy a new one.
• Would you like repair or products?
• Can you say more about products?
• It is for my cell phone.
• Can you give more details?
• It is a Nokia 6110
• I got the following accessories for you. Please have a look.
• That is not what I want!
Communicative dialog system
• Prevents deadlocks:
  - Detects vagueness and ambiguity (which meaning of cell?)
  - Detects topic changes
  - Uses negative feedback: "No jails, I want cell phones!"
• Can handle out-of-domain questions (users do not know what the system knows):
  - "We do not have hotel rooms but we do have electronic equipment."
  - "No, we do not have portophones but we do have other electronic equipment such as cell phones."
[Diagram: a hierarchy with the nodes object, space, room, hotel room, equipment, cell phone and portophone.]
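The out-of-domain answers can be sketched as a walk up a concept hierarchy until the request and something the system knows meet. The hierarchy below is an assumed reading of the diagram's node labels, and `CATALOGUE`, `ancestors` and `suggest` are hypothetical names; the actual dialogue system is not described in this detail.

```python
# Assumed hierarchy: concept -> more general concept.
HYPERNYM = {
    "hotel room": "room",
    "room": "space",
    "space": "object",
    "cell phone": "equipment",
    "portophone": "equipment",
    "equipment": "object",
}

# What the system can actually offer (assumption for the example).
CATALOGUE = {"cell phone"}


def ancestors(concept):
    """All more general concepts, from nearest to most general."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def suggest(request):
    """Return (shared ancestor, catalogue items under it) for an unavailable request."""
    for ancestor in ancestors(request):
        offers = sorted(item for item in CATALOGUE if ancestor in ancestors(item))
        if offers:
            return ancestor, offers
    return None, []


print(suggest("portophone"))
print(suggest("hotel room"))
```

A close shared ancestor such as "equipment" supports an answer like "other equipment such as cell phones", whereas a distant one such as "object" signals that the request is outside the domain and the system can only state what it does have.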
THANK YOU FOR YOUR ATTENTION