Definition Clustering, Sense Naming & Lexical Augmentation
Mathieu LAFOURCADE, Fabien JALABERT
Study context 1/2
Natural Language Processing
• Lexical semantics, word sense disambiguation (WSD), document indexing
• Dictionary construction and vectorization. Problem: extracting the definition meta-language. Example: ‘cannibale’ = ‘qui mange l’Homme en parlant de l’Homme’ (‘who eats humans, said of a human’); themes found: homme (human), manger (to eat), rhétorique (rhetoric). The last theme comes from the meta-language phrase ‘en parlant de’ (‘said of’), not from the meaning.
• Multi-source approach: noise reduction. Problem: the atomic element is the definition, and a definition ≠ a sense
• Objectives: clustering definitions to obtain senses, then naming these senses
Study context 2/2
[Diagram: processing chain for one term]
• Multi-source base: term T has definitions from several sources (def 1, 2, 3 from Source 1; def 1, 2 from Source 2; def 1, 2 from Source 3)
• Clustering into an ‘acception’ (sense) base:
– Sense 1: def 1 - Source 1
– Sense 2: def 2 - Source 1, def 2 - Source 2, def 1 - Source 3
– Sense 3: def 3 - Source 1, def 1 - Source 2, def 2 - Source 3
• Sense naming: each sense receives a name (Sense 1 – Name, Sense 2 – Name, …)
• Re-injection as a new lexical source, for every term t1, t2, …, tn
Summary
• Model, construction, organization
• Definition clustering
• Sense naming
• Lexical augmentation
• Results
Conceptual Vector Model 1/2 (Salton, Deerwester; Chauché, Lafourcade)
• An idea = a vector
• A vector component = a primitive, as defined in a thesaurus
– Thesaurus Larousse: 873 concepts
– Concepts are inter-related
– The primitives span the generator space
• A definition → a vector
Most activated primitives for ‘frégate’ (frigate, also the frigatebird): (oiseau [bird] 6134) (transports maritimes et fluviaux [sea and river transport] 5644) (arme [weapon] 4891) …
Conceptual Vector Model 2/2
• Thematic distance = angle between two vectors
Terms thematically close to ‘frégate’: (destroyer 0.2246) (youyou 0.2267) (voilier 0.2268) (contre-torpilleur 0.2274) (chlamydère 0.2276) (oiseau-jardinier 0.2295) (trois-mâts 0.233) …
Terms thematically close to ‘frégate/oiseau/’: (oiseau-jardinier 0.1237) (plumeur 0.1319) (goglu 0.136) (travailleur 0.136) (chlamydère 0.1385) (penne 0.141) (Galliformes 0.1422) (agami 0.1428) …
Terms thematically close to ‘frégate/bateau/’: (démâtage 0.1604) (dégréer 0.1676) (naval 0.1718) (bateau-piège 0.1774) (bateau-vanne 0.1821) (batelet 0.1824) …
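A minimal sketch of the thematic distance, assuming the usual angular definition $D_A(x, y) = \arccos\big(x \cdot y / (\|x\|\,\|y\|)\big)$. The helper names are illustrative. The three-component space and the neighbour vectors are hypothetical toy data, since the real model uses the 873 thesaurus concepts. Only the ‘frégate’ components come from the slides: they reuse the three activations quoted for it.

```python
import math


def angular_distance(x, y):
    """Thematic distance D_A(x, y) = arccos(x.y / (|x| |y|)), in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("angular distance is undefined for a null vector")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def thematic_neighbours(target, lexicon, k=3):
    """Return the k (term, distance) pairs thematically closest to `target`."""
    scored = [(term, angular_distance(target, vec)) for term, vec in lexicon.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-component space: (oiseau, transports maritimes et fluviaux, arme).
fregate = [6134, 5644, 4891]          # activations quoted for 'frégate'
lexicon = {                           # toy vectors, not taken from the model
    "voilier": [0.2, 1.0, 0.1],
    "goglu": [1.0, 0.0, 0.0],
    "destroyer": [0.0, 0.8, 0.9],
}
for term, d in thematic_neighbours(fregate, lexicon):
    print(f"{term} {d:.4f}")
```

Only the direction of a vector matters for this distance, so raw activation values and normalized vectors can be mixed freely.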
Definition Vector Computation (Chauché)
SYGMART analysis of the ambiguous sentence ‘la petite brise la glace’ (lemmas: le petit briser le glace)
[Figure: an ambiguity node (PHAMBG) dominates two analyses (PH). (1) GN ‘le petit’ + GV ‘briser’ with object GN ‘le glace’, i.e. ‘the little girl breaks the ice’. (2) GN ‘le brise’ with GA ‘petit’ + GV ‘glacer’ with object GN ‘le’, i.e. ‘the light breeze freezes her’.]
Multi-Agent Organization: Double Loop (Lecerf, Schwab)
• Learning agents: SYGMART, computation of vectors from definitions, synonymy, antonymy, …
[Diagram: each agent runs an endogenous loop and an exogenous loop; the exogenous loop links it to the other agents (the society)]
Clustering
Objective: grouping definitions into senses
Clustering 1/5: Strategy
• Deep analysis, several criteria
• No training (but enhancement through the exogenous loop)
• Frontier between senses and definitions
- Centroid approach
- Heuristics (preferences):
  - number of clusters = maximum number of definitions given by a single dictionary
  - two definitions from the same source → two different clusters
Clustering 2/5: Difficulty
Example: ‘botte’
• Chaussure montante (quel qu’en soit l’usage) [high boot, whatever its use]; finer split: ‘chaussure élégante’ [elegant boot] vs. ‘chaussure tout-terrain’ [all-terrain boot]
• Coup porté (en escrime ou non) [a thrust, in fencing or not]; finer split: ‘le coup en escrime’ [the fencing thrust] vs. ‘l’attaque surprise’ [the surprise attack]
• Réunion de végétaux [bundle of plants]
Clustering 3/5: Algorithm 1/2 (Kuhn; Ford, Fulkerson)
• Source-by-source iteration until a minimum-value distribution is obtained
• Minimum-value assignment of a source’s definitions to the clusters, computed from a distance matrix with the Hungarian method, O(n³)
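The strategy and algorithm above can be sketched as follows. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions:

- Definitions are already vectorized.
- A cluster centroid is the sum of its members' vectors.
- The number of clusters follows the heuristic "maximum number of definitions in one dictionary".
- Convergence means that a full source-by-source pass changes no assignment.

Each source is assigned to the clusters one-to-one, which also enforces "two definitions of the same source go to two different clusters". For brevity, `min_cost_assignment` finds the optimum by brute force over permutations. This gives the same optimum as the Hungarian method but costs O(k!) instead of O(n³), so it only suits a handful of senses. The ‘botte’ vectors are toy values.

```python
import math
from itertools import permutations


def angular_distance(x, y):
    """Angle between two non-null vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def min_cost_assignment(cost):
    """Exact minimum-cost one-to-one assignment of rows (definitions) to
    columns (clusters); brute force, standing in for the Hungarian method."""
    if not cost:
        return []
    best_total, best_cols = math.inf, None
    for cols in permutations(range(len(cost[0])), len(cost)):
        total = sum(cost[row][col] for row, col in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    return list(best_cols)


def cluster_definitions(sources, max_iter=10):
    """sources[s] is the list of definition vectors given by dictionary s.
    Returns labels[s][d], the cluster (sense) index of definition d of source s."""
    k = max(len(defs) for defs in sources)        # heuristic: number of clusters
    seed = next(defs for defs in sources if len(defs) == k)
    centroids = [list(vec) for vec in seed]       # initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for defs in sources:                      # source-by-source iteration
            cost = [[angular_distance(d, c) for c in centroids] for d in defs]
            new_labels.append(min_cost_assignment(cost))
        if new_labels == labels:                  # stable distribution reached
            break
        labels = new_labels
        for j in range(k):                        # recompute each centroid
            members = [defs[i] for defs, lab in zip(sources, labels)
                       for i, cluster in enumerate(lab) if cluster == j]
            if members:
                centroids[j] = [sum(col) for col in zip(*members)]
    return labels


# Toy 3-component vectors for 'botte': (chaussure, coup, végétaux).
sources = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # source 1
    [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]],                   # source 2
    [[0.0, 0.1, 0.9], [0.8, 0.0, 0.2]],                   # source 3
]
print(cluster_definitions(sources))  # [[0, 1, 2], [1, 0], [2, 0]]
```

In a real system the brute-force step would be replaced by an O(n³) Hungarian implementation. The surrounding loop would stay the same.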
Clustering 4/5: Algorithm 2/2
• For each criterion: one evaluation, one distance matrix
• Criteria:
- comparison of the lexical contents of definitions (with term frequency, co-occurrences, etc.)
- angular distance
- symbolic markers:
  - morphology
  - etymology (‘avocat’: ‘ahuacatl’ / ‘advocatus’, i.e. avocado vs. lawyer)
  - use (‘vieux’, ‘ancien’, ‘poétique’, … [old, archaic, poetic])
  - language level (‘argot’, ‘familier’, … [slang, colloquial])
  - domain (‘médecine’, ‘zoologie’, … [medicine, zoology])
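One way to turn several criteria into the single matrix used by the assignment step is a weighted average of the per-criterion distance matrices. The slides do not say how the matrices are combined. The weighting scheme, the neutral value 0.5 for a marker present on only one side, and the helper `marker_distance` are therefore all assumptions. The ‘avocat’ etymologies come from the slide. The domain values ‘botanique’ and ‘droit’ and the angular distances are invented for illustration.

```python
def marker_distance(markers_a, markers_b):
    """Hypothetical symbolic-marker criterion over {category: value} dicts:
    0 when a category agrees, 1 when it clashes, 0.5 when only one side has it."""
    categories = set(markers_a) | set(markers_b)
    if not categories:
        return 0.5                     # no marker at all: no evidence either way
    scores = []
    for cat in categories:
        if cat in markers_a and cat in markers_b:
            scores.append(0.0 if markers_a[cat] == markers_b[cat] else 1.0)
        else:
            scores.append(0.5)
    return sum(scores) / len(scores)


def distance_matrix(rows, cols, dist):
    """One criterion -> one distance matrix (rows x cols)."""
    return [[dist(r, c) for c in cols] for r in rows]


def combine(matrices, weights):
    """Weighted average of same-shaped per-criterion matrices (assumed scheme)."""
    total = sum(weights)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n_cols)] for i in range(n_rows)]


# 'avocat': one definition from source B against two definitions from source A.
source_b = [{"etymology": "advocatus"}]
source_a = [{"domain": "botanique", "etymology": "ahuacatl"},
            {"domain": "droit", "etymology": "advocatus"}]
markers = distance_matrix(source_b, source_a, marker_distance)   # [[0.75, 0.25]]
angular = [[0.9, 0.4]]                                           # invented values
print(combine([angular, markers], weights=[1.0, 1.0]))
```

Keeping one matrix per criterion, as the slide suggests, means that a new criterion only adds one matrix and one weight.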
Clustering 5/5: Results
• Correct results in many cases: 90% for nouns, 70% for verbs; adjectives still to be done
• Problems with very strong polysemy: vagueness, continuity between meanings, support verbs (‘prendre’ [to take], …)
• Under study: increasing the number of clusters
We would like to designate (name) the meanings, e.g. those of ‘botte’
Sense Naming
Objective: to give the system some capacity to « talk about a sense »
Sense Naming 1/10: Properties
• Dictionary independent
• Interface (human-system and system-system)
• A new lexical source, looped back into the system
• Semantic annotation, e.g.:
– La frégate/vaisseau/ naviguait à travers les océans (the frigate/vessel/ sailed across the oceans)
– La frégate/oiseau/ planait à travers les nues en poussant son cri incomparable (the frigate/bird/ glided through the clouds, uttering its incomparable cry)
Sense Naming 2/10: Procedure
1. Extraction
2. Validation and dispatching of polysem bags (bijection)
3. Evaluation of candidates: ordering them and extracting the most appropriate ones
Sense Naming 3/10: Extraction (Ploux, Victorri)
• Extraction attached to a meaning
– Morpho-syntactic analysis of the definition
– Extraction of markers: « anc. », « méd. », …
– Extraction from unstructured or semi-structured data (XML, …)
Example: ‘frégate’: [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet] (feminine noun, archaic: in the 15th century, a large half-decked boat rigged with two lateen sails on a yard, linking the ports with the galley squadrons)
• Extraction from polysem bags
– Word lists (e.g. the synonym list of the Université de Caen)
Example: ‘botte’ = chaussure, bottillon, coup, attaque, amas, bouquet, … (shoe, ankle boot, thrust, attack, heap, bunch)
Sense Naming 4/10: Validation
• Bijection: being able to re-associate the proper meaning
ƒ : (term, sense) → (term, annotation)
ƒ⁻¹ : (term, annotation) → (term, sense)
• A candidate $a$ associated with a sense $s_i$ should be closer to its own sense than to any other:
$\forall s_j \neq s_i,\; D_A(a, s_i) \le D_A(a, s_j)$
• Unattached candidates are associated with the closest meaning
• A candidate should not be present in a concurrent definition
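A minimal sketch of the validation and dispatching step, under these assumptions:

- Senses and candidates are already conceptual vectors.
- The distance is the angular distance.
- "Present in a concurrent definition" is tested as plain word membership in the other senses' definitions.

Rule 1 is checked as $D_A(a, s_i) \le D_A(a, s_j)$ for every competing sense $s_j$. The hypothetical helper `build_bijection` checks that ƒ can be inverted. The vectors and definition words for ‘frégate’ are toy data.

```python
import math


def angular_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def validate_and_dispatch(senses, candidates, definition_words):
    """senses: {sense: vector}; candidates: [(word, vector, sense or None)];
    definition_words: {sense: set of words used in that sense's definitions}.
    Returns {sense: [accepted candidate words]}."""
    accepted = {s: [] for s in senses}
    for word, vec, sense in candidates:
        dist = {s: angular_distance(vec, v) for s, v in senses.items()}
        if sense is None:                       # unattached: closest meaning
            sense = min(dist, key=dist.get)
        if any(dist[sense] > dist[s] for s in senses if s != sense):
            continue                            # closer to a competing sense
        if any(word in definition_words[s] for s in senses if s != sense):
            continue                            # occurs in a concurrent definition
        accepted[sense].append(word)
    return accepted


def build_bijection(term, naming):
    """f: (term, sense) -> (term, annotation) and its inverse f^-1."""
    f = {(term, s): (term, a) for s, a in naming.items()}
    f_inv = {image: pre for pre, image in f.items()}
    if len(f_inv) != len(f):
        raise ValueError("two senses share an annotation: f is not invertible")
    return f, f_inv


# Hypothetical 2-component space: (oiseau, transports maritimes).
senses = {"navire": [0.1, 0.9], "oiseau": [0.9, 0.1]}
definition_words = {"navire": {"bâtiment", "guerre"},
                    "oiseau": {"palmipède", "voilier"}}
candidates = [("vaisseau", [0.05, 0.95], "navire"),
              ("oiseau", [1.0, 0.0], None),
              ("planeur", [0.3, 0.7], "oiseau"),     # closer to 'navire'
              ("voilier", [0.4, 0.6], "navire")]     # in the 'oiseau' definition
names = validate_and_dispatch(senses, candidates, definition_words)
print(names)  # {'navire': ['vaisseau'], 'oiseau': ['oiseau']}
f, f_inv = build_bijection("frégate", {s: w[0] for s, w in names.items()})
print(f_inv[("frégate", "vaisseau")])  # ('frégate', 'navire')
```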
Sense Naming 5/10: Evaluation
• Extraction grade
• Evaluating the capacity to disambiguate (to distinguish a sense from all the others)
• Evaluating the capacity to associate: cognitive cost reduction (Prince)
Sense Naming 6/10: Extraction grade
‘frégate’: [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet]
[Figure: parse of ‘au XVe, grande barque demi-pontée, gréant deux voiles latines sur antennes …’ with constituents labelled Sujet (subject), GV (verb group), COD (direct object) and CC (adverbial complements)]
Candidates and their extraction grades: XVe (6), grande (2), barque (1), demi-pontée (3), barque demi-pontée (1), gréant (4), voiles (5), latines (6), voiles latines (5), antenne (7)
Sense Naming 7/10: Disambiguation capacity 1/2
$M_A = d_2 - d_1$ (absolute margin)
$M_R = M_A / d_1$ (relative margin)
$R_{NS} = d_3 / M_R$ (risk of nonsense, ‘non-sens’)
Example: ‘frégate’ / ‘vaisseau’
[Figure: senses of ‘frégate’ (t.11, t.12; labels: navire, oiseau) and of ‘vaisseau’ (w.1 to w.3; labels: navire ancien, navire moderne, sanguin), linked by distances 0.95, 1.2, 0.8, 0.85, d1 = 0.3, d2 = 0.4, d3 = 0.2]
M_A = d2 - d1 = 0.4 - 0.3 = 0.1
M_R = 0.1 / d1 = 0.1 / 0.3 ≈ 0.33
R_NS = d3 / M_R = 0.2 / 0.33 ≈ 0.6
Sense Naming 8/10: Disambiguation capacity 2/2
The ‘frégate’ / ‘vaisseau’ example is compared with ‘frégate’ / ‘voilier’:
[Figure: senses of ‘frégate’ and of ‘voilier’, linked by distances 0.3, 0.7, 0.72, 0.72, d1 = 0.25, d2 = 0.29, d3 = 0.65]
M_A = d2 - d1 = 0.29 - 0.25 = 0.04
M_R = 0.04 / d1 = 0.04 / 0.25 = 0.16
R_NS = d3 / M_R = 0.65 / 0.16 ≈ 4
Both margins are smaller and the risk of nonsense is much higher (about 4 against 0.6). ‘Voilier’, which can also designate a bird, is therefore a riskier name than ‘vaisseau’.
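The three measures are simple to compute once the distances are known. The sketch below uses $M_A = d_2 - d_1$, $M_R = M_A / d_1$ and $R_{NS} = d_3 / M_R$. This is the sign convention that reproduces the printed values for both examples: 0.1, 0.33 and 0.6 for ‘vaisseau’, and 0.04, 0.16 and about 4 for ‘voilier’. How $d_1$, $d_2$ and $d_3$ are measured is taken as given. The function name is illustrative.

```python
def disambiguation_capacity(d1, d2, d3):
    """Return (M_A, M_R, R_NS) for one naming candidate; assumes 0 < d1 < d2."""
    m_a = d2 - d1        # absolute margin
    m_r = m_a / d1       # relative margin
    r_ns = d3 / m_r      # risk of 'non-sens'
    return m_a, m_r, r_ns


examples = {"vaisseau": (0.3, 0.4, 0.2), "voilier": (0.25, 0.29, 0.65)}
for name, (d1, d2, d3) in examples.items():
    m_a, m_r, r_ns = disambiguation_capacity(d1, d2, d3)
    print(f"frégate/{name}: M_A={m_a:.2f} M_R={m_r:.2f} R_NS={r_ns:.2f}")
# frégate/vaisseau: M_A=0.10 M_R=0.33 R_NS=0.60
# frégate/voilier: M_A=0.04 M_R=0.16 R_NS=4.06
```

The exact value for ‘voilier’ is 4.06, which the slide rounds to 4.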
Sense Naming 9/10: Cognitive cost (Church, Daille, Véronis)
Survey, main types of answers:
- collocations (botte de paille [bale of straw], …)
- co-occurrences (Tintin – Milou)
- synonyms and hypernyms (manger → se nourrir [eat → feed oneself]; mouche → insecte → animal [fly → insect → animal])
- domain or context for technical terms (médecine, architecture, agriculture, sport, …)
Carried out for 13 terms totalling 38 definitions: 134 answers
Sense Naming 10/10: Results (Mel’čuk, Schwab)
Example: ‘botte’
- The multi-criteria approach seems well suited, is easily extensible and has strong precision
- Enhancement needed: meta-language processing; implementation of further criteria (associative memory, lexical functions)
- Synthesis grammar (botte/secret/ vs. botte/secrète/)
Useful for multilingual lexical databases
Lexical Augmentation
• Multilingual lexical database: some terms are not lexicalized in some languages
• Objective: lexicalize these terms
Lexical Augmentation 1/2: Papillon project (Boitet, Lepage, Mangeot-Lerebours, Sérasset)
[Figure: French (FRANÇAIS) and English entries linked through interlingual acceptions. ‘abats’ and ‘offal’ share the acception offal.1; ‘déchet’, ‘refuse’, ‘scrap’ and ‘offal’ share offal.2; ‘giblets’ is also shown. New French entries: ‘abats de volaille’, ‘abats de bœuf’, ‘abats de porc’; English side: ‘giblets’, ‘beef offal’, ‘pork offal’]
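The acception-based organization above, where words of each language point to shared interlingual acceptions, can be sketched as a small dictionary. This is a hypothetical, simplified reading of the figure, not the Papillon data model. Apart from offal.1 and offal.2, the acception identifiers and the links chosen are illustrative assumptions. An acception with no word in a given language marks a term that is not lexicalized there, and lexical augmentation aims to fill exactly these gaps.

```python
# Hypothetical acception base: acception id -> {language: set of words}.
acceptions = {
    "offal.1": {"fr": {"abats"}, "en": {"offal"}},
    "offal.2": {"fr": {"déchet"}, "en": {"offal", "refuse", "scrap"}},
    "giblets.1": {"en": {"giblets"}},
    "beef_offal.1": {"en": {"beef offal"}},
}


def translations(base, word, src, tgt):
    """Every tgt-language word sharing at least one acception with `word`."""
    found = set()
    for langs in base.values():
        if word in langs.get(src, set()):
            found |= langs.get(tgt, set())
    return found


def unlexicalized(base, lang):
    """Acceptions that have no word at all in `lang`."""
    return sorted(a for a, langs in base.items() if not langs.get(lang))


def lexicalize(base, acception, lang, phrase):
    """Attach a new (possibly multi-word) term to an existing acception."""
    base[acception].setdefault(lang, set()).add(phrase)


print(sorted(translations(acceptions, "offal", "en", "fr")))  # ['abats', 'déchet']
print(unlexicalized(acceptions, "fr"))  # ['beef_offal.1', 'giblets.1']
lexicalize(acceptions, "giblets.1", "fr", "abats de volaille")
lexicalize(acceptions, "beef_offal.1", "fr", "abats de bœuf")
print(unlexicalized(acceptions, "fr"))  # []
```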
Lexical Augmentation 2/2: Procedure
• Extraction from definitions and sense names (dictionary glosses): abats = {‘porc’, ‘volaille’, ‘bœuf’, …}
• Patterns: ‘abats de volaille’, ‘abats en volaille’, …
• Pattern validation with co-occurrences: relative number of hits in Google
• Difficulties: ‘dog meat’ → ‘viande pour chien’ (meat for dogs) or ‘viande de chien’ (meat from dogs)?
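A sketch of the pattern step, under these assumptions:

- `hit_count` is a hypothetical function. The slides use relative numbers of Google hits, but no search API is called here.
- The acceptance threshold (60% of the hits among competing patterns) and the connector list are made up.
- The counts in `MADE_UP_HITS` are invented placeholders, not measurements.

The counts are chosen to show how a near tie, as in the ‘viande de chien’ / ‘viande pour chien’ difficulty, leads to no decision.

```python
CONNECTORS = ["de", "en", "pour"]          # hypothetical connector list


def candidate_patterns(head, modifiers, connectors=CONNECTORS):
    """Generate 'head connector modifier' patterns, e.g. 'abats de volaille'."""
    return {(head, m): [f"{head} {c} {m}" for c in connectors] for m in modifiers}


def validate(patterns, hit_count, min_share=0.6):
    """For each (head, modifier), keep the pattern with the most hits if its
    share of the hits among the competing patterns reaches min_share."""
    chosen = {}
    for key, forms in patterns.items():
        hits = {form: hit_count(form) for form in forms}
        total = sum(hits.values())
        if total == 0:
            continue                       # no evidence at all
        best = max(forms, key=hits.get)
        if hits[best] / total >= min_share:
            chosen[key] = best
    return chosen


MADE_UP_HITS = {"abats de volaille": 900, "abats en volaille": 10,
                "abats de porc": 700, "abats en porc": 5,
                "viande de chien": 520, "viande pour chien": 480}
patterns = {**candidate_patterns("abats", ["volaille", "porc"]),
            **candidate_patterns("viande", ["chien"])}
print(validate(patterns, lambda form: MADE_UP_HITS.get(form, 0)))
# 'viande ... chien' stays undecided: its best pattern has only 52% of the hits.
```

Using a share of the hits rather than raw counts keeps the test comparable between frequent and rare heads.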
Conclusion
Clustering
• Promising results: manual evaluation on 100 difficult terms, 70% of proper clusters, 30% of bad assignments (locutions)
• Problem: increasing the number of clusters (maturing of the basic clusters)
Sense Naming: complementary with conceptual vectors
• Good precision: manual evaluation, 90% of pertinent terms; automatic evaluation, 70% (angular distance)
• Towards a synthesis grammar: botte/secret/ vs. botte/secrète/
Future work
• More criteria (associative memory, more lexical functions)
• Enhance definition analysis (meta-language)
Contribution
Theory
• Formalization of the ‘disambiguation capacity’ and of the ‘risk of nonsense’
• Formalization of annotation in lexical semantics
• Proposal of a generic similarity measure between definitions
Practice
• Implementation as agents: categorization, naming (web services)
• Lexical augmentation (in progress)
Dissemination
• A poster at RECITAL’2003 (Batz-sur-Mer, 10-14 June 2003)
• A paper at Papillon’2003 (Sapporo, 2-6 July 2003)
• Submission to RFIA’2004
Thank you