German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Ontologies

Ontologies: Outline
• WordNet (Miller et al. 90, Fellbaum 98)
• EuroWordNet (Vossen et al. 98)
• Spanish WordNet
  • Combining Methods (Atserias et al. 97)
  • Mapping hierarchies (Daudé et al. 01)
• Mikrokosmos (Viegas et al. 96)
• Cyc (Mahesh et al. 96)
• WordNet 2 (Harabagiu 98)
• MindNet (Richardson et al. 97)
• ThoughtTreasure (Mueller 00)
• Meaning ...
WordNet & EuroWordNet

WordNet & EuroWordNet: WordNet
• Princeton University (Miller et al. 1990)
• Lexicalized concepts (words, lexical units)
• Related to one another by semantic relations
  • synonymy
  • antonymy
  • hypernymy-hyponymy
  • meronymy
  • entailment
  • cause
  • ...
WordNet & EuroWordNet: Semantic Relations in WN1.5
• Synonymy
  • Lexicalized concepts (SYNSETS)
  • Weak notion of synonymy: synonymy in context
  • Synset: a set of words or lexical units that express one concept in a given context
• Hypernymy / Hyponymy
  • Class-to-subclass relation
• Meronymy
  • Component part: {mano} (hand), {brazo} (arm)
  • Member of a collection: {persona} (person), {gente} (people)
  • Substance: {periódico} (newspaper), {papel} (paper)
WordNet & EuroWordNet: Semantic Relations in WN1.5
• Antonymy: {grande} (big), {pequeño} (small)
• Cause: {matar} (to kill), {morir} (to die)
• Entailment: {divorciarse} (to divorce), {casarse} (to marry)
• Derivation: {presidencial} (presidential), {presidente} (president)
• Similarity: {bueno} (good), {positivo} (positive)
WordNet & EuroWordNet: WordNet Example
<cruiser, squad car, patrol car, ...>
<cab, taxi, hack, ...>
<motor vehicle, automobile, ...>
<vehicle>
<conveyance>
<car door>
<doorlock>
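The synsets in this example can be walked as a hypernym chain. Below is a minimal Python sketch; the specific child-to-parent links in `HYPERNYM` are an assumption made for illustration (the slide only lists the synsets, not the arrows), and `hypernym_chain` is a hypothetical helper name.

```python
# Toy fragment of the example; the edges below are assumed, not read off the slide.
HYPERNYM = {
    "cruiser": "motor vehicle",
    "cab": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
}


def hypernym_chain(synset, hypernym=HYPERNYM):
    """Return the synset followed by all of its ancestors, nearest first."""
    chain = [synset]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


if __name__ == "__main__":
    print(" > ".join(hypernym_chain("cab")))
```

Meronymy links such as <car door> and <doorlock> could be stored in a second dictionary of the same shape.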
WordNet & EuroWordNet: EuroWordNet
• Project LE-2 4003
  • Telematics Application Programme of the EU
• Semantic networks for several languages
  • Integrated and interconnected
    • English: University of Sheffield
    • Dutch: University of Amsterdam
    • Italian: I.L.C., Pisa
    • Spanish: UB, UPC, UNED
• Computers and the Humanities (special issue, 1998)
• http://www.hum.uva.nl/~ewn/
WordNet & EuroWordNet: EuroWordNet Extensions
• EWN2: German, French, Czech, Swedish, Estonian
• ITEM project: Spanish, Catalan, Basque
• CREL (Centre de Referència d’Enginyeria Lingüística): Catalan (UB, UPC)
WordNet & EuroWordNet: Applications
• Development of basic resources
• Cross-lingual processing of information
  - Multilingual information retrieval systems (e.g., the Internet)
  - Lexical-semantic module of language engineering systems
    Information extraction
    Machine translation
WordNet & EuroWordNet: Design Requirements
• Preserve the semantic relations specific to each language
• Maximum compatibility between the different resources
• Relative independence of the WordNets
  • during the construction process
  • in the final result
WordNet & EuroWordNet: Components of EuroWordNet
• Core
  • The ILI
  • The Top Concept Ontology (TCO)
  • Domain Ontology (DO)
• Periphery
  • Language-specific WordNets
WordNet & EuroWordNet: Interlingual Index of EuroWordNet
• Unstructured collection of elements
• Linked to
  • at least one synset of an EWN wordnet
  • an element of the TCO or DO
• Associated with WN 1.5 synsets
WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet
• Hierarchy of language-independent concepts
  • semantic distinctions: object, place, dynamic, …
  • abstract (not lexical)
  • Superimposed on the ILI
• Three types of entities:
  • First order: concrete entities
  • Second order: static or dynamic situations
  • Third order: abstract propositions
WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet
(the number after each concept is the figure printed next to it on the slide)
Top 0
  1stOrderEntity 1
    Origin 0
      Natural 21
        Living 30
          Plant 18
          Human 106
          Creature 2
          Animal 23
      Artifact 144
    Form 0
      Substance 32
        Solid 63
        Liquid 13
        Gas 1
      Object 162
    Composition 0
      Part 86
      Group 63
    Function 55
      Vehicle 8
  2ndOrderEntity 0
    SituationType 6
      Dynamic 134
        BoundedEvent 183
        UnboundedEvent 48
      Static 28
        Property 61
        Relation 38
    SituationComponent 0
      Cause 67
        Agentive 170
        Phenomenal 17
        Stimulating 25
      Communication 50
      Condition 62
      Existence 27
      Experience 43
      Location 76
      Manner 21
      Mental 90
  3rdOrderEntity 33
WordNet & EuroWordNet: Domain Ontology of EuroWordNet
• Hierarchy of domain labels
• Reduces polysemy
• Domains:
  • Traffic:
    • Road traffic, air traffic
  • International information
  • Mycology
  • Medicine
WordNet & EuroWordNet: EuroWordNet Relations
• Richer than in WN
• Between:
  • synsets (monolingual modules)
  • ILI records (multilingual): {actuar-1} EQ-SYNONYM {‘behave in a certain manner’}
  • ILI records and the TCO or DO
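WordNet & EuroWordNet: Interlingual Relations of EuroWordNet
[Figure: the Spanish (ESP), Dutch (HOL), Italian (IT) and English (ING) wordnets linked to the ILI by eq_synonym, has_eq_hyponym and has_eq_hypernym relations. Words shown: Italian "dito" and Spanish "dedo" against English "finger" and "toe", with ILI records "finger", "toe" and "finger or toe"; Spanish "cabeza" and Dutch "hoofd" and "kop" against English "head", with ILI records "head", "human head" and "animal head".]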
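A minimal Python sketch of how words in two wordnets can be related through shared ILI records. The relation names come from the slide; which word carries which relation to which ILI record is assumed here for illustration, and `equivalents` is a hypothetical helper.

```python
from collections import defaultdict

# (language, word, relation, ILI record); the individual links are assumptions.
LINKS = [
    ("es", "dedo", "eq_synonym", "finger or toe"),
    ("it", "dito", "eq_synonym", "finger or toe"),
    ("en", "finger", "eq_synonym", "finger"),
    ("en", "toe", "eq_synonym", "toe"),
    ("en", "finger", "has_eq_hypernym", "finger or toe"),
    ("en", "toe", "has_eq_hypernym", "finger or toe"),
]


def equivalents(word, lang, target_lang, links=LINKS):
    """Words of target_lang that share at least one ILI record with (lang, word)."""
    by_ili = defaultdict(set)
    for l, w, _rel, ili in links:
        by_ili[ili].add((l, w))
    own = {ili for l, w, _rel, ili in links if (l, w) == (lang, word)}
    return sorted({w for ili in own for l, w in by_ili[ili] if l == target_lang})


if __name__ == "__main__":
    print(equivalents("dedo", "es", "en"))
```

Because the wordnets never point at each other directly, adding a language only requires linking its synsets to the ILI.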
WordNet & EuroWordNet: EuroWordNet Relations
relation | applicable labels | example | description
HAS_XPOS_HYPERONYM | d c r | destrucción > cambiar | cross-part-of-speech hypernymy
HAS_XPOS_HYPONYM | d r | cambio > destruir | cross-part-of-speech hyponymy
NEAR_SYNONYM | r | aparato <> instrumento | near-synonymy
NEAR_ANTONYM | r | construir <> destrozar | near-antonymy
XPOS_NEAR_ANTONYM | r | construcción > destrozar | cross-part-of-speech near-antonymy
INVOLVED | d c r | martillear > martillo | entity directly related to an event
ROLE | d c r | vino > beber | event directly related to an entity
involved_agent | d c r | educar > educador | involvement in which the entity plays an agentive role
role_agent | d c r | educador > educar | role in which the entity plays an agentive role
role_location | d c r | comedor > comer | role in which the entity plays a locative role
HAS_MERONYM | d c r n | cara > nariz | meronymy (generic)
has_mero_portion | d c r n | pan > mendrugo | inverse of the previous one
has_mero_location | d c r n | desierto > oasis | inverse of the previous one
BE_IN_STATE | d c r n | belleza > bello | state corresponding to the possession of a certain property
Spanish WordNet: Building Process

Spanish WordNet: General Methodology
1) Mapping to WN1.5
   – manual work
   – automatic derivation of equivalents, using bilingual dictionaries
2) Manual correction
3) Re-structuring
Spanish WordNet: Main Steps: First Core (Manual Translation)
– Nouns:
  A) WN1.5’s Tops file plus the first level of hyponyms (about 800 synsets).
  B) The rest of EWN’s Common Base Concepts (those not already in our set).
  C) Manual translation of the synsets intermediate between (A) and (B), following the WN1.5 hierarchy, thus building a compact taxonomy equivalent to WN1.5 without gaps.
– Verbs:
  Manual translation of EWN’s Base Concepts (about 150 synsets).
Spanish WordNet: Main Steps: Subset 1 (Semi-automatic)
Nouns:
– Applying automatic methods using bilingual dictionaries
– Manual validation of several subsets to check whether each link is correct
– Deriving a Confidence Score (CS) for every automatic method (heuristic)
– Selecting synset-word pairs above 85% CS
– Some manual correction of this Subset 1 (mainly, filling gaps)
Verbs:
– 3600 English verbs connected to WN1.5 senses and ambiguously translated to Spanish are manually inspected and disambiguated
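The noun steps above can be sketched in Python. This is only an illustration under stated assumptions: the CS of a method is taken to be the percentage of its manually validated links that were judged correct, and a link is kept when its method reaches the threshold. The function names and the data layout are hypothetical.

```python
def confidence_scores(validated):
    """validated: iterable of (method, is_correct) pairs from a manually checked sample.

    Returns a dict method -> percentage of checked links that were correct.
    """
    totals, correct = {}, {}
    for method, ok in validated:
        totals[method] = totals.get(method, 0) + 1
        correct[method] = correct.get(method, 0) + (1 if ok else 0)
    return {m: 100.0 * correct[m] / totals[m] for m in totals}


def select_links(links, scores, threshold=85.0):
    """Keep (synset, word, method) links whose method scores at least threshold."""
    return [(s, w, m) for s, w, m in links if scores.get(m, 0.0) >= threshold]


if __name__ == "__main__":
    sample = [("m1", True)] * 9 + [("m1", False)] + [("m2", True), ("m2", False)]
    scores = confidence_scores(sample)
    links = [("syn-a", "casa", "m1"), ("syn-b", "banco", "m2")]
    print(scores, select_links(links, scores))
```

Scoring the method rather than the individual link means unchecked links inherit the reliability measured on the validated sample.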
Spanish WordNet: Main Steps: Subset 1 (Results 1)
 | Nouns | Verbs | Others | Total
Synsets | 18577 | 2602 | 0 | 21179
Number of senses (variants) | 39620 | 6795 | 0 | 46415
Mean variants per synset | 2.22 | 2.61 | 0 | 2.27
Corresponding number of entries (words) | 23216 | 2278 | 0 | 25494
Mean senses per word | 1.77 | 2.98 | 0 | 1.88
Language-internal relations | 40559 | 3749 | 0 | 44308
Average per synset | 2.18 | 1.44 | 0 | 2.09
Equivalence relations to ILI (WN1.5) | 18634 | 2602 | 0 | 21236
Average per synset | 1.00 | 1.00 | 0 | 1.00
Synsets without ILI | 0 | 0 | 0 | 0
Percentage of synsets without translation | 0% | 0% | | 0%
Spanish WordNet: Main Steps: Subset 1 (Results 2)
CS | Nouns | Verbs | Total
100% (manual) | 5041 | 6795 | 11836
> 97% | 403 | 0 | 403
> 95% | 304 | 0 | 304
> 93% | 1598 | 0 | 1598
> 86% | 27649 | 0 | 27649
> 85% | 4625 | 0 | 4625
Total | 39620 | 6795 | 46415
Spanish WordNet: Main Steps: Subset 2
Main goals
– enhance the quality of Subset 1 by manual revision
– extend it by manually building synsets
4 sub-tasks
Spanish WordNet: Main Steps: Subset 2
1) Manually covering those gaps in the hyponymy chains that are covered by other languages
2) Manual cleaning of some automatically generated variants
   – (a) pairs of synsets which are adjacent in the hyponymy chain and share at least one variant:
       deleting redundant variants
       re-locating them to either pre-existing or newly created synsets
   – (b) multi-word expressions present in synsets:
       deleting the non-lexicalized ones
Spanish WordNet: Main Steps: Subset 2
3) Manual addition of new vocabulary considered relevant
   – It mainly comes from the Catalan WordNet: since we are building both wordnets in parallel, we detected those synsets which were built for Catalan and not for Spanish
4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets
   – This work has been based mainly on noun-verb pairs obtained by means of morphological criteria (work carried out by UNED, Madrid)
Spanish WordNet: Main Steps: Subset 2 (Results)
 | Nouns | Verbs | Others | Total
Synsets | 19663 | 3538 | 0 | 23201
Number of senses (variants) | 39782 | 8394 | 0 | 48176
Mean variants per synset | 2.02 | 2.37 | 0 | 2.08
Corresponding number of entries (words) | 22881 | 3324 | 0 | 26205
Mean senses per word | 1.74 | 2.53 | 0 | 1.84
Language-internal relations | 43151 | 6756 | 2661 | 52568
Average per synset | 2.19 | 1.91 | ? | 2.27
Equivalence relations to ILI (WN1.5) | 19534 | 3534 | 0 | 23068
Average per synset | 0.99 | 1.00 | 0 | 0.99
Synsets without ILI | 185 | 4 | 0 | 189
Percentage of synsets without translation | 1% | 0% | 0 | 1%
Spanish WordNet: Main Steps: Subset 2 (Results)
Confidence (variants) | Nouns | Verbs | Total
100% (manual) | 7819 | 8394 | 16213
> 96% | 382 | 0 | 382
> 94% | 2948 | 0 | 2948
> 92% | 1364 | 0 | 1364
> 85% | 23113 | 0 | 23113
> 84% | 4156 | 0 | 4156
Total | 39782 | 8394 | 48176
Spanish WordNet: Main Steps: Beyond Subset 2
Massive manual checking (from Nov ’98)
– Using WEI
– Automatically generated variants
– Filling gaps in the hierarchy
– New vocabulary
– New adjectives
Spanish WordNet: Main Steps: Beyond Subset 2
 | Nouns | Verbs | Others | Total
Synsets | 24215 | 4079 | 2191 | 30485
No. of senses | 40759 | 9317 | 2439 | 52515
Senses per synset | 1.68 | 2.28 | 1.11 | 1.72
Entries | 26485 | 3828 | 2439 | 32752
Senses per entry | 1.54 | 2.43 | 1.00 | 1.60
Language-internal relations | 54832 | 7978 | 10855 | 73665
Language-internal relations per synset | 2.26 | 1.96 | * | 2.42
Equivalence relations to ILI | 24209 | 4074 | 0 | 28283
Equivalence relations per synset | 1.00 | 1.00 | 0 | 0.93
Synsets without ILI | 62 | 5 | 2191 | 2258
Spanish WordNet: Main Steps: Beyond Subset 2
CS | Nouns | Verbs | Adjectives | Total
99% (manual) | 16568 | 9317 | 2439 | 28324
97% | 310 | 0 | 0 | 310
95% | 2652 | 0 | 0 | 2652
93% | 1173 | 0 | 0 | 1173
90% | 6 | 0 | 0 | 6
86% | 16605 | 0 | 0 | 16605
85% | 3445 | 0 | 0 | 3445
Total | 40759 | 9317 | 2439 | 52515
Spanish WordNet: Main Steps: Parole Coverage
Frequency | Nouns: Parole entries | Nouns: Parole covered | Nouns: % coverage | Verbs: Parole entries | Verbs: Parole covered | Verbs: % coverage
1001- | 147 | 143 | 97.28 | 110 | 107 | 97.27
501-1000 | 261 | 246 | 94.25 | 139 | 118 | 84.89
251-500 | 462 | 429 | 92.86 | 218 | 172 | 78.90
101-250 | 933 | 863 | 92.50 | 381 | 257 | 67.45
51-100 | 959 | 863 | 89.99 | 374 | 265 | 70.86
31-50 | 892 | 804 | 90.13 | 347 | 185 | 53.31
21-30 | 730 | 632 | 86.57 | 286 | 141 | 49.30
11-20 | 1202 | 978 | 81.36 | 469 | 175 | 37.31
6-10 | 1024 | 790 | 77.15 | 360 | 129 | 35.83
3-5 | 968 | 665 | 68.70 | 254 | 74 | 29.13
2 | 435 | 257 | 59.08 | 123 | 32 | 26.02
1 | 643 | 334 | 51.94 | 131 | 26 | 19.85
overall | 8656 | 7004 | 80.91 | 3192 | 1681 | 52.66
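The coverage column appears to be the covered entries divided by the Parole entries in each frequency band. A short Python sketch, assuming that reading (the helper name `coverage` is hypothetical):

```python
def coverage(entries, covered):
    """Percentage of Parole entries in a band that are covered, to two decimals."""
    return round(100.0 * covered / entries, 2)


if __name__ == "__main__":
    # (entries, covered) pairs taken from the table: top noun band, top verb band, noun total.
    for entries, covered in [(147, 143), (110, 107), (8656, 7004)]:
        print(entries, covered, coverage(entries, covered))
```

Note that the overall figure is computed from the summed counts, so it weights each band by its number of entries and is not the mean of the per-band percentages.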
Spanish WordNet: Current Figures
– Spanish, Catalan, Basque, (English)
– http://nipadio.lsi.upc.es/wei2.html

Language | Noun synsets | Noun words | Verb synsets | Verb words | Adj. synsets | Adj. words
English | 60556 | 87641 | 11363 | 14727 | 16428 | 19101
Spanish | 43522 | 47665 | 7934 | 5312 | 12481 | 8762
Catalan | 30701 | 32987 | 4505 | 4285 | 1444 | 1561
Combining Multiple Methods for the Automatic Construction of Multilingual WordNets
Combining Multiple Methods ...: Outline
Ten class methods
– Four monosemic criteria
– Four polysemic criteria
– Two hybrid criteria
Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations
Combining Multiple Methods ...: Ten class methods
– Four classes
[Figure: four configurations of translation links between Spanish words (SW) and English words (EW).]
Combining Multiple Methods ...: Ten class methods
– Four monosemic criteria
[Figure: the four SW-EW configurations, with the English words (EW) attached to a single synset (Synset).]
Combining Multiple Methods ...: Ten class methods
– Four polysemic criteria
[Figure: the four SW-EW configurations, with the English words (EW) attached to several synsets (Synset+).]
Combining Multiple Methods ...: Ten class methods
– Variant criterion
  <..., EW, ..., EW, ...>  SW
– Field criterion
  <..., headword-EW, ..., Ind-EW, ...>  SW
Combining Multiple Methods ...: Ten class methods
Results
Criterion | #links | #synsets | #words | %ok
mono1 | 3697 | 3583 | 3697 | 92
mono2 | 935 | 929 | 661 | 89
mono3 | 1863 | 1158 | 1863 | 89
mono4 | 2688 | 1328 | 2063 | 85
poly1 | 5121 | 4887 | 1992 | 80
poly2 | 1450 | 1426 | 449 | 75
poly3 | 11687 | 6611 | 3165 | 58
poly4 | 40298 | 9400 | 3754 | 61
Variant | 3164 | 2195 | 2261 | 85
Field | 510 | 379 | 421 | 78
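A rough Python sketch of how a translation link might be assigned to one of the eight mono/poly criteria. It rests on two assumptions that the slides do not state in words: the four classes are the 1:1, 1:N, N:1 and M:N shapes of the SW-EW connection (numbered 1 to 4 in that order), and "monosemic" means the English word has exactly one synset. The function name and argument layout are hypothetical.

```python
def classify(sw, ew, es_to_en, en_to_es, synsets_of):
    """Label the link (sw, ew) as mono1..mono4 or poly1..poly4.

    es_to_en / en_to_es: dict word -> set of translations (bilingual dictionary).
    synsets_of: dict English word -> set of synset identifiers.
    """
    n_en = len(es_to_en[sw])  # English translations of the Spanish word
    n_es = len(en_to_es[ew])  # Spanish translations of the English word
    if n_en == 1 and n_es == 1:
        shape = 1  # 1:1
    elif n_en > 1 and n_es == 1:
        shape = 2  # 1:N
    elif n_en == 1 and n_es > 1:
        shape = 3  # N:1
    else:
        shape = 4  # M:N
    prefix = "mono" if len(synsets_of[ew]) == 1 else "poly"
    return f"{prefix}{shape}"


if __name__ == "__main__":
    es_to_en = {"abadía": {"abbey"}, "casa": {"house", "home"}}
    en_to_es = {"abbey": {"abadía"}, "house": {"casa"}, "home": {"casa", "hogar"}}
    synsets_of = {"abbey": {"a1", "a2", "a3"}, "house": {"h1"}, "home": {"m1"}}
    print(classify("abadía", "abbey", es_to_en, en_to_es, synsets_of))
```

The dictionaries in the usage example are invented toy data, not entries from the bilingual dictionary used in the project.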
Combining Multiple Methods ...: Conceptual Distance methods
Conceptual Distance (Agirre et al. 94)
– length of the shortest path
– specificity of the concepts

$$\mathrm{dist}(w_1, w_2) = \min_{c_{1_i} \in w_1,\; c_{2_i} \in w_2} \;\; \sum_{c_k \in \mathrm{path}(c_{1_i},\, c_{2_i})} \frac{1}{\mathrm{depth}(c_k)}$$

– using WordNet
– using a bilingual dictionary
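A minimal Python sketch of the conceptual distance measure over a toy hierarchy. Several choices here are assumptions rather than details given on the slide: the hierarchy is a single-rooted tree stored as child-to-parent links, the root has depth 1, and both end points are counted as part of the path. `PARENT` is invented illustration data.

```python
from itertools import product

# Toy hypernym links (child -> parent); an assumed fragment, not real WordNet data.
PARENT = {
    "artifact": "entity",
    "structure": "artifact",
    "building": "structure",
    "church": "building",
    "house": "building",
}


def ancestors(c):
    """Concept c followed by its hypernyms up to the root."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(c):
    """Depth of c, counting the root as depth 1."""
    return len(ancestors(c))


def path(c1, c2):
    """Concepts on the shortest path from c1 to c2 via their lowest common ancestor."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(c for c in up1 if c in up2)
    return up1[: up1.index(common) + 1] + up2[: up2.index(common)][::-1]


def dist(senses1, senses2):
    """Minimum over sense pairs of the sum of 1/depth along the connecting path."""
    return min(
        sum(1.0 / depth(c) for c in path(a, b))
        for a, b in product(senses1, senses2)
    )


if __name__ == "__main__":
    print(dist(["church"], ["house", "artifact"]))
```

Dividing by depth makes a step between specific concepts cheaper than a step near the top of the hierarchy, which is how the measure combines path length with specificity.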
Combining Multiple Methods ...: Conceptual Distance methods
Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations
Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)
abadía_1_2: Iglesia o monasterio regido por un abad o abadesa
(abbey, a church or a monastery ruled by an abbot or an abbess)

[Figure: WordNet fragment with the synsets listed below; <abbey> appears three times, after <church, church building>, <monastery> and <convent>.]
<entity>
<object, ...>
<artifact, artefact>
<structure, construction>
<building, edifice>
<place of worship, ...>
<church, church building>
<abbey>
<house, lodging>
<religious residence, cloister>
<monastery>
<abbey>
<convent>
<abbey>
Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)
[Figure: the same WordNet fragment and definition of abadía_1_2, now with the <abbey> synset that follows <church, church building> marked 06 ARTIFACT.]
Combining Multiple Methods ...: Three CD methods
Results
Criterion | #links | #synsets | #words | %ok
CD-1 | 23,828 | 11,269 | 7,283 | 56
CD-2 | 24,739 | 12,709 | 10,300 | 61
CD-3 | 4,567 | 3,089 | 2,313 | 75
Combining Multiple Methods ...: Combining methods
Results
method1 \ method2 | | cd2 | cd3 | p1 | p2 | p3 | p4
cd1 | size | 15736 | 1849 | 2076 | 556 | 3146 | 15105
cd1 | %ok | 79 | 85 | 86 | 86 | 72 | 64
cd2 | size | 0 | 2401 | 2536 | 592 | 3777 | 13246
cd2 | %ok | 0 | 86 | 88 | 86 | 75 | 67
cd3 | size | 0 | 0 | 205 | 180 | 215 | 3114
cd3 | %ok | 0 | 0 | 95 | 95 | 100 | 77
p1 | size | 0 | 0 | 0 | 0 | 77 | 178
p1 | %ok | 0 | 0 | 0 | 0 | 100 | 88
p2 | size | 0 | 0 | 0 | 0 | 28 | 78
p2 | %ok | 0 | 0 | 0 | 0 | 77 | 96
Combining Multiple Methods ...: Resulting Spanish WordNets
WNs | #links | #synsets | #words | CS (%) | #poly links
SpWN v0.0 | 10,982 | 7,131 | 8,396 | 87.4 | 1,777
Combination | 7,244 | 5,852 | 3,939 | 85.6 | 2,075
SpWN v0.1 | 15,535 | 10,786 | 9,986 | 86.4 | 3,373
Mapping Conceptual Hierarchies Using Relaxation Labelling

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work
Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
[Figure, shown on two consecutive slides: concepts C1 to C6.]
Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
Connecting already existing hierarchies
– Relaxation Labelling Algorithm
– Constraints
Between
– a Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
– WordNet
using a bilingual MRD
Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
[Figure: a fragment of the Spanish taxonomy with the candidate WordNet synsets of its nodes, each given as (semantic file <synset>). The four bird/fowl/dame candidates appear twice in the figure.]
Spanish taxonomy nodes: animal, ave, rapaz, faisán
Candidate synsets:
  (Tops <animal, animate_being, ...>)
  (person <beast, brute, ...>)
  (person <dunce, blockhead, ...>)
  (animal <bird>)
  (artifact <bird, shuttle, ...>)
  (food <fowl, poultry, ...>)
  (person <dame, doll, ...>)
  (animal <pheasant>)
  (food <pheasant>)
Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm
– Iterative algorithm for function optimization based on local information
– It can deal with any kind of constraints
  variables (senses of the taxonomy)
  labels (synsets)
– Finds a weight assignment for each possible label of each variable
  weights for the labels of the same variable add up to one
  the weight assignment satisfies, to the maximum possible extent, the set of constraints
Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm
1) Start with a random weight assignment
2) Compute the support value for each label of each variable (according to the constraints)
3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels
4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2
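The four steps can be sketched in Python as follows. The slides do not give the update rule, so this sketch assumes a common one: each weight is multiplied by (1 + support), with support scaled to [-1, 1], and the weights of each variable are then renormalised. The `support` callback stands in for the constraint evaluation and is hypothetical, as are the names used here.

```python
import random


def relax(labels, support, iterations=50, seed=0):
    """labels: dict variable -> list of candidate labels.
    support(var, label, weights) -> float in [-1, 1], derived from the constraints.
    Returns dict variable -> dict label -> weight; weights of a variable sum to 1.
    """
    rng = random.Random(seed)
    weights = {}
    for var, cands in labels.items():  # step 1: random start
        raw = [rng.random() + 1e-6 for _ in cands]
        total = sum(raw)
        weights[var] = {lab: r / total for lab, r in zip(cands, raw)}
    for _ in range(iterations):
        new = {}
        for var, cands in labels.items():
            # step 2: support of each label; step 3: reward or penalise it
            scaled = {
                lab: weights[var][lab] * (1.0 + support(var, lab, weights))
                for lab in cands
            }
            total = sum(scaled.values())
            if total > 0:
                new[var] = {lab: s / total for lab, s in scaled.items()}
            else:
                new[var] = dict(weights[var])
        change = max(
            abs(new[v][l] - weights[v][l]) for v in labels for l in labels[v]
        )
        weights = new
        if change < 1e-6:  # step 4: stop on convergence
            break
    return weights


if __name__ == "__main__":
    labels = {"ave": ["animal <bird>", "artifact <bird, shuttle>"]}
    fixed = {"animal <bird>": 0.5, "artifact <bird, shuttle>": -0.5}
    print(relax(labels, lambda var, lab, w: fixed[lab]))
```

In the usage example the support is a fixed toy value per label; in the mapping task it would depend on the current weights of neighbouring nodes in the taxonomy.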
Mapping Conceptual Hierarchies using Relaxation Labelling: Constraints
– Rely on the taxonomy structure
– Coded with three characters XYZ
  X: Spanish taxonomy, I (immediate) or A (ancestor)
  Y: English taxonomy, I (immediate) or A (ancestor)
  Z: relation, E (hypernym), O (hyponym), B (both)
– Examples: IIE, AAB
[Figure: diagrams of the IIE and AAB constraints.]
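The three-character coding can be unpacked mechanically. A small Python sketch; the dictionary keys and the function name `decode` are hypothetical, and the reading of the first two characters as Spanish-side and English-side scope follows the slide.

```python
SCOPE = {"I": "immediate", "A": "ancestor"}
RELATION = {"E": "hypernym", "O": "hyponym", "B": "both"}


def decode(code):
    """Split a three-character constraint code into its three readings."""
    if (
        len(code) != 3
        or code[0] not in SCOPE
        or code[1] not in SCOPE
        or code[2] not in RELATION
    ):
        raise ValueError(f"not a constraint code: {code!r}")
    return {
        "spanish": SCOPE[code[0]],
        "english": SCOPE[code[1]],
        "relation": RELATION[code[2]],
    }


# Every combination of scope, scope and relation.
ALL_CODES = [x + y + z for x in SCOPE for y in SCOPE for z in RELATION]

if __name__ == "__main__":
    print(decode("IIE"))
    print(ALL_CODES)
```

Enumerating the combinations gives twelve codes in total.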
Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints (NAACL’2001)
– II Constraints: IIE, IIO, IIB
– AI Constraints: AIE, AIO, AIB
– IA Constraints: IAE, IAO, IAB
– AA Constraints: AAE, AAO, AAB
[Figures: one diagram per constraint, four slides in total.]
Combining Multiple Methods ... (RANLP’97): Eight class methods
– Four monosemic criteria
Criterion | Prec. | Cov.
mono1 | 92% | 5%
mono2 | 89% | 1%
mono3 | 89% | 2%
mono4 | 85% | 4%
[Figure: SW-EW-Synset diagram for each criterion.]
Combining Multiple Methods ... (RANLP’97): Eight class methods
– Four polysemic criteria
Criterion | Prec. | Cov.
poly1 | 80% | 8%
poly2 | 75% | 2%
poly3 | 58% | 17%
poly4 | 61% | 60%
[Figure: SW-EW-Synset+ diagram for each criterion.]
Combining Multiple Methods ... RANLP’97: Experiments & Results

Poly            TOK, FOK     TOK, FNOK    total
animal          279 (90%)    30 (91%)     209 (90%)
food            166 (94%)    3 (100%)     169 (94%)
cognition       198 (67%)    27 (90%)     225 (69%)
communication   533 (77%)    40 (97%)     573 (78%)

all             TOK, FOK     TOK, FNOK    total
animal          424 (93%)    62 (95%)     486 (90%)
food            166 (94%)    83 (100%)    249 (96%)
cognition       200 (67%)    245 (90%)    445 (82%)
communication   536 (77%)    234 (97%)    760 (81%)
Combining Multiple Methods ... RANLP’97: Experiments & Results
piel
visón
marta
(substance <skin, fur, peel>)
(substance <sable, marte, coal_back>)
(substance <mink, mink_coat>)
Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Generalized Constraints
All Relationships
– also-see, similar-to, attribute, antonym, etc.
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Generalized Constraints
Non-structural constraints
– W: number of word coincidences
– G: word coincidences in glosses
– F: number of frame coincidences (verbs)
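The three non-structural measures can be sketched as simple overlap counts. This is only an illustration, not the authors' implementation: the dictionary layout, the function names, and the gloss text are assumptions made for the example, and counting distinct shared items is one possible reading of "coincidences".

```python
def coincidences(a, b):
    """Count the distinct items shared by two collections."""
    return len(set(a) & set(b))


def non_structural_scores(src, tgt):
    """Return W, G and F overlap counts for a candidate synset pair.

    `src` and `tgt` are hypothetical records with keys
    'words' (list of lemmas), 'gloss' (string) and 'frames' (list, verbs only).
    """
    return {
        "W": coincidences(src["words"], tgt["words"]),
        "G": coincidences(src["gloss"].lower().split(),
                          tgt["gloss"].lower().split()),
        "F": coincidences(src.get("frames", []), tgt.get("frames", [])),
    }


# Illustrative records; the gloss text is invented for the example.
wn15 = {"words": ["impossibly", "absurdly"],
        "gloss": "to a degree impossible of achievement", "frames": []}
wn16 = {"words": ["impossibly"],
        "gloss": "to a degree impossible of achievement", "frames": []}

scores = non_structural_scores(wn15, wn16)
```

Here `scores` holds one shared word, six shared gloss tokens, and no shared frames; in a relaxation-labelling setting such counts would only contribute support to a candidate mapping rather than decide it outright.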
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: POS mapping dependences
[Figure: dependences among Nouns, Adjectives, Verbs and Adverbs]
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Verbs
Structural constraints
– hyper/hyponymy
– antonymy
– also-see
Non-structural constraints
– W, G and F
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Adjectives
Structural constraints
– Adj-to-Adj: antonymy, similar-to and also-see
– Adj-to-Verb: participle-of
– Adj-to-Noun: pertains and attribute
Non-structural constraints
– W and G
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Adverbs
Structural constraints
– Adv-to-Adv: antonymy
– Adv-to-Adj: derived
Non-structural constraints
– W and G
A Complete... ACL’00, NAACL’01: Example extra-POS
[Figure: WN1.5 and WN1.6 synsets linked by pertainym and similar-to relations. Synsets shown, in slide order:]
02025107a evangelical evangelistic
04237485n Gospel Gospels evangel
00843344a evangelical evangelistic
02025107a evangelical
04853575n Gospel Gospels evangel
00842521a enthusiastic
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Example extra-POS
[Figure: WN1.5 and WN1.6 synsets linked by derived-from and antonym relations. Synsets shown, in slide order:]
00057615r impossibly absurdly
01393725a impossible
00294844r impossibly
01752468a impossible
00294658a possibly
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results
Basic constraint set: structural constraints
– Nouns: AA hyper/hyponym
– Verbs: AA hyper/hyponym, II also-see
– Adjectives: II antonymy, similar-to, also-see
– Adverbs: II antonymy
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results
Basic constraint set: structural constraints
Columns: Coverage, Ambiguous, Overall. Rows: N, V, A, R.
Coverage (slide order): 80.8%, 94.1%, 96.9%, 99.7%
Precision - recall (slide order): 94.9% - 99.6%, 97.6% - 99.8%, 82.8% - 98.9%, 97.5% - 100%, 93.5% - 99.2%, 94.6% - 99.2%, 89.5% - 99.4%, 99.0% - 100%
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results
Basic constraint set + W, G and F for verbs
Columns: Coverage, Ambiguous, Overall. Rows: N, V, A, R.
Coverage (slide order): 99.5%, 98.9%, 99.8%, 99.9%
Precision - recall (slide order): 97.5% - 97.7%, 98.8% - 98.9%, 96.5% - 98.8%, 97.5% - 100%, 99.4% - 99.7%, 99.3% - 99.6%, 97.9% - 99.3%, 99.0% - 100%
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results
Basic + extra-POS relationships
Columns: Coverage, Ambiguous, Overall. Rows: N, V, A, R.
Coverage (slide order): 88.0%, 95.8%, --, --
Precision - recall (slide order): 95.8% - 98.9%, 69.2% - 94.2%, --, 90.9% - 99.4%, 97.9% - 98.1%
A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results
Basic + extra-POS relationships + WGF
Columns: Coverage, Ambiguous, Overall. Rows: N, V, A, R.
Coverage (slide order): 99.6%, 99.0%, 99.8%, 99.9%
Precision - recall (slide order): 97.5% - 97.7%, 98.8% - 98.9%, 96.5% - 99.1%, 98.3% - 100%, 99.4% - 99.7%, 99.3% - 99.6%, 97.9% - 99.5%, 99.3% - 100%
Mapping Conceptual Hierarchies using Relaxation Labelling: Conclusions
– First complete mapping between WordNet versions
– Combining structural and non-structural information
– Robust approach based on local information, but with global effects
– Incremental POS approach
– http://www.lsi.upc.es/~nlp
– 90 downloads (since November 2000)
Mapping Conceptual Hierarchies using Relaxation Labelling: Further Work
– mapping other structures
    WN-EDR, WN-LDOCE, etc.
    Other language taxonomies to EuroWordNet
– SpanishEWN to WN1.6
– symmetrical philosophy rather than source-target
Mikrokosmos

Mikrokosmos: Outline
• Introduction
• Representational Issues
    • The Lexicon
    • The Ontology
• Acquisition Process
    • Lexicon Acquisition
    • Guidelines
    • Ontology/Lexicon Trade-off
• Semantics in Action
Mikrokosmos: Introduction
• Knowledge-Based Machine Translation (KBMT)
• CRL, NMSU
• 5,000 concepts
    • Events
    • Objects
    • Properties
• 7,000 Spanish word senses
• 40,000 word senses
    • after expansion with productive Lexical Rules
    • comprar -> comprador, comprable, ...
• Text Meaning Representation
Mikrokosmos: Representational Issues: The Lexicon
• Typed Feature Structures (Pollard and Sag 87)
• language-dependent
• 10 zones
    • phonology
    • orthography
    • morphology
    • syntactic (subcategorization)
    • semantic (Lexical Semantic Representation)
    • syntax-semantic linking
    • stylistics
    • paradigmatic
    • syntagmatic
Mikrokosmos: Representational Issues: The Lexicon

Adquirir-V1
    syn:  subj:  cat: NP
          obj:   cat: NP
    sem:  acquire
          agent: HUMAN
          theme: OBJECT

Adquirir-V2
    syn:  subj:  cat: NP
          obj:   cat: NP
    sem:  acquire
          agent: HUMAN
          theme: INFORMATION
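A minimal sketch of how the two entries might be held as nested feature structures and told apart by the semantic type of their theme. The dictionary layout, the `head` key, and the helper `senses_with_theme` are assumptions for illustration; they are not the Mikrokosmos lexicon format.

```python
# The two entries from the slide, written as nested dicts (assumed layout).
LEXICON = {
    "Adquirir-V1": {
        "syn": {"subj": {"cat": "NP"}, "obj": {"cat": "NP"}},
        "sem": {"head": "acquire", "agent": "HUMAN", "theme": "OBJECT"},
    },
    "Adquirir-V2": {
        "syn": {"subj": {"cat": "NP"}, "obj": {"cat": "NP"}},
        "sem": {"head": "acquire", "agent": "HUMAN", "theme": "INFORMATION"},
    },
}


def senses_with_theme(lemma, theme_type):
    """Return the sense names of `lemma` whose theme slot has the given type."""
    prefix = lemma + "-"
    return [
        name
        for name, entry in LEXICON.items()
        if name.startswith(prefix) and entry["sem"]["theme"] == theme_type
    ]


information_senses = senses_with_theme("Adquirir", "INFORMATION")
```

The two senses share the same syntactic zone, so only the semantic zone can separate them; an exact type match is used here for brevity, whereas a fuller version would test the filler against the ontology.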
Mikrokosmos: Representational Issues: The Ontology
• Taxonomic, multi-hierarchical
• 14 local or inherited links on average
• language-impartial
• EVENTS, OBJECTS, PROPERTIES
• Methodology & Guidelines
Mikrokosmos: Representational Issues: The Ontology
• ACQUIRE
    DEFINITION    “The transfer of possession event where the agent transfers an object to its possession”
    IS-A          TRANSFER-POSSESSION
    SOURCE        HUMAN PLACE
    THEME         OBJECT (NOT HUMAN)
    AGENT         ANIMAL (DEFAULT HUMAN)
    DESTINATION   ANIMAL PLACE (DEFAULT HUMAN)
    INHERITED
    BENEFICIARY   HUMAN
Mikrokosmos: Acquisition Process: The Lexicon
• Multi-lingual
    • French, English, Japanese, Russian, Spanish, etc.
• Multi-media
• Multi-process
    • Analysis
    • Generation (mono and multilingual)
    • MT
    • Summarization
    • IE
    • Speech Processing
• Tools
    • corpus-search, lookup dictionary, ontology browser
Mikrokosmos: Acquisition Process: The Ontology
• Guidelines
1) Do not add instances as concepts
    • Instances do not have their own instances
    • Concepts do not have a fixed position in space/time
2) Do not decompose concepts further
3) Use close concepts
4) Do not add EVENTs with particular arguments
5) Do not add concepts with instance-specific aspects, temporal relations
6) Do not add language-specific concepts
7) Do not add ontological concepts for collections
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Daily negotiations
    • lexicon acquirers
    • ontology acquirers
• Possibilities
    • one-to-one mapping
    • lexicon unspecification
    • lexicon-ontology balance
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• One-to-one mapping
• Problems
    • Lexical: every word in a language is a concept
    • Conceptual: cuire in French is not ambiguous

PREPARE-FOOD    INST: COOKING-EQUIPMENT
    COOK        INST: STOVE
    BAKE        INST: OVEN

cook : cuire sur le feu        bake : cuire au four
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon Unspecification
• Problems
    • BAKE is not in the ontology

PREPARE-FOOD    INST: COOKING-EQUIPMENT

cook : cuire sur le feu        bake : cuire au four    INST: OVEN
Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
• Lexicon-Ontology Balance

PREPARE-FOOD    INST: COOKING-EQUIPMENT
    FRY         INST: STOVE    INST: FRYING-PAN
    BAKE        INST: OVEN

cook : cuire
bake
Mikrokosmos: Semantics in Action
• El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu. (The Roche group, through its company in Spain, acquired Doctor Andreu.)
• El grupo Roche adquirió Doctor Andreu a través de su compañía en España. (The Roche group acquired Doctor Andreu through its company in Spain.)
• La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España. (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)

ACQUIRE-1
    Agent: ORGANIZATION-1
    Theme: ORGANIZATION-2
    Instrument: ORGANIZATION-3
ORGANIZATION-1    Object-Name: Grupo Roche
ORGANIZATION-2    Object-Name: Doctor Andreu
ORGANIZATION-3    Location: España
Mikrokosmos: Semantics in Action
• Onto-Search: ontological search mechanism to check constraints
• check-onto(ACQUIRE, EVENT) = 1
    • since ACQUIRE is a type of EVENT
• check-onto(ORGANIZATION, HUMAN) = 0.9
    • since ORGANIZATION HAS-MEMBER HUMAN
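The two calls above can be reproduced with a toy search over ontology links. This is a sketch under stated assumptions, not the Onto-Search algorithm itself: the link tables are tiny, the TRANSFER-POSSESSION IS-A EVENT link is assumed, and the scoring scheme (1.0 for an IS-A path, 0.9 when the match goes through one HAS-MEMBER link, 0.0 otherwise) is invented to match the two values on the slide.

```python
# Toy ontology fragments. ACQUIRE IS-A TRANSFER-POSSESSION and
# ORGANIZATION HAS-MEMBER HUMAN come from the slides; the
# TRANSFER-POSSESSION IS-A EVENT link is an assumption.
IS_A = {
    "ACQUIRE": ["TRANSFER-POSSESSION"],
    "TRANSFER-POSSESSION": ["EVENT"],
}
HAS_MEMBER = {"ORGANIZATION": ["HUMAN"]}


def is_a(concept, target):
    """True if `target` is reachable from `concept` through IS-A links."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(IS_A.get(current, []))
    return False


def check_onto(concept, target):
    """Score how well `concept` satisfies the constraint `target`."""
    if is_a(concept, target):
        return 1.0
    # Assumed penalty: one HAS-MEMBER hop lowers the score to 0.9.
    if any(is_a(member, target) for member in HAS_MEMBER.get(concept, [])):
        return 0.9
    return 0.0


event_score = check_onto("ACQUIRE", "EVENT")
member_score = check_onto("ORGANIZATION", "HUMAN")
```

Returning a graded score instead of a boolean lets a constraint checker prefer a reading without ruling out the near misses.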
Mikrokosmos: Semantics in Action
1) a-través-de: INSTRUMENT, LOCATION
    adquirir requires PHYSICAL-OBJECT
2) en: LOCATION, TEMPORAL
    España is not a TEMPORAL-OBJECT
3) adquirir: ACQUIRE, LEARN
    Doctor Andreu is not an INFORMATION
4) Doctor Andreu: ORGANIZATION, HUMAN
    the Theme of ACQUIRE is not HUMAN
5) compañía: CORPORATION, SOCIAL-EVENT
    ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts
Mikrokosmos: Experiment: WSD

Text                1      2      3      4      Mean
words               347    385    370    353    364
words/sentence      16.5   24.0   26.4   20.8   21.4
open-class words    183    167    177    177    176
ambiguous words     57     42     57     35     48
syntax              21     19     20     12     18
correct             51     41     45     34     43
%                   97     99     93     99     97
Mikrokosmos: Experiment: WSD

Text                Mean   Mean Unseen
words               364    390
words/sentence      21.4   26
open-class words    176    104
ambiguous words     48     26
syntax              18     9
correct             43     23
%                   97     97
WordNet2

WordNet2: Outline
• Introduction
• Text Inferences
• Defining Features
• Plausible inferences
• Inference Rules
• Semantic Paths
• What WordNet cannot do
WordNet2: Introduction
• (Harabagiu 98)
• Commonsense reasoning requires extensive knowledge
• ~100 million concepts and relations
• WordNet
    • represents almost all English words
    • 100,000 synsets
    • linked by semantic relations
• WordNet2
    • each synset has a gloss that, when disambiguated, may increase the number of relations
    • WordNet glosses into semantic networks
    • NEW RELATIONS
WordNet2: Text Inferences
German was hungry
He opened the refrigerator
• hungry (feeling a need or desire to eat)
• eat (take in solid food)
• refrigerator (an appliance in which foods can be stored at low temperature)
WordNet2: Defining Features
• Transform each concept’s gloss into a graph where concepts are nodes and lexical relations are links
• <culture> (all the knowledge shared by society)
    <share> --AGENT--> <society>
• <doctor> (licensed medical practitioner)
    <medical practitioner> --ATTRIBUTE--> <licensed>
WordNet2: Defining Features
[Figure: gloss graph for "pilot". Nodes: pilot, person, qualified, guide, water, difficult, ship. Link labels: GLOSS, ATTRIBUTE, PURPOSE, LOCATION, ATTRIBUTE, OBJECT]
WordNet2: Inference Rules

Rule 1                            Rule 2
VC1  IS-A  VC2                    VC1  IS-A    VC2
VC2  IS-A  VC3                    VC2  ENTAIL  VC3
-------------------------         -------------------------
VC1  IS-A  VC3                    VC1  ENTAIL  VC3

Rule 3                            Rule 4
VC1  IS-A    VC2                  VC1  IS-A      VC2
VC2  R_IS-A  VC3                  VC2  R_ENTAIL  VC3
-------------------------         -------------------------
VC1  PLAUSIBLE (not VC3)          VC1  EXPLAINS  VC3

• 16 + 1 rules
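Each rule on this slide combines two links that share a middle concept into one derived link, so the rules can be sketched as a lookup table keyed by the pair of relation names. The triple representation, the `PLAUSIBLE-NOT` label standing in for "PLAUSIBLE (not VC3)", and the example verbs are assumptions for illustration; only the four relation pairs shown on the slide are encoded, not the full 16 + 1 rule set.

```python
# (relation of first premise, relation of second premise) -> derived relation
RULES = {
    ("IS-A", "IS-A"): "IS-A",
    ("IS-A", "ENTAIL"): "ENTAIL",
    ("IS-A", "R_IS-A"): "PLAUSIBLE-NOT",   # stands for "VC1 PLAUSIBLE (not VC3)"
    ("IS-A", "R_ENTAIL"): "EXPLAINS",
}


def infer(first, second):
    """Combine two (source, relation, target) links that share a middle concept.

    Returns the derived triple, or None if the links do not chain
    or no rule covers the relation pair.
    """
    vc1, rel1, mid1 = first
    mid2, rel2, vc3 = second
    if mid1 != mid2:
        return None
    derived = RULES.get((rel1, rel2))
    return (vc1, derived, vc3) if derived else None


# Hypothetical verb concepts, used only to exercise the table.
chained = infer(("stroll", "IS-A", "walk"), ("walk", "IS-A", "move"))
entailed = infer(("snore", "IS-A", "breathe_noisily"),
                 ("breathe_noisily", "ENTAIL", "sleep"))
```

Keeping the rules in a table rather than in branching code makes it straightforward to add the remaining rules as further entries.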
WordNet2: Semantic Paths
0) Create and load the KB
1) Place markers on KB concepts
2) Propagate markers
    The algorithm avoids cycles
3) Detect collisions
    Each marker collision corresponds to a path
4) Extract inferences
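The steps above can be sketched as two breadth-first marker propagations over a directed gloss graph, followed by a search for concepts that both markers reached. This is an illustrative simplification, not Harabagiu's algorithm: the graph is a hand-built fragment based on the hungry / eat / refrigerator glosses from the earlier slide, and the function names are assumptions.

```python
from collections import deque

# Step 0: a tiny KB. Each concept points to concepts used in its gloss.
GRAPH = {
    "hungry": ["desire", "eat"],
    "eat": ["food"],
    "refrigerator": ["appliance", "food"],
}


def propagate(graph, source):
    """Steps 1-2: place a marker on `source` and propagate it breadth-first.

    Returns a predecessor map; a node is visited once, which avoids cycles.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent


def path_to(parent, node):
    """Rebuild the path from the marker's source to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def semantic_paths(graph, a, b):
    """Steps 3-4: find collisions and return one path per collision node."""
    from_a, from_b = propagate(graph, a), propagate(graph, b)
    paths = {}
    for node in from_a:
        if node in from_b:
            backwards = path_to(from_b, node)[::-1]
            paths[node] = path_to(from_a, node) + backwards[1:]
    return paths


paths = semantic_paths(GRAPH, "hungry", "refrigerator")
```

On this fragment the only collision is at `food`, giving the path hungry, eat, food, refrigerator, which mirrors the inference sequence linking the two sentences of the example.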
WordNet2: Semantic Paths
Inference sequence
• German was hungry
• German felt a desire to eat
• German felt a desire to take in food

COLLISION: German=he felt a desire to take food, stored in an appliance, which he opened

• He opened an appliance where food is stored
• He opened the refrigerator
WordNet2: What WordNet cannot do
Major WordNet limitations:
1) The lack of compound concepts
2) The small number of causation and entailment relations
3) The lack of preconditions for verbs
4) The absence of case relations
ThoughtTreasure

ThoughtTreasure: Overview
• a comprehensive platform for
    • NLP (English, French)
    • commonsense reasoning
• A hotel room has a bed, night table, ...
• People have fingernails
• soda is a drink
• one hangs up at the end of a phone call
• the sky is blue
• dogs bark
• someone who is 16 years old is a teenager
ThoughtTreasure: Overview
• 25,000 concepts organized into a hierarchy
    EVIAN -> FLAT-WATER -> DRINKING-WATER
• 55,000 words (English, French)
    food <-> aliment <-> FOOD
• 50,000 assertions about concepts
    green-pea is green
• 100 scripts
ThoughtTreasure: Overview
• Text Agents for recognizing names, phones, etc.
• mechanisms for learning new words
    • X-phile is someone who likes X
• a syntactic parser
• a NL generator
• a semantic parser
• an anaphoric parser
• planning agents for achieving goals
• understanding agents
ThoughtTreasure: Example
• Who created Bugs Bunny?
    • 1.0 (create human-interrogative-pronoun Bugs-Bunny)
    • 0.9 (create rock-group-the-Who Bugs-Bunny)
    • 1.0 (create Tex-Avery Bugs-Bunny)
    • 0.1 (not (create rock-group-the-Who Bugs-Bunny))
Meaning

Meaning: Overview
Knowledge Bases
– Automatic enrichment of EWN (verbal models, etc.)
– Mixed approach (KB + ML)
– Q/A
Problem
– structural and lexical ambiguity
Approach
– automatically locate examples of senses (Leacock et al. 98, Mihalcea and Moldovan 99)
– large-scale WSD (Boosting, SVM, transductive …)
– Knowledge Acquisition (Ribas 95, McCarthy 01)
Meaning: Exploiting EWN Semantic Relations
[Figure: EWN hierarchy fragments for the two senses of "partido". Synsets shown, in slide order:]
<evento social>
<competición, concurso>
<evento>
<partido_1>
<semifinal> <cuartos_de_final>
<grupo_social>
<organización>
<agrupación grupo colectivo>
<partido_2, partido_político>
<partido_laborista>
Meaning: Exploiting EWN Semantic Relations
partido 1
– Todos los partidos piden reformas legales para TV3.
– La derecha planea agruparse en un partido.
– El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols.
partido 2
– Pero España puso al partido intensidad, ritmo y coraje.
– El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
– El Racing no gana en su campo desde hace seis partidos.
Meaning: Exploiting EWN Semantic Relations
partido 1
– No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
– Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
– Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
partido 2
– Rivera pide el suporte de la afición para encarrilar las semifinales.
– Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
– El Racing ganó los cuartos de final en su campo.
Meaning: Architecture
[Figure: a Multilingual Central Repository connected to the Italian, Basque, Spanish, English and Catalan EWNs. Each language has its own Web Corpus and ACQ, UPLOAD, PORT and WSD components.]