
German Rigau i Claramunt

http://www.lsi.upc.es/~rigau
TALP Research Center

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Ontologies

Ontologies: Outline

• WordNet (Miller et al. 90, Fellbaum 98)
• EuroWordNet (Vossen et al. 98)
• Spanish WordNet
• Combining Methods (Atserias et al. 97)
• Mapping hierarchies (Daudé et al. 01)
• Mikrokosmos (Viegas et al. 96)
• Cyc (Mahesh et al. 96)
• WordNet 2 (Harabagiu 98)
• MindNet (Richardson et al. 97)
• ThoughtTreasure (Mueller 00)
• Meaning ...


WordNet & EuroWordNet

WordNet & EuroWordNet: WordNet

• Princeton University (Miller et al. 1990)
• Lexicalized concepts (words, lexical units)
• Linked to one another by semantic relations:

  – synonymy
  – antonymy
  – hypernymy-hyponymy
  – meronymy
  – entailment
  – cause
  – ...

WordNet & EuroWordNet: Semantic Relations in WN1.5

• Synonymy
  – Lexicalized concepts (SYNSETS)
  – A weak notion of synonymy: synonymy in context
  – Synset: a set of words or lexical units that, in a given context, express a concept

• Hypernymy / Hyponymy
  – Class-to-subclass relation

• Meronymy
  – Component part: {mano} (hand) → {brazo} (arm)
  – Member of a collection: {persona} (person) → {gente} (people)
  – Substance: {periódico} (newspaper) → {papel} (paper)

WordNet & EuroWordNet: Semantic Relations in WN1.5

• Antonymy: {grande} (big) ↔ {pequeño} (small)

• Cause: {matar} (kill) → {morir} (die)

• Entailment: {divorciarse} (divorce) → {casarse} (marry)

• Derivation: {presidencial} (presidential) → {presidente} (president)

• Similarity: {bueno} (good) → {positivo} (positive)
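The relations above can be pictured as typed links between synsets. Synonymy is captured by two words belonging to the same synset. The Python sketch below assumes a toy in-memory store. The class `WordNetSketch`, the synset ids and the relation names (`hypernym`, `part_of`, `causes`) are hypothetical; they are not WordNet's actual file format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Synset:
    """A lexicalized concept: the variants that express it in some context."""
    sid: str
    variants: Tuple[str, ...]


class WordNetSketch:
    """Toy store of synsets plus typed, directed relations between them."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}
        # relation name -> source synset id -> target synset ids
        self.relations: Dict[str, Dict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set))

    def add_synset(self, sid: str, *variants: str) -> None:
        self.synsets[sid] = Synset(sid, tuple(variants))

    def relate(self, relation: str, source: str, target: str) -> None:
        self.relations[relation][source].add(target)

    def senses(self, word: str) -> Set[str]:
        """Ids of every synset that lists `word` as a variant."""
        return {s.sid for s in self.synsets.values() if word in s.variants}

    def synonyms(self, word: str) -> Set[str]:
        """Synonymy in context: the other variants of the synsets of `word`."""
        return {v for sid in self.senses(word)
                for v in self.synsets[sid].variants if v != word}

    def targets(self, relation: str, sid: str) -> Set[str]:
        """Synsets reached from `sid` through one `relation` link."""
        return set(self.relations[relation].get(sid, set()))


wn = WordNetSketch()
wn.add_synset("cab.n", "cab", "taxi", "hack")
wn.add_synset("motor_vehicle.n", "motor vehicle")
wn.add_synset("mano.n", "mano")
wn.add_synset("brazo.n", "brazo")
wn.add_synset("matar.v", "matar")
wn.add_synset("morir.v", "morir")
wn.relate("hypernym", "cab.n", "motor_vehicle.n")  # hyponymy, stored upwards
wn.relate("part_of", "mano.n", "brazo.n")          # meronymy: a hand is part of an arm
wn.relate("causes", "matar.v", "morir.v")          # cause: killing causes dying
print(sorted(wn.synonyms("taxi")))                 # ['cab', 'hack']
```

Storing each relation in one direction keeps the sketch small. A fuller model would also index the inverse links (hyponym, has_part) for lookups from the other side.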

WordNet & EuroWordNet: Semantic Relations in WN1.5

WordNet & EuroWordNet: WordNet Example

[Figure: WordNet example showing the synsets <conveyance>, <vehicle>, <motor vehicle, automobile, ...>, <cab, taxi, hack, ...>, <cruiser, squad car, patrol car, ...>, <car door> and <doorlock>]

• Project LE-2 4003
  – Telematics Application Programme of the EU

• Semantic networks for several languages
  – Integrated and interconnected

  – English: University of Sheffield
  – Dutch: University of Amsterdam
  – Italian: I.L.C., Pisa
  – Spanish: UB, UPC, UNED

• Computers and the Humanities
  – (special issue, 1998)

• http://www.hum.uva.nl/~ewn/

WordNet & EuroWordNet: EuroWordNet

• EWN2: German, French, Czech, Swedish, Estonian

• ITEM project: Spanish, Catalan, Basque

• CREL (Centre de Referència d’Enginyeria Lingüística)
  – Catalan (UB, UPC)

WordNet & EuroWordNet: EuroWordNet Extensions

• Development of basic resources
• Cross-lingual processing of information
  – Multilingual information retrieval systems (e.g., the Internet)
  – Lexical-semantic module of language engineering systems:
    information extraction, machine translation

WordNet & EuroWordNet: Applications

• Preservation of the semantic relations specific to each language
• Maximum compatibility between the different resources
• Relative independence of the individual wordnets
  – in the building process
  – in the final result

WordNet & EuroWordNet: Design Requirements

• Core
  – The ILI
  – The Top Concept Ontology (TCO)
  – The Domain Ontology (DO)

• Periphery
  – The language-specific wordnets

WordNet & EuroWordNet: Components of EuroWordNet

• An unstructured collection of records

• Records are linked to
  – at least one synset of a EuroWordNet wordnet
  – an element of the TCO or the DO

• Records are associated with WN 1.5 synsets

WordNet & EuroWordNet: Interlingual Index of EuroWordNet

• A language-independent hierarchy of concepts
  – semantic distinctions: object, place, dynamic, …
  – abstract (non-lexical)
  – superimposed on the ILI

• Three types of entities:
  – First order: concrete entities
  – Second order: static or dynamic situations
  – Third order: abstract propositions

WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet

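[Figure: the EuroWordNet Top Concept Ontology; the number after each node is the count shown next to it in the figure]

Top (0)
  1stOrderEntity (1)
    Origin (0): Natural (21), Living (30), Plant (18), Human (106), Creature (2), Animal (23), Artifact (144)
    Form (0): Substance (32), Solid (63), Liquid (13), Gas (1), Object (162)
    Composition (0): Part (86), Group (63)
    Function (55): Vehicle (8)
  2ndOrderEntity (0)
    SituationType (6): Dynamic (134), BoundedEvent (183), UnboundedEvent (48), Static (28), Property (61), Relation (38)
    SituationComponent (0): Cause (67), Agentive (170), Phenomenal (17), Stimulating (25), Communication (50), Condition (62), Existence (27), Experience (43), Location (76), Manner (21), Mental (90)
  3rdOrderEntity (33)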

WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet

• Hierarchy of domain labels
• Reduction of polysemy
• Domains:
  – Traffic
    – Road traffic, air traffic
  – International news
  – Mycology
  – Medicine

WordNet & EuroWordNet: Domain Ontology of EuroWordNet

• Richer than WN
• Relations between:
  – synsets (monolingual modules)
  – ILI records (multilingual): {actuar-1} EQ-SYNONYM {‘behave in a certain manner’}
  – ILI records and TCO or DO records

WordNet & EuroWordNet: Relations of EuroWordNet

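[Figure: the interlingual relations eq_synonym, has_eq_hyponym and has_eq_hypernym link the Spanish (ESP), Dutch (HOL), Italian (IT) and English (ING) wordnets through the ILI. Local synsets shown: dito (IT), dedo and cabeza (ESP), hoofd and kop (HOL), finger, toe and head (ING). ILI records shown: finger, toe, finger or toe, head, human head, animal head]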
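The ILI lets two wordnets reach each other without a direct bilingual link. The Python sketch below illustrates this with toy data. It assumes, for illustration only, that Spanish dedo and Italian dito are eq_synonyms of an ILI record "finger or toe". It also assumes that dedo is linked to the records "finger" and "toe" by has_eq_hyponym. The data layout and the function `equivalents` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical equivalence links: (language, local synset, relation, ILI record).
LINKS: List[Tuple[str, str, str, str]] = [
    ("es", "dedo", "eq_synonym", "finger or toe"),
    ("es", "dedo", "has_eq_hyponym", "finger"),
    ("es", "dedo", "has_eq_hyponym", "toe"),
    ("it", "dito", "eq_synonym", "finger or toe"),
    ("en", "finger", "eq_synonym", "finger"),
    ("en", "toe", "eq_synonym", "toe"),
]

# Index the links by ILI record: record -> [(language, synset, relation), ...]
BY_ILI: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
for lg, syn, rel, rec in LINKS:
    BY_ILI[rec].append((lg, syn, rel))


def equivalents(lang: str, synset: str, target: str,
                via: Tuple[str, ...] = ("eq_synonym",)) -> Set[str]:
    """Target-language synsets that are eq_synonyms of an ILI record which
    the source synset reaches through one of the relations in `via`."""
    found: Set[str] = set()
    for entries in BY_ILI.values():
        if any(lg == lang and syn == synset and rel in via
               for lg, syn, rel in entries):
            found |= {syn for lg, syn, rel in entries
                      if lg == target and rel == "eq_synonym"}
    return found


print(equivalents("es", "dedo", "it"))                               # {'dito'}
print(equivalents("es", "dedo", "en"))                               # set()
print(sorted(equivalents("es", "dedo", "en", ("has_eq_hyponym",))))  # ['finger', 'toe']
```

In this toy data, no English synset is an exact equivalent of dedo. English is reached only through the more specific records, which is the situation the has_eq_hyponym relation is meant to encode.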

WordNet & EuroWordNet: Interlingual Relations of EuroWordNet

Relation | Applicable labels | Example | Description
HAS_XPOS_HYPERONYM | d c r | destrucción (destruction) > cambiar (change) | cross-category hypernymy
HAS_XPOS_HYPONYM | d r | cambio (change) > destruir (destroy) | cross-category hyponymy
NEAR_SYNONYM | r | aparato (apparatus) <> instrumento (instrument) | near-synonymy
NEAR_ANTONYM | r | construir (build) <> destrozar (wreck) | near-antonymy
XPOS_NEAR_ANTONYM | r | construcción (construction) > destrozar (wreck) | cross-category near-antonymy
INVOLVED | d c r | martillear (to hammer) > martillo (hammer) | entity directly related to an event
ROLE | d c r | vino (wine) > beber (drink) | event directly related to an entity
involved_agent | d c r | educar (educate) > educador (educator) | involvement in which the entity plays an agentive role
role_agent | d c r | educador (educator) > educar (educate) | role in which the entity plays an agentive role
role_location | d c r | comedor (dining room) > comer (eat) | role in which the entity plays a locative role
HAS_MERONYM | d c r n | cara (face) > nariz (nose) | meronymy (generic)
has_mero_portion | d c r n | pan (bread) > mendrugo (crust) | inverse of the previous one
has_mero_location | d c r n | desierto (desert) > oasis (oasis) | inverse of the previous one
BE_IN_STATE | d c r n | belleza (beauty) > bello (beautiful) | state corresponding to having a certain property

WordNet & EuroWordNet: Relations of EuroWordNet

Spanish WordNet: Building Process


Spanish WordNet: General Methodology

1) Mapping to WN1.5
   – manual work
   – automatic derivation of equivalents using bilingual dictionaries

2) Manual correction

3) Re-structuring

Spanish WordNet: Main Steps: First Core (Manual Translation)

– Nouns:
  A) WN1.5's Tops file plus the first level of hyponyms (about 800 synsets).
  B) The rest of EWN's Common Base Concepts (those not already in our set).
  C) Manual translation of the synsets lying between (A) and (B), following the WN1.5 hierarchy, thus building a compact taxonomy equivalent to WN1.5 without gaps.

– Verbs:
  Manual translation of EWN's Base Concepts (about 150 synsets).

Spanish WordNet: Main Steps: Subset 1 (Semi-automatic)

Nouns:
– Applying automatic methods using bilingual dictionaries
– Manual validation of several subsets to check whether the links are correct
– Deriving a Confidence Score (CS) for every automatic method (heuristic)
– Selecting the synset-word pairs above 85% CS
– Some manual correction of Subset 1 (mainly filling gaps)

Verbs:
– 3600 English verbs connected to WN1.5 senses and ambiguously translated into Spanish were manually inspected and disambiguated
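The noun selection step above can be sketched in Python. This is a minimal illustration that makes two assumptions:

- Each heuristic's CS is its precision on the manually validated sample.
- A synset-word pair proposed by several heuristics keeps the highest CS.

The CS values are the %ok figures of the ten class methods reported later in these slides, written as fractions. The candidate pairs (`w1`, `s1`, ...) and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (Spanish word, WN1.5 synset id)


def confidence_score(correct: int, checked: int) -> float:
    """CS of a heuristic: share of its manually checked links that were right."""
    return correct / checked


# CS per heuristic (from the %ok column of the class-method results).
METHOD_CS: Dict[str, float] = {
    "mono1": 0.92, "mono2": 0.89, "variant": 0.85, "poly1": 0.80, "poly3": 0.58,
}


def select_pairs(proposals: List[Tuple[str, str, str]],
                 method_cs: Dict[str, float],
                 threshold: float = 0.85) -> Dict[Pair, float]:
    """Keep the pairs whose best supporting heuristic reaches the threshold."""
    best: Dict[Pair, float] = {}
    for word, synset, method in proposals:
        key = (word, synset)
        best[key] = max(best.get(key, 0.0), method_cs[method])
    return {pair: cs for pair, cs in best.items() if cs >= threshold}


# Hypothetical proposals: (word, synset, heuristic that proposed the link).
proposals = [
    ("w1", "s1", "mono1"),
    ("w2", "s2", "poly3"),
    ("w2", "s3", "poly1"),
    ("w2", "s3", "variant"),
    ("w3", "s4", "poly1"),
]
print(select_pairs(proposals, METHOD_CS))
```

Whether the threshold is inclusive (`>=`) is an assumption. The slides only say "above 85%".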

Spanish WordNet: Main Steps: Subset 1 (Results 1)

Measure | Nouns | Verbs | Others | Total
Synsets | 18577 | 2602 | 0 | 21179
Number of senses (variants) | 39620 | 6795 | 0 | 46415
Average variants per synset | 2.22 | 2.61 | 0 | 2.27
Corresponding number of entries (words) | 23216 | 2278 | 0 | 25494
Average senses per word | 1.77 | 2.98 | 0 | 1.88
Language-internal relations | 40559 | 3749 | 0 | 44308
Language-internal relations per synset | 2.18 | 1.44 | 0 | 2.09
Equivalence relations to the ILI (WN1.5) | 18634 | 2602 | 0 | 21236
Equivalence relations per synset | 1.00 | 1.00 | 0 | 1.00
Synsets without ILI | 0 | 0 | 0 | 0
Percentage of synsets without translation | 0% | 0% | – | 0%

Spanish WordNet: Main Steps: Subset 1 (Results 2)

CS | Nouns | Verbs | Total
100% (manual) | 5041 | 6795 | 11836
> 97% | 403 | 0 | 403
> 95% | 304 | 0 | 304
> 93% | 1598 | 0 | 1598
> 86% | 27649 | 0 | 27649
> 85% | 4625 | 0 | 4625
Total | 39620 | 6795 | 46415

Spanish WordNet: Main Steps: Subset 2

Main goals:
– enhance the quality of Subset 1 by manual revision
– extend it by manually building synsets

4 sub-tasks

Spanish WordNet: Main Steps: Subset 2

1) Manually covering the gaps in the hyponymy chains that are covered by other languages.

2) Manual cleaning of some automatically generated variants:
   – (a) pairs of synsets that are adjacent in the hyponymy chain and share at least one variant:
     deleting redundant variants, or relocating them to either pre-existing or newly created synsets
   – (b) multi-word expressions present in synsets:
     deleting the non-lexicalized ones
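Sub-task 2(a) starts by locating the candidate pairs. The Python sketch below assumes that each synset is stored as a set of Spanish variants and that the hyponymy chain is a child-to-parent map. The synset ids and their contents are hypothetical. The decision to delete or relocate each shared variant stays manual, as in the slides.

```python
from typing import Dict, List, Set, Tuple


def shared_variant_pairs(synsets: Dict[str, Set[str]],
                         hypernym: Dict[str, str]
                         ) -> List[Tuple[str, str, Set[str]]]:
    """(hyponym, hypernym, shared variants) for adjacent synsets sharing variants."""
    found = []
    for child, parent in sorted(hypernym.items()):
        shared = synsets.get(child, set()) & synsets.get(parent, set())
        if shared:
            found.append((child, parent, shared))
    return found


# Hypothetical automatically generated synsets and hyponymy links.
SYNSETS = {
    "s_vehiculo": {"vehículo"},
    "s_vehiculo_motor": {"vehículo de motor", "automóvil", "vehículo"},
    "s_coche": {"coche", "automóvil", "carro"},
}
HYPERNYM = {"s_vehiculo_motor": "s_vehiculo", "s_coche": "s_vehiculo_motor"}

for child, parent, shared in shared_variant_pairs(SYNSETS, HYPERNYM):
    print(f"{child} < {parent}: review {sorted(shared)}")
```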

Spanish WordNet: Main Steps: Subset 2

3) Manual addition of new vocabulary considered relevant.
   – It mainly comes from the Catalan WordNet: since both wordnets are being built in parallel, we detected the synsets that had been built for Catalan but not for Spanish.

4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets.
   – This work was based mainly on noun-verb pairs obtained by means of morphological criteria (work carried out by UNED, Madrid).

Spanish WordNet: Main Steps: Subset 2 (Results)

Measure | Nouns | Verbs | Others | Total
Synsets | 19663 | 3538 | 0 | 23201
Number of senses (variants) | 39782 | 8394 | 0 | 48176
Average variants per synset | 2.02 | 2.37 | 0 | 2.08
Corresponding number of entries (words) | 22881 | 3324 | 0 | 26205
Average senses per word | 1.74 | 2.53 | 0 | 1.84
Language-internal relations | 43151 | 6756 | 2661 | 52568
Language-internal relations per synset | 2.19 | 1.91 | ? | 2.27
Equivalence relations to the ILI (WN1.5) | 19534 | 3534 | 0 | 23068
Equivalence relations per synset | 0.99 | 1.00 | 0 | 0.99
Synsets without ILI | 185 | 4 | 0 | 189
Percentage of synsets without translation | 1% | 0% | 0 | 1%

Spanish WordNet: Main Steps: Subset 2 (Results)

Confidence (variants) | Nouns | Verbs | Total
100% (manual) | 7819 | 8394 | 16213
> 96% | 382 | 0 | 382
> 94% | 2948 | 0 | 2948
> 92% | 1364 | 0 | 1364
> 85% | 23113 | 0 | 23113
> 84% | 4156 | 0 | 4156
Total | 39782 | 8394 | 48176

Spanish WordNet: Main Steps: Beyond Subset 2

Massive manual checking (from Nov '98)
– Using WEI:
  – automatically generated variants
  – filling gaps in the hierarchy
  – new vocabulary
  – new adjectives

Spanish WordNet: Main Steps: Beyond Subset 2

Measure | Nouns | Verbs | Others | Total
Synsets | 24215 | 4079 | 2191 | 30485
No. of senses | 40759 | 9317 | 2439 | 52515
Senses per synset | 1.68 | 2.28 | 1.11 | 1.72
Entries | 26485 | 3828 | 2439 | 32752
Senses per entry | 1.54 | 2.43 | 1.00 | 1.60
Language-internal relations (LIRels) | 54832 | 7978 | 10855 | 73665
LIRels per synset | 2.26 | 1.96 | * | 2.42
Equivalence relations to the ILI (EQRels-ILI) | 24209 | 4074 | 0 | 28283
EQRels per synset | 1.00 | 1.00 | 0 | 0.93
Synsets without ILI | 62 | 5 | 2191 | 2258

Spanish WordNet: Main Steps: Beyond Subset 2

CS | Nouns | Verbs | Adjectives | Total
99% (manual) | 16568 | 9317 | 2439 | 28324
97% | 310 | 0 | 0 | 310
95% | 2652 | 0 | 0 | 2652
93% | 1173 | 0 | 0 | 1173
90% | 6 | 0 | 0 | 6
86% | 16605 | 0 | 0 | 16605
85% | 3445 | 0 | 0 | 3445
Total | 40759 | 9317 | 2439 | 52515

Spanish WordNet: Main Steps: Parole Coverage

Frequency | Nouns: Parole entries | Nouns: covered | Nouns: % coverage | Verbs: Parole entries | Verbs: covered | Verbs: % coverage
≥1001 | 147 | 143 | 97.28 | 110 | 107 | 97.27
501-1000 | 261 | 246 | 94.25 | 139 | 118 | 84.89
251-500 | 462 | 429 | 92.86 | 218 | 172 | 78.90
101-250 | 933 | 863 | 92.50 | 381 | 257 | 67.45
51-100 | 959 | 863 | 89.99 | 374 | 265 | 70.86
31-50 | 892 | 804 | 90.13 | 347 | 185 | 53.31
21-30 | 730 | 632 | 86.57 | 286 | 141 | 49.30
11-20 | 1202 | 978 | 81.36 | 469 | 175 | 37.31
6-10 | 1024 | 790 | 77.15 | 360 | 129 | 35.83
3-5 | 968 | 665 | 68.70 | 254 | 74 | 29.13
2 | 435 | 257 | 59.08 | 123 | 32 | 26.02
1 | 643 | 334 | 51.94 | 131 | 26 | 19.85
Overall | 8656 | 7004 | 80.91 | 3192 | 1681 | 52.66
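The percentage columns follow directly from the counts. The overall row is the pooled ratio across all frequency bands, not an average of the per-band percentages. The short Python check below uses the noun and verb counts from the Parole coverage table. The helper names are illustrative.

```python
from typing import List, Tuple

Band = Tuple[str, int, int]  # (frequency band, Parole entries, entries covered)

NOUNS: List[Band] = [
    ("≥1001", 147, 143), ("501-1000", 261, 246), ("251-500", 462, 429),
    ("101-250", 933, 863), ("51-100", 959, 863), ("31-50", 892, 804),
    ("21-30", 730, 632), ("11-20", 1202, 978), ("6-10", 1024, 790),
    ("3-5", 968, 665), ("2", 435, 257), ("1", 643, 334),
]
VERBS: List[Band] = [
    ("≥1001", 110, 107), ("501-1000", 139, 118), ("251-500", 218, 172),
    ("101-250", 381, 257), ("51-100", 374, 265), ("31-50", 347, 185),
    ("21-30", 286, 141), ("11-20", 469, 175), ("6-10", 360, 129),
    ("3-5", 254, 74), ("2", 123, 32), ("1", 131, 26),
]


def coverage(entries: int, covered: int) -> float:
    """Percentage of Parole entries covered, rounded as in the table."""
    return round(100 * covered / entries, 2)


def overall(bands: List[Band]) -> Tuple[int, int, float]:
    """Pool all bands: total entries, total covered, overall coverage."""
    entries = sum(e for _, e, _ in bands)
    covered = sum(c for _, _, c in bands)
    return entries, covered, coverage(entries, covered)


print(overall(NOUNS))  # (8656, 7004, 80.91)
print(overall(VERBS))  # (3192, 1681, 52.66)
```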

Spanish WordNet: Current Figures

– Spanish, Catalan, Basque, (English)
– http://nipadio.lsi.upc.es/wei2.html

Language | Noun synsets | Noun words | Verb synsets | Verb words | Adj. synsets | Adj. words
English | 60556 | 87641 | 11363 | 14727 | 16428 | 19101
Spanish | 43522 | 47665 | 7934 | 5312 | 12481 | 8762
Catalan | 30701 | 32987 | 4505 | 4285 | 1444 | 1561

Combining Multiple Methods for the Automatic Construction of Multilingual WordNets


Ten class methods
– Four monosemic criteria
– Four polysemic criteria
– Two hybrid criteria

Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations

Combining Multiple Methods ...: Outline

– Four classes

Combining Multiple Methods ...: Ten class methods

[Figure: the four classes of translation links between Spanish words (SW) and English words (EW)]

– Four monosemic criteria

[Figure: one monosemic criterion per class, linking Spanish words (SW) through English words (EW) to a single WordNet synset]
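The Python sketch below shows how a Spanish word can be assigned a class and a criterion. It rests on two assumptions, which are a reading of the figures rather than statements from the slides:

- Classes 1 to 4 are the 1:1, 1:N, N:1 and N:M translation patterns.
- A criterion is monosemic when every English translation has a single WordNet synset.

The dictionary entries and sense inventories are toy data. The two pheasant senses and the person and artifact senses of bird echo the mapping example later in the slides.

```python
from typing import Dict, Set

# Toy bilingual dictionary in both directions and WordNet senses per English word.
ES_EN: Dict[str, Set[str]] = {
    "rapaz": {"raptor"},
    "faisán": {"pheasant"},
    "ave": {"bird", "fowl"},
}
EN_ES: Dict[str, Set[str]] = {
    "raptor": {"rapaz"},
    "pheasant": {"faisán"},
    "bird": {"ave", "pájaro"},
    "fowl": {"ave"},
}
WN_SENSES: Dict[str, Set[str]] = {
    "raptor": {"raptor.animal"},
    "pheasant": {"pheasant.animal", "pheasant.food"},
    "bird": {"bird.animal", "bird.artifact", "bird.person"},
    "fowl": {"fowl.animal", "fowl.food"},
}

# (several English translations?, several Spanish back-translations?) -> class
CLASS_INDEX = {(False, False): 1, (True, False): 2,   # 1:1, 1:N
               (False, True): 3, (True, True): 4}     # N:1, N:M


def criterion(sw: str) -> str:
    """Name the (assumed) criterion that applies to Spanish word `sw`."""
    ews = ES_EN[sw]
    several_en = len(ews) > 1
    several_es = any(len(EN_ES.get(ew, set())) > 1 for ew in ews)
    monosemic = all(len(WN_SENSES.get(ew, set())) == 1 for ew in ews)
    prefix = "mono" if monosemic else "poly"
    return f"{prefix}{CLASS_INDEX[(several_en, several_es)]}"


for word in ES_EN:
    print(word, criterion(word))
```

The numbering is consistent with the link, synset and word counts in the results table for mono1 to mono4, but it remains an assumption.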

Combining Multiple Methods ...: Ten class methods

– Four polysemic criteria

[Figure: one polysemic criterion per class, in which the English words (EW) translating a Spanish word (SW) have several WordNet synsets (Synset+)]

Combining Multiple Methods ...: Ten class methods

– Variant criterion: SW is linked to a synset <..., EW, ..., EW, ...> containing several of its English translations

– Field criterion: SW is linked to a synset <..., headword-EW, ..., Ind-EW, ...>
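The variant criterion, as read from the pattern above, links a Spanish word to every synset that contains at least two of its English translations. The Python sketch below illustrates this. The threshold of two, the function name and the dictionary entry for coche are assumptions for illustration. The synsets are simplified WordNet senses of car.

```python
from typing import Dict, List, Set


def variant_links(translations: Set[str], synsets: Dict[str, Set[str]],
                  min_shared: int = 2) -> List[str]:
    """Synsets holding at least `min_shared` of the word's English translations."""
    return sorted(sid for sid, variants in synsets.items()
                  if len(translations & variants) >= min_shared)


# Hypothetical bilingual entry and simplified WordNet synsets.
COCHE = {"car", "auto", "automobile"}
SYNSETS = {
    "car.1": {"car", "auto", "automobile", "machine", "motorcar"},
    "car.2": {"car", "railcar", "railway car", "railroad car"},
    "car.3": {"car", "gondola"},
}
print(variant_links(COCHE, SYNSETS))  # ['car.1']
```

A single shared translation such as car matches all three synsets. Requiring two co-occurring variants is what narrows the choice.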

Combining Multiple Methods ...: Ten class methods

Results

Criterion | #links | #synsets | #words | %ok
mono1 | 3697 | 3583 | 3697 | 92
mono2 | 935 | 929 | 661 | 89
mono3 | 1863 | 1158 | 1863 | 89
mono4 | 2688 | 1328 | 2063 | 85
poly1 | 5121 | 4887 | 1992 | 80
poly2 | 1450 | 1426 | 449 | 75
poly3 | 11687 | 6611 | 3165 | 58
poly4 | 40298 | 9400 | 3754 | 61
Variant | 3164 | 2195 | 2261 | 85
Field | 510 | 379 | 421 | 78

Combining Multiple Methods ...: Ten class methods

Conceptual Distance (Agirre et al. 94)
– length of the shortest path
– specificity of the concepts

$$\mathrm{dist}(w_1, w_2) = \min_{c_{1i} \in w_1,\; c_{2j} \in w_2} \; \sum_{c_k \in \mathrm{path}(c_{1i}, c_{2j})} \frac{1}{\mathrm{depth}(c_k)}$$

where $c_{1i}$ and $c_{2j}$ range over the concepts (senses) of $w_1$ and $w_2$; computed using WordNet and a bilingual dictionary
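The conceptual distance formula can be computed directly on a tree. The Python sketch below assumes a single-parent toy hierarchy loosely based on the abbey example in these slides. Real WordNet allows several hypernyms, and these depths are illustrative only. The root has depth 1, and the path runs from each concept up to their lowest common ancestor and down again.

```python
from typing import Dict, List

# Toy single-parent hierarchy (child -> parent); "entity" is the root.
PARENT: Dict[str, str] = {
    "object": "entity", "artifact": "object", "structure": "artifact",
    "building": "structure", "place_of_worship": "building",
    "church": "place_of_worship", "house": "building",
    "religious_residence": "house", "monastery": "religious_residence",
    "convent": "religious_residence", "abbey#church": "church",
    "abbey#monastery": "monastery", "abbey#convent": "convent",
}
# Concepts (senses) of each word.
SENSES: Dict[str, List[str]] = {
    "abbey": ["abbey#church", "abbey#monastery", "abbey#convent"],
    "church": ["church"],
    "monastery": ["monastery"],
}


def chain(c: str) -> List[str]:
    """`c` followed by all its ancestors up to the root."""
    out = [c]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def depth(c: str) -> int:
    return len(chain(c))  # the root has depth 1


def path(c1: str, c2: str) -> List[str]:
    """Concepts on the tree path from c1 to c2, both ends included."""
    up1, up2 = chain(c1), chain(c2)
    lca = next(c for c in up1 if c in up2)  # lowest common ancestor
    return up1[: up1.index(lca) + 1] + up2[: up2.index(lca)][::-1]


def dist(w1: str, w2: str) -> float:
    """Minimum over sense pairs of the sum of 1/depth along the path."""
    return min(sum(1 / depth(c) for c in path(c1, c2))
               for c1 in SENSES[w1] for c2 in SENSES[w2])


print(round(dist("abbey", "church"), 4))     # 0.2679
print(round(dist("abbey", "monastery"), 4))  # 0.2361
```

Because each node contributes 1/depth, two concepts that meet deep in the hierarchy come out closer than two that meet near the root. This is how the specificity of the concepts enters the measure.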

Combining Multiple Methods ...: Conceptual Distance methods

Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations

Combining Multiple Methods ...: Conceptual Distance methods

[Figure: WordNet fragment from <entity> through <object, ...>, <artifact, artefact>, <structure, construction> and <building, edifice>, showing three <abbey> synsets placed under <church, church building>, <monastery> and <convent>; other nodes shown are <place of worship, ...>, <house, lodging> and <religious residence, cloister>]

abadía_1_2: Iglesia o monasterio regido por un abad o abadesa
(abbey: a church or a monastery ruled by an abbot or an abbess)

Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)

[Figure: the same WordNet fragment and dictionary entry for abadía_1_2, with one <abbey> synset annotated 06 ARTIFACT]
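For CD2, the abadía example pairs the headword translation (abbey) with the translations of its genus terms: iglesia, monasterio (church, monastery). The Python sketch below assumes that each candidate sense is scored by its smallest conceptual distance to any genus sense, and that lower is better. The compact tree is hypothetical, so the ranking it produces illustrates the mechanics only. It is not the result reported in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical single-parent tree (child -> parent); "building" is the root.
PARENT: Dict[str, str] = {
    "place_of_worship": "building", "church": "place_of_worship",
    "house": "building", "religious_residence": "house",
    "monastery": "religious_residence", "convent": "religious_residence",
    "abbey#church": "church", "abbey#monastery": "monastery",
    "abbey#convent": "convent",
}


def chain(c: str) -> List[str]:
    """`c` followed by its ancestors up to the root."""
    out = [c]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def cd(c1: str, c2: str) -> float:
    """Sum of 1/depth over the tree path between two concepts (root depth 1)."""
    up1, up2 = chain(c1), chain(c2)
    lca = next(c for c in up1 if c in up2)
    nodes = up1[: up1.index(lca) + 1] + up2[: up2.index(lca)]
    return sum(1 / len(chain(c)) for c in nodes)


def rank_senses(headword_senses: List[str],
                genus_senses: List[str]) -> List[Tuple[str, float]]:
    """Order headword senses by their closest genus sense (smallest first)."""
    scores = {s: min(cd(s, g) for g in genus_senses) for s in headword_senses}
    return sorted(scores.items(), key=lambda kv: kv[1])


ranking = rank_senses(["abbey#church", "abbey#monastery", "abbey#convent"],
                      ["church", "monastery"])
for sense, score in ranking:
    print(f"{sense}: {score:.3f}")
```

The genus in this entry is disjunctive ("iglesia o monasterio"). A real selection rule might therefore keep every sense that is close to either genus term, rather than only the top-ranked one.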

Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)

Results

Criterion | #links | #synsets | #words | %ok
CD-1 | 23,828 | 11,269 | 7,283 | 56
CD-2 | 24,739 | 12,709 | 10,300 | 61
CD-3 | 4,567 | 3,089 | 2,313 | 75

Combining Multiple Methods ...: Three CD methods

Results

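method1 | measure | cd2 | cd3 | p1 | p2 | p3 | p4   (columns: method2)
cd1 | size | 15736 | 1849 | 2076 | 556 | 3146 | 15105
cd1 | %ok | 79 | 85 | 86 | 86 | 72 | 64
cd2 | size | 0 | 2401 | 2536 | 592 | 3777 | 13246
cd2 | %ok | 0 | 86 | 88 | 86 | 75 | 67
cd3 | size | 0 | 0 | 205 | 180 | 215 | 3114
cd3 | %ok | 0 | 0 | 95 | 95 | 100 | 77
p1 | size | 0 | 0 | 0 | 0 | 77 | 178
p1 | %ok | 0 | 0 | 0 | 0 | 100 | 88
p2 | size | 0 | 0 | 0 | 0 | 28 | 78
p2 | %ok | 0 | 0 | 0 | 0 | 77 | 96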
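The table suggests that links proposed by two methods at once tend to be more precise than those of either method alone, at the cost of far fewer links. For instance, cd3 with p1 reaches 95%, against 75% and 80% for each method separately. The Python sketch below shows that intersection, with precision measured against a manually validated set. All link ids and the gold set are hypothetical.

```python
from typing import Set


def precision(links: Set[str], gold: Set[str]) -> float:
    """Percentage of links found correct in the manually validated set."""
    return round(100 * len(links & gold) / len(links), 2)


# Hypothetical link ids (each standing for a word-synset pair).
GOLD = {"l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"}
METHOD_A = {"l1", "l2", "l3", "l4", "l5", "l6", "l9", "l10"}
METHOD_B = {"l1", "l2", "l3", "l7", "l8", "l11", "l12"}

both = METHOD_A & METHOD_B  # links proposed by both methods
for name, links in [("A", METHOD_A), ("B", METHOD_B), ("A & B", both)]:
    print(f"{name}: size={len(links)} %ok={precision(links, GOLD)}")
```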

Combining Multiple Methods ...: Combining methods

WNs | #links | #synsets | #words | #CS | #poly links
SpWN v0.0 | 10,982 | 7,131 | 8,396 | 87.4 | 1,777
Combination | 7,244 | 5,852 | 3,939 | 85.6 | 2,075
SpWN v0.1 | 15,535 | 10,786 | 9,986 | 86.4 | 3,373

Combining Multiple Methods ...: Resulting Spanish WordNets

Mapping Conceptual Hierarchies Using Relaxation Labelling


– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting

[Figure: concept nodes C1 to C6]


Connecting already existing hierarchies
– Relaxation labelling algorithm
– Constraints

Between
– a Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
– WordNet

using a bilingual MRD

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting

[Figure: Spanish taxonomy nodes animal, ave (bird), rapaz (bird of prey) and faisán (pheasant), each with candidate WordNet synsets prefixed by their WordNet semantic file: (Tops <animal, animate_being, ...>), (person <beast, brute, ...>), (person <dunce, blockhead, ...>); (animal <bird>), (artifact <bird, shuttle, ...>), (food <fowl, poultry, ...>), (person <dame, doll, ...>); (animal <pheasant>), (food <pheasant>)]


– Iterative algorithm for function optimization based on local information

– It can deal with any kind of constraints

– Variables: senses of the taxonomy; labels: synsets

– Finds a weight assignment for each possible label of each variable such that
  – the weights for the labels of the same variable add up to one
  – the weight assignment satisfies, to the maximum possible extent, the set of constraints

Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm

1) Start with a random weight assignment.

2) Compute the support value for each label of each variable (according to the constraints).

3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels.

4) If a stopping/convergence criterion is satisfied, stop; otherwise, go to step 2.
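The four relaxation labelling steps can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it makes the following choices:

- It starts from uniform weights instead of random ones, so runs are reproducible.
- It uses the common multiplicative update w ← w·(1 + support) followed by renormalisation.
- It plugs in a single hypothetical constraint, loosely modelled on IIB: a label is supported when the labels of the Spanish parent or children are its immediate WordNet hypernym or hyponyms.

The taxonomy, candidate synsets and hypernym links are toy data inspired by the ave/faisán example. They are simplified and are not WordNet's actual links.

```python
from typing import Callable, Dict, List

Weights = Dict[str, Dict[str, float]]  # variable -> label -> weight


def relax(labels: Dict[str, List[str]],
          support: Callable[[str, str, Weights], float],
          max_iter: int = 200, tol: float = 1e-6) -> Weights:
    """Relaxation labelling; `support` must return values in [0, 1]."""
    # Step 1: initial assignment (uniform here instead of random).
    w: Weights = {v: {lab: 1 / len(ls) for lab in ls} for v, ls in labels.items()}
    for _ in range(max_iter):
        new_w: Weights = {}
        for v, ls in labels.items():
            # Step 2: support of each label given the current weights.
            s = {lab: support(v, lab, w) for lab in ls}
            # Step 3: boost supported labels, then renormalise to sum to one.
            raw = {lab: w[v][lab] * (1 + s[lab]) for lab in ls}
            total = sum(raw.values())
            new_w[v] = {lab: x / total for lab, x in raw.items()}
        # Step 4: stop when no weight changes by more than `tol`.
        change = max(abs(new_w[v][lab] - w[v][lab])
                     for v in labels for lab in labels[v])
        w = new_w
        if change < tol:
            break
    return w


# Toy data: Spanish taxonomy (child -> parent), simplified WordNet hypernyms,
# and candidate synsets named "<word>.<semantic file>".
ES_PARENT = {"ave": "animal", "faisán": "ave"}
WN_HYPERNYM = {"bird.animal": "animal.Tops",
               "pheasant.animal": "bird.animal",
               "pheasant.food": "fowl.food"}
CANDIDATES = {
    "animal": ["animal.Tops", "beast.person", "dunce.person"],
    "ave": ["bird.animal", "bird.artifact", "fowl.food", "dame.person"],
    "faisán": ["pheasant.animal", "pheasant.food"],
}


def iib_support(v: str, lab: str, w: Weights) -> float:
    """Average agreement with the Spanish parent and children (in [0, 1])."""
    score, neighbours = 0.0, 0
    parent = ES_PARENT.get(v)
    if parent is not None:  # parent's label should be lab's WN hypernym
        neighbours += 1
        score += w[parent].get(WN_HYPERNYM.get(lab, ""), 0.0)
    for child, p in ES_PARENT.items():
        if p == v:  # child's labels should have lab as their WN hypernym
            neighbours += 1
            score += sum(x for m, x in w[child].items()
                         if WN_HYPERNYM.get(m) == lab)
    return score / neighbours if neighbours else 0.0


weights = relax(CANDIDATES, iib_support)
best = {v: max(ws, key=ws.get) for v, ws in weights.items()}
print(best)
```

Keeping the support in [0, 1] guarantees that every factor (1 + support) is positive. As a result, the weights of each variable stay a valid distribution after renormalisation.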

Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm


– Rely on the taxonomy structure
– Coded with three characters:
  1st: Spanish taxonomy, I (immediate) or A (ancestor)
  2nd: English taxonomy, I (immediate) or A (ancestor)
  3rd: relation, E (hypernym), O (hyponym) or B (both)

– Examples:

Mapping Conceptual Hierarchies using Relaxation Labelling: Constraints

[Figure: examples of the IIE and AAB constraints]
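The three-character codes can be read as a predicate over two mapped pairs: a Spanish sense x labelled with synset y, and a neighbouring Spanish sense x2 labelled with y2. The Python sketch below rests on these assumptions:

- A covers any proper ancestor, the immediate one included.
- B accepts the constraint when it holds either as E or as O, in the same direction on both sides.

These readings, the function names and the toy taxonomies are illustrative only.

```python
from typing import Dict, List, Tuple

Taxonomy = Dict[str, str]  # child -> parent


def ancestors(node: str, parent: Taxonomy) -> List[str]:
    """Proper ancestors of `node`, nearest first."""
    out: List[str] = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def above(kind: str, low: str, high: str, parent: Taxonomy) -> bool:
    """Is `high` the immediate (I) or any (A) ancestor of `low`?"""
    anc = ancestors(low, parent)
    return (anc[:1] == [high]) if kind == "I" else (high in anc)


def holds(code: str, es_pair: Tuple[str, str], en_pair: Tuple[str, str],
          es_tax: Taxonomy, en_tax: Taxonomy) -> bool:
    """Check a code such as 'IIE' for x->y (first items) and x2->y2 (second)."""
    es_kind, en_kind, rel = code
    if rel == "B":  # assumed: same direction on both sides, either E or O
        return (holds(es_kind + en_kind + "E", es_pair, en_pair, es_tax, en_tax)
                or holds(es_kind + en_kind + "O", es_pair, en_pair, es_tax, en_tax))
    (x, x2), (y, y2) = es_pair, en_pair
    if rel == "O":  # x2/y2 are hyponyms: swap roles and test "above" again
        x, x2, y, y2 = x2, x, y2, y
    return above(es_kind, x, x2, es_tax) and above(en_kind, y, y2, en_tax)


# Toy taxonomies (hypothetical).
ES = {"faisán": "ave", "ave": "animal"}
EN = {"pheasant": "bird", "bird": "vertebrate", "vertebrate": "animal"}

print(holds("IIE", ("faisán", "ave"), ("pheasant", "bird"), ES, EN))       # True
print(holds("IIE", ("faisán", "animal"), ("pheasant", "animal"), ES, EN))  # False
print(holds("AAE", ("faisán", "animal"), ("pheasant", "animal"), ES, EN))  # True
```

In the relaxation labelling setting, a constraint that holds for a neighbour's current label adds support to the label under consideration.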

– II Constraints

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints

NAACL’2001

[Figure: the II constraints IIE, IIB and IIO]

– AI Constraints

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints

[Figure: the AI constraints AIE, AIB and AIO]

– IA Constraints

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints

[Figure: the IA constraints IAE, IAB and IAO]

– AA Constraints

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints

[Figure: the AA constraints AAE, AAB and AAO]


– Four monosemic criteria

[Figure: four link patterns between Spanish words (SW), English words (EW) and WordNet synsets]

                           Prec.   Cov.
Monosemic criterion 1       92%     5%
Monosemic criterion 2       89%     1%
Monosemic criterion 3       89%     2%
Monosemic criterion 4       85%     4%

Combining Multiple Methods ... RANLP’97: Eight class methods

– Four polysemic criteria

[Figure: four link patterns between Spanish words (SW), English words (EW) and WordNet synsets (Synset+)]

                           Prec.   Cov.
Polysemic criterion 1       80%     8%
Polysemic criterion 2       75%     2%
Polysemic criterion 3       58%    17%
Polysemic criterion 4       61%    60%

Combining Multiple Methods ... RANLP’97: Eight class methods

Poly             TOK, FOK     TOK, FNOK    total
animal           279 (90%)    30 (91%)     309 (90%)
food             166 (94%)     3 (100%)    169 (94%)
cognition        198 (67%)    27 (90%)     225 (69%)
communication    533 (77%)    40 (97%)     573 (78%)

All              TOK, FOK     TOK, FNOK    total
animal           424 (93%)    62 (95%)     486 (90%)
food             166 (94%)    83 (100%)    249 (96%)
cognition        200 (67%)   245 (90%)     445 (82%)
communication    536 (77%)   234 (97%)     760 (81%)

Combining Multiple Methods ... RANLP’97: Experiments & Results

[Figure: Spanish words piel, visón and marta linked to the synsets (substance <skin, fur, peel>), (substance <sable, marte, coal_back>) and (substance <mink, mink_coat>)]

– Setting
– Relaxation Labelling Algorithm
– Constraints
– Experiments & Results I (multilingual)
– Experiments & Results II (monolingual)
– Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Generalized Constraints

All relationships
– also-see, similar-to, attribute, antonym, etc.

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Generalized Constraints

Non-structural constraints
– W: number of word coincidences
– G: number of word coincidences in glosses
– F: number of frame coincidences (verbs)
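A minimal Python sketch of how the three non-structural counts could be computed for a candidate WN1.5 to WN1.6 synset pair. It assumes a synset is a set of words, a gloss string and a set of verb frame ids; the `Synset` class, the frame numbers in the example and the raw, unweighted counts are assumptions, since the slides do not say how W, G and F are weighted into constraints.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Synset:
    """Hypothetical minimal synset record used only for this sketch."""
    words: FrozenSet[str]
    gloss: str
    frames: FrozenSet[int] = frozenset()  # verb frame ids (empty for non-verbs)


def gloss_tokens(gloss: str) -> Set[str]:
    """Lower-cased alphabetic tokens of a gloss (no stop-word filtering)."""
    return set(re.findall(r"[a-z]+", gloss.lower()))


def non_structural(old: Synset, new: Synset) -> Tuple[int, int, int]:
    """Return the (W, G, F) coincidence counts for an old/new synset pair."""
    w = len(old.words & new.words)                             # shared variants
    g = len(gloss_tokens(old.gloss) & gloss_tokens(new.gloss))  # shared gloss words
    f = len(old.frames & new.frames)                           # shared verb frames
    return w, g, f


old = Synset(frozenset({"eat"}), "take in solid food", frozenset({2, 8}))
new = Synset(frozenset({"eat"}), "take in solid food; eat a meal", frozenset({8}))
print(non_structural(old, new))
```

In practice such counts would be turned into constraint weights alongside the structural ones; counting raw token overlap, as done here, also rewards function words such as "in", which a real system would probably filter.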

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: POS mapping dependencies

[Figure: Nouns, Adjectives, Verbs, Adverbs]

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Verbs

Structural constraints
– hyper/hyponymy
– antonymy
– also-see

Non-structural constraints
– W, G and F

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Adjectives

Structural constraints
– Adj-to-Adj: antonymy, similar-to and also-see
– Adj-to-Verb: participle-of
– Adj-to-Noun: pertains-to and attribute

Non-structural constraints
– W and G

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Constraints for Adverbs

Structural constraints
– Adv-to-Adv: antonymy
– Adv-to-Adj: derived

Non-structural constraints
– W and G

A Complete... ACL’00, NAACL’01: Example extra-POS

[Figure: WN1.5 vs WN1.6 extra-POS links. Synsets: 02025107a {evangelical, evangelistic}, 04237485n {Gospel, Gospels, evangel}, 00843344a {evangelical, evangelistic}, 02025107a {evangelical}, 04853575n {Gospel, Gospels, evangel}, 00842521a {enthusiastic}. Links: pertainym, pertainym, similar to]

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Example extra-POS

[Figure: WN1.5 vs WN1.6 extra-POS links. Synsets: 00057615r {impossibly, absurdly}, 01393725a {impossible}, 00294844r {impossibly}, 01752468a {impossible}, 00294658a {possibly}. Links: derived from, derived from, antonym]

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results

Basic constraint set: structural constraints
– Nouns: AA hyper/hyponym
– Verbs: AA hyper/hyponym, II also-see
– Adjectives: II antonymy, similar-to, also-see
– Adverbs: II antonymy

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results

Basic constraint set: structural constraints

Coverage, Ambiguous and Overall results for N, V, A, R (Ambiguous and Overall as precision - recall)
Coverage: 80.8%, 94.1%, 96.9%, 99.7%
Precision - recall: 94.9% - 99.6%, 97.6% - 99.8%, 82.8% - 98.9%, 97.5% - 100%, 93.5% - 99.2%, 94.6% - 99.2%, 89.5% - 99.4%, 99.0% - 100%

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results

Basic constraint set + W, G and F for verbs

Coverage, Ambiguous and Overall results for N, V, A, R (Ambiguous and Overall as precision - recall)
Coverage: 99.5%, 98.9%, 99.8%, 99.9%
Precision - recall: 97.5% - 97.7%, 98.8% - 98.9%, 96.5% - 98.8%, 97.5% - 100%, 99.4% - 99.7%, 99.3% - 99.6%, 97.9% - 99.3%, 99.0% - 100%

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results

Basic + extra-POS relationships

Coverage, Ambiguous and Overall results for N, V, A, R (Ambiguous and Overall as precision - recall)
Coverage: 88.0%, 95.8%, --, --
Precision - recall: 95.8% - 98.9%, 69.2% - 94.2%, --, 90.9% - 99.4%, 97.9% - 98.1%

A Complete WN1.5 to WN1.6 Mapping ... ACL’00, NAACL’01: Results

Basic + extra-POS relationships + WGF

Coverage, Ambiguous and Overall results for N, V, A, R (Ambiguous and Overall as precision - recall)
Coverage: 99.6%, 99.0%, 99.8%, 99.9%
Precision - recall: 97.5% - 97.7%, 98.8% - 98.9%, 96.5% - 99.1%, 98.3% - 100%, 99.4% - 99.7%, 99.3% - 99.6%, 97.9% - 99.5%, 99.3% - 100%

– First complete mapping between WordNet versions
– Combining structural and non-structural information
– Robust approach based on local information, but with global effects
– Incremental POS approach
– http://www.lsi.upc.es/~nlp
– 90 downloads (since November 2000)

Mapping Conceptual Hierarchies using Relaxation Labelling: Conclusions
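The "local information, global effects" property comes from relaxation labelling: each candidate label is updated only from the labels it is linked to by constraints, yet repeated updates spread evidence across the whole hierarchy. The Python sketch below uses a commonly used update rule, p(t+1) = p(t)(1 + S) / Σ p(t)(1 + S), where S is the weighted support a label receives from compatible labels. The data structures, the clamping of S to [-1, 1] and the toy labels dog_1, dog_2 and mammal_1 are assumptions, not the authors' exact formulation.

```python
from typing import Dict, List, Tuple

# variable (e.g. a Spanish or WN1.5 synset) -> {candidate label: weight}
Weights = Dict[str, Dict[str, float]]
# (variable, label, other variable, other label, compatibility weight)
Constraint = Tuple[str, str, str, str, float]


def relax(init: Weights, constraints: List[Constraint],
          max_iter: int = 100, eps: float = 1e-6) -> Weights:
    """Iterate the relaxation-labelling update until the weights stabilise."""
    p = {v: dict(labels) for v, labels in init.items()}  # do not mutate input
    for _ in range(max_iter):
        # Support S of each label: weighted sum of the current weights of the
        # labels it is compatible (w > 0) or incompatible (w < 0) with.
        support = {v: {lab: 0.0 for lab in labels} for v, labels in p.items()}
        for v, lab, u, other, w in constraints:
            support[v][lab] += w * p[u][other]
        new: Weights = {}
        for v, labels in p.items():
            # Clamp S so that 1 + S never becomes negative (a simplification).
            raw = {lab: q * (1.0 + max(-1.0, min(1.0, support[v][lab])))
                   for lab, q in labels.items()}
            z = sum(raw.values())
            new[v] = {lab: x / z for lab, x in raw.items()} if z > 0 else dict(labels)
        delta = max(abs(new[v][lab] - p[v][lab]) for v in p for lab in p[v])
        p = new
        if delta < eps:
            break
    return p


init = {"perro": {"dog_1": 0.5, "dog_2": 0.5}, "mamífero": {"mammal_1": 1.0}}
result = relax(init, [("perro", "dog_1", "mamífero", "mammal_1", 0.5)])
print(result["perro"])
```

The only evidence here is a single constraint saying that dog_1 is compatible with the already certain mapping of mamífero; that local support is enough for dog_1 to absorb nearly all of the weight after a few dozen iterations.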

– Mapping other structures
  WN-EDR, WN-LDOCE, etc.
  Other language taxonomies to EuroWordNet
– SpanishEWN to WN1.6
– Symmetrical philosophy rather than source-target

Mapping Conceptual Hierarchies using Relaxation Labelling: Further Work


Mikrokosmos

Mikrokosmos: Outline

• Introduction
• Representational Issues
  • The Lexicon
  • The Ontology
• Acquisition Process
  • Lexicon Acquisition
  • Guidelines
  • Ontology/Lexicon Trade-off
• Semantics in Action

Mikrokosmos: Introduction

• Knowledge-Based Machine Translation (KBMT)
• CRL, NMSU
• 5,000 concepts
  • Events
  • Objects
  • Properties
• 7,000 Spanish word senses
• 40,000 word senses
  • after expansion with productive lexical rules
  • comprar (to buy) -> comprador (buyer), comprable (purchasable), ...
• Text Meaning Representation
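The expansion from 7,000 to 40,000 senses relies on productive lexical rules such as comprar -> comprador, comprable. A minimal Python sketch of the morphological side of such rules, assuming three hypothetical suffix rules keyed on the Spanish infinitive ending; real Mikrokosmos lexical rules also build the semantics of each derived entry and must block over-generation, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified rules: infinitive ending -> (agent-noun suffix, -ble adjective suffix)
RULES: Dict[str, Tuple[str, str]] = {
    "ar": ("ador", "able"),
    "er": ("edor", "ible"),
    "ir": ("idor", "ible"),
}


def apply_lexical_rules(infinitive: str) -> List[str]:
    """Derive the agent noun and the -ble adjective of a Spanish infinitive."""
    ending = infinitive[-2:]
    if ending not in RULES:
        return []  # not an infinitive covered by these rules
    stem = infinitive[:-2]
    agent_suffix, able_suffix = RULES[ending]
    return [stem + agent_suffix, stem + able_suffix]


for verb in ("comprar", "vender", "adquirir"):
    print(verb, "->", apply_lexical_rules(verb))
```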

Mikrokosmos: Representational Issues: The Lexicon

• Typed Feature Structures (Pollard and Sag 87)
• language-dependent
• 10 zones
  • phonology
  • orthography
  • morphology
  • syntactic (subcategorization)
  • semantic (Lexical Semantic Representation)
  • syntax-semantics linking
  • stylistics
  • paradigmatic
  • syntagmatic

Mikrokosmos: Representational Issues: The Lexicon

Adquirir-V1
  syn: subj: cat: NP
       obj:  cat: NP
  sem: acquire
       agent: HUMAN
       theme: OBJECT

Adquirir-V2
  syn: subj: cat: NP
       obj:  cat: NP
  sem: acquire
       agent: HUMAN
       theme: INFORMATION

Mikrokosmos: Representational Issues: The Ontology

• Taxonomic, multi-hierarchical
• 14 local or inherited links on average
• language-impartial
• EVENTS, OBJECTS, PROPERTIES
• Methodology & Guidelines

Mikrokosmos: Representational Issues: The Ontology

• ACQUIRE
  DEFINITION    "The transfer of possession event where the agent transfers an object to its possession"
  IS-A          TRANSFER-POSSESSION
  SOURCE        HUMAN, PLACE
  THEME         OBJECT (NOT HUMAN)
  AGENT         ANIMAL (DEFAULT HUMAN)
  DESTINATION   ANIMAL, PLACE (DEFAULT HUMAN)
  INHERITED
  BENEFICIARY   HUMAN

Mikrokosmos: Acquisition Process: The Lexicon

• Multilingual
  • French, English, Japanese, Russian, Spanish, etc.
• Multimedia
• Multi-process
  • Analysis
  • Generation (mono- and multilingual)
  • MT
  • Summarization
  • IE
  • Speech Processing
• Tools
  • corpus search, dictionary lookup, ontology browser

Mikrokosmos: Acquisition Process: The Ontology

• Guidelines
  1) Do not add instances as concepts
     • Instances do not have their own instances
     • Concepts do not have a fixed position in space/time
  2) Do not decompose concepts further
  3) Use close concepts
  4) Do not add EVENTs with particular arguments
  5) Do not add concepts with instance-specific aspects or temporal relations
  6) Do not add language-specific concepts
  7) Do not add ontological concepts for collections

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off

• Daily negotiations between
  • lexicon acquirers
  • ontology acquirers
• Possibilities
  • one-to-one mapping
  • lexicon underspecification
  • lexicon-ontology balance

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off

• One-to-one mapping
  • Problems
    • Lexical: every word in a language is a concept
    • Conceptual: cuire in French is not ambiguous

PREPARE-FOOD   INST: COOKING-EQUIPMENT
COOK           INST: STOVE
BAKE           INST: OVEN

cook : cuire sur le feu ("cook on the fire")    bake : cuire au four ("cook in the oven")

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off

• Lexicon underspecification
  • Problems
    • BAKE is not in the ontology

PREPARE-FOOD   INST: COOKING-EQUIPMENT

cook : cuire sur le feu    bake : cuire au four, INST: OVEN

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off

• Lexicon-ontology balance

PREPARE-FOOD   INST: COOKING-EQUIPMENT
FRY            INST: STOVE, INST: FRYING-PAN
BAKE           INST: OVEN

cook : cuire
bake

Mikrokosmos: Semantics in Action

• El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu.
  (The Roche group, through its company in Spain, acquired Doctor Andreu.)
• El grupo Roche adquirió Doctor Andreu a través de su compañía en España.
  (The Roche group acquired Doctor Andreu through its company in Spain.)
• La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España.
  (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)

ACQUIRE-1
  Agent: ORGANIZATION-1
  Theme: ORGANIZATION-2
  Instrument: ORGANIZATION-3
ORGANIZATION-1
  Object-Name: Grupo Roche
ORGANIZATION-2
  Object-Name: Doctor Andreu
ORGANIZATION-3
  Location: España

Mikrokosmos: Semantics in Action

• Onto-Search: ontological search mechanism to check constraints
  • check-onto(ACQUIRE, EVENT) = 1
    • since ACQUIRE is a type of EVENT
  • check-onto(ORGANIZATION, HUMAN) = 0.9
    • since ORGANIZATION HAS-MEMBER HUMAN
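A minimal Python sketch of how an Onto-Search style score could be computed: search the ontology from the first concept toward the second, multiply a weight for each link traversed, and keep the best path. The toy ontology fragment, the intermediate concepts (TRANSFER-POSSESSION IS-A EVENT, SOCIAL-OBJECT) and the link weights (IS-A = 1.0, HAS-MEMBER = 0.9) are assumptions chosen to reproduce the two scores above, not the actual Mikrokosmos ontology or scoring; `check_onto` is a hypothetical counterpart of check-onto.

```python
from typing import Dict, List, Tuple

# Hypothetical toy fragment: concept -> list of (relation, target concept).
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "ACQUIRE": [("IS-A", "TRANSFER-POSSESSION")],
    "TRANSFER-POSSESSION": [("IS-A", "EVENT")],
    "ORGANIZATION": [("IS-A", "SOCIAL-OBJECT"), ("HAS-MEMBER", "HUMAN")],
    "HUMAN": [("IS-A", "ANIMAL")],
}
# Assumed per-link weights: IS-A links cost nothing, HAS-MEMBER costs 10%.
LINK_WEIGHT: Dict[str, float] = {"IS-A": 1.0, "HAS-MEMBER": 0.9}


def check_onto(concept: str, constraint: str) -> float:
    """Best product of link weights on any path from concept to constraint (0 if none)."""
    best = 0.0
    best_seen: Dict[str, float] = {}
    stack = [(concept, 1.0)]
    while stack:
        node, score = stack.pop()
        if node == constraint:
            best = max(best, score)
            continue
        if best_seen.get(node, -1.0) >= score:
            continue  # already explored with an equal or better score (handles cycles)
        best_seen[node] = score
        for rel, target in ONTOLOGY.get(node, []):
            weight = LINK_WEIGHT.get(rel, 0.0)
            if weight > 0.0:
                stack.append((target, score * weight))
    return best


print(check_onto("ACQUIRE", "EVENT"), check_onto("ORGANIZATION", "HUMAN"))
```

A score of 0 means the constraint cannot be satisfied at all, which is how a reading such as adquirir as LEARN would be rejected when the filler (Doctor Andreu) has no path to INFORMATION.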

Mikrokosmos: Semantics in Action

1) a-través-de (through): INSTRUMENT, LOCATION; adquirir requires PHYSICAL-OBJECT
2) en (in): LOCATION, TEMPORAL; España is not a TEMPORAL-OBJECT
3) adquirir (acquire): ACQUIRE, LEARN; Doctor Andreu is not an INFORMATION
4) Doctor Andreu: ORGANIZATION, HUMAN; the Theme of ACQUIRE is not HUMAN
5) compañía (company): CORPORATION, SOCIAL-EVENT; ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts

Mikrokosmos: Experiment: WSD

Text               1      2      3      4      Mean
words              347    385    370    353    364
words/sentence     16.5   24.0   26.4   20.8   21.4
open-class words   183    167    177    177    176
ambiguous words    57     42     57     35     48
syntax             21     19     20     12     18
correct            51     41     45     34     43
%                  97     99     93     99     97

Mikrokosmos: Experiment: WSD

Text               Mean   Mean Unseen
words              364    390
words/sentence     21.4   26
open-class words   176    104
ambiguous words    48     26
syntax             18     9
correct            43     23
%                  97     97


WordNet2

WordNet2: Outline

• Introduction
• Text Inferences
• Defining Features
• Plausible inferences
• Inference Rules
• Semantic Paths
• What WordNet cannot do

WordNet2: Introduction

• (Harabagiu 98)
• Commonsense reasoning requires extensive knowledge
  • ~100 million concepts and relations
• WordNet
  • represents almost all English words
  • 100,000 synsets
  • linked by semantic relations
• WordNet2
  • each synset has a gloss that, when disambiguated, may increase the number of relations
  • transform WordNet glosses into semantic networks
  • NEW RELATIONS

WordNet2: Text Inferences

German was hungry. He opened the refrigerator.

• hungry (feeling a need or desire to eat)
• eat (take in solid food)
• refrigerator (an appliance in which foods can be stored at low temperature)

WordNet2: Defining Features

• Transform each concept's gloss into a graph where concepts are nodes and lexical relations are links
• <culture> (all the knowledge shared by society)
  <share> --AGENT--> <society>
• <doctor> (licensed medical practitioner)
  <medical practitioner> --ATTRIBUTE--> <licensed>

WordNet2: Defining Features

[Figure: gloss graph for <pilot> with nodes pilot, person, qualified, guide, ship, water and difficult, connected by GLOSS, ATTRIBUTE, PURPOSE, LOCATION, ATTRIBUTE and OBJECT links]

WordNet2: Inference Rules

Rule 1:
  VC1 IS-A VC2
  VC2 IS-A VC3
  ------------
  VC1 IS-A VC3

Rule 2:
  VC1 IS-A VC2
  VC2 ENTAIL VC3
  --------------
  VC1 ENTAIL VC3

Rule 3:
  VC1 IS-A VC2
  VC2 R_IS-A VC3
  -----------------------
  VC1 PLAUSIBLE (not VC3)

Rule 4:
  VC1 IS-A VC2
  VC2 R_ENTAIL VC3
  ----------------
  VC1 EXPLAINS VC3

• 16 + 1 rules
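Rules 1 and 2 are ordinary forward-chaining rules over relation triples. A minimal Python sketch that applies them until no new triple appears; the triple format and the toy facts (devour, eat, consume, swallow) are assumptions, and Rules 3 and 4 are left out because the slides do not define the R_IS-A and R_ENTAIL relations.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (concept, relation, concept)


def forward_chain(facts: Set[Triple]) -> Set[Triple]:
    """Apply Rule 1 (IS-A transitivity) and Rule 2 (IS-A + ENTAIL) to a fixpoint."""
    kb = set(facts)
    while True:
        derived = set()
        for a, rel1, b in kb:
            if rel1 != "IS-A":
                continue
            for b2, rel2, c in kb:
                # Rule 1: a IS-A b, b IS-A c   => a IS-A c
                # Rule 2: a IS-A b, b ENTAIL c => a ENTAIL c
                if b2 == b and rel2 in ("IS-A", "ENTAIL"):
                    derived.add((a, rel2, c))
        if derived <= kb:  # nothing new: fixpoint reached
            return kb
        kb |= derived


facts = {("devour", "IS-A", "eat"), ("eat", "IS-A", "consume"), ("eat", "ENTAIL", "swallow")}
for triple in sorted(forward_chain(facts) - facts):
    print(triple)
```

The nested loop is quadratic in the number of triples per round, which is fine for a sketch; a knowledge base of WordNet size would index triples by their first concept instead.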

WordNet2: Semantic Paths

0) Create and load the KB
1) Place markers on KB concepts
2) Propagate markers
   The algorithm avoids cycles
3) Detect collisions
   Each marker collision corresponds to a path
4) Extract inferences
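Steps 1 to 4 can be sketched as breadth-first marker propagation from two concepts: each marker remembers where it came from, a node reached by both markers is a collision, and following the back-pointers from that node yields the semantic path. This Python sketch assumes an undirected concept graph; the toy graph linking hungry, eat, food and refrigerator is hand-built from the glosses in the Text Inferences example, and `marker_path` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # concept -> neighbouring concepts


def _trace(parents: Dict[str, Optional[str]], node: Optional[str]) -> List[str]:
    """Follow marker back-pointers from node to the marker's source."""
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def marker_path(graph: Graph, a: str, b: str) -> Optional[List[str]]:
    """Propagate markers from a and b; return the path found at the first collision."""
    if a == b:
        return [a]
    parents: Dict[str, Dict[str, Optional[str]]] = {a: {a: None}, b: {b: None}}
    queue = deque([(a, a), (b, b)])
    while queue:
        source, node = queue.popleft()
        other = b if source == a else a
        if node in parents[other]:  # collision: both markers reached node
            left = _trace(parents[a], node)[::-1]   # a ... node
            right = _trace(parents[b], node)        # node ... b
            return left + right[1:]
        for nxt in graph.get(node, []):
            if nxt not in parents[source]:  # never revisit: avoids cycles
                parents[source][nxt] = node
                queue.append((source, nxt))
    return None


g = {
    "hungry": ["eat"],
    "eat": ["hungry", "food"],
    "food": ["eat", "refrigerator"],
    "refrigerator": ["food"],
}
print(marker_path(g, "hungry", "refrigerator"))
```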

WordNet2: Semantic Paths

Inference sequence
• German was hungry
• German felt a desire to eat
• German felt a desire to take in food

COLLISION: German = he felt a desire to take in food, stored in an appliance, which he opened

• He opened an appliance where food is stored
• He opened the refrigerator

WordNet2: What WordNet cannot do

Major WordNet limitations:
1) The lack of compound concepts
2) The small number of causation and entailment relations
3) The lack of preconditions for verbs
4) The absence of case relations


ThoughtTreasure

ThoughtTreasure: Overview

• A comprehensive platform for
  • NLP (English, French)
  • commonsense reasoning
• A hotel room has a bed, night table, ...
• People have fingernails
• Soda is a drink
• One hangs up at the end of a phone call
• The sky is blue
• Dogs bark
• Someone who is 16 years old is a teenager

ThoughtTreasure: Overview

• 25,000 concepts organized into a hierarchy
  EVIAN -> FLAT-WATER -> DRINKING-WATER
• 55,000 words (English, French)
  food <-> aliment <-> FOOD
• 50,000 assertions about concepts
  green-pea is green
• 100 scripts

ThoughtTreasure: Overview

• Text agents for recognizing names, phone numbers, etc.
• Mechanisms for learning new words
  • X-phile is someone who likes X
• A syntactic parser
• An NL generator
• A semantic parser
• An anaphoric parser
• Planning agents for achieving goals
• Understanding agents

ThoughtTreasure: Example

• Who created Bugs Bunny?
  • 1.0 (create human-interrogative-pronoun Bugs-Bunny)
  • 0.9 (create rock-group-the-Who Bugs-Bunny)
• 1.0 (create Tex-Avery Bugs-Bunny)
• 0.1 (not (create rock-group-the-Who Bugs-Bunny))


Meaning

Knowledge Bases
– Automatic enrichment of EWN (verb models, etc.)
– Mixed approach (KB + ML)
– Q/A

Problem
– Structural and lexical ambiguity

Approach
– Automatically locate examples of senses (Leacock et al. 98, Mihalcea and Moldovan 99)
– Large-scale WSD (Boosting, SVM, transductive …)
– Knowledge acquisition (Ribas 95, McCarthy 01)

Meaning: Overview

Meaning

Exploiting EWN Semantic Relations

[Figure: EWN neighbourhoods of two senses of "partido". <partido_1> (match) appears with <evento> (event), <evento social> (social event) and <competición, concurso> (competition, contest), with <semifinal> (semifinal) and <cuartos_de_final> (quarter-finals) below it. <partido_2, partido_político> (political party) appears with <grupo_social> (social group), <organización> (organization) and <agrupación grupo colectivo> (grouping, group, collective), with <partido_laborista> (Labour party) below it.]

Meaning

Exploiting EWN Semantic Relations

partido 1
All the parties [partidos] call for legal reforms for TV3.
The right plans to group together into a party [partido].
The deputy reiterated that neither he nor UDC, “as a party [partido]”, had received money from Pellerols.

partido 2
But Spain brought intensity, rhythm and courage to the match [partido].
The national coach believes that today's match [partido] against Italy will show Spain's true level.
Racing has not won at home in six matches [partidos].

Meaning

Exploiting EWN Semantic Relations

partido 1
We will never negotiate with a political party [partido político] that supports the independence of Taiwan.
Once again, the diversion of funds intended for vocational training toward financing a political party [partido político] is in the news.
These laws were passed thanks to a general consensus among the political parties [partidos políticos].

partido 2
Rivera asks for the fans' support to get the semifinals [semifinales] on track.
Only Valero Ribera's team can settle a semifinal [semifinal] the way it did yesterday in a completely devoted Palau Blaugrana.
Racing won the quarter-finals [cuartos de final] at home.
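The idea behind automatically locating sense examples (Leacock et al. 98, Mihalcea and Moldovan 99) can be sketched with monosemous relatives: a sentence containing an unambiguous relative of only one sense (partido político, cuartos de final) is taken as a training example for that sense. In this Python sketch the sense names "match" and "political_party", the relative lists and the whole-phrase matching are assumptions for illustration; inflected forms such as partidos políticos are not handled.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces and pad for whole-phrase matching."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " " + " ".join(text.split()) + " "


def collect_examples(relatives: Dict[str, List[str]],
                     corpus: List[str]) -> List[Tuple[str, str]]:
    """Label a sentence with a sense when it contains relatives of that sense only."""
    labelled = []
    for sentence in corpus:
        norm = normalize(sentence)
        hits = {sense for sense, variants in relatives.items()
                if any(normalize(v.replace("_", " ")) in norm for v in variants)}
        if len(hits) == 1:  # skip sentences with no evidence or conflicting evidence
            labelled.append((hits.pop(), sentence))
    return labelled


relatives = {
    "match": ["semifinal", "cuartos_de_final"],
    "political_party": ["partido_político", "partido_laborista"],
}
corpus = [
    "El Racing ganó los cuartos de final en su campo.",
    "No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.",
    "Todos los partidos piden reformas legales para TV3.",
]
for sense, sentence in collect_examples(relatives, corpus):
    print(sense, "|", sentence)
```

The third sentence contains only the ambiguous word partidos, so it is left unlabelled; examples collected this way could then feed the large-scale WSD step.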

Meaning: Architecture

[Figure: the Multilingual Central Repository linked to the Italian, Basque, Spanish, English and Catalan EWNs and to the Basque, Italian, English, Spanish and Catalan web corpora through ACQ, UPLOAD, PORT and WSD processes]