Rogelio Nazar & Maarten Janssen, IULA, Universitat Pompeu Fabra, Barcelona

Dictionaries are a good source of information
Long tradition of taxonomy extraction
◦ Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
Exploiting Machine Readable Dictionaries
◦ Parsing definitional phrases
◦ Pattern extraction, shallow parsing
◦ Full treatment of a single dictionary

There is a lot of information available
◦ Hand-crafted, high-quality resources
Combining yields new data
◦ Taxonomy from multiple dictionaries
Language-independent shallow method
◦ Combining definitions of the same word
◦ Various dictionaries, online versions
◦ DRAE, DGLE, Clave, DEM
◦ Frequency based

Dictionaries differ
◦ Different lexicon and definitions
◦ Even if only for legal reasons
Hyperonym should be the same
◦ A cat is an animal
◦ Unless there is uncertainty in the hyperonym
Most dictionaries should use the same genus
◦ Statistically relevant

3x: ablandabrevas, persona
2x: com., inútil
1x: substantivo, común, fig.

Directly from harvested text
◦ With begin/end tags
No textual analysis
More than definitions
◦ Examples, multiple senses, etc.
Sense matching impossible
◦ Entries unsystematic
◦ Dictionaries do not match in senses
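
The harvesting step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_entry` and the marker strings are hypothetical, and the only assumption taken from the slide is that the entry text sits between a known begin tag and end tag and is used without any linguistic analysis.

```python
import re

def extract_entry(page: str, begin: str, end: str) -> str:
    """Return the plain text between the first `begin` marker and the next `end` marker.

    Hypothetical helper: the markers would be chosen per dictionary site.
    Returns an empty string if either marker is missing.
    """
    start = page.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = page.find(end, start)
    if stop == -1:
        return ""
    raw = page[start:stop]
    # Drop leftover markup and collapse whitespace; examples and multiple
    # senses inside the entry are kept, since no textual analysis is done.
    text = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(text.split())

page = '<html><div id="def"><b>cumbia</b> f. Danza popular.</div></html>'
print(extract_entry(page, '<div id="def">', "</div>"))
```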

Minimum number of dictionaries
Raw frequency count
◦ Hyperonym tends to be repeated
Candidates have to be words
◦ Of the same word-class
Use of a stop-list
◦ Dictionary generated
◦ Words that occur in more than 10% of entries
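
The selection criteria above could be sketched along these lines. This is a simplified illustration under stated assumptions, not the authors' code: the names `tokenize`, `build_stoplist` and `hypernym_candidates`, the default of three dictionaries, and the requirement that a candidate occur at least twice are all assumptions. The word-class filter is omitted because it would need a part-of-speech lexicon.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately shallow tokenizer."""
    return re.findall(r"\w+", text.lower())

def build_stoplist(entries, threshold=0.10):
    """Words that occur in more than `threshold` of all dictionary entries."""
    doc_freq = Counter()
    for text in entries:
        doc_freq.update(set(tokenize(text)))  # count each word once per entry
    return {w for w, n in doc_freq.items() if n > threshold * len(entries)}

def hypernym_candidates(headword, definitions, stoplist, min_dicts=3, top_n=5):
    """Rank words by raw frequency across the definitions of one headword.

    `definitions` holds one text per dictionary. `min_dicts` and the
    "at least twice" cut-off are assumptions made for this sketch.
    """
    if len(definitions) < min_dicts:
        return []
    counts = Counter()
    for text in definitions:
        counts.update(
            w for w in tokenize(text) if w != headword and w not in stoplist
        )
    return [(w, n) for w, n in counts.most_common(top_n) if n > 1]

definitions = [
    "danza popular de Colombia",
    "danza y música de origen africano",
    "baile popular",
]
print(hypernym_candidates("cumbia", definitions, stoplist={"de", "y"}))
```

In this toy input both "danza" and "popular" survive, which shows why the word-class restriction on the slide matters: an adjective can be as frequent as the genus term.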

# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;

WordNet (still) best available taxonomy
◦ Not the best resource for evaluation
Automatic verification
◦ 100 random nouns
◦ Best 5 hyperonymy candidates
◦ Match when candidate in chain
Only about 50% accuracy
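
The automatic verification described above might look roughly like this. It is a sketch under assumptions: the names `matches_chain` and `accuracy` are hypothetical, and a match is taken to mean that a candidate is identical to one of the entries in the word's WordNet hypernym chain. The slides do not specify the exact matching rule.

```python
def matches_chain(candidates, chain, top_n=5):
    """True if any of the best `top_n` candidates equals an entry in the chain.

    Exact (case-insensitive) matching is an assumption of this sketch.
    """
    chain_words = {w.lower() for w in chain}
    return any(c.lower() in chain_words for c in candidates[:top_n])

def accuracy(samples):
    """Fraction of (candidates, chain) pairs with at least one match."""
    if not samples:
        return 0.0
    hits = sum(matches_chain(cands, chain) for cands, chain in samples)
    return hits / len(samples)

# Two of the example words from the results, with shortened chains.
samples = [
    (["acción", "cosa", "efecto"], ["descubrimiento", "logro", "acción", "acto"]),
    (["danza"], ["cumbia", "baile regional", "danza popular", "baile"]),
]
print(accuracy(samples))
```

Under exact matching, "danza" does not match the multi-word entry "danza popular", so a plausible hyperonym is counted as a miss.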

WordNet
◦ Many intermediate/artificial levels
◦ Compulsory hyperonym
◦ Contains proper names
Dictionaries
◦ More word-senses
◦ Alternative definitions (synonymy, paraphrasis, …)
Differences
◦ Different choice of hyperonym
◦ Different lexicon

Questions?