Rogelio Nazar & Maarten Janssen, IULA, Universitat Pompeu Fabra, Barcelona

Post on 22-Jan-2016



Dictionaries are a good source of information
Long tradition of taxonomy extraction

Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)

Exploiting machine-readable dictionaries
Parsing definitional phrases
Pattern extraction, shallow parsing
Full treatment of a single dictionary

There is a lot of information available
Hand-crafted, high-quality resources

Combining yields new data
Taxonomy from multiple dictionaries

Language-independent shallow method
Combining definitions of the same word
Various dictionaries, online versions: DRAE, DGLE, Clave, DEM
Frequency-based

Dictionaries differ
◦ Different lexicon and definitions
◦ Even if only for legal reasons

Hyperonym should be the same
◦ A cat is an animal
◦ Unless there is uncertainty in the hyperonym

Most dictionaries should use the same genus
◦ Statistically relevant

3x: ablandabrevas, persona
2x: com., inútil
1x: substantivo, común, fig.
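The counting idea behind the slide above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `raw_frequency`, the regex tokenizer, and the three English definitions of "cat" are all invented for the example.

```python
from collections import Counter
import re

def raw_frequency(definitions):
    """Count how often each word occurs across all definitions of one headword."""
    counts = Counter()
    for text in definitions:
        # Naive tokenization (an assumption): lowercase runs of word characters
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

# Invented toy definitions of "cat" from three hypothetical dictionaries
definitions = [
    "small domestic animal that hunts mice",
    "domestic feline animal kept as a pet",
    "carnivorous animal with soft fur",
]

freq = raw_frequency(definitions)
top_word, top_count = freq.most_common(1)[0]
print(top_word, top_count)
```

In this toy data the word shared by all three definitions rises to the top, which is the behaviour the method relies on.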

Directly from harvested text
◦ With begin/end tags
No textual analysis
More than definitions
◦ Examples, multiple senses, etc.
Sense matching impossible
◦ Entries unsystematic
◦ Dictionaries do not match in senses

Minimum number of dictionaries
Raw frequency count
◦ Hyperonym tends to be repeated
Candidates have to be words
◦ Of the same word-class
Use of a stop-list
◦ Dictionary generated
◦ Words that occur in more than 10% of entries
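A minimal sketch of how these filters might fit together, under stated assumptions: the stop-list is built from the share of entries in which a word occurs (the 10% figure is from the slide), while `min_dictionaries=3`, `top_n=5`, the tokenizer, and all toy entries are hypothetical. The word-class check is omitted because it would need a part-of-speech resource.

```python
from collections import Counter
import re

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of word characters
    return re.findall(r"\w+", text.lower())

def build_stop_list(entries, threshold=0.10):
    """Return the words that occur in more than `threshold` of all entries."""
    entry_count = Counter()
    for text in entries:
        entry_count.update(set(tokenize(text)))  # count each word once per entry
    limit = threshold * len(entries)
    return {word for word, n in entry_count.items() if n > limit}

def candidates(definitions, stop_list, min_dictionaries=3, top_n=5):
    """Rank hyperonym candidates by raw frequency, skipping stop-list words."""
    if len(definitions) < min_dictionaries:
        return []  # too few dictionaries to trust the counts
    counts = Counter()
    for text in definitions:
        counts.update(w for w in tokenize(text) if w not in stop_list)
    return counts.most_common(top_n)

# Invented toy dictionary entries used only to derive the stop-list
entries = [
    "a kind of tree", "piece of metal", "a type of dance", "act of finding",
    "large bird", "small boat", "a musical instrument", "sour fruit",
    "cold wind", "deep valley",
]
stop_list = build_stop_list(entries)

# Invented toy definitions of "cat" from three hypothetical dictionaries
cat_definitions = [
    "a small domestic animal",
    "animal of the feline family",
    "a domestic animal kept as a pet",
]
ranked = candidates(cat_definitions, stop_list, top_n=2)
print(stop_list, ranked)
```

Deriving the stop-list from the dictionaries themselves keeps the procedure language-independent, since no hand-made list of function words is needed.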

# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;

WordNet (still) best available taxonomy
◦ Not the best resource for evaluation

Automatic verification
◦ 100 random nouns
◦ Best 5 hyperonymy candidates
◦ Match when candidate in chain

Only about 50% accuracy
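One possible reading of this verification step, as a sketch: a noun counts as a hit when any of its top five candidates appears somewhere in its hyperonym chain. The exact-string matching, the function names, and the toy chains and candidates are assumptions made for illustration; the slides do not say how matching was done.

```python
def chain_level(candidate, chain):
    """Return the chain level at which the candidate appears, or None."""
    for level, terms in enumerate(chain):
        if candidate in terms:  # exact string match (an assumption)
            return level
    return None

def accuracy(candidates_by_noun, chain_by_noun, top_n=5):
    """Share of nouns with at least one top-n candidate found in their chain."""
    if not candidates_by_noun:
        return 0.0
    hits = 0
    for noun, cands in candidates_by_noun.items():
        chain = chain_by_noun.get(noun, [])
        if any(chain_level(c, chain) is not None for c in cands[:top_n]):
            hits += 1
    return hits / len(candidates_by_noun)

# Invented toy chains: one list of terms per level, level 0 is the noun itself
chain_by_noun = {
    "cat": [["cat"], ["feline"], ["carnivore"], ["mammal"], ["animal"]],
    "hammer": [["hammer"], ["tool"], ["implement"], ["artifact"]],
}
# Invented toy candidate lists, best candidate first
candidates_by_noun = {
    "cat": ["animal", "domestic"],
    "hammer": ["handle", "head"],
}

level = chain_level("animal", chain_by_noun["cat"])
score = accuracy(candidates_by_noun, chain_by_noun)
print(level, score)
```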

WordNet
◦ Many intermediate/artificial levels
◦ Compulsory hyperonym
◦ Contains proper names

Dictionaries
◦ More word-senses
◦ Alternative definitions (synonymy, paraphrase, …)

Differences
◦ Different choice of hyperonym
◦ Different lexicon

Questions?
