Combining Contexts in Lexicon Learning for Semantic Parsing

May 25, 2007. NODALIDA 2007, Tartu, Estonia. Chris Biemann (University of Leipzig, Germany), Rainer Osswald (FernUniversität Hagen, Germany), Richard Socher (Saarland University, Germany).


Page 1: Combining Contexts in Lexicon Learning for Semantic Parsing

Combining Contexts in Lexicon Learning for Semantic Parsing

May 25, 2007

NODALIDA 2007, Tartu, Estonia

Chris Biemann, University of Leipzig, Germany

Rainer Osswald, FernUniversität Hagen, Germany

Richard Socher, Saarland University, Germany

Page 2: Combining Contexts in Lexicon Learning for Semantic Parsing

Outline

• Motivation: lexicon extension for semantic parsing

• The semantic lexicon HaGenLex

• Binary features and complex sorts

• Method: bootstrapping via syntactic contexts

• Results

• Discussion

Page 3: Combining Contexts in Lexicon Learning for Semantic Parsing

Motivation

• Semantic parsing aims at finding a semantic representation for a sentence

• Semantic parsing requires semantic features of words as a prerequisite

• Semantic features are obtained by manually creating lexicon entries (expensive in terms of time and money)

• Given a certain number of manually created lexicon entries, it might be possible to train a classifier to find more entries

• The objective is high precision; recall is secondary

Page 4: Combining Contexts in Lexicon Learning for Semantic Parsing

HaGenLex: Semantic Lexicon for German

Size: 22,700 entries, of these 13,000 nouns and 6,700 verbs

WORD            SEMANTIC CLASS (complex sort)
Aggressivität   nonment-dyn-abs-situation
Agonie          nonment-stat-abs-situation
Agrarprodukt    nat-discrete
Ägypter         human-object
Ahn             human-object
Ahndung         nonment-dyn-abs-situation
Ähnlichkeit     relation
Airbag          nonax-mov-art-discrete
Airbus          mov-nonanimate-con-potag
Airport         art-con-geogr
Ajatollah       human-object
Akademiker      human-object
Akademisierung  nonment-dyn-abs-situation
Akkordeon       nonax-mov-art-discrete
Akkreditierung  nonment-dyn-abs-situation
Akku            ax-mov-art-discrete
Akquisition     nonment-dyn-abs-situation
Akrobat         human-object
...             ...

Page 5: Combining Contexts in Lexicon Learning for Semantic Parsing

Characteristics of complex sorts in HaGenLex

In total, 50 complex sorts for nouns are constructed from allowed combinations of:

• 16 semantic features (binary), e.g. HUMAN+, ARTIFICIAL-
• 17 sorts (binary), e.g. concrete, abstract-situation...

[Figure: diagram with the labels "sort (hierarchy)", "semantic features" and "complex sorts"]

Page 6: Combining Contexts in Lexicon Learning for Semantic Parsing

Application: WOCADI-Parser

„Welche Bücher von Peter Jackson über Expertensysteme wurden bei Addison-Wesley seit 1985 veröffentlicht?“ ("Which books by Peter Jackson on expert systems have been published by Addison-Wesley since 1985?")

Page 7: Combining Contexts in Lexicon Learning for Semantic Parsing

General Methodology

Distributional Hypothesis projected on syntactic-semantic contexts for nouns: nouns of similar complex sort are found in similar contexts

We use three kinds of context elements, as assigned by the WOCADI parser:
• Adjective Modifier
• Verb-Subject (deep)
• Verb-Object (deep)

These are used for training 33 binary classifiers (one per binary class: 16 semantic features and 17 sorts).

Page 8: Combining Contexts in Lexicon Learning for Semantic Parsing

Data

Corpus:
• 3,068,945 sentences obtained from the Leipzig Corpora Collection
• parser coverage: 42%
• verb-deep-subject relations: 430,916
• verb-deep-object relations: 408,699
• adjective-noun relations: 450,184

Lexicon:
• 11,100 noun entries
• lexicon extension: 10-fold cross validation on known nouns
• Unknown nouns are classified as well
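The 10-fold cross validation on known nouns can be pictured with a minimal sketch. It assumes a plain random split; the slides do not say how the folds were drawn, and `k_fold_splits` and the toy noun list are hypothetical names introduced only for illustration.

```python
import random

def k_fold_splits(nouns, k=10, seed=0):
    """Yield (train, test) pairs; every noun is held out exactly once."""
    shuffled = sorted(nouns)                 # fixed order before shuffling
    random.Random(seed).shuffle(shuffled)    # reproducible shuffle
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [n for j, fold in enumerate(folds) if j != i for n in fold]
        yield train, test

# Toy stand-in for the known nouns (hypothetical, not the real lexicon).
known_nouns = [f"noun{i:02d}" for i in range(25)]
splits = list(k_fold_splits(known_nouns))
```

In each round the classifier would be trained on `train` and its assignments for `test` compared with the existing lexicon entries.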

Page 9: Combining Contexts in Lexicon Learning for Semantic Parsing

Algorithm:

Initialize the training set;
As long as new nouns get classified {
    calculate class probabilities for each context element;
    for all yet unclassified nouns n {
        Multiply class probs of context elements class-wise;
        Assign the class with the highest probability to noun n;
    }
}

Class probabilities per context element:
a) count the number of co-occurring nouns per class
b) normalize by the total number of nouns in each class
c) normalize so that each row sums to 1
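The three normalization steps can be sketched as follows. This is one reading of the slide, not the authors' implementation: it assumes that what is counted in step a) are distinct training nouns, and the function name, the adjectives and their co-occurrences are hypothetical toy data.

```python
from collections import Counter, defaultdict

def class_probabilities(cooccurrences, noun_class):
    """cooccurrences: iterable of (noun, context_element) pairs.
    noun_class: dict noun -> class label for the training nouns."""
    class_size = Counter(noun_class.values())
    seen = set()
    counts = defaultdict(Counter)
    for noun, ctx in cooccurrences:
        # a) count distinct training nouns per class for each context element
        if noun in noun_class and (noun, ctx) not in seen:
            seen.add((noun, ctx))
            counts[ctx][noun_class[noun]] += 1
    probs = {}
    for ctx, per_class in counts.items():
        # b) divide by the class size to counteract class bias
        scaled = {c: per_class[c] / class_size[c] for c in class_size}
        # c) normalize the row so that it sums to 1
        total = sum(scaled.values())
        probs[ctx] = {c: v / total for c, v in scaled.items()}
    return probs

# Toy data for one binary feature (labels and pairs are invented).
noun_class = {"Akademiker": "+", "Akrobat": "+", "Airbag": "-",
              "Akku": "-", "Akkordeon": "-", "Airport": "-"}
pairs = [("Akademiker", "talentiert"), ("Akrobat", "talentiert"),
         ("Airbag", "talentiert")]
p = class_probabilities(pairs, noun_class)
```

In this toy case the element is seen with both "+" nouns (2/2) and one of four "-" nouns (1/4), so after row normalization the "+" class receives 0.8 and the "-" class 0.2.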

A threshold regulates the minimum number of different context elements a noun co-occurs with in order to be classified

Bootstrapping Mechanism
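A minimal sketch of the bootstrapping loop for a single binary classifier is given below. Several details are assumptions, not taken from the slides: newly classified nouns are added in one batch per round, the threshold counts only context elements that already have class probabilities, a small `smoothing` constant keeps a zero probability from wiping out a class, and ties are broken alphabetically. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def estimate(contexts, labels):
    """Class probabilities per context element from the labeled nouns."""
    size = Counter(labels.values())
    counts = defaultdict(Counter)
    for noun, elems in contexts.items():
        if noun in labels:
            for e in elems:
                counts[e][labels[noun]] += 1
    probs = {}
    for e, per_class in counts.items():
        scaled = {c: per_class[c] / size[c] for c in size}
        total = sum(scaled.values())
        probs[e] = {c: v / total for c, v in scaled.items()}
    return probs

def bootstrap(contexts, seed_labels, threshold=1, smoothing=1e-3):
    """contexts: dict noun -> set of context elements it occurs with."""
    labels = dict(seed_labels)               # initialize the training set
    while True:
        probs = estimate(contexts, labels)
        classes = sorted(set(labels.values()))
        newly = {}
        for noun, elems in contexts.items():
            if noun in labels:
                continue
            known = [e for e in elems if e in probs]
            if len(known) < threshold:       # too little evidence: skip
                continue
            score = {c: 1.0 for c in classes}
            for e in known:                  # multiply class-wise
                for c in classes:
                    score[c] *= probs[e][c] + smoothing
            newly[noun] = max(classes, key=score.get)
        if not newly:                        # nothing new was classified
            return labels
        labels.update(newly)

# Toy data (invented): two seed nouns, four unclassified nouns.
contexts = {
    "Akademiker": {"talentiert", "begabt"}, "Airbag": {"defekt"},
    "Akrobat": {"talentiert", "mutig"}, "Ajatollah": {"mutig"},
    "Akku": {"defekt", "leer"}, "Akkordeon": {"leer"},
}
seed = {"Akademiker": "HUMAN+", "Airbag": "HUMAN-"}
result = bootstrap(contexts, seed)
```

In the toy run, "Akrobat" and "Akku" are classified in the first round; their other context elements then become informative, so "Ajatollah" and "Akkordeon" follow in the second round. With `threshold=2` no noun has enough known context elements and the loop stops immediately.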

Page 10: Combining Contexts in Lexicon Learning for Semantic Parsing

From binary classes to complex sorts

• Binary classifiers for single features for all three context element types are combined into one feature assignment:
– Lenient: voting
– Strict: all classifiers for different context types agree

• Combining the outcome: safe choices
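The two combination modes can be sketched as below. This is an illustration under assumptions: a classifier that made no decision is represented by `None` and is ignored in both modes, and a tied vote yields no assignment. The slides do not specify either point, and `combine` is a hypothetical name.

```python
def combine(votes, mode="strict"):
    """votes: dict context_type -> "+", "-", or None (no decision).
    Returns the combined value, or None if no safe choice exists."""
    cast = [v for v in votes.values() if v is not None]
    if not cast:
        return None
    if mode == "strict":
        # all classifiers that decided must agree
        return cast[0] if len(set(cast)) == 1 else None
    # lenient: majority vote, no assignment on a tie
    plus, minus = cast.count("+"), cast.count("-")
    if plus == minus:
        return None
    return "+" if plus > minus else "-"

votes = {"adjective": "+", "verb-subject": "+", "verb-object": "-"}
lenient = combine(votes, mode="lenient")
strict = combine(votes, mode="strict")
```

Here the lenient mode assigns "+", while the strict mode refuses to assign a value because one context type disagrees.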

Binary features: ANIMAL +/-, ANIMATE +/-, ARTIF +/-, AXIAL +/-, ... (16 features)

Binary sorts: ab +/-, abs +/-, ad +/-, as +/-, ... (17 sorts)

Selection: compatible complex sorts that are minimal w.r.t. the hierarchy and unambiguous.

Output: result class or reject

Page 11: Combining Contexts in Lexicon Learning for Semantic Parsing

Results: binary classes for different context types

[Figure: result panels labeled =5 and =1]

Most of the binary features are highly biased

Page 12: Combining Contexts in Lexicon Learning for Semantic Parsing


Combination of context types =1

Page 13: Combining Contexts in Lexicon Learning for Semantic Parsing

Results for complex sorts

[Figure: result panels labeled =5 and =1, for the complex sorts with highest training frequency]

Page 14: Combining Contexts in Lexicon Learning for Semantic Parsing

Typical mistakes

Pflanze (plant): animal-object instead of plant-object
zart, fleischfressend, fressend, verändert, genmanipuliert, transgen, exotisch, selten, giftig, stinkend, wachsend...

Nachwuchs (offspring): human-object instead of animal-object
wissenschaftlich, qualifiziert, akademisch, eigen, talentiert, weiblich, hoffnungsvoll, geeignet, begabt, journalistisch...

Café (café): art-con-geogr instead of nonmov-art-discrete (cf. Restaurant)
Wiener, klein, türkisch, kurdisch, romanisch, cyber, philosophisch, besucht, traditionsreich, schnieke, gutbesucht, ...

Neger (negro): animal-object instead of human-object
weiß, dreckig, gefangen, faul, alt, schwarz, nackt, lieb, gut, brav

but:

Skinhead (skinhead): human-object (ok)
{16,17,18,19,20,21,22,23,30}jährig, gleichaltrig, zusammengeprügelt, rechtsradikal, brutal

In most cases the wrong class is semantically close. Evaluation metrics did not account for that.

Page 15: Combining Contexts in Lexicon Learning for Semantic Parsing

Discussion of Results

Binary features:
• Precision >98% for most binary features
• Assigning the smaller class is hard for bias>0.9
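The remark on bias can be illustrated with a small worked example. It assumes that "bias" means the share of the more frequent value of a binary feature; the slides do not define it, and the gold labels below are invented. The sketch shows that a classifier which always assigns the larger class already reaches a precision equal to the bias, while never finding the smaller class.

```python
def bias(labels):
    """Share of the more frequent value of a binary feature."""
    plus = sum(1 for v in labels if v == "+")
    return max(plus, len(labels) - plus) / len(labels)

def precision_recall(gold, predicted, target):
    """Precision and recall for one value of the feature."""
    assigned = [(g, p) for g, p in zip(gold, predicted) if p == target]
    correct = sum(1 for g, _ in assigned if g == target)
    relevant = sum(1 for g in gold if g == target)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall

# Invented gold labels: 95 nouns with "-", 5 nouns with "+".
gold = ["-"] * 95 + ["+"] * 5
always_minus = ["-"] * 100       # baseline that ignores the smaller class

b = bias(gold)
larger = precision_recall(gold, always_minus, "-")
smaller = precision_recall(gold, always_minus, "+")
```

For this toy feature the bias is 0.95; the baseline scores precision 0.95 and recall 1.0 on the larger class but 0.0 on both measures for the smaller class, which is why results on the smaller class are the more informative ones for highly biased features.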

Context types:
• verb-subject and verb-object are better than adjective
• verb-subject is best single context for complex sorts
• combination always helps for binary features

Complex sorts:
• Todo: more lenient combination procedure to increase recall

Page 16: Combining Contexts in Lexicon Learning for Semantic Parsing

Conclusion

• Method for semantic lexicon extension
• High precision for binary semantic features
• Unknown nouns:
– For 3,755 nouns not in the lexicon, a total of 125,491 binary features were assigned.
– For 1,041 unknown nouns, a complex sort was assigned
• Combination to complex sorts yet to be improved
• Combination of different context types improves results

Page 17: Combining Contexts in Lexicon Learning for Semantic Parsing


Any Questions?

Thank you very much!