1
Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych
Automated Verb Sense Labelling
Based on Linked Lexical Resources
2
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler
3
Motivation
Sense-annotated corpora are important resources in NLP
They are usually created manually, which is time-consuming and expensive
Verbs have more senses; thus, annotating verb senses is more difficult
Solution
Use a large-scale linked lexical resource to automatically create data annotated with verb senses
UBY
4
Linking Lexical Resources at the Word Sense
Level – example: UBY
Web 2.0
IMSLex-Subcat
UBY
Open Source Java API: http://code.google.com/p/uby/
6
Automated Verb Sense Labelling: Approach
UBY
Corpus
Uby: Verb Sense Patterns derived from lexical information
Corpus: Verb Sense Patterns derived from verb instances
Similarity Metric
7
Step 1: Creation of sense patterns from enriched senses
WN ask%2:32:01 (make a request or demand for something to somebody) is linked to FN Id 639 (request to do or give something):
As twenty are required it might pay to ask your supplier for a "bulk discount".
Uby: [ask%2:32:01] be PP VV to ask person for a JJ act
8
Step 1: Creation of sense patterns from
enriched senses
9
Step 1: Creation of sense patterns from
enriched senses
sense enrichment predicate argument structure information
10
Step 2: Automated Labelling based on Pattern
Similarity
WN ask%2:32:01 is linked to FN Id 639:
As twenty are required it might pay to ask your supplier for a "bulk discount".
Uby: [ask%2:32:01] be PP VV to ask person for a JJ act
Corpus instance: he would n't be pleased if a rumdum like me were to ask his daughter for a date
Corpus: if PP be to ask person for a time
Similarity score: 0.217 > threshold
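The labelling decision on this slide can be sketched as follows. This is a minimal sketch, assuming that a verb instance receives the sense whose pattern scores highest, provided that score exceeds the threshold t. The slides do not spell out the exact decision rule, so the function name `label_instance`, the handling of empty input, and the second sense key with its score are hypothetical.

```python
from typing import Dict, Optional


def label_instance(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring sense key, or None if no score exceeds the threshold.

    `scores` maps a sense key to the highest similarity between the corpus
    pattern and any sense pattern of that sense (assumed to be precomputed).
    """
    if not scores:
        return None
    best_sense = max(scores, key=scores.get)
    # Strictly greater, mirroring "similarity score > threshold" on the slide.
    return best_sense if scores[best_sense] > threshold else None


# The score 0.217 is taken from the slide; the second entry is made up for illustration.
scores = {"ask%2:32:01": 0.217, "ask%2:32:00": 0.05}
label = label_instance(scores, threshold=0.1)
```

With a stricter threshold (for example 0.3) the same instance would stay unlabelled, which is how the threshold trades corpus size against label quality.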
11
Step 2: Automated Labelling based on Pattern
Similarity
Using a similarity metric to compare patterns derived from UBY and
patterns derived from verb instances found in corpora
Considers the common bi-, tri-, and four-grams of two patterns
Takes word order into account!
w >= 1 is the size of the window around the verb
G_n(p_i) is the set of n-grams occurring in pattern p_i
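The idea of comparing patterns by their shared n-grams can be sketched in a few lines. The formula itself is not reproduced in this transcript, so the Dice-style normalisation below is an assumption, as are the function names and the default window size; the sketch is not expected to reproduce the score 0.217 shown for the example. Word order matters because only contiguous token sequences count as n-grams.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams, i.e. G_n(p) for a tokenised pattern p."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def window(tokens: List[str], verb: str, w: int) -> List[str]:
    """Keep at most w tokens on each side of the first occurrence of the verb.

    Raises ValueError if the verb does not occur in the pattern.
    """
    i = tokens.index(verb)
    return tokens[max(0, i - w):i + w + 1]


def pattern_similarity(p1: str, p2: str, verb: str, w: int = 4) -> float:
    """Dice-style overlap of shared bi-, tri- and four-grams (assumed normalisation)."""
    t1 = window(p1.split(), verb, w)
    t2 = window(p2.split(), verb, w)
    shared = 0
    total = 0
    for n in (2, 3, 4):
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        shared += len(g1 & g2)
        total += len(g1) + len(g2)
    return 2 * shared / total if total else 0.0


# Patterns taken from the "ask" example in the slides.
uby_pattern = "be PP VV to ask person for a JJ act"
corpus_pattern = "if PP be to ask person for a time"
score = pattern_similarity(uby_pattern, corpus_pattern, verb="ask", w=4)
```

Identical patterns score 1.0 and patterns without a shared bigram score 0.0, so the value can be compared directly against a threshold t.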
12
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
13
Intrinsic Evaluation
Evaluation on occurrences of the Senseval-3 verbs in SemCor (152 verbs)
Approx. 33,000 sense patterns generated from WN-FN-WKT (WordNet-FrameNet-Wiktionary) for these verbs
Evaluated with various similarity thresholds t
14
Extrinsic Evaluation – Experimental Setup
Comparison of two supervised classifiers for verb sense disambiguation:
1. Trained on an automatically labelled corpus (ALC): verb senses for the test verbs given in MASC and Senseval-3 are labelled in a large web corpus with similarity threshold t = 0.1
2. Trained on SemCor 3.0
Test data:
1. MASC corpus: 16 verbs annotated with WordNet 3.0 senses, 11,997 test instances
2. Senseval-3 dataset for all-words WSD: 152 verbs annotated with WordNet 3.0 senses, 442 test instances
15
Training Sets
[Bar chart: number of training instances for ALC and SemCor, axis from 0 to 400,000]
SemCor 3.0: approx. 22,000 training instances of the 16 MASC and 152 Senseval-3 verbs
Automatically labelled corpus (ALC): approx. 350,000 training instances of the 16 MASC and 152 Senseval-3 verbs
16
Classification
Preprocessing: POS tagging, dependency parsing and named entity recognition, using the TreeTagger and the Stanford Parser and Named Entity Recognizer from the DKPro Core component collection, http://dkpro-core-asl.googlecode.com
Features: lexical, syntactic and semantic features
Classification: a separate logistic regression classifier is trained for each of the test verbs, using WEKA, http://www.cs.waikato.ac.nz/ml/weka/
17
Performance of classifiers (accuracy) evaluated on MASC / Senseval-3
Training set: SemCor 3.0
Evaluation on MASC: 50.23
Evaluation on Senseval-3: 48.64 (45.20 with back-off)
Training set: Automatically labelled corpus (ALC)
Evaluation on MASC: 49.00
Evaluation on Senseval-3: 47.51 (43.24 with back-off)
MFS (most frequent sense) baseline for the two test sets
1. MASC: 41.72
2. Senseval-3: 25.34
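For readers unfamiliar with the MFS baseline, the computation can be sketched as below. The slides do not state how the most frequent sense was determined, so this sketch assumes it is counted per verb in a labelled training set; the function names and the toy sense labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Instance = Tuple[str, str]  # (verb lemma, gold sense label)


def most_frequent_senses(train: List[Instance]) -> Dict[str, str]:
    """Map each verb to the sense it is most often labelled with in `train`."""
    counts: Dict[str, Counter] = {}
    for verb, sense in train:
        counts.setdefault(verb, Counter())[sense] += 1
    return {verb: c.most_common(1)[0][0] for verb, c in counts.items()}


def mfs_accuracy(train: List[Instance], test: List[Instance]) -> float:
    """Accuracy of always predicting the most frequent sense of each verb."""
    mfs = most_frequent_senses(train)
    # Verbs unseen in training get no prediction and count as errors.
    correct = sum(1 for verb, sense in test if mfs.get(verb) == sense)
    return correct / len(test) if test else 0.0


# Toy data with made-up sense labels, for illustration only.
train = [("ask", "s1"), ("ask", "s1"), ("ask", "s2"), ("pay", "p1")]
test = [("ask", "s1"), ("ask", "s2"), ("pay", "p1"), ("pay", "p2")]
baseline = mfs_accuracy(train, test)
```

On the toy data two of the four test instances carry the most frequent sense of their verb, so the baseline is 0.5.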
18
Extrinsic Evaluation – effect of sense
enrichment
Best results with the combination WordNet-FrameNet-Wiktionary
WordNet-FrameNet achieves similar accuracy, but its coverage is lower
WordNet-FrameNet-Wiktionary-VerbNet achieves lower accuracy
Using WordNet only achieves the lowest coverage and accuracy
19
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
20
Take Home Messages
Linked lexical resources such as UBY are knowledge bases that can be used to perform automated verb sense labelling
The automatically labelled data can successfully be used to train supervised machine learning systems (distant / weak supervision)
This is due to the enriched sense representations of interlinked word senses
Particularly useful for languages such as German, where lexical resources are available but no sense-labelled data exist
21
Thank You!
Questions?
22
Training Data Coverage
Coverage of WN senses annotated in MASC in the training data:
There are 22 WN senses with instances in MASC which are not found in SemCor
There are 34 WN senses with instances in MASC which are not found in the ALC
The VSD system cannot correctly classify instances of those senses
The coverage of the WN senses annotated in the test sets by the training data constitutes the upper bound on the accuracy of our classifiers:
ALC: 0.8805 (increasing the size of the ALC does not help)
SemCor: 0.948
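The coverage argument can be made concrete with a small sketch. It assumes that coverage is counted per test instance (the share of test instances whose gold sense occurs at least once in the training data), since that is what bounds accuracy; the slides do not say whether instances or sense types were counted. Function names and sense labels are hypothetical.

```python
from typing import Iterable, List, Set


def uncovered_senses(train_senses: Iterable[str], test_senses: Iterable[str]) -> Set[str]:
    """Senses that occur in the test data but never in the training data."""
    return set(test_senses) - set(train_senses)


def coverage_upper_bound(train_senses: Iterable[str], test_senses: List[str]) -> float:
    """Share of test instances whose gold sense was seen in training.

    A classifier can only predict senses it has seen, so this share is an
    upper bound on its accuracy.
    """
    seen = set(train_senses)
    if not test_senses:
        return 0.0
    covered = sum(1 for sense in test_senses if sense in seen)
    return covered / len(test_senses)


# Toy data with made-up sense labels, one entry per labelled instance.
train_senses = ["a", "b", "b"]
test_senses = ["a", "b", "c", "a"]
missing = uncovered_senses(train_senses, test_senses)
bound = coverage_upper_bound(train_senses, test_senses)
```

Here sense "c" never occurs in training, so at most three of the four test instances can be classified correctly.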
23
Comparison with other systems for verb sense
disambiguation
State-of-the-art supervised system (Chen and Palmer, 2009) on Senseval-2 data:
0.648 accuracy, MFS baseline: 0.407
Not comparable due to different versions of WordNet used
Best performing Lesk-based system (Miller et al., 2012):
33.86% accuracy for the MASC verbs
30.16% accuracy for the Senseval-3 verbs