Page 1: Unsupervised Knowledge-Free Word Sense Disambiguation


Unsupervised Knowledge-Free Word Sense Disambiguation

Dr. Alexander Panchenko

University of Hamburg, Language Technology Group

23 February, 2017


Page 2: Unsupervised Knowledge-Free Word Sense Disambiguation


Overview

Introduction

Dense Sense Representations

Sparse Sense Representations

Future Work


Page 3: Unsupervised Knowledge-Free Word Sense Disambiguation


About me

- 2008: Engineering degree (M.S.) in Computer Science, Moscow State Technical University
- 2009: Research intern, Xerox Research Centre Europe
- 2013: PhD in Natural Language Processing, University of Louvain
- 2013: Research engineer at a start-up related to social network analysis (Digsolab)
- 2015: Postdoc at Technical University of Darmstadt
- 2017: Postdoc at University of Hamburg

Topics: computational lexical semantics (semantic similarity/relatedness, semantic relations, sense induction, sense disambiguation), NLP for social network analysis, text categorization.

Papers, presentations, datasets: http://panchenko.me


Page 4: Unsupervised Knowledge-Free Word Sense Disambiguation


Publications Related to the Talk

- Pelevina M., Arefiev N., Biemann C., Panchenko A. (2016): Making Sense of Word Embeddings. In Proceedings of the 1st Workshop on Representation Learning for NLP, ACL 2016, Berlin, Germany. Best Paper Award.

- Panchenko A., Simon J., Riedl M., Biemann C. (2016): Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics. In Proceedings of KONVENS 2016, Bochum, Germany.

- Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.


Page 5: Unsupervised Knowledge-Free Word Sense Disambiguation


Motivation for Unsupervised Knowledge-Free WSD

- A word sense disambiguation (WSD) system:
  - Input: a word and its context.
  - Output: a sense of this word.

- Existing approaches (Navigli, 2009):
  - Knowledge-based approaches rely on hand-crafted resources, such as WordNet.
  - Supervised approaches learn from hand-labeled training data, such as SemCor.

- Problem 1: hand-crafted lexical resources and training data are expensive to create, often inconsistent, and domain-dependent.

- Problem 2: these methods assume a fixed sense inventory:
  - senses emerge and disappear over time;
  - different applications require different granularities.


Page 8: Unsupervised Knowledge-Free Word Sense Disambiguation


Motivation for Unsupervised Knowledge-Free WSD (cont.)

- An alternative route is the unsupervised knowledge-free approach:
  - learn an interpretable sense inventory;
  - learn a disambiguation model.


Page 9: Unsupervised Knowledge-Free Word Sense Disambiguation


Dense Sense Representations for WSD

- Pelevina M., Arefiev N., Biemann C., Panchenko A. (2016): Making Sense of Word Embeddings. In Proceedings of the 1st Workshop on Representation Learning for NLP, ACL 2016, Berlin, Germany.

- An approach to learn word sense embeddings.


Page 10: Unsupervised Knowledge-Free Word Sense Disambiguation


Overview of the contribution

Prior methods:

- Induce an inventory by clustering of word instances (Li and Jurafsky, 2015)
- Use existing inventories (Rothe and Schütze, 2015)

Our method:

- Input: word embeddings
- Output: word sense embeddings (see the pooling sketch below)
- Word sense induction by clustering of word ego-networks
- Word sense disambiguation based on the induced sense representations
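To make the input/output relation concrete, here is a minimal sketch of the pooling idea: a sense vector can be obtained from the word vectors of an induced cluster's members. This is not the authors' code; the toy vectors and cluster memberships are invented for illustration.

import numpy as np

# Toy word vectors; in practice these come from a pre-trained model such as word2vec.
word_vectors = {
    "tray":    np.array([0.9, 0.1, 0.0]),
    "basket":  np.array([0.8, 0.2, 0.1]),
    "bracket": np.array([0.1, 0.9, 0.2]),
    "column":  np.array([0.0, 0.8, 0.3]),
}

def sense_vector(cluster, vectors):
    """Pool the word vectors of a sense cluster's members into a single sense vector."""
    members = [vectors[w] for w in cluster if w in vectors]
    return np.mean(members, axis=0)

# Two hypothetical clusters induced for "table": furniture-like vs. tabular-data-like.
table_1 = sense_vector(["tray", "basket"], word_vectors)     # table#1
table_0 = sense_vector(["bracket", "column"], word_vectors)  # table#0
print(table_0, table_1)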


Page 12: Unsupervised Knowledge-Free Word Sense Disambiguation


Learning Word Sense Embeddings


Page 13: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Induction: Ego-Network Clustering

- Graph clustering using the Chinese Whispers algorithm (Biemann, 2006); a minimal sketch follows below.
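The sketch below shows the basic label-propagation scheme on a word ego-network; it is not the authors' implementation, and the edge weights are invented rather than taken from a trained model.

import random
from collections import defaultdict

# Ego-network of the target word "table": nodes are its nearest neighbours, and
# weighted edges connect neighbours that are similar to each other (the target
# word itself is excluded from its own ego-network).
edges = {
    ("tray", "basket"): 0.8, ("tray", "plate"): 0.7, ("basket", "plate"): 0.6,
    ("bracket", "column"): 0.9, ("bracket", "grid"): 0.7, ("column", "grid"): 0.8,
}

def chinese_whispers(edges, iterations=20, seed=0):
    """Label propagation: each node repeatedly adopts the strongest label among its neighbours."""
    random.seed(seed)
    graph = defaultdict(dict)
    for (u, v), w in edges.items():
        graph[u][v] = w
        graph[v][u] = w
    labels = {node: node for node in graph}  # every node starts in its own cluster
    nodes = list(graph)
    for _ in range(iterations):
        random.shuffle(nodes)
        for node in nodes:
            votes = defaultdict(float)
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight  # votes are weighted by edge strength
            labels[node] = max(votes, key=votes.get)
    return labels

print(chinese_whispers(edges))  # two clusters emerge, one per induced sense of "table"

Each resulting cluster of neighbours is taken as one induced sense, whose vector can then be pooled as in the earlier sketch.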


Page 14: Unsupervised Knowledge-Free Word Sense Disambiguation


Neighbours of Word and Sense Vectors

Vector  | Nearest Neighbours
table   | tray, bottom, diagram, bucket, brackets, stack, basket, list, parenthesis, cup, trays, pile, playfield, bracket, pot, drop-down, cue, plate
table#0 | leftmost#0, column#1, randomly#0, tableau#1, top-left#0, indent#1, bracket#3, pointer#0, footer#1, cursor#1, diagram#0, grid#0
table#1 | pile#1, stool#1, tray#0, basket#0, bowl#1, bucket#0, box#0, cage#0, saucer#3, mirror#1, birdcage#0, hole#0, pan#1, lid#0

- Neighbours of the word "table" and of its senses produced by our method.
- The neighbours of the initial word vector belong to both senses.
- The neighbours of the sense vectors are sense-specific (see the query sketch below).
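If the induced sense vectors are stored in word2vec format (the pre-trained SenseGram models appear to use this format, but treat that as an assumption), such neighbour lists can be reproduced with gensim; the file name below is a placeholder, not an actual release path.

from gensim.models import KeyedVectors

# Placeholder path: substitute the actual pre-trained sense-vector file.
sense_vectors = KeyedVectors.load_word2vec_format("sense_vectors.bin", binary=True)

for neighbour, similarity in sense_vectors.most_similar("table#0", topn=10):
    print(f"{neighbour}\t{similarity:.3f}")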


Page 15: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Disambiguation

1. Context extraction
   - use the context words around the target word
2. Context filtering
   - based on each context word's relevance for disambiguation
3. Sense choice
   - maximize the similarity between the context vector and the sense vectors (see the sketch below)
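A minimal sketch of step 3, under simplifying assumptions: toy two-dimensional vectors, plain mean pooling of the context, and no filtering step.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def choose_sense(context_words, word_vectors, sense_vectors):
    """Return the sense whose vector is most similar to the mean context vector."""
    context = [word_vectors[w] for w in context_words if w in word_vectors]
    if not context:
        return None
    context_vector = np.mean(context, axis=0)
    return max(sense_vectors, key=lambda s: cosine(context_vector, sense_vectors[s]))

word_vectors = {"dinner": np.array([0.9, 0.1]), "plate": np.array([0.8, 0.2])}
sense_vectors = {"table#0": np.array([0.1, 0.9]),   # tabular-data sense
                 "table#1": np.array([0.9, 0.1])}   # furniture sense
print(choose_sense(["dinner", "plate"], word_vectors, sense_vectors))  # -> table#1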


Page 16: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Disambiguation: Example


Page 17: Unsupervised Knowledge-Free Word Sense Disambiguation


Evaluation on the SemEval 2013 Task 13 Dataset: Comparison to the State of the Art

Model                      | Jacc. | Tau   | WNDCG | F.NMI | F.B-Cubed
AI-KU (add1000)            | 0.176 | 0.609 | 0.205 | 0.033 | 0.317
AI-KU                      | 0.176 | 0.619 | 0.393 | 0.066 | 0.382
AI-KU (remove5-add1000)    | 0.228 | 0.654 | 0.330 | 0.040 | 0.463
Unimelb (5p)               | 0.198 | 0.623 | 0.374 | 0.056 | 0.475
Unimelb (50k)              | 0.198 | 0.633 | 0.384 | 0.060 | 0.494
UoS (#WN senses)           | 0.171 | 0.600 | 0.298 | 0.046 | 0.186
UoS (top-3)                | 0.220 | 0.637 | 0.370 | 0.044 | 0.451
La Sapienza (1)            | 0.131 | 0.544 | 0.332 | –     | –
La Sapienza (2)            | 0.131 | 0.535 | 0.394 | –     | –
AdaGram, α = 0.05, 100 dim | 0.274 | 0.644 | 0.318 | 0.058 | 0.470
w2v                        | 0.197 | 0.615 | 0.291 | 0.011 | 0.615
w2v (nouns)                | 0.179 | 0.626 | 0.304 | 0.011 | 0.623
JBT                        | 0.205 | 0.624 | 0.291 | 0.017 | 0.598
JBT (nouns)                | 0.198 | 0.643 | 0.310 | 0.031 | 0.595
TWSI (nouns)               | 0.215 | 0.651 | 0.318 | 0.030 | 0.573


Page 18: Unsupervised Knowledge-Free Word Sense Disambiguation


Conclusion

- Novel approach for learning word sense embeddings.
- Can use existing word embeddings as input.
- WSD performance comparable to state-of-the-art systems.
- Source code and pre-trained models: https://github.com/tudarmstadt-lt/SenseGram


Page 19: Unsupervised Knowledge-Free Word Sense Disambiguation


Sparse Sense Representations for WSD

- Panchenko A., Simon J., Riedl M., Biemann C. (2016): Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics. In Proceedings of KONVENS 2016, Bochum, Germany.

- Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.


Page 20: Unsupervised Knowledge-Free Word Sense Disambiguation


Contributions

- A framework that relies on induced inventories as a pivot for learning contextual feature representations and disambiguation.

- The method can integrate several types of context features in an unsupervised way.

- The method is interpretable at several levels.


Page 21: Unsupervised Knowledge-Free Word Sense Disambiguation


Outline of the Method

[Figure: Outline of our unsupervised interpretable method for word sense induction and disambiguation. Recoverable components of the diagram: training corpus; feature extraction (dependencies, language model, co-occurrences); word-feature counts from the corpus and from contexts; computing word and feature similarities; word sense induction (word sense inventory); disambiguation of contexts; meta-combination.]
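The exact feature weighting and meta-combination are described in the papers above; the following is only a hedged sketch of the sparse scoring idea, with invented feature weights: each induced sense is represented by the aggregated features of its cluster members, and a context is scored against each sense by the total weight of the overlapping features.

from collections import Counter

# Sparse word-feature weights (e.g., co-occurrence or dependency features); values invented.
word_features = {
    "tray":    Counter({"dinner": 3.0, "kitchen": 2.0}),
    "basket":  Counter({"kitchen": 2.5, "picnic": 1.5}),
    "bracket": Counter({"html": 2.0, "column": 3.0}),
    "grid":    Counter({"column": 2.5, "row": 2.0}),
}

def sense_features(cluster):
    """Aggregate the sparse features of all cluster members into one sense representation."""
    aggregated = Counter()
    for member in cluster:
        aggregated.update(word_features.get(member, {}))
    return aggregated

senses = {"table#1": sense_features(["tray", "basket"]),    # furniture sense
          "table#0": sense_features(["bracket", "grid"])}   # tabular-data sense

def choose_sense(context_features, senses):
    """Pick the sense with the largest total weight of features observed in the context."""
    return max(senses, key=lambda s: sum(senses[s][f] for f in context_features))

print(choose_sense({"dinner", "kitchen"}, senses))  # -> table#1

Because both the sense inventory and the feature weights are plain word lists and counts, every disambiguation decision can be traced back to concrete features, which is what makes the model interpretable.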


Page 22: Unsupervised Knowledge-Free Word Sense Disambiguation


Interpretable Unsupervised Knowledge-Free WSD

Interpretability levels of our model

1. word sense inventory;

2. sense feature representation;

3. results of disambiguation in context.

Page 23: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD based on an Induced Word Sense Inventory


Page 24: Unsupervised Knowledge-Free Word Sense Disambiguation


Results on the TWSI Dataset

Table: WSD performance of different configurations of our method on the full and the sense-balanced TWSI datasets, based on the coarse inventory with 1.96 senses per word.


Page 25: Unsupervised Knowledge-Free Word Sense Disambiguation


Impact of Word Sense Inventory Granularity on WSD Performance: the TWSI Dataset


Page 26: Unsupervised Knowledge-Free Word Sense Disambiguation


Results on SemEval 2013 Task 13: Word Sense Induction and Disambiguation

Table: WSD performance of the best configuration of our method, identified on the TWSI dataset, compared to the participants of SemEval 2013 Task 13 and two systems based on word sense embeddings (AdaGram and SenseGram).


Page 27: Unsupervised Knowledge-Free Word Sense Disambiguation


Demonstrating Unsupervised Knowledge-Free WSD


Page 30: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD without Sense Inventory: Co-Sets

[Figure: An example co-set, with a hypernym layer (fruit#1, food#0) and a co-hyponym layer (apple#2, mango#0, pear#0), connected by hypernymy and co-hypernymy relations.]


Page 31: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD without Sense Inventory: Co-Sets

ID | Hypernym Layer, H(c) ⊂ S | Co-Hyponym Layer, c ⊂ S
1 | vegetable#0, fruit#0, crop#0, ingredient#0, food#0 | peach#0, banana#0, pineapple#0, berry#0, blackberry#0, grapefruit#0, strawberry#0, blueberry#0, fruit#0, grape#0, melon#0, orange#0, pear#0, plum#0, raspberry#0, watermelon#0, apple#0, apricot#0, cherry#0
2 | programming language#3, technology#0, language#0, format#2, app#0 | C#4, Basic#2, Haskell#5, Flash#1, Java#1, Pascal#0, Ruby#6, PHP#0, Ada#1, Oracle#3, Python#3, Apache#3, Visual Basic#1, ASP#2, Delphi#2, SQL Server#0, CSS#0, AJAX#0, the Java#0
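Purely as an illustration of how such co-sets could drive disambiguation without a fixed inventory, here is a hedged sketch: the co-sets below are abridged from the table above, and the simple lexical-overlap scoring is an assumption, not the authors' method.

cosets = [
    {"hypernyms": {"vegetable", "fruit", "crop", "ingredient", "food"},
     "cohyponyms": {"peach", "banana", "pineapple", "grape", "melon", "apple", "cherry"}},
    {"hypernyms": {"programming language", "technology", "language", "format", "app"},
     "cohyponyms": {"Haskell", "Java", "Pascal", "Ruby", "PHP", "Python", "Delphi"}},
]

def disambiguate_with_cosets(context_words, cosets):
    """Return the hypernyms of the co-set whose members overlap most with the context."""
    def overlap(coset):
        return len((coset["hypernyms"] | coset["cohyponyms"]) & set(context_words))
    return max(cosets, key=overlap)["hypernyms"]

context = ["compile", "the", "Java", "bindings", "for", "Python"]
print(disambiguate_with_cosets(context, cosets))  # -> the "programming language" co-set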


Page 32: Unsupervised Knowledge-Free Word Sense Disambiguation


Thank you!
