Page 1: Unsupervised Knowledge-Free Word Sense Disambiguation


Unsupervised Knowledge-Free Word Sense Disambiguation

Dr. Alexander Panchenko

University of Hamburg, Language Technology Group

23 February, 2017


Page 2: Unsupervised Knowledge-Free Word Sense Disambiguation


Overview

Introduction

Dense Sense Representations

Sparse Sense Representations

Future Work


Page 3: Unsupervised Knowledge-Free Word Sense Disambiguation


About me

- 2008: Engineering degree (M.S.) in Computer Science, Moscow State Technical University
- 2009: Research intern, Xerox Research Centre Europe
- 2013: PhD in Natural Language Processing, University of Louvain
- 2013: Research engineer at a start-up related to social network analysis (Digsolab)
- 2015: Postdoc at Technical University of Darmstadt
- 2017: Postdoc at University of Hamburg

Topics: computational lexical semantics (semantic similarity/relatedness, semantic relations, sense induction, sense disambiguation), NLP for social network analysis, text categorization.

Papers, presentations, datasets: http://panchenko.me


Page 4: Unsupervised Knowledge-Free Word Sense Disambiguation


Publications Related to the Talk

- Pelevina M., Arefiev N., Biemann C., Panchenko A. (2016): Making Sense of Word Embeddings. In Proceedings of the 1st Workshop on Representation Learning for NLP, ACL 2016, Berlin, Germany. Best Paper Award.

- Panchenko A., Simon J., Riedl M., Biemann C. (2016): Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics. In Proceedings of KONVENS 2016, Bochum, Germany.

- Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.


Page 5: Unsupervised Knowledge-Free Word Sense Disambiguation


Motivation for Unsupervised Knowledge-Free WSD

- A word sense disambiguation (WSD) system:
  - Input: a word and its context.
  - Output: a sense of this word.

- Existing approaches (Navigli, 2009):
  - Knowledge-based approaches rely on hand-crafted resources, such as WordNet.
  - Supervised approaches learn from hand-labeled training data, such as SemCor.

- Problem 1: hand-crafted lexical resources and training data are expensive to create, often inconsistent, and domain-dependent.

- Problem 2: these methods assume a fixed sense inventory:
  - senses emerge and disappear over time;
  - different applications require different granularities.


Page 8: Unsupervised Knowledge-Free Word Sense Disambiguation


Motivation for Unsupervised Knowledge-Free WSD (cont.)

- An alternative route is the unsupervised knowledge-free approach:
  - learn an interpretable sense inventory;
  - learn a disambiguation model.


Page 9: Unsupervised Knowledge-Free Word Sense Disambiguation


Dense Sense Representations for WSD

- Pelevina M., Arefiev N., Biemann C., Panchenko A. (2016): Making Sense of Word Embeddings. In Proceedings of the 1st Workshop on Representation Learning for NLP, ACL 2016, Berlin, Germany.

- An approach to learn word sense embeddings.


Page 10: Unsupervised Knowledge-Free Word Sense Disambiguation


Overview of the contribution

Prior methods:

- Induce an inventory by clustering of word instances (Li and Jurafsky, 2015)
- Use existing inventories (Rothe and Schütze, 2015)

Our method:

- Input: word embeddings
- Output: word sense embeddings (see the pooling sketch below)
- Word sense induction by clustering of word ego-networks
- Word sense disambiguation based on the induced sense representations
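To make the input/output relation concrete, here is a minimal sketch of the pooling idea: a sense vector can be obtained from the word vectors of an induced cluster's members. This is not the authors' code; the toy vectors and cluster memberships are invented for illustration.

import numpy as np

# Toy word vectors; in practice these come from a pre-trained model such as word2vec.
word_vectors = {
    "tray":    np.array([0.9, 0.1, 0.0]),
    "basket":  np.array([0.8, 0.2, 0.1]),
    "bracket": np.array([0.1, 0.9, 0.2]),
    "column":  np.array([0.0, 0.8, 0.3]),
}

def sense_vector(cluster, vectors):
    """Pool the word vectors of a sense cluster's members into a single sense vector."""
    members = [vectors[w] for w in cluster if w in vectors]
    return np.mean(members, axis=0)

# Two hypothetical clusters induced for "table": furniture-like vs. tabular-data-like.
table_1 = sense_vector(["tray", "basket"], word_vectors)     # table#1
table_0 = sense_vector(["bracket", "column"], word_vectors)  # table#0
print(table_0, table_1)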


Page 12: Unsupervised Knowledge-Free Word Sense Disambiguation


Learning Word Sense Embeddings


Page 13: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Induction: Ego-Network Clustering

- Graph clustering using the Chinese Whispers algorithm (Biemann, 2006); a minimal sketch follows below.
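The sketch below shows the basic label-propagation scheme on a word ego-network; it is not the authors' implementation, and the edge weights are invented rather than taken from a trained model.

import random
from collections import defaultdict

# Ego-network of the target word "table": nodes are its nearest neighbours, and
# weighted edges connect neighbours that are similar to each other (the target
# word itself is excluded from its own ego-network).
edges = {
    ("tray", "basket"): 0.8, ("tray", "plate"): 0.7, ("basket", "plate"): 0.6,
    ("bracket", "column"): 0.9, ("bracket", "grid"): 0.7, ("column", "grid"): 0.8,
}

def chinese_whispers(edges, iterations=20, seed=0):
    """Label propagation: each node repeatedly adopts the strongest label among its neighbours."""
    random.seed(seed)
    graph = defaultdict(dict)
    for (u, v), w in edges.items():
        graph[u][v] = w
        graph[v][u] = w
    labels = {node: node for node in graph}  # every node starts in its own cluster
    nodes = list(graph)
    for _ in range(iterations):
        random.shuffle(nodes)
        for node in nodes:
            votes = defaultdict(float)
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight  # votes are weighted by edge strength
            labels[node] = max(votes, key=votes.get)
    return labels

print(chinese_whispers(edges))  # two clusters emerge, one per induced sense of "table"

Each resulting cluster of neighbours is taken as one induced sense, whose vector can then be pooled as in the earlier sketch.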


Page 14: Unsupervised Knowledge-Free Word Sense Disambiguation


Neighbours of Word and Sense Vectors

Vector  | Nearest Neighbours
table   | tray, bottom, diagram, bucket, brackets, stack, basket, list, parenthesis, cup, trays, pile, playfield, bracket, pot, drop-down, cue, plate
table#0 | leftmost#0, column#1, randomly#0, tableau#1, top-left#0, indent#1, bracket#3, pointer#0, footer#1, cursor#1, diagram#0, grid#0
table#1 | pile#1, stool#1, tray#0, basket#0, bowl#1, bucket#0, box#0, cage#0, saucer#3, mirror#1, birdcage#0, hole#0, pan#1, lid#0

- Neighbours of the word "table" and of its senses produced by our method.
- The neighbours of the initial word vector belong to both senses.
- The neighbours of the sense vectors are sense-specific (see the query sketch below).
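If the induced sense vectors are stored in word2vec format (the pre-trained SenseGram models appear to use this format, but treat that as an assumption), such neighbour lists can be reproduced with gensim; the file name below is a placeholder, not an actual release path.

from gensim.models import KeyedVectors

# Placeholder path: substitute the actual pre-trained sense-vector file.
sense_vectors = KeyedVectors.load_word2vec_format("sense_vectors.bin", binary=True)

for neighbour, similarity in sense_vectors.most_similar("table#0", topn=10):
    print(f"{neighbour}\t{similarity:.3f}")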


Page 15: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Disambiguation

1. Context extraction
   - use the context words around the target word
2. Context filtering
   - based on each context word's relevance for disambiguation
3. Sense choice
   - maximize the similarity between the context vector and the sense vectors (see the sketch below)
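A minimal sketch of step 3, under simplifying assumptions: toy two-dimensional vectors, plain mean pooling of the context, and no filtering step.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def choose_sense(context_words, word_vectors, sense_vectors):
    """Return the sense whose vector is most similar to the mean context vector."""
    context = [word_vectors[w] for w in context_words if w in word_vectors]
    if not context:
        return None
    context_vector = np.mean(context, axis=0)
    return max(sense_vectors, key=lambda s: cosine(context_vector, sense_vectors[s]))

word_vectors = {"dinner": np.array([0.9, 0.1]), "plate": np.array([0.8, 0.2])}
sense_vectors = {"table#0": np.array([0.1, 0.9]),   # tabular-data sense
                 "table#1": np.array([0.9, 0.1])}   # furniture sense
print(choose_sense(["dinner", "plate"], word_vectors, sense_vectors))  # -> table#1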


Page 16: Unsupervised Knowledge-Free Word Sense Disambiguation


Word Sense Disambiguation: Example


Page 17: Unsupervised Knowledge-Free Word Sense Disambiguation


Evaluation on the SemEval 2013 Task 13 Dataset: Comparison to the State of the Art

Model                      | Jacc. | Tau   | WNDCG | F.NMI | F.B-Cubed
AI-KU (add1000)            | 0.176 | 0.609 | 0.205 | 0.033 | 0.317
AI-KU                      | 0.176 | 0.619 | 0.393 | 0.066 | 0.382
AI-KU (remove5-add1000)    | 0.228 | 0.654 | 0.330 | 0.040 | 0.463
Unimelb (5p)               | 0.198 | 0.623 | 0.374 | 0.056 | 0.475
Unimelb (50k)              | 0.198 | 0.633 | 0.384 | 0.060 | 0.494
UoS (#WN senses)           | 0.171 | 0.600 | 0.298 | 0.046 | 0.186
UoS (top-3)                | 0.220 | 0.637 | 0.370 | 0.044 | 0.451
La Sapienza (1)            | 0.131 | 0.544 | 0.332 | –     | –
La Sapienza (2)            | 0.131 | 0.535 | 0.394 | –     | –
AdaGram, α = 0.05, 100 dim | 0.274 | 0.644 | 0.318 | 0.058 | 0.470
w2v                        | 0.197 | 0.615 | 0.291 | 0.011 | 0.615
w2v (nouns)                | 0.179 | 0.626 | 0.304 | 0.011 | 0.623
JBT                        | 0.205 | 0.624 | 0.291 | 0.017 | 0.598
JBT (nouns)                | 0.198 | 0.643 | 0.310 | 0.031 | 0.595
TWSI (nouns)               | 0.215 | 0.651 | 0.318 | 0.030 | 0.573


Page 18: Unsupervised Knowledge-Free Word Sense Disambiguation


Conclusion

- Novel approach for learning word sense embeddings.
- Can use existing word embeddings as input.
- WSD performance comparable to state-of-the-art systems.
- Source code and pre-trained models: https://github.com/tudarmstadt-lt/SenseGram


Page 19: Unsupervised Knowledge-Free Word Sense Disambiguation


Sparse Sense Representations for WSD

- Panchenko A., Simon J., Riedl M., Biemann C. (2016): Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics. In Proceedings of KONVENS 2016, Bochum, Germany.

- Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.


Page 20: Unsupervised Knowledge-Free Word Sense Disambiguation


Contributions

- A framework that relies on induced inventories as a pivot for learning contextual feature representations and disambiguation.

- The method can integrate several types of context features in an unsupervised way.

- The method is interpretable at several levels.


Page 21: Unsupervised Knowledge-Free Word Sense Disambiguation


Outline of the Method

[Figure: Outline of our unsupervised interpretable method for word sense induction and disambiguation. Recoverable components of the diagram: training corpus; feature extraction (dependencies, language model, co-occurrences); word-feature counts from the corpus and from contexts; computing word and feature similarities; word sense induction (word sense inventory); disambiguation of contexts; meta-combination.]
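The exact feature weighting and meta-combination are described in the papers above; the following is only a hedged sketch of the sparse scoring idea, with invented feature weights: each induced sense is represented by the aggregated features of its cluster members, and a context is scored against each sense by the total weight of the overlapping features.

from collections import Counter

# Sparse word-feature weights (e.g., co-occurrence or dependency features); values invented.
word_features = {
    "tray":    Counter({"dinner": 3.0, "kitchen": 2.0}),
    "basket":  Counter({"kitchen": 2.5, "picnic": 1.5}),
    "bracket": Counter({"html": 2.0, "column": 3.0}),
    "grid":    Counter({"column": 2.5, "row": 2.0}),
}

def sense_features(cluster):
    """Aggregate the sparse features of all cluster members into one sense representation."""
    aggregated = Counter()
    for member in cluster:
        aggregated.update(word_features.get(member, {}))
    return aggregated

senses = {"table#1": sense_features(["tray", "basket"]),    # furniture sense
          "table#0": sense_features(["bracket", "grid"])}   # tabular-data sense

def choose_sense(context_features, senses):
    """Pick the sense with the largest total weight of features observed in the context."""
    return max(senses, key=lambda s: sum(senses[s][f] for f in context_features))

print(choose_sense({"dinner", "kitchen"}, senses))  # -> table#1

Because both the sense inventory and the feature weights are plain word lists and counts, every disambiguation decision can be traced back to concrete features, which is what makes the model interpretable.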


Page 22: Unsupervised Knowledge-Free Word Sense Disambiguation


Interpretable Unsupervised Knowledge-Free WSD

Interpretability levels of our model

1. word sense inventory;

2. sense feature representation;

3. results of disambiguation in context.

Page 23: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD based on an Induced Word Sense Inventory


Page 24: Unsupervised Knowledge-Free Word Sense Disambiguation


Results on the TWSI Dataset

Table: WSD performance of different configurations of our method on the full and the sense-balanced TWSI datasets, based on the coarse inventory with 1.96 senses per word.


Page 25: Unsupervised Knowledge-Free Word Sense Disambiguation


Impact of Word Sense Inventory Granularity on WSD Performance: the TWSI Dataset


Page 26: Unsupervised Knowledge-Free Word Sense Disambiguation


Results on SemEval 2013 Task 13: Word Sense Induction and Disambiguation

Table: WSD performance of the best configuration of our method, identified on the TWSI dataset, compared to the participants of SemEval 2013 Task 13 and two systems based on word sense embeddings (AdaGram and SenseGram).


Page 27: Unsupervised Knowledge-Free Word Sense Disambiguation


Demonstrating Unsupervised Knowledge-Free WSD


Page 30: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD without Sense Inventory: Co-Sets

[Figure: An example co-set, with a hypernym layer (fruit#1, food#0) and a co-hyponym layer (apple#2, mango#0, pear#0), connected by hypernymy and co-hypernymy relations.]


Page 31: Unsupervised Knowledge-Free Word Sense Disambiguation


WSD without Sense Inventory: Co-Sets

ID | Hypernym Layer, H(c) ⊂ S | Co-Hyponym Layer, c ⊂ S
1 | vegetable#0, fruit#0, crop#0, ingredient#0, food#0 | peach#0, banana#0, pineapple#0, berry#0, blackberry#0, grapefruit#0, strawberry#0, blueberry#0, fruit#0, grape#0, melon#0, orange#0, pear#0, plum#0, raspberry#0, watermelon#0, apple#0, apricot#0, cherry#0
2 | programming language#3, technology#0, language#0, format#2, app#0 | C#4, Basic#2, Haskell#5, Flash#1, Java#1, Pascal#0, Ruby#6, PHP#0, Ada#1, Oracle#3, Python#3, Apache#3, Visual Basic#1, ASP#2, Delphi#2, SQL Server#0, CSS#0, AJAX#0, the Java#0
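Purely as an illustration of how such co-sets could drive disambiguation without a fixed inventory, here is a hedged sketch: the co-sets below are abridged from the table above, and the simple lexical-overlap scoring is an assumption, not the authors' method.

cosets = [
    {"hypernyms": {"vegetable", "fruit", "crop", "ingredient", "food"},
     "cohyponyms": {"peach", "banana", "pineapple", "grape", "melon", "apple", "cherry"}},
    {"hypernyms": {"programming language", "technology", "language", "format", "app"},
     "cohyponyms": {"Haskell", "Java", "Pascal", "Ruby", "PHP", "Python", "Delphi"}},
]

def disambiguate_with_cosets(context_words, cosets):
    """Return the hypernyms of the co-set whose members overlap most with the context."""
    def overlap(coset):
        return len((coset["hypernyms"] | coset["cohyponyms"]) & set(context_words))
    return max(cosets, key=overlap)["hypernyms"]

context = ["compile", "the", "Java", "bindings", "for", "Python"]
print(disambiguate_with_cosets(context, cosets))  # -> the "programming language" co-set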


Page 32: Unsupervised Knowledge-Free Word Sense Disambiguation


Thank you!
