Evaluation of Utility of LSA for Word Sense Discrimination, by Esther Levin, Mehrbod Sharifi, Jerry Ball

TRANSCRIPT

  • Slide 1
  • Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/
  • Slide 2
  • Outline
    • Latent Semantic Analysis (LSA)
    • Word sense discrimination through the Context Group Discrimination Paradigm
    • Experiments
      • Sense-based clusters (supervised learning)
      • K-means clustering (unsupervised learning)
      • Homonyms vs. polysemes
    • Conclusions
  • Slide 3
  • Latent Semantic Analysis (LSA) [Deerwester et al. '90]
    • Represents words and passages as vectors in the same (low-dimensional) semantic space
    • Similarity in word meaning is defined by similarity of their contexts
  • Slide 4
  • LSA Steps
    1. Build the document-term co-occurrence matrix (e.g., 1151 documents × 5793 terms)
    2. Compute the SVD
    3. Reduce dimensionality by keeping the k largest singular values
    4. Compute the new vector representations for the documents
    5. [Our research] Cluster the new context vectors
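Steps 1-4 can be sketched with NumPy on a toy matrix; the counts below are invented, and only the documents × terms orientation comes from the slides:

```python
import numpy as np

# Toy document-term count matrix: rows are documents, columns are terms
# (the slides use 1151 documents x 5793 terms; this is a tiny stand-in).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

# Full SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values (dimensionality reduction)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# New k-dimensional representations of the documents
docs_k = U_k * s_k          # shape: (n_documents, k)

# A new context (term-count vector) can be folded into the same space
q = np.array([1, 1, 0, 0], dtype=float)
q_k = q @ Vt_k.T / s_k      # k-dimensional context vector
```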
  • Slide 5
  • Context Vectors of an Ambiguous Word
    • Inducing senses of ambiguous words from their contextual similarity
    • Context Group Discrimination Paradigm [Schütze '98]
  • Slide 6
  • Context Group Discrimination Paradigm [Schütze '98]
    (figure: context vectors of Sense 1 and Sense 2, with a new point at distances a and b from the two centroids, a < b)
    1. Cluster the context vectors
    2. Compute the centroids (sense vectors)
    3. Classify new contexts based on distance to the centroids
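The three steps might look like this on made-up 2-D context vectors, with a minimal hand-rolled k-means standing in for the clustering step:

```python
import numpy as np

def kmeans(points, k, iters=20):
    """Plain k-means, initialized on the first k points (illustrative only)."""
    centroids = points[:k].copy()
    for _ in range(iters):
        # 1. Cluster: assign each context vector to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Compute the centroids ("sense vectors")
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return centroids, labels

# Toy LSA context vectors for one ambiguous word (two invented senses)
contexts = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.9]])
sense_vectors, labels = kmeans(contexts, k=2)

# 3. Classify a new context by its distance to the sense vectors
new_context = np.array([0.8, 0.2])
sense = int(np.linalg.norm(sense_vectors - new_context, axis=1).argmin())
```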
  • Slide 7
  • Experiments
  • Slide 8
  • Experimental Setup
    • Corpus [Leacock '93]:
      • Line (3 senses, 1151 instances)
      • Hard (2 senses, 752 instances)
      • Serve (2 senses, 1292 instances)
      • Interest (3 senses, 2113 instances)
    • Context size: full document (a small paragraph)
    • Number of clusters = number of senses
  • Slide 9
  • Research Objective
    • How well are the different senses of ambiguous words separated in the LSA-based vector space?
    • Parameters:
      • Dimensionality of the LSA representation
      • Distance measure: L1 (city block), L2 (squared Euclidean), cosine
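The three candidate measures, written out for a pair of toy context vectors (the vectors are illustrative only):

```python
import numpy as np

def l1(u, v):
    """City-block (Manhattan) distance."""
    return np.abs(u - v).sum()

def l2(u, v):
    """Squared Euclidean distance."""
    return ((u - v) ** 2).sum()

def cosine_dist(u, v):
    """1 - cosine similarity (0 for vectors pointing the same way)."""
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0, 0.0])
v = np.array([2.0, 4.0, 0.0])
# u and v point in the same direction, so their cosine distance is 0
# even though their L1 and L2 distances are not.
```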
  • Slide 10
  • Sense-based Clusters
    • An instance of supervised learning: clusters are built from the true sense tags
    • An upper bound on the unsupervised performance of K-means or EM
    • Not influenced by the choice of clustering algorithm
    (figure: best-case vs. worst-case separation)
  • Slide 11
  • Sense-based Clusters: Accuracy
    • Training: find sense vectors from 90% of the data
    • Testing: assign the remaining 10% of the data to the closest sense vector and evaluate by comparing this assignment to the sense tags
    • Random selection, cross-validation
  • Slide 12
  • Evaluating Clustering Quality: Tightness and Separation
    • Dispersion: intra-cluster tightness (what K-means minimizes)
    • Silhouette: combines intra-cluster tightness and inter-cluster separation
      • a(i): average distance of point i to all other points in the same cluster
      • b(i): average distance of point i to the points in the closest other cluster
      • s(i) = (b(i) - a(i)) / max(a(i), b(i))
  • Slide 13
  • More on the Silhouette Value
    (figure: point i with its own cluster and the closest other cluster; a(i) is the average of all blue lines, b(i) the average of all yellow lines)
    • s(i) near 1: points are perfectly clustered
    • s(i) near 0: points could belong to one cluster or another
    • s(i) below 0: points belong to the wrong cluster
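Assuming the standard silhouette formula s(i) = (b(i) - a(i)) / max(a(i), b(i)) built from these a(i) and b(i), a direct implementation could look like:

```python
import numpy as np

def silhouette(points, labels):
    """Per-point silhouette values s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None], axis=2)
    s = np.zeros(n)
    for i in range(n):
        same = (labels == labels[i])
        same[i] = False
        if not same.any():          # singleton cluster: s(i) left at 0
            continue
        a = dist[i, same].mean()    # avg distance within own cluster
        # b(i): smallest average distance to any other cluster
        b = min(dist[i, labels == c].mean()
                for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two tight, well-separated toy clusters -> silhouette values near 1
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
lab = np.array([0, 0, 1, 1])
avg_sil = silhouette(pts, lab).mean()
```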
  • Slide 14
  • Evaluating Clustering Quality: Tightness and Separation
    • Average silhouette values (the two sets of values shown on the slide):
      • Cosine: 0.9639 / -0.0876
      • L1: 0.7355 / -0.0504
      • L2: 0.9271 / -0.0879
  • Slide 15
  • Sense-based Clusters: Discrimination Accuracy
    • Baseline: percentage of the majority sense
  • Slide 16
  • Sense-based Clusters: Average Silhouette Value
  • Slide 17
  • Sense-based Clusters: Results
    • Good discrimination accuracy
    • Low silhouette value
    • How is that possible?
  • Slide 18
  • Unsupervised Learning with K-means (cosine measure)
    (figure legend: start randomly / most compact result / start with sense vectors; compared against sense-based clustering and training/testing)
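The slide's "start randomly / most compact result" strategy can be sketched as repeated random restarts, keeping the run with the smallest within-cluster dispersion (toy Gaussian data; plain Euclidean distance for brevity, where the slides use the cosine measure):

```python
import numpy as np

def kmeans_once(points, k, rng, iters=20):
    """One k-means run from a random initialization; returns the labels
    and the within-cluster dispersion (compactness) of the result."""
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():   # skip empty clusters
                centroids[j] = points[labels == j].mean(axis=0)
    dispersion = sum(((points[labels == j] - centroids[j]) ** 2).sum()
                     for j in range(k))
    return labels, dispersion

# Start randomly several times; keep the most compact result.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, (10, 2)),   # toy "sense 1" contexts
                 rng.normal(3.0, 0.1, (10, 2))])  # toy "sense 2" contexts
labels, dispersion = min((kmeans_once(pts, 2, rng) for _ in range(10)),
                         key=lambda run: run[1])
```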
  • Slide 19
  • Unsupervised Learning with K-means
  • Slide 20
  • Polysemes vs. Homonyms
    • Polysemes: words with multiple related meanings (e.g., "serve" a customer vs. "serve" food)
    • Homonyms: words with the same spelling but completely different meanings (e.g., river "bank" vs. financial "bank")
  • Slide 21
  • Pseudo Words as Homonyms [Schütze '98]
    • Original contexts:
      • find it hard to believe
      • exactly how to say a line
      • and about 30 minutes and serve warm
      • set the interest rate on the
    • With the target words merged into one pseudo word x:
      • find it x to believe
      • exactly how to say a x
      • and about 30 minutes and x warm
      • set the x rate on the
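Pseudo-word construction is a simple substitution: occurrences of several unrelated target words are merged into one artificial token, so the resulting "homonym" has known sense labels (the original word). A sketch, with the target list mirroring the slide's examples:

```python
import re

# Unrelated target words merged into a single artificial token "x"
targets = ["hard", "line", "serve", "interest"]
pseudo = "x"

def make_pseudo(sentence, targets=targets, token=pseudo):
    """Replace whole-word occurrences of any target with the pseudo word."""
    pattern = r"\b(" + "|".join(map(re.escape, targets)) + r")\b"
    return re.sub(pattern, token, sentence)

examples = [
    "find it hard to believe",
    "exactly how to say a line",
    "and about 30 minutes and serve warm",
    "set the interest rate on the",
]
merged = [make_pseudo(s) for s in examples]
# merged[0] == "find it x to believe"
```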
  • Slide 22
  • Polysemes vs. Homonyms: In LSA Space
    (figure: results over LSA dimensions for pseudo words; points on red lines are the most compact cluster out of 10 experiments)
    • The correlation between compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes
  • Slide 23
  • Conclusions
    • Good unsupervised sense discrimination performance for homonyms
    • Major deterioration in sense discrimination of polysemes in the absence of supervision
    • The benefit of dimensionality reduction is computational only (no peak in performance at lower dimensions)
    • The cosine measure performs better than L1 and L2