Vector-Space (Distributional) Lexical Semantics


Page 1: Vector-Space (Distributional) Lexical Semantics

Vector-Space (Distributional) Lexical Semantics

Page 2: Vector-Space (Distributional) Lexical Semantics

Information Retrieval System

[Diagram: an IR system takes a query string and a document corpus as input and returns a ranked list of documents: 1. Doc1, 2. Doc2, 3. Doc3, …]

Page 3: Vector-Space (Distributional) Lexical Semantics

The Vector-Space Model
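A sketch of the standard formulation: each document d_j (and each query q) is represented as a vector of term weights over an n-term vocabulary:

$$\vec{d_j} = (w_{1j}, w_{2j}, \ldots, w_{nj})$$

The following slides define the weights $w_{ij}$ via term frequency and inverse document frequency.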

Page 4: Vector-Space (Distributional) Lexical Semantics

Graphic Representation

Page 5: Vector-Space (Distributional) Lexical Semantics

Term Weights: Term Frequency
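A sketch of the common definition (the exact variant used in the deck is an assumption): the raw frequency $f_{ij}$ of term $i$ in document $j$, often normalized by the count of the document's most frequent term so that long documents are not favored:

$$tf_{ij} = \frac{f_{ij}}{\max_i f_{ij}}$$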

Page 6: Vector-Space (Distributional) Lexical Semantics

Term Weights: Inverse Document Frequency
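Again sketching the standard definition: terms that appear in fewer documents are more discriminative, so term $i$ is weighted by the inverse of its document frequency $df_i$ over a collection of $N$ documents:

$$idf_i = \log\!\left(\frac{N}{df_i}\right)$$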

Page 7: Vector-Space (Distributional) Lexical Semantics

TF-IDF Weighting
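The standard combination of the two weights: a term scores high in a document when it is frequent in that document but rare across the collection:

$$w_{ij} = tf_{ij} \cdot idf_i = tf_{ij} \log\!\left(\frac{N}{df_i}\right)$$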

Page 8: Vector-Space (Distributional) Lexical Semantics

Similarity Measure

Page 9: Vector-Space (Distributional) Lexical Semantics

Cosine Similarity Measure
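The usual definition: similarity is the cosine of the angle between the two weight vectors, which normalizes away vector length:

$$\cos(\vec{d_j}, \vec{q}) = \frac{\vec{d_j} \cdot \vec{q}}{\lvert\vec{d_j}\rvert\,\lvert\vec{q}\rvert} = \frac{\sum_{i=1}^{n} w_{ij}\, w_{iq}}{\sqrt{\sum_{i=1}^{n} w_{ij}^2}\,\sqrt{\sum_{i=1}^{n} w_{iq}^2}}$$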

Page 10: Vector-Space (Distributional) Lexical Semantics

Vector-Space (Distributional)Lexical Semantics

• Represent word meanings as points (vectors) in a (high-dimensional) Euclidean space.

• Dimensions encode aspects of the context in which the word appears (e.g., how often it co-occurs with another specific word).
– “You shall know a word by the company it keeps.” (J.R. Firth, 1957)

• Semantic similarity is defined as distance between points in this semantic space.


Page 11: Vector-Space (Distributional) Lexical Semantics

Sample Lexical Vector Space

[2-D illustration: words plotted so that related words fall near one another, e.g. dog/cat, man/woman, bottle/cup, water, rock, computer/robot.]

Page 12: Vector-Space (Distributional) Lexical Semantics

Simple Word Vectors

• For a given target word, w, create a bag-of-words “document” of all of the words that co-occur with the target word in a large corpus.
– Window of k words on either side.
– All words in the sentence, paragraph, or document.

• For each word, create a (tf-idf weighted) vector from the “document” for that word.

• Compute semantic relatedness of words as the cosine similarity of their vectors.
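A minimal sketch of this pipeline in Python, assuming a toy three-sentence corpus and a window of k = 2 (both illustrative, not from the slides):

```python
import math
from collections import Counter, defaultdict

corpus = [
    "the dog chased the cat".split(),
    "the cat drank water from the cup".split(),
    "the man gave the woman a bottle of water".split(),
]

k = 2  # window: k words on either side of the target

# 1. One bag-of-words "document" per target word.
context_docs = defaultdict(Counter)
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - k), min(len(sent), i + k + 1)):
            if j != i:
                context_docs[w][sent[j]] += 1

# 2. tf-idf weight each context "document".
n_docs = len(context_docs)
df = Counter()
for doc in context_docs.values():
    for feature in doc:
        df[feature] += 1

def tfidf(doc):
    max_tf = max(doc.values())
    return {f: (tf / max_tf) * math.log(n_docs / df[f]) for f, tf in doc.items()}

vectors = {w: tfidf(doc) for w, doc in context_docs.items()}

# 3. Cosine similarity between sparse vectors.
def cosine(u, v):
    dot = sum(u[f] * v.get(f, 0.0) for f in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["bottle"]))
```

The sparse dictionaries mirror practice at scale: each word co-occurs with only a tiny fraction of the vocabulary, so dense vectors would be mostly zeros.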

Page 13: Vector-Space (Distributional) Lexical Semantics

Other Contextual Features

• Use syntax to move beyond simple bag-of-words features.

• Produce typed (edge-labeled) dependency parses for each sentence in a large corpus.

• For each target word, produce features for it having specific dependency links to specific other words (e.g., subj=dog, obj=food, mod=red).
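A hedged sketch of such feature extraction using spaCy (the slides do not name a parser; spaCy and its en_core_web_sm model are assumptions here):

```python
import spacy
from collections import Counter, defaultdict

nlp = spacy.load("en_core_web_sm")  # assumed model; installed separately

features = defaultdict(Counter)
sentences = ["The dog ate the red food.", "The boss fired the secretary."]
for sent in sentences:
    for tok in nlp(sent):
        if tok.dep_ in ("nsubj", "dobj", "amod"):
            # Record the feature on the head word,
            # e.g. features["ate"]["nsubj=dog"] += 1
            features[tok.head.text][f"{tok.dep_}={tok.text}"] += 1

print(dict(features))
```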


Page 14: Vector-Space (Distributional) Lexical Semantics

Other Feature Weights

• Replace TF-IDF with other feature weights.

• Pointwise mutual information (PMI) between the target word, w, and a given feature, f:

$$PMI(f, w) = \log \frac{P(f, w)}{P(f)\, P(w)}$$
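A minimal sketch of computing PMI weights from co-occurrence counts (the toy counts are made up for illustration):

```python
import math
from collections import Counter

# Toy (feature, word) co-occurrence counts -- made up for illustration.
pair_counts = Counter({
    ("wooden", "bat"): 8, ("flew", "bat"): 5,
    ("wooden", "cup"): 1, ("drank", "cup"): 6,
})
feature_counts, word_counts = Counter(), Counter()
for (f, w), c in pair_counts.items():
    feature_counts[f] += c
    word_counts[w] += c
total = sum(pair_counts.values())

def pmi(f, w):
    p_fw = pair_counts[(f, w)] / total
    return math.log(p_fw / ((feature_counts[f] / total) * (word_counts[w] / total)))

print(pmi("wooden", "bat"))  # positive: co-occur more than chance predicts
print(pmi("wooden", "cup"))  # negative: co-occur less than chance predicts
```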

Page 15: Vector-Space (Distributional) Lexical Semantics

Dimensionality Reduction

• Word-based features result in extremely high-dimensional spaces that can easily result in over-fitting.

• Reduce the dimensionality of the space by using various mathematical techniques to create a smaller set of k new dimensions that best account for the variance in the data.
– Singular Value Decomposition (SVD), used in Latent Semantic Analysis (LSA)
– Principal Component Analysis (PCA)
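A minimal sketch of the SVD route with NumPy (the matrix and k = 2 are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(1.0, (6, 20)).astype(float)  # 6 words x 20 context features

k = 2  # target dimensionality (an assumption)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * S[:k]  # each row: one word in the reduced space

print(word_vecs.shape)  # (6, 2)
```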


Page 16: Vector-Space (Distributional) Lexical Semantics

Sample Dimensionality Reduction


Page 17: Vector-Space (Distributional) Lexical Semantics

Sample Dimensionality Reduction


Page 18: Vector-Space (Distributional) Lexical Semantics

Evaluation of Vector-Space Lexical Semantics

• Have humans rate the semantic similarity of a large set of word pairs.
– (dog, canine): 10; (dog, cat): 7; (dog, carrot): 3; (dog, knife): 1

• Compute the vector-space similarity of each pair.

• Compute the correlation coefficient (Pearson or Spearman) between the human and machine ratings.
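A minimal sketch of the correlation step with SciPy; the model scores are hypothetical placeholders paired with the slide's example ratings:

```python
from scipy.stats import pearsonr, spearmanr

# Human ratings for the slide's example pairs, paired with hypothetical
# cosine similarities from a vector-space model.
human = [10, 7, 3, 1]             # (dog,canine) (dog,cat) (dog,carrot) (dog,knife)
model = [0.91, 0.74, 0.22, 0.05]  # made-up model scores

print(pearsonr(human, model))   # (coefficient, p-value)
print(spearmanr(human, model))
```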


Page 19: Vector-Space (Distributional) Lexical Semantics

TOEFL Synonymy Test

• LSA has been shown to be able to pass the TOEFL synonymy test.


Page 20: Vector-Space (Distributional) Lexical Semantics

Vector-Space Word Sense Induction

• Create a context vector for each individual occurrence of the target word, w.

• Cluster these vectors into k groups.

• Assume each group represents a “sense” of the word, and compute a vector for this sense by taking the mean of each cluster.
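A minimal sketch of the clustering step with scikit-learn's KMeans; the 2-D occurrence vectors and k = 2 are toy assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D context vectors for occurrences of "bat" (assumed values).
occurrences = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.15],  # baseball-like contexts
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85],  # animal-like contexts
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(occurrences)
print(km.labels_)           # cluster ("sense") assignment per occurrence
print(km.cluster_centers_)  # mean vector of each cluster = sense vector
```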


Page 21: Vector-Space (Distributional) Lexical Semantics

Sample Word Sense Induction

Occurrence vectors for “bat”

[2-D illustration: individual occurrences of “bat” plotted by context, labeled with context words such as hit, wooden, player, baseball, flew, cave, vampire, ate.]

Page 22: Vector-Space (Distributional) Lexical Semantics

Sample Word Sense Induction

Occurrence vectors for “bat”

[The same illustration with two cluster centroids (+) marking the two induced senses.]

Page 23: Vector-Space (Distributional) Lexical Semantics

Vector-Space Word Meaning in Context

• Compute a semantic vector for an individual occurrence of a word based on its context.

• Combine a standard vector for a word with vectors representing the immediate context.


Page 24: Vector-Space (Distributional) Lexical Semantics

Example Using Dependency Context

[Dependency parses: “fired” with nsubj = hunter and dobj = gun; “fired” with nsubj = boss and dobj = secretary.]

• Compute vector for nsubj-boss by summing contextual vectors for all word occurrences that have “boss” as a subject.

• Compute vector for dobj-secretary by summing contextual vectors for all word occurrences that have “secretary” as a direct object.

• Compute the “in context” vector for “fire” in “boss fired secretary” by adding the nsubj-boss and dobj-secretary vectors to the general vector for “fire”.
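A minimal sketch of this composition step; all vectors are toy placeholders:

```python
import numpy as np

# Toy vectors (assumed values). nsubj_boss would really be the sum of
# contextual vectors of all word occurrences that have "boss" as their
# subject, and dobj_secretary likewise for "secretary" as direct object.
fire_general = np.array([0.5, 0.5, 0.5])
nsubj_boss = np.array([0.9, 0.1, 0.0])
dobj_secretary = np.array([0.8, 0.2, 0.1])

fire_in_context = fire_general + nsubj_boss + dobj_secretary
print(fire_in_context)  # contextualized vector for "fire" in "boss fired secretary"
```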

Page 25: Vector-Space (Distributional) Lexical Semantics

Conclusions

• A word’s meaning can be represented as a vector that encodes distributional information about the contexts in which the word tends to occur.

• Lexical semantic similarity can be judged by comparing vectors (e.g. cosine similarity).

• Vector-based word senses can be automatically induced by clustering contexts.

• Contextualized vectors for word meaning can be constructed by combining lexical and contextual vectors.
