
Vectors and Semantics

Peter Turney

November 2008


Vision of the Future

• the future of SKDOU: from text to knowledge
  • input: the web
  • output: knowledge
  • beyond search: QA with unconstrained questions and answers
  • 24/7 continuous automatic learning from the web
• what will that knowledge look like? the default assumption:
  • a giant expert system
  • but generated automatically, with no hand-coding
• what will that knowledge look like? my opinion:
  • expert systems are missing something vital
  • expert systems are not a sufficient representation of knowledge
  • we need vectors


Outline

• symbolic versus spatial approaches to knowledge
  • logic versus geometry
• term-document matrix
  • latent semantic analysis; applications
• pair-pattern matrix
  • latent relational analysis; applications
• episodic versus semantic
  • some hypotheses about vectors and semantics
• conclusions
  • how to acquire knowledge; how to represent knowledge



Symbolic AI

• the symbolic approach to knowledge
  • logic, propositional calculus, graph theory, set theory, ...
  • GOFAI: good old-fashioned AI
• benefits
  • good for deduction: reasoning about entailment and consistency
  • crisp, clean, binary-valued
  • good for yes/no questions: does A entail B?
• costs
  • not so good for induction: learning theories from data
  • aliasing: noise due to analog-to-digital conversion
  • not good for questions about similarity: how similar is A to B?

Symbolic versus Spatial (1 of 3)


Spatial AI

• the spatial approach to knowledge
  • vector spaces, linear algebra, geometry, ...
  • machine learning, statistics, feature spaces, information retrieval
• benefits
  • good for induction: learning theories from data
  • fuzzy, analog, real-valued
  • good for questions about similarity: similarity(A, B) = cosine(A, B)
• costs
  • not so good for deduction, entailment, consistency
  • messy: lots of numbers
  • not convenient for communication: language is digital

Symbolic versus Spatial (2 of 3)


Symbolic vs Spatial

• we need to combine the symbolic and spatial approaches
  • symbolic for communication and entailment
  • spatial for similarity and learning
• reference
  • Peter Gärdenfors. (2000). Conceptual Spaces: The Geometry of Thought. MIT Press.

Symbolic versus Spatial (3 of 3)



Technicalities

• weighting the elements
  • give more weight to element (i, j) when term ti is surprisingly frequent in document dj
  • tf-idf: term frequency times inverse document frequency
  • hundreds of variations of tf-idf
• smoothing the matrix
  • addresses the problem of sparsity with a small corpus
  • Singular Value Decomposition (SVD), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Nonnegative Matrix Factorization (NMF), ...
• comparing the vectors
  • many ways to compare two vectors: cosine, Jaccard, Euclidean, Dice, correlation, Hamming, ...
  • (a minimal code sketch of all three steps follows this slide)

Term-Document Matrix (2 of 9)
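Below is a minimal sketch of the three steps over a toy term-document matrix, using plain numpy. The tiny corpus, the particular tf-idf variant, and the rank-2 truncation are illustrative assumptions, not the settings of any system cited in these slides.

    # Minimal term-document pipeline: tf-idf weighting, SVD smoothing, cosine.
    import numpy as np

    docs = ["the mason cut the stone",
            "the carpenter cut the wood",
            "the stone was heavy"]
    vocab = sorted(set(w for d in docs for w in d.split()))

    # Raw term-document counts: rows are terms, columns are documents.
    A = np.array([[d.split().count(t) for d in docs] for t in vocab],
                 dtype=float)

    # tf-idf weighting (one of the many variants mentioned above).
    tf = A / A.sum(axis=0, keepdims=True)
    df = (A > 0).sum(axis=1, keepdims=True)
    idf = np.log(len(docs) / df)
    X = tf * idf

    # Smoothing by truncated SVD (the LSA step): keep the top k singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    X_smooth = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Compare two documents (columns) in the smoothed space.
    print(cosine(X_smooth[:, 0], X_smooth[:, 1]))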


Information Retrieval

• how similar is document d1 to document d2?
  • cosine of the angle between the d1 and d2 column vectors of the matrix
• how relevant is document d to query q?
  • make a pseudo-document vector to represent q
  • cosine of the angle between d and q (a sketch follows this slide)

• references
  • Gerard Salton and Michael J. McGill. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
  • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407.

Term-Document Matrix (3 of 9)
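One way to realize the pseudo-document idea is the standard LSA fold-in, sketched below; this implementation detail is my assumption, not something stated on the slide. The sketch reuses vocab, X, U, s, k, and cosine from the previous sketch.

    # Rank documents for a query by folding the query into the LSA space.
    import numpy as np

    def query_vector(query):
        # Pseudo-document: raw counts over the same vocabulary.
        words = query.split()
        return np.array([words.count(t) for t in vocab], dtype=float)

    def fold_in(v):
        # Project a term-space vector into the k-dimensional LSA space.
        return v @ U[:, :k] / s[:k]

    docs_lsa = [fold_in(X[:, j]) for j in range(X.shape[1])]
    q_lsa = fold_in(query_vector("stone mason"))
    scores = [cosine(q_lsa, d) for d in docs_lsa]
    print(sorted(enumerate(scores), key=lambda p: -p[1]))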


Word Similarity

• how similar is term t1 to term t2?
  • cosine of the angle between the t1 and t2 row vectors of the matrix
• evaluation on TOEFL multiple-choice synonym questions (a sketch follows this slide)
  • 92.5%: the highest score of any pure (non-hybrid) algorithm
  • 64.5%: the score of the average human

• references
  • Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
  • Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Term-Document Matrix (4 of 9)
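As a toy illustration of the TOEFL setup (not the cited systems themselves), the chooser below picks the answer word whose row vector is closest to the stem word's row vector; term_vectors is an assumed dict from words to such row vectors.

    # Toy TOEFL-style chooser: pick the choice most similar to the stem.
    # term_vectors is assumed: a dict mapping each word to its row vector.
    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def choose_synonym(stem, choices, term_vectors):
        return max(choices, key=lambda c: cosine(term_vectors[stem],
                                                 term_vectors[c]))

    # Example question (the well-known "levied" TOEFL item):
    # choose_synonym("levied",
    #                ["imposed", "believed", "requested", "correlated"],
    #                term_vectors)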


Essay Grading

• grading student essays
  • latent semantic analysis
  • commercial product: Pearson Knowledge Technologies
• references
  • Rehder, B., Schreiner, M.E., Wolfe, M.B., Laham, D., Landauer, T.K., and Kintsch, W. (1998). Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25, 337-354.
  • Foltz, P.W., Laham, D., and Landauer, T.K. (1999). Automated essay scoring: Applications to educational technology. Proceedings of the ED-MEDIA '99 Conference, Association for the Advancement of Computing in Education, Charlottesville.

Term-Document Matrix (5 of 9)


Textual Cohesion

• measuring textual cohesion
  • latent semantic analysis
• reference
  • Foltz, P.W., Kintsch, W., and Landauer, T.K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.

Term-Document Matrix (6 of 9)


Semantic Orientation

• measuring praise and criticism
  • latent semantic analysis
  • a small set of positive and negative reference words
    • positive: good, nice, excellent, positive, fortunate, correct, and superior
    • negative: bad, nasty, poor, negative, unfortunate, wrong, and inferior
  • the semantic orientation of a word X is the sum of the similarities of X with the positive reference words, minus the sum of the similarities of X with the negative reference words (a sketch follows this slide)
• reference
  • Turney, P.D., and Littman, M.L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346.

Term-Document Matrix (7 of 9)
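A minimal sketch of that scoring rule, assuming the same kind of term_vectors dict as in the earlier sketches; the reference words are the ones listed on the slide.

    # Semantic orientation: sum of similarities to the positive reference
    # words minus sum of similarities to the negative reference words.
    import numpy as np

    POSITIVE = ["good", "nice", "excellent", "positive", "fortunate",
                "correct", "superior"]
    NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate",
                "wrong", "inferior"]

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def semantic_orientation(word, term_vectors):
        v = term_vectors[word]
        pos = sum(cosine(v, term_vectors[w]) for w in POSITIVE)
        neg = sum(cosine(v, term_vectors[w]) for w in NEGATIVE)
        return pos - neg  # > 0 suggests praise, < 0 suggests criticism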


Logic

• logical operations can be performed with linear algebra
  • t1 OR t2 = the vector space spanned by t1 and t2
  • t1 NOT t2 = the projection of t1 onto the subspace orthogonal to t2
  • bass NOT fisherman = bass in the sense of a musical instrument, not bass in the sense of a fish (a sketch of NOT follows this slide)
• reference
  • Dominic Widdows. (2004). Geometry and Meaning. CSLI Publications.

Term-Document Matrix (8 of 9)
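A minimal sketch of the NOT operation under that definition: subtract from a its component along b, leaving the part of a orthogonal to b. The two-dimensional vectors are stand-ins, not vectors from a real corpus.

    # a NOT b: project a onto the subspace orthogonal to b.
    import numpy as np

    def vector_not(a, b):
        b_hat = b / np.linalg.norm(b)
        return a - (a @ b_hat) * b_hat  # subtract the component along b

    # Toy check: the result is orthogonal to b.
    a = np.array([3.0, 1.0])
    b = np.array([1.0, 0.0])
    print(vector_not(a, b))      # [0. 1.]
    print(vector_not(a, b) @ b)  # 0.0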


Summary

• applications of a term-document (word-chunk) matrix
  • information retrieval
  • measuring word similarity
  • essay grading
  • textual cohesion
  • semantic orientation
  • logic

Term-Document Matrix (9 of 9)



Pair-Pattern Matrix

• pair-pattern matrix
  • rows correspond to pairs of words: X:Y = mason:stone
  • columns correspond to patterns: "X works with Y"
  • an element corresponds to the frequency of the given pattern in a corpus, when the variables in the pattern are instantiated with the words of the given pair: "mason works with stone"
  • a row vector gives the distribution of the patterns in which the given pair appears: a signature of the semantic relation between mason and stone (a sketch follows this slide)

Pair-Pattern Matrix (1 of 8)
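A minimal sketch of how such a matrix might be assembled by counting pattern matches; the three-sentence corpus, the pairs, and the patterns are invented for illustration.

    # Build a tiny pair-pattern matrix: rows are word pairs, columns are
    # patterns; each cell counts pattern matches in the corpus.
    corpus = ["the mason works with stone all day",
              "a carpenter works with wood",
              "the mason cut the stone"]
    pairs = [("mason", "stone"), ("carpenter", "wood")]
    patterns = ["{X} works with {Y}", "{X} cut the {Y}"]

    matrix = [[sum(p.format(X=x, Y=y) in sent for sent in corpus)
               for p in patterns]
              for (x, y) in pairs]
    print(matrix)  # [[1, 1], [1, 0]]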


Technicalities

• exactly the same machinery as for term-document matrices
  • weighting the elements
  • smoothing the matrix
  • comparing the vectors
• many lessons carry over from term-document matrices
  • good weighting approaches
  • good smoothing algorithms
  • good formulas for comparing vectors

Pair-Pattern Matrix (2 of 8)


SAT Analogies

• the relational similarity of two pairs is the cosine of their two row vectors (a sketch of a chooser built on this follows the table below)
  • cosine(traffic:street, water:riverbed) = 0.692
• reference
  • Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1-3), 251-278.

Stem pair: traffic:street

Choices:               Cosine
(a) ship:gangplank     0.318
(b) crop:harvest       0.572
(c) car:garage         0.687
(d) pedestrians:feet   0.497
(e) water:riverbed     0.692

Pair-Pattern Matrix (3 of 8)
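As a toy illustration of answering such a question (not the cited system itself), the chooser below picks the choice pair whose pair-pattern row vector has the highest cosine with the stem pair's row vector; pair_vectors is an assumed dict from word pairs to row vectors.

    # Toy SAT-analogy chooser: rank choices by relational similarity.
    # pair_vectors is assumed: a dict mapping (X, Y) tuples to the
    # corresponding row vectors of a pair-pattern matrix.
    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def choose(stem, choices, pair_vectors):
        ranked = sorted(choices,
                        key=lambda c: cosine(pair_vectors[stem],
                                             pair_vectors[c]),
                        reverse=True)
        return ranked[0]

    # choose(("traffic", "street"),
    #        [("ship", "gangplank"), ("crop", "harvest"), ("car", "garage"),
    #         ("pedestrians", "feet"), ("water", "riverbed")], pair_vectors)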


Semantic Relations

• classify noun-modifier expressions according to their semantic relations
  • 600 noun-modifier expressions labeled with semantic relations
  • 30 classes or 5 classes:
    • Causality: "cold virus", "onion tear"
    • Temporality: "morning frost", "summer travel"
    • Spatial: "aquatic mammal", "west coast", "home remedy"
    • Participant: "dream analysis", "mail sorter", "blood donor"
    • Quality: "copper coin", "rice paper", "picture book"
  • a supervised nearest-neighbour algorithm using the cosine of row vectors (a sketch follows this slide)
• reference
  • Turney, P.D. (2006). Similarity of semantic relations. Computational Linguistics, 32(3), 379-416.

Pair-Pattern Matrix (4 of 8)
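A minimal sketch of the nearest-neighbour step, assuming labeled training pairs and the same kind of pair_vectors dict as above: a test pair takes the class of its most similar training pair.

    # 1-nearest-neighbour classification of word pairs by cosine.
    # pair_vectors is assumed: a dict from (X, Y) pairs to row vectors.
    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def classify(pair, training, pair_vectors):
        # training: a list of ((X, Y), label) examples.
        nearest = max(training,
                      key=lambda ex: cosine(pair_vectors[pair],
                                            pair_vectors[ex[0]]))
        return nearest[1]

    # training = [(("cold", "virus"), "Causality"),
    #             (("morning", "frost"), "Temporality"), ...]
    # classify(("onion", "tear"), training, pair_vectors)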


Synonyms vs Antonyms

• ESL synonym versus antonym questions
  • a language test for students of English as a Second Language
  • 136 synonym versus antonym questions:
    • dissimilarity - resemblance: syn or ant? (ant)
    • naive - callow: syn or ant? (syn)
    • commend - denounce: syn or ant? (ant)
    • expose - camouflage: syn or ant? (ant)
    • galling - irksome: syn or ant? (syn)
  • two-class supervised learning using row vectors
• reference
  • Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

Pair-Pattern Matrix (5 of 8)


Similar vs Associated

• similar versus associated word pairs
  • 3 × 48 = 144 word pairs
  • 3 classes: similar, associated, both
    • Similar: table-bed, music-art, flea-ant
    • Associated: cradle-baby, mug-beer, mold-bread
    • Both: ale-beer, uncle-aunt, ball-bat
  • a three-class supervised learning problem using row vectors
• reference
  • Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

Pair-Pattern Matrix (6 of 8)


Systematic Analogies

• analogical mapping between sets of terms
  • a mapping from the solar system to the atom
• reference
  • submitted but not yet published

Source A        Mapping M    Target B
solar system    →            atom
sun             →            nucleus
planet          →            electron
mass            →            charge
attracts        →            attracts
revolves        →            revolves
gravity         →            electromagnetism

Pair-Pattern Matrix (7 of 8)


Summary

• applications of a pair-pattern matrix
  • proportional analogies
  • semantic relations
  • synonyms versus antonyms
  • similar versus associated
  • systematic analogies

Pair-Pattern Matrix (8 of 8)



Episodic vs Semantic

• episodic memory is memory of a specific event in one's personal past
  • I remember when I first went hang gliding
  • I remember when I saw the Great Pyramid of Giza
• semantic memory is memory of basic facts and concepts, unrelated to any specific event in one's personal past
  • I remember that the speed of light in a vacuum is approximately 3 × 10^8 meters per second
  • I remember that a tesseract is a four-dimensional hypercube composed of eight three-dimensional cubes
• the distinction comes from cognitive psychology
  • both are types of explicit (declarative) memory, as opposed to implicit (procedural) memory

Episodic vs Semantic (1 of 4)


Episodic vs Semantic

• ACE Local Relation Detection and Recognition (LRDR) task
  • "George Bush traveled to France on Thursday for a summit."
  • there is a Physical.Located relation between George Bush and France
  • extraction of episodic information from a sentence
• Noun-Modifier Classification task
  • acquisition of semantic knowledge from a corpus
  • Causality: "cold virus", "onion tear"
  • Temporality: "morning frost", "summer travel"
  • Spatial: "aquatic mammal", "west coast", "home remedy"
  • Participant: "dream analysis", "mail sorter", "blood donor"
  • Quality: "copper coin", "rice paper", "picture book"

Episodic vs Semantic (2 of 4)


Posterior vs Prior

• posterior probability versus prior probability
  • R(X,Y) = X and Y have relation R
  • S(X,Y) = X and Y occur in sentence S
  • prior probability = prob(R(X,Y)) = semantic
  • posterior probability = prob(R(X,Y) | S(X,Y)) = episodic
• ACE Local Relation Detection and Recognition (LRDR) task
  • R(X,Y) = there is a Physical.Located relation between George Bush and France
  • S(X,Y) = "George Bush traveled to France on Thursday for a summit."
• Noun-Modifier Classification task
  • R(X,Y) = there is a Spatial relation between aquatic and mammal

Episodic vs Semantic (3 of 4)



Knowledge Representation

• we need a spatial representation
  • for measuring similarity
  • for estimating probabilities
• we need a symbolic representation
  • for reasoning about entailment
  • for communication
    • input text and output text
    • language is symbolic

Conclusions (1 of 4)


Knowledge Acquisition

• the spatial approach is able to acquire knowledge from text
  • term-document matrix: information retrieval, measuring word similarity, essay grading, textual cohesion, semantic orientation, logic
  • pair-pattern matrix: proportional analogies, semantic relations, synonyms versus antonyms, similar versus associated, systematic analogies

Conclusions (2 of 4)


Knowledge Use

• symbolic representation
  • useful for input and output
  • compact storage, if aliasing is tolerable
  • useful for logical reasoning and entailment
• spatial representation
  • useful for calculating similarity
  • useful for calculating probability
  • case-based reasoning, analogical reasoning
  • learning

Conclusions (3 of 4)


Conclusion

• Information Extraction has focused on episodic information
  • IE: NER, MUC, ACE, etc.
  • episodic: posterior
  • representation is symbolic
• Vector Space Models have focused on semantic information
  • VSM: IR, LSA, LRA, cosine, etc.
  • semantic: prior
  • representation is spatial
• we need to combine the two
  • IE can use prior information from VSM
  • VSM can use posterior information from IE

Conclusions (4 of 4)