Lecture 21 Computational Lexical Semantics Topics Topics Features in NLTK III Computational Lexical Semantics Semantic Web USC Readings: Readings: NLTK book Chapter 10 Text Chapters 20 April 3, 2013 CSCE 771 Natural Language Processing

Upload: laurence-malone

Post on 05-Jan-2016


Lecture 21: Computational Lexical Semantics

Topics
Features in NLTK III
Computational Lexical Semantics
Semantic Web USC

Readings:
NLTK book Chapter 10
Text Chapters 20

April 3, 2013

CSCE 771 Natural Language Processing


– 2 – CSCE 771 Spring 2013

Overview

Last Time (Programming)
Features in NLTK
NL queries to SQL
NLTK support for Interpretations and Models
Propositional and predicate logic support
Prover9

Today
Last lecture's slides 25-29
Features in NLTK
Computational Lexical Semantics

Readings:
Text 19, 20
NLTK Book: Chapter 10

Next Time: Computational Lexical Semantics II


Model Building in NLTK - Chapter 10 continued

Mace model builder

import nltk
lp = nltk.LogicParser()
# install Mace4
config_mace4(r'c:\Python26\Lib\site-packages\prover9')
a3 = lp.parse('exists x.(man(x) & walks(x))')
c1 = lp.parse('mortal(socrates)')
c2 = lp.parse('-mortal(socrates)')
mb = nltk.Mace(5)
print mb.build_model(None, [a3, c1])
True
print mb.build_model(None, [a3, c2])
True
print mb.build_model(None, [c1, c2])
False

>>> a4 = lp.parse('exists y. (woman(y) & all x. (man(x) -> love(x,y)))')
>>> a5 = lp.parse('man(adam)')
>>> a6 = lp.parse('woman(eve)')
>>> g = lp.parse('love(adam,eve)')
>>> mc = nltk.MaceCommand(g, assumptions=[a4, a5, a6])
>>> mc.build_model()
True


10.4   The Semantics of English Sentences

Principle of compositionality: the meaning of a whole is a function of the meanings of its parts and of the way they are syntactically combined.


Representing the λ-Calculus in NLTK

(33)
a. (walk(x) ∧ chew_gum(x))
b. λx.(walk(x) ∧ chew_gum(x))
c. \x.(walk(x) & chew_gum(x)) -- the NLTK way!


Lambda0.py

import nltk
from nltk import load_parser
lp = nltk.LogicParser()
e = lp.parse(r'\x.(walk(x) & chew_gum(x))')
print e
\x.(walk(x) & chew_gum(x))
e.free()
print lp.parse(r'\x.(walk(x) & chew_gum(y))')
\x.(walk(x) & chew_gum(y))


Simple β-reductions

>>> e = lp.parse(r'\x.(walk(x) & chew_gum(x))(gerald)')
>>> print e
\x.(walk(x) & chew_gum(x))(gerald)
>>> print e.simplify()
(walk(gerald) & chew_gum(gerald))
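To make the β-reduction step concrete without depending on NLTK, here is a minimal sketch that applies a λ-abstraction to an argument by substitution. The tuple encoding and the names `substitute` and `beta_reduce` are illustrative assumptions, not NLTK's representation or API, and the sketch does not rename bound variables to avoid capture (which NLTK's `simplify()` does).

```python
# Toy lambda-calculus terms as nested tuples (a hypothetical encoding, not NLTK's):
#   ("lam", var, body)   abstraction   \var.body
#   ("app", fn, arg)     application   fn(arg)
#   (name, arg, ...)     predicate or connective, e.g. ("walk", "x")
#   "x"                  a variable or constant


def substitute(term, var, value):
    """Replace free occurrences of var in term with value.

    Caveat: bound variables are never renamed, so this is only safe when
    value contains no variable that is bound inside term.
    """
    if isinstance(term, str):
        return value if term == var else term
    if term[0] == "lam":
        _, bound, body = term
        if bound == var:          # var is re-bound here, so stop substituting
            return term
        return ("lam", bound, substitute(body, var, value))
    head, args = term[0], term[1:]
    return (head,) + tuple(substitute(a, var, value) for a in args)


def beta_reduce(term):
    """Perform one beta-reduction step at the top of the term, if possible."""
    if isinstance(term, tuple) and term[0] == "app":
        _, fn, arg = term
        if isinstance(fn, tuple) and fn[0] == "lam":
            _, var, body = fn
            return substitute(body, var, arg)
    return term


# \x.(walk(x) & chew_gum(x))(gerald)
expr = ("app",
        ("lam", "x", ("and", ("walk", "x"), ("chew_gum", "x"))),
        "gerald")
reduced = beta_reduce(expr)
print(reduced)  # ('and', ('walk', 'gerald'), ('chew_gum', 'gerald'))
```

The result mirrors the simplified NLTK expression: every free `x` in the body is replaced by `gerald`.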


Predicate reductions

>>> e3 = lp.parse(r'\P.exists x.P(x)(\y.see(y, x))')
>>> print e3
(\P.exists x.P(x))(\y.see(y,x))
>>> print e3.simplify()
exists z1.see(z1,x)


Figure 19.7 Inheritance of Properties

exists e,x,y Eating(e) ∧ Agent(e, x) ∧ Theme(e, y)

“hamburger edible?” from WordNet

Copyright ©2009 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved.
Speech and Language Processing, Second Edition, Daniel Jurafsky and James H. Martin


Chapter 20 – Word Sense Disambiguation (WSD)

Figure 20.1 Possible sense tags for bass

Machine translation
Supervised vs unsupervised learning
Semantic concordance – corpus with words tagged with sense tags


Feature Extraction for WSD

Feature vectors

Collocation
$[w_{i-2}, \mathrm{POS}_{i-2}, w_{i-1}, \mathrm{POS}_{i-1}, w_i, \mathrm{POS}_i, w_{i+1}, \mathrm{POS}_{i+1}, w_{i+2}, \mathrm{POS}_{i+2}]$

Bag-of-words – unordered set of neighboring words
Represent sets of most frequent content words with membership vector
[0,0,1,0,0,0,1] – set of 3rd and 7th most freq. content words

Window of nearby words/features
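The two feature types above can be sketched in a few lines of Python. This is only an illustration: the tagged sentence, the `<pad>` marker for positions outside the sentence, and the seven-word vocabulary are assumptions chosen so the membership vector matches the one on the slide.

```python
def collocational_features(tagged, i, window=2):
    """Return [w_{i-2}, POS_{i-2}, ..., w_{i+2}, POS_{i+2}] around position i.

    Positions that fall outside the sentence are padded with "<pad>".
    """
    feats = []
    for k in range(i - window, i + window + 1):
        if 0 <= k < len(tagged):
            word, pos = tagged[k]
        else:
            word, pos = "<pad>", "<pad>"
        feats.extend([word, pos])
    return feats


def bag_of_words_vector(neighbors, vocabulary):
    """Membership vector: 1 if the vocabulary word occurs among the neighbors."""
    present = set(neighbors)
    return [1 if w in present else 0 for w in vocabulary]


# Hand-tagged toy sentence; the target word "bass" is at index 4.
tagged = [("an", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB")]
target = 4

colloc = collocational_features(tagged, target)
print(colloc)

# Hypothetical list of the most frequent content words seen near "bass".
vocabulary = ["fishing", "big", "guitar", "sound", "fly", "rod", "player"]
neighbors = [w for k, (w, _) in enumerate(tagged) if k != target]
bow = bag_of_words_vector(neighbors, vocabulary)
print(bow)  # [0, 0, 1, 0, 0, 0, 1]
```

The collocational vector keeps word order and part-of-speech information, while the bag-of-words vector only records which vocabulary words occur in the window.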


Naïve Bayes Classifier

w – word vector
s – sense tag vector
f – feature vector $[w_i, \mathrm{POS}_i]$ for $i = 1, \dots, n$

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid f)$$

Approximate by frequency counts
But how practical?


Looking for Practical formula

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid f) = \operatorname*{argmax}_{s \in S} \frac{P(f \mid s)\, P(s)}{P(f)}$$

Still not practical


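Naïve == Assume Independence

$$P(f \mid s) \approx \prod_{j=1}^{n} P(f_j \mid s)$$

Now practical, but realistic?

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s)$$

($P(f)$ is the same for every sense, so it drops out of the argmax.)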


Training = count frequencies

Maximum likelihood estimator (20.8)

$$P(f_j \mid s) = \frac{\mathrm{count}(f_j, s)}{\mathrm{count}(s)}$$

$$P(s_i) = \frac{\mathrm{count}(s_i, w_j)}{\mathrm{count}(w_j)}$$
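The counting estimates above translate almost directly into code. The following is a minimal sketch for a single target word, assuming each training example is a set of context features paired with a sense label. The toy data and sense names are invented, and add-one smoothing is added as an assumption (it is not on the slide) so that unseen features do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count senses and (feature, sense) pairs from (feature_set, sense) examples."""
    sense_count = Counter()
    feature_count = defaultdict(Counter)
    for features, sense in examples:
        sense_count[sense] += 1
        for f in features:
            feature_count[sense][f] += 1
    return sense_count, feature_count


def classify(features, sense_count, feature_count):
    """Return argmax_s P(s) * prod_j P(f_j | s), computed in log space."""
    total = sum(sense_count.values())      # count(w_j): all examples of the word
    best_sense, best_score = None, float("-inf")
    for sense, n_sense in sense_count.items():
        score = math.log(n_sense / total)  # log P(s)
        for f in features:
            # Add-one smoothing (an assumption, not on the slide) avoids log(0).
            p = (feature_count[sense][f] + 1) / (n_sense + 2)
            score += math.log(p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Invented training data for the target word "bass".
examples = [
    ({"fishing", "rod"}, "bass/fish"),
    ({"river", "caught"}, "bass/fish"),
    ({"guitar", "player"}, "bass/music"),
    ({"band", "guitar"}, "bass/music"),
]
sense_count, feature_count = train(examples)
print(classify({"guitar", "band"}, sense_count, feature_count))  # bass/music
print(classify({"river"}, sense_count, feature_count))           # bass/fish
```

Working in log space avoids underflow when many small probabilities are multiplied, and it does not change which sense wins the argmax.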


Decision List Classifiers

Naïve Bayes: hard for humans to examine the decisions and understand them
Decision list classifiers - like a “case” statement
sequence of (test, returned-sense-tag) pairs
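A decision list can be sketched as an ordered list of (test, sense) pairs where the first test that succeeds decides the sense. The rules, sense labels, and default below are hypothetical examples for the word "bass", not the rules from the textbook figure.

```python
def contains(word):
    """Build a test that fires when `word` occurs in the context window."""
    return lambda context: word in context


# Ordered (test, returned-sense-tag) pairs; earlier rules take priority.
RULES = [
    (contains("fish"), "bass/fish"),
    (contains("striped"), "bass/fish"),
    (contains("guitar"), "bass/music"),
    (contains("play"), "bass/music"),
]


def classify(context_tokens, rules=RULES, default="bass/fish"):
    """Return the sense of the first rule whose test succeeds, else the default."""
    context = set(context_tokens)
    for test, sense in rules:
        if test(context):
            return sense
    return default


print(classify("he plays bass guitar in a band".split()))  # bass/music
print(classify("caught a striped bass".split()))           # bass/fish
```

Because the rules are inspected in order, a human can read off exactly which test produced a decision, which is the interpretability advantage over Naïve Bayes noted above.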


Figure 20.2 Decision List Classifier Rules


WSD Evaluation, baselines, ceilings

Extrinsic evaluation - evaluating embedded NLP in end-to-end applications (in vivo)
Intrinsic evaluation – evaluating WSD by itself (in vitro)
Sense accuracy
Corpora – SemCor, SENSEVAL, SEMEVAL
Baseline - Most frequent sense (WordNet sense 1)
Ceiling – Gold standard – human experts with discussion and agreement
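As a small illustration of intrinsic evaluation, the sketch below computes sense accuracy and the most-frequent-sense baseline. The sense labels and data are made up, and the most frequent sense is estimated from a toy training list rather than taken from WordNet's sense ordering.

```python
from collections import Counter


def sense_accuracy(predicted, gold):
    """Fraction of instances whose predicted sense matches the gold sense."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def most_frequent_sense(training_senses):
    """Baseline: the sense that occurs most often in the training data."""
    return Counter(training_senses).most_common(1)[0][0]


# Invented sense-tagged data for one target word.
train_senses = ["fish", "music", "music", "music"]
gold = ["music", "fish", "music", "music", "fish"]

mfs = most_frequent_sense(train_senses)
baseline_predictions = [mfs] * len(gold)
baseline_accuracy = sense_accuracy(baseline_predictions, gold)
print(mfs, baseline_accuracy)  # music 0.6
```

Any WSD system worth using should beat this baseline, while inter-annotator agreement among human experts gives the ceiling.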


Figure 20.3 Simplified Lesk Algorithm

gloss/sentence overlap


Simplified Lesk example

The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable rate mortgage securities.
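The gloss/sentence overlap idea can be sketched as follows for the example sentence. The two sense signatures are approximate dictionary-style glosses and example sentences typed in by hand for illustration (not loaded from WordNet), and the stopword list and tokenizer are simplifying assumptions.

```python
STOPWORDS = {"a", "an", "the", "of", "on", "in", "at", "to", "and", "that",
             "it", "he", "my", "they", "up", "into", "can", "will", "because"}


def tokens(text):
    """Lowercase, split on whitespace, strip simple punctuation, drop stopwords."""
    words = (w.strip(".,()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(target, sentence, signatures):
    """Pick the sense whose signature shares the most words with the context.

    Ties go to the sense listed first, which stands in for the most
    frequent sense here.
    """
    context = tokens(sentence) - {target}
    best_sense, best_overlap = None, -1
    for sense, text in signatures.items():
        overlap = len(context & tokens(text))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


# Hand-typed, approximate glosses plus example sentences for two senses.
SIGNATURES = {
    "bank1": "a financial institution that accepts deposits and channels the "
             "money into lending activities. he cashed a check at the bank. "
             "that bank holds the mortgage on my home",
    "bank2": "sloping land (especially the slope beside a body of water). "
             "they pulled the canoe up on the bank. he sat on the bank of "
             "the river and watched the currents",
}
SENTENCE = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable rate mortgage "
            "securities.")

result = simplified_lesk("bank", SENTENCE, SIGNATURES)
print(result)  # ('bank1', 2)
```

The financial sense wins because its signature shares "deposits" and "mortgage" with the sentence, while the river-bank signature shares no content words.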


SENSEVAL competitions

http://www.senseval.org/

Check the Senseval-3 website.


Corpus Lesk

weights applied to overlap words
inverse document frequency
$\mathrm{idf}_i = \log\left(\dfrac{N_{docs}}{\text{num docs containing } w_i}\right)$
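A short sketch of the idf weighting: each "document" is modeled as a set of words (for Corpus Lesk these would be glosses or example sentences), and the overlap score sums idf weights instead of counting words. The toy documents and the helper name `weighted_overlap` are assumptions for illustration.

```python
import math


def idf(term, documents):
    """log(N_docs / number of documents containing term)."""
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        raise ValueError("term %r occurs in no document" % term)
    return math.log(len(documents) / n_containing)


def weighted_overlap(context, signature, documents):
    """Sum of idf weights over the words shared by context and signature."""
    return sum(idf(w, documents) for w in context & signature)


# Toy collection: each document is a set of words.
docs = [
    {"bank", "deposits", "money"},
    {"bank", "river", "water"},
    {"bank", "mortgage", "loan"},
    {"bank", "river", "canoe"},
]

print(idf("bank", docs))      # 0.0, occurs in every document
print(idf("river", docs))     # log(4/2)
print(idf("deposits", docs))  # log(4/1)

score = weighted_overlap({"bank", "deposits"}, {"bank", "deposits", "money"}, docs)
print(score)
```

A word that appears in every document gets weight zero, so only informative overlap words contribute to the score.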


20.4.2 Selectional Restrictions and Preferences


WordNet Semantic classes of Objects


Minimally Supervised WSD: Bootstrapping

Yarowsky algorithm

Heuristics:
1. one sense per collocation
2. one sense per discourse
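The two heuristics can be illustrated with a single labeling pass. This is a deliberately reduced sketch, not the full iterative Yarowsky algorithm: the seed collocations, sense labels, and toy document are assumptions, and no new collocations are learned.

```python
from collections import Counter

# Hypothetical seed collocations for the target word "plant".
SEEDS = {"life": "plant/living", "manufacturing": "plant/factory"}


def label_by_collocation(context_tokens, seeds=SEEDS):
    """One sense per collocation: a seed word in the context fixes the sense."""
    for tok in context_tokens:
        if tok in seeds:
            return seeds[tok]
    return None


def one_sense_per_discourse(labels):
    """Give unlabeled instances in a document the document's majority sense."""
    known = [label for label in labels if label is not None]
    if not known:
        return list(labels)
    majority = Counter(known).most_common(1)[0][0]
    return [label if label is not None else majority for label in labels]


# Three occurrences of "plant" in the same (invented) document.
document = [
    ["plant", "life", "on", "earth"],
    ["the", "plant", "grew", "quickly"],
    ["animal", "and", "plant", "life"],
]
seed_labels = [label_by_collocation(ctx) for ctx in document]
final_labels = one_sense_per_discourse(seed_labels)
print(seed_labels)   # ['plant/living', None, 'plant/living']
print(final_labels)  # ['plant/living', 'plant/living', 'plant/living']
```

In the full bootstrapping loop, the newly labeled instances would be used to train a classifier, whose confident predictions then extend the labeled set on the next iteration.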


Figure 20.4 Two senses of plant


Figure 20.5


Figure 20.6 Path Based Similarity


Figure 20.6 Path Based Similarity

$\mathrm{sim}_{path}(c_1, c_2) = 1 / \mathrm{pathlen}(c_1, c_2)$   (pathlen = length + 1)
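The path-based measure can be sketched on a small hand-built hierarchy. The child-to-parent table below is a hypothetical toy taxonomy, not WordNet data, and pathlen is taken to be the number of edges on the shortest path plus one, which is one reading of the "(length + 1)" note above.

```python
# Hypothetical toy hypernym hierarchy: child -> parent.
HYPERNYM = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "currency",
    "currency": "medium_of_exchange",
    "money": "medium_of_exchange",
    "fund": "money",
    "budget": "fund",
    "medium_of_exchange": "standard",
}


def ancestors(concept, hypernym):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


def path_edges(c1, c2, hypernym):
    """Number of edges on the path between c1 and c2 through their lowest common ancestor."""
    depth2 = {node: d for d, node in enumerate(ancestors(c2, hypernym))}
    for d1, node in enumerate(ancestors(c1, hypernym)):
        if node in depth2:
            return d1 + depth2[node]
    raise ValueError("no common ancestor")


def sim_path(c1, c2, hypernym=HYPERNYM):
    """1 / pathlen, with pathlen = edges + 1 (an assumed reading of the slide)."""
    return 1 / (path_edges(c1, c2, hypernym) + 1)


print(sim_path("nickel", "nickel"))  # 1.0
print(sim_path("nickel", "dime"))    # 1/3
print(sim_path("nickel", "money"))   # 0.2
print(sim_path("nickel", "budget"))  # 1/7
```

Adding one to the edge count keeps the similarity of a concept with itself at 1 and avoids division by zero.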


Information Content word similarity


Figure 20.7 WordNet with P(c) values

Figure 20.8

Figure 20.9

Figure 20.10

Figure 20.11

Figure 20.12

Figure 20.13

Figure 20.14

Figure 20.15

Figure 20.16