Computational Linguistics CS579: Fall Semester 2020 School of Computing Korea Advanced Institute of Science and Technology Jong C. Park © All rights reserved.



Review of the Last Lecture

We reviewed a first-order model checker.
• Represent models and first-order formulas in Prolog; specify the evaluation process in Prolog.
The Satisfaction Definition in Prolog
• satisfy/4 for boolean connectives (negation, conjunction, disjunction, implication), existential and universal quantification, one-place and two-place predicates, and equality
• Interpretation function
We discussed the need to refine the Model Checker.
We examined the relationship between First-Order Logic and Natural Language.
• Inferential and Representational Capabilities

Lambda Calculus

• Compositionality
• Two Experiments
• The Lambda Calculus
• Implementing Lambda Calculus
• Grammar Engineering

Goals Today
• We study the following question:
  • How can we automate the process of associating semantic representations with expressions of natural language?
• We discuss the idea of compositionality, experiment with ways of implementing compositional semantic construction, and introduce the lambda calculus.

COMPOSITIONALITY

Compositionality

Question:
• Given a sentence of English, is there a systematic way of constructing its semantic representation?
A more specific question:
• Is there a systematic way of translating simple sentences such as “Vincent loves Mia” and “A woman snorts” into first-order logic?
• What do we mean by “systematic”?

Consider “Vincent loves Mia”.
• Its semantic content is partially captured by the first-order formula love(vincent,mia).
• The proper name “Vincent” contributes the constant vincent to the representation.
• The transitive verb “loves” contributes the relation symbol love, and “Mia” contributes the constant mia.
Can we generalize this process to the claim that the words making up a sentence contribute all the bits and pieces needed to build the sentence’s semantic representation?

Another, related question:
• From the symbols love, mia and vincent, why can’t we also form love(mia,vincent) from the sentence “Vincent loves Mia” (in addition to, or instead of, love(vincent,mia))?
• Are there any limitations?

Such limitations are explained by the notion of syntactic structure.
• “Vincent loves Mia” isn’t just a string of words, but has a hierarchical structure:
  • It is a sentence (S) that is composed of the subject noun phrase (NP) “Vincent” and the verb phrase (VP) “loves Mia”.
  • This VP is in turn composed of the transitive verb (TV) “loves” and the direct object NP “Mia”.

The systematicity of the translation from natural language to first-order logic expressions amounts to using the additional information provided by syntactic structure to spell out exactly how the semantic contributions are to be glued together.

Example

Vincent loves Mia (S): love(vincent,mia)
  Vincent (NP): vincent
  loves Mia (VP): love(?,mia)
    loves (TV): love(?,?)
    Mia (NP): mia

Compositionality
• Semantic information flows from the lexicon, so each lexical item is associated with a representation.
• How is the information combined?
  • It is combined by making use of the hierarchy provided by the syntactic analysis.

• Suppose the syntax tells us that some kind of sentential subpart (e.g., a VP) is decomposable into two sub-subparts (e.g., a TV and an NP).
  • Then the task is to describe how the semantic representation of the VP subpart is to be built out of the representations of its two sub-subparts.
• If we succeed in doing this for all the grammatical constructions covered by the syntax, we will have given a compositional semantics for the language under discussion.

Natural Language Syntax via Definite Clause Grammars

For the original question, we need to:
• [Task 1] Specify a reasonable syntax for the fragment of natural language of interest.
• [Task 2] Specify semantic representations for the lexical items.
• [Task 3] Specify the translation compositionally.
  • We should specify the translation of all expressions in terms of the translation of their parts, where “parts” refers to the substructures given to us by the syntax.

Moreover, all three tasks need to be carried out in a way that leads naturally to computational implementation.

As for Task 1, the syntactic analysis of a sentence will be identified with a tree whose non-leaf nodes represent complex syntactic categories and whose leaves represent lexical items.

This approach will allow us to make use of Definite Clause Grammars (DCGs).

An example DCG implementation

Refer to the files experiment1.pl, experiment2.pl, and experiment3.pl.

    s --> np, vp.
    np --> pn.
    np --> det, noun.
    pn --> [vincent].
    pn --> [mia].
    det --> [a].
    det --> [every].
    noun --> [woman].
    noun --> [foot,massage].
    vp --> iv.
    vp --> tv, np.
    iv --> [snorts].
    iv --> [walks].
    tv --> [loves].
    tv --> [likes].

The grammar accepts the following simple sentence:
• Vincent walks.

In addition, by posing the query below, we can test whether “Mia likes a foot massage” is accepted by the grammar.

    ?- s([mia,likes,a,foot,massage],[]).

The query below generates all grammatical sentences.

    ?- s(X,[]).
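The two queries above can be tried with a small self-contained sketch. It assumes SWI-Prolog and uses the standard phrase/2 interface instead of calling s/2 directly; the helper names accepts/1 and sentence_count/1 are hypothetical and do not come from the course files.

```prolog
% The DCG from the slide, unchanged.
s --> np, vp.
np --> pn.
np --> det, noun.
vp --> iv.
vp --> tv, np.
pn --> [vincent].
pn --> [mia].
det --> [a].
det --> [every].
noun --> [woman].
noun --> [foot,massage].
iv --> [snorts].
iv --> [walks].
tv --> [loves].
tv --> [likes].

% accepts(+Words): true if the word list is a sentence of the grammar.
% (hypothetical helper; equivalent to the query ?- s(Words, []).)
accepts(Words) :- phrase(s, Words).

% sentence_count(-N): number of sentences the grammar generates.
% (hypothetical helper; terminates because the grammar is not recursive.)
sentence_count(N) :-
    findall(Words, phrase(s, Words), All),
    length(All, N).
```

Because the grammar has no recursive rule, enumeration is finite: there are 6 noun phrases and 14 verb phrases, so sentence_count/1 reports 84 sentences.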

TWO EXPERIMENTS

Two Experiments

Experiment 1
• Assume the following piece of DCG code:

    pn(vincent) --> [vincent].
    pn(mia) --> [mia].
    iv(snort(_)) --> [snorts].
    tv(love(_,_)) --> [loves].

• How do we come up with the specific values (for the argument positions marked “_”)?

• We choose to use the predicate arg/3.

    s(Sem) --> np(SemNP), vp(Sem),
        { arg(1,Sem,SemNP) }.
    np(Sem) --> pn(Sem).
    vp(Sem) --> tv(Sem), np(SemNP),
        { arg(2,Sem,SemNP) }.
    vp(Sem) --> iv(Sem).

• Example interaction:

    ?- s(Sem, [mia,snorts], []).
    Sem = snort(mia)

• We extend the DCG with the determiners “a” and “every”.

    det(some(_,and(_,_))) --> [a].
    det(all(_,imp(_,_))) --> [every].
    noun(woman(_)) --> [woman].
    noun(footmassage(_)) --> [foot,massage].

• We use arg/3 again for the determiners.

    np(Sem) --> det(Sem), noun(SemNoun),
        { arg(1,SemNoun,X), arg(1,Sem,X),
          arg(2,Sem,Matrix), arg(1,Matrix,SemNoun) }.

• Sample interaction:

    ?- np(Sem, [a,woman], []).
    Sem = some(X,and(woman(X),Y)).

  Here the unbound variable Y is a missing piece of information.
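Collected in one place, the Experiment 1 fragments form a small runnable program. This is a sketch assembled from the slides, assuming SWI-Prolog; it may differ from the course file experiment1.pl, which is not reproduced here.

```prolog
% Lexicon: each entry carries a semantic skeleton with open slots.
pn(vincent) --> [vincent].
pn(mia) --> [mia].
iv(snort(_)) --> [snorts].
tv(love(_,_)) --> [loves].
det(some(_,and(_,_))) --> [a].
det(all(_,imp(_,_))) --> [every].
noun(woman(_)) --> [woman].
noun(footmassage(_)) --> [foot,massage].

% Rules: arg/3 fills the open slots of the skeletons.
s(Sem) --> np(SemNP), vp(Sem),
    { arg(1,Sem,SemNP) }.          % subject fills the first argument
np(Sem) --> pn(Sem).
np(Sem) --> det(Sem), noun(SemNoun),
    { arg(1,SemNoun,X),            % variable of the noun
      arg(1,Sem,X),                % becomes the bound variable
      arg(2,Sem,Matrix),
      arg(1,Matrix,SemNoun) }.     % noun fills the restriction slot
vp(Sem) --> tv(Sem), np(SemNP),
    { arg(2,Sem,SemNP) }.          % object fills the second argument
vp(Sem) --> iv(Sem).
```

With this program, phrase(s(Sem), [vincent,loves,mia]) yields Sem = love(vincent,mia), while phrase(np(Sem), [a,woman]) leaves the second slot of and/2 unbound, as in the sample interaction.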

• We need the VP to supply the missing piece of information.

• A proposal:

    s(Sem) --> np(Sem), vp(SemVP),
        { arg(1,SemVP,X), arg(1,Sem,X),
          arg(2,Sem,Matrix), arg(2,Matrix,SemVP) }.

• Problems:
  • Redundancy: we already have a rule for ‘s --> np, vp’.
  • Wrong interaction with the proper-name rule: np(Sem) --> pn(Sem).
  • Do we need another rule for quantified NPs in object position?
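The effect of the proposed rule can be checked in isolation. The sketch below assumes SWI-Prolog and keeps only the proposed s rule (the earlier s rule is deliberately left out), with a reduced lexicon chosen for illustration.

```prolog
% Reduced Experiment 1 lexicon (illustrative subset).
pn(mia) --> [mia].
iv(snort(_)) --> [snorts].
det(some(_,and(_,_))) --> [a].
det(all(_,imp(_,_))) --> [every].
noun(woman(_)) --> [woman].

np(Sem) --> pn(Sem).
np(Sem) --> det(Sem), noun(SemNoun),
    { arg(1,SemNoun,X), arg(1,Sem,X),
      arg(2,Sem,Matrix), arg(1,Matrix,SemNoun) }.
vp(Sem) --> iv(Sem).

% Proposed rule: the NP supplies the sentence skeleton and the VP
% representation is plugged into the second slot of its matrix.
s(Sem) --> np(Sem), vp(SemVP),
    { arg(1,SemVP,X), arg(1,Sem,X),
      arg(2,Sem,Matrix), arg(2,Matrix,SemVP) }.
```

With a quantified subject the rule works: phrase(s(Sem), [a,woman,snorts]) gives some(X,and(woman(X),snort(X))). With a proper-name subject it does not, because the NP representation mia is an atom with no argument slots, so arg/3 cannot be applied to it (it fails or raises a type error, depending on the Prolog system). That is the wrong interaction noted above, and keeping both s rules side by side to work around it gives the redundancy.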

Experiment 2
• Lessons from Experiment 1
  • We need to work with incomplete first-order formulas to build representations.
  • We need a way of manipulating the missing information.
  • We should take care to always associate missing information with an explicit Prolog variable.

• Proposal:
  • We shall need three extra arguments: (1) one for the bound variable, (2) one for the contribution made by the noun (restriction), and (3) one for the contribution made by the VP (nuclear scope).

    det(X,Restr,Scope,some(X,and(Restr,Scope))) --> [a].
    det(X,Restr,Scope,all(X,imp(Restr,Scope))) --> [every].

• Additional lexical changes

    noun(X,woman(X)) --> [woman].
    iv(Y,snort(Y)) --> [snorts].
    tv(Y,Z,love(Y,Z)) --> [loves].

• Quantified noun phrases and proper names

    np(X,Scope,Sem) --> det(X,Restr,Scope,Sem), noun(X,Restr).
    np(SemPN,Sem,Sem) --> pn(SemPN).

• Redefined rules

    s(Sem) --> np(X,SemVP,Sem), vp(X,SemVP).
    vp(X,Sem) --> tv(X,Y,SemTV), np(Y,SemTV,Sem).
    vp(X,Sem) --> iv(X,Sem).

• Problems?
  • Much of the work is done by the rules, requiring rule-specific Prolog tricks such as variable doubling.
  • It is hard to think about the resulting grammar in a modular way.
  • We need a better solution.
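Put together, the Experiment 2 fragments can be run as a single program. This is a sketch assembled from the slides, assuming SWI-Prolog; the course file experiment2.pl may differ in detail.

```prolog
% Lexicon: every missing piece is an explicit Prolog variable.
det(X,Restr,Scope,some(X,and(Restr,Scope))) --> [a].
det(X,Restr,Scope,all(X,imp(Restr,Scope))) --> [every].
noun(X,woman(X)) --> [woman].
pn(vincent) --> [vincent].
pn(mia) --> [mia].
iv(Y,snort(Y)) --> [snorts].
tv(Y,Z,love(Y,Z)) --> [loves].

% Noun phrases: np(Var, Scope, Sem).
np(X,Scope,Sem) --> det(X,Restr,Scope,Sem), noun(X,Restr).
np(SemPN,Sem,Sem) --> pn(SemPN).   % "variable doubling": Scope = Sem

% Sentences and verb phrases.
s(Sem) --> np(X,SemVP,Sem), vp(X,SemVP).
vp(X,Sem) --> tv(X,Y,SemTV), np(Y,SemTV,Sem).
vp(X,Sem) --> iv(X,Sem).
```

For example, phrase(s(Sem), [mia,loves,a,woman]) gives some(Y,and(woman(Y),love(mia,Y))): the object NP takes the verb's representation as its scope, which is how quantified objects are handled without an extra rule.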

THE LAMBDA CALCULUS

The Lambda Calculus

Lambda calculus works as a notational extension of first-order logic that allows us to bind variables using a new variable-binding operator λ.
• Occurrences of variables bound by λ should be thought of as placeholders for missing information: they explicitly mark where we should substitute the various bits and pieces obtained in the course of semantic construction.
• An operation called β-conversion performs the required substitutions.

A simple lambda expression:
• λx.man(x)

Notes
• The prefix λx. binds the occurrence of x in man(x).
• The prefix λx. abstracts over the variable x.
• We call expressions with such prefixes lambda abstractions (or, more simply, abstractions).

We use the symbol @ to indicate the substitutions (or functional applications) we wish to carry out.
• The expression λx.man(x)@vincent gives rise to man(vincent).

This substitution process is called β-conversion, β-reduction, or λ-conversion.

Additional process
• α-conversion
  • the process of relabelling bound variables
• α-equivalence
  • alphabetic variants
  • λy.man(y) and λz.man(z)
• Why do we need α-conversion?
  • It could happen that, when we apply λx.F to an argument A, some occurrence of a variable that is free in A becomes bound by a lambda operator or a quantifier when we substitute it into F.
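A small worked case may make the capture problem concrete. The expression is an illustration chosen here, not one taken from the slides: the functor is λx.∃y.love(x,y) and the argument is the free variable y.

```latex
\begin{align*}
(\lambda x.\,\exists y.\,\mathit{love}(x,y))\,@\,y
  &\;\not\Rightarrow\; \exists y.\,\mathit{love}(y,y)
  && \text{(naive substitution: the free } y \text{ is captured)}\\[4pt]
(\lambda x.\,\exists y.\,\mathit{love}(x,y))\,@\,y
  &\;=_{\alpha}\; (\lambda x.\,\exists z.\,\mathit{love}(x,z))\,@\,y
  && \text{(relabel the bound } y \text{ as } z)\\
  &\;\Rightarrow_{\beta}\; \exists z.\,\mathit{love}(y,z)
  && \text{(the argument } y \text{ stays free)}
\end{align*}
```

Relabelling the bound variable first keeps the argument's y free, so the result still says that y loves someone rather than that someone loves themselves.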

A revised formulation
• Assign lambda expressions to the different basic syntactic categories:

    every: λu.λv.∀x(u@x → v@x)
    boxer: λy.boxer(y)
    walks: λz.walk(z)

every boxer (NP): λu.λv.∀x(u@x → v@x)@λy.boxer(y)
  every (DET): λu.λv.∀x(u@x → v@x)
  boxer (Noun): λy.boxer(y)

every boxer: λu.λv.∀x(u@x → v@x)@λy.boxer(y)

leading to: λv.∀x(boxer(x) → v@x)

every boxer walks: λv.∀x(boxer(x) → v@x)@λz.walk(z)

leading to: ∀x(boxer(x) → λz.walk(z)@x)

leading to: ∀x(boxer(x) → walk(x))

Full example

every boxer walks (S): ∀x(boxer(x) → walk(x))
  every boxer (NP): λu.λv.∀x(u@x → v@x)@λy.boxer(y)
    every (DET): λu.λv.∀x(u@x → v@x)
    boxer (Noun): λy.boxer(y)
  walks (VP): λz.walk(z)

Proper names are handled similarly.

    Mia: λu.(u@mia)
    Vincent: λu.(u@vincent)

Vincent loves Mia (S): love(vincent,mia)
  Vincent (NP): λu.(u@vincent)
  loves Mia (VP): λz.love(z,mia)
    loves (TV): λw.λz.(w@λx.love(z,x))
    Mia (NP): λu.(u@mia)
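The intermediate steps behind this tree can be spelled out. The derivation below assumes, consistently with the representations shown, that the TV is applied to its object NP and that the subject NP is applied to the VP.

```latex
\begin{align*}
\text{loves Mia:}\quad
  & \lambda w.\lambda z.(w\,@\,\lambda x.\mathit{love}(z,x))\;@\;\lambda u.(u\,@\,\mathit{mia})\\
  \Rightarrow_{\beta}\;& \lambda z.(\lambda u.(u\,@\,\mathit{mia})\;@\;\lambda x.\mathit{love}(z,x))\\
  \Rightarrow_{\beta}\;& \lambda z.(\lambda x.\mathit{love}(z,x)\;@\;\mathit{mia})\\
  \Rightarrow_{\beta}\;& \lambda z.\mathit{love}(z,\mathit{mia})\\[6pt]
\text{Vincent loves Mia:}\quad
  & \lambda u.(u\,@\,\mathit{vincent})\;@\;\lambda z.\mathit{love}(z,\mathit{mia})\\
  \Rightarrow_{\beta}\;& \lambda z.\mathit{love}(z,\mathit{mia})\;@\;\mathit{vincent}\\
  \Rightarrow_{\beta}\;& \mathit{love}(\mathit{vincent},\mathit{mia})
\end{align*}
```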

We assume a revised version of the three tasks (for the remainder of the course).
• Task 1: Specify a DCG for the fragment of natural language of interest.
• Task 2: Specify semantic representations for the lexical items with the help of the lambda calculus.
• Task 3: Specify the semantic representation R’ of a syntactic item R whose parts are F and A, with the help of the lambda calculus.
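One possible way to realize the three revised tasks in Prolog is sketched below, assuming SWI-Prolog. The term encoding (lam/2 for λ, app/2 for @, all/2 and imp/2 for ∀ and →) and the predicates beta/2 and sem/2 are assumptions made for this illustration and are not taken from the course files. The reducer substitutes by plain unification and performs no α-conversion, so it is only safe when every lexical entry is used once per derivation, which holds for the examples here.

```prolog
:- use_module(library(apply)).   % maplist/3

% Task 1 and Task 3: DCG rules that combine meanings by application.
s(app(NP,VP))  --> np(NP), vp(VP).
np(app(Det,N)) --> det(Det), noun(N).
np(PN)         --> pn(PN).
vp(IV)         --> iv(IV).
vp(app(TV,NP)) --> tv(TV), np(NP).

% Task 2: lexical entries as lambda terms.
det(lam(U,lam(V,all(X,imp(app(U,X),app(V,X)))))) --> [every].
noun(lam(Y,boxer(Y))) --> [boxer].
pn(lam(U,app(U,mia))) --> [mia].
pn(lam(U,app(U,vincent))) --> [vincent].
iv(lam(Z,walk(Z))) --> [walks].
tv(lam(W,lam(Z,app(W,lam(X,love(Z,X)))))) --> [loves].

% beta(+Term, -Reduced): beta-convert all applications in Term.
beta(T, T) :- var(T), !.
beta(app(F,A), R) :- !,
    beta(F, F1),
    (   nonvar(F1), F1 = lam(X,Body)
    ->  X = A,                    % substitution by unification
        beta(Body, R)
    ;   beta(A, A1),              % functor unknown: leave application
        R = app(F1,A1)
    ).
beta(T, R) :-                     % any other term: reduce its arguments
    T =.. [Functor|Args],
    maplist(beta, Args, Args1),
    R =.. [Functor|Args1].

% sem(+Words, -Sem): parse, then reduce the resulting lambda term.
sem(Words, Sem) :- phrase(s(T), Words), beta(T, Sem).
```

With these definitions, sem([every,boxer,walks], S) gives all(X,imp(boxer(X),walk(X))) and sem([vincent,loves,mia], S) gives love(vincent,mia). The grammar rules now only say which part is applied to which, and all construction-specific detail sits in the lexicon.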

SUMMARY

Summary

Lambda Calculus
• Compositionality
  • Systematicity via syntactic structure
  • Definite Clause Grammars (DCGs)
• Two Experiments
• The Lambda Calculus
  • λ operator, lambda expression, lambda abstraction, functional application, β-conversion, α-conversion