TRANSCRIPT

CPSC 503 Computational Linguistics
Lecture 10
Giuseppe Carenini
CPSC503 Winter 2008
Knowledge-Formalisms Map
Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners
Levels: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)
Today 8/10
• Probabilistic CFGs: assigning prob. to parse trees and to sentences
  – parsing with prob.
  – acquiring prob.
• Probabilistic Lexicalized CFGs
“the man saw the girl with the telescope”
The girl has the telescope
The man has the telescope
Ambiguity only partially solved by Earley parser
Probabilistic CFGs (PCFGs)
• Each grammar rule is augmented with a conditional probability
Formal Def: 5-tuple (N, Σ, P, S, D)
• The expansions for a given non-terminal sum to 1:
  VP -> Verb        .55
  VP -> Verb NP     .40
  VP -> Verb NP NP  .05
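The normalization constraint above can be sanity-checked mechanically. A minimal sketch: the VP rules are the ones from the slide, while the dict-of-tuples grammar encoding is an illustrative assumption, not a standard format.

```python
# Check that a PCFG is well-formed: for each non-terminal, the
# probabilities of its expansions must sum to 1.
from collections import defaultdict

# (lhs, rhs) -> probability, using the slide's VP rules
pcfg = {
    ("VP", ("Verb",)): 0.55,
    ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
}

def is_normalized(rules, tol=1e-9):
    """Return True iff every non-terminal's expansions sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(abs(total - 1.0) < tol for total in totals.values())
```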
Sample PCFG
PCFGs are used to….
• Estimate prob. of a parse tree: P(Tree)
• Estimate prob. of a sentence: P(Sentence)
Example
P(Tree_a) = .5 x .4 x … = 1.5 x 10^-6
P(Tree_b) = .5 x .4 x … = 1.7 x 10^-6
P(“Can you ….”) = 1.5 x 10^-6 + 1.7 x 10^-6 = 3.2 x 10^-6
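The two quantities in this example can be sketched directly: P(Tree) is the product of the probabilities of the rules used in its derivation, and P(Sentence) sums P(Tree) over all parses of the sentence. The numeric values in the test are placeholders, not the full rule sequence behind the slide's grammar.

```python
# P(Tree) and P(Sentence) for a PCFG, in sketch form.
from math import prod

def tree_prob(rule_probs):
    """P(Tree) = product of the probabilities of all rules in the derivation."""
    return prod(rule_probs)

def sentence_prob(parses):
    """P(Sentence) = sum of P(Tree) over all parse trees of the sentence."""
    return sum(tree_prob(p) for p in parses)
```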
Probabilistic Parsing:
– Slight modification to dynamic programming approach
– (Restricted) Task is to find the max probability tree for an input
Tree^(Sentence) = argmax_{Tree in ParseTrees(Sentence)} P(Tree)
Probabilistic CYK Algorithm
CYK (Cocke-Younger-Kasami) algorithm
– A bottom-up parser using dynamic programming
– Assumes the PCFG is in Chomsky normal form (CNF)
(Ney, 1991; Collins, 1999)
Definitions
– w1 … wn: an input string composed of n words
– wij: the string of words from word i to word j
– µ[i, j, A]: a table entry holding the maximum probability for a constituent with non-terminal A spanning words wi … wj
CYK: Base Case
Fill out the table entries by induction. Base case:
– Consider the input strings of length one (i.e., each individual word wi)
– Since the grammar is in CNF: A =>* wi iff A -> wi
– So µ[i, i, A] = P(A -> wi)
“Can1 you2 book3 TWA4 flights5 ?”: µ[1, 1, Aux] = .4, µ[5, 5, Noun] = .5, …
CYK: Recursive Case
– For strings of words of length > 1, A =>* wij iff there is at least one rule A -> B C, where B derives the first k words (between i and i-1+k) and C derives the remaining ones (between i+k and j)
– µ[i, j, A] = µ[i, i-1+k, B] x µ[i+k, j, C] x P(A -> B C)
– For each non-terminal, choose the max among all possibilities
(Diagram: A dominating B over words i … i-1+k and C over words i+k … j)
CYK: Termination
The max prob parse will be µ[1, n, S]
“Can1 you2 book3 TWA4 flight5 ?”: µ[1, 5, S] = 1.7 x 10^-6
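The base, recursive, and termination steps above can be sketched as one short function. The grammar encoding (separate dicts of lexical and binary rules) is an illustrative assumption; indices are 1-based, as on the slides, and only the max probability is returned (backpointers for tree recovery are omitted).

```python
# A minimal probabilistic CYK sketch for a PCFG in Chomsky normal form.
def pcyk(words, lexical, binary, start="S"):
    """lexical: {(A, word): prob} for A -> word;
    binary: {(A, B, C): prob} for A -> B C.
    Returns mu[1, n, start], the max probability of a parse of the input."""
    n = len(words)
    mu = {}  # (i, j, A) -> max prob of A spanning words i..j (inclusive)
    # Base case: mu[i, i, A] = P(A -> w_i)
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w and p > mu.get((i, i, A), 0.0):
                mu[(i, i, A)] = p
    # Recursive case: B spans i .. i-1+k, C spans i+k .. j
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            for k in range(1, span):
                for (A, B, C), p in binary.items():
                    cand = (mu.get((i, i - 1 + k, B), 0.0)
                            * mu.get((i + k, j, C), 0.0) * p)
                    if cand > mu.get((i, j, A), 0.0):
                        mu[(i, j, A)] = cand
    # Termination: the max prob parse is mu[1, n, start]
    return mu.get((1, n, start), 0.0)
```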
Acquiring Grammars and Probabilities
Manually parsed text corpora (e.g., PennTreebank)
• Grammar: read it off the parse trees
  Ex: if an NP contains an ART, ADJ, and NOUN, then we create the rule NP -> ART ADJ NOUN
• Probabilities: P(A -> β | A) = Count(A -> β) / Count(A)
  Ex: if the NP -> ART ADJ NOUN rule is used 50 times and all NP rules are used 5000 times, then the rule’s probability is …
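The counting recipe above is a maximum-likelihood estimate and is short enough to sketch. The input format (a flat list of rule occurrences read off the treebank's parse trees) is an illustrative assumption.

```python
# MLE rule probabilities from treebank counts:
# P(A -> beta | A) = Count(A -> beta) / Count(A).
from collections import Counter

def mle_rule_probs(rule_occurrences):
    """rule_occurrences: list of (lhs, rhs) pairs, one per rule use
    observed in the parsed corpus. Returns {(lhs, rhs): prob}."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _rhs in rule_occurrences)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```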
Non-supervised PCFG Learning
• Take a large collection of text and parse it
• If sentences were unambiguous: count rules in each parse and then normalize
• But most sentences are ambiguous: weight each partial count by the prob. of the parse tree it appears in (?!)
RuleProbs = argmax_{RuleProbs} P(trainingSentences | RuleProbs)
Non-supervised PCFG Learning
RuleProbs = argmax_{RuleProbs} P(trainingSentences | RuleProbs)
Inside-Outside algorithm (a generalization of the forward-backward algorithm)
Start with equal rule probs and keep revising them iteratively
• Parse the sentences
• Compute probs of each parse
• Use probs to weight the counts
• Reestimate the rule probs
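One reestimation iteration of that loop can be sketched in brute-force form: each parse of an ambiguous sentence contributes partial counts weighted by its normalized parse probability, and the weighted counts are renormalized into new rule probabilities. (The actual Inside-Outside algorithm computes these expectations efficiently without enumerating parses; this sketch assumes all parses are given explicitly.)

```python
# One weighted-count reestimation step for unsupervised PCFG learning.
from collections import defaultdict

def reestimate(parses_per_sentence):
    """parses_per_sentence: for each sentence, a list of
    (parse_prob, rules_used) pairs, where rules_used is a list of
    (lhs, rhs) rule occurrences in that parse. Returns new rule probs."""
    rule_w = defaultdict(float)
    lhs_w = defaultdict(float)
    for parses in parses_per_sentence:
        z = sum(p for p, _ in parses)  # normalizer over this sentence's parses
        for p, rules in parses:
            w = p / z  # weight of this parse
            for lhs, rhs in rules:
                rule_w[(lhs, rhs)] += w
                lhs_w[lhs] += w
    return {rule: w / lhs_w[rule[0]] for rule, w in rule_w.items()}
```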
Problems with PCFGs
• Most current PCFG models are not vanilla PCFGs
  – Usually augmented in some way
• Vanilla PCFGs assume independence of non-terminal expansions
• But statistical analysis shows this is not a valid assumption
  – Structural and lexical dependencies
Structural Dependencies: Problem
E.g. Syntactic subject of a sentence tends to be a pronoun
– Subject tends to realize the topic of a sentence
– Topic is usually old information
– Pronouns are usually used to refer to old information
– So subject tends to be a pronoun
– In the Switchboard corpus:
Structural Dependencies: Solution
Split non-terminals. E.g., NPsubject and NPobject
– Automatic/optimal split: Split and Merge algorithm [Petrov et al. 2006, COLING/ACL]
– Parent Annotation
– Hand-write rules for more complex structural dependencies
Splitting problems?
Lexical Dependencies: Problem
Two parse trees for the sentence “Moscow sent troops into Afghanistan”
VP-attachment vs. NP-attachment
Typically NP-attachment is more frequent than VP-attachment
Lexical Dependencies: Solution
• Add lexical dependencies to the scheme…
  – Infiltrate the influence of particular words into the probabilities in the derivation
  – I.e., condition on the actual words in the right way
All the words?
– P(VP -> V NP PP | VP = “sent troops into Afg.”)
– P(VP -> V NP | VP = “sent troops into Afg.”)
Heads
• To do that we’re going to make use of the notion of the head of a phrase
  – The head of an NP is its noun
  – The head of a VP is its verb
  – The head of a PP is its preposition
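The three head rules above can be sketched as a naive head finder. Real lexicalized parsers use richer head-percolation tables (e.g., Collins 1999); encoding children as (tag, word) pre-terminals is an illustrative simplification.

```python
# Naive head finding for the three phrase types named on the slide.
HEAD_TAG = {"NP": "Noun", "VP": "Verb", "PP": "Prep"}

def head_of(phrase_label, children):
    """children: list of (tag, word) pairs. Return the head word of the
    phrase according to the slide's rules, or None if no head is found."""
    target = HEAD_TAG.get(phrase_label)
    for tag, word in children:
        if tag == target:
            return word
    return None
```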
More specific rules
• We used to have rule r
  – VP -> V NP PP with P(r | VP)
    • That’s the count of this rule divided by the number of VPs in a treebank
• Now we have rule r
  – VP(h(VP)) -> V(h(VP)) NP(h(NP)) PP(h(PP))
  – P(r | VP, h(VP), h(NP), h(PP))
Sample sentence: “Workers dumped sacks into the bin”
  – VP(dumped) -> V(dumped) NP(sacks) PP(into)
  – P(r | VP, dumped is the verb, sacks is the head of the NP, into is the head of the PP)
Example (right)
Attribute grammar
(Collins 1999)
Example (wrong)
Problem with more specific rules
Rule:
– VP(dumped) -> V(dumped) NP(sacks) PP(into)
– P(r | VP, dumped is the verb, sacks is the head of the NP, into is the head of the PP)
Not likely to have significant counts in any treebank!
Usual trick: Assume Independence
• When stuck, exploit independence and collect the statistics you can…
• We’ll focus on capturing two aspects:
– Verb subcategorization
  • Particular verbs have affinities for particular VP expansions
– Phrase-heads’ affinities for their predicates (mostly their mothers and grandmothers)
  • Some phrases/heads fit better with some predicates than others
Subcategorization
• Condition particular VP rules only on their head… so r: VP -> V NP PP
  P(r | VP, h(VP), h(NP), h(PP)) becomes P(r | VP, h(VP)) x …
  e.g., P(r | VP, dumped)
What’s the count?
How many times this rule was used with dumped, divided by the total number of VPs that dumped appears in
Phrase/heads affinities for their Predicates
r: VP -> V NP PP; P(r | VP, h(VP), h(NP), h(PP))
Becomes
P(r | VP, h(VP)) x P(h(NP) | NP, h(VP)) x P(h(PP) | PP, h(VP))
E.g., P(r | VP, dumped) x P(sacks | NP, dumped) x P(into | PP, dumped)
• Count the places where dumped is the head of a constituent that has a PP daughter with into as its head, and normalize
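The last factor's counting recipe can be sketched directly: among constituents headed by a given verb that have a PP daughter, count how often that PP is headed by a given preposition, and normalize. The flat (vp_head, pp_head) observation list is an illustrative assumption about the treebank extraction step.

```python
# Estimating one head-affinity factor, P(h(PP) | PP, h(VP)), by counting.
from collections import Counter

def pp_head_affinity(observations, pp_head, vp_head):
    """observations: list of (vp_head, pp_head) pairs, one per PP daughter
    observed under a constituent headed by vp_head. Returns the normalized
    count, i.e. an estimate of P(pp_head | PP, vp_head)."""
    counts = Counter(observations)
    total = sum(n for (v, _), n in counts.items() if v == vp_head)
    return counts[(vp_head, pp_head)] / total if total else 0.0
```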
Example (right)
P(VP -> V NP PP | VP, dumped) = .67
P(into | PP, dumped) = .22
Example (wrong)
P(VP -> V NP | VP, dumped) = 0
P(into | PP, sacks) = 0
Knowledge-Formalisms Map(including probabilistic formalisms)
Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners
Levels: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)
Next Time (Wed, Oct 15)
• You need to have some ideas about your project topic.
• Assuming you know First-Order Logic (FOL)
• Read Chp. 17 (17.4 – 17.5)
• Read Chp. 18.1, 18.2, 18.3, and 18.5