Grammars and parsing - systems group
TRANSCRIPT
Partial/shallow parsing (chunking)
Goal: Identify the basic non-recursive (N/V/A/P) phrases of a sentence (chunking):
● flat/non-overlapping
● segmentation+labeling task
"[The morning flight]NP from [Denver]NP [has arrived]VP"
BIO + sequence models
American Airlines , a unit of AMR Inc.
[American Airlines]NP , [a unit]NP of [AMR Inc.]NP
American_B-NP Airlines_I-NP ,_O a_B-NP unit_I-NP of_O AMR_B-NP Inc._I-NP
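A minimal sketch of the BIO encoding, assuming the chunks are already given as a flat list of (label, tokens) pairs. The function name chunks_to_bio and the use of None for tokens outside any chunk are illustrative choices, not part of the slides.

```python
from typing import List, Optional, Tuple

# Each chunk is (label, tokens); a label of None marks tokens outside any chunk.
Chunk = Tuple[Optional[str], List[str]]

def chunks_to_bio(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Turn a flat, non-overlapping chunk segmentation into per-token BIO tags."""
    tagged = []
    for label, tokens in chunks:
        for position, token in enumerate(tokens):
            if label is None:
                tag = "O"               # outside every chunk
            elif position == 0:
                tag = "B-" + label      # first token of a chunk
            else:
                tag = "I-" + label      # continuation of the same chunk
            tagged.append((token, tag))
    return tagged

example = [
    ("NP", ["American", "Airlines"]),
    (None, [","]),
    ("NP", ["a", "unit"]),
    (None, ["of"]),
    ("NP", ["AMR", "Inc."]),
]
bio = chunks_to_bio(example)
```

Once every token carries exactly one tag, chunking becomes a per-token labeling problem, which is what lets a sequence model handle it.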
Overview
1. Today:
a. Grammars
b. Parsing
2. Next class:
a. Statistical Parsing
b. Dependency Parsing
J&M Ch. 12-14
Sentence
● A unit of one or more words expressing:
1. Statements (declarative): "Cats like milk."
2. Commands (imperative): "Leave now!"
3. Requests for information (questions):
○ Yes/no questions: "Did the plane leave?"
○ WH questions: "When did the train leave?"
○ How-to questions: "How do you remove HTML tags in C++?"
○ ...
4. ...
● Typically has a subject and a predicate, and is marked by specific punctuation in writing (intonation in speech), etc.
Syntactic analysis
● Goal: understanding the principles of sentence structure
○ Grammar
○ Parsing
● Applications:
○ Dialogue management
○ Question answering
○ Information extraction
○ Machine translation
○ Summarization
○ Text compression
○ ...
Syntax
● Key concepts:
○ Constituency
○ Heads
○ Subcategorization and agreement
○ Grammatical relations and dependency
● Key formalism:
○ Context-free grammars
○ Advantages and limitations
Constituency
● Basic idea: groups of words within utterances act as a single unit (phrase)
● These units, in a given language, form coherent classes that behave in similar ways
○ With respect to their internal structure; e.g., noun phrases are often made of a determiner preceding a nominal phrase.
○ With respect to other units in the language; e.g., noun phrases tend to precede verbs.
○ NPs, VPs, PPs, ...
Constituency
● It makes sense to say that the following are all noun phrases (NPs) in English:
○ "Harry the horse", "the Broadway coppers", "they", "the reason he comes into the Hot Box", "three parties from Brooklyn", "a high-class spot such as Mindy's", ...
● Why?
○ External evidence: they can all precede verbs, ...
○ Internal evidence: the most important word is a noun, the first element is a determiner, ...
Constituency
● External evidence:
○ three parties from Brooklyn arrive ...
○ the Broadway coppers love ...
○ a high-class spot such as Mindy's attracts ...
● Not true of individual words:
○ * from arrive ...
○ * the love ...
● Movement: the phrase "on Sept. 17th" can be placed in several positions as a unit, but its parts cannot be moved separately:
○ On Sept. 17th I'd like to fly from Atlanta to Denver
○ I'd like to fly on Sept. 17th from Atlanta to Denver
○ I'd like to fly from Atlanta to Denver on Sept. 17th
○ *On I'd like to fly Sept. 17th from Atlanta to Denver
○ *I'd like to fly from on Sept. 17th Atlanta to Denver
Grammars and constituency
● What is the right set of constituents and rules that govern how they combine?
● Many different theories of grammar and competing analyses of the same data.
● The approach to grammar and the analyses adopted in NLP are often generic and agnostic with respect to linguistic theories of grammar.
● CFGs: can naturally model many syntactic phenomena in computationally tractable ways.
Context-free grammars
● CFGs, aka phrase structure grammars (PSGs) or Backus-Naur form (BNF):
a. Terminals (words):
■ a, the, flight
b. Non-terminals: the constituents in a language; e.g., noun phrase, verb phrase and sentence
■ Det, Nominal, NP, Noun, ProperNoun
c. Productions: rules that consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
■ NP -> Det Nominal
Formal definition
CFG = (N, Σ, R, S):
1. N - a set of non-terminal symbols (or variables; capital letters A, B, ...)
2. Σ - a set of terminal symbols (disjoint from N; lowercase roman letters)
3. R - a set of rules/productions of the form A -> β, where A is a non-terminal and β is a string from (N ∪ Σ)*
4. S - a start symbol
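The four-part definition can be written down directly as a data structure. This is a minimal sketch: the class name CFG, its field names and the is_well_formed check are illustrative, and the grammar instance reuses the noun-phrase productions from this lecture with NP chosen as the start symbol, which is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (A, beta) stands for A -> beta

@dataclass(frozen=True)
class CFG:
    nonterminals: FrozenSet[str]     # N
    terminals: FrozenSet[str]        # the terminal set
    rules: Tuple[Rule, ...]          # R
    start: str                       # S

    def is_well_formed(self) -> bool:
        """Check the side conditions stated in the formal definition."""
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)
            and self.start in self.nonterminals
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.rules
            )
        )

np_grammar = CFG(
    nonterminals=frozenset({"NP", "Det", "Nominal", "Noun", "ProperNoun"}),
    terminals=frozenset({"a", "the", "flight", "John"}),
    rules=(
        ("NP", ("Det", "Nominal")),
        ("NP", ("ProperNoun",)),
        ("Nominal", ("Noun",)),
        ("Nominal", ("Nominal", "Noun")),
        ("Det", ("a",)),
        ("Det", ("the",)),
        ("Noun", ("flight",)),
        ("ProperNoun", ("John",)),
    ),
    start="NP",   # assumption: this fragment only describes noun phrases
)
```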
Noun Phrases
Productions, example:
● NP -> Det Nominal
● NP -> ProperNoun
● Nominal -> Noun | Nominal Noun
● Det -> a
● Det -> the
● Noun -> flight
● ProperNoun -> John
Types of NPs: the two NP rules
Disjunctive, recursive rules: the Nominal rule
Lexicon: the Det, Noun and ProperNoun rules
Generation/Analysis
● The CFG can be used for:
a. Generating strings in the language
b. Rejecting strings not in the language
c. Associating structures (syntactic trees) to strings in the language
● Derivation: a sequence of rules applied to a string such that it:
a. Covers all the elements in the string
b. Covers only the elements in the string
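Generation can be sketched with the noun-phrase productions from this lecture. The function generate and its depth bound are illustrative assumptions; the bound is needed because the recursive rule Nominal -> Nominal Noun makes the language infinite.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "Det": [("a",), ("the",)],
    "Noun": [("flight",)],
    "ProperNoun": [("John",)],
}

def generate(symbol: str, depth: int) -> Set[Tuple[str, ...]]:
    """All word sequences derivable from `symbol` with at most `depth` nested rule applications."""
    if symbol not in GRAMMAR:      # a terminal derives only itself
        return {(symbol,)}
    if depth == 0:                 # rule budget exhausted
        return set()
    results = set()
    for rhs in GRAMMAR[symbol]:
        parts = [generate(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            results.add(tuple(word for part in combo for word in part))
    return results

strings = {" ".join(words) for words in generate("NP", 4)}
```

With a depth bound of 4 this yields "John", "a flight", "the flight", "a flight flight" and "the flight flight". A string such as "flight the" is never produced, which is the sense in which the grammar rejects it.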
English grammar fragment
● Sentences
● Noun phrases
○ Agreement
● Verb phrases
○ Subcategorization
○ Agreement
Sentence types
1. Declarative: "Cats like milk."
a. S -> NP VP
2. Imperative: "Leave!"
a. S -> VP
3. Questions:
a. Yes/no questions: "Did the plane leave?"
i. S -> AUX NP VP
b. WH questions: "When did the train leave?"
i. S -> WH-NP AUX NP VP
The Noun Phrase (NP)
● A phrase whose central element is a noun/nominal denoting entities, events, or other concepts acting as subjects, objects etc.
● This central element of the (any) phrase is called the head.
● We can analyze most NPs based on what comes before the head, and what comes after it, the modifiers.
Pre-modifiers
● Determiners:
○ Articles, numerals, quantifiers
■ the, this, a, an, three, some, many, ...
○ Simple possessives:
■ my, their, John’s (car), ...
○ Complex recursive variants:
■ John’s sister’s husband’s son’s car
● Adjectives and nouns
○ large cars, morning flight, ...
● Ordering constraints:
○ Three large cars
○ ?large three cars
Post-modifiers
● Prepositional phrases
○ "from Seattle"
● Non-finite clauses
○ "arriving before noon"
● Relative clauses
○ "that serve breakfast"
● Appositives:
○ American Airlines, a unit of AMR Inc., ...
Agreement
● The constraints that hold among the various constituents that take part in a rule or set of rules.
● In English, determiners and the head nouns in NPs have to agree in number:
○ This flight
○ Those flights
○ *This flights
○ *Those flight
● Problem: the kind of rules seen so far don’t capture this constraint, so they overgenerate:
○ NP → Det Nominal
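The overgeneration can be made concrete with a small sketch. The mini-lexicon and its "sg"/"pl" number tags are assumptions made for illustration; the plain rule NP → Det Nominal has no access to such tags.

```python
from itertools import product

# Hypothetical mini-lexicon; the number tags are an assumption of this sketch.
DETERMINERS = {"this": "sg", "those": "pl"}
NOUNS = {"flight": "sg", "flights": "pl"}

# NP -> Det Nominal, ignoring number: every determiner combines with every noun.
generated = [f"{det} {noun}" for det, noun in product(DETERMINERS, NOUNS)]

# The same rule with the agreement constraint enforced.
grammatical = [
    f"{det} {noun}"
    for det, noun in product(DETERMINERS, NOUNS)
    if DETERMINERS[det] == NOUNS[noun]
]

# Strings the unconstrained rule produces although they violate agreement.
overgenerated = sorted(set(generated) - set(grammatical))
```

Here overgenerated contains exactly the two starred forms, "this flights" and "those flight".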
Verb Phrase (VP)
● English VPs consist of a head verb along with 0 or more following constituents called arguments:
○ VP -> Verb
■ disappear
○ VP -> Verb NP
■ prefer a morning flight
○ VP -> Verb NP PP
■ leave Boston in the morning
○ VP -> Verb PP
■ leaving on Thursday
Subcategorization
● Not all verbs are allowed to participate in all VP rules of the grammar, because of semantic constraints:
○ Example: the traditional notion of transitive/intransitive verbs
● Verbs can be subcategorized according to the sets of VP rules that they participate in:
Examples
● Sneeze: VP -> Verb, "John sneezed"
● Find: VP -> Verb NP, "Please find [a flight to NY]"
● Give: VP -> Verb NP NP, "Give [me] [a cheaper fare]"
● Help: VP -> Verb NP PP, "Can you help [me] [with a flight]"
● Prefer: VP -> Verb TO-VP, "I prefer [to leave earlier]"
● Told: VP -> Verb S, "I was told [United has a flight]"
Problem
● Our grammar over-generates.
● With respect to subcategorization:
○ *John sneezed the book (S -> NP VP)
○ *I prefer United has a flight
○ *Give with a flight
● And with respect to agreement:
○ *John sneeze
○ *I prefers United
● As with NP agreement phenomena, we need a way to formally express the constraints.
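One way to picture the missing constraint is a table of subcategorization frames consulted before a VP rule is applied. This is only a sketch: the table SUBCAT holds just the single frame shown for each verb in the examples of this lecture (real verbs allow more), and the function licenses is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Frame = Tuple[str, ...]   # the constituents that follow the verb inside the VP

# One frame per verb, taken from the lecture's examples; not a complete lexicon.
SUBCAT: Dict[str, Set[Frame]] = {
    "sneeze": {()},              # VP -> Verb
    "find": {("NP",)},           # VP -> Verb NP
    "give": {("NP", "NP")},      # VP -> Verb NP NP
    "help": {("NP", "PP")},      # VP -> Verb NP PP
    "prefer": {("TO-VP",)},      # VP -> Verb TO-VP
    "told": {("S",)},            # VP -> Verb S
}

def licenses(verb: str, arguments: Frame) -> bool:
    """True if the verb is listed with exactly this sequence of arguments."""
    return arguments in SUBCAT.get(verb, set())

sneezed_the_book = licenses("sneeze", ("NP",))   # *John sneezed the book
prefer_clause = licenses("prefer", ("S",))       # *I prefer United has a flight
give_pp = licenses("give", ("PP",))              # *Give with a flight
```

All three starred examples come out as not licensed, while the frames listed in the examples are accepted.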
CFG solution for agreement
● Split rules:
○ SgS -> SgNP SgVP, PlS -> PlNP PlVP
○ SgNP -> SgDet SgNom, PlNP -> PlDet PlNom
○ PlVP -> PlV NP, SgVP -> SgV NP
○ Subcategorization?
● This works and stays within the power of CFGs:
○ But it is not an elegant solution
○ It doesn’t scale well, because the interaction among the various constraints explodes the number of rules in the grammar.
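The growth in rule count can be sketched by copying one rule once per combination of feature values. The helper split_rule is hypothetical, and the person feature with values 1, 2, 3 is an added assumption used only to show how a second feature multiplies the copies.

```python
from itertools import product
from typing import Dict, List, Tuple

def split_rule(lhs: str, rhs: Tuple[str, ...], features: Dict[str, List[str]]) -> List[str]:
    """Copy one rule once per combination of feature values, prefixing every symbol."""
    copies = []
    for values in product(*features.values()):
        prefix = "".join(values)
        copies.append(f"{prefix}{lhs} -> " + " ".join(prefix + sym for sym in rhs))
    return copies

# Number only: the two copies shown on the slide.
number_only = split_rule("S", ("NP", "VP"), {"number": ["Sg", "Pl"]})

# Adding a second (assumed) feature multiplies the number of copies.
number_and_person = split_rule(
    "S", ("NP", "VP"), {"number": ["Sg", "Pl"], "person": ["1", "2", "3"]}
)
```

One rule becomes two with number alone and six with number and person, and the same multiplication applies to every rule that mentions an agreeing constituent.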
Limitations of CFGs
● CFGs account for substantial basic syntactic structure in English.
● Some problems can be dealt with adequately, although not elegantly, by staying within the CFG framework.
● There are simpler, more elegant solutions that move beyond the CFG framework (beyond its formal power): LFG, HPSG, Construction grammar, XTAG, etc.
● But these lose the computational advantages of CFGs.
Grammatical relations
● Based on the syntactic tree, and the phrase heads derived from it, we can easily identify important grammatical relations:
○ Subject
○ Object (direct/indirect)
○ Modifier dependencies (temporal, appositional, etc.)
○ ...
Summary
● Context-free grammars can be used to model various facts about the syntax of a language.
● When paired with parsers, such grammars constitute a critical component in many applications.
● Constituency is a key phenomenon easily captured with CFG rules.
● But agreement and subcategorization do pose significant problems.
Parsing
● Parsing with CFGs refers to the task of assigning proper trees to input strings
○ A proper tree covers all and only the elements of the input and has an S at the top
○ It doesn’t actually mean that the system can select the correct tree from among all the possible trees
● Parsing involves a search, which involves making choices
○ The search space: a sentence can have an exponential number of parses
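To get a feel for the size of the search space, one can count the distinct binary bracketings of n words, which is the Catalan number C(n-1). This is an illustration rather than a statement about any particular grammar: a real grammar licenses only some of these shapes, but it also multiplies them by the possible labelings. The function name binary_bracketings is illustrative.

```python
from math import comb

def binary_bracketings(num_words: int) -> int:
    """Number of distinct binary tree shapes over `num_words` leaves (Catalan number C(n-1))."""
    n = num_words - 1
    return comb(2 * n, n) // (n + 1)

# Counts for sentences of 2, 5, 10 and 20 words.
counts = {n: binary_bracketings(n) for n in (2, 5, 10, 20)}
```

The count grows from 14 shapes for five words to 4862 for ten words, which is why a parser cannot simply enumerate candidate trees one by one.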
Top-down search
● Idea: we’re trying to find trees rooted with an S (Sentences), why not start with the rules that generate an S?
● Then we can work our way down from there to the words.
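A minimal sketch of top-down search as a backtracking recognizer that starts from S and expands rules until it reaches the words. The toy grammar is an assumption of this sketch; it deliberately avoids left-recursive rules such as Nominal -> Nominal Noun, which would send naive top-down expansion into an infinite loop.

```python
from typing import Dict, Iterator, List, Tuple

# Toy grammar (hypothetical), without left recursion.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Noun"), ("ProperNoun",)],
    "VP": [("Verb", "NP"), ("Verb",)],
    "Det": [("the",), ("a",)],
    "Noun": [("flight",), ("plane",)],
    "ProperNoun": [("John",)],
    "Verb": [("book",), ("left",)],
}

def expand(symbols: Tuple[str, ...], words: List[str], pos: int) -> Iterator[int]:
    """Yield every input position reachable after deriving `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:                    # terminal: must match the next word
        if pos < len(words) and words[pos] == first:
            yield from expand(rest, words, pos + 1)
        return
    for rhs in GRAMMAR[first]:                  # non-terminal: try each rule in turn
        yield from expand(rhs + rest, words, pos)

def recognize(sentence: str) -> bool:
    """True if some expansion of S covers all and only the words of the sentence."""
    words = sentence.split()
    return any(end == len(words) for end in expand(("S",), words, 0))
```

For "book the flight" the recognizer first tries S -> NP VP, fails because no NP starts with "book", and only then succeeds with S -> VP. That wasted first attempt is an expansion that was never consistent with the input words.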
Bottom-up search
● Idea: We also want trees that cover the input words. So we might also start with sub-trees that contain all the words in the right way.
● Then work our way up from there to larger and larger trees.
Pros and cons
● Top-down:
○ Only searches for trees that can be valid sentences.
○ But also explores trees that are not consistent with any of the input words.
● Bottom-up:
○ Only forms trees consistent with the input words.
○ But explores structures which won't lead to valid sentences.
● Many combinations possible
● Shared goal: avoid redoing work already done (shared sub-problems, dynamic programming).
CKY
● Bottom-up control
● Limit grammar to binary rules
● Idea:
○ A → B C
○ If there is an A somewhere in the input then there must be a B followed by a C in the input.
○ If the A spans from i to j in the input then there must be some k such that i < k < j, with the B spanning i to k and the C spanning k to j.
CNF grammar
● If the grammar is not binary it needs to be converted to Chomsky Normal Form (CNF).
● Any arbitrary CFG can be rewritten into CNF automatically.
a. The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
b. The resulting derivations (trees) are different, but can be transformed back to the original CFG
CNF transform
● CNF: Rules can expand to either 2 non-terminals or to a single terminal:
○ A → B C
○ A → w
● Binarization: Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules:
○ S → A B C becomes
○ S → X C, and X → A B (X not in the grammar)
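The binarization step can be sketched as follows. This covers only the splitting of long rules; a full CNF conversion also has to handle unit productions and rules that mix terminals with non-terminals. The function name binarize and the fresh symbol names X1, X2, ... are illustrative, and the sketch assumes those names do not already occur in the grammar.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (A, beta) stands for A -> beta

def binarize(rules: List[Rule]) -> List[Rule]:
    """Replace every rule with more than two right-hand symbols by a chain of binary rules."""
    result: List[Rule] = []
    fresh = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"X{fresh}"              # assumed not to clash with existing symbols
            result.append((new_symbol, rhs[:2]))  # X -> first two symbols
            rhs = (new_symbol,) + rhs[2:]         # continue with the shortened right-hand side
        result.append((lhs, rhs))
    return result

binary_rules = binarize([("S", ("A", "B", "C")), ("NP", ("Det", "Nominal"))])
```

For S → A B C this produces X1 → A B and S → X1 C, while the already binary NP rule is left unchanged.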
CKY
● Builds a table so that an A spanning from i to j in the input is placed in cell [i,j].
○ An S spanning an entire string will sit in cell [0, n]
● Bottom-up construction: the parts of A must go from i to k and from k to j, for some k.
○ For a rule like A → B C we should look for a B in [i,k] and a C in [k,j].
○ If there might be an A spanning i,j in the input AND
○ A → B C is a rule in the grammar THEN
○ There must be a B in [i,k] and a C in [k,j], for i<k<j
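The table construction can be sketched as a small recognizer. The slides do not list the grammar, so LEXICAL and BINARY below are an assumed CNF fragment chosen to be consistent with the sentence "Book the flight through Houston"; the function name cky_table is also illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed CNF fragment (not given on the slides).
LEXICAL: Dict[str, Set[str]] = {          # word -> {A : A -> word}
    "Book": {"S", "VP", "Verb", "Nominal", "Noun"},
    "the": {"Det"},
    "flight": {"Nominal", "Noun"},
    "through": {"Prep"},
    "Houston": {"NP", "ProperNoun"},
}
BINARY: Dict[Tuple[str, str], Set[str]] = {   # (B, C) -> {A : A -> B C}
    ("Det", "Nominal"): {"NP"},
    ("Verb", "NP"): {"S", "VP", "X2"},
    ("X2", "PP"): {"S", "VP"},
    ("Nominal", "PP"): {"Nominal"},
    ("Nominal", "Noun"): {"Nominal"},
    ("Prep", "NP"): {"PP"},
    ("NP", "VP"): {"S"},
}

def cky_table(words: List[str]) -> Dict[Tuple[int, int], Set[str]]:
    """Fill the CKY table; table[i, j] holds every non-terminal spanning words[i:j]."""
    n = len(words)
    table: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for j in range(1, n + 1):                   # columns, left to right
        table[j - 1, j] |= LEXICAL.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):          # rows, bottom to top
            for k in range(i + 1, j):           # split point with i < k < j
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= BINARY.get((b, c), set())
    return table

words = "Book the flight through Houston".split()
table = cky_table(words)
accepted = "S" in table[0, len(words)]
```

The three nested loops over j, i and k give the cubic running time in the sentence length, and every cell is computed once and then reused by all larger spans.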
CKY example
Book the flight through Houston
The slides fill in the non-empty cells in this order: [0,1], [1,2], [2,3], [1,3], [0,3], [3,4], [4,5], [3,5], [2,5], [1,5], [0,5]. Completed table ("-" marks an empty cell):
[0,1] S, VP, Verb, Nominal, Noun | [0,2] - | [0,3] S, VP, X2 | [0,4] - | [0,5] S, VP, X2
[1,2] Det | [1,3] NP | [1,4] - | [1,5] NP
[2,3] Nominal, Noun | [2,4] - | [2,5] Nominal
[3,4] Prep | [3,5] PP
[4,5] NP, ProperNoun
ACCEPTED!
CKY parsing
Book the flight through Houston
[0,1] S, VP, Verb, Nominal, Noun | [0,2] - | [0,3] S, VP, X2 | [0,4] - | [0,5] S1, S2, VP, X2
[1,2] Det | [1,3] NP | [1,4] - | [1,5] NP
[2,3] Nominal, Noun | [2,4] - | [2,5] Nominal
[3,4] Prep | [3,5] PP
[4,5] NP, ProperNoun
1- Add backpointers to constituents
2- Allow multiple versions of the same non-terminal
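Both additions can be sketched together: each cell maps a non-terminal to a list of backpointers, so the same non-terminal can be recorded several times, and trees are read off by following the pointers. The grammar fragment is an assumption (the slides do not list it), and the names cky_backpointers and trees are illustrative.

```python
from collections import defaultdict

# Assumed CNF fragment (not given on the slides).
LEXICAL = {
    "Book": ["S", "VP", "Verb", "Nominal", "Noun"],
    "the": ["Det"],
    "flight": ["Nominal", "Noun"],
    "through": ["Prep"],
    "Houston": ["NP", "ProperNoun"],
}
BINARY = {
    ("Det", "Nominal"): ["NP"],
    ("Verb", "NP"): ["S", "VP", "X2"],
    ("X2", "PP"): ["S", "VP"],
    ("Nominal", "PP"): ["Nominal"],
    ("Prep", "NP"): ["PP"],
}

def cky_backpointers(words):
    """back[i, j][A] lists every way to build A over words[i:j]: a word or a triple (k, B, C)."""
    n = len(words)
    back = defaultdict(lambda: defaultdict(list))
    for j in range(1, n + 1):
        for a in LEXICAL.get(words[j - 1], ()):
            back[j - 1, j][a].append(words[j - 1])
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b in list(back[i, k]):
                    for c in list(back[k, j]):
                        for a in BINARY.get((b, c), ()):
                            back[i, j][a].append((k, b, c))   # one entry per version of A
    return back

def trees(back, i, j, label):
    """Enumerate bracketed trees for `label` over span (i, j) by following backpointers."""
    found = []
    for entry in back[i, j][label]:
        if isinstance(entry, str):                # lexical entry: A -> word
            found.append(f"({label} {entry})")
        else:                                     # binary entry: A -> B C split at k
            k, b, c = entry
            for left in trees(back, i, k, b):
                for right in trees(back, k, j, c):
                    found.append(f"({label} {left} {right})")
    return found

words = "Book the flight through Houston".split()
back = cky_backpointers(words)
parses = trees(back, 0, len(words), "S")
```

Under this assumed grammar the S entry in cell [0,5] has two versions: one attaches the PP "through Houston" to the noun "flight", the other attaches it to the verb phrase through the intermediate symbol X2.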
Limitations
● CKY populates the table with unwarranted constituents.
○ On their own they are constituents, but they cannot really occur in the context in which they are being suggested.
● Alternative: top-down control strategy (Earley algorithm)
● Add some kind of filtering that blocks constituents where they cannot occur in a final analysis.
Ambiguity
● Both CKY and Earley will result in multiple S structures for the [0,N] table entry.
● They both efficiently store the sub-parts that are shared between multiple parses.
● And they obviously avoid re-deriving those sub-parts.
● But neither can tell us which one is right.
● We’ll try to model that with probabilities.
Final thought
“One morning I shot an elephant in my pajamas. How he got into my pajamas I'll never know.”
Groucho Marx
There is more beyond syntax...