Grammars and parsing Intro to NLP - ETHZ - 05/05/2014

Upload: trantruc

Post on 11-Jun-2018

229 views

Category:

Documents


0 download

TRANSCRIPT

Grammars and parsingIntro to NLP - ETHZ - 05/05/2014

Parsing

5 + 2 * 7 = ?
1. ((5 + 2) * 7) ?
2. (5 + (2 * 7)) ?

5 + 2 * 7 = (5 + (2 * 7)) = 19

[Figure: parse tree with + at the root; its children are 5 and *, and the children of * are 2 and 7]
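Python's own parser resolves the same expression the same way, which can be checked with the standard `ast` module. A minimal sketch (the variable names are illustrative only):

```python
import ast

# Parse the expression into Python's abstract syntax tree.
tree = ast.parse("5 + 2 * 7", mode="eval")
root = tree.body  # the top-level binary operation

# The root is the addition; the multiplication is nested in its right operand,
# which corresponds to the bracketing (5 + (2 * 7)).
root_is_add = isinstance(root, ast.BinOp) and isinstance(root.op, ast.Add)
right_is_mult = isinstance(root.right, ast.BinOp) and isinstance(root.right.op, ast.Mult)

# Evaluate the parsed tree.
value = eval(compile(tree, "<expr>", "eval"))
print(root_is_add, right_is_mult, value)
```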

Partial/shallow parsing (chunking)

Goal: Identify the basic non-recursive (N/V/A/P) phrases of a sentence (chunking):

● flat/non-overlapping
● segmentation+labeling task

"[The morning flight]NP from [Denver]NP [has arrived]VP"

BIO + sequence models

American Airlines , a unit of AMR Inc.

[American Airlines]NP , [a unit]NP of [AMR Inc.]NP

American/B-NP Airlines/I-NP ,/O a/B-NP unit/I-NP of/O AMR/B-NP Inc./I-NP
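The bracketed chunks map to BIO tags mechanically: the first token of a chunk gets B-, the remaining tokens get I-, and everything outside a chunk gets O. A small sketch of that conversion, assuming chunks are given as (label, tokens) pairs (this input format and the function name `chunks_to_bio` are made up for the illustration):

```python
def chunks_to_bio(chunks):
    """Convert (label, tokens) chunks to per-token BIO tags.

    A label of None marks tokens outside any chunk (tag "O").
    """
    tagged = []
    for label, tokens in chunks:
        for position, token in enumerate(tokens):
            if label is None:
                tag = "O"
            elif position == 0:
                tag = "B-" + label   # first token of the chunk
            else:
                tag = "I-" + label   # token inside the chunk
            tagged.append((token, tag))
    return tagged

chunks = [
    ("NP", ["American", "Airlines"]),
    (None, [","]),
    ("NP", ["a", "unit"]),
    (None, ["of"]),
    ("NP", ["AMR", "Inc."]),
]
bio = chunks_to_bio(chunks)
print(" ".join(f"{token}/{tag}" for token, tag in bio))
```

Once every token carries one tag, chunking becomes an ordinary sequence labeling problem, which is why sequence models apply.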

Full parsing

American Airlines, a unit of AMR Inc.

Overview

1. Today:
a. Grammars
b. Parsing

2. Next class:
a. Statistical Parsing
b. Dependency Parsing

J&M Ch. 12-14

Sentence

● A unit of one or more words expressing:
1. Statements (declarative): "Cats like milk."
2. Commands (imperative): "Leave now!"
3. Requests for information (question):
○ Yes/no questions: "Did the plane leave?"
○ WH questions: "When did the train leave?"
○ How-to questions: "How do you remove html tags in c++?"
○ ...
4. ...

Typically has a subject and a predicate, and is marked by specific punctuation in writing (intonation in speech), etc.

Syntactic analysis

● Goal: understanding the principles of sentence structure
○ Grammar
○ Parsing
● Applications:
○ Dialogue management
○ Question answering
○ Information extraction
○ Machine translation
○ Summarization
○ Text compression
○ ...

Layers of structured annotations

PoS tagging/disambiguation

Named entity recognition

Dependency parsing

Semantic role labeling

Who did what to whom?

Why?

When? ...

Syntax

● Key concepts:
○ Constituency
○ Heads
○ Subcategorization and agreement
○ Grammatical relations and dependency
● Key formalism:
○ Context-free grammars
○ Advantages and limitations

Constituency

● Basic idea: groups of words within utterances act as a single unit (phrase)

● These units, in a given language, form coherent classes that behave in similar ways
○ With respect to their internal structure; e.g., noun phrases are often made of a determiner preceding a nominal phrase.
○ With respect to other units in the language; e.g., noun phrases tend to precede verbs.
○ NPs, VPs, PPs, ...

Constituency

● It makes sense to say that the following are all noun phrases (NPs) in English:
○ "Harry the horse", "the Broadway coppers", "they", "the reason he comes into the Hot Box", "three parties from Brooklyn", "a high-class spot such as Mindy's", ...
● Why?
○ External evidence: they can all precede verbs, ...
○ Internal evidence: the most important word is a noun, the first element is a determiner, ...

Constituency

● External evidence:
○ three parties from Brooklyn arrive ...
○ the Broadway coppers love ...
○ a high-class spot such as Mindy's attracts ...
● Not true of individual words:
○ * from arrive ...
○ * the love ...
● Movement:
○ On Sept. 17th I'd like to fly from Atlanta to Denver
○ I'd like to fly on Sept. 17th from Atlanta to Denver
○ I'd like to fly from Atlanta to Denver on Sept. 17th
○ *On I'd like to fly Sept. 17th from Atlanta to Denver
○ *I'd like to fly from on Sept. 17th Atlanta to Denver

Grammars and constituency

● What is the right set of constituents and rules that govern how they combine?

● Many different theories of grammar and competing analyses of the same data.

● The approach to grammar, and the analyses, adopted in NLP, is often generic and agnostic with respect to linguistic theories of grammar.

● CFGs can naturally model many syntactic phenomena in computationally tractable ways.

Context-free grammars

● CFGs, aka phrase structure grammars (PSGs) or Backus-Naur form (BNF), consist of:
a. Terminals (words):
■ a, the, flight
b. Non-terminals: the constituents in a language; e.g., noun phrase, verb phrase and sentence
■ Det, Nominal, NP, Noun, ProperNoun
c. Productions: rules that consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
■ NP -> Det Nominal

Formal definition

CFG = (N, Z, R, S):
1. N - a set of non-terminal symbols (or variables, capital letters A, B, ...)
2. Z - a set of terminal symbols (disjoint from N, lowercase roman letters)
3. R - a set of rules/productions of the form A -> β, where A is a non-terminal and β is a string from (N ∪ Z)*
4. S - a start symbol
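The four components translate directly into a data structure. The sketch below is one possible encoding, not a standard API: the class name `CFG`, its field names, and the choice of NP as start symbol for the toy fragment are all assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Z
    rules: tuple              # R: pairs (A, beta), beta a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Check the conditions stated in the formal definition."""
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)   # Z disjoint from N
            and self.start in self.nonterminals            # S is a non-terminal
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.rules                 # A -> beta, beta in (N ∪ Z)*
            )
        )

# Toy noun-phrase fragment; NP as start symbol is an assumption for this example.
np_grammar = CFG(
    nonterminals=frozenset({"NP", "Det", "Nominal", "Noun", "ProperNoun"}),
    terminals=frozenset({"a", "the", "flight", "John"}),
    rules=(
        ("NP", ("Det", "Nominal")),
        ("NP", ("ProperNoun",)),
        ("Nominal", ("Noun",)),
        ("Nominal", ("Nominal", "Noun")),
        ("Det", ("a",)),
        ("Det", ("the",)),
        ("Noun", ("flight",)),
        ("ProperNoun", ("John",)),
    ),
    start="NP",
)
print(np_grammar.is_well_formed())
```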

Noun Phrases

Productions, example:
● Types of NPs:
○ NP -> Det Nominal
○ NP -> ProperNoun
● Disjunctive, recursive rules:
○ Nominal -> Noun | Nominal Noun
● Lexicon:
○ Det -> a
○ Det -> the
○ Noun -> flight
○ ProperNoun -> John

L0 grammar

L0 lexicon

Generation/Analysis

● The CFG can be used for:
a. Generating strings in the language
b. Rejecting strings not in the language
c. Associating structures (syntactic trees) to strings in the language
● Derivation: a sequence of rules applied to a string such that it:
a. Covers all the elements in the string
b. Covers only the elements in the string
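Generation can be sketched as a breadth-first search over derivations, always rewriting the leftmost non-terminal. The example uses the noun-phrase productions; because Nominal -> Nominal Noun is recursive, the language is infinite, so the sketch bounds the string length with a `max_len` parameter (an assumption of this example, as are the function and variable names).

```python
from collections import deque

RULES = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"]],
    "ProperNoun": [["John"]],
}

def generate(start, max_len):
    """Return all terminal strings of at most max_len words derivable from start."""
    results = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        index = next((i for i, s in enumerate(form) if s in RULES), None)
        if index is None:
            results.add(" ".join(form))   # only terminals left: a string of the language
            continue
        for rhs in RULES[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            # No rule here shrinks a form, so overly long forms can be pruned.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

strings = generate("NP", max_len=3)
print(sorted(strings))
```

A string such as "flight the" never shows up in the result, which is the rejection side of the same grammar.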

Derivation (parse tree)

English grammar fragment

● Sentences
● Noun phrases
○ Agreement
● Verb phrases
○ Subcategorization
○ Agreement

Sentence types

1. Declarative: "Cats like milk."
a. S -> NP VP
2. Imperative: "Leave!"
a. S -> VP
3. Questions:
a. Yes/no questions: "Did the plane leave?"
i. S -> AUX NP VP
b. WH questions: "When did the train leave?"
i. S -> WH-NP AUX NP VP

The Noun Phrase (NP)

● A phrase whose central element is a noun/nominal denoting entities, events, or other concepts acting as subjects, objects etc.

● This central element of the (any) phrase is called the head.

● We can analyze most NPs based on what comes before the head, and what comes after it, the modifiers.

Pre-modifiers

● Determiners:
○ Articles, numerals, quantifiers
■ the, this, a, an, three, some, many, ...
○ Simple possessives:
■ my, their, John’s (car), ...
○ Complex recursive variants:
■ John’s sister’s husband’s son’s car
● Adjectives and nouns
○ large cars, morning flight, ...
● Ordering constraints:
○ Three large cars
○ ?large three cars

Post-modifiers

● Prepositional phrases
○ "from Seattle"
● Non-finite clauses
○ "arriving before noon"
● Relative clauses
○ "that serve breakfast"
● Appositives:
○ American Airlines, a unit of AMR Inc., ...

Example: head + pre-post rules

Head finding

Agreement

● The constraints that hold among various constituents that take part in a rule or set of rules.
● In English, determiners and the head nouns in NPs have to agree in number:
○ This flight
○ Those flights
○ *This flights
○ *Those flight
● Problem: the kind of rules seen so far don’t capture this constraint; they overgenerate:
○ NP → Det Nominal
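The overgeneration is easy to reproduce: a rule that pairs any determiner with any noun licenses all four combinations. The sketch below encodes number as a feature on each word (the dictionaries and the "sg"/"pl" labels are assumptions of this example) and shows that only a check on that feature separates the grammatical NPs from the starred ones.

```python
from itertools import product

# Lexicon with an assumed number feature per word.
DETERMINERS = {"this": "sg", "those": "pl"}
NOUNS = {"flight": "sg", "flights": "pl"}

# NP -> Det Nominal with no number information: every pairing is licensed.
generated = [f"{det} {noun}" for det, noun in product(DETERMINERS, NOUNS)]

# Requiring the number features to match recovers the grammatical NPs.
grammatical = [
    f"{det} {noun}"
    for det, noun in product(DETERMINERS, NOUNS)
    if DETERMINERS[det] == NOUNS[noun]
]

print(generated)
print(grammatical)
```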

Verb Phrase (VP)

● English VPs consist of a head verb along with 0 or more following constituents called arguments:
○ VP -> Verb
■ disappear
○ VP -> Verb NP
■ prefer a morning flight
○ VP -> Verb NP PP
■ leave Boston in the morning
○ VP -> Verb PP
■ leaving on Thursday

Subcategorization

● Not all verbs are allowed to participate in all VP rules of the grammar, because of semantic constraints:
○ Example: the traditional notion of transitive/intransitive verbs
● Verbs can be subcategorized according to the sets of VP rules that they participate in:

Examples

● Sneeze: VP -> Verb, "John sneezed"
● Find: VP -> Verb NP, "Please find [a flight to NY]"
● Give: VP -> Verb NP NP, "Give [me] [a cheaper fare]"
● Help: VP -> Verb NP PP, "Can you help [me] [with a flight]"
● Prefer: VP -> Verb TO-VP, "I prefer [to leave earlier]"
● Told: VP -> Verb S, "I was told [United has a flight]"
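One way to picture subcategorization is as a lookup from each verb to the argument frames it accepts. The table below lists only the single frame shown for each verb in the examples; real verbs usually allow several frames, and the names `SUBCAT` and `licenses` are hypothetical.

```python
# Each verb maps to the set of argument sequences (frames) it accepts.
# Only the frames from the examples are listed; this is not a full lexicon.
SUBCAT = {
    "sneeze": {()},              # VP -> Verb
    "find": {("NP",)},           # VP -> Verb NP
    "give": {("NP", "NP")},      # VP -> Verb NP NP
    "help": {("NP", "PP")},      # VP -> Verb NP PP
    "prefer": {("TO-VP",)},      # VP -> Verb TO-VP
    "told": {("S",)},            # VP -> Verb S
}

def licenses(verb, arguments):
    """Return True if the verb accepts this sequence of argument categories."""
    return tuple(arguments) in SUBCAT.get(verb, set())

print(licenses("find", ["NP"]))      # "find [a flight to NY]"
print(licenses("sneeze", ["NP"]))    # "*sneezed the book"
print(licenses("prefer", ["S"]))     # "*prefer United has a flight"
```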

Problem

● Our grammar over-generates:
● With respect to subcategorization:
○ *John sneezed the book (S -> NP VP)
○ *I prefer United has a flight
○ *Give with a flight
● And agreement:
○ *John sneeze
○ *I prefers United

As with NP agreement phenomena, we need a way to formally express the constraints.

CFG solution for agreement

● Split rules:
○ SgS -> SgNP SgVP, PlS -> PlNP PlVP
○ SgNP -> SgDet SgNom, PlNP -> PlDet PlNom
○ PlVP -> PlV NP, SgVP -> SgV NP
○ Subcategorization?
● This works and stays within the power of CFGs:
○ But it is not an elegant solution
○ It doesn’t scale all that well, because the interaction among the various constraints explodes the number of rules in our grammar.
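The growth in grammar size can be made concrete with a small sketch that copies each rule once per combination of feature values. It makes a simplifying assumption that every symbol in a rule carries every feature, and the person feature with values 1/2/3 is added purely to show how features multiply; neither is part of the grammar above.

```python
from itertools import product

def split_rules(rules, features):
    """Make one copy of each rule per combination of feature values.

    Simplifying assumption: every symbol in a rule carries every feature.
    """
    value_combinations = list(product(*features.values()))
    split = []
    for lhs, rhs in rules:
        for values in value_combinations:
            prefix = "".join(values)
            split.append((prefix + lhs, tuple(prefix + symbol for symbol in rhs)))
    return split

base = [("S", ("NP", "VP")), ("NP", ("Det", "Nom"))]
number_only = split_rules(base, {"number": ["Sg", "Pl"]})
number_and_person = split_rules(
    base, {"number": ["Sg", "Pl"], "person": ["1", "2", "3"]}
)

print(number_only)
print(len(base), len(number_only), len(number_and_person))
```

Each additional feature multiplies the rule count by its number of values, which is the scaling problem described in the last bullet.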

Limitations of CFGs

● CFGs account for substantial basic syntactic structure in English.

● Some problems can be dealt with adequately, although not elegantly, by staying within the CFG framework.

● There are simpler, more elegant, solutions moving beyond the CFG framework (beyond its formal power): LFG, HPSG, Construction grammar, XTAG, etc.

● But these lose the computational advantages of CFGs

Grammatical relations

● Based on the syntactic tree, and phrase heads derived from it, we can easily identify important grammatical relations:
○ Subject
○ Object (direct/indirect)
○ Modifier dependencies (temporal, appositional, etc.)
○ ...

Example

SUBJ

DOBJ

Summary

● Context-free grammars can be used to model various facts about the syntax of a language.

● When paired with parsers, such grammars constitute a critical component in many applications.

● Constituency is a key phenomenon easily captured with CFG rules.

● But agreement and subcategorization do pose significant problems

Parsing

● Parsing with CFGs refers to the task of assigning proper trees to input strings
○ a tree that covers all and only the elements of the input and has an S at the top
○ It doesn’t actually mean that the system can select the correct tree from among all the possible trees
● Parsing involves a search which involves the making of choices
○ The search space: exponential number of parses for a sentence
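To get a feel for the size of the search space, one can count the binary bracketings of a string of n words, ignoring labels and any particular grammar. That count is the Catalan number C(n-1), which grows exponentially. The sketch below illustrates the growth only; it is not the number of parses a specific grammar would produce, and the function name is made up for the example.

```python
from math import comb

def binary_bracketings(n_words):
    """Number of distinct binary trees over n_words leaves (Catalan number C(n-1))."""
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)

for n_words in (2, 3, 5, 10, 20):
    print(n_words, binary_bracketings(n_words))
```

For three leaves the count is 2, matching the two bracketings of 5 + 2 * 7.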

Top-down search

● Idea: we’re trying to find trees rooted with an S (Sentences), why not start with the rules that generate an S?

● Then we can work our way down from there to the words.
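A minimal top-down recognizer along these lines starts from S, expands rules left to right, and only consults the words when it reaches a part-of-speech category. The grammar and lexicon below are a toy fragment assumed for this sketch; it deliberately leaves out left-recursive rules such as Nominal -> Nominal Noun, which would send this naive strategy into an infinite loop.

```python
# Toy grammar and lexicon, assumed for this illustration only.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "Det": {"the", "a"},
    "Noun": {"flight", "plane"},
    "ProperNoun": {"John"},
    "Verb": {"book", "leave", "left"},
}

def expand(symbol, words, start):
    """Yield every position where a constituent `symbol` starting at `start` can end."""
    if symbol in LEXICON:
        # Part-of-speech category: check the next input word.
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Expand the right-hand side left to right, tracking reachable positions.
        positions = [start]
        for child in rhs:
            positions = [end for pos in positions for end in expand(child, words, pos)]
        yield from positions

def recognize(sentence):
    """Return True if some S spans the whole sentence."""
    words = sentence.split()
    return len(words) in expand("S", words, 0)

print(recognize("book the flight"))
print(recognize("John left"))
print(recognize("the book flight"))
```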

Top-down search space

Bottom-up search

● Idea: We also want trees that cover the input words. So we might also start with sub-trees that contain all the words in the right way.

● Then work our way up from there to larger and larger trees.

Bottom-up search space

Pros and cons

● Top-down:
○ Only searches for trees that can be valid sentences.
○ But also explores trees that are not consistent with any of the input words.
● Bottom-up:
○ Only forms trees consistent with the input words.
○ But explores structures which won't lead to valid sentences.
● Many combinations possible
● Shared goal: avoid redoing work already done (shared sub-problems, DP).

CKY

● Bottom-up control
● Limit grammar to binary rules
● Idea:
○ A → B C
○ If there is an A somewhere in the input then there must be a B followed by a C in the input.
○ If the A spans from i to j in the input then there must be some k s.t. i<k<j, with the B spanning from i to k and the C from k to j

CNF grammar

● If the grammar is not binary it needs to be converted to Chomsky Normal Form (CNF).
● Any CFG (that does not generate the empty string) can be rewritten into CNF automatically.
a. The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
b. The resulting derivations (trees) are different, but can be transformed back to those of the original CFG

CNF transform

● CNF: Rules can expand to either 2 non-terminals or to a single terminal:
○ A → B C
○ A → w
● Binarization: Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules:
○ S → A B C becomes
○ S → X C, and X → A B (X not in the grammar)
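The binarization step can be sketched in a few lines. This covers only the splitting of long rules, not the other parts of a full CNF conversion, and the generated names X1, X2, ... are assumed not to clash with symbols already in the grammar.

```python
def binarize(rules):
    """Split rules with more than two right-hand-side symbols into binary rules."""
    binary_rules = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            new_symbol = f"X{counter}"  # assumed not to clash with existing symbols
            # Fold the two leftmost symbols into a new intermediate non-terminal.
            binary_rules.append((new_symbol, (rhs[0], rhs[1])))
            rhs = [new_symbol] + rhs[2:]
        binary_rules.append((lhs, tuple(rhs)))
    return binary_rules

print(binarize([("S", ("A", "B", "C"))]))
```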

L1 grammar + lexicon

L1 in CNF

CKY

● Builds a table so that an A spanning from i to j in the input is placed in cell [i,j].
○ An S spanning an entire string will sit in cell [0, n]
● Bottom-up construction: the parts of A must go from i to k and from k to j, for some k.
○ For a rule like A → B C we should look for a B in [i,k] and a C in [k,j].
○ If there might be an A spanning i,j in the input AND
○ A → B C is a rule in the grammar THEN
○ There must be a B in [i,k] and a C in [k,j], for i<k<j
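The table-filling procedure can be sketched as a recognizer. The CNF fragment below is an assumption of this example: a small set of rules chosen so that the sentence "Book the flight through Houston" can be recognized, not the full course grammar.

```python
from collections import defaultdict

# Assumed CNF fragment: lexical rules A -> w and binary rules A -> B C.
LEXICAL = {
    "book": {"S", "VP", "Verb", "Nominal", "Noun"},
    "the": {"Det"},
    "flight": {"Nominal", "Noun"},
    "through": {"Prep"},
    "houston": {"NP", "ProperNoun"},
}
BINARY = [
    ("S", "Verb", "NP"), ("S", "VP", "PP"), ("S", "X2", "PP"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"), ("VP", "X2", "PP"),
    ("X2", "Verb", "NP"),
    ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
    ("PP", "Prep", "NP"),
]

def cky_table(words):
    """Fill cell [i, j] with every non-terminal that spans words i..j."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                 # columns, left to right
        table[j - 1, j] |= LEXICAL.get(words[j - 1].lower(), set())
        for i in range(j - 2, -1, -1):        # rows, bottom to top
            for k in range(i + 1, j):         # split points, i < k < j
                for a, b, c in BINARY:
                    if b in table[i, k] and c in table[k, j]:
                        table[i, j].add(a)
    return table

words = "Book the flight through Houston".split()
table = cky_table(words)
print(sorted(table[0, len(words)]))
print("accepted" if "S" in table[0, len(words)] else "rejected")
```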

CKY algorithm


CKY example

Book the flight through Houston

Cells in the order they are filled (left to right by column, bottom to top within a column):

[0,1] S, VP, Verb, Nominal, Noun
[1,2] Det
[0,2] (empty)
[2,3] Nominal, Noun
[1,3] NP
[0,3] S, VP, X2
[3,4] Prep
[2,4] (empty)
[1,4] (empty)
[0,4] (empty)
[4,5] NP, ProperNoun
[3,5] PP
[2,5] Nominal
[1,5] NP
[0,5] S, VP, X2

ACCEPTED!

Dynamic programming

CKY parsing

Book the flight through Houston

1- Add backpointers to constituents
2- Allow multiple versions of the same non-terminal

[0,1] S, VP, Verb, Nominal, Noun
[0,2] (empty)
[0,3] S, VP, X2
[0,4] (empty)
[0,5] S1, S2, VP, X2
[1,2] Det
[1,3] NP
[1,4] (empty)
[1,5] NP
[2,3] Nominal, Noun
[2,4] (empty)
[2,5] Nominal
[3,4] Prep
[3,5] PP
[4,5] NP, ProperNoun
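Both changes can be sketched together: instead of a set of labels, each cell entry keeps a list of backpointers, one per way the non-terminal was built, and trees are read off by following them. The CNF fragment is again an assumption of this example, so the number of S trees it yields reflects these rules only; the trees are CNF trees, not trees of the original grammar.

```python
from collections import defaultdict

# Assumed CNF fragment: lexical rules A -> w and binary rules A -> B C.
LEXICAL = {
    "book": {"S", "VP", "Verb", "Nominal", "Noun"},
    "the": {"Det"},
    "flight": {"Nominal", "Noun"},
    "through": {"Prep"},
    "houston": {"NP", "ProperNoun"},
}
BINARY = [
    ("S", "Verb", "NP"), ("S", "VP", "PP"), ("S", "X2", "PP"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"), ("VP", "X2", "PP"),
    ("X2", "Verb", "NP"),
    ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
    ("PP", "Prep", "NP"),
]

def cky_chart(words):
    """Map (i, j, A) to the ways A was built: (k, B, C) backpointers, or the word itself."""
    n = len(words)
    chart = defaultdict(list)
    for j in range(1, n + 1):
        for a in sorted(LEXICAL.get(words[j - 1].lower(), ())):
            chart[j - 1, j, a].append(words[j - 1])
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if (i, k, b) in chart and (k, j, c) in chart:
                        # Keep every derivation, not just the first one found.
                        chart[i, j, a].append((k, b, c))
    return chart

def trees(chart, i, j, a):
    """Yield every tree for non-terminal a over span [i, j] as a bracketed string."""
    for entry in chart[i, j, a]:
        if isinstance(entry, str):
            yield f"({a} {entry})"
        else:
            k, b, c = entry
            for left in trees(chart, i, k, b):
                for right in trees(chart, k, j, c):
                    yield f"({a} {left} {right})"

words = "Book the flight through Houston".split()
chart = cky_chart(words)
parses = list(trees(chart, 0, len(words), "S"))
for parse in parses:
    print(parse)
```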

Limitations

● CKY populates the table with unwarranted constituents.
○ These are constituents by themselves, but cannot really occur in the context in which they are being suggested.
● Alternative: top-down control strategy (Earley algorithm)
● Add some kind of filtering that blocks constituents where they cannot happen in a final analysis.

Ambiguity

How do we find the correct parse among all valid parse trees returned?

Ambiguity

● Both CKY and Earley will result in multiple S structures for the [0,N] table entry.

● They both efficiently store the sub-parts that are shared between multiple parses.

● And they obviously avoid re-deriving those sub-parts.

● But neither can tell us which one is right.
● We’ll try to model that with probabilities.

Final thought

“One morning I shot an elephant in my pajamas. How he got into my pajamas I'll never know.”
Groucho Marx

There is more beyond syntax...
