parsing context-free grammars


Upload: theta

Post on 14-Jan-2016

Page 1: Parsing context-free grammars

Parsing context-free grammars

Context-free grammars specify structure, not process.

There are many different ways to parse input in accordance with a given context-free grammar.

We will review
– a top-down parsing algorithm
– a bottom-up parsing algorithm

We will present the Earley algorithm.

Page 2: Parsing context-free grammars

A simple grammar

S → NP VP
S → Aux NP VP
S → VP
NP → Det Nominal
NP → ProperNoun
Nominal → Noun
Nominal → Noun Nominal
Nominal → Nominal PP
VP → Verb
VP → Verb NP
PP → Prep NP

Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
Prep → from | to | on
ProperNoun → Houston | TWA

Figure 10.2
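One possible way to hold this grammar in a program is as plain Python data. The sketch below is only an illustration: the names `GRAMMAR`, `LEXICON`, and `pos_tags` are assumptions made for this example, and the PP rule is written with `Prep`, assuming that is the category the slide intends.

```python
# Phrase-structure rules: each nonterminal maps to its possible expansions.
# (GRAMMAR and LEXICON are hypothetical names chosen for this sketch.)
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}

# Lexical rules: each part-of-speech (POS) category maps to its words.
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def pos_tags(word):
    """Return every POS category that the lexicon lists for a word."""
    return sorted(
        pos for pos, words in LEXICON.items()
        if word in words or word.lower() in words
    )


print(pos_tags("Book"))  # ['Noun', 'Verb']
```

The lookup makes the lexical ambiguity of "Book" visible before any parsing starts.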

Page 3: Parsing context-free grammars

Bottom-up parsing

Yngve (1955) presented a bottom-up algorithm

Example (figure 10.4): Book that flight.

Page 4: Parsing context-free grammars

Book is ambiguous – there are two possible POS tags for the word “Book”.

[Noun Book] [Det that] [Noun flight]

[Verb Book] [Det that] [Noun flight]

Look up words in lexicon

Page 5: Parsing context-free grammars

[NOM [Noun Book]] [Det that] [NOM [Noun flight]]

[Verb Book] [Det that] [NOM [Noun flight]]

Build structure from bottom up

Page 6: Parsing context-free grammars

Now we have three possible structures:

[NOM [Noun Book]] [NP [Det that] [NOM [Noun flight]]]

[VP [Verb Book]] [Det that] [NOM [Noun flight]]

[Verb Book] [NP [Det that] [NOM [Noun flight]]]

Build structure from bottom up

Page 7: Parsing context-free grammars

The Noun interpretation of Book leads to a dead end, so only two parse trees survive:

[VP [Verb Book]] [NP [Det that] [NOM [Noun flight]]]

[VP [Verb Book] [NP [Det that] [NOM [Noun flight]]]]

Build structure from bottom up

Page 8: Parsing context-free grammars

There is no way to combine a VP and an NP to form an S, so only one parse tree survives:

[S [VP [Verb Book] [NP [Det that] [NOM [Noun flight]]]]]

Build structure from bottom up
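The bottom-up walk-through above can be sketched as an exhaustive search over reductions. This is a minimal illustration under stated assumptions, not the algorithm from the text: it only recognizes (it builds no trees), the names `RULES`, `LEXICON`, and `bottom_up_recognize` are hypothetical, and the lexicon is cut down to the three words of the example.

```python
from itertools import product

# (left-hand side, right-hand side) pairs for the grammar of Figure 10.2.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("Nominal", ("Nominal", "PP")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("PP", ("Prep", "NP")),
]

# Word -> possible POS tags (only the words of the example sentence).
LEXICON = {"book": ["Noun", "Verb"], "that": ["Det"], "flight": ["Noun"]}


def bottom_up_recognize(sentence):
    """Return True if some sequence of reductions turns the input into S."""
    words = sentence.lower().split()
    # First ply: one symbol sequence per combination of POS tags.
    agenda = list(product(*(LEXICON.get(w, []) for w in words)))
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if seq == ("S",):
            return True
        # Try every rule at every position; each match is one reduction.
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(seq) - n + 1):
                if seq[i:i + n] == rhs:
                    reduced = seq[:i] + (lhs,) + seq[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize("Book that flight"))  # True
print(bottom_up_recognize("that flight"))       # False
```

The search also visits sequences such as `('Nominal', 'NP')`, which correspond to the dead ends shown on the slides.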

Page 9: Parsing context-free grammars

When parsing top-down, we start with the grammar’s start symbol and apply productions to try to match input:

S

Book that flight

Build structure from top down

Page 10: Parsing context-free grammars

Here we show only the successful choices:

[S VP]

Book that flight

Build structure from top down

Page 11: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb NP]]

Book that flight

Build structure from top down

Page 12: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb NP]]

Book that flight

Build structure from top down

Page 13: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb [NP Det NOM]]]

Book that flight

Build structure from top down

Page 14: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb [NP Det NOM]]]

Book that flight

Build structure from top down

Page 15: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb [NP Det [NOM Noun]]]]

Book that flight

Build structure from top down

Page 16: Parsing context-free grammars

Here we show only the successful choices:

[S [VP Verb [NP Det [NOM Noun]]]]

Book that flight

Build structure from top down
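The top-down strategy shown on these slides can be sketched as a depth-first, backtracking recursive-descent parser. The code below is an illustration under assumptions: the function names are hypothetical, words are lower-cased, and the left-recursive rule Nominal → Nominal PP is deliberately left out, because a naive top-down parser would loop forever on it.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],  # Nominal -> Nominal PP omitted
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "ProperNoun": {"houston", "twa"},
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can cover words[pos:next_pos]."""
    if symbol in LEXICON:  # POS category: must match the next input word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production in turn (backtracking)
        yield from expand_sequence(symbol, rhs, words, pos, [])


def expand_sequence(lhs, rhs, words, pos, children):
    """Expand the symbols of one right-hand side from left to right."""
    if not rhs:
        yield (lhs, *children), pos
        return
    for child, next_pos in expand(rhs[0], words, pos):
        yield from expand_sequence(lhs, rhs[1:], words, next_pos, children + [child])


def parse(sentence):
    """Return all trees rooted in S that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


for tree in parse("Book that flight"):
    print(tree)
```

Before it reaches S → VP, the parser tries S → NP VP and S → Aux NP VP and abandons both, which are the unsuccessful choices the slides do not show.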

Page 17: Parsing context-free grammars

Top-down versus bottom-up approaches

Top-down advantages
– Doesn’t explore trees which cannot be S
– Subtrees fit under S

Top-down disadvantages
– Many fruitless trees are explored: trees explored may have no hope of matching input

Bottom-up advantages
– All trees explored are consistent with input

Bottom-up disadvantages
– Builds structure even if S cannot be formed
– Builds neighboring structures which can never combine

Page 18: Parsing context-free grammars

Approaches to dealing with ambiguity

– parallel exploration
– depth-first strategy with backtracking

Page 19: Parsing context-free grammars

Improving top-down parsing

Make top-down parser pay attention to input with bottom-up filtering (left-corner parsing)

“The parser should not consider any grammar rule if the current input cannot serve as the first word along the left edge of some derivation from this rule.” [pg. 369]

Left corners are pre-compiled.
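Pre-compiling left corners can be sketched as a fixed-point computation over the grammar. This is an illustrative sketch, not the procedure from the text: the names `POS`, `GRAMMAR`, and `left_corners` are assumptions, the PP rule is written with `Prep`, and the grammar is assumed to have no empty productions.

```python
POS = {"Det", "Noun", "Verb", "Aux", "Prep", "ProperNoun"}

GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}


def left_corners(grammar):
    """Map each nonterminal to the POS categories that can begin it."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for nt, expansions in grammar.items():
            for rhs in expansions:
                first = rhs[0]
                new = {first} if first in POS else table[first]
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table


table = left_corners(GRAMMAR)
print(sorted(table["S"]))   # ['Aux', 'Det', 'ProperNoun', 'Verb']
print(sorted(table["NP"]))  # ['Det', 'ProperNoun']
```

With this table, a top-down parser looking at "Book" (Noun or Verb) can skip S → NP VP immediately, because neither tag is a left corner of NP.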

Page 20: Parsing context-free grammars

Problems with top-down parsers

– left recursion: X ⇒* X α (X derives a string that begins with X). Infinite loop in derivation!
– ambiguity: not efficiently handled
– recomputation: subtrees can be built multiple times (built, then thrown away during backtracking)
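To see which rules of the example grammar trigger the left-recursion problem, one can check which nonterminals can reach themselves through the first symbol of their expansions. The sketch below is an assumption-laden illustration (hypothetical function name, no empty productions in the grammar), not a procedure from the text.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}


def left_recursive(grammar):
    """Return the nonterminals X for which X derives a string starting with X."""
    result = set()
    for start in grammar:
        stack, seen = [start], set()
        while stack:
            nt = stack.pop()
            for rhs in grammar.get(nt, []):
                first = rhs[0]
                if first == start:  # came back to where we started
                    result.add(start)
                if first in grammar and first not in seen:
                    seen.add(first)
                    stack.append(first)
    return result


print(left_recursive(GRAMMAR))  # {'Nominal'}
```

Only Nominal → Nominal PP is flagged; a depth-first top-down parser that expands this rule first never consumes any input.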

Page 21: Parsing context-free grammars

Earley’s algorithm

Earley’s algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing.

Dynamic programming stores intermediate results so that they never need to be recomputed.

Dynamic programming reduces the exponential time requirement to a polynomial one: O(N^3), where N is the length of the input in words.

Page 22: Parsing context-free grammars

Data structure

Earley’s algorithm uses a data structure called a chart to store information about the progress of the parse.

A chart contains an entry for each position in the input.

A position occurs before the first word, between words, and after the last word.

word1 word2 … wordN

A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).

Page 23: Parsing context-free grammars

Chart details

A chart entry consists of a sequence of states.

A state represents
– a subtree corresponding to a single grammar rule
– information about how much of a rule has been processed
– information about the span of the subtree w.r.t. the input

A state is represented by an annotated grammar rule
– a dot (•) is used to show how much of the rule has been processed
– a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.
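A state as described above could be represented by a small record type. This is a sketch only; the class name `State` and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str     # left-hand side of the grammar rule
    rhs: tuple   # right-hand side symbols
    dot: int     # number of right-hand side symbols already processed
    start: int   # x: position of the left edge of the subtree
    end: int     # y: position of the dot in the input

    def next_symbol(self):
        """Symbol immediately to the right of the dot, or None at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} [{self.start},{self.end}]"


state = State("VP", ("Verb", "NP"), dot=1, start=0, end=1)
print(state)                # VP → Verb • NP [0,1]
print(state.next_symbol())  # NP
```

Making the record immutable and hashable lets a chart entry check cheaply whether a state has already been added.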

Page 24: Parsing context-free grammars

Three operators on a chart

Predictor
– applies when the NonTerminal to the right of the • in a state is not a POS category (i.e. is not a pre-terminal)
– adds states to the current chart entry

Scanner
– applies when the NonTerminal to the right of the • in a state is a POS category (i.e. is a pre-terminal)
– adds states to the next chart entry

Completer
– applies when there is no NonTerminal (and hence no Terminal) to the right of the • in a state (i.e. the • is at the end of the rule)
– adds states to the current chart entry

Page 25: Parsing context-free grammars

Predictor

Suppose the state to which the Predictor applies is:

X → α • NT β [x,y]

The Predictor adds, to the current chart entry, a new state for each possible expansion of NT.

For each expansion EX of NT, the state added is

NT → • EX [y,y]

Page 26: Parsing context-free grammars

Scanner

Suppose the state to which the Scanner applies is:

X → α • POS β [x,y]

If the next input word (the word between positions y and y+1) is a possible expansion of POS, the Scanner adds a new state to the next chart entry.

The new state added is

X → α POS • β [x,y+1]

Page 27: Parsing context-free grammars

Completer

Suppose the state to which the Completer applies is:

X → γ • [x,y]

The Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state.

For each state (from any earlier chart entry) of the form

Y → α • X β [w,x]

a new state of the following form is added

Y → α X • β [w,y]
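Putting the three operators together, a recognizer along the lines of these slides might look like the sketch below. It is an illustration under assumptions, not the textbook's code: states are plain tuples `(lhs, rhs, dot, start)` whose end position is the index of the chart entry holding them, `GAMMA` is a hypothetical name for a dummy start symbol, the PP rule is written with `Prep`, and words are lower-cased.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal"), ("Nominal", "PP")],
    "VP": [("Verb",), ("Verb", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"houston", "twa"},
}


def earley_recognize(sentence):
    """Return True if the sentence can be derived from S."""
    words = sentence.lower().split()
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # one entry per input position

    def add(state, position):
        if state not in chart[position]:  # never store a state twice
            chart[position].append(state)

    add(("GAMMA", ("S",), 0, 0), 0)  # dummy start state
    for y in range(n + 1):
        i = 0
        while i < len(chart[y]):  # chart[y] may grow while it is processed
            lhs, rhs, dot, x = chart[y][i]
            i += 1
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                # Predictor: next symbol is a nonterminal that is not a POS
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, y), y)
            elif dot < len(rhs):
                # Scanner: next symbol is a POS category
                if y < n and words[y] in LEXICON[rhs[dot]]:
                    add((lhs, rhs, dot + 1, x), y + 1)
            else:
                # Completer: dot is at the end of the rule
                for lhs2, rhs2, dot2, w in chart[x]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add((lhs2, rhs2, dot2 + 1, w), y)
    return ("GAMMA", ("S",), 1, 0) in chart[n]


print(earley_recognize("Book that flight"))  # True
print(earley_recognize("that flight"))       # False
```

Because `add` refuses duplicates, the left-recursive rule Nominal → Nominal PP is predicted once per position instead of looping, which is one of the top-down weaknesses the chart addresses.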

Page 28: Parsing context-free grammars

Completer (modification)

In order to recover parse tree information from the chart once parsing is complete, we need to modify the completer slightly.

Each state in the chart must be given a unique identifier (N for state N)

Each time the completer adds a state, it also adds the unique identifier of the state completed to the list of previous states for that new state (which is a copy of an already existing state, waiting for the category which the current state just completed).
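The modification can be sketched in isolation as follows. The class, field, and function names here are hypothetical, and the unique identifier is simply a running counter; the sketch shows only the bookkeeping step, not a full parser.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # source of unique state identifiers


@dataclass
class State:
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int
    # identifiers of the completed states that advanced the dot so far
    completed: list = field(default_factory=list)
    id: int = field(default_factory=lambda: next(_ids))


def complete(finished, waiting):
    """Copy `waiting`, advance its dot over the category `finished` completed,
    and record the identifier of `finished` in the copy."""
    assert waiting.rhs[waiting.dot] == finished.lhs
    assert waiting.end == finished.start
    return State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                 waiting.start, finished.end,
                 completed=waiting.completed + [finished.id])


waiting = State("VP", ("Verb", "NP"), 1, 0, 1)       # VP → Verb • NP [0,1]
finished = State("NP", ("Det", "Nominal"), 2, 1, 3)  # NP → Det Nominal • [1,3]
advanced = complete(finished, waiting)               # VP → Verb NP • [0,3]
print(advanced.completed == [finished.id])           # True
```

Following the `completed` lists from the final S state downward recovers the children of each node in the parse tree.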

Page 29: Parsing context-free grammars

Initial state of chart

chart[0] chart[1] chart[2] chart[3]

0: γ → • S

Page 30: Parsing context-free grammars

Example (from text)

(work through on board)