PARSING WITH CONTEXT-FREE GRAMMARS cc437

Upload: myles-casey

Post on 16-Dec-2015

PARSING

Parsing is the process of recognizing and assigning STRUCTURE

Parsing a string with a CFG:
– Finding a derivation of the string consistent with the grammar
– The derivation gives us a PARSE TREE

EXAMPLE (CF. LAST WEEK)

PARSING AS SEARCH

Just as in the case of non-deterministic finite-state automata for regular expressions, the main problem with parsing is the existence of CHOICE POINTS

There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered

TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES

The search has to be guided by the INPUT and the GRAMMAR

TOP-DOWN search: the parse tree has to be rooted in the start symbol S
– EXPECTATION-DRIVEN parsing

BOTTOM-UP search: the parse tree must be an analysis of the input
– DATA-DRIVEN parsing

AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)

AN EXAMPLE OF BOTTOM-UP SEARCH

NON-PARALLEL SEARCH

If it’s not possible to examine all alternatives in parallel, it’s necessary to make further decisions:
– Which node in the current search space to expand first (breadth-first or depth-first)
– Which of the applicable grammar rules to expand first
– Which leaf node in a parse tree to expand next (e.g., leftmost)
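
The three decisions can be made concrete with a small agenda-based recognizer. This is only a sketch: the toy grammar, the function name `recognize` and the `strategy` parameter are assumptions made for illustration, not material from the slides. Treating the agenda as a stack gives depth-first search, treating it as a queue gives breadth-first search. The grammar is deliberately not left-recursive, since this kind of search would not terminate otherwise.

```python
from collections import deque

# Toy grammar (hypothetical): nonterminal -> list of right-hand sides.
# Symbols that are not keys of the dict are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["that"], ["a"]],
    "Noun": [["flight"], ["book"]],
    "Verb": [["book"]],
}


def recognize(words, grammar=GRAMMAR, start="S", strategy="depth"):
    """Top-down search. A search node is (symbols still to expand, input position)."""
    agenda = deque([((start,), 0)])
    while agenda:
        # Decision 1: which search node to expand first (stack vs. queue).
        symbols, pos = agenda.pop() if strategy == "depth" else agenda.popleft()
        if not symbols:
            if pos == len(words):
                return True          # everything expanded, all input consumed
            continue
        # Decision 3: always expand the leftmost leaf.
        first, rest = symbols[0], symbols[1:]
        if first in grammar:
            # Decision 2: try rules in the order they are listed in the grammar.
            expansions = grammar[first]
            if strategy == "depth":
                # Push in reverse so that the first rule is popped first.
                expansions = reversed(expansions)
            for rhs in expansions:
                agenda.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and words[pos] == first:
            # The leftmost leaf is a word that matches the input: move on.
            agenda.append((rest, pos + 1))
    return False


print(recognize("book that flight".split()))                      # depth-first
print(recognize("book that flight".split(), strategy="breadth"))  # breadth-first
```

Both strategies accept the same strings; they differ only in the order in which hypotheses are explored.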

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)

A T-D, D-F, L-R PARSER

TOP-DOWN vs BOTTOM-UP

TOP-DOWN:
– Only searches among grammatical answers
– BUT: suggests hypotheses that may not be consistent with data
– Problem: left-recursion

BOTTOM-UP:
– Only forms hypotheses consistent with data
– BUT: may suggest hypotheses that make no sense globally

LEFT-RECURSION

A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return

Examples of left-recursive rules:
– NP → NP PP
– S → S and S
– But also:
  NP → Det Nom
  Det → NP ’s
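
Whether a grammar is left-recursive, directly or through a chain of rules as in the last example, can be checked mechanically. The following is a minimal sketch under two assumptions: the grammar is stored as a dict from nonterminal to right-hand sides, and it has no empty rules. The function name `left_recursive_symbols` is invented for this illustration.

```python
def left_recursive_symbols(grammar):
    """Return the nonterminals A that can derive a string starting with A itself."""
    # reach[A] starts as the set of symbols that appear leftmost in a rule for A ...
    reach = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    # ... and is then closed under "leftmost symbol of a leftmost symbol".
    changed = True
    while changed:
        changed = False
        for a in reach:
            for b in list(reach[a]):
                new = reach.get(b, set()) - reach[a]
                if new:
                    reach[a] |= new
                    changed = True
    return {a for a in reach if a in reach[a]}


GRAMMAR = {
    "S": [["S", "and", "S"], ["NP", "VP"]],   # direct left-recursion
    "NP": [["Det", "Nom"]],                    # indirect, via Det
    "Det": [["NP", "'s"], ["the"]],
    "PP": [["Prep", "NP"]],                    # not left-recursive
    "Nom": [["Noun"]],
}

print(sorted(left_recursive_symbols(GRAMMAR)))
```

Here NP is flagged even though no single NP rule starts with NP: the recursion goes through Det.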

THE PROBLEM WITH LEFT-RECURSION

LEFT-RECURSION: POOR SOLUTIONS

Rewrite the grammar to a weakly equivalent one
– Problem: may not get correct parse tree

Limit the depth during search
– Problem: limit is arbitrary
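
To illustrate the first option, here is a sketch of the textbook transformation that removes direct left-recursion: rules A → A α | β become A → β | β A′ and A′ → α | α A′. The function name and the dict encoding of the grammar are assumptions for this example, and the fresh symbol (the original name plus an apostrophe) is assumed not to be in use already. Indirect left-recursion is not handled.

```python
def remove_direct_left_recursion(grammar):
    """Return a weakly equivalent grammar without direct left-recursion."""
    new_grammar = {}
    for a, rules in grammar.items():
        alphas = [rhs[1:] for rhs in rules if rhs and rhs[0] == a]   # A -> A alpha
        betas = [rhs for rhs in rules if not rhs or rhs[0] != a]     # A -> beta
        if not alphas:
            new_grammar[a] = [list(rhs) for rhs in rules]
            continue
        tail = a + "'"  # fresh nonterminal, assumed unused elsewhere
        # A  -> beta | beta A'
        new_grammar[a] = [list(b) for b in betas] + [list(b) + [tail] for b in betas]
        # A' -> alpha | alpha A'
        new_grammar[tail] = [list(al) for al in alphas] + [list(al) + [tail] for al in alphas]
    return new_grammar


print(remove_direct_left_recursion({"NP": [["NP", "PP"], ["Det", "Nom"]]}))
```

The new grammar accepts the same strings, but a sequence of PPs now hangs off NP′ in a right-branching chain instead of the left-branching NP structure of the original grammar, which is the sense in which the parse tree may not be the intended one.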

LEFT-CORNER PARSING

A hybrid of top-down and bottom-up parsing

Strategy: don’t consider any expansion unless the current input can serve as the LEFT-CORNER of that expansion
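
The filtering step can be sketched as follows. The grammar, the lexicon and the names `left_corners` and `filtered_expansions` are assumptions chosen for illustration; a real left-corner parser would typically precompute the left-corner table once rather than recompute it per call.

```python
# Toy grammar and lexicon (hypothetical).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "book": {"Noun", "Verb"},
    "that": {"Det"},
    "flight": {"Noun"},
    "does": {"Aux"},
    "Houston": {"ProperNoun"},
}


def left_corners(symbol, grammar, seen=None):
    """All symbols that can sit at the left edge of a derivation from `symbol`
    (including `symbol` itself). The `seen` set also guards against left-recursion."""
    seen = set() if seen is None else seen
    if symbol in seen:
        return seen
    seen.add(symbol)
    for rhs in grammar.get(symbol, []):
        left_corners(rhs[0], grammar, seen)
    return seen


def filtered_expansions(category, word, grammar=GRAMMAR, lexicon=LEXICON):
    """Keep only the expansions of `category` for which `word` can serve as left corner."""
    tags = lexicon.get(word, set())
    return [rhs for rhs in grammar.get(category, [])
            if tags & left_corners(rhs[0], grammar)]


print(filtered_expansions("S", "does"))
print(filtered_expansions("S", "book"))
```

With "does" as the next word only the Aux-initial expansion of S survives, so the top-down predictions NP VP and VP are never explored.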

FURTHER PROBLEMS IN PARSING

Ambiguity
– Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers
– C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430

Avoiding reparsing
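
The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1), and C(n) is also the number of binary bracketings of n + 1 items, which is why they show up when counting attachment structures. A small sketch that computes them both ways (the function names are chosen for this example):

```python
from functools import lru_cache
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def binary_bracketings(leaves):
    """Number of distinct binary trees over a sequence of `leaves` items."""
    if leaves <= 1:
        return 1
    # Split the sequence into a left part of k items and a right part of the rest.
    return sum(binary_bracketings(k) * binary_bracketings(leaves - k)
               for k in range(1, leaves))


for n in range(2, 9):
    print(n, catalan(n), binary_bracketings(n + 1))
```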

COMMON STRUCTURAL AMBIGUITIES

COORDINATION ambiguity
– OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN

ATTACHMENT ambiguity:
– Gerundive VP attachment ambiguity: I saw the Eiffel Tower flying to Paris
– PP attachment ambiguity: I shot an elephant in my pajamas

PP ATTACHMENT AMBIGUITY

AMBIGUITY: SOLUTIONS

Use a PROBABILISTIC GRAMMAR (not covered in this module)

Use semantics

AVOID RECOMPUTING INVARIANTS

Consider parsing with a top-down parser the NP:
– A flight from Indianapolis to Houston on TWA

With the grammar rules:
– NP → Det Nominal
– NP → NP PP
– NP → ProperNoun

INVARIANTS AND TOP-DOWN PARSING

THE EARLEY ALGORITHM

DYNAMIC PROGRAMMING

A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way

A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work

The Earley algorithm also
– Does not suffer from the left-recursion problem
– Solves an exponential problem in O(n³)

THE CHART

The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input
– Table entries sit in the ‘gaps’ between words

Each entry in the chart is a list of
– Completed constituents
– In-progress constituents
– Predicted constituents

All three types of objects are represented in the same way as STATES

THE CHART: GRAPHICAL REPRESENTATION

STATES

A state encodes two types of information:
– How much of a certain rule has been encountered in the input
– Which positions are covered
– A → α • β, [X,Y]

DOTTED RULES
– VP → V NP •
– NP → Det • Nominal
– S → • VP
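
One possible way to represent a state in code is a small immutable record holding the rule, the dot position and the two input positions. The class and field names below are assumptions for illustration, as are the positions used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many right-hand side symbols have been recognized so far
    start: int    # X: position where the constituent begins
    end: int      # Y: position reached so far

    def is_complete(self):
        """True when the dot has reached the end of the rule."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol immediately after the dot, or None for a completed state."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


in_progress = State("NP", ("Det", "Nominal"), dot=1, start=1, end=2)
print(in_progress)                # in-progress constituent
print(in_progress.next_symbol())  # what the parser is still looking for
```

A predicted constituent is a state with `dot == 0`, and a completed one has the dot at the end, so all three kinds of chart entry share this one representation.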

EXAMPLES

SUCCESS

The parser has succeeded if entry N+1 of the chart contains the state
– S → α •, [0,N]

THE ALGORITHM

The algorithm loops through the input without backtracking, at each step performing three operations:
– PREDICTOR: add predictions to the chart
– COMPLETER: move the dot to the right when looked-for constituent is found
– SCANNER: read in the next input word
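
The three operations fit into a compact recognizer. This is a simplified sketch, not the formulation from the readings: it assumes a grammar without empty rules, treats any symbol that is not a key of the grammar dict as a word, and stores a state as a tuple (lhs, rhs, dot, origin) whose end position is implicit in the chart entry that holds it. The toy grammar is left-recursive on purpose.

```python
# Toy grammar (hypothetical); NP -> NP PP is left-recursive.
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["Verb", "NP"]],
    "NP": [["NP", "PP"], ["Det", "Noun"]],
    "PP": [["Prep", "NP"]],
    "Det": [["that"], ["the"]],
    "Noun": [["flight"], ["morning"]],
    "Verb": [["book"]],
    "Prep": [["in"]],
}


def earley_recognize(words, grammar=GRAMMAR, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]       # N+1 entries, one per gap

    def add(state, k):
        if state not in chart[k]:            # never store the same state twice
            chart[k].append(state)

    for rhs in grammar[start]:
        add((start, tuple(rhs), 0, 0), 0)

    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):             # chart[k] may grow while we scan it
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # PREDICTOR: expect every expansion of the looked-for category.
                    for expansion in grammar[nxt]:
                        add((nxt, tuple(expansion), 0, k), k)
                elif k < n and words[k] == nxt:
                    # SCANNER: the next input word matches, move the dot over it.
                    add((lhs, rhs, dot + 1, origin), k + 1)
            else:
                # COMPLETER: advance every state that was waiting for `lhs`.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), k)

    # Success: a completed start-symbol state spanning [0, N] in the last entry.
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


print(earley_recognize("book that flight in the morning".split()))
```

Because `add` refuses duplicates, predicting NP from NP → • NP PP adds nothing new the second time round, which is why left-recursion does not cause a loop here.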

THE ALGORITHM: CENTRAL LOOP

EARLEY ALGORITHM: THE THREE OPERATORS

EXAMPLE, AGAIN

EXAMPLE: BOOK THAT FLIGHT

EXAMPLE: BOOK THAT FLIGHT (II)

EXAMPLE: BOOK THAT FLIGHT (III)

EXAMPLE: BOOK THAT FLIGHT (IV)

READINGS

Jurafsky and Martin, chapter 10.1-10.4