10. parsing with context-free grammars -speech and language processing-
DESCRIPTION
10. Parsing with Context-free Grammars -Speech and Language Processing-. 발표자 : 정영임 발표일 : 2007. 8. 7. 10.4 The Earley Algorithm. Earley Algorithm Dynamic programming Solution for those three parsing problems Information Represented by Chart: N+1 entries (N: Number of words) Dotted rule - PowerPoint PPT PresentationTRANSCRIPT
10. Parsing with Context-free Grammars-Speech and Language Processing-
발표자 : 정영임발표일 : 2007. 8. 7.
2
10.4 The Earley Algorithm
Earley Algorithm Dynamic programming Solution for those three parsing problems Information Represented by
Chart: N+1 entries (N: Number of words) Dotted rule
e.g.) S → VP• [0,0]
3
10.4 The Earley Algorithm
Fig.10.16 The Earley algorighm
4
10.4 The Earley Algorithm
Predictor To create new states representing top-down expectations
is applied to any state that has a non-terminal immediately to the right of its dot that is not a part-of-speech category
results in the creation of one new state for each alternative expansion of that non-terminal provided by the grammar
begins and ends at the point in the input where the generating state ends.
Example S→• VP, [0,0] Predictor
5
10.4 The Earley Algorithm
Scanner is called to examine the input and incorporate a state
corresponding to the prediction of a word with a particular part-of-speech into the chart.
is accomplished by creating a new state from the input state with the dot advanced over the predicted input category.
Example VP→• Verb NP, [0,0] Scanner consults the current word in the input since the category
following the dot is a part-of-speech. It notes that book can be a verb, matching the expectation in the
current state This results in the creation of the new state VP → Verb• NP, [0,1]. The new state is added to the chart entry that follows the one
currently being processed
6
10.4 The Earley Algorithm
Completer is applied to a state when its dot has reached the right end of the rule. is to find, and advance, all previously created states that were looking for
this grammatical category at this position in the input. New states are then created by copying the older state, advancing the do
t over the expected category, and installing the new state in the current chart entry.
Example NP→ Det Nominal [1,3]• Completer looks for states ending at 1 expecting an NP
– VP→ Verb NP, [0,1]• This results in the addition of a new complete state VP→ Verb NP , [0,3]•
7
10.4 The Earley Algorithm
Fig.10.17 An Example “Book that filght.”
8
10.4 The Earley Algorithm
Retrieving Parse Trees from a Chart the version of the Earley algorithm is a recognizer not a parser.
valid sentences will leave the state S → α , • [0,N] in the chart. Extraction of individual parses from the chart
the representation of each state must be augmented with an additional field to store information about the completed states that generated its constituents.
change necessary is to have COMPLETER add a pointer to the older state onto a list of constituent-states for the new state.
following pointers starting with the state (or states) representing a complete S in the final chart entry.
9
10.4 The Earley Algorithm
Retrieving Parse Trees from a Chart
22
218
191418
23
22
21
19
18
14
8
10.18
10
10.4 The Earley Algorithm
Cost at tree retrieval process if there are an exponential number of trees for a given sentence, the algo
rithm will require an exponential amount of time to return them all.
The Earley algorithm may fill the table in O(N3) time but it can’t magically return them as quickly.
11
10.5 Finite-State Parsing Methods
Efficient in a partial parse or shallow parse Recognition of basic phrases(noun groups, verb groups,
location, preposition and etc.) Extraction of some sort of template in required data
12
10.5 Finite-State Parsing Methods
Finite-state rules for detecting noun groups(NG) NG → Pronoun|Time-NP|Date-NP NG → (DETP)(Adjs) HdNns|DETP Ving HdNns|DETP-CP (and HdNns) DETP → DETP-CP|DETP-INCP DETP-CP → ({Adv-pre-num|“another”|{Det|Pro-Poss}({Adv-pre-num|“only”(“o
ther”)})})Number|Q|Q-er|(“the”)Q-est| “another”|Det-cp|DetQ|Pro-Poss-cp DETP-INCP {{{Det|Pro-Poss}|“only”|“a”|“an”|Det-incomp|Pro-Poss-incomp}(“o
ther”)|(DET-CP)“other”} Adjs → AdjP({ “,”|(“,”) Conj}{AdjP|Vparticiple})* AdjP → Ordinal|{(Q-er|Q-est}{Adj|Vparticiple}+|Number(“-”){“month”| “day” |
“year”}(“-”) “old”}
13
10.5 Finite-State Parsing Methods
Finite-state rules for detecting noun groups(NG) (Ctnd’) HdNns -> HdNn(“and” HdNn) HdNn -> PropN|{PreNs|PropNPreNs}N[!Time-NP]
|{PropN CommonN[!Time-NP]} PreNs -> PreN(“and” PreN2)* PreN -> (Adj”-”)Common-Sing-N PreN2 -> PreN|Ordinal|Adj-noun-like
14
10.5 Finite-State Parsing Methods
Fig. 10.20-10.21
15
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar Allowing only a limited amount of recursion
FASTUS does this by using its automata cascade The second level of FASTUS finds non-recursive noun group The third level combines these groups into larger NP-like units by
– adding on measure phrases» 20,000 iron and “metal wood” clubs a month
– Attaching preposition phrases» Production of 20,000 iron and “metal wood” …
– Dealing with noun group conjunction» A local concern and a Japanese trading house
=> By splitting the parsing into two levels, NP on the left side is treated as a different kind of object from NP on the right side
16
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar Chunk-based partial parsing via a set of finite-set cascades(Abney, 1996)
17
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar Recursive Transition Network(RTN)
RTN is defined by a set of graphs like those in Fig.10.20 and Fig. 10.21
Each arc contains a terminal or non-terminal nodeDifference between RTN and FSA
– In an RTN, whenever the machine comes to an arc labeled with a non-terminals, it treats that non-terminal as a subroutine
» It places its current location onto a stack » It jumps to the non-terminal» Then it jumps back when that non-terminal has been parsed
RTN is exactly equivalent to a context-free grammar– A graphical way to view a simple top-down parser for context-free
rules