
  • 7/28/2019 Earley-Parsing.pdf

    1/27

    11-711 Algorithms for NLP

    The Earley Parsing Algorithm

    Reading:

Jay Earley,

"An Efficient Context-Free Parsing Algorithm"

Communications of the ACM, vol. 13 (2), pp. 94-102


    The Earley Parsing Algorithm

    General Principles:

A clever hybrid Bottom-Up and Top-Down approach

Bottom-Up parsing completely guided by Top-Down predictions

    Maintains sets of dotted grammar rules that:

    Reflect what the parser has seen so far

    Explicitly predict the rules and constituents that will combine

    into a complete parse

    Similar to Chart Parsing - partial analyses can be shared

Time Complexity: O(n^3), but better on particular sub-classes

Developed prior to Chart Parsing; the first efficient parsing

algorithm for general context-free grammars.

    1 11-711 Algorithms for NLP


    The Earley Parsing Method

Main Data Structure: The state (or item)

A state is a dotted rule and a starting position:

    [X → α • β, j]

The algorithm maintains sets of states, one set for each
position in the input string (starting from 0)

We denote the set for position i by S_i

    2 11-711 Algorithms for NLP


    The Earley Parsing Algorithm

Three Main Operations:

Predictor: If state [X → α • Y β, j] is in S_i, then for every
rule of the form Y → γ, add to S_i the state [Y → • γ, i]

Completer: If state [X → γ •, j] is in S_i, then for every state
in S_j of the form [Y → α • X β, k], add to S_i the state
[Y → α X • β, k]

Scanner: If state [X → α • a β, j] is in S_i and the next
input word is x_{i+1} = a, then add to S_{i+1} the state
[X → α a • β, j]

    3 11-711 Algorithms for NLP
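The three operations above can be sketched directly as functions over state sets. This is a minimal illustration, not the deck's own pseudocode: states are plain tuples `(lhs, rhs, dot, origin)`, and the function and variable names are our own.

```python
# A state is the tuple (lhs, rhs, dot, origin), i.e. the dotted rule
# [lhs -> rhs[:dot] . rhs[dot:], origin].  GRAMMAR maps each
# nonterminal to its productions (a tiny illustrative grammar).
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("art", "n")],
}

def predictor(state, i, chart):
    # [X -> a . Y b, j] in S_i: for every rule Y -> g, add [Y -> . g, i]
    _, rhs, dot, _ = state
    nonterm = rhs[dot]
    for prod in GRAMMAR.get(nonterm, []):
        chart[i].add((nonterm, prod, 0, i))

def completer(state, i, chart):
    # [X -> g ., j] in S_i: for every [Y -> a . X b, k] in S_j,
    # add [Y -> a X . b, k] to S_i
    x, _, _, j = state
    for y, rhs, dot, k in list(chart[j]):
        if dot < len(rhs) and rhs[dot] == x:
            chart[i].add((y, rhs, dot + 1, k))

def scanner(state, i, word, chart):
    # [X -> a . w b, j] in S_i and the next input word is w:
    # add [X -> a w . b, j] to S_{i+1}
    x, rhs, dot, j = state
    if dot < len(rhs) and rhs[dot] == word:
        chart[i + 1].add((x, rhs, dot + 1, j))

# One step of each on a two-position chart:
chart = [set(), set()]
start = ("S", ("NP", "VP"), 0, 0)
chart[0].add(start)
predictor(start, 0, chart)                    # predicts [NP -> . art n, 0]
scanner(("NP", ("art", "n"), 0, 0), 0, "art", chart)
```

Note that the Completer looks back into `chart[j]`, the set where the completed rule started; this is what lets partial analyses be shared, as on the earlier slide.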


    The Earley Recognition Algorithm

Simplified version with no lookaheads and for grammars
without epsilon-rules

Assumes input is a string of grammar terminal symbols

We extend the grammar with a new rule S' → S $

The algorithm sequentially constructs the sets S_i,
for 0 ≤ i ≤ n+1

We initialize the set S_0 with S_0 = { [S' → • S $, 0] }

    4 11-711 Algorithms for NLP


    The Earley Recognition Algorithm

The Main Algorithm: parsing input x = x_1 x_2 … x_n $

1. S_0 = { [S' → • S $, 0] }

2. For 0 ≤ i ≤ n do:

   Process each item s in S_i in order by applying to it the single
   applicable operation among:

   (a) Predictor (adds new items to S_i)
   (b) Completer (adds new items to S_i)
   (c) Scanner (adds new items to S_{i+1})

3. If S_{i+1} = ∅, Reject the input

4. If i = n and S_{n+1} = { [S' → S $ •, 0] }, then Accept the input

    5 11-711 Algorithms for NLP
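Putting the initialization, the per-position loop, and the accept test together, a compact recognizer can be sketched as below. This is our own sketch, not the deck's pseudocode; it uses the grammar from the example slides that follow, and all names are ours.

```python
# The example grammar from the slides, plus the added rule S' -> S $.
GRAMMAR = {
    "S'": [("S", "$")],
    "S":  [("NP", "VP")],
    "NP": [("art", "adj", "n"), ("art", "n"), ("adj", "n")],
    "VP": [("aux", "VP"), ("v", "NP")],
}

def earley_recognize(words):
    """Earley recognizer (no lookahead, no epsilon-rules).

    A state is (lhs, rhs, dot, origin) = [lhs -> rhs[:dot] . rhs[dot:], origin].
    """
    words = list(words) + ["$"]              # parser input ends with $
    n = len(words)
    chart = [set() for _ in range(n + 1)]    # chart[i] is the state set S_i
    chart[0].add(("S'", ("S", "$"), 0, 0))   # S_0 = { [S' -> . S $, 0] }
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot == len(rhs):                        # Completer
                for y, yr, yd, k in list(chart[origin]):
                    if yd < len(yr) and yr[yd] == lhs:
                        new = (y, yr, yd + 1, k)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
            elif rhs[dot] in GRAMMAR:                  # Predictor
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif i < n and rhs[dot] == words[i]:       # Scanner
                chart[i + 1].add((lhs, rhs, dot + 1, origin))
        if i < n and not chart[i + 1]:                 # S_{i+1} empty: Reject
            return False
    return ("S'", ("S", "$"), 2, 0) in chart[n]        # [S' -> S $ ., 0] present?

print(earley_recognize("art adj n aux v art n".split()))   # the slides' example
```

Because no rule is epsilon, every Completer looks into an earlier, already finished set, so the simple agenda loop above reaches a fixed point for each `S_i`.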


    Earley Recognition - Example

The Grammar:

1  S  → NP VP
2  NP → art adj n
3  NP → art n
4  NP → adj n
5  VP → aux VP
6  VP → v NP

The original input:  The large can can hold the water

POS assigned input:  x = art adj n aux v art n

Parser input:        x = art adj n aux v art n $

    6 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_0:  [S' → • S $, 0]
      [S  → • NP VP, 0]
      [NP → • art adj n, 0]
      [NP → • art n, 0]
      [NP → • adj n, 0]

S_1:  [NP → art • adj n, 0]
      [NP → art • n, 0]

    7 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_1:  [NP → art • adj n, 0]
      [NP → art • n, 0]

S_2:  [NP → art adj • n, 0]

    8 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_2:  [NP → art adj • n, 0]

S_3:  [NP → art adj n •, 0]

    9 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_3:  [NP → art adj n •, 0]
      [S  → NP • VP, 0]
      [VP → • aux VP, 3]
      [VP → • v NP, 3]

S_4:  [VP → aux • VP, 3]

    10 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_4:  [VP → aux • VP, 3]
      [VP → • aux VP, 4]
      [VP → • v NP, 4]

S_5:  [VP → v • NP, 4]

    11 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_5:  [VP → v • NP, 4]
      [NP → • art adj n, 5]
      [NP → • art n, 5]
      [NP → • adj n, 5]

S_6:  [NP → art • adj n, 5]
      [NP → art • n, 5]

    12 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_6:  [NP → art • adj n, 5]
      [NP → art • n, 5]

S_7:  [NP → art n •, 5]

    13 11-711 Algorithms for NLP


    Earley Recognition - Example

The input: x = art adj n aux v art n $

S_7:  [NP → art n •, 5]
      [VP → v NP •, 4]
      [VP → aux VP •, 3]
      [S  → NP VP •, 0]
      [S' → S • $, 0]

S_8:  [S' → S $ •, 0]

    14 11-711 Algorithms for NLP


    Time Complexity of Earley Algorithm

Algorithm iterates for each word of input (i.e. n+1 iterations)

How many items can be created and processed in S_i?

Each item in S_i has the form [A → α • β, j], with 0 ≤ j ≤ i

Thus O(n) items

The Scanner and Predictor operations on an item each require
constant time

The Completer operation on an item adds items of the form
[B → α A • β, k] to S_i, with 0 ≤ k ≤ i, so it may require up
to O(n) time for each processed item

Time required for each iteration (S_i) is thus O(n^2)

Time bound on the entire algorithm is therefore O(n^3)

    15 11-711 Algorithms for NLP
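The counting argument above assembles into the stated bound:

```latex
\underbrace{(n+1)}_{\text{sets } S_i}
\;\times\; \underbrace{O(n)}_{\text{items per } S_i}
\;\times\; \underbrace{O(n)}_{\text{Completer time per item}}
\;=\; O(n^3)
```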


    Time Complexity of Earley Algorithm

Special Cases:

Completer is the operation that may require O(n^2) time in
iteration i

For unambiguous grammars, Earley shows that the Completer
operation will require at most O(n) time

Thus time complexity for unambiguous grammars is O(n^2)

For some grammars, the number of items in each S_i is
bounded by a constant

These are called bounded-state grammars and include even
some ambiguous grammars.

For bounded-state grammars, the time complexity of the
algorithm is linear - O(n)

    16 11-711 Algorithms for NLP


    Parsing with an Earley Parser

As usual, we need to keep back-pointers to the constituents
that we combine together when we complete a rule

Each item must be extended to have the form
[X → α • β, j; (p_1, …, p_k)], where the p_i are pointers to the
already found RHS sub-constituents

At the end - reconstruct the parse from the back-pointers

To maintain efficiency - we must do ambiguity packing

    17 11-711 Algorithms for NLP
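A naive sketch of this extension: instead of numbered back-pointers, each state carries the sub-trees found so far directly, so the Scanner appends the scanned word and the Completer appends the completed constituent's tree. This is our own illustration (no ambiguity packing, names ours), repeating the example grammar so the sketch stands alone.

```python
# Example grammar from the slides, plus the added rule S' -> S $.
GRAMMAR = {
    "S'": [("S", "$")],
    "S":  [("NP", "VP")],
    "NP": [("art", "adj", "n"), ("art", "n"), ("adj", "n")],
    "VP": [("aux", "VP"), ("v", "NP")],
}

def earley_parse(words):
    """Recognizer extended with per-state child trees (naive: no packing).

    A state is (lhs, rhs, dot, origin, kids); kids holds the
    already-found RHS sub-constituents, in order.
    """
    words = list(words) + ["$"]
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("S'", ("S", "$"), 0, 0, ()))     # no children found yet
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, org, kids = agenda.pop()
            if dot == len(rhs):                    # Completer: attach (lhs, kids)
                tree = (lhs,) + kids
                for y, yr, yd, k, yk in list(chart[org]):
                    if yd < len(yr) and yr[yd] == lhs:
                        new = (y, yr, yd + 1, k, yk + (tree,))
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
            elif rhs[dot] in GRAMMAR:              # Predictor
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0, i, ())
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif i < n and rhs[dot] == words[i]:   # Scanner: record the word
                chart[i + 1].add((lhs, rhs, dot + 1, org, kids + (words[i],)))
    for lhs, rhs, dot, org, kids in chart[n]:
        if lhs == "S'" and dot == len(rhs):
            return kids[0]                         # the S sub-tree
    return None
```

On an ambiguous grammar, storing whole sub-trees in states this way can blow up exponentially; that is exactly why the slide insists on back-pointers plus ambiguity packing.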


    Earley Parsing - Example

The input: x = art adj n aux v art n $

    18 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_0:  [S' → • S $, 0]
      [S  → • NP VP, 0]
      [NP → • art adj n, 0]
      [NP → • art n, 0]
      [NP → • adj n, 0]

S_1:  [NP → art1 • adj n, 0]        1 → art
      [NP → art1 • n, 0]

    19 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_1:  [NP → art1 • adj n, 0]
      [NP → art1 • n, 0]

S_2:  [NP → art1 adj2 • n, 0]       2 → adj

    20 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_2:  [NP → art1 adj2 • n, 0]

S_3:  [NP4 → art1 adj2 n3 •, 0]     3 → n
                                    4 → NP(art1 adj2 n3)

    21 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_3:  [NP4 → art1 adj2 n3 •, 0]
      [S  → NP4 • VP, 0]
      [VP → • aux VP, 3]
      [VP → • v NP, 3]

S_4:  [VP → aux5 • VP, 3]           5 → aux

    22 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_4:  [VP → aux5 • VP, 3]
      [VP → • aux VP, 4]
      [VP → • v NP, 4]

S_5:  [VP → v6 • NP, 4]             6 → v

    23 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_5:  [VP → v6 • NP, 4]
      [NP → • art adj n, 5]
      [NP → • art n, 5]
      [NP → • adj n, 5]

S_6:  [NP → art7 • adj n, 5]        7 → art
      [NP → art7 • n, 5]

    24 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_6:  [NP → art7 • adj n, 5]
      [NP → art7 • n, 5]

S_7:  [NP9 → art7 n8 •, 5]          8 → n
                                    9 → NP(art7 n8)

    25 11-711 Algorithms for NLP


    Earley Parsing - Example

The input: x = art adj n aux v art n $

S_7:  [NP9  → art7 n8 •, 5]
      [VP10 → v6 NP9 •, 4]          10 → VP(v6 NP9)
      [VP11 → aux5 VP10 •, 3]       11 → VP(aux5 VP10)
      [S12  → NP4 VP11 •, 0]        12 → S(NP4 VP11)
      [S'   → S12 • $, 0]

S_8:  [S' → S12 $ •, 0]

    26 11-711 Algorithms for NLP
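Chasing the back-pointers from the final item in S_8 (12 → S, then 11, 10, and so on down to the scanned words) reconstructs the parse of "The large can can hold the water"; rendered as a tree:

```
S12
├── NP4
│   ├── art1 (The)
│   ├── adj2 (large)
│   └── n3   (can)
└── VP11
    ├── aux5 (can)
    └── VP10
        ├── v6  (hold)
        └── NP9
            ├── art7 (the)
            └── n8   (water)
```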