Bottom-Up Parsing

CS 471, September 19, 2007

CS 471 – Fall 2007

Where Are We?

Finished Top-Down Parsing

Starting Bottom-Up Parsing

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Building a Parser

Have a complete recipe for building a parser

Language Grammar

LL(1) Grammar

Predictive Parse Table

Recursive-Descent Parser

Recursive-Descent Parser w/AST Gen

Bottom-Up Parsing

• More general than top-down parsing
• And just as efficient
• Builds on ideas in top-down parsing
• Preferred method in practice

Also called LR parsing
• L means that tokens are read left to right
• R means that it constructs a rightmost derivation (in reverse)

Top Down vs. Bottom Up Parsing

Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input

[Figure: top-down and bottom-up parse trees side by side, each over scanned and unscanned input]

An Introductory Example

Consider the following grammar:

E → E + ( E ) | int

• Why is this not LL(1)?

LR parsers:
• Can handle left-recursion
• Don’t need left factoring

The Idea

LR parsing reduces a string to the start symbol by inverting productions:

str = input string of terminals
repeat
  – Identify β in str such that A → β is a production (i.e., str = α β γ)
  – Replace β by A in str (i.e., str becomes α A γ)
until str = G

A Bottom-up Parse in Detail (1)

[Figure: parse tree over int + (int) + (int), built from the leaves up]

int + (int) + (int)

A Bottom-up Parse in Detail (2)

int + (int) + (int)
E + (int) + (int)

A Bottom-up Parse in Detail (3)

int + (int) + (int)
E + (int) + (int)
E + (E) + (int)

A Bottom-up Parse in Detail (4)

int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)

A Bottom-up Parse in Detail (5)

int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)

A Bottom-up Parse in Detail (6)

int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E

Another example

Grammar:

#  Production rule
1  G → a A B e
2  A → A b c
3    | b
4  B → d

Is “abbcde” in L(G)?

Yes. “Reverse” derivation:

Rule  Sentential form
-     abbcde
3     aAbcde
2     aAde
4     aABe
1     G

[Figure: parse tree for a b b c d e with root G]

Choosing reductions

Basic algorithm:
• Search for right sides of productions, reduce
• Does this work?

Not always:

Problem: “aAAcde” is not part of any sentential form

Rule  Sentential form
-     abbcde
3     aAbcde
3     aAAcde
?     …now what?

#  Production rule
1  G → a A B e
2  A → A b c
3    | b
4  B → d

How do we choose?

Important Fact #1 about bottom-up parsing:

An LR parser traces a rightmost derivation in reverse
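As a small illustration of Fact #1, here is a Python sketch that derives abbcde from G with the grammar of the “Another example” slide, always expanding the right-most non-terminal, and then prints the sentential forms in reverse. It assumes every grammar symbol is a single character; `RULES`, `NONTERMINALS` and `rightmost_derivation` are illustrative names, not anything from the course.

```python
# Grammar from the "Another example" slide, keyed by rule number.
# Assumption: every grammar symbol is a single character.
RULES = {
    1: ("G", "aABe"),
    2: ("A", "Abc"),
    3: ("A", "b"),
    4: ("B", "d"),
}
NONTERMINALS = {"G", "A", "B"}


def rightmost_derivation(start, choices):
    """Apply the numbered rules in order, always expanding the right-most non-terminal."""
    forms = [start]
    current = start
    for rule in choices:
        lhs, rhs = RULES[rule]
        # Position of the right-most non-terminal in the current sentential form.
        pos = max(i for i, ch in enumerate(current) if ch in NONTERMINALS)
        if current[pos] != lhs:
            raise ValueError(f"rule {rule} cannot expand {current[pos]!r}")
        current = current[:pos] + rhs + current[pos + 1:]
        forms.append(current)
    return forms


forms = rightmost_derivation("G", [1, 4, 2, 3])
# Reading the derivation backwards gives the bottom-up (LR) order of reductions.
for form in reversed(forms):
    print(form)
```

Reading the rule list 1, 4, 2, 3 backwards gives 3, 2, 4, 1, which is the Rule column of the reverse-derivation table.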

Why does this help?

Right-most derivation: G → α1 → α2 → α3 → α4 → α5 → input

Suppose the step α3 → α4 uses the production A → β:
• A is the right-most non-terminal in α3
• Everything to the right of A in α3 contains only terminal symbols
• Unambiguous grammar
  – Right-most derivation is unique
  – At each step, reduction is unique

Notation

Split input into two substrings
• Right substring (a string of terminals) is as yet unexamined by parser
• Left substring has terminals and non-terminals

The dividing point is marked by a I
• The I is not part of the string

Initially, all input is unexamined: I x1 x2 . . . xn

Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions: Shift and Reduce

Shift: Move I one place to the right
• Shifts a terminal to the left string
  E + (I int )  →  E + (int I )

Reduce: Apply an inverse production at the right end of the left string
• If E → E + ( E ) is a production, then
  E + (E + ( E ) I )  →  E + (E I )

Shift-Reduce Example

I int + (int) + (int)$     shift
int I + (int) + (int)$     red. E → int
E I + (int) + (int)$       shift 3 times
E + (int I ) + (int)$      red. E → int
E + (E I ) + (int)$        shift
E + (E) I + (int)$         red. E → E + (E)
E I + (int)$               shift 3 times
E + (int I )$              red. E → int
E + (E I )$                shift
E + (E) I $                red. E → E + (E)
E I $                      accept
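The trace above can be reproduced with a short stack-based sketch in Python. It assumes whitespace-separated tokens and uses a greedy rule: reduce whenever the top of the stack matches a right-hand side, otherwise shift. That rule happens to be safe for E → E + ( E ) | int, but it is not a general LR algorithm (the “Choosing reductions” example shows greedy reduction going wrong). `PRODUCTIONS` and `shift_reduce` are hypothetical names.

```python
# E -> E + ( E ) | int, with symbols as strings and right-hand sides as lists.
PRODUCTIONS = [
    ("E", ["E", "+", "(", "E", ")"]),
    ("E", ["int"]),
]


def shift_reduce(tokens):
    """Greedy shift-reduce recognizer; returns (left string, right string, action) steps."""
    stack = []                       # left substring; its top is just left of I
    rest = list(tokens) + ["$"]      # right substring, still unexamined
    steps = []
    while True:
        for lhs, rhs in PRODUCTIONS:
            if stack[-len(rhs):] == rhs:
                # Reduce: pop the right-hand side, push the left-hand side.
                steps.append((" ".join(stack), " ".join(rest), f"reduce {lhs} -> {' '.join(rhs)}"))
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if stack == ["E"] and rest == ["$"]:
                steps.append(("E", "$", "accept"))
                return steps
            if rest == ["$"]:
                raise SyntaxError(f"cannot reduce {' '.join(stack)!r}")
            # Shift: move I one place to the right.
            steps.append((" ".join(stack), " ".join(rest), "shift"))
            stack.append(rest.pop(0))


for left, right, action in shift_reduce("int + ( int ) + ( int )".split()):
    print(f"{left:>12} I {right:<26} {action}")
```

Each printed row shows the left string, the I marker, the unexamined input and the action; the slide compresses consecutive shifts into “shift 3 times”, while the sketch lists them one by one.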

How do we keep track?

Left part string implemented as a stack
• Top of the stack is the I
• Shift:
  – Pushes a terminal on the stack
• Reduce:
  – Pops 0 or more symbols off of the stack
  – Symbols are right-hand side of a production
  – Pushes a non-terminal on the stack (production LHS)
• Terminology
  – We refer to the symbols on top of the stack that are about to be reduced (a right-hand side) as a handle

Shift-Reduce Parsing

derivation        stack      input stream      action
(1+2+(3+4))+5  ←             (1+2+(3+4))+5     shift
(1+2+(3+4))+5  ←  (          1+2+(3+4))+5      shift
(1+2+(3+4))+5  ←  (1         +2+(3+4))+5       reduce E→num
(E+2+(3+4))+5  ←  (E         +2+(3+4))+5       reduce S→E
(S+2+(3+4))+5  ←  (S         +2+(3+4))+5       shift
(S+2+(3+4))+5  ←  (S+        2+(3+4))+5        shift
(S+2+(3+4))+5  ←  (S+2       +(3+4))+5         reduce E→num
(S+E+(3+4))+5  ←  (S+E       +(3+4))+5         reduce S→S+E
(S+(3+4))+5    ←  (S         +(3+4))+5         shift
(S+(3+4))+5    ←  (S+        (3+4))+5          shift
(S+(3+4))+5    ←  (S+(       3+4))+5           shift
(S+(3+4))+5    ←  (S+(3      +4))+5            reduce E→num

Problem

• How do we know which action to take: whether to shift or reduce, and which production?

• Sometimes can reduce but shouldn’t

  – e.g., X → ε can always be reduced

• Sometimes can reduce in different ways

Action Selection Problem

• Given stack σ and look-ahead symbol b, should parser:
  – shift b onto the stack (making it σb), or
  – reduce some production X → γ, assuming that the stack has the form αγ (making it αX)?

• If stack has form αγ, should apply reduction X → γ (or shift) depending on stack prefix α
  – α is different for different possible reductions, since γ’s have different length
  – How to keep track of possible reductions?

Parser States

• Goal: know what reductions are legal at any given point

• Idea: summarize all possible stack prefixes as a finite parser state

• Parser state is computed by a DFA that reads in the stack
• Accept states of DFA: unique reduction!

• Summarizing discards information
  – affects what grammars parser handles
  – affects size of DFA (number of states)

LR(0) Parser

• Left-to-right scanning, Right-most derivation, “zero” look-ahead symbols

• Too weak to handle most language grammars (e.g., “sum” grammar)

• But will help us understand shift-reduce parsing

LR(0) States

• A state is a set of items keeping track of progress on possible upcoming reductions

• An LR(0) item is a production from the grammar with a separator “.” somewhere in the RHS of the production

• Stuff before “.” is already on stack (beginnings of possible γ’s to be reduced)

• Stuff after “.” : what we might see next

• The rest of the stack prefix is represented by the state itself

Example state containing two items:
  E → num .
  E → ( . S )

Start State & Closure

Constructing a DFA to read stack:

• First step: augment grammar with prod’n S’ → S $

• Start state of DFA: empty stack = S’ → . S $

• Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.”
  – Set of possible productions to be reduced next
  – Added items have the “.” located at the beginning: no symbols for these items on the stack yet

Grammar: S → ( L ) | id    L → S | L , S

{ S’ → . S $ }   closure →   { S’ → . S $,  S → . ( L ),  S → . id }
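The closure rule can be written as a small worklist loop. In this Python sketch (an illustration, not code from the course), an item is a tuple (lhs, rhs, dot) where dot is the index of the “.” in rhs; `GRAMMAR`, `closure` and `show` are hypothetical names, and S’ is spelled `S'`.

```python
# Augmented list grammar: S' -> S $,  S -> ( L ) | id,  L -> S | L , S
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}


def closure(items):
    """LR(0) closure; an item is (lhs, rhs, dot) with dot an index into rhs."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:      # non-terminal just after "."
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0)               # "." at the beginning
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


start = closure({("S'", ("S", "$"), 0)})
for item in sorted(start, key=show):
    print(show(item))
```

Closing { S’ → . S $ } yields the three items shown above; closing { S → ( . L ) } yields five, because L → . S in turn pulls in both S productions.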

Applying Terminal Symbols

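In new state, include all items that have appropriate input symbol just after dot, advance dot in those items, and take closure.

Grammar: S → ( L ) | id    L → S | L , S

{ S’ → . S $,  S → . ( L ),  S → . id }
  on id → { S → id . }
  on (  → { S → ( . L ),  L → . S,  L → . L , S,  S → . ( L ),  S → . id }

{ S → ( . L ),  L → . S,  L → . L , S,  S → . ( L ),  S → . id }
  on id → { S → id . }
  on (  → itself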

Applying Nonterminal Symbols

• Non-terminals on stack treated just like terminals (except added by reductions)

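Grammar: S → ( L ) | id    L → S | L , S

{ S’ → . S $,  S → . ( L ),  S → . id }
  on id → { S → id . }
  on (  → { S → ( . L ),  L → . S,  L → . L , S,  S → . ( L ),  S → . id }

{ S → ( . L ),  L → . S,  L → . L , S,  S → . ( L ),  S → . id }
  on id → { S → id . }
  on (  → itself
  on S  → { L → S . }
  on L  → { S → ( L . ),  L → L . , S }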

Applying Reduce Actions

• Pop RHS off stack, replace with LHS X (X→γ)

[Figure: the DFA built so far, starting from { S’ → . S $,  S → . ( L ),  S → . id }]

States causing reductions: { S → id . } and { L → S . }

Full DFA (Appel p. 62)

• reduce-only state: reduce
• if shift transition for look-ahead: shift; otherwise: syntax error
• current state: push stack through DFA

Grammar: S → ( L ) | id    L → S | L , S

State  Items                                   Transitions
1      S’ → . S $,  S → . ( L ),  S → . id     ( → 3,  id → 2,  S → 4
2      S → id .                                (reduce)
3      S → ( . L ),  L → . S,  L → . L , S,    ( → 3,  id → 2,  S → 7,  L → 5
       S → . ( L ),  S → . id
4      S’ → S . $   (final state)              $ → accept
5      S → ( L . ),  L → L . , S               ) → 6,  , → 8
6      S → ( L ) .                             (reduce)
7      L → S .                                 (reduce)
8      L → L , . S,  S → . ( L ),  S → . id    ( → 3,  id → 2,  S → 9
9      L → L , S .                             (reduce)
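A sketch of the full LR(0) construction: start from the closure of S’ → . S $, and for every symbol after a dot compute the goto state (advance the dot, take the closure) until no new states appear. The Python below is a self-contained illustration with hypothetical names (`goto`, `build_lr0_dfa`); it numbers states in discovery order, which need not match the numbering in the figure.

```python
# Augmented list grammar: S' -> S $,  S -> ( L ) | id,  L -> S | L , S
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}


def closure(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(state, symbol):
    """Advance "." over `symbol` in every item that allows it, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def build_lr0_dfa():
    start = closure({("S'", ("S", "$"), 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop(0)
        src = states.index(state)
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for sym in sorted(symbols):
            if sym == "$":          # S' -> S . $ is the final state: accept, no new state
                continue
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(src, sym)] = states.index(target)
    return states, edges


states, edges = build_lr0_dfa()
print(len(states), "states,", len(edges), "transitions")
```

For the list grammar it finds 9 states and 12 transitions (the $ edge out of the final state is treated as accept rather than as a new state), and exactly four states contain only a completed item: the reduce-only states of the figure.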

Parsing Example: ((x),y)

derivation    stack               input      action
((x),y)    ←  1                   ((x),y)    shift, goto 3
((x),y)    ←  1 (3                (x),y)     shift, goto 3
((x),y)    ←  1 (3 (3             x),y)      shift, goto 2
((x),y)    ←  1 (3 (3 x2          ),y)       reduce S → id
((S),y)    ←  1 (3 (3 S7          ),y)       reduce L → S
((L),y)    ←  1 (3 (3 L5          ),y)       shift, goto 6
((L),y)    ←  1 (3 (3 L5 )6       ,y)        reduce S → (L)
(S,y)      ←  1 (3 S7             ,y)        reduce L → S
(L,y)      ←  1 (3 L5             ,y)        shift, goto 8
(L,y)      ←  1 (3 L5 ,8          y)         shift, goto 2
(L,y)      ←  1 (3 L5 ,8 y2       )          reduce S → id
(L,S)      ←  1 (3 L5 ,8 S9       )          reduce L → L , S
(L)        ←  1 (3 L5             )          shift, goto 6
(L)        ←  1 (3 L5 )6                     reduce S → (L)
S          ←  1 S4                $          done

Grammar: S → ( L ) | id    L → S | L , S

Implementation: LR Parsing Table

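[Figure: one row per state; the columns for input (terminal) symbols a form the Action table, the columns for non-terminal symbols X form the Goto table]

Action table: entry = next action
• Used at every step to decide whether to shift or reduce

Goto table: entry = next state
• Used only when reducing, to determine next state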

Shift-Reduce Parsing Table

Action table
1. shift and goto state n
2. reduce using X → γ
   – pop symbols γ off stack
   – using state label of top (end) of stack, look up X in goto table and goto that state

• DFA + stack = push-down automaton (PDA)

[Figure: table with one row per state; next actions under terminal symbols, next state on reduction under non-terminal symbols]

List Grammar Parsing Table

state   (       )       id      ,       $       S       L
1       s3              s2                      g4
2       S→id    S→id    S→id    S→id    S→id
3       s3              s2                      g7      g5
4                                       accept
5               s6              s8
6       S→(L)   S→(L)   S→(L)   S→(L)   S→(L)
7       L→S     L→S     L→S     L→S     L→S
8       s3              s2                      g9
9       L→L,S   L→L,S   L→L,S   L→L,S   L→L,S
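The table above is enough to drive the parser. This Python sketch keeps only state numbers on the stack and follows the shift/reduce/goto steps described for the Action and Goto tables. Assumptions: tokens are already split, x and y are both the token id, and `ACTION`, `GOTO`, `RULES` and `lr_parse` are hypothetical names.

```python
# LR(0) table for S -> ( L ) | id,  L -> S | L , S, copied from the slide.
# Reduce entries name a rule; RULES gives its left-hand side and RHS length.
RULES = {"S->id": ("S", 1), "S->(L)": ("S", 3), "L->S": ("L", 1), "L->L,S": ("L", 3)}
TERMINALS = ["(", ")", "id", ",", "$"]

ACTION = {
    1: {"(": ("shift", 3), "id": ("shift", 2)},
    2: {t: ("reduce", "S->id") for t in TERMINALS},
    3: {"(": ("shift", 3), "id": ("shift", 2)},
    4: {"$": ("accept", None)},
    5: {")": ("shift", 6), ",": ("shift", 8)},
    6: {t: ("reduce", "S->(L)") for t in TERMINALS},
    7: {t: ("reduce", "L->S") for t in TERMINALS},
    8: {"(": ("shift", 3), "id": ("shift", 2)},
    9: {t: ("reduce", "L->L,S") for t in TERMINALS},
}
GOTO = {1: {"S": 4}, 3: {"S": 7, "L": 5}, 8: {"S": 9}}


def lr_parse(tokens):
    """Table-driven shift-reduce parse; returns the actions taken or raises SyntaxError."""
    stack = [1]                       # DFA states only; grammar symbols are implied
    tokens = list(tokens) + ["$"]
    pos, log = 0, []
    while True:
        state, look = stack[-1], tokens[pos]
        entry = ACTION[state].get(look)
        if entry is None:
            raise SyntaxError(f"unexpected {look!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)         # push the new state and consume the token
            pos += 1
            log.append(f"shift, goto {arg}")
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]       # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])   # goto table lookup
            log.append(f"reduce {arg}")
        else:
            log.append("accept")
            return log


for step in lr_parse(["(", "(", "id", ")", ",", "id", ")"]):
    print(step)
```

On ( ( id ) , id ), which is the ((x),y) example, the returned list follows the action column of that trace: 15 steps ending in accept. An input such as ( id raises SyntaxError once state 5 finds no entry for $.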

Shift-Reduce Parsing

• Grammars can be parsed bottom-up using a DFA + stack

  – DFA processes stack σ to decide what reductions might be possible
  – shift-reduce parser or push-down automaton (PDA)
  – Compactly represented as LR parsing table

• State construction converts grammar into states that decide action to take

Checkpoint

• Limitations of LR(0) grammars

• SLR, LR(1), LALR parsers

• Automatic parser generators

LR(0) Limitations

• An LR(0) machine only works if states with reduce actions have a single reduce action – in those states, always reduce ignoring lookahead

• With more complex grammar, construction gives states with shift/reduce or reduce/reduce conflicts

• Need to use look-ahead to choose

ok:             L → L , S .
shift/reduce:   L → L , S .    S → S . , L
reduce/reduce:  L → L , S .    L → S .
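The three cases can be checked mechanically. The Python sketch below looks for completed items (reduce) and items with a terminal after the dot (shift) in a single LR(0) state. It is an illustration only: `classify` and `TERMINALS` are hypothetical names, and the items are (lhs, rhs, dot) tuples copied from the slide.

```python
TERMINALS = {"(", ")", "id", ",", "$"}


def classify(state):
    """Classify an LR(0) state given as (lhs, rhs, dot) items."""
    reduce_items = [(l, r, d) for l, r, d in state if d == len(r)]
    shift_items = [(l, r, d) for l, r, d in state if d < len(r) and r[d] in TERMINALS]
    if len(reduce_items) > 1:
        return "reduce/reduce"
    if reduce_items and shift_items:
        return "shift/reduce"
    return "ok"


ok_state = [("L", ("L", ",", "S"), 3)]
sr_state = [("L", ("L", ",", "S"), 3), ("S", ("S", ",", "L"), 1)]
rr_state = [("L", ("L", ",", "S"), 3), ("L", ("S",), 1)]
for name, st in [("ok", ok_state), ("shift/reduce", sr_state), ("reduce/reduce", rr_state)]:
    print(name, "->", classify(st))
```

When a state has several completed items and also a shift, this sketch reports only reduce/reduce; a real generator would report every conflict it finds.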

LR(0) Construction

Grammar: S → E + S | E    E → num | ( S )

State 1: S’ → . S $,  S → . E + S,  S → . E,  E → . num,  E → . ( S )
  on E → state 2
State 2: S → E . + S,  S → E .
  on + → state 3
State 3: S → E + . S

       +         $      E
1                       g2
2      s3/S→E    S→E

What do we do in state 2?

SLR grammars

Idea: Only add reduce action to table if look-ahead symbol is in the FOLLOW set of the non-terminal being reduced

• Eliminates some conflicts

• FOLLOW(S) = { $, ) }

• Many language grammars are SLR

       +         $      E
1                       g2
2      s3        S→E
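FOLLOW(S) = { $, ) } can be computed with the usual fixed-point iteration. The Python sketch below assumes the grammar has no ε-productions (true here), so FIRST of a sequence is FIRST of its first symbol; `first_sets`, `follow_sets` and `state2_actions` are hypothetical names. It then fills row 2 the SLR way: reduce S → E only on look-aheads in FOLLOW(S).

```python
# Augmented grammar from the slide: S' -> S $,  S -> E + S | E,  E -> num | ( S )
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["E", "+", "S"], ["E"]],
    "E": [["num"], ["(", "S", ")"]],
}
NONTERMS = set(GRAMMAR)


def first_of(sym, first):
    return first[sym] if sym in NONTERMS else {sym}


def first_sets():
    # Assumption: no epsilon productions, so FIRST(X1 X2 ...) = FIRST(X1).
    first = {nt: set() for nt in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                add = first_of(rhs[0], first)
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    return first


def follow_sets(first):
    follow = {nt: set() for nt in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in NONTERMS:
                        continue
                    # What can follow sym: FIRST of the next symbol, or FOLLOW(lhs) at the end.
                    add = first_of(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow


FOLLOW = follow_sets(first_sets())


def state2_actions(look):
    """SLR entries for state 2 = {S -> E . + S, S -> E .} on one look-ahead."""
    actions = []
    if look == "+":                  # item S -> E . + S can shift '+'
        actions.append("s3")
    if look in FOLLOW["S"]:          # reduce S -> E only on FOLLOW(S)
        actions.append("S->E")
    return actions


print("FOLLOW(S) =", sorted(FOLLOW["S"]))
print("row 2:", {look: state2_actions(look) for look in ["+", "$"]})
```

Because + is not in FOLLOW(S), the + entry of state 2 keeps only s3, which removes the LR(0) conflict.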

LR(1) Parsing

• Get as much power as possible out of a parsing table with 1 look-ahead symbol

• LR(1) grammar = recognizable by a shift/reduce parser with 1 look-ahead.

• LR(1) item = LR(0) item + look-ahead symbols possibly following production

LR(0): S → . S + E
LR(1): S → . S + E    +

LR(1) State

• LR(1) state = set of LR(1) items

• LR(1) item = LR(0) item + set of lookahead symbols

• No two items in state have same production + dot configuration

Not allowed (same production and dot twice):
  S → S . + E    +
  S → S . + E    $
  S → S + . E    num
Instead:
  S → S . + E    +,$
  S → S + . E    num

LR(1) Closure

Consider an item A → β . C δ with look-ahead λ. Closure is formed just as for LR(0), except:

1. Look-ahead symbols include the terminals that can begin what follows the non-terminal C to the right of the dot: FIRST(δ)

2. If δ is nullable (C may end up as the last symbol of the production), look-ahead symbols also include the item’s own look-ahead symbols (λ)

Grammar: S → E + S | E    E → num | ( S )

State 1:
S’ → . S $
S → . E + S    $
S → . E        $
E → . num      +,$
E → . ( S )    +,$
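The two closure rules translate directly into code. In this Python sketch each LR(1) item carries a single look-ahead, items are grouped by (production, dot) only for display, and the FIRST sets are written out by hand for this grammar. Since nothing in this grammar is nullable, rule 2 applies exactly when δ is empty. All names (`lr1_closure`, `grouped`, `FIRST`) are hypothetical, and None stands in for the unused look-ahead of S’ → . S $.

```python
# Augmented grammar: S' -> S $,  S -> E + S | E,  E -> num | ( S )
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("E", "+", "S"), ("E",)],
    "E": [("num",), ("(", "S", ")")],
}
# FIRST sets of the non-terminals, worked out by hand for this grammar.
FIRST = {"S'": {"num", "("}, "S": {"num", "("}, "E": {"num", "("}}


def first_of(symbol):
    return FIRST[symbol] if symbol in GRAMMAR else {symbol}


def lr1_closure(items):
    """LR(1) closure; an item is (lhs, rhs, dot, look-ahead) with one look-ahead each."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, look = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            nonterm, delta = rhs[dot], rhs[dot + 1:]     # C and delta in the slide
            # No nullable symbols here: look-aheads are FIRST(delta) when delta is
            # non-empty, otherwise the item's own look-ahead (lambda).
            looks = first_of(delta[0]) if delta else {look}
            for prod in GRAMMAR[nonterm]:
                for la in looks:
                    new = (nonterm, prod, 0, la)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return result


def grouped(items):
    """Collect look-aheads per (lhs, rhs, dot), as the slide writes them."""
    table = {}
    for lhs, rhs, dot, look in items:
        table.setdefault((lhs, rhs, dot), set()).add(look)
    return table


state1 = grouped(lr1_closure({("S'", ("S", "$"), 0, None)}))
for (lhs, rhs, dot), looks in state1.items():
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]), sorted(looks, key=str))
```

The grouped result matches State 1 above: $ for both S items, and +,$ for both E items.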

LR(1) DFA construction

Given LR(1) state, for each symbol (terminal or non-terminal) following a dot, construct a state with dot shifted across symbol, perform closure

Grammar: S → E + S | E    E → num | ( S )

State 1:
S’ → . S $
S → . E + S    $
S → . E        $
E → . num      +,$
E → . ( S )    +,$
  on E → state 2

State 2:
S → E . + S    $
S → E .        $

LR(1) example

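Reductions are unambiguous if the look-aheads of the reduce items are disjoint and none of them appears immediately to the right of a dot in the state

Grammar: S → E + S | E    E → num | ( S )

State 1: S’ → . S $;  S → . E + S, $;  S → . E, $;  E → . num, +,$;  E → . ( S ), +,$
  on E → state 2
State 2: S → E . + S, $;  S → E . , $

       +         $      E
1                       g2
2      s3        S→E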

LALR Grammars

• Problem with LR(1): too many states

• LALR(1) (Look-Ahead LR)
  – Merge any two LR(1) states whose items are identical except look-ahead
  – Results in smaller parser tables; works extremely well in practice
  – Usual technology for automatic parser generators

{ S → id .  +     S → E .  $ }   +   { S → id .  $     S → E .  + }   = ?
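What the “= ?” merge produces can be checked with a small sketch. The Python below merges LR(1) states that share the same core, as LALR does, and then looks for look-aheads on which two different completed items would both reduce. Names (`core`, `merge_by_core`, `reduce_reduce_conflicts`) are hypothetical, and items are (lhs, rhs, dot, look-ahead) tuples.

```python
from collections import defaultdict

# An LR(1) item here is (lhs, rhs, dot, look-ahead); a state is a set of items.


def core(state):
    """The LR(0) part of a state: its items with look-aheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR merge: union all LR(1) states that share the same core."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return list(merged.values())


def reduce_reduce_conflicts(state):
    """Look-aheads on which two different completed items would both reduce."""
    first_prod, conflicts = {}, set()
    for lhs, rhs, dot, look in state:
        if dot == len(rhs):
            prod = (lhs, rhs)
            if first_prod.setdefault(look, prod) != prod:
                conflicts.add(look)
    return conflicts


# The two states from the slide: identical except for look-ahead.
state_a = {("S", ("id",), 1, "+"), ("S", ("E",), 1, "$")}
state_b = {("S", ("id",), 1, "$"), ("S", ("E",), 1, "+")}

merged = merge_by_core([state_a, state_b])
print(len(merged), "merged state(s)")
print("conflicts before:", reduce_reduce_conflicts(state_a), reduce_reduce_conflicts(state_b))
print("conflicts after: ", reduce_reduce_conflicts(merged[0]))
```

Neither original state has a conflict, but the merged state reduces both S → id and S → E on + and on $: a reduce/reduce conflict that LALR merging can introduce even when the LR(1) states are conflict-free.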

How are Parsers Written?

• Automatic parser generators: yacc, bison

• Accept LALR(1) grammar specification
  – plus: declarations of precedence, associativity
  – output: LR parser code (inc. parsing table)

• Some parser generators accept LL(1): less powerful

Associativity

S→ S + E | E

E→num | ( S )

E→ E + E | num | ( E )

What happens if we run the grammar E → E + E | num | ( E ) through LALR construction?

Conflict!

E→ E + E | num | ( E )

E → E + E .    +
E → E . + E    +,$

1+2+3: after 1+2, the next symbol is +

shift/reduce conflict

Shift: 1+(2+3)

Reduce: (1+2)+3

Grammar in Parser Generators

non terminal E; terminal PLUS, LPAREN...

precedence left PLUS;

E ::= E PLUS E

| LPAREN E RPAREN

| NUMBER ;

“When shifting + conflicts with reducing a production containing +, choose reduce”

Precedence

Can also handle operator precedence

E → E + E | T
T → T × T | num | ( E )

E → E + E | E × E | num | ( E )

Conflicts without Precedence

E → E + E | E × E | num | ( E )

E → E . + E   …        E → E × E .   +
E → E + E .            E → E . × E   …

Conflicts without Precedence

precedence left PLUS;

precedence left TIMES; // TIMES > PLUS

E ::= E PLUS E | E TIMES E | ...

E → E . + E   …        E → E × E .   +
E → E + E .            E → E . × E   …

Rule: in a conflict, choose reduce if the production’s symbol has higher precedence than the shifted symbol; choose shift if vice versa
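The rule above, together with the left-associativity rule (“When shifting + conflicts with reducing a production containing +, choose reduce”), fits in a few lines. This Python sketch is not CUP’s or yacc’s implementation; `PRECEDENCE` and `resolve` are hypothetical names, and only left and right associativity are handled.

```python
# Hypothetical precedence table mirroring the declarations above:
# (level, associativity), where a higher level binds tighter (TIMES > PLUS).
PRECEDENCE = {"PLUS": (1, "left"), "TIMES": (2, "left")}


def resolve(production_op, lookahead_op):
    """Choose between reducing a production whose operator is production_op
    and shifting the look-ahead token lookahead_op."""
    prod_level, prod_assoc = PRECEDENCE[production_op]
    tok_level, _ = PRECEDENCE[lookahead_op]
    if prod_level > tok_level:
        return "reduce"
    if prod_level < tok_level:
        return "shift"
    # Same level: left associativity reduces, right associativity shifts.
    return "reduce" if prod_assoc == "left" else "shift"


print(resolve("PLUS", "PLUS"))    # E + E . with look-ahead +
print(resolve("PLUS", "TIMES"))   # E + E . with look-ahead ×
print(resolve("TIMES", "PLUS"))   # E × E . with look-ahead +
```

The three calls print reduce, shift and reduce: 1+2+3 groups as (1+2)+3, 1+2×3 shifts × so the product is built first, and 1×2+3 reduces 1×2 before the +.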

Parsing Summary

• Look-ahead information makes SLR(1), LALR(1), LR(1) grammars expressive

• Automatic parser generators support LALR(1)

• Precedence, associativity declarations simplify grammar writing

• Easiest and best way to read structured human-readable input

Coming Up:

• PA3: Defining a grammar for Tiger (Fri 9/28)
  – No AST … yet

• Test1: Wed 10/3